org.apache.mahout.cf.taste.impl.model.jdbc
Class MySQLJDBCDataModel

java.lang.Object
  extended by org.apache.mahout.cf.taste.impl.common.jdbc.AbstractJDBCComponent
      extended by org.apache.mahout.cf.taste.impl.model.jdbc.AbstractJDBCDataModel
          extended by org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel
All Implemented Interfaces:
Serializable, Refreshable, DataModel, JDBCDataModel

public class MySQLJDBCDataModel
extends AbstractJDBCDataModel

A JDBCDataModel backed by a MySQL database and accessed via JDBC. It may work with other JDBC databases. By default, this class assumes that there is a DataSource available under the JNDI name "jdbc/taste", which gives access to a database with a "taste_preferences" table with the following schema:

user_id item_id preference
987 123 0.9
987 456 0.1
654 123 0.2
654 789 0.3

The preference column must have a type compatible with the Java float type. The user_id and item_id columns should be compatible with the Java long type (BIGINT). For example, the following command sets up a suitable table in MySQL, complete with primary key and indexes:

 CREATE TABLE taste_preferences (
   user_id BIGINT NOT NULL,
   item_id BIGINT NOT NULL,
   preference FLOAT NOT NULL,
   PRIMARY KEY (user_id, item_id),
   INDEX (user_id),
   INDEX (item_id)
 )
 

The table may optionally have a timestamp column whose type is compatible with Java long.
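
As an illustration of the optional timestamp column, here is a minimal sketch that builds the CREATE TABLE statement above with or without that column. The class name TastePreferencesDdl, the column name created_at, and the choice to leave the timestamp column nullable are assumptions made for this example and are not part of Mahout. BIGINT is used because it is compatible with Java long.

```java
/** Hypothetical helper (not part of Mahout) that builds the preference table DDL. */
final class TastePreferencesDdl {

  private TastePreferencesDdl() {}

  /**
   * Builds a CREATE TABLE statement matching the schema described above.
   *
   * @param table           table name, e.g. "taste_preferences"
   * @param timestampColumn name of the optional timestamp column, or null to omit it
   */
  static String createTable(String table, String timestampColumn) {
    StringBuilder sql = new StringBuilder();
    sql.append("CREATE TABLE ").append(table).append(" (\n");
    sql.append("  user_id BIGINT NOT NULL,\n");
    sql.append("  item_id BIGINT NOT NULL,\n");
    sql.append("  preference FLOAT NOT NULL,\n");
    if (timestampColumn != null) {
      // Assumption: the column is left nullable; BIGINT is compatible with Java long.
      sql.append("  ").append(timestampColumn).append(" BIGINT,\n");
    }
    sql.append("  PRIMARY KEY (user_id, item_id),\n");
    sql.append("  INDEX (user_id),\n");
    sql.append("  INDEX (item_id)\n");
    sql.append(")");
    return sql.toString();
  }

  public static void main(String[] args) {
    // Without a timestamp column: the same statement as shown above.
    System.out.println(createTable("taste_preferences", null));
    // With a timestamp column; "created_at" is a made-up name.
    System.out.println(createTable("taste_preferences", "created_at"));
  }
}
```

The identifiers are concatenated directly into the SQL text, so they should come from trusted configuration only. Whatever column name is chosen would also have to be passed as timestampColumn to the six-argument constructor.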

Performance Notes

See the notes in AbstractJDBCDataModel on connection pooling, which is vital to performance.

Some experimentation suggests that MySQL's InnoDB engine is faster than MyISAM for these kinds of applications. MyISAM was the default engine before MySQL 5.5 and is, I believe, generally considered the lighter-weight and faster of the two, but my guess is that InnoDB's row-level locking helps here. Your mileage may vary.

Here are some key settings that can be tuned for MySQL, and suggested size for a data set of around 1 million elements:

Also consider setting some parameters on the MySQL Connector/J driver:

 cachePreparedStatements = true
 cachePrepStmts = true
 cacheResultSetMetadata = true
 alwaysSendSetIsolation = false
 elideSetAutoCommits = true
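
One common way to pass such parameters to Connector/J is as query parameters on the JDBC URL. The sketch below only assembles such a URL from the settings listed above. The class name ConnectorJUrl, the host, the port, and the database name "taste" are assumptions for illustration. The property names are copied verbatim from the list, so check them against the documentation for your driver version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of Mahout or Connector/J) that builds a JDBC URL. */
final class ConnectorJUrl {

  private ConnectorJUrl() {}

  /** Returns the driver parameters exactly as listed in the notes above, in order. */
  static Map<String, String> suggestedParameters() {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("cachePreparedStatements", "true");
    params.put("cachePrepStmts", "true");
    params.put("cacheResultSetMetadata", "true");
    params.put("alwaysSendSetIsolation", "false");
    params.put("elideSetAutoCommits", "true");
    return params;
  }

  /** Builds jdbc:mysql://host:port/database?key=value&key=value from the given map. */
  static String build(String host, int port, String database, Map<String, String> params) {
    String base = "jdbc:mysql://" + host + ":" + port + "/" + database;
    if (params.isEmpty()) {
      return base;
    }
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> entry : params.entrySet()) {
      query.add(entry.getKey() + "=" + entry.getValue());
    }
    return base + "?" + query;
  }

  public static void main(String[] args) {
    // Host, port, and database name are placeholders.
    System.out.println(build("localhost", 3306, "taste", suggestedParameters()));
  }
}
```

The resulting URL would then be given to whatever pooled DataSource is configured for the model.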
 

Thanks to Amila Jayasooriya for contributing MySQL notes above as part of Google Summer of Code 2007.

See Also:
Serialized Form

Field Summary
 
Fields inherited from class org.apache.mahout.cf.taste.impl.model.jdbc.AbstractJDBCDataModel
DEFAULT_ITEM_ID_COLUMN, DEFAULT_PREFERENCE_COLUMN, DEFAULT_PREFERENCE_TABLE, DEFAULT_PREFERENCE_TIME_COLUMN, DEFAULT_USER_ID_COLUMN
 
Fields inherited from class org.apache.mahout.cf.taste.impl.common.jdbc.AbstractJDBCComponent
DEFAULT_DATASOURCE_NAME
 
Constructor Summary
MySQLJDBCDataModel()
           Creates a MySQLJDBCDataModel using the default DataSource (named AbstractJDBCComponent.DEFAULT_DATASOURCE_NAME) and default table/column names.
MySQLJDBCDataModel(DataSource dataSource)
           Creates a MySQLJDBCDataModel using the given DataSource and default table/column names.
MySQLJDBCDataModel(DataSource dataSource, String preferenceTable, String userIDColumn, String itemIDColumn, String preferenceColumn, String timestampColumn)
           Creates a MySQLJDBCDataModel using the given DataSource and the given table/column names.
MySQLJDBCDataModel(String dataSourceName)
           Creates a MySQLJDBCDataModel using the DataSource found under the given name, and using default table/column names.
 
Method Summary
protected  int getFetchSize()
           
 
Methods inherited from class org.apache.mahout.cf.taste.impl.model.jdbc.AbstractJDBCDataModel
buildPreference, doGetPreferencesForItem, exportWithIDsOnly, exportWithPrefs, getDataSource, getItemIDColumn, getItemIDs, getItemIDsFromUser, getLongColumn, getMaxPreference, getMinPreference, getNumItems, getNumUsers, getNumUsersWithPreferenceFor, getNumUsersWithPreferenceFor, getPreferenceColumn, getPreferencesForItem, getPreferencesFromUser, getPreferenceTable, getPreferenceTime, getPreferenceValue, getUserIDColumn, getUserIDs, hasPreferenceValues, refresh, removePreference, setLongParameter, setPreference
 
Methods inherited from class org.apache.mahout.cf.taste.impl.common.jdbc.AbstractJDBCComponent
checkNotNullAndLog, checkNotNullAndLog, lookupDataSource
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MySQLJDBCDataModel

public MySQLJDBCDataModel()
                   throws TasteException

Creates a MySQLJDBCDataModel using the default DataSource (named AbstractJDBCComponent.DEFAULT_DATASOURCE_NAME) and default table/column names.

Throws:
TasteException - if DataSource can't be found

MySQLJDBCDataModel

public MySQLJDBCDataModel(String dataSourceName)
                   throws TasteException

Creates a MySQLJDBCDataModel using the DataSource found under the given name, and using default table/column names.

Parameters:
dataSourceName - name of DataSource to look up
Throws:
TasteException - if DataSource can't be found

MySQLJDBCDataModel

public MySQLJDBCDataModel(DataSource dataSource)

Creates a MySQLJDBCDataModel using the given DataSource and default table/column names.

Parameters:
dataSource - DataSource to use

MySQLJDBCDataModel

public MySQLJDBCDataModel(DataSource dataSource,
                          String preferenceTable,
                          String userIDColumn,
                          String itemIDColumn,
                          String preferenceColumn,
                          String timestampColumn)

Creates a MySQLJDBCDataModel using the given DataSource and the given table/column names.

Parameters:
dataSource - DataSource to use
preferenceTable - name of table containing preference data
userIDColumn - user ID column name
itemIDColumn - item ID column name
preferenceColumn - preference column name
timestampColumn - timestamp column name (may be null)
Method Detail

getFetchSize

protected int getFetchSize()
Overrides:
getFetchSize in class AbstractJDBCComponent


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.