org.apache.mahout.cf.taste.impl.model.mongodb
Class MongoDBDataModel

java.lang.Object
  extended by org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel
All Implemented Interfaces:
Serializable, Refreshable, DataModel

public final class MongoDBDataModel
extends Object
implements DataModel

A DataModel backed by a MongoDB database. This class expects a collection in the database which contains a user ID (long or ObjectId), item ID (long or ObjectId), preference value (optional) and timestamps ("created_at", "deleted_at").

An example of a document in MongoDB:

{ "_id" : ObjectId("4d7627bf6c7d47ade9fc7780"), "user_id" : ObjectId("4c2209fef3924d31102bd84b"), "item_id" : ObjectId("4c2209fef3924d31202bd853"), "preference" : 0.5, "created_at" : "Tue Mar 23 2010 20:48:43 GMT-0400 (EDT)" }

Preference value is optional to accommodate applications that have no notion of a preference value (that is, the user simply expresses a preference for an item, but no degree of preference).

The preference value is assumed to be parseable as a double.

The user IDs and item IDs are assumed to be parseable as longs or ObjectIds. In the case of ObjectIds, the model creates a Map<ObjectId, long> (collection "mongo_data_model_map") inside the MongoDB database. This conversion is needed because Mahout feeds the recommender with long IDs, whereas a MongoDB ObjectId is a 12-byte value.
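
The translation between MongoDB identifiers and longs can be pictured as a two-way lookup table with a counter. The following is a minimal in-memory sketch under stated assumptions, not this class's implementation: the class IdMappingSketch, its method names, and the starting counter value are hypothetical, and the real model keeps the mapping in a MongoDB collection rather than in Java maps.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the identifier mapping collection. */
class IdMappingSketch {
    private final Map<String, Long> idToLong = new HashMap<>();
    private final Map<Long, String> longToId = new HashMap<>();
    private long nextLong = 0; // assumed starting value, for illustration only

    /** Returns the long assigned to a MongoDB identifier, assigning a new one on first sight. */
    long toLong(String mongoId) {
        Long existing = idToLong.get(mongoId);
        if (existing != null) {
            return existing;
        }
        long assigned = nextLong++;
        idToLong.put(mongoId, assigned);
        longToId.put(assigned, mongoId);
        return assigned;
    }

    /** Returns the MongoDB identifier for a long, or null if that long was never assigned. */
    String toMongoId(long id) {
        return longToId.get(id);
    }

    public static void main(String[] args) {
        IdMappingSketch mapping = new IdMappingSketch();
        long user = mapping.toLong("4c2209fef3924d31102bd84b");
        long item = mapping.toLong("4c2209fef3924d31202bd853");
        System.out.println(user + " <-> " + mapping.toMongoId(user));
        System.out.println(item + " <-> " + mapping.toMongoId(item));
    }
}
```

Keeping both directions in step is what lets a recommendation expressed in longs be translated back into the identifiers stored in MongoDB.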

The timestamps ("created_at", "deleted_at"), if present, are assumed to be parseable as a long or Date. To express timestamps as Dates, a DateFormat must be provided in the class constructor. The default Date format is "EE MMM dd yyyy HH:mm:ss 'GMT'Z (zzz)". If this parameter is set to null, timestamps are assumed to be parseable as longs.
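
The two timestamp interpretations can be sketched with the standard java.text classes. This is an illustration only: the helper TimestampParsingSketch and its toMillis method are hypothetical, and treating a long timestamp as epoch milliseconds is an assumption, since the description above does not state the unit.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper showing the two timestamp interpretations. */
class TimestampParsingSketch {
    // Default pattern quoted in the class description.
    static final String DEFAULT_PATTERN = "EE MMM dd yyyy HH:mm:ss 'GMT'Z (zzz)";

    /** Parses the raw value with the given format, or as a long when the format is null. */
    static long toMillis(String raw, DateFormat format) {
        if (format == null) {
            return Long.parseLong(raw); // assumed to be epoch milliseconds
        }
        try {
            return format.parse(raw).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + raw, e);
        }
    }

    public static void main(String[] args) {
        SimpleDateFormat format = new SimpleDateFormat(DEFAULT_PATTERN, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));

        // Render a date with the default pattern, then read it back both ways.
        String rendered = format.format(new Date(1269391723000L));
        System.out.println(rendered);
        System.out.println(toMillis(rendered, format));
        System.out.println(toMillis("1269391723000", null));
    }
}
```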

It is also acceptable for the documents to contain additional fields. Those fields will be ignored.

This class will reload data from the MongoDB database when refresh(Collection) is called. MongoDBDataModel keeps the timestamp of the last update. This variable and the fields "created_at" and "deleted_at" help the model to determine if the triple (user, item, preference) must be added or deleted.
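
One way to read the role of the last-update timestamp is as a per-document decision rule. The sketch below is an assumption about how such a rule could look, not the logic this class actually runs: RefreshDecisionSketch, its Action enum, and the exact comparisons are hypothetical.

```java
/** Hypothetical decision rule for an incremental refresh; the real logic may differ. */
class RefreshDecisionSketch {
    enum Action { ADD, REMOVE, NONE }

    /**
     * Decides what to do with one document, given the time of the last model update.
     * Timestamps are epoch milliseconds; a null value means the field is absent.
     */
    static Action decide(long lastUpdate, Long createdAt, Long deletedAt) {
        if (deletedAt != null && deletedAt > lastUpdate) {
            return Action.REMOVE; // marked as deleted since the last update
        }
        if (deletedAt == null && createdAt != null && createdAt > lastUpdate) {
            return Action.ADD; // created since the last update and not deleted
        }
        return Action.NONE; // nothing newer than the last update
    }

    public static void main(String[] args) {
        long lastUpdate = 1_000L;
        System.out.println(decide(lastUpdate, 1_500L, null));  // created after the update
        System.out.println(decide(lastUpdate, 500L, 1_200L));  // deleted after the update
        System.out.println(decide(lastUpdate, 500L, null));    // unchanged
    }
}
```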

See Also:
Serialized Form

Field Summary
static String DEFAULT_MONGO_MAP_COLLECTION
           
 
Constructor Summary
MongoDBDataModel()
          Creates a new MongoDBDataModel
MongoDBDataModel(String host, int port, String database, String collection, boolean manage, boolean finalRemove, DateFormat format)
          Creates a new MongoDBDataModel with MongoDB basic configuration (without authentication)
MongoDBDataModel(String host, int port, String database, String collection, boolean manage, boolean finalRemove, DateFormat format, String user, String password)
          Creates a new MongoDBDataModel with MongoDB basic configuration (with authentication)
MongoDBDataModel(String host, int port, String database, String collection, boolean manage, boolean finalRemove, DateFormat format, String userIDField, String itemIDField, String preferenceField, String mappingCollection)
          Creates a new MongoDBDataModel with MongoDB advanced configuration (without authentication)
MongoDBDataModel(String host, int port, String database, String collection, boolean manage, boolean finalRemove, DateFormat format, String user, String password, String userIDField, String itemIDField, String preferenceField, String mappingCollection)
          Creates a new MongoDBDataModel with MongoDB advanced configuration (with authentication)
 
Method Summary
 void cleanupMappingCollection()
          Cleans up the mapping collection.
 String fromIdToLong(String id, boolean isUser)
           Translates the MongoDB identifier to Mahout/MongoDBDataModel's internal identifier, if required.
 String fromLongToId(long id)
           Translates the Mahout/MongoDBDataModel's internal identifier to MongoDB identifier, if required.
 LongPrimitiveIterator getItemIDs()
           
 FastIDSet getItemIDsFromUser(long userID)
           
 float getMaxPreference()
           
 float getMinPreference()
           
 int getNumItems()
           
 int getNumUsers()
           
 int getNumUsersWithPreferenceFor(long itemID)
           
 int getNumUsersWithPreferenceFor(long itemID1, long itemID2)
           
 PreferenceArray getPreferencesForItem(long itemID)
           
 PreferenceArray getPreferencesFromUser(long id)
           
 Long getPreferenceTime(long userID, long itemID)
           
 Float getPreferenceValue(long userID, long itemID)
           
 LongPrimitiveIterator getUserIDs()
           
 boolean hasPreferenceValues()
           
 boolean isIDInModel(String ID)
           Checks if an ID is currently in the model.
 Date mongoUpdateDate()
           Date of the latest update of the model.
 void refresh(Collection<Refreshable> alreadyRefreshed)
           Triggers "refresh" -- whatever that means -- of the implementation.
 void refreshData(String userID, Iterable<List<String>> items, boolean add)
           Adds/removes (user, item) pairs to/from the model.
 void removePreference(long userID, long itemID)
           
 void setPreference(long userID, long itemID, float value)
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_MONGO_MAP_COLLECTION

public static final String DEFAULT_MONGO_MAP_COLLECTION
See Also:
Constant Field Values
Constructor Detail

MongoDBDataModel

public MongoDBDataModel()
                 throws UnknownHostException
Creates a new MongoDBDataModel

Throws:
UnknownHostException

MongoDBDataModel

public MongoDBDataModel(String host,
                        int port,
                        String database,
                        String collection,
                        boolean manage,
                        boolean finalRemove,
                        DateFormat format)
                 throws UnknownHostException
Creates a new MongoDBDataModel with MongoDB basic configuration (without authentication)

Parameters:
host - MongoDB host.
port - MongoDB port. Default: 27017
database - MongoDB database
collection - MongoDB collection/table
manage - If true, the model adds and removes users and items from the MongoDB database when the model is refreshed.
finalRemove - If true, the model removes the user/item completely from the MongoDB database. If false, the model adds the "deleted_at" field with the current date to the "deleted" user/item.
format - MongoDB date format. If null, timestamps are assumed to be parseable as longs.
Throws:
UnknownHostException - if the database host cannot be resolved
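
The difference between the two removal modes selected by finalRemove can be illustrated on a plain in-memory map. This is a sketch under stated assumptions: RemovalModeSketch and its remove method are hypothetical, and a map of maps stands in for the MongoDB collection.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory illustration of the two removal modes. */
class RemovalModeSketch {
    /** Drops the document outright, or marks it with a "deleted_at" date, depending on the flag. */
    static void remove(Map<String, Map<String, Object>> collection, String key, boolean finalRemove) {
        if (finalRemove) {
            collection.remove(key); // the document disappears completely
        } else {
            Map<String, Object> document = collection.get(key);
            if (document != null) {
                document.put("deleted_at", new Date()); // the document stays, flagged as deleted
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> collection = new HashMap<>();
        collection.put("a", new HashMap<>());
        collection.put("b", new HashMap<>());

        remove(collection, "a", true);
        remove(collection, "b", false);

        System.out.println("a present: " + collection.containsKey("a"));
        System.out.println("b marked: " + collection.get("b").containsKey("deleted_at"));
    }
}
```

Marking rather than removing keeps the history of the preference available, at the cost of documents that later reads have to filter out.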

MongoDBDataModel

public MongoDBDataModel(String host,
                        int port,
                        String database,
                        String collection,
                        boolean manage,
                        boolean finalRemove,
                        DateFormat format,
                        String userIDField,
                        String itemIDField,
                        String preferenceField,
                        String mappingCollection)
                 throws UnknownHostException
Creates a new MongoDBDataModel with MongoDB advanced configuration (without authentication)

Parameters:
userIDField - Mongo user ID field
itemIDField - Mongo item ID field
preferenceField - Mongo preference value field
mappingCollection - Mongo collection that stores the mapping between ObjectIds and longs
Throws:
UnknownHostException - if the database host cannot be resolved
See Also:
MongoDBDataModel(String, int, String, String, boolean, boolean, DateFormat)

MongoDBDataModel

public MongoDBDataModel(String host,
                        int port,
                        String database,
                        String collection,
                        boolean manage,
                        boolean finalRemove,
                        DateFormat format,
                        String user,
                        String password)
                 throws UnknownHostException
Creates a new MongoDBDataModel with MongoDB basic configuration (with authentication)

Parameters:
user - Mongo username (authentication)
password - Mongo password (authentication)
Throws:
UnknownHostException - if the database host cannot be resolved
See Also:
MongoDBDataModel(String, int, String, String, boolean, boolean, DateFormat)

MongoDBDataModel

public MongoDBDataModel(String host,
                        int port,
                        String database,
                        String collection,
                        boolean manage,
                        boolean finalRemove,
                        DateFormat format,
                        String user,
                        String password,
                        String userIDField,
                        String itemIDField,
                        String preferenceField,
                        String mappingCollection)
                 throws UnknownHostException
Creates a new MongoDBDataModel with MongoDB advanced configuration (with authentication)

Throws:
UnknownHostException - if the database host cannot be resolved
See Also:
MongoDBDataModel(String, int, String, String, boolean, boolean, DateFormat, String, String)
Method Detail

refreshData

public void refreshData(String userID,
                        Iterable<List<String>> items,
                        boolean add)
                 throws NoSuchUserException,
                        NoSuchItemException

Adds/removes (user, item) pairs to/from the model.

Parameters:
userID - MongoDB user identifier
items - List of pairs (item, preference) to be added or deleted
add - If true, this flag indicates that the pairs (user, item) must be added to the model. If false, it indicates deletion.
Throws:
NoSuchUserException
NoSuchItemException
See Also:
refresh(Collection)
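
The shape of the items argument can be sketched with plain collections. This is an illustration under stated assumptions: RefreshDataArgumentSketch and its pair helper are hypothetical, the second item identifier is made up, and the element order (item first, then preference) simply follows the parameter description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the (item, preference) pairs as lists of strings. */
class RefreshDataArgumentSketch {
    /** Builds one pair as a two-element list: item identifier, then preference value. */
    static List<String> pair(String itemId, double preference) {
        return Arrays.asList(itemId, Double.toString(preference));
    }

    public static void main(String[] args) {
        List<List<String>> items = new ArrayList<>();
        items.add(pair("4c2209fef3924d31202bd853", 0.5));
        items.add(pair("4c2209fef3924d31202bd854", 1.0)); // made-up identifier

        // With a real model, this list would be the second argument of
        // refreshData(userID, items, true).
        for (List<String> p : items) {
            System.out.println("item=" + p.get(0) + " preference=" + Double.parseDouble(p.get(1)));
        }
    }
}
```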

refresh

public void refresh(Collection<Refreshable> alreadyRefreshed)

Triggers "refresh" -- whatever that means -- of the implementation. The general contract is that any Refreshable should always leave itself in a consistent, operational state, and that the refresh atomically updates internal state from old to new.

Specified by:
refresh in interface Refreshable
Parameters:
alreadyRefreshed - Refreshables that are known to have already been refreshed as a result of an initial call to a refresh(Collection) method on some object. This ensures that objects in a refresh dependency graph aren't refreshed twice needlessly.
See Also:
refreshData(String, Iterable, boolean)
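
The purpose of alreadyRefreshed is easiest to see on a small dependency graph. The sketch below declares its own stand-in interface so that it compiles without Mahout: SketchRefreshable, Node, and the refreshCount field are hypothetical names, and the traversal order is an assumption rather than a description of how any Mahout component behaves.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class RefreshGraphSketch {
    /** Local stand-in for the Refreshable contract, declared here so the sketch is self-contained. */
    interface SketchRefreshable {
        void refresh(Collection<SketchRefreshable> alreadyRefreshed);
    }

    /** A node that skips work if it was already refreshed, and otherwise refreshes its dependencies first. */
    static final class Node implements SketchRefreshable {
        final String name;
        final List<SketchRefreshable> dependencies = new ArrayList<>();
        int refreshCount = 0;

        Node(String name) {
            this.name = name;
        }

        @Override
        public void refresh(Collection<SketchRefreshable> alreadyRefreshed) {
            if (alreadyRefreshed.contains(this)) {
                return; // someone else in the graph already refreshed this node
            }
            alreadyRefreshed.add(this);
            for (SketchRefreshable dependency : dependencies) {
                dependency.refresh(alreadyRefreshed);
            }
            refreshCount++; // stands in for reloading this node's own state
        }
    }

    public static void main(String[] args) {
        Node model = new Node("dataModel");
        Node similarity = new Node("similarity");
        Node recommender = new Node("recommender");
        similarity.dependencies.add(model);
        recommender.dependencies.add(model);
        recommender.dependencies.add(similarity);

        // The data model is reachable by two paths but is refreshed only once.
        recommender.refresh(new HashSet<>());
        System.out.println(model.name + " refreshed " + model.refreshCount + " time(s)");
    }
}
```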

fromIdToLong

public String fromIdToLong(String id,
                           boolean isUser)

Translates the MongoDB identifier to Mahout/MongoDBDataModel's internal identifier, if required.

If MongoDB identifiers are long datatypes, it returns the id.

This conversion is needed since Mahout uses the long datatype to feed the recommender, and MongoDB uses 12 bytes to create its identifiers.

Parameters:
id - MongoDB identifier
isUser - true if the identifier is a user ID; false if it is an item ID
Returns:
String containing the translation of the external MongoDB ID to internal long ID (mapping).
See Also:
fromLongToId(long), Mongo Object IDs

fromLongToId

public String fromLongToId(long id)

Translates the Mahout/MongoDBDataModel's internal identifier to MongoDB identifier, if required.

If MongoDB identifiers are long datatypes, it returns the id in String format.

This conversion is needed since Mahout uses the long datatype to feed the recommender, and MongoDB uses 12 bytes to create its identifiers.

Parameters:
id - Mahout's internal identifier
Returns:
String containing the translation of the internal long ID to external MongoDB ID (mapping).
See Also:
fromIdToLong(String, boolean), Mongo Object IDs

isIDInModel

public boolean isIDInModel(String ID)

Checks if an ID is currently in the model.

Parameters:
ID - user or item ID
Returns:
true if the ID is in the model; false otherwise.

mongoUpdateDate

public Date mongoUpdateDate()

Date of the latest update of the model.

Returns:
Date with the latest update of the model.

cleanupMappingCollection

public void cleanupMappingCollection()
Cleans up the mapping collection.


getUserIDs

public LongPrimitiveIterator getUserIDs()
                                 throws TasteException
Specified by:
getUserIDs in interface DataModel
Throws:
TasteException

getPreferencesFromUser

public PreferenceArray getPreferencesFromUser(long id)
                                       throws TasteException
Specified by:
getPreferencesFromUser in interface DataModel
Throws:
TasteException

getItemIDsFromUser

public FastIDSet getItemIDsFromUser(long userID)
                             throws TasteException
Specified by:
getItemIDsFromUser in interface DataModel
Throws:
TasteException

getItemIDs

public LongPrimitiveIterator getItemIDs()
                                 throws TasteException
Specified by:
getItemIDs in interface DataModel
Throws:
TasteException

getPreferencesForItem

public PreferenceArray getPreferencesForItem(long itemID)
                                      throws TasteException
Specified by:
getPreferencesForItem in interface DataModel
Throws:
TasteException

getPreferenceValue

public Float getPreferenceValue(long userID,
                                long itemID)
                         throws TasteException
Specified by:
getPreferenceValue in interface DataModel
Throws:
TasteException

getPreferenceTime

public Long getPreferenceTime(long userID,
                              long itemID)
                       throws TasteException
Specified by:
getPreferenceTime in interface DataModel
Throws:
TasteException

getNumItems

public int getNumItems()
                throws TasteException
Specified by:
getNumItems in interface DataModel
Throws:
TasteException

getNumUsers

public int getNumUsers()
                throws TasteException
Specified by:
getNumUsers in interface DataModel
Throws:
TasteException

getNumUsersWithPreferenceFor

public int getNumUsersWithPreferenceFor(long itemID)
                                 throws TasteException
Specified by:
getNumUsersWithPreferenceFor in interface DataModel
Throws:
TasteException

getNumUsersWithPreferenceFor

public int getNumUsersWithPreferenceFor(long itemID1,
                                        long itemID2)
                                 throws TasteException
Specified by:
getNumUsersWithPreferenceFor in interface DataModel
Throws:
TasteException

setPreference

public void setPreference(long userID,
                          long itemID,
                          float value)
Specified by:
setPreference in interface DataModel

removePreference

public void removePreference(long userID,
                             long itemID)
Specified by:
removePreference in interface DataModel

hasPreferenceValues

public boolean hasPreferenceValues()
Specified by:
hasPreferenceValues in interface DataModel

getMaxPreference

public float getMaxPreference()
Specified by:
getMaxPreference in interface DataModel

getMinPreference

public float getMinPreference()
Specified by:
getMinPreference in interface DataModel

toString

public String toString()
Overrides:
toString in class Object


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.