org.apache.mahout.cf.taste.hadoop.similarity.item
Class ItemSimilarityJob
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public final class ItemSimilarityJob extends AbstractJob
Distributed precomputation of the item-item similarities for item-based collaborative filtering.
Preferences in the input file should look like userID,itemID[,preferencevalue].
The preference value is optional, to accommodate applications that have no notion of a preference value (that is, the user simply expresses a preference for an item, but no degree of preference).
The preference value is assumed to be parseable as a double. The user IDs and item IDs are parsed as longs.
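The input format described above can be illustrated with a small, standalone parser. This is only a sketch and not Mahout's own parsing code: the class name PreferenceLineParser, the nested Preference type, and the choice to split strictly on commas are assumptions made for illustration. It shows the two accepted line shapes, with and without a preference value.

```java
import java.util.OptionalDouble;

// Hypothetical helper, not part of Mahout: parses one line of the form
// userID,itemID[,preferencevalue] as described in the class documentation.
final class PreferenceLineParser {

    /** One parsed record; value is empty when the line carries no preference value. */
    static final class Preference {
        final long userID;
        final long itemID;
        final OptionalDouble value;

        Preference(long userID, long itemID, OptionalDouble value) {
            this.userID = userID;
            this.itemID = itemID;
            this.value = value;
        }

        @Override
        public String toString() {
            return "Preference[user=" + userID + ", item=" + itemID + ", value=" + value + "]";
        }
    }

    /** Parses a single input line; assumes a comma is the only field separator. */
    static Preference parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 2 || fields.length > 3) {
            throw new IllegalArgumentException("Expected userID,itemID[,preferencevalue]: " + line);
        }
        // IDs are parsed as longs, the optional preference value as a double.
        long userID = Long.parseLong(fields[0].trim());
        long itemID = Long.parseLong(fields[1].trim());
        OptionalDouble value = fields.length == 3
                ? OptionalDouble.of(Double.parseDouble(fields[2].trim()))
                : OptionalDouble.empty();
        return new Preference(userID, itemID, value);
    }

    public static void main(String[] args) {
        System.out.println(parse("42,1001,3.5")); // line with a preference value
        System.out.println(parse("42,1002"));     // line without a preference value
    }
}
```

A line without the third field corresponds to the case where the user simply expresses a preference for an item, which is the situation the --booleanData option is meant for.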
Command line arguments specific to this class are:
- --input (path): Directory containing one or more text files with the preference data
- --output (path): output path where similarity data should be written
- --similarityClassname (classname): Name of distributed similarity measure class to instantiate or a predefined similarity from VectorSimilarityMeasure
- --maxSimilaritiesPerItem (integer): Maximum number of similarities considered per item (default: 100)
- --maxPrefsPerUser (integer): Maximum number of preferences to consider per user; users with more preferences will be sampled down (default: 1000)
- --minPrefsPerUser (integer): Ignore users with fewer preferences than this (default: 1)
- --booleanData (boolean): Treat input data as having no preference values (default: false)
- --threshold (double): discard item pairs with a similarity value below this
General command line options are documented in AbstractJob.
Note that because of how Hadoop parses arguments, all "-D" arguments must appear before all other arguments.
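The ordering rule above can be sketched as a small argument builder that always emits the "-D" pairs before the job-specific options. The class name ItemSimilarityArgs, the paths, the property value, and the similarity name SIMILARITY_COSINE are illustrative assumptions, not values taken from this page; only the option names come from the list above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: assembles an argument array in which
// all Hadoop "-D" properties come before the ItemSimilarityJob options.
final class ItemSimilarityArgs {

    static String[] build(Map<String, String> hadoopProps, Map<String, String> jobOptions) {
        List<String> args = new ArrayList<>();
        // Generic Hadoop properties first, in the "-D property=value" form.
        for (Map.Entry<String, String> e : hadoopProps.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        // Job-specific options afterwards, as "--name value" pairs.
        for (Map.Entry<String, String> e : jobOptions.entrySet()) {
            args.add("--" + e.getKey());
            args.add(e.getValue());
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.reduce.tasks", "4");                 // illustrative property and value

        Map<String, String> options = new LinkedHashMap<>();
        options.put("input", "/data/prefs");                   // placeholder path
        options.put("output", "/data/item-similarities");      // placeholder path
        options.put("similarityClassname", "SIMILARITY_COSINE"); // assumed predefined name
        options.put("maxSimilaritiesPerItem", "100");

        System.out.println(String.join(" ", build(props, options)));
    }
}
```

Since the class implements org.apache.hadoop.util.Tool and has a public no-argument constructor, an array built this way could presumably be handed to Hadoop's ToolRunner.run(new ItemSimilarityJob(), args) or passed on the command line to main.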
Methods inherited from class org.apache.mahout.common.AbstractJob:
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ITEM_ID_INDEX_PATH_STR
public static final String ITEM_ID_INDEX_PATH_STR
MAX_SIMILARITIES_PER_ITEM
public static final String MAX_SIMILARITIES_PER_ITEM
ItemSimilarityJob
public ItemSimilarityJob()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.