org.apache.mahout.clustering.lda.cvb
Class InMemoryCollapsedVariationalBayes0
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.mahout.common.AbstractJob
      org.apache.mahout.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class InMemoryCollapsedVariationalBayes0
- extends AbstractJob
Runs the same algorithm as CVB0Driver, but sequentially and in memory. Current memory requirements: the entire corpus is read into RAM, along with two copies of the model (each of size numTerms * numTopics) and a matrix of size numDocs * numTopics that stores p(topic|doc) for all documents.
If all of this fits in memory, this class can be significantly faster than an iterative MapReduce job.
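As a rough illustration of the memory requirements above, the hypothetical helper below (CvbMemoryEstimate is not part of Mahout) estimates the bytes taken by the two numTerms * numTopics model copies and the numDocs * numTopics p(topic|doc) matrix. It assumes dense storage at 8 bytes per double and ignores JVM object overhead and the corpus, whose footprint depends on its sparsity.

```java
// Hypothetical helper, not part of Mahout: estimates the heap used by the
// model-side matrices described above. Assumes every matrix is stored densely
// as 8-byte doubles; ignores JVM object overhead and the corpus itself.
final class CvbMemoryEstimate {

  private CvbMemoryEstimate() {}

  /** Bytes for two numTerms x numTopics models plus one numDocs x numTopics p(topic|doc) matrix. */
  static long modelBytes(long numTerms, long numDocs, long numTopics) {
    long modelEntries = 2L * numTerms * numTopics; // two copies of the topic-term model
    long docTopicEntries = numDocs * numTopics;    // p(topic|doc) for every document
    return (modelEntries + docTopicEntries) * Double.BYTES;
  }

  public static void main(String[] args) {
    // 100,000 terms, 1,000,000 documents, 100 topics.
    long bytes = modelBytes(100_000L, 1_000_000L, 100L);
    System.out.printf("%d bytes (~%.2f GiB) before counting the corpus%n",
        bytes, bytes / (1024.0 * 1024.0 * 1024.0));
  }
}
```

For 100,000 terms, 1,000,000 documents and 100 topics, main prints 960000000 bytes (~0.89 GiB) before counting the corpus; in this example the p(topic|doc) matrix accounts for 100,000,000 of the 120,000,000 entries.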
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
InMemoryCollapsedVariationalBayes0
public InMemoryCollapsedVariationalBayes0(Matrix corpus,
String[] terms,
int numTopics,
double alpha,
double eta,
int numTrainingThreads,
int numUpdatingThreads,
double modelCorpusFraction)
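The class name refers to CVB0, the zeroth-order collapsed variational Bayes approximation for LDA. To make the roles of alpha and eta concrete, the sketch below shows the commonly published form of the CVB0 topic update, assuming (as the parameter names suggest) that alpha smooths document-topic counts and eta smooths topic-term counts. Cvb0UpdateSketch and its method are hypothetical and are not Mahout's implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of the textbook CVB0 topic update for one occurrence of a
// term in a document; this is not Mahout's implementation. Full CVB0 first
// subtracts the occurrence's own contribution from the counts; here the
// counts are taken as already adjusted.
final class Cvb0UpdateSketch {

  private Cvb0UpdateSketch() {}

  /**
   * p(k) proportional to (termTopic[k] + eta) / (topicTotal[k] + numTerms * eta) * (docTopic[k] + alpha)
   *
   * @param termTopic  expected count of this term in each topic
   * @param topicTotal expected count of all terms in each topic
   * @param docTopic   expected count of each topic in this document
   */
  static double[] topicDistribution(double[] termTopic, double[] topicTotal,
      double[] docTopic, int numTerms, double alpha, double eta) {
    int numTopics = termTopic.length;
    double[] p = new double[numTopics];
    double norm = 0.0;
    for (int k = 0; k < numTopics; k++) {
      p[k] = (termTopic[k] + eta) / (topicTotal[k] + numTerms * eta) * (docTopic[k] + alpha);
      norm += p[k];
    }
    for (int k = 0; k < numTopics; k++) {
      p[k] /= norm; // normalize so the distribution sums to 1
    }
    return p;
  }

  public static void main(String[] args) {
    double[] p = topicDistribution(
        new double[] {4.0, 1.0}, new double[] {50.0, 50.0}, new double[] {3.0, 3.0},
        1000, 0.1, 0.01);
    System.out.println(Arrays.toString(p));
  }
}
```

Larger alpha pulls each document's topic mixture toward uniform; larger eta does the same for each topic's distribution over terms.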
setVerbose
public void setVerbose(boolean verbose)
trainDocuments
public void trainDocuments()
trainDocuments
public void trainDocuments(double testFraction)
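The testFraction parameter is not documented here. One reasonable reading is that it withholds a fraction of documents from training so they can be used for evaluation; the hypothetical sketch below assumes that reading and holds out every k-th document, with k = round(1 / testFraction). Both the interpretation and the selection rule are assumptions, not confirmed Mahout behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Mahout's code: assumes testFraction is the share of
// documents held out of training, chosen deterministically as every k-th
// document with k = round(1 / testFraction). testFraction == 0 trains on all.
final class HoldoutSplitSketch {

  private HoldoutSplitSketch() {}

  static boolean isHeldOut(int docId, double testFraction) {
    if (testFraction < 0.0 || testFraction > 1.0) {
      throw new IllegalArgumentException("testFraction must be in [0, 1]: " + testFraction);
    }
    if (testFraction == 0.0) {
      return false; // nothing held out
    }
    long stride = Math.max(1L, Math.round(1.0 / testFraction));
    return docId % stride == 0;
  }

  static List<Integer> trainingDocIds(int numDocs, double testFraction) {
    List<Integer> ids = new ArrayList<>();
    for (int docId = 0; docId < numDocs; docId++) {
      if (!isHeldOut(docId, testFraction)) {
        ids.add(docId);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(trainingDocIds(8, 0.25));
  }
}
```

With eight documents and testFraction = 0.25, documents 0 and 4 are held out and main prints [1, 2, 3, 5, 6, 7]. A deterministic rule keeps the split reproducible across runs, at the cost of being sensitive to any ordering in the corpus.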
iterateUntilConvergence
public double iterateUntilConvergence(double minFractionalErrorChange,
int maxIterations,
int minIter)
iterateUntilConvergence
public double iterateUntilConvergence(double minFractionalErrorChange,
int maxIterations,
int minIter,
double testFraction)
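iterateUntilConvergence takes a minimum fractional error change, a maximum number of iterations, and a minimum number of iterations. The hypothetical loop below shows one common way such parameters interact: stop once at least minIter iterations have run and the relative change in error falls below minFractionalErrorChange, and never run more than maxIterations. What the error measures and what the real method returns are assumptions; this sketch simply returns the last error.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch, not Mahout's code: a convergence loop driven by the same
// three parameters as iterateUntilConvergence. runIteration performs one pass
// and returns an error measure (for example, perplexity); that return value is
// an assumption.
final class ConvergenceLoopSketch {

  private ConvergenceLoopSketch() {}

  static double iterate(IntToDoubleFunction runIteration, double minFractionalErrorChange,
      int maxIterations, int minIter) {
    double previous = Double.NaN;
    double current = Double.NaN;
    for (int iter = 0; iter < maxIterations; iter++) {
      current = runIteration.applyAsDouble(iter);
      int completed = iter + 1;
      if (iter > 0 && completed >= minIter
          && fractionalChange(previous, current) < minFractionalErrorChange) {
        break; // enough iterations done and the error has stopped moving
      }
      previous = current;
    }
    return current; // error after the last completed iteration
  }

  // |previous - current| / |previous|, treating a zero previous error specially.
  private static double fractionalChange(double previous, double current) {
    if (previous == 0.0) {
      return current == 0.0 ? 0.0 : Double.POSITIVE_INFINITY;
    }
    return Math.abs(previous - current) / Math.abs(previous);
  }

  public static void main(String[] args) {
    // Error halves every iteration, so the fractional change is always 0.5.
    double err = iterate(i -> 100.0 / Math.pow(2, i), 0.6, 20, 1);
    System.out.println(err);
  }
}
```

In main the error halves on every pass, so the fractional change is 0.5; with a threshold of 0.6 the loop stops after the second iteration and prints 50.0.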
writeModel
public void writeModel(org.apache.hadoop.fs.Path outputPath)
throws IOException
- Throws:
IOException
main2
public static int main2(String[] args,
org.apache.hadoop.conf.Configuration conf)
throws Exception
- Throws:
Exception
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
- Overrides:
getConf in class AbstractJob
run
public int run(String[] strings)
throws Exception
- Throws:
Exception
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.