org.apache.mahout.clustering.lda.cvb
Class InMemoryCollapsedVariationalBayes0

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class InMemoryCollapsedVariationalBayes0
extends AbstractJob

Runs the same algorithm as CVB0Driver, but sequentially and in memory. Current memory requirements: the entire corpus is read into RAM, along with two copies of the model (each of size numTerms * numTopics) and another matrix of size numDocs * numTopics, which stores p(topic|doc) for all docs. If all of this fits in memory, this class can be significantly faster than an iterative MapReduce job.
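
The memory requirements above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only and is not part of Mahout: the class and method names are hypothetical, the model and p(topic|doc) matrices are assumed to be dense arrays of 8-byte doubles, and the corpus is assumed to be stored sparsely at about 12 bytes per non-zero entry (one int index plus one double). Actual JVM overhead will be higher.

```java
// Hypothetical helper, not part of Mahout: rough heap estimate for the in-memory job.
class InMemoryCvb0MemoryEstimate {
    static final long BYTES_PER_DOUBLE = 8L;
    static final long BYTES_PER_INT = 4L;

    // Two copies of the model, each numTerms * numTopics (assumed dense doubles).
    static long modelBytes(long numTerms, long numTopics) {
        return 2L * numTerms * numTopics * BYTES_PER_DOUBLE;
    }

    // One numDocs * numTopics matrix holding p(topic|doc) (assumed dense doubles).
    static long docTopicBytes(long numDocs, long numTopics) {
        return numDocs * numTopics * BYTES_PER_DOUBLE;
    }

    // Corpus held in RAM; assumption: sparse storage, one int index and one double per non-zero.
    static long corpusBytes(long nonZeroEntries) {
        return nonZeroEntries * (BYTES_PER_INT + BYTES_PER_DOUBLE);
    }

    static long totalBytes(long numDocs, long numTerms, long numTopics, long nonZeroEntries) {
        return modelBytes(numTerms, numTopics)
                + docTopicBytes(numDocs, numTopics)
                + corpusBytes(nonZeroEntries);
    }

    public static void main(String[] args) {
        // Example sizes chosen purely for illustration.
        long numDocs = 100_000L;
        long numTerms = 50_000L;
        long numTopics = 100L;
        long nonZeroEntries = 10_000_000L;

        long total = totalBytes(numDocs, numTerms, numTopics, nonZeroEntries);
        System.out.printf("Estimated minimum heap: %.1f MiB%n", total / (1024.0 * 1024.0));
    }
}
```

If the estimate approaches the available heap, the distributed CVB0Driver is likely the safer choice.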


Field Summary
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Constructor Summary
InMemoryCollapsedVariationalBayes0(Matrix corpus, String[] terms, int numTopics, double alpha, double eta, int numTrainingThreads, int numUpdatingThreads, double modelCorpusFraction)
           
 
Method Summary
 org.apache.hadoop.conf.Configuration getConf()
           
 double iterateUntilConvergence(double minFractionalErrorChange, int maxIterations, int minIter)
           
 double iterateUntilConvergence(double minFractionalErrorChange, int maxIterations, int minIter, double testFraction)
           
static void main(String[] args)
           
static int main2(String[] args, org.apache.hadoop.conf.Configuration conf)
           
 int run(String[] strings)
           
 void setVerbose(boolean verbose)
           
 void trainDocuments()
           
 void trainDocuments(double testFraction)
           
 void writeModel(org.apache.hadoop.fs.Path outputPath)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InMemoryCollapsedVariationalBayes0

public InMemoryCollapsedVariationalBayes0(Matrix corpus,
                                          String[] terms,
                                          int numTopics,
                                          double alpha,
                                          double eta,
                                          int numTrainingThreads,
                                          int numUpdatingThreads,
                                          double modelCorpusFraction)
Method Detail

setVerbose

public void setVerbose(boolean verbose)

trainDocuments

public void trainDocuments()

trainDocuments

public void trainDocuments(double testFraction)

iterateUntilConvergence

public double iterateUntilConvergence(double minFractionalErrorChange,
                                      int maxIterations,
                                      int minIter)

iterateUntilConvergence

public double iterateUntilConvergence(double minFractionalErrorChange,
                                      int maxIterations,
                                      int minIter,
                                      double testFraction)
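
This page does not describe how the iterateUntilConvergence parameters interact. The sketch below shows one plausible reading of their names, not the actual Mahout implementation: run at least minIter training passes, then keep training until the fractional change in the error falls to minFractionalErrorChange or below, or until maxIterations passes have run. The trainAndMeasure callback is hypothetical and stands in for a training pass followed by an error measurement (for example, perplexity, which is an assumption here).

```java
import java.util.function.DoubleSupplier;

// Hypothetical sketch of a convergence loop; not the Mahout source.
class ConvergenceLoopSketch {

    // trainAndMeasure: runs one training pass and returns the current error.
    // Assumes minIter >= 1 and a non-zero error, so the division below is well defined.
    static double iterateUntilConvergence(DoubleSupplier trainAndMeasure,
                                          double minFractionalErrorChange,
                                          int maxIterations,
                                          int minIter) {
        int iter = 0;
        double oldError = 0.0;
        // Always run the minimum number of passes before testing for convergence.
        while (iter < minIter) {
            oldError = trainAndMeasure.getAsDouble();
            iter++;
        }
        double newError = oldError;
        double fractionalChange = Double.MAX_VALUE;
        // Continue until the relative change is small enough or the pass budget is spent.
        while (iter < maxIterations && fractionalChange > minFractionalErrorChange) {
            newError = trainAndMeasure.getAsDouble();
            iter++;
            fractionalChange = Math.abs(newError - oldError) / oldError;
            oldError = newError;
        }
        return newError;
    }

    public static void main(String[] args) {
        // Toy error sequence that halves its distance to 100 on every pass.
        double[] state = {200.0};
        DoubleSupplier pass = () -> {
            state[0] = 100.0 + (state[0] - 100.0) / 2.0;
            return state[0];
        };
        double finalError = iterateUntilConvergence(pass, 0.01, 50, 3);
        System.out.println("final error: " + finalError);
    }
}
```

Under this reading, minIter guards against stopping on an early plateau, while maxIterations bounds the total run time when the error keeps changing.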

writeModel

public void writeModel(org.apache.hadoop.fs.Path outputPath)
                throws IOException
Throws:
IOException

main2

public static int main2(String[] args,
                        org.apache.hadoop.conf.Configuration conf)
                 throws Exception
Throws:
Exception

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
Overrides:
getConf in class AbstractJob

run

public int run(String[] strings)
        throws Exception
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.