org.apache.mahout.clustering.lda.cvb
Class ModelTrainer
java.lang.Object
org.apache.mahout.clustering.lda.cvb.ModelTrainer
public class ModelTrainer
extends Object
Multithreaded LDA model trainer class, which primarily operates by running a "map/reduce"
operation entirely in local memory (i.e. not as a Hadoop job): the "map" operation takes
the "read-only" TopicModel and uses it to iteratively learn the p(topic|term, doc)
distribution for each document (this can be done in parallel across many documents, as the
"read-only" model is, well, read-only). The outputs of this step are then "reduced" onto
the "write" model. These updates are not parallelizable in the same way: different threads
can't add counts to the same model entries at the same time, but updates across many topics
to the same term from the same document can be done in parallel, so they are.
Because computation is done asynchronously, when iteration is done, it's important to call
the stop() method, which blocks until work is complete.
Setting the read model and the write model to be the same object may not quite work yet:
concurrent inference reads and learning updates on a single model are not synchronized
against each other (see the constructor warning below).
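A minimal end-to-end sketch of this lifecycle (start, per-document train, stop) follows. It assumes the TopicModel(numTopics, numTerms, eta, alpha, random, dictionary, numThreads, modelWeight) constructor from this package; the hyperparameters, thread count, and corpus sizes are illustrative assumptions only:

import java.util.Random;

import org.apache.mahout.clustering.lda.cvb.ModelTrainer;
import org.apache.mahout.clustering.lda.cvb.TopicModel;
import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.SparseRowMatrix;

public class LdaTrainingSketch {
  public static void main(String[] args) {
    int numTopics = 20;
    int numTerms = 10000;     // vocabulary size (illustrative)
    int numDocs = 1000;
    int numTrainThreads = 4;
    double eta = 0.1;         // topic-term smoothing prior (illustrative)
    double alpha = 0.1;       // doc-topic smoothing prior (illustrative)

    // "Read" model starts from random counts; "write" model starts empty
    // (a null Random leaves its counts at zero).
    TopicModel readModel = new TopicModel(numTopics, numTerms, eta, alpha,
        new Random(1234L), null /* dictionary */, numTrainThreads, 1.0);
    TopicModel writeModel = new TopicModel(numTopics, numTerms, eta, alpha,
        null /* Random */, null /* dictionary */, numTrainThreads, 1.0);

    // Matrix implements VectorIterable: one row per document.
    Matrix corpus = new SparseRowMatrix(numDocs, numTerms);    // term counts
    Matrix docTopics = new DenseMatrix(numDocs, numTopics);    // p(topic|doc)
    docTopics.assign(1.0 / numTopics);  // seed each doc uniformly (assumption)
    // ... fill corpus rows with real term-count vectors here ...

    ModelTrainer trainer = new ModelTrainer(readModel, writeModel,
        numTrainThreads, numTopics, numTerms);
    trainer.start();                     // spin up the training threads
    for (int d = 0; d < numDocs; d++) {
      // Asynchronous "map": infer p(topic|term,doc) for this document and
      // queue its counts to be "reduced" onto the write model.
      trainer.train(corpus.viewRow(d), docTopics.viewRow(d),
          true /* update write model */, 10 /* doc-topic iterations */);
    }
    trainer.stop();                      // block until all updates land
  }
}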
Constructor Summary

ModelTrainer(TopicModel model, int numTrainThreads, int numTopics, int numTerms)
          WARNING: this constructor may not lead to good behavior.

ModelTrainer(TopicModel initialReadModel, TopicModel initialWriteModel, int numTrainThreads, int numTopics, int numTerms)
Method Summary

void        batchTrain(Map<Vector,Vector> batch, boolean update, int numDocTopicsIters)
double      calculatePerplexity(VectorIterable matrix, VectorIterable docTopicCounts)
double      calculatePerplexity(VectorIterable matrix, VectorIterable docTopicCounts, double testFraction)
double      calculatePerplexity(Vector document, Vector docTopicCounts, int numDocTopicIters)
TopicModel  getReadModel()
void        persist(org.apache.hadoop.fs.Path outputPath)
void        start()
void        stop()
void        train(VectorIterable matrix, VectorIterable docTopicCounts)
void        train(VectorIterable matrix, VectorIterable docTopicCounts, int numDocTopicIters)
void        train(Vector document, Vector docTopicCounts, boolean update, int numDocTopicIters)
void        trainSync(Vector document, Vector docTopicCounts, boolean update, int numDocTopicIters)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

ModelTrainer

public ModelTrainer(TopicModel initialReadModel,
                    TopicModel initialWriteModel,
                    int numTrainThreads,
                    int numTopics,
                    int numTerms)
ModelTrainer

public ModelTrainer(TopicModel model,
                    int numTrainThreads,
                    int numTopics,
                    int numTerms)

WARNING: this constructor may not lead to good behavior. What should be verified is that
the model updating process does not conflict with model reading. It might work, but then
again, it might not!

Parameters:
    model - the TopicModel to be used for both reading (inference) and accumulating (learning)
    numTrainThreads - number of threads to train with
    numTopics - number of topics in the model
    numTerms - number of terms in the vocabulary
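To make the warning concrete, both construction styles are sketched below; the models are assumed to be built as in the class-level example:

import org.apache.mahout.clustering.lda.cvb.ModelTrainer;
import org.apache.mahout.clustering.lda.cvb.TopicModel;

final class TrainerSetup {

  // Documented-safe form: inference reads one model while updates
  // accumulate separately in the other.
  static ModelTrainer separateModels(TopicModel read, TopicModel write,
                                     int threads, int topics, int terms) {
    return new ModelTrainer(read, write, threads, topics, terms);
  }

  // Warned-against form: the same model is read and updated concurrently,
  // which is exactly the unverified interaction described above.
  static ModelTrainer sharedModel(TopicModel shared,
                                  int threads, int topics, int terms) {
    return new ModelTrainer(shared, threads, topics, terms);
  }
}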
Method Detail

getReadModel

public TopicModel getReadModel()
start
public void start()
train
public void train(VectorIterable matrix,
VectorIterable docTopicCounts)
calculatePerplexity
public double calculatePerplexity(VectorIterable matrix,
VectorIterable docTopicCounts)
calculatePerplexity
public double calculatePerplexity(VectorIterable matrix,
VectorIterable docTopicCounts,
double testFraction)
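A sketch of using the sampled variant for cheap convergence checks; that testFraction is the share of documents scored (here 10%) is inferred from its name, not stated on this page:

import org.apache.mahout.clustering.lda.cvb.ModelTrainer;
import org.apache.mahout.math.VectorIterable;

final class PerplexityCheck {

  // Score the model on roughly a tenth of the corpus rather than all of it.
  static double sampledPerplexity(ModelTrainer trainer,
                                  VectorIterable corpus,
                                  VectorIterable docTopics) {
    return trainer.calculatePerplexity(corpus, docTopics, 0.1);
  }
}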
train
public void train(VectorIterable matrix,
VectorIterable docTopicCounts,
int numDocTopicIters)
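Multiple sweeps over the corpus are the caller's responsibility. A sketch of an outer loop that retrains and tracks perplexity each pass (pass and iteration counts are arbitrary; if this overload does not manage its own worker threads, bracket it with start() and stop() as the class comment advises):

import org.apache.mahout.clustering.lda.cvb.ModelTrainer;
import org.apache.mahout.math.VectorIterable;

final class TrainingLoop {

  static void run(ModelTrainer trainer, VectorIterable corpus,
                  VectorIterable docTopics, int numPasses) {
    for (int pass = 0; pass < numPasses; pass++) {
      // One sweep over every document, 10 doc-topic iterations each.
      trainer.train(corpus, docTopics, 10);
      System.out.printf("pass %d perplexity: %f%n",
          pass, trainer.calculatePerplexity(corpus, docTopics));
    }
  }
}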
batchTrain
public void batchTrain(Map<Vector,Vector> batch,
boolean update,
int numDocTopicsIters)
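A sketch of assembling the batch map, pairing each document's term-count vector with its (mutable) doc-topic vector; the Matrix inputs are as in the class-level example:

import java.util.HashMap;
import java.util.Map;

import org.apache.mahout.clustering.lda.cvb.ModelTrainer;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.Vector;

final class BatchTraining {

  static void trainBatch(ModelTrainer trainer, Matrix corpus,
                         Matrix docTopics, int numDocs) {
    Map<Vector, Vector> batch = new HashMap<Vector, Vector>();
    for (int d = 0; d < numDocs; d++) {
      batch.put(corpus.viewRow(d), docTopics.viewRow(d));
    }
    trainer.batchTrain(batch, true /* update write model */, 10);
  }
}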
train
public void train(Vector document,
Vector docTopicCounts,
boolean update,
int numDocTopicIters)
trainSync
public void trainSync(Vector document,
Vector docTopicCounts,
boolean update,
int numDocTopicIters)
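The asynchronous and synchronous single-document calls side by side, as a sketch; that trainSync runs inference on the calling thread is inferred from its name:

import org.apache.mahout.clustering.lda.cvb.ModelTrainer;
import org.apache.mahout.math.Vector;

final class SyncVsAsync {

  static void compare(ModelTrainer trainer, Vector doc, Vector docTopics) {
    // Asynchronous: queues the work for the trainer's thread pool;
    // requires start() beforehand and stop() to await completion.
    trainer.train(doc, docTopics, true, 10);

    // Synchronous: does the same inference and update before returning.
    trainer.trainSync(doc, docTopics, true, 10);
  }
}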
calculatePerplexity
public double calculatePerplexity(Vector document,
Vector docTopicCounts,
int numDocTopicIters)
stop
public void stop()
persist
public void persist(org.apache.hadoop.fs.Path outputPath)
throws IOException
Throws:
    IOException
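A sketch of saving the trained model; the output path is illustrative, and stopping first follows the class comment's advice to let outstanding updates finish:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.lda.cvb.ModelTrainer;

final class SaveModel {

  static void save(ModelTrainer trainer) throws IOException {
    trainer.stop();                               // wait for pending updates
    trainer.persist(new Path("/tmp/lda-model"));  // hypothetical output path
  }
}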