Class Summary |
CachingCVB0Mapper |
Runs ensemble learning by loading the ModelTrainer with two TopicModel instances:
one from the previous iteration, the other empty. |
CachingCVB0PerplexityMapper |
|
CVB0DocInferenceMapper |
|
CVB0Driver |
See CachingCVB0Mapper for more details on scalability and room for improvement. |
CVB0Driver.DualDoubleSumReducer |
Sums keys and values independently. |
CVB0TopicTermVectorNormalizerMapper |
Performs L1 normalization of input vectors. |
InMemoryCollapsedVariationalBayes0 |
Runs the same algorithm as CVB0Driver, but sequentially, in memory. |
ModelTrainer |
Multithreaded LDA model trainer class, which primarily operates by running a "map/reduce"
operation entirely in local memory (i.e., not a Hadoop job): the "map" operation takes
the "read-only" TopicModel and uses it to iteratively learn the p(topic|term, doc)
distribution for documents (this can be done in parallel across many documents, because the
"read-only" model is never modified). |
TopicModel |
Thin wrapper around a Matrix of counts of occurrences of (topic, term) pairs. |
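
Two of the descriptions above, a matrix of (topic, term) counts and L1 normalization of vectors, can be illustrated together. The following is a minimal sketch using plain Java arrays. It does not use Mahout's Matrix or Vector types, and the class name TopicTermCountsSketch and its methods are hypothetical, introduced only for illustration. It assumes that L1-normalizing one topic's row of counts is how a term distribution for that topic would be obtained.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of the Mahout API. */
public class TopicTermCountsSketch {
    // counts[topic][term] holds the (possibly fractional) count for that pair.
    private final double[][] counts;

    public TopicTermCountsSketch(int numTopics, int numTerms) {
        counts = new double[numTopics][numTerms];
    }

    /** Adds an amount to the count of one (topic, term) pair. */
    public void add(int topic, int term, double amount) {
        counts[topic][term] += amount;
    }

    /** L1 normalization: divides each entry by the sum of absolute values. */
    public static double[] l1Normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) {
            norm += Math.abs(x);
        }
        double[] out = new double[v.length];
        if (norm == 0.0) {
            return out; // an all-zero vector stays all zeros
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** Returns the normalized row for one topic (assumed to act as p(term | topic)). */
    public double[] termDistribution(int topic) {
        return l1Normalize(counts[topic]);
    }

    public static void main(String[] args) {
        TopicTermCountsSketch model = new TopicTermCountsSketch(2, 3);
        model.add(0, 0, 1.0);
        model.add(0, 1, 1.0);
        model.add(0, 2, 2.0);
        // Row 0 sums to 4, so this prints [0.25, 0.25, 0.5]
        System.out.println(Arrays.toString(model.termDistribution(0)));
    }
}
```

Returning a zero vector for an all-zero row is a design choice of this sketch to avoid dividing by zero; a real implementation might handle that case differently.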