org.apache.mahout.utils.vectors.lucene
Class ClusterLabels
java.lang.Object
org.apache.mahout.utils.vectors.lucene.ClusterLabels
public class ClusterLabels
- extends Object
Get labels for the cluster using the Log Likelihood Ratio (LLR).
"The most useful way to think of this (LLR) is as the percentage of in-cluster documents that have the
feature (term) versus the percentage out, keeping in mind that both percentages are uncertain since we have
only a sample of all possible documents." - Ted Dunning
More about LLR can be found at: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
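To illustrate the in-cluster versus out-of-cluster comparison in the quote above, the following standalone Java sketch computes Dunning's log-likelihood ratio (the G² statistic) from a 2×2 table of document counts. It is written from the general formula and is not taken from Mahout's source. The class name `LlrSketch` and the counts in `main` are hypothetical.

```java
import java.util.Locale;

/**
 * Standalone sketch of Dunning's log-likelihood ratio (G^2) for a 2x2 table
 * of document counts. Illustration only, not Mahout's implementation.
 */
public class LlrSketch {

    // x * ln(x), with the convention 0 * ln(0) = 0
    public static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // "Unnormalized" entropy of a set of counts: N ln N - sum(k ln k)
    public static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long k : counts) {
            result += xLogX(k);
            sum += k;
        }
        return xLogX(sum) - result;
    }

    /**
     * k11: in-cluster docs containing the term
     * k12: in-cluster docs without the term
     * k21: out-of-cluster docs containing the term
     * k22: out-of-cluster docs without the term
     */
    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // G^2 is never negative; clamp tiny negative values caused by rounding
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0;
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 40 of 50 in-cluster docs contain the term,
        // versus 100 of 950 out-of-cluster docs
        long k11 = 40, k12 = 10, k21 = 100, k22 = 850;
        double inPct = 100.0 * k11 / (k11 + k12);
        double outPct = 100.0 * k21 / (k21 + k22);
        System.out.printf(Locale.ROOT, "in: %.1f%%, out: %.1f%%, LLR: %.2f%n",
                inPct, outPct, logLikelihoodRatio(k11, k12, k21, k22));
    }
}
```

A term that occurs in the same fraction of documents inside and outside the cluster scores about 0. The score rises as the two percentages diverge and, for fixed percentages, as the counts grow, which is how the statistic accounts for the sampling uncertainty mentioned in the quote.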
Constructor Summary
ClusterLabels(org.apache.hadoop.fs.Path seqFileDir,
org.apache.hadoop.fs.Path pointsDir,
String indexDir,
String contentField,
int minNumIds,
int maxLabels)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_MIN_IDS
public static final int DEFAULT_MIN_IDS
- See Also:
- Constant Field Values
DEFAULT_MAX_LABELS
public static final int DEFAULT_MAX_LABELS
- See Also:
- Constant Field Values
ClusterLabels
public ClusterLabels(org.apache.hadoop.fs.Path seqFileDir,
org.apache.hadoop.fs.Path pointsDir,
String indexDir,
String contentField,
int minNumIds,
int maxLabels)
getLabels
public void getLabels()
throws IOException
- Throws:
IOException
getClusterLabels
protected List<org.apache.mahout.utils.vectors.lucene.TermInfoClusterInOut> getClusterLabels(Integer integer,
Collection<WeightedPropertyVectorWritable> wpvws)
throws IOException
- Get the list of labels, sorted by best score.
- Throws:
IOException
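The description above only says that labels are sorted by best score. Under stated assumptions, that ranking step might look like the following sketch. Each candidate term's document counts for one cluster are turned into an LLR score, the list is sorted in descending order, and at most `maxLabels` terms are kept. Treating `minNumIds` as a minimum cluster size is an assumption, since this page does not document that parameter. The `LabelRankingSketch` class, the `ScoredTerm` holder, and the `rankLabels` helper are hypothetical and are not Mahout's `getClusterLabels`. Because LLR is also large for terms that are unusually rare in a cluster, the sketch keeps only terms whose in-cluster document fraction exceeds their out-of-cluster fraction. Whether `ClusterLabels` applies a similar filter is not stated here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of ranking one cluster's candidate labels by LLR. */
public class LabelRankingSketch {

    // Hypothetical holder for a candidate label and its LLR score
    public static final class ScoredTerm {
        public final String term;
        public final double llr;

        ScoredTerm(String term, double llr) {
            this.term = term;
            this.llr = llr;
        }
    }

    // x * ln(x), with 0 * ln(0) taken as 0
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: N ln N - sum(k ln k)
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long k : counts) {
            result += xLogX(k);
            sum += k;
        }
        return xLogX(sum) - result;
    }

    // Dunning's G^2 for the 2x2 table [[k11, k12], [k21, k22]]
    static double llr(long k11, long k12, long k21, long k22) {
        double row = entropy(k11 + k12, k21 + k22);
        double col = entropy(k11 + k21, k12 + k22);
        double mat = entropy(k11, k12, k21, k22);
        return row + col < mat ? 0.0 : 2.0 * (row + col - mat);
    }

    /**
     * termDocs maps each term to {in-cluster docs containing it, out-of-cluster docs containing it}.
     * Returns at most maxLabels terms, highest LLR first.
     */
    public static List<ScoredTerm> rankLabels(Map<String, long[]> termDocs, long clusterSize,
                                              long totalDocs, int minNumIds, int maxLabels) {
        List<ScoredTerm> scored = new ArrayList<>();
        // Assumption: clusters with fewer than minNumIds documents get no labels
        if (clusterSize < minNumIds) {
            return scored;
        }
        long outsideSize = totalDocs - clusterSize;
        for (Map.Entry<String, long[]> e : termDocs.entrySet()) {
            long inWith = e.getValue()[0];
            long outWith = e.getValue()[1];
            // Skip terms that are not relatively more common inside the cluster
            // (cross-multiplied to avoid dividing by zero)
            if (inWith * outsideSize <= outWith * clusterSize) {
                continue;
            }
            double score = llr(inWith, clusterSize - inWith, outWith, outsideSize - outWith);
            scored.add(new ScoredTerm(e.getKey(), score));
        }
        // Best score first, then keep the top maxLabels entries
        scored.sort(Comparator.comparingDouble((ScoredTerm s) -> s.llr).reversed());
        return new ArrayList<>(scored.subList(0, Math.min(maxLabels, scored.size())));
    }
}
```

Sorting the full list before truncating keeps the sketch short. For large vocabularies, a bounded priority queue of size `maxLabels` would avoid sorting every candidate.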
getIdField
public String getIdField()
setIdField
public void setIdField(String idField)
getOutput
public String getOutput()
setOutput
public void setOutput(String output)
main
public static void main(String[] args)
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.