org.apache.mahout.vectorizer.encoders
Class LuceneTextValueEncoder
java.lang.Object
org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
org.apache.mahout.vectorizer.encoders.TextValueEncoder
org.apache.mahout.vectorizer.encoders.LuceneTextValueEncoder
public class LuceneTextValueEncoder
- extends TextValueEncoder
Encodes text using a Lucene-style tokenizer.
- See Also:
TextValueEncoder
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, addToVector, bytesForString, getName, getProbes, getWeight, hash, hash, hash, hash, hash, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LuceneTextValueEncoder
public LuceneTextValueEncoder(String name)
setAnalyzer
public void setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
tokenize
protected Iterable<String> tokenize(CharSequence originalForm)
- Tokenizes a string using the simplest method. This should be overridden for more subtle
tokenization.
- Overrides:
tokenize
in class TextValueEncoder
- See Also:
LuceneTextValueEncoder
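The override relationship described above (a base encoder with a simple tokenize method, and a subclass that delegates tokenization to a configurable analyzer) can be sketched with the standard library alone. This is a minimal illustration, not Mahout's implementation: SimpleTextEncoder, PluggableTextEncoder, lowercaseWordAnalyzer, and the tokens helper are hypothetical names, a plain Function stands in for org.apache.lucene.analysis.Analyzer, and the fallback to the base behavior when no analyzer is set is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class TokenizeOverrideSketch {

  /** Hypothetical stand-in for a text encoder base class; not the Mahout class. */
  static class SimpleTextEncoder {
    /** Simplest tokenization: split on runs of whitespace. */
    protected Iterable<String> tokenize(CharSequence originalForm) {
      String trimmed = originalForm.toString().trim();
      if (trimmed.isEmpty()) {
        return new ArrayList<>();
      }
      return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Collects the tokens into a list so the effect of an override is visible. */
    public List<String> tokens(CharSequence text) {
      List<String> out = new ArrayList<>();
      for (String token : tokenize(text)) {
        out.add(token);
      }
      return out;
    }
  }

  /** Hypothetical subclass that delegates tokenization to a pluggable analyzer. */
  static class PluggableTextEncoder extends SimpleTextEncoder {
    // Stand-in for a Lucene Analyzer; here it is just a function from text to tokens.
    private Function<CharSequence, Iterable<String>> analyzer;

    public void setAnalyzer(Function<CharSequence, Iterable<String>> analyzer) {
      this.analyzer = analyzer;
    }

    @Override
    protected Iterable<String> tokenize(CharSequence originalForm) {
      // Assumption for this sketch: use the base behavior until an analyzer is set.
      return analyzer == null ? super.tokenize(originalForm) : analyzer.apply(originalForm);
    }
  }

  /** Toy analyzer: lowercases, then splits on anything that is not a letter or digit. */
  static Iterable<String> lowercaseWordAnalyzer(CharSequence text) {
    List<String> out = new ArrayList<>();
    String lowered = text.toString().toLowerCase(Locale.ROOT);
    for (String token : lowered.split("[^\\p{L}\\p{N}]+")) {
      if (!token.isEmpty()) {
        out.add(token);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    PluggableTextEncoder encoder = new PluggableTextEncoder();
    // Whitespace splitting keeps punctuation attached to the tokens.
    System.out.println(encoder.tokens("Hello, Mahout world!"));
    // After an analyzer is installed, the overridden tokenize uses it instead.
    encoder.setAnalyzer(TokenizeOverrideSketch::lowercaseWordAnalyzer);
    System.out.println(encoder.tokens("Hello, Mahout world!"));
  }
}
```

Keeping tokenize protected and routing all callers through it means the tokenization strategy can change without touching the code that consumes the tokens.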
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.