Package org.apache.mahout.vectorizer

Interface Summary
Vectorizer  
Weight  
 

Class Summary
DictionaryVectorizer This class converts a set of input documents in the sequence file format to vectors.
DocumentProcessor This class converts a set of input documents in the sequence file format to StringTuples. The SequenceFile input should have a Text key containing the unique document identifier and a Text value containing the whole document.
EncodedVectorsFromSequenceFiles Converts a given set of sequence files into SparseVectors.
EncodingMapper The Mapper that does the work of encoding text.
HighDFWordsPruner  
SimpleTextEncodingVectorizer Runs a Map/Reduce job that encodes the input using a FeatureVectorEncoder and writes it to the output as a sequence file.
SparseVectorsFromSequenceFiles Converts a given set of sequence files into SparseVectors.
TF Weight based on term frequency only.
TFIDF  
VectorizerConfig The config for a Vectorizer.
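
As an illustration of what "weight based on term frequency only" means for the TF class above, the following is a minimal, self-contained sketch. It does not use Mahout itself: the TermWeight interface, the TermFrequencyOnly class, and the countTerms helper are hypothetical stand-ins introduced here, and the four-argument calculate signature is an assumption rather than a statement of the library's API. The sketch counts whitespace-separated tokens in a document and uses the raw count as the weight, ignoring document frequency, document length, and corpus size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermFrequencySketch {

    /** Hypothetical weighting interface for this sketch; not Mahout's actual API. */
    interface TermWeight {
        double calculate(int tf, int df, int length, int numDocs);
    }

    /** Weight based on term frequency only: every other argument is ignored. */
    static final class TermFrequencyOnly implements TermWeight {
        @Override
        public double calculate(int tf, int df, int length, int numDocs) {
            return tf;
        }
    }

    /** Hypothetical helper: lower-cases the text and counts whitespace-separated tokens. */
    static Map<String, Integer> countTerms(String document) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : document.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        TermWeight weight = new TermFrequencyOnly();
        Map<String, Integer> counts = countTerms("to be or not to be");
        int length = counts.values().stream().mapToInt(Integer::intValue).sum();

        // df and numDocs are placeholder values; a TF-only weight does not read them.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double w = weight.calculate(e.getValue(), 1, length, 1);
            System.out.println(e.getKey() + " -> " + w);
        }
    }
}
```

A weight that also accounts for document frequency, such as the one the TFIDF class name suggests, would use the df and numDocs arguments that this sketch leaves unused.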
 



Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.