org.apache.mahout.vectorizer.encoders
Class WordValueEncoder
java.lang.Object
org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
org.apache.mahout.vectorizer.encoders.WordValueEncoder
Direct Known Subclasses: AdaptiveWordValueEncoder, StaticWordValueEncoder
public abstract class WordValueEncoder
extends FeatureVectorEncoder
Encodes words as sparse updates to a Vector. Weighting is defined by a
subclass, which implements the abstract weight method.
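The division of labour described above, where the base class places a word in the vector and a subclass decides how much it counts, can be sketched as follows. This is a minimal illustration, not Mahout's implementation: the class names `SubclassWeightingSketch`, `WordEncoder` and `DictionaryWeightEncoder` are hypothetical, a plain `double[]` stands in for `Vector`, the single-slot `String.hashCode` hashing is a placeholder, and multiplying `w` by the subclass weight is an assumption about how the two combine.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only: these classes are hypothetical and use a plain
// double[] instead of Mahout's Vector.
final class SubclassWeightingSketch {

    // Base encoder: knows where to put a word, not how much it counts.
    abstract static class WordEncoder {
        private final String name;

        WordEncoder(String name) {
            this.name = name;
        }

        // The weighting policy is left to subclasses.
        protected abstract double weight(byte[] originalForm);

        // Assumption: the caller's w scales the subclass-defined weight.
        void addToVector(byte[] originalForm, double w, double[] data) {
            String key = name + ":" + new String(originalForm, StandardCharsets.UTF_8);
            int index = Math.floorMod(key.hashCode(), data.length);
            data[index] += w * weight(originalForm);
        }
    }

    // Fixed dictionary of weights with a fallback for unseen words.
    static final class DictionaryWeightEncoder extends WordEncoder {
        private final Map<String, Double> weights;
        private final double missingWeight;

        DictionaryWeightEncoder(String name, Map<String, Double> weights, double missingWeight) {
            super(name);
            this.weights = new HashMap<>(weights);
            this.missingWeight = missingWeight;
        }

        @Override
        protected double weight(byte[] originalForm) {
            String word = new String(originalForm, StandardCharsets.UTF_8);
            return weights.getOrDefault(word, missingWeight);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("rare", 3.0);
        WordEncoder encoder = new DictionaryWeightEncoder("body", weights, 1.0);

        double[] data = new double[16];
        encoder.addToVector("rare".getBytes(StandardCharsets.UTF_8), 2.0, data);
        encoder.addToVector("common".getBytes(StandardCharsets.UTF_8), 2.0, data);

        double total = 0.0;
        for (double v : data) {
            total += v;
        }
        // Total mass added is 2.0 * 3.0 + 2.0 * 1.0, wherever the slots land.
        System.out.println(total);
    }
}
```

A different weighting policy only requires another subclass that overrides `weight`; the update logic in the base class stays untouched.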
Method Summary
void addToVector(byte[] originalForm, double w, Vector data)
    Adds a value to a vector.
String asString(String originalForm)
    Converts a value into a form that would help a human understand the internals of how the value
    is being interpreted.
protected double getWeight(byte[] originalForm, double w)
protected int hashForProbe(byte[] originalForm, int dataSize, String name, int probe)
    Provides the unique hash for a particular probe.
protected abstract double weight(byte[] originalForm)
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder:
addToVector, addToVector, addToVector, bytesForString, getName, getProbes, hash, hash, hash, hash, hash, hashesForProbe, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WordValueEncoder
protected WordValueEncoder(String name)
addToVector
public void addToVector(byte[] originalForm,
double w,
Vector data)
- Adds a value to a vector.
- Specified by: addToVector in class FeatureVectorEncoder
- Parameters:
originalForm - The original form of the value, as a byte array.
data - The vector to which the value should be added.
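A hashed update of this kind can be sketched as a loop over probes, each of which selects one slot and adds the weight there. The sketch below is an assumption about the general technique, not the library's code: `MultiProbeUpdateSketch` and `indexForProbe` are hypothetical names, a `double[]` replaces `Vector`, and the hash is a simple placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a multi-probe hashed update; not Mahout's implementation.
final class MultiProbeUpdateSketch {

    // Placeholder hash: mixes the variable name, the value bytes and the probe number.
    static int indexForProbe(byte[] originalForm, int dataSize, String name, int probe) {
        int h = 31 * name.hashCode() + Arrays.hashCode(originalForm);
        h = 31 * h + probe;
        return Math.floorMod(h, dataSize);
    }

    // Adds the same weight at one slot per probe. Colliding probes simply accumulate.
    static void addToVector(byte[] originalForm, double weight, double[] data,
                            String name, int probes) {
        for (int probe = 0; probe < probes; probe++) {
            int index = indexForProbe(originalForm, data.length, name, probe);
            data[index] += weight;
        }
    }

    public static void main(String[] args) {
        double[] data = new double[32];
        byte[] word = "mahout".getBytes(StandardCharsets.UTF_8);
        addToVector(word, 1.5, data, "body", 2);
        // Two probes of weight 1.5 each add a total of 3.0 to the array.
        System.out.println(Arrays.stream(data).sum());
    }
}
```

Using more than one probe spreads each word over several slots, so a collision in one slot is less likely to hide the word entirely.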
getWeight
protected double getWeight(byte[] originalForm,
double w)
- Overrides: getWeight in class FeatureVectorEncoder
hashForProbe
protected int hashForProbe(byte[] originalForm,
int dataSize,
String name,
int probe)
- Description copied from class: FeatureVectorEncoder
- Provides the unique hash for a particular probe. For all encoders except text, this
is all that is needed, and the default implementation of hashesForProbe will do the right
thing. For text and similar values, hashesForProbe should be overridden and this method
should not be used.
- Specified by: hashForProbe in class FeatureVectorEncoder
- Parameters:
originalForm - The original byte array value
dataSize - The length of the vector being encoded
name - The name of the variable being encoded
probe - The probe number
- Returns: The hash of the current probe
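To make the contract concrete, here is a minimal sketch of a function with the same parameter list that maps a value to an index in the range 0 to dataSize - 1, with the probe number acting as a seed. FNV-1a is swapped in purely for illustration; nothing here is claimed to match the hash the library uses, and `ProbeHashSketch` is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical probe hash built on 32-bit FNV-1a; an illustration only.
final class ProbeHashSketch {
    private static final int FNV_OFFSET = 0x811C9DC5;
    private static final int FNV_PRIME = 0x01000193;

    // Folds the bytes into the running hash, FNV-1a style (xor, then multiply).
    private static int mix(int h, byte[] bytes) {
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    // Same inputs always give the same index, and the index is within [0, dataSize).
    static int hashForProbe(byte[] originalForm, int dataSize, String name, int probe) {
        int h = FNV_OFFSET + probe; // assumption: the probe number seeds the hash
        h = mix(h, name.getBytes(StandardCharsets.UTF_8));
        h = mix(h, originalForm);
        return Math.floorMod(h, dataSize);
    }

    public static void main(String[] args) {
        byte[] word = "mahout".getBytes(StandardCharsets.UTF_8);
        for (int probe = 0; probe < 3; probe++) {
            System.out.println("probe " + probe + " -> " + hashForProbe(word, 100, "body", probe));
        }
    }
}
```

`Math.floorMod` is used instead of `%` so that a negative hash value still yields a valid, non-negative index.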
asString
public String asString(String originalForm)
- Converts a value into a form that would help a human understand the internals of how the value
is being interpreted. For text-like things, this is likely to be a list of the terms found with
associated weights (if any).
- Specified by: asString in class FeatureVectorEncoder
- Parameters:
originalForm - The original form of the value as a string.
- Returns: A string that a human can read.
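For a single word, one plausible human-readable form is the variable name, the word and its weight joined together. The sketch below assumes that layout and a dictionary lookup for the weight; the `name:word:weight` format, the `AsStringSketch` class and the `describe` method are all hypothetical and are not taken from the library.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical rendering of an encoded word for debugging; the format is an assumption.
final class AsStringSketch {

    // Renders "name:word:weight", falling back to a weight of 1.0 for unknown words.
    static String describe(String name, String originalForm, Map<String, Double> weights) {
        double weight = weights.getOrDefault(originalForm, 1.0);
        // Locale.ENGLISH keeps the decimal separator stable across machines.
        return String.format(Locale.ENGLISH, "%s:%s:%.4f", name, originalForm, weight);
    }

    public static void main(String[] args) {
        System.out.println(describe("body", "mahout", Map.of("mahout", 2.5)));
        // prints "body:mahout:2.5000"
    }
}
```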
weight
protected abstract double weight(byte[] originalForm)
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.