org.apache.mahout.classifier
Class AbstractVectorClassifier

java.lang.Object
  extended by org.apache.mahout.classifier.AbstractVectorClassifier
Direct Known Subclasses:
AbstractNaiveBayesClassifier, AbstractOnlineLogisticRegression, ClusterClassifier, CrossFoldLearner, GradientMachine, PassiveAggressive

public abstract class AbstractVectorClassifier
extends Object

Defines the interface for classifiers that take a vector as input. This is implemented as an abstract class so that it can implement a number of handy convenience methods related to classification of vectors.

A classifier takes an input vector and calculates the scores (usually probabilities) that the input vector belongs to each of n categories. In AbstractVectorClassifier each category is denoted by an integer c between 0 and n-1 (inclusive).

New users should start by looking at classifyFull(org.apache.mahout.math.Vector) (not classify(org.apache.mahout.math.Vector)).


Field Summary
static double MIN_LOG_LIKELIHOOD
          Minimum allowable log likelihood value.
 
Constructor Summary
AbstractVectorClassifier()
           
 
Method Summary
 Matrix classify(Matrix data)
          Returns n-1 probabilities, one for each of categories 1 through n-1, for each row of a matrix, where n is equal to numCategories().
abstract  Vector classify(Vector instance)
          Compute and return a vector containing n-1 scores, where n is equal to numCategories(), given an input vector instance.
 Matrix classifyFull(Matrix data)
          Returns a matrix where the rows of the matrix each contain n probabilities, one for each category.
 Vector classifyFull(Vector instance)
          Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance.
 Vector classifyFull(Vector r, Vector instance)
          Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance.
 Vector classifyNoLink(Vector features)
          Compute and return a vector of scores before applying the inverse link function.
 Vector classifyScalar(Matrix data)
          Returns a vector of probabilities of category 1, one for each row of a matrix.
abstract  double classifyScalar(Vector instance)
          Classifies a vector in the special case of a binary classifier where classify(Vector) would return a vector with only one element.
 double logLikelihood(int actual, Vector data)
          Returns a measure of how good the classification for a particular example actually is.
abstract  int numCategories()
          Returns the number of categories that a target variable can be assigned to.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIN_LOG_LIKELIHOOD

public static final double MIN_LOG_LIKELIHOOD
Minimum allowable log likelihood value.

See Also:
Constant Field Values
Constructor Detail

AbstractVectorClassifier

public AbstractVectorClassifier()
Method Detail

numCategories

public abstract int numCategories()
Returns the number of categories that a target variable can be assigned to. A vector classifier will encode its output as an integer from 0 to numCategories()-1 (inclusive).

Returns:
The number of categories.

classify

public abstract Vector classify(Vector instance)
Compute and return a vector containing n-1 scores, where n is equal to numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to that category. The categories are denoted by the integers 0 through n-1 (inclusive), and the scores in the returned vector correspond to categories 1 through n-1 (leaving out category 0). It is assumed that the score for category 0 is one minus the sum of the scores in the returned vector.

Parameters:
instance - A feature vector to be classified.
Returns:
A vector of probabilities in 1 of n-1 encoding.
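
The n-1 encoding described above can be expanded back into a full set of n scores. The following is a minimal sketch, not Mahout code: plain double[] arrays stand in for Vector, and the class and method names (ReducedEncodingSketch, expand) are hypothetical.

```java
// Sketch only: double[] stands in for org.apache.mahout.math.Vector.
final class ReducedEncodingSketch {

    // Expands n-1 scores (categories 1..n-1) into n scores (categories 0..n-1).
    static double[] expand(double[] reduced) {
        double[] full = new double[reduced.length + 1];
        double sum = 0.0;
        for (int i = 0; i < reduced.length; i++) {
            full[i + 1] = reduced[i]; // category i+1 keeps its score
            sum += reduced[i];
        }
        full[0] = 1.0 - sum; // category 0 is one minus the sum of the others
        return full;
    }

    public static void main(String[] args) {
        double[] full = expand(new double[] {0.25, 0.25});
        System.out.println(java.util.Arrays.toString(full)); // prints [0.5, 0.25, 0.25]
    }
}
```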

classifyNoLink

public Vector classifyNoLink(Vector features)
Compute and return a vector of scores before applying the inverse link function. For logistic regression and other generalized linear models, this is just the linear part of the classification.

The implementation of this method provided by AbstractVectorClassifier throws an UnsupportedOperationException. Your subclass must explicitly override this method to support this operation.

Parameters:
features - A feature vector to be classified.
Returns:
A vector of scores. If transformed by the inverse link function, these will become probabilities.
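
To illustrate the relationship between the linear part and the final score, here is a minimal sketch for a binary logistic model, where the inverse link is the logistic function. It does not use Mahout types: the class InverseLinkSketch, its methods, and the use of plain double[] arrays are assumptions made for illustration.

```java
// Sketch only: a binary logistic model with hypothetical weights.
final class InverseLinkSketch {

    // The "linear part": dot product of weights and features.
    static double linearScore(double[] weights, double[] features) {
        double s = 0.0;
        for (int i = 0; i < weights.length; i++) {
            s += weights[i] * features[i];
        }
        return s;
    }

    // Inverse link for logistic regression: maps a linear score into (0, 1).
    static double logistic(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double raw = linearScore(new double[] {1.0, -1.0}, new double[] {2.0, 2.0});
        System.out.println(raw);           // score before the inverse link
        System.out.println(logistic(raw)); // score after the inverse link
    }
}
```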

classifyScalar

public abstract double classifyScalar(Vector instance)
Classifies a vector in the special case of a binary classifier where classify(Vector) would return a vector with only one element. As such, using this method can avoid the allocation of a vector.

Parameters:
instance - The feature vector to be classified.
Returns:
The score for category 1.
See Also:
classify(Vector)
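
In the binary case a single score for category 1 is enough to recover both scores and to pick a category. The sketch below assumes the score is a probability and uses a 0.5 decision threshold; both the threshold and the names (BinaryScoreSketch, scoreForCategory0, categoryFor) are illustrative assumptions, not part of the Mahout API.

```java
// Sketch only: interprets the scalar returned for a binary classifier.
final class BinaryScoreSketch {

    // With two categories, the score for category 0 is one minus the score for category 1.
    static double scoreForCategory0(double scoreForCategory1) {
        return 1.0 - scoreForCategory1;
    }

    // Assumed decision rule: choose category 1 when its score exceeds 0.5.
    static int categoryFor(double scoreForCategory1) {
        return scoreForCategory1 > 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        double p1 = 0.75;
        System.out.println(scoreForCategory0(p1)); // prints 0.25
        System.out.println(categoryFor(p1));       // prints 1
    }
}
```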

classifyFull

public Vector classifyFull(Vector instance)
Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive).

Using this method it is possible to classify an input vector, for example, by selecting the category with the largest score. If classifier is an instance of AbstractVectorClassifier and input is a Vector of features describing an element to be classified, then the following code could be used to classify input.

    Vector scores = classifier.classifyFull(input);
    int assignedCategory = scores.maxValueIndex();

Here assignedCategory is the index of the category with the maximum score.

If an n-1 encoding is acceptable, and allocation performance is an issue, then the classify(Vector) method is probably better to use.

Parameters:
instance - A vector of features to be classified.
Returns:
A vector of probabilities, one for each category.
See Also:
classify(Vector), classifyFull(Vector r, Vector instance)

classifyFull

public Vector classifyFull(Vector r,
                           Vector instance)
Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive). The main difference between this method and classifyFull(Vector) is that this method allows a user to provide a previously allocated Vector r to store the returned scores.

Using this method it is possible to classify an input vector, for example, by selecting the category with the largest score. If classifier is an instance of AbstractVectorClassifier, result is a non-null Vector, and input is a Vector of features describing an element to be classified, then the following code could be used to classify input.

    Vector scores = classifier.classifyFull(result, input); // Notice that scores == result
    int assignedCategory = scores.maxValueIndex();

Here assignedCategory is the index of the category with the maximum score.

Parameters:
r - Where to put the results.
instance - A vector of features to be classified.
Returns:
A vector of scores/probabilities, one for each category.
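
The point of passing r is that the caller can allocate one result buffer and reuse it across many calls. A minimal sketch of that pattern follows, with plain double[] arrays standing in for Vector; the class ReusedBufferSketch and its methods fillFull and argMax are hypothetical helpers, not Mahout methods.

```java
// Sketch only: shows the caller-supplied result buffer pattern with arrays.
final class ReusedBufferSketch {

    // Writes n scores into r (length n) from n-1 reduced scores and returns r itself.
    static double[] fillFull(double[] r, double[] reduced) {
        double sum = 0.0;
        for (int i = 0; i < reduced.length; i++) {
            r[i + 1] = reduced[i];
            sum += reduced[i];
        }
        r[0] = 1.0 - sum; // category 0 takes the remaining mass
        return r;         // same array that was passed in, no new allocation
    }

    // Index of the largest score (first one wins on ties).
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] result = new double[3]; // allocated once, reused for every input
        double[] scores = fillFull(result, new double[] {0.125, 0.75});
        System.out.println(scores == result); // prints true
        System.out.println(argMax(scores));   // prints 2
    }
}
```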

classify

public Matrix classify(Matrix data)
Returns n-1 probabilities, one for each of categories 1 through n-1, for each row of a matrix, where n is equal to numCategories(). The probability of the missing 0-th category is 1 - rowSum(this result).

Parameters:
data - The matrix whose rows are the input vectors to classify
Returns:
A matrix of scores, one row per row of the input matrix, one column for each category except category 0.

classifyFull

public Matrix classifyFull(Matrix data)
Returns a matrix where the rows of the matrix each contain n probabilities, one for each category.

Parameters:
data - The matrix whose rows are the input vectors to classify
Returns:
A matrix of scores, one row per row of the input matrix, one column for each category.

classifyScalar

public Vector classifyScalar(Matrix data)
Returns a vector of probabilities of category 1, one for each row of a matrix. This only makes sense if there are exactly two categories, but calling this method in that case can save a number of vector allocations.

Parameters:
data - The matrix whose rows are vectors to classify
Returns:
A vector of scores, with one value per row of the input matrix.

logLikelihood

public double logLikelihood(int actual,
                            Vector data)
Returns a measure of how good the classification for a particular example actually is.

Parameters:
actual - The correct category for the example.
data - The vector to be classified.
Returns:
The log likelihood of the correct answer as estimated by the current model. This will always be <= 0, and a larger value (closer to 0) indicates better accuracy. In order to simplify code that maintains running averages, we bound this value at -100.
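
The bounded log likelihood described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: it takes a full array of n probabilities instead of a Vector, and the class name LogLikelihoodSketch is hypothetical. The lower bound of -100 comes from the description above.

```java
// Sketch only: log likelihood of the correct category, bounded below at -100.
final class LogLikelihoodSketch {

    static final double MIN_LOG_LIKELIHOOD = -100.0;

    // fullScores holds one probability per category; actual is the correct category index.
    static double logLikelihood(int actual, double[] fullScores) {
        // Math.log(0) is -Infinity, so the bound keeps running averages finite.
        return Math.max(MIN_LOG_LIKELIHOOD, Math.log(fullScores[actual]));
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 0.0};
        System.out.println(logLikelihood(0, scores)); // prints 0.0
        System.out.println(logLikelihood(1, scores)); // prints -100.0
    }
}
```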


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.