Package org.apache.mahout.classifier.sgd

Implements a variety of on-line logistic regression classifiers using SGD-based algorithms.


Interface Summary
Gradient Provides the ability to inject a gradient into the SGD logistic regression.
PriorFunction A prior is used to regularize the learning algorithm.
RecordFactory A record factory understands how to convert a line of data into fields and then into a vector.
 

Class Summary
AbstractOnlineLogisticRegression Generic definition of a 1 of n logistic regression classifier that returns probabilities in response to a feature vector.
AdaptiveLogisticRegression This is a meta-learner that maintains a pool of ordinary OnlineLogisticRegression learners.
AdaptiveLogisticRegression.TrainingExample  
AdaptiveLogisticRegression.Wrapper Provides a shim between the EP optimization code and the CrossFoldLearner.
CrossFoldLearner Does cross-fold validation of log-likelihood and AUC on several online logistic regression models.
CsvRecordFactory Converts CSV data lines to vectors.
DefaultGradient Implements the basic logistic training law.
ElasticBandPrior Implements a linear combination of L1 and L2 priors.
GradientMachine Online gradient machine learner that tries to minimize the label ranking hinge loss.
L1 Implements the Laplacian or bi-exponential prior.
L2 Implements the Gaussian prior.
MixedGradient Provides a stochastic mixture of ranking updates and normal logistic updates.
ModelDissector Uses sample data to reverse engineer a feature-hashed model.
ModelDissector.Weight  
ModelSerializer Provides the ability to store SGD model-related objects as binary files.
OnlineLogisticRegression Extends the basic on-line logistic regression learner with a specific set of learning rate annealing schedules.
PassiveAggressive Online passive aggressive learner that tries to minimize the label ranking hinge loss.
PolymorphicWritable Utilities that write a class name and then serialize using writables.
RankingGradient Uses the difference between this instance and recent history to get a gradient that optimizes ranking performance.
TPrior Provides a t-distribution as a prior.
UniformPrior A uniform prior.
 

Package org.apache.mahout.classifier.sgd Description

Implements a variety of on-line logistic regression classifiers using SGD-based algorithms. SGD stands for Stochastic Gradient Descent and refers to a class of learning algorithms that make it relatively easy to build high-speed on-line learners for a variety of problems, notably including supervised learning for classification.
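
As a rough illustration of the SGD idea, the following plain-Java sketch trains a binary logistic regression one example at a time. It does not use Mahout; the class SgdLogisticSketch, its fixed learning rate, and the synthetic data are assumptions made for this example only, and the classes in this package add priors, multiple categories, and learning rate annealing on top of this basic update.

```java
import java.util.Random;

/** Hypothetical illustration of on-line SGD logistic regression; not part of Mahout. */
class SgdLogisticSketch {
    final double[] weights;
    final double learningRate;

    SgdLogisticSketch(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Returns the estimated probability that the label is 1. */
    double classify(double[] x) {
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-dot));
    }

    /** One on-line update: step along the log-likelihood gradient of a single example. */
    void train(int label, double[] x) {
        double error = label - classify(x);
        for (int i = 0; i < x.length; i++) {
            weights[i] += learningRate * error * x[i];
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        SgdLogisticSketch model = new SgdLogisticSketch(2, 0.1);
        // Synthetic stream: the label is 1 when the single input value is positive.
        for (int step = 0; step < 5000; step++) {
            double v = rng.nextDouble() * 2 - 1;
            model.train(v > 0 ? 1 : 0, new double[] {1.0, v});
        }
        System.out.println(model.classify(new double[] {1.0, 0.8}));
        System.out.println(model.classify(new double[] {1.0, -0.8}));
    }
}
```

Each example is seen once and then discarded, which is what makes this style of learner suitable for high-speed streaming input.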

The primary class of interest in this package is CrossFoldLearner, which contains a number (typically 5) of sub-learners, each of which is given a different portion of the training data. Each of these sub-learners can then be evaluated on the data it was not trained on. This allows fully incremental learning while still getting cross-validated performance estimates.
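
The cross-fold scheme can be sketched in plain Java as follows. This is a simplified illustration and not Mahout's implementation: the class CrossFoldSketch, its nested SubLearner, the round-robin choice of the held-out fold, and the fixed learning rate of 0.1 are all assumptions made for this example. Each incoming example is scored by exactly one sub-learner, which does not train on it, and is used for training by all the others.

```java
import java.util.Random;

/** Hypothetical sketch of cross-fold on-line learning; not Mahout's CrossFoldLearner. */
class CrossFoldSketch {
    /** Minimal binary logistic sub-learner, included only to keep the sketch self-contained. */
    static class SubLearner {
        final double[] w;

        SubLearner(int numFeatures) {
            w = new double[numFeatures];
        }

        double classify(double[] x) {
            double dot = 0.0;
            for (int i = 0; i < x.length; i++) {
                dot += w[i] * x[i];
            }
            return 1.0 / (1.0 + Math.exp(-dot));
        }

        void train(int label, double[] x, double rate) {
            double error = label - classify(x);
            for (int i = 0; i < x.length; i++) {
                w[i] += rate * error * x[i];
            }
        }
    }

    final SubLearner[] folds;
    long record = 0;
    long heldOutCount = 0;
    double heldOutLogLikelihood = 0.0;

    CrossFoldSketch(int numFolds, int numFeatures) {
        folds = new SubLearner[numFolds];
        for (int k = 0; k < numFolds; k++) {
            folds[k] = new SubLearner(numFeatures);
        }
    }

    /** Scores the example with the one fold that skips it, then trains every other fold. */
    void train(int label, double[] x) {
        int heldOut = (int) (record % folds.length);
        for (int k = 0; k < folds.length; k++) {
            if (k == heldOut) {
                double p = folds[k].classify(x);
                double pActual = label == 1 ? p : 1.0 - p;
                heldOutLogLikelihood += Math.log(Math.max(pActual, 1e-12));
                heldOutCount++;
            } else {
                folds[k].train(label, x, 0.1);
            }
        }
        record++;
    }

    /** Average log-likelihood over examples the scoring fold never trained on. */
    double averageLogLikelihood() {
        return heldOutCount == 0 ? 0.0 : heldOutLogLikelihood / heldOutCount;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        CrossFoldSketch learner = new CrossFoldSketch(5, 2);
        for (int i = 0; i < 5000; i++) {
            double v = rng.nextDouble() * 2 - 1;
            learner.train(v > 0 ? 1 : 0, new double[] {1.0, v});
        }
        System.out.println("held-out average log-likelihood: " + learner.averageLogLikelihood());
    }
}
```

Because every score comes from a sub-learner that never saw the example, the running estimate is cross-validated even though the data is processed in a single pass.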

The CrossFoldLearner implements OnlineLearner and thus expects to be fed input in the form of a target variable and a feature vector. The target variable is simply an integer in the half-open interval [0..numCategories), where numCategories is defined when the CrossFoldLearner is constructed. The creation of feature vectors is facilitated by the classes that inherit from FeatureVectorEncoder. These classes currently implement a form of feature hashing with multiple probes to limit feature ambiguity.
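
Feature hashing with multiple probes can be sketched in plain Java as follows. This is an illustration of the general technique, not the FeatureVectorEncoder implementation: the class HashedEncoderSketch and its methods are hypothetical, and String.hashCode stands in for whatever hash function a real encoder would use.

```java
/** Hypothetical sketch of hashed feature encoding with multiple probes; not a Mahout class. */
class HashedEncoderSketch {
    final String name;
    final int probes;

    HashedEncoderSketch(String name, int probes) {
        this.name = name;
        this.probes = probes;
    }

    /** Vector location for one probe. String.hashCode is an assumption standing in for a stronger hash. */
    int hashForProbe(String value, int numFeatures, int probe) {
        return Math.floorMod((name + ":" + value + "#" + probe).hashCode(), numFeatures);
    }

    /** Adds the weight at every probe location for this name and value. */
    void addToVector(String value, double weight, double[] vector) {
        for (int probe = 0; probe < probes; probe++) {
            vector[hashForProbe(value, vector.length, probe)] += weight;
        }
    }

    public static void main(String[] args) {
        double[] vector = new double[20];
        new HashedEncoderSketch("color", 2).addToVector("red", 1.0, vector);
        new HashedEncoderSketch("size", 2).addToVector("large", 1.0, vector);
        System.out.println(java.util.Arrays.toString(vector));
    }
}
```

With a single probe, two features that hash to the same location become indistinguishable. With several probes, both features would have to collide at every location to be confused, which is far less likely.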



Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.