org.apache.mahout.math.decomposer.hebbian
Class HebbianSolver

java.lang.Object
  extended by org.apache.mahout.math.decomposer.hebbian.HebbianSolver

public class HebbianSolver
extends Object

The Hebbian solver is an iterative, sparse, singular value decomposition solver, based on the paper Generalized Hebbian Algorithm for Latent Semantic Analysis (2005) by Genevieve Gorrell and Brandyn Webb (a.k.a. Simon Funk). TODO: more description here! For now: read the inline comments, and the comments for the constructors.
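
For readers new to Hebbian learning, the following is a minimal sketch of the underlying idea, not Mahout's HebbianUpdater. It uses Oja's rule, which is the single-vector case of the Generalized Hebbian Algorithm, on plain dense arrays. The class name OjaRuleSketch, the method leadingDirection, and the learningRate parameter are hypothetical names introduced only for this illustration.

```java
// Illustrative only: Oja's rule on dense arrays, not Mahout's HebbianUpdater.
final class OjaRuleSketch {

    /** Presents each row of the corpus in turn and nudges v toward the leading right singular vector. */
    static double[] leadingDirection(double[][] corpus, double learningRate, int passes) {
        int n = corpus[0].length;
        double[] v = new double[n];
        for (int j = 0; j < n; j++) {
            v[j] = 1.0 / Math.sqrt(n); // unit-length starting guess
        }
        for (int p = 0; p < passes; p++) {
            for (double[] row : corpus) {
                // Activation of the current guess for this data point: row . v
                double activation = 0.0;
                for (int j = 0; j < n; j++) {
                    activation += row[j] * v[j];
                }
                // Hebbian term (activation * row) minus a decay term that keeps |v| near 1.
                for (int j = 0; j < n; j++) {
                    v[j] += learningRate * activation * (row[j] - activation * v[j]);
                }
            }
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] corpus = {{3, 0}, {0, 1}};
        double[] v = leadingDirection(corpus, 0.05, 200);
        System.out.println(v[0] + " " + v[1]);
    }
}
```

The update touches the corpus only through dot products and scaled additions of rows, which is why this family of methods suits sparse input.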


Constructor Summary
HebbianSolver(double convergenceTarget)
          Creates a new HebbianSolver with the default HebbianUpdater to do the updating work, and the default AsyncEigenVerifier to check for convergence in a (single) background thread, with maxPassesPerEigen set to Integer.MAX_VALUE.
HebbianSolver(double convergenceTarget, int maxPassesPerEigen)
          This is the recommended constructor to use if you're not sure. Creates a new HebbianSolver with the default HebbianUpdater to do the updating work, and the default AsyncEigenVerifier to check for convergence in a (single) background thread.
HebbianSolver(EigenUpdater updater, SingularVectorVerifier verifier, double convergenceTarget)
          Creates a new HebbianSolver with maxPassesPerEigen = Integer.MAX_VALUE (i.e. keep on iterating until convergenceTarget is reached).
HebbianSolver(EigenUpdater updater, SingularVectorVerifier verifier, double convergenceTarget, int maxPassesPerEigen)
          Creates a new HebbianSolver
HebbianSolver(int numPassesPerEigen)
          Creates a new HebbianSolver with the default HebbianUpdater to do the updating work, and the default AsyncEigenVerifier to check for convergence in a (single) background thread, with convergenceTarget set to 0, which means that the solver will not really care about convergence as a loop-exiting criterion (but will be checking for convergence anyways, so it will be logged and singular values will be saved).
 
Method Summary
protected  boolean hasNotConverged(Vector currentPseudoEigen, Matrix corpus, TrainingState state)
          Uses the SingularVectorVerifier to check for convergence
static void main(String[] args)
           
 TrainingState solve(Matrix corpus, int desiredRank)
          Primary singular vector solving method.
protected  EigenStatus verify(Matrix corpus, Vector currentPseudoEigen)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HebbianSolver

public HebbianSolver(EigenUpdater updater,
                     SingularVectorVerifier verifier,
                     double convergenceTarget,
                     int maxPassesPerEigen)
Creates a new HebbianSolver

Parameters:
updater - EigenUpdater used to do the actual work of iteratively updating the current "best guess" singular vector one data-point presentation at a time.
verifier - SingularVectorVerifier, an object which perpetually tries to check how close to convergence the current singular vector is (typically an AsyncEigenVerifier, which does this in the background in another thread while the main thread continues to converge)
convergenceTarget - a small "epsilon" value which tells the solver how close to 1 you want the cosine of the angle between a proposed eigenvector and that same vector after being multiplied by the (square of the) input corpus; in other words, an upper bound on 1 - cosine
maxPassesPerEigen - a cutoff which tells the solver how many convergence checks (done by the verifier) to make before it stops trying, even if it has not reached the convergenceTarget.
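
The quantity that convergenceTarget refers to can be sketched on dense arrays as follows. This assumes that the "square of the input corpus" means corpus-transpose times corpus and that the target is compared against 1 - cosine; both are assumptions about how the solver applies it, and ConvergenceMeasureSketch and oneMinusCosine are hypothetical names, not part of Mahout.

```java
// Illustrative only: measures how "eigen-like" a candidate vector is.
final class ConvergenceMeasureSketch {

    /**
     * Returns 1 - cos(angle between v and corpus^T * corpus * v).
     * A true eigenvector of corpus^T * corpus gives 0. Returns 1 if either vector is zero.
     */
    static double oneMinusCosine(double[][] corpus, double[] v) {
        int n = v.length;
        double[] w = new double[n]; // w = corpus^T * (corpus * v)
        for (double[] row : corpus) {
            double activation = 0.0;
            for (int j = 0; j < n; j++) {
                activation += row[j] * v[j];
            }
            for (int j = 0; j < n; j++) {
                w[j] += activation * row[j];
            }
        }
        double dot = 0.0;
        double vv = 0.0;
        double ww = 0.0;
        for (int j = 0; j < n; j++) {
            dot += v[j] * w[j];
            vv += v[j] * v[j];
            ww += w[j] * w[j];
        }
        if (vv == 0.0 || ww == 0.0) {
            return 1.0; // no direction to compare
        }
        return 1.0 - dot / Math.sqrt(vv * ww);
    }

    public static void main(String[] args) {
        double[][] corpus = {{3, 0}, {0, 1}};
        System.out.println(oneMinusCosine(corpus, new double[] {1, 0}));
        System.out.println(oneMinusCosine(corpus, new double[] {1, 1}));
    }
}
```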

HebbianSolver

public HebbianSolver(EigenUpdater updater,
                     SingularVectorVerifier verifier,
                     double convergenceTarget)
Creates a new HebbianSolver with maxPassesPerEigen = Integer.MAX_VALUE (i.e. keep on iterating until convergenceTarget is reached). Not recommended unless only looking for the first few (5, maybe 10?) singular vectors, as small errors which compound early on quickly put a minimum error on subsequent vectors.

Parameters:
updater - EigenUpdater used to do the actual work of iteratively updating the current "best guess" singular vector one data-point presentation at a time.
verifier - SingularVectorVerifier, an object which perpetually tries to check how close to convergence the current singular vector is (typically an AsyncEigenVerifier, which does this in the background in another thread while the main thread continues to converge)
convergenceTarget - a small "epsilon" value which tells the solver how close to 1 you want the cosine of the angle between a proposed eigenvector and that same vector after being multiplied by the (square of the) input corpus; in other words, an upper bound on 1 - cosine

HebbianSolver

public HebbianSolver(double convergenceTarget,
                     int maxPassesPerEigen)
This is the recommended constructor to use if you're not sure. Creates a new HebbianSolver with the default HebbianUpdater to do the updating work, and the default AsyncEigenVerifier to check for convergence in a (single) background thread.

Parameters:
convergenceTarget - a small "epsilon" value which tells the solver how close to 1 you want the cosine of the angle between a proposed eigenvector and that same vector after being multiplied by the (square of the) input corpus; in other words, an upper bound on 1 - cosine
maxPassesPerEigen - a cutoff which tells the solver how many convergence checks (done by the verifier) to make before it stops trying, even if it has not reached the convergenceTarget.

HebbianSolver

public HebbianSolver(double convergenceTarget)
Creates a new HebbianSolver with the default HebbianUpdater to do the updating work, and the default AsyncEigenVerifier to check for convergence in a (single) background thread, with maxPassesPerEigen set to Integer.MAX_VALUE. Not recommended unless only looking for the first few (5, maybe 10?) singular vectors, as small errors which compound early on quickly put a minimum error on subsequent vectors.

Parameters:
convergenceTarget - a small "epsilon" value which tells the solver how close to 1 you want the cosine of the angle between a proposed eigenvector and that same vector after being multiplied by the (square of the) input corpus; in other words, an upper bound on 1 - cosine

HebbianSolver

public HebbianSolver(int numPassesPerEigen)
Creates a new HebbianSolver with the default HebbianUpdater to do the updating work, and the default AsyncEigenVerifier to check for convergence in a (single) background thread, with convergenceTarget set to 0, which means that the solver will not really care about convergence as a loop-exiting criterion (but will be checking for convergence anyways, so it will be logged and singular values will be saved).

Parameters:
numPassesPerEigen - the exact number of times the verifier will check convergence status in the background before the solver will move on to the next eigen-vector.
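
The defaults described for the convenience constructors can be summarised with a small stand-in class. SolverSettingsSketch is hypothetical and only mirrors the documented defaults (Integer.MAX_VALUE passes when only a target is given, a target of 0 when only a pass count is given); it is not how HebbianSolver is actually implemented.

```java
// Illustrative only: mirrors the documented constructor defaults, not Mahout's code.
final class SolverSettingsSketch {
    final double convergenceTarget;
    final int maxPassesPerEigen;

    SolverSettingsSketch(double convergenceTarget, int maxPassesPerEigen) {
        this.convergenceTarget = convergenceTarget;
        this.maxPassesPerEigen = maxPassesPerEigen;
    }

    /** Target only: iterate until the target is reached, with no pass limit. */
    SolverSettingsSketch(double convergenceTarget) {
        this(convergenceTarget, Integer.MAX_VALUE);
    }

    /** Pass count only: a target of 0, so the pass limit ends the loop. */
    SolverSettingsSketch(int numPassesPerEigen) {
        this(0.0, numPassesPerEigen);
    }

    public static void main(String[] args) {
        SolverSettingsSketch byTarget = new SolverSettingsSketch(1e-3);
        SolverSettingsSketch byPasses = new SolverSettingsSketch(5);
        System.out.println(byTarget.convergenceTarget + " " + byTarget.maxPassesPerEigen);
        System.out.println(byPasses.convergenceTarget + " " + byPasses.maxPassesPerEigen);
    }
}
```

Because the single-argument overloads differ only in double versus int, Java overload resolution means an integer literal such as 1 selects the pass-count constructor, while 1.0 selects the convergence-target constructor.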

Method Detail

solve

public TrainingState solve(Matrix corpus,
                           int desiredRank)
Primary singular vector solving method.

Parameters:
corpus - input matrix to find singular vectors of. Need not be symmetric; should probably be sparse (in fact the input vectors are not mutated, and are accessed only via dot-products and sums, so they should be SequentialAccessSparseVector).
desiredRank - the number of singular vectors to find (in roughly decreasing order by singular value)
Returns:
the final TrainingState of the solver, after desiredRank singular vectors (and approximate singular values) have been found.
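
To illustrate what finding desiredRank singular vectors one after another involves, here is a minimal dense-array sketch. It swaps in a batched accumulation (power iteration on corpus-transpose times corpus) with explicit orthogonalisation against the vectors already found, rather than the row-at-a-time Hebbian update. DeflationSketch, topRightSingularVectors, and the passes parameter are hypothetical and do not correspond to Mahout's solve method or its TrainingState.

```java
// Illustrative only: successive singular vectors via batched passes plus deflation.
final class DeflationSketch {

    static double[][] topRightSingularVectors(double[][] corpus, int desiredRank, int passes) {
        int n = corpus[0].length;
        double[][] found = new double[desiredRank][];
        for (int k = 0; k < desiredRank; k++) {
            double[] v = new double[n];
            for (int j = 0; j < n; j++) {
                v[j] = 1.0; // arbitrary non-zero starting guess
            }
            for (int p = 0; p < passes; p++) {
                // Remove the directions already found, then rescale to unit length.
                orthonormalize(v, found, k);
                // One pass over the corpus: accumulate activation-weighted rows.
                double[] next = new double[n];
                for (double[] row : corpus) {
                    double activation = dot(row, v);
                    for (int j = 0; j < n; j++) {
                        next[j] += activation * row[j];
                    }
                }
                v = next;
            }
            orthonormalize(v, found, k);
            found[k] = v;
        }
        return found;
    }

    /** Subtracts projections onto found[0..count-1], then normalises v in place (if non-zero). */
    private static void orthonormalize(double[] v, double[][] found, int count) {
        for (int i = 0; i < count; i++) {
            double projection = dot(v, found[i]);
            for (int j = 0; j < v.length; j++) {
                v[j] -= projection * found[i][j];
            }
        }
        double norm = Math.sqrt(dot(v, v));
        if (norm > 0.0) {
            for (int j = 0; j < v.length; j++) {
                v[j] /= norm;
            }
        }
    }

    private static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += a[j] * b[j];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] corpus = {{3, 0, 0}, {0, 2, 0}, {0, 0, 1}};
        double[][] vectors = topRightSingularVectors(corpus, 2, 60);
        System.out.println(java.util.Arrays.toString(vectors[0]));
        System.out.println(java.util.Arrays.toString(vectors[1]));
    }
}
```

Each later vector depends on the accuracy of the earlier ones, since any error left in a found vector leaks into the subtraction step.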

hasNotConverged

protected boolean hasNotConverged(Vector currentPseudoEigen,
                                  Matrix corpus,
                                  TrainingState state)
Uses the SingularVectorVerifier to check for convergence

Parameters:
currentPseudoEigen - the purported singular vector whose convergence is being checked
corpus - the corpus to check against
state - the TrainingState, which contains the previous eigens and various other solving state
Returns:
true if the solver should keep iterating on the current vector, that is, if it has not yet converged and maxPassesPerEigen has not been exceeded.
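
Given the method name, one plausible reading of the loop condition is "keep going while the pass limit has not been exceeded and 1 - cosine is still above the target". The sketch below encodes that reading only; StoppingRuleSketch, shouldKeepIterating, and the checksSoFar parameter are hypothetical, and the exact comparison operators are assumptions rather than a description of Mahout's code.

```java
// Illustrative only: an assumed form of the loop-continuation test.
final class StoppingRuleSketch {

    /**
     * @param checksSoFar       number of completed convergence checks for the current vector
     * @param cosAngle          cosine reported by the most recent check
     * @param convergenceTarget upper bound on 1 - cosAngle that counts as converged
     * @param maxPassesPerEigen limit on the number of checks
     * @return true if another pass over the corpus should be made
     */
    static boolean shouldKeepIterating(int checksSoFar,
                                       double cosAngle,
                                       double convergenceTarget,
                                       int maxPassesPerEigen) {
        boolean withinPassLimit = checksSoFar <= maxPassesPerEigen;
        boolean notYetConverged = 1.0 - cosAngle > convergenceTarget;
        return withinPassLimit && notYetConverged;
    }

    public static void main(String[] args) {
        System.out.println(shouldKeepIterating(1, 0.5, 1e-3, 10));
        System.out.println(shouldKeepIterating(1, 0.9999, 1e-3, 10));
        System.out.println(shouldKeepIterating(11, 0.5, 1e-3, 10));
    }
}
```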

verify

protected EigenStatus verify(Matrix corpus,
                             Vector currentPseudoEigen)

main

public static void main(String[] args)


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.