org.apache.mahout.math.hadoop.decomposer
Class DistributedLanczosSolver

java.lang.Object
  extended by org.apache.mahout.math.decomposer.lanczos.LanczosSolver
      extended by org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class DistributedLanczosSolver
extends LanczosSolver
implements org.apache.hadoop.util.Tool

Consider the Stochastic SVD (SSVD) code as a better alternative to this solver: https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition

See Also:
SSVDSolver

Nested Class Summary
 class DistributedLanczosSolver.DistributedLanczosSolverJob
          Inner subclass of AbstractJob so we get access to AbstractJob's functionality w.r.t.
 
Nested classes/interfaces inherited from class org.apache.mahout.math.decomposer.lanczos.LanczosSolver
LanczosSolver.TimingSection
 
Field Summary
static String RAW_EIGENVECTORS
           
 
Fields inherited from class org.apache.mahout.math.decomposer.lanczos.LanczosSolver
SAFE_MAX
 
Constructor Summary
DistributedLanczosSolver()
           
 
Method Summary
 org.apache.hadoop.conf.Configuration getConf()
           
static Vector getInitialVector(VectorIterable corpus)
          For the distributed case, the best guess at a useful initialization state for Lanczos is chosen to be a vector that is uniform over all input dimensions and L_2 normalized.
 DistributedLanczosSolver.DistributedLanczosSolverJob job()
           
static void main(String[] args)
           
 int run(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.fs.Path outputTmpPath, org.apache.hadoop.fs.Path workingDirPath, int numRows, int numCols, boolean isSymmetric, int desiredRank)
          Run the solver to produce the raw eigenvectors
 int run(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.fs.Path outputTmpPath, org.apache.hadoop.fs.Path workingDirPath, int numRows, int numCols, boolean isSymmetric, int desiredRank, double maxError, double minEigenvalue, boolean inMemory)
          Run the solver to produce raw eigenvectors, then run the EigenVerificationJob to clean them
 int run(String[] strings)
           
 LanczosState runJob(org.apache.hadoop.conf.Configuration originalConfig, LanczosState state, int desiredRank, boolean isSymmetric, String outputEigenVectorPathString)
           
 LanczosState runJob(org.apache.hadoop.conf.Configuration originalConfig, org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputTmpPath, int numRows, int numCols, boolean isSymmetric, int desiredRank, String outputEigenVectorPathString)
          Factored-out LanczosSolver for the purpose of invoking it programmatically
 void serializeOutput(LanczosState state, org.apache.hadoop.fs.Path outputPath)
           
 void setConf(org.apache.hadoop.conf.Configuration configuration)
           
 
Methods inherited from class org.apache.mahout.math.decomposer.lanczos.LanczosSolver
calculateScaleFactor, orthoganalizeAgainstAllButLast, solve, solve
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

RAW_EIGENVECTORS

public static final String RAW_EIGENVECTORS
See Also:
Constant Field Values
Constructor Detail

DistributedLanczosSolver

public DistributedLanczosSolver()
Method Detail

getInitialVector

public static Vector getInitialVector(VectorIterable corpus)
For the distributed case, the best guess at a useful initialization state for Lanczos is chosen to be a vector that is uniform over all input dimensions and L_2 normalized.
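
A vector that is "uniform over all input dimensions, L_2 normalized" has every entry equal to 1/sqrt(n), where n is the number of dimensions. The sketch below illustrates that arithmetic with plain Java arrays. It does not use the Mahout Vector or VectorIterable types, and the class and method names (InitialVectorSketch, uniformUnitVector, l2Norm) are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;

// Illustration only: plain arrays stand in for Mahout's Vector type.
final class InitialVectorSketch {

    // Returns a vector whose entries are all equal and whose
    // L2 (Euclidean) norm is 1, i.e. every entry is 1/sqrt(dimension).
    static double[] uniformUnitVector(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        double[] v = new double[dimension];
        Arrays.fill(v, 1.0 / Math.sqrt(dimension));
        return v;
    }

    // Computes the L2 norm: the square root of the sum of squared entries.
    static double l2Norm(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] v = uniformUnitVector(4);
        System.out.println(Arrays.toString(v)); // prints [0.5, 0.5, 0.5, 0.5]
        System.out.println(l2Norm(v));          // prints 1.0
    }
}
```

For four dimensions each entry is 1/sqrt(4) = 0.5, and the squared entries sum to 1, so the norm is 1.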


runJob

public LanczosState runJob(org.apache.hadoop.conf.Configuration originalConfig,
                           LanczosState state,
                           int desiredRank,
                           boolean isSymmetric,
                           String outputEigenVectorPathString)
                    throws IOException
Throws:
IOException

runJob

public LanczosState runJob(org.apache.hadoop.conf.Configuration originalConfig,
                           org.apache.hadoop.fs.Path inputPath,
                           org.apache.hadoop.fs.Path outputTmpPath,
                           int numRows,
                           int numCols,
                           boolean isSymmetric,
                           int desiredRank,
                           String outputEigenVectorPathString)
                    throws IOException
Factored-out LanczosSolver for the purpose of invoking it programmatically

Throws:
IOException

run

public int run(String[] strings)
        throws Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
Exception

run

public int run(org.apache.hadoop.fs.Path inputPath,
               org.apache.hadoop.fs.Path outputPath,
               org.apache.hadoop.fs.Path outputTmpPath,
               org.apache.hadoop.fs.Path workingDirPath,
               int numRows,
               int numCols,
               boolean isSymmetric,
               int desiredRank,
               double maxError,
               double minEigenvalue,
               boolean inMemory)
        throws Exception
Run the solver to produce raw eigenvectors, then run the EigenVerificationJob to clean them

Parameters:
inputPath - the Path to the input corpus
outputPath - the Path to the output
outputTmpPath - a Path to a temporary working directory
numRows - the int number of rows
numCols - the int number of columns
isSymmetric - true if the input matrix is symmetric
desiredRank - the int desired rank of eigenvectors to produce
maxError - the maximum allowable error
minEigenvalue - the minimum usable eigenvalue
inMemory - true if the verification can be done in memory
Returns:
an int indicating success (0) or otherwise
Throws:
Exception
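
The page does not say how the verification step uses maxError and minEigenvalue. As a rough illustration of one plausible reading, the sketch below keeps a candidate eigenvector v of a matrix A only if its error, taken here to be 1 - |cos(angle between v and Av)|, is at most maxError and its estimated eigenvalue (the Rayleigh quotient) is at least minEigenvalue. Both the error definition and the class and method names (EigenpairFilterSketch, error, accept, and so on) are assumptions made for this example. This is not the EigenVerificationJob implementation, and it works on small dense arrays rather than Hadoop data.

```java
// Illustration only: a dense, in-memory stand-in for eigenpair cleaning.
final class EigenpairFilterSketch {

    // Multiplies a dense matrix by a vector.
    static double[] multiply(double[][] a, double[] v) {
        double[] result = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < v.length; j++) {
                sum += a[i][j] * v[j];
            }
            result[i] = sum;
        }
        return result;
    }

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Estimated eigenvalue: the Rayleigh quotient (v . Av) / (v . v).
    static double rayleighQuotient(double[][] a, double[] v) {
        return dot(v, multiply(a, v)) / dot(v, v);
    }

    // Assumed error measure: 1 - |cos(angle between v and Av)|.
    // It is 0 when Av is an exact non-zero multiple of v.
    static double error(double[][] a, double[] v) {
        double[] av = multiply(a, v);
        double norms = Math.sqrt(dot(v, v)) * Math.sqrt(dot(av, av));
        if (norms == 0.0) {
            return 1.0; // treat a zero vector as maximally wrong
        }
        return 1.0 - Math.abs(dot(v, av) / norms);
    }

    // Keeps a candidate only if it passes both thresholds.
    static boolean accept(double[][] a, double[] v,
                          double maxError, double minEigenvalue) {
        return error(a, v) <= maxError
            && rayleighQuotient(a, v) >= minEigenvalue;
    }
}
```

With A = diag(2, 0.5), maxError = 0.05 and minEigenvalue = 1.0, the candidate (1, 0) is accepted, (0, 1) is rejected because its eigenvalue 0.5 is too small, and (1, 1) is rejected because it is not close to an eigenvector.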

run

public int run(org.apache.hadoop.fs.Path inputPath,
               org.apache.hadoop.fs.Path outputPath,
               org.apache.hadoop.fs.Path outputTmpPath,
               org.apache.hadoop.fs.Path workingDirPath,
               int numRows,
               int numCols,
               boolean isSymmetric,
               int desiredRank)
        throws Exception
Run the solver to produce the raw eigenvectors

Parameters:
inputPath - the Path to the input corpus
outputPath - the Path to the output
outputTmpPath - a Path to a temporary working directory
numRows - the int number of rows
numCols - the int number of columns
isSymmetric - true if the input matrix is symmetric
desiredRank - the int desired rank of eigenvectors to produce
Returns:
an int indicating success (0) or otherwise
Throws:
Exception

serializeOutput

public void serializeOutput(LanczosState state,
                            org.apache.hadoop.fs.Path outputPath)
                     throws IOException
Parameters:
state - The final LanczosState to be serialized
outputPath - The path (relative to the current Configuration's FileSystem) to save the output to.
Throws:
IOException

setConf

public void setConf(org.apache.hadoop.conf.Configuration configuration)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

job

public DistributedLanczosSolver.DistributedLanczosSolverJob job()

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.