org.apache.mahout.math.hadoop.decomposer
Class DistributedLanczosSolver
java.lang.Object
org.apache.mahout.math.decomposer.lanczos.LanczosSolver
org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class DistributedLanczosSolver
- extends LanczosSolver
- implements org.apache.hadoop.util.Tool
See the SSVD code for a better option than using this:
https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
- See Also:
SSVDSolver
Method Summary |
org.apache.hadoop.conf.Configuration |
getConf()
|
static Vector |
getInitialVector(VectorIterable corpus)
For the distributed case, the best guess at a useful initialization state for Lanczos is chosen to be
uniform over all input dimensions, L_2 normalized. |
DistributedLanczosSolver.DistributedLanczosSolverJob |
job()
|
static void |
main(String[] args)
|
int |
run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
org.apache.hadoop.fs.Path workingDirPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank)
Run the solver to produce the raw eigenvectors |
int |
run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
org.apache.hadoop.fs.Path workingDirPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
double maxError,
double minEigenvalue,
boolean inMemory)
Run the solver to produce raw eigenvectors, then run the EigenVerificationJob to clean them |
int |
run(String[] strings)
|
LanczosState |
runJob(org.apache.hadoop.conf.Configuration originalConfig,
LanczosState state,
int desiredRank,
boolean isSymmetric,
String outputEigenVectorPathString)
|
LanczosState |
runJob(org.apache.hadoop.conf.Configuration originalConfig,
org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
String outputEigenVectorPathString)
Factored-out LanczosSolver for the purpose of invoking it programmatically |
void |
serializeOutput(LanczosState state,
org.apache.hadoop.fs.Path outputPath)
|
void |
setConf(org.apache.hadoop.conf.Configuration configuration)
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
RAW_EIGENVECTORS
public static final String RAW_EIGENVECTORS
- See Also:
- Constant Field Values
DistributedLanczosSolver
public DistributedLanczosSolver()
getInitialVector
public static Vector getInitialVector(VectorIterable corpus)
- For the distributed case, the best guess at a useful initialization state for Lanczos is chosen to be
uniform over all input dimensions, L_2 normalized.
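As a rough illustration of "uniform over all input dimensions, L_2 normalized": a vector of length n with every entry equal to 1/sqrt(n) has unit L_2 norm. The sketch below uses plain arrays rather than Mahout's Vector type; the class and method names (UniformInitialVector, uniformUnitVector, l2Norm) are hypothetical and are not part of this API.

```java
import java.util.Arrays;

public class UniformInitialVector {

    // Hypothetical helper: builds a length-numCols vector whose entries are
    // all 1/sqrt(numCols), so that its L_2 norm is 1.
    static double[] uniformUnitVector(int numCols) {
        if (numCols <= 0) {
            throw new IllegalArgumentException("numCols must be positive");
        }
        double[] v = new double[numCols];
        Arrays.fill(v, 1.0 / Math.sqrt(numCols));
        return v;
    }

    // Euclidean (L_2) norm of a dense vector.
    static double l2Norm(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] v = uniformUnitVector(4);
        System.out.println(Arrays.toString(v)); // prints [0.5, 0.5, 0.5, 0.5]
        System.out.println(l2Norm(v));          // prints 1.0
    }
}
```

In the real method the dimension would presumably come from the corpus argument; how it is read from a VectorIterable is not shown on this page.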
runJob
public LanczosState runJob(org.apache.hadoop.conf.Configuration originalConfig,
LanczosState state,
int desiredRank,
boolean isSymmetric,
String outputEigenVectorPathString)
throws IOException
- Throws:
IOException
runJob
public LanczosState runJob(org.apache.hadoop.conf.Configuration originalConfig,
org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
String outputEigenVectorPathString)
throws IOException
- Factored-out LanczosSolver for the purpose of invoking it programmatically
- Throws:
IOException
run
public int run(String[] strings)
throws Exception
- Specified by:
run
in interface org.apache.hadoop.util.Tool
- Throws:
Exception
run
public int run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
org.apache.hadoop.fs.Path workingDirPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
double maxError,
double minEigenvalue,
boolean inMemory)
throws Exception
- Run the solver to produce raw eigenvectors, then run the EigenVerificationJob to clean them
- Parameters:
inputPath - the Path to the input corpus
outputPath - the Path to the output
outputTmpPath - a Path to a temporary working directory
numRows - the int number of rows
numCols - the int number of columns
isSymmetric - true if the input matrix is symmetric
desiredRank - the int desired rank of eigenvectors to produce
maxError - the maximum allowable error
minEigenvalue - the minimum usable eigenvalue
inMemory - true if the verification can be done in memory
- Returns:
- an int indicating success (0) or otherwise
- Throws:
Exception
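To give a feel for what the maxError and minEigenvalue thresholds could mean when raw eigenvectors are cleaned, here is a minimal in-memory sketch. It is not the EigenVerificationJob implementation, which this page does not describe. It assumes a small dense symmetric matrix, estimates the eigenvalue of a candidate v as |Av| / |v|, and takes the error to be 1 - |cos| of the angle between Av and v. The class name EigenCheckSketch, all of its methods, and the choice of error measure are assumptions made for illustration.

```java
public class EigenCheckSketch {

    // Dense matrix-vector product a * v.
    static double[] times(double[][] a, double[] v) {
        double[] result = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < v.length; j++) {
                result[i] += a[i][j] * v[j];
            }
        }
        return result;
    }

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Returns {eigenvalue estimate, error} for a nonzero candidate vector v.
    // Assumed error measure: 1 - |cos(angle between a*v and v)|.
    static double[] check(double[][] a, double[] v) {
        double[] av = times(a, v);
        double normV = Math.sqrt(dot(v, v));
        double normAv = Math.sqrt(dot(av, av));
        double eigenvalue = normAv / normV;
        double cos = (normAv == 0.0) ? 0.0 : dot(av, v) / (normAv * normV);
        return new double[] {eigenvalue, 1.0 - Math.abs(cos)};
    }

    // Keeps a candidate only if its error is below maxError and its
    // eigenvalue estimate is above minEigenvalue.
    static boolean accept(double[][] a, double[] v, double maxError, double minEigenvalue) {
        double[] r = check(a, v);
        return r[1] < maxError && r[0] > minEigenvalue;
    }

    public static void main(String[] args) {
        double[][] a = {{2.0, 0.0}, {0.0, 1.0}};
        double[] exact = {1.0, 0.0};  // a true eigenvector (eigenvalue 2)
        double[] mixed = {1.0, 1.0};  // not an eigenvector of a
        System.out.println(accept(a, exact, 0.01, 0.001)); // prints true
        System.out.println(accept(a, mixed, 0.01, 0.001)); // prints false
    }
}
```

A tighter maxError rejects more approximate vectors, and a larger minEigenvalue discards directions that carry little of the matrix's weight.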
run
public int run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
org.apache.hadoop.fs.Path workingDirPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank)
throws Exception
- Run the solver to produce the raw eigenvectors
- Parameters:
inputPath - the Path to the input corpus
outputPath - the Path to the output
outputTmpPath - a Path to a temporary working directory
numRows - the int number of rows
numCols - the int number of columns
isSymmetric - true if the input matrix is symmetric
desiredRank - the int desired rank of eigenvectors to produce
- Returns:
- an int indicating success (0) or otherwise
- Throws:
Exception
serializeOutput
public void serializeOutput(LanczosState state,
org.apache.hadoop.fs.Path outputPath)
throws IOException
- Parameters:
state - The final LanczosState to be serialized
outputPath - The path (relative to the current Configuration's FileSystem) to save the output to.
- Throws:
IOException
setConf
public void setConf(org.apache.hadoop.conf.Configuration configuration)
- Specified by:
setConf
in interface org.apache.hadoop.conf.Configurable
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf
in interface org.apache.hadoop.conf.Configurable
job
public DistributedLanczosSolver.DistributedLanczosSolverJob job()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.