org.apache.mahout.math.hadoop
Class DistributedRowMatrix

java.lang.Object
  extended by org.apache.mahout.math.hadoop.DistributedRowMatrix
All Implemented Interfaces:
Iterable<MatrixSlice>, org.apache.hadoop.conf.Configurable, VectorIterable

public class DistributedRowMatrix
extends Object
implements VectorIterable, org.apache.hadoop.conf.Configurable

DistributedRowMatrix is a FileSystem-backed VectorIterable whose row vectors live in a SequenceFile; distributed operations are executed as MapReduce (M/R) passes on Hadoop. Usage is as follows:

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.mahout.math.DenseVector;
   import org.apache.mahout.math.Vector;
   import org.apache.mahout.math.hadoop.DistributedRowMatrix;

   // The input path must already contain a SequenceFile holding the row vectors.
   // The constructor takes Path arguments, not Strings.
   DistributedRowMatrix m = new DistributedRowMatrix(
       new Path("path/to/vector/sequenceFile"), new Path("tmp/path"), 10000000, 250000);
   m.setConf(new Configuration());
   // For both times(v) and timesSquared(v), the dimension of v must equal the
   // number of columns of this matrix (numCols, 250000 here).
   Vector v = new DenseVector(250000);
   // The following operation runs as an M/R pass on Hadoop.
   Vector w = m.timesSquared(v);
 


Nested Class Summary
static class DistributedRowMatrix.MatrixEntryWritable
           
 
Field Summary
static String KEEP_TEMP_FILES
           
 
Constructor Summary
DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputTmpPath, int numRows, int numCols)
           
DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputTmpPath, int numRows, int numCols, boolean keepTempFiles)
           
 
Method Summary
 Vector columnMeans()
           
 Vector columnMeans(String vectorClass)
          Returns the column-wise mean of a DistributedRowMatrix
 org.apache.hadoop.conf.Configuration getConf()
           
 org.apache.hadoop.fs.Path getOutputTempPath()
           
 org.apache.hadoop.fs.Path getRowPath()
           
 Iterator<MatrixSlice> iterateAll()
           
 Iterator<MatrixSlice> iterator()
           
 int numCols()
           
 int numRows()
           
 int numSlices()
           
 void setConf(org.apache.hadoop.conf.Configuration conf)
           
 void setOutputTempPathString(String outPathString)
           
 DistributedRowMatrix times(DistributedRowMatrix other)
          Computes the matrix product this.transpose().times(other).
 DistributedRowMatrix times(DistributedRowMatrix other, org.apache.hadoop.fs.Path outPath)
          Computes the matrix product this.transpose().times(other) and writes the result to outPath.
 Vector times(Vector v)
           
 Vector timesSquared(Vector v)
           
 DistributedRowMatrix transpose()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

KEEP_TEMP_FILES

public static final String KEEP_TEMP_FILES
See Also:
Constant Field Values
Constructor Detail

DistributedRowMatrix

public DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath,
                            org.apache.hadoop.fs.Path outputTmpPath,
                            int numRows,
                            int numCols)

DistributedRowMatrix

public DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath,
                            org.apache.hadoop.fs.Path outputTmpPath,
                            int numRows,
                            int numCols,
                            boolean keepTempFiles)
Method Detail

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

getRowPath

public org.apache.hadoop.fs.Path getRowPath()

getOutputTempPath

public org.apache.hadoop.fs.Path getOutputTempPath()

setOutputTempPathString

public void setOutputTempPathString(String outPathString)

iterateAll

public Iterator<MatrixSlice> iterateAll()
Specified by:
iterateAll in interface VectorIterable

numSlices

public int numSlices()
Specified by:
numSlices in interface VectorIterable

numRows

public int numRows()
Specified by:
numRows in interface VectorIterable

numCols

public int numCols()
Specified by:
numCols in interface VectorIterable

times

public DistributedRowMatrix times(DistributedRowMatrix other)
                           throws IOException
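Computes the matrix product this.transpose().times(other). Note that the receiver is transposed first, so this is not the plain product this.times(other).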

Parameters:
other - a DistributedRowMatrix
Returns:
a DistributedRowMatrix containing the product
Throws:
IOException
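
As a rough illustration of what this.transpose().times(other) computes, the sketch below forms the same product on small in-memory arrays. It is not Mahout's implementation and uses no Mahout or Hadoop classes; the class name TransposeTimesSketch and the double[][] representation are assumptions made only for this example. Both matrices must have the same number of rows.

```java
class TransposeTimesSketch {
    // Computes a^T * b for row-major arrays that share the same number of rows.
    // The result has a[0].length rows and b[0].length columns.
    static double[][] transposeTimes(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        int n = a[0].length;
        int p = b[0].length;
        double[][] c = new double[n][p];
        // Each pair of corresponding rows contributes one outer product to the sum.
        for (int r = 0; r < a.length; r++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < p; j++) {
                    c[i][j] += a[r][i] * b[r][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[26.0, 30.0], [38.0, 44.0]]
        System.out.println(java.util.Arrays.deepToString(transposeTimes(a, b)));
    }
}
```

Because the sum runs over pairs of corresponding rows, the product can be accumulated row by row without materializing the transpose, which suits a row-oriented layout.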

times

public DistributedRowMatrix times(DistributedRowMatrix other,
                                  org.apache.hadoop.fs.Path outPath)
                           throws IOException
Computes the matrix product this.transpose().times(other) and writes the result to outPath.

Parameters:
other - a DistributedRowMatrix
outPath - path to write result to
Returns:
a DistributedRowMatrix containing the product
Throws:
IOException

columnMeans

public Vector columnMeans()
                   throws IOException
Throws:
IOException

columnMeans

public Vector columnMeans(String vectorClass)
                   throws IOException
Returns the column-wise mean of a DistributedRowMatrix

Parameters:
vectorClass - desired class for the column-wise mean vector, e.g. RandomAccessSparseVector or DenseVector
Returns:
Vector containing the column-wise mean of this
Throws:
IOException
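
To make "column-wise mean" concrete, here is a minimal sketch on an in-memory array. It only illustrates the arithmetic; the class name ColumnMeansSketch and the double[][] input are hypothetical, and Mahout's own method runs over the SequenceFile-backed rows rather than an array.

```java
class ColumnMeansSketch {
    // Returns the mean of each column of a non-empty row-major matrix.
    static double[] columnMeans(double[][] rows) {
        int numCols = rows[0].length;
        double[] means = new double[numCols];
        // Accumulate the per-column sums over all rows.
        for (double[] row : rows) {
            for (int j = 0; j < numCols; j++) {
                means[j] += row[j];
            }
        }
        // Divide each sum by the number of rows.
        for (int j = 0; j < numCols; j++) {
            means[j] /= rows.length;
        }
        return means;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2}, {3, 6}};
        // prints [2.0, 4.0]
        System.out.println(java.util.Arrays.toString(columnMeans(rows)));
    }
}
```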

transpose

public DistributedRowMatrix transpose()
                               throws IOException
Throws:
IOException

times

public Vector times(Vector v)
Specified by:
times in interface VectorIterable
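
The sketch below shows the matrix-vector product on plain arrays, assuming the conventional definition in which v has one entry per column and the result has one entry per row. The class name TimesSketch and the array representation are assumptions for illustration; this is not the Hadoop job that Mahout runs.

```java
class TimesSketch {
    // Computes a * v: entry r of the result is the dot product of row r with v.
    static double[] times(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int r = 0; r < a.length; r++) {
            if (a[r].length != v.length) {
                throw new IllegalArgumentException("v must have one entry per column");
            }
            for (int j = 0; j < v.length; j++) {
                out[r] += a[r][j] * v[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[] v = {1, 1};
        // prints [3.0, 7.0, 11.0]
        System.out.println(java.util.Arrays.toString(times(a, v)));
    }
}
```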

timesSquared

public Vector timesSquared(Vector v)
Specified by:
timesSquared in interface VectorIterable
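
Assuming timesSquared(v) corresponds to this.transpose().times(this.times(v)), as the VectorIterable interface describes it, the result can be accumulated in a single pass over the rows without forming the transpose. The sketch below shows that idea on plain arrays; the class name TimesSquaredSketch is hypothetical and the code is not Mahout's implementation.

```java
class TimesSquaredSketch {
    // Computes a^T * (a * v) in one pass: each row contributes (row . v) * row.
    static double[] timesSquared(double[][] a, double[] v) {
        double[] out = new double[v.length];
        for (double[] row : a) {
            if (row.length != v.length) {
                throw new IllegalArgumentException("v must have one entry per column");
            }
            // Dot product of the current row with v.
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += row[j] * v[j];
            }
            // Add the row, scaled by the dot product, to the running result.
            for (int j = 0; j < v.length; j++) {
                out[j] += dot * row[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[] v = {1, 1};
        // prints [24.0, 34.0]
        System.out.println(java.util.Arrays.toString(timesSquared(a, v)));
    }
}
```

Both the input and the output have one entry per column, which matches the usage example in the class description where v has 250000 entries.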

iterator

public Iterator<MatrixSlice> iterator()
Specified by:
iterator in interface Iterable<MatrixSlice>


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.