org.apache.mahout.math.ssvd
Class SequentialOutOfCoreSvd

java.lang.Object
  extended by org.apache.mahout.math.ssvd.SequentialOutOfCoreSvd

public class SequentialOutOfCoreSvd
extends Object

Sequential block-oriented out of core SVD algorithm.

The basic algorithm (in-core version) is that we do a random projects, get a basis of that and then re-project the original matrix using that basis. This re-projected matrix allows us to get an approximate SVD of the original matrix.

The input to this program is a list of files that contain the sub-matrices A_i. The result is a vector of singular values and optionally files that contain the left and right singular vectors.

Mathematically, to decompose A, we do this:

Y = A * \Omega

Q R = Y

B = Q" A

U D V' = B

(Q U) D V' \approx A

To do this out of core, we break A into blocks each with the same number of rows. This gives a block-wise version of Y. As we are computing Y, we can also accumulate Y' Y and when done, we can use a Cholesky decomposition to do the QR decomposition of Y in a latent form. That gives us B in block-wise form and we can do the same trick to get an LQ of B. The L part can be decomposed in memory. Then we can recombine to get the final decomposition.

The details go like this. Start with a block form of A.

Y_i = A_i * \Omega

Instead of doing a QR decomposition of Y, we do a Cholesky decomposition of Y' Y. This is a small in-memory operation. Q is large and dense and won't fit in memory.

R' R = \sum_i Y_i' Y_i

For reference, R is all we need to compute explicitly. Q will be computed on the fly when needed.

Q = Y R^-1

B = Q" A = \sum_i (A \Omega R^-1)' A_i

As B is generated, it needs to be segmented in row-wise blocks since it is wide but not tall. This storage requires something like a map-reduce to accumulate the partial sums. In this code, we do this by re-reading previously computed chunks and augmenting them.

While the pieces of B are being computed, we can accumulate B B' in preparation for a second Cholesky decomposition

L L' = B B' = sum B_j B_j'

Again, this is an LQ decomposition of BB', but we don't compute the Q part explicitly. L will be small and thus tractable.

Finally, we do the actual SVD decomposition.

U_0 D V_0' = L

D contains the singular values of A. The left and right singular values can be reconstructed using Y and B. Note that both of these reconstructions can be done with single passes through the blocked forms of Y and B.

U = A \Omega R^{-1} U_0

V = B' L'^{-1} V_0


Constructor Summary
SequentialOutOfCoreSvd(Iterable<File> partsOfA, File tmpDir, int internalDimension, int columnsPerSlice)
           
 
Method Summary
 void computeU(Iterable<File> partsOfA, File tmpDir)
           
 void computeV(File tmpDir, int ncols)
           
 Vector getSingularValues()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SequentialOutOfCoreSvd

public SequentialOutOfCoreSvd(Iterable<File> partsOfA,
                              File tmpDir,
                              int internalDimension,
                              int columnsPerSlice)
                       throws IOException
Throws:
IOException
Method Detail

computeV

public void computeV(File tmpDir,
                     int ncols)
              throws IOException
Throws:
IOException

computeU

public void computeU(Iterable<File> partsOfA,
                     File tmpDir)
              throws IOException
Throws:
IOException

getSingularValues

public Vector getSingularValues()


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.