org.apache.mahout.math
Class VectorBinaryAggregate

java.lang.Object
  extended by org.apache.mahout.math.VectorBinaryAggregate
Direct Known Subclasses:
VectorBinaryAggregate.AggregateAllIterateSequential, VectorBinaryAggregate.AggregateAllIterateThatLookupThis, VectorBinaryAggregate.AggregateAllIterateThisLookupThat, VectorBinaryAggregate.AggregateAllLoop, VectorBinaryAggregate.AggregateIterateIntersection, VectorBinaryAggregate.AggregateIterateUnionRandom, VectorBinaryAggregate.AggregateIterateUnionSequential, VectorBinaryAggregate.AggregateNonzerosIterateThatLookupThis, VectorBinaryAggregate.AggregateNonzerosIterateThisLookupThat

public abstract class VectorBinaryAggregate
extends Object

Abstract class encapsulating the different algorithms that perform the Vector operation aggregate().

x.aggregate(y, fa, fc), for x and y Vectors and fa, fc DoubleDoubleFunctions:
- applies the function fc to every pair of corresponding elements in x and y, fc(xi, yi)
- constructs a result iteratively, r0 = fc(x0, y0), ri = fa(r_{i-1}, fc(xi, yi)).

This works essentially like a map/reduce functional combo.

The names of variables, methods and classes used here follow these conventions:
- The vector on which aggregate() is called (the left hand side) is called this or x.
- The right hand side is called that or y.
- The aggregating (reducing) function to be applied is called fa.
- The combining (mapping) function to be applied is called fc.

The different algorithms take into account the different characteristics of vector classes:
- whether the vectors support sequential iteration (isSequential())
- what the lookup cost is (getLookupCost())
- what the iterator advancement cost is (getIteratorAdvanceCost())

The names of the actual classes (they're nested in VectorBinaryAggregate) describe the algorithm used for aggregation. The most important optimization is iterating just through the nonzeros (only possible if f(0, 0) = 0). There are 4 main possibilities:
- iterating through the nonzeros of just one vector and looking up the corresponding elements in the other
- iterating through the intersection of nonzeros (those indices where both vectors have nonzero values)
- iterating through the union of nonzeros (those indices where at least one of the vectors has a nonzero value)
- iterating through all the elements in some way (either through both at the same time, both one after the other, looking up both, looking up just one).

The internal details are not important and a particular algorithm should generally not be called explicitly. The best one will be selected through aggregateBest(), which is itself called through Vector.aggregate().
See https://docs.google.com/document/d/1g1PjUuvjyh2LBdq2_rKLIcUiDbeOORA1sCJiSsz-JVU/edit# for a more detailed explanation.
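
As an illustration of the "intersection of nonzeros" strategy described above, here is a minimal sketch that does not use Mahout types. It assumes each sparse vector is given as a sorted array of nonzero indices plus a parallel array of values, and that skipped positions contribute nothing to the result (as in a dot product, where fc is multiplication and fa is addition). The class name IntersectionSketch and its method are hypothetical, and java.util.function.DoubleBinaryOperator stands in for DoubleDoubleFunction.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical helper, not part of Mahout. Sparse vectors are modelled as
// (sorted nonzero indices, parallel values) instead of Mahout Vectors.
final class IntersectionSketch {
  private IntersectionSketch() {}

  // Walks both index arrays in step and combines only the positions where
  // both vectors hold a nonzero value. Returns 0.0 when no index is shared
  // (a choice made for this sketch).
  static double aggregateIntersection(int[] xIdx, double[] xVal,
                                      int[] yIdx, double[] yVal,
                                      DoubleBinaryOperator fa,
                                      DoubleBinaryOperator fc) {
    double result = 0.0;
    boolean first = true;
    int a = 0;
    int b = 0;
    while (a < xIdx.length && b < yIdx.length) {
      if (xIdx[a] < yIdx[b]) {
        a++;                      // index present only in x: skip it
      } else if (xIdx[a] > yIdx[b]) {
        b++;                      // index present only in y: skip it
      } else {
        double combined = fc.applyAsDouble(xVal[a], yVal[b]);
        result = first ? combined : fa.applyAsDouble(result, combined);
        first = false;
        a++;
        b++;
      }
    }
    return result;
  }
}
```

Each step advances at least one of the two cursors, so the work is bounded by the total number of nonzeros in x and y and does not depend on the full vector size. That is why this kind of strategy pays off for sparse vectors.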


Nested Class Summary
static class VectorBinaryAggregate.AggregateAllIterateSequential
           
static class VectorBinaryAggregate.AggregateAllIterateThatLookupThis
           
static class VectorBinaryAggregate.AggregateAllIterateThisLookupThat
           
static class VectorBinaryAggregate.AggregateAllLoop
           
static class VectorBinaryAggregate.AggregateIterateIntersection
           
static class VectorBinaryAggregate.AggregateIterateUnionRandom
           
static class VectorBinaryAggregate.AggregateIterateUnionSequential
           
static class VectorBinaryAggregate.AggregateNonzerosIterateThatLookupThis
           
static class VectorBinaryAggregate.AggregateNonzerosIterateThisLookupThat
           
 
Field Summary
static VectorBinaryAggregate[] OPERATIONS
           
 
Constructor Summary
VectorBinaryAggregate()
           
 
Method Summary
abstract  double aggregate(Vector x, Vector y, DoubleDoubleFunction fa, DoubleDoubleFunction fc)
          Main method that applies fc to x and y component-wise, aggregating the results with fa.
static double aggregateBest(Vector x, Vector y, DoubleDoubleFunction fa, DoubleDoubleFunction fc)
          This is the method that should be used when aggregating.
abstract  double estimateCost(Vector x, Vector y, DoubleDoubleFunction fa, DoubleDoubleFunction fc)
          Estimates the cost of using this algorithm to compute the aggregation.
static VectorBinaryAggregate getBestOperation(Vector x, Vector y, DoubleDoubleFunction fa, DoubleDoubleFunction fc)
          The best operation is the least expensive valid one.
abstract  boolean isValid(Vector x, Vector y, DoubleDoubleFunction fa, DoubleDoubleFunction fc)
          Returns true iff we can use this algorithm to apply fc to x and y component-wise and aggregate the result using fa.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OPERATIONS

public static final VectorBinaryAggregate[] OPERATIONS
Constructor Detail

VectorBinaryAggregate

public VectorBinaryAggregate()
Method Detail

isValid

public abstract boolean isValid(Vector x,
                                Vector y,
                                DoubleDoubleFunction fa,
                                DoubleDoubleFunction fc)
Returns true iff we can use this algorithm to apply fc to x and y component-wise and aggregate the result using fa.


estimateCost

public abstract double estimateCost(Vector x,
                                    Vector y,
                                    DoubleDoubleFunction fa,
                                    DoubleDoubleFunction fc)
Estimates the cost of using this algorithm to compute the aggregation. The algorithm is assumed to be valid.


aggregate

public abstract double aggregate(Vector x,
                                 Vector y,
                                 DoubleDoubleFunction fa,
                                 DoubleDoubleFunction fc)
Main method that applies fc to x and y component-wise, aggregating the results with fa. It returns the result of the aggregation.
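
The semantics of aggregate() can be sketched with plain arrays. This is only an illustration, not one of the Mahout implementations: the class AggregateSketch is hypothetical, double[] stands in for Vector, and java.util.function.DoubleBinaryOperator stands in for DoubleDoubleFunction. The sketch rejects empty input as a simplification.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical helper, not part of Mahout: a straightforward loop over all
// elements, with dense arrays standing in for Vectors.
final class AggregateSketch {
  private AggregateSketch() {}

  // r0 = fc(x0, y0), ri = fa(r_{i-1}, fc(xi, yi)); returns the last ri.
  static double aggregate(double[] x, double[] y,
                          DoubleBinaryOperator fa,
                          DoubleBinaryOperator fc) {
    if (x.length != y.length) {
      throw new IllegalArgumentException("x and y must have the same size");
    }
    if (x.length == 0) {
      throw new IllegalArgumentException("empty input is not handled by this sketch");
    }
    double result = fc.applyAsDouble(x[0], y[0]);
    for (int i = 1; i < x.length; i++) {
      result = fa.applyAsDouble(result, fc.applyAsDouble(x[i], y[i]));
    }
    return result;
  }
}
```

For example, with fa as addition and fc as multiplication, aggregate(new double[] {1, 2, 3}, new double[] {4, 5, 6}, fa, fc) computes the dot product 32.0.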


getBestOperation

public static VectorBinaryAggregate getBestOperation(Vector x,
                                                     Vector y,
                                                     DoubleDoubleFunction fa,
                                                     DoubleDoubleFunction fc)
The best operation is the least expensive valid one.
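
The selection rule "least expensive valid one" could look roughly like the sketch below. It is an assumption about the shape of the logic, not the Mahout source: the Candidate interface, the Fixed class, and the best() method are hypothetical, and the real isValid() and estimateCost() take (x, y, fa, fc) rather than no arguments.

```java
import java.util.List;

// Hypothetical sketch of picking the cheapest valid algorithm.
final class BestOperationSketch {
  private BestOperationSketch() {}

  // Stand-in for a VectorBinaryAggregate with its arguments already bound.
  interface Candidate {
    String name();
    boolean isValid();
    double estimateCost();
  }

  // Trivial Candidate with a fixed validity flag and cost, for demonstration.
  static final class Fixed implements Candidate {
    private final String name;
    private final boolean valid;
    private final double cost;

    Fixed(String name, boolean valid, double cost) {
      this.name = name;
      this.valid = valid;
      this.cost = cost;
    }

    @Override public String name() { return name; }
    @Override public boolean isValid() { return valid; }
    @Override public double estimateCost() { return cost; }
  }

  // Returns the valid candidate with the lowest estimated cost.
  // Cost is only estimated for valid candidates.
  static Candidate best(List<Candidate> operations) {
    Candidate best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (Candidate op : operations) {
      if (!op.isValid()) {
        continue;
      }
      double cost = op.estimateCost();
      if (cost < bestCost) {
        best = op;
        bestCost = cost;
      }
    }
    if (best == null) {
      throw new IllegalStateException("no valid operation");
    }
    return best;
  }
}
```

Checking validity before estimating cost matches the note on estimateCost() that the algorithm is assumed to be valid.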


aggregateBest

public static double aggregateBest(Vector x,
                                   Vector y,
                                   DoubleDoubleFunction fa,
                                   DoubleDoubleFunction fc)
This is the method that should be used when aggregating. It selects the best algorithm and applies it.



Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.