org.apache.mahout.math.neighborhood
Class LocalitySensitiveHashSearch

java.lang.Object
  extended by org.apache.mahout.math.neighborhood.Searcher
      extended by org.apache.mahout.math.neighborhood.UpdatableSearcher
          extended by org.apache.mahout.math.neighborhood.LocalitySensitiveHashSearch
All Implemented Interfaces:
Iterable<Vector>

public class LocalitySensitiveHashSearch
extends UpdatableSearcher

Implements a Searcher that uses locality-sensitive hashing as a first-pass approximation to estimate distance without floating-point math. The key feature of this implementation is an adaptive cutoff on the bitwise distance. Because the cutoff is adaptive, only a single pass through the data is needed.
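The idea of an adaptive bitwise cutoff can be illustrated with a small standalone sketch. This is not the Mahout implementation: the class name AdaptiveHashCutoffSketch, the use of 64 random hyperplanes (sign random projection), the Euclidean metric, and the rule "tighten the cutoff to the widest bit distance among the current best candidates" are all assumptions made for illustration. The result is approximate by design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical illustration only; names and the cutoff rule are assumptions.
final class AdaptiveHashCutoffSketch {
    private static final int BITS = 64;
    private final double[][] projections;              // one random hyperplane per hash bit
    private final List<double[]> vectors = new ArrayList<>();
    private final List<Long> hashes = new ArrayList<>();
    private int evaluations = 0;                       // number of true distance computations

    AdaptiveHashCutoffSketch(int dimension, long seed) {
        Random random = new Random(seed);
        projections = new double[BITS][dimension];
        for (double[] row : projections) {
            for (int i = 0; i < dimension; i++) {
                row[i] = random.nextGaussian();
            }
        }
    }

    // Each bit records on which side of a hyperplane the vector lies.
    private long hash(double[] v) {
        long bits = 0L;
        for (int b = 0; b < BITS; b++) {
            double dot = 0.0;
            for (int i = 0; i < v.length; i++) {
                dot += projections[b][i] * v[i];
            }
            if (dot > 0.0) {
                bits |= 1L << b;
            }
        }
        return bits;
    }

    void add(double[] v) {
        vectors.add(v);
        hashes.add(hash(v));
    }

    int evaluations() {
        return evaluations;
    }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the indices of up to `limit` candidates, nearest first.
    List<Integer> search(double[] query, int limit) {
        long queryHash = hash(query);
        // Max-heap on true distance; entries are {distance, index, bitDistance}.
        PriorityQueue<double[]> best =
            new PriorityQueue<>((x, y) -> Double.compare(y[0], x[0]));
        int cutoff = BITS;                              // accept everything at first
        for (int i = 0; i < vectors.size(); i++) {
            // Integer-only pre-filter: Hamming distance between the two hashes.
            int bitDistance = Long.bitCount(queryHash ^ hashes.get(i));
            if (bitDistance > cutoff) {
                continue;                               // skipped without floating-point math
            }
            evaluations++;
            best.add(new double[] {euclidean(query, vectors.get(i)), i, bitDistance});
            if (best.size() > limit) {
                best.poll();                            // drop the current farthest candidate
            }
            if (best.size() == limit) {
                // Adapt the cutoff during the same pass over the data.
                int widest = 0;
                for (double[] entry : best) {
                    widest = Math.max(widest, (int) entry[2]);
                }
                cutoff = widest;
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!best.isEmpty()) {
            result.add((int) best.poll()[1]);
        }
        Collections.reverse(result);                    // heap yields farthest first
        return result;
    }
}
```

In this sketch the evaluations() counter tracks how many true distances were computed, which is the kind of quantity resetEvaluationCount() is documented to report. A tighter cutoff means fewer evaluations but a higher chance of missing a true neighbor.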


Field Summary
 
Fields inherited from class org.apache.mahout.math.neighborhood.Searcher
distanceMeasure
 
Constructor Summary
LocalitySensitiveHashSearch(DistanceMeasure distanceMeasure, int searchSize)
           
 
Method Summary
 void add(Vector vector)
          Add a new Vector to the Searcher that will be checked when getting the nearest neighbors.
 void clear()
           
 int getSearchSize()
           
 Iterator<Vector> iterator()
           
 boolean remove(Vector v, double epsilon)
           
protected static WeightedThing<Vector> removeHash(WeightedThing<Vector> input)
           
 int resetEvaluationCount()
          This is only for testing.
 List<WeightedThing<Vector>> search(Vector query, int limit)
          When querying the Searcher for the closest vectors, a list of WeightedThings is returned.
 WeightedThing<Vector> searchFirst(Vector query, boolean differentThanQuery)
          Returns the closest vector to the query.
 void setRaiseHashLimitStrategy(double strategy)
           
 void setSearchSize(int size)
           
 int size()
          Returns the number of WeightedVectors being searched for nearest neighbors.
 
Methods inherited from class org.apache.mahout.math.neighborhood.Searcher
addAll, addAllMatrixSlices, addAllMatrixSlicesAsWeightedVectors, getCandidateQueue, getDistanceMeasure, search, searchFirst
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LocalitySensitiveHashSearch

public LocalitySensitiveHashSearch(DistanceMeasure distanceMeasure,
                                   int searchSize)
Method Detail

search

public List<WeightedThing<Vector>> search(Vector query,
                                          int limit)
Description copied from class: Searcher
When querying the Searcher for the closest vectors, a list of WeightedThings is returned. The value of each WeightedThing is the neighbor, and the weight is the distance between the query and that neighbor (computed by the metric of the concrete implementation). The actual type of the vector in the pair is the same as that of the vector added to the Searcher.

Specified by:
search in class Searcher
Parameters:
query - the vector to search for
limit - the number of results to return
Returns:
the list of weighted vectors closest to the query
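The shape of the result (value is the neighbor, weight is the distance, and the stored vector itself is returned) can be sketched with an exhaustive search in plain Java. The class WeightedResultSketch and its nested Weighted type are hypothetical stand-ins for WeightedThing<Vector>, and Euclidean distance is assumed in place of a configured DistanceMeasure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
final class WeightedResultSketch {
    // Stand-in for WeightedThing<Vector>: a value plus a weight.
    static final class Weighted {
        final double[] value;   // the neighbor, same object that was stored
        final double weight;    // distance between the query and the neighbor

        Weighted(double[] value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns at most `limit` stored vectors, closest to the query first.
    static List<Weighted> search(List<double[]> stored, double[] query, int limit) {
        List<Weighted> all = new ArrayList<>();
        for (double[] v : stored) {
            all.add(new Weighted(v, euclidean(query, v)));
        }
        all.sort(Comparator.comparingDouble((Weighted w) -> w.weight));
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }
}
```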

searchFirst

public WeightedThing<Vector> searchFirst(Vector query,
                                         boolean differentThanQuery)
Returns the closest vector to the query. When only the single nearest vector is needed, use this method rather than search(query, limit), because it is faster (less overhead). Otherwise it behaves nearly the same as search().

Specified by:
searchFirst in class Searcher
Parameters:
query - the vector to search for
differentThanQuery - if true, returns the closest vector that is different from the query (this only matters if the query is among the searched vectors); otherwise, returns the closest vector to the query (even if it is the query itself).
Returns:
the weighted vector closest to the query
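The effect of the differentThanQuery flag can be shown with a minimal exhaustive sketch. The class SearchFirstSketch is hypothetical, and two assumptions are made: "different" means element-wise inequality, and squared Euclidean distance stands in for the configured DistanceMeasure. The method returns the index of the chosen vector, or -1 if nothing qualifies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
final class SearchFirstSketch {
    static int searchFirst(List<double[]> stored, double[] query, boolean differentThanQuery) {
        int bestIndex = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < stored.size(); i++) {
            double[] candidate = stored.get(i);
            // Assumption: a candidate equal to the query element by element is skipped.
            if (differentThanQuery && Arrays.equals(candidate, query)) {
                continue;
            }
            double distance = 0.0;
            for (int j = 0; j < query.length; j++) {
                double d = candidate[j] - query[j];
                distance += d * d;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

With the query itself among the stored vectors, passing false selects the query's own entry, while passing true selects its nearest distinct neighbor.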

removeHash

protected static WeightedThing<Vector> removeHash(WeightedThing<Vector> input)

add

public void add(Vector vector)
Description copied from class: Searcher
Add a new Vector to the Searcher that will be checked when getting the nearest neighbors. The vector IS NOT CLONED. Do not modify the vector externally; otherwise the internal Searcher data structures could be invalidated.

Specified by:
add in class Searcher
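Why an uncloned vector must not be modified after add can be seen in a small standalone sketch. The class NoCloneSketch is hypothetical; it caches a norm at insertion time as a stand-in for whatever derived data a real Searcher keeps (for example, a hash), which is an assumption about the kind of state involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
final class NoCloneSketch {
    private final List<double[]> vectors = new ArrayList<>();
    private final List<Double> cachedNorms = new ArrayList<>();

    void add(double[] v) {
        vectors.add(v);             // stores the reference, not a copy
        cachedNorms.add(norm(v));   // derived data computed once, at insertion
    }

    // True while the cached value still matches the stored vector.
    boolean isConsistent(int index) {
        return Double.compare(cachedNorms.get(index), norm(vectors.get(index))) == 0;
    }

    static double norm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }
}
```

A caller who needs to keep mutating a vector can pass a copy instead (here, shared.clone()), so that later changes do not reach the stored data.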

size

public int size()
Description copied from class: Searcher
Returns the number of WeightedVectors being searched for nearest neighbors.

Specified by:
size in class Searcher

getSearchSize

public int getSearchSize()

setSearchSize

public void setSearchSize(int size)

setRaiseHashLimitStrategy

public void setRaiseHashLimitStrategy(double strategy)

resetEvaluationCount

public int resetEvaluationCount()
This is only for testing.

Returns:
the number of times the actual distance between two vectors was computed.

iterator

public Iterator<Vector> iterator()

remove

public boolean remove(Vector v,
                      double epsilon)
Specified by:
remove in class UpdatableSearcher

clear

public void clear()
Specified by:
clear in class UpdatableSearcher


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.