org.apache.mahout.utils.vectors.lucene
Class LuceneIterator
java.lang.Object
com.google.common.collect.UnmodifiableIterator<T>
com.google.common.collect.AbstractIterator<Vector>
org.apache.mahout.utils.vectors.lucene.AbstractLuceneIterator
org.apache.mahout.utils.vectors.lucene.LuceneIterator
- All Implemented Interfaces:
- Iterator<Vector>
public class LuceneIterator
- extends AbstractLuceneIterator
An Iterator over Vectors that uses a Lucene index as the source for creating the Vectors. The field used to create the vectors currently must have term vectors stored for it.
Fields inherited from class org.apache.mahout.utils.vectors.lucene.AbstractLuceneIterator:
bump, field, indexReader, maxErrorDocs, nextDocId, nextLogRecord, normPower, numErrorDocs, skippedErrorMessages, terminfo, weight
Constructor Summary
LuceneIterator(org.apache.lucene.index.IndexReader indexReader,
String idField,
String field,
TermInfo termInfo,
Weight weight,
double normPower)
Produce a LuceneIterator that creates the Vectors and, unless normPower is LuceneIterable.NO_NORMALIZING, normalizes them.
LuceneIterator(org.apache.lucene.index.IndexReader indexReader,
String idField,
String field,
TermInfo termInfo,
Weight weight,
double normPower,
double maxPercentErrorDocs)
As above, but tolerating at most the given fraction of documents without a term frequency vector.
Method Summary
protected String getVectorName(int documentIndex)
Given the document index, derive a name for the vector.
Methods inherited from class com.google.common.collect.AbstractIterator:
endOfData, hasNext, next, peek
Methods inherited from class com.google.common.collect.UnmodifiableIterator:
remove
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
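The inherited hasNext, next, peek, endOfData and remove methods follow Guava's AbstractIterator contract. A subclass computes one element at a time and calls endOfData() when the source is exhausted. The sketch below is a simplified, dependency-free illustration of that contract. It is not Guava's source, and `MiniAbstractIterator` and `DocIndexIterator` are hypothetical names. Guava also tracks a failure state, which is omitted here.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Simplified illustration of the computeNext()/endOfData() protocol (not Guava's code). */
abstract class MiniAbstractIterator<T> implements Iterator<T> {
    private enum State { READY, NOT_READY, DONE }

    private State state = State.NOT_READY;
    private T nextValue;

    /** Returns the next element, or the result of endOfData() when exhausted. */
    protected abstract T computeNext();

    /** Marks the iteration as finished; the null return value is never handed to callers. */
    protected final T endOfData() {
        state = State.DONE;
        return null;
    }

    @Override
    public final boolean hasNext() {
        if (state == State.DONE) {
            return false;
        }
        if (state == State.READY) {
            return true; // an element is already buffered
        }
        T value = computeNext();
        if (state == State.DONE) {
            return false; // computeNext() called endOfData()
        }
        nextValue = value;
        state = State.READY;
        return true;
    }

    @Override
    public final T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        state = State.NOT_READY; // consume the buffered element
        T value = nextValue;
        nextValue = null;
        return value;
    }

    /** Returns the next element without consuming it. */
    public final T peek() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return nextValue;
    }

    /** Mirrors UnmodifiableIterator: removal is unsupported. */
    @Override
    public final void remove() {
        throw new UnsupportedOperationException();
    }
}

/** Hypothetical subclass yielding document indices 0..count-1. */
class DocIndexIterator extends MiniAbstractIterator<Integer> {
    private final int count;
    private int current;

    DocIndexIterator(int count) {
        this.count = count;
    }

    @Override
    protected Integer computeNext() {
        if (current >= count) {
            return endOfData();
        }
        return current++;
    }
}
```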
idFieldSelector
protected final Set<String> idFieldSelector
idField
protected final String idField
LuceneIterator
public LuceneIterator(org.apache.lucene.index.IndexReader indexReader,
String idField,
String field,
TermInfo termInfo,
Weight weight,
double normPower)
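- Produce a LuceneIterator that creates the Vectors and, unless normPower is LuceneIterable.NO_NORMALIZING, normalizes them.
- Parameters:
indexReader - the IndexReader to read the documents from.
idField - the field containing the document id. May be null.
field - the field to use for the Vector. It must have term vectors stored.
termInfo - the TermInfo that maps the field's terms to vector entries.
weight - the Weight used to compute each term's value (for example, TF or TF-IDF).
normPower - the norm power used to normalize each Vector. Must be non-negative, or LuceneIterable.NO_NORMALIZING to skip normalization.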
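The normPower argument selects which p-norm each Vector is divided by. The following dependency-free sketch illustrates the idea under stated assumptions. `NormSketch` is hypothetical, its `NO_NORMALIZING` value of -1.0 is this sketch's own sentinel, and treating p = 0 as "count of non-zero entries" is a convention chosen here, not something this page specifies.

```java
import java.util.Arrays;

/** Hypothetical sketch of p-norm normalization driven by a normPower argument. */
public final class NormSketch {

    /** Sentinel meaning "do not normalize"; the value -1.0 is an assumption of this sketch. */
    public static final double NO_NORMALIZING = -1.0;

    private NormSketch() {}

    /** p-norm of v; p == 0 counts non-zero entries, p == +Infinity takes the largest magnitude. */
    public static double norm(double[] v, double p) {
        if (p == 0.0) {
            return Arrays.stream(v).filter(x -> x != 0.0).count();
        }
        if (Double.isInfinite(p)) {
            return Arrays.stream(v).map(Math::abs).max().orElse(0.0);
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    /** Returns a normalized copy of v, or v itself when normPower is NO_NORMALIZING. */
    public static double[] normalize(double[] v, double normPower) {
        if (normPower == NO_NORMALIZING) {
            return v;
        }
        if (normPower < 0 || Double.isNaN(normPower)) {
            throw new IllegalArgumentException("normPower must be non-negative or NO_NORMALIZING");
        }
        double n = norm(v, normPower);
        if (n == 0.0) {
            return v.clone(); // all-zero vector: nothing to scale
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / n;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {3.0, 4.0};
        System.out.println(Arrays.toString(normalize(v, 2.0))); // Euclidean (L2)
        System.out.println(Arrays.toString(normalize(v, 1.0))); // L1
        System.out.println(Arrays.toString(normalize(v, NO_NORMALIZING)));
    }
}
```

A power of 2 gives unit-length vectors, which is the usual choice when documents are later compared by cosine similarity.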
LuceneIterator
public LuceneIterator(org.apache.lucene.index.IndexReader indexReader,
String idField,
String field,
TermInfo termInfo,
Weight weight,
double normPower,
double maxPercentErrorDocs)
- Parameters:
indexReader - the IndexReader to read the documents from.
idField - the field containing the document id. May be null.
field - the field to use for the Vector. It must have term vectors stored.
termInfo - the TermInfo that maps the field's terms to vector entries.
weight - the Weight used to compute each term's value (for example, TF or TF-IDF).
normPower - the norm power used to normalize each Vector. Must be non-negative, or LuceneIterable.NO_NORMALIZING to skip normalization.
maxPercentErrorDocs - the maximum fraction of documents that will be tolerated without a term frequency vector, in the range [0, 1].
- See Also:
LuceneIterator(org.apache.lucene.index.IndexReader, String, String, org.apache.mahout.utils.vectors.TermInfo, org.apache.mahout.vectorizer.Weight, double)
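The maxPercentErrorDocs fraction caps how many documents lacking a term frequency vector are skipped before iteration gives up. The inherited maxErrorDocs and numErrorDocs fields suggest a count checked against a limit. The sketch below shows that pattern, but this page does not state the exact threshold arithmetic or the exception thrown. `ErrorToleranceSketch` is hypothetical, and its rounding and its `IllegalStateException` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a maxPercentErrorDocs-style limit: documents without
 * a term vector are skipped until their count exceeds a fraction of the index.
 */
public final class ErrorToleranceSketch {

    private final int maxErrorDocs; // absolute limit derived from the fraction (assumed rounding)
    private int numErrorDocs;

    public ErrorToleranceSketch(int numDocs, double maxPercentErrorDocs) {
        if (maxPercentErrorDocs < 0.0 || maxPercentErrorDocs > 1.0) {
            throw new IllegalArgumentException("maxPercentErrorDocs must be in [0, 1]");
        }
        this.maxErrorDocs = (int) (maxPercentErrorDocs * numDocs);
    }

    /** Returns indices of documents with a term vector; a null entry models a missing one. */
    public List<Integer> process(List<int[]> termVectors) {
        List<Integer> kept = new ArrayList<>();
        for (int doc = 0; doc < termVectors.size(); doc++) {
            if (termVectors.get(doc) == null) {
                numErrorDocs++;
                if (numErrorDocs > maxErrorDocs) {
                    throw new IllegalStateException(numErrorDocs
                        + " documents lack a term vector; at most " + maxErrorDocs + " tolerated");
                }
                continue; // skip this document and keep going
            }
            kept.add(doc);
        }
        return kept;
    }
}
```

With 4 documents and a fraction of 0.25, one missing term vector is skipped silently, while a second one would stop the iteration.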
getVectorName
protected String getVectorName(int documentIndex)
throws IOException
- Description copied from class: AbstractLuceneIterator
- Given the document index, derive a name for the vector. This may involve
reading the document from Lucene and setting up any other state that the
subclass wants. This is called once for each document that the
iterator processes.
- Specified by:
getVectorName in class AbstractLuceneIterator
- Parameters:
documentIndex - the Lucene document index.
- Returns:
the name to store in the vector.
- Throws:
IOException - if the document cannot be read from the index.
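Since idField may be null, a natural naming rule is to use the id field's value when one is configured and fall back to the document index otherwise. The sketch below illustrates that rule under assumptions. `VectorNameSketch` is hypothetical, and a list of field maps stands in for the Lucene IndexReader. It also falls back to the index when a document lacks the id field, which is a choice of this sketch and not something this page states about LuceneIterator.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of getVectorName-style naming. A List of field maps
 * stands in for the Lucene IndexReader.
 */
public final class VectorNameSketch {

    private final List<Map<String, String>> documents;
    private final String idField; // may be null, like the constructor argument

    public VectorNameSketch(List<Map<String, String>> documents, String idField) {
        this.documents = documents;
        this.idField = idField;
    }

    /** Uses the id field's value when available, otherwise the document index as a string. */
    public String getVectorName(int documentIndex) {
        if (idField == null) {
            return String.valueOf(documentIndex);
        }
        String id = documents.get(documentIndex).get(idField);
        return id != null ? id : String.valueOf(documentIndex); // fallback is an assumption
    }
}
```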
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.