org.apache.mahout.text.wikipedia
Class XmlInputFormat.XmlRecordReader
java.lang.Object
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
org.apache.mahout.text.wikipedia.XmlInputFormat.XmlRecordReader
- All Implemented Interfaces:
- Closeable
- Enclosing class:
- XmlInputFormat
public static class XmlInputFormat.XmlRecordReader
- extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
XmlRecordReader reads through a given XML document and emits each XML block as one record,
where a block is delimited by the configured start tag and end tag.
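The tag-delimited splitting described above can be illustrated with a minimal, Hadoop-free sketch. It is not the Mahout implementation: the class name TagBlockSketch and the method extractBlocks are hypothetical, and the input is assumed to fit in an in-memory String rather than arriving as a FileSplit byte stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.apache.mahout.text.wikipedia.
public class TagBlockSketch {

    /**
     * Returns every substring that begins with startTag and ends with the
     * first endTag that follows it. Text outside those blocks is skipped.
     */
    public static List<String> extractBlocks(String xml, String startTag, String endTag) {
        List<String> blocks = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(startTag, from);
            if (start < 0) {
                break; // no further start tag
            }
            int end = xml.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated block: dropped in this sketch (an assumption)
            }
            int stop = end + endTag.length();
            blocks.add(xml.substring(start, stop));
            from = stop; // resume scanning after the block just emitted
        }
        return blocks;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki><page><title>A</title></page>\n"
                   + "<page><title>B</title></page></mediawiki>";
        // Each <page>...</page> block is printed as one record.
        for (String block : extractBlocks(xml, "<page>", "</page>")) {
            System.out.println(block);
        }
    }
}
```

The sketch keeps the delimiting tags inside each emitted block, which matches the description of records as whole XML blocks; how the real class treats a block that runs past the end of its split is not covered here.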
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
XmlInputFormat.XmlRecordReader
public XmlInputFormat.XmlRecordReader(org.apache.hadoop.mapreduce.lib.input.FileSplit split,
org.apache.hadoop.conf.Configuration conf)
throws IOException
- Throws:
IOException
close
public void close()
throws IOException
- Specified by:
close
in interface Closeable
- Specified by:
close
in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
getProgress
public float getProgress()
throws IOException
- Specified by:
getProgress
in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
getCurrentKey
public org.apache.hadoop.io.LongWritable getCurrentKey()
throws IOException,
InterruptedException
- Specified by:
getCurrentKey
in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
getCurrentValue
public org.apache.hadoop.io.Text getCurrentValue()
throws IOException,
InterruptedException
- Specified by:
getCurrentValue
in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
initialize
public void initialize(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
throws IOException,
InterruptedException
- Specified by:
initialize
in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
nextKeyValue
public boolean nextKeyValue()
throws IOException,
InterruptedException
- Specified by:
nextKeyValue
in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
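The methods above follow the usual RecordReader calling pattern: advance with nextKeyValue() until it returns false, read the current pair with getCurrentKey() and getCurrentValue(), and call close() when done. The sketch below shows that loop against a hypothetical in-memory stand-in (InMemoryBlockReader) that only mirrors the method names. It does not use Hadoop types, and its choice of key (the character offset where the block starts) and its progress formula are assumptions, not the documented behavior of XmlRecordReader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; mirrors RecordReader method names without Hadoop.
public class ReaderLoopSketch {

    /** In-memory stand-in for a tag-delimited record reader. */
    static class InMemoryBlockReader implements AutoCloseable {
        private final String data;
        private final String startTag;
        private final String endTag;
        private int pos = 0;       // scan position within data
        private long key = -1;     // assumed key: offset of the current block's start
        private String value = null;

        InMemoryBlockReader(String data, String startTag, String endTag) {
            this.data = data;
            this.startTag = startTag;
            this.endTag = endTag;
        }

        /** Advances to the next block; returns false when none is left. */
        boolean nextKeyValue() {
            int start = data.indexOf(startTag, pos);
            int end = start < 0 ? -1 : data.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                pos = data.length(); // input exhausted
                return false;
            }
            key = start;
            pos = end + endTag.length();
            value = data.substring(start, pos);
            return true;
        }

        long getCurrentKey() { return key; }

        String getCurrentValue() { return value; }

        /** Fraction of the input consumed so far, between 0 and 1. */
        float getProgress() {
            return data.isEmpty() ? 1.0f : pos / (float) data.length();
        }

        @Override
        public void close() { /* nothing to release in memory */ }
    }

    /** Drains the reader and returns "key:value" strings for every record. */
    public static List<String> drain(String data, String startTag, String endTag) {
        List<String> out = new ArrayList<>();
        try (InMemoryBlockReader reader = new InMemoryBlockReader(data, startTag, endTag)) {
            while (reader.nextKeyValue()) {
                out.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String record : drain("<r><doc>a</doc><doc>b</doc></r>", "<doc>", "</doc>")) {
            System.out.println(record);
        }
    }
}
```

In a real MapReduce job the framework drives this loop itself and calls initialize(InputSplit, TaskAttemptContext) first, so user code rarely invokes these methods directly.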
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.