org.apache.mahout.text.wikipedia
Class XmlInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
          extended by org.apache.hadoop.mapreduce.lib.input.TextInputFormat
              extended by org.apache.mahout.text.wikipedia.XmlInputFormat

public class XmlInputFormat
extends org.apache.hadoop.mapreduce.lib.input.TextInputFormat

Reads records that are delimited by a specific begin/end tag.
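
The idea of tag-delimited records can be illustrated with a minimal, self-contained sketch in plain Java. It is not Mahout's implementation and does not use Hadoop. The class name TagDelimitedScanner, the method readRecords, and the sample tags are hypothetical names chosen for illustration. The sketch works on an in-memory byte array, whereas the real reader operates on an input split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a stand-in for the idea, not Mahout's XmlRecordReader. */
final class TagDelimitedScanner {

    /** Returns every block of data that begins with startTag and ends with endTag. */
    static List<String> readRecords(byte[] data, String startTag, String endTag) {
        if (startTag.isEmpty() || endTag.isEmpty()) {
            throw new IllegalArgumentException("start and end tags must be non-empty");
        }
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int begin = indexOf(data, start, pos);
            if (begin < 0) {
                break; // no further start tag
            }
            int close = indexOf(data, end, begin + start.length);
            if (close < 0) {
                break; // unterminated record: dropped in this sketch
            }
            int stop = close + end.length;
            // The record includes both delimiting tags.
            records.add(new String(data, begin, stop - begin, StandardCharsets.UTF_8));
            pos = stop;
        }
        return records;
    }

    /** Naive byte-pattern search starting at from; returns -1 when the pattern is absent. */
    private static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = from; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki><page>first</page>\n<page>second</page></mediawiki>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        for (String record : readRecords(bytes, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Text outside the tag pair (here the enclosing mediawiki element and the newline) is skipped, so each record is one complete tagged block. How the actual class treats records that cross split boundaries is not covered by this sketch.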


Nested Class Summary
static class XmlInputFormat.XmlRecordReader
          XmlRecordReader reads through a given XML document and outputs each XML block delimited by the specified start tag and end tag as a record.
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
 
Field Summary
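static String END_TAG_KEY
          Configuration key for the end tag that closes a record.
static String START_TAG_KEY
          Configuration key for the start tag that opens a record.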
 
Constructor Summary
XmlInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.TextInputFormat
isSplitable
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

START_TAG_KEY

public static final String START_TAG_KEY
See Also:
Constant Field Values

END_TAG_KEY

public static final String END_TAG_KEY
See Also:
Constant Field Values

Constructor Detail

XmlInputFormat

public XmlInputFormat()

Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
        org.apache.hadoop.mapreduce.TaskAttemptContext context)
Overrides:
createRecordReader in class org.apache.hadoop.mapreduce.lib.input.TextInputFormat


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.