org.apache.mahout.text
Class SequenceFilesFromMailArchives
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.text.SequenceFilesFromMailArchives
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public final class SequenceFilesFromMailArchives
- extends AbstractJob
Converts a directory of gzipped mail archives into SequenceFiles of specified
chunkSize. This class is similar to SequenceFilesFromDirectory, except that
it uses block-compressed SequenceFiles and parses out the subject and
body text of each mail message into a separate key/value pair.
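As a rough illustration of the "one key/value pair per message" idea described above, the following is a minimal, self-contained sketch in plain Java. It is not Mahout's parser: the class name MboxSketch, the method toPairs, the key scheme (prefix plus message index) and the way subject and body are joined are all assumptions made for this example. It also assumes mbox-style input in which every message starts with a line beginning "From ".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MboxSketch {

    /**
     * Splits mbox-style text into one entry per message.
     * Key: keyPrefix + message index (hypothetical scheme, not Mahout's).
     * Value: subject + separator + body.
     * Caveat: a body line that itself starts with "From " would be treated
     * as a new message; real mbox files escape such lines.
     */
    static Map<String, String> toPairs(String mbox, String keyPrefix, String separator) {
        Map<String, String> pairs = new LinkedHashMap<>();
        String subject = "";
        StringBuilder body = new StringBuilder();
        boolean inMessage = false;
        boolean inHeaders = false;
        int index = 0;

        for (String line : mbox.split("\n", -1)) {
            if (line.startsWith("From ")) {
                // A new message begins: flush the previous one, if any.
                if (inMessage) {
                    pairs.put(keyPrefix + index, subject + separator + body.toString().trim());
                    index++;
                }
                inMessage = true;
                inHeaders = true;
                subject = "";
                body.setLength(0);
            } else if (!inMessage) {
                // Ignore anything before the first message delimiter.
                continue;
            } else if (inHeaders) {
                if (line.isEmpty()) {
                    inHeaders = false; // a blank line ends the header block
                } else if (line.startsWith("Subject:")) {
                    subject = line.substring("Subject:".length()).trim();
                }
            } else {
                body.append(line).append('\n');
            }
        }
        // Flush the last message.
        if (inMessage) {
            pairs.put(keyPrefix + index, subject + separator + body.toString().trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String mbox = "From a@example.org Mon\nSubject: Hi\n\nHello\nWorld\n"
                    + "From b@example.org Tue\nSubject: Re: Hi\n\nBye\n";
        toPairs(mbox, "msg-", "\n")
            .forEach((k, v) -> System.out.println(k + " => " + v.replace("\n", " | ")));
    }
}
```

The actual class additionally handles gzipped input, character sets and quoted-text stripping (see the option fields below), none of which this sketch attempts.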
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CHUNK_SIZE_OPTION
public static final String[] CHUNK_SIZE_OPTION
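The chunk size mentioned in the class description bounds how much data goes into each output SequenceFile. The sketch below shows one plausible way such chunking can work: start a new chunk whenever adding the next value would exceed the limit. This is an assumption for illustration only; the class name ChunkingSketch, the method chunk, and the byte-counting rule are hypothetical and do not reproduce Mahout's writer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkingSketch {

    /**
     * Groups values into chunks whose UTF-8 size stays within maxChunkBytes.
     * A single value larger than the limit still gets a chunk of its own.
     * (Hypothetical rule; the real writer may count bytes differently.)
     */
    static List<List<String>> chunk(List<String> values, long maxChunkBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;

        for (String value : values) {
            long size = value.getBytes(StandardCharsets.UTF_8).length;
            // Roll over to a new chunk before the limit would be exceeded.
            if (!current.isEmpty() && currentBytes + size > maxChunkBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(value);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> values = List.of("aaaa", "bbbb", "cc", "dddddddd");
        System.out.println(chunk(values, 8));
    }
}
```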
KEY_PREFIX_OPTION
public static final String[] KEY_PREFIX_OPTION
CHARSET_OPTION
public static final String[] CHARSET_OPTION
SUBJECT_OPTION
public static final String[] SUBJECT_OPTION
TO_OPTION
public static final String[] TO_OPTION
FROM_OPTION
public static final String[] FROM_OPTION
REFERENCES_OPTION
public static final String[] REFERENCES_OPTION
BODY_OPTION
public static final String[] BODY_OPTION
STRIP_QUOTED_OPTION
public static final String[] STRIP_QUOTED_OPTION
QUOTED_REGEX_OPTION
public static final String[] QUOTED_REGEX_OPTION
SEPARATOR_OPTION
public static final String[] SEPARATOR_OPTION
BODY_SEPARATOR_OPTION
public static final String[] BODY_SEPARATOR_OPTION
BASE_INPUT_PATH
public static final String BASE_INPUT_PATH
- See Also:
- Constant Field Values
SequenceFilesFromMailArchives
public SequenceFilesFromMailArchives()
createSequenceFiles
public void createSequenceFiles(MailOptions options)
throws IOException
main
public static void main(String[] args)
throws Exception
run
public int run(String[] args)
throws Exception
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.