org.apache.mahout.utils.email
Class MailProcessor
java.lang.Object
org.apache.mahout.utils.email.MailProcessor
public class MailProcessor
- extends Object
Converts an mbox mail archive into a group of Hadoop Sequence Files of equal size. The archive may optionally be
gzipped or zipped.
See also: org.apache.mahout.text.SequenceFilesFromMailArchives
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SUBJECT_PREFIX
public static final Pattern SUBJECT_PREFIX
FROM_PREFIX
public static final Pattern FROM_PREFIX
REFS_PREFIX
public static final Pattern REFS_PREFIX
TO_PREFIX
public static final Pattern TO_PREFIX
MailProcessor
public MailProcessor(MailOptions options,
String prefix,
Writer writer)
- Creates a MailProcessor that writes to a single text file instead of sequence files.
This constructor is intended for debugging and testing.
MailProcessor
public MailProcessor(MailOptions options,
String prefix,
ChunkedWriter writer)
- This is the main constructor of MailProcessor.
parseMboxLineByLine
public long parseMboxLineByLine(File mboxFile)
throws IOException
- Parses one complete mail archive, writing output to the writer passed to the constructor.
- Parameters:
mboxFile - mail archive to parse
- Returns:
- number of parsed mails
- Throws:
IOException
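The general idea of parsing an mbox archive line by line can be illustrated with a minimal, self-contained sketch. It is not the MailProcessor implementation: the class MboxSketch, the method parseSubjects, and both regular expressions are hypothetical and only assume the common mbox convention that each message begins with a line starting with "From " and that headers end at the first blank line. The actual values of SUBJECT_PREFIX and the other pattern fields are not shown on this page and may differ.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Mahout.
final class MboxSketch {

    // Assumption: a new message starts at a line beginning with "From " (mbox convention).
    private static final Pattern MESSAGE_START = Pattern.compile("^From \\S+.*$");

    // Assumption: illustrative header pattern, not necessarily MailProcessor.SUBJECT_PREFIX.
    private static final Pattern SUBJECT = Pattern.compile("^Subject: (.*)$");

    /**
     * Reads the source line by line and returns one subject per message.
     * Messages without a Subject header contribute an empty string,
     * so the size of the list equals the number of parsed messages.
     */
    static List<String> parseSubjects(Reader source) {
        List<String> subjects = new ArrayList<>();
        boolean inMessage = false;  // true once the first "From " line was seen
        boolean inHeaders = false;  // true until the blank line that ends the headers
        String subject = "";

        Iterator<String> lines = new BufferedReader(source).lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            if (MESSAGE_START.matcher(line).matches()) {
                if (inMessage) {
                    subjects.add(subject);  // flush the previous message
                }
                inMessage = true;
                inHeaders = true;
                subject = "";
            } else if (inMessage && inHeaders) {
                if (line.isEmpty()) {
                    inHeaders = false;      // blank line: the body starts here
                } else {
                    Matcher m = SUBJECT.matcher(line);
                    if (m.matches()) {
                        subject = m.group(1);
                    }
                }
            }
            // Body lines are ignored in this sketch.
        }
        if (inMessage) {
            subjects.add(subject);          // flush the last message
        }
        return subjects;
    }
}
```

In this sketch the message count corresponds to the size of the returned list, loosely mirroring the "number of parsed mails" return value described above. Body lines that begin with "From " are assumed to be escaped (for example as ">From "), as many mbox writers do.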
generateKey
protected static String generateKey(File mboxFile,
String prefix,
String messageId)
getPrefix
public String getPrefix()
getOptions
public MailOptions getOptions()
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.