java.lang.Object
  org.apache.mahout.fpm.pfpgrowth.PFPGrowth
public final class PFPGrowth
Parallel FP Growth Driver Class. Runs each stage of PFPGrowth as described in the paper http://infolab.stanford.edu/~echang/recsys08-69.pdf
Field Summary | |
---|---|
static String | ENCODING |
static String | F_LIST |
static String | FILE_PATTERN |
static String | FP_GROWTH |
static String | FREQUENT_PATTERNS |
static String | INPUT |
static String | MAX_HEAP_SIZE |
static String | MAX_PER_GROUP |
static String | MIN_SUPPORT |
static String | NUM_GROUPS |
static int | NUM_GROUPS_DEFAULT |
static String | OUTPUT |
static String | PARALLEL_COUNTING |
static String | PFP_PARAMETERS |
static String | SPLIT_PATTERN |
static Pattern | SPLITTER |
static String | USE_FPG2 |
Method Summary | |
---|---|
static int | getGroup(int itemId, int maxPerGroup) |
static IntArrayList | getGroupMembers(int groupId, int maxPerGroup, int numFeatures) |
static List<Pair<String,Long>> | readFList(org.apache.hadoop.conf.Configuration conf): Generates the fList from the serialized string representation |
static List<Pair<String,Long>> | readFList(Parameters params): Reads the feature frequency list that is built at the end of the parallel counting job |
static List<Pair<String,TopKStringPatterns>> | readFrequentPattern(Parameters params): Reads the frequent patterns generated from Text |
static void | runPFPGrowth(Parameters params) |
static void | runPFPGrowth(Parameters params, org.apache.hadoop.conf.Configuration conf) |
static void | saveFList(Iterable<Pair<String,Long>> flist, Parameters params, org.apache.hadoop.conf.Configuration conf): Serializes the fList (the method is declared void and returns nothing to the caller) |
static void | startAggregating(Parameters params, org.apache.hadoop.conf.Configuration conf): Runs the aggregation job, which groups the different Top K patterns by the features present in each pattern and from these calculates the final Top K frequent patterns for each feature |
static void | startParallelCounting(Parameters params, org.apache.hadoop.conf.Configuration conf): Counts the frequencies of the various features in parallel using Map/Reduce |
static void | startParallelFPGrowth(Parameters params, org.apache.hadoop.conf.Configuration conf): Runs the Parallel FPGrowth Map/Reduce job to calculate the Top K features of group-dependent shards |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String ENCODING
public static final String F_LIST
public static final String NUM_GROUPS
public static final int NUM_GROUPS_DEFAULT
public static final String MAX_PER_GROUP
public static final String OUTPUT
public static final String MIN_SUPPORT
public static final String MAX_HEAP_SIZE
public static final String INPUT
public static final String PFP_PARAMETERS
public static final String FILE_PATTERN
public static final String FP_GROWTH
public static final String FREQUENT_PATTERNS
public static final String PARALLEL_COUNTING
public static final String SPLIT_PATTERN
public static final String USE_FPG2
public static final Pattern SPLITTER
Method Detail |
---|
public static List<Pair<String,Long>> readFList(org.apache.hadoop.conf.Configuration conf) throws IOException
Throws:
IOException
public static void saveFList(Iterable<Pair<String,Long>> flist, Parameters params, org.apache.hadoop.conf.Configuration conf) throws IOException
Throws:
IOException
public static List<Pair<String,Long>> readFList(Parameters params)
public static int getGroup(int itemId, int maxPerGroup)
public static IntArrayList getGroupMembers(int groupId, int maxPerGroup, int numFeatures)
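The two grouping helpers above carry no description on this page. The following is a minimal sketch of one plausible reading, assuming that consecutive item ids are packed into groups of `maxPerGroup` and that the last group is clipped at `numFeatures`. The class name `GroupingSketch` is hypothetical, and a plain `List<Integer>` stands in for `IntArrayList`; this is not the Mahout source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the grouping helpers; not the Mahout implementation. */
final class GroupingSketch {

  private GroupingSketch() { }

  /** Assumed rule: item ids 0..maxPerGroup-1 form group 0, the next block group 1, and so on. */
  static int getGroup(int itemId, int maxPerGroup) {
    return itemId / maxPerGroup;
  }

  /** Lists the item ids of one group, never going past numFeatures. */
  static List<Integer> getGroupMembers(int groupId, int maxPerGroup, int numFeatures) {
    int start = groupId * maxPerGroup;
    int end = Math.min(start + maxPerGroup, numFeatures);
    List<Integer> members = new ArrayList<>();
    for (int itemId = start; itemId < end; itemId++) {
      members.add(itemId);
    }
    return members;
  }

  public static void main(String[] args) {
    System.out.println(getGroup(7, 5));           // prints 1
    System.out.println(getGroupMembers(1, 5, 8)); // prints [5, 6, 7]
  }
}
```

Under this assumption the two helpers are inverses of each other: every id returned by `getGroupMembers(g, ...)` maps back to `g` through `getGroup`.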
public static List<Pair<String,TopKStringPatterns>> readFrequentPattern(Parameters params) throws IOException
Throws:
IOException
public static void runPFPGrowth(Parameters params, org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException, ClassNotFoundException
Parameters:
params - params
conf - Configuration
Throws:
ClassNotFoundException
InterruptedException
IOException
public static void runPFPGrowth(Parameters params) throws IOException, InterruptedException, ClassNotFoundException
Parameters:
params - should contain the input and output locations as string values; the additional parameters include minSupport(3), maxHeapSize(50), numGroups(1000)
Throws:
IOException
InterruptedException
ClassNotFoundException
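The parameter note above can be pictured as a small string-to-string map. The sketch below uses a plain `java.util.Map` as a stand-in for `Parameters`, whose API is not shown on this page. The key strings `"input"` and `"output"`, the class name `PfpParamsSketch`, and the example paths are hypothetical, and reading the numbers in parentheses as default values is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the values runPFPGrowth expects; not the Mahout Parameters class. */
final class PfpParamsSketch {

  private PfpParamsSketch() { }

  /** Builds a parameter map with input/output locations plus the three tuning values. */
  static Map<String, String> withAssumedDefaults(String input, String output) {
    Map<String, String> params = new HashMap<>();
    params.put("input", input);       // hypothetical key name
    params.put("output", output);     // hypothetical key name
    params.put("minSupport", "3");    // assumed default, taken from "minSupport(3)"
    params.put("maxHeapSize", "50");  // assumed default, taken from "maxHeapSize(50)"
    params.put("numGroups", "1000");  // assumed default, taken from "numGroups(1000)"
    return params;
  }

  /** All values are stored as strings, so numeric ones are parsed on read. */
  static int intValue(Map<String, String> params, String key) {
    return Integer.parseInt(params.get(key));
  }

  public static void main(String[] args) {
    Map<String, String> params = withAssumedDefaults("/tmp/transactions.txt", "/tmp/patterns");
    System.out.println(intValue(params, "numGroups")); // prints 1000
  }
}
```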
public static void startAggregating(Parameters params, org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException, ClassNotFoundException
Throws:
IOException
InterruptedException
ClassNotFoundException
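The aggregation step is summarized on this page as grouping each pattern by the features present in it and keeping the Top K patterns per feature. A single-process sketch of that idea follows, assuming patterns are ranked by support and that each pattern lists a feature at most once. The names `AggregationSketch` and `Pattern` are hypothetical, and the real job runs as Map/Reduce rather than in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Hypothetical in-memory illustration of per-feature Top K aggregation. */
final class AggregationSketch {

  private AggregationSketch() { }

  /** A frequent pattern: a set of features plus how often it occurred. */
  static final class Pattern {
    final List<String> items;
    final long support;

    Pattern(List<String> items, long support) {
      this.items = items;
      this.support = support;
    }
  }

  /** For every feature, keeps the k patterns with the highest support that contain it. */
  static Map<String, List<Pattern>> topKPerFeature(List<Pattern> patterns, int k) {
    Comparator<Pattern> bySupport = Comparator.comparingLong(p -> p.support);
    Map<String, PriorityQueue<Pattern>> heaps = new TreeMap<>();
    for (Pattern pattern : patterns) {
      for (String feature : pattern.items) {
        // Min-heap per feature: the weakest pattern sits on top and is evicted first.
        PriorityQueue<Pattern> heap =
            heaps.computeIfAbsent(feature, f -> new PriorityQueue<>(bySupport));
        heap.add(pattern);
        if (heap.size() > k) {
          heap.poll();
        }
      }
    }
    Map<String, List<Pattern>> result = new TreeMap<>();
    for (Map.Entry<String, PriorityQueue<Pattern>> entry : heaps.entrySet()) {
      List<Pattern> best = new ArrayList<>(entry.getValue());
      best.sort(bySupport.reversed()); // strongest pattern first
      result.put(entry.getKey(), best);
    }
    return result;
  }
}
```

The bounded min-heap keeps memory per feature at K entries regardless of how many patterns mention that feature, which is one common way to implement a Top K selection.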
public static void startParallelCounting(Parameters params, org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException, ClassNotFoundException
Throws:
IOException
InterruptedException
ClassNotFoundException
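The counting stage is described on this page as counting feature frequencies, with the feature frequency list (fList) built at its end. The sketch below does the same work in a single process, assuming that each feature is counted once per transaction, that features below `minSupport` are dropped, and that the list is ordered by descending frequency with ties broken by name. The class name `CountingSketch` is hypothetical, and `Map.Entry<String, Long>` stands in for `Pair<String,Long>`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of building an fList; not the Map/Reduce job. */
final class CountingSketch {

  private CountingSketch() { }

  static List<Map.Entry<String, Long>> buildFList(List<List<String>> transactions, long minSupport) {
    // Count in how many transactions each feature appears.
    Map<String, Long> counts = new HashMap<>();
    for (List<String> transaction : transactions) {
      for (String feature : new HashSet<>(transaction)) {
        counts.merge(feature, 1L, Long::sum);
      }
    }
    // Keep only features that reach the minimum support.
    List<Map.Entry<String, Long>> fList = new ArrayList<>();
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
      if (entry.getValue() >= minSupport) {
        fList.add(new AbstractMap.SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
      }
    }
    // Most frequent first; the tie-break by name is an assumption made for determinism.
    fList.sort((a, b) -> {
      int bySupport = Long.compare(b.getValue(), a.getValue());
      return bySupport != 0 ? bySupport : a.getKey().compareTo(b.getKey());
    });
    return fList;
  }
}
```

The position of a feature in such a list can then serve as its integer item id, which is what the `itemId` arguments of the grouping helpers would refer to under this reading.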
public static void startParallelFPGrowth(Parameters params, org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException, ClassNotFoundException
Throws:
IOException
InterruptedException
ClassNotFoundException
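The phrase "group-dependent shards" comes from the paper linked in the class description. As a rough sketch of that idea, and not of the Mahout mapper itself: a transaction is rewritten as item ids ordered by fList rank, and for each group that occurs in it, the prefix ending at that group's rightmost item is emitted once. The class name `ShardingSketch` is hypothetical, and the grouping rule `itemId / maxPerGroup` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of splitting one transaction into group-dependent shards. */
final class ShardingSketch {

  private ShardingSketch() { }

  /**
   * @param transaction item ids sorted ascending, where id 0 is the most frequent feature
   * @return for each group present, the transaction prefix up to that group's last item
   */
  static Map<Integer, List<Integer>> shards(List<Integer> transaction, int maxPerGroup) {
    Map<Integer, List<Integer>> shardsByGroup = new TreeMap<>();
    // Scan from the least frequent item so the first hit per group is its rightmost item.
    for (int j = transaction.size() - 1; j >= 0; j--) {
      int group = transaction.get(j) / maxPerGroup; // assumed grouping rule
      if (!shardsByGroup.containsKey(group)) {
        shardsByGroup.put(group, new ArrayList<>(transaction.subList(0, j + 1)));
      }
    }
    return shardsByGroup;
  }

  public static void main(String[] args) {
    List<Integer> transaction = List.of(0, 1, 4, 5, 9);
    System.out.println(shards(transaction, 3));
    // prints {0=[0, 1], 1=[0, 1, 4, 5], 3=[0, 1, 4, 5, 9]}
  }
}
```

Each shard contains everything a worker needs to mine patterns ending in its own group's items, so the groups can be processed independently.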