org.apache.mahout.clustering.topdown.postprocessor
Class ClusterOutputPostProcessorDriver
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.clustering.topdown.postprocessor.ClusterOutputPostProcessorDriver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public final class ClusterOutputPostProcessorDriver
- extends AbstractJob
Post processes the output of clustering algorithms and groups the clustered vectors by the cluster they
belong to. Ideal for top down clustering, but it can be used whenever clustering output needs to be
grouped into its respective clusters.
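As a rough illustration of what "grouping into respective clusters" means, the sketch below groups a handful of clustered points by cluster id in plain Java. It is not Mahout's implementation and uses no Mahout or Hadoop classes; `ClusterGroupingSketch`, `ClusteredPoint`, and `groupByCluster` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; does not use Mahout or Hadoop classes.
final class ClusterGroupingSketch {

    // Hypothetical stand-in for one clustered point: the id of the cluster
    // it was assigned to, plus its vector.
    static final class ClusteredPoint {
        final int clusterId;
        final double[] vector;

        ClusteredPoint(int clusterId, double[] vector) {
            this.clusterId = clusterId;
            this.vector = vector;
        }
    }

    // Groups vectors by cluster id; the TreeMap keeps cluster ids sorted.
    static Map<Integer, List<double[]>> groupByCluster(List<ClusteredPoint> points) {
        Map<Integer, List<double[]>> grouped = new TreeMap<>();
        for (ClusteredPoint p : points) {
            grouped.computeIfAbsent(p.clusterId, id -> new ArrayList<>()).add(p.vector);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<ClusteredPoint> points = new ArrayList<>();
        points.add(new ClusteredPoint(7, new double[] {1.0, 2.0}));
        points.add(new ClusteredPoint(3, new double[] {9.0, 9.5}));
        points.add(new ClusteredPoint(7, new double[] {1.1, 2.1}));

        for (Map.Entry<Integer, List<double[]>> e : groupByCluster(points).entrySet()) {
            System.out.println("cluster " + e.getKey() + ": " + e.getValue().size() + " vector(s)");
        }
    }
}
```

The real driver reads the clustering output from the file system rather than from an in-memory list; the sketch only shows the grouping step.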
Method Summary
static void main(String[] args)
static void run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, boolean runSequential)
    Post processes the output of clustering algorithms and groups them into respective clusters.
int run(String[] args)
    CLI to run clustering post processor.
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
public int run(String[] args)
throws Exception
- CLI to run the clustering post processor. The input to the post processor is the output path that was
specified to the clustering.
- Throws:
Exception
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public static void run(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Post processes the output of clustering algorithms and groups them into respective clusters. Each
cluster's vectors are written into a directory named after its clusterId.
- Parameters:
input
- The output path provided to the clustering algorithm, whose contents will be post processed. Hint: the
path of the directory containing clusters-*-final and clusteredPoints.
output
- The post processed data will be stored at this path.
runSequential
- If set to true, post processing runs sequentially; otherwise it uses MapReduce. Hint: if the clustering
was done sequentially, run the post processing sequentially too, and vice versa.
- Throws:
IOException
InterruptedException
ClassNotFoundException
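The output layout described for this method (one directory per clusterId under the output path) can be pictured with the plain-Java sketch below. It is an illustration under assumptions, not Mahout's code: it uses `java.nio.file.Path` rather than `org.apache.hadoop.fs.Path`, writes text lines instead of Mahout's own file format, and the names `ClusterDirectoryLayoutSketch`, `writeGrouped`, and `vectors.txt` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the output layout only; no Mahout or Hadoop classes.
final class ClusterDirectoryLayoutSketch {

    // Hypothetical file name; this page does not specify the file names
    // used inside each cluster directory.
    static final String VECTORS_FILE = "vectors.txt";

    // Writes each cluster's vectors under <outputRoot>/<clusterId>/.
    static void writeGrouped(Path outputRoot, Map<Integer, List<String>> vectorsByCluster)
            throws IOException {
        for (Map.Entry<Integer, List<String>> e : vectorsByCluster.entrySet()) {
            Path clusterDir = outputRoot.resolve(String.valueOf(e.getKey()));
            Files.createDirectories(clusterDir);
            Files.write(clusterDir.resolve(VECTORS_FILE), e.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Already-grouped sample data: cluster id -> vectors rendered as text.
        Map<Integer, List<String>> grouped = new LinkedHashMap<>();
        grouped.put(3, Arrays.asList("9.0,9.5"));
        grouped.put(7, Arrays.asList("1.0,2.0", "1.1,2.1"));

        Path root = Files.createTempDirectory("postprocessed");
        writeGrouped(root, grouped);
        System.out.println("wrote cluster directories under " + root);
    }
}
```

With Mahout and Hadoop on the classpath, the documented entry point is `ClusterOutputPostProcessorDriver.run(input, output, runSequential)`, where `input` and `output` are `org.apache.hadoop.fs.Path` instances.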
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.