org.apache.mahout.clustering.topdown.postprocessor
Class ClusterOutputPostProcessorDriver

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.topdown.postprocessor.ClusterOutputPostProcessorDriver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public final class ClusterOutputPostProcessorDriver
extends AbstractJob

Post processes the output of clustering algorithms by grouping the clustered vectors into their respective clusters. It is well suited to top down clustering, and can also be used whenever clustering output needs to be grouped by cluster.


Field Summary
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Method Summary
static void main(String[] args)
           
static void run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, boolean runSequential)
          Post processes the output of clustering algorithms and groups the clustered vectors into their respective clusters.
 int run(String[] args)
          CLI to run clustering post processor.
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

run

public int run(String[] args)
        throws Exception
CLI to run the clustering post processor. The input to the post processor is the output path that was specified to the clustering algorithm.

Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public static void run(org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       boolean runSequential)
                throws IOException,
                       InterruptedException,
                       ClassNotFoundException
Post processes the output of clustering algorithms and groups the clustered vectors into their respective clusters. Each cluster's vectors are written into a directory named after its clusterId.

Parameters:
input - The output path provided to the clustering algorithm, whose contents will be post processed. Hint: This is the directory containing clusters-*-final and clusteredPoints.
output - The path at which the post processed data is stored.
runSequential - If set to true, post processing runs sequentially; otherwise it uses MapReduce. Hint: Match the mode used for clustering: if the clustering was done sequentially, run this sequentially, otherwise use MapReduce.
Throws:
IOException
InterruptedException
ClassNotFoundException
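
The grouping described above can be pictured with a small, self-contained sketch. It is a conceptual illustration only and not Mahout's implementation: it uses plain Java collections and local files rather than Hadoop paths and sequence files. Every name in it (ClusterGroupingSketch, ClusteredPoint, groupByCluster, writeGroups, vectors.txt) is hypothetical and chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: NOT Mahout's implementation. All names are hypothetical.
final class ClusterGroupingSketch {

  /** A vector together with the id of the cluster it was assigned to. */
  static final class ClusteredPoint {
    final int clusterId;
    final double[] vector;

    ClusteredPoint(int clusterId, double... vector) {
      this.clusterId = clusterId;
      this.vector = vector;
    }
  }

  /** Groups vectors by cluster id, keeping the input order within each cluster. */
  static Map<Integer, List<double[]>> groupByCluster(List<ClusteredPoint> points) {
    Map<Integer, List<double[]>> groups = new TreeMap<>();
    for (ClusteredPoint p : points) {
      groups.computeIfAbsent(p.clusterId, id -> new ArrayList<>()).add(p.vector);
    }
    return groups;
  }

  /** Writes each group into a directory named after its cluster id. */
  static void writeGroups(Map<Integer, List<double[]>> groups, Path output) {
    try {
      for (Map.Entry<Integer, List<double[]>> e : groups.entrySet()) {
        Path clusterDir = Files.createDirectories(output.resolve(String.valueOf(e.getKey())));
        List<String> lines = new ArrayList<>();
        for (double[] v : e.getValue()) {
          lines.add(Arrays.toString(v));
        }
        // "vectors.txt" is an assumed file name for this sketch.
        Files.write(clusterDir.resolve("vectors.txt"), lines, StandardCharsets.UTF_8);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) throws IOException {
    List<ClusteredPoint> points = Arrays.asList(
        new ClusteredPoint(0, 1.0, 1.0),
        new ClusteredPoint(1, 9.0, 9.0),
        new ClusteredPoint(0, 1.5, 0.5));
    Path output = Files.createTempDirectory("clusterpp-sketch");
    writeGroups(groupByCluster(points), output);
    System.out.println("Wrote cluster directories under " + output);
  }
}
```

In this sketch the output directory ends up with one subdirectory per cluster id (here 0 and 1), which mirrors the layout the method description states. The actual driver operates on the clustering output directory through Hadoop and may differ in file formats and naming.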


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.