org.apache.mahout.clustering.syntheticcontrol.kmeans
Class Job

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public final class Job
extends AbstractJob


Field Summary
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Method Summary
static void main(String[] args)
           
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, double convergenceDelta, int maxIterations)
          Run the kmeans clustering job on an input dataset using the given distance measure, t1, t2 and iteration parameters.
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, int k, double convergenceDelta, int maxIterations)
          Run the kmeans clustering job on an input dataset using the given number of clusters k and iteration parameters.
 int run(String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Throws:
Exception

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       int k,
                       double convergenceDelta,
                       int maxIterations)
                throws Exception
Run the kmeans clustering job on an input dataset using the given number of clusters k and iteration parameters. All output data will be written to the output directory, which is deleted first if it already exists. The clustered points will reside in the path /clustered-points. By default, the job expects a file containing equal-length, space-delimited data that resides in a directory named "testdata", and writes output to a directory named "output".

Parameters:
conf - the Configuration to use
input - the Path denoting the input directory
output - the Path denoting the output directory
measure - the DistanceMeasure to use
k - the number of clusters in Kmeans
convergenceDelta - the double convergence criterion for iterations
maxIterations - the int maximum number of iterations
Throws:
Exception
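
The roles of k, convergenceDelta and maxIterations can be illustrated with a small in-memory sketch. This is not Mahout's implementation: the class KMeansSketch, its methods, the seeding with the first k points, the fixed Euclidean distance and the exact convergence test are all assumptions made for illustration. The loop stops when no centroid moves farther than convergenceDelta, or after maxIterations passes, whichever comes first.

```java
import java.util.Arrays;

/** Hypothetical in-memory illustration of the parameters; not Mahout code. */
final class KMeansSketch {

    /** Euclidean distance between two equal-length vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(point, centroids[c]) < distance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Assumption: the first k points serve as the initial centroids. */
    static double[][] cluster(double[][] points, int k,
                              double convergenceDelta, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: accumulate each point into its nearest cluster.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each centroid to the mean of its points.
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) {
                    updated[i] = sums[c][i] / counts[c];
                }
                if (distance(updated, centroids[c]) > convergenceDelta) {
                    converged = false;
                }
                centroids[c] = updated;
            }
            if (converged) {
                break; // every centroid moved by at most convergenceDelta
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {9.0, 9.0}, {1.0, 2.0}, {9.0, 8.0}};
        // prints [[1.0, 1.5], [9.0, 8.5]]
        System.out.println(Arrays.deepToString(cluster(points, 2, 0.1, 10)));
    }
}
```

A larger convergenceDelta ends the loop sooner at the cost of less settled centroids, while maxIterations bounds the run time when the centroids keep moving.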

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double t1,
                       double t2,
                       double convergenceDelta,
                       int maxIterations)
                throws Exception
Run the kmeans clustering job on an input dataset using the given distance measure, t1, t2 and iteration parameters. All output data will be written to the output directory, which is deleted first if it already exists. The clustered points will reside in the path /clustered-points. By default, the job expects a file containing synthetic_control.data, as obtained from http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series, to reside in a directory named "testdata", and writes output to a directory named "output".

Parameters:
conf - the Configuration to use
input - the Path denoting the input directory
output - the Path denoting the output directory
measure - the DistanceMeasure to use
t1 - the canopy T1 threshold
t2 - the canopy T2 threshold
convergenceDelta - the double convergence criterion for iterations
maxIterations - the int maximum number of iterations
Throws:
Exception
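
The T1 and T2 thresholds belong to canopy clustering, which this overload presumably uses to choose the initial clusters instead of a fixed k. The following is a minimal in-memory sketch of the general canopy technique, not Mahout's implementation: the class CanopySketch, its nested Canopy type, the fixed Euclidean distance and the choice of the first remaining point as each center are assumptions made for illustration. It assumes t1 is greater than t2: points within t1 of a center join that canopy, and points within t2 are also removed from further consideration as centers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory illustration of the T1/T2 thresholds; not Mahout code. */
final class CanopySketch {

    /** A canopy: its center plus every point that fell within t1 of it. */
    static final class Canopy {
        final double[] center;
        final List<double[]> members = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
            this.members.add(center);
        }
    }

    /** Euclidean distance between two equal-length vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Builds canopies in a single pass per center; requires t1 > t2. */
    static List<Canopy> build(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<Canopy> canopies = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Assumption: the first remaining point becomes the next center.
            Canopy canopy = new Canopy(remaining.remove(0));
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(p, canopy.center);
                if (d < t1) {
                    canopy.members.add(p); // loosely bound: joins this canopy
                }
                if (d < t2) {
                    it.remove(); // tightly bound: can no longer start a canopy
                }
            }
            canopies.add(canopy);
        }
        return canopies;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[]{0.0, 0.0}, new double[]{1.0, 0.0},
            new double[]{3.0, 0.0}, new double[]{10.0, 0.0});
        // prints 3
        System.out.println(build(points, 4.0, 2.0).size());
    }
}
```

In this sketch the number of canopies, and hence the number of initial clusters, falls out of the data and the thresholds: a smaller t2 leaves more points eligible as centers and so tends to produce more canopies.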


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.