org.apache.mahout.classifier.df.data
Class DataLoader

java.lang.Object
  extended by org.apache.mahout.classifier.df.data.DataLoader

public final class DataLoader
extends Object

Converts the input data to an array of Vectors using the information given by the Dataset, generating one Vector for each line of input.

Also adds an IGNORED first attribute that contains a unique id for each instance; the id is the line number of the instance in the input data.
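
The per-line conversion described above can be pictured with a small standalone sketch. It does not use Mahout classes and is not the library's implementation: the class `LineToVectorSketch`, its methods, the comma separator, and the rule that every token is parsed as a double are all assumptions made for illustration. It only shows the idea of one vector per input line, with the line number stored in an extra first slot.

```java
import java.util.Arrays;

public class LineToVectorSketch {

    /**
     * Converts one comma-separated line into a double[] whose first slot
     * holds the line number (hypothetical stand-in for the id attribute).
     */
    static double[] toVector(int lineNumber, String line) {
        String[] tokens = line.split(",");
        double[] vector = new double[tokens.length + 1];
        vector[0] = lineNumber; // extra first attribute: unique id of the instance
        for (int i = 0; i < tokens.length; i++) {
            // Assumption: every attribute is numerical in this sketch.
            vector[i + 1] = Double.parseDouble(tokens[i].trim());
        }
        return vector;
    }

    /** Converts every line of the input, using the array index as the line number. */
    static double[][] toVectors(String[] data) {
        double[][] vectors = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            vectors[i] = toVector(i, data[i]);
        }
        return vectors;
    }

    public static void main(String[] args) {
        String[] data = {"5.1, 3.5, 0", "4.9, 3.0, 1"};
        for (double[] vector : toVectors(data)) {
            System.out.println(Arrays.toString(vector));
        }
    }
}
```

Keeping the id in the vector itself means an instance can be traced back to its input line after the data has been shuffled or split.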


Method Summary
static Dataset generateDataset(CharSequence descriptor, boolean regression, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Generates the Dataset by parsing the entire data
static Dataset generateDataset(CharSequence descriptor, boolean regression, String[] data)
          Generates the Dataset by parsing the entire data
static Data loadData(Dataset dataset, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path fpath)
          Loads the data from a file
static Data loadData(Dataset dataset, String[] data)
          Loads the data from a String array
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

loadData

public static Data loadData(Dataset dataset,
                            org.apache.hadoop.fs.FileSystem fs,
                            org.apache.hadoop.fs.Path fpath)
                     throws IOException
Loads the data from a file

Parameters:
dataset - dataset describing the attributes of the data
fs - file system
fpath - data file path
Throws:
IOException - if any problem is encountered

loadData

public static Data loadData(Dataset dataset,
                            String[] data)
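Loads the data from a String array

Parameters:
dataset - dataset describing the attributes of the data
data - input data, one instance per array element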


generateDataset

public static Dataset generateDataset(CharSequence descriptor,
                                      boolean regression,
                                      org.apache.hadoop.fs.FileSystem fs,
                                      org.apache.hadoop.fs.Path path)
                               throws DescriptorException,
                                      IOException
Generates the Dataset by parsing the entire data

Parameters:
descriptor - attributes description
regression - if true, the label is numerical
fs - file system
path - data path
Throws:
DescriptorException
IOException

generateDataset

public static Dataset generateDataset(CharSequence descriptor,
                                      boolean regression,
                                      String[] data)
                               throws DescriptorException
Generates the Dataset by parsing the entire data

Parameters:
descriptor - attributes description
regression - if true, the label is numerical
data - input data, one instance per array element
Throws:
DescriptorException
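
One plausible reason for parsing the entire data is that the set of values a non-numerical attribute can take is only known once every line has been seen. The standalone sketch below illustrates that kind of full pass. It does not use Mahout classes: the class `CategoryScanSketch`, the method `collectValues`, and the one-character-per-column encoding (`'N'` for numerical, `'C'` for categorical) are assumptions made for this example, and lines are assumed to be well formed and comma-separated.

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryScanSketch {

    /**
     * Scans every line and returns, for each column, the distinct values seen
     * in first-seen order ('C' columns) or an empty list ('N' columns).
     */
    static List<List<String>> collectValues(char[] kinds, String[] data) {
        List<List<String>> values = new ArrayList<>();
        for (int col = 0; col < kinds.length; col++) {
            values.add(new ArrayList<>());
        }
        for (String line : data) {
            String[] tokens = line.split(",");
            for (int col = 0; col < kinds.length; col++) {
                String token = tokens[col].trim();
                // Only categorical columns need their values recorded.
                if (kinds[col] == 'C' && !values.get(col).contains(token)) {
                    values.get(col).add(token);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        char[] kinds = {'N', 'C'}; // hypothetical column kinds
        String[] data = {"1.5, red", "2.0, blue", "0.7, red"};
        System.out.println(collectValues(kinds, data));
    }
}
```

With the value lists collected up front, a later conversion step can replace each categorical token by its index in the list for that column.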


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.