java.lang.Object
  org.apache.mahout.math.stats.TDigest
public class TDigest
Adaptive histogram based on something like streaming k-means crossed with Q-digest.
The special characteristics of this algorithm are:
a) smaller summaries than Q-digest
b) works on doubles as well as integers
c) provides part-per-million accuracy for extreme quantiles and typically <1000 ppm accuracy for middle quantiles
d) fast
e) simple
f) test coverage > 90%
g) easy to adapt for use with map-reduce
Nested Class Summary | |
---|---|
static class | TDigest.Group |
Field Summary | |
---|---|
static int | SMALL_ENCODING |
static int | VERBOSE_ENCODING |
Constructor Summary | |
---|---|
TDigest(double compression) | A histogram structure that will record a sketch of a distribution. |
Method Summary | |
---|---|
void | add(double x) - Adds a sample to a histogram. |
void | add(double x, int w) - Adds a weighted sample to a histogram. |
void | add(TDigest other) |
void | asBytes(ByteBuffer buf) - Outputs a histogram as bytes using a particularly cheesy encoding. |
void | asSmallBytes(ByteBuffer buf) |
int | byteSize() - Returns an upper bound on the number of bytes that will be required to represent this histogram. |
double | cdf(double x) |
int | centroidCount() |
Iterable<? extends TDigest.Group> | centroids() |
void | compress() |
double | compression() |
static int | decode(ByteBuffer buf) |
static void | encode(ByteBuffer buf, int n) |
static TDigest | fromBytes(ByteBuffer buf) - Reads a histogram from a byte buffer. |
static TDigest | merge(double compression, Iterable<TDigest> subData) |
double | quantile(double q) |
TDigest | recordAllData() - Sets up so that all centroids will record all data assigned to them. |
int | size() - Returns the number of samples represented in this histogram. |
int | smallByteSize() - Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int VERBOSE_ENCODING
public static final int SMALL_ENCODING
Constructor Detail |
---|
public TDigest(double compression)
compression - How should accuracy be traded for size? A value of N here will give quantile errors
almost always less than 3/N, with considerably smaller errors expected for extreme
quantiles. Conversely, you should expect to track about 5 N centroids for this
accuracy.
Method Detail |
---|
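The rule of thumb given for the constructor's compression parameter (quantile errors almost always below 3/N, about 5 N centroids) can be turned into a quick sizing estimate. The sketch below is only arithmetic on those two documented figures; the class CompressionSizing and its method names are hypothetical and are not part of Mahout.

```java
// Sizing helper based only on the rule of thumb quoted for the
// TDigest(double compression) constructor. Names here are illustrative
// assumptions, not Mahout API.
final class CompressionSizing {
    private CompressionSizing() {}

    /** Approximate bound on quantile error for compression N: 3 / N. */
    static double approxQuantileErrorBound(double compression) {
        requirePositive(compression, "compression");
        return 3.0 / compression;
    }

    /** Approximate number of centroids tracked for compression N: 5 * N. */
    static long approxCentroidCount(double compression) {
        requirePositive(compression, "compression");
        return Math.round(5.0 * compression);
    }

    /** Inverts the 3 / N rule: the compression that targets a given error. */
    static double compressionForError(double targetError) {
        requirePositive(targetError, "targetError");
        return 3.0 / targetError;
    }

    private static void requirePositive(double v, String name) {
        if (!(v > 0)) {
            throw new IllegalArgumentException(name + " must be positive: " + v);
        }
    }
}
```

For example, a compression of 100 suggests errors of roughly 0.03 or less and on the order of 500 centroids. These are estimates from the stated rule of thumb, not guarantees.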
public void add(double x)
x - The value to add.
public void add(double x, int w)
x - The value to add.
w - The weight of this point.
public void add(TDigest other)
public static TDigest merge(double compression, Iterable<TDigest> subData)
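Weighted adds and merges both rely on combining points that carry a weight. This page does not describe how TDigest picks or updates a centroid, so the following is only a minimal sketch of the weighted-mean arithmetic that a (mean, count) summary needs. WeightedCentroid is a hypothetical stand-in, not TDigest.Group.

```java
// Hypothetical (mean, count) summary; an assumption for illustration,
// not the TDigest.Group implementation.
final class WeightedCentroid {
    private double mean;
    private long count;

    WeightedCentroid(double x, long w) {
        if (w <= 0) {
            throw new IllegalArgumentException("weight must be positive: " + w);
        }
        this.mean = x;
        this.count = w;
    }

    /** Folds a point x with weight w into the running weighted mean. */
    void add(double x, long w) {
        if (w <= 0) {
            throw new IllegalArgumentException("weight must be positive: " + w);
        }
        count += w;
        // Incremental update; equivalent to (oldMean * oldCount + x * w) / newCount.
        mean += w * (x - mean) / count;
    }

    /** Merging another summary is the same as adding its mean with its count. */
    void absorb(WeightedCentroid other) {
        add(other.mean, other.count);
    }

    double mean() { return mean; }

    long count() { return count; }
}
```

Starting from the point 1.0 with weight 1 and adding 3.0 with weight 3 gives a count of 4 and a mean of 2.5, the same result as adding 3.0 three times with weight 1.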
public void compress()
public int size()
public double cdf(double x)
x - The value at which the CDF should be evaluated.
public double quantile(double q)
q - The quantile desired. Can be in the range [0,1].
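To make the relationship between cdf and quantile concrete, here is an exact reference computed from the full sorted sample. It is not the TDigest algorithm, which works from a compressed summary and returns approximations. The class name ExactQuantiles, the "less than or equal" convention for the CDF, and the nearest-rank rule for quantiles are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Exact baseline for what cdf(x) and quantile(q) approximate.
// Conventions used here are assumptions, not TDigest's documented behavior.
final class ExactQuantiles {
    private final double[] sorted;

    ExactQuantiles(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        sorted = samples.clone();
        Arrays.sort(sorted);
    }

    /** Fraction of samples that are less than or equal to x. */
    double cdf(double x) {
        int lo = 0;
        int hi = sorted.length;
        // Binary search for the first index whose value is greater than x.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (double) lo / sorted.length;
    }

    /** Nearest-rank quantile: the smallest sample v with cdf(v) >= q. */
    double quantile(double q) {
        if (Double.isNaN(q) || q < 0 || q > 1) {
            throw new IllegalArgumentException("q must be in [0,1]: " + q);
        }
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }
}
```

A baseline like this is useful in tests: feed the same samples to a digest and to the exact version, then compare the two answers against the error expected for the chosen compression.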
public int centroidCount()
public Iterable<? extends TDigest.Group> centroids()
public double compression()
public TDigest recordAllData()
public int byteSize()
public int smallByteSize()
public void asBytes(ByteBuffer buf)
public void asSmallBytes(ByteBuffer buf)
public static void encode(ByteBuffer buf, int n)
public static int decode(ByteBuffer buf)
public static TDigest fromBytes(ByteBuffer buf)
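The serialization methods follow a common ByteBuffer pattern: size a buffer from an upper bound, write into it, flip it, then read it back. The sketch below shows that pattern with a hypothetical ToySketch class that only mirrors the method shapes byteSize(), asBytes(ByteBuffer) and fromBytes(ByteBuffer). Its byte layout is invented for the example and says nothing about the actual TDigest encodings.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in that mimics the shape of the serialization methods.
// The layout (an int length followed by doubles) is an assumption for this
// example only.
final class ToySketch {
    private final double[] values;

    ToySketch(double[] values) {
        this.values = values.clone();
    }

    /** Number of bytes asBytes will write: a length prefix plus the payload. */
    int byteSize() {
        return Integer.BYTES + Double.BYTES * values.length;
    }

    /** Writes this object at the buffer's current position. */
    void asBytes(ByteBuffer buf) {
        buf.putInt(values.length);
        for (double v : values) {
            buf.putDouble(v);
        }
    }

    /** Reads an object from the buffer's current position. */
    static ToySketch fromBytes(ByteBuffer buf) {
        int n = buf.getInt();
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = buf.getDouble();
        }
        return new ToySketch(v);
    }

    double[] values() {
        return values.clone();
    }

    /** Allocate from the size bound, write, flip for reading, read back. */
    static ToySketch roundTrip(ToySketch in) {
        ByteBuffer buf = ByteBuffer.allocate(in.byteSize());
        in.asBytes(buf);
        buf.flip();
        return fromBytes(buf);
    }
}
```

The call to flip() matters: without it the buffer's position sits at the end of the written data and the subsequent read fails or returns garbage.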