Colt 1.2.0

Package cern.jet.stat.quantile

Scalable algorithms and data structures to compute approximate quantiles over very large data sequences.

Interface Summary
DoubleQuantileFinder The interface shared by all quantile finders, no matter if they are exact or approximate.
 

Class Summary
EquiDepthHistogram Read-only equi-depth histogram for selectivity estimation.
Quantile1Test A class to test the QuantileBin1D code.
QuantileFinderFactory Factory constructing exact and approximate quantile finders for both known and unknown N.
 

Package cern.jet.stat.quantile Description

Scalable algorithms and data structures to compute approximate quantiles over very large data sequences. The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are smaller than for any other known technique by an order of magnitude.

The approximate algorithms are primarily intended to help applications scale. When faced with a large data sequence, traditional methods need either very large memories or time-consuming disk-based sorting. In contrast, the approximate algorithms can handle more than 10^10 values without disk-based sorting.
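To make the contrast concrete, here is a minimal sketch of the traditional exact approach in plain Java. It does not call any Colt class. The class name ExactQuantileSketch, its methods, and the rank convention ceil(phi * n) are assumptions made for this illustration only. The rankError helper shows one way to read an accuracy guarantee: how far, as a fraction of n, a candidate answer sits from the rank that was asked for.

```java
import java.util.Arrays;

/** Hypothetical illustration, not part of Colt: exact quantiles by sorting. */
class ExactQuantileSketch {

    /** 1-based rank targeted by phi, using the assumed convention ceil(phi * n), clamped to [1, n]. */
    static int targetRank(int n, double phi) {
        int rank = (int) Math.ceil(phi * n);
        return Math.max(1, Math.min(n, rank));
    }

    /** Exact phi-quantile. Needs a full copy of the data in memory and O(n log n) time. */
    static double quantile(double[] values, double phi) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        if (phi < 0.0 || phi > 1.0) throw new IllegalArgumentException("phi must be in [0,1]");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[targetRank(sorted.length, phi) - 1];
    }

    /**
     * Rank error of a candidate answer as a fraction of n: the distance between
     * the targeted rank and the range of ranks the candidate occupies (0 if inside).
     */
    static double rankError(double[] values, double phi, double candidate) {
        int n = values.length;
        int below = 0;
        int atOrBelow = 0;
        for (double v : values) {
            if (v < candidate) below++;
            if (v <= candidate) atOrBelow++;
        }
        int target = targetRank(n, phi);
        int lowest = below + 1;                      // smallest rank the candidate can take
        int highest = Math.max(lowest, atOrBelow);   // largest rank (duplicates widen the range)
        int distance = 0;
        if (target < lowest) distance = lowest - target;
        else if (target > highest) distance = target - highest;
        return (double) distance / n;
    }
}
```

The sort is what forces the whole sequence into memory (or onto disk). An approximate finder avoids it by accepting a small, user-specified rank error instead.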

All classes can be seen from various angles, for example as

1. Algorithm to compute quantiles.
2. 1-dim-equi-depth histogram.
3. 1-dim-histogram arbitrarily rebinnable in real-time.
4. A space efficient MultiSet data structure using lossy compression.
5. A space efficient value preserving bin of a 2-dim or d-dim histogram.
(All subject to an accuracy specified by the user.)

Have a look at the documentation of class QuantileFinderFactory and the interface DoubleQuantileFinder to learn more. Most users will never need to know more than how to use these. Actual implementations of the DoubleQuantileFinder interface are hidden; they are constructed indirectly via the factory.
Also see QuantileBin1D, which demonstrates how this package can be used.
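The equi-depth histogram view listed above can be illustrated with a short plain-Java sketch: the inner bin boundaries are simply the quantiles at phi = 1/k, 2/k, ..., (k-1)/k. The class EquiDepthSketch and its methods are hypothetical names introduced here, and the boundaries are computed exactly by sorting as a stand-in. This is not how the package's EquiDepthHistogram is implemented; in practice the boundaries would come from a quantile finder obtained via the factory.

```java
import java.util.Arrays;

/** Hypothetical illustration, not part of Colt: equi-depth bins from quantiles. */
class EquiDepthSketch {

    /** Returns the bins-1 inner boundaries, i.e. the quantiles at phi = i/bins for i = 1..bins-1. */
    static double[] boundaries(double[] values, int bins) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        if (bins < 1) throw new IllegalArgumentException("bins must be >= 1");
        double[] sorted = values.clone();
        Arrays.sort(sorted);                 // exact stand-in for an approximate finder
        int n = sorted.length;
        double[] result = new double[bins - 1];
        for (int i = 1; i < bins; i++) {
            int rank = (int) Math.ceil((double) i * n / bins);   // assumed rank convention
            rank = Math.max(1, Math.min(n, rank));
            result[i - 1] = sorted[rank - 1];
        }
        return result;
    }

    /** Counts values per bin; bin i holds the values v with boundary[i-1] < v <= boundary[i]. */
    static int[] binCounts(double[] values, double[] boundaries) {
        int[] counts = new int[boundaries.length + 1];
        for (double v : values) {
            int bin = 0;
            while (bin < boundaries.length && v > boundaries[bin]) bin++;
            counts[bin]++;
        }
        return counts;
    }
}
```

With distinct values, each bin ends up holding about n/k elements, which is what makes the histogram "equi-depth". With approximate boundaries the bin depths would deviate by an amount bounded by the chosen accuracy.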

