| Introduction |
Welcome to the Colt Project. Colt provides a set of open source libraries for
high-performance scientific and technical computing in Java.
Scientific and technical computing, as carried out for example at CERN, is
characterized by demanding problem sizes and a need for high performance with a
reasonably small memory footprint. Many perceive the Java language as unsuited
for such work. However, recent trends in its evolution suggest that it may soon
be a major player in performance-sensitive scientific and technical computing.
For example, IBM Watson's Ninja project showed that Java can perform BLAS matrix
computations at up to 90% of the speed of optimized Fortran. The Java Grande
Forum Numerics Working Group provides a focal point for information on numerical
computing in Java. With the performance gap steadily closing, Java has recently
found increased adoption in the field. The reasons include ease of use, its
cross-platform nature, built-in support for multi-threading, network-friendly
APIs and a healthy pool of available developers. Still, these efforts are
hindered to a significant degree by the lack of Java counterparts to the
foundation toolkits that are broadly available and conveniently accessible in
C and Fortran.
The latest stable Colt release breaks the 1.9 Gflop/s barrier
on IBM JDK 1.4.1 under Red Hat 9.0, running on a dual 2.8 GHz Intel Xeon machine.
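A Gflop/s figure is a rate: floating-point operations divided by elapsed time, in units of 10^9 per second. The benchmark behind the figure quoted above is not described here, so the following is only a hedged sketch of the arithmetic, assuming a naive dense n×n matrix multiplication: each inner-loop step does one multiply and one add, giving 2n³ flops in total. The class `FlopRate` and its methods are hypothetical and are not part of Colt.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper: rough Gflop/s estimate for a naive dense matrix multiply. */
class FlopRate {

    /** C = A * B for n x n matrices stored row-major in flat arrays (i-k-j loop order). */
    static void multiply(double[] a, double[] b, double[] c, int n) {
        Arrays.fill(c, 0.0);
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j]; // one multiply + one add
                }
            }
        }
    }

    /** An n x n multiply performs n^3 multiply-add pairs, i.e. 2 * n^3 flops. */
    static double flopCount(int n) {
        return 2.0 * n * n * n;
    }

    /** Flops per nanosecond equals 10^9 flops per second, i.e. Gflop/s. */
    static double gflops(double flops, long nanos) {
        return flops / nanos;
    }

    public static void main(String[] args) {
        int n = 300;
        double[] a = new double[n * n], b = new double[n * n], c = new double[n * n];
        Random rnd = new Random(42);
        for (int i = 0; i < n * n; i++) { a[i] = rnd.nextDouble(); b[i] = rnd.nextDouble(); }
        for (int warmup = 0; warmup < 3; warmup++) multiply(a, b, c, n); // let the JIT compile first
        long start = System.nanoTime();
        multiply(a, b, c, n);
        long elapsed = System.nanoTime() - start;
        System.out.printf("n=%d: %.3f Gflop/s%n", n, gflops(flopCount(n), elapsed));
    }
}
```

The warm-up runs matter on a JIT-compiled JVM: timing the first call would mostly measure interpretation and compilation rather than steady-state throughput.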
This distribution provides an infrastructure for scalable scientific
and technical computing in Java. It is particularly useful in the domain of
High Energy Physics at CERN: it contains, among other things, efficient and usable
data structures and algorithms for off-line and on-line data analysis, linear
algebra, multi-dimensional arrays, statistics, histogramming, Monte Carlo simulation,
and parallel & concurrent programming. It brings together some of the best concepts,
designs and implementations thought up over time by the community, ports or
improves them, and introduces new approaches where the need arises. In overlapping
areas, it is competitive with or superior to toolkits such as STL, Root, HTL, CLHEP,
TNT, GSL and C-RAND / WIN-RAND (all C/C++), as well as IBM Array and the JDK 1.2
Collections framework (all Java), in terms of performance (!), functionality and
(re)usability.
This distribution consists of several free Java libraries, bundled for user
convenience under a single uniform umbrella: the Colt library, the Jet library,
the CoreJava library and the Concurrent library.
The Colt library provides fundamental general-purpose data
structures optimized for numerical data, such as resizable arrays, dense and
sparse matrices (multi-dimensional arrays), linear algebra, associative containers
and buffer management. The Jet library contains mathematical and statistical
tools for data analysis, powerful histogramming functionality, random number
generators and distributions useful for (event) simulations, and more. The CoreJava
library contains C-like print formatting. The Concurrent library contains standardized,
efficient utility classes commonly encountered in parallel & concurrent
programming.
A distribution download includes HTML API documentation
and source code for all libraries, as well as a single cross-platform archive,
colt.jar, containing the whole distribution precompiled to directly executable
bytecode. Thus, a user can start working by setting a single environment
variable (the CLASSPATH) and never needs to bother with compilation, architecture
or linker issues.
|
Features |
Templated Lists and Maps |
Dynamically resizing lists
holding objects or primitive data types such as int, double,
etc. Operations on primitive arrays, algorithms on Colt lists and JAL algorithms
(see below) can be freely mixed with zero copying overhead. Automatically growing
and shrinking maps holding objects or primitive data types such as int, double,
etc. Space-efficient, high-performance BitVectors and BitMatrices.
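Much of the benefit of primitive-typed containers comes from storing values directly in a primitive array instead of boxing each one as a separate object. The sketch below is not Colt's implementation (Colt's own classes, such as `DoubleArrayList`, have their own APIs and internals); it only shows the two ideas in their simplest form, assuming hypothetical classes `DoubleList` and `IntDoubleMap`: a growable list backed by a `double[]` that exposes its backing array, so array algorithms can run on it without copying, and an `int`-to-`double` map using open addressing with linear probing.

```java
import java.util.Arrays;

/** Demo entry point (hypothetical; run this class). */
class ListAndMapDemo {
    public static void main(String[] args) {
        DoubleList list = new DoubleList();
        for (double x : new double[] {3.5, -1.0, 2.0}) list.add(x);
        list.sort();
        System.out.println(list.get(0) + " " + list.size()); // -1.0 3
        IntDoubleMap map = new IntDoubleMap();
        map.put(42, 0.5);
        System.out.println(map.get(42, Double.NaN) + " " + map.get(7, Double.NaN)); // 0.5 NaN
    }
}

/** Hypothetical growable list of primitive doubles (no boxing). */
class DoubleList {
    private double[] elements = new double[10];
    private int size = 0;

    void add(double value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, elements.length * 3 / 2 + 1); // grow ~1.5x: amortized O(1) append
        }
        elements[size++] = value;
    }

    double get(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        return elements[index];
    }

    int size() { return size; }

    /** Backing array, shared rather than copied; only [0, size) is valid. */
    double[] elements() { return elements; }

    void sort() { Arrays.sort(elements, 0, size); }
}

/** Hypothetical int -> double map: open addressing, linear probing, no removal. */
class IntDoubleMap {
    private int[] keys = new int[16];
    private double[] values = new double[16];
    private boolean[] used = new boolean[16];
    private int size = 0;

    void put(int key, double value) {
        if ((size + 1) * 2 > keys.length) rehash(keys.length * 2); // keep load factor <= 0.5
        int i = slot(key, keys, used);
        if (!used[i]) { used[i] = true; keys[i] = key; size++; }
        values[i] = value;
    }

    double get(int key, double defaultValue) {
        int i = slot(key, keys, used);
        return used[i] ? values[i] : defaultValue;
    }

    int size() { return size; }

    /** Slot holding key, or the first free slot on its probe sequence. */
    private static int slot(int key, int[] ks, boolean[] us) {
        int mask = ks.length - 1;          // table length is always a power of two
        int i = (key * 0x9E3779B9) & mask; // multiplicative hash spreads nearby keys
        while (us[i] && ks[i] != key) i = (i + 1) & mask;
        return i;
    }

    private void rehash(int capacity) {
        int[] oldKeys = keys; double[] oldValues = values; boolean[] oldUsed = used;
        keys = new int[capacity]; values = new double[capacity]; used = new boolean[capacity];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                int i = slot(oldKeys[j], keys, used);
                used[i] = true; keys[i] = oldKeys[j]; values[i] = oldValues[j];
            }
        }
    }
}
```

Because the table is kept at most half full, every probe sequence reaches a free slot, so lookups of absent keys always terminate.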
|
Templated Multi-dimensional matrices |
Dense and sparse fixed-size (non-resizable) 1-, 2-, 3- and d-dimensional matrices
holding objects or primitive data types such as int, double,
etc.; also known as multi-dimensional arrays or data cubes.
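A dense matrix is typically stored in one flat array and addressed through an offset plus one stride per dimension. Swapping the strides then yields a transposed view that shares storage with the original, with no copying. Colt's dense matrices use a similar offset-and-stride scheme to support views, but the class `StridedMatrix2D` and its methods below are a hypothetical 2-D sketch, not Colt's API.

```java
/** Hypothetical dense 2-D matrix over a flat array, addressed via offset and strides. */
class StridedMatrix2D {
    final double[] elements; // backing storage, shared between views
    final int rows, columns;
    final int offset, rowStride, columnStride;

    private StridedMatrix2D(double[] elements, int rows, int columns,
                            int offset, int rowStride, int columnStride) {
        this.elements = elements; this.rows = rows; this.columns = columns;
        this.offset = offset; this.rowStride = rowStride; this.columnStride = columnStride;
    }

    /** New zero-filled rows x columns matrix in row-major order. */
    static StridedMatrix2D zeros(int rows, int columns) {
        return new StridedMatrix2D(new double[rows * columns], rows, columns, 0, columns, 1);
    }

    private int index(int row, int column) {
        if (row < 0 || row >= rows || column < 0 || column >= columns)
            throw new IndexOutOfBoundsException("(" + row + "," + column + ")");
        return offset + row * rowStride + column * columnStride;
    }

    double get(int row, int column) { return elements[index(row, column)]; }

    void set(int row, int column, double value) { elements[index(row, column)] = value; }

    /** Transposed view: swaps dimensions and strides, shares storage (no copy). */
    StridedMatrix2D viewTranspose() {
        return new StridedMatrix2D(elements, columns, rows, offset, columnStride, rowStride);
    }

    /** View of one row as a 1 x columns matrix, sharing storage. */
    StridedMatrix2D viewRow(int row) {
        return new StridedMatrix2D(elements, 1, columns, index(row, 0), rowStride, columnStride);
    }

    public static void main(String[] args) {
        StridedMatrix2D m = zeros(2, 3);
        m.set(0, 2, 5.0);
        StridedMatrix2D t = m.viewTranspose();
        System.out.println(t.get(2, 0)); // 5.0, read through the view
        t.set(1, 1, 7.0);
        System.out.println(m.get(1, 1)); // 7.0, write through the view is visible in m
    }
}
```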
|
Linear Algebra |
Standard matrix operations and decompositions: LU, QR, Cholesky, eigenvalue and
singular value decomposition.
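To illustrate one of the listed decompositions, the sketch below factors a square matrix as PA = LU with partial pivoting (Doolittle form, with L unit lower triangular) and then solves Ax = b by forward and back substitution. It is a plain-array textbook version written for illustration, not Colt's own LU class; the name `LuSolver` is hypothetical, and the code assumes a square, non-singular input.

```java
import java.util.Arrays;

/** Hypothetical textbook LU decomposition with partial pivoting: P*A = L*U. */
class LuSolver {
    private final double[][] lu; // L below the diagonal (unit diagonal implied), U on/above it
    private final int[] piv;     // piv[i] = index of the original row now at position i

    LuSolver(double[][] a) {
        int n = a.length;
        lu = new double[n][];
        for (int i = 0; i < n; i++) lu[i] = a[i].clone(); // leave the input untouched
        piv = new int[n];
        for (int i = 0; i < n; i++) piv[i] = i;
        for (int k = 0; k < n; k++) {
            int p = k; // pick the row with the largest |pivot| for numerical stability
            for (int i = k + 1; i < n; i++) if (Math.abs(lu[i][k]) > Math.abs(lu[p][k])) p = i;
            if (lu[p][k] == 0.0) throw new ArithmeticException("matrix is singular");
            double[] rowTmp = lu[p]; lu[p] = lu[k]; lu[k] = rowTmp;
            int pivTmp = piv[p]; piv[p] = piv[k]; piv[k] = pivTmp;
            for (int i = k + 1; i < n; i++) {
                lu[i][k] /= lu[k][k]; // multiplier l_ik
                for (int j = k + 1; j < n; j++) lu[i][j] -= lu[i][k] * lu[k][j];
            }
        }
    }

    /** Solves A x = b: apply P, then solve L y = P b and U x = y. */
    double[] solve(double[] b) {
        int n = lu.length;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) x[i] = b[piv[i]];
        for (int i = 0; i < n; i++)          // forward substitution (unit diagonal)
            for (int j = 0; j < i; j++) x[i] -= lu[i][j] * x[j];
        for (int i = n - 1; i >= 0; i--) {   // back substitution
            for (int j = i + 1; j < n; j++) x[i] -= lu[i][j] * x[j];
            x[i] /= lu[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{2, 1, 1}, {4, -6, 0}, {-2, 7, 2}};
        double[] x = new LuSolver(a).solve(new double[] {5, -2, 9});
        System.out.println(Arrays.toString(x)); // approximately [1, 1, 2]
    }
}
```

Once the factors are computed, each additional right-hand side costs only O(n²), which is the usual reason to keep a decomposition rather than re-solving from scratch.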
|
Histogramming |
Compact, extensible, modular
and performant histogramming functionality. AIDA offers the histogramming
features of HTL and HBOOK.
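At its core, a fixed-binning 1-D histogram maps each value to a bin index in constant time and keeps separate counts for values that fall below or above the axis range. The sketch below shows only that mechanism. The class `Histogram1D` is hypothetical and much simpler than the AIDA histogram interfaces, which additionally handle weighted fills and error estimates.

```java
/** Hypothetical fixed-binning 1-D histogram with underflow and overflow counts. */
class Histogram1D {
    private final double min, max;
    private final long[] bins;
    private long underflow, overflow;

    Histogram1D(int numberOfBins, double min, double max) {
        if (numberOfBins <= 0 || !(min < max)) throw new IllegalArgumentException("bad axis");
        this.bins = new long[numberOfBins];
        this.min = min;
        this.max = max;
    }

    /** Bins are half-open [lower, upper); x == max counts as overflow; NaN is ignored. */
    void fill(double x) {
        if (Double.isNaN(x)) return;
        if (x < min) { underflow++; return; }
        if (x >= max) { overflow++; return; }
        int i = (int) ((x - min) / (max - min) * bins.length);
        if (i >= bins.length) i = bins.length - 1; // guard against rounding just below max
        bins[i]++;
    }

    long binEntries(int i) { return bins[i]; }
    long underflow() { return underflow; }
    long overflow() { return overflow; }

    /** Number of in-range entries (underflow and overflow excluded). */
    long entries() {
        long sum = 0;
        for (long c : bins) sum += c;
        return sum;
    }

    public static void main(String[] args) {
        Histogram1D h = new Histogram1D(10, 0.0, 1.0);
        for (double x : new double[] {-0.5, 0.05, 0.15, 0.15, 0.95, 1.0, 2.0}) h.fill(x);
        System.out.println(h.binEntries(1) + " " + h.underflow() + " " + h.overflow());
    }
}
```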
|
Mathematics |
Tools for basic and advanced
mathematics: arithmetic and algebra, polynomials and Chebyshev series,
Bessel and Airy functions, constants and units, trigonometric functions,
etc.
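Chebyshev series are normally evaluated with Clenshaw's recurrence rather than by computing each T_k(x) explicitly. The sketch below evaluates f(x) = c_0 T_0(x) + c_1 T_1(x) + ... + c_n T_n(x) for x in [-1, 1] via b_k = c_k + 2x b_{k+1} - b_{k+2} and f = c_0 + x b_1 - b_2. Note the convention: all coefficients are weighted equally and stored in ascending order, whereas some library routines halve the leading coefficient or expect reverse order. The class `Chebyshev` is a hypothetical illustration, not a drop-in replacement for any toolkit routine.

```java
/** Hypothetical helper: Clenshaw evaluation of a Chebyshev series. */
class Chebyshev {
    /** Returns sum_{k=0}^{n} c[k] * T_k(x), T_k being Chebyshev polynomials of the first kind. */
    static double evaluate(double[] c, double x) {
        if (c.length == 0) return 0.0;
        double b1 = 0.0, b2 = 0.0; // hold b_{k+1} and b_{k+2}
        for (int k = c.length - 1; k >= 1; k--) {
            double bk = c[k] + 2.0 * x * b1 - b2;
            b2 = b1;
            b1 = bk;
        }
        return c[0] + x * b1 - b2;
    }

    public static void main(String[] args) {
        // T_2(x) = 2x^2 - 1, so the series {0, 0, 1} at x = 0.5 is -0.5
        System.out.println(evaluate(new double[] {0, 0, 1}, 0.5));
    }
}
```

A quick correctness check uses the identity T_k(cos θ) = cos(kθ).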
|
Statistics |
Tools for basic and advanced
statistics: estimators, Gamma functions, Beta functions, probabilities,
special integrals, etc.
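Among the simplest estimators are the sample mean and variance. The sketch below computes both in a single pass using Welford's update, which avoids the cancellation that the naive "sum of squares minus n times mean squared" formula can suffer on large, nearly constant data. `RunningStats` is a hypothetical illustrative helper, not Colt's statistics API.

```java
/** Hypothetical one-pass estimator of mean and variance (Welford's algorithm). */
class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // deliberately uses the updated mean
    }

    long size() { return n; }

    double mean() { return mean; }

    /** Unbiased sample variance (divides by n - 1); NaN for fewer than 2 values. */
    double sampleVariance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    /** Population variance (divides by n); NaN for an empty sample. */
    double populationVariance() { return n > 0 ? m2 / n : Double.NaN; }

    public static void main(String[] args) {
        RunningStats s = new RunningStats();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) s.add(x);
        // Exact values for this data: mean 5, population variance 4, sample variance 32/7
        System.out.println(s.mean() + " " + s.populationVariance() + " " + s.sampleVariance());
    }
}
```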
|
Random Numbers and Random Sampling |
Statistically strong yet quick random number generators and random sampling,
partly a port of CLHEP.
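Random sampling from a distribution usually starts from a uniform generator. When the cumulative distribution function F is invertible, inverse-transform sampling maps a uniform u in [0, 1) to x = F⁻¹(u); for the exponential distribution with rate λ this gives x = -ln(1 - u)/λ. The sketch below uses `java.util.Random` as the uniform source purely for brevity. It is not one of Colt's generator engines, and the class `ExponentialSampler` is hypothetical.

```java
import java.util.Random;

/** Hypothetical exponential sampler using inverse-transform sampling. */
class ExponentialSampler {
    private final Random uniform;
    private final double lambda;

    ExponentialSampler(double lambda, long seed) {
        if (!(lambda > 0)) throw new IllegalArgumentException("lambda must be > 0");
        this.lambda = lambda;
        this.uniform = new Random(seed); // seeded, so simulations are reproducible
    }

    /** F(x) = 1 - exp(-lambda x), hence F^{-1}(u) = -ln(1 - u) / lambda. */
    double next() {
        double u = uniform.nextDouble();  // in [0, 1), so 1 - u is in (0, 1] and the log is finite
        return -Math.log1p(-u) / lambda;  // log1p(-u) == ln(1 - u), accurate for small u
    }

    public static void main(String[] args) {
        ExponentialSampler s = new ExponentialSampler(2.0, 12345L);
        int n = 100_000;
        double sum = 0;
        for (int i = 0; i < n; i++) sum += s.next();
        System.out.println("sample mean = " + sum / n + " (theoretical 1/lambda = 0.5)");
    }
}
```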
|
util.concurrent |
Efficient utility classes commonly encountered in parallel & concurrent programming.
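A typical pattern in parallel programming is to split work into independent tasks on a thread pool and then combine the partial results. The sketch below expresses this with `java.util.concurrent` from the modern JDK, which grew out of Doug Lea's util.concurrent package, rather than with the classes of the bundled Concurrent library, whose API differs. The class `ParallelSum` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical parallel sum: split an array into chunks, sum each chunk on a pool thread. */
class ParallelSum {
    static double sum(double[] data, int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Double>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start, to = Math.min(start + chunk, data.length);
                parts.add(pool.submit(() -> {
                    double s = 0;
                    for (int i = from; i < to; i++) s += data[i]; // tasks read disjoint ranges
                    return s;
                }));
            }
            double total = 0;
            for (Future<Double> f : parts) total += f.get(); // blocks until each task finishes
            return total;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 10;
        System.out.println(sum(data, 4));
    }
}
```

Since each task only reads its own slice and results are combined through `Future.get()`, no explicit locking is needed here.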
|
|
Design Goals |
Efficiency |
Routines are typically fast, due both
to the chosen algorithms and data structures and to careful
implementation. For comparative benchmarks, the latest stable JDK is recommended.
|
User friendliness |
To the casual user, this
is a high-level, object-oriented toolkit consisting of classes that directly
provide the most frequently needed functionality. Most users will never need
to extend or modify any code. Classes are cleanly separated into several
mostly self-contained packages.
|
Expert friendliness |
In our view, implementations
should not be hidden. Instead, users should be encouraged to look under the
hood as they see fit and even tinker with the code. Not only the public API
but also the internal code is extensively documented. Users who wish to enrich,
modify or customize functionality should be able to do so without much effort.
|
Safety |
Most methods defensively check preconditions
and throw appropriate exceptions. However, almost none of them are synchronized,
so instances shared between threads must be synchronized externally.
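The sketch below illustrates both points with a hypothetical, non-synchronized counter array: each method validates its arguments and throws a standard exception, while callers that share an instance between threads lock on it themselves. Neither the class `Counters` nor the locking scheme is taken from Colt.

```java
/** Hypothetical non-synchronized container that defensively checks its preconditions. */
class Counters {
    private final long[] counts;

    Counters(int size) {
        if (size < 0) throw new IllegalArgumentException("size must be >= 0: " + size);
        counts = new long[size];
    }

    /** Not synchronized: concurrent callers must lock externally. */
    void increment(int index) {
        checkIndex(index);
        counts[index]++; // read-modify-write, not atomic on its own
    }

    long get(int index) {
        checkIndex(index);
        return counts[index];
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= counts.length)
            throw new IndexOutOfBoundsException("index " + index + ", size " + counts.length);
    }

    public static void main(String[] args) throws InterruptedException {
        Counters shared = new Counters(1);
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                synchronized (shared) { // external locking makes the update safe
                    shared.increment(0);
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(shared.get(0)); // 200000, guaranteed by the lock and join()
        try {
            shared.get(5);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Leaving synchronization to the caller avoids paying for locks in the common single-threaded case, at the price of making thread safety the caller's responsibility.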
|
|
|