GNU Scientific Library – Reference Manual: The histogram probability distribution struct

23.10 The histogram probability distribution struct

The probability distribution function for a histogram consists of a set of bins which measure the probability of an event falling into a given range of a continuous variable x. A probability distribution function is defined by the following struct, which actually stores the cumulative probability distribution function. This is the natural quantity for generating samples via the inverse transform method, because there is a one-to-one mapping between the cumulative probability distribution and the range [0,1]. It can be shown that by taking a uniform random number in this range and finding its corresponding coordinate in the cumulative probability distribution we obtain samples with the desired probability distribution.
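The claim in the last sentence follows from a short calculation. It is sketched here under the assumption that the cumulative distribution F is continuous and strictly increasing; the symbols F, U and X are notation introduced for this sketch, not names used by the library.

```latex
\begin{aligned}
X &= F^{-1}(U), \qquad U \sim \mathrm{Uniform}[0,1], \\
P(X \le x) &= P\bigl(F^{-1}(U) \le x\bigr) = P\bigl(U \le F(x)\bigr) = F(x).
\end{aligned}
```

The middle step uses the fact that F is increasing, and the last step uses P(U <= u) = u for u in [0,1]. For a histogram, F is piecewise linear and is flat over empty bins, which carry zero probability.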

Data Type: gsl_histogram_pdf

size_t n: This is the number of bins used to approximate the probability distribution function.
double * range: The ranges of the bins are stored in an array of n+1 elements pointed to by range.
double * sum: The cumulative probability for the bins is stored in an array of n+1 elements pointed to by sum, so that sum[i] and sum[i+1] bracket bin i.
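As an illustration of how the three fields fit together, the following sketch fills a cumulative array from a set of non-negative bin weights. The names toy_pdf and toy_pdf_fill are hypothetical and are not part of the library. The sketch assumes that sum holds n+1 entries with sum[0] = 0 and sum[n] = 1, which is the layout the sampling formula's sum[i+1] term requires, and that weights are normalized by their total.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields described above;
   this is not the library's own definition. */
typedef struct {
    size_t n;       /* number of bins */
    double *range;  /* n+1 bin edges (caller-owned) */
    double *sum;    /* n+1 cumulative probabilities (caller-owned, assumed layout) */
} toy_pdf;

/* Fill p->sum[0..n] from bin weights bin[0..n-1].
   Returns 0 on success, -1 if a weight is negative or the total is zero. */
int toy_pdf_fill(toy_pdf *p, const double *bin)
{
    double total = 0.0, running = 0.0;
    size_t i;

    assert(p != NULL && bin != NULL && p->n > 0);

    /* Reject negative weights and accumulate the normalization constant. */
    for (i = 0; i < p->n; i++) {
        if (bin[i] < 0.0)
            return -1;
        total += bin[i];
    }
    if (total <= 0.0)
        return -1;

    /* Running sum, normalized so that the last entry is 1. */
    p->sum[0] = 0.0;
    for (i = 0; i < p->n; i++) {
        running += bin[i];
        p->sum[i + 1] = running / total;
    }
    return 0;
}
```

For three bins with weights 1, 2 and 1, this produces the cumulative values 0, 0.25, 0.75 and 1.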

The following functions allow you to create a gsl_histogram_pdf struct which represents this probability distribution and generate random samples from it.

Function: gsl_histogram_pdf * gsl_histogram_pdf_alloc (size_t n): This function allocates memory for a probability distribution with n bins and returns a pointer to a newly initialized gsl_histogram_pdf struct. If insufficient memory is available, a null pointer is returned and the error handler is invoked with an error code of GSL_ENOMEM.

Function: int gsl_histogram_pdf_init (gsl_histogram_pdf * p, const gsl_histogram * h): This function initializes the probability distribution p with the contents of the histogram h. If any of the bins of h are negative, the error handler is invoked with an error code of GSL_EDOM, because a probability distribution cannot contain negative values.

Function: void gsl_histogram_pdf_free (gsl_histogram_pdf * p): This function frees the probability distribution function p and all of the memory associated with it.

Function: double gsl_histogram_pdf_sample (const gsl_histogram_pdf * p, double r): This function uses r, a uniform random number between zero and one, to compute a single random sample from the probability distribution p. The sample s is computed using the following formula,

s = range[i] + delta * (range[i+1] - range[i])

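where i is the index that satisfies sum[i] <= r < sum[i+1] and delta is (r - sum[i])/(sum[i+1] - sum[i]). In other words, r selects bin i through the cumulative sums, and the sample is placed within that bin by linear interpolation.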
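The formula can be sketched in a few lines of C. The helper toy_pdf_sample below is hypothetical and is not the library's implementation. It assumes that range and sum each hold n+1 entries with sum[0] = 0 and sum[n] = 1, that r lies in [0,1), and it uses a binary search to locate i (the search strategy is a choice made for this sketch).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: map a uniform number r
   in [0,1) to a sample using tabulated cumulative probabilities.
   range: n+1 bin edges; sum: n+1 cumulative values with sum[0] == 0
   and sum[n] == 1 (assumptions of this sketch). */
double toy_pdf_sample(const double *range, const double *sum, size_t n, double r)
{
    size_t lo = 0, hi = n;
    double delta;

    assert(range != NULL && sum != NULL && n > 0);
    assert(r >= 0.0 && r < 1.0);

    /* Binary search keeping the invariant sum[lo] <= r < sum[hi].
       On exit hi == lo + 1, so lo is the index i of the formula. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (r >= sum[mid])
            lo = mid;
        else
            hi = mid;
    }

    /* Fractional position of r inside bin lo; the denominator is
       positive because sum[lo] <= r < sum[lo + 1]. */
    delta = (r - sum[lo]) / (sum[lo + 1] - sum[lo]);

    /* Linear interpolation between the edges of the selected bin. */
    return range[lo] + delta * (range[lo + 1] - range[lo]);
}
```

As a worked case, take bin edges 0, 1, 2, 3 and cumulative values 0, 0.25, 0.75, 1. For r = 0.5 the index is i = 1, delta = (0.5 - 0.25)/(0.75 - 0.25) = 0.5, and the sample is s = 1 + 0.5 * (2 - 1) = 1.5.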