GNU Scientific Library – Reference Manual: VEGAS

25.4 VEGAS

The VEGAS algorithm of Lepage is based on importance sampling. It samples points from the probability distribution described by the function |f|, so that the points are concentrated in the regions that make the largest contribution to the integral.

In general, if the Monte Carlo integral of f is sampled with points distributed according to a probability distribution described by the function g, we obtain an estimate E_g(f; N),

E_g(f; N) = E(f/g; N)

with a corresponding variance,

\Var_g(f; N) = \Var(f/g; N).

If the probability distribution is chosen as g = |f|/I(|f|) then it can be shown that the variance V_g(f; N) vanishes, and the error in the estimate will be zero. In practice it is not possible to sample from the exact distribution g for an arbitrary function, so importance sampling algorithms aim to produce efficient approximations to the desired distribution.

The VEGAS algorithm approximates the exact distribution by making a number of passes over the integration region while histogramming the function f. Each histogram is used to define a sampling distribution for the next pass. Asymptotically this procedure converges to the desired distribution. In order to avoid the number of histogram bins growing like K^d the probability distribution is approximated by a separable function: g(x_1, x_2, ...) = g_1(x_1) g_2(x_2) ... so that the number of bins required is only Kd. This is equivalent to locating the peaks of the function from the projections of the integrand onto the coordinate axes. The efficiency of VEGAS depends on the validity of this assumption. It is most efficient when the peaks of the integrand are well-localized. If an integrand can be rewritten in a form which is approximately separable this will increase the efficiency of integration with VEGAS.

VEGAS incorporates a number of additional features, and combines both stratified sampling and importance sampling. The integration region is divided into a number of “boxes”, with each box getting a fixed number of points (the goal is 2). Each box can then have a fractional number of bins, but if the ratio of bins-per-box is less than two, Vegas switches to a kind variance reduction (rather than importance sampling).

Function: gsl_monte_vegas_state * gsl_monte_vegas_alloc (size_t dim): This function allocates and initializes a workspace for Monte Carlo integration in dim dimensions. The workspace is used to maintain the state of the integration.

Function: int gsl_monte_vegas_init (gsl_monte_vegas_state* s): This function initializes a previously allocated integration state. This allows an existing workspace to be reused for different integrations.

Function: int gsl_monte_vegas_integrate (gsl_monte_function * f, double xl[], double xu[], size_t dim, size_t calls, gsl_rng * r, gsl_monte_vegas_state * s, double * result, double * abserr): This routines uses the VEGAS Monte Carlo algorithm to integrate the function f over the dim-dimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu, each of size dim. The integration uses a fixed number of function calls calls, and obtains random sampling points using the random number generator r. A previously allocated workspace s must be supplied. The result of the integration is returned in result, with an estimated absolute error abserr. The result and its error estimate are based on a weighted average of independent samples. The chi-squared per degree of freedom for the weighted average is returned via the state struct component, s->chisq, and must be consistent with 1 for the weighted average to be reliable.

Function: void gsl_monte_vegas_free (gsl_monte_vegas_state * s): This function frees the memory associated with the integrator state s.

The VEGAS algorithm computes a number of independent estimates of the integral internally, according to the iterations parameter described below, and returns their weighted average. Random sampling of the integrand can occasionally produce an estimate where the error is zero, particularly if the function is constant in some regions. An estimate with zero error causes the weighted average to break down and must be handled separately. In the original Fortran implementations of VEGAS the error estimate is made non-zero by substituting a small value (typically 1e-30). The implementation in GSL differs from this and avoids the use of an arbitrary constant—it either assigns the value a weight which is the average weight of the preceding estimates or discards it according to the following procedure,

current estimate has zero error, weighted average has finite error: The current estimate is assigned a weight which is the average weight of the preceding estimates.
current estimate has finite error, previous estimates had zero error: The previous estimates are discarded and the weighted averaging procedure begins with the current estimate.
current estimate has zero error, previous estimates had zero error: The estimates are averaged using the arithmetic mean, but no error is computed.

The convergence of the algorithm can be tested using the overall chi-squared value of the results, which is available from the following function:

Function: double gsl_monte_vegas_chisq (const gsl_monte_vegas_state * s): This function returns the chi-squared per degree of freedom for the weighted estimate of the integral. The returned value should be close to 1. A value which differs significantly from 1 indicates that the values from different iterations are inconsistent. In this case the weighted error will be under-estimated, and further iterations of the algorithm are needed to obtain reliable results.

Function: void gsl_monte_vegas_runval (const gsl_monte_vegas_state * s, double * result, double * sigma): This function returns the raw (unaveraged) values of the integral result and its error sigma from the most recent iteration of the algorithm.

The VEGAS algorithm is highly configurable. Several parameters can be changed using the following two functions.

Function: void gsl_monte_vegas_params_get (const gsl_monte_vegas_state * s, gsl_monte_vegas_params * params): This function copies the parameters of the integrator state into the user-supplied params structure.

Function: void gsl_monte_vegas_params_set (gsl_monte_vegas_state * s, const gsl_monte_vegas_params * params): This function sets the integrator parameters based on values provided in the params structure.

Typically the values of the parameters are first read using gsl_monte_vegas_params_get, the necessary changes are made to the fields of the params structure, and the values are copied back into the integrator state using gsl_monte_vegas_params_set. The functions use the gsl_monte_vegas_params structure which contains the following fields:

Variable: double alpha: The parameter alpha controls the stiffness of the rebinning algorithm. It is typically set between one and two. A value of zero prevents rebinning of the grid. The default value is 1.5.

Variable: size_t iterations: The number of iterations to perform for each call to the routine. The default value is 5 iterations.

Variable: int stage: Setting this determines the stage of the calculation. Normally, stage = 0 which begins with a new uniform grid and empty weighted average. Calling VEGAS with stage = 1 retains the grid from the previous run but discards the weighted average, so that one can “tune” the grid using a relatively small number of points and then do a large run with stage = 1 on the optimized grid. Setting stage = 2 keeps the grid and the weighted average from the previous run, but may increase (or decrease) the number of histogram bins in the grid depending on the number of calls available. Choosing stage = 3 enters at the main loop, so that nothing is changed, and is equivalent to performing additional iterations in a previous call.

Variable: int mode: The possible choices are GSL_VEGAS_MODE_IMPORTANCE, GSL_VEGAS_MODE_STRATIFIED, GSL_VEGAS_MODE_IMPORTANCE_ONLY. This determines whether VEGAS will use importance sampling or stratified sampling, or whether it can pick on its own. In low dimensions VEGAS uses strict stratified sampling (more precisely, stratified sampling is chosen if there are fewer than 2 bins per box).

Variable: int verbose
Variable: FILE * ostream: These parameters set the level of information printed by VEGAS. All information is written to the stream ostream. The default setting of verbose is -1, which turns off all output. A verbose value of 0 prints summary information about the weighted average and final result, while a value of 1 also displays the grid coordinates. A value of 2 prints information from the rebinning procedure for each iteration.

The above fields and the chisq value can also be accessed directly in the gsl_monte_vegas_state but such use is deprecated.