GNU Scientific Library – Reference Manual: Nonlinear Least-Squares Overview

39.1 Overview

The problem of multidimensional nonlinear least-squares fitting requires the minimization of the squared residuals of n functions, f_i, in p parameters, x_i,

\Phi(x) = (1/2) || f(x) ||^2
        = (1/2) \sum_{i=1}^{n} f_i(x_1, ..., x_p)^2

In trust region methods, the objective (or cost) function \Phi(x) is approximated by a model function m_k(\delta) in the vicinity of some point x_k. The model function is often simply a second order Taylor series expansion around the point x_k, ie:

\Phi(x_k + \delta) ~=~ m_k(\delta) = \Phi(x_k) + g_k^T \delta + 1/2 \delta^T B_k \delta

where g_k = \nabla \Phi(x_k) = J^T f is the gradient vector at the point x_k, B_k = \nabla^2 \Phi(x_k) is the Hessian matrix at x_k, or some approximation to it, and J is the n-by-p Jacobian matrix J_{ij} = d f_i / d x_j. In order to find the next step \delta, we minimize the model function m_k(\delta), but search for solutions only within a region where we trust that m_k(\delta) is a good approximation to the objective function \Phi(x_k + \delta). In other words, we seek a solution of the trust region subproblem (TRS)

\min_(\delta \in R^p) m_k(\delta), s.t. || D_k \delta || <= \Delta_k

where \Delta_k > 0 is the trust region radius and D_k is a scaling matrix. If D_k = I, then the trust region is a ball of radius \Delta_k centered at x_k. In some applications, the parameter vector x may have widely different scales. For example, one parameter might be a temperature on the order of 10^3 K, while another might be a length on the order of 10^{-6} m. In such cases, a spherical trust region may not be the best choice, since if \Phi changes rapidly along directions with one scale, and more slowly along directions with a different scale, the model function m_k may be a poor approximation to \Phi along the rapidly changing directions. In such problems, it may be best to use an elliptical trust region, by setting D_k to a diagonal matrix whose entries are designed so that the scaled step D_k \delta has entries of approximately the same order of magnitude.

The trust region subproblem above normally amounts to solving a linear least squares system (or multiple systems) for the step \delta. Once \delta is computed, it is checked whether or not it reduces the objective function \Phi(x). A useful statistic for this is to look at the ratio

\rho_k = ( \Phi(x_k) - \Phi(x_k + \delta_k) / ( m_k(0) - m_k(\delta_k) )

where the numerator is the actual reduction of the objective function due to the step \delta_k, and the denominator is the predicted reduction due to the model m_k. If \rho_k is negative, it means that the step \delta_k increased the objective function and so it is rejected. If \rho_k is positive, then we have found a step which reduced the objective function and it is accepted. Furthermore, if \rho_k is close to 1, then this indicates that the model function is a good approximation to the objective function in the trust region, and so on the next iteration the trust region is enlarged in order to take more ambitious steps. When a step is rejected, the trust region is made smaller and the TRS is solved again. An outline for the general trust region method used by GSL can now be given.

Trust Region Algorithm

Initialize: given x_0, construct m_0(\delta), D_0 and \Delta_0 > 0
For k = 0, 1, 2, ...
1. If converged, then stop
2. Solve TRS for trial step \delta_k
3. Evaluate trial step by computing \rho_k
  1. if step is accepted, set x_{k+1} = x_k + \delta_k and increase radius, \Delta_{k+1} = \alpha \Delta_k
  2. if step is rejected, set x_{k+1} = x_k and decrease radius, \Delta_{k+1} = {\Delta_k \over \beta}; goto 2(b)
4. Construct m_{k+1}(\delta) and D_{k+1}

GSL offers the user a number of different algorithms for solving the trust region subproblem in 2(b), as well as different choices of scaling matrices D_k and different methods of updating the trust region radius \Delta_k. Therefore, while reasonable default methods are provided, the user has a lot of control to fine-tune the various steps of the algorithm for their specific problem.