GNU Scientific Library – Reference Manual: Large Dense Linear Systems Normal Equations

38.6.1 Normal Equations Approach

The normal equations approach to the large linear least squares problem described above is popular due to its speed and simplicity. Since the normal equations solution to the problem is given by

c = (X^T X + \lambda^2 I)^{-1} X^T y

only the p-by-p matrix X^T X and the p-by-1 vector X^T y need to be stored, rather than the full n-by-p matrix X. Using the partition scheme described above, in which X and y are split into row blocks X_i and y_i, these are given by

X^T X = \sum_i X_i^T X_i
X^T y = \sum_i X_i^T y_i
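The accumulation step can be sketched in plain C as follows. This sketch makes no GSL calls. It assumes each block X_i is stored row-major as an n_i-by-p array, and the function name `accumulate_block` is hypothetical. Because X^T X is symmetric, only its lower triangle (column k <= row j) is updated. Memory use stays at O(p^2) no matter how many rows are processed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add the contribution of one row block (Xi, yi) to the
 * running sums XtX += Xi^T Xi and Xty += Xi^T yi.
 * Xi is ni-by-p, row-major; yi has length ni.
 * XtX is p-by-p, row-major; only its lower triangle (k <= j) is written,
 * since X^T X is symmetric. Xty has length p.
 * Both XtX and Xty must be zero-initialized before the first block. */
static void accumulate_block(size_t p, size_t ni, const double *Xi,
                             const double *yi, double *XtX, double *Xty)
{
    assert(Xi != NULL && yi != NULL && XtX != NULL && Xty != NULL);
    for (size_t r = 0; r < ni; ++r) {
        const double *row = Xi + r * p; /* r-th row of the block */
        for (size_t j = 0; j < p; ++j) {
            /* lower triangle only: entries (j, k) with k <= j */
            for (size_t k = 0; k <= j; ++k)
                XtX[j * p + k] += row[j] * row[k];
            Xty[j] += row[j] * yi[r];
        }
    }
}
```

Suppose the rows (1,1), (1,2) and then (1,3), (1,4) are fed as two blocks, with y = (2,3) and (5,6). XtX ends up with lower triangle 4, 10, 30, and Xty = (16, 47). This is the same result as processing all four rows at once, which is the point of the partitioned sums.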

Since the matrix X^T X is symmetric, only its lower (or upper) triangle needs to be computed. Once all of the blocks (X_i, y_i) have been accumulated into the final X^T X and X^T y, the system is solved with a Cholesky factorization of X^T X. If the Cholesky factorization fails (occasionally due to numerical rounding errors), a QR decomposition is used instead. In both cases, the X^T X matrix is first transformed by a diagonal scaling to reduce its condition number, which yields a more accurate solution vector.

The normal equations approach is the fastest method for solving the large least squares problem, and it is accurate when X is well-conditioned. For ill-conditioned X, however, as is often the case for large systems, this method can suffer from numerical instabilities (see Trefethen and Bau, 1997). The cause is that forming X^T X squares the 2-norm condition number, \kappa(X^T X) = \kappa(X)^2. Errors in the computed solution can therefore be amplified by \kappa(X)^2 rather than \kappa(X). The number of operations for this method is O(np^2 + {1 \over 3}p^3). The np^2 term comes from forming X^T X, and the {1 \over 3}p^3 term comes from the Cholesky factorization.