Limited-memory BFGS for MATLAB

Introduction

This is a small class I wrote using MATLAB's object-oriented programming framework. It implements a limited-memory quasi-Newton (BFGS, which stands for Broyden, Fletcher, Goldfarb and Shanno) method for approximating the Hessian of the objective function. This approximation is obtained via a small number of measurements of the gradient. For background on quasi-Newton methods for optimization, see the references below.

If you have any questions, praise, or comments, or would like to report a bug, do not hesitate to contact the author. I've tested this software in MATLAB version 7.8 on both the Linux operating systems.

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Canada License. Please cite the source as Peter Carbonetto, Department of Computer Science, University of British Columbia.

Overview

The files in this package that are of greatest interest are:

  • @LBfgs/LBfgs.m Implementation of the limited-memory BFGS representation.
  • lassoip.m The primal-dual interior-point method illustrating use of the limited-memory BFGS representation for solving a constrained optimization problem. This function computes the linear regression model that maximizes the likelihood of the training data, subject to an L1 regularization term that enforces sparsity in the final solution (otherwise known in the statistics literature as the "Lasso").
  • example.m A short script demonstrating how to use the interior-point method for estimating the coefficients of the regression model.

Installation

You can download the MATLAB code for this project here.

Using the LBfgs class

The best way to understand how to use the LBfgs class for MATLAB is to go through the example usage in the file lassoip.m.

Initialization. To create the initial quasi-Newton approximation, you would call the class constructor with code that looks something like this:

    B = LBfgs(n,m);
This creates a new LBfgs object. The first input specifies the number of optimization variables, and the second input specifies the number of correction pairs. If you choose to store a large number of correction pairs, then the quasi-Newton approximation to the Hessian will be more accurate, but the drawback is that it will be more expensive to use.

Updating. Once you've visited two points in your optimization algorithm (with the appropriate line search ensuring that the Wolfe conditions are satisfied), you can update the BFGS approximation using the command

    B = update(B,s,y);
The first input is the LBfgs object to be updated. The second input is the difference between the two points, and the third input is the difference between the gradient vectors and each of those points.

Efficient multiplication. If for any reason you need to multiply the symmetric quasi-Newton approximation times a vector, simply use the overloaded matrix multiplication operator and type y = B * x. How this multiplication is done efficiently is described in the journal paper by Byrd, Nocedal and Schnabel (1994). (See the references below.)

Efficient linear solver. Another key method for the LBfgs class efficiently computes the solution to a system of linear equations of the form

    (W + B) x = -b
To calculate the solution to this system, call
    x = solve(B,b,W)
The first input is the quasi-Newton representation. The second input is the vector on the right-hand side of the linear system of equations. The third input is a sparse, symmetric matrix. Note that this matrix may be greater than the size of the limited-memory approximation to the Hessian, in which case we only add B to the left-left portion of W. Note: to use this linear solver correctly, you must first add to the matrix W the scaling factor "sigma" times the identity matrix. For instance, if W has the same dimensions as the quasi-Newton approximation, you would type the following:
    W = W + B.sigma * eye(n);
before calling the solve method. This method is particularly useful for efficiently solving the primal-dual systems that arise in interior-point methods; provided the matrix W has a nice sparsity structure (e.g. block diagonal), then computing the factors of this system will be done very quickly. This is exactly the case for the primal-dual interior-point solver implemented in lassoip.m.

References

Richard H Byrd, Jorge Nocedal, Robert B Schnabel (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, Vol. 63, No. 1., pp. 129-156.

Jorge Nocedal and Stephen J. Wright (2006). Numerical Optimization, 2nd edition. Springer.

R. A. Waltz, J. L. Morales, J. Nocedal, D. Orban (2006). An interior algorithm for nonlinear optimization that combines line search and trust region steps. Mathematical Programming, Vol. 107, No. 3., pp. 391-408.


June 8, 2009