Nonlinear Optimization
Overview of methods; the Newton method with line search

Niclas Börlin
Department of Computing Science, Umeå University
niclas.borlin@cs.umu.se

November 19, 2007

Overview

Most deterministic methods for unconstrained optimization have the following features:

- They are iterative, i.e. they start with an initial guess $x_0$ of the variables and try to find better points $\{x_k\}$, $k = 1, \ldots$
- They are descent methods, i.e. at each iteration $k$, $f(x_{k+1}) < f(x_k)$ is (at least) required.
- At each iteration $k$, the nonlinear objective function $f$ is replaced by a simpler model function $m_k$ that approximates $f$ around $x_k$. The next iterate $x_{k+1} = x_k + p$ is sought as the minimizer of $m_k$.

The model function $m_k$ is usually defined to be a quadratic function of the form
$m_k(x_k + p) = f_k + p^T \nabla f_k + \frac{1}{2} p^T B_k p$,
where $f_k = f(x_k)$, $\nabla f_k = \nabla f(x_k)$, and $B_k$ is a matrix, usually a positive definite approximation of the Hessian $\nabla^2 f(x_k)$.

If $B_k$ is positive definite, a minimizer of $m_k$ may be found by solving $\nabla_p m_k(x_k + p) = 0$ for $p$.

If the minimizer of $m_k$ does not produce a better point, the step $p$ is modified to produce a point $x_{k+1} = x_k + p$ that is better. The modifications come in two major flavours: line search and trust-region.
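As a concrete illustration (not part of the original slides; the Rosenbrock test function and all names below are my own choices), the sketch computes the minimizer of the quadratic model at one point by solving $B_k p = -\nabla f_k$ with $B_k = \nabla^2 f(x_k)$:

```python
import numpy as np

def f(x):
    # Rosenbrock function, used here only as an example objective.
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

xk = np.array([-1.2, 1.0])
Bk = hess(xk)                        # B_k = Hessian; could be any SPD approximation
p = np.linalg.solve(Bk, -grad(xk))   # minimizer of m_k: solve B_k p = -grad f_k
print("step p =", p, " f(xk) =", f(xk), " f(xk + p) =", f(xk + p))
```

Note that the linear system is solved directly; $B_k^{-1}$ is never formed explicitly.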

Line search

In the line search strategy, the algorithm chooses a search direction $p_k$ and tries to solve the following one-dimensional minimization problem
$\min_{\alpha > 0} f(x_k + \alpha p_k)$,
where the scalar $\alpha$ is called the step length.

In theory we would like optimal step lengths, but in practice it is more efficient to test trial step lengths until we find one that gives us a good enough point.


Trust-region

In the trust-region strategy, the algorithm defines a region of trust around $x_k$ where the current model function $m_k$ is trusted. The region of trust is usually defined as
$\|p\|_2 \le \Delta$,
where the scalar $\Delta$ is called the trust-region radius.

A candidate step $p$ is found by approximately solving the following subproblem
$\min_p m_k(x_k + p)$ s.t. $\|p\|_2 \le \Delta$.

[Figure: contour plot of $f$ with the trust region $\|p\|_2 \le \Delta$ around the current iterate $x_k$.]

If the candidate step does not produce a good enough new point, we shrink the trust-region radius and re-solve the subproblem.
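A crude sketch of this strategy follows (my own illustration, not from the slides; the Rosenbrock test function, the acceptance threshold 0.1 and the radius update rules are assumptions). The subproblem is "solved" simply by truncating the unconstrained model minimizer to the trust-region boundary; real implementations use more careful approximate solvers such as dogleg-type methods.

```python
import numpy as np

def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

def model_reduction(x, p):
    # Reduction f(x_k) - m_k(x_k + p) predicted by the quadratic model.
    return -(grad(x) @ p + 0.5 * p @ hess(x) @ p)

x, delta = np.array([-1.2, 1.0]), 1.0
for k in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:
        break
    p = np.linalg.solve(hess(x), -g)         # unconstrained model minimizer
    if np.linalg.norm(p) > delta:
        p = p * (delta / np.linalg.norm(p))  # truncate to the trust-region boundary
    predicted = model_reduction(x, p)
    actual = f(x) - f(x + p)
    if predicted > 0 and actual > 0.1 * predicted:
        x = x + p                            # good enough point: accept the step
        delta = min(2.0 * delta, 10.0)       # ... and possibly expand the region
    else:
        delta *= 0.5                         # poor step: shrink and re-solve
print(k, x, f(x))
```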

In the line search strategy, the direction is chosen first, followed by the distance. In the trust-region strategy, the maximum distance is chosen first, followed by the direction.

[Figures: contour plots comparing a line search step and a trust-region step $p$ from the current iterate $x_k$.]

Convergence rate

In order to compare different iterative methods, we need an efficiency measure. Since we do not know the number of iterations in advance, the computational complexity measure used by direct methods cannot be used. Instead the concept of a convergence rate is defined.

Assume we have a sequence $\{x_k\}$ that converges to a solution $x^*$. Define the sequence of errors as $e_k = x_k - x^*$ and note that
$\lim_{k \to \infty} e_k = 0$.

We say that the sequence $\{x_k\}$ converges to $x^*$ with rate $r$ and rate constant $C$ if
$\lim_{k \to \infty} \frac{\|e_{k+1}\|}{\|e_k\|^r} = C$ and $C < \infty$.
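As a numerical illustration (mine, not from the slides), the snippet below estimates $r$ and $C$ from the tail of an error sequence using $r \approx \log(\|e_{k+1}\|/\|e_k\|) / \log(\|e_k\|/\|e_{k-1}\|)$ and $C \approx \|e_{k+1}\|/\|e_k\|^r$; the two synthetic sequences are constructed to converge linearly and quadratically.

```python
import numpy as np

def estimate_rate(errors):
    # Estimate r and C from the last three errors of a sequence, using
    # r ~ log(e_{k+1}/e_k) / log(e_k/e_{k-1}) and C ~ e_{k+1} / e_k**r.
    e0, e1, e2 = errors[-3], errors[-2], errors[-1]
    r = np.log(e2 / e1) / np.log(e1 / e0)
    return r, e2 / e1**r

linear    = [0.5**k for k in range(1, 12)]        # e_{k+1} = 0.5 * e_k
quadratic = [0.5**(2**k) for k in range(1, 6)]    # e_{k+1} = e_k**2

print("linear   : r = %.2f, C = %.2f" % estimate_rate(linear))     # ~1.00, 0.50
print("quadratic: r = %.2f, C = %.2f" % estimate_rate(quadratic))  # ~2.00, 1.00
```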


In practice there are three important rates of convergence:

- linear convergence, for $r = 1$ and $0 < C < 1$;
- quadratic convergence, for $r = 2$;
- super-linear convergence, for $r = 1$ and $C = 0$.

Linear convergence, examples

For $r = 1$, $C = 0.1$ and $\|e_0\| = 1$, the norm of the error sequence becomes
$1, 10^{-1}, 10^{-2}, \ldots, 10^{-7}$ (7 iterations).

For $C = 0.99$ the corresponding sequence is
$1, 0.99, 0.9801, \ldots, 0.997 \cdot 10^{-7}$ (1604 iterations).

Thus the constant $C$ is of major importance for a method with linear convergence.
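A quick check of these iteration counts (my own illustration, not from the slides):

```python
import math

# Iterations needed to bring the error from 1 down to 1e-7 when e_{k+1} = C*e_k,
# i.e. the smallest k with C**k <= 1e-7, so k >= log(1e-7)/log(C).
for C in (0.1, 0.99):
    print(C, math.log(1e-7) / math.log(C))   # ~7.0 and ~1603.6

print(0.99**1603)   # ~1.0e-07, still above the tolerance
print(0.99**1604)   # ~9.97e-08, below the tolerance after 1604 iterations
```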

Quadratic convergence, examples

For $r = 2$, $C = 0.1$ and $\|e_0\| = 1$, the sequence becomes $1, 10^{-1}, 10^{-3}, 10^{-7}, \ldots$

For $r = 2$, $C = 3$ and $\|e_0\| = 1$, the sequence diverges: $1, 3, 27, \ldots$

For $r = 2$, $C = 3$ and $\|e_0\| = 0.1$, the sequence becomes $0.1, 0.03, 0.0027, \ldots$, i.e. it converges despite $C > 1$.

For quadratic convergence, the constant $C$ is of lesser importance. Instead it is important that the initial approximation is close enough to the solution, i.e. that $\|e_0\|$ is small.

Local vs. global convergence

A method is called locally convergent if it produces a convergent sequence toward a minimizer $x^*$ provided a close enough starting approximation. A method is called globally convergent if it produces a convergent sequence toward a minimizer $x^*$ provided any starting approximation. Note that global convergence does not imply convergence towards a global minimizer.

Globalization strategies

The line search and trust-region methods are sometimes called globalization strategies, since they modify a core method (typically locally convergent) to become globally convergent. There are two efficiency requirements on any globalization strategy:

- Far from the solution, they should stop the method from going out of control.
- Close to the solution, when the core method is efficient, they should interfere as little as possible.

Descent directions

Consider the Taylor expansion of the objective function along a search direction $p$,
$f(x_k + \alpha p) = f(x_k) + \alpha p^T \nabla f_k + \frac{1}{2} \alpha^2 p^T \nabla^2 f(x_k + t p) p$, for some $t \in (0, \alpha)$.

Any direction $p$ such that $p^T \nabla f_k < 0$ will produce a reduction of the objective function for a short enough step. A direction $p$ such that $p^T \nabla f_k < 0$ is called a descent direction.
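A small numerical illustration of this (mine; the quadratic test function is an assumption): a direction with $p^T \nabla f_k < 0$ decreases $f$ for a short step, while a direction with $p^T \nabla f_k > 0$ does not.

```python
import numpy as np

f    = lambda x: x[0]**2 + 5.0 * x[1]**2              # example objective
grad = lambda x: np.array([2.0 * x[0], 10.0 * x[1]])

xk = np.array([1.0, 1.0])
gk = grad(xk)
alpha = 1e-3                                           # a short step

for p in (np.array([-1.0, -1.0]), np.array([0.0, 1.0])):
    slope = p @ gk                                     # directional derivative p^T grad f_k
    decreases = f(xk + alpha * p) < f(xk)
    print(f"p = {p}, p^T grad f_k = {slope:+.1f}, f decreases: {decreases}")
```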


If the search direction has the form
$p_k = -B_k^{-1} \nabla f_k$,
the descent condition
$p_k^T \nabla f_k = -\nabla f_k^T B_k^{-1} \nabla f_k < 0$
is satisfied whenever $B_k$ is positive definite.

Since $\cos\theta = \frac{-p^T \nabla f_k}{\|p\| \, \|\nabla f_k\|}$, where $\theta$ is the angle between the search direction and the negative gradient, descent directions lie in the same half-plane as the negative gradient. The search direction corresponding to the negative gradient, $p = -\nabla f_k$, is called the direction of steepest descent.

Line search

Each iteration of a line search method computes a search direction $p_k$ and then decides how far to move along that direction. The next iterate is given by
$x_{k+1} = x_k + \alpha_k p_k$.

We will require $p_k$ to be a descent direction. This ensures that the objective function will decrease,
$f(x_k + \alpha_k p_k) < f(x_k)$,
for some small $\alpha_k > 0$.

Exact and inexact line searches

Consider the function
$\phi(\alpha) = f(x_k + \alpha p_k)$, $\alpha > 0$.

Ideally we would like to find the global minimizer of $\phi$ in every iteration. This is called an exact line search. However, it is possible to construct inexact line search methods that produce an adequate reduction of $f$ at a minimal cost. Inexact line search methods construct a number of candidate values for $\alpha$ and stop when certain conditions are satisfied.


The Sufficient Decrease Condition

Mathematically, the descent condition $f(x_k + \alpha p_k) < f(x_k)$ is not enough to guarantee convergence. Instead, the sufficient decrease condition is formulated from the linear Taylor approximation of $\phi(\alpha)$,
$\phi(\alpha) \approx \phi(0) + \alpha \phi'(0)$, or
$f(x_k + \alpha p_k) \approx f(x_k) + \alpha \nabla f_k^T p_k$.

The sufficient decrease condition states that the new point must produce at least a fraction $0 < c_1 < 1$ of the decrease predicted by the Taylor approximation, i.e.
$f(x_k + \alpha p_k) < f(x_k) + c_1 \alpha \nabla f_k^T p_k$.

[Figure: $\phi(\alpha)$ for $\alpha \in [0, 1]$ together with the sufficient decrease bound for $c_1 = 1, 0.5, 0.1, 0$.]

This condition is sometimes called the Armijo condition.
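A minimal sketch of the test (my own; the helper name, the quadratic test problem and the value of $c_1$ are assumptions):

```python
import numpy as np

def sufficient_decrease(f, xk, fk, gk, pk, alpha, c1=1e-4):
    # Armijo test: f(x_k + alpha*p_k) < f(x_k) + c1*alpha*grad_k^T p_k.
    return f(xk + alpha * pk) < fk + c1 * alpha * (gk @ pk)

f  = lambda x: x[0]**2 + 5.0 * x[1]**2     # example objective
xk = np.array([1.0, 1.0])
gk = np.array([2.0, 10.0])                 # gradient of f at xk
pk = -gk                                   # steepest-descent direction

for alpha in (1.0, 0.5, 0.25, 0.125):
    print(alpha, sufficient_decrease(f, xk, f(xk), gk, pk, alpha))
# Only the shortest of these trial steps (alpha = 0.125) satisfies the condition.
```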



Backtracking

The sufficient decrease condition alone is not enough to guarantee convergence, since it is satisfied for arbitrarily small values of $\alpha$. The sufficient decrease condition has to be combined with a strategy that favours large step lengths over small. A simple such strategy is called backtracking: accept the first element of the sequence
$1, \frac{1}{2}, \frac{1}{4}, \ldots, 2^{-i}, \ldots$
that satisfies the sufficient decrease condition. Such a step length always exists, and large step lengths are tested before small ones, so the step length will not be too small. This technique works well for Newton-type algorithms.
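A minimal backtracking sketch (mine; the test problem and names are assumptions) on the same quadratic example as above:

```python
import numpy as np

def backtracking(f, xk, fk, gk, pk, c1=1e-4, max_halvings=30):
    # Try alpha = 1, 1/2, 1/4, ... and accept the first step length that
    # satisfies the sufficient decrease (Armijo) condition.
    alpha = 1.0
    for _ in range(max_halvings):
        if f(xk + alpha * pk) < fk + c1 * alpha * (gk @ pk):
            return alpha
        alpha *= 0.5
    return alpha   # fallback; rarely reached when p_k is a descent direction

f  = lambda x: x[0]**2 + 5.0 * x[1]**2
xk = np.array([1.0, 1.0])
gk = np.array([2.0, 10.0])
print("accepted step length:", backtracking(f, xk, f(xk), gk, -gk))   # 0.125
```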

The Curvature Condition

Another approach to approximating the solution of
$\min_{\alpha > 0} \phi(\alpha) \equiv f(x_k + \alpha p_k)$
is to solve $\phi'(\alpha) = 0$, which is relaxed to the condition
$|\phi'(\alpha_k)| \le c_2 |\phi'(0)|$,
where $c_2$ is a constant with $c_1 < c_2 < 1$. Since $\phi'(\alpha) = p_k^T \nabla f(x_k + \alpha p_k)$, we get
$|p_k^T \nabla f(x_k + \alpha_k p_k)| \le c_2 |p_k^T \nabla f(x_k)|$.

This condition is called the curvature condition.



The Wolfe Condition

The sufficient decrease condition and the curvature condition,
$f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f_k^T p_k$,
$|p_k^T \nabla f(x_k + \alpha_k p_k)| \le c_2 |p_k^T \nabla f(x_k)|$,
where $0 < c_1 < c_2 < 1$, are collectively called the strong Wolfe conditions. Step length methods that use the Wolfe conditions are more complicated than backtracking. Several popular implementations of nonlinear optimization routines are based on the Wolfe conditions, notably the BFGS quasi-Newton method.
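A sketch (mine; names, test problem and the values of $c_1$, $c_2$ are assumptions) that checks the strong Wolfe conditions for a few candidate step lengths; note that a very short step can satisfy sufficient decrease but fail the curvature condition, which is exactly what the curvature condition is designed to rule out.

```python
import numpy as np

def strong_wolfe(f, grad, xk, pk, alpha, c1=1e-4, c2=0.9):
    # Check both strong Wolfe conditions for a candidate step length alpha.
    fk, gk = f(xk), grad(xk)
    decrease  = f(xk + alpha * pk) <= fk + c1 * alpha * (gk @ pk)
    curvature = abs(pk @ grad(xk + alpha * pk)) <= c2 * abs(pk @ gk)
    return decrease and curvature

f    = lambda x: x[0]**2 + 5.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 10.0 * x[1]])
xk   = np.array([1.0, 1.0])
pk   = -grad(xk)

for alpha in (0.5, 0.125, 0.001):
    print(alpha, strong_wolfe(f, grad, xk, pk, alpha))
# 0.5   fails (too long: no sufficient decrease)
# 0.125 satisfies both conditions
# 0.001 fails (too short: rejected by the curvature condition)
```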

The Newton-Raphson method in $\mathbb{R}^1$

Consider the non-linear problem $f(x) = 0$, where $f, x \in \mathbb{R}$. The Newton-Raphson method for solving this problem is based on the linear Taylor approximation of $f$ around $x_k$,
$f(x_k + p) \approx f(x_k) + p f'(x_k)$.

If $f'(x_k) \neq 0$ we solve the linear equation $f(x_k) + p f'(x_k) = 0$ for $p$ and get
$p = -f(x_k)/f'(x_k)$.

The new iterate is given by
$x_{k+1} = x_k + p_k = x_k - f(x_k)/f'(x_k)$.
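A minimal sketch (mine) of the iteration applied to $f(x) = x^2 - 2$, whose positive root is $\sqrt{2}$; the printed error shows the quadratic convergence:

```python
import math

f      = lambda x: x * x - 2.0       # f(x) = 0 has the positive root sqrt(2)
fprime = lambda x: 2.0 * x

x = 1.0
for k in range(6):
    x = x - f(x) / fprime(x)         # x_{k+1} = x_k - f(x_k)/f'(x_k)
    print(k + 1, x, abs(x - math.sqrt(2.0)))
```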



The Classical Newton minimization method in $\mathbb{R}^n$

In order to use Newton's method to find a minimizer we apply the first-order necessary condition to a function $f$,
$\nabla f(x) = 0$  $(f'(x) = 0)$.

This results in the Newton sequence
$x_{k+1} = x_k - (\nabla^2 f(x_k))^{-1} \nabla f(x_k)$  $(x_{k+1} = x_k - f'(x_k)/f''(x_k))$.

This is often written as $x_{k+1} = x_k + p_k$, where $p_k$ is the solution of the Newton equation
$\nabla^2 f(x_k) p_k = -\nabla f(x_k)$.
This formulation emphasizes that a linear equation system is solved in each step, usually by other means than calculating an inverse.
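A sketch of the classical Newton iteration (my own illustration; the Rosenbrock test function is an assumption), where the Newton equation is solved with a linear solver instead of forming the inverse:

```python
import numpy as np

def grad(x):   # gradient of the Rosenbrock function 100(x2 - x1^2)^2 + (1 - x1)^2
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x = np.array([-1.2, 1.0])
for k in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    p = np.linalg.solve(hess(x), -g)   # Newton equation: hess(x_k) p_k = -grad(x_k)
    x = x + p                          # full step, no step length control
print(k, x)                            # approaches the stationary point (1, 1)
```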

Geometrical interpretation; the model function

The approximation of the non-linear function $\nabla f(x)$ with the polynomial, linear in $p$,
$\nabla f(x_k + p) \approx \nabla f(x_k) + \nabla^2 f(x_k) p$,
corresponds to approximating the non-linear function $f(x)$ with the Taylor expansion, quadratic in $p$,
$m_k(x_k + p) \equiv f(x_k) + \nabla f(x_k)^T p + \frac{1}{2} p^T \nabla^2 f(x_k) p$,
i.e. $B_k = \nabla^2 f(x_k)$.

Newton's method can thus be interpreted as follows: at each iteration $k$, $f$ is approximated by the quadratic Taylor expansion $m_k$ around $x_k$, and $x_{k+1}$ is calculated as the minimizer of $m_k$.

Properties of the Newton method


Advantages:

- It converges quadratically toward a stationary point.

Disadvantages:

- It does not necessarily converge toward a minimizer.
- It may diverge if the starting approximation is too far from the solution.
- It will fail if $\nabla^2 f(x_k)$ is not invertible for some $k$.
- It requires second-order information $\nabla^2 f(x_k)$.

Newton's method is rarely used in its classical formulation. However, many methods may be seen as approximations of Newton's method.

Ensuring a descent direction

Since the Newton search direction $p^N$ can be written as
$p^N = -B_k^{-1} \nabla f_k$,
with $B_k = \nabla^2 f_k$, $p^N$ will be a descent direction if $\nabla^2 f_k$ is positive definite.

If $\nabla^2 f_k$ is not positive definite, the Newton direction $p^N$ may not be a descent direction. In that case we choose $B_k$ as a positive definite approximation of $\nabla^2 f_k$. Performed in a proper way, this modified algorithm will converge toward a minimizer. Furthermore, close to the solution the Hessian is usually positive definite, so the modification will only be performed far from the solution.

The positive definite approximation $B_k$ of the Hessian may be found with minimal extra effort. The search direction $p$ is calculated as the solution of
$\nabla^2 f(x) p = -\nabla f(x)$.

If $\nabla^2 f(x)$ is positive definite, the matrix factorization
$\nabla^2 f(x) = L D L^T$
may be used, where the diagonal elements of $D$ are positive.

If $\nabla^2 f(x)$ is not positive definite, at some point during the factorization a diagonal element will be $d_{ii} \le 0$. In this case, the element may be replaced with a suitable positive entry. Finally, the factorization is used to calculate the search direction from
$(L D L^T) p = -\nabla f(x)$.
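A simplified sketch of such a modified factorization (mine; the function name, the choice of replacement value and the test matrix are assumptions; production codes such as the Gill-Murray modified Cholesky are more careful):

```python
import numpy as np

def modified_ldlt(A, delta=1e-4):
    # LDL^T factorization of a symmetric A in which any diagonal element
    # d_jj <= delta is replaced by delta, so that L*D*L^T is positive
    # definite even when A is not.
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j]**2) @ d[:j]
        if d[j] <= delta:
            d[j] = delta                       # the modification
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

A = np.array([[1.0, 2.0],    # an indefinite "Hessian" (eigenvalues 3 and -1)
              [2.0, 1.0]])
g = np.array([1.0, 1.0])

L, d = modified_ldlt(A)
B = L @ np.diag(d) @ L.T                       # positive definite approximation
p = np.linalg.solve(B, -g)
print("p^T g =", p @ g, "(negative, so p is a descent direction)")
```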

The modified Newton algorithm with line search

Specify a starting approximation $x_0$ and a convergence tolerance $\epsilon$. Repeat for $k = 0, 1, \ldots$:

- If $\|\nabla f(x_k)\| < \epsilon$, stop.
- Compute the modified $L D L^T$ factorization of the Hessian.
- Solve $(L D L^T) p_k^N = -\nabla f(x_k)$ for the search direction $p_k^N$.
- Perform a line search to determine the new approximation $x_{k+1} = x_k + \alpha_k p_k^N$.
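A minimal sketch of the complete loop (my own illustration, not the original implementation; the Rosenbrock test problem is an assumption, and the positive definite modification is done here by clipping eigenvalues rather than by the modified $L D L^T$ factorization described above, which is cheaper):

```python
import numpy as np

def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

def modified_newton(x, tol=1e-8, max_iter=100, c1=1e-4):
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # convergence test
            break
        # Positive definite approximation B_k of the Hessian (here by clipping
        # eigenvalues from below; the slides instead modify the LDL^T factors).
        w, V = np.linalg.eigh(hess(x))
        B = V @ np.diag(np.maximum(w, 1e-6)) @ V.T
        p = np.linalg.solve(B, -g)               # search direction
        alpha = 1.0                              # backtracking line search
        for _ in range(30):
            if f(x + alpha * p) < f(x) + c1 * alpha * (g @ p):
                break
            alpha *= 0.5
        x = x + alpha * p
    return x, k

x, iters = modified_newton(np.array([-1.2, 1.0]))
print(iters, x)   # converges to the minimizer (1, 1)
```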
