
Optimization: an Overview

Moritz Diehl,
Optimization in Engineering Center (OPTEC) & ESAT
K.U. Leuven, Belgium
(some slide material was provided by W. Bangerth, K. Mombaur,
and P. Riede)
Overview of presentation

 Optimization: basic definitions and concepts

 Introduction to classes of optimization problems

 Introduction to Newton type optimization algorithms


What is optimization?

 Optimization = search for the best solution


 in mathematical terms:
minimization or maximization of an objective function f (x)
depending on variables x subject to constraints
Equivalence of maximization and minimization problems:
(from now on only minimization)

Maximizing f(x) is equivalent to minimizing -f(x): the maximizer x* of f(x) is the
minimizer of -f(x).
Constrained optimization

 Often the variable x must satisfy certain constraints, e.g.:

x ≥ 0
x1² + x2² = C
 General formulation:

min f(x)   subject to   g(x) = 0,   h(x) ≥ 0

f: objective function / cost function
g: equality constraints
h: inequality constraints
Simple example: Ball hanging on a spring

To find position at rest,


minimize potential energy!

min  x1² + x2² + m·x2
     (spring)    (gravity)

subject to:  1 + x1 + x2 ≥ 0
             3 - x1 + x2 ≥ 0
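A minimal numerical sketch of this toy problem (not part of the original slides): it uses scipy.optimize.minimize with SLSQP; the mass value m = 1 and the exact constants in the two inequality constraints are assumptions based on the reconstruction above.

```python
# Sketch: ball on a spring, solved as a constrained minimization with SciPy.
# Assumptions: m = 1 and constraints 1 + x1 + x2 >= 0, 3 - x1 + x2 >= 0.
import numpy as np
from scipy.optimize import minimize

m = 1.0

def potential_energy(x):
    # spring term x1^2 + x2^2 plus gravity term m*x2
    return x[0]**2 + x[1]**2 + m * x[1]

constraints = [
    {"type": "ineq", "fun": lambda x: 1.0 + x[0] + x[1]},   # 1 + x1 + x2 >= 0
    {"type": "ineq", "fun": lambda x: 3.0 - x[0] + x[1]},   # 3 - x1 + x2 >= 0
]

res = minimize(potential_energy, x0=np.array([1.0, 1.0]),
               method="SLSQP", constraints=constraints)
print(res.x)   # resting position of the ball
```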
Feasible set
Feasible set = collection of all
points that satisfy all constraints:

Example feasible set is intersection


of grey and blue area

h1(x) := x2 ≥ 0
h2(x) := 1 - x1² - x2² ≥ 0
(a half-plane and the unit disc)
Local and global optima

[Figure: a function f(x) over x with two local minima, the lower of which is the global minimum]
Derivatives
 First and second derivatives of the objective function or the
constraints play an important role in optimization

 The first order derivatives are called the gradient (of the respective function)

 and the second order derivatives are called the Hessian matrix
Optimality conditions (unconstrained)

min f(x),   x ∈ Rⁿ

Assume that f is twice differentiable.


We want to test a point x* for local
optimality.

 necessary condition:
∇f(x*) = 0   (stationarity)

 sufficient condition:
x* stationary and ∇²f(x*) positive definite
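As a small illustration (my own example, not from the slides), both conditions can be checked numerically at a candidate point by evaluating the gradient and the eigenvalues of the Hessian:

```python
# Sketch: check grad f(x*) = 0 and positive definiteness of the Hessian
# for the example function f(x) = x1^2 + 2*x2^2 + x1*x2 (chosen for illustration).
import numpy as np

def grad_f(x):
    return np.array([2*x[0] + x[1], 4*x[1] + x[0]])

def hess_f(x):
    return np.array([[2.0, 1.0],
                     [1.0, 4.0]])

x_star = np.array([0.0, 0.0])

stationary   = np.linalg.norm(grad_f(x_star)) < 1e-8            # necessary condition
pos_definite = np.all(np.linalg.eigvalsh(hess_f(x_star)) > 0)   # sufficient condition

print("stationary:", stationary, " Hessian positive definite:", pos_definite)
```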
Types of stationary points

(a)-(c): x* is stationary: ∇f(x*) = 0

∇²f(x*) positive definite: local minimum
∇²f(x*) negative definite: local maximum
∇²f(x*) indefinite: saddle point


Ball on a spring without constraints

min over x ∈ R²:   x1² + x2² + m·x2

contour lines of f(x)

gradient vector:
∇f(x) = (2·x1, 2·x2 + m)

unconstrained minimum:
0 = ∇f(x*)   ⇒   (x1*, x2*) = (0, -m/2)
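A one-line check of this result (m = 1 is assumed here, as no value is given on the slide):

```python
# Sketch: the analytic gradient of f(x) = x1^2 + x2^2 + m*x2 vanishes at (0, -m/2).
import numpy as np

m = 1.0
grad_f = lambda x: np.array([2*x[0], 2*x[1] + m])
print(grad_f(np.array([0.0, -m/2])))   # -> [0. 0.]
```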
Sometimes there are many local minima

e.g. potential energy


of macromolecule

Global optimization is a very hard problem - most algorithms find only the nearest
local minimum. But there is a favourable special case...
Convex functions

Convex: all connecting lines lie above the graph.
Non-convex: some connecting lines are not above the graph.
Convex feasible sets

Convex: all connecting lines between feasible points lie in the feasible set.
Non-convex: some connecting line between two feasible points is not in the feasible set.
Convex problems

Convex problem if
f(x) is convex and the feasible set is convex

One can show:


For convex problems, every local minimum is also a global minimum.
It is sufficient to find local minima!
Characteristics of optimization problems 1

 size / dimension of the problem n,

i.e. the number of free variables

 continuous or discrete search space

 number of minima
Characteristics of optimization problems 2

 Properties of the objective function:


type: linear, nonlinear, quadratic ...
smoothness: continuity, differentiability

 Existence of constraints

 Properties of constraints:
equalities / inequalities
type: simple bounds, linear, nonlinear,
dynamic equations (optimal control)
smoothness
Overview of presentation

 Optimization: basic definitions and concepts

 Introduction to classes of optimization problems

 Introduction to Newton type optimization algorithms


Problem Class 1: Linear Programming (LP)

 Linear objective,
linear constraints:
Linear Optimization Problem
(convex)

 Example: Logistics Problem


shipment of quantities a1, a2, ..., am of a product from m locations (origins),
to be received at n destinations in quantities b1, b2, ..., bn;
shipping costs cij;
determine the amounts xij shipped from origin i to destination j
(origin of linear programming in the 2nd World War)
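A tiny instance of this transportation problem (all numbers invented for illustration) can be solved with scipy.optimize.linprog; the unknowns xij are stacked row-wise into a single vector.

```python
# Sketch: a small transportation LP solved with scipy.optimize.linprog.
# Data (supplies a, demands b, costs c_ij) are invented for illustration.
import numpy as np
from scipy.optimize import linprog

a = [20, 30]                           # quantities available at the m = 2 origins
b = [10, 25, 15]                       # quantities required at the n = 3 destinations
cost = np.array([[2.0, 4.0, 5.0],
                 [3.0, 1.0, 7.0]])     # shipping costs c_ij

m, n = cost.shape
c = cost.ravel()                       # objective: sum_ij c_ij * x_ij

A_ub = np.zeros((m, m * n))            # supply constraints: sum_j x_ij <= a_i
for i in range(m):
    A_ub[i, i*n:(i+1)*n] = 1.0

A_eq = np.zeros((n, m * n))            # demand constraints: sum_i x_ij = b_j
for j in range(n):
    A_eq[j, j::n] = 1.0

res = linprog(c, A_ub=A_ub, b_ub=a, A_eq=A_eq, b_eq=b,
              bounds=[(0, None)] * (m * n), method="highs")
print(res.x.reshape(m, n))             # optimal shipment amounts x_ij
```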
Problem Class 2: Quadratic Programming (QP)

 Quadratic objective and linear


constraints:
Quadratic Optimization Problem
(convex, if Q pos. def.)

 Example: Markowitz mean-variance portfolio optimization


quadratic objective: portfolio variance (sum of the variances and
covariances of individual securities)
linear constraints specify a lower bound for portfolio return

 QPs play an important role as subproblems in nonlinear optimization
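To illustrate why QPs are such convenient subproblems (this example is mine, not from the slides): an equality-constrained QP, min ½ xᵀQx + cᵀx subject to Ax = b, reduces to a single linear system of equations, the KKT system.

```python
# Sketch: solve an equality-constrained QP via its KKT system
#   [ Q  A' ] [x     ]   [-c]
#   [ A  0  ] [lambda] = [ b]
# (data invented for illustration; Q positive definite makes the QP convex).
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 2.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])     # one equality constraint: x1 + x2 = 1
b = np.array([1.0])

n, meq = Q.shape[0], A.shape[0]
KKT = np.block([[Q, A.T],
                [A, np.zeros((meq, meq))]])
rhs = np.concatenate([-c, b])

sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]
print("x* =", x, " multiplier =", lam)
```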


Problem Class 3: Nonlinear Programming (NLP)

 Nonlinear Optimization Problem


(in general nonconvex)

 TODAY'S MAIN TOPIC!

 E.g. the famous nonlinear Rosenbrock


function
Problem Class 4: Non-smooth optimization

 objective function or constraints are

non-differentiable or not continuous, e.g. f(x) = |x| (non-differentiable at x = 0)
Problem Class 5: Integer Programming (IP)

 Some or all variables are integer


(e.g. linear integer problems)

 Special case: combinatorial optimization


problems -- feasible set is finite

 Example: traveling salesman problem


determine fastest/shortest round
trip through n locations
Problem Class 6: Optimal Control

 Optimization problems including dynamics in the form of
differential equations (infinite dimensional);
the variables are partly ∞-dimensional

 THIS COURSE'S MAIN TOPIC!


Overview of presentation

 Optimization: basic definitions and concepts

 Introduction to classes of optimization problems

 Introduction to Newton type optimization algorithms


Aim of Newton type optimization algorithms

min f(x),   x ∈ Rⁿ

 Find a local minimizer x* of f(x), i.e. a point satisfying

∇f(x*) = 0
Derivative based algorithms

 Fundamental underlying structure of most algorithms:

choose a start value x0

for i = 0, 1, 2, ...:
    determine a direction of search (descent) pi
    determine a step length αi
    new iterate: xi+1 = xi + αi·pi
    check convergence

 Optimization algorithms differ in the choice of pi and αi

Basic algorithm:

Search direction:
choose descent direction
(f should be decreased)

Step length:
solve the 1-d minimization approximately,
satisfy Armijo condition
Computation of step length

 Dream:
exact line search:   αk = argmin over α ≥ 0 of  f(xk + α·pk)

 In practice:
inexact line search:   αk ≈ argmin f(xk + α·pk);

ensure sufficient decrease, e.g. via the Armijo condition
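A common way to realize the inexact line search is backtracking until the Armijo (sufficient decrease) condition holds. A minimal sketch (the parameter values c1 and rho are typical textbook choices, not prescribed by the slides):

```python
# Sketch: backtracking line search enforcing the Armijo condition
#   f(x + alpha*p) <= f(x) + c1 * alpha * grad_f(x)' p
import numpy as np

def backtracking_line_search(f, grad_f, x, p, alpha0=1.0, c1=1e-4, rho=0.5):
    alpha = alpha0
    fx = f(x)
    slope = grad_f(x) @ p        # directional derivative; negative for a descent direction
    while f(x + alpha * p) > fx + c1 * alpha * slope:
        alpha *= rho             # shrink the step until sufficient decrease holds
    return alpha
```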


How to compute search direction?

 We discuss three algorithms:


Steepest descent method
Newton's method
Newton type methods
Algorithm 1: Steepest descent method

 Based on first order Taylor series approximation of objective function

maximum descent is obtained for the direction p = -∇f(x)
Steepest descent method

Choose steepest descent search direction, perform (exact) line search:

pk = -∇f(xk),     xk+1 = xk - αk·∇f(xk)

The search direction is perpendicular to the level sets (contour lines) of f(x).
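A minimal steepest descent loop with Armijo backtracking, run on an ill-conditioned quadratic (the test problem is chosen here only to make the slow linear convergence visible; it is not from the slides):

```python
# Sketch: steepest descent with backtracking line search on a quadratic
# with eigenvalues 1 and 50 (illustration of slow linear convergence).
import numpy as np

A = np.diag([1.0, 50.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x

x = np.array([1.0, 1.0])
for k in range(1000):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:
        break
    p = -g                                   # steepest descent direction
    alpha, fx = 1.0, f(x)
    while f(x + alpha * p) > fx + 1e-4 * alpha * (g @ p):
        alpha *= 0.5                         # Armijo backtracking
    x = x + alpha * p
print(k, x)                                  # many iterations, x slowly approaches the minimizer 0
```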
Convergence of steepest descent method

steepest descent method has linear convergence

i.e.   ||xk - x*|| ≤ C·||xk-1 - x*||

 gain is a fixed factor C < 1


 convergence can be very slow
if C close to 1

If f(x) = xᵀAx with A positive definite and λmax, λmin the largest and smallest
eigenvalues of A, one can show that

C = (λmax - λmin) / (λmax + λmin)
Example - steepest descent method

f(x, y) = 4·(x - y²)² + x²/100 + y²/100

banana valley function,
global minimum at x = y = 0
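For completeness, the test function can be written down directly; note that the two small quadratic terms (the 1/100 coefficients) are reconstructed from the garbled slide and should be taken as an assumption, while the curved valley x = y² and the global minimum at (0, 0) are the essential features:

```python
# Sketch: the "banana valley" test function as reconstructed above;
# the x^2/100 and y^2/100 terms are assumed, the valley x = y^2 is the key feature.
import numpy as np

def banana(v):
    x, y = v
    return 4.0 * (x - y**2)**2 + x**2 / 100.0 + y**2 / 100.0

def banana_grad(v):
    x, y = v
    u = x - y**2
    return np.array([8.0 * u + x / 50.0,
                     -16.0 * u * y + y / 50.0])

print(banana(np.array([0.0, 0.0])), banana_grad(np.array([0.0, 0.0])))  # both zero at the minimum
```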
Example - steepest descent method

[Plot: error ||xk - x*|| over the iterations k]

Convergence of steepest descent method:


 needs almost 35,000 iterations to come closer than 0.1 to the solution
 mean value of convergence constant C: 0.99995
 at (x=4, y=2), there holds λ1 = 0.1, λ2 = 268, hence

C ≈ (268 - 0.1) / (268 + 0.1) ≈ 0.9993
Algorithm 2: Newton's Method

 Based on second order Taylor series approximation of f(x)

Newton direction:   pk = -∇²f(xk)⁻¹·∇f(xk)
Visualization of Newton's method

pk minimizes quadratic approximation of the objective

Qk(pk) = f(xk) + ∇f(xk)ᵀ·pk + ½·pkᵀ·∇²f(xk)·pk

if the quadratic model is good, then take the full step with αk = 1

[Figure: contour lines with the gradient direction and the Newton direction]
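A minimal Newton iteration, computing the Newton direction by solving a linear system with the Hessian; the smooth test function is my own illustration, not taken from the slides:

```python
# Sketch: Newton's method with full steps (alpha_k = 1) on
# f(x1, x2) = exp(x1) - x1 + x2^2 (test function chosen only for illustration).
import numpy as np

f      = lambda x: np.exp(x[0]) - x[0] + x[1]**2
grad_f = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])
hess_f = lambda x: np.array([[np.exp(x[0]), 0.0],
                             [0.0,          2.0]])

x = np.array([2.0, 1.0])
for k in range(20):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-12:
        break
    p = np.linalg.solve(hess_f(x), -g)   # Newton direction: solve  H p = -grad
    x = x + p                            # full step; quadratic convergence near x*
print(k, x)                              # converges to the minimizer (0, 0)
```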
Convergence of Newton's method

Newton's method has quadratic convergence,

i.e.   ||xk - x*|| ≤ C·||xk-1 - x*||²

This is very fast close to a solution:

Correct digits double in each iteration!


Example - Newton's method

f(x, y) = 4·(x - y²)² + x²/100 + y²/100

banana valley function,


global minimum at x=y=0
Example - Newton's method

[Plot: error ||xk - x*|| over the iterations k]

Convergence of Newton's method:


 less than 25 iterations for an accuracy of better than 10⁻⁷!
 convergence roughly linear for the first 15-20 iterations, where the step length αk < 1
 convergence roughly quadratic for the last iterations, with full step length αk = 1
Comparison of steepest descent and Newton

[Plot: error ||xk - x*|| over the iterations k, for both methods]

For banana valley example:


 Newton's method is much faster than the steepest descent method (by a factor
of about 1000)
 Newton's method is superior due to its higher order of convergence
 steepest descent method converges too slowly for practical
applications
Algorithm 3: Newton type methods

In practice, evaluation of second derivatives


for the Hessian can be difficult!

 approximate the Hessian matrix ∇²f(xk) by a matrix Bk ≈ ∇²f(xk)

 often the methods ensure that the approximation Bk is positive definite

xk+1 = xk - Bk⁻¹·∇f(xk),     Bk ≈ ∇²f(xk)

methods are collectively known as Newton type methods


Newton type variants

Notation:   xk+1 = xk - αk·Bk⁻¹·∇f(xk)

 Steepest Descent:   Bk = I

Convergence rate: linear

 Newton's Method:   Bk = ∇²f(xk)

Convergence rate: quadratic


Newton type variants (continued)

 BFGS update (Broyden, Fletcher, Goldfarb, Shanno)

Convergence rate: super-linear

 For Least-Squares Problems: Gauss-Newton Method

Convergence rate: linear
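The BFGS formula itself is a rank-two update of Bk built from the step s = xk+1 - xk and the gradient difference y = ∇f(xk+1) - ∇f(xk). A sketch of the standard update (the curvature safeguard is a common practical addition, not from the slides):

```python
# Sketch: BFGS update of the Hessian approximation B_k,
#   B_{k+1} = B_k - (B_k s s' B_k)/(s' B_k s) + (y y')/(y' s)
import numpy as np

def bfgs_update(B, s, y):
    Bs = B @ s
    sy = s @ y
    if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
        return B                 # skip the update if the curvature condition s'y > 0 fails
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy
```

If the curvature condition sᵀy > 0 holds and Bk is positive definite, the updated matrix stays positive definite, which is why BFGS fits the Newton type framework above.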


Summary: Optimization Overview
 Optimization problems can be
(un)constrained, (non)convex, (non)linear, (non)smooth,
continuous/integer,(in)finite dimensional, ...

 Aim: find local minima of smooth nonlinear problems: ∇f(x*) = 0

 Derivative based methods start at an initial guess x0 and iterate xi+1 = xi + αi·pi

with search direction pi and step length αi.

 Three methods:
steepest descent: intuitive, but slow (linear convergence)
Newton's method: very fast (quadratic convergence)
Newton type methods: cheap, fast, and popular, e.g.
BFGS (superlinear)
Gauss-Newton (fast linear convergence)
Attention: Nonlinear vs. Convex Optimization

For nonlinear problems, Newton type algorithms only find local minima, and
"optimal solution" depends on initialization!

Important exception: convex problems

"The great watershed in optimization isn't between linearity and


nonlinearity, but convexity and nonconvexity - R. Tyrrell Rockafellar
Literature

 J. Nocedal, S. Wright: Numerical Optimization, Springer, 1999/2006


 P. E. Gill, W. Murray, M. H. Wright: Practical Optimization, Academic
Press, 1981
 R. Fletcher, Practical Methods of Optimization, Wiley, 1987
 D. G. Luenberger: Linear and Nonlinear Programming, Addison-Wesley, 1984
