
Contents

1 Introduction 7
1.1 Design as a creative process . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Formulation of an optimization model . . . . . . . . . . . . . . . . . . . 10

2 Optimization from a mathematical viewpoint 11


2.1 Categories of optimization variables . . . . . . . . . . . . . . . . . . . . 13
2.2 Pure design variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Cross-sectional variables . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Cross-sectional values as variables . . . . . . . . . . . . . . . . . 15
2.2.3 Dimension variables . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.4 Whole numbered variables . . . . . . . . . . . . . . . . . . . . . 16
2.2.5 Discrete variables . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.6 Binary (0/1) variables . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Shape variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 Trusses (shape optimization) . . . . . . . . . . . . . . . . . . . . 17
2.3.2 Slabs (plates subjected to in-plane forces) . . . . . . . . . . . . . 17
2.3.3 Shells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Dependent design variables . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Description of a tube cross-section . . . . . . . . . . . . . . . . . 18
2.4.2 Moment of inertia expressed in terms of area . . . . . . . . . . . 19
2.4.3 Sandwich-shaped cross-section . . . . . . . . . . . . . . . . . . 19
2.4.4 I-shaped members . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.5 I-shaped members according to standards . . . . . . . . . . . . . 19
2.5 Realization of constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Optimization criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3 A structural optimization model for a two bar truss 25


3.1 Structural system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Definition of the design variables . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Optimization criterion for a two bar truss . . . . . . . . . . . . . . . . . . 26
3.4 Modeling the constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Stability problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4 Application of design optimization in structural engineering 33
4.1 Trusses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.1 Statically determinate trusses . . . . . . . . . . . . . . . . . . . . 33
4.1.2 Statically indeterminate trusses . . . . . . . . . . . . . . . . . . . 34
4.2 Several trusses in different situations . . . . . . . . . . . . . . . . . . . . 36
4.3 Optimization of a statically indeterminate truss . . . . . . . . . . . . . . 39
4.4 Beams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 Framed structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.6 Panel structures with in-plane forces . . . . . . . . . . . . . . . . . . . . 45
4.7 Shells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5 Solution techniques in optimization 51


5.1 Gradient vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 Hessian matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Taylor series expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.4 Concepts of necessary and sufficient conditions . . . . . . . . . . . . . . 55
5.5 Unconstrained problems . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6 Indirect solutions of constrained optimum design problems 63


6.1 Definition of regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2 Necessary conditions for inequality constraints . . . . . . . . . . . . . . 71
6.2.1 The Kuhn-Tucker Theorem . . . . . . . . . . . . . . . . . . . . . 74
6.3 Sufficient conditions for constrained problems . . . . . . . . . . . . . . . 76
6.3.1 Convex problems . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.2 Second-order conditions for general problems . . . . . . . . . . . 77
6.3.3 Second-order necessary conditions for general constrained problems . . . 78
6.3.4 Sufficient conditions for the general constrained problem . . . . . 78
6.4 Evaluation of the Kuhn-Tucker formalism . . . . . . . . . . . . . . . . . 78

7 Numerical methods for optimum design 81


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.2 Basic concepts related to the implementation of algorithms . . . . . . . . . 83
7.2.1 Determination of the search direction x(k) . . . . . . . . . . . . . 83
7.2.2 Search methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.3 Gradient methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.4 Newton methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.5 Determination of the step size α(k) . . . . . . . . . . . . . . . . . . . . . 89
7.5.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.5.2 Interval search . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.5.3 Elimination methods . . . . . . . . . . . . . . . . . . . . . . . . 92

7.5.4 Interpolation methods . . . . . . . . . . . . . . . . . . . . . . . 93
7.5.5 One point pattern (Newton-Raphson iteration) . . . . . . . . . . . 93
7.5.6 Two point pattern . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.5.7 Three point pattern . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6 Handling of constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6.1 Transformation methods . . . . . . . . . . . . . . . . . . . . . . 96
7.6.2 Sequential unconstrained minimization . . . . . . . . . . . . . . 96
7.6.3 Proof of concept of SUMT . . . . . . . . . . . . . . . . . . . . . 97
7.6.4 Multiplier methods . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.7 Sequential approximation methods . . . . . . . . . . . . . . . . . . . . . 100
7.7.1 SLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.7.2 SQP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.7.3 SCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.8 On the usage of termination criteria . . . . . . . . . . . . . . . . . . . . 101
7.9 Evolution strategies for design optimization . . . . . . . . . . . . . . . . 102

Chapter 1

Introduction

Before we begin with the details of design optimization, let us first discuss some fundamental issues on the background and history of this subject. In the past decades (since the 1980s), design optimization has emerged as a new and substantial discipline in engineering, gaining more and more significance, in particular in the fields of

• mechanical engineering,

• automobile engineering,

• aerospace engineering,

• shipbuilding,

• ... and also, but slowly, in civil engineering.

The goal of design optimization is to design technical systems in an optimal fashion w.r.t.
costs, quality, mechanical behavior, efficiency and other objectives. When focusing on
structural systems, we customarily use the term “structural optimization”. The relevance
of optimization in engineering is obvious because of

• shrinking resources,

• expensive energy,

• expensive manufacturing and erection of structures,

• faster developments of new products with shorter periods available for testing.

We have to realize that modern design optimization requires efficient computers; powerful computer systems are mandatory to solve real-world problems. Also, design optimization is an interdisciplinary methodology in which a wide variety of engineering facets have to be incorporated. Design optimization collects the numerically oriented aspects of a design process, requires a formal (abstract) model, and is represented through an optimization problem.
Design and structural optimization is closely linked with modern structural analysis (finite
elements, boundary elements, etc.) because displacements, stresses, vibrations, etc. are
fundamental for the design of engineering structures. Also, as noted above, optimization,
analysis and user-guided navigation of the design process requires powerful computer

systems (including advanced hard- and software) due to the inherent complexity. These
facts explain why the use of design (or structural) optimization in engineering has been
delayed so much compared to computational engineering in general. In fact, design optimization can be considered to have begun only around the 1960s:

1960 (2nd Conference on Electronic Computers in Engineering in Pittsburgh)

• Clough (Berkeley) introduced the finite element method for the first time

• L. A. Schmitt (Los Angeles) introduced the computer-aided optimal design method for the first time

The finite element method, as a rationale for design, required approx. 30 years to mature into a real tool. It is therefore no surprise that design optimization needs some time to become a mainstay tool. Since optimization is highly non-linear and multi-dimensional, appropriate software was missing for a long time. Finally, in advanced design optimization knowledge-based issues are also relevant.
We can summarize as follows: Design optimization

• is a consistent advancement of CAE/CAD/FEM,

• is an extremely complex domain w.r.t. information technology,

• requires the application of efficient software and hardware systems as well as modern computer science concepts.

The position of design optimization is illustrated in Fig. 1.1.

1.1 Design as a creative process


In general, design is a creative process that includes various engineering activities, such
as drawings, calculations, checking to standards, comparing, evaluating, technical docu-
mentation, writing of reports, etc.
In particular, design is an iterative process making use of a designer’s experience, intuition
and ingenuity. Consequently, engineering design is obviously not only a numerical task:
There are aspects of optimization which can be quantified only with great difficulty and
there are operative levels that are not manageable using only numerical models (such as
the questions of esthetics, functionality, and so on). A system evolution model is shown
in Fig. 1.2, displaying a very generic design approach.

Figure 1.1: The position of design optimization within engineering and computer science fields.

Figure 1.2: The evolution of the design process

Figure 1.3: The conventional design process

The conventional design process guided by engineering intuition is interesting w.r.t. the
design optimization process, where heavy use is made of computers and numerical mod-
els. The basic steps of conventional design are shown in Fig. 1.3. An advantage of con-
ventional design is that the designer’s experience can go into making conceptual changes.
A disadvantage, however, is that changes are time-consuming, so that more than one or two changes are avoided in the design process.
The optimum design process is much more efficient, in particular, in detailed design, as
can be seen in Fig. 1.4.
Advantages of optimum design are:

• an optimal solution is created from a multitude of candidate solutions by means of an optimization algorithm (optimizer) in the scope of the numerical model,

• time is no longer such a serious matter because the computer, and not the engineer, is doing the job (however, this doesn’t mean that computer time is not crucial for very large structures),

Figure 1.4: The optimum design process

• a large set of constraints is possible and the objective criteria are checked automatically.

Disadvantages of optimum design include:

• the modeling effort is large,

• structural optimization is a challenge,

• conceptual design changes are extremely cumbersome and expensive.

1.2 Formulation of an optimization model


The formulation in terms of an optimization model leads to the following quantities:

The design variables xi, i = 1, 2, 3, ..., n, describing the vital parameters of a system, concentrated in the design vector x.

An objective or optimization criterion introduced as f(x), being a non-linear¹ function (or procedure, method, subroutine, etc.) of the variables xi, i = 1, 2, 3, ..., n.
Note: multiple criteria are not considered here, but are reasonable in some cases.

A set of constraints where two categories are possible,

• equality constraints hk (x) = 0, k = 1, 2, 3, ..., p or

h(x) = 0;

• inequality constraints g j (x) ≤ 0, j = 1, 2, 3, ..., m or

g(x) ≤ 0.

According to the basic terminology and notation used in design optimization the con-
straints of the inequality form are declared as of the ≤-type. This is to say, if a formulation
of the ≥-type occurs, it should be transformed to the ≤-type, for example by multiply-
ing with −1. The reason for that is that most software packages assume the ≤-type of
constraint equations.
Furthermore, the optimization is defined as a minimization problem. Thus, we have the
general format
 
min_x { f(x) | h(x) = 0, g(x) ≤ 0 }.

If maximization is requested, the fact that max{f(x)} ≙ min{−f(x)} may help to formulate the optimization problem.

¹ that is the normal case in engineering

Chapter 2

Optimization from a mathematical viewpoint

With respect to the terminology, we can ask the very useful question: “What does optimal mean?”
Answer: We have already seen that a set of admissible solutions, defined by the constraints, is necessary (see Fig. 2.1); from this set the best candidate is selected. Thus, at least two alternative solutions are required for an “optimization”. Therefore, the solution of the linear system of equations Ax = r, where det A ≠ 0, does not represent an optimization problem!

Figure 2.1: Solution domain with possible candidates
However, the solution of the above equation system can be easily transferred into an
equivalent optimization problem, as is often done in many scientific scenarios:
We define an error function (defect) ei(x) as

ei(x) = (Ax − r)i = ∑_{j=1}^{n} aij xj − ri ,   i = 1, 2, 3, ..., n.

To eliminate the sign effect, we define a new error function


E(x) = e1²(x) + e2²(x) + ... + en²(x) = ∑_{i=1}^{n} ei²(x).

The solution to the problem is then given by

min_x {E(x)}.

A reasonable numerical approach is as follows:

1. estimate an initial, but possibly incorrect solution x = x0,

2. find a new improved solution based on an algorithm according to an adequate optimization strategy, x1 = x0 + ∆x0, or, more generally, xk+1 = xk + ∆xk,

3. check if |E(xk+1) − E(xk)| ≤ ε, 0 ≤ ε ≪ 1.
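The three-step scheme above can be sketched in a few lines of Python; the test system, step size and tolerance below are illustrative choices, not taken from the text.

```python
def solve_by_minimization(A, r, x0, alpha=0.05, eps=1e-12, max_iter=10000):
    """Minimize E(x) = sum_i e_i(x)^2 with e = Ax - r by simple gradient steps."""
    n = len(r)
    x = list(x0)

    def residual(x):
        return [sum(A[i][j] * x[j] for j in range(n)) - r[i] for i in range(n)]

    def energy(x):
        return sum(e * e for e in residual(x))

    E_old = energy(x)
    for _ in range(max_iter):
        e = residual(x)
        grad = [2.0 * sum(A[i][j] * e[i] for i in range(n)) for j in range(n)]  # grad E = 2 A^T e
        x = [x[j] - alpha * grad[j] for j in range(n)]        # x_{k+1} = x_k + Delta x_k
        E_new = energy(x)
        if abs(E_new - E_old) <= eps:                         # step 3: convergence check
            break
        E_old = E_new
    return x

# small test system with exact solution x = (0.8, 1.4)
A = [[2.0, 1.0], [1.0, 3.0]]
r = [3.0, 5.0]
x = solve_by_minimization(A, r, x0=[0.0, 0.0])
```

As the following paragraph points out, direct solvers are far more efficient for linear systems; the recast as a minimization is shown only for illustration.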

Considering linear equation systems, however, there are much more efficient techniques
available for solving equations of the form Ax = r (for example, Gauß, Crout or Cholesky
solvers). By contrast, in the non-linear case the optimization approach is not such a bad
idea, because solving a general non-linear system of equations is still a challenge.
A simple example of a non-linear system of equations is:

x1² + 3 cos x2 = 2,
cos x1 + 2x1 x2 = 4.

Again, we introduce the error terms mentioned above,

e1(x) = x1² + 3 cos x2 − 2,
e2(x) = cos x1 + 2x1 x2 − 4,

and

E(x) = e1²(x) + e2²(x) → min.

In a more general approach we can apply the least p-th error condition,
E(x) = ∑_{i=1}^{n} wi ei^p(x) → min,

where p is an even integer and wi is a weighting factor. This represents an effective and
generic solution technique.
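For the non-linear example above, here is a minimal sketch of this least-squares approach (p = 2, unit weights wi = 1), using gradient descent with a backtracking line search; the starting point and iteration limits are illustrative choices.

```python
import math

def e1(x1, x2): return x1**2 + 3*math.cos(x2) - 2
def e2(x1, x2): return math.cos(x1) + 2*x1*x2 - 4

def E(x):
    return e1(*x)**2 + e2(*x)**2

def grad_E(x):
    x1, x2 = x
    d1 = (2*x1, -3*math.sin(x2))            # gradient of e1
    d2 = (-math.sin(x1) + 2*x2, 2*x1)       # gradient of e2
    r1, r2 = e1(x1, x2), e2(x1, x2)
    return (2*(r1*d1[0] + r2*d2[0]), 2*(r1*d1[1] + r2*d2[1]))

x = (1.5, 1.5)                              # initial estimate x0
for _ in range(500):
    g = grad_E(x)
    alpha = 1.0
    # backtracking: halve the step until E decreases sufficiently
    while E((x[0] - alpha*g[0], x[1] - alpha*g[1])) > E(x) - 1e-4*alpha*(g[0]**2 + g[1]**2):
        alpha *= 0.5
        if alpha < 1e-16:
            break
    x = (x[0] - alpha*g[0], x[1] - alpha*g[1])
```

When E(x) is driven to (numerically) zero, the minimizer is simultaneously a root of the original non-linear system.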
To understand the mathematical background of optimization it is a good idea to “change” the equation Ax = r into the inequality system Ax ≤ r.
Example:

3x1 + 5x2 ≤ 15,
−x1 − x2 ≤ 3,
x1 ≤ 4,
−3 ≤ x2 ≤ 2.

Note that the last line can be rewritten as x2 ≤ 2 and −x2 ≤ 3.


In matrix form we obtain

    ⎡  3    5 ⎤             ⎡ 15 ⎤
    ⎢ −1   −1 ⎥   ⎡ x1 ⎤    ⎢  3 ⎥
    ⎢  1    0 ⎥ · ⎣ x2 ⎦ ≤  ⎢  4 ⎥ ,
    ⎢  0    1 ⎥             ⎢  2 ⎥
    ⎣  0   −1 ⎦             ⎣  3 ⎦

A · x ≤ r.

Fig. 2.2 illustrates the situation. As demonstrated, the solution domain S in the x1 /x2 -
space is bordered by the linear constraints, defining an infinite number of solutions (al-
ternatives). The best solution can be found by evaluating an objective function. Here, the
(linear) function Q(x) = x1 + x2 is used. The optimum x∗ can be found at the intersection
of the lines 3x1 + 5x2 = 15 and x1 = 4.

Figure 2.2: The feasible region and the optimal solution of the inequality system Ax ≤ r

According to the standard notation defined above, we have thus discussed the graphical solution of the optimization problem

min_x { −(x1 + x2) | Ax ≤ r },

i.e. the maximization of Q(x) = x1 + x2, which is a simple linear optimization problem. Real-world problems, on the other hand, are usually non-linear (see Fig. 2.3), which makes life much more complicated.
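The small linear problem can be verified numerically by brute force: every vertex of the feasible region is the intersection of two active constraints, so enumerating all constraint pairs and keeping the feasible intersections reproduces the graphical optimum (here Q(x) = x1 + x2 is maximized, as in the graphical discussion above).

```python
from itertools import combinations

A = [[3, 5], [-1, -1], [1, 0], [0, 1], [0, -1]]
r = [15, 3, 4, 2, 3]

def intersect(i, j):
    """Solve the 2x2 system formed by constraints i and j taken as equalities."""
    a, b, c, d = A[i][0], A[i][1], A[j][0], A[j][1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        return None                      # parallel constraints, no vertex
    x1 = (r[i] * d - b * r[j]) / det     # Cramer's rule
    x2 = (a * r[j] - r[i] * c) / det
    return (x1, x2)

def feasible(x):
    return all(A[k][0]*x[0] + A[k][1]*x[1] <= r[k] + 1e-9 for k in range(len(A)))

vertices = [p for i, j in combinations(range(len(A)), 2)
            if (p := intersect(i, j)) is not None and feasible(p)]
best = max(vertices, key=lambda p: p[0] + p[1])   # optimum at (4, 0.6)
```

For more than two variables this enumeration explodes combinatorially; the simplex method of linear programming performs the same vertex search efficiently.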
To give an insight into the complexity of non-linear optimization problems, some basic cases for 2-dimensional problems are depicted in Fig. 2.4.
With respect to the objective function, complicated cases can also occur, which may cause trouble. Again, in Fig. 2.5 some typical cases for 2 design variables are shown.

2.1 Categories of optimization variables


What categories of optimization variables are applied in design optimization?
For the purpose of design optimization, in addition to the mathematical notation, engineering-oriented terminology has also evolved. Although redundant from the mathematical point of view, the following notation/designations are in use:

• design variables,

• decision variables,

• independent structural variables.

Figure 2.3: A 2-dimensional optimization problem with non-linear constraints and a non-linear objective function

[Panels, left to right and top to bottom: closed, convex domain; closed, non-convex domain; open domain; disjunct, convex domain; disjunct, convex/non-convex domain; disjunct, non-convex domain]

Figure 2.4: Types of domains for 2-dimensional optimization problems

[Panels: one peak only (well posed!); two peaks; infinite (a whole curve is optimal); multiple]

Figure 2.5: Typical contour plots of an objective function

These variables are regarded as explicitly free because the designer (engineer) can assign
any value to them. If the specified values do not satisfy all constraints of the design prob-
lem (represented as an optimization problem), the design is infeasible. If the values do
satisfy all constraints, the design is called feasible (or workable or usable).
It’s an important first step in the proper formulation / modeling of the problem to identify
the appropriate design variables. Sometimes it is desirable to designate more design vari-
ables than may be apparent from the statement of the problem formulation because this
gives flexibility in design. Later on, it is then possible to fix some of the variables to keep
the dimension of the optimization problem low.
It is to be emphasized that the design model, as described above, represents a synthesis problem, in contrast to conventional design, which makes use of an analysis problem.
To clarify the definition of design variables, some practical examples will be considered. In doing so, we focus on examples from structural engineering, as this will be the prime field in this course!

2.2 Pure design variables

2.2.1 Cross-sectional variables

x1 = web thickness,
x2 = web height,
x3 = flange height,
x4 = flange width.

2.2.2 Cross-sectional values as variables

x1 = moment of inertia for frame bars (or shafts/columns) 1 and 3,
x2 = moment of inertia for bar 2,
EIi = flexural rigidity,
E = Young’s modulus (modulus of elasticity).

2.2.3 Dimension variables

x1 = span of section 1,
x2 = span of section 2,
l = total span (const.); the third section has span l − x1 − x2.

2.2.4 Whole numbered variables

J = jacking force,
xk = number of cables in a prestressed concrete box girder bridge.

Note: This leads to an integer optimization problem requiring specific features of an optimization method not dealt with in this course.

2.2.5 Discrete variables

Potential variables are, for example, h, b or s according to the values in the table. Again, this is not an easy optimization problem because of the discrete variables. Special solution techniques are required.

IPB     h (mm)   b (mm)   s (mm)   t (mm)   r (mm)   S (cm²)
100     100      100      6        10       12       26
120     120      120      6.5      11       12       34
140     140      140      7        12       12       43
160     160      160      8        13       15       54.3
180     180      180      8.5      14       15       65.3
200     200      200      9        15       18       78.1
220     220      220      9.5      16       18       91
...     ...      ...      ...      ...      ...      ...

2.2.6 Binary (0/1) variables

The dashed lines indicate the existence or non-existence of a bar. This leads to a so-called topology optimization problem, which is again not in the focus of this course.

Figure 2.6: The homogenization method for topology optimization

As a special case of topology optimization we have the “homogenization method”. This is a popular approach in mechanical engineering, allowing both “holes” and “no-holes” in a design domain (see Fig. 2.6).

2.3 Shape variables


Shape variables determine the shape of a structural system; sometimes they are used exclusively, sometimes along with the sizing variables discussed above.

2.3.1 Trusses (shape optimization)


(xi, yi)
new shape

The dashed line is the new shape. The


coordinates xi and yi of node i are assigned
to the design variables x1 and x2 ,
respectively.

2.3.2 Slabs (plates subjected to in-plane forces)

The master nodes (solid circles) are fixed. The movable nodes (hollow circles) represent design variables. This problem can be considered as a mesh generation problem in finite elements.

2.3.3 Shells
Revolving hyperboloid (cooling tower): r = r(z), with t = const. or t = t(r(z)).

Beam-like cylindrical shells: z = z(y) = x1(y), t = t(y) = x2(y).

Originally, these are problems with unknown functions, for example, r = r(z) or z = z(y). In other words, the design variables r and z are functions of further parameters. This leads to so-called variational problems, where basis functions span the design space. Thus, given a basis of functions, variational problems can be transferred to parameter optimization problems, using, for example, a transformation equation such as

z(y) = ∑_{i=1}^{k} xi · y^{i−1}.
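A minimal sketch of this discretization: the unknown shape function z(y) is replaced by k coefficients, which are then ordinary design variables (the coefficient values below are purely illustrative).

```python
def z(y, x):
    """Shape z(y) = sum_{i=1..k} x_i * y**(i-1), parameterized by the design vector x."""
    return sum(xi * y**i for i, xi in enumerate(x))   # enumerate gives exponents 0..k-1

x = [1.0, 0.5, -0.25]                   # k = 3 design variables
values = [z(y, x) for y in (0.0, 1.0, 2.0)]
```

An optimizer then works on the finite vector x instead of the function z(.), which is exactly the transfer from a variational problem to a parameter optimization problem.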

2.4 Dependent design variables


Besides using pure design variables (primary variables) and constants, we can also introduce dependent design variables that are direct functions of the primary design variables. Some examples will be given in the following sections.

2.4.1 Description of a tube cross-section


The thickness t = x1 can be a design variable. The medial diameter dm = x2 = 2r + 2·(t/2) = 2r + x1 can be a second design variable. Also, the area is given by A = π(R² − r²) = π(2rt + t²) = πt(2r + t) = π x1 x2.
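A quick numerical check of these dependent-variable relations (with outer radius R = r + t; the radius and thickness values are illustrative):

```python
import math

r, t = 5.0, 0.4
R = r + t                       # outer radius
dm = 2*r + t                    # medial diameter: x2, with x1 = t
area_exact = math.pi * (R**2 - r**2)
area_vars = math.pi * t * dm    # A = pi * x1 * x2
```

The two expressions agree exactly, since R² − r² = (r + t)² − r² = t(2r + t) = t·dm.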

2.4.2 Moment of inertia expressed in terms of area
Given a rectangular cross-section b × h, we can express the moment of inertia Iy in terms of a design variable, say xk = A:

Iy = bh³/12 = A · h²/12 = xk · β,   with β = h²/12.

Since areas are often defined as sizing variables, it is no surprise that cross-sectional values of various categories of beams, columns, bars, etc. are introduced as dependent design or behavior variables.

2.4.3 Sandwich-shaped cross-section


A = xk = 2bt,

Iy = A · h²/4 = A β1 ,   β1 = h²/4,

Wy = A · h/2 = A β2 ,   β2 = h/2,

with section modulus Wy.

2.4.4 I-shaped members


With κ = s/t, the thin-walled approximations of the cross-sectional values are

A = 2bt + sh = t(2b + κh),
Iy = bt h²/2 + s h³/12 = t(6bh² + κh³)/12,

so we can express Iy by means of the area A as

Iy = β A = β xk

with

β = Iy/A = (6bh² + κh³)/(24b + 12κh) = h²(6bt + sh)/(12(2bt + sh)).

2.4.5 I-shaped members according to standards


The cross-sections of I-shaped structural components can also be represented by dependent design variables if the area is used as the primary optimization parameter. This is to be demonstrated for an I-shape using the IPBl series (HEA-Reihe, according to DIN 1025, part 3). Plotting the normalized¹ values Iy/Iy,500, Wy/Wy,500 and As/As,500 over the normalized areas A/A500, the non-linear relationships

Iy = 3.78 A²   (curve ×),
Wy = 1.58 A·√A,
As = 0.20 A^1.15,

are obtained as simplified models (see Fig. 2.7).

¹ the reference is the IPBl-500 I-profile

Figure 2.7: The relative values of Iy, Wy and As with respect to an IPBl-500 profile.
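The three simplified models can be evaluated directly as dependent design variables (reading the section-modulus fit as Wy = 1.58·A·√A; the input value is illustrative, and the coefficients only carry meaning in the normalized sense of Fig. 2.7):

```python
def dependent_section_values(A):
    """Dependent cross-sectional values as functions of the primary variable A."""
    Iy = 3.78 * A**2           # moment of inertia
    Wy = 1.58 * A * A**0.5     # section modulus: 1.58 * A * sqrt(A)
    As = 0.20 * A**1.15        # shear area
    return Iy, Wy, As

Iy, Wy, As = dependent_section_values(4.0)
```

With such fits, an optimizer only has to handle the single variable A per member; all checking quantities follow from it.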

2.5 Realization of constraints


The optimization of an engineering structure is subject to constraints that may have the form of equalities or inequalities. As a rule, a large number of constraints occurs in practical real-world problems. The individual constraints may come from a wide variety of aspects (structural mechanics, construction requirements, substitutes for esthetic demands, etc.). Of course, each constraint must be influenced by one or more design variables; only then is it meaningful, because only then can the constraint affect the optimal design.
Some constraints are pretty simple, such as side constraints, which are the bounds of the design variables (upper and lower bounds of each design variable xi). In engineering, those constraints that depend non-linearly on the design variables are challenging. Contrary to pure mathematical problems of a theoretical nature, in engineering implicit constraints create the largest obstacles. Implicit in this context means that the constraints depend only indirectly on, or are only indirectly influenced by, the design variables.
Example: The displacement in a joint of a truss has to be less than a prescribed value. To compute this displacement, a complete finite element analysis for the current design, defined by the design vector x, has to be carried out.
Implicitness makes optimal design a hard task. Some general knowledge of how to handle constraints in some special cases is helpful.
Possible transformations: In design optimization, inequality constraints are the rule.
This is to say that equality constraints (for example, the equations of equilibrium in struc-

tural mechanics) are circumvented to the greatest possible extent. The rudimentary transformation, as in the example

x1 + x2 = 3 → 3 − ε ≤ x1 + x2 ≤ 3 + ε

where ε ≪ 1 and ε > 0, is in most cases, however, not appropriate due to numerical
problems.
In certain (simple) cases, similar to the example

x1 + x2 = 3,
x3 ≤ 4,
x4 + x5 ≤ 10,
x1 + x2² ≤ 5,

it is not a bad idea to resolve the equation system by eliminating one of the variables in
an equality equation. Thus, a solution of x1 + x2 = 3 with respect to, for example x1 , gives
x1 = 3 − x2 . Replacing the variable x1 in the last equation reduces the total number of
constraints and design variables. We have

x3 ≤ 4,
x4 + x5 ≤ 10,
(3 − x2) + x2² ≤ 5 or x2² − x2 ≤ 2.

Renaming x1 = x2, x2 = x3, x3 = x4 and x4 = x5 results in

        ⎡ x1² − x1 − 2  ⎤
g(x) =  ⎢ x2 − 4        ⎥ ≤ 0.
        ⎣ x3 + x4 − 10  ⎦

There are some cases where it might be useful to eliminate specific constraints completely. This may hold particularly for side constraints of the type xi ≥ 0 or 1 ≥ xi ≥ 0. It can easily be shown that the transformations

xi → Xi²,   xi → e^{Xi},   xi → |Xi|,   xi → sin² Xi,

can replace the side constraints given above using the new design variables Xi. In the same fashion, the general side constraint αi ≤ xi ≤ βi can be compensated by means of the transformation xi → αi + (βi − αi) sin² Xi.
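A minimal demonstration of the last transformation: minimizing an objective over [αi, βi] becomes an unconstrained problem in Xi. The objective, bounds, starting point and step size below are illustrative choices.

```python
import math

def f(x):
    return (x - 5.0)**2                   # constrained minimum on [0, 2] is at x = 2

a, b = 0.0, 2.0                           # bounds alpha_i, beta_i
def to_x(X):
    return a + (b - a) * math.sin(X)**2   # maps any X into [a, b]

def g(X):                                 # unconstrained objective in X
    return f(to_x(X))

X, h = 0.7, 1e-6                          # arbitrary start, finite-difference step
for _ in range(5000):
    dg = (g(X + h) - g(X - h)) / (2*h)    # numerical derivative
    X -= 0.01 * dg                        # plain gradient step, no bounds needed

x_opt = to_x(X)
```

Note that the unconstrained iteration can never leave [a, b]: the transformation itself, not the optimizer, enforces the side constraint.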
It should be emphasized explicitly that the original linear nature of the constraints gets lost, of course; a non-linearity is introduced into the given objective function. In practical cases, occasionally, the engineer who is modeling an optimization problem does not want to be restricted by mathematical requirements which ask for special categories of functions used in the formulation of constraints. Instead, one wishes to formulate certain constraints in an “algorithmic fashion”. An example of this is a constraint used in the shape optimization of hypars (hyperbolic paraboloids) in order to assure flat shells. Here we define (see Fig. 2.8)

gl = max { |x4 − x5|/x1 , |x4 − x3|/x1 , |x3|/x1 , |x5|/lv } ≤ 0.24.
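Written as a plain function, such an algorithmic constraint is simply evaluated per candidate design; the variable roles follow the formula above, and the numerical values in the call are purely illustrative.

```python
def hypar_flatness(x1, x3, x4, x5, lv, limit=0.24):
    """Flatness measure of a hypar element and its constraint check."""
    g = max(abs(x4 - x5) / x1,
            abs(x4 - x3) / x1,
            abs(x3) / x1,
            abs(x5) / lv)
    return g, g <= limit

g, ok = hypar_flatness(x1=10.0, x3=1.0, x4=1.5, x5=0.8, lv=12.0)
```

An optimizer that only needs function values (no analytic gradients) can use such a constraint directly.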

Figure 2.8: A hypar element

From the engineering point of view, constraints, to a large extent, derive from checking procedures defined in numerous standards. Characteristic demands due to standards include:

• existing stresses are not allowed to exceed admissible stress ranges,

• actual displacements, in dynamic systems also velocities and accelerations, have to be below dangerous limits,

• no instability due to buckling is to occur,

• the ultimate limit states, represented in a plentitude of standards, have to be paid attention to,

• the limit state of serviceability with respect to the required limit has to be complied with reliably.

Obviously, the equality constraints stemming from (1) the equations of equilibrium, (2) the equations of kinematics and (3) the constitutive equations according to the material used also play an important role. We will see, however, that these equations are regularly incorporated into the implicit constraints mentioned above.

2.6 Optimization criterion


The optimization criterion, also called objective criterion, objective function, merit function or decision criterion, determines how the feasible candidates xk, k = 0, 1, 2, ... are to be evaluated; in this way, the quality of a design is measured.
The criterion must be a scalar function or, as is often the case in engineering, a scalar-valued algorithm dependent on the design variables xi, i = 1, 2, ..., n. If it is a function, it is represented by f(x), which, according to our convention, is to be minimized.
It is quite evident that the definition of an appropriate optimization criterion is a prominent
step in creating an optimization model (structural optimization or design optimization
model). Several objectives can be used and have already been used in the engineering
world. These are, for example:

• volume of a structural system,

Figure 2.9: Bending moment distribution along the shell

• weight,

• cost,

• profit (to be maximized!),

• energy expenditure,

• specified physical or mechanical quantities (e.g. maximum values of displacements, rigidity, stiffness, frequency values, etc.) and

• failure probabilities.

Again, as outlined in the discussion of the constraints, in certain cases it makes sense for
an engineer to formulate a general optimization criterion that is not a “function” in the
strict mathematical sense, but rather a general algorithm. The purpose of the algorithm
is to describe in detail how a numerical value, used for the quantification of a candidate
design x, is computed.
By way of an example, such an algorithmic criterion was used in the optimization of beam-like cylindrical shells (carried out by the author of these lecture notes). To capture the manufacturing cost in terms of a structure-oriented quantity, the following algorithm was established:

1. compute the bending moment distribution along the shell,

2. determine the absolute values of the minimum and maximum my -values,

3. look for the supremum,

4. minimize it.

Thus we get

sup {|my,min|, |my,max|} → min.
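The four algorithmic steps can be sketched in a few lines; `bending_moments` stands in for the result of the shell analysis (a list of my values sampled along the shell for one candidate design), which is not reproduced here:

```python
def objective(bending_moments):
    """Algorithmic criterion: the supremum of |m_y| along the shell.

    Steps 1-3 of the algorithm: given the bending moment distribution of
    one candidate design, take the absolute values of the extreme m_y
    values and return their supremum. Step 4 is then to minimize this
    value over all candidate designs.
    """
    my_min = min(bending_moments)  # most negative m_y value
    my_max = max(bending_moments)  # most positive m_y value
    return max(abs(my_min), abs(my_max))

# a hypothetical sampled distribution for one candidate design:
print(objective([-12.5, -3.0, 4.1, 9.8]))  # 12.5
```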

Chapter 3

A structural optimization model for a two bar truss

To get an insight into what the practical modeling of a structural optimization problem looks like, a simple two bar truss is considered. As will be shown, even this simple structure, requiring only a very elementary structural analysis, does not lead to a trivial optimization problem.

3.1 Structural system


The two bar truss (see Fig. 3.1) is defined by the quantities:

B = half width,
H = height,
L = bar length,
α = inclination angle,
E = modulus of elasticity.

The cross section of both bars is a tube with the characteristic parameters t (thickness of the thin-walled tube) and dm (median diameter).
From the drawing of the structural system, it can be seen that nodes 1 and 3 are fixed while node 2 is free. The load at the vertex of the truss is given as 2P. In fact, it would be very difficult to find a simpler structural system!

Figure 3.1: A two bar truss

3.2 Definition of the design variables


We are interested in a sizing as well as a shape optimization of the structure. Therefore,
we introduce two design variables that we wish to optimize, i.e. to determine the values
of the design variables such that an optimization criterion is optimal while a set of given
constraints is satisfied.
The selection of these two variables is based on “sure instinct” and “engineering intuition” about what may be significant for a design. Two variables are taken here to obtain a two-dimensional optimization problem for the sake of clarity (the problem can be visualized geometrically).
Defining x1 = H and x2 = dm, we get the optimization vector x = (x1, x2)ᵀ = (H, dm)ᵀ, while holding all further quantities B, E, P and t constant.

3.3 Optimization criterion for a two bar truss

As we know, each optimization problem requires a definite optimization criterion. Although a cost criterion would be desirable, as in most cases, an engineering-oriented objective is used here. In this example, we define the volume V of the total structure as the objective function, V = 2AL, where A is the cross-sectional area of a single bar and L is the bar length. We have

L = L(x1) = √(H² + B²) = √(x1² + B²)

and

A = A(x2) = π(R² − r²) = πt·dm = πt·x2,

because dm = 2r + t (for the tube with outer radius R, inner radius r and wall thickness t).
As a consequence, the volume V can be written as V = 2πt · x2 √(x1² + B²).
With respect to the stability of a solution (numerical or geometrical), it is very often reasonable to introduce normalized variables. In our case, we choose

x̃1 = x1/B = H/B,

or x1 = B·x̃1, and (to allow computational simplifications)

x̃2 = π·x2/B,

or x2 = B·x̃2/π. Note also that the new variables x̃1 and x̃2 are dimensionless.
Thus, we have

V = 2πt · (B·x̃2/π) · √(x̃1²·B² + B²) = 2B²t · x̃2·√(x̃1² + 1).

Dividing by 2B²t leads to

V/(2B²t) = f(x̃) = x̃2·√(x̃1² + 1).

Again, the function f(x̃1, x̃2) is dimensionless because the term 2B²t has the same dimension as the volume V.

Figure 3.2: A 3D plot (left) and a contour plot (right) of the objective function x̃2·√(x̃1² + 1)

Obviously, the function f that we take as our objective function is non-linear1 . The visu-
alization using Mathematica is demonstrated in Fig. 3.2.
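As a minimal numerical sketch of the derivation above (the values of B and t are assumed for illustration only, not data from the example):

```python
import math

def f(x1_t, x2_t):
    """Dimensionless objective f(x~1, x~2) = x~2 * sqrt(x~1^2 + 1)."""
    return x2_t * math.sqrt(x1_t ** 2 + 1.0)

def volume(x1_t, x2_t, B, t):
    """Recover the true volume from the normalized objective: V = 2*B^2*t*f."""
    return 2.0 * B ** 2 * t * f(x1_t, x2_t)

B, t = 100.0, 0.5               # assumed values, e.g. in cm
print(f(1.0, 0.2))              # 0.2*sqrt(2) = 0.2828...
print(volume(1.0, 0.2, B, t))   # V in cm^3 for the assumed B and t
```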
3.4 Modeling the constraints

Modeling the set of constraints is extensive, even in this simple problem. However, as we have seen from the discussion of the possible solution domains of an optimization problem, the definition of constraints is crucial (to avoid pathological situations for the computer2).
A very first step is to look at the lower and upper bounds of the optimization variables x1
and x2 or x̃1 and x̃2 . Immediately we can identify the following four cases (see Fig. 3.3):
1. x2 = dm > 0: zero or negative values make no sense,

2. x1 = H = 0: this truss is unstable,

3. x1 = H > 0: the two bars are subjected to compression,

4. x1 = H < 0: a truss with tension forces in both bars.
The discussion of the possible ranges for the design variables demonstrates that the stresses
in the structure are prominent quantities in the formulation of constraints.
According to the equilibrium conditions at the vertex we have (see Fig. 3.4)

sin α = P/S1 = P/S2 = P/S,
1 Note that taking L as a design variable also leads to a non-linear objective function.
2 Note that computers are pretty “stupid” with respect to the recognition of problems!

Figure 3.3: Three possible cases of truss loading: x1 = H = 0 (the truss is unstable), x1 = H > 0 (bars subjected to compression), x1 = H < 0 (bars subjected to tension)


where

sin α = x1/L = H/L

and

L = √(x1² + B²).

Thus

S1 = S2 = S = P/sin α = P·√(x1² + B²)/x1 = S(x1)!

It is to be emphasized that the equation for the axial force S represents a set of equality constraints within the structural optimization problem. Usually we have equilibrium equations of the form

∑F⃗ = 0, or ∑Fx = 0, ∑Fy = 0.
Also, we can see that

lim(x1→0) S = lim(x1→0) P·√(x1² + B²)/x1 → ∞,

which leads to a singularity in the model that, of course, has to be eliminated. We therefore have to ensure that x1 ≠ 0 and x̃1 ≠ 0.
The forces S1 and S2 yield stresses σ (simple stress concept assumed) that are given by

σ = σ(x1, x2) = S/A = S(x1)/A(x2) = (P/πt) · √(x1² + B²)/(x1·x2).

We have to demand that the existing stresses do not exceed admissible limits, i.e.,

|σexist| ≤ σCadm, x1 > 0;   |σexist| ≤ σTadm, x1 < 0,

where the superscripts C and T denote compression and tension, respectively. Taking into account the signs of the stresses in the cases x1 > 0 and x1 < 0 gives

σexist − σCadm ≤ 0, x1 > 0;   −σexist − σTadm ≤ 0, x1 < 0.

Figure 3.4: Equilibrium conditions in the two bar truss

For the solution, we still have to replace the design variables x1 and x2 by the dimension-
less new variables x̃1 and x̃2 ; this will be carried out subsequently.

3.5 Stability problem

In the case x1 > 0, the axial compressive forces S may induce buckling, which, of course, has to be prevented. From the fundamentals of structural mechanics we know that the critical buckling load Scrit corresponding to the end conditions in the two bar truss (Euler case II) is

Scrit = π²·EI/L²,

where EI is the flexural rigidity of each bar. It is required that ν · Sexist ≤ Scrit, where ν is a safety factor.
An essential comment: Of course, more detailed checking concepts may be applied for buckling (EC 3); nevertheless, simple concepts may serve as an appropriate “operational basis”.
The moment of inertia becomes a behavior variable. For the cross section of the tube we get

I = (π/64)·(da⁴ − di⁴).

Replacing da = dm + t = x2 + t and di = dm − t = x2 − t, after some intermediate computations we get

I = (π/8)·(dm³·t + dm·t³).

If

t/da = t/(dm + t) ≤ 1/10,

we can neglect the term dm·t³ and write

I ≈ (1/8)·A·dm²,
with A = πt·x2. With this, we can formulate the stability constraint as ν · Sexist − Scrit ≤ 0, or

ν·P·√(x1² + B²)/x1 − E·(π/8)·t·x2³ · π²/(x1² + B²) ≤ 0,

where the terms correspond to ν·Sexist, E·I and π²/L² = π²/(x1² + B²), respectively.

This equation is associated with the constraint for the validity limit

t/da ≤ 1/10, or t/(t + x2) ≤ 1/10,

that can be rewritten as 10t ≤ t + x2, or 9t − x2 ≤ 0.
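A quick numerical check of the thin-wall approximation for an assumed geometry satisfying t/da ≤ 1/10:

```python
import math

def I_exact(dm, t):
    """Exact tube moment of inertia, I = pi/64 * (da^4 - di^4)."""
    da, di = dm + t, dm - t
    return math.pi / 64.0 * (da ** 4 - di ** 4)

def I_approx(dm, t):
    """Thin-wall approximation I ~ (1/8)*A*dm^2 with A = pi*t*dm."""
    A = math.pi * t * dm
    return A * dm ** 2 / 8.0

dm, t = 10.0, 0.5   # t/da = 0.5/10.5 < 1/10, so the approximation applies
rel_err = abs(I_exact(dm, t) - I_approx(dm, t)) / I_exact(dm, t)
print(rel_err)      # about 0.0025, i.e. well below 1 %
```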
Additional constraints are defined to ensure a closed solution domain; also, geometrically unacceptable configurations of the structure should be prevented. This is accomplished by the side constraints x2 ≤ 50t, where 50t is more or less arbitrary (tubes according to DIN 2448 are ≤ 16t), and 0.2B ≤ |x1| ≤ 2.0B, where, again, the chosen factors 0.2 and 2.0 are arbitrary, but “reasonable”.

Table 3.1: The set of constraints for the two bar truss

g1(x1, x2) = ν·P·√(x1² + B²)/x1 − (E·t·π³/8) · x2³/(x1² + B²) ≤ 0,  x1 > 0,
g2(x1, x2) = (P/πt) · √(x1² + B²)/(x1·x2) − σCadm ≤ 0,  x1 > 0,
g3(x1, x2) = (−P/πt) · √(x1² + B²)/(x1·x2) − σTadm ≤ 0,  x1 < 0,
g4(x2) = x2 − 50t ≤ 0,
g5(x2) = −x2 + 9t ≤ 0,
g6(x1) = x1 − 2B ≤ 0,  x1 > 0,
g7(x1) = −x1 − 2B ≤ 0,  x1 < 0,
g8(x1) = −x1 + 0.2B ≤ 0,  x1 > 0,
g9(x1) = x1 + 0.2B ≤ 0,  x1 < 0,
Summarizing the individual constraints in a systematic manner, we get the set of equations in Table 3.1, or, in a more abstract notation,

min_x { V(x) | gj(x) ≤ 0, j = 1, 2, ..., 9 }.

To improve the computational efficiency3 , in harmony with the reformulation of the ob-
jective function, the normalized design variables (x̃1 and x̃2 ) are applied.
The transformation process is to be demonstrated only for the stability constraint (x1 > 0)

ν·P·√(x1² + B²)/x1 − (E·t·π³/8) · x2³/(x1² + B²) ≤ 0,

where, again, we substitute

x1 = x̃1·B and x2 = x̃2·B/π.

Thus,

ν·P·√(x̃1² + 1)/x̃1 − (E·t·B/8) · x̃2³/(x̃1² + 1) ≤ 0.

Multiplying the above inequality by 8/(E·t·B) leads to

(8νP/(E·t·B)) · √(x̃1² + 1)/x̃1 − x̃2³/(x̃1² + 1) ≤ 0,
3 Note that optimization is always a numerically intensive job!

Table 3.2: The set of constraints using normalized variables

g̃1(x̃) = Ω1·√((x̃1² + 1)³) − x̃1·x̃2³ ≤ 0,  x̃1 > 0,
g̃2(x̃) = Ω2·√(x̃1² + 1) − x̃1·x̃2 ≤ 0,  x̃1 > 0,
g̃3(x̃) = Ω3·√(x̃1² + 1) + x̃1·x̃2 ≤ 0,  x̃1 < 0,
g̃4(x̃) = x̃2 − Ω4 ≤ 0,
g̃5(x̃) = −x̃2 + Ω5 ≤ 0,
g̃6(x̃) = x̃1 − Ω6 ≤ 0,  x̃1 > 0,
g̃7(x̃) = −x̃1 − Ω7 ≤ 0,  x̃1 < 0,
g̃8(x̃) = −x̃1 + Ω8 ≤ 0,  x̃1 > 0,
g̃9(x̃) = x̃1 + Ω9 ≤ 0,  x̃1 < 0,

Table 3.3: Definitions for the constraints using normalized variables

Ω1 = 8νP/(E·B·t),  Ω2 = P/(B·t·σCadm),  Ω3 = P/(B·t·σTadm),
Ω4 = 157.1·t/B,  Ω5 = 28.3·t/B,  Ω6 = Ω7 = 2.0,  Ω8 = Ω9 = 0.2

or

Ω1 · √(x̃1² + 1)/x̃1 − x̃2³/(x̃1² + 1) ≤ 0,

where Ω1 = 8νP/(E·t·B) is a dimensionless factor. Similarly, we obtain the other constraints g̃j in normalized design variables (see Table 3.2) using the definitions given in Table 3.3.
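The compression branch of this problem can be solved numerically even with a crude grid search. The sketch below keeps only the stress constraint g̃2 and the side constraints g̃6, g̃8; the values Ω2 = 0.125, Ω6 = 2.0, Ω8 = 0.2 are assumed for illustration. On the active stress constraint, f = Ω2·(x̃1² + 1)/x̃1, which is minimal at x̃1 = 1.

```python
import math

OMEGA2, OMEGA6, OMEGA8 = 0.125, 2.0, 0.2   # assumed illustrative values

def f(x1, x2):
    """Normalized objective f = x~2 * sqrt(x~1^2 + 1)."""
    return x2 * math.sqrt(x1 ** 2 + 1.0)

def feasible(x1, x2, tol=1e-9):
    g2 = OMEGA2 * math.sqrt(x1 ** 2 + 1.0) - x1 * x2   # stress constraint
    g6 = x1 - OMEGA6                                   # x~1 <= 2.0
    g8 = -x1 + OMEGA8                                  # x~1 >= 0.2
    return g2 <= tol and g6 <= 0.0 and g8 <= 0.0

# grid search over the compression sub-domain 0.2 <= x~1 <= 2.0;
# for each x~1 the smallest admissible x~2 makes g~2 active
best = None
n = 400
for i in range(n + 1):
    x1 = 0.2 + (2.0 - 0.2) * i / n
    x2 = OMEGA2 * math.sqrt(x1 ** 2 + 1.0) / x1
    if feasible(x1, x2) and (best is None or f(x1, x2) < best[0]):
        best = (f(x1, x2), x1, x2)

fmin, x1_opt, x2_opt = best
print(round(fmin, 3), round(x1_opt, 2), round(x2_opt, 3))
```

For Ω2 = 0.125 this reproduces fmin ≈ 0.25 at x̃1 ≈ 1.0, x̃2 ≈ 0.18; in the lecture notes the additional buckling constraint shifts the optimum slightly to x̃1 = 1.014.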
In the x̃1, x̃2-plane the contour lines of the objective function are to be drawn. This can be achieved by setting the objective function Ṽ equal to the values −1.0, −0.5, 0, 0.5, 1.0, etc. In this way, an array of curves can be implicitly determined using the relationship

x̃2 = x̃2(x̃1) = Ṽ/√(x̃1² + 1).

An example for the series 0.1, 0.2, 0.25, etc. has been visualized with Mathematica (see Fig. 3.5 (left)). Similarly, the constraints can also be pictured in the x̃1, x̃2-plane. The approach is exemplified for the stability constraint only.
Resolving

Ω1·√((x̃1² + 1)³) − x̃1·x̃2³ = 0

Figure 3.5: A contour plot of Ṽ (left); a plot of √(x̃1² + 1) · (0.0019/x̃1)^(1/3) (right)

Figure 3.6: A plot of the contour lines of the objective function and the constraints

gives, after some transformation,

x̃2 = ∛Ω1 · √(x̃1² + 1)/∛x̃1,
which is shown in the Mathematica graph in Fig. 3.5 (right).
The final situation, relevant to the solution, is illustrated in Fig. 3.6, showing the contour lines of the objective function and the controlling constraints. There are two separate sub-domains for the solution: on the left the sub-domain for tension forces, on the right the sub-domain for compression. The optimal solution in both sub-domains is found at the two points (1.014, 0.18) and (−1.014, 0.18), where the contour line f = fmin = 0.25 is tangent to the constraints g̃2 and g̃3, respectively.
Recalculation of these results into the original (true) quantities V , x1 and x2 yields
V ∗ = 1018 cm3 ,
x∗1 = ±101.4 cm,
x∗2 = 5.79 cm.

Chapter 4

Application of design optimization in structural engineering

To demonstrate the enormous capabilities of structural optimization, characteristic applications from engineering practice are given in this chapter. In particular, the effects of an optimization approach on the layout of structural systems are to be clarified.
The two bar truss example (see last chapter) has been used only to get an insight into
the internals of an optimization model. To cover a broad spectrum of applications the fol-
lowing categories of structures are considered: trusses, beams, framed structures, plates,
shells and mixed structures.
In the following, a potpourri of interesting examples will be discussed, starting with
trusses (both statically determinate and indeterminate).

4.1 Trusses
Historically, truss optimization was carried out very early, long before the computer was even invented! Two famous engineers have to be mentioned, Maxwell (1869) and Michell (1904), who set up the following theorem:

Theorem: A truss is optimal with respect to a specific load case if the bars are aligned with (run parallel to) the directions of the principal strains.

This led to the so-called Michell structures, which were also verified by the homogenization method based on computer models by Bendsøe and Kikuchi (1988, see Fig. 4.1).
There are some further theorems in truss optimization that should be known because they
represent an effective method of testing numerical optimization strategies; these theorems
can be used to validate an “optimizer”.

4.1.1 Statically determinate trusses


Statically determinate trusses subject to stress constraints only (pure stress constraints or constraints derived from stability conditions) can be fully stressed (so-called fully stressed design); such designs are optimal with respect to weight,

Si/Ai = σi ≤ σadm,  with equality in the fully stressed design,

c)

Figure 4.1: (a) Michell structure, (b) Similar truss structure, simplified, (c) A schematic view of a
Michell structure derived from a plate where “unneeded” material has been removed.
This is an example of topology optimization.

F F

F F

Figure 4.2: For both load cases (top and bottom), there is a unique optimal subsystem.

or

Ai = Si/σadm,

for the forces Si and cross-sectional areas Ai of each bar i. Note: the design variables Ai are degenerate; therefore, only shape optimization makes sense in this case.
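For a statically determinate truss the member forces are independent of the areas, so the fully stressed areas follow in one pass; a minimal sketch with assumed member forces and admissible stress:

```python
def fully_stressed_areas(forces, sigma_adm):
    """Fully stressed design: A_i = |S_i| / sigma_adm for each bar i."""
    return [abs(S) / sigma_adm for S in forces]

# assumed member forces (kN) and admissible stress (kN/cm^2):
print(fully_stressed_areas([120.0, -85.0, 40.0], 16.0))
# [7.5, 5.3125, 2.5]  (areas in cm^2)
```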

4.1.2 Statically indeterminate trusses


Statically indeterminate trusses do not necessarily need to be fully stressed designs. To satisfy the kinematic conditions, they can be fully stressed designs only if, in the case of k-fold indeterminacy, k bars are zero-bars! But: if there is only a single load case, one of the potential statically determinate subsystems represents the optimum weight of the total (original) system. As a consequence, it is also fully stressed.
If there are multiple load cases, then for each load case a determinate subsystem can be
identified as an optimal solution (see Fig. 4.2). In total, the linear combination of these
subsystems that is compatible with respect to the kinematic conditions gives the final so-
lution, which is not necessarily a fully stressed design. An example of shape optimization
is shown in Fig. 4.3, where different constraints produce different optimization results.

initial geometry and loading
variables are the nodes on the upper chord

optimal solution #1 optimal solution #3


no stability constraints / no displacement constraints with stability constraints / no displacement constraints
mass: 1656 kg mass: 2905 kg

optimal solution #2 optimal solution #4


no stability constraints / max. displacement 0.01 m with stability constraints / max. displacement 0.01 m
mass: 2911 kg mass: 3315 kg

Figure 4.3: Shape optimization: constraints affecting the optimization result (Pedersen)

4.2 Several trusses in different situations
Some comments in advance: Starting from an intuitive design, the optimum weight is sought. The design variables are specified nodal coordinates of the trusses. Sizing variables can be eliminated (degenerate design variables) because of the static determinacy of the trusses. Therefore, the cross sections are computed from stress constraints (stability according to DIN 18800) such that they are fully stressed. As a consequence, again shape optimization is of interest (see Figs. 4.4, 4.5).

Cantilever
material: steel; loads are at the upper chord;
optimization variables are the coordinates of points 1, 2 and 3;
initial weight: 2.13 t; final weight 1.91 t; improvement: 10.1%

initial geometry optimal geometry

Cantilever
material: steel; loads are at the upper chord;
optimization variables are the vertical coordinates of nodes at the lower chord;
initial weight 5.43 t; final weight 4.33 t; improvement: 20.3%

initial geometry optimal geometry

Note: the shape of the bottom chord is analogous


to the bending moment distribution

Truss with two supports


material: steel; loads are at the upper chord;
optimization variables are the vertical coordinates of nodes at the upper chord;
initial weight 7.74 t; final weight 4.97 t; improvement: 35.8%

optimal geometry
initial geometry Notice the thrust-line shape of the upper chord

Truss with two supports


material: steel; loads are at the upper chord;
optimization variables are the vertical coordinates of nodes at the upper and lower chords;
initial weight 7.74 t; final weight 4.96 t; improvement: 35.9%

initial geometry optimal geometry


Notice the combination of strutted frame, thrust-line structure and under-tensioned (trussed) beam

Figure 4.4: Examples of geometry optimization of structural systems

Bridge girder
material: steel; loads are on the lower chord;
optimization variables are the coordinates of nodes at the upper chord;
initial weight 7.48 t; final weight 3.42 t; improvement: 54.4%

initial geometry optimal geometry


Notice the shape of a Michell structure

Bridge girder
material: steel; the concentrated load is in the center of the lower chord;
optimization variables are the coordinates of nodes at the upper chord;
initial weight 12.51 t; final weight 7.70 t; improvement: 38.4%

initial geometry optimal geometry


Notice the wing-like truss shape

Gantry crane as a three hinged arch


material: steel; the loads are at the upper chord;
optimization variables are the vertical coordinates of nodes at the upper chords and horizontal coordinates of nodes at the outer chords;
initial weight 24.09 t; final weight 20.42 t; improvement: 15.3%

initial geometry optimal geometry


Notice the M-line shape of the chords

Pylon used for high-voltage lines


material: steel; the loads are at the upper chord;
optimization variables are the coordinates of nodes;
coordinates of the support and upper chord nodes are fixed;
initial weight 32.92 t; final weight 24.90 t; improvement: 24.3%

initial geometry optimal geometry


Notice that the material is gathered at the outside

Figure 4.5: Examples of geometry optimization of structural systems (cont.)

4.3 Optimization of a statically indeterminate truss
In the following, a large-scale truss, being statically indeterminate, is optimized with re-
spect to two separate objective functions (see Fig. 4.6). Only cross sections are introduced
as design variables. Hence, no shape optimization is considered. Furthermore, two options
of the structural system are evaluated:

1. an angular assembly of the bars (alternative No. I),

2. a more homogenous assembly by additionally inserted bars to get a better, smoothed


contour (alternative No. II).

The objective functions are

1. a cost function Q = ∑i f (Ai ),

2. the weight of the structure Q = ∑i Ai li γ.

initial structure and loads (dashed lines indicate additional bars in version 2)

sizing optimization, version 1, objective function is cost
total volume: 0.114

sizing optimization, version 2, objective function is cost
total volume: 0.108
Note: although there are more bars in this version, the total volume is less than in version 1

sizing optimization, version 2, objective function is total weight
total volume: 0.0907
Note: observe the concentration of material in the external domain of the structure

iteration history for the cost optimization problem; iteration history for the weight optimization problem

Figure 4.6: Cross-section optimization of a truss (top); iteration histories for the cost optimization (bottom left) and weight optimization (bottom right) problems

Figure 4.7: Critical buckling loads depend on the cross-section type (left): for constant height h and constant volume, the constant cross section carries 1.0 F, while the variable cross sections carry 1.21 F and 1.61 F. Theoretical alternatives for the shape of an I-beam (right): the weight is reduced from g = 100% to g = 74%.

Figure 4.8: The fish-bellied beam is the theoretical optimal design (both supports fixed)

4.4 Beams

Similar to truss optimization, there are a few forerunners in the optimization of beams stressed by bending moments. Very early, Galilei created some optimal solutions for a beam subjected to a centrically induced compression load. Assuming that the volume remains constant, the critical buckling load of the system shown in Fig. 4.7 (left) can be improved by changing the cross sections as shown.
Further rudimentary examples are shown in Fig. 4.7 (right). A continuously increased flange (the top as well as the bottom flange) reduces the weight from 100% to 74%. (Of course, the solution is debatable with respect to engineering requirements!) The fish-bellied beam shown in Fig. 4.8 is another historical example, for a clamped beam with a uniform load and fixed supports.
A simple example of structural optimization of beams using the finite element method is shown in Fig. 4.9. The material distribution is to be optimized. The loads are a uniform load and a concentrated load; the beam is discretized with a varying number of sections.

definition and loading of a continuous beam

discretization into 4 beam elements

discretization into 10 beam elements

discretization into 20 beam elements

Figure 4.9: Optimization of a continuous beam with a varying number of sections

4.5 Framed structures

A highly sophisticated example is the structural optimization of a frame with three storeys and two bays. The assumed loading is an earthquake, represented in terms of a horizontal perturbation (earth tremor) at the bottom of the structure. To compute the relevant constraints, in this case the finite element analysis requires the solution of the Newtonian equations of motion,

M·r̈(t) + D·ṙ(t) + K·r(t) = P(t),

where
M, D, K are the system matrices (mass, damping and stiffness),
r, ṙ, r̈ are kinematic quantities (displacements, velocities and accelerations) and
P(t) is the time-variant loading.
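To make the role of these equations concrete, here is a minimal central-difference time integration, reduced to a single degree of freedom (the actual frame analysis works with the full system matrices; all numbers are illustrative):

```python
import math

def central_difference(m, d, k, p, r0, v0, dt, nsteps):
    """Integrate m*r'' + d*r' + k*r = p(t) with the central difference scheme."""
    a0 = (p(0.0) - d * v0 - k * r0) / m           # initial acceleration
    r_prev = r0 - dt * v0 + 0.5 * dt ** 2 * a0    # fictitious step r_{-1}
    r = r0
    for n in range(nsteps):
        t = n * dt
        r_next = (p(t) * dt ** 2 + (2.0 * m - k * dt ** 2) * r
                  + (0.5 * d * dt - m) * r_prev) / (m + 0.5 * d * dt)
        r_prev, r = r, r_next
    return r

# undamped free vibration with natural period T = 1 s: r(t) = cos(2*pi*t)
m, k = 1.0, (2.0 * math.pi) ** 2
r_end = central_difference(m, 0.0, k, lambda t: 0.0, 1.0, 0.0, 1.0e-3, 1000)
print(r_end)   # close to the exact value r(1.0) = 1.0
```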
Design variables are the cross sections, where, in total, six distinct cross-section groups are defined. The material used is St37 along with HE-A-shaped beams (standard I-beams). Consequently, six design variables are introduced. The objective functions are

1. a cost function and

2. the weight of the frame.

Each function is minimized with respect to

• stress constraints,

• constructive constraints,

• displacement constraints (one at the top of the frame),

• validity constraints.

The finite element analysis is based upon 24 beam elements lumped together in 18 nodes (the so-called lumped mass concept). The initial frame and the optimization results are shown in Figs. 4.10 and 4.11.

Figure 4.10: Structural design and loading of the initial frame. Optimization variables are cross-sectional areas, and the displacement at the top has to be ≤ 0.0327 m. The design objectives are weight and cost. The nodes 1, 2, and 3 are fixed.

Figure 4.11: Comparison of the optimal moment distribution (left) and cross-sectional areas (right). In the right figure, the results of cost optimization and weight optimization are shown with a dotted pattern and hatching, respectively. The final volume after cost optimization is 0.9123 m³; after weight optimization it is 0.8645 m³.

Figure 4.12: In-plane weight optimization of a screw wrench for a given dominant load case. The
CAD model (left); the initial FE mesh (center); the final FE mesh and boundary
conditions (right).

Figure 4.13: Lug optimization: The state variables are bearing stress (≤ 36000 psi) and maximum
σe (≤ 21000 psi). The design variables are l, h, d1 , d2 , r and t. The loads are P1 =
2600 lbs. and P2 = 15000 lbs., both uniformly distributed.

4.6 Panel structures with in-plane forces

2D stress problems, of course, require the incorporation of appropriate finite element computations within the structural optimization process. Three optimization problems exemplify the characteristics involved:

1. the optimum design of a screw wrench (Fig. 4.12),

2. the optimal definition of a hole in a lug (Figs. 4.13, 4.14),

3. the optimization of holes in a Vierendeel girder (Fig. 4.15).

Figure 4.14: Initial and final lug geometries showing line and keypoint numbers. In the initial
design (top), the volume is 25.4 in3 , the thickness is 1.0 in. The design is infeasible.
In the final design (bottom), the volume is 16.4 in3 , the thickness is 1.2 in. It took 23
loops to converge.

Initial design, weight = 106.5 kN

Optimum design, weight = 87.5 kN (σν = 200 N/mm²)

Optimum design, weight = 112.0 kN (σν = 150 N/mm²)

Figure 4.15: Optimization of a Vierendeel truss (a frame with a top beam containing individual shear walls). The question to be answered is: What are the optimal shapes of the notches used for installation purposes? Shown are the initial design (top) and optimal solutions for σν = 200 N/mm² (middle) and σν = 150 N/mm² (bottom)

Figure 4.16: System and optimal form: (a) crown load, supports movable

Figure 4.17: System and optimal form: (b) crown load, supports fixed

4.7 Shells
The structural efficiency of shells and the wide variety of possible shapes, in terms of 3-dimensional surfaces, make the structural optimization of shells extremely interesting. However, this also poses a real challenge to the structural designer.
Obviously, powerful finite element methods have to be provided in structural optimization models to obtain the desired optimal design. The influence of the loading as well as of the support conditions is demonstrated in the optimization of a spherical calotte belonging to the shell of revolution category (line load at the crown, dead load, snow load, wind load; see Figs. 4.16 - 4.20).

Figure 4.18: System and optimal form: (c) dead weight, supports movable

Figure 4.19: System and optimal form: (d) snow load, supports movable

Figure 4.20: System and optimal form: (e) wind load, supports movable

Chapter 5

Solution techniques in optimization

If more than two design variables are defined, a mathematical or numerical solution technique is needed. Descriptive graphical solutions, as used for 2-dimensional problems, are no longer applicable.
As an overview of the material that we have to compile, a very broad classification of the
optimization techniques available is given in Fig. 5.1.
A thorough knowledge of optimality conditions is important to understand the performance of the various numerical methods used in design practice. In particular, to discuss optimal design concepts, some fundamentals of vector and matrix algebra are needed. In this context, the differentiation notation for functions of several variables will be introduced, or at least repeated (see the corresponding lectures in mathematics).
The gradient vector for a function of several variables plays a crucial role in classical
and also modern numerical approaches. In the same fashion, the so-called Hessian ma-
trix which defines the second partial derivatives of a function, is also important in the
mathematical solution of optimization problems.

5.1 Gradient vector


Since the gradient of a function is used during the discussion of methods for optimum de-
sign, its geometrical significance needs to be elucidated. Also, the differentiation notation,
used throughout numerical optimization, is to be defined.

Figure 5.1: Classification of optimization methods: indirect methods (optimality criteria methods) vs. direct methods (search methods), for constrained and unconstrained problems

Consider a function f(x) of n variables x1, x2, x3, . . . , xn. The partial derivative of the function with respect to x1 at a given point x∗ is defined as

∂f(x∗)/∂x1,

and with respect to x2 as

∂f(x∗)/∂x2,

and so on.
For convenience and compactness of notation, the individual partial derivatives are arranged into a column vector called the gradient vector that, in the mathematical literature, is represented by any of the following symbols:

∇f,  ∂f/∂x,  grad f.
Thus, we have

∇f(x∗) = [∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn]ᵀ |x∗,
where the superscript T denotes the transpose of the row vector. Note that all partial
derivatives are taken at the given point x∗ .
Geometrically, the gradient vector is normal to the tangent plane at the point x∗ as shown
in Fig. 5.2 for a function of three variables x1 , x2 and x3 . Also, it points in the direction
of maximum increase in the function. In our case, the gradient will be used in developing
optimality conditions or in calculating appropriate search directions.

5.2 Hessian matrix

Differentiating the gradient vector once again (that is, each component of the gradient vector is differentiated with respect to each variable xi), a matrix of the second partial derivatives can be composed,

∂²f(x∗)/∂xi∂xj: the n × n matrix with the rows

(∂²f/∂x1², ∂²f/∂x1∂x2, . . . , ∂²f/∂x1∂xn),
(∂²f/∂x2∂x1, ∂²f/∂x2², . . . , ∂²f/∂x2∂xn),
. . .
(∂²f/∂xn∂x1, ∂²f/∂xn∂x2, . . . , ∂²f/∂xn²),
Figure 5.2: The gradient vector is normal to the tangent plane

where all derivatives are computed at the given point x∗. This matrix of type n × n is usually denoted as the Hessian H or ∇²f. It is important to emphasize that each element of the Hessian H is itself a function which is evaluated at the given point x∗. Also, since f(x) is assumed to be twice continuously differentiable, the cross partial derivatives are equal, that is

∂²f/∂xi∂xj = ∂²f/∂xj∂xi,  i = 1, 2, . . ., n,  j = 1, 2, . . ., n,  i ≠ j.

Thus, the Hessian is always a symmetric matrix. This plays a prominent role in the sufficiency conditions for optimality discussed later on. We therefore define the Hessian matrix as

H = [∂²f/∂xi∂xj],  i = 1, 2, . . ., n,  j = 1, 2, . . ., n.
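Both quantities can be approximated by finite differences when analytic derivatives are unavailable, which is what many general-purpose optimizers do internally; a minimal sketch:

```python
def gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient vector at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the (symmetric) Hessian at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp, xpm, xmp, xmm = (list(x) for _ in range(4))
            xpp[i] += h; xpp[j] += h
            xpm[i] += h; xpm[j] -= h
            xmp[i] -= h; xmp[j] += h
            xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4.0 * h ** 2)
    return H

# f(x) = 3*x1^3*x2: gradient at (1,1) is [9, 3], Hessian is [[18, 9], [9, 0]]
f = lambda x: 3.0 * x[0] ** 3 * x[1]
print(gradient(f, [1.0, 1.0]))
print(hessian(f, [1.0, 1.0]))
```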

5.3 Taylor series expansion


Besides the gradient vector ∇ f (x) and the Hessian matrix ∇2 f (x) = H(x) the Taylor se-
ries expansion of a function f (x) forms one of the most essential fundamentals for the
solution of an optimization problem. In particular, the concept of the Taylor series expan-
sion can be utilized for theoretical as well as practical purposes in numerical optimization.
From the elementary courses in mathematics we ought to know that

a function f(x), with x as a simple scalar variable, can be approximated by polynomials in a neighbourhood of any point in terms of its values and derivatives.

This idea is materialized in the Taylor series expansion. Considering first a simple function
of a single variable x, the Taylor expansion about the point x∗ (in our case, this point is

assumed to be the optimum solution) is given by

f(x) = f(x∗) + (df(x∗)/dx)·(x − x∗) + (1/2)·(d²f(x∗)/dx²)·(x − x∗)² + . . . + (1/n!)·(dⁿf(x∗)/dxⁿ)·(x − x∗)ⁿ.
If we summarize the terms with the derivatives of the order larger than 2 in the remainder
term R, because it is smaller in magnitude than the previous terms of the first and second
order (provided x is sufficiently close to x∗ ), we then obtain more briefly using the prime
notation for d/dx:

1
f (x) = f (x∗ ) + f ′ (x∗ ) · (x − x∗ ) + f ′′ (x∗ )(x − x∗ )2 + R.
2
Let the difference (x − x∗ ) = ∆x be a small change in the point x∗ . Then the Taylor expan-
sion becomes
1
f (x∗ + ∆x) = f (x∗ ) + f ′ (x∗ )∆x + f ′′ (x∗ )∆x2 + R.
2
Accordingly, for a function of two variables f(x1, x2), we can write the Taylor expansion at the point x∗ = [x1∗, x2∗]ᵀ as

f(x1, x2) = f(x1∗, x2∗) + (∂f/∂x1)|x∗·(x1 − x1∗) + (∂f/∂x2)|x∗·(x2 − x2∗)
+ (1/2)·[(∂²f/∂x1²)|x∗·(x1 − x1∗)² + 2·(∂²f/∂x1∂x2)|x∗·(x1 − x1∗)(x2 − x2∗) + (∂²f/∂x2²)|x∗·(x2 − x2∗)²] + R,

where all partial derivatives are taken at the given point x∗ = [x∗1 , x∗2 ]T . Using the summa-
tion notation the Taylor expansion can also be written as

f(x1, x2) = f(x1*, x2*) + ∑i=1..2 (∂f/∂xi)|x* (xi − xi*) + (1/2) ∑i=1..2 ∑j=1..2 (∂²f/∂xi∂xj)|x* (xi − xi*)(xj − xj*) + R.

Here, the quantities ∂ f /∂xi as well as ∂2 f /∂xi ∂x j are components of the gradient of the
function f (x1 , x2 ) and the Hessian matrix ∇2 f (x1 , x2 ), respectively, always evaluated at
the given point x∗ .
It is, therefore, not surprising that the Taylor series in matrix notation has the general form

f(x) = f(x*) + ∇f(x*)ᵀ(x − x*) + (1/2)(x − x*)ᵀ H|x* (x − x*) + R.
This notation also holds for the case that the function f (x) has multiple, i.e. n, and not
only 2 variables. Therefore, x, x∗ and ∇ f are n-dimensional vectors and the matrix H is
of the type n × n.

Defining ∆x = x − x* we obtain

f(x* + ∆x) = f(x*) + ∇fᵀ∆x + (1/2)∆xᵀH∆x + R.
The change of the function f when moving from x* to a neighboring point x* + ∆x is

f(x* + ∆x) − f(x*) = ∇fᵀ∆x + (1/2)∆xᵀH∆x
if the term R can be neglected (second-order change).
A first-order change is consequently defined by

∆ f = ∇ f T ∆x ≡ δ f ,
where in all the cases above ∆x represents a small change in x∗ .
Example: Obtain a second-order Taylor expansion for the function f(x) = 3x1³x2 at the point x* = [1, 1]ᵀ.
Solution:
∇f|x* = [∂f/∂x1, ∂f/∂x2]ᵀ|x* = [9x1²x2, 3x1³]ᵀ|x* = [9, 3]ᵀ,

H(x*) = ∇²f|x* = [ 18x1x2  9x1² ; 9x1²  0 ]|x* = [ 18  9 ; 9  0 ].

Approximating gives

f̃(x) = 3 + [9, 3]·[x1 − 1, x2 − 1]ᵀ + (1/2)[x1 − 1, x2 − 1]·[ 18  9 ; 9  0 ]·[x1 − 1, x2 − 1]ᵀ
     = 9x1² + 9x1x2 − 18x1 − 6x2 + 9.
The accuracy of the approximate solution compared to the exact solution is shown in
the following table:

x1 x2 f (x) f˜(x) error in %


1.0 1.0 3.0 3.0 0
1.2 1.2 6.2208 6.1200 1.62
1.5 1.5 15.1875 13.500 11.11
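The table is easy to reproduce numerically; a minimal Python sketch (the function names are illustrative, not from the text):

```python
def f(x1, x2):
    # original function f(x) = 3 x1^3 x2
    return 3.0 * x1**3 * x2

def f_taylor(x1, x2):
    # second-order Taylor expansion about x* = (1, 1), expanded above:
    # f~(x) = 9 x1^2 + 9 x1 x2 - 18 x1 - 6 x2 + 9
    return 9*x1**2 + 9*x1*x2 - 18*x1 - 6*x2 + 9

for t in (1.0, 1.2, 1.5):
    exact, approx = f(t, t), f_taylor(t, t)
    print(t, exact, approx, 100.0 * (exact - approx) / exact)
```

As expected, the relative error grows with the distance from the expansion point x*.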

5.4 Concepts of necessary and sufficient conditions


Based upon the discussion of the Taylor series, optimality conditions can be derived by as-
suming that we are at an optimum point x∗ and then studying the behavior of the functions
and their derivatives at that point. Corresponding optimality conditions can be applied not
only in the indirect optimization methods (see the categorization of solution methods at
the beginning of the chapter), but also in the direct or search methods for specific aspects
in the numerical solution technique. Therefore, it is mandatory to understand the concept
of optimality conditions in detail.
There are (as we certainly know from school mathematics and the math courses in the
freshman year) two distinct optimality conditions, namely necessary conditions and suf-
ficient conditions.

1. The conditions that must be satisfied at the optimum point are called necessary conditions. Conversely, if any point does not satisfy the necessary conditions, it cannot
be an optimum.
Note, however, that the satisfaction of the necessary conditions does not guarantee
an optimum point, that is, there may be non-optimum points that can also satisfy
the same conditions! Points satisfying the necessary conditions are called candi-
date optimum points or stationary points (maxima, minima, saddle points), such
that further tests for distinguishing between optimum and non-optimum points are
required.

2. The sufficient conditions provide these tests to decide on the actual optimality of
candidate optimum points. That is to say, if a candidate optimum point satisfies the
sufficient conditions, then it is indeed an optimum. In this case, no further tests are
needed.

Note, however, that if the sufficient conditions are not satisfied or cannot be used, no conclusion can be made that the candidate design (point) is an optimum. It should be emphasized
that the above discussion affects both unconstrained and constrained optimization prob-
lems. To elucidate the optimality conditions, however, we will start with the unconstrained
optimum design problem.

5.5 Unconstrained problems


The (simple) problem min_x { f(x) } arises, as we already know from the discussion of all our practical examples, only infrequently in practical engineering. Nevertheless, it is appropriate to discuss such problems because optimality conditions for constrained optimum
design problems are nothing else than a logical extension of unconstrained situations. Fur-
thermore, some of the modern numerical strategies for solving constrained problems convert the given constrained problem into a sequence of unconstrained subproblems which,
finally, should converge to the solution of the constrained problem. This approach in-
cludes transformation methods such as the barrier function method, the penalty method
or the multiplier method.
The optimality conditions only apply to local optimality; global optimality is considered
later on where global optimum solutions are limited only to some very special cases.
We start with the necessary optimality condition which can be used to determine stationary points and to qualify them as candidate points. For the sake of a thorough under-
standing, the necessary condition is not only introduced informally as a theorem, rather
it will also be deduced formally from the discussion of the Taylor series: Assuming that
we are at a minimum point x∗ , each movement within a small neighborhood cannot re-
duce the function any further. Consequently, the change in the function always has to be
non-negative. This implies that
∆f = f(x* + ∆x) − f(x*) ≥ 0

for all changes ∆x. From the introduction of the Taylor series we also know that

∆f = ∇f(x*)ᵀ∆x + (1/2)∆xᵀH(x*)∆x + R ≥ 0

is defined through first and second order terms, plus a remainder R. Since the change
vector ∆x is small, the first order term ∇ f (x∗ )T ∆x dominates the higher order terms.
Focusing on this term, we can conclude that the requirement ∆f ≥ 0 can be satisfied for all possible changes ∆x only if ∇f(x*) = 0. That is, the gradient of the function at the assumed minimum point x* must vanish. This condition is called the necessary condition
for a stationary point x∗ . In the component form the necessary condition becomes
∂f(x*)/∂xi = 0,  i = 1, 2, 3, …, n.
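The component form can be checked numerically with finite differences; a small sketch (the test function is ours, chosen for illustration):

```python
def grad_fd(f, x, h=1e-6):
    # central finite-difference approximation of the gradient of f at x
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

f = lambda x: (x[0] - 2.0)**2 + (x[1] + 1.0)**2   # minimum at (2, -1)
print(grad_fd(f, [2.0, -1.0]))   # ~[0, 0]: necessary condition satisfied
print(grad_fd(f, [3.0,  0.0]))   # ~[2, 2]: not a stationary point
```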
Both forms of this condition are of first order. Considering now the second term in the Taylor series for ∆f at the minimum point x*, i.e.

∆f = ∇f(x*)ᵀ∆x + (1/2)∆xᵀH(x*)∆x + R ≥ 0,

where the first-order term vanishes (∇f(x*) = 0) and the remainder R ≈ 0,
it is obvious that, necessarily, the term
(1/2)∆xᵀH(x*)∆x,  ∀∆x ≠ 0,
has to be non-negative, too. In other words, if this were not the case, then the point x* could not be a local minimum! This condition is called the second-order necessary condition; so any point violating the condition (1/2)∆xᵀH(x*)∆x ≥ 0, ∀∆x ≠ 0, can definitely not be a minimum. (Note: if the term equals 0, information of even higher order is needed for a decision.)
The second order necessary condition can be re-formulated in the following fashion: If
f(x) has a local minimum at x*, then the Hessian matrix

H(x*) = ∇²f|x* = [∂²f/(∂xi ∂xj)]
is positive semidefinite or positive definite at the point x∗ .
This formulation is equivalent to the requirement that the term
(1/2)∆xᵀH(x*)∆x ≥ 0,  ∀∆x ≠ 0.
The above term is designated as a

quadratic form F(∆x) of the Hessian H.

In general, a quadratic form F may be either positive, negative or zero for any fixed ∆x. It may also have the property of being always positive, except at ∆x = 0 where F(0) = 0. Such a form is called positive definite. In harmony with the positive definiteness of the quadratic form, the matrix associated with that quadratic form is also called positive definite.
Using x instead of ∆x in our formulation, we can define a general matrix A to be positive definite if xᵀAx > 0, ∀x ≠ 0. Accordingly, we can complete the classification of A as
shown in Table 5.1.
It is to be pointed out that, besides a trial and error method by which arbitrary x vectors are
evaluated for a given matrix A, the following two methods for checking the definiteness
of a matrix A are also applicable for use in numerical problems:

Table 5.1: Classification of a general matrix A

Quadratic form                                   Matrix A
xᵀAx > 0, ∀x ≠ 0                                 positive definite
xᵀAx ≥ 0, ∀x ≠ 0                                 positive semidefinite
xᵀAx < 0, ∀x ≠ 0                                 negative definite
xᵀAx ≤ 0, ∀x ≠ 0                                 negative semidefinite
xᵀAx < 0 for some x and > 0 for some other x     indefinite

Eigenvalue check for the form of a matrix


Let λi, i = 1, …, n be the n eigenvalues of a symmetric n × n matrix A associated with the quadratic form F(x) = (1/2)xᵀAx. The following results can be stated regarding the quadratic form F(x) or the matrix A:

1. F(x) is positive definite if and only if all eigenvalues of A are strictly positive, i.e.
λi > 0, i = 1, . . ., n.

2. F(x) is positive semidefinite if and only if all eigenvalues of A are non-negative, i.e.
λi ≥ 0, i = 1, . . ., n (note that at least one eigenvalue must be zero for it to be called
positive semidefinite).

3. F(x) is negative definite if and only if all eigenvalues of A are strictly negative, i.e.
λi < 0, i = 1, . . ., n.

4. F(x) is negative semidefinite if and only if all eigenvalues of A are non-positive, i.e.
λi ≤ 0, i = 1, . . ., n (note that at least one eigenvalue must be zero for it to be called
negative semidefinite).

5. F(x) is indefinite if some λi < 0 and some other λ j > 0.

Check for the form of a matrix using principal minors


Let Mk be the kth leading principal minor of the n × n symmetric matrix A, defined as the determinant of the k × k submatrix obtained by deleting the last (n − k) rows and columns of A. Assume that no two consecutive principal minors are zero. Then

1. A is positive definite if and only if all the principal minors are positive, i.e. Mk > 0,
k = 1, . . ., n.

2. A is positive semidefinite if and only if Mk ≥ 0, k = 1, . . . , n (note that at least one


principal minor must be zero for it to be called positive semidefinite).

3. A is negative definite if and only if Mk < 0 for odd k and Mk > 0 for even k.

4. A is negative semidefinite if and only if Mk ≤ 0 for odd k and Mk ≥ 0 for even


k (note that at least one principal minor must be zero for it to be called negative
semidefinite).

5. A is indefinite if it does not satisfy any of the preceding criteria.
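Both checks are straightforward to carry out with NumPy; a minimal sketch (the helper names are ours, not from the text):

```python
import numpy as np

def leading_minors(A):
    # determinants M_1, ..., M_n of the leading principal submatrices
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def classify(A, tol=1e-10):
    # classify a symmetric matrix by the signs of its eigenvalues
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

A = np.array([[2.0, 0.0], [0.0, 3.0]])
B = np.array([[4.0, -10.0], [-10.0, 2.0]])
print(classify(A), leading_minors(A))   # positive definite, minors [2, 6]
print(classify(B), leading_minors(B))   # indefinite, minors [4, -92]
```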

Finally, the sufficiency condition is to be derived from the discussion of the Taylor series.
Again, considering the second term in the series for ∆ f , where
∆f = ∇f(x*)ᵀ∆x + (1/2)∆xᵀH(x*)∆x + R
is evaluated at the stationary point determined by the condition

∇f|x* = ∇f(x*) = 0,

the positivity of ∆ f (∆ f ≥ 0) is assured if

∆xᵀH(x*)∆x > 0,  ∀∆x ≠ 0.

This will be true if the Hessian H(x∗ ) is a positive definite matrix which is then the
sufficient condition for a local minimum of f (x) at x∗ .
Summarizing the last development gives the second order sufficiency conditions. If the
matrix H(x∗ ) is positive definite at the stationary point x∗ , then x∗ is a local minimum
point for the function f (x).
Some last comments at the end of the derivation of the optimality conditions: note that all of these conditions involve derivatives and not “absolute” values of the functions. As
a consequence, adding a constant to f (x) or multiplying f (x) by any positive constant
does not change the minimization problem. Multiplying f (x) by a negative constant (e.g.
−1) changes the minimum of f to a maximum. This property allows us to convert a
minimization problem directly to a maximization problem by multiplying f (x) by −1.
A quick example can demonstrate the capabilities of the optimality conditions in action.

Example
Discuss the function f (x) with respect to optimum solutions where

f(x) = 2x1² + 4x1x2³ − 10x1x2 + x2².

The gradient ∇f(x) has to be zero to identify stationary points,

∇f(x) = [4x1 + 4x2³ − 10x2, 12x1x2² − 10x1 + 2x2]ᵀ = 0.
Solving the non-linear system of equations

4x1 + 4x2³ − 10x2 = 0,
12x1x2² − 10x1 + 2x2 = 0,

by substituting

x1 = (5/2)x2 − x2³

from the first equation into the second equation yields

12·((5/2)x2 − x2³)·x2² − 10·((5/2)x2 − x2³) + 2x2 = 0

or
(−12x2⁴ + 40x2² − 23)·x2 = 0.
A possible solution point is x2 = 0 and x1 = 0. Thus,
 
x* = [0, 0]ᵀ
is a stationary point.
Checking the Hessian matrix for this point requires the computation of H(x),

H(x) = [ ∂²f/∂x1²    ∂²f/∂x1∂x2 ]   =   [ 4            12x2² − 10 ]
       [ ∂²f/∂x2∂x1  ∂²f/∂x2²   ]       [ 12x2² − 10   24x1x2 + 2 ].

Evaluating H for x∗ gives


 
H(x*) = [ 4  −10 ; −10  2 ].
Using the method of principal minors, we can compute
 
M1 = det[4] = 4 > 0,

but

M2 = det[ 4  −10 ; −10  2 ] = 4·2 − (−10)·(−10) = −92 < 0,
proving the indefiniteness of H(x*). Also, the eigenvalues of the matrix H(x*) are 3 + √101 and 3 − √101, i.e. about 13.05 and −7.05, respectively, again showing that H(x*) is indefinite. Thus, x* is a saddle point.
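The remaining stationary points of this example can be found and classified numerically; a sketch using NumPy, based on the factorization above:

```python
import numpy as np

def grad(x1, x2):
    return np.array([4*x1 + 4*x2**3 - 10*x2,
                     12*x1*x2**2 - 10*x1 + 2*x2])

def hessian(x1, x2):
    return np.array([[4.0,            12*x2**2 - 10],
                     [12*x2**2 - 10,  24*x1*x2 + 2]])

# x2-coordinates: roots of x2 * (-12 x2^4 + 40 x2^2 - 23) = 0;
# x1 then follows from x1 = (5/2) x2 - x2^3.
x2_vals = [0.0] + [r.real for r in np.roots([-12, 0, 40, 0, -23])
                   if abs(r.imag) < 1e-12]
for x2 in sorted(x2_vals):
    x1 = 2.5*x2 - x2**3
    lam = np.linalg.eigvalsh(hessian(x1, x2))        # ascending eigenvalues
    kind = ("minimum" if lam[0] > 0 else
            "maximum" if lam[-1] < 0 else "saddle point")
    print(f"x* = ({x1:+.4f}, {x2:+.4f}), eigenvalues {lam}, {kind}")
```

Note how quickly the hand calculation turns into a polynomial root-finding and matrix-eigenvalue task, which is exactly the complaint made in the conclusions above.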
The example carried out above allows some essential conclusions to be drawn with respect
to the usage of optimality conditions:
1. In the determination of candidate minimum points a multitude of partial derivatives has to be computed (this is hard work in the general case with many variables, although symbolic manipulators may be at hand or derivatives may be computed numerically using finite differences).
2. Often the necessary conditions lead to a non-linear equation system that, customar-
ily, cannot be solved easily. Thus, the optimization problem is shifted to an equally
complicated problem (NLEQs).
3. The discussion of definiteness or indefiniteness, for qualifying stationary points,
doesn’t make life easy, in particular, if there are multiple stationary points which
compete with one another.
The following problem cases have still not been addressed:
• constrained problems (to be discussed subsequently; they, of course, enlarge the
already existing complexity),
• objective functions without C²-continuity (twice continuous differentiability); they constitute a different story, as seen in Fig. 5.3.

[Figure 5.3 content: three sketches of f(x1) over x1 — (a) f′(x1) not continuous, (b) f(x1) not continuous, (c) f(x1) defined pointwise]

Figure 5.3: Examples of objective functions that are not C2 continuous

Chapter 6

Indirect solutions of constrained optimum design problems

Based on the discussion of unconstrained optimization problems, one might conclude that
only the nature of the objective function f (x) will determine the location of the minimum
point. This, however, is not true.
The constraint functions, as we already know, can play a prominent role. The following
examples are given to illustrate possible situations.

Case #1
Solve the minimization problem

min_x { f(x) = (x1 − 1.5)² + (x2 − 1.5)² | h(x) = x1 + x2 − 2 = 0 }.

Solution. The potential solution point must lie on the line x2 = 2 − x1 (see Fig. 6.1, left). The minimum is found by dropping a perpendicular from the center of the contours of f(x) (which are, of course, concentric circles) onto the line x2 = 2 − x1. By symmetry, the foot of this perpendicular is

x* = [x1*, x2*]ᵀ = [1.0, 1.0]ᵀ.

That means that the solution is a specific point on the given line x2 = 2 − x1 where h(x*) = 0 and f* = f(x*) = (x1* − 1.5)² + (x2* − 1.5)² = 0.5.
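This solution can be verified with a standard solver; a minimal sketch using SciPy's SLSQP method (the setup mirrors case #1, the variable names are ours):

```python
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.5)**2 + (x[1] - 1.5)**2
cons = [{"type": "eq", "fun": lambda x: x[0] + x[1] - 2.0}]
res = minimize(f, x0=[0.0, 0.0], constraints=cons, method="SLSQP")
print(res.x, res.fun)   # -> approximately [1, 1] and 0.5
```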

Case #2
Solve the minimization problem

min_x { f(x) = (x1 − 1.5)² + (x2 − 1.5)² | x1 + x2 − 2 ≤ 0, −x1 ≤ 0, −x2 ≤ 0 }.

Solution. In this (simple) case, instead of a single inequality constraint, three inequality
constraints are given. Although the graphical representation of the problem looks very
similar, the nature of the problem is completely different from the previous case (see

[Figure 6.1 content: left — contours of f with the line x2 = 2 − x1 and minimum M at x1* = x2* = 1; right — the feasible set bounded by x1 ≥ 0, x2 ≥ 0 and x1 + x2 ≤ 2]

Figure 6.1: Solution for case #1 (left) and case #2 (right)

Fig. 6.1, right). The solution is determined by the feasible domain spanned by the three inequality constraints. Nevertheless, the same solution point occurs. The minimum value of f(x) in this case corresponds to the contour circle with the smallest radius that still touches the feasible domain. This is the point

x* = [x1*, x2*]ᵀ = [1.0, 1.0]ᵀ,

where f (x∗ ) = 0.5, but the solution logic has changed. The location of the minimum is
found by considering a feasible domain! Again, the location is governed by the constraints
or boundaries of the feasible domain.

Case #3
Solve the minimization problem

min_x { f(x) = (x1 − 0.5)² + (x2 − 0.5)² | x1 + x2 − 2 ≤ 0, −x1 ≤ 0, −x2 ≤ 0 }.

Solution. Now the solution is independent of the constraints. As we can see from the graphical representation (see Fig. 6.2, left), the minimum lies in the interior of the feasible domain because the center of the contours (again, circles) is located at the point (1/2, 1/2), where

f(x*) = 0 and x* = [1/2, 1/2]ᵀ.
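Cases #2 and #3 can be reproduced the same way; note that SciPy's "ineq" convention is fun(x) ≥ 0, so each constraint g(x) ≤ 0 is passed as −g (a sketch, variable names ours):

```python
from scipy.optimize import minimize

cons = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]},  # x1 + x2 - 2 <= 0
        {"type": "ineq", "fun": lambda x: x[0]},               # -x1 <= 0
        {"type": "ineq", "fun": lambda x: x[1]}]               # -x2 <= 0

f2 = lambda x: (x[0] - 1.5)**2 + (x[1] - 1.5)**2   # case #2
f3 = lambda x: (x[0] - 0.5)**2 + (x[1] - 0.5)**2   # case #3

r2 = minimize(f2, [0.2, 0.2], constraints=cons, method="SLSQP")
r3 = minimize(f3, [0.2, 0.2], constraints=cons, method="SLSQP")
print(r2.x, r2.fun)   # boundary minimum: ~[1, 1], f* = 0.5
print(r3.x, r3.fun)   # interior minimum: ~[0.5, 0.5], f* = 0
```

The same solver finds the boundary minimum of case #2 and the interior minimum of case #3 without any change in the problem setup other than the objective.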

Case #4
Solve the minimization problem

min_x { f(x) = (x1 − 2)² + (x2 − 2)² | x1 + x2 − 2 ≤ 0, −x1 + x2 + 3 ≤ 0, −x1 ≤ 0, −x2 ≤ 0 }.

[Figure 6.2 content: left — interior minimum M at (1/2, 1/2) inside the feasible set; right — the conflicting constraints of case #4 with the contour center M at (2, 2)]

Figure 6.2: Solution for case #3 (left) and case #4 (right)

In this case, the objective function is modified again. Also, a further inequality constraint
(−x1 + x2 + 3 ≤ 0) is added to the model.
Solution. The plot of the constraints for case #4 shows that the problem is over-constrained
(see Fig. 6.2, right). This is to say that there are conflicting requirements which cannot be
satisfied. As a consequence, a solution does not exist. In such a case we must re-examine
the problem formulation and relax the constraints.
To discuss the above example in more detail we have to realize that

if P1 = [1/2, 1/2]ᵀ, then g2(x) = −x1 + x2 + 3 = 3 > 0,
if P2 = [2, 2]ᵀ, then g1(x) = x1 + x2 − 2 = 2 > 0 and g2(x) = −x1 + x2 + 3 = 3 > 0,
if P3 = [5, 1]ᵀ, then g1(x) = x1 + x2 − 2 = 4 > 0.

All constraints in the constraint set cannot be satisfied simultaneously. In other words, no solution point exists that can fulfill all the requirements.
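The infeasibility can also be detected mechanically: since all constraints of case #4 are linear, a linear-programming feasibility check suffices. A sketch using SciPy (status code 2 means the problem is infeasible):

```python
from scipy.optimize import linprog

# Feasibility test for case #4: minimize the zero objective subject to
#   x1 + x2 <= 2,   -x1 + x2 <= -3   (i.e. -x1 + x2 + 3 <= 0),   x >= 0.
res = linprog(c=[0.0, 0.0],
              A_ub=[[1.0, 1.0], [-1.0, 1.0]],
              b_ub=[2.0, -3.0],
              bounds=[(0, None), (0, None)])
print(res.status, res.message)   # status 2: the problem is infeasible
```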
From the discussion of the four examples it follows that the general constrained optimum design problem, based on the formulation

min_x { f(x) | h(x) = 0, g(x) ≤ 0 },

requires a sophisticated solution concept. In particular, the various scenarios inherent in


constrained problems must be covered if an indirect solution technique, similar to the
optimality conditions for unconstrained problems, is to be applied.
A solution concept that can master these requirements was introduced by Kuhn and Tucker in terms of the so-called Kuhn-Tucker conditions (KTC). The KTC are also optimality conditions like the necessary and sufficient conditions in the unconstrained problem. Similar
to these optimality conditions, the KTC can be used either as a basis for indirect solution
techniques or for direct solution techniques where they are often used as checking criteria
for solutions found by a numerically based search process.
The outstanding significance of the KTC requires a sound derivation! This derivation is
accomplished in two steps by transforming an unknown problem to a known problem.

This approach is often applied in mathematics (and engineering). Thus, we transform the original problem

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }

to the simpler problem

min_x { f(x) | h(x) = 0 }

having only equality constraints instead of both inequality and equality constraints.
Furthermore, we transform the problem with equality constraints to an unconstrained problem

min_x { f(x) }

which we can already solve in all details.


We start with the second transformation first. Having solved the problem associated with
equality constraints, in a second step we will then look at ways how to transform a prob-
lem with inequality constraints into one with equality constraints.
In the first step, the method of the Lagrangian multipliers brings us to the desired neces-
sary conditions that we are looking for. To understand the introduction of the Lagrange
multipliers we consider the following minimization problem:

min_x { f(x) | h(x1, x2) = 0 }.

As can be seen, the abstract objective function has two variables x1 and x2 and the mini-
mization over x is subjected to only one single equality constraint h(x1 , x2 ) = 0.
To derive necessary conditions which we are interested in, we assume that the equality
constraint can be used to solve for one variable in terms of the other (at least symboli-
cally), that is, we assume that we can write

x2 = Φ(x1 ),

where Φ is an appropriate function of x1 . (In some very elementary cases it may be possi-
ble to find an explicit representation for Φ(x1 ), for example if h(x1 , x2 ) = x1 + x2 − 2 = 0,
we could write x2 = Φ(x1 ) = 2 − x1 . In general, however, such explicit expressions cannot
be found.)
Taking the implicit representation x2 = Φ(x1 ), we are able to establish a procedure by
which the Lagrange multipliers (in our case, however, there is only one multiplier for
the single equality constraint) get defined naturally in the process. Since the (symbolic)
elimination of the second variable results in an equation with only one variable, we can
easily write the necessary condition for the newly obtained problem

min_{x1} { f(x1, Φ(x1)) }

as

df/dx1 = 0.

Using the chain rule of differentiation for f(x1, Φ(x1)) = f(x1, x2), we get

df(x1, x2)/dx1 = ∂f(x1, x2)/∂x1 + (∂f(x1, x2)/∂x2)·(dx2/dx1) = 0.

Since this is the necessary condition for a stationary point x* = (x1*, x2*) we have to re-write

∂f(x1*, x2*)/∂x1 + (∂f(x1*, x2*)/∂x2)·(dΦ(x1*)/dx1) = 0,
where dx2 /dx1 has been replaced by dΦ(x1 )/dx1 due to x2 = Φ(x1 ).
We can now eliminate dΦ(x1)/dx1 from the necessary condition above if we differentiate the constraint equation h(x1, x2) = 0 at the point (x1*, x2*) as

dh(x1*, x2*)/dx1 = ∂h(x1*, x2*)/∂x1 + (∂h(x1*, x2*)/∂x2)·(dΦ(x1*)/dx1) = 0.

This step makes sense because we also have to bring the equality constraint into our consideration. Solving for dΦ/dx1, we immediately obtain

dΦ/dx1 = −(∂h(x1*, x2*)/∂x1) / (∂h(x1*, x2*)/∂x2).

Now we are capable of substituting dΦ/dx1 from the above equation into the equation for the necessary condition and we obtain

∂f(x1*, x2*)/∂x1 − (∂f(x1*, x2*)/∂x2)·[(∂h(x1*, x2*)/∂x1) / (∂h(x1*, x2*)/∂x2)] = 0.

Reordering gives

∂f(x1*, x2*)/∂x1 − [(∂f(x1*, x2*)/∂x2) / (∂h(x1*, x2*)/∂x2)]·(∂h(x1*, x2*)/∂x1) = 0.

If we define the quantity λ as

λ = −(∂f(x1*, x2*)/∂x2) / (∂h(x1*, x2*)/∂x2)

we get

∂f(x1*, x2*)/∂x1 + λ·(∂h(x1*, x2*)/∂x1) = 0.
Also, it turns out that

∂f(x1*, x2*)/∂x2 + λ·(∂h(x1*, x2*)/∂x2) = 0

where, now, the quantity λ is

λ = −(∂f(x1*, x2*)/∂x1) / (∂h(x1*, x2*)/∂x1).
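For case #1 of the previous examples both expressions for λ can be evaluated directly at x* = (1, 1); a small sketch:

```python
# Case #1: f = (x1 - 1.5)^2 + (x2 - 1.5)^2, h = x1 + x2 - 2, x* = (1, 1)
x1s, x2s = 1.0, 1.0
df_dx1, df_dx2 = 2*(x1s - 1.5), 2*(x2s - 1.5)   # both -1
dh_dx1, dh_dx2 = 1.0, 1.0

lam_from_x2 = -df_dx2 / dh_dx2
lam_from_x1 = -df_dx1 / dh_dx1
print(lam_from_x1, lam_from_x2)   # both 1.0: the two definitions agree at x*
```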

Note that in this derivation we force both terms

(∂f(x1*, x2*)/∂x1) / (∂h(x1*, x2*)/∂x1)   and   (∂f(x1*, x2*)/∂x2) / (∂h(x1*, x2*)/∂x2)

to be equal to the same quantity λ, although they contain distinct partial derivatives in their numerators as well as denominators. This demands that, at the stationary point x = x*, we force the equality of these two terms, or in other words

(∂f/∂x1)|x* / (∂h/∂x1)|x* = (∂f/∂x2)|x* / (∂h/∂x2)|x* .

This means that

(∂f/∂x1)|x* / (∂h/∂x1)|x* − (∂f/∂x2)|x* / (∂h/∂x2)|x* = 0,

or

(∂f/∂x1)|x* · (∂h/∂x2)|x* − (∂f/∂x2)|x* · (∂h/∂x1)|x* = 0.

This expression is, in the sense of functional analysis, equivalent to the requirement that the determinant of the Jacobian matrix of the two functions f(x) and h(x), also called the functional determinant, vanishes. Thus,

det [ ∂f/∂x1  ∂f/∂x2 ; ∂h/∂x1  ∂h/∂x2 ] |x* = 0.

The geometrical interpretation of this mathematical fact is much more illustrative: writing

(∂f/∂x1) / (∂f/∂x2) = (∂h/∂x1) / (∂h/∂x2)

at the stationary point, and recalling that the slope of a contour of f is dx2/dx1 = −(∂f/∂x1)/(∂f/∂x2) (and analogously for a contour of h), shows that at the point x* the slopes dx2/dx1 of the two curves f(x) = f(x*) and h(x) = 0 coincide (see Fig. 6.3).
The two equations

(∂f/∂x1)|x* + λ (∂h/∂x1)|x* = 0

and

(∂f/∂x2)|x* + λ (∂h/∂x2)|x* = 0

[Figure 6.3 content: contours of f(x) and the curve h(x) = 0 touching at (x1*, x2*) with common tangent slope tan α = dx2/dx1]

Figure 6.3: The slopes of f (x∗ ) and h(x∗ ) at x∗ .

along with the equality constraint

h(x)|x∗ = 0

are the necessary conditions of optimality for an optimum design problem associated with
the equality constraint h(x) = 0.
Any point x that violates these conditions cannot be a minimum point for the problem.
Here, the quantity λ is designated as the Lagrangian multiplier for the equality constraint
h(x) = 0. If the minimum point is known the value of the multiplier can be calculated. For
the above case #1 we get x∗1 = x∗2 = 1 and λ∗ = 1.
It is customary (and convenient) to use what is known as the Lagrange function in writing
the necessary conditions. The Lagrange function is denoted as L and becomes

L(x, λ) = f (x1 , x2 ) + λh(x1, x2 )

such that the necessary conditions can be obtained by the following derivatives:

∂L/∂x1 = (∂f/∂x1 + λ ∂h/∂x1)|x* = 0,
∂L/∂x2 = (∂f/∂x2 + λ ∂h/∂x2)|x* = 0,
∂L/∂λ = h(x1, x2)|x* = 0.

Consequently, the vanishing gradient of the Lagrange function, ∇x,λ L(x, λ)|x* = ∇x,λ ( f(x) + λh(x) )|x* = 0, yields the same statements as above.
Thus, the two problems

min_x { f(x) | h(x) = 0 }

and

min_{x,λ} { L(x, λ) = f(x) + λh(x) }

are completely equivalent to each other.


The equation ∇ f (x)|x∗ + λ ∇h(x)|x∗ = 0 can be used for a further interpretation. Re-
arranging this to ∇ f (x∗ ) = −λ∇h(x∗ ) shows that at the candidate minimum point the

gradient of the objective function and the constraint function have to be along the same
line and proportional to each other. Therefore, the Lagrange multiplier is a proportionality
constant. It can also be interpreted as a force required to impose the constraint.
The idea of a Lagrangian multiplier for an equality constraint can, of course, be general-
ized to many, say p, equality constraints. This leads to the Lagrange Multiplier Theorem.
Before defining this general theorem one important amendment is needed. It must be assured that the candidate minimum point that we wish to examine with respect to the necessary optimality conditions represents a so-called regular point. This has to do with the fact that the derived necessary optimality conditions are only valid if the point of interest is a regular point. If it is not, no conclusion based upon the optimality conditions is possible. (This does not mean that the point is not a minimum!)

6.1 Definition of regularity


A point x∗ satisfying the constraints h(x∗ ) = 0 (many constraints!) is said to be a regular
point if the gradient vectors of all constraints at the point x∗ are linearly independent.
Linear independence means that no two gradients are parallel to each other, and no gra-
dient can be expressed as a linear combination of the other gradients. (Later on, when inequality constraints are also included, the gradients of the active inequalities, i.e. those with gj(x) = 0, must be linearly independent as well for the point to be regular.)
With this amendment we can state the Lagrange Multiplier Theorem for the n-dimensional
case as follows: Consider the problem

min_x { f(x) | h(x) = 0 },

where x = [x1, x2, …, xn]ᵀ. Let x* be a regular point that is a local minimum for the above problem. Then there exist Lagrange multipliers λk*, k = 1, 2, 3, …, p such that ∇L(x, λ)|x*,λ* = 0, or

∂L/∂xi |x*,λ* = 0,  i = 1, 2, 3, …, n,

and

∂L/∂λk |x*,λ* = 0,  k = 1, 2, 3, …, p,

where L(x, λ) = f(x) + λᵀh(x) is the Lagrange function.

Example
Consider the problem

min_x { f(x) = x1² + x2² | h(x) = x1 + x2 − 10 = 0 }.

This can be reformulated as

min_{x,λ} { L(x, λ) }

[Figure 6.4 content: contours of f = x1² + x2² and the line x1 + x2 = 10, with the gradients ∇f|x* and ∇h|x* drawn at x1* = x2* = 5]
Figure 6.4: The gradients of f and h at the point x* = (5, 5). We have ∇f|x* = −λ∇h|x* with ∇f|x* = [2x1, 2x2]ᵀ = [10, 10]ᵀ, ∇h|x* = [1, 1]ᵀ and λ = −10.

where L(x, λ) = f(x) + λh(x) = x1² + x2² + λ(x1 + x2 − 10). The necessary optimality conditions give

∂L/∂x1 = 2x1* + λ* = 0  →  λ* = −2x1*,
∂L/∂x2 = 2x2* + λ* = 0  →  λ* = −2x2*,
∂L/∂λ = x1* + x2* − 10 = 0.

Since x1* = x2* = −λ*/2 we obtain x1* + x2* − 10 = −λ*/2 − λ*/2 − 10 = 0, or λ* = −10 and x1* = x2* = −λ*/2 = 5. The geometrical interpretation is shown in Fig. 6.4.
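Because f is quadratic and h linear, the necessary conditions here form a linear system that can be solved directly; a small NumPy sketch:

```python
import numpy as np

# Stationarity of L = x1^2 + x2^2 + lam*(x1 + x2 - 10):
#   2 x1 + lam = 0,   2 x2 + lam = 0,   x1 + x2 = 10
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 10.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # -> 5.0, 5.0, -10.0
```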
The transformation of the original problem into an unconstrained problem using the La-
grange function leads to a so-called “saddle point problem.” In numerical optimization,
there are some methods available which directly utilize the saddle point nature of the
problem as the solution philosophy (dual methods). These methods search for the deepest
point in the “basin” created by the ridges of the saddle surface. For this (see Fig. 6.5), first
a maximization of the λ multipliers is carried out, followed by a minimization of the xi
variables.

6.2 Necessary conditions for inequality constraints


Based on the Lagrange Multiplier Method, we now consider the second transformation
mentioned above to define the necessary conditions for optimization problems with both

[Figure 6.5 content: saddle surface of the Lagrange function over (x, λ) with its ridge and the basin between the ridges]

Figure 6.5: Transformation of a constrained problem into an unconstrained problem using the Lagrange function leads to a “saddle point problem”.

equality and inequality constraints. That means that we are interested in transforming the original optimization problem

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }
into a problem where only equality constraints appear.
We can transform the inequality constraints of the form g(x) ≤ 0, or gj(x) ≤ 0, j = 1, 2, 3, …, m, by adding new variables (to the constraints) which are called slack variables.
Using, for example, a specified constraint g j (x) ≤ 0 it can immediately be seen that the
slack variable s j for this constraint always has to be non-negative (i.e. positive or zero) to
make the inequality an equality.
To give an example, if g j (x) = −100, then s j = +100 gives exactly
g j (x) + s j = 0.
In the case that g j (x) is active, i.e. g j (x) = 0, then s j is zero because again
g j (x) + s j = 0.
Consequently, an inequality constraint g j (x) ≤ 0 is equivalent to the equality constraint
g j (x) + s j = 0, where s j ≥ 0.
The variables s j are treated as new unknowns of the design optimization problem, along
with the original design variables xi , i = 1, 2, 3, ..., n. Their values are determined as part
of the solution.
When a slack variable sj has zero value, the corresponding inequality constraint is satisfied at equality. Such an inequality is called an active constraint. In other words, there is no “slack” in the constraint. For any sj > 0, the corresponding constraint is a strict inequality and called an inactive constraint. Note that the usage of slack variables introduces additional “design” variables and an additional constraint sj ≥ 0 for each sj.
This, of course, increases the dimension of the design problem, in a similar fashion as
the Lagrange multipliers enlarge the problem dimensions. In the case of the Lagrange

multipliers the dimensionality increases from n to n + p, if p equality constraints are
given. The constraints of the type sj ≥ 0, j = 1, 2, 3, …, m, can, however, be avoided if we use sj² instead of sj as the slack variable. Therefore, we have the improved version

gj(x) + sj² = 0,

where s j can be any real number. This new definition of quadratic slack variables allows
the evaluation of constraints as follows:

1. if a constraint is satisfied (gj ≤ 0), then sj² = −gj ≥ 0 has a real solution sj;

2. if a constraint were violated (gj > 0), then sj² < 0 would be required, which is impossible for a real sj.

According to the preceding discussion we can now apply the well-known Lagrange Multiplier Theorem to treat the inequality constraints that we could not master so far. The following transformation is obvious:

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }

is transformed to

min_x { f(x) | h(x) = 0, g(x) + s² = 0 },

where the quantity s² is understood as s² = [s1², s2², …, sm²]ᵀ.


The Lagrange function, therefore, takes the form

L(x, λ, ω, s) = f(x) + λᵀh(x) + ωᵀ(g(x) + s²),

where the ω j , j = 1, 2, 3, ..., m are new Lagrange multipliers specifically introduced for
those equality constraints that have been converted from inequality to equality constraints
by means of the slack variables s2 .
Applying the Lagrange Multiplier Theorem to this enhanced Lagrange function yields the
problem

min_{x, λ, ω, s} { L(x, λ, ω, s) }.

As can be seen, there are no constraints in the minimization problem min{L}, but we now have an (n + p + 2m)-dimensional optimization problem. (That’s the price we have to pay for getting an unconstrained problem.) There is an additional condition for the Lagrange multipliers of the original ≤-type constraints, i.e. the ω j , j = 1, 2, 3, ..., m multipliers. Since the g j (x) are required to be less than or equal to zero, the ω j -values are not allowed to be negative. Thus

ω j ≥ 0, j = 1, 2, 3, ..., m

is an absolute must, which creates m additional, although very elementary, constraints (sign constraints).
According to the Lagrange approach the necessary conditions for equality and inequality
constraints can be derived which are commonly known as the Kuhn-Tucker necessary
conditions (KTC).

6.2.1 The Kuhn-Tucker Theorem
Let x∗ be a regular point of the problem

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }.

Define the Lagrange function

L(x, λ, ω, s) = f(x) + λT h(x) + ωT (g(x) + s²)

or

L = f(x) + Σ_{k=1}^{p} λk hk(x) + Σ_{j=1}^{m} ωj (gj(x) + sj²).

Then there exist Lagrange multipliers λ∗k (or λ∗) and ω∗j (or ω∗) such that the Lagrangian is stationary with respect to xi, λk, ωj and sj. Hence, we obtain the necessary KTC as follows:

∇x L = 0 = ∇x f|x∗ + (λ∗)T · ∇x h|x∗ + (ω∗)T · ∇x g|x∗,
∇λ L = 0 = h|x∗,
∇ω L = 0 = g|x∗ + s∗²,
∇s L = 0, i.e. ω∗j s∗j = 0 for all j, with ω∗ ≥ 0.

In a scalar notation we have

∂L/∂xi = 0 = ∂f/∂xi |x∗ + Σ_{k=1}^{p} λ∗k ∂hk/∂xi |x∗ + Σ_{j=1}^{m} ω∗j ∂gj/∂xi |x∗,   i = 1, 2, ..., n,

∂L/∂λk = 0 = hk |x∗,   k = 1, 2, ..., p,

∂L/∂ωj = 0 = gj |x∗ + (s∗j)²,   j = 1, 2, ..., m,

∂L/∂sj = 0 = 2 ω∗j s∗j,   j = 1, 2, ..., m.
The foregoing conditions are also called first-order necessary conditions. It is important
to understand their use to (i) check possible optimality conditions of a point and (ii) de-
termine candidate local minimum points.
Note first from the equations hk |x∗ = 0 and gj |x∗ + (s∗j)² = 0 that the candidate point must be feasible (the constraint equations guarantee feasibility). Simultaneously, the gradient condition, rewritten as

−∂f/∂xi |x∗ = Σ_{k=1}^{p} λ∗k ∂hk/∂xi |x∗ + Σ_{j=1}^{m} ω∗j ∂gj/∂xi |x∗,   i = 1, 2, ..., n,

must be satisfied. Furthermore, the equations ω∗j s∗j = 0, called the switching conditions, give rise to a multiplicity of solution cases.
The gradient conditions also have a geometrical meaning. They can be interpreted such that, at x∗, the negative gradient of the objective function is a linear combination of the gradients of the constraints, with the Lagrange multipliers as the parameters of the linear combination (nonnegative multipliers for the inequality constraints). A 2D example is shown in Fig. 6.6.

Figure 6.6: A 2D example of positive and negative linear combinations of constraint gradient vectors. The point x∗a, at which −∇f|x∗ is represented by a positive linear combination of the constraint gradients, qualifies for the Kuhn-Tucker conditions; the point x∗b, at which −∇f|x∗ is represented by a negative linear combination, does not.

A numerical example (see case #2)


Consider the problem

min_x { f(x) = (x1 − 1.5)² + (x2 − 1.5)² | g1(x) ≤ 0 },

where

g1(x) = x1 + x2 − 2 ≤ 0.

The definition of the Lagrange function L(x, ω, s) = f(x) + ω(g(x) + s²) gives in this case

L = (x1 − 1.5)² + (x2 − 1.5)² + ω(x1 + x2 − 2 + s²).
The KTC for x = x∗ require

∂L/∂x1 = 2(x∗1 − 1.5) + ω∗ = 0,
∂L/∂x2 = 2(x∗2 − 1.5) + ω∗ = 0,
∂L/∂ω = x∗1 + x∗2 − 2 + s∗² = 0,
∂L/∂s = 2ω∗ s∗ = 0.
The solution starts by using the switching conditions (in general we have 2^m cases for m inequality constraints). By that, the following two cases are possible:

case #1 For the switching condition ω∗ s∗ = 0 we assume s∗ = 0. Then we get 2(x∗1 − 1.5) + ω∗ = 0, 2(x∗2 − 1.5) + ω∗ = 0 and x∗1 + x∗2 − 2 = 0. Solving these equations immediately gives the result x∗1 = x∗2 and x∗1 + x∗2 = 2, or x∗1 = x∗2 = 1. Also, note that ω∗ = 1 and therefore positive. Thus, this is a valid candidate solution.

case #2 Now we assume that ω∗ = 0 in the switching condition. We obtain the equations 2(x∗1 − 1.5) = 0, 2(x∗2 − 1.5) = 0 and x∗1 + x∗2 − 2 + s∗² = 0. Again, solving these equations shows that x∗1 = x∗2 = 1.5, but s∗² = 2 − x∗1 − x∗2 = −1, which means that there is no real solution for s. Thus, this case violates the given problem.
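The two switching cases above can be reproduced in a few lines; the function names are illustrative assumptions, and the closed-form solutions are simply transcribed from the two cases.

```python
# Sketch: enumerating the switching cases (s* = 0 vs. omega* = 0) for
# f(x) = (x1 - 1.5)**2 + (x2 - 1.5)**2 with g1(x) = x1 + x2 - 2 <= 0.
# Function names are illustrative, not from the text.

def solve_case_active():
    # case #1: s* = 0, so x1 + x2 = 2; stationarity gives x1 = x2
    x1 = x2 = 1.0
    omega = -2.0 * (x1 - 1.5)   # from 2(x1 - 1.5) + omega = 0
    return (x1, x2), omega

def solve_case_inactive():
    # case #2: omega* = 0, so x1 = x2 = 1.5 (unconstrained stationary point)
    x1 = x2 = 1.5
    s_squared = 2.0 - x1 - x2   # must be >= 0 for a real slack s
    return (x1, x2), s_squared

x_a, omega_a = solve_case_active()
x_b, s2_b = solve_case_inactive()
print(x_a, omega_a)   # (1.0, 1.0) with omega* = 1 >= 0: valid KT point
print(x_b, s2_b)      # (1.5, 1.5) with s^2 = -1 < 0: case rejected
```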

The KT-mechanism must be seen in the following context:

1. The KTC are not applicable at irregular points.

2. Any point that does not satisfy the KTC cannot be a local minimum point unless it is an irregular point. Points satisfying the KTC are called Kuhn-Tucker points.

3. Kuhn-Tucker points can be constrained or unconstrained. They are unconstrained when there are no equalities, and all inequalities are inactive (the Lagrange multipliers ω j are then zero). If the candidate point is unconstrained it can be a local minimum, maximum or inflection point, depending on the Hessian matrix of the objective function.

4. If there are equality constraints and no inequalities are active (i.e. their Lagrange multipliers are zero), then the Kuhn-Tucker points can again only be stationary (minimum, maximum or inflection) points.

5. If some inequality constraints are active and their multipliers are positive, then the
Kuhn-Tucker points cannot be local maxima. They may not be local minima either;
this will depend on the second-order necessary and sufficient conditions.

6. The KTC can be used to check whether or not a given point is a candidate minimum
point. (This fact is utilized in some popular numerical optimization methods for
termination.)

6.3 Sufficient conditions for constrained problems


Solutions of the necessary first-order Kuhn-Tucker conditions discussed above are “only” candidate minimum designs. Therefore, sufficient conditions are also required in the constrained case to determine whether or not a candidate design is indeed a local minimum, or even a global minimum.

6.3.1 Convex problems


For convex problems, the first-order Kuhn-Tucker necessary conditions of the KT Theorem also become sufficient. Thus, if we can show the convexity of a problem, the KT Theorem gives us a solution that is also sufficient. Moreover, the solution will be a global minimum as well. (This is the best case that we can expect!) Convexity has already been touched upon when we introduced the different categories of solution domains; however, a rigorous mathematical definition has been lacking so far.
An optimization problem is said to be convex if

1. the constraints define a convex set S of points forming a collection that has the following property: If P1 and P2 are any points in S, then the entire line segment P1P2 is also in S. In the n-dimensional space the line segment can be written as (see Fig. 6.7, left)

   x = αx(2) + (1 − α)x(1),   0 ≤ α ≤ 1.

Figure 6.7: An example of a convex 2-dimensional set (left); a 2D example of a convex function (right)

2. Simultaneously, convexity requires that the objective function must be convex as well, defined on the convex set S. This is the case if the function lies below the line joining any two points on the curve f(x) (see Fig. 6.7, right). In general (n-dimensional space), this can be expressed by the relationship f(x) ≤ α f(x(2)) + (1 − α) f(x(1)), or, using the definition of x in the first part,

   f(αx(2) + (1 − α)x(1)) ≤ α f(x(2)) + (1 − α) f(x(1)).

Numerically, the check for convexity of a function of n variables x1, x2, ..., xn defined on a convex set S is carried out by means of the Hessian matrix H of f(x). If H is positive semidefinite or positive definite at all points in the set S, then we do have a convex optimization problem. (Note: The converse of this statement is not necessarily true!)

6.3.2 Second-order conditions for general problems


If a convex problem cannot be assumed, then, as in the unconstrained case, we have to use second-order information about the function at the candidate point x∗ to determine if it is indeed a local minimum. Recall that for the unconstrained problem the local sufficiency and second-order necessary conditions require the quadratic part of the Taylor series expansion for an additional investigation of the problem.
As we remember, the quadratic part had to be non-negative for all nonzero changes ∆x. In the constrained case, furthermore, any ∆x ≠ 0 satisfying the active constraints to first order must lie in the constraint tangent plane.
Such ∆x vectors are then orthogonal to the gradients of the active constraints (i.e. the constraint gradients are normal to the constraint tangent plane, see Fig. 6.8). Therefore, the dot product ∆x · ∇gj |x∗ must be zero. In general, ∇hT · ∆x = 0 and ∇gT · ∆x ≤ 0.
This discussion leads to the following second-order conditions.

Figure 6.8: The constraint gradients are normal to the constraint tangent plane (shown for a single active inequality constraint gj(x))

6.3.3 Second-order necessary conditions for general constrained problems
Let x∗ satisfy the first-order KT conditions. Define the Hessian of the Lagrange function
L at x∗ as

∇2 L = ∇2 f + (λ∗)T · ∇2 h + (ω∗)T · ∇2 g.

Let there be nonzero feasible directions ∆x ≠ 0 satisfying the following linear systems at the point x∗: ∇hT · ∆x = 0 and ∇gT · ∆x = 0, for all active inequalities. Then, if x∗ is a local minimum point for the constrained minimization problem, it must be true that

Q = ∆xT · ∇2 L|x∗ · ∆x ≥ 0.

Note: if Q < 0 for some such ∆x, the point x∗ cannot be a local minimum!

6.3.4 Sufficient conditions for the general constrained problem


Let x∗ satisfy the first-order KT conditions. Define the Hessian of the Lagrange function L at x∗ as ∇2 L = ∇2 f + (λ∗)T · ∇2 h + (ω∗)T · ∇2 g. Define nonzero feasible directions ∆x ≠ 0 as solutions of the linear system ∇hT · ∆x = 0 and ∇gT · ∆x = 0 for active inequality constraints with ω∗j > 0. Also, let ∇gT · ∆x ≤ 0 for those active constraints with ω∗j = 0.
If Q = ∆xT ∇2 L|x∗ ∆x > 0 holds for all such ∆x, then x∗ is an isolated local minimum, i.e. there are no other local minima in the neighborhood of x∗.
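For the small example above (f(x) = (x1 − 1.5)² + (x2 − 1.5)² with g1(x) = x1 + x2 − 2 ≤ 0 at x∗ = (1, 1)), this check can be sketched as follows; since g1 is linear, ∇²L reduces to ∇²f, and the tangent directions are ∆x = t(1, −1). The helper names are assumptions for illustration.

```python
# Sketch: evaluating Q = dx^T * HessianL * dx for the example
# min (x1-1.5)^2 + (x2-1.5)^2  s.t.  x1 + x2 - 2 <= 0  at x* = (1, 1).
# g1 is linear, so HessianL = Hessian_f = [[2, 0], [0, 2]]; feasible
# directions satisfy grad_g . dx = dx1 + dx2 = 0, i.e. dx = t * (1, -1).

def q_form(dx1, dx2):
    """Q = dx^T H dx with H = [[2, 0], [0, 2]]."""
    return 2.0 * dx1 * dx1 + 2.0 * dx2 * dx2

def q_on_tangent(t):
    """Evaluate Q along the constraint tangent direction dx = t * (1, -1)."""
    return q_form(t, -t)

print(q_on_tangent(1.0))  # 4.0 > 0: x* = (1, 1) is an isolated local minimum
```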

6.4 Evaluation of the Kuhn-Tucker formalism


1. The Kuhn-Tucker conditions solve general constrained optimization problems, in extension of the classical theorems, indirectly by providing necessary and sufficient conditions.

2. Again, these conditions in general create non-linear systems of equations in terms of the optimization (or design) variables.

3. The Kuhn-Tucker conditions allow a symbolic and analytical solution for some very specific design problems (e.g. weight minimization of trusses with displacement constraints).

4. The Kuhn-Tucker conditions provide a termination criterion for numerical optimization, besides further criteria. However, the Kuhn-Tucker conditions are much more sophisticated and a lot more powerful than these.

5. The Kuhn-Tucker conditions are applicable solely for regular points. This is a qual-
ification that can make “life more difficult”.

6. The multipliers and slack variables can be used for navigation and for getting in-
sights into a problem. In particular, ω j = 0 implies an inactive constraint, s2j = 0
implies an active constraint, s2j > 0 implies an inactive constraint and s2j < 0 im-
plies a violation of the Kuhn-Tucker conditions.

7. Of course, it must be postulated that f, g and h are C² functions (twice continuously differentiable functions).

Chapter 7

Numerical methods for optimum design

7.1 Introduction
Numerical methods for nonlinear optimization problems are needed because the indirect
methods, based upon necessary and sufficient conditions, are either too cumbersome or
not applicable at all. Nevertheless, the theoretical background of the indirect solution
techniques is extraordinarily important with respect to an understanding of the numerical
optimization methods. As repeatedly outlined, many concepts discussed in the indirect
solution of optimization problems are often reused in numerical optimization.
It must be pointed out at the beginning that the presentation and communication, and therefore the teaching, of numerical optimization methods is not an easy task. In contrast to other numerical methods, such as linear or nonlinear equation solvers, integration methods, eigenvalue solvers, etc., there is an enormous plenitude of distinct solution methods in numerical optimization. This fact makes it difficult to select the most suitable method for a given problem. In fact, one has to realize that

there is never that single outstanding numerical optimization method avail-


able that is always applicable, and that can solve each given problem with the
highest possible efficiency.

To facilitate the selection of an appropriate numerical solution method, knowledge-based systems, or expert systems, have been developed in some places that are intended to assist engineers in this selection (see, e.g., the thesis of Dr. Lehner).
For engineers who are interested in design optimization it is useful to first get an overview of the plethora of different, competing numerical optimization methods. This supports an appropriate classification and rating of the individual methods. Based upon this overview, given in terms of a block diagram, the prime rationales of the main categories of numerical optimization methods are to be exemplified. It should be clear that, due to the plethora of methods, there is no chance to discuss all existing methods in every detail. The diagram in Fig. 7.1 illustrates the main concepts used as well as their coherence in numerical optimization.
The systematic arrangement represents the personal view of the author of these lecture notes; however, it is based upon long-standing experience. (In the literature there may be different arrangements, but not always better suggestions.)
From the diagram, we can deduce the following facts and insight:

design optimization problem
- transformation methods
  - concepts: barrier function methods; penalty function methods; multiplier methods
  - strategies: hill-climbing methods for unconstrained problems
  - substrategies/optimizers: search methods, gradient methods, Newton methods, ...
- primal methods
  - concepts: methods solving the original constrained problem; methods solving approximations of the constrained problem; methods for special cases
  - strategies: hill-climbing methods for constrained problems; sequential approximation (SLP, SQP, SCP); special solvers (quadratic solvers, geometric programming solvers)
  - substrategies/optimizers: search methods, gradient methods, Newton methods, ...

Figure 7.1: A classification of numerical optimization methods

1. Numerical methods for optimum design are conceptually different from indirect
(analytical) methods described in the previous chapters: They work in a direct fash-
ion using an iterative process, no matter if an unconstrained or a constrained ap-
proach is applied.

2. The left side of the classification (transformation methods) provides methods in which the original constrained problem, i.e. the standard minimization problem

   min_x { f(x) | h(x) = 0, g(x) ≤ 0 },

is transformed into an unconstrained surrogate problem, i.e. a sequence of unconstrained subproblems that, hopefully, converge to the solution of the original constrained problem. (Note, further, that numerous sophisticated, and sometimes very efficient, solution techniques are available because many scientists deal and have dealt with unconstrained optimization strategies. This is not a surprise: Unconstrained problems are more elementary than constrained problems. The good news is that the work undertaken in solving such unconstrained problems was not done in vain because it also advances the solution of constrained problems!)

3. The right side of the classification (primal methods) provides methods that incorpo-
rate the constraints directly. Hereby, different subconcepts can be identified with re-
spect to the constraints. The first subconcept leaves the original constraints unaltered.
The second subconcept—in analogy to the transformation methods—transfers the
problem with constraints into a sequence of subproblems using specifically approx-
imated constraints, such as linearized constraints, quadratically approximated con-
straints or convex constraints. The third subconcept is limited to very specific non-
linear and constrained problems, for example pure quadratic or so-called geometric
optimization problems where polynomials (sums of products) form the objective
function and the constraints. The latter problems are, however, more of interest for
mathematicians and are of minor practical relevance in engineering.

select an initial solution x(0) with k = 0
while (termination criterion is not satisfied):
    determine an appropriate search direction s(k)
    determine an appropriate step size α(k) (for s(k))
    create a new solution x(k+1) = x(k) + α(k) s(k)
    check the objective function first?
        yes: compute the objective function; if it is improved, compute the constraints;
             if these are satisfied, accept x(k+1), otherwise reject x(k+1)
        no:  compute the constraints; if they are satisfied, compute the objective function;
             if it is improved, accept x(k+1), otherwise reject x(k+1)
    k = k + 1
output results

Figure 7.2: Graphical illustration of the basic optimization process used in all numerical optimization methods

7.2 Basic concepts related to the implementation of algorithms
All numerical methods introduced above, albeit different in their logic, do have in com-
mon the same iterative process in which it is attempted to improve the objective function
step by step. If there are constraints the improvement has to take into account potential
violations; if there are no constraints, the check of constraints is omitted. The iterative
process is continued until no further moves (steps) are possible and the “optimality conditions” or a suitable termination criterion is satisfied.
As a consequence, all methods of numerical optimization are based on the basic iterative prescription shown in the Nassi-Shneiderman diagram in Fig. 7.2. Based on this structure diagram, the iteration process will now be discussed from a conceptual point of view, considering the basic ideas independent of any particular method.

7.2.1 Determination of the search direction s(k)


In the case of minimization (our standard case) it has to be assured that f (x(k+1) ) ≤
f (x(k) ), or f (x(k+1) )− f (x(k) ) ≤ 0. This statement expresses, in a mathematical sense, that
the objective function has to be reduced if we would like to find a better successor x(k+1)
to the current point x(k) within the optimization space. In a 2D-space we can illustrate this
requirement as shown in Fig. 7.3.
Substituting x(k+1) by x(k) + α(k) s(k) gives f(x(k) + α(k) s(k)) ≤ f(x(k)), or f(x(k) + α(k) s(k)) − f(x(k)) ≤ 0, where s(k) represents the desirable direction of design change, also called the direction of descent, and the quantity α(k) is a positive scalar called the step size.

Figure 7.3: The search direction in a 2D-space

The process of computing the change in design, ∆x(k) = α(k) s(k), is therefore composed of two parts (see Fig. 7.3):

1. the direction finding subproblem and


2. the step length determination subproblem (scaling along the s(k) direction).

In general, there are three distinct methodologies that can be used to find solutions for
the direction finding subproblem. The individual methodology depends on the order of
information with respect to the objective function and constraints, incorporated into the
computation of the s(k) direction.

7.2.2 Search methods


If neither first derivatives nor second derivatives of the objective function or constraints are consulted then “only” zero-order information is evaluated, i.e. simple function calls of the function f(x) and/or h(x) = 0 as well as g(x) ≤ 0 are required. Furthermore, in this case neither the optimization criterion nor the constraints need to be functions in the classical mathematical sense: They may also have an algorithmic representation!
Methods based on zero-order information are called search methods in which plausible
directions of descent have to be selected. It is not surprising that there are numerous ways
and means to find reasonable directions of descent—in the last resort, by simple trial and
error approaches. To give some examples only a few of the known approaches will be
illustrated.
The so-called successive variation method determines the desirable direction by using
the individual directions of the optimization variables x1 , x2 , ..., xn . Thus, in a 2D-space
we have
α(k) s(k) = α∗1 e1 + α∗2 e2 = ∆x(k) ,

Figure 7.4: Search vector based on a linear combination of the unit vectors

Figure 7.5: An example of the Rosenbrock method using rotated search directions

where α∗1 and α∗2 are the optimal step sizes in the directions of the unit vectors e1 and e2. Fig. 7.4 shows the solution process schematically for an unconstrained problem. As we can see, α∗1 and α∗2 in the step from (k) to (k + 1) are determined such that the minimum in each direction ek, k = 1, 2, is found (carried out by a line search method).
The Rosenbrock method is an improvement of the successive variation method. Success
and failures are identified and weighted by means of factors. This creates a rotation of
the search directions as illustrated in Fig. 7.5, resulting in an acceleration of the search
process.
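A minimal sketch of the successive variation idea, using exact line searches along the coordinate directions; the quadratic test function (the same one as in Section 7.5.1) and the cycle count are illustrative assumptions.

```python
# Sketch of the successive variation (coordinate search) method: in each
# cycle the optimal step along each unit direction e_i is found by an exact
# 1D line search. The test function is an assumption for illustration.

def f(x1, x2):
    return 3 * x1**2 + 2 * x1 * x2 + 2 * x2**2 + 7

def coordinate_search(x1, x2, cycles=50):
    for _ in range(cycles):
        # exact line search along e1: df/dx1 = 6*x1 + 2*x2 = 0
        x1 = -2 * x2 / 6
        # exact line search along e2: df/dx2 = 2*x1 + 4*x2 = 0
        x2 = -2 * x1 / 4
    return x1, x2

x1, x2 = coordinate_search(1.0, 2.0)
print(x1, x2)  # converges toward the unconstrained minimum (0, 0), f = 7
```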
In the simplex method (not to be confused with the Simplex algorithm in linear optimization) a domain-oriented solution is found, while the above two methods are path-oriented methods. In path-oriented methods, according to their designation, a single path creates the solution points. By contrast, in a domain-oriented method, due to specified rules, a subregion (in the feasible domain) of the optimization space is defined. Fig. 7.6 shows the schema which is applied in the simplex method. A triangle spans the subregion, and the worst point is eliminated in the consecutive steps through reflection.

Figure 7.6: The simplex search method

It must be mentioned that specific measures have to be undertaken if constraints have to be accounted for, for example, if none of the transformation methods is used. In those cases, it may happen that the solution points are driven out of the feasible domain, as shown in Fig. 7.7. Then a “restoration process” (not described here in detail) has to provide an appropriate projection back into the feasible domain.

Figure 7.7: Solution points can be driven out of the feasible domain in the simplex method

7.3 Gradient methods


In the methods discussed above it could always be assumed that a search direction in the
design space is known, according to the logic used in the method or prescribed rules. If
the question of determining the search direction (by means of a separate computation)
arises, then first order and, for the Newton methods discussed subsequently, second order
information of the objective function is required to find a direction in which the values of
the objective function are reduced. This leads us to gradient methods and their specializa-
tions, in particular to the so-called conjugate gradient method.

Figure 7.8: Example of the steepest descent method

The derivation of gradient methods can be demonstrated very rapidly if we make use
of the Taylor series expansion that we have already considered thoroughly. Using our
fundamental inequality f (x(k) + α(k) s(k) ) ≤ f (x(k) ) we can approximate the left-hand side
of the above “minimization condition” for numerical methods by the linear Taylor series
expansion about the point x(k) such that
f (x(k) ) + ∆x(k) T · ∇ f |x(k) + · · · ≤ f (x(k) ),

where ∇ f |x(k) is the gradient of f (x) at the point x(k) and the small terms of higher order
have been neglected. Also, ∆x(k) = α(k) s(k) . Subtracting f (x(k) ) from both sides of the
inequality gives
∆x(k) T · ∇ f |x(k) = α(k) s(k) T · ∇ f |x(k) ≤ 0.

Since α(k) > 0, it may be dropped from the inequality. Also, since ∇ f |x(k) is a known
quantity (the gradient of the objective function at x(k) ), the search direction must be com-
puted to satisfy the inequality such that
s(k) T · ∇ f |x(k) = s(k) T grad f |x(k) ≤ 0.

Geometrically, the inequality shows that the angle between the vectors s(k) and ∇ f |x(k)
must lie between 90◦ and 270◦ . In other words, any small movement in such a direction
must decrease the objective function. Furthermore, we can postulate that s(k) is chosen proportional to −∇f|x(k). The descent direction according to this gradient-proportional choice is also called the downhill direction, which we should travel along to find our minimum solution. As one can imagine, due to the wide range of the “downhill” angles (90◦...270◦)
there are several methods available which compute the downhill direction differently.
The steepest descent method is the simplest, the oldest and probably the best known nu-
merical method for unconstrained optimization. (Cauchy, 1847, already introduced it even
before computers were invented.) In this method, exactly the negative gradient represents
the downhill direction, i.e. s(k) = −∇ f |x(k) . The graphical representation in Fig. 7.8 shows
the iteration history for a 2D example. Note that a large number of iterations may be re-
quired.
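The steepest descent iteration can be sketched as follows for a quadratic objective f(x) = ½ xᵀAx, for which the exact line-search step has the closed form α = (g·g)/(g·Ag); the matrix and starting point are illustrative assumptions.

```python
# Sketch of the steepest descent method, s(k) = -grad f, with an exact line
# search for a quadratic objective f(x) = 0.5 * x^T A x (gradient g = A x).
# The matrix A and the starting point are assumptions for illustration.

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent(A, x, iterations=100):
    for _ in range(iterations):
        g = mat_vec(A, x)               # gradient of 0.5 * x^T A x
        denom = dot(g, mat_vec(A, g))
        if denom == 0:
            break                       # gradient vanished: at the minimum
        alpha = dot(g, g) / denom       # exact line-search step size
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

x_star = steepest_descent([[6.0, 2.0], [2.0, 4.0]], [1.0, 2.0])
print(x_star)  # approaches the minimizer (0, 0), but only asymptotically
```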
The conjugate gradient method, due to Fletcher and Reeves (1964), is a very simple and effective modification of the steepest descent method. In the steepest descent method two consecutive steps within one iteration cycle (in the 2D example) are always orthogonal to each other. This tends to slow down the process of the steepest descent method, although the method converges. In contrast, the conjugate gradient directions are not orthogonal to each other. Rather, these directions tend to cut diagonally through the steepest descent directions. Without derivation, in the conjugate gradient method the search direction is computed as s(k) = −∇f|x(k) + β(k) s(k−1), where

β(k) = ‖∇f|x(k)‖² / ‖∇f|x(k−1)‖².
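A sketch of the Fletcher-Reeves update on the same assumed quadratic used above; note that the first step from (1, 2) reproduces the point (−3/7, 4/7) of the line-search example in Section 7.5.1, and that for an n-dimensional quadratic the method terminates in at most n steps.

```python
# Sketch of the Fletcher-Reeves conjugate gradient method,
# s(k) = -grad_f(k) + beta(k) * s(k-1) with beta(k) = |grad_f(k)|^2 / |grad_f(k-1)|^2,
# applied with exact line searches to f(x) = 0.5 * x^T A x (an assumed quadratic).

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fletcher_reeves(A, x, iterations=5):
    g = mat_vec(A, x)                   # gradient of 0.5 * x^T A x
    s = [-gi for gi in g]               # first direction: steepest descent
    for _ in range(iterations):
        denom = dot(s, mat_vec(A, s))
        if denom == 0:
            break
        alpha = -dot(g, s) / denom      # exact line search along s
        x = [xi + alpha * si for xi, si in zip(x, s)]
        g_new = mat_vec(A, x)
        beta = dot(g_new, g_new) / dot(g, g)   # Fletcher-Reeves factor
        s = [-gi + beta * si for gi, si in zip(g_new, s)]
        g = g_new
    return x

x_star = fletcher_reeves([[6.0, 2.0], [2.0, 4.0]], [1.0, 2.0])
print(x_star)  # reaches (0, 0) after two steps for this 2D quadratic
```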

Again, in the case that constrained problems are to be solved using gradient methods
specific enhancements are required. To give an example, the so-called gradient projec-
tion method by Rosen consists of a projecting mechanism by which the gradient of the
objective function is projected on the hyperplane tangent to the active set of constraints.
The vector so obtained is then the search direction. This indicates a close affinity to the
Kuhn-Tucker optimality conditions, where we have also discussed the demands on feasible directions for active constraints (see the second-order necessary and sufficient conditions, in particular the dot products ∇hT · ∆x and ∇gT · ∆x).

7.4 Newton methods


In the gradient methods, only first-order derivative information is used to determine the
direction of travel. If second-order derivatives were available, we could use them to represent the surface of the objective function more accurately, and a better direction of travel could be found. The basic idea of the Newton methods is to use the second-order Taylor
series expansion of the objective function about the current design. This gives a quadratic
expression for the change in design ∆x. We obtain
f(x + ∆x) = f(x) + ∆xT ∇f(x) + (1/2) ∆xT · ∇2 f(x) · ∆x + . . .
Again, small terms of the order greater than 2 have been dropped. The necessary condi-
tions for minimization then give an explicit computation for the direction of travel in the
design space,

∂f(x + ∆x)/∂(∆x) = ∇f(x) + H∆x = 0,

or

∆x = −H⁻¹ · ∇f(x),

where ∆x is a small change in design and H is the Hessian of the function f (x) at the
current point. Of course, it must be assumed that the Hessian matrix is nonsingular to be
able to invert H.
Using this value for ∆x, the new estimate for the design is given according to our al-
ready known equation x(k+1) = x(k) + ∆x(k) . Each iteration requires the computation of
the Hessian, which is a symmetric matrix. Nevertheless, determining H needs n(n + 1)/2
second derivatives which means considerable computational effort. To improve efficiency
Newton methods have been developed which require less computational effort. In this

context, the Davidon-Fletcher-Powell method and the BFGS (Broyden, Fletcher, Goldfarb and Shanno) update method are to be mentioned, which use approximations of the Hessian based on first-order derivatives. A “nice” property of the Newton methods is that they show so-called Q1-behavior, i.e. they find the minimum of a quadratic function in one single step. Again, if constraints come into play, additional remedies are necessary to remain in the feasible domain during optimization.
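For the quadratic function of Section 7.5.1, whose Hessian is the constant matrix [[6, 2], [2, 4]], a single Newton step can be sketched explicitly (the 2×2 inverse is written out by hand); the helper name is an assumption.

```python
# Sketch: one Newton step dx = -H^{-1} * grad_f(x) for
# f(x) = 3*x1**2 + 2*x1*x2 + 2*x2**2 + 7 (borrowed from Section 7.5.1),
# whose Hessian H = [[6, 2], [2, 4]] is constant. For a quadratic, a single
# Newton step reaches the minimum (the "Q1-behavior" mentioned above).

def newton_step(x1, x2):
    g1, g2 = 6 * x1 + 2 * x2, 2 * x1 + 4 * x2   # gradient components
    a, b, c, d = 6.0, 2.0, 2.0, 4.0             # Hessian entries
    det = a * d - b * c                          # = 20, so H is nonsingular
    # dx = -H^{-1} g, using the explicit 2x2 inverse (1/det) [[d, -b], [-c, a]]
    dx1 = -(d * g1 - b * g2) / det
    dx2 = -(-c * g1 + a * g2) / det
    return x1 + dx1, x2 + dx2

print(newton_step(1.0, 2.0))  # reaches the minimizer (0.0, 0.0) in one step
```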

7.5 Determination of the step size α(k)


As discussed earlier, the problem of obtaining the change in design ∆x is usually decom-
posed into two parts: the direction finding and the step size determination problems. The
step size determination is often called the one dimensional search or line search prob-
lem. This special problem is inherent in nearly all numerical optimization problems and,
therefore, extremely important with respect to efficiency and convergence.
To see how line search will be used in multidimensional problems, we assume for the moment that a desirable direction of change ∆x(k) = α(k) s(k) has already been found. Thus we can write f(x(k+1)) = f(x(k) + α(k) s(k)). Now, since s(k) is known, the right-hand side becomes a function of the scalar parameter α(k) = α only, and we can rewrite f(x(k+1)) = f(x(k) + αs(k)) = Φ(α), where Φ(α) is the new function with α as the only independent variable. Note that at α = 0, Φ(0) = f(x(k)).
As a consequence of this discussion, we obtain the (known) specific problem

min_α { Φ(α) }.

Since this line search problem is a fundamental part of almost all optimization problems, a numerical example will be considered in which the one-dimensional minimization aspect is presented in detail. The consideration of a concrete example may open the view for the many other numerical line searches available in numerical optimization.

7.5.1 Example
Let a direction of change for the function f(x) = 3x1² + 2x1x2 + 2x2² + 7 at the point (1, 2) be given as (−1, −1). Therefore,

x(k) = [1, 2]T and s(k) = [−1, −1]T.

To compute the step size such that min_α {Φ(α)} and min_x { f(x)} hold, we first check to see if s(k) is a direction of descent. Hence

∇f(x)|x(k) = ∇f|(1,2) = [6x1 + 2x2, 2x1 + 4x2]T = [10, 10]T.

With this we get

s(k)T · ∇f|(1,2) = (−1, −1) · [10, 10]T = −20 < 0.

Figure 7.9: The Armijo test for convergence: a step size α is accepted if Φ(α) ≤ Φ(0) + ε Φ′(0) α, with 0 < ε < 1 (recommended: ε ≈ 0.2 ... 0.5); values above the downward directed secant ε Φ′(0) α fail the test.

Therefore, s(k) = (−1, −1)T is indeed a direction of descent. Then, the new point x(k+1) = x(k) + αs(k) is given as

x(k+1) = [1, 2]T + α [−1, −1]T,

or, in component form, x1(k+1) = 1 − α, x2(k+1) = 2 − α. Substituting these expressions into the objective function at the point x(k+1) we have

f(x(k+1)) = 3(1 − α)² + 2(1 − α)(2 − α) + 2(2 − α)² + 7 = 7α² − 20α + 22 = Φ(α).

According to the necessary conditions we demand that dΦ/dα = 0, i.e. 14α∗ − 20 = 0 and α∗ = 10/7. (Note that d²Φ/dα² = 14 > 0.) As a result, we obtain

x(k+1) = [1, 2]T + (10/7) [−1, −1]T = [−3/7, 4/7]T.
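The worked example can be checked numerically; nothing here is new, the snippet merely re-evaluates Φ(α) and the resulting point.

```python
# Sketch: verifying the worked line-search example numerically.
# Phi(alpha) = f(x(k) + alpha * s(k)) with x(k) = (1, 2), s(k) = (-1, -1).

def f(x1, x2):
    return 3 * x1**2 + 2 * x1 * x2 + 2 * x2**2 + 7

def phi(alpha):
    return f(1 - alpha, 2 - alpha)   # expands to 7*alpha**2 - 20*alpha + 22

alpha_star = 20 / 14                 # from dPhi/dalpha = 14*alpha - 20 = 0
x_new = (1 - alpha_star, 2 - alpha_star)
print(alpha_star)  # 10/7
print(x_new)       # (-3/7, 4/7)
```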

In the above example, it was possible to obtain an explicit form of the function f(x(k+1)) = Φ(α) and to use the conventional necessary (and sufficient) conditions for computing the desired step size α∗. For many problems, however, such an explicit representation for Φ(α) is not available. Moreover, even if the function Φ(α) were known, it may be too complicated to allow an analytical solution. Therefore, a numerical method must be used to find an α∗ value that minimizes the function f(x) in the direction s(k). This method is designated as the numerical line search process, which is iterative in itself.
For engineers solving structural optimization problems it is extremely important to know that the accuracy of the line search crucially governs the convergence of all optimization processes based upon the search and gradient methodologies. An exception is optimization using the Newton methodology, where a roughly estimated line search is sufficient; in this case, only the Armijo test needs to be checked, which means that the total process is convergent when the function values remain below the downward directed secant at the beginning of an interval (see Fig. 7.9).
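As an illustration, the Armijo test can be combined with simple step halving into a backtracking line search. The following sketch applies it to the example function f(x) = 3x1² + 2x1x2 + 2x2² + 7; the function names and the parameter values ε = 0.3 and halving factor 0.5 are illustrative choices, not prescribed by the text.

```python
import numpy as np

def armijo_backtracking(f, grad_f, x, s, eps=0.3, alpha=1.0, beta=0.5, max_halvings=50):
    """Halve alpha until the Armijo test
    Phi(alpha) <= Phi(0) + eps * Phi'(0) * alpha is satisfied."""
    phi0 = f(x)
    slope0 = float(grad_f(x) @ s)   # Phi'(0) = s^T grad f(x), must be < 0
    for _ in range(max_halvings):
        if f(x + alpha * s) <= phi0 + eps * slope0 * alpha:
            return alpha
        alpha *= beta
    return alpha

# example data from Section 7.5.1
f = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
grad_f = lambda x: np.array([6*x[0] + 2*x[1], 2*x[0] + 4*x[1]])
alpha = armijo_backtracking(f, grad_f, np.array([1.0, 2.0]), np.array([-1.0, -1.0]))
```

Here the very first trial α = 1 already satisfies the test, since Φ(1) = 9 lies below the secant value 22 − 0.3 · 20 = 16.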
Focusing on the line search techniques for the search and gradient methodologies, it is not astonishing that there are, again, numerous alternative line searches at work in the various software packages available at the individual computer centers. Therefore, as at the beginning of studying numerical optimization methods, an overview of the competing line searches is very important. The chart in Fig. 7.10 shows the main concepts.
Figure 7.10: The main line search techniques: the interval search, the elimination approach, and the interpolation approach (quadratic interpolation with one-, two-, or three-point patterns; cubic interpolation with a two-point pattern).

Figure 7.11: An example of a unimodal function (unimodal in the interval [a,b]) and a partially unimodal function (unimodal in the two intervals [a,b] and [c,d]).

In all cases, we must make some basic assumptions on the form of the line search function to compute the step size numerically. For example, it has to be taken for granted that a

minimum exists and it is unique in the interval of interest. A function with this property
is called a unimodal function. The two examples in Fig. 7.11 demonstrate this situation.
Most line searches work only for unimodal functions. This may appear to be a severe restriction; however, it is not. For functions that are not unimodal, we can think of locating only a local minimum point that is closest to the starting point. The search problem then is to find an interval in which the function f(α) attains its minimum value. This can be carried out only in a numerical sense by determining the interval in which the minimum may lie, i.e. by determining some lower and upper bounds αl and αu for α∗. The interval [αl, αu] is called the interval of uncertainty. This interval is reduced iteratively until it is smaller than a specified small positive number ε, the desired accuracy for locating the minimum.
According to this consideration, two phases can be identified. In phase one, the location
of the minimum point is bracketed and the initial interval of uncertainty is established. In
the second phase, the interval of uncertainty is refined by eliminating regions that cannot
contain the minimum.

7.5.2 Interval search


A very simple-minded approach (which, with respect to the global minimum, is however often the only way out) is to successively reduce the interval of uncertainty to a small size by means of equally spaced intervals (equal interval search). The idea is elementary: in a given domain αl ≤ α ≤ αu the function f(α) is evaluated at equidistantly arranged points. Then, the globally smallest value determines the interval which

is used to compute the minimum value more precisely (see Fig. 7.12).

Figure 7.12: Using equidistant intervals in a line search.

Figure 7.13: The golden ratio used in a line search (the new point d divides the larger part of the interval L = c − a so that the segments measure 0.382L and 0.618L).
Since only zero order information is used, the equal interval search is a “search method”
according to our general classification of numerical optimization methods.
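A minimal sketch of the equal interval search, applied to the line search function Φ(α) = 7α² − 20α + 22 from the example above; the number of sampling points n and the search domain [0, 3] are arbitrary illustrative choices.

```python
def equal_interval_search(f, a, b, n=1000):
    """Sample f at n+1 equidistant points in [a, b] and return the small
    subinterval around the globally smallest sampled value."""
    h = (b - a) / n
    points = [a + i*h for i in range(n + 1)]
    k = min(range(n + 1), key=lambda i: f(points[i]))
    return points[max(k - 1, 0)], points[min(k + 1, n)]

phi = lambda a: 7*a**2 - 20*a + 22      # line search function of the example
lo, hi = equal_interval_search(phi, 0.0, 3.0)   # brackets alpha* = 10/7
```

Since only function values enter, this really is a pure zero order search; its cost grows linearly with the required resolution, which motivates the elimination and interpolation methods below.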

7.5.3 Elimination methods

Elimination methods vary the increment of each step, which may be more efficient in many cases. Bracketing the relevant minimum is very quick if a unimodal function can be assumed. As an example of the elimination approach, the Golden Section Search is considered here. In this method, the interval of interest is partitioned using the golden ratio (see Fig. 7.13). Assume the three points a, b and c are given. Then the next step is to position a fourth point d in the larger of the two intervals (a, b) or (b, c). It can be shown that if the larger interval is divided by d into two segments that have the ratio of the golden section, this gives the fastest convergence if no other assumptions about f(α) are made. This approach is carried out repeatedly until a convergence criterion is reached.
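The golden section iteration described above can be sketched as follows; the bracketing interval [0, 3] and the tolerance are illustrative choices. At each step one interior point is reused, so only one new function evaluation is needed per iteration.

```python
import math

def golden_section(f, a, c, tol=1e-6):
    """Shrink [a, c] by the golden ratio until it is smaller than tol."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # 0.618...
    b = c - inv_phi * (c - a)                # interior point at 0.382 L
    d = a + inv_phi * (c - a)                # interior point at 0.618 L
    fb, fd = f(b), f(d)
    while (c - a) > tol:
        if fb < fd:            # minimum lies in [a, d]: discard (d, c)
            c, d, fd = d, b, fb
            b = c - inv_phi * (c - a)
            fb = f(b)
        else:                  # minimum lies in [b, c]: discard (a, b)
            a, b, fb = b, d, fd
            d = a + inv_phi * (c - a)
            fd = f(d)
    return 0.5 * (a + c)

phi = lambda a: 7*a**2 - 20*a + 22
alpha_star = golden_section(phi, 0.0, 3.0)   # close to 10/7
```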

Figure 7.14: An example of the one point pattern (Newton-Raphson iteration): the quadratic approximation q(α) through the basic point α1 yields an estimated α∗ near the true minimum of the given f(α).

7.5.4 Interpolation methods


The zero order methods described above can require too many function evaluations during a line search. Therefore, zero order methods are to be avoided, if possible. Instead of evaluating a function at numerous trial steps, an approximating curve can be passed through a limited number of points that lie on the original curve f(α). As we know, any continuous function on a given interval can be approximated as closely as desired by fitting a polynomial of sufficiently high order. Then the minimum point of this approximating polynomial can be computed. Since the polynomial is known in all details, the minimum point of the approximation can be found in terms of explicit formulas. Often, the first approximated minimum point is already a good estimate of the exact minimum of the line search function f(α). As shown in the graphical overview above, there are various alternatives for constructing the approximating polynomial, denoted in the following as q(α):

1. quadratic curve interpolation

• one point pattern (using first and second derivatives),


• two point pattern (using first derivatives),
• three point pattern (using no derivatives),

2. cubic interpolation

• two point pattern using two function values at two points along with their
derivative results.

Due to the limitations in time, only the quadratic curve interpolation (fitting) using a one
point pattern is to be considered in more detail.

7.5.5 One point pattern (Newton-Raphson iteration)


Here, only one measurement point at each iteration is applied. Therefore, it is apparent
that the first and second derivatives of the function f (α) have to be available to find an
estimate of the minimum point in the current iteration (see Fig. 7.14).

Figure 7.15: An example of the two point pattern: the quadratic approximation q(α) over the interval of uncertainty [α1, α2] yields an estimated α∗ near the true minimum of the given f(α).

Suppose that at a point α1 (basic point) we can evaluate f (α1 ), f ′ (α1 ) and f ′′ (α1 ). It is
then possible to construct a quadratic function q(α) which agrees at α1 with the given
function f (α) up to the second derivative, according to a Taylor series expansion,

q(α) = f(α1) + f′(α1)(α − α1) + (1/2) f′′(α1)(α − α1)².

An estimate ᾱ∗ of the true minimum α∗ can now be calculated by finding the vanishing point of the derivative of q(α) with respect to α. Thus, setting 0 = q′(ᾱ∗) = f′(α1) + f′′(α1)(ᾱ∗ − α1), we obtain for the estimate

ᾱ∗ = α1 − f′(α1) / f′′(α1).

This process can be repeated at ᾱ∗ . It is apparent that the new point in the iteration of
the line search does not depend on the value f (α1 ). Therefore, the method can be viewed
as iteratively solving the equation f ′ (α) = 0. In fact, this is the well known method of
Newton-Raphson.
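The one point iteration can be sketched in a few lines. Applied to the quadratic Φ(α) = 7α² − 20α + 22 of the earlier example, it reaches α∗ = 10/7 in a single step, since the quadratic model is then exact; the starting point and tolerances are illustrative choices.

```python
def newton_line_search(dphi, d2phi, alpha, tol=1e-10, max_iter=50):
    """One point pattern: alpha <- alpha - Phi'(alpha)/Phi''(alpha),
    i.e. Newton-Raphson iteration on Phi'(alpha) = 0."""
    for _ in range(max_iter):
        step = dphi(alpha) / d2phi(alpha)
        alpha -= step
        if abs(step) < tol:
            break
    return alpha

# Phi(a) = 7a^2 - 20a + 22 from the example: Phi'(a) = 14a - 20, Phi''(a) = 14
alpha_star = newton_line_search(lambda a: 14.0*a - 20.0, lambda a: 14.0, 0.0)
```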

7.5.6 Two point pattern


The same iteration logic can be applied if two different points, say α1 and α2, are used (see Fig. 7.15). The second order term in the Taylor series expansion is then approximated by first order terms,

q(α) = f(α1) + f′(α1)(α − α1) + (1/2) · (f′(α2) − f′(α1))/(α2 − α1) · (α − α1)².

Consequently, setting q′(ᾱ∗) = 0, we obtain

ᾱ∗ = α1 − (α2 − α1)/(f′(α2) − f′(α1)) · f′(α1).

This formula assumes that the interval [α1, α2] includes the unknown minimum α∗.
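A sketch of the two point iteration, applied again to Φ′(α) = 14α − 20 so that the exact minimizer α∗ = 10/7 is reproduced; since Φ′ is linear here, a single secant step suffices. The starting points 0 and 1 are arbitrary.

```python
def secant_line_search(dphi, a1, a2, tol=1e-10, max_iter=50):
    """Two point pattern: Phi'' is replaced by the difference quotient of
    Phi' at the two most recent points (secant iteration on Phi' = 0)."""
    for _ in range(max_iter):
        denom = dphi(a2) - dphi(a1)
        if denom == 0.0:
            break
        a_new = a1 - dphi(a1) * (a2 - a1) / denom
        a1, a2 = a2, a_new
        if abs(a2 - a1) < tol:
            break
    return a2

alpha_star = secant_line_search(lambda a: 14.0*a - 20.0, 0.0, 1.0)   # 10/7
```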

Figure 7.16: An example of the three point pattern: the quadratic approximation q(α) through the three points α1, α2, α3 spanning the interval of uncertainty [α1, α3] yields an estimated α∗ near the true minimum of the given f(α).

7.5.7 Three point pattern


Without going into the details, the result for a three point pattern is briefly noted. Based
upon the function values at three different points forming an interval of uncertainty, the
estimate ᾱ∗ is computed through the formula

ᾱ∗ = (1/2) · (b23 f(α1) + b31 f(α2) + b12 f(α3)) / (a23 f(α1) + a31 f(α2) + a12 f(α3)),

where aij = αi − αj and bij = αi² − αj² (see Fig. 7.16). For example, a23 = α2 − α3 and b31 = α3² − α1².
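A sketch of one three point step using the same abbreviations aij and bij. Since the example function Φ(α) = 7α² − 20α + 22 is itself quadratic, a single step returns the exact minimizer; the three sample points are arbitrary.

```python
def three_point_step(f, a1, a2, a3):
    """Minimizer of the quadratic through (a1, f1), (a2, f2), (a3, f3),
    with a_ij = ai - aj and b_ij = ai^2 - aj^2."""
    f1, f2, f3 = f(a1), f(a2), f(a3)
    num = (a2**2 - a3**2)*f1 + (a3**2 - a1**2)*f2 + (a1**2 - a2**2)*f3
    den = (a2 - a3)*f1 + (a3 - a1)*f2 + (a1 - a2)*f3
    return 0.5 * num / den

phi = lambda a: 7*a**2 - 20*a + 22
alpha_star = three_point_step(phi, 0.0, 1.0, 3.0)   # exact for a quadratic
```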

7.6 Handling of constraints


If constraints have to be taken into account in the numerical optimization then, according to our classification diagram, there are four different concepts to handle them. The "hill climbing" methods use repair mechanisms, like restoration steps, if constraints are violated. In most cases, the search direction is also safeguarded with respect to constraints that may become violated (e.g. the Gradient Projection Method of Rosen or the Method of Feasible Directions of Zoutendijk).
Two further concepts are

• sequential approximation methods and

• methods only valid for special problems.

For our purposes in this course, there is no need to go deeper into the repair mechanisms of hill climbing methods, nor is there any reason to consider special non-linear constrained optimization methods (which may be of interest mainly to mathematicians). As a consequence, we only have to look at sequential approximation methods and transformation methods, the latter being the fourth concept for handling constraints in numerical optimization.
We start our investigation by describing the solution concepts used in the transformation methodology. (Sequential approximation is treated later on.)

7.6.1 Transformation methods
Transformation methods try to circumvent the disturbing impact of the constraints by
reformulating the original constrained problem. As we know, the idea of reformulating
a given problem into a substitute problem is not new. The Lagrange multiplier or the
Kuhn-Tucker approach are good examples for that.
The idea, common to all individual transformation methods, is easily and rapidly explained. The original problem

min_x { f(x) | hk(x) = 0, k = 1, 2, ..., p;  gj(x) ≤ 0, j = 1, 2, ..., m }

is converted into a series of unconstrained problems

min_x { Φ(x, r^(κ)) },

where (r^(κ)) = r^(1), r^(2), r^(3), ... is a monotonically decreasing sequence with r^(κ) → 0 for κ → ∞, and Φ(x, r^(κ)) = f(x) + P(h(x), g(x), x, r^(κ)) is the transformation function, also called the composite or auxiliary function. In Φ(x, r^(κ)) the original objective function f(x) is augmented by a real valued function P whose action is controlled by the constraints h(x) and g(x) and by the controlling parameters r^(κ). The form of the function P depends on the transformation used.
The basic procedure is to choose an initial estimate x^(0) and define the function Φ(x, r^(κ)). The controlling parameters r^(κ) are also initially selected. Then, the function Φ(x, r^(κ)) is minimized with respect to x, keeping r^(κ) fixed. The parameters r^(κ) are adjusted between two iterations, and the procedure described above is repeated until no further improvement is possible.

7.6.2 Sequential unconstrained minimization


This concept, also called the SUMT approach (used, by the way, in the Ansys FE pro-
gram), consists of two different types of P functions. The first one is called the Barrier
Function Method (BFM), the second one is the Penalty Function Method (PFM).
The BFM is only applicable to inequality constrained problems. As we already know
from the discussion of our practical design problems, this is not a severe restriction be-
cause pure inequality constraints form the vast majority of optimization design problems.
Popular barrier functions are

1. the Inverse Barrier Function,

P(g(x), r^(κ)) = r^(κ) Σ_{j=1}^{m} ( −1/gj(x) ),

2. the Log Barrier Function,

P(g(x), r^(κ)) = −r^(κ) Σ_{j=1}^{m} log(−gj(x)).

Figure 7.17: Using the SUMT method to minimize the objective function f(x) = x² with the constraint g(x) = −x + 2 ≤ 0: the minimizers Φ∗(0), Φ∗(1), Φ∗(2) of Φ(x, r^(κ)) for r^(0) = 1.0, r^(1) = 0.5, r^(2) = 0.25 approach the constrained minimum from the feasible region.

Both functions are called barrier functions because a large barrier is constructed around
the feasible region. In fact, both functions P become infinite if any of the inequalities
is active. Thus, when the iteration is started from a feasible point, it cannot go into the
infeasible region because the barrier cannot be crossed. In both cases, it is attempted to
converge against the unknown minimum x∗ as r(κ) → 0 and x∗ (r(κ) ) = x∗(κ) → x∗ for
κ → ∞. Correspondingly, limκ→∞ Φ(κ) → f ∗ = f (x∗ ) with g(x∗ ) ≤ 0. As an example,
see Fig. 7.17 for the minimization of the objective function f (x) = x2 with the constraint
g(x) = −x + 2 ≤ 0.
Since the solution is found only by feasible points, the BFM is also named the interior
point BFM. In those cases in which it is not easy to create a feasible initial point it is a
good idea to create a feasible point with the aid of other numerical methods (e.g. interval
search or evolution strategies).

7.6.3 Proof of concept of SUMT


Solve min_x { x1² + x2² | 1 − x1 − x2 ≤ 0 }. By conversion according to the BFM (log barrier), we obtain

min_x { Φ(x, r^(κ)) } = min_x { x1² + x2² − r^(κ) ln(−(1 − x1 − x2)) }.

For a fixed r^(κ) we omit the κ and can write Φ^(κ) = Φ(x) = x1² + x2² − r ln(−(1 − x1 − x2)). The minimum of Φ^(κ) is given by ∇Φ = 0, or

∂Φ/∂x1 = 0 = 2x1∗ − r/(x1∗ + x2∗ − 1),
∂Φ/∂x2 = 0 = 2x2∗ − r/(x1∗ + x2∗ − 1).

Since ∂Φ/∂x1 − ∂Φ/∂x2 = 2x1∗ − 2x2∗ = 0, it follows that x1∗ = x2∗. We have

2x1∗ − r/(x1∗ + x2∗ − 1) = 2x2∗ − r/(x1∗ + x2∗ − 1) = 0

and, multiplying the above equation by 2x1∗ − 1, we get 4x1∗² − 2x1∗ − r = 0. The solution of this quadratic equation is given by

x1∗ = x2∗ = (1 + √(1 + 4r)) / 4.

Figure 7.18: Solving the optimization problem min_x { x1² + x2² | 1 − x1 − x2 ≤ 0 }; the contour lines of f(x) touch the constraint boundary at x∗ = [1/2, 1/2]^T.

Now it can be demonstrated that indeed lim_{κ→∞, r→0} x1∗(r^(κ)) = (1 + √(1 + 0))/4 = 2/4 = 1/2 is the desired solution (see Fig. 7.18).
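The proof of concept can also be reproduced numerically. The sketch below minimizes the log barrier function by plain gradient descent; the starting point, step length, and iteration count are illustrative choices, not part of the text.

```python
def barrier_minimize(r, x0=(2.0, 2.0), steps=20000, lr=1e-3):
    """Interior point BFM: minimize Phi(x, r) = x1^2 + x2^2 - r*ln(x1 + x2 - 1)
    by gradient descent; the log barrier keeps the iterates feasible."""
    x1, x2 = x0
    for _ in range(steps):
        g = x1 + x2 - 1.0            # -g(x) > 0 inside the feasible region
        d1 = 2.0*x1 - r/g
        d2 = 2.0*x2 - r/g
        x1, x2 = x1 - lr*d1, x2 - lr*d2
    return x1, x2

# decreasing r drives the iterates toward the exact solution (1/2, 1/2)
for r in (1.0, 0.1, 0.01):
    x1, x2 = barrier_minimize(r)    # analytic: x1 = x2 = (1 + sqrt(1 + 4r))/4
```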
The most popular penalty function in the PFM is called the quadratic loss function, defined as

P(h(x), g(x), r^(κ)) = (1/r^(κ)) [ Σ_{k=1}^{p} (hk(x))² + Σ_{j=1}^{m} (gj⁺(x))² ],

where gj⁺(x) = max(0, gj(x)). In contrast to the BFM, the penalty function is defined so as to prescribe a high cost for violation of the constraints h(x) as well as g(x). Two facts can be identified:
1. the monotonically decreasing sequence of control parameters r^(κ) enlarges the value of P and imposes a penalty,
2. the iteration proceeds in the infeasible region because max(0, gj(x)) defines P only with respect to the infeasible domain.
Therefore, the PFM is an exterior point SUMT. Note that the PFM allows the existence of both equality and inequality constraints, which are checked for violation by means of queries, i.e. it is examined whether hk(x) is not equal to zero, or whether gj(x) is positive.
As an illustration, Fig. 7.19 shows the penalty functions for the example minx { f (x) =
2x2 | g(x) = 1 − 2x ≤ 0}.
It should not be concealed that for optimum design it may be disadvantageous to ap-
proach the unknown minimum from the infeasible domain. If the iterative process stops
in between, an illegal design is created.
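A corresponding sketch for the exterior point PFM, applied to the example min f(x) = 2x² with g(x) = 1 − 2x ≤ 0 from Fig. 7.19; again, the step length and iteration count are arbitrary illustrative choices.

```python
def penalty_minimize(r, x0=0.0, steps=20000, lr=1e-3):
    """Exterior point PFM for min f(x) = 2x^2 s.t. g(x) = 1 - 2x <= 0:
    Phi(x, r) = 2x^2 + (1/r)*max(0, 1 - 2x)^2, minimized by gradient descent."""
    x = x0
    for _ in range(steps):
        gplus = max(0.0, 1.0 - 2.0*x)
        dphi = 4.0*x - (4.0/r)*gplus     # derivative of the penalized function
        x -= lr * dphi
    return x

# the iterates approach x* = 1/2 from the infeasible side as r decreases
for r in (1.0, 0.1, 0.01):
    x = penalty_minimize(r)              # analytic: x(r) = 1/(r + 2)
```

Note that every intermediate x(r) violates the constraint slightly, which illustrates the warning above: stopping early leaves an illegal design.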

Figure 7.19: Penalty functions Φ(x, r^(κ)) for r^(0) = 1.0, r^(1) = 0.5, r^(2) = 0.05 for the example min_x { f(x) = 2x² | g(x) = 1 − 2x ≤ 0 }; the minimizers approach the constrained solution from the infeasible region.

7.6.4 Multiplier methods


A serious weakness of the above two methods may be that they converge only slowly
against the final solution. In particular, this can happen if the Hessian matrix of the trans-
formation function Φ(κ) , i.e. H(κ) = ∇2 Φ(κ) = ∇2 Φ(x, r(κ) ) becomes ill-conditioned as
κ → ∞ and r^(κ) → 0. The convergence can be determined in terms of the condition number

cond = λmax / λmin,
where λmax and λmin are the largest and smallest eigenvalues of the Hessian H(κ) , respec-
tively. If the condition number goes to infinity as r → 0, the convergence is slow or even
unstable.
To alleviate such convergence difficulties the multiplier or augmented Lagrangian meth-
ods have been developed, in particular in the years between 1970 and 1980. In these
methods there is no need for controlling the convergence of the process exclusively by
means of the r(κ) parameters. There are further parameters (the Lagrangian parameters)
besides the r^(κ) parameters to improve the convergence and to lessen the SUMT weaknesses. This quality improvement is achieved by constructing the transformation function Φ^(κ) differently from SUMT. Instead of the objective function f(x) of the problem min_x { f(x) | h(x) = 0, g(x) ≤ 0 }, the Lagrange function

L(x, λ, ω, s²) = f(x) + λ^T h(x) + ω^T (g(x) + s²)

is taken as the basis for the augmentation. Then, the penalty function used in the PFM is added to the Lagrange function. Thus, we form the transformation function from the two functions L and P as

Φ(x, r^(κ)) = L(x, λ, ω, s²) + P(h(x), g(x), r^(κ)) = L + P^(κ),
where again

P^(κ) = (1/r^(κ)) [ Σ_{k=1}^{p} (hk(x))² + Σ_{j=1}^{m} (gj⁺(x))² ].

Through the integration of the Lagrange multipliers λ and ω, the "destructive effect" of the parameter r^(κ) can be compensated in ill-conditioned cases. It is to be mentioned that the multiplier methods solve a saddle point problem by two nested subproblems: minimizing Φ^(κ) over x for fixed multipliers, and maximizing the resulting dual function over the multipliers. They are, therefore, called dual methods.
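As a minimal sketch of the multiplier idea, consider the equality constrained variant min x1² + x2² subject to h(x) = x1 + x2 − 1 = 0. The inner minimization of Φ is done analytically here by exploiting symmetry, and the multiplier update λ ← λ + (2/r)h follows from differentiating the quadratic loss term; this update rule is the standard one, not quoted from the text.

```python
def augmented_lagrangian():
    """Multiplier method sketch for min x1^2 + x2^2 s.t. h(x) = x1 + x2 - 1 = 0.
    Phi = f + lam*h + (1/r)*h^2; by symmetry x1 = x2 = x, and the inner
    minimization over x can be done in closed form."""
    lam, r = 0.0, 1.0
    x = 0.0
    for _ in range(30):
        # stationarity of Phi: 2x + lam + (2/r)*(2x - 1) = 0
        x = (2.0/r - lam) / (2.0 + 4.0/r)
        h = 2.0*x - 1.0
        lam += (2.0/r) * h           # multiplier update
    return x, lam

x, lam = augmented_lagrangian()      # x -> 1/2, lam -> -1 (exact multiplier)
```

Notice that r stays fixed at a moderate value; the multiplier updates alone drive the constraint violation to zero, which is exactly how the ill-conditioning of pure SUMT is avoided.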

7.7 Sequential approximation methods


The fact that a desired solution of an optimization problem can be found by solving a sequence of adequate subproblems, as demonstrated by the transformation methods, is also exploited by the sequential approximation methods. Since, in this case, the constraints, or rather approximations of the constraints, enter the solution concept directly, these methods belong to the so-called primal methods. As already outlined when the numerical optimization methods were classified, the following three sequential approximation methods are most popular:

1. Sequential linear programming (SLP),

2. Sequential quadratic programming (SQP),

3. Sequential convex programming (SCP).

A short description of the above three methods is sufficient for our purposes.

7.7.1 SLP
In this case, the objective function and the constraints, assuming only inequality constraints, are repeatedly linearized at the current point x^(k). Hence, the following type of subproblem has to be considered:

f(x) ↦ f(x^(k)) + Δx^(k)T · ∇f|x^(k),  (7.1)

gj(x) ↦ gj(x^(k)) + Δx^(k)T · ∇gj|x^(k), ∀j.  (7.2)

7.7.2 SQP
Here, the objective function is approximated quadratically, i.e.

f(x) ↦ f(x^(k)) + Δx^(k)T · ∇f|x^(k) + (1/2) Δx^(k)T · ∇²f|x^(k) · Δx^(k),

while the constraints are linearized in the same fashion as in SLP.

7.7.3 SCP
In this method, the advantages of convex optimization are used. As we already know, the property of convexity allows us to find globally optimal solutions. Consequently, it is a natural idea to construct the individual subproblems in the sequential solution process such that each subproblem is convex. It turns out that a specific Taylor series expansion creates the desired convexity (Fleury has influenced the SCP tremendously). The series expansion of the objective function as well as of the constraints contains only first order partial derivatives, but they are introduced depending on their algebraic sign.

Accordingly, for the objective function we get

f(x) ↦ f(x^(k)) + Σ_{i=1}^{n1} (∂f/∂xi)⁺|x^(k) · (xi − xi^(k)) − Σ_{i=1}^{n2} (∂f/∂xi)⁻|x^(k) · (xi^(k))² (1/xi − 1/xi^(k)).

In the above formula, the + and − superscripts indicate where the respective gradient is
positive (+) or negative (−). If the gradient is negative the first derivative is carried out
with respect to the reciprocal 1/xi of the optimization variable xi (that’s the trick by which
convexity can be created). In a similar fashion, the constraints are established, i.e.
gj(x) ↦ gj(x^(k)) + Σ_{i=1}^{m1} (∂gj/∂xi)⁺|x^(k) · (xi − xi^(k)) − Σ_{i=1}^{m2} (∂gj/∂xi)⁻|x^(k) · (xi^(k))² (1/xi − 1/xi^(k)).

Note that all sequential approximation methods are available in the optimization library
offered by the Institute of Computational Engineering.

7.8 On the usage of termination criteria


A severe problem associated with the iterative nature of the numerical optimization process is the fact that the exact number of repetitions of the optimization loop is unknown. This is a direct consequence of the nonlinearity of the optimization problem. The accuracy of the solution, or the number of repetitions, can therefore be controlled only indirectly by means of secondary indicators. It is plausible that, again, there are several ways and means of evaluating indicators, resulting in various termination criteria.
In most cases, the difference between two consecutive values of the objective function is taken as the measuring stick for terminating the optimization process. Here, two distinct criteria can be applied, depending on the magnitude of the objective function values. If these values are large, then a so-called relative termination criterion is a good choice. Assuming minimization, we have

| f^(k) − f^(k+1) | / | f^(k) | ≤ ε1,  0 < ε1 ≪ 1.

By contrast, if the values of f(x∗) are expected to be around zero, then the absolute termination criterion is of benefit,

| f^(k) − f^(k+1) | ≤ ε2,  0 < ε2 ≪ 1.
Also, the individual solution points (vectors) found during the optimization process can
be compared with each other and then used to stop the iterative process. In particular,
in cases where the minimum represents a point on a plateau (see Fig. 7.20, left) it may
be justified to stop the iteration loop after a while if the changes between consecutive
solution vectors become less and less. This leads us to the so-called relative and absolute
coordinate criterion of termination,

|x^(k+1) − x^(k)| / |x^(k)| ≤ ε3,  0 < ε3 ≪ 1,

and, if |x^(k)| is approaching zero,

|x^(k+1) − x^(k)| ≤ ε4,  0 < ε4 ≪ 1.
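The four criteria can be combined into a small helper function. The switch between the relative and absolute variant based on whether |f^(k)| and |x^(k)| exceed 1 is one possible convention, not prescribed by the text.

```python
def converged(f_old, f_new, x_old, x_new, eps=1e-8):
    """Combine the objective and coordinate termination criteria, choosing
    the relative form for large magnitudes and the absolute form near zero."""
    df = abs(f_old - f_new)
    obj_ok = df <= eps * abs(f_old) if abs(f_old) > 1.0 else df <= eps
    dx = sum((a - b) ** 2 for a, b in zip(x_old, x_new)) ** 0.5
    nx = sum(a * a for a in x_old) ** 0.5
    coord_ok = dx <= eps * nx if nx > 1.0 else dx <= eps
    return obj_ok and coord_ok
```

In practice such a check is combined with a hard cap on the iteration count, as discussed below.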

Figure 7.20: A "plateau" during the optimization process can hinder the search greatly (left); a minimum could lie in a dell of a plateau (right).

Note that it may be advisable, sometimes, to compare solution vectors using a certain lag,
e.g. x(k+r) and x(k) , where r may be a positive integer (5, 10, or so). By this retardation,
there may be a chance to find minima that could lie in a “dell” of a plateau (see Fig.
7.20, right).
The most “brutal” method of termination is to restrict the elapsed computer time or
the maximum number of iterations. It is not a secret that this is often the only way to
avoid serious problems, in particular to prevent endless loops due to an unintentionally
ill-conditioned optimization model.
Finally, in cases where the objective function and the constraint functions show C²-continuity (i.e., if they are smooth functions in a mathematical sense and twice differentiable), the Kuhn-Tucker conditions can also be used as termination criteria. Indeed, modern software solutions often make use of them.

7.9 Evolution strategies for design optimization


At the end of the course on design optimization, a numerical optimization strategy is to be
discussed that is robust in very complicated cases, particularly in cases when the numeric
optimization strategies introduced in the previous chapters may fail. This, however, does
not mean that the evolution strategy, which in fact even provides a whole family of strate-
gies, is particularly fast. Again, convergence difficulties may occur in very complicated
cases that have to be compensated by means of adequate user interactions. Nonetheless,
the evolution strategies are the best choice in complex situations.
Examples for complicated problems are:

• multi-modal problems, i.e. problems with more than one local optimum,

• problems where the objective function or the constraints or parts of the constraints
are not differentiable. Examples are beam-like cylindrical shells (see Fig. 7.21) or
point-wise defined functions (see Fig. 7.22),

• migrating optimum points, e.g. due to temporal changes in the optimization model
(then the parameter t comes into play as well) or due to stochastic behavior of
the optimization model induced by stochastic processes (damages or deterioration
phenomena are some examples),

• discontinuous solution spaces having local optima (see Fig. 7.23),

Figure 7.21: For a structure with beam-like cylindrical shells, the moment distribution my is not differentiable.

Figure 7.22: A point-wise defined function is also not differentiable.

Figure 7.23: A discontinuous solution space can have many local optima.

Figure 7.24: The classification of optimization techniques including evolution strategies. Both main branches, the transformation methods (BFM, PFM, MM) and the primal methods (sequential approximation methods and special methods), rely on hill climbing strategies comprising search, gradient, and Newton methods; the evolution strategies are included among the search methods of both branches.

In all the cases mentioned above, the evolution strategies seem to be an appropriate choice. As a consequence, their robustness and, in connection with that, their general applicability distinguish the evolution strategies from potential competitors.
From the viewpoint of categorization, the evolution strategies are members of the hill-climbing methods, or more appropriately down-hill-climbing, since we are considering minimization as the standard formulation of an optimization process. According to our previous classification schema, the evolution strategies therefore appear in both main branches of our classification diagram, i.e. they can be included in the transformation concept as well as in the primal (constraint-oriented) concept. To clarify this consideration, the classification diagram is shown again in Fig. 7.24, with the evolution strategies included in the schema.
As demonstrated, the evolution strategies belong to the search methods. The reason is that no (first or second order) derivatives are needed, i.e. only zero order information is used.
The basic approach of the evolution strategies will be briefly explained. Evolution strate-
gies are assigned to the field of “evolutionary computation” (also denoted as “evolutionary
algorithms” as used since 1990), which encompasses four subdomains:

1. genetic algorithms (GA),

2. evolution strategies (ES),

3. evolutionary programming (EP) and

4. genetic programming (GP).

Some very short comments on EP (famous scientists are L. J. Fogel and D. B. Fogel) and GP (J. R. Koza): both directions started from the manipulation of finite state automata and computer programs, respectively, making use of the solution principles and mechanisms of biological evolution processes, in which numerous optimization mechanisms are embedded. In general, structured quantities can be considered, such as data trees, graphs, structural systems, etc. There are many similarities to GA and ES, so a further discussion is not necessary here because the main solution aspects are dealt with when the ES are explained.
The GA and the ES were developed independently of each other, but both were invented to solve optimization problems by applying the continuous adaptation and improvement of biological evolution as a paradigm for mathematical or technical optimization. Characteristic examples of biological evolution are

• dolphins, minimizing the flow resistance of their bodies in water,
• sea urchins, maximizing their feeding capabilities,
• mussels, maximizing their rigidity, and
• bats, having an optimized navigation system.

Around 1975, John Holland (Ann Arbor, USA), together with Kenneth A. De Jong, introduced the genetic algorithms. Some years previously, Ingo Rechenberg had created the first evolution strategy, the so-called (1 + 1)-ES, in Berlin in 1972, accompanied by Hans-Paul Schwefel, who essentially intensified the further developments. (Note: for the first time in engineering, the (1 + 1)-ES was evaluated in the author's Ph.D. thesis on the shape optimization of cylindrical shells in the time period 1970-1974.) Both approaches bear many analogies. In the following, particularly the evolution strategies are to be introduced, as mentioned before. The genetic algorithms are described only in order to point out the main differences between genetic algorithms and evolution strategies.
The most palpable difference between the GA and the ES consists in how the optimization
variables (from a biological point of view the “individuals of a population”) are modeled
and internally represented. In the case of genetic algorithms the individuals are repre-
sented in terms of “chains of bits”, which attempts to simulate the natural model, i.e. the
genetic code implemented in the DNA (deoxyribonucleic acid) structure of the double
helix, in the best possible fashion.
In contrast to that, the evolution strategies use real-valued vectors to represent the pheno-
type of the individuals (optimization variables). The genotype is therefore not considered.
As a consequence, the genetic operators which incorporate optimization mechanisms,
such as the mutation or recombination operator, are different. The GA have to code the
phenotype information into bits using mostly the floating point representation known from
computer science. Since bits are applied in GA directly, the combinatorial problems (e.g.
traveling salesman problem) are very appropriate for GA. On the other hand, the ES are
better suited for “parameter optimization” because no bit coding is necessary.
In the following, only the dyadic (1 + 1)-evolution strategy is examined. This is sufficient for a first introduction to the philosophy of evolution strategies. The denotation (1 + 1) means that there are two competing individuals that are subjected to a selection process in accordance with the principle of the survival of the fittest (Darwin's principle). Here, the first "1" accounts for a parent, the second "1" for a child, created from the parent by a mutation.

Figure 7.25: The Gaussian distribution with mean mi and standard deviation σi .

It should be mentioned that, in the last two decades, further strategies have been created,
such as the multi-membered ES as follows:

• (µ + λ)-ES, where λ ≥ 5...6µ,


• (µ, λ)-ES, where λ ≥ 5...6µ,
• (µ, κ, λ)-ES, where λ ≥ 5...6µ.

Here, the “+” and “,” symbols stand for the way in which the selection process is carried out; µ and λ represent the number of individuals in the parent and child generation, respectively. “+” means that the members of both generations (parents and children) are used in the selection process; “,” means that the parent generation is not considered in the selection process for determining the fittest member. In this context, the parameter κ specifies the maximum lifespan of an individual, by which the lethality principle of biology is modeled.
Now to some details of the (1 + 1)-ES. The core of the corresponding algorithm again is the governing vector equation $x^{(k+1)} = x^{(k)} + \alpha^{(k)} s^{(k)}$, known from our structure chart that has been used for describing the iterative optimization process in general. However, this vector equation is modified in the ES, and hence also in the (1 + 1)-strategy. Now we have

$$x^{(k)} + \alpha^{(k)} s^{(k)} \;\mapsto\; x^{(k)} + z^{(k)},$$

where the vector $z^{(k)}$ contains stochastic components $z_i^{(k)}$, $i = 1, 2, \dots, n$, which follow a Gaussian distribution (see Fig. 7.25). Thus,

$$\mathrm{pdf}(z_i) = \frac{1}{\sigma_i \sqrt{2\pi}}\, e^{-\frac{(z_i - m_i)^2}{2\sigma_i^2}}.$$
Hence, the individual components $z_i$ of the stochastic vector $z$ are created around the position vector $x^{(k)}$, following a Gaussian distribution for each component $z_i$. For this purpose, the mean value $m_i$ (taking the arrow head of the vector $x^{(k)}$ as the origin) is set to $m_i = 0$, so that the density simplifies to

$$\mathrm{pdf}(z_i) = \frac{1}{\sigma_i \sqrt{2\pi}}\, e^{-\frac{z_i^2}{2\sigma_i^2}}.$$

According to this $(0, \sigma_i)$-pdf, stochastic steps into the solution space are created, which can be understood as a mutation mechanism. The geometric interpretation of this mutation is as follows:

Figure 7.26: Some model functions used in the probabilistic theory of the (1 + 1)-ES (panels a–f).

The geometric location of points created by virtue of the mutation mechanism (pdf) is a hyper-ellipsoid (an n-dimensional ellipsoid of scattering, characterized by the equation $\sum_i (z_i/\sigma_i)^2 = \mathrm{const}$).

Therefore, the standard deviations $\sigma_i$, $i = 1, 2, \dots, n$, represent step lengths. Furthermore, a certain domain (a cloud) in the space around $x^{(k)}$ is encompassed where, due to the Gaussian distribution, small changes $z_i$ are created often while larger changes occur only rarely; this property is essential.
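This property of the mutation cloud can be checked with a short sketch (a minimal Python example; the number of samples and the σ value are arbitrary illustrative choices): sampling z ~ N(0, σ) confirms that roughly 68% of the mutations fall within one standard deviation, so small steps dominate.

```python
import random

random.seed(0)

# Draw Gaussian mutation components z ~ N(0, sigma); sigma is an
# illustrative choice, not a value prescribed by the text.
sigma = 1.0
samples = [random.gauss(0.0, sigma) for _ in range(100_000)]

# Small changes are frequent: about 68% of the samples lie within one
# standard deviation (the 68-95-99.7 rule of the Gaussian distribution).
within_one_sigma = sum(abs(z) <= sigma for z in samples) / len(samples)
print(round(within_one_sigma, 2))  # close to 0.68
```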
Based upon the creation $x^{(k)} + z^{(k)}$, a selection is subsequently carried out. The selection process is governed by the following simple if-then-else statement:

$$x^{(k+1)} = \begin{cases} x^{(k)} + z^{(k)}, & \text{if } f(x^{(k)} + z^{(k)}) < f(x^{(k)}) \text{ and } g_j(x^{(k)} + z^{(k)}) \ge 0 \text{ for all } j, \\ x^{(k)}, & \text{otherwise.} \end{cases}$$
Obviously, this selection provides the “fittest solution” because the next point in the iteration process, $x^{(k+1)}$, is overwritten by the successor $x^{(k)} + z^{(k)}$ if the quality of the optimization criterion $f$ at $x^{(k)} + z^{(k)}$ is better than at $x^{(k)}$, and if the constraints are fulfilled at the point $x^{(k)} + z^{(k)}$.
Note that in ES packages it is always assumed that the constraints are satisfied if they are greater than or equal to 0. (This differs from our previous view, where the feasible domains are characterized by the inequality $g_j(x) \le 0$. However, if our standard $g_j(x) \le 0$ formulation is multiplied by $-1$, the transformation to the ES standard is easy.)
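The mutation and selection mechanism described above can be sketched in a few lines (a minimal Python sketch; the objective f, the constraint g, the starting point, and the fixed σ values are illustrative assumptions, not taken from the text):

```python
import random

random.seed(1)

def f(x):
    """Illustrative objective: minimize f(x) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2

def g(x):
    """Illustrative constraint in the ES convention: feasible if g_j(x) >= 0."""
    return [x[0] - 1.0]

def step(x, sigmas):
    """One (1 + 1)-ES iteration: Gaussian mutation followed by selection."""
    z = [random.gauss(0.0, s) for s in sigmas]
    child = [xi + zi for xi, zi in zip(x, z)]
    if all(gj >= 0.0 for gj in g(child)) and f(child) < f(x):
        return child  # the child is fitter and feasible: it survives
    return x          # otherwise the parent survives

x = [3.0, 2.0]       # feasible starting point
sigmas = [0.5, 0.5]  # fixed step lengths (no adaptation in this sketch)
for _ in range(200):
    x = step(x, sigmas)
print(f(x))  # approaches the constrained optimum f = 1 at x = (1, 0)
```

Because the step lengths are held fixed here, progress stalls near the optimum; the success rule discussed below is what makes the strategy efficient in practice.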
The essence of the (1 + 1) approach can be summarized in brief:

• The search directions cover the full search space Rn , if n optimization variables are
defined,
• the step length of the Gaussian step into the space remains small in general; large
step lengths are rare, due to the Gaussian distribution.

To achieve an acceptable convergence it is necessary to adapt the standard deviations $\sigma_i$, which act as step lengths, to the present conditions of the optimization criterion (objective function) as well as of the active constraints. Rechenberg, the developer of the (1 + 1)-ES, established a (complex!) probabilistic theory for the acceleration of convergence, based on representative model functions (see the 3D representations in Fig. 7.26).
The result of this examination is surprising: it can be shown that the convergence rates for the two models “d” and “b” in Fig. 7.26 yield nearly the same values for the best adaptation

Figure 7.27: The success rate for a small (a) and a large (b) circle of probability.

of the $\sigma_i$ quantities, although the two model functions represent diametrically opposed scenarios. The model “d” (corridor model) reflects the circumstances far away from the optimum; the model “b” (spherical model) displays the situation in the vicinity of a local or global optimum.
Furthermore, it can be substantiated that the adaptation of the $\sigma_i$ values has to ensure that they are neither too small nor too large. This conclusion can be drawn immediately if the following two cases are considered (see Fig. 7.27 for the 2D case).
Assuming σi = σ = const, for all i, the geometric location of points having the same
“strike probability” is a circle of, say, radius a. Now we can state that if the radius a is too
small, the “gain of progress” is also small, although there are many successes within this
iteration step (see Fig. 7.27(a)). Otherwise, if the radius a is too large (see Fig. 7.27(b)),
the gain is likewise small because the number of successes is small, although the step size
is large! As a consequence, the appropriate step size, represented in terms of the radius $a$, must lie between these two extreme cases. From the probabilistic investigations carried out by Rechenberg’s group it follows that the convergence speed is optimal
1. in the corridor model if the rate successes/mutations, called the success probability
pdfs , is about 1/3.2,
2. in the spherical model if the success probability pdfs is about 1/5.
In fact, the two numbers 1/3.2 and 1/5 do not differ very much, although the two models represent diametrically opposed endpoints of a wide scale of possible models. In other words, the optimal convergence speed lies in a very narrow window of possibilities (comparable to the escape velocity of rockets in astronautics). If this window is left, a “deadly reaction” occurs. Due to the small difference between 1/3.2 and 1/5, it is suggested to take 1/5 as the governing value in formulating a success rule for the (1 + 1)-ES algorithm: if the success ratio is less than 1/5, the $\sigma_i$ values have to be reduced; if it is larger than 1/5, the $\sigma_i$ values have to be increased.
The reduction and amplification factors can also be specified more accurately. For the spherical model, which represents the most disadvantageous case with regard to the $\sigma_i$ changes, the theoretical examination leads to the factor
 
$$\lim_{n \to \infty} \left(1 - \frac{0.202}{n}\right)^{n} = 0.817 = \frac{1}{1.224}.$$
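This limit can be verified numerically, since $(1 - 0.202/n)^n$ tends to $e^{-0.202}$ (a small Python check):

```python
import math

# The limit of (1 - 0.202/n)^n for n -> infinity equals e^(-0.202).
factor = math.exp(-0.202)
print(round(factor, 3))        # 0.817
print(round(1.0 / factor, 3))  # 1.224

# The finite-n expression already approaches this value for moderate n:
n = 10_000
print(round((1.0 - 0.202 / n) ** n, 3))  # 0.817
```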

This gives us the following adaptation rule, which is incorporated in the (1 + 1)-algorithm:

1. After every n mutations (n being the number of optimization variables), check how many successes have been achieved during the last 10n mutations.

2. If the number of successes is less than 2n, then multiply the values σi by the factor
0.817 (reduction!).

3. If the number of successes is larger than 2n, then divide the values σi by the factor
0.817 (enlargement!).

4. Otherwise, the values of σi are optimal and there is no need to change them.
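This adaptation rule can be sketched directly (a minimal Python version; the function name is, of course, an assumption):

```python
def adapt_sigmas(sigmas, successes, n):
    """Rechenberg's 1/5 success rule, applied after checking the last 10*n
    mutations (10*n * 1/5 = 2*n is the target number of successes)."""
    if successes < 2 * n:
        return [s * 0.817 for s in sigmas]  # too few successes: reduce
    if successes > 2 * n:
        return [s / 0.817 for s in sigmas]  # too many successes: enlarge
    return sigmas  # exactly 2*n successes: the sigmas are considered optimal

# Example with n = 3 optimization variables (illustrative sigma values):
print(adapt_sigmas([1.0, 1.0, 1.0], successes=4, n=3))  # reduced: 4 < 6
print(adapt_sigmas([1.0, 1.0, 1.0], successes=8, n=3))  # enlarged: 8 > 6
```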

Obviously, this simple if-then-else construct can be easily implemented in programs to guide the adaptation of the search in the solution space. However, these simple rules do not regulate the relationship between the individual $\sigma_i$ values, i.e. all $\sigma_i$ are enlarged or reduced by exactly the same common factor. If a scaling is desired, however, different settings of the $\sigma_i$ are required. This can be managed by entering a different $\sigma_i$ value for each optimization variable $x_i$ at the beginning. Of course, the relationships between the $\sigma_i$ values are then kept during the entire optimization run. In advanced ES, therefore, there are mechanisms by which the $\sigma_i$ quantities can be defined as variables (“strategy variables”) in addition to the original optimization variables. Also, recombination mechanisms (interchanging information between individuals on the same level), associated with the mutations, provide further flexibility and improvement.
Using the (1 + 1)-strategy in practical problems, it may happen that a continuous reduction of the $\sigma_i$ values by the factor 0.817 leads, in the worst case, to a zero value. This means that the search space is of a diminished dimension. To avoid this pathological case it must be ensured that

• $\sigma_i > \varepsilon_a$, with $0 < \varepsilon_a \ll 1$, and
• $\sigma_i > \varepsilon_b |x_i|$, with $0 < \varepsilon_b$,

where $\varepsilon_a$ and $\varepsilon_b$ are the absolute and relative accuracies of the computer processor used in the optimization. In this way, a minimum variance is maintained.
Finally, it should be mentioned that the (1 + 1)-strategy is also able to create feasible solutions if the starting point is infeasible. In this case, the primary optimization criterion is temporarily replaced by a substitute function $\hat{f}$ representing the sum of all violated constraints as follows,

$$\hat{f} = \sum_{j=1}^{m} g_j(x)\,\delta_j,$$

where

$$\delta_j = \begin{cases} -1, & \text{if } g_j(x) < 0 \text{ (infeasible)}, \\ 0, & \text{otherwise.} \end{cases}$$
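The substitute function follows the definition above directly and can be sketched as (a minimal Python version; the function name is an assumption):

```python
def substitute_objective(gs):
    """Substitute function f_hat = sum_j g_j(x) * delta_j, with delta_j = -1
    where g_j(x) < 0 (violated) and delta_j = 0 otherwise. f_hat >= 0, and
    minimizing it drives the total constraint violation to zero."""
    return sum(gj * (-1.0 if gj < 0.0 else 0.0) for gj in gs)

# Two violated constraints (g_j < 0) and one satisfied constraint:
print(substitute_objective([-2.0, -0.5, 3.0]))  # 2.5
```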

In the appendix to this manuscript, a (Fortran) subroutine is provided that demonstrates the compactness and brevity of the (1 + 1)-ES. Note that it contains a little subroutine creating Gaussian random numbers from uniformly distributed random numbers (the Box-Muller approach). The termination criteria used in the (1 + 1)-strategy are

1. the relative termination criterion

$$\frac{|f^{(k)} - f^{(k+\Delta)}|}{|f^{(k)}|} \le \varepsilon_1,$$

which compares the values of the optimization function $f(x)$ over a span of $\Delta$ iterations,

2. the corresponding absolute termination criterion

$$|f^{(k)} - f^{(k+\Delta)}| \le \varepsilon_2,$$

and, finally,

3. the elapsed computer time.
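The Box-Muller approach mentioned above, which turns uniformly distributed random numbers into Gaussian ones, can be sketched as follows (a minimal Python version, shown here instead of the Fortran original):

```python
import math
import random

random.seed(2)

def box_muller(sigma=1.0):
    """Create one N(0, sigma) random number from two uniformly
    distributed random numbers (Box-Muller transform)."""
    u1 = random.random() or 1e-12  # guard against log(0)
    u2 = random.random()
    return sigma * math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

# The sample mean and variance approach 0 and sigma^2, respectively:
samples = [box_muller() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
print(round(mean, 1), round(var, 1))  # close to 0.0 and 1.0
```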
