
Simulated Annealing

Starting with
Steepest Descent Method
• The Gradient Descent Method is an optimization
algorithm to find a local minimum of a function
• The gradient is the slope of a function
[Figure: Illustration of Gradient Descent, showing the original point in weight space (xk) and the new point in weight space]
Line Search Parameter
• Parameter αk controls how far the point moves in the
specific direction
• Our interest is to find the rate of change of the function f along the
direction characterised by the parameter αk
• The objective is to determine the optimum line parameter (step length) αk*
which minimizes f
• This is determined by setting df/dαk = 0 and finding the value of optimum αk* that
minimizes f
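The condition df/dαk = 0 has a closed form when f is quadratic. The following is a minimal sketch, assuming the illustrative function f(x) = 0.5·xᵀAx − bᵀx with a symmetric positive-definite matrix A (this function, the matrix values, and all helper names are assumptions, not taken from the slides). Along the steepest descent direction d = −∇f(xk), setting the derivative to zero gives αk* = (dᵀd)/(dᵀAd).

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]

def gradient(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x, which is A x - b."""
    return [axi - bi for axi, bi in zip(mat_vec(A, x), b)]

def optimal_step(A, d):
    """Step length solving df/d(alpha) = 0 when d is the negative gradient."""
    return dot(d, d) / dot(d, mat_vec(A, d))

def steepest_descent(A, b, x, iterations=100, tol=1e-12):
    """Repeat: move along the negative gradient by the optimal step length."""
    for _ in range(iterations):
        d = [-g for g in gradient(A, b, x)]
        if dot(d, d) < tol:  # squared gradient norm is tiny: stop
            break
        alpha = optimal_step(A, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

# Illustrative problem (assumed values); its exact minimum is x = (1, 1).
A = [[2.0, 0.0], [0.0, 10.0]]
b = [2.0, 10.0]
x_min = steepest_descent(A, b, [0.0, 0.0])
print(x_min)
```

For a non-quadratic f there is usually no closed form, and αk* would be found numerically (for example by a one-dimensional search).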
Steepest Descent method
• Certain other properties
– Rate of convergence is dependent on the starting point
– Steepest descent is a “local” property of the function,
so the method is not effective in most problems
Why use a Heuristic?
• Heuristics are typically used to solve complex (large, nonlinear,
non-convex, i.e. containing several local minima) multivariate
optimization problems that are difficult to solve to optimality.
• Heuristics are NOT guaranteed to find the true global optimal
solution in a single objective problem, but should find many
good solutions (the mathematician's answer vs. the engineer’s
answer).
• Heuristics are good at dealing with local optima without getting
stuck in them while searching for the global optimum.
• A Heuristic is simply a rule of thumb that hopefully will find a
good answer.
• “A Metaheuristic is formally defined as an iterative
optimization process which guides a subordinate
heuristic by combining intelligently different concepts
for exploring and exploiting the search space, learning
strategies are used to structure information in order to
find efficiently near-optimal solutions.”
• “Metaheuristics are typically high-level strategies
which guide an underlying, more problem specific
heuristic, to increase their performance. The main goal
is to avoid the disadvantages of iterative improvement
and, in particular, multiple descent by allowing the
local search to escape from local optima.”
Common Heuristics/Metaheuristics
• Neighbourhood search methods
– Simulated Annealing
– Tabu Search
– Etc.
• Population search methods
– Genetic Algorithms
– Particle Swarm Optimization
– Ant Colony Optimization
– Artificial Bee Colony Algorithm
– Etc.
Simulated Annealing
Method proposed in 1983 by IBM researchers for solving
VLSI layout problems (Kirkpatrick et al., Science,
220:671-680, 1983).

• Simulated Annealing vs. greedy algorithms

[Figure: a ball on an energy landscape with several valleys. Labels: initial position of the ball; the greedy algorithm gets stuck at a locally optimum solution; Simulated Annealing explores more and chooses an uphill move with a small probability (hill climbing); upon a large number of iterations, SA converges to the solution marked in the figure.]
Simulated Annealing
• Annealing is a thermal process for obtaining low energy
states of a solid in a heat bath.
• The process contains two steps:
– Increase the temperature of the heat bath to a maximum value
at which the solid melts (re-crystallization temperature).
– Decrease carefully the temperature of the heat bath until the
particles arrange themselves in the ground state of the solid.
Ground state is a minimum energy state of the solid.
• If the heating temperature is sufficiently high to ensure
random state and the cooling process is slow enough to
ensure thermal equilibrium, then the atoms will place
themselves in a pattern that corresponds to the global
energy minimum of a perfect crystal.
Simulated Annealing
Metropolis algorithm
• At a fixed temperature T:
– Perturb (randomly) the current state to a new state
– ΔE is the difference in energy between the current and new states
– If ΔE < 0 (new state is lower), accept the new state as current
– If ΔE ≥ 0, accept the new state with probability
Pr(accepted) = exp(−ΔE / (kB·T))
• Eventually the system evolves into thermal equilibrium at
temperature T; then the formula mentioned before holds
• When equilibrium is reached, temperature T can be
lowered and the process can be repeated
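The acceptance rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the default k_b = 1.0 are assumptions, and the random generator is passed in so that a run can be repeated.

```python
import math
import random

def acceptance_probability(delta_e, temperature, k_b=1.0):
    """Probability of accepting a move with energy change delta_e at temperature T."""
    if delta_e < 0:
        return 1.0  # a lower-energy state is always accepted
    return math.exp(-delta_e / (k_b * temperature))

def metropolis_accept(delta_e, temperature, k_b=1.0, rng=random):
    """Return True if the perturbed state should replace the current state."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / (k_b * temperature))

# A move that raises the energy by 1 unit is accepted less often as T drops.
for t in (100.0, 10.0, 1.0, 0.1):
    print(t, acceptance_probability(1.0, t))
```

At high temperature almost every move is accepted, so the search behaves like a random walk; at low temperature the rule approaches a greedy descent.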
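[Figure: flowchart of the Simulated Annealing algorithm. Select xcur and find E(xcur); set initial T, N, B and α; set n=0; generate a neighbourhood solution xnew and find E(xnew); if ΔE<0, set xcur ← xnew; otherwise set xcur ← xnew only if a random number is < exp(−ΔE/(B·T)); set n=n+1; if n<N, generate another neighbour; otherwise set T←T·α; if the minimum T is reached, print xcur, otherwise set n=0 and continue.]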
Simulated Annealing Algorithm
1. Select a point xcur and find objective function E(xcur);
2. Select an initial temperature T>0; //T=2000 (say)
3. Set number of iterations at each temperature N //N=20 (say)
4. Set rate of change of temperature α; //say 0.99
5. Repeat
5.1 Set repetition counter n=0;
5.2 Repeat
5.2.1 Generate state xnew, a neighbor of xcur;
5.2.2 Calculate ΔE=E(xnew)−E(xcur);
5.2.3 If ΔE<0 (minimization problem) then xcur ← xnew;
5.2.4 else if random(0,1)<exp(−ΔE/(B·T)) then xcur ← xnew;
5.2.5 n=n+1;
5.3 Until n=N;
5.4 T←T·α;
Until stopping criterion reached.
// T=20 (say) or convergence
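The numbered steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `energy` and `neighbor` callables, the `seed` argument, and the one-dimensional test function are hypothetical choices made for illustration, and the stopping criterion used is the minimum-temperature one.

```python
import math
import random

def simulated_annealing(energy, neighbor, x0, t0=2000.0, t_min=20.0,
                        n_per_temp=20, alpha=0.99, b=1.0, seed=0):
    """Minimise `energy`, following steps 1 to 5.4 of the algorithm."""
    rng = random.Random(seed)           # seeded so that a run can be repeated
    x_cur, e_cur = x0, energy(x0)       # step 1
    t = t0                              # step 2
    while t > t_min:                    # stop when the minimum temperature is reached
        for _ in range(n_per_temp):     # steps 5.1 to 5.3: N moves per temperature
            x_new = neighbor(x_cur, rng)            # step 5.2.1
            e_new = energy(x_new)
            delta_e = e_new - e_cur                 # step 5.2.2
            # steps 5.2.3 and 5.2.4: always accept downhill, sometimes accept uphill
            if delta_e < 0 or rng.random() < math.exp(-delta_e / (b * t)):
                x_cur, e_cur = x_new, e_new
        t *= alpha                      # step 5.4
    return x_cur, e_cur

# Hypothetical one-dimensional problem with its minimum at x = 3.
def quadratic(x):
    return (x - 3.0) ** 2

def gaussian_neighbor(x, rng):
    return x + rng.gauss(0.0, 0.5)

x_best, e_best = simulated_annealing(quadratic, gaussian_neighbor, x0=10.0,
                                     t0=1.0, t_min=1e-4, n_per_temp=50, alpha=0.9)
print(x_best, e_best)
```

The temperatures in the usage example are much smaller than the T=2000 and T=20 quoted in the steps because the energy differences of this toy function are small; the temperature scale has to be chosen relative to typical values of ΔE.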
Algorithm Parameters
• Kirkpatrick et al. (1983)
– Set initial value T to be large enough
– T(t+1) = α·T(t), α: 0.8~0.99 //rate of cooling
– N: a sufficient number of transitions
• corresponding to thermal equilibrium
• constant
• or proportional to the size of the neighborhood
– Stopping criterion
• when the solution obtained at each temperature change is
unaltered for a number of consecutive temperature changes.
• When the temperature reaches a specific minimum value
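The cooling rule and the minimum-temperature stopping criterion together fix how many temperature levels a run visits. A small sketch (the function name is an assumption) that lists the levels for a geometric schedule:

```python
def temperature_levels(t0, t_min, alpha):
    """Temperatures visited by T(t+1) = alpha * T(t) while T stays above t_min."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    levels = []
    t = t0
    while t > t_min:
        levels.append(t)
        t *= alpha
    return levels

# With T=2000, minimum T=20 and alpha=0.99 there are 459 temperature levels,
# so N=20 transitions per level means 459 * 20 = 9180 function evaluations.
slow = temperature_levels(2000.0, 20.0, 0.99)
print(len(slow), len(slow) * 20)

# A fast schedule (alpha=0.5) from T=405 visits only five levels.
print(temperature_levels(405.0, 20.0, 0.5))
```

This makes the trade-off visible: α close to 1 gives slow cooling and many evaluations, while a small α ends the search after a handful of levels.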
Algorithm parameters
• Constant: B should be chosen depending on ΔE (depending on
the function), the initial temperature T and the final temperature

ΔE/(B·T)    Probability P=exp(−ΔE/(B·T))
0.01        0.990    Start of algorithm (Initial Temp)
0.1         0.905
0.25        0.779
0.5         0.607
1.0         0.368
2.0         0.135
3.0         0.050
4.0         0.018
5.0         0.007    End of algorithm (Final Temp)
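The probabilities in the table follow directly from P = exp(−ΔE/(B·T)), and the same relation can be inverted to pick B. A hedged sketch: the helper `b_for_target` and the sample values ΔE=10 and T=2000 are assumptions for illustration, not a rule from the slides.

```python
import math

def acceptance_from_ratio(ratio):
    """P = exp(-ratio), where ratio stands for dE / (B * T)."""
    return math.exp(-ratio)

def b_for_target(delta_e, temperature, p_target):
    """B for which an uphill move of size delta_e is accepted with probability p_target."""
    return -delta_e / (temperature * math.log(p_target))

# Reproduce the probability column of the table.
for ratio in (0.01, 0.1, 0.25, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0):
    print(ratio, round(acceptance_from_ratio(ratio), 3))

# Example: accept a typical uphill move of 10 with probability 0.99 at T=2000.
b_start = b_for_target(10.0, 2000.0, 0.99)
print(b_start)
```

Choosing B this way ties the start of the run to a ratio near 0.01; the final temperature then decides how far down the table the run ends.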
Good features
• It is very easy to implement.
• It can be generally applied to a wide range of problems.
• SA provided high quality solutions to many problems.
• Simulated Annealing algorithms are usually
better than greedy algorithms, when it comes
to problems that have numerous locally
optimum solutions.
• Care is needed to devise an appropriate neighborhood structure and
cooling schedule to obtain an efficient algorithm.
• Results are generally not reproducible: another run can give a different
result.
• SA can leave an optimal solution and not find it again (so try to remember
the best solution found so far).
• Proven to find good quality solutions under certain conditions; one of
these conditions is that you must run forever (several restarts).
• The energy function on the left would work with SA while the one on the
right may fail.
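Two of these points have simple practical remedies, sketched below with assumed helper names. Remembering the best state guards against SA walking away from a good solution, and seeding the random generator makes a run repeatable. The visited states are the points and function values quoted in the worked example of these slides.

```python
import random

def best_so_far(visited):
    """Return the (state, energy) pair with the lowest energy among visited states."""
    best_state, best_energy = None, float("inf")
    for state, energy in visited:
        if energy < best_energy:
            best_state, best_energy = state, energy
    return best_state, best_energy

# Current points accepted in the worked example, with their function values.
visited = [
    ((2.5, 2.5), 8.125),
    ((2.537, 2.414), 6.482),
    ((2.072, 0.604), 58.067),
    ((2.397, -0.312), 51.287),
]
print(best_so_far(visited))   # the second point, although xcur ends at the last one

# Two generators created with the same seed produce the same random numbers,
# so a seeded SA run can be reproduced exactly.
first = random.Random(42)
second = random.Random(42)
print(first.random() == second.random())
```

In a full implementation the comparison would sit inside the annealing loop, updated each time a new state is accepted.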

• Step 1: We choose an initial point xcur=(2.5,2.5). Calculate
f(xcur)=8.125. Initial Temp=405.0, N=1, α=0.5, B=1. The termination
criterion is set as Minimum Temp=20.
• Step 2: create a new point in neighbourhood of xcur. Let us choose
Gaussian distribution (random number generator) for selecting
the new point w.r.t. current point.
xnew=xcur + N(µ,σ), Set µ=0 and σ=0.833
xnew=(2.5,2.5)+ (0.037,- 0.086) =(2.537,2.414)
Calculate f (xnew)=6.482
• Step 3: Calculate ΔE=f(xnew) - f(xcur) = -1.643. Since ΔE<0, select
the new point and set it as the current point: xcur=(2.537,2.414)
• Step 4: Check for termination: the temperature is still above the minimum. Reduce the
temperature: T=αT=0.5×405=202.5
• Step 5: create a new point in neighbourhood of xcur.
xnew=xcur + N(µ,σ), Set µ=0 and σ=0.833
xnew=(2.537,2.414)+ (-0.465,- 1.810) =(2.072,0.604)
Calculate f (xnew)=58.067
• Step 6: Calculate ΔE=f(xnew) - f(xcur) = 58.067-6.482=51.585. Since
ΔE>0, select the new point based on the probability
P=exp(-51.585/202.5)=0.775. Generate a random number r in the range (0,1).
Since r < P, we accept the new point and set it as the current point: xcur=(2.072,0.604).
• Step 7: The termination criterion is not met here either.
Therefore we decrease the temperature to T=0.5×202.5=101.25.
• Step 8: The next point in the neighbourhood found using the
Gaussian distribution is xnew=(2.397,-0.312). Determine f(xnew)=51.287.
• Step 9: Since ΔE=-6.780<0, accept this point: xcur=(2.397,-0.312).
• Step 10: The termination criterion is not met here either.
Therefore we decrease the temperature to T=0.5×101.25=50.625.
• Step 11: The next point in the neighbourhood found using the
Gaussian distribution is xnew=(1.397,1.721). Determine f(xnew)=60.666.
• Step 12: Calculate ΔE=f(xnew) - f(xcur) = 9.379. Since ΔE>0,
select the new point based on the probability
P=exp(-9.379/50.625)=0.831. Generate a random number in the
range (0,1): r=0.942.
Since r > P, we do not accept the new point. Therefore retain
xcur=(2.397,-0.312) and f(xcur)=51.287.

• And so on until the termination criterion is reached.
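The arithmetic of these steps can be checked with a short script. The objective function is not stated in the steps; the sketch below assumes it is Himmelblau's function, f(x, y) = (x² + y − 11)² + (x + y² − 7)², because that function reproduces the quoted values to within rounding. Treat this as an inference rather than a statement from the slides.

```python
import math

def himmelblau(x, y):
    """Assumed objective: f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2."""
    return (x * x + y - 11.0) ** 2 + (x + y * y - 7.0) ** 2

def acceptance(delta_e, temperature, b=1.0):
    """P = exp(-dE / (B * T)) for an uphill move."""
    return math.exp(-delta_e / (b * temperature))

# Points visited in the steps, paired with the function values quoted there.
quoted = [
    ((2.5, 2.5), 8.125),
    ((2.537, 2.414), 6.482),
    ((2.072, 0.604), 58.067),
    ((2.397, -0.312), 51.287),
]
for (x, y), value in quoted:
    print((x, y), round(himmelblau(x, y), 3), "quoted:", value)

# Temperature sequence: 405 halved after each accepted or rejected move.
temps = [405.0 * 0.5 ** k for k in range(4)]   # 405, 202.5, 101.25, 50.625

p_step6 = acceptance(51.585, temps[1])    # quoted as 0.775
p_step12 = acceptance(9.379, temps[3])    # quoted as 0.831
print(round(p_step6, 3), round(p_step12, 3))
```

Small differences in the third decimal place are expected, since the quoted points are themselves rounded to three decimals.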
