Vous êtes sur la page 1sur 230


Power Engineering Society

Tutorial on Modern Heuristic

Optimization Techniques with
Applications to Power Systems


Modern Heuristic Optimization Techniques
with Applications to Power Systems

Sponsored by:

NewIntelligentSystems Technologies Working Group

Intelligent System Applications Subcommittee

Power System Analysis, Computing, and Economics Committee

IEEE Power Engineering Society

Edited by

K. Y. Lee

M.A. EI-Sharkawi
From the Course Editors

Several heuristic tools have evolved in the last decade that facilitate solving optimization
problems that were previously difficult or impossible to solve. These tools include evolutionary
computation, simulated annealing, tabu search, particle swarm, etc. Reports of applications of
each of these tools have been widely published. Recently, these new heuristic tools have been
combined among themselves and with knowledge elements, as well as with more traditional
approaches such as statistical analysis, to solve extremely challenging problems. Developing
solutions with these tools offers two major advantages: 1) development time is much shorter than
when using more traditional approaches, and, 2) the systems are very robust, being relatively
insensitive to noisy and/or missing data.

The purpose of this course is to provide participants with basic knowledge of evolutionary
computation, and other heuristic optimization techniques, and how they are combined with
knowledge elements in computational intelligence systems. Applications to 'power problems are
stressed, and example applications are presented. The tutorial is composed of two parts: The
first part gives an overview of modem heuristic optimizationtechniques, including fundamentals
of evolutionary computation, genetic algorithms, evolutionary programming and strategies,
simulated annealing, tabu search, and hybrid system of evolutionary computation. It also gives
an overview ofpower system applications.

The second part of the tutorial deals with specific applications of the heuristic approaches to
power system problems, such as security assessment, operational planning, generation,
transmission and distribution planning, state estimation, and power plant and power system

Evolutionary Computation:

Natural evolution is a hypothetical population-based' optimization process. Simulating this

process on a computer results in stochastic optimization techniques that can often out-perform
classical methods of optimization when applied to difficult real-world problems. This tutorial
will provide a background in the inspiration, history, and application of evolutionary
computation and other heuristic optimization methods to system identification, automatic
control, gaming, and other combinatorial problems.

The objectives are to provide an overview of how evolutionary computation and other heuristic
optimization techniques may be applied to problems within your domain of expertise, to provide
a good understanding of the design issues involved in tailoring heuristic algorithms to real-world
problems, to compare and judge the efficacy of modem heuristic optimization techniques with
other more classic methods of optimization, and to program fundamental evolutionary algorithms
and other heuristic optimization routines.

Genetic Algorithms:

Genetic Algorithm (GA) is a search algorithm based on the conjecture of natural selection and
genetics. The features of genetic algorithm are different from other search techniques in several
aspects. First, the algorithm is a multi-path that searches many peaks in parallel, and hence
reducing the possibility of local minimum trapping. Secondly, GA works with a coding of
parameters instead of the parameters themselves. The coding of parameter will help the genetic
operator to evolve the current state into the next state with minimum computations. Thirdly, GA
evaluates the fitness of each string to guide its search instead of the optimization function. The
genetic algorithm only needs to evaluate objective function (fitness) to guide its search. There is
no requirement for derivatives or other auxiliary knowledge. Hence, there is no need for
computation of derivatives or other auxiliary functions. Finally, GA explores the search space
where the probability of finding improved performance is high.

Evolution Strategies and Evolutionary Programming:

Evolution Strategies (ES) employ real-coded variables and, in its original form, it relied on
Mutation as the search operator, and a Population size of one. Since then it has evolved to share
many features with GA. The major similarity between these two types of algorithms is that they
both maintain populations of potential solutions and use a selection mechanism for choosing the
best individuals from the population. The main differences are: ES operate directly on floating
point vectors while classical GAs operate on binary strings; GAs rely mainly on recombination
to explore the search space, while ES uses mutation as the dominant operator; and ES is an
abstraction of evolution at individual behavior level, stressing the behavioral link between an
individual and its offspring, while GAs maintain the genetic link.

Evolutionary Programming (EP) is a stochastic optimization strategy similar to GA, which

places emphasis on the behavioral linkage between parents and their offspring, rather than
seeking to emulate specific genetic operators as observed in nature. EP is similar to
Evolutionary Strategies, although the tow approaches developed independently. Like both ES
and GAs, EP is a useful method of optimization when other techniques such as gradient descent
or direct analytical discovery are not possible. Combinatorial and real-valued function
optimization in which the optimization surface or fitness landscape is "rugged", possessing many
locally optimal solutions, are well suited for Evolutionary Programming.

Particle Swarm:

Particle swarm optimization (PSO) is an exciting new methodology in evolutionary computation

that is somewhat similar to a genetic algorithm in that the system is initializedwith a population
of random solutions. Unlike other algorithms, however, each potential solution (called a
particle) is also assigned a randomized velocity and then flown through the problem hyperspace.
Particle swarm optimization has been found to be extremely effective in solving a wide range of
engineering problems. It is very simple to implement (the algorithm comprises two lines of
computer code) and solves problems very quickly.

Tabu Search:

Tabu Search (TS) is basically a gradient-descent search with memory. The memory preserves a
number of previously visited states along with a number of states that might be considered
unwanted. This information is stored in a Tabu List. The definition of a state, the area around it
and the length of the Tabu list are critical design parameters. In addition to these Tabu
parameters, two extra parameters are often used: Aspiration and Diversification. Aspiration is
used when all the neighboring states of the current state are also included in the Tabu list. In that
case, the Tabu obstacle is overridden by selecting a new state. Diversification adds randomness
to this otherwise deterministic search. If the Tabu search is not converging, the search is reset

Simulated Annealing:

In statistical mechanics, a physical process called annealing is often performed in order to relax
the system to a state with minimum free energy. In the annealing process, a solid in a heat bath
is heated up by increasing the temperature of the bath until the solid is melted into liquid, then
the temperature is lowered slowly. In the liquid phase all particles of the solid arrange
themselves randomly. In the ground state the particles are arranged in a highly structured lattice
and the energy of the system is minimum. The ground state of the solid is obtained only if the
maximum temperature is sufficiently high and the cooling is done sufficiently slowly. Based on
the annealing process in the statistical mechanics, the Simulated Annealing (SA) was introduced
for solving complicated combinatorial optimization.

The name 'simulated annealing' originates from the analogy with the physical process of solids,
and the analogy between physical system and simulated annealing is that the cost function and
the solution (configuration) in the optimization process correspond to the energy function andthe
state of statistical physics, respectively. In a large combinatorial optimization problem, an
appropriate perturbation mechanism, cost function, solution space, and cooling schedule are
required in order to- find an optimal solution with simulated annealing. SA is effective in
network reconfiguration problems for large-scale distribution systems, and its search capability
becomes more significant as the system size increases. Moreover, the cost function with a
smoothing strategy enables the simulated annealing to escape more easily from local minima and
to reach rapidly to the vicinity of an optimal solution.

K. 1': Lee and M A. El-Sharkawi

For further information, please contact

K. Y. Lee
DepartmentofElectrical Engineering
The PennsylvaniaState University
University Park, PA 16802
Phone:(814) 865-2621
Fax: (814) 865-7065
e-mail: kwanglee@psu.edu

M. A. EI-Sharkawi
Department of Electrical Engineering
University of Washington
Seattle, WA 98195-2500
Phone: (206) 685-2286
Fax: (206)543-3842
e-mail: elsharkawi@ee.washington.edu

In memory of Alcir J. Monticelli (1946 - 2001):

Alcir Jose Monticelli was born on November 16, 1946 in Rio Capinzal, Santa Catarina, Brazil. He was a
Fellow of the IEEE and a member of the Brazilian Academy of Sciences. He received his B.S. degree in
electronic engineering from the Insituto Tecnol6gico de Aeronantica (ITA) in 1970, the M.S. degree from
Universidade Federal da Paraiba (UFPB) in 1972, and the Ph.D. degree from Universidade Estadual de
Campinas (Unicamp) in 1975, all in Brazil. From 1982 to 1985, he was a visiting professor at the
University of California Berkeley where he worked on theoretical aspects of network analysis, and from
1991 to 1992 he was with Mitsubishi Electric Corporation, Japan as a researcher of the artificial
intelligence and parallel computing group. He was a professor of electrical engineering at Unicamp since

''Being a professor wasn't just a profession for him. It was a way of life: he used to observe everything
and teach all the time. He had a great pleasure living that way". - Isadora Monticelli

Everyone that had worked with him. corroborates the words of Alcir's daughter. He was the author of
three books on power systems and had more than 40 articles published in international journals,
transactions and proceedings with more than 500 citations according to the Science Citation Index. He
was a collaborator of National Science Foundation and a member of most conferences in the power
engineering area.

"Alcir Monticelli was a very important academic leader" - words of Carlos Henrique de Brito Cruz,
president of FAPESP (Fun~ao de Amparo a Pesquisa do Estado de Sao Paulo) - the State of Sao Paulo
Research Foundation. Alcir was an active collaborator with several projects, and he was the mentor of
Small Business Innovation Research program in the State of Sao Paulo by FAPESP. His creativity and
intelligence is present in the way the power system is treated nowadays. Load flow, state estimation,
security analysis, and network planning, had undergone' great advances with his contributions. The
recognition for his contributions came with the honor of IEEE Fellow (1996), Engineer of the Year in
Latin America (1997) and with the IEEE Third Millennium Medal (2000). As a professor, researcher,
writer and a man engaged with technological innovation, he never neglected his family, which was the
source of constant strength and joy to him. With his wife, Maria Stella he left three lovely daughters,
Viridiana, Isadora, and Eleonora, who are all successful. He will always be remembered and when
problems arise in power systems we will deeply miss his discussions.

List ofContributors

CI Theory of Evolutionary Computation

By Russell C. Eberhart, Indiana University Purdue University Indianapolis
Cl Overview of Applications in PowerSystems
By Alexandre P. Alves da Silva, FederalEngineeringSchoolat Itajuba
Cl Fundamentals of Genetic Algorithms
By Alexandre P. Alves do Silva, FederalEngineeringSchoolat Itajuba
c Fundamentals of Evolution Strategies and Evolutionary Programming
By Vladimiro Miranda, University ofPorto
o Fundamentals of Particle Swarm Techniques
By Yoshikazu Fukuyama, Fuji Electric Co. R&D
CJ Fundamentals of Simulated Annealing
By Aicir J. Monticelli, Universidade Estaduial de Campinas; Ruben Romero,
Universidade Estadual Paulista Julio de Mesquita Filho
Q Fundamentals of Tabu Search
By Alcir J. Monticelli, Universidade Estaduial de Campinas; Ruben Romero,
Untversidade Estadual Paulista Julio de Mesquita Filho
c Hybrid Systems: An Example with Fuzzy Systems
By GermanoLambert-Torres, FederalEngineeringSchoolat Itajuba
Cl Application ofEvolutionary Technique to PowerSystem Security Assessment
By Mohamed A. El-Sharkawi, University ofWashington
Cl Generation Expansion and Reactive Power Planning
By Kwang ~ Lee, PennsylvaniaState University
a NetworkPlanning
By Alcir J. Moniicellt; Universidade Estaduial de Campinas; Ruben Romero,
Universidade Estadual Paulista Julio de Mesquita Filho
[J OperationalPlanning: Unit Commitment and EconomicDispatch
By Vladimiro Miranda, University ofPorto
c Power System Controls: Particle Swarm Technique
By Yoshikazu Fukuyama, Fuji Electric Co. R&D
[J Power Plant Controls: Feedforward and Feedback Controller Design for Wide Range
OperationUsing Genetic Algorithm
By Kwang Y: Lee, PennsylvaniaState University
[J State Estimation
By Eduardo Nobuhiro Asada, Universidade Estaduial de Campinas; Ruben Romero,
Universidade Estadual Paulista Julio de Mesquita FiIho; Alcir J. Monticellt;
Universidade Estaduial de Campinas
(J Hybrid Models and Shade of Lamarckism
By Vladimiro Miranda, University ofPorto
(J Additional References on ModemHeuristic Optimization Techniques
By Kwang 1': Lee, PennsylvaniaState University

Table of Contents

Part 1: Theory ofEvolutionary Computation
Chapter 1. Theory of Evolutionary Computation 1
Chapter 2. Overview of Applications in Power Systems 16
Chapter 3. Fundamentals of Genetic Algorithm 24
Chapter 4. Fundamentals of Evolution Strategies and Evolutionary Programming 33
Chapter 5. Fundamentals of Particle Swann Techniques 45
Chapter 6. Fundamentals of Simulated Annealing 52
Chapter 7. Fundamentals of Tabu Search 67
Chapter 8. Hybrid Systems: An Example with Fuzzy Systems 81

Part 2: SelectedApplications ofEvolutionary Computation

Chapter 9. Application of Evolutionary Technique to Power System Security
Assessment 96
Chapter 10. Generation Expansion and Reactive Power Planning 101
Chapter 11. Network Planning 114
Chapter 12. Operational Planning: Unit Commitment and Economic Dispatch 130
Chapter 13. Power System Controls: Particle Swarm Technique 138
Chapter 14. Power Plant Controls: Feedforward and Feedback Controller Design
for Wide Range Operation Using Genetic Algorithm 152
Chapter 15. State Estimation 163
Chapter 16 Hybrid Models and Shade ofLamarckism 175
Chapter 17. Additional References on Modem Heuristic Optimization Techniques 190
Contributors' Biographies 218

Chapter 1
Theory of Evolutionary Computation

space that are likely to have higher values of fitness.. Thus,

1. INTRODUCTION for example, reproduction (selecti.on) is often .car:n~ o~t
with a probability that is proportional to the individual s
Evolutionary computation (Ee) paradigms generally differ
fitness value.
from traditional search and optimization paradigms in three Some Ee paradigms, and especially canonical genetic al-
-main ways: . , gorithms, use special encodings for the parameters of the
1. EC paradigms utilize a population of points (potential
problem being solved. In genetic algorithms, the. parameters
solutions) in their search,
are often encoded as binary strings, but any finite alphabet
2. EC paradigms use direct "fitness" information instead
can be used. These strings are almost always of fixed
of function derivativesor other related knowledge, and
length, with a fixed total number of Is and Os, in the case of
3. EC paradigms use probabilistic, rather than detenninis-
a binary string, being assigned to each parameter. By "f~ed
tic, transition rules.
length" it is meant that the string length does not vary dunng
In addition, EC implementations sometimes encode !he
the running of the EC paradigm. The string length (number
parameters in binary or other symbols, rather than working
of bits for a binary string) assigned to each parameter de-
with the parameters themselves. We now examine these
pends on its maximum range for the problem being solved,
differences in more detail.
and on the precision required.
Most traditional optimization paradigms move from one
Regardless of the paradigm implemented, evolutionary
point in the decision hyperspace to another, using some de-
computationtools often follow a similar procedure:
terministic rule. One of the drawbacks of this approach is
1. Initializethe population,
the likelihood of getting stuck at a local optimum. Be p~­
2. Calculate the fitness for each individual in the popula-
digms, on the other hand, start with a population of pomts
(hyperspace vectors). They typically generate a new popula-
3. Reproduce selected individuals to form a new popula-
tion with the same number of members each epoch, or
generation. Thus, many maxima or ~ima can b~ explored
4. Perform evolutionary operations, such as crossoverand
simultaneously, lowering the probability of getting stuck.
mutation,on the population, and
Operators such as crossover and mutation effectively en-
hance this parallel search capability, allowing the search to
s. Loop to step 2 until some condition is met
directly "tunnel through" from one promising hyperspace
Initializationis most commonly done by seedingthe popu-
regionto another. .. .. lation with random values. When the parameters are repre-
ac paradigms do not require info~o~ that IS auxdl~ sented by binary strings, this simply means generating ran-
to the problem, such as function derivatives, Many hill-
dom strings of Is and Os (with a uniform probabilityfor each
climbing searchparadigms,for example, require the c~cula­
value) of the fixed length described earlier. It is sometimes
tion of derivatives in order to explore the local maxunum,
feasible to seed the population with "promising" values,
In Ee optimization paradigms the fitness of each mem~ of
known to be in the hyperspace region relatively close to the
the population is calculated from the value of th~ function
optimum. The total number of individuals chosen to make
being optimized, and it is common to use the function output
up the population is both problem and paradigm dependent,
as the measure of fitness. Fitness is a direct metric of the
but is often in the range ofa few dozen to a few hundred.
performance of the individual population member on the
The fitness value is often proportional to the output value
function being optimized.
of the function being optimized, though it may also be de-
The fact that EC paradigms use probabilistic transition
rived from some combination of a number of function out-
rules certainly does not mean that a strictly random search is
puts. The fitness function takes as its inputs the .o~tputs of
being canied out. Rather, stochastic operatorsare appliedto
one or more functions, and outputs some probability of re-
operations that direct the search toward regions of the hyper-
production. It is sometimes necessary to transfonn the func-
tion outputs to produce an appropriate fitness metric; some- 4. GENETIC ALGORITHMS
timesit is not.
Selection of individuals for reproduction to constitute a 4.1 Introduction
new population (often called a new generation) is usually
It seems that every technology has its"jargon; genetic algo-
based upon fitness values. The higher the fitness, the more
rithms are no exception. Therefore, we begin by reviewing
likely it is that the individual will be selected for the new
some of the basic terminology that is needed to understand
generation. Some paradigms that are considered evolution-
the genetic algorithm (GA) literature. A sample problem is
ary, however, such as particle swarm optimization, can re-
then presented to illustrate how GAs work; a step-by-step
tain all population members from epoch to epoch.
analysis illustrates a GA application, with options discussed
In many, if not most, cases, a global optimum exists at one
for some of the individualoperations.
point in the decision hyperspace. Furthermore, there may be
stochastic or chaotic noise present. Sometimes the global 4.2 An Overview of GeneticAlgorithms .
optimum changes dynamically because of external influ-
Genetic algorithms (GAs) are search algorithms that re-
ences; frequently there are very good local optima as well.
flect in a primitive way some of the processes of natural evo-
For these and otherreasons,the bottom line is that it is often
lution. (As such, they are analogous to artificial neural net-
unreasonable to expect any optimization method to find a
works' status as primitive approximations to biological neu-
global optimum (even if it exists) within a finite time. The
ral processing.) Engineers and computer scientists do not
best that can be hopedfor is to find near-optimwn solutions,
care as much about the biological foundations of GAs as
and to hope that the time it takes to find them increases less
than exponentially with the number of variables. One lead-
their utility as analysis tools (another parallel with neural
networks). GAs often provide very effective search mecha-
ing EC researcher (H.-P. Schwefel 1994) suggests that the
nisms that can be used in optimization or classification ap-
focus should be on "meliorization" (improvement) rather
than optimization. We agree.
Put another way, evolutionary computation is often the Ee ~adigms work with a population of points, rather
second-best way to solve a problem. Classical methods such than a smgle point; each "point" is actually a vector in
as linear programming should often be tried first, as should
hyperspacerepresenting one potential,or candidate, solution
to the optimization problem. A population is thus just an
customized approaches that take full advantage of knowl-
edge about the problem. ensemble, or set, of hyperspace vectors. Each vector is
Why should we be satisfied with second best? Well, for called an individual in the population; sometimes an indi-
one thing, classical and customized approaches will fre- vidual in GA is referred to as a chromosome, becauseof the
quently not be feasible, and EC paradigms will be usable in analogyto genetic evolutionof organisms.
a vast number of situations. For another, a real strength of .Because real numbers are often encoded in GAs using
Ee paradigms is that they are generally quite robust. In this . bmary numbers, the dimensionality of the problem vector
field, robustness means that an algorithm can be used to mizht be different from the dimensionality of the bitstring
c~mosome. The number of elements in each vector(indi-
s~lve IDa?~ problems, and even many kinds of problems,
WIth a D11D1mum amount of special adjustments to account VIdual) equals the number of real parameters in the optimiza-
for special qualities of a particular problem. Typically an tion problem. A vector element generally corresponds to
evolutionary algorithm requires specification of the length of one parameter, or dimension, of the numeric vector. Each
the problemsolution vectors, some details of their encoding, elementcan be encoded in any number of bits, depending on
and an evaluation function - the rest of the program does not the representation of each parameter. The total number of
need to be changed. Finally, robust methodologies are gen- bits de.fines. the dimension of hyperspace being searched. If
erally fast and easy to implement. a GA IS being used to find "optimum" weights for a neural
The next section reviews genetic algorithms. It is one of network, for example, the number of vector elements equals
five areas of evolutionary computation: genetic algorithms, the number of weights in the network. If there are w
evolutionary programming, evolution strategies, genetic ~eights, ~d it is desired to calculateeach weight to a preci-
programmmg, and particle swarm optimization. Genetic sion of b bits, then each individual will consist of w· b bits
algorithms have traditionally received a majority of the at- and the dimension of binary hyperspace being searched i~
tention, and they cwrently account for many of the success- 2 wb •
~l ~plicati~ns in th~ literature. Particle swarm optimiza- The series of operations carried out when implementing a
non IS then discussed m the last section. "plain vanilla" GA paradigmis:
1. Initializethe population,

2. Calculatefitness for each individual in the population, in our population with an eight-bit binary string. The bi-
3. Reproduce selected individuals to form a new popula- nary string 00000000 will evaluate to 0, and 11111111 to
tion, 255.
4. Perform crossoverand mutation on the population, and It next must be decided how many individuals will make
5. Loop to step 2 until some condition is met. up the population. In an actual application,it would be com-
mon to have somewhere between a few dozen and a few
In some GA implementations, operations other than cross- hundred individuals. For the purposes of this illustrative
over and mutationare canied out in step four. example, however,the populationconsists of eight individu-
4.3 A SimpleGA ExampleProblem
The next step is to initializethe population. This is usually
Because implementing a "plain vanilla" GA paradigmis so done randomly. A randomnumber generator is thus used to
simple, a sample problem (also simple) seems to be the best assign a 1 or 0 to each of the eight positions in each of the
way to introduce most of the basic GA concepts and meth- eight individuals, resultingin the initial population in Figure
ods. As will be seen, implementing a simple GA involves 1. Also shown in the figure are the values of x and f(x) for
only copying strings, exchanging portions of strings, and each binary string.
flipping bits in strings. After fitness calculation, the next step is reproduction.
Our sample problem is to find the value of x that maxi- Reproduction consists of forming a new population with the
mizes the function f(x) = sin(7t x/2S6) over the range same total number of individuals by selecting from members
oS; x s 255 , where values of x are restricted to integers. This of the current population with a stochastic process that is
is just the sine function from zero to 1t radians. Its maxi- weighted by each of their fitness values. In the example
mum value of 1 occurs at 1t /2, or x = 128. The function problem, the sum of all fitness values for the initial popula-
value and the fitness value are thus defmed to be identical tion is 5.083. Dividing each fitness value by 5.083, then,
for the sample problem. yields a normalized fitness value fnorm for each individual.
There is only one variable in our sample problem: x. It is The sum of the normalized values is 1.
assumed for the sample problem that the GA paradigmuses These normalized fitness values are used in a process
a binary alphabet. The first decision to be made is how to called ''roulette wheel" selection, where the size of the rou-
represent the variable. This has been made easy in this case lette wheel wedge for each population member, which re-
since the variable can only take on integer values between 0 flects the probability of that individual being selected, is
and 255. It is therefore logical to represent each individual proportional to its normalized fitness value.

Individuals x f(x) f norm cumulative

f norm
1 0 1 1 1 1 0 1 189 .733 .144 .144
1 1 0 1 1 0 0 0 216 .471 .093 .237
0 1 1 0 0 0 1 1 99 .931 .184 .421
1 1 1 0 1 1 0 0 236 .243 .048 .469
1 0 1 0 1 1 1 0 174 .845 .166 .635
0 1 0 0 1 0 1 0 74 .788 .155 .790
0 0 1 0 0 0 1 1 35 .416 .082 .872
0 0 1 1 0 1 0 1 53 .650 .128 1.000

Figure 1: Initial populationand f{x) valuesfor GA example.

The roulette wheel is spun by generating eight random ness value. It is possible,though highly improbable, that the
numbers between 0 and 1. If a random number is between 0 individual with the lowest fitness value could be selected
and .144, the first individual in the existing population is eight times in a row and make up the entire next population.
selected for the next population. If one is between .144 and It is more likely that individuals with high fitness values are
(.144 + .093) = 237, the second individual is selected, and picked more than once for the new population.
so OD. Finally, if the random number is between (1 - .128) = The eight random numbers generated are 293, .971, .160,
.872 and 1.0, the last individual is selected. The probability .469, .664, .568, .371, and .109. This results in initial popu-
that an individual is selected is thus proportional to its fit-. lation member numbers 3, 8, 2, 5, 6, 5, 3, and 1 being cho-
sen to make up the population after reproduction, as shown First, the population is paired off randomly into pairs of
in Figure 2. parents. Sincethe order of the population after reproduction
in Figure 3 is already randomized, parents will be paired as
0 1 1 0 0 0 1 1 they appear there. For each pair, a random number is gener-
0 0 1 1 0 1 0 1 ated to determine whether crossover will occur. It is deter-
1 1 0 1 1 0 0 0 minedthat three of the four pairs will-undergo crossover.
1 0 1 0 1 1 1 0 Next, for the pairs undergoing crossover, two crossover
0 1 0 0 1 0 1 0 points are selected at random. (Other crossover techniques
1 0 1 0 1 i 1 0
are discussed later in this tutorial.) The portions of the
0 1 1 0 0 0 1 1
1 0 1 1 1 1 0 1
strings between the first and second crossover points (mov-
ing from left to right in the string) will be exchanged. The
Figure2: Population after reproduction. paired population, with the first and second crossoverpoints
labeled for the three pairs of individuals undergoing cross-
The next operation is crossover. To many evolutionary over, is illustrated in Figure 3a prior to the crossover opera-
computation practitioners, crossover of binary encodedsub- tion. The portions of the strings to be exchanged are in bold.
strings is what makes a genetic algorithm a genetic algo- Figure 3b illustrates the population after crossover is per-
rithm. Crossover is the process of exchanging portions of formed.
the strings of two "parent" individuals. Note that, for the third pair from the top, the first crossover
An overall probability is assigned to the crossover process point is to the right of the second. The crossover operation
which is the probability that, given two parents, the cross- thus "wraps around" the end of the string, exchanging the
over processwill occur. This crossover rate is often in the portionbetweenthe first and the second, movingfrom left to
range of .65 to .80; a value of .75 is selected for the sample right For two-point crossover, then, it is as if the head (left
end) of each individual string is joined to the tail (right end),
thus forming a ring structure. The section exchanged starts
at the first crossover point, moving to the right along the
binary ring, and ends at the second crossover point. The
values of x and f(x) for the population following crossover
appear in Figure3c and 3d, respectively.

1 2 Individuals x i(x)
0 1 110 o 011 1 o 1110 1 1 1 119 .994
0 0 111 o 110 1 00100 0 0 1 33 .394

1 2
111 0 1 110 0 0 1 0 1 0 1 0 0 0 168 .882
110 1 0 111 1 0 1 1 0 1 1 1 1 0 222 .405

2 1
o 110 0 1 0 110 1 0 0 0 1 0 1 0 138 .992
1 011 0 1 1 110 0 1 1 0 1 1 1 0 110 .976

0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 99 .937
1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 189 .733

(a) (b) (c) (d)

Figure 3: Populationbefore crossover showmgcrossover pomts(a); after crossover (b); and values ofx (c) and f(x) (d) after

The final operation in this plain vanilla genetic algorithm the probability of mutationcan vary widely according to the
is mutation. Mutation consists of flipping bits at random, application and the preference of the researcher. Values of
generally with a constant probability for each bit in the between .001 and .01 are not unusual for the mutationprob-
population. As is the case with the probability of crossover, ability. This meansthat the bit at each site on the bitstringis
flipped, on average, between0.1 and 1.0percent of the time. assigns the correct number of bits, and the coding. For
One fixed value is used for each generation and often is example, if a variable has a range from 2.5 to 6.5 (a dynamic
maintained for an entire run. range of 4) and it is desired to have a resolution of three
Since there are 64 bits in the example problem's popula- decimal places, the product of the dynamic range and the
tion (8 bits x 8 individuals), it is quite possible that none resolution requires a string 12 bits long, where the string of
would be altered as a result of mutation, so the populationof Os represents the value 2.5. A major advantage of being able
Figure 3b will be taken as the "final" population after one to represent variables in this way is that the user can think of
iterationof the GA procedure. Going thro~ the entire GA the population individuals as real-valued vectors rather than
procedure one time is said to produce a new generation. as bit strings, thus simplifying the developmentof GA appli-
The population of Figure 3b therefore represents the first cations.
generation of the initial randomizedpopulation. The "alphabet" used in the representation can, in theory,
Note that the fitness values now total 6.313, up from 5.083 be any finite alphabet. Thus, rather than using the binary
in the initial random population, and that there are now two alphabet of 1 and 0, we could use an alphabet containing
members of the population with fitness values higher than more characters or numbers. Most GA implementations,
.99. The average and maximum fitness values have thus however, use the binary alphabet
both increased.
4.4.2 Population Size
The population of Figure 3b, and the corresponding fitness
values in Figure 3d are now ready for another round of re- De Jong's dissertation (1975) offers guidelines that are
production, crossover, and mutation, producing yet another still usually observed: start with a relatively high crossover
generation. More generationsare produceduntil some stop- rate, a relatively low mutation rate, and a moderately sized
ping condition is met. The researcher may simply set a population - though just what constitutes a moderatelysized
maximumnumber of generationsto let the algorithmsearch, population is unclear. The main trade-off is obvious: a large
may let it run until a performancecriterion has been met, or population will search the space more completely, but at a
may stop the algorithm after some number of generations higher computational cost. The authors generally have used
with no improvement. populations of between 20 and 200 individuals, depending,
it seems, primarily on the string length of the individuals. It
4.4 A Review of GA Operations also seems (in the authors' experience) that the sizes of
Now that one iteration of the GA operations (one genera- populations used tend to increase approximately linearly
tion) for the example problem has been completed, each of with individual string length, rather than exponentially, but
the operations is reviewed in more detail. Various ap- "optimal" population size (if an optimal size exists) depends
proaches, andreasonsfor each, are examined. on the problem, as well.
4.4.1 Representation of Variables 4.4.3 Population Initialization
The representation of the values for the variable x was The initialization of the population is usually done
made (perhaps unrealistically) straightforward by choosing a stochastically, though it is sometimes appropriate to start
dynamic range of 256; an eight-bit binary number was thus with one or more individuals that are selected heuristically.
an obvious approach. Standard binary coding, however, is The GA is thereby initially aimed in promising directions, or
only one approach; others may be more appropriate. given hints. It is not uncommon to seed the populationwith
In this example, the nature of the sine function places the a few members selected heuristically, and to complete the
optimal value of x at 128, where f(x) is 1. The binary repre- population with randomly chosen members. Regardless of
sentation of 128 is 10000000; the representation of 127 is the process used, the population should represent a wide
01111111. Thus, the smallest change in fitness value can assortment of individuals. The urge to skew the population
require a change of every bit in the representation. This significantly should generally be avoided, if the limited
situation is an artifact of the encoding scheme and is not experience of the authors is generalizable.
desirable - it only makes the GA's search more difficult. 4.4.4 Fitness Calculation
Often, a better representation is one in which adjacent inte-
ger valueshave a Hammingdistance of one; in other words, The calculation of fitness values is conceptually simple,
adjacent values differ by only a singlebit. One such scheme though it can be quite complex to implement in a way that
is Graycoding. optimizes the efficiency of the GA' s search of the problem
Some GA software allows the user to specify the dynamic space. In the example problem, the value of f(x) varies
range and resolution for each variable. The program then (quite conveniently) from 0 to 1. Lurking within the prob-
lem, however, are two drawbacks to using the "raw" func-
tion output as a fitness function: one that is common to mizing the function!(x) = XZ over the range 0 to 10; there is
many implementations, the other arising from the nature of a higher differential in values of f(x) between adjacent val-
the sampleproblem. ues ofx near 10 than near o. Thus slight change of the inde-
The first drawback, common to many implementations, is pendent variable results in great improvement or deteriora-
that after the GA has beenrun for a number of generationsit tion of performance - which is equally informative - near
is not unusual for most (if not all) of the individuals' fitness the optimum.
values, after, say, a few dozen generations, to be quite high. In the discussion thus far, it has been assumed that optimi-
In cases where the fitness value can range from 0 to 1, for zation implies fmding a maximum value. Sometimes, of
example (as in the sample problem), most or all of the fit- course, optimization requires fmding a minimum value.
ness values may be 0.9 or higher. This lowers the fitness Some versions of GA implementations allow for this possi-
differences between individuals that provide the impetus for bility. Often, it is required that the user specify the maxi-
effective roulette-wheel selection; relatively higher fitness mum value I'maX of the function being optimized, ftx), over
valuesshould have a higherprobabilityof reproduction. the range of the search. The GA then can be programmed to
One way around this problem is to equally space the fit- maximize the fitness function/max - f(x). In this case, scal-
ness values. For example, in the sampleproblem,the fitness ing, described above, keeps track of fax over the past w
valuesused for reproduction could be equally spaced from 0 generations and uses it as a "roof' value from which to cal-
to I, assigning a fitness value of 1 to the most fit population culate fitness.
member, 0.875 to the second, and 0.125 to the lowest fitness
value of the eight. In this case the population members are 4.4.5 Roulette Wheel Selection
ranked on the basis of fitness and then their ranks are di- In genetic algorithms, the expected number of times each
vided by the number of individuals to provide a probability individual in the current population is selected for the new
threshold for selection. Note that the value of 0 is often not population is proportional to the fitness of that individual
assigned, since that would result in one population member relative to the average .fitness of the entire population.
being made ineligible for reproduction. Also note that ftx), Thus, in the initial population of the example problem,
the function result, is now not equal to the fitness, and that where the average fitness was 5.083/8 = 0.635, the third
in order to evaluate actual performance of the GA, the func- population member had a fitness value of 0.937, so it could
tion value should be monitored as well as the spacedfitness. be expected to appear about 1.5 times in the next population;
Another way around the problem is to use what is called it actually appeared twice.
scaling. Scaling takes into account the recent history of the The conceptualization is that of a wheel whose surface is
population, and assigns fitness values on the basis of com- subdivided into wedges representing the probabilities for
parison of individuals' performance to the recent average each individual. For instance, one point on the edge is de-
perfonnance of the population. If the GA optimization is termined to be the zero point and each arc around the circle
maximizing some function, then scaling involves keeping a corresponds to an area on the number line between zero and
record of the minimum fitness value obtained in the last w one. A random number is generated, between 0.0 and 1.0,
generations, where w is the size of the scaling window. If, and the individual whose wedge contains that number is
for example, w = 5, then the minimum fitness value in the chosen.
last five generations is kept and used instead of 0 as the In this way, individuals with greater fitness are more likely
"floor" of fitness values. Fitness values can be assigned a to be chosen. The selection algorithm can be repeated until
value based on their actualdistance from the floor value, or the desired number of individuals have been selected.
they can be equally spaced, as described earlier. One variation on the basic roulette wheel procedure is a
The second drawbackis that the exampleproblem exacer- process developed by Baker (1987) in which the portion of
bates the compression of fitness values situation described the roulette wheel is assigned based on each unique string's
earlier, because near the global optimum fitness value of 1, relative fitness. One spin of the roulette wheel then deter-
f(x) (which is also the fitness) is relativelyflat. There is thus mines the number of times each string will appear in the next
relatively little selection advantage for population members generation. To illustrate how this is done, assume the fitness
near the optimum value x = 128. If this situation is known values are normalized (sum of all equals 1). Each string is
to exist, a different representation schememight be selected, assigned a portion of the roulette wheel proportional to its
such as defming a new fitness function which is the function normalized fitness. Instead of one "pointer" on the roulette
outputraised to some power. wheel spun n times, there are n pointers spaced lin apart; the
Note that the shape of some functions "assists" discrimina- n-pointer assembly is spun once. Each of the n pointersnow
tion near the optimum value. For example, consider maxi- points to a string; each place one of the n pointers points

determines one population member in the next generation. tion to one set of environmental constraints, another indi-
If a string has a normalized fitness greater than lIn (corre- vidual might have evolved to deal better with another aspect
sponding to an expected value greater than 1), it is guaran- of survival. Perhaps one genetic line of rabbits has evolved
teed at leastone occurrence in the next generation. a winter coloration that protects it through the changing sea-
In the discussion thus far, it has been assumed that all of sons, while another has developed a "freeze" behavior that
the population members are replaced each generation. Al- makes it hard for predators to spot. Mating between these
though this is often the case, it sometimes is desirable to two lines of rabbitsmight result in offspring lacking both of
replace only a portion of the population, say, the 80 percent the advantages, offspring with one or the other characteristic
with the worstfitness values. The percentage of the popula- either totally or in some degree, and might end up with off-
tion replaced each generation is sometimes called the gen- spring possessing both of the advantageous traits. Selection
eration gap. will decide, in the long run, which of these possibilities are
Unless some provision is made, with standard roulette most adaptable; the ones that adapt, survive.
wheel selection it is possible that the individual with the Crossover is a term for the recombination of genetic in-
highest fitness value in a given generation may not survive formation during sexual reproduction. In GA, offspring
reproduction, crossover, and mutation to appear unaltered in have equal probabilities of receiving any gene from either
the new generation. It is frequently helpful to use what is parent, as the parents' chromosomes are combined ran-
called the elitist strategy, which ensures that the individual domly. In nature,chromosomalcombination leaves sections
with the highest fitness is always copied into the next gen- intact, that is, contiguoussections of chromosomes from one
eration. parent are combined with sectionsfrom the other, ratherthan
simply shuffling randomly. In GA there are many ways to
4.4.6 Crossover
The most important operator in GA is crossover, based on The two main attributes of crossoverthat can be variedare
the metaphor of sexual combination. (An operator is a rule the probabilitythat it occurs and the type of crossover that is
for changinga proposed problem solution.) If a solution is implemented. The following paragraphs examine variations
encoded as a bitstring, then mutation may be implemented of each.
by setting a probability threshold and flipping bits when a A crossover probability of 0.75 was used in the sample
random number is less than the threshold. As a matter of problem, and two-point crossover was implemented. Two-
fact, mutation is not an especially important operator in GA; point crossover with a probability of 0.60-0.80 is a rela-
it is usually set at a very low rate or omitted altogether. tively common choice, especially when Gray coding is used.
Crossover is more important, and adds a new dimension to The most basic crossover type is one-point crossover, as
the discussion of evolution so far. describedby Holland(1975/1992)and others, e.g., Goldberg
Other evolutionary algorithms use' random mutation (1989) and Davis (1991). It is inspiredby natural evolution
plus selection as the primary method for searching the land- processes. One-point crossover involves selecting a single
scape for peaks or niches. One of the greatest and most fun- crossover point at random and exchanging the portions of
damental search methods that biological life has found is the individual strings to the right of the crossover point.
sexual reproduction, which is extremely widespread Figure 4 illustrates one-point crossover; portions to be ex-
throughout both the animal and plant kingdoms. Sexual changed are in bold in Figure 4a.
reproduction capitalizes on the differences and similarities
among individuals within a species; where one individual
may have descended from a line that contained a good solu-

1 0 1 1 010 1 0 1 0 1 1 0 1 0 0
0 1 0 o 111 0 0 0 1 a o
1 0 1 0
(a) (b)

Figure 4: One-point crossoverbefore(a) and after (b) crossover.

Another type of crossover that has been found useful is the run flle can be set that increases the mutation rate sig-
called uniform crossover, described by Syswerda (1989). A nificantly when the variability in fitness values becomes
random decision is made at each bit position in the string as low, as is often the case late in the run.
to whether or not to exchange (crossover) bits between the
4.4.8 Final Comments on GeneticAlgorithms
parent strings. If a 0.50 probability at each bit position is
implemented, an average of about 50 percent of the bits in In sum, the genetic algorithm operates by evaluating a
the parent strings are exchanged. Note that a SO percent rate population of bitstrings (there are real-numbered GAs, but
will result in the maximum disruption due to uniform cross- binary implementations are more common), and selecting
over. Higher rates just mirror rates lower than SO percent. survivors stochastically based on their fitness, so fittermem-
For example, a 0.60 probability uniform crossover rate pro- bers of the population are more likely to survive. Survivors
duces results identical to a 0.40 probability rate. If the rate are paired for crossover, and often some mutation is per-
were 100 percent, the two strings would simply switch formed on chromosomes. Other operations might be per-
places, and if it were zero percent neither would change. formed as well, but crossover and mutation are the most
Values for the probability of crossover vary with the prob- important ones. Sexual recombination of genetic material is
lem. In general, values between 60 and 80 percent are com- a powerful method for adaptation.
mon for one-point and two-point crossover. Uniform The material on genetic algorithms in this tutorial has pro-
crossover sometimes works better with slighdy lower cross- vided only an introductionto the subject. It is suggested that
over probability. It is also common to start out running the the reader explore GAs further by sampling the references
GA with a relatively higher value for crossover, then taper cited in this section. With further study and application, it
off the value linearly to the end of the run, ending with a willbecomeapparent why GAshave such a devotedfollow-
value of, say, one-halfthe initial value. ing. In the words of Davis (1991):
"... [T]here is somethingprofoundly movingaboutlink-
4.4.7 Mutation inga genetic algorithm to a difficultproblemandreturn-
In GAs, mutation is the stochastic flipping of bits that oc- inglatertofind that the algorithm has evolveda solution
curs each generation. It is done bit-by-bit on the entire that is betterthan the one a humanfound With genetic
population. It is often done with a probability of something algorithms we are not optimizing,· we are creating condi-
like .001, but higher probabilities are not unusual. For ex- tions in whichoptimization occurs, as it may haveoc-
ample,Liepins and Potter (1991) used a mutation probability curredin the naturalworld. Onefeels a kind ofreso-
of .033 in a multiple-faultdiagnosis application. nanceat such timesthat is uncommon andprofound "
If the population comprises real-valued parameters, muta- This feeling, of course, is not unique to experiences with
tion can be implemented in different ways. For instance, in GAs;using other evolutionary algorithmscan result in simi-
an image classification application, Montana (1991) used lar feelings.
strings of real-valued parameters that represented thresholds
of event detection roles as the individuals. Each parameter 5. PARTICLESWARM OPTIMIZATION
in the string was range-limited and quantized (could take on
only a certain finite number of values). If chosen for muta- 5.1 Introduction
tion, a parameter was randomly assigned any allowed value
in the range of values valid for that parameter. This section presents an evolutionary computation method
The probability of mutation is often held constant for the for optimization of continuous nonlinear functions. The
entire run of the GA, although this approach will not pro- method was discovered by Jim Kennedy through simulation
duce optimal results in many cases. It can be varied during of a simplified social model; thus, the social metaphor is
the run, and if varied, usually is increased. For example, discussed, though the algorithm stands without metaphorical
mutation rate may start at .001 and end at .01 or so when the support. Much of the material appearing in this sectionwas
specifiednumber of generations has been completed. In the initially presented at two conferences (Kennedy and Eber-
software implementation described in Appendix B, a flag in hart 1995; Eberhart and Kennedy 1995). A special ac-
knowledgment is due to Jim Kennedy for origination of the simulations. Reynolds was intrigued by the aesthetics of
concept, and for many of the words in this section,whichare bird flocking choreography, and Heppner, a zoologist, was
his. Any mistakes, of course, are the responsibility of the interested in discovering the underlying rules that enabled
author. large numbers of birds to flock synchronously, often chang-
The particle swann optimization concept is described in ing directionsuddenly, scatteringand regrouping, etc.
terms of its precursors, and the stages of its development Both of these scientists had the insight that local processes,
from social simulation to optimizer are briefly reviewed. such as those modeled by cellular automata, might underlie
Discussed next are two paradigms that implement the con- the seemingly unpredictable group dynamics of bird social
cept, one globally oriented (GBESn and one locally ori- behavior. Both models relied heavily on manipulation of
ented (LBESn, followed by results obtained from applica- inter-individual distances; that is, the synchrony of flocking
tions and tests upon which the paradigms have been shown behavior was thought to be a function of birds' efforts to
to perform successfully. maintain an optimum distance between themselves and their
Particle swann optimization has roots in two main compo- neighbors.
nent methodologies. Perhaps more obvious are its ties to It does not seem a too-Iarge leap of logic to suppose that
artificial life (A-life) in general, and to bird flocking, fish some similar rules underlie the social behavior of animals,
schooling, and swarming theory in particular. It is also re- including herds, schools, and flocks, and that of humans. As
lated, however, to evolutionary computation, and it has ties sociobiologist E. O. Wilson (1975) haswritten, in reference
to both genetic algorithms and evolution strategies (Baeck to fish schooling, "In theory at least, individual members of
1995). the school can profit from the discoveries and previous ex-
Particle swarm optimizationcomprises a very simple con- perience of all other members of the school during the
cept, and paradigms are implemented in a few lines of com- search for food. This advantage can become decisive, out-
puter code. It requires only primitive mathematical opera- weighing the disadvantages of competition for food items,
tors, and is computationally inexpensive in terms of both whenever the resource is unpredictably distributed in
memory requirements and speed. Testing has found the im- patches." This statement suggests that social sharing of in-
plementations to be effective with several kinds of problems formation among conspeciates offers an evolutionaryadvan-
(Eberhart and Kennedy 1995). This section discusses appli- tage: this hypothesis was fundamental to the developmentof
cation of the algorithm to the training of neural network particleswann optimization.
weights. Particle swann optimizationhas also been demon- One motive for developing the simulation was to model
strated to perform well on genetic algorithm test functions. human social behavior, which is of course not identical to
The performance on Schaffer's /6 function, as described in fish schoolingor bird flocking. One important difference is
Davis(1991), is discussed. abstractness. Birds and fish adjust their physical movement
Particle swarm optimization can be used to solve many of to avoid predators, seek food and mates, optimize environ-
the same kinds of problems as genetic algorithms. This op- mental parameters such as temperature, etc. Humans adjust
timization techniquedoes not suffer, however, from some of not only physical movement, but cognitive or experiential
difficulties encountered with genetic algorithms; interaction variables as well. We do not usuallywalk in step and turn in
in the group enhances rather than detracts from progress unison (although some fascinating research in human con-
toward a solution. Further, a particle swarm system has formity shows that we are capable of it); rather, we tend to
memory, which a genetic algorithm population does not adjust our beliefs and attitudes to conform with those of our
have. Change in genetic populations results in destruction socialpeers.
of previous knowledge of the problem, except when elitism This is a major distinction in terms of contriving a com-
is employed, in which caseusually one or a small numberof puter simulation, for at least one obvious reason: collision.
individuals retain their "identities." In particle swarm opti- Two individuals can hold identical attitudes and beliefs
mization, individuals who fly past optima are tugged to re- without banging together, but two birds cannot occupy the
turn toward them; knowledge of good solutions is retained same position in space without colliding. It seems reason-
by all particles. able, in discussing human social behavior, to map the con-
cept of change into the bird/fish analogue of movement.
5.2 Simulating Social Behavior
This is consistent with the classic Aristotelian view of
A number of scientists have created computer simulations qualitative and quantitative change as types of movement
of various interpretations of the movements of organisms in Thus, besides moving through three-dimensional physical
a bird flock or fish school. Notably, Reynolds (1987) and space, and avoiding collisions, humans change in abstract
Heppner and Grenander (1990) developed bird flocking multidimensional space, collision-free. Physical space, of

course, affects informational inputs, but it is arguably a within the context of training a multilayer perceptron neural
trivial component of psychological experience. Humans network.
learn to avoid physicalcollision by an early age, but naviga-
5.4 Training a MultilayerPerceptron
tion of n-dimensional psychosocial space requires decades
of practice-and many of us never seem to acquire quite all
5.4.1 Introduction
the skills we need!
The problem of finding a set of weights to minimize re-
5.3 The ParticleSwarmOptimization Concept siduals in a feedforward neural network is not a trivial one.
As mentioned earlier, the particle swarm concept began as It is nonlinear and dynamic in that any change of one weight
a simulation of a simplified social milieu. The original in- may require adjustment of many others. Gradient descent
tent was to graphically simulate the graceful but unpredict- techniques, e.g., back-propagation of error, are usually used
able choreography of a bird flock. Initial simulations were to fmd a matrix of weights that meets error criteria, although
modified to incorporate nearest-neighbor velocity matching, there is not widespread satisfaction with the effectiveness of
eliminate ancillary variables, and incorporate multidimen- thesemethods.
sional search and acceleration by distance (Kennedy and A number of researchers have attempted to use genetic
Eberhart 1995). At some point in the evolutionof the algo- algorithms (GAs) to find sets of weights, but the problem is
rithm, it was realized that the conceptual model was, in fact, not well suited to crossover. Because a large number of
an optimizer. Througha process of trial and error, a number possible solutions exist, two chromosomes with high fitness
of parameters extraneous to optimization were stripped out evaluations are likely to be very different from one another.
of the algorithm, resulting in the very simple implementa- Therefore, recombination may not result in improvement.
tions described next. In this example, a three-layer network designed to solve
Particle swarm optimization is similar to a genetic algo- the XOR problem is used as a demonstrationof the particle
rithm (Davis 1991) in that the system is initialized with a swarm optimization concept. The network has two inputs,
population of random solutions. It is unlike a genetic algo- three hidden processing elements (PEs), and one output PEe
rithm, however, in that each potential solution is also as- The output PE returns a I if both inputs are the same, that is,
signed a randomized velocity, and the potential solutions, for input vector (1,1) or (0,0), and returns 0 if the inputs are
calledparticles, arethen "flown" through hyperspace. different (1,0) or (0,1). Counting bias inputs to the hidden
Each particle keeps track of its coordinates in hyperspace and output PEs, solution of this problem requires estimation
which are associated with the best solution (fitness) it has of 13 floating-point parameters. Note that, for the current
achieved so far. (The value of that fitness is also stored.) presentation, the number of hidden PEs is arbi1rary. A feed-
This value is called pbest. Another "best" value that is forward network with one or two hidden PEs can also solve
tracked by the global version of the particle swann opti- the XOR problem.
mizer is the overall best value, and its location, obtainedthus The particle swarm optimization approach is to "fly" a
far by any particlein the population. This is called gbest. population of particles through 13-dimensional hyperspace.
The particle swarm optimization concept consists of, at Eachparticle is initializedwith position and velocity vectors
each time step, changing the velocity (accelerating) each of 13 elements. For neural networks, it seems reasonable to
particle toward its pbest and gbest (global version). Accel- initialize all positional coordinates (corresponding to con-
eration is weighted by a random term, with separate random nectionweights) to within a range of (-I, I), and velocities
numbers being generated for acceleration toward pbest and should not be so high as to fly particles out of the usable
gbest. field. It is also necessaryto clamp velocities to some maxi-
There is also a local version of the optimizer in which, in mum to prevent overflow. The test examples use a popula-
addition to pbest, each particle keeps track of the best solu- tionof 20 particles for this problem. James Kennedyand the
tion, called lbest, attained within a local topological author have used populations of 10-50 particles for other
neighborhood of particles. Both the global and local ver- applications. The XOR data are entered into the net, and an
sions are describedin more detail later. error term to be minimized, usually squared error per output
The only variable that must be specifiedby the user is the PE, is computed for each of the 20 particles.
maximum velocity to which the particles are limited. An As the system iterates, individualagents are drawn toward
acceleration constant is also specified, but in the experience a global optimum based on the interaction of their individual
of the authors, it is not usually varied among applications. searches and the global or local group's public search. Error
Both the global and local versions of particle swarm opti- threshold and maximum iteration tennination criteria have
mizer implementations are introduced in the next section been specified. When these are met, or when a key is

pressed, iterations cease and the best weight vector found is velocity on that dimension is limited to VMAX. On any
written to a file. particular iteration, a good percentage of particles typically
are changing at this maximum rate, especially after a change
5.4.2 The GBESTModel in GBEST, when particles swann from one region toward
The standard"GBEST" particle swann algorithm, which is another.
the original form of particle swarm optimization developed, Thus, VMAX is a rather important parameter. It deter-
is very simple. The procedure listed below is for a minimi- mines, for instance, the fineness with which regionsbetween
zation problem. For maximization, reverse the inequality the present position and the target position will be searched.
signs in steps 3 and 4. The steps to run GBEST are: If VMAX is too high, particles might fly past good solu-
1. Initialize an array of particles with random positions and tions. On the other hand, if VMAX is too small, particles
velocitieson D dimensions. may not explore sufficiently beyond good regions. Further,
they could become trapped in local optima, unable to jump
2. Evaluate the desired minimization function in D vari- far enough to reach a better position in the data space.
ables. ACe_CONST represents the weighting of the stochastic
acceleration terms that pull each agent toward PBEST and
3. Compare evaluation with particle's previous best value GBEST positions. Thus, adjustment of this factor changes
PBESTD; if current value < PBEST[], then PBEST[] = the amount of "tension" in the system. Low values allow
current value and PBESTx[][d] = current position in D- agents to roam far from target regions before being tugged
dimensionalhyperspace. back, while high values result in abrupt movement toward
the target regions.
4. Compare evaluation with group's overall previous best
Table 1 shows results of manipulating VMAX and
(PBEST[GBEST]); if current value < PBEST[GBEST],
ACe_CONST for the GBEST model. Values in the table
then GBEST = particle's array index.
are median numbers of iterations of 20 observations. In
5. Change velocityby the following formula: V[][d] = some cases, the numbers of which are given in parentheses,
VD[d] + ACC_CNST*randO*(pBESTxD[d]- the swarm settled on a local optimum and remained there
Presentx[][d]) + until iterations equaled 2,000. This happened four times in
ACC_CNST*rand()*(pBESTx[GBEST][d] - the condition for VMAX=2.0 and ACC_CONST=2.0, and
PresentxO[d]) two times in the other two conditions for which VMAX=2.0,
plus once for VMAX=8.0. Medians are presented, rather
6. Move to PresentxO[d] + vO[d]; loop to step 2 and repeat than means, because the system was artificially stopped at
until a criterion is met. 2,000 iterations. These trials could probably have been run
to many more than 2,000 iterations; in any case, the means
5.4.3 Varying VMAX and ACC_CONST would have been inflated, while the effect on medians was
Particles' velocities on each dimension are. clamped by considerably less. Median values communicate the number
VMAX. If the sum of accelerations is greater than VMAX, of iterations that can be expected in SO percent of trials.
which is a system parameter specified by the user, then the

3.0 2.0 1.0 0.5
2.0 25.5(2) 22(4) 28(2) 37.5(2)
4.0 32(4) 19.5 20.5 34.5
6.0 29 19.5 36.5 33
8.0 31 27 29.5 23.5(1)

Table1: Median iterations required to meeta criterion of squaredCI1'Or per PE < .02. Population is 20 particles. Numbers in pa-
rentheses arethe numberof trials in which the system iterated 2,000times; in thesetrialsthe systemwas stuck in local optima.

In the neighborhood = 2 model, for instance, particle( i)
5.4.4 The LBEST Version
compares its error value with particle(i-I) and particle( i +
Based, among other things, on findings from social 1). The lbest version was tested with neighborhoods of
simulations, it was decided to design a "local" version (para- various sizes. Test results are summarized in the following
digm) of the particle swarm concept. In this paradigm, par- tables for neighborhoods consistingof the immediately adja-
ticles have information only of their own and their nearest cent neighbors (neighborhood = 2) and of the three neigh-
neighbors' bests,rather than that of the entire group. Instead bors on each side (neighborhood= 6).
of moving toward the stochastic average of pbest and gbest Table 2 shows results of performance on the XOR neural
(the best evaluation of the entire group), particles move to- networkproblemwith neighborhood =2. Note that no trials
ward the points defined by pbest and "lbest,' which is the fixated on local optima-nor have any in hundreds of unre-
index of the particle with the best evaluation in the ported tests.

2.0 1.0 0.5
2.0 38.5 47 37.5
4.0 28.5 33 53.5
6.0 29.5 40.5 39.5

Table2: Local version of particle swann with a neighborhood of two. Showsmedian iteraiionsrequiredto meeta criterionof
squared error per PE < .02 with a populationof 20 particles. There were no trials with more than 2,000 iterations.

Cluster analysis of sets of weights from this version tions on average to find a criterion error level. Table 3
showed that blocks of neighbors, consisting of regions of represents tests of an LBEST version with neighborhood =
from two to eightadjacent particles (individuals),had settled 6, that is, with the three neighbors on each side of the parti-
into the same regions of the solution space. It appears that cle taken into account (with arrays wrapped, so the fmal
the relative invulnerability of this version to local optima particlewas consideredto be beside the first one).
might result from the fact that a number of "groups" of par- This version is prone to local optima,at least when VMAX
ticles spontaneously separate and explore different regions. is small, though less so than the GBESTversion. Otherwise
It thus appears to be a more flexible approach to information it seems, in most cases, to perform somewhat less well than
processingthanthe GBEST model. the standardGBEST algorithm.
Nonetheless, though this version rarely if ever becomes
entrapped in a local optimum, it clearly requires more itera-

2.0 1.0 0.5
2.0 31.5(2) 38.5(1) 27(1)
4.0 36(1) 26 25
6.0 26.5 29 20

Table3: Local version of particle swann with a neighborhood of six. Median iteratioDS required to meet a criterion of squared er-
ror per PE < .02 with a populationof20 particles. Numbersin parenthesesare Dumber of trials in which the system iterated2,000
times; in these trials the system was stuck in local optima.

In sum, the neighborhood = 2 model offers some intrigu- convergence, but introduces the frailties of the GBEST
ing possibilities, in that it seems immune to local optima. It model.
is a highly decentralized model, which can be run with any
numberof particles. Expanding the neighborhood speeds up

6. CONCLUSIONS few lines of code, and requires only specification of the
problemand a few parametersin order to solve it.
This section has discussed the particle swarm concept and
examinedhow changes in the paradigm affect the number of
iterations required to meet an error criterion, and the fre-
quency with which models cycle intenninablyaround a non- Portions of this tutorial were adapted from Eberhart et al.
global optimum. Three versions were discussed: the (1996); the permission of Academic Press is acknowledged.
GBEST model in which each particle has information about Other portions have been adapted from the upcoming book
the population's best evaluation, and two variations of the entitled Swarm Intelligence (Kennedy et al. 2000). The kind
LBEST version, one with a neighborhoodof 6, and one with permission of Morgan Kaufmann Publishers is acknowl-
a neighborhood of 2. It appears that the original GBBST edged. Finally, the input of Jim Kennedy is gratefully ac-
version often performs best in terms of median number of knowledged.
iterations to converge, while the LBEST version with a
neighborhoodof 2 seems most resistant to local minima.
Particle swarm optimization is' an extremely simple algo- 8. SELECTED BIBLIOGRAPHY
rithm that seems to be effective for optimizing a wide range
of functions. The authors view it as a midlevel form of A-
life or biologically derived algorithm, occupying the space Baeck, T., and H.-P. Schwefel (1993). An overview of evo-
in nature between evolutionary search, which requires eons, lutionaryalgorithmsfor parameter optimization. Evolu-
and neural processing, which occurs (as far as we now tionary Computation, 1(1): 1-23.
know) on the order of milliseconds. Social optimization Baeck, T. (1995). Generalized convergence modelsfor
occurs in the time frame of ordinary experience - in fact, it tournament and (mu, lambda) selection. Proc. of the
is ordinary experience. Sixth International Conference on Genetic Algorithms,
In addition to their ties with A-life, particle swarm para- Morgan Kaufmann Publishers, San Francisco, CA, 2-7.
digms have obvious ties with evolutionary computation. Bagley, J. D. (1967). The behavior of adaptive systems
Conceptually, they seem to lie somewhere between genetic which employ genetic and correlation algorithms. Ph.D.
algorithms and evolution strategies. They are highly de- Dissertation, UniversityofMichigan, Ann Arbor, MI.
pendent on stochastic processes, like evolution strategies. Baker, J. A. (1987). Reducing bias and inefficiency in the
The adjustment toward pbest and gbest by a particle swann selection algorithm. Proc. 2nd Inti. Con! on Genetic
is conceptually similar to the crossover operationutilized by Algorithms: Genetic Algorithms and Their Applications,
genetic algorithms. They use the concept ofjitness, as do all Lawrence ErlbaumAssociates, Hillsdale, NJ.
evolutionarycomputationparadigms. Bezdek, J. C., S. Boggavarapu, L. O. Hall, and A. Bensaid
Unique to the concept of particle swarm optimization is (1994). Genetic algorithm guided clustering. Proc.
flying potential solutions through hyperspace, accelerating Int'l. Conf. on Evolutionary Computation, IEEE Service
toward "better" solutions. Other evolutionary computation Center, Piscataway,NJ, 34-39.
schemes operate directly on potential solutions, which are Davis, L., Ed. (1991). Handbook of Genetic Algorithms.
represented as locations in hyperspace. Much of the success Van Nostrand Reinhold,New York, NY.
of particle swarms seems to lie in the particles' tendency to De Jong, K. A. (1975). An analysis of the behavior of a
hurtle past their targets. In his chapter on the optimum allo- class of geneticadaptive systems. Doctoral dissertation,
cation of trials, Holland (1992) discusses the delicate bal- University of Michigan.
ance between conservative testing of known regions and Eberhart, R. C., and J. Kennedy (1995). A new optimizer
risky exploration of the unknown. It appears that the parti- using particle swarm theory. hoc. Sixth Inti. Sympo-
cle swarm paradigms allocate trials nearly optimally. The sium on Micro Machine and Human Science (Nagoya,
stochastic factors allow thorough search of spaces between Japan), IEEE Service Center, Piscataway,NJ, 39-43.
regions that have been found to be relatively good, and the Eberhart, R. C., P. K. Simpson and R. W. Dobbins (1996).
momentum effect caused by modifying the extant velocities Computational Intelligence PC Tools. Academic Press
rather than replacing them results in overshooting, or explo- Professional,Boston, MA.
ration of unknown reions of the problem domain. Fogel, D. B. (1991). System Identification Through Simu-
Much further research remains to be conducted on this lated Evolution: A Machine Learning Approach to
simple new concept and paradigms. The goals in developing Modeling. Ginn Press, Needham Heights, MA.
them have been to keep them simple and robust; these goals
seem to have been met. The algorithm is written in a very
Fogel, D. B. (1995). Evolutionary Computation: Towarda Holland, J. H. (1962). -Outline for a logical theory of adap-
New Philosophy of Machine Intelligence. IEEE Press, tive systems. Journal ofthe Associationfor Computing
Piscataway,NJ. Machinery, 3:297-314.
Fogel, D. B. (2000). What is evolutionary computation? Holland, J. H. (1992) [orig. ed. 1975].. Adaptation in Natu-
IEEE Spectrum, 37(2). ral and Artificial Systems. MIT Press, Cambridge, MA.
Fogel, L. J., A. J. Owens, and M. J. Walsh (1966). Artificial Kennedy, J., and R. C. Eberhart (1995). Particle swarm op-
Intelligence through Simulated Evolution, John Wiley, timization. hoc. IEEE Inti. Conf on Neural Networks
New York, NY. (Perth, Australia), IEEE ServiceCenter, Piscataway, NJ,
Fogel, L. J. (1994). Evolutionary programming in perspec- IV:1942-1948.
tive: the top-down view. In Computational Intel- Kennedy, J., R C. Eberhart, and Y. Shi (2000). Swarm In-
lingece: Imitating Life, J. M. Zurada, R. J. Marks II and telligence. Morgan Kaufmann Publishers, San Fran-
c. 1. Robinson, Eds., IEEE Press, Piscataway, NJ, 135- cisco,CA (in press). .
146. Koza, 1. R. (1992). Genetic Programming: On the Pro-
Fraser, A. S. (1957). Simulation of genetic systems by gramming ofComputers by Means ofNatural Selection.
automatic digital computers. Australian Journal ofBio- MIT Press, Cambridge, MA.
logical Science, 10:484-499. Levy, S. (1992). Artificial Life. Random House, New York,
Fraser, A. S. (1960). Simulation of genetic systems by NY.
automatic digital computers: 5-linkage, dominance and Liepins, G. E. and W. D. Potter (1991). A genetic algorithm
epistasis. In Biometrical Genetics, O. Kempthome, Ed., approach to multiple-fault diagnosis. In Handbook of
Macmillan,New York, NY, 70-83. GeneticAlgorithms, L. Davis, Ed., Van Nostrand Rein-
Fraser, A. S. (1962). Simulation of genetic systems. Jour- hold, New York, NY.
nal ofTheoretical Biology, 2:329-346. Michalewicz, Z., and M. Michalewicz (1995). Pro-life ver-
Friedberg, R. M. (1958). A learning machine: Part I. IBM sus pro-choice strategies in evolutionary computation
Journal ofResearch and Development, 2:2-13. techniques. In Computational Intelligence: A Dynamic
Friedberg, R. M., B. Dunham, and J. H. North (1959). A System Perspective, M. Palaniswami, Y. Attikiouzel, R.
leaming machine: Part II. IBM Journal ofResearch and Marks, D. Fogel and T. Fukuda, Eds., IEEE Press, Pis-
Development, 3:282-287. cataway,NJ, 137-151.
Goldberg,D. E. (1983). Computer-aided gas pipeline opera- Mitchell, M. (1996). An Introduction to Genetic Algo-
tion using genetic algorithms and rule learning (doctoral rithms. Cambridge, MA: M. I. T. Press.
dissertation, University of Michigan). Dissertation Ab- Montana, D. J. (1991). Automated parameter tuning for
stracts International, 44(10), 3174B. interpretation of synthetic images. In Handbook of Ge-
Goldberg, D. E. (1989). Genetic Algorithms in Search, Op- netic Algorithms, L. Davis, Ed., Van Nostrand Rein-
timization, and Machine Learning. Addison-Wesley, hold, New York, NY.
Reading, MA. Pedrycz,W. (1998). Computational Intelligence: An Intro-
Grefenstette, J. 1. (1984a). GENESIS: A system for using duction. Boca Raton, FL: CRC Press.
genetic search procedures. Proc. of the 1984 Conf on Rechenberg, I. (1965). Cybernetic solution path of an ex-
IntelligentSystemsand Machines, 161-165. perimental problem. Royal Aircraft Establishment, li-
Grefenstette, J. J. (1984b). A user's guide to GENESIS. brary translation 1122,Famborough,Hants, U.K.
Technical Report CS-84-11, Computer Science Dept., Rechenberg, I. (1973). Evolutionsstrategie: Optimierung
VanderbiltUniversity,Nashville, TN. techntscher Systeme nach Prinzipten der biologischen
Grefenstette, J. J., Ed. (1985). Proc. of an International Evolution, Frommann-Holzboog Verlag, Stuttgart, Ger-
Conference on Genetic Algorithms and Their Applica- many.
tions. LawrenceErlbaum Associates, Hillsdale, NJ. Rechenberg, I. (1994). Evolution strategy. In J. Zurada, R.
Haupt, R. and Haupt, S. (1998). Practical Genetic Algo- Marks, II, and C. Robinson, Eds., ComputationalIntel-
rithms. New York: John Wiley and Sons. ligence-Imitating Life, IEEE Press, Piscataway, NJ,
Heppner, F., and U. Grenander (1990). A stochastic nonlin- 147-159.
ear model for coordinated bird flocks. In S. Krasner, Reynolds, C. W. (1987). Flocks, herds, and schools: A
Ed., The Ubiquity of Chaos. AAAS Publications, distributed behavioral model. Computer Graphics,
Washington, DC. 21(4):25-34.
Schaffer,J. D. (1984). Some experiments in machine learn-
ing using vector evaluated genetic algorithms. Unpub-
lished doctoral dissertation, Vanderbilt University, Smith1980 Smith, S. F. (1980). A learning system based
Nashville, TN. on genetic adaptive algorithms. Unpublished doctoral
Schwefel, H.-P. (1965). Kybemetische Evolution als Strate- dissertation, University of Pittsburgh, Pittsburgh, PA.
gie der experimentellen Forschung in der Stromung- Syswerda, G. (1989). Uniform crossover in genetic algo-
stechnik. Diploma thesis, Technical University of Ber- rithms. In Proc. ofthe Third Int'l. Conf. on Genetic Al-
lin, Germany, gorithms, J. D. Schaffer, Ed., Morgan Kaufmann Pub-
Schwefel, H.-P. (1994) On the evolution of evolutionary lishers,San Mateo, CA.
computation. In J. M. Zurada, R. J. Marks II and C. J. Wilson, E. O. (1975). Sociobiology: The New Synthesis.
Robinson, Eds., Computational Intelligence: Imitating BelknapPress,.Cambridge, MA.
Life. IEEE Press, Piscataway,NJ.

Chapter 2
Overview of Applications in Power Systems

Abstract--This survey covers the broad area of evolutionary For discrete optimization problems, classical mathematical
computation applications to optimization, model identification, and programming techniques have collapsed for large-scale
control in power systems. Almost all reviewed papers have been problems. Optimizationalgorithmssuch as "branch and bound"
published in the IEEETransactions and the lEE Proceedings. A total of and dynamic programming, which seek for the best solution,
146 articles are listed in this survey. It shows the development of the
have no chance in dealing with the above mentioned problems,
area and identifies the current trends. The following techniques are
considered under the scopeof evolutionary computation: evolutionary unless significant simplifications are assumed (e.g., besides the
algorithms (e.g., genetic algorithms, evolution strategies, evolutionary curseof dimensionality, dynamicprogramming has difficulty in
programming, and genetic programming), simulated annealing, tabu dealing with time-dependent constraints). Problems such as
search, and particleswarmoptimization. generation scheduling have the typical features of large-scale
combinatorial optimization problems, i.e., NP-complete
IndexTerms--Survey, evolutionary computation, powersystems. .problems cannot be solved for the optimal solution in a
reasonable amount of time. For this class of problems, general
1. OPTIMIZATION purpose heuristic search techniques (problem independent),
such as BC, have been very efficient for fmding near optimal
Optimization is the basic concept behind the application of
solutions in reasonable time.
evolutionary computation (EC) to any problem in power
systems [1], [2]. Besides the problems in which optimization
itself is the final goal [3]-[89], it is also a means for modeling/ 2. POWER SYSTEMAPPLICAnONS
forecasting [109]-[119], control [120]-[135], and simulation
[145], [146]. Optimization models can be roughly divided in Generation scheduling is one of the most popular
two classes: continuous (involving real variables only) and applications of EC to power systems [3]-[31]. The pioneerwork
discrete (with at least one discrete variable). The objective of Zhuang and Galiana [3] inspired subsequent papers on the
function(s) (single or multiple) and the constraints of the application of general-purpose heuristic methods to unit
problem can be linear or nonlinear, convex or concave. commitment and hydrothermal coordination.Realisticmodeling
Optimization techniques have been applied to severalproblems is possible when solving these problems with such methods.
in power systems. Thennal unit commitment / hydrothermal The general problem of generation scheduling is subject to
coordination and economic dispatch I optimal power flow, constraints such as power balance, minimum spinning reserve,
maintenance scheduling, reactive sources allocation and energy constraints, minimum and maximum allowable
expansion planning are amongthe most important applications. generations for each unit, minimum up- and down-times of
Modem heuristic search techniques such as evolutionary thennal generation units, ramp rates limits for thermal units,
algorithms are still not competitive for continuous optimization and level constraints of storage reservoirs. Incorporation of
problems such as economic dispatch and optimal power flow. crew constraints, take-or-pay fuel contract [7], water balance
Successive linear programming, interior point methods, constraints caused by hydraulic coupling [9], rigorous
projected augmented Lagrangian, generalizedreduced gradient, environmental standards [13], multiple-fuel-constrained
augmented Lagrangian methods and sequential quadratic generation scheduling [16], and dispersed generation and
programming have all a long history of successful applications energystorage [21] is possiblewith BC techniques.
to this type of problem. However, the trade-off between A current trend is the utilization of Ee only for dealing with
modeling precisionand optimality has to be takeninto account. the discrete optimization problem of deciding the on/off status
One good example is when the input-output characteristics of the generation units. Mathematical programming techniques
of thermal generators are highly nonlinear (non-monotonically are employed to perform the economic dispatch, while meeting
increasing) due to effectssuch as "valve points" [32],[40],[42]. all plant and system constraints. Expansion planning is another
In this situation, their incremental fuel cost curves cannot be area which has been extensively studied with the EC approach
reasonably approximated by quadratic or piecewise quadratic [55]-[89]. The transmission expansion planning for the North-
(or linear) functions. Therefore, traditional optimization Northeastern Brazilian network, for which optimal solution is
techniques, although achieving mathematical optimality, have unknown, has been evaluated using genetic algorithms (GAs).
to sacrificemodelingaccuracy,providing sub-optimal solutions The estimated cost is about 8.SOA» less than the best solution
in a practicalsense. Evolutionary computationalgorithms allow obtained by conventional optimization [78]. Economic load
precise modeling of the optimization problem, although usually dispatch / optimal power flow [32]-[50] and maintenance
not providing mathematically optimal solutions, but nesr scheduling [51]-[54] have also been solved by EC methods.
optimal ones. Another advantage of using EC for solving Another interesting application is the simulation of energy
optimization problems is that the objective function does not markets[145], [146].
have to be differentiable.
3. MODELIDENTIFICATION essential for feeding the analytical methods used for
determining energy prices. The variability and nonstationarity
System identification methods can be applied to estimate of loads are gettingworse due to the dynamicsof energytariffs.
mathematical models based on measurements. Parametric and Besides, the number of nodal loads to be predicted does not
non-parametric are the two main classes in system identification allow frequent interventions from load forecasting specialists.
methods. The parametric methods assume a known model More autonomous load predictors are needed in the new
structure with unknown parameters. Their performances depend competitive scenario.
on a good guess of the model order, which usually requires With power systems growth and the increase in their
previous knowledge of the system characteristics. System complexity, many factors have become influential to the electric
identification can be used for modeling a plant or a problem power generation and consumption (load management, energy
solution (e.g., pattern recognition [112], [118]). The following exchange, spot pricing, independent power producers, non-
sections show a few examples of successful applications of Ee conventional energy, etc.). Therefore, the forecasting process
to identification. has become even more complex, and more accurate forecasts
are needed. The relationshipbetween the load and its exogenous
3.1. Dynamic Load Modeling factors is complex and nonlinear, making it quite difficult to
model through conventional techniques, such as time series
The fundamental importance of power system components linear modelsand linear regression analysis. Besidesnot giving
modeling has been shown in the literature. Regardless of the the required precision, most of the traditional forecasting
study to be performed, accurate models for transmission lines, techniques are not robust enough. They fail to give accurate
transformers, generators, regulators and compensators have predictionswhen quick weather changes occur. Otherproblems
already been proposed. However, the same has not occurred for include noise immunity, portability and maintenance.
loads. Although the importance of load modeling is well Linear methods interpret all regular structure in a data set,
known, especially for transient and dynamic stability studies, such as a dominantfrequency, as linear correlations. Therefore,
the random nature of a load composition makes its linear models are useful if and only if the power spectrum is a
representation very difficult. useful characterization of the relevant features of a time series.
Two approaches have been used for load modeling. In the Linear modelscan only represent an exponentially growingor a
first one,based on the knowledge of the individualcomponents, periodically oscillating behavior. Therefore, all irregular
the load model is obtained through the aggregation of the load behavior of a system has to be attributed to a random external
components models. The second approach does not require the input to the system. Chaos theory has shown that random input
knowledge of the load physical characteristics. Based on is not the only possible source of irregularity in a system's
measurements related to the load responses to disturbances, the output
modelis estimated using system identificationmethods. The goal in creating an ARMA model is to have the residual
The composition approach requires infonnation that is not as white noise [111]. This is equivalent to produce a flat power
generally available, which consists in a disadvantage of this spectrumfor the residual. However, in practice,this goal cannot
method. This approach does not seem to be appropriate since be perfectly achieved. Suspicious anomalies in the power
the determination of an average (and precise) composition for spectrum are very common, i.e., the residual's power spectrum
each load bus of interest is virtually impossible. The second is not really flat. Consequently, it is difficult to say if the
approach does not suffer from this drawback, since the load to residual corresponds to white noise or if there is still some
be modeledcan be assumed a black-box. However, a significant useful information to be extracted from the time series. Neural
amount of data related to staged tests and natural disturbances networks can find predictable patterns that cannot be detected
affectingthe systemneeds to be collected. by classical statistical tests such as auto(cross)correlation
Considering the shortcomings of the two approaches, and coefficientsand power spectrum.
the fact that data acquisition and processing are becoming very Besides, many observed load series exhibit periods during
cheap, it seems that the system identification approach is more which they are less predictable,depending on the past history of
in accordance to current technology. This approach allows real- the series. This dependence on the past of the series cannot be
time load monitoring and modeling, which are necessary for on- represented by a linear model [119]. Linear models fail to
line stability analysis. As the dynamic characteristics of the consider the fact that certain past histories may permit more
loads are highly nonstationary, structural adaptation of the accurate forecasting than others. Therefore, differently from
corresponding mathematical models is necessary. Evolutionary nonlinear models, they cannot identify the circumstances under
computation can be used in this adaptive process, searching for which more accurate forecasts can be expected.
new model structures and parameters. Examples of this The neural networks (NNs) ability in mapping complex
possibility are describedin [113] and [115]. nonlinearrelationships is responsiblefor the growing number"of
their applications to load forecasting. Several electric utilities,
3.2. Short-Term Load Forecasting allover the world, have been applying NNs to short-term load
forecasting in an experimental or operational basis. Despite
The importance of short-term load forecasting has increased, their success, there are still some technical issues that surround
lately. With deregulation and competition, energy price the application of NNs to load forecasting, particularly with
forecasting has become a big business. Load bus forecasting is regard to parameterization. The main issue in the application of
feedforward NNs to time series forecasting is the question of later. Therefore, pruning should be appliedas -a complementary
how to achieve good generalization. The NN ability to procedure to growing methods, in order to remove parts of the
generalize is extremely sensitive to the choice of the network's modelthat becomeunnecessary during the constructive process.
architecture, preprocessing of data, choice of activation
functions, number of training cycles, size of training sets, 3.3.2. Types of Approximation Functions
learning algorithm and the validationprocedure.
The greatest challenges in NN training are related with the The greatest concern when applying nonlinear NNs is to
issues raised in the previous paragraph. The huge number of avoid unnecessary complex models to not overfit the training
possible combinations of all NN training parameters makes its patterns. The ideal model is the one that matches the
application not very reliable. This is especially true when a complexity of the available data. However, it is desirable to
nonstationary system has to be tracked, i.e., adaptation is work with a general model that could provide any required
necessary, as it is the case in load forecasting. Nonparametric degreeof nonlinearity. Amongthe models that can be classified
NN models have been proposed in the literature [114]. With as universal approximators, i.e., the ones which can
nonparametric modeling methods, the underlying model is not approximate any continuous function with arbitrary precision,
known, and it is estimated using a large number of candidate the following types are the most important:
models to describe available date, Application of this kind of - multilayernetworks;
model to short-term load forecasting has been neglected in the - local basis functionnetworks;
literature. Although the very first attempt to apply this idea to - trigonometric polynomials; and
short-term load forecasting dates back to 1975 [109], it is still - algebraicpolynomials.
one of the few investigations on this subject, despite the The universal approximators above can be linear, although
tremendous increasein computationalresources. nonlinear in the inputs, or nonlinear in the parameters.
Regularization criteria, aDalytic (e.g., Akaike's Information
3.3. NeuralNetworkTraining Criterion, Minimum Description Length, etc.) or based on
resampling (e.g., cross-validation), have been proposed. In
The main motivation for developing nonparametric NNs is practice, model regularization considering nonlinearity in the
the creation of fully data driven models, i.e., automatic selection parameters is very difficult. An advantage of using universal
of the candidate model of the right complexity to describe the approximators that are linear in the parameters is the possibility
trainingdata. The idea is to leave for the designer only the data- of decoupling the exploration of architecture space from the
gathering task. Obviously, the state of the art in this area has not weight space search. Methods for selecting models with
reach that far. Every so-called nonparametric model still has nonlinearity in the parameters attempt to explore both spaces
some dependence on a few pre-set training parameters. A very simultaneously, which is an extremely hard non-convex
useful byproduct of the automatic estimation of the model optimization problem.
structure is the selectionof the most significant input variables
for synthesizing a desired mapping. Input variable selection for 4. CONTROL
NNs has been performed using the same techniques applied to
linear models. However, it has been shown that the best input Another importantapplicationofBe in power systems is the
variables for linear models are not among good input variables parameter estimation and tuning of controllers. Complex
for nonlinearones. systems cannot be efficiently controlled by standard feedback,
since the effects caused by plant parameter variations are not
3.3.1. PruningVersusGrowing eliminated. Adaptive control can be applied in this case, i.e.,
when the plant is time invariant with partially unknown
Nonparametric NN training uses two basic mechanisms for parametersor the plant is timevariant.
fmdingthe most appropriate architecture: pruning and growing. In practical applicatioDS, it is difficult to express the real
Pruning methods assume that the initial architecture contains plant dynamics in mathematical equations. Adaptive control
the optimal structure. It is common practice to start the search schemes can adjust the controller according to process
using an oversized network. The excessive connections and/or characteristics, providing a high performance level. The
neuronshave to be removed during training, while adjusting the adaptive control problem is concerned with the dynamic
remaining parts. The pruning methods have the following adjustment of the controller parameters, such that the plant
drawbacks: output follows the referencesigDal.
- there is no mathematicallysound initializationfor the neural However, conventional adaptive control has some
network architecture, therefore initial guesses usually use drawbacks. Existent adaptive control algorithms work for
very large structures; and specific problems.They do not workas well for a widerange of
- due to the previousargument, a lot of computational effort is problems. Every application must beaDalyzed individually, i.e.,
wasted. a specific problem is solved using a specificalgorithm. Besides,
As the growing methods operate in the opposite direction of a compatibility study between the model and the adaptive
the pruning methods, the shortcomings mentioned before are algorithm has to be performed.
overcome. However, the incorporation of one elementhas to be The benefits of coordination betweenvoltage regulation and
evaluated independently of other elements that could be added damping enhancements in power systems are well known. This
problem has been aggravated as power networks operational 7.1. Surveys
margins decrease. GAs can be used to simultaneously tune the
parameters of automatic voltage regulators with the parameters [1] V. Miranda, D. Srinivasan, and L.M. Proenea; "Evolutionary
computation in power systems", 12th Power Systems Computation
of power system stabilizers and terminal voltage limiters of Conference, August 1996,Vol. 1, pp. 25-40..
synchronousgenerators [120]-[135]. [2] K. Nara: "State of the arts of the modern heuristics application to power
One of the objectives is to maximize closed-loop system systems", IEEE PES Winter Meeting, January 2000, Vol. 2, pp. 1279-
damping, which is basically achieved by tuning the stabilizers 1283.
parameters. The second objective is to minimize terminal
voltage regulating error, which is _mainly accomplished by 7.2. GenerationScheduling
tuning the automatic voltage regulators and terminal voltage
limiters parameters. Minimum closed-loop damping and F. Zhuang and F. Galiana: "Unit commitment.by simulatedannealing",
IEEE Transactions on-PowerSystems, Vol. S, No.1, February 1990,pp.
maximum allowable overshoot are constraints that can be 311-317.
incorporated into the optimization problem. The design of the [4] D. Dasgupta and D.R. McGregor: "Thennal unit commitment using
control system takes into account system stabilization over a genetic algorithms", lEE Proceedings - Generation, Transmission and
Distribution, Vol. 141,No. S, September 1994,pp. 459-465.
pre-specified set of operating conditions (nominal and [51
K.P. Wongand Y.W. Wong:"Thermal generatorschedulingusing hybrid
contingencies). geneticlsimulated-8DDe8ling approach", lEE Proceedings - Generation,
Transmission and Distribution, Vol. 142,No.4, July 1995,pp. 372-380.
5. CONCLUSIONS - [6] S.A Kazarlis, A.G. Bakirtzis, and V. Petridis: "A genetic algorithm
solutionto the unit commitment problem", IEEE Transactions on Power
Systems, Vol. 11,No.1, February 1996,pp. 83-92.
This survey has covered the papers published in the IEEE [7] K.P. Wong and Y.W.W. Suzannah: "Combined genetic algorithm I
Transactions and lEE Proceedings on BC applications to power simulated annealing I fuzzy set approach to short-term generation
systems. System expansion planning and generationscheduling scheduling with take-or-pay fuel contract", IEEE Transactions on Power
Systems, Vol. 11, No.1, February 1996, pp. 128-136.
have been the most carefully investigated problems. Great [8] x. Bai and S.M. Shahidehpour: "Hydro-thennal scheduling by tabu
progress) particularly for the transmission system expansion search and decomposition method", IEEE Transactions on Power
planning, has been reported. Distribution systems have also Systems, Vol. 11, No.2, May 1996,pp. 968-974.
attracted significantattention. [9] P.-H. Chenand H.-C. Chang: "Genetic aided scheduling ofbydraulically
Regarding different Be techniques, GAs are by far the most coupled plants in hydro-thennal coordination", IEEE Transactions on
Power Systems, Vol. 11,No.2, May 1996,pp. 975-981.
popular. The application of GAs to large-scaleproblems is still [10] P.-e. Yang, H.-T. Yang, and C.-L. Huang: "Scheduling short-term
in progress. Only a few truly large power systems have been hydrotbermal generation using evolutionary programming techniques",
used for testing the EC approach. Evolutionary programming, lEE Proceedings - Generation, TraDSmission and Distribution, Vol. 143,
although less popular than GAs) has shown great potential for [11] T.T. Maifeld July 1996, pp. 371-376.
and G.B. Sheble: "Genetic-based unit commitment
tackling complexpractical applications. algoritbm", IEEE Transactions on Power Systems, Vol. 11, No.3,
After recognizing the limitations of the initial ideas, the EC August1996,pp. 1359-1370.
research community has merged many ideas that had been [12] D. Srinivasan and A.G.B. TettamaDzi: "Heuristics-guided evolutionary
independently proposed. In fact, algorithmsbased on evolution approach to multiobjective pneration scheduling", lEE Proceedings -
Generation, TlBDSIDission and Distribution, Vol. 143, No.6, November
theory are becoming prettymuch similar. Simulated annealing, 1996,pp. 553-559.
although the oldest heuristic search technique applied to power [13] D. Srinivasan and A.G.B. TettamaDzi: 66An evolutiooary algorithm for
systems, has shown limited capacity for dealing with large-scale evaluation of emission compliance options in view of the clean air act
problems. Tabu search and particle swarm still need more amendments", IEEE Transactions on Power Systems, Vol. 12, No.1,
February1997,pp. 336-341.
empirical evidence of their potential for solving such problems. [14] 8.-J. Huang and C.-L. Huang: "Application of genetic based neural
There is a clear trend towards the combination of EC networks to thennal unit commitmentn , IEEE Transactions on Power
methods among themselves, and with classical mathematical Systems, Vol. 12, No.2, May 1997, pp. 654-660.
programming techniques (e.g., [5], [7], [8], [16], [24], [28]). [IS) H.-T. Yang, P.-C. Yang, and C.-L. Huang: 66A parallel geneticalgorithm
to solving the unit commitment problem: implementation on the
The challenge for power engineers is to mcorporate domain- transputer networks", IEEETransactions on PowerSystems, Vol. 12, No.
specific knowledge into the heuristic search process, without 2, May 1997,pp. 661-668.
deteriorating the Ee exploration capability. A good start has (16) K.P. Wong and Y.W.W. Suzalmah: "Hybrid genetic I simulated
been made,but the real challenges still lie ahead. annealing approach to short-term multiple-fuel coDStraint generation
scheduling", IEEE Transactions on Power Systems, Vol. 12,No.2, May
1997,pp. 776-784.
6. ACKNOWLEDGMENTS [17] A.H. Mantawy, Y.L. Abdel-Magid, and S.Z. Selim: "Unit commitment
by tabu search", lEE Proceedings - Generation, Transmission and
Distribution, Vol. 145,No.1, January 1998,pp. 56-64.
This work was supported by the Brazilian Research Council [18] A.H. Mantawy, Y.L. Abdel-Magid, and S.l. Selim: "A simulated
(CNPq) under grant No. 300054/91-2. Alves cia Silva would annealing algorithm for unit commitmeDt", IEEE Transactions on Power
also like to thank PRONEX for the financial support and the Systems, Vol. 13,No.1, February 1998,pp. 197-204.
editors for their time and effort reviewing this chapter. [19] S.O. Orero and M.R.. Irving: "A genetic algorithm modellingframework
aDd solution technique for short term optimal hydrothermal scheduling",
IEEE Transactions on Power Systems, VoL 13, No.2, May 1998, pp.
7. REFERENCES 501-518.

[20J H.-C. Chang and P.-H. Chen: "Hydrothennal generation scheduling IEEETransactions on Power Systems, Vol. 11, No.1, February1996,pp.
package: a genetic based approach", lEE Proceedings - Generation, 112-118.
Transmission and Distribution, Vol. 145,No.4, July 1998,pp. 451-457. [41] rx. Xu, C.S. Chang; and X.W. Wang: "Constrained multiobjective
[21] I.F. MacGilJ and R.J. Kaye: "Decentralised coordinationof powersystem global optimisation of longitudinal interconnected power system by
operation using dual evolutionary programming", IEEE Transactions on genetic algorithm", lEE Proceectings - Generation, Transmission and
PowerSystems, Vol. 14,No.1, February 1999,pp. 112-119. Distribution, Vol. 143,No.5, September 1996,pp. 43S-446.
[22] E.S. Huse, I. Wangensteen, and H.H. Faanes:"Thermal powergeneration [42] S.O. Otero and MA. Irving: "Economic·dispatch of generators with
scheduling by simulated competition", IEEE Transactions on Power prohibited operating zones: a genetic algorithm approach", lEE
Systems, Vol. 14,No.2, May 1999,pp. 472-477. Proceedings - Generation, TraDSmissioD and Distribution, Vol. 143, No.
[23] S.-J. Huang: "Application of genetic based fuzzy systems to 6, November1996,pp. 529-S34.
hydroelectric generation scheduling", IEEE Transactions on Energy [43] Y.H. Song, G.S. Wang, P.Y. Wang, and A.T. Johns:
Conversion, Vol. 14,No.3, August 1999,pp. 724--730. "Environmental/economic dispatch using fuzzy logic controlled genetic
[24] A.H. Mantawy, Y.L. Abdel-Magid, and S.Z. Selim: "Integrating genetic algorithms", lEE Proceedings - Generation, Transmission and
algorithms, tabusearch,andsimulated annealingfor the unit commitment Distribution, Vol. 144,No.4, July 1997,pp. 377-382.
problem", IEEE Transactions on Power Systems, Vol. 14, No.3, August [44) K.P. WODg and J. Yuryevich: "Evolutionary-propamming-based
1999,pp. 829-836. algorithm for environmentally-eonstrained economic dispatch", IEEE
[25] T.G. Werner and J.F. Verstege: "An evolution strategy for shon-term Transactions on Power Systems,Vol. 13,No.2,. May 1998,pp. 301-306.
operation planning of hydrothermal power systems", IEEE Transactions [45] C.S. Chang and W. Fu: "Stochasticmultiobjectivepneration dispatchof
on PowerSystems,Vol. 14,No.4, November1999,pp. 1362-1368. combined heat and power systems", lEE Proceedinp - Generation,
[26] K..A. Juste, H. Kita, E. Tanaka, and J. Hasegawa: "An evolutionary Transmission and Distribution, Vol. 145, No.5, September 1998, pp.
programming solution to the unit comitment problem", IEEE 583-591.
Transactions on Power SYStemS, Vol. 14, No.4, November 1999, pp. [46] D.B. Des and C. Patvardban: "New multi-objective stochastic search
1452-1459. technique for economic load dispatch", lEE Proceedings - Generation,
[27] A. Rudolfand R. Bayrleitlmer: "A genetic algorithmfor solvingthe unit Transmission and Distribution, Vol. 145, No.6, November 1998, pp.
commitment problem of a hydro-tbermal power system", IEEE 747-752.
Transactions on Power Systems, Vol. 14, No.4, November 1999, pp. [47] J. Yuryevicb and K..P. WODg: "Evolutionary programminl based optimal
1460-1468. power flow algorithm", IEEE Transactions OD Power Systems, Vol. 14,
[28] C.-P. Cheng, C.-W. Liu, and C.-C. Liu: "Unit commitment by No.4, November 1999,pp. 1245-1250.
Lagrangian relaxation and genetic algorithms", IEEE Transactions on [48] J.R. Gomes and O.R. Saavedra: "Optimal JUCtive power dispatch using
PowerSystems, Vol. IS, No.2, May 2000,pp. 707-714. evolutionary computation: extended alpritbms", lEE Proceedings -
[29] C.W. Richterand G.B. Sheble: "A profit-basedunit commitment GA for Generation, Transmission and Distribution, Vol. 146, No.6, November
the competitive environment", IEEE Traasactions on Power Systems, 1999,pp. 586-592.
Vol. IS, No.2, May 2000,pp. 715-721. {49] N. Li; Y. Xu, and H. Chen: "FAcrs-based power flow control in
[30] R.-H. Liang and F.-C. KIng: "Thermal generating unit commitment interconnected power systemn, IEEE TraDllCtioas OD Power Systems,
using an exteDded mean field 8DDeatiDg DeUraI DetwOl'k", lEE Vol. IS, No.1, Februaly 2000, pp. 257-262.
Proceedings - Generation, Transmission and Distribution,Vol. 147, No. [SO] H. Yoshida, K. Kawa1a, Y. FuItuyama, S. Takayama,IDd Y. Nakanishi:
3, May 2000,pp. 164-170. "A particle swarm optimjzatioD for reactive power IDd voJtaae control
[31] Y.-G. Wu, C.-Y. Ho, and D.-Y. Wang: "A diploid genetic approach to considering voltage security .seameot", IEEE T-.ctioDS on Power
short-term scheduling of hydro-thermal system", IEEE TraDSlCtiODS on Systems,Vol. IS, No.4, November2000,pp. 1232-1239.
PowerSystems, Vol. 15,No.4, November2000,pp. 1268-1274.
7.4.Maintenance Scheduling
7.3. Economic I Reactive Dispatch and Optimal PowerFlow
[32] D.C.Walters and G.B. Sbeble:"Genetic algorithmsolution of economic [51] T. Satoh and K. Nara: "MaiDteDlDce scbeduliDl by using simulated
dispatch with valve point loading", IEEE Transactions on Power lDDealing method (for power plants)", IEEE T-.ctions on Power
Systems, Vol. 8, No.3, August 1993,pp. 1325-1332. Systems,Vol. 6, No. 2, May 1991.pp. 850-857.

dispatch algorithm", lEE Proceedings - Generation, Tl'IDStDission and

Distribution, Vol. 140,No.6, November1993,pp. 509-515.
[33) K.P. Wong and C.C. FUIlg: "Simulated lDDealing based economic [52] H. Kim, Y. Hayashi, IIld K.. Nara: "AD alpritbm for tbemwJ unit
maintenance scheduling combiDed UIe ofGA, SA and TS", IEEE
Transactions on Power Systems, Vol. 12, No.1, February1997,pp. 329-
[34] A. Bakirtzis, V. Petrictis, and S. Kazarlis: "GeDetic algorithmsolution to 335.
the economic dispatch problem", lEE Proceedings - Generation, [53] G. Bretthauer, T. Gamaleja, E. Handscbin, U. Neumann, and W.
Tl'IDStDission and Distribution, Vol. 141,NO.4, July 1994,pp. 377-382. KoftiDann: "lntepatecl maiDteDaDce scbeduliDl syatem for electrical
[35] K.P. Wong and Y.W. Wong: "GeDetic aDd .eneticlsimu1aled-annealiDg energy systems", IEEE TraDSaCtions on Power Delivery, Vol. 13, No.2,
approaches to economic dispatch", lEE Proceedings - Generation, April 1998,pp. 655-660.
Transmission and Distribution, Vol. 141, No.5, September 1994, pp. [54] E.K. Burke and AJ. Smith: "Hybrid evolutionary teelmiques for the
507-513. maintenance scheduHDg problem", IEEE TraDIICbcms on Power
[36] G.B. Sheble and K. Brittig: "Refined genetic alloritbm-economic Systems., Vol. 15,No.1, February2000,pp. 122·128.
dispatch example",IEEET1'8DS8Ctions on Power Systems,VoL 10,No.1,
February 1995,pp. 117-124. 7.S. Generation, Transmission, and VAr Planning
[37] Q.H. WU and J.T. Ma: "Power system optimal reactive power dispatch
usingevolutionaJy proJl'lllUlliDl", IEEET1'8DS8Ctions on PowerSystems,
Vol. 10,No.3, August 1995,pp. 1243-1249. [55] Y.-T. Hsiao, C.-C. Liu, H.-D. Chima, IDd Y.-L. Cben:"A new approach
[38] K..P. Wong, B. Fan, C.S. CbaD& aDd A.C. Liew: "Multi-objective for optimal VAr sourcespllDDing in Jarp scale electric power systems",
generation dispatch using bi-eriterioD Blabal optimisation", IEEE IEEE Tl'8IJS&CtioDs on Power Systems, Vol. 8, No.3, Aupst 1993, pp.
988-996. .
Transactions on Power Systems, Vol. -10, No.4, November 1995, pp.
1813-1819. [S6] Y.-T. Hsiao, H.-D. Cbiana, c-c, Uu, and Y.-L Chen: "A computer
[39] P.-H.Chen and H.-C. CbaDg: "Larp-scale economic dispatch by genetic package for optimal multi-objective VAr plaDDiDg in Jarp scale power
algorithm", IEEE Transactions on Power Systems., Vol. 10, No.4, systems", IEEE Transactions on Power Systems, VoL 9, No.2, May
November 1995,pp. 1919-1926. 1994,pp. 668-676.
[40] H.-T. Yang, P.-e. Yang, aDd C.-L. Huang: "EvolutioDIJ'Y programming [57] K. Iba: "Reactive power optimization by geDctic algorithm", IEEE
based economic dispatchfor units with non-smooth fuel cost functions", Transactiooson Power Systems,Vol.9, No.2, May 1994,pp. 685-692.

[58] Y.-L. Chenand C.-C. Liu: "Multiobjective VAr planning using the goal- [78] R.A. Gallego, A. Monticelli, and R. Romero: "Comparative studies on
attainment method", lEE Proceedings - Generation, Transmission and non-convex optimization methods for transmission network expansion
Distribution, Vol. 141,No.3, May 1994,pp. 227-232. planning", IEEETransactions on Power Systems,Vol. 13, No.3, August
[59] K.P. Wongand Y.W. Wong: "Short-term hydrothermalschedulingpart I. 1998,pp. 822-828.
Simulated annealing approach", lEE Proceedings - Generation, [79] L.L. Lai and J.T. Ma: "Practical applicationof evolutionarycomputing to
Transmission and Distribution, Vol. 141, No.5, September 1994, pp. reactive power planning", lEE Proceedings - Generation, Transmission
497-501. and Distribution,Vol. 145, No.6, November 1998,pp. 753-758.
[60] K.P. Wong and Y.W. Wong: "Short-term hydrothennal scheduling part [80] P. Patemi, S. Vitet, M. Bena, and A. Yokohama: "Optimal location of
II. Parallel simulated annealing approach", lEE Proceedings - phase shifters in the French network by genetic algorithm", IEEE
Generation, Transmission and Distribution, Vol. 141, No.5, September Transactions on Power Systems, Vol. 14, No.1, February 1999, pp. 37-
1994, pp. 502-506. 42.
[61] Y.L. Chen and C.-C. Liu: "Interactive fuzzy satisfying method for [81] Y.-M. Park, J.-R. Won, J.-B. Park, and D.-G. Kim: "Generation
optimal multi-objective VAr planning in power systems", lEE expansion planning based on an advanced evolutionary programming",
Proceedings - Generation, Transmission and Distribution, Vol. 141, No. IEEE Transactionson Power Systems,Vol. 14,No.1, February 1999, pp.
6, November 1994,pp. 554-560. 299-305.
[62] Y.-L. Chen and C.-C. Liu: "Optimal multi-objective VAr planningusing [82] AJ. Urdaneta, J.F. Gomez, E. Sorrentino, L. Flores, and R. Diaz: "A
an interactive satisfyingmethod", IEEE Transactions on Power Systems, hybrid genetic algoritlun for optimal reactive power planning basedupon
Vol. 10,No.2, May 1995,pp. 664-670. successive linear prograrmning", IEEE Transactions on Power Systems,
[63] W.-S. Jwo, C.-W. Liu, C.-C. Liu, and Y.-T. Hsiao: "Hybrid expert Vol. 14, No.4, November 1999,pp. 1292-1298.
system and simulated annealing approach to optimal reactive power [83] C.-J. Chou, C.-W. Liu, J.-Y. Lee, and K.-D. Lee: "Optimal planning of
planning", lEE Proceedings- Generation, Transmission and Distribution, large passive-harmonic-filters set at high voltage level", IEEE
VoL 142,No.4, July 1995,pp. 381-385. Transactionson Power Systems, Vol. 15, No.1, February 2000, pp.433-
[64] K.Y. Lee, X. Bai, and Y.-M. Park: "Optimization method for reactive 441.
power planning by using a modified simple genetic algorithm", IEEE [84] R.A. Gallego, R. Romero, and AJ. Monticelli: "Tabu search algorithm
Transactions on Power Systems, Vol. 10, No.4, November 1995, pp. for network synthesis", IEEE Transactions on Power Systems, Vol. 15,
1843-1850. No.2, May 2000, pp. 490-495.
r65] R. Romero, R.A. Gallego, and A. Monticelli: "Transmission system [85] J.-B. Park, Y.-M. Park, J.-R. Won, and K..Y. Lee: "An improved genetic
expansion planning by simulated annealing", IEEE Transactions on algorithm for generation expansion planning", IEEE Transactions on
PowerSystems, Vol. 11, No.1, February 1996,pp. 364-369. Power Systems,Vol. IS, No.3, August 2000, pp. 916-922.
[66] Y. Fukuyama and H.-D. Chiang: "A parallel genetic algorithm for [86] M. Delfanti, G.P. Granelli, P. Marannino, and M. Montagna: "Optimal
generation expansion planning", IEEE Transactions on Power Systems, capacitor placement using deterministic and genetic algorithms", IEEE
Vol. 11,No.2, May 1996,pp. 955-961. Transactions on Power Systems, Vol. 15, No.3, August 2000, pp. 1041-
[67] J.T. Ma and L.L. Lai: "Evolutionary programming approach to reactive 1046.
power planning", lEE Proceedings - Generation, Transmission and [87] E.L. da Silva, H.A. Gil, and J.M. Areiza: "Transmission network
Distribution, Vol. 143,No.4, July 1996,pp. 365-370. expansion planning under an improved genetic algorithm", IEEE
[68] H. Rudnick, R. Palma, E. Cura, and C. Silva: "Economically adapted Transactions on Power Systems, Vol. 15, No.3, August 2000, pp. 1168-
transmission systems in open access schemes - application of genetic 1174.
algorithms", IEEE Transactions on Power Systems, Vol. 11, No.3, [88] E.L. da Silva, J.M. Areiza Ortiz, G.C. de Oliveira, and S. Binato:
August 1996,pp. 1427-1440. "Transmission network expansion planning under a tabu search
[69] Y.-L. Cben: ~'Weak bus oriented reactive power planning for system approach", IEEE Transactions on Power Systems, Vol. 16, No.1,
security", lEE Proceedings - Generation, Transmission and Distribution, February 2001, pp. 62-68.
Vol. 143,No.6, November 1996, pp. 541-545. [89] IJ. Ramirez-Rosado and J .L. Bernal-Agustin: "Reliability and costs
[70] Y.-L. Cben: uWeakbus-oriented optimal multi-objective VAr pl8DDiDg", optimization for distribution networks expansion using aD evolutionary
IEEE Tl'8DS8Ctions on Power Systems, Vol. 11, No.4, November 1996, algorithm", IEEE TransactiODS OD Power Systems, Vol. 16, No.1,
pp. 1885-1890. February 2001, pp. 111-118.
[71] R.A. Gallego, A.B. Alves, A. Monticelli, and R. Romero: "Parallel
simulated annealing applied to long term transmission network expansion
plaDning", IEEE TraDSllCtions on Power Systems, Vol. 12, No.1,
7.6. Distribution Systems Planning and Operation
February1997,pp. 181-188.
[72] L.L. Lai and J.T. Ma: uApplication of evolutionary programming to [90] C.W. Hasselfield, P. Wilson, L. Penner, M. Lau, and A.M. Gole: "An
reactive power planning - comparison with nonlinear programming automated method for least cost distribution plaDning", IEEE
approach", IEEE Transactions on Power Systems, Vol. 12, No.1, Transactions on Power Delivery, Vol. 5, No.2, April 1990, pp. 1188-
February1997,pp. 198-206. 1194.
[73] C.-W. Liu, W.-S. Jwo, C.-C. Liu, and Y.-T. Hsiao: "A fast global [91] K. Nara, A. Shiose, M. Kitagawa, and T. Ishihara: "Implementation of
optimization approach to V Ar plaDniDg for the large scale electric power genetic algorithm for distribution systems loss minimum
systems", IEEE TraDSllCtions on Power Systems, Vol. 12, No.1, reconfiguration", IEEE Transactions on Power Systems, Vol. 7, No.3,
February 1997,pp. 437-443. August 1992,pp. 1044-1051.
[74] J. Zhu and M.-Y. Chow: "A review of emergingtechniques on generation [92] G.G. Richards and H. Yang: "Distribution system harmonic worst case
expansion planning",IEEE Transactions on Power Systems,Vol. 12, No. design using a genetic algorithm",IEEE Tnmsactionson Power Delivery,
4, November 1997,pp. 1722-1728. Vol. 8, No.3, July 1993,pp. 1484-1491.
[75] K.Y. Lee and F.F. Yang: "Optimal reactive power planning using [93] R.F. Chu, J.-C. Wang, and H.-D. Chiang: "Strategic plaDning of LC
evolutionary algorithms: a comparative study for evolutionary compensators in nonsinusoidal distribution systems", IEEE Transactions
programming, evolutionary strategy, genetic algorithm, and linear on Power Delivery,Vol. 9, No.3, July 1994, pp. 1558-1563.
programming", IEEE Transactions on Power Systems, Vol. 13, No.1, [94] S. Sundhararajan and A. Pahwa: "Optimal selection of capaciton for
February 1998,pp. 101-108. radial distribution systems using a geneticalgorithm", IEEETransactions
[76] e.s. Chang and J.S. Huang: "Optimal multiobjective SVC planning for on Power Systems,Vol. 9, No.3, August 1994,pp. 1499-1507.
voltage stability enbancement", lEE Proceedings - Generation, [95] V. Miranda, J.V. Ranito, and L.M. Proen~ "Genetic algorithms in
Transmission and Distribution, Vol. 145, No.2, March 1998, pp. 203- optimal multistage distribution network planning", IEEETransactions on
209. Power Systems, Vol. 9, No.4, November 19.94, pp. 1927-1933.
[77] Y.-L. Chen: "Weighted-norm approach for multiobjective VAr (96] H.-D. Chiang, J.-C. Wang, J. Tong, and G. DarliDg: "Optimal capacitor
planning", lEE Proceedings - Generation, Transmissionand Distribution, placement,replacement and control in large-scaleunbalanced distribution
Vol. 145,No.4, July 1998, pp. 369-374. systems: modeling and a new fomulation", IEEETransactions on Power
Systems,Vol. 10,No.1, February 1995,pp. 356-362.
[97] H.-D. Chiang, J.-C. Wang, J. Tong, and G. Darling: "Optimal capacitor [116] AJ. Gaul, E. Handscbin, W. Hoffmann, and C. Lehmkoster:
placement, replacement and control in large-scale unbalanced distribution "Establishing a role base for a hybrid ESlXPS approach to load
systems: system solution algorithms and numerical studies", IEEE management", IEEE Transactions on Power Systems, Vol. 13, No.1,
Transactions on PowerSystems, Vol. 10, No.1, February 1995,pp. 363- February1998,pp. 86-93.
369. [117] C.-H. Kung, MJ. Devaney, C.-M. Huang, and C.-M. Kung: "Fuzzy-
[98] E.-C. Yeh, S.S. Venkata, and Z. Sumic: "Improved distribution system based adaptive digital power metering using a genetic algorithm", IEEE
planning using computational evolution IEEE Transactions on Power
', Transactions on Instrumentation and Measurement, Vol. 47, No.1,
Systems, Vol. 11,No.2, May 1996,pp. 668-674. February1998,pp. 183-188.
[99] D. Jiang and R. Baldick: "Optimal electric distribution System switch [118] J.C.S. Souza, A.M. Leite cia Silva, and A.P. Alves da Silva: "On-line
reconfiguration and capacitor control", IEEE Transactions on Power topology determination and bad data supression in power system
Systems, Vol. 11, No.2, May 1996,pp. 89().897. operation using artificial neural networks", IEEE Transactions on Power
[100] R. Billintonand S. Jonnavitbu1a: "Optimal switchingdevice placementin Systems,Vol. 13,No.3, August 1998, pp. 796-803.
radial distribution systems", IEEE Transactions on Power Delivery, Vol. [119] A.P. Alves da Silva and L.S. Moulin: "Confidence intervalsfor neural
11,No.3, July 1996, pp. 1646-1651. network based short-term load forecasting", IEEETnmsactions on Power
[101] S. Jonnavithula and R. Billinton: "MiDirmun cost analysis of feeder Systems, Vol. 15,No.4, November 2000, pp. 1191-1196.
routing in distribution system planning", IEEE Transactions on Power
Delivery, Vol. 11,No.4, October 1996,pp. 1935-1940.
[102] Y.-C. Huang, H.-T. Yang, and C.-L. Huang: "Solving the capacitor
7.8. Control
placement problem in a radial distribution system using tabu search
approach", IEEE Transactions on Power Systems, Vol. 11, No.4, [120] R. Asgbarianand S.A. Tavakoli: "A systematic approachto perfonnance
November 1996,pp. 1868-1873. weightsselectionin design of robustHlsup lap) infmIl PSS usinggenetic
[103] K.N. Miu, H.-D. Chiang, and G. Darling: "Capacitor placement, algorithms", IEEE Transactions on Energy Conversion, Vol. II, No.1,
replacement and control in large-scale distribution systems by a GA- March1996,pp. 111-117.
based two-stage algorithm", IEEE Tl'IIlsactions on Power Systems, Vol. [121] P. J~ E. Handschin, and F. Reyer: "Genetic algorithm aided controller
12,No.3, August 1997,pp. 1160-1166. design with application to SVC", lEE Proceedinp - Generation,
[104] IJ. Ramirez-Rosado and J.L. Bernal-Agustin: "Genetic algorithms Transmission and Distribution, Vol. 143,No.3, May 1996,pp. 258-262.
applied to the design of large power distribution systems", IEEE [122] Y.L. Abdel-Magid, M. Bettayeb, and M.M. Dawoud: "Simultaneous
Transactions on PowerSystems,Vol. 13,No.2, May 1998,pp. 696..703. stabilisation of power systems using genetic algorithms", lEE
[105] J. Zhu, G. Bilbro, and M.-Y. Chow: "Phase balancing using simulated Proceedings - Generation,Transmission aucl Distribution, Vol. 144, No.
annealing", IEEE Transactions on Power Systems, Vol. 14, No.4, 1, January 1997, pp. 39-44.
November 1999,pp. 1508-1513. [123] G.M. Taranto and D.M. Falcio: "Robust decentralised control design
[106] A.S.Chuangand F. Wu: "An extensible genetic algorithmframework for using genetic algorithms in power system damping control", lEE
problem solving in a common environment", IEEE Transactions on Proceedings - Generation,Transmission and Distribution, Vol. 145, No.
PowerSystems,Vol. IS, No.1, Febrwuy 2000, pp. 269-275. 1, January 1998,pp. 1-6.
[107] T.-H.Chen and J.-T. Cherng: "Optimal pbue arrangementof distribution [124] M. Reformat, E. Kuffel, D. Woodford, and W. Pedrycz: "Application of
transformers connected to a primary feeder for system unbalance genetic algorithms for control design in power systems", lEE
improvement and loss reduction using a genetic algorithm", IEEE Proceedings - Generation,Trausmission and Distribution, Vol. 145,No.
Transactions on Power Systems, Vol. 15, No.3, August 2000, pp. 994- 4, July 1998,pp. 345-354.
1000. [125] J. Wen, S. Cheng, and O.P. Malik: "A syncluonous generator fuzzy
[108] Y.-T. Hsiao and C.-Y. Chien: "EDhancement of restoration service in excitationcontroller optimally desiped with a pnetic algorithm", IEEE
distribution systems using a combination fuzzy-GA method IEEE 7
Trausactions on Power Systems, Vol. 13, No.3, Aupst 1998, pp. 884-
Transactions on Power Systems, Vol. 15, No.4, November 2000, pp. 889.
1394-1400. [126] X.R. Chen, N.C. Pahalawathtba, U.D. ADDakkage, and C.S. K.umble:
"Design of decentralised output feedbackTCSC damping controllers by
using simulatedannealing",lEE Proceedings • Generation, Transmission
7.7. Load Forecasting / Management and ModelIdentification and Distribution, Vol. 145,No. S, September1998,pp. 553-558.
[127] M.A. Abido and Y.L. Abdel-Maaid: "Hybridizing rule-based power
[109] T.S. Dillon, K. Morsztyn, and K. Phua: "Shon tenn load forecasting system stabilizers with pnetic algorithms", IEEE Tnnsactions on Power
using adaptive pattern recognition and self-organizing techniques", Sib Systems,Vol. 14,No.2, May 1999,pp. 600-607.
PowerSystem Computation Conference, September 1975, Vol. 1, Paper [128] Y.L. Abdel-Magid, M.A. Abido, S. Al-Baiyat, and A.B. Mantawy:
2.413. "SimultaDeous stabilization of multimacbine power systems via genetic
[110] H. Morland H. Kobayashi: '6()ptimal fuzzy inferencefor short-tenn load algorithms", IEEE Tnmsactions on Power Systems, Vol. 14, No.4,
forecasting", IEEE Transactions on Power Systems, Vol. 11, No.1, November1999, pp. 1428-1439.
February 1996, pp. 390-396. [129] M. Welsh, P. Mehta, and M.K. Darwish: "Genetic algorithm and
[111] H.-T. Yang, C.-M. Huang, and C.-L. Huang: "Identificationof ARMAX extended analysis optimisation techniques for switched capacitor active
model for shon tenn load forecasting: an evolutionary programming filters-comparative study", lEE Proceedings - Electric Power
approach", IEEE Transactions on Power Systems, Vol. 11, No.1, Applications, Vol. 147,No.1, January2000,pp. 21-26.
February 1996,pp. 403-408. [130] A.L.B. do Bomfim, G.N. Taranto, aDd D.M. Falclo: "Simultaneous
[112] J.C.S. Souza, A.M. Leite cia Silva, and A.P. Alves da Silva: "Data tuning of power system damping controllers using genetic algorithms",
debugging for real-time power system monitoring based on pattern IEEETransactionson PowerSystems, Vol. 15,No.1, February 2000,pp.
analysis", IEEE Transactionson Power Systems, Vol. 11, No.3, August 163-169.
1996, pp. 1592-1599. [131] YL. Abdel-Magid, M.A. Abido,and A.H. Mantaway: "Robusttuningof
[113] P. Ju, E. Handscbin, and D. Karlsson: "Nonlinear dynamic load power system stabilizers in multimlcbine power systems", IEEE
modelling: model and parameter estimation", IEEE Transactions on Tnmsactionson Power Systems, VoL 15,No.2, May2000,pp. 735-740.
PowerSystems,Vol. 11,No.4, November 1996,pp. 1689-1697. [132] P. Zhang and A.H. CooDick: "CoordiDated syDthesis of PSS parameters
[114] T.-Y. Kwok and D.-Y. Yewg: "Constructive algorithms for structure in multi-machine power systems _ the method of inequalities applied
learning in feedforward neural networks for regression problems", IEEE to genetic algorithms", IEEE Transactions on Power Systems, Vol. 15,
Transactions on NeuralNetworks, Vol. 8, No.3, May 1997,pp. 630-645. No.2, May2000, pp. 811-816.
[115] A.P. Alves cia Silva, C. Ferreira, A.C. Zambroni de Souza, and G. [133] M.A. Abide: "Robust design of multimachine power system stabilizers
LambertTorres:"A new CODStnlCtive ANN and its applicationto electric using simulated annealiDg", IEEE Tnmsactions on Energy Conversion,
loadrepresentation", IEEE Tl'IDSaCtions on Power Systems,Vol. 12, No. Vol. 15,No.3, September 2000,pp. 297-304.
4, November 1997,pp. 1569-1575. [134] M.A. Abido and Y.L. Abdel-Magid: "R.obust design of multimacbine
power system stabilisers using tabu search algorithm", lEE Proceedings -
Generation,Transmission and Distribution, Vol. 147, No.6, November
2000, pp. 387-394.
[135] R.A.F. Saleh and H.R. Bolton: "Genetic algorithm-aided design of a 7.10. State Estimation and Analysis
fuzzy logic stabilizer for a superconducting generator", IEEE
Transactions on Power Systems, Vol. 15, No.4, November 2000, pp. [141] M.R. Irving and MJ.H. Sterling: "Optimal network tearing using
1329-1335. simulated annealing", lEE Proceedings - Generation, Transmission and
Distribution, Vol. 137,No.1, January 1990,pp. 69-72.
7.9. Alarm Processing, Fault Diagnosis, and Protection [142] T.L. Baldwin, L. Mili, M.B. Boisen Jr., and R. Adapa: "Power system

observabiJity with minimal phasor measurement placement", IEEE

Transactions on Power Systems,Vol. 8, No.2, May 1993,pp. 707-715.
[136] E.S. Wenand C.S. Chang: "Tabu search approach to alarm processing in
[143] H. Mori and K. Takeda: "Parallel simulated aDIlealing for power system
power systems", lEE Proceedings - Generation, Transmission and decomposition", IEEE Transactions on Power Systems, Vol. 9, No.2,
Distribution, Vol. 144, No.1, January 1997,pp. 31-38. May 1994, pp. 789-795.
[137] F.S. Wen and C.S. Chang: "Probabilistic approach for fault-section
[144] K.P. Wong,A. Li, and M.Y. Law: "Development of constrained-genetic-
estimation in power systems based on a refined genetic algorithm", lEE
algorithm load-flow method", lEE Proceedings - Generation,
Proceedings - Generation, Transmission and Distribution, Vol. 144, No.
Transmission and Distribution,Vol. 144, No.2, March 1997,pp. 91-99.
2, March 1997,pp. 160-168.
[138] L.L. Lai, A.G. Sichanie, and B.J. Gwyn: "Comparison between
evolutionary programming and a genetic algorithm for fault-section 7.11. Energy Markets
estimation", lEE Proceedings - Generation, Transmission and
Distribution,Vol. 145, No.5, September 1998,pp. 616-620. [145] C.W. RichterJr. and G.B. Sheble: "Geaetie algorithm evolution of utility
[139] F.S. Wen and C.S. Chang: 66possibilistic-diagnosis theory for fault- bidding strategies for the competitive marketplace", IEEE Transactions
section estimationand state identificationof unobservedprotectiverelays on PowerSystems,Vol. 13, No.1, Febnwy 1998,pp. 256-261.
using tabu-search method", lEE Proceedings - Generation, Transmission [146] C.W. RichterJr., G.B. Sheble, and D. Ashlock: "Comprehensive bidding
and Distribution, Vol. 145, No.6, November 1998,pp. 722-730. strategies with genetic programming I finite state automata", IEEE
[140] C.W. So and K.K. Li: "Time coordination method for power system Transactions on Power Systems, Vol. 14, No.4, November 1999, pp.
protection by evolutionary algorithm", IEEE Transactions on Industry 1207-1212.
Applications, Vol. 36, No.5, September/October2000,pp. 1235-1240.

Chapter 3
Fundamentals of Genetic Algorithms

Abstract-Research on geneticalgorithms (GAs) has shown that the evolu~onary techniques, like evolutionary programming,
initialproposals are incapable of solving hard problems in a robustand evolutionary strategies, etc.), simulated annealing (SA) [8], and
efficient way. Usually, for large-scale optimization problems, the tabu search (TS) [9]. Particle swarm [10] is another
execution time of first generation GAs dramatically increases while optimization technique that has shown great potential, lately.
solution qualitydecreases. The focus of this tutorial chapteris pointing
However, more experience is still necessary to prove its
out the maindesign issues in tailoring GAs to large-scale optimization
problems. Important topics such as encoding schemes, selection efficiency and robustness.
procedures, self-adaptive and knowledge-based operators are Simulated annealing uses a probability function that allows
discussed. a move to a worse solution with a decreasing probability, as the
search progresses. With GAs, a pool of solutions is used and the
Index Tenns-Genetic algorithms, large-scale optimization, power neighborhood function is extended to act on pairs of solutions.
systems. Tabu search uses a deterministic rather than stochastic search.
Tab~ search is based on neighborhood search with local optima
1. MODERN HEURISTIC SEARCH TECHNIQUES avoidance. In order to avoid cycling, a short-term adaptive
Optimization is the basic concept behind the application of memory is used in TS. Genetic algorithms have a basic
genetic algorithms (GAs), or any other evolutionary algorithm distinction when compared with other methods based on
[1]-[3], to any field of interest. Over and above the problems in stochasti~ .search. They can use coding (genotypic space). for
which optimization itself is the final goal, it is also a way for (or representmg the problem. The other methods solve the
~e main idea behind) modeling, forecasting, control, optimization problem in the original representation space
sunulation, etc.. Traditional optimization techniques begin with (phenotypic).
a single candidate and iteratively search for the optimal solution The most rigorous global search methods have asymptotic
by applying static heuristics. On the other hand, the GA ~onvergenc~ proof (a~ ~wn as convergence in probability),
approach uses a population of candidates to search several areas i.e., the optimal solution IS guaranteed to be found if infmite
of a solution space, simultaneously and adaptively. time is.available. Among SA, GA and TS algorithms, simulated
Evolutionary computation allows precise modeling of the annealmg and genetic algorithms are the only ones with proof
optimization problem, although not usually providing of convergence. However, there is no such proof for the
math~cal1y optimal solutions. Another advantage of using cano~cal GA [11], i.e., the one with proportional selection
evolutionary computation techniques is that there is no need for (Section 5.1.6) and crossover/mutation with constant
having an explicit objective function. Moreover, when the probabilities (Section 5.4).
objective function is available, it does not have to be Although all the mentioned algorithms have been
differentiable. succ~ssfully applied to real world problems, several of their
Genetic algorithms have been most commonly applied to crucial parameters have been selected empirically. Theoretical
solve combinatorial optimization problems. Combinatorial ~owledge of the impact of these parameters on convergence is
optm;Uzation .usually involves a huge number of possible still an open problem. In fact, there is no theoretical result for
sol~t1ons, which makes the use of enumeration techniques (e.g., tabu andparticle swarm searches.
cutting plane, branch and bound, or dynamic programming) The choice of representation for a GA is fundamental to
hopeless. achie~g good results. Encoding allows a kind of tunneling in
Thermal unit commitment, hydrothermal coordination the onginal search space. That means, a particle bas a non-zero
exp~ion planning.(generation, transmission, and distribution): probabilityof passing a potential barrier even when it does not
reactive compensation placement, maintenance scheduling, etc. ~ve .enough energy to jump over the barrier, The tunneling
have the typical features of a large-scale combinatorial Idea: IS that rather than escaping from local minima by random
optimization problem. In problems of this kind the number of uphill moves, escape can be achieved with the quantum tunnel
possible solutions grows exponentially with the problem size. effect. It is not the height of the barrier that determines the rate
Therefore, the application of optimization methods to find the of escape from a local optimum, but its width relative to current
optimal solution is computationally impracticable. Heuristic population variance.
search techniques are frequently employed in this case for The main shortcoming of the standard SA procedure is the
achieving high quality solutions within reasonable run time. slow asymptotic convergence with respect to the temperature
Among the heuristic search methods there are the ones that parameter T':In the standard SA algorithm, the cooling schedule
apply local search (e.g., hill climbing) and the ones that use a for asympt~tiC global convergence is inversely proportional to
non-convex optimization approach, in which cost-deteriorating the loganthm of the number of iterations, i.e.,
neighbors are accepted also. The most popular methods which T(k)=c/{1+1ogk). The constant c is the largest depth of any
go beyond simple local search are GAs [4]-[7] (and other local minimum that is not the global minimum. Convergence in
probability cannot be guaranteed for faster cooling rates, e.g., Then, crossover and mutation are applied to- the intermediate
lowervalues for c. population to create the next generation of potential solutions.
Tabu search owes its efficiencyto an experience-based fine- Although a lot of emphasis has been placed on the three above
tuning of a large collection of parameters. Tabu search is a mentioned operators,the coding scheme and the fitnessfunction
general search scheme that must be tailored to the details of the are the most important aspects of any GA, because they are
problem at hand. Unfortunately, as mentioned before, there is problem dependent.
little theoretical knowledge for guiding this tailoring process. The most popular explanation about how GAs can result in
Heuristic search methods utilize different mechanisms in robust search relies on the argument of hyperplanesampling. In
order to explorethe state space. These mechanisms are based on order to understand this concept, assume a problem encoded
three basic features: with 3 bits. The search space is representedby a cube with one
- the use of memoryless search (e.g., standard SA and GA) or of its vertices at the origin 000. For example, the upper surface
adaptivememory (e.g., TS); of the cube contains all the points of the form *1*, where *
- the kind of neighborhood exploration used, i.e., random (e.g., could be either 0 or 1. A string that contains the symbol * is
SA and GAs) or systematic (e.g., TS); and referred to as a schema. It can be viewed as a (hyper)plane
- the number of current solutions taken from one iteration to the representinga set of solutions with commonproperties.
next (GAs, as opposed to SA and TS, take multiples solutions to The order of a schema is the number of fixed positions
the next iteration). present in the string. The defining length is the distance
The combination of these mechanisms for exploring the between the first and last fixed positions of a particularschema.
state space determines the search diversification (global Building blocks are highly fit strings of low defining length and
exploration) and intensification (local exploitation)capabilities. low order. It can be shown that about 0( n 3 ) hyperplanes are
The standard SA algorithm is notoriously deficient with respect
simultaneously sampled when the number of strings contained
to the diversification aspect. On the other hand, the standardGA
in the population is n. Therefore, even though a GA never
is poor in intensification.
explicitly evaluates any particular hyperplane of the search
When the objective function has very many equally good
space, it changesthe distribution of strings as if it had.
local minima, wherever the starting point is, a small random
Genetic algorithms process many hyperplanes implicitly in
disturbance can avoid the small local minima and reach one of
parallel when selection acts on the population. The true fitness
the good ones, making this an appropriate problem for SA.
of a hyperplane partition corresponds to the average fitness of
However,SA is lesssuitablefor a problem in which there is one
all strings that lie in that hyperplane. Genetic algorithms use the
global minimum that is much better than all the other local
population as a sample for estimating the fitness of that
ones. In this case, it is very important to fmd that valley.
hyperplane partition. After the initial generation, the pool of
Therefore, it is better to spend less time improving any set of
new strings are biased toward regions that have previously
parameters and more time working with an ensemble to
contained strings that were above average with respect to
examine different regions of the space. This is what GAs do
previous populations. In order to further explore the search
best. Hybrid methods have been proposed in order to improve
space, crossover and mutation generate new sample points,
the robustnessof thesearch.
while partially preserving the distnbution of strings that is
observed after selection.
In the following sections, several important design stagesof
a GA are presented. Section 3 shows different possibilities for
Genetic algorithms operate on a population of individuals.
encoding. It emphasizes the importanceof the encodingscheme
Each individual is a potentialsolution to a given problem and is
on GA convergence. Section 4 treats the formulation of the
typically encoded as a fixed-length binary string (other
fitness function. Section 5 presents differentpropositions for the
representations have also been used, including character-based
selection, crossover and mutation operators. Parametercontrol
and real-valued encodings, etc.), which is an analogy with an
in GAs is addressed in Section S, too. This chapter is concluded
actual chromosome. After an initial population is randomly or
with a short presentation of niching methods, which serve for
heuristically generated, the algorithm evolves the population
multiobjectiveoptimization via GAs.
through sequential and iterative application of three operators:
selection, crossover and mutation. A new generation is formed 3. ENCODING
at the end of eachiteration.
For large-scale optimization problems, the initial population In order to apply a GA to a given problem the first decision
can incorporate prior knowledge about solutions. This one has to make is the kind of genotype the problem needs.
procedure should not drastically restrict the population That means, a decision must be taken on how the parameters of
diversity, otherwise premature convergence could occur. the problem will be mapped into a fmite string of symbols,
Typical population sizes vary between 30 and 200. The known as genes (with constant or dynamic length), encoding a
population size is usually set as a function of the chromosome possible solution in a given problem space. The issue of
length. selecting an appropriate representation is crucial for the search.
The execution of a GA iteration is basically a two stage The symbol alphabet used is often binary, though other
process. It starts with the current population. Selection is representations have also been used, including character-based
applied to create an intermediate population (mating pool). and real-valued encodings.
In the majority of GA applications, the strings use a binary One possible answer for the binarygene positionproblem is
alphabet and their length is constant during all the evolutionary to use an operator called inversion. This is implemented by
process. Also, all the parameters decode to the same range of extending every gene by adding the position it occupies in the
values and are allocated the same number of bits for the genes string. Inversion is interesting because it can freely mix the
in the string. A problem occurs when a gene may only have a genes of the same string in order to put together the building
finite numberof discrete valid values if a binary representation blocks, automatically, during evolution (e.g., [(2 1) (3 0) (1 0)
is used. If the number of values is not a power of two, then (4 1)], where the first number is a bit tag which indexes the bit
some of the binary codes are redundant, i.e., they will not and the second one represents the bit value, i.e., (3 0) means
correspond to any valid gene value. The most popular that the third bit is equal to zero). At first sight, the inversion
compromise is to mapthe invalidcode to a valid one. operator looks very useful when the correlated parameters are
Another shortcoming of binary encoding is the so called not known a priori. With the association of a position to every
Hamming cliffs (e.g., in the Appendix, although being gene, the string can be correctly reordered before evaluation.
neighbors in decimal representation, the Hamming distance However, for large-scale problems, inversion is useless.
between the binary strings for -0.6 and -0.5 is three (different Reordering greatly expands the search space, making the
bits)). It is worthwhile to mention that Gray coding, although problemmuch more difficult to solve.
frequently recommended as a solution to Hamming cliffs, Therefore, the very hard encoding problem still remains in
because adjacent numbers differ by a single bit, has an the hands of the designer. In order to achieve good performance
analogous drawback for numbers at the opposite extremes of for large tasks, GAs must be matched to the search problemat
the decimal scale(e.g., the minimum and maximum gene values hand. The only way to succeed is by using domain-specific
differby only onebit, too). Binary encoding can also introduce knowledge to select an appropriaterepresentation.
an additional nonlinearity, thus making the combined objective
function (the one in the genotype space) more multimodal than 4. FITNESS FUNCTION
the original one (in the phenotypespace).
At the beginning of GA research, the binary representation Each string is evaluated and assigned a fitness value after
was recommended because it was supposed to give the largest the creation of an initial population. It is useful to distinguish
number of schemata (plural of schema), therefore providing the betweenthe objective function and the fitness function usedby
highest degree of implicit parallelism. However, new a GA. The objective function provides a measure of
interpretations haveshown that high-cardinality alphabets (e.g., perfonnance with respect to a particular set of gene values,
real numbers) are more effective due to the higher expression independently of any other string. The fitness function
power and low effective cardinality [12]-[14]. Complex transforms that measure of performance into an allocation of
applications need non-binary alphabets. Integer or continuous- reproductive opportunities, i.e., the fitness of a string is defined
valued genes are typically used in large-scale function with respect to other members of the current population. After
optimization problems. Another advantage of non-binary decoding the chromosomes, i.e., applying the pnotype to
representations, particularly the real-valued one, is the easy phenotype transformation, each string is assigned a fitness
defmition of problem specificoperators. value. The phenotype is used as input to the fitness function.
Whenusing binary coding, the positions of the genes in the Then, the fitness values are employed to relatively ponder the
chromosome is extremely important for a successful GA design, strings in the population.
unless uniform crossover is applied (see Section 5.2). A bad The specification of an appropriatefitness function is crucial
choice can make the problem harder than necessary. Therefore, for the correct operation of a GA [IS]. As an optimization tool, .
correlated binary genes should be coded together in order to GAs face the task of dealing with problem constraints [16].
fonn building blocks, thus diminishing the disruptive effects of Crossover and mutation, i.e., the perturbation (variation)
crossover. However, this information is usually unavailable mechanism of GAs, are general operators that do not take into
beforehand. account the feasibility region. Therefore, infeasible offspring
Epistasis is a measure of problem difficulty for GAs. It appear quite frequently. There are four basic techniques for
represents the interaction among different genes in a handlingconstraints when using GAs.
chromosome. This dependson the extent to which the change in The simplest alternative is the rejecting technique in which
chromosome fitness resulting from a small change in one gene infeasible chromosomes are discarded allover the generations.
varies according to the values of other genes. The higher the A different strategy is the repairing procedure, which uses a
epistasis level, the harder the problem is. This is obviously also converter to transfonn an infeasiblechromosome into a feasible
true when applying uniform crossover or real-valued encoding. one. Another possible technique is the creation of problem
As mentioned earlier, a possibilityfor making the gene ordering specific genetic operators to preserve feasibility of
irrelevant is to apply uniform crossover, because the result of chromosomes.
this operation is not affectedby the positions of the genes. The The previous procedures do not generate infeasible
same goal can be achieved with real-valued encoding and solutions. This is not usually an advantage. In fact, for large-
recombination operators that also tum the genes positions scale, highly constrained optimization .problems, this is
irrelevant (Section 5.2). However, making the gene ordering certainly a great drawback. Particularly for power system
irrelevant does not necessarily mean an easier way to a good problems, where the optimal solutions usually are on the
solution. boundaries of feasible regions, the above mentioned techniques
for handling constraints often lead to poor solutions. One program, or a human expert that decides the quality of a string.
possible way for overcoming this drawback is to apply the At the beginning of the iterative search, the fitness function
repairing procedure only to a fraction (10%, for instance) of the values for the population members are usually randomly
infeasible population. distributed and wide spread over the problem domain. As the
It has been suggested that constraint handling for such type search evolves, particular values for each gene begin to
of optimization problem should be performed allowing search dominate. The fitness variance decreases as the population
through infeasible regions. Penalty functions allow the converges. This variation in fitness range during the
exploration of infeasible subspaces [17]. An infeasible point evolutionary process often leads to the problems of premature
close to the optimum solution generally contains much more convergence and slow finishing,
information about it than a feasible point far from the optimum.
On the other hand, the design of penalty functions is difficult 4.1. Premature Convergence
and problem dependent. Usually, there is no a priori
information about the distance to optimal points. Therefore, A frequent problem with GAs~ known as deception, is that
penalty methods consider only the distance from the feasible the genes from a few comparatively highly fit (but not optimal)
region. Penalties based on the number of violated constraints do individuals may rapidly come to dominate the population,
not work well. causing it to converge on a local maximum. Once the
There are two possible forms to build a fitness function with - population has converged, the ability of the GA to continue
penalty term: the addition and multiplication forms. The former -searching for better solutions is nearly eliminated. Crossover
is represented as g<!) =!<!) + p~) ; where for maximization (Section 5.2) of almost identical chromosomes generally
problems p<!) = 0 for feasible points, and p<!) < 0 otherwise. produces similar 0!fspring. Onl! mutation .(Section 5.3), with its
. random perturbation mechanism, remams to explore new
The maximum absolute p<!) value cannot be greater than the regions of'the search space ..
minimum absolute f~) value for any generation, in order to The schema theorem says that reproductive opportunities
avoid negative fitness values. The multiplication form is should be given to indivi~ls in proportion to their relative
represented as g<!) = f(~p~}; where for maximization fitnesses. However, by d?mg. that,. pre~ture .convergen~e
.. occurs because the population IS not infinite (basic hypothesis
problems p<!) =1 for feasible points, and 0 S p<.!) < 1 of the theorem). This is due to genetic drift (see Section 6). In
otherwise. order to make GAs work effectively on fmite populations, the
The penalty term should vary not only with respect to the (proportional) way individuals are selected for reproduction
degree of constraints violations, but also with respect to the GA must be modified. Different ways of performing selection are
iteration count. Therefore, besides the amount of violation, the described in Section 5.1. The basic idea is to control the number
penalty term usually contains variable penalty factors, too (one of reproductive opportunities each individual gets. The strategy
per violated constraint). The key for a successful penalty is to compress the range of fitnesses, without loosing selection
technique is the proper setting of these penalty factors. Small pressure (Section 4.2), and avoid any super-fit individual from
penalty factors can lead to infeasible solutions, while very large suddenly dominating the population.
ones totally neglect infeasible subspaces. In average, the
absolute values of the objective and penalty functions should be 4.2. Slow Finishing
similar. At least in theory, the parameters of the penalty
functions can, also, be encoded as GA parameters. This This is the opposite problem of premature convergence.
procedure createsan adaptivemethod, which is optimized as the After many generations, the population has almost converged,
GA evolves toward the solution. but it is still possible tbat the global maximum (or a high quality
In summary, the main problems associated with the fitness local one) has not been found. The average fitness is high, and
function specification are the following: the difference between the best and the average individuals is
- dependence on whether the problem is related to maximization small. Therefore, there is insufficient variance in the fitness
or minimization; function values to localize the maxima.
- when the fitness function is noisy for a non-deterministic The same techniques used to tackle premature convergence
environment [18]; are used also for fighting slow finishing. An expansion of the
- the fitness function may change dynamically as the GA is range . of population fitnesses is produced, instead of a
executed; compression. Both procedures are prone to bad remapping
- the fitness function evaluation can be so time consuming that (underexpansion or overcompression) due to super-poor or
only approximations to fitness values can be computed; super-fit individuals.
- the fitness function should allocate very different values to
strings in order to make the selection operator work easier s. BASIC OPERATORS
(Section 5.1.6);
- it must consider the constraints of the problem; and In this section, several important design issues for the
- it could incorporate different sub-objectives. selection, crossover and mutation operators are presented.
The fitness function is a black box for the GA. Internally, Selection implements the survival of the fittest according to
this may be achieved by a mathematical function, a simulator some predefined fitness function. Therefore, high-fitness

individuals have a better chance of reproducing, while low- every individuals' fitness is multiplied and added by a constant.
fitness ones are more likely to disappear. Selection alone cannot The selection intensity of proportionate selection is the only one
introduce any new individuals into the population, i.e., it cannot that is sensitive to the current population distribution [22].
find new points in the search space. Crossover and mutation are However, conclusive statementsabout the performance of rank-
used to explorethe solution space. based selection schemes are difficult to make because, by
Crossover, which represents mating (recombination) of two suitable (but tricky!) adjustment, proportionate selection can
individuals is performedby exchangingparts of their strings to give similarperformance.
form two new individuals (offspring). In its simplest form,
substrings are exchanged after a crossover point is randomly 5.1.1.Tournament Selection
determined. The crossover operator is applied with a certain
probability, usuallyin the range [0.5, 1.0]. This operatorallows This selection scheme is implemented by choosing some
the evolutionary process to move toward promising regions of number of individuals randomly from the population and
the search space. It is likely to create even better individuals by copying the best individual from this group into the
recombining portions of good individuals. The new offspring intermediate population, and by repeating it until the mating
created from mating, after being subject to mutation, are put pool is complete. Tournaments are frequently held only
into the next generation. betweentwo individuals. Bigger tournamentsare also used with
The purpose of the mutation operator is to maintain arbitrary group sizes (not too big in comparison with the
diversity within the population and inhibit premature population size). Tournament selection can be implemented
convergence to local optima by randomly sampling new points very efficiently becauseno sorting of the populationis required.
in the search space.The GA stoppingcriterion may be specified The tournament procedure selects the mating pool without
as a maximal numberof generationsor as the achievement of an remapping the fitnesses. By adjusting the tournament size the
appropriate levelfor the generationaverage fitness (stagnation). selection pressure can be made arbitrarily large or small. Bigger
tournaments havethe effect of increasingthe selectionpressure,
5.1. Selection since below average individuals do not have good chances of
winning a competition.
Selection,more than crossover and mutation, is the operator
responsible for determining the convergence characteristics of
5.1.2.Truncation Selection
GAs [19], [20]. Selection pressure is the degree to which the
best individuals are favored [21]. The higher the selection
pressure, the more the best individuals are favored. The In truncation selection, only a subset of the best individuals
selection intensity of GAs is the expected change of average are chosen to be possibly selected, with the same probability.
fitness in a population after selection is performed. Analysesof This procedure is repeated until the mating pool is complete. As
selection schemes show that the change in mean fitness at each a sorting of the population is required,truncationselectionhas a
generation is a functionof the populationfitness variance. greater time complexity than tournament selection. As in
The convergence rate of a GA is largely determined by th~ tournament selection, there is no fitness remapping in truncation
magnitude of the selection pressure. Higher selectionpressures selection.
imply in higher convergence rates. If the selection pressure is
too low, the convergence rate will be slow, and the GA will 5.1.3. Linear RankingSelectiOD
unnecessarily take longer to find a high quality solution. If the
selection pressure is too high, it is very probable that the GA The individuals are sorted according to their fitness values
will prematurely converge to a bad solution. In fact, selection and the last position is assigned to the best individual, while the
schemes should also preserve population diversity, in addition first position is allocated to the worst one. The selection
to providing selectionpressure. One possibility to achieve this probability is linearly assigned to the individuals according to
goal is to maximize the product of selection intensity and their ranks. All individuals get a different selection probability,
population fitness standard deviation. Therefore, if two even whenequalfitness values occur.
selection methods have the same selection intensity,the method
giving the higher standard deviation of the selected parents is 5.1.4.Exponential Ranking Selection
the best choice.
Many selection schemes are currently in use. They can be Exponential ranking selection differs from linear ranking
classified in two groups: proportionate selection and ordinal- selection only in that the probabilities of the ranked individuals
based selection. Proportionate-based procedures select are exponentially weighted.
individualsbasedon their fitness values relative to the fitnessof
the other individuals in the population. Ordinal-based 5.1.5.ElitistSelection
procedures select individuals not upon their fitness, but based
on their rank within the population. Preservation of the elite solutions from the preceding
An ordinal selection scheme has a fundamental advantage generation assures that the best solutions known so far will
over a proportional selection one. The former is translation and remain in the population and have more opportunities to
scale invariant, i.e., the selection pressure does not changewhen

produce offspring. Elitist selection is used in combination with
otherselectionstrategies. Another possibility is power scaling, i.e., 1'= Inr.
general, the k value is problem dependent and may require
5.1.6.Proportional Selection adaptation during a run to expand or compress the range of
fitness function values. The problem with all fitness scaling
This is the first selection method proposed for GAs. The schemesis that the degree of compression can be determined by
probability that an individual will be selected is simply a single extreme individual, degrading the GA performance.
proportionate to its fitness value. The time complexity of the
method is the same as in tournament selection. This mechanism 5.2. Crossover
works only if all fitness values are greater than zero. The
selection probabilities strongly depend on the scaling of the Crossover is a very controversial operator due to its
fitness function. In fact, most of the scaling procedures
disruptive nature (i.e., it can split important information). In
described in the next sections have been proposed to keep
fact, besides GAs, other evolutionary algorithms do not rely on
proportional selection working. One big drawback of
crossover (or similar type of recombination). However, no
proportional selection is that the selection intensity is usually
definite answer about the necessity of using crossover has been
low, because a single individual, either the fittest or the worst,
reached so far.
dictates the degree of compression of the range of fitnesses.
The traditional GA uses one-point crossover (Fig. 1), where
This is quite common even during the early stage of the search,
the two parents are each cut once at specific points, and the
when the population variance is high. Negative selection
segmentslocated after the cuts exchanged. The positions of the
intensity is also possible.
bits in the schema determine the likelihood that these bits will
Notice that in ordinal-basedselection schemes the effect of
remain together after crossover. Obviously, an order-I schema
extreme individuals is negligible, irrespective of how much
is not affected by recombination, since the critical bit is always
greater or smaller their fitnesses are than the rest of the
inheritedby one of the offspring.
population. Therefore, despite its popularity inside the power
systemresearch community, proportional selection(i.e., roulette
wheel)is usually an inferiorscheme. There are different scaling
operators that help in separating the fitness values in order to [1 11 ° o o 1] [1 o 1 0 0 0]
improve the work of the proportional selectionmechanism. The =>
most common ones are linear scaling, sigma truncation and [I 0 I 0 1 o 0 0] [1 0 0 o o 1]
Fig. 1. Exampleof one-point crossover. Linear scaling
The crossover operator presented above can be generalized
Linear scaling (i.e., 1'= a.f + b) works well except when in order to apply multiple-point crossover. However, more than
most populations members are highly fit, but a few very poor two crossover points, although giving a better exploration
individuals are present. The coefficients a and b are usually capacity,can be too much disruptive. The crossovermechanism
chosento enforce equality of the objective and fitness functions can be better visualized treating strings as rings. In Fig. 2, two-
average values, and also cause maximumscaled fitness to be a point crossover is applied to the example shown in Fig. 1. Each
specified multiple (usually two) of the average fitness. These offspring takes one ring segment, in between adjacent cut
two conditionsensure that average populationmembers receive points, from each parent. The contiguous ring segment(s)is(are)
one offspring copy on average, and the best receives the taken from the other parent. For more than two crossover
specified multiple number of copies. Notice that proportional points, this procedure is repeated until the last segmentis filled.
selection with linear scaling is not the same as linear ranking An extra cut is assumed at the beginning of the string, i.e.,
selection. between genes g8 and g1, for an odd number of cut points.
From the linear string point of view, the elements in Sigma truncation between the two crossover points are swapped between two
parents to form two offspring (Fig. 2). One-point crossover can
In order to overcomethe presence of super-poor individuals, be represented by the ring geometry as a two-point crossover
the use of population variance information has been suggested with the first cut point always between genes g8 and g1. For
to preprocess objective function values before scaling. This multiple-point crossover, the cut points can be anywhere, as
procedure subtracts a constant from the objective function long as they are not the same.
values; i ' =max[O,i -(f-dD)] , where 1 is the mean
objective function value in the population. The constant d is
gl 1
chosen as a multiple (between 1 and 3) of the population
standard deviation, and negative results are arbitrarily set to g8 g2 1 1 o o
zero. I I
g7 g3 -+ 0 o &, 0 o Power scaling
g6 g4 1 0 to takethe average of the two corresponding parent genes. The
I I square-root of the product of the two values can also be used.
gS 0 1 Another possibility is to take the difference between the two
values, and add it to the higher or subtract it from the lower.
[1 1I 0 01 o 1] [1 0 0 1]
=> 5.3.Mutation
[1 0I 0 11 0 0 0] [1 0 0 0 0 0 0]
TheGA literature has reflected a growing recognitionof the
Fig.2. Ringrepresentation and two-pointcrossover. importance of mutation in contrast with viewing it as
responsible for re-introducing inadvertently lost gene values.
Uniform crossover is another important recombination The mutation operator is more important at the final generations
mechanism [23]. Offspringis created by randomly picking each when the majority of the individualspresent similar quality. As
bit from either of the two parent strings (Fig. 3). This means is shown in Section 5.4, a variable mutation rate is very
that each bit is inherited independently from any other bit. important for the search efficiency. Its setting is much more
Uniform crossover has the advantage that the ordering of the critical than that of crossoverrate.
genes is irrelevantin terms of splitting buildingblocks. In the case of binary encoding, mutation is carried out by
flipping bits at random, with some small probability (usually in
the range [0.001; 0.05]). For real-valued encoding,the mutation
[1 1 0 1 0 1 0 1] operator can be implemented by random replacement, i.e.,
~t ~ t ~ t t ~ => [1 0 0 1 0 0] replace the value with a random one. Another possibility is to
[1 0 0 1 1 0 0 0] addIsubtract (or multiply by) a random (e.g., unifonnily or
Gaussian distributed) amount. Mutation can also be used as a
Fig. 3. Example of uniformcrossover,where each mow pointsto the randomly hill-climbing mechanism.
picked genevalue.
5.4.Control Parameters Estimation
Uniform crossover is more disruptive than two-point
crossover. On the other hand, two-point crossover performs Typical values for the population size, crossover and
poorly when the population has largely converged because of mutation rates have been selected in the intervals [30, 200],
the inability to promotediversity. For small populations, which [0.5, 1.0] and [0.001, 0.05], respectively. Fixed crossover and
is not usually the case for large-scaleproblems, more disruptive mutation operators do not provide enough search power for
crossover operators such as uniform or m-point (m>2) may tackling large-scale optimization problems. Parameter manual
perform better because they help overcome the limited amount tuning is common practice in GA design. One parameter is
of information. tuned at a time in order to avoid the impossible task of
Reduced surrogates can be used to improve two-point simultaneous estimation. However, as they strongly interact in
crossover exploration ability. It is highly recommended for complex forms, this tuning procedure is prone to sub-
large-scale problems. The idea is to ignore all bits that are optimality.
equivalent in the two parent strings (Fig. 4). Afterwards, In fact, any static set of parameters is inappropriate,
crossover is applied on the reduced surrogates, i.e., only one regardless of how they are tuBed. The GA search technique is
possible cut is considered between any pair of non-equivalent an adaptive process, which requires continuous tracking of the
bits. search dynamics. Therefore, the use of constant parameters
leads to inferior performance, For example, it is obvious that
large mutations can be helpful during early generations to
[1 0 1 0 1 0 1] [- o - 1] improve the GA exploration capability. This is not the case for
reducedsurrogates => the end of the search, when small mutation steps are needed to
[1 0 0 1 1 0 0 0] [- 0 o - 0] fine tune sub-optimalsolutions.
The proper way for dealing with this problem is by using
Fig.4. Implementation of reduced surrogates to improvecrossoverexploration parameters that are functions of the number of generations.
Deterministic rules are frequently applied for implementing this
Notice that the reduced surrogate form implements the idea. However, besides being very difficult to define, they fail
to take into account the actual progress of the population
original crossover operation in an unbiased way. For example,
performance. Adaptive rules based on population variance, or
the cut points between genes 2)3, 314 and 415 produce the same
effect on offspring. Therefore, two-point reduced surrogate even the search for optimal parameters as part of the GA
crossoverconsidersthese cut points as one singlepossible cross processing (i.e., including parameters as part of the
point. chromosomes) seem to be more promising [24],[25].
The crossover operator can be redefined for real-valued
encoding. Different combinations have been utilized (e.g., a 6. NICHING METHODS
convex combination such as At!} +A.2~2)' One possibility is

Two agents cause the reduction of population fitness under analysis [30]. Recently, another interesting idea, based on
variance at each generation. The first, selection pressure, the theory of immunity in biology, has been proposed [31].
multiplies copies of the fitter individuals. The other agent is The fJISt applications to power systems appeared after 1991
independentof fitness. It is called genetic drift [26] and is due [32]. Since then, GAs have been applied not only to pure
to the stochasticnatureof the selectionoperator (i.e., bias on the optimization problems in power systems, but also to model
random sampling of the population). When there is lack of identification, control, and neural network training. After a
selection pressure, genetic drift is responsible for premature necessary period of maturing, GAs are being used now,
convergence. The GA still ends up on a single peak, even when frequently in combination with conventional optimization
there are several ones of equal fitness. techniques, for solving large-scale problems.
Therefore, even whenmulti-objective optimization is not the
main goal, the identification of multiple optima is beneficial for 8. APPENDIX
the GA performance, Niching methods extend standard GAs by
creating stable subpopulations around global and local optimal TABLE I
solutions. Niching methods maintain population diversity and HAMMING DISTANCEAND ORAY CODE
allow GAs to explore many peaks simultaneously. They are Binary Gray Real
based on either fitnesssharing or crowding schemes [27]. [0000] [0000] -0.9
Fitness sharing decreases each element's fitness [0001] [0001] -0.8
proportionally to the number of similar individuals in the
population, i.e., in the same niche. The similarity measure is [0010] [0011] -0.7
based on either the genotype (e.g., Hamming distance) or the [0011] [0010] -0.6
phenotype (e.g., Euclidian distance). On the contrary, crowding [0100] [0110] -0.5
schemes do not require the setting of a similarity threshold [0101] [0111] -0.4
(niche radius). Crowding implicitly defines neighborhood by
[0110] [0101] -0.3
the application of tournament rules. It can be implemented as
follows. When an offspring is created, one individual is chosen, [0111] [0100] -0.2
from a random subset of the population, to disappear. The [1000] [1100] -0.1
chosen one is the element which most closely resembles the [1001] [1101] 0.0
new offspring. [1010] [1111] +0.1
Another idea used by niching methods is restricted mating.
This mechanismavoids the recombinationof individuals which [1011] [1110] +0.2
do not belong to the same niche. Highly fit, but not similar, [1100] [1010] +0.3
parents can produce highly unfit offspring. Restricted mating is [1101] [1011] +0.4
based on the assumption that if similar parents (i.e., from the [1110] [1001] +0.5
same niche) are mated, then offspringwill be similar to them.
[1111] [1000] +0.6
It is importantto notice that similarity of genotypes does not
necessarily imply similarity of the corresponding phenotypes.
The hypothesis that highly fit parents generate highly fit 9. ACKNOWLEDGMENTS
offspring is valid only under the occurrence of building blocks
and low epistasis. When the genes strongly interact, there is no This work was supported by the Brazilian Research Council
guarantee that these offspring will not be lethals. (CNPq) under grant No. 300054/91-2. Alves da Silva would
also like to thank PRONEX for the financial support and
7. FINALCOMMENTS Professor Djalma M. Falclo, from the Federal Universityof Rio
de Janeiro, for his time and effort reviewing this chapter.
This tutorialon GAshas pointed out the main topics on their
design. The focus on the essential topics helps to not miss the 10. REFERENCES
forest for the trees. The first generation of GAs, based on the
canonical algorithm, considering proportional selection and [1] D.B. Fogel: "An introduction to simulated evolutionary optimization",
crossover/mutation with constant probabilities, was not [2] IEEE Trans. on NeuralNetworks, VoL5, No.1, 1994,pp. 3-14.
D.B. Fogel: Evolutionary Computation - Toward a New Phylosophy of
originally proposed for solving static optimization problems Macltine Illte11igence, IEEE Press, 1995.
[28]. Almost three decades of research has adapted the original [3] T. Blck, U. Hammel, and H.-P. Schwefel: "EvolutioDll')' computation:
proposal [29] to deal with this type of problem. comments OD the history and current state", IEEE Trans. on Evolutionary
One important issue that has been avoided in this chapter is [4] Computation, Vol. 1, No.1, 1997, pp. 3-17.
D.E.Goldberg: Genetic Algorithmsin Search, Optimizlltion and Machine
parallel GAs. They introduce new parameters such as the Learning, Addison-Wesley, 1989.
number of populations and their sizes, the topology of [5] D. Beasley, D.R. Bull, aDd R.R. Martin: "An overview of genetic
communications (e.g., each population is connected to all the algorithms: Part 1, fundamentals", University Computing, Vol. 15,No.2,
others), and the migration rate. Althoughmany implementations 1993, pp. 58-69.
D. Beasley, D.L Bull, aDd R.Ro Martin: "An overview of genetic
of parallel GAs have been described in the literature, the effect [6] algorithms: Part 2, research topics", University Computing, Vol. IS, No.
of these new parameters on the quality of the search is still 4, 1993, pp. 170-181.
[7] D. Whitley: "A genetic algorithm tutorial", Statistics and Computing, [21] T. Back: "Selective pressure in evolutionary algorithms: A
Vol. 4, 1994,pp. 65·85. characterization of selection mechanisms", Proc. 1st IEEE Conf. on
[8] E. Aarts and J. Karst: SimulatedAnnealing and Boltzmann, John Wiley, EvolutionaryComputatiOn, IEEEPress, 1994,pp. 57-62.
1989. [22] L.D. Whitley: "The GENITOR algorithm and selection pressure: Why
[9] F. Gloverand M. Laguna: TabuSearch, Kluwer Academic, 1997. rank-based allocation of reproductive trials is best", Proc. 3rd Int. Conf.
[10] J. Kennedy and R.C. Eberhart: Swarm Intelligence, Morgan Kaufmann, on Genetic Algorithms,Morgan Kaufmann, 1989,pp. 116..121.
2001. [23] G. Syswerda: "Uniform crossover in genetic algorithms", Proc, 3rd Int.
[11] G. Rudolph: "Convergence analysis of canonical genetic algorithms", Conf. on Genetic Algorithms, Morgan Kaufmann, 1989,pp. 2·9.
IEEE Trans.on NeuralNetworks, Vol. 5, No.1, 1994,pp. 96-101. [24] N. Saravanan, D.B. Fogel, and K.M. Nelson: "A comparison of methods
[12] HJ. Antonisse: 64A new interpretation of schema notation that overturns for self-adaptation in evolutionary algorithms", BioSystems, Vol. 36,
the binary encoding constraint", Proc. 3rd Int. Conf. on Genetic 1995,pp. 157-166.
Algorithms, Morgan KaufiDann, 1989, pp. 86-91. [25] A.E. Eiben, R. Hinterding, and Z. Micbalewicz: UParameter control in
[13] D.E. Goldberg: "The theory of virtual alphabets", in Parallel Problem evolutioDary algorithms", IEEE Trans. on Evolutionary Computation,
Solving from Nature 1, Lecture Notes in Computer Science, Vol. 496, Vol. 3, No.2, 1999, pp. 124-141.
Springer,1991,pp. 13-22. [26] A. Rogers and A. Priigel-Bennet: "Genetic drift in genetic algorithm
[14] LJ. Eshelman and J.D. Schaffer: "Real-coded genetic algorithms and selection schemes", IEEE Trans. on Evolutionary Computation, Vol. 3,
interval-schemata", in Foundations of Genetic Algorithms 2, Morgan No.4, 1999, pp. 298·303.
Kaufmann, 1993,pp. 187-202. [27] B. Sareni and L. KrlhenbOhI: uFitness sharing and niching methods
[IS] N.J. Radcliffe and P. D. Surry: "Fitness variance of formae ana revisited", IEEE Trans. on Evolutionary Computation, Vol. 2, No.3,
perfonnance prediction", in Foundations of Genetic Algorithms 3, 1998,pp. 97-106.
Morgan Kaufmann, 1995,pp. 51-72. [28] LA. De Jong: "Genetic algorithms are NOT function optimizers," in
[16] Z. Michalewicz and M. Schoenauer: "Evolutionary algoritlmis for Foundations of Genetic Algorithms 2, Morgan Kaufmann, 1993, pp. 5-
constrained parameter optimization problems", Evolutionary 17.
Computation, Vol. 4, No.1, 1996,pp. 1-32. [29] J.B. Holland: A.daptation in Natural and ArtificialSystems, Universityof
[17] M. Gen and R. Cheng: "A survey of penalty techniques in genetic MichiganPress, 1975.
algorithms", Proc, 3rd IEEE ConL on Evolutionary Computation, IEEE [30] E. Cantu-Paz: "Markov chain models of perallel genetic algorithms",
Press, 1996,pp. 804-809. IEEE Trans. on EvolutionaryComputation, Vol. 4, No.3, 2000, pp. 216-
[18] B.L. Miller and D.E. Goldberg: "Genetic algorithms, selection schemes, 226.
and the varying effects of noise", University of Illinois at Urbana- [31] L. Jiao and L. Wang: "A novel genetic algorithm based on irmnunity",
Champaign, IlliGALReport,No. 95009, 1995. IEEE Trans. on Systems,Man, and Cybernetics- Pan A, Vol. 30, No.5,
[19] D.E. Goldberg and K. Deb: "A comparative analysis of selection 2000, pp. 552-561.
schemes used in genetic algorithms", in Foundations of Genetic [32] V. Miranda, D. Srinivasan, aDd L.M. ~: "Evolutionary
Algorithms, Morgan Kaufmann, 1991,pp. 69-93. computation in power systems", Electrical Power &. EDeI1Y Systems,
[20] T. Blickle and L. Thiele: UA comparison of selection schemes used in Elsevier,Vol. 20, No.2, 1998,pp. 89-98.
genetic algorithms", Swiss Federal Institute of Technology,TIK-RePQrt,
Nr.!!, 2ad Version, 1995.

Chapter 4
Fundamentals of Evolution Strategies and Evolutionary Programming

Abstract - In this Chapter one discusses the principles governing the location. And with this sense of independence, it is reasonable
family of Evolutionary Algorithms called Evolution Strategies. A to recognize that .Evolutionary Programming is just a
variant, called Evolutionary Programming and sometimes taken as conceptual subset (not historical, of course) of the general
independent, is also discussed. Evolution Strategies do Dot depend on conceptsdeveloped underthe name of EvolutionStrategies.
chromosome coding and have a strong theoretical background
justifyingtheir success. As in any evolutionary process, ES and EP rely on the
Keywords- Evolution Strategies, EvolutionaryProgramming definition of a fitness function, which sets up the
"environment" and establishes the way to measure the quality
of each solution (called individual). The fitness function is just
like the objectivefunction of an Operations Researchproblem;
1. INTRODUCTION as such, it may includepenalties for the violationof constraints.

This chapter is devoted to the description of a branch of To be fair, the concept of fitness function may include a
techniques in evolutionary computing: that, for historical loose definition of what a function is. In fact, the only real
reasons, was kept divided for many years, although they may requirement is a process that allows a rankingof alternatives in
be seen in fact as the same generic approach: evolutionary the solution space, such that this ranking is in agreement with
programming (EP) and evolution strategies (ES). This does not the preferences defined by the decision-maker. This process
happen by chanceor coincidence: today, we have difficultiesin can, therefore,include rules, besides mathematical expressions.
explaining to newcomers the differences between the two But it is true that the traditional models have analytical
approaches. expressions as their fitness function.

Instead of two distinct paradigms, what we find is a Both EP and ES share with GA this "selection-by-fitness-
collection of variations over the same theme and perhaps it is function"principle.It is worth to notice this, because this is not
about time to accept calling all them by a family name such as the only process that may ignite and drive an evolutiveprocess.
"evolution strategies"... Nevertheless, history records that For instance, "selection-by-arms-race" is another possible
evolutionary programming and evolution strategies were mechanism; it would require at least two distinct populations,
created independently. one of them possibly predating on the other. But, in fact, GA,
ES and EP are all three based on the same algorithmic
Evolutionary Programming is attributed to Lawrence J. approach, where individuals are not confronted with one
Fogel, sometimes back to the early 1960's in collaboration with anotherbut are measured against an external commonselection
John Holland, although the landmark publication is the book function.
"Artificial Intelligence Through Simulated Evolution" (1966).
While Holland led his followers into developing the GA But ~S and EP move away from GA in the way they
(Genetic Algorithm) approach, Fogel persisted in a line that is represent alternatives, solutions or individuals in a population.
now known as EP (EvolutionaryProgramming). While Genetic Algorithms rely on the power of a discrete
gen~tic representation to generate new offspring with higher
Evolution Strategies is a method claimed by I. Rechenberg SUI'VIval chances, EP and ES make direct representation of
and H.-P. Schwefel, who report the first developments back to individuals.
the TU Berlin, in 1963. They stimulated an independent group
that produced a remarkable set of practical and theoretical In GA, one must establisha mappingbetween the space of
results, in such a way that EP could be seen sometimes as a the genes and the space of the phenotypic variables. The
subset ofES. variation introducedby crossoverand mutation is generated at
gene level, while the phenotypic consequences are evaluated
These two communities evolved separately and gathered againstthe fitness function at problemvariablelevel.
rival factions around them, sustaining competing series of
workshops and conferences, with supporters aligning OD either In ES and EP, there is no real gene level and no need for a
side of the fence. Both the adepts of the American and of the mapping process between genes and problem variables. Each
European school of thought sustained a sort of race for solutionis represented by its own variables, with real or integer
popularity and tried to promote the originators of the processes values, within their feasible domains. Therefore, one can say
they supported as the true fathers of the evolutionaryapproach. that variation is introduced directly at the level of the
phenotype. Variation and diversity are essential to make
For practicalpurposes, it is healthy to witness some present- selection ~eful: They allow the coverage of the search space.
day efforts in order to unite the evolutionary perspective Loss of diversity leads usually to an early termination of
brought aboutby the two schools, and to give recognition to the evolutionary algorithms, either at sub-optimal solutions or at
ones who really deserve it, independendy of their geographical local optima.

In classical pure EP, variation is introduced solely by judgment, instead of using analytical solutions available at the
mutation. In ES, besides mutation, processes similar to time.
crossover also originate variation and new phenotypic Nowadays, there is a variety of models or versions of ES.
expressions. But, as we shall see, there is no real distinction In the following sections we will discuss some aspects of what
between the two approaches and they have actuallyconverged
could be called a canonical model - the (J,l,IC,A.,p) ES model,
in conceptual terms.
usingthe notation of [1] .
Contrary to many approaches to the theme, we will start
with the description of Evolutionary Strategies. We have 2.1. The general (J.l,K,A.,p) EvolutionStrategiesscheme
present that, because of its greater simplicity, EP easily gains
adepts. It can easily be developed by students and the The designation "(J.L,IC,A.,p) ES model" has been proposed
popularity gained contributes certainlyto keep it in the popular by Schwefel [1] and has the following parameters:
although questionable view as an independent development J.1 - numberof parents in a generation
from the Evolution Strategies mainstream. However, as we K - number of generations of survival or maximum
shall see, EP can be seen very naturally as a branch or a numberof reproduction cyclesof an individual
subdivision of ES. A. - number of offspring or children created in one
p - numberof directancestors for each individual
Under this light, Evolutionary Programming is almost like a
Under the general designation of Evolution Strategies, the
(J.1,CX>,J.1+Jl, 1)Evolution Strategy.
ideas started by Rechenberg and Schwefel were explored and
resulted in a remarkable legacy of theoretical and experimental This means that a whole family of processes can be started,
work. This work is so impressive that, after becoming aware of depending on the choiceof the above parameters. Some of the
its full extent, an independent observer cannot avoid the varieties have been researched in depth and some are still open
temptation to classifyEvolutionary Programming as a de facto field for research. Of course, the simplest ones have been the
subsetof the globalEvolution Strategiesframework. most investigated, and this effort has brought insight into the
mechanisms that power the Evolution Strategies and make
The geographical independence of the development of EP
them so successful (or that provoke divergence and lack of
(in the US) and ES (in Europe) and subsequent creation of
success in somecases).
schools of followers is probably the best expIanation why we
still today find a distinction in the literature. As we shall see, The aim of this Chapter is to explain to the Power System
the most fundamental reason that one sees claimed, to researchers and engineers the basics on how ES are built and
distinguish EP from ES, is the fact that "pure" EP does not work. Therefore, our didactic strategy will be to start with
make use of recombination of individuals as a means to simple models and progressively increase them in complexity.
generate offspringand diversity and relies only on mutation. After all, it will be like retracing the story of the theoretical
development of ES.
Evolution Strategies is a name that covers a wide family of
related algorithms. These algorithms follow the general The first ES models bad less degrees of freedom than the
biological paradigm of exploting a search space by means of (J.1,K,A.,p) model admits. The first approach became known as
processes mimicking mutation, recombination and selection the (1+1)ESmodel: it had, in each generation, only one parent,
(because EP does not use recombination, it is just a special case only one descendent was generated and the selection acted
withina family of evolutionary mimickingprocesses, ES). And uponthe set constituted by parent and child.
they are distinct from GA models in the fact that there is no
distinction betweengenotype and phenotype, i.e., there is Dot a Later, one spoke of an opposition of (Jl+).)ES against
separation of worlds where variation is generated in one of (J.1,A)ES. In the (J,1+A.)ES, the J.l survivors in each generation
them while selection acts on the effects felt in another ODC. In wereselectedfroma populationformed by the union of the sets
ES, the problem and the alternatives are represented in their of J.1 parents and Achildren. This meant that an lndividuaI had
natural variables. the possibility, in theory, of living forever. According to this
notation, the first experiments obeyed to a (1+1)ES strategy.
The problem addressed in the early 1960's by Rechenberg
and Schwefel was related with finding optimal shapes On the other hand, in a (Jl,A)ES with A ~ .... ~ 1, the new Il
presenting minimum drag in a wind tunnel - a typical future parents are selected from the A. offspring only, no matter
engineering problem. Between 1963 and 1974, these how good their parentsmight be. It has been demonstrated that
researchers developed the foundationsof a theory for Evolution this strategy risks to diverge in some cases, if the solution
Strategies. The results, however interesting, remained within "best-so-far" is not storedexternallyor at least preservedwithin
the knowledge of a closed community, perhaps especially the generation cycle (this deterministic preservation of the best
composed by civil or structural engineers. It seems that a reason is called elitism). The first modelsof this kind have been called
exists for this: there are many structural or technical the (1,A) ES.
optimization problems for which no mathematical analytical
closed fonn for an objective function exists. Therefore, The (J.1,A.)ES implies that an individual can have children
engineers had to rely on their intuition and professional only once, and that its life duration is of one generation, as

opposed to the (J.l+A)ES where there is no limit for the life span II initialize a random population P of J.L elements
of an individual. InitpopulationP[g];
II evaluate the fitness of all individuals of the population
The {J.l,lC,A.,p)ES introduces new degrees of freedom in Evaluate P[g);
defining an evolution strategy. With the variable 1C defining a while not done do
life span for each individual, one can now test a variety of II reproduction - generate A. offspring...
strategies and look at the {J.L,A)ES and the {J.l+A.)ES as the II .. •by recombination
P' [g) := recombine(P [g])
extreme cases of such variety. Furthermore, by recognizing the
II .••by mutation - introduce stochastic
role of the parameter p, which defines how many parents does perturbations in the new population
an individual have, one explicitly introduces the operation of
recombination as one major factor conditioning the P [g]:= Mutate ( P' [g) );
II evaluation - calculate the fitness of the new
development of an EvolutionStrategy. individuals
But in contemporary ES, the (J.L,le,A-,p) are not the only Evaluate P [g];
parameters to take in account.We can list somemore: II selection - of J1 survivors for the next generation,
based on the fitness value
p - the (start)population
P [g+1) := seled ( P[g] u P
[g] );
mut - the mutationoperator II test for termination criterion (based on fitness, on
Pm - mutationprobability If test is positive then done := TRUE;
rec - the recombination operator II increase the generation counter
Pr- recombination probability End while
sel- selection operator
l; - numberof stochastictournament participators 2.2. Some more basic concepts
Other parameters may be recognized, some of them Although there is still much work to be done in order to
associated with the operators ree, mot and sel adopted. establish on solid grounds a general theory about generalized
Evolution Strategies, there are some achievements made over
It is usual to distinguish in the representation of an
some simplified models that allowed insight into the way ES
individual two types of parameters: object parameters (OP) and
work. Although this text is not meant to organize all theory
strategy parameters (SP). Say, individual c is represented by
behind ES and is instead oriented to give an introduction to the
c(OP,8P), such that
topic to Power Engineers and Researchers, we win nevertheless
OP = (Oh 02, ... ,ouo) introduce some basic concepts that have been used by
researchersin the field.
and SP =(5h 52, ... , sas)
In trying to develop a formal description of the behavior of
given"no" objet parameters and "ns" strategyparameters. ES, researchershave worked mostly on the so-called"spherical
The object parameters may be OP = (9, xi, ... , xJ. The x model", and have introduced the concept of "progress rate".
are the classical n variables of the problem o(or the phenotypic The progress rate cp is defined as the Expectation of the
variables) and are the only ones that enter the fitness function. change in the (Euclidean) distance, from one generationto the
The parameter e counts the remaining life span measured in following, between the optimum (wherever it is) and the
number of iterations (reproduction cycles). Of course, at the average location of the population.
birth of an individual,e = x,
The spherical model consists of an isotropic fitness
The strategy parameters usually refer to the standard landscapedefined by
deviations CJ for mutations, which can be global or in each of
the n dimensions or variables of an individual, and to
parameters a establishing correlation between mutations in Ft(y)=co + tCi~i -y;f
distinct variables (sometimes called"angle"parameters). i=l

The general algorithm for Evolution Strategies could be with Ci = I , Vi = 1,... , n

something like this:
and the symbol· denoting "optimum".
This is an interesting model because it bas radial symmetry. °

Procedure ES Therefore, the fitness function may be also written as

II define parameters and operators F(y) - F(y· R) - Q(R), where R is the "distance-to-
set ll,lC,A.,p and other parameters; optimum" vector; but due to the radial symmetry of the
set operators(rec. mut, sel); spherical model, Q(R) is dependent only on R =ItRn, the length
II start the generation counter ofR, i.e., Q= Q(R.) and has y* as symmetry center.
9 :=0;

A mutatedindividual is thereforedefined as resultsfor this strategy,namely about convergencevelocity and
step size.
This model will have an individual in a generation
where Z is a random vector. Observing Figure 1, we can represented as a set of variables (without loss of generality,
understand that the rate of progress <p may be defined as the let's admit for the moment that we are dealing only with real
expectation valuedvariables), such as in Figure2.
cp= E{R -r}
where r is the distance of a mutated individualto the optimum.
This has been the basic model adopted by researchers like
Beyer [2] to analyze, in the spherical model, the progress rate
of an ES and to derive laws about the probability of success, Figure2 - Representation of an individuali with n real valued
i.e., the probability of the mutated y being inside the circle variables
definedby R aroundthe optimum, and about how to achieve an
optimal progress rate, i.e., the fastest progress possible towards
the optimum. The mutation scheme may be described as follows: a
mutation step is carried out at generationg by addinga random
perturbation Z to the parent individual X<a), creating X:
X :=X{J)+Z

, z = a(N 1(0,11.·., Nn (O,l))t

,, The individual X is a vector of"n" object variables. Nn(O,l)
corresponds to the Gaussian distribution with zero mean and
unit variance of the jth variable, and a is the step size or
mutation strength. Z is a random vector and, given the
,, defmition above,the mutation distribution is isotropic.
' .......... _-- The global mutation strength or step size a cannot be
maintained constant, if convergenceis to be achieved. A useful
rule-of-thumb has been proposed by Rechenberg [3], the so-
called "l/S success rule": in order to have an optimum
convergence velocity, the "success rate" S(h) or the ratio
Figure 1 - Representation of the projection of a fitness
between successful mutations and all mutations should
landscape in n=2 dimensions, with identification of the
approach the value of l/S. Therefore, the step size (global
optimum y* and a point y discovered at a generation g; the
mutation variance) should be increased if the observedsuccess
circlewith radius R defines the domain of successof mutations
rate is greater than 1/5 and decreasedif it is less than l/S. S(h)
Z addedto y, for which r < R.
is the success rate in the last h generations.
According to Rechenberg, there is an optimal value for h
given by multiplying by 10 the number n of variables or
2.3. Theearly (1+1)ES and the l/S rule
dimensions of the search space. So, if (J is the initial step size
The first ES models experimented in the 1960's were using for a problem with 10 variables, then one should evaluate the
only one parent and one child per generation. In what is now success rate in a set of h=100 generations and (J may change to
called a (1+1)ES, or the Plus Strategy, one parent generates a.a if S(h»1/5 and to a/a if S(h)<I/S. The value of a should be
one offspring per generation, by applying normallydistributed chosensuch that ae[l, 2].
mutations (this means that smallerchangesare more likely than
major changes); if one child has a better performance than a This result has been obtained for strongly convex functions
parent, measured in terms of the optimiation objective, it of the spherical model type. This spherical model became
replaces the parent (becomes selected) otherwise the child is important in many subsequent theoretical advancements,
discarded and a new offspring is generated from the parent because in many situations it is locally a good approximation of
individual by mutation. No recombination is used, the mutation a given fitness landscape. In the case of spherical models and
scheme is kept unchanged alongthe generations and there is an similar, a linear convergence order for convergence velocity
exogenous controlof the step size. has been proved.

We see that this is a simpleschemewith elitist selection. So However, this "l/S rule" may also reduce the effectiveness
simple, in fact, that it allowedthe derivationof sometheoretical of the algorithm in finding an optimum; it may accelerate the
discover of the optimum, but the probability of actually

reaching it becomes reduced, because it tends to get trapped in This model allows us to introduce a dynamic control of the
local optima; from then on, if may become difficult to find mutationstrength,as opposed to the Rechenberg I/S rule which
improvements in the neighborhood and then, after a number of was seen by many as a concept alien to the ES spirit: the
generations, the application of the 1/5 rule will further reduce mutation strength was controlled externally by -some
the stepsize, making it even more difficult to escape. deterministic rule.
Therefore, the research effort was directed towards
2.4. Focusingon the optimum achieving a dynamic control of the mutation strength under
Mutations leading to important variations in the individuals principles of evolution and self-adaptation - this means that a
are usually beneficial to the procedure in the beginning, mutation strengthparameter would also be subject to mutation
becausethey allow new individuals to jump away from parents and selection,in order to adapt the progress of the algorithm to
and thus to probe vast regions of the feasible domain of the an optimalprogress rate.
problem. However, at a later stage, large perturbations drive
A succesful family of models in this line of reasoning is the
individuals away from the region of the optimum.
aSA or a-Self Adaptation strategy, originally developed by
Common sense tells us that when we have solutions Schwefel [4,5].
neighboring a possible optimum, the spread of the probability
The central idea is that each individual is governed by
distribution that regulates mutation should become narrower..
object parametersand by evolvable strategy parameters, and if
This allows a fine adjustment of the solutions and is part of the
an individual is selected with respect to its fitness, the
rationalbehind Rechenberg's Rule.
corresponding set of strategicparameterssurvive as well. These
This rule introduces or extemally defines rules for reducing strategic parameters, if optimal, should drive the individuals
the spread of the probability distributions, with the increase in into a regime of optimum gain, i.e., of maximal expected
generations. Proposed as such, this naive scheme is mechanical, fitness improvement per generation.
deterministic and rigid and against the very spirit of
In the (1).) strategy we have only one evolvable strategy
evolutionary processes.
parameter - a mutation strength G. An individual at a
One technique instead gained popularity, because it uses the generation g can thereforebe representedsuch as in Figure 4.
very same principles ofevolution in a sort of meta-evolutionary
This scheme associates to each individual an extra variable,
which represents precisely the variance of its mutation
distribution. Although we will further elaborate on this topic
some sections below, we can immediately state that this
strategy is quite successful in many cases and that it has
theoretical background to support it Figure 4 - Representation of an individualwith n real valued
variables; it has an extra variable related with the variance of
the gaussian distribution commanding the mutations in its

The hope in a self-adaptationapproach is that the problem

will "learn" the value of an optimal strategic parameter a*
whose adoption would lead to the maximum possible progress
rate. Unfortunately, this is a value that is not available
beforehand, its calculation would depend on the knowledgeof
the location of the optimum that one is precisely seeking - it is
an unknowntheoreticalvalue to be approached.However, there
will always be random fluctuations around the exact value that
will cause some loss in efficiency.
Figure3 - Two mutation gaussian distributions with distinct
variances. Their mean value is the present value of a variable. As in (1+1 )ES, the object parameters are subject to mutation
A smaller variance allows greater probability for smaller such that, departing from a generation g, A offspring are
perturbations in the value ofa variable. generatedby

2.5. The (1).) ES and aSA self -adaptation

In this strategy, one parent X generates A, offspring
individuals, and selection acts among these to obtain a single
survivor to become a parent in the following generation.
Therefore, parent and children do not compete against one but now the mutation strength (J has been itself subject to
another. mutationsuch that

Gk (g+1) = n~(g) J proportional to 1/ ~ , where n is the dimension of the search
The n[.] mutation operator perform~ . m~ltip1icative
mutations. This can be done by the multiplication of the Practical rules fromBeyer [7] are the following:
parental ~) by a randomnumber ~ such that 1
For A.~10, or=-: ~Cl,')..
O'k (g+1) = ;O'(g) , k = 1 to A

The expectation of~ must not deviate too muchfrom 1, i.e., 1 et,A.
For 4<A<IO , 't~ r
E{;} ~ 1 vnn 2cf,A +1-2d~~~
There are some distributions of the random variable ~ that
have found practical use. One of the most important is !he where cl,A is called the ,'progress coeffici
tcient" and d(2) .
1,A IS
lognormal distribution, originallyproposed by Schwefel, w~~h
has the property that some value will have the same probabIlIty called the "second order progress coefficient" [6]. Here is a
of beingdoubled or of becomingdivided in half: tableof coefficients extracted from [7], calculatedby numerical
integration froma theoretical model.

1 1 2~
1(1n;J2 Table 1- Coefficients cJ.L,1 to adopt in a (l,A) ES
tJi,; ~ A et,A d(2)
For practical purposes, the random variable ~ can easily ~e
generated from the Gaussian N(O,l) by an exponential 2 0.5642 1.000
transformation such as 3 0.8463 1.2757
4 1.0294 1.5513
~= e'tN(O,l)
5 1.1630 1.8000
These expressions introduce the external value 1: - ~e 6 1.2672 2.0217
learningparameter, which will soon be discussed. The learning 7 1.3522 2.2203
parameter t conditions the speed and accuracy of the aSA 8 1.4236 2.3995
evolution strategy. Therefore, the question on how to choose 9 1.4850 2.5626
good valuesfor "t remains,for the moment. 10 1.5388 2.7121
Another mutation rule used in practice depends on the 20 1.8675 3.7632
symmetrical two-point disttibution. For practical purposes, the 30 2.0428 4.4187
mutatedvalue of Ok(g+l) is given by 40 2.1608 4.8969
50 2.2491 5.2740
0'(8+1) ={O'(g) (1 + p) , if U(O,I) S 0.5 60 2.3193 5.5856
O'(g) /(1 +p), if U(O,l) > 0.5 70 2.3774 5.8512
80 2.4268 6.0827
with U(O,l) being a sampling from the uniform distribution in 90 2.4697 6.2880
[0, I]. Also the application of this distribution depends on a 100 2.5076 6.4724
learning parameter p, sometimes appearing under the form of
200 2.7460 7.7015
a =l+p.
300 2.8778 8.4610
It has been proved that, given ~ and p sufficientlysmall, the
effects of these two mutation schemes become comparable. It
Below A=4 one cannotadoptthe second formula, becauseit
has been demonstratedthat there is an equivalence between the
yields an imaginary result; a value larger than c1,A. may be
two approaches, given the correspondence 't = P~l- P, if 't is
used, instead.The aSA algorithm cannot adapt itself in such a
sufficiently small. way to obtain a theoretical optimum rate of progress,but it still
self-adapts the mutationstrength.
2.6. How to choose a value for the learningparameter?
As a general indication, it must be said that there is a
It has been demonstrated, within the hyper-spherical model, theoretical value for t that maximizes the progress rate.
which is usually a good local model, that large values for ~ or "t However, this maximum is not symmetrical with respect to t,
should be avoided. The objective is to find a compromise value and one has a much strongerrisk of degrading the performance
that leads the algorithm to a near-optimal performance, of the algorithm by choosing a "t value too small than from
measured in terms of "rate of progress" towards the optimum.
The Schwefel's rule establishes that 't should be chosen using 't > Cl,A! ~ •

Also, it has been observed (as naturally expected) that a 4 0.41 1.05 1.49 1.70 1.84 1.95 2.24 2.51 2.65
transient periodprecedes the establishment of a steady state in 5 0.00 0.91 1.39 1.62 1.n 1.87 2.18 2.45 2.60
the progressof the algorithm. 10 0.00 0.99 1.28 1.46 1.59 1.94 2.24 2.40
20 0.00 0.76 1.03 1.20 1.63 1.97 1.15
The transient behavior time is proportional to D. This is not
30 0.00 0.65 0.89 1.41 1.79 1.99
a serious problem if the dimension of the problem is not too
40 0.00 0.57 1.22 1.65 1.86
large, say, n<200, which is realistic for a number of practical
50 0.00 1.06 1.53 1.75
100 0.00 1.07 1.36
The magnitude of the learning parameter influences the
duration of the transient period, whose duration in generation
number is inverselyproportional to '[2. If'[ is chosen according One of the consequencesof having J.1 parents is that one can
keep a number J.L of distinct (J strategic parameters, each
to the rule that makes it proportional to 1/ J;, then the associated with a parent and mutated according to the rules
transient phase duration becomes proportional to the space explained above in 2.5.. It has been demonstrated, however,
dimension n. If n is very large, say, n>1000, this may be a that this (J.1,A)Evolution Strategy risks to diverge and therefore
serious problem, and then it is advisable to keep a fixed 't =0.3 Schwefel has recommended the adoption of elitist strategies,
during an initialperiod before starting to applythe rules above. . such as keeping or preserving the best individuals to the
The choice of the leaming parameter according to .followinggeneration.
Schwefel's rule leads to a nearly optimum progress rate of the But if one is going to follow this elitist approach, then why
algorithm, once the transient phase is finished. A aSA not just adopt a (J.L+A.)ES and have naturally kept the best
algorithm with a learning parameter chosen according to the individual in the successivegenerations?
rules aboveexhibitsa linearconvergence order.
There are always fluctuations and it is not possible to attain 2.8. Self-adaptation in (Jl,A.)ES
the theoretical optimum rate of progress of the algorithmjust
Departing from the basic ideas of self-adaptation, tested,
by manipulating 'to Therefore, other mechanisms have been
examined and theoretically explained for (l,A)ES, some
tried, like keeping a memory of the past values of the mutation
variants have been considered and developed for the (J.L,A)ES,
rate in order to act upon some kind of moving average, instead
of upon the mostrecent value. in what we could call the aSA (J1,A.)ES, which follows the lines
of GSA(I ,A)ES.
2.7. The (J.1,A.)ES as an extensionof (l).)ES In a aSA (J1).)ES, we must consider again object
parameters (the variables of the problem) and strategic
The emergence of a (J.L,A,)ES model with (J.1 parents S "-
offspring) is a natural development of Evolution Strategies. In parameters- in this case, a mutation rate Ok associated to each
(J.1,A), there is a population of J.1 individuals evolving in the
individual to be mutated. The object parameters (variable
values) are mutated as usual, by having
parameter space; they generate A. offspring by randomly
selecting one parent and mutating it, and doing this A times. Xk=Xk(g)+Zst , k=ltoA
The J.1 individuals of the following generation are selected as
the best among the A individuals generated by mutation - an where
elitist strategy.
Zk = csk (g+l) [N(O,11...,N(O,l)]t
Beyer, in [8], developed theoretical work in order to explain
the progressof a (J.1,A)ES as a generalized model of to (1,A)E8. In order to approximate an optimal progress rate, the Ok are
He managed to derive a fonnula for the progressrate dependent mutated according to
on a single cJ.L,1 progress coefficient parameter, generalizing Zk
ak 7.1\
=cske-ue =(JkeZO+ZL..
the result obtainedfor the (1,A)ES.
Below, we reproduce the table included in [8] which gives As usual, the symbol- denotes a mutated variable.
the value of c IlJ.. to consider when developing a an Evolution According to Schwefel [1], the mutating factors should be
Strategymodel of this kind. given. by Gaussian distributions dependent on leaming
parameters 't such that

Table 2 - coefficients cJ.L,1 to adopt for a diversityof (J1).)

Zo eN(O,'t6) ,'t6 =K2 2~
Evolution Strategies
5 10 20 30 40 50 100
Zk eN(o,'ti), 'ti =K2 lr
Jl\1 200 300 2"n
1 1.16 1.54 1.87 2.04 2016 2025 2.51 2.75 2.88
2 0.92 1.36 1.72 1.91 2.04 2.13 2.40 2.65 2.79 For 1<f1<A., large n and A. and not too small J.L , one may
3 0.68 1.20 1.60 1.80 1.93 2.03 2.32 2.57 2.71
have a direct relation of K with the progress rate, and then

2.9. The (f.1+A.) ES and EvolutionaryProgramming
Instead of a (J..L,A)ES, one may have a (fJ.+A.)ES. In this case,
Having uncorrelated distinct mutation strengths associated the IJ. parents of a (g+ 1) generation are chosen among the J.l
to the variables of the problemallowsthe evolution to adapt to parents from generation (g) plus the A. offspring created by
an anisotropic shape of the fitness landscape; however, the mutation from those Il parents.
search proceeds much along the coordinate axes of the search The practical indications on the values of the parameters to
space,as illustrated in Figure 5. adopt in a basic (J.1+A)ES follow the general trend of the
A (J,L+A.)ES, with J.1=A, is similar and can be assimilated to
Evolutionary Programming. There is only one traditional
difference that is minor but that has been artificially inflated to
sustain the idea that Evolutionary Programmingand Evolution
Strategies are two separate methods - it is the form of selection.
While in ES it was traditional to have an elitist selection
(the best at each generation would be selected to the next one)
in EP the tradition had preferred selection by stochastic
tournament. We say tradition, on purpose: tournament and
elitistselectionhave been used by both schools,ofES and EP.
We recall that the most simple Stochastic Tournament is
T(l,2), the one that, in successive operations, randomly
samples 2 individuals from the parent population and, with a
given probability fixed externally, selects the one with better
Figure 5 - Illustration of the search pattern induced in two fitness to be includedin the next generation.
different individuals by distinct mutation rates affecting the
distinct variables This is done as many times as necessary until the required
number A of offspring is generated Other kinds of Stochastic
Tournaments can be conducted,such as T(m,n), wherethe best
This couldbe recognized in searching for the optimumin a m out of a sampleof n parents arc selected.
function tested by Schwefel [9] as simple as In some models, however, the stochastic nature of the
selection is abandoned, and pure elitist processes are adopted.
F(X) = Li.xf For instance, one simply examines the fitness of all individuals
in the parent population, and selects the best A. individuals to
form the next generation.
where each variable is differently scaled - self-adaptation
In parallel with the ES community, the Evolutionary
demands the learningof the scalingof n distinct (Jj_
Programming followers also developeda self-adaptive strategy.
It was verified that the self-adaptation scheme was very Applied to EP, this process has been introduced in 1992 as
successful, after examining the results of a series of "meta-EP" [10]. The mutation process governing the evolution
experiments of a aSA (J1.,lOO)ES for J1 varying between 1 and of the mutation strength parameter in each individual from
30. Furthermore, it was discovered that the most successful generationg to generation g+1 is given by
scheme was with p. = 12, and that both smaller and larger
valueswouldcause loss of convergence speed. 0'(1+1) = ~O'(I)
The interpretation given is that for self-adaptation to work where ~ is a randomnumbergiven by
properly and efficiently, it requires a certain degree ofdiversity,
represented in a number of parents. Furthermore, it has been ~ = l+tN(O,I)
discovered that having A>p. is important, as well as having a and 1: is the "learningparameter", fixed externally. One may
limited life-span of individuals (not allowing them to survive observe that the mutations in the mutation sttength parameter
for more than a given number K of generations) and also the are still of multiplicative type, like in ES, while the mutations
application of recombination in the strategy parameters are all in the phenotypicvariablesare additive.
pre-requisites for a successful self-adaptation scheme, which
then canbe made to approachtheoretical optimumconvergence Under this light, we can observe that the mutation operator
speed. used in EP, in the variant called "meta-EP" and discussed
above, can be derived from the lognormal operator presented
for ES 2.5. by taking the Taylor expaIJSion to its linear term,
which gives precisely

~ = 1+ 'tN(O,l) II test for termination criterion(basedon fitness,
number of generations, etc.)
The optimal value of the learning parameter T has been the If test is positivethen done := TRUE;
object of empirical and theoretical studies. In many practical II increase the generation counter
models we found that this value has been fixed by trial and g:= 9 + 1;
End while
error, but the conclusionsderived for ES are perfectly valid for EndEP
the meta-EP.
Observing this piece of pseudo-code, we can see that the
Therefore the ''meta-EP'' proposed by Fogel[ll] with its selection procedure acts upon a generation which is composed
gaussian approach, if or is sufficiently small, becomes included by "parents" and "sons" - P[g] and P'[g]. This helps preserving
in the class of models of (J1+A) ES and exhibits the same the best individuals so that they may allow the exploration of
behavior. It can rightly be considered, therefore, as a particular promising regionsby giving place to "good" mutations.
case of this set of Evolution Strategies.
In the spirit of evolutionary computing, the selection
2.10. A schemefor Evolutionary Programming procedure should be stochastic, i.e., the best individuals should
be selected to the following generation with a given (usually
A typicalEP model, as any model in the ES family, requires high) probability. However, it is usual to find also, in practical
the definition of a fitness function and of a population of applications, procedures for deterministic selection, where the
individuals. best are always selected.
Each individual is represented by its variables, in their
natural domains. If a solution requires the representation of 2.11. Enhancingthe mutationprocess
structural or topological aspects, these can also be represented After having experiments with a global evolutive mutation
as naturallyas possible, namely by discrete variables. strength, and defining a global learning factor, as we have
Mutations act directly on variable values of an individual. discussed so far, researchers tried to decouplemutations in one
Real-valued variables are subject to a zero mean multi-variate individual, so that the distinct variables could undergo distinct
Gaussian perturbation in each generation. This means that evolutiveprocesses.
minor variations in an offspring become highly probable while This meant that, for an individual, one would set not a
substantial variations become increasingly unlikely. The singlemutationstrengthbut instead n mutationstrengthsfor the
Gaussianscheme,however, does not prevent them. n objectiveparameters or n variablesdefiningan individual.
This procedureallows real-valued variables to converge to a This scheme has been tried with some success. Goingback
possible optimum continuously, avoiding the discrete nature of was a good
to the "spherical model", one has postulated that it
a genetic binary coding. Also, this allows, sometimes, a new local approximation in many cases. However, it assumed an
area in the search space to be explored, by an individual that isotropic topology of the search space. Therefore, allowing
sufferedan importantsuccessful mutation. distinct mutation strengths according to distinct coordinate
For discrete variables, often associated with topology directions (the variables of the problem) would in principle
features, mutations may follow Poisson distributed deletions or allow a more accurate approximation of regions with a sort of
additions, ellipsoidtopology,instead of spherical.
So here is the pseudo-code for a general EP algorithm: A scheme with n mutation strengths allows, therefore, a
decoupling of the mutationrate evolutionaccordingto the axial
Procedure EP directions of the search space. In many cases, this will be
II start the generation counter enough to enhance the performance of a self-adaptive
9 :=0; EvolutionStrategy.
II initialize a randompopulation P
Initpopulation P(g]; But, in some cases, this is not enough. Correlations must be
II evaluate the fitness of all individuals of the population establishedbetween evolution along some direction and along
EvaluateP[g); some other direction. Otherwise, slow progress or even
while not done do divergencemay occur.
II reproduction - duplicate the population
P' (g) := P (g] The original scheme with one single mutation strength
II mutation • introduce stochastic perturbations in the assumed an evolution in an isotropic space - say, the length R
new population, including in the strategicparametera of a vector R, or IIRlI, is given by URII = RtR. Decoupling
P [g]:= Mutate ( P' [g] ); variable mutation strengths is equivalent to assuming a
II evaluation • calculate the fitness ofthe new diagonal metrics matrix in the search space; therefore, the
individuals lengthR of a vector R would be given by IIRlI = RtDR,with D
beinga diagonal matrix. This has been illustrated in Figure S.
Evaluate P [g);
II selection- of the survivors for the next generation, Recognizing this, ES has incorporated correlation between
based on the fitness value and on stochastic mutations as strategic variables. This is equivalent to
tournament considering a Mahalanobis metric in space - the length of a
P [g+1) := select ( P(g] u P[g) ); vectorR will be givenby URII- RTR, whereT is a full matrix.

Figure 6 illustrates the effect of having non-zero where Z = (z),... ~) and Zj e N(O, G i2 ).
covariances between variables, allowing the exploration of the
search apace along directions not aligned with the coordinate The vector CZ is, therefore,a random vector with normally
axes. distributed and eventually correlated components, as a function
ES followers have adopted a formal mathematical of the Ui and the O"i.
representation of the possible covariances of the mutation The strategic variables that establish the correlations or the
distribution in the severaldirectionsin space. The basic concept covariances (filling up of non-zeros the elements out of the
is the one of "inclination angle a" as the departure point to main diagonal of C) are the inclination angles a; one can
defininglinearlydependentmutation correlations for the object readily sec that if these angles are set to 0, th~ all matrices Tpq
variables. become Identity matrices and therefore mutations will develop
Given an angle .aj, a basic covariance matrix between independently in all dimensionsof the searchspace.
directions p and q may be defined by the transformation matrix These angles a are, therefore, taken as strategic variables
T, givenby and may also be mutated and subject to a self-adaptive
1 0 o procedure.
o 1 In summary, to establish correlated mutations, one may
proceed step by step as follows:
1st - mutatethe OJ
o M
2 - mutate the «it
Tpq(a)= 3rd - calculateand apply the matrix C to obtain a new
o mutatedindividual
o Theangles ak shouldbe mutated with

o o ak =ak +zk with zk e N(O,p2)

Valuesof ~ come from experimental wode, and Schwefel[ 1]
where only lines and columns p and q have elements distinct
has reccomended something around5° or 0.0873 rad as giving
from 0 or 1.
goodresultsin practice.

2.12. Recombination as a majorfactor

So far, we have only discussed mutation as a factor of
progress in an Evolutionary Strategy. However, this is an
incorrect or at least an incomplete view. In fact, recombination
may play an important, even dominant role in accelerating the
progress to the optimum and enhacing the chances of
succession the searchprocedure.
Usingthe notation that Schwefel and Recbenbtq proposed,
as we have been doing so far, we have DOW to consider(lJIp,A.)

- --~
~a Evolution Strategies. In this variant, theperameter p determines
.........-.",J.::::'- the number of parents that recombine to form one new
The biological construction is based on p=2, but when
building an EvolutionStrategy, we do not need to be limitedto
that optionand may experiment strategieswith p>2.
Figure 6 - Illustration of the search process in correlated
directions of two distinct individuals with different linear Recombination is a word that designates a number of
correlations between variables. The angle a becomes one distinct procedures that share the property of building an
strategicvariable subject to mutation(and recombination) individual departing froma set of parents.
The product of all Tpq matrices according to all Here are some possible recombination schemes, that have
combinations of p and q gives the matrix C of covariances. beenused in EvolutionStrategies:
This allows the calculation of an actual mutation i to a given • Unifonn crossover - in this variant, the value for each
individual X by variable in the newly formed individual is obtained by
randomly selecting one of the p parents to "donate" its
x=x+cz value. In the case of p-2, it is traditional to randomly
generate a bit string with length equal to the Dumber of

variables of the individuals and then to use such string to discarded if the constraints are not met, and replacement
command the recombination procedure: if a bit associated generated until one is found that respects constraints.
to a variable is 1, the value from the first parent is selected, Furthermore, during the mutation phase, mutations can be in
ifit is 0 then the value from the secondparent is selected. many cases conditioned so that there is no possibility for an
unfeasibledescendentto be created.
• Intermediary recombination - in this variant, the value of
any variable in the offspring receives a contribution from This was the original ES scheme for handling constraints-
all parents. This could result either from averaging the but sometimes it may be a very time consuming procedure.
values of all parents (global intennediary recombination)
The other possibility is to handle constraints during the
or from averaging the values from a subset of the parents
selectionphase,by attributing a low fitness value to individuals
only, randomly chosen (local intermediary recombination).
that violateconstraints.
In these processes, one may still chose to average values
with equal weights or to randomly define weights for a
2.14. Startingpoint
weighted average. In the case of p=2, one could have the
valueof a variable givenby To start an EvolutionStrategy process, one has to generatea
first initial population of J.1 individuals. This can be typically
x~ew =Uk Xk,jl +(l-uk )Xk,j2 , done in two ways:
where the indices j 1 and j2 denote the two parent • By randomly generating the coordinates for the Jl
individuals and Uk sampledfrom an uniformdistribution in individuals, or
• By generating mutations from a seed or starting
• Pointcrossover - in this variant,parallelto the one adopted individual
in genetic algorithms, first one randomly defines 'Y «n) It was traditional in the community of Evolution Strategies
crossover points, common to all individuals in the set of to make sure that the initial population was composed of
parents,and then the offspringsuccessively receivesa part feasible individuals. However, this may not be mandatory if an
from eachparent, in turns. adequate method of penalties and handling constraints is
Experiences have demonstrated the power of recombination adopted.
to greatly accelerate the convergence of Evolution Strategies.
This beingso, some theoreticalexplanations were sought. 2.15. Fitnessfunction
In the Genetic Algorithm community, the Building Block The fitness function is usually represented by the objective
theory became popular. It stated that recombination allowed function to be evaluated, representative of the problem to be
good blocks from each parent to join together in a better solved.
descendent To this fitness function, one may add the effect of penalties
But in the ES community other mathematical descriptions to represent the undesired violation of constraints. This is an
allowed distinct views to emerge. Beyer, for instance, argued approach adoptedin all evolutionary algorithms variants.
differently [12], based on his developments of the conceptof One simple and yet claimed as effective way of adding the
progressrate:he suggested that recombination acted as a sortof effect of the violation of constraints, in a problem of
"genetic repair" mechanism, compensating the disruptive maximization, is to count the number of violated constraintsor
effects of mutation. Therefore, larger mutation strengths or add up the amount of violations, and attribute the fitness value
larger learning parameters were allowed, contributing to a according to the following role:
higher progress rate than in an ES withoutrecombination.
If no violationsoccur, Fitness(X) = F(X)
Furthermore, under some assumptions, Beyer also
demonstrated that the highest impulse from recombination was If violations occurr, Fitness(X) =- 1: violation values i,
achieved when all J.1 parents contributed to form one new i e constraint set
individual. He justified this assertion with a mathematical
demonstration and called his model the (J.1/J..L,A)ES. 2.16. Computing
Recombination, thus, plays a major role in modem One may find software allowing the implementation of
Evolution Strategies and is not a secondarytechnique. Evolution Strategies. A few examples are mentionedbelow.
One well known possibility is evoC, available from the
2.13. Handling constraints Bionics and Evolution Department of the Technical university
Unlike GA, EP allows a very natural way of handling of Berlin, Germany - it is an application written in C which
constraints in a problem. Becauseeach individualis codedin its may be used in a variety of platforms, from MS-DOS to
original or phenotypic variables, it is usually easy to enforce LINUX. It is free but not in the public domain, and could until
constraints. recently be obtained from ftp:/lftp-bionic.tb1Q.tu-berlin.de
under the directory/pub/softwarelevoC.
One way to do that is during the mutationphase- eachtime
a new individual is mutated, it can be checked for feasibility,

A set of MATLAB tools with a user-friendly interface
developed in the University of Magdburg, Germany, can be
International Conference on Artificial Life, vol 929 of
asked and obtained from bihn(@infaut.et.uni-magdburg.de, Dr.
Lecture Notes in Artificial Intelligence, page 893-907.
Springer,Berlin, 1995.
There is also a set of demonstration programs with a
[2] Beyer, H.-G., "Towards a Theory of Evolution Strategies:
relevant didactic interest, that were developed and are available
from the ftp server of the Technical Unversity of Berlin Some Asymptotical Results from the (l,+A.)-Theory, in
ftp:l/ftp-bionic.tblO.tu-berlin.de . These demonstrations are Evolutionary Computation, vol. 1, no. 2, page 165-188,
available also on internet, and have the support of a technical 1993
report [13] available from the same server. [3] Rechenberg, I., "Evolutionsstrategie-Optimierung
technischer Systeme nach Prinzipen der biologischen
Evolution", Fronunann-Holzboog,Stuttgart, 1973
[4] Schwefel, H.-P., "Adaptive Mechanismen in der
Evolution Strategies is the designation of a wide family of
biologischen Evolution und ihr EinfluB auf die
evolutionary algorithms which have in common the
Evolutionsgeschwindigkeit", Technical report, Technical
representation of solutions in the space of problem variables,
UniversityBerlin, 1974
instead of using any sort of coding like the binary coding
adopted in Genetic Algorithms. [5] Schwefel, H.-P., Evolution and Optimum Seeking, Ed.
Wiley, New York NY, 1995
The ES school of thought bad its birth in Germany and
remained confined to a closed community, perhaps due to the [6] Beyer, H.-G., "Towards a theory of Evolution Strategies:
fact that many of the early publicationswere made in German Progress Rates and Quality Gain for (1,+A.) Strategies on
language. However, it is now obvious that ES is a rich and (Nearly) Arbitrary Fitness Functions", in Y.Davidor,
fruitful field. R.MinDer and H.-P. Schwefel (eds.), "Parallel Problem
Detached from the historical processes that give birth to Solving from Nature", 3, Heidelberg, pp. 58-67, Springer
new ideas, one may also with no difficulty recognize that Verlag, 1994
Evolutionary Programming, which had for some time an [7] Beyer, H.-G., "Toward a Theory of Evolution Strategies:
independent development, is just a specialized subset of Self-Adaptation",in Evolutionary Computation, vol. 3, no.
Evolution Strategies. 3,pag.311-347, 1996
There is a substancialtheoretical workjustifyingthe success [8] Beyer, H..o., "Toward a Theory of Evolution Strategies:
of EvolutionStrategyalgorithms and givingindications on how the (J.1,A.) Theory", in Evolutionary Computation, vol. 2,
to tune an algorithm in order to obtain the maximum possible no.4,pag.381-407, 1995
efficiency, measured by the rate of progress towards the
optimum. [9] H.-P. Schwefel, "Natural Evolution and Collective
Optimum-Seeking", in A. Sydow ed., "Computational
This theoretical basis clearly indicatesthat recombination is System Analysis: Topics and Trends", page 5-14, Elsevier,
a major operator inducing fast evolutionary progress. Under Amsterdam, 1992
this light, pure EvolutionaryProgramming models, relying only
on mutation, seem to be more limited. [10] D.B.Fogel, "Evolving Artificial Intelligence·", Ph.D.
Thesis, University of California, San Diego, 1992
Theory and experiments also suggest that good strategies
should use a number of parents J.1 generating a larger offspring [11] D.B.Fogel, LJ.Fogel, J.W.Atmar, "Meta-evolutionary
A>J.1, and that in many cases generalized recombination programming", in R.R.Chen ed., "Proceedings of the 25th
processes, using all J.1 parents to generate each offspring, offer Asilomar Conference on Signals, Systemsand Computers,
the faster ratesof progress or algorithmefficiency. San Jose CA, USA, Maple Press, pag. 540-545,1991

Finally, is is evident today that self-adaptation schemes are [12] Beyer, 8 ..0., "Toward a Theory of Evolution Strategies':
usually very effectiveand offer the best chances of reachingthe on the benefits of Sex - the (JJlJ.L,A) Theory", in
absolute optimum while exogenously controlled mutation Evolutionary Computation, vol. 3, no. 1, page 81-111,
strenghts, even if allowing in some cases a fast progress 1995
towards the optimum, risk becomingtrappedin localoptima. [13] M. Herdy, G. Palone, "Evolution Strategy in Action - 10
ES-Demonstrations", proceedings of the International
4. REFERENCES Conference on Evolutionary Computatin, Jerusalem,
Israel, October 1994

[1] H.-P. Schwefel, G. Rudolph "Contemporary evolution

strategies" in F. Morin, A. Moreno, J. J. Merelo and P.
Chacon, ed., "Advances in Artificial Life", 3rd

Chapter 5
Fundamentals of Particle Swarm Optimization Techniques

Abstract: This chapter presents fundamentals of particle swann Chapter V shows some applications of PSO and ChapterVI
optimization (PSO) techniques. While a lot of evolutionary concludes this chapter with some remarks.
computation techniques have been developed for combinatorial
optimization problems, PSO has been basically developed for
continuous optimization problem, based on the backgrounds of 2. BASICPARTICLE SWARM OPTIMIZATION
artificial life and psychological research. PSO has several variations
including integration with selection mechanism and hybridization 2.1 Background of Particle SwarmOptimization
for handling both discrete and continuous variables. Moreover, Natural creaturessometimes behave as a swann. One of
recently developed constriction factor approach is useful for the main streams of artificial life researches is to examine
obtaining high quality solutions. how natural creatures behave as a swarmand reconfigure the
swarm models inside a computer. Swarm behavior can be
Key words: Continuous optimization problem, Mixed-integer _modeled with a few simple rules. Schoolof fishes and swarm
nonlinearoptimization problem, Constriction factor of birds can be modeled with such simple models. Namely,
even if the behavior rules of each individual (agent) are
simple, the behavior of the swarm can be complicated.
Reynolds called this kind of agent as boid and generated
Natural creatures sometimes behave as a swann. One of complicated swarm behavior by CG animation [1]. He
the main streams of artificial life researches is to examine utilized the following three vectorsas simpleroles.
how natural creatures behave as a swarm and reconfigure the (1) to step away fromthe nearestagent
swann models inside a computer. Reynolds developed boid (2) to go towardthe destination
as a swarm model with simple rules and generated (3) to go to the centerofthe swarm
complicated swarmbehavior by CO animation[1]. Namely, behavior of each agent inside the swarm can be
From the beginning of 90's, new optimizationtechnique modeled with simplevectors. This characteristic is one of the
researches using analogy of swarm behavior of natural basic conceptsofPSO.
creatures have been started. Dorigo developed ant colony Boyd and Richerson examine the decision process of
optimization (ACO) mainly based on the social insect, human being and developed the concept of individual
especially ant, metaphor [2]. Each individual exchanges learning and cultural transmission [6]. According to their
infonnation through pheromone implicitly in ACO. Eberhart examination, people utilize two important kinds of
and Kennedy developed particle swarm optimization (PSO) information in decision process. The first one is their own
based on the analogy of swarm of bird and fish school [3]. experience; that is, they have tried the choices and know
Each individual exchanges previous experiences in PSO. which state bas been better so far, and they know how good it
These researches are called "Swann Intelligence" [4][5]. This was. The second one is other people's experiences; that is,
chapter describes mainly about PSO as one of swarm they have knowledge of how the other agents around them
intelligence techniques. have performed. Namely, they know which choices their
Other evolutionary computation (EC) techniques such neighbors have found are most positive so far and how
as genetic algorithm (GA) also utilize some searching points positive the best pattern of choices was. Namely each agent
in the solution space. While GA can handle combinatorial decides his decision using his own experiences and other
optimization problems, PSO can handle continuous peoples' experiences. This characteristic is another basic
optimization problems originally. PSO has been expanded to concept ofPSO.
handle combinatorial optimization problems, and both
discrete and continuous variables as well. Efficient treatment 2.2 Basic method
of mixed-integer nonlinear optimization problems (MINLP) According to the background of PSO and simulation of
is one of the most difficult problems in optimization field. swarm of bird, Kennedy and Eberbart developed a PSO
Moreover, unlike other EC techniques, PSO can be realized concept. Namely, PSO is basically developed through
with only small program. Namely PSO can handle MINLP simulation of bird flocking in two-dimeDsiOD space. The
with only small program. The feature of PSO is one of the position of each agent is represented by XY axis positionand
advantages comparedwith other optimizationtechniques. also the velocity is expressedby vx (the velocity of X axis)
This chapter is organized as follows: Chapter II explains and vy (the velocity of Y axis). Modification of the agent
basic PSO method and chapter III explains variation of PSO position is realizedby the positionand velocityinformation.
such as discrete PSO and hybrid PSO. Chapter IV describes
parameter sensitivities and constriction factor approach.

Bird flocking optimizes a certain objective function. y
Each agent knows its best value so far (pbest) and its XY
position. This information is analogyof personal experiences
of each agent. Moreover;each agent knows the best value so
far in the group (gbest) among pbests. This information is
analogy of knowledge of how the other agents around them
have performed. Namely, Each agent tries to modify its
position usingthe following information:
- the currentpositions (x, Y),
--+---------.... x
Sk : current searchingpoint,
- the currentvelocities (vx, vy), Sk+l :modifiedsearching point,
- the distance between the currentposition and pbest VX : current velocity,
- the distance between the currentpositionand gbest VX+ 1 : modifiedvelocity,
This modification can be represented by the concept of Vpbcst : velocitybased on pbest
velocity. Velocity of each agent can be modified by the Vgbcst :velocity based on gbest
following equation: Fig.2 Concept of modificationof a searching pointby PSO.

where, Vi"*- : velocity of agent i at iteration k,
w : weighting function,
Cj : weighting factor,
rand : randomnumber betweenO and 1,
Sjk : currentposition of agent i at iterationk,
pbest, : pbestof agent i, Fig. 3 Searching concept with agents in a solution space
gbest : gbestof the group. byPSO.
The following weighting function is usually utilizedin (1): value is stored.
Step. 2 Evaluation of searchingpoint of each agent
w -w· The objective function value is calculated for each
w=wmu - ~ nun x iter (2)
iter.flU. agent. If the value is better than the current pbest of
where, Wmu : initial weight, the agent, the pbest value is replaced by the current
W min : finalweight, value. If the best value of pbest is better than the
itermu: maximum iterationnumber, current gbest, gbest is replaced by the best value and
iter : currentiterationnumber. the agent number with the best value is stored.
Step. 3 Modification of each searchingpoint
Usingthe aboveequation, a certain velocity, which gradually The current searching point of each agent is changed
gets close to pbest and gbest can be calculated. The current using (1)(2)(3).
position (searching point in the solution space) can be Step. 4 Checking the exit condition
modified by the following equation: The current iteration number reaches the
predetermined maximum iteration number, then exit
(3) Otherwise, go to step 2.

Fig. 4 shows the general flow chart of PSO. The features of

Fig. 2 shows a concept of modification of a searching point
the searching procedure of PSO can be summarized as
by PSO and Fig. 3 shows a searchingconcept with agents in
a solution space. Each agent changes its current position
(a) As shown in (1)(2)(3), PSO can essentially handle
usingthe integration of vectors as shownin fig. 2.
continuous optimizationproblem.
The general flow chart of PSO can be described as
(b) PSO utilizes several searching points like genetic
algorithm (GA) and the searching points gradually get
Step. 1 Generation of initial conditionof each agent
close to the optimalpoint using their pbests and the gbest.
Initial searching points (Si~ and velocities (Vi~ of each (c) The first term of right-band side (RHS) of (1) is
agent are usually generated randomly within the corresponding to diversification in the search procedure.
allowable range. The current searching point is set to The second and third terms of that are corresponding to
pbestfor each agent The best-evaluated value of pbest intensification in the search procedure. Namely, the
is set to gbest and the agent number with the best method has a well-balanced mechanism to utilize

agent's deciding yes or no, true or false, or making some
other decision, is a function of personal and social factors as

Generation of initial condition of (4)

each agent Step.1

The parameterv, an agent's predisposition to makeone or the

Evaluation of searching point of other choice, will determine a probability threshold. If v is
each agent Step.2 higher,the agent is more likely to choose 1, and lower values
favor the 0 choice. Such a threshold requires staying in the
Modification of each searching range [0, 1]. One of the functions accomplishing this feature
point Step.3 is the sigmoid function, which usually utilized with neural
. ( It): - -1- -
Slgv; (5)

The agent'sdisposition should be adjustedfor successof

Fig.4 A general flow chart ofPSO. the agent and the group. In order to accomplish this, a
formula for each Vik that will be some function of the
diversification and intensification in the search procedure difference between the agent's current position and the best
efficiently. positions found so far by itself and by the group. Namely,
(d) The above concept is explained using only XY-axis (two- like the basic continuous version, the formula for binary
dimension space). However, the method can be easily versionofPSO canbe describedas follows:
applied to n-dimension problem.Namely, PSO can handle
continuous optimization problems with continuous state v:· 1 =v: + rand x (pbest, -s;It)+rand x (gbest-s:) (6)
variables in a n-dimension solution space. p:+1 <sig(v:+I)thens;t+1 =l;else.s:+ l
=0 (7)
The above feature (e) can be explained as follows [7]. The
RHS of (2) consists of three terms. The first term is the where, rand: a positive random number drawn from a
previousvelocityof the agent The second and third termsare unifonn distribution with a predefined upper
utilized to change the velocity of the agent. Without the limit
second and third terms, the agent will keep on "flying" in the Pik+1 : a vectorofrandom numbersof [0.0, 1.0].
same direction until it hits the boundary. Namely, it tries to
explore new areas and, therefore, the first tenn is In the binary version, the limit of rand is often set so that the
corresponding to diversification in the search procedure. On two rand limits sum to 4.0. These formulas are iterated
the other hand, without the first term, the velocity of the repeatedly over each dimension of each agent The second
"flying" agentis only detennined by using its current position and third term of RHS of (6) can be weighted like the basic
and its best positions in history. Namely, the agents will try to continuous versionof P80. vl' can be limited so that sig(vl()
converge to the their pbests and/or gbest and, therefore, the does not approach too closely to 0.0 or 1.0. This ensures that
terms are corresponding to intensification in the search there is always some chance of a bit flipping. A constant
procedure. The basic PSO has been applied to a learning parameter V max can be set at the start of a trial. In practice,
problem of neural networks and Schaffer f6, the famous VIDIX is often set in [-4.0, +4.0]. The entire algorithm of the
benchmark function for GA, and efficiency of the methodhas binary version of PSO is almost the same as that of the basic
been confirmed [3]. continuousversionexcept the above decisionequations.


PARTICLE SWARMOPTIMIZATION Lots of engineering problems have to handle both
discrete and continuous variables using nonlinear objective
3.1 DiscretePSO [8] functions. Kennedy and Eberhart discussed about integration
The original PSO described in IT is basically developed of binary and continuous version of PSO [5]. Fukuyama, et
for continuous optimization problems. However, lots of al., presented a PSO for MINLP by modifyingthe continuous
practical engineering problems are formulated as version of PSO [9]. The method can be briefly described as
combinatorial optimization problems. Kennedy and Eberhart follows.
developed a discrete binary version of PSO for the problems Discrete variables can be handled in (1) and (3) with
[8]. They proposed a model wherein the probability of an little modification. Discrete numbers instead of continuous

numbers can be used to express the current position and
velocity. Namely, discrete random number is used for randin
(I) and the whole calculation of RHS of (1) is discritized to
Generation of initial searching
the existing discrete number. Using this modification for
pointsof eachagent
discrete numbers, both continuous and discrete number can
be handled in the algorithm with no inconsistency. In [9], the
PSO for MINLP was successfullyapplied to a reactive power Evaluation of searching point of
and voltage controlproblemwith promising results. each agent Step.2

3.3 HybridPSO (HPSO) [10] Natural selection using evaluation

HPSO utilizes the basic mechanism of PSO and the value of eachsearching point Step.3
natural selection mechanism,which is usually utilized by Ee
methods such as GAs. Since search procedure by PSO deeply
depends on pbest and gbest, the searching area may be Modification of each searching
point Step.4
limited by pbest and gbest, On- the contrary, by introduction
of the naturalselection mechanism, effect of pbest and gbest
is gradually vanished by the selection and broader area search
can be realized. Agent positions with low evaluation values
are replaced by those with high evaluation values using the
selection. The exchange rate at the selection is added as a
new optimization parameter of PSO. On the contrary, pbest
Fig.5 A general flow chart ofHPSO.
infonnation of each agent is maintained. Therefore, both
intensive search in a current effective area and dependence
on the past high evaluation position are realized at the same
Evalua1ioo values of
time. Fig. 5 shows a general flow chart of HPSO. Fig. 6 AgerI. 1,2lie lowand
shows conceptof step. 2, 3, and 4 of the general flow chart.

3.4 Lbest model

;~~i" ').Apllt3 ·t. . . . Pbest2 1bose of agent 3, 4 are

Eberhart and Kennedy called the above-mentioned Pbest3'" , Aglnt4

basic method as "gbest model". They also developed "lbest (Gbes.t)........ ' .
model" [5]. In the model, agents have infonnation only of (a)Step.2
their own and their nearest array neighbor' bests (lbests),
rather than that of the entire group. Namely, in (1), gbest is StardJiDg points of
replacedby lbests in the model. .Pbestl ~Agmtl .Pbest2
Agellt3.: Agmt4 Agmt2
apnt 1,2 aredBlged
10 dae of agent3, 4
by the sdecrion
4.1 Parameter Selection
PSO has several explicit parameters whose values can
be adjusted to produce variations in the way the algorithm ~A.gmtl New ..m is ' - -
• Pbestl '. - ,4 Pbest2
~~ Agellt2~ •
searches the solution space. The parameters in (1)(2) are as ian the DeW
follows: Agent3
I: ~'. --41
e,," f81dUDgpoinls.
Cj : weighting factor,
W IIIU : initial weight of the weight function,
••••..Pbest3 .
(Gbest) Pbest4
W min : final weight of the weight function,
Shi and Eberhart tried to examine the parameter selection of (c}Step.4
the above parameters [11][12]. According to their Fig. 6 Conceptof searching process by HPSO.
examination, the following parameters are appropriate and
the values do not depend on problems: 4.2 Constriction factor
cj=2.0, wmu=O.9, wmiD=O.4. The basic system equation of PSO (equ. (1), (2), and
(3»· can be considered as a kind of difference equations.
The values are also appropriate for power system problems Therefore, the system dynamics, namely, search procedure,
[9][13]. can be analyzed by the eigen value analysis. The constriction

factor approach utilizes the eigen value analysis and controls Table 1 PSO lications.
the system behavior so that the system behavior has the lication field
following features [14]: Neuralnctwork leamin at orithm
(a) The system does not diverge in a real value region and Humantremor anal sis
fmally can converge, RuleExtractionin F Neural Network
(b) The systemcan search differentregionsefficiently. Batte Pack State-of-Char e Estimation
The velocity of the constriction factor approach (simplest
Computer Numerically Controlled Milling
constriction) can be expressed as follows instead of (1) and
Reactive Power and Volta e Control
Distribution state estimation
Power S stem StabilizerDesi
Fault State Power Supply Reliability
(9) Enhancement
*) No. shows the paper No. shown in bibliographiessection.

continuous variables. Moreover, recently developed_

For example, if cp=4.1, then X=O.73. As <p increases above 4.0,
constncnon factor approach is based on mathematical
X gets smaller. For example, if <p=5.0, then X=O.38, and the
analysis and useful for obtaining high quality solutions. A
damping effect is even more pronounced. The constriction
few applications arc already appeared using PSO. PSO can be
factor approachresults in convergence of the agentover time.
an efficient optimization tool for nonlinear continuous
Unlike other EC methods, the constriction factor approach of
optimization problems, combinatorial optimizationproblems,
PSO ensures the convergenceof the searchprocedures based
and mixed-integer nonlinear optimizationproblem (MINLP).
on the mathematical theory. Namcly, the amplitude of the
each agent's oscillation decreases as it focuses on a previous
best point. The constriction factor approach can generate
higher quality solutions than the conventional PSO approach
[15]. [1] C. Reynolds, "Flocks, Herds, and Schools: A Distributed Behavioral
However, the constriction factor only considers Model", Compte" Graphics, VoL21, No.4, pp.2S-34,1987.
dynamic behavior of each agent and the effect of the [2] A. Colomi, M. Doriao, uad V. Maniezzo, "Distributed Optimization
interaction among agents; namely, the effect of pbest and by Ant Colonies", hoc. of First EII1'OpetlII Conference on Artificial
gbest in the system dynamicsis onc of the futureworks[14]. Lije,pp.134-14~ Cambrictae, MA: MIT Press 1991.
(3] J. Kennedy and R. Eberbart, "Panicle Swann Optimization",
Proceedings of IEEE Intemational Conference on NellTQl Networks
(lCNN'95), Vol IV, pp.1942-1948,Perth, Australia, 1995.
5. RESEARCH AREASAND APPLICATIONS [4] E. Bonabeau, M. Dorigo, and G. Theraulaz, SWtzml Intelligence:
From NalUral to Artificial Systems, Oxford Press, 1999.
Ref. [16]-[66] shows other PSO related papers. Most [S] J. Kennedy and R. Eberhart, Swarm lntelligence, Morgan Kaufmann
of papers are related to the method itsel~ and its modification [6] R. Boyd and P. Recbarson, Culture and the Evo/utio""'Y Process,
and comparison with other EC methods. PSO is a new Ee Universityof Chicaso Press, 1985.
technique and there are a few applications. Table 1 shows [7) Y. Sbi aDd R. Ebabart, "A Modified Particle Swarm Optimizer",
applications of PSO in general fields. The last four Proceedings of IEEE lntemtltional Conference on Evolutionary
applications are in power system fields. Detailed description Computation (lCEC'98), pp.69-73, Anchorage, May 1998.
[8] J. Kennedy aDd R. Ebabart, "A discrete binary version of the particle
of [9][32[66] and [SO] can be found in Chap.13. Application swann optimization algorithm", hoc. oj the /997 conjerace on
of PSO to various fields is at the early stage. More Systems# Man# and CyHrnetics (SMC'97),pp.4104-4109,1997.
applications can be expected. (9] Y. Fukuyama, et al., "A Particle Swarm Optimization for Reactive
Powerand VoltageControl ConsideringVoltale SecurityAssessment",
IEEETrtl1I.rQCtion on PowerSystems, Vol IS, No.4,November 2000.
[10] P. ADgetiDe, "Using Selectian to Iqxove PaItic1e SWII'I1l Optimization",
6. CONCLUSIONS Proc. of IEEE ~ ~ 011 EvoIutiontzIy Computation
(lCEC'98), ADchora&e, May 1998.
This chapter presents fundamentals of particle swarm [11] Y. Shi aDd R. EbedIart, "Parameter Selection in Particle Swann
optimization (PSO) techniques. While a lot of evolutionary OptinriDtion", Proc: of the 1998 A.1I1IIMIl Conf.-ce on EvoJudonary
computation techniques have been developed for Programming, San Diego,1998.
[12] Y. SID and It EbedIart, "A ModifiedPaItic1e Swarm Optimizer", hoc. of
combinatorial optimizationproblems,PSO has beenbasically IEEE lnlmltlliDnal Cortference on Evohdionary ComputDtion (lCEC'98),
developed for continuous optimization problem. PSO has ADc:horIp, AIIsb, May 4-9, 1998.
several variations including integration with selection [13] S. Nab, T. Genji, T. Yura, and Y. Fukuyama,"PracticalDistribution
mechanism and hybridization for handling both discrete and State Estimation Using Hybrid Particle Swann Optimization", hoc. 01

IEEE Power Engineering Society Winter Meeting, Columbus, Ohio, andTechnology,IUPUl, 2001.
January 2001. [32] Y. Fukuyama and H. Yoshida, "A Particle Swarm Optimization for
[14] M. Clerc, "The Swarmand the Queen: Towardsa Deterministic and ReactivePower and VoltageControl in Electric Power Systems", hoc.
Adaptive Particle SwarmOptimization", hoc. of IEEEInternational ofCongress on Evolutionary Computation (CEC2001), Seoul, Korea.
Conference onEvolutionary Computation (ICEC'99), 1999. Piscataway, NJ: IEEE Service Center,2001.
[15] R. Eberhartand Y. Shi, "Comparing Inertia Weights and Constriction [33] Z. He, C. Wei, L. Yang, X. Gao, S. Yeo, R. Eberhart, and Y. Shi,
Factors in Particle Swarm Optimization", Proc, of the Congress on "Extracting Rules from Fuzzy Neural Network. by Particle Swann
Evolutionary Computation (CEC2000), pp.84-88,2000. Optimization", hoc. of IEEE International Conference on
[16] M. A. Abido, "Particle Swarm Optimization for Multimachine Power Evolutiol'llZTY Computation (lCEC'98), Anchorqe, Alaska, USA, May
System Stabilizer Design", hoc. of IEEE PowerEngineering Society 4-9, 1998.
SummerMeeting, July 2001. [34] A. Ismail, A. P. Engelbrecht, "Traininl Product Units in Feedforward
[17] P. Angeline, "Evolutionary Optimization versus Particle Swann NeuralNetworks using Particle SwarmOptimization",Proceedings of
Optimization: Philosophy and Perfonnance Differences", Proceeding the International Conference on ArtUiciQlInleJligence, Durban, South
of The Seventh A1I1IUQI Cor(. on Evolutionary ProgrQlllming, March Africa,pp 36-40, 1999,
1998. [35] J. Kennedy, "The particle swann: social adaptation of knowledge",
[18] P. Angeline, "Using Selection to Improve Particle Swarm hoc. of Intemational Conf.ence on Evolfltionary ColflPUlalion
Optimization", Proceedings of IEEE Internaliona/ Conference on (CEC'97), IndiaDapolis, IN, 303-308. Piscataway, NJ: IEEE Service
Evolutionary Computation (lCEC'98), Anchorage, Alaska, USA,May Center, 1997.
4-9, 1998. [36] J. Kennedy, "Minds and cubures: Partide swarm implications",
[19] A. Carlisleand G. Dozier,"Adapting Particle Swarm Optimization to Socially Intelligent Agenu: PtlpC"S from the 1997 AMI Fall
Dynamic Environments", Proceedings ofInternational Conference on Symposium. Technical Report F5-97-D2, 67-72. MeDlo Park, CA:
ArtificiDIIntel/igence, MonteCarloResort,Las Vegas, Nevada,USA AAAI Press, 1997.
[20] A. Carlisle, and G. Dozier, "An off-the-shelf particle Swann [37] J. Kennedy, "Methods ofapeeiDent: inferenceamoas the eleMentals",
Optimization", Proceedings of the Wor/ahop on Particle Swarm hoc. ofInternational Symposium on Intelligent Co1llT01, Piscataway,
Optimization, Indianapolis, IN: Purdue School of Engineering and NJ: IEEE ServiceCenter,1998.
Technology, IUPUl,2001 (iIllftss). [38] J. Kennedy, "The behaviorofparticles", In V. W. Porto,N. Saravanan,
[21] M. Clerc, "The swarm and the queen: towards a deterministic and D. Waagen, and A. E. Eiben, &Is. Evolutioltllry /'rogrtlIIIming YI1:
adaptive particle swarm optimization", hoc. of 1999 Congress on Proc. 7th Ann. Conf. on EVo/fltiOftllt'Y Prog, tlMlrfing Conf., SIll Diego,
Evolutionary Compukltion (CEC'99), Washington, DC,pp 1951-1957, CA, pp.581-589. Berlin:Spriqer-VerIaa. 1998.
1999. [39] J. Kennedy,"ThinkiDg is social: Experiments with the adaptive culture
[22] R. Ebedwt and J. Kennedy, "A new optimizerusing particle swarm model",JtnmlQl ofCorVlict Raohdion, Vol.42,pp.s6-76, 1998.
theory", Proceedings of tlte Sixth Imernationol Sympolium on Micro [40] J. Kennedy, "Small worlds IDCl meaHDiDds: effects ofaeiahborhood
Machine and Human Science, Nagoya, Japan, pp.39-43, Piscataway, topology on particle swarm perfOI'lDlllCe", hoc. 01 CtHtgress on
NJ: IEEEService Center,1995. Evolutionary eompulQtion (CEC'99), 1931-1938, Piscataway, NJ:
[23] R. Ebedwt and X. Hu, "Human tremor analysisusingparticleswann IEEEService Center,1999.
optimization", hoc. of Congress on Evolutionary Computation [41] J. Kennedy, "Stereotyping: improving particle swarm performance
(CEC'99), Washington, DC, pp 1927-1930. Piscataway, NJ: IEEE with cluster analysis", hoc. of the 2000 Congress 011 Evo/lltiolltlry
ServiceCenter, 1999. Compullltion (CEC2000), San Diego, CA, Piscataway, NJ: IEEEPress,
[24] R. Eberhartand Y. Shi, "Comparison betweenGenetic Algorithms and 2000.
Particle Swann Optimization", hoc. of the Seventh Al'IItUQl [42] J. Kennedy, "Out of the computer, into the world: extemalizing the
Conference on Evoluti01ltD")l Programming, March 1998. particle swarm", Proceedings 01 the Workshop on Particle Swtmn
[25] R. Eberhartand Y. Shi, "Evolving artificial neural networks", hoc. of Optimization, Indianapolis, IN: Purdue School of EDaineeriDa and
International Conference on Neural Networks and Brain, Beijing, Teclmology, IUPU1,2001.
P.R.C.,PLS-PLI3, 1998. [43] J. Kennedy, and R. Eberhart, "PaJticle swarm optimization",
[26) R. Ebedwt and Y. Shi, Y, "Comparison between genetic algorithms Proceedinp 01 the 1995 IEEE Int.",tItional Conference on N.ral
and particle swann optimization", In V. W. Porto, N. SaravIDlll, D. Networks (lCNN), Perth, Australia,IEEE Service CeDter, Piscataway,
Wagen, andA. E. Eiben,Eds., Evolutionary Programming VII: Prot. NJ, IV: pp.I942-1948. 1995.
7th Annual Conferece on Bvolutionary Programming, San Diego,CA. [44] J. KeDDedy and R. Ebemart, "A discrete binary veniOil of the particle
Berlin: Springer-Verlag, 1998. swarm algorithm", Proc.-Ji1lp of the 1997 Conferace on Systems.
[27) R. Ebedwt and Y. Shi, "Comparing inertia weights and constriction Man, tmd Cybemetics (SMC'97), pp.4104-4109. IEEE Service Center,
factors in particle swann optimization", hoc. of Congress on Piscataway,NJ, 1997.
Evolutionary Computation (CEC2000), San Diego, CA, pp 84-88, [45] J. Kennedy and R. Ebemart, "The particle swarm: social adaptation in
2000. infonnationprocessing systems", In Come, D., Doriao, M.,IDdGlover,
[28] R. Eberhart and Y. Shi, "Tracking and optimizing dynamic systems F., Eels., New Ideas in Optimization, London: McGraw-Hill, 1999.
with particle swarms", Proc. of Congress on Evolutionary [46] J. Kennedy, R. Ebemart, aDd Y. Shi, SwtmIf Intelligence, San
Computation (CEC2001), Seoul, Korea,Piscataway, NJ: IEEEService Francisco:MorganKaufmannPublishers, 2001.
Center,2001. [47] J. Kennedy and W. Spears, "Matchina Alaorithms to Problems: An
[29] R. Eberhart and Y. Shi, "Particleswarm optimization: developments, experimentalTest of the Particle Swarm mel some Geaetic Alaorithms
applications and resources", hoc. of Congress on Evolutionary on the Multimodal Problem Generator", hoc. of IEEE I"teT'flDtional
Computation (CEC2001), Seoul,Korea,Piscataway, NJ: IEEEService Conference on Evolutionary Contplltation (lCEC'98), Anchorage,
Center,2001. Alaska, USA, May4-9, 1998.
[30] R. Ebedwt, P. Simpson, and R. Dobbins, Computational1nte/ligence [48] M. Lsvbjerg, T. Rasmussen, and T. Krink, "Hybrid Particle Swann
PC Tools, Boston: Academic PressProfessional, 1996. Optimizationwith Breeding and SubpopuIatiODS", Proce«Jlngs of the
[31] H.-Y. Fan and Y. Shi, "Study of Vmax of the particle swann third Genetic and Evolutioruzry Computation Conj.,.,ce (GECCo-
optimization algorithm", Proceedmgs of the Workshop on Particle 2001),2001.
SwarmOptimization, lnclianapolis, IN: Purdue School of Engineering [49] C. Mohan, and B. Al-kazemi,"Discreteparticle swann optimization",

Proceedings of the Workshop on Particle Swarm Optimization, [58] Y. Shi and R. Eberhart, "Parameter selection in particle swarm
Indianapolis, IN: Purdue School of Engineering and Technology, optimization", In Evolutionary Programming YII: Proc. EP98, New
IUPUI,2oo1. York:Springer-Verlag, pp. 591-600, 1998.
[SO] S. Nab, T. Genji, T. Yura, and Y. Fukuyama, "Practical Distribution [59] Y. Shi and R. Eberhart, "A modified particle swann optimizer",
StateEstimation Using Hybrid Particle SwarmOptimization", hoc. of Proceedings of the IEEE International Conference on Evolutionary
IEEE Power Engineering Society Winter Meeting, Columbus, Ohio, Computation (CEC'98), pp.69-73.Piscataway, NJ: IEEE Press, 1998.
USA,2001. [60] Y. Shi and R. Eberhart, "~irical study of particle swann
[51] K.. Naraand Y. Mishima,"ParticleSwann Optimisation for Fault State optimization", Proceedings of the /999 Congress on. Evolutionary
PowerSupplyReliability Enhancement",hoc. ofIEEE InternatiotIQ/ Computation (CEC'99), pp.I94S-1950, Piscataway, NJ: IEEE Service
Conference on Intelligent Systems Applications to Power Systems Center, 1999.
(lSAP2001), Budapest, J1Dle 2001. [61] Y. Shi and R. Eberhart, "Experimental study of particle swann
[52] E. Ozcan and C. Mohan, "Analysis of a Simple Particle Swann optimization", hoc. ofSCI2000 Conference, Orlando,FL, 2000.
Optimization System", Intelligent Engineering Systems Through [62] Y. Shi and R. Eberhart, "Fuzzy Adaptive Particle Swann
ArtificioJ NeuralNetworks, Vol.8,pp. 253-258, 1998. Optimization", Proc. of Congress on Evoluti01Ul1'Y Computation
[53] E. OzcanandC. Mohan, C. K, "Particle Swann Optimization: Surfing (CEC2001), Seoul, Korea. Piscataway, NJ: IEEE Service Center,
the Waves", hoc. of 1999 Congress on Evolutionary Computation zon,
(CEC'99), Washington, DC, USA,July 6-9, 1999. [63] Y. Shi and R. Eberhart, "Particle Swann Optimization with Fuzzy
[54) K.. Parsopoulos, V. Plagianakos; G. Magoulas, and M. Vrahatis, Adaptive Inertia Weight", Proceedings of the Workshop on Particle
"Stretching techniquefor obtainingglobal minimizers through particle Swarm Optimization, Indianapolis, IN: PurdueSchoolof Engineering
swarm optimization", Proceedings ofthe Worlcshop on Particle Swarm and Technology, IUPUI,2001.
Optimization, Indianapolis, IN: Purdue School of Engineering and [64] P. Suganthan, "Particle swann optimiser with neighbourhood
Technology, IUPUI,2001. operator", Proceedings of the 1999 Congress on Evolutionary
[55] T. Ray and K. M. Liew, "A Swarm with an Effective Infonnation Computation (CEC'99j, pp.l958-1962. Piscataway,NJ: IEEE Service
Sharing Mechanism for Unconstrained and Constrained Single Center, 1999.
Objective Optimization Problems", Proc. of the 2001 Congress on [65] V. Tandoo,"Closingthe gap between CAD/CAM and optimized CNC
Evolutio1lJl1Y Computation (CEC2001), Seoul Korea,2001. end milling", Masters th~is, Purdue School of Engineering and
[56] J. Salerno, " Using the Particle Swann Optimization Technique to Technology, IndianaUniversityPurdueUniversity IndiaDapolis, 2000.
Train a Recum:nt Neural Model", hoc. of 9th IntematiotIQ/ [66] H. Yoshida, K. Kawata, Y. Fukuyama, and Y. Nakanishi, "A particle
Conference on Tools with Artificial Intelligence (ICTIJ'97), 1997. swannoptimization for reactive powerand voltagecontrolconsidering
[57] B. Secrest and G. Lamont, "Communication in particle swarm voltagestability", In G. L. Torres andA. P. Alves ciaSilva, Eels., hoc.
optimization illustrated by the traveling salesman problem", of Inti. Conf. on Intelligent System Applicalion to Power Systems
Proceedings of the Workshop on Particle Swarm Optimization, (lSAl"99), Rio de Janeiro, Brazil, pp.l17-121, 1999.
Indianapolis, IN: Purdue School of Engineering and Technology,

Chapter 6

Fundamentals of Simulated Annealing

Abstract: Simulated annealing is one of the most flexible

techniques available for solving hard combinatorial prob-
lems. The main advantage of SA is that it can be ap-
plied to large problems regardless of the conditions of dif-
ferentiability, continuity anf convexity that are normally
required in conventional optimization methods. In this
chapter firstly the principles of simulated annealing are
presented on an intuitive basis. Next, the basic theory
of simulated annealing along with a sequential version of
the method are presented. A parallel version of the algo-
rithm is then introduced. A detailed discussion about the
application of the approach to two problems are given,
namely the traveling salesman problem and the transmis-
sion network expansion problem. As happens with other
combinatorial techniques, such as tabu search and genetic
algorithms, the codification of solutions, the characteri-
zation of the neighborhood of a. given configuration, the Fig. 1. Example of optimal solution for the traveling sales-
evaluation of the objective function, and the transition man problem in a symmetrical grid with 100 cities [2].
mechanisms are critical to the success of practical imple-
mentations of simulated annealing.
Index Terms: simulated annealing, optimization methods,
combinatorial optimization, transmission network plan- lization process. It was only in the 80's however that
ning. independent research by Kirkpatrick, Gelatt, Vecchi [7]
and Cerny [2] noted the similarities between the physical
process of annealing and some combinatorial optimization
1. INTRODUCTION problems. They noted that there is a correspondence be-
tween the alternative physical states of the matter and
Annealing is the process of submitting a solid to high tem- the space of possible solutions of the optimization prob-
perature, with subsequent cooling, so as to obtain high lem. It was also observed that the objective function of
quality crystals, i.e., crystals whose structure forms per- the optimization problem corresponds to the free energy
fect lattices [20]. Simulated annealing emulates the physi- of the material. An optimal solution is associated with a
cal process of annealing, and has originally been proposed perfect crystal, whereas a crystal with defects corresponds
in the domain of statistical mechanics as a means of mod- to a local optimal solution. The analogy is not complete
eling the natural process of solidification and formation however, since in the annealing process there is a physi-
of crystals. During the cooling process it is assumed that cal variable that is the temperature which under proper
thermal equilibrium (or quasi-equilibrium) conditions are control leads to the formation of a perfect crystal. When
maintained. The cooling process ends when the material simulated annealing is used as an optimization technique,
reachs a state of minimum energy which, in principle, cor- the "temperature" becomes simply a control parameter
responds to a perfect crystal. It is known that defect free that has to be properly determined in order to achieve
crystals, i.e. minimum energy solids, are more likely to the desired results.
be formed under a slow cooling process. The two main The method proposed by Metropolis is based on the
features of the simulated annealing process are (1) the Monte Carlo technique. The simulation goes as follows: at
transition mechanism between states and (2) the cooling every step of the algorithm, firstly a particle is randomly
schedule. When applied to combinatorial optimization, chosen; then a random move (a small perturbation) for
simulated annealing aims to find a optimal configuration this particle is determined; if the move leads to a state
(or state with minimum "energy") for a complex problem. with decreased free energy, then the move is accepted;
Simulated annealing was originally proposed by otherwise, if it leads to a state with higher free energy,
Metropolis in the early 50's as a model of the crystal- then it can be accepted with a certain probability (this is
also known as hill climbing, and allows moving out of local
optimal solutions). As the temperature decreases, the • • • • •
probability of a hill climbing move to occur also decreases. • • • •
As for the optimization process, it can be seen as sequence
of moves (transitions) in the state space, Le. the space • • • • •
of possible configurations. In certain problem it may be • • • •
convenient to use an extended state space where infeasible
states are also included. (a) Perfect Crystal.

... .. .. .....
... ...
• • • • •
... • • •
.... • • • • •

• • • •
(b) Vacancy Defect.
... .....
... .
.... .... • • • • •
... .. .... • •
• •
• • •

Fig. 2. Symmetrical TSP with 100 cities located on a cir-

• • • •
cle [2]. (c) Interstitial Defect.

Fig. 3. Solidification of crystals and common defects [21].

In this chapter two complex problems are used to illus-
trate the workings of the simulated annealing algorithm:
(1) the traveling salesman problem, TSP, and (2) the 2.1 Metropolis Algorithm
transmission network expansion planning problem. Cerny
The original idea behind the simulated annealing algo-
[2] has used a very simple version of SA to solve the trav-
rithm is the Metropolis algorithm that models the micro-
eling salesman problem for cases in which the cites are
scopic behavior of sets of large numbers of particles, as in
supposed to be arranged in a symmetrical array; Figures
a solid, by means of Monte Carlo simulation. In a mate-
1 and 2 shows two symmetrical configuration with 100
rial, the individual particles have different levels of energy,
cities. Such symmetrical structures normally involve a
according to a certain statistical distribution. The lowest
number of alternative optimal solutions. In Figure 1 the
possible level of energy, known as the fundamental level,
horizontal and vertical distances between adjacent cities
corresponds to the state where all particles stand still, and
is of one unit of length. An optimal tour in this case
occur at 0° K (or -273° 0). For temperatures above that
has a total length of 'V =
100. In Figure 2 the cities
level, the particles will occupy different levels of energy,
are located on a unit circle and the optimal solution has
such that the number of particles in each level decreases
v ~ 27r ~ 6.28 [2]; the larger the number of cities, the
as the energy level increases, i.e, the maximum number
closer the optimal solution is to v 21r.
of particles is found in the fundamental level. The distri-
bution of the particles in the various levels varies with the
temperature; for T =0° K, for example, all particles are
in the fundamental level; as the temperature increases, .
2. BASIC PRINCIPLES more particles are found in higher energy levels, but al-
ways as a decreasing function of the energy level.
This section summarizes the theoretical fundamentals of The Metropolis algorithm generates a sequence of states
simulated annealing. A more thorough description of the of a solid as follows: giving a solid in state 51, with energy
theory of simulated annealing can be found in [1, 21]. Ei , the next state Sj is generated by a transition mecha-
nism which consists of a small perturbation with respect moves allows simulated annealing to escape from local op-
to the original state, obtained by moving one of the par- timal configurations. The entire process is controlled by a
ticles of the solid chosen by the Monte Carlo method. cooling schedule that determines how the temperature is
Let the energy of the resulting state, which also is found decreased during the optimization process (for example,
probabilistically, be E j ; if the difference E j - E; is less a geometrical decrement can applied, which consists of
than or equal to zero, the new state Sj is accepted; other- multiplying the current temperature by a constant factor
wise, if the difference is greater than zero, the new state fj < 1). -
is accepted with probability

{~l-;j }
e B
Simulated Annealing;
where T is the temperature of the solid and kB is the Begin
Boltzmann constant. This acceptance rule is also known Initialize (To, No);
as Metropolis criterion and the algorithm summarized k:=O;
above is the Metropolis algorithm [20]. The temperature Initial configuration Si
is assumed to have a rate of variation (cooling schedule)
Elepeat procedure
such that thermodynamical equilibrium is reached for the
current temperature level, before moving to the next level. do L:= 1 to Nk
This normally requires a large number of state transitions generate (53 from Si);
of the Metropolis algorithm. The thermal equilibrium se f(S;) ~ f(Si) do Si:= 5j;
condition is such that the probability that the solid is otherwise
in state Si, with energy E i , is given by the Boltzmann
distribution, that is,
if exp(J(S,);/(S;») > random[O,l] do Si:= S;;
end do;
1 -E· )
PT{X = So} =- - e(~ k:= k+ 1;
I Z(T) Calculation of the length (NIc);
where X is a stochastic variable corresponding to the cur- Determine control parameter (Tic);
rent state of the solid, and Stopping criterion
-Eo end;
Z(T) = Le<m> Fig. 4. The algorithm Annealing (Aarts & Kors [1]).

is a normalization factor (participation factor); kB the Figure 4 summarizes the simulated annealing algorithm
Boltzmann constant; and e(-Ei/KB T) is known as the
which consists of two basic mechanisms: the generation of
Boltzmann factor. alternatives and an acceptance rule. Tic is the control pa-
rameter that corresponds to the temperature in physical
2.2 Simulated Annealing Algorithm annealing and N Ie is the number of alternatives generated
For a combinatorial optimization problem to be solved in the k-th temperature level (this corresponds to the time
by simulated annealing it is formulated as follows: let the system stays at a given temperature level and should
G be a finite, although perhaps very large, set of con- be big enough for allowing the system reaching a state
figurations and v the cost associated with each config- which corresponds to "thermal equilibrium". Initially,
uration of G. The solution to the combinatorial prob- when T is large, larger deterioration in the cost function
lem consists in searching the space of configurations for are allowed; as the temperature decreases, the simulated
the pair (G, v) presenting the lowest cost. The SA al- annealing algorithm becomes greedier, and only smaller
gorithm starts with a initial configuration Go, and ini- deteriorations are accepted; and finally when T tends to
tial "temperature" T = To, and generates a sequence of zero, no deteriorations are accepted.
configurations N = No. Then the temperature is de- From the current state Si, with cost !(Si), a neighbor
creased, the new number of steps to be performed at the solution 5;, with cost f(5;), is generated by the transi-
new temperature level is determined, and the process is tion mechanism. The following probability is calculated
then repeated. A candidate configuration is accepted if in performing the acceptance test:
its cost is less than that of the current configuration. If
the cost of the candidate configuration is bigger than the
cost of the current configuration it still can be accepted
with a certain probability. This ability to perform uphill

The cooling schedule is the control strategy used from the

beginning until the convergence of the simulated anneal-
ing algorithm, and is characterized by the following four From the definition of t/> it follows that:
4> = exp [- f(x
;0 f(xO) ]
• Initial temperature To;
From the two previous equations it follows that:
• Final temperature Tf;

• Number of transitions, NIc, at temperature Tic. Lnt/>=

• Temperature rate of change, Tk+l = g(Tk).Tk.

The efficiency of -the algorithm, regarding both the
=> To = _JJ_f(XO)
quality of the final solutions as well as the number of
iterations, will depend on the choice of these parameters. The determination of To from Eq.(2) has the advantage
The procedures used in the calculation of parameters are of being simple and direct, although it depends on the
based on the idea of thermal equilibrium and are detailed estimate l(xO) which is not always easily determined.
in the following.
Example: In a given optimization problem one wishes to
accept cP = 13% of the uphill moves with costs up to
3.1 Determination of the Initial Temperature, To 1% bigger than the cost of the initial solution which is of
There are several ways of determining the initial temper- f(xO) = 100000. The initial temperature To is determined
as follows (2):
ature To for the simulated annealing algorithm. On alter-
native consists in carrying out a constructive experimental
0.01 ( )
process which simulates the first temperature level of the T.o = -Ln(0.13) 100000 = 490
algorithm. In [1] the following procedure is suggested:
3.2 Determination of N Ie
To = l ( rna) (1) The number of moves performed at each temperature
n fR2Xo-ml(1-Xo)
level should be such that the condition of thermal near-
where X o is specified and the other parameters are de- . equilibrium is guaranteed. Hence the value of this param-
termined from the results of rn o tries; ml and m2, with eters is closely related to the rate of temperature reduc-
rn o ml + m2, corresponds to the number of moves with tion. Most algorithms use a value of NIr. that depends
decreasing and increasing costs, respectively. In the liter- on the size of the problem (number of decision variables).
ature a commonly used value is X o = 0.85, which means Two of the proposals that appear in the literature are
that, ,at the initial temperature, 85% of the uphill moves summarized below:
are accepted. After actual implementation, one can verify
whether or not the number of accepted moves is around • Constant N Ir.:
the specified level of 85%.
An alternative way of determining To was proposed in (3)

where it is assumed that if> % of the uphill moves are
accepted at the initial temperature level To. where No is the number of moves at the initial tempera-
ture level and p is a user supplied parameter.
Remarks: The result expressed in Eq. (2) can be verified
as follows: Consider a configuration with cost f(zC) is IJ Remar1cs: In [16] both versions above have been imple-
% worse than the cost 1(%0) of the initial configuration, mented and compared; although the second alternative
that is: is more demanding in terms of computational effort, it
normally leads to results that are better than the ones
obtained with the first approach.
Define initial temperature
3.3 Determination of Cooling Rate
There are a number of ways to carry out the temperature
reduction in simulated annealing. All methods however
are based on the fact that thermal equilibrium should be
Define other control
reached before the temperature is reduced. Three alter-
parameters: p, (3, /J, Wo, Cw
natives for calculating Tk+1 from the current temperature
Tk are summarized below: Nk =No =J.L.Nt

• Constant cooling rate: f3 E [0.50;0.99]

Transition Mechanism
Tk + 1 = (3Tk (5)
Repeat Nk times the
• Variable cooling rate: 6 E [0.01; 0.20] transition mechanism

T _ Tk (6)
k+l - [1 + in 1+6 Tie]
3u( Ie

where u(T/t,) is the standard deviation of the costs of

the configurations generated at the previous temper-
ature level Tk .
• Variable cooling rate: A $ 1.0
Enter local improvement
(7) phase (optional)

where the new value T1c+l also depends on the perfor- to

mance in the same way as in Eq. (6).

Remarks: The performance of the various cooling sched- Fig. 5. Flow chart for the Simulated Annealing algorithm.
ules is highly problem dependent. For the network trans-
mission expansion planning problem, for example, meth-
ods (6) and (7) present nearly the same performance as • Define the number of uphill moves that should be
the method of Eq. (5) as far as the quality of the solu- accepted. The process stops whenever the number of
tions is concerned, although the number of iterations to acceptances becomes less than the specified value.
convergence is normally much higher for the methods of
Eqs. (6) and (7). In [10, 16] the method of Eq. (5), with In problems suchasthe network transmission expansion .
proper calibration, has been applied with success. planning, and in a number of other applications of SA to
power networks, auxiliary problems have to be solve a
number of times to verify solution feasibility (Notice that
3.4 Stopping Criterion
this is not the case with several important operations re-
Stopping criteria vary a lot in degree of complexity and so- search problems such as the traveling salesman problem.).
phistication. Both pre-defined and adaptive criteria have In the case of the network expansion problem, the solu-
been suggested in the literature. Some of the most com- tion of linear programs representing the entire network
mon strategies are summarized below: are required. For these situations more elaborate stopping
criteria may be necessary in order to reduce the overall
• Define a constant number of temperature reductions computational effort. In [10, 16] the following criterion
normally between 6 e 50. have been tested with success:
• Use the rate of improvement of the cost function to • Stop the process if the number of LP solutions ex-
define the stopping criterion; hence, if the incumbent ceeds a specified limit. Or,
solution or the cost of the best solution so far does
not improves after a series of temperature reductions, • Stop the process if the incumbent solution does not
it is assumed that convergence has been achieved and improve after a specified number of iterations.
the process is stopped.

The basic structure of the simulated annealing algorithm

is given in Figure 4. The flow chart of Figure 5 extends
the previous algorithm by adding a phase of local im-
provement, in which a greedy algorithm is used with a
modified neighborhood structure. In the figure some ad-
ditional control; parameters are introduced such as Nt, 5
the number of right-of-ways where the addition of new
circuits is permitted, W o is the allowed level for loss of
load in the identification of the neighborhood, and Cw is
the loss of load accept for the optimal solution. 1
In the following two examples of simulated annealing
are discussed: the traveling salesman problem and the Fig. 6. Traveling salesman problem with 10 cities (VI =
power network expansion problem. 53.72).


The Traveling Salesman Problem is formulated as follows:
Given a set of n cities, V {I, 2, 3, ... , n}, a set of edges
connecting the cities, (i,j) E A, and the distances (costs)
di j between cities i and j (with d;j = dji in the symmetri-
Fig. 7. Example of neighbor configuration obtained by
cal case). A traveling salesman has to make a tour start-
ing from one of the cities and visiting each of the other
cities only once, and going back to the original city. The
objective is to find the tour with minimum length. The
following algorithm draws upon and improves the original tion given above is a popular choice in both simulated
algorithm presented by Cerny [2]. annealing and tabu search applications.

5.1 Problem Codification 5.2 Evaluation of the Cost Function

The codification for the TSP can be made very easily by In each step of the simulated annealing algorithm an ac-
using a vector in which each index i corresponds to a city ceptance test has to be performed. This test requires the
and the i-th element of the vector contains the next city evaluation of the change produced in the cost function by
in the tour. a perturbation. For the neighborhood structure described
Consider for example the solution shown in Figure 6. above, this evaluation can be easily performed as follows.
This solution can be codified as the vector Pl in Figure 7, Consider for example that the present topology is the one
which indicates that the cities are visited in the following given in Figure 6 and that the perturbed topology is the
order: 1,9,10, 7,4,6,8,5,3,2. The vector P2 in the figure one of Figure 8. In this case, the variation in the cost
was obtained by swapping the contents of the positions function is given by:
3 and 7 of the vector Pl. The tour corresponding to the
modified vector 1'2 is shown in Figure 8. The optimal tour
is shown in Figure 9.
dv = V2 - VI = dg,8 + d10 ,5 - clg,10 - ds,s
The codification described above has the advantage
that it always guarantees a feasible solution. Also, the av = 2v'2 + 10 - 9- Vfi = -0.295
size of the vector is the same as the number of cities, i.e.,
Since there is a actual reduction in the cost, the new topol-·
n, and then increases only linearly with the size of the
ogy would be accepted and would become the new current
problem. Moreover, good neighborhood structures can
solution. The new current topology would be that of P2,
be defined without difficulty.
with the following associated cost:
Remarks: There are several different ways of encoding the
solution of the traveling salesman problem. The codifica- 1)2 = VI + av = 53.72 - 0.29 = 53.43
sumed to remain constant throughout the cooling process,
i.e., NIt,+l = No, where No normally found according to
the number of cities. For example, for n =
10, one can
choose No = 4n = 4(10) = 40 trials.
The rate of temperature reduction {3 can be assumed
to be constant as well, for example, /3 0.98. Hence, for
updating the temperature one simply uses:

The stopping criterion is based on the number of tem-

perature levels for which the cost of the incumbent solu-
tion (the optimum solution so far) does not change (for
example,5 consecutive temperature levels).
The SA characterized by these parameters easily solves
Fig. 8. Another example of traveling salesman problem the TSP with the initial topology given in Figure 6, with
with 10 cities (This configuration was obtained from the n = 10, and finds the optimal solution shown in Figure 9.
configuration in Figure 6 by swapping elements 3 and 7.)
5.4 Comments on the Results for the TSP
9 10
The simulated annealing algorithm described above works
particularly well for the traveling salesman problem. The
v =41.03 algorithm as described above easily finds the optimal so-
lution for the symmetrical problems of the types illustrate
in Figures 1 and 2 with n = 100 (These problems were
originally studied by Cerny [2]). More elaborate SA algo-
6 7 rithms are able to find solution for cases with thousands
of cities.
An important feature of the algorithm discussed above
is the way neighborhood structure was defined (k-opt,
with k = 2 in the case shown in the figure) in which
case all neighboring configurations are feasible. Notice
that this is not the case with genetic algorithms and with
evolutive algorithms in general, where infeasibilities may
Another characteristic of the simulated annealing ap-
Fig. 9. Optimal solution for the TSP of Figures 6 and 8. proach as applied to the traveling salesman problem is
relatively small effort spent in computing the variations
of the cost function caused by a move. This is normally
less than what is required in computing the probability
5.3 Cooling Schedule
p = ezp(tiv/T) and suggested that it may be advanta-
As mentioned earlier in the chapter, the cooling schedule geous to use approximations in evaluating the exponential
is characterized by (1) the initial temperature, (2) the function.
number of trails at each temperature level, (3) the rate of
cooling and (4) the stopping criterion.
The initial temperature To can be calculated from Eq,
(2). Thus, for example, if one wishes to accept 4> 30% of
the uphill moves (moves with increasing costs) with costs Given the network configuration for a certain year and the
up to IJ = 15 % worse than that of the initial configura- peak generation/demand for the next year (along with
tion, which is of 1)1 = 53.72 (the. initial configuration is other data such as network operating limits, costs, and -
that of Figure 6), the initial temperature To is given by, investment constraints), one wants to determine the ex-
pansion plan with minimum cost, Le, one wants to de-
To = -L~~30) (53.72) =6.69 termine where and what type of new equipment should
be installed. Of course this is a subproblem of a more
The number of trials at each temperature level is as- general case, called dynamic expansion planning, where,
in addition to the questions what and where, one wants Problem (8) is a typical hard combinatorial problem
to know when to install new pieces of equipment. In this prone to combinatorial explosion as the number of deci-
section it is focussed the initial stages of the expansion sion variables increases. An extra complication regards
planning studies when the basic topology of the future the fact that there are cases in which planning does not
network is determined. Network topologies synthesized simply means the reinforcement of an existing network.
by the proposed approach will then be further analyzed Sometimes one has to start from scratch, at least for parts
and improved by testing their performances using other of the network due to the addition of new load, transition
analysis tools such as power flow, short circuit, transient and generation buses. In certain cases the simple addi-
and dynamic stability analysis. tion of one or two circuits (lines or transformers) will not
In the following the static transmission expansion plan- be enough to guarantee network connectivity: it is not
ning problem is formulated as an mixed integer nonlinear uncommon the situation in which entire paths have to be
programming problem in which the power network is rep- built to put all the pieces together in a single network.
resented by a DC power flow model. Under these circumstances the combinatorial burden is
even heavier than it would be in simpler, reinforcement
only, type of problems.
min v = Lij
Cijnij + L Qiri (8) For a given set of decisions variables Xij or nij, Problem
(8) becomes linear programming problem:
B(x + "(°)6 + 9 + r = d
(Xij + ,fj ) 18i - (Jj I ~ (Xi; + ,ij)4)ij min w= LQiri (9)
OSg~g i
O$r~d B(x k + ,°)8 + 9 + r d=
os nij S nij (xt + 1'(;) 18i - 8;1 S (x:j + 'Yij)CiJij
Cij - Cost of the addition of a circuit in branch i-j.
which is solved for testing the adequacy of a candidate
Xij - Total susceptance in branch I-j. solution; adequacy is indicated by zero loss of load. No-
tice that Problem (8) is always feasible due to the pres-
B (.) - Susceptance matrix. ence of the loss of load factor Ei Ctiri in the objective
function; thus whenever a tentative solution set Xij is in-
8 - Vector of nodal voltage angles. adequate, feasibility is achieved by the use of artificial
generators (loss of load). It has been observed that this
"(0 _ Vector of initial susceptances, whose elements are feature facilitates the optimization process to escape from
"(ij, Le, the summation of the susceptances in local minima, by temporarily moving through regions of
branch i-j at the beginning of the optimiza- inadequate solutions, but still keeping the feasibility of .
tion. the mathematical problem.

nij - Number of circuits added in branch i-j: nij =

6.1 Problem Codification
Xij/'Yij;where "Yij is the susceptance of the
new circuits. A codification for this problem was suggested in [16],
where only the integer variables (the number of circuits
(iJij - Defined as the ratio: ~ij = fij/"Yij; where fij is the that can be added in a right-of-way) are codified, the real
maximum flow in a circuit i-j. variables, such as the angle voltages, being determined
from the solution of the linear program formulated in Eq.
d - Vector of liquid demand. (9).
A configuration is then characterized by an n-vector,
g - Generation vector.
where n is the number of right-of-ways where new circuit
9 - Vector of maximum generation capacity. additions are allowed. In Figure 10 shows the initial con-
figuration for the 6-bus network (the complete set of data
r - Vector of artificial generations. can be found in [15]). New circuits can be added to all
right-of-ways (a total of 15 right-of-ways); the number of
Q - Penalty parameter associated with loss of load additions per right-of-way is not limited.
caused by lack of transmission capacity. The codification for this configuration is as follows:
which clearly indicates that no circuit additions have been
carried out so far. All other relevant variables, including
the loss of load (of w 545 MW) are given in Figure 10.
TS = 240
6.2 Determination of the Initial Solution 240
5 -..37.903
Of course one can always determine the initial solution
randomly; in this case the elements of the codification
vector are randomly selected in the range [0, nijaz] where 93 = 165
nij4:Z: is the maximum number of additions in right-of-
way i - j. This however normally generates a solution 5.64~
with an excessive number of added circuits. To cope with
this problem on can reduced the number of right-of-ways
where additions is allowed, for example to 30% of the total
number of right-of-ways. Another alternative consists in
initializing the configuration vector with no circuit addi- w=545
tions. i.e. assuming that initially only the already existing vi = 0
circuits are present (a solution with zero investment cost).
A constructive heuristic algorithm can also be used to 2.258
determine a initial solution with high quality. For exam- ~
ple the Garver's algorithm based on the transportation
model can be used to generate such initial configuration
[15]. Since the transportation model takes into account
only the Kirchhoff's current law, and not the voltage law, ........- ...4
this solution is a solution to a relaxed problem, which can
be a good initialization for the DC power flow model used
in formulation of Eq. 8.
Fig. 10. Basic configuration of the 6-bus network
The evaluation of the cost of a trial configuration in the
network expansion planning problem is more complicate
than it is in the traveling salesman problem. This is due
to the need to run a linear program in order to deter- TS = 120.469
mine the corresponding loss of load, which forms part of 240
5 .....19.531
the cost function. This type of difficulty is common in
problems with multiple constraints on both integer and
real variables such as the transmission network expansion
The cost associated with the initial configuration of Fig-
ure 10 for Q 5 can be found as follows:
v =V1 + 112 =0 + 5(545) = 2725

where the loss of load w = 545 is determined from the

solution of a linear program as in Eq. 9. As for the 21.797t
w = 158.24
topology shown in Figure 11, the cost is given by:
vi = 120
v = 111 + V2 = 4(30) + 5(158.24) = 911.2
6.3 Neighborhood Structure
For the codification described above there are a number of .....186.761
different ways of defining the structure of the neighbor-
hood of the current solution. A simple alternative con-
- ............. 6 n46 =2 4
96 = 386.761
sists in adapting the method of [9] which has been used
in connection with the application of SA to the knap-
sack multiconstraint problem. This technique has been Fig. 11. A configuration with loss of load.
extended to the network expansion planning problem in
[16], where the neighborhood structure is formed by the rs = 99.84
configurations that result from the current configuration ..... 40.16
by (1) adding a new circuit, (b) removing a previously
added circuit, and (c) by swapping two circuits (adding a
new one and removing another one that has been added
Once the neighborhood is defined, one neighbor is ran-
domly chosen and the acceptance test is carried out to
check whether the candidate configuration will become
the new current configuration or not. The technique pro-
posed in [16] goes as follows:
1. While the current topology presents loss of load, the w = 99.84
generation of neighbors is made in the order addition- vt = 150
swap-removal, where the swapping and the removal
of circuits is performed only when the configuration
that results from the addition of a new circuit does
not pass the acceptance test.

2. When the current topology does not present loss of

load, the generation of neighbors follows the order
removal-swap-addition, where the swap or addition =
96 492.17
of circuits is tried only if the removal fails.
Fig. 12. A configuration with loss of load.
For example, consider that the current topology is the
one of Figure 11, which is a solution with loss of load and
whose codification is as follows:

v= VI + V2 = 5(30) + 5(99.84) = 649.2

where the loss of load w = 99.84 was obtained from a LP
solution. The cost variation is then given by,
A circuit addition is randomly chosen, for example the
circuit 2-6. The codification of the new topology is as av = 649.2- 911.2 = -262.0
The resulting configuration also presents loss of load, al-
though its total cost (the cost of investment plus the
penalties associated with the loss of load) is less that of
the initial topology, which means that the new topology
is accepted and becomes the new current solution.

6.5 Cooling Schedule

6.4 Variation of the objective function Consider that, for example, one wishes to accept t/J 40% =
Most of the computational effort spent in the network of the uphill moves (moves with increasing costs) with
expansion planning problem is concentrated on the ac- costs up to JJ = 30 % worse than that of the initial config-
ceptance test where the solution of the linear program uration, which is of Vl = 2725 (the initial configuration is
formulated in Eq. (9) is required. This is in contrast with shown in Figure 10). In this case the initial temperature
what happens in the traveling salesman problem where
To is given by (2),
the feasibility is automatically guaranteed by the codifi-
,.,., 0.30 ( )
cation that has been adopted. .£0 = -Ln(O.40) 2725 =892.2
Consider for example that the current topology for the
network expansion problem is that of Figure 11 and that The number of trials at each temperature level is as-
the trial configuration is as shown in Figure 12. The cost sumed to remain constant throughout the cooling process,
associated with topology n2 is given by, i.e., NIt.+l = No, where No is normally found according to
the number of right-of-ways where new circuit additions (46 buses and 79 right-of-ways) size problems, as pre-
are allowed. For example, for n = 15, one can choose sented in [16]. In both cases global optimal solutions were
= =
No 3n 3(15) = 45 trials. found. For a larger, more complex test system (87 buses
The rate of temperature reduction {3 can be assumed and 183 right-of-ways) good quality local optimal solu-
to be constant as well, for example, f3 0.90. Hence, for tion were found, in the sense that those solutions were
updating the temperature one simply uses: better than the ones obtained by well tailored heuristic
constructive algorithms. When the parallel version was
introduced, slightly better local optimal solutions were
obtained for the 87-bus network.
The stopping criterion takes into consideration the
number of LP solutions performed during the optimiza-
tion process. Hence, the search stops when either (1) the 7. PARALLEL SIMULATED ANNEALING
maximum number of LP solutions is reached (1200 LPs
in the case of the 6-bus network) or (2) the incumbent Notwithstanding all its good characteristics, one has to
solution does not change for the last kp trials, where kp is recognize that simulated annealing is extremely greedy
a parameter set by the user (kp = 200 in the case of the . regarding computation time requirements. To cope with
6-bus network). -this limitation, parallelization has been suggested. Paral-
A simulated annealing algorithm with the parameters lelization of simulated annealing, however, is not a simple
defined above was able to solve the network expansion matter. The main difficulty being the fact that simulated
planning problem of Figure 10 leading to the optimal so- annealing works as a Markov chain, which is a typical
lution shown in Figure 13. sequential entity: a Markov chain models the simulated
annealing process as a sequence of trials where the prob-
ability of the outcome of a given trial depends only on
the outcome of the current trial (the current configura-
tion, solution or topology), and does not depend on the
trials in the sequence that came before the current trial.
Hence the parallelization should be carried out in such a
way that this sequential structure is not affected when the
simulated annealing algorithm is mapped onto a parallel
In the simulated annealing algorithm N Ie trials are per-
formed in each temperature level Tk. Each trial involves
the following steps:
1. From the neighborhood of the current configuration
vi = 200
select a trial configuration.

2. Calculate the difference betweenthe costs of the cur-

rent solution and that of the trial configuration.

"'188.118 3. Decide whether the trial configuration should be ac-

..101oo+-"'4 cepted or Dot.

4. Replace the current configuration by the new one if

this is the case.
Fig. 13. The optimal configuration.

The following remarks should be considered in perform-

ing parallelization:
6.6 Comments on the Results for the Network Expan-
sion Problem 1. The three first steps are independent and can be per-
formed in parallel without affecting the sequential
The sequential simulated annealing algorithm described nature of the algorithm. Running the step 4 in par-
aboveworks well for the network expansion problem when allel however, would alter the sequential nature of the
applied to small (6 buses, 15 right-of-ways) and medium algorithm.
2. The number of times the steps I through 3 are exe-
Proe. Proc. Proc. Proc. Proc. Proc. Proe.
cuted does not vary with the temperature level. Al- 123 7 8
though the frequency with which the step is per-
formed varies as the temperature is reduced (the
number of times it is activated decreases as the tem-
perature decreases.)

3. The computations involved in these four steps are

highly problem dependent. For example, in the trav-
eling salesman problem the computations involved in
the step 2 is much smaller than that involved in the
same step in the network expansion planning prob-
lem, since in the second case a LP solution is required.

7.1 Division Algorithm:

Let np denote the number of processors in the parallel
machine. A simple, effective way of parallelizing the SA
algorithm consists in dividing the effort of generating the
corresponding Markov chain over the np processors: each
processor performing Nle/np trials. In order to keep the Fig. 14. Division algorithm.
main characteristic of the SA algorithm, at each temper-
ature level, when all processors finish processing their in-
dividual tasks, the incumbent optimal solutions are sent
to the master node (node p =0), which then selects the 7.2 Clustering Algorithm
best one and broadcasts the results to all other processors
(p 1, ...,np-l).
In this case, in contrast with the division algorithm dis-
The communication requirements of this algorithm are
relatively small, which means that the potential for effi- cussed above, the sequential nature of the SA algorithm
ciency is high. The quality of the solution depends on the is strictly observed [1, 10], since the np processors evalu-
number of processors used in the parallel computation. ate the N Ie trials in a cooperative way, which means that
For larger numbers of processors the number of configu- all processors always work with the same current solu-
rations studied by each processors can become too small, tion. Thus whenever one processor accepts a new incum-
which may not allows the system to reach thermal equi- bent it is communicated to all the other processors. This
librium. To cope with this problem the number of trials parallelization approach is less efficient in high tempera-
performed per temperature level can be increased, which ture levels where the frequency with which new current
is then compensated by an increase in the rate of cooling solutions are accepted is relatively high. The opposite
(a smaller parameter p is used in this case). happens at lower temperature levels where very few ac-
Figure 14 illustrates the workings of the division algo- ceptances are performed and thus the algorithm presents
rithm, according to [1]. In the vertical line it is shown the best performance.
evolution of the temperature and the number of trials per A good alternative consists in using a hybrid divi-
temperature level, whereas in the horizontal line the np sion/clustering algorithm. The process is started as a
processors of the parallel machine are represented. The division algorithm and then switches to the clustering al-
processes communicate when they finish processing the gorithm which is executed for lower temperature levels.
trials assigned to them. At that point they receive the The point at which the switching to the clustering algo-
new global incumbent and restarts from there. rithm occurs is based on the observed acceptance rate, as
discussed above. The way the hybrid algorithm works is
Remarks: When node-O receives the partial solutions from
graphically illustrated in Figure 15.
the other processors, it also checks for thermal equilib-
rium conditions, as in the sequential algorithm (it veri-
fies whether the solutions arriving from processors p = Remarks: In {IO] both versions of the parallel simulated
1, ..., np - 1 coincide. This condition is normally satis- annealing algorithm have been implemented and tested.
fied before reaching the minimum specified temperature Normally the solutions obtained by the parallel versions
level, which means a faster convergence for the parallel have better quality when compared with the ones given
algorithm. by the sequential algorithm.
ing, includingQuadratic Assignment Problem (QAP) and
Proc. Proc. Proc. Proc. Proc. Proc. Proc.
123 4 678 Generalized Assignment Problem (GAP). An important
case of application to the Quadratic Assignment Prob-
lem is reported in [25]. Other examples can be found
in [26, 27]. The solution of the Generalized Assignment
Problem by simulated annealing is reported in [28]. The
Vehicle Routing Problem (VRP) was studied in [29, 30].
In [31] simulated annealing is used in connection with the
Graph Colouring Problem. An example of application of
SA to the Multidimensional Knapsack Problem is found
in [9]. A survey of SA applications in operations research
is presented in [32].

8.2 Application in power systems

SA has been extensively used to solve a variety of power
systems problems. One of the most explored problems is
Unit Commitment; a hybrid SA approach has been used
in [33,34, 35, 36,37]. Another complex problem that has
been tackled by SA is the transmission network expansion
problem; both sequential and parallel versions have been
developed [10, 11, 12, 16]. Other applications include the
Fig: 15. Hybrid Division/Clustering algorithm.
optimal allocation of capacitors in primary distribution
feeders [38, 39], the optimal reconfiguration of distribution
8. APPLICATIONS OF SIMULATED ANNEALING networks [40, 41], phase balancing [42], and the reactive
planning [43, 44].
The initial applications of simulated annealing to opera-
tions research problems occurred in the late 80's (trav-
eling salesman problem, quadratic assignment problem, 9. CONCLUSIONS
etc. ) The first important application in engineering was
in VLSI placement and was carried out by Kirkpatrick, This chapter presented the basic facts about simulated
one of the originators of the SA technique. In the 90's annealing, a widely used Monte Carlo type iterative tech-
the number of applications in engineering increased sub-' nique for solving combinatorial optimization problems.
stantially and two important books were published [1, 21]. SA tries to emulate the process of cooling a metal and
A survey made by the authors of this paper based on the the consequent formation of crystals. SA has the abil-
dabase of the Institute for Scientific Information (lSI), re- ity to escape from local optimum by performing uphill
vealed 4205 publications about simulated annealing and moves, that is, moves which deteriorate the current value
its applications in technical journal only. As for appli- of the objective function. As the cooling process progress,
cations to power system problems, according to the ISI however, the probability of accepting such uphill moves is
database, some 74 papers have been published in the spe- reduced and the SA algorithm becomes greedier regarding
cialized literature up to May 2001, most of then in jour- the value of the objective function. It can be proved that
nals such as the IEEE Transactions on Power Systems under proper cooling conditions a global optimal solution
(34), IEEE Transactions on Power Delivery (7), lEE Pro- can be reached. Notwithstanding this fact, the simulated
ceedings Generation, Transmission and Distribution (16), annealing algorithm is very greed regarding computation
Electric Power Systems Research (10), and International time requirements. This characteristic has motivated the
Journal of Electrical Power & Energy Systems (7). (Cer- development of parallel implementations of the simulated
tainly the list of application presented in the following annealing algorithm aiming to speed-up simulation as well
is not complete and is biased by the experience of the as improving solution quality.
authors in this field.)

8.1 Applications in engineering REFERENCES

Problems related to assignment in general have been a [1] Aarts E., Karst J.: "Simulated Annealing and
fertile ground for the application of simulated anneal- Boltzmann Machines" , John Wiley & Sons, 1989.
[2] Cerny V.: "Thermodynamical Approach to the [14] Gallego R.A., Monticelli A., Romero R.: "Tabu
Traveling Salesman Problem: An Efficient Simula- Search Algorithm -for Network Synthesis", IEEE
tion Algorithm", Journal of Optimization Theory Transactions on Power Systems, Vol.15, No.2, May,
and Application, 45(1), pp. 41-51, 1985. 2000.

[3] Chams M., Hertz A., de Werra D.: "Some Ex- [15] Garver L.L.: "Transmission Network Estimation
periments with Simulated Annealing for Coloring Using Linear Programming", IEEE Trans. Power
Graphs", European Journal of Operational Re- App. Syst., Vol. PA8-89, pp. 1688-1697, September-
search, 32, pp. 260-266, 1987. October, 1970.

[4] Connolly D.T.: "An Improved Annealing Scheme [16] Romero R., Gallego R.A., Monticelli A.: "Trans-
for the QAP" , European Journal of Operational Re- mission System Expansion Planning by Simulated
search, 46, pp. 93-100, 1990. Annealing", IEEE Transactions on Power Systems,
Vol. 11, No.1, pp. 364-369, February 1996.
[5] Eglese R. W.: "Simulated Annealing: A Tool for
[17] Wong K.P., Wong Y. W.: "Combined Genetic Algo-
Operational Research", European Journal of Oper-
rithms/Simulated Annealing/Fuzzy Set Approach
ational Research, 46, pp. 271-281, 1990
to Short-Term Generation Scheduling with Take-or-
[6] Hajek B.: "Cooling Schedules for Optimal Anneal- Pay Fuel Contract", IEEE Transactions on Power
ing" , Mathematics of Operations Research, 13, pp. Systems, Vol. 11, No.1, February 1996.
311-329, 1988. [18] Haffner S., Monticelli A., Garcia A., Mantovani
J., Romero R.: "Branch and Bound Algorithm for
[7] Kirkpatrick S., Gelatt Jr. C.D., Vecchi M.: "Op-
Transmission System Expansion Planning Using a
timization by Simulated Annealing", Science,
Transportation Model", IEE Proceedings - Gener-
220(4598), pp. 498-516, 1983.
ation, Transmission and Distribution, Vol. 147(3),
[8] Diaz A., Glover F., Ghaziri H.M., Gonzalez J.L., pp. 149-156, May 2000.
Laguna M., Moscato P., Tseng F.T.: "Opti- [19] Reeves C.R.: "Modem Heuristic Techniques for
mizaci6nHeuristica y Redes Neuronales", Editorial Combinatorial Problems", Jhon Wiley & Sons,
Paraninfo, Madrid, 1996. 1993.
[9] Drexl A.: "A Simulated Annealing Approach to [20] Sait S.M., Youssef H.: "Iterative Computer Al-
the Multiconstraint Zero-One Knapsack Problem" , gorithms with Applications in Engineering" , IEEE
Computing 40, pp. 1-8, 1988. Computer Society, 1999.

[10] Gallego R.A., Alves A.B., Monticelli A., Romero [21] Van LaarhotJen P.J.M., Aarts E.H.: "Simulated
R.: "Parallel Simulated Annealing Applied to Long Annealing: Theory and Applications", D. Reidel
Term Transmission Network Expansion Planning" , Publishing Company, Boland, 1987.
IEEE Transactions on Power Systems, Vol. 12, No.
1, pp. "181-188, February 1997.
[22] Aarts E.H.L, Korst J.H.M., Van Laarhoven P.J.M.:
"A Quantitative Analysis of the Simulated Anneal-
[11] Gallego R.A.: "Planejamento a Longo Prazo ing Algorithm: A Case Study for the Traveling
de Sistemas de Transmissio Usando Tecnicas de Salesman Problem", Journal of Statistical Physics,
Otimi~ Combinatorial", Tese de Doutorado, Vol. 50(1-2), pp. 187-206, January 1988.
UNICAMP, 1997. (23) Lee J.Y., ChoiM.Y.: "Optimization by Multicanon-
ical Annealing and the Traveling Salesman Prob-
[12] Gallego R.A., Monticelli A., Romero R.: "Compar-
ative Studies of Non-Convex Optimization Methods
lem", Physical Review E, Vol. 50(2), pp. 651-654,
August 1994.
for Transmission Network Expansion Planning",
IEEE Transactions on Power Systems, Vol. 13, No. [24] Ram D.J., Sreenivas T.H., Subramaniam K.G.:
2, May, 1998. "Parallel Simulated Annealing Algorithms", Jour-.
nal of Parallel and Distributed Computing, Vol.
[13] Gallego R.A., Monticelli A. , Romero R.: "Trans- 37(2), pp. 207-212, September 1996.
mission System Expansion Planning by Extended
Genetic Algorithm", lEE Proceedings - Genera- [25] Wilhelm M.R., Ward T .L.: "Solving Quadratic As-
tion, Transmission and Distribution, 145(3):329- signment Problems by Simulated Annealing", lIE
335, May, 1998. Transactions, Vol. 19(1), pp. 107-119, March 1987.
[26] Yip P.P.C., Pao Y.H.: "A Guided Evolutionary [37] Wong S.Y.W.: "An Enhanced Simulated Anneal-
Simulated Annealing Approach to the Quadratic ing Approach to Unit Commitment", International
Assignment Problem", IEEE Transactions on Sys- Journal of Electrical Power & Energy Systems,
tems Man and Cybernetics, Vol. 24(9), pp. 1383- 20(5), pp. 359-368, June 1998.
1387, September 1994.
[38] Chiang H.D., Wang J.C., Cockings 0., Shin H.D.:
[27] Laursen P.S.: "Simulated Annealing for the QAP "Optimal Capacitor Placement in Distribution Sys-
- Optimal Tradeoff Between Simulation Time and tems Part I: A New Formulation and the Overall
Solution Quality" , European Journal of the Opera- Problem", IEEE Transactions on Power Delivery,
tional Research, Vol. 69(2), pp. 238-243, September Vol. 5(2), pp 634-642, April 1990.
[39] Chiang B.D., Wang J.C., Cockings 0., Shin H.D.:
[28] Osman I.H.: "Heuristics for the Generalized As- "Optimal Capacitor Placement in Distribution Sys-
signment Problem - Simulated Annealing and Tabu tems Part II: Solution Algorithms and Numerical
Search Approaches", OR Spektrum, Vol. 17(4), pp. Results", IEEE 'Iransactions on Power Delivery,
211-225, October 1995. Vol. 5(2), pp 643-649, April 1990.

[29] Chiang W.C., Russell R.A.: "Simulated Anneal- [40] Chiang H.D., Jumeau R.J.: "Optimal Network
ing Metaheuristics for the Vehicle Routing Prob- Reconfigurations in Distributions Systems, Part I:
lem with Time Windows", Annals of Operations Formulation and a Solution Methodology", IEEE
Research, Vol. 63, pp. 3-27, 1996. Transactions on Power Delivery, 5(4), pp. 1902-
1909, October 1990.
[30] Alfa A.S., Heragu 8.S., Chen M.Y.: "A 3-0PT
Based Simulated Annealing Algorithm for Vehicle [41] Chiang H.D., Jumeau R.J.: "Optimal Network Re-
Routing Problems" , Computers & Industrial Engi- configurations in Distributions Systems, Part II: So-
neering, Vol. 21(1-4). pp. 635-639, 1991. lution Algorithms and Numerical Results", IEEE
Transactions on Power Delivery, 5(3), pp. 1568-
[31] Nolte A., Schrader R.: "Simulated Annealing 1574, July 1990.
and Graph Coloring" , Combinatorics Probability &
[42] Zhu J.X., Bilbro G., Chow M.Y.: "Phase Balancing
Computing, Vol. 10(1). pp. 29-40, January 2001.
Using Simulated Annealing", IEEE Transactions on
(32] Koulmas C., Antony S.R., Jaen R.: "A Survey Power Systems, 14(4), pp. 1508-1513, November
of Simulated Annealing Applications to Operations 1999.
Research Problems", Omega International Journal [43] Liu C.W., Jwo W.S., Liu C.C., HsiaoY.T.: "A Fast
of Management Science, Vol. 22(1). pp. 41-56, Jan- Global Optimization Approach to VAR Planning
uary 1994. for the Large Scale Electric Power Systems" , IEEE
[33] Mantawy A.H., Abdel-Magid Y.L., Selim S.Z.: "In- Transactions on Power Systems, 12(1), pp. 437-442,
tegrating Genetic Algorithms, Tabu Search and February 1997.
Simulated Annealing for the Unit Commitment [44] Hsiao Y.T., Chiang H.D.: "Applying Network Win-
Problem", IEEE Transactions on Power Systems, dow Schema and a Simulated Annealing Technique
14(3), pp. 829-836, August 1999. to Optimal VAR Planning in Large Scale Power Sys-
tems" , International Journal of Electrical Power &
[34] Mantawy A.H., Abdel-Magid Y.L., Selim S.Z.:
Energy Systems, 22(1), pp. 1-8, January 2000.
"A Simulated Annealing Algorithm for Unit Com-
mitment", IEEE Transactions on Power Systems,
13(1), pp. 197-204, February 1998.

[35] Zhuang F., Galiana F.D.: "Unit Commitment

by Simulated Annealing", IEEE Transactions on
Power Systems, 5(1), pp. 311-318, February 1990.

[36] Annakkage U.D., Numnonda T., Pahalawaththa

N.C.: "Unit Commitment by Parallel Simulated
Annealing", IEE Proceedings-Generation, Trans-
mission and Distribution, 142(6), pp. 595-600,
November 1995.
Chapter 7

Fundamentals of Tabu Search

Abstract: TS, is an alternative approach to the solution of neighborhood, i.e., in a minimization problem, the algo-
combinatorial problems. TS basically consists of a meta- rithm switches to the configuration presenting the small-
heuristic procedure used to manage heuristic algorithms est cost. Normally only the most attractive neighbors
that perform local search. Meta-heuristics are strategies are evaluated, otherwise the problem could become in-
that allows the exploitation of the search space by pro- tractable. Unlike gradient type algorithms used for local
viding means of avoiding getting entrapped into local op- search, the neighborhood in tabu search is updated dy-
timal solutions. As it happens with other combinatorial namically. Another difference is that transitions to con-
approaches, TS carries out a number of transitions in the figurations with higher cost are allowed (this gives the
search space aiming to find the optimal solutions or a method the ability to move out of local minimum points).
range of near optimal solutions. The name Tabuis related An essential feature of tabu search algorithms is the di-
to the fact that in order to avoid revisiting certain areas rect exclusion of search alternatives temporarily classed
of the search space that have already been searched, the as forbidden (tabu). As a consequence, the use of mem-
algorithm turns these areas "Iabu (or forbidden), which ory becomes crucial in these algorithms: one has to keep
means that for a certain period of time (the tabu tenure) track of the tabus.
the search will not consider the examination of alterna- Other mechanisms of tabu search are intensification
tives containing features that characterize the solution and diversification: by the intensification mechanism the
points belonging to the area declared Tabu. algorithm does a more comprehensive exploration of at-
Index Terms: tabu search, optimization methods, combi- tractive regions which may lead to a local optimal point;
natorialcptimizaticn. by the diversification mechanism, on the other hand, the
search is moved to previously unvisited regions, something
that is important in order to avoid local minimum points.
1. INTRODUCTION Tabu search consists of a set of principles (or functions)
applied in an integrated way to solve complex problem in
Tabu search was developed from concepts originally used an intelligent manner. According to Glover [2]:
in artificial intelligence. Unlike other combinatorial ap- Tabu search is based on the premise that problem solv-
proaches such as genetic algorithms and simulated an- ing, in order to qualify es intelligent, must incorporate
nealing, its origin is not related to biological or physical adaptive memory and responsive exploration. The use
optimization processes [6]. TS was originally proposed by of adaptive memory contrasts with "memoriless" designs,
Fred Glover in the early 80's and has ever since been ap- such as those inspired by metafors of physics and biolDgtJ,
plied with success to a number of complex problems in sci- and with "rigid memory" designs, such as those exempli-
ence and engineering. Applications to electric power net- fied by branch and bound and its AI-related cousins. The
work problems is already significant and growing. These emphasis on responsive e%ploration (and hence purpose)
includes, for example, the long term transmission network in tabu search, whether in a deterministic or probabilistic
expansion problem and distribution planning problems implementation, derives from the sapposition that a bad
such as the optimal capacitor placement in primary feed- strategic choice can yield more information than a good
ers. random choice.
The principal features (or functions) of Tabu Search are
1.1 Overview of the Tabu Search Approach summarized in [2] as follows:

Compared with simulated annealing and genetic algo- 1. Adaptive Memory.

rithms, tabu search explores the solution space in a more
aggressive way; i.e., it is greedier than those algorithms. • Selectivity (including strategic forgetting)
Tabu search algorithms are initialized with a configura-
tion (or a set of configurations when the search is per- • Abstraction and decomposition (through ex-
formed in parallel) which becomes the current configu- plicit and attributive memory)
ration. At every iteration of the algorithm, a neighbor- • Timing (which means, both recency and fre-
hood structure is defined for the current configuration; quency of events and differentiation between
a move is then made to the best configuration in this short term and long term)
• Quality and impact (meaning the relative at- Solution space X
tractiveness of alternative choices and magni-
tude of changes in structure or constraining re-
• Context (including both regional, structural and
sequential interdependences)

2. Sensible Exploration
• Strategically imposed restraints and induce-
ments (or, tabu conditions and aspiration lev-
• Concentrated focus on good regions and good Fig. 1: Illustration of a transition in tabu search.
solution features (intensification process)
• Characterizing and exploring promising new re-
gions (diversification process) algebraic constraints; configurations are represented by
• Non-monotonic search patterns (strategic oscil- codification instead.
lations) Tabu Search solves Problem (1) by first applying a lo-
cal heuristic search in which, given a configuration x (a
• Integrating and extending solutions (path re- solution), the neighborhood of x is defined as the set of
linking) all configurations x' E N(x) that can be obtained by the
same transition mechanism applied to z: The conditions
Different tabu search algorithms are formed by combin- required for x' to be a neighbor of x defines the structure
ing these functions to solve specific problems. Of course, of the neighborhood of x. The local search algorithm finds
the way actual implementation is made depends on prob- the transition which leads to the configuration x' present-
lem characteristics and on the degree of sophistication ing the largest decrement in the objective function (in the
needed in a particular application. Although the set of same way as a steepest gradient algorithm). The repeti-
functions listed above can be expanded and/or modified, tion of this procedure eventually leads to a local optimal
it is worth noting that the approach was originally pro- solution.
posed, and tested successfully in a number of problems, Tabu Search differs from the simple local search algo-
with only a reduced set of such functions (tabu search rithm above in at least two essential aspects:
with short term memory with tabu lists and aspiration
criteria). 1. Transitions leading to configurations for which the
objective function is actually greater than it is for
1.2 Problem Formulation the current solution are allowed (we are considering
a minimization problem such as Problem (1»).
Generally speaking, TS algorithms solves problems for-
mulated as follows: 2. The neighborhood of x, i.e, N(x), is not static, that
is, it can change both in size as well as in structure.
Min f{x) (1) A modified neighborhood N*(x) is shown in Figure
1. The elements of N*(x) are determined in different
Subject to xEX ways, for example:
where x is a configuration (or a decision variables), f(.) is
• Using a tabu list which contains attributes of
the objective function and X is the search space. Notice
configurations that are forbidden. In this case
also that no assumptions are made regarding the convex-
N* (e) c N (x); as noted above this is useful to
ity of f(x) and X or about the differentiability of f{x).
avoid cycling.
A variety of combinatorial optimization problems can
be represented as the minimization of an objective func- • Using strategies to reduce the size of the neigh-
tion subject to a set of algebraic constraints, as above. borhood in order to speed up the local search.
This is the case, for example, of the radiality constraints in As in the previous case, the reduced neighbor-
certain distribution system operation and planning prob- hood is such that N*(x) C N(z)
lems. This type of constraint, which may be a nuisance • Using the so called elite configurations to per-
for certain mathematical approaches, is easily handled by form path relinking. In this case it is not neces-
TS since normally TS does not work directly with the sarily true that N*(x) c N(x).
°0 0 O.o.oe

O 0 t;::®'".4 W
•• 0 •• 0 ••
Aaractlvc o • 00. 0 0 0 •

o eO. 0 0 • 0

D~ 0
• 0 •••Q••••
0 0 e 0


0 0 0
0 0

o • a a a • 0 e 0

OO@O O.ooeoo.

oow •

Unfcuible CCIIlfapacioD

Aancave fellible oodplatiOil

Fig. 2: illustration of the search space in Tabu Search. Fig. 3: Illustration of a neighborhood of a given configu-

e Redefining N (X) during the optimization pro-

cess; this is normally done in order to profit from 1.4 Neighborhood Structure
the specific properties of the problem.
Figure 3 illustrates a generic point x with its neighbor-
hood. Given a transition mechanism, a neighbor of x is
any point which can be reached from x by means of a
1.3 Codification and Representation single transition. There are both feasible and unfeasible
neighbors. Among the feasible ones, usually only the at-
As it happens with other combinatorial approaches (e.g., tractive configurations are of interest.
simulated annealing and genetic algorithms), representa- Figure 4 includes in the neighborhood of z configura-
tion and codification are key issues in tabu search. The tions that are attractive although unfeasible. The reason
definition of feasible and unfeasible solutions, as well as is that the temporary transition through an attractive un-
the characterization of the objective function, are directly feasible configuration may in certain circumstances be the
connected to the type of representation and codification shortest way leading to the desired attractive feasible so-
used to model the problem being solved. Another critical lution. Figure 5 shows then the complementary transition
issue affected by the method used in representation and which takes to configuration from an attractive unfeasible
codification is the characterization of the neighborhood of point to an attractive feasible one. These type of strategy
a given configuration. has been successfully used in the transmission expansion
Regarding the basic functions of tabu search, three as- planning problem, which is a good example of complex
pects are of interest: (a) the way transitions are made in system with plenty of local optimal solutions [33]. The
the solution space; (b) how each solution in the search transition through unfeasible points is facilitated by the
space is characterized; and (c) how the neighborhood of inclusion of penalty terms in the objective function to rep-
a given configuration is defined. These concepts are best resent the cost of unfeasibUity; hence, if the decrement
understood with the help of the graphical illustration dis- in the actual cost more than compensates for the cost
cussed in the following. For instance, the search space component due to the unfeasibility, then the transition is
can be visualized as shown is Figure 2, where attractive allowed.
and unattractive regions are indicated; two unattractive
configurations and three attractive ones are also repre- Example 1:
sented in the figure, including an optimal solution, which,
of course, is shown inside an attractive region. The tabu Consider now the well known n-queens problem that
search starts at a point such as point 2 in the figure and consists of placing n queens on an n x n chessboard so that
moves towards the optimal solution (the point 1 in the no two queens can capture each other [41]. This problem
figure). can be seen as an optimization problem for which an op-
timal solution is such that no two queens can be placed in
the same row, same column or same diagonal. Figure 6(a)
shows one topology for the n-queens problem in which
there are four collisions, or four possibilities for attack.
Figure 6(b) shows an optimal solution, i.e., a solution in
which no two queens can capture each other. The exam-
ples 1 through 6 which will be presented in the follow-
•• 0 •• 0 •• ing draws upon, with minor modifications, the material
o • 00. 0 0 0 • originally developed by Laguna in [5], and are an excelent
o • 0 • 0 0 • 0 .00 guide for understanting the basic workings of tabu search,
X mainly regarding the use of tabu lists and aspiration cri-
• 0 •.•Q•••~ 0 0 • 0 0 •
o 0 • 0 ~···;···O····.·-o 0 0
An efficient codification for this problem consists in de-
.00 • • 000 • •
scribing a configuration by an n x 1 vector P whose i-th
00000 .0. 0 a element, P(i), represents the column of the chessboard
o • 0 0 • 0 0 • where the i-th queen is placed; the i-th queen is assumed
• 0 0 0 • to be placed in row i. With this codification, the topology
o given in Figure 6(a) is described as follows:
• Feuib&e COIlfi.... 0 UIlfcIIib&c CGIlfiplltion
o Aancuvc uafalible CGlfiprllioft

• Aanccivc feuible OODfiJUDlicm

Fig. 4: Example of a feasible -+ unfeasible transition. where queen 1 is in row 1 and column 4, queen 2 is in row
2 and column 5, etc. With this codification no two queens
are placed in the same row or in the same column, and
so part of the problem is already solved. The remaining
problem can then be formulated as the minjmi?,&tion of
diagonal collisions.
Next we have to establish the objective function for the
n-queens problem. In order to do that, the concepts of
positive and negative diagonals are introduced, as illus-
trated in Figures 7(a) and 7(b). In order to find the num-
ber of collisions of a configuration it is then necessary to
00. go through the positive and negative diagonals and check
•• 0 •• 0 ••
for collisions. This task is made easier noticing that the
positive diagonals are characterized by the fact that the
o .00 .00 0 •
difference i - j is constant, whereas for the negative diag-
• 0 • 0 0 • 0
x onal i + j is constant, as illustrated in Figures 8{a) and
• 0 O.Q..
0 0 • 0 0 • 8(b). For example, for the configuration of Figure 6(a)
• 0 0 0 ·0···.·-. 0 0 collisions occur in positive diagonals 5, -2 and -3, and in
negative diagonal 7. Thus it can be seen that the evalua-
• 00. • 000 • •
00000 • 0 • 0
tion of the objective function for this type of codification
can be easily performed.
o • 0 0 • 00.
Notice that although the n-queens problem could have
• 0 0 0 • been mathematically formulated as a 0 - 1 problem, in
o tabu search such mathematical modeling is in fact unnec-
• Fcuible coafipa1iaD 0 Uar-ibJe ~
o AlncIiw _ _ibIeccafipaIioa essary, as illustrated by the discussion above.
• Amctiwfcalible . .fi.......

1.5 Characterization of a neighborhood

Fig. 5: Example of an unfeasible -+ feasible transition. A critical issue in tabu search is the need for evaluating
the configurations in a neighborhood. The number of con-
figurations in a neighborhood may be very large and the
quality of these configurations may vary a lot. At each
iteration of a basic TS algorithm it is normally necessary
to identify the best configuration in the neighborhood of
the current solution taking into account that the new so-
lution has no tabu attributes or, if it happens to have, the
1 2 ~ 4. Fi R 7 1 2 R 4 !i 6 7
aspiration criterion is satisfied (i.e., the new configuration
1 Q 1 Q
is good enough to justified relaxing the tabu constraint).
2 Q 2 Q One of the main strategies of tabu search consists in
3 Q 3 Q moving to the best configuration in the neighborhood of
4 Q 4 Q the current configuration. Usually the sizes of neighbor-
5 Q 5 Q hoods are much larger than can be evaluated by the algo-
6 Q 6 Q rithm and thus only the most attractive part of a neigh-
7 Q (a) 7 Q (b) borhood is actually explored. Having the means of finding
efficiently such attractive parts is critical to TS method-
Fig. 6: (a) An initial topology for the 7-queens problem. ology. Recent work on tabu search recommends general
(b) An optimal topology for the 7-queens problem. purpose strategies for reducing neighborhood sizes. These
strategies however do not necessarily work in all problems
and problem specific strategies have to be developed.
Example ~:

A move in tabu search consists in moving from the cur-

rent configuration to the best neighboring configuration.
In order to perform a move it is necessary then to know
the neighborhood of the current configuration. A popular
choice for the n-queens problem is as follows: a neigh-
bor configuration is a configuration that can be found be
swapping the columns occupied by any two queens. For
example, for the configuration of Figure 6(a), a neigh-
bor of the current configuration is obtained by swapping
queens 2 and 6. The corresponding configurations are as
............-..-.-................................. (b) ~ t
Fig. 7: (a) Positive diagonals. (b) Negative diagonals.
P2=4 13675 2-+-:- Neirnbor.
1 2 3 4 5 6 7 coD11gUratlon

For this neighborhood structure, the current configura-

tion has n(n - 1)/2 neighbors; for n =
7 the number of
neighbors is 21. For large values of n, however, the num-
1 ? ~ .4 Fi 6 7 , ? ~ .4 :i R 7 ber of neighbors can be prohibitive, and an appropriated
1 0 -1 -2 -3 -4 -5 -6 1 2 3 4 5 6 7 8 simplification of the neighborhood will have to be consid-
2 1 0 -1 -2 -3 -4 -5 2 3 4 5 6 7 8 9 ered.
3 2 1 0 -1 -2
-3 -4 3 4 5 6 7 8 9 10 Table 1 shows the objective function in the neighbor-
4 3 2 1 0 -1
-2 -3 4 5 6 7 8 9 10 11 hood of the current configuration as defined above. There
5 4 3 21 0 -1 -2 5 6 7 8 9 10 11 12 are four neighbors with cost reduction of -2, which means
2 1 0 -1 6 7 8 9 10 11 12 13
that the corresponding moves would reduce the number
6 5 4 3
of collisions from 4 to 2.
7 6 5 4 3 2 1 0 (a) 7 8 9 10 11 12 13 14 (b)

Fig. 8: Characterization of diagonals. (a) Positive diago- 2. FUNCTIONS AND STRATEGIES IN TABU
nals. (b) Negative diagonals. SEARCH
A tabu search strategy is an algorithm which normally
forms part of a more general tabu search procedure. The
Table 1: Neighbor configurations for the
configuration of Figure 6.

v ~t No Swap v
2 -2 8 3-5
~t No
3 -1 15
Swap v ~v
1-4 5 1 ~c
0 0 fA
. .. .

2 2-4 2 -2 9 3-6 3 -1 16 2-3 5 1
3 2-6 2 -2 10 4-7 3 -1 17 3-7 5 1
4 5-6 2 -2 11 6-7 3 -1 18 4-6 5 1
5 1-5 3 -1 12 2-5 4 0 19 5-7 5 1
6 1-6 3 -1 13 1-2- 5 1 20 4-5 6 2
7 2-7 3 -1 14 1-3 5 1 21 3-4 7 3

basic functions of tabu search are intensification, diversifi-

cation, strategic oscillation, elite configurations, and path
relinking. This section describes how these functions can
be combined to build a range of tabu search strategies
and operators, and how these can be used in effective ~ft:. . . ~
tabu search programs.
~.c;-;\ ~
~:"~ q~ /I."""
2.1 Recency Based Tabu Search

Recency-based memory is an important feature of tabu Fig. 9: Example of a search using Tabu Search.
search: it is a type of short term, memory which keeps
track of solution attributes that have changed during the
most recent moves made by the algorithm. The infor-
the search inefficient since the visit to certain attractive
mation contained in this memory allows labeling as tabu-
configurations may be delayed. An alternative path not
active selected attributes of recently visited solutions; this
shown in Figure 9 may have an optimal solution as an in-
feature avoids revisiting solution already visited in the re-
termediate point. In this case the solution process passes
cent past. A number of practical applications reported
through the optimal solution and continues until it stops
in the literature are direct implementations of this basic
at an non-optimal solution. This does not represent a seri-
tabu search algorithm.
ous problem since the optimal solution is kept in memory
This is the most basic type of tabu search algorithm and a the incumbent solution (Of course, rather than a sin-
is based on a list of forbidden attributes and an aspiration gle incumbent, one can keep track of a list of the best
criterion. The main objectives of the tabu list is (a) to solutions found during the solution process.)
avoid cycling, i.e., revising already visited solutions, and
(b) reducing the size of neighborhoods by excluding from Example 9:
consideration configurations labeled as tabu.
The main disadvantage of the use of a tabu list is that a This example illustrates the use of attributes to imple-
forbidden attribute may be part of an attractive solution ment tabu constraints. Consider again the n-queens prob-
of a neighborhood that has not been visited so far. To =
lem with n 7. Assume that the current solution is the
cope with this problem an aspiration criterion is used such configuration shown in Figure 6(a). This configuration is
that if the cost associated with a tabu configuration is codified as follows:
smaller than the costs of the last kp transitions, or is ~row (queen)
inferior to the cost of the incumbent solution, then the P1 = ~column
tabu constraint is relaxed and the transition is allowed.
Figure 9 illustrates the working of a short term mem- The 21 neighbors of this configuration are summarized in
ory tabu search algorithm of the type describe above. Table 1. The best neighbor corresponds to the move that
Four different processes, or paths, are shown in the figure: swaps queens 1 and 7. This move becomes tabu and can
paths 1-5-6-7-15 and 2-8-9-15 lead to the optimal solution; be store as illustrated in Figure 10, which indicates that'
path 3-9-10-14 is entrapped into a local optimal solution; the swap 1-7 will stay forbidden for the next 5 moves. The
and path 4-11-12-13 produces cycling. Even when tabu same arrangement can be use for other possible moves.
restraints are enforced, cycling may occur if the number of The arrangement is updated after every move: for exam-
moves k during which a tabu is active is relatively small. ple, when the next move is performed the number 5 in
Excessively large values of k, on the other hand, can turn position (1,7) is decremented to 4 to take into account
that the corresponding tabu tenure has decreased. Alter- 5. Move to the best neighbor if it is not tabu.
natively, rather than storing the tabu tenure for each tabu
constraint, one can store the iteration where the tabu is 6. Move to the best neighbor if it is tabu but satisfies
activated: for example, if the tabu is activated in itera- the aspiration criterion.
tion 237, this number is entered in position (1,7) of the 7. Update the tabu list.
storage arrangement of Figure 10.
2 3 4 5 6 7 8. Repeat steps 3 to 7 until a topology with zero cost is
1 I 5
2 The n-queens problem can be solved considering [6]:
(1) tabu tenure of three iterations, (2) aspiration crite-
rion that accepts a new (tabu) solution if its cost is lower
4 than that of the incumbent solution, (3) neighborhood is
5 formed by the topologies obtained by exchanging the po-
sitions of any two queens, (4) the initial topology is that
6 of Figure 6.
The application of this algorithm yields the following
sequence of topologies leading to an optimal solution:
Fig. 10: Storage of attributes for the n-queens problems.

2.2 Basic tabu search algorithm

A tabu search algorithm with short term memory has the iter=2~v=2
following characteristics:

• It is a process of kT moves in the search space which iter=3~v=1

comprises both feasible and unfeasible solutions. kT
is either predefined or determined adaptively. iter=4~v=1
• The neighborhood of the current configuration is
searched for the configuration with minimum cost. iter=5~v=2
The move is validated if the move found is not tabu,
or even being tabu, if an aspiration criterion is met. iter=6~v=1
Moves to configurations with cost higher then that of
the current configuration are allowed.'
• At each iteration the list of tabu attributes is up-
The best moves and the corresponding objective func-
Crucial to the efficiency of the algorithm is the choice tions and tabu lists at each iteration are summarized in
of the codification and the definition of neighborhood. the following:

Example .4.: • 1; move=l - 7; 'V = 2; tabue I - 7(3)

A possible algorithm for the n-queens problem is as • 2; move=2 - 4; 'V = 1; tabu=l- 7(2),2 - 4(3)
• 3; move=1-3; 'V = 1; tabu=1-7(l),2-4(2), 1-3(3)
1. Define the tabu tenure and the aspiration criterion. • 4; move=5-7; v =2; tabu=2-4(1), 1-3(2),5-7(3»
2. Define neighborhood structure. • 5; move=4-7; 'V = 1; tabu=1-3(1),5-7(2),4-7(3»)
3. Find the initial topology. • 6; move=1-3; v = 0; tabu=5-7(1),4-7(2), 1-3(3»
4. Compute and order the objective function for the en-
tire neighborhood. Remarks:
• The optimal solution was found after 6 iterations The elite candidate list technique starts with a master
(moves). In the two first iterations the objective func- list formed by the n p best elements of the neighborhood
tion was reduced. In the third iteration the objective of the initial configuration. Then a series of moves are
function remained constant. performed considering that the set of top nJ) neighbors
remain the same. When either a configuration satisfying
• In the fourth move there was no neighbor configu- a given objective is found or a maximum number of moves
ration with quality better than that of the current is performed, a new master list is built and the process is
solution, although there was two topologies with at- repeated.
tributes 1-3 and 1-7 which lead to topologies with the The successive filter strategy technique is normally used
same objectives (These are forbidden; the attribute in cases in which the neighborhood structure is defined
1-3 would lead to a topology already visited, whereas by swapping attributes. This is what happens, for exam-
the attribute 1-7 would lead to a new topology; since ple, in network transmission planning, where a neighbor
the aspiration criterion is not satisfied, the move is is obtained by the addition of a circuit and the removal of
not performed.) another circuit (a swap of candidate circuits). A reduced
• In the fourth move the objective function is in fact neighborhood is then obtained by defining two short lists,
increased. one with elements that can be added and another with
elements that can be removed from the current configu-
• In the fifth iteration there was an decrease in the ration.
objective function. The sequential fan candidate list technique is similar
to the concept of population used in genetic algorithms.
• Finally in the sixth iteration the attribute 1-3 is used Given an initial configuration the entire neighborhood is
since the aspiration criterion is satisfied (the penal-
evaluated and a reduced list of the n p best neighbors
ized objective is smaller than that of the incumbent).
is formed. These configurations are then called cutTent
configurations (a population). Next a reduced number 0
2.2.1 Candidate list strategies neighbors of each current configuration is evaluated and a
Once the neighborhood is defined it is evaluated. Since successor is found for each current configuration. When-
normally either the neighborhood is excessively large or ever two different successors are the same configuration
the evaluation of each alternative is time consuming, a an additional configuration is found to keep the size of
screening to limit the search to the most attractive neigh- the population constant (i.e., n p configurations).
bors has to be performed. In power system applications,
for example, evaluation may imply the need for solving ei- 2.2.2 Tabu tenure
ther a linear program or a non-linear program or a power
Tabu tenure is the number of tabu search iterations (i,e.,
flow problem. Four strategies have been used in the liter-
number of moves or transitions) an attribute remains for-
ature to perform the screening of a neighborhood:
bidden. In a typical application several tabu lists can be
• Aspiration plus; maintained simultaneously each with a different tenure.
Thus, not only single attributes but also combinations of
• Elite candidate list; attributes can be forbidden. Very seldom a tabu is spec-
ified by given the complete information about the unde-
• Successive filter strategy; and sired configuration; normally only certain attributes of the
• Sequential fan candidate list. configuration are put in the tabu list. The tabu tenure
can be either static (predefined) or dynamic (determined
The aspiration plus technique requires a minimum qual- on the fly). Dynamic tenure can be implemented in two
ity level for a configuration to be included in the neighbor- different ways: (1) random dynamic tabu tenure or (2)
hood. This method requires that the potential neighbors systematic dynamic tabu tenure.
be put in an ordered list and analyzed one by one until The random dynamic tabu tenure is implemented us-
a specified threshold is passed. The process stops after a ing bounds tmin and t m Q%; a randomly chosen tenure
few more configurations are evaluated after the threshold tmin ~ t ~ t m cu : when a new tabu attribute is estab-
is hit (value plus). In order to avoid that a reduced or lished. There are two alternatives for implementing this
an excessive number of neighbors is analyzed, upper and technique: in the first one the value t is kept for a certain
lower bounds have to be satisfied. There are at least two time atm Gz , where Q is a parameter. After that time a
alternatives for defining the threshold: (1) the value of new t is determined and so forth; the second alternative
objective function of the best configuration visited in the consists in determining a new t at each move, so that each
last k moves, and (2) the value of objective function of tabu attribute will remains forbidden for different periods
incumbent solution. of time.
As with the random dynamic tabu tenure, the system- is usually implemented as a subroutine of a more general
atic dynamic tenure can be implemented using two differ- long term algorithm. The long term algorithm is normally
ent approaches. The first alternative is a simple variation based on three techniques:
of the random dynamic tabu tenure in which the random
choice is replaced by a systematic choice: for example, • Frequency based memory.
assuming bounds tmin = 4 and tm4~ = 9, the system-
• Intensification;
atic choice is made chosing t cycling through the sequence
4,5,6, 7,8,9 [6]. The second alternative is the so called • Diversification;
moving gap technique, where the tabu list is partitioned
into two halves, one static and one dynamic. For example, 2.3.1 Frequency based memory
in a scheme with eight iterations, all attributes remain in
the tabu list for the four first iterations, whereas the last Frequency information is stored in order to be used to
four iterations are variable, as described in the following. change future search strategies. Two principal types of
In the case of the right gap the attributes initially remain frequency are used in practice: residence frequency and
in the tabu list for four iterations, then they are dropped transition frequency.
from the list for two iterations, and come back to the list By the residence frequency technique the number of
for two additional iterations. In the middle gap the at- times an attribute occurs in a predefined set of config-
tributes initially remain in the tabu list for five iterations urations is kept in memory. This set of configurations
(four plus one), then they are dropped from the list for can be either the set of all configurations visited so far,
two iterations, and come back to the list for one additional or a set of elite configuration or a set of low quality solu-
iterations. And in the left gap the attributes initially re- tions, etc. For example, regarding the set of elite config-
main in the tabu list for six iterations (four plus two), urations, the frequency of a given attribute may indicate
then they are dropped from the list for the two remaining it is highly desirable; and the opposite happens if we are
iterations. dealing with a set low quality solutions. In the first case
(elite configuration), these attributes can be used both in
2.2.3 Aspiration criteria intensification and in path relinking. On the other hand,
if the set of configurations is diverse and the residence
Typically a tabu test begins with the determination of a frequency of a certain attribute is high, it may indicate
trial solution Zl in the neighborhood of the current solu- that this attribute is limiting the search space and should
tion z: Then attributes of x that are changed in the move be penalized (become tabu).
from x to x' are identified. If among these attributes there By the transition frequency technique the number of
are tabu-active attributes the move still can be validade times an attribute occurs in transitions is stored. The fre-
if x' satisfies an aspiration level, i.e., if x' is good enough quent occurrence of these attributes does not necessarily
for justifying the relaxation of the tabu restraint. indicates that they will form part of the optimal solution.
The information regarding these attributes can be used
2.3 The Use of Long Term Memory in Tabu Search to change the search strategy by means of diversification;
Although a significant part of the literature on tabu search these attributes are penalized or become tabu. In trans-
deals with algorithms based exclusively on short term mission expansion planning, for example, this occurs with
memory techniques, more complex, sophisticated appli- low cost circuits which are frequently used as crack fillers,
cations require the use of long term memory. There are i.e., they are temporarily used in moving between two al-
several different ways to use long term memory: ternative configurations and then are removed from the
current solution.
• Reinitialize the search from a high quality configura-
tion; Example 5:

• Redefine the neighborhood structure based on a high Figure 11 illustrates the use of memory in a tabu search
quality solution; algorithm in the 7-queens problem discussed in the pre-
vious examples. In this case the lower triangle of the
• Redefine the objective function to penalize certain 7 x 7 matrix is used to store the transition frequency of
attributes of a high quality solution; each possible move, whereas the upper triangle is used for
• Change the search strategy based on knowledge ac- short term memory, as in the previous example. In the
quired during previous searches. case summarized in the figure, 25 moves have been per-
formed; for example, five of these moves were performed
In practice normally a combination of short term and by swapping the positions of queens 1 and 3, two moves
long term memory is adopted: the short term memory involving queens 1 and 5, etc.
Current solution
1 2 3 4 567
[I[ill]I[I[illJ [CX 0
~ ~~~~.~
0 W
..... a .
1.•.•••••. Uuarxtivc
No. of collisions
Q I <~:~~:Z.::..

. .~;;:';i(fjjjj
Tabu structure
1 2 345 6 7
3 1

5 2 3
(.. ~_. '\ \.~~) 0
1 3
2 1 3 4
1 4
\S) "',,8 ~~
1 1 6 ~ ..~ ~
3 1 7
f~~~~~~ . w YJ
Fig. 12: Examples of intensification and diversification in
Fig. 11: Transition frequency based memory (lower trian- tabu search.

example, this objective can be achieved performing sim-

The information of the frequency based memory can ple operations such as single circuit additions or removals
be used in different ways. From the current topology, for as well as circuit swaps.
which the number of collisions is equal to one, a diversifi- Intensification can also be based on building blocks that
cation process can be started. For example, the attributes are present in elite solutions stored in memory. Through a
that appear with high frequency are penalized during the mechanism called path relinking, we can produce new at-
diversification process. tractive configurations utilizing such building blocks. As
a matter of fact path-relinking is a means of implementing
2.3.2 Intensification both intensification and diversification, depending on the
Both intensification and diversification are considered ad- amplitude of the changes introduced by this mechanism
vanced functions of tabu search, in the sense that they in the current configuration (see Figure 13).
can be added to a basic tabu search algorithm that uses Decomposition is an alternative way to implement in-
short term memory with tabu lists and aspiration criteria. tensification. In this case severe constraints are imposed
Thus the search normally begins with the basic algorithm on the problem structure in order to intensify the search
that is then followed by intensification or diversification. in a restricted region. By this process certain problem
Tabu search algorithms explore intensification in a sys- variables are kept constant at predefine values and a re-
tematic way. Configurations found during the search are stricted search is then performed. For example, in the
stored and their neighborhoods are then explored more traveling salesman problem, a certain number of partial
thoroughly; the local optimal solutions closest to these paths are kept constant and the search is performed in the
solutions are found in the intensification phase. In Figure resulting reduced space to find the best way to connect
12 the sequence A-1-2-3-4-B was obtained by a short term and complement these paths.
memory algorithm; elite configurations 1-2-3-4 were found
and stored. Intensification can be performed starting from 2.3.3 Diversification
anyone of these elite solutions. In this case intensifica-
tion can be implemented usign basically the same alga- In tabu search algorithms diversification is performed in
rithm used in the short term memory algorithm but with a more planned way than it is in simulated annealing and .
some modification in the neighborhood structure, since genetic algorithms. As before, the objective of diversifi-
normally the intensification is restricted to the neighbor- cation is to move the search to unvisited regions of the
hood of an elite configuration and aims to find a local search space as well as to avoid cycling. In the capaci-
optimal solution in that neighborhood, as illustrated in tor placement problem this is carried out by temporarily
Figure 12. In the transmission expansion problem, for changing the rules for finding new configurations: for in-
five best neighbors of the current topology are as follows:

Queens v Vp

3-4 1 4
1-6 2 3
2-5 2 3
2-6 2 2
2-4 3 4

Hence, without penalties, the atribute 3 - 4 would be

chosen, in which case the objective function would be v =
1, whereas when the penalties are taken into account, the
atribute 2 - 6 is selected instead, with modified cost of
Fig. 13: Examples of path relinking in tabu search.
1Jp = 2, which then causes the diversification of the search,
since the atribute 2 - 6 has not bee used so far.

stance, certain capacitor banks are removed from a con-

figuration and become tabu-active, i.e. they will not be 2.4 Other TS Strategies
allowed to be reintroduced for a certain period of time.
The removal of key capacitor banks usually changes the In the following three important strategies that can be
architecture of a solution in such a way as to force the used both in connection with short term memory as well
algorithm to visit unexplored regions. as long term memory are discussed: path relinking, strate-
gic oscillations and elite configurations.
Diversificationcan be implemented either by re-sttJrting
the search from a new configuration or by modifying the
choice rules. A re-starting configuration can be obtained,
for example, via path relinking or by using attributes of 2.4.1 Path relinking
elite configurations or attributes found by the residence
frequency mechanism. Modified choice rules, on the other Two or more elite configurations are used to generate a
hand, involves the change of objective function to take new configuration as indicated in Figure 13. The con-
into account information obtained via residence frequency figurations used in path relinking are called reference s0-
(neighbor configurations with high residence frequency lutions. Reference solutions can be classified as initiat-
ing solutions or guiding solutions. For example in Fig-
are penalized so that the algorithm will tend to search
in a different neighborhood). ure 13, configurations 9, 10 and 12 were generated taking
configuration A as the initiating solution and consider-
ing configurations B and C as guiding solutions; this is
Ezample 6:
an intensification process (new solution are sought in the'
neighborhood of the elite configurations).
Diversification can be applied to the configuration
shown in Figure 11 (Example 5). The penalized objec- An alternative way to implement path relinking, called
tive function constructive neighborhood consists in finding a single con-
figuration and initiating intensification or diversification
from there. A few attributes of the initiating solution are
selected and by a constructive algorithm at each step an
attribute of a guiding configuration is added to the new
where vI' is the modified objective function, tJ is the orig- configuration; such added attributes are select either to
inal objective function (number of collisions), Q is the. improve quality or to restore feasibility. If a large number
penalty factor, and n J is the number of times a given of attributes of the initiating solution is present, the we
attribute occurred (the sum is performed of the set of ac- have an intensification process. Otherwise, new configu-
tive tabus). For Q = 1, the modified objectives for the ration will be used as the basis for diversification.
2.4.2 Strategic oscillation this problem is to find optimal, or even near optimal so-
lutions, for systems with high dimensions. Typical TS
Strategic oscillations is based on three different techniques applications is this area are described in [8]-[11].
that are used alternately: The Vehicle Routing Problem (VRP) is another prob-
lem in operations research that has been explored with
• A strategy for unfeasible regions in which the objec-
tive is to reach the boundary of a feasible region and TS techniques. The method proposed in [12] has shown
superior performance when compared with other combi-
enter the feasible region;
natorial algorithms when applied to a series of benchmark
• A strategy for searching into the feasible region for a problems. Other applications to the VRP problem are de-
local optimal solution; scribed in [13]-[15].
Assignment problems such as the Quadratic Assign-
• A strategy for leaving the feasible region and entering ment Problem (QAP), Generalized Assignment Problem
the unfeasible region. (GAP) and Multilevel Generalized Assignment Problem
(MGAP) were also studied with TS algorithms. A impor-
Strategic oscillation is more efficiently used in problems
tant application is reported in [16]. Other of use of TS in
where the size of the unfeasible region is relatively large,
assignment problems are addressed in [17]-[19].
as is the case of the transmission expansion planning prob-
lem. Other problems TS has been applied include weighted
Good solutions at one level are likely to be found close k-cardinality problem (20], multidimensional knapsack 0-
to good solutions at an adjacent level [2]; this is the basis 1 problem [21], fixed charge transportation problem [22],
for the so called Proximate Optimality Principle (POP). graph partitioning [23] and job shop scheduling [24, 26].
In strategic oscillation, and other constructive-destructive In [27] an application of TS to a telecommunication net-
processes, it is worthwhile to explore a level before moving work problem is described.
to the next one. Thus the search process is forced to stay
at the current level for n pop iterations (transitions) before 3.2 Application in power systems
moving to the next level; the transition is then taken from
the best configuration found in the npop transitions. An interesting application in the power system area is the
use of a TS and hybrids to the unit commitment problem
[28]-[30]. Another application to a very complex problem
is the case of the transmission network expansion plan-
The initial applications of tabu search occurred in the late ning problem described in [31]-(33]. Other examples are
80's and were limited to the operations research area, and described in [34]-[40].
included classical problems such as the traveling salesman
problem and graph coloring. Applications to complex
problems in other fields appeared in the early 90's. A 4. CONCLUSIONS
survey made by the authors of this chapter based on the
dabase of the Institute for Scientific Information (lSI) has This chapter has presented the basic features of tabu
listed 680 publications related to tabu search and its ap- search. Like other iterative techniques used to solve com-
plications, most of them in the period 1998-2001. As for plex combinatorial optimization problems, TS has the
applications to power system problems, according to the ability to escape from local optimal solutions by accept-
lSI about 30 papers have been published in the special- ing uphill moves, that is, moves that deteriorate the cur-
ized literature up to April 2001, most of then in journals rent value of the objective function. TS differs from the
such as the IEEE Transactions on Power Systems (9), lEE other techniques in the use 0 memory, which is crucial to
Proceedings Generation, Transmission and Distribution the successful implementation of tabu search. As the TS
(5) and Electric Power Systems Research (10), Interna- algorithm traverses the solution space it stores relevant
tional Journal of Electrical Power & Energy Systems (2). findings in short term and long term memories, which are
The list of application presented in the following is not then subsequently used to redirect search and modify the
complete and is biased by the previous experience of the local search algorithms that form part of the TS meta-
authors in this field. heuristic. Both simple tabu search algorithms based on
short term memory with tabu lists and aspiration criteria"
as well as more advanced techniques such as intensifica-
3.1 Applications in engineering
tion, diversification, path relinking, elite configurations
The traveling salesman problem is one of the combina- and strategic oscilations have been described. Examples
torial problems that has received a great deal of atten- are used throughout the chapter to illustrate the basic
tion in the literature. The main difficulty associated with concepts.
REFERENCES [15] B.D. Backer, V. Furnon, P. Shaw, P. Kilby, P.
Prosser. "Solving Vehicle Routing Problem Us-
[1] L.L. Garver: "Transmission Network Estimation ing Constraint Programming and Metaheuristics " ,
Using Linear Programming", IEEE Trans. Power Journal of Heuristics, 6(4), 501-523, 2000.
App. Syst., Vol. PA8-89, pp. 1688-1697, Sept/Oct
1970. [16] J. SkorinkapotJ: ."Extensions of a Tabu Search
Adaptation to the Quadratic Assignment Problem"
[2] F. Glover: "Tabu Search Fundamental and Uses", Computers & Operations Research, 21(8), 855-865,
Graduate School of Business, Uni_versity of Col- 1994.
orado, 1995.
[17] J.P. Kelly, M. Laguna, F. Glover. "A Study of
[3] F. Glover, J.P. Kelly, M. Laguna: "Genetic Al- Diversification Strategies for the Quadratic Assign-
gorithms and Tabu Search: Hybrids for Optimiza- ment Problem" Computers & Operations Research,
tion" , Computers and Operations Research, Vol. 22, 21(8), 885-893, 1994.
No. 1,1994.
[18] M. Laguna, J.P. Kelly, J.L. Gonzalez Velarde, F.
[4] F. Glover, E. Taillard, D. de WetTa: "A Users Glover: "Tabu Search for the Multilevel General-
Guide to Tabu Search", Annals of Operations Re- ized Assignment Problem", European Journal of
search, Vol. 41, 1993. Operational Research, 82(1), 176-189, 1995.

[5] M. Laguna: "A guide to" implementing tabu search" , [19] I. H. Osman: "Heuristic for the Generalized As-
Investigacion Operativa, Vol. 4, No.1, Abril 1994.' signment Problem - Simulated Annealing and Tabu
Search Approaches", OR Spektrum, 17(4), 211-225,
[6] F. Glover and M. Laguna: "Tabu Search", Kluwer 1995.
Academic Publishers, 199~.
[20] K. Jomsten, A. Lokketangen: "Tabu Search for
[7] C..R. Reeves: "Modern Heuristic Techniques for Weighted K-Cardinality Trees", Asia-Pacific Jour-
Combinatorial Problems" , McGraw-Hill Book Com- nal of Operational Research, 14(2), 9-26, 1997.
[21] S. Hanaji, A. FretJille: "An Efficient Tabu Search
[8] J. Knox; "Tabu Search Performance on the Sym- Approach for the 0-1 Multidimensional Knapsack
metrical Traveling Salesman Problem", Computers Problem", European Journal of Operational Re-
& Operations Research, 21(8),867-876, 1994. searCh, 106(2-3),659-675, 1998.
[9] C.N. Fiechter: "A Parallel Tabu Search Algorithm [22] M. Sun, J.E. Aronson, P.G. McKeown, D. Drinka:
for Large Traveling Salesman Problems", Discrete «A Tabu Search Heuristic Procedure for the Fixed
Applied Mathematics, 51(3), 243-267, 1994. Charge Transportation Problem", European Jour-
nal of Operational Research, 106(2-3), 441-456,
[10] W.B. Carlton, J. W. Barnes: "Solving the Traveling 1998.
Salesman Problem with Time Windows Using Tabu
Search", lIE Transactions, 28(8), 617-629, 1996. [23] E. Rolland, H. Pirkul, F. Glover: "Tabu Search
for Graph Partitioning", Annals of Operations Re-
[11] M. Gendreau, G. Laporte, F. Semet: "A Tabu search, 63, 209-232, 1996.
Search Heuristic for the Undirected Selective Trav-
eling Salesman Problem" , European Journal of Op- [24] S.G. Ponnambolam, P. Aru11indan, S. V. Ra;esh: "A
erational Research, 106(2-3), 539-545, 1998. Tabu Search Algorithm for Job Shop Scheduling",
International Journal of Advanced Manufacturing
[12] M. Gendreau, A. Hertz, G. Laporte: "A Tabu Technology, 16(10), 765-771, 2000.
Search Heuristic for the Vehicle Routing Problem" ,
Management Science, 40(10), 1276-1290, 1994. [25] J. W. Barnes, J.B. Chambers: "Solving the Job
Shop Scheduling Problem with Tabu Search", lIE
[13] J.F. Wu, J.P. Kellf/: "A Network FlowBased Tabu Transactions, 27(2), 257-263, 1995.
Search Heuristic for the Vehicle Routing Problem" ,
Transportation Science, 30(4), 379-393, 1996. [26] A. Fenni, M. Marchesi, F. Pilo, A. Se"';': "Tabu
Search Metaheuristic for Designing Digital Filters" ,
[14] G. Barbarosoglu, D. Ozgur. "A Tabu Search Algo- International Journal for Computation and Math-
rithm for the Vehicle Routing Problem" , Computers ematics in Electrical and Electronic Engineering,
& Operations Research, 26(3), 255-270, 1999. 17(5-6),1998.
(27] E. Costamagna, A. Fanni, G. Giacinto: "A Tabu [38] A. Augugliano, L. Dusonchei, S. Mangione, E.R.
Search Algorithm for the Optimization of Telecom- Sanseverino: "Fast Solution of Radial Distribution
munication Networks" , European Journal of Oper- Networks with Automated Compensation and Re-
ational Research, 106(2-3), 357-372, 1998. configuration", Electric Power Systems Research,
Vol. 56, No.2, pp. 159-165,2000.
[28] A.H. Mantawy, Y.L. Abdel-Magid, S.Z. Selim:
"Unit Commitment by Tabu Search ", IEE Pro- [39] F.S. Wen, C.S. Chang: "A Tabu Search Approach
ceedings Generation, Transmission and Distribu- to Fault Section Estimation in Power Systems",
tion, Vol. 145, No.1, pp. 56-64, 1998. Electric Power Systems Research, Vol. 40, No.1,
pp. 63-73, 1997.
[29) A.H. Mantawy, Y.L. Abdel-Magid, S.Z. Selim: "A
NewGenetic-BasedTabu Search Algorithm for Unit [40] F.S. Wen, C.S. Chang: "Tabu, Search Approach
Commitment Problem", Electric Power Systems to Alarm Processing in Power Systems" , lEE Pro-
Research, Vol. 49, No.2, pp. 71-78,1999. ceedings Generation, Transmission and Distribu-
tion, Vol. 144, No.1, pp. 31-38, 1997.
[30] A.H. Mantawy, Y.L. Abdel-Magid, S.Z. Selim: "In-
tegrating Genetic Algorithms, Tabu Search and .[41] S. M. Sait, H. Youssef: "Iterative computer al-
Simulated Annealing for the Unit Commitment gorithms with applications in engineering", IEEE
Problem", IEEE Transactions on Power Systems, Computer Society, Los Alamitos, CA, 1999.
Vol. 14, No.3, pp. 829-836, 1999.

[31] F.B. Wen, C.S. Chang: "TransmissionNetwork Op-

timal Planning Using the Tabu Search Method",
Electric Power Systems Research, Vol. 42, No.2,
pp. 153-163, 1997.

[32] R.A. Gallego, A. Monticelli, R. Romero: "Compar-

ative Studies of Non-Convex Optimization Methods
for Transmission Network Expansion Planning",
IEEE Transactions on Power Systems, Vol. 13, No.
3, pp. 822-828, 1998.

[33] R.A. Gallego, R. Romero, A. Monticelli: "Tabu

Search Algorithm for Network Synthesis", IEEE
Transactions on Power Systems, Vol. 15, No.2, pp.

[34] Y.O. Wang, H.T. Yang, C.L. Huang: "Solving the

Capacitor Placement Problem in a Radial Distri-
bution System Using Tabu Search", IEEE Transac-
tions on Power Systems, Vol. 11, No.4, pp. 1868-

[35] C.S. Chang, L.R. Lu, F.S. Wen: "Power System

Network Partitioning Using Tabu Search" , Electric
Power Systems Research, Vol. 49, No.1, pp. 55-61,

[36] K. Nara, Y. Hayashi, S. Muto, K. Tuchitla: "A

New Algorithm for Distribution Feeder Expansion
Planning in Urban Area", Electric Power Systems
Research, Vol. 46, No.3, pp. 185-193, 1998.

[37] D.Q. Gan, Z.H. Qu, H.Z. Cai: "Large-Scale VAR

Optimization and Planning by Tabu search", Elec-
tric Power Systems Research, Vol. 39, No.3, pp.
195-204, 1996.
Chapter 8
Hybrid Systems: An Example with Fuzzy Systems

l. INTEGRATION OF FUZZY SYSTEMS Genetic algorithms offer distinct advantages for

WITH GENETIC ALGORITHMS pertinence functions optimization and fuzzy
rules learning indeed. Genetic algorithms
The integration of fuzzy systems with genetic
provide a more wide research, shortening the
algorithms it is newer and less exploited than
chance of ending in a local minimum, through
the combination of the genetic algorithms or of
simultaneously sampling several solution sets.
the fuzzy logic with neural networks. The work
The fuzzy logic contributes with the evaluation
in this field was pioneered about 1989, and
function, a stage of the genetic algorithm where
Charles Kaar [1-4] is recognized as an initiator.
the adjustment is set up.
These works exploit that integration, showing
that it is possible to improve the fuzzy There are several possible ways for using the
controller's performance through the genetic genetic algorithms with fuzzy systems. A
algorithms. hybrid system type requires using separate
modulus as parts of a global system. The
The fuzzy logic and the genetic algorithms
modulus based both in genetic algorithms and
have some common characteristics and some of
fuzzy logic may be kept in a separate group or
them are complementary. For example, fuzzy
together with other intelligent or conventional
systems are very good to keep the knowledge,
computer programs sub-systems that make an
while genetic algorithm is very good to learn.
application system.
And, both technologies are well adequate to
deal with systems and non-linear data. The Another use is for system's design that is
systems that use these techniques have mainly headed for use together with fuzzy
.improved their performances regarding the logic. The use of genetic algorithms is aimed to
efficiency and execution speed [5]. the improvement of design's process and the
performance of the operational system based in
The fuzzy systems have the advantage of
the fuzzy system. The genetic algorithms may
knowledge storage. This is a characteristic of
be used for finding the best values for
the specialist systems so that the rules, for
pertinence functions when the value's hand
example, are easily modified. The fuzzy
selection is difficult or takes too much time.
systems are a convenient and efficient
alternative for the problem's solution The general procedure for using the genetic
representation when the states are well algorithms with fuzzy systems is shown in
diffident. Though, for big and complex Figure 1. For example, one chromosome may be
systems, the fuzzy systems are not easy to be defined as being a linking together of values of
adjusted, requiring hand methods of try-and- all pertinence functions. When triangular
error type. The matrix of fuzzy relation functions are used to represent pertinence
representing the relations between concepts and functions, the parameters are the centers and
actions may be of uneasy handling, and the best widths for each fuzzy system. From an initial
values for the necessary parameters for the range of initial values for possible parameters,
description of pertinence functions may be the fuzzy system is turned on to find out how
difficult to be determined. The fuzzy system much its performance is good. These inputs are
performance may result very sensitive to used for each chromosome's adjustment set and
parameter's specific values. finding the new population. A cycle repetition
happens until finding the best pertinence
function parameter's set of values.

Genetic Algorithm Fuzzy Logic is a way to use inherent analogical
NewPopulation +- Adjustment process data that shift trough a continuous
digital computer band, which works with well-
defined numeric data, i.e., discrete values. For
example, considering a brake system directed
Evaluation by a micro-control: the micro-control takes the
decisions according the brake temperature,
speed, and other system variables.
The temperature variable in this system may be
Figure 1 - Interaction between Genetic divided in a "states" band: "cold", "fresh",
Algorithms and Fuzzy Systems "normal", "tepid", "hot". Though, it is not easy
to set up the transition from one state to the
next one; an arbitrary limit should be defined
The above process may be expanded to use the as to divide, for example, the "tepid" from the
chromosomes that include information's "hot", but this would drive to a discontinue
regarding conditions and rules in relation with change when the entry value pass trough the
fuzzy rules. Their inclusion into the genetic limit. The micro-control should be able to
treatment allows the system to learn or to detect it.
refine the fuzzy rules.
The way would be the creation of the " fuzzy
states" or more common "fuzzy memberships",
2. FUZZY SYSTEMS which allows the graduated change from a state
to the next. You could define the entry
Lotfi A. Zadeh introduced the concept of fuzzy temperature by the use of intermediate
set, in 1965. He is considered as the great functions.
contributor for the Modern Control. At the
beginning of the 60"s decade, Zadeh observed In this way, the entry variable state will no
that the technologic resources available could longer to jump suddenly from one state to the
not provide the automation of the activities next one, it looses gradually the value in one
with regard to industrial, biologic or chemical state while it gains value in the next state,
nature problems, comprising ambiguous instead. Up to in a moment the brake
situations, not subject to processing through temperature "true value" will almost always be
computer logic based on Boolean logic. in some point between two consecutive
functions: 0.6 normal and 0.4 tepid, or 0.7
In order to solve these problems Prof. Zadeh normal and 0.3 fresh, and so on. Figure 2 shows
has published in 1965 [6] one paper summing a representation of such consecutive functions.
up the Fuzzy Set's concepts, and by the
creation of Fuzzy Systems made a re.volution in
the subject. cold fresh normal tepid hot
In 1974, Prof. Mamdani, from London
University's Queen Mary College, after several
trials for controlling a steam motor with
different types of controls, PID included, have
succeeded only after applying the fuzzy
reasoning [7].
Several papers have been published regarding
the Fuzzy Logic aiming to organize the many
applications and developments per
concentration areas [8-11]. brake temperature

Figure 2 - Fuzzy Memberships

2.1. Fuzzy Logic
The name "Fuzzy Logic" in the beginning
The entry variables of a fuzzy control system
seems to refer to something confuse (cloudy),
are generally map out inside consecutive
even providing skepticism with respect -to
function sets - the process for- conversion of an
equipment's control increasing their capacity to
intermediate entry value into a fuzzy value is
perform the work.
called "fuzzification". We should note that a
control system may have entry keys (on/off) more effective.
together with analogical entries, and such
entries (on/off) of the path always will have a
true value equal to 1 or 0 - but such entries are 2.2. Fuzzy Control
in fact a simplified case of a fuzzy variable, so
In concept fuzzy controls are very simple; they
the system may work with tem with no
consist of one entry stage, a processing stage,
and an outing stage. The entry stage map out
Setting up the entry variables map out inside sensors or another type of entries (numerical
the consecutive functions and true values, the data upon which the system will base its
micro-control then takes the decisions for decisions) in a proper way the consecutive
making the actions to follow the rules: functions and the true values; The processing
stage selects each adequate rule and provides
one result for each one of them, and then they
IF the brake temperature is tepid AND the combine the results of these rules; and finally
speed is not too fast the outing stage converts the combined result
from the previous stage into the control.
THEN the brake pressure is slightly reduced
The most usual shape of consecutive functions
is the triangular (as in Figure 2); even though
Where, in this case, the two entry variables are trapeze shaped curves, besides other shapes,
"brake temperature" and "speed". The outing are also used, but the shape is generally less
variable, "brake pressure", is by the same important than the number of curves and where
token, generated from a fuzzy set as a starting thy are located. From 3 up to 7 curves are in
point, which may have values as "static", general appropriate to cover the required entry
"slightly reduced", "slightly increased", and so values band.
The processing stage is, as already discussed,
The decision is based on a set of rules: all the based upon a collection of logic rules shaped
applied rules are invoked, using the on statements as IF - THEN, where IF is called
consecutive functions and true values which the "antecedent" and THEN is called
come from the entries, for the determination of "consequence." Typical fuzzy control systems
the rule result - that in return will be map out have dozens of rules.
inside the consecutive function and the true
For practical purposes) the set of rules usually
value, controlling the outing variable - and
have several antecedents which are combined
after these results are combined as to generate
using fuzzy operators, such as AND, OR, and
a specific answer, the present brake pressure, a
NO, (besides again the definitions are varying):
procedure known as "defuzzification". The
And (in an informal definition) uses in a simple
combination of fuzzy operations with rules
way the minimum weight for all the
based on "conclusion" describes an "expert
antecedents, whereas OR uses the maximum
fuzzy system".
values [12]. (Also there is an operator NO that
Traditional control systems are generally based subtracts one consecutive function from 1
on mathematical models which describe the resulting the complementary function.
control system using one or more differential
There are several ways for the definition of a
equations which define the system answer for
rule's result, but one more usually used and
its entries; such systems are frequently
simple is called "max-min", in which the outing
introduced by the so called '&PID" (proportional
of the consecutive function is given by the true
- integral - derivative). Such controllers are
value generated by the premise.
products from decades of developments and
theoretical work and are highly effective. Rules may be solved in parallel hardware or in
sequential software. All rule's results are
If PID controllers and other traditional control
"defuzzificated" for a concise value by one of
systems are so well developed, why so much
the several methods; there are several
concern regarding the fuzzy logic? Just because
theoretical ones, all of them with advantages'
in some cases it has some advantage: in many
and disadvantages.
cases, the process mathematical model may do
not exist or may be very "expensive" in terms The centered method is very popular, in which
of computer processing power and memory - the result's "mass center" supply the concise
and a system based in empiric rules may be value; in the process of defuzzification by

pondered average, the centered of each area is hurdle THEN break pressure is high.
calculated separately and the outing value is
calculated through the average of these
centered and pondered with the maximum value Speed
of the pertinence function. In the process by a b c d
maximum average of pertinence function, the
a = stopped
pertinence function maximum value average is b= slow
calculated, being this outing value [13]~ c=medium
The fuzzy control systems project is based in d= high
empirical methods - basically a methodical
approach for try-and-error. There are few pre-
defined rules in the present, once it is a new 0 1 2 5 7 10 2S 30 50
technology yet; generally the process follows
the steps below: Distance
a b c d
• The system operational specifications, the
a == in the hurdle
entries and outings are documented. b=near
• The fuzzy sets for the entries are c=faraway
documented. d == veryfar away

• The set of rules is documented.

0 S 6 40 60 80 100 120 150
• The defuzzification method is determined.
• It is executed through a tesi, for checking Pressure
the system, making the detail adjustments as a b c d
required. a == no pressure
2.3. An Example of a Fuzzy Control d=bigh
Let us have an example in which we will have a
specialist system for specifying the pressure to
be used in a vehicle brake as a function of the o 1 2 3 456 8
vehicle distance to a hurdle and the vehicle Figure 3 - Fuzzy Memberships for the Example
speed at that moment. of a Fuzzy Control
1) First the system variables must be set up:
vehicle speed, vehicle distance, and brake
pressure. 4) And as defuzzification function we will use
the centered method. So for a vehicle getting
2) Next the fuzzy sets of each variable should near a hurdle far from 6 and at a speed from 7,
be defined, as shown in Figure 3. It should be which will be the recommended pressure on the
noticed that for a speed of I.S we have two brake? We have as an entry V=7 and D=6, these
degrees of pertinence, one for the set a variables will be fuzzificated going to have the
(stopped), and another for the set b (slow). following degrees of pertinence:
This is valid for most of variable values
because of the intersection seen between sets. Speed: Slow Cu(V=7)=O,7) and Medium
(,u(V=7)=O,3 )
3) Now we define the system rules. Let us see
some of them: Distance: In the hurdle (,u(D=6)=O,8) and Near
• IF speed is slow AND distance is near the
hurdle THEN break pressure is low. The fuzzy system, beginning from these entries,
will provide the following outings, using
• IF speed is slow AND distance is in the minimum operation:
hurdle THEN break pressure is low.
• 0.7 (slow) AND 0.5 (near) THEN 0.5 (low
• IF speed is medium AND distance is near pressure)
the hurdle THEN break pressure is high.
.0.7(slow) AND 0.8 (point) THEN 0.7 (low
• IF speed is medium AND distance is in the pressure)
• 0.3(average) AND 0 .5 (near) THEN 0.3 (high
• 0.3(average) AND 0.3 (point) THEN 0.3
(high pressure)

(U,4O) P . - 0I0.J«llW
_, oW
This resulting fuzzy set will be defuzzificated
providing the outing 4 that corresponds to the .....
1Il.bOlCkpanoflll< 0' 110'

pressure that should be applied, as shown in 90"

Figure 4 : C_AIIS!r


a b c d
a = no pressure
c=medium Figure 5 - Program basic screen.

3 .1. Parking Conditions

o 1 2 456 8
Some conditions are set up for parking. They
Figure 4 - Output Computation belong to two types : linked to the computer
package and logic. The ones linked to the
package are refereed to physical limitations,
The package major aim is to park a vehicle in a • input variables limits :
garage, starting from any initial position. To go
- position (x, y): 0 < x < 32 and 0 < y < 20 (m)
through this task, the user should beginning the
(parking limitations)
development of a fuzzy control set of rules and
pertinence functions that will define the - vehicle angle : -90 0S ~ S 270°
vehicle route. Several windows and numeric
routines are available in the program for - vehicle direction: ahead or back
helping the users to set up such rules. The • output variable limit:
fuzzification and defuzzification processes are
performed by program with no interference - vehicle wheel angle : -30 0S e S 30° (actual
from the user [14] . model limitation)
To represent the vehicle parking problem the
program has a basic screen that is shown in With respect to the logic limitations, they may
Figure 5 . It shows the garage position, the be of different types according the strategy that
existing hurdles (actually the walls) and the will be used. Beside others, some examples are :
limit coordinate values. Also the entry
variables involved in the problem are shown,
namely, (x, y) measured from the vehicle back
• minimizing the number of direction changes
central point and the car angle (~). (ahead or back);
This program basic screen also comprises a set
• minimizing the vehicle travel;
of menus, being its functions indicated by the
attribute names . By starting (through keyboard • do not use part of garage by the parking
or mouse) the attribute File it will be possible time .
to charge a set of rules and pertinence
functions previously developed, to save a new
set or to start its creation. Also it is through The conditions for the vehicle displacement
this attribute that there is the route for the are : acceleration up to 1 (m/s 2 ) and maximum
impressions of several types and outing of the speed of 1(m/s) . Al the displacements are made
program. taking these both values as a reference. There

are three possibilities for changing the
direction of displacement that are:
a) A clash against the wall: When the system
detect that in the next simulation step , the
vehicle will clash against the wall ;
b) A rule that enforces to turn back : when
because of a rule the order for reversing be
used, or ,
c) Lack of outputs : when the control does not
use any rules, that is to say, if the output area
were void (zero).

3.2 . Creation of the Fuzzy Control

The user of this computer package may define a Figure 7 - Edit Fuzzy Sets Window .
new system by creating the pertinence
functions and the control rules . The Figure 6
shows the window Creating Fuzzy Sets where For defining the control rules, that is to say,
defining the number of pertinence functions for how the functions will be grouped, there is the
each variable. Pertinence functions are equally window Edit Rules . Figure 8 presents an
spaced in the variable's control surface. The example of this edit window for filling up the
user changes these pertinence through the rules output values with premise values equal
window Edit Fuzzy Sets. Figure 7 shows this to x = LC, y = YT, angle VE and ahead
window . (forward) movement. The program provides all
set up for the pertinence functions possible
combinations of the input variables, once the
user fills up the rule's conclusion value. This
figure 8 shows two regions of concern. In the
first there are the possibilities for selection of
the direction (ahead or back) and of the
coordinate corresponding to the car angle . The
second region of concern has the filling of the
rule's conclusion. This can be made through the
selection of one of the output values (or none
of them for a non-defined rule or to reverse).
So, this rule is:

Figure 6 - Creating Fuzzy Sets Window .

The user provides the pertinence functions of

the input and output variables of the program.
This can be made through files that have been
generated before or editing the new pertinence
functions . Figure 7 presents an example of
editing for the input variable y .

Figure 8 - Edit Fuzzy Rules Window.

IF x is LC AND Y is YT AND car angle is VE
AND Displacement Direction is ahead
THEN modified angle is NS.

3.3. Simulations
The computing pack learning process used is
the trial-and-error. The· user makes the
membership functions, supply a set of rules and
perform several tests checking the control
quality. It is known that this learning process
(try-and-error) may not provide the expected ( b)
results because many interpretation mistakes
may happen [15]. Figure 9 - Simulation Examples .

Figure 9(a) shows an example of control where

the vehicle starts from an initial position X 3.4. Problem Presentation
equal to 30, Y equal to 120 and an angle of
250, making 449 iterations before parking . The learning process used by computer package
Another example is shown in Figure 9(b) where is trial-and-error. The user fosters the
the vehicle starts from initial position X equal pertinence functions, supply a set of rules, and
to 80, Y equal to 195 and angle -of 50, making next, perform several tests to check the control
328 iterations before parking. If we want to quality. It is known that this learning process
reduce the iterations number, i.e. to minimize (trial-and-error) may not provide the awaited
the route made by the vehicle, we had to results because many interpretation mistakes
change the rules or to introduce new may happen [15] .
membership functions, or to adjust the existent If we want to shorten the iteration number, is
ones. To define the best values for membership that to say, to minimize the vehicle touring
functions by hand is difficult and takes a lot of space we should have to change the rules or to
time. introduce new pertinence functions or still
A training model using genetic algorithms was adjusting the existent ones [14]. To define the
developed in this work. The control is best values for the pertinence functions by hand
optimized through automatic adjustment of are difficult and take a long time.
existing membership functions, with none In this work a training modulus using genetic
action required from the user. This modulus algorithms was introduced. The control
was introduced in the computing package . optimization is made through the automatic
adjustment of the existent pertinence functions,
turning the user's actions no longer necessary .
This modulus was incorporated into the system
l:l • l:l that has earned a new menu, the Genetic

/ /
}, ......
Training menu, which will be discussed in the
following item.
;,/ '--
.",,/ I
l JI \
, \\
uI' \\_ This modulus has an adjusting function for the
f·...... ......
.... _--_.-
............... pertinence functions, using genetic algorithms ,
from a control previously fostered by the user.
For this purpose we have three options from
menu Genetic Training: Starting Positions for
Training, Genetic Training, and Best Results.
(a ) In a general way, the integration between
genetic algorithms and fuzzy control, was set
up as follow:

a) The chromosome was defined as being a set up.
linkage between pertinence functions
After ending all the generations, it is possible
adjustment values. to choose the best result found between all the
b) The parameters are the centers and the generations through the button Best. This
widths of each fuzzy set. These parameters operation allows evaluating the proposed
make up the chromosome's genes. solutions for the genetic algorithms. For that
purpose the user should choose one generation
c) From a possible parameter's initial values and to click the button Adjust.
range the fuzzy system is rolled up to check
whether it works well. After the adjustment, the pertinence functions
are redefined according the parameters of the
d) This information is used for set u~ .the chromosome correspondent to the chosen
adjustment of each chromosome (adaptab~llty)
generation. The system will make the control
and establishing in this way a new population.
based on these new functions.
e) The cycle repetition happens until
The original pertinence functions may be
completion of the generation's number defined
restored through the option Best Options from
by the user. At every generation the best set of
the menu Genetic Training, clicking the button
values is found for the pertinence functions
Restore Original. It will be presented now the
mechanisms used for the generation of very
good pertinence functions in the middle of
fuzzy control used through the genetic
4.1. The Option to Define the Starting Positions algorithms.
For the genetic training, it is possible to define
To describe each pertinence function of the
the initial positions that the vehicle will depart
fuzzy control introduced by the computer
from, to evaluate each chromosome that
package, four parameters are defined. They are:
represents the set of values for the pertinence
IE (down left), ID (down right), SE (upper left)
function's parameter, in this way looking for a
and SD (upper right). Figure 7 shows the
control optimization, not only with respect to
pertinence function PE parameters, variable x.
one sole route, but also of the whole possible
For this function IE's value is equal to 30, ID's
initial positions for starting the vehicle, and so
160, SE's 80 and SD's 110. For the adjustments
the parking happen.
of pertinence functions were defined for each
Through the edit window, the initial positions of them the following equations:
are edited. So, it is possible the definition of a
new position, edit an existent position, to
IE = (IE + k i) - Wi

exclude and qualify or disqualify one position ID = (ID + k i ) + Wi

so that it be or not be used for the genetic
SE = (SE + kt)
training. This is made through the option To
Use Position. SD = (SD + k i)
where k, and Wi are adjustments coefficients.
The k, makes each pertinence function moving
4.2. The Option Genetic Training
to the right or left without losing its shape. The
Through this option the user makes the coefficient Wi makes the pertinence function
adjustments of the genetic training parameters. shrink or expand. These coefficients take each
Through another window, the user may set up integer negative or positive value according the
the adjustment value for the pertinence adjustment value defined by the user in the
functions, besides to define the parameters Genetics Parameters window.
(population, generations, crossover tax, and
The Figure 10 shows an adjustment example in
mutation), that is how much the function will
PE's function with values for IE equal to 30,
be displaced for the right or left and how much
ID's 160, SE's 80 and SD's 110. Being k =-8
it will shrink or expand. After start training, it
and w = 3 the pertinence function will have
is possible to follow the genetic training to
following adjustment:
each generation. For each chromosome that
comprises the set of adjustment parameters of IE' = (30+ (-8» - 3 = 19
the pertinence functions we have the number of
total iterations generated for parking the
ID' = (160 + (-8») + 3 = ISS
vehicle, starting from all the initial positions SE' = ( 80 + (-8» = 72
SD' = (110 + (-8» = 102 problem the aim is to mmmnze the vehicle-
parking route. For this case the evaluation
function is given by :

where / is the total iterations up to parking,

regarding the adjust made for each chromosome
in the membership function . According this
function, each ' chromosome aptitude will be
inversely proportional to the iterations number.

4 .2.3. Genetic Operators: Crossover and

The genetic operators came from the natural
Figure 10 - Adjust Example. evolution principle and govern the
chromosomes renewal. The genetic operators
are necessary for the population diversificat ion
The genetic algorithms are used to find the and the acquired adaptation characteristics
opt imum values, according the strategy and from anterior generations are preserved.
initial points used, from k, and W i for the The crossover process or recombination
pertinence functions . comprises a random cut upon father
chromosomes in which genes will be changed,
generating two descendents. For example let us
4.2 .1. Genetic Representation of Solutions consider two-father chromosomes PI - P2. A
Usually a possible solution for a problem is crosspoint is randomly chosen. The former
associated with a chromosome p by a vector information to this po int in one of the fathers ,
with m positions p-={xj,x"xJ,•••,x.) where each are linked to the latter information to this point
component xi represents a gene. Among the into other father.
chromosomes representation ways the more The mutation process consist in making the
known are: the binary representation and the changes according the adjust value defined by
representation by integers. The binary the user in the values of one or more
representation is a classic one. chromosome genes. For a adjust value of 10,
However, in this paper the representation by for example, the change of a selected gene will
integers was used, because the genes of each fall in some value in the interval [-10, 10] .
chromosome 'are composed by the adjust
coefficients k, - W I which are integer values.
With regard to chromosome size, i.e., how 4.2.4. Renovation and Selection Criteria
many genes each chromosome will have, this The renovation and selection criteria
will depend on the number of membership introduced in this work through the algorithms
functions defined by user. For a fuzzy control were the reproduction one. Reproduction is a
with a group of 18 membership functions, for process in which the chromosomes copies are
example, there will be chromosomes with 36 used for the next generation according the
genes. This is because for each function we evaluation function values. Chromosomes with
have two adjust coefficients: k, - Wi. A 36 high aptitude value will contribute with one or
positions vector then represents the more exactly equals descendents for the next
chromosome. generation.
The example of Table 1 shows a population
4.2 .2. Evaluation Function where N = 4 with the evaluation function value
of each chromosome II = /J = 16,/J = 48, and h
This function deals with the evaluation of the = 80 into the current population. As 'I/J = 160,
aptitude level (adaptation) of each chromosome the partial aptitude of each chromosome is
generated by algorithms. For the f present
16/160 = 0,1 (10%), 16/160 = 0,10 (10%), The stop criteria used was the one that defines
48/160 = 0,30 (30%), and 80/160 = 0,50 (50%) the maximum number of generations to be
respectively . Starting from the partial produced. When the generation number is
aptitudes, it is calculated the number of completed by genetic algorithms, the new
expected descendents from each chromosome in populations generating process is finished, and
the next generation. In the example, for each the best solution is the one among the
chromosome it is expected 0.1 x 4 = 0.4, 0.1 x individuals better adapted to the evaluation
4 = 0.4, 0.3 x 4 = 1.2, and 0.5 x 0.4 =2 function.
respectively .
The next generation effectively reproduced
4.2 .6 . Algorithm Presentation
chromosomes number is given ' by the whole
part of the expected descendants from each Being G the generation number, P the
chromosome. Then for the example we have a population, Pc the crossover rate
reproduction of XJ and two reproductions from (recombination), Pm the mutation rate and VA
X4 ' The selection of one more chromosome for the allowed adjust value for membership
reproduction to complete the population of 4 functions , the algorithm shown bellow
chromosomes is made through proportional generates as a start the vector s with G
selection to the adjustment. positions . Each one of the vector elements is
the best chromosome of one generation .
This process was introduced through the
roulette technique, where the chromosomes that
show a bigger adaptation have greater
STEP 1. Generate initial population P with
probability for being selected . Being 1:/;=160
genes in the interval [. VA, + VA].
the roulette for this example is represented by
Figure 11. STEP 2. Evaluate population P. Store in
vector s the best chromosome .
f1 STEP 3. If completed in the generation
10% number G go to STEP 13.
STEP 4. Calculate the aptitude regarding
f4 population P .
50% STEP 5. Calculate the expected descendents
from population P .
STEP 6. To draw descendents from
population P' starting from population P.
Figure 11 • Roulette representation .
STEP 7. Compose the grouping of population
P' .
For practical purposes the roulette is STEP 8. To draw the crosspoint for the father
represented by vector v of M elements (ordered chromosomes of population P' .
from {l, . .. , N} and be one random index r, r =
1,.. . ,M. Then, v(r) corresponds to what STEP 9. Make the crossing for population P'
chromosome i was selected for . For example : according Pc.
With M = 10, for the previous example we STEP 10. Make the mutation in each
have, 7v = {I, 2, 3, 3, 3,4,4,4,4, 4}. If r = 4, chromosome according Pm.
then the selected chromosome is the number 3.
STEP 11. Evaluate the population P'. Store in
Finally, the roulette turns a determinate number vector s the best chromosome . Make P = P' .
of times, depending on the chromosomes
number necessary for the population STEP 12. Go back to STEP 3.
completion and are chosen as chromosomes that STEP 13. End.
will take part in the next generation, the ones
drawn by the roulette .

4.2.5 . Stop Criteria

Table 1 - Selection Criteria

Population fi(x) Partial Expected Reproduced

Evaluation Aptitude (%) Descendents Chromosomes
Xl 16 10 0.40 0
Xl 16 10 0.40 0
XJ 48 30 1.20 1
X4 80 SO 2.00 2
Soma 160 100 4 3

Table 2 - Starting positions for training.

Position X Y Car angle Iterations without training

1 25 120 180 330
2 160 130 -90 888
3 275 160 -40 655

5. TESTS starting from three initial positions as shown in

Table 2. In this table we also have the vehicle's
In this section it will be presented tests made
number of generated iterations untill parking,
with a fuzzy control, which membership
using the originals membership functions.
functions were adjusted through genetic
Figure 13 shows the vehicle in each starting
algorithms. These tests show those
mechanisms' efficiency, providing an objective
evaluation of the found results.


.~~- --~ ....~ ------- -------- ,
" •. 1



------- •
~ 1iI
.~~------- ----------
Figure 13 - Initial training positions.

Figure 14 shows the courses referred to each

initial position. These positions were chosen
Figure 12 - Original membership functions according the points where the vehicle doesn't
follow a good route till parking, and
consequently, generating an excessive number
The originals membership functions are shown of iterations. The definition of several initial
in Figure 12. This control training was made positions will not only minimizes the routes
referred to these points but also for other Table 4, a reduction of 932 iterations (49,75%)
points, resulting a global minimization of was made for parking the vehicle starting from
traveled space . The defined genetic parameters the initial positions set up for training. Further
for the training are shown in Table 3. it will be presented simulation results made
starting from initial positions not used in the
The generated results by genetic algorithms are
training . Figure 16 shows the membership
shown in Table 4 and Figure 15 . As shown in
functions after adjustment.

Table 3 - Genetic Parameters

Population Size 14
Generations Number 30
Crossover Probability 90%
Mutation Probability 1%

Table 4 - Iterations after the genetic training

Position Iterations without training Iterations with training
1 330 280
2 888 384
3 655 277
Total 1873 941
Average 624,33 331.67

(a) (b)

Figure 14 - Routes without training:

(a) Position 2 - (b) Position 3

Table 5 -Simulations results .
Position X Y Car Iterations
" \ -"~ Angle generated by
~\ ~\.._ l Fuzzy Controls
\ "', !
\ \ I
Original Trained
" \ I
"\ ,
\ 1\
1 1 126 182 450 329
\\ \I \\lJ
\ :
\ i
t 2 6 46 132 167 154
~ \

\ 3 8 41 190 1000 1000

4 10 187 228 453 328
(a) 5 15 70 -90 318 162
6 51 112 48 278 130
7 51 112 54 280 132
8 70 95 -40 275 261
9 74 69 190 164 164
10 76 193 232 605 363
11 88 46 44 283 305
12 115 120 0 182 280
13 120 90 45 182 156

(b) 14 131 140 -72 457 292

15 141 69 -28 342 31°4
Figure 15 - Routes after training :
16 154 166 -80 863 436
(a) Position 2 - (b) Position 3
17 160 135 268 1101 545
18 161 191 178 315 286
19 173 140 -72 762 590
20 208 143 244 363 310
21 217 66 -50 684 325
22 228 194 -48 830 655
23 246 169 154 312 307
24 250 180 -40 739 800
25 265 170 -40 672 329
26 290 95 -40 280 190
. 27 300 124 258 317 306
28 305 156 -90 350 346
29 314 73 -46 235 355
Figure 16 - Membership functions after the 30 314 194 -44 513 744
genetic adjustment
Total 13772 10894
Average 459.07 363.13

Table 5 shows the results obtained from genetic algorithm where the adjustment is
simulations made with 30 positions randomly settled.
chosen for the vehicle parking. The results
demonstrate an average reduction of iterations
number for vehicle to reach the final position 7. ACKNOWLEDGEMENT
of 21 % for the genetic algorithm trained
The author thanks PRONEX, CNPq and CAPES
control. These values represent a global
for the financial support of this project.
reduction of the vehicle route starting from
positions not used in genetic training.
It is possible to notice that in some positions 8. REFERENCES
the iterations number is bigger than the ones
generated by the original control (without
[1] C. L. Karr, "Genetic algorithms for fuzzy
controllers", Al Expert, vol 6, no. 2, pp.
training). In position 29 from Table 5, for
example, the original control generate 235 26-33,1991.
iterations to park the vehicle as compared to [2] C. L. Karr, "Applying genetics to fuzzy
the trained control which generates 355 logic", Al Expert, vol 6, no. 3, pp. 38-43,
iterations. This increase comes from the 1991.
modifications made in the membership
functions, that makes the vehicle change to a [3] C. L. Karr & D. A. Stanley, "Fuzzy logic
different route to reach the final position. and genetic algorithms in time-varying
control problems", in Proc. NAFIPS-91,
1991, pp. 285-290.
6. CONCLUSIONS [4] D. L. Meredith, K. K. Kumar, and C. L.
The fuzzy systems are a convenient and Karr, "The use of genetic algorithms in the
efficient alternative for solution of problems design of fuzzy logic controllers", in Proc.
where the fuzzy state are well defined. WNN-AIND 91,1991, pp. 695-702.
Nevertheless, the project of a fuzzy system may [5] L.R. Medsker - Hybrid Intelligent Systems,
became difficult for large and complex Boston: Kulwer Academic Pub., 1995.
systems, when the control quality depends of
"try-and-error" methods for defining the best [6] L. A. Zadeh - "Fuzzy Sets", Information
membership functions to solve the problem. and Control, Vol.8, pp.338-353, 1965.

The main purpose of the Computing Package [7] E. H. Mamdani, "Appl ication for fuzzy
for The Fuzzy Logic Teaching, as used in this algorithms for the control of a dynamic
work, is to provide the students with the plant," Proc. IEEE. , vol. 121, pp. 1585-
learning of this logic. The choice of a vehicle 1588, 1974.
parking lot is justified because they -don't need [8] T. Terano, K. Asai & M. Sugeno - Fuzzy
to have a previous knowledge. (at least in Systems Theory and Its Applications, New
mathematics terms) regarding the subject, for York: Academic Press, 1992.
making the control.
[9] P.P. Bissione, V. Badami, K.H. Chiang,
The genetic training modulus developed in this P.S. Khedkar, K.W. Marcelle & M.J.
work added this program with an automatic Schutten - "Industrial Applications of
technique for the adjustment of the membership Fuzzy Logic at General Electric", Proc. of
functions parameters. This technique shows the IEEE, pp.4S0-46S, March 1995.
that the performance of a fuzzy control may be
improved through the genetic algorithms, [10] J.A. Momoh, X.W. Ma & K. Tomsovic -
substituting for the "try-and-error" method, as "Overview and Literature Survey of Fuzzy
used before by students for this purpose, with Set Theory in Power Systems", IEEE
no good results. Trans. on Power Systems, Vol. 10, No.3,
pp.1676-1690, Aug. 1995.
The genetic algorithms provided distinctive
advantages for the optimization of membership [11] M. Sugeno - Industrial Applications of
functions, resulting in a global survey, Fuzzy Control, Amsterdam: North Holland,
reducing the chances of ending into a local 1985.
minimum, once it uses several sets of [12] A. Kaufmann - Introduction to the Theory
simultaneous solutions. The fuzzy logic of Fuzzy Subsets, Academic Presss, 1987.
supplied the evaluation function, a stage of the

[13] A. Kandel & G. Langholz, Fuzzy Control 117-124, Kingston, Canada, Jun. 16-18,
Systems, CRe Press, 1993 1996.
[14] G. Lambert-Torres, V.H. Quintana & L.E. [15] D. Park, A. Kandel & G. Langholz -
Borges da Silva - "A Fuzzy Control Lab for "Genetic-Based New Fuzzy Reasoning
Educational Purposes", Canadian Models with Application to Fuzzy
Conference on Engineering Education, pp. Control", IEEE Trans. on SMC, Vol. 24,
No.1, pp. 39-47, January 1994.

Chapter 9

Application of Evolutionary Technique to Power System Security


and adjusting the remaining variables to find the set of

Abstract: This chapter presents an application of Evolutionary points
Computation to the problem of dynamic security assessment. Ee
can be developed to enhance the accuracy of partially trained X={z:!(z,y)=s}
multilayer perceptron neuralnetworks in specific operatingregions.
The technique is based on query learning algorithms and where
evolutionary-based boundary marking algorithm to evenly spread z = the subset of variedparameters
points on a powersystemsecurityboundary. These points are then y = the subset of fixed paramters
presented to an oracle (i,e. simulator) for validation. Any points
that are discovered to have excessive error are then added to the
neural network training data and the network is retrained. This Repeated application of a simple one-dimensional root
technique has advantage over existing training methods because it finding technique was proposed to generatetwo-dimensional
produces training data in regions that are poorly learned and thus nomograms. An exampleof a typicalnomogramis shown in
can be used to improve the accuracyof the neural network in these Figure 1.
specificregions. Jensen et ale [5] proposed a similar idea using an
inversion of a trained neural network to extract infonnation
1. INTRODUCTION relative to the operation of the system. A gradient based
neural network is used for the inversionalgorithmto extract
The modem trend towards deregulation is altering the power system operating infonnation such as the location of
manner in whichelectricpower systems are operated. In the the security boundary to a given operating state. This
past, electric utilities were able to justify improvements in information is used to either avoid insecmity or to regain
system infrastructure based solely on security securityonce lost.
considerations. In a deregulated environment this is no Both of the above applications are based on searching the
longer the case. Economic pressure tends to delay functional relationship of a trained neural network.
construction of new facilities. Therefore, utilities are being Therefore, the accuracy of the neural network is critical to
forced to operate their systems closer to their security their performance. It is especially important that the neural
boundaries. This demands the industry to develop better network be accurate in operating regions of interest such as
methods of quantifying the real-time security status of their near the security boundary.
systems. This chapter presents an evolutionary-based query
Several researchers have investigated the use of neural learning algorithm whereby the accuracy of a partially
networks as a means to predict security status of large trained neural network can be increased. Moreover, the
electric power systems [1-3]. Neural networks provide a proposed algorithm is particularly well suited to quantifying
mapping f(x)=S, where f() is the network function, x is a
vector of networkinputs and S is the corresponding security
status of the power system. Neural networks offer several
advantages over traditional security assessment methods
including faster execution times and the ability to model the
entire powersystem in a compact and efficient form.
McCalley et ale proposed the idea of using neural
networks as a means of creating nomograms customized to
the current operating status of the power system [4].
Nomograms are usually 2-dimensional plots showing the
relationship of systemcontrol variablesto the security of the
In [4], a multilayer perceptron neural network was
trainedto learn the security status of a power system given a XI
set of precontingency operating variables. Nomogramswere
Figure 1 Nomogram for two parameters
then createdby fixinga subset of the network input variables
showing three security levels

and improving performance in specific regions of interest, In this chapter we used the IEEE 17 generator transient
such as security boundaries. The system is based on a stability test system as a case study. We used the EPRI
boundary marking technique originally proposed by Reed energy margin software package called DIRECT [13] to
and Marks [6] which makes use of an evolutionaryalgorithm create the training databasefor the neural network. Software
to spread points evenly on a contour of interest. These was written to automate the data gathering process by
points are then verified via simulations thus quantifying the repeatedly running the DIRECT software to calculate the
accuracy of the security boundary. Areas of inaccuracycan system energy margin for a single fault under many different
then be improved by augmenting the training database and prefault operating states. The database consists of a set of
retraining the neural network. prefauIt system features, in this case generator settings and
Section 3.3 of this chapter deals with issues involved in system load, and the corresponding system energy margin.
training neural networks for power system dynamic security The DIRECT softwaredetermines the energy margin,which
assessment including; data gathering, training and is related to the security of the system, by assigning a
validation. Section 3.4 introduces the concept of positive energy margin to secure states and a negative
evolutionary algorithms and the proposed query learning energy margin to insecure states. The magnitude of the
technique of this chapter. Section 3.5 describes the energymarginindicates the degree of stability or instability.
application of this technique to the creation of nomograms A software package called QwikNet [14] to design and
and the location of critical operating regions using the IEEE test the neural networkwas used. QwikNet is a remarkable
17 generator transient stability test system as a case study. windows based neural network simulation package that
Finally, conclusionsare presented in section 3.6. allows experimentation with many different network
topologies and trainingalgorithms. After training, the neural
2. NN'S -FOR DSA network function, f('1.)=8, can be written to a file in a
convenient C programming language format that can easily
Neural networks have demonstrated the ability to be incorporated intothe inversion software.
approximate complex nonlinear systems when presented
with a representative sample of training data. Several
researchers have reported remarkable results when applying
the multilayer perceptron neural network to the power
system security assessment problem [1-3]. Typically, Query learning [15-16] is a method that can be used to
traditional methods such as time domain simulations [7] or enhance the performance of partially trained neural
energy function methods [8] are used to generate a database networks. Query learning is based on the notion of asking a
of training data. This database includes examples of all partially trained network to respond to questions. These
power system operating scenarios of interest described by a questions are also presented to an oracle which always
set of selected power system features as well as their responds with the correct answer. The response of the
resulting security measure. The neural network then adapts neural network is then compared to that of the oracle and
itself to the training database and produces an approximation checked for accuracy. Areas that are poorly leamed by the
to the security assessment problem in the form an equation neural network can be thus identified. Training data is then
f(x)=S, wherefis the neural network function, i is the vector generated in these areas and the network is retrained to
of power system features and S is the resulting security improve its performance.
index. Examplesof commonly used security indices include The query leaming procedure proposed in this chapter is
energy functions and critical clearing times [7,9]. an extension of previouslyproposed methods. The principle
A key advantageof using neural networksis the ability to difference is that instead of locating and then querying
extract operating information after training via neural individual points, our algorithm works with a population of
network inversion techniques [10-12]. Neural network solutions, thus offering the ability to query entire areas of
inversion is the process of fmding an input vector that interest. This algorithm also seeks to evenly distribute the
produces a desired output response for a trained neural points across the area. Evenly distributing the points is
network. For example, consider a neural network trained to important because a global view of the securityboundary in
predict the security S of a power system given a vector of multiple dimensions is provided thus allowing the entire
system features x. By clamping the output value S to the boundaryto be queried and potentially improved. After the
marginally secure state, say 8=0.5, where S=1.0 is secure points are spread, they are simulated via the energy margin
and 8=0.0 is insecure, and inverting the network, a simulatorand their true security index is determined. If all
marginally secure state x· can be found in the input space. the points are within tolerance the algorithm stops.
This state then describes a region of the power system Otherwise, the points with unacceptably large errors are
operating space where insecurity is likely to occur. It should added to the training database and the neural network is
be noted that since the neural network is typically a many- retrained
to-one mapping, the inversion is generally not to a unique In the evolutionary boundary marking algorithm, all
point, but rather to some contour in the input space. reproduction is asexual, i.e. no mating or crossover takes

place. Offspring are produced as perturbations of single (c) Feasibility constraints are enforced on the
parents. This concentrates the search in the area close to the new offspring via the solution of a standard
security boundary and speeds convergence. The algorithm powerflow.
seeks to minimize a fitness function, F, of the following 4. Repeat until convergence.
By successively deleting points with poor fitness values and
1 replacing themwith perturbations of points with high fitness,
F = If(x)- Sl+ -Davg the population tends to spread evenly across the solution
contour. Typical values used in this chapter are N=100,
where, .M=20, m=3 and 0' =0.05.
f is the neural network function, Figure 2 shows histograms of the initial and final
x is the current point, population distributions. It can be seen that the final
S is the security boundary, and population has converged to the security boundary and is
Davg average distance to the nearestneighbors evenly spread across the boundary. These points are then
added to the training database and the network is retrained.
The evolutionary algorithm is randomly initialized with N Severaliterations of query learning may be requiredproduce
points and then proceeds as follows. acceptable results.
1. The populationis sorted based on fitness, F.
2. The M points with the lowest fitness scores are 4. CASESTUDY - IEEE 17GENERATOR SYSTEM
deleted. The IEEE 17 generatortransient stabilitytest system[17]
3. Replacements are generated for eachdeletedpoint: is used to illustrate the perfonnance of the proposed
(a) Mparents are selected proportional to fitness algorithm. This system Consists of 17 generators and 16~
from the remainingpoints. buses. The EPRI energy margin software DIRECT is used
(b) New offspring are created as perturbations of to determine the energy margin of the system in response to
the selected parents, xnew = Xparent + D , a single three phase fault. Twelve system features are
where n - N(O,0') . selected to represent the system for neural network training.
These include the real and reactive powers of the 5

Initia l D istance to Boundary Final Distance to Boundary

120 120
100 100
c 80
"c 80 ,
"".e.. 60
20 20
1_ . ... . m . "" . "" "" "" ~ t'!'I I:l ~ 0
0 0.1 0.2 0 .3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 0 0.1 0.2 0.3 0.4 0.5 0 .6 0 .7 0 .8 0.9 1 1.1 1.2
Distance to Boundary Distance to Boundary

Initial Distribution Final Distribution

~ 30
! 20
o I I I

rf> .,<:> ...

<:> .,<:> '0<:> ",<::> 'IJ<:> <§> ... ~ ...
<:><::> ......<:> ... .,<::> ~<::>

Distance to Neighbors

Figure 2 Histogramsof the initialpopulation and final population of the boundary marking algorithm

generators closest to the fault location and the total system power system operating nomograms from neural networks .
real and reactive load level. A training database of 436 Results show a nearly 5 fold improvement in RMS error
samples was created for initial training of the neural network when applied to the IEEE 17 generator test system.
by randomly perturbing generation settings and system load
levels and then simulating each case on the energy margin 6. REFERENCES
software. The initial RMS neural network testing error was
[1] DJ. Sobajic and Y. H. Pao, "Artificial Neural-Net Based
0.113 corresponding to a test database that was not used for
Dynamic Security Assessment for Electric Power
training .
Systems", IEEE Transactions on Power Systems, vol. 4,
Security Boundary Nomogram
no. 1, February 1989, pp. 220-228.
[2] M. A. EI-Sharkawi, R. J. Marks II, M. E. Aggoune, D. C.
1150 - - -- - - -- - - -. -.- - ..- -.- Park, M. 1. Damborg and L. E. Atlas, "Dynamic Security
-+- Simulato r 1
Assessment of Power Systems using Back Error
1100 +-_ _.=.....-- ~~I~;n:NN
Propagation Artificial Neural Networks." Second
'i" 1050 +--...-=------"'I~=_------__i
Symposium on Expert System Application to Power
1000 t-------===!:::::::=J;:>~:='i==:::;::=:;:___1
Systems, Seattle, WA, 1989.
[3] Y. Mansour, E. Vaahedi, A.Y. Chang, B. R. Corns, J.
Tamby and M. A. EI-Sharkawi, "Large Scale Dynamic
950 +------------~~:==_~ Security Screening and Ranking Using Neural Networks,"
IEEE Transaction on Power Systems, vol. 12 no. 2, May
900 +--~--~--_-~--~-____4
250 300 500 550 1997, pp. 954-960.
[4] J. D. McCalley, S. Wang, Q. Zhao, G. Zhou, R. T. Treinen,
and A. D. Papalexopoulos, "Security Boundary
Figure 3 Nomogram ofP73 vs, P76 Visualization For Systems Operation, " IEEE Transactions
on Power Systems, May 1997, pp. 940-947.
[5] C. A. Jensen, R. D. Reed, M. A. EI-Sharkawi and R. 1.
The proposed query learning algorithm was then used to Marks II, "Location of Operating Points on the Dynamic
generate additional training samples near the security Security Border Using Constrained Neural Network
boundary. These points are simulated on the DIRECT Inversion," International Conference on Intelligent
energy margin simulation software and the points with large Systems Applications to Power Systems (lSAP '97), Seoul,
errors are added to the training data file. The final training Korea, July 1997.
database consisted of 1177 training samples and the final [6] R. D. Reed and R. J. Marks II, "An Evolutionary
RMS test error was reduced to 0.062. Algorithm for Function Inversion and Boundary Marking,"
Nomograms were then created from the initial and the IEEE International Conference on Evolutionary
enhanced neural networks based on the method proposed in Computation, Perth, West Australia, December 1995.
[4]. These nomograms show the relationship between two [7] P. Kundur, Power System Stability and Control, McGraw-
Hill, New York, 1993.
generator power outputs and the security boundary. The two
[8] A. A. Fouad and V. Vittal, Power System Transient
nomograms are shown in Figure 3 along with the true
Stability Analysis Using the Transient Energy Function
nomogram which was created by repeatedly querying the
Method, Prentice Hall, 1992.
simulator. It should be noted that the nomogram of the [9] P. M. Anderson and A. A. Fouad, Power System Control
simulator as shown in Figure 3 required smoothing by fitting and Stability, The Iowa State University Press, Ames,
a 2ad order polynomial to the raw data. The smoothing Iowa, 1977.
operation is required due to the approximations and [10] R. J. Williams, "Inverting a Connectionist Network
assumptions made by the simulation software. The RMS Mapping by Backpropagation of Error", gJr Annual
error for the initial nomogram is 48.53 while the enhanced Conference of the Cognitive Science Society, Lawrence
neural network nomogram is 10.11. This experiment proves Erlbaum, Hinsdale, NJ, pp. 859-865, 1986.
the viability of the proposed technique in increasing the [11] A. Linden and 1. Kindermann, "Inversion of Multilayer
accuracy of a partially trained neural network near the Nets," Proceedings of the International Joint Conference
security boundary. on Neural Networks, Washington D. C., pp. 425-430, vol.
5. CONCLUSIONS [12] J. N. Hwang, 1. J. Choi, S. Oh, and R. J. Marks II,
"Classification Boundaries and Gradients of Trained
This chapter presents an enhanced query learning Multilayer Perceptrons", IEEE International Symposium
algorithm that effectively locates regions of interest and on Circuits and Systems, 1990, pp. 3256-3259. .
distributes neural network training data in these regions . [13] EPRI, "Analytical Methods for DSA Contingency
The process is used to enhance the accuracy of partially Selection and Ranking, Users Manual for DIRECT Version
trained neural networks in specific operating regions. The 4.0", EPRI TR-I05886, Palo Alto, California, 1993.
proposed technique is applied to the problem of generating

[14] QwikNet Neural Network Simulation Software, [16] J.N. Hwang, J.J. Choi, S. Ob and R. J. Marks II, "Query
http://www.kagi.com/cjensen, 1997. Based Leaming Applied to Partially Trained Multilayer
[15]M. A. El-Sharkawi and S. S. Huang, "Query-Based Perceptrons", IEEE Transactions on Neural Networks, Vol.
Learning Neural Network Approach to Power System 2, 1991,pp.131-136.
Dynamic Security Assessment," International Symposium [17] IEEE Committee Report, "Transient Stability Test Systems
on Nonlinear Theory and Its Applications, Waikiki, for Direct Stability Methods," IEEE Transactions on
Hawaii,December5-10, 1993. Power Systems, vol. 7, no. 1, February 1992, pp. 37-44.

Chapter 10

Generation Expansion and Reactive Power Planning

Abstract: This chapter presents applications of genetic algorithm to optimization algorithm for reactive power generation
generation expansion planning and reactive power planning planning problem [7].
problems. The generation expansion planning and reactive power
planning problems are nonlinear dynamic optimization problems
that can only be fully solved by complete enumeration, a process 2. GENERATION EXPANSION PLANNING
. which is computationally impossible for realistic planning
problems. Therefore, modem heuristic approaches, such as genetic Generation expansion planning (GEP) is one of the most
algorithm, are well suited for these complex planning problems.
important decision-making activities in electric utilities.
However, the simple genetic algorithm has some structural
problems such as premature convergence. This chapter introduces
Least-cost GEP is to determine the minimum-eost capacity
two approaches to overcome the shortfalls of simple genetic addition plan (i.e., the type and number of candidate plants)
algorithms: an improved genetic algorithm (lOA) incorporating a . that meets forecasted demand within a prespecified reliability
stochastic crossover technique and an artificial initial population criterion over a planning horizon.
scheme for generation expansion planning, and a modified simple A least-cost GEP problem is a highly constrained
genetic algorithm (MSGA) incorporating synthetic optimization nonlinear discrete dynamic optimization problem that can
procedure by combining the random search and optimization only be fully solved by complete enumeration in its nature
procedure for reactive power planning. [12,13,14]. Therefore, every possible combination of
candidate options over a planning horizon must be examined
Index Terms: Generation expansion planning, reactive power
planning, genetic algorithm, global optimization.
to get the optimal plan, which leads to the computational
explosion in a real-world GEP problem.
To solve this complicated problem, a number of methods
1. INTRODUCTION have been successfully applied during the past decades.
Masse and Gilbrat [15] applied a linear programming
The generation expansion planning and reactive power approach that necessitates the linear approximation of an
planning problems are nonlinear dynamic optimization objective function and constraints. Bloom [16] applied a
problems that can only be fully solved by complete mathematical programming technique using a decomposition
enumeration, a process which is computationally impossible method, and solved it in a continuous space. Park et ale [17]
for realistic planning problems. Therefore, modem heuristic applied the Pontryagin's maximum principle whose solution
approaches, such as genetic algorithm, are well suited for also lies in a continuous space. Although the above-
these complex planning problems. GA is a search algorithm mentioned mathematical programming methods have their
based on the hypothesis of natural selections and natural own advantages, they possess one or both of the following
genetics [1]. Recently, a global optimization technique using drawbacks in solving a GEP problem. That is, they treat
GA has been successfully applied to various areas of power decision variables in a continuous space. And there is no
system such as economic dispatch [2,3], unit commitment guarantee to get the global optimum since the problem is not
[4,5], reactive power planning [6,7,8], power plant control mathematically convex. Dynamic programming (DP) based
[9,10], and generation expansion planning [11]. GA-based framework is one of the most widely used algorithms in GEP
approaches have several advantages. Naturally, they can not [12,13,14,18,19]. However, so-called 'the curse of
only treat the discrete variables but also overcome the dimensionality' has interrupted direct application of the
dimensionality problem. I n addition, they have the capability conventional full DP in practical GEP problems. For this
to search for the global optimum or quasi-optimums within a reason, WASP [12] and EGEAS [13] use a heuristic
reasonable computation time. However, there exist some tunneling technique in the DP optimization routine where
structural problems in the conventional GA, such as users prespecify states and successively modify tunnels to
premature convergence and duplications among strings in a arrive at a local optimum. David and Zhao developed a
population as generation progresses [1]. heuristic-based DP [18] and applied the fuzzy set theory [19]
This chapter introduces two approaches in enhancing the to reduce the number of states. Recently, Fukuyama and
capability of GA for power system planning. First, an Chiang [20] and Park et ale [21] applied genetic algoritbin
improved genetic algorithm (IGA), which can overcome the (GA) to solve sample GEP problems, and showed promising
shortfalls of the conventional GA to some extents, is results. However, an efficient method for a practical GEP
introduced for generation expansion planning [11]. Second, problem that can overcome a local optimal trap and the
a synthetic optimization procedure is presented by combining dimensionality problem simultaneously has not been
two optimization methods together, random search and developed yet.

In this chapter, an improved genetic algorithm (IGA), /,I (U I) : discounted construction costs [$] associated
which can overcome the aforementioned problems of the with capacity addition U, in year t.
conventional GA to some extents, is introduced [11]. The
//(X,) : discounted fuel andO&M costs [$] associated
IGA incorporates the following two main features. First, an
artificial creation scheme for an initial population is devised, with capacity x, in year t,
which also takes the random creation scheme of the t;(U,) : discounted salvage value [$] associated with
conventional GA into account. Second, a stochastic crossover capacity addition U, in year I.
strategy is developed. In this scheme, one of the three
different crossover methods is randomly selected from a
The objective function is the sum of tripartite discounted
biased roulette wheel where the weight of each crossover
costs over a planning horizon. It is composed of discounted
method is determined through pre-performed experiments.
investment costs, expected fuel and O&M costs and salvage
The stochastic crossover scheme is similar to the stochastic
value. To consider investments with longer lifetimes than a
selection of reproduction candidates from a matingpool. The
planning horizon, the linear depreciation option is utilized
results of the IGA are compared with those of the
conventional simple genetic algorithm, the full DP, and the
[12]. In this paper, five types of constraints are considered.
Equation (2) implies state equation for dynamic planning
tunnel-constrained DP employed in WASP.
problem [6]. Equations (3) and (4) arerelated with the LOLP
reliability criteria and the reserve margin bands, respectively.
3. THE LEAST-COST GEP PROBLEM The capacity mixes by fuel types are considered in (5). Plant
types give another physical constraint in (6), which reflects
Mathematically, solving a least-cost GEP problem is the yearly construction capabilities.
equivalent to finding a set of-optimal decision vectors over a Although the state vector, X t , and the decision vector, UI ,
planning horizon that minimizes an objective function under have dimensions of MW, we can easily convert those into
several constraints. The GEP problem to be considered is vectors which have information on the number of units in
formulated as follows [6]: each plant type. This mapping strategy is very useful for GA
implementation of a GEP problem such as encoding and
t:..u- L {f/(U

t )+ ft2(X t ) - fJ(U t ) } (1) treatment of inequality (6), and illustrated in the following
ss. XI =X t - 1 + U, (I =1,.· ·,T) (2)
LOLP(Xt)<E (t=1,.· ,T) (3) X, =(x:, ... ,x~)T -+ X t, (,1
= X, ,···,XtIN)T. (7)

[iSR(Xt)SR (t=l,.···,T) (4)

U, = (u,,···,u,N)T
-+ u:t = (,1
"I ,··,U, N)T (8)
MI S LX: SM! (t = l,.···,T and j = l,.···,J) (5) N : number of plant types including both existing and
- ien j
candidate plants,
(t = 1,.· ··,T) (6) X; : cumulative numberof units by plant types in year t,
where U; : addition number of units by plant types in year t,
T: number of periods (years) in a planning horizon,
J : number of fuel types, x;i: i-th plant type's cumulative number of units in year I,
n j : index set for j-th fuel type plant, U;i: i-th plant type's addition number of units in year t.
x, : cumulative capacity [MW] vector of plant types in
x: : cumulative capacity [MW] of i-th plant type in year I,
u,: capacity addition [MW] vector by plant types in year I, Basically, genetic algorithm is a search mechanism based
U, : maximumconstructioncapacity [MW] vector by on the hypothesis ofnatmal selection [1]. GA is an artificial
optimization scheme that emulates the hypothetical adaptive
plant types in year I,
nature of natural genetics. GA provides solutions by
u; : capacity addition [MW] of i-th plant in year I, generating a set of chromosomes referred to as a generation.
LOLP(X,) : loss of load probability (LOLP) with x, in Each string (chromosome) has its own fitness measure that
year I, reflects how well a creature can survive under surrounding
R( X,) : reserve margin with x, in year t, environments. The new generation of strings is provided
E : reliability criterion expressed in LOLP, through three major genetic operations - reproductiori,
R,f! : upper and lower bounds of reserve margin, crossover and mutation, which provide a powerful global
search mechanism. Reproduction is a process in which
M!, M! : upper and lower bounds ofj-th fuel type in individual strings are copied into a mating pool according to
year I, their fitness values. Crossover, the most important genetic
operator, is a structured recombination operation. In the

classicalone-pointcrossover,a random position in a string is population.
chosen and all characters to the right of this position are
4) Crossover - Crossover is performed on two strings at a
swapped. Mutation, the secondary operator in GA, is an
time that are selected from the population at random. It
occasional randomalterationof the value of a stringposition.
involves choosing a random position in the two strings and
Variations of the simple GA for power system applications
swappingthe bits that occur after this position. Crossovercan
can be found in the References [2-11,20,21]. The
occur at a single position (single crossover). Crossover can
improvements on the conventional GA will be described in
be performed in different methods. Two different means are
the subsequentsections.
used in this paper: Tail-tail and head-tail crossover.
The simple Genetic Algorithmconsists of a populationof
The tail-tail crossover tends to change less significantbits.
bit strings transformed by three genetic operations: selection,
On the other hand, the head-tail crossover gives more chance
crossover and mutation. Each string represents a possible
of changes by changing more significant bits. The crossover
solution, with each substring representing a value for a
methods can be changed during iterations: the head-tail
variable of interest. The algorithm starts from an initial
crossovercan be used in early generations and then switched
population generated randomly. A new generation is
to tail-tailcrossover in eater generations for fine tuning.
generated by using the genetic operations considering the
fitness of a solution which corresponds to the objective S) Mutation - Mutation is performed sparingly, typically
function for the problem. The fitness of solutions is improved every 100-1000 bit transfers from crossover, and it involves
through iterations of generations. When the algorithm selectinga string at random as well as a bit postion at random
converges, a group of solutionswith better fitness is obtained, and changing it from a 1 to a 0 or vice-versa. It is used to
and the optimalsolutionis obtained. escape from a local minimum, After mutation, the new
generation is complete and procedure begins again with
4.1. StringRepresentation fitnessevaluation of the population.

String representation is an important factor in solving the

planning problem using the SGA. In order to accommodate 5. IMPROVEDGA FOR THE LEAST-eOST GEP
different representations of object parameters, i.e., the
investment variables, the following representation method is 5.1. String Structure
used. Since it is convenient to use integer values for GA
A string consists of sub-strings; the numberof sub-strings implementation of a GEP problem, the reordered structureof
is equal to the number of total candidate buses for adding (8) by plant types covering a planning horizon is used for
capacitors or inductors. Each sub-string is in the form of encodingof a string as shown in (9). Here, each elementof a
binary corresponding to the amount of investment This string (i.e., (j'" =~j",U2",.,UT"r for n=l, ...,N) corresponds to
binary representation is certainly not unique but it is simple
to implement. For example, the 3 bit binary for a unit can a substringand its structure is depicted in Fig. 1.
represent 2 = 8 different amounts of installation for U = {,I
, ,1,1 ,n.n,n .N,N
\"1 '"2 '.'UT"· ·,Ul ,U2 ,.,UT , . ·,Ul ,U2 ,.,Ur
generator, capacitive or inductive V AR. On the other hand,
the 4 bit represents 2 = 16 choices. {,UA, l ,...,UA,,, ,...,UA,Ny .
=~ (9)

4.2. GeneticOperations

1) Initialpopulation generation - Initial population of binary

strings is created randomly. Each of three strings represents
one feasible solutionsatisfyingconstraints.
2) Fitness evaluation - The solution strings and each
candidate solution is tested in its environment The fitness of
each candidate solution is evaluated through some
appropriated measure suchas the inverse of the cost function
z. The algorithmis driven towards maximizing this fitness.
3) Selection and reproduction - Selection and reproduction
create a new population from old population. A set of old
r······u·~D'inuy·T·· u~:·~biiWjl
~ ~.& l ~g ....J
strings is selectedto reproduce a set of new stringsaccording
to the probability determined by the simulated spin of
weighted roulette wheel. The roulette wheel is biased with
the fitness of each of the solution candidates. The wheel is
spun N times where N is the number of strings in the Fig. 1. A substring structure.

procedures are illustratedin the following and in Table I:
Step 1. Generate all possible binary seeds of each plant
5.2. Fitness Function type considering (6). For example, if i-th plant type has an
The objective function or cost of a candidate plan is upper limit of 3 units per year, then generate 4 possible
calculated through the probabilistic production costing and binary seeds (i.e., 00,01,10,11).
the direct investment costs calculation [12,13]. The fitness
Step 2. Find the least commonmultiple(LCM) m from the
value of a string can be evaluated using the following
numbers of the binary seeds of all types, and fill m binary
equation[1,9]: seeds in a look-up table for all plant types and planning
a years. For example, if three plant types have upper limits of
f=-- (10) 3, 3 and 5 units per year, respectively, then the numbers of
1 + J binary seedsare 4, 4, and 6, and m becomes 12.
Step 3. Select an integer within [1, m] at random for each
a : constant,
J : objective functionof (1). element u;" of a string in (9). Fill the string with the
corresponding binary digits, and delete it from the look-up
However, this simple mapping occasionally brings about a table. Repeatuntil m different strings are generated.
premature convergence and duplications among strings in a
population, since strings with higher fitness values dominate Step 4. Check the constraints of (3), (4) and (5). If a string
the occupation of a roulette wheel. satisfies these constraints for all years, then it becomes a
To ameliorate these problems, the following modified memberof an initial population. Otherwise, the only parts of
fitness function, which normalizes the fitness value of strings the string that violate the constraints in year t are generated at
intorealnumbers within [0,1], is used in this paper[1]. randomuntil they satisfy the constraints. Go to step 3 n times
for n'm less than P, where P is the number of strings in a
f '(i) = f(i)- fmin (11) population and n is an arbitrary positiveinteger.
fmax - fmin Step 5. The remaining Psn-m strings are created using
where uniform random variables with binary number {O,1}. Go to
f(i) : fitness valueof string i using(l0), step 4 to check the constraints and regenerate them if
f .... , f .. : maximum and minimum fitness value in a necessary. This process is repeated until all strings, which
generation, satisfy the constraints,are generated.
rei) : modified fitness value of string i.
This AlP is based on both artificial and random selection
5.3. Creation of an Artificial Initial Population schemes, which allows all possible string structures can be
includedin an initial population.
It is important to create an initial population of strings
spread out throughout the whole solutionspace, especially in 5.4. StochasticCrossover, Elitism, and Mutation
a large-scale problem. One alternative method could be to
increase the population size, which yields a high Most of GA works are based on the Goldberg's simple
computational burden. This paper suggests a new artificial geneticalgorithm(SGA) framework [1]. This paper proposes
initial population (AlP) scheme, which also takes the random two different schemes for genetic operation: a stochastic
creation scheme of the conventional GA into account. The crossover technique and the application of elitism. The


Type I Type2 Type3

(Upper Limit: 3 UnitsIYear) (U0Pe Limit: 3 Units/Year) runner Limit: 5 UnitsIYear)
m Year I Year 2 Year 3 Year I Year2 Year 3 Year I Year2 Year3
I 00 00 00 00 00 00 000 000 000
2 01 01 01 01 01 01 001 001 001
3 10 10 , 10 io 10 10 010 010 010
4 11 II II \I 11 11 011 Oil Oil
5 00 00 00 00 00 00 100 100 100
6 01 01 01 01 01 01 101 101 101
7 10 10 10 10 10 10 000 000 000
8 II II II II 11 11 001 001 001
9 00 00 00 00 00 00 010 010 010
10 01 01 01 01 01 01 Oi l Oil 011
11 10 10 Hj 10 10 10 100 100 100
12 II 11 II II 11 11 101 101 101
Generated String 1 : 011100101011000101010

stochastic crossover scheme covers three different crossover After genetic operations, we check all strings whether they
methods; I-point crossover, 2-point crossover, and l-point satisfy the constraints of (3) to (5) or not. If any string that
substring crossover as illustrated in Fig. 2. Each crossover violates the constraints of (3) to (5), only the parts of the
method has its own merits. The l-point substring crossover string that violate the constraints in year t are generated at
can provide diverse bit structures to search solution space, random until they satisfy the constraints as described in the
however it easily destroys the string structure that may have AlP scheme.
partial informationon the optimal structure.

Parent2 • _,_ II.....
X o
Dill 11• .11 111111 Child 1
• IIC DOC[]C[] Child 2
The IGA, SGA, tunnel-constrained dynamic programming
(TCDP) employed in WASP, and full dynamic programming
l-point crossover (DP) was implemented using the FORTRAN77 language on

Parent2 .._.1....
[][]~[]D~CD[] X DC!II ••pODD
2-point crossover
Child 1
Child 2
an IBM PC/Pentium (I 66MHz) computer.

6.1. Test Systems Description

The IGA, SGA, TCDP and DP methods have been applied
in two test systems: Case 1 for a power system with 15
=~ ~~~.~~,~~ X ~~~e~~~~~
Child 1
Cbild2 existing power plants, 5 types of candidate options and a 14-
I-point crossover for substrings
year study period, and Case 2 for a real-scale system with a
24-year study period. The planning horizons of 14 and 24
Fig. 2. Three different crossovermethodsused. years are divided into 7 and 12 stages (two-year intervals),
respectively. The forecasted peak demand over the study
period is given in Table II.


Fig. 3. Roulettewheel for stochastic selection of crossovermethod.

Although the 1- and 2-point crossovers can not explore

solution space as widely as the above crossover, the
Tables ITI and IV show the technical and economic data of
probability of destroying an already-found partial optimal
structure is very low. The stochastic crossover strategy is the existing plants and candidate plant types for future
similar to the process of stochastic selection of reproduction additions, respectively.
candidates from a mating pool. That is, one of the three
different crossover methods is selected from a biased roulette TABLEnI.
wheel, where each crossover method has a roulette wheel slot
sized according to its performance. The weight for each
Unit Operating FixedO&M
crossover method has been determined. Name No. of FOR
Capacity Cost Cost
The second feature lies in the application of elitism [5]. (Fuel Type) Units (%)
(MW) (SlkWh) (SlkW-Mon)
The roulette wheel selection scheme gives a reproduction Oil #1 (Heavy Oil) 1 200 7.0 0.024 2.25
opportunity to a set of recessive members and might not give Oil #2 (Heavy Oil) 1 200 6.8 0.027 2.25
the set of dominant strings (i.e., an elite group) a chance to Oil #3 (HeavyOil) 1 150 6.0 0.030 2.13
reproduce. Furthermore, the application of genetic operations LNG arr #1 (LNG) 3 50 3.0 0.043 4.52
changes the string structures of the fittest solutions. Thus, the LNGC/C#l (LNG) 1 400 10.0 0.038 1.63
best solutions in the current generation might not appear in LNG ClC #2 (LNG) 1 400 10.0 0.040 1.63
the next generation. To circumvent these problems an elite LNG ClC #3 (LNG) 1 450 11.0 0.035 2.00
group is directly copied into the next generation. Coal# 1 (Anthracite) 2 250 15.0 0.023 6.65
We have applied the conventional mutation scheme [I] Coal#2 (Bitummous) 1 500 9.0 0.019 2.81
where it performs bit-by-bit for the strings undergone the Coal#3 (Bituminous) 1 500 8.5 0.015 2.81
stochastic crossover operator. However, the mutation
Nuclear #1 (PWR) 1 1,000 9.0 0.005 4.94
procedure is not applied to the set of dominat strings to
Nuclear #2 (PWR) I 1,000 8.8 0.005 4.63
preserve the elitism.

TABLEIV. Among the three crossover methods, the l-point substring
TECHNICAL AND ECONOMIC DATA OFCANDIDATE PLANTS. crossover showed the best performance in every case. Thus,
Candidate Const- Capa- FOR Operating Fixed Capital Life
we set the I-point substring crossover with the biggest
Type ruction city (%) Cost O&M Cost Time weight, and others with an equal smaller weight. To
Upper (MW) (S/kWb) Cost (S/kW) (yrs) determine the weight of each crossover method in a biased
Limit roulette wheel, 18 simulations were performed with different
Oil 5 200 7.0 0.021 2.20 812.5 25 weights and crossoverprobabilitiesas shown in Table VII.
LNOC/C 4 450 10.0 0.035 0.90 500.0 20
Coal (Bitum.) 3 500 9.S 0.014 2.75 1062.5 25
Nuc.(PHWR) 3 700 7.0 0.003 5.50 1150.0 25
ObjectiveFunction in MillionDollars
Weights (Errors aDinst ODtimaI Solution %)
PC=0.6 PC-O.? PC=0.8
6.2. Parametersfor GEP and IGA 0.05 : 0.05 : 0.90 5007.40* 5010.63 5001.40
(0.02%) (0.09%) (0.02%)
There are several parameters to be pre-determined, which
0.10: 0.10: 0.80 5086.19 5010.63 5012.37
are related to the GEP problem and GA-based programs. In (O...." e) (0.09%) (0.12%)
this paper, we use 8.5 % as a discount rate, 0.01 as LOLP· 0.15 : 0.15 : 0.70 5007.40 5006.19 5886.19
criteria, and 15 % and 60 % as the lower and upper bounds . (0.02%) (I.OO%l 10.00%)
for reserve margin, respectively. The considered lower and 0.20 : 0.20 : 0.60 SOO6.19 5106.19 5011.19
{O.OO%l (O.IO~.) (0.11%)
upper bounds of capacity mix are 0 % and 30 % for oil-fired 0.25 : 0.25 : 0.50 SOO6.19 5007.40 5018.37
power plants, 0 % and 40 % for LNG-fired, 20 % and 60 0/0 (0.00%) (0.02%) (0.24%)
for coal-fired,and 30 % and 60 % for nuclear, respectively. 0.30 : 0.30 : 0.40 5006.19 5012.46 5007.40
Parameters for IGA are selected through experiments. (O...."e) (0.13%) (0.02%)
Especially, the dominant parameters such as crossover • The solution with objective function as S007.4O minion dollars IS the
secondbest solutionfoundby dyDamic programming.
probabilities and weights for crossover techniques are
determined empirically from a test system with a 6-year
planning horizon with other data being the same as Cases 1
and 2. Among 18 simulations, we have found the optimal
solution 7 times and the second best solution 4 times.
Funhermore,the optimal or the secondbest solution is found
TABLEV. PARAMETERS FOR lOA IMPLEMENTATION by applying the stochastic crossover technique when the
probability of crossover is 0.6. Also, when the weight of 1-
Parameters Value
point substring crossover is 0.7 and weights for others are
• Population Size 300 0.15, it always found optimal or the second best optimal
• Maximum Generation 300
solution. Therefore, we have set the weights in the stochastic
• Probabilities of CrossoveraDd Mutation
• Numberof Elite Strings 3 (1%) crossover technique as 0.15:0.15:0.70 among the three
• Weightsof I-point, 2-point,and I-point 0.15:0.15:0.10 crossovermethods. This choice has resulted in the robustness
Crossoverfor Substrings in a Biased R.oulette Wheel of the stochastic crossovermethod.

To decide the weight of each crossover method in a biased 6.3. Numerical Results
roulette wheel for stochastic crossover, nine experiments are
performed by changing the probability of crossover from 0.6 The developed lOA was applied to two test systems, and
to 0.8, and the results are compared with the optimal solution compared with the results of DP, TCDP and SOA.
obtained by the full DP as shown in Table VI. Throughout the tests, the solution of the conventional DP is
regardedas the global optimum and that of TCDP as a local
TABLE VI.RESULTS OBTAINED BYEACHCROSSOVER METHOD optimum. Both the global and a local solution can be
obtained in Case 1; however, only a local solution can be
Objective Functionin Minion Dollars obtained by using TCDP in Case 2 since the 'curse of
~ . :':)timal Solution. %) dimensionality' prevents the use of the conventional DP.
Crossover Method PC-0.6 PC-0.7 PC-0.8 Fig. 4 illustrates the convergence characteristics of various
One-point Crossover 5035.53 5013.50 5057.30
(0.59-A) (0.15%) (1.02%) GA-based methods in Case 1. It also shows the improvement
Two-pointCrossover 5034.89 5032.98 5034.89 of IGA over SGA. The IGA employing the stochastic
(0.57%) (0.54%) (0.57%) crossoverscheme(lGA2) has shownbetter performance than
One-pointSubstring 5012.53 5012.46 5010.63 the lOA using the artificial initial population scheme(IGAl).
Crossover (0.13%) (0.13%) (0.09%)
By considering both schemes simultaneously (IGA3), the
performance is significantlyenhanced.

Fig. 5. Observedexecution time for the number of stages.


Cumulative Discounted Cost (l0' $)

\. .__
0<: ... _ ... - Solution Method Case 1
(l4·year Study Period)
Case 2
(24-year Study Period)
moo \.
_ '""""_ ,,...,n _,..
-- _......... _..
.............................._._..."., ..... DP 11164.2 unknown
11200 TCDP 11207.7 16746.7
SOA 11310.5 16765.9
moo lOA1 11238.3 16759.2
11000 lOA lGA2 11214.1 16739.2
IOA3 11184.2 16644.7
1 51 101 151 201 251Gererati:n

- - $A - - I;1U _._-J;A2 - nA3 --o¢Ial! TABLE IX.

Fig. 4. Convergence cbatacteristicsofIOA method in Case 1 system. INTRODUCED PLANTS IN CASE 1 AND CASE 2 BY IOAJ.

Table VIII summarizes costs of the best solution obtained Type Oil LNOCIC Coal PWR PHWR
by each solution method. In Case I, the solution obtained by Year I(2ooMWl I(450MWl 500MWl (lOOOMWl I(7ooMWl
IGA3 is within 0.18% of the global solution costs while the 1998 3 (5)1 2 (1) 2 (3) 0(1) 2(0)
solutions by SGA and TCDP are within 1.3% and 0.4%, 2000 5 (6) 3 (1) 5 (6) 0(1) 4 (1)
respectively. In Case I and Case 2, IGA3 has achieved a 2002 5(7) 3 (1) 5 (6) 0(2) 4 (1)
0.21% and 0.61% improvement of costs over TCOP, 2004 8 (10) 7 (3) 6 (7) 0(2) 4 (1)
respectively. Although SGA and IGAs have failed in finding 2006 10 (12) 10 (3) 6 (7) 0(2) 6 (2)
the global solution, all IGAs have provided better solution 2008 10 (13) 10 (3) 6 (9) 0(2) 6(2)
than SGA. Furthermore, solutions of IGA3 are better than 2010 10 (13) 10(3) 6 (9) 0(2) 6 (4)
that of TCOP in both cases, which implies that it can 2012 14 11 8 1 7
overcome a local optimal trap in a practical long-term GEP. 2014 17 14 8 I 7
Table IX summarizes generation expansion plans of Case 1 2016 19 15 10 1 9
and Case 2 obtained by IGA3. 2018 19 17 10 3 9
The execution time of GA-based methods is much longer 2020 20 18 12 3 9
than that ofTCOP. That is, IGA3 requires approximately 3.7 1. The figures within parenthesisdenote the results ofIOAJ m Case 1.
and 6 times of execution time in Case I and Case 2,
respectively. However, it is much shorter than the
conventional OP. Fig. 5 shows the observed execution time The proposed method definitely provides quasi-optimums
of IGA3 and OP as the stages are expanded. Execution time in a long-term GEP within a reasonable computation time.
of IGA3 is almost linearly proportional to the number of Also, the results of the proposed IGA method are better than
stages while that ofOP exponentially increases. In the system those ofTCDP employed in the WASP, which is viewed as a
with II stages, it takes over 9 days for OP, and requires very powerful and computationally feasible model for a
about 1.2 millions of array memories to obtain the optimal practical long-term GEP problem. Since a long-range GEP
solution while it takes only II hours by IGA3 to get the near problem deals with a large amount of investment, a slight
improvement by the proposed IGA method can result in
substantial cost savings for electric utilities.

~ 3000 r----------+---..., 6.4. Summary

.g 2500
The IGA has been successfully applied to long-term GEP
~ 2000
problems. It provided better solutions than the conventional
~ 1500 SGA. Moreover, by incorporating all the improvements
. ~ 1000 (IGA3), it was found to be robust in providing quasi-
~ 500
O.~~-~-~-~- ~-i-~-~-~~~::::=r
optimums within a reasonable computation time and yield
better solutions compared to the TCOP employed in WASP.
1 2 3 4 5 6 7 8 9 10.11 12 Contrary to the OP, computation time of the proposed IGA is
-+- DP _ l0A3 1 Stages linearly proportional to the number of stages.
The developed IGA method can simultaneously overcome
the 'curse of dimensionality' and a local optimum trap

inherent in GEP problems. Therefore, the proposed IGA The method takes advantage of both the robustness of the
approach can be used as a practical planning tool for a real- SGA and the accuracy of conventional optimizationmethod.
system scalelong-term generation expansionplanning. The proposed VAR planning approach is in the form of a
two level hierarchy. In the first level, the SGA is used to
select the location and the amount of reactive power sources
7. REACTIVE POWERPLANNING to be installed in the system. This selection is passed on to
the operationoptimization sub-problemin the second level in
The reactive power, or VAR, planning problem is a order to solve the operational planning problem. It is a
nonlinear optimization problem. Its main objectis to find the common practice to use a successive linear programming
most economic investment plan for new reactive sources at
(LP) formulation to improve the computation speed and to
selected load buses which will guarantee proper voltage
enhancethe computation accuracy; the LP method is fast and
profile and the satisfaction of operationalconstraints. Usually
robust. The operational planning problem is decoupled into
the planning problem is divided into operational and
coupled real (P) and reactive (Q) power optimization
investment planning subproblems. In the operational
modules; and the successive linearized formulation of the P- .
planning problem the available shunt reactive sources and
Q optimization modules speeds up computation, and allow
transformer tap-settings are optimally dispatched at minimal
the LP to be used in finding the solution of the nonlinear
operation cost. In the investment planning problem new
problem [31]. The dual variables in the LP are transferred
reactive sources are optimally allocated over a planning
from the P-Q optimizationmodules to the SGA module in the
horizon at a minimal total cost (operationalandinvestment).
first level to set up the Benders' cut for investment planning.
During the past decade there has been a growing concern
This hierarchical optimization approach allows the SGA to
in power systems about reactive power operation and
obtain correct VAR installations, and at the same time
planning. Recent approaches to the VAR planning problem
satisfy all the operational constraints and the requirement of
are becoming very sophisticated in minimizing installation
minimum operation cost.
cost and for the efficient use of VAR sources to improve
system performance. Various mathematical optimization
formulations and algorithm have been developed, which, in 8. DECOMPOSITION OF REACTIVE POWER
most cases, by using nonlinear [22], linear [23], or mixed PLANNING PROBLEM
integer programming [24], and decomposition method [25-
28]. More recently, simulated annealing [29] and genetic The reactive power planning problem is to determine the
algorithm [16][30] have also been used. With the help of optimal investment of VAR sources over a planninghorizon.
powerful computers, it is now possible to do a large amount The cost function to be minimizedis the sum of the operation
of computation in order to achieve a global optimal insteadof cost and the investment cost. The investment cost is the cost
a local optimalsolution. to install new shunt reactive power compensationdevices for
Simulated annealing method is a random search method. the system. The fuel cost for generation is the only operation
Hsiao et al. [28] provided an approach for the simulated cost to be considered in this chapter.
annealing method using the modified fast decoupled load
flow. However only the new configuration 01AR 8.1. Investment-Operation Problem
installation) is checked with the load flow, and existing
resources such as generators and regulating transformers are The reactive power planning problem involves both
not fully exploited. Simple Genetic Algorithm (SOA) operation and investment costs, and it can be written in the
methodis a powerful optimization techniqueanalogous to the followingform:
natural genetic process in biology. Theoretically, this
technique converges to the global optimum solution with min f(Y .U) =L (Y) + L (U) (12a)
Y.ll o.
probability one, providedthat certain conditions are satisfied.
The SGA methodis known as a robust optimization method. subject to
It is useful especially when other optimization methods fail
in finding the optimal solution. However, it oftenrequires too G.(Y,U)SO (12b)
many repeatedcomputations in obtaining final results. G2(U) S 0 (12c)
In order to obtain a good result for reactive power
generation planning problem, a synthetic optimization where
procedure is presented by combining two optimization
methodstogether, randomsearch and optimization algorithm. T
This chapterpresentsan improvedmethod of operational and y =[pT ,V ,NT] : vector of operational variables
investment planning by using a simple genetic algorithm P : vector of real power generations,
combined with the successive linear programming method. V: vector ofbus voltage magnitudes,
The Benders cut are constructed during the SGA procedure N: vector oftap-settings,
to enhance the robustness and reliability of the algorithm. U : vector of investment variables

Lo (Y) : operationcost decision for the feasible investment U is obtained by solving
the U (investment) subproblem:
L" (U) : investment cost
minZ = CTU +0 (1Sa)
G1 (Y, U) : constraintinvolving both Yand U Y.U

G2(U) : constraintinvolving U only. subject to

Equation (12a) consists of investment and operation cost. (ISb)

Equation (12b) are coupled constraints for operation and (ISc)
investment variables. It includes load flow balance and other
important operational constraints. Equation (12c) includes where 0 is low limit varable, and W(U) is called the
constraints relative to only investment variables. Benders cut and is a function which supplies information
concerningthe capacity decision- U in terms of the operation
8.2. BendersDecomposition Formulation feasibility. Then the problem would determine a solution (U,
Y) that would minimizethe global functionf Ba),
The operation cost is nonlinear, and the investment cost The Benders decomposition method builds the function
can be assumed to be linear with respect to the amount of W(U) based on the solution of the Y subproblem. In
newly added reactive power compensation. According to this nonlinear optimization, W(U) can be determined if we
assumption, the minimizationproblem (12) can be expressed observe that the simplexmultiplier vector associatedwith the
in a nonlinearprogrammingformulation: first stage (Y subproblem) is the basic feasible solution for
the dual problem. Therefore,
min CTU + f(Y) (13a) Ic
ru W(U) = v(U + AIc(BU 1 - BU) (16)
subject to where v(.) is the optimal operation cost with the installation
H(Y)+BU S b, (13b) «tr .
DUSb2 (13c) The dual solution AIc is the simplex multiplier associated
with the constraint in the operation subproblem, where k is
where the iteration number. Since the revised simplex method is
Y: vector of operation variables
used for solving the operation subproblem, AIc is obtained as
U : vector of investmentvariables
a by-product and new constraints, each corresponding to a
C : vector of cost coefficients
different investmentinstallation,are established.
B,D : matricesof constraints
f,H : cost and constraint functions, respectively. From equations (15a) to (ISc), it can be seen that A/'
b, .b, : vector of constraints. presents the change of the cost caused by the unit change in
investment for unit i. If A;l > 0 then U; > U: is helpful in
Because of the structure of the constraints, it is quite natural generatinga new memberof population,which may decrease
to consider two level hierarchical approach to solve the the total cost Z. If only one constraint is considered, a
problem. That is, use the SOA to select the device and decreasing direction similar to the steepest descent method
amount, and use an optimization method to obtain optimal can be found. The Benders cut is viewed as a coordinator
results under the given installation. between investment and operation subproblems, and the
In this chapter the generalized Benders decomposition GBD method iterates between the two. At each iteration a
(GBD) method [32] is used in the SGA module in setting up new constraint is added into W(U) to fonn a new constraint
a Benders cut in order to improve the convergence set.
characteristics. The procedure is as following:

(i) Assuming a feasible investment U, the feasible decision Y 9. SOLUTION ALGORITHM FORVARPLANNING
is obtainedby solvingthe Y(operation) subproblem:
The planning methodology developed in the paper is
minf(Y) (14a) simulatedfor reactive power planning problem.The problem
is decomposed into investment and operation subproblems,
subject to and solved iterativelyuntil convergence[26].
The operation subproblem is again decomposed into
H(Y)Sbl-BU (14b) economic real (P) and reactive (Q) power dispatch problems
to minimize the fuel cost function [31][33]. In the P module
optimal values of real power generation,and in the Q module
{(ii)} Having found optimal Y from the first stage, the
the optimal values of bus voltage magnitudes and transformer

tap-settings are obtained. In addition, the optimal values of tail-tail crossover and head-tail crossover and the crossover
reactive power dispatched by the generators and position is selected randomly. The head-tail can also be used
compensatorsare also obtained. in producing new strings from two identical parents.
In each population, total operating and investment costs In the original SGA, only the of fitness value resulting
are calculated for each investment. The fitness is simply the from the operation subproblem is used to generate new
inverse of this total cost. The ratio of the average fitness and generation. However the new population generated only by
the maximum fitness of the population is computed and its fitness is random and blind. By using the Benders cut,
generation is repeateduntil: which makes use of both the dual variable infonnation and
the cost function, a new and better string can be found. If this
averagefitness new string is a good one (it may be the best one), i.e. it has a
- - - - -.....-~AP
maximumfitness higher fitness value, it will survive to the next generation.
Otherwise, it will likely die afterwards. hi this method, the
where AP is a given number that represents the degree of robust characteristics of the SOA can still be maintained; at
satisfaction. If the convergence has been reached at given same time it increases the chance to fmd the optimal result
accuracy, then optimal values for investment are found. earlier. The Benders cut can be set up without difficulty
Other criteria, such as the difference between the maximum because all variables are made available when the operation
and minimum fitnesses and the rate of increase in maximum optimization,sub-problem is solved.
fitness, can also be used as the stopping criteria. On other
possibility is to stop the algorithm at some fintess number of
generations and designate the result as the best fit from the
The systems tested and described here are: The 6 and 30-
The iterativeprocess is as follows:
bus networks. The emphasis is on the effectiveness of the
Step 1. Initial population generation - compute the fitness technique and validity of "results. The following parameter
of each member according to operation sub-problem are used for·SGA program:
mutation rate: 0.01
Step 2. Generate new population - typical SGA methods,
crossover rate: 1.0
reproduction crossover and mutation, are used. The
abandoning rate 0.9
Benders cut is used on a subset of strings to obtain one
parameter resolution: 3 bits per substring
new and bettermember ofthe population.
Step 3. Computethe fitness of the new generation.
10.1. The 6-bus System
Step 4. If convergence condition is satisfied, stop
computation. Otherwise, return to Step 2, and begin a
The 6-bus system given in [31] is considered, which has
new generation.
two generators at bus I and 2, and two load buses, 4 and 6,
are used for shunt reactive compensation. The initial load
flow results show that, with no reactive compensation there
The most important step is the Step 2. A new population is
are under-voltages at load buses 3 through 6. Thus the
generated according to the fitness of the old population
reactive power supply from generators is not adequate to
though the simulated spin of a weighted wheel in the SOA
maintain the required voltage profile.
[1]. Some modificationare made to the SGA for our planning
Table X shows the maximum, minimum and average
problem, resulting in a modified SGA(MSGA):
fitness obtained by the simple genetic method (SOA) and
modified simple genetic method (MSGA). Both methods
(1) In the GBD, the iteration procedure is an alternate
give the same final results. However, the MSOA method
computation between investment and operation till the
needs less iteration than the SOA method.
convergence is reached. The Benders cuts are selected and
After the reactive power planning is completed, the total
constructed from old population. It is used to obtain a new
reactive power compensation is summarized in Table XI. It is
member of population.The number of cuts can be adjusted as
observed that the voltage profile is within the operating range
a part of the procedure. Some better fitted strings and some
ofO.90-I.IS p.u, Both voltage limits are satisfied. The total
worse fitted strings are selected to construct the cuts. The
cost is decreased from 5619.53 to 5390.78, a decrease of
benders cut helps in narrowing down the space of possible
36.9%. The final operation cost for the optimization without
solutions, and thus speeds up the convergence.
capacitor investment is 5397.78, higher than the optimal
result. The column for Test 2 is corresponding to the result
(2) An abandoning rate is considered in giving up some
obtained by using the Benders decomposition method. The
poor alternativesby assorting the fitness of the alternatives.
total cost is also higher than that of the modified SOA
(3) Different crossovers are also considered, that is, the

values. Fig . 7 shows the iteration result for the test case using
only the SGA, where there were 325 crossovers and 141
Method Gen. min Avg. max It can be seen that when the Benders cuts are added for the
initial 0.2526 0.2543 0.2559 MSGA, only 2 generations are needed to find the optimal
SGA 11 result. After that the optimal results are still maintained
final 0.2556 0.2558 0.2561
during later iterations. As indicated in Fig. 7, the SGA
initial 0.2526 0.2543 0.2559
MASG 9 method needs 18 generations to find the fmal result Due to
final 0.2547 0.2556 0.2561
random search, the optimal result can only be reached after a
considerable number of iterations. The convergence
procedure is slower than the MSGA method .

Limits Initial Test I Test 2 Test 3

Low Upper State Result Result Result 13
, ~: j
1.0 1.1 1.00 1.1 1.065 1.032
~ 0.12
VI 1.0 1.15 1.00 1.15 1.150 1.150
0. 11
V; 0.9 1.0 0.78 0.932 0.948 0 .983
I .I \
" "
i \ ./
. ../'
""" ;
0.9 1.0 0.88 0.966 0.995 0.995 I '
V. .0.09 I

~"M; ~
V, 0.9 1.0 0.82 0.968 0.995 0.996
0.9 1.0 0.87 0.946 0.979 0.995
-20 .
25 .19
0.06.;;- 5 10
~ --7---:;;----:-:------:::--------
1S 20 25

C6 0.0 30.0 0.0 0.0 20 .0 30.0 Fig. 6. MSGAiteratioD result

Total cost ($) 6.19.53 397.78 391.35 390 .78

Losses (MW) 9.83 18.68 17.70 17.39
Test 1: Operation optimization without investment.
Test 2: Operation optimization with investment by Benders
decomposition method only.
Test 3: Operation optimization by using MSGA

10.2. IEEE 3D-bus System

For the IEEE 3D-bus system [31], there are 6 generator

buses. Seven buses are selected to add capacitors. Each
candidate bus has 3 bits for parameter resolution and it can
represent 8 different values for installation. The length of
chromosome string is 21 bits and the population size is 25.
Initial optimization is ron for operational variables. The
result shows that the system can maintain all operation
constraints without any new capacitor installed, but at higher
cost. In order to test the effectiveness of the program high
unit installation cost is used. It was anticipated that the SGA f ig. 7, SOA iteratioD result
method should find an optimal result after certain generations
and in which case the additional installation should be zero.
Fig. 6 shows the iteration result for the test case using the 10.3. Summary
MSGA with Benders cut added. There were the total of 264
crossovers and 104 discards for some string with bed fitness The voltage profile throughout the planning period was

improved from the under-voltages seen in the initial load efficient optimization method for operation sub-problem, and
flow, to the required operation range. It was also seen that making use of the parallel nature of the SGA, the MSGA
new shunt capacitors are installed at or near the load buses promises as a useful tool for planning problems.
that exhibit under-voltage violation. Test shows that the
MSGA method is robust in algorithm and gives good results
which include the global minimum as a solution. 12. ACKNOWLEDGEMENT
The SGA needs a higher cpu time compared with an
analytical optimization method. However, the SGA is This worked is supported in part by the Korea Science and
flexible, robust, and easy for modification. There are no need Engineering Foundation and the National Science
of assumptions for linearity, convexity, and so on. As it is Foundation under Grants INT-960S028 and ECS-970510S.
shown that the method can be easily combined with other The author likes to acknowledge the contribution of R.
methods. The heuristic experience can be added on without Dimeo, X. Bai, Y. -M. Park, J. -B. Park, J. -R. Won, M.
difficulty. With the help of high speed computers, using an Mangoli, L. T. O. Youn,and. Ortiz.
efficient optimization method for operation sub-problem, and
making use of the parallel nature of the SOA, the MSGA 13. REFERENCES
promises as a useful tool for planning problems.
[1] D. E. Goldberg, Genetic Algorithms in Search, Optimisation and
Machine Learning, Addison-Wesley Publishing Company Inc.,
11. CONCLUSIONS Massachusetts, 1989.
[2] D. C. Walters and G. B. Sheble, "Genetic algorithm solution of
This chapter introduced an improved genetic algorithm economic dispatch with valve point loading", IEEE Trans. on PWRS,
Vol. 8, No.3, 1993, pp. 1325-1332.
(IGA) for a long-term least-cost generation expansion [3] P. H. Chen and H. C. Chang, "Large-scale economic dispatch by
planning (GEP) problem. The IGA includes several genetic algorithm", IEEE Trans. on PWRS, Vol. 10, No.4, 1995, pp.
improvements such as the incorporation of an artificial initial 1919-1926.
population scheme, a stochastic crossover technique, elitism [4] D. Dasgupta and D. R. McGregor, "Thermal unit commitment using
geneticalgorithms", lEE Proc-Gener. Transm. distrib; Vol. 141,No.
and scaled fitness function. The lOA has been successfully S,1994,pp.4S9-46S
applied to long-term OEP problems. It provided better [5] G. B. Sbeble, T. T. Maifeld, K. Brittig, and G. Fabd, "Unit
solutions than the conventional SOA. Moreover, by commitment by genetic algorithm with penalty methods and a
incorporating all the improvements, it was found to be robust comparison of Lagrangian search and genetic algorithm-economic
dispatch algorithm", Int. Journm of Electric Power &: Energy
in providing quasi-optimums within a reasonable Systems, Vol. 18,No.6, 1996,pp. 339-346.
computation time and yield better solutions compared to the (6] K. lba, "Reactive power optimization by genetic algorithm", IEEE
TCDP employed in WASP. Contrary to the DP, computation Trans. onPWRS, Vol.9, No.2, 1994,pp.685-692.
time of the proposed IGA is linearly proportional to the [7] K. Y. Lee,X. Bai, and Y. M. Park, "Optimization method for reactive
power planning using a genetic algorithm", IEEE Trans. on PWRS,
number of stages. The developed IGA method can Vol. 10,No.4, 1995,pp. 1843-1850.
simultaneously overcome the 'curse of dimensionality' and a [8] K.. Y. Lee and F. F. Yang, "Optimal relCtive power planning using
local optimum trap inherent in GEP problems. Therefore, the evolutionary algorithms: A comparative study for evolutionary
IGA approach can be used as a practical planning tool for a programming, evolutionary strategy, genetic algorithm, and linear
programming", IEEE Trans. on PWRS, Vol. 13,No.1, 1998,pp. 101-
real-system scale long-term generation expansion planning.
A synthetic method of reactive power planning is [9] R. Dimeo and K. Y. Lee, "Boiler-Turbine control systemdesignusing
presented. Different from the conventional SOA, which a genetic algorithm", IEEE Trans. on Energy Conversion, Vol. 10,
mainly uses the objective function for its fitness evaluation, No.4, 1995,pp. 752-759.
[10] Y. Zhao, R. M. Edwards, and K. Y. Lee, "Hybrid feedforward and
the approach presented in this chapter, MSGA, makes use of
feedback controller design for nuclear steam generators over wide
not only the objective function but also the dual variable range operation using genetic allorithm", IEEE T1'tI1lS. on Energy
information. The SOA is a random search algorithm and Conversion, Vol. 12,No.1, 1997,pp. 100-106.
useful in finding the global optimal solution. The new (11] Park, J.-B., Y.-M. Park., J.-R.. Won, and K. Y. Lee, "An Improved
formulation of the Benders method for investment-operation Genetic Algorithm for Generation Expansion Planning," IEEE
Transactions on PowerSystems, Vol. 15, No.3, pp. 916-922, August
decomposition improves the robustness of the random 2000.
algorithm. Test shows that the MSGA method is robust in [12] S.T. Jenkinsand D.S. Joy, Wien Automatic SystemPlanning Paclcage
algorithm and gives good results which include the global (WASP) - An Electric UtilityOptimal Generation Expansion Planning
minimum as a solution. Computer Code, Oak Ridge National Laboratory, Oak Ridge,
Tennessee, 0RNL-4945, 1974.
The SGA needs a higher cpu time compared with an [13] Electric Power Research Institute (EPRI), Electric Generation
analytical optimization method. However, the SGA is Expansion Analysis System (EGEAS), EPR! EL-2561, Palo Alto,CA,
flexible, robust, and easy for modification. There are no need 1982.
of assumptions for linearity, convexity, and so on. As it is [14] S. Nakamura, "A review of electric production simulation and
capacity expansion planning programs", Energy Research, Vol. 8,
shown that the method can be easily combined with other 1984,pp. 231-240.
methods. The heuristic experience can be added on without [15] P. Masse and R. Gilbrat, "Application of linear programming to
difficulty. With the help of high speed computers, using an investments in the electric power industry", Manage",."t Science,
Vol. 3, No.2, 1957, pp. 149-166.

[16] J. A. Bloom, "Long-range generation planning using decomposition [29] R. C. Reeves, Modern Heuristic Techniques for Combinatorial
and probabilistic simulation", IEEE Trans. on PAS, Vol. 101, No.4, Problems. New York, Halsted Press, an imprint of John Wiley \&
1982,pp. 797-802. Son, Inc., 1993.
[17] Y. M. Park,K. Y. Lee, and L. T. O. Youn,"New analytical approach [30] V. Miranda, J. V. Ranito, and L. M. Proenca, "Genetic algorithms in
for long-term generation expansion planning based maximum optimalmultistagedistribution network planning", IEEE PES Winter
principle and Gaussian distribution function", IEEE Trans. on PAS, Meeting, #94 WM 229-S-PWRS.
Vol. 104,1985,pp. 390-397. [31] M. K. Mangoli, K. Y. Lee, and Y. M. Park, "Optimal real and
[18] A. K. Davidand R. Zhao, "Integrating expert systems with dynamic reactive power control using linear programming", Electric Power
programming in generation expansion planning", IEEE Trans. on SystemResearch, 26, 1993, pp, 1-10.
PWRS, Vol.4, No.3, 1989,pp. 1095-1101. [32] M. Geoffrion, .... Generalized Benders decomposition", Journal of
[19] A. K. David and R. Zhao, "An expert system with fuzzy sets for Optimization Theory and Applications. Vol. 10,No.4, 1972,pp. 237-
optimal planning", IEEE Trans. on PWRS, Vol. 6, No.1, 1991, pp. 26l.
59-65. [33] K. Y. Lee, Y. M. Park, and J. L. Ortiz, "United Approach to optimal
[20] Y. Fukuyama and H. Chiang, "A parallel genetic algorithm for real and reactive power dispatch", IEEE Trans. on Power Appar. and
generation expansion planning", IEEE Trans. on PWRS, Vol. 11,No. Syst. PAS-I04, (1985), pp. 1147-1153.
2, 1996,pp.955-961.
[21] Y. M. Park, J. B. Park, and J. R. Won, "A genetic algorithms
approach for generation expansion planning optimization", Proc. of Additional Referencse on ReactivePowerPlanning
the IFAC Symposium on Power Systems and Power Plant Control,
PERGAMON, UK, 1996,pp. 257-262. [34] H.H. Happ, '''Optimal power dispatch: A comprehensive smvey',
[22] R. Billington and S.S. Sachdev, ....Optimumnetwork VAR planning IEEE Trans. on Power Appar. and Systems, 1977, PAS-96, pp. 841-
by nonlinear programming", IEEE Trans. on Power Apparatus and - 854. .
Systems, PAS-92, 1973,pp. 1217-1225. . [35] O. Alsac, J. Bright, M. Prais, and B. Stott, "Further development in
[23] G.T. Hegdt and W.M. Grady, "Optimal var siting using linear load LP-based optimal power flow" IEEE Trans. on Power Systems,Vol.
flow formulation", IEEE Trans. on Power Apparatus and Systems, 5. No.3, August 1990,pp. 697-711.
July/Aug. 1975.pp 1214-1222.Vol. Pas-102No.5, May 1983. [36] E. Hobson,"Network constrained reactive power control using linear
[24] K. Aoki, M. Fan, and A. Nishikori, "'Optimal var planning by programming", IEEE Trans. on Power Appar. and Syst., PAS-99,(4),
approximation method for recursive mixed integer linear planning", 1980,pp.1040-1047
IEEE Trans. on Power Syst., PWRS-3,(4), 1988,pp. 1741-1747. [37] R. Fernandes, F. Lange, Burchettr., H. Happ, and K. Wirgau,' "Large
[25] K.Y. Lee, J.L. Ortiz, Y.M. Park, and L.G. Pond, "An optimization scale reactive power planning", IEEE Trans. on Power Appar. and
technique for reactive power planning of subtransmission network Syst.,Vol.PAS-I02 No.5, May 1983.
under nonnal operation", IEEE Trans: on Power Syst., 1986, [38] H.H. Happ andK.A. Wirgau, ....Static and dynamic var compensation
PWRS-l, pp. 153-159 in systemplanning", IEEE Trans. on Power Appar.Syst., PAS-97, (5),
[26] M. K. Mangoli, K. Y. Lee, and Y. M. Park, "'Optimal long-term 1978,pp.1564--1578.
reactive power planning using decomposition technique ", Electric [39] A. Hughes,G. Jee, P. Hsiang, R.R. Shoults, and M.S.Chen, .... Optimal
PowerSystemsResearch, 26, 1993,pp. 41-52. power planning", IEEE Trans. on Power Appar. and Syst, PAS-1OO,
[27] R. Nadira,W. Lebow, and P. Usoro, .... A decomposition approach to 1981,pp.2189--96.
preventiveplaDning of reactivevolt-ampere (VAR) source expansion", [40] R.R. Shoultsand M.S. Chen, . "R.eactive power controlby least square
IEEE Trans. on PowerSystems, Vol. PWRS-2,No.l,Feb. 1987. minimization", IEEE Trans. on Power Appar. and Syst., PAS-95,
[28] Y.-T. Hsiao, C.-C. Liu, H.-D. Chiang, and Y.-L. Chen,HA new 1976,pp. 397-405.
approach for optimal VAR sources planning in large scale electric [41] W.M. Lebow, R.K. Mehra, R. Nadira, R. Rouhani, and P.B. Usoro,
power systems",IEEE Transon PowerSystems, Vol. 8, No.3,August "Optimization of reactive volt-ampere (VAR) sources in system
1993.pp. 988-996. planning", EPRI Report, EL-3729, Project 2109-1,Vol.I Nov. 1984

Chapter 11

Network Planning

Abstract: A key feature of the application of iterative the chapter to illustrate the application of three approx-
approximation methods such as simulated annealing, ge- imation algorithms, namely, simulated annealing, genetic
netic algorithms and tabu search to power network prob- algorithms, and tabu search. Three of these combina-
lems is the codification of the problem and the consequen- torial problems are concerned with distribution systems:
tial definition of neighborhood of a given configuration. the distribution system expansion planning problem, the
This chapter addresses the different ways codification and optimal capacitor placement in distribution networks, and
neighborhood definition can be made in connection with the reconfiguration of primary distribution feeders prob-
four important network problems: optimal reconfigura- lem. The forth problem regards the transmission network
tion of distribution systems aiming to optimize opera- expansion planning problem, which has already been in-
tions, capacitor placement in primary distribution feed- troduced in [1] (This chapter expands that discussion and
ers, optimal distribution system expansion (the addition introduces results obtained for large real-world test sys-
of substations and feeders), and the optimal expansion of tems.)
transmission networks.
Keywords - network expansion planning, distribution sys- 2.1 Reconfiguration of Distribution Feeders
tem planning, capacitor placement, optimization meth-
ods, combinatorial optimization. The determination of the optimal configuration of distri-
bution feeders consists in finding the topology of a radial
system, with part of the feeder sections in operation while
1. INTRODUCTION others remain de-energized, as illustrated in Fig. 1 (this
This chapter presents the application of three iterative system was originally studied in [22]). The objective can
approximation algorithms (simulated annealing, genetic be, for example, the minimization of the power losses,
algorithm, and tabu search) to four important power net-
=L ri
work planning problems. By planning it is understood Loss i 2 i
both operations and expansion planning. The first prob- i=l 'Vi
lem is the reconfiguration of primary distribution systems,
which is a typical operations planning problem. The other where Pi and Qi are the active and reactive power flows
three problems are the allocation of capacitor in distribu- leaving the node i, Vi is the voltage magnitude at node i,
tion systems, the distribution system expansion planning, r i is the resistance of the corresponding feeder section, and
and the power network expansion planning, which are ex- Loss represents the total active power losses in all feeder
pansion planning problems. Both small example system sections. Alternatively, the objective can be to minimize
and real world cases are presented. the Load Balancing Index (LBI), formulated as follows:
The chapter is organized as follows. Firstly, the math-
ematical formulation of the example problems are pre-
sented, which includes the definition of the objective func- LBI =-
(L(Y -11;)2) t

n i=1
tions, the main constraints and variables involved in the
problems. Next, the codification of the problems for the where n is the number of primary feeders (including one
different techniques (simulated annealing, genetic algo- or more feeder sections), Yi is the normalized loading on
rithm and tabus search) are presented along with the feeder j (the actual loading divided by the loading limit)
neighborhood structures. Then implementation and algo- and y is the average of the normalized loadings '11;- Other
rithm details are discussed. And finally, relevant results objectives can be used as well, or a combination of two or
are summarized. more objectives, in which case the problem would become
a multi-objective optimization problem.
2. MATHEMATICAL FORMULATION OF THE The energized feeder sections form a forest (a set of
PROBLEMS trees) rooted ate the substations (the low voltage side of
the transformers). The de-energized feeder sections, on
This section presents the mathematical formulations of the other hand, connect two of such trees, as illustrated
four combinatorial optimization problems that are used in in Fig. 1 and 2. Using the terminology of the theory of
graphs, the reconfiguration problem can be stated as fol- I II III
lows: find the set of trees that lead to the minimization of
the objective function (say, the losses, or the LBI index),
satisfying (1) the voltage drop limits, (2) the capacities of
feeder sections and transformers, and (3) the power flow
An alternative formulation for the reconfiguration
problem consists in characterizing a topology by the
switch statuses (both sectionalizing switches and the tie ® ..
switches) . In this case rather than determining which 18 -,~.: 21
feeder sections are energized or not, the objective is to
determine which switches are open and which ones are 19 .... (IDe?)
closed in the optimal configuration. 5 ......•.•.··• 20
The radiality constraints, implicit to the requirement 23
that the topology forms a forest, makes the problem a
hard one to solve. According to [21, 26], the optimal
configuration for the example system given in Fig. 1 is n 1~t-----
14 @
...... · .. ·26·.·.
<D 25
shown in Figure 2.
Totallosses~ 466.1 kW

I II III Fig. 2. The optimal topology for the problem of Fig. 1.

large scale combinatorial problem in which the number of

local minimum solution points and the number of options
to be analyzed increases exponentially with the size of the
distribution system.
Since capacitor banks are added, and sometimes op-
erated, in discrete steps, the objective function is non-
differentiable. Load variations over a period of time are
discretized, i.e., the load duration curve is approximated
13 @'.~5 by a piece-wise linear function. Mathematically, this ma-
· 20 23 kes the capacitor placement problem a mixed integer non-
® linear program. Most of the conventional optimization al-
® gorithms used in practice, however, are unable to generate
14@ <D 25
1. .- - - -. . . . . • ••• •••• _ _- - - _
optimal solutions for this type of problem.
The objective function is commonly formulated as the
Total losses: 511.4 kW cost of losses and investments over a period of time (e.g.
1-10 years). As for the operating limits, the goal is to
Fig. 1. The reconfiguration problem: The initial topology keep voltages within the adequate range. This problem
for the example system of Ref. [22]. has been adequately described in mathematical terms in

nt ftC
2.2 Optimal Capacitor Placement mint} = keLTiPi(Xi ) + L!(Uk) (1)
i=o k=l
Capacitor banks are added to radial distribution systems s,c,
for power factor correction, loss reduction, voltage profile
improvement and, in a more limited way, circuit capacity Gi(Xi,U i ) 0; = i = 0,1, ... ,nt
increase. With these various objectives in mind, and sub- Hi(x i ) ~ 0; i = O,l, ... ,nt
ject to operating constraints, optimal capacitor placement o :5 ul u~;s k E C1 or
aims to determine capacitor types, sizes, locations and
control schemes. Like many other combinatorial problems
o $ ut Uk; = k E. C2
found in power network planning, the capacitor placement where nt represents the number of load levels in the piece-
problem presents a multimodallandscape. This is a hard, wise linear load duration curve; n c is the number of candi-

date buses (buses where capacitor allocation is allowed); demand growth as well as geographical expansion. A spe-
Gi(Xi, U i ) 0 represents the power flow equations for the cial case of this problem is the so called green field plan-
i-th load level (Xi are state variables and 'Ui are control ning where there is no initial system and an entirely new
variables, i.e., capacitor bank reactive power); Hi(zi) ~ 0 network has to be built from scratch (An illustrativa case
are the operating constraints for the i-th load level (e.g. green field planning is shown in Fig. 4 [31]). In this fig-
voltage limits); u~ represents the size of the capacitor ure the dotted lines indicate the places where new primary
bank that can be allocated to bus k; u~ represents the feeders can be added; the data for these feeders are sum-
operation level of the capacitor allocated to bus k for load marized in the Table 1. There are also two alternatives
level i. C 1 and C2 are the sets of candidate buses for both for substation additions, namely, substations 1 and 2.
fixed and variable capacitor banks. The objective function
of Problem 1 has two parts: (a) the first part represents
the cost of losses (Ti represents the fraction of time the w'9
load curve stays at level i with losses Pi (Xi); k e represents
energy cost in S US/KWh); (b) the second part represents
r :

the costs of capacitor placement at the candidate buses

where f(ulr.) is a non-differentiable function.
Two types of capacitors are considered: Fixed capaci-
tors in which case

and switched (variable) capacitors whose taps can be
changed according to the load level. In this case, for each
capacitor k there are nt + 1 different levels of operation,
u~, where i =
0,1, ... , nt are chosen according to the cur- Fig. 4. Alternatives for the expansion of a distribution
rent load level so that system [31].


1 2 3 4 5 Cire. Bus Bus L Circa Bus Bus L

1 1 3 12 9 2 6 16
2 1 5 13 10 2 7 14
3 1 7 10 11 2 10 10
4 1 9 16 12 3 4 15
10 9 8 7 6 5 1 10 0 13 5 6 14
6 2 3 10 14 7 8 12
7 2 4 16 15 9 10 16
8 2 5 8

Table 1. Data for the expansion of the to-bus network of

Fig. 3. Example of a radial distribution system.
Fig. 4.

The capacitor optimal allocation problem formulated An important constraint is the radiality condition,
above can the properly solved by the modern heuristic that is, only radial configurations are accepted, all other
optimization techniques discussed in this tutorial. Fig. 3 topologies being considered infeasible. Several alternative
illustrates a radial distribution system with 9 buses and mathematical formulations have been suggested in the lit-
one substation, which will be used in subsequent sections erature for the distribution expansion planning problem.
of this chapter as an illustration of the combinatorial tech- For the sake of illustration, the following formulation is
niques discussed herein. adopted in this chapter [32]:

min v = CF'Y + cvP + c/z + 1JvP (4)

2.3 Distribution System Expansion Planning
This problem deals with the addition of new substations
and primary feeders to distribution systems to cope with Ap=d
p~Mx The distribution expansion problem can be alterna-
tively formulated as follows: given a graph with k ver-
P=Ep tices associated with substations (both existing ones and
P5:Ry other that can be added to the system) and m vertices
associated with loads, it is sought a forest with a maxi-
z; y E {O,l} mum k independent trees, each one rooted at a substation
(or a vertex of the graph), with each of the m load ver-
tices connected to a single tree and such that the power
where flow constraints are satisfied and the total investment and
operations cost is minimized (4).
y - vector of decision variables associated with substa-
tion additions (y; = a means that the substation j 2.4 Transmission Network Expansion Planning
is not
added; Yj 1 means that the substation j is added. This subsection deals with the static transmission net-
work expansion planning problem. By static it is meant
x - vector of decision variables associated with feeder that the planning is made in one stage, or in one step for
additions (Xi: =
0 means that the feeder k is not a target year, which can be, for example, from three years
added; x" = 1 means that the feeder k is added. to 20 years ahead. Thus, given the network configuration
for a certain year and the peak generation/demand for the
CF - vector of fixed costs for the addition of substations. next year (along with other data such as network operat-
ing limits, costs, and investment constraints), one wants
Cv - vector of substation variable operational costs. to determine the expansion plan with minimum cost, i.e.
CJ - vector of the load capacity of the feeders. one wants to determine where and what type of new equip-
ment should be installed. This is a subproblem of a more
Ctt - vector of the cost per unit of power flow. general case, called dynamic expansion planning, where,
in addition to the questions what and where, one wants
P - vector of the total power flow supplied by a substa- to know when to install new pieces of equipment. In this
tion to the feeders. section it is focussed the initial stages of the expansion
planning studies when the basic topology of the future
p - vector of the power flows in the feeders. network is determined. Network topologies synthesized
d - vector of demand (power). by the proposed approach will then be further analyzed
and improved by testing their performances using other
A - matrix of power flows. analysis tools such as power flow, short circuit, transient
and dynamic stability analysis.
M - matrix of feeder transmission capacities. This problem has been formulated in [1] of this tuto-
rial, and is summarized below for the sake of complete-
R - matrix of the chosen transmission capacities. ness. The static transmission network expansion planning
problem can be formulated as an mixed integer nonlinear
A more thorough discussion about the planning model de- programming problem in which the power network is rep-
scribed above can be found in [31, 32, 35]. As mentioned resented by a DC power flow model:
earlier in this section, there are a number of different ways
this problem can be formulated. One thing all models
have in common, however, is the fact that, once the deci- min t1 = Li; Ci;Rij + L QiTi
sion variables a defined (or de investment variables), the
problem is reduced to the calculation of the power flows B(z + ,,°)8 + 9 + r = d
in a radial network. ( X · · + 'Vl!.) 18· - 8·1 < (x·· + 'Vt?):t..
"' II' '& 1 - '1 '" 'PI,
In order to illustrate the workings of the iterative ap- O~g~g
proximation algorithms the small size system studied in
[31], shown in Fig. 4, is used. This a totally islanded sys- O~r~d
tem. In practice, normally an existing system is available os ni; s iii;
as an starting point, which in general makes things a bit
easier. The objective is to select both the substations and where:
the feeders that will satisfy the operation constraints and Cij - Cost of the addition of a circuit in branch i-j.
which present the minimum cost (the summation of both
fixed and variable costs). Xi; - Total susceptance in branch 1-j.


..........................................!:::::::: ~

Fig. 5. Example of a complex transmission expansion problem: The Brazilian North-Notheastem Network.

B(.) - Susceptance matrix. size systems, although they present excessive computa-
tional burden when applied to larger problems (one hun-
() - Vector of nodal voltage angles. dred decision variables or more) such as the one illustrated
,0 _Vector of initial susceptances, whose elements are in Fig. 4.
Under these circumstances it was only natural to expect
"If.;, I.e, the summation of the susceptances in branch the recourse to iterative approximation algorithms such
i-j at the beginning of the optimization.
as the ones based on physical and biological metafores
(simulated annealing and genetic algorithms), or derived
nij -Number of circuits added in branch i-j: nij =
from artificial intelligence techniques (tabu search) [37,
Xij /,ij;
where 'Yij is the susceptance of the new cir-
cuits. 41].

¢ij - =
Defined as the ratio: 4>ij fij/'Yij; where Iii is the 3. CODIFICATION AND NEIGHBORHOOD STRUC-
maximum flow in a circuit i-j.
d - Vector of liquid demand.
In this section various codification approaches are dis-
g - Generation vector. cussed in connection with the four problems formulated
in the previous section, namely, the reconfiguration of dis-
9 - Vector of maximum generation capacity. tribution feeders, the optimal placement of capacitors in
distribution systems, the expansion of distribution sys-
r - Vector of artificial generations. tems and the expansion of transmission networks.

Q - Penalty parameter associated with loss of load

3.1 Reconfiguration of Distribution Feeders
caused by lack of transmission capacity.
This problem presents both binary variables (energized,
For a given set of decisions variables Xij or nij, Problem de-energized) and real valued variables such as power
(5) becomes linear programming problem: flows and voltage magnitudes. Consider the situation
depicted in Fig. 1. For this case a simple codification
will consist of a vector where the elements indicate which
(6) feeder sections are in operation and which ones are de-
energized. For example, the energized circuits are denoted
B(xk + ,°)8 + 9 + r = d by 1 and the de-energized ones denoted by 0, as follows:
(xt + 1'ij) 18i - 8;1 ~ (x~; + "Yij)4Jij

which is solved for testing the adequacy of a candidate This codification can be improved by ordering the feeder
solution; edequacyis indicated by zero loss of load. Notice sections such that the ones that are in operation appear
that Problem (5) is always feasible due to the presence of together, as indicated in the following:
the loss of load factor Ei Qiri in the objective function;
thus whenever a tentative solution set Xij is inadequate,
feasibility is achieved by the use of artificial generators
(loss of load).

Remarks: Up to the early 1990's, two main types of algo-

rithm were available for solving transmission network ex- A variation of the previous codification consists in stor-
pansion problems: (1) constructive heuristic algorithms, ing in the codification vector only the feeder sections that
and (2) classical optimization algorithms, mainly the ones form part of the forest. For example:
based on Benders decomposition. These techniques usu- Feeder section ~
ally are able to find optimal solutions for small size net-
works. For medium sized networks, the constructive
heuristic techniques only provide local optimal solutions.
As for larger and more complex networks, the solutions Alternatively, only the feeder sections that are currently
can be very poor. On the other hand, the Benders decom- de-energized are stored in the codification vector. All the
position based methods can work fine for certain medium above codifications are practically equivalent regarding
the performance of simulated annealing and tabu search the load level. Only one type of bank is considered, and
algorithms, although the differences can be significative load duration curve is discretized, for example, in three
for evolutive algorithms such as genetic algorithms. different levels: high, medium and low. Under this con-
In order to evaluate the objective function for a partic- ditions the, the codification vector can be expressed as
ular radial configuration (a candidate solution) it is nec- follows:
essary to solve a radial power flow problem to calculate
Bus -+
power flows and voltage magnitudes in the primary feeder 1 2 4 5 6 7 9 10
sections that are energized. This will normally represent
a substantial part of the computational effort required to PI = 0
solve the optimal reconfiguration problem.
For each type of codification, several different neigh- where one capacitor banks have been added to the bus 3,
borhood definitions can be used. This of course depends two to the bus 5, and three to the bus 6. No banks have
on the way the neighbors of the current topology (or con- been added to the remaining buses.
figuration) are characterized. For example, a very sim- When the addition of variable banks are considered, the
ple neighborhood structure can be defined for the first complexity of the formulation increases substantially. For
codification described above: a neighbor is obtained by example, if the load levels are considered, the number of
swapping a de-energized feeder by a energized one selected capacitor connected to the network can vary with the load
among the sections that form part of the closed path cre- level considered. Not only the codification becomes more
ated by the addition of the previously de-energized feeder complicated, but the size of the problem is increased as
section. In this case the size of the neighborhood depends well. In this case the codification can be reformulated as
directly on the number of de-energized feeder sections, follows:
that are assumed to be the sections that when energized
will close a loop in the primary feeder graph, as illustrated
in Fig. 1.

Remarks: A good codification approach should in princi-

ple make it easier to deal with infeasibilities. A critical Levell Level 2
type of infeasibility tha can occur in connection with the
optimal reconfiguration problem concerns the radiality of
the candidate topologies; DOD-radial topology being con- where two capacitor banks have been added to the bus
sidered infeasible. Another type of infeasibility is related 2, for the first load level, and so on. Alternatively, three
to the violation of operations constraints (voltage and flow codification vectors can be used, one for each load level,
limits). as follows:
Simulated annealing and tabu search algorithms nor- Bus -+
mally consider only feasible configuration when they tra- 1 2 10
verse the search space to find a optimal solution for the
reconfiguration problem. Similar strategy is usually fol-
lowed in evolutive algorithms, but is this case special care
must be taken in order to avoid introducing infeasibilities P1M
when crossover occurs.

3.2 Optimal Capacitor Placement

Like the reconfiguration problem discussed earlier in this To evaluate the cost associated with each proposal,
section, the capacitor placement problem also has two three different power flow problems have to be solved,
types of variables: discrete (the decision variables) and that is, one per each load level.
continuos (voltage magnitudes and power flows). The A simple way of defining the neighborhood of a eonfig-
codification in this case will be addressed with the help uration for the capacitor placement problem codified as
of Fig. 3. for example, the codification vector can have described above is as follows: (1) remove a bank from a
indices i corresponding to the the buses where capacitor bus and adding to another bus (swapping), (2) addition·
additions can be made such that the content of the i-th el- of a bank to a bus, and (3) removal of a previously added
ement is the number of capacitor banks that is placed for bank. In the case of variable capacitors, one should also
a certain proposal, for each level of load. Consider firstly consider that the number of capacitor in operation at each
the allocation of fixed capacitors, in which case the capac- bus for a lower level of the load should be less than the
itor banks are assumed to operate the same way regardless number of capacitors for the high load condition. Is this
constraint is not satisfied, Le., if there are more capaci-
tors in operation in the intermediate level than there is
in the higher load level, the corresponding configuration
is considered to be infeasible. 4

Remarks: In addition to the type of infeasibility men-

tioned above, other types of infeasibility can involve load-
ing constraints and voltage magnitude limits. As a rule,
the addition of capacitors improves the overall operation
conditions of a distribution system and so if one starts
from a system that is already feasible, the addition of ca-
pacitors will normally keep the feasibility conditions. The 5
common practice in the capacitor placement problem is
to keep feasibility while the selected algorithm traverses
the search space (This contrasts with the transmission Fig. 6. A possible solution for the expansion problem of
network expansion planning problem where the transi- Fig. 3.
tion through unfeasible solutions is frequent.) Both sim-
ulated annealing algorithm and tabu search usually con-
sider only feasible alternatives, as mentioned above. The substation 2, and the remaining feeders, namely, 4, 6, 8,
same is true regarding evolutive algorithm, although in 10, 11, 12 and 13 do not form part of that solution (these
this case care must be exercised in order to avoid that feeders are not added).
processes that introduce more radical changes to the ex- This codification can be improved by ordering the feeder
isting configurations, such as happen with crossover, will sections as indicated in the following:
create infeasible configurations. Feeder -+
1 2 3 5 14 15 7 9 4 6 8 10 11 12 13

3.3 Distribution System Expansion Planning

This problem has some similarities with the reconfigura-
tion problem discussed earlier in this section: while in the Another possibility would be to store the feeder-
reconfiguration problem the solution is a tree with spec- substation connections in separate vectors as follows:
ified cardinality k (or a forest formed by k independent
trees), in the expansion planning problem a tree with un- PIA =
determined cardinality is sought (the cardinality will be
known only when the solution is found). A difference
between the two problems is that in the reconfiguration
problem all substations will form part of the proposed
solution whereas in the expansion problem not all substa- PIC =
tions will necessarily form part of the optimal solution.
In general, the distribution expansion problem is much where the third vector contains the feeeders that are not
more complex that the reconfiguration problem since the added to that solution.
first problem involves both variable and fixed cost over a Like the two other problems discussed before, the dis-
period 0 time. tribution planning problem involves the solution of radial
A possible codification for this problem uses a vector power Bow problems in order to evaluate each candidate
in which the feeders are associated with the substations: configuration.
the i-th element of the vector contains either a zero, if the A possible definition of neighborhood for this problem
feeder is not added, or the number of the substation to consists in defining a neighbor of the current configuration
which it is connected. This codification can be illustrated as a configuration obtained by removing one feeder and
with help of a possible configuration for the problem sum- adding another one that does not form part of the cur-
marized in Fig. 1 and Table 1. rent configuration, provided that the radiality constraint
Feeder -+ remains is satisfied. Other possibilities for defining neigh-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
borhoods consists in allowing infeasible configurations,
1'1= which are then penalized in the objective function. The
neighborhood structure also should take into account the
where the feeders 1, 2, 3, 5, 14 and 15 are connected to possibility that a neighbor corresponds to a configuration
the substation 1, the feeders 7 and 9 are connected to the in which the number of substations differs from that in
the current configuration. Notice, for example, that the Table 2 will suffice.) New circuits can be added to all
optimal solution given in Fig. 8 has only one active sub- right-of-ways (a total of 15 right-of-ways); the number of
station. additions per right-of-way is not limited. The optimal
solution for this example is shown in Fig. 11.
For example, the codification for the configuration of
the expansion proposal of Fig. 12 is as follows:
Circuit -+
1-2 1·3 1-4 . . . 2-6 4-6 5-6

where two circuits have been added to the right-of-way

2 - 6 and two circuits to the right-of-way 4 - 6. From the
operations stand point this is an unfeasible solution since
it presents loss of load of w 158.2 MW.
Cire. Bus Bus nt, Circ. Bus Bus nii
1 1 2 1 9 2 6 0
Fig. 7. Optimal solution for the problem of Fig. 3. 2 1 3 0 10 3 4 0
3 1 4 1 11 3 5 1
4 1 5 1 12 3 6 0
5 1 6 0 13 4 5 0
Remarks: Like the other problem studied in this chapter 6 2 3 1 14 4 6 0
there are several types of infeasibility that can occur in 7 2 4 1 15 5 6 0
connection with the distribution system planning prob- 8 2 5 0
lem. Even if a topology satisfy the radiality constraint,
operation constraints can be violated. The algorithm ap- Table 2. Data for the 6-bus network of Fig. 8
plied to this problem then can be of the type that accept
In this problem the neighborhood structure is fonned
only feasible-to-feasible transitions or of the type that ac-
by the configurations that result from the current config-
cept transitions through infeasible solutions (Herein in-
uration by (1) adding a new circuit, (b) removing a pre-
feasibility is considered in the sense that constraints are
viously added circuit, and (c) by swapping two circuits
not satisfied; of course, if the constraints are represented
(adding a new one and removing another one that has
as penalties added to the objective function, all config-
been added before). Once the neighborhood is defined, in
uration become feasible.) Both simulated annealing and
the simulated annealing algorithm randomly choses one
tabu search normally perform transitions between radial
neighbor and the acceptance test is carried out to check
configurations; as for infeasible configurations they can al-
whether the candidate configuration will become the new
ways be treated with penalties as described above. As for
current configuration or not. The technique proposed in
the evolutive algorithm they normally tend to introduce
[2, 4] goes as follows:
non-radial configuration by mechanisms such as the con-
ventional crossover. In this regard, it may be worthwhile 1. While the current topology presents loss of load, the
see the remarks made in the following in connection with generation of neighbors is made in the order addition-
the transmission network expansion planning problem. swap-removal, where the swapping and the removal
of circuits is performed only when the configuration
3.4 Transmission Network Expansion Planning that results from the addition of a new circuit does
not pass the acceptance test.
A codification for this problem was suggested in [2],where
only the integer variables (the number of circuits that can 2. When the current topology does not present loss of
be added in a right-of-way) are codified, the real variables, load, the generation of neighbors follows the order
such as the angle voltages, being determined from the so- removal-swap-addition, where the swap or addition
lution of the linear program formulated in Eq. (6). A of circuits is tried only if the removal fails.
configuration is then characterized by an n-vector, where
n is the number of right-of-ways where new circuit addi- For networks with a high degree of islanding, such as
tions are allowed. Figure 8 shows the initial configuration the example shown in Fig. 5, the neighborhood structure
(or the basic configuration) for the 6-bus network (the has to consider the possibility of adding multiple circuits
complete set of data can be found in [12]; For the pur- at a time in order to become more efficient. For example,
poses of this chapter, however, the data summarized in a set of transformers and transmission lines that create a
path between a newload or generation bus and the rest of
TS = 240 91 = 50 the network can be considered as a single addition for the
240 80 purpose of defining the neighborhood of mthe current con-
5 ....... 37.903 1 figuration. For a more comprehensive explanation about
these substructures, called paths, see [2, 4].
In the tabu search algorithm, the entire neighborhood
has to be evaluated in order to determine the best move.
Thus in this approach controlling the size of the neighbor-
hood is critical, what does not happen with the simulated
annealing approach where only one neighbor is generated
and evaluated.

Remarks: In the transmission expansion planning prob-

lem it is frequently the case that the entire neighborhood
2.258 + of a given configuration is formed only by infeasible config-
urations (configurations that present certain level of loss
vi =0 of load). Thus in algorithms such as simulated annealing
it can be highly inefficient to allow only moves to feasible
solutions. Hence in practice infeasible transitions are per-
mitted, the infeasibilities being penalized in the objective
function (The loss of load multiplied by a penalty factor
is added to the cost function as indicated in Eq.550.)
91 =
240 80
--...--.... 5 -+- 52.999 1

n35 =1
Fig. 8. Basic configuration of the 6-bus network.
93 ~
= 165 187.001
rs = 120.469


vi = 200

w = 158.239
vi = 120 ..... 188.118

Fig. 10. Optimal configuration for the transmission net-

work expansion problem of Fig. 8.
....... 186.761
.................... 4

Fig. 9. A solution proposal for the transmission network
Special features of the the algorithms addressed in the
expansion problem of Fig. 8.
chapter are further discussed in this section.
4.1 Simulated Annealing 4.2 Genetic Algorithms
A modified genetic algorithm applied to the reconfigu-
ration problem is described in [26]. The codification
A application of SA to the capacitor placement problem
adopted in this paper departs from the one described pre-
was addressed in [16, 17], where the codification described
viously in this chapter and so.it is summarized in the fol-
in Section 3.2 has been adopted. Reference [16] also sug-
lowing. In this case a topology is identified by the feeder
gests the elimination of trivially infeasible solutions by
sections that are no in operation. For the topology of Fig.
avoiding placing more capacitor banks at a given bus for
1, the codification is as follows:
the intermediate load level than is added for the high load
level. In that paper the concept of compound neighbor- Feeder section

hood is used, according to which, the neighborhood of the
current topology is found by (1) the addition/removal of
singler banks, (2) the addition/removal of multiple banks,
and (3) the swapping of the banks added to two buses. In [26] the objective function is put into the fonn z(x) =
In the SA algorithm, in each iteration, only one neigh- K - f(x) to transform the minimization problem into a
bor is randomly chosen from the neighborhood as defined maximization problem, and proportional section is per-
above for further evaluation. Additionally, for variable formed. Elitism is used for selection and the four best
banks, two strategies are proposed for changing the num- configuration are kept in the next population, the n" - 4
ber of allocated banks: (1) synchronous changes where remaining members of the population being generated by
the number of capacitor banks is change for all load levels recombination and mutation. The crossover operator dif-
at the same time, and (2) asynchronous changes where fer from other proposals that have appeared in the liter-
the changes are performed only for the highest load level. ature, which is consistent with the codification that has'
In the capacitor placement problem the number of infea- been adopted. In the crossover operation the common
sible topology in a given neighborhood is relatively small, elements in both participating configurations a kept in
if compared with other problems such as the transmis- the two descendents, the differing elements being trans-
sion network expansion problem. Thus, the SA algorithm ferred by a random mechanism. For example, consider
traverses the search space stepping only on feasible so- the crossover between the two following topologies:
lutions, infeasible candidates being identified and elimi-
nated before the execution of the acceptance test. The Pl=~
cooling schedule is the standard one regarding fixed ca-
pacitor banks, whereas for variable banks, a local cooling P2=~
schedule is used to find the banks that should operate for
each load level (the sizing problem). This local cooling As 26 is common to both topologies it is passed to the
schedule is normally responsible for most of the compu- two descendents; the other elements, namely, 15, 17, 19 e
tational effort spent by the SA algorithm. 21, are then passed to the descendents by random choice;
for instance, 17 e 19 go to the first descendent and 15 e
Sequential and parallel SA algorithms applied to the
21 to the second, resulting the following:
transmission network expansion problem are described in

[2] and [3], respectively (The paralle implementation is
discussed in Chapter 6 of this tutorial). In both cases,

codification and neighborhood structure are as described
in Section 3.4 (see also Chapter 6 of this tutorial). In
[2], transitions through infeasible topologies are allowed,
since the occurrence of such topologies is much more fre- Although in this example the two resulting topologies are
quent than it is in the capacitor placement problem dis- radial, Figs. 1 and 2, this is not always the case: in
cussed above. The SA algorithm can start with a topol- the previous example, that would have occurred if 15 and
ogy generated randomly or determined by a constructive 19 had been passed to one descendent and 17 and 21 to
algorithm, although the SA algorithm has the tendency the other in which case both resulting topologies would be
to destroy the initial topology in the early stages of the non-radial. When this happens, the topologies are altered
cooling process, where the temperature is still high and to maintain radiality. Mutation also is designed with ra-
uphill movers are accepted with higher probabilities, it diality in mind. As mentioned before, mutation is per-
has been observed in practice that the results are slightly formed by randomly chasing a feeder sention to be added
better when the initialization is made by a constructive and another other belonging to the resulting closed loop
algorithm. Best performance was observe with Garver's to be removed. Consider for example that the following
algorithm [4, 12]. topology is subjected to mutation:
Pl=~ where it is indicated that no circuit has been added to the
feeder section 1, a type 2 circuit has been added to the
feeder section 2, a type 1 circuit has been added to the
If branch 19 is selected for addition, then one of the fol-
feeder section 3, and so forth. Also, a substation of type 1
lowing branches can be removed: 11, 12, 15, 16 and 18(see
has been added to the substation location 1. As a matter
Figs. 1 and 2). The following topology then results:
of fact the substation type is linked to the type of circuit
leaving that substation and so is not actually needed, so
P1=~ that when the circuit is built the substation also is. This
can be taken into account by the following codification,
It is also worth mentioning that in [26] is used a variable which is taken from [35]. In Fig. II(a) are shown the
mutation rate controlled by a SA mechanism and that the initial topology along with the expansion alternatives and
initial configuration is obtained by performing n p -1 tran- in Fig. 11(b) an expansion alternative. According to [35]
sitions with the mutation mechanism described above. the codification for this solution proposal is as follows::
The software GENESIS is used in [18] to develop an
L1 L2 L3 L4 L5 82
application of GA to the capacitor placement problem.
Is this application the most attractive candidate buses Pl=~
for capacitor bank additions are pre-select by sensitivity
analysis, which is helpful in reducing the dimension of the
combinatorial problem being solved. A variation of GA
called mimetic algorithm was used in [19] to solve the
capacitor placement problem considering a three-phase
network model which allow the modeling of unbalanced
systems. Mimetic algorithms are based on GA with a lo-
L1 ~
cal optimization phase in which, once a new population
is found, it is improved by means of local search. The L3 :
algorithm is applied in two stages: an initial stage which ···..··..··..···············0
consists of a conventional GA and a second stage which ::
performs the local search. In the first stage infeasibilities ; L5 L2:~
are treated by penalties; in the second stage only feasible
~ 14 •• :i•• L4
transitions are considered. The codification used in this 0 ------------4 52 •• ·: ••••:·
(a) (b)
application is based on a vector with four parts: (1) in
the first part are stored the candidate buses for capacitor
placement, (2) a second part with the binary representa-
tion of the capacitors that are added per bus for each load Fig. 11. Distribution network: (a) initial network and (b)
level, (3) a thied part similar to the first one but contained a solution proposal.
candidates for replacement, and (4) a fourth part similar
to the second one but containing information about the The genetic operator of crossover can generate non-radial
way banks are operated in replacement. The initial config- topologias. Thus, for example, for the case shown in Fig.
uration is randomly chosen. Crossover is applied to each 11, the following codification represents a non-radial s0-
substring of the codification vector. Remainder stochastic lution:
sampling is used to perform proportional selection. Ll L2 L3 L4 L5 82
The most critical issue in practical application of GA to
the distribution expansion problem is codification. Two
P2 = ITIillliIillJ
alternative approaches to codification originally proposed
in [33, 34] are discussed in the following. In [33] various Another interesting codification for the distribution plan-
types of cables used in primary feeders as well as different ning problem was suggested in [34]. This can be illus-
types of substation equipment are considered in the op- trated by the situation shown in Fig. 12, which was orig-
timization process. In this case the investment decisions inally studied in [35J.
are no longer modeled by binary variables as described In the method of [34] the main idea is to keep the binary
earlier in the paper. In [33] for example, a case with two codification and at the same time minimize the occurenee
substation types and 12 feeder sections, can be codified of infeasible topologies. Also, decodification is an issue
as follows: in order to facilitate the evaluation of the objective func-
tion. The relevant information is then stored in a per bus
structure, which requires the proper number of bits to
............ -'---"'--01-....r.-~..-.-..[ill]
......... keep all the information regarding the feeders connected
Bl B2 Bl B2
: ••••:
;. .f(.
: 81 Z•••••••.•••••••••.••••

.. .."
ILl L1
o 11 1
1 I1
81 L1 L3 Ll nil nil
Pl =
~B3 L3 B4~ L6 L6 L5 L2 L4 82
L~··················~2 L7 L7 L3 L5 L2
j B5 L4 B6:.J..: B5
0··························: 82 :
(a) : ••••:
- 1 GA applied to the transmission network expansion
problem was addressed in [6] and [7]. In [6] a decimal
codification was proposed for representing the network
topologies, the real part of the problem being solved via
Fig. 12. Rede de distribui,c 00: (a) rede inicial, (b) pro- linear program. The decimal representation requires an
posta de solu,c ao adaptation of the mutation mechanism. The binary repre-
sentation is avoided in this case due to problems caused by
the generation of infeasible solutions. This makes the cod-
to a bus. The idea behind this codification can be summa- ification used in GA, similar to the the pones that are used
rized as follows: (1) the number of circuits in operation is in SA and TS. To allow the used of proportional selection,
the number of buses minus one, (2) a bus load should be in [6] two types of transformations have been tested for
connected to at least one circuit, and (3) a bus without creating a maximization problem: (1) z = 1/11 and (2)
load can be connect to zero, one or more circuits. Figure z = K - v, where K is a parameter which is determined
12 can be used to illustrate this type of codification: the such that it is always bigger than the cost associated with
part (a) of the figure shows the system that should be ex- the worst configuration found in the current population.
panded, and the part (b) a solution proposal. According As with the capacitor placement problem describe above,
to [35] the codifiation in this case should be put in the remainder stochastic sampling was used both in [6]and in
following format: [7]. In [6] both single point and multiple point crossover
were used. although for small and medium size networks
the conventional crossover approach works fien, for larger
Bl B2 B3 B4 B5 B6 network, such as the one of Fig. 5, building blocks are
010 1 110 all
010 011 . used to facilitate the generation of meaningful configura-
tions (this build blocks are formed by sets of lines and
81 Ll L3 Ll nil nil
PI = transformers that connected islanded buses to the main
L6 L6 L5 L2 L4 S2 networks). Due to decimal codification, mutation is mod-
ified as follows: entire paths are considered for mutation
L7 L7 L3 L5 L2
and an acceptance test of the type used in SAis performed
L4 to avoid introducing configurations that are too ineffec-
tive. Unlike SA, the initial configurations have a definite
effect on the efficiency of GA when applied to the trans-
where the binary numbers represent the topology of Fig. mission network expansion problem. Thus the heuristic
12(b). In this case nil indicates that the bus will not algorithms of [12, 13, 14] have been used to generate a
be supplied, i.e., it will remain isolated. For example, good initial population, that is, configurations containing
in the previous codification, if B3 is represented by the a number of attractive building blocks.
binary number 10, then it means that the feeder section
L7 is connected to B3. On the other hand, B6 occupies
4.3 Tabu Search
two slots with four options, and represents a plausible
combination. However, if after crossover or mutation, B3 A basic TS algorithm with short term memory, tabu list
is represented by 11, then it would mean a non-existent and aspiration criterion applied to the capacitor place-
configuration. Finally, notice that this codification can ment problem is reported in [20]. In this application, long
reduce the number of infeasible topelogies, that are gen- term memory based on transition frequency has also been
erated by crossover and mutation, although the this possi- used for diversification. As in [18], sensitivity analysis is
bility is not entirely eliminated, as shown by the following used to select a set attractive buses for capacitor bank ad-
example: ditions, which reduces de size of the serve space but can
also affect solution optimality. The codification is made and (3) another Ts strategy for leaving the feasible re-
with the help of a vector set which contain the following gion. The overall metaheuristic is formed by the repeated
informations: candidate bus location, capacitor setting, application of this three-step procedure plus diversifica-
installed capacity, power loss, voltage magnitude, objec- tion. for a comparative evaluation of the various iterative
tive function value and frequency counter. A tabu list approximation methods applied to the transmission net-
with size 15 and an aspiration criterion that established work expansion problem see [5], where both theoretical
that relaxes the tabu restraint if the solution found is bet- and practical aspects are addressed.
ter than the incumbent. The initial topology is feasible
and found by a random method. In [21] a hybrid TS al-
gorithm with features taken from heuristic methods, GA
and SA is suggested for the capacitor placement problem. This chapter addressed the codification and the neighbor-
The hybrid algorithm works with a population as in GA hood definition for four power network problems: optimal
and works in two phases: (1) phase I is an heuristic strat- reconfiguration of distribution systems aiming to optimize
egy that finds a variety of high quality feasible topologies operations, capacitor placement in primary distribution
and, (2) phase II is TS strategy. In phase I topologies are feeders, optimal distribution system expansion (the addi-
found via sensitivity analysis, and variations around these tion of substations and feeders), and the optimal expan-
topologies are found by repeat application of the heuris- sion of transmission networks. Although only three types
tic method and prohibiting the use of specified banks for of randomized iterative algorithms have been considered
obtained different solutions. Part of these topologies will namely, simulated annealing, genetic algorithm and tabu
be elite configurations. All buses that appear in these search, the conclusions can be extended to other methods
solution will be considered as candidates of the reduced such as simulated evolution among others.
candidate set in Phase II which has three steps: (1) for
each element of the population is applied a TS with short
term memory, tabu list and aspiration criterion, and the REFERENCES
solutions found can either become the new incumbent or
enter the elite configuration list, provided that it is better [1] Romero R., Monticelli A.: "Fundamental of Sim-
than at list one of the configuration already in the list ulated Annealing" , IEEE PES Tutorial on Modem
and has a minimum number of attributes that differ from Heuristic Optimization Techniques - Application to
the configurations that will remain in the list; (2) gener- Power Systems, 2001.
ate a new population by genetic recombination and path [2] Romero R., Gallego R.A., Monacelli A.: "Trans-
relinking of the current elite configurations. The neigh- mission System Expansion Planning by Simulated
borhood structure is as defined in Section 3.2. the number Annealing" , IEEE Transactions on Power Systems,
of neighbors is kept at a minimum given that the evalua- Vol. 11(1), pp. 364-369, February 1996.
tion of each one of the neighbors requires the solution of
a power flow problem. [3] Gallego R.A., Altles A.B., Monticelli A., Romero
R.: "Parallel Simulated Annealing Applied to Long
References [8, 9, 10] report the application of the TS Term Transmission Network Expansion Planning" "
method to the transmission network expansion problem. IEEE Transactions on Power Systems, Vol. 12(1),
In [8] and [9] the codification is the same as in [5] and pp. 181-188, February 1997. .
so the same observations made regarding SA and GA al-
gorithms hold true for the TS application. The neigh- [4] Gallego R.A.: "Planejamento a Longo Prazo
borhood structure used in [8] comprises three parts: (1) de Sistemas de Transmissio Usando Tecnicas de
one for the basic TS algorithm based on short term mem- Otimiz~ Combinatorial", Tese de Doutorado,
ory, and where infeasible transitions are allowed (methods UNICAMP, 1997.
[12, 13, 14] are used in this part), (2) other for the intensi- [5] Gallego R.A., Monticelli A., Romero R.: "Compar-
fication phase, where only feasible transitions are allowed ative Studies of Non-Convex Optimization Meth-
(a greedy search is carried out), and (3) another for the
ods for Transmision Network Expansion Planning" ,
diversification phase, where the strategic oscillation takes IEEE Transactions on Power Systems, Vol. 13(2),
the current solution back to the infeasible region (this May, 1998.
is achieved by simply removing one or more circuits of
the current configuration). In the three cases the sizes [6] Gallego R.A., Monticelli A., Romero R.: "Trans-
of neighborhoods are kept at a minimum containing only mission System Expansion Planning by Extended
high quality alternatives. In [8] a three-step strategic os- Genetic Algorithm" , IEE Proceedings