IEEE
02TP160
Modern Heuristic Optimization Techniques
with Applications to Power Systems
Sponsored by:
Edited by
K. Y. Lee
and
M. A. El-Sharkawi
From the Course Editors
Several heuristic tools have evolved in the last decade that facilitate solving optimization
problems that were previously difficult or impossible to solve. These tools include evolutionary
computation, simulated annealing, tabu search, particle swarm, etc. Reports of applications of
each of these tools have been widely published. Recently, these new heuristic tools have been
combined among themselves and with knowledge elements, as well as with more traditional
approaches such as statistical analysis, to solve extremely challenging problems. Developing
solutions with these tools offers two major advantages: 1) development time is much shorter than
when using more traditional approaches, and 2) the systems are very robust, being relatively
insensitive to noisy and/or missing data.
The purpose of this course is to provide participants with basic knowledge of evolutionary
computation and other heuristic optimization techniques, and of how they are combined with
knowledge elements in computational intelligence systems. Applications to power problems are
stressed, and example applications are presented. The tutorial is composed of two parts: the
first part gives an overview of modern heuristic optimization techniques, including fundamentals
of evolutionary computation, genetic algorithms, evolutionary programming and strategies,
simulated annealing, tabu search, and hybrid systems of evolutionary computation. It also gives
an overview of power system applications.
The second part of the tutorial deals with specific applications of the heuristic approaches to
power system problems, such as security assessment, operational planning, generation,
transmission and distribution planning, state estimation, and power plant and power system
control.
Evolutionary Computation:
The objectives are to provide an overview of how evolutionary computation and other heuristic
optimization techniques may be applied to problems within your domain of expertise, to provide
a good understanding of the design issues involved in tailoring heuristic algorithms to real-world
problems, to compare and judge the efficacy of modern heuristic optimization techniques against
other, more classic methods of optimization, and to program fundamental evolutionary algorithms
and other heuristic optimization routines.
Genetic Algorithms:
The Genetic Algorithm (GA) is a search algorithm based on the mechanics of natural selection
and genetics. It differs from other search techniques in several respects. First, the algorithm
searches many peaks in parallel along multiple paths, reducing the possibility of being trapped
in a local minimum. Second, GA works with a coding of the parameters instead of the parameters
themselves; the coding helps the genetic operators evolve the current state into the next state
with minimal computation. Third, GA evaluates the fitness of each string to guide its search
instead of evaluating the optimization function directly; it needs only the objective function
(fitness) value, with no requirement for derivatives or other auxiliary knowledge. Finally, GA
explores the regions of the search space where the probability of finding improved performance
is high.
Evolution Strategies:
Evolution Strategies (ES) employ real-coded variables and, in their original form, relied on
mutation as the sole search operator and a population size of one. Since then they have evolved
to share many features with GA. The major similarity between these two types of algorithms is
that they both maintain populations of potential solutions and use a selection mechanism for
choosing the best individuals from the population. The main differences are: ES operate directly
on floating-point vectors, while classical GAs operate on binary strings; GAs rely mainly on
recombination to explore the search space, while ES use mutation as the dominant operator; and
ES is an abstraction of evolution at the level of individual behavior, stressing the behavioral
link between an individual and its offspring, while GAs maintain the genetic link.
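To make the contrast concrete, the original single-parent form of ES can be sketched in a few lines of Python: a minimal (1+1)-ES with Gaussian mutation as the only operator, keeping the child only when it is at least as fit. The objective function and all parameter values below are illustrative, not taken from this tutorial.

```python
import random

def es_1plus1(f, x0, sigma=0.5, iters=2000, seed=1):
    """Minimal (1+1)-Evolution Strategy: a single real-valued parent,
    Gaussian mutation as the only operator, and selection that keeps
    the child only if it is at least as fit (maximization)."""
    rng = random.Random(seed)
    parent = list(x0)
    best = f(parent)
    for _ in range(iters):
        child = [xi + rng.gauss(0.0, sigma) for xi in parent]
        fc = f(child)
        if fc >= best:  # selection between parent and child
            parent, best = child, fc
    return parent, best

# Illustrative objective: a concave function with its maximum at (2, -1).
sphere = lambda x: -((x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2)
sol, val = es_1plus1(sphere, [0.0, 0.0])
```

Modern ES variants also adapt the mutation step size sigma during the run; the fixed step here is the simplest possible choice.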
Particle Swarm:
Tabu Search:
Tabu Search (TS) is basically a gradient-descent search with memory. The memory preserves a
number of previously visited states along with a number of states that might be considered
unwanted. This information is stored in a Tabu List. The definition of a state, the area around it
and the length of the Tabu list are critical design parameters. In addition to these Tabu
parameters, two extra parameters are often used: Aspiration and Diversification. Aspiration is
used when all the neighboring states of the current state are also included in the Tabu list. In that
case, the Tabu obstacle is overridden by selecting a new state. Diversification adds randomness
to this otherwise deterministic search. If the Tabu search is not converging, the search is reset
randomly.
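The ingredients above, a neighborhood of states, a Tabu list, and aspiration, can be sketched in a few lines of Python. The bit-flip neighborhood, the trivial objective, and all parameter values are illustrative choices, not the tutorial's; diversification is omitted for brevity.

```python
from collections import deque

def tabu_search(objective, n_bits=8, tabu_len=4, iters=50):
    """Minimal tabu search over bit strings.  Neighbors differ in one bit;
    recently flipped positions are tabu unless the move would beat the best
    state seen so far (aspiration)."""
    current = [0] * n_bits
    best, best_fit = current[:], objective(current)
    tabu = deque(maxlen=tabu_len)  # the Tabu List of recently flipped positions
    for _ in range(iters):
        candidates = []
        for i in range(n_bits):
            neighbor = current[:]
            neighbor[i] ^= 1
            fit = objective(neighbor)
            if i not in tabu or fit > best_fit:  # aspiration overrides tabu
                candidates.append((fit, i, neighbor))
        if not candidates:
            continue
        fit, i, current = max(candidates)  # best admissible move, even downhill
        tabu.append(i)
        if fit > best_fit:
            best, best_fit = current[:], fit
    return best, best_fit

# Illustrative objective: maximize the number of 1-bits.
sol, fit = tabu_search(lambda bits: sum(bits))
```

Because the best admissible neighbor is taken even when it is worse than the current state, the search can leave a local optimum, while the Tabu list keeps it from cycling straight back.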
Simulated Annealing:
In statistical mechanics, a physical process called annealing is often performed in order to relax
the system to a state with minimum free energy. In the annealing process, a solid in a heat bath
is heated up by increasing the temperature of the bath until the solid is melted into liquid, then
the temperature is lowered slowly. In the liquid phase all particles of the solid arrange
themselves randomly. In the ground state the particles are arranged in a highly structured lattice
and the energy of the system is minimum. The ground state of the solid is obtained only if the
maximum temperature is sufficiently high and the cooling is done sufficiently slowly. Based on
the annealing process in statistical mechanics, Simulated Annealing (SA) was introduced
for solving complicated combinatorial optimization problems.
The name 'simulated annealing' originates from the analogy with the physical process of solids,
and the analogy between the physical system and simulated annealing is that the cost function
and the solution (configuration) in the optimization process correspond to the energy function
and the state in statistical physics, respectively. In a large combinatorial optimization
problem, an appropriate perturbation mechanism, cost function, solution space, and cooling
schedule are required in order to find an optimal solution with simulated annealing. SA is
effective in network reconfiguration problems for large-scale distribution systems, and its
search capability becomes more significant as the system size increases. Moreover, a cost
function with a smoothing strategy enables simulated annealing to escape more easily from local
minima and to rapidly reach the vicinity of an optimal solution.
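A minimal Python sketch of the ingredients just listed (a perturbation mechanism, a cost function, and a geometric cooling schedule) follows; the one-dimensional cost surface and all parameter values are illustrative, not drawn from the tutorial.

```python
import math
import random

def simulated_annealing(cost, x0, t0=10.0, alpha=0.99, steps=4000, seed=3):
    """Minimal simulated annealing: a random perturbation is always accepted
    if it lowers the cost, and accepted with probability exp(-delta/T)
    otherwise; the temperature T falls geometrically (the cooling schedule)."""
    rng = random.Random(seed)
    x, cx, t = x0, cost(x0), t0
    best, best_cost = x, cx
    for _ in range(steps):
        candidate = x + rng.uniform(-1.0, 1.0)  # perturbation mechanism
        c = cost(candidate)
        delta = c - cx
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, cx = candidate, c
            if cx < best_cost:
                best, best_cost = x, cx
        t *= alpha  # cooling
    return best, best_cost

# Illustrative one-dimensional cost surface with local minima; the global
# minimum is 0 at x = 0.
bumpy = lambda x: x * x + 3.0 * (1.0 - math.cos(2.0 * x))
sol, c = simulated_annealing(bumpy, x0=6.0)
```

The high initial temperature plays the role of the melted state: early on, uphill moves are accepted freely, and only as the temperature falls does the search settle into a low-cost configuration.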
For further information, please contact
K. Y. Lee
Department of Electrical Engineering
The Pennsylvania State University
University Park, PA 16802
Phone: (814) 865-2621
Fax: (814) 865-7065
email: kwanglee@psu.edu
M. A. El-Sharkawi
Department of Electrical Engineering
University of Washington
Seattle, WA 98195-2500
Phone: (206) 685-2286
Fax: (206) 543-3842
email: elsharkawi@ee.washington.edu
In memory of Alcir J. Monticelli (1946 - 2001):
Alcir Jose Monticelli was born on November 16, 1946 in Rio Capinzal, Santa Catarina, Brazil. He was a
Fellow of the IEEE and a member of the Brazilian Academy of Sciences. He received his B.S. degree in
electronic engineering from the Instituto Tecnológico de Aeronáutica (ITA) in 1970, the M.S. degree from
Universidade Federal da Paraiba (UFPB) in 1972, and the Ph.D. degree from Universidade Estadual de
Campinas (Unicamp) in 1975, all in Brazil. From 1982 to 1985, he was a visiting professor at the
University of California Berkeley where he worked on theoretical aspects of network analysis, and from
1991 to 1992 he was with Mitsubishi Electric Corporation, Japan as a researcher of the artificial
intelligence and parallel computing group. He had been a professor of electrical engineering at Unicamp
since 1972.
"Being a professor wasn't just a profession for him. It was a way of life: he used to observe everything
and teach all the time. He had a great pleasure living that way." - Isadora Monticelli
Everyone who worked with him corroborates the words of Alcir's daughter. He was the author of
three books on power systems and had more than 40 articles published in international journals,
transactions and proceedings with more than 500 citations according to the Science Citation Index. He
was a collaborator of the National Science Foundation and a participant in most conferences in the power
engineering area.
"Alcir Monticelli was a very important academic leader"  words of Carlos Henrique de Brito Cruz,
president of FAPESP (Fun~ao de Amparo a Pesquisa do Estado de Sao Paulo)  the State of Sao Paulo
Research Foundation. Alcir was an active collaborator with several projects, and he was the mentor of
Small Business Innovation Research program in the State of Sao Paulo by FAPESP. His creativity and
intelligence is present in the way the power system is treated nowadays. Load flow, state estimation,
security analysis, and network planning, had undergone' great advances with his contributions. The
recognition for his contributions came with the honor of IEEE Fellow (1996), Engineer of the Year in
Latin America (1997) and with the IEEE Third Millennium Medal (2000). As a professor, researcher,
writer and a man engaged with technological innovation, he never neglected his family, which was the
source of constant strength and joy to him. With his wife, Maria Stella, he left three lovely daughters,
Viridiana, Isadora, and Eleonora, who are all successful. He will always be remembered, and when
problems arise in power systems we will deeply miss his discussions.
List of Contributors
Table of Contents
Page
Part 1: Theory of Evolutionary Computation
Chapter 1. Theory of Evolutionary Computation 1
Chapter 2. Overview of Applications in Power Systems 16
Chapter 3. Fundamentals of Genetic Algorithm 24
Chapter 4. Fundamentals of Evolution Strategies and Evolutionary Programming 33
Chapter 5. Fundamentals of Particle Swarm Techniques 45
Chapter 6. Fundamentals of Simulated Annealing 52
Chapter 7. Fundamentals of Tabu Search 67
Chapter 8. Hybrid Systems: An Example with Fuzzy Systems 81
Chapter 1
Theory of Evolutionary Computation
2. Calculate fitness for each individual in the population,
3. Reproduce selected individuals to form a new population,
4. Perform crossover and mutation on the population, and
5. Loop to step 2 until some condition is met.

In some GA implementations, operations other than crossover and mutation are carried out in step four.

4.3 A Simple GA Example Problem

Because implementing a "plain vanilla" GA paradigm is so simple, a sample problem (also simple) seems to be the best way to introduce most of the basic GA concepts and methods. As will be seen, implementing a simple GA involves only copying strings, exchanging portions of strings, and flipping bits in strings.

Our sample problem is to find the value of x that maximizes the function f(x) = sin(πx/256) over the range 0 ≤ x ≤ 255, where values of x are restricted to integers. This is just the sine function from zero to π radians. Its maximum value of 1 occurs at π/2, or x = 128. The function value and the fitness value are thus defined to be identical for the sample problem.

There is only one variable in our sample problem: x. It is assumed for the sample problem that the GA paradigm uses a binary alphabet. The first decision to be made is how to represent the variable. This has been made easy in this case since the variable can only take on integer values between 0 and 255. It is therefore logical to represent each individual in our population with an eight-bit binary string. The binary string 00000000 will evaluate to 0, and 11111111 to 255.

It next must be decided how many individuals will make up the population. In an actual application, it would be common to have somewhere between a few dozen and a few hundred individuals. For the purposes of this illustrative example, however, the population consists of eight individuals.

The next step is to initialize the population. This is usually done randomly. A random number generator is thus used to assign a 1 or 0 to each of the eight positions in each of the eight individuals, resulting in the initial population in Figure 1. Also shown in the figure are the values of x and f(x) for each binary string.

After fitness calculation, the next step is reproduction. Reproduction consists of forming a new population with the same total number of individuals by selecting from members of the current population with a stochastic process that is weighted by each of their fitness values. In the example problem, the sum of all fitness values for the initial population is 5.083. Dividing each fitness value by 5.083, then, yields a normalized fitness value fnorm for each individual. The sum of the normalized values is 1.

These normalized fitness values are used in a process called "roulette wheel" selection, where the size of the roulette wheel wedge for each population member, which reflects the probability of that individual being selected, is proportional to its normalized fitness value.

The roulette wheel is spun by generating eight random numbers between 0 and 1. If a random number is between 0 and .144, the first individual in the existing population is selected for the next population. If one is between .144 and (.144 + .093) = .237, the second individual is selected, and so on. Finally, if the random number is between (1 − .128) = .872 and 1.0, the last individual is selected. The probability that an individual is selected is thus proportional to its fitness value. It is possible, though highly improbable, that the individual with the lowest fitness value could be selected eight times in a row and make up the entire next population. It is more likely that individuals with high fitness values are picked more than once for the new population.

The eight random numbers generated are .293, .971, .160, .469, .664, .568, .371, and .109. This results in initial population member numbers 3, 8, 2, 5, 6, 5, 3, and 1 being chosen to make up the population after reproduction, as shown in Figure 2.
0 1 1 0 0 0 1 1
0 0 1 1 0 1 0 1
1 1 0 1 1 0 0 0
1 0 1 0 1 1 1 0
0 1 0 0 1 0 1 0
1 0 1 0 1 1 1 0
0 1 1 0 0 0 1 1
1 0 1 1 1 1 0 1

Figure 2: Population after reproduction.

The next operation is crossover. To many evolutionary computation practitioners, crossover of binary encoded substrings is what makes a genetic algorithm a genetic algorithm. Crossover is the process of exchanging portions of the strings of two "parent" individuals.

An overall probability is assigned to the crossover process, which is the probability that, given two parents, the crossover process will occur. This crossover rate is often in the range of .65 to .80; a value of .75 is selected for the sample problem.

First, the population is paired off randomly into pairs of parents. Since the order of the population after reproduction in Figure 2 is already randomized, parents will be paired as they appear there. For each pair, a random number is generated to determine whether crossover will occur. It is determined that three of the four pairs will undergo crossover.

Next, for the pairs undergoing crossover, two crossover points are selected at random. (Other crossover techniques are discussed later in this tutorial.) The portions of the strings between the first and second crossover points (moving from left to right in the string) will be exchanged. The paired population, with the first and second crossover points labeled for the three pairs of individuals undergoing crossover, is illustrated in Figure 3a prior to the crossover operation. The portions of the strings to be exchanged are in bold. Figure 3b illustrates the population after crossover is performed.

Note that, for the third pair from the top, the first crossover point is to the right of the second. The crossover operation thus "wraps around" the end of the string, exchanging the portion between the first and the second, moving from left to right. For two-point crossover, then, it is as if the head (left end) of each individual string is joined to the tail (right end), thus forming a ring structure. The section exchanged starts at the first crossover point, moving to the right along the binary ring, and ends at the second crossover point. The values of x and f(x) for the population following crossover appear in Figures 3c and 3d, respectively.

0 1 1 1 0 1 1 1    x = 119    f(x) = .994
0 0 1 0 0 0 0 1    x =  33    f(x) = .394
1 0 1 0 1 0 0 0    x = 168    f(x) = .882
1 1 0 1 1 1 1 0    x = 222    f(x) = .405
1 0 0 0 1 0 1 0    x = 138    f(x) = .992
0 1 1 0 1 1 1 0    x = 110    f(x) = .976
0 1 1 0 0 0 1 1    x =  99    f(x) = .937
1 0 1 1 1 1 0 1    x = 189    f(x) = .733

Figure 3: The population after crossover (b), with the corresponding values of x (c) and f(x) (d).

The final operation in this plain vanilla genetic algorithm is mutation. Mutation consists of flipping bits at random, generally with a constant probability for each bit in the population. As is the case with the probability of crossover, the probability of mutation can vary widely according to the application and the preference of the researcher. Values of between .001 and .01 are not unusual for the mutation probability. This means that the bit at each site on the bit string is flipped, on average, between 0.1 and 1.0 percent of the time.
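The two-point crossover with wraparound described above can be sketched as follows; the parent strings and crossover points here are illustrative, not the example population's.

```python
def two_point_crossover(p1, p2, c1, c2):
    """Exchange the section starting at point c1 and moving right to point c2,
    wrapping around the end of the string (a ring) when c1 > c2."""
    a, b = p1[:], p2[:]
    n = len(a)
    i = c1
    while i != c2:
        a[i], b[i] = b[i], a[i]
        i = (i + 1) % n  # the head of the string is joined to its tail
    return a, b

# Ordinary case: the section from position 2 up to (not including) position 5.
o1, o2 = two_point_crossover([0] * 8, [1] * 8, 2, 5)
# Wraparound case: the first point lies to the right of the second.
w1, w2 = two_point_crossover([0] * 8, [1] * 8, 6, 2)
```

In the wraparound case the exchanged section covers positions 6, 7, 0, and 1, exactly the ring-structure behavior described for the third pair in the example.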
One fixed value is used for each generation and often is maintained for an entire run.

Since there are 64 bits in the example problem's population (8 bits × 8 individuals), it is quite possible that none would be altered as a result of mutation, so the population of Figure 3b will be taken as the "final" population after one iteration of the GA procedure. Going through the entire GA procedure one time is said to produce a new generation. The population of Figure 3b therefore represents the first generation of the initial randomized population.

Note that the fitness values now total 6.313, up from 5.083 in the initial random population, and that there are now two members of the population with fitness values higher than .99. The average and maximum fitness values have thus both increased.

The population of Figure 3b and the corresponding fitness values in Figure 3d are now ready for another round of reproduction, crossover, and mutation, producing yet another generation. More generations are produced until some stopping condition is met. The researcher may simply set a maximum number of generations to let the algorithm search, may let it run until a performance criterion has been met, or may stop the algorithm after some number of generations with no improvement.

4.4 A Review of GA Operations

Now that one iteration of the GA operations (one generation) for the example problem has been completed, each of the operations is reviewed in more detail. Various approaches, and reasons for each, are examined.

4.4.1 Representation of Variables

The representation of the values for the variable x was made (perhaps unrealistically) straightforward by choosing a dynamic range of 256; an eight-bit binary number was thus an obvious approach. Standard binary coding, however, is only one approach; others may be more appropriate.

In this example, the nature of the sine function places the optimal value of x at 128, where f(x) is 1. The binary representation of 128 is 10000000; the representation of 127 is 01111111. Thus, the smallest change in fitness value can require a change of every bit in the representation. This situation is an artifact of the encoding scheme and is not desirable; it only makes the GA's search more difficult. Often, a better representation is one in which adjacent integer values have a Hamming distance of one; in other words, adjacent values differ by only a single bit. One such scheme is Gray coding.

Some GA software allows the user to specify the dynamic range and resolution for each variable. The program then assigns the correct number of bits, and the coding. For example, if a variable has a range from 2.5 to 6.5 (a dynamic range of 4) and it is desired to have a resolution of three decimal places, the product of the dynamic range and the resolution requires a string 12 bits long, where the string of 0s represents the value 2.5. A major advantage of being able to represent variables in this way is that the user can think of the population individuals as real-valued vectors rather than as bit strings, thus simplifying the development of GA applications.

The "alphabet" used in the representation can, in theory, be any finite alphabet. Thus, rather than using the binary alphabet of 1 and 0, we could use an alphabet containing more characters or numbers. Most GA implementations, however, use the binary alphabet.

4.4.2 Population Size

De Jong's dissertation (1975) offers guidelines that are still usually observed: start with a relatively high crossover rate, a relatively low mutation rate, and a moderately sized population, though just what constitutes a moderately sized population is unclear. The main tradeoff is obvious: a large population will search the space more completely, but at a higher computational cost. The authors generally have used populations of between 20 and 200 individuals, depending, it seems, primarily on the string length of the individuals. It also seems (in the authors' experience) that the sizes of populations used tend to increase approximately linearly with individual string length, rather than exponentially, but "optimal" population size (if an optimal size exists) depends on the problem, as well.

4.4.3 Population Initialization

The initialization of the population is usually done stochastically, though it is sometimes appropriate to start with one or more individuals that are selected heuristically. The GA is thereby initially aimed in promising directions, or given hints. It is not uncommon to seed the population with a few members selected heuristically, and to complete the population with randomly chosen members. Regardless of the process used, the population should represent a wide assortment of individuals. The urge to skew the population significantly should generally be avoided, if the limited experience of the authors is generalizable.

4.4.4 Fitness Calculation

The calculation of fitness values is conceptually simple, though it can be quite complex to implement in a way that optimizes the efficiency of the GA's search of the problem space. In the example problem, the value of f(x) varies (quite conveniently) from 0 to 1. Lurking within the problem, however, are two drawbacks to using the "raw" function output as a fitness function: one that is common to many implementations, the other arising from the nature of the sample problem.
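As an aside on the representation discussion of Section 4.4.1: the claim that Gray coding gives adjacent integers a Hamming distance of one, while plain binary changes all eight bits between 127 and 128, can be checked directly in a few lines (a sketch; to_gray is the standard binary-reflected conversion).

```python
def to_gray(n):
    """Binary-reflected Gray code of the integer n."""
    return n ^ (n >> 1)

def hamming(a, b, bits=8):
    """Number of bit positions in which a and b differ."""
    return bin((a ^ b) & ((1 << bits) - 1)).count("1")

# Adjacent integers always differ in exactly one bit under Gray coding ...
gray_ok = all(hamming(to_gray(x), to_gray(x + 1)) == 1 for x in range(255))
# ... while plain binary changes all eight bits between 127 and 128.
binary_gap = hamming(127, 128)
```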
The first drawback, common to many implementations, is that after the GA has been run for a number of generations, it is not unusual for most (if not all) of the individuals' fitness values, after, say, a few dozen generations, to be quite high. In cases where the fitness value can range from 0 to 1, for example (as in the sample problem), most or all of the fitness values may be 0.9 or higher. This lowers the fitness differences between individuals that provide the impetus for effective roulette wheel selection; relatively higher fitness values should have a higher probability of reproduction.

One way around this problem is to equally space the fitness values. For example, in the sample problem, the fitness values used for reproduction could be equally spaced from 0 to 1, assigning a fitness value of 1 to the most fit population member, 0.875 to the second, and 0.125 to the lowest fitness value of the eight. In this case the population members are ranked on the basis of fitness and then their ranks are divided by the number of individuals to provide a probability threshold for selection. Note that the value of 0 is often not assigned, since that would result in one population member being made ineligible for reproduction. Also note that f(x), the function result, is now not equal to the fitness, and that in order to evaluate actual performance of the GA, the function value should be monitored as well as the spaced fitness.

Another way around the problem is to use what is called scaling. Scaling takes into account the recent history of the population, and assigns fitness values on the basis of comparison of individuals' performance to the recent average performance of the population. If the GA optimization is maximizing some function, then scaling involves keeping a record of the minimum fitness value obtained in the last w generations, where w is the size of the scaling window. If, for example, w = 5, then the minimum fitness value in the last five generations is kept and used instead of 0 as the "floor" of fitness values. Fitness values can be assigned a value based on their actual distance from the floor value, or they can be equally spaced, as described earlier.

The second drawback is that the example problem exacerbates the compression of fitness values described earlier, because near the global optimum fitness value of 1, f(x) (which is also the fitness) is relatively flat. There is thus relatively little selection advantage for population members near the optimum value x = 128. If this situation is known to exist, a different representation scheme might be selected, such as defining a new fitness function which is the function output raised to some power.

Note that the shape of some functions "assists" discrimination near the optimum value. For example, consider maximizing the function f(x) = x² over the range 0 to 10; there is a higher differential in values of f(x) between adjacent values of x near 10 than near 0. Thus a slight change of the independent variable results in a great improvement or deterioration of performance, which is equally informative, near the optimum.

In the discussion thus far, it has been assumed that optimization implies finding a maximum value. Sometimes, of course, optimization requires finding a minimum value. Some versions of GA implementations allow for this possibility. Often, it is required that the user specify the maximum value fmax of the function being optimized, f(x), over the range of the search. The GA then can be programmed to maximize the fitness function fmax − f(x). In this case, scaling, described above, keeps track of fmax over the past w generations and uses it as a "roof" value from which to calculate fitness.

4.4.5 Roulette Wheel Selection

In genetic algorithms, the expected number of times each individual in the current population is selected for the new population is proportional to the fitness of that individual relative to the average fitness of the entire population. Thus, in the initial population of the example problem, where the average fitness was 5.083/8 = 0.635, the third population member had a fitness value of 0.937, so it could be expected to appear about 1.5 times in the next population; it actually appeared twice.

The conceptualization is that of a wheel whose surface is subdivided into wedges representing the probabilities for each individual. One point on the edge is determined to be the zero point, and each arc around the circle corresponds to an area on the number line between zero and one. A random number is generated, between 0.0 and 1.0, and the individual whose wedge contains that number is chosen. In this way, individuals with greater fitness are more likely to be chosen. The selection algorithm can be repeated until the desired number of individuals have been selected.

One variation on the basic roulette wheel procedure is a process developed by Baker (1987) in which the portion of the roulette wheel is assigned based on each unique string's relative fitness. One spin of the roulette wheel then determines the number of times each string will appear in the next generation. To illustrate how this is done, assume the fitness values are normalized (the sum of all equals 1). Each string is assigned a portion of the roulette wheel proportional to its normalized fitness. Instead of one "pointer" on the roulette wheel spun n times, there are n pointers spaced 1/n apart; the n-pointer assembly is spun once. Each of the n pointers now points to a string; each place one of the n pointers points determines one population member in the next generation.
6
determines one population member in the next generation. tion to one set of environmental constraints, another indi
If a string has a normalized fitness greater than lIn (corre vidual might have evolved to deal better with another aspect
sponding to an expected value greater than 1), it is guaran of survival. Perhaps one genetic line of rabbits has evolved
teed at leastone occurrence in the next generation. a winter coloration that protects it through the changing sea
In the discussion thus far, it has been assumed that all of sons, while another has developed a "freeze" behavior that
the population members are replaced each generation. Al makes it hard for predators to spot. Mating between these
though this is often the case, it sometimes is desirable to two lines of rabbitsmight result in offspring lacking both of
replace only a portion of the population, say, the 80 percent the advantages, offspring with one or the other characteristic
with the worstfitness values. The percentage of the popula either totally or in some degree, and might end up with off
tion replaced each generation is sometimes called the gen spring possessing both of the advantageous traits. Selection
eration gap. will decide, in the long run, which of these possibilities are
Unless some provision is made, with standard roulette most adaptable; the ones that adapt, survive.
wheel selection it is possible that the individual with the highest fitness value in a given generation may not survive reproduction, crossover, and mutation to appear unaltered in the new generation. It is frequently helpful to use what is called the elitist strategy, which ensures that the individual with the highest fitness is always copied into the next generation.

4.4.6 Crossover

The most important operator in GA is crossover, based on the metaphor of sexual combination. (An operator is a rule for changing a proposed problem solution.) If a solution is encoded as a bit string, then mutation may be implemented by setting a probability threshold and flipping bits when a random number is less than the threshold. As a matter of fact, mutation is not an especially important operator in GA; it is usually set at a very low rate or omitted altogether. Crossover is more important, and adds a new dimension to the discussion of evolution so far.

Other evolutionary algorithms use random mutation plus selection as the primary method for searching the landscape for peaks or niches. One of the greatest and most fundamental search methods that biological life has found is sexual reproduction, which is extremely widespread throughout both the animal and plant kingdoms. Sexual reproduction capitalizes on the differences and similarities among individuals within a species, where one individual may have descended from a line that contained a good solution.

Crossover is a term for the recombination of genetic information during sexual reproduction. In GA, offspring have equal probabilities of receiving any gene from either parent, as the parents' chromosomes are combined randomly. In nature, chromosomal combination leaves sections intact; that is, contiguous sections of chromosomes from one parent are combined with sections from the other, rather than simply shuffling randomly. In GA there are many ways to implement crossover.

The two main attributes of crossover that can be varied are the probability that it occurs and the type of crossover that is implemented. The following paragraphs examine variations of each.

A crossover probability of 0.75 was used in the sample problem, and two-point crossover was implemented. Two-point crossover with a probability of 0.60-0.80 is a relatively common choice, especially when Gray coding is used.

The most basic crossover type is one-point crossover, as described by Holland (1975/1992) and others, e.g., Goldberg (1989) and Davis (1991). It is inspired by natural evolution processes. One-point crossover involves selecting a single crossover point at random and exchanging the portions of the individual strings to the right of the crossover point. Figure 4 illustrates one-point crossover; portions to be exchanged are in bold in Figure 4a.
[Figure 4: One-point crossover. (a) Two parent bit strings, with the portions to be exchanged shown in bold; (b) the resulting offspring.]
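The operation just described can be sketched in a few lines of Python. The function below is an illustrative stand-alone example, not part of the tutorial's GA software:

```python
import random

def one_point_crossover(parent1, parent2):
    """Exchange the portions of two equal-length bit strings to the
    right of a single randomly selected crossover point."""
    # Choose the crossover point; positions at and after it are swapped.
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

# Example with two 8-bit parents:
offspring = one_point_crossover("10110101", "01001110")
```

Because the point is drawn from 1 to length-1, each child always keeps at least its first bit from its own parent and receives at least its last bit from the other parent.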
Another type of crossover that has been found useful is called uniform crossover, described by Syswerda (1989). A random decision is made at each bit position in the string as to whether or not to exchange (crossover) bits between the parent strings. If a 0.50 probability at each bit position is implemented, an average of about 50 percent of the bits in the parent strings are exchanged. Note that a 50 percent rate will result in the maximum disruption due to uniform crossover. Higher rates just mirror rates lower than 50 percent. For example, a 0.60 probability uniform crossover rate produces results identical to a 0.40 probability rate. If the rate were 100 percent, the two strings would simply switch places, and if it were zero percent neither would change.

Values for the probability of crossover vary with the problem. In general, values between 60 and 80 percent are common for one-point and two-point crossover. Uniform crossover sometimes works better with slightly lower crossover probability. It is also common to start out running the GA with a relatively higher value for crossover, then taper off the value linearly to the end of the run, ending with a value of, say, one-half the initial value.

4.4.7 Mutation

In GAs, mutation is the stochastic flipping of bits that occurs each generation. It is done bit by bit on the entire population. It is often done with a probability of something like .001, but higher probabilities are not unusual. For example, Liepins and Potter (1991) used a mutation probability of .033 in a multiple-fault-diagnosis application.

If the population comprises real-valued parameters, mutation can be implemented in different ways. For instance, in an image classification application, Montana (1991) used strings of real-valued parameters that represented thresholds of event detection rules as the individuals. Each parameter in the string was range-limited and quantized (could take on only a certain finite number of values). If chosen for mutation, a parameter was randomly assigned any allowed value in the range of values valid for that parameter.

The probability of mutation is often held constant for the entire run of the GA, although this approach will not produce optimal results in many cases. It can be varied during the run, and if varied, usually is increased. For example, mutation rate may start at .001 and end at .01 or so when the specified number of generations has been completed. In the software implementation described in Appendix B, a flag in the run file can be set that increases the mutation rate significantly when the variability in fitness values becomes low, as is often the case late in the run.

4.4.8 Final Comments on Genetic Algorithms

In sum, the genetic algorithm operates by evaluating a population of bit strings (there are real-numbered GAs, but binary implementations are more common), and selecting survivors stochastically based on their fitness, so fitter members of the population are more likely to survive. Survivors are paired for crossover, and often some mutation is performed on chromosomes. Other operations might be performed as well, but crossover and mutation are the most important ones. Sexual recombination of genetic material is a powerful method for adaptation.

The material on genetic algorithms in this tutorial has provided only an introduction to the subject. It is suggested that the reader explore GAs further by sampling the references cited in this section. With further study and application, it will become apparent why GAs have such a devoted following. In the words of Davis (1991):

"... [T]here is something profoundly moving about linking a genetic algorithm to a difficult problem and returning later to find that the algorithm has evolved a solution that is better than the one a human found. With genetic algorithms we are not optimizing; we are creating conditions in which optimization occurs, as it may have occurred in the natural world. One feels a kind of resonance at such times that is uncommon and profound."

This feeling, of course, is not unique to experiences with GAs; using other evolutionary algorithms can result in similar feelings.

5. PARTICLE SWARM OPTIMIZATION

5.1 Introduction

This section presents an evolutionary computation method for optimization of continuous nonlinear functions. The method was discovered by Jim Kennedy through simulation of a simplified social model; thus, the social metaphor is discussed, though the algorithm stands without metaphorical support. Much of the material appearing in this section was initially presented at two conferences (Kennedy and Eberhart 1995; Eberhart and Kennedy 1995). A special acknowledgment is due to Jim Kennedy for origination of the concept, and for many of the words in this section, which are his. Any mistakes, of course, are the responsibility of the author.

The particle swarm optimization concept is described in terms of its precursors, and the stages of its development from social simulation to optimizer are briefly reviewed. Discussed next are two paradigms that implement the concept, one globally oriented (GBEST) and one locally oriented (LBEST), followed by results obtained from applications and tests upon which the paradigms have been shown to perform successfully.

Particle swarm optimization has roots in two main component methodologies. Perhaps more obvious are its ties to artificial life (A-life) in general, and to bird flocking, fish schooling, and swarming theory in particular. It is also related, however, to evolutionary computation, and it has ties to both genetic algorithms and evolution strategies (Baeck 1995).

Particle swarm optimization comprises a very simple concept, and paradigms are implemented in a few lines of computer code. It requires only primitive mathematical operators, and is computationally inexpensive in terms of both memory requirements and speed. Testing has found the implementations to be effective with several kinds of problems (Eberhart and Kennedy 1995). This section discusses application of the algorithm to the training of neural network weights. Particle swarm optimization has also been demonstrated to perform well on genetic algorithm test functions. The performance on Schaffer's f6 function, as described in Davis (1991), is discussed.

Particle swarm optimization can be used to solve many of the same kinds of problems as genetic algorithms. This optimization technique does not suffer, however, from some of the difficulties encountered with genetic algorithms; interaction in the group enhances rather than detracts from progress toward a solution. Further, a particle swarm system has memory, which a genetic algorithm population does not have. Change in genetic populations results in destruction of previous knowledge of the problem, except when elitism is employed, in which case usually one or a small number of individuals retain their "identities." In particle swarm optimization, individuals who fly past optima are tugged to return toward them; knowledge of good solutions is retained by all particles.

5.2 Simulating Social Behavior

A number of scientists have created computer simulations of various interpretations of the movements of organisms in a bird flock or fish school. Notably, Reynolds (1987) and Heppner and Grenander (1990) developed bird flocking simulations. Reynolds was intrigued by the aesthetics of bird flocking choreography, and Heppner, a zoologist, was interested in discovering the underlying rules that enabled large numbers of birds to flock synchronously, often changing direction suddenly, scattering and regrouping, etc. Both of these scientists had the insight that local processes, such as those modeled by cellular automata, might underlie the seemingly unpredictable group dynamics of bird social behavior. Both models relied heavily on manipulation of inter-individual distances; that is, the synchrony of flocking behavior was thought to be a function of birds' efforts to maintain an optimum distance between themselves and their neighbors.

It does not seem a too-large leap of logic to suppose that some similar rules underlie the social behavior of animals, including herds, schools, and flocks, and that of humans. As sociobiologist E. O. Wilson (1975) has written, in reference to fish schooling, "In theory at least, individual members of the school can profit from the discoveries and previous experience of all other members of the school during the search for food. This advantage can become decisive, outweighing the disadvantages of competition for food items, whenever the resource is unpredictably distributed in patches." This statement suggests that social sharing of information among conspeciates offers an evolutionary advantage; this hypothesis was fundamental to the development of particle swarm optimization.

One motive for developing the simulation was to model human social behavior, which is of course not identical to fish schooling or bird flocking. One important difference is abstractness. Birds and fish adjust their physical movement to avoid predators, seek food and mates, optimize environmental parameters such as temperature, etc. Humans adjust not only physical movement, but cognitive or experiential variables as well. We do not usually walk in step and turn in unison (although some fascinating research in human conformity shows that we are capable of it); rather, we tend to adjust our beliefs and attitudes to conform with those of our social peers.

This is a major distinction in terms of contriving a computer simulation, for at least one obvious reason: collision. Two individuals can hold identical attitudes and beliefs without banging together, but two birds cannot occupy the same position in space without colliding. It seems reasonable, in discussing human social behavior, to map the concept of change into the bird/fish analogue of movement. This is consistent with the classic Aristotelian view of qualitative and quantitative change as types of movement. Thus, besides moving through three-dimensional physical space, and avoiding collisions, humans change in abstract multidimensional space, collision-free. Physical space, of
course, affects informational inputs, but it is arguably a trivial component of psychological experience. Humans learn to avoid physical collision by an early age, but navigation of n-dimensional psychosocial space requires decades of practice, and many of us never seem to acquire quite all the skills we need!

5.3 The Particle Swarm Optimization Concept

As mentioned earlier, the particle swarm concept began as a simulation of a simplified social milieu. The original intent was to graphically simulate the graceful but unpredictable choreography of a bird flock. Initial simulations were modified to incorporate nearest-neighbor velocity matching, eliminate ancillary variables, and incorporate multidimensional search and acceleration by distance (Kennedy and Eberhart 1995). At some point in the evolution of the algorithm, it was realized that the conceptual model was, in fact, an optimizer. Through a process of trial and error, a number of parameters extraneous to optimization were stripped out of the algorithm, resulting in the very simple implementations described next.

Particle swarm optimization is similar to a genetic algorithm (Davis 1991) in that the system is initialized with a population of random solutions. It is unlike a genetic algorithm, however, in that each potential solution is also assigned a randomized velocity, and the potential solutions, called particles, are then "flown" through hyperspace.

Each particle keeps track of its coordinates in hyperspace which are associated with the best solution (fitness) it has achieved so far. (The value of that fitness is also stored.) This value is called pbest. Another "best" value that is tracked by the global version of the particle swarm optimizer is the overall best value, and its location, obtained thus far by any particle in the population. This is called gbest.

The particle swarm optimization concept consists of, at each time step, changing the velocity of (accelerating) each particle toward its pbest and gbest (global version). Acceleration is weighted by a random term, with separate random numbers being generated for acceleration toward pbest and gbest.

There is also a local version of the optimizer in which, in addition to pbest, each particle keeps track of the best solution, called lbest, attained within a local topological neighborhood of particles. Both the global and local versions are described in more detail later.

The only variable that must be specified by the user is the maximum velocity to which the particles are limited. An acceleration constant is also specified, but in the experience of the authors, it is not usually varied among applications. Both the global and local versions of particle swarm optimizer implementations are introduced in the next section within the context of training a multilayer perceptron neural network.

5.4 Training a Multilayer Perceptron

5.4.1 Introduction

The problem of finding a set of weights to minimize residuals in a feedforward neural network is not a trivial one. It is nonlinear and dynamic in that any change of one weight may require adjustment of many others. Gradient descent techniques, e.g., back-propagation of error, are usually used to find a matrix of weights that meets error criteria, although there is not widespread satisfaction with the effectiveness of these methods.

A number of researchers have attempted to use genetic algorithms (GAs) to find sets of weights, but the problem is not well suited to crossover. Because a large number of possible solutions exist, two chromosomes with high fitness evaluations are likely to be very different from one another. Therefore, recombination may not result in improvement.

In this example, a three-layer network designed to solve the XOR problem is used as a demonstration of the particle swarm optimization concept. The network has two inputs, three hidden processing elements (PEs), and one output PE. The output PE returns a 1 if both inputs are the same, that is, for input vector (1,1) or (0,0), and returns 0 if the inputs are different, (1,0) or (0,1). Counting bias inputs to the hidden and output PEs, solution of this problem requires estimation of 13 floating-point parameters. Note that, for the current presentation, the number of hidden PEs is arbitrary. A feedforward network with one or two hidden PEs can also solve the XOR problem.

The particle swarm optimization approach is to "fly" a population of particles through 13-dimensional hyperspace. Each particle is initialized with position and velocity vectors of 13 elements. For neural networks, it seems reasonable to initialize all positional coordinates (corresponding to connection weights) to within a range of (-1, 1), and velocities should not be so high as to fly particles out of the usable field. It is also necessary to clamp velocities to some maximum to prevent overflow. The test examples use a population of 20 particles for this problem. James Kennedy and the author have used populations of 10-50 particles for other applications. The XOR data are entered into the net, and an error term to be minimized, usually squared error per output PE, is computed for each of the 20 particles.

As the system iterates, individual agents are drawn toward a global optimum based on the interaction of their individual searches and the global or local group's public search. Error threshold and maximum iteration termination criteria have been specified. When these are met, or when a key is
pressed, iterations cease and the best weight vector found is written to a file.

5.4.2 The GBEST Model

The standard "GBEST" particle swarm algorithm, which is the original form of particle swarm optimization developed, is very simple. The procedure listed below is for a minimization problem. For maximization, reverse the inequality signs in steps 3 and 4. The steps to run GBEST are:

1. Initialize an array of particles with random positions and velocities on D dimensions.

2. Evaluate the desired minimization function in D variables.

3. Compare the evaluation with the particle's previous best value, PBEST[]; if current value < PBEST[], then PBEST[] = current value and PBESTx[][d] = current position in D-dimensional hyperspace.

4. Compare the evaluation with the group's overall previous best, PBEST[GBEST]; if current value < PBEST[GBEST], then GBEST = particle's array index.

5. Change velocity by the following formula:
   V[][d] = V[][d] + ACC_CONST*rand()*(PBESTx[][d] - Presentx[][d]) + ACC_CONST*rand()*(PBESTx[GBEST][d] - Presentx[][d])

6. Move to Presentx[][d] + V[][d]; loop to step 2 and repeat until a criterion is met.

5.4.3 Varying VMAX and ACC_CONST

Particles' velocities on each dimension are clamped by VMAX. If the sum of accelerations is greater than VMAX, which is a system parameter specified by the user, then the velocity on that dimension is limited to VMAX. On any particular iteration, a good percentage of particles typically are changing at this maximum rate, especially after a change in GBEST, when particles swarm from one region toward another.

Thus, VMAX is a rather important parameter. It determines, for instance, the fineness with which regions between the present position and the target position will be searched. If VMAX is too high, particles might fly past good solutions. On the other hand, if VMAX is too small, particles may not explore sufficiently beyond good regions. Further, they could become trapped in local optima, unable to jump far enough to reach a better position in the data space.

ACC_CONST represents the weighting of the stochastic acceleration terms that pull each agent toward PBEST and GBEST positions. Thus, adjustment of this factor changes the amount of "tension" in the system. Low values allow agents to roam far from target regions before being tugged back, while high values result in abrupt movement toward the target regions.

Table 1 shows results of manipulating VMAX and ACC_CONST for the GBEST model. Values in the table are median numbers of iterations of 20 observations. In some cases, the numbers of which are given in parentheses, the swarm settled on a local optimum and remained there until iterations equaled 2,000. This happened four times in the condition for VMAX=2.0 and ACC_CONST=2.0, and two times in the other two conditions for which VMAX=2.0, plus once for VMAX=8.0. Medians are presented, rather than means, because the system was artificially stopped at 2,000 iterations. These trials could probably have been run to many more than 2,000 iterations; in any case, the means would have been inflated, while the effect on medians was considerably less. Median values communicate the number of iterations that can be expected in 50 percent of trials.
VMAX      ACC_CONST
          3.0        2.0       1.0       0.5
2.0       25.5 (2)   22 (4)    28 (2)    37.5 (2)
4.0       32 (4)     19.5      20.5      34.5
6.0       29         19.5      36.5      33
8.0       31         27        29.5      23.5 (1)

Table 1: Median iterations required to meet a criterion of squared error per PE < .02. Population is 20 particles. Numbers in parentheses are the number of trials in which the system iterated 2,000 times; in these trials the system was stuck in local optima.
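The GBEST procedure of Section 5.4.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the test objective (a 5-dimensional sphere function), the initialization range, and the parameter values are illustrative choices, while the variable names follow the tutorial's PBEST/GBEST/VMAX/ACC_CONST terminology.

```python
import random

def gbest_pso(f, dim, n_particles=20, acc_const=2.0, vmax=4.0,
              max_iter=2000, err_threshold=1e-6):
    """Minimize f over dim dimensions with the GBEST particle swarm."""
    # Step 1: initialize random positions and velocities on D dimensions.
    x = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    v = [[random.uniform(-vmax, vmax) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [float('inf')] * n_particles
    pbest_x = [xi[:] for xi in x]
    gbest = 0
    for _ in range(max_iter):
        for i in range(n_particles):
            val = f(x[i])                    # Step 2: evaluate the function.
            if val < pbest[i]:               # Step 3: update personal best.
                pbest[i], pbest_x[i] = val, x[i][:]
            if pbest[i] < pbest[gbest]:      # Step 4: update group best index.
                gbest = i
        if pbest[gbest] < err_threshold:     # error-threshold termination criterion
            break
        for i in range(n_particles):
            for d in range(dim):
                # Step 5: accelerate toward PBEST and GBEST, with separate
                # random weights, and clamp the result to +/- VMAX.
                v[i][d] += (acc_const * random.random() * (pbest_x[i][d] - x[i][d]) +
                            acc_const * random.random() * (pbest_x[gbest][d] - x[i][d]))
                v[i][d] = max(-vmax, min(vmax, v[i][d]))
                x[i][d] += v[i][d]           # Step 6: move.
    return pbest_x[gbest], pbest[gbest]

# Example: minimize the 5-dimensional sphere function.
random.seed(1)
best_x, best_err = gbest_pso(lambda p: sum(c * c for c in p), dim=5)
```

Note that, as in the procedure above, only the best-ever positions are remembered; the current positions may overshoot the target, which the Conclusions section argues is part of why the method searches effectively.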
5.4.4 The LBEST Version

Based, among other things, on findings from social simulations, it was decided to design a "local" version (paradigm) of the particle swarm concept. In this paradigm, particles have information only of their own and their nearest neighbors' bests, rather than that of the entire group. Instead of moving toward the stochastic average of pbest and gbest (the best evaluation of the entire group), particles move toward the points defined by pbest and "lbest," which is the index of the particle with the best evaluation in the neighborhood.

In the neighborhood = 2 model, for instance, particle(i) compares its error value with particle(i-1) and particle(i+1). The lbest version was tested with neighborhoods of various sizes. Test results are summarized in the following tables for neighborhoods consisting of the immediately adjacent neighbors (neighborhood = 2) and of the three neighbors on each side (neighborhood = 6).

Table 2 shows results of performance on the XOR neural network problem with neighborhood = 2. Note that no trials fixated on local optima, nor have any in hundreds of unreported tests.
VMAX      ACC_CONST
          2.0       1.0       0.5
2.0       38.5      47        37.5
4.0       28.5      33        53.5
6.0       29.5      40.5      39.5

Table 2: Local version of particle swarm with a neighborhood of two. Shows median iterations required to meet a criterion of squared error per PE < .02 with a population of 20 particles. There were no trials with more than 2,000 iterations.
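The neighborhood bookkeeping that distinguishes LBEST from GBEST can be sketched as follows. The helper function and its name are hypothetical, not from the tutorial's software; in the tutorial's terms, neighborhood = 2 means one neighbor on each side (k = 1), and neighborhood = 6 means three on each side (k = 3).

```python
def lbest_index(pbest, i, k=1):
    """Index of the best (lowest) evaluation among particle i and its k
    ring neighbors on each side; the array wraps, so the final particle
    is considered to be beside the first one."""
    n = len(pbest)
    neighborhood = [(i + offset) % n for offset in range(-k, k + 1)]
    return min(neighborhood, key=lambda j: pbest[j])

# With pbest values [5.0, 3.0, 4.0, 1.0, 2.0], particle 0's neighborhood
# for k=1 wraps to particles 4, 0, 1, so its lbest is particle 4.
```

In a velocity update, `pbest_x[lbest_index(pbest, i)]` would simply replace `pbest_x[gbest]` in step 5 of the GBEST procedure.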
Cluster analysis of sets of weights from this version showed that blocks of neighbors, consisting of regions of from two to eight adjacent particles (individuals), had settled into the same regions of the solution space. It appears that the relative invulnerability of this version to local optima might result from the fact that a number of "groups" of particles spontaneously separate and explore different regions. It thus appears to be a more flexible approach to information processing than the GBEST model.

Nonetheless, though this version rarely if ever becomes entrapped in a local optimum, it clearly requires more iterations on average to find a criterion error level. Table 3 represents tests of an LBEST version with neighborhood = 6, that is, with the three neighbors on each side of the particle taken into account (with arrays wrapped, so the final particle was considered to be beside the first one).

This version is prone to local optima, at least when VMAX is small, though less so than the GBEST version. Otherwise it seems, in most cases, to perform somewhat less well than the standard GBEST algorithm.
VMAX      ACC_CONST
          2.0       1.0       0.5
2.0       31.5 (2)  38.5 (1)  27 (1)
4.0       36 (1)    26        25
6.0       26.5      29        20

Table 3: Local version of particle swarm with a neighborhood of six. Median iterations required to meet a criterion of squared error per PE < .02 with a population of 20 particles. Numbers in parentheses are the number of trials in which the system iterated 2,000 times; in these trials the system was stuck in local optima.
In sum, the neighborhood = 2 model offers some intriguing possibilities, in that it seems immune to local optima. It is a highly decentralized model, which can be run with any number of particles. Expanding the neighborhood speeds up convergence, but introduces the frailties of the GBEST model.
6. CONCLUSIONS

This section has discussed the particle swarm concept and examined how changes in the paradigm affect the number of iterations required to meet an error criterion, and the frequency with which models cycle interminably around a non-global optimum. Three versions were discussed: the GBEST model, in which each particle has information about the population's best evaluation, and two variations of the LBEST version, one with a neighborhood of 6, and one with a neighborhood of 2. It appears that the original GBEST version often performs best in terms of median number of iterations to converge, while the LBEST version with a neighborhood of 2 seems most resistant to local minima.

Particle swarm optimization is an extremely simple algorithm that seems to be effective for optimizing a wide range of functions. The authors view it as a mid-level form of A-life or biologically derived algorithm, occupying the space in nature between evolutionary search, which requires eons, and neural processing, which occurs (as far as we now know) on the order of milliseconds. Social optimization occurs in the time frame of ordinary experience; in fact, it is ordinary experience.

In addition to their ties with A-life, particle swarm paradigms have obvious ties with evolutionary computation. Conceptually, they seem to lie somewhere between genetic algorithms and evolution strategies. They are highly dependent on stochastic processes, like evolution strategies. The adjustment toward pbest and gbest by a particle swarm is conceptually similar to the crossover operation utilized by genetic algorithms. They use the concept of fitness, as do all evolutionary computation paradigms.

Unique to the concept of particle swarm optimization is flying potential solutions through hyperspace, accelerating toward "better" solutions. Other evolutionary computation schemes operate directly on potential solutions, which are represented as locations in hyperspace. Much of the success of particle swarms seems to lie in the particles' tendency to hurtle past their targets. In his chapter on the optimum allocation of trials, Holland (1992) discusses the delicate balance between conservative testing of known regions and risky exploration of the unknown. It appears that the particle swarm paradigms allocate trials nearly optimally. The stochastic factors allow thorough search of spaces between regions that have been found to be relatively good, and the momentum effect caused by modifying the extant velocities rather than replacing them results in overshooting, or exploration of unknown regions of the problem domain.

Much further research remains to be conducted on this simple new concept and paradigms. The goals in developing them have been to keep them simple and robust; these goals seem to have been met. The algorithm is written in a very few lines of code, and requires only specification of the problem and a few parameters in order to solve it.

7. ACKNOWLEDGMENTS

Portions of this tutorial were adapted from Eberhart et al. (1996); the permission of Academic Press is acknowledged. Other portions have been adapted from the upcoming book entitled Swarm Intelligence (Kennedy et al. 2000). The kind permission of Morgan Kaufmann Publishers is acknowledged. Finally, the input of Jim Kennedy is gratefully acknowledged.

8. SELECTED BIBLIOGRAPHY

Baeck, T., and H.-P. Schwefel (1993). An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1(1):1-23.
Baeck, T. (1995). Generalized convergence models for tournament and (mu, lambda) selection. Proc. of the Sixth International Conference on Genetic Algorithms, Morgan Kaufmann Publishers, San Francisco, CA, 2-7.
Bagley, J. D. (1967). The behavior of adaptive systems which employ genetic and correlation algorithms. Ph.D. dissertation, University of Michigan, Ann Arbor, MI.
Baker, J. A. (1987). Reducing bias and inefficiency in the selection algorithm. Proc. 2nd Intl. Conf. on Genetic Algorithms: Genetic Algorithms and Their Applications, Lawrence Erlbaum Associates, Hillsdale, NJ.
Bezdek, J. C., S. Boggavarapu, L. O. Hall, and A. Bensaid (1994). Genetic algorithm guided clustering. Proc. Int'l. Conf. on Evolutionary Computation, IEEE Service Center, Piscataway, NJ, 34-39.
Davis, L., Ed. (1991). Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, NY.
De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems. Doctoral dissertation, University of Michigan.
Eberhart, R. C., and J. Kennedy (1995). A new optimizer using particle swarm theory. Proc. Sixth Intl. Symposium on Micro Machine and Human Science (Nagoya, Japan), IEEE Service Center, Piscataway, NJ, 39-43.
Eberhart, R. C., P. K. Simpson, and R. W. Dobbins (1996). Computational Intelligence PC Tools. Academic Press Professional, Boston, MA.
Fogel, D. B. (1991). System Identification Through Simulated Evolution: A Machine Learning Approach to Modeling. Ginn Press, Needham Heights, MA.
Fogel, D. B. (1995). Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ.
Fogel, D. B. (2000). What is evolutionary computation? IEEE Spectrum, 37(2).
Fogel, L. J., A. J. Owens, and M. J. Walsh (1966). Artificial Intelligence through Simulated Evolution. John Wiley, New York, NY.
Fogel, L. J. (1994). Evolutionary programming in perspective: the top-down view. In Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson, Eds., IEEE Press, Piscataway, NJ, 135-146.
Fraser, A. S. (1957). Simulation of genetic systems by automatic digital computers. Australian Journal of Biological Science, 10:484-499.
Fraser, A. S. (1960). Simulation of genetic systems by automatic digital computers: 5-linkage, dominance and epistasis. In Biometrical Genetics, O. Kempthorne, Ed., Macmillan, New York, NY, 70-83.
Fraser, A. S. (1962). Simulation of genetic systems. Journal of Theoretical Biology, 2:329-346.
Friedberg, R. M. (1958). A learning machine: Part I. IBM Journal of Research and Development, 2:2-13.
Friedberg, R. M., B. Dunham, and J. H. North (1959). A learning machine: Part II. IBM Journal of Research and Development, 3:282-287.
Goldberg, D. E. (1983). Computer-aided gas pipeline operation using genetic algorithms and rule learning (doctoral dissertation, University of Michigan). Dissertation Abstracts International, 44(10), 3174B.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.
Grefenstette, J. J. (1984a). GENESIS: A system for using genetic search procedures. Proc. of the 1984 Conf. on Intelligent Systems and Machines, 161-165.
Grefenstette, J. J. (1984b). A user's guide to GENESIS. Technical Report CS-84-11, Computer Science Dept., Vanderbilt University, Nashville, TN.
Grefenstette, J. J., Ed. (1985). Proc. of an International Conference on Genetic Algorithms and Their Applications. Lawrence Erlbaum Associates, Hillsdale, NJ.
Haupt, R., and S. Haupt (1998). Practical Genetic Algorithms. John Wiley and Sons, New York, NY.
Heppner, F., and U. Grenander (1990). A stochastic nonlinear model for coordinated bird flocks. In The Ubiquity of Chaos, S. Krasner, Ed., AAAS Publications, Washington, DC.
Holland, J. H. (1962). Outline for a logical theory of adaptive systems. Journal of the Association for Computing Machinery, 3:297-314.
Holland, J. H. (1992) [orig. ed. 1975]. Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA.
Kennedy, J., and R. C. Eberhart (1995). Particle swarm optimization. Proc. IEEE Intl. Conf. on Neural Networks (Perth, Australia), IEEE Service Center, Piscataway, NJ, IV:1942-1948.
Kennedy, J., R. C. Eberhart, and Y. Shi (2000). Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, CA (in press).
Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA.
Levy, S. (1992). Artificial Life. Random House, New York, NY.
Liepins, G. E., and W. D. Potter (1991). A genetic algorithm approach to multiple-fault diagnosis. In Handbook of Genetic Algorithms, L. Davis, Ed., Van Nostrand Reinhold, New York, NY.
Michalewicz, Z., and M. Michalewicz (1995). Pro-life versus pro-choice strategies in evolutionary computation techniques. In Computational Intelligence: A Dynamic System Perspective, M. Palaniswami, Y. Attikiouzel, R. Marks, D. Fogel, and T. Fukuda, Eds., IEEE Press, Piscataway, NJ, 137-151.
Mitchell, M. (1996). An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA.
Montana, D. J. (1991). Automated parameter tuning for interpretation of synthetic images. In Handbook of Genetic Algorithms, L. Davis, Ed., Van Nostrand Reinhold, New York, NY.
Pedrycz, W. (1998). Computational Intelligence: An Introduction. CRC Press, Boca Raton, FL.
Rechenberg, I. (1965). Cybernetic solution path of an experimental problem. Royal Aircraft Establishment, library translation 1122, Farnborough, Hants, U.K.
Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart, Germany.
Rechenberg, I. (1994). Evolution strategy. In Computational Intelligence: Imitating Life, J. Zurada, R. Marks II, and C. Robinson, Eds., IEEE Press, Piscataway, NJ, 147-159.
Reynolds, C. W. (1987). Flocks, herds, and schools: A distributed behavioral model. Computer Graphics, 21(4):25-34.
Schaffer, J. D. (1984). Some experiments in machine learning using vector evaluated genetic algorithms. Unpublished doctoral dissertation, Vanderbilt University, Nashville, TN.
Schwefel, H.-P. (1965). Kybernetische Evolution als Strategie der experimentellen Forschung in der Strömungstechnik. Diploma thesis, Technical University of Berlin, Germany.
Schwefel, H.-P. (1994). On the evolution of evolutionary computation. In Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson, Eds., IEEE Press, Piscataway, NJ.
Smith, S. F. (1980). A learning system based on genetic adaptive algorithms. Unpublished doctoral dissertation, University of Pittsburgh, Pittsburgh, PA.
Syswerda, G. (1989). Uniform crossover in genetic algorithms. In Proc. of the Third Int'l. Conf. on Genetic Algorithms, J. D. Schaffer, Ed., Morgan Kaufmann Publishers, San Mateo, CA.
Wilson, E. O. (1975). Sociobiology: The New Synthesis. Belknap Press, Cambridge, MA.
Chapter 2
Overview of Applications in Power Systems
Abstract—This survey covers the broad area of evolutionary computation applications to optimization, model identification, and control in power systems. Almost all reviewed papers have been published in the IEEE Transactions and the IEE Proceedings. A total of 146 articles are listed in this survey. It shows the development of the area and identifies the current trends. The following techniques are considered under the scope of evolutionary computation: evolutionary algorithms (e.g., genetic algorithms, evolution strategies, evolutionary programming, and genetic programming), simulated annealing, tabu search, and particle swarm optimization.

Index Terms—Survey, evolutionary computation, power systems.

1. OPTIMIZATION

Optimization is the basic concept behind the application of evolutionary computation (EC) to any problem in power systems [1], [2]. Besides the problems in which optimization itself is the final goal [3]-[89], it is also a means for modeling/forecasting [109]-[119], control [120]-[135], and simulation [145], [146]. Optimization models can be roughly divided into two classes: continuous (involving real variables only) and discrete (with at least one discrete variable). The objective function(s) (single or multiple) and the constraints of the problem can be linear or nonlinear, convex or concave. Optimization techniques have been applied to several problems in power systems. Thermal unit commitment / hydrothermal coordination, economic dispatch / optimal power flow, maintenance scheduling, reactive sources allocation, and expansion planning are among the most important applications.

Modern heuristic search techniques such as evolutionary algorithms are still not competitive for continuous optimization problems such as economic dispatch and optimal power flow. Successive linear programming, interior point methods, projected augmented Lagrangian, generalized reduced gradient, augmented Lagrangian methods, and sequential quadratic programming all have a long history of successful application to this type of problem. However, the trade-off between modeling precision and optimality has to be taken into account.

One good example is when the input-output characteristics of thermal generators are highly nonlinear (non-monotonically increasing) due to effects such as "valve points" [32], [40], [42]. In this situation, their incremental fuel cost curves cannot be reasonably approximated by quadratic or piecewise quadratic (or linear) functions. Therefore, traditional optimization techniques, although achieving mathematical optimality, have to sacrifice modeling accuracy, providing suboptimal solutions in a practical sense. Evolutionary computation algorithms allow precise modeling of the optimization problem, although they usually provide near-optimal rather than mathematically optimal solutions. Another advantage of using EC for solving optimization problems is that the objective function does not have to be differentiable.

For discrete optimization problems, classical mathematical programming techniques have broken down on large-scale problems. Optimization algorithms such as "branch and bound" and dynamic programming, which seek the best solution, stand no chance on the problems mentioned above unless significant simplifications are assumed (e.g., besides the curse of dimensionality, dynamic programming has difficulty in dealing with time-dependent constraints). Problems such as generation scheduling have the typical features of large-scale combinatorial optimization problems, i.e., NP-complete problems that cannot be solved to optimality in a reasonable amount of time. For this class of problems, general-purpose (problem-independent) heuristic search techniques, such as EC, have been very efficient at finding near-optimal solutions in reasonable time.

2. POWER SYSTEM APPLICATIONS

Generation scheduling is one of the most popular applications of EC to power systems [3]-[31]. The pioneering work of Zhuang and Galiana [3] inspired subsequent papers on the application of general-purpose heuristic methods to unit commitment and hydrothermal coordination. Realistic modeling is possible when solving these problems with such methods. The general problem of generation scheduling is subject to constraints such as power balance, minimum spinning reserve, energy constraints, minimum and maximum allowable generations for each unit, minimum up- and downtimes of thermal generation units, ramp-rate limits for thermal units, and level constraints of storage reservoirs. Incorporation of crew constraints, take-or-pay fuel contracts [7], water balance constraints caused by hydraulic coupling [9], rigorous environmental standards [13], multiple-fuel-constrained generation scheduling [16], and dispersed generation and energy storage [21] is possible with EC techniques.

A current trend is the utilization of EC only for dealing with the discrete optimization problem of deciding the on/off status of the generation units. Mathematical programming techniques are employed to perform the economic dispatch, while meeting all plant and system constraints. Expansion planning is another area which has been extensively studied with the EC approach [55]-[89]. The transmission expansion planning for the North-Northeastern Brazilian network, for which the optimal solution is unknown, has been evaluated using genetic algorithms (GAs). The estimated cost is about 8.5% less than the best solution obtained by conventional optimization [78]. Economic load dispatch / optimal power flow [32]-[50] and maintenance scheduling [51]-[54] have also been solved by EC methods. Another interesting application is the simulation of energy markets [145], [146].
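The division of labor described above, with EC handling the discrete on/off commitment decisions and a conventional dispatch evaluated inside the fitness function, can be illustrated with a toy genetic algorithm. All unit data, demands, and GA parameters below are invented for illustration (none come from the surveyed papers), and a greedy merit-order fill-in stands in for a real economic dispatch:

```python
import random

# Illustrative data only: unit parameters, demands, and GA settings are
# invented for this sketch and do not come from the surveyed papers.
UNITS = [
    # (p_min MW, p_max MW, fuel cost $/MWh, startup cost $)
    (50, 200, 20.0, 100.0),
    (30, 150, 25.0, 80.0),
    (20, 100, 30.0, 60.0),
]
DEMAND = [180, 260, 310, 220]  # MW demand per hour
N, T = len(UNITS), len(DEMAND)
PENALTY = 1e6  # cost assigned to infeasible commitments

def dispatch_cost(status, demand):
    """Greedy merit-order dispatch among committed units (a stand-in for a
    full economic dispatch): every committed unit runs at least p_min, and
    the remaining demand is filled cheapest-first."""
    committed = sorted(
        (u for u, on in zip(UNITS, status) if on), key=lambda u: u[2])
    lo = sum(u[0] for u in committed)
    hi = sum(u[1] for u in committed)
    if not (committed and lo <= demand <= hi):
        return PENALTY  # committed capacity cannot match the demand
    remaining, cost = demand - lo, 0.0
    for p_min, p_max, fuel, _ in committed:
        extra = min(p_max - p_min, remaining)
        remaining -= extra
        cost += (p_min + extra) * fuel
    return cost

def schedule_cost(bits):
    """Fitness of a flat 0/1 string: hourly dispatch cost plus a startup
    cost whenever a unit switches from off to on."""
    total, prev = 0.0, (0,) * N
    for t in range(T):
        status = tuple(bits[t * N:(t + 1) * N])
        total += dispatch_cost(status, DEMAND[t])
        total += sum(u[3] for u, was, now in zip(UNITS, prev, status)
                     if now and not was)
        prev = status
    return total

def evolve(pop_size=40, generations=60, p_mut=0.05, seed=1):
    """Elitist GA over on/off schedules: keep the better half, refill with
    one-point crossover of elite parents plus bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N * T)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=schedule_cost)[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, N * T)
            children.append([1 - g if rng.random() < p_mut else g
                             for g in a[:cut] + b[cut:]])
        pop = elite + children
    best = min(pop, key=schedule_cost)
    return best, schedule_cost(best)

best, cost = evolve()
print("best commitment cost:", cost)
```

Real implementations replace the penalty and the greedy fill-in with a proper constrained dispatch, and extend the fitness with the minimum up/down-time, reserve, and ramp-rate constraints discussed above.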
3. MODEL IDENTIFICATION

System identification methods can be applied to estimate mathematical models based on measurements. Parametric and nonparametric methods are the two main classes of system identification methods. The parametric methods assume a known model structure with unknown parameters. Their performance depends on a good guess of the model order, which usually requires previous knowledge of the system characteristics. System identification can be used for modeling a plant or a problem solution (e.g., pattern recognition [112], [118]). The following sections show a few examples of successful applications of EC to identification.

3.1. Dynamic Load Modeling

The fundamental importance of power system component modeling has been shown in the literature. Regardless of the study to be performed, accurate models for transmission lines, transformers, generators, regulators, and compensators have already been proposed. However, the same has not occurred for loads. Although the importance of load modeling is well known, especially for transient and dynamic stability studies, the random nature of a load composition makes its representation very difficult.

Two approaches have been used for load modeling. The first, based on knowledge of the individual components, obtains the load model through the aggregation of the load component models. The second approach does not require knowledge of the load's physical characteristics. Based on measurements of the load responses to disturbances, the model is estimated using system identification methods.

The composition approach requires information that is not generally available, which is a disadvantage of this method. This approach does not seem to be appropriate, since the determination of an average (and precise) composition for each load bus of interest is virtually impossible. The second approach does not suffer from this drawback, since the load to be modeled can be treated as a black box. However, a significant amount of data related to staged tests and natural disturbances affecting the system needs to be collected.

Considering the shortcomings of the two approaches, and the fact that data acquisition and processing are becoming very cheap, the system identification approach seems more in accordance with current technology. This approach allows real-time load monitoring and modeling, which are necessary for on-line stability analysis. As the dynamic characteristics of the loads are highly nonstationary, structural adaptation of the corresponding mathematical models is necessary. Evolutionary computation can be used in this adaptive process, searching for new model structures and parameters. Examples of this possibility are described in [113] and [115].

3.2. Short-Term Load Forecasting

The importance of short-term load forecasting has increased lately. With deregulation and competition, energy price forecasting has become a big business. Load bus forecasting is essential for feeding the analytical methods used for determining energy prices. The variability and nonstationarity of loads are getting worse due to the dynamics of energy tariffs. Besides, the number of nodal loads to be predicted does not allow frequent interventions from load forecasting specialists. More autonomous load predictors are needed in the new competitive scenario.

With power system growth and the increase in system complexity, many factors have become influential in electric power generation and consumption (load management, energy exchange, spot pricing, independent power producers, non-conventional energy, etc.). Therefore, the forecasting process has become even more complex, and more accurate forecasts are needed. The relationship between the load and its exogenous factors is complex and nonlinear, making it quite difficult to model through conventional techniques, such as linear time series models and linear regression analysis. Besides not providing the required precision, most of the traditional forecasting techniques are not robust enough. They fail to give accurate predictions when quick weather changes occur. Other problems include noise immunity, portability, and maintenance.

Linear methods interpret all regular structure in a data set, such as a dominant frequency, as linear correlations. Therefore, linear models are useful if and only if the power spectrum is a useful characterization of the relevant features of a time series. Linear models can only represent exponentially growing or periodically oscillating behavior. Therefore, all irregular behavior of a system has to be attributed to a random external input to the system. Chaos theory has shown that random input is not the only possible source of irregularity in a system's output.

The goal in creating an ARMA model is to have the residual as white noise [111]. This is equivalent to producing a flat power spectrum for the residual. However, in practice, this goal cannot be perfectly achieved. Suspicious anomalies in the power spectrum are very common, i.e., the residual's power spectrum is not really flat. Consequently, it is difficult to say whether the residual corresponds to white noise or whether there is still some useful information to be extracted from the time series. Neural networks can find predictable patterns that cannot be detected by classical statistical tests such as auto(cross)correlation coefficients and power spectra.

Besides, many observed load series exhibit periods during which they are less predictable, depending on the past history of the series. This dependence on the past of the series cannot be represented by a linear model [119]. Linear models fail to consider the fact that certain past histories may permit more accurate forecasting than others. Therefore, unlike nonlinear models, they cannot identify the circumstances under which more accurate forecasts can be expected.

The ability of neural networks (NNs) to map complex nonlinear relationships is responsible for the growing number of their applications to load forecasting. Several electric utilities, all over the world, have been applying NNs to short-term load forecasting on an experimental or operational basis. Despite their success, there are still some technical issues that surround the application of NNs to load forecasting, particularly with regard to parameterization. The main issue in the application of
feedforward NNs to time series forecasting is the question of how to achieve good generalization. The ability of an NN to generalize is extremely sensitive to the choice of the network's architecture, preprocessing of data, choice of activation functions, number of training cycles, size of training sets, learning algorithm, and the validation procedure.

The greatest challenges in NN training are related to the issues raised in the previous paragraph. The huge number of possible combinations of all NN training parameters makes its application not very reliable. This is especially true when a nonstationary system has to be tracked, i.e., adaptation is necessary, as is the case in load forecasting. Nonparametric NN models have been proposed in the literature [114]. With nonparametric modeling methods, the underlying model is not known, and it is estimated using a large number of candidate models to describe the available data. Application of this kind of model to short-term load forecasting has been neglected in the literature. Although the very first attempt to apply this idea to short-term load forecasting dates back to 1975 [109], it is still one of the few investigations on this subject, despite the tremendous increase in computational resources.

3.3. Neural Network Training

The main motivation for developing nonparametric NNs is the creation of fully data-driven models, i.e., automatic selection of the candidate model of the right complexity to describe the training data. The idea is to leave only the data gathering task to the designer. Obviously, the state of the art in this area has not reached that far. Every so-called nonparametric model still has some dependence on a few preset training parameters. A very useful by-product of the automatic estimation of the model structure is the selection of the most significant input variables for synthesizing a desired mapping. Input variable selection for NNs has been performed using the same techniques applied to linear models. However, it has been shown that the best input variables for linear models are not among the good input variables for nonlinear ones.

3.3.1. Pruning Versus Growing

Nonparametric NN training uses two basic mechanisms for finding the most appropriate architecture: pruning and growing. Pruning methods assume that the initial architecture contains the optimal structure. It is common practice to start the search using an oversized network. The excessive connections and/or neurons have to be removed during training, while adjusting the remaining parts. The pruning methods have the following drawbacks:
- there is no mathematically sound initialization for the neural network architecture, therefore initial guesses usually use very large structures; and
- due to the previous argument, a lot of computational effort is wasted.

As the growing methods operate in the opposite direction of the pruning methods, the shortcomings mentioned before are overcome. However, the incorporation of one element has to be evaluated independently of other elements that could be added later. Therefore, pruning should be applied as a complementary procedure to growing methods, in order to remove parts of the model that become unnecessary during the constructive process.

3.3.2. Types of Approximation Functions

The greatest concern when applying nonlinear NNs is to avoid unnecessarily complex models that overfit the training patterns. The ideal model is the one that matches the complexity of the available data. However, it is desirable to work with a general model that can provide any required degree of nonlinearity. Among the models that can be classified as universal approximators, i.e., the ones that can approximate any continuous function with arbitrary precision, the following types are the most important:
- multilayer networks;
- local basis function networks;
- trigonometric polynomials; and
- algebraic polynomials.

The universal approximators above can be linear, although nonlinear in the inputs, or nonlinear in the parameters. Regularization criteria, either analytic (e.g., Akaike's Information Criterion, Minimum Description Length, etc.) or based on resampling (e.g., cross-validation), have been proposed. In practice, model regularization considering nonlinearity in the parameters is very difficult. An advantage of using universal approximators that are linear in the parameters is the possibility of decoupling the exploration of the architecture space from the weight space search. Methods for selecting models with nonlinearity in the parameters attempt to explore both spaces simultaneously, which is an extremely hard nonconvex optimization problem.

4. CONTROL

Another important application of EC in power systems is the parameter estimation and tuning of controllers. Complex systems cannot be efficiently controlled by standard feedback, since the effects caused by plant parameter variations are not eliminated. Adaptive control can be applied in this case, i.e., when the plant is time-invariant with partially unknown parameters or the plant is time-variant.

In practical applications, it is difficult to express the real plant dynamics in mathematical equations. Adaptive control schemes can adjust the controller according to process characteristics, providing a high performance level. The adaptive control problem is concerned with the dynamic adjustment of the controller parameters, such that the plant output follows the reference signal.

However, conventional adaptive control has some drawbacks. Existing adaptive control algorithms work for specific problems. They do not work as well for a wide range of problems. Every application must be analyzed individually, i.e., a specific problem is solved using a specific algorithm. Besides, a compatibility study between the model and the adaptive algorithm has to be performed.

The benefits of coordination between voltage regulation and damping enhancements in power systems are well known. This
problem has been aggravated as power networks' operational margins decrease. GAs can be used to simultaneously tune the parameters of automatic voltage regulators with the parameters of power system stabilizers and terminal voltage limiters of synchronous generators [120]-[135].

One of the objectives is to maximize closed-loop system damping, which is basically achieved by tuning the stabilizer parameters. The second objective is to minimize the terminal voltage regulating error, which is mainly accomplished by tuning the automatic voltage regulator and terminal voltage limiter parameters. Minimum closed-loop damping and maximum allowable overshoot are constraints that can be incorporated into the optimization problem. The design of the control system takes into account system stabilization over a prespecified set of operating conditions (nominal and contingencies).

5. CONCLUSIONS

This survey has covered the papers published in the IEEE Transactions and IEE Proceedings on EC applications to power systems. System expansion planning and generation scheduling have been the most carefully investigated problems. Great progress, particularly for transmission system expansion planning, has been reported. Distribution systems have also attracted significant attention.

Regarding the different EC techniques, GAs are by far the most popular. The application of GAs to large-scale problems is still in progress. Only a few truly large power systems have been used for testing the EC approach. Evolutionary programming, although less popular than GAs, has shown great potential for tackling complex practical applications.

After recognizing the limitations of the initial ideas, the EC research community has merged many ideas that had been independently proposed. In fact, algorithms based on evolution theory are becoming pretty much similar. Simulated annealing, although the oldest heuristic search technique applied to power systems, has shown limited capacity for dealing with large-scale problems. Tabu search and particle swarm still need more empirical evidence of their potential for solving such problems.

There is a clear trend towards the combination of EC methods among themselves, and with classical mathematical programming techniques (e.g., [5], [7], [8], [16], [24], [28]). The challenge for power engineers is to incorporate domain-specific knowledge into the heuristic search process, without deteriorating the EC exploration capability. A good start has been made, but the real challenges still lie ahead.

6. ACKNOWLEDGMENTS

This work was supported by the Brazilian Research Council (CNPq) under grant No. 300054/91-2. Alves da Silva would also like to thank PRONEX for the financial support and the editors for their time and effort reviewing this chapter.

7. REFERENCES

7.1. Surveys

[1] V. Miranda, D. Srinivasan, and L.M. Proença: "Evolutionary computation in power systems", 12th Power Systems Computation Conference, August 1996, Vol. 1, pp. 25-40.
[2] K. Nara: "State of the arts of the modern heuristics application to power systems", IEEE PES Winter Meeting, January 2000, Vol. 2, pp. 1279-1283.

7.2. Generation Scheduling

[3] F. Zhuang and F. Galiana: "Unit commitment by simulated annealing", IEEE Transactions on Power Systems, Vol. 5, No. 1, February 1990, pp. 311-317.
[4] D. Dasgupta and D.R. McGregor: "Thermal unit commitment using genetic algorithms", IEE Proceedings - Generation, Transmission and Distribution, Vol. 141, No. 5, September 1994, pp. 459-465.
[5] K.P. Wong and Y.W. Wong: "Thermal generator scheduling using hybrid genetic/simulated-annealing approach", IEE Proceedings - Generation, Transmission and Distribution, Vol. 142, No. 4, July 1995, pp. 372-380.
[6] S.A. Kazarlis, A.G. Bakirtzis, and V. Petridis: "A genetic algorithm solution to the unit commitment problem", IEEE Transactions on Power Systems, Vol. 11, No. 1, February 1996, pp. 83-92.
[7] K.P. Wong and Y.W.W. Suzannah: "Combined genetic algorithm / simulated annealing / fuzzy set approach to short-term generation scheduling with take-or-pay fuel contract", IEEE Transactions on Power Systems, Vol. 11, No. 1, February 1996, pp. 128-136.
[8] X. Bai and S.M. Shahidehpour: "Hydrothermal scheduling by tabu search and decomposition method", IEEE Transactions on Power Systems, Vol. 11, No. 2, May 1996, pp. 968-974.
[9] P.H. Chen and H.C. Chang: "Genetic aided scheduling of hydraulically coupled plants in hydrothermal coordination", IEEE Transactions on Power Systems, Vol. 11, No. 2, May 1996, pp. 975-981.
[10] P.C. Yang, H.T. Yang, and C.L. Huang: "Scheduling short-term hydrothermal generation using evolutionary programming techniques", IEE Proceedings - Generation, Transmission and Distribution, Vol. 143, No. 4, July 1996, pp. 371-376.
[11] T.T. Maifeld and G.B. Sheble: "Genetic-based unit commitment algorithm", IEEE Transactions on Power Systems, Vol. 11, No. 3, August 1996, pp. 1359-1370.
[12] D. Srinivasan and A.G.B. Tettamanzi: "Heuristics-guided evolutionary approach to multiobjective generation scheduling", IEE Proceedings - Generation, Transmission and Distribution, Vol. 143, No. 6, November 1996, pp. 553-559.
[13] D. Srinivasan and A.G.B. Tettamanzi: "An evolutionary algorithm for evaluation of emission compliance options in view of the clean air act amendments", IEEE Transactions on Power Systems, Vol. 12, No. 1, February 1997, pp. 336-341.
[14] S.J. Huang and C.L. Huang: "Application of genetic-based neural networks to thermal unit commitment", IEEE Transactions on Power Systems, Vol. 12, No. 2, May 1997, pp. 654-660.
[15] H.T. Yang, P.C. Yang, and C.L. Huang: "A parallel genetic algorithm approach to solving the unit commitment problem: implementation on the transputer networks", IEEE Transactions on Power Systems, Vol. 12, No. 2, May 1997, pp. 661-668.
[16] K.P. Wong and Y.W.W. Suzannah: "Hybrid genetic / simulated annealing approach to short-term multiple-fuel-constrained generation scheduling", IEEE Transactions on Power Systems, Vol. 12, No. 2, May 1997, pp. 776-784.
[17] A.H. Mantawy, Y.L. Abdel-Magid, and S.Z. Selim: "Unit commitment by tabu search", IEE Proceedings - Generation, Transmission and Distribution, Vol. 145, No. 1, January 1998, pp. 56-64.
[18] A.H. Mantawy, Y.L. Abdel-Magid, and S.Z. Selim: "A simulated annealing algorithm for unit commitment", IEEE Transactions on Power Systems, Vol. 13, No. 1, February 1998, pp. 197-204.
[19] S.O. Orero and M.R. Irving: "A genetic algorithm modelling framework and solution technique for short term optimal hydrothermal scheduling", IEEE Transactions on Power Systems, Vol. 13, No. 2, May 1998, pp. 501-518.
[20] H.C. Chang and P.H. Chen: "Hydrothermal generation scheduling package: a genetic based approach", IEE Proceedings - Generation, Transmission and Distribution, Vol. 145, No. 4, July 1998, pp. 451-457.
[21] I.F. MacGill and R.J. Kaye: "Decentralised coordination of power system operation using dual evolutionary programming", IEEE Transactions on Power Systems, Vol. 14, No. 1, February 1999, pp. 112-119.
[22] E.S. Huse, I. Wangensteen, and H.H. Faanes: "Thermal power generation scheduling by simulated competition", IEEE Transactions on Power Systems, Vol. 14, No. 2, May 1999, pp. 472-477.
[23] S.J. Huang: "Application of genetic based fuzzy systems to hydroelectric generation scheduling", IEEE Transactions on Energy Conversion, Vol. 14, No. 3, August 1999, pp. 724-730.
[24] A.H. Mantawy, Y.L. Abdel-Magid, and S.Z. Selim: "Integrating genetic algorithms, tabu search, and simulated annealing for the unit commitment problem", IEEE Transactions on Power Systems, Vol. 14, No. 3, August 1999, pp. 829-836.
[25] T.G. Werner and J.F. Verstege: "An evolution strategy for short-term operation planning of hydrothermal power systems", IEEE Transactions on Power Systems, Vol. 14, No. 4, November 1999, pp. 1362-1368.
[26] K.A. Juste, H. Kita, E. Tanaka, and J. Hasegawa: "An evolutionary programming solution to the unit commitment problem", IEEE Transactions on Power Systems, Vol. 14, No. 4, November 1999, pp. 1452-1459.
[27] A. Rudolf and R. Bayrleithner: "A genetic algorithm for solving the unit commitment problem of a hydrothermal power system", IEEE Transactions on Power Systems, Vol. 14, No. 4, November 1999, pp. 1460-1468.
[28] C.P. Cheng, C.W. Liu, and C.C. Liu: "Unit commitment by Lagrangian relaxation and genetic algorithms", IEEE Transactions on Power Systems, Vol. 15, No. 2, May 2000, pp. 707-714.
[29] C.W. Richter and G.B. Sheble: "A profit-based unit commitment GA for the competitive environment", IEEE Transactions on Power Systems, Vol. 15, No. 2, May 2000, pp. 715-721.
[30] R.H. Liang and F.C. Kang: "Thermal generating unit commitment using an extended mean field annealing neural network", IEE Proceedings - Generation, Transmission and Distribution, Vol. 147, No. 3, May 2000, pp. 164-170.
[31] Y.G. Wu, C.Y. Ho, and D.Y. Wang: "A diploid genetic approach to short-term scheduling of hydro-thermal system", IEEE Transactions on Power Systems, Vol. 15, No. 4, November 2000, pp. 1268-1274.

7.3. Economic / Reactive Dispatch and Optimal Power Flow

[32] D.C. Walters and G.B. Sheble: "Genetic algorithm solution of economic dispatch with valve point loading", IEEE Transactions on Power Systems, Vol. 8, No. 3, August 1993, pp. 1325-1332.
IEEE Transactions on Power Systems, Vol. 11, No. 1, February 1996, pp. 112-118.
[41] J.X. Xu, C.S. Chang, and X.W. Wang: "Constrained multiobjective global optimisation of longitudinal interconnected power system by genetic algorithm", IEE Proceedings - Generation, Transmission and Distribution, Vol. 143, No. 5, September 1996, pp. 435-446.
[42] S.O. Orero and M.R. Irving: "Economic dispatch of generators with prohibited operating zones: a genetic algorithm approach", IEE Proceedings - Generation, Transmission and Distribution, Vol. 143, No. 6, November 1996, pp. 529-534.
[43] Y.H. Song, G.S. Wang, P.Y. Wang, and A.T. Johns: "Environmental/economic dispatch using fuzzy logic controlled genetic algorithms", IEE Proceedings - Generation, Transmission and Distribution, Vol. 144, No. 4, July 1997, pp. 377-382.
[44] K.P. Wong and J. Yuryevich: "Evolutionary-programming-based algorithm for environmentally-constrained economic dispatch", IEEE Transactions on Power Systems, Vol. 13, No. 2, May 1998, pp. 301-306.
[45] C.S. Chang and W. Fu: "Stochastic multiobjective generation dispatch of combined heat and power systems", IEE Proceedings - Generation, Transmission and Distribution, Vol. 145, No. 5, September 1998, pp. 583-591.
[46] D.B. Das and C. Patvardhan: "New multiobjective stochastic search technique for economic load dispatch", IEE Proceedings - Generation, Transmission and Distribution, Vol. 145, No. 6, November 1998, pp. 747-752.
[47] J. Yuryevich and K.P. Wong: "Evolutionary programming based optimal power flow algorithm", IEEE Transactions on Power Systems, Vol. 14, No. 4, November 1999, pp. 1245-1250.
[48] J.R. Gomes and O.R. Saavedra: "Optimal reactive power dispatch using evolutionary computation: extended algorithms", IEE Proceedings - Generation, Transmission and Distribution, Vol. 146, No. 6, November 1999, pp. 586-592.
[49] N. Li, Y. Xu, and H. Chen: "FACTS-based power flow control in interconnected power system", IEEE Transactions on Power Systems, Vol. 15, No. 1, February 2000, pp. 257-262.
[50] H. Yoshida, K. Kawata, Y. Fukuyama, S. Takayama, and Y. Nakanishi: "A particle swarm optimization for reactive power and voltage control considering voltage security assessment", IEEE Transactions on Power Systems, Vol. 15, No. 4, November 2000, pp. 1232-1239.

7.4. Maintenance Scheduling

[51] T. Satoh and K. Nara: "Maintenance scheduling by using simulated annealing method (for power plants)", IEEE Transactions on Power Systems, Vol. 6, No. 2, May 1991, pp. 850-857.
[58] Y.L. Chen and C.C. Liu: "Multiobjective VAr planning using the goal-attainment method", IEE Proceedings - Generation, Transmission and Distribution, Vol. 141, No. 3, May 1994, pp. 227-232.
[59] K.P. Wong and Y.W. Wong: "Short-term hydrothermal scheduling part I. Simulated annealing approach", IEE Proceedings - Generation, Transmission and Distribution, Vol. 141, No. 5, September 1994, pp. 497-501.
[60] K.P. Wong and Y.W. Wong: "Short-term hydrothermal scheduling part II. Parallel simulated annealing approach", IEE Proceedings - Generation, Transmission and Distribution, Vol. 141, No. 5, September 1994, pp. 502-506.
[61] Y.L. Chen and C.C. Liu: "Interactive fuzzy satisfying method for optimal multi-objective VAr planning in power systems", IEE Proceedings - Generation, Transmission and Distribution, Vol. 141, No. 6, November 1994, pp. 554-560.
[62] Y.L. Chen and C.C. Liu: "Optimal multi-objective VAr planning using an interactive satisfying method", IEEE Transactions on Power Systems, Vol. 10, No. 2, May 1995, pp. 664-670.
[63] W.S. Jwo, C.W. Liu, C.C. Liu, and Y.T. Hsiao: "Hybrid expert system and simulated annealing approach to optimal reactive power planning", IEE Proceedings - Generation, Transmission and Distribution, Vol. 142, No. 4, July 1995, pp. 381-385.
[64] K.Y. Lee, X. Bai, and Y.M. Park: "Optimization method for reactive power planning by using a modified simple genetic algorithm", IEEE
[78] R.A. Gallego, A. Monticelli, and R. Romero: "Comparative studies on non-convex optimization methods for transmission network expansion planning", IEEE Transactions on Power Systems, Vol. 13, No. 3, August 1998, pp. 822-828.
[79] L.L. Lai and J.T. Ma: "Practical application of evolutionary computing to reactive power planning", IEE Proceedings - Generation, Transmission and Distribution, Vol. 145, No. 6, November 1998, pp. 753-758.
[80] P. Paterni, S. Vitet, M. Bena, and A. Yokoyama: "Optimal location of phase shifters in the French network by genetic algorithm", IEEE Transactions on Power Systems, Vol. 14, No. 1, February 1999, pp. 37-42.
[81] Y.M. Park, J.R. Won, J.B. Park, and D.G. Kim: "Generation expansion planning based on an advanced evolutionary programming", IEEE Transactions on Power Systems, Vol. 14, No. 1, February 1999, pp. 299-305.
[82] A.J. Urdaneta, J.F. Gomez, E. Sorrentino, L. Flores, and R. Diaz: "A hybrid genetic algorithm for optimal reactive power planning based upon successive linear programming", IEEE Transactions on Power Systems, Vol. 14, No. 4, November 1999, pp. 1292-1298.
[83] C.J. Chou, C.W. Liu, J.Y. Lee, and K.D. Lee: "Optimal planning of large passive-harmonic-filters set at high voltage level", IEEE Transactions on Power Systems, Vol. 15, No. 1, February 2000, pp. 433-441.
[84] R.A. Gallego, R. Romero, and A.J. Monticelli: "Tabu search algorithm
Transactions on Power Systems, Vol. 10, No.4, November 1995, pp. for network synthesis", IEEE Transactions on Power Systems, Vol. 15,
18431850. No.2, May 2000, pp. 490495.
r65] R. Romero, R.A. Gallego, and A. Monticelli: "Transmission system [85] J.B. Park, Y.M. Park, J.R. Won, and K..Y. Lee: "An improved genetic
expansion planning by simulated annealing", IEEE Transactions on algorithm for generation expansion planning", IEEE Transactions on
PowerSystems, Vol. 11, No.1, February 1996,pp. 364369. Power Systems,Vol. IS, No.3, August 2000, pp. 916922.
[66] Y. Fukuyama and H.D. Chiang: "A parallel genetic algorithm for [86] M. Delfanti, G.P. Granelli, P. Marannino, and M. Montagna: "Optimal
generation expansion planning", IEEE Transactions on Power Systems, capacitor placement using deterministic and genetic algorithms", IEEE
Vol. 11,No.2, May 1996,pp. 955961. Transactions on Power Systems, Vol. 15, No.3, August 2000, pp. 1041
[67] J.T. Ma and L.L. Lai: "Evolutionary programming approach to reactive 1046.
power planning", lEE Proceedings  Generation, Transmission and [87] E.L. da Silva, H.A. Gil, and J.M. Areiza: "Transmission network
Distribution, Vol. 143,No.4, July 1996,pp. 365370. expansion planning under an improved genetic algorithm", IEEE
[68] H. Rudnick, R. Palma, E. Cura, and C. Silva: "Economically adapted Transactions on Power Systems, Vol. 15, No.3, August 2000, pp. 1168
transmission systems in open access schemes  application of genetic 1174.
algorithms", IEEE Transactions on Power Systems, Vol. 11, No.3, [88] E.L. da Silva, J.M. Areiza Ortiz, G.C. de Oliveira, and S. Binato:
August 1996,pp. 14271440. "Transmission network expansion planning under a tabu search
[69] Y.L. Cben: ~'Weak bus oriented reactive power planning for system approach", IEEE Transactions on Power Systems, Vol. 16, No.1,
security", lEE Proceedings  Generation, Transmission and Distribution, February 2001, pp. 6268.
Vol. 143,No.6, November 1996, pp. 541545. [89] IJ. RamirezRosado and J .L. BernalAgustin: "Reliability and costs
[70] Y.L. Cben: uWeakbusoriented optimal multiobjective VAr pl8DDiDg", optimization for distribution networks expansion using aD evolutionary
IEEE Tl'8DS8Ctions on Power Systems, Vol. 11, No.4, November 1996, algorithm", IEEE TransactiODS OD Power Systems, Vol. 16, No.1,
pp. 18851890. February 2001, pp. 111118.
[71] R.A. Gallego, A.B. Alves, A. Monticelli, and R. Romero: "Parallel
simulated annealing applied to long term transmission network expansion
plaDning", IEEE TraDSllCtions on Power Systems, Vol. 12, No.1,
7.6. Distribution Systems Planning and Operation
February1997,pp. 181188.
[72] L.L. Lai and J.T. Ma: uApplication of evolutionary programming to [90] C.W. Hasselfield, P. Wilson, L. Penner, M. Lau, and A.M. Gole: "An
reactive power planning  comparison with nonlinear programming automated method for least cost distribution plaDning", IEEE
approach", IEEE Transactions on Power Systems, Vol. 12, No.1, Transactions on Power Delivery, Vol. 5, No.2, April 1990, pp. 1188
February1997,pp. 198206. 1194.
[73] C.W. Liu, W.S. Jwo, C.C. Liu, and Y.T. Hsiao: "A fast global [91] K. Nara, A. Shiose, M. Kitagawa, and T. Ishihara: "Implementation of
optimization approach to V Ar plaDniDg for the large scale electric power genetic algorithm for distribution systems loss minimum
systems", IEEE TraDSllCtions on Power Systems, Vol. 12, No.1, reconfiguration", IEEE Transactions on Power Systems, Vol. 7, No.3,
February 1997,pp. 437443. August 1992,pp. 10441051.
[74] J. Zhu and M.Y. Chow: "A review of emergingtechniques on generation [92] G.G. Richards and H. Yang: "Distribution system harmonic worst case
expansion planning",IEEE Transactions on Power Systems,Vol. 12, No. design using a genetic algorithm",IEEE Tnmsactionson Power Delivery,
4, November 1997,pp. 17221728. Vol. 8, No.3, July 1993,pp. 14841491.
[75] K.Y. Lee and F.F. Yang: "Optimal reactive power planning using [93] R.F. Chu, J.C. Wang, and H.D. Chiang: "Strategic plaDning of LC
evolutionary algorithms: a comparative study for evolutionary compensators in nonsinusoidal distribution systems", IEEE Transactions
programming, evolutionary strategy, genetic algorithm, and linear on Power Delivery,Vol. 9, No.3, July 1994, pp. 15581563.
programming", IEEE Transactions on Power Systems, Vol. 13, No.1, [94] S. Sundhararajan and A. Pahwa: "Optimal selection of capaciton for
February 1998,pp. 101108. radial distribution systems using a geneticalgorithm", IEEETransactions
[76] e.s. Chang and J.S. Huang: "Optimal multiobjective SVC planning for on Power Systems,Vol. 9, No.3, August 1994,pp. 14991507.
voltage stability enbancement", lEE Proceedings  Generation, [95] V. Miranda, J.V. Ranito, and L.M. Proen~ "Genetic algorithms in
Transmission and Distribution, Vol. 145, No.2, March 1998, pp. 203 optimal multistage distribution network planning", IEEETransactions on
209. Power Systems, Vol. 9, No.4, November 19.94, pp. 19271933.
[77] Y.L. Chen: "Weightednorm approach for multiobjective VAr (96] H.D. Chiang, J.C. Wang, J. Tong, and G. DarliDg: "Optimal capacitor
planning", lEE Proceedings  Generation, Transmissionand Distribution, placement,replacement and control in largescaleunbalanced distribution
Vol. 145,No.4, July 1998, pp. 369374. systems: modeling and a new fomulation", IEEETransactions on Power
Systems,Vol. 10,No.1, February 1995,pp. 356362.
21
[97] H.D. Chiang, J.C. Wang, J. Tong, and G. Darling: "Optimal capacitor [116] AJ. Gaul, E. Handscbin, W. Hoffmann, and C. Lehmkoster:
placement, replacement and control in largescale unbalanced distribution "Establishing a role base for a hybrid ESlXPS approach to load
systems: system solution algorithms and numerical studies", IEEE management", IEEE Transactions on Power Systems, Vol. 13, No.1,
Transactions on PowerSystems, Vol. 10, No.1, February 1995,pp. 363 February1998,pp. 8693.
369. [117] C.H. Kung, MJ. Devaney, C.M. Huang, and C.M. Kung: "Fuzzy
[98] E.C. Yeh, S.S. Venkata, and Z. Sumic: "Improved distribution system based adaptive digital power metering using a genetic algorithm", IEEE
7
planning using computational evolution IEEE Transactions on Power
', Transactions on Instrumentation and Measurement, Vol. 47, No.1,
Systems, Vol. 11,No.2, May 1996,pp. 668674. February1998,pp. 183188.
[99] D. Jiang and R. Baldick: "Optimal electric distribution System switch [118] J.C.S. Souza, A.M. Leite cia Silva, and A.P. Alves da Silva: "Online
reconfiguration and capacitor control", IEEE Transactions on Power topology determination and bad data supression in power system
Systems, Vol. 11, No.2, May 1996,pp. 89().897. operation using artificial neural networks", IEEE Transactions on Power
[100] R. Billintonand S. Jonnavitbu1a: "Optimal switchingdevice placementin Systems,Vol. 13,No.3, August 1998, pp. 796803.
radial distribution systems", IEEE Transactions on Power Delivery, Vol. [119] A.P. Alves da Silva and L.S. Moulin: "Confidence intervalsfor neural
11,No.3, July 1996, pp. 16461651. network based shortterm load forecasting", IEEETnmsactions on Power
[101] S. Jonnavithula and R. Billinton: "MiDirmun cost analysis of feeder Systems, Vol. 15,No.4, November 2000, pp. 11911196.
routing in distribution system planning", IEEE Transactions on Power
Delivery, Vol. 11,No.4, October 1996,pp. 19351940.
[102] Y.C. Huang, H.T. Yang, and C.L. Huang: "Solving the capacitor
7.8. Control
placement problem in a radial distribution system using tabu search
approach", IEEE Transactions on Power Systems, Vol. 11, No.4, [120] R. Asgbarianand S.A. Tavakoli: "A systematic approachto perfonnance
November 1996,pp. 18681873. weightsselectionin design of robustHlsup lap) infmIl PSS usinggenetic
[103] K.N. Miu, H.D. Chiang, and G. Darling: "Capacitor placement, algorithms", IEEE Transactions on Energy Conversion, Vol. II, No.1,
replacement and control in largescale distribution systems by a GA March1996,pp. 111117.
based twostage algorithm", IEEE Tl'IIlsactions on Power Systems, Vol. [121] P. J~ E. Handschin, and F. Reyer: "Genetic algorithm aided controller
12,No.3, August 1997,pp. 11601166. design with application to SVC", lEE Proceedinp  Generation,
[104] IJ. RamirezRosado and J.L. BernalAgustin: "Genetic algorithms Transmission and Distribution, Vol. 143,No.3, May 1996,pp. 258262.
applied to the design of large power distribution systems", IEEE [122] Y.L. AbdelMagid, M. Bettayeb, and M.M. Dawoud: "Simultaneous
Transactions on PowerSystems,Vol. 13,No.2, May 1998,pp. 696..703. stabilisation of power systems using genetic algorithms", lEE
[105] J. Zhu, G. Bilbro, and M.Y. Chow: "Phase balancing using simulated Proceedings  Generation,Transmission aucl Distribution, Vol. 144, No.
annealing", IEEE Transactions on Power Systems, Vol. 14, No.4, 1, January 1997, pp. 3944.
November 1999,pp. 15081513. [123] G.M. Taranto and D.M. Falcio: "Robust decentralised control design
[106] A.S.Chuangand F. Wu: "An extensible genetic algorithmframework for using genetic algorithms in power system damping control", lEE
problem solving in a common environment", IEEE Transactions on Proceedings  Generation,Transmission and Distribution, Vol. 145, No.
PowerSystems,Vol. IS, No.1, Febrwuy 2000, pp. 269275. 1, January 1998,pp. 16.
[107] T.H.Chen and J.T. Cherng: "Optimal pbue arrangementof distribution [124] M. Reformat, E. Kuffel, D. Woodford, and W. Pedrycz: "Application of
transformers connected to a primary feeder for system unbalance genetic algorithms for control design in power systems", lEE
improvement and loss reduction using a genetic algorithm", IEEE Proceedings  Generation,Trausmission and Distribution, Vol. 145,No.
Transactions on Power Systems, Vol. 15, No.3, August 2000, pp. 994 4, July 1998,pp. 345354.
1000. [125] J. Wen, S. Cheng, and O.P. Malik: "A syncluonous generator fuzzy
[108] Y.T. Hsiao and C.Y. Chien: "EDhancement of restoration service in excitationcontroller optimally desiped with a pnetic algorithm", IEEE
distribution systems using a combination fuzzyGA method IEEE 7
',
Trausactions on Power Systems, Vol. 13, No.3, Aupst 1998, pp. 884
Transactions on Power Systems, Vol. 15, No.4, November 2000, pp. 889.
13941400. [126] X.R. Chen, N.C. Pahalawathtba, U.D. ADDakkage, and C.S. K.umble:
"Design of decentralised output feedbackTCSC damping controllers by
using simulatedannealing",lEE Proceedings • Generation, Transmission
7.7. Load Forecasting / Management and ModelIdentification and Distribution, Vol. 145,No. S, September1998,pp. 553558.
[127] M.A. Abido and Y.L. AbdelMaaid: "Hybridizing rulebased power
[109] T.S. Dillon, K. Morsztyn, and K. Phua: "Shon tenn load forecasting system stabilizers with pnetic algorithms", IEEE Tnnsactions on Power
using adaptive pattern recognition and selforganizing techniques", Sib Systems,Vol. 14,No.2, May 1999,pp. 600607.
PowerSystem Computation Conference, September 1975, Vol. 1, Paper [128] Y.L. AbdelMagid, M.A. Abido, S. AlBaiyat, and A.B. Mantawy:
2.413. "SimultaDeous stabilization of multimacbine power systems via genetic
[110] H. Morland H. Kobayashi: '6()ptimal fuzzy inferencefor shorttenn load algorithms", IEEE Tnmsactions on Power Systems, Vol. 14, No.4,
forecasting", IEEE Transactions on Power Systems, Vol. 11, No.1, November1999, pp. 14281439.
February 1996, pp. 390396. [129] M. Welsh, P. Mehta, and M.K. Darwish: "Genetic algorithm and
[111] H.T. Yang, C.M. Huang, and C.L. Huang: "Identificationof ARMAX extended analysis optimisation techniques for switched capacitor active
model for shon tenn load forecasting: an evolutionary programming filterscomparative study", lEE Proceedings  Electric Power
approach", IEEE Transactions on Power Systems, Vol. 11, No.1, Applications, Vol. 147,No.1, January2000,pp. 2126.
February 1996,pp. 403408. [130] A.L.B. do Bomfim, G.N. Taranto, aDd D.M. Falclo: "Simultaneous
[112] J.C.S. Souza, A.M. Leite cia Silva, and A.P. Alves da Silva: "Data tuning of power system damping controllers using genetic algorithms",
debugging for realtime power system monitoring based on pattern IEEETransactionson PowerSystems, Vol. 15,No.1, February 2000,pp.
analysis", IEEE Transactionson Power Systems, Vol. 11, No.3, August 163169.
1996, pp. 15921599. [131] YL. AbdelMagid, M.A. Abido,and A.H. Mantaway: "Robusttuningof
[113] P. Ju, E. Handscbin, and D. Karlsson: "Nonlinear dynamic load power system stabilizers in multimlcbine power systems", IEEE
modelling: model and parameter estimation", IEEE Transactions on Tnmsactionson Power Systems, VoL 15,No.2, May2000,pp. 735740.
PowerSystems,Vol. 11,No.4, November 1996,pp. 16891697. [132] P. Zhang and A.H. CooDick: "CoordiDated syDthesis of PSS parameters
[114] T.Y. Kwok and D.Y. Yewg: "Constructive algorithms for structure in multimachine power systems _ the method of inequalities applied
learning in feedforward neural networks for regression problems", IEEE to genetic algorithms", IEEE Transactions on Power Systems, Vol. 15,
Transactions on NeuralNetworks, Vol. 8, No.3, May 1997,pp. 630645. No.2, May2000, pp. 811816.
[115] A.P. Alves cia Silva, C. Ferreira, A.C. Zambroni de Souza, and G. [133] M.A. Abide: "Robust design of multimachine power system stabilizers
LambertTorres:"A new CODStnlCtive ANN and its applicationto electric using simulated annealiDg", IEEE Tnmsactions on Energy Conversion,
loadrepresentation", IEEE Tl'IDSaCtions on Power Systems,Vol. 12, No. Vol. 15,No.3, September 2000,pp. 297304.
4, November 1997,pp. 15691575. [134] M.A. Abido and Y.L. AbdelMagid: "R.obust design of multimacbine
power system stabilisers using tabu search algorithm", lEE Proceedings 
22
Generation,Transmission and Distribution, Vol. 147, No.6, November
2000, pp. 387394.
[135] R.A.F. Saleh and H.R. Bolton: "Genetic algorithmaided design of a 7.10. State Estimation and Analysis
fuzzy logic stabilizer for a superconducting generator", IEEE
Transactions on Power Systems, Vol. 15, No.4, November 2000, pp. [141] M.R. Irving and MJ.H. Sterling: "Optimal network tearing using
13291335. simulated annealing", lEE Proceedings  Generation, Transmission and
Distribution, Vol. 137,No.1, January 1990,pp. 6972.
7.9. Alarm Processing, Fault Diagnosis, and Protection [142] T.L. Baldwin, L. Mili, M.B. Boisen Jr., and R. Adapa: "Power system
°
23
Chapter 3
Fundamentals of Genetic Algorithms
Abstract—Research on genetic algorithms (GAs) has shown that the initial proposals are incapable of solving hard problems in a robust and efficient way. Usually, for large-scale optimization problems, the execution time of first generation GAs dramatically increases while solution quality decreases. The focus of this tutorial chapter is pointing out the main design issues in tailoring GAs to large-scale optimization problems. Important topics such as encoding schemes, selection procedures, self-adaptive and knowledge-based operators are discussed.

Index Terms—Genetic algorithms, large-scale optimization, power systems.

1. MODERN HEURISTIC SEARCH TECHNIQUES

Optimization is the basic concept behind the application of genetic algorithms (GAs), or any other evolutionary algorithm [1]-[3], to any field of interest. Over and above the problems in which optimization itself is the final goal, it is also a way for (or the main idea behind) modeling, forecasting, control, simulation, etc. Traditional optimization techniques begin with a single candidate and iteratively search for the optimal solution by applying static heuristics. On the other hand, the GA approach uses a population of candidates to search several areas of a solution space, simultaneously and adaptively.

Evolutionary computation allows precise modeling of the optimization problem, although not usually providing mathematically optimal solutions. Another advantage of using evolutionary computation techniques is that there is no need for having an explicit objective function. Moreover, when the objective function is available, it does not have to be differentiable.

Genetic algorithms have been most commonly applied to solve combinatorial optimization problems. Combinatorial optimization usually involves a huge number of possible solutions, which makes the use of enumeration techniques (e.g., cutting plane, branch and bound, or dynamic programming) hopeless.

Thermal unit commitment, hydrothermal coordination, expansion planning (generation, transmission, and distribution), reactive compensation placement, maintenance scheduling, etc. have the typical features of a large-scale combinatorial optimization problem. In problems of this kind the number of possible solutions grows exponentially with the problem size. Therefore, the application of optimization methods to find the optimal solution is computationally impracticable. Heuristic search techniques are frequently employed in this case for achieving high quality solutions within reasonable run time.

Among the heuristic search methods there are the ones that apply local search (e.g., hill climbing) and the ones that use a nonconvex optimization approach, in which cost-deteriorating neighbors are accepted also. The most popular methods which go beyond simple local search are GAs [4]-[7] (and other evolutionary techniques, like evolutionary programming and evolutionary strategies, etc.), simulated annealing (SA) [8], and tabu search (TS) [9]. Particle swarm [10] is another optimization technique that has shown great potential lately. However, more experience is still necessary to prove its efficiency and robustness.

Simulated annealing uses a probability function that allows a move to a worse solution with a decreasing probability as the search progresses. With GAs, a pool of solutions is used and the neighborhood function is extended to act on pairs of solutions. Tabu search uses a deterministic rather than stochastic search. Tabu search is based on neighborhood search with local optima avoidance. In order to avoid cycling, a short-term adaptive memory is used in TS. Genetic algorithms have a basic distinction when compared with other methods based on stochastic search. They can use coding (genotypic space) for representing the problem. The other methods solve the optimization problem in the original representation space (phenotypic).

The most rigorous global search methods have an asymptotic convergence proof (also known as convergence in probability), i.e., the optimal solution is guaranteed to be found if infinite time is available. Among SA, GA and TS algorithms, simulated annealing and genetic algorithms are the only ones with proof of convergence. However, there is no such proof for the canonical GA [11], i.e., the one with proportional selection (Section 5.1.6) and crossover/mutation with constant probabilities (Section 5.4).

Although all the mentioned algorithms have been successfully applied to real world problems, several of their crucial parameters have been selected empirically. Theoretical knowledge of the impact of these parameters on convergence is still an open problem. In fact, there is no theoretical result for tabu and particle swarm searches.

The choice of representation for a GA is fundamental to achieving good results. Encoding allows a kind of tunneling in the original search space. That means a particle has a nonzero probability of passing a potential barrier even when it does not have enough energy to jump over the barrier. The tunneling idea is that, rather than escaping from local minima by random uphill moves, escape can be achieved with the quantum tunnel effect. It is not the height of the barrier that determines the rate of escape from a local optimum, but its width relative to current population variance.

The main shortcoming of the standard SA procedure is the slow asymptotic convergence with respect to the temperature parameter T. In the standard SA algorithm, the cooling schedule for asymptotic global convergence is inversely proportional to the logarithm of the number of iterations, i.e., T(k) = c/(1 + log k). The constant c is the largest depth of any local minimum that is not the global minimum. Convergence in probability cannot be guaranteed for faster cooling rates, e.g., lower values for c.
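To make the schedule concrete, the following is a minimal SA sketch (an illustration, not code from the tutorial; the objective function, neighborhood, and value of c are arbitrary choices) using the logarithmic cooling T(k) = c/(1 + log k) described above:

```python
import math
import random

def simulated_annealing(objective, neighbor, x0, c=1.0, iterations=5000):
    """Minimal SA with the logarithmic cooling schedule T(k) = c/(1 + log k),
    the slow schedule associated with asymptotic convergence in probability."""
    x, fx = x0, objective(x0)
    best, fbest = x, fx
    for k in range(1, iterations + 1):
        temp = c / (1.0 + math.log(k))            # logarithmic cooling
        y = neighbor(x)
        fy = objective(y)
        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-(fy - fx)/temp), which shrinks as temp falls.
        if fy <= fx or random.random() < math.exp(-(fy - fx) / temp):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

# Toy usage: a one-dimensional multimodal objective.
random.seed(0)
f = lambda v: v * v + 10.0 * math.sin(3.0 * v)
step = lambda v: v + random.uniform(-0.5, 0.5)
xbest, fxbest = simulated_annealing(f, step, x0=4.0)
```

Note that, per the discussion above, a faster cooling rate would void the convergence-in-probability guarantee, even though faster schedules are common in practice.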
Tabu search owes its efficiency to an experience-based fine tuning of a large collection of parameters. Tabu search is a general search scheme that must be tailored to the details of the problem at hand. Unfortunately, as mentioned before, there is little theoretical knowledge for guiding this tailoring process.

Heuristic search methods utilize different mechanisms in order to explore the state space. These mechanisms are based on three basic features:
- the use of memoryless search (e.g., standard SA and GA) or adaptive memory (e.g., TS);
- the kind of neighborhood exploration used, i.e., random (e.g., SA and GAs) or systematic (e.g., TS); and
- the number of current solutions taken from one iteration to the next (GAs, as opposed to SA and TS, take multiple solutions to the next iteration).

The combination of these mechanisms for exploring the state space determines the search diversification (global exploration) and intensification (local exploitation) capabilities. The standard SA algorithm is notoriously deficient with respect to the diversification aspect. On the other hand, the standard GA is poor in intensification.

When the objective function has very many equally good local minima, wherever the starting point is, a small random disturbance can avoid the small local minima and reach one of the good ones, making this an appropriate problem for SA. However, SA is less suitable for a problem in which there is one global minimum that is much better than all the other local ones. In this case, it is very important to find that valley. Therefore, it is better to spend less time improving any set of parameters and more time working with an ensemble to examine different regions of the space. This is what GAs do best. Hybrid methods have been proposed in order to improve the robustness of the search.

2. INTRODUCTION TO GAs

Genetic algorithms operate on a population of individuals. Each individual is a potential solution to a given problem and is typically encoded as a fixed-length binary string (other representations have also been used, including character-based and real-valued encodings, etc.), which is an analogy with an actual chromosome. After an initial population is randomly or heuristically generated, the algorithm evolves the population through sequential and iterative application of three operators: selection, crossover and mutation. A new generation is formed at the end of each iteration.

For large-scale optimization problems, the initial population can incorporate prior knowledge about solutions. This procedure should not drastically restrict the population diversity, otherwise premature convergence could occur. Typical population sizes vary between 30 and 200. The population size is usually set as a function of the chromosome length.

The execution of a GA iteration is basically a two-stage process. It starts with the current population. Selection is applied to create an intermediate population (mating pool). Then, crossover and mutation are applied to the intermediate population to create the next generation of potential solutions. Although a lot of emphasis has been placed on the three above mentioned operators, the coding scheme and the fitness function are the most important aspects of any GA, because they are problem dependent.

The most popular explanation about how GAs can result in robust search relies on the argument of hyperplane sampling. In order to understand this concept, assume a problem encoded with 3 bits. The search space is represented by a cube with one of its vertices at the origin 000. For example, the upper surface of the cube contains all the points of the form *1*, where * could be either 0 or 1. A string that contains the symbol * is referred to as a schema. It can be viewed as a (hyper)plane representing a set of solutions with common properties.

The order of a schema is the number of fixed positions present in the string. The defining length is the distance between the first and last fixed positions of a particular schema. Building blocks are highly fit strings of low defining length and low order. It can be shown that about O(n³) hyperplanes are simultaneously sampled when the number of strings contained in the population is n. Therefore, even though a GA never explicitly evaluates any particular hyperplane of the search space, it changes the distribution of strings as if it had.

Genetic algorithms process many hyperplanes implicitly in parallel when selection acts on the population. The true fitness of a hyperplane partition corresponds to the average fitness of all strings that lie in that hyperplane. Genetic algorithms use the population as a sample for estimating the fitness of that hyperplane partition. After the initial generation, the pool of new strings is biased toward regions that have previously contained strings that were above average with respect to previous populations. In order to further explore the search space, crossover and mutation generate new sample points, while partially preserving the distribution of strings that is observed after selection.

In the following sections, several important design stages of a GA are presented. Section 3 shows different possibilities for encoding. It emphasizes the importance of the encoding scheme on GA convergence. Section 4 treats the formulation of the fitness function. Section 5 presents different propositions for the selection, crossover and mutation operators. Parameter control in GAs is addressed in Section 5, too. This chapter is concluded with a short presentation of niching methods, which serve for multiobjective optimization via GAs.

3. ENCODING

In order to apply a GA to a given problem the first decision one has to make is the kind of genotype the problem needs. That means a decision must be taken on how the parameters of the problem will be mapped into a finite string of symbols, known as genes (with constant or dynamic length), encoding a possible solution in a given problem space. The issue of selecting an appropriate representation is crucial for the search. The symbol alphabet used is often binary, though other representations have also been used, including character-based and real-valued encodings.
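The parameter-to-string mapping discussed here can be illustrated with a minimal sketch (illustrative only, not from the tutorial; the 10-bit gene length and the [0, 1] parameter range are arbitrary choices) that encodes a real-valued parameter as a fixed-length binary gene and decodes it back:

```python
def encode(value, lo, hi, n_bits):
    """Map a real value in [lo, hi] to an n_bits binary gene by
    discretizing the range into 2**n_bits levels."""
    levels = (1 << n_bits) - 1
    index = round((value - lo) / (hi - lo) * levels)
    return format(index, '0{}b'.format(n_bits))

def decode(gene, lo, hi):
    """Inverse mapping: binary gene back to a real value in [lo, hi]."""
    levels = (1 << len(gene)) - 1
    return lo + int(gene, 2) / levels * (hi - lo)

# A 10-bit gene for a parameter in [0, 1]: resolution is 1/1023.
gene = encode(0.5, 0.0, 1.0, 10)
value = decode(gene, 0.0, 1.0)

# A Hamming cliff: adjacent levels 511 and 512 decode to neighboring
# parameter values, yet their binary codes differ in every one of the 10 bits.
cliff = sum(a != b for a, b in zip(format(511, '010b'), format(512, '010b')))
```

Real-valued genes sidestep this discretization and its Hamming cliffs entirely, which is one reason they are favored for large-scale function optimization.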
In the majority of GA applications, the strings use a binary alphabet and their length is constant during all the evolutionary process. Also, all the parameters decode to the same range of values and are allocated the same number of bits for the genes in the string. A problem occurs when a gene may only have a finite number of discrete valid values if a binary representation is used. If the number of values is not a power of two, then some of the binary codes are redundant, i.e., they will not correspond to any valid gene value. The most popular compromise is to map the invalid code to a valid one.

Another shortcoming of binary encoding is the so-called Hamming cliffs (e.g., in the Appendix, although being neighbors in decimal representation, the Hamming distance between the binary strings for 0.6 and 0.5 is three (different bits)). It is worthwhile to mention that Gray coding, although frequently recommended as a solution to Hamming cliffs, because adjacent numbers differ by a single bit, has an analogous drawback for numbers at the opposite extremes of the decimal scale (e.g., the minimum and maximum gene values differ by only one bit, too). Binary encoding can also introduce an additional nonlinearity, thus making the combined objective function (the one in the genotype space) more multimodal than the original one (in the phenotype space).

At the beginning of GA research, the binary representation was recommended because it was supposed to give the largest number of schemata (plural of schema), therefore providing the highest degree of implicit parallelism. However, new interpretations have shown that high-cardinality alphabets (e.g., real numbers) are more effective due to the higher expression power and low effective cardinality [12]-[14]. Complex applications need nonbinary alphabets. Integer or continuous valued genes are typically used in large-scale function optimization problems. Another advantage of nonbinary representations, particularly the real-valued one, is the easy definition of problem specific operators.

When using binary coding, the position of the genes in the chromosome is extremely important for a successful GA design, unless uniform crossover is applied (see Section 5.2). A bad choice can make the problem harder than necessary. Therefore, correlated binary genes should be coded together in order to form building blocks, thus diminishing the disruptive effects of crossover. However, this information is usually unavailable beforehand.

Epistasis is a measure of problem difficulty for GAs. It represents the interaction among different genes in a chromosome. It depends on the extent to which the change in chromosome fitness resulting from a small change in one gene varies according to the values of other genes. The higher the epistasis level, the harder the problem is. This is obviously also true when applying uniform crossover or real-valued encoding. As mentioned earlier, a possibility for making the gene ordering

One possible answer for the binary-gene position problem is to use an operator called inversion. This is implemented by extending every gene by adding the position it occupies in the string. Inversion is interesting because it can freely mix the genes of the same string in order to put together the building blocks, automatically, during evolution (e.g., [(2 1) (3 0) (1 0) (4 1)], where the first number is a bit tag which indexes the bit and the second one represents the bit value, i.e., (3 0) means that the third bit is equal to zero). At first sight, the inversion operator looks very useful when the correlated parameters are not known a priori. With the association of a position to every gene, the string can be correctly reordered before evaluation. However, for large-scale problems, inversion is useless. Reordering greatly expands the search space, making the problem much more difficult to solve.

Therefore, the very hard encoding problem still remains in the hands of the designer. In order to achieve good performance for large tasks, GAs must be matched to the search problem at hand. The only way to succeed is by using domain-specific knowledge to select an appropriate representation.

4. FITNESS FUNCTION

Each string is evaluated and assigned a fitness value after the creation of an initial population. It is useful to distinguish between the objective function and the fitness function used by a GA. The objective function provides a measure of performance with respect to a particular set of gene values, independently of any other string. The fitness function transforms that measure of performance into an allocation of reproductive opportunities, i.e., the fitness of a string is defined with respect to other members of the current population. After decoding the chromosomes, i.e., applying the genotype to phenotype transformation, each string is assigned a fitness value. The phenotype is used as input to the fitness function. Then, the fitness values are employed to relatively weight the strings in the population.

The specification of an appropriate fitness function is crucial for the correct operation of a GA [15]. As an optimization tool, GAs face the task of dealing with problem constraints [16]. Crossover and mutation, i.e., the perturbation (variation) mechanism of GAs, are general operators that do not take into account the feasibility region. Therefore, infeasible offspring appear quite frequently. There are four basic techniques for handling constraints when using GAs.

The simplest alternative is the rejecting technique, in which infeasible chromosomes are discarded all over the generations. A different strategy is the repairing procedure, which uses a converter to transform an infeasible chromosome into a feasible one. Another possible technique is the creation of problem specific genetic operators to preserve feasibility of
irrelevant is to apply uniform crossover, because the result of chromosomes.
this operation is not affectedby the positions of the genes. The The previous procedures do not generate infeasible
same goal can be achieved with realvalued encoding and solutions. This is not usually an advantage. In fact, for large
recombination operators that also tum the genes positions scale, highly constrained optimization .problems, this is
irrelevant (Section 5.2). However, making the gene ordering certainly a great drawback. Particularly for power system
irrelevant does not necessarily mean an easier way to a good problems, where the optimal solutions usually are on the
solution. boundaries of feasible regions, the above mentioned techniques
26
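The rejecting and repairing techniques above can be illustrated with a short sketch. The binary toy problem, its feasibility rule (at most three active genes), and all function names are assumptions of this example, not part of the tutorial.

```python
import random

def feasible(ch):
    """Hypothetical feasibility test: at most three active genes."""
    return sum(ch) <= 3

def reject(offspring):
    """Rejecting technique: infeasible chromosomes are simply discarded."""
    return [ch for ch in offspring if feasible(ch)]

def repair(ch):
    """Repairing technique: convert an infeasible chromosome into a feasible
    one, here by switching off randomly chosen genes until the rule holds."""
    ch = ch[:]
    while not feasible(ch):
        ones = [i for i, g in enumerate(ch) if g == 1]
        ch[random.choice(ones)] = 0
    return ch

random.seed(0)
offspring = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
assert all(feasible(ch) for ch in reject(offspring))   # survivors are feasible
assert all(feasible(repair(ch)) for ch in offspring)   # repair always succeeds
```

As the text notes, both techniques keep the search inside the feasible region, which can be a drawback for highly constrained problems.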
Particularly for power system problems, where the optimal solutions usually lie on the boundaries of the feasible regions, the above-mentioned techniques for handling constraints often lead to poor solutions. One possible way of overcoming this drawback is to apply the repairing procedure to only a fraction (10%, for instance) of the infeasible population.

It has been suggested that constraint handling for this type of optimization problem should be performed by allowing search through infeasible regions. Penalty functions allow the exploration of infeasible subspaces [17]. An infeasible point close to the optimum solution generally contains much more information about it than a feasible point far from the optimum. On the other hand, the design of penalty functions is difficult and problem dependent. Usually, there is no a priori information about the distance to optimal points. Therefore, penalty methods consider only the distance from the feasible region. Penalties based on the number of violated constraints do not work well.

There are two possible forms of building a fitness function with a penalty term: the addition and the multiplication forms. The former is represented as g(x) = f(x) + p(x), where for maximization problems p(x) = 0 for feasible points and p(x) < 0 otherwise. The maximum absolute p(x) value cannot be greater than the minimum absolute f(x) value in any generation, in order to avoid negative fitness values. The multiplication form is represented as g(x) = f(x)p(x), where for maximization problems p(x) = 1 for feasible points and 0 <= p(x) < 1 otherwise.

The penalty term should vary not only with respect to the degree of constraint violation, but also with respect to the GA iteration count. Therefore, besides the amount of violation, the penalty term usually contains variable penalty factors, too (one per violated constraint). The key to a successful penalty technique is the proper setting of these penalty factors. Small penalty factors can lead to infeasible solutions, while very large ones totally neglect the infeasible subspaces. On average, the absolute values of the objective and penalty functions should be similar. At least in theory, the parameters of the penalty functions can also be encoded as GA parameters. This procedure creates an adaptive method, which is optimized as the GA evolves toward the solution.

In summary, the main problems associated with the fitness function specification are the following:
- dependence on whether the problem is related to maximization or minimization;
- the fitness function may be noisy in a nondeterministic environment [18];
- the fitness function may change dynamically as the GA is executed;
- the fitness function evaluation can be so time consuming that only approximations to the fitness values can be computed;
- the fitness function should allocate very different values to strings in order to make the work of the selection operator easier (Section 5.1.6);
- it must consider the constraints of the problem; and
- it could incorporate different subobjectives.

The fitness function is a black box for the GA. Internally, this may be achieved by a mathematical function, a simulator program, or a human expert that decides the quality of a string.

At the beginning of the iterative search, the fitness function values of the population members are usually randomly distributed and spread widely over the problem domain. As the search evolves, particular values for each gene begin to dominate. The fitness variance decreases as the population converges. This variation in fitness range during the evolutionary process often leads to the problems of premature convergence and slow finishing.

4.1. Premature Convergence

A frequent problem with GAs, known as deception, is that the genes from a few comparatively highly fit (but not optimal) individuals may rapidly come to dominate the population, causing it to converge on a local maximum. Once the population has converged, the ability of the GA to continue searching for better solutions is nearly eliminated. Crossover (Section 5.2) of almost identical chromosomes generally produces similar offspring. Only mutation (Section 5.3), with its random perturbation mechanism, remains to explore new regions of the search space.

The schema theorem says that reproductive opportunities should be given to individuals in proportion to their relative fitnesses. However, by doing that, premature convergence occurs because the population is not infinite (a basic hypothesis of the theorem). This is due to genetic drift (see Section 6). In order to make GAs work effectively on finite populations, the (proportional) way individuals are selected for reproduction must be modified. Different ways of performing selection are described in Section 5.1. The basic idea is to control the number of reproductive opportunities each individual gets. The strategy is to compress the range of fitnesses, without losing selection pressure (Section 4.2), and to prevent any superfit individual from suddenly dominating the population.

4.2. Slow Finishing

This is the opposite problem to premature convergence. After many generations, the population has almost converged, but it is still possible that the global maximum (or a high-quality local one) has not been found. The average fitness is high, and the difference between the best and the average individuals is small. Therefore, there is insufficient variance in the fitness function values to localize the maxima.

The same techniques used to tackle premature convergence are also used to fight slow finishing. An expansion of the range of population fitnesses is produced, instead of a compression. Both procedures are prone to bad remapping (underexpansion or overcompression) due to superpoor or superfit individuals.

5. BASIC OPERATORS

In this section, several important design issues for the selection, crossover and mutation operators are presented. Selection implements the survival of the fittest according to some predefined fitness function.
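The additive penalty form g(x) = f(x) + p(x) described in Section 4 can be sketched as follows. The toy objective, the constraint, and the single fixed penalty factor are illustrative assumptions; a practical implementation would use one, possibly iteration-dependent, factor per violated constraint.

```python
def fitness_additive(x, f, constraints, penalty_factor=1.0):
    """Additive penalty form g(x) = f(x) + p(x) for maximization:
    p(x) = 0 for feasible points and p(x) < 0 otherwise.
    Each constraint c is considered satisfied when c(x) <= 0."""
    violation = sum(max(0.0, c(x)) for c in constraints)
    return f(x) - penalty_factor * violation

# Hypothetical toy problem: maximize f(x) = x subject to x <= 5.
f = lambda x: x
constraints = [lambda x: x - 5.0]

assert fitness_additive(4.0, f, constraints) == 4.0   # feasible: no penalty
assert fitness_additive(8.0, f, constraints) == 5.0   # infeasible: 8 - 1.0 * 3
```

With a small penalty factor the infeasible point x = 8 would still outrank the feasible optimum, illustrating why the setting of the penalty factors is the key to the technique.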
Therefore, high-fitness individuals have a better chance of reproducing, while low-fitness ones are more likely to disappear. Selection alone cannot introduce any new individuals into the population, i.e., it cannot find new points in the search space. Crossover and mutation are used to explore the solution space.

Crossover, which represents the mating (recombination) of two individuals, is performed by exchanging parts of their strings to form two new individuals (offspring). In its simplest form, substrings are exchanged after a crossover point is randomly determined. The crossover operator is applied with a certain probability, usually in the range [0.5, 1.0]. This operator allows the evolutionary process to move toward promising regions of the search space. It is likely to create even better individuals by recombining portions of good individuals. The new offspring created from mating, after being subjected to mutation, are put into the next generation.

The purpose of the mutation operator is to maintain diversity within the population and to inhibit premature convergence to local optima by randomly sampling new points in the search space. The GA stopping criterion may be specified as a maximal number of generations or as the achievement of an appropriate level for the generation average fitness (stagnation).

5.1. Selection

Selection, more than crossover and mutation, is the operator responsible for determining the convergence characteristics of GAs [19], [20]. Selection pressure is the degree to which the best individuals are favored [21]. The higher the selection pressure, the more the best individuals are favored. The selection intensity of a GA is the expected change of the average fitness in a population after selection is performed. Analyses of selection schemes show that the change in mean fitness at each generation is a function of the population fitness variance.

The convergence rate of a GA is largely determined by the magnitude of the selection pressure. Higher selection pressures imply higher convergence rates. If the selection pressure is too low, the convergence rate will be slow, and the GA will take unnecessarily long to find a high-quality solution. If the selection pressure is too high, it is very probable that the GA will prematurely converge to a bad solution. In fact, selection schemes should also preserve population diversity, in addition to providing selection pressure. One possibility for achieving this goal is to maximize the product of the selection intensity and the population fitness standard deviation. Therefore, if two selection methods have the same selection intensity, the method giving the higher standard deviation of the selected parents is the best choice.

Many selection schemes are currently in use. They can be classified into two groups: proportionate selection and ordinal-based selection. Proportionate-based procedures select individuals based on their fitness values relative to the fitness of the other individuals in the population. Ordinal-based procedures select individuals not upon their fitness, but upon their rank within the population.

An ordinal selection scheme has a fundamental advantage over a proportional one. The former is translation and scale invariant, i.e., the selection pressure does not change when every individual's fitness is multiplied and added by a constant. The selection intensity of proportionate selection is the only one that is sensitive to the current population distribution [22]. However, conclusive statements about the performance of rank-based selection schemes are difficult to make because, by suitable (but tricky!) adjustment, proportionate selection can give similar performance.

5.1.1. Tournament Selection

This selection scheme is implemented by choosing some number of individuals randomly from the population, copying the best individual of this group into the intermediate population, and repeating the process until the mating pool is complete. Tournaments are frequently held between only two individuals. Bigger tournaments are also used, with arbitrary group sizes (not too big in comparison with the population size). Tournament selection can be implemented very efficiently because no sorting of the population is required. The tournament procedure selects the mating pool without remapping the fitnesses. By adjusting the tournament size, the selection pressure can be made arbitrarily large or small. Bigger tournaments have the effect of increasing the selection pressure, since below-average individuals do not have good chances of winning a competition.

5.1.2. Truncation Selection

In truncation selection, only a subset of the best individuals can be selected, each of them with the same probability. This procedure is repeated until the mating pool is complete. As a sorting of the population is required, truncation selection has a greater time complexity than tournament selection. As in tournament selection, there is no fitness remapping in truncation selection.

5.1.3. Linear Ranking Selection

The individuals are sorted according to their fitness values, and the last position is assigned to the best individual, while the first position is allocated to the worst one. The selection probability is linearly assigned to the individuals according to their ranks. All individuals get a different selection probability, even when equal fitness values occur.

5.1.4. Exponential Ranking Selection

Exponential ranking selection differs from linear ranking selection only in that the probabilities of the ranked individuals are exponentially weighted.
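A minimal sketch of the tournament scheme of Section 5.1.1, assuming a toy population of integers with the identity as fitness function; all names here are illustrative.

```python
import random

def tournament_select(population, fitness, k=2):
    """Hold one tournament: draw k individuals at random and return the
    fittest. No sorting or fitness remapping of the population is needed."""
    return max(random.sample(population, k), key=fitness)

def make_mating_pool(population, fitness, k=2):
    """Repeat tournaments until the mating pool reaches the population size."""
    return [tournament_select(population, fitness, k) for _ in population]

random.seed(1)
pop = list(range(10))                             # toy individuals 0..9
pool = make_mating_pool(pop, fitness=lambda x: x, k=3)
# a larger tournament size raises the selection pressure:
# the mating pool is fitter, on average, than the population
assert sum(pool) / len(pool) > sum(pop) / len(pop)
```

Because only rank within each random group matters, the scheme is translation and scale invariant, as discussed for ordinal-based selection above.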
5.1.5. Elitist Selection

Preservation of the elite solutions from the preceding generation assures that the best solutions known so far will remain in the population and have more opportunities to produce offspring. Elitist selection is used in combination with other selection strategies.

5.1.6. Proportional Selection

This was the first selection method proposed for GAs. The probability that an individual will be selected is simply proportional to its fitness value. The time complexity of the method is the same as that of tournament selection. This mechanism works only if all fitness values are greater than zero. The selection probabilities strongly depend on the scaling of the fitness function. In fact, most of the scaling procedures described in the next sections have been proposed to keep proportional selection working. One big drawback of proportional selection is that the selection intensity is usually low, because a single individual, either the fittest or the worst, dictates the degree of compression of the range of fitnesses. This is quite common even during the early stage of the search, when the population variance is high. Negative selection intensity is also possible.

Notice that in ordinal-based selection schemes the effect of extreme individuals is negligible, irrespective of how much greater or smaller their fitnesses are than those of the rest of the population. Therefore, despite its popularity inside the power system research community, proportional selection (i.e., roulette wheel) is usually an inferior scheme. There are different scaling operators that help in separating the fitness values in order to improve the work of the proportional selection mechanism. The most common ones are linear scaling, sigma truncation and power scaling.

5.1.6.1. Linear scaling

Linear scaling (i.e., f' = af + b) works well except when most population members are highly fit but a few very poor individuals are present. The coefficients a and b are usually chosen to enforce equality of the objective and fitness function average values, and also to cause the maximum scaled fitness to be a specified multiple (usually two) of the average fitness. These two conditions ensure that average population members receive one offspring copy on average, while the best receives the specified multiple number of copies. Notice that proportional selection with linear scaling is not the same as linear ranking selection.

5.1.6.2. Sigma truncation

In order to overcome the presence of superpoor individuals, the use of population variance information has been suggested to preprocess the objective function values before scaling. This procedure subtracts a constant from the objective function values: f' = max[0, f - (f_mean - d)], where f_mean is the mean objective function value in the population. The constant d is chosen as a multiple (between 1 and 3) of the population standard deviation, and negative results are arbitrarily set to zero.

5.1.6.3. Power scaling

Another possibility is power scaling, i.e., f' = f^k. In general, the k value is problem dependent and may require adaptation during a run to expand or compress the range of fitness function values. The problem with all fitness scaling schemes is that the degree of compression can be determined by a single extreme individual, degrading the GA performance.

5.2. Crossover

Crossover is a very controversial operator due to its disruptive nature (i.e., it can split important information). In fact, other evolutionary algorithms besides GAs do not rely on crossover (or a similar type of recombination). However, no definite answer about the necessity of using crossover has been reached so far.

The traditional GA uses one-point crossover (Fig. 1), where the two parents are each cut once at a specific point, and the segments located after the cuts are exchanged. The positions of the bits in the schema determine the likelihood that these bits will remain together after crossover. Obviously, an order-1 schema is not affected by recombination, since the critical bit is always inherited by one of the offspring.

Fig. 1. Example of one-point crossover.

The crossover operator presented above can be generalized in order to apply multiple-point crossover. However, more than two crossover points, although giving a better exploration capacity, can be too disruptive. The crossover mechanism can be better visualized by treating the strings as rings. In Fig. 2, two-point crossover is applied to the example shown in Fig. 1. Each offspring takes one ring segment, in between adjacent cut points, from one parent. The contiguous ring segment(s) is (are) taken from the other parent. For more than two crossover points, this procedure is repeated until the last segment is filled. An extra cut is assumed at the beginning of the string, i.e., between genes g8 and g1, for an odd number of cut points.

From the linear string point of view, the elements in between the two crossover points are swapped between the two parents to form the two offspring (Fig. 2). One-point crossover can be represented in the ring geometry as a two-point crossover with the first cut point always between genes g8 and g1. For multiple-point crossover, the cut points can be anywhere, as long as they are not the same.

Fig. 2. Ring representation and two-point crossover.

Uniform crossover is another important recombination mechanism [23]. Offspring are created by randomly picking each bit from either of the two parent strings (Fig. 3). This means that each bit is inherited independently of any other bit. Uniform crossover has the advantage that the ordering of the genes is irrelevant in terms of splitting building blocks.

Fig. 3. Example of uniform crossover, where each arrow points to the randomly picked gene value.

Uniform crossover is more disruptive than two-point crossover. On the other hand, two-point crossover performs poorly when the population has largely converged, because of its inability to promote diversity. For small populations, which is not usually the case for large-scale problems, more disruptive crossover operators such as uniform or m-point (m > 2) may perform better because they help overcome the limited amount of information.

Reduced surrogates can be used to improve the exploration ability of two-point crossover, and are highly recommended for large-scale problems. The idea is to ignore all bits that are equivalent in the two parent strings (Fig. 4). Afterwards, crossover is applied on the reduced surrogates, i.e., only one possible cut is considered between any pair of nonequivalent bits.

Fig. 4. Implementation of reduced surrogates to improve crossover exploration capability.

Notice that the reduced surrogate form implements the original crossover operation in an unbiased way. For example, the cut points between genes 2|3, 3|4 and 4|5 produce the same effect on the offspring. Therefore, two-point reduced surrogate crossover considers these cut points as one single possible cross point.

The crossover operator can be redefined for real-valued encoding. Different combinations have been utilized (e.g., a convex combination such as a1*x1 + a2*x2). One possibility is to take the average of the two corresponding parent genes. The square root of the product of the two values can also be used. Another possibility is to take the difference between the two values, and add it to the higher or subtract it from the lower.

5.3. Mutation

The GA literature has reflected a growing recognition of the importance of mutation, in contrast with viewing it merely as responsible for reintroducing inadvertently lost gene values. The mutation operator is more important in the final generations, when the majority of the individuals present similar quality. As shown in Section 5.4, a variable mutation rate is very important for the search efficiency. Its setting is much more critical than that of the crossover rate.

In the case of binary encoding, mutation is carried out by flipping bits at random, with some small probability (usually in the range [0.001, 0.05]). For real-valued encoding, the mutation operator can be implemented by random replacement, i.e., replacing the value with a random one. Another possibility is to add/subtract (or multiply by) a random (e.g., uniformly or Gaussian distributed) amount. Mutation can also be used as a hill-climbing mechanism.

5.4. Control Parameters Estimation

Typical values for the population size and the crossover and mutation rates have been selected in the intervals [30, 200], [0.5, 1.0] and [0.001, 0.05], respectively. Fixed crossover and mutation operators do not provide enough search power for tackling large-scale optimization problems. Manual parameter tuning is common practice in GA design. One parameter is tuned at a time in order to avoid the impossible task of simultaneous estimation. However, as the parameters strongly interact in complex ways, this tuning procedure is prone to suboptimality.

In fact, any static set of parameters is inappropriate, regardless of how it is tuned. The GA search technique is an adaptive process, which requires continuous tracking of the search dynamics. Therefore, the use of constant parameters leads to inferior performance. For example, it is obvious that large mutations can be helpful during the early generations to improve the GA exploration capability. This is not the case at the end of the search, when small mutation steps are needed to fine-tune suboptimal solutions.

The proper way of dealing with this problem is by using parameters that are functions of the number of generations. Deterministic rules are frequently applied for implementing this idea. However, besides being very difficult to define, they fail to take into account the actual progress of the population performance. Adaptive rules based on the population variance, or even the search for optimal parameters as part of the GA processing (i.e., including the parameters as part of the chromosomes), seem to be more promising [24], [25].
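The binary operators of Sections 5.2 and 5.3 can be sketched as follows, using eight-gene toy parents; the function names are assumptions of this sketch.

```python
import random

def one_point_crossover(p1, p2):
    """Cut both parents at the same random point and exchange the tails."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2):
    """Inherit each gene independently from either parent."""
    pairs = [(x, y) if random.random() < 0.5 else (y, x) for x, y in zip(p1, p2)]
    return [x for x, _ in pairs], [y for _, y in pairs]

def mutate(ch, rate=0.01):
    """Binary mutation: flip each bit with a small probability."""
    return [1 - g if random.random() < rate else g for g in ch]

random.seed(0)
a = [1, 1, 0, 1, 0, 1, 0, 1]
b = [1, 0, 0, 1, 1, 0, 0, 0]
c1, c2 = one_point_crossover(a, b)
# crossover only recombines existing material: the gene multiset is preserved
assert sorted(c1 + c2) == sorted(a + b)
```

A fixed mutation rate is used here only for brevity; as argued above, making the rate a function of the generation count (or of the population variance) is usually preferable.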
6. NICHING METHODS

Two agents cause the reduction of the population fitness variance at each generation. The first, selection pressure, multiplies copies of the fitter individuals. The other agent is independent of fitness. It is called genetic drift [26] and is due to the stochastic nature of the selection operator (i.e., bias in the random sampling of the population). When there is a lack of selection pressure, genetic drift is responsible for premature convergence. The GA still ends up on a single peak, even when there are several peaks of equal fitness.

Therefore, even when multiobjective optimization is not the main goal, the identification of multiple optima is beneficial for the GA performance. Niching methods extend standard GAs by creating stable subpopulations around global and local optimal solutions. Niching methods maintain population diversity and allow GAs to explore many peaks simultaneously. They are based on either fitness-sharing or crowding schemes [27].

Fitness sharing decreases each element's fitness proportionally to the number of similar individuals in the population, i.e., in the same niche. The similarity measure is based on either the genotype (e.g., Hamming distance) or the phenotype (e.g., Euclidean distance). On the contrary, crowding schemes do not require the setting of a similarity threshold (niche radius). Crowding implicitly defines neighborhoods by the application of tournament rules. It can be implemented as follows. When an offspring is created, one individual is chosen, from a random subset of the population, to disappear. The chosen one is the element that most closely resembles the new offspring.

Another idea used by niching methods is restricted mating. This mechanism avoids the recombination of individuals that do not belong to the same niche. Highly fit, but not similar, parents can produce highly unfit offspring. Restricted mating is based on the assumption that if similar parents (i.e., from the same niche) are mated, then the offspring will be similar to them.

It is important to notice that similarity of genotypes does not necessarily imply similarity of the corresponding phenotypes. The hypothesis that highly fit parents generate highly fit offspring is valid only under the occurrence of building blocks and low epistasis. When the genes strongly interact, there is no guarantee that these offspring will not be lethals.

7. FINAL COMMENTS

This tutorial on GAs has pointed out the main topics in their design. The focus on the essential topics helps not to miss the forest for the trees. The first generation of GAs, based on the canonical algorithm, considering proportional selection and crossover/mutation with constant probabilities, was not originally proposed for solving static optimization problems [28]. Almost three decades of research have adapted the original proposal [29] to deal with this type of problem.

One important issue that has been avoided in this chapter is parallel GAs. They introduce new parameters, such as the number of populations and their sizes, the topology of communications (e.g., each population is connected to all the others), and the migration rate. Although many implementations of parallel GAs have been described in the literature, the effect of these new parameters on the quality of the search is still under analysis [30]. Recently, another interesting idea, based on the theory of immunity in biology, has been proposed [31].

The first applications to power systems appeared after 1991 [32]. Since then, GAs have been applied not only to pure optimization problems in power systems, but also to model identification, control, and neural network training. After a necessary period of maturing, GAs are now being used, frequently in combination with conventional optimization techniques, for solving large-scale problems.

8. APPENDIX

TABLE I
HAMMING DISTANCE AND GRAY CODE

Binary    Gray      Real
[0000]    [0000]    -0.9
[0001]    [0001]    -0.8
[0010]    [0011]    -0.7
[0011]    [0010]    -0.6
[0100]    [0110]    -0.5
[0101]    [0111]    -0.4
[0110]    [0101]    -0.3
[0111]    [0100]    -0.2
[1000]    [1100]    -0.1
[1001]    [1101]     0.0
[1010]    [1111]    +0.1
[1011]    [1110]    +0.2
[1100]    [1010]    +0.3
[1101]    [1011]    +0.4
[1110]    [1001]    +0.5
[1111]    [1000]    +0.6

9. ACKNOWLEDGMENTS

This work was supported by the Brazilian Research Council (CNPq) under grant No. 300054/91-2. Alves da Silva would also like to thank PRONEX for the financial support and Professor Djalma M. Falcão, from the Federal University of Rio de Janeiro, for his time and effort in reviewing this chapter.

10. REFERENCES

[1] D.B. Fogel: "An introduction to simulated evolutionary optimization", IEEE Trans. on Neural Networks, Vol. 5, No. 1, 1994, pp. 3-14.
[2] D.B. Fogel: Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, 1995.
[3] T. Bäck, U. Hammel, and H.-P. Schwefel: "Evolutionary computation: comments on the history and current state", IEEE Trans. on Evolutionary Computation, Vol. 1, No. 1, 1997, pp. 3-17.
[4] D.E. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[5] D. Beasley, D.R. Bull, and R.R. Martin: "An overview of genetic algorithms: Part 1, fundamentals", University Computing, Vol. 15, No. 2, 1993, pp. 58-69.
[6] D. Beasley, D.R. Bull, and R.R. Martin: "An overview of genetic algorithms: Part 2, research topics", University Computing, Vol. 15, No. 4, 1993, pp. 170-181.
[7] D. Whitley: "A genetic algorithm tutorial", Statistics and Computing, Vol. 4, 1994, pp. 65-85.
[8] E. Aarts and J. Korst: Simulated Annealing and Boltzmann Machines, John Wiley, 1989.
[9] F. Glover and M. Laguna: Tabu Search, Kluwer Academic, 1997.
[10] J. Kennedy and R.C. Eberhart: Swarm Intelligence, Morgan Kaufmann, 2001.
[11] G. Rudolph: "Convergence analysis of canonical genetic algorithms", IEEE Trans. on Neural Networks, Vol. 5, No. 1, 1994, pp. 96-101.
[12] H.J. Antonisse: "A new interpretation of schema notation that overturns the binary encoding constraint", Proc. 3rd Int. Conf. on Genetic Algorithms, Morgan Kaufmann, 1989, pp. 86-91.
[13] D.E. Goldberg: "The theory of virtual alphabets", in Parallel Problem Solving from Nature 1, Lecture Notes in Computer Science, Vol. 496, Springer, 1991, pp. 13-22.
[14] L.J. Eshelman and J.D. Schaffer: "Real-coded genetic algorithms and interval-schemata", in Foundations of Genetic Algorithms 2, Morgan Kaufmann, 1993, pp. 187-202.
[15] N.J. Radcliffe and P.D. Surry: "Fitness variance of formae and performance prediction", in Foundations of Genetic Algorithms 3, Morgan Kaufmann, 1995, pp. 51-72.
[16] Z. Michalewicz and M. Schoenauer: "Evolutionary algorithms for constrained parameter optimization problems", Evolutionary Computation, Vol. 4, No. 1, 1996, pp. 1-32.
[17] M. Gen and R. Cheng: "A survey of penalty techniques in genetic algorithms", Proc. 3rd IEEE Conf. on Evolutionary Computation, IEEE Press, 1996, pp. 804-809.
[18] B.L. Miller and D.E. Goldberg: "Genetic algorithms, selection schemes, and the varying effects of noise", University of Illinois at Urbana-Champaign, IlliGAL Report No. 95009, 1995.
[19] D.E. Goldberg and K. Deb: "A comparative analysis of selection schemes used in genetic algorithms", in Foundations of Genetic Algorithms, Morgan Kaufmann, 1991, pp. 69-93.
[20] T. Blickle and L. Thiele: "A comparison of selection schemes used in genetic algorithms", Swiss Federal Institute of Technology, TIK-Report Nr. 11, 2nd version, 1995.
[21] T. Bäck: "Selective pressure in evolutionary algorithms: A characterization of selection mechanisms", Proc. 1st IEEE Conf. on Evolutionary Computation, IEEE Press, 1994, pp. 57-62.
[22] L.D. Whitley: "The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best", Proc. 3rd Int. Conf. on Genetic Algorithms, Morgan Kaufmann, 1989, pp. 116-121.
[23] G. Syswerda: "Uniform crossover in genetic algorithms", Proc. 3rd Int. Conf. on Genetic Algorithms, Morgan Kaufmann, 1989, pp. 2-9.
[24] N. Saravanan, D.B. Fogel, and K.M. Nelson: "A comparison of methods for self-adaptation in evolutionary algorithms", BioSystems, Vol. 36, 1995, pp. 157-166.
[25] A.E. Eiben, R. Hinterding, and Z. Michalewicz: "Parameter control in evolutionary algorithms", IEEE Trans. on Evolutionary Computation, Vol. 3, No. 2, 1999, pp. 124-141.
[26] A. Rogers and A. Prügel-Bennett: "Genetic drift in genetic algorithm selection schemes", IEEE Trans. on Evolutionary Computation, Vol. 3, No. 4, 1999, pp. 298-303.
[27] B. Sareni and L. Krähenbühl: "Fitness sharing and niching methods revisited", IEEE Trans. on Evolutionary Computation, Vol. 2, No. 3, 1998, pp. 97-106.
[28] K.A. De Jong: "Genetic algorithms are NOT function optimizers", in Foundations of Genetic Algorithms 2, Morgan Kaufmann, 1993, pp. 5-17.
[29] J.H. Holland: Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
[30] E. Cantú-Paz: "Markov chain models of parallel genetic algorithms", IEEE Trans. on Evolutionary Computation, Vol. 4, No. 3, 2000, pp. 216-226.
[31] L. Jiao and L. Wang: "A novel genetic algorithm based on immunity", IEEE Trans. on Systems, Man, and Cybernetics - Part A, Vol. 30, No. 5, 2000, pp. 552-561.
[32] V. Miranda, D. Srinivasan, and L.M. Proença: "Evolutionary computation in power systems", Electrical Power & Energy Systems, Elsevier, Vol. 20, No. 2, 1998, pp. 89-98.
Chapter 4
Fundamentals of Evolution Strategies and Evolutionary Programming
Abstract — In this Chapter one discusses the principles governing the family of Evolutionary Algorithms called Evolution Strategies. A variant, called Evolutionary Programming and sometimes taken as independent, is also discussed. Evolution Strategies do not depend on chromosome coding and have a strong theoretical background justifying their success.

Keywords — Evolution Strategies, Evolutionary Programming

1. INTRODUCTION

This chapter is devoted to the description of a branch of techniques in evolutionary computing that, for historical reasons, was kept divided for many years, although they may be seen in fact as the same generic approach: evolutionary programming (EP) and evolution strategies (ES). This does not happen by chance or coincidence: today, we have difficulty in explaining to newcomers the differences between the two approaches.

Instead of two distinct paradigms, what we find is a collection of variations on the same theme, and perhaps it is about time to accept calling them all by a family name such as "evolution strategies"... Nevertheless, history records that evolutionary programming and evolution strategies were created independently.

Evolutionary Programming is attributed to Lawrence J. Fogel, going back to the early 1960's, contemporaneously with John Holland's work; the landmark publication is the book "Artificial Intelligence Through Simulated Evolution" (1966). While Holland led his followers into developing the GA (Genetic Algorithm) approach, Fogel persisted in a line that is now known as EP (Evolutionary Programming).

Evolution Strategies is a method claimed by I. Rechenberg and H.-P. Schwefel, who report the first developments back to the TU Berlin, in 1963. They stimulated an independent group that produced a remarkable set of practical and theoretical results, in such a way that EP could be seen sometimes as a subset of ES.

These two communities evolved separately and gathered rival factions around them, sustaining competing series of workshops and conferences, with supporters aligning on either side of the fence. Adepts of both the American and the European schools of thought sustained a sort of race for popularity and tried to promote the originators of the processes they supported as the true fathers of the evolutionary approach.

For practical purposes, it is healthy to witness some present-day efforts to unite the evolutionary perspectives brought about by the two schools, and to give recognition to the ones who really deserve it, independently of their geographical location. And with this sense of independence, it is reasonable to recognize that Evolutionary Programming is just a conceptual subset (not historical, of course) of the general concepts developed under the name of Evolution Strategies.

As in any evolutionary process, ES and EP rely on the definition of a fitness function, which sets up the "environment" and establishes the way to measure the quality of each solution (called an individual). The fitness function is just like the objective function of an Operations Research problem; as such, it may include penalties for the violation of constraints.

To be fair, the concept of fitness function may involve a loose definition of what a function is. In fact, the only real requirement is a process that allows a ranking of alternatives in the solution space, such that this ranking is in agreement with the preferences defined by the decision maker. This process can, therefore, include rules, besides mathematical expressions. But it is true that the traditional models have analytical expressions as their fitness function.

Both EP and ES share with GA this "selection-by-fitness-function" principle. This is worth noticing, because it is not the only process that may ignite and drive an evolutionary process. For instance, "selection-by-arms-race" is another possible mechanism; it would require at least two distinct populations, one of them possibly predating on the other. But, in fact, GA, ES and EP are all three based on the same algorithmic approach, where individuals are not confronted with one another but are measured against an external common selection function.

But ES and EP move away from GA in the way they represent alternatives, solutions or individuals in a population. While Genetic Algorithms rely on the power of a discrete genetic representation to generate new offspring with higher survival chances, EP and ES use a direct representation of individuals.

In GA, one must establish a mapping between the space of the genes and the space of the phenotypic variables. The variation introduced by crossover and mutation is generated at gene level, while the phenotypic consequences are evaluated against the fitness function at problem-variable level.

In ES and EP, there is no real gene level and no need for a mapping process between genes and problem variables. Each solution is represented by its own variables, with real or integer values, within their feasible domains. Therefore, one can say that variation is introduced directly at the level of the phenotype. Variation and diversity are essential to make selection useful: they allow the coverage of the search space. Loss of diversity usually leads to an early termination of evolutionary algorithms, either at suboptimal solutions or at local optima.
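The fitness-function idea above — an objective in the Operations Research sense, plus penalty terms for violated constraints — can be sketched as follows. The quadratic objective, the constraint and the weight are illustrative choices, not taken from the text:

```python
def fitness(x, penalty_weight=1000.0):
    """Toy fitness function: an objective to minimise plus a penalty for
    constraint violation. The objective (sphere), the constraint
    x[0] + x[1] <= 1 and the penalty weight are hypothetical examples."""
    objective = sum(xi * xi for xi in x)          # quality of the solution
    violation = max(0.0, x[0] + x[1] - 1.0)      # amount of constraint breach
    return objective + penalty_weight * violation ** 2
```

Any mechanism that ranks candidates consistently with the decision maker's preferences — including rule-based comparisons — would serve the same role in the selection step.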
In classical pure EP, variation is introduced solely by mutation. In ES, besides mutation, processes similar to crossover also originate variation and new phenotypic expressions. But, as we shall see, there is no real distinction between the two approaches, and they have actually converged in conceptual terms.

Contrary to many approaches to the theme, we will start with the description of Evolution Strategies. We bear in mind that, because of its greater simplicity, EP easily gains adepts. It can easily be implemented by students, and the popularity gained certainly contributes to keeping it in the popular, although questionable, view as an independent development from the Evolution Strategies mainstream. However, as we shall see, EP can be seen very naturally as a branch or a subdivision of ES.

2. EVOLUTION STRATEGIES

Under the general designation of Evolution Strategies, the ideas started by Rechenberg and Schwefel were explored and resulted in a remarkable legacy of theoretical and experimental work. This work is so impressive that, after becoming aware of its full extent, an independent observer cannot avoid the temptation to classify Evolutionary Programming as a de facto subset of the global Evolution Strategies framework.

The geographical independence of the development of EP (in the US) and ES (in Europe), and the subsequent creation of schools of followers, is probably the best explanation why we still today find a distinction in the literature. As we shall see, the most fundamental reason one sees claimed to distinguish EP from ES is the fact that "pure" EP does not make use of recombination of individuals as a means to generate offspring and diversity, and relies only on mutation.

Evolution Strategies is a name that covers a wide family of related algorithms. These algorithms follow the general biological paradigm of exploring a search space by means of processes mimicking mutation, recombination and selection (because EP does not use recombination, it is just a special case within a family of evolutionary mimicking processes, ES). And they are distinct from GA models in the fact that there is no distinction between genotype and phenotype, i.e., there is not a separation of worlds where variation is generated in one of them while selection acts on the effects felt in another one. In ES, the problem and the alternatives are represented in their natural variables.

The problem addressed in the early 1960's by Rechenberg and Schwefel was related to finding optimal shapes presenting minimum drag in a wind tunnel — a typical engineering problem. Between 1963 and 1974, these researchers developed the foundations of a theory for Evolution Strategies. The results, however interesting, remained within the knowledge of a closed community, perhaps especially composed of civil or structural engineers. It seems that a reason exists for this: there are many structural or technical optimization problems for which no closed analytical form for an objective function exists. Therefore, engineers had to rely on their intuition and professional judgment, instead of using the analytical solutions available at the time.

Nowadays, there is a variety of models or versions of ES. In the following sections we will discuss some aspects of what could be called a canonical model — the (μ,κ,λ,ρ) ES model, using the notation of [1].

2.1. The general (μ,κ,λ,ρ) Evolution Strategies scheme

The designation "(μ,κ,λ,ρ) ES model" has been proposed by Schwefel [1] and has the following parameters:

μ — number of parents in a generation
κ — number of generations of survival, or maximum number of reproduction cycles of an individual
λ — number of offspring or children created in one generation
ρ — number of direct ancestors of each individual

Under this light, Evolutionary Programming is almost like a (μ,∞,μ,1) Evolution Strategy — that is, a (μ+μ)-ES.

This means that a whole family of processes can be started, depending on the choice of the above parameters. Some of the varieties have been researched in depth and some are still an open field for research. Of course, the simplest ones have been the most investigated, and this effort has brought insight into the mechanisms that power the Evolution Strategies and make them so successful (or that provoke divergence and lack of success in some cases).

The aim of this Chapter is to explain to Power System researchers and engineers the basics of how ES are built and work. Therefore, our didactic strategy will be to start with simple models and progressively increase them in complexity. After all, it will be like retracing the story of the theoretical development of ES.

The first ES models had fewer degrees of freedom than the (μ,κ,λ,ρ) model admits. The first approach became known as the (1+1)-ES model: it had, in each generation, only one parent, only one descendent was generated, and the selection acted upon the set constituted by parent and child.

Later, one spoke of an opposition of (μ+λ)-ES against (μ,λ)-ES. In the (μ+λ)-ES, the μ survivors in each generation were selected from a population formed by the union of the sets of μ parents and λ children. This meant that an individual had the possibility, in theory, of living forever. According to this notation, the first experiments obeyed a (1+1)-ES strategy.

On the other hand, in a (μ,λ)-ES with λ ≥ μ ≥ 1, the new μ future parents are selected from the λ offspring only, no matter how good their parents might be. It has been demonstrated that this strategy risks diverging in some cases, if the solution "best-so-far" is not stored externally or at least preserved within the generation cycle (this deterministic preservation of the best is called elitism). The first models of this kind were called the (1,λ) ES.

The (μ,λ)-ES implies that an individual can have children only once, and that its life duration is of one generation, as
opposed to the (μ+λ)-ES, where there is no limit on the life span of an individual.

The (μ,κ,λ,ρ)-ES introduces new degrees of freedom in defining an evolution strategy. With the variable κ defining a life span for each individual, one can now test a variety of strategies and look at the (μ,λ)-ES and the (μ+λ)-ES as the extreme cases of such variety. Furthermore, by recognizing the role of the parameter ρ, which defines how many parents an individual has, one explicitly introduces the operation of recombination as one major factor conditioning the development of an Evolution Strategy.

But in contemporary ES, (μ,κ,λ,ρ) are not the only parameters to take into account. We can list some more:

P — the (start) population
mut — the mutation operator
pm — mutation probability
rec — the recombination operator
pr — recombination probability
sel — the selection operator
ξ — number of stochastic tournament participants

Other parameters may be recognized, some of them associated with the operators rec, mut and sel adopted.

It is usual to distinguish, in the representation of an individual, two types of parameters: object parameters (OP) and strategy parameters (SP). Say individual c is represented by c(OP,SP), such that

OP = (o_1, o_2, ..., o_no)
SP = (s_1, s_2, ..., s_ns)

given "no" object parameters and "ns" strategy parameters.

The object parameters may be OP = (θ, x_1, ..., x_n). The x are the classical n variables of the problem (or the phenotypic variables) and are the only ones that enter the fitness function. The parameter θ counts the remaining life span, measured in number of iterations (reproduction cycles). Of course, at the birth of an individual, θ = κ.

The strategy parameters usually refer to the standard deviations σ for mutations, which can be global or defined in each of the n dimensions or variables of an individual, and to parameters α establishing correlation between mutations in distinct variables (sometimes called "angle" parameters).

In pseudocode, a general Evolution Strategy reads:

    // initialize a random population P of μ elements
    Init population P[g];
    // evaluate the fitness of all individuals of the population
    Evaluate P[g];
    while not done do
        // reproduction — generate λ offspring...
        // ...by recombination
        P'[g] := recombine(P[g]);
        // ...by mutation — introduce stochastic perturbations in the new population
        P'[g] := Mutate(P'[g]);
        // evaluation — calculate the fitness of the new individuals
        Evaluate P'[g];
        // selection — of μ survivors for the next generation, based on the fitness value
        P[g+1] := select(P'[g] ∪ P[g]);
        // test for termination criterion (based on fitness, on number of generations, etc.)
        If test is positive then done := TRUE;
        // increase the generation counter
        g := g + 1;
    End while
    End ES

2.2. Some more basic concepts

Although there is still much work to be done in order to establish on solid grounds a general theory about generalized Evolution Strategies, some achievements made over simplified models have allowed insight into the way ES work. Although this text is not meant to organize all the theory behind ES, and is instead oriented to giving an introduction to the topic to Power Engineers and Researchers, we will nevertheless introduce some basic concepts that have been used by researchers in the field.

In trying to develop a formal description of the behavior of ES, researchers have worked mostly on the so-called "spherical model", and have introduced the concept of "progress rate". The progress rate φ is defined as the expectation of the change in the (Euclidean) distance, from one generation to the following, between the optimum (wherever it is) and the average location of the population.

The spherical model consists of an isotropic fitness landscape defined by

F(y) = c_0 + c_1 Σ_{i=1..n} y_i²
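The canonical generation loop given in the pseudocode of Section 2.1 can be made concrete as a short Python sketch. Population size, offspring count, the fixed mutation strength and the intermediate recombination of two parents are illustrative choices, not values prescribed by the text; selection here keeps the best μ of the parents-plus-offspring union, i.e., a (μ+λ) flavor:

```python
import random

def basic_es(f, n, mu=5, lam=20, sigma=0.3, generations=60, seed=2):
    """Runnable sketch of the general ES loop: recombination averages two
    random parents (intermediate recombination, rho = 2), mutation adds a
    Gaussian perturbation of fixed strength sigma, and selection keeps the
    best mu individuals of the parents-plus-offspring union."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            a, b = rng.choice(pop), rng.choice(pop)
            child = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]      # recombine
            child = [ci + rng.gauss(0.0, sigma) for ci in child]   # mutate
            offspring.append(child)
        pop = sorted(pop + offspring, key=f)[:mu]                  # select mu survivors
    return pop[0], f(pop[0])
```

On the spherical model of Section 2.2, such a loop steadily reduces the distance to the optimum, which is what the progress-rate analysis below quantifies.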
A mutated individual is therefore defined as

y' = y + Z

where Z is a random vector. Observing Figure 1, we can understand that the rate of progress φ may be defined as the expectation

φ = E{R − r}

where r is the distance of a mutated individual to the optimum.

This has been the basic model adopted by researchers like Beyer [2] to analyze, in the spherical model, the progress rate of an ES and to derive laws about the probability of success, i.e., the probability of the mutated y being inside the circle defined by R around the optimum, and about how to achieve an optimal progress rate, i.e., the fastest progress possible towards the optimum.

We see that this is a simple scheme with elitist selection. So simple, in fact, that it allowed the derivation of some theoretical results for this strategy, namely about convergence velocity and step size.

This model will have an individual in a generation represented as a set of variables (without loss of generality, let's admit for the moment that we are dealing only with real-valued variables), such as in Figure 2.

Figure 2 — Representation of an individual i with n real-valued variables

The mutation scheme may be described as follows: a mutation step is carried out at generation g by adding a random perturbation Z to the parent individual X(g), creating X':

X' := X(g) + Z

where Z = σ (N_1(0,1), ..., N_n(0,1))^T is a vector of independent Gaussian perturbations of strength σ.

2.3. Rechenberg's 1/5 success rule

From the theoretical analysis of such simple models, Rechenberg derived a practical rule for controlling the mutation strength: the fraction of successful mutations should be kept around 1/5; if the observed success rate is larger, the step size σ should be increased, and if it is smaller, σ should be decreased.

However, this "1/5 rule" may also reduce the effectiveness of the algorithm in finding an optimum; it may accelerate the discovery of the optimum, but the probability of actually
reaching it becomes reduced, because it tends to get trapped in local optima; from then on, it may become difficult to find improvements in the neighborhood and then, after a number of generations, the application of the 1/5 rule will further reduce the step size, making it even more difficult to escape.

2.4. Focusing on the optimum

Mutations leading to important variations in the individuals are usually beneficial to the procedure in the beginning, because they allow new individuals to jump away from their parents and thus to probe vast regions of the feasible domain of the problem. However, at a later stage, large perturbations drive individuals away from the region of the optimum.

Common sense tells us that when we have solutions neighboring a possible optimum, the spread of the probability distribution that regulates mutation should become narrower. This allows a fine adjustment of the solutions and is part of the rationale behind Rechenberg's Rule.

This rule introduces or externally defines rules for reducing the spread of the probability distributions with the increase in generations. Proposed as such, this naive scheme is mechanical, deterministic and rigid, and against the very spirit of evolutionary processes.

One technique instead gained popularity, because it uses the very same principles of evolution in a sort of meta-evolutionary scheme. This scheme associates to each individual an extra variable, which represents precisely the variance of its mutation distribution. Although we will further elaborate on this topic in some sections below, we can immediately state that this strategy is quite successful in many cases and that it has theoretical background to support it.

This model allows us to introduce a dynamic control of the mutation strength, as opposed to the Rechenberg 1/5 rule, which was seen by many as a concept alien to the ES spirit: the mutation strength was controlled externally by some deterministic rule.

Therefore, the research effort was directed towards achieving a dynamic control of the mutation strength under principles of evolution and self-adaptation — this means that a mutation strength parameter would also be subject to mutation and selection, in order to adapt the progress of the algorithm to an optimal progress rate.

A successful family of models in this line of reasoning is the σSA, or σ-Self-Adaptation, strategy, originally developed by Schwefel [4,5].

The central idea is that each individual is governed by object parameters and by evolvable strategy parameters, and if an individual is selected with respect to its fitness, the corresponding set of strategic parameters survives as well. These strategic parameters, if optimal, should drive the individuals into a regime of optimum gain, i.e., of maximal expected fitness improvement per generation.

In the (1,λ) strategy we have only one evolvable strategy parameter — a mutation strength σ. An individual at a generation g can therefore be represented such as in Figure 4.

Figure 4 — Representation of an individual with n real-valued variables; it has an extra variable related with the variance of the Gaussian distribution commanding the mutations in its offspring
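The externally controlled step-size adaptation contrasted above with self-adaptation can be illustrated by a (1+1)-ES with the 1/5 success rule. The window length and the adjustment factor 0.85 are common textbook choices, not values prescribed by this chapter:

```python
import random

def one_plus_one_es(f, x0, sigma=1.0, generations=300, window=20, factor=0.85, seed=4):
    """(1+1)-ES with Rechenberg's 1/5 success rule: the deterministic,
    external control of the mutation strength that the text contrasts
    with self-adaptation. window and factor are illustrative values."""
    rng = random.Random(seed)
    parent, f_parent, successes = list(x0), f(x0), 0
    for g in range(1, generations + 1):
        child = [xi + rng.gauss(0.0, sigma) for xi in parent]
        f_child = f(child)
        if f_child < f_parent:                        # elitist selection on {parent, child}
            parent, f_parent, successes = child, f_child, successes + 1
        if g % window == 0:                           # periodically adjust the step size
            sigma *= (1.0 / factor) if successes / window > 0.2 else factor
            successes = 0
    return parent, f_parent, sigma
```

Near a local optimum the rule keeps shrinking σ, which illustrates the trapping behavior described in the text: once the step size has collapsed, escaping the local basin becomes very unlikely.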
2.5. Self-adaptation of the mutation strength

The mutation of the strategy parameter may be written as

σ_k(g+1) = Π[σ(g)]

The Π[.] mutation operator performs multiplicative mutations. This can be done by the multiplication of the parental σ(g) by a random number ξ such that

σ_k(g+1) = ξ σ(g),  k = 1 to λ

The expectation of ξ must not deviate too much from 1, i.e.,

E{ξ} ≈ 1

There are some distributions of the random variable ξ that have found practical use. One of the most important is the lognormal distribution, originally proposed by Schwefel, which has the property that a value has the same probability of being doubled or of being divided in half:

p_σ(ξ) = (1 / (√(2π) τ ξ)) e^(−(1/2)(ln ξ / τ)²)

For practical purposes, the random variable ξ can easily be generated from the Gaussian N(0,1) by an exponential transformation such as

ξ = e^(τ N(0,1))

These expressions introduce the external value τ — the learning parameter, which will soon be discussed. The learning parameter τ conditions the speed and accuracy of the σSA evolution strategy. Therefore, the question of how to choose good values for τ remains, for the moment.

Another mutation rule used in practice depends on the symmetrical two-point distribution. For practical purposes, the mutated value of σ_k(g+1) is given by

σ(g+1) = σ(g) (1 + β),   if U(0,1) ≤ 0.5
σ(g+1) = σ(g) / (1 + β), if U(0,1) > 0.5

with U(0,1) being a sample from the uniform distribution in [0,1]. The application of this distribution also depends on a learning parameter β, sometimes appearing under the form α = 1 + β.

It has been proved that, given τ and β sufficiently small, the effects of these two mutation schemes become comparable. It has been demonstrated that there is an equivalence between the two approaches, given the correspondence τ ≈ β, if τ is sufficiently small.

2.6. How to choose a value for the learning parameter?

It has been demonstrated, within the hyperspherical model, which is usually a good local model, that large values for β or τ should be avoided. The objective is to find a compromise value that leads the algorithm to a near-optimal performance, measured in terms of "rate of progress" towards the optimum. Schwefel's rule establishes that τ should be chosen proportional to 1/√n, where n is the dimension of the search space.

Practical rules from Beyer [7] are the following:

For λ ≥ 10:      τ ≈ c_{1,λ} / √n

For 4 < λ < 10:  τ ≈ c_{1,λ} / √( n (2 c_{1,λ}² + 1 − 2 d(2)_{1,λ}) )

where c_{1,λ} is called the "progress coefficient" and d(2)_{1,λ} is called the "second-order progress coefficient" [6]. Here is a table of coefficients extracted from [7], calculated by numerical integration from a theoretical model.

Table 1 — Coefficients c_{1,λ} and d(2)_{1,λ} to adopt in a (1,λ) ES

    λ     c_{1,λ}   d(2)_{1,λ}
    2     0.5642    1.0000
    3     0.8463    1.2757
    4     1.0294    1.5513
    5     1.1630    1.8000
    6     1.2672    2.0217
    7     1.3522    2.2203
    8     1.4236    2.3995
    9     1.4850    2.5626
   10     1.5388    2.7121
   20     1.8675    3.7632
   30     2.0428    4.4187
   40     2.1608    4.8969
   50     2.2491    5.2740
   60     2.3193    5.5856
   70     2.3774    5.8512
   80     2.4268    6.0827
   90     2.4697    6.2880
  100     2.5076    6.4724
  200     2.7460    7.7015
  300     2.8778    8.4610

Below λ = 4 one cannot adopt the second formula, because it yields an imaginary result; a value larger than c_{1,λ} may be used instead. The σSA algorithm cannot adapt itself in such a way as to obtain the theoretical optimum rate of progress, but it still self-adapts the mutation strength.

As a general indication, it must be said that there is a theoretical value for τ that maximizes the progress rate. However, this maximum is not symmetrical with respect to τ, and one runs a much stronger risk of degrading the performance of the algorithm by choosing a τ value too small than by choosing τ > c_{1,λ}/√n.
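The lognormal self-adaptation scheme of Section 2.5, with τ chosen by the λ ≥ 10 rule above, can be sketched as a (1,λ)-ES. The starting σ of 1.0 and the test function are illustrative; τ uses the Table 1 value c_{1,10} = 1.5388:

```python
import math
import random

def sigma_sa_es(f, x0, lam=10, generations=150, seed=5):
    """Sketch of a sigma-self-adapting (1,lambda)-ES: each offspring first
    mutates the parental step size multiplicatively by the lognormal factor
    xi = exp(tau*N(0,1)), then mutates the object variables with its own
    sigma. The single best offspring becomes the next parent (comma
    selection), carrying its strategy parameter along with it."""
    rng = random.Random(seed)
    n = len(x0)
    tau = 1.5388 / math.sqrt(n)   # rule for lambda >= 10, c_{1,10} from Table 1
    parent, sigma = list(x0), 1.0
    f_best = f(parent)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            s = sigma * math.exp(tau * rng.gauss(0.0, 1.0))     # strategy parameter first
            y = [xi + s * rng.gauss(0.0, 1.0) for xi in parent]  # then object variables
            offspring.append((f(y), y, s))
        f_best, parent, sigma = min(offspring, key=lambda t: t[0])
    return parent, f_best, sigma
```

Because σ is mutated before the object variables, a well-tuned step size is rewarded indirectly: offspring carrying a good σ tend to produce the better fitness values that get selected.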
Also, it has been observed (as naturally expected) that a transient period precedes the establishment of a steady state in the progress of the algorithm.

The transient behavior time is proportional to n. This is not a serious problem if the dimension of the problem is not too large, say, n < 200, which is realistic for a number of practical problems.

The magnitude of the learning parameter influences the duration of the transient period, whose duration in generation number is inversely proportional to τ². If τ is chosen according to the rule that makes it proportional to 1/√n, then the transient phase duration becomes proportional to the space dimension n. If n is very large, say, n > 1000, this may be a serious problem, and then it is advisable to keep a fixed τ = 0.3 during an initial period before starting to apply the rules above.

The choice of the learning parameter according to Schwefel's rule leads to a nearly optimum progress rate of the algorithm, once the transient phase is finished. A σSA algorithm with a learning parameter chosen according to the rules above exhibits a linear convergence order.

There are always fluctuations, and it is not possible to attain the theoretical optimum rate of progress of the algorithm just by manipulating τ. Therefore, other mechanisms have been tried, like keeping a memory of the past values of the mutation rate in order to act upon some kind of moving average, instead of upon the most recent value.

2.7. The (μ,λ)-ES as an extension of (1,λ)-ES

The emergence of a (μ,λ)-ES model (μ parents, λ offspring) is a natural development of Evolution Strategies. In (μ,λ), there is a population of μ individuals evolving in the parameter space; they generate λ offspring by randomly selecting one parent and mutating it, and doing this λ times. The μ individuals of the following generation are selected as the best among the λ individuals generated by mutation — an elitist strategy.

Beyer, in [8], developed theoretical work in order to explain the progress of a (μ,λ)-ES as a generalized model of the (1,λ)-ES. He managed to derive a formula for the progress rate dependent on a single progress coefficient parameter c_{μ,λ}, generalizing the result obtained for the (1,λ)-ES.

Below, we reproduce the table included in [8], which gives the value of c_{μ,λ} to consider when developing an Evolution Strategy model of this kind.

Table 2 — Progress coefficients c_{μ,λ} (rows: μ; columns: λ)

  μ\λ     5     10     20     30     40     50    100    200    300
    4   0.41   1.05   1.49   1.70   1.84   1.95   2.24   2.51   2.65
    5   0.00   0.91   1.39   1.62   1.77   1.87   2.18   2.45   2.60
   10          0.00   0.99   1.28   1.46   1.59   1.94   2.24   2.40
   20                 0.00   0.76   1.03   1.20   1.63   1.97   2.15
   30                        0.00   0.65   0.89   1.41   1.79   1.99
   40                               0.00   0.57   1.22   1.65   1.86
   50                                      0.00   1.06   1.53   1.75
  100                                             0.00   1.07   1.36

One of the consequences of having μ parents is that one can keep a number μ of distinct σ strategic parameters, each associated with a parent and mutated according to the rules explained above in 2.5. It has been demonstrated, however, that this (μ,λ)-Evolution Strategy risks diverging, and therefore Schwefel has recommended the adoption of elitist strategies, such as keeping or preserving the best individuals into the following generation.

But if one is going to follow this elitist approach, then why not just adopt a (μ+λ)-ES and naturally keep the best individual through the successive generations?

2.8. Self-adaptation in (μ,λ)-ES

Departing from the basic ideas of self-adaptation — tested, examined and theoretically explained for the (1,λ)-ES — some variants have been considered and developed for the (μ,λ)-ES, in what we could call the σSA-(μ,λ)-ES, which follows the lines of the σSA-(1,λ)-ES.

In a σSA-(μ,λ)-ES, we must consider again object parameters (the variables of the problem) and strategic parameters — in this case, a mutation rate σ_k associated to each individual to be mutated. The object parameters (variable values) are mutated as usual, by having

X'_k = X_k(g) + Z_k,  k = 1 to λ

where

Z_k = σ'_k(g+1) (N(0,1), ..., N(0,1))^T

In order to approximate an optimal progress rate, the σ_k are mutated according to

σ'_k = σ_k e^(z_0 + z_k)

As usual, the prime (') denotes a mutated variable. According to Schwefel [1], the mutating factors should be given by Gaussian distributions dependent on learning parameters τ, with z_0 = τ_0 N_0(0,1) common to the generation and z_k = τ N_k(0,1) individual, τ_0 and τ scaling as 1/√(2n) and 1/√(2√n), respectively — the usual settings in the ES literature.
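The σSA-(μ,λ) scheme just described can be sketched as follows. The population size, offspring count and the τ_0, τ scalings are the assumed values named above; for simplicity both Gaussian samples are drawn per offspring:

```python
import math
import random

def sigma_sa_mu_lambda(f, n, mu=5, lam=30, generations=120, seed=6):
    """Sketch of a sigma-SA (mu,lambda)-ES: each individual carries its own
    sigma_k, mutated multiplicatively by exp(z0 + zk) as in the text, with
    z0 = tau0*N(0,1) and zk = tau*N_k(0,1). tau0 ~ 1/sqrt(2n) and
    tau ~ 1/sqrt(2*sqrt(n)) are assumed, common choices in the ES
    literature; both samples are drawn per offspring here."""
    rng = random.Random(seed)
    tau0 = 1.0 / math.sqrt(2.0 * n)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    pop = [([rng.uniform(-5.0, 5.0) for _ in range(n)], 1.0) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, s = rng.choice(pop)                               # rho = 1: one random parent
            s_new = s * math.exp(tau0 * rng.gauss(0.0, 1.0) + tau * rng.gauss(0.0, 1.0))
            y = [xi + s_new * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((f(y), y, s_new))
        offspring.sort(key=lambda t: t[0])
        pop = [(y, s) for _, y, s in offspring[:mu]]             # comma selection: offspring only
    best_x, _ = min(pop, key=lambda p: f(p[0]))
    return best_x, f(best_x)
```

As the text warns, pure comma selection offers no guarantee of preserving the best-so-far solution; storing it externally, or switching to a plus strategy, removes that risk.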
Having uncorrelated distinct mutation strengths associated to the variables of the problem allows the evolution to adapt to an anisotropic shape of the fitness landscape; however, the search proceeds mostly along the coordinate axes of the search space, as illustrated in Figure 5.

Figure 5 — Illustration of the search pattern induced in two different individuals by distinct mutation rates affecting the distinct variables

This could be recognized in searching for the optimum of a function tested by Schwefel [9] as simple as

F(x) = Σ_{i=1..n} i · x_i²

where each variable is differently scaled — self-adaptation demands the learning of the scaling of n distinct σ_i.

It was verified that the self-adaptation scheme was very successful, after examining the results of a series of experiments with a σSA (μ,100)-ES for μ varying between 1 and 30. Furthermore, it was discovered that the most successful scheme was with μ = 12, and that both smaller and larger values would cause loss of convergence speed.

The interpretation given is that for self-adaptation to work properly and efficiently, it requires a certain degree of diversity, represented in a number of parents. Furthermore, it has been discovered that having λ > μ is important, as well as having a limited life span of individuals (not allowing them to survive for more than a given number κ of generations), and also the application of recombination to the strategy parameters — prerequisites for a successful self-adaptation scheme, which can then be made to approach the theoretical optimum convergence speed.

2.9. The (μ+λ) ES and Evolutionary Programming

Instead of a (μ,λ)-ES, one may have a (μ+λ)-ES. In this case, the μ parents of generation (g+1) are chosen among the μ parents from generation (g) plus the λ offspring created by mutation from those μ parents. The practical indications on the values of the parameters to adopt in a basic (μ+λ)-ES follow the general trend of the (μ,λ)-ES.

A (μ+λ)-ES with μ = λ is similar to, and can be assimilated with, Evolutionary Programming. There is only one traditional difference, which is minor but has been artificially inflated to sustain the idea that Evolutionary Programming and Evolution Strategies are two separate methods — it is the form of selection.

While in ES it was traditional to have an elitist selection (the best at each generation would be selected into the next one), in EP the tradition preferred selection by stochastic tournament. We say tradition, on purpose: tournament and elitist selection have been used by both schools, of ES and of EP.

We recall that the simplest Stochastic Tournament is T(1,2), the one that, in successive operations, randomly samples 2 individuals from the parent population and, with a given probability fixed externally, selects the one with better fitness to be included in the next generation. This is done as many times as necessary until the required number λ of offspring is generated. Other kinds of Stochastic Tournaments can be conducted, such as T(m,n), where the best m out of a sample of n parents are selected.

In some models, however, the stochastic nature of the selection is abandoned, and pure elitist processes are adopted. For instance, one simply examines the fitness of all individuals in the parent population, and selects the best λ individuals to form the next generation.

In parallel with the ES community, the Evolutionary Programming followers also developed a self-adaptive strategy. Applied to EP, this process was introduced in 1992 as "meta-EP" [10]. The mutation process governing the evolution of the mutation strength parameter in each individual from generation g to generation g+1 is given by

σ(g+1) = ξ σ(g)

where ξ is a random number given by

ξ = 1 + τ N(0,1)

and τ is the "learning parameter", fixed externally. One may observe that the mutations in the mutation strength parameter are still of multiplicative type, like in ES, while the mutations in the phenotypic variables are additive.

Under this light, we can observe that the mutation operator used in EP, in the variant called "meta-EP" and discussed above, can be derived from the lognormal operator presented for ES in 2.5 by taking the Taylor expansion to its linear term, which gives precisely
40
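As a concrete illustration, the meta-EP update (multiplicative mutation of the strength σ, additive Gaussian mutation of the phenotypic variables) can be sketched in Python. The value of the learning parameter τ and the clamping of σ to positive values are illustrative assumptions, not prescribed by the text:

```python
import random

TAU = 0.1  # learning parameter tau - an illustrative value, fixed externally

def meta_ep_mutate(x, sigma, rng=random):
    """One meta-EP mutation: sigma evolves multiplicatively through
    xi = 1 + tau * N(0,1); the phenotypic variables mutate additively."""
    xi = 1.0 + TAU * rng.gauss(0.0, 1.0)
    new_sigma = abs(sigma * xi)  # keep the mutation strength positive (assumption)
    new_x = [v + new_sigma * rng.gauss(0.0, 1.0) for v in x]
    return new_x, new_sigma
```

For small τ, the factor ξ = 1 + τN(0,1) stays close to 1, which is what makes this linearized form behave like the ES log-normal operator.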
ξ = 1 + τ N(0,1)

The optimal value of the learning parameter τ has been the object of empirical and theoretical studies. In many practical models we found that this value has been fixed by trial and error, but the conclusions derived for ES are perfectly valid for the meta-EP.

Therefore the "meta-EP" proposed by Fogel [11], with its Gaussian approach, if τ is sufficiently small, becomes included in the class of models of (μ+λ)-ES and exhibits the same behavior. It can rightly be considered, therefore, as a particular case of this set of Evolution Strategies.

2.10. A scheme for Evolutionary Programming

A typical EP model, as any model in the ES family, requires the definition of a fitness function and of a population of individuals.

Each individual is represented by its variables, in their natural domains. If a solution requires the representation of structural or topological aspects, these can also be represented as naturally as possible, namely by discrete variables.

Mutations act directly on the variable values of an individual. Real-valued variables are subject to a zero-mean multivariate Gaussian perturbation in each generation. This means that minor variations in an offspring become highly probable while substantial variations become increasingly unlikely. The Gaussian scheme, however, does not prevent them.

This procedure allows real-valued variables to converge continuously to a possible optimum, avoiding the discrete nature of a genetic binary coding. Also, this sometimes allows a new area in the search space to be explored, by an individual that suffered an important successful mutation.

For discrete variables, often associated with topology features, mutations may follow Poisson-distributed deletions or additions.

So here is the pseudocode for a general EP algorithm:

Procedure EP
  // start the generation counter
  g := 0;
  // initialize a random population P
  Init population P[g];
  // evaluate the fitness of all individuals of the population
  Evaluate P[g];
  While not done do
    // reproduction - duplicate the population
    P'[g] := P[g];
    // mutation - introduce stochastic perturbations in the new
    // population, including in the strategic parameters
    P'[g] := Mutate( P'[g] );
    // evaluation - calculate the fitness of the new individuals
    Evaluate P'[g];
    // selection of the survivors for the next generation, based on
    // the fitness value and on stochastic tournament
    P[g+1] := Select( P[g] U P'[g] );
    // test for termination criterion (based on fitness,
    // number of generations, etc.)
    If test is positive then done := TRUE;
    // increase the generation counter
    g := g + 1;
  End while
End EP

Observing this piece of pseudocode, we can see that the selection procedure acts upon a generation which is composed of "parents" and "sons" - P[g] and P'[g]. This helps preserve the best individuals so that they may allow the exploration of promising regions by giving place to "good" mutations.

In the spirit of evolutionary computing, the selection procedure should be stochastic, i.e., the best individuals should be selected to the following generation with a given (usually high) probability. However, it is also usual to find, in practical applications, procedures for deterministic selection, where the best are always selected.

2.11. Enhancing the mutation process

After experiments with a global evolutive mutation strength, and the definition of a global learning factor, as we have discussed so far, researchers tried to decouple mutations in one individual, so that the distinct variables could undergo distinct evolutive processes.

This meant that, for an individual, one would set not a single mutation strength but instead n mutation strengths for the n objective parameters or n variables defining an individual.

This scheme has been tried with some success. Going back to the "spherical model", one has postulated that it was a good local approximation in many cases. However, it assumed an isotropic topology of the search space. Therefore, allowing distinct mutation strengths according to distinct coordinate directions (the variables of the problem) would in principle allow a more accurate approximation of regions with a sort of ellipsoid topology, instead of spherical.

A scheme with n mutation strengths allows, therefore, a decoupling of the mutation-rate evolution according to the axial directions of the search space. In many cases, this will be enough to enhance the performance of a self-adaptive Evolution Strategy.

But, in some cases, this is not enough. Correlations must be established between evolution along some direction and along some other direction. Otherwise, slow progress or even divergence may occur.

The original scheme with one single mutation strength assumed an evolution in an isotropic space - say, the squared length of a vector R, or ||R||^2, is given by ||R||^2 = R^T R. Decoupling variable mutation strengths is equivalent to assuming a diagonal metric matrix in the search space; therefore, the squared length of a vector R would be given by ||R||^2 = R^T D R, with D being a diagonal matrix. This has been illustrated in Figure 5.

Recognizing this, ES has incorporated correlation between mutations as strategic variables. This is equivalent to considering a Mahalanobis metric in the space - the squared length of a vector R will be given by ||R||^2 = R^T T R, where T is a full matrix.
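The generation loop of Procedure EP above is straightforward to sketch in Python. The mutation operator and the tournament probability below are illustrative assumptions; selection fills the next generation by T(1,2) stochastic tournaments over the union of parents and offspring, as described in the text:

```python
import random

def ep_generation(pop, fitness, mutate, p_keep_best=0.9, rng=random):
    """One EP generation: duplicate, mutate, evaluate, then select survivors
    by T(1,2) stochastic tournaments over P[g] union P'[g]."""
    offspring = [mutate(ind) for ind in pop]   # P'[g] := Mutate(P[g])
    pool = pop + offspring                     # P[g] union P'[g]
    next_pop = []
    while len(next_pop) < len(pop):
        a, b = rng.sample(pool, 2)             # randomly sample 2 individuals
        better, worse = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # with an externally fixed probability, keep the fitter of the two
        next_pop.append(better if rng.random() < p_keep_best else worse)
    return next_pop
```

A deterministic elitist variant would instead sort the pool by fitness and keep the best μ individuals.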
Figure 6 illustrates the effect of having non-zero covariances between variables, allowing the exploration of the search space along directions not aligned with the coordinate axes.

ES followers have adopted a formal mathematical representation of the possible covariances of the mutation distribution in the several directions in space. The basic concept is the one of "inclination angle α" as the departure point to defining linearly dependent mutation correlations for the object variables.

Given an angle α_j, a basic covariance matrix between directions p and q may be defined by the transformation matrix T_pq(α_j), equal to the identity matrix except for four elements:

T_pp = cos α_j,  T_qq = cos α_j,  T_pq = -sin α_j,  T_qp = sin α_j

Figure 6 - Illustration of the search process in correlated directions of two distinct individuals with different linear correlations between variables. The angle α becomes one strategic variable subject to mutation (and recombination)

The product of all T_pq matrices according to all combinations of p and q gives the matrix C of covariances. This allows the calculation of an actual mutation applied to a given individual X by

X' = X + CZ

where Z = (z_1, ..., z_n) and z_i ~ N(0, σ_i^2).

The vector CZ is, therefore, a random vector with normally distributed and eventually correlated components, as a function of the α_i and the σ_i.

The strategic variables that establish the correlations or the covariances (filling up with non-zeros the elements out of the main diagonal of C) are the inclination angles α; one can readily see that if these angles are set to 0, then all matrices T_pq become identity matrices and therefore mutations will develop independently in all dimensions of the search space.

These angles α are, therefore, taken as strategic variables and may also be mutated and subject to a self-adaptive procedure.

In summary, to establish correlated mutations, one may proceed step by step as follows:
1st - mutate the σ_i
2nd - mutate the α_k
3rd - calculate and apply the matrix C to obtain a new mutated individual

The angles α_k should be mutated with an additive perturbation of the form α_k(g+1) = α_k(g) + β N(0,1), with β fixed externally.

2.12. Recombination

In this variant, the parameter ρ determines the number of parents that recombine to form one new individual.

The biological construction is based on ρ = 2, but when building an Evolution Strategy, we do not need to be limited to that option and may experiment with strategies with ρ > 2.

Recombination is a word that designates a number of distinct procedures that share the property of building an individual departing from a set of parents.

Here are some possible recombination schemes that have been used in Evolution Strategies:

• Uniform crossover - in this variant, the value for each variable in the newly formed individual is obtained by randomly selecting one of the ρ parents to "donate" its value. In the case of ρ = 2, it is traditional to randomly generate a bit string with length equal to the number of
variables of the individuals and then to use such a string to command the recombination procedure: if the bit associated to a variable is 1, the value from the first parent is selected; if it is 0, then the value from the second parent is selected.

• Intermediary recombination - in this variant, the value of any variable in the offspring receives a contribution from all parents. This could result either from averaging the values of all parents (global intermediary recombination) or from averaging the values from a subset of the parents only, randomly chosen (local intermediary recombination). In these processes, one may still choose to average values with equal weights or to randomly define weights for a weighted average. In the case of ρ = 2, one could have the value of a variable given by

x_k_new = u_k x_k,j1 + (1 - u_k) x_k,j2

where the indices j1 and j2 denote the two parent individuals and u_k is sampled from a uniform distribution in [0,1].

• Point crossover - in this variant, parallel to the one adopted in genetic algorithms, first one randomly defines γ (< n) crossover points, common to all individuals in the set of parents, and then the offspring successively receives a part from each parent, in turns.

Experiences have demonstrated the power of recombination to greatly accelerate the convergence of Evolution Strategies. This being so, some theoretical explanations were sought.

In the Genetic Algorithm community, the Building Block theory became popular. It stated that recombination allowed good blocks from each parent to join together in a better descendent.

But in the ES community other mathematical descriptions allowed distinct views to emerge. Beyer, for instance, argued differently [12], based on his developments of the concept of progress rate: he suggested that recombination acted as a sort of "genetic repair" mechanism, compensating the disruptive effects of mutation. Therefore, larger mutation strengths or larger learning parameters were allowed, contributing to a higher progress rate than in an ES without recombination.

Furthermore, under some assumptions, Beyer also demonstrated that the highest impulse from recombination was achieved when all μ parents contributed to form one new individual. He justified this assertion with a mathematical demonstration and called his model the (μ/μ,λ)-ES.

Recombination, thus, plays a major role in modern Evolution Strategies and is not a secondary technique.

2.13. Handling constraints

Unlike GA, EP allows a very natural way of handling constraints in a problem. Because each individual is coded in its original or phenotypic variables, it is usually easy to enforce constraints.

One way to do that is during the mutation phase - each time a new individual is mutated, it can be checked for feasibility and discarded if the constraints are not met, and a replacement generated until one is found that respects constraints. Furthermore, during the mutation phase, mutations can in many cases be conditioned so that there is no possibility for an unfeasible descendent to be created.

This was the original ES scheme for handling constraints, but sometimes it may be a very time consuming procedure.

The other possibility is to handle constraints during the selection phase, by attributing a low fitness value to individuals that violate constraints.

2.14. Starting point

To start an Evolution Strategy process, one has to generate a first initial population of μ individuals. This can typically be done in two ways:

• By randomly generating the coordinates for the μ individuals, or
• By generating mutations from a seed or starting individual.

It was traditional in the community of Evolution Strategies to make sure that the initial population was composed of feasible individuals. However, this may not be mandatory if an adequate method of penalties and handling constraints is adopted.

2.15. Fitness function

The fitness function is usually represented by the objective function to be evaluated, representative of the problem to be solved.

To this fitness function, one may add the effect of penalties to represent the undesired violation of constraints. This is an approach adopted in all evolutionary algorithm variants.

One simple and yet claimed as effective way of adding the effect of the violation of constraints, in a problem of maximization, is to count the number of violated constraints or add up the amount of violations, and attribute the fitness value according to the following rule:

If no violations occur, Fitness(X) = F(X)
If violations occur, Fitness(X) = -Σ (violation value)_i, i ∈ constraint set

2.16. Computing

One may find software allowing the implementation of Evolution Strategies. A few examples are mentioned below.

One well known possibility is evoC, available from the Bionics and Evolution Department of the Technical University of Berlin, Germany - it is an application written in C which may be used in a variety of platforms, from MS-DOS to LINUX. It is free but not in the public domain, and could until recently be obtained from ftp://ftpbionic.tb10.tuberlin.de under the directory /pub/software/evoC.
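The recombination schemes above can be sketched in Python; the helper names are ours, and the weighted two-parent rule implements the formula x_k_new = u_k x_k,j1 + (1 - u_k) x_k,j2 given in the text:

```python
import random

def uniform_crossover(parents, rng=random):
    """Each variable of the child is donated by one randomly chosen parent
    (the bit-string scheme described above, generalized to rho > 2 parents)."""
    n = len(parents[0])
    return [rng.choice(parents)[k] for k in range(n)]

def global_intermediary(parents):
    """Each variable is the equal-weight average over all rho parents."""
    n = len(parents[0])
    return [sum(p[k] for p in parents) / len(parents) for k in range(n)]

def weighted_intermediary(p1, p2, rng=random):
    """Two-parent weighted average: x_k = u_k * x_k_j1 + (1 - u_k) * x_k_j2,
    with u_k drawn uniformly in [0, 1] for each variable."""
    child = []
    for k in range(len(p1)):
        u = rng.random()
        child.append(u * p1[k] + (1.0 - u) * p2[k])
    return child
```

Local intermediary recombination would simply apply global_intermediary to a randomly chosen subset of the parents.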
A set of MATLAB tools with a user-friendly interface, developed at the University of Magdeburg, Germany, can be requested and obtained from bihn@infaut.et.uni-magdeburg.de, Dr. Bihn.

There is also a set of demonstration programs of relevant didactic interest, which were developed and are available from the ftp server of the Technical University of Berlin, ftp://ftpbionic.tb10.tuberlin.de. These demonstrations are also available on the internet, and have the support of a technical report [13] available from the same server.

3. CONCLUSIONS

Evolution Strategies is the designation of a wide family of evolutionary algorithms which have in common the representation of solutions in the space of problem variables, instead of using any sort of coding like the binary coding adopted in Genetic Algorithms.

The ES school of thought had its birth in Germany and remained confined to a closed community, perhaps due to the fact that many of the early publications were made in the German language. However, it is now obvious that ES is a rich and fruitful field.

Detached from the historical processes that gave birth to new ideas, one may also with no difficulty recognize that Evolutionary Programming, which had for some time an independent development, is just a specialized subset of Evolution Strategies.

There is a substantial theoretical work justifying the success of Evolution Strategy algorithms and giving indications on how to tune an algorithm in order to obtain the maximum possible efficiency, measured by the rate of progress towards the optimum.

This theoretical basis clearly indicates that recombination is a major operator inducing fast evolutionary progress. Under this light, pure Evolutionary Programming models, relying only on mutation, seem to be more limited.

Theory and experiments also suggest that good strategies should use a number of parents μ generating a larger offspring λ > μ, and that in many cases generalized recombination processes, using all μ parents to generate each offspring, offer the faster rates of progress or algorithm efficiency.

Finally, it is evident today that self-adaptation schemes are usually very effective and offer the best chances of reaching the absolute optimum, while exogenously controlled mutation strengths, even if allowing in some cases a fast progress towards the optimum, risk becoming trapped in local optima.

4. REFERENCES

[1] International Conference on Artificial Life, vol. 929 of Lecture Notes in Artificial Intelligence, pages 893-907, Springer, Berlin, 1995.
[2] Beyer, H.-G., "Towards a Theory of Evolution Strategies: Some Asymptotical Results from the (1,+λ)-Theory", Evolutionary Computation, vol. 1, no. 2, pages 165-188, 1993.
[3] Rechenberg, I., "Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution", Frommann-Holzboog, Stuttgart, 1973.
[4] Schwefel, H.-P., "Adaptive Mechanismen in der biologischen Evolution und ihr Einfluß auf die Evolutionsgeschwindigkeit", Technical report, Technical University Berlin, 1974.
[5] Schwefel, H.-P., "Evolution and Optimum Seeking", Wiley, New York, NY, 1995.
[6] Beyer, H.-G., "Towards a Theory of Evolution Strategies: Progress Rates and Quality Gain for (1,+λ) Strategies on (Nearly) Arbitrary Fitness Functions", in Y. Davidor, R. Männer and H.-P. Schwefel (eds.), "Parallel Problem Solving from Nature, 3", Heidelberg, pages 58-67, Springer-Verlag, 1994.
[7] Beyer, H.-G., "Toward a Theory of Evolution Strategies: Self-Adaptation", Evolutionary Computation, vol. 3, no. 3, pages 311-347, 1996.
[8] Beyer, H.-G., "Toward a Theory of Evolution Strategies: the (μ,λ)-Theory", Evolutionary Computation, vol. 2, no. 4, pages 381-407, 1995.
[9] Schwefel, H.-P., "Natural Evolution and Collective Optimum-Seeking", in A. Sydow (ed.), "Computational System Analysis: Topics and Trends", pages 5-14, Elsevier, Amsterdam, 1992.
[10] Fogel, D. B., "Evolving Artificial Intelligence", Ph.D. Thesis, University of California, San Diego, 1992.
[11] Fogel, D. B., Fogel, L. J., Atmar, J. W., "Meta-evolutionary programming", in R. R. Chen (ed.), Proceedings of the 25th Asilomar Conference on Signals, Systems and Computers, San Jose, CA, USA, Maple Press, pages 540-545, 1991.
[12] Beyer, H.-G., "Toward a Theory of Evolution Strategies: On the Benefits of Sex - the (μ/μ,λ) Theory", Evolutionary Computation, vol. 3, no. 1, pages 81-111, 1995.
[13] Herdy, M., Palone, G., "Evolution Strategy in Action - 10 ES-Demonstrations", Proceedings of the International Conference on Evolutionary Computation, Jerusalem, Israel, October 1994.
Chapter 5
Fundamentals of Particle Swarm Optimization Techniques
Abstract: This chapter presents fundamentals of particle swarm optimization (PSO) techniques. While a lot of evolutionary computation techniques have been developed for combinatorial optimization problems, PSO has been basically developed for continuous optimization problems, based on the backgrounds of artificial life and psychological research. PSO has several variations, including integration with a selection mechanism and hybridization for handling both discrete and continuous variables. Moreover, the recently developed constriction factor approach is useful for obtaining high quality solutions.

Key words: Continuous optimization problem, Mixed-integer nonlinear optimization problem, Constriction factor

1. INTRODUCTION

Natural creatures sometimes behave as a swarm. One of the main streams of artificial life research is to examine how natural creatures behave as a swarm and reconfigure the swarm models inside a computer. Reynolds developed boid as a swarm model with simple rules and generated complicated swarm behavior by CG animation [1].

From the beginning of the 90's, research on new optimization techniques using the analogy of the swarm behavior of natural creatures has been started. Dorigo developed ant colony optimization (ACO) mainly based on the social insect, especially ant, metaphor [2]. Each individual exchanges information implicitly through pheromone in ACO. Eberhart and Kennedy developed particle swarm optimization (PSO) based on the analogy of swarms of birds and schools of fish [3]. Each individual exchanges previous experiences in PSO. These research fields are called "Swarm Intelligence" [4][5]. This chapter describes mainly PSO as one of the swarm intelligence techniques.

Other evolutionary computation (EC) techniques such as the genetic algorithm (GA) also utilize some searching points in the solution space. While GA can handle combinatorial optimization problems, PSO was originally developed to handle continuous optimization problems. PSO has been expanded to handle combinatorial optimization problems, and both discrete and continuous variables as well. Efficient treatment of mixed-integer nonlinear optimization problems (MINLP) is one of the most difficult problems in the optimization field. Moreover, unlike other EC techniques, PSO can be realized with only a small program. Namely, PSO can handle MINLP with only a small program. This feature of PSO is one of its advantages compared with other optimization techniques.

This chapter is organized as follows: Chapter II explains the basic PSO method and Chapter III explains variations of PSO such as discrete PSO and hybrid PSO. Chapter IV describes parameter sensitivities and the constriction factor approach. Chapter V shows some applications of PSO and Chapter VI concludes this chapter with some remarks.

2. BASIC PARTICLE SWARM OPTIMIZATION

2.1 Background of Particle Swarm Optimization

Natural creatures sometimes behave as a swarm. One of the main streams of artificial life research is to examine how natural creatures behave as a swarm and reconfigure the swarm models inside a computer. Swarm behavior can be modeled with a few simple rules. Schools of fish and swarms of birds can be modeled with such simple models. Namely, even if the behavior rules of each individual (agent) are simple, the behavior of the swarm can be complicated. Reynolds called this kind of agent a boid and generated complicated swarm behavior by CG animation [1]. He utilized the following three vectors as simple rules:
(1) to step away from the nearest agent
(2) to go toward the destination
(3) to go to the center of the swarm
Namely, the behavior of each agent inside the swarm can be modeled with simple vectors. This characteristic is one of the basic concepts of PSO.

Boyd and Richerson examined the decision process of human beings and developed the concept of individual learning and cultural transmission [6]. According to their examination, people utilize two important kinds of information in the decision process. The first one is their own experience; that is, they have tried the choices and know which state has been better so far, and they know how good it was. The second one is other people's experiences; that is, they have knowledge of how the other agents around them have performed. Namely, they know which choices their neighbors have found most positive so far and how positive the best pattern of choices was. Namely, each agent makes decisions using its own experiences and other people's experiences. This characteristic is another basic concept of PSO.

2.2 Basic method

According to the background of PSO and simulation of swarms of birds, Kennedy and Eberhart developed a PSO concept. Namely, PSO is basically developed through simulation of bird flocking in two-dimensional space. The position of each agent is represented by its XY-axis position, and the velocity is expressed by vx (the velocity along the X axis) and vy (the velocity along the Y axis). Modification of the agent position is realized by the position and velocity information.
Bird flocking optimizes a certain objective function. Each agent knows its best value so far (pbest) and its XY position. This information is an analogy of the personal experiences of each agent. Moreover, each agent knows the best value so far in the group (gbest) among the pbests. This information is an analogy of the knowledge of how the other agents around them have performed. Namely, each agent tries to modify its position using the following information:
- the current positions (x, y),
- the current velocities (vx, vy),
- the distance between the current position and pbest,
- the distance between the current position and gbest.

Fig. 2 Concept of modification of a searching point by PSO. (s^k: current searching point; s^k+1: modified searching point; v^k: current velocity; v^k+1: modified velocity; v_pbest: velocity based on pbest; v_gbest: velocity based on gbest.)

This modification can be represented by the concept of velocity. The velocity of each agent can be modified by the following equation:

v_i^(k+1) = w v_i^k + c_1 rand_1 x (pbest_i - s_i^k) + c_2 rand_2 x (gbest - s_i^k)   (1)

where v_i^k: velocity of agent i at iteration k,
w: weighting function,
c_j: weighting factor,
rand: random number between 0 and 1,
s_i^k: current position of agent i at iteration k,
pbest_i: pbest of agent i,
gbest: gbest of the group.

Fig. 3 Searching concept with agents in a solution space by PSO.

The following weighting function is usually utilized in (1):

w = w_max - ((w_max - w_min) / iter_max) x iter   (2)

where w_max: initial weight,
w_min: final weight,
iter_max: maximum iteration number,
iter: current iteration number.

Using the above equation, a certain velocity, which gradually gets close to pbest and gbest, can be calculated. The current position (searching point in the solution space) can be modified by the following equation:

s_i^(k+1) = s_i^k + v_i^(k+1)   (3)

Step 1. Generation of the initial searching point of each agent
The initial searching point and velocity of each agent are generated, the initial pbest and gbest values are determined, and the agent number with the best value is stored.
Step 2. Evaluation of the searching point of each agent
The objective function value is calculated for each agent. If the value is better than the current pbest of the agent, the pbest value is replaced by the current value. If the best value of pbest is better than the current gbest, gbest is replaced by the best value and the agent number with the best value is stored.
Step 3. Modification of each searching point
The current searching point of each agent is changed using (1)(2)(3).
Step 4. Checking the exit condition
If the current iteration number reaches the predetermined maximum iteration number, then exit. Otherwise, go to Step 2.
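Equations (1)-(3) and Steps 1-4 can be put together in a compact Python sketch. We phrase it for minimization ("better" meaning a smaller objective value); the population size and the c1, c2, w_max, w_min values are commonly used illustrative choices, not prescribed by the text:

```python
import random

def pso(f, bounds, n_agents=20, iter_max=100,
        c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, rng=random):
    """Basic PSO for minimization following (1), (2), (3) and Steps 1-4."""
    n = len(bounds)
    # Step 1: random initial positions and velocities; initialize pbest/gbest
    s = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    v = [[0.0] * n for _ in range(n_agents)]
    pbest = [row[:] for row in s]
    pbest_val = [f(p) for p in pbest]
    g = min(range(n_agents), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for it in range(iter_max):
        w = w_max - (w_max - w_min) * it / iter_max          # eq. (2)
        for i in range(n_agents):
            for k in range(n):
                v[i][k] = (w * v[i][k]
                           + c1 * rng.random() * (pbest[i][k] - s[i][k])
                           + c2 * rng.random() * (gbest[k] - s[i][k]))  # eq. (1)
                s[i][k] += v[i][k]                                      # eq. (3)
            val = f(s[i])            # Step 2: evaluate, update pbest and gbest
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = s[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = s[i][:], val
    return gbest, gbest_val          # Step 4: exit after iter_max iterations
```

For example, minimizing the sphere function f(x) = x1^2 + x2^2 over [-5, 5]^2 drives gbest toward the origin.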
agent's deciding yes or no, true or false, or making some other decision, is a function of personal and social factors as follows:

[Flow chart figure: generation of initial searching points of each agent; evaluation of the searching point of each agent (Step 2)]

numbers can be used to express the current position and velocity. Namely, a discrete random number is used for rand in (1), and the whole calculation of the right-hand side of (1) is discretized to the existing discrete numbers. Using this modification for discrete numbers, both continuous and discrete numbers can be handled in the algorithm with no inconsistency. In [9], the PSO for MINLP was successfully applied to a reactive power and voltage control problem with promising results.
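The full discrete-PSO formulas are not reproduced above, but the discretization idea - updating with (1) and (3), then snapping discrete coordinates back to their allowed values - can be sketched as follows. The function names and the nearest-value snapping rule are our illustrative assumptions, simpler than the scheme of [9]:

```python
import random

def snap(value, allowed):
    """Map a continuous value to the nearest allowed discrete value."""
    return min(allowed, key=lambda a: abs(a - value))

def mixed_update(s, v, pbest, gbest, discrete_sets,
                 w=0.7, c1=2.0, c2=2.0, rng=random):
    """One update of a single agent with mixed variables: every coordinate
    follows (1) and (3); coordinates listed in discrete_sets are then
    snapped back to their allowed discrete values."""
    for k in range(len(s)):
        v[k] = (w * v[k]
                + c1 * rng.random() * (pbest[k] - s[k])
                + c2 * rng.random() * (gbest[k] - s[k]))
        s[k] = s[k] + v[k]
        if k in discrete_sets:
            s[k] = snap(s[k], discrete_sets[k])
    return s, v
```

This is what lets one program handle continuous and discrete variables of an MINLP with no inconsistency, as described in the text.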
The constriction factor approach utilizes eigenvalue analysis and controls the system behavior so that it has the following features [14]:
(a) The system does not diverge in a real value region and finally can converge;
(b) The system can search different regions efficiently.

The velocity of the constriction factor approach (simplest constriction) can be expressed as follows instead of (1) and (2):

v_i^(k+1) = K [ v_i^k + c_1 rand_1 x (pbest_i - s_i^k) + c_2 rand_2 x (gbest - s_i^k) ]   (8)

K = 2 / | 2 - φ - sqrt(φ^2 - 4φ) |,  where φ = c_1 + c_2, φ > 4   (9)

Table 1 PSO applications.
Application field
- Neural network learning algorithm
- Human tremor analysis
- Rule extraction in fuzzy neural networks
- Battery pack state-of-charge estimation
- Computer numerically controlled milling optimization
- Reactive power and voltage control
- Distribution state estimation
- Power system stabilizer design
- Fault state power supply reliability enhancement
*) No. shows the paper No. shown in the bibliographies section.
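Equation (9) is easy to compute. With the customary illustrative choice c1 = c2 = 2.05 (so φ = 4.1), the constriction factor K comes out at about 0.73:

```python
import math

def constriction_factor(c1=2.05, c2=2.05):
    """K = 2 / |2 - phi - sqrt(phi^2 - 4*phi)| with phi = c1 + c2 > 4, as in (9)."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("the constriction factor requires phi = c1 + c2 > 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
```

The velocity update (8) then multiplies the whole bracket of (1) by K, instead of using the inertia weight of (2).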
IEEE Power Engineering Society Winter Meeting, Columbus, Ohio, January 2001.
[14] M. Clerc, "The Swarm and the Queen: Towards a Deterministic and Adaptive Particle Swarm Optimization", Proc. of the IEEE International Conference on Evolutionary Computation (ICEC'99), 1999.
[15] R. Eberhart and Y. Shi, "Comparing Inertia Weights and Constriction Factors in Particle Swarm Optimization", Proc. of the Congress on Evolutionary Computation (CEC2000), pp. 84-88, 2000.
[16] M. A. Abido, "Particle Swarm Optimization for Multimachine Power System Stabilizer Design", Proc. of the IEEE Power Engineering Society Summer Meeting, July 2001.
[17] P. Angeline, "Evolutionary Optimization versus Particle Swarm Optimization: Philosophy and Performance Differences", Proceedings of the Seventh Annual Conference on Evolutionary Programming, March 1998.
[18] P. Angeline, "Using Selection to Improve Particle Swarm Optimization", Proceedings of the IEEE International Conference on Evolutionary Computation (ICEC'98), Anchorage, Alaska, USA, May 4-9, 1998.
[19] A. Carlisle and G. Dozier, "Adapting Particle Swarm Optimization to Dynamic Environments", Proceedings of the International Conference on Artificial Intelligence, Monte Carlo Resort, Las Vegas, Nevada, USA.
[20] A. Carlisle and G. Dozier, "An Off-the-Shelf Particle Swarm Optimization", Proceedings of the Workshop on Particle Swarm Optimization, Indianapolis, IN: Purdue School of Engineering and Technology, IUPUI, 2001 (in press).
[21] M. Clerc, "The swarm and the queen: towards a deterministic and adaptive particle swarm optimization", Proc. of the 1999 Congress on Evolutionary Computation (CEC'99), Washington, DC, pp. 1951-1957, 1999.
[22] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory", Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp. 39-43, Piscataway, NJ: IEEE Service Center, 1995.
[23] R. Eberhart and X. Hu, "Human tremor analysis using particle swarm optimization", Proc. of the Congress on Evolutionary Computation (CEC'99), Washington, DC, pp. 1927-1930, Piscataway, NJ: IEEE Service Center, 1999.
[24] R. Eberhart and Y. Shi, "Comparison between Genetic Algorithms and Particle Swarm Optimization", Proc. of the Seventh Annual Conference on Evolutionary Programming, March 1998.
[25] R. Eberhart and Y. Shi, "Evolving artificial neural networks", Proc. of the International Conference on Neural Networks and Brain, Beijing, P.R.C., PL5-PL13, 1998.
[26] R. Eberhart and Y. Shi, "Comparison between genetic algorithms and particle swarm optimization", in V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, Eds., Evolutionary Programming VII: Proc. of the 7th Annual Conference on Evolutionary Programming, San Diego, CA, Berlin: Springer-Verlag, 1998.
[27] R. Eberhart and Y. Shi, "Comparing inertia weights and constriction factors in particle swarm optimization", Proc. of the Congress on Evolutionary Computation (CEC2000), San Diego, CA, pp. 84-88, 2000.
[28] R. Eberhart and Y. Shi, "Tracking and optimizing dynamic systems with particle swarms", Proc. of the Congress on Evolutionary Computation (CEC2001), Seoul, Korea, Piscataway, NJ: IEEE Service Center, 2001.
and Technology, IUPUI, 2001.
[32] Y. Fukuyama and H. Yoshida, "A Particle Swarm Optimization for Reactive Power and Voltage Control in Electric Power Systems", Proc. of the Congress on Evolutionary Computation (CEC2001), Seoul, Korea, Piscataway, NJ: IEEE Service Center, 2001.
[33] Z. He, C. Wei, L. Yang, X. Gao, S. Yeo, R. Eberhart, and Y. Shi, "Extracting Rules from Fuzzy Neural Network by Particle Swarm Optimization", Proc. of the IEEE International Conference on Evolutionary Computation (ICEC'98), Anchorage, Alaska, USA, May 4-9, 1998.
[34] A. Ismail and A. P. Engelbrecht, "Training Product Units in Feedforward Neural Networks using Particle Swarm Optimization", Proceedings of the International Conference on Artificial Intelligence, Durban, South Africa, pp. 36-40, 1999.
[35] J. Kennedy, "The particle swarm: social adaptation of knowledge", Proc. of the International Conference on Evolutionary Computation (ICEC'97), Indianapolis, IN, pp. 303-308, Piscataway, NJ: IEEE Service Center, 1997.
[36] J. Kennedy, "Minds and cultures: Particle swarm implications", Socially Intelligent Agents: Papers from the 1997 AAAI Fall Symposium, Technical Report FS-97-02, pp. 67-72, Menlo Park, CA: AAAI Press, 1997.
[37] J. Kennedy, "Methods of agreement: inference among the elementals", Proc. of the International Symposium on Intelligent Control, Piscataway, NJ: IEEE Service Center, 1998.
[38] J. Kennedy, "The behavior of particles", in V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, Eds., Evolutionary Programming VII: Proc. of the 7th Annual Conference on Evolutionary Programming, San Diego, CA, pp. 581-589, Berlin: Springer-Verlag, 1998.
[39] J. Kennedy, "Thinking is social: Experiments with the adaptive culture model", Journal of Conflict Resolution, vol. 42, pp. 56-76, 1998.
[40] J. Kennedy, "Small worlds and mega-minds: effects of neighborhood topology on particle swarm performance", Proc. of the Congress on Evolutionary Computation (CEC'99), pp. 1931-1938, Piscataway, NJ: IEEE Service Center, 1999.
[41] J. Kennedy, "Stereotyping: improving particle swarm performance with cluster analysis", Proc. of the 2000 Congress on Evolutionary Computation (CEC2000), San Diego, CA, Piscataway, NJ: IEEE Press, 2000.
[42] J. Kennedy, "Out of the computer, into the world: externalizing the particle swarm", Proceedings of the Workshop on Particle Swarm Optimization, Indianapolis, IN: Purdue School of Engineering and Technology, IUPUI, 2001.
[43] J. Kennedy and R. Eberhart, "Particle swarm optimization", Proceedings of the 1995 IEEE International Conference on Neural Networks (ICNN), Perth, Australia, IEEE Service Center, Piscataway, NJ, vol. IV, pp. 1942-1948, 1995.
[44] J. Kennedy and R. Eberhart, "A discrete binary version of the particle swarm algorithm", Proceedings of the 1997 Conference on Systems, Man, and Cybernetics (SMC'97), pp. 4104-4109, IEEE Service Center, Piscataway, NJ, 1997.
[45] J. Kennedy and R. Eberhart, "The particle swarm: social adaptation in information-processing systems", in D. Corne, M. Dorigo, and F. Glover, Eds., New Ideas in Optimization, London: McGraw-Hill, 1999.
[46] J. Kennedy, R. Eberhart, and Y. Shi, Swarm Intelligence, San Francisco: Morgan Kaufmann Publishers, 2001.
Center,2001. [47] J. Kennedy and W. Spears, "Matchina Alaorithms to Problems: An
[29] R. Eberhart and Y. Shi, "Particleswarm optimization: developments, experimentalTest of the Particle Swarm mel some Geaetic Alaorithms
applications and resources", hoc. of Congress on Evolutionary on the Multimodal Problem Generator", hoc. of IEEE I"teT'flDtional
Computation (CEC2001), Seoul,Korea,Piscataway, NJ: IEEEService Conference on Evolutionary Contplltation (lCEC'98), Anchorage,
Center,2001. Alaska, USA, May49, 1998.
[30] R. Ebedwt, P. Simpson, and R. Dobbins, Computational1nte/ligence [48] M. Lsvbjerg, T. Rasmussen, and T. Krink, "Hybrid Particle Swann
PC Tools, Boston: Academic PressProfessional, 1996. Optimizationwith Breeding and SubpopuIatiODS", Proce«Jlngs of the
[31] H.Y. Fan and Y. Shi, "Study of Vmax of the particle swann third Genetic and Evolutioruzry Computation Conj.,.,ce (GECCo
optimization algorithm", Proceedmgs of the Workshop on Particle 2001),2001.
SwarmOptimization, lnclianapolis, IN: Purdue School of Engineering [49] C. Mohan, and B. Alkazemi,"Discreteparticle swann optimization",
so
Proceedings of the Workshop on Particle Swarm Optimization, [58] Y. Shi and R. Eberhart, "Parameter selection in particle swarm
Indianapolis, IN: Purdue School of Engineering and Technology, optimization", In Evolutionary Programming YII: Proc. EP98, New
IUPUI,2oo1. York:SpringerVerlag, pp. 591600, 1998.
[SO] S. Nab, T. Genji, T. Yura, and Y. Fukuyama, "Practical Distribution [59] Y. Shi and R. Eberhart, "A modified particle swann optimizer",
StateEstimation Using Hybrid Particle SwarmOptimization", hoc. of Proceedings of the IEEE International Conference on Evolutionary
IEEE Power Engineering Society Winter Meeting, Columbus, Ohio, Computation (CEC'98), pp.6973.Piscataway, NJ: IEEE Press, 1998.
USA,2001. [60] Y. Shi and R. Eberhart, "~irical study of particle swann
[51] K.. Naraand Y. Mishima,"ParticleSwann Optimisation for Fault State optimization", Proceedings of the /999 Congress on. Evolutionary
PowerSupplyReliability Enhancement",hoc. ofIEEE InternatiotIQ/ Computation (CEC'99), pp.I94S1950, Piscataway, NJ: IEEE Service
Conference on Intelligent Systems Applications to Power Systems Center, 1999.
(lSAP2001), Budapest, J1Dle 2001. [61] Y. Shi and R. Eberhart, "Experimental study of particle swann
[52] E. Ozcan and C. Mohan, "Analysis of a Simple Particle Swann optimization", hoc. ofSCI2000 Conference, Orlando,FL, 2000.
Optimization System", Intelligent Engineering Systems Through [62] Y. Shi and R. Eberhart, "Fuzzy Adaptive Particle Swann
ArtificioJ NeuralNetworks, Vol.8,pp. 253258, 1998. Optimization", Proc. of Congress on Evoluti01Ul1'Y Computation
[53] E. OzcanandC. Mohan, C. K, "Particle Swann Optimization: Surfing (CEC2001), Seoul, Korea. Piscataway, NJ: IEEE Service Center,
the Waves", hoc. of 1999 Congress on Evolutionary Computation zon,
(CEC'99), Washington, DC, USA,July 69, 1999. [63] Y. Shi and R. Eberhart, "Particle Swann Optimization with Fuzzy
[54) K.. Parsopoulos, V. Plagianakos; G. Magoulas, and M. Vrahatis, Adaptive Inertia Weight", Proceedings of the Workshop on Particle
"Stretching techniquefor obtainingglobal minimizers through particle Swarm Optimization, Indianapolis, IN: PurdueSchoolof Engineering
swarm optimization", Proceedings ofthe Worlcshop on Particle Swarm and Technology, IUPUI,2001.
Optimization, Indianapolis, IN: Purdue School of Engineering and [64] P. Suganthan, "Particle swann optimiser with neighbourhood
Technology, IUPUI,2001. operator", Proceedings of the 1999 Congress on Evolutionary
[55] T. Ray and K. M. Liew, "A Swarm with an Effective Infonnation Computation (CEC'99j, pp.l9581962. Piscataway,NJ: IEEE Service
Sharing Mechanism for Unconstrained and Constrained Single Center, 1999.
Objective Optimization Problems", Proc. of the 2001 Congress on [65] V. Tandoo,"Closingthe gap between CAD/CAM and optimized CNC
Evolutio1lJl1Y Computation (CEC2001), Seoul Korea,2001. end milling", Masters th~is, Purdue School of Engineering and
[56] J. Salerno, " Using the Particle Swann Optimization Technique to Technology, IndianaUniversityPurdueUniversity IndiaDapolis, 2000.
Train a Recum:nt Neural Model", hoc. of 9th IntematiotIQ/ [66] H. Yoshida, K. Kawata, Y. Fukuyama, and Y. Nakanishi, "A particle
Conference on Tools with Artificial Intelligence (ICTIJ'97), 1997. swannoptimization for reactive powerand voltagecontrolconsidering
[57] B. Secrest and G. Lamont, "Communication in particle swarm voltagestability", In G. L. Torres andA. P. Alves ciaSilva, Eels., hoc.
optimization illustrated by the traveling salesman problem", of Inti. Conf. on Intelligent System Applicalion to Power Systems
Proceedings of the Workshop on Particle Swarm Optimization, (lSAl"99), Rio de Janeiro, Brazil, pp.l17121, 1999.
Indianapolis, IN: Purdue School of Engineering and Technology,
IUPUl,2001.
51
Chapter 6
[Figure: lattice configurations illustrating crystal defects. (b) Vacancy defect.]
   P{accept Sj} = exp( -(Ej - Ei) / (kB T) )
where T is the temperature of the solid and kB is the Boltzmann constant. This acceptance rule is also known as the Metropolis criterion, and the algorithm summarized above is the Metropolis algorithm [20]. The temperature is assumed to have a rate of variation (cooling schedule) such that thermodynamic equilibrium is reached for the current temperature level before moving to the next level. This normally requires a large number of state transitions of the Metropolis algorithm. The thermal equilibrium condition is such that the probability that the solid is in state Si, with energy Ei, is given by the Boltzmann distribution, that is,

   PT{X = Si} = (1 / Z(T)) exp(-Ei / (kB T))

where X is a stochastic variable corresponding to the current state of the solid, and

   Z(T) = sum_j exp(-Ej / (kB T))

is a normalization factor (the partition function); kB is the Boltzmann constant; and exp(-Ei / (kB T)) is known as the Boltzmann factor.

2.2 Simulated Annealing Algorithm

For a combinatorial optimization problem to be solved by simulated annealing it is formulated as follows: let G be a finite, although perhaps very large, set of configurations and v the cost associated with each configuration of G. The solution to the combinatorial problem consists in searching the space of configurations for the pair (G, v) presenting the lowest cost. The SA algorithm starts with an initial configuration G0 and an initial "temperature" T = T0, and generates a sequence of N = N0 configurations. Then the temperature is decreased, the new number of steps to be performed at the new temperature level is determined, and the process is repeated. A candidate configuration is accepted if its cost is less than that of the current configuration. If the cost of the candidate configuration is bigger than the cost of the current configuration, it still can be accepted with a certain probability. This ability to perform uphill moves is what allows the process to escape from local optima.

Simulated Annealing;
Begin
   Initialize (T0, N0);
   k := 0;
   Initial configuration Si;
   Repeat
      do L := 1 to Nk
         generate (Sj from Si);
         if f(Sj) <= f(Si) do Si := Sj;
         otherwise
            if exp((f(Si) - f(Sj))/Tk) > random[0,1] do Si := Sj;
      end do;
      k := k + 1;
      Calculation of the length (Nk);
      Determine control parameter (Tk);
   Until stopping criterion
End;

Fig. 4. The algorithm Annealing (Aarts & Korst [1]).

Figure 4 summarizes the simulated annealing algorithm, which consists of two basic mechanisms: the generation of alternatives and an acceptance rule. Tk is the control parameter that corresponds to the temperature in physical annealing, and Nk is the number of alternatives generated at the k-th temperature level (this corresponds to the time the system stays at a given temperature level and should be big enough to allow the system to reach a state corresponding to "thermal equilibrium"). Initially, when T is large, larger deteriorations in the cost function are allowed; as the temperature decreases, the algorithm becomes greedier and only smaller deteriorations are accepted; finally, when T tends to zero, no deteriorations are accepted.

From the current state Si, with cost f(Si), a neighbor solution Sj, with cost f(Sj), is generated by the transition mechanism. The following probability is calculated in performing the acceptance test:
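As an illustration, the loop of Fig. 4 can be sketched in Python. This is a minimal sketch: the quadratic objective, the +/-1 neighbor move, and the schedule parameters below are illustrative assumptions, not taken from the text.

```python
import math
import random

def simulated_annealing(f, neighbor, s0, t0=10.0, n_k=50, beta=0.95, k_max=200):
    """Skeleton of the algorithm in Fig. 4: a Metropolis inner loop of N_k
    transitions at each temperature level, followed by a temperature
    reduction (here a simple geometric cooling schedule)."""
    s, t = s0, t0
    best = s0
    for k in range(k_max):
        for _ in range(n_k):
            s_new = neighbor(s)
            delta = f(s_new) - f(s)
            # Metropolis criterion: always accept improvements, accept
            # deteriorations with probability exp(-delta / T_k).
            if delta <= 0 or math.exp(-delta / t) > random.random():
                s = s_new
                if f(s) < f(best):
                    best = s
        t *= beta  # temperature reduction (cooling schedule)
    return best

# Toy usage: minimize (x - 3)^2 over the integers with +/-1 moves.
random.seed(0)
result = simulated_annealing(
    f=lambda x: (x - 3) ** 2,
    neighbor=lambda x: x + random.choice([-1, 1]),
    s0=50,
)
print(result)  # best solution found (the minimum is at x = 3)
```

Note how uphill moves are accepted freely while T is large and almost never once T has cooled, exactly the greediness progression described above.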
3. COOLING SCHEDULE

(2)

where it is assumed that φ% of the uphill moves are accepted at the initial temperature level T0.

Remarks: The result expressed in Eq. (2) can be verified as follows: consider a configuration with cost f(xc) that is μ% worse than the cost f(x0) of the initial configuration, that is:

(4)

where N0 is the number of moves at the initial temperature level and ρ is a user-supplied parameter.

Remarks: In [16] both versions above have been implemented and compared; although the second alternative is more demanding in terms of computational effort, it normally leads to results that are better than the ones obtained with the first approach.
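Equation (2) itself did not survive reproduction here, but a common recipe consistent with the surrounding text (choose T0 so that a fraction φ of uphill moves is accepted under the Metropolis rule) can be sketched as follows; the sampled deterioration values are hypothetical:

```python
import math

def initial_temperature(uphill_deltas, phi=0.8):
    """Pick T0 so that a fraction phi of sampled uphill moves would be
    accepted at the start of the run.  Using the average uphill
    deterioration delta_bar, solving exp(-delta_bar / T0) = phi gives
    T0 = -delta_bar / ln(phi)."""
    delta_bar = sum(uphill_deltas) / len(uphill_deltas)
    return -delta_bar / math.log(phi)

# Example: average uphill deterioration of 5 cost units, 80% acceptance.
t0 = initial_temperature([4.0, 5.0, 6.0], phi=0.8)
print(round(t0, 2))  # about 22.4
```

The deltas would in practice be collected from a short random walk over the configuration space before the annealing proper starts.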
3.3 Determination of Cooling Rate

There are a number of ways to carry out the temperature reduction in simulated annealing. All methods, however, are based on the fact that thermal equilibrium should be reached before the temperature is reduced. Three alternatives for calculating Tk+1 from the current temperature Tk are summarized below:

   Tk+1 = β Tk                                     (5)

• Variable cooling rate: δ ∈ [0.01; 0.20]

   Tk+1 = Tk / [ 1 + ln(1 + δ) · Tk / (3 σTk) ]    (6)

where σTk is the standard deviation of the cost values observed at temperature level Tk.

[Fig. 5. Flow chart for the Simulated Annealing algorithm: define initial temperature T0; define other control parameters ρ, β, μ, W0, Cw; set Nk = N0 = μ·Nt; apply the transition mechanism, repeated Nk times.]

Remarks: The performance of the various cooling schedules is highly problem dependent. For the network transmission expansion planning problem, for example, methods (6) and (7) present nearly the same performance as the method of Eq. (5) as far as the quality of the solutions is concerned, although the number of iterations to convergence is normally much higher for the methods of Eqs. (6) and (7). In [10, 16] the method of Eq. (5), with proper calibration, has been applied with success.

3.4 Stopping Criterion

Stopping criteria vary a lot in degree of complexity and sophistication. Both predefined and adaptive criteria have been suggested in the literature. Some of the most common strategies are summarized below:

• Define a constant number of temperature reductions, normally between 6 and 50.

• Use the rate of improvement of the cost function to define the stopping criterion; hence, if the incumbent solution (the cost of the best solution found so far) does not improve after a series of temperature reductions, it is assumed that convergence has been achieved and the process is stopped.

• Define the number of uphill moves that should be accepted. The process stops whenever the number of acceptances becomes less than the specified value.

In problems such as the network transmission expansion planning, and in a number of other applications of SA to power networks, auxiliary problems have to be solved a number of times to verify solution feasibility. (Notice that this is not the case with several important operations research problems such as the traveling salesman problem.) In the case of the network expansion problem, the solution of linear programs representing the entire network is required. For these situations more elaborate stopping criteria may be necessary in order to reduce the overall computational effort. In [10, 16] the following criteria have been tested with success:

• Stop the process if the number of LP solutions exceeds a specified limit; or,

• Stop the process if the incumbent solution does not improve after a specified number of iterations.
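The geometric schedule of Eq. (5), the adaptive schedule of Eq. (6), and a simple predefined stopping rule can be sketched together as follows (parameter values are illustrative):

```python
import math

def geometric(t, beta=0.9):
    """Constant-rate schedule of Eq. (5): T_{k+1} = beta * T_k."""
    return beta * t

def variable(t, delta, sigma):
    """Adaptive schedule of Eq. (6): the reduction step shrinks when the
    spread sigma of the costs observed at level k is small, so that
    equilibrium is not disturbed too abruptly."""
    return t / (1.0 + math.log(1.0 + delta) * t / (3.0 * sigma))

def temperatures(t0=100.0, beta=0.9, t_min=1.0):
    """Generate T0, T1, ... under Eq. (5) until T falls below t_min,
    a predefined stopping rule of the kind listed in Section 3.4."""
    t = t0
    while t >= t_min:
        yield t
        t = geometric(t, beta)

levels = list(temperatures())
print(len(levels))  # number of temperature levels visited before stopping
```

With t0 = 100, beta = 0.9 and t_min = 1, the run visits 44 temperature levels; an adaptive criterion would instead watch the incumbent cost rather than the temperature itself.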
4. AN EXAMPLE OF SIMULATED ANNEALING ALGORITHM
[3] Chams M., Hertz A., de Werra D.: "Some Experiments with Simulated Annealing for Coloring Graphs", European Journal of Operational Research, 32, pp. 260-266, 1987.
[4] Connolly D.T.: "An Improved Annealing Scheme for the QAP", European Journal of Operational Research, 46, pp. 93-100, 1990.
[5] Eglese R.W.: "Simulated Annealing: A Tool for Operational Research", European Journal of Operational Research, 46, pp. 271-281, 1990.
[6] Hajek B.: "Cooling Schedules for Optimal Annealing", Mathematics of Operations Research, 13, pp. 311-329, 1988.
[7] Kirkpatrick S., Gelatt Jr. C.D., Vecchi M.P.: "Optimization by Simulated Annealing", Science, 220(4598), pp. 671-680, 1983.
[8] Diaz A., Glover F., Ghaziri H.M., Gonzalez J.L., Laguna M., Moscato P., Tseng F.T.: "Optimización Heurística y Redes Neuronales", Editorial Paraninfo, Madrid, 1996.
[9] Drexl A.: "A Simulated Annealing Approach to the Multiconstraint Zero-One Knapsack Problem", Computing, 40, pp. 1-8, 1988.
[10] Gallego R.A., Alves A.B., Monticelli A., Romero R.: "Parallel Simulated Annealing Applied to Long Term Transmission Network Expansion Planning", IEEE Transactions on Power Systems, Vol. 12, No. 1, pp. 181-188, February 1997.
[11] Gallego R.A.: "Planejamento a Longo Prazo de Sistemas de Transmissão Usando Técnicas de Otimização Combinatorial", Tese de Doutorado, UNICAMP, 1997.
[12] Gallego R.A., Monticelli A., Romero R.: "Comparative Studies of Non-Convex Optimization Methods for Transmission Network Expansion Planning", IEEE Transactions on Power Systems, Vol. 13, No. 2, May 1998.
[13] Gallego R.A., Monticelli A., Romero R.: "Transmission System Expansion Planning by Extended Genetic Algorithm", IEE Proceedings - Generation, Transmission and Distribution, 145(3), pp. 329-335, May 1998.
[15] Garver L.L.: "Transmission Network Estimation Using Linear Programming", IEEE Trans. Power App. Syst., Vol. PAS-89, pp. 1688-1697, September/October 1970.
[16] Romero R., Gallego R.A., Monticelli A.: "Transmission System Expansion Planning by Simulated Annealing", IEEE Transactions on Power Systems, Vol. 11, No. 1, pp. 364-369, February 1996.
[17] Wong K.P., Wong Y.W.: "Combined Genetic Algorithm/Simulated Annealing/Fuzzy Set Approach to Short-Term Generation Scheduling with Take-or-Pay Fuel Contract", IEEE Transactions on Power Systems, Vol. 11, No. 1, February 1996.
[18] Haffner S., Monticelli A., Garcia A., Mantovani J., Romero R.: "Branch and Bound Algorithm for Transmission System Expansion Planning Using a Transportation Model", IEE Proceedings - Generation, Transmission and Distribution, Vol. 147(3), pp. 149-156, May 2000.
[19] Reeves C.R.: "Modern Heuristic Techniques for Combinatorial Problems", John Wiley & Sons, 1993.
[20] Sait S.M., Youssef H.: "Iterative Computer Algorithms with Applications in Engineering", IEEE Computer Society, 1999.
[21] Van Laarhoven P.J.M., Aarts E.H.: "Simulated Annealing: Theory and Applications", D. Reidel Publishing Company, Holland, 1987.
[22] Aarts E.H.L., Korst J.H.M., Van Laarhoven P.J.M.: "A Quantitative Analysis of the Simulated Annealing Algorithm: A Case Study for the Traveling Salesman Problem", Journal of Statistical Physics, Vol. 50(1-2), pp. 187-206, January 1988.
[23] Lee J.Y., Choi M.Y.: "Optimization by Multicanonical Annealing and the Traveling Salesman Problem", Physical Review E, Vol. 50(2), pp. 651-654, August 1994.
[24] Ram D.J., Sreenivas T.H., Subramaniam K.G.: "Parallel Simulated Annealing Algorithms", Journal of Parallel and Distributed Computing, Vol. 37(2), pp. 207-212, September 1996.
[25] Wilhelm M.R., Ward T.L.: "Solving Quadratic Assignment Problems by Simulated Annealing", IIE Transactions, Vol. 19(1), pp. 107-119, March 1987.
[26] Yip P.P.C., Pao Y.H.: "A Guided Evolutionary Simulated Annealing Approach to the Quadratic Assignment Problem", IEEE Transactions on Systems, Man and Cybernetics, Vol. 24(9), pp. 1383-1387, September 1994.
[27] Laursen P.S.: "Simulated Annealing for the QAP: Optimal Tradeoff Between Simulation Time and Solution Quality", European Journal of Operational Research, Vol. 69(2), pp. 238-243, September 1993.
[28] Osman I.H.: "Heuristics for the Generalized Assignment Problem: Simulated Annealing and Tabu Search Approaches", OR Spektrum, Vol. 17(4), pp. 211-225, October 1995.
[29] Chiang W.C., Russell R.A.: "Simulated Annealing Metaheuristics for the Vehicle Routing Problem with Time Windows", Annals of Operations Research, Vol. 63, pp. 3-27, 1996.
[30] Alfa A.S., Heragu S.S., Chen M.Y.: "A 3-OPT Based Simulated Annealing Algorithm for Vehicle Routing Problems", Computers & Industrial Engineering, Vol. 21(1-4), pp. 635-639, 1991.
[31] Nolte A., Schrader R.: "Simulated Annealing and Graph Coloring", Combinatorics, Probability & Computing, Vol. 10(1), pp. 29-40, January 2001.
[32] Koulamas C., Antony S.R., Jaen R.: "A Survey of Simulated Annealing Applications to Operations Research Problems", Omega: International Journal of Management Science, Vol. 22(1), pp. 41-56, January 1994.
[33] Mantawy A.H., Abdel-Magid Y.L., Selim S.Z.: "Integrating Genetic Algorithms, Tabu Search and Simulated Annealing for the Unit Commitment Problem", IEEE Transactions on Power Systems, 14(3), pp. 829-836, August 1999.
[34] Mantawy A.H., Abdel-Magid Y.L., Selim S.Z.: "A Simulated Annealing Algorithm for Unit Commitment", IEEE Transactions on Power Systems, 13(1), pp. 197-204, February 1998.
[37] Wong S.Y.W.: "An Enhanced Simulated Annealing Approach to Unit Commitment", International Journal of Electrical Power & Energy Systems, 20(5), pp. 359-368, June 1998.
[38] Chiang H.D., Wang J.C., Cockings O., Shin H.D.: "Optimal Capacitor Placement in Distribution Systems Part I: A New Formulation and the Overall Problem", IEEE Transactions on Power Delivery, Vol. 5(2), pp. 634-642, April 1990.
[39] Chiang H.D., Wang J.C., Cockings O., Shin H.D.: "Optimal Capacitor Placement in Distribution Systems Part II: Solution Algorithms and Numerical Results", IEEE Transactions on Power Delivery, Vol. 5(2), pp. 643-649, April 1990.
[40] Chiang H.D., Jean-Jumeau R.: "Optimal Network Reconfigurations in Distribution Systems, Part I: Formulation and a Solution Methodology", IEEE Transactions on Power Delivery, 5(4), pp. 1902-1909, October 1990.
[41] Chiang H.D., Jean-Jumeau R.: "Optimal Network Reconfigurations in Distribution Systems, Part II: Solution Algorithms and Numerical Results", IEEE Transactions on Power Delivery, 5(3), pp. 1568-1574, July 1990.
[42] Zhu J.X., Bilbro G., Chow M.Y.: "Phase Balancing Using Simulated Annealing", IEEE Transactions on Power Systems, 14(4), pp. 1508-1513, November 1999.
[43] Liu C.W., Jwo W.S., Liu C.C., Hsiao Y.T.: "A Fast Global Optimization Approach to VAR Planning for the Large Scale Electric Power Systems", IEEE Transactions on Power Systems, 12(1), pp. 437-442, February 1997.
[44] Hsiao Y.T., Chiang H.D.: "Applying Network Window Schema and a Simulated Annealing Technique to Optimal VAR Planning in Large Scale Power Systems", International Journal of Electrical Power & Energy Systems, 22(1), pp. 1-8, January 2000.
Abstract: TS is an alternative approach to the solution of combinatorial problems. TS basically consists of a metaheuristic procedure used to manage heuristic algorithms that perform local search. Metaheuristics are strategies that allow the exploration of the search space by providing means of avoiding getting entrapped in local optimal solutions. As happens with other combinatorial approaches, TS carries out a number of transitions in the search space aiming to find the optimal solution or a range of near-optimal solutions. The name Tabu is related to the fact that, in order to avoid revisiting certain areas of the search space that have already been searched, the algorithm turns these areas tabu (or forbidden), which means that for a certain period of time (the tabu tenure) the search will not consider the examination of alternatives containing features that characterize the solution points belonging to the area declared tabu.

Index Terms: tabu search, optimization methods, combinatorial optimization.

1. INTRODUCTION

Tabu search was developed from concepts originally used in artificial intelligence. Unlike other combinatorial approaches such as genetic algorithms and simulated annealing, its origin is not related to biological or physical optimization processes [6]. TS was originally proposed by Fred Glover in the early 80's and has ever since been applied with success to a number of complex problems in science and engineering. The set of applications to electric power network problems is already significant and growing. These include, for example, the long-term transmission network expansion problem and distribution planning problems such as the optimal capacitor placement in primary feeders.

1.1 Overview of the Tabu Search Approach

At each transition the algorithm moves to the best configuration in the current neighborhood, i.e., in a minimization problem, the algorithm switches to the configuration presenting the smallest cost. Normally only the most attractive neighbors are evaluated; otherwise the problem could become intractable. Unlike gradient-type algorithms used for local search, the neighborhood in tabu search is updated dynamically. Another difference is that transitions to configurations with higher cost are allowed (this gives the method the ability to move out of local minimum points). An essential feature of tabu search algorithms is the direct exclusion of search alternatives temporarily classed as forbidden (tabu). As a consequence, the use of memory becomes crucial in these algorithms: one has to keep track of the tabus.

Other mechanisms of tabu search are intensification and diversification: by the intensification mechanism the algorithm performs a more comprehensive exploration of attractive regions, which may lead to a local optimal point; by the diversification mechanism, on the other hand, the search is moved to previously unvisited regions, something that is important in order to avoid local minimum points.

Tabu search consists of a set of principles (or functions) applied in an integrated way to solve complex problems in an intelligent manner. According to Glover [2]:

Tabu search is based on the premise that problem solving, in order to qualify as intelligent, must incorporate adaptive memory and responsive exploration. The use of adaptive memory contrasts with "memoryless" designs, such as those inspired by metaphors of physics and biology, and with "rigid memory" designs, such as those exemplified by branch and bound and its AI-related cousins. The emphasis on responsive exploration (and hence purpose) in tabu search, whether in a deterministic or probabilistic implementation, derives from the supposition that a bad strategic choice can yield more information than a good random choice.

The principal features (or functions) of Tabu Search are summarized in [2] as follows:

• Strategically imposed restraints and inducements (or, tabu conditions and aspiration levels)
• Concentrated focus on good regions and good solution features (intensification process)
• Characterizing and exploring promising new regions (diversification process)
• Nonmonotonic search patterns (strategic oscillations)
• Integrating and extending solutions (path relinking)

[Fig. 1: Illustration of a transition in tabu search, with a modified neighborhood N*(x).]

Different tabu search algorithms are formed by combining these functions to solve specific problems. Of course, the way the actual implementation is made depends on problem characteristics and on the degree of sophistication needed in a particular application. Although the set of functions listed above can be expanded and/or modified, it is worth noting that the approach was originally proposed, and tested successfully on a number of problems, with only a reduced set of such functions (tabu search with short-term memory, with tabu lists and aspiration criteria).

1.2 Problem Formulation

Generally speaking, TS algorithms solve problems formulated as follows:

   Min f(x)                     (1)
   subject to x ∈ X

where x is a configuration (a vector of decision variables), f(.) is the objective function and X is the search space. Notice also that no assumptions are made regarding the convexity of f(x) and X or about the differentiability of f(x).

A variety of combinatorial optimization problems can be represented as the minimization of an objective function subject to a set of algebraic constraints, as above. This is the case, for example, of the radiality constraints in certain distribution system operation and planning problems. This type of constraint, which may be a nuisance for certain mathematical approaches, is easily handled by TS since normally TS does not work directly with the algebraic constraints; configurations are represented by codification instead.

Tabu Search solves Problem (1) by first applying a local heuristic search in which, given a configuration x (a solution), the neighborhood of x is defined as the set of all configurations x' ∈ N(x) that can be obtained by the same transition mechanism applied to x. The conditions required for x' to be a neighbor of x define the structure of the neighborhood of x. The local search algorithm finds the transition which leads to the configuration x' presenting the largest decrement in the objective function (in the same way as a steepest gradient algorithm). The repetition of this procedure eventually leads to a local optimal solution.

Tabu Search differs from the simple local search algorithm above in at least two essential aspects:

1. Transitions leading to configurations for which the objective function is actually greater than it is for the current solution are allowed (we are considering a minimization problem such as Problem (1)).

2. The neighborhood of x, i.e., N(x), is not static, that is, it can change both in size as well as in structure. A modified neighborhood N*(x) is shown in Figure 1. The elements of N*(x) are determined in different ways, for example:

   • Using a tabu list which contains attributes of configurations that are forbidden. In this case N*(x) ⊂ N(x); as noted above this is useful to avoid cycling.
   • Using strategies to reduce the size of the neighborhood in order to speed up the local search. As in the previous case, the reduced neighborhood is such that N*(x) ⊂ N(x).
   • Using the so-called elite configurations to perform path relinking. In this case it is not necessarily true that N*(x) ⊂ N(x).
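A minimal short-term-memory tabu search of the kind described above can be sketched as follows. The permutation encoding, the swap moves, and the displacement objective are illustrative assumptions; the aspiration criterion implemented here is only the incumbent-based variant:

```python
import itertools

def tabu_search(f, x0, tenure=5, iters=50):
    """Basic short-term-memory tabu search on a permutation encoding.
    Moves are pairwise swaps; a move's attribute (i, j) stays tabu for
    `tenure` iterations, and the aspiration criterion overrides the tabu
    status whenever the move would improve on the incumbent."""
    x, incumbent = list(x0), list(x0)
    tabu = {}  # move attribute (i, j) -> iteration at which the tabu expires
    for it in range(iters):
        best_move, best_val = None, None
        for i, j in itertools.combinations(range(len(x)), 2):
            y = list(x)
            y[i], y[j] = y[j], y[i]
            v = f(y)
            tabu_active = tabu.get((i, j), -1) > it
            aspired = v < f(incumbent)          # aspiration criterion
            if tabu_active and not aspired:
                continue                        # move is forbidden
            if best_val is None or v < best_val:
                best_move, best_val = (i, j), v
        if best_move is None:                   # every move tabu, none aspired
            break
        i, j = best_move
        x[i], x[j] = x[j], x[i]                 # perform the best transition
        tabu[(i, j)] = it + tenure              # forbid reversing this swap
        if best_val < f(incumbent):
            incumbent = list(x)
    return incumbent

# Toy usage: sort a permutation by minimizing displacement from identity.
target = lambda p: sum(abs(v - k) for k, v in enumerate(p))
print(tabu_search(target, [3, 1, 0, 2]))
```

Note that the best move is always taken even when it worsens the cost, which is exactly the uphill capability that distinguishes TS from plain local search.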
[Fig. 2: Illustration of the search space in Tabu Search; the figure marks attractive regions and distinguishes feasible from unfeasible configurations.]

[Fig. 3: Illustration of a neighborhood of a given configuration x.]
[Fig. 4: Example of a feasible-to-unfeasible transition. Legend: feasible configuration; unfeasible configuration; attractive unfeasible configuration; attractive feasible configuration.]

where queen 1 is in row 1 and column 4, queen 2 is in row 2 and column 5, etc. With this codification no two queens are placed in the same row or in the same column, and so part of the problem is already solved. The remaining problem can then be formulated as the minimization of diagonal collisions.

Next we have to establish the objective function for the n-queens problem. In order to do that, the concepts of positive and negative diagonals are introduced, as illustrated in Figures 7(a) and 7(b). In order to find the number of collisions of a configuration it is then necessary to go through the positive and negative diagonals and check for collisions. This task is made easier by noticing that the positive diagonals are characterized by the fact that the difference i - j is constant, whereas for the negative diagonals i + j is constant, as illustrated in Figures 8(a) and 8(b). For example, for the configuration of Figure 6(a) collisions occur in positive diagonals 5, 2 and 3, and in negative diagonal 7. Thus it can be seen that the evaluation of the objective function for this type of codification can be easily performed.

Notice that although the n-queens problem could have been mathematically formulated as a 0-1 problem, in tabu search such mathematical modeling is in fact unnecessary, as illustrated by the discussion above.
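The diagonal bookkeeping described above (constant i - j along positive diagonals, constant i + j along negative ones) translates directly into a collision counter. The boards below are illustrative examples, not the book's Figure 6(a):

```python
from collections import Counter

def diagonal_collisions(cols):
    """Diagonal collisions of an n-queens configuration given as a
    permutation: the queen in row i (1-based) sits in column cols[i-1].
    Two queens share a positive diagonal iff i - j is equal, and a
    negative diagonal iff i + j is equal; k queens on one diagonal
    contribute k - 1 collisions."""
    pos = Counter(i - j for i, j in enumerate(cols, start=1))
    neg = Counter(i + j for i, j in enumerate(cols, start=1))
    return sum(c - 1 for c in pos.values() if c > 1) + \
           sum(c - 1 for c in neg.values() if c > 1)

# A permutation encoding already rules out row and column collisions,
# so the objective reduces to the diagonal count alone.
print(diagonal_collisions([1, 2, 3, 4, 5, 6, 7]))  # all on one diagonal: 6
print(diagonal_collisions([2, 4, 6, 1, 3, 5, 7]))  # a collision-free board: 0
```

Each evaluation is O(n), which is why the text calls this codification easy to evaluate.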
Fig. 8: Characterization of diagonals. (a) Positive diagonals. (b) Negative diagonals.

2. FUNCTIONS AND STRATEGIES IN TABU SEARCH

A tabu search strategy is an algorithm which normally forms part of a more general tabu search procedure.
Table 1: Neighbor configurations for the configuration of Figure 6.

No  Swap  v  Δv   No  Swap  v  Δv   No  Swap  v  Δv
 1   17   2   2    8   35   3   1   15   14   5   1
 2   24   2   2    9   36   3   1   16   23   5   1
 3   26   2   2   10   47   3   1   17   37   5   1
 4   56   2   2   11   67   3   1   18   46   5   1
 5   15   3   1   12   25   4   0   19   57   5   1
 6   16   3   1   13   12   5   1   20   45   6   2
 7   27   3   1   14   13   5   1   21   34   7   3

(Swap ij denotes exchanging the positions of queens i and j; v is the resulting number of collisions and Δv the magnitude of the change with respect to the cost of the current configuration.)
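The construction of Table 1 (evaluate every pairwise swap and rank the neighbors by the resulting cost v) can be sketched as follows; the starting configuration below is hypothetical, so the numbers do not reproduce Table 1:

```python
from itertools import combinations
from collections import Counter

def collisions(cols):
    """Diagonal collision count for a permutation-encoded n-queens board."""
    pos = Counter(i - j for i, j in enumerate(cols, start=1))
    neg = Counter(i + j for i, j in enumerate(cols, start=1))
    return sum(c - 1 for c in pos.values() if c > 1) + \
           sum(c - 1 for c in neg.values() if c > 1)

def swap_neighbors(cols):
    """Evaluate all C(n, 2) pairwise swaps, as in Table 1: each entry is
    ((queens swapped), resulting cost v), sorted best first."""
    rows = []
    for a, b in combinations(range(len(cols)), 2):
        y = list(cols)
        y[a], y[b] = y[b], y[a]
        rows.append(((a + 1, b + 1), collisions(y)))
    return sorted(rows, key=lambda r: r[1])

# Hypothetical 7-queens configuration (not the book's Figure 6(a)).
table = swap_neighbors([1, 3, 5, 7, 2, 4, 6])
print(len(table))  # 21 neighbors, as in Table 1
print(table[0])    # the best move and its cost
```

The first entry of the sorted list plays the role of the "best neighbor" that the tabu search selects at each transition.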
Recency-based memory is an important feature of tabu search: it is a type of short term memory which keeps track of solution attributes that have changed during the most recent moves made by the algorithm. The information contained in this memory allows labeling as tabu-active selected attributes of recently visited solutions; this feature avoids revisiting solutions already visited in the recent past. A number of practical applications reported in the literature are direct implementations of this basic tabu search algorithm.

This is the most basic type of tabu search algorithm and is based on a list of forbidden attributes and an aspiration criterion. The main objectives of the tabu list are (a) to avoid cycling, i.e., revisiting already visited solutions, and (b) to reduce the size of neighborhoods by excluding from consideration configurations labeled as tabu.

The main disadvantage of the use of a tabu list is that a forbidden attribute may be part of an attractive solution of a neighborhood that has not been visited so far. To cope with this problem an aspiration criterion is used such that if the cost associated with a tabu configuration is smaller than the costs of the last kp transitions, or is inferior to the cost of the incumbent solution, then the tabu constraint is relaxed and the transition is allowed.

Figure 9 illustrates the working of a short term memory tabu search algorithm of the type described above. Four different processes, or paths, are shown in the figure: paths 1-5-6-7-15 and 2-8-9-15 lead to the optimal solution; path 3-9-10-14 is entrapped in a local optimal solution; and path 4-11-12-13 produces cycling. Even when tabu restraints are enforced, cycling may occur if the number of moves k during which a tabu is active is relatively small. Excessively large values of k, on the other hand, can turn the search inefficient, since the visit to certain attractive configurations may be delayed. An alternative path not shown in Figure 9 may have an optimal solution as an intermediate point. In this case the solution process passes through the optimal solution and continues until it stops at a non-optimal solution. This does not represent a serious problem since the optimal solution is kept in memory as the incumbent solution. (Of course, rather than a single incumbent, one can keep track of a list of the best solutions found during the solution process.)

Fig. 9: Example of a search using Tabu Search.

Example 9:

This example illustrates the use of attributes to implement tabu constraints. Consider again the n-queens problem with n = 7. Assume that the current solution is the configuration shown in Figure 6(a), codified as a vector P1 that assigns to each queen (row) its column position.

The 21 neighbors of this configuration are summarized in Table 1. The best neighbor corresponds to the move that swaps queens 1 and 7. This move becomes tabu and can be stored as illustrated in Figure 10, which indicates that the swap 1-7 will stay forbidden for the next 5 moves. The same arrangement can be used for other possible moves. The arrangement is updated after every move: for example, when the next move is performed, the number 5 in position (1,7) is decremented to 4 to take into account that the corresponding tabu tenure has decreased. Alternatively, rather than storing the tabu tenure for each tabu constraint, one can store the iteration where the tabu is activated: for example, if the tabu is activated in iteration 237, this number is entered in position (1,7) of the storage arrangement of Figure 10.

Fig. 10: Storage of attributes for the n-queens problem (a 7 x 7 upper-triangular arrangement; the entry 5 in position (1,7) is the remaining tenure of the tabu swap 1-7).

A possible algorithm for the n-queens problem is as follows:

1. Define the tabu tenure and the aspiration criterion.
2. Define the neighborhood structure.
3. Find the initial topology.
4. Compute and order the objective function for the entire neighborhood.
5. Move to the best neighbor if it is not tabu.
6. Move to the best neighbor if it is tabu but satisfies the aspiration criterion.
7. Update the tabu list.
8. Repeat steps 3 to 7 until a topology with zero cost is found.

The n-queens problem can be solved considering [6]: (1) a tabu tenure of three iterations, (2) an aspiration criterion that accepts a new (tabu) solution if its cost is lower than that of the incumbent solution, (3) a neighborhood formed by the topologies obtained by exchanging the positions of any two queens, and (4) the initial topology of Figure 6.

The application of this algorithm yields the following sequence of topologies leading to an optimal solution:

• Iteration 2; move = 2-4; v = 1; tabu = 1-7(2), 2-4(3)
• Iteration 3; move = 1-3; v = 1; tabu = 1-7(1), 2-4(2), 1-3(3)
• Iteration 4; move = 5-7; v = 2; tabu = 2-4(1), 1-3(2), 5-7(3)
• Iteration 5; move = 4-7; v = 1; tabu = 1-3(1), 5-7(2), 4-7(3)
• Iteration 6; move = 1-3; v = 0; tabu = 5-7(1), 4-7(2), 1-3(3)

Remarks:
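The eight steps above, with the settings of [6] (a swap neighborhood, a short tabu tenure, and an incumbent-based aspiration criterion), can be sketched as follows. This is an illustrative implementation, not code from the tutorial; it assumes the permutation encoding described in the example (board[i] = column of the queen in row i), so only diagonal collisions need counting:

```python
import itertools

def collisions(perm):
    # Attacking pairs; with a permutation encoding only diagonals can collide.
    return sum(1 for i, j in itertools.combinations(range(len(perm)), 2)
               if abs(perm[i] - perm[j]) == abs(i - j))

def swapped_cost(perm, move):
    i, j = move
    p = list(perm)
    p[i], p[j] = p[j], p[i]
    return collisions(p)

def tabu_search(start, tenure=3, max_iters=100):
    current = list(start)
    best, best_cost = list(start), collisions(start)
    tabu = {}  # move -> iteration at which it stops being tabu
    for it in range(1, max_iters + 1):
        if best_cost == 0:
            break
        # Step 4: compute and order the objective for the entire neighborhood.
        moves = sorted(itertools.combinations(range(len(current)), 2),
                       key=lambda m: swapped_cost(current, m))
        for move in moves:
            cost = swapped_cost(current, move)
            # Steps 5-6: take the best non-tabu move, or a tabu move that
            # beats the incumbent (aspiration criterion).
            if tabu.get(move, 0) <= it or cost < best_cost:
                i, j = move
                current[i], current[j] = current[j], current[i]
                tabu[move] = it + tenure  # step 7: update the tabu list
                if cost < best_cost:
                    best, best_cost = list(current), cost
                break
    return best, best_cost
```

For instance, the 4-queens configuration [0, 1, 2, 3] places all queens on one diagonal (6 collisions); from it the search reaches a collision-free board in a few iterations.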
• The optimal solution was found after 6 iterations (moves). In the first two iterations the objective function was reduced. In the third iteration the objective function remained constant.

• In the fourth move there was no neighbor configuration with quality better than that of the current solution, although there were two topologies, with attributes 1-3 and 1-7, leading to topologies with the same objectives. (These are forbidden; the attribute 1-3 would lead to a topology already visited, whereas the attribute 1-7 would lead to a new topology; since the aspiration criterion is not satisfied, the move is not performed.)

• In the fourth move the objective function is in fact increased.

• In the fifth iteration there was a decrease in the objective function.

• Finally, in the sixth iteration the attribute 1-3 is used, since the aspiration criterion is satisfied (the penalized objective is smaller than that of the incumbent).

2.2.1 Candidate list strategies

Once the neighborhood is defined it is evaluated. Since normally either the neighborhood is excessively large or the evaluation of each alternative is time consuming, a screening to limit the search to the most attractive neighbors has to be performed. In power system applications, for example, evaluation may imply the need to solve a linear program, a nonlinear program, or a power flow problem. Four strategies have been used in the literature to perform the screening of a neighborhood:

• Aspiration plus;
• Elite candidate list;
• Successive filter strategy; and
• Sequential fan candidate list.

The aspiration plus technique requires a minimum quality level for a configuration to be included in the neighborhood. This method requires that the potential neighbors be put in an ordered list and analyzed one by one until a specified threshold is passed. The process stops after a few more configurations are evaluated once the threshold is hit (the "plus" value). In order to avoid analyzing either a reduced or an excessive number of neighbors, upper and lower bounds have to be satisfied. There are at least two alternatives for defining the threshold: (1) the value of the objective function of the best configuration visited in the last k moves, and (2) the value of the objective function of the incumbent solution.

The elite candidate list technique starts with a master list formed by the np best elements of the neighborhood of the initial configuration. Then a series of moves is performed, considering that the set of top np neighbors remains the same. When either a configuration satisfying a given objective is found or a maximum number of moves is performed, a new master list is built and the process is repeated.

The successive filter strategy technique is normally used in cases in which the neighborhood structure is defined by swapping attributes. This is what happens, for example, in network transmission planning, where a neighbor is obtained by the addition of a circuit and the removal of another circuit (a swap of candidate circuits). A reduced neighborhood is then obtained by defining two short lists, one with elements that can be added and another with elements that can be removed from the current configuration.

The sequential fan candidate list technique is similar to the concept of population used in genetic algorithms. Given an initial configuration, the entire neighborhood is evaluated and a reduced list of the np best neighbors is formed. These configurations are then called current configurations (a population). Next, a reduced number of neighbors of each current configuration is evaluated and a successor is found for each current configuration. Whenever two different successors are the same configuration, an additional configuration is found to keep the size of the population constant (i.e., np configurations).

2.2.2 Tabu tenure

Tabu tenure is the number of tabu search iterations (i.e., number of moves or transitions) an attribute remains forbidden. In a typical application several tabu lists can be maintained simultaneously, each with a different tenure. Thus, not only single attributes but also combinations of attributes can be forbidden. Very seldom is a tabu specified by giving the complete information about the undesired configuration; normally only certain attributes of the configuration are put in the tabu list. The tabu tenure can be either static (predefined) or dynamic (determined on the fly). Dynamic tenure can be implemented in two different ways: (1) random dynamic tabu tenure or (2) systematic dynamic tabu tenure.

The random dynamic tabu tenure is implemented using bounds tmin and tmax; a tenure tmin <= t <= tmax is randomly chosen when a new tabu attribute is established. There are two alternatives for implementing this technique: in the first one the value t is kept for a certain time a*tmax, where a is a parameter; after that time a new t is determined, and so forth. The second alternative consists in determining a new t at each move, so that each tabu attribute will remain forbidden for a different period of time.
As with the random dynamic tabu tenure, the systematic dynamic tenure can be implemented using two different approaches. The first alternative is a simple variation of the random dynamic tabu tenure in which the random choice is replaced by a systematic choice: for example, assuming bounds tmin = 4 and tmax = 9, the systematic choice is made by cycling t through the sequence 4, 5, 6, 7, 8, 9 [6]. The second alternative is the so-called moving gap technique, where the tabu list is partitioned into two halves, one static and one dynamic. For example, in a scheme with eight iterations, all attributes remain in the tabu list for the first four iterations, whereas the last four iterations are variable, as described in the following. In the case of the right gap, the attributes initially remain in the tabu list for four iterations, then they are dropped from the list for two iterations, and come back to the list for two additional iterations. In the middle gap, the attributes initially remain in the tabu list for five iterations (four plus one), then they are dropped from the list for two iterations, and come back to the list for one additional iteration. And in the left gap, the attributes initially remain in the tabu list for six iterations (four plus two), then they are dropped from the list for the two remaining iterations.

2.2.3 Aspiration criteria

Typically a tabu test begins with the determination of a trial solution x' in the neighborhood of the current solution x. Then the attributes of x that are changed in the move from x to x' are identified. If among these attributes there are tabu-active attributes, the move still can be validated if x' satisfies an aspiration level, i.e., if x' is good enough to justify the relaxation of the tabu restraint.

2.3 The Use of Long Term Memory in Tabu Search

Although a significant part of the literature on tabu search deals with algorithms based exclusively on short term memory techniques, more complex, sophisticated applications require the use of long term memory. There are several different ways to use long term memory:

• Reinitialize the search from a high quality configuration;
• Redefine the neighborhood structure based on a high quality solution;
• Redefine the objective function to penalize certain attributes of a high quality solution;
• Change the search strategy based on knowledge acquired during previous searches.

In practice normally a combination of short term and long term memory is adopted: the short term memory is usually implemented as a subroutine of a more general long term algorithm. The long term algorithm is normally based on three techniques:

• Frequency based memory;
• Intensification;
• Diversification.

2.3.1 Frequency based memory

Frequency information is stored in order to be used to change future search strategies. Two principal types of frequency are used in practice: residence frequency and transition frequency.

In the residence frequency technique, the number of times an attribute occurs in a predefined set of configurations is kept in memory. This set of configurations can be the set of all configurations visited so far, a set of elite configurations, a set of low quality solutions, etc. For example, regarding the set of elite configurations, a high frequency of a given attribute may indicate that it is highly desirable; the opposite happens if we are dealing with a set of low quality solutions. In the first case (elite configurations), these attributes can be used both in intensification and in path relinking. On the other hand, if the set of configurations is diverse and the residence frequency of a certain attribute is high, it may indicate that this attribute is limiting the search space and should be penalized (become tabu).

In the transition frequency technique, the number of times an attribute occurs in transitions is stored. The frequent occurrence of these attributes does not necessarily indicate that they will form part of the optimal solution. The information regarding these attributes can be used to change the search strategy by means of diversification; these attributes are penalized or become tabu. In transmission expansion planning, for example, this occurs with low cost circuits which are frequently used as crack fillers, i.e., they are temporarily used in moving between two alternative configurations and then are removed from the current solution.

Example 5:

Figure 11 illustrates the use of memory in a tabu search algorithm in the 7-queens problem discussed in the previous examples. In this case the lower triangle of the 7 x 7 matrix is used to store the transition frequency of each possible move, whereas the upper triangle is used for short term memory, as in the previous example. In the case summarized in the figure, 25 moves have been performed; for example, five of these moves were performed by swapping the positions of queens 1 and 3, two moves involving queens 1 and 5, etc.
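The two frequency memories described above can be kept with simple counters, as in the sketch below. The class and attribute names are invented for illustration (the tutorial does not prescribe data structures), and the diversification penalty is one plausible way to use transition counts:

```python
from collections import Counter

class FrequencyMemory:
    """Long-term memory: residence counts over a stored elite set and
    transition counts over performed moves (an illustrative sketch)."""

    def __init__(self):
        self.residence = Counter()   # attribute -> occurrences in elite configurations
        self.transition = Counter()  # attribute -> times it changed in a move

    def record_elite(self, config_attributes):
        # Residence frequency: count attributes present in an elite configuration.
        self.residence.update(config_attributes)

    def record_move(self, changed_attributes):
        # Transition frequency: count attributes changed by a performed move.
        self.transition.update(changed_attributes)

    def diversification_penalty(self, attribute, weight=1.0):
        # Attributes that move very often ("crack fillers") get penalized.
        return weight * self.transition[attribute]

mem = FrequencyMemory()
mem.record_elite({"circuit_a", "circuit_b"})
mem.record_move({"circuit_c"})
mem.record_move({"circuit_c", "circuit_a"})
```

In a transmission planning setting, diversification_penalty would be added to the objective of moves involving frequently swapped circuits, discouraging their repeated use.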
Fig. 11: Transition frequency based memory (lower triangle).

Fig. 12: Examples of intensification and diversification in tabu search. (The figure shows the current solution, the tabu structure, and the number of collisions; graphic detail not reproduced.)

Queens  v  vp
 3-4    1   4
 1-6    2   3
 2-5    2   3
 2-6    2   2
 2-4    3   4
[5] M. Laguna: "A Guide to Implementing Tabu Search", Investigacion Operativa, Vol. 4, No. 1, April 1994.

[6] F. Glover and M. Laguna: Tabu Search, Kluwer Academic Publishers, 1997.

[7] C.R. Reeves: Modern Heuristic Techniques for Combinatorial Problems, McGraw-Hill Book Company, 1995.

[8] J. Knox: "Tabu Search Performance on the Symmetrical Traveling Salesman Problem", Computers & Operations Research, 21(8), 867-876, 1994.

[9] C.N. Fiechter: "A Parallel Tabu Search Algorithm for Large Traveling Salesman Problems", Discrete Applied Mathematics, 51(3), 243-267, 1994.

[10] W.B. Carlton, J.W. Barnes: "Solving the Traveling Salesman Problem with Time Windows Using Tabu Search", IIE Transactions, 28(8), 617-629, 1996.

[11] M. Gendreau, G. Laporte, F. Semet: "A Tabu Search Heuristic for the Undirected Selective Traveling Salesman Problem", European Journal of Operational Research, 106(2-3), 539-545, 1998.

[12] M. Gendreau, A. Hertz, G. Laporte: "A Tabu Search Heuristic for the Vehicle Routing Problem", Management Science, 40(10), 1276-1290, 1994.

[13] J. Xu, J.P. Kelly: "A Network Flow-Based Tabu Search Heuristic for the Vehicle Routing Problem", Transportation Science, 30(4), 379-393, 1996.

[14] G. Barbarosoglu, D. Ozgur: "A Tabu Search Algorithm for the Vehicle Routing Problem", Computers & Operations Research, 26(3), 255-270, 1999.

[19] I.H. Osman: "Heuristics for the Generalized Assignment Problem: Simulated Annealing and Tabu Search Approaches", OR Spektrum, 17(4), 211-225, 1995.

[20] K. Jornsten, A. Lokketangen: "Tabu Search for Weighted K-Cardinality Trees", Asia-Pacific Journal of Operational Research, 14(2), 9-26, 1997.

[21] S. Hanafi, A. Freville: "An Efficient Tabu Search Approach for the 0-1 Multidimensional Knapsack Problem", European Journal of Operational Research, 106(2-3), 659-675, 1998.

[22] M. Sun, J.E. Aronson, P.G. McKeown, D. Drinka: "A Tabu Search Heuristic Procedure for the Fixed Charge Transportation Problem", European Journal of Operational Research, 106(2-3), 441-456, 1998.

[23] E. Rolland, H. Pirkul, F. Glover: "Tabu Search for Graph Partitioning", Annals of Operations Research, 63, 209-232, 1996.

[24] S.G. Ponnambalam, P. Aravindan, S.V. Rajesh: "A Tabu Search Algorithm for Job Shop Scheduling", International Journal of Advanced Manufacturing Technology, 16(10), 765-771, 2000.

[25] J.W. Barnes, J.B. Chambers: "Solving the Job Shop Scheduling Problem with Tabu Search", IIE Transactions, 27(2), 257-263, 1995.

[26] A. Fanni, M. Marchesi, F. Pilo, A. Serri: "Tabu Search Metaheuristic for Designing Digital Filters", International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 17(5-6), 1998.
[27] E. Costamagna, A. Fanni, G. Giacinto: "A Tabu Search Algorithm for the Optimization of Telecommunication Networks", European Journal of Operational Research, 106(2-3), 357-372, 1998.

[28] A.H. Mantawy, Y.L. Abdel-Magid, S.Z. Selim: "Unit Commitment by Tabu Search", IEE Proceedings Generation, Transmission and Distribution, Vol. 145, No. 1, pp. 56-64, 1998.

[29] A.H. Mantawy, Y.L. Abdel-Magid, S.Z. Selim: "A New Genetic-Based Tabu Search Algorithm for Unit Commitment Problem", Electric Power Systems Research, Vol. 49, No. 2, pp. 71-78, 1999.

[30] A.H. Mantawy, Y.L. Abdel-Magid, S.Z. Selim: "Integrating Genetic Algorithms, Tabu Search and Simulated Annealing for the Unit Commitment Problem", IEEE Transactions on Power Systems, Vol. 14, No. 3, pp. 829-836, 1999.

[38] A. Augugliaro, L. Dusonchet, S. Mangione, E.R. Sanseverino: "Fast Solution of Radial Distribution Networks with Automated Compensation and Reconfiguration", Electric Power Systems Research, Vol. 56, No. 2, pp. 159-165, 2000.

[39] F.S. Wen, C.S. Chang: "A Tabu Search Approach to Fault Section Estimation in Power Systems", Electric Power Systems Research, Vol. 40, No. 1, pp. 63-73, 1997.

[40] F.S. Wen, C.S. Chang: "Tabu Search Approach to Alarm Processing in Power Systems", IEE Proceedings Generation, Transmission and Distribution, Vol. 144, No. 1, pp. 31-38, 1997.

[41] S.M. Sait, H. Youssef: Iterative Computer Algorithms with Applications in Engineering, IEEE Computer Society, Los Alamitos, CA, 1999.
Figure 1 - Interaction between Genetic Algorithms and Fuzzy Systems. (Block diagram: the Genetic Algorithm produces a New Population; an Adjustment step feeds the Evaluation of the Fuzzy System on the Input Data.)

The above process may be expanded to use chromosomes that include information regarding conditions and rules related to the fuzzy rules. Their inclusion in the genetic treatment allows the system to learn or to refine the fuzzy rules.

2. FUZZY SYSTEMS

Lotfi A. Zadeh introduced the concept of the fuzzy set in 1965, and he is considered a great contributor to modern control. At the beginning of the 1960s, Zadeh observed that the available technological resources could not automate activities related to problems of an industrial, biological, or chemical nature that comprise ambiguous situations, which are not amenable to processing through computer logic based on Boolean logic.

In order to solve these problems, Prof. Zadeh published in 1965 [6] a paper summing up the concepts of fuzzy sets, and through the creation of fuzzy systems revolutionized the subject.

In 1974, Prof. Mamdani, from Queen Mary College, University of London, after several trials to control a steam engine with different types of controllers, PID included, succeeded only after applying fuzzy reasoning [7].

Several papers have been published on fuzzy logic, aiming to organize the many applications and developments by concentration area [8-11].

Fuzzy logic is a way to handle inherently analog process data, which vary over a continuous band, on a digital computer, which works with well-defined numeric data, i.e., discrete values. Consider, for example, a brake system directed by a microcontroller: the microcontroller takes decisions according to the brake temperature, speed, and other system variables.

The temperature variable in this system may be divided into a band of "states": "cold", "fresh", "normal", "tepid", "hot". However, it is not easy to set up the transition from one state to the next: an arbitrary limit would have to be defined to separate, for example, "tepid" from "hot", and this would lead to a discontinuous change when the input value passes through the limit, which the microcontroller would have to detect.

The way out is the creation of "fuzzy states", or more commonly "fuzzy memberships", which allow a gradual change from one state to the next. The input temperature can then be described by means of intermediate functions.

In this way, the input variable no longer jumps suddenly from one state to the next; instead, it gradually loses value in one state while gaining value in the next. At any given moment the "true value" of the brake temperature will almost always lie between two consecutive functions: 0.6 normal and 0.4 tepid, or 0.7 normal and 0.3 fresh, and so on. Figure 2 shows a representation of such consecutive functions.

Figure 2 - Consecutive membership functions for the brake temperature (states: cold, fresh, normal, tepid, hot).
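The overlapping states above can be sketched with triangular membership functions. The breakpoint temperatures below are invented for illustration (the text does not give numeric ranges); with them, an input of 28 degrees fuzzifies to exactly 0.6 normal and 0.4 tepid, as in the example:

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Invented breakpoints (degrees C) for the five temperature states.
STATES = {
    "cold":   (-40, -20, 0),
    "fresh":  (-20, 0, 20),
    "normal": (0, 20, 40),
    "tepid":  (20, 40, 60),
    "hot":    (40, 60, 80),
}

def fuzzify(temp):
    # Degree of membership of the input temperature in each state.
    return {name: tri(temp, *abc) for name, abc in STATES.items()}
```

Because adjacent triangles overlap, membership lost in one state is gained in the next, giving the gradual transition the text describes.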
weighted average, the centroid of each area is calculated separately, and the output value is computed as the average of these centroids weighted by the maximum value of the membership function. In the mean-of-maxima process, the average of the maximum values of the membership functions is calculated and taken as the output value [13].

The design of fuzzy control systems is based on empirical methods, basically a methodical trial-and-error approach. There are few predefined rules at present, since it is still a new technology; generally the process follows the steps below:

• The operational specifications of the system, its inputs and outputs, are documented.
• The fuzzy sets for the inputs are documented.

... hurdle THEN brake pressure is high.

Speed: a = stopped, b = slow, c = medium, d = high (breakpoints at 0, 1, 2, 5, 7, 10, 25, 30, 50).

Distance: a = at the hurdle, b = near, c = far away, d = very far away.

Pressure: a = no pressure, b = low, c = medium, d = high.

The resulting fuzzy set is then defuzzified, providing the corresponding output value.

Figure 5 - Program basic screen.
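The two defuzzification processes described above can be sketched as follows. This is an illustrative fragment with invented function names; each fuzzy set is represented only by its centroid and its clipped peak membership:

```python
def centroid_average(sets):
    """Weighted average of per-set centroids: each fuzzy set contributes its
    centroid weighted by the maximum (peak) value of its membership."""
    num = sum(peak * centroid for centroid, peak in sets)
    den = sum(peak for _, peak in sets)
    return num / den

def mean_of_maxima(points, memberships):
    """Average of the points at which the aggregated membership is maximal."""
    m = max(memberships)
    maxima = [x for x, mu in zip(points, memberships) if mu == m]
    return sum(maxima) / len(maxima)
```

For two sets with centroids 2.0 and 6.0 and equal peaks, the centroid average yields the midpoint 4.0; mean-of-maxima instead ignores all but the highest aggregated membership values.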
are three possibilities for changing the direction of displacement:

a) A clash against the wall: when the system detects that in the next simulation step the vehicle will clash against the wall;

b) A rule that enforces turning back: when, because of a rule, the order to reverse is issued; or

c) Lack of outputs: when the control does not fire any rule, that is to say, when the output area is void (zero).
IF x is LC AND y is YT AND car angle is VE AND displacement direction is ahead THEN modified angle is NS.

3.3. Simulations

The learning process used in the computing package is trial-and-error. The user builds the membership functions, supplies a set of rules, and performs several tests checking the control quality. It is known that this learning process (trial-and-error) may not provide the expected results, because many interpretation mistakes may happen [15].

Figure 9 - Simulation examples.

... Training menu, which will be discussed in the following item.

4. GENETIC TRAINING MODULE DESCRIPTION

This module adjusts the membership (pertinence) functions, using genetic algorithms, starting from a control previously supplied by the user. For this purpose there are three options in the Genetic Training menu: Starting Positions for Training, Genetic Training, and Best Results. In a general way, the integration between genetic algorithms and fuzzy control was set up as follows:
a) The chromosome was defined as a linkage between the adjustment values of the membership functions.

b) The parameters are the centers and the widths of each fuzzy set. These parameters make up the chromosome's genes.

c) From a range of possible initial parameter values, the fuzzy system is run to check whether it works well.

d) This information is used to set up the adjustment (fitness) of each chromosome, establishing in this way a new population.

e) The cycle is repeated until the number of generations defined by the user is completed. At every generation, the best set of values for the membership function parameters is found.

4.1. The Option to Define the Starting Positions

For the genetic training, it is possible to define the initial positions from which the vehicle will depart in order to evaluate each chromosome that represents a set of values for the membership function parameters. In this way the control is optimized not only with respect to one single route, but over the whole set of possible initial positions for starting the vehicle, so that parking is achieved.

Through the edit window, the initial positions are edited: it is possible to define a new position or to edit an existing one.

After ending all the generations, it is possible to choose the best result found among all the generations through the button Best. This operation allows evaluating the solutions proposed by the genetic algorithm. For that purpose the user should choose one generation and click the button Adjust.

After the adjustment, the membership functions are redefined according to the parameters of the chromosome corresponding to the chosen generation. The system will then perform the control based on these new functions.

The original membership functions may be restored through the option Best Options from the menu Genetic Training, clicking the button Restore Original. The mechanisms used by the genetic algorithms to generate good membership functions for the fuzzy control will now be presented.

To describe each membership function of the fuzzy control introduced by the computer package, four parameters are defined: IE (lower left), ID (lower right), SE (upper left) and SD (upper right). Figure 7 shows the parameters of the membership function PE for variable x. For this function, IE equals 30, ID 160, SE 80 and SD 110. For the adjustment of the membership functions, the following equations were defined for each of them:

IE = (IE + ki) - Wi
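Steps a) to e) can be sketched as below. This is an illustrative implementation with invented parameter ranges and a toy fitness function standing in for running the parking simulation (which the package evaluates by actually driving the vehicle); it only shows the chromosome layout (center, width genes) and the generational cycle:

```python
import random

def evolve(fitness, n_sets=5, pop_size=20, generations=30, seed=1):
    """Steps a)-e): each chromosome is a list of (center, width) genes,
    evolved with truncation selection, blend crossover and Gaussian mutation."""
    rng = random.Random(seed)
    # Step a)-b): chromosomes hold the centers and widths of each fuzzy set.
    pop = [[(rng.uniform(0, 100), rng.uniform(5, 40)) for _ in range(n_sets)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Step c)-d): evaluate each chromosome; keep the better half as parents.
        scored = sorted(pop, key=fitness)  # lower fitness is better
        parents = scored[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = [((ca + cb) / 2 + rng.gauss(0, 1.0),        # blend + mutate center
                      max(1.0, (wa + wb) / 2 + rng.gauss(0, 0.5)))  # keep widths positive
                     for (ca, wa), (cb, wb) in zip(a, b)]
            children.append(child)
        pop = parents + children  # step e): next generation
    return min(pop, key=fitness)

# Toy fitness: distance of each center from an assumed ideal spacing.
def toy_fitness(chrom):
    ideal = [10, 30, 50, 70, 90]
    return sum(abs(c - t) for (c, _), t in zip(chrom, ideal))
```

In the package described here, the fitness of a chromosome would instead be derived from simulating the parking maneuver with the adjusted membership functions.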
Table 1 - Selection Criteria. (Graphic content not reproduced.)

Figure 13 - Initial training positions (Pos. 1, Pos. 2, Pos. 3).
Table 5 - Simulation results (first rows shown; iterations generated by the fuzzy controls).

Position   X    Y   Car Angle   Original   Trained
   1       1   126     182         450        329
   2       6    46     132         167        154
Table 5 shows the results obtained from simulations made with 30 positions randomly chosen for the vehicle parking. The results demonstrate, for the genetically trained control, an average reduction of 21% in the number of iterations needed for the vehicle to reach the final position. These values represent a global reduction of the vehicle route, starting from positions not used in the genetic training.

It is possible to notice that in some positions the number of iterations is bigger than that generated by the original control (without training). In position 29 of Table 5, for example, the original control generates 235 iterations to park the vehicle, as compared to the trained control, which generates 355 iterations. This increase comes from the modifications made in the membership functions, which make the vehicle take a different route to reach the final position.

6. CONCLUSIONS

Fuzzy systems are a convenient and efficient alternative for the solution of problems where the fuzzy states are well defined. Nevertheless, the design of a fuzzy system may become difficult for large and complex systems, when the control quality depends on trial-and-error methods for defining the best membership functions to solve the problem.

The main purpose of the Computing Package for Fuzzy Logic Teaching, as used in this work, is to provide students with the learning of this logic. The choice of a vehicle parking lot is justified because students do not need previous knowledge (at least in mathematical terms) of the subject in order to build the control.

The genetic training module developed in this work added to this program an automatic technique for the adjustment of the membership function parameters. This technique shows that the performance of a fuzzy control may be improved through genetic algorithms, substituting for the trial-and-error method used before by students for this purpose with no good results.

The genetic algorithms provided distinctive advantages for the optimization of membership functions, resulting in a global search and reducing the chances of ending in a local minimum, since several sets of solutions are used simultaneously. The fuzzy logic supplied the evaluation function, the stage of the genetic algorithm where the adjustment (fitness) is settled.

7. ACKNOWLEDGEMENT

The author thanks PRONEX, CNPq and CAPES for the financial support of this project.

8. REFERENCES

[1] C.L. Karr, "Genetic algorithms for fuzzy controllers", AI Expert, vol. 6, no. 2, pp. 26-33, 1991.

[2] C.L. Karr, "Applying genetics to fuzzy logic", AI Expert, vol. 6, no. 3, pp. 38-43, 1991.

[3] C.L. Karr and D.A. Stanley, "Fuzzy logic and genetic algorithms in time-varying control problems", in Proc. NAFIPS-91, 1991, pp. 285-290.

[4] D.L. Meredith, K.K. Kumar, and C.L. Karr, "The use of genetic algorithms in the design of fuzzy logic controllers", in Proc. WNN-AIND 91, 1991, pp. 695-702.

[5] L.R. Medsker, Hybrid Intelligent Systems, Boston: Kluwer Academic Pub., 1995.

[6] L.A. Zadeh, "Fuzzy Sets", Information and Control, Vol. 8, pp. 338-353, 1965.

[7] E.H. Mamdani, "Application of fuzzy algorithms for the control of a dynamic plant", Proc. IEE, vol. 121, pp. 1585-1588, 1974.

[8] T. Terano, K. Asai and M. Sugeno, Fuzzy Systems Theory and Its Applications, New York: Academic Press, 1992.

[9] P.P. Bonissone, V. Badami, K.H. Chiang, P.S. Khedkar, K.W. Marcelle and M.J. Schutten, "Industrial Applications of Fuzzy Logic at General Electric", Proc. of the IEEE, pp. 450-465, March 1995.

[10] J.A. Momoh, X.W. Ma and K. Tomsovic, "Overview and Literature Survey of Fuzzy Set Theory in Power Systems", IEEE Trans. on Power Systems, Vol. 10, No. 3, pp. 1676-1690, Aug. 1995.

[11] M. Sugeno, Industrial Applications of Fuzzy Control, Amsterdam: North-Holland, 1985.

[12] A. Kaufmann, Introduction to the Theory of Fuzzy Subsets, Academic Press, 1987.
[13] A. Kandel and G. Langholz, Fuzzy Control Systems, CRC Press, 1993.

[14] G. Lambert-Torres, V.H. Quintana and L.E. Borges da Silva, "A Fuzzy Control Lab for Educational Purposes", Canadian Conference on Engineering Education, pp. 117-124, Kingston, Canada, Jun. 16-18, 1996.

[15] D. Park, A. Kandel and G. Langholz, "Genetic-Based New Fuzzy Reasoning Models with Application to Fuzzy Control", IEEE Trans. on SMC, Vol. 24, No. 1, pp. 39-47, January 1994.
Chapter 9
and improving performance in specific regions of interest, such as security boundaries. The system is based on a boundary marking technique originally proposed by Reed and Marks [6], which makes use of an evolutionary algorithm to spread points evenly on a contour of interest. These points are then verified via simulations, thus quantifying the accuracy of the security boundary. Areas of inaccuracy can then be improved by augmenting the training database and retraining the neural network.

Section 2 of this chapter deals with issues involved in training neural networks for power system dynamic security assessment, including data gathering, training and validation. Section 3 introduces the concept of evolutionary algorithms and the proposed query learning technique of this chapter. Section 4 describes the application of this technique to the creation of nomograms and the location of critical operating regions using the IEEE 17 generator transient stability test system as a case study. Finally, conclusions are presented in section 5.

2. NN'S FOR DSA

Neural networks have demonstrated the ability to approximate complex nonlinear systems when presented with a representative sample of training data. Several researchers have reported remarkable results when applying the multilayer perceptron neural network to the power system security assessment problem [1-3]. Typically, traditional methods such as time domain simulations [7] or energy function methods [8] are used to generate a database of training data. This database includes examples of all power system operating scenarios of interest, described by a set of selected power system features as well as their resulting security measure. The neural network then adapts itself to the training database and produces an approximation to the security assessment problem in the form of an equation f(x) = S, where f is the neural network function, x is the vector of power system features, and S is the resulting security index. Examples of commonly used security indices include energy functions and critical clearing times [7,9].

A key advantage of using neural networks is the ability to extract operating information after training via neural network inversion techniques [10-12]. Neural network inversion is the process of finding an input vector that produces a desired output response for a trained neural network. For example, consider a neural network trained to predict the security S of a power system given a vector of system features x. By clamping the output value S to the marginally secure state, say S = 0.5, where S = 1.0 is secure and S = 0.0 is insecure, and inverting the network, a marginally secure state x* can be found in the input space. This state then describes a region of the power system operating space where insecurity is likely to occur. It should be noted that since the neural network is typically a many-to-one mapping, the inversion is generally not to a unique point, but rather to some contour in the input space.

In this chapter we used the IEEE 17 generator transient stability test system as a case study. We used the EPRI energy margin software package called DIRECT [13] to create the training database for the neural network. Software was written to automate the data gathering process by repeatedly running the DIRECT software to calculate the system energy margin for a single fault under many different prefault operating states. The database consists of a set of prefault system features, in this case generator settings and system load, and the corresponding system energy margin. The DIRECT software determines the energy margin, which is related to the security of the system, by assigning a positive energy margin to secure states and a negative energy margin to insecure states. The magnitude of the energy margin indicates the degree of stability or instability.

A software package called QwikNet [14] was used to design and test the neural network. QwikNet is a Windows-based neural network simulation package that allows experimentation with many different network topologies and training algorithms. After training, the neural network function, f(x) = S, can be written to a file in a convenient C programming language format that can easily be incorporated into the inversion software.

3. EVOLUTIONARY-BASED QUERY LEARNING ALGORITHM

Query learning [15,16] is a method that can be used to enhance the performance of partially trained neural networks. Query learning is based on the notion of asking a partially trained network to respond to questions. These questions are also presented to an oracle which always responds with the correct answer. The response of the neural network is then compared to that of the oracle and checked for accuracy. Areas that are poorly learned by the neural network can thus be identified. Training data is then generated in these areas and the network is retrained to improve its performance.

The query learning procedure proposed in this chapter is an extension of previously proposed methods. The principal difference is that instead of locating and then querying individual points, our algorithm works with a population of solutions, thus offering the ability to query entire areas of interest. This algorithm also seeks to evenly distribute the points across the area. Evenly distributing the points is important because a global view of the security boundary in multiple dimensions is provided, thus allowing the entire boundary to be queried and potentially improved. After the points are spread, they are simulated via the energy margin simulator and their true security index is determined. If all the points are within tolerance, the algorithm stops. Otherwise, the points with unacceptably large errors are added to the training database and the neural network is retrained.

In the evolutionary boundary marking algorithm, all reproduction is asexual, i.e. no mating or crossover takes
place. Offspring are produced as perturbations of single parents. This concentrates the search in the area close to the security boundary and speeds convergence. The algorithm seeks to minimize a fitness function, F, of the following form:

F = |f(x) - S| + 1/Davg

where
f is the neural network function,
x is the current point,
S is the security boundary, and
Davg is the average distance to the nearest neighbors.

The evolutionary algorithm is randomly initialized with N points and then proceeds as follows.
1. The population is sorted based on fitness, F.
2. The M points with the lowest fitness scores are deleted.
3. Replacements are generated for each deleted point:
   (a) M parents are selected proportional to fitness from the remaining points.
   (b) New offspring are created as perturbations of the selected parents, x_new = x_parent + n, where n ~ N(0, σ).
   (c) Feasibility constraints are enforced on the new offspring via the solution of a standard power flow.
4. Repeat until convergence.

By successively deleting points with poor fitness values and replacing them with perturbations of points with high fitness, the population tends to spread evenly across the solution contour. Typical values used in this chapter are N=100, M=20, m=3 and σ=0.05.

Figure 2 shows histograms of the initial and final population distributions. It can be seen that the final population has converged to the security boundary and is evenly spread across the boundary. These points are then added to the training database and the network is retrained. Several iterations of query learning may be required to produce acceptable results.

4. CASE STUDY - IEEE 17-GENERATOR SYSTEM

The IEEE 17 generator transient stability test system [17] is used to illustrate the performance of the proposed algorithm. This system consists of 17 generators and 162 buses. The EPRI energy margin software DIRECT is used to determine the energy margin of the system in response to a single three phase fault. Twelve system features are selected to represent the system for neural network training.
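Before turning to the results, the boundary-marking loop of the previous section can be sketched in Python. This is a minimal sketch, not the chapter's implementation: `net` stands in for the trained neural network f(x), proportional parent selection is simplified to a uniform choice among survivors, and the power-flow feasibility check of step 3(c) is omitted.

```python
import math
import random

def d_avg(x, population, m=3):
    """Average distance from x to its m nearest neighbors."""
    dists = sorted(math.dist(x, p) for p in population if p is not x)
    return max(sum(dists[:m]) / m, 1e-12)   # guard against division by zero

def cost(x, population, net, S=0.5, m=3):
    # F = |f(x) - S| + 1/Davg: small when x lies near the boundary
    # f(x) = S and is far from its nearest neighbors (well spread).
    return abs(net(x) - S) + 1.0 / d_avg(x, population, m)

def boundary_marking(net, dim=2, N=100, M=20, m=3, sigma=0.05,
                     generations=100, S=0.5):
    # Random initialization with N points in the unit hypercube.
    pop = [[random.random() for _ in range(dim)] for _ in range(N)]
    for _ in range(generations):
        # Steps 1-2: rank by F and delete the M worst points (largest F).
        survivors = sorted(pop, key=lambda x: cost(x, pop, net, S, m))[:N - M]
        # Step 3: replace them with Gaussian perturbations of parents,
        # x_new = x_parent + n, n ~ N(0, sigma) per coordinate.
        offspring = [[v + random.gauss(0.0, sigma)
                      for v in random.choice(survivors)]
                     for _ in range(M)]
        pop = survivors + offspring
    return pop
```

With a toy surrogate such as `net = lambda x: x[0]`, the returned population concentrates near the contour f(x) = S while the 1/Davg term discourages clustering.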
Figure 2. Histograms of the initial population and final population of the boundary marking algorithm (distance to the boundary and distance to neighbors).

These include the real and reactive powers of the 5
generators closest to the fault location and the total system real and reactive load level. A training database of 436 samples was created for initial training of the neural network by randomly perturbing generation settings and system load levels and then simulating each case on the energy margin software. The initial RMS neural network testing error was 0.113, corresponding to a test database that was not used for training.

Figure 3. Nomogram of P73 vs. P76 (security boundary nomograms of the simulator, the initial neural network, and the enhanced neural network).

The proposed query learning algorithm was then used to generate additional training samples near the security boundary. These points are simulated on the DIRECT energy margin simulation software and the points with large errors are added to the training data file. The final training database consisted of 1177 training samples and the final RMS test error was reduced to 0.062.

Nomograms were then created from the initial and the enhanced neural networks based on the method proposed in [4]. These nomograms show the relationship between two generator power outputs and the security boundary. The two nomograms are shown in Figure 3 along with the true nomogram, which was created by repeatedly querying the simulator. It should be noted that the nomogram of the simulator as shown in Figure 3 required smoothing by fitting a 2nd order polynomial to the raw data. The smoothing operation is required due to the approximations and assumptions made by the simulation software. The RMS error for the initial nomogram is 48.53, while that of the enhanced neural network nomogram is 10.11. This experiment proves the viability of the proposed technique in increasing the accuracy of a partially trained neural network near the security boundary.

5. CONCLUSIONS

This chapter presents an enhanced query learning algorithm that effectively locates regions of interest and distributes neural network training data in these regions. The process is used to enhance the accuracy of partially trained neural networks in specific operating regions. The proposed technique is applied to the problem of generating power system operating nomograms from neural networks. Results show a nearly 5-fold improvement in RMS error when applied to the IEEE 17 generator test system.

6. REFERENCES

[1] D. J. Sobajic and Y. H. Pao, "Artificial Neural-Net Based Dynamic Security Assessment for Electric Power Systems", IEEE Transactions on Power Systems, vol. 4, no. 1, February 1989, pp. 220-228.
[2] M. A. El-Sharkawi, R. J. Marks II, M. E. Aggoune, D. C. Park, M. J. Damborg and L. E. Atlas, "Dynamic Security Assessment of Power Systems using Back Error Propagation Artificial Neural Networks", Second Symposium on Expert System Application to Power Systems, Seattle, WA, 1989.
[3] Y. Mansour, E. Vaahedi, A. Y. Chang, B. R. Corns, J. Tamby and M. A. El-Sharkawi, "Large Scale Dynamic Security Screening and Ranking Using Neural Networks", IEEE Transactions on Power Systems, vol. 12, no. 2, May 1997, pp. 954-960.
[4] J. D. McCalley, S. Wang, Q. Zhao, G. Zhou, R. T. Treinen, and A. D. Papalexopoulos, "Security Boundary Visualization For Systems Operation", IEEE Transactions on Power Systems, May 1997, pp. 940-947.
[5] C. A. Jensen, R. D. Reed, M. A. El-Sharkawi and R. J. Marks II, "Location of Operating Points on the Dynamic Security Border Using Constrained Neural Network Inversion", International Conference on Intelligent Systems Applications to Power Systems (ISAP '97), Seoul, Korea, July 1997.
[6] R. D. Reed and R. J. Marks II, "An Evolutionary Algorithm for Function Inversion and Boundary Marking", IEEE International Conference on Evolutionary Computation, Perth, West Australia, December 1995.
[7] P. Kundur, Power System Stability and Control, McGraw-Hill, New York, 1993.
[8] A. A. Fouad and V. Vittal, Power System Transient Stability Analysis Using the Transient Energy Function Method, Prentice Hall, 1992.
[9] P. M. Anderson and A. A. Fouad, Power System Control and Stability, The Iowa State University Press, Ames, Iowa, 1977.
[10] R. J. Williams, "Inverting a Connectionist Network Mapping by Backpropagation of Error", 8th Annual Conference of the Cognitive Science Society, Lawrence Erlbaum, Hinsdale, NJ, pp. 859-865, 1986.
[11] A. Linden and J. Kindermann, "Inversion of Multilayer Nets", Proceedings of the International Joint Conference on Neural Networks, Washington D.C., vol. 2, pp. 425-430, 1989.
[12] J. N. Hwang, J. J. Choi, S. Oh, and R. J. Marks II, "Classification Boundaries and Gradients of Trained Multilayer Perceptrons", IEEE International Symposium on Circuits and Systems, 1990, pp. 3256-3259.
[13] EPRI, "Analytical Methods for DSA Contingency Selection and Ranking, Users Manual for DIRECT Version 4.0", EPRI TR-105886, Palo Alto, California, 1993.
[14] QwikNet Neural Network Simulation Software, http://www.kagi.com/cjensen, 1997.
[15] M. A. El-Sharkawi and S. S. Huang, "Query-Based Learning Neural Network Approach to Power System Dynamic Security Assessment", International Symposium on Nonlinear Theory and Its Applications, Waikiki, Hawaii, December 5-10, 1993.
[16] J. N. Hwang, J. J. Choi, S. Oh and R. J. Marks II, "Query Based Learning Applied to Partially Trained Multilayer Perceptrons", IEEE Transactions on Neural Networks, Vol. 2, 1991, pp. 131-136.
[17] IEEE Committee Report, "Transient Stability Test Systems for Direct Stability Methods", IEEE Transactions on Power Systems, vol. 7, no. 1, February 1992, pp. 37-44.
Chapter 10
Abstract: This chapter presents applications of genetic algorithm to generation expansion planning and reactive power planning problems. The generation expansion planning and reactive power planning problems are nonlinear dynamic optimization problems that can only be fully solved by complete enumeration, a process which is computationally impossible for realistic planning problems. Therefore, modern heuristic approaches, such as genetic algorithm, are well suited for these complex planning problems. However, the simple genetic algorithm has some structural problems such as premature convergence. This chapter introduces two approaches to overcome the shortfalls of simple genetic algorithms: an improved genetic algorithm (IGA) incorporating a stochastic crossover technique and an artificial initial population scheme for generation expansion planning, and a modified simple genetic algorithm (MSGA) incorporating a synthetic optimization procedure by combining the random search and optimization procedure for reactive power planning.

Index Terms: Generation expansion planning, reactive power planning, genetic algorithm, global optimization.

1. INTRODUCTION

The generation expansion planning and reactive power planning problems are nonlinear dynamic optimization problems that can only be fully solved by complete enumeration, a process which is computationally impossible for realistic planning problems. Therefore, modern heuristic approaches, such as genetic algorithm, are well suited for these complex planning problems. GA is a search algorithm based on the hypothesis of natural selections and natural genetics [1]. Recently, a global optimization technique using GA has been successfully applied to various areas of power systems such as economic dispatch [2,3], unit commitment [4,5], reactive power planning [6,7,8], power plant control [9,10], and generation expansion planning [11]. GA-based approaches have several advantages. Naturally, they can not only treat the discrete variables but also overcome the dimensionality problem. In addition, they have the capability to search for the global optimum or quasi-optimums within a reasonable computation time. However, there exist some structural problems in the conventional GA, such as premature convergence and duplications among strings in a population as generation progresses [1].

This chapter introduces two approaches in enhancing the capability of GA for power system planning. First, an improved genetic algorithm (IGA), which can overcome the shortfalls of the conventional GA to some extent, is introduced for generation expansion planning [11]. Second, a synthetic optimization procedure is presented by combining two optimization methods together, random search and optimization algorithm, for the reactive power generation planning problem [7].

2. GENERATION EXPANSION PLANNING

Generation expansion planning (GEP) is one of the most important decision-making activities in electric utilities. Least-cost GEP is to determine the minimum-cost capacity addition plan (i.e., the type and number of candidate plants) that meets forecasted demand within a pre-specified reliability criterion over a planning horizon.

A least-cost GEP problem is a highly constrained nonlinear discrete dynamic optimization problem that can only be fully solved by complete enumeration in its nature [12,13,14]. Therefore, every possible combination of candidate options over a planning horizon must be examined to get the optimal plan, which leads to the computational explosion in a real-world GEP problem.

To solve this complicated problem, a number of methods have been successfully applied during the past decades. Masse and Gilbrat [15] applied a linear programming approach that necessitates the linear approximation of an objective function and constraints. Bloom [16] applied a mathematical programming technique using a decomposition method, and solved it in a continuous space. Park et al. [17] applied the Pontryagin's maximum principle, whose solution also lies in a continuous space. Although the above-mentioned mathematical programming methods have their own advantages, they possess one or both of the following drawbacks in solving a GEP problem. That is, they treat decision variables in a continuous space, and there is no guarantee to get the global optimum since the problem is not mathematically convex. The dynamic programming (DP) based framework is one of the most widely used algorithms in GEP [12,13,14,18,19]. However, the so-called 'curse of dimensionality' has interrupted direct application of the conventional full DP in practical GEP problems. For this reason, WASP [12] and EGEAS [13] use a heuristic tunneling technique in the DP optimization routine where users prespecify states and successively modify tunnels to arrive at a local optimum. David and Zhao developed a heuristic-based DP [18] and applied the fuzzy set theory [19] to reduce the number of states. Recently, Fukuyama and Chiang [20] and Park et al. [21] applied genetic algorithm (GA) to solve sample GEP problems, and showed promising results. However, an efficient method for a practical GEP problem that can overcome a local optimal trap and the dimensionality problem simultaneously has not been developed yet.
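The scale of this enumeration is easy to illustrate with a back-of-the-envelope count. Assuming, for example, yearly construction limits like those of the five candidate plants used later in this chapter (5, 4, 3, 3 and 3 units per year; the counting itself is the point, not these particular numbers), each plant type with limit L admits L+1 choices per year, and the choices multiply across types and across the planning horizon:

```python
from math import prod

# Illustrative count of the complete-enumeration space for least-cost GEP.
limits = [5, 4, 3, 3, 3]                 # max units of each plant type per year
per_year = prod(L + 1 for L in limits)   # 6*5*4*4*4 = 1920 decisions per year

for T in (6, 14, 24):                    # planning horizons used in the chapter
    digits = len(str(per_year ** T))
    print(f"T={T:2d} years: roughly 10^{digits - 1} candidate plans")
```

Even the modest 6-year tuning horizon yields on the order of 10^19 candidate plans, which is why full enumeration (and, in large cases, full DP) is out of reach.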
In this chapter, an improved genetic algorithm (IGA), which can overcome the aforementioned problems of the conventional GA to some extent, is introduced [11]. The IGA incorporates the following two main features. First, an artificial creation scheme for an initial population is devised, which also takes the random creation scheme of the conventional GA into account. Second, a stochastic crossover strategy is developed. In this scheme, one of three different crossover methods is randomly selected from a biased roulette wheel, where the weight of each crossover method is determined through pre-performed experiments. The stochastic crossover scheme is similar to the stochastic selection of reproduction candidates from a mating pool. The results of the IGA are compared with those of the conventional simple genetic algorithm, the full DP, and the tunnel-constrained DP employed in WASP.

3. THE LEAST-COST GEP PROBLEM

Mathematically, solving a least-cost GEP problem is equivalent to finding a set of optimal decision vectors over a planning horizon that minimizes an objective function under several constraints. The GEP problem to be considered is formulated as follows [6]:

min_U  Σ_{t=1}^{T} { f_t^1(U_t) + f_t^2(X_t) - f_t^3(U_t) }   (1)

s.t.  X_t = X_{t-1} + U_t   (t = 1, ..., T)   (2)

      LOLP(X_t) < ε   (t = 1, ..., T)   (3)

where
f_t^1(U_t) : discounted construction costs [$] associated with capacity addition U_t in year t,
f_t^2(X_t) : discounted fuel and O&M costs [$] associated with capacity X_t in year t,
f_t^3(U_t) : discounted salvage value [$] associated with capacity addition U_t in year t.

The objective function is the sum of tripartite discounted costs over a planning horizon. It is composed of discounted investment costs, expected fuel and O&M costs, and salvage value. To consider investments with lifetimes longer than the planning horizon, the linear depreciation option is utilized [12]. In this paper, five types of constraints are considered. Equation (2) is the state equation for the dynamic planning problem [6]. Equations (3) and (4) are related to the LOLP reliability criteria and the reserve margin bands, respectively. The capacity mixes by fuel types are considered in (5). Plant types give another physical constraint in (6), which reflects the yearly construction capabilities.

Although the state vector, X_t, and the decision vector, U_t, have dimensions of MW, we can easily convert these into vectors which carry the number of units in each plant type. This mapping strategy is very useful for GA implementation of a GEP problem, such as encoding and treatment of inequality (6), and is illustrated in the following equations:

X_t = (x_t^1, ..., x_t^N)^T,  U_t = (u_t^1, ..., u_t^N)^T.   (7)
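A minimal sketch of how objective (1) and state equation (2) fit together may be helpful. Here `cost1`, `cost2` and `cost3` are placeholders for the discounted construction cost f_t^1, fuel/O&M cost f_t^2 and salvage value f_t^3; a real planner obtains these from probabilistic production costing rather than closed-form functions.

```python
# Sketch of evaluating objective (1) under state equation (2).
def objective(U_plan, X0, cost1, cost2, cost3):
    """Evaluate (1) for a candidate plan.

    U_plan: list of yearly capacity-addition vectors U_t (MW per plant type).
    X0: initial capacity vector X_0.
    """
    X = list(X0)
    total = 0.0
    for t, U in enumerate(U_plan, start=1):
        X = [x + u for x, u in zip(X, U)]        # (2): X_t = X_{t-1} + U_t
        total += cost1(t, U) + cost2(t, X) - cost3(t, U)
    return total
```

A GA evaluates this objective for every string in the population, with the constraint checks of (3)-(6) handled separately.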
classical one-point crossover, a random position in a string is chosen and all characters to the right of this position are swapped. Mutation, the secondary operator in GA, is an occasional random alteration of the value of a string position. Variations of the simple GA for power system applications can be found in the references [2-11,20,21]. The improvements on the conventional GA will be described in the subsequent sections.

The simple genetic algorithm consists of a population of bit strings transformed by three genetic operations: selection, crossover and mutation. Each string represents a possible solution, with each substring representing a value for a variable of interest. The algorithm starts from an initial population generated randomly. A new generation is created by applying the genetic operations according to the fitness of each solution, which corresponds to the objective function of the problem. The fitness of solutions is improved through iterations of generations. When the algorithm converges, a group of solutions with better fitness is obtained, and the optimal solution is obtained.

4.1. String Representation

population.

4) Crossover - Crossover is performed on two strings at a time that are selected from the population at random. It involves choosing a random position in the two strings and swapping the bits that occur after this position. Crossover can occur at a single position (single crossover). Crossover can be performed in different methods. Two different means are used in this paper: tail-tail and head-tail crossover.

The tail-tail crossover tends to change less significant bits. On the other hand, the head-tail crossover gives more chance of changes by changing more significant bits. The crossover methods can be changed during iterations: the head-tail crossover can be used in early generations and then switched to tail-tail crossover in later generations for fine tuning.

5) Mutation - Mutation is performed sparingly, typically every 100-1000 bit transfers from crossover, and it involves selecting a string at random as well as a bit position at random and changing it from a 1 to a 0 or vice versa. It is used to escape from a local minimum. After mutation, the new generation is complete and the procedure begins again with fitness evaluation of the population.

4.2. Genetic Operations
5.2. Fitness Function

The objective function, or cost, of a candidate plan is calculated through the probabilistic production costing and the direct investment costs calculation [12,13]. The fitness value of a string can be evaluated using the following equation [1,9]:

f = a / (1 + J)   (10)

where
a : constant,
J : objective function of (1).

However, this simple mapping occasionally brings about premature convergence and duplications among strings in a population, since strings with higher fitness values dominate the occupation of a roulette wheel.

To ameliorate these problems, the following modified fitness function, which normalizes the fitness values of strings into real numbers within [0,1], is used in this paper [1]:

f'(i) = (f(i) - f_min) / (f_max - f_min)   (11)

where
f(i) : fitness value of string i using (10),
f_max, f_min : maximum and minimum fitness value in a generation,
f'(i) : modified fitness value of string i.

5.3. Creation of an Artificial Initial Population

It is important to create an initial population of strings spread out throughout the whole solution space, especially in a large-scale problem. One alternative method could be to increase the population size, which yields a high computational burden. This paper suggests a new artificial initial population (AIP) scheme, which also takes the random creation scheme of the conventional GA into account. The procedures are illustrated in the following and in Table I:

Step 1. Generate all possible binary seeds of each plant type considering (6). For example, if the ith plant type has an upper limit of 3 units per year, then generate 4 possible binary seeds (i.e., 00, 01, 10, 11).

Step 2. Find the least common multiple (LCM) m of the numbers of binary seeds of all types, and fill m binary seeds in a lookup table for all plant types and planning years. For example, if three plant types have upper limits of 3, 3 and 5 units per year, respectively, then the numbers of binary seeds are 4, 4, and 6, and m becomes 12.

Step 3. Select an integer within [1, m] at random for each element u_t^i of a string in (9). Fill the string with the corresponding binary digits, and delete it from the lookup table. Repeat until m different strings are generated.

Step 4. Check the constraints of (3), (4) and (5). If a string satisfies these constraints for all years, then it becomes a member of an initial population. Otherwise, only the parts of the string that violate the constraints in year t are regenerated at random until they satisfy the constraints. Go to step 3 n times, for n·m less than P, where P is the number of strings in a population and n is an arbitrary positive integer.

Step 5. The remaining P - n·m strings are created using uniform random variables with binary numbers {0,1}. Go to step 4 to check the constraints and regenerate them if necessary. This process is repeated until all strings, which satisfy the constraints, are generated.

This AIP is based on both artificial and random selection schemes, which allows all possible string structures to be included in an initial population.

5.4. Stochastic Crossover, Elitism, and Mutation

Most GA works are based on Goldberg's simple genetic algorithm (SGA) framework [1]. This paper proposes two different schemes for genetic operation: a stochastic crossover technique and the application of elitism. The
stochastic crossover scheme covers three different crossover methods: 1-point crossover, 2-point crossover, and 1-point substring crossover, as illustrated in Fig. 2. Each crossover method has its own merits. The 1-point substring crossover can provide diverse bit structures to search the solution space; however, it easily destroys the string structure that may carry partial information on the optimal structure.

Fig. 2. Illustration of the 1-point and 2-point crossover methods (two parents exchange segments to produce two children).

After genetic operations, we check whether all strings satisfy the constraints of (3) to (5) or not. If any string violates the constraints of (3) to (5), only the parts of the string that violate the constraints in year t are regenerated at random until they satisfy the constraints, as described in the AIP scheme.

6. CASE STUDIES

The IGA, SGA, tunnel-constrained dynamic programming (TCDP) employed in WASP, and full dynamic programming (DP) were implemented using the FORTRAN77 language on an IBM PC/Pentium (166 MHz) computer.
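The stochastic crossover scheme of Section 5.4 amounts to one weighted draw from the biased roulette wheel per crossover event. A minimal sketch, using the weights finally adopted in this chapter (0.15 : 0.15 : 0.70):

```python
import random

# One spin of the biased roulette wheel that selects a crossover method.
CROSSOVER_METHODS = ["1-point", "2-point", "1-point substring"]
WEIGHTS = [0.15, 0.15, 0.70]   # weights adopted after the tuning experiments

def pick_crossover_method(rng=random):
    """Draw one crossover method with probability proportional to its weight."""
    return rng.choices(CROSSOVER_METHODS, weights=WEIGHTS, k=1)[0]
```

Over many draws, the 1-point substring crossover is selected about 70% of the time, matching its dominant weight.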
TABLEIV. Among the three crossover methods, the lpoint substring
TECHNICAL AND ECONOMIC DATA OFCANDIDATE PLANTS. crossover showed the best performance in every case. Thus,
Candidate Const Capa FOR Operating Fixed Capital Life
we set the Ipoint substring crossover with the biggest
Type ruction city (%) Cost O&M Cost Time weight, and others with an equal smaller weight. To
Upper (MW) (S/kWb) Cost (S/kW) (yrs) determine the weight of each crossover method in a biased
Limit roulette wheel, 18 simulations were performed with different
Oil 5 200 7.0 0.021 2.20 812.5 25 weights and crossoverprobabilitiesas shown in Table VII.
LNOC/C 4 450 10.0 0.035 0.90 500.0 20
Coal (Bitum.) 3 500 9.S 0.014 2.75 1062.5 25
TABLEVII.
Nuc.(PWR) 3 1,000 9.0 0.004 4.60 1625.0 25 RESULTS OBTAINED BYSTOCHASTIC CROSSOVER METHOD
Nuc.(PHWR) 3 700 7.0 0.003 5.50 1150.0 25
ObjectiveFunction in MillionDollars
Weights (Errors aDinst ODtimaI Solution %)
PC=0.6 PCO.? PC=0.8
6.2. Parametersfor GEP and IGA 0.05 : 0.05 : 0.90 5007.40* 5010.63 5001.40
(0.02%) (0.09%) (0.02%)
There are several parameters to be predetermined, which
0.10: 0.10: 0.80 5086.19 5010.63 5012.37
are related to the GEP problem and GAbased programs. In (O...." e) (0.09%) (0.12%)
this paper, we use 8.5 % as a discount rate, 0.01 as LOLP· 0.15 : 0.15 : 0.70 5007.40 5006.19 5886.19
criteria, and 15 % and 60 % as the lower and upper bounds . (0.02%) (I.OO%l 10.00%)
for reserve margin, respectively. The considered lower and 0.20 : 0.20 : 0.60 SOO6.19 5106.19 5011.19
{O.OO%l (O.IO~.) (0.11%)
upper bounds of capacity mix are 0 % and 30 % for oilfired 0.25 : 0.25 : 0.50 SOO6.19 5007.40 5018.37
power plants, 0 % and 40 % for LNGfired, 20 % and 60 0/0 (0.00%) (0.02%) (0.24%)
for coalfired,and 30 % and 60 % for nuclear, respectively. 0.30 : 0.30 : 0.40 5006.19 5012.46 5007.40
Parameters for the IGA are selected through experiments. In particular, the dominant parameters, such as the crossover probabilities and the weights for the crossover techniques, are determined empirically from a test system with a 6-year planning horizon, with the other data being the same as in Cases 1 and 2. Among 18 simulations, we have found the optimal solution 7 times and the second-best solution 4 times. Furthermore, the optimal or the second-best solution is found by applying the stochastic crossover technique when the probability of crossover is 0.6. Also, when the weight of 1-point substring crossover is 0.7 and the weights for the others are 0.15, it always found the optimal or the second-best solution. Therefore, we have set the weights in the stochastic crossover technique to 0.15:0.15:0.70 among the three crossover methods. This choice has resulted in the robustness of the stochastic crossover method.

TABLE V. PARAMETERS FOR IGA IMPLEMENTATION

Parameters                                                      Value
Population size                                                 300
Maximum generation                                              300
Probabilities of crossover and mutation                         0.6, 0.01
Number of elite strings                                         3 (1%)
Weights of 1-point, 2-point, and 1-point substring
  crossover in a biased roulette wheel                          0.15:0.15:0.70

To decide the weight of each crossover method in the biased roulette wheel for stochastic crossover, nine experiments are performed by changing the probability of crossover from 0.6 to 0.8, and the results are compared with the optimal solution obtained by the full DP, as shown in Table VI.

TABLE VI. RESULTS OBTAINED BY EACH CROSSOVER METHOD
(Objective function in million dollars; deviation from the optimal solution in %)

Crossover Method               Pc = 0.6           Pc = 0.7           Pc = 0.8
1-point crossover              5035.53 (0.59%)    5013.50 (0.15%)    5057.30 (1.02%)
2-point crossover              5034.89 (0.57%)    5032.98 (0.54%)    5034.89 (0.57%)
1-point substring crossover    5012.53 (0.13%)    5012.46 (0.13%)    5010.63 (0.09%)
Stochastic crossover           [entries partly illegible: (0.13%), 5007.40* (0.02%)]
DP (optimal)                   5006.19

* The solution with an objective function of 5007.40 million dollars is the second-best solution found by dynamic programming.

6.3. Numerical Results

The developed IGA was applied to two test systems, and compared with the results of DP, TCDP and SGA. Throughout the tests, the solution of the conventional DP is regarded as the global optimum and that of TCDP as a local optimum. Both the global and a local solution can be obtained in Case 1; however, only a local solution can be obtained by using TCDP in Case 2, since the 'curse of dimensionality' prevents the use of the conventional DP.

Fig. 4 illustrates the convergence characteristics of the various GA-based methods in Case 1. It also shows the improvement of IGA over SGA. The IGA employing the stochastic crossover scheme (IGA2) has shown better performance than the IGA using the artificial initial population scheme (IGA1). By considering both schemes simultaneously (IGA3), the performance is significantly enhanced.
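As a concrete illustration of the stochastic crossover technique described above, the sketch below draws one of the three crossover operators from a biased roulette wheel with the weights 0.15:0.15:0.70 and applies it to two parent bit strings. The bit-string encoding, the substring length k, and all function names are illustrative assumptions, not the implementation used in this chapter.

```python
import random

# Weights for the biased roulette wheel over the three crossover
# operators (1-point : 2-point : 1-point substring), as tuned in Table V.
CROSSOVER_WEIGHTS = (0.15, 0.15, 0.70)

def one_point(p1, p2):
    # Classic 1-point crossover: swap the tails after a random cut.
    c = random.randrange(1, len(p1))
    return p1[:c] + p2[c:], p2[:c] + p1[c:]

def two_point(p1, p2):
    # 2-point crossover: swap the middle segment between two random cuts.
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def substring_one_point(p1, p2, k):
    # A plausible reading of 1-point substring crossover: 1-point crossover
    # restricted to one randomly chosen stage substring of length k, with
    # the remaining substrings inherited unchanged.
    s = random.randrange(len(p1) // k)       # which stage substring
    c = s * k + random.randrange(1, k)       # cut point inside that substring
    e = (s + 1) * k
    return (p1[:c] + p2[c:e] + p1[e:],
            p2[:c] + p1[c:e] + p2[e:])

def stochastic_crossover(p1, p2, k=4, pc=0.6):
    # With probability 1 - pc the parents pass through unchanged; otherwise
    # one operator is drawn from the biased roulette wheel.
    if random.random() > pc:
        return p1[:], p2[:]
    op = random.choices(("1pt", "2pt", "sub"), weights=CROSSOVER_WEIGHTS)[0]
    if op == "1pt":
        return one_point(p1, p2)
    if op == "2pt":
        return two_point(p1, p2)
    return substring_one_point(p1, p2, k)
```

All three operators exchange complementary segments, so every child gene comes from one of the two parents at the same position.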
[Fig. 4. Convergence characteristics of the GA-based methods in Case 1 (objective value in million dollars versus generation).]

[Fig. 5. Observed execution time for the number of stages.]

TABLE VIII. COSTS OF THE BEST SOLUTIONS OBTAINED BY EACH SOLUTION METHOD

Solution Method    Case 1 (14-year Study Period)    Case 2 (24-year Study Period)
DP                 11164.2                          unknown
TCDP               11207.7                          16746.7
SGA                11310.5                          16765.9
IGA1               11238.3                          16759.2
IGA2               11214.1                          16739.2
IGA3               11184.2                          16644.7
Table VIII summarizes the costs of the best solution obtained by each solution method. In Case 1, the solution obtained by IGA3 is within 0.18% of the global solution cost, while the solutions by SGA and TCDP are within 1.3% and 0.4%, respectively. In Case 1 and Case 2, IGA3 has achieved a 0.21% and 0.61% improvement in costs over TCDP, respectively. Although SGA and the IGAs have failed to find the global solution, all IGAs have provided better solutions than SGA. Furthermore, the solutions of IGA3 are better than those of TCDP in both cases, which implies that it can overcome a local optimal trap in a practical long-term GEP. Table IX summarizes the generation expansion plans of Case 1 and Case 2 obtained by IGA3.

The execution time of the GA-based methods is much longer than that of TCDP; IGA3 requires approximately 3.7 and 6 times the execution time in Case 1 and Case 2, respectively. However, it is much shorter than that of the conventional DP. Fig. 5 shows the observed execution times of IGA3 and DP as the number of stages is expanded. The execution time of IGA3 is almost linearly proportional to the number of stages, while that of DP increases exponentially. In the system with 11 stages, DP takes over 9 days and requires about 1.2 million array memory elements to obtain the optimal solution, while IGA3 takes only 11 hours to get the near-optimum.

TABLE IX. GENERATION EXPANSION PLANS OF CASE 1 AND CASE 2 OBTAINED BY IGA3 (NUMBER OF UNITS)¹

Year    Oil (200MW)    LNG C/C (450MW)    Coal (500MW)    PWR (1000MW)    PHWR (700MW)
1998    3 (5)          2 (1)              2 (3)           0 (1)           2 (0)
2000    5 (6)          3 (1)              5 (6)           0 (1)           4 (1)
2002    5 (7)          3 (1)              5 (6)           0 (2)           4 (1)
2004    8 (10)         7 (3)              6 (7)           0 (2)           4 (1)
2006    10 (12)        10 (3)             6 (7)           0 (2)           6 (2)
2008    10 (13)        10 (3)             6 (9)           0 (2)           6 (2)
2010    10 (13)        10 (3)             6 (9)           0 (2)           6 (4)
2012    14             11                 8               1               7
2014    17             14                 8               1               7
2016    19             15                 10              1               9
2018    19             17                 10              3               9
2020    20             18                 12              3               9

¹ The figures within parentheses denote the results of IGA3 in Case 1.

The proposed method definitely provides quasi-optimums in a long-term GEP within a reasonable computation time. Also, the results of the proposed IGA method are better than those of the TCDP employed in the WASP, which is viewed as a very powerful and computationally feasible model for a practical long-term GEP problem. Since a long-range GEP problem deals with a large amount of investment, a slight improvement by the proposed IGA method can result in substantial cost savings for electric utilities.
The developed IGA method can simultaneously overcome the 'curse of dimensionality' and the local optimum trap inherent in GEP problems. Therefore, the proposed IGA approach can be used as a practical planning tool for real-system-scale long-term generation expansion planning.

7. REACTIVE POWER PLANNING

The reactive power, or VAR, planning problem is a nonlinear optimization problem. Its main object is to find the most economic investment plan for new reactive sources at selected load buses which will guarantee a proper voltage profile and the satisfaction of operational constraints. Usually the planning problem is divided into operational and investment planning subproblems. In the operational planning problem, the available shunt reactive sources and transformer tap-settings are optimally dispatched at minimal operation cost. In the investment planning problem, new reactive sources are optimally allocated over a planning horizon at a minimal total (operational and investment) cost.

During the past decade there has been a growing concern in power systems about reactive power operation and planning. Recent approaches to the VAR planning problem have become very sophisticated in minimizing installation cost and in the efficient use of VAR sources to improve system performance. Various mathematical optimization formulations and algorithms have been developed, in most cases using nonlinear [22], linear [23], or mixed-integer programming [24], and decomposition methods [25-28]. More recently, simulated annealing [29] and genetic algorithms [16][30] have also been used. With the help of powerful computers, it is now possible to perform a large amount of computation in order to achieve a global optimal instead of a local optimal solution.

The simulated annealing method is a random search method. Hsiao et al. [28] provided an approach for the simulated annealing method using the modified fast decoupled load flow. However, only the new configuration (VAR installation) is checked with the load flow, and existing resources such as generators and regulating transformers are not fully exploited. The Simple Genetic Algorithm (SGA) is a powerful optimization technique analogous to the natural genetic process in biology. Theoretically, this technique converges to the global optimum solution with probability one, provided that certain conditions are satisfied. The SGA is known as a robust optimization method, useful especially when other optimization methods fail to find the optimal solution. However, it often requires too many repeated computations to obtain final results.

In order to obtain a good result for the reactive power planning problem, a synthetic optimization procedure is presented by combining two optimization methods: random search and a conventional optimization algorithm. This chapter presents an improved method of operational and investment planning by using a simple genetic algorithm combined with the successive linear programming method. The Benders cuts are constructed during the SGA procedure to enhance the robustness and reliability of the algorithm. The method takes advantage of both the robustness of the SGA and the accuracy of the conventional optimization method.

The proposed VAR planning approach is in the form of a two-level hierarchy. In the first level, the SGA is used to select the location and the amount of reactive power sources to be installed in the system. This selection is passed on to the operation optimization subproblem in the second level in order to solve the operational planning problem. It is common practice to use a successive linear programming (LP) formulation to improve the computation speed and to enhance the computation accuracy; the LP method is fast and robust. The operational planning problem is decoupled into coupled real (P) and reactive (Q) power optimization modules; the successive linearized formulation of the P-Q optimization modules speeds up computation and allows the LP to be used in finding the solution of the nonlinear problem [31]. The dual variables in the LP are transferred from the P-Q optimization modules to the SGA module in the first level to set up the Benders cut for investment planning. This hierarchical optimization approach allows the SGA to obtain correct VAR installations and, at the same time, satisfy all the operational constraints and the requirement of minimum operation cost.

8. DECOMPOSITION OF THE REACTIVE POWER PLANNING PROBLEM

The reactive power planning problem is to determine the optimal investment of VAR sources over a planning horizon. The cost function to be minimized is the sum of the operation cost and the investment cost. The investment cost is the cost to install new shunt reactive power compensation devices for the system. The fuel cost for generation is the only operation cost considered in this chapter.

8.1. Investment-Operation Problem

The reactive power planning problem involves both operation and investment costs, and it can be written in the following form:

    min_{Y,U} f(Y,U) = L_o(Y) + L_u(U)          (12a)

subject to

    G_1(Y,U) <= 0                               (12b)
    G_2(U) <= 0                                 (12c)

where

Y = [P^T, V^T, N^T]^T : vector of operational variables
P : vector of real power generations
V : vector of bus voltage magnitudes
N : vector of tap-settings
U : vector of investment variables
L_o(Y) : operation cost
L_u(U) : investment cost
G_1(Y,U) : constraint involving both Y and U

(i) Assuming a feasible investment U, the feasible decision Y is obtained by solving the Y (operation) subproblem:

    min_Y f(Y)                                  (14a)

subject to

    H(Y) <= b - BU                              (14b)

(ii) Having found the optimal Y from the first stage, the decision for the feasible investment U is obtained by solving the U (investment) subproblem:

    min_{U,theta} Z = C^T U + theta             (15a)

9. SOLUTION ALGORITHM FOR VAR PLANNING

The planning methodology developed here is applied to the reactive power planning problem. The problem is decomposed into investment and operation subproblems, and solved iteratively until convergence [26].

The operation subproblem is further decomposed into economic real (P) and reactive (Q) power dispatch problems to minimize the fuel cost function [31][33].
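The alternation between the operation subproblem (14) and the investment subproblem (15) can be illustrated on a deliberately tiny, invented one-bus example: the operation subproblem is solved in closed form, its dual value builds a Benders cut, and the master (investment) problem is solved by grid search in place of an LP. The coefficients Q_INV, C_OP, and DEMAND are hypothetical, and this is only a sketch of the decomposition idea, not the chapter's formulation.

```python
# Toy data: buying u MVAR of capacitors (investment price Q_INV per MVAR)
# reduces the reactive power that must come from expensive generation
# (price C_OP per MVAR) needed to meet a demand DEMAND.
Q_INV, C_OP, DEMAND = 1.0, 3.0, 10.0

def operation_subproblem(u):
    # Closed-form solution of the operation subproblem for a fixed
    # investment u, returning its cost and the dual value (marginal
    # value of one more unit of u), which is what builds the Benders cut.
    shortfall = max(0.0, DEMAND - u)
    dual = C_OP if shortfall > 0 else 0.0
    return C_OP * shortfall, dual

def benders_planning(max_iter=20):
    cuts = []          # each cut: theta >= cost_k - dual_k * (u - u_k)
    u = 0.0            # initial investment guess
    total = None
    for _ in range(max_iter):
        cost, dual = operation_subproblem(u)
        cuts.append((cost, dual, u))
        # Master (investment) problem: min Q_INV*u + theta subject to the
        # cuts; solved here by brute-force grid search instead of an LP.
        best_u = best_tot = None
        for i in range(151):
            uu = i * 0.1
            theta = max(0.0, *[ck - dk * (uu - uk) for ck, dk, uk in cuts])
            tot = Q_INV * uu + theta
            if best_tot is None or tot < best_tot - 1e-12:
                best_u, best_tot = uu, tot
        if abs(best_u - u) < 1e-9:   # master proposes the same u: converged
            return best_u, best_tot
        u, total = best_u, best_tot
    return u, total
```

For these coefficients the analytic optimum is u = 10 with total cost 10, which the loop reaches after the first cut is added.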
In the P module, the optimal values of real power generation are obtained, and in the Q module, the optimal values of bus voltage magnitudes and transformer tap-settings. In addition, the optimal values of reactive power dispatched by the generators and compensators are also obtained.

In each population, the total operating and investment costs are calculated for each investment. The fitness is simply the inverse of this total cost. The ratio of the average fitness to the maximum fitness of the population is computed, and generation is repeated until

    (average fitness) / (maximum fitness) >= dP

where dP is a given number that represents the degree of satisfaction. If convergence has been reached at the given accuracy, then the optimal values for investment are found. Other criteria, such as the difference between the maximum and minimum fitnesses and the rate of increase in maximum fitness, can also be used as stopping criteria. Another possibility is to stop the algorithm after some finite number of generations and designate the best fit from the population as the result.

The iterative process is as follows:

Step 1. Initial population generation: compute the fitness of each member according to the operation subproblem results.
Step 2. Generate a new population: the typical SGA operations of reproduction, crossover, and mutation are used. The Benders cut is used on a subset of strings to obtain one new and better member of the population.
Step 3. Compute the fitness of the new generation.
Step 4. If the convergence condition is satisfied, stop the computation. Otherwise, return to Step 2 and begin a new generation.

The most important step is Step 2. A new population is generated according to the fitness of the old population through the simulated spin of a weighted roulette wheel as in the SGA [1]. Some modifications are made to the SGA for our planning problem, resulting in a modified SGA (MSGA):

(1) In the generalized Benders decomposition (GBD), the iteration procedure is an alternate computation between investment and operation until convergence is reached. Here, the Benders cuts are selected and constructed from the old population and used to obtain a new member of the population. The number of cuts can be adjusted as a part of the procedure. Some better-fitted strings and some worse-fitted strings are selected to construct the cuts. The Benders cut helps in narrowing down the space of possible solutions, and thus speeds up the convergence.

(2) An abandoning rate is considered in giving up some poor alternatives by sorting the fitness of the alternatives.

(3) Different crossovers are also considered, that is, tail-tail crossover and head-tail crossover, with the crossover position selected randomly. The head-tail crossover can also be used to produce new strings from two identical parents.

In the original SGA, only the fitness value resulting from the operation subproblem is used to generate a new generation. However, the new population generated only by fitness is random and blind. By using the Benders cut, which makes use of both the dual variable information and the cost function, a new and better string can be found. If this new string is a good one (it may be the best one), i.e., it has a higher fitness value, it will survive to the next generation. Otherwise, it will likely die afterwards. In this method, the robust characteristics of the SGA can still be maintained, while at the same time the chance of finding the optimal result earlier is increased. The Benders cut can be set up without difficulty because all variables are made available when the operation optimization subproblem is solved.

10. SIMULATION RESULTS

The systems tested and described here are the 6-bus and 30-bus networks. The emphasis is on the effectiveness of the technique and the validity of the results. The following parameters are used for the SGA program:

mutation rate: 0.01
crossover rate: 1.0
abandoning rate: 0.9
parameter resolution: 3 bits per substring

10.1. The 6-bus System

The 6-bus system given in [31] is considered, which has two generators, at buses 1 and 2; two load buses, 4 and 6, are used for shunt reactive compensation. The initial load flow results show that, with no reactive compensation, there are undervoltages at load buses 3 through 6. Thus the reactive power supply from the generators is not adequate to maintain the required voltage profile.

Table X shows the maximum, minimum and average fitness obtained by the simple genetic method (SGA) and the modified simple genetic method (MSGA). Both methods give the same final results. However, the MSGA method needs fewer iterations than the SGA method.

After the reactive power planning is completed, the total reactive power compensation is summarized in Table XI. It is observed that the voltage profile is within the operating range of 0.90-1.15 p.u.; both voltage limits are satisfied. The total cost is decreased from 5619.53 to 5390.78. The final operation cost for the optimization without capacitor investment is 5397.78, higher than the optimal result. The column for Test 2 corresponds to the result obtained by using the Benders decomposition method. Its total cost is also higher than that of the modified SGA method.
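A minimal sketch of the Step 1-4 loop above: fitness is taken as the inverse of total cost, selection uses the roulette wheel, and the run stops when the average-to-maximum fitness ratio reaches dP. The cost function is an invented stand-in for what the operation subproblem would return, and the Benders-cut improvement and abandoning-rate refinements of the MSGA are omitted for brevity.

```python
import random

def decode(bits, k=3, step=10.0):
    # Each k-bit substring encodes one capacitor size (3 bits per
    # substring, matching the parameter resolution listed above).
    return [int("".join(map(str, bits[i:i + k])), 2) * step
            for i in range(0, len(bits), k)]

def total_cost(u):
    # Invented stand-in for the operation-plus-investment cost that the
    # operation subproblem would return for an installation u (MVAR).
    demand = (40.0, 25.0)
    operation = sum((d - x) ** 2 for d, x in zip(demand, u))
    investment = 0.5 * sum(u)
    return operation + investment + 1.0

def sga_planning(pop_size=30, max_gen=100, pc=1.0, pm=0.01, delta_p=0.999):
    n_bits = 6                      # two candidate buses, 3 bits each
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(max_gen):
        fits = [1.0 / total_cost(decode(s)) for s in pop]   # fitness = 1/cost
        if sum(fits) / len(fits) >= delta_p * max(fits):    # avg/max >= dP
            break
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=fits, k=2)  # roulette wheel
            if random.random() < pc:                         # 1-point crossover
                cut = random.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [b ^ 1 if random.random() < pm else b for b in child]
            new_pop.append(child)
        pop = new_pop
    best = min(pop, key=lambda s: total_cost(decode(s)))
    return decode(best), total_cost(decode(best))
```

The MSGA described above would additionally replace a few roulette-selected strings per generation with strings improved via Benders cuts.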
Fig. 7 shows the iteration result for the test case using only the SGA, where there were 325 crossovers and 141 discards.

It can be seen that when the Benders cuts are added in the MSGA, only 2 generations are needed to find the optimal result. After that, the optimal result is still maintained during later iterations. As indicated in Fig. 7, the SGA method needs 18 generations to find the final result. Due to random search, the optimal result can only be reached after a considerable number of iterations. The convergence procedure is slower than that of the MSGA method.

TABLE X. THE FITNESS VALUES FOR THE 6-BUS SYSTEM.

Method (Gen.)    Population    min       avg.      max
SGA (11)         initial       0.2526    0.2543    0.2559
                 final         0.2556    0.2558    0.2561
MSGA (9)         initial       0.2526    0.2543    0.2559
                 final         0.2547    0.2556    0.2561

TABLE XI. SUMMARY OF RESULTS FOR THE 6-BUS SYSTEM.
[Only fragments of Table XI are legible. The two bus-voltage rows read 0.9, 1.0, 0.82, 0.968, 0.995, 0.996 and 0.9, 1.0, 0.87, 0.946, 0.979, 0.995; the C6 row reads 0.0, 30.0, 0.0, 0.0, 20.0, 30.0. The table also lists the reactive outputs Q1 and Q2 and the capacitor addition C4.]

[Fig. 6. MSGA iteration result.]
improved from the undervoltages seen in the initial load flow to the required operating range. It was also seen that new shunt capacitors are installed at or near the load buses that exhibit undervoltage violations. The tests show that the MSGA method is robust and gives good results, which include the global minimum as a solution.

The SGA needs more CPU time than an analytical optimization method. However, the SGA is flexible, robust, and easy to modify. There is no need for assumptions of linearity, convexity, and so on. As shown, the method can easily be combined with other methods, and heuristic experience can be added without difficulty. With the help of high-speed computers, an efficient optimization method for the operation subproblem, and the parallel nature of the SGA, the MSGA promises to be a useful tool for planning problems.

11. CONCLUSIONS

This chapter introduced an improved genetic algorithm (IGA) for the long-term least-cost generation expansion planning (GEP) problem. The IGA includes several improvements, such as the incorporation of an artificial initial population scheme, a stochastic crossover technique, elitism, and a scaled fitness function. The IGA has been successfully applied to long-term GEP problems. It provided better solutions than the conventional SGA. Moreover, by incorporating all the improvements, it was found to be robust in providing quasi-optimums within a reasonable computation time, and to yield better solutions than the TCDP employed in WASP. Contrary to the DP, the computation time of the proposed IGA is linearly proportional to the number of stages. The developed IGA method can simultaneously overcome the 'curse of dimensionality' and the local optimum trap inherent in GEP problems. Therefore, the IGA approach can be used as a practical planning tool for real-system-scale long-term generation expansion planning.

A synthetic method of reactive power planning was also presented. Different from the conventional SGA, which mainly uses the objective function for its fitness evaluation, the approach presented in this chapter, the MSGA, makes use of not only the objective function but also the dual variable information. The SGA is a random search algorithm, useful in finding the global optimal solution. The new formulation of the Benders method for investment-operation decomposition improves the robustness of the random algorithm. Tests show that the MSGA method is robust and gives good results, which include the global minimum as a solution.

The SGA needs more CPU time than an analytical optimization method. However, the SGA is flexible, robust, and easy to modify. There is no need for assumptions of linearity, convexity, and so on. As shown, the method can easily be combined with other methods, and heuristic experience can be added without difficulty. With the help of high-speed computers, an efficient optimization method for the operation subproblem, and the parallel nature of the SGA, the MSGA promises to be a useful tool for planning problems.

12. ACKNOWLEDGEMENT

This work is supported in part by the Korea Science and Engineering Foundation and the National Science Foundation under Grants INT-9605028 and ECS-9705105. The author would like to acknowledge the contributions of R. Dimeo, X. Bai, Y. M. Park, J. B. Park, J. R. Won, M. Mangoli, L. T. O. Youn, and J. L. Ortiz.

13. REFERENCES

[1] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Massachusetts, 1989.
[2] D. C. Walters and G. B. Sheble, "Genetic algorithm solution of economic dispatch with valve point loading," IEEE Trans. on Power Systems, Vol. 8, No. 3, 1993, pp. 1325-1332.
[3] P. H. Chen and H. C. Chang, "Large-scale economic dispatch by genetic algorithm," IEEE Trans. on Power Systems, Vol. 10, No. 4, 1995, pp. 1919-1926.
[4] D. Dasgupta and D. R. McGregor, "Thermal unit commitment using genetic algorithms," IEE Proc.-Gener. Transm. Distrib., Vol. 141, No. 5, 1994, pp. 459-465.
[5] G. B. Sheble, T. T. Maifeld, K. Brittig, and G. Fahd, "Unit commitment by genetic algorithm with penalty methods and a comparison of Lagrangian search and genetic algorithm economic dispatch algorithm," Int. Journal of Electric Power & Energy Systems, Vol. 18, No. 6, 1996, pp. 339-346.
[6] K. Iba, "Reactive power optimization by genetic algorithm," IEEE Trans. on Power Systems, Vol. 9, No. 2, 1994, pp. 685-692.
[7] K. Y. Lee, X. Bai, and Y. M. Park, "Optimization method for reactive power planning using a genetic algorithm," IEEE Trans. on Power Systems, Vol. 10, No. 4, 1995, pp. 1843-1850.
[8] K. Y. Lee and F. F. Yang, "Optimal reactive power planning using evolutionary algorithms: A comparative study for evolutionary programming, evolutionary strategy, genetic algorithm, and linear programming," IEEE Trans. on Power Systems, Vol. 13, No. 1, 1998, pp. 101-108.
[9] R. Dimeo and K. Y. Lee, "Boiler-turbine control system design using a genetic algorithm," IEEE Trans. on Energy Conversion, Vol. 10, No. 4, 1995, pp. 752-759.
[10] Y. Zhao, R. M. Edwards, and K. Y. Lee, "Hybrid feedforward and feedback controller design for nuclear steam generators over wide range operation using genetic algorithm," IEEE Trans. on Energy Conversion, Vol. 12, No. 1, 1997, pp. 100-106.
[11] J. B. Park, Y. M. Park, J. R. Won, and K. Y. Lee, "An improved genetic algorithm for generation expansion planning," IEEE Trans. on Power Systems, Vol. 15, No. 3, August 2000, pp. 916-922.
[12] S. T. Jenkins and D. S. Joy, Wien Automatic System Planning Package (WASP) - An Electric Utility Optimal Generation Expansion Planning Computer Code, Oak Ridge National Laboratory, Oak Ridge, Tennessee, ORNL-4945, 1974.
[13] Electric Power Research Institute (EPRI), Electric Generation Expansion Analysis System (EGEAS), EPRI EL-2561, Palo Alto, CA, 1982.
[14] S. Nakamura, "A review of electric production simulation and capacity expansion planning programs," Energy Research, Vol. 8, 1984, pp. 231-240.
[15] P. Masse and R. Gilbrat, "Application of linear programming to investments in the electric power industry," Management Science, Vol. 3, No. 2, 1957, pp. 149-166.
[16] J. A. Bloom, "Long-range generation planning using decomposition and probabilistic simulation," IEEE Trans. on Power Apparatus and Systems, Vol. 101, No. 4, 1982, pp. 797-802.
[17] Y. M. Park, K. Y. Lee, and L. T. O. Youn, "New analytical approach for long-term generation expansion planning based on the maximum principle and Gaussian distribution function," IEEE Trans. on Power Apparatus and Systems, Vol. 104, 1985, pp. 390-397.
[18] A. K. David and R. Zhao, "Integrating expert systems with dynamic programming in generation expansion planning," IEEE Trans. on Power Systems, Vol. 4, No. 3, 1989, pp. 1095-1101.
[19] A. K. David and R. Zhao, "An expert system with fuzzy sets for optimal planning," IEEE Trans. on Power Systems, Vol. 6, No. 1, 1991, pp. 59-65.
[20] Y. Fukuyama and H. Chiang, "A parallel genetic algorithm for generation expansion planning," IEEE Trans. on Power Systems, Vol. 11, No. 2, 1996, pp. 955-961.
[21] Y. M. Park, J. B. Park, and J. R. Won, "A genetic algorithms approach for generation expansion planning optimization," Proc. of the IFAC Symposium on Power Systems and Power Plant Control, Pergamon, UK, 1996, pp. 257-262.
[22] R. Billington and S. S. Sachdev, "Optimum network VAR planning by nonlinear programming," IEEE Trans. on Power Apparatus and Systems, PAS-92, 1973, pp. 1217-1225.
[23] G. T. Heydt and W. M. Grady, "Optimal var siting using linear load flow formulation," IEEE Trans. on Power Apparatus and Systems, July/Aug. 1975, pp. 1214-1222.
[24] K. Aoki, M. Fan, and A. Nishikori, "Optimal var planning by approximation method for recursive mixed integer linear programming," IEEE Trans. on Power Systems, PWRS-3, No. 4, 1988, pp. 1741-1747.
[25] K. Y. Lee, J. L. Ortiz, Y. M. Park, and L. G. Pond, "An optimization technique for reactive power planning of subtransmission network under normal operation," IEEE Trans. on Power Systems, PWRS-1, 1986, pp. 153-159.
[26] M. K. Mangoli, K. Y. Lee, and Y. M. Park, "Optimal long-term reactive power planning using decomposition technique," Electric Power Systems Research, 26, 1993, pp. 41-52.
[27] R. Nadira, W. Lebow, and P. Usoro, "A decomposition approach to preventive planning of reactive volt-ampere (VAR) source expansion," IEEE Trans. on Power Systems, Vol. PWRS-2, No. 1, Feb. 1987.
[28] Y. T. Hsiao, C. C. Liu, H. D. Chiang, and Y. L. Chen, "A new approach for optimal VAR sources planning in large scale electric power systems," IEEE Trans. on Power Systems, Vol. 8, No. 3, August 1993, pp. 988-996.
[29] C. R. Reeves, Modern Heuristic Techniques for Combinatorial Problems, Halsted Press (an imprint of John Wiley & Sons), New York, 1993.
[30] V. Miranda, J. V. Ranito, and L. M. Proenca, "Genetic algorithms in optimal multistage distribution network planning," IEEE PES Winter Meeting, #94 WM 229-5-PWRS.
[31] M. K. Mangoli, K. Y. Lee, and Y. M. Park, "Optimal real and reactive power control using linear programming," Electric Power Systems Research, 26, 1993, pp. 1-10.
[32] A. M. Geoffrion, "Generalized Benders decomposition," Journal of Optimization Theory and Applications, Vol. 10, No. 4, 1972, pp. 237-261.
[33] K. Y. Lee, Y. M. Park, and J. L. Ortiz, "United approach to optimal real and reactive power dispatch," IEEE Trans. on Power Apparatus and Systems, PAS-104, 1985, pp. 1147-1153.

Additional References on Reactive Power Planning

[34] H. H. Happ, "Optimal power dispatch: A comprehensive survey," IEEE Trans. on Power Apparatus and Systems, PAS-96, 1977, pp. 841-854.
[35] O. Alsac, J. Bright, M. Prais, and B. Stott, "Further developments in LP-based optimal power flow," IEEE Trans. on Power Systems, Vol. 5, No. 3, August 1990, pp. 697-711.
[36] E. Hobson, "Network constrained reactive power control using linear programming," IEEE Trans. on Power Apparatus and Systems, PAS-99, No. 4, 1980, pp. 1040-1047.
[37] R. Fernandes, F. Lange, R. Burchett, H. Happ, and K. Wirgau, "Large scale reactive power planning," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-102, No. 5, May 1983.
[38] H. H. Happ and K. A. Wirgau, "Static and dynamic var compensation in system planning," IEEE Trans. on Power Apparatus and Systems, PAS-97, No. 5, 1978, pp. 1564-1578.
[39] A. Hughes, G. Jee, P. Hsiang, R. R. Shoults, and M. S. Chen, "Optimal reactive power planning," IEEE Trans. on Power Apparatus and Systems, PAS-100, 1981, pp. 2189-2196.
[40] R. R. Shoults and M. S. Chen, "Reactive power control by least square minimization," IEEE Trans. on Power Apparatus and Systems, PAS-95, 1976, pp. 397-405.
[41] W. M. Lebow, R. K. Mehra, R. Nadira, R. Rouhani, and P. B. Usoro, "Optimization of reactive volt-ampere (VAR) sources in system planning," EPRI Report EL-3729, Project 2109-1, Vol. 1, Nov. 1984.
Chapter 11
Network Planning
Abstract: A key feature of the application of iterative approximation methods such as simulated annealing, genetic algorithms and tabu search to power network problems is the codification of the problem and the consequential definition of the neighborhood of a given configuration. This chapter addresses the different ways codification and neighborhood definition can be made in connection with four important network problems: optimal reconfiguration of distribution systems aiming to optimize operations, capacitor placement in primary distribution feeders, optimal distribution system expansion (the addition of substations and feeders), and the optimal expansion of transmission networks.

Keywords - network expansion planning, distribution system planning, capacitor placement, optimization methods, combinatorial optimization.

1. INTRODUCTION

This chapter presents the application of three iterative approximation algorithms (simulated annealing, genetic algorithms, and tabu search) to four important power network planning problems. By planning it is understood both operations and expansion planning. The first problem is the reconfiguration of primary distribution systems, which is a typical operations planning problem. The other three problems are the allocation of capacitors in distribution systems, distribution system expansion planning, and power network expansion planning, which are expansion planning problems. Both small example systems and real-world cases are presented.

The chapter is organized as follows. First, the mathematical formulations of the example problems are presented, which include the definition of the objective functions and the main constraints and variables involved in the problems. Next, the codifications of the problems for the different techniques (simulated annealing, genetic algorithms and tabu search) are presented, along with the neighborhood structures. Then implementation and algorithm details are discussed. Finally, relevant results are summarized.

2. MATHEMATICAL FORMULATION OF THE PROBLEMS

This section presents the mathematical formulations of four combinatorial optimization problems that are used in the chapter to illustrate the application of three approximation algorithms, namely, simulated annealing, genetic algorithms, and tabu search. Three of these combinatorial problems are concerned with distribution systems: the distribution system expansion planning problem, the optimal capacitor placement in distribution networks, and the reconfiguration of primary distribution feeders problem. The fourth problem regards the transmission network expansion planning problem, which has already been introduced in [1]. (This chapter expands that discussion and introduces results obtained for large real-world test systems.)

2.1 Reconfiguration of Distribution Feeders

The determination of the optimal configuration of distribution feeders consists in finding the topology of a radial system, with part of the feeder sections in operation while others remain de-energized, as illustrated in Fig. 1 (this system was originally studied in [22]). The objective can be, for example, the minimization of the power losses,

    Loss = Σ_{i=1}^{n} r_i (P_i^2 + Q_i^2) / V_i^2

where P_i and Q_i are the active and reactive power flows leaving node i, V_i is the voltage magnitude at node i, r_i is the resistance of the corresponding feeder section, and Loss represents the total active power losses in all feeder sections. Alternatively, the objective can be to minimize the Load Balancing Index (LBI), formulated as follows:

    LBI = (1/n) [ Σ_{i=1}^{n} (y_i - ybar)^2 ]^{1/2}

where n is the number of primary feeders (each including one or more feeder sections), y_i is the normalized loading on feeder i (the actual loading divided by the loading limit), and ybar is the average of the normalized loadings y_i. Other objectives can be used as well, or a combination of two or more objectives, in which case the problem would become a multi-objective optimization problem.

The energized feeder sections form a forest (a set of trees) rooted at the substations (the low-voltage side of the transformers). The de-energized feeder sections, on the other hand, connect two of such trees, as illustrated in Figs. 1 and 2.
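The two objective functions above can be evaluated directly from feeder data; the sketch below uses made-up per-unit values and is not tied to the systems studied in the chapter.

```python
def total_losses(sections):
    # Loss = sum_i r_i * (P_i**2 + Q_i**2) / V_i**2 over the feeder
    # sections; each section is an (r, P, Q, V) tuple in per-unit.
    return sum(r * (p * p + q * q) / (v * v) for r, p, q, v in sections)

def load_balancing_index(loadings, limits):
    # LBI = (1/n) * sqrt(sum_i (y_i - ybar)**2), where y_i is the
    # normalized feeder loading (actual loading / loading limit).
    y = [actual / limit for actual, limit in zip(loadings, limits)]
    ybar = sum(y) / len(y)
    return (sum((yi - ybar) ** 2 for yi in y)) ** 0.5 / len(y)
```

For example, a single section with r = 0.01, P = 1.0, Q = 0.5, V = 1.0 gives Loss = 0.0125, and two identically loaded feeders give LBI = 0.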
graphs, the reconfiguration problem can be stated as fol I II III
lows: find the set of trees that lead to the minimization of
the objective function (say, the losses, or the LBI index),
satisfying (1) the voltage drop limits, (2) the capacities of
feeder sections and transformers, and (3) the power flow
equations.
An alternative formulation for the reconfiguration
problem consists in characterizing a topology by the
switch statuses (both sectionalizing switches and the tie ® ..
switches) . In this case rather than determining which 18 ,~.: 21
feeder sections are energized or not, the objective is to
determine which switches are open and which ones are 19 .... (IDe?)
closed in the optimal configuration. 5 ......•.•.··• 20
The radiality constraints, implicit to the requirement 23
@
that the topology forms a forest, makes the problem a
hard one to solve. According to [21, 26], the optimal
®
configuration for the example system given in Fig. 1 is n 1~t
14 @
...... · .. ·26·.·.
<D 25
shown in Figure 2.
Totallosses~ 466.1 kW
2.2 Optimal Capacitor Placement

Capacitor banks are added to radial distribution systems for power factor correction, loss reduction, voltage profile improvement and, in a more limited way, circuit capacity increase. With these various objectives in mind, and subject to operating constraints, optimal capacitor placement aims to determine capacitor types, sizes, locations and control schemes. Like many other combinatorial problems found in power network planning, the capacitor placement problem presents a multimodal landscape. This is a hard problem.

    min v = k_e \sum_{i=0}^{n_t} T_i P_i(x_i) + \sum_{k=1}^{n_c} f(u_k)        (1)

    s.t.  G_i(x_i, u_i) = 0,        i = 0, 1, ..., n_t
          H_i(x_i) \le 0,           i = 0, 1, ..., n_t
          0 \le u_k^i \le u_k^0,    k \in C_1, or
          0 \le u_k^i = u_k^0,      k \in C_2

where n_t represents the number of load levels in the piecewise linear load duration curve; n_c is the number of candi-
date buses (buses where capacitor allocation is allowed); G_i(x_i, u_i) = 0 represents the power flow equations for the ith load level (x_i are state variables and u_i are control variables, i.e., capacitor bank reactive powers); H_i(x_i) \le 0 are the operating constraints for the ith load level (e.g. voltage limits); u_k^0 represents the size of the capacitor bank that can be allocated to bus k; u_k^i represents the operation level of the capacitor allocated to bus k for load level i. C_1 and C_2 are the sets of candidate buses for fixed and variable capacitor banks, respectively. The objective function of Problem (1) has two parts: (a) the first part represents the cost of losses (T_i represents the fraction of time the load curve stays at level i, with losses P_i(x_i); k_e represents the energy cost in $US/kWh); (b) the second part represents

(2)

and switched (variable) capacitors whose taps can be changed according to the load level. In this case, for each capacitor k there are n_t + 1 different levels of operation, u_k^i, where i = 0, 1, ..., n_t, chosen according to the current load level so that

(3)

The capacitor optimal allocation problem formulated above can be properly solved by the modern heuristic optimization techniques discussed in this tutorial. Fig. 3 illustrates a radial distribution system with 9 buses and one substation, which will be used in subsequent sections of this chapter as an illustration of the combinatorial techniques discussed herein.

demand growth as well as geographical expansion. A special case of this problem is the so-called green field planning, where there is no initial system and an entirely new network has to be built from scratch (an illustrative case of green field planning is shown in Fig. 4 [31]). In this figure the dotted lines indicate the places where new primary feeders can be added; the data for these feeders are summarized in Table 1. There are also two alternatives for substation additions, namely, substations 1 and 2.

Fig. 4. Alternatives for the expansion of a distribution system [31].

An important constraint is the radiality condition, that is, only radial configurations are accepted, all other topologies being considered infeasible. Several alternative mathematical formulations have been suggested in the literature for the distribution expansion planning problem. For the sake of illustration, the following formulation is adopted in this chapter [32]:
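As an illustration of how a candidate plan is scored under an objective of the form (1) in Section 2.2, the sketch below prices the losses over the load-duration levels and adds the bank costs. The loss values, the bank cost function and all figures are placeholders: in practice each P_i(x_i) comes from a power flow solution for the corresponding load level.

```python
def plan_cost(T, P, u, k_e, f):
    """Cost of a candidate capacitor plan, in the spirit of (1).
    T[i]: fraction of the period spent at load level i (sums to 1)
    P[i]: losses (kW) at level i for this plan, from a power flow
    u[k]: bank size placed at candidate bus k (0 means no bank)
    k_e : energy price in $/kWh;  f(u): installed cost of a bank of size u"""
    hours = 8760.0                       # one year of operation (assumption)
    loss_cost = k_e * sum(Ti * Pi for Ti, Pi in zip(T, P)) * hours
    bank_cost = sum(f(uk) for uk in u if uk > 0)
    return loss_cost + bank_cost
```

A heuristic such as SA or GA would call a function like this once per candidate, which is why the chapter repeatedly stresses keeping neighborhoods small: every evaluation hides a power flow.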
Fig. 5. Example of a complex transmission expansion problem: the Brazilian North-Northeastern network.
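Several of the GA applications discussed in this chapter turn the minimization of a cost v into a maximization problem through a transformation of the form z = K - v, with K chosen above the worst cost in the current population, before proportional selection. A minimal sketch of that step (the plan encoding and branch costs below are invented for illustration):

```python
def fitness(population, branch_cost):
    """z = K - v fitness for proportional selection.
    population: list of plans, each a dict {branch: circuits added};
    branch_cost: cost of one circuit per branch."""
    def cost(plan):
        return sum(branch_cost[b] * n for b, n in plan.items())

    costs = [cost(p) for p in population]
    K = max(costs) + 1.0        # keeps every fitness strictly positive
    return [K - c for c in costs]
```

With K tied to the worst member of the current population, cheap plans receive proportionally larger selection probabilities while no fitness becomes zero or negative.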
B(.) - Susceptance matrix.

θ - Vector of nodal voltage angles.

γ^0 - Vector of initial susceptances, whose elements are γ^0_ij, i.e., the summation of the susceptances in branch ij at the beginning of the optimization.

n_ij - Number of circuits added in branch ij: n_ij = x_ij / γ_ij, where γ_ij is the susceptance of the new circuits.

φ_ij - Defined as the ratio φ_ij = f̄_ij / γ_ij, where f̄_ij is the maximum flow in a circuit of branch ij.

d - Vector of net demand.

g - Generation vector.

ḡ - Vector of maximum generation capacities.

r - Vector of artificial generations.

which is solved for testing the adequacy of a candidate solution; adequacy is indicated by zero loss of load. Notice that Problem (5) is always feasible due to the presence of the loss-of-load term \sum_i \alpha_i r_i in the objective function; thus, whenever a tentative solution set x_ij is inadequate, feasibility is achieved by the use of the artificial generators (loss of load).

size systems, although they present excessive computational burden when applied to larger problems (one hundred decision variables or more) such as the one illustrated in Fig. 5. Under these circumstances it was only natural to expect the recourse to iterative approximation algorithms such as the ones based on physical and biological metaphors (simulated annealing and genetic algorithms), or derived from artificial intelligence techniques (tabu search) [37, 41].

3. CODIFICATION AND NEIGHBORHOOD STRUCTURE

In this section various codification approaches are discussed in connection with the four problems formulated in the previous section, namely, the reconfiguration of distribution feeders, the optimal placement of capacitors in distribution systems, the expansion of distribution systems, and the expansion of transmission networks.

This codification can be improved by ordering the feeder sections such that the ones that are in operation appear together, as indicated in the following:

Fig. 8. Basic configuration of the 6-bus network.

Fig. 9. A solution proposal for the transmission network expansion problem of Fig. 8.

4. IMPLEMENTATION DETAILS

Special features of the algorithms addressed in the chapter are further discussed in this section.
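The adequacy test mentioned above (a candidate plan is adequate when its loss of load is zero) can be mimicked with a deliberately crude model: treat line limits as pure transport capacities and compute, via max-flow, how much of the demand can be served; the shortfall plays the role of the artificial generations r. The chapter's actual test also enforces the DC power flow equations, which this sketch ignores, and all data in the example are invented.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a nested capacity dict cap[u][v]."""
    flow = 0
    while True:
        prev = {s: None}                 # BFS for an augmenting path
        q = deque([s])
        while q and t not in prev:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in prev:
                    prev[v] = u
                    q.append(v)
        if t not in prev:
            return flow
        path, v = [], t                  # recover path and its bottleneck
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] = cap[v].get(u, 0) + push
        flow += push

def loss_of_load(lines, gen_cap, demand):
    """lines: {(i, j): capacity}; gen_cap: {bus: max generation};
    demand: {bus: load}. Returns unserved load (0 means adequate)."""
    cap = defaultdict(dict)
    for (i, j), c in lines.items():      # parallel circuits add up
        cap[i][j] = cap[i].get(j, 0) + c
        cap[j][i] = cap[j].get(i, 0) + c
    for b, g in gen_cap.items():         # super-source feeds generators
        cap["S"][b] = g
    for b, d in demand.items():          # loads drain to a super-sink
        cap[b]["T"] = d
    served = max_flow(cap, "S", "T")
    return sum(demand.values()) - served
```

A nonzero result corresponds to the situation where the artificial generators r must supply part of the load, i.e., the tentative set x_ij is inadequate.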
4.1 Simulated Annealing

An application of SA to the capacitor placement problem was addressed in [16, 17], where the codification described in Section 3.2 has been adopted. Reference [16] also suggests the elimination of trivially infeasible solutions by avoiding placing more capacitor banks at a given bus for the intermediate load level than are added for the high load level. In that paper the concept of compound neighborhood is used, according to which the neighborhood of the current topology is found by (1) the addition/removal of single banks, (2) the addition/removal of multiple banks, and (3) the swapping of the banks added to two buses. In the SA algorithm, in each iteration, only one neighbor is randomly chosen from the neighborhood as defined above for further evaluation. Additionally, for variable banks, two strategies are proposed for changing the number of allocated banks: (1) synchronous changes, where the number of capacitor banks is changed for all load levels at the same time, and (2) asynchronous changes, where the changes are performed only for the highest load level. In the capacitor placement problem the number of infeasible topologies in a given neighborhood is relatively small compared with other problems such as the transmission network expansion problem. Thus, the SA algorithm traverses the search space stepping only on feasible solutions, infeasible candidates being identified and eliminated before the execution of the acceptance test. The cooling schedule is the standard one regarding fixed capacitor banks, whereas for variable banks a local cooling schedule is used to find the banks that should operate for each load level (the sizing problem). This local cooling schedule is normally responsible for most of the computational effort spent by the SA algorithm.

Sequential and parallel SA algorithms applied to the transmission network expansion problem are described in [2] and [3], respectively (the parallel implementation is discussed in Chapter 6 of this tutorial). In both cases, codification and neighborhood structure are as described in Section 3.4 (see also Chapter 6 of this tutorial). In [2], transitions through infeasible topologies are allowed, since the occurrence of such topologies is much more frequent than it is in the capacitor placement problem discussed above. The SA algorithm can start with a topology generated randomly or determined by a constructive algorithm. Although the SA algorithm has the tendency to destroy the initial topology in the early stages of the cooling process, where the temperature is still high and uphill moves are accepted with higher probabilities, it has been observed in practice that the results are slightly better when the initialization is made by a constructive algorithm. Best performance was observed with Garver's algorithm [4, 12].

4.2 Genetic Algorithms

A modified genetic algorithm applied to the reconfiguration problem is described in [26]. The codification adopted in this paper departs from the one described previously in this chapter, and so it is summarized in the following. In this case a topology is identified by the feeder sections that are not in operation. For the topology of Fig. 1, the codification is as follows:

Feeder section
P1 = [...]

In [26] the objective function is put into the form z(x) = K - f(x) to transform the minimization problem into a maximization problem, and proportional selection is performed. Elitism is used for selection and the four best configurations are kept in the next population, the n_p - 4 remaining members of the population being generated by recombination and mutation. The crossover operator differs from other proposals that have appeared in the literature, which is consistent with the codification that has been adopted. In the crossover operation the common elements in both participating configurations are kept in the two descendants, the differing elements being transferred by a random mechanism. For example, consider the crossover between the two following topologies:

P1 = [...]
P2 = [...]

As 26 is common to both topologies it is passed to the two descendants; the other elements, namely, 15, 17, 19 and 21, are then passed to the descendants by random choice; for instance, 17 and 19 go to the first descendant and 15 and 21 to the second, resulting in the following:

D1 = [...]
D2 = [...]

Although in this example the two resulting topologies are radial (Figs. 1 and 2), this is not always the case: in the previous example, that would have occurred if 15 and 19 had been passed to one descendant and 17 and 21 to the other, in which case both resulting topologies would be nonradial. When this happens, the topologies are altered to maintain radiality. Mutation also is designed with radiality in mind. As mentioned before, mutation is performed by randomly choosing a feeder section to be added and another, belonging to the resulting closed loop, to be removed. Consider for example that the following topology is subjected to mutation:
P1 = [...]

If branch 19 is selected for addition, then one of the following branches can be removed: 11, 12, 15, 16 and 18 (see Figs. 1 and 2). The following topology then results:

P1 = [...]

It is also worth mentioning that in [26] a variable mutation rate controlled by an SA mechanism is used, and that the initial configuration is obtained by performing n_p - 1 transitions with the mutation mechanism described above.

The software GENESIS is used in [18] to develop an application of GA to the capacitor placement problem. In this application the most attractive candidate buses for capacitor bank additions are preselected by sensitivity analysis, which is helpful in reducing the dimension of the combinatorial problem being solved. A variation of GA called the memetic algorithm was used in [19] to solve the capacitor placement problem considering a three-phase network model, which allows the modeling of unbalanced systems. Memetic algorithms are based on GA with a local optimization phase in which, once a new population is found, it is improved by means of local search. The algorithm is applied in two stages: an initial stage which consists of a conventional GA, and a second stage which performs the local search. In the first stage infeasibilities are treated by penalties; in the second stage only feasible transitions are considered. The codification used in this application is based on a vector with four parts: (1) in the first part are stored the candidate buses for capacitor placement, (2) a second part with the binary representation of the capacitors that are added per bus for each load level, (3) a third part similar to the first one but containing candidates for replacement, and (4) a fourth part similar to the second one but containing information about the way banks are operated in replacement. The initial configuration is randomly chosen. Crossover is applied to each substring of the codification vector. Remainder stochastic sampling is used to perform proportional selection.

The most critical issue in the practical application of GA to the distribution expansion problem is codification. Two alternative approaches to codification, originally proposed in [33, 34], are discussed in the following. In [33] various types of cables used in primary feeders, as well as different types of substation equipment, are considered in the optimization process. In this case the investment decisions are no longer modeled by binary variables as described earlier in the paper. In [33], for example, a case with two substation types and 12 feeder sections can be codified as follows:

[...]

where it is indicated that no circuit has been added to feeder section 1, a type 2 circuit has been added to feeder section 2, a type 1 circuit has been added to feeder section 3, and so forth. Also, a substation of type 1 has been added to substation location 1. As a matter of fact the substation type is linked to the type of circuit leaving that substation and so is not actually needed: when the circuit is built, the substation also is. This can be taken into account by the following codification, which is taken from [35]. In Fig. 11(a) are shown the initial topology along with the expansion alternatives, and in Fig. 11(b) an expansion alternative. According to [35] the codification for this solution proposal is as follows:

L1 L2 L3 L4 L5 S2
P1 = [...]

Fig. 11. Distribution network: (a) initial network and (b) a solution proposal.

The genetic operator of crossover can generate nonradial topologies. Thus, for example, for the case shown in Fig. 11, the following codification represents a nonradial solution:

L1 L2 L3 L4 L5 S2
P2 = [...]

Another interesting codification for the distribution planning problem was suggested in [34]. This can be illustrated by the situation shown in Fig. 12, which was originally studied in [35].

In the method of [34] the main idea is to keep the binary codification and at the same time minimize the occurrence of infeasible topologies. Also, decodification is an issue, in order to facilitate the evaluation of the objective function. The relevant information is then stored in a per-bus structure, which requires the proper number of bits to keep all the information regarding the feeders connected
to a bus. The idea behind this codification can be summarized as follows: (1) the number of circuits in operation is the number of buses minus one, (2) a bus with load should be connected to at least one circuit, and (3) a bus without load can be connected to zero, one or more circuits. Figure 12 can be used to illustrate this type of codification: part (a) of the figure shows the system that should be expanded, and part (b) a solution proposal. According to [35] the codification in this case should be put in the following format:

     B1   B2   B3   B4   B5   B6
     010  1    10   011  10   111
P1 = S1   L1   L3   L1   nil  nil
     L6   L6   L5   L2   L4   S2
     L7   L7   L3   L5   L2
          L4

Fig. 12. Distribution network: (a) initial network, (b) solution proposal.

where the binary numbers represent the topology of Fig. 12(b). In this case nil indicates that the bus will not be supplied, i.e., it will remain isolated. For example, in the previous codification, if B3 is represented by the binary number 10, then it means that the feeder section L7 is connected to B3. On the other hand, B6 occupies two slots with four options, and represents a plausible combination. However, if after crossover or mutation B3 is represented by 11, then it would mean a nonexistent configuration. Finally, notice that this codification can reduce the number of infeasible topologies that are generated by crossover and mutation, although this possibility is not entirely eliminated, as shown by the following example:

GA applied to the transmission network expansion problem was addressed in [6] and [7]. In [6] a decimal codification was proposed for representing the network topologies, the real part of the problem being solved via linear programming. The decimal representation requires an adaptation of the mutation mechanism. The binary representation is avoided in this case due to problems caused by the generation of infeasible solutions. This makes the codification used in the GA similar to the ones that are used in SA and TS. To allow the use of proportional selection, in [6] two types of transformations have been tested for creating a maximization problem: (1) z = 1/v and (2) z = K - v, where K is a parameter which is determined such that it is always bigger than the cost associated with the worst configuration found in the current population. As with the capacitor placement problem described above, remainder stochastic sampling was used both in [6] and in [7]. In [6] both single point and multiple point crossover were used. Although for small and medium size networks the conventional crossover approach works fine, for larger networks, such as the one of Fig. 5, building blocks are used to facilitate the generation of meaningful configurations (these building blocks are formed by sets of lines and transformers that connect islanded buses to the main network). Due to the decimal codification, mutation is modified as follows: entire paths are considered for mutation, and an acceptance test of the type used in SA is performed to avoid introducing configurations that are too ineffective. Unlike SA, the initial configurations have a definite effect on the efficiency of GA when applied to the transmission network expansion problem. Thus the heuristic algorithms of [12, 13, 14] have been used to generate a good initial population, that is, configurations containing a number of attractive building blocks.

4.3 Tabu Search

A basic TS algorithm with short term memory, tabu list and aspiration criterion applied to the capacitor placement problem is reported in [20]. In this application, long term memory based on transition frequency has also been used for diversification. As in [18], sensitivity analysis is used to select a set of attractive buses for capacitor bank additions, which reduces the size of the search space but can
also affect solution optimality. The codification is made with the help of a vector set which contains the following information: candidate bus location, capacitor setting, installed capacity, power loss, voltage magnitude, objective function value and frequency counter. A tabu list of size 15 is used, together with an aspiration criterion that relaxes the tabu restriction if the solution found is better than the incumbent. The initial topology is feasible and found by a random method. In [21] a hybrid TS algorithm with features taken from heuristic methods, GA and SA is suggested for the capacitor placement problem. The hybrid algorithm works with a population as in GA and operates in two phases: (1) phase I is a heuristic strategy that finds a variety of high quality feasible topologies, and (2) phase II is a TS strategy. In phase I topologies are found via sensitivity analysis, and variations around these topologies are found by repeated application of the heuristic method, prohibiting the use of specified banks in order to obtain different solutions. Part of these topologies will be elite configurations. All buses that appear in these solutions will be considered as candidates of the reduced candidate set in Phase II, which has three steps: (1) a TS with short term memory, tabu list and aspiration criterion is applied to each element of the population, and the solutions found can either become the new incumbent or enter the elite configuration list, provided that each is better than at least one of the configurations already in the list and has a minimum number of attributes that differ from the configurations that will remain in the list; (2) generate a new population by genetic recombination and path relinking of the current elite configurations. The neighborhood structure is as defined in Section 3.2; the number of neighbors is kept at a minimum, given that the evaluation of each one of the neighbors requires the solution of a power flow problem.

References [8, 9, 10] report the application of the TS method to the transmission network expansion problem. In [8] and [9] the codification is the same as in [5], and so the same observations made regarding the SA and GA algorithms hold true for the TS application. The neighborhood structure used in [8] comprises three parts: (1) one for the basic TS algorithm based on short term memory, where infeasible transitions are allowed (methods [12, 13, 14] are used in this part), (2) another for the intensification phase, where only feasible transitions are allowed (a greedy search is carried out), and (3) another for the diversification phase, where strategic oscillation takes the current solution back to the infeasible region (this is achieved by simply removing one or more circuits of the current configuration). In the three cases the sizes of the neighborhoods are kept at a minimum, containing only high quality alternatives. In [8] a three-step strategic os-

and (3) another TS strategy for leaving the feasible region. The overall metaheuristic is formed by the repeated application of this three-step procedure plus diversification. For a comparative evaluation of the various iterative approximation methods applied to the transmission network expansion problem see [5], where both theoretical and practical aspects are addressed.

5. CONCLUSIONS

This chapter addressed the codification and the neighborhood definition for four power network problems: the optimal reconfiguration of distribution systems aiming to optimize operations, capacitor placement in primary distribution feeders, optimal distribution system expansion (the addition of substations and feeders), and the optimal expansion of transmission networks. Although only three types of randomized iterative algorithms have been considered, namely, simulated annealing, genetic algorithms and tabu search, the conclusions can be extended to other methods, such as simulated evolution, among others.

REFERENCES

[1] Romero R., Monticelli A.: "Fundamentals of Simulated Annealing", IEEE PES Tutorial on Modern Heuristic Optimization Techniques - Application to Power Systems, 2001.

[2] Romero R., Gallego R.A., Monticelli A.: "Transmission System Expansion Planning by Simulated Annealing", IEEE Transactions on Power Systems, Vol. 11(1), pp. 364-369, February 1996.

[3] Gallego R.A., Alves A.B., Monticelli A., Romero R.: "Parallel Simulated Annealing Applied to Long Term Transmission Network Expansion Planning", IEEE Transactions on Power Systems, Vol. 12(1), pp. 181-188, February 1997.

[4] Gallego R.A.: "Planejamento a Longo Prazo de Sistemas de Transmissão Usando Técnicas de Otimização Combinatorial", Doctoral thesis, UNICAMP, 1997.

[5] Gallego R.A., Monticelli A., Romero R.: "Comparative Studies of Non-Convex Optimization Methods for Transmission Network Expansion Planning", IEEE Transactions on Power Systems, Vol. 13(2), May 1998.

[6] Gallego R.A., Monticelli A., Romero R.: "Transmission System Expansion Planning by Extended Genetic Algorithm", IEE Proceedings