Copyright 2005 John Wiley & Sons Ltd

Evolutionary algorithms make direct use of adaptive change in a population of computational agents in order to solve problems or model complex systems.

INTRODUCTION

So far, the only process that has produced intelligence is evolution. Consequently, the prospect of incorporating the principles underlying biological evolution into models of the development and performance of cognitive systems is attractive to many cognitive scientists.

The basic principles of evolution are straightforward. Even before the mechanical details of DNA and the machinery of molecular biology were understood, it was apparent to any observer that individuals of the same species differ in minor ways. Darwin (1859) realized that these minor differences could affect the number of offspring left by an individual, particularly when there are more individuals of a species than its habitat can comfortably support. Natural selection therefore acts to reduce the genetic contribution of less fit individuals to future generations, gradually causing a population to become better adapted to its environment.

Evolution, working in a blind and directionless fashion, has produced the astonishingly orderly and robust complexity of the biosphere. It is evidently a powerful optimizer, and the idea of incorporating evolutionary principles into computer algorithms has been used since the 1940s by engineers in search of optimization tools for use in complex, nonlinear systems. Evolutionary algorithms (EAs) are also a natural choice for modeling systems to whose development biological evolution has been fundamental, such as the human cognitive architecture.

COMPUTATIONAL PARADIGMS FOR EVOLUTION

'Evolutionary algorithm' is an umbrella term, covering a family of algorithms which take inspiration from the principles of biological evolution.
An EA is any algorithm that makes use of adaptive change in a population of computational agents in order to solve problems or model complex systems. There are several varieties of EA (Mitchell, 1996).

Genetic Algorithms

A genetic algorithm (GA) involves a population of individual chromosomes. A chromosome may consist of a string of bits, a string of real numbers, or a more complex composition, depending on its purpose. Each chromosome can be assigned a numerical fitness, as defined by a fitness function, which measures how well the solution encoded in that chromosome solves the problem at hand. Chromosomes are chosen to contribute to the next generation in a fitness-dependent manner, so that fitter chromosomes have more offspring than less fit chromosomes. New chromosomes are produced by copying the parents and applying genetic operators based on biological mutation and crossover. The fitness of the new generation is then assessed, and the process is iterated until a good solution is developed (Figure 1). The term 'simple genetic algorithm' is often used to refer to an algorithm as outlined in Figure 1 using a bit-string chromosome.

There are endless variations on the basic GA. The outcome may depend on the genetic operators used, how the problem is represented, selection strategies, the order of application of the operators, and how new generations are constructed.

A simple genetic algorithm. (Figure 1.)

Other Paradigms

In addition to the simple GA, there are several other evolutionary computation paradigms. The major ones are described briefly below. Despite the differences in implementation, the various algorithms are alike in utilizing stochastic, fitness-dependent selection over random variations between individuals as a tool to develop individuals well adapted to their environment.

Evolutionary programming

Evolutionary programming (Fogel, 1999) was developed by L. J. Fogel in the early 1960s. It does not use a genomic representation.
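The generate-evaluate-select loop of the simple genetic algorithm described earlier can be sketched in a few lines of Python. This is an illustrative toy, not part of the original article: the 'count the ones' fitness function, the roulette-wheel selection scheme, and all parameter values are arbitrary choices.

```python
import random

random.seed(0)  # for reproducibility only

GENOME_LEN = 20
POP_SIZE = 30
GENERATIONS = 50
MUTATION_RATE = 0.01

def fitness(chrom):
    # Toy fitness function: count the ones in the bit string.
    return sum(chrom)

def select(pop):
    # Fitness-proportionate (roulette-wheel) selection; the +1 floor
    # keeps the total weight positive even for an all-zero population.
    return random.choices(pop, weights=[fitness(c) + 1 for c in pop])[0]

def crossover(a, b):
    # Single-point crossover.
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(chrom):
    # Flip each bit with a small, fixed probability.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in chrom]

# Random initial population, then the generate-evaluate-select loop.
pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
       for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop)))
           for _ in range(POP_SIZE)]

print(max(fitness(c) for c in pop))
```

Real applications replace the fitness function with a problem-specific measure; as the text notes, the choice of operators, representation, and selection strategy can all materially affect the outcome.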
Each individual in the population is an algorithm, chosen at random from an appropriate sample space. Mutation is the only genetic operator used; there is no crossover.

Evolutionary strategies

Evolutionary strategies (Schwefel, 1995) were developed by H.-P. Schwefel, also in the 1960s, as an optimization tool. They use a real-valued chromosome, with a population of one, and mutation as the only genetic operator. In each generation, the parent is mutated to produce a descendant; if the descendant is fitter it becomes the parent for the next generation, otherwise the original parent is retained.

Classifier systems

In a classifier system (Holland, 1992), a classifier takes inputs from the environment and produces outputs indicating a classification of the input events. New classifiers are produced through the action of a genetic algorithm on the system's population of classifiers. (See Classifier Systems)

Genetic programming

The aim of genetic programming (Koza, 1999), developed by J. Koza in the late 1980s, is the automatic programming of computers: allowing programs to evolve to solve a given problem. The population consists of programs expressed as parse trees; the operators used include crossover, mutation, and architecture-altering operations patterned after gene duplication and gene deletion in nature.

Evolution as a Search Process

The fundamental process underlying all evolutionary algorithms is a heuristic search through a search space, or fitness landscape, defined by the problem representation in conjunction with the fitness function. The concept of a fitness landscape was introduced by S. Wright (1932) in the context of biological evolution. He suggested that for a given set of genes, each possible combination of gene values (alleles) could be assigned a fitness value for a particular set of conditions. The entire genotype space can then be visualized as a landscape, with genotypes of high fitness occupying peaks and those of low fitness forming troughs.
Such a fitness landscape is generally very high-dimensional, but can be visualized in two dimensions (Figure 2). The fitness landscape metaphor has proven to be powerful, and is applicable to computational as well as biological evolution. A population whose members have slightly differing genotypes is represented as a set of points in a fitness landscape. Mutation and natural selection will act to drive a population up to its nearest local maximum, which may or may not be the global maximum for the landscape. Under strong selection pressure, the population may become trapped on a suboptimal local maximum. Because of the stochastic nature of the evolutionary process, however, the population will be spread out over the landscape to some extent, and different individuals may find themselves on the slopes of different maxima, depending on the ruggedness of the landscape. The population is, in effect, performing a parallel search of the fitness landscape. (See Machine Learning)

A two-dimensional fitness landscape, showing the global maximum and one local maximum. (Figure 2.)

The Schema Theorem

Much of the research into evolutionary algorithms has been purely empirical, with EAs being used for optimization of problems involving multiple parameters, for which the problem domain, and hence the search space, is poorly understood. There are continuing efforts, however, to formulate a theoretical basis for evolutionary computation. One of the earliest attempts to produce such a formulation was the schema theorem, proposed in 1975 by Holland (Holland, 1992) and restated by Goldberg (1989) as: 'short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations'. Briefly, the idea is that a schema is a set of building blocks which can be described by a template comprising ones, zeros, and asterisks, with the asterisks representing wild cards which can take on any value.
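A schema template of this kind is straightforward to operationalize. The following sketch (our illustration; the population strings and the schema `1**0` are made up for the example) tests which chromosomes in a small population are instances of a given schema:

```python
def matches(schema, chrom):
    # A chromosome is an instance of a schema if it agrees with the
    # schema at every non-wildcard ('*') position.
    return all(s in ('*', c) for s, c in zip(schema, chrom))

# The schema 1**0 has order 2 (two fixed positions) and matches any
# four-bit string starting with 1 and ending with 0.
population = ['1010', '1110', '0110', '1011']
schema = '1**0'

instances = [c for c in population if matches(schema, c)]
print(instances)  # -> ['1010', '1110']
```

Counting instances in this way is the starting point for quantifying how selection, mutation, and crossover change a schema's representation from one generation to the next.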
The evolutionary algorithm proceeds by identifying schemata of high fitness and recombining them using crossover in order to produce entire individuals having high fitness. The theory is attractive because, for a given EA, schemata can be identified, and the effects of mutation and crossover on schemata in a population of a given size can be quantified, potentially providing useful insight into the way in which the EA functions.

There has been a large amount of investigation into the schema theorem, with often inconclusive and controversial results. Mitchell et al. (1991) designed a class of fitness landscapes which they called the 'royal road' functions. These functions produce a hierarchical fitness landscape, in which crossover between instances of fit lower-order schemata tends to produce fit higher-order schemata. However, these researchers found that the presence of relatively fit intermediate stages could in fact interfere with the production of fit higher-order solutions, since once an instance of a fit higher-order schema is discovered its high fitness allows it to spread quickly throughout the population, carrying with it 'hitchhiking' genes in positions not included in the schema. Low-order schemata are therefore discovered more or less sequentially, rather than in parallel. The extent to which the schema theorem applies in practice remains controversial. For example, Vose (1999) places little credence in the general applicability of the schema theorem, and offers an alternative mathematical approach to analyzing the behavior of the simple genetic algorithm.

Exploration Versus Exploitation

An EA attempts to find optimal areas of its search space by discovering new solutions (exploration) as well as by maintaining fit solutions that have already been discovered (exploitation).
Achieving a good balance between exploration and exploitation is important to the efficacy of the algorithm: too little exploration means that important areas of the search space may be ignored, while too little exploitation increases the risk of prematurely discarding good solutions.

The trade-off between exploration and exploitation is often studied in the context of the n-armed bandit problem. The n-armed bandit is an extension of the one-armed bandit, or casino slot machine. Instead of having a single lever which can be pulled, the bandit has n levers, each of which has an expected reward. Since the expected reward associated with each lever is unknown, a strategy must be developed to balance exploitation of knowledge already gained (by playing various arms) with exploration of the behavior of untested arms. Exploitation of the best action discovered so far will maximize expected reward on a single play, but exploration may lead to a better total reward in the long run. The n-armed bandit provides a basis for a mathematical formulation of the exploration-exploitation trade-off for an EA. The optimal strategy involves exponentially increasing the probability of selecting the best solutions over time. (See Holland, 1992, chaps. 5 and 10 for a detailed discussion of the n-armed bandit and its application to schema theory.)

Evolution Versus Hill Climbing

The power of an EA is often assumed to lie in its implicit parallelism: by maintaining a population of candidate solutions that are modified by mutation or crossover, the algorithm is, in effect, exploring different regions of its search space in parallel. The simplest alternative to an EA is a hill climber, an algorithm which maintains a population of one individual and performs a strictly local search using mutation. There are many variations of the hill climber algorithm; most involve mutating the current best candidate and accepting the mutated individual if it is fitter than the original.
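Such a hill climber can be sketched as follows. This is our own minimal illustration, reusing a 'count the ones' fitness; the names and parameter values are invented, and the multiple-restart scheme anticipates the comparison with population-based EAs discussed next.

```python
import random

random.seed(1)  # for reproducibility only

GENOME_LEN = 20

def fitness(chrom):
    # Toy fitness function: count the ones in the bit string.
    return sum(chrom)

def mutate(chrom):
    # Flip one randomly chosen bit: a strictly local move.
    i = random.randrange(len(chrom))
    return chrom[:i] + [chrom[i] ^ 1] + chrom[i + 1:]

def hill_climb(steps):
    # Start from a random point and accept only improving mutations.
    best = [random.randint(0, 1) for _ in range(GENOME_LEN)]
    for _ in range(steps):
        candidate = mutate(best)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

# Multiple random restarts: 10 restarts of 100 steps each costs the
# same number of evaluations as an EA with pop_size 10 run for 100
# generations.
results = [hill_climb(steps=100) for _ in range(10)]
print(max(fitness(r) for r in results))
```

By construction, each run's fitness never decreases, but a single run can stall on a local peak; the restarts give the search its coverage of the landscape.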
A hill climber will thus typically move only to the top of the nearest peak in the fitness landscape, which may not be a global optimum. An EA is no more efficient than multiple random restarts of a hill climber, in terms of the number of evaluations performed: an EA with a population size of 100 running for 1000 generations performs 100,000 evaluations, as does an algorithm with a single population member restarted 100 times for 1000 generations each time. The EA would, however, be expected to outperform the hill climber if the action of the genetic operators used in the EA provided advantages over local search. This would be the case if the schema theorem applied as described above, with useful partial solutions discovered by different individuals being recombined to produce fitter individuals more rapidly than could be done by mutation alone. The EA would also be expected to outperform the hill climber if the structure of the fitness landscape was such that the implicit memory of a population-based algorithm (i.e., the memory encoded into the structure of the population itself as a result of evolution) allowed it to concentrate its search in areas of high fitness in a manner that would not be possible for a hill climber. In practice, hill climbers with multiple restarts often perform at least as efficiently as population-based algorithms (Mitchell et al., 1994).

COEVOLUTION

The evolutionary algorithms described so far are primarily heuristic optimizers based upon a simple model of evolution. In biological systems, however, evolution in a particular population occurs in the context of a complex mixture of endogenous and exogenous factors. The simple EA has been extended in several directions inspired by further observations from the realm of biology.

Biological evolution does not occur in isolation.
Individuals of a particular species live, breed, and die in collaboration and competition with other organisms of their own and other species, and evolutionary change in one population affects the fitness landscapes of other populations. Kauffman (1996) uses the analogy of the fitness landscape as a rubber sheet. Evolutionary change in one species deforms the sheet, not only for itself, but for all species existing in that fitness landscape.

Coevolution is often used in an EA as a mechanism for the prevention of premature convergence to a suboptimal fitness peak. A population of problems and a population of solutions evolving together should, in theory, produce better solutions, since as good solutions are found the problems against which they are tested become harder, and an evolutionary 'arms race' develops (Dawkins and Krebs, 1979). For an interesting overview of the background to coevolution in evolutionary computation, see Rosin and Belew (1997).

LEARNING AND EVOLUTION

The interaction between learning and evolution has been the subject of extensive research, both by those interested in using learning as a local search operator to improve the optimization performance of the algorithm, and by those interested in understanding the evolution and utility of learning in biological systems.

The so-called Baldwin effect (Baldwin, 1896) is based upon the idea that learning on the part of individuals could guide the course of evolution in the population as a whole. A particular trait may be learned, or it may be innate (a term which can be taken, in this context, to indicate that it is genetically determined). A learned trait has the advantage of providing flexibility, but the disadvantage of being slow to acquire; an innate trait is present from birth, but inflexible. Baldwin suggested that traits that are initially learned may, over time, become encoded in the genotype of the population.
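Baldwin's proposal can be made concrete in the style of Hinton and Nowlan's (1987) simulation, discussed further below. The sketch here is our own reconstruction, not the original code; the constants (a genome of 20 genes, 1000 random learning trials, fitness between 1 and 20) follow common descriptions of that model. Fixed alleles are 0 or 1; plastic alleles ('?') can be set by learning, modeled as random guessing of an all-ones target, and the faster an individual learns the trait, the higher its fitness.

```python
import random

random.seed(2)  # for reproducibility only

TRIALS = 1000
GENOME_LEN = 20

def evaluate(genome):
    # A fixed 0 makes the all-ones target unreachable: baseline fitness.
    if 0 in genome:
        return 1.0
    plastic = [i for i, g in enumerate(genome) if g == '?']
    # No plastic positions left: the trait is innate, no learning needed.
    if not plastic:
        return 20.0
    for trial in range(TRIALS):
        # Learning: guess a random 0/1 for every plastic position;
        # success means every guess came out 1.
        if all(random.random() < 0.5 for _ in plastic):
            remaining = TRIALS - trial - 1
            # Partial credit: fitness grows with trials left unspent.
            return 1.0 + 19.0 * remaining / TRIALS
    return 1.0  # never learned the trait within the trial budget

print(evaluate([1] * 20))          # -> 20.0 (innate trait)
print(evaluate([0] + ['?'] * 19))  # -> 1.0 (a fixed wrong allele blocks the trait)
```

Under selection on this fitness, genomes with more fixed correct alleles learn faster and leave more offspring, so initially learned settings tend to become genetically fixed: the Baldwin effect, with no Lamarckian inheritance.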
Although this suggestion appears, at first glance, to be tantamount to Lamarckism, in fact no Lamarckian phenotype-to-genotype information flow is required. For the Baldwin effect to operate, two conditions must be met. Firstly, the trait in question (which may be a behavioral or a physical trait) must be influenced by several interacting genes, so that a mutation in one of these genes will make the phenotypic expression of the trait more likely. Secondly, an individual bearing such a mutation must be able to learn to express the trait. Under these conditions, learning acts to provide partial credit for a mutation. An individual carrying a mutation that predisposes it towards an advantageous phenotype will learn the trait more easily than its less fortunately genetically endowed conspecifics, and thus will tend to survive and pass on more copies of the relevant allele to the next generation. Over time, multiple mutations for the desirable trait will accumulate in the genes, and the trait will thus become innate in the population.

Hinton and Nowlan (1987) were the first to demonstrate the feasibility of the Baldwin effect, at least in a simplified computational model, and much research has been done in this area in the intervening years. See Turney (2001) for a comprehensive online bibliography of publications on the Baldwin effect.

Learning may also be incorporated into an EA by making a learning system, such as a neural network, the object that is evolved. When evolving a neural network, the network architecture, weights, learning rules, and input features may all be subject to evolution. Modified genetic operators may be required in order to avoid disruption of the network architecture. See Yao (1999) for a comprehensive review of recent research in the evolution of artificial neural networks. (See Connectionist Architectures: Optimization)

EVOLUTIONARY COMPUTATION AND COGNITIVE SCIENCE

Evolutionary computation plays several roles in cognitive science.
It has been used both as a modeling framework for exploring ideas inspired by biological evolution, and for the optimization of computational models. (See Artificial Life) Evolutionary algorithms have been used both to understand the development of well-known cognitive mechanisms, and to create new mechanisms with desired emergent behaviors. They are used in diverse domains, including language, memory, reasoning, motor control, and the analysis of social interactions.

Modeling contributes to cognitive science in several ways: in providing converging evidence for empirical studies, in testing the internal consistency and completeness of theories, and in investigating the functioning and emergent properties of complex systems, which are often intractable to mathematical analysis. Evolutionary models have often been shown to exhibit complex dynamics emerging from the interaction between computational agents: dynamics that are not inherent in the behavior of a single such agent. As a modeling framework, evolutionary algorithms are most widely used within evolutionary psychology, which takes as its starting point the hypothesis that the mind contains a set of evolved systems, each one designed to solve a task that was important for survival in the human ancestral environment. (See Evolutionary Psychology: Theoretical Foundations; Human Brain, Evolution of the)

The Evolution of Altruism: The Prisoner's Dilemma

Many organisms exist in social groups, interacting frequently with others of their own and other species. While an individual may benefit from cooperative interactions with others of its species, conspecifics are also an animal's fiercest competitors for food, mates, and territory. Despite this inescapable competition, altruistic behavior (i.e., behavior that benefits another at some cost to the altruist) is often observed in natural populations.
In an attempt to understand how such apparently paradoxical behavior could arise, a simple game known as the prisoner's dilemma has been much studied. Suppose that two criminals have been arrested on suspicion of committing a crime. They are held incommunicado, and each is offered a bargain: if the prisoner admits to the crime (defects) while the other prisoner keeps silent (cooperates), the defector will go free while the cooperator gets a long sentence. If both prisoners keep silent, they will both receive short sentences; and if they both admit to the crime they will each get an intermediate sentence. A possible pay-off matrix for the prisoner's dilemma is shown in Figure 3. What makes the prisoner's dilemma interesting is that, while the best outcome for the prisoners as a pair is for both to cooperate, the best decision for each prisoner, in the absence of knowledge about the other prisoner's decision, is to defect. Prisoner's dilemmas arise frequently in real life, in any situation in which the action that most benefits an individual harms the group as a whole.

Analysis of the prisoner's dilemma may appear to support the conclusion that altruism cannot arise as a consequence of evolution, which requires that individuals act selfishly in order to pass on their own genes to the next generation. However, Axelrod (1984) studied an iterated version of the prisoner's dilemma, in which individuals play against each other repeatedly, and have a memory of the past behavior of other individuals, with the opportunity to adjust their strategy based on this past history. In a computer tournament of strategies for the iterated prisoner's dilemma, the clear winner was 'tit for tat', which starts by cooperating, and then copies whatever its opponent did in the previous round. Once cooperation becomes established, this strategy will continue to cooperate indefinitely. This strategy has been proven to be highly robust.

Pay-off matrix for the prisoner's dilemma.
In each cell, the first amount is A's sentence and the second amount is B's sentence. (Figure 3.)

Considerable research has been conducted into the evolution of altruism, using the iterated prisoner's dilemma and other models. For an overview of the literature, see Ridley (1996). (See Social Processes, Computational Models of; Game Theory)

The Evolution of Language

A second area where evolutionary algorithms are being increasingly applied is the study of the evolution of language. Human language leaves no fossils, and no other animals have communication systems utilizing such extensive symbolic structures, so studying the evolution of language poses particular problems for cognitive scientists. Human languages share features (such as phonology and syntax) to such an extent that a universal grammar has been conjectured to explain the similarities, but their origins have long been controversial. (See Language Learning, Computational Models of)

Recently, questions about the evolutionary origins of language, and the extent to which it is determined by the cognitive architecture of the young child, have been addressed using evolutionary algorithms. Groups of simple language users, modeled as computational agents, have been programmed to evolve communication systems. The emergent behaviors of such systems provide converging evidence on the possibilities for the evolution of language.

Two levels of approach to the evolution of language phenomena have been proposed. In 'micro' evolutionary modeling, learners are modeled by computational agents; a language is a set of utterances (sequences of symbols); global properties of languages are emergent properties; and either the learners or the set of utterances evolves. By contrast, in 'macro' evolutionary modeling, learners are modeled as bundles of parameters; a language is an abstract entity; global properties are explicit parameters; and the distribution of parameters evolves.
The micro and macro approaches differ in terms of model fidelity versus analytic tractability, and in the role that emergence plays in explaining phenomena. (See Emergence)

EAs have been used to study language at three levels: phonology (e.g. the self-organization of sound systems); lexicon (e.g. learning concepts, relating words to meanings, and convergence of a population on common conventions for word meanings); and syntax (e.g. the emergence of compositional structure from initially unstructured utterances). (See Phonology and Phonetics, Acquisition of; Semantics, Acquisition of; Syntax, Acquisition of)

The main conclusion that can be drawn from simulations of language evolution is that, over time, weak functional constraints on language use and acquisition can give the appearance of strong constraints on language structure. Interaction itself accounts for many aspects of the coordinated communication of groups. See Hurford et al. (1998) and Wray (2002) for an introduction to the breadth of work in this area.

EVOLUTIONARY ROBOTICS

Evolutionary robotics is a rapidly expanding field based on EA techniques. Both fixed and autonomous robots have many parameters that require optimizing, for example in kinematics (topology and dimensions), dynamics, components, and control. EAs are currently being used in a wide variety of projects as a means of optimizing many parameters in parallel in relatively unconstrained problems. Autonomous robots are being evolved to develop their own skills via direct interaction with the environment, without human intervention. See Pfeifer and Scheier (1999) for an introduction to this area.

CONCLUSION

Evolutionary computation draws inspiration from the study of biological evolution in order either to model the evolutionary process, or to use simplified evolutionary principles to solve complex, nonlinear optimization tasks.
Evolutionary ideas have been used in computing since the 1940s, but it is only with the recent availability of cheap, powerful desktop computers that EAs have become a tool readily available to researchers in all fields. In the field of cognitive science, they are used both to model the development of those cognitive structures and processes that are supposed to be the product of biological evolution (such as human mental structures, language, and altruistic behavior), and to optimize the parameter settings of cognitive models in general.

References

Axelrod, R (1984) The Evolution of Cooperation. New York, NY: Basic Books.
Baldwin, J M (1896) A new factor in evolution. The American Naturalist 30: 441-451. [Reprinted in: Belew, R and Mitchell, M (1996) Adaptive Individuals in Evolving Populations, pp. 59-80. Reading, MA: Addison-Wesley.]
Darwin, C (1859) On the Origin of Species by Means of Natural Selection. London, UK: John Murray.
Dawkins, R and Krebs, J R (1979) Arms races between and within species. Proceedings of the Royal Society, Series B 205: 489-511.
Fogel, L J (1999) Intelligence Through Simulated Evolution: Four Decades of Evolutionary Programming. New York, NY: John Wiley.
Goldberg, D (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
Hinton, G E and Nowlan, S J (1987) How learning guides evolution. Complex Systems 1: 495-502.
Holland, J H (1992) Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. Cambridge, MA: MIT Press.
Hurford, J R, Studdert-Kennedy, M and Knight, C (eds) (1998) Approaches to the Evolution of Language: Social and Cognitive Bases. Cambridge, UK: Cambridge University Press.
Kauffman, S A (1996) At Home in the Universe. London, UK: Penguin.
Koza, J (1999) Genetic Programming, vol. III: Darwinian Invention and Problem Solving. San Francisco, CA: Morgan Kaufmann.
Mitchell, M (1996) An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press.
Mitchell, M, Forrest, S and Holland, J H (1991) The royal road for genetic algorithms: fitness landscapes and GA performance. In: Varela, F J and Bourgine, P (eds) Towards a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life, pp. 245-254. Cambridge, MA: MIT Press.
Mitchell, M, Holland, J H and Forrest, S (1994) When will a genetic algorithm outperform hill climbing? In: Cowan, J D, Tesauro, G and Alspector, J (eds) Advances in Neural Information Processing Systems, vol. VI, pp. 51-58. San Mateo, CA: Morgan Kaufmann.
Pfeifer, R and Scheier, C (1999) Understanding Intelligence. Cambridge, MA: MIT Press.
Ridley, M (1996) The Origins of Virtue. London, UK: Penguin.
Rosin, C D and Belew, R K (1997) New methods for competitive coevolution. Evolutionary Computation 5(1): 1-26.
Schwefel, H-P (1995) Evolution and Optimum Seeking. New York, NY: John Wiley.
Turney, P (2001) The Baldwin Effect: A Bibliography. http://www.ai.mit.edu/~joanna/baldwin.html
Vose, M D (1999) The Simple Genetic Algorithm. Cambridge, MA: MIT Press.
Wray, A (2002) The Transition to Language. Oxford, UK: Oxford University Press.
Wright, S (1932) The roles of mutation, inbreeding, crossbreeding, and selection in evolution. In: Jones, D F (ed.) Proceedings of the Sixth International Congress of Genetics, vol. I, pp. 356-366. [Reprinted in: Ridley, M (1997) Evolution. Oxford, UK: Oxford University Press.]
Yao, X (1999) Evolving artificial neural networks. Proceedings of the IEEE 87: 1423-1447.

Further Reading

Davis, L (ed.) (1991) Handbook of Genetic Algorithms. New York, NY: Van Nostrand.
Dawkins, R (1976; 2nd edn 1989) The Selfish Gene. Oxford, UK: Oxford University Press.
Fogel, D (1995) Evolutionary Computation: Towards a New Philosophy of Machine Intelligence. Piscataway, NJ: IEEE Press.
Fogel, D (1998) Evolutionary Computation: The Fossil Record.
Piscataway, NJ: IEEE Press.
Langdon, W B (1998) Genetic Programming and Data Structures. Hingham, MA: Kluwer.
Michalewicz, Z (1992) Genetic Algorithms + Data Structures = Evolution Programs. New York, NY: Springer-Verlag.

Jennifer S Hallinan, University of Queensland, St Lucia, Queensland, Australia
Janet Wiles, University of Queensland, St Lucia, Queensland, Australia