100% found this document useful (2 votes)

1K views

Putation 1 Basic Algorithms and Operators by Thomas Back

Evolutionary.Computation.1.Basic.Algorithms.and.Operators.by.Thomas.Back

Uploaded by

Carlos Delgado

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (2 votes)

1K views

Putation 1 Basic Algorithms and Operators by Thomas Back

Evolutionary.Computation.1.Basic.Algorithms.and.Operators.by.Thomas.Back

Uploaded by

Carlos Delgado

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 379

TEAM LRN

Evolutionary Computation 1
Basic Algorithms and Operators

TEAM LRN

EDITORS IN CHIEF

Thomas Back
Associcite Projiessor of Computer Science, Leideri Uni,*ersity, The Netherlund.$;
cind Munuging Director mid Senior Resecirch F e l l o ~ . Center
,
j?)r Applied S y s t e m
Anulysis, Irformcitik Centrirm Dortmund, Germuny

David B Fogel
E.xec*iiti\fe Vice President ctnd c'hiej Scientist, Nuturd Selec-tion ltic,, Oi Jolltr,
Ca11fo rtr in, USA

Zbigniew Michalewicz
Projiissor oj'Computer Science, Univerhity cf North Cw-olinu, Charlotte, USA: cintl
lnstiticte ($Computer scicwce, Polish Acuderny i f Science.$, WLirsctw-,Poland

EDITORIAL BOARD

Peter J Angeline, USA

David Beasley, UK
Lashon B Booker, USA
Kalyanmoy Deb, India
Larry J Eshelman, USA
Hitoshi Iba, Japan
Kenneth E Kinnear Jr, USA
Raymond C Paton, U K
V William Porto, USA
Gunter Rudolph, Germany
Robert E Smith, USA
William M Spears, USA
ADVISORY BOARD

Kenneth De Jong, USA

Lawrence J Fogel, USA
John R Koza, USA
Ham-Paul Schwefel, Germany
Stewart W Wilson, USA

TEAM LRN

Evolutionary Computation 1
Basic Algorithms and Operators

Edited by

Thomas Back, David B Fogel

and Zbigniew Michalewicz

I N S T I T U T E OF PHYSICS PUBLISHING
Bristol and Philadelphia
TEAM LRN

INSTITUTE OF PHYSICS PUBLISHING

Bristol and Philadelphia
Copyright

02000 by

IOP Publishing Ltd

Published by Institute of Physics Publishing,

Dirac House, Temple Back, Bristol BSI 6BE, United Kingdom
(US Office: The Public Ledger Building, Suite 1035, 150 South Independence Mall West,
Philadelphia, PA 19106, USA)
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of IOP Publishing Ltd.
Brirish L i b m n Caruloguing-in-Publicdon D ~ t uand
Librcin' of'Congress Ccrtciiog irig - in- Pubiiccirion Datci (ire ci t Sciiicrbie
ISBN 0 7503 0664 5

PROJECT STAFF
Publisher: Nicki Dennis
Production Editor: Martin Beavis
Production Munuger: Sharon Toop
Assistunt Production Murtuger: Jenny Troyano
Production Controller: Sarah Plenty
Electronic Production Manuger: Tony Cox

Printed in the United Kingdom

@'IM The paper used in this publication meets the minimum requirements
of American National Standard for Information Sciences - Permanence of Paper
for Printed Library Materials, ANSI 239.48- 1984

TEAM LRN

Contents
Preface

xiii

List of contributors

xvii

Glossary

xxi

PART 1 WHY EVOLUTIONARY COMPUTATION?

1 Introduction to evolutionary computation
David B Fogel
1 , I Introductory remarks
1.2 Optimization
I .3 Robust adaptation
1.4 Machine intelligence
1.5 Biology
1.6 Discussion
References

2 Possible applications of evolutionary computation

David Beasley
2.1 Introduction
2.2 Applications in planning
2.3 Applications in design
2.4 Applications in simulation and identification
2.5 Applications in control
2.6 Applications in classification
2.7 Summary
References
Further reading

3 Advantages (and disadvantages) of evolutionary computation over

other approaches
Ham-Paul Schwefel
3.1 No-free-lunch theorem
3.2 Conclusions
References

TEAM LRN

4
4
6
7
8
9
10
10
18

20
20
21
22

Contents

PART 2 EVOLUTIONARY COMPUTATION:THE BACKGROUND

4 Principles of evolutionary processes
Duiiid B Fogel
4.1 Overview
References
5

Principles of genetics
Rayrnorzd C Pntorz
5. I Introduction
5.2 Some fundamental concepts in genetics
5.3 The gene in more detail
5.4 Options for change
5.5 Population thinking
References

6 A history of evolutionary computation

Kenrieth De Jong, Dmqid B Fogel and Huns-PaulSchwefel
6. I Introduction
6.2 Evolutionary programming
6.3 Genetic algorithms
6.4 Evolution strategies
References

23
23
26

27
27
27
33
35
35
38
40

40
41
44
48
51

PART 3 EVOLUTIONARY ALGORITHMS AND THEIR

STANDARD INSTANCES
Introduction to evolutionary algorithms
Thomas Biick
7.1 General outline of evolutionary algorithms
References
Further reading

8 Genetic algorithms
Lcirp J Eshelmarl
8.1 Introduction
8.2 Genetic algorithm basics and some variations
8.3 Mutation and crossover
8.4 Representation
8.5 Parallel genetic algorithms
8.6 Conclusion
References

9 Evolution strategies
Gunter Ritdnlph
9.1 The archetype of evolution strategies
LRN
9.2 Contemporary evolutionTEAM
strategies

59
62
62

64
65
68
75
77
78
78

Contents

vii

9.3

86
87

Nested evolution strategies

References

10 Evolutionary programming
V Willinin Porto
10. I Introduction
10.2 History
10.3 Current directions
10.4 Future research
References
Further reading

89
89
90
97
1 00
100

102

11 Derivative methods in genetic programming

Kenneth E Kitinear, Jr
1 I . 1 Introduction
1 I .2 Genetic programming defined and explained
1 1.3 The development of genetic programming
11.4 The value of genetic programming
References
Further reading

103

12 Learning classifier systems

Robert E Smith
12. I Introduction
12.2 Types of learning problem
I 2.3 Learning classifier system introduction
12.4 Michigan and Pitt style learning classifier systems
12.5 The bucket brigade algorithm (implicit form)
12.6 Internal messages
12.7 Parasites
12.8 Variations of the learning classification system
12.9 Final comments
References

114

13 Hybrid methods
Zbigniewi Micha le w ic;:
References

124

PART 4

103
103
108
109
111
112

I I4
114
117
118
119
120
120
121

122
122

126

REPRESENTATIONS

14 Introduction to representations
Kalyannzoy Deb
14.1 Solutions and representations
14.2 Important representations
14.3 Combined representations
TEAM LRN
References

127
128
130
131

...

Contents

Vlll

15 Binary strings
Thomas Biick
References

132

16 Real-valued vectors
Darkl B Fogel
16.I Object variables
16.2 Object variables and strategy parameters
References

136

17 Permutations
Darrell Whitley
17.1 Introduction
17.2 Mapping integers to permutations
17.3 The inverse of a permutation
17.4 The mapping function
17.5 Matrix representations
17.6 Alternative representations
17.7 Ordering schemata and other metrics
17.8 Operator descriptions and local search
References

139

135

I36
137
138

139
141
141
142
43
45
46
49
49

18 Finite-state representations
D m i d B Fogel
18.1 Introduction
18.2 Applications
References

151
152
154

19 Parse trees
Peter J Angeline
References

155

20 Guidelines for a suitable encoding

Dartid B Fogel and Peter J Angeline
References

160

21 Other representations
Peter J Angeline and David B Fogel
2 I . 1 Mixed-integer structures
21.2 Introns
2 1.3 Diploid representations
References

163

158

162

163
163
1 64
1 64

PART 5 SELECTION
22 Introduction to selection
Ka lyatimoy De b

166
TEAM LRN

Contents
22.1 Working mechanisms
22.2 Pseudocode
22.3 Theory of selective pressure
References

166
167
170
171

23 Proportional selection and sampling algorithms

John Grefenstette
23. I Introduction
23.2 Fitness functions
23.3 Selection probabilities
23.4 Sampling
23.5 Theory
References

172

24 Tournament selection

181

Tobias Blickle
24.1 Working mechanism
24.2 Parameter settings
24.3 Formal description
24.4 Properties
References

180

181
182
182
183
185

187

25 Rank-based selection
John Grefenstette

25. I
25.2
25.3
25.4
25.5

172
172
175
175
176

Introduction
Linear ranking
Nonlinear ranking
( p , A), ( p + A) and threshold selection
Theory
References

187
188
188

189
190
194

26 Boltzmann selection
Samir W Mahf{)ud
26.1 Introduction
26.2 Simulated annealing
26.3 Working mechanism for parallel recombinative simulated
annealing
26.4 Pseudocode for a common variation of parallel recombinative
simulated annealing
26.5 Parameters and their settings
26.6 Global convergence theory and proofs
References

195

27 Other selection methods

David B Fogel
27.1 Introduction
27.2 Tournament selection

201

TEAM LRN

195
196
196
197
197
199
200

20 1
20 1

Contents

27.3
27.4
27.5
27.6
27.7
27.8

Soft brood selection

Disruptive selection
Boltzmann selection
Nonlinear ranking selection
Competitive selection
Variable lifespan
References

202
202
202
202
203
203
204

28 Generation gap methods

Jnyshree Snrrna und Kenrieth De Jong
28.1 Introduction
28.2 Historical perspective
28.3 Steady state and generational evolutionary algorithms
28.4 Elitist strategies
References

205

29 A comparison of selection mechanisms

Peter J B Hnricock
29.1 Introduction
29.2 Simulations
29.3 Population models
29.4 Equivalence: expectations and reality
29.5 Simulation results
29.6 The effects of evaluation noise
29.7 Analytic comparison
29.8 Conclusions
References

212

30 Interactive evolution
WolfRnrzg Bnnzhnf
30. I Introduction
30,2 History
30.3 The problem
30.4 The interactive evolution approach
30.5 Difficulties
30.6 Application areas
30.7 Further developments and perspectives
References
Further reading

228

205
206
207
210
21 1

212
213
214
214
218
222
224
225
225

228
228
229
229
23 1
23 1
232
233
234

PART 6 SEARCH OPERATORS

31 Introduction to search operators
Zbigriiew Michalewic
TEAM LRN
References

235
236

Contents

32 Mutation operators
Thomas Biick, Daliid B Fogel, Darrell Whitley mid Peter J Aiigeliiie
32.1 Binary strings
32.2 Real-valued vectors
32.3 Permutations
32.4 Finite-state machines
32.5 Parse trees
32.6 Other representations
References

237
237
239
243
246
248
250
252

33 Recombination
Lnshoii B Bnoker, David B Fogel, Darrell Whitle?; Peter J Aiigeliiie
arzd A E Eiben
33.1 Binary strings
33.2 Real-valued vectors
33.3 Permutations
33.4 Finite-state machines
33.5 Crossover: parse trees
33.6 Other representations
33.7 Multiparent recombination
References

256

34 Other operators

308

256
270
274
284
286
289
289
302

Russell W Anderson, Daijid B Fogel and Martin Schiit,:

34.1 The Baldwin effect
34.2 Knowledge-augmented operators
34.3 Gene duplication and deletion
References
Further reading

308
317
319
326
329

Index

33 1

TEAM LRN

This page intentionally left blank

TEAM LRN

Preface
The original Handbook of Evolutionary Computation (Back et a1 1997) was
designed to fulfil1 the need for a broad-based reference book reflecting the
important role that evolutionary computation plays in a variety of disciplinesranging from the natural sciences and engineering to evolutionary biology and
computer sciences. The basic idea of evolutionary computation, which came
onto the scene in the 195Os, has been to make use of the powerful process of
natural evolution as a problem-solving paradigm, either by simulating it (by
hand or automatically) in a laboratory, or by simulating it on a computer. As
the history of evolutionary computation is the topic of one of the introductory
sections of the Handbook, we will not go into the details here but simply mention
that genetic algorithms, evolution strategies, and evolutionary programming are
the three independently developed mainstream representatives of evolutionary
computation techniques, and genetic programming and classifier systems are the
most prominent derivative methods.
In the 1960s, visionary researchers developed these mainstream methods of
evolutionary computation, namely J H Holland ( 1 962) at Ann Arbor, Michigan,
H J Bremermann (1962) at Berkeley, California, and A S Fraser (1957) at
Canberra, Australia, for genetic algorithms, L J Fogel (1962) at San Diego,
California, for evolutionary programming, and I Rechenberg ( 1965) and H
P Schwefel (1965) at Berlin, Germany, for evolution strategies. The first
generation of books on the topic of evolutionary compuation, written by
several of the pioneers themselves, still gives an impressive demonstration of
the capabilities of evolutionary algorithms, especially if one takes account of
the limited hardware capacity available at that time (see Fogel et a1 (1966),
Rechenberg ( I 973), Holland ( 1975), and Schwefel ( 1977)).
Similar in some ways to other early efforts towards imitating natures
powerful problem-solving tools, such as artificial neural networks and fuzzy
systems, evolutionary algorithms also had to go through a long period of
ignorance and rejection before receiving recognition. The great success that
these methods have had, in extremely complex optimization problems from
various disciplines, has facilitated the undeniable breakthrough of evolutionary
computation as an accepted problem-solving methodology. This breakthrough
is reflected by an exponentially growing number of publications in the field,
and an increasing interest in corresponding conferences and journals. With
these activities, the field now has its own archivable high-quality publications in
which the actual research results are published. The publication of a considerable
amount of application-specific work is, however, widely scattered over different
TEAM LRN

...

Xlll

xiv

Preface

disciplines and their specific conferences and journals, thus reflecting the general
applicability and success of evolutionary computation methods.
The progress in the theory of evolutionary computation methods since
1990 impressively confirms the strengths of these algorithms as well as their
limitations. Research in this field has reached maturity, concerning theoretical
and application aspects, so it becomes important to provide a complete reference
for practitioners, theorists, and teachers in ii variety of disciplines. The
original Hcrridbook of E\vliitioriary Computation was designed to provide such
a reference work. It included complete, clear, and accessible information.
thoroughly describing state-of-the-art evolutionary computation research and
application in a comprehensive style.
These new volumes, based in the original Handbook, but updated, are
designed to provide the material in units suitable for coursework as well as
for individual researchers. The first volume. E\diitionur.~ Computation I :
Basic Afgoritlzms arid Operators, provides the basic information on evolutionary
algorithms. In addition to covering all paradigms of evolutionary computation in
detail and giving an overview of the rationale of evolutionary computation and
of its biological background, this volume also offers an in-depth presentation
of basic elements of evolutionary computation models according to the types
of representations used for typical problem classes (e.g. binary, real-valued,
permutations, finite-state machines, parse trees). Choosing this classification
based on representation, the search operators mutation and recombination
(and others) are straightforwardly grouped according to the semantics of the
data they manipulate. The second volume, Eivlutionary Compiitatiori 2:
Acf\mc.ed Algorithms arid Operutors, provides information on additional topics
of major importance for the design of an evolutionary algorithm, such as
the fitness evaluation, constraint-handling issues, and population structures
(including all aspects of the parallelization of evolutionary algorithms). This
volume also covers some advanced techniques (e.g. parameter control, metaevolutionary approaches, coevolutionary algorithms, etc) and discusses the
efficient implementation of evolutionary algorithms.
Organizational support provided by Institute of Physics Publishing makes it
possible to prepare this second version of the Huricfbook. In particular, we would
like to express our gratitude to our project editor, Robin Rees, who worked with
us on editorial and organizational issues.

Thomas Back, David B Fogel and Zbigniew Michalewicz

August I999

References
E\dutiotiury Cotnpiitutioti
Back T, Fogel D B and Michalewicr Z I997 Huti(lhook
TEAM LRN
(Bristol: Institute of Physics Publishing
and New York: Oxford University Press)

References

Bezdek J C 1994 What is computational intelligence ? Cornpurationul Intelligence:

Imitating Life ed J M Zurada, R J Marks I1 and C J Robinson (New York: IEEE
Press) pp 1-12
Bremermann H J 1962 Optimization through evolution and recombination Self
Organizing Systems ed M C Yovits, G T Jacobi and G D Goldstine (Washington,
DC: Spartan Book) pp 93-106
Fogel L J 1962 Autonomous automata Industrial Research 4 14-9
Fogel L J, Owens A J and Walsh M J 1966 Artijicial Intelligence through Simulated
Evolution (New York: Wiley)
Fraser A S 1957 Simulation of genetic systems by automatic digital computers: I.
Introduction Austral. J. Biol. Sci. 10 pp 484-91
Holland J H 1962 Outline for a logical theory of adaptive systems J. ACM 3 297-314
-1975
Adaptation in Natural and Artijicial Systems (Ann Arbor, MI: University of
Michigan Press)
Rechenberg I 1965 Cybernetic solution path of an experimental problem Royal Aircrufr
Establishment Library Translation No 1122 (Farnborough, UK)
Rechenberg I 1973 Evolutionsstrategie: Optimierung technischer Systenie nach
Prinzipien der hiologischen Evolution (Stuttgart: Frommann-Holzboog)
Schwefel H-P 1965 Kybernetische Evolution als Strategie der experimentellen
Forschung in der Stromungstechnik Diplomarbeit Hermann Fottinger Institut fur
Stromungstechnik, Technische Universitat, Berlin
-I977 Numerische Optimierung von Computer-Modellen mittels der Ei~olution.\strcitegie Interdisciplinary Systems Research vol 26 (Basel: Birkhauser)

TEAM LRN

This page intentionally left blank

TEAM LRN

List of Contributors
Peter J Angeline (Chapters 19-21, 32, 33)
Senior Scientist, Natural Selection, Inc., Vestal, NY, USA
e-mai 1: angeli ne @ nat ural-selec tion.com

Thomas Back (Chapters 7, 15, 32, Glossary)

Associate Professor of Computer Science, Leiden University The Netherlands; and
Manugirrg Director and Senior Research Fellow: Center for Applied Systems
An a ly .s is, lrzformatik Centrum Dortmund, Germarry
e-mail: baeck@Isl I .informatik.uni-dortmund.de

Wolfgang Banzhaf (Chapter 30)

Prc$e.ssor c$ Computer Science, University of Dortmund, Germany
e-mail: banzhaf@Is 1 I .infonnatik.uni-dortmund.de

David Beasley (Chapter 2)

Sofhure Engineer, Praxis PLC, Deloitte and Touche Consulting Group, Bath, United
Kingdom
e-mail: dabley @ praxis.co.uk

Tobias Blickle (Chapter 24)

Electrical Engineer, Institute TIK, ETH Zurich, Swit:erlund
e-mail: [email protected]

Lashon B Booker (Chapter 33)

Principal Scientist, Artijicial Intelligence Technical Center, The MITRE Corporation,
McLean, VA, USA
e-mail: [email protected]

Kalyanmoy Deb (Chapters 14, 22)

Associate Professor of Mechanical Engineering, Indian Institute of Technology,
Kanpur, India
e-mail: [email protected]

Kenneth De Jong (Chapters 6, 28)

Professor of Computer Science, George Mason University, Fui$m-, VA, USA
e-mail: kdejong @ grnu.edu

A E Eiben (Chapter 33)

Leiden Institute ($Advanced Computer Science, Leiden Uniiiersity, The Netherlands;
and Faculty of Sciences, Vrije Universireit Amsterdam, The Netherlands
e-mail: [email protected] and [email protected]

TEAM LRN

xvii

List of Contributors

xviii

Larry J Eshelman (Chapter 8)

Priricipul Meniber cf Reseurch St& Philips Reseurch, Briarc-lcflMunor, NY, USA
e-mail: [email protected]

David B Fogel (Chapters 1. 4, 6, 16, 18, 20, 21, 27, 32-34, Glossary)
E.rec-uti\v Vice President cind Chief Scientist, Nuturd Selec*tionInc.., Lu Jollu, CA,
USA
e-mail: [email protected]

John Grefenstette (Chapters 23, 25)

Heclcl oj' the Muchine Lecirning Section, Ncc vy Center f o r Applied Reseurch in
A rt iJic*iulIn tell ig ence , Nu r ~1 iReseu rch Labora to n, Wush ing ton, DC, USA
e-mail: gref@ aic.nrl .navy.mil

Peter J B Hancock (Chapter 29)

Lec*turerin Psyc.hology, University of Stirling, United Kingdom
e-mail: [email protected]

Kenneth E Kinnear Jr (Chapter 1 1 )

Chief' Technicul Oficer, Ackripti,!e Coniputing Technology, Boxhoro, MA, USA
e-mail: [email protected]

Samir W Mahfoud (Chapter 26)

Vice

President oj' Reseurch und Soft\vure Engineering, Adrunced In\~estment

Technology, Clenntwter, FL, USA

e-mail: [email protected]

Zbigniew Michalewicz (Chapters 13, 3 1 )

Prcfessor cf Coniputer Science, Uniiv?r.sitycf North Curolina, Charlotte, USA: and
lnstitiite ( f Computer Science, Polish Amderny oj' Sciences, Wursuw?,Polund
e-mail: [email protected]

Raymond C Paton (Chapter 5 )

Lwturer in Coniputer Science, Unir*ersih*
cf Liverpool, United Kingdom
e - mai 1 : r .c .pa ton @ c sc .I i v .ac .u k

V William Port0 (Chapter 10)

Senior Stuf Scientist, Nutiirul Selection Inc., k
i Jollcc, CA, USA
e-mail: [email protected]

Gunter Rudolph (Chapter 9)

Senior Res eu rch Fello N', Center for Applied Sys terns Anu Iys is, li$o rniu tik Centruni
Dortmund, Germuny
e-mail: rudolph@ icd.de

Jayshree Sarma (Chapter 28)

Computer Science Rextrrcher, Ceorge Mason Uni\)ersity, Fuirjbx, VA, USA
e-mail: [email protected]

TEAM LRN

List of Contributors

xix

Martin Schutz (Chapter 34)

Computer Scirritist, Sjxtems Anctlysis Research Group, Utiil'c)rsity of Dortntwicl.
Ge rmiriy
e-mail: schuetz@Is1 I .informatik.uni-dortmund.de

Hans-Paul Schwefel (Chapters 3, 6)

Chair of Sys term Aricilj~sis, ctrid Professor of Conipiitrr Science, Utiiiytjrsit\. of
Dorttmtrici, Gernicrnj';cirid Director, Ceriter .for Applied Sjxtotm Aiiulj?$is,
Infonncitik Ceritriirn Dorrniund, Gerniutij?
e-mail: schwefel @Is 1 I .informatik.uni-dortmund.de

Robert E Smith (Chapter 12)

Senior Resecirr-h Fc4lo~~,
lntrlligent Coniputitig Sutetns Cetitrr, Conipictrr Stirdios
a t i d A4utheniiitic.s F w u l t ~ Uiii\vr.sih
,
of the West of Eii,q/urid. Bvi.sto1, Uiiitcd
Kingdom
rsm i t h 0 btc .u we. ac .uk

Darrell Whitley (Chapters 17, 32, 33)

Professor cf Conipirter Science, Colomdo State Utii~vrsity,Fort Collitis, CO, USA
e-mail: [email protected]

TEAM LRN

This page intentionally left blank

TEAM LRN

Glossary
Thornas Back and David B Fogel
Bold text within definitions indicates terms that are also listed elsewhere in this
glossary.

Adaptation: This denotes the general advantage in ecological or physiological

efficiency of an individual in contrast to other members of the population,
and it also denotes the process of attaining this state.
Adaptive behavior: The underlying mechanisms to allow living organisms,
and, potentially, robots, to adapt and survive in uncertain environments
(cf adaptation).
Adaptive surface: Possible biological trait combinations in a population of
individuals define points in a high-dimensional sequence space, where each
coordinate axis corresponds to one of these traits. An additional dimension
characterizes the fitness values for each possible trait combination, resulting
in a highly multimodal fitness landscape, the so-called adaptive surface or
adaptive topography.
Allele: An alternative form of a gene that occurs at a specified chromosomal
position (locus).
Artificial life: A terminology coined by C G Langton to denote the '. . . study
of simple computer generated hypothetical life forms, i.e. life-as-it-couldbe.' Artificial life and evolutionary computation have a close relationship
because evolutionary algorithms are often used in artificial life research
to breed the survival strategies of individuals in a population of artificial
life forms.
Automatic programming: The task of finding a program which calculates a
certain input-output function. This task has to be performed in automatic
programming by another computer program (cf genetic programming).
Baldwin effect: Baldwin theorized that individual learning allows an organism
to exploit genetic variations that only partially determine a physiological
structure. Consequently, the ability to learn can guide evolutionary
processes by rewarding partial genetic successes. Over evolutionary
time, learning can guide evolution because individuals with useful genetic
variations are maintained by learning, such that useful genes are utilized
more widely in the subsequent generation. Over time, abilities that
previously required learning are replaced by genetically determinant
TEAM LRN

xxi

xxii

G 10ssary

\y\tem\. The guiding effect of learning on evolution is referred to as

the Baldwin effect. (See crlso Sectiori 34. I . )
Behavior: The response of an organism to the pre3ent environmental stimulus.
The collection of behaviors of an organism defines the fitness of the
organism to its present environment.
Boltzmann selection: The Boltzmann selection method transfers the probabilistic acceptance criterion of simulated annealing to evolutionary algorithms. The method operates by creating an offspring individual from two
parents and accepting improvements (with respect to the parent's fitness)
in any case and deteriorations according to an exponentially decreasing
t'unction of an exogeneous 'temperature' parameter. (See c i l s o Chapter 26.
Building block: Certain forms of recombination in evolutionary algorithms
attempt to bring together building blocks, shorter pieces of an overall
\olution, in the hope that together these blocks will lead to increased
performance. (See crl.co Suction 26.3. )
Central dogma: The fact that, by means of translation and transcription
proces\e\, the genetic information is passed from the genotype to the
phenotype (i.e. from DNA to RNA and to the proteins). The dogma
implies that behaviorally acquired characteristics of an individual are not
inherited to its off\pring (cf Lamarckism).
Chromatids: The two identical parts of a duplicated chromosome.
Chromosome: Rod-shaped bodies in the nucleus of eukaryotic cells, which
contain the hereditary units or genes.
Classifier systems: Dynamic, rule-based systems capable of learning by
examples and induction. Classifier systems evolve a population of
production rule\ (in the \o-called Michigan approach, where an individual
corresponds to a single rule) or a population of production rule bases
(in the so-called Pittsburgh approach, where an individual represents a
complete rule base) by means of an evolutionary algorithm. The rules
are often encoded by a ternary alphabet. which contains a 'don't care'
symbol fxilitating a generalization capability of condition or action parts
of a rule, thus allowing for an inductive learning of concepts. In the
Michigan approach, the rule fitness (its strength) is incrementally updated
at each generation by the 'bucket brigade' credit assignment algorithm
based on the reward the system obtains from the environment, while in the
Pittsburgh approach the fitness of a complete rule base can be calculated
by testing the behavior of the individual within its environment.
Codon: A group of three nucleotide bases within the DNA that encodes a single
amino acid or start and \top information for the transcription process.
Coevolutionary system: In coevolutionary systems, different populations
interact with each other in a way such that the evaluation function of one
population may depend on the state of the evolution process in the other
TEAM LRN
population( s).

Glossary

xxiii

Comma strategy: The notation ( p , A) strategy describes a selection method

introduced in evolution strategies and indicates that a parent population
of p individuals generates h > p offspring and the best out of these h
offspring are deterministically selected as parents of the next generation.
(See also Section 25.4.)
Computational intelligence: The field of computational intelligence is
currently seen to include subsymbolic approaches to artificial intelligence.
such as neural networks, fuzzy systems, and evolutionary computation,
which are gleaned from the model of information processing in natural
systems. Following a commonly accepted characterization, a system is
computationally intelligent if it deals only with numerical data, does not use
knowledge in the classical expert system sense, and exhibits computational
adaptivity, fault tolerance, and speed and error rates approaching human
performance.
Convergence reliability: Informally, the convergence reliability of an
evolutionary algorithm means its capability to yield reasonably good
solutions in the case of highly multimodal topologies of the objective
function. Mathematically, this is closely related to the property of global
convergence with probability one, which states that, given infinite running
time, the algorithm finds a global optimum point with probability one.
From a theoretical point of view, this is an important property to justify
the feasibility of evolutionary algorithms as global optimization methods.
Convergence velocity: In the theory of evolutionary algorithms, the
convergence velocity is defined either as the expectation of the change
of the distance towards the optimum between two subsequent generations,
or as the expectation of the change of the objective function value between
two subsequent generations. Typically, the best individual of a population
is used to define the convergence velocity.
Crossover: A process of information exchange of genetic material that occurs
between adjacent chromatids during meiosis.
Cultural algorithm: Cultural algorithms are special variants of evolutionary
algorithms which support two models of inheritance, one at the
microevolutionary level in terms of traits, and the other at the
macroevolutionary level in terms of beliefs. The two models interact via
a communication channel that enables the behavior of individuals to alter
the belief structure and allows the belief structure to constrain the ways in
which individuals can behave. The belief structure represents cultural
knowledge about a certain problem and therefore helps in solving the
problem on the level of traits.
Cycle crossover: A crossover operator used in order-based genetic
algorithms to manipulate permutations in a permutation preserving way.
Cycle crossover performs recombination under the constraint that each
element must come from one TEAM
parentLRN
or the other by transferring element

xxiv

Giossary

cycles between the mates. The cycie crmsover operator preserves absolute
positions of the elements of permutations. (See also Section 33.3.)
Darwinism: The theory of evolution, proposed by Darwin, that evolution
comes about through random variation (mutation) of heritable characteristics, coupled with natural selection, which favors those species for
further survival and evolution that are best adapted to their environmental
conditions. (See also Chapter 4 . )
Deception: Objective functions are called deceptive if the combination of good
building b!ocks by means of recombination !eads to a reduction of fitness
rather than an increase.
Deficiency: A form of mutation that involves a terminal segment loss of
chromosome regions.
Defining length: The defining length of a scheiiia is the maximum distance
between specified positions within the schema. The larger the defining
length of "a schema, the higher becomes its disruption probability by

crossover.
Deletion: A form of mutation that involves an internal segment loss of a
chromosome region.
Deme: An independent subpopulation in the migration model of parallel
evolutionary algorithms.
Diffusion model: The diffusion model denotes a massively parallel
implementation of evolutionary algorithms, where each individual is
realized as a single process being connected to neighboring individuals,
such that a spatial individual structure is assumed. Recombination
and selection an: restricted to the neighborhood of an individual, such
that information is locally preserved and spreads only slowly over the
population.
Dipioid: In diploid organisms, each body ce!! carries two sets of chromosomes;
that is, each chromosome exists in two homologous fGrrns, one of which
is phenotypically realized.
Discrete recombination: Discrete recombination works o n two vectors of
object variables by performing an exchange of the corresponding object
variables with probability one half (other settings of the exchange
probability are in principle possible) (cf uniform crossover). (See cilso
Section 33.2.)
DNA: Deoxyribonucleic acid, a double-stranded macromolecule of helical
structure (comparable to a spiral staircase). Both single strands are linear,
unbranched nucleic acid molecules built up from alternating deoxyribose
(sugar) and phosphate molecules. Each deoxyribose part is coupled to
a nucleotide base, which is responsible for establishing the connection
to the other strand of the DNA, The four nucleotide bases adenine (A),
thymine (T), cytosine ( C ) and guanine (G) are thc alphabet of the genetic
information. The sequences of these bases i n the DNA molecuie determines
TEAM LRN
the building p l a ~of any organism.

Glossary

xxv

Duplication: A form of mutation that involves the doubling of a certain region

of a chromosome at the expense of a corresponding deficiency on the other
of two homologous chromosomes.
Elitism: Elitism is a feature of some evolutionary algorithms ensuring that the
maximum objective function value within a population can never reduce
from one generation to the next. This can be assured by simply copying
the best individual of a population to the next generation, if none of the
selected offspring constitutes an improvement of the best value.
Eukaryotic cell: A cell with a membrane-enclosed nucleus and organelles
found in animals, fungi, plants, and protists.
Evolutionary algorithm: See evolutionary computation.
Evolutionary computation: This encompasses methods of simulating evolution, most often on a computer. The field encompasses methods that comprise a population-based approach that relies on random variation and selection. Instances of algorithms that rely on evolutionary principles are
called evolutionary algorithms. Certain historical subsets of evolutionary
algorithms include evolution strategies, evolutionary programming, and
genetic algorithms.
Evolutionary operation (EVOP): An industrial management technique presented by G E P Box in the late fifties, which provides a systematic way
to test alternative production processes that result from small modifications
of the standard parameter settings. From an abstract point of view, the
A) strategy with a typical setting of h = 4 and
method resembles a ( I
h = 8 (the so-called 22 and 23 factorial design), and can be interpreted as
one of the earliest evolutionary algorithms.
Evolutionary programming: An evolutionary algorithm developed by
L J Fogel at San Diego, CA, in the 1960s and further refined by D B Fogel
and others in the 1990s. Evolutionary programming was originally
developed as a method to evolve finite-state machines for solving time
series prediction tasks and was later extended to parameter optimization
problems. Evolutionary programming typically relies on variation operators
that are tailored to the problem, and these often are based on a single parent;
however, the earliest versions of evolutionary programming considered the
possibility for recombining three or more finite-state machines. Selection
is a stochastic tournament selection that determines p individuals to
survive out of the p parents and the p (or other number of) offspring
generated by mutation. Evolutionary programming also uses the selfadaptation principle to evolve strategy parameters on-line during the
search (cf evolution strategy). (See also Chapter 10.)
Evolution strategy: An evolutionary algorithm developed by I Rechenberg
and H-P Schwefel at the Technical University of Berlin in the 1960s. The
evolution strategy typically employs real-valued parameters, though it has
also been used for discrete problems. Its basic features are the distinction
LRN
between a parent population TEAM
(of size
p ) and an offspring population (of

xxvi

Glossary

size h 2 p ) , the explicit emphasis on normally distributed mutations,

the utilization of different forms of recombination, and the incorporation
of the self-adaptation principle for strategy parameters; that is. those
parameters that determine the mutation probability density function are
evolved on-line, by the same principles which are used to evolve the object
variables. (See nlso Chapter Y.)
Exon: A region of codons within a gene that is expressed for the phenotype
of an organism.
Finite-state machine: A transducer that can be stimulated by a finite alphabet
of input symbols, responds in a finite alphabet of output symbols, and
possesses some finite number of different internal states. The behavior
of the finite-state machine is specified by the corresponding input-output
symbol pairs and next-state transitions for each input symbol, taken over
every state. In evolutionary programming, finite-state machines are
historically the first structures that were evolved to find optimal predictors
of the environmental behavior. (See also Chcipter 18.)
Fitness: The propensity of an individual to survive and reproduce in a
particular environment. In evolutionary algorithms, the fitness value
of an individual is closely related (and sometimes identical) to the
objective function value of the solution represented by the individual,
but especially when using proportional selection a scaling function is
typically necessary to map objective function values to positive values
such that the best-performing individual receives maximum fitness.
Fuzzy system: Fuzzy systems try to model the the fact that real-world
circumstances are typically not precise but fuzzy. This is achieved by
generalizing the idea of a crisp membership function of sets by allowing
for an arbitrary degree of membership in the unit interval. A fuzzy set
is then described by such a generalized membership function. Based on
membership functions, linguistic variables are defined that capture realworld concepts such as low temperature. Fuzzy rule-based systems then
allow for knowledge processing by means of fuzzification, fuzzy inference,
and defuzzitication operators which often enable a more realistic modeling
of real-world situations than expert systems do.
Gamete: A haploid germ cell that fuses with another in fertilization to form
a zygote.
Gene: A unit of codons on the DNA that encodes the synthesis for a protein.
Generation gap: The generation gap characterizes the percentage of the
population to be replaced during each generation. The remainder of the
population is chosen (at random) to survive intact. The generation gap
allows for gradually shifting from the generation-based working scheme
towards the extreme of just generating one new individual per generation,
the so-called steady-state selection algorithm. (See cilso Clzcipter 28.)
Genetic algorithm: An evolutionary algorithm developed by J H Holland
and his students at Ann Arbor,TEAM
MI, inLRN
the 1960s. Fundamentally equivalent

Glossary

xxvii

procedures were also offered earlier by H J Bremermann at UC Berkeley

and A S Fraser at the University of Canberra, Australia in the 1960s and
1950s. Originally, the genetic algorithm or adaptive plan was designed
as a formal system for adaptation rather than an optimization system.
Its basic features are the strong emphasis on recombination (crosso\,er),
use of a probabilistic selection operator (proportional selection), and the
interpretation of mutation as a background operator, playing a minor
role for the algorithm. While the original form of genetic algorithms
(the canonical genetic algorithm) represents solutions by binary strings,
a number of variants including real-coded genetic algorithms and orderbased genetic algorithms have also been developed to make the algorithm
applicable to other than binary search spaces. (See also Chapter 8.)
Genetic code: The translation process performed by the ribosomes essentially
maps triplets of nucleotide bases to single amino acids. This (redundant)
mapping between the 43 = 64 possible codons and the 20 amino acids is
the so-called genetic code.
Genetic drift: A random decrease or increase of biological trait frequencies
within the gene pool of a population.
Genetic programming: Derived from genetic algorithms, the genetic
programming paradigm characterizes a class of evolutionary algorithms
aiming at the automatic generation of computer programs. To achieve this,
each individual of a population represents a complete computer program in
a suitable programming language. Most commonly, symbolic expressions
representing parse trees in (a subset of) the LISP language are used to
represent these programs, but also other representations (including binary
representation) and other programming languages (including machine code)
are successfully employed. (See also Chapter / /.)
Genome: The total genetic information of an organism.
Genotype: The sum of inherited characters maintained within the entire
reproducing population. Often also the genetic constitution underlying a
single trait or set of traits.
Global optimization: Given a function f : M + R, the problem of
determining a point x* E M such that f(x*)is minimal (i.e. .f(x*)5
f(x)Vx E M ) is called the global optimization problem.
Global recombination: In evolution strategies, recombination operators
are sometimes used which potentially might take all individuals of a
population into account for the creation of an offspring individual.
Such recombination operators are called global recombination (i.e. global
discrete recombination or global intermediate recombination).
Gradient method: Local optimization algorithms for continuous parameter
optimization problems that orient their choice of search directions according
to the first partial derivatives of the objective function (its gradient) are
TEAM LRN strategy).
called gradient strategies (cf hillclimbing

xxviii

Glossary

Gray code: A binary code for integer values which ensures that adjacent
integers are encoded by binary strings with Hamming distance one.
Gray codes play an important role in the application of canonical genetic
algorithms to parameter optimization problems, because there are certain
situations in which the use of Gray codes may improve the performance of
an evolutionary algorithm.
Hamming distance: For two binary vectors, the Hamming distance is the
number of different positions.
Haploid: Haploid organisms carry one set of genetic information.
Heterozygous: Diploid organisms having different alleles for a given trait.
Hillclimbing strategy: Hillclimbing methods owe their name to the analogy
of their way of searching for a maximum with the intuitive way a sightless
climber might feel his way from a valley up to the peak of a mountain
by steadily moving upwards. These strategies follow a nondecreasing path
to an optimum by a sequence of neighborhood moves. In the case of
multimodal landscapes, hillclimbing locates the optimum closest to the
starting point of its search.
Homologues: Chromosomes of identical structure, but with possibly different
genetic information contents.
Homozygous: Diploid organisms having identical alleles for a given trait.
Hybrid method: Evolutionary algorithms are often combined with classical
optimization techniques such as gradient methods to facilitate an efficient
local search in the final stage of the evolutionary optimization. The
resulting combinations of algorithms are often summarized by the term
hybrid methods.
Implicit parallelism: The concept that each individual solution offers partial
information about sampling from other solutions that contain similar
subsections. Although it was once believed that maximizing implicit
parallelism would increase the efficiency of an evolutionary algorithm,
this notion has been proved false in several different mathematical
developments (See no-free-lunch theorem).
Individual: A single member of a population. In evolutionary algorithms,
an individual contains a chromosome or genome, that usually contains at
least a representation of a possible solution to the problem being tackled
(a single point in the search space). Other information such as certain
strategy parameters and the individual's fitness value are usually also
stored in each individual.
Intelligence: The definition of the term intelligence for the purpose of clarifying
what the essential properties of artificial or computational intelligence
should be turns out to be rather complicated. Rather than taking the usual
anthropocentric view on this, we adopt a definition by D Fogel which
states that intelligence is the capability of a system to adapt its behavior to
TEAM LRN
meet its goals in a range of environments.
This definition also implies that

Glossary

xxix

evolutionary algorithms provide one possible way to evolve intelligent

systems.
Interactive evolution: The interactive evolution approach involves the human
user of the evolutionary algorithm on-line into the variation-selection
loop. By means of this method, subjective judgment relying on human
intuition, esthetical values, or taste can be utilized for an evolutionary
algorithm if a fitness criterion can not be defined explicitly. Furthermore,
human problem knowledge can be utilized by interactive evolution to
support the search process by preventing unnecessary, obvious detours from
the global optimization goal. (See also Chapter 30.)
Intermediate recombination: Intermediate recombination performs an averaging operation on the components of the two parent vectors. (See also
Section 33.2.)
Intron: A region of codons within a gene that do not bear genetic information
that is expressed for the phenotype of an organism.
Inversion: A form of mutation that changes a chromosome by rotating an
internal segment by 180' and refitting the segment into the chromosome.
Lamarckism: A theory of evolution which preceded Darwin's. Lamarck
believed that acquired characteristics of an individual could be passed to its
offspring. Although Lamarckian inheritance does not take place in nature,
the idea has been usefully applied within some evolutionary algorithms.
Locus: A particular location on a chromosome.
Markov chain: A Markov process with a finite or countable finite number of
states.
Markov process: A stochastic process (a family of random variables) such
that the probability of the process being in a certain state at time k depends
on the state at time k - 1 , not on any states the process has passed earlier.
Because the offspring population of an evolutionary algorithm typically
depends only on the actual population, Markov processes are an appropriate
mathematical tool for the analysis of evolutionary algorithms.
Meiosis: The process of cell division in diploid organisms through which germ
cells (gametes) are created.
Metaevolution: The problem of finding optimal settings of the exogeneous
parameters of an evolutionary algorithm can itself be interpreted as an
optimization problem. Consequently, the attempt has been made to use
an evolutionary algorithm on the higher level to evolve optimal strategy
parameter settings for evolutionary algorithms, thus hopefully finding a
best-performing parameter set that can be used for a variety of objective
functions. The corresponding technique is often called a metaevolutionary
algorithm. An alternative approach involves the self-adaptation of strategy
parameters by evolutionary learning.
TEAM LRN
Migration: The transfer of an individual
from one subpopulation to another.

xxx

Glossary

Migration model: The migration model (often also referred to as the island
model) is one of the basic models of parallelism exploited by evolutionary
algorithm implementations. The population is no longer panmictic,
but distributed in to several independent subpopu I at ions (so-called demes ),
which coexist (typically on different processors, with one subpopulation
per processor) and may mutually exchange information by interdeme
migration. Each of the subpopulations corresponds to a conventional
(i.e. sequential) evolutionary algorithm. Since selection takes place
only locally inside a population, every deme is able to concentrate on
different promising regions of the search space, such that the global
search capabilities of migration models often exceed those of panmictic
populations. The fundamental parameters introduced by the migration
principle are the exchange frequency of information, the number of
individuals to exchange, the selection strategy for the emigrants, and the
replacement strategy for the immigrants.
Monte Carlo algorithm: See uniform random search.
( p . A) strategy: See comma strategy.
( p A) strategy: See plus strategy.
Multiarmed bandit: Classical analysis of schema processing relied on an
analogy to sampling from a number of slot machines (one-armed bandits)
in order to minimize expected losses.
Multimembered evolution strategy: All variants of evolution strategies that
use a parent population size of 1-1 > I and therefore facilitate the utilization
of recombination are summarized under the term multimembered evolution
strategy .
Multiobjective optimization: In multiobjective optimization, the simultaneous
optimization of several, possibly competing, objective functions is required.
The family of solutions to a multiobjective optimization problem is
composed of all those elements of the search space sharing the property that
the corresponding objective vectors cannot be all simultaneously improved.
These solutions are called Pareto optimal.
Multipoint crossover: A crossover operator which uses a predefined number
of uniformly distributed crossover points and exchanges alternating
segments between pairs of crossover points between the parent individuals
(cf one-point crossover).
Mutation: A change of the genetic material, either occurring in the germ path
or in the gametes (generative) or in body cells (somatic). Only generative
mutations affect the offspring. A typical classification of mutations
distinguishes gene mutations (a particular gene is changed), chromosome
mutations (the gene order is changed by translocation or inversion.
or the chromosome number is changed by deficiencies, deletions, or
duplications), and genome mutations (the number of chromosomes or
genomes is changed). In evolutionary algorithms, mutations are either
modeled on the phenotypic TEAM
level LRN
(e.g. by using normally distributed

xxxi

G 10ssary

variations with expectation zero for continuous traits) or on the genotypic

level (e.g. by using bit inversions with small probability as an equivalent
for nucleotide base changes). (See nlso Chnptcv- 32. )
Mutation rate: The probability of the occurrence of a mutation during DNA
replication.
Natural selection: The result of competitive exclusion as organisms f i l l the
available finite resource space.
Neural network: Artificial neural networks try to implement the data
processing capabilities of brains on a computer. To achieve this (at least in
a very simplified form regarding the number of procecsing units and their
interconnectivity), simple units (corresponding to neurons) are arranged i n
a number of layers and allowed to communicate via weighted connections
(corresponding to axons and dendrites). Working (at least principally) in
parallel. each unit of the network typically calculates a weighted sum of
its inputs, performs some internal mapping of the result, and e\.entually
propagates a nonzero value to its output connection. Though the artificial
models are strong simplifications of the natural model, impressive results
have been achieved in a variety of application fields.
Niche: Adaptation of a species occurs with respect to any major kind of
environment, the adaptive zone of this species. The set of possible
environments that permit survival of a species is called its (ecological)
niche.
Niching methods: In evolutionary algorithms, niching methods aim at the
formation and maintenance of stable subpopulations (niches) within a single
population. One typical way to achieve this proceeds by means of fitness
sharing techniques.
No-free-lunch theorem: This theorem proves that when applied across all
possible problems, all algorithms that do not resample points from the
search space perform exactly the same on average. This result implies that
it is necessary to tune the operators of an evolutionary algorithm to the
problem at hand in order to perform optimally, or even better than random
search. The no-free-lunch theorem has been extended to apply to certain
subsets of all possible problems. Related theorems have been developed
indicating that
Object variables: The parameters that are directly involved in the calculation
of the objective function value of an individual.
Off-line performance: A performance measure for genetic algorithms, giving
the average of the best fitness values found in a population over the course
of the search.
115 success rule: A theoretically derived rule for the deterministic adjustment
of the standard deviation of the mutation operator in a ( 1
I ) evolution
strategy. The 1/5 success rule reflects the theoretical result that, in order to
TEAM LRN
maximize the convergence velocity,
on average one out of five mutations

xxxii

Glossary

should cause an improvement with respect to the objective function value.

(See cilso Chapter 9.)
One-point crossover: A crossover operator using exactly one crossover point
on the genome.
On-line performance: A performance measure giving the average fitness over
all tested search points over the course of the search.
Ontogenesis: The development of an organism from the fertilized zygote until
its death.
Order: The order of a schema is given by the number of specified positions
within the schema. The larger the order of a schema, the higher becomes
its probability of disruption by mutation.
Order-based problems: A class of optimization problems that can be
characterized by the search for an optimal permutation of specific items.
Representative examples of this class are the traveling salesman problem
or scheduling problems. In principle, any of the existing evolutionary
algorithms can be reformulated for order-based problems, but the first
permutation applications were handled by so-called order-based genetic
algorithms, which typically use mutation and recombination operators
that ensure that the result of the application of an operator to a permutation
is again a permutation.
Order crossover: A crossover operator used in order-based genetic
algorithms to manipulate permutations in a permutation preserving way.
The order crossover (OX) starts in a way similar to partially matched
crossover by picking two crossing sites uniformly at random along the
permutations and mapping each string to constituents of the matching
section of its mate. Then, however, order crossover uses a sliding motion
to fill the holes left by transferring the mapped positions. This way,
order crossover preserves the relative positions of elements within the
permutation. (See also Section 33.3.)
Order statistics: Given A independent random variables with a common
probability density function, their arrangement in nondecreasing order
is called the order statistics of these random variables. The theory of
order statistics provides many useful results regarding the moments (and
other properties) of the members of the order statistics. In the theory
of evolutionary algorithms, the order statistics are widely utilized to
describe deterministic selection schemes such as the comma strategy and

tournament selection.
Panmictic population: A mixed population, in which any individual may
be mated with any other individual with a probability that depends only
on fitness. Most conventional evolutionary algorithms have panmictic
populations.
Parse tree: The syntactic structure of any program in computer programming
languages can be represented by a so-called parse tree, where the internal
LRN and leaves of the tree correspond
nodes of the tree correspond toTEAM
operators

xxxiii

G1ossary

to constants. Parse trees (or, equivalently, S-expressions) are the

fundamental data structure in genetic programming, where recombination
is usually implemented as a subtree exchange between two different parse
trees. (See also Chapter 19.)
Partially matched crossover: A crossover operator used to manipulate
permutations in a permutation preserving way. The partially matched
crossover (PMX) picks two crossing sites uniformly at random along the
permutations, thus defining a matching section used to effect a cross through
position-by-position exchange operations. (See also Section 33.3.)
Penalty function: For constraint optimization problems, the penalty function
method provides one possible way to try to achieve feasible solutions: the
unconstrained objective function is extended by a penalty function that
penalizes infeasible solutions and vanishes for feasible solutions. The
penalty function is also typically graded in the sense that the closer a
solution is to feasibility, the smaller is the value of the penalty term for
that solution. By means of this property, an evolutionary algorithm is
often able to approach the feasible region although initially all members of
the population might be infeasible.
Phenotype: The behavioral expression of the genotype in a specific
environment.
Phylogeny: The evolutionary relationships among any group of organisms.
Pleiotropy: The influence of a single gene on several phenotypic features of
an organism.
Plus strategy: The notation ( p A) strategy describes a selection method
introduced in evolution strategies and indicates that a parent population
of p individuals generates h
p offspring and all p
h individuals
compete directly, such that the p best out of parents and offspring are
deterministically selected as parents of the next generation.
Polygeny: The combined influence of several genes on a single phenotypical
characteristic.
Population: A group of individuals that may interact with each other, for
example, by mating and offspring production. The typical population
sizes in evolutionary algorithms range from one (for ( 1
1 ) evolution
strategies) to several thousands (for genetic programming).
Prokaryotic cell: A cell lacking a membrane-enclosed nucleus and organelles.
Proportional selection: A selection mechanism that assigns selection
probabilities in proportion to the relative fitness of an individual. (See
also Chapter 23.)
Protein: A multiply folded biological macromolecule consisting of a long chain
of amino acids. The metabolic effects of proteins are basically caused by
their three-dimensional folded structure (the tertiary structure) as well as
their symmetrical structure components (secondary structure), which result
TEAM
LRN(primary structure).
from the amino acid order in the
chain

xxxiv

Glossary

Punctuated crossover: A crossover operator to explore the potential for selfadaptation of the number of crossover points and their positions. To
achieve this, the vector of object variables is extended by a crossover
mask, where a one bit indicates the position of a crossover point in
the object variable part of the individual. The crossover mask itself is
subject to recombination and mutation to allow for a self-adaptation of the
crossover operator.
Rank-based selection: In rank-based selection methods, the selection
probability of an individual does not depend on its absolute fitness as in
case of proportional selection, but only on its relative fitness in comparison
with the other population members: its rank when all individuals are
ordered in increasing (or decreasing) order of fitness values. (See NISO
Chapter 25. )
Recombination: See crossover.
RNA: Ribonucleic acid. The transcription process in the cell nucleus
generates a copy of the nucleotide sequence on the coding strand of the
DNA. The resulting copy is an RNA molecule, a single-stranded molecule
which carries information by means of the necleotide bases adenine,
cytosine, guanine, and uracil (U) (replacing the thymine in the DNA).
The RNA molecule acts as a messenger that transfers information from the
cell nucleus to the ribosomes, where the protein synthesis takes place.
Scaling function: A scaling function is often used when applying proportional
selection, particularly when needing to treat individuals with non-positive
evaluations. Scaling functions typically employ a linear, logarithmic, or
exponential mapping. (See also Chapter 23.)
Schema: A schema describes a subset of all binary vectors of fixed length
that have similarities at certain positions. A schema is typically specified
by a vector over the alphabet (0, 1, #}. where the ## denotes a wildcard
matching both zero and one.
Schema theorem: A theorem offered to describe the expected number of
instances of a schema that are represented in the next generation of an
evolutionary algorithm when proportional selection is used. Although
once considered to be a fundamental theorem, mathematical results show
that the theorem does not hold in general when iterated over more than one
generation and that it may not hold when individual solutions have noisy
fitness evaluations. Furthermore, the theorem cannot be used to determine
which schemata should be recombined in future generations and has little
or no predictive power.
Segmented crossover: A crossover operator which works similarly to
multipoint crossover, except that the number of crossover points is not
fixed but may vary around an expectation value. This is achieved by a
segment switch rate that specifies the probability that a segment will end
TEAM LRN
at any point in the string.

GI0ssary

xxxv

Selection: The operator of evolutionary algorithms, modeled after the

principle of natural selection, which is used to direct the search process
towards better regions of the search space by giving preference to
individuals of higher fitness for mating and reproduction. The most widely
used selection methods include the comma and plus strategies, ranking
selection, proportional selection, and tournament selection. (See also
Chapters 22-30.)
Self-adaptation: The principle of self-adaptation facilitates evolutionary
algorithms learning their own strategy parameters on-line during the
search, without any deterministic exogeneous control, by means of
evolutionary processes in the same way as the object variables are
modified. More precisely, the strategy parameters (such as mutation rates,
variances, or covariances of normally distributed variations) are part of
the individual and undergo mutation (recombination) and selection as the
object variables do. The biological analogy consists in the fact that some
portions of the DNA code for mutator genes or repair enzymes; that is,
some partial control over the DNAs mutation rate is encoded in the DNA.
Sharing: Sharing (short for fitness sharing) is a niching method that derates the
fitnesses of population elements according to the number of individuals in
a niche, so that the population ends up distributed across multiple niches.
Simulated annealing: An optimization strategy gleaned from the model of
thermodynamic evolution, modeling an annealing process in order to reach
a state of minimal energy (where energy is the analogue of fitness in
evolutionary algorithms). The strategy works with one trial solution and
generates a new solution by means of a variation (or mutation) operator.
The new solution is always accepted if it represents a decrease of energy,
and it is also accepted with a certain parameter-controlled probability if
it represents an increase of energy. The control parameter (or strategy
parameter) is commonly called temperature and makes the thermodynamic
origin of the strategy obvious.
Speciation: The process whereby a new species comes about. The most
common cause of speciation is that of geographical isolation. If a
subpopulation of a single species is separated geographically from the
main population for a sufficiently long time, its genes will diverge (either
due to differences in selection pressures in different locations, or simply
due to genetic drift). Eventually, genetic differences will be so great
that members of the subpopulation must be considered as belonging to a
different (and new) species.
Species: A population of similarly constructed organisms, capable of
producing fertile offspring. Members of one species occupy the same
ecological niche.
Steady-state selection: A selection scheme which does not use a generationwise replacement of the population, but rather replaces one individual
TEAM LRN
per iteration of the main recombine-mutate-select
loop of the algorithm.

xxxvi

Glossary

Usually, the worst population member is replaced by the result of

recombination and mutation, if the resulting individual represents a fitness
improvement compared to the worst population member. The mechanism
corresponds to a ( p 1) selection method in evolution strategies (cf plus
strategy).
Strategy parameter: The control parameters of an evolutionary algorithm
are often referred to as strategy parameters. The particular setting of
strategy parameters is often critical to gain good performance of an
evolutionary algorithm, and the usual technique of empirically searching for
an appropriate set of parameters is not generally satisfying. Alternatively,
some researchers try techniques of metaevolution to optimize the strategy
parameters, while in evolution strategies and evolutionary programming
the technique of self-adaptation is successfully used to evolve strategy
parameters in the same sense as object variables are evolved.
Takeover time: A characteristic value to measure the selective pressure of
selection methods utilized in evolutionary algorithms. It gives the
expected number of generations until, under repeated application of
selection as the only operator acting on a population, the population is
completely filled with copies of the initially best individual. The smaller
the takeover time of a selection mechanism, the higher is its emphasis on
reproduction of the best individual, i.e. its selective pressure.
Tournament selection: Tournament selection methods share the principle of
holding tournaments between a number of individuals and selecting the
best member of a tournament group for survival to the next generation.
The tournament members are typically chosen uniformly at random, and
the tournament sizes (number of individuals involved per tournament) are
typically small, ranging from two to ten individuals. The tournament
process is repeated p times in order to select a population of p members.
(See also Chapter 24.)
Transcription: The process of synthesis of a messenger RNA (mRNA)
reflecting the structure of a part of the DNA. The synthesis is performed
in the cell nucleus.
Translation: The process of synthesis of a protein as a sequence of amino
acids according to the information contained in the messenger RNA and
the genetic code between triplets of nucleotide bases and amino acids. The
synthesis is performed by the ribosomes under utilization of transfer RNA
molecules.
Two-membered evolution strategy: The two-membered or (1 I ) evolution
strategy is an evolutionary algorithm working with just one ancestor
individual. A descendant is created by means of mutation, and selection
selects the better of ancestor and descendant to survive to the next
generation (cf plus strategy).
Uniform crossover: A crossover operator which was originally defined to
LRN
work on binary strings. The TEAM
uniform
crossover operator exchanges each

Glossary

xxxvii

bit with a certain probability between the two parent individuals. The
exchange probability typically has a value of one half, but other settings
are possible (cf discrete recombination). (See also Section 33.3.)
Uniform random search: A random search algorithm which samples the
search space by drawing points from a uniform distribution over the search
space. In contrast to evolutionary algorithms, uniform random search does
not update its sampling distribution according to the information gained
from past samples, i.e. it is not a Markov process.
Zygote: A fertilized egg that is always diploid.

TEAM LRN

This page intentionally left blank

TEAM LRN

1
Introduction to evolutionary computation
David B Fogel

1.1

Introductory remarks

As a recognized field, evolutionary computation is quite young.

The term
itself was invented as recently as 1991, and it represents an effort to bring
together researchers who have been following different approaches to simulating
various aspects of evolution. These techniques of genetic algorithms (Chapter 7),
evolution strategies (Chapter 8), and evolutionary programming (Chapter 9) have
one fundamental commonality: they each involve the reproduction, random
variation, competition, and selection of contending individuals in a population.
These form the essential essence of evolution, and once these four processes are
in place, whether in nature or in a computer, evolution is the inevitable outcome
(Atmar 1994). The impetus to simulate evolution on a computer comes from at
least four directions.

1.2 Optimization
Evolution is an optimization process (Mayr 1988, p 104). Darwin ( 1 859, ch 6)
was struck with the organs of extreme perfection that have been evolved, one
such example being the image-forming eye (Atmar 1976). Optimization does not
imply perfection, yet evolution can discover highly precise functional solutions
to particular problems posed by an organisms environment, and even though
the mechanisms that are evolved are often overly elaborate from an engineering
perspective, function is the sole quality that is exposed to natural selection, and
functionality is what is optimized by iterative selection and mutation.
It is quite natural, therefore, to seek to describe evolution in terms of an
algorithm that can be used to solve difficult engineering optimization problems.
The classic techniques of gradient descent, deterministic hill climbing, and
purely random search (with no heredity) have been generally unsatisfactory when
applied to nonlinear optimization problems, especially those with stochastic,
temporal, or chaotic components. But these are the problems that nature has
seemingly solved so very well. Evolution provides inspiration for computing
TEAM LRN

Introduction to evolutionary computation

the solutions to problems that have previously appeared intractable. This was a
key foundation for the efforts in evolution strategies (Rechenberg 1965, 1994,
Schwefel 1965, 1995).

1.3 Robust adaptation

The real world is never static, and the problems of temporal optimization are
some of the most challenging. They require changing behavioral strategies
in light of the most recent feedback concerning the success or failure of the
current strategy. Holland (1973, under the framework of genetic algorithms
(formerly called reproductive plans), described a procedure that can evolve
strategies, either in the form of coded strings or as explicit behavioral rule bases
called classifier systems (Chapter I2), by exploiting the potential to recombine
successful pieces of competing strategies, bootstrapping the knowledge gained
by independent individuals. The result is a robust procedure that has the potential
to adjust performance based on feedback from the environment.

1.4 Machine intelligence

Intelligence may be defined as the capability of a system to adapt its behavior to
meet desired goals in a range of environments (Fogel 1995, p xiii). Intelligent
behavior then requires prediction, lor adaptation to future circumstances requires
predicting those circumstances and taking appropriate action. Evolution has
created creatures of increasing intelligence over time. Rather than seek to
generate machine intelligence by replicating humans, either in the rules they
may follow or in their neural connections, an alternative approach to generating
machine intelligence is to simulate evolution on a class of predictive algorithms.
This was the foundation for the evolutionary programming research of Fogel
(1962, Fogel et a1 1966).

1.5 Biology
Rather than attempt to use evolution as a tool to solve a particular engineering
problem, there is a desire to capture the essence of evolution in a computer
simulation and use the simulation to gain new insight into the physics of natural
evolutionary processes (Ray 1991) (see also Chapter 4). Success raises the
possibility of studying alternative biological systems that are merely plausible
images of what life might be like in some way. It also raises the question of what
properties such imagined systems might have in common with life as evolved on
Earth (Langton 1987). Although every model is incomplete, and assessing what
life might be like in other instantiations lies in the realm of pure speculation,
computer simulations under the rubric of artificial life have generated some
LRN
patterns that appear to correspond TEAM
with naturally
occurring phenomena.

Discussion

1.6 Discussion
The ultimate answer to the question why simulate evolution? lies in the lack
of good alternatives. We cannot easily germinate another planet, wait several
millions of years, and assess how life might develop elsewhere. We cannot
easily use classic optimization methods to find global minima in functions when
they are surrounded by local minima. We find that expert systems and other
attempts to mimic human intelligence are often brittle: they are not robust to
changes in the domain of application and are incapable of correctly predicting
future circumstances so as to take appropriate action. In contrast, by successfully
exploiting the use of randomness, or in other words the itsefd use c,furzc.er?ainty.
all possible pathways are open for evolutionary computation (Hofstadter 1995,
p 1 IS). Our challenge is, at least in some important respects, to not allow our
own biases to constrain the potential for evolutionary computation to discover
new solutions to new problems in fascinating and unpredictable ways. However,
as always, the ultimate advancement of the field will come from the careful
abstraction and interpretation of the natural processes that inspire it.

References
Atmar J W 1976 Speculation on the Evolution of Intelligence und its Possible Reuli:ation
in Machine Form Doctoral Dissertation, New Mexico State University
Atmar W 1994 Notes on the simulation of evolution IEEE Truns. Neural NrtMwks NN-5
130-47
Darwin C R 1859 On the Origin of Species by Means of Nutiiral Selection or the
Presenwtion of Failoured Races in the Struggle for L f e (London: Murray)
Fogel D B 1995 Eidutionan Computation: ToMard a Neuv Philosophy cf Machine
Intelligence (Piscataway, NJ: IEEE)
Fogel L J 1962 Autonomous automata Industr. Res. 4 14-9
Fogel L J, Owens A J and Walsh M J 1966 Artijicial Intelligence through Simulated
Evolution (New York: Wiley)
Hofstadter D I995 Fluid Concepts and Creative Analogies: Computer Models of the
Fundamental Mechanisms of Thought (New York: Basic Books)
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Langton C G 1987 Artificial life Art$cial Life ed C G Langton (Reading, MA: AddisonWesley) pp 1 4 7
Mayr E 1988 Touwd a Neu- Philosophy of Biology: 0bser~~ution.s
of an Eidutionist
(Cambridge, MA: Belknap)
Ray T 1991 An approach to the synthesis of life Art$cinl Ltfe I1 ed C G Langton, C
Taylor, J D Farmer and S Rasmussen (Reading, MA: Addison-Wesley) pp 37 1-408
Rechenberg I I965 Cybernetic Solution Path of an Experimental Problem Royal Aircraft
Establishment Library Translation I 122, Farnborough, UK
-I994 Ei,olutionsstrateC:ieS 94 (Stuttgart: Frommann-Holzboog)
Schwefel H-P 1965 Kybernetische Evolution als Strategie der Experimentellen Forschung
in der Striimungstechnik Diploma Thesis, Technical University of Berlin
-1
995 Evolution and Optimum Seeking (New York: Wiley)
TEAM LRN

2
Possible applications of evolutionary
computation
David Beasley

2.1 Introduction
Applications of evolutionary computation (EC) fall into a wide continuum of
areas. For convenience, in this chapter they have been split into five broad
categories:
0
0
0
0
0

planning
design
simulation and identification
control
classification.

These categories are by no means meant to be absolute or definitive. They

all overlap to some extent, and many applications could rightly appear in more
than one of the categories.
A number of bibliographies where more extensive information on EC
applications can be found are listed after the references at the end of this chapter.
2.2 Applications in planning
2.2. I

Routing

Perhaps one of the best known combinatorial optimization problems is the

traveling salesman problem or TSP (Goldberg and Lingle 1985, Grefenstette
1987, Fogel 1988, Oliver et a1 1987, Miihlenbein 1989, Whitley er a1 1989,
Fogel 1993a, Homaifar et a1 1993). A salesman must visit a number of cities,
and then return home. In which order should the cities be visited to minimize
the distance traveled? Optimizing the tradeoff' between speed and accuracy of
solution has been one aim (Verhoeven et a1 1992).
A generalization of the TSP occurs when there is more than one salesman
(Fogel 1990). The vehic-le routing problem is similar. There is a fleet of vehicles,
4

TEAM LRN

Applications in planning

all based at the same depot. A set of customers must each receive one delivery.
Which route should each vehicle take for minimum cost? There are constraints,
for example, on vehicle capacity and delivery times (Blanton and Wainwright
1993, Thangia et a1 1993).
Closely related to this is the transportation problem, in which a single
commodity must be distributed to a number of customers from a number of
depots. Each customer may receive deliveries from one or more depots. What
is the minimum-cost solution? (Michalewicz 1992, 1993).
Planning the path which a robot should take is another route planning
problem. The path must be feasible and safe (i.e. it must be achievable within the
operational constraints of the robot) and there must be no collisions. Examples
include determining the joint motions required to move the gripper of a robot
arm between locations (Parker et a1 1989, Davidor I99 I , McDonnell et a1 1992).
and autonomous vehicle routing (Jakob et a1 1992, Page et a1 1992). In unknown
areas or nonstatic environments, on-line planninghavigating is required, in
which the robot revises its plans as it travels.

2.2.2 Scheduling
Scheduling involves devising a plan to carry out a number of activities over a
period of time, where the activities require resources which are limited, there
are various constraints and there are one or more objectives to be optimized.
Job shop scheduling is a widely studied NP-complete problem (Davis 1985,
Biegel and Davern 1990, Syswerda 1991, Yamada and Nakano 1992). The
scenario is a manufacturing plant, with machines of different types. There are
a number of jobs to be completed, each comprising a set of tasks. Each task
requires a particular type of machine for a particular length of time, and the tasks
for each job must be completed in a given order. What schedule allows all tasks
to be completed with minimum cost? Husbands (1993) has used the additional
biological metaphor of an ecosystem. His method optimizes the sequence of
tasks in each job at the same time as it builds the schedule. In real job shops
the requirements may change while the jobs are being carried out, requiring that
the schedule be replanned (Fang et a1 1993). In the limit, the manufacturing
process runs continuously, so all scheduling must be carried out on-line, as in
a chemical flowshop (Cartwright and Tuson 1994).
Another scheduling problem is to devise a timetable for a set of examinations
(Corne et a1 1994), university lectures (Ling 1992), a staff rota (Easton and
Mansour 1993) or suchlike.
In computing, scheduling problems include efficiently allocating tasks to
processors in a multiprocessor system (Van Driessche and Piessens 1992,
Kidwell 1993, Fogel and Fogel 1996), and devising memory cache replacement
TEAM LRN
policies (Altman et a1 1993).

Possible applications of evolutionary computation

2.2.3 Packing

Evolutionary algorithms (EAs) have been applied to many packing problems, the
simplest of which is the one-dimensional zero-one knapsack problem. Given a
knapsack of a certain capacity, and a set of items, each with a particular size and
value, find the set of items with maximum value which can be accommodated
in the knapsack. Various real-world problems are of this type: for example, the
allocation of communication channels to customers who are charged at different
rates.
There are various examples of two-dimensional packing problems. When
manufacturing items are cut from sheet materials (e.g. metal or cloth), it is
desirable to find the most compact arrangemerit of pieces, so as to minimize
the amount of scrap (Smith 1985, Fujita et a1 1993). A similar problem arises
in the design of layouts for integrated circuits--how should the subcircuits be
arranged to minimize the total chip area required (Fourman 1985, Cohoon and
Paris 1987, Chan et a1 1991)?
In three dimensions, there are obvious applications in which the best way of
packing objects into a restricted space is required. Juliff (1993) has considered
the problem of packing goods into a truck for delivery.

2.3 Applications in design

The design of filters has received considerable attention. EAs have been used
to design electronic or digital systems which implement a desired frequency
response. Both finite impulse response (FIR) and infinite impulse response
(IIR) filter structures have been employed (Etter et a1 1982, Suckley 1991, Fogel
1991, Fonseca et a1 1993, Ifeachor and Harris 1993, Namibar and Mars 1993,
Roberts and Wade 1993, Schaffer and Eshelman 1993, White and Flockton 1993,
Wicks and Lawson 1993, Wilson and Macleod 1993). EAs have also been used
to optimize the design of signal processing systems (San Martin and Knight
1993) and in integrated circuit design (Louis and Rawlins 1991, Rahmani and
Ono 1993). The unequal-area facility layout problem (Smith and Tate 1993)
is similar to integrated circuit design. It involves finding a two-dimensional
arrangement of departments such that the distance which information has to
travel between departments is minimized.
EC techniques have been widely applied to artificial neural networks, both in
the design of network topologies and in the search for optimum sets of weights
(Miller et a1 1989, Fogel et a1 1990, Harp and Samad 1991, Baba 1992, Hancock
1992, Feldman 1993, Gruau 1993, Polani and Uthmann 1993, Romaniuk 1993,
Spittle and Horrocks 1993, Zhang and Muhlenbein 1993, Port0 et a1 1995). They
have also been applied to Kohonen feature map design (Polani and Uthmann
1992). Other types of network design problems have also been approached, for
TEAM
example, in telecommunications (Cox
e f LRN
nl 1991, Davis and Cox 1993).

Applications in simulation and identification

There have been many engineering applications of EC: structure design,

both two-dimensional, such as a plane truss (Lohmann 1992, Watabe and Okino
1993), and three-dimensional, such as aircraft design (Bramlette and Bouchard
1991 ), actuator placement on space structures (Furuya and Haftka 1993), linear
accelerator design, gearbox design, and chemical reactor design (Powell and
Skolnick 1993). In relation to high-energy physics, the design of Monte Carlo
generators has been tackled.
In order to perform parallel computations requiring global coordination,
EC has been used to design cellular automata with appropriate communication
mechanisms.
There have also been applications in testing and fault diagnosis. For
example, an EA can be used to search for challenging fault scenarios for an
autonomous vehicle controller.

2.4 Applications in simulation and identification

Simulation involves taking a design or model for a system, and determining how
the system will behave. In some cases this is done because we are unsure about
the behavior (e.g. when designing a new aircraft). In other cases, the behavior
is known, but we wish to test the accuracy of the model.
EC has been applied to difficult problems in chemistry and biology. Roosen
and Meyer (1992) used an evolution strategy to determine the equilibrium of
chemically reactive systems, by determining the minimum free enthalpy of the
compounds involved. The determination of the three-dimensional structure of
a protein, given its amino acid sequence, has been tackled (Lucasius et a1
1991). Lucasius and Kateman (1992) approached this as a sequenced subset
selection problem, using two-dimensional nuclear magnetic resonance spectrum
data as a starting point. Others have searched for energetically favorable protein
conformations (Schulze-Kremer 1992, Unger and Moult 1993), and used EC to
assist with drug design (Gehlhaar et a1 1995). EC has been used to simulate
how the nervous system learns in order to test an existing theory. Similarly, EC
has been used in order to help develop models of biological evolution.
In the field of economics, EC has been used to model economic interaction
of competing firms in a market.
Identification is the inverse of simulation. It involves determining the design
of a system given its behavior.
Many systems can be represented by a model which produces a single-valued
output in response to one or more input signals. Given a number of observations
of input and output values, system identification is the task of deducing the
details of the model. Flockton and White (1993) concern themselves with
determining the poles and zeros of the system.
One reason for wanting to identify systems is so that we can predict the
output in response to a given set of inputs. EC may also employed to fit
TEAM
LRNin order to predict future values.
equations to noisy, chaotic medical
data,

Possible applications of evolutionary computation

Janikow and Cai (1992) similarly used EC to estimate statistical functions for
survival analysis in clinical trials. In a similar area, Manela et ul (1993) used
EC to fit spline functions to noisy pharmaceutical fermentation process data.
EC may also be used to identify the sources of airborne pollution, given data
from a number of monitoring points in an urban area-the source apportionment
problem. In electromagnetics, Tanaka et a1 (1993) have applied EC to
determining the two-dimensional current distribution in a conductor, given its
external magnetic field. Away from conventional system identification, an EC
approach has been used to help with identifying criminal suspects. This system
helps witnesses to create a likeness of the suspect, without the need to give an
explicit description.

2.5 Applications in control

There are two distinct approaches to the use of EC in control: off-line and
on-line. The off-line approach uses an EA to design a controller, which is then
used to control the system. The on-line approach uses an EA as an active
part of the control process. Therefore, with the off-line approach there is
nothing evolutionary about the control process itself, only about the design
of the controller.
Some researchers (Fogel et a1 1966, DeJong 1980) have sought to use the
adaptive qualities of EAs in order to build on-line controllers for dynamic
systems. The advantage of an evolutionary controller is that it can adapt to
cope with systems whose characteristics change over time, whether the change is
gradual or sudden. Most researchers, however, have taken the off-line approach
to the control of relatively unchanging systems.
Fonseca and Fleming (1993) used an EA to design a controller for a gas
turbine engine, to optimize its step response, and a control system has been
used to optimize combustion in multiple-burner furnaces and boiler plants.
EC has also been applied to the control of guidance and navigation systems
(Krishnakumar and Goldberg 1990, 1992).
Hunt (1992b) has tackled the problem of synthesizing LQG (linearquadratic-Gaussian) and H , (H-infinity) optimal controllers. He has also
considered the frequency domain optimization of controllers with fixed structures
(Hunt 1992a).
Two control problems which have been well studied are balancing a pole
on a movable cart (Fogel 1995), and backing up a trailer truck to a loading
bay from an arbitrary starting point (Abu Zitar and Hassoun 1993). In robotics,
EAs have been developed which can evolve control systems for visually guided
behaviors. They can also learn how to control mobile robots (Kim and Shim
1995), for example, controlling the legs of a six-legged insect to make it crawl
or walk (Spencer 1993). Almhssy and Verschure (1992) modeled the interaction
TEAM LRNof individuals during their lifetimes
between natural selection and the adaptation

Applications in classification

to develop an agent with a distributed adaptive control framework which learns

to avoid obstacles and locate food sources.

2.6 Applications in classification

As described in Chapter 12, a significant amount of EC research has concerned
the theory and practice of classifier systems (CFS) (Booker 1985, Holland
1985, 1987, Holland et a1 1987, Robertson 1987, Wilson 1987, Fogarty 1994).
Classifier systems are at the heart of many other types of system. For example,
many control systems rely on being able to classify the characteristics of their
environment before an appropriate control decision can be made. This is true
in many robotics applications of EC, for example, learning to control robot arm
motion (Pate1 and Dorigo 1994) and learning to solve mazes (Pipe and Carse
1994).
An important aspect of a classifier system, especially in a control application,
is how the state space is partitioned. Many applications take for granted
a particular partitioning of the state space, while in others, the appropriate
partitioning of the state space is itself part of the problem (Melhuish and Fogarty
I 994).
Game playing is another application for which classification plays a key
role. Although EC is often applied to rather simple games (e.g. the prisoners
dilemma (Axelrod 1987, Fogel 1993b)), this is sometimes motivated by more
serious applications, such as military ones (e.g. the two-tanks game (Fairley and
Yates 1994) and air combat maneuvering.
EC has been hybridized with feature partitioning and applied to a range of
tasks (Guvenir and Sirin 1993), including classification of iris flowers, prediction
of survival for heart attack victims from echocardiogram data, diagnosis of heart
disease, and classification of glass samples. In linguistics, EC has been applied
to the classification of Swedish words.
In economics, Oliver (1993) has found rules to reflect the way in which
consumers choose one brand rather than another, when there are multiple criteria
on which to judge a product. A fuzzy hybrid system has been used for financial
decision making, with applications to credit evaluation, risk assessment, and
insurance underwriting.
In biology, EC has been applied to the difficult task of protein secondarystructure determination, for example, classifying the locations of particular
protein segments (Handley 1993). It has also been applied to the classification
of soil samples (Punch et a1 1993).
In image processing, there have been further military applications,
classifying features in images as targets (Bala and Wechsler 1993, Tackett 1993),
and also non-military applications, such as optical character recognition.
Of increasing importance is the efficient storage and retrieval of information,
including the generation of equifrequency distributions of material, to improve
TEAM LRN
that efficiency. EC has also been employed
to assist with the representation and

Possible applications of evolutionary computation

storage of chemical structures, and the retrieval from databases of molecules

containing certain substructures (Jones er a1 1993). The retrieval of documents
which match certain characteristics is becoming increasingly important as more
and more information is held on-line. Tools to retrieve documents which contain
specified words have been available for many years, but they have the limitation
that constructing an appropriate search query can be difficult. Researchers are
now using EAs to help with query construction (Yang and Korfhage 1993).

2.7 Summary
EC has been applied in a vast number of application areas. In some cases it has
advantages over existing computerized techniques. More interestingly, perhaps,
it is being applied to an increasing number of areas in which computers have
not been used before. We can expect to see the number of applications grow
considerably in the future. Comprehensive bibliographies in many different
application areas are listed after the References.

References
Abu Zitar R A and Hassoun M H 1993 Regulator control via genetic search and assisted
reinforcement Proc. 5th Int. Conj: on Genetic Algorithms (Urbunu-Chumpuign, IL,
July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 254-62
Almassy N and Verschure P 1992 Optimizing self-organising control architectures with
genetic algorithms: the interaction between natural selection and ontogenesis
Purullel Problem Solving from Nuture, 2 (Proc. 2nd Int. Conj: on Purullel Problem
Sohping from Nuture, Brussels, 1992) ed R Manner and B Manderick (Amsterdam:
Elsevier) pp 451-60
Altman E R, Agarwal V K and Gao G R 1993 A novel methodology using genetic
algorithms for the design of caches and cache replacement policy Pmc. 5th Irtt.
Conj: on Genetic Algorithms ( Urbunu-Chumpuign, IL, July 1993) ed S Forrest (San
Mateo, CA: Morgan Kaufmann) pp 392-9
Axelrod R 1987 The evolution of strategies in the iterated prisoners dilemma Genetic
Algorithms und Sirnulated Annealing ed L Davis (Boston, MA: Pitman) ch 3, pp 3241
Baba N 1992 Utilization of stochastic automata and genetic algorithms for neural network
learning Purullel Problem Solving from Nuture, 2 (Proc. 2nd Int. Conf on Purullel
Problem Solving frorn Nuture, Brussels, 1992) ed R Manner and B Manderick
(Amsterdam: Elsevier) pp 43 1-40
Bagchi S, Uckun S, Miyabe Y and Kawamura K 1991 Exploring problem-specific
recombination operators for job shop scheduling Proc. 4th lnt. ConJ on Genetic
Algorithms (Sun Diego, CA, July 1991) ed R Belew and L Booker (San Mateo, CA:
Morgan Kaufmann) pp 10-7
Bala J W and Wechsler H 1993 Learning to detect targets using scale-space and genetic
search Proc. 5th lnt. Conj: on Genetic Algorithms ( Urbunu-Champuigtt, IL, July
TEAM LRN
1993) ed S Forrest (San Mateo. CA: Morgan Kaufmann) pp 516-22

References

Biegel J E and Davern J J 1990 Genetic algorithms and job shop scheduling Comput.
Indust. Eng. 19 8 1-9 I
Blanton J L and Wainwright R L 1993 Multiple vehicle routing with time and capacity
constraints Proc. 5th lnt. Cot$ on Genetic Algorithms (Urbana-Champaign, lL, July
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 452-9
Booker L 1985 Improving the performance of genetic algorithms in classifier systems
Proc. 1st h i t . Con$ on Genetic Algorithms (Pittsburgh, PA, July 1985) ed J J
Grefenstette (Hillsdale, NJ: Lawrence Erlbaum Associates) pp 80-92
Bramlette M F and Bouchard E E 1991 Genetic algorithms in parametric design of
aircraft Handbook of Genetic Algorithms ed L Davis (New York: Van Nostrand
Reinhold) ch 10, pp 109-23
Cartwright H M and Tuson A L 1994 Genetic algorithms and flowshop scheduling:
towards the development of a real-time process control system Eidutiorzan
Computing (AISB Workshop, Leeds, 1994, Selected Pupers) (Lecture Notes itz
Computer Science 865) ed T C Fogarty (Berlin: Springer) pp 277-90
Chan H, Mazumder P and Shahookar K 1991 Macro-cell and module placement by
genetic adaptive search with bitmap-represented chromosome Ititegrutiotz VLSI J .
12 49-77
Cohoon J P and Paris W D 1987 Genetic placement IEEE Tram. Computer-Aided Design
CAD-6 956-64
Corne D, Ross P and Fang H-L 1994 Fast practical evolutionary timetabling Evolutionary
Computing (AISB Workshop, Leeds, 1994, Selected Papers) (Lecture Notes in
Computer Science 865) ed T C Fogarty (Berlin: Springer) pp 250-63
Cox L A, Davis L and Qiu Y 1991 Dynamic anticipatory routing in circuit-switched
telecommunications networks Handbook of Genetic Algorithms ed L Davis (New
York: Van Nostrand Reinhold) ch 1 I , pp 124-43
Davidor Y 1991 A genetic algorithm applied to robot trajectory generation Handbook
of Genetic Algorithms ed L Davis (New York: Van Nostrand Reinhold) ch 12,
pp 144-65
Davis L 1985 Job shop scheduling with genetic algorithms Proc. 1st hit. Cot$ on Genetic
AIgorithms (Pittsburgh, PA, July 1985) ed J J Grefenstette (Hillsdale, NJ: Lawrence
Erlbaum Associates) pp 1 3 6 4 0
Davis L and Cox A 1993 A genetic algorithm for survivable network design Proc. 5th
Int. Con$ on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest
(San Mateo, CA: Morgan Kaufmann) pp 408-15
DeJong K 1980 Adaptive system design: a genetic approach IEEE Trans. Systems, Man
Cybern. SMC-10 566-74
Easton F F and Mansour N 1993 A distributed genetic algorithm for employee staffing
and scheduling problems Proc. 5th In?. Con5 on Genetic Algorithms (UrbatiaChampaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann)
pp 360-67
Etter D M, Hicks M J and Cho K H 1982 Recursive adaptive filter design using
an adaptive genetic algorithm IEEE Int. Con$ on Acoutics, Speech arid Signal
Processing (Piscataway, NJ: IEEE) pp 635-8
Fairley A and Yates D F 1994 Inductive operators and rule repair in a hybrid genetic
learning system: some initial results Evolutionary Computing (AISB Workshop,
Leeds, 1994, Selected Papers) (Lecture Notes in Computer Science 865) ed T C
TEAM LRN
Fogarty (Berlin: Springer) pp 166-79

Possible applications of evolutionary computation

Fang H-L, Ross P and Corne D 1993 A promising genetic algorithm approach to jobshop scheduling, rescheduling and open-shop scheduling problems Proc. 5th lnr.
Conf on Genetic Algorithms (Urbunu-Chumpuigti, IL, July 1993) ed S Forrest (San
Mateo, CA: Morgan Kaufmann) pp 375-82
Feldman D S 1993 Fuzzy network synthesis with genetic algorithms Proc. 5th Int. Cot@
on Genetic Algorithms ( Urbanu-Chumpaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 3 12-7
Flockton S J and White M 1993 Pole-zero system identification using genetic algorithms
Proc. 5th lnt. Cot$ on Genetic Algorithms ( Urbunu-Chumpuigti, IL, July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 531-5
Fogarty T C 1994 Co-evolving co-operative populations of rules in learning control
systems Evolutionan Computing (AISB Workshop, Leeds, 1994, Selected Pupers)
(Le(-tureNotes in Computer Science 865) ed T C Fogarty (Berlin: Springer) pp 195209
Fogel D B 1988 An evolutionary approach to the traveling salesman problem Biol.
Cybernet. 6 1 3 9 4 4
-1990
A parallel processing approach to a multiple traveling salesman problem
using evolutionary programming Proc. 4rh Atin. Synip. on Purullel Processing
(Piscataway, NJ: IEEE) pp 318-26
-199 I System Ident$c~ution through Siniuluted Evolution (Needham, MA: Ginn)
-1993a Applying evolutionary programming to selected traveling salesman problems
Cyberner. Sysr. 24 27-36
-1993b Evolving behaviors in the iterated prisoners dilemma E\wlur. Compur. 1
77-97
-1995 E\dutionury Coniputution: T0Mw-d U N e ~ vPhilosophy ($Muchine Intelligenc~e
(Piscataway. NJ: IEEE)
Fogel D B and Fogel L J 1996 Using evolutionary programming to schedule tasks on a
suite of heterogeneous computers Cotput. Operat. Res. 23 527-34
Fogel D B, Fogel L J and Porto V W 1990 Evolving neural networks B i d . Cyhern. 63
387-93
Fogel L J, Owens A J and Walsh M J 1966 Artijicial intelligence Through Simulated
E\diition (New York: Wiley)
Fonseca C M and Fleming P J 1993 Genetic algorithms for multiobjective optimization:
formulation, discussion and generalization Pro(-. 5th lnt. Conf on Genetic
Algorithms ( Urburrci-Chutmpciigti, IL, July 199.1) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 416-23
Fonseca C M, Mendes E M, Fleming P J and Billings S A 1993 Non-linear model
term selection with genetic algorithms Nutiirul Algorithms in Signal Processing
( Workshop, Chelmsfbrd, U K , No\*emher 1993) vol 2 (London: IEE) pp 27/1-27/8
Fourman M P 1985 Compaction of symbolic layout using genetic algorithms Proc. 1st
lnt. Cot$ on Generic Algorithms (Pittsburgh, PA, July 1985) ed J J Grefenstette
(Hillsdale, NJ: Lawrence Erlbaum Associates) pp 14 1-53
Fujita K, Akagi S and Hirokawa N 1993 Hybrid approach for optimal nesting
using genetic algorithm and a local minimization algorithm Ad\unce.s in Design
Automution vol I , DE-65- 1 (ASME) pp 477-84
Furuya H and Haftka R T 1993 Genetic algorithms for placing actuators on space
structures Proc. 5th Irrt. Conj: on Genetic Algorithms ( Urhana-Chantpaigti, IL, July
TEAM LRN
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 536-42

References

Gehlhaar D K, Verkhivker G M, Rejto P A, Sherman C J, Fogel D B, Fogel L J and

Freer S T 1995 Molecular recognition of the inhibitor AG- 1343 by HIV- 1 protcaw:
conformationally flexible docking by evolutionary programming Chcvn. Biol. 2 3 1724
Goldberg D E and Lingle R 1985 Alleles, loci and the travelling salesman problem Proc.
1st h i t . Col$ on Genetic Algorithms (Pittsburgh, PA, Julv 1985) ed J J Grefen\tctte
(Hillsdale, NJ: Lawrence Erlbaum Associates) pp 153-9
Grefenstette J J I987 Incorporating problem specific knob ledge into genetic algorithms
Genetic Algorithtns mid Sirnirluted Anneuling ed L Davi\ (Boston. MA: Pitman )
ch 4, pp 42-60
Gruau F 1993 Genetic synthesis of modular neural networks Proca. 5th hit. Cotif. on
Genetic Algorithm (Urbuiiu-Chatnpuigti, lL, J u l y 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 318-25
Giivenir H A and Sirin 1 1993 A genetic algorithm for classification by feature recognition
P I W . 5th Itit. Cot$ on Genetic Algorithms (Urt~urici-Cliunil,~iigii.
IL, Julj 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 543-8
Hancock P 1992 Recombination operators for the design of neural nets by genetic
algorithm Purullel Problem Sohitig from Nuture, 2 ( Proc. 2nd Int. Conf: o t i Parullel
Problem Solling from Nuture, Bru.ssels, 1992) ed R Manner and B Manderich
(Amsterdam: Elsevier) pp 441-50
Handley S 1993 Automated learning of a detector for a-helices in protein sequences
via genetic programming Pmc. 5th Itit. Cot$ oti Crnutic Algorithtns (UrlxrritrChuinpuign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufinann)
pp 27 1-8
Harp S A and Samad T 1991 Genetic synthesis of neural netnork architecture HuntllmoX
of Genetic Algorithms ed L Davis (New York: Van No\trand Reinhold) ch IS,
pp 202-21
Holland J H 1985 Properties of the bucket-brigade algorithm Proc. 1st h i t . Conf. O H
Genetic Algorithtns (Pittsburgh, PA, July 1985) ed J J Grefenstette (Hillsdale, NJ:
Lawrence Erlbaum Associates) pp 1-7
-1
987 Genetic algorithms and classifier system\: foundations and future directions
Proc. 2nd Int. Cotif: on Genetic Algorithms (PittsburtqIi, PA, July 198s) ed J J
Grefenstette (Hillsdale, NJ: Lawrence Erlbaum Ajsociates) pp 82-9
Holland J H, Holyoak K J, Nisbett R E and Thagard P R 1987 Classifier system\. Qmorphisms and induction Genetic Algorithms arid Simulated Anneuling ed L Da\ is
(Boston, MA: Pitman) ch 9, pp I 16-28
Homaifar, A., Guan S and Liepins G 1993 A new approach to the travelling salesman
problem by genetic algorithms Proc. 5th Itit. Conf: on Gertetic Aigorithnis ( UrhantrCliutnpuign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann)
PP 460-6
Hunt K J 1992a Optimal control system synthesis with genetic algorithms Purullcl
Probleni Soll*itig.front Nuture, 2 (Proc. 2nd Int. Cotif: on Purullel Pmhleni Sohiiig
.from Nuture, Brussels, 1992) ed R Manner and B Manderick (Amsterdam: Elsevier)
pp 381-9
-1
992b Polynomial LQG and H , controller synthesis: a genetic algorithm solution
Proc. IEEE Cotif: on Decision arid Control (Tuscon,AZ) (Picataway. NJ: IEEE)
Husbands P I993 An ecosystems model for integrated production planning lnt. J . Coinput.
Integruterl Munuj~ic~turing
6 74-86TEAM LRN

Possible applications of evolutionary computation

Ifeachor E C and Harris S P 1993 A new approach to frequency sampling filter

design using genetic algorithms Natural Algorithms in Signal Processing (Workshop.
Chelrnsjbrd, U K , November 1993) vol 1 (London: IEE) pp 5/1-5/8
Jakob W, Gorges-Schleuter M and Blume C 1992 Application of genetic algorithms to
task planning and learning Purcrllel Problem Solving from Nuture, 2 (Proc. 2nd lnt.
Conj: on Purullel Problem Solving from Nature, Brussels, 1992) ed R Manner and
B Manderick (Amsterdam: Elsevier) pp 29 1-300
Janikow C Z and Cai H 1992 A genetic algorithm application in nonparametric functional
estimation Purullel Problenz Sulving from Nuture, 2 (Proc. 2nd Itit. Cot$ on Parullel
Problem Solving jkom Nuture, Brussels, 1992) ed R Manner and B Manderick
(Amsterdam: Elsevier) pp 249-58
Jones G. Brown R D, Clark D E, Willett P and Glen R C 1993 Searching databases of
two-dimensional and three dimensional chemical structures using genetic algorithms
Proc. 5th Itit. Conj: on Genetic Algorithm3 ( Urbunu-Champaign, lL, July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 597-602
Juliff K 1993 A multi-chromosome genetic algorithm for pallet loading Proc. 5th Int.
Cot$ on Genetic Algorithms ( Urbonu-Champuign, IL, July 1993) ed S Forrest (San
Mateo, CA: Morgan Kaufmann) pp 467-73
Kidwell M D 1993 Using genetic algorithms to schedule distributed tasks on a bus-based
system Proc. 5th Int. Conj: on Genetic Algorithnu ( Urbana-Champaign, IL, July
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 368-74
Kim J-H and Shim H-S 1995 Evolutionary programming-based optimal robust locomotion
control of autonomous mobile robots Proc. 4th Ann. Col$ on Evolutionary
Prugrutnming ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge, MA:
MIT Press) pp 631-44
Krishnakumar K and Goldberg D E 1990 Genetic algorithms in control system
optimiLation. Proc.. AlAA Guiclurrc*e,Nuvigution, and Control Cot$ (Portland, OR)
pp 1568-77
-1992 Control system optimimtion using genetic algorithms J . Guidunce Contrul
Dynam. 15 735-40
Ling S-E 1992 Integrating genetic algorithms with a prolog assignment program as a
hybrid solution for a polytechnic timetable problem Parullel Prublem Sohing from
Nuture, 2 (Pruc. 2nd Int. Cot$ on Parallel Problem Solving from Nature, Brussels,
1092) ed R Manner and B Manderick (Amsterdam: Elsevier) pp 321-9
Lohmann R 1992 Structure evolution and incomplete induction Purullel Problem Solving
from Nuture, 2 ( Proc. 2nd Int. Cot$ on Parallel Problem Solvirig from Nature,
Brussels, 1992) ed R Manner and B Manderick (Amsterdam: Elsevier) pp 175-85
Louis S J and Rawlins G J E 1991 Designer genetic algorithms: genetic algorithms in
structure design Proc. 4th lnt. Conj: on Genetic Algorithms (Sun Diego, CA, July
1991) ed R Belew and L Booker (San Mateo, CA: Morgan Kaufmann) pp 53-60
Lucasius C B, Blommers M J, Buydens L M and Kateman G 1991 A genetic algorithm
for conformational analysis of DNA Handbook (,f Genetic Algorithms ed L Davis
(New York: Van Nostrand Reinhold) ch 18, pp 25 1-81
Lucasius C B and Kateman G 1992 Towards solving subset selection problems with the
aid of the genetic algorithm Purullel Problem Solving from Nuture, 2 (Proc. 2nd
ltrt. Cot$ un Purullel Problem Sollying from Nature, BrusseO, 1992) ed R Manner
TEAM LRN
and B Manderick (Amsterdam: Elsevier) pp 239-47

References

Manela, M., Thornhill N and Campbell J A 1993 Fitting spline functions to noisy data
using a genetic algorithm Proc. 5th Int. Con$ on Genetic Algorithms (UrbanuChampaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann)
pp 549-56
McDonnell J R, Andersen B L, Page W C and Pin F G 1992 Mobile manipulator
configuration optimization using evolutionary programming Proc. I st Ann. Conf on
Evolutionary Progrumming ed D B Fogel and W Atmar (La Jolla, CA: Evolutionary
Programming Society) pp 52-62
Melhuish C and Fogarty T C 1994 Applying a restricted mating policy to determine state
space niches using immediate and delayed reinforcement E\dutinnun Compirting
(AISB Workshop, Leeds, 1994, Selected Papers) (Lecture Notes in Computer Science
865) ed T C Fogarty (Berlin: Springer) pp 224-37
Michalewicz Z 1992 Genetic Algorithms + Data Structures = E\wlution ProgrumJ (Berlin:
Springer)
-1993
A hierarchy of evolution programs: an experimental study Eidut. Conrput. 1
5 1-76
Miller G F, Todd P M and Hegde S U 1989 Designing neural networks using genetic
algorithms. Proc. 3rd Int. Conj on Genetic Algoritlims (Fairfar. VA, June 1989) ed
J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 379-84
Miihlenbein H I989 Parallel genetic algorithms, population genetics and combinatorial
optimization Proc. 3rd hit. Con5 on Genetic Algorithms (Fuirjiar, VA, June 1989)
ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 4 16-2 I
Namibar R and Mars P 1993 Adaptive IIR filtering using natural algorithms Nuturul
Algorithms in Signal Processing (Workshop, Chelm.?ford.UK, Noiwnber 1993) vol
2 (London: IEE) pp 20/1-20/10
Oliver J R 1993 Discovering individual decision rules: an application of genetic
algorithms Proc.. 5th lnt. Con5 on Genetic Algorithms ( Urbana-Chanil~aiRri,
fL,J i c l ~
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 216-22
Oliver I M, Smith D J and Holland J R C 1987 A study of permutation crossover operators
on the travelling salesman problem Proc. 2nd Int. Conf on Genetic Algorithms
(Cambridge, MA, 1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 224-30
Page W C, McDonnell J R and Anderson B 1992 An evolutionary programming
approach to multi-dimensional path planning Proc.. I st Ann. Conf on E\wlurionur>
Programming ed D B Fogel and W Atmar (La Jolla, CA: Evolutionary Programming
Society) pp 63-70
Parker J K, Goldberg D E and Khoogar A R 1989 Inverse kinematics of redundant robots
using genetic algorithms Proc. Int. Con6 on Robotics und Automation (Scottsdde,
AZ, 1989) vol 1 (Los Alamitos: IEEE Computer Society Press) pp 271-6
Pate1 M J and Dorigo M 1994 Adaptive learning of a robot arm Evolutionuv Computing
(AISB Workshop, Leeds, 1994, Selected Papers) (Lecture Notes in Computer Science
865) ed T C Fogarty (Berlin: Springer) pp 180-94
Pipe A G and Carse B 1994 A comparison between two architectures for searching and
learning in maze problems Evolutionary Computing (AISB Workshop, Leeds, 1994,
Selected Papers) (Lecture Notes in Computer Scierzc-e865) ed T C Fogarty (Berlin:
Springer) pp 2 3 8 4 9
Polani D and Uthmann T 1992 Adaptation of Kohonen feature map topologies by genetic
TEAM
algorithms Parallel Problem Solving
fromLRN
Nature, 2 (Proc. 2nd lnt. Cot$ on Purullrl

Possible applications of evolutionary computation

Problem Soliing frotri Nuture, Bntssels, 1992) ed R MCnner and B Manderick

(Amsterdam: Elsevier) pp 421-9
___ 1993 Training Kohonen feature maps in different topologies: an analysis using
genetic algorithms Proc. 5th Itit. Cot$ oti Getretic Algorithim ( Urbatici-Chanipirign,
lL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 326-33
Porto V W, Fogel D B and Fogel L J 1995 Alternative neural network training methods
IEEE Expert 10 16-22
Powell D and Skolnick M M I993 Using genetic algorithms i n engineering design
optimization with non-linear constraints Proc. 5th I t i t . Cot$ on Genetic A1gorithnr.s
( Wrbatiti-Clicim~~ciigti,
lL, Jidy IYY.3) ed S Forrest (San Mateo. CA: Morgan
Kaufmann) pp 424-31
Punch W F, Goodman E D, Pei, M, Chia-Shun L, Hovlond P and Enbody R 1993 Further
research on feature selection and classification using genetic algorithms Proc.. 5th
Itit. Cot!/: oti Genetic Algorithms ( Urbunu-Champuign, IL, July 1993) ed S Forrest
(San Mateo, CA: Morgan Kaufmann) pp 557-64
Rahmani A T and Ono N 1993 A genetic algorithm for channel routing problem Proc. 5th
Itit. Cot$ oti Geaetic Algorithttis ( Urbuna-Chami,aigii, IL, Jidy 1993) ed S Forrest
(San Mateo, CA: Morgan Kaufmann) pp 494-8
Roherts A and Wade G 1993 A structured GA for FIR filter design Nccri.rrd Algorithms
in Sigtiul Processing ( Workshop, Chelnis/i)rd, U K , N o I w i b e r 1993) vol 1 (London:
IEE) pp 16/1-16/8
Rohertson G 1987 Parallel implementation of genetic algorithms in a classifier system
Genetic Algorithms cind Simitlcitetl Anrieulirrg et1 L Davis (Boston, MA: Pitman)
ch 10, pp 12940
Romaniuk S G I993 Evolutionary growth perceptrons Proc.. 5th Itit. Cot$ oti Genetic
Algorithtns ( U r b ~ ~ t i ~ r - C h t r m l ~ IL,
~ i i gJuly
n , 1993) ed S Forrest (San Mateo. CA:
Morgan Kaufmann) pp 33441
Roosen P and Meyer F I992 Determination of chemical equilibria by means of an
evolution strategy Pm-crllel Problem Solritig frottr Ncttirre, 2 (Proc. 2nd Itit. Cot!/:
o ~ iPiircillel Problem Sol\+ig frorji Nutit re, Brir.ssels, 1992) ed R Manner and B
Manderick (Amsterdam: Elsevier) pp 4 1 1-20
San Martin R and Knight J P 1993 Genetic algorithms for optimization of integrated
circuit synthesis Proc. 5th Itit. ConJ on Genetit. Algorirht?i.s ( UrbLitiu-Ctinntpaigti,
IL, J i d ~ lY93)
*
ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 432-8
Schaffer J D and Eshelman L J I993 Designing multiplierless digital filters using genetic
algorithms Proc. 5th Itit. Cot$ o t i Genetic Algorirhrns ( UrhariLi-ChcimpNig,l. IL. JU(Y
10931 ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 439-44
SchulLe-Kremer S 1992 Genetic algorithms for protein tertiary structure prediction
Purnllel Problem Sol\irig from Ntiture, 2 (Proc-.2nd [ ) i t . Cot$ o r i Purullel Problem
Sohitig from Nuture, Britssels, 1992) ed R MCnner and B Manderick (Amsterdam:
Elsevier) pp 3914100
Smith A E and Tate D M 1993 Genetic optimization using a penalty function Pmc. -5th
h r . Cot$ o t i Genetic Algorithnis (Urbatici-Chumpciig,i, IL, July 1993) ed S Forrest
(San Mateo, CA: Morgan Kaufmann) pp 499-505
Smith D 1985 Bin packing with adaptive search Proc. 1st Itit. Conf:oti Genetic Algorirhtiis
(Pittsburgh, PA, Jitls 1985) ed J J Grefenstette (Hillsdale, NJ: Lawrence Erlbaum
TEAM LRN
A ssoci ate s )

References

Spencer G F 1993 Automatic generation of programs for crawling and walking Proc. 5th
lnt. Con$ on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest
(San Mateo, CA: Morgan Kaufmann) p 654
Spittle M C and Horrocks D H 1993 Genetic algorithms and reduced complexity artificial
neural networks Natural Algorithms in Signal Processing (Workshop, Chelmsford,
UK, November 1993) vol I (London: IEE) pp 811-819
Suckley D 1991 Genetic algorithm in the design of FIR filters IEE Proc. G 138 234-8
Syswerda G 1991 Schedule optimization using genetic algorithms Handbook qf Generic
Algorithms ed L Davis(New York: Van Nostrand Reinhold) ch 21, pp 3 3 2 4 9
Tackett W A 1993 Genetic programming for feature discovery and image discrimination
Proc. 5th Itit. Cot$ on Genetic Algorithms (Urba,icr-Chainl,aigti,IL, July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 303-9
Tanaka, Y., Ishiguro A and Uchikawa Y 1993 A genetic algorithms application to inverse
problems in electromagnetics Proc. 5th Itit. Con$ on Genetic Algorithms ( UrbunaChampaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) p 656
Thangia S R, Vinayagamoorthy R and Gubbi A V 1993 Vehicle routing with time
deadlines using genetic and local algorithms Proc. 5th hit. Conf on Genetic
Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 506-13
Unger R and Moult J 1993 A genetic algorithm for 3D protein folding simulations
Proc. 5th Int. Con& on Genetic Algorithms ( Urbana-CIiunipaigri, IL. July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 581-8
Van Driessche R and Piessens R 1992 Load balancing with genetic algorithms Pardlel
Problem Solving from Nature, 2 (Proc. 2nd lnt. Cor$ on Parallel Problem Solling
from Nature, Brussels, 1992) ed R Manner and B Manderick (Amsterdam: Elsevier)
pp 341-50
Verhoeven M G A, Aarts E H L, van de Sluis E and Vaessens R J M 1992 Parallel local
search and the travelling salesman problem Parallel Problem Solling from Nurure,
2 (Proc. 2nd Itit. Con5 on Parallel Problem Solving from Nature, Brussels, 1992)
ed R Manner and B Manderick (Amsterdam: Elsevier) pp 543-52
Watabe H and Okino N 1993 A study on genetic shape design Pro(.. 5th Int. Conf on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 445-50
White M and Flockton S 1993 A comparative study of natural algorithms for adaptive
IIR filtering Natural Algorithms in Signal Processing (Workshop, Chelmsford, UK,
November 1993) vol 2 (London: IEE) pp 22/1-22/8
Whitley D, Starkweather T and Fuquay D 1989 Scheduling problems and travelling
salesmen: the genetic edge recombination operator Proc. 3rd h i t . Cot$ on Genetic
Algorithms (Fairfim, VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 1 3 3 4 0
Wicks T and Lawson S 1993 Genetic algorithm design of wave digital filters with
a restricted coefficient set Natural Algorithms in Signal Processing (Workshop,
Chelmsford, UK, November 1993) vol 1 (London: IEE) pp 17/1-17/7
Wilson P B and Macleod M D 1993 Low implementation cost IIR digital filter
design using genetic algorithms Natural Algorithms in Signcil Processing (Workshop,
Chelmsford, UK, November 1993) vol 1 (London: IEE) pp 4 / 1 4 / 8
Wilson S W I987 Hierarchical credit allocation in a classifier system Genetic Algorirhms
TEAM
LRN MA: Pitman) ch 8, pp 104-15
(Boston,
and Simulated Annealing ed L Davis

Possible applications of evolutionary computation

Yamada T and Nakano R 1992 A genetic algorithm applicable to large-scale job-shop

problems Purallel Problem Solving from Nuture, 2 (Proc. 2nd Int. Cot$ on Purullel
Problem Solving from Nuture, Brir.ssel.s, 1992) ed R Manner and B Manderick
(Amsterdam: Elsevier) pp 28 1-90
Ymg J-J and Korthage R R 1993 Query optimization in information retrieval using
genetic algorithms Proc. 5th Int. Conf on Genetic Algorithms ( Urbunu-Champuign,
IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 603-1 1
Zhang B-T and Muhlenbein H 1993 Genetic programming of minimal neural nets using
Occam's razor Proc. 5th Int. Conf on Genetic Algorithms ( Urbanci-Champuign, IL,
July 1993) ed S Forrest (San Mateo, CA: Morgari Kaufmann) pp 342-9

Further reading
This article has provided only a glimpse into the range of applications for
evolutionary computing. A series of comprehensive bibliographies has been
produced by J T Alander of the Department of Information Technology and
Production Economics, University of Vaasa, as listed below.
Art and Music: Indexed Bibliography of' Genctic Algorithms in Art and Music
Report 94- 1 -ART (ftp.uwasa.fi/cs/report94- I/gaARTbib.ps.Z)
Chemistry and Physics: Indexed Bibliogruphy of Genetic Algorithms in
Chemistn and Physics Report 94- I -CHEMPHYS (ftp.uwasa.fi/cs/report941/gaCHEMPHY Sbib.ps.Z)
Control: bzdesecl Bihliogruphy of Genetic Algorithms in Control. Report 94- I CONTROL (ftp.uwasa.fi/cs/report94- I /gaCONTKOLbib.ps.Z)
1 . Computer Aided Design: Indexed Bibliography of Genetic Algorithms in Computrr
Aided Design Report 94- I -CAD (ftp.uwasa.fi/cs/report94- 1 /gaCADbib.ps.Z)

2. Computer Science: Indexed Bihliogruphy of Genetic Algorithrns in Computer

Science Report 94- I -CS (ftp.uwasa.fi/cs/report94- I/gaCSbib.ps.Z)

3. Economics: Indexed Bihliogruphy of Genetic Algorithms in Economics Report 941 -ECO ( ftp.uwasa. fi/cs/report94- 1/gaECObi b.ps.Z)
4. Electronics and VLSI Design and Testing: Inde=ced Bibliography of Genetic
Algorithms in Electronics mid VLSI Design und Testing Report 94- I -VLSI
(ftp.uwasa.fi/cs/report94- I /gaVLSIbib.ps.Z)

5. Engineering: Indexed Bibliography of Genetic Algorithms in Engineering Report

94- 1 -ENG (ftp.uwasa.fi/cs/report94-I/gaENGbib.ps.Z)
6. Fuzzy Systems: Indexed Bibliography of Genetic Algorithms und Fuzzy Systerns
Report 94- I -FUZZY (ftp.uwasa.fi/cs/report94-l/gaFUZZYbib.ps.Z)

7. Logistics: Indexed Bibliography of Genetic Algorithms in Logi.stic.s Report 94- 1 TEAM1 /gaLOGI
LRN
LOGISTICS (ftp.uwasa. fi/cs/report94ST1CSbib.ps.Z)

8. Manufacturing: Indexed Bibliography of Genetic Algorithtns in Muni~fac~turing

Report 94- 1 -MANU (ftp.uwasa.fi/cs/report94- l/gaMANUbib.ps.Z)

9. Neural Networks: Indexed Bibliography qf Genetic Algorithriis arid Neicral

Netwvrks Report 94- 1-NN (ftp.uwasa.fi/cs/report94- l/gaNNbib.ps.Z)
10. Optimization: Indexed Bibliography of Genetic Algorithrzts and Optinii~utiori
Report 94- 1 -0F'TIMI (ftp.uwasa.fi/cs/report94- 1 /gaOPTIMIbib.ps.Z)
1 1 . Operations Research: Indexed Bibliography of Genetic Algorithrns in Opemtions
Research Report 94- 1 -OR (ftp.uwasa.fi/cs/report94- 1/gaORbib.ps.Z)

12. Power Engineering: Indexed Bibliography of Genetic Algorithrns in Pobivr

Engineering Report 94- 1-POWER (ftp.uwasa.fi/cs/report94- 1/gaPOWERbib.ps.Z)
1 3. Robotics: Indexed Bibliography of Genetic Algorithrns in Robotics Report 94- I ROB0 (ftp.uwasa.fi/cs/report94- 1/gaROBObib.ps.Z)

14. Signal and Image Processing: Indexed Bibliography of Genetic Algoritliriis

in Signal arid hiage Processing Report 94- 1-SIGNAL (ftp.uwasa.fi/cs/report941 /gaSIGNALbib.ps.Z)

TEAM LRN

3
Advantages (and disadvantages) of
evolutionary computation over other
approaches
Ham-Paul Schwefel

3.1 No-free-lunch theorem

Since. according to the no-free-lunch (NFL) theorem (Wolpert and
Macready 1996), there cannot exist any algorithm for solving all (e.g. optimization) problems that is generally (on average) superior to any competitor,
the question of whether evolutionary algorithms (EAs) are inferiodsuperior to
any alternative approach is senseless. What could be claimed solely is that
EAs behave better than other methods with respect to solving a specific class
of problems-with the consequence that they behave worse for other problem
classes.
The NFL theorem can be corroborated in the case of EAs versus many
classical optimization methods insofar as the latter are more efficient in solving
linear, quadratic, strongly convex, unimodal, separable, and many other special
problems. On the other hand, EAs do not give up so early when discontinuous,
nondifferentiable, multimodal, noisy, and otherwise unconventional response
surfaces are involved. Their effectiveness (or robustness) thus extends to a
broader field of applications, of course with a corresponding loss in efficiency
when applied to the classes of simple problems classical procedures have been
specifically devised for.
Looking into the historical record of procedures devised to solve optimization
problems, especially around the 1960s (see the book by Schwefel ( I995)), when
a couple of direct optimum-seeking algorithms were published, for example, in
the Cornpiiter Joirrrzcrl, a certain pattern of development emerges. Author A
publishes a procedure and demonstrates its suitability by means of tests using
some test functions. Next, author B comes along with a counterexample showing
weak performance of A's algorithm in the case of a certain test problem. Of
course, he also presents a new or modified technique that outperforms the older
20

TEAM LRN

Conclusions

one(s) with respect to the additional test problem. This game could in principle
be played ad infiniturn.
A better means of clarifying the scene ought to result from theory. This
should clearly define the domain of applicability of each algorithm by presenting
convergence proofs and efficiency results. Unfortunately. however, it is possible
to prove abilities of algorithms only by simplifying them as well as the situations
to which they are confronted. The huge remainder of questions must be
answered by means of (always limited) test series, and even that cannot tell
much about an actual real-world problem-solving situation with yet unanalyzed
features, that is, the normal case in applications.
Again unfortunately, there does not exist an agreed-upon test problem
catalogue to evaluate old as well as new algorithms in a concise way. It is
doubtful whether such a test bed will ever be agreed upon, but efforts in that
direction would be worthwhile.

3.2 Conclusions
Finally, what are the truths and consequences? First, there will always remain a
dichotomy between efficiency and general applicability, between reliability and
effort of problem-solving, especially optimum-seeking, algorithms. Any specific
knowledge about the situation at hand may be used to specify an adequate
specific solution algorithm, the optimal situation being that one knows the
solution in advance. On the other hand, there cannot exist one method that solves
all problems effectively as well as efficiently. These goals are contradictory.
If there is already a traditional method that solves a given problem, EAs
should not be used. They cannot do it better or with less computational effort.
In particular, they do not offer an escape from the curse of dimensionality-the
often quadratic, cubic, or otherwise polynomial increase in instructions used as
the number of decision variables is increased, arising, for example, from matrix
manipulation.
To develop a new solution method suitable for a problem at hand may be
a nice challenge to a theoretician, who will afterwards get some merit for his
effort, but from the application point of view the time for developing the new
technique has to be added to the computer time invested. In that respect, a
nonspecialized, robust procedure (and EAs belong to this class) may be, and
often proves to be, worthwhile.
A warning should be given about a common practice-the linearization or
other decomplexification of the situation in order to make a traditional method
applicable. Even a guaranteed globally optimal solution for the simplified task
may be a long way off and thus greatly inferior to an approximate solution to
the real problem.
The best one can say about EAs, therefore, is that they present a
methodological framework that is easy to understand and handle, and is either
usable as a black-box method orTEAM
open LRN
to the incorporation of new or old

Advantages (and disadvantages) of evolutionary computation over other

22
approaches
recipes for further sophistication, specialization or hybridization. They are
applicable even in dynamic situations where the goal or constraints are moving
over time or changing, either exogenously or self-induced, where parameter
adjustments and fitness measurements are disturbed, and where the landscape is
rough, discontinuous, multimodal, even fractal or cannot otherwise be handled
by traditional methods, especially those that need global prediction from local
surface analysis.
There exist EA versions for multiple criterion decision making (MCDM)
and many different parallel computing architectures. Almost forgotten today is
their applicability in experimental (non-computing) situations.
Sometimes striking is the fact that even obviously wrong parameter settings
do not prevent fairly good results: this certainly can be described as robustness.
Not yet well understood, but nevertheless very successful are those EAs which
self-adapt some of their internal parameters, a feature that can be described as
collective learning of the environmental conditions. Nevertheless, even selfadaptation does not circumvent the NFL theorem.
In this sense, and only in this sense, EAs always present an intermediate
compromise; the enthusiasm of their inventors is not yet taken into account
here, nor the insights available from the analysis of the algorithms for natural
evolutionary processes which they try to mimic.

References
Schwefel H-P 1995 Eidution cind Optimum Seeking (New York: Wiley)
Wolpert D H and Macready W G 1996 N o Free Lunch Theorem.sJi)r Seurch Technical
Report SFI-TR-95-02-010 Santa Fe Institute

TEAM LRN

4
Principles of evolutionary processes
David B Fogel

4.1

Overview

The most widely accepted collection of evolutionary theories is the neoDarwinian paradigm. These arguments assert that the vast majority of the
history of life can be fully accounted for by physical processes operating on
and within populations and species (Hoffman 1989, p 39). These processes
are reproduction, mutation, competition, and selection. Reproduction is an
obvious property of extant species. Further, species have such great reproductive
potential that their population size would increase at an exponential rate if
all individuals of the species were to reproduce successfully (Malthus 1826,
Mayr 1982, p 479). Reproduction is accomplished through the transfer of an
individuals genetic program (either asexually or sexually) to progeny. Mutation,
in a positively entropic system, is guaranteed, in that replication errors during
information transfer will necessarily occur. Competition is a consequence of
expanding populations in a finite resource space. Selection is the inevitable
result of competitive replication as species fill the available space. Evolution
becomes the inescapable result of interacting basic physical statistical processes
(Huxley 1963, Wooldridge 1968, Atmar 1979).
Individuals and species can be viewed as a duality of their genetic program,
the genotype (Section 5.2), and their expressed behavioral traits, the phenofype.
The genotype provides a mechanism for the storage of experiential evidence,
of historically acquired information. Unfortunately, the results of genetic
variations are generally unpredictable due to the universal effects of pleiotropy
and polygeny (figure 4.1) (Mayr 1959, 1963, 1982, 1988, Wright 1931, 1960,
Simpson 1949, p 224, Dobzhansky 1970, Stanley 1975, Dawkins 1986).
Pleiotropy is the effect that a single gene may simultaneously affect several
phenotypic traits. Polygeny is the effect that a single phenotypic characteristic
may be determined by the simultaneous interaction of many genes. There are no
one-gene, one-trait relationships in naturally evolved systems. The phenotype
varies as a complex, nonlinear function of the interaction between underlying
genetic structures and current environmental conditions. Very different genetic
TEAM LRN

Principles of evolutionary processes

Figure 4.1. Pleiotropy is the effect that a single gene may simultaneously affect several
phenotypic traits. Polygeny is the effect that a single phenotypic characteristic may
be determined by the simultaneous interaction of many genes. These one-to-many and
many-to-one mappings are pervasive in natural systems. As a result, even small changes
to a single gene may induce a raft of behavioral changes in the individual (after Mayr
1963 ).

structures may code for equivalent behaviors, just as diverse computer programs
can generate similar functions.
Selection directly acts only on the expressed behaviors of individuals and
species (Mayr 1988, pp 477-8). Wright (1932) offered the concept of adaptive
topography to describe the fitness of individuals and species (minimally, isolated
reproductive populations termed demes). A population of genotypes maps to
respective phenotypes (sensir Lewontin 1974), which are in turn mapped onto
the adaptive topography (figure 4.2). Each peak corresponds to an optimized
collection of phenotypes, and thus to one of more sets of optimized genotypes.
Evolution probabilistically proceeds up the slopes of the topography toward
peaks as selection culls inappropriate phenotypic variants.
Others (Atmar 1979, Raven and Johnson 1986, pp 400-1) have suggested
that i t is more appropriate to view the adaptive landscape from an inverted
position. The peaks become troughs, 'minimized prediction error entropy wells'
(Atmar 1979). Searching for peaks depicts evolution as a slowly advancing,
tedious, uncertain process. Moreover, there appears to be a certain fragility to
an evolving phyletic line; an optirnized population might be expected to quickly
fall of the peak under slight perturbations. The inverted topography leaves an
altogether different impression. Populations advance rapidly down the walls of
TEAM
the error troughs until their cohesive
set LRN
of interrelated behaviors is optimized,

Overview

Figure 4.2. Wrights adaptive topography, inverted. An adaptive topography, or adaptive

landscape, is defined to represent the fitness of all possible phenotypes (generated by the
interaction between the genotypes and the environment). Wright ( 1932) proposed that as
selection culls the last appropriate existing behaviors relatiLrc to others in the population.
the population advances to areas of higher fitness on the landscape. Atniar (1979) and
others have suggested viewing the topography from an inverted perspective. Populations
advance to areas of lower behavioral error.

at which point stagnation occurs. If the topography is generally static, rapid

descents will be followed by long periods of stasis. If, however, the topography
is in continual flux, stagnation may never set in.
Viewed in this manner, evolution is an obvious optimizing problemsolving process (not to be confused with a process that leads to perfection).
Selection drives phenotypes as close to the optimum as possible, gi\ren initial
conditions and environment constraints. However the environment is continually
changing. Species lag behind, constantly evolving toward a new optimum. N o
organism should be viewed as being perfectly adapted to its enLrironment. The
suboptimality of behavior is to be expected in any dynamic environment that
mandates tradeoffs between behavioral requirements. However selection never
ceases to operate, regardless of the populations position on the topography.
Mayr (1988, p 532) has summarized some of the more salient characteristics
of the neo-Darwinian paradigm. These include:
The individual is the primary target of selection.
Genetic variation is largely a chance phenomenon. Stochastic processes
play a significant role in evolution.
Genotypic variation is largely a product of recombination and only
ultimately of mutation.
Gradual evolution may incorporate phenotypic discontinuities.
Not all phenotypic changes are necessarily consequences of crd hoc natural
select ion.
Evolution is a change in adaptation and diversity. not merely a change in
gene frequencies.
Selection is probabilistic, not deterministic.
TEAM for
LRNevolutionary computation.
These characteristics form a framework

Principles of evolutionary processes

References
Atmar W I979 The inevitability of evolutionary invention, unpublished manuscript
Dawkins R 1986 The Blind Whtchmuker (Oxford: Clarendon)
Dobzhansky T I970 Genetita of the Evolutionury Processes (New York: Columbia
University Press)
Hoffman A 1989 Arguments on Evolution: a Puleontologist's Perspectitte (New York:
Oxford University Press)
Huxley J 1963 The evolutionary process Evolution UJ (1 Process ed J Huxley, A C Hardy
and E B Ford (New York: Collier) pp 9-33
Lewontin R C 1974 The Genetic Busis of E v olutionq Chunge (New York: Columbia
University Press)
Malthus T R 1826 An Essuy on the Principle of Population, a.\ it Aflects the Future
Impro\wnent of Socieh 6th edn (London: Murray)
Mayr E 1959 Where are we? Cold Spring Hurbor Symp. Quunt. Biol. 24 409-40
-1963 Anirnul Species und E\dution (Cambridge, MA: Belknap)
-I982 The Gro,rvth of Biologicd Thought: Diversity, Evolution and Inheritunce
(Cambridge, MA: Belknap)
-1988 Towurd N New- Philosophy of Biology: Observutions of an Evolutionist
(Cambridge, MA: Belknap)
Raven P H and Johnson G B 1986 Biology (St Louis, MO: Times Mirror)
Simpson G G 1949 The Mecining of Evolution: U Study of the History of Life and its
SigniJicuncejbr Mun (New Haven, CT: Yale University Press)
Stanley S M 1975 A theory of evolution above the species level Proc. Nut1 Amd. Sci.
USA 72 646-50
ul Busis of Intelligent L f e (New
Wooldridge D E 1968 The Mechanicul Mun: the Ph
York: McGraw-Hill)
Wright S 1931 Evolution in Mendelian populations Genetics 16 97-1 59
-1932 The roles of mutation, inbreeding, crossbreeding, and selection in evolution
Proc. 6th Int. Congr. on Genetic5 (Ithucu, N Y ) vol 1, pp 356-66
-1960
The evolution of life, panel discussion Eivlution After Dumin: Issues in
Eidution vol 3 , ed S Tax and C Callender (Chicago, IL: University of Chicago
Press)

TEAM LRN

5
Principles of genetics
Raymond C Paton

5.1 Introduction
The material covers a number of key areas which are necessary to understanding
the nature of the evolutionary process. We begin by looking at some basic ideas
of heredity and how variation occurs in interbreeding populations. From here
we look at the gene in more detail and then consider how i t can undergo change.
The next section looks at aspects of population thinking needed to appreciate
selection. This is crucial to an appreciation of Darwinian mechanisms of
evolution. The chapter concludes with selected references to further information.
In order to keep this contribution within its size limits, the material is primarily
about the biology of higher plants and animals.

5.2 Some fundamental concepts in genetics

Many plants and animals are produced through sexual means by which the
nucleus of a male sperm cell fuses with a female egg cell (ovum). Sperm and
ovum nuclei each contain a single complement of nuclear material arranged as
ribbon-like structures called chromosomes. When a sperm fuses with an egg
the resulting cell, called a zygote, has a double complement of chromosomes
together with the cytoplasm of the ovum. We say that a single complement
of chromosomes constitutes a haploid set (abbreviated as n ) and a double
complement is called the diploid set (2rz). Gametes (sex cells) are haploid
whereas most other cells are diploid. The formation of gametes (gametogenesis)
requires the number of chromosomes in the gamete-forming cells to be halved
(see figure 5.1).
Gametogenesis is achieved through a special type of cell division called
meiosis (also called reduction division). The intricate mechanics of meiosis
ensures that gametes contain only one copy of each chromosome.
A genotype is the genetic constitution that an organism inherits from its
parents. In a diploid organism, half the genotype is inherited from one parent and
half from the other. Diploid cells contain two copies of each chromosome. This
TEAM LRN

Principles of genetics

Figure 5.1. A common life cycle model.

rule is not universally true when it comes to the distribution of sex chromosomes.
Human diploid cells contain 46 chromosomes of which there are 22 pairs and
an additional two sex chromosomes. Sex is determined by one pair (called
the sex chromosomes); female is X and male is Y. A female human has the
sex chromosome genotype of XX and a male is XY. The inheritance of sex is
summarized in figure 5.2. The members of a pair of nonsex chromosomes are
said to be homologous (this is also true for XX genotypes whereas XY are not
homologous).

Figure 5.2. Inheritance of sex chromosomes.

Although humans have been selectively breeding domestic animals and

plants for a long time, the modern study of genetics began in the mid-19th
TEAM LRN
century with the work of Gregor Mendel.
Mendel investigated the inheritance

Some fundamental concepts in genetics

of particular traits in peas. For example, he took plants that had wrinkled
seeds and plants that had round seeds and bred them with plants of the same
phenotype (i.e. observable appearance), so wrinkled were bred with wrinkled and
round were bred with round. He continued this over a number of generations
until round always produced round offspring and wrinkled, wrinkled. These
are called pure breeding plants. He then cross-fertilized the plants by breeding
rounds with wrinkles. The subsequent generation (called the FI hybrids) was
all round. Then Mendel crossed the F1 hybrids with each other and found that
the next generation, the F2 hybrids, had round and wrinkled plants in the ratio
of 3 (round) : I (wrinkled).
Mendel did this kind of experiment with a number of pea characteristics
such as:
color of cotyledons
color of flowers
color of seeds
length of stem

yellow or green
red or white
graybrown or white
tall or dwarf.

In each case he found that the the Fl hybrids were always of one form and
the two forms reappeared in the F2. Mendel called the form which appeared in
the F1 generation dominant and the form which reappeared in the F2 recessive
(for the full text of Mendels experiments see an older genetics book, such as
that by Sinnott et nl (1958)).
A modern interpretation of inheritance depends upon a proper understanding
of the nature of a gene and how the gene is expressed in the phenotype. The
nature of a gene is quite complex as we shall see later (see also Alberts et nl
1989, Lewin 1990, Futuyma 1986). For now we shall take it to be the functional
unit of inheritance. An allele (allelomorph) is one of several forms of a gene
occupying a given locus (location) on a chromosome. Originally related to pairs
of contrasting characteristics (see examples above), the idea of observable unit
characters was introduced to genetics around the turn of this century by such
workers as Bateson, de Vries, and Correns (see Darden 1991). The concept of
a gene has tended to replace allele in general usage although the two terms are
not the same.
How can the results of Mendels experiments be interpreted? We know
that each parent plant provides half the chromosome complement found in its
offspring and that chromosomes in the diploid cells are in pairs of homologues.
In the pea experiments pure breeding parents had homologous chromosomes
which were identical for a particular gene; we say they are homozygous for
a particular gene. The pure breeding plants were produced through selffertilization and by selecting those offspring of the desired phenotype. As round
was dominant to wrinkled we say that the round form of the gene is R (big
r) and the wrinkled r (little r). Figure 5.3 summarizes the cross of a pure
breeding round (RR) with a pure breeding
wrinkled (rr).
TEAM LRN

Figure 5.3. A simple Mendelian experiment.

We see the appearance of the heterozygote (in this case Rr) in the F1
generation. This is phenotypically the same as the dominant phenotype but
genotypically contains both a dominant and a recessive form of the particular
gene under study. Thus when the heterozygotes are randomly crossed with
each other the phenotype ratio is three dominant : one recessive. This is called
the rnonohybrid ratio (i.e. for one allele). We see in Mendels experiments
the independent segregation of alleles during breeding and their subsequent
independent as sortmen t in offspring .
In the case of two genes we find more phenotypes and genotypes appearing.
Consider what happens when pure breeding homozygotes for round yellow seeds
(RRYY) are bred with pure breeding homozygotes for wrinkled green seeds
(rryy). On being crossed we end up with heterozygotes with a genotype of
RrYy and phenotype of round yellow seeds. We have seen that the genes
segregate independently during meiosis so we have the combinations shown in
figure 5.4.
R r Y y
genes segregate
independently

rY ry

Figure 5.4. Genes segregating independently

Thus the gametes of the heterozygote can be of four kinds though we assume
that each form can occur with equal frequency. We may examine the possible
combinations of gametes for the next generation by producing a contingency
TEAM LRN
table for possible gamete combinations.
These are shown in figure 5.5.

Some fundamental concepts in genetics

Figure 5.5. Genotype and phenotype patterns in F2.

We summarize this set of genotype combinations in the phenotype table

(figure 5.5(b)). The resulting ratio of phenotypes is called the dihybrid ratio
(9:3:3:1). We shall consider one final example in this very brief summary.
When pure breeding red-flowered snapdragons were crossed with pure breeding
white-flowered plants the F1 plants were all pink. When these were selfed the
population of offspring was in the ratio of one red : two pink : one white. This
is a case of incomplete dominance in the heterozygote.
It has been found that the Mendelian ratios do not always apply in breeding
experiments. In some cases this is because certain genes interact with each
other. Epistasis occurs when the expression of one gene masks the phenotypic
effects of another. For example, certain genotypes (cyanogenics) of clover can
resist grazing because they produce low doses of cyanide which makes them
unpalatable. Two genes are involved in cyanide production, one which produces
an enzyme which converts a precursor molecule into a glycoside and another
gene which produces an enzyme which converts the glycoside into hydrogen
cyanide (figure 5.6(a)). If two pure breeding acyanogenic strains are crossed
the heterozygote is cyanogenic (figure 5.6(b)).

Figure 5.6. Cyanogenic clover: TEAM

cyanideLRN
production and cyanogenic hybrid.

Principles of genetics

When the cyanogenic strain is selfed the genotypes are as summarized

in figure 5.7(a). There are only two phenotypes produced, cyanogenic and
acyanogenic, as summarized in figure 5.7(b).
So far we have followed Mendels laws regarding the independent
segregation of genes. This independent segregation does not occur when genes
are located on the same chromosome. During meiosis homologous chromosomes
(i.e. matched pairs one from each parental gamete) move together and are seen
to be joined at the centromere (the clear oval region in figure 5.8).
In this simplified diagram we show a set of genes (rectangles) in which those
on the top are of the opposite form to those on the bottom. As the chromosomes
are juxtaposed they each are doubled up so that four strands (usually called
chromatids) are aligned. The close proximity of the inner two chromatids and
the presence of enzymes in the cellular environment can result in breakages and
recombinations of these strands as summarized in figure 5.9.
The result is that of the four juxtaposed strands two are the same as
the parental chromosomes and two, called the recombinants, are different.
This crossover process mixes up the genes with respect to original parental
chromosomes. The chromosomes which make up a haploid gamete will be
a random mixture of parental and recombinant forms. This increases the
TEAM LRN
variability between parents and offspring
and reduces the chance of harmful

The gene in more detail

recessives becoming homozygous.

5.3 The gene in more detail

Genes are located on chromosomes. Chromosomes segregate independently
during meiosis whereas genes can be linked on the same chromosome. The
conceptual reasons why there has been confusion are the differences in
understanding about gene and chromosome such as which is the unit of heredity
(see Darden 1991). The discovery of the physicochemical nature of hereditary
material culminated in the Watson-Crick model in 1953 (see figure 5.10). The
coding parts of the deoxyribonucleic acid (DNA) are called bases; there are
four types (adenine, thymine, cytosine, and guanine). They are strung along
a sugar-and-phosphate string, which is arranged as a helix. Two intertwined
strings then form the double helix. The functional unit of this code is a triplet
of bases which can code for a single amino acid. The genes are located along
the DNA strand.

Figure 5.10. Idealization of the organization of chromosomes in a eukaryotic cell. (A

eukaryotic cell has an organized nucleus and cytoplasmic organelles.)

Transcription is the synthesis of ribonucleic acid (RNA) using the DNA

template. It is a preliminary step in the ultimate synthesis of protein. A gene
can be transcribed through the action of enzymes and a chain of transcript is
TEAM
LRN (mRNA). This mRNA can then be
formed as a polymer called messenger
RNA

Principles of genetics

translated into protein. The translation process converts the mRNA code into a
protein sequence via another form of RNA called transfer RNA (tRNA). In this
way, genes are transcribed so that mRNA may be produced, from which protein
molecules (typically the workhorses and structural molecules of a cell) can be
formed. This flow of information is generally unidirectional. (For more details
on this topic the reader should consult a molecular biology text and look at the
central dogma of molecular biology, see e.g. Lewin 1990, Alberts et a1 1989.)
Figure 5.1 1 provides a simplified view of the anatomy of a structural gene,
that is, one which codes for a protein or RNA.

Figure 5.11. A simplified diagram of a structural gene.

That part of the gene which ultimately codes for protein or RNA is preceded
upstream by three stretches of code. The enhancer facilitates the operation of
the promoter region. which is where RNA polymerase is bound to the gene in
order to initiate transcription. The operator is the site where transcription can
be halted by the presence of a repressor protein. Exons are expressed in the
final gene product (e.g. the protein molecule) whereas introns are transcribed
but are removed from the transcript leaving the fragments of exon material to
be spliced. One stretch of DNA may consist of several overlapping genes. For
example, the introns in one gene may be the exons in another (Lewin 1990).
The terminator is the postexon region of the gene which causes transcription
to be terminated. Thus a biological gene contains not only code to be read
but also coded instructions on how it should be read and what should be read.
Genes are highly organized. An operon system is located on one chromosome
and consists of a regulator gene and a number of contiguous structural genes
which share the same promoter and terminator and code for enzymes which
are involved in specific metabolic pathways (the classical example is the Lac
operon, see figure 5.12).
Operons can be grouped together into higher-order (hierarchical) regulatory
genetic systems (Neidhart et crl 1990). For example, a number of operons
from different chromosomes may be regulated by a single gene known as a
regulon. These higher-order systems provide a great challenge for change in a
genome. Modification of the higher-order gene can have profound effects on
TEAM
the expression of structural genes that
areLRN
under its influence.

Options for change

Figure 5.12. A visualization of an operon.

5.4

Options for change

We have already seen how sexual reproduction can mix up the genes which
are incorporated in a gamete through the random reassortment of paternal
and maternal chromosomes and through crossing over and recombination.
Effectively though. the gamete acquires a subset of the same genes as the
diploid gamete-producing cells; they are just mixed up. Clearly, any zygote that
is produced will have a mixture of genes and (possibly) some chromosomes
which have both paternal and maternal genes.
There are other mechanisms of change which alter the genes themselves
or change the number of genes present in a genome. We shall describe a
mutation as any change in the sequence of genomic DNA. Gene mutations
are of two types: point mutation. in which a single base is changed, and
frameshift mutation, in which one or more bases (but not a multiple of three)
are inserted or deleted. This changes the frame in which triplets are transcribed
into RNA and ultimately translated into protein. In addition some genes are
able to become transposed elsewhere in a genome. They jump about and
are called transposons. Chromosome changes can be caused by deletion (loss
of a section), duplication (the section is repeated), inversion (the section is in
the reverse order), and translocation (the section has been relocated elsewhere).
There are also changes at the genome level. Ploidy is the term used to describe
multiples of a chromosome complement such as haploid ( n ) , diploid (212), and
tetraploid (4n). A good example of the influence of ploidy on evolution is among
such crops as wheat and cotton. Somy describes changes to the frequency of
particular chromosomes: for example, trisomy is three copies of a chromosome.
5.5

Population thinking

So far we have focused on how genes are inherited and how they or their
combinations can change. In order to understand evolutionary processes
(Chapter 4) we must shift our attention to looking at populations (we shall not
emphasize too much whether of genes. chromosomes, genomes, or organisms).
Population thinking is central to our understanding of models of evolution.
The Hardy-Weinberg theorem applies to frequencies of genes and genotypes
LRN the relative frequency of each gene
in a population of individuals, and TEAM
states that

Principles of genetics

remains in equilibrium from one generation to the next. For a single allele. if
the frequency of one form is p then that of the other (say 4 ) is I - p . The three
genotypes that exist with this allele have the population proportions of
pz +2py +y2 = I.
This equation does not apply when a mixture of four factors changes the relative
frequencies of genes in a population: mutation, selection, gene flow, and random
genetic drift (drift). Drift can be described as the effect of the sampling of a
population on its parents. Each generation can be thought of as a sample of its
parents' population. In that the current population is a sample of its parents,
we acknowledge that a statistical sampling error should be associated with gene
frequencies. The effect will be small in large populations because the relative
proportion of random changes will be a very small component of the large
numbers. However, drift in a small population will have a marked effect.
One factor which can counteract the effect of drift is differential migration
of individuals between populations which leads to gene flow. Several models of
gene flow exist. For example, migration which occurs at random among a group
of small populations is called the island model whereas in the stepping stone
model each population receives migrants only from neighboring populations.
Mutation, selection, and gene flow are deterministic factors so that if fitness.
mutation rate, and rate of gene flow are the same for a number of populations
that begin with same gene frequencies, they will attain the same equilibrium
composition. Drift is a stochastic process because the sampling effect on the
parent population is random.
Sewall Wright introduced the idea of an adaptive landscape to explain how
a population's allele freyuencies might evolve over time. The peaks on the
landscape represent genetic compositions of a population for which the mean
fitness is high and troughs are possible compositions where the mean fitness
is low. As gene frequencies change and mean fitness increases the population
moves uphill. Indeed, selection will operate to increase mean fitness so, on
a multipeaked landscape, selection may operate to move populations to local
maxima. On a fixed landscape drift and selection can act together so that
populations may move uphill (through selection) or downhill (through drift).
This means that the global maximum for the landscape could be reached. These
ideas are formally encapsulated in Wright's ( 1968-1 978) shifring htr1ciiic.e theory
of evolution. Further information on the relation of population genetics to
evolutionary theory can be studied further in the books by Wright (1968-1978),
Crow and Kimura (1970) and Maynard Smith (1989).
The change of gene frequencies coupled with changes in the genes
themselves can lead to the emergence of new species although the process
is far from simple and not fully understood (Futuyma, 1986, Maynard Smith
1993). The nature of the species concept or (for some) concepts which is
central to Darwinism is complicated and will not be discussed here (see e.g.
TEAMapply
LRN to promote speciation (Maynard
Futuyma 1986). Several mechanisms

Population thinking

Smith 1993): geographical or spatial isolation, barriers preventing formation of

hybrids, nonviable hybrids, hybrid infertility, and hybrid breakdown-in which
post-F1 generations are weak or infertile.
Selectionist theories emphasize invariant properties of the system: the system
is an internal generator of variations (Changeux and Dehaene 1989) and diversity
among units of the population exists prior to any testing (Manderick 1994). We
have seen how section operates to optimize fitness. Darden and Cain (1987)
summarize a number of common elements in selectionist theories as follows:
0

a set of a given entity type (i.e. the units of the population)

a particular property ( P ) according to which members of this set vary

an environment in which the entity type is found

a factor in the environment to which members react differentially due to

their possession or nonpossession of the property ( P )
differential benefits (both shorter and longer term) according to the
possession or nonpossession of the property ( P ) .

This scheme summarizes the selectionist approach. In addition, Maynard

Smith ( 1989) discusses a number of selection systems (particular relevant
to animals) including sexual, habitat, family, kin, group, and synergistic
(cooperation). A very helpful overview of this area of ecology, behavior,
and evolution is that by Sigmund (1993). Three selectionist systems in the
biosciences are the neo-Darwinian theory of evolution in a population, clonal
selection theory applied to the immune system, and the theory of neuronal
group selection (for an excellent summary with plenty of references see that by
Manderick ( 1994)).
There are many important aspects of evolutionary biology which have
had to be omitted because of lack of space. The relevance of neutral
molecular evolution theory (Kimura 1983) and nonselectionist approaches (see
e.g. Goodwin and Saunders 1989, Lima de Faria 1988, Kauffman 1993) has not
been discussed. In addition some important ideas have not been considered, such
as evolutionary game theory (Maynard Smith 1989, Sigmund 1993), the role of
sex (see e.g. Hamilton et a1 1990), the evolution of cooperation (Axelrod 1984),
the red queen (Van Valen 1973, Maynard Smith 1989), structured genomes, for
example, incorporation of regulatory hierarchies (Kauffman 1993, Beaumont
1993, Clarke et al 1993), experiments with endosymbiotic systems (Margulis
and Foster 199I , Hilario and Gogarten 1993), coevolving parasite populations
(see e.g. Collins 1994; for a biological critique and further applications see
Sumida and Hamilton I994), inheritance of acquired characteristics (Landman
I99 I), and genomic imprinting and other epigenetic inheritance systems (for a
review see Paton 1994). There are also considerable philosophical issues which
must be addressed in this area which impinge on how biological sources are
applied to evolutionary computing (see Sober 1984). Not least among these is
TEAM LRN
the nature of adaptation.

Principles of genetics

References
Alberts B, Bray D, Lewis J, Raff M. Roberts K and Watson J D 1989 Molecitlrr Biology
of the Cell (New York: Garland)
Axelrod R 1984 The E\diitioti oJ Co-operation (Harmondsworth: Penguin)
Beauniont M A 1993 Evolution of optimal behaviour in networks of Boolean automata
J. Theor. Biol. 165 455-76
Changeux J-P and Dehaene S 1989 Neuronal models of cognitive functions Cognition
33 63- I09
Clarke B, Mittenthal J E and Senn M 1993 A model for the evolution of networks of
genes J . Theor. Biol. 165 269-89
Collins R 1994 Artificial evolution and the paradox of sex Compirting ktith Biolo,qicul
Mertrphor.s ed R C Paton (London: Chapman and Hall)
Crow J F and Kimura M 1970 An lritroduc~tiont o Popukrtion Genetics T h e o p (New(
York: Harper and Row)
Darden L 1991 Theory Chcrnge in Science (New York: Oxford University Press)
Darden L and Cain J A 1987 Selection type theories Phil. Sci. 56 106-29
Futuyma D J 1986 Eiwlittioncin Biology (MA: Sinauer)
Goodwin B C and Saunders P T (eds) 1989 Theoreticul Biology: Epigenetic. i i n d
Eivol ir t i o m Ord t.r from Conipl ex Systems ( Edinburgh : Edinburgh U n i ve rs i 1y
Press)
Hamilton W D, Axelrod A and Tanese R 1990 Sexual reproduction as an adaptation to
resist parasites Proca. Natl Acud. Sci. USA 87 3566-73
Hilario E and Gogarten J P 1993 Horizontal transfer of ATPase genes-the tree of life
becomes a net of life BioSj*stem.r31 1 1 1-9
Kauffman S A 1993 The Origiris of Order (New York: Oxford University Press)
Kimura, M I983 The Neirtrul Theot;v oj Molrc*itlar Eiwlirtioti (Cambridge: Cambridge
University Press)
Landman 0 E 199 1 The inheritance of acquired characteristics Anti. Re,: Genet. 25 1-20
Lewin B 1990 Genes IV (Oxford: Oxford University Press)
Lima de Faria A I988 E , d u t i o n Mithoirt Selection (Amsterdam: Elsevier)
Margulis L and Foster R (eds) 199I Symbiosis ( i s ci Soiirce ( f E ~ d u t i o n u nI t i n o i w t i o n :
Spwiiitioti ~ u i dMorphogeriesis (Cambridge, MA: MIT Press)
Manderick B 1994 The importance of selectionist systems for cognition Conipirting rcith
B i o l o g i d Metuphors ed R C Paton (London: Chapman and Hall)
Maynard Smith J 1989 E\dirrioncirj Genetics (Oxford: Oxford University Press)
____ I993 The Theor? ofEidirtioti Canto edn (Cambridge: Cambridge University Press)
Neidhart F C, Ingraham J L and Schaechter M 1990 Physiology of the Bucteritrl Cell
(Sunderland. MA: Sinauer)
Paton R C I994 Enhancing evolutionary computation using analogues of biological
mechanisms Eiwlittioncirj Conipirtirig (Lectitre Notes in Compicter Scienc*e 865) ed
T C Fogarty (Berlin: Springer) pp 51-64
Sigmund K 1993 Games ofLlfe (Oxford: Oxford University Press)
Sinnott E W, Dunn L C and Dobzhansky T 1958 Principles (J Genetics (New York:
McGraw-Hill)
Sober E 1984 The Nittirre oj Selection: E\diitionctt? Theorj in Philosophicd Foc-u.\
TEAM
LRN
Press)
(Chicago, IL: Unitwsity of Chicago

References

Sumida B and Hamilton W D 1994 Both Wrightian and 'parasite' peak shifts enhance
genetic algorithm performance in the travelling salesman problem Coniliictirig \r*ith
Biological Metriphors ed R C Paton (London: Chapman and Hall)
Van Valen L 1973 A new evolutionary law Eidutionurv Tiiror? 1 1-30
Wright S 1968-1978 E\dutioii und the Genetics of Popultrtioiis vols 1 4 (Chicago, IL:
Chicago University Press)

TEAM LRN

A history of evolutionary computation

Kenneth De Jong, David B Fogel and Ham-Paul
Sch wefel

6.1

Introduction

No one will ever produce a completely accurate account of a set of past events
since. as someone once pointed out, writing history is as difficult as forecasting.
Thus we dare to begin our historical summary of evolutionary computation
rather arbitrarily at a stage as recent as the mid- 1950s.
At that time there was already evidence of the use of digital computer
models to better understand the natural process of evolution. One of the first
descriptions of the use of an evolutionary process for computer problem solving
appeared in the articles by Friedberg (1958) and Friedberg ef a1 (1959). This
represented some of the early work in machine learning and described the use
of an evolutionary algorithm for uutomcrtic proqrmimirig, i.e. the task of finding
a program that calculates a given input-output function. Other founders in the
field remember a paper of Fraser (1957) that influenced their early work, and
there may be many more such forerunners depending on whom one asks.
In the same time frame Brernermann presented some of the first attempts
to apply simulated evolution to numerical optimization problems involving both
linear and convex optimization as well as the solution of nonlinear simultaneous
equations (Bremermann 1962). Bremermann also developed some of the
early evolutionary algorithm (EA) theory, showing that the optimal mutation
probability for linearly separable problems should have the value of I/! in the
case of t bits encoding an individual (Bremermann et ul 1965).
Also during this period Box developed his evolutionary operation (EVOP)
ideas which involved an evolutionary technique for the design and analysis of
(industrial) experiments (Box 1957, Box and Draper 1969). Box's ideas were
never realized as a computer algorithm, although Spendley et a1 (1962) used
them as the basis for their so-called simplex design method. It is interesting to
note that the REVOP proposal (Satterthwaite 1959a, b) introducing randomness
into the EVOP operations was rejected at that time.
TEAM LRN

Evolutionary programming

As is the case with many ground-breaking efforts, these early studies were
met with considerable skepticism. However, by the mid-1960s the bases for
what we today identify as the three main forms of EA were clearly established.
The roots of evolutionary programming (EP) (Chapter 10) were laid by Lawrence
Fogel in San Diego, California (Fogel et a1 1966) and those of genetic algorithms
(GAS) (Chapter 8) were developed at the University of Michigan in Ann Arbor
by Holland (1967). On the other side of the Atlantic Ocean, evolution strategies
(ESs) (Chapter 9) were a joint development of a group of three students, Bienert,
Rechenberg, and Schwefel, in Berlin (Rechenberg 1965).
Over the next 25 years each of these branches developed quite independently
of each other, resulting in unique parallel histories which are described in more
detail in the following sections. However, in 1990 there was an organized effort
to provide a forum for interaction among the various EA research communities.
This took the form of an international workshop entitled Parallel Problem
Solling fkom Nature at Dortmund (Schwefel and Manner I99 1 ).
Since that event the interaction and cooperation among EA researchers from
around the world has continued to grow. In the subsequent years special efforts
were made by the organizers of ZCGA91 (Belew and Booker 1991), EP92
(Fogel and Atmar 1992), and PPSN92 (Manner and Manderick 1992) to provide
additional opportunities for interaction.
This increased interaction led to a consensus for the name of this new field,
e\vlutionarv computation (EC), and the establishment in 1993 of a journal by the
same name published by MIT Press. The increasing interest in EC was further
indicated by the IEEE World Congress on Computationcll Intelligence ( WCCI)
at Orlando, Florida, in June 1994 (Michalewicz et ml 1994), in which one of the
three simultaneous conferences was dedicated to EC along with conferences on
neural networks and fuzzy systems.
That brings us to the present in which the continued growth of the field is
reflected by the many EC events and related activities each year, and its growing
maturity reflected by the increasing number of books and articles about EC.
In order to keep this overview brief, we have deliberately suppressed many
of the details of the historical developments within each of the three main EC
streams. For the interested reader these details are presented in the following
sections.

6.2 Evolutionary programming

Evolutionary programming (EP) was devised by Lawrence J Fogel in 1960
while serving at the National Science Foundation (NSF). Fogel was on leave
from Convair, tasked as special assistant to the associate director (research),
Dr Richard Bolt, to study and write a report on investing in basic research.
Artificial intelligence at the time was mainly concentrated around heuristics
and the simulation of primitive neural networks. It was clear to Fogel that
TEAM
LRN they model humans rather than
both these approaches were limited
because

A history of evolutionary computation

the essential process that produces creatures of increasing intellect: evolution.

Fogel considered intelligence to be based on adapting behavior to meet goals
i n a range of environments. In turn, prediction was viewed as the key
ingredient to intelligent behavior and this suggested a series of experiments on
the use of simulated evolution of tinite-state machines (Chapter 18) to forecast
nonstationary time series with respect to arbitrary criteria. These and other
experiments were documented in a series of publications (Fogel 1962, 1963,
Fogel ef t i i 1965, 1966, and many others).
Intelligent behabior was viewed as requiring the composite ability to ( i )
predict ones environment, coupled with ( i i ) a translation of the predictions
into ii suitable response in light of the given goal. For the sake of generality.
the en~~ironnient
was described as a sequence of symbols taken from a finite
alphabet. The evolutionary problem was detined as evolving an algorithm
(essentially a program) that would operate on the sequence of symbols thus far
obser\,ed in such a manner so as to produce an output symbol that is likely t o
maximize the algorithms performance in light ofboth the next symbol to appear
i n the environment and a well-defined payoff function. Finite-state machines
pro\fidt.d a useful representation for the required behavior.
The proposal was as follows. A population of finite-state machines is
exposed to the enbironment, that is, the sequence of symbols that have been
observed up to the current time. For each parent machine. as each input symbol
is offered to the machine, each output symbol is compared with the next input
symbol. The worth of this prediction is then measured with respect to the payoff
function (e.g. all-none, absolute error, squared error, or any other expression of
the meaning of the symbols). After the last prediction is made. a function of the
payoff for each symbol (e.g. average payoff per symbol) indicates the fitness of
the machine.
Offspring machines are created by randomly mutating each parent machine.
Each parent produces offspring (this was originally implemented as only a single
offspring simply for convenience). There are five possible modes of random
mutation that naturally result from the description of the machine: change an
output symbol. change a state transition, add a state, delete a state. or change
the initial state. The deletion of a state and change of the initial state are
only allowed when the parent machine has more than one state. Mutations are
chosen with respect to a probability distribution, which is typically uniform. The
number of mutations per offspring is also chosen with respect to a probability
distribution or may be fixed ( i priori. These offspring are then evaluated over the
existing environment in the same manner as their parents. Other mutations, such
;is majority logic mating operating on three or more machines, were proposed
by Fogel et cil (1966) but not implemented.
The machines that provide the greatest payoff are retained to become parents
of the next generation. (Typically, half the total machines were saved so that the
parent population remained at a constant size.) This process is iterated until an
TEAM
LRNunexperienced) in the environment
actual prediction of the next symbol
(as yet

Evolutionary programming

is required. The best machine generates this prediction, the new symbol is added
to the experienced environment, and the process is repeated. Fogel (1964) (and
Fogel et nl ( I 966)) used nonregressive evolution. To be retained, a machine
had to rank in the best half of the population. Saving lesser-adapted machines
was discussed as a possibility (Fogel et cil 1966, p 21) but not incorporated.
This general procedure was successfully applied to problems in prediction.
identification, and automatic control (Fogel et nl 1964, 1966, Fogel 1968) and
was extended to simulate coevolving populations by Fogel and Burgin (1969).
Additional experiments evolving finite-state machines for sequence prediction.
pattern recognition, and gaming can be found in the work of Lutter and
Huntsinger ( I969), Burgin ( 1969), Atmar ( 1976). Dearholt ( 1 9 7 6 ~ and
Takeuchi ( 1980).
In the mid-1980s the general EP procedure was extended to alternative
representations including ordered lists for the traveling salesman problem (Fogel
and Fogel 1986), and real-valued vectors for continuous function optimization
(Fogel and Fogel 1986). This led to other applications in route planning
(Fogel 1988, Fogel and Fogel 1988), optimal subset selection (Fogel 1989),
and training neural networks (Fogel et a1 1990), as well as comparisons to other
methods of simulated evolution (Fogel and Atmar 1990). Methods for extending
evolutionary search to a two-step process including evolution of the mutation
variance were offered by Fogel et nl (1991, 1992). Just as the proper choice of
step sizes is a crucial part of every numerical process, including optimization, the
internal adaptation of the mutation variance(s) is of utmost importance for the
algorithms efficiency. This process is called self-adaptation or autoadaptation
in the case of no explicit control mechanism, e.g. if the variances are part of
the individuals characteristics and underlie probabilistic variation in a similar
way as do the ordinary decision variables.
In the early 1990s efforts were made to organize annual conferences on EP,
these leading to the first conference in 1992 (Fogel and Atmar 1992). This
conference offered a variety of optimization applications of EP in robotics
(McDonnell et a1 1992, Andersen er a1 1992). path planning (Larsen and
Herman 1992, Page et a1 1992), neural network design and training (Sebald
and Fogel 1992, Porto 1992, McDonnell 1992), automatic control (Sebald et nl
19921, and other fields.
First contacts were made between the EP and ES communities just
before this conference, and the similar but independent paths that these two
approaches had taken to simulating the process of evolution were clearly
apparent. Members of the ES community have participated in all successive
EP conferences (Back et a1 1993, Sprave 1994, Back and Schiitz 1995, Fogel et
(11 1996). There is less similarity between EP and GAS, as the latter emphasize
simulating specific mechanisms that apply to natural genetic systems whereas
EP emphasizes the behavioral, rather than genetic, relationships between parents
and their offspring. Members of the GA and GP communities have, however,
LRN
also been invited to participate inTEAM
the annual
conferences, making for truly

A history of evolutionary computation

interdiwiplinary interaction (see e.g. Altenberg 1994, Land and Belew 1995,
Koza and Andre 1996).
Since the early 1990s. efforts i n EP have diversified in many directions.
Application\ in training neural networks have received considerable attention
( w e e.g. English 1994, Angeline st d 1994, NlcDonnell and Waagen 1994,
Porto et 111 199.5). while relatively less attention ha\ been devoted to evolving
furry \ystems (Haffner and Sebald 1993, Kim and Jeon 1996). Image processing
applications can be found in the articles by Bhattacharjya and Roysam (1994).
Brotherton et trl ( 1994). Rizki et til (1995), and others. Recent efforts to use
EP in medicine hace been offered by Fogel et cil (1995) and Gehlhaar et trl
( 1995 1.
Efforts \tudying and comparing methods of self-adaptation can be
found in the articles by Saralanan et crl ( 1995). Angeline er al ( 1996). and
other\. Mathematical analyse\ of EP have been summarized by Fogel (1995).
To offer a summary, the initial efforts of L J Fogel indicate some of the
early attempts to ( i ) use simulated cvolution to perform prediction, ( i i ) include
ariable-length encodings, (iii) use representation\ that take the form of a
\equence of instructions. ( i v ) incorporate a population of candidate solutions, and
( b ) coevolve evolutionary programs. Moreover, Fogel ( 1963, 1964) and Fogel
et trl (1966) offered the early recognition that natural evolution and the human
endeavor of the scientific method are essentially \imilar processes, a notion
recently echoed by Cell-Mann ( 1994). The initial prescription3 for operating
on tinite-state machines have been extended to arbitrary representations.
mutation operators, and selection methods, and techniques for self-adapting the
t ' b olutionary search hake been proposed and implemented. The population sire
need not be kept constant and there can be a variable number of offjpring
per parent, much like the ( p A ) methods (Section 25.4) offered in ESs. In
contra\t to these methods, selection is often made probabilistic in EP, giving
les\er-\coring solutions \ome probability of surviving as parents into the next
generation. In contrast to GAS, no effort is made in EP to support (some \ay
maximim) schema processing. nor is the use of random variation comtrained
to emphasize \pecitic mechanisms of genetic transfer, perhaps providing greater
\ ersatility to tackle specific problem domains that are unsuitable for genetic
operator\ wch a\ cros\over.

6.3 Genetic algorithms

The first glimpses of the ideas underlying genetic algorithms (GAs) are found in
Holland's paper\ in the early 1960s (see e.g. Holland 1962). In them Holland \et
o u t a broad and ambitious agenda for understanding the underlying principles
of adaptive sy\tems--\ystems that are capable of self-modification in re,pon\e
to their interactions with the environments in which they must function. Such a
LRNboth the understanding of complex
theorq of adaptive \y\tem\ jhould TEAM
facilitate

Genetic algorithms

forms of adaptation as they appear in natural systems and our ability to design
robust adaptive artifacts.
In Hollands view the key feature of robust natural adaptive systems
was the successful use of competition and innovation to provide the ability
to dynamically respond to unanticipated events and changing environments.
Simple models of biological evolution were seen to capture these ideas nicely via
notions of survival of the fittest and the continuous production of new offspring.
This theme of using evolutionary models both to understand natural adaptive
systems and to design robust adaptive artifacts gave Hollands work a somewhat
different focus than those of other contemporary groups that were exploring the
use of evolutionary models in the design of efficient experimental optimization
techniques (Rechenberg 1965) or for the evolution of intelligent agents (Fogel
et a1 1966), as reported in the previous section.
By the mid-1960s Hollands ideas began to take on various computational
forms as reflected by the PhD students working with Holland. From the outset
these systems had a distinct genetic flavor to them in the sense that the
objects to be evolved over time were represented internally as genomes and the
mechanisms of reproduction and inheritance were simple abstractions of familiar
population genetics operators such as mutation, crossover, and inversion.
Bagleys thesis (Bagley 1967) involved tuning sets of weights used in the
evaluation functions of game-playing programs, and represents some of the
earliest experimental work in the use of diploid representations. the role of
inversion, and selection mechanisms. By contrast Rosenbergs thesis (Rosenberg
1967) has a very distinct flavor of simulating the evolution of a simple
biochemical system in which single-celled organisms capable of producing
enzymes were represented in diploid fashion and were evolved over time to
produce appropriate chemical concentrations. Of interest here is some of the
earliest experimentation with adaptive crossover operators.
Cavicchios thesis (Cavicchio 1970) focused on viewing these ideas as a form
of adaptive search, and tested them experimentally on difficult search problems
involving subroutine selection and pattern recognition. In his work we see
some of the early studies on elitist (section 28.4) forms of selection and ideas
for adapting the rates of crossover and mutation. Hollstiens thesis (Hollstien
1971) took the first detailed look at alternate selection and mating schemes.
Using a test suite of two-dimensional fitness landscapes, he experimented with
a variety of breeding strategies drawn from techniques used by animal breeders.
Also of interest here is Hollstiens use of binary string encoding of the genome
and early observations about the virtues of Gray codings.
In parallel with these experimental studies, Holland continued to work on
a general theory of adaptive systems (Holland 1967). During this period he
developed his now famous schema analysis of adaptive systems, relating it to
the optimal allocation of trials using k-armed bandit models (Holland 1969).
He used these ideas to develop a more theoretical analysis of his reproductive
TEAM
LRN Holland then pulled all of these
plans (simple GAS) (Holland 1971,
1973).

A history of evolutionary computation

ideas together in his pivotal book AdnptLition in h'oturul mid Artijcicil Systems
(Holland 1975).
Of interest was the fact that many of the desirable properties of these
algorithms being identified by Holland theoretically were frequently not
observed experimentally. It was not difficult to identify the reasons for this.
Hampered by a lack of computational resources and analysis tools, most of
the early experimental studies involved a relatively small number of runs using
small population sizes (generally less than 20). It became increasingly clear
that many of the observed deviations from expected behavior could be traced
t o the well-known phenomenon in population genetics of genetic d r i f , the loss
of genetic diversity due to the stochastic aspects of selection, reproduction, and
the like in small populations.
By the early 1970s there was considerable interest in understanding better
the behavior of implementable GAS. In particular, it was clear that choices
of population size, representation issues, the choice of operators and operator
rates all had significant effects of the observed behavior of GAS. Frantz's thesis
(Frantz 1972) reflected this new focus by studying in detail the roles of crossover
and inversion in populations of size 100. Of interest here is some of the earliest
experimental work on mu1ti poin t crossover operators.
De Jong's thesis (De Jong 1975) broaded this line of study by analyzing
both theoretically and experimentally the interacting effects of population size.
crossover, and mutation on the behavior of a family of GAS being used to
optimize a fixed test suite of functions. Out of this study came a strong sense that
even these simple GAS had significant potential for solving difficult optimization
pro b 1ems.
The mid-1970s also represented a branching out of the family tree of GAS
as other universities and research laboratories established research activities in
this area. This happened slowly at first since initial attempts to spread the word
about the progress being made in GAS were met with fairly negative perceptions
from the artificial intelligence (AI) community as a result of early overhyped
work in areas such as self-organizing systems and perceptrons.
Undaunted, groups from several universities including the University of
Michigan, the University of Pittsburgh, and the University of Alberta organized
an Aikiptii'e Systerns Worksliop in the summer of 1976 in Ann Arbor, Michigan.
About 20 people attended and agreed to meet again the following summer. This
pattern repeated itself for several years, but by 1979 the organizers felt the
need to broaden the scope and make things a little more formal. Holland, De
Jong, and Sampson obtained NSF funding for Ail 1)zterdisi~iplincir~
Workshop iiz
,4ticiptii*eS~stems,which was held at the University of Michigan in the summer
of I98 1 (Sampson I98 I ).
By this time there were several established research groups working on GAS.
At the University of Michigan, Bethke, Goldberg, and Booker were continuing
to develop GAS and explore Holland's classifier systems (Chapter 12) as part
TEAM
LRN
o f their PhD research (Bethke 1981,
Booker
1982, Goldberg 1983). At the

Genetic algorithms

University of Pittsburgh, Smith and Wetzel were working with De Jong on

various CA enhancements including the Pitt approach to rule learning (Smith
1980, Wetzel 1983). At the University of Alberta, Brindle continued to look
at optimization applications of GAS under the direction of Sampson (Brindle
1981).
The continued growth of interest in GAS led to a series of discussions and
plans to hold the first Ziiterrzatiorzal Conference on Gerietic.Algorithms (ICGA) in
Pittsburgh, Pennsylvania, in 1985. There were about 75 participants presenting
and discussing a wide range of new developments in both the theory and
application of GAS (Grefenstette 1985). The overwhelming success of this
meeting resulted in agreement to continue ZCGA as a biannual conference. Also
agreed upon at ZCGA '85 was the initiation of a moderated electronic discussion
group called CA List.
The field continued to grow and mature as reflected by the ICGA conference
activities (Grefenstette 1987, Schaffer 1989) and the appearance of several books
on the subject (Davis 1987, Goldberg 1989). Goldberg's book, in particular,
served as a significant catalyst by presenting current GA theory and applications
in a clear and precise form easily understood by a broad audience of scientists
and engineers.
By 1989 the ZCGA conference and other CA-related activities had grown
to a point that some more formal mechanisms were needed. The result was
the formation of the International Society for Genetic Algorithms (ISGA), an
incorporated body whose purpose is to serve as a vehicle for conference funding
and to help coordinate and facilitate CA-related activities. One of its first acts of
business was to support a proposal to hold a theory workshop on the Foiimiutioiis
of Genetic. Algorithms (FOGA) in Bloomington, Indiana (Rawlins 199 1 ).
By this time nonstandard GAS were being developed to evolve complex.
nonlinear variable-length structures such as rule sets, LISP code, and neural
networks. One of the motivations for FOGA was the sense that the growth of
CA-based applications had driven the field well beyond the capacity of existing
theory to provide effective analyses and predictions.
Also in 1990, Schwefel hosted the first PPSN conference in Dortmund,
which resulted in the first organized interaction between the ES and CA
communities. This led to additional interaction at ICGA '91 in San Diego which
resulted in an informal agreement to hold ICGA and PPSN in alternating years,
and a commitment to jointly initiate a journal for the field.
It was felt that in order for the journal to be successful. it must have broad
scope and include other species of EA. Efforts were made to include the EP
community as well (which began to organize its own conferences in 1992). and
the new journal Evolutionary Computatioiz was born with the inaugural issue in
the spring of 1993.
The period from 1990 to the present has been characterized by tremendous
growth and diversity of the CA community as reflected by the many conference
TEAM
activities (e.g. ICGA and FOGA),
the LRN
emergence of new books on GAS.

A history of evolutionary computation

and a growing list of journal papers. New paradigms such as messy GAS
(Goldberg et a1 1991) and genetic programming (Chapter 1 1 ) (Koza 1992)
were being developed. The interactions with other EC communities resulted
in considerable crossbreeding of ideas and many new hybrid EAs. New GA
applications continue to be developed, spanning a wide range of problem areas
from engineering design problems to operations research problems to automatic
programming.

6.4 Evolution strategies

In 1964, three students of the Technical University of Berlin, Bienert.
Rechenberg, and Schwefel, did not at all aim at devising a new kind of
optimization procedure. During their studies of aerotechnology and space
technology they met at an Institute of Fluid Mechanics and wanted to construct
a kind of research robot that should perform series of experiments on a flexible
slender three-dimensional body in a wind tunnel so as to minimize its drag. The
method of minimization was planned to be either a one variable at a time or
a discrete gradient technique, gleaned from classical numerics. Both strategies,
performed manually, failed, however. They became stuck prematurely when
used for a two-dimensional demonstration facility, a joint plate-its optimal
shape being a flat plate-with which the students tried to demonstrate that it
was possible to find the optimum automatically.
Only then did Rechenberg (1965) hit upon the idea to use dice for random
decisions. This was the breakthrough-on
12 June 1964. The first version
of an evolutionary strategy (ES), later called the ( I
I ) ES, was born, with
discrete, binomially distributed mutations centered at the ancestors position,
and just one parent and one descendant per generation. This ES was first tested
on a mechanical calculating machine by Schwefel before it was used for the
experimerirum criicis, the joint plate. Even then, it took a while to overcome
a merely locally optimal S shape and to converge towards the expected global
optimum, the flat plate. Bienert (1967), the third of the three students, later
actually constructed a kind of robot that could perform the actions and decisions
automatically.
Using this simple two-membered ES, another student, Lichtfulj ( 1965).
optimized the shape of a bent pipe. also experimentally. The result was rather
unexpected, but nevertheless obviously better than all shapes proposed so far.
First computer experiments, on a Zuse 223, as well as analytical
investigations using binomially distributed integer mutations, had already been
performed by Schwefel (1965). The main result was that such a strategy can
become stuck prematurely, i.e. at solutions that are not even locally optimal.
Based on this experience the use of normally instead of binomially distributed
mutations became standard in most of the later computer experiments with realkalued variables and in theoretical investigations into the methods efficiency,
TEAM LRN
but not however in experimental optimization
using ESs. In 1966 the little ES

Evolution strategies

community was destroyed by dismissal from the Institute of Fluid Mechanics

('Cybernetics as such is no longer pursued at the institute!'). Not before 1970
was it found together again at the Institute of Measurement and Control of the
Technical University of Berlin, sponsored by grants from the German Research
Foundation (DFG). Due to the circumstances, the group missed publishing its
ideas and results properly, especially in English.
In the meantime the often-cited two-phase nozzle optimization was
performed at the Institute of Nuclear Technology of the Technical University
of Berlin, then in an industrial surrounding, the AEG research laboratory
(Schwefel 1968. Klockgether and Schwefel 1970), also at Berlin. For a hotwater flashing flow the shape of a three-dimensional convergent-divergent (thus
supersonic) nozzle with maximum energy efficiency was sought. Though in this
experimental optimization an exogenously controlled binomial-like distribution
was used again, it was the first time that gene duplication and deletion were
incorporated into an EA, especially in a ( I
I ) ES. because the optimal length
of the nozzle was not known in advance. As in case of the bent pipe this
experimental strategy led to highly unexpected results, not easy to understand
even afterwards, but definitely much better than available before.
First Rechenberg and later Schwefel analyzed and improved their ES. For the
( 1 1 ) ES, Rechenberg, in his Dr.-Ing. thesis of 197 I , developed. on the basis of
two convex n-dimensional model functions, a convergence rate theory for I I >> 1
variables. Based on these results he formulated a $ success rule for adapting
the standard deviation of mutation (Rechenberg 1973). The hope of arriving at
an even better strategy by imitating organic evolution more closely led to the
incorporation of the population principle and the introduction of recombination,
which of course could not be embedded in the ( I + I ) ES. A first multimembered
ES, the ( p I ) ES-the notation was introduced later by Schwefel-was also
designed by Rechenberg in his seminal work of 1973. Because of its inability
to self-adapt the mutation step sizes (more accurately, standard deviations of the
mutations), this strategy was never widely used.
Much more widespread became the ( p
A ) ES and ( p , A ) ES, both
formulated by Schwefel in his Dr.-Ing. thesis of 1974-197s.
It contains
A ) ES and
theoretical results such as a convergence rate theory for the ( I
the ( I , A ) ES (A > I ) , analogous to the theory introduced by Rechenberg
1 ) ES (Schwefel 1977). The mriltirneniher-ed ( p > I ) ESs arose
for the ( 1
from the otherwise ineffective incorporation of mutatable mutation parameters
(variances and covariances of the Gaussian distributions used). Self-adaptation
was achieved with the ( p ,A) ES first, not only with respect to the step sizes.
but also with respect to correlation coefficients. The enhanced ES version with
correlated mutations, described already in an internal report (Schwefel I974),
was published much later (Schwefel 1981) due to the fact that the author left
Berlin in 1976. A more detailed empirical analysis of the on-line self-adaptation
of the internal or strategy parameters was first published by Schwefel in 1987
TEAM LRN on one of the first small instruction
(the tests themselves were secretly performed

A history of evolutionary computation

rnultiple data (SIMD) parallel machines (CRAY I ) at the Nuclear Research

Centre (KFA) Julich during the early 1980s with a first parallel version of
the multimenibered ES with correlated mutations). It was in this work that the
notion of ,\elf-trtkrptcitioii by cwllec'ti\v letimiiig first came up. The importance of
recombination (for object as well as strategy parameters) and soft selection (or
/ i > I ) was clearly demonstrated. Only recently has Beyer ( 1995a. b ) delicered
the theoretical background to that particularly important issue.
I t may be worth mentioning that in the beginning there were strong objection\
against increasing h as well as p beyond one. The argument against A > 1
was that the exploitation of the current knowledge was unneceswily delayed.
and the argument against p > 1 was that the wrvival of inferior members of
the population would unnecessarily slow down the evolutionary progress. The
hint that h successors could be evaluated in parallel did not convince anybody
since parallel computers were neither available nor expected in the near future.
The two-membered ES and the very similar creeping random search method of
Rastrigin ( 1965) were investigated thoroughly with respect to their convergence
and convergence rates also by Matyas ( 1965) i n Czechoslovakia, Born ( 1978)
on the Eastern side of the Berlin ball ( !), and Rappl ( 1984) in Munich.
Since this early work many new results hJve been produced by the ES
community consisting of the group at Berlin (Rechenberg, since 1972) and that
at Dortmund (Schwefel. since 1985). In particular, strategy variants concerning
other than only real-valued parameter optimization. i.e. real-world problems,
were invented. The first use of an ES for binary optimization using multicellular
individuals was presented by Schwefel (1975). The idea of using several
\ubpopulations and niching mechanisms for global optimization was propagated
by Schwefel in 1977; due to a lack of computing resources. however, it could
not be tested thoroughly at that time. Rechenberg (1978) invented a notational
scheme for such nested ESs.
Beside these nonstandard approaches there now exists a wide range of
other ESs, e.g. seceral parallel concepts (Hoffmeister and Schwefel 1990,
Lohmann 199 1 , Rudolph I99 I , 1992, Sprave 1994, Rudolph and Sprave 1995).
ESs for multicriterion problems (Kursawe 199 1 , 1992). for mixed-integer tasks
(Lohmann 1992. Rudolph 1994. Back and Schutz 1995), and even for problems
with a variable-dimensional parameter space (Schutz and Sprave I996), and
variants concerning nonstandard step size and direction adaptation schemes (\ee
e.g. Matyas 1967, Stewart et trl 1967, Furst et t i l 1968, Heydt 1970, Rappl 1984,
Ostermeier et crl 1994). Comparisons between ESs, GAS, and EP may be found
in the articles by Back et cil (1991, 1993). It was Back ( 1996) who introduced
a common algorithmic scheme for all brands of' current EAs.
Omitting all these other useful nonstandard 13%-a commented collection of
literature concerning ES applications was made at the University of Dortmund
(Back et 111 1992)-the history of ESs is closed with a mention of three recent
books by Rechenberg ( 1994), Schwefel (1995), and Back ( 1996) as well as
LRN as written tutorial\ (Schwefel and
three recent contributions that mayTEAM
be seen

References

Rudolph 1995, Back and Schwefel 1995, Schwefel and Bick 1995), which on
the one hand define the actual standard ES algorithms and on the other hand
present some recent theoretical results.

References
Altenberg L I994 Emergent phenomena in genetic programming Proc. 3rd Aiinic. Conf
oii E\dirtionLiny Progmtnniing (San Diego, CA, 1994) ed A V Scbald and L J Fogel
(Singapore: World Scientific) pp 2 3 3 4 1
Andersen B, McDonnell J and Page W 1992 Configuration optimization of mobile
manipulators with equality constraints using evolutionary programming Proc. I \t
Atin. Conf oii Evolutionuq~Progranzming (Lci J o l l a , CA, 1992) ed D B Fogel and
W Atmar (La Jolla, CA: Evolutionary Programming Society) pp 71-9
Angeline P J, Fogel D B and Fogel L J 1996 A compariwn of self-adaptation methods
for finite \(ate machines in a dynamic environment Eidirtionut?~ProgruimiinCq
V-Proc. 5th Ann. Cot$ on E\dutinnai? Progrurnriiirig ( 1996) ed L J Fogel,
P J Angeline and T Back (Cambridge, MA: MIT Pre\s)
Angeline P J, Saunders G M and Pollack J B 1994 An evolutionary algorithm that
constructs recurrent neural networks IEEE Trrrns. Neirrul Nont*orhs " - 5 54-65
ofthe E,wlution c?flntelligc.nc.t~
u r i d I t s Possihlt~Reuli:trtiori
Atmar J W 1976 S~iec~irlcitioti
in Muclzine Form ScD Thesis, New Mexico State University
Back T I996 E\vlutionun Algorithriis iiz Theor? und Pructice (New York: Oxford
University Pre\s)
Back T, Hoffmeister F and Schwefel H-P 1991 A wrvcy of evolution \trategies Pmc.
4th Int. Conf. on Genetic Algorithitis (Sun Diego, CA, 1991I ed R K Belew and
L B Booker (San Mateo, CA: Morgan Kaufmann) pp 2-9
-1
992 Applicutions of E~dutioncrn?
Algorithnis Technical Report o f the Uni\wsity
of Dortmund Department of Computer Science Sy\tem\ Analy5is Rcwarch Group
sY s-2/92
Back T, Rudolph G and Schwefel H-P 1993 Evolutionary programming and evolution
strategies: similarities and differences Proc. 2nd Anti. Conf. on E\wlutionrrr~
Progrurnniing (San Diego, CA, 1993) ed D B Fogel and W Atmar (La Jolla. CA:
Evolutionary Programming Society) pp 1 1-22
Back T and Schutz M 1995 Evolution strategies for mixed-integer optimization of
optical multilayer systems Evolutionan Progrumrning IV-Pro(.. 4th Ann. Conf 0 1 1
E\dutionur? Programming (Sun Diego, CA, 1995) ed J R McDonnell, R G Reynolds
and D B Fogel (Cambridge, MA: MIT Press) pp 33-51
Back T and Schwefel H-P 1995 Evolution strategies I: variant5 and their computational
implementation Genetic A Igorithms in Engineering und Coitiptrr Scirticr, Proc. I J t
Short Coiirse EUROGEN-95 ed G Winter, J Pkriaux, M Galhn and P Cuesta (New
York: Wiley) pp I 1 1-26
Bagley J D 1967 The Behavior of Adup five Systems \r%ic.h Employ Griwtic mid
Correlution Algorithms PhD Thesis, University of Michigan
Belew R K and Booker L B (eds) 199 1 Proc. 4th liit. Cot$ on Genrtic AI~qorithiti.s(Strii
Diego, CA, 1991) (San Mateo, CA: Morgan Kaufmann)
Bethke A D 198 1 Genetic Algorithriis us Futicrion Optimi,-er\ PhD Thesis, Univer4ty of
TEAM LRN
Michigan

A history of evolutionary computation

Be yer H-G 1995a H o ~ GAS

s
do Not Work-Utirlerstcinclitig GAS Wittioiit Scliemritci mid
Biriltlitig Blocks Technical Report of the University of Dortmund Department of
Computer Science Systems Analysis Research Group SYS-2/95
-199Sb
Toward a theory of evolution strategies: on the benetit of sex-the ( p / l c . 2-1theory Ei*olirtiotiiit;vCotiipt. 3 8 I - 1 1 I
Bhattacharjya A K and Roysam B 1993 Joint solution of IOW-, intermediate- and highlevel vision tasks by evolutionary optimization: application to computer vision at
low SNR IEEE Trcins. Neirrcil Netrrwks " -5 83-95
Bienert P 1967 Airfhriir eitier Oi)ti~,iisri~,ig.sciirtot~icitik~~r
drei Pcrrciriieter Dip1.-lng. Thesis,
Technical University of Berlin, Institute of Measurement and Control Technology
Bookcr L I982 Intelligerit Behliiior ( i s ( i i i A~liiptcitionto the Eisk Eiiiiroiinietit PhD
Thesis, University of Michigan
Born J I 978 Ei.olirtiotisstrtitegieti x r nc4tneri,sc~heri
Liisirng \'oti ALlcii)t(itioti,s(ii~~sCihen
PhD
Thesis. Humboldt University at Berlin
Box G E P 1957 Evolutionary operation: a method for increasing industrial productikrity
A/11>1.Stilt. 6 8 1-101
Box G E P and Draper N P 1969 Eidutiotitir.~Opertitiori. A Method ji)r Inc.rc.ti,siri!:
Indiistritil Protlicc.ti\tity (New York: Wiley )
Brctnermann H J 1962 Optimization through evolution and recombination Se!f'0rgtini:ing Systenis ed M C Yovits et nl (Washington. DC: Spartan)
Brcmcrniann H J. Rogson M and Salaff S 1965 Search by evolution Biop/ij~sit.s
ciriil Cyherrietic. Syster~is-Proc. 2nd Cyberrietit Scietices Sytiip. ed M Max field,
A Callahan and L J Fogel (Washington, DC: Spartan) pp 157-67
Brindle A 198 1 Genetic- Algorithtns Jhr Futic.tion Optiniizitioti PhD Thesis, University
of Alberta
Brotherton T W, Sinipson P K. Fogel D B and Pollard T 1994 Classifier design using
c\dutionary programming Proc.. 3rd Anti. Corzj: 011Eidirtioii~ir>~
Progrcininiincq (Sat1
Diego, CA, 1994) cd A V Sebald and L J Fogel (Singapore: World Scientific)
pp 68-75
Burgin G H 1969 On playing two-person zero-sum games against nonminimax players
IEEE Trcins. Syst. Sci. Cybertiet. SSC-5 369-70
~tl
PhD Thesis, University
Cavicchio D J 1970 Atlq'tir-e Srtirdi lIsirig S i ~ ~ i i c l n t tE\~)littion
01' Michigan
Davis L I987 Genetic' A l g o r i t h s w i t 1 Sitiiiiltited Aiitietilirig (London: Pitman J
Dcarholt D W I976 Some experiments on generalization using evolving automata Proc.
9th I t i t . Cot$ or1 System Scietic*es (Hoiioliilir, H I ) pp 13 1-3
Dc Jong K A 1975 Antrlysis of Behyior of (i C1tis.c of Genetic Adiipti\-e Systems PhD
Thesis, University of Michigan
English T M 1994 Generalization in populations of recurrent neural networks Proc. 3rd
Atin. Con/: o t i E1wliitioiiLii.y Progrcimming ( S m Diego,CA, 1994) ed A V Sehald
and L J Fogel (Singapore: World Scientific) pp 26-33
Fogcl D B 1988 An evolutionary approach to the traveling salesman problem Biol.
C)'hertiet. 60 1 3 9 4 4
__ 1989 Evolutionary programming for voice feature analysis Proc. 231x1 Asilor,itir
C'otlf: o t i Signii1.s. S>:stetii.smrl Cm~ipi4ter.r
(Ptrc*@cGroiv, CA) pp 38 1 -3
__ 1 995 Erwlutiotieir\~Computtition: T o ~ ~ i cri Neii
d
Piiilosophy of' Mrichitie ltitt~lligriic~e
TEAM LRN
(New York: IEEE)

References

Fogel D B and Atmar J W 1990 Comparing genetic operators with Gaussian mutations
in simulated evolutionary processing using linear systems Biol. Cyhernet. 63 1 1 1 - 4
-(eds)
1992 Proc. 1st A m . Con8 on Elvlutiortan Progrunttning (LA Jolla, CA, 1992)
(La Jolla, CA: Evolutionary Programming Society)
Fogel D B and Fogel L J 1988 Route optimization through evolutionary programming
Proc. 22nd Asilornur Conf on Signals, Systems und Computers (Pocijk Grmv. CA )
pp 679-80
Fogel D B, Fogel L J and Atmar J W 1991 Meta-evolutionary programming Pro(.. 25th
Asilontar Conf: on Signals, Systems and Compitters (Pncific Gro\te, CA) ed R R Chen
pp 540-5
Fogel D B, Fogel L J, Atmar J W and Fogel G B 1992 Hierarchic methods of evolutionary
programming Pro(.. I st Anti. Con8 on E\dutionuty Progruntntin~q(Lu Jollu, CA,
1992) ed D B Fogel and W Atmar (La Jolla, CA: Evolutionary Programming
Society) pp 175-82
Fogel D B, Fogel L J and Port0 V W 1990 Evolving neural networks Biol. Cyhurnet. 63
487-93
Fogel D B, Wasson E C and Boughton E M 1995 Evolving neural networks for detecting
breast cancer Cancer Lett. 96 49-53
Fogel L J 1962 Autonomous automata Industrial Res. 4 14-9
-1963 Biotechnology: Corzcepts and Applications (Englewood Cliffs, N J : PrenticeHall)
-I964 On the 0rgcirti:cition of Intellect PhD Thesis, University of California at Los
Angeles
-1
968 Extending communication and control through simulated evolution
Bioengineering-urt Engineering Veu. Proc. Sytnp. on En~qineeringSi~qniJic~unc~u
of'
the Biological Sciences ed G Bugliarello (San Francisco, CA: San Francisco Press)
pp 2 8 6 3 0 4
Fogel L J, Angeline P J and Back T (eds) 1996 Evolutionun Progrumming V-Proc. 5th
Atin. Conf on Evolutionan Programming (1996) (Cambridge, MA: MIT Press)
Fogel L J and Burgin G H 1969 Competitive Goal-seeking throirgh E\~olutionctr?
Progrumming Air Force Cambridge Research Laboratories Final Report Corltract
AF 19(628)-5927
Fogel L J and Fogel D B 1986 Artijicial Iiztelligence through E\dutioitun Progrutnniin~q
U S Army Research Institute Final Report Contract P0-9-X56- 1 102C-1
Fogel L J, Owens A J and Walsh M J 1964 On the evolution of artificial intelligence
Proc. 5th Natl Syntp. on Human Factors in Electronic*.s(San Diego, CA: IEEE)
-1
965 Artificial intelligence through a simulation of evolution Biophysics und
Cybernetic Sjstems ed A Callahan, M Maxfield and L J Fogel (Washington, DC:
Spartan) pp 13 1-56
-1966 Artijicial Intelligence through Simulated E\dution (New York: Wiley)
FrantL D R 1972 Non-lirtearities in Genetic Aduptilre Setrrch PhD Thesis, University of
Michigan
Fraser A S 1957 Simulation of genetic systems by automatic digital computers Aust. J.
Biol. Sci. 10 484-99
Friedberg R M 1958 A learning machine: part I IBM J. 2 2-1 3
Friedberg R M, Dunham B and North J H 1959 A learning machine: part I1 IBM .I.
3
TEAM LRN
282-7

A history of evolutionary computation

Furst H, Muller P H and Nollau V 1968 Eine stochastische Methode zur Ermittlung
der Maximalstelle einer Funktion von mehreren Veranderlichen mit experimentell
ermittelbaren Funktionswerten und ihre Anwendung bei chemischen Prozcssen
c i ~ e t t t . - w h 20
. w - s
Gehlhaar et cil I99SGehlhaar D K et cil 19% Molecular recognition of the inhibitor
AG- 1343 by HIV- I protease: conformationally flexible docking by evolutionary
programming Clienz. Biol. 2 3 17-24
Cell-Mann M 1994 The Q i ~ i r kcintl the J q i i a r (New York: Freeman)
Goldberg D E 1983 Coinpiiter-Aided Gris Pipeliiie Opemtioti icsing Genetic A1goritlitn.s
ciiitl Ride Lecirtiirzg PhD Thesis, University of Michigan
1989 Genetic Algorithnis iri Seiirch, Optimiz~itiotir i d Mnchine Leiirtiirig (Reading,
MA: Addison- Wesley )
Goldberg D E. Deb K and Korb B 1991 Don't worry, be messy Proc-. 4th Int. Corzf. 011
Gerirtic Algorithr?i.s ( S m Diego, CA, I W l ) ed R K Belew and L B Booker (San
Mateo, CA: Morgan Kaufmann) pp 24-30
Grefenstette J J (ed) 1985 Pro(-. 1st Int. Conf: 011 Genetic Algorithnis mid Their
Applic*doris (Pittshitrgti, PA, 198.5) ( Hillsdale, NJ: Erlbaum)
__ I987 Proc. 2nd lnt. Cot$ o ~ G
i e w t i c Algorithms mcl Their Applicutiom (Cmihrirl~qr,
MA. 1987) (Hillsdale, NJ: Erlbaum)
Haffner S B and Sebald A V 1993 Computer-aided design of fuzzy HVAC
controllers using evolutionary programming Proc. 2nd Ann. Cot$ o i l E \ ~ ) / i i t i o i i ~ i r ~ ~
Pr-ogrcrtnmhg (Sciti Diego, CA, 1993) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 98- 107
Heydt G T 1970 Directed R ~ i t i d o mSetirdi PhD Thesis, Purdue University
Hoftmeister F and Schwefcl H-P I990 A taxonomy of parallel evolutionary algorithms
Ptircellu '90. Proc. 5th Int. Workshop on Prircillel Processing b y Cellitlcir A utonirrtci
w i t 1 Arrciys vol 2 , cd G Wolf, T Legendi and U Schendcl (Berlin: Academic)
pp 97-107
Holland J H 1962 Outline for a logical theory of adaptive systems J. ACM 9 297-313
__ 1967 Nonlinear environments permitting efficient adaptation CotnpicttJr md
Infbrnztition Scietic~~s
I1 (New York: Academic)
__ I969 Adaptive plans optimal for payoff-only environments Proc. 2nd Hm-trii Int.
Cot$ oti System Sciences pp 9 17-20
197 1 Processing and processors for schemata A s s o c i d \ v it~fi)rmitioiiprocessitiq ed
E L Jacks (New York: Elsevier) pp 127-46
-1973
Genetic algorithms and the optimal allocation of trials SIAM J. Conzput. 2
88- 1 OS
I975 Atlqitiition in Ntitirnil md Artijickil Systetns ( Ann Arbor, MI: University of
Michigan Press)
Hollsticn R B I97 I Ar-tijkitil Genrtic Ackiptcirioti in Coniiiicter Coritrol Systenzs PhD
Thesis, University of Michigan
Kim J-H and Jeon J-Y 1996 Evolutionary programming-based high-precision controller
design E\diitiottcit:\' Progmmniitig V-Pro(.. 5th Anti. Cord; 011 E\~olirtioticir\~
Progrcrmmitig (1996) ed L J FogeI, P J Angeline and T Back (Cambridgc, MA:
MIT Press)
Klockgcther J and Schwefel H-P 1970 Two-phase nozzle and hollow core jet experiments
Proc. I ltli Sytrip. 0 1 1 Engitieeririg Aspects c?f'~.kc~~,zetotiy~lr~)dyti~i~~iic.s
ed D G Elliott
TEAM
LRN
(Pasadena. CA: California Institute
of Technology)
pp 14 1-8
~

References

Koza J R 1992 Genetic Progrumtning (Cambridge, MA: MIT Press)

Koza J R and Andre D 1996 Evolution of iteration in genetic programming Ei~olictiotirrry
Programming V-Proc. 5th Ann. Cot$ on Evolutionury Progrumrning ( 1996) ed
L J Fogel, P J Angeline and T Back (Cambridge, MA: MIT Press)
Kursawe F I99 I A variant of evolution strategies for vector optimization Purullel
Problem Solving .froni Nutirre-Proc. 1s t Workshop PPSN 1 (Lecture Notes in
Computer Science 496) (Dortrnund, 1991) ed H-P Schwrfel and R Manner (Berlin:
Springer) pp 193-7
-1992 Naturanaloge Optimierverfahren-Neuere
Entwicklungen in der Informatik
Stiidien =.urE\wlirtorischert dkonomik 11 (Schriften de.5 Vereins fiir Sociulpolitik 195
11) ed U Witt (Berlin: Duncker and Humblot) pp 11-38
Land M and Belew R K 1995 Towards a self-replicating language for computation
E w l ir tioncr n Prog r u m n ing 1V-Proc. 4th Anti. Cor!f oti E idir tionu q Prog ruIntn it 1,s
(Sun Diego, CA, 1995) ed J R McDonnell, R G Reynolds and D B Fogel
(Cambridge, MA: MIT Press) pp 403-13
Larsen R W and Herman J S 1992 A comparison of evolutionary programming t o
neural networks and an application of evolutionary programming to a navy mission
planning problem Proc. 1st A m . Cot$ on Eidutionut-y Pmgrutnming (Lr Jollri,
CA, 1992) ed D B Fogel and W Atmar (La Jolla, CA: Evolutionary Programming
Society) pp 127-33
LichtfuIj H J I965 E\dution eines Rohrkriirnrners Dip1.-Ing. Thesis, Technical University
of Berlin, Hermann Fottinger Institute for Hydrodynamics
Lohmann R 1991 Application of evolution strategy in parallel populations Purullel
Problem Solling .front Nutitre-Proc.
1st Workshop PPSN I (Dortmund, 1991)
(Lecture Notes in Computer Science 496) ed H-P Schwefcl and R Manner (Berlin:
Springer) pp 198-208
-1992 Structure evolution and incomplete induction Purullrl Prohleni Solitirig .from
Nuture 2 (Br~cssels,1992) ed R Manner and B Manderick (Amsterdam: ElsevierNorth-Holland) pp 175-85
Lutter B E and Huntsinger R C 1969 Engineering applications of finite automata
Sitnulation 13 5-1 I
Manner R and Manderick B (eds) 1992 Parallel Problevi SohGtig from Nuture 2 Bru.ssrl.v,
1992) (Amsterdam: Elsevier-North-Holland)
Matyas J 1965 Random optimization Automatioti Remote Control 26 244-5 1
-I967 Das zufallige Optimierungsverfahren und seine Konvergenz Pro( 5th lnt.
Anulogue Computation Meeting (Luusunne, 1967) 1 540-4
McDonnell J R 1992 Training neural networks with weight constraints Proc. 1st Atin.
Cortf: on E1dutiotiun' Progrumming (La Jollu, CA, 1992) ed D B Fogel and
W Atmar (La Jolla, CA: Evolutionary Programming Society) pp 1 1 1-9
McDonnell J R, Andersen B D, Page W C and Pin F 1992 Mobile manipulator
configuration optimization using evolutionary programming Proc. 1st Atin. Cotlf:
on Evolutionury Programming (La Jolla, CA, 1992) ed D B Fogel and W Atmar
(La Jolla, CA: Evolutionary Programming Society) pp 52-62
McDonnell J R and Waagen D 1994 Evolving recurrent perceptrons for time-series
prediction IEEE Truris. Neurul Nemw-ks "-5 24-38
Michalewicz Z et a1 (eds) 1994 Proc. 1st IEEE Conf o t i Ei*olirtiotiut-.v Compirtution
LRN
(Orlundo, FL, 1994) (Piscataway,TEAM
NJ: IEEE)

A history of evolutionary computation

Ostermeier A. Gawelcryk A and Hansen N 1994 Step-siLe adaptation based on non-local

U se of se I ect i on i n format i on Pei rci 11e 1 Pro hleiri Sol \ling from Nci tic re- P PSN Ill I n t.
C.or!f: o i i Et*olutioniiry Coiriyictcitiotr (Jeriiscileiii. 1994) (L..tc*tiire notes in Computer
Scirtic.r 866) ed Y Davidor, H-P Schwefel and R Manner (Berlin: Springer) pp 18998
Page W C, Andersen B D and McDonnell J R 1092 An evolutionary programming
approach to multi-dimensional path planning Pro(.. 1st A m . Cor$ oii \dirtiontin
Progrctniriiing ( L c i Jollcr. CA, 1992) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 63-70
Porto V W 1992 Alternative methods for training neural networks Pmc. 1st Anti. Coizf:
o i i E\wIiitioniiry Progr(iiiii,iiiig (Lci Jollci, CA, 1992) ed D B Fogel and W Atmar
(La Jolla, CA: Evolutionary Programming Society) pp 100-10
Porto V W, Fogel D B and Fogel L J I995 Alternative neural network training methods
IEEE E.ypert 10 16-22
Rappl G 1984 Kori i'rrgeiizrclten )lot1 RLtn~(~nz-SucirchVetfiihretr ziir glohcilen Optimieriin,g
PhD Thesis, Bundeswehr University
Rastrigin L A 1965 Riiriilorri Secirdi in Optimiziitioii Yrohlenisj 0 r Multipcircinieter S ~ s t e n i s
(trtinsliited Jroni tlie RiissiLiri origineil: Sliichtiinyi poisk 13 zciclcic.heikh optiuiistitsii
r~iriogocireinietric.heskikhsistem, Zincitiie, Rigci) ,4ir Force System Command Foreign
Technology Division FTD-HT-67-363
Rawlins G J E (ed) 199 I Foiriic1don.s oj' Geiietica Algorithms (San Mateo, CA: Morgan
Kaufmann)
Rechenberg I 1965 Cyheriietic- Solution Pcith oj'eiti Expritrietitcil Problem Royal Aircraft
Establishment Library Translation I 122
__ 1 973 E~~oliitioti.s.strcttegiu:Optiniierurig tec*hni.sc*herSj-sterrie m d i Prin5pien t k r
hiologisc-heri E\diition (Stuttgart: Frommann-Holzboog)
___ I978 Evolutionsstrategien Siniirlittioiisrviettio~l~~ii
in eler Mediziii irtiel Biologir ed
B Schneider and U Ranft (Berlin: Springer) pp 83-1 14
__ I994 Ei,olirtioti.s.strLrtehrie '94 (Stuttgart: Frominann-Holzboog)
Rizki M M, Tamburino L A and Zmuda M A 1995 Ekvlution of morphological
recognition systems E\diitioticir\. Progrcinirning lV-Proc. 4th Anti. Cor!f' on
E\+olirtioricit;yProgrctrriiniiig (Sciri Diego, CA, 1095) ed J R McDonnell, R G Reynolds
and D B Fogel (Cambridge, MA: MIT Press) pp 95-106
Rosenberg R I 967 Sitriiiliitioti of G e r i d c Poj~iileitions\t'itli Bioc.hsiiiicd Properties PhD
Thesis, University of Michigan
Rudolph G 199I Global optimiiation by means of distributed evolution strategies
Peirdlel Problem Solling froiii Niiture-Prot . 1st Workshop PPSN I (Dortniuricl.
1991) (Lwtiire Notes in Coinputer Science 496) ed H-P Schwefel and R Manner
(Berlin: Springer) pp 209-1 3
I992 Parallel approaches to stochastic global optimization Pcirdlel Corripiitiiig: froni
Theory to Soioiel Prcic-tice, Proc. Eiir. Workshop oii Piircillel Coniputirig ed W Joosen
and E Milgrom (Amsterdam: 10s)pp 256-67
-1994 An evolutionary algorithni for integer programming Piircillel Prohleni Sol\*ing
f r o i ~ iNci tir re- P PSN 111 I n t. Cot$ oii E ~ wutiona
l
n Cornpii tci tion (Jer i m lerii, I 994 )
(Lectiire notes in Conipiiter Science 866) ed Y Davidor, H-P Schwefel and R Manner
(Berlin: Springer) pp I3948
Rudolph G and Sprave J 1995 A cellular genetic algorithm with selt-adjusting acceptance
TEAM
LRN
threshold Proi.. 1st IEEHEEE liit.
Con$
o i i Genetic Algoritlinis iri Erigirrtvriiig
~

References

Sjstenis: Iiitioiutions arid Applications (GALESIA ' 9 5 ) (Sheflfirld, 1995) (London:

IEE) pp 365-72
Sampson J R I98 1 A Swopsis of the Fifth Annual A m Arbor Adcrptii'c. Sy,stcm.s Worl\.\hop
Department of Computing and Communication Science, Logic of Computers Group
Technical Report University of Michigan
Saravanan N, Fogel D B and Nelson K M 1995 A comparison of methods for selfadaptation in evolutionary algorithms BioSystems 36 1 5 7 4 6
Satterthwaite F E 1959a Random balance experimentation Trc.hiionirtric..\ 1 1 1 1-37
-I959b REVOP or Raiidoni Eivlitrionan Operatioii Merrimack College Technical
Report 10-10-59
Schaffer J D (ed) 1989 Proc. 3rd Itit. Cor$ 011 Genetic Algoritlirm (Faii$is, WA, 1989)
(San Mateo. CA: Morgan Kaufmann)
Schutz M and Sprave J 1996 Application of parallel mixed-integer evolution 5trategies
with mutation rate pooling Et~)lutionaryProgruniniiiig V-Proc. -5th Aiin. Con/.
oti E\wlirtionur~ Progruniming (1996) ed L J Fogel, P J Angeline and T Back
(Cambridge, MA: MIT Press)
Schwe fel H-P 1 965 Kjberrietisc~heEivolution al.\ Srrategie der t ~ . r ~ ~ e r i i ? i t ~ i i tFordiirrig
tllt~ti
irt cier Striiniutigste~.IiriikDipl.-Ing. Thesis, Technical Univcrsity of Berlin, Herniann
Fiittinger Institute for Hydrodynamics
-I968 E.rperinieritelle Optiniirrirrzg eirier Z~iieiphLc.cc.ticIii.\i. Eil I AEG Research
Institute Project MHD-Staustrahlrohr I 1034/68 Technical Report 35
-1974 Aduptilve Mec.hunisnirti in der hiologischen Eiwlutiori irtid ihr Eii!jii$uiif'
die Eiiolirtioii,\ge.cc.}i~~tiiidigkeitTechnical University of Berlin Working Group of
Bionics and Evolution Techniques at the Institute for Mcawrement and Control
Technology Technical Report Re 215/3
-I975 Bitiiire Optiniic~rungdurch soniutische Muturioii Working Group of Bionics
and Evolution Techniques at the Institute of Measurement and Control Technology
of the Technical University of Berlin and the Central Animal Laboratory of the
Medical Highschool of Hannover Technical Report
-I
977 Niotierische Optimieruiig i w i Coniputer-Modellrii niittrls ckr Ei~olittioii.\.\trtrti~gie (ltiter~Ii.sc.ipliiictn?'
Sjs tenis Research 2 6 ) (Basle: Birkhauwr )
-1
98 1 Niunericul 0ptinii:utioii of Cotnpirter Modrls (Chichester: Wile) )
-1
987 Collective phenomena in evolutionary systems Problenis of Coristctricj~
uiid Chutige-the Coi?iplei?ietitarit~
(fl Systenis Approcdies to Conip1e.t it!, Ptrpers
Preserzted at the 3 l s t A m . Meeting h i t . Society Geii. Sj*.\t. Re.,. vol 2, ed P Checkland
and I Kiss (Budapest: International Society for General System Research) pp 102533
-1995 Eidutioii mid Optiniuni Seeking (New York: Wilcy )
Schwefel H-P and Back T 1995 Evolution strategies 11: theoretical aspects Geiic4c
Algorithms in Engiiieeritig arid Coiiipirter Sciericc. Pro(*. 1st Short Coirrw
EUROGEN-95 ed G Winter, J Pdriaux, M Galrin and P Cucsta (Neb York: Wilcy)
pp 127-40
Schwefel H-P and Manner R (eds) I99 I Prrrullel Problem Sol\*iiig.froniNuture-Proc. 1st
Workshop PPSN I ( D o r t i w i d , 1991) (Lecture Notes iii Coiupiiter SciiwcxJ 4 9 6 )
(Berlin: Springer)
Schwefel H-P and Rudolph G 1995 Contemporary evolution \trategies Adlwices iri
TEAM
LRN
Artificial Life-Proc. 3rd Eirr. Cot$
o t i Arrificiul
LIfi ( E C A L'95) (Lec*tirreNotcJs iii

A history of evolutionary computation

Compirfrr Science 929) ed F Morrin, A Moreno, J J Merelo and P Chac6n (Berlin:

Springer) pp 893-907
Sebald A V and Fogel D B 1992 Design of fault-tolerant neural networks for pattern
classi tication Proc. 1st Atin. Confi ott E\dutioricrr\* Progrcrmniing (hi Jolln, CAs
l Y Y 2 ) cd D B Fogel and W Atmar (La Jolla, CA: Evolutionary Programming
Society) pp 90-9
Sebald A V, Schlenzig J and Fogel D B 1992 Minimax design of CMAC encoded
neural controllers for systems with variable time delay Procs. 1st A m . Corzj: ori
E\.olirtioricir-.~Pmgrcirtiniirig ( L A Jollu, CA, 1992) ed D B FogeI and W Atniar (LA
Jolla, CA: Evolutionary Programming Society) pp 120-6
Smith S F I980 A Letimirig System BLiseel o i i Geric>ticA(kipri\~oAlgorithnis PhD Thesis,
University of Pittsburgh
Spcndley W, Hext G R and Himsworth F R 1962 Sequential application of simplex
designs in optimisation and evolutionary operation ~ ~ . c , h n o m r t r i4c s44 1-6 1
Sprave J 1994 Linear neighborhood evolution strategy Proc. 3rd Aiin. Corzj: 0 1 1
E\dirtioriciry Progrtininiing ( S t i i i Die<+), CA, 1904) ed A V Scbald and L J Fogel
(Singapore: World Scientific) pp 42-5 1
Stcwart E C, Kakmaugh W P and Brocker D H 1967 Study of a global search algorithm
tor optimal control Pro(.. 5th /lit. Aritrlogue Conipirtcition Meeting (Lciicscirine, 19671
\ r o l I . pp 207-30
Takcuchi A 1980 Evolutionary automata-cornparison of automaton behavior and
Restle's learning model /tzfi)ntiution Sci. 20 9 1-9
Wct/t.l A I 983 E\dircttion of' the Eflee.tii*ene.s.sof' Gtwetic Algorithtm in Comhiiicrtor-iitl
0ptiitii:utiou unpublished manuscript, University of Pittsburgh

TEAM LRN

7
Introduction to evolutionary algorithms
Thornas Back

7.1

General outline of evolutionary algorithms

Since they are gleaned from the model of organic evolution, all basic instances
of evolutionary algorithms share a number of common properties, which are
mentioned here to characterize the prototype of a general evolutionary algorithm:

Evolutionary algorithms utilize the collective learning process of a

population of individuals. Usually, each individual represents (or encodes)
a search point in the space of potential solutions to a given problem.
Additionally, individuals may also incorporate further information; for
example, strategy parameters (Sections 16.2 and 32.2) of the evolutionary
algorithm.
(ii) Descendants of individuals are generated by randomized processes intended
to model mutation (Chapter 32) and recombination (Chapter 3 3 ) . Mutation
corresponds to an erroneous self-replication of individuals (typically,
small modifications are more likely than large ones), while recombination
exchanges information between two or more existing individuals.
(iii) By means of evaluating individuals in their environment, a measure of
quality or fitness value can be assigned to individuals. As a minimum
requirement, a comparison of individual fitness is possible, yielding a binary
decision (better or worse). According to the fitness measure, the selection
process favors better individuals to reproduce more often than those that
are relatively worse.
These are just the most general properties of evolutionary algorithms. and
the instances of evolutionary algorithms as described in the following chapters
use the components in various different ways and combinations. Some basic
differences in the utilization of these principles characterize the mainstream
instances of evolutionary algorithms; that is, genetic algorithms (Chapter 8),
evolution strategies (Chapter 9) , and evolutionary programming (Chapter 10) .
See D B Fogel ( I 995) and Back (1996) for a detailed overview of similarities
and differences of these instances and Back and Schwefel (1993) for a brief
comparison.
TEAM LRN

Introduction to evolutionary algorithms

Genetic algorithms (originally described by Holland (1962, 1975) at Ann

Arbor, Michigan, as so-called adaptive or reproductive plans) emphasize
recombination (crossover) (Chapter 33) as the most important search
operator and apply mutation (Chapter 32) with very small probability
solely as a 'background operator.' They also use a probabilistic selection
operator (proportional selection) (Chapter 23) and often rely on a binary
representation (Chapter IS) of individuals.
Evolution strategies (developed by Rechenberg ( 1965, 1973) and Schwefel
( 1965, 1977) at the Technical University of Berlin) use normally distributed
mutations to modify real-valued vectors (Chapter 16) and emphasize
mutation (Section 32.2) and recombination (Section 33.2) as essential
operators for searching in the search spacs and in the strategy parameter
space at the same time. The selection operator (Chapter 25) is deterministic.
and parent and offspring population sizes usually differ from each other.
Evolutionary programming (originally developed by Lawrence J Fogel
(1962) at the University of California in San Diego, as described in
Fogel er c d (1966) and refined by David B Fogel (1992) and others)
emphasizes mutation and does not incorporate the recombination of
individuals. Similarly to evolution strategies, when approaching realvalued optimization problems, evolutionary programming also works with
normally distributed mutations and extends the evolutionary process to the
strategy parameters. The selection operator (Section 27.1 ) is probabilistic.
and presently most applications are reported for search spaces involving
real-valued vectors, but the algorithm was originally developed to evolve
tinite-state machines (Chapter 18) .
In addition to these three mainstream methods, which are described in
detail in the next three chapters, genetic programming, classifier systems, and
hybridizations of evolutionary algorithms with other techniques are considered
in chapters 11-13. respectively.
As an introductory remark, we only
mention that genetic programming applies the evolutionary search principle to
automatically develop computer programs in suitable languages (Chapter 10)
(often LISP, but others are possible as well), while classifier systems search
the space of production rules (or sets of rules) of the form ' I F <condition>
THEN <action>'.
A variety of different representations of individuals and corresponding
operators are presently known in evolutionary algorithm research, and it is the
aim of Chapters 14-34 to present all these in detail. Here, we will use these
chapters as a construction kit to assemble the basic instances of evolutionary
algorithms.
As a general framework for these basic instances, we define I to denote an
arbitrary space of individuals a E I , and F : I -+ R to denote a real-valued
fitness function of individuals. Using p and h to denote parent and offspring
population sizes, P ( r ) = ( a ,( t ) , . . . , a J r ) ) E 1'' characterizes a population at
TEAM
LRN
generation t . Selection, mutation, and
recombination
are described as operators

General outline of evolutionary algorithms

I * + I i L , r n : I + I * , and r : Ii + I k that transform complete

populations. By describing all operators on the population level (though this
is counterintuitive for mutation), a high-level perspective is adopted, which is
sufficently general to cover different instances of evolutionary algorithms. For
mutation, the operator can of course be reduced to the level of single individuals
by defining rn through a multiple application of a suitable operator nz : I + I
on individuals.
These operators typically depend on additional sets of parameters @I,,
and (3,. which are characteristic for the operator and the representation of
individuals. Additionally, an initialization procedure generates a population of
individuals (typically at random, but an initialization with known starting points
should of course also be possible), an evaluation routine determines the fitness
values of the individuals of a population, and a termination criterion is applied
to determine whether or not the algorithm should stop.
Putting all this together, a basic evolutionary algorithm reduces to the simple
recombination-mutation-selection loop as outlined below:
s :

Input:
Output:
I
2
3
4
5
6
7
8
9

A, 8 , ,8,, (-I,, , (4,

U * ,the

to;

best individual found during the run, or

P * , the best population found during the run.

P ( t ) t initialize(p);
F ( t ) +- evaluate( P ( t ) , p ) ;
while ( i ( P ( t ) ,0 , )# true) do
P ( t ) t recombine(P(t), 0,);
P(r) t mutate(P(t), ( E n l ) ;
F ( t ) t evaluate(P(t), A);
P ( t 1 ) +- select(P(t), F ( t ) ,p , (-1,);

t t + l ;

od
After initialization of t (line 1 ) and the population P ( t ) of size p (line 2) as
well as its fitness evaluation (line 3), the while-loop is entered. The termination
criterion i might depend on a variety of parameters, which are summarized
here by the argument 0,. Similarly, recombination (line 5 ) , mutation (line
6), and selection (line 8) depend on a number of algorithm-specific additional
parameters. While P ( t ) consists of p individuals, P ( t ) and P ( t ) are assumed
to be of size K and A, respectively. Of course, h = K = 1-l is allowed and
is the default case in genetic algorithms. The setting K = p is also often
used in evolutionary programming (without recombination), but it depends on
the application and the situation is quickly changing. Either recombination
or mutation might be absent from the main loop. such that K = p (absence
of recombination) or K = h (absence of mutation) is required in these cases.
The selection operator selects p individuals from P ( t ) according to the fitness
values F ( t ) ,t is incremented (line 9),
andLRN
the body of the main loop is repeated.
TEAM

Introduction to evolutionary algorithms

The input parameters of this general evolutionary algorithm include the

population sizes p and h as well as the parameter sets ( ~ 1 ~(LOT,
.
and (4,
of the basic operators. Notice that we allow recombination to equal the identity
mapping; that is, P ( r ) = P ( r ) is possible.
The following sections of this chapter present the common evolutionary
algorithms as particular instances of the general scheme.

References
Back T I996 Eivlictionuq~Algorithms in Theory Litid Prcrc-ticv (New York: Oxford
University Press)
Back T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization Eidi.rtionaq Coinputdon 1( 1 ) 1-23
Fogcl D B I992 E\*ol\+ig Art$ciril Intelligent-e PhD Thesis, University of California,
San Diego
1995 E iwlu tiotiti n v Comyutii tioii : h ure1 ei Nr \t Philosophy OJ Meicli in e In tell ig ence
(Piscataway, NJ: IEEE)
Fogel L J 1962 Autonomous automata Industr. Res. 4 14-9
Fogel L J, Owens A J and Walsh M J 1966 ArtiJicWl 1ntelligetic.e throicgh Sitniilcit(d
Eiwliition (New York: Wiley)
Holland J H 1962 Outline for a logical theory of adaptive systems J. ACM 3 297-314
__ 1975 Atkiptcition in Natiiml m d Articiul
System (Ann Arbor, MI: University of
Michigan Press)
Rechenberg I I965 Cybernetic solution path of an experimental problem Lihrtiry
Trcinsl~itionN o I I22 Royal Aircraft Establishment, Farnborough, U K
___ 1 973 Eioliitioiisstrtitegie: Optitnierung techiischer Systerne n c i c 4 Priii:ipirn tier
hiolog isc -hen E\ditt ion ( Stuttgart : Fromman n- Ho 1I,boog )
Sch we fel H-P 1965 K.vhertietisc.heEi)olution a l s Strcctegie der experimentellrti Forschng
iri der Striirizittig.stec.htzik Diplomarbeit, Technische Universitiit, Berlin
__ I 97 7 Nittnerist he Optitnierirtig \ w i Computer-Moclellen mittuls der E~olictioti.s.str~itegie Interdisciplinary Systems Research, vol 26 (Basel: Birkhiiuser)
~

Further reading
The introductory section to evolutionary algorithms certainly provides the right
place to mention the most important books on evolutionary computation and its
subdisciplines. The following list is not intended to be complete, but only to
guide the reader to the literature.
I. Back T 1996 E\dictiotitin Algorithms in Tlieot? and Pructic-r (New York: Oxford

University Press)
A presentation and comparison of evolution strategies, evolutionary programming.
and genetic algorithms with respect to their behavior as parameter optimization
methods. Furthermore, the role of mutation and selection in genetic algorithms is
discussed in detail, arguing that mutation is much more useful than usually claimed
TEAM LRN
in connection with genetic algorithms.

2. Goldberg D E I989 Genetic Algorithms in Seurch, Optinri:ution, rind McicAirw

Learning (Reading, MA: Addison-Wesley)
An overview of genetic algorithms and classi tier systems, discussing all important
techniques and operators used in these subfields of evolutionary computation.
3 . Rechenberg I 1994 E\,olutiorrsstrutegie '94 Werkstatt Bionik und Evolutionstechnik,
vol 1 (Stuttgart: Frommann-Holzboog)

A description of evolution strategies in the form used by Rechenberg's group in

Berlin, including a reprint of (Rechenberg 1973).
4. Schwefel H-P I995 Eidutioti and Optimum Seeking Sixth-Generation Cornputer
Technology Series (New York: Wiley)
The most recent book on evolution strategies, covering the ( /L hbstrategy and
all aspects of self-adaptation of strategy parameters as well as a comparison of
evolution strategies with classical optimization methods.

5. Fogel D B 1995 Evoli4tionar-y Computation: ToirarciN N m Philosoph~of Mcxhinr

Intelligence (Piscataway, NJ: IEEE)
The book covers all three main areas of evolutionary computation (i.e. genetic
algorithms, evolution strategies, and evolutionary programming) and discusses the
potential for using simulated evolution to achieve machine intelligence.
6. Michalewicz Z 1994 Genetic Algorithms
(Berlin: Springer)

+ Dutu Structures = E\dution

Programs

Michalewicz also takes a more general view at evolutionary computation, thinking

of evolutionary heuristics as a principal method for search and optimization, which
can be applied to any kind of data structure.
7. Kinnear K E 1994 Advances in Genetic Progrumming (Cambridge, MA: MIT Press)

This collection of articles summarizes the state of the art in genetic programming,
emphasizing other than LISP-based approaches to genetic programming.
8. Koza J R 1992 Genetic Programming: On the Progrumming of Computers h j Meurrs
of Nutural Selection (Cambridge, MA: MIT Press)
9. Koza J R 1994 Genetic Programming I1 (Cambridge. MA: MIT Press)
The basic books for genetic programming using LISP programs, demonstrating
the feasibility of the method by presenting a variety of application examples from
diverse fields.

TEAM LRN

Genetic algorithms
Larry J Eshelman

8.1 Introduction
Genetic algorithms (GAS) are a class of evolutionary algorithms first proposed
and analyzed by John Holland ( 1975). There are three features which distinguish
GAS, as first proposed by Holland, from other evolutionary algorithms: ( i )
the representation used-bitstrings (Chapter 15); (ii) the method of selectionproportional selection (Chapter 23) ; and (iii) the primary method of producing
variations-crossover (Chapter 33). Of these three features, however, it
is the emphasis placed on crossover which makes GAS distinctive. Many
subsequent GA implementations have adopted alternative methods of selection,
and many have abandoned bitstring representations for other representations
more amenable to the problems being tackled. Although many alternative
methods of crossover have been proposed, in almost every case these variants
are inspired by the spirit which underlies Hollands original analysis of GA
behavior in terms of the processing of schemata or building blocks. It should be
pointed out, however, that the evolution strategy paradigm (Chapter 9) has added
crossover to its repertoire, so that the distinction between classes of evolutionary
algorithms has become blurred (Bick et a1 1991).
We shall begin by outlining what might be called the canonical GA, similar
to that described and analyzed by Holland (1975) and Goldberg (1987). We
shall introduce a framework for describing GAS which is richer than needed
but which is convenient for describing some variations with regard to the
method of selection. First we shall introduce some terminology. The individual
structures are often referred to as chromosomes. They are the genotypes that
are manipulated by the GA. The evaluation routine decodes these structures
into some phenotypical structure and assigns a fitness value. Typically, but not
necessarily, the chromosomes are bitstrings. The value at each locus on the
bitstring is referred to as an allele. Sometimes the individuals loci are also
called genes. At other times genes are combinations of alleles that have some
phenotypical meaning, such as parameters.
TEAM LRN

Genetic algorithm basics and some variations

8.2 Genetic algorithm basics and some variations

An initial population of individual structures P ( 0 ) is generated (usually randomly) and each individual is evaluated for fitness. Then some of these individuals are selected for mating and copied (select-repro) to the mating buffer C ( r ) .
In Hollands original GA, individuals are chosen for mating probabilistically,
assigning each individual a probability proportional to its observed performance.
Thus, better individuals are given more opportunities to produce offspring (reproduction with emphasis). Next the genetic operators (usually mutation and
crossover) are applied to the individuals in the mating buffer, producing offspring C(t). The rates at which mutation and crossover are applied are an
implementation decision. If the rates are low enough. it is likely that some of
the offspring produced will be identical to their parents. Other implementation
details are how many offspring are produced by crossover (one or two), and
how many individuals are selected and paired in the mating buffer. In Hollands original description, only one pair is selected for mating per cycle. The
pseudocode for the genetic algorithm is as follows:
begin
t = 0;
initialize P(t);
evaluate structures in P(t);
while termination condition not satisfied do
begin
t = t + l ;
select-repro c(t) from P(t-1);
recombine and mutate structures in C(t)
forming Cy(t);
evaluate structures in C (t) ;
select-replace P(t) from C(t) and P(t-1);
end
end

After the new offspring have been created via the genetic operators the two
populations of parents and children must be merged to create a new population.
Since most GAS maintain a fixed-sized population M , this means that a total
of M individuals need to be selected from the parent and child populations to
create a new population. One possibility is to use all the children generated
(assuming that the number is not greater than M ) and randomly select (without
any bias) individuals from the old population to bring the new population up
to size M . If only one or two new offspring are produced, this in effect means
randomly replacing one or two individuals in the old population with the new
offspring. (This is what Hollands original proposal did.) On the other hand, if
TEAM to
LRN
the number of offspring created is equal
M , then the old parent population is

Genetic algorithms

completely replaced by the new population.

There are several opportunities for biasing selection: selection for
reproduction (or mating) and selection from the parent and child populations
to produce the new population. The GAS most closely associated with Holland
do all their biasing at the reproduction selection stage. Even among these
GAS,however, there are a number of variations. If reproduction with emphasis
is used, then the probability of an individual being chosen is a function of
its observed fitness. A straightforward way of doing this would be to total
the fitness values assigned to all the individuals in the parent population and
calculate the probability of any individual being selected by dividing its fitness
by the total fitness. One of the properties of this way of assigning probabilities is
that the GA will behave differently on functions that seem to be equivalent from
an optimization point of view such as y = c i x and y = u x 2 h. If the h value
is large in comparison to the differences in the value produced by the u x 2 term,
then the differences in the probabilities for selecting the various individuals in
the population will be small, and selection pressure will be very weak. This often
happens as the population converges upon a narrow range of values. One way
of avoiding this behavior is to scale the fitness function, typically to the worst
individual in the population (De Jong 1975). Hence the measure of fitness used
in calculating the probability for selecting an individual is not the individuals
absolute fitness, but its fitness relative to the worst individual in the population.
Although scaling can eliminate the problem of not enough selection
pressure, often GAS using fitness proportional selection suffer from the opposite
problem-too much selection pressure. If an individual is found which is much
better than any other, the probability of selecting this individual may become
quite high (especially if scaling to the worst is used). There is the danger
that many copies of this individual will be placed in the mating buffer, and
this individual (and its similar offspring) will rapidly take over the population
(premature convergence). One way around this is to replace fitness proportional
selection with ranked selection (Whitley 1989). The individuals in the parent
population are ranked. and the probability of selection is a linear function of
rank rather than fitness, where the steepness of this function is an adjustable
parameter.
Another popular method of performing selection is tournament selection
(Goldberg and Deb 1991). A small subset of individuals is chosen at random,
and then the best individual (or two) in this set is (are) selected for the mating
buffer. Tournament selection, like rank selection, is less subject to rapid takeover
by good individuals, and the selection pressure can be adjusted by controlling
the size of the subset used.
Another common variation of those GAS that rely upon reproduction
selection for their main source of selection bias is to maintain one copy of
the best individual found so far (De Jong 1975). This is referred to as the
elitist strategy (Section 28.4). It is actually a method of biased parent selection,
TEAM population
LRN
where the best member of the parent
is chosen and all but one

Genetic algorithm basics and some variations

of the M members of the child population are chosen. Depending upon the
implementation, the selection of the child to be replaced by the best indiLidual
from the parent population may or may not be biased.
A number of CA variations make use of biased replacement selection.
Whitleys GENITOR, for example, creates one child each cycle, selecting the
parents using ranked selection, and then replacing the worst member of the
population with the new child (Whitley 1989). Syswerdas steady-state CA
creates two children each cycle, selecting parents using ranked selection, and
then stochastically choosing two individuals to be replaced. with a bias towards
the worst individuals in the parent population (Syswerda 1989). Eshelmans
CHC uses unbiased reproductive selection by randomly pairing all the members
of the parent population, and then replacing the worst individuals of the parent
population with the better individuals of the child population. (In effect. the
offspring and parent populations are merged and the best M (population size)
individuals are chosen.) Since the new offspring are only chosen by CHC if
they are better than the members of the parent population. the selection of both
the offspring and parent populations is biased (Eshelman I99 1 ).
These methods of replacement selection, and especially that of CHC,
resemble the (1-1+A) ES method of selection (Section 25.4) sometimes originally
used by evolution strategies (ESs) (Back et (11 1991). From j1 parents h
offspring are produced; the p parents and A. offspring are merged; and the
best j i individuals are chosen to form the new parent population. The other
ES selection method, ( p , A ) ES (Section 25.4), places all the bias in the child
selection stage. In this case, j i parents produce h offspring ( E , > j i ) , and the best
p offspring are chosen to replace the parent population. Muhlenbeins breeder
CA also uses this selection mechanism (Muhlenbein and Schlierkamp-Voosen
1993).
Often a distinction is made between generational and stmdj~-state GAS
(Section 28.3). Unfortunately, this distinction tends to merge two properties that
are quite independent: whether the replacement strategy of the GA is biased
or not and whether the C A produces one (or two) versus many (usually M )
offspring each cycle. Syswerdas steady-state CA, like Whitleys GENITOR,
allows only one mating per cycle and uses a biased replacement selection,
but there are also GAS that combine multiple matings per cycle with biased
replacement selection (CHC) as well as a whole class of ESs ( ( p + E , ) ES).
Furthermore, the GA described by Holland (1975) combined a single mating per
cycle and unbiased replacement selection. Of these two features, it would seem
that the most significant is the replacement strategy. De Jong and Sarma (1993)
found that the main difference between GAS allowing many matings versus few
matings per cycle is that the latter have a higher variance in performance.
The choice between a biased and an unbiased replacement strategy, on the
other hand, is a major determinant of C A behavior. First, if biased replacement
is used in combination with biased reproduction, then the problem of premature
TEAM LRN(Of course this will depend upon
convergence is likely to be compounded.

Genetic algorithms

other factors, such as the size of the population, whether ranked selection is
used, and, if so, the setting of the selection bias parameter.) Second, the obvious
shortcoming of unbiased replacement selection can turn out to be a strength. On
the negative side, replacing the parents by the children, with no mechanism for
keeping those parents that are better than any of the children, risks losing,
perhaps forever, very good individuals. On the other hand, replacing the
parents by the children can allow the algorithm to wander, and it may be
able to wander out of a local minimum that would trap a GA relying upon
biased replacement selection. Which is the better strategy cannot be answered
except in the context of the other mechanisms of the algorithm (as well as the
nature of the problem being solved). Both Syswerdas steady-state GA and
Whitleys GENITOR combine a biased replacement strategy with a mechanism
for eliminating children which are duplicates of any member in the parent
population. CHC uses unbiased reproductive selection, relying solely upon
biased replacement selection as its only source of selection pressure, and uses
several mechanisms for maintaining diversity (not mating similar individuals and
seeded restarts), which allow it to take advantage of the preserving properties
of a deterministic replacement strategy without suffering too severely from its
shortcomings.

8.3 Mutation and crossover

All evolutionary algorithms work by combining selection with a mechanism for
producing variations. The best known mechanism for producing variations is
mutation, where one allele of a gene is randomly replaced by another. In other
words, new trial solutions are created by making small, random changes in the
representation of prior trial solutions. If a binary representation is used, then
mutation is achieved by flipping bits at random. A commonly used rate of
mutation is one over the string length. For example, if the chromosome is one
hundred bits long, then the mutation rate is set so that each bit has a probability
of 0.01 of being flipped.
Although most GAS use mutation along with crossover, mutation is
sometimes treated as if it were a background operator for assuring that the
population will consist of a diverse pool of alleles that can be exploited by
crossover. For many optimization problems, however, an evolutionary algorithm
using mutation without crossover can be very effective (Mathias and Whitley
1994). This is not to suggest that crossover never provides an added benefit,
but only that one should not disparage mutation.
The intuitive idea behind crossover is easy to state: given two individuals
who are highly fit, but for different reasons, ideally what we would like to
do is create a new individual that combines the best features from each. Of
course, since we presumably do not know which features account for the good
performance (if we did we would not need a search algorithm). the best we can
TEAMThis
LRNis how crossover operates. It treats
do is to recombine features at random.

Mutation and crossover

these features as building blocks scattered throughout the population and tries to
recombine them into better individuals via crossover. Sometimes crossover will
combine the worst features from the two parents, in which case these children
will not survive for long. But sometimes it will recombine the best features from
two good individuals, creating even better individuals, provided these features
are compatible.
Suppose that the representation is the classical bitstring representation:
individual solutions in our population are represented by binary strings of zeros
and ones of length L . A GA creates new individuals via crossover by choosing
two strings from the parent population, lining them up, and then creating two
new individuals by swapping the bits at random between the strings. (In some
GAS only one individual is created and evaluated, but the procedure is essentially
the same.) Holland originally proposed that the swapping be done in segments,
not bit by bit. In particular, he proposed that a single locus be chosen at random
and all bits after that point be swapped. This is known as one-point crossover.
Another common form of crossover is two-point crossover which involves
choosing two points at random and swapping the corresponding segments from
the two parents defined by the two points. There are of course many possible
variants. The best known alternative to one- and two-point crossover is i i i ~ l f o r i ? ~
crosso\er.Uniform crossover randomly swaps individual bits between the two
parents (i.e. exchanges between the parents the values at loci chosen at random).
Following Holland, GA behavior is typically analyzed in terms of schemata.
Given a space of structures represented by bitstrings of length L . schemata
represent partitions of the search space. If the bitstrings of length L are
interpreted as vectors in a L-dimensional hypercube, then schemata are
hyperplanes of the space. A schema can be represented by a string of L symbols
from the set 0, I , # where # is a wildcard matching either 0 or 1. Each string
of length L may be considered a sample from the partition defined by a schema
if it matches the schema at each of the defined positions (i.e. the non-# loci).
For example, the string 01 1001 instantiates the schema 01##0#. Each string, in
fact, instantiates 2 L schemata.
Two important schema properties are order and defining length. The order of
a schema is the number of defined loci (i.e. the number of non-# symbols). For
example the schema #01##1### is an order 3 schema. The defining length is
the distance between the loci of the first and last defined positions. The defining
length of the above schema is four since the loci of the first and last defined
positions are 2 and 6.
From the hyperplane analysis point of view, a GA can be interpreted as
focusing its search via crossover upon those hyperplane partition elements that
have on average produced the best-performing individuals. Over time the search
becomes more and more focused as the population converges since the degree of
variation used to produce new offspring is constrained by the remaining variation
in the population. This is because crossover has the property that Radcliffe refers
TEAM LRN
to as respect-if two parents are instances
of the same schema, the child will

Genetic algorithms

also be an instance (Radcliffe 1991). If a particular schema conveys high fitness

values to its instances, then the population is likely to converge on the defining
bits of this schema. Once it so converges, all offspring will be instances of this
schema. This means that as the population converges, the search becomes more
and more focused on smaller and smaller partitions of the search space.
It is useful to contrast crossover with mutation in this regard. Whereas
mutation creates variations by flipping bits randomly, crossover is restricted to
producing variations at those loci on which the population has not yet converged.
Thus crossover, and especially bitwise versions of crossover, can be viewed as
a form of adaptive mutation, or convergence-controlled variation (CCV).
The standard explanation of how GAS operate is often referred to as the
building block hypothesis. According to this hypothesis, GAS operate by
combining small building blocks into larger building blocks. The intuitive idea
behind recombination is that by combining features (or building blocks) from
two good parents crossover will often produce even better children; for example,
a mother with genes for sharp teeth and a father with genes for sharp claws will
have the potential of producing some children who have both features. More
formally, the building blocks are the schemata discussed above.
Loosely interpreted, the building block hypothesis is another way of asserting
that GAS operate through a process of CCV. The building block hypothesis,
however, is often given a stronger interpretation. In particular, crossover is
seen as having the added value of being able to recombine middle-level building
blocks that themselves cannot be built from lower-level building blocks (where
level refers to either the defining length or order, depending on the crossover
operator). We shall refer to this explanation i i s to how GAS work as the strict
building block hypothesis (SBBH), and contrast it with the weaker convergencecontrolled variation hypothesis (CCVH).
To differentiate these explanations, it is useful to compare crossover with
an alternative mechanism for achieving CCV. Instead of pairing individuals
and swapping segments or bits, a more direct method of generating CCVs is
to use the distribution of the allele values in the population to generate new
offspring. This is what Syswerdas bitwise simulated crossover (BSC) algorithm
does (Syswerda 1993). In effect, the distribution of allele values is used to
generate a vector of allele probabilities, which in turn is used to generate a
string of ones and zeros. Balujas PBIL goes one step further and eliminates
the population, and simply keeps a probability vector of allele values, using an
update rule to modify it based on the fitness of the samples generated (Baluja
1995).
The question is, if one wants to take advantage of CCV with its ability to
adapt, why use crossover, understood as involving pairwise mating, rather than
one of these poolwise schemes? One possible answer is that the advantage is
only one of implementation. The pairwise implementation does not require any
centralized bookkeeping mechanism. In other words, crossover (using pairwise
TEAM LRN
mating) is simply natures way of implementing
a decentralized version of CCV.

Mutation and crossover

A more theoretically satisfying answer is that pairwise mating is better able

to preserve essential linkages among the alleles. One manifestation of this is
that there is no obvious way to implement a segment-based version of poolwise
mating, but this point also applies if we compare poolwise mating with only
crossover operators that operate at the bit level, such as uniform crossover. If
two allele values are associated in some individual, the probability of these
values being associated in the children is much higher for pairwise mating than
poolwise. To see this consider an example. Suppose the population size is
100, and that an individual of average fitness has some unique combination of
allele values, say all ones in the first three positions. This individual will have
a 0.01 probability (one out of 100) of being selected for mating, assuming it is
of average fitness. If uniform crossover is being used, with a 0.5 probability
of swapping the values at each locus, and one offspring is being produced per
mating, then the probability of the three allele values being propagated without
disruption has a lower bound of 0.125 (0.5'). This is assuming the worst-case
scenario that every other member in the population has all zeros in the first
three positions (and ignoring the possibility of mating this individual with a
copy of itself). Thus, the probability of propagating this schema is 0.00125
(0.01 * 0.125). On the other hand, if BSC is being used, then the probability
of propagating this schema is much lower. Since there is only one instance of
this individual in the population, there is only one chance in 100 of propagating
each allele and only 0.O00001 (0.01') of propagating all three.
Ultimately, one is faced with a tradeoff: the enhanced capability of pairwise
mating to propagate difficult-to-find schemata is purchased at the risk of
increased hitchhiking; that is, the population may prematurely converge on bits
that do not convey additional fitness but happen to be present in the individuals
that are instances of good schemata, According to both the CCVH and the
SBBH, crossover must not simply preserve and propagate good schemata, but
must also recombine them with other good schemata. Recombination, however,
requires that these good schemata be tried in the context of other schemata.
In order to determine which schemata are the ones contributing to fitness, we
must test them in many different contexts, and this involves prying apart the
defining positions that contribute to fitness from those that are spurious, but the
price for this reduced hitchhiking is higher disruption (the breaking up of the
good schemata). This price will be too high if the algorithm cannot propagate
critical, highly valued, building blocks or, worse yet, destroys them in the next
crossover cycle.
This tradeoff applies not only to the choice between poolwise and pairwise
methods of producing variation, but also to the choice between various methods
of crossover. Uniform crossover, for example, is less prone to hitchhiking than
two-point crossover, but is also more disruptive, and poolwise mating schemes
are even more disruptive than uniform crossover. In Holland's original analysis
this tradeoff between preserving the good schemata while performing vigorous
recombination is downplayed by using
a segment-based
crossover operator such
TEAM
LRN

Genetic algorithms

as one- or two-point crossover and assuming that the important building blocks
are of short defining length. Unfortunately, for the types of problem to which
GAS are supposedly ideally suited-those that are highly complex with no
tractable analytical solution-there is no a priori reason to assume that the
problem will, or even can, be represented so that important building blocks will
be those with short defining length. To handle this problem Holland proposed an
inversion operator that could reorder the loci on the string, and thus be capable
of finding a representation that had building blocks with short defining lengths.
The inversion operator, however, has not proven sufficiently effective in practice
at recoding strings on the fly. To overcome this linkage problem, Goldberg has
proposed what he calls messy GAS, but, before discussing messy GAS, it will
be helpful to describe a class of problems that illustrate these linkage issues:
deceptive problems.
Deception is a notion introduced by Goldberg (1987). Consider two
incompatible schemata, A and B . A problem is deceptive if the average fitness of
A is greater than B even though B includes a string that has a greater fitness than
any member of A. In practice this means that the lower-order building blocks
lead the GA away from the global optimum. For example, consider a problem
consisting of five-bit segments for which the fitness of each is determined as
follows (Liepins and Vose 1991). For each one the segment receives a point,
and thus five points for all ones, but for all Term it receives a value greater
than five. For problems where the value of the optimum is between five and
eight the problem is fully deceptive (i.e. all relevant lower-order hyperplanes
lead toward the deceptive attractor). The total fitness is the sum of the fitness
of the segments.
It should be noted that it is probably a mistake to place too much emphasis on
the formal definition of deception (Grefenstette 1993). What is really important
is the concept of being misled by the lower-order building blocks. Whereas
the formal definition of deception stresses the average fitness of the hyperplanes
taken over the entire search space, selection only takes into account the observed
average fitness of hyperplanes (those in the actual population). The interesting
set of problems is those that are misleading in that manipulation of the lowerorder building blocks is likely to lead the search away from the middle-level
building blocks that constitute the optimum solution, whether these middle-level
building blocks are deceptive in the formal sense or not. In the above class of
functions, even when the value of the optimum is greater than eight (and so
not fully deceptive), but still not very large, e.g. ten, the problem is solvable
by a GA using segment-based crossover, very difficult for a GA using bitwise
uniform crossover, and all but impossible for a poolwise-based algorithm like
BSC.
As long as the deceptive problem is represented so that the loci of the
positions defining the building blocks are close together on the string. i t meets
Hollands original assumption that the important building blocks are of short
TEAM
defining length. The GA will be able
to LRN
exploit this information using one- or

Mutation and crossover

two-point crossover-the building blocks will have a low probability of being

disrupted, but will be vigorously recombined with other building blocks along
the string. If, on the other hand, the bits constituting the deceptive building
blocks are maximally spread out on the chromosome, then a crossover operator
such as one- or two-point crossover will tend to break up the good building
blocks. Of course, maximally spreading the deceptive bits along the string is
the extreme case, but bunching them together is the opposite extreme.
Since one is not likely to know enough about the problem to be able
to guarantee that the building blocks are of short defining length, segmented
crossover loses its advantage over bitwise crossover. It is true that bitwise
crossover operators are more disruptive, but there are several solutions to
this problem. First, there are bitwise crossover operators that are much less
disruptive than the standard uniform crossover operator (Spears and De Jong
1991, Eshelman and Schaffer 1995). Second, the problem of preservation can
often be ameliorated by using some form of replacement selection so that good
individuals survive until they are replaced by better individuals (Eshelman and
Schaffer 1995). Thus a disruptive form of crossover such as uniform crossover
can be used and good schemata can still be preserved. Uniform crossover will
still make it difficult to propagate these high-order, good schemata once they
are found, but, provided the individuals representing these schemata are not
replaced by better individuals that represent incompatible schemata, they will
be preserved and may eventually be able to propagate their schemata on to
their offspring. Unfortunately, this proviso is not likely to be met by any but
low-order deceptive problems. Even for deceptive problems of order five, the
difficulty of propagating optimal schemata is such that the suboptimal schemata
tend to crowd out the optimum ones.
Perhaps the ultimate GA for tackling deceptive problems is Goldbergs
messy GA (mGA) (Goldberg et a1 1991). Whereas in more traditional GAS
the manipulation of building blocks is implicit, mGAs explicitly manipulate
the building blocks. This is accomplished by using variable-length strings that
may be underspecified or overspecified; that is, some bit positions may not be
defined, and some positions may have conflicting specifications. This is what
makes mGAs messy.
These strings constitute the building blocks. They consist of a set of
position-value pairs. Overspecified strings are evaluated by a simple conflict
resolution strategy such as first-come-first-served rules. Thus, (( 1 0) (2 1) ( 1 1 )
( 3 0)) would be interpreted as 010, ignoring the third pair, since the first position
has already been defined. Underspecified strings are interpreted by filling in the
missing values using a competitive template, a locally optimal structure. For
example, if the locally optimal structure, found by testing one bit at a time, is
11 I , then the string (( 1 0) ( 3 0)) would be interpreted by filling in the value
for the (missing) second position with the value of the second position in the
template. The resulting 010 string would then be evaluated.
mGAs have an outer and anTEAM
innerLRN
loop. The inner loop consists of

Genetic algorithms

three phases: the initialization, primordial, and juxtaposition phases. In the

initialization phase all substrings of length k are created and evaluated, i.e. all
combinations of strings with k defining positions (where k is an estimate of the
highest order of deception in the problem). As was explained above the missing
values are filled in using the competitive template. (As will be explained below,
the template for the k level of the outer loop is the solution found at the k - 1
level.)
In the primordial phase, selection is applied to the population of individuals
produced during the initialization phase without any operators. Thus the
substrings that have poor evaluations are eliminated and those with good
evaluations have multiple copies in the resulting population.
In the juxtapositional phase selection in conjunction with cut and splice
operators is used to evolve improved variations. Again, the competitive template
is used for tilling in missing values, and the first-come-first-served rule is used
for handling overspecified strings created by the splice operator. The cut and
splice operators act much like one-point crossover in a traditional GA, keeping
in mind that the strings are of variable length and may be underspecified or
overspeci fied.
The outer loop is over levels. It starts at the level of k = I , and continues
through each level until it reaches a user-specified stopping criterion. At each
level, the solution found at the previous level is used as the competitive template.
One of the limitations of mGAs as originally conceived is that the
initialization phase becomes extremely expensive as the mGA progresses up
the levels. A new variant of the mGA speeds up the process by eliminating the
need to process all the variants in the initialization stage (Goldberg rt cil 1993).
The initialization and primordial phases of the original mGA are replaced by a
'probabilistically complete initialization' procedure. This procedure is divided
into several steps. During the first step strings of nearly length L are evaluated
(using the template to f i l l in the missing values). Then selection is applied to
these strings without any operators (much as was done in the primordial phase
of the original mGA, but for only a few generations). Then the algorithm enters
a filtering step where some of the genes in the strings are deleted. and the
shortened strings are evaluated using the competitive template. Then selection
is applied again. This process is repeated until the resulting strings are of
length k . Then the mGA goes into the juxtaposition stage like the original
mGA. By replacing the original initialization and primordial stages with stepwise
filtering and selection. the number of evaluations required is drastically reduced
for problems of significant size. (Goldberg ct cil (1993) provide analytical
methods for determining the population and filtering reduction constants.) This
new version of the mGA is very effective at solving loosely linked deceptive
problems, i.e. those problems where the defining positions of the deceptive
segments are spread out along the bitstring.
mGAs were designed to operate according to the SBBH, and deceptive
LRN where being able to manipulate
problems illustrate that there are TEAM
problems

Representation

building blocks can provide an added value over CCV. It still is an open
question, however, as to how representative deceptive problems are of the types
of real-world problem that GAS might encounter. No doubt, many difficult
real-world problems have deceptive or misleading elements in them. If they did
not, they could be easily solved by local search methods. However it does not
necessarily follow that such problems can be solved by a GA that is good at
solving deceptive problems. The SBBH assumes that the misleading building
blocks will exist in the initial population, that they can be identified early in the
search before they are lost, and that the problem can be solved incrementally
by combining these building blocks, but perhaps the building blocks that have
misleading alternatives have little meaning until late in the search and so cannot
be expected to survive in the population.
Even if the SBBH turns out not to be as useful an hypothesis as originally
supposed, the increased propagation capabilities of pairwise mating may give a
GA (using pairwise mating) an advantage over a poolwise CCV algorithm. To
see why this is the case it is useful to define the prototypical individual for a
given population: for each locus we assign a one or a zero depending upon which
value is most frequent in the population (randomly assigning a value if they are
equally frequent). Suppose the population contains some maverick individual
that is quite far from the prototypical individual although it is near the optimum
(as measured by Hamming distance) but is of only average fitness. Since an
algorithm using a poolwise method of producing offspring will tend to produce
individuals that are near the prototypical individual, such an algorithm is unlikely
to explore the region around the maverick individual. On the other hand, a GA
using pairwise mating is more likely to explore the region around the maverick
individual, and so more likely to discover the optimum. Ironically, pairwise
mating is, in this respect, more mutation-like than poolwise mating. While
pairwise mating retains the benefits of CCV, it less subject to the majoritarian
tendencies of poolwise mating.

8.4 Representation
Although GAS typically use a bitstring representation, GAS are not restricted
to bitstrings. A number of early proponents of GAS developed GAS that use
other representations, such as real-valued parameters (Davis 1991 , Janikow
and Michalewicz I99 1, Wright 1991; see Chapter I6), permutations (Davis
1985, Goldberg and Lingle 1985, Grefenstette et a1 1985; see Chapter 17), and
treelike hierarchies (Antonisse and Keller 1987; see Chapter 19). Kozas genetic
programming (GP) paradigm (Koza 1992; see Chapter I I ) is a GA-based method
for evolving programs, where the data structures are LISP S-expressions, and
crossover creates new LISP S-expressions (offspring) by exchanging subtrees
TEAM LRN
from the two parents.

Genetic algorithms

In the case of combinatorial problems such as the traveling salesman problem

(TSP), a number of order-based or sequencing crossover operators have been
proposed. The choice of operator will depend upon ones goal. If the goal is to
solve a TSP, then preserving adjacency information will be the priority, which
suggests a crossover operator that operates on common edges (links between
cities shared by the two parents) (Whitley er a1 1989). On the other hand,
if the goal is to solve a scheduling problem, then preserving relative order is
likely to be the priority, which suggests an order preserving crossover operator.
Syswerdas order crossover operator (Syswerda I99 1 ), for example, chooses
several positions at random in the first parent, and then produces a child so that
the relative order of the chosen elements in the first parent is imposed upon the
second parent.
Even if binary strings are used, there is still a choice to be made as to
which binary coding scheme to use for numerical parameters. Empirical studies
have usually found that Gray code is superior to the standard power-of-two
binary coding (Caruana and Schaffer 1988), at least for the commonly used
test problems. One reason is that the latter introduces Hamming cliffs-two
numerically adjacent values may have bit representations that are many bits apart
(up to L - I ). This will be a problem if there is some degree of gradualness in the
function, i.e. small changes in the variables usually correspond to small changes
in the function. This is often the case for functions with numeric parameters.
As an example, consider a five-bit parameter, with a range from 0 to 3 1 . If
it is encoded using the standard binary coding, then 15 is encoded as 01 1 1 I ,
whereas 16 is encoded as 10000. In order to move from 15 to 16, all five
bits need to be changed. On the other hand, using Gray coding, 15 would be
represented as 01000 and 16 as I 1000, differing only by 1 bit.
When choosing an alternative representation, it is critical that a crossover
operator be chosen that is appropriate for the representation. For example, if
real-valued parameters are used, then a possible crossover operator is one that
for each parameter uses the parameter values of the two parents to define an
interval from which a new parameter is chosen (Eshelman and Schaffer 1993).
As the GA makes progress it will narrow the range over which it searches for
new parameter values.
If, for the chosen representation and crossover operator, the building blocks
are unlikely to be instantiated independently of each other in the population,
then a GA may not be appropriate. This problem has plagued finding crossover
operators that are good for solving TSPs. The natural building blocks, it would
seem, are subtours. However, what counts as a good subtour will almost
always depend upon what the other subtours are. In other words, two good,
but suboptimal solutions to a TSP may not have many subtours (other than
very short ones) that are compatible with each other so that they can be spliced
together to form a better solution. This hurdle is not unique to combinatorial
problems.
TEAM LRN
Given the importance of the representation,
a number of researches have

Parallel genetic algorithms

suggested methods for allowing the GA to adapt its own coding. We noted
earlier that Holland proposed the inversion operator for rearranging the loci
in the string. Another approach to adapting the representation is Shaefer's
ARGOT system (Shaefer 1987). ARGOT contains an explicit parameterized
representation of the mappings from bitstrings to real numbers and heuristics
for triggering increases and decreases in resolution and for shifts in the ranges
of these mappings. A similar idea is employed by Schraudolph and Belew
( 1 992) who provide a heuristic for increasing the resolution triggered when the
population begins to converge. Mathias and Whitley ( 1994) have proposed
what they call delta coding. When the population converges, the numeric
representation is remapped so that the parameter ranges are centered around
the best value found so far, and the algorithm is restarted. There are also
heuristics for narrowing or extending the range.
There are also GAS with mechanisms for dynamically adapting the rate
at which CA operators are used or which operator is used. Davis, who has
developed a number of nontraditional operators, proposed a mechanism for
adapting the rate at which these operators are applied based on the past success
of these operators during a run of the algorithm (Davis 1987).

8.5 Parallel genetic algorithms

All evolutionary algorithms, because they maintain a population of solutions,
are naturally parallelizable. However, because GAS use crossover, which is a
way of sharing information, there are two other variations that are unique to
GAS (Cordon and Whitley 1993). The first, most straightforward, method is
to simply have one global population with multiple processors for evaluating
individual solutions. The second method, often referred to as the island
model (alternatively, the migration or coarse-grain model), maintains separate
subpopulations. Selection and crossover take place in each subpopulation in
isolation from the other subpopulations. Every so often an individual from one
of the subpopulations is allowed to migrate to another subpopulation. This way
information is shared among subpopulations.
The third method, often referred to as the neighborhood model (alternatively,
the diffusion or fine-grain model), maintains overlapping neighborhoods. The
neighborhood for which selection (for reproduction and replacement) applies is
restricted to a region local to each individual. What counts as a neighborhood
will depend upon the neighborhood topology used. For example, if the
population is arranged upon some type of spherical structure, individuals might
be allowed to mate with (and forced to compete with) neighbors within a certain
radius.
TEAM LRN

Genetic algorithms

8.6 Conclusion
Although the above discussion has been in the context of GAS as potential
function optimizers, it should be pointed out that Holland's initial GA work was
in the broader context of exploring GAS as adaptive systems (De Jong 1993).
GAS were designed to be a simulation of evolution, not to solve problems. Of
course, evolution has come up with some wonderful designs, but one must not
lose sight of the fact that evolution is an opportunistic process operating in an
environment that is continuously changing. Simon has described evolution as
a process of searching where there is no goal (Simon 1983). This is not to
question the usefulness of GAS as function optimizers, but only to emphasize
that the perspective of function optimization is somewhat different from that of
adaptation, and that the requirements of the corresponding algorithms will be
somewhat different.

References
Antonisse H J and Keller K S 1987 Genetic operators for high-level knowledge
representations Pro(.. 2nd ltit. Coiij; on Genetic Algorithms (Cunzhriclge,MA, 1987)
cd J J Grefenstette (Hillsdale, NJ: Erbaum) pp 69-76
Back T. Hoffmeister F and Schwefel H 1991 A survey of evolution strategies Pro(..
4th lnt. Coi$ on Genetic A1goritliriz.s (San Diego, CA, 1991) ed R K Belew and
L B Booker (San Mateo, CA: Morgan Kaufmann) pp 2-9
Baluja S 1995 A n Empiricd Cornparison of Se~*enItt#rutiL'earid Evolictionan Function
Optinziztion Hei4ristic.s Carnegie Mellon University School of Computer Science
Technical Report CMU-CS-95- I93
Caruana R A and Schafler J D 1988 Representation and hidden bias: Gray vs. binary
coding for genetic algorithms Proc. 5th ltzt. Conj: oii Machiize Leuniing (San Mateo,
CA: Morgan Kaufmann) pp 153-61
Davis L 1985 Applying adaptive algorithms to epistatic domains Pruc. I n t . Joirit
Coifiretice o i i Artijcicil ltitelligetice pp 162-4
__ 1987 Adaptive operator probabilities in genetic algorithms Proc. 3rd lnt. Conj: o t i
Genetic. Algorithms (F(iirjiix, VA, I989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 61-9
-199 I Hybridization and numerical representation The HuncIbook of Genetic
Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 61-71
De Jong K I975 A n Ancilyis of the Behuviar of N Clu.s.s cf Genetic Aduptiiv Swtetns
Doctoral Thesis, Department of Computer and Communication Sciences, University
of Michigan
-1993 Genetic algorithms are not function optimizers Foutzdutions of Genetic
,4Igorithtrz.r 2 ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 5-17
De Jong K and Sarma J 1993 Generation gaps revisited Founddons of Genetic
Algorithms 2 ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 19-28
Eshelman L J I991 The CHC adaptive search algorithm: how to have safe search
when engaging in nontraditional genetic recombination Foundutions oJ' Genetic
TEAM LRN
Algorithnzs ed G J E Rawlins (San Mateo, CA: Morgan Kaufmann) pp 265-83

References

Eshelman L J and Schaffer J D 1993 Real-coded genetic algorithm4 and interval schemata
Fourzdatioiis of Genetic Algorithnzs 2 ed D Whitlcy (San Matco. CA: Morgan
Kaufmann) pp 187-202
-I
995 Productive recombination and propagating and pre4ening xhemata
Foitiickltioiis of Genetic Algorithms 3 ed D Whitley (San Mateo, CA: Morgan
Kaufmann) pp 299-3 13
Goldberg D E 1987 Simple genetic algorithms and the minimal, deceptive problcm
Genetic Algorithms (xiid Sirnitlated Aniieciling cd L Datri4 (San Mateo, CA: Morgan
Kaufmann) pp 74-88
-I989 Genetic Algorithins in Seurch, Optiinixition, tiiirl M(ichine kririziiig (Reading,
MA: Addiwn-Wc4ey)
Goldberg D E and Deb K 1991 A comparative analysi\ of \election wherne9 used in
genetic algorithms Foitizdiitions c~ Genetic. Algoritliiii, ed G J E Rawlins (San
Mateo, CA: Morgan Kaufmann) pp 69-93
Goldberg D E, Deb K, Kargupta H and Harik G 1993 Rapid, accurate optimiration
of difficult problems using fast messy genetic algorithms Proc. -5th I i i t . Cont. oii
Genetic Algoritlinis (Urbaiza-Chuinpai~n,IL, 1993) cd S Forrest (San Matco. CA:
Morgan Kaufmann) pp 56-64
Goldberg D E, Deb K and Korb B 1991 Dont worry, be mc\sy Proc. 4th I i i t . Coiif. o i i
Genetic. Algorithms ( S m Diego, CA, 1991) ed R K Bclew and L B Booher (San
Mateo, CA: Morgan Kaufmann) pp 24-30
Goldberg D E and Lingle R L 1985 Alleles, loci, and the traveling salesman problcm
Proc. 1st Int. Car$ 011 Cerietic Algorithms (Pittshur~qh,PA, 1085) cd J J Grefcmtette
(Hillsdale, NJ: Erbaum) pp 154-9
Cordon V S and Whitley 1993 Serial and parallel genetic algorithm\ and function
optimizers Proc. 5th Int. Conf on Genetic Algorithm 5 ( Ur~~rriici-CIi~ni~~trigrl.
IL.
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 177-83
Grefenstette J J 1 993 Deception considered harmful FoitiickItioiis of Geiwtic AlCqoritlims
2 ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 75-91
Grefenstette J J, Gopal R, Roqmaita B J and Van Gucht D 1985 Genetic algorithm4 for thc
traveling salesman problem Proc. 1st Itit. Co@ oii Griwtic Al~qorithins( PittshicrCqh,
PA, 1985) ed J J Grefenstette (Hillsdale, NJ: Erbaum) pp 160-8
Holland J H 1975 Aduptatiori iri Natiirctl and Artijcicil Sl-steim (Ann Arbor, MI:
University of Michigan Press)
Janikow C 2 and Michalewicz Z 1991 An experimental comparison of binary and
floating point representations in genetic algorithms Proc. 4th Int. Coi!f: on Geiietic
Algorithms (Sciiz Diego, CA, 1991) ed R K Belew and L B Booker (San Mateo. CA:
Morgan Kaufmann) pp 3 1-6
Koza J I 992 Geizetic Prograinnzing: on the Progrciininiiig of Cotnpicters I n 7 M t w i i s of
Naturd Selection mid Genetics (Cambridge, MA: MIT Pres5)
Liepins G E and Vose M D 1991 Representational issues in genetic optimimtion J. E i p .
Tlieor. AI 2 101-15
Mathias K E and Whitley L D 1994 Changing representations during \earth: a
comparative study of delta coding E\~olutioiiur;~~
Conipirt. 2
Miihlenbein H and Schlierkamp-Voosen I993 The scicncc of brceding and its application
TEAM LRN Conipict. 1
to the breeder genetic algorithm Evolictiorzan

Genetic algorithms

Radcliffe N J I991 Forma analysis and random respectful recombination Proc.. 4th / t i t .
Corzj: ot1 Gerietic Algoritht?i.s(Sun Diego, CA, 1991) ed R K Belew and L B Booker
(San Mateo, CA: Morgan Kaufmann) pp 222-9
Schaffer J D, Eshelman L J and Offutt D 1991 Spurious correlations and premature
convergence in genetic algorithms Foutidations c.$ Genetic Algorithms ed
G J E Rawlins (San Mateo, CA: Morgan Kaufmann) pp 102-12
Schraudolph N N and Belew R K 1992 Dynamic parameter encoding for genetic
algorithms Mwhitie Learning 9 9-2 1
Shaefer C G I987 The ARGOT strategy: adaptive representation genetic optimizer
technique Genetic- Algorithnis cind Their Applictitiot2.s: Proc. 2nd Itit. Cotif: oti
Genetic Algorithms { Cmihridge, MA, I987) ed J J Grefenstette (Hillsdale, NJ:
Erlbaum) pp 50-8
Simon H A 1983 Rectson in Hitmm Aflciirs (Stanford, CA: Stanford University Press)
Spears W M and De Jong K A 1991 On the virtues of parameterized uniform crossover
Proc. 4th Itit. Cot$ on Genetic Algorithms (San Diego, CA, 1991) ed R K Belew
and L B Booker (San Mateo, CA: Morgan Kaufmann) pp 230-6
Syswerda G 1989 Uniform crossover in genetic algorithms Proc. 3rd Int. Cot$ 011
Generic A1gorirhni.s (Elctitfkr. VA, 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 2-9
-199I Schedule optimization using genetic algorithms Hutidhook oj Genetic
Algoritlrt~ised L Davis (New York: Van Nostrand Reinhold) pp 3 3 2 4 9
-I993 Simulated crossover in genetic algorithms Foittid~ition,sof Genetic Algorithms
2 ed D Whitley (San Mateo. CA: Morgan Kaufniann) pp 239-55
Whitley D I989 The GENITOR algorithm and selection pressure: why rank-based
allocation of reproductive trials is best Proc. 3rd Itit. Cot$ on Genetic Algorithm.s
(F(iitj+&r.VA, 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 1 16-2 1
Whitley D, Starkweather T and Fuquay D 1989 Scheduling problems and traveling
salesmen: the genetic edge recombination operator Pro(.. 3rd Itit. Conf: oti
Genetic Algorithnis (FuitjL, VA, 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 116-21
Wright A 1991 Genetic algorithms for real parameter optimitation Foiitirf&ms ~ f G e t i e t i ~
Al,gorithms ed G J E Rawlins (San Mateo, CA: Morgan Kaufmann) pp 205-18

TEAM LRN

Evolution strategies
Giinter Rudolph

9.1 The archetype of evolution strategies

Minimizing the total drag of three-dimensional slender bodies in a turbulent
flow was, and still is, a general goal of research in institutes of hydrodynamics.
Three students (Peter Bienert, Ingo Rechenberg, and Hans-Paul Schwefel)
met each other at such an institute, the Hermann Fottinger Institute of the
Technical University of Berlin, in 1964. Since they were fascinated not only
by aerodynamics, but also by cybernetics, they hit upon the idea to solve the
analytically (and at that time also numerically) intractable form design problem
with the help of some kind of robot. The robot should perform the necessary
experiments by iteratively manipulating a flexible model positioned at the outlet
of a wind tunnel. An experimenturn crucis was set up with a two-dimensional
foldable plate. The iterative search strategy-first performed by hand, a robot
was developed later on by Peter Bienert-was expected to end up with a flat
plate: the form with minimal drag. But it did not, since a one-variable-at-a-time
as well as a discrete gradient-type strategy always got stuck in a local minimum:
an S-shaped folding of the plate. Switching to small random changes that were
idea of Ingo Rechenbergonly accepted in the case of improvements-an
brought the breakthrough, which was reported at the joint annual meeting of
WGLR and DGRR in Berlin, 1964 (Rechenberg 1965). The interpretation of
binomially distributed changes as mutations and of the decision to step back or
not as selection (on 12 June 1964) was the seed for all further developments
leading to evolution strategies (ESs) as they are known today. So much about
the birth of the ES.
It should be mentioned that the domain of the decision variables was not fixed
or even restricted to real variables at that time. For example, the experimental
optimization of the shape of a supersonic two-phase nozzle by means of
mutation and selection required discrete variables and mutations (Klockgether
and Schwefel 1970) whereas first numerical experiments with the early ES on
a Zuse 2 23 computer (Schwefel 1965) employed discrete mutations of real
variables. The apparent fixation of ESs to Euclidean search spaces nowadays
TEAM LRN

Evolution strategies

is probably due to the fact that Rechenberg (1973) succeeded in analyzing the
simple version in Euclidean space with continuous mutation for several test
problems.
Within this setting the archetype of ESs takes the following form. An
individual a consisting of an element X E R is mutated by adding a normally
distributed random vector 2 N ( 0 , I t , ) that is multiplied by a scalar (T > 0 ( I t l
denotes the unit matrix with rank n ) . The new point is accepted if it is better than
or equal to the old one, otherwise the old point passes to the next iteration. The
selection decision is based on a simple comparison of the objective function
values of the old and the new point. Assuming that the objective function
f : R + R is to be minimized, the simple ES, starting at some point Xo E R,
is determined by the following iterative scheme:

where t E NJOdenotes the iteration counter and where (2, : t 3 0) is a sequence

of independent and identically distributed standard normal random vectors.
The general algorithmic scheme (9.1) was riot a novelty: Schwefel (1995,
pp 94-5), presents a survey of forerunners and related versions of (9.1) since
the late 1950s. Most methods differed in the mechanism of adjusting the
parameter U , , that is used to control the strength of the mutations (i.e. the
length of the mutation steps in n-dimensional space). Rechenbergs solution to
control parameter 0,is known as the 1 / 5 success rule: Increase 0,if the relative
frequency of successful mutations over some period in the past is larger than
1/5, otherwise decrease U , . Schwefel (1995, p I 12), proposed the following
implementation. Let t E N be the generation (or mutation) counter and assume
that t 2 Ion.
If t mod vz = 0 then determine the number s of successful mutations that
have occurred during the steps t - Ion to t - 1 .
(ii) If s < 2rz then multiply the step lengths by the factor 0.85.
( i i i ) If s > 2n then divide the step lengths by the factor 0.85.

(i)

First ideas to extend the simple ES (9.1) can be found in the book by Rechenberg
(1973, pp 78-86). The population consists of / A > I parents. Two parents are
selected at random and recombined by multipoint crossover and the resulting
individual is finally mutated. The offspring is added to the population. The
selection operation chooses the I-( best individuals out of the I-(
1 in total to
serve as parents of the next iteration. Since the search space was binary, this ES
was exactly the same evolutionary algorithm as became known later under the
term steady-state genetic algorithm (Section 28. I ) . The usage of this algorithmic
scheme for Euclidean search spaces poses the problem of how to control the
step length control parameter 0,.Therefore, the steady-state ES is no longer
TEAM LRN
in use.

Contemporary evolution strategies

9.2 Contemporary evolution strategies

The general algorithmic frame of contemporary ESs is easily presented by the
symbolic notation introduced by Schwefel ( 1 977). The abbreviation ( p A) ES
denotes an ES that generates h offspring from p parents and selects the p best
individuals from the p h individuals (parents and offspring) in total. This
notation can be used to express the simple ES by ( I
I ) ES and the steadystate ES by ( p 1) ES. Since the latter is not in use i t is convention that the
abbreviation ( p A) ES always refers to an ES parametrized according to the
relation I 5 p 5 h < 30.
The abbreviation ( p , A) ES denotes an ES that generates h offspring from
p parents but selects the p best individuals only from the h offspring. As a
consequence, h must be necessarily at least as large as p . However, since the
parameter setting p = h represents nothing more than a random walk, i t is
convention that the abbreviation ( p ,A) ES always refers to an ES parametrized
according to the relation 1 5 p < h < 00.
Apart from the population concept contemporary ESs differ from early ESs in
that an individual consists of an element x E R of the search space plus several
individual parameters con trolling the individual mu tation distribution. Usual I y,
mutations are distributed according to a multivariate normal distribution with
zero mean and some covariance matrix C that is symmetric and positive definite.
Unless matrix C is a diagonal matrix, the mutations in each coordinate direction
are correlated (Schwefel 1995, p 240). It was shown in Rudolph ( 1992) that a
matrix is symmetric and positive definite if and only if i t is decomposable via
C = (ST)ST where S is a diagonal matrix with positive diagonal entries and

+
+

(9.2)
r=l

]=/+I

is an orthogonal matrix built by a product of 11 ( n - 1)/2 elementary rotation

matrices R,, with angles wx E (0, 2 n 3. An elementary rotation matrix R , , ( w )
is a unit matrix where four specific entries are replaced by r r r = rJJ = cosw
and r r I = - r j j = - sinw.
As a consequence, n ( n - 1)/2 angles and ri scaling parameters are sufficient
to generate arbitrary correlated normal random vectors with zero mean and
covariance matrix C = (ST)ST via 2 = TSN, where N is a standard normal
random vector (since matrix multiplication is associative, random vector 2 can
be created in O ( n 2 )time by multiplication from right to left).
There remains, however, the question of how to choose and adjust these
individual strategy parameters. The idea that a population-based ES could be
able to adapt a, individually by including these parameters in the mutationselection process came up early (Rechenberg 1973, pp 132-7). Although first
experiments with the ( p 1 ) ES provided evidence that this approach works in
principle, the first really successful implementation of the idea of self-adaptation
was presented by Schwefel (1977)TEAM
and LRN
it is based on the observation that a

Evolution strategies

surplus of offspring (i.e. A > p ) is a good advice to establish self-adaptation of

individual parameters.
To start with a simple case let C = a I,,. Thus, the only parameter to be
self-adapted for each individual is the step length control parameter a . For
this purpose let the the genome of each individual be represented by the tuple
( X ,0 ) E R x R+,that undergoes the genetic operators. Now mutation is a
two-stage operation:

where t = 1z-I and Z, is a standard normal random variable whereas 2 is

a standard normal random vector. This scheme can be extended to the general
case with iz (U 1)/2 parameters.

(i)

Let w E (0, 2nj- denote the angles that are necessary to build the
orthogonal rotation matrix T(w) via (9.2). The mutated angles o:+)~are
obtained by
( 1 )
(1)
= (w,
q Z : i ) mod 2n

where cp > 0 and the independent random numbers Z:;) with i =

1, . . . , 12 ( n - 1)/2 are standard normally distributed.
(ii) Let a E IQ denote the standard deviations that are represented by
the diagonal matrix S(0) = diag(a, . . . , oll)). The mutated standard
deviations are obtained as follows. Draw a standard normally distributed
random number Z,. For each i = 1 , . . . , I I set

where ( T . 1 1 ) E R$ and the independent random numbers Z: are standard

normally distributed. Note that 2, is drawn only once.
( i i i ) Let X E R be the object variables and 2 be a standard normal random
vector. The mutated object variable vector is given by

According to Schwefel (1995) a good heuristic for the choice of the constants
appearing in the above mutation operation is

but recent extensive simulation studies (Kursawe 1996) revealed that the above
recommendation is not the best choice-especially in the case of multimodal
objective functions i t seems to be better to use weak selection pressure ( p / A not
too small) and a parametrization obeying the relation r > 11. As a consequence,
a final recommendation cannot be TEAM
given LRN
here, yet.

Contemporary evolution strategies

As soon as ,U > 1, the decision variables as well as the internal strategy

parameters can be recombined with usual recombination operators. Notice
that there is no reason to employ the same recombination operator for the
angles, standard deviations, and object variables. For example, one could
apply intermediate recombination (Chapter 33) to the angles as well as standard
deviations and uniform crossover to the object variables. With this choice
recombination of two parents works as follows. Choose two parents (X,U . w )
and (X,
U , 0)
at random. Then the preliminary offspring resulting from the
recombination process is

+ O (O+ w ) mod 4n
ux + (I - U)X,
2
2
0

where I is the unit matrix and U is a random diagonal matrix whose diagonal
entries are either zero or one with the same probability. Note that the angles
must be adjusted to the interval ( 0 , 2 n ] .
After these preparations a sketch of a contemporary ES can be presented:
Generate ,U initial parents of the type (X,0 , w ) and determine their objective
function values f ( X ) .
repeat
do A times:
Choose p 2 2 parents at random.
Recombine their angles, standard deviations, and object variables.
Mutate the angles, standard deviations, and object variables of the
preliminary offspring obtained via recombination.
Determine the offsprings objective function value.
Put the offspring into the offspring population.
end do
Select the ,U best individuals either from the offspring population
or from the union of the parent and offspring population.
The selected individuals represent the new parents.
until some stopping criterion is satisfied.
It should be noted that there are other proposals to adapt u l . In the case
of a ( I , A ) ES with A = 3 k and k E N, Rechenberg (1994, p 47) devised
the following rule: Generate k offspring with U ~ k, offspring with C O , and k
offspring with U , / C for some c > 0 ( c = 1.3 is recommended for I I 5 100, for
larger n the constant c should decrease).
Further proposals, that are however still in an experimental state, try
to derandomize the adaptation process by exploiting information gathered in
preceding iterations (Ostermeier et al 1995). This approach is related to
(deterministic) variable metric (or quasi-Newton) methods, where the Hessian
matrix is approximated iteratively by certain update rules. The inverse of the
Hessian matrix is in fact the optimal choice for the covariance matrix C. A
TEAM
large variety of update rules is given
by LRN
the Oren-Luonherger c-lnss (Oren and

Evolution strategies

Luenberger 1974) and it might be useful to construct probabilistic versions of

these update rules, but it should be kept in mind that ESs are designed to tackle
difficult nonconvex problems and not convex ones: the usage of such techniques
increases the risk that ESs will be attracted by local optima.
Other ideas that have not yet achieved a standard include the introduction
of an additional age parameter K for individuals in order to have intermediate
forms of selection between the ( p A ) ES with K = cc and the ( p , A ) ES
with K = 1 (Schwefel and Rudolph 1995), as well as the huge variety of ESs
whose population possesses a spatial structure. Since the latter is important for
parallel implementations and applies to other evolutionary algorithms as well
the description is omitted here.

9.3 Nested evolution strategies

The shorthand notation ( p f A ) ES was extended by Rechenberg (1978) to the
expression
[ p t A( p t A ) y 17 ES
with the following meaning. There are p populations of p parents. These are
used to generate (e.g. by merging) A initial populations of p individuals each.
For each of these A populations a ( p t A ) ES is run for y generations. The
criterion to rank the A populations after termination might be the average fitness
of the individuals in each population. This scheme is repeated y times. The
obvious generalization to higher levels of nesting is described by Rechenberg
(1994), where it is also attempted to develop a shorthand notation to specify the
parametrization completely.
This nesting technique is of course not limited to ESs: other evolutionary
algorithms and even mixtures of them can be used instead. In fact, the somewhat
artificial distinction between ESs, genetic algorithms, and evolutionary programs
becomes more and more blurred when higher concepts enter the scene. Finally,
some fields of application of nested evolutionary algorithms will be described
briefly.

9.3. I

Alterriati\ie method to corztrol internal parameters

Herdy (1992) used A subpopulations, each of them possessing its own different
and fixed step size 0 . Thus, there is no step size control at the level of
individuals. After y generations the improvements (in terms of fitness) achieved
by each subpopulation is compared to each other and the best p subpopulations
are selected. Then the process repeats with slightly modified values of 0.Since
subpopulations with a near-optimal step size will achieve larger improvements,
they will be selected (i.e. better step sizes will survive), resulting in an alternative
method to control the step size. TEAM LRN

References

9.3.2 Mixed-integer optimization

Lohmann ( 1992) considered optimization problems in which the decision
variables are partially discrete and partially continuous. The nested approach
worked as follows. The ESs in the inner loop were optimizing over the
continuous variables while the discrete variables were held fixed. After
termination of the inner loop, the evolutionary algorithm in the outer loop
compared the fitness values achieved in the subpopulations, selected the best
ones, mutated the discrete variables and passed them as fixed parameters to the
subpopulations in the inner loop.
It should be noted that this approach to mixed-integer optimization may
cause some problems. In essence, a GauB-Seidel-like optimization strategy is
realized, because the search alternates between the subspace of discrete variables
and the subspace of continuous variables. Such a strategy must fail whenever
simultaneous changes in discrete and continuous variables are necessary to
achieve further improvements.
9.3.3 Minimax optimization
Sebald and Schlenzig ( 1 994) used nested optimization to tackle minimax
problems of the type
min{max{f(x, y ) ) }
XX

where X E R" and Y

follows:
min{g(x) : x E X }

.EY

R"'. Equivalently, one may state the problem as

where

g(x) = max{f(x. y ) : y E Y ) .

The evolutionary algorithm in the inner loop maximizes f ( x . y ) with fixed

parameters x , while the outer loop is responsible for minimize g(.u)ing over the
set X .
Other applications of this technique are imaginable. An additional aspect
touches the evident degree of independence of executing the evolutionary
algorithms in the inner loop. As a consequence, nested evolutionary algorithms
are well suited for parallel computers.

References
Herdy M 1992 Reproductive isolation as strategy parameter in hierachically organized
evolution strategies Parallel Problem Solving from Nutirrt?, 2 ( P m c . 2nd lrrt. Conf
on Parallel Prohlrrn Sohling from Nature, Brus.sel,s, 1992) ed R Manner and B
LRN7
Manderick (Amsterdam: Elsevier)TEAM
pp 207-1

Evolution strategies

Klockgether J and Schwefel H-P I970 Two-phase nozzle and hollow core jet experiments
Pro(.. I l t h Synip. on Etigineering Aspect\ of Mrigrietoh?.drod?.nclrrtic.Jed D Elliott
(Pasadena, CA: California lnstitute of Technology) pp 141-8
Kursawe F 1996 Breeding evolution strategies-first results, talk presented at Dagstuhl
lectures Applicutioiis o j E\dittioiiup Algorithnt Y (March 1996)
Lohmann R I992 Structure evolution and incomplete induction Purullel Problem Sol\-ing
frotti Nuture, 2 (Pro(*.2nd Int. Conj: on Parullel Problem Sohirig from Nuture,
Brusseis, l Y Y 2 ) ed R Manner and B Manderick (Amsterdam: Elsevier) pp 175-85
Oren S and Luenberger D 1974 Self scaling variable metric (SSVM) algorithms. Part 11:
criteria and wfficient conditions for scaling a class of algorithms Muntigemettt SCI.
20 845-62
O\termeier A. GawelcLyk A and Hansen N 1995 A derandomked approach to \elfadaptation of evolution strategies Evolut. Conzpict. 2 369-80
Rechenberg I I965 Cybernetic solution path of un expeririientcil problem Library
Translation 1 122, Royal Aircraft Establishment, Farnborough, UK
-1913 E~~olittions.ctrcitegie:Optiniierurig techni,wher Systeme nuch Principien der
hiologi.\ (*henE\dution (Stuttgart: Frommann- Hol~boog
)
-1978 Evolutionsstrategien Siti~ulatiori.~riictl~oderi
in dcr Medicin und Biologic ed
B Schneider and U Ranft (Berlin: Springer) pp 83-1 14
-1994 E~v)lirtioii.\.\trtite~ie
Y4 (Stuttgart: Frommann-Holzboog)
Rudolph G 1992 On correlated mutations in evolution strategies Purullel Prohlerzi Soliirig
jrom N~iture,2 (Pro(.. 2nd lnt. Car$ o)i Pm-ullel Problem Solving from Nutirrr,
Bnr.\wls, 1992) ed R Manner and B Manderich (Amsterdam: Elsevier) pp 105-14
Schwefel H-P I965 Kyherrietische Elwlutioii cils Strutegie der e.~~~eritiir,itelle~i
Forsc-hung
in der Striit~iiing.\trc.hnik Diplomarbeit, Technical University of Berlin
-1977 Nunierische Optiniierurig \or1 Compirter-Modrllrn niittel\ der E\~olirtion.s,,triitegie (Base!: Birkhauser)
-1995 E\dirtion untl Optitnunz Seeking (New York: Wiley )
Schbefel H-P and Rudolph G 1995 Contemporary evolution strategies Ad\wic*e\ in
Artifkicil Ljfe ed F Morana et cil (Berlin: Springer) pp 893-907
Sebald A V and SchlenLig J 1994 Minimax design of neural net controllers for highly
uncertain plants IEEE Trtins. Neirrul Netirw-ks - 5 73-82

TEAM LRN

10
Evolutionary programming
V William Porto

10.1 Introduction
Evolutionary programming (EP) is one of a class of paradigms for simulating
evolution which utilizes the concepts of Darwinian evolution to iteratively
generate increasingly appropriate solutions (organisms) in light of a static or
dynamically changing environment. This is in sharp contrast to earlier research
into artificial intelligence research which largely centered on the search for
simple heuristics. Instead of developing a (potentially) complex set of rules
which were derived from human experts, EP evolves a set of solutions which
exhibit optimal behavior with regard to an environment and desired payoff
function. In a most general framework, EP may be considered an optimization
technique wherein the algorithm iteratively optimizes behaviors, parameters, or
other constructs. As in all optimization algorithms, it is important to note that
the point of optimality is completely independent of the search algorithm, and
is solely determined by the adaptive topography (i.e. response surface) (Atmar
1992).
In its standard form, the basic evolutionary program utilizes the four main
components of all evolutionary computation (EC) algorithms: initialization,
variation, evaluation (scoring), and selection. At the basis of this, as well as
other EC algorithms, is the presumption that, at least in a statistical sense,
learning is encoded phylogenically versus ontologically in each member of
the population. Learning is a byproduct of the evolutionary process as
successful individuals are retained through stochastic trial and error. Variation
(e.g. mutation) provides the means for moving solutions around on the search
space, preventing entrapment in local minima. The evaluation function directly
measures fitness, or equivalently the behavioral error, of each member in the
population with regard to the environment. Finally, the selection process
probabilistically culls suboptimal solutions from the population, providing an
efficient method for searching the topography.
The basic EP algorithm starts with a population of trial solutions which are
initialized by random, heuristic, or other appropriate means. The size of the
TEAM LRN

Evolutionary programming

population, p , may range over a broadly distributed set, but is in general larger
than one. Each of these trial solutions is evaluated with regard to the specified
fitness function. After the creation of a population of initial solutions, each
of the parent members is altered through application of a mutation process;
in strict EP, recombination is not utilized. Each parent member i generates
A, progeny which are replicated with a stochastic error mechanism (mutation).
The fitness or behavioral error is assessed for all offspring solutions with the
selection process performed by one of several general techniques including: (i)
the best p solutions are retained to become the parents for the next generation
(elitist, see Section 28.4), or (ii) p of the best solutions are statistically retained
( tournament, see Chapter 24), or (iii) proportional-based selection (Chapter 23).
In most applications, the size of the population remains constant, but there is no
restriction in the general case. The process is halted when the solution reaches
a predetermined quality, a specified number of iterations has been achieved, or
some other criterion (e.g. sufficient convergence) stops the algorithm.
EP differs philosophically from other evolutionary computational techniques
such as genetic algorithms (GAS) (Chapter 8) in a crucial manner. EP is a
top-down versus bottom-up approach to optimization. It is important to note
that (according to neo-Darwinism) selection operates only on the phenotypic
expressions of a genotype; the underlying coding of the phenotype is only
affected indirectly. The realization that a sum of optimal parts rarely leads
to an optimal overall solution is key to this philosophical difference. GAS
rely on the identification, combination, and survival of good building blocks
(schemata) iteratively combining to form larger better building blocks. In a
GA, the coding structure (genotype) is of primary importance as it contains
the set of optimal building blocks discovered through successive iterations.
The building block hypothesis is an implicit assumption that the fitness is a
separable function of the parts of the genome. This successively iterated local
optimization process is different from EP, which is an entirely global approach
to optimization. Solutions (or organisms) in an EP algorithm are judged solely
on their fitness with respect to the given environment. No attempt is made
to partition credit to individual components of the solutions. In EP (and in
evolution strategies (ESs), see Chapter 9), the variation operator allows for
simultaneous modification of all variables at the same time. Fitness, described
in terms of the behavior of each population member, is evaluated directly, and is
the sole basis for survival of an individual in the population. Thus, a crossover
operation designed to recombine building blocks is not utilized in the general
forms of EP.
10.2

History

The genesis of EP (Section 6.2) was motivated by the desire to generate an

alternative approach to artificial intelligence. Fogel ( 1962) conceived of using
TEAM LRN
the simulation of evolution to develop
artificial intelligence which did not

History

rely on heuristics, but instead generated organisms of increasing intellect over

time. Fogel (1964, Fogel et 01 1966) made the observation that intelligent
behavior requires the ability of an organism to make correct predictions within
its environment, while being able to translate these predictions into a suitable
response for a given goal. This early work focused on evolving finite-state
machines (Chapter 18; see the articles by Mealy (1955), and Moore (1957) for
a discussion of these automata) which provided a most generic testbed for this
approach. A finite-state machine (figure 10.1) is a mechanism which operates
on a finite set (i.e. alphabet) of input symbols, possesses a finite number of
internal states, and produces output symbols from a finite alphabet. As in all
finite-state machines, the corresponding input-output symbol pairs and state
transitions from every state define the specific behavior of the machine.

1/6

Figure 10.1. A simple finite-state machine diagram. Input symbols are shown to the
left of the slash. Output symbols are to the right of the slash. The finite-state machine
is presumed to start in state A.

In a series of experiments (Fogel et 01 1966), an environment was simulated

by a sequence of symbols from a finite-length alphabet. The problem was
defined as follows: evolve an algorithm which would operate on the sequence of
symbols previously observed in a manner that would produce an output symbol
which maximizes the benefit to the algorithm in light of the next symbol to
appear in the environment, relative to a well-defined payoff function.
EP was originally defined by Fogel (1964) in the following manner. A
population of parent finite-state machines, after initialization, is exposed to the
sequence of symbols (i.e. environment) which have been observed up to the
current time. As each input symbol is presented to each parent machine, the
output symbol is observed (predicted) and compared to the next input symbol.
A predefined payoff function provides a means for measuring the worth of each
prediction. After the last prediction is made, some function of the sequence of
LRNfitness of each machine. Offspring
payoff values is used to indicate theTEAM
overall

Evolutionary programming

machines are created by randomly mutating each parent machine. As defined

above, there are five possible resulting modes of random mutation for a finitestate machine. These are: ( i ) change an output symbol; ( i i ) change a state
transition; (iii) add a state; (iv) delete an existing state; and ( v ) change the
initial state. Other mutations were proposed but results of experiments with
these mutations were not described by Fogel et nl (1966). To prevent the
possibility of creating null machines, the deletion of a state and the changing of
the initial state were allowed only when a parent machine had more than one
state.
Mutation operators are chosen with respect to a specified probability
distribution which may be uniform, or another desired distribution. The number
of mutation operations applied to each offspring is also determined with respect
to a specified probability distribution function (e.g. Poisson) or may be fixed CI
priori. Each of the mutated offspring machines is evaluated over the existing
environment (set of input-output symbol pairs) in the same manner as the parent
mac hi nes.
After offspring have been created through application of the mutation
operator(s) on the members of the parent population, the machines providing
the greatest payoff with respect to the payoff function are retained to become
parent members for the next generation. Typically, one offspring is created for
each parent, and half of the total machines are retained in order to maintain a
constant population size. The process is iterated until it is required to make an
actual prediction of the next output symbol in the environment, which has yet
to be encountered. This is analogous to the presentation of a naive exemplar to
a previously trained neural network. Out of ths entire population of machines,
only the best machine, in terms of its overall worth, is chosen to generate the new
output symbol. Fogel originally proposed selection of machines which score in
the top half of the entire population, i.e. a nonregressive selection mechanism.
Although discussed as a possibility to increase variance, the retention of lesserquality machines was not incorporated in these early experiments.
Since the topography (response surface) is changed after each presentation
of a symbol, the fitness of the evolved machines must change to reflect the
payoff from the previous prediction. This prevents evolutionary stagnation as
the adaptive topography is experiencing continuous change. As is evidenced
in nature, the complexity of the representation is increased as the finite-state
machines learn to recognize more subtle features in the experienced sequence
of symbols.

Fogel (see Fogel 1964, Fogel et crl 1966) used EP on a series of successively
more difficult prediction tasks. These experiments ranged from simple twoTEAM cyclic
LRN sequences degraded by addition
symbol cyclic sequences, eight-symbol

History

Pnmertess of the increasing integers

0 01 complexity cost/state
5 machines selcted/generation
Single mutabon. random series 1
10 pneratrons/prediction

80
100 120
140
160
Number of symbols experienced

180

200

220

Figure 10.2. A plot showing the convergence of EP on finite-state machines evolved to

predict primeness of numbers.

of noise, and sequences of symbols generated by other finite-state machines to

nonstationary sequences and sequences taken from the article by Flood ( 1962).
In one example, the capability for predicting the 'primeness'. i.e. whether or
not a number is prime, was tested. A nonstationary sequence of symbols was
generated by classifying each of the monotonically increasing set of integers
as prime (with symbol 1 ) or nonprime (with symbol 0). The payoff function
consisted of an all-or-none function where one point was provided for each
correct prediction. No points or penalty were assessed for incorrect predictions.
A small penalty term was added to maximize parsimony, through the subtraction
of 0.01 multiplied by the number of states in the machine. This complexity
penalty was added due to the limited memory available on the computers at
that time. After presentation of 719 symbols, the iterative process was halted
with the best machine possessing one state, with both output symbols being
zero. Figure 10.2 indicates the prediction score achieved in this nonstationary
environment. Because prime numbers become increasingly infrequent (Burton
1976), the asymptotic worth of this machine, given the defined payoff function,
approaches 100%.
After initial, albeit qualified, success with this experiment, the goal was
altered to provide a greater payoff for correct prediction of a rare event. Correct
prediction of a prime was worth one plus the number of nonprimes preceding it.
For the first 150 symbols, 30 correct predictions were made (primes predicted
as primes), 37 false positives (nonprimes predicted as primes), and five primes
were missed. On predicting the 151st through 547th symbols there were 65
LRN positives. Of the first 35 prime
correct predictions of primes, andTEAM
67 false

Evolutionary programming

numbers, five were missed, but of the next 65 primes, none were missed. Fogel
et crl (1966) indicated that the machines demonstrated the capability to quickly
recognize numbers which are divisible by two and three as being nonprime,
and that some capability to recognize divisibility by five as being indicative
of nonprimes was also evidenced. Thus, the machines generated evidence of
learning a definition of primeness without prior knowledge of the explicit nature
of a prime number, or any ability to explicitly divide.
Fogel and Burgin ( 1969) researched the use of EP in game theory. In
a number of experiments, EP was consistently able to discover the globally
optimal strategy in simple two-player, zero-sum games involving a small number
of possible plays. This research also showed the ability of the technique to
outperform human subjects in more complicated games. Several extensions were
made to the simulations to address nonzero-sum games (e.g. pursuit evasion.)
A three-dimensional model was constructed where EP was used to guide an
interceptor towards a moving target. Since the target was, in most circumstances,
allowed a greater degree of maneuverability, the success or failure of the
interceptor was highly dependent upon the learned ability to predict the position
of the target without CI priori knowledge of the targets dynamics.
A different aspect of EP was researched by Walsh et er1 (1970) where EP
was used for prediction as a precursor to automatic control. This research
concentrated on decomposing a finite-state machine into submachines which
could be executed in parallel to obtain the overall output of the evolved system.
A primary goal of this research was to maximize parsimony in the evolving
machines. In these experiments, finite-state machines containing seven and
eight states were used as the generating function for three output symbols. The
performance of three human subjects was compared to the evolved models when
predicting the next symbol in the respective environments. In these experiments,
EP was consistently able to outperform the human subjects.

The basic EP paradigm may be described by the following EP algorithm:

TEAM LRN

History

where:
U ' is an individual member in the population
p 3 I is the size of the parent population
h > 1 is the size of the offspring population
P ( t ) := { a ; ( t ) ,ah(t). . . . , ~ ~ ~ is
( tthe) population
)
at time t
CD : Z -+ '31 is the fitness mapping
m(..),,,
is the mutation operator with controlling parameters
.s(-), is the selection operator 3 s(-), : (Z* U I / ' + * ) + 1''
Q E {Vr, P ( t ) } is a set of individuals additionally accounted for in the
selection step, i.e. parent solutions.
Other than on initialization, the search space is generally unconstrained;
constraints are utilized for generation and initialization of starting parent
solutions. Constrained optimization may be addressed through imposition of
the requirement that (i) the mutation operator (Section 32.4) is formulated to
only generate legitimate solutions (often impossible) or ( i i ) a penalty function
is applied to offspring mutations lying outside the constraint bounds in such
a manner that they do not become part of the next generation. The objective
function explicitly defines the fitness values which may be scaled to positive
values (although this is not a requirement, it is sometimes performed to alter
the range for ease of implementation).
In early versions of EP applied to continuous parameter optimization (Fogel
1992) the mutation operator is Gaussian with a zero mean and variance obtained
for each component of the object variable vector as the square root of a linear
transform of the fitness value cp(z).
x, ( k

+ 1 ) :=

Xf ( k )

+ JBf (k)cp(.r,( k ) + Y I ) + N (0. 1 )

where z ( k ) is the object variable vector, is the proportionality constant, and

y is an offset parameter. Both B and y must be set externally for each problem.
N , (0, 1 ) is the ith independent sample from a Gaussian distribution with zero
mean and unit variance.
Several extensions to the finite-state machine formulation of Fogel et a1
( 1966) have been offered to address continuous optimization problems as well
as to allow for various degrees of parametric self-adaptation (Fogel 199 I a. 1992,
1995). There are three main variants of the basic paradigm, identified as follows:
(i) original EP, where continuous function optimization is performed without
any self-adaptation mechanism;
(ii) continuous EP where new individuals in the population are inserted directly
without iterative generational segmentation (i.e. an individual becomes part
of the existing (surviving) population without waiting for the conclusion of
a discrete generation; this is also known as steady-state selection (Section
28.3) in GAS and ( p 1 ) selection (Chapter 9) in ES);
(iii) self-adaptive EP, which augments the solution vectors with one or more
TEAM LRN
parameters governing the mutation
process (e.g. variances, covariances)

Evolutionary programming

to permit self-adaptation of these parameters through the same iterative

mutation, scoring, and selection process. In addition, self-adaptive EP may
also be continuous in the sense of (ii) above.
The original EP is an extension of the formulation of Fogel et a1 (1966)
wherein continuous-valued functions replace the discrete alphabets of finitestate machines. The continuous form of EP was investigated by Fogel and Fogel
(1993). To properly simulate this algorithmic variant, it is necessary to insert
new population members by asynchronous methods (e.g. event-driven interrupts
in a true multitasking, real-time operating system). Iterative algorithms running
on a single central processing unit (CPU) are much more prevalent. since they
are easier to program on todays computers, hence most implementations of EP
are performed on a generational (epoch-to-epoch) basis.
Self-adaptive EP is an important extension of the algorithm in that it
successfully overcomes the need for explicit user-tuning of the parameters
associated with mutation. Global convergence may be obtained even in the
presence of suboptimal parameterization, but available processing time is most
often a precious resource and any mechanism for optimizing the convergence
rate is helpful. As proposed by Fogel (1992, 1995). EP can self-adapt the
variances for each individual in the following way:

.r,(k

+ 1) := . r , ( k ) + u , ( k )t N,(O,1)
+

~ , ( k 1 ) := ~ , ( k ) [ a u , ( k ) ] * N,(O,1).
The variable (T ensures that the variance U , remains nonnegative. Fogel
(1992) suggests a simple rule wherein V u , ( k ) 5 0, u , ( k ) is set to 6, a value
close to but not identically equal to zero (to allow some degree of mutation).
The sequence of updating the object variable .U, and variance U , was proposed to
occur in opposite order from that of ESs (Back and Schwefel 1993, Rechenberg
1965, Schwefel 1981). Gehlhaar and Fogel (1096) provide evidence favoring
the ordering commonly found in ES.
Further development of this theme led Fogel (1991a, 1992) to extend the
procedure to alter the correlation coefficients between components of the object
vector. A symmetric correlation coefficient matrix P is incorporated into
the evolutionary paradigm in addition to the self-adaptation of the standard
deviations. The components of P are initialized over the interval [ - 1 , 11
and mutated by perturbing each component, again, through the addition of
independent realizations from a Gaussian random distribution. Bounding
limits are placed upon the resultant mutated variables wherein any mutated
coefficient which exceeds the bounds [ - I . I ] is reset to the upper or loher
limit, respectively. Again, this methodology is similar to that of Schwefel
(1981 ), as perturbations of both the standard deviations and rotation angles
(determined by the covariance matrix P) allow adaptation to arbitrary contours
TEAM LRN
on the error surface. This self-adaptation
through the incorporation of correlated

Current directions

mutations across components of each parent object vector provides a mechanism

for expediting the convergence rate of EP.
Fogel ( 1988) developed different selection operators which utilized
tournament competition (Chapter 24) between solution organisms. These
operators assigned a number of wins for each solution organism based on a
set of individual competitions (using fitness scores as the determining factor)
among each solution and each of the q competitors randomly selected from the
total population.

10.3 Current directions

Since the explosion of research into evolutionary algorithms in the late 1980s
and early 199Os, EP has been applied to a wide range of problem domains with
considerable success. Application areas in the current literature include training,
construction, and optimization of neural networks, optimal routing (in two, three,
and higher dimensions), drug design, bin packing, automatic control, game
theory, and optimization of intelligently interactive behaviors of autonomous
entities, among many others. Beginning in 1992, annual conferences on EP have
brought much of this research into the open where these and other applications
as well as basic research have expanded into numerous interdisciplinary realms.
Notable within a small sampling of the current research is the work in neural
network design. Early efforts (Port0 1989, Fogel et nl 1990, McDonnell 1992,
and others) focused on utilizing EP for training neural networks to prevent
entrapment in local minima. This research showed not only that EP was well
suited to training a range of network topologies, but also that it was often more
efficient than conventional (e.g. gradient-based) methods and was capable of
finding optimal weight sets while escaping local minima points. Later research
(Fogel 1992, Angeline et a1 1994, McDonnell and Waagen 1993) involved
simultaneous evolution of both the weights and structure of feedforward and
feedback networks. Additional research into the areas of using EP for robustness
training (Sebald and Fogel 1992), and for designing fuzzy neural networks for
feature selection, pattern clustering, and classification (Brotherton and Simpson
1995) have been very successful as well as instructive.
EP has been also used to solve optimal routing problems. The trawling
salesman problem (TSP), one of many in the class of nondeterministicpolynomial-time- (NP-) complete (see Aho et a1 1974) problems, has been
studied extensively. Fogel ( 1988, 1993) demonstrated the capability of EP to
address such problems. A representation was used wherein each of the cities
to be visited was listed in order, with candidate solutions being permutations
of this listing. A population of random tours is scored with respect to their
Euclidean length. Each of the tours is mutated using one of many possible
inversion operations (e.g. select two cities in the tour at random and reverse the
order of the segment defined by the two cities) to generate an offspring. All of
the offspring are then scored, with either
TEAM elitist
LRN or stochastic competition used to

Evolutionary programming

cull lower-scoring members from the population. Optimization of the tours was
quite rapid. In one such experiment with 1000 cities uniformly distributed, the
best tour (after only 4 x 10 function evaluations) was estimated to be within
5-75? of the optimal tour length. Thus, excellent solutions were obtained after
searching only an extremely small portion of the total potential search space.
EP has also been utilized in a number of medical applications. For
example, the issue of optimizing drug design was researched by Gehlhaar P t
ul (1995). EP was utilized to perform a conformational and position search
within the binding site of a protein. The search space of small molecules
which could potentially dock with the crystallographically determined binding
site was explored iteratively guided by a database of crystallographic proteinligand complexes. Geometries were constrained by known physical ( i n three
dimensions) and chemical bounds. Results demonstrated the efficacy of this
technique as it was orders of magnitude fater in finding suitable ligands than
previous hands-on methodologies. The probability of successfully predicting the
proper binding modes for these ligands was estimated at over 95% using nominal
values for the crystallographic binding mode and number of docks attempted.
These studies have permitted the rapid development of several candidate drugs
which are currently in clinical trials.
The issue of utilizing EP to control systems has been addressed widely
(Fogel and Fogel 1990, Fogel 1991a, Page P t cil 1992, and many others).
Automatic control of fuzzy heating, ventilation, and air conditioning (HVAC)
controllers was addressed by Haffner and Sebald (1993). In this study, a
nonlinear, multiple-input. multiple-output (MIMO) model of a HVAC system
was used and controlled by it fuzzy controller designed using EP. Typical fuzzy
controllers often use trial and error methods to determine parameters and transfer
functions, hence they can be quite time consuming with a complex MIhlO
HVAC system. These experiments used EP to design the membership functions
and values (later studies were extended to find rules as well a s responsibilities
of the primary controller) to automate the tuning procedure. EP worked in
an overall search space containing 76 parameters, 10 controller inputs, seven
controller outputs. and 80 rules. Simulation results demonstrated that EP was
quite effective at choosing the membership functions of the control laboratory
and corridor pressures in the model. The synergy of combining EP with fuzzy
set constructs proved quite fruitful in reducing the time required to design a
stable, functioning HVAC system.
Game theory has always been at the forefront of artificial intelligence
research. One interesting game, the iterated prisoners dilemma, has been
studied by numerous investigators (Axelrod 1987. Fogel I99 I b, Harrald and
Fogel 1996, and others). In this two-person game, each player can choose
one of two possible behavioral policies: defection or cooperation. Defection
implies increasing ones own reward at the expense of the opposing player,
while cooperation implies increasing the reward for both players. If the game
LRN move is defection. If the players
is played over a single iteration, theTEAM
dominant

Current direction s

strategies depend on the results of previous iterations, mutual cooperation may

possibly become a rational policy, whereas if the sequence of strategies is not
correlated, the game degenerates into a series of single plays with defection
being the end result. Each player must choose to defect or cooperate on each
trial. Table 10.I describes a general form of the payoff function in the prisoner's
dilemma.
Table 10.1. A general form of the payoff matrix for the prisoner's dilemma problem. y I
is the payoff to each player for mutual cooperation. y2 is the payoff' for cooperating when
the other player defects. y3 is the payoff for defecting when the other player cooperates.
y4 is the payoff to each player for mutual defection. Entries ( U ,p ) indicate pajof't's to
players A and B, respectively.
Player B

In addition, the payoff matrix defining the game is subject to the following
constraints (Rapoport 1966):

Both neural network approaches (Harrald and Fogel 1996) and finite-state
machine approaches (Fogel 1991b) have been applied to this problem. Finitestate machines are typically used where there are discrete choices between
cooperation and defection. Neural networks allow for a continuous range of
choices between these two opposite strategies. Results of these preliminary
experiments using EP, in general, indicated that mutual cooperation is more
likely to occur when the behaviors are limited to the extremes (the finitestate machine representation of the problem), whereas in the neural network
continuum behavioral representation of the problem, i t is easier to slip into a
state of mutual defection.
Development of interactively in tell igen t be haviors was investigated by Foge 1
et a1 (1996). EP was used to optimize computer-generated force (CGF)
behaviors such that they learned new courses of action adaptively as changes
in the environment (i.e. presence or absence of opposing side forces) were
encountered. The actions of the CGFs were created at the response of an event
scheduler which recognized significant changes in the environment as perceived
by the forces under evolution. New plans of action were found during these
event periods by invoking an evolutionary program. The iterative EP process
TEAM
LRNmet, and relinquished control of the
was stopped when time or CPU limits
were

100

Evolutionary programming

simulated forces back to the CGF simulator after transmitting newly evolved
instruction sets for each simulated unit. This process proved quite successful
and offered a significant improvement over other rule-based systems.

10.4 Future research

Important research is currently being conducted into the understanding of the
convergence properties of EP, as well as the basic mechanisms of different
mutation operators and selection mechanisms. Certainly of great interest is the
potential for self-adaptation of exogeneous parameters of the mutation operation
(meta and Rmeta-EP), as this not only frees the user from the often difficult
task of parameterization, but also provides a built-in, automated mechanism for
providing optimal settings throughout a range of problem domains. The number
of application areas of this optimization technique is constantly growing. EP.
along with the other EC techniques, is being used on previously untenable. often
NP-complete, problems which occur quite often in commercial and military
pro b I e ms.

References
Aho A V, Hopcroft J E and Ullman J D 1974 Tlic Desigii trricl Aticrlv.\i,\ of Cotnpiitrr
Algorithms (Reading, MA: Addison-Wesley) pp 143-5, 3 18-26
Angeline P, Saunders G and Pollack J 1994 Complete induction of recurrent neural
networks Proc. 3rd Anti. Conj: on E\vliitiotitiry Progrcirtirnitig (Sirti Diego, C A , 1994)
ed A V Sebald and L J Fogel (Singapore: World Scientific) pp 1-8
Atmar W 1992 On the rules and nature of simulated evolutionary programming Pro(*.
1st Anti. Cot$ o t i E\*oliitioticic Progrtittiniitig (LaJollu, C A , 1992) ed D B Fogel
and W Atmar (San Diego, CA: Evolutionary Programming Society) pp 17-26
Axelrod R 1987 The evolution of strategies in the iterated prisoners dilemma Griirrii.
Algorirhms mid Siriiiiltitetl Atirieciling ed L DaL is (London) pp 32-42
Back T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimimtion E\*oliitioncin C(itnpit. 1 1-23
Brotherton T W and Simpson P K 1095 Dynamic feature set training of neural
networks for classification E\diitionciry Progrutt?itnitig IV: Proc. 4th Anti. Cot!/.o t i
E\vliitioticin Progrtrtnmirig (Strti Diego, C A , l W 5 ) ed J R McDonnell, R G Reynold\
and D B Fogel (Cambridge, MA: MIT Press) pp 83-94
Burton D M 1976 E l e m e n t q v Nirriiher the or^ (Boston, MA: Allyn and Bacon) p 136-52
Flood M M 1962 Stochastic learning theory applied to choice experiment\ with cats,
dogs and men Buhmiorcrl Sci. 7 289-3 14
Fogel D B 1988 An evolutionary approach to the traveling salesman problem Biol.
Cyhumet. 60 139-44
-I99 1 a System Identificutioti tlzroiigh Simulated E\*oliitioti (Needham. MA: Ginn)
---I
99 1 b The evolution of intelligent decision making in gaming C\hrrnut. LY~:\t 22
223-36
TEAM
-1992 E\*ol\+ig Artijrticil Ititrlligrtii~e
PhDLRN
Dissertation, Univer\ity o f California

References

101

993 Applying evolutionary programming to selected traveling salesman problems

Cyhernet. Syst. 24 27-36
-1995 E\dutionun Computation, Touwd U New- Philosophy of Muchine Iiitelli~qi~ric~e
(Piscataway, NJ: IEEE)
Fogel D B and Fogel L J 1990 Optimal routing of multiple autonomous
underwater vehicles through evolutionary programming Proc. Symp. on Autonomoi4.t
Undem*citer Vehicle Technology (Washington, DC: IEEE Oceanic Engineering
Society) pp 44-7
Fogel D B, Fogel L J and Porto V W 1990 Evolving neural networks B i d . Cyhernet. 63
487-93
Fogel G B and Fogel D B 1993 Continuous evolutionary programming: analysis and
experiments Cyheriiet. Syst. 26 79-90
Fogel L J 1962 Autonomous automata Industrial Res. 4 14-9
Fogel L J 1964 On The Organization of Intellect PhD Dissertation, University of
California
Fogel L J and Burgin G H I969 Competitive Goal-Seeking through E\dutioriun
Programming Air Force Cambridge Research Labratories Final Report Contract
AFI 9(628)-5927
Fogel L J, Owens A J and Walsh M J 1966 ArtiJiciul Iritellig~~ric~e
through Siniuluted
Evolution (New York: Wiley)
Fogel L J, Porto V W and Owen M 1996 An intelligently interactive non-rule-based
computer generated force Proc. 6th COT$ on Computer Geiwruted Force.\ mid
Behuviorul Representutiorr (Orlando, FL: Institute for Simulation and Training
STRICOM-DMSO) pp 265-70
Gehlhaar D K and Fogel D B 1996 Tuning evolutionary programming for
conformationally flexible molecular docking Proc. 5th Ann. Car$ on E\*olurioncir~
Programming (1996) ed L J Fogel, P J Angeline and T Back (Cambridge, MA:
MIT Press) pp 419-29
Gehlhaar D K, Verkhivker G, Rejto P A, Fogel D B, Fogel L J and Freer S T 1995
Docking conformationally flexible small molecules into a protein binding site
through evolutionary programming Evolutionan Pmgrumming IV: Proc. 4th Anri.
CO/$ on Evolutionan Programming (San Diego, CA, 1995) (Cambridge, MA: MIT
Press) pp 615-27
Harrald P G and Fogel D B 1996 Evolving continuous behaviors in the iterated prisoners
dilemma BioSwtems 37 1 3 5 4 5
Haffner S B and Sebald A V 1993 Computer-aided design of fuzzy HVAC
controllers using evolutionary programming Proc. 2nd Ann. Conf on E\*oIutioncrn
Programming (San Diego, CA, 1993) ed D B Fogel and W Atmar (San Diego, CA:
Evolutionary Programming Society) pp 98-1 07
McDonnell J M 1992 Training neural networks with weight constraints Proc. 1st Ann.
Cot$ on Evolutioncin Programming (La Jollu, CA, 1992) ed D B Fogel and
W Atmar (San Diego, CA: Evolutionary Programming Society) pp 1 I 1-9
McDonnell J M and Waagen D 1993 Neural network structure design by evolutionary
programming Proc. 2nd A m . Con$ on Evolutioncin Progrmiming (Sun Diego, CA,
1993) ed D B Fogel and W Atmar (San Diego, CA: Evolutionary Programming
Society) pp 79-89
Mealy G H 1955 A method of synthesizing sequential circuits Bell S y ~ t .Tech. J. 34
TEAM LRN
1054-79
-1

102

Evolutionary programming

Moore E F I957 Gedanken-experiments on sequential machines: automata studies Annci1.c

o~Mat/zei)i(iti(cil
Stirdies vol 34 (Princeton, NJ: Princeton University Press) pp 12953
Page W C, McDonnell J M and Anderson B 1092 An evolutionary programming
approach to multi-dimensional path planning Proc. 1st Ann. Cot$ ott Eivlurionnt?
Progrunzming (LA Jolla, CA, 1992) ed D B Fogel and W Atmar (San Diego, CA:
Evolutionary Programming Society) pp 63-70
Porto V W I989 Evolutionary methods for training neural networks for underwater
pattern classification 24th Ann. Asilotncir Conj; on Signcils, Systenzs cind Conzpurers
vol 2 pp 1015-19
Rapoport A 1966 Optirnul Policies jbr the Prisoners Dilernrnci University of North
Carolina Psychometric Laboratory Technical Report 50
Rechenberg 1 1965 Cybernetic Soliition Peith c$un Erperimentcil Problem Royal Aircraft
Establishment Translation 1 I22
Schwefel H-P 198I Nirtneric~crl0ptimi:cition of Conzpirter Models (Chichester: Wiley)
Sebald A V and Fogel D B 1992 Design of fault tolerant neural networks for pattern
classification Proc. 1st Anti. Conj: on E\diitionun Progrcintming (hi J o l l a , CA,
1992) ed D B Fogel and W Almar (San Diego. CA: Evolutionary Programming
Society) pp 90-9
Walsh M J, Burgin G H and Fogel L J 1970 Prediction uncl Control through the Use oj
Aiitorncitci eind their E\*olirtiori US Navy Final Report Contract NOW 14-66-C-0284

Further reading
There are several excellent general references available to the reader interested
in furthering his or her knowledge in this exciting area of EC. The following
books are a few well-written examples providing a good theoretical background
in EP as well as other evolutionary algorithms.
1 . Back T 1996 E\wlictionciry Algorithms in Theory c i n d Prcictice (New York: Oxford

University Press)
2. Fogel D B I995 E\diitionury Compiitcition, Tmrurcl ci N e ~ vPhilosophy ($Machine
lntelligenc*e (Piscataway, NJ: IEEE)

3. Schwefel H-P 1981 Numeric~ulOptirnizcition

Wiley)

(.f Computer

Models (Chichester:

4. Schwefel H-P I995 Eidution cind Optirnizition Seeking (New York: Wiley)

TEAM LRN

11
Derivative methods in genetic
programming
Kenneth E Kinnear, Jr

11.1 Introduction
This chapter describes the fundamental concepts of genetic programming (GP)
(Koza 1989, 1992). Genetic programming is a form of evolutionary algorithm
which is distinguished by a particular set of choices as to representation,
genetic operator design, and fitness evaluation. When examined in isolation,
these choices define an approach to evolutionary computation (EC) which is
considered by some to be a specialization of the genetic algorithm (GA). When
considered together, however, these choices define a conceptually different
approach to evolutionary computation which leads researchers to explore new
and fruitful lines of research and practical applications.

11.2 Genetic programming defined and explained

Genetic programming is implemented as an evolutionary algorithm in which
the data structures that undergo adaptation are executable computer programs.
Fitness evaluation in genetic programming involves executing these evolved
programs. Genetic programming, then, involves an evolution-directed search of
the space of possible computer programs for ones which, when executed, will
produce the best fitness.
In short, genetic programming breeds computer programs. To create the
initial population a large number of computer programs are generated at random.
Each of these programs is executed and the results of that execution are used
to assign a fitness value to each program. Then a new population of programs,
the next generation, is created by directly copying certain selected existing
programs, where the selection is based on their fitness values. This population
is filled out by creating a number of new offspring programs through genetic
operations on existing parent programs which are selected based, again, on their
fitness. Then, this new population of programs is again evaluated and a fitness
is assigned to each program based on the results of its evaluation. Eventually
TEAM LRN

103

I 04

Derivative methods in genetic programming

this process is terminated by the creation and evaluation of a 'correct' program

or the recognition of some other specific termination criteria.
More specifically, at the most basic level, genetic programming is defined
as a genetic algorithm with some unusual choices made as to the representation
of the problem, the genetic operators used to modify that representation, and
the fitness evaluation techniques employed.

I I .2. I A .speciali:eci representation: executable programs

Any evolutionary algorithm is distinguished by the structures used to represent
the problem to be solved. These are the structures which undergo transformation,
and in which the potential solutions reside.
Originally, most genetic algorithms used linear strings of bits (Chapter 15)
as the structures which evolved (Holland 197S), and the representation of the
problem was typically the encoding of these bits as numeric or logical parameters
of a variety of algorithms. The evolving structures were often used as parameters
to human-coded algorithms. In addition, the bitstrings used were frequently of
fixed length, which aided in the translation into parameters for the algorithms
involved.
More recently, genetic algorithms have appeared with real-valued numeric
sequences used as the evolvable structures, still frequently used as parameters to
particular algorithms. In recent years, many genetic algorithm implementations
have appeared with sequences which are of variable length, sometimes based
on the order of the sequences, and which contain more complex and structured
information than parameters to existing algorithms.
The representation used by genetic programming is that of an executable
program. There is no single form of executable program which is used by
all genetic programming implementations, although many implementations use
a tree-structured representation (Chapter 19) highly reminiscent of a LISP
functional expression. These representations are almost always of a variable
size, though for implementation purposes a maximum size is usually specified.
Figure 1 1 . 1 shows an example of a tree-structured representation for a
genetic programming implementation. The specific task for which this is a
reasonable representation is the learning of a Boolean function from a set
of inputs. This figure contains two different types of node (as do most
genetic programming representations) which are called functions and terminals.
Terminals are usually inputs to the program, although they may also be constants.
They are the variables which are set to values external to the program itself prior
to the fitness evaluation performed by executing the program. In this example
dO and dl are the terminals. They can take on binary values of either zero or
one.
Functions take inputs and produce outputs and possibly produce side-effects.
The inputs can be either a terminal or the output of another function. In the
TEAM LRN
above example, the functions are AND,
OR, and NOT. Two of these functions

Genetic programming defined and explained

105

Figure 11.1. Tree-structured representation used in genetic programming.

are functions of two inputs, and one is a function of one input. Each produces
a single output and no side effect.
The fitness evaluation for this particular indi\liditril is determined by the
effectiveness with which it will produce the correct logical output for all of the
test cases against which i t is tested.
One way to characterize the design of a representation for an application
of genetic programming to a particular problem is to view it as the design of
a language, and this can be a useful point of view. Perhaps i t is more useful,
however, to view the design of a genetic programming representation as that
of the design of a virtual machine-since usually the execution engine must
be designed and constructed as well as the representation or language that is
executed.
The representation for the program (i.e. the definition of the functions and
terminals) must be designed along with the virtual machine that is to execute
them. Rarely are the programs evolved in genetic programming given direct
control of the central processor of a computer (although see the article by
Nordin (1994)). Usually, these programs are interpreted under control of a
virtual machine which defines the functions and terminals. This includes the
functions which process the data, the terminals that provide the inputs to the
programs, and any control functions whose purpose is to affect the execution
flow of the program.
As part of this virtual machine design task, it is important to note that the
output of any function or the value of any terminal may be used as the input to
any function. Initially, this often seems to be a trivial problem, but when actually
performing the design of the representation and virtual machine to execute that
representation, it frequently looms rather large. Two solutions are typically used
for this problem. One approach is to design the virtual machine, represented by
the choice of functions and terminals, to use only a single data type. In this way,
the output of any function or the value of any terminal is acceptable as input to
TEAM
LRN more than one data type to exist
any function. A second approach is
to allow

I06

Derivative methods in genetic programming

in the virtual machine. Each function must then be defined to operate on any of
the existing data types. Implicit coercions are performed by each function on
its input to convert the data type that it receives to one that it is more normally
defined to process. Even after handling the data type problem, functions must
be defined over the entire possible range of argument values. Simple arithmetic
division must be defined to return some value even when division by zero is
attempted.
I t is important to note that the definition of functions and the virtual machine
that executes them is not restricted to functions whose only action is to provide
a single output value based on their inputs. Genetic programming functions
are often defined whose primary purpose is the actions they take by virtue of
their side-effects. These functions rnust return some value as well, but their real
purpose is interaction with an environment external to the genetic programming
system.
An additional type of side-effect producing function is one that implements
a control structure within the virtual machine defined to execute the genetically
evolved program. All of the common programming control constructs such as
if-then+lse, while-do, for, and others have been implemented as evolvable
control constructs within genetic programming systems. Looping constructs
must be protected in such a way that they will never loop forever. and usually
have an arbitrary limit set on the number of loops which they will execute.
As part of the initialization of a genetic programming run, a large number of
individual programs are generated at random. This is relatively straightforward,
since the genetic programming system is supplied with information about the
number of arguments required by each function, as well as all of the available
terminals. Random program trees are generated using this information, typically
of a relatively small size. The program trees will tend to grow quickly to be
quite large in the absence of some explicit evolutionary pressure toward small
size or some simple hard-coded lirnits to growth.

The second specific design approach that distinguishes genetic programming

from other types of genetic algorithm is the design of the genetic operators.
Having decided to represent the problem to be solved as a population of
computer programs, the essence of an evolutionary algorithm is to evaluate
the fitness of the individuals in the population and then to create new members
of the population based in some way on the individuals which have the highest
fitness in the current population.
In genetic algorithms, recombination is typically the key genetic operator
employed, with some utility ascribed to mutation as well. In this way,
genetic programming is no different from any other genetic algorithm. Genetic
algorithms usually have genetic material organized in a linear fashion and
TEAM LRN defined for such genetic material
the recombination, or crossover, algorithm

Genetic programming defined and explained

107

is quite straightforward (see Section 33.1). The usual representation of

genetic programming programs as tree-structured combinations of functions and
terminals requires a different form of recombination algorithm. A major step in
the invention of genetic programming was the design of a recombination operator
which would simply and easily allow the creation of an offspring program tree
using as inputs the program trees of two individuals of generally high fitness as
parents (Cramer 1985, Koza 1989, 1992).
In any evolutionary algorithm it is vitally important that the fitness of the
offspring be related to that of the parents, or else the process degenerates into
one of random search across whatever representation space was chosen. It
is equally vital that some variation, indeed heritable variation, be introduced
into the offsprings fitness, otherwise no improvement toward an optimum is
possible.

Figure 11.2. Recombination in genetic programming.

The tree-structured genetic material usually used in genetic programming

has a particularly elegant recombination operator that may be defined for it.
In figure 11.2, there are two parent program trees, ( a ) and (b). They are to
be recombined through crossover to create an offspring program tree ( c ) . A
subtree is chosen in each of the parents, and the offspring is created by inserting
the subtree chosen from (6) into the place where the subtree was chosen in
( a ) . This very simply creates an offspring program tree which preserves the
TEAM LRN
same constraints concerning the number
of inputs to each function as each

I08

Derivative methods in genetic programming

parent tree. In practice it yields a offspring tree whose fitness has enough
relationship to that of its parents to support the evolutionary search process.
Variations in this crossover approach are easy to imagine, and are currently the
subject of considerable active research in the genetic programming community
(D'haeseleer 1994, Teller 1996).
Mutation (Chapter 3 2 ) is a genetic operator mhich can be applied to a \ingle
parent program tree to create an offspring tree. The typical mutation operator
used selects a point inside a parent tree, and generates a new random subtree
to replace the selected subtree. This random subtree i\ uwally generated by the
\ame procedure used to generate the initial population of program trees.

Finally, then, the last detailed distinction between genetic programming and a
more usual implementation o f the genetic algorithm is that of the assignment of
a fitness value for a individual.
I n genetic programming, the representation of the individual is a program
which. when executed under control of a defined virtual machine, implements
some algorithm. It may do this by returning some value (as would be the case
for a system t o learn a specitic Boolean function) or it might do this through the
performance of some task through the use of functions which have side-effects
that act on a simulated (or even the real) world.
The results of the program's execution are evaluated in some way, and this
evaluation represents the fitness of the individual. This fitness is used to drive
the selection process for copying into the next generation or for the selection of
parents to undergo genetic operations yielding offspring. Any selection operator
from those presented in Chapters 22-33 can be used.
There is certainly a desire to evolve programs using genetic programming
that are 'general', that is to say that they will not only correctly process the
fitness cases on which they are evolved, but will process correctly any fitness
cases which could be presented to them. Clearly, in the cases where there are
intinitely many possible cases, such as evolving a general sorting algorithm
(Kinnear 1993), the ekdutionary process can only be driven by a \ a y limited
number of fitness cases. Many of the lessons from machine learning on the
tradeoffs between generality and performance on training cases have been
helpful to genetic programming researchers, particularly those from decision
tree approaches to machine learning (Iba et (11 1994).

11.3 The development of genetic programming

LISP was the language in which the ideas which led to genetic programming
were first dekeloped (Cramer 1985, Koza 1989, 1992). LISP has always been
one of the preeminent language choices for implementation where programs
TEAMprograms
LRN
need to be treated as data. In this case,
are data while they are being

The value of genetic programming

109

evolved, and are only considered executable when they are undergoing fitness
evaluation.
As genetic programming itself evolved in LISP, the programs that were
executed began to look less and less like LISP programs. They continued to be
tree structured but soon few if any of the functions used in the evolved programs
were standard LISP functions. Around I992 many people implemented genetic
programming systems in C and C++, along with many other programming
languages. Today, other than a frequent habit of printing the representation
of tree-structured genetic programs in a LISP-like syntax, there is no particular
connection between genetic programming and LISP.
There are many public domain implementations of genetic programming in
a wide variety of programming languages. For further details, see the reading
list at the end of this section.

11.4 The value of genetic programming

Genetic programming is defined as a variation on the theme of genetic
algorithms through some specific selections of representation, genetic operators
appropriate to that representation, and fitness evaluation as execution of that
representation in a virtual machine. Taken in isolation, these three elements
do not capture the value or promise of genetic programming. What makes
genetic programming interesting is the conceptual shift of the problem being
solved by the genetic algorithm. A genetic algorithm searches for something,
and genetic programming shifts the search from that of parameter discovery
for some existing algorithm designed to solve a problem to a search for a
program (or algorithm) to solve the problem directly. This shift has a number
of ramifications.
0

This conceptualization of evolving computer programs is powerful in part

because it can change the way that we think about solving problems.
Through experience, it has become natural to think about solving
problems through a process of human-oriented program discovery. Genetic
programming allows us to join this approach to problem solving with
powerful EC-based search techniques.
An example of this is a variation of genetic programming called
stack genetic programming (Perkis 1994), where the program is a variablelength linear string of functions and terminals, and the argument passing
is defined to be on a stack. The genetic operators in a linear system
such as this are much closer to the traditional genetic algorithm operators,
but the execution and fitness evaluation (possibly including side-effects) is
equivalent to any other sort of genetic programming. The characteristics of
stack genetic programming have not yet been well explored but it is clear
that it has rather different strengths and weaknesses than does traditional
TEAM LRN
genetic programming.

110

Derivative methods in genetic programming

Many of the approaches to simulation of adaptive behavior involve
simple programs designed to control ariirncits . The conceptualization of
evolving computer programs as presented by genetic programming fits well
with work on evolving adaptive entities (Reynolds 1994, Sims 1994).
There has been a realization that not only can we evolve programs that are
built from human-created functions and terminals, but that the functions
from which they are built can evolve as well. Kozas invention of
automatically defined functions (ADFs) (Koza 1994) is one such example
of this realization. ADFs allow the definitions of certain subfunctions to
evolve even while the functions that call them are evolving. For certain
classes of problems, ADFs result in considerable increases in performance
(Koza 1994, Angeline and Pollack 1993, Kinnear 1994).
Genetic programming is capable of integrating a wide variety of existing
capabilities, and has potential to tie together several complementary
subsystems into an overall hybrid system. The functions need not be simple
arithmetic or logical operators, but could instead be fast Fourier transforms,
GMDH systems, or other complex building blocks. They could even be
the results of other evolutionary computation algorithms.
The genetic operators that create offspring programs from parent programs
are themselves programs. These programs can also be evolved either as
part of a separate process, or in a coevolutionary way with the programs on
which they operate. While any evolutionary computation algorithm could
have parameters that affect the genetic operators be part of the evolutionary
process, genetic programming provides a natural way to let the operators
(defined as programs) evolve directly (Teller 1996, Angeline 1996).
Genetic programming naturally enhances the possibility for increasingly
indirect evolution. As an example of the possibilities, genetic programming
has been used to evolve grammars which, when executed, produce the
structure of an untrained neural network. These neural networks are then
trained. and the trained networks are then evaluated on a test set. The results
of this evaluation are then used as the fitnesses of the evolved grammars
(Gruau 1993).

This last example is a step along the path toward modeling embryonic
development in genetic programming. The opportunity exists to evolve
programs whose results are themselves programs. These resulting programs
are then executed and their values or side-effects are evaluated-and become
the fitness for the original evolving, program creating programs. The analogy
to natural embryonic development is clear, where the genetic material, the
genotype, produces through development a body, the phenotype, which then
either does or does not produce offspring, the fitness (Kauffman 1993).
Genetic programming is valuable in part because we find it natural to
examine issues such as those mentioned above when we think about evolutionary
TEAM LRN
computation from the genetic programming
perspective.

References

111

References
Angeline P J 1996 Two self-adaptive crossover operator\ for genetic programming
Adiunces in Genetic Prograrnrnirig 2 ed P J Angcline and K E Kinnear Jr
(Cambridge, MA: MIT Press)
Angeline P J and Pollack J B 1993 Competitive environments evolve better solutions for
complex tasks Ptm. 5th lnt. Con& on Genetic Algorithrm ( Uri~ctritr-Cliciiiii~tri~~ii,
IL,
July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann)
Cramer N L 1985 A representation of the adaptive generation of simple sequential
programs Proc. 1st Int. Conf on Genetic Algorithm (Pittshi;qh, PA. J u l y 198-5)
ed J J Grefenstette (Hillsdale, NJ: Erlbaum)
Dhaeseleer P 1994 Context preserving crossover in genctic programming 1st IEEE Coiif
oti Eidutionun Coinpiitdon (Orlando, FL, June 1994) (Piscataway, NJ: IEEE)
Gruau F 1993 Genetic synthesis of modular neural networks PI-oc. 5th I i i t . Conf oti
Genetic Algorithins (Urbriiici-Chan2paign,IL, Jul! 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann)
Holland J H 1975 Adrptation in Natural and ArtiJir-itrlS\,steins (Ann Arbor, MI: The
University of Michigan Press)
Iba H, de Garis H and Sato T 1994 Genetic programming using a minimum description
length principle Adimces in Genetic Progmtnniiiig ed K E Kinnear Jr (Cambridge.
MA: MIT Press)
Kauffman S A 1993 The Origins of Order: SelfOrg.citii~crtiori( i i i d Selwtioti iii E\dirtioii
(New York: Oxford University Press)
Kinnear K E Jr 1993 Generality and difficulty in genetic programming: evolving a sort
Proc. 5th Inr. Conf. on Genetic Algorithms ( ~ r h c l l i c l - ~ h c ~ n i ~ ~IL,
r r i July,
g n , 199-3) ed
S Forrest (San Mateo, CA: Morgan Kaufmann)
-I994 Alternatives in automatic function definition: a comparison of performance
A d i w i c ~ sin Genetic Programming ed K E Kinnear Jr (Cambridge. MA: MIT Pre\\)
Koza J R 1989 Hierarchical genetic algorithms operating on population\ of computer
programs Proc. I I th Int. Joint Cot$ on Artijkicil liitc4ligcwcx~(San Mateo. CA:
Morgan Kaufmann)
Nordin P 1994 A compiling genetic programming system that directly manipulates the
machine code Ad\mce.\ in Genetic ProRraniniing ed K E Kinnex Jr (Cambridge,
MA: MIT Press)
Perkis T 1994 Stack-based genetic programming Proc. 1st IEEE Irit. Conf o ~ i
Eivlutiotiar~Contputatioti (Orlando, FL, June 1994) (Piscataway, NJ: IEEE)
Reynolds C R 1994 Competition, coevolution and the game of tag Artijkitrl Ljfi IV:
Proc. 4th Int. Workshop on the Synthesis anti Sitnuldoii of Liiiiig Systetiis cd R A
Brooks and P Maes (Cambridge, MA: MIT Press)
Sims K 1994 Evolving 3D morphology and behavior by competition Artificitrl L i p I\/:
Proc. 4th Int. Workshop on the Synthesis urid Siinulntioti of Liiirig Sjstcwis ed R A
Brooks and P Maes (Cambridge, MA: MIT Press)
Teller A 1996 Evolving programmers: the co-evolution of intelligent recombination
operators Advance.\ in Genetic Prograrnrning 2 ed P J Angeline and K E Kinnear
Jr (Cambridge, MA: MIT Press) TEAM LRN

Derivative methods in genetic programming

112
~~~~~

Further reading
1. Koza J R 1992 Genetic Progrcirnrnirig (Cambridge, MA: MIT Press)
The tirst book on the subject. Contains full instructions on the possible details of
carrying out genetic programming, as well as a complete explanation of genetic
algorithms (on which genetic programming is based). Also contains I I chapters
showing applications of genetic programming to a wide variety of typical artificial
intelligence, machine learning, and sometimes practical problems. Gives many
examples of how to design a representation of a problem for genetic programming.
3
Kom J R 1994 Genetic. Progrcrrtrtning / I (Cambridge, MA: MIT Press)
-.

A book principally about automatically defined functions ( ADFs). Shows the

applications of ADFs to a wide variety of problems. The problems shown in this
volume are considerably more complex than those shown in Genetic Progruntnting,
and there is much less introductory material.
3. Kinnear K E Jr (ed) 1994 Achwtc.es in Genetic Progrutntning (Cambridge, MA:
MIT Press)

Contains a short introduction to genetic programming, and 22 research papers on

the theory, implementation, and application of genetic programming. The papers are
typically longer than those i n a technical conference and allow a deeper exploration
of the topics involved. Shows a wide range of applications of genetic programming.
as well as useful theory and practice in genetic programming implementation.

4. Forrest S (ed) 1993 Proc. 5th Int. Cot$ on Genetic Algorithtns ( Urbuttu-C~z~it?lp~ii~tz,
IL. July 1993) (San Mateo, CA: Morgan Kaufmann)
Contains several interesting papers on genetic programming.

5. 1994 1st IEEE Cot$ on Evo/iitionur?. Compurution (Orfundo, FL, June 1994)
(Piscataway, NJ: IEEE)
Contains many papers on genetic programming as well as a wide assortment of
other EC-based papers.
6. Eshelman L J (ed) 1995 Proc-.6th lnt. Conj: on Genetic-Algorithms (Pittsburgh, PA,
July IYY.5) (Cambridge, MA: MIT Press)

Contains a considerable number of applications of genetic programming to

increasingly diverse areas.
7 . Angeline P J and Kinnear K E Jr (eds) 1996 Advunces in Genetic Progrumming I1

(Cambridge, MA: MIT Press)

A volume devoted exclusively to research papers on genetic programming, each
longer and more in depth than those presented i n most conference proceedings.

8. Kauffman S A 1993 The Origins of Order: Selj-Orguttixtiotz und Selection in

TEAM LRN
Etdiction (New York: Oxford University
Press)

Further read i n g

113

A tour-de-force of interesting ideas, many of them applicable to genetic

programming as well as other branches of e\wlutionary computation.

9. ft p. i ().corn pu b/gene t ic-prog rammi ng

An anonymous ftp site with considerable public domain information and
implementations of genetic programming systems. This is a yolunteer site, so its
lifetime is unknowm.

TEAM LRN

12
Learning classifier systems
Robert E Smith

12.1 Introduction
The learning classifier system (LCS) (Goldberg 1989, Holland et ul 1986) is
often referred to as the primary machine learning technique that employs genetic
algorithms (GAS). It is also often described as ii production system framework
with a genetic algorithm as the primary rule discovery method. However, the
details of LCS operation vary widely from one implementation to another. In
fact, no standard version of the LCS exists. In many ways, the LCS is more
of a concept than an algorithm. To explain details of the LCS concept, this
article will begin by introducing the type of machine learning problem most
often associated with the LCS. This discussion will be followed by a overview
of the LCS, in its most common form. Final sections will introduce the more
complex issues involved in LCSs.

12.2 Qpes of learning problem

To introduce the LCS, it will be useful to describe types of machine learning
problem. Often, in the literature, machine learning problems are described in
terms of cognitive psychology or animal behavior. This discussion will attempt
to relate the terms used in machine learning to engineering control.
Consider the generic control problem shown in figure 12.1. In this
problem, inputs from an external control system, combined with uncontrollable
disturbances from other sources, change the state of the plant. These changes
in state are reflected in the state information provided by the plant. Note that,
in general, the state information can be incomplete and noisy.
Consider the supen7ised learning problem shown in figure 12.2 (Barto 1990).
In this problem, an inverse plant model (or teacher) is available that provides
errors directly in terms of control actions. Given this direct error feedback, the
parameters of the control system can be adjusted by means of gradient descent,
to minimize error in control actions. Note that this is the method used in the
neural network backpropagation algorithm.
114

TEAM LRN

Types of learning problem

115

Figure 12.2. A supervised learning problem.

Nob consider the reiizfbrceinerzt learriirzg problem shown in figure 12.3

(Barto 1990). Here, no inverse plant model is available. However, a critic
is available that indicates error in the state information from the plant. Because
error is not directly provided in terms of control actions. the parameters of the
controller cannot be directly adjusted by methods such as gradient descent.

Figure 12.3. A reinforcement learning problem.

The remaining discussion will consider the control problem to operate as a

Markov decision problem. That is, the control problem operates in discrete time
steps. the plant is always in one of a finite number of discrete states, and a finite,
TEAM LRN
discrete number of control actions are available. At each time step, the control

I I6

Learning classifier systems

action alters the probability of moving the plant from the current state to any
other state. Note that deterministic environments are a specific case. Although
this discussion will limit itself to discrete problems, most of the points made
can be related directly to continuous problems.
A characteristic of many reinforcement learning problems is that one may
need to consider a sequence of control actions and their results to determine
how to improve the controller. One can examine the implications of this by
associating a rerturd or cost with each control action. The error in state in
figure 12.3 can be thought of as a cost. One can consider the long-term effects
of an action formally as the expected, iriJitiite-lzori,7ctidiscmmted cvst:

h'c,

where 0 5 h 5 1 is the discount parameter, and c, is the cost of the action

taken at time t .
To describe a strategy for picking actions, consider the following approach:
for each action 14 associated with a state i , assign a value Q ( i ,U ) . A 'greedy'
strategy is to select the action associated with the best Q at every time step.
Therefore, an optimum setting for the Q-values is one in which a 'greedy'
strategy leads to the minimum expected, infinite-horizon discounted cost. Qlearning is a method that yields optimal Q-values in restricted situations.
Consider beginning with random settings for each Q-value, and updating each
Q-value on-line as follows:

where min Q ( j , L i f t ] ) is the minimum Q available in state j , which is the state

arrived in after action u f is taken in state i (Barto et a1 1991, Watkins 1989).
The parameter ac is a learning rate parameter that is typically set to a small
value between zero and one. Arguments based on dyncimic progr~rmmitzgand
Bellmcui opfirnality show that if each state-action pair is tried an infinite number
of times, this procedure results in optimal Q-values. Certainly, it is impractical
to try every state-action pair an infinite number of times. With finite exploration,
Q-values can often be arrived at that are approximately optimal. Regardless of
the method employed to update a strategy in a reinforcement learning problem,
this exploration-exploitation dilemma always exists.
Another difficulty in the Q-value approach is that it requires storage of
a separate Q-value for each state-action pair. In a more practical approach,
one could store a Q-value for a group of state-action pairs that share the
same characteristics. However, it is not clear how state-action pairs should
be grouped. In many ways, the LCS can be thought of as a GA-based technique
for grouping state-action pairs. TEAM LRN

Learning classifier system introduction

117

12.3 Learning classifier system introduction

Consider the following method for representing a state-action pair in a
reinforcement learning problem: encode a state in binary, and couple it to an
action, which is also encoded in binary. In other words, the string

0 1 1 0 / 0 1 0
represents one of 16 states and one of eight actions. This string can also be
seen as a rule that says IF in state 0 1 1 0,THEN take action 0 1 0. In an
LCS, such a rule is called a classifier. One can easily associate a Q-value, or
other performance measures, with any given classifier.
Now consider generalizing over actions by introducing a dont care
character (#) into the state portion of a classifier. In other words, the string
# 1 1 # / 0 1 0
is a rule that says IF in state 0 1 1 0 OR state 0 1 1 1 OR state 1 1 1 0 OR
state 1 1 1 1, THEN take action 0 1 0.The introduction of this generality
allows an LCS to represent clusters of states and associated actions. By using
the genetic algorithm to search for such strings, one can search for ways of
clustering states together, such that they can be assigned joint performance
statistics, such as Q-values.
Note, however, that Q-learning is not the most common method of credit
assignment in LCSs. The most common method is called the bucket brigade

TEAM LRN

Figure 12.4. The structure of an LCS.

Learning classifier systems

1 I8

trlgorihi for updating a classifier performance statistic called sfrengtk. Details

of the bucket brigade algorithm will be introduced later in this section.
The structure of a typical LCS is shown in figure 12.4. This is what is
known as a .stimirlus-r~.spoiz.~~
LCS, since no internal messages are used as
memory. Details of internal message posting in LCSs will be discussed later.
In this system, detectors encode state information from an environment into
binary messages, which are matched against a list of rules called classifiers.
The classifiers used are of the form

IF (condition) THEN (action).

The operational cycle of this LCS is:
(i)
(ii)
(iii)
(iv)

Detectors post environmental messages on the message list.

All classifiers are matched against all messages on the message list.
Fully matched classifiers are selected to act.
A conflict resolution (CR) mechanism narrows the list of active classifiers
to e 1i m i nate contradictory act ions.
( v ) The message list is cleared.
(vi) The CR-selected classifiers post their messages.
(vii) Effectors read the messages from the list, and take appropriate actions in
the environment.
(viii) If a reward (or cost) signal is received. it is used by a credit allocation
(CA) system to update parameters associated with the individual classifiers
(such as the traditional strength measure, Q-like values, or other measures
(Booker 1989, Smith I99 1 )).
12.4

Michigan and Pitt style learning classifier systems

There are two methods of using the genetic algorithm in LCSs. One is for each
genetic algorithm population member to represent an entire set of rules for the
problem at hand. This type of LCS is typified by Smiths LS-I which was
developed at the University of Pittsburgh. Often, this type of LCS is called
the Pitt approach. Another approach is for each genetic algorithm population
member to represent a single rule. This type of LCS is typified by the CS1 of Holland and Reitman (1978), which was developed at the University of
Michigan, and is often called the Michigan approach.
In the Pitt approach, crossover and other operators are often employed
that change the number of rules in any given population member. The Pitt
approach has the advantage of evaluating a complete solution within each
genetic algorithm individual. Therefore, the genetic algorithm can converge to a
homogeneous population, as in an optimization problem, with the best individual
located by the genetic algorithm search acting as the solution. The disadvantage
is that each genetic algorithm population member must be completely evaluated
as a rule set. This entails a large computational expense, and may preclude
on-line learning in many situations.TEAM LRN

119

The bucket brigade algorithm (implicit form)

In the 'Michigan' approach, one need only evaluate a single rule set, that
comprised by the entire population. However, one cannot use the usual genetic
algorithm procedures that will converge to a homogeneous population, since one
rule is not likely to solve the entire problem. Therefore, one must c.oe\wlr?ea
set of cooperative rules that jointly solve the problem. This requires a genetic
algorithm procedure that yields a diverse population at steady state, in a fashion
that is similar to sharing (Deb and Goldberg 1989, Goldberg and Richardson
1987), or other multimodal genetic algorithm procedures. I n some cases simply
dividing reward between similar classifiers that fire can yield sharing-like effects
(Horn et a1 1994).
12.5

The bucket brigade algorithm (implicit form)

As was noted earlier, the bucket brigade algorithm is the most common form of
credit allocation for LCSs. In the bucket brigade, each classifier has a strength,
S , which plays a role analogous to a @value. The bucket brigade operates as
follows:

(i)
(ii)
(iii)
(iv)

Classifier A is selected to act at time t .

Reward I-, is assigned in response to this action.
Classifier B is selected to act at time t
1.
The strength of classifier A is updated as follows:

s:+' = S ~ <- aI > + a [r, + ( A s B ) J .

(v) The algorithm repeats.
Note that this is the implicit form of the bucket brigade, first introduced by
Wilson (Goldberg 1989, Wilson 1985).
Note that this algorithm is essentially equivalent to Q-learning, but with one
important difference. In this case, classifier A's strength is updated with the
strength of the classifier that actually acts (classifier B). In Q-learning. the Qvalue for the rule at time t is updated with the best (2-valued rule that matches
I or not.
the state at time t 1 , whether that rule is selected to act at time r
This difference is key to the convergence properties associated with Q-learning.
However, it is interesting to note that recent empirical studies have indicated
that the bucket brigade (and similar procedures) may be superior to Q-learning
in some situations (Rummery and Niranjan 1994, Twardowski 1993).
A wide variety of variations of the bucket brigade exits. Some include a
variety of taxes, which degrade strength based on the number of times a classifier
has matched and fired and the number of generations since the classifier's
creation. or other features. Some variations include a variety of methods for
using classifier strength in conflict resolution through strength-based biddirtg
procedures (Holland et a1 1986). However, how these techniques fit into the
broader context of machine learning, through similar algorithms such as QTEAM LRN
learning, remains a topic of research.

I20

Learning classifier systems

In many LCSs,strength is used as fitness in the genetic algorithm. However,

a promising recent study indicates that other measures of classifier utility may
be more effective (Wilson 1995).

12.6 Internal messages

The LCS discussed to this point has operated entirely in stimulus-response
mode. That is, it possesses no internal memory that influences which rule
tires. In a more advanced form of the LCS. the action sections of the rule
are internal messages that are posted on the message list. Classifiers have a
condition that matches environmental messages (those which are posted by the
environment) and a condition that matches internal messages (those posted by
other classifiers). Some internal messages will cause effectors to fire (causing
actions in the environment), and others simply act as internal memory for the
LCS.
The operational cycle of a LCS with internal memory is as follows:
Detectors post environmental messages on the message list.
All classifiers are matched against all messages on the message list.
Fully matched classifiers are selected to act.
A conflict resolution (CR)mechanism narrows the list of active classifiers
to eliminate contradictory actions, and to cope with restrictions on the
number of messages that can be posted.
The message list is cleared.
The CR-selected classifiers post their messages.
Effectors read the messages from the list, and take appropriate actions in
the environment.
(viii) If a reward (or cost) signal is received, i t updates parameters associated
with the individual classifiers.

In LCSs with internal messages, the bucket brigade can be used in its
original, explicit form. In this form, the next rule that acts is linked to the
previous rule through an internal message. Otherwise, the mechanics are similar
to those noted above. Once classifiers are linked by internal messages, they can
form rrdr drtrirzs that express complex sequences of actions.

12.7 Parasites
The possibility of rule chains introduced by internal messages, and by payback
credit allocation schemes such as the bucket brigade or @learning, also
introduces the possibility of rule parasites. Simply stated, parasites are rules that
TEAM LRN
in ;1 rule chain or a sequence of LCS
obtain fitness through their participation

Variations of the learning classification system

121

actions, but serve no useful purpose in the problem environment. In some cases,
parasite rules can prosper, while actually degrading overall system performance.
A simple example of parasite rules in LCSs is given by Smith (1994). In this
study, a simple problem is constructed where the only performance objective is
to exploit internal messages as internal memory. Although fairly effective rule
sets were evolved in this problem, parasites evolved that exploited the bucket
brigade, and the existing rule chains, but that were incorrect for overall system
performance. This study speculates that such parasites may be an inevitable
consequence in systems that use temporal credit assignment (such as the bucket
brigade) and evolve internal memory processing.

12.8 Variations of the learning classification system

As was stated earlier, this article only outlines the basic details of the LCS
concept. It is important to note that many variations of the LCS exist. These
include:

Variations in representation and matching proceditres. The { I , 0, #I

representation is by no means defining to the LCS approach. For instance.
Valenzuela-Rendon ( 199 1 ) has experimented with a fuzzy representation
of classifier conditions, actions, and internal messages. Higher-cardinality
alphabets are also possible. Other variations include simple changes
in the procedures that match classifiers to messages. For instance,
sometimes partial matches between messages and classifier conditions
are allowed (Booker 1982, 1985). In other systems, classifiers have
multiple environmental or internal message conditions. In some suggested
variations, multiple internal messages are allowed on the message list at
the same time.
Variations in credit assignment. As was noted above, a variety of credit
assignment schemes can be used in LCSs. The examination of such
schemes is the subject of much broader research in the reinforcement
learning literature. Alternate schemes for the LCS prominently include
epochnl techniques, where the history of reward (or cost) signals is recorded
for some period of time, and classifiers that act during the epoch are updated
simultaneously.
Variatioizs in discovery operators. In addition to various versions of the
genetic algorithm, LCSs often employ other discovery operators. The most
common nongenetic discovery operators are those which create new rules
to match messages for which no current rules exist. Such operators are
often called create, colwing, or guessing operators (Wilson 1985). Other
covering operators are suggested that create new rules that suggest actions
not accessible in the current rule set (Riolo 1986, Robertson and Riolo
TEAM LRN
1988).

122

Learning c 1ass i ti er sy stem s

12.9 Final comments

As was stated in section 12.1, the LCS remains a concept, more than a specitic
algorithm. Therefore, some of the details discussed here are necessarily sketchy.
However, recent research on the LCS is promising. For a particularly clear
examination of a simplified LCS, see a recent article by Wilson (1994). This
article also recommends clear ak'enues for LCS research and development.
Interesting LCS applications are also appearing in the literature (Smith and
Dike 1995).
Given the robust character of evolutionary computation algorithms, the
machine learning techniques suggested by the L,CS concept indicate a powerful
avenue of future evolutionary computation application.

References
Barto A G 1990 Soirie Lvtrrriirig Tusks jrom Li Control Perspec.ri\>eCOINS Technical
Report 90- 122, University of Massachusetts
Barto A G, Bradtke S J and Singh S P 1991 Rrcil-tiirir Leciriiirtg ciml Coiitml i r s i i i g
A.s~ti(~hro~ioii.s
Dyimriic- Progrtrmmiii~COINS Technical Report 9 1-57, University
of Massachusetts
Booker L B 1982 Intelligent behavior as an adaptation to the task environment
Dis.sertrition.sAhstnrc-ts lilt. 43 469B; University Microfilms 82 I4966
-1985 Improving the pertornmancc of genetic algorithms in classifier systems Prot..
Itit. Col$ 0 1 1 Genetic Aigorithms ciiid T h i r Applic~irionspp 80-92
-I989 Triggered rule discovery in classifier systems Proc. 3rd Int. Corzj: 0 1 1 G e i i i 4 .
Algorithms (Fairfus, VA, Jitiie 1989) ed J D Schaffer (San Mateo. CA: Morgan
Kaufmann) pp 265-74
Deb K and Goldberg D E I989 An investigation o f niche and species formation in genetic
function optimization Proc. 3rd hit, Coi$ oii Genetic Algoritkms (Fciirjh-, VA, Jirrir
1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 32-50
Goldberg D E 1989 Gerietic A1gorithm.s in Selrrdi, Optiirii;.~itioii,ctiitl Mtrc-hine Leciriiiiig
(Reading, MA: Addison-Wesley)
Goldberg D E and Richardson J 1987 Genetic algorithms with sharing !or multimodal
function optimiLation Proc. 2nd Irit. Coi$ 0 1 1 Cmutic Algorithrrrs (Cm~ihritl~qr.
M A,
1987) ed J J Grefenstette (Hillsdale. NJ: Erlbuum) pp 41-9
Holland J H, Holyoak K J , Nisbett R E and Thagard P R 1986 Itidiic~tioii:Proc~c~.s.~c~.s
of
Iizfkreiicx~,Letrrtiing, mid D i . s c - o \ ~(Cambridge,
n~
MA: MIT Press)
Holland J H and Reitman J S 1978 Cognitive systems based on adaptive algori~hms
Ptrtterti Directed li!fereiit~rSystriiis ed D A Waterman and F Hayes-Roth (New
York: Academic) pp 3 13-24,
Horn J , Goldberg D E and Deb K I993 Implicit niching in a learning classifier system:
Nature's way E\dittioncir~ Coiripitt. 2 37-66
Riolo R L 1 986 C'FS-c':ti Pwktige of' Domuin I~rclept~~i~lrtit
Sirhroirtinrsfi)r lmpirtiieiitiri~~
Clnssifirr S!xtriirs ii r A rhitra n User-dejiirud E'ii ~iroiirrrcw
t s U n i ve rs i t y of M i c h i ga n ,
Logic of Computers Croup, Technical Report
Robertson G G and Riolo R 1988 A tale of two classifier systems Mnchine Lwriiiii,q 3
TEAM LRN
139-60

References

123

Rummery G A and Niranjan M 1994 Oil-liiie Q-Lcwriiiiig i t s i r i g Coiiiiec~tioni.stS!~.steni.v

Cambridge University Technical Report CUED/F-INFENGRR 166
Smith R E 199I DrJilltlr Hierurchy Forni(itiori ~ i i i dMenior:\.Esploitatiori I i i Lctirr1iiig
Clrssijur STstems University of Alabama TCGA Report 9 1003; PhD Dissertation;
University Microfilms 9 1-30265
Smith R E and Dike B A 1995 Learning novel fighter combat maneuver rules via genetic
algorithms Int. J. Espert Syst. 8 84-94
Twardowski K 1993 Credit assignment for pole balancing with learning classifier systems
Proc-. 5th Irit. Con/: 0 1 1 Genetic. Algorithnis ( Urbccrirr-Chtrni~?cii~~~i,
IL. Jirlj. 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 2 3 8 4 5
Valenzuela-Rend6n M 199 I The fuzzy classifier system: a classifier system for
continuously varying variables Proc. 4th h i t . Con/: o i i Geiiotic AIgorithi?i.s(Sciri
Diego, CA, July l 9 9 i ) ed R Belew and L Booker (San Mareo, CA: Morgan
Kaufmann) pp 346-53
Watkins J C H 1989 Learning with delayed rewards
Wilson S W 1985 Knowledge growth in an artificial animal Proc. lilt. Col$ on Gcwric.
Algorithiws ccnd Their Applications pp 16-23
-I994 ZCS: a zeroth level classifier system E\~)lirtioritri?~
Coniput. 2 1 - 18
-I
995 Classifier fitness based or1 accuracy E\dictioiicii;v Compirt. 3 139-76

TEAM LRN

13
Hybrid methods
Zbign ie w Michale w icz
There is some experimental evidence (Davis 1991, Michalewicz 1993) that the
enhancement of evolutionary methods by some additional (problem-specific)
heuristics, domain knowledge, or existing algorithms can result in a system
with outstanding performance. Such enhanced systems are often referred to as
hjhrid e v ol ut i on ary systems.
Several researchers have recognized the potential of such hybridization of
evolutionary systems. Davis ( 199 1, p 56) wrote:
When I talk to the user, I explain that niy plan is to hybridize the
genetic algorithm technique and the current algorithm by employing
the following three principles:
0

Use the Cirrrent Encoding. Use the current algorithm's encoding

technique in the hybrid algorithm.
Hybridix Where Possible. Incorporate the positive features of the
current algorithm in the hybrid algorithm.
Acicipt the Getietir- Opt-utors. Create crossover and mutation
operators for the new type of encoding by analogy with bit
string crossover and mutation operators. Incorporate domainbased heuristics as operators as well.

[...I I use the term hybrid genetic algorithm for algorithms created by
applying these three principles.
The above three principles emerged as a result of countless experiments of
many researchers, who tried to 'tune' their evolutionary algorithms to some
problem at hand, that is, to create 'the best' algorithm for a particular class
of problems. For example, during the last 15 years, various applicationspecific variations of evolutionary algorithms have been reported (Michalewicz
1996); these variations included variable-length strings (including strings whose
elements were ij4tzeti-else rules), richer structures than binary strings, and
experiments with modified genetic operators to meet the needs of particular
applications. Some researchers (e.g. Grefenstette 1987) experimented with
incorporating problem-specific knowledge into the initialization routine of an
evolutionary system; if a (fast) heuristic algorithm provides individuals of the
124

TEAM LRN

Hybrid methods

125

initial population for an evolutionary system, such a hybrid evolutionary system

is guaranteed to do no worse than the heuristic algorithm which was used for
the initialization.
Usually there exist several (better or worse) heuristic algorithms for a given
problem. Apart from incorporating them for the purpose of initialization, some
of these algorithms transform one solution into another by imposing a change
in the solutions encoding (e.g. 2-opt step for the traveling salesman problem).
One can incorporate such transformations into the operator set of evolutionary
system, which usually is a very useful addition.
Note also (see Chapters I4 and 3 I ) that there is a strong relationship between
encodings of individuals in the population and operators, hence the operators
of any evolutionary system must be chosen carefully in accordance with the
selected representation of individuals. This is a responsibility of the developer
of the system; again, we would cite Davis (1991, p 58):
Crossover operators, viewed in the abstract are operators that combine
subparts of two parent chromosomes to produce new children. The
adopted encoding technique should support operators of this type, but
it is up to you to combine your understanding of the problem, the
encoding technique, and the function of crossover in order to figure
out what those operators will be. [...I
The situation is similar for mutation operators. We have decided
to use an encoding technique that is tailored to the problem domain;
the creators of the current algorithm have done this tailoring for
us. Viewed in the abstract, a mutation operator is an operator that
introduces variations into the chromosome. [...I these variations can
be global or local, but they are critical to keeping the genetic pot
boiling. You will have to combine your knowledge of the problem,
the encoding technique, and the function of mutation in a genetic
algorithm to develop one or more mutation operators for the problem
domain.
Very often hybridization techniques make use of local search operators,
which are considered as intelligent mutations. For example, the best
evolutionary algorithms for the traveling salesman problem use 2-opt or 3-opt
procedures to improve the individuals in the population (see e.g. Muhlenbein ef
a1 1988). It is not unusual to incorporate gradient-based (or hill-climbing)
methods as ways for a local improvement of individuals. It is also not
uncommon to combine simulated annealing techniques with some evolutionary
algorithms (Adler 1993).
The class of hybrid evolutionary algorithms described so far consists
of systems which extend evolutionary paradigm by incorporating additional
features (local search, problem-specific representations and operators, and
the like). This class also includes also so-called morphogenic evolutionary
LRN
techniques (Angeline 1995), whichTEAM
include
mappings (development functions)

I26

Hybrid methods

between representations that evolve (i.e. evolved representations) and

representations which constitutes the input for the evaluation function (i.e.
evaluated representations). However, there is another class of evolutionary
hybrid methods, where the evolutionary algorithm acts as a separate component
of a larger system. This is often the case for various scheduling systems, where
the evolutionary algorithm is just responsible for ordering particular items. This
is also the case for fuzzy systems, where the evolutionary algorithms may control
the membership function, or of neural systems, where evolutionary algorithms
may optimize the topology or weights of the network.

References
Adler D 1993 Genetic algorithms and simulated annealing: a marriage propo5al Proc.
IEEE lnt. Cot$ o t i Neurd Nehtwrks pp 1 104-9
Angeline P J 1995 Morphogenic evolutionary computation: introduction, issues. and
examples Proc. 4th Anti. Cot$ o t i E\*olutiotiury Progrmnming (Sun Diego, CA,
March l9Y5) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge. M.4:
MIT Press) pp 387-401
Davis L 1991 Handbook of Genetic Al,ip-ithtm (New York: Van Nostrand Reinhold)
Grefenstette J J 1987 Incorporating problem specific knowledge into genetic algorithms
Genetic. Aigorithm utid Siniuiuted Aiitieaiittg ed L Davi5 (Lo\ AI tos, CA: Morgan
Kaufmann) pp 42-60
Michalewicz 2 I993 A hierarchy of evolution programs: an experimental study
E,dictionqt Cotnplct. 1 5 1-76
-1996 Genetic Algorithms + D(itLi Struc-tures = E , d u t i o n Pmgmtm 3rd edn (New
York: Springer)
Miihlenbein H, Gorges-Schleuter M and Kramer 0 1988 Evolution algorithms in
combinatorial optimimtion Purdlel Cotnput. 7 65-85

TEAM LRN

14
Introduction to representations
Kalyanrnoy Deb

14.1 Solutions and representations

Every search and optimization algorithm deals with solutions, each of which
represents an instantiation of the underlying problem. Thus, a solution must
be such that it can be completely realized in practice; that is, either it can
be fabricated in a laboratory or in a workshop or it can be used as a control
strategy or it can be used to solve a puzzle, and so on. In most engineering
problems, a solution is a real-valued vector specifying dimensions to the key
parameters of the problem. In control system problems, a solution is a timeor frequency-dependent functional variation of key control parameters. In game
playing and some artificial-intelligence-related problems, a solution is a strategy
or an algorithm for solving a particular task. Thus, it is clear that the meaning
of a solution is inherent to the underlying problem.
As the structure of a solution varies from problem to problem, a solution
of a particular problem can be represented in a number of ways. Usually, a
search method is most efficient in dealing with a particular representation and
is not so efficient in dealing with other representations. Thus, the choice of an
efficient representation scheme depends not only on the underlying problem but
also on the chosen search method. The efficiency and complexity of a search
algorithm largely depends on how the solutions have been represented and how
suitable the representation is in the context of the underlying search operators.
In some cases, a difficult problem can be made simpler by suitably choosing a
representation that works efficiently with a particular algorithm.
In a classical search and optimization method, all decision variables are
usually represented as vectors of real numbers and the algorithm works on
one vector of solution to create a new vector of solution (Deb 1995. Reklaitis
et a1 1983). Different EC methods use different representation schemes
in their search process. Genetic algorithms (GAS) have been mostly used
with a binary string representing the decision variables. Evolution strategy
and evolictionary programming studies have used a combination of realTEAM LRN

127

128

Introduction to representations

valued decision variables and a set of strategy parameters as a solution vector.

In genetic programming, a solution is a LISP code representing a strategy or
an algorithm for solving a task. In permutation problems solved using an
EC method, a series of node numbers specifying a complete permutation is
commonly used as a solution. In the following subsection, we describe a number
of important representations used in EC studies.

14.2 Important representations

In most applications of GAS, decision variables are coded in binary strings
of 1s and OS. Although the variables can be integer or real valued, they are
represented by binary strings of a specific length depending on the required
accuracy in the solution. For example, a real-valued variable x, bounded in
the range ( c i , h ) can be coded in five-bit strings with the strings (00000) and
(11111) representing the real values ci and h, respectively. Any of the other
30 strings represents a solution in the range ( a ,h ) . Note that, with five bits,
the maximum attainable accuracy is only ( h - a)/(2' - 1). Binary coding is
discussed further in Chapter 15. Although binary string coding has been most
popular in GAS, a number of researchers prefer to use Gray coding to eliminate
the Hamming cliff problem associated with binary coding (Schaffer et nl 1989).
In Gray coding, the number of bit differences between any two consecutive
strings is one, whereas in binary strings this is not always true. However, as
in the binary strings, even in Gray-coded strings a bit change in any arbitrary
location may cause a large change in the decoded integer value. Moreover,
the decoding of the Gray-coded strings to the corresponding decision variable
introduces an artificial nonlinearity in the relationship between the string and
the decoded value.
The coding of the variables in string structures make the search space discrete
for GA search. Therefore, in solving a continuous search space problem,
GAS transform the problem into ;i discrete programming problem. Although
the optimal solutions of the original continuous search space problem and the
derived discrete search space problem may be marginally different (with large
string lengths), the obtained solutions are usually acceptable in most practical
search and optimization problems. Moreover, since GAS work with a discrete
search space, they can be conveniently used to solve discrete programming
problems, which are usually difficult to solve using traditional methods.
The coding of the decision variables in strings allows GAS to exploit the
similarities among various strings in a population to guide the search. The
similarities in string positions are represented by ternary strings (with 1, 0, and
*, where a * matches a 1 or a 0) known as schema. The power of GA search
is considered to lie in the implicit parallel schema processing.
Although string codings have been mostly used in GAS, there have been
some studies with direct recil-\~iluedvectors in GAS (Deb and Agrawal 1995.
LRN
and Schaffer
1993, Wright 1991). In those
Chaturvedi et ell 1995, Eshelman TEAM

I29

Important representations

applications, decision variables are directly used and modified genetic operators
are used to make a successful search. A detailed discussion of the real-valued
vector representations is given in Chapter 16.
In evolution strategy (ES) and evolutionary programming (EP) studies, a
natural representation of the decision variables is used where a real-valued
solution vector is used. The numerical values of the decision variables are
immediately taken from the solution vector to compute the objective function
value. In both ES and EP studies, the crossover and mutation operators are used
variable by variable. Thus, the relative positioning of the decision variables in
the solution vector is not an important matter. However, in recent studies of
ES and EP, in addition to the decision variables, the solution vector includes a
set of strategy parameters specifying the variance of search mutation for each
variable and variable combinations. For n decision variables, both methods use
an additional number between one and n ( n 1)/2 such strategy parameters,
depending on the degree of freedom the user wants to provide for the search
algorithm. These adaptive parameters control the search of each variable,
considering its own allowable variance and covariance with other decision
variables. We discuss these representations in Section 16.2.
In permutation problems, the solutions are usually a vector of node identifiers
representing a permutation. Depending on the problem specification, special care
is taken in creating valid solutions representing a valid permutation. In these
problems, the absolute positioning of the node identifiers is not as important as
the relative positioning of the node identifiers. The representation of permutation
problems is discussed further in Chapter 17.
In early EP works, finite-state machines were used to evolve intelligent
algorithms which were operated on a sequence of symbols so as to produce an
output symbol which would maximize the algorithms performance. Finite-state
representations were used as solutions to the underlying problem. The input
and output symbols were taken from two different finite-state alphabet sets. A
solution is represented by specifying both input and output symbols to each link
connecting the finite states. The finite-state machine tran forms a sequence of
input symbols to a sequence of output symbols. The finite-state representations
are discussed in Chapter 18.
In genetic programming studies, a solution is usually a LISP program
specifying a strategy or an algorithm for solving a particular task. Functions
and terminals are used to create a valid solution. The syntax and structure of
each function are maintained. Thus, if an OR function is used in the solution,
at least two arguments are assigned from the terminal set to make a valid OR
operation. Usually, the depth of nestings used in any solution is restricted to
a specified upper limit. In recent applications of genetic programming, many
special features are used in representing a solution. As the iterations progress,
a part of the solution is frozen and defined as a metafunction with specified
arguments. We shall discuss these features further in Chapter 19.
TEAM LRN of a solution is important in the
As mentioned earlier, the representation

130

Introduction to representations

working of a search algorithm, including evolutionary algorithms. In EC studies,

although a solution can be represented in a number of ways, the efficacy of a
representation scheme cannot be judged alone; instead it depends largely on the
chosen recombination operators. In the context of schema processing and the
building block hypothesis, it can be argued that a representation that allows good
yet important combinations of decision variables to propagate by the action of the
search operators is likely to perform well. Radcliffe (1993) outlines a number of
properties that a recombination operator must have in order to properly propagate
good building blocks. Kargupta et c d (1992) have shown that the success of
GAS in solving a permutation problem coded by three different representations
strongly depends on the appropriate recombination operator used. Thus, the
choice of a representation scheme must not be made alone, but must be made
in conjuction with the choice of the search operators. Guidelines for a suitable
representation of decision variables are discussed in Chapter 20.

14.3 Combined representations

In many search and optimization problems, the solution vector may contain
different types of variable. For example, in a mixed-integer programming
problem (common to many engineering and decision-making problems) some
of the decision variables could be real valued and some could be integer valued.
In an engineering gear design problem, the number of teeth in a gear and the
thickness of the gear could be two important design variables. The former
variable is an integer variable and the latter is a real-valued variable. If
the integer variable is coded in five-bit binary strings and the real variable
is coded in real numbers, a typical mixed-string representation of the above
gear design problem may look like (10011 2 3 . 5 ) , representing 19 gear teeth
and a thickness of 23.5 mm. Sometimes, the variables could be of different
types. In a typical civil engineering truss structure problem, the topology of
the truss (the connectivity of the truss members represented as presence or
absence of members) and the member cross-sectional areas (real valued) are
usually the design decision variables. These combined problems are difficult
to solve using traditional methods, simply because the search rule in those
algorithms does not allow mixed representations. Although there exists a number
of mixed-integer programming algorithms such as the branch-and-bound method
or the pencilfy .fitnc*tion method, these algorithms treat the discrete variables
as real valued and impose an artificial pressure for these solutions to move
towards the desired discrete values. This is achieved either by adding a set of
additional constraints or by penalizing infeasible solutions. These algorithms, in
general, require extensive computations. However, the string representation of
variables in GAS and the flexibility of using a discrete probability distribution
for creating solutions in ES and EP studies allow them to be conveniently
used to solve such combined problems. In these problems, a solution vector
TEAM LRN or numerical values representing
can be formed by concatenating substrings

131

References

or specifying each type of variable, as depicted in the above gear design

problem representation. Each part of the solution vector is then operated by
a suitable recombination and mutation operator. In the above gear design
problem representation using GAS,a binary crossover operator may be used for
the integer variable represented by the binary string and a real-coded crossover
can be used for the continuous variable. Thus, the recombination operator
applied to these mixed representations becomes a collection of a number of
operators suitable for each type of variable (Deb 1997). Similar mixed schemes
for mutation operators need also to be used for such combined representations.
These representations are discussed further in Chapter 26.

References
Deb K 1995 0ptimi:utionfiw Engineering Design: AIgorithnt\ r r n d E.triinples (New Delhi:
Prentice-Hall)
-I997 A robust optimal design technique for mechanical component design
Ewlution(in Algorithnts in Eiigirieeririg Applicutioris ed D Dasgupta and Z
Michalewicz (Berlin: Springer) in press
Deb K and Agrawal R 1995 Simulated binary crossover for continuous search space
C0nrple.u syst. 9 1 15-48
Chaturvedi D, Deb K and Chakrabarty S K 1995 Structural optimization using real-coded
genetic algorithms Proc. Syntp. on Genetic Algorithiiis (Detir(itiirii)cd P K Roj and
S D Mehta (Dehradun: Mahendra Pal Singh) pp 73-82
Eshelman L J and Schaffer J D 1993 Real-coded genetic algorithms and i n t e n d whemata
Founditions of Genetic Algoritliriis I1 ( k i l , CO) ed D Whitley (San Matco. CA:
Morgan Kaufmann) pp 187-202
Kargupta H, Deb K and Goldberg D E 1992 Ordering genetic algorithms and deception
Purullel Problem Solvitig .from Nature I1 (Brussels) ed R Manner and B Manderich
(Amsterdam: North-Holland) pp 47-56
Radcliffe N J I993 Genetic set recombination Fouiidutioits of Getretic Algorithi~isI1 ( Vtril,
CO) ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 203-19
Reklaitis G V, Ravindran A and Ragsdell K M 1983 Engineering 0ptiiiri:ciiioii: Mcthorls
cind Applicutions (New York: Wiley j
Schaffer J D, Caruana R A, Eshelman L J and Das R 1989 A study of control
parameters affecting online performance of genetic algorithms P r w . 3rd lilt. Car$
011 Genetic. Algorittinis (Fuirfis, WA, 1989) ed J D Schaffer (San Mateo. CA:
Morgan Kaufmannj pp 5 1-60
Wright A 1991 Genetic algorithms for real parameter optimiration Foiriicltitioii.c of
Genetic Algorithms (Bloonzirrgton, IN) ed G J E Rawlins (San Mateo, CA: Morgan
Kaufmann) pp 205-20

TEAM LRN

15
Binary strings
Thornas Back
The classical representation used in so-called canonical genetic algorithms
consists of binary vectors (often called bitstrings or binary strings) of fixed length
t ; that is, the individual space I is given by I = (0, I }' and individuals a E I are
denoted as binary vectors a = (al,. . . , a t ) E (0, I ] ' (see the book by Goldberg
( 1989)). The mutation operator (Section 32.1) then typically manipulates these
vectors by randomly inverting single variables CL, with small probability, and
the crossover operator (Section 33. I ) exchanges segments between two vectors
to form offspring vectors.
This representation is often well suited to problems where potential
solutions have a canonical binary representation, i.e. to so-called pseudo-Boolean
optimization problems of the form f' : (0, I ) ' -+ R. Some examples of such
combinatorial optimization problems are the maximum-independent-set problem
in graphs, the set covering problem, and the knapsack problem, which can be
represented by binary vectors simply by including (excluding) a vertex, \et.
or item i in (from) a candidate solution when the corresponding entry ( I , = 1
( ( I , = 0).
Canonical genetic algorithms, however, also emphasize the binary
representation in the case of problems f' : S -+ R where the search space S
fundamentally differs from the binary vector space (0, I } ' . The most prominent
example of this is given by the application of canonical genetic algorithms for
continuous parameter optimization problems f' : R" + R as outlined by Holland
( 1975) and empirically investigated by De Jong (1975). The mechanism\ of
encoding and decoding between the two different spaces (0, I } ' and R" then
require us to restrict the continuous space to finite intervals [ U , ,U , ] for each
variable s, E R, to divide the binary vector into I I segments of ( i n most
cases) equal length t , , such that t = t i t t , and to interpret a subsegment
(cJ,,-~)~,+I,
. . . , q t ,) (i = I , . . . , U ) as the hinary encoding of the variable
x, . Decoding then either proceeds according to the standard binary decoding
function r' : (0, I}' --+ [ U , .q ] , where (see Back 1996)
(15.1)

I32

TEAM LRN

Binary strings

133

or by using a Gray code interpretation of the binary vectors, which ensures that
adjacent integer values are represented by binary vectors with Hamming distance
one (i.e. they differ by one entry only). For the Gray code, equation ( 1 5. I ) is
extended by a conversion of the Gray code representation to the standard code,
which can be done for example according to

where @ denotes addition modulo two.

It is clear that this mapping from the representation space I = (0. I}' to
the search space S =
U , ] is injective but not surjective. i.e. not all
points of the search space are represented by binary vectors, such that the
genetic algorithm performs a grid search and, depending on the granularity
of the grid, might fail to locate an optimum precisely (notice that t', and the
range [U,,U , ] determine the distance of two grid points in dimension i according
to Ax, = ( U , - ~ , ) / ( 2 ' ~I)). Moreover, both decoding mappings given by
equations (15.1) and (15.2) introduce additional nonlinearities to the overall
objective function f ' : {0, 1)' + R, where f ' ( a ) = ( f o x : = , r ' ) ( u ) ,and
the standard code according to equation (15.1) might cause the problem f ' to
become harder than the original optimization problem .f (see the work of Back
(1993, 1996, ch 6) for a more detailed discussion).
While parameter optimization is still the dominant field where canonical
genetic algorithms are applied to problems in which the search space is
fundamentally different from the binary space (0, I}', there are other examples
as well, such as the traveling salesman problem (Bean 1993) and job shop
scheduling (Nakano and Yamada 1991). Here, rather complex mappings from
(0, 1 ) ' to the search space were defined-to improve their results, Yamada and
Nakano ( 1992) later switched to a more canonical integer representation space,
giving a further indication that the problem characteristics should determine the
representation and not vice versa.
The reasons why a binary representation of individuals in genetic algorithms
is favored by some researchers can probably be split into historical and schematheoretical aspects. Concerning the history, it is important to notice that Holland
(1975, p 21) does not define adaptive plans to work on binary variables (alleles)
a, E (0, I}, but allows arbitrary but finite individual spaces I = A I x . . . x A t ,
where A, = { a , , ., . . , a,,, }. Furthermore, his notion of schemata (certain
subsets of I characterized by the fact that all members-so-called instances
of a schema-share some similarities) does not require binary variables either,
but is based on extending the sets A, defined above by an additional 'don't
care' symbol (Holland 1975, p 68). For the application example of parameter
optimization, however, he chooses a binary representation (Holland 1975, pp 57,
70), probably because this is the canonical way to map the continuous object
TEAM
LRNin his adaptive plans, which in turn
variables to the discrete allele sets A,
defined

n:=l[~,,

134

Binary strings

are likely to be discrete because they aim at modeling the adaptive capabilities
of natural evolution on the genotype level.
Interpreting a genetic algorithm as an algorithm that processes schemata,
Holland (1975, p 7 I ) then argues that the number of schemata available under
a certain representation is maximized by using binary variables; that is, the
maximum number of schemata is processed by the algorithm if ci, E (0, I ) .
This result can be derived by noticing that, when the cardinality of an alphabet
A for the allele values is k = ( A ( ,the number of different schemata is ( k I ) '
(i.e. 3' in the case of binary variables). For binary alleles, 2' different solutions
can be represented by vectors of length t , and in order to encode the same
number of solutions by a k-ary alphabet, a vector of length

e'= c--In 2
Ink

( I 5.3)

is needed. Such a vector, however, is an instance of ( k + I ) t ' schemata, a number

that is always smaller than 3' for k > 2; that is, fewer schemata exist for an
alphabet of higher cardinality, if the same number of solutions is represented.
Goldberg ( 1989, p 80) weakens the general requirement for a binary alphabet
by proposing the so-called principle of minimal alphabets, which states that 'The
user should select the smallest alphabet that permits a natural expression of the
problem' (presupposing, however, that the binary alphabet permits a natural
expression of continuous parameter optimization problems and is no worse
than a real-valued representation (Goldberg 199 I )). Interpreting this strictly,
the requirement for binary alphabets can be dropped, as many practitioners
(e.g. Davis 199 1 and Michalewicz 1996) who apply (noncanonical) genetic
algorithms to industrial problems have already done, using nonbinary, problemadequate representations such as real-valued vectors (Chapter 16), integer lists
representing permutations (Chapter 17), finite-state machine representations
(Chapter 18), and parse trees (Chapter 19).
At present, there are neither clear theoretical nor empirical arguments that
a binary representation should be used for arbitrary problems other than those
that have a canonical representation as pseudo-Boolean optimization problems.
From an optimization point of view, where the quality of solutions represented
by individuals in a population is to be maximized, the interpretation of genetic
algorithms as schema processors and the corresponding implicit parallelism and
schema theorem results are of no practical use. From our point of view, the
decoding function r : (0, I}' + S that maps the binary representation to the
canonical search space a problem is defined on plays a much more crucial role
than the schema processing aspect, because, depending on the properties of r,
the overall pseudo-Boolean optimization problem f ' = f o r might become
more complex than the original search problem f : S + R. Consequently,
one might propose the requirement that, if a mapping between representation
space and search space is used at TEAM
all, itLRN
should be kept as simple as possible

135

References

and obey some structure preserving conditions that still need to be formulated
as a guideline for finding a suitable encoding.

References
Back T 1993 Optimal mutation rates in genetic search Proc. 5th Inr. Cot$ on Genetic
Algorithms (Urbartri-Chatnpaign, IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 2-8
-1
996 E\dutionun Algorithms in T h e o n und Prwtice (New York: Oxford
University Press)
Bean J C I993 Genetics urtd Randotn Keys for Sequences ~ i n dOptitnixtion Technical
Report 9 2 4 3 , University of Michigan Department of Industrial and Operations
Engineering
Davis L 1991 Handbook of Genetic Algorithtits (New York: Van Nostrand Reinhold)
De Jong K A 1975 An Anulyis cf the Behaviour of a Cl~.s.scf Genetic Aduptiite Swtents
PhD Thesis, University of Michigan
Goldberg D E 1989 Genetic Algorithm in Search, 0ptitni:ution arid MucAine Letrrning
(Reading, MA: Addison Wesley)
-1
99 1 The theory of virtual alphabets Parallel Problent Soliing .front Nuticre-Proc.
1st Workshop, PPSN I (Lecture Notes in Computer Science 496) ed H - P Schwefel
and R Manner (Berlin: Springer) pp 13-22
Holland J H 1975 Aduptation in Natural and Art$ciul Systetns (Ann Arbor. MI:
University of Michigan Press)
Michalewicz Z 1996 Genetic Algorithms
Data Structures = Eidution ProgruniA
(Berlin: Springer)
Nakano R and T Yamada 1991 Conventional genetic algorithms for job shop problems
Proc. 4th Int. Cot$ on Genetic Algorithm (San Diego, CA, Jidy 1991) ed R K Belew
and L B Booker (San Mateo, CA: Morgan Kaufmann) pp 474-9
Yamada T and R Nakano 1992 A genetic algorithm applicable to large-scale job-shop
problems Parullel Problem Solving from Nature, 2 (Proc. 2nd Int. Con$ on Purullt4
Problem Solving from Nature, Brussels, 1992) ed R Manner and B Manderick
(Amsterdam: Elsevier) pp 28 1-90

TEAM LRN

16
~~

~~~~

Real-valued vectors
David B Fogel

16.1 Object variables

When posed with a real-valued function optimization problem of the form find
the vector x such that F ( x ) : R -+ R is minimized (or maximized), evolution
strategies (Back and Schwefel 1993) and evolutionary programming (Fogel
1995, pp 75-84, 136-7) typically operate directly on the real-valued vector
x (with the components of x identified as object parameters). In contrast,
traditional genetic algorithms operate on a coding (often binary) of the vector
x (Goldberg 1989, pp 80-4). The choice to use a separate coding rather than
operating on the parameters themselves relies on the fundamental belief that
i t is useful to operate on subsections of a problem and try to optimize these
subsections (i.e. building blocks) in isolation, and then subsequently recombine
them so as to generate improved solutions. More specifically, Goldberg ( 1989,
p 80) recommends
The user should select a coding so that short, low-order schemata are
relevant to the underlying problem and relatively unrelated to schemata
over other fixed positions.
The user should select the smallest alphabet that permits a natural
expression of the problem.
Although the smallest alphabet generates he greatest implicit parallelism, there
is no empirical evidence to indicate that binary codings allow for greater
effectiveness or efficiency in solving real-valued optimization problems (see
the tutorial by Davis (1991, p 63) for a commentary on the ineffectiveness of
binary codings).
Evolution strategies and evolutionary programming are not generally
concerned with the recombination of building blocks in a solution and do not
consider schema processing. Instead, solutions are viewed in their entirety, and
no attempt is made to decompose whole solutions into subsections and assign
credit to these subsections.
I36

TEAM LRN

Object variables and strategy parameters

137

With the belief that maximizing the number of schemata being processed
is not necessarily useful, or may even be harmful (Fogel and Stayton 1994),
there is no compelling reason in a real-valued optimization problem to act
on anything except the real values of the vector x themselves. Moreover,
there has been a general trend away from binary codings within genetic
algorithm research (see e.g. Davis 199 1, Belew and Booker I99 1, Forrest
1993, and others). Michalewicz (1992, p 82) indicated that for real-valued
numerical optimization problems, floating-point representations outperform
binary representations because they are more consistent and more precise and
lead to faster execution. This trend may reflect a growing rejection of the
building block hypothesis as an explanation for how genetic algorithms act as
optimization procedures.
With evolution strategies and evolutionary programming, the typical method
for searching a real-valued solution space is to add a multivariate zero-mean
Gaussian random variable to each parent involved in the creation of offspring
(see Section 32.2). In consequence, this necessitates the setting of the covariance
matrix for the Gaussian perturbation. If the covariances between parameters
are ignored, only a vector of standard deviations in each dimension is required.
There are a variety of methods for setting these standard deviations. Section 32.2
offers a variety of procedures for mutating real-valued vectors.

16.2 Object variables and strategy parameters

It has been recognized since 1967 (Rechenberg 1994, Reed et cil 1967) that
it is possible for each solution to possess an auxiliary vector of parameters
that determine how the solution will be changed. Two general procedures
for adjusting the object parameters via Gaussian mutation have been proposed
(Schwefel 1981, Fogel et a1 1991) (see Section 32.2). In each case, a vector
of strcrfegy parameters for self-adaptation is included with each solution and
is subject to random variation. The vector may include covariance or rotation
information to indicate how mutations in each parameter may covary. Thus the
representation consists of two or three vectors:

where x is the vector of object parameters ( X I , . . . . x,]), U is the vector of

standard deviations, and (U is the vector of rotation angles corresponding to
the covariances between mutations in each dimension, and may be omitted if
these covariances are set to be zero. The vector c may have 11 components
(01, . . . , U , , ) where each entry corresponds to the standard deviation in the
ith dimension, i = 1 , . . . . 1 2 . The vector CT may also degenerate to a scalar
(T in which case this single value is used as the standard deviation in all
dimensions. Intermediate numbers of standard deviations are also possible,
TEAM LRN (this also applies to the rotation
although such implementation is uncommon

I38

Real-valued vectors

angles a ) . Very recent efforts by Ostermeier et a1 (1994) offer a variation on

the methods of Schwefel ( I98 1) and further study is required to determine the
general effectiveness of this new technique (see Section 32.2).
Recent efforts in genetic algorithms have also included self-adaptive
procedures (see e.g. Spears 1995) and these may incorporate similar real-valued
coding for variation operators including crossover and point mutations, on both
real-valued or binary or otherwise coded object parameters.

References
Back T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization E\dietioncint Cornpiet. 1 1-24
Belew R K and Booker L B (eds) 1991 Proc.. 4th Int. Cot$ on Getietic AIgorith~m(Strti
Diego, CA, July I Y Y I ) (San Mateo, CA: Morgari Kaufmann)
Davis L 1991 A genetic algorithms tutorial IV. Hybrid genetic algorithms Hmzdhook of
Genetic Algorithms ed L Davis (New York: Van Nostrand Reinhold)
Fogel D B 1995 Eidictiomin Compictcition: Rnzurd ci New. Philosophy of Machirie
lntrlligrwe (Piscataway, NJ: IEEE)
Fogel D B. Fogel L J and Atmar J W 1991 Meta-evolutionary programming Proc. 25th
A\ilomcir Cot$ o t i Signcils, Systenis, eind Cornpicters ed R R Chen (San Jose, CA:
Maple) pp 540-5
FogeI D B and Stayton L C 1994 On the effectiveness of crossover in 4mulated
evolutionary optirniration BioSystem 32 17 1-82
Forrest S (ed) I993 Proc. 5th liit. Cotif: on Genetic Algorithnis ( Urhcinci-Chnmpuigri,IL,
Jicly 1993) (San Mateo, CA: Morgan Kaufmann)
Goldberg D E 1989 Geizetic Algorithnis iti Search, 0ptitni:ution mid Machine Letirtiirig
(Reading, MA: Addison-Wesley )
Michalewicr Z 1992 Genetic Algorithms + Dcitii Structures = E~~olutiorrProgrtims
(Berlin: Springer)
Ostermeier A, Gawelc~ykA and Hansen N 1994 A derandomized approach to selfadaptation of evolution strategies Evolcctiotiay Cornput. 2 36940
Rechenberg I 1993 Personal communication
Reed J, Toombs R and Barricelli N A 1967 Simulation of biological ebolution and
machine learning J . Tlieor. Biol 17 3 1 9 4 2
Schwefel H-P I98 I Nitnirritul Optinii,-titiono j Corziputsr Models (Chichester: Wiley )
Spears W M I995 Adapting croswver in evolutionary algorithms Etolutioticiry
Progrtimmirig IV: Proc. 4th Anti. Cord. otz E\wlictioncin Prograniniiiig (Stiti Diego.
CA, MLirck 1995) ed J R McDonnell, R G Reynold\ and D B Fogel (Cambridge,
MA: MIT Press) pp 367-84

TEAM LRN

17
Permutations
Darrell Whitley

17.1 Introduction
To quote Knuth (1973), A permutation of a finite set is an arrangement of its
elements into a row. Given n unique objects, n ! permutations of the objects
exist. There are various prcperties of permutations that are relevant to the
manipulation of permutation representations by evolutionary algorithms, both
from a representation point of view and from an analytical perspective.
As researchers began to apply evolutionary algorithms to applications that are
naturally represented as permutations, it became clear that these problems pose
different coding challenges than traditional parameter optimization problems.
First, for some types of problem there are multiple equivalent solutions. When
a permutation is used to represent a cycle, as in the traveling salesman problem
(TSP), then all shifts of the permutation are equivalent solutions. Furthermore,
all reversals of a permutation are also equivalent solutions. Such symmetries
can pose problems for evolutionary algorithms that rely on recombination.
Another problem is that permutation problems cannot be processed using
the same general recombination and mutation operators which are applied to
parameter optimization problems. The use of a permutation representation may
in fact mask very real differences in the underlying combinatorial optimization
problems. An example of these differences is evident in the description of
classic problems such as the TSP and the problem of resource scheduling.
The traveling salesman problem is the problem of visiting each vertex (i.e.
city) in a full connected graph exactly once while minimizing a cost function
defined with respect to the edges between adjacent vertices. In simple terms,
the problem is to minimize the total distance traveled while visiting all the cities
and returning to the point of origin. The TSP is closely related to the problem
of finding a Hamiltonian circuit in an arbitrary graph. The Hamiltonian circuit
is a set of edges that form a cycle which visits every vertex exactly once.
It is relatively easy to show that the problem of finding a set of Boolean
values that yield an evaluation of true for a three-conjunction normal form
Boolean expression is directly polynomial-time reducible to the problem of
TEAM LRN

I39

Permutations

130

finding a Hamiltonian circuit in a specific type of graph (Cormen ef (11 1990).

The Hamiltonian circuit problem in turn is reducible to the TSP. A11 of these
problems have a nondeterministic polynomial-time (NP) solution but have no
known polynomial-time solution. These problems are also members of the set
of hardest problems in NP. and hence are NP complete.
Permutations are also important for scheduling applications, variants of
which are also often NP complete. Some scheduling problems are directly
related to the TSP. Consider minimizing setup times between a set of N jobs.
where the function Setup(A, B ) is the cost of switching from job A to job B. If
Setup(A, B ) = Setup(B, A) this is ii variant of the symmetric TSP, except that
the solution may be a path instead of a cycle through the graph (i.e. it visits
every Lwtex. but does not necessarily return to the origin.) The TSP and setup
minimization problem may also be nonsymmetric: the cost of going from L'ertex
A t o B inay not be equal to the cost of going from vertex B to A.
Other types of scheduling problem are different from the TSP. Assume that
one must schedule service times for a set of customers. If this involves access
t o a critical resource, then those customers that are scheduled early may have
access to resources that are unavailable to later customers. If one is scheduling
appointments, for example, later customers will have less choice with respect
to which time slots are available to them. In either case, access to limited
resources is critical. We would like to optimize the match between resources
and customers. This could allow us to give more customers what they want in
terms of resources, or the goal might be to increase the number of customers
who can be serlriced. In either case, permutations over the set of customers
can be used as a priority queue for scheduling. While there are various classic
problems i n the scheduling literature, the term re.soiirct~.sc.hetfirling is used here
to refer to scheduling applications where resources are consumed.
Permutations are also sometimes used to represent multisets. A multiset is
also sometimes referred to as a l m g , which is analogous to a set except that a
bag inay contain multiple copies of identical elements. In sets. the duplication
of elements is not significant. so that

However, in the following multiset,

= ( a , a, b, b , b, c , d , e , e , f , f )

there are two a's, three b * \ , one c, one d, two e's and two f ' s , and duplicates
are significant. In xheduling applications that map jobs to machines, it may
be necessary to xhedule two jobs of type a, three jobs of type b, and so on.
Note that it is not necessary that all jobs of type a be scheduled contiguously.
While M in the above illu\tration contains 1 1 elements. there are not I 1 ! unique
permutations. Rather, the number of unique permutations is given by
1I!

TEAM LRN
? '._
3!I !I !2!2!

A
.

Mapping integers to permutations

141

and in general
n!
nl ! n 2 ! n 3 ! .. .
where n is the number of elements in the multiset and n , is the number of
elements of type i (Knuth 1973). Radcliffe (1993) considers the application of
genetic and evolutionary operators when the solution is expressed as a set or
multiset (bag).
Before looking in more detail at the relationship between permutations and
evolutionary algorithms, some general properties of permutations are reviewed
that are both interesting and useful.

17.2 Mapping integers to permutations

The set of n ! permutations can be mapped onto the set of integers in various
ways. Whitley and Yoo (1995) give the following algorithm which converts an
integer X into the corresponding permutation.
Choose some ordering of the permutation which is defined to be sorted.
Sort and index the N elements ( N 3 I ) of the permutation from I to N .
Pick an index X for a specific permutation such that 0 5 X < N ! .
If X = 0, pick all remaining elements in the sorted permutation list in the
sequence in which they occur and stop.
IF X < ( N - I ) ! pick the first element of the remaining list; GOTO (vi).
Otherwise, continue.
FindY suchthat(Y-I)(N-I)! 5 X < Y(N-l)!. The Ythelementofthe
sort list is the next element of the permutation. X = X - ( Y - 1 ) ( ( N - I ) ! ) .
Delete the chosen element from the list of sorted elements; N = N - 1;
GOTO (iii).
This algorithm can also be inverted to map integers to permutations. For
permutations of length three this generates the following correspondance:
X = 0 indexes 123
X = 3 indexes 231
X = 1 indexes 132
X = 4 indexes 312
X = 2 indexes 213
X = 5 indexes 321.

17.3 The inverse of a permutation

One important property of a permutation is that it has a well-defined h\*erse
(Knuth 1973). Let a l a 2 . . .a,, be a permutation of the set { 1 , 2 , . . . , n } . This
can be written in a two-line form
1

2
a2

3
a3

...
...

I?
at1

The inverse is obtained by reordering both rows such that the second row is
LRN
transformed into the sequence 123.TEAM
. . n ; the
reordering of the first that occurs as

I42

Permu tat ions

a consequence of reordering the second row yields the inverse of permutation

. . . ~ 1 , ~ .The inverse is denoted U ; C I ~ L I .. ;. .
Knuth ( 1973) gives the
following example of a permutation, 5 9 1 8 2 6 4 7 3, and shows that its
inverse can be obtained as follows:

LIILI~(~~

59 1826473
i23456789

) (
=

which yields the inverse 3 5 9 7 1 6 8 4 2. Knuth also points out that UJ = k

if and only if ~ i k = j . The inverse can be used as part of a function for
mapping permutations to a canonical form, which in turn makes it easier to
model problems with permutation representations.

17.4 The mapping function

When modeling evolutionary algorithms it is often useful to compute a
transmission function rl./( k ) which yields the probability of recombining strings
i and j and obtaining an arbitrary string k . Whitley and Yoo (1995) explain how
to compute the transmission function for a single string k and then to generalize
the results to all other strings. In this case, the strings represent permutations
and the remapping function, denoted @, functions as follows:

The computation Y = A @ X behaves as follows. Let any permutation

X be represented by x 1 . v ~. .~. x~f , . Then C I I L ~ ~. .N .~N,, @ X I X ~ . T ~. . . x,, yields
Y = ylyzy3,. . yf, where y r = j when CI, = .vJ. Thus, (3421 03124) yields
(1432) since ( U I = 3 = -1-1) + (?I = I ) . Next, (112 = 4 = -1-4)j(yz = 4),
((43 = 2 = .r3) + (y3 = 3 ) . and (u4 = 1 = x 2 ) j4
('
= 2). This mapping
function is analogous to the bitwise addition (mod 2) used to reorder the vector
s for binary strings. However, note that A @ X # X @ A . Furthermore, for
permutation recombination operators it is not genrrolly true that r , . ] = r J , [ .
This allows one to compute the transmission function with respect to a
canonical permutation, in this case 1234, and generalize this mapping to all other
permutations. This mapping can be achieved by simple element substitution.
First, the function r can be generalized as follows:

where U!,x. y , and z are variables representing the elements of the permutation
(e.g. U' = 3, .v = I , y = 2, 2 = 4). If ul.ryr: now represents the canonically
ordered permutation 1234,
TEAM LRN

143

Matrix representations

We can also relate this mapping operator to the process of finding an inverse.
The permutations in the expression
r3421,1342(3 124) = r1-$32.7143
( 1234)

are included as rows in an array. To map the left-hand side of the preceding
expression to the terms in the right-hand side, first compute the inverses for
each of the terms in the left-hand side:

( 3421
) ( 4312
1234 )
1234
=

( 3124
1234 )
( ii?: ) *
=

Collect the three inverses into a single array. We also then add 1 2 3 4 to the
array and inverse the permutation 2 3 I 4, at the same time rearranging all the
other permutations in the array:

[ 1 (:!i
4312
1423
2314
I234

3124

This yields the permutations 1432, 2143, and 1234 which represent the
desired canonical form as it relates to the notion of substitution into a symbolic
canonical form. One can also reverse the process to find the permutations p ,
and p, in the following context:

17.5 Matrix representations

When comparing the TSP to the problem of resource scheduling, in one case
adjacency is important (the TSP) and in the other case relative order is important
(resource scheduling). One might also imagine problems where absolute position
is important. One way in which the differences between adjacency and relative
order can be illustrated is to use a matrix representation of the information
contained in a permutation.
When we discuss adjacency in the TSP, we typically are referring to a
symmetric TSP: the distance of going from city A to B is the same as going
from B to A. Thus, when we define a matrix representing a tour. there will be
two edges in every row of the matrix, where a row-column entry of 1 represents
an edge connecting the two cities. Thus, the matrices for the tours [ A B C D E
TEAM LRN
F] and [C D E B F A] (left and right,
respectively) are as follows:

I44

Permutations

c
D
E
F

A B C D E F
0 1 0 0 0 1
1 0 1 0 0 0
0 1 0 1 0 0
0 0 1 0 1 0
0 0 0 1 0 1
1 0 0 0 1 0

c
D
E
F

A
0
0
1
0
0

B
0
0
0
0
1

C D E F
1 0 0 1
0 0 1 1
0 1 0 0
1 0 1 0
0 1 0 0

One thing that is convenient about the matrix representation is that it is

easy to extract information about where common edges occur. This can also be
expressed in the form of a matrix, where a zero or one respectively is placed in
the matrix where there is agreement in the two parent structures. If the values
in the parent matrices conflict, we will place a ## in the matrix. Using the two
above structures as parents, the following common information is obtained:
A
B

c
D
E

0
#
#

#
O
#

# 0 0 1
# O # #
0 1 0 0

This matrix can be interpreted in the following way. If we convert the

* symbols, then (in the notation typically used by the genetic
algorithm community) a hyperplane is defined in this binary space in which
both of the parents reside. If a recombination operator is applied, the offspring
should also reside in this same subspace (this is the concept of respect, as used
by Radcliffe ( I99 1 ); note mutation can still be applied after recombination).
This matrix representation does bring out one feature rather well: the
common subtour information can automatically and easily be extracted and
passed on to the offspring during recombination.
The matrix defining the common hyperplane information also detines those
offspring that represent a recombination of the information contained in the
parent structures. In fact, any assignment of 0 or 1 bits to the locations
occupied by the # symbols could be considered valid recombinations, but not
all ctrejeasible solutioris t o the TSP, because not all recombinations result in a
Hamiltonian circuit. We would like to have an offspring that is not only a valid
recombination, but also a feasible solution.
The matrix representation can also make explicit relative order information.
Consider the same two parents: [A B C D E F] and [C D E B F A]. Relative order
can be represented as follows. Each row will be the relative order information
for a particular element of a permutation. The columns will be all permutation
elements in some canonical order. If A is the first element in a permutation,
then a one will be placed in every column (except column A; the diagonal will
again be zero) to indicate A precedes all other cities. This representation is
given by Fox and McMahon ( 1991). Thus, the matrices for [ A B C D E F] and
LRN respectively):
[C D E B F A] are as follows (leftTEAM
and right,
# symbols to

Alternative representations

A
B

c
D
E
F

A
0
0
0
0
0
0

B
1
0
0
0
0
0

C
1
1
0
0
0
0

145

D E
1 1
1 1
1 1
0 1
0 0
0 0

F
1
1
1
1
1
0

A
B

c
D
E
F

A
0
1
1
1
1
1

B
0
0
1
1
1
0

C
0
0
0
0
0
0

D
0
0
1
0
0
0

E
0
0
1
1
0
0

F
0
1
1
1
1
0

In this case, the lower triangle of the matrix flags im*ei-sioiis.which should
not be confused with an iniierse. If u I c i z ( i l . . . ci,, is a permutation of the
canonically ordered set 1 , 2, 3, . . . , 11 then the pair ( ( I , , N , ) is an iiilvrsioii i f
i < j and u, > CI, (Knuth 1973). Thus, the number of I bits in the lower
triangles of the above matrices is also a count of the number of inLrersions
(which should also not be confused with the itilvrsioiz operator used in simple
genetic algorithms, see Holland 1975, p 106, Goldberg 1989, p 166).
The common information can also extracted as before. This produces the
following matrix:
A

A
B

O
#

#
0

# # # #
# # # 1

D
E
F

#
#
#

# 0 0 1 1
# 0 0 0 1
O O O O O

Note that this binary matrix is again symmetric around the diagonal, except
that the lower triangle and upper triangle have complementary bit values. Thus
only N ( N - 1)/2 elements are needed to represent relative order information.
There have been few studies of how recombination crossover operators
generate offspring in this particular representation space. Fox and McMahon
( 199 1 ) offer some work of this kind and also define several operators that work
directly on these binary matrices for relative order.
While matrices may not be the most efficient form of implementation, they
do provide a tool for better understanding sequence recombination operators
designed to exploit relative order. It is clear that adjacency and relative order
relationships are different and are best expressed by different binary matrices.
Likewise, absolute position information also has a different matrix representation
(for example, rows could represent cities and the columns represent positions).
Cycle crossover (Section 33.3.6; see Starkweather et crl 1991, Oliver er 01 1987)
appears to be a good absolute position operator, although it is hard to find
problems in the literature where absolute position is critical.

17.6 Alternative representations

Let P be an arbitrary permutation and P, be the j t h element of the permutation.
One notable alternative representation of a permutation is to define some
TEAM LRN
canonical ordering, C , over the elements
in the permutation and then define

I46

Permutations

a vector of integers, I , such that the integer in position j corresponds to the

position in which element CJ appears in P . Such a vector I can then serve as
a representation of a permutation. More precisely,

To illustrate:

= U

hcd e j g h

I = 6 2 5 3 8 7 1 4 which represents P = g h d

j e.

This may seem like a needless indirection, but consider that I can
be generalized to allow a larger number of possible values than there are
permutation elements. I can also be generalized to allow all real values
(although for computer implementations the distinction is somewhat artificial
since all digital representations of real values are discrete and finite). We
now have a parameter-based presentation of the permutation such that we can
generate random vectors I representing permutations. If the number of values
for which elements in I are defined is dramatically larger than the number of
elements in the permutation, then duplicate values in randomly generated vectors
will occur with very small probability.
This representation allows a permutation problem to be treated as if i t were
a more traditional parameter optimization problem with the constraint that no
two elements of vector I should be equal, or that there is a well defined way
to resolve ties. Evolutionary algorithm techniques normally used for parameter
optimization problems can thus be applied to permutation problems using this
representation.
This idea has been independently invented on a couple of occasions. The
first use of this coding method was by Steve Smith of Thinking Machines. A
version of this coding was used by the ARGOT Strategy (Shaefer 1987) and the
representation was picked up by Syswerda (1989) and by Schaffer er ul (1989)
for the TSP. More recently, a similar idea was introduced by Bean (1994) under
the name random keys.
17.7

Ordering schemata and other metrics

Goldberg and Lingle (1985) built on earlier work by Franz (1972) to describe
similarity subsets between different permutations. Franzs calculations were
related to the use of inversion operators for traditional genetic algorithm binary
representations. The use of inversion operators is very much relevant to the
topic of permutations, since in order to apply inversion the binary alleles must be
tagged in some way and inversion acts in the space of all possible permutations
of allele orderings. Thus,
((6 0 ) ( 3 1) ( 2 0 ) ( 8 1) (1 0 ) (5 1) (7 0 ) (4 0 ) )

is equivalent to

TEAM LRN

Ordering schemata and other metrics

147

( ( 1 0 ) ( 2 0)(3 1 ) ( 4 0 ) ( 5 1 ) ( 6 0 ) (7 0 ) ( 8 1))

which represents the binary string 00101001 in a position-independent fashion

(Holland 1975).
Goldberg and Lingle were more directly concerned with problems where
the permutation was itself the problem representation, and. in particular, they
present early results for the TSP. They also introduced the partially mapped
crossover (PMX) operator and the notion of ordering .wlienint(r, or o-sclieniotci.
For o-schemata, the symbol ! acts as a wild card match symbol. Thus, the
template
!

l !

7 3 !

represents all permutations with a one as the third element. a seven as the
sixth element, and a three as the seventh element. Given o selected positions
in a permutation of length I , there are ( I - o)! permutations that match an oschemata. One can also count the number of possible o-schema. There are
clearly (:)) ways to choose o fixed positions; there are also (,:) ways to pick the
permutation elements that fill the slots, and o ! ways of ordering the elements
(i.e. the number of permutations over the chosen combination of subelements).
Thus, Goldberg (1989, Goldberg and Lingle 1985) notes that the total number
of o-schemata, no,, can be calculated by

Note that in this definition of the o-schemata, relative order is not accounted for.
In other words, if relative order is important then all of the following shifted
o-schemata,
1
!

!
1

!
!

7
!

!
l

3
7

!
3

!
!

7
!

!
!

3
7

!
3

could be viewed as equivalent. Such schemata may or may not wrap around.
Goldberg discusses o-schemata which have an absolute fixed position (oschemata, type a) and those with relative position which are shifts of il specified
template (0-schemata, type r).
This work on o-schemata predates the distinctions between relative
order permutation problems, absolute position problems, and adjacency-based
problems. Thus, o-schemata appear to be better for understanding resource
scheduling applications than for the TSP. In subsequent work, Kargupta et
nl (1992) attempt to use ordering schemata to construct deceptive functions
for ordering problems-that is, problems where the average fitness values of
the o-schemata provide misleading information. Note that such problems are
constructed to mislead simple genetic algorithms and may or may not be difficult
TEAM LRN(For a discussion of deception see
with respect to other types of algorithm.

148

Permutations

the article by Goldberg ( 1987) and Whitley (1991) and for another perspective
see the article by Grefenstette (1993).) The analysis of Kargupta et ul ( 1 992)
considers PMX. a uniform ordering crossover operator (UOX), and a relative
ordering crossover operator (ROX).
An alternative way of constructing relative order problems and of
comparing the similarity of permutations is given by Whitley and Yoo
(1995). Recall that a relative order matrix has a I bit in position ( X , Y ) if
row element X appears before column element Y in a permutation. Note
that the matrix representation yields a unique binary representation for each
permutation. Using this representation one can also define the Hamming
distance between two permutations PI and P2; Hamming distance is denoted
by HD(index(P1), index(P2)), where the permutations are represented by their
integer index. In the following examples, the Hamming distance is computed
with respect to the lower triangle (i.e. it is a count of the number of 1 bits in
the lower triangle):
A B C D
-- - - - - ---

A B C D

A
B
C
D

I 0 1 1 1
l O O l l
l O O O l

HD(0,O) = 0

I 0 0 0 0

A B C D
-- ---- - --

B D C A

A I 0 0 0 0
B I l O l l

HD(0,ll)

HD(0,23)

1 0 0 0

D 1 1 0 1 0

A B C D

- -- - -- - -D C B A

A
B
C
D

I
J
l
l

O
l
l
l

O
0
l
l

O O
0 0
O O
l O

Whitley and Yoo (1995) point out that this representation is not perfect.
Since 2"') > N ! , certain binary strings are undefined. For example, consider
the following upper triangle:
1 1 1
0 1
0

Element 1 occurs before 2, 3, and 4, which poses no problem, but 2 occurs

after 3, 2 occurs before 4, and 4 occurs before 3. Using > to denote relative
TEAM LRN
order, this implies a nonexistent ordering
such that

Operator descriptions and local search

3 > 2 > 4

but

149

4 > 3

Thus, not all matrices correspond to permutations. Nevertheless, the binary

representation does afford a metric in the form of Hamming distance and
suggests an alternative way of constructing deceptive ordering problems, since
once a binary representation exists several methods for constructing misleading
problems could be employed. Deb and Goldberg ( 1 992) explain how to construct
trap functions. Whitley ( I99 1) also discusses the construction of deceptive
binary functions.
While the topic of deception has been the focus of some controversy
(cf Grefenstette 1993), there are few tools for understanding the difficulty or
complexity of permutation problems. Whitley and Yoo found that simulation
results failed to provide clear evidence that deceptilre functions built using oschema fitness averages really were misleading or difficult for simple genetic
algorithms.
Aside from the fact that many problems with permutation-based
representations are known to be NP complete problems, there is little work which
characterizes the complexity of specific instances of these problems, especially
from a genetic algorithm perspective. One can attempt to estimate the size
and depth of basins of attraction, but such methods must presuppose the use
of a particular search methods. The use of different local search operators can
induce different numbers of local optima and different sized basins of attraction.
Changing representations can have the same effect.

17.8 Operator descriptions and local search

Section 32.3, on mutation for permutations, also provides information on local
search operators, the best known of which is 2-opt. Information on the most
commonly used forms of recombination for permutation-based representations is
found in Section 33.3. For a general discussion of permutations, see the books by
Niven ( 1 965) and Knuth (1973). Whitley and Yoo (1995) present methods for
constructing infinite-population models for simple genetic algorithms applied
to permutation problems which can be easily converted into finite-population
Markov models.

References
Bean J C 1994 Genetic algorithms and random keys for sequencing and optimization
ORSA J. Cornpiit. 6
Cormen T, Leiserson C and Rivest R 1990 Introduction to Algorithnis (Cambridge. MA:
MIT Press)
Deb K and Goldberg D 1993 Analyzing deception in trap functions Fouririntions of
Genetic Algorithms 2 ed D Whitley (San Mateo, CA: Morgan Kaufmann)
Fox B R and McMahon M B 1991 Genetic Operrrtors .for Seyirencing Prohleim
Foiindcctiorts of Genetic Algorithrns ed G J E Rawlins (San Mateo, CA: Morgan
TEAM LRN
Kaufmann) pp 284-300

150

Permutations

Franr. D R 1972 Noti-Lirieririties in Genetic Adqititv Secirc-h PhD Dissertation, University

of Michigan
Goldberg D 1987 Simple genetic algorithms and the minimal, deceptive problem Ceti~~tic
Algorithms rititl SiniulLitetl Antiecrling ed L Davis (Boston, MA: Pitman)
-I989 Getietic Algorithms in Secirch, 0ptimi:cctioti c i n d Mac-lzine Letrrnitig (Reading,
MA: Addison-Wesley)
Goldberg D and Lingle R Jr 1985 Alleles, loci, and the traveling salesman problem Proc.
I s t lnt. Cot$ on Genetic Algorithms crnd Their App1iccition.s ed J J Gre fenstette
(Hillsdale, NJ: Erlbaum)
Grefenstette J 1993 Deception considered harmful Found~itiotisof' Genetic Algorithtns 2
ed D Whitley (San Mateo, CA: Morgan Kaufmann)
Holland J I975 Arftiptdoti Iti Nutiird nrid Artijcid Svstetns (University of Michigan
Press)
Kargupta H, Deb K and Goldberg D 1992 Ordering genetic algorithms and deception
Pcircillel Proh1etn.s Soh'itig from Nuture, 2 ed R Manner and B Manderick
(Amsterdam: Elsevier) pp 47-56
Knuth R M 1973 The Art of Cotnputer Pmgr~imniirig Volicr?le3: Sorting lint1 SecrrcAiticy
(Reading, MA: Addison-Wesley )
Niven I 1965 M(itherncitic~sof' Choice, or Hob$, to Coutit brithoirt Coiiriting (Mathematical
Association of America)
Oliver I. Smith D and Holland J 1987 A study o f permutation crossover operators on the
traveling salesman problem 2nd Int. Cot$ 011 Genetic A1gorithm.s (Pittsburgh, PA,
1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 224-30
Radcliffe N 199 I Forma analysis and random respectful recombination Fourth Itit. Cot$
0 1 1 Genetic Algorithms (Strti DirJgo,CA, Jic!\* f 99f) ed R Bclew and L Booker (San
Mateo, CA: Morgan Kaufmann) pp 222-9
I993 Genetic set rccombination Foitndations of Genetic Algorithtns 2 ed D Whitley
(San Mateo, CA: Morgan Kauf'mann) pp 203--19
Schaffer J D, Caruana R A, Eshelman L J and Das R I989 A study o f control parameters
affecting online performance o f genetic algorithms for function optimization Pmc.
3rd /tit. Cot$ o t i Genrtic A1gorithtn.s (F(iirjGix, VA, 1989) ed J D Schaffer (San
Mateo, CA: Morgan Kaufmann) pp 5 1 - 4 0
Shaefer C I987 The ARGOT strategy: adaptive representation genetic optimizer
technique 2nd Int. Cot$ o t i Genetic Algoritl1rn.s (Pittsbiirgh, PA, 1987) ed J J
Grefenstette (Hillsdale, NJ: Erlbaum) pp 50-8
Starkweather T. McDaniel S, Mathias K, Whitley D and Whitley C 1991 A comparison of
genetic sequencing operators Proc. 4th I t i t . Conj: 0 1 1 Getietic Algorithtns (San Diego,
CA, I Y Y / ) cd R K Belew and L B Booker (San Mateo. CA: Morgan Kauffman)
Syswerda G 1989 Uniform crossover in genetic algorithms Proc. 3rd lilt. Cot$ 0 1 1
Genetic Algorithtns (Ftrirfiix, VA, 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 2-9
Whitley D I99 I Fundamental principles of deception in genetic search Foi4ndcrtiott.c
oj' Grtietic AIgorit/inls ed G .I E Rawlins (San Mateo, CA: Morgan Kaufmann)
pp 221-41
Whitley D and Yoo N-W 1905 Modeling simple genetic algorithms for permutation
problerns Foutickiriotis ($Getretic' AIgorithnts 3 ed D Whitley and M Vose (San
Mateo, CA: Morgan Kaufmann) pp 163-84
~

TEAM LRN

18
Finite-state representations
David B Fogel

18.1 Introduction
A finite-state machine is a mathematical logic. It is essentially a computer
program: it represents a sequence of instructions to be executed, each depending
on a current state of the machine and the current stimulus. More formally, a
finite-state machine is a 5-tuple
A4 = ( Q , t,p , S ,

where Q is a finite set, the set of states, t is a finite set, the set of input symbols,
p is a finite set, the set of output symbols, s : Q x t --+ Q is the next state
function, and o : Q x t + p is the next output function.
Any 5-tuple of sets and functions satisfying this definition is to be interpreted
as the mathematical description of a machine that, if given an input symbol .r
while it is in state q , will output the symbol o ( y , .r) and transition to state s ( q . x).
Only the information contained in the current state describes the behavior of the
machine for a given stimulus. The entire set of states serves as the 'memory' of
the machine. Thus a finite-state machine is a transducer that can be stimulated
by a finite alphabet of input symbols, that can respond in a finite alphabet
of output symbols, and that possesses some finite number of different internal
states. The corresponding input-output symbol pairs and next-state transitions
for each input symbol, taken over every state, specify the behavior of any finitestate machine, given any starting state. For example, a three-state machine is
shown in figure 18.1. The alphabet of input symbols are elements of the set
{0, I ] , whereas the alphabet of output symbols are elements of the set {a,/3, y ]
(input symbols are shown to the left of the slash, output symbols are shown to
the right). The finite-state machine transforms a sequence of input symbols into
a sequence of output symbols. Table 18.1 indicates the response of the machine
to a given string of symbols, presuming that the machine is found in state C.
It is presumed that the machine acts when each input symbol is perceived and
the output takes place before the next input symbol arrives.
TEAM LRN

151

152

Finite-state representations

Figure 18.1. A three-state finite machine. Input symbols are shown to the left of the
slash. Output symbols are to the right of the slash. Unless otherwise specified, the
machine is presumed to start in state A . (After Fogel et nl 1966, p 12).

Table 18.1. The response of the tinite-state machine shown in figure 18.1 to a string of
symbols. In this example, the machine starts in state C .
Present state
Input symbol
Next state
Output symbol

C
B

B
1
C

C
1
A
y

A
0
B

6 6

I
A

18.2 Applications
Finite-state representations are often convenient when the required solutions to
a particular problem of interest require the generation of a sequence of symbols
having specific meaning. For example, consider the problem offered by Fogel
et cil (1966) of predicting the next symbol in a sequence of symbols taken from
some alphabet A (here, T = p = A ) . A population of finite-state machines is
exposed to the environment, that is, the sequence of symbols that have been
observed u p to the current time. For each parent machine, as each input symbol
is offered to the machine, each output symbol is compared with the next input
symbol. The worth of this prediction is then measured with respect to the
given payoff function (e.g. all-none, absolute error, squared error, or any other
expression of the meaning of the symbols). After the last prediction is made,
a function of the payoff for each symbol (e.g. average payoff per symbol)
indicates the fitness of the machine. Offspring machines are created through
mutation (Section 32.4) and/or recombination (Section 33.4). The machines that
provide the greatest payoff are retained to become parents of the next generation.
TEAM LRN
This process is iterated until an actual
prediction of the next symbol (as yet

153

Applications
C,DID

D,CIC

Legend

I
Stan State
,=
C = Cooperate
D = Delecl

Figure 18.2.
A finite-state machine evolved in prisoners dilemma experiThe input symbols form the set
ments detailed by Fogel (1995b, p 215).
((C. C). (C, D),( D ,
C), ( D ,
D)}and the output symbols form the set (C,D ] . The
machine also has an associated first move indicated by the arrow; here the machine
cooperates initially then proceeds into state 6.

inexperienced) in the environment is required. The best machine generates this

prediction, the new symbol is added to the experienced environment, and the
process is repeated.
There is an inherent versatility in such a procedure. The payoff function
can be arbitrarily complex and can possess temporal components; there is
no requirement for the classical squared-error criterion or any other smooth
LRN
function. Further, it is not required TEAM
that the
predictions be made with a one-step

IS4

Finite-state representations

look ahead. Forecasting can be accomplished at an arbitrary length of time into

the future. Multivariate environments can be handled, and the environmental
proces5 need not be stationary because the simulated evolution will adapt to
changes in the transition statistics.
For example, Fogel ( 1991, 1993, 199Sa) has used finite-state machines to
describe behaviors in the iterated prisoner's dilemma. The input alphabet mas
selected as the set ((C.C ) , (C. D ) . ( D , C ) , ( 0 ,D ) } where C corresponds to a
move for cooperation and D corresponds to a move for defection. The ordered
pair ( X , Y ) indicates that the machine played X in the last move, while it5
opponent played Y . The output alphabet was the set ( C , D } and corresponded
to the next move of the machine based on the previous pair of move\ and the
current state of the machine (see figure 18.2).
Other applications of' finite-state representations have been offered. For
example, Jefferson et trl (1991). Angeline and Pollack ( 1993), and others
employed a finite-state machine to describe the behavior of a simulated ant
on a trail placed on a grid. The input alphabet was the set (0, I ) , where 0
indicated that the ant did not see a trail cell ahead and I indicated that it did see
such a cell ahead. The output alphabet was { M , L . R , N } where M indicated
a move forward, L indicated a turn to the left without moving, R indicated a
turn to the right without moving, and N indicated a condition to do nothing.
The task was to evolve a finite-state machine that would generate a sequence of
moves to traverse the trail in the shortest number of time steps.

References
Angeline P J and Pollack J B 1993 Evolutionary module acqui4ition Proc. 2ndAtiti. Cont.
o t i Eidictioritirv Progr~iiriiriirig( S m Diego,CA) ed D B Fogel and W Atniar (La
Jolla, CA: Evolutionary Prograniming Society) pp I 54-63
Fogel D B 1991 The evolution of intelligent deci\ion-making in gaming Cvber.net. Sv\t.
22 223-36
-1993
E v o l ~ing behavior5 in the iterated prisoner-5 dilemtna E\*olirt.Cornput. 1 77-97
-1995a
On the relationship between the duration o f an encounter and the ebvlution
o f cooperation in the iterated prisoner's dilemma E\*olitt. Coriipict. 3 349-63
__ 1995b E\*olutiotiIir?~
Coinpirtutioti*Touurd II N o t . Philosophy of Mtrchine ltitel1igenc.e
(Piwataway. NJ: IEEE)
Fogel L J, Owens A J and Walsh M J 1966 Artijkicil Ititelligetic~e TIiroirgh Sitiiirlcittd
E\wliitioti (New York: Wiley )
Jeffttrson D, Collins R, Cooper C, Dyer M, Flowers M, Korf R, Taylor C and Wang A
1991 Evolution as a theme i n artificial life: the Genesysflracker \ystcm Artifir*rcrl
Lde I1 ed C G Langton, C Taylor, J D Farmcr and S Ra\mu\wn (Reading, MA:
Addiwn-Wesley ) pp 549-77

TEAM LRN

19
Parse trees
Peter J Angeline
When an executable structure such as a program or a function is the object of
an evolutionary computation, representation plays a crucial role in determining
the ultimate success of the system. If a traditional, syntax-laden programming
language is chosen to represent the evolving programs, then manipulation by
simple evolutionary operators will most likely produce syntactically invalid
offspring. A more beneficial approach is to design the representation to ensure
that only syntactically correct programs are created. This reduces the ultimate
size of the search space considerably. One method for ensuring syntactic
correctness of generated programs is to evolve the desired programs parse
tree rather than an actual, unparsed, syntax-laden program. Use of the parse
tree representation completely removes the syntactic sugar introduced into
a programming language to ensure human readability and remove parsing
ambiguity.
Cramer (1985), in the first use of a parse tree representation in a
genetic algorithm, described two distinct representations for evolving sequential
computer programs based on a simple algorithmic language and emphasized the
need for offspring programs to remain syntactically correct after manipulation
by the genetic operators. To accomplish this, Cramer investigated two encodings
of the language into fixed-length integer representations.
Cramer (1985) first represented a program as an ordered collection of
statements. Each statement consisted of N integers; the first integer identified
the command to be executed and the remainder specified the arguments
to the command. If the command required fewer than N - I arguments.
then the trailing integers in the statement were ignored. Depending on the
syntax of the statements command, an integer argument could identify a
variable to be manipulated or a statement to be executed. Consequently,
even though the program was stored as a sequence it implicitly encoded an
execution tree that could be reconstructed by replacing all arguments referring
to program statements with the actual statement. Cramer (1985) noted that
this representation was not suitable for manipulation by genetic operators and
occasionally resulted in infinite loops when two auxiliary statements referred to
each other.
TEAM LRN

I ss

156

Parse trees

The second representation for simple programs reviewed by Cramer (1985)

alleviated some of the deficiencies of the first by making the implicit tree
representation explicit. Instead of evolving a sequence of statements with
arguments that referred to other statements, this representation replaces these
arguments with the actual statement. For instance, an encoded program would
have the form (0 ( 3 5 ) ( I 3 ( 1 4 (4 5 ) ) ) ) where a matching set of parentheses
denotes a single complete statement. Note that in the language used by Cramer
(1985), a subtree argument does not return a value to the calling statement but
only designates a command to be executed.
Probably the best known use of the parse tree representation is that by Koza
( 1992), an example of which is shown in figure 19.1. The only difference
between the representations used in genetic programming (Koza 1992) and the
explicit parse tree representation (Cramer 1985) is that the subtree arguments in
genetic programming return values to their calling statements. This provides
a direct mechanism for the communication of intermediate values to other
portions of the parse tree representation and fortifies a subtree as an independent
computational unit. The variety of problems investigated by Koza (1992)
demonstrates the flexibility and applicability of this representational paradigm.
An appealing aspect of the parse tree representation is its natural recursive
definition, which allows for dynamically sized structures. All parse tree
representations investigated to date have included an associated restriction on
the size of the evolving programs. Without such a restriction, the natural
dynamics of evolutionary systems would continually increase the size of the
evolving programs, eventually swamping the available computational resources.
Size restrictions take on two distinct forms. Depth limitation restricts the
size of evolving parse trees based on a user-defined maximal depth parameter.
Node limitation places a limit on the total number of nodes available for an
individual parse tree. Node limitation is the preferred method of the two
since it encodes fewer restrictions on the structural organization of the evolving
programs (Angeline 1996).
In a parse tree representation, the primitive language-the contents of
the parse tree-determines the power and suitability of the representation.
Sometimes the elements of this language are taken from existing programming
languages, but typically it is more prudent to design the primitive language so
that it takes into consideration as much domain-specific knowledge as available.
Failing to select language primitives tailored to the task at hand may prevent
the acquisition of solutions. For instance, if the objective is to evolve a function
that has a particular periodic behavior, it is important to include base language
primitives that also have periodic behavior, such as the mathematical functions
sin s and cos x.
Due to the acyclic structure of parse trees, iterative computations are often
not naturally represented. It is often difficult for an evolutionary computation to
correctly identify appropriate stopping criteria for loop constructs introduced
into the primitive language. To TEAM
compensate,
the evolved function often is
LRN

Parse trees

157

evaluated within an implied repeat until done loop that reexecutes the evolved
function until some predetermined stopping criterion is satisfied. For instance,
Koza ( I 992) describes evolving a controller for an artificial ant for which the
fitness function repeatedly applies its program until a total of 400 commands
are executed or the ant completes the task. Numerous examples of such implied
loops can be found in the genetic programming literature (e.g. Koza 1992,
pp 147, 329, 346, Teller 1994, Reynolds 1994, Kinnear 1993).
Often it is necessary to include constants in the primitive language, especially
when mathematical expressions are being evolved. The general practice is to
include as a potential terminal of the language a special symbol that denotes a
constant. When a new individual is created and this symbol is selected to be a
terminal, rather than enter the symbol into the parse tree, a numerical constant
is inserted drawn uniformly from a user-defined range (Koza 1992). Figure 19.1
shows a number of numerical constants that would be inserted into the parse
tree in this manner.

,-,

if - I t - 0

0
0

0.1467

0.547

sin

0.9765

sln

,, ,

, * ,

cis

if - I t - 0

Sin

1.075

Figure 19.1. An example parse tree representation for a complex numerical function. The
function if-It-0 is a numerical conditional that returns the value of its second argument
if its first argument evaluates to a negative number and otherwise returns the value of
its third argument. The function 3- denotes a protected division operator that returns a
value of 1.0 if the second argument (the denominator) is zero.

Typically, the language defined for a parse tree representation is syntactically

homogenous, meaning that the return values of all functions and terminals are
the same computational type, (e.g. integer). Montana (1995) has investigated the
evolution of multityped parse trees and shown that extra syntactic considerations
do not drastically increase the complexity of the associated genetic operators.
Koza ( 1992) also investigates constrained parse tree representations.
TEAMtrees,
LRN they are a natural representation
Given the recursive nature of parse

158

Parse trees

in which to investigate issues concerning induction of modular structures.

Currently, three methods for inducing modular parse trees have been proposed.
Angeline and Pollack (1994) add two mutation operators to their Genetic
Program Builder (GLIB) system, which dynamically form and destroy modular
components out of the parse tree. The mmprr.ss mutation operation, which bears
some resemblance to the erzr~cipsulciteoperator of Koza ( 1992), selects a subtree
and makes it a new representational primitive in the language. The e-rpcinci
mutation operation reverses the actions of the compress mutation by selecting a
compressed subtree in the individual and replacing it with the original subtree.
Angeline and Pollack (1994) claim that the natural evolutionary dynamics of the
genetic program automatically discover effective modularizations of the evolving
programs. Rosca and Ballard ( 1996) with their Adaptive Representation method
use a set of heuristics to evaluate the usefulness of all subtrees in the population
and then create subroutines from the ones that are most useful. Koza (1994)
describes a third method for creating modular programs, called automatically
defined functions ( ADFs) (see Chapter 1 I ), which allow the user to determine the
number of subroutines to which the main program can refer. During evolution,
the definitions of both the main routine and all of its subroutines are evolved
in parallel. Koza and Andre (1996) have more recently included a number of
mutations to dynamically modify various aspects of ADFs in order to reduce
the amount of prespecification required by the user.

References
Angeline P J 1996 Genetic programmings continued evolution Acl\wices in Genetic
Progrrttrrmirig vol 2, ed P J Angeline and K Kinnear (Cambridge, MA: MIT Prc\\)
pp 1-20
Angeline P J and Pollack J B 1994 Co-evolving high-level representations Artficiul Lije
I l l ed C G Langton (Reading, MA: Addiwn-Wesley) pp 55-71
Cramer N L 1985 A rcprewmtion for the adaptive generation o f 5imple wquential
programs Proc. 1s t lrit. Conj: on Genetic Algorithms (Pittsburgh, PA, Jitly 1985) ed
J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 183-7
Kinnear K E 1993 Generality and difficulty in genetic programming: evolving a sort
Proc. 5th Int. Car$ on Genetic Algorithms ( Utbutiu-Chuinpuigti, IL, Jiily 1993) cd
S Forre\t (San Mateo, CA: Morgan Kaufmann) pp 287-94
Kola J R I992 Genetic Progrurriniing: o r i the Progriintnting o j Conipiiter\ bj*Meuti 5 of
Ntrturctl Selection (Cambridge, MA: MIT Pre\\)
-1994 Get1e tic Prog rutnui irig 11: A ut omutic Disco \Yr\ o j Re us LI ble ProCSrmis
(Cambridge, MA: MIT Press)
Kola J R and Andre D 1996 Classifying protein segments as transmembrane domains
wing architecture-altering operation5 in genetic programming Ad\wzc.e,\ in Genetic
Progrunzniing vol 2, ed P J Angeline and K Kinnear (Cambridge, MA: MIT Pre55)
pp 155-76
TEAM
LRN
programming
E\wliitionuT Compirt. 3 199-230
Montana D J 1995 Strongly typed genetic

References

159

Reynolds C W 1994 Evolution of obstacle avoidance beha\ ior: using noi\e to promote
robust solutions Adilcirrces it? Gewtic Algorithnis ed K Kinnear (Cambridge, MA:
MIT Press) pp 22 111.3
Rosca J P and Ballard D H 1996 Discovery of subroutines in genetic programming
Adwiwes in Gerzetic Progruntniing vol 2, ed P J Angeline and K Kinnear
(Cambridge, MA: MIT Press) pp 177-202
Teller A 1994 The evolution of mental models Adiwic.c),\ in Gcwetic*Algorithms ed
K Kinnear (Cambridge, MA: MIT Press) pp 199-220

TEAM LRN

20
Guidelines for a suitable encoding
David B Fogel and Peter J Angeline
In any evolutionary computation application to an optimization problem, the
human operator determines at least four aspects of the approach: representation,
variation operators, method of selection, and objective function. It could be
argued that the most crucial of these four is the objective function because it
defines the purpose of the operator in quantitative terms. Improperly specifying
the objective function can lead to generating the right answer to the wrong
problem. However it should be clear that the selections made for each of
these four aspects depend in part on the choices made for all the others. For
example, the objective function cannot be specified in the absence of a problem
representation. The choice for appropriate representation, however, cannot
be made in the absence of anticipating the variation operators, the selection
function, and the mathematical formulation of the problem to be solved. Thus,
an iterative procedure for adjusting the representation and search and selection
procedures in light of a specified objective function becomes necessary in many
applications of evolutionary computation. This section focuses on selecting the
representation for a problem, but it is important to remain cognizant of the
interdependent nature of these operations within any evolutionary computation.
There have been proposals that the most suitable encoding for any problem
is a binary encoding (Chapter IS) because it maximizes the number of
schemata being searched implicitly (Holland 1975, Goldberg 1989), but there
have been many examples in the evolutionary computation literature where
alternative representations have provided for algorithms with greater efficiency
and optimization effectiveness when compared with identical problems (see
e.g. the articles by Biick and Schwefel (1993) and Fogel and Stayton (1994)
among others). Davis ( 1991) and Michalewicz (1996) comment that in many
applications real-valued (Chapter 16) or other representations may be chosen to
advantage over binary encodings. There does not appear to be any general
benefit to maximizing implicit parallelism in evolutionary algorithms, and,
therefore, forcing problems to fit binary representation is not recommended.
The close relationship between representation and other facets of
evolutionary computation suggests that, in many cases, the appropriate choice of
representation arises from the operator's ability to visualize the dynamics of the
I60

TEAM LRN

Guidelines for a suitable encoding

161

resulting search on an adaptive landscape. For example, consider the problem

of finding the minimum of the quadratic surface
3

f ( x , y ) = x-

+ J3

x,?'

Immediately, it is easy to visualize this function as shown in figure 20.1.

If an evolutionary approach to the problem were to be taken, an intuitive
representation suggested by the surface is to operate on the real values of
(x,J) directly (rather than recoding these values into some other alphabet).
Accordingly, a reasonable choice of variation operator would be the imposition
of a continuous random perturbation to each dimension (x,y ) (perhaps a
zero-mean Gaussian perturbation as is common in evolution strategies and
evolutionary programming). This would be followed by a hard selection against
all but the best solution in the current population, given that the function is
strongly convex. With even slight experience, the resulting population dynamics
of this approach can be visualized without executing a single line of code. In
contrast, for this problem other representational choices and variation operators
(e.g. mapping the real numbers into binary and then applying crossover operators
to the binary encoding) are contrived, difficult to visualize, and appear more
likely to be ineffectual (see Schraudolph and Belew 1992, Fogel and Stayton
1994).

Figure 20.1. A quadratic bowl in two dimensions. The shape of the response surface
suggests a natural approach for optimization. The intuitive choice is to use real-valued
encodings and continuous variation operators. The shape of a response surface can be
useful in suggesting choices for suitable encodings.

Thus, the basic recommendation for choosing a suitable encoding is that

TEAMfrom
LRN the problem at hand. If a traveling
the representation should be suggested

I62

Guidelines for a suitable encoding

salesman problem is to be investigated, obvious natural choices for the encoding

are a li\t of cities to be visited in order, or a corresponding list of edges. For
ii di\crete-\ymbol time-series prediction problem, finite-state machines (Chapter
18)may be especially appropriate. For continuous time-series prediction, other
model forms (e.g. neural networks, ARMA, or Box-Jenkins) appear better
wited. In non\tationary environments, that is, fitness functions that are dynamic
rather than \tatic. it is often necessary to include some form of memory in
the representation. Diplodic representations-representations
that include two
alleles per gene-have been used to model cyclic environments (Goldberg and
Smith 1987, Ng and Wong 1995). The most natural choice for representation
is a wbjective choice, and it will differ across investigator$, although, like
a witable scientific model, a suitable representation should be as complex a\
neceswry (and no more so) and should 'explain' the phenomena inve\tigated,
which here means that the rewlting search should be viwalimble or imaginable
to some extent.

References
Back T and Schwefel H-P I993 An overview of evolutionary algorithms for parameter
optiinimtion E \ d i i t i o i i r i n Cornpiit. 1 1-24
Davis L ( e d ) 199 1 H m r l h o o k oj'Cerirtic A1,qorithni.s (New York: Van Nostrand Reinh1.M)
Fogel D B and Stayton L C I994 On the eft'ec(ibmcss of crossover in simulated
evolutionary o p t i m i h o n BioSystenis 32 I 7 1-82
Goldberg D E 1989 C o w t i c AIgorithi)is i i i S r ~ i r ~ Optimiyitiori,
~h.
triitl Metchine Lectriiirix
(Reading, MA: Addison-Wesley
Goldberg D E and Smith R E I987 Nonstationary function optimization using gewtic
algortihms with dominance and tiiploidy Proc. 2nd lut. Cot?$ o i i Gcwetic AI~qorit/irti,s
(Ctimhridge, MA, 1987) ed J J Grefenstette (Hillsdale, NJ: Erlhaum) pp 59-68
Holland J H 1975 Aclciptcitiori in Nlitiirul rind Artijic-icil Systeim (Atin Arhor, MI:
University of Michigan Press)
Michalewicz Z 1996 Genetic Algorithms + Dutct Striwtirres = Elwliitioii Pro,qrrinis 3rd
edn (Berlin: Springer)
Ng K P and Wong K C 1995 A new diploid scheme and dominance change mecha~rixm
for non-stationary function optimization Proc. 6th lnt. Car$ o i i Grnetic Algoritlrriis
(Pittshiirgh, PA. Jirly jYY.5) ed L J Eshelrnan (San Mateo, CA: Morgan Kaufm.inn)
pp 159-66
Schraudolph N N and Beleu R K 1992 Dynamic parameter encoding for geiietic
algorithms Mtrchiiie Ltwrriiri,q 9 9-2 1

TEAM LRN

21
~~

0ther representations
Peter J Angeline and David B Fogel

21.1 Mixed-integer structures

Many real-world applications suggest the use of representations that are hybrids
of the canonical representations. One common instance is the simultaneous use
of discrete and continuous object variables, with a general formulation of the
global optimization problem as follows (Back and Schutz 1995):
min{f(z. d ) l z E M , R

1M , d

N , Z, 2 N ]

Within evolution strategies and evolutionary programming, the common

representation is simply the real-integer vector pair (i.e. no effort is made to
encode these vectors into another representation such as binary). Sections 32.6
and 33.6 offer methods for mutating and recombining the abokve representations.
Mixed representations also occur in the application of evolutionary
algorithms to neural networks or fuzzy logic systems, where real-world
parameters are used to define weights or shapes of membership functions and
integer values are used to define the number of nodes and their connections, or
the number of membership functions (see e.g. Fogel 1995. Angeline et a1 1994,
McDonnell and Waagen 1994, Haffner and Sebald 1993).

21.2 Introns
In contrast to the above hybridization of different forms of representation,
another nontraditional approach has involved the inclusion of noncoding
regions (introns) within a solution (see e.g. Levenick 1991, Golden et cil 1995.
Wu and Lindsay 1995). Solutions are represented in the form
S I)intronl.rz(intronI. . .

IintronI.r,,

where there are 11 components to vector x. Introns have been hypothesized to

allow for greater efficiency in recombining building blocks (see Section33.6).
TEAM LRN

163

1 64

Other representations

In the standard genetic algorithm representation, the semantics of an allele

value (how the allele is interpreted) is usually tied to its position in the fixedlength ri-ary string. For instance, in a binary string representation, each position
signifies the presence or absence of a specific feature in the genome being
decoded. The difficulty with such a representation is that with positions in
the string representation that are semantically linked but separated by a large
number of intervening positions in the string crossover has a high probability
of disrupting beneficial settings for these two positions. Goldberg et crl ( 1989)
describe a representation for a genetic algorithm that embodies one approach to
addressing this problem. In their messy genetic algorithm (mGA). each allele
value is represented as a pair of values, one specifying the actual allele value
and one specifying the position the allele occupies. Messy GAS are defined to be
of variable length, and Goldberg et ~ r l(1989) describe appropriate methods for
resolving underdetermined or overdetermined genomes. In this representation it
is important to note that the semantics are literally carried along with the allele
value in the form of the allele's string position.
21.3 Diploid representations
Diploid representations. representations that include multiple allele values for
each position in the genome, hace been offered as mechanisms for modeling
cyclic environments. In a diploid representation, a method for determining
which allele value for a gene will be expressed is required to adjudicate
when the allele values do not agree. Building on earlier investigations (see
e.g. Bagley 1967, Hollstein 1971, Brindle 1981), Goldberg and Smith (1087)
demonstrate that an evolving dominance map allows quicker adaptation to
cyclical environment changes than either a haploid representation or a diploid
repre5entation using a tixed dominance mapping. Goldberg and Smith (1087)
use a triallelic representation from Hollstein ( 1971): I , i, and 0. Both I and I
map to the allele value of ' l ' , while 0 maps to the allele value of '0' with 1
dominating both i and 0 and 0 dominating i. Thus, the dominance of a 1 over
a 0 allele value could be altered via mutation by altering the value to an i. Ng
and Wong ( 1995) extend the multiallele approach to dominance computLdion
by adding a fourth value for a recessive 0. Thus 1 dominates 0 and o while 0
dominates i and o. When both allele values for a gene are dominant or reces4ve.
then one of the two values is chosen randomly to be the dominant value. Ng
and Wong (1995) also suggest that the dominance of all of the component5 in
the genome should be reversed when the fitne5s value of an individual falls by
20% or more between generations.

References
Angeline P J, Saunders G M and Pollack J B 1994 An evolutionary algorithm that
TEAM
LRNTr(it1.r.Neilrd Netbtwrkr NN-5 54-65
constructs recurrent neural networks
IEEE

I65

References

Back T and Schiitz M 1995 Evolution strategies for mixed-integer optimization of optical
multilayer sy\tem\ Proc. 4th Anit. Coi$ on E,dirtioiirii?) Pmgrcinimiiig (Strri Dic>go,
CA, Murcli 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge.
MA: MIT Press) pp 33-51
Bagley J D I967 The Behaiior of Adaptiipe Systents \ivhicA Employ Cerietic tirid
Correldon Algorithtns Doctoral Dissertation, University of Michigan: Unik er\ity
Micro fi 1ms 68-7556
Brindle A 198 I Geitc4c Algorithms f o r Fuizctiort 0ptinti;titioir Doctoral Di\sertation,
University of Alberta
Cobb H G and Grefenstette J J 1993 Genetic algorithms for tracking changing
environments Proc. 5th liit. Corzf: on Gerietic Algorithm ( Urb~iirci-CIiiim~~ni~~~i.
IL,
July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 523-30
of Mtdiiric.
Fogel D B 1995 Eidiitiortcir~ Coniputution: T o t i ~ ~ (tI l Nc.\i. Pliilosopli~~
Iiirelligem-e (Piscataway, NJ: IEEE)
Goldberg D E, Korb D E and Deb K 1989 Messy genetic algorithms: motivation, analysis,
and first results Comp1e.rSyst. 3 493-530
Goldberg D E and Smith R E 1987 Nonstationary function optimization using genetic
algorithm\ with dominance and diploidy Proc. 2nd Itit. Car$ o t i Genetic Algorithm
(Ccimbridge, MA, July 1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 59-68
Golden J B, Garcia E and Tibbetts C 1995 Evolutionary optimiyation o f a neural netuorkba\ed signal processor for photometric data from an automated DNA sequencer
Proc.. 4th A m . Car$ on E,vlutionan Prograriiriiirig ( S a n Diego, CA, Mlirdi 199.5)
ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge. MA: MIT Press)
pp 579-601
Haffner S B and Sebald A V 1993 Computer-aided design of fuzzy HVAC
controllers using evolutionary programming Proc. 2nd A i i r i . Coi$ o i i Eiwlictioriar-Jq
Prognrmi?iiiig ( S a n Diego, CA, 1993) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 98-1 07
Hollstein R B I97 1 Artijcicil Genetic Adaptation in Conipirter Control Systeim Doctoral
Dissertation, University of Michigan; University Microfilms 7 1-23, 773
Levenick J R 1991 Inserting introns improves genetic algorithm wccess rate: taking
a cue from biology Proc. 4th Int. Cotzf: on Genetic Alpwithim ( S m Diego, CA,
Jirly 1991) ed R K Belew and L B Booker (San Mateo, CA: Morgan Kaufmann)
pp 123-27
McDonnell J R and Waagen D 1994 Evolving recurrent perceptrons for time-series
modeling IEEE Traits. Neitrctl N e h w k s NN-5 24-38
Ng K P and Wong K C 1995 A new diploid scheme and dominance change mechanism
for non-stationary function optimization Proc. 6th I t i t . Coi!f: o i i Genetic Algorithm
(Pittsburgh, PA, J u l y 1995) ed L J Eshelman (San Mateo, CA: Morgan Kaufmann)
pp 159-66
Wu A S and Lindsay R K 1995 Empirical studies of the genetic algorithm with noncoding
segments E~dutioiiunComput. 3 I2 1-48

TEAM LRN

22
Introduction to selection
Kalyanmoy Deb

22.1

Working mechanisms

Selection is one of the main operators used in evolutionary algorithms. The

primary objective of the selection operator is to enzplzasize better solutions
in a population. This operator does not create any new solution, instead it
selects relatively good solutions from a population and deletes the remaining,
not-so-good, solutions. Thus, the selection operator is a mix of two different
concepts-reproduction and selection. When one or more copies of a good
solution are reproduced, this operation is called reproduction. Multiple copies
of a solution are placed in a population by deleting some inferior solutions.
This concept is known as selection. Although some EC studies use both these
concepts simultaneously, some studies use them separately.
The identification of good or bad solutions in a population is usu;dly
accomplished according to a solutions fitness. The essential idea is that a
solution having a better fitness must have a higher probability of selection.
However, selection operators differ in the way the copies are assigned to
better solutions. Some operators sort the population according to fitness and
deterministically choose the best few solutions. whereas some operators assign
a probability of selection to each solution according to fitness and make a copy
using that probability distribution. In the probubilistic selection operator, there
is some finite, albeit small, probability of rejecting a good solution and choosing
a bad solution. However. a selection operator is usually designed in a way so
that the above is a low-probability event. There is, of course, an advantage of
allowing this stochasticity (or flexibility) in the evolutionary algorithms. Due
to a small initial population or an improper parameter choice or in solving a
complex nonlinear fitness function, the best few individuals in a finite population
may sometimes represent a suboptimal region. If a deterministic selection
operator is used, these seemingly good individuals in the population will be
emphasized and the population may finally converge to a wrong solution.
However, if a stochastic selection operator is used, diversity in the population
will be maintained by occasionally choosing not-so-good solutions. This event
166

TEAM LRN

167

Pseudocode

may prevent EC algorithms from making a hasty decision in converging to a

wrong solution.
In the following, we present a pseudocode for the selection operator and
then discuss briefly some of the popular selection operators.

22.2 Pseudocode
Some EC algorithms (specifically, genetic algorithms (GAS) and genetic
prograininiiig (GP)) usually apply the selection operator first to select good
solutions and then apply the recombination and mutation operators on these
good solutions to create a hopefully better set of solutions. Other EC algorithms
(specifically, evolution strategies (ES) and adiitioiiury progrcinirnirzg (EP))
prefer using the recombination and mutation operator first to create a set of
solutions and then use the selection operator to choose a good set of solutions.
The selection operator in ( p A ) ES and EP techniques chooses the offspring
solutions from a combined population of parent solutions and solutions obtained
after recombination and mutation. In the case of EP, this is done statistically.
However, the selection operator in ( p ,A ) ES chooses the offspring solutions
only from the solutions obtained after the recombination and mutation operators.
Since the selection operators are different in different EC studies, it is difficult
to present a common code for all selection operators. However, the following
pseudocode is a generic for most of the selection operators used in EC studies.
The parameters U
, and A are the numbers of parent solutions and offspring
solutions after recombination and mutation operators, respectively.
The
parameter q is a parameter related to the operators selective pressure, a matter
we discuss later in this section. The population at iteration t is denoted by
P ( t ) = { a , az,
, . . .} and the population obtained after the recombination and
mutation operators is denoted by f ( t ) = {a,
, a>,. . .}. Since GAS and GP
techniques use the selection operator first, the population P ( t ) before the
selection operation is an empty set, with no solutions. The fitness function
is represented by F ( t ) .

Input: p , A, q , P ( t ) E Z, P ( t ) E Z E b , F ( t )
Output: P ( t ) = {a,,
ay...

I to

for i

a:(t)t S\election(P(t),
return({a,(r),. . . , a;:(t)

E ZL

Detailed discussions of some of the selection operators are presented in the

subsequent sections. Here, we outline a brief introduction to some of the popular
selection schemes, mentioned as Ssclcction in the above pseudocode.
In the proportionate selection operator, the expected number of copies a
solution receives is assigned proportionally to its fitness. Thus, a solution having
LRN
twice the fitness of another solution TEAM
receives
twice as many copies. The simplest

168

Introduction to selectiol

form of the proportionate selection scheme is known as the roulette-whcel

selection, where each solution in the population occupies an area on the roulette
wheel proportional to its fitness. Then, conceptually, the roulette wheel is spun
as many times as the population size, each time selecting a solution marked
by the roulette-wheel pointer. Since the solutions are marked proportionally to
their fitness, a solution with a higher fitness is likely to receive more copies
than a solution with a low fitness. There exists a number of variations to this
simple selection scheme, which are discussed in Chapter 23. However, one
limitation of the proportionate selection scheme is that since copies are assigned
proportionally to the fitness values, negative fitness values are not allowed.
Also, this scheme cannot handle minimization problems directly. (Minimization
problems must be transformed to an equivalent maximization problem in order
to use this operator.) Selecting solutions proportional to their fitness has two
inherent problems. If a population contains a solution having exceptionally
better fitness than the rest of the solutions in the population, this so-called
.sirpur.volirtiori will occupy most of the roulette-wheel area. Thus, most spinning
of the roulette wheel is likely to choose the same supersolution. This may cause
the population to lose genetic diversity and cause the algorithm to prematurely
converge to a suboptimal solution. The second inherent difficulty may arise
later in a simulation run, when most of the population members have more or
less the same fitness. In this case, the roulette wheel is marked almost equally
for each solution in the population and every solution becomes equally likely
to be selected. This has the effect of a random selection. Both these inheient
difficulties can be avoided by using a sculing scheme, where every solution
fitness is linearly mapped between a lower and an upper bound before marking
the roulette wheel (Goldberg 1989). This allows the selection operator to assign
a controlled number of copies, thereby eliminating both the above problems of
too large and random assignments. We discuss this scaling scheme further in the
next section. Although this selection scheme has been mostly used with GAS
and GP applications, in principle it can also be used with both multimembcred
ES and EP techniques.
In the toitrriciment selection operator, both the scaling problems mentioned
above are eliminated by playing tournaments among a specified number of parent
solutions according to fitness of solutions. In a tournament of y solutions, the
best solution is selected either deterministically or probabilistically. After the
tournament is played, there are two options-either all participating y solutions
are replaced into the population for the next tournament or they are not replaced
until a certain number of tournaments have been played. In its simplest form
(called the binary tournament selection), two solutions are picked and the bctter
solution is chosen. One advantage of this selection method is that this schzme
can handle both minimization and maximization problems without any structural
change in the fitness function. Only the solution having either the highest or
the lowest objective function value need to be chosen depending on whether
the problem is a maximization orTEAM
a minimization
problem. Moreover, it has
LRN

169

Pseudocode

no restriction on negative objective function values. An added advantage of

this scheme is that it is ideal for a parallel implementation. Since only a few
solutions are required to be compared at a time without resorting to calculation
of the population average fitness or any other population statistic, all solutions
participating in a tournament can be sent to one processor. Thus, tournaments
can be played in parallel on multiple processors and the complete selection
process may be performed quickly. Because of these properties, tournament
selection is fast becoming a popular selection scheme in most EC studies.
Tournament selection is discussed in detail in Chapter 24.
The r-nnkirzg selection operator is similar to proportionate selection except
that the solutions are ranked according to descending or ascending order of
their fitness values depending on whether it is a maximization or minimization
problem. Each solution is assigned a ranked fitness based on its rank in
the population. Thereafter, copies are allocated with the resulting selection
probabilities of the solutions calculated using the ranked fitness values. Like
tournament selection, this selection scheme can also handle negative fitness
values. There exists a number of other schemes based on the concept of the
ranking of solutions; these are discussed in Chapter 25.
In the Boltmnnri selection operator, a modified fitness is assigned to each
solution based on a Boltzmann probability distribution: F,= 1 /( 1 +exp( F, / T ) ) ,
where T is a parameter analogous to the temperature term in the Boltzmann
distribution. This parameter is reduced in a predefined manner in successiFfe
iterations. Under this selection scheme, a solution is selected based on the
above probability distribution. Since a large value of T is used initially. almost
any solution is equally likely to be selected, but, as the iterations progress, the
parameter T becomes small and only good solutions are selected. We discuss
this selection scheme further in Chapter 26.
In the ( p
A ) ES, the selection operator selects p best solutions
deterministically from a pool of all p parent solutions and A offspring
solutions. Since all parent and offspring solutions are compared, if performed
deterministically, this selection scheme guarantees preservation of the best
solution found in any iteration.
On the other hand, in the ( p , A) ES, the selection operator chooses p
best solutions from A (usually A > p ) offspring solutions obtained by the
recombinatior! and mutation operators. Unlike the ( p + A ) ES selection scheme,
the best solution found in any iteration is not guaranteed to be preserved
throughout a simulation. However, since many offspring solutions are created in
this scheme, the search is more exhaustive than that in the ( p A ) ES scheme.
In most applications of the ( p ,A ) ES selection scheme, a deterministic selection
of best p solutions is adopted.
In modern variants of the EP technique, a slightly different selection scheme
is used. In a pool of parent (of size p ) and offspring solutions (of size the same
as the parent population size), each solution is first assigned a score depending
on how many solutions it is better TEAM
than from
LRN a set of random solutions (of size

170

Introduction to selection

ei) chosen from the pool. The complete pool is then sorted in descending order

of this score and the first p solutions are chosen deterministically. Thus, this
selection scheme is similar to the ( p + p ) ES selection scheme with a tournament
selection of q tournament size. Back et cil ( 1994) analyzed this selection scheme
as a combination of ( p p ) ES and tournament selection schemes, and found
some convergence characteristics of this operator.
Goldberg and Deb (1991 ) have compared a number of popular selection
schemes in terms of their convergence properties, selective pressure, takeover
times, and growth factors, all of which are important in the understanding of
the power of different selection schemes used in GA and GP studies. Similar
studies have also been performed by Back et er1 (1994) for selection schemes
used in ES and EP studies. A detailed discussion of some analytical as well as
experimental comparisons of selection schemes is presented in Chapter 29. In
the following section, we briefly discuss the theory of selective pressure and its
importance in choosing a suitable selection operator for a particular application.

22.3 Theory of selective pressure

Selection operators are characterized by a parameter known as the .wImi\v~
prussiire, which relates to the takeover time of the selection operator. The

takeover time is defined as the speed at which the best solution in the initial
population would occupy the complete population by repeated application of the
selection operator alone (Back 1994. Goldberg and Deb 1991). If the takeover
time of a selection operator is lctrge (that is, the operator takes a large number
of iterations for the best solution to take over the population), the selective
pressure of the operator is smcill, and vice versa. Thus, the selective pressure or
the takeover time is an important parameter for successful operation of an EC
algorithm (Back 1994, Goldberg et crl 1993). This parameter gives an idea of
how greedy the selection operator is in terms of making the population uniform
with one particular solution. If a selection operator has a large selective pressure,
the population loses diversity in the population quickly. Thus. in order to awid
premature convergence to a wrong solution, either a large population is required
or highly disruptive recombination and mutation operators are needed. However.
a selection operator with a small selection pressure makes a slow convergence
and permits the recombination and mutation operators enough iterations to
properly search the space. Goldberg and Deb ( 1991) have calculated takeover
times of a number of selection operators used i n GAS and GP studies and Back
( 1994) has calculated the takeover time for a number of selection operators used
in ES, EP, and GA studies. The former study has also introduced two other
parameters-early and late growth rate-characterizing the selection operators.
The growth rate is defined as the ratio of the number of the best solutions
in two consecutive iterations. Since most selection operators have different
growth rates as the iterations progress, two different growth rates-early and
TEAM
LRNgrowth rate is calculated initially,
late growth rates-are defined. The
early

171

References

when the proportion of the best solution in the population is negligible. The
late growth rate is calculated later, when the proportion of the best solution in
the population is large (about 0.5). The early growth rate is important, especially
if a quick near-optimizer algorithm is desired, whereas the late growth rate can
be a useful measure if precision in the final solution is important. Goldberg
and Deb (1991) have calculated these growth rates for a number of selection
operators used in GAS. A comparison of different selection schemes based on
some of the above criteria is given in Chapter 29.
The above discussion suggests that, for a successful EC simulation. the
required selection pressure of a selection operator depends on the recombination
and mutation operators used. A selection scheme with a large selection pressure
can be used, but only with highly disruptive recombination and mutation
operators. Goldberg et cil ( 1993) and later Thierens and Goldberg ( 1993) ha\ie
found functional relationships between the selective pressure and the probabilit)?
of crossover for successful working of selectorecombinative GAS. These studies
show that a large selection pressure can be used but only with a large probability
of crossover. However. if a reasonable selection pressure is used, GAS ~ . ' o r k
successfully for a wide variety of crossover probablities. Similar studies can
also be performed with ES and EP algorithms.

References
Back T I994 Selectitre pressure in etrolutionary algorithms: a charactcrimtion of selection
mechanisms Proc. 1st IEEE C o i ~ oti
; E~dirtioti~ii;~~
Cornpirtrrtiori ( Orlmrio. Fl,.
1994) (Piscataway, NJ: IEEE) pp 57-62
Back T, Rudolph G and Schwefel H-P 1994 Evolutionary programming and ctwlution
strategies: similarities and differences Proc. 2ritl Aiiri. Col$ o i i ,*olirtioiitir;\~
Progrrinit?iitig ( S r i t i Diego, CA, Jirl?, 1994) ed D B Fogel and W Atmar ( L a Jolla.
CA : E v o 1U t i on ary Program m i ng Soc i et y )
Goldberg D E 1989 Genetic AIgor-ithnis in Setrr-(4, Optii~iixitiou,tirid Mticliirir Lctrr.riiricq
(Reading, MA: Addison-Wesley)
Goldberg D E and Deb K 1991 A comparison of selection schemes used i n genetic
I N ) ed G J E Raw'lins
algorithms Foiinckitions of Genetic Algorithnis ~Bl(~oiiii~i~stoii,
(San Mateo, CA: Morgan Kaufmann) pp 69-93
Goldberg D E, Deb K and Theirens D 1993 Toward a better understanding o f mixing i n
genetic algorithms J. SIC 32 10-6
Thierens D and Goldberg D E 1993 Mixing in genetic algorithms Proc. -5th / l i t . Car$ or1
Genetic. Algorithrm (Ur~?(i,i,i-Chai?iptiigii,
lL, Ji4!\' 1993) ed S Forrest (San Mateo.
CA: Morgan Kaufmann) pp 38-45

TEAM LRN

23
Proportional selection and sampling
algorithms
J o h I i G refe I i stet te

23.1

Introduction

Selection (Chapter 22) is the process of choosing individuals for reproduction in

an evolutionary algorithm. One popular form of selection is called proportiomil
.se/ectioir. As the name implies. this approach involves creating a number
of offspring in proportion to an individuals fitness. This approach was
proposed and analyzed by Holland ( 1975) and has been used widely in many
i 111 p It.me n ta t ion s of e v 0111t i onary algorithms .
B e si de s h ii v i n g s o me i n te re st i n g mat hem at i c a I properties , proportion a 1
selection proLrides a naturril counterpart in artificial evolutionary systems to the
umal practice in population genetics of defining an individuals fitness in terms
of its number of offspring.
For clarity of discussion, i t is convenient to decompose the selection process
into distinct steps, namely:
( i ) map the objective function to fitness.
( i i ) create a probability distribution proportional to fitness, and
( i i i ) draw samples from this distribution.
The tirst three sections o f this article discuss these steps. The final section
discusses some results in the theory of proportional selection. including the
schema theorem and the impact of the fitness function, and two characterizations
of selective pressure.

23.2

Fitness functions

The ecduation process of indictiduals in an evolutionary algorithm begins with

the user-defined ohjjec-ti\-e, t i i i i ( * t i o i i ,

172

TEAM LRN

173

Fitness functions

where A , is the object variable space. The objective function typically measures
some cost to be minimized or some reward to be maximized. The definition of
the objective function is, of course, application dependent. The characterization
of how well evolutionary algorithms perform on different classes of objective
functions is a topic of continuing research. However, a few general design
principles are clear when using an evolutionary algorithm.
(i) The objective function must reflect the relevant measures to be optimized.
Evolutionary algorithms are notoriously opportunistic, and there are several
known instances of an algorithm optimizing the stated objectii e function.
only to have the user realize that the objective function did not actually
represent the intended measure.
(ii) The objective function should exhibit some regularities over the space
defined by the selected representation.
(iii) The objective function should provide enough information to dri\fe the
selective pressure of the evolutionary algorithm. For example, needle-ina-haystack functions, i.e. functions that assign nearly equal calue to every
candidate solution except the optimum, should be avoided.
The jitness fiiriction
@:A,+IR+

maps the raw scores of the objective function to a non-negative interval. The
fitness function is often a composition of the objecti\ye function and a scaling
function g:
W 4 ( t ) )= g ( . f ( l N ) ) )
where cc,(t) E A , . Such a mapping is necessary if the goal is to minimize
the objective function, since higher fitness values correspond to lower objectiLre
values in this case. For example, one fitness function that might be used when
the goal is to minimize the objective function is
@ ( a m )= .finax - . f ( G ( t ) )

where fmax is the maximum value of the objective function. If the global
maximum value of the objective function is unknown, an alternatiLre is
@(U&))

= .fmax(t)- . f ( W ) )

where .fnlax(t)is the maximum observed value of the objective function up to

time t . There are many other plausible alternatives, such as
@(a,(t)>
=

1 f ( W )) .fi,,l(t)
where & ( t ) is the minimum observed value of the objective function up to
time t . For maximization problems, this becomes

@ ( a m )=

1 .fmax(t)- f ( % ( t ) )
TEAM LRN
Note that the latter two fitness functions
yield a range of (0, 1 1 .

174

Proportional selection and sampling algorithms

23.2. I

Fitrirss sculirig

As an evolutionary algorithm progresses, the population often becomes

dominated by high-performance individuals with a narrow range of objective
values. In this case, the fitness functions described above tend to assign similar
fitness values to all members of the population, leading to a loss in the selective
pressure toward the better individuals. To address this problem, fitriexs s(u1iizg
methods that accentuate small differences in objective values are often used in
order to maintain a productive level of selective pressure.
One approach to fitness scaling (Grefenstette 1986) is to define the fitness
function as a time-varying linear transformation of the objective value, for
ex amp 1e
@(cl;(t))

= q f ( l l ; ( t ) )-- B ( t )

where a is + I for maximization problems and - 1 for minimization problems,

and p ( t ) represents the worst value seen in the last few generations. Since p ( t )
generally improves over time, this scaling method provides greater selection
pressure later in the search. This method is sensitive, however, to lethals.
poorly performing individuals that may occasionally arise through crossover or
mutation. Smoother scaling can be achieved by defining p ( t ) as a recencyweighted running average of the worst observed objective values, for example

where 6 is an update rate of, say, 0.1, and fMOr,,(t)is the worst objective value
in the population at time t .
Sigizitr w r l i r i g (Goldberg 1989) is based on the distribution of objective
kralues within the current population. It is defined as follows:

where ~ f ( tis) the mean objective value of the current population, a,.(r)is the
(sample) standard deviation of the objective values in the current population,
and is a constant, say c. = 2. The idea is that f ( t )-cat (I)
represents the Icast
acceptable objective value for any reproducing individual. As the population
improves, this statistic tracks the improvement, yielding a level of seleclive
pressure that is sensitive to the spread of performance values in the population.
Fitness scaling methods based on power laws have also been proposed A
fixed transformation of the form
c s

where k is a problern-dependent parameter, is used by Gillies (1985). BoIt:intriiri

TEAM
selc~c-fion
(de la M a ~ aand Tidor 1993)
is aLRN
power-law-based scaling method that

I75

Selection probabilities

draws upon techniques used in simulated annealing. The fitness function is a

time-varying transformation given by

where the parameter T can be used to control the level of selective pressure
during the course of the evolution. It is suggested by de la Maza and Tidor
(1993) that, if T decreases with time as in a simulated annealing procedure,
then a higher level of selective pressure results than with proportional selection
without fitness scaling.

23.3 Selection probabilities

Once the fitness values are assigned, the next step in proportional selection is
to create a probability distribution such that the probability of selecting a gi\.en
individual for reproduction is proportional to the individual's fitness. That is,

23.4 Sampling
In an incremental, or steady-state, algorithm. the probability distribution can
be used to select one parent at a time. This procedure is commonly called
the roirlette \ t h e 1 sampling algorithm, since one can think of the probability
distribution as defining a roulette wheel on which each slice has a width
corresponding to the individual's selection probability, and the sampling can
be envisioned as spinning the roulette wheel and testing which slice ends up at
the top. The pseudocode for this is shown below:

Input: probability distribution Pr

Output: 1 1 , the selected parent
1
2
3
4
5

roulette wheel (Pr):

I1 + 1 ;
sum t Pr(n);
sample U
U(0,I ) ;
while sum < 14 do
12 t ( n
I);
sum t sum Pr(n);
od
return ( n ) ;

TEAM LRN

Proportional selection and sampling algorithrns

176

In a generational algorithm, the entire population is replaced during each

generation, \o the probability distribution is sampled p times. This could be
implemented by p independent calls to the roulette wheel procedure, but such an
implementation may exhibit a high variance in the number of offspring assigned
to each individual. For example, it is possible that the individual with the largtjt
selection probability may be assigned no offspring in a particular generation.
Baker ( 1987) developed an algorithm called .stochcr.stic urii\*er..\crl strmpling (SUS)
that exhibits less variance than repeated calls t o the roulette wheel algorithm.
The idea is to make a single draw from a uniform distribution, and use this
to determine how many offspring to assign to all parents. The pseudocode for
sus follows:

Input: a probability distribution, Pr; the total number of children

to assign, A.
Output: c = ( ( - 1 , . . . , c-,, ), where is the number of children assigned
t o individual c l , , and
= A.
SUS(Pr,A):
sample 11 U ( O ,
3
sum +- 0.0;
-5
for i = 1 to p do
5
C, +- 0:
6
sum +- sum Pr(i);
7
while 11 < sum do
8
+- c,
1:
9
11 + 11 4- ;;
od
od
10
return c ;
1

i):

Note that the pseudocode allows for any number h > 0 of children to
be \pecitied. If h = I . SUS behaves like the roulette wheel function. For
generational algorithm\, SUS is usually invoked with h = p .
In can he shown that the expected number of offspring that SUS assign\ to
indikidual i i \ A Pr(i), and that on each invocation of the procedure, SUS assigns
either LA Pr(i)J or [A Pr(i)l offspring to individual i. Finally, SUS is optimdly
efficient, making a \in& pa\\ over the individual3 to assign all off9pring.

23.5 Theory
The section presents some results from the theory of proportional selection.
TEAMfollowing
LRN
First. the schema theorem is described,
by a discussion of the effects

I77

Theory

of the fitness function on the allocation of trials t o schemata. The selectice

pressure of proportional selection is characterized i n two ~ ' a y s . First. the
selection differential describes the effects of selection on the mean population
fitness. Second, the takeover time describes the convergence rate of population
toward the optimal individual, in the absence of other genetic operators.

In the above description, Prprop(i)is the probability of selecting indi\,idual i for

reproduction. In a generational evolutionary algorithm, the entire population is
replaced, so the expected number of offspring of indi\ridual i is 1-1Pr,,,,,,(i). This
value is called the tov-get sampling rate, tsr(tr, , t ) of the indii idual (Grefenatette
199 I ). For any selection algorithm, the allocation of offspring to indi\iduals
induces a corresponding allocation to hyperplanes reprewnted by the indicvidual\:

where ( I , E H and n z ( H , r ) denotes the number of representatikw of hyperplane

H in population P ( r ) . In the remainder of this di\cu\sion. b e ill refer to
tsr( H . r ) as the tril;qet scrmpliq rate of H at time t .
For proportional selection, we have

where Q, is the fitness function and & ( t ) denotes the average titness of the
individuals in P ( t ) . The most important feature of proportional selection is
that it induces the following target sampling rates for all hyperplanes i n the
population :
rti(

tsr(H. t ) =

H.t)

tsr(ir,, t )

(23.1)
where @ ( H ,t ) is simply the average fitness of the representatiies of H i n P ( t ) .
This result is the heart of the schema theorem (Holland 1975), which has been
TEAM LRN
called the jiulckri?iei~rcIltlzeoreni of genetic
u l g o r i t h s (Goldberg 1 989 ).

178

Proportional selection and sampling algorithms

Schenitr thuor-em. In a genetic algorithm using a proportional selection

algorithm, the following holds for each hyperplane H represented in P ( t ) :

where M ( H , t ) is the expected number of representatives of hyperplane H in

P ( t ) . and lldl\r(H, I ) i \ the probability of disruption due to genetic operators
\uch as crossover and mutation.
Holland provides an analysis of the disruptive effects of various genetic
operator\, and shows that hyperplanes with \hart defining lengths, for example.
haLe a small chance of disruption due to one-point crossover and mutation
operators. Others have extended this analysis to many varieties of genet ic
operator\.
The main thrust of the \chema theorem 1s that trials are allocated i i i
p ~ i i - u l l t ~tlo a large number of hyperplane\ (i.e. the one\ with \hart definition
length\) according to the sampling rate ( 2 3 . I ), with minor di\ruption from the
recombination operator,. Over succeeding generation\, the number of trial\
a 11oc ;it ed to e x tan t short - de ti n i t i on - I e n g t h h y pe rp 1Line s wi t h pe r \ i ste n t 1y abo\ eaberage oherved fitness is expected to grow rapidly, while trial\ to thaw Miith
belou -average ob\er\ied fitness generally decline rapidly.

In his early analysis of genetic algorithms. Holland implicitly assumes a

nonnegative fitness and does not explicitly address the problem of mapping from
the objectikte function to fitness in his brief discussion of function optimization
(Holland 1975. ch 3). Consequently, many of the schema analysis results in the
literature use the symbol .f to refer to the fitriess and not to obje(.ti\v J i i i z c * t i o i i
\Aues. The methods mentioned above for mapping the objective function to the
fitness L.ii1ut.s must be kept in mind when interpreting the schema theorem. For
example, consider two genetic algorithms that both L I S proportional
~
selection
but that differ in that one L W S the fitness function

and the other uses the fitness function

where y # 0. Then for any hyperplane H represented in a given population

P ( t ) , the target sampling rate for H in the first algorithm is
TEAM LRN

Theory

179

while the target sampling rate for H in the second algorithm is

Even though both genetic algorithms behave according to the schema theorem.
they clearly allocate trials to hyperplane H at different rates, and thus produce
entirely different sequences of populations. The relationship between the schema
theorem and the objective function becomes even more complex if the fitness
function Q> is dynamically scaled during the course of the algorithm. Clearly.
the allocation of trials described by schema theorem depends on the precise
form of the fitness function used in the evolutionary algorithm. And of course,
crossover and mutation will also interact with selection.

23.5.3 Selectiori diferentinl

Drawing on the terminology of selective breeding, Miihlenbein and SchlierkampVoosen (1993) define the selection differential S ( t ) of a selection method as the
difference between the mean fitness of the selected parents and the mean fitness
of the population at time t . For proportional selection. they show that the
selection differential is given by

where cri(t) is the fitness variance of the population at time t . From this
formula, it is easy to see that, without dynamic fitness scaling, an evolutionary
algorithm tends to stagnate over time since cri(t) tends to decrease and & ( t )
tends to increase. The fitness scaling techniques described above are intended
to mitigate this effect. In addition, operators which produce random variation
(e.g. mutation) can also be used to reduce stagnation in the population.
23.5.4 Takeoiler time
Tckeolvr tirize refers to the number of generations required for an evolutionary
algorithm operating under selection alone (i.e. no other operators such as
mutation or crossover) to converge to a population consisting entirely of
instances of the optimal individual, starting from a population that contains a
single instance of the optimal individual. Goldberg and Deb (1991) show that,
assuming Q> = f , the takeover time r in a population of size p for proportional
selection is
plnp- 1
t]=
TEAM LRN
c

I80

Proportional selection and sampling algorithms

c
for .f.(x) = exp(ox). Goldberg and Deb compare these results with several other
selection mechanisms and show that the takeover time for proportional selection
(without fitness scaling) is larger than for many other selection methods.

References
Back T I994 Selective pressure in evolutionary algorithms: a characterization of selection
mechanisms Proc. 1 s t IEEE Int. Car$ on E,~oliitioiieii-yCoitipictcition (Orltindo, FL,
Jirrie 1994) (Piscataway, NJ: IEEE) pp 57-62
Baker J E 1987 Reducing bias and inefficiency in the selection algorithm Pmc. 2nd Irtt.
Cot$ o t i Genetic A1gorithrn.s (Cctmhridge, MA, 1987) ed J Grefenstette (Hillsdale,
NJ: Erlbaum) pp 14-21
de la Maza M and Tidor B 1993 An analysis of selection procedures with particular
attention paid to proportional and Boltzmann selection Proc. 5th Int. Cot$ o i i
Genetic Algorithms ( UrbaiiN-Chctmyciign,IL, July 1993) ed S Forrest (San Mateo.
CA: Morgan Kaufmann) pp 124-31
Gillies A M 1985 Mwhiiie Leciriiirig ProcwIures jhr Generciting Iiticige Dotttairi Fecitiire
Detectors Doctoral Dissertation, University of Michigan, Ann Arbor
Goldberg D E 1989 Genetic Algorithnis in Sectrch, Optimizcition, Litid Machine Leciriiiiig
(Reading, MA: Addison-Wesley
Goldberg D and Deb K 1991 A comparative analysis of selection schemes used in
genetic algorithms Foirnckitions of Genetic Algoritlrtns ed G Rawlins (San Mateo,
CA: Morgan Kaufmann) pp 69-93
Gretenstette J 1986 Optimization of control paranieters for genetic algorithms IEEE
T r m s . Syst. M m Cyhenzet. SMC-16 122-8
-1
99 1 Conditions for implicit parallelism Founrlciriotr.s ($Getietir. AIgorithm.~ed G
Rawlins (San Mateo, CA: Morgan Kaufmann) pp 252-61
Holland J H 1975 Adqmtioti iii Neitimil ctnd Art$c*icil Systems ( A m Arbor, MI:
University of Michigan Press)
Muhlenbein H and Schlierkamp-Voosen D 1993 Predictive models for the breeder genetic
algorithm E\dict. Compirt. 1 2 5 4 9

TEAM LRN

24
Tournament selection
Tobias Blickle

24.1 Working mechanism

In tournament selection a group of q individuals is randomly chosen from
the population. They may be drawn from the population with or without
replacement. This group takes part in a tourrzanzent; that is, a winning individual
is determined depending on its fitness value. The best individual having the
highest fitness value is usually chosen deterministically though occasionally a
stochastic selection may be made. In both cases only the winner is inserted
into the next population and the process is repeated h times to obtain a
new population. Often, tournaments are held between two individuals (binary
tournament). However, this can be generalized to an arbitrary group size 4
called toitninment six.
The following description assumes that the individuals are drawn with
replacement and the winning individual is deterministically selected.
Input: Population P ( r ) E I ~ tournament
,
size y E { 1,2. . . . , A]
Output: Population after selection P ( t )
I tournament(q, U I , . . . , a ~ ) :
2 for i t I to h do
3
a: t best fit individual from 4 randomly chosen
individuals from { a , ,. . . , U * ] ;
od
4 return {a,, . . . , ail.
Tournament selection can be implemented very efficiently and has the time
complexity O ( h ) as no sorting of the population is required. However, the
above algorithm leads to high variance in the expected number of offspring as
h independent trials are carried out.
Tournament selection is translation and scaling invariant (de la Maza and
Tidor 1993). This means that a scaling or translation of the fitness value does
not affect the behavior of the selection method. Therefore, scaling techniques
as used for proportional selection are not necessary, simplifying the application
of the selection method.
TEAM LRN

182

Tournament selection

Furthermore, tournament selection is well suited for parallel evo1ution;u-y

algorithms. In most selection schemes global calculations are necessary to
compute the reproduction rates of the individuals. For example, in proportional
selection the mean of the fitness values in the population is required, and in
ranking selection and truncation selection a sorting of the whole population is
necessary. However, in tournament selection the tournaments can be performed
independently of each other such that only groups of 4 individuals need to
communicate.

24.2 Parameter settings

y = 1 corresponds to no selection at all (the individuals are randomly picked

from the population). Binary tournament is equivalent to linear ranking selection

with 1- = 1 /A (Blickle and Thiele 199Sa), where r1- gives the expected number
of offspring of the worst individual. With increasing 4 the selection pressure
increases (for a quantitative discussion of selection pressure see below). For
many applications in genetic programming values 4 E {6,. . . . 10) have been
recommended.

24.3 Formal description

Tournament selection has been well studied (Goldberg and Deb 1991, Biick
1994, 1995, Blickle and Thiele 1995a, b, Miller and Goldberg 1995). The
following description is based on the fitness distribution of the population.
Let y ( P ) denote the number of unique fitness values in the population.
Then p( P ) = ( p b l ( p ) , p f 2 ( p ).,. . . p b , , , , , ( p ) ) E [O, 1 1 ~ is the fitness distribution
of the population P , with F l ( P ) < F ? ( P ) <
< Fy(p,(P).pl,(p,
gives the proportion of individuals with fitnesc value F , ( P ) in the population
P . Furthermore the cumulative fitness distribution is denoted by R(Pr =
( R j , ( p )R
, b I ( p j . . . . , Rb,l,,l(p)) E
[O.
R I . , , ~ gives
)
the number of
individuals with fitness value F , ( P ) or less in the population P , i.e. R l , ( p , =
E;:; P / , ( P ) and R W ) := 0With these definitions, the selection operator s can be viewed as an operator
on fitness distributions (Blickle and Thiele 199%). The expected fitness
distribution after tournament selection with tournament size 4 is .srour(q) :
. . . . p;,(,,)). where
Ry+-t Ry,
sr,,,,(4)(p(P))= ( p L l ( p , ,
+

(24.1)

The expected number of occurrences of an individual with fitness value

F, ( P ) is given by p L , ( p , / p b l ( p ) . Consequently, stochastic universal sampling
(Baker 1987) (see Chapter 23) can also be used for tournament selection. This
TEAM
LRNvariance in the expected number of
almost completely reduces the usually
high

Properties

I83

offspring. However, the time complexity of the selection algorithm increases to

O ( hIn A) as calculation of the fitness distribution is required.
For the analytical analysis it is advantageous to use continuous fitness
distributions. The continuous form of (24.1) is given by

p(F)= 4p(F) (R(F))4+

(24.2)

p ( s ) d x is the
where p ( F ) is the continuous form of p ( P ) , R ( F ) =
cumulative continuous fitness distribution and F o ( P ) < F 5 F , , , p , ( P ) the
range of the distribution function p ( F ) .

24.4 Properties
24.4. I

Conc*ateriationof toiiniamerzts

An interesting property of tournament selection is the concatenation of several

selection phases. Assuming an arbitrary population with a fitness distribution
p , tournament selection with tournament size y~ is applied followed by
tournament selection with tournament size qz on the resulting population and
no recombination in between. The obtained expected fitness distribution is the
same as if only a single tournament selection with tournament size y ~ y were
Blickle and Thiele 199%):
applied to the initial distribution
stour(42)(stour(4
24.4.2 Takeolvr time

The takeover time was introduced by Goldberg and Deb (1991) to describe the
selection pressure of a selection method. The takeover time T * is the number of
generations needed under pure selection for a initial single best-fit individual to
fill up the whole population. The takeover time can. for example, be calculated
combining (24.1) and (24.3) as follows. Only the best individual is considered
and its expected proportion pie,, after tournament selection can be obtained as
= 1 /A and
pic\t = 1 - (1 - 1 /A)Y, which is a special case of (24. I ) using
Rb,-\[ = I . Performing t such tournaments subsequently with no recombination
in between leads to $best = 1 - ( 1 - I/h)qr by repeatedly applying (24.3).
Goldberg and Deb (1991) solved this equation for t and gave the following
approximation for the takeover time:
(24.4)
Figure 24.1 shows the dependence of the takeover time on the tournament size
4. For scaling purposes an artificial population size of h = e is assumed, such
that (24.4) simplifies to t:,,,(q)
ITEAM
/ Inq.LRN

Tournament selection

184

24.4.3 Selection intensity

The selection intensity is another measure for the strength of selection which
is borrowed from population genetics. The selection irrturzsity S is the change
in the average fitness of the population due to selection divided by the mean
variance of the population before selection 0 , that is, S = ( U * - u)/a, with
11 average fitness before selection. and U * average fitness after selection. To
eliminate the dependence of the selection intensity on the initial distribution
one usually assumes a Gaussian-distributed initial population (Muhlenbein and
Schlierkamp-Voosen 1993). Under this assumption, the selection intensity of
tournament selection is determined by

The dependence of the selection intensity on the tournament size is shown

in figure 24.1.

1.5

, q
33

Figure 24.1. The selection intensity S, the loss of diversity 0 , and the takeover time r*
(for h = e) of tournament selection in dependence cm the tournament size 4.

The known exact solutions of the integral equation (24.5) are given in
table 24.1. These values can also be obtained using the results of the order
statistics theory (Back 1995). The following formula was derived by Blickle
and Thiele (199%) and approximates the selection intensity with a relative error
of less than 1 % for tournament sizes of y > 5:

TEAM LRN

185

References

Table 24.1. Known exact values for the selection intensity of tournament selection.
4

24.4.4 Loss of di\qersitji

During every selection phase bad individuals are replaced by copies of better
ones. Thereby a certain amount of 'genetic material' contained in the bad
individuals is lost. The loss of diversity 8 is the proportion of the population
that is not selected for the next population (Blickle and Thiele 199%). Baker
( 1989) introduces a similar measure called 'reproduction rate, RR'. RR gives the
percentage of individuals that is selected to reproduce, hence RR = 100(1 - 0 ) .
For tournament selection this value computes to (Blickle and Thiele 199%)

It is interesting to note that the loss of diversity is independent of the initial

fitness distribution @. Furthermore, a relatively moderate tournament size of
y = 5 leads to a loss of diversity of almost 50% (see figure 24.1).

References
Back T 1994 Selective pressure in evolutionary algorithms: a characterization of \election
mechanisms Proc. 1st IEEE Corif on E\dutionun Conipututiori (Orlutido, FL, Jurie
1994) (Piscataway, NJ: IEEE) pp 57-62
-I
995 Generalized convergence models for tournament- and ( p . A)-selection Proc.
6th Itit. Conj: on Genetic Algorithms (Pittshitrg, PA, Jiily 1995) ed L J Eshelman
(San Mateo, CA: Morgan Kaufmann) pp 2-8
Baker J E 1987 Reducing bias and inefficiency in the selection algorithm P m c . 2tid I t i t .
Con$ o t i Genetic Algorithrirs (Cumhridge, MA, 1987) ed J J Gref'enstette (Hillsdale,
NJ: Erlbaum) pp 14-21
-1
989 An Atiulvsis of the Effects of Selection iti Getirtic Algorithtm PhD Thesis.
Graduate School of Vanderbilt University, Nashville, TN
Blickle T and Thiele L 1995a A Coniparisori of Selection Schenic~si ~ w diri Gericvic'
Algorithnts Technical Report I I , Computer Engineering and Communication
Networks Lab (TIK), Swiss Federal Institute of Technology (ETH) Zurich
--I
99% A mathematical analysis of tournament selection Pmc. 6th lttt. Cor!f: on
Genetic Algorithms (Pittshurg, PA, July 1995) ed L J Eshclman (San Mateo, CA:
Morgan Kaufmann) pp 9-16
de la Maza M and Tidor B 1993 An analysis of selection procedure\ with particular
attention paid to proportional and Boltzmann selection P r w . 5 t h I t i t . Cot$ o t i
Gertetic Algorithms (Urhaticc-Chunipuigti, /L, Jirly 1993) ed S Forrest (San Mateo,
TEAM LRN
CA: Morgan Kaufmann) pp 124-31

186

Tournament selection

Goldberg D E and Deb K 1991 A comparative analysis of \election jchemes used in

genetic algorithms Foi4nckitiotr.t (4
Genetic Algorithms ed G Rawlins (San Mafeo.
CA: Morgan Kaufmann) pp 69-93
Miller B L and Goldberg D E I995 Genetic Algorithms, T o i i n i c o w i i t Selectioii, ciircl the
Eflectt o j Noise Technical Report 95006, Illin&\ Genetic Algorithm Laboratory.
Univerity of Urbana-Champaign
Muhlenbein H and Schlierkamp-Voojen D 1993 Predictilrc model\ for the breeder gen2trc
algorithm Etwlrtt. Coitipict. 1 2 5 3 9

TEAM LRN

25
Rank-based selection
John Grefenstette

25.1 Introduction
Selection is the process of choosing individuals for reproduction or survival in
an evolutionary algorithm. Rank-based selectiorz or rcinking means that only
the rank ordering of the fitness of the individuals within the current population
determines the probability of selection.
As discussed in Chapter 23, the selection process may be decomposed into
distinct steps:
( i ) Map the objective function to fitness.
( i i ) Create a probability distribution based on fitness.
(iii) Draw samples from this distribution.
Ranking simplifies step (i), the mapping from the objective function .f to
the fitness function @. All that is needed is

@ ( a ; )= 6 f ( c i ; )

where 6 is 1 for maximization problems and - I for minimization problems.

Ranking also eliminates the need for fitness scaling (see Section 23.1 ), since
selection pressure is maintained even if the objective function values within the
population converge to a very narrow range, as often happens as the population
evolves.
This section discusses step (ii), the creation of the selection probability
distribution based on fitness. The final step (iii) is independent of the selection
method, and the stochastic universal sampling algorithm (see Section 23.4) is
an appropriate sampling procedure.
Besides its simplicity, other motivations for using rank-based selection
include:
(i)

Under proportional selection, a super individual, i.e. an individual with

vastly superior objective value, might completely take over the population
in a single generation unless an artificial limit is placed on the maximum
TEAM LRN

187

188

Ran k-based selection

number of offspring for any individual. Ranking help5 prevent premature

convergence due to '\uper' individuals, since the best indiv idual is alw2,ys
as\igned the same selection probability, regardless of it\ objective value.
( i i ) Ranking may be a natural choice for problems in which it is difficult
to precisely specify an objective function, e.g. if the objective function
inL olves a person's wbjective preference for alternative solutions. For
wch problems it may make little sense to pay too much attention to ihe
exact \ d u e \ of the objective function, if exact values exkt at all.
The following sections describe various form\ of linear and nonlinear ranking
algorithm\. The final section presents some of the theory of rank-ba\ed \election.

25.2 Linear ranking

Lirwcrr. r-cirikirig assigns a selection probability to each individual that is
proportional to the individual's rank (where the rank of the least tit is defined
to be zero and the rank of the most fit is defined to be p - I , given a population
of size p ) . For a generational algorithm, linear ranking can be implemented
by specifying a single parameter, firallk, the expected number of offspring to be
allocated to the best individual during each generation. The selection probability
for individual i is then defined as follows:

where (Yra[,h is the number of offspring allocated to the worst individual. The
sum of the selection probabilities is then

It follows that (;Yranh = 2 -prank, and 1 5 Prarlh 5 2. That is, the expected number
of offspring of the best individual is no more than twice that of the population
average. This shows how ranking can avoid premature convergence caused by
'super' individuals.

25.3 Nonlinear ranking

Nonlirzecrr rmzkhg assigns selection probabilities that are based on each
individual's rank, but are not proportional to the rank. For example, the selecLion
probabilities might be proportional to the square of the rank:

Prsq-ranh ( 1 =

+ [rank(i)'/(p
TEAM LRN

l)'](b - a )

( p ,A), ( p

+ A) and threshold selection

189

where c = (/3 - a ) p ( 2 p - 1 ) / 6 ( p- 1) pa is a normalization factor. This

version has two parameters, a and /3, where 0 < a < /3, such that the selection
probabilities range from a / c to P / c .
Even more aggressive forms of ranking are possible. For example, one could
assign selection probabilities based on a geometric distribution:
a(l - a)/l-l-~al~h(/)
Prgcorn-rank =
This distribution arises if selection occurs as a result of independent Bernoulli
trials over the individuals in rank order, with the probability of selecting the next
individual equal to a , and was introduced in the GENITOR system (Whitley
and Kauth 1988, Whitley 1989).
Another variation that provides exponential probabilities based on rank is

(25.I )
for a suitable normalization factor c. Both of the latter methods strongly bias
the selection toward the best few individuals in the population, perhaps at the
cost of premature convergence.
25.4

( p , A), ( p

+ A) and threshold selection

( p + h ) methods used in evolution

The ( p ,A) and

strategies (see Chapter
9 and Schwefel 19773 are deterministic rank-based selection methods. In
( p ,h ) selection, h = k p for some k > 1 . The process is that k offspring
are generated from each parent in the current population through mutation or
possibly recombination, and the best p offspring are selected for retention. This
method is similar to the technique called beam senrdz in artificial intelligence
7 is optimal
(Shapiro 1990). Experimental studies indicate that a value of k
(Schwefel 1987).
In ( p+ A ) selection, the best p individuals are selected from the union of the
p parents and the h offspring. Thus, ( p + h ) is an elitist method, since it always
retains the best individuals unless they are replaced by superior individuals.
According to Back and Schwefel (1993), the ( p . A ) method is preferable to
( p + A), since it is more robust on probabilistic or changing environments.
The ( p ,A ) method is closely related to methods known as threshold selection
or truncation selection in the genetic algorithm literature. In threshold selection
the best T p individuals are assigned a uniform selection probability, and the
rest of the population is discarded:
Prthrc\h-ranh (i) =

0
1/TP

if rank(i) < ( I
otherwise.

T)p

The parameter T is the called the threshold, where 0 < T 5 1 . According to

Miihlenbein and Schlierkamp-Voosen (1993), T should be chosen in the range
TEAM LRN
0.1-0.5.

I90

Ran k-based select ion

Threshold selection is essentially a ( p ,A ) method, with p = T p and

h = p , except that threshold selection is usually implemented as a probabilistic
procedure using the distribution Prrh[csh-l.a[,k,while systems using ( p ,A ) are
usually deterministic.
25.5

Theory

The theory of rank-based selection has received less attention than the
proportional selection method, due in part to the difficulties in applying the
schema theorem to ranking. The next subsection describes the issues that arise
in the schema analysis of ranking, and shows that ranking does exhibit a form
of implicit parallelism. Characterizations of the selective pressure of ranking
are also described, including its fertility rate, selective differential, and takeover
time. Finally, a simple substitution result is mentioned.

25.5. I

Ruliking und implicit purullelisrn

The use of rank-based selection makes it difficult to relate the schema theorem
to the original objective function, since the mean observed rank of a schema is
generally unrelated to the mean observed objective value for that schema. As
a result, the relative target sampling rates (see Section 23.5.1) of two schemata
under ranking cannot be predicted based on the mean objective values of the
schemata, in contrast to proportional selection. For example, consider the
following case:

where
U I ,U - I , (1s E

~ 2 . E~ Hz.
3

Assume that the goal is to maximize the objective function f . Even though
~ ( H I=) 20 > 10 = f ( H z ) , ranking will assign a higher target sampling rate
to H: than to H I .
However, ranking does exhibit a weaker form of implicit parcrllu1i.m.
meaning that it allocates search effort in a way that differentiates among a
large number of competing areas of the search space on the basis of a limited
number of explicit evaluations of knowledge structures (Grefenstette 199 1 ). The
following definitions assume that the goal is to maximize the objective funcrion.
A fitness function @ is called monotoriic- it

@([I,

* f<./1 I

5 @(q
)

f(L1,).

That is, a monotonic fitness function does not reverse the sense of any pairwise
ranking provided by the objective function. A fitness function is called strict&
morrototiic if it is monotonic and
TEAM LRN

191

Theory

A strictly monotonic fitness function preserves the relative ranking of apy two
individuals in the search space with distinct objective function values. Since
@ ( a , )= S f ( a ; ) ,ranking uses a strictly monotonic fitness function by definition.
Likewise, a selection algorithm is called momtonic if

where tsr(a) is the target sampling rate, or expected number of offspring, for
individual U . That is, a monotonic selection algorithm is one that respects
the s i r n ~ i i ~ a l - ~ , f - t h e - ~principle.
tfe.~f
A selection algorithm is called stric-tly
rnomtonic if it is monotonic and
@((a,)

+ tsr(u,) < tsr(u,).

< @(uJ)

A strictly monotonic selection algorithm assigns a higher selection probability to

individuals with better fitness values. Linear ranking selection and proportional
selection are both strictly monotonic, whereas threshold selection is monotonic
but not strict, since it may assign the same number of offspring to individuals
with different fitness values.
Finally, an evolutionary algorithm is called admissi6le if its fitness function
and selection algorithm are both monotonic. An evolutionary algorithm is sfricf
iff its fitness function and selection algorithm are both strictly monotonic.
Now, consider two arbitrary subsets of the solution space, A and B .
sorted by objective function value. By definition, B partially doniiiinfes A
( A -i B ) at time t if each representative of B is at least as good as the
corresponding representative of A. The following theorem (Grefenstette I99 I )
partially characterizes the implicit parallelism exhibited by ranking (any many
other selection methods):

Implicit parallelism c$ admissible evolutionary algorithms. In any admissible

evolutionary algorithm, if (A + B ) then tsr(A) 5 tsr(B). Furthermore, in any
strict evolutionary algorithm, if (A + B ) then tsr(A) < tsr(B).
One illustration of this result to rank-based selection is shown in figure 25.1.
Let A be the set of points in the space with objective function values between
the dotted lines. Let B be the set of points in the space with objective values
above the region between the dotted lines. Then, in any population that contains
points from both set A and set B , the number of offspring allocated to B by any
strict evolutionary algorithm grows strictly faster than the number allocated to
set A , since any subset of B dominates any subset of A . This example illustrates
implicit parnllelism because it holds no matter where the dotted lines are drawn.
This result holds not only for rank-based selection, but for any fitness function
LRN
and selection algorithm that satisfyTEAM
the requirement
of admissibility.

Rank-based selection

192

Figure 25.1. Two regions defined by range of objective values.

25.5.2 Fertility rate

The <fertilihrate .F of a selection method is the proportion of the population that
is expected to have at least one offspring as a result of the selection process.
Other terms that have been used for this include fertility factor (Baker 1985,
1987), reproductive rate (Baker, 1989), and diversity (Blickle and Thiele, 1995).
Baker (1987, 1989) shows that, for linear ranking, the fertility rate obeys
the following formula:

.F=I--

B-1
4

where ,tl is the number of offspring allocated to the best individual, 1 5 5 2.

So F ranges in value from 1 (if = 1) to 0.75 (if ,tl = 2) for linear ranking.
25.5.3 Selection diflerential
Drawing on the terminology of selective breeding, Muhlenbein and SchlierkampVoosen (1993) define the selection diferential S ( t ) of a selection method as the
difference between the mean fitness of the selected parents and the mean fitness
of the population at time t . If the fitness values are normally distributed the
selection differential for truncation selection is approximately

where ap is the standard deviation of the fitness values in the population, and
I is a value called the selection intensity. Back (1995) quantifies the selection
intensity for general ( p ,A) selection as follows:

TEAM LRN

193
where Z I : Aare order statistics based in the fitness of individuals in the current
population. That is, I is the average of the expectations of the p best samples
taken from iid normally distributed random variables Z . This analysis shows
that I is approximately proportional to A / p , and experimental studies confirm
this relationship (Back 1995, Miihlenbein and Schlierkamp-Voosen 1993).
25.5.4 Takeover time
Takeo/ier tirne refers to the number of generations required for an evolutionary
algorithm operating under selection alone (i.e. no other operators such as
mutation or crossover) to converge to a population consisting entirely of
instances of the optimal individual, starting from a population that contains
a single instance of the optimal individual. According to Goldberg and Deb
(1991), the approximate takeover time t in a population of size p for rankbased selection is
In p h ( l n p )
t =
In 2

for linear ranking with

prank

= 2 and

for linear ranking with 1 < prank < 2.

25.5.5 Sitbstitutinrz theorem

One interesting feature of rank-based selection is that it is clearly less sensitive

to the objective function than proportional selection. As a result, i t possible
to make the following observation about evolutionary algorithms that use rankbased selection:
Substitictiorz theorem. Let EA be an evolutionary algorithm that uses rankbased selection, along with any forms of mutation and recombination that
are independent of the the objective values of individuals. If EA optimizes
an objective function f then EA also optimizes the function <q 2 f , for any
monotonically increasing g.

Prm$ For any monotonically increasing function g. the composition g c .f

induces the same rank ordering of the search space as f . It follows that a rankbased algorithm EA produces an identical sequence of populations for objective
functions .f and g o f , assuming that mutation and recombination in EA are
independent of the the objective values of individuals. Since f and ,q c .f have
TEAM
LRN
the same optimal solutions, the result
follows.

I94

Rank- based selection

For example. a rank-based evolutionary algorithm that optimizes a giken

function f ( x ) in t steps will also optimize the function (J'(x))" in t steps. for
any even 11 > 0.

References
Bick T I995 Generalized convergence models tor tournament- and ( p ,A)-selection PIOC..
6th Int. Cot!$ o i i Genetic Algorithriis (Pittshitrgh. PA, J u l y 1995) ed L J Eshelnian
(San Mateo. CA: Morgan Kaufmann) pp 2-8
Back T and H-P Schwefel 1993 An overview of evolutionary algorithms for paramcter
optimization Ekditt. Coinpitt. 1 1-23
Baker J I985 Adaptive selection methods for genetic algorithms Pmc. 1st h i t . Car$ oii
Genetic Algorithms (Pittshitrgh, PA, July I9851 cd J J Grefenstette (Hillsdale, NJ:
Lawrence Erlbaum) pp 101-1 I
-I987 Reducing bias and inefficiency in the selection algorithm Proc. 2nd lrrt. CoiIf
o t i Getretic- A1,qorithtii.s (Cmtihridge, MA, 1087) ed J J Grefenstette (Hillsdale, NJ:
Erlbaum) pp 14-21
___ 1989 Aiiri1J~si.s
(J'tlre &fects of'Selc~choirin Genetic-Algorithms Doctoral Dissertatlon.
Department o f Computer Science, Vanderbilt University
Blickle T and Thiele L 1995 A mathematical analysis o f tournament selection Proc. 6th
Irrr. Car$ oii Getretic Algorithnis (Pittshiirgli, PA, Jitly 1995) ed L Eshclman ( San
Mateo, CA: Morgan Kaufmann) pp 9-16
Goldberg D and Deb K 1991 A cotnparative analysis of selection schemes used in
genetic algorithms Fountltitions ($Genetic Algoritlinis ed G Rawlins (San Mateo,
CA: Morgan Kaufmann) pp 69-93
Grefenstette J 199 1 Conditions for implicit parallelism FoioidLitions o$Generic- Algorit~rms
ed G Rawlins (San Mateo, CA: Morgan Kaufmann) pp 252-61
Muhlenbein H and Schlierkamp-Voosen D 1993 Predictive models for the breeder genetic
algorithm E\*olict. C'oiiipt. 1 25-49
Sc h we fe I H - P I 9 77 Nit r r r tJrisc tie 0 1 7 tirn iuriI I rg I'on Corny ii ter-ModeI f en in itte I s tle r
E~~olirtioti.s.str~rte,~ic.
( / i i t r n l i . s t ~ i ~ ~ l i tSjxteni
i t t r ~ ~ Rcr.setrrc.h 26) (Base]: Birkhauser)
____ 1987 Collective phenoinena in evolutionary systems Prepririts of' the 3 1st h i r .
Meetitig Iiitertrutiotid Societj- jt)r Gerierril Sj:~tem.sResectriA (Bicrltrpest) vol 2,
pp 1025-33
Shapiro S C (ed) I990 Enc-ycloprdirr o j Artijicid l~rtelligenc*evol 1 (New York: Wiley)
Whitley D 1989 The GENITOR algorithm and selective pressure: why rank-b.ised
allocation o f reproductive trials is best Proc. 3rd Iiit. Coizj: o i i Genetic A1gorirhri.s
(Fliirjiix, VA, Jiiire 1989) ed J Schaffer (San Matzo, CA: Morgan Kaufmann) pp I 1621
Whitley D and Kauth J 1988 GENITOR: a different genetic algorithm Pro(,. R O ~ A J .
Moitnttiin Col$ o i i Artificicil Iiitrlligrriw ( D e i i \ ~ r CO)
,
pp 1 1 8-30

TEAM LRN

26
Boltzmann selection
Sarnir W Mahfoud

26.1

Introduction

Boltzmann selection mechanisms thermodynamically control the selection

pressure in an evolutionary algorithm (EA), using principles from siriiirlcifed
m i r m i l i t i g (SA) (Kirpatrick et NI 1983). Boltzmann selection mechanisms can
be used to indefinitely prolong an EA's search, in order to locate better final
solutions.
In EAs that employ Boltzmann selection mechanisms, i t is often impossible
to separate the selection mechanism from the rest of the EA. In fact. the
mechanics of the recombination and neighborhood operators are critical to the
generation of the proper temporal population distributions. Therefore, most
of the following discusses Boltmarzii EAs rather than Boltzmann selection
mechanisms in isolation.
Boltzmann EAs represent parallel extensions of the inherently serial SA. In
addition, theoretical proDfs of asymptotic, global convergence for SA carry over
to certain Boltzmann selection EAs (Mahfoud and Goldberg 1995).
The heart of Boltzmann selection mechanisms is the Bolt;niiruri triul, a
competition between current solution i and alternative solution j , in which
i wins with logistic probability

(26.1)
where T is temperature and .f; is the energy, cost. or objective function value
(assuming minimization) of solution i . Slight variations of the Boltzmann trial
exist, but all variations essentially accomplish the same thing when iterated (the
winner of a trial becomes solution i for the next trial): at fixed T , given a
sufficient number of Boltzmann trials, a Boltzmann distribution arises among
the winning solutions (over time). The intent of the Boltzmann trial is that at
high T , i and j win with nearly equal probabilities, making the system fluctuate
wildly from solution to solution; at low T , the better of the two solutions nearly
always wins, resulting in a relatively stable system.
TEAM LRN

I95

196

Boltzmann selection

Several types of Boltzmann algorithm exist, each designed for slightly

different purposes. Boltzmann toiirrzcmeiit selec'tiorz (see Chapter 24 and
Goldberg 1990, Mahfoud 1993) is designed to give the population tzic-hirzg
capabilities (Mahfoud 1995), but is not able to significantly slow the population'\
convergence. (Con\vrgeric.e refers to a population's decrease in diversity over
time, as measured by an appropriatc diversity measure.) Whether any Boltzm;inn
EA is capable of performing effective niching remains an open question.
The Boltzmann selection method of de la Maza and Tidor ( 1993) scales the
fitnesses of population elements, following fitness assignment, according to the
Boltzmann distribution. It is designed to control the convergence of traditional
selection.
Pcrrdlel rec.ot?zhiticiti\lesitizitlatecl crrztiecilitig (PRSA) (Mahfoud and Goldberg
1992, 1995) allows control of EA convergence, achieves a true parallelizat ion
of SA, and inherits SA's convergence proofs. PRSA is the Boltzmann EA
discussed in the remainder of this section.

26.2 Simulated annealing

SA is an optimization technique. analogous to the physical process of annealing.
SA starts with a high temperature T and any initial state. A neighborhood
operator is applied to the current state i to yield state j . If f ) < j;,j becomes the
current state. Otherwise j becomes the current state with probability e'
'.
(If j does not become the current state. i remains the current state.) The
application of the neighborhood operator and the probabilistic acceptance 01 the
newly generated state are repeated either for a fixed number of iteration.;, or
until a quasi-equilibrium is reached. The entire above-described procedure is
performed repeatedly, each time starting from the current i and from a lower T .
At any given T , a sufficient number of iterations always leads to equilibrium,
at which point the temporal distribution of accepted states is stationary. ('This
stationary distribution is Boltzmann.) The SA algorithm, as described abov?. is
called the Metropolis erlgoritfzm. What distinguishes the Metropolis algorithm
is the criterion by which the newly generated state is accepted or rejected. An
alternative criterion is that of equation (26.1). Both criteria lead to a Boltzniann
distribution.
The key to achieving good performance with SA, as well as to proving
global convergence, is that a stationary distribution must be reached at each
temperature, and cooling (lowering T ) must proceed sufficiently slowly.

'I-'!
' 8 '

26.3 Working mechanism for parallel recombinative simulated annealing

PRSA is a population-level implementation of' simulated annealing. Instead of
processing one solution at a time, it processes an entire population of solutions in
parallel, using a recombination operator (typically crossover, see Chapter 33 I and
a neighborhood operator (typically TEAM
rrzictcition,
LRN see Chapter 32). The combination

Pseudocode for a common variation of parallel recombinative simulated

I97
annealing
of crossover and mutation produces a population-level neighborhood operator
whose action on the entire population parallels the action of SAs neighborhood
operator on a single solution. (See figure 26.1.) It is interesting to note that
without crossover, PRSA would be equivalent to running p independent SAS,
where p is population size. Without mutation, PRSAs global convergence
proofs would no longer hold.
PRSA works by pairing all population elements, at random, for crossoFfer
each generation. After crossover and mutation, children compete against their
parents in Boltzmann trials. Winners advance to the next generation.
In the Boltzmann trial step, many competitions are possible between two
children and two parents. One possibility, double nccel,tc~iic.e/~eje~,t;oii.
allows
both parents to compete as a unit against both children: the sum of the
two parents energies should be substituted for .f; in equation (26.1); the
sum of the two childrens energies, for fJ. A second possibility, sirigle
ac.c.eptnnc,e/rejec.tion,holds two competitions, each time pitting one child against
one parent. There are several possible single acceptance/rejection competitions.
For instance, each parent can always compete against the child formed from
its own right end and the other parents left end (assuming single-point
crossover). Other possibilities and their consequences are outlined by Mahfoud
and Goldberg (1995).

26.4 Pseudocode for a common variation of parallel recombinative

simulated annealing
The pseudocode at the top of the next page describes a common variation
of PRSA that employs single acceptance/rejection competitions, a static
stopping criterion, and random-without replacement-pairing of population
elements for recombination. The cooling schedule is set by the two functions
initialize-temperature() and adjust-temperature(). These two functions. as well
as initialize-population(), are shown without arguments, because their arguments
depend upon the type of cooling schedule and initialization chosen by the
user. The function random() simply returns a pseudorandom real number on
the interval (0, 1).

26.5 Parameters and their settings

PRSA allows the use of any recombination and neighborhood operators. It
performs minimization by default; maximization can be accomplished by
reversing the sign of all objective function values. Population size ( p ) remains
constant from generation to generation. The number of generations the algorithm
runs can either be fixed, as in the pseudocode, or dynamic, determined by a userspecified stopping or convergence criterion that is perhaps tied to the cooling
TEAM LRN
schedule.

198

Boltzmann selection

Input: g-number of generations to run, p-population

Output: P(g)-the final population

size

P (0) c initialize-population()
T ( 1 ) c initialize-temperature()
for t +- 1 to g do
P ( t ) t shuffle(P(t - I ) )
for i t 0 to p / 2 - I do
PI +- W J + l ( t )
P2 +- a2,+2(t)
{ c l , c . 2 ) t recombine(p1, p z )
t neighborhood(c.1)
ci t neighborhood(c.2)
if random() > [ I + e ( l ) - ~ ( ~ ) l ~ then
( f ) l -alz , + ~ ( t t
) c, fi
if random() > [ I + el f(/>)-!(;)I/r(f)
] - I then U ? ~ + Z (+-~ )c$ fi
od
T ( t 1 ) t adjust-temperature()
od
(9;

PRSA requires a user to select a population size, a type of competition,

recombination and neighborhood operators, and a cooling schedule. Prior
research offers some guidelines (Mahfoud and Goldberg 1992, 1995). A good
rule of thumb for population size is to choose as large a population size as
system limitations and time constraints allow. In general, smaller populations
require longer cooling schedules. The type of competition previously employed
is single acceptancehejection, in which each parent competes against the child
formed from its own right end and the other parents left end (under single-point
crossover).
Appropriate recombination and neighborhood operators are problem specific.
For example, in optimization of traditional binary encodings, one might employ
single-point crossover and mutation; in permutation problems, permutationbased crossover and inversion would be more appropriate.
Many styles of cooling schedule exist, but their discussion is beyond the
scope of this section. Several studies contain thorough discussions of cooling
(Aarts and Korst 1989, Azencott 1992, Ingber and Rosen 1992, Romeo and
Sangiovanni-Vincentelli I99 1 ). Perhaps the simplest type of cooling schedule
is to start at a high T , and to periodically lower T through multiplicatiori by
a positive constant such as 0.95. At each T , a number of generations are
performed. In general, the more generations performed at each T and the higher
TEAM
the multiplicative constant, the better
theLRN
end result.

199

Global convergence theory and proofs

26.6 Global convergence theory and proofs

The most straightforward global convergence proof for any variation of PRSA
shows that the variation is a special case of standard SA. This results in the
transfer of SAs convergence proof to the PRSA variant. Details of PRSAs
convergence proofs are given by Mahfoud and Goldberg (1995).
The variation of PRSA that we consider employs selection of parents with
replacement, and double acceptanceh-ejection. No population element may be
selected as both parents. (Self-mating is disallowed.)
Many authors have taken the viewpoint that SA is essentially an EA with
a population size of one. Our proof takes the opposite viewpoint. showing an
EA (PRSA) to be a special case of SA. To see this, concatenate all strings of
the PRSA population in a side-by-side fashion to form one superstring. Define
the fitness of this superstring to be the sum of the individual fitnesses of its
component substrings (the former population elements). Let cost be the negated
fitness of this superstring. The cost function will reach a global minimum only
when each substring is identically at a global maximum. Thus, to maximize all
elements of the former population, PRSA can search for a global minimum for
the cost function assigned to its superstring.
Consider the superstring as our structure to be optimized. Our chosen
variation of PRSA, as displayed graphically in figure 26.1, is now a special case
of SA, in which the crossover-plus-mutation neighborhood operator is applied
to selected portions of the superstring to generate new superstrings. Crossoverplus-mutations net effect as a population-level neighborhood operator is to
swap two blocks of the superstring, and then probabilistically flip bits of these
swapped blocks and of two other blocks (the other halves of each parent).

i
j

A:B

C:D

A : D

C : B
1

Figure 26.1. The population, after application of crossover and mutation (step 1 ),
transitions from superstring i to superstring j . After a Boltzmann trial (step 2), either
i or j becomes the current population. Individual population elements are represented
as rectangles within the superstrings. Blocks A, B, C, and D represent portions of
individual population elements, prior to crossover and mutation. Crossover points are
shown as dashed lines. Blocks A, B, C, and D result from applying mutation to A, B.
C, and D.

As a special case of SA, the chosen variation of PRSA inherits the

TEAM LRNthe population-level neighborhood
global convergence proof of SA, provided

200

Boltzmann selectic2

operator meets certain conditions. According to Aarts and Korst (1989), two
conditions on the neighborhood generation mechanism are sufficient to guarantee
asymptotic global convergence. The first condition is that the neighborhood
operator must be able to move from any state to a globally optimal state in a finite
number of transitions. The presence of mutation satisfies this requirement. The
second condition is symmetry. It requires that the probability at any temperature
of generating state y from state .r is the same as the probability of generating
state .r from state 3. Symmetry holds for common crossover operators such as
single-point, multipoint, and uniform crossover (Mahfoud and Goldberg 199s ).

References
Aarts E and Korst J 1989 Siinitkitetl Atinediiig titid Holtmciiin Mtrchines: CI Stochastic
Appro(rch to Comhirrtrtoricrl 0prinii;crtinn crnd Nritrrrl Corripiiriricq (Chichest3r:
Wiley)
Azencott R (ed) I992 Simitkitt~dAniir-.diiig: Ptiriillelizcrtioii Techiiiyites (New York:
Wiley)
de la Maza M and Tidor B 1993 An analysis of selection procedures with particular
attention paid to proportional and Boltzmann selection Proc. 5th Int. Car$ o t i
Getretic Algorithms ( Ur~~urici-C~i~tnip~tigti,
lL, Jrtly IY93) ed S Forrest (San Matco.
CA: Morgan Kaufmann) pp 124-31
Goldberg D E 1990 A note on Boltzmann tournament selection for genetic algorithms
and po pu 1at i o n - or ie n t ed si m u 1ated anneal i ng Conzplex Sjw. 4 445-60
Ingber L and Rosen B I992 Genetic algorithms and very fast simulated re-annealing: a
comparison Mcith. Cotitpt. Modelling 16 87- 100
Kirpatrick S, Gelatt C D Jr and Vecchi M P 1983 Optimization by simulated annealing
Science 220 67 1-80
Mahtoud S W 1993 Finite Markov chain models of an alternative selection strategy tor
the genetic algorithm Conip1e.v Syst. 7 155-70
-1995 Nithirig Methods fiw Crtietic Algorithms Doctoral Dissertation and IlliG.4L
Report 9500 I . UniLwsity of Illinois at Urhana-Champaign. Illinois Gencstic
Algorithms Laboratory; Dissertcition Ahstrctcts Int. 56(9) p 49878 (Uni\vr:virx
Mii.ro~1rii.vY543663)
Mahtoud S W and Goldberg D E 1992 A genetic algorithm for parallel simulated
annealing Proc. 2nd lilt. Col$ oti Partillel Prohleni Sohirig from Nmire (Briissols,
1992) ed R Manner and B Manderick (Amsterdam: Elsevier) pp 301-10
-1995 Parallel recombinative simulated annealing: a genetic algorithm Ptrrtillel
cor?lpllt. 2 1 1 -2 8
Romeo F and Sangiovanni-Vincentelli A 199I A theoretical framework for simulated
annealing Algorithniicvi 6 3 0 2 4 5

TEAM LRN

27
Other selection methods
David B Fogel

27.1

Introduction

In addition to the methods of selection presented in other sections of this

chapter, other procedures for selecting parents of successive generations are of
interest. These include the tournament selection typically used in eLtolutionary
programming (Fogel 1995, p 137), soft brood selection offered within research
in genetic programming (Altenberg 1994a, b), disruptive selection (Kuo and
Hwang 1993). Boltzmann selection (de la Maza and Tidor 1993). nonlinear
ranking selection (Michalewicz I996), competitive selection (Hillis 1992,
Angeline and Pollack 1993, Sebald and Schlenzig 1994), and the use of lifespan
(Back 1996).

27.2 Tournament selection

The tournament selection typically performed in evolutionary programming
allows for tuning the degree of stringency of the selection imposed. Rather
than selecting on the basis of each solutions fitness or error in light of the
objective function at hand, selection is made on the basis of the number of ,tim
earned in a competition. Each solution is made to compete with some number,
4 , of randomly selected solutions from the population. In each pairing, if the
first solutions score is at least as good as the randomly selected opponent, the
first solution receives a win. Thus up to 4 wins can be earned. This competition
is conducted for all solutions in the population and selection then chooses the
best subset of a given size from the population based on the number of wins
each solution has earned. For y = 1, the procedure yields essentially a random
walk with very low selection pressure. For 4 = x,the procedure becomes
selection based on objective function scores (with no probabilistic selection).
For practical purposes. 4 3 10 is often considered relatively hard selection, and
y in the range of three to five is considered soft. Soft selection allows for lower
probabilities of becoming trapped at local optima for periods of time.
TEAM LRN

20 I

202

Other selection methods

27.3 Soft brood selection

Soft brood selection holds a tournament (see Chapter 24) between members of
a brood of two parents. The winner of the tournament is considered to be the
offspring contributed by the mating. Soft brood selection is intended to shield
the recombination operator from the costs of producing deleteriou\ offspring.
It cull\ such offspring, essentially testing for their viability before being placed
into competition with the remainder of the population. (For further details m
the effects of \oft brood \election on wbexpressions in tree {tructures. \ee the
article by Altenberg ( 1994a).)

27.4 Disruptive selection

Disruptive selection can be used to select against individuals with moder;tte
values (in contrast to stabilizing selection which acts against extreme values, or
directional selection which acts to increase or decrease values). Kuo and Hwang
(1993) suggested a fitness function of the form

where .f(z)is the objective value of the solution z and f ( t ) is the mean of all
solutions in the population at time t . Thus a solutions fitness increases with
its distance from the mean of all current solutions. The idea is to distribute
more search effort to both the extremely good and extremely bad solutions. The
utility of this method is certainly very problem dependent.

27.5

Boltzmann selection

Boltzmann selection (as offered by de la Maza and Tidor 1993) proceeds as

where X is a population of solutions, U ( X ) is the problem dependent objective

function, F , ( . ) is the fitness function for the ith solution in X . U , ( . ) is the
objective function evaluated for the i t h solution in X , and T is a variable
temperature parameter. De la Maza and Tidor ( 1993) suggest that this method
of assigning fitness proportional selection converges faster than traditional
proportional selection. BZck ( 1994). however, describes this as a misleading
name for yet another scaling method for proportional selection.

27.6 Nonlinear ranking selection

Nonlinear ranking selection (Michalewicz 1996, pp 60-1) is a variant of linear
ranking selection. Recall that for linear ranking selection. the probability of a
solution with a given rank being selected can be set as
P(rank) TEAM
= 4 -LRN
(rank - 1)r

203

Competitive selection

where q is a user-defined parameter. For each lower rank. the probability of

being selected is reduced by a factor of r-. The requirement that the sum of all
the probabilities for each ranked solution must be equal to unity implies that
4 = r-(popsize - 1 )/2

+ 1 /popsize

where popsize is the number of solutions in the population. This relationship

can be made nonlinear by setting:
P(rank) = y ( I - 4)r3111.-'
where y E (0, I ) and does not depend on popsize; larger ~raluesof 4 imply
stronger selective pressure. Back ( 1994) notes that this nonlinear ranking method
fails to sum to unity and can be made practically identical to tournament selection
under the choice of 4.

27.7

Competitive selection

Competitive selection is implemented such that the fitness of a solution is

determined by its interactions with other members of the population, or other
members of a jointly evolving but separate population. Hillis (1992) used this
concept to evolke corting networks in which a population of sorting network\
competed against a population of various permutation\; the networhs mere
scored in light of how well they sorted the permutation\ and the permutation\
were scored in light of how well they could defeat the sorting netmforhs.
Angeline and Pollack (1993) used a similar idea to evolke programs to play
tic-tac-toe. Sebald and Schlenzig ( I 994) used evolutionary programming
on competing populations to generate suitable blood pre\sure controller\ for
simulated patients undergoing cardiac surgery (i.e. controllers were scored on
how well they maintained the patient's blood pressure while patient\ were scored
on how well they defeated the controllers). Fogel and Burgin (1969) describe
experiments in which competing evolutionary programs played a prisoner's
dilemma game using finite-state machines, but insufficient detail is provided
to allow for replication of the results. Axelrod (1987). and others. offered
an apparently similar procedure for evolving rule sets describing alternatii e
behaviors in the iterated prisoner's dilemma.

27.8

Variable lifespan

Finally, Back (1996) notes that the concept of a variable lifespan has been
incorporated into the ( p . A ) selection of evolution strategies by Schwefel and
Rudolph ( 1995) by allowing the parents to survive some number of generations.
When this number is one generation, the method is the familiar comma strategy:
TEAM
LRN
at infinity, the method becomes a plus
strategy.

2 0.1

Other selection methock

References

Altcnberg L 1993, Emergent phenomena in genetic programming, Proca. -3rd Anti. Cot:/:
o t i E\-olirtiotiiit;v Pro,qrtiriitiiiticq ( S m Diego. CA, Fohriuit;v 1994) cd A V Sebald arid
1- J Fogcl (Singapore: World Scientific) pp 233-1 1
__I
OC)3b The c\olution o f evolvability in genetic programming Atl\*ciricx~si t i Getic~ric.
~ro~qrtittittiiti~q
ed K Kinnear (Cambridge, MA: MIT Press) pp 37-73
Axelrod R I987 The c\~olutiono t strategies in the iterated prisoners dilemma Grtietic.
Algoritlirtis titit1 SirtiirItitcd A t i t i t w l i t i g ed L Davis ( Los AI tos, CA: Morg;in
Kaufmann) pp 3 2 1 I
Angclinc P J and Pollack J B 1993 Competitive environments e\wlve better solutions for
complex tasks Pro(.. 5th I t i t . Cot!f: o t i Gciirtic Algorithim ( U r t ~ ~ i t i ~ i - C l i ~ i t t i/I.,
i,~ii~~~i
Jirl\s I Y Y 3 ) ed S Forrest (San Matco, CA: Morgan Kaufmann) pp 264-70
Back -I- I c)W Selective pressure in evolutionary algorithms: a characterization of se1ectic)n
riicchanisms Prot.. t.vt /EEE Corlf: O I I E \ d l t r i o t l t i r - j , Cortipiittitiori (Orlutido, F I,,
100.3) (Piscataway, NJ: IEEE) pp 57-62
___ I 096 Et.olittioticir;\. tllgot-ithis it! Tlitwr;~~
tint1 Prwticv ( New, York:
Oxford
Llii i vers i t j , Press )
de la M w a hl and Tidor B 1993 An analysis ot selection procedures with particular
attention paid to proportional and Boltmiann selection Proc.. 5th I t i t . Cot!/: w i
G c v i t h c AI~qoritlit~rs
I ~ l r ~ , ~ i t i t i - C l i ( i t i i i , IL,
( i i ~Jirly
~ ~ i ~19Y-3) ed S Forrest (San Mattw,
CA: Morgun Kauft-nann) pp 124-3 1
FogeI D B I995 E\~olirtioritir-y Coriil,irtcctioti: To\rwtd ( I N r ~ yPhilosophy of Mtic~hiric~
I r r t c ~ l l i ~ q e t i c(~Nc ~~ NYork:
,
IEEE)
Fogel L J and Burgin G H 1969 C o r i i p c t i t i r v God-Seeking TIiroiqIi E\wliitioticrr:\.
~ t . o ~ q t . ~ i t i i t t i iFinal
t i ~ ~ Report, Contract N o AF I9(628 )-5927, Air Force Cambridge
Re sc arch Lab r at or i e s .
Hillis b D 1992 Co-c\~ol\~ing
parasites impro\,es simulated evolution as an optimintion
procedure Arrificicil Lit;. 11 ed C Langton, C Taylor, J Farmer and S Rasmus?.cn
(Reading, MA : Addison-Wesley ) pp 3 13-24
Kuo T and H ~ m S-Y
g 1993 A genetic algorithm with disrupti\re selection Pro(.. 5th / t i t .
Cot!f: o t i Gctietic. Algot-itlitiiv ( L ~ rh t i t i t i - C h t i t i i i ~ c i i ~ qt
It,.i , J i i / y 1993) ed S Forrest (San
Matco, CA: Morgan Kaufmann) pp 65-9
Michalewici Z I996 Grtiotic~Al,qorit/ittis + Dtitti Strrrctirrus = E\diitiori Pro,qrrini.s . h d
cdn (Berlin: Springer)
Sch\+vtcI H-P and Riidolph G I995 Contemporary evolution strategic5 , 4 t / i ~ t r i i t ~in
.s
A rt(fic.iti1 l.(fi)( Pt.oc.. 3rd Itit. Cot!$ o t i t\t,t(f;citil Lifr, G r m t i h , S p r i t i ) ( Loctiirr N c t c s
i t t Artific.iril ltitolligctic~cY 2 Y ) ed 1: Morrin r t trl (Berlin: Springer) pp 893-907
Scbald A V and Schlen/ig J 1094 Minirnax design of neural nel conlrollcrx for highly
.~
Nc>t\t.ork.\ NN-5 73-82
uncertain plant5 IE>EE7 k ~ t iNeirrril

TEAM LRN

28
Generation gap methods
Jayshree Sarrna and Kenneth De Jorzg

28.1 Introduction
The concept of a generation gap is linked to the notion of riorio,.er-ltrpl,iriS and
o\verlappiizg populations. In a nonoverlapping model parents and offspring nei er
compete with one another, i.e. the entire parent population is always replaced by
the offspring population, while in an overlapping system parents and offspring
compete for survival. The term generation gap refers to the amount of o\erlap
between parents and offspring. The notion of a generation gap i \ closely related
to selection algorithms and population management issues.
A selection algorithm in an evolutionary algorithm (EA) involves two
elements: (i) a selection pool and (ii) a selection distribution over that pool.
A selection pool is required for reproduction selection as well as for deletion
selection. The key issue in both these cases is 'what does the pool contain urhen
parents are selected and when survivors are selected'?'.
In the selection for the reproduction phase, parents are selected to produce
offspring and the selection pool consists of the current population. Hou the
parents are selected for reproduction depends on the individual EA paradigm.
In the selection for the deletion phase, a decision has to be made as to
which individuals to select for deletion to make room for the new offspring.
In nonoverlapping systems the entire selection pool consisting of the current
population is selected for deletion: the parent population (,U) is always replaced
by the offspring population (A). In overlapping models, the selection pool for
deletion consists of both parents and their offspring. Selection for deletion is
performed on this combined set and the actual selection procedure varies in each
of the EA paradigms.
Historically, both evolutionary programming and evolution strategies
had overlapping populations while the canonical genetic algorithms used
nonoverlapping populations.
TEAM LRN

206

Generation gap methods

28.2 Historical perspective

In ektolutionary programming (Fogel ut ~ i l1966), each individual produces one

offspring and the best half from the parent and offspring populations are selected
to form the new population. This is an overlapping system as the parents and
their offspring constantly compete with each other for survival.
In evolution strategies (Schwefel 198 I ) , the ( p A) and the ( p ,A) models
correspond to the overlapping and nonoverlapping populations respectively. In
the (1-1 h ) system parents and offspring compete for survival and the best p
are selected. In the ( p , A) model the number of offspring produced is generally
far greater than the parents. The offspring are then ranked according to fitness
and the best Y, are selected to replace the parent population.
Genetic algorithms are based on the two reproductive plans introduced and
analyzed by Holland (1975). In the first plan, R I , at each time step a single
individual was selected probabilistically using payoff proportional selection to
produce a single offspring. To make room for this new offspring, one individual
from the current population was selected for deletion using a uniform random
distribution.
In the second plan. R,!, at each time step all individuals were
deterministically selected to produce their expected number of offspring. The
selected parents were kept in a temporary storage location. When the process of
recombination was completed. the offspring produced replaced the entire current
population. Thus in R,I. individuals were guaranteed to produce their expected
number of offspring (within probabilistic roundoff).
At that time, from a theoretical point of view, the two plans were viewed
as generally equivalent. However, because of practical considerations relating
to the overhead of recalculating selection probabilities and severe genstic
drift (allele loss) in small populations, most early researchers favored the R,,
approach.
The earliest attempt at evaluating the properties of R I and R,, plans was
a set of empirical studies (De Jong 1975) in which a parameter G, called the
,qrtierdoti g u p , was defined to introduce the notion of overlapping generations.
The generation gap parameter controls the fraction of the population to be
replaced in each generation. Thus. G = 1 (replacing the entire population)
corresponded to R,, and G = l / p (replacing a single individual) represented
RI.
These early studies (De Jong 1975) suggested that any advantages that
overlapping populations might have were offset by the negative effects of
genetic drift (allele loss). The genetic drift was caused by the high variance
in expected lifetimes and expected number of offspring, mainly becausc at
that time, generally, modest population sizes were used ( p I. 100). These
negative effects were shown to increase in severity as G was reduced. These
studies also suggested the advantages of an irnplicit generation overlap. That
LRN
and optimal mutation rate of 0.001
is, using the optimal crossover rateTEAM
of 0.6

Steady state and generational evolutionary algorithms

207

(identified empirically for the test suite used) meant that approximately 40% of
the offspring were clones of their parents, even for G = 1 . A later empirical
study by Grefenstette ( 1986) confirmed the earlier results that a larger generation
gap value improved performance.
However. early experience with classifier systems (e.g. Holland and Reitman
1978) yielded quite the opposite behavior. In classifier systeiiir only a w h e t of
the population is replaced each time step. Replacing a small number of classifiers
was generally more beneficial than replacing a large number or possibly all of
them. Here the poor performance observed as the generation gap L alue increased
was attributed to the fact that the population ar a whole represented a Gngle
solution and thus could not tolerate large changes i n it\ content.
In recent years, computing equipment with increased capacity is easily
available and this effectively removes the reason for preferring the RJ approach.
The desire to solve more complex problems using genetic algorithms has
prompted researchers to develop an alternative to the generational sy stem called
the 'steady state' approach, in which typically parents and offspring do coexist
(see e.g. Syswerda 1989, Whitley and Kauth 1988).

28.3 Steady state and generational evolutionary algorithms

Steady state EAs are systems in which usually only one or two offspring Lire
produced in each generation. The selection pool for deletion can consist of the
parent population only or can be possibly augmented by the offspring produced.
The appropriate number of individuals are selected for deletion. based on some
distribution. to make room for these new offspring. Generational systems are
so named because the entire population is replaced every generation by the
offspring population: the lifetime of each individual in the population is only
population systems,
one generation. This is the same as the rzono\~er.l~rl,l,irz~~
while the steady state EA is an oivr.lnppirzg population system.
One can conceptually think of a steady state model in evolutionary
programming and evolution strategies. For example, from a parent population of
Y
, individuals, a single offspring can be formed by recombination and mutation
and can then be inserted into the population. A recent study of the steady state
evolutionary programming performed by Fogel and Fogel ( 1995) concluded that
the generational model of evolutionary programming may be more appropriate
for practical optimization problems. The first example of the steady state
evolutionary strategies is the ( p 1) approach introduced by Rechenberg ( 1973)
which had a parent population greater than one ( p > I ) . All the parents were
then allowed to participate in the reproduction phase to create one offspring.
The (,Y I ) model was not used as it was not feasible to selfadapt the step sizes
(Back et nl 1991).
An early example of the steady state model of genetic algorithms is the RI
model defined by Holland ( 1975) in which the selection pool for deletion consists
LRN deletion strategy is used. The RJ
only of the parent population and aTEAM
uniform

208

Generation gap methods

Figure 28.2. The mean and variance of the growth curves of the best in a nonoverlapping
system (population size, 50; G = 1).

approach is the generational genetic algorithm. Theoretically, the two systems

(overlapping systems using uniform deletion and nonoverlapping systems) are
considered to be similar in expectation for infinite populations. However, there
can be high variance in the expected lifetimes and expected number of offspring
when small finite populations are used.
This variance can be highlighted by keeping everything in the two systems
constant and changing only one parameter, viz., the number of offspring
produced. Figures 28.1 and 28.2 illustrate the average and variance for the
growth curve of the best in two systems, producing and replacing only a single
individual each generation in one and replacing the entire population each
generation in the other. A population size of 50 was used, the best occupied
10% of the initial population, and the curves are averaged over 100 independent
runs. Only payoff proportional selection, reproduction, and uniform deletion
were used to drive the systems to a state of equilibrium. Notice that in the
TEAM
overlapping system (figure 28.1) the
bestLRN
individuals take over the population

Steady state and generational evolutionary algorithms

209

only about 80% of the time and the growth curve, exhibit much higher variance
when compared to the nonoverlapping population (figure 28.2).
This high variance for small generation gap values causes more genetic
drift (allele loss). Hence, with smaller population sizes, the higher variance in
a steady state system makes it easier for alleles to disappear. Increasing the
population size is one way to reduce the the variance (see figure 28.3) and thus
offset the allele loss. In summary, the main difference between the generational
and steady state systems is higher genetic drift in the latter especially when
small population sizes are used with low generation gap values. (See the article
by De Jong and Sarma (1993) for more details.)

Figure 28.3. The mean and variance of the grouth curve5 of the best in an overlapping
5ystem (population w e , 200; G = 1/200)

So far we have assumed that there is an uniform distribution on the

selection pool used for deletion, but most researchers using a steady state
genetic algorithm generally use a distribution other than the standard uniform
distribution. Syswerda (1991) shows how the growth curves can change when
different deletion strategies, such as deleting the least fit, exponential ranking
of the members in the selection pool. and reverse fitness, are used. Peck and
Dhawan ( 1 995) demonstrate an improvement in the ideal growth behavior of
the steady state system when uniform deletion is changed to a first-in-firstout (FIFO) deletion strategy. An early model of a steady state (overlapping)
system is GENITOR (Whitley and Kauth 1988, Whitley 1989) which not only
uses ranking selection (Chapter 23) instead of proportional selection (Chapter
25) on the selection pool for reproduction but also uses deletion of the worst
member as the deletion strategy. The GENITOR approach exhibited significant
performance improvement over the standard generational approach.
Using a deletion scheme other than a uniform deletion changes the selection
pressure. The selection pressure induced by the different selection schemes can
vary considerably. Both these changes can alter the exploration-exploitation
balance. Two different studies have shown that improved performance in a
TEAM
LRNto higher growth rates and changes
is due
steady state system, like GENITOR.

2 10

Generation gap methods

in the exploration-exploi tation balance caused by using different selection and

deletion strategies and is not due to the use of an overlapping model (Goldbeig
and Deb 1991, De Jong and Sarma 1993).

28.4 Elitist strategies

The cycle of birth and death of individuals is very much linked to the
management of the population. Individuals that are born have an associated
lifetime. The expected lifetime of an individual is typically one generation, but
in some EA systems it can be longer. We now explore this issue in more detail.
Elitist strategies link the lifetimes of individuals to their fitnesses. Elitist
strategies are techniques to keep good solutions in the population longer than
one generation. Though all individuals in a population can expect to have a
lifetime of one generation, individuals with higher fitness can have a longer
lifetime when elitist strategies are used.
As stated earlier, the selection pool for deletion is comprised of both the
parents and the offspring populations in the overlapping system. This combined
population is usually ranked according to fitness and then truncated to form the
new population. This method ensures that most of the current individuals with
higher fitness survive into the next generation, thus extending their lifetime. In
the ( j i + A) evolution strategies, a very strong elitist policy is in effect as the
top ji are always kept. I n eLvlutionary programming. a stochastic tournamcnt
is used to select the survivors, and hence the elitist policy is not quite as strong
as in the evolution strategy case. In the ( p . A) evolution strategies there is no
elitist strategy to preserve the best parents.
Unlike evolution strategies and evolutionary programming, where there is
postselection of survivors based on fitness, in generational genetic algorithms
there is only preselection of parents for reproduction. Recombination operators
are applied to these parents to produce new offspring, which are then subject
to mutation. Since all parents are replaced each generation by their offspring,
there is no guarantee that the individuals with higher fitness will survive into
the next generation. An elitist strategy in generational genetic algorithms is
ii way of ensuring that the lifetime of the very best individual is extended
beyond one generation. Thus, unlike evolutionary programming and evolution
strategies. where more than just the best individual survive, in generational
genetic algorithms generally only the best individual survives. Steady si.ate
genetic algorithms which use deletion schemes other than uniform random
deletion have an implicit elitist policy and so automatically extend the lifetime
of the higher-fitness individuals in the population.
It should be noted that the elitist strategies were deemed necessary when
genetic algorithms are used as function optimizers and the goal is to find a
global optimal solution (De Jong 1993). Elitist strategies tend to make the
search more exploitative rather than explorative and may not work for problems
TEAM LRN
in which one is required to tind multiple
optimal solutions.

References

21 1

References
Back T, Hoffmeister F and Schwefel H-P 1991 A survey of eLrolutionary strategies Pro(..
4th liit. Coi$ o i i Gerietic Algorithms (Sun Diego, CA, J u I j . 1991I ed R K Belew and
L B Booker (San Mateo, CA: Morgan Kaufmann) pp 2-9
De Jong K A 1975 Ati Aiitr1y.si.sof the Behavior cf ei Clris.s of' Gerirtic Acltipti\>t>Sjsteiiis
Phd Dissertation, University of Michigan
-1
993 Genetic algorithms are NOT function optiniizcrs Foiiritltitioiis of' Geiietic
Algorithms 2 ed L D Whitley (San Mateo, CA: Morgan Kaufmann) pp 5-17
De Jong K A and Sarma J 1993 Generation gaps revisited Foiiiihtions of' Genrtic
Algorithms 2 ed L D Whitley (San Mateo, CA: Morgan Kaufmann) pp 19-28
Fogel G B and Fogel D B 1995 Continuous evolutionary programming: analysis and
experiments Cylwrii~t.Syst. 26 79-90
Fogel L J, Owens A J and Walsh M J 1996 Art(ficiti1 Iiitelli,qcwc~c~
through Siiiiuleitcd
Eidirtioii (New York: Wilcy)
Goldberg D E and Deb K 1991 A comparative analysis of selection schemes used in
genetic algorithms Foiiiidcitiniis of Genetic Alp~r-ithnisI ed G J E Rawlins (San
Mateo, CA: Morgan Kaufmann) pp 69-93
Grefenstette J J 1986 Optimization of control parameters for genetic algorithms / E
Truns. Syst. Mtiii Cj)heriiet. SMC-16 122-8
Holland J H 1975 Adtiptcitioti in Nuturul ciiid Art$citrl Systenis ( Ann Arbor, MI:
University of Michigan Press)
Holland J H and Reitman J S 1978 Cognitive systems based on adaptikte algorithms
Pcrtterti-Directed li!forenc.e Sj,steiii.s ed D A Waterman and F Hayes-Roth (New
York: Academic)
Peck C C and Dhawan A P 1995 Genetic algorithms as global random search methods:
an alternative perspective E~dutioiiriq~
Contpiit. 3 39-80
Rechenberg I 1 973 Ei~olutioiis.strtrtegir: 0priiiiic.r-irng tec*/irii.vc~hc~r
S>atoriir i i r i d i
Priiizipieii der hiologischeti \dutioti (Stuttgart: Frommann-Holzboog )
Schwefel H-P 198 1 Nuinerical 0ptinii:atioii qf Coinputer Moclels (Chichester: Wiley )
Syswerda G 1989 Uniform crossover in genetic algorithms P roc. 3rd I t i t . CorJf: o r i Gerwtic.
Algorithms (Fciiifkv, VA, Juiie 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 2-9
-199
1 A study of reproduction in generational and steady-state genetic algorithms
Foirnc1citioii.s cf Gerietic Algorithnzs I ed G J E Rawlins (San Mateo. CA: Morgan
Kaufmann) pp 94-101
Whitley D 1989 The GENITOR algorithm and selection pressure: why rank-based
allocation of reproductive trials is best Proc. 3rd liit. Cmf: oil G o i i d ( .AI,qorithi?i.s
(Ftiirjkv, VA, J i i i i e 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann)
pp 116-21
Whitley D and Kauth J 1988 GENITOR: a Di'ereitt
Gcwtic Algorithni Colorado State
University Technical Report CS-88- 10 1

TEAM LRN

29
A comparison of selection mechanisms
Peter J B Hancock

29.1

Introduction

Selection provides the driving force behind an evolutionary algorithm. Without

it, the search would be no better than random. This section explores the pros
and cons of a variety of different methods of performing selection. Selection
methods differ in two main ways: the way they aim to distribute reproductive
opportunities across members of the population, and the accuracy with which
they achieve their aim. The accuracy may differ because of sampling noise
inherent in some selection algorithms. There are also other differences that may
be signiticant. such as time complexity and suitability for parallel processing.
Crucially for some applications, they also differ in their ability to deal with
evaluation noise.
There have been a number of comparisons of different selection methods
by a mixture of analysis and simulation, usually on deliberately simplified
tasks. Goldberg and Deb ( 1991) considered a system with just two fitness
levels, and studied the time taken for the fitter individuals to take over the
population under the action of selection only, verifying their analysis with
simulations. Hancock ( 1994) extended the simulations to a wider range of
selection algorithms. and added mutation as a source of variation, to compare
effective growth rates. The effects of adding noise to the evaluation function
were also considered. Syswerda ( 199 1 ) compared generational and incremental
models on a ten-level takeover problem. Thierens and Goldberg (1994) derived
analytical results for rates of growth for a bit counting problem, where the
approximately normal distribution of fitness values allowed them to include
recombination in their analysis. Back ( 1994) compared takeover times for all
the major selection methods analytically and reported an experiment on a 30dimensional sphere problem. B2ck ( 1995) compared tournament and ( / L , A)
selection more closely. Blickle and Thiele (1995a, b) undertook a dehiled
analytical comparison of a number of selection methods (note that the second
paper corrects an error in the first). Other studies include those of Biick and
Hoffmeister ( 1991), de la Maza and Tidor (1993) and Pal ( 1994).

212

TEAM LRN

Simulat ions

213

It would be useful to have some objective measure(s) with which to compare

selection methods. A general term is selection pressure. The meaning of
this is intuitively clear, the higher the selection pressure, the faster the rate of
convergence, but it has no strict definition. Analysis of selection methods has
concentrated on two measures: takeover time and selection intensity. Takeover
time is the number of generations required for one copy of the best string to
reproduce so as to fill the population, under the effect only of selection (Goldberg
and Deb 1991). Selection intensity is defined in terms of the average fitness
before and after selection, .T and f s c , , and the fitness variance CJ:

This captures the notion that it is harder to produce a given step in average
fitness between the population and those selected when the fitness variance is
low. However, both takeover time and selection intensity depend on the fitness
functions, and so theoretical results may not always transfer to a real problem.
There is an additional difficulty because the fitness variance itself depends on
the selection method, so different methods configured to have the same selection
intensity may actually grow at different rates.
Most of the selection schemes have a parameter that controls either the
proportion of the population that reproduces or the distribution of reproductive
opportunities, or both. One aim in what follows will be to identify some
equivalent parameter settings for different selection methods.
29.2

Simulations

A number of graphs from simulations similar to those reported by

Hancock ( 1 994) are shown here, along with some analytical and experimental

results from elsewhere. The takeover simulation initializes a population of I00

randomly, with rectangular distribution, in the range 0- 1 , with the exception
that one individual is set to 1. The rate of takeover of individuals with the value
I under the action of selection alone is plotted. Results reported are averaged
over 100 different runs. The simulation is thus similar to that used by Goldberg
and Deb (1991), but the greater range of fitness values allows investigation of
the diversity maintained by the different selection methods. Since some of them
produce exponential takeover in such conditions, a second set of simulations
makes the problem slightly more realistic by adding mutation as a source of
variation to be exploited by the selection procedures. This growth simulation
initializes the population in the range 0-0.1. During reproduction, mutation with
a Gaussian distribution, mean 0, standard deviation 0.02, is added to produce the
offspring, subject to remaining in the range 0 - 1 . Some plots show the value of
the best member of the population after various numbers of evaluations, again
averaged over 100 different runs. Other plots show the growth of the worst
LRN
value in the population, which givesTEAM
an indication
of the diversity maintained in

2 14

A comparison of selection mechanisms

the population. Some selection methods are better at preserving such diversity:
other things being equal, this seems likely to improve the quality of the overall
search (Muhlenbein and Schlierkamp-Voosen 1095, Blickle and Thiele 199%).
It should be emphasized that fast convergence on these tasks is not
necessarily good: they are deliberately simple in an effort to illustrate some
of the differences between selection methods and the reasons underlying them.
Good selection methods need to balance exploration and exploitation. Before
reporting results, we shall consider a number of more theoretical points of
similarities and differences.

29.3 Population models

There are two different points in the population cycle at which selection may
be implemented. One approach, typical of genetic algorithms (GAS). is to
choose individuals from the population for reproduction, usually in some way
proportional to their fitness. These are then acted on by the chosen genztic
operators to produce the next generation. The other approach, more typical of
evolution strategies (ESs) and evolutionary programming (EP), is to allow all
the members of the population to reproduce, and then select the better memhers
of the extended population to go through to the next generation. This difference,
of allowing all members to reproduce, is sometimes flagged as one of the key
differences in approach between EWEP and GAS. In fact the two approaches
may be seen as equivalent once running, differing only in what is called the
population. If the extended population typical of the ES and EP approach is
labeled \imply the population, then it may be seen that, as with the first approxh.
the best individuals are selected for reproduction and used to generate the new
(extended) population. Looked at this way, it is the traditional GA approach that
allows all members of the population at least some chance of reproduction, where
the methods that use truncation selection restrict the number that are allowed
to breed. There remains, however, a difference in philosophy: the traditional
CA approach is reproduction according to fitness, while the truncation selection
typical of the ES, EP, and breeder GA is more like survival of the fittest. There
will al\o be a difference at startup, with ES/EP initializing p individuals, while
an equivalent CA initializes
A.

29.4 Equivalence: expectations and reality

A number of pairs of the common selection algorithms turn out to be. in some
respects, equivalent. The equivalence, usually in expected outcome, can hide
differences due to sampling errors, or behavior in the presence of noise, that
may cause significant differences in practice. This section considers some of
these similarities and differences. in order to reduce the number that need be
TEAM LRN
considered in detail in Section 29.5.

Equivalence: expectations and reality

29.4. I

215

Tournament selection and ranking

Goldberg and Deb (1991) showed that simple binary tournament selection (TS)
(see Chapter 24) is equivalent to linear ranking (Section 25.2) when set to give
two offspring to the top-ranked string (prank = 2). However, this is only in
expectation: when implemented the obvious way, picking each fresh pair of
potential parents from the population with replacement, tournament selection
suffers from sampling errors like those produced by roulette wheel sampling,
precisely because each tournament is performed separately. A way to reduce
this noise is to take a copy of the population and choose pairs for tournament
from it without replacement. When the copy population is exhausted, another
copy is made to select the second half of the new population (Goldberg ct crl
1989). This method ensures that each individual participates in exactly two
tournaments, and will not fight itself. It does not eliminate the problem, since.
for example, an average individual, that ought to win once, may pick better or
worse opponents both times, but it will at least stop several copies of any one
being chosen.
The selection pressure generated by tournament selection may be decreased
by making the tournaments stochastic. The equivalence, apart from sampling
errors, with linear ranking remains. Thus TS with a probability of the better
string winning of 0.75 is equivalent to linear ranking with rtjrank = 1.5. The
selection pressure may be increased by holding tournaments among more than
two individuals. For three, the best will expect three offspring, while an
average member can expect 0.75 (it should win one quarter of its expected
three tournaments). The assignment is therefore nonlinear and Back ( 1994)
shows that, to a first approximation, the results are equivalent to exponential
nonlinear ranking, where the probability of selection of each rank i, starting at
i = 1 for the best, is given by (s - l)(s'-')/(d' - l ) , where s is typically in the
range 0.9-1 (Blickle and Thiele 199%). (Note that the probabilities as specified
by Michalewicz (1992) do not sum to unity (Back 1994).) More precisely,
they differ in that TS gives the worst members of the population no chance to
reproduce. Figure 29.1 compares the expected number of offspring for each rank
in a population of 100. The difference results in a somewhat lower population
diversity for TS when run at the same growth rate.
Goldberg and Deb (1991) prefer TS to linear ranking on account of its
lower time complexity (since ranking requires a sort of the population), and
Back (1994) argues similarly for TS over nonlinear ranking. However, time
complexity is unlikely to be an issue in serious applications, where the evaluation
time usually dominates all other parts of the algorithm. The difference is in any
case reduced if the noise-reduced version of TS is implemented, since this
also requires shuffling the population. For global population models, therefore,
ranking, with Baker's sampling procedure (Baker I987), is usually preferable.
TS may be appropriate in incremental models, where only one individual is
to be evaluated at a time, and in TEAM
parallelLRN
population models. It may also be

A comparison of selection mechanisms

216

I00

Rank of individual

Figure 29.1. Expected number o ! otfspring against rank for tournament selection
tournament s i x 3 and exponential rank selection with J = 0.972.

ith

appropriate in, for instance, game playing applications, where the evaluation
itself consists of individuals playing each other.
Freisleben and Hiirtfelder (1993) compared a number of selection schemes
using a meta-level GA. that adjusted the parameters of the GA used to tackle
their problem. Tournament selection was chosen in preference to rank selection,
which at first sight seems odd, since the only difference is added noise. A
possible explanation lies in the nature of their task, which was learning the
weights for a neural net simulation. This is plagued with symmetry problems
(e.g. Hancock 1992). The GA has to break the symmetries and decide on
just one to make progress. I t seems possible that the inaccuracies inherent in
tournament selection faci I i tated this symmetry breaking, with one individual
having an undue advantage, and thereby taking over the population. Noise is
not always undesirable. though there may be more controlled ways to achit:ve
the same result.

29.4.2 Iric+remetitcilcirici generational models

There is apparently a large division between incremental and generational
reproduction models. However, Syswerda ( 1991 ) shows that an incremeiital
model where the deletion is at random produces the same expected result
as a generational model with the same rank selection for reproduction.
Again, however, this analysis overlooks sampling effects. Precisely because
incremental models generate only one or two offspring per cycle, they suffer
the roulette wheel sampling error. Figure 29.2 shows the growth rate for hest
and worst in the population for the two models with the same selection pressure
(best expecting 1.2 offspring). The incremental model grows more slowly. yet
TEAMcharacteristic
LRN
loses diversity more rapidly, an effect
of this kind of sampling

217

Equivalence: expectations and reality

~~~

error. Incremental models also suffer in the presence of evaluation noise (see
Section 29.6).

Generational, hest

Generational, worst
Incremental, hest
Incremental, worst

Evaluations

Figure 29.2. The growth rate in the presence of mutation of the best and worst in the
population for the incremental model with random deletion and the generational model.
both with linear rank selection for reproduction, PrClnh
= 1.2.

The very highest selection pressure possible from an evolutionary system

would arise from an incremental system, where only the best member of the
population is able to reproduce, and the worst is removed if the new string is an
improvement. Since the rest of the population would thus be redundant. this is
equivalent to a ( 1 1) ES, the dynamics of which are well investigated (Schwefel
1981).

Some GA workers allow only the top few members of the population to
reproduce (Nolfi et nl 1990, Muhlenbein and Schlierkamp-Voosen 1993). This
is often called truncation selection, and is equivalent to the ES ( p , A) approach
subject only to a difference in what is called the population (see Section 29.3).
EP uses a form of tournament selection where all members of the extended
population y h compete with c. others, chosen at random with replacement.
Those y that amass the most wins then reproduce by mutation to form the next
extended population. This may be seen as a rather softer form of truncation
selection, converging to the same result as a ( p y ) ES as the size of c
increases. The value of c does not directly affect the selection pressure, only
the noise in the selection process.
The EP selection process may be softened further by making the tournaments
probabilistic. One approach is to make the probability of the better individual
winning dependent on the relative TEAM
fitnessLRN
of the pair: p , = j ; / ( . f ; f , ) (Fogel

218

A comparison of selection mechanisms

1988). Although intuitively appealing, this has the effect of reducing selection
pressure as the population converges and can produce growth curves remarkahly
similar to unscaled fitness proportional selection (FPS; Hancock 1994).

29.5 Simulation results

29.5. I

Fitness proportiorial selestiori

Simple FPS suffers from sensitivity to the distribution of fitness values in the
population, as discussed in Chapter 23. The reduction of selection pressure as
the population converges may be countered by moving baseline techniques, such
as windowing and sigma scaling. These are still vulnerable to undesirable loss
of diversity caused by a particularly fit individual, which may produce many
offspring. Rescaling techniques are able to limit the number of offspring given
to the best, but may still be affected by the overall spread of fitness values, .And
particularly by the presence of very poor individuals.
Figure 29.3 compares takeover and growth rates of FPS and some of the
baseline adjustment and rescaling methods. The simple takeover rates for the
three adjusted methods are rather similar for these scale parameters, with linear
scaling just fastest. Simple FPS is so slow it does not really show on the
same graph: i t reaches only 80% convergence after 40000 evaluations on this
problem. The curves for growth in the presence of mutation are all rather
alike: the presence of the mutation maintains the range of fitness values in the
population, giving simple FPS something to work on. Note, however, that i t
still starts off relatively fast and slows down towards the end: probably the
opposite of what is desirable. The three scaled versions are still similar, but
note that the order has reversed. Windowing and sigma scaling now grow more
rapidly precisely because they Fail to limit the number of offspring to especially
good individuals. A fortuitous mutation is thus better exploited than in the
more controlled linear scaling, which leads to the correct result in this simple
hill-climbing task, but may not in a more complex real problem.
29.5.2 Rcrrrkirig

Goldberg and Deb (1991) show that the expected growth rate for linear ranking
is proportional to the value of prank, the number of offspring given to the best
individual. For exponential scaling, the selection pressure is proportiond to
1 - s . This makes available a wide range of selection pressures, defined by the
value of s, illustrated in figure 29.4. The highest takeover rate available with
linear ranking (prank = 2 ) is also shown. Exponential ranking can go faster with
smaller values of s (see table 29. I ). Note the logarithmic x-axis on this plot.
With exponential ranking, because of the exponential assignment curve, poor
individuals do rather better than with linear ranking, at the expense of those more
TEAM
in the middle o f the range. One result
of LRN
this is that, for parameter settings that

Simulation results

219

give similar takeover times, exponential ranking loses the worse values in the
population more slowly, which may help preserve diversity in practice.
29.5.3 Evolution strategies

The selection pressure generated by the ES selection methods have been

extensively analyzed, sometimes under the title of truncation selection (see
e.g. Back 1994). Selection pressure is dependent on the ratio of Y, to A (see

I00
c

%
L

0
0

1000

2000

3000

4000

5000

2000

Evaluations

4000

6 000

Evaluations

Figure 29.3. (a) The takeover rate for FPS, with windowing, sigma, and inear scaling.
(b) Growth rates in the presence of mutation.

U Exp s=0.999

Exp s=0.998
Exps=0.996

A
-

Exp s=0.992

=--

Exp s=0.984

__O__

Exps=0.968

+ Linear 2.0
100

1000

10000

Evaluations

Figure 29.4. The takeover rate for exponential rank selection for a number of values of
s, together with that for linear ranking, TEAM
prank=LRN
2.

A comparison of selection mechanisms

220

table 29.1). One simulation result is shown, in figure 29.5, to make clear
the selection pressure achievable by ( p , A) selection, and indicate its poteniial
susceptibility to evaluation noise, discussed further below.

Noise free

I,l00

+ 10. I00
25,100

With evaluation n o i x
__)__

I.100

+ 10,100
& 25.100

I
5000

10000
Evaluations

1
1 5000

20000

Figure 29.5. The growth rate in the presence of mutation for ES ( p ,A) selection \vith
and without evaluation noise, for h = 100 and p = 1, 10. and 25.

29.5.4 Itm-ementul models

Goldberg and Deb (1991) show that the Genitor incremental model develops
a very high growth rate, compared to that typical of GAS. This is mostly
due to the method of deletion, in which the worst member of the population
is eliminated (cf the ES truncation approach). As a consequence, the value
of prank used in the linear ranking to select parents has little effect on the
takeover rate. Even with prank = I (i.e. a random choice of parents), kill-worst
converges in around 900 evaluations (cf about 3000 for the scaled FPS variants
in figure 29.3(a)). Increasing prank to its maximum of 2.0 only reduces this to
around 600 evaluations.
There are a number of ways to decide which of the population should be
removed (Syswerda 1991), such as killing the oldest (also known as FIFO
deletion (De Jong and Sarma 1993)); one of the ri worst; by inverse rank; or
simply at random. The various deletion strategies radically affect the behavior
of the algorithm. As discussed above, random deletion resembles a generational
model. Kill oldest also produces much softer selection than kill-worst, producing
takeover rates similar to generational models with the same selection preswre
(see figure 29.6). However, the incremental model starts more quickly and ends
more slowly than the generational one.
Syswerda ( 199 I ) prefers kill-by-inverse-rank. In his simulations, this
TEAM
LRN
produces results similar to kill-worst,
but
he is using a high inverse selection

22 I

Simulation results

1000

2000

3000

4000

5000

6000

Evaluations

Figure 29.6. The takeover rates for the generational model and the kill-oldest incremental
model, both using linear ranking for selection.

kr 1 . 1

*
*

k r 1.2
k r 1.4

+ rI 1.2
rl 1.4

+ rI 1 . 8
" I

2500

I
5000

7500

I
10000

Evaluations

Figure 29.7. Growth rates in the presence of mutation for incremental kill-by-inverserank (kr) and generational linear ranking (rl) for various values of Drank.

pressure (exponential ranking with s = 0.9). A more controlled result is given

by selecting for reproduction from the top and for deletion from the bottom using
TEAM
LRN of prank. Using linear ranking, the
ranking with the same, more moderate
value

222

A comparison of selection mechanisms

growth rate changes more rapidly than prank. This is because an increase in p u n k
has two effects: increasing the probability of picking one of the better members
of the population at each step, and increasing the number of steps for which
they are likely to remain in the population, by decreasing their probability of
deletion. Figure 29.7 compares growth rates in the presence of mutation for
kill-by-rank incremental and equivalent generational models. It may be seen
that the generational model with prank = 1.4 and the incremental model with
prank = 1.2 produce very similar results. Another matched pair at lower gronth
rates is generational with prank = 1.2 and incremental with prank = 1.13 (not
shown).
One of the arguments in favor of incremental models is that they allow
good new individuals to be exploited at once, rather than having to wait a
generation. It might be thought that any such gain would be rather slight, since
although a good new member could be picked at once, it is more likely to
have to wait several iterations at normal selection pressures. There is also the
inevitable sampling noise to be overcome. De Jong and Sarma (1993) claim
that there is actually no net benefit, since adding new fit members has the
effect of increasing the average fitness, thus reducing the likelihood of them
being selected. However, this argument applies only to takeover problems:
when reproduction operators are included the incremental approach can generate
higher growth rates. Figure 29.8 compares the growth of an incremental killoldest model with a generational model using the same selection scheme. The
graph also shows one of the main drawbacks of the incremental models: their
sensitivity to evaluation noise, to be discussed in the following section.

29.6 The effects of evaluation noise

Hancock (1994) extended the growth simulations to study the effects of adding
evaluation noise. A Gaussian random variable, mean zero, standard deviation
0.2, was added to each underlying true value for use in selection. The true value
was used for reproduction. It proved necessary to add this much noise-ten
times the standard deviation of the signal mutation-to bring about significant
reduction in growth rates for the generational selection models.
The sensitivity of the different selection algorithms to evaluation noise is
largely dependent on whether they retain parents for further reproduction. Fully
generational models are relatively immune, while most incremental models and
those like the ( p + A) ES that allow parents to compete for retention fare
much worse, because individuals receiving a fortuitously good evaluation will
be kept. The exception for incremental models is kill-oldest, which maintains the
necessary turnover. Figure 29.8 shows the comparison. Kill oldest deteriorates
only a little more than the generational model in the presence of noise, while
kill-worst, which grows much the fastest in the absence of noise, almost fails
TEAM LRN
completely.

223

The effects of evaluation noise

Noise free
Kill worst

0.75
C

Kill oldest

+ Generational

a
3

With evaluation noise

0.5
C

Kill worst

0
v)

& Kill oldest

0.25

+ Generational
0

5 000

10000

I5000

Evaluations

Figure 29.8. Growth in the presence of mutation, with and without evaluation noise, for
the generational model with linear ranking and incremental models with kill-worst and
kill-oldest, all using prank= 1.2 for selection.

Within generational models, there are differences in noise sensitivity.

Figure 29.9 compares the growth rates for linear ranking and sigma-scaled FPS,
with and without noise. It may be seen that the scaled FPS deteriorates less.

No noise
Rank

+ Sigma
With evaluation noise

+ Rank
d

2000

4000

6000

Sigma

8000

Evaluations

Figure 29.9. Growth in the presence of mutation, with and without evaluation noise, for
the generational model with linear ranking,
TEAMprank
LRN= 1.8, and sigma-scaled FPS, s = 4.

224

A comparison of selection mechanisrns

This is caused by sigma scalings inability to control superfit individuals. A

genuinely good individual, that happens to receive a helpful boost from the
noise, may be given many offspring by sigma scaling, but will be limited to
prank,in this case 1.8, by ranking. As before, rapid convergence is beneficial in
this simple task, but is unlikely to be so in general.
The ES ( p ,A) method can achieve extremely high selection pressures, when
it becomes sensitive to evaluation noise in a manner similar to incremental
models (figure 29.5). In this case, the problem is that too many reproductions
are given to a few strings, whose rating may be overestimated by the noise.
Figure 29.5 shows a clear turnaround: as the selection pressure is increased,
performance in noise becomes worse.
One approach to countering the effects of noise is to perform two
or more evaluations per string, and average the results. Fitzpatrick and
Grefenstette (1988) investigated this and concluded that it is better to evaluate
only once and proceed with the next generation. A possibly more efficient
method is to reevaluate only the apparently fitter individuals. Candidates
may be chosen as for reproduction, e.g. by rank. However, experiments
with incremental kill-by-rank indicated that the extra evaluations did not pay
their way, with convergence taking only a little less than twice as many
evaluations in total (Hancock 1994). Hammel and Back (1994) compared the
effects of reevaluation with an equivalent increase in the population size and
showed that reevaluations lead to a better final result. Indeed, on Rastrigans
function, increasing the population size resulted in a deterioration of convergence
performance. Taken together, these results suggest a strategy of evaluating only
once initially, and keeping the population turning over, but then starting to
reevaluate as the population begins to converge. Hammel and Back suggest
an alternative possibility of incorporating reevaluation as a parameter to be
optimized by the evolutionary system itself.

29.7 Analytic comparison

Blickle and Thiele (199%) perform an extensive analysis of several selection
schemes, deriving the dependence of selection intensity on the selection
parameters under the assumption of a normal distribution of fitness. Their
results, which reassuringly agree with the simulation results here, are shown
in an adapted form in table 29.1. They also consider selection variance,
confirming that methods such as ES selection that disallow weakest strings
from reproduction reduce the population variance more rapidly than those that
allow weak strings some chance. Of the methods considered, exponential
rank selection gives the highest fitness variance, for the reasons illustrated in
figure 29. I . Their conclusion is that exponential rank selection is therefore
LRN
probably the best of the schemesTEAM
that they
consider.

225

Conclusions

Table 29.1. Parameter settings that give equivalent selection intensities for ES ( E L ,A).
TS, and linear and exponential ranking, adapted and extended from Blickle and
Thiele (199%). Under tournament size, p refers to the probability of the better string
winning.
I

ES ,u/A

0.1 I
0.34
0.56
0.84
1.03
1.16
1.35
I .54
1.87

0.94
0.80
0.66
0.47
0.36
0.30
0.22
0.15
0.08

Tournament size

2, p = 0.6
2, p = 0.8
2
3
4

5
7
10
20

prank,
Lin rank
1.2
1.6
2.0

.s,

Exp rank, A = 100

0.996
0.988
0.979
0.966
0.955
0.945
0.926
0.900
0.809

29.8 Conclusions
The choice of a selection mechanism cannot be made independently of
other aspects of the evolutionary algorithm. For instance, Eshelman ( 1991 )
deliberately combines a conservative selection mechanism with an explorative
recombination operator in his CHC algorithm. Where search is largely driven by
mutation, it may be possible to use much higher selection pressures. typical of
the ES approach. If the evaluation function is noisy, then most incremental
models and others that may retain parents are likely to suffer. Certainly.
selection pressures need to be lower in the presence of noise, and, of the
incremental models, kill-oldest fares best. Without noise, incremental methods
can provide a useful increase in exploitation of good new individuals. Care
is needed in the choice of method of deletion: killing the worst provides high
growth rates with little means of control. Killing by inverse rank or killing
the oldest offers more control. Amongst generational models, the ES ( p ,A)
and exponential rank selection methods give the biggest and most controllable
range of selection pressures, with the ES method probably most suited to
mutation-driven, high-growth-rate systems, and ranking better for slower, more
explorative searches, where maintenance of diversity is important.

References
Back T I994 Selective pressure in evolutionary algorithms: a characterization of selection
methods Proc. 1st IEEE Cot$ on Evolutionan Compiitcition (Orlnndo. FL, Jiinr
1994) (Piscataway, NJ: IEEE) pp 57-62
-1
995 Generalized convergence models for tournament and ( p . A)-selection Proc. 6th
Int. Conf on Genetic Algorithms (Pittsburgh, PA, July 1995) ed L J Eshelman (San
TEAM
Mateo, CA: Morgan Kaufmann) pp
2-8 LRN

226

A comparison of selection mechanisms

Back T and Hoffmeister F 1991 Extended selection mechanisms in genetic algorithms

Proc. 4th Int. Conj on Genetic Algorithms (San Diego, CA, July 1991) ed R K Belew
and L B Booker (San Mateo, CA: Morgan Kaufmann) pp 92-9
Baker J E 1987 Reducing bias and inefficiency in the selection algorithm Proc. 2nd lnt.
Con$ on Genetic Algorithms (Cambridge,MA, 1987) ed J J Grefenstette (Hillsdale,
NJ: Erlbaum) pp 14-21
Blickle T and Thiele L 1995a A mathematical analysis of tournament selection Proc. 6th
Int. Cot$ on Genetic Algorithms (Pittsburgh, PA, Jitly 1995) ed L J Eshelman (San
Mateo, CA: Morgan Kaufmann) pp 9-16
-199% A Comparison of Selection Schemes used in Genetic Algorithm TIK-Report
1 I , Computer Engineering and Communication Networks Laboratory (TIK), Swiss
Federal Institute of Technology
De Jong K A and Sarma J 1993 Generation gaps revisited Foundations of G e n d c
Algorithms 2 ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 19-28
de la Maza M and Tidor B 1993 An analysis of selection procedures with particular
attention paid to proportional and Boltzman selection Proc. 5th Int. Conf: on G e n 4 c
Algorithrn~( Urbario-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 124-3 1
Eshelman L J 1991 The CHC adaptive search algorithm: how to have safe search
when engaging in nontraditional genetic recombination Founddons of Genetic
Algorithms ed G J E Rawlins (San Mateo, CA: Morgan Kaufmann) pp 265-83
Fitzpatrick J M and Grefenstette J J 1988 Genetic algorithms in noisy environmcnts
Machine Learning 3 101-20
Fogel D B 1988 An evolutionary approach to the traveling salesman problem Bid.
Cybern. 60 139-44
Freisleben B and Hirtfelder M 1993 Optimiation of genetic algorithms by genetic
algorithms ArtiJicial Neurul Nets und Genetic Algorithms ed R F Albrecht,
C R Reeves and N C Steele (Berlin: Springer) pp 392-9
Goldberg D E and Deb K 1991 A comparative analysis of selection schemes used in
genetic algorithms Foundations of Genetic Algorithms ed G J E Rawlins (San Mateo,
CA: Morgan Kaufmann) pp 69-93
Goldberg D E, Korb B and Deb K 1989 Messy genetic algorithms: motivation, analysis
and first results Complex Syst. 3 493-530
Hammel U and Back T 1994 Evolution strategies on noisy functions: how to improve
convergence properties Purallel Problem solving from Nature-PPSN 111 (Proc. lnt.
Con$ on Evolutionun Computation und 3rd Con$ on Parullel Problem Sol\ing jrom
Nuture, Jerusalem, October 1994) (Lecture Notes in Computer Science 866 ) ed Yu
Davidor, H-P Schwefel and R Manner (Berlin: Springer) pp 159-68
Hancock P J B 1992 Coding strutegies.fi)rgenetic algorithms und neurul nets PhD Thcsis,
Department of Computer Science, University of Stirling
-1994 An empirical comparison of selection methods in evolutionary algorithms
Evolutionun Computing (AISB Workshop, Leeds, 1994, Selected Papers) (Lecture
Notes in Computer Science 865) ed T C Fogarty (Berlin: Springer) pp 80-94
Michalewicz Z 1992 Genetic Algorithms + Datu Struc titres = Evolution Progrums (Berlin:
Springer)
Muhlenbein H and Schlierkamp-Voosen D 1993 Predictive models for the breeder genetic
TEAM LRN
algorithm Elvliit. Contput. 1 25-50

References

227

995 Analysis of selection, mutation and recombination in genetic algorithms

Evolution as a Computational Process (Lecture Notes in Coriiputer Science 899)
ed W Banzhaf and F H Eckman (Berlin: Springer) pp 188-214
Nolfi S, Elman J L and Parisi D 1990 Learning arid Eidittiori in Neitrul NetM*orks
Technical report CRL TR 9019 UCSD
Pal K F 1994 Selection schemes with spatial isolation for genetic optimization
Parallel Problem solving from Nature-PPSN Ill (Proc. hit. Cot$ on Eidutionan
Computatiori and 3rd ConJ: on Parallel Problem SoliGzg .from Nuture, Jerusaleni,
October 1994) (Lecture Notes in Computer Science 8 6 6 ) ed Yu Davidor, H-P
Schwefel and R Miinner (Berlin: Springer) pp 170-9
Schwefel H-P 198I Numerical Optimization of Computer Models (Chichester: Wiley)
Syswerda G 1991 A study of reproduction in generational and steady-state genetic
algorithms Foundations of Genetic Algorithms ed G J E Rawlins (San Mateo, CA:
Morgan Kaufmann) pp 94-101
Thierens D and Goldberg D E 1994 Elitist recombination: an integrated selection
recombination GA Proc. IEEE World Congr. on Cotnputationul Intelligence
(Piscataway, NJ: IEEE)
-1

TEAM LRN

30
Interactive evolution
Wougang Banzhaf

30.1 Introduction
The basic idea of interactive evolution (IE) is to involve a human user on-line
in the variation-selection loop of the evolutionary algorithm (EA). This is to be
seen in contrast to the conventional participation of the user prior to running
the EA by defining a suitable representation of' the problem (Chapters 14-21),
the fitness criterion for evaluation of individual solutions, and corresponding
operators (Chapters 31-34) to improve fitness quality. In the latter case, the
user's role is restricted to passive observation during the EA run.
The minimum requirement for IE is the definition of a problem
representation, together with a determination of population parameters only.
Search operators of arbitrary kind as well as selection according to arbitrary
criteria might be applied to the representation by the user. The process is much
more comparable to the creation of a piece of art, for example, a painting, 1han
to the automatic evolution of an optimized problem solution. In IE, the user
assumes an active role in the search process. At the minimum level, tht: IE
system must hold present solutions together with variants presently generJted
or considered.
Usually. however, automatic means of variation (i.e. evolutionary sedrch
operators using random events) are provided with an IE system. In the present
context we shall require the existence of automatic means of variation by
operators for mutation (Chapter 32) and recombination (Chapter 33) of solutions
which are to be defined prior to running the EA.

30.2 History
Dawkins (1986) was the first to consider an elaborate IE system. The evolution
of biomorphs, as he called them, by IE in a system that he had originally intended
to be useful for the design of treelike graphical forms has served as a prototype
for many systems developed subsequently. Starting with the contributions of
Sims (1991) and the book of Todd and Lathani (1992), computer art developed
into the present major application area of IE.

228

TEAM LRN

The problem

229

IE of grammar-based structures has also been considered (Nguyen and Huang

1994, McCormack 1994). Raw image data have been used more recently for
the purpose of evolving forms (Graf and Banzhaf 1995a).

30.3 The problem

The problem IE is trying to address has been encountered in all varieties of EAs
that make use of automatic evolution: the existence of nonexplicit conditions,
that is. conditions that are not formalizable.
The absence of a human user in steering and controlling the process of
evolution sometimes leads to unnecessary detours from the goal of global
optimization. In most of these cases, human intervention into the search
and selection process would advance the search rather quickly and allow
faster convergence onto the most promising regions of the fitness landscape,
or, sometimes, escape from a local optimum. Hence, a mobilization of
human knowledge can be achieved by allowing the user to participate in
the process.
Many design processes require subjective judgment relying on human
intuition, aesthetical values, or taste. In such cases. the fitness criterion
cannot be formulated explicitly, but can only be applied on a comparative
case-by-case basis. Direct human participation in IE allows for machinesupported evolution of designs that would otherwise be completely manual.
Thus, IE can be used (i) to accelerate EAs and ( i i ) in some areas to enable
application of EAs altogether.

30.4 The interactive evolution approach

Selection in a standard IE system, as opposed to that in an automatic evolution
system, is based on user action. It is typically the selection step that is subjugated
to human action, although in less frequent cases the variation process might also
be done by hand.
The standard algorithm for IE (following the notation in the introduction)is
presented at the top of the next page. As in an automatic evolution system,
there are parameters that are required to be fixed n priori : p , A, (-IL,(-I,,, , (-Ir, (-I,.
There are, however, also parameters subject to change, (HI;,,,(-I:, (-I:, depending
on the user interaction with the IE system. Both parameter sets together
determine the actual effect of mutation, recombination, and selection operators.
A simple variation of the standard algorithm shown overleaf is to allow
for population parameters to be also the subject of user interaction with the
system. For example, some systems (Graf and Banzhaf 199Sa) consider growing
populations and a variable number of variants.
A more complicated variant of the standard algorithm would add a sorting
TEAM LRN fitness criterion. One step further
process of variants according to a predefined

230

Interactive evolution

I
2
3
4
5
6
7
8
9
10

er,0,
individual last selected during the run, or
P * , the population last selected during the run.

p , A, 0,,o,

Input:
Output:

U * ,the

to;

P ( t ) t initialize(p);
while ( i ( P ( t ) ,0 , )# true) do
Input: O:, O&
P(t) t recombine( P ( t ) , Or, 0;);
P ( t ) t mutate(P(t), (HI, O&>;
output: P ( t )
Input:
P ( t 1) +- select(P(t),p , @I,,0 : ) ;

t+t+l;

is to allow this sorting process to result in a preselection in order to present

a smaller number of variants for the interactike selection step. Both methods
help the user to concentrate his or her selective action on the most promising
variants according to this predefined criterion.
This algorithm is formulated as follows:

Input:
Output:
1
2
3
4
5
6
7
8
9
10
I1
12
13

p , A, q. +I>1.
em,

o,, o r ,

O S

U * ,the

to;

individual last selected during the run, or

P * , the population last selected during the run.

P ( r ) c initialize(p);
while ( l ( P ( t ) .0,)
# true) do
Input: O:, @I;
P(r) t recombine(P(t), Or,(HI:);
P ( t ) t mutate(P(t), O m , 0;);
F ( t ) t evaluate(P(t),A);
P(r) t sort(P(t),0,);
P(t) t select(P(t),F ( t ) ,p , q , 0,);
output: P(t)
Input: (3;
P ( t I ) t select(P(t),p , OS,0;);

t c t f l ;

od
The newly added parameter @Iois used here to specify the predefined order
of the result after evaluation according to the predefined criterion. As before,
the GII):-parametersare used to specify the user interaction with the system. q
is the parameter stating how many of the automatically generated and ordered
variants are to be presented to the TEAM
user. LRN
If p A = q in a ( p A)-strategy, or

23 1

Difficulties

A = r,~in a ( p , A)-strategy, all variants will be presented for interactive selection.

If, however, p h > q and h > r , ~respectively, solutions would be preselected
and we speak of a hybrid evolution system (having elements of automatic as
well as interactive evolution). Other parameters are used in the same way as in
the standard algorithm.

30.5 Difficulties
The second, more complicated version of IE requires a predefined fitness
criterion, in addition to user action. This trades one advantage of IE systems for
another: the absence of any requirement to quantify fitness for a small number
of variants to be evaluated interactively by the user.
Interactive systems have one serious difficulty, especially in connection
with the automatic means of variation that are usually provided: whereas the
generation of variants does not necessarily require human intervention, selection
of variants does call the attention of the user. Due to psychological constraints,
however, humans can normally select only from a small set of choices. IE
systems are thus constrained to present only of the order of ten choices at each
point in time from which to choose. Also in sequence, only a limited number
of generations can be practically inspected by a user before the user becomes
tired.
It is emphasized that this limitation must not mean that the generation of
variants has to be restricted to small numbers. Rather the variants have to be
properly ordered at least, for a presentation of a subset that can be handled
interactively.

30.6 Application areas

An application of IE may be roughly divided into two parts:
(i) structural evolution by discrete combination of predefined elements and
(ii) parametric evolution of genes coding for quantifiable features of the
phenotype.
All application use these parts to various degrees.
In the first part, one has to define the structure elements that might be
combined into a correct genotype. Examples are symbolic expressions coding
for appearance of points in an image plane (Sims 1991) or elementary geometric
figures such as cone and cube (Todd and Latham 1992). In the second part,
parameters have to be used to further specify features of these structural
elements. Together, this information constitutes the genotype of the future
design hopefully to be selected by a user. In a process called expression this
genotype is then transformed into an image or three-dimensional form that can
be displayed as a phenotype for the selection step.
Table 30.1 gives an overview of the presently used IE systems. The reader
TEAM
LRN given in the reference list.
is advised to consult details with the
sources

232

Interactive evolutic
Table 30.1. An overview of different IE systems.

Application

Genotypic elements

Phenotype

Source

Lifelike structures
Textures. images

line drawing parameters

math. functions, image
processing operations
math. functions, image
processing operations
(position of) facial parts

biomorphs
(s.y , :) pixel
values
( A . y , :) pixel
values
face images

Dawkins (1986)
Sims (1991)

geometric forms and

visually defined
graphical elements
CA rules, differential
equations
rules, parameters of
L-systems
structural elements,
e.g. wings, body
tiepoints of bitmap
images

3D rendering of
grown objects

Caldwell and
Johnston ( I90 1 )
Todd and
Latham ( 1992)

system behavior

Sims (1992)

rendered objects

McCormac k
(1994)
Nguyen and
Huang ( 1994)
Graf and
BanLhaf ( I 99Sa)

Animation
Person tracking
Images. sculptures
Dynamical systems
I mages, animation
Airplane design
Images, design

airplane drawings
bilmap images

Sims (1991)

Figure 30.1 illustrates some results with runs in different IE systems.

Within the process of genotype-phenotype mapping a (recursive)
developmental process is sometimes applied (Dawkins 1986, Todd and Latham
1992) whose results are finally displayed as the image for selection.

30.7 Further developments and perspectives

As of now, the means to generate a small group of variants from which to
choose interactively are still not very good. For example, one could imagine a
tool for categorizing variants into a number of families of similar design and
then present only one representative from each family. In this way, a Ixge
population of variants could be used in the background which is invisible to the
user but might have beneficial effects in the course of evolution.
Another very interesting area of research is to assign 11 posteriori effective
fitness values to members of the population, depending on user action. An
individual which is selected more often would be assigned a higher fitness than
an individual which is not. This might result in at least a crude measure of the
nonquantifiable fitness measures that lie at the heart of IE. One might even adjust
the effect the operators have on the population, based on what is observed in
the course of evolution directed by the user. In this way, an intelligent system
could be created, that is able to learn from actions of the user how to varj the
population in order to arrive at good designs.
Another direction of research is to look into involving the user not (only) in
TEAM LRN
the selection process, but in the variation
process. Quite often, humans would

References

233

Figure 30.1. Samples of evolved objects: ( U ) dynamical system, cell structure (Sims
1992, @ MIT Press); ( 6 ) artwork by Mutator (Todd and Latham 1992, with permission
of the authors); (c) hybrid car model (Graf and Banzhaf 1995b, @ IEEE Press).

have intuitive ideas for improvement of solutions when observing an automatic

evolutionary process taking its steps. These ideas might be used to cut short
the search routes an automatic algorithm is following. For this purpose, a user
might be allowed to intervene in the process at appropriate interrupt times.
Finally, all sensory inputs could be used for IE. The systematic variation of
components of a chemical compound that specifies an odor, for example, could
be used to evolve a nice smelling perfume. Taste could as well be subject to
interactive evolutionary tools, as could other objects if appropriately mapped to
our senses (for instance by virtual reality tools).
With the advent of interactive media in the consumer market, production-ondemand systems might one day include an interactive evolutionary design device
that allows the user not only to customize a product design before it goes into
production, but also to generate his or her own original design that has never
been realized before and usually will never be produced again. This would open
up the possibility of evolutionary product design by companies which track their
customers activities and then distribute the best designs they discover.

References
Caldwell C and Johnston V S 1991 Tracking a criminal suspect through face-space with
a genetic algorithm Proc. 4th Int. COT$on Genetic Algorithms (San Diego, CA, July
199I) ed R K Belew and L B Booker (San Mateo, CA: Morgan Kaufmann) pp 416TEAM LRN
21

234

Interactive evolution

Dawkins R 1986 The Blind Watchmaker (New York: Norton)

Graf J and Banzhaf W 1995a Interactive evolution of images Proc. 4th Cot$ on
Eidutionun Progrumming (San Diego, CA, March 1995) ed J R McDonnell. R
G Reynolds and D Fogel (Cambridge: MIT Press) pp 53-65
-1995b
An expansion operator for interactive evolution Proc. 2nd IEEE Int. Cwj:
on Eidutionan Computution (Perth, November--December 1995) (Piscataway, NJ:
IEEE) pp 798-802
McCormack J I994 Interactive evolution of L-system grammars for computer graphics
modelling Complex system.^ ed T Bossomaier and D Green (Singapore: Wcrld
Scientific) pp 118-30
Nguyen T C and Huang T S 1994 Evolvable 3D modeling for model-based object
recognition systems Adiwzces in Genetic Progrumming ed K Kinnear (Cambridge:
MIT Press) pp 459-75
Sims K 199I Artificial evolution for computer graphics Comput. Graph. 25 3 19-28
-1992 Interactive evolution of dynamical systems Toutard a Pructice of Autonornoi~s
Systems ed F J Varela and P Bourgine (Cambridge, MA: MIT Press) pp 171-8
Todd S and Latham W 1992 E\dutionan Art cind Computers (London: Academic)

Further reading
This section is intended to give an overview of presently available work in IE
and modeling methods which might be interesting to use.
1. PrusinkiewicL P and Lindenmayer A I99 1 The Algorithmic Beuutj, ofPlunts (Berlin:

Springer)
An informative introduction to L-systems and their use in computer graphics.
2. Koza J R 1992 Genetic Programming (Cambridge, MA: MIT Press)
A book describing methods to evolve computer code, mainly in the form of LISPtype S-expressions.
3. Caldwell C and Johnston V 1991 Tracking a criminal suspect through 'face-space'
with a genetic algorithm Proc. Int. Con$ on Genetic Algorithms (Sun Diego, CA,
July 1991) ed R K Belew and L B Booker (San Mateo, CA: Morgan Kaufmann)
pp 416-21
Very interesting work containing one of the more profane applications of IE.
4. Baker E 1993 Evolving line drawings Proc. Int. Cot$ on Genetic Algorithms
(Urbanu-Champaign, IL, July I993) ed S Forrest (San Mateo, CA: Morgan
Kaufmann) p 627
This contribution discusses new ideas on design using simple style elements for IE.

5. Roston G P 1994 A Genetic Merhodology jbr Conjguratiorz Design Doctoral

Dissertation, Carnegie Mellon University
Informed discussion of different aspects of using genetic algorithms for de>ign
purposes.
TEAM LRN

31
Introduction to search operators
Zbignie w Michalew icz
Any evolutionary system processes a population of individuals, P ( t ) =
{ a : ,. . . , U ; ] } ( t is the iteration number), where each individual represents a
potential solution to the problem at hand. As discussed in Chapters 14-2 I , many
possible representations can be used for coding individuals; these representations
may vary from binary strings to complex data structures I .
Each solution a: is evaluated to give some measure of its fitness. Then a
new population (iteration t 1) is formed by selecting the more-fit individuals
(the selection step of the evolutionary algorithm, see Chapters 22-30). Some
members of the new population undergo transformations by means of genetic
operators to form new solutions. There are unary transformations m, (mutation
type), which create new individuals by a (usually small) change in a single
individual (m, : I -+ I ) , and higher-order transformations c, (crossover, or
recombination type), which create new individuals by combining parts from
several (two or more, up to the population size p ) individuals (cj : I + I ,
2 5 s 5 p).
It seems that, for any evolutionary computation technique, the representation
of an individual in the population and the set of operators used to alter its genetic
code constitute probably the two most important components of the system,
and often determine the systems success or failure. Thus, a representation of
object variables must be chosen along with the consideration of the evolutionary
computation operators which are to be used in the simulation. Clearly, the
reverse is also true: the operators of any evolutionary system must be chosen
carefully in accordance with the selected representation of individuals. Because
of this strong relationship between representations and operators, the latter are
discussed with respect to some (standard) representations.
In general, Chapters 3 1-34 provide a discussion on many operators which
have been developed since the mid-1960s. Chapter 32 deals with mutation
operators. Accordingly, several representations are considered (binary strings,
real-valued vectors, permutations, finite-state machines, parse trees, and others)
and for each representation one or more possible mutation operators are
discussed. Clearly, it is impossible to provide a complete overview of all
mutation operators, since the number of possible representations is unlimited.

TEAM LRN

235

236

Introduction to search operators

However. Chapter 32 provides a complete description of stcirrdcird mutation

operators which have been developed for stcirrckrd data structures.
Chapter 33 deals with recombination operators. Again. as for mutation
operators, several representations are considered (binary strings, real-valued
vectors, permutations, finite-state machines, parse trees, and others) and for
each representation several possible recombination operators are discussed.
Recombination operators exchange information between individuals arid
are considered to be the main driving force behind genetic algorithms.
while playing no role in evolutionary programming.
There are many
important and interesting issues connected with recombination operators: the .;e
include properties that recombination operators should have to be useful
(these are outlined by Radcliffe ( 1993)), the number of parents involved
in recombination process (Eiben et c i l (1994) described experiments with
multiparent recombination operators-so-called orgies). or the frequencies of
recombination operators.
Chapter 34 discusses some additional variations. These include the Bald&in
effect, gene duplication and deletion, and knowledge-augmented operators.

References
Eiben A E, Raue P-E and Ruttkay Zs 1994 Genetic algorithms with multi-parcnt
recombination Pro<*.Parcillel Prohleni Sohliiig from Ncitirre vol 3 (New York:
Springer) pp 78-87
Radcliffe N J 1993 Genetic set recombination Foiriicldom of Genetic Algorithms I1
(Ocrober f W 4 , Jerirstilrni) cd Yu Davidor. H-P Schwefel and R Minner (San Matco,
CA: Morgan Kaufmann) pp 203-19

TEAM LRN

32
Mutation operators
Thomas Back (32.1)) David B Fogel (32.2, 32.4, 32.6))
Darrell Whitley (32.3) and Peter J Angeline (32.5, 32.6)

32.1 Binary strings

The mutation operator currently used in canonical genetic algorithms to

manipulate binary vectors (also called binary strings or bitstrings, see Chapter
IS) a = ( a l ,. . . , a t ) E I = (0, 1) of fixed length f was originally introduced by
Holland (1975, pp 109-1 1 ) for general finite individual spaces I = A I x . . . x A ( ,
where A, = { a , , ., . . , a,,, ). According to his definition, the mutation operator
proceeds by:
determining the positions il, . . . , ill (i, E { I , . . . . t } )to undergo mutation
by a uniform random choice, where each position has the same small
probability p,,, of undergoing mutation, independently of what happens
at other positions, and
forming the new vector a = (a1, . . . , a,,- 1 , u,,, u , ,+ I , . . . . a,,,- I , u,,l N I,,+ 1
. . . , a t ) where al E A, is drawn uniformly at random from the set of
admissible values at position i.

The original value a, at a position undergoing mutation is riot excluded

from the random choice of U,! E A , ; that is, although the position is chosen for
mutation, the corresponding value might not change at all. This occurs with
probability 1 / [ A ,1, such that the effective (realized) mutation probability differs
from prn by a nonneglectible factor of 1/2 if a binary representation is used.
In order to avoid this problem, it is typically agreed on defining p,,, to be
the probability of independently inverting each of the variables u, E {O, 1 ) . such
that the mutation operator m : (0, 1 ) .+ (0, I ) produces a new individual
a = m(a) according to

(32.1)
TEAM LRN

237

238

Mutation operators

where 1.1 c / ( [ O , I ) ) denotes a uniform random variable sampled anew for each
i E ( I ,...,t ) .
From a computational point of view, the straightforward implementation
of equation (32.1) as a loop calling the random number generator for each
position i is extremely inefficient. Since the random variable T describing the
distances between two positions to be mutated has a geometrical distribution wi1.h
P { T = t } = p,( I - P,,~)- and expectation E[T] = l / ~ , ? and
~ , a geometrical
random number can be generated according to

In(l - U )
- Pm)

(32.2)

U ( [ O ,I))), equation (32.2) provides an efficient method to generate

(where U
the offset to find the next position for mutation from the current one. If the
actual position plus the offset exceeds the vector dimension l , it carries over
to the next individual and, if all individuals of the actual population have been
processed, to the next generation.
Concerning the importance of mutation for the evolutionary search process,
both Holland ( 1975, p I 1 1 ) and Goldberg (1989, p 14) emphasize that mutation
just serves as a background operator, supporting the crossover operator
(Section 33.1) by assuring that the full range of allele values is accessible
to the search. Consequently, quite small values of p,,, E [O.OOl, 0.011
were recommended for canonical genetic algorithms (see e.g. De Jong 1975,
Grefenstette 1986, Schaffer et a1 1989) until recently, when both empirical and
theoretical investigations clearly demonstrated the benefits of emphasizing the
role of mutation as a search operator in these algorithms. More specifically,
some of the important results include:
(i)

empirical findings favoring an initially large mutation rate that

exponentially decreases over time (Fogarty 1989),

(ii) the theoretical confirmation of the optimality of such an exponentially

decreasing mutation rate for simple test functions (Hesser and Manner 1901,
1992. Back 1996) and
(iii) the knowledge of a lower bound p,,* = I/[ for the optimal mutation rate
(Bremermann et nl 1966, Miihlenbein 1992, Back 1993).

It is obvious from these results that not only for evolution strategies and
evolutionary programming, but also for canonical genetic algorithms, mutation
is an important search operator that cannot be neglected either in practical
applications or in theoretical investigations of these algorithms. Moreover. it
is also possible to release the user of a genetic algorithm from the problem
of finding an appropriate mutation rate control or fine-tuning a fixed value
by transferring the strategy parameter self-adaptation principle from evolution
TEAM LRN
strategies and evolutionary programming
to genetic algorithms.

Real-valued vectors

239

32.2 Real-valued vectors

David B Fogel
Mutation generally refers to the creation of a new solution from one and only
one parent (otherwise the creation is referred to as a recombination (see Chapter
33). Given a real-valued representation where each element in a population is an
n-dimensional vector x E R, there are many methods for creating new elements
(offspring) using mutation. These methods have a long history, extending back
at least to Bremermann (1962), Bremermann et a1 ( I965), and others. A variety
of methods will be considered here.
The general form of mutation can be written as
2 = m ( x )

(32.3)

where x is the parent vector, m is the mutation function, and xis the resulting
offspring vector. Although there have been some attempts to include mutation
operators that do not operate on the specific values of the parents but instead
simply choose z from a fixed probability density function (PDF) (Montana and
Davis 1989), such methods lose the inheritance from parent to offspring that
can facilitate evolutionary optimization on a variety of response surfaces. The
more common form of mutation generates an offspring vector:

x=x+M

(32.4)

where the mutation M is a random variable. M is often zero mean such that
E ( z ) = x;the expected difference between a parent and its offspring is zero.
M can take different forms. For example, M could be the uniform random
variable U ( a , b), where a and 6 are the lower and upper limits respectively. In
this case, a is often set equal to -b. The result of applying this operator as M in
equation (32.4) yields an offspring within a hyperbox z U ( - h . h). Although
such a mutation is unbiased with respect to the position of the offspring within
the hyperbox, the method suffers from easy entrapment when the parent vector
x resides in a locally optimal well that is wider than the available step size.
Davis ( 1989, 1991b) offered a similar operator (known as creep) that has a
fixed probability of altering each component of x up or down by a bounded
small random amount. The only method for alleviating entrapment in such cases
relies on probabilistic selection, that is, maintaining a probability for choosing
lesser-valued solutions to become parents of the subsequent generations (see
Chapter 27). In contrast, unbounded mutation operators do not require such
selection methods to guarantee asymptotic global convergence (Fogel 1994.
Rudolph 1994).
The primary unbounded mutation PDF for real-valued vectors has been the
Gaussian (or normal) (Rechenberg 1973, Schwefel 1981, Fogel et a1 1990,
TEAM LRN1993, Fogel and Stayton 1994, and
Fogel and Atmar 1990, Back and Schwefel

240

Mutation operators

many others). The PDF is defined as

When p = 0, the parameter (T offers the single control on the scaling of the
PDF. It effectively generates a typical step size for a mutation. The use of zeromean Gaussian mutations generates offspring that are (i) on average no different
from their parents and (ii) increasingly less likely to be increasingly different
from their parents. Saltations are not completely avoided such that any local
optimum can be escaped from in a single iteration, yet they are not so common
as to lose all inheritance from parent to offspring.
Other density functions with similar characteristics have also been
implemented. Yao and Liu (1996) proposed using Cauchy distributions to aid
in escaping from local minima (the Cauchy distribution has a fatter tail than the
Gaussian) and demonstrated that Cauchy mutations may offer some advantages
across a wide testbed of problems. Montana and Davis (1989) examined the
use of Laplace-distributed mutations but there is no evidence that the Laplace
distribution is particularly better suited than Gaussian or Cauchy mutations for
typical real-valued optimization problems.
In the simplest version of evolution strategies or evolutionary programming.
described as a ( 1 + 1 ) evolutionary algorithm, a single parent x creates a single
offspring xby imposing a multivariate Gaussian perturbation with mean zero
and standard deviation (T on the parent, then selects the better of the two trial
solutions as the parent for the next iteration. The same standard deviation is
applied to each component of the vector x during mutation. For some problems,
the variation of (T (i.e. the step size control parameter in each dimension) c m
be computed to yield an optimal rate of convergence.
Let the convergence rate be defined as the ratio of the Euclidean distance
covered toward the optimum solution to the number of trials required to achieve
the improvement. Rechenberg ( 1973) calculated the convergence rates for two
functions:

(2, . . . , n ] -b/2 5 x l 5 h / 2

where x = ( X I , . . . , , Y , ~ ) ~E R. Function fl is termed the corridor model

and represents a linear function with inequality constraints. Improvement is
accomplished by moving along the first axis of the search space inside a
corridor of width h. Function f 2 is termed the sphere model and is a simple
,I-dimensional quadratic bowl.
Rechenberg ( 1973) showed that the optimum rates of convergence expected
progress toward the optimum) are
TEAM LRN

24 I

Real-valued vectors
on the corridor model, and
a = 1.22411x11/n

on the sphere model. That is, only a single step size control is needed for
optimum convergence. Given these optimum standard deviations for mutation,
the optimum probabilities of generating a successful mutation can be calculated
as
pyp' = (2e)-I

* 0.184

opt

p 3 = 0.270.

Noting the similarity of these two values, Rechenberg (1973) proposed the
following rule:
The ratio of successful mutations to all muta ions should be l / S . If this rati 1 is
greater than 1/5, increase the variance; if it is less, decrease the variance.
Schwefel (198I ) suggested measuring the success probability on-line over O??
trials (where there are 12 dimensions) and adjusting a at iteration t by
a(t - n)6
a ( t - 17)

if p , < 0.2
if p \ > 0.2
if p\ = 0.2

with 6 = 0.85 and p , equaling the number of successes in 1011trials divided

by 1017,which yields convergence rates of geometric order for both .f'l and .f?
(Back et a1 1993; see the book by Back (1996) for corrections to the update
rule offered by Back et a1 (1993)).
The use of a single step size control parameter covering all dimensions
simultaneously is of limited robustness. The optimization performance can be
improved by using appropriate step sizes in each dimension. This is particularly
evident when consideration is given to optimizing a vector of parameters each
of different units of dimension (e.g. temperature and pressure). Determining
appropriate settings for each of n step sizes poses a significant challenge to the
human operator; as such, methods have been proposed for self-adapting the step
sizes concurrent to the evolutionary search.
The first efforts in self-adaptation date back at least to the article by Reed et (11
(1967), but the two most common implementations in use currently derive from
the work of Schwefel (1 98 I ) and Fogel et a1 ( 199 1 ). In each case, the vector of
objective variables x is accompanied by a vector strategy parameters a where
a, denotes the standard deviation to use when applying a zero-mean Gaussian
mutation to that component in the parent vector. The strategy parameters are
updated by slightly different methods according to Schwefel ( I98 I ) and Fogel
TEAM LRN
er crl (1991).

242

Mutation operators

Schwefel ( 198I ) offered the procedure

= U / exp(roN(0, 1 )

x, = x,

+ N(0,

+ riv,(0, 1 ) )

U/)

where the constant T cx l/[2(n/2)]2, ro cx l/(2n)I2, N ( 0 , I ) is a standard

Gaussian random variable sampled once for all tz dimensions and N,(O,I ) is a
standard Gaussian random variable sampled anew for each of the n dimensioris.
The procedure offers a general control for all dimensions and an individualizcd
control for each dimension (Schwefel (1981) also offered a simplitied method
for self-adapting a single step size parameter a). The values of U are, as shoum,
log-normal perturbations of their parents vector U ,
Fogel et nl ( I99 I ) independently offered the procedure

where the parents strategy parameters are used to create the offsprings
objective values before being mutated themselves, and the mutation of the
strategy parameters is achieved using a Gaussian distribution scaled by x
and the standard deviation for each dimension. This procedure also requires
incorporating a rule such that if any component a/becomes negative it is reset
to an arbitrary small value 6 .
Several comparisons have been conducted between these methods.
Saravanan and Fogel (1994) and Saravanan et a/ (1995) indicated that ihe
log-normal procedure offered by Schwefel ( 198I ) generated generally superior
optimization performance (statistically significant) across a series of standxd
test functions. Angeline (1996a), in contrast, found that the use of Gaussian
mutations on the strategy parameters generated better optimization performarice
when the objective function was made noisy. Gehlhaar and Fogel (1906)
indicated that mutating the strategy parameters before creating the offspring
objective values appears to be more generally useful both in optimizing a set of
test functions and in molecular docking applications.
Both of the above methods for self-adaptation have been extended to
include possible correlation across the dimensions. That is, rather than us(: n
independent Gaussian random perturbations, a multivariate Gaussian mutat ion
with arbitrary covariance can be applied. Schwefel (1981) described a method
for incorporating rotation angles a such that new solutions are created by

where /3 =: 0.0873 ( 5 ), i = I , . . . , n and j = I , . . . , n ( n - l)/2, although it is

LRN correlations in the method. Fogel
not necessary to include all possibleTEAM
pairwise

Permutations

243

et al (1 992) offered a similar method operating directly on the components of the

covariance matrix but the method does not guarantee positive definite matrices
for n > 2, and the conventional method for implementing correlated mutation
relies on the use of rotation angles as described above.
Another type of zero-mean mutation found in the literature is the so-called
nonunifOrm mutation of Michalewicz (1996, pp I 1 I-2), where
X,'(t) = x,(t)

+ A ( t , ub,

-XI@))

x, (1) - A @ ,X, ( 1 ) - Ib,)

if U < 0.5
if u 2 0.5

where x ; ( t ) is the ith parameter of the vector x at generation t , s,E [Ib,. ub,],
the lower and upper bounds, respectively, U is a random uniform U ( 0 , I ) , and
the function A ( ? ,y ) returns a value in the range [0, y ] such that the probability
of A ( r , y) being close to zero increases as t increases, essentially taking smaller
steps on average. Michalewicz et a1 (1994) used the function
A ( r , y ) = yu(1

t/T)"

where T is a maximal generation number and h is a system parameter chosen

by the operator to determine the degree of nonuniformity.
There have been recent attempts to use nonzero-mean mutations on realvalued vectors. Ostermeier (1 992) proposed an evolution strategy where the
Gaussian mutations applied to the objective vector x are controlled by a vector
of expectations p as well as a vector of standard deviations 0 . Ghozeil and
Fogel (1996), following earlier work by Bremermann and Rogson (1964), have
implemented a polar coordinate mutation in which new offspring are generated
by perturbing the parent in a random direction (0) with a specified step size ( r ) .

32.3 Permutations
Darrell Whitley

32.3.I

Introduction

Mutation operators can be used in a number of ways. Random mutation

hillclimbing (Forrest and Mitchell 1993) is a search algorithm which applies
a mutation operator to a single string and accepts any improving moves. Some
forms of evolutionary algorithms apply mutation operators to a population of
strings without using recombination, while other algorithms may combine the
use of mutation with recombination.
Any form of mutation which is to be applied to a permutation must yield
a string which also represents a permutation. Most mutation operators for
permutations are related to operators which have also been used in neighborhood
local search strategies. Many of these operators thus can be applied in such as
way that they reach a well-defined TEAM
neighborhood
of adjacent states.
LRN

244

Mutation operators

32.3.2 2-0pt, 3-opt, and k-opt

The most common form of mutation is 2-opt (Lin and Kernighan 1973). Given
a sequence of elements

A B C D E F G H
the 2-opt operator selects two points along the string, then reverses the segment
between the points. Note that if the permutation is viewed as a circuit as in the
traveling salesman problem (TSP), then all shifts of a sequence of N elements
are equivalent. It follows that once two cut points have been selected in this
circular string, it does not matter which segment is reversed; the effect is the
same.
The 2-opt operator can be applied to all pairs of edges in N ( N - I )/2 steps.
This is analogous to one iteration of local search over all variables in a parameter
optimization problem. If a full iteration of 2-opt to all pairs of edges fails to
find an improving move, then a local optimum has been reached.

G
Figure 32.1. A graph.

2-opt is classically associated with the Euclidean TSP. Consider the graph
in figure 32.1. If this is interpreted as a Euclidean TSP, then reversing the
segment [C D E F] or the segment [G H A B] results in a graph where none of
the edges cross and which has lower cost than the graph where the edges cross.
Let {A, B, . . . , 2 ) be a set of vertices and (a, b) be the edge between vertices A
and B. If vertices {B, C, F, G} in figure 32.1 are connected by the set of edges
((b,c), (b,f), (b,g), (c,f), (c,g) (f,g)), then two triangles are formed when B
is connected to F and C is connected to G. To illustrate, create a new graph
by placing a new vertex X at the point where the edges (b,f) and (c,g) cross.
In the new graph in Euclidean space, the distance represented by edge ( ( 7 , ~ )
must be less than edges (b, x) (x, c), assuming B, C, and X are not on a line;
likewise, the distance represented by edge (f, g) must be less than edge (f, x )
(x,g). Thus, reversing the segment [C D E F] will always reduce the cost of
the tour due to this triangle inequality. For the TSP this leads to the general
principle that multiple applications of 2-opt will always yield a tour that has no
TEAM LRN
crossed edges.

Permutations

245

One can also look at reversing more than two segments at a time. The
3-opt operator cuts the permutation into three segments and then looks at all
possible ways of reordering these segments. There are 3 ! = 6 ways to order the
segments and each segment can be placed in a forward or reverse order. This
yields up to 23 * 6 = 48 possible new reorderings of the original permutation.
For the symmetric TSP, however, all shifted arrangements of the three segments
are equal and all reversed arrangements of the three segments are equal. Thus,
the 3! orderings are all equivalent. (By analogy, note that there is only one
possible Hamiltonian circuit tour between three cities.) This leaves only 23 = 8
ways of placing each of the segments in a forward or reverse direction, each
of which yields a unique tour. Thus, for the symmetric TSP, the cost to test
one 3-opt move is eight times greater than the cost of testing one 2-opt move.
For other types of scheduling problem, such as resource allocation, reversals
and shifts of the complete permutation are not necessarily equivalent and the
cost of a 3-opt move may be up to 48 times greater than that of a 2-opt move.
Also note that there are
ways to break a permutation up into combinations
ways of breaking the permutation into two
of three segments compared to
segments. Thus, the set of all possible 3-opt moves is much larger than the set
of possible 2-opt moves. This further increases the cost of performing one pass
of 3-opt over all possible ways of partitioning a permutation into three segments
compared to a pass of 2-opt over all pairs of possible segments.
One can also use k-opt, where the permutation is broken into k segments,
but such an operator will obviously be very costly.

(y)

(1)

,
scramble operators
32.3.3 Insert, s ~ u pand

The TSP is sensitive to the adjacency of elements in a permutation, so that

2-opt represents a minimal change from one Hamiltonian circuit to another. For
resource scheduling applications the permutation represent a priority queue and
reversing a segment of a permutation represents a major change in access to
available resources. For example, think of the permutation as representing a
line of people waiting to buy a limited supply of tickets for different seats on
different trains. The relative order of elements in the permutation tends to be
important in this case and not the adjacency of the individual elements. In this
case, a 2-opt segment reversal impacts many customers and is far from a minor
change.
Radcliffe and Surry ( 1995) argue for representation-independent concepts of
mutation and related forms of hillclimbers. Concerning desirable properties of
a mutation operator, they state, One nearly universal characteristic, however,
is that they ensure . . . that the entire search space remains accessible from any
population, and indeed from any individual. In most case mutation operators
can actually move from any point in the search space to any other point directly,
but the probability of making large moves is very much smaller than that of
TEAM
LRNmutation rates) (p 58). They also
making small moves (at least with
small

246

Mutation operators

suggest that a single mutation should represent a minimal change and look at
different types of mutation operator for different representations of the TSP.
For resource allocation problems, a more modest change than 2-opi is
to merely select one element and to insert it at some other position in the
permutation. Syswerda (1991) refers to a variant of this as position-based
mutation and describes it as selecting two elements and then moving the second
element before the first element. Position-based mutation appears to be less
general than the insert operator, since elements can only be moved forward in
position-based mutation.
Similarly, one can select two elements and swap the positions of the two
elements. Syswerda denotes this as order-based mutation. Note that if an
element is moved forward or backward one position, this is equivalent to a
swap of adjacent elements. One way in which swap can be used as a local
search operator is to swap all adjacent elements, or perhaps also all pairs of
elements. Finally, Syswerda also defines a scramble mutation operator that
selects a sublist of permutation elements and randomly reorders (i.e. scrambles)
the order of the subset while leaving the other elements in the permutation in
the same absolute position. Davis (1991a) also reports on a scramble sutilist
mutation operator, except that the sublist is explicitly composed of contiguous
elements of a permutation. (It is unclear whether Syswerdas scramble operator
is also meant to work on contiguous elements or not; an operator that selects
a sublist of elements over random positions of the permutation is certainly
possible.)
For a problem that involved scheduling a limited number of flight simulators,
Syswerda ( 199I , p 342) reported that when applied individually, the order-based
swap mutation operator yielded the best results when compared to positionbased mutation and scramble mutation. In this case the swaps were selected
randomly rather than being performed over a fixed well-defined neighborhood.
Davis ( 1991, p 8 I ) on the other hand reports that the scramble sublist mutation
operator proved to be better than the swap operator on a number of applications.
In conclusion, one cannot make a priori statements about the usefulness of
a particular mutation operator without knowing something about the type of
problem that is to be solved and the representation that is being used for that
problem, but in general it is useful to distinguish between permutation problems
that are sensitive to adjacency (e.g. the TSP) versus relative order (e.g. resource
scheduling) or absolute position, which appears to be the least common.

32.4 Finite-state machines

David B Fogel

Given a finite-state machine representation (Chapter 18) where each element in

a population is defined by a 5-tuple

M =TEAM
((3, TLRN
, P , S, n)

Finite-state machines

247

where Q is a finite set, the set of states, T is a finite set, the set of input
symbols, P is a finite set, the set of output symbols, s : Q x T -+ Q , the next
state function, and o : Q x T -+ P , the next output function,
there are various methods for mutating parents to create offspring. Following
directly from the definition, five obvious modes of mutation present themselves:
(i) change an output symbol, (ii) change a state transition, (iii) add a new state,
(iv) delete a state, and (v) change the start state. Each of these will be discussed
in turn.
(i)

(ii)

(iii)

(iv)

(v)

Changing an output symbol consists of determining a particular state 4 E Q ,

and then determining a particular symbol t E T . For this pair ( q , t),
identify the associated output symbol p E P and change it to a symbol
chosen at random over the set P . The probability mass function for
selecting a new symbol is typically uniform over the possible symbols in
P, but can be chosen to reflect nearness between symbols or other known
relationships between the symbols.
Changing a state transition consists of determining a particular state 4 1E Q ,
and then determining a particular symbol t E T . For this pair (41, t),
identify the associated next state q 2 and change it to a state chosen at
random over the set Q. The probability mass function for selecting a new
symbol is typically uniform over the possible states in Q.
Adding a state can only be performed when the maximum size of the
machine has not been exceeded. The operation is accomplished by
increasing the set Q by one element. This new state must be properly
defined by generating an associated output symbol p, and next state
transition 4, for all input symbols i = 1, . . . , I T (. The generation is
typically performed by selecting output symbols and next state transitions
with equal probability across their respective sets. Optionally, the new state
may also be forced to be connected to the preexisting states by redirecting
a randomly selected state transition of a randomly chosen preexisting state
to the new state.
Deleting a state can be performed when the machine has at least two states.
The operation is accomplished by decreasing the set Q by one element
chosen at random (uniformly). All state transitions from other states that
point to the deleted state must be redirected to the remaining states. This
is often performed at random, with the new states selected with equal
probability.
Changing the start state can be performed when the machine has at least
two states. The operation is accomplished by selecting a state q E Q to
be the new starting state. Again, the selection is typically made uniformly
over the available states.

The mutation operation can be implemented with various probabilities

assigned to each mode of mutation (Fogel and Fogel 1986), although many
TEAM LRN
of the initial experiments in evolutionary
programming used equal probabilities

248

Mutation operators

(Fogel et nl 1966). Further, multiple mutations can be performed (see e.g.

Fogel et al 1966), and macromutations can be defined over pairs or higher-order
combinations of these primitive operations. Recent efforts by Fogel et nl ( 1994,
1995) and Angeline et NI ( 1996) have incorporated the use of self-adaptatiori in
mutating finite-state machines.

32.5 Parse trees

Peter J Angelirie

Standard genetic programming (Koza 1992), much as with traditional genztic

algorithms, discounts mutations role during evolution, often to an extreme (i.e.
a mutation rate of zero). In many genetic programs, no mutation operations are
used, which forces population sizes to be quite large in order to ensure access
to all the primitives in the primitive language throughout a run.
In order to avoid unnecessarily large population sizes, Angeline ( I996b)
defines four distinct forms of mutation for parse trees (Chapter 19). The grout
mutation operator randomly selects a leaf from the tree and replaces it with a
randomly generated new subtree (figure 32.2). The shrink mutation operdtor
selects an internal node from the tree and replaces the subtree below it with
a randomly generated leaf node (figure 32.3). The switch mutation operdtor
selects an internal node from the parse tree and reorders its argument subtrees
(figure 32.4). Finally, the c y l e mutation operator selects a random node and
replaces it with a new node of the same type (figure 32.5). If a leaf node is
selected, then i t is replaced by a leaf node. If an internal node is selected, thcm i t
is replaced by a function primitive that takes an equivalent number of arguments.
Note that the mutation operation defined by Koza (1992) is a combination of a
shrink mutation followed by a grow mutation at the same position.
Angeline (1996b) also defines a numerical terminal mutation that
manipulates numerical terminals in a parse tree using the Gaussian mutations
typically used in evolution strategies and evolutionary programming (see also
Back 1996, Fogel 1995). In this mutation operation, a single numerical terminal
in the parse tree is selected at random and a Gaussian random variable with a
user-defined variance is added to its value.
If the application of a mutation operation creates a parse tree that violates
the size limitation criteria for the parse tree, typically the operation is revoked
and the state of the parse tree prior to the operation is restored. In some cases,
when a series of mutations are to be performed, as in Angeline (1996b), the
complete set of mutations is executed prior to checking whether the mutated
parse tree conforms to the imposed size restrictions.
When evolving typed parse trees as in Montana (1993, mutation must also
be sensitive to the return type of the manipulated node. In order to preserve
the syntactic constraints, the return type of the node after mutation must be the
TEAMtrack
LRN of the return types for the various
same. This is accomplished by keeping

Parse trees

249

Figure 32.2. An illustration of the grow mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, a terminal node is selected at random (highlighted)
and replaced by a randomly generated subtree to produce the child tree.

Figure 32.3. An illustration of the shrink mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, an internal function node is selected at random
(highlighted) and replaced by a randomly selected terminal to produce the child tree.

Figure 32.4. An illustration of the switch mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, an internal function node is selected, two of the
subtrees below it are selected (highlighted in the figure) and their positions switched to
produce the child tree.
TEAM LRN

250

Mutation operators

Figure 32.5. An illustration of the cycle mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, a single node, either a terminal or function, is selected
at random (highlighted in the parent) and replaced by a randomly selected node with the
same number of arguments to produce the child tree.

primitives in the language and restricting mutation to return those primitives

with the corresponding type.

32.6 Other representations

David B Fogel and Peter J Angeline

N , 2"" 1N } .

Within evolution strategies and evolutionary programming, the common

representation is simply the real-integer vector pair (i.e. no effort is made to
encode these vectors into another representation such as binary).
The simple approach to mutating such a representation would be to embed
the integers in the real numbers and use the standard methods of mutation (e.g.
Gaussian random perturbation) found in evolution strategies and evolutionary
programming. The results could be rounded to the integers when dealing with
the elements in d . Back and Schiitz (1995) note, however, that, for a discrete
optimization problem, the 'optimum point obtained by rounding the results of
the continuous optimization might be different from the true discrete optimum
point even for linear objective functions with linear constraints'. Back and
Schiitz (1995) also note the potential problems in optimizing x and d separately
(as in the work of Lohmann (1992) and Fogel (1991, 1993) among others)
because there may be interdependences between the appropriate mutations to z
TEAM LRN
and d.

Other representations

25 I

Back and Schutz (1995) approach the general problem by including a vector
of mutation strategy parameters p j E (0, 1) and j = I , 2, . . . , d , where there are
d components to the vector d . (Alternatively, fewer strategy parameters could
be used.) These strategy parameters are adapted along with the usual step size
control strategy parameters for Gaussian mutation of the real-world vector x.
The discrete strategy parameters are updated by the formula

where y is set proportional to [2(d)1/2]-1/2.

Actual mutation to the parameters
in d can be accomplished using an appropriate random variable (e.g. uniform
or Poisson).
With regard to mutation in introns, because the introns are not coded into
functional behavior (i.e. they do not affect performance in terms of the objective
function), the manner in which they are mutated is irrelevant.
In the standard genetic algorithm representation, the semantics of an allele
value (how the allele is interpreted) are typically tied to its position in the
fixed-length n-ary string. For instance, in a binary string representation, each
position signifies the presence or absence of a specific feature in the genome
being decoded. The difficulty with such a representation is that with positions
in the string representation that are semantically linked, but separated by a large
number of intervening positions in the string, crossover has a high probability
of disrupting beneficial settings for these two positions. Goldberg et a1 (1989)
describe a representation for a genetic algorithm that embodies one approach to
addressing this problem. In their messy genetic algorithm (mGA), each allele
value is represented as a pair of values, one specifying the actual allele value
and one specifying the position the allele occupies. Messy GAS are defined to be
of variable length, and Goldberg et a1 (1989) describe appropriate methods for
resolving underdetermined or overdetermined genomes. In this representation it
is important to note that the semantics are literally carried along with the allele
value in the form of the alleles string position.
Diplodic representations, representations that include multiple allele values
for each position in the genome, have been offered as mechanisms for modeling
cyclic environments. In a diplodic representation, a method for determining
which allele value for a gene will be expressed is required to adjudicate when the
allele values do not agree. Building on earlier investigations (e.g. Bagley 1967,
Hollstein 1971 , Brindle 1981), Goldberg and Smith ( 1987) demonstrate that
an evolving dominance map allows quicker adaptation to cyclical environment
changes than either a haploid representation or a diploid representation using
a fixed dominance mapping. In the article by Goldberg and Smith (1987), a
triallelic representation from the dissertation of Hollstein ( 1971 ) is used: 1,
i, and 0. Both 1 and i map to the allele value of I , , while 0 maps to the
TEAM
LRNi and 0 and 0 dominating i. Thus,
allele value of 0 with 1 dominating
both

252

Mutation operators

the dominance of a I over a 0 allele value could be altered via mutation by

altering the value to an i. Ng and Wong (1995) extend the multiallele approach
to dominance computation by adding a fourth value for a recessive 0. Thus 1
dominates 0 and o while 0 dominates i and 0. When both allele values for a
gene are dominant or recessive, then one of the two values is chosen randomly
to be the dominant value. Ng and Wong (1995) also suggest that the dominance
of all of the components in the genome should be reversed when the fitnzss
value of an individual falls by 20% or more between generations.

References
Angeline P J I996a The effects of noise on self-adaptive evolutionary optimization, Pmc.
5th Ann. Cot$ oti E\lolittioncin Progrunttnirtg ed L J Fogel, P J Angeline and T
Back (Cambridge, MA: MIT Press) pp 443-50
-1996b Genetic programmings continued evolution Adiutices in Genetic Progr,itnming vol 2, ed P J Angeline and K Kinnear (Cambridge, MA: MIT Press) pp 89-1 10
Angeline P J, Fogel D B and Fogel L J 1996 A comparison of self-adaptation methods for
finite state machines in a dynamic environment Proc. 5th Atin. Cot$ otz Evo1utior:cin.
Progrcim?ting ed L J Fogel, P J Angeline and T Back (Cambridge, MA: MIT Przss)
pp 431-50
Back T 1993 Optimal mutation rates in genetic search Proc. 5th Itit. Cot$ oti Genetic
Algorithms (UrbcitiCi-Chat)ipcrign. IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 2-8
-1996 E\~olutiottciry Algorithnrs in Theory und Prcicticbe (New York: Oxford
University Press)
Biick T, Rudolph G and Schwefel H-P 1993 Evolutionary programming and evolution
strategies: similarities and differences Proc. 2nd Anti. Cot$ on E\dutiorrcin
Progranimi,ig (Sun Diego, CA) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 1 1-22
Back T and SchiitL M 1995 Evolution strategies for mixed-integer optimization of optical
i
Progrummirig (San Diego,
multilayer systems Proc. 4th Anri. Cortf. o ~ Evolittioncin
CA, March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge.
MA: MIT Press) pp 33-51
Biick T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization E\.wlutiotui n-Cotnput. 1 1-24
Bagley J D 1967 The BehuiGor of A d a p t i , ~SJAtetns \t*hich Employ Genetic cind
Correlcitioti Algorithnis Doctoral Dissertation, University of Michigan; Universitj
Microfilms 68-7556
Bremermann H J 1962 Optimidation through evolution and recombination SelfOrglinizing Sy.\tems ed M C Yovits, G T Jacobi and G D Goldstine (Washington,
DC: Spartan) pp 93-106
Bremermann H J and Rogson M 1964 An Evolution -type Secirch Method j i w Corii-e.x Sets
ONR Technical Report, contracts 222( 85) and 3656(58)
Bremermann H J, Rogson M and Salaff S 1965 Search by evolution BiophyAict arid
Cjhernetic System.\ ed M Maxtield, A Callahan and L J Fogel (Washington, DC:
TEAM LRN
Spartan) pp 157-67

References

25 3

-I966 Global properties of evolution processes Naturul Autoniutci and Usefitl

Simulations ed H H Pattec, E A Edelsack, L Fein and A B Callahan (Washington,
DC: Spartan) pp 3-41
Brindle A 198I Genetic Algoritlzms for Function Optimicutiori Doctoral Dissertation,
University of Alberta
Davis L I989 Adapting operator probabilities in genetic algorithms Pmc. 3rd I t i t . Corlf:
on Genetic Algorithms (Fuirfns, VA, June 1989) ed J D Schaffer (San Mateo, CA:
Morgan Kaufmann) pp 6 1-9
Davis L 1991a Hundhook of Genetic Algorithms (New York: Van Nostrand Reinhold)
-1991
b A genetic algorithms tutorial Handbook c,f Genetic A1gorithnr.s ed L DaL'is
(New York: Van Nostrand Reinhold) pp 1-101
De Jong K A 1975 An Analysis c,f the Behaviour of u Class of Genetic Adapti\v System.\
PhD Thesis, University of Michigan
Fogarty T C 1989 Varying the probability of mutation in the genetic algorithm Proc. 3rd
hit. Car$ on Genetic Algorithms (Fui&r, VA, 1989) ed J D Schaffer (San Mateo,
CA: Morgan Kaufmann) pp 104-9
Fogel D B 199I System IdentiJcation through Simduted Etdirtion (Needham, MA: Ginn)
-1
993 Using evolutionary programming to construct neural networks that are capable
of playing tic-tac-toe Proc. 1993 IEEE Int. Coiif on Neiirul Nehtm-ks (Piscataway.
NJ: IEEE) pp 875-80
-1
994 Asymptotic convergence properties of genetic algorithms and evolutionary
programming: analysis and experiments Cybern. Syst. 25 3 8 9 4 0 7
-1995 E t d u t i o n u n Computation: Toward a Neit*Philosophj~of Muchine Intelligence
(New York: IEEE)
Fogel D B and Atmar J W 1990 Comparing genetic operators with Gaussian mutations
in simulated evolutionary processes using linear systems Biol. Cyhern. 63 1 11-4
Fogel D B, Fogel L J and Atmar J W 1991 Meta-evolutionary programming Proc.. 25th
Asilomar Car$ on Signals, Systems, and Computers ed R R Chen (Pacific Grove,
CA: Maple) pp 540-5
Fogel D B, Fogel L J, Atmar W and Fogel G B 1992 Hierarchic methods of evolutionary
programming Proc. 1st Ann. Cont on Evolutionan Progrutnming ed D B Fogel
and W Atmar (La Jolla, CA: Evolutionary Programming Society) pp 175-82
Fogel D B and Stayton L C 1994 On the effectiveness of crossover in simulated
evolutionary optimization BioSystems 32 17 1-82
Fogel L J, Angeline P J and Fogel D B 1994 A preliminary investigation on extending
evolutionary programming to include self-adaptation on finite state machines
lnformaticu 18 387-98
-1995
An evolutionary programming approach to self-adaptation in finite state
machines Proc. 4th Anti. Con5 on Evolutionay Progrcinzming ( S a n Diego, CA,
March, 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge, MA:
MIT Press) pp 355-65
Fogel L J and Fogel D B 1986 Artijciul Intelligence through E,dutionan Programming
Final Report for US Army Research Institute, contract no P0-9-X56- 1 1 02C- 1
Fogel L J, Owens A J and Walsh M J 1966 Art@cial Intelligence Through Simulated
Evolution (New York: Wiley)
Forrest S and Mitchell M 1993 Relative building-block fitness and the building block
hypothesis Foundations of Genetic Algorithms 2 ed D Whitley (San Mateo. CA:
Morgan Kaufmann) pp 109-26 TEAM LRN

254

Mutation operators

Gehlhaar D K and Fogel D B 1996 Tuning evolutionary programming for

conformationally flexible molecular docking Proc. 5th Ann. Conj: on Evolutioncin
Programming ed L J Fogel, P J Angeline and T Back (Cambridge, MA: MIT P r e s )
at press
Ghozeil A and Fogel D B 1996 A preliminary investigation into directed mutations in
evolutionary algorithms Parallel Problem Solving from Nature 4 to appear
Goldberg D E 1989 Genetic Algorithms in Search, Optimization and Machine Learning
(Reading, MA: Addison-Wesley)
Goldberg D E, Korb D E and Deb K 1989 Messy genetic algorithms: motivation, analysis,
and first results Complex Sjst. 3 493-530
Goldberg D E and Smith R E 1987 Nonstationary function optimization using genetic
algorithms with dominance and diploidy Proc. 2nd Int. Conj on Genetic Algorithms
(Cumbridge, MA, 1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 59-68
Grefenstette J J 1986 Optimization of control parameters for genetic algorithms IEEE
T r a m Syst. Man Cybernet. SMC-16 122-8
Hesser J and R Manner 1991 Towards an optimal mutation probability in genctic
algorithms Proc. 1st Con& on Purallel Problem Solving front Nature (Dortmund,
1990) (Lecture Notes in Computer Science 496) ed H-P Schwefel and R Manner
(Berlin: Springer) pp 23-32
-1
992 Investigation of the m-heuristic for optimal mutation probabilities Parallel
Problem Solving from Nature, 2 (Proc. 2nd lnt. Con$ on Parallel Problem Solving
from Nature, Brussels, 1992) ed R Manner and B Manderick (Amsterdam: Elsevier)
pp 115-24
Holland J H 1975 Aduptation in Natural and Artificial Systems (Ann Arbor, M I :
University of Michigan Press)
Hollstein R B 197 1 Artijciul Genetic Adaptation in Computer Control Systems Doctoral
Dissertation, University of Michigan; University Microfilms 7 1-23, 773
Koza J R 1992 Genetic Programming: On the Programming of Computers by Means cf
Naturul Selection (Cambridge, MA: MIT Press 1
Lin S and Kernighan B 1973 An efficient heuristic procedure for the traveling salesman
problem Operations Res. 21 498-5 16
Lohmann, R 1992 Structure evolution in neural systems Dynamic, Genetic, and Chaotic
Progrumming ed B Soucek (New York: Wiley) pp 395-41 1
Michalewicz Z 1996 Genetic Algorithms + Data Structures = Evolution Programs 3rd
edn (Berlin: Springer)
Michalewicz Z, Logan T and Swaminathan S 1994 Evolutionary operators for continuous
convex parameter spaces Proc. 3rd Ann. Con& on Evolutionary Programming (Sun
Diego, CA, February 1994) ed A V Sebald and L J Fogel (Singapore: World
Scientific) pp 84-97
Montana D J 1995 Strongly typed genetic programming Evolutionary Comput. 3 199-230
Montana D J and Davis L 1989 Training feedforward neural networks using genetic
algorithms Proc. 11th lnt. Joint Conj on Artijicial Intelligence ed N S Sridharan
(San Mateo, CA: Morgan Kaufmann) pp 762-7
Muhlenbein H 1992 How genetic algorithms really work: I mutation and hillclimbing
Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Con& on Parallel Problem
Solving from Nuture, Brussels, 1992) ed R Manner and B Manderick pp 15-25
Ng K P and Wong K C 1995 A new diploid scheme and dominance change mechanism
TEAM LRN
for non-stationary function optimization
Proc. 6th Int. Cot6 on Genetic Algorirhms

References

255

(Pittsburgh, PA, July 1995) ed L J Eshelman (San Mateo, CA: Morgan Kaufmann)
pp 159-66
Ostermeier A 1992 An evolution strategy with momentum adaptation of the random
number distribution Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Cortf:
on Parallel Problem Solving from Nature, Brussels, 1992) ed R Manner and B
Manderick pp 197-206
Radcliffe N and Surry P D 1995 Fitness variance of formae and performance prediction
Foundations of Genetic Algorithms 3 ed D Whitley and M Vose (San Mateo, CA:
Morgan Kaufmann) pp 5 1-72
Rechenberg I 1973 Evolutionsstrategie: Optimierung rechrtischer Systeme nach
Prinzipien der biologischen Evolution (Stuttgart: Frommann-Holzboog)
Reed J, Toombs R and Barricelli N A 1967 Simulation of biological evolution and
machine learning J. Theor. Biol. 17 3 1 9 4 2
Rudolph G I994 Convergence properties of canonical genetic algorithms IEEE Tram.
Neural Nemorks 5 96-1 01
Saravanan N and Fogel D B 1994 Learning strategy parameters in evolutionary
programming: an empirical study Proc. 3rd Ann. Conf on Eivlutionap
Programming (San Diego, CA, February 1994) ed A V Sebald and L J Fogel
(Singapore: World Scientific) pp 269-80
Saravanan N, Fogel D B and Nelson K M 1995 A comparison of methods for selfadaptation in evolutionary algorithms BioSystems 36 157-66
Schwefel H-P 1981 Numerical Optimization of Computer Models (Chichester: Wiley)
-1995
Evolution and Optimum Seeking (New York: Wiley)
Schaffer J D, Caruana R A, Eshelman L J and Das R 1989 A study of control parameters
affecting online performance of genetic algorithms for function optimization Proc.
3rd Int. Con$ on Genetic Algorithms (Faigm, VA, June 1989) ed J D Schaffer (San
Mateo, CA: Morgan Kaufmann) pp 51-60
Syswerda G 199 1 Schedule optimization using genetic algorithms Handbook qf Genetic
Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 332-49
Yao X and Liu Y 1996 Fast evolutionary programming Proc. 5th Ann. Con$ on
Evolutionary Programming ed L J Fogel, P J Angeline and T Back (Cambridge,
MA: MIT Press) at press

TEAM LRN

33
Recombination
Lashon B Booker (33. I), David B Fogel (33.2, 33.4, 33.6 I,
Darrell Whitley ( 3 3 3 , Peter J Angeline (33.5, 33.6)
a n d A E Eiben (33.7)

33.1 Binary strings

Lcrshon B Bnoker

33. I . I

Introduction

In biological systems (see section 5.4), crossing-over is a complex process

that occurs between pairs of chromosomes. Two chromosomes are physically
aligned, breakage occurs at one or more corresponding locations on eiich
chromosome, and homologous chromosome fragments are exchanged before
the breaks are repaired. This results in a recombination of genetic material
that contributes to variability in the population. In evolutionary algorithms, this
process has been abstracted into syntactic crossing-over (or crossover) operators
that exchange substrings between chromosomes represented as linear strings
of symbols. In this section we describe various approaches to implementing
these computational recombination techniques. Note that, while binary strings
(Chapter 15) are the canonical representation of chromosomes most often
associated with evolutionary algorithms, crossover operators work the same
way on all linear strings regardless of the cardinality of the symbol alphabet.
Accordingly, the discussion in this section applies to both binary and nonbinary
string representations. The obvious caveat is that the syntactic manipulations by
crossover must yield semantically valid results. When this becomes a problemfor example, when the chromosomes represent permutations (see Chapter 17)then other syntactic operations must be used.
33. I .2 Principal mec.hanisms
The basic crossover operation, introduced by Holland ( 1 975), is a three-step
procedure. First, two individuals are chosen at random from the population

256

TEAM LRN

Binary strings

257

of 'parent' strings generated by the selection operator (see Chapters 22-30).

Second, one or more string locations are chosen as breakpoints (or c'rosso\vr
points) delineating the string segments to exchange. Finally. parent string
segments are exchanged and then combined to produce two resultant 'offspring'
individuals. The proportion of parent strings undergoing crossover during a
generation is controlled by the crossover rate, p c E [0, 11. which determines
how frequently the crossover operator is invoked. Holland illustrates how to
implement this general procedure by describing the simple one-point crossover
operator. Given parent strings x and y, a crossover point is selected by randomly
choosing an integer k
U ( 1 , C. - 1):

Two new resultant strings are formed by exchanging the parent substrings to the
right of position k . Holland points out that when the overall algorithm is limited
to producing only one new individual per generation, one of the resultant strings
generated by this crossover operator must be discarded. The discarded string is
usually chosen at random.
Holland's general procedure defines a family of operators that can be
described more formally as follows. Given a space I of individual strings,
a crossover operator is a mapping

where m

Bt and
if m , = 0
if m i = 1

di =

{"
a,

if m , = 0
if i n, = I .

Although this formal description characterizes crossover as a binary operator,

there are some implementations of crossover involving more than two parents
(e.g. the multiparent uniform crossover operator described by Furuya and Haftka
( 1993) and the scanning crossover and diagonal crossover operators described
by Eiben et a1 (1995)).
The binary string m is a mask computed for each invocation of the operator
from the set of crossover points. This mask identifies which string segments
will be exchanged during the crossover operation. Note that the mask m and its
complement 1 - m = ( 1 - rnl . . . I - r n k ) generate the same (unordered) set of
resultant strings. Another way to interpret the mask is as a specification of which
parent provided the symbol at each position in a resultant string. A crossover
operation can be viewed as the simultaneous occurrence of two recombination
everzts, each producing one of the two offspring. The pair ( m ,1 - m ) can be
used to designate these recombination events. Each symbol in a resultant string
TEAM (denoted
LRN
is either transmitted by the first parent
in the mask by zero) or the

258

Recombination

second parent (denoted by one). Consequently, the event generating string, c

above is specified by m and the event generating d is specified by 1 - m.
A simple pseudocode for implementing one of these crossover operators is:
crossover(a, 6) :
sample U E U ( 0 , I )
if (U > p,)
then return(a, b)
fi
c := a ;
d := 6;
rn := compute-mas k() ;
for i := 1 to t! do
if ( m ;= 1 )
then
U;

:= bi;

n; :=

a;;

fi
od
return(c, d ) ;
Empirical studies have shown that the best setting for the crossover rate pc
depends on the choices made regarding other aspects of the overall algorithm,
such as the settings for other parameters such its population size and mutation
rate, and the selection operator used. Some commonly used crossover rates
are p c = 0.6 (De Jong 1975), p c E [0.45,0.95] (Grefenstette 1986), and
p , E [0.75,0.95](Schaffer et nl 1989). Techniques for adaptively modifying the
crossover rate have also proven to be useful (Booker 1987, Davis 1989, Srinivas
and Patnaik 1994, Julstrom 1995). The pseudocode shown above makes it clear
that the differences between crossover operators are most likely to be found in
the implementation of the compute-mask() procedure. The following examples
of pseudocode characterize the way compute-mask() is implemented for the
most commonly cited crossover operators.
One-point crossover. A single crossover point is selected. This operator can
only exchange contiguous substrings that begin or end at the endpoints of the
chromosome. This is rarely used in practice.

sample U E U ( 1 , t! - 1)
m := 0;
fori:=u+l told0
m ,= 1 ;
od
return m;

TEAM LRN

Binary strings

259

n-point crossover. This operator, first implemented by De Jong ( 1975),

generalizes one-point crossover by making the number of crossover points a
parameter. The value n = 2 designating two-point crossover is the choice
that minimizes disruptive effects (see the discussion of disruption in Section
33.1.3) and is frequently used in applications. There is no consensus about the
advantages and disadvantages of using values n 3 3. Empirical studies on this
issue (De Jong 1975, Eshelman et al 1989) are inconclusive.
sample u 1 , . . . , U,, E U ( l , l ) ,u1 5 - . . 5
if ( ( n mod 2) = 1)
then u,,+1 := t!;

U,,

fi
m := 0;
for j := 1 to n step 2 do
for i := U, 1 to u,+l do
m ,= 1 ;
od
od
return m;

By convention (De Jong 1975), when n is odd an additional crossover point

is assumed to occur at position l . Note that many implementations select the
crossover points without replacement-instead of with replacement as indicated
here-to guarantee that the crossover points are distinct. Analysis of disruptive
effects has shown that there are only small differences in the two approaches
(see the discussion of disruption in Section 33. I .3) and no empirical differences
in performance have been reported.

Uniform crossover. This is an operator introduced by Ackley (1987a) but most

often attributed to Syswerda (1989). (The basic idea can be traced to early
work in mathematical population genetics, see Geiringer ( 1944)). The number
of crossover points is not fixed in advance. Instead, the decision to insert
a breakpoint is made independently at each string position. This operator is
frequently used in applications.

m := 0;
for i := 1 to l do
sample U E U ( 0 , 1 )
if (U 5 P d
then m , = 1 ;
fi
od
return m

TEAM LRN

260

Recombination

The value p , = 0.5 first used by Ackley remains the standard setting for
the crossover probability at each position, though it may be advantageous to
use smaller values (Spears and De Jong 1991b). When p r = 0.5, every binary
string of length e is equally likely to be generated as a mask. In this case,
i t is often more efficient to implement the operator by using a random integer
sampled from U ( 0 ,2' - I ) as the mask instead of constructing the mask one bit
at a time.

Punc~uatedcro.sso\~er. Rather than computing the crossover mask directly,

Schaffer and Morishima (1987) used a binary string of 'punctuation marks'
to indicate the location of crossover points for a multipoint crossover operation.
The extra information was appended to the chromosome so that the number
and location of crossover points could be manipulated by genetic search. The
resulting representation used by the punctuated crossover operator is a string of
length 2.t, x = (A-, . . . .r[.r; . . . x i ) , where x r is the symbol at position i and .r( is a
punctuation mark that is 1 if position i is a crossover point and 0 otherwise. The
set of crossover points used in a recombination event under punctuated crossover
is given by the union of the crossover points specified on each chromosome
compute-mask(a, b )
j := 0;
for i := 1 to t / 2 do
nil := j ;
in: := j
if ( ( c r j = I ) or (h: = 1))
then j = 1 - j ;

fi
od
return ( m ) ;
Note that the symbol and punctuation mark associated with a chromosome
position are transmitted together by the punctuated crossover operator. While
the idea behind this operator is appealing, empirical tests of punctuated crossover
were not conclusive and the operator is not widely used.

In practice, various aspects of these operators are often modified to enhance

performance. Consider, for example, the choice of retaining both resultant
strings produced by crossover (a common practice) versus discarding one of
the offspring. Holland ( 1975) described an implementation designed to process
only one new individual per generation and, consequently, his algorithm discxds
one of the offspring generated by crossover. Some implementations retain this
feature even if they produce more than one new individual per generation.
However, empirical studies (Booker
TEAM1982)
LRN have shown that retaining both

Binary strings

26 I

offspring can substantially reduce the loss of diversity in the population. Another
widespread practice is to restrict the crossover points to those locations where
the parent strings have different symbols. This so-called reduced surrogate
technique (Booker 1987) improves the ability of crossover to produce offspring
that are different from their parents.
An implementation technique called shuffle crosso\er was introduced by
Eshelman et a1 (1989). The symbols in the parent strings are shuffled by a
permutation operator before crossover is invoked. The inverse permutation is
applied to the offspring produced by crossover to restore the original symbol
ordering. This method can be used to counteract the tendency in n-point
crossover ( I ? 2 1) to disrupt sets of symbols that are widely dispersed on the
chromosome more than it disrupts symbols which are close together (see the
discussion of bias in Section 33.1.4).
The crossover mechanisms described so far are all consistent with the
simplest principle of Mendelian inheritance: the requirement that every gene
carried by an offspring is a copy of a gene inherited from one of its parents.
Radcliffe ( 1991 ) points out that this conservation of genetic material during
recombination is not a necessary restriction for artificial recombination operators.
From the standpoint of conducting a robust exploration of the opportunities
represented by the parent strings, it is reasonable to ask whether a crossover
operator can generate all possible offspring having some combination of genes
found in the parents. Given a binary string representation, the answer for onepoint and n-point crossover is no while the answer for shuffle crossover and
uniform crossover is yes. (To see this, simply consider the set of possible
resultant strings for the parents 0 and 1.) For nonbinary strings, however, the
only way to achieve this capability is to allow the offspring to have genes
that are not carried by either parent. Radcliffe used this idea as the basis for
designing the random respectful recombination operator. This operator generates
a resultant string by copying the symbols at positions where the parents are
identical, then choosing random values to fill the remaining positions. Note that
for binary strings, random respectful recombination is equivalent to uniform
crossover with p , = 0.5.

33.1.3 Formal analysis

Mathematical characteri,-atiorzs of crossorTer. Several characterizations of

crossover operators have been formulated to facilitate the formal analysis
of recombination and genetic algorithms. Geiringer ( 1944) characterized
recombination in terms of the probability that sets of genes are transmitted
from parents to offspring during a recombination event. The behavior of a
crossover operator is then completely specified by the probability distribution
it induces over the set of all possible recombination events. Geiringers study
TEAM LRN includes a thorough analysis of
of these so-called recombination distributions

Recombination

262

recombination acting on a population of linear chromosomes in the absence of

selection.
In more detail, the recombination distribution associated with a crossover
operator is defined as follows. Let Sr = { 1, . . . , C} be the set of C numbers
designating the loci in strings of length C. The number of alleles allowed at
each locus can be any arbitrary integer. For notational convenience we will
identify a crossover mask m with the subset A 5 St which indicates the loci
corresponding to the bit positions i where rn, = 1 . The set A is simply another
way to designate the recombination event specified by m. The complementary
subset A' = St \ A designates the recombination event specified by 1 - m.
The recombination distribution R is given by the probabilities R ( A ) for e x h
recombination event. Clearly, under Mendelian segregation, R(A ) = R(A ' )
since all alleles will be transmitted to one offspring or the other. It is also clear
that
R ( A ) = 1. We can therefore view recombination distributions as
probability distributions over the power set 2" (Schnell 1961). The marginal
recombination distribution R A , describing the transmission of the loci in A , is
given by the probabilities

xAcS,

R A ( B ) is the marginal probability of the recombination event in which one

parent transmits the loci in B C A and the other parent transmits the loci in
A \ B.
Other mathematical characterizations of crossover operators are useful when
the chromosomes happen to be binary strings. If the sum z @ y denotes
component-wise addition in the group of integers modulo 2 and the product z y
denotes bitwise multiplication, then the strings produced by a crossover operator
with mask m are given by ma@(l-m)band m b @ ( l - m ) a .Liepins and Vose
(1992) use this definition to show that a binary operator is a crossover operator
if and only if the operator preserves schemata and commutes with addition and
bitwise multiplication. Furthermore, they provide two characterizations of the
set of chromosomes that can be generated by an operator given an initial pool
of parent strings. Algebraically, this set is given by the mathematical closure
of the parent strings under the crossover operator. Geometrically, the set is
determined by projections defined in terms of the crossover masks associ(3ted
with the operator. Liepins and Vose prove that these algebraic and geometric
characterizations are equivalent.
The dynamics of recombination. Geiringer used recombination distributions
to examine how recombination without selection modifies the proportions of
individuals in a population over time. Assume that each individual .r E
{ 1, 2, . . . , k}' is a string of length C in a finite alphabet of k characters. We
also assume in the following that TEAM
B E ALRN
5 5''. Let p ( ' ) ( z ) be the frequency

Binary strings

263

of individual 2 in a population at generation t , and p i ' ( z ) denote the marginal

frequency of individuals that are identical to z at the loci in A . That is,
~"'(y)

P ~ ' ( Z )=

for each y satisfying yi = x i Vi

Geiringer derives the following important recurrence relations:

p"+"(z) =

R(A)p"'(z)p")(y)

(33.1)

A.x,y

SE is arbitrary

xi = z; Vi E A'

B EA

C Sc. are arbitrary subsets

(33.3)
These recurrence relations are equivalent, complete characterizations of how
recombination changes the proportion of individuals from one generation to the
next. Equation (33.1) has the straightforward interpretation that alleles appear
in offspring if and only if they appear in the parents and are transmitted by
a recombination event. Each term on the right-hand side of (33.1) is the
probability of a recombination event between parents having the desired alleles at
the loci that are transmitted together. A string z is the result of a recombination
event A whenever the alleles of z at loci A come from one parent and the
alleles at loci A' come from the other parent. The change in frequency of an
individual string is therefore given by the total probability of all these favorable
occurrences. Equation (33.2) is derived from (33.1) by collecting terms based
on marginal recombination probabilities. Equation (33.3) is derived from (33.1)
by collecting terms based on marginal frequencies of individuals.
The last equation is perhaps the most significant, since it leads directly to a
theorem characterizing the expected distribution of individuals in the limit.
Theorem (Geiringer's theorem U). If t loci are arbitrarily linked, with the one
exception of 'complete linkage', the distribution of transmitted alleles 'converges
toward independence'. The limit distribution is given by

TEAM LRN

264

Recombination

which is the product of the t marginal distributions of alleles from the initial
popu M o n .
This theorem tells us that, in the limit, random mating and recombination
without selection lead to chromosome frequencies corresponding to the simple
product of initial allele frequencies. A population in this state is said to be
in lirzkcige eqiiilibriiirn or Robbins equilibrium (Robbins 19 18). This result
holds for all recombination operators that allow any two loci to be separated by
recombination.
Note that Holland (1975) sketched a proof of a similar result for schetna
frequencies and one-point crossover. Geiringers theorem applied to schemata
gives us a much more general result. Together with the recurrence equations,
this work paints a picture of search pressure from recombination acting to
reduce departures from linkage equilibrium for all schemata.
Subsequent work has carefully analyzed the dynamics of this convergence
to linkage equilibrium (Christiansen 1989). It has been proven, for example,
that the convergence rate for any particular schema is given by the probability
of the recombination event specified by the schemas defining loci. In this
view, an important difference between crossover operators is the rate at which,
undisturbed by selective pressures, they drive schemata to their equilibrium
proportions. These results from mathematical population genetics have oiily
recently been applied to evolutionary algorithms (Booker 1993, Altenberg 1995).

Disrupriorz una1ysi.s. Many formal studies of crossover operators focus

specifically on the way recombination disrupts and constructs schemata.
Hollands (1975) original analysis of genetic algorithms derived a bound for
the disruptive effects of one-point crossover. This bound was based on the
probability t ( ( ) / ( t- I ) that a single crossover point will fall within the
defining length t ( c )of a schema 6. Bridges and Goldberg (1987) subsequently
provided an exact expression for the probability of disruption for one-point
crossover. Spears and De Jong (1991a) generalized these results to provide
exact expressions for the disruptive effects of n-point and uniform crossover.
Recombination distributions provide a convenient framework for analyzing
these disruptive effects (Booker 1993). The first step in this analysis is to
derive the marginal distributions for one-point, n-point, and uniform crossover.
Analyses using recombination distributions can be simplified for binary strings
if we represent individual strings using index sets (Christiansen 1989). Each
binary string II: can be represented uniquely by the subset A C SI using the
convention that A designates the loci where x, = 1 and A designates the loci
where s, = 0. In this notation St represents the string 1, M represents the string
0, and A represents the binary complement of the string represented by A . Index
sets can greatly simplify expressions involving individual strings. Consider, for
LRN
example, the marginal frequency ~ TEAM
A ( I I : )of
individuals that are identical to 3: at

265

Binary strings
the loci in A . The index set expression

makes it clear that p A ( B ) involves strings having the allele values given by B
at the loci designated by A . Note that p f l ( B )= 1 and p s , ( B ) = p ( B ) .
With this notation we can also succinctly relate recombination distributions
A
and schemata. If A designates the defining loci of a schema and B
specifies the alleles at those loci, then the frequency of 6 is given by p , 4 ( B )and
the marginal distribution RAdescribes the transmission of the defining loci of
6. In what follows we will assume, without loss of generality, that the elements
of the index set A for a schema 6 are in increasing order so that the kth element
A ( A )is the locus of the kth defining position of
This means, in particular,
that the outermost defining loci of are given by the elements A " , and A , Q ~ , ,
where O(6) is the order of 6. It will be convenient to define the following
property relating the order of a schema to its defining length a(<).

Dejinition. The kth cwnporient of defining length for schema 6,&, (<), is the
1st defining loci, 1 5 k < O((), with the
distance between the kth and k
convention that
t - S(6).

&(c)

Note that the defining length of a schema is equal to the sum of its defining
length components:

k= 1

Given these preliminaries, we can proceed to describe the recombination

distributions for specific crossover operators.
One-point crossover. Assume exactly one crossover point in a string of
1 with probability I / ( t - 1) for
length t , chosen between loci i and i
i = 1 , 2 . . . . , l - 1. The only recombination events with nonzero probability
are S , = [1,x] and S: = [x 1 , e - 13 for x = I , 2 , . . . , t - 1. The probability
of each event is

RI(&.)
= RI($)=

2(t - I )

since each parent is equally likely to transmit the indicated loci. The marginal
distribution 72; for an arbitrary index set A can be expressed solely in terms
of these recombination events. We will refer to these events as the primary
recombination events.
Now for any arbitrary event B 2 A there are two cases to consider:
(i)

B = M. This corresponds to the primary recombination events S , , x < A ( l )

and S : , x 3 A ( o ( 6 , ) .There areTEAM
t - LRN
1 - 6(c) such events.

Recomb i n ati 02

266

(ii) B # 8. These situations involve the primary events S,r, A(1) 5 x <
A(o(6)). The events B having nonzero probability are given by B, =
{ A ( 1 ) ,. . . , A , , ) } , 1 5 i < O(6). For each i , there are 6,(6) corresponding
primary events.
The complete marginal distribution is therefore given by

RA(B)=

e - 1 - 6(6)
2 ( [ - 1)

W)
2(e - 1)
0

if B = 8 or B = A
if B = B , , 1 5 i < O(c)
otherwise.

Note that if we restrict our attention to disruptive events, we obtain the

familiar result

n-point crossover. The generalization to n crossover points in a string of

length e uses the standard convention (De Jong 1975) that when the number of
crossover points is odd, a final crossover point is defined at position zero. We
also assume that all the crossover points are distinct, which corresponds to the
way multipoint crossover is often implemented. Given these assumptions, there
are 2( ,:) nonzero recombination events if n is even or n = l , and 2( ';I)
such
events if n is odd. Since the n points are randomly selected, these events are
equally likely to occur.
We derive an expression for the marginal distributions in the same way as we
proceeded for one-point crossover. First we identify the relevant recombination
events, then we count them up and multiply by the probability of a single
event. Identification of the appropriate recombination events begins with the
observation (De Jong 1975) that crossover does not disrupt a schema whenever
an even number of crossover points (including zero) fall between successive
defining positions. We can use this to identify the configurations of crossover
points that transmit all the loci in B C A and none of the loci in A \ B . Given
any two consecutive elements of A , there should be an even number of crossover
points between them if they both belong to B or A \ B . Otherwise there should
be an odd number of crossover points between them. This can be formalized
as a predicate X, that tests these conditions for a marginal distribution R A
1
X A ( B ,n ,i ) =

if n is even and { A , , , ,A ( , - I ) }n B = !d or { A ( , ) ,A ( , - I ) )
if n is odd and [ A ( , ) ,A ( [ - l ) } n B # yl or { A ( , ) A
, (,- I)
where 2 5 i 5 O(6)
otherwise. TEAM LRN

Binary strings

267

The recombination events can be counted by simply enumerating all possible

configurations of crossover points and discarding those not associated with
the marginal distribution. The following function NAcomputes this count
recursively (as suggested by the disruption analysis of Spears and De Jong
( I99 1a)):

Putting all the pieces together, we can now give an expression for the
complete marginal distribution.

Uniform crossover. The marginal distribution R;for parametrized uniform

crossover with parameter p is easily derived from previous analyses (Spears and
De Jong 1991b). It is given by

Figure 33.1 shows how the marginal probability of transmission for secondorder schemata-2 R: ( A ) and 2 R:(o.,
I A J = 2-varies
as a function of
defining length. The shape of the curves depends on whether n is odd or
even. Since the curves indicate the probability of transmitting schemata, the
area above each curve can be interpreted as a measure of potential schema
disruption. This interpretation makes it clear that two-point crossover is the best
choice for minimizing disruption. Spears and De Jong (1991a) have shown that
this property of two-point crossover remains valid for higher-order schemata.
Note that these curves are not identical to the family of curves for
nondisruptive crossovers given by Spears and De Jong. The difference is
TEAM LRNpoints are selected randomly with
that Spears and De Jong assume crossover

268

Recombination

Figure 33.1. Transmission probabilities for second-order schemata. The inset shows the
behavior of these curves in the vicinity of the point L / 2 .

replacement. This means that their measure P2.e,enis a polynomial function of

the defining length having degree iz, with 17 identical solutions to the equation
P.,,,,,, = 1/2 at the point !/2. The function R ; ( A ) , on the other hand, has
iz distinct solutions to the equation 2R:,(A) = 1/2 as shown in the upper
right-hand corner of figure 33.1. This property stems from our assumption that
crossover points are distinct and hence selected without replacement.
Finally, regarding the construction of schema, Holland (1 989) has analyzed
the expected waiting time to construct a new schema that falls in the intersection
of two schemas already established in a population. He gives examples showing
that the waiting time for one-point crossover to construct the new schema can
be several orders of magnitude shorter than the waiting time for mutation.
Thierens and Goldberg (1993) also examine this property of recombination
by analyzing so-called mixing events-recombination events in which building
blocks from the parents are juxtaposed or mixed to produce an offspring having
more building blocks than either parent. Using the techniques of dimensional
analysis they show that, given only simple selection and uniform crossover,
effective mixing requires a population size that grows exponentially with the
number and length of the building blocks involved. This indicates that additional
mechanisms may be needed to achieve effective mixing in genetic algorithms.

In order to effectively use any inductive search operator, it is important to

understand whatever tendencies the operator may have to prefer one search
TEAM LRN
outcome over another. Any such tendency
is called an inductive bias. Random

269

Binary strings

search is the only search technique that has no bias. It has long been recognized
that an appropriate inductive bias is necessary in order for inductive search to
proceed efficiently and effectively (Mitchell 1980). Two types of bias have
been attributed to crossover operators in genetic search: di.rtributinnn1 h i n s and
positiorzal bins (Eshelman et a1 1989).
Distributional bias refers to the number of symbols transmitted during a
recombination event and the extent to which some quantities might be more
likely to occur than others. This bias is significant because i t is correlated with
the potential number of schemata from each parent that can be recombined by
the crossover operator. An operator has distributional bias if the probability
distribution for the number of symbols transmitted from a parent is not uniform.
Both one-point and two-point crossover are free of distributional bias. The
n-point ( 1 2 > 2) crossover operators have a distributional bias that is well
approximated by a binomial distribution with mean t / 2 for large i i . Uniform
crossover has a strong distributional bias, with the expected number of symbols
transmitted given by a binomial distribution with expected value p , t . More
recently, Eshelman and Schaffer ( I 993) have emphasized the expected value of
the number of symbols transmitted rather than the distribution of those numbers.
The bias defined by this criterion, though clearly similar to distributional bias,
is referred to as recmnbinati\,e bias.
Positional bias characterizes how much the probability that a set of symbols
will be transmitted intact during a recombination event depends on the relative
2 L '
01 pt

1.5
v)

:1 381s

03pts

02pts
04pt95pts

013pts
14pts

-a..
1/2

0.5

ogpts
7pts 1opts

; ;0f"
,

lf8
1.5

Distributional Bias

Figure 33.2. One view of the crossover bias 'landscape' generated using quantitative
TEAM LRN
measures derived from recombination distributions.

270

Recombination

positions of those symbols on the chromosome. This bias is important because

it indicates which schemata are likely to be inherited by offspring from their
parents. It is also indicative of the extent to which these schemata will appear
in new contexts that can help distinguish the genuine instances of co-adaptation
from spurious linkage effects. Hollands ( I 975j analysis of one-point crossover
pointed out that the shorter the defining length of a schema, the more likely it is
to be transmitted intact during the crossover operation. Consequently, one-point
crossover has a strong positional bias. Analyses of n-point crossover (Spears
and De Jong 1991 a) lead to a similar conclusion for those operators, though the
amount of positional bias varies with n (Booker 1993). Uniform crossover has
no positional bias, which is one of the primary reasons it is widely used. Note
that shuffle crossover was designed to remove the positional bias from one-point
and ri-point crossover. Eshelman and Schaffer ( 1993) have revised their view
of positional bias, generalizing the notion to something they now call scherna
bias. An operator has no schema bias if schemata of the same order are equally
likely to be disrupted regardless of their defining length.
Recombination distributions can be used to derive quantitative measures of
crossover bias (Booker 1993). The overall bias landscape for various crossover
operators based on these measures is summarized in figure 33.2.

33.2 Real-valued vectors

Recombination acts on two or more elements in a population to generate at

least one offspring. When the elements are real-valued vectors (Chapter I6),
recombination can be implemented in a variety of forms. Many of these forms
derive from efforts within the evolution strategies community because of their
long involvement with continuous optimization problems. The simpler versions,
however, have been popularized within research in genetic algorithms.
For two parent real-valued vectors 21 and 22.each of dimension n, one-point
crossover is performed by selecting a random crossover point k and exchanging
the elements occurring after point k in 21 with those that occur after point k in
21 (see figures 33.3 and 33.4). This operator can be extended to a two-point
crossover in which two crossover points kl and k 2 are selected at random and the
segment in between these points is exchanged between parents (see figure 33 5 ) .
Extensions to greater multiple-point crossover operators follow naturally.
The one-point and two-point operators attempt to recombine vector
segments. Alternatively, individual elements can be recombined without regard
to longer segments in which they reside by using a uniform recombination
operator. Given two parents 21 and 22, one or more offspring are created
by randomly selecting each next element from either parent (see figure 33.6).
Typically, each parent has an equal chance of contributing the next element. This
TEAMetLRN
procedure was offered early on by Reed
al(l967)and was reintroduced within

27 1

Real-valued vectors

Parents
21

= Xl,lXl,2

* *

*Xl,kXl,k+l

2 2

= x 2 . Ix2.2

* *

X2.kX2.k+ I

* *

*Xl.tl

* * *

x2.d

Figure 33.3. For one-point crossover, two parents are chosen and a crossover point k
is selected, typically uniformly across the components. Two offspring are created by
interchanging the segments of the parents that occur from the crossover point to the ends
of the string.

Figure 33.4. A two-dimensional illustration of the potential offspring under a one-point

crossover operator applied to real-valued parents.

the genetic algorithm community by Syswerda (1989). A similar procedure is

also used within evolution strategies and termed discrete recombination (see
below, and also see the uniform scan operator of Eiben er a1 (1994), which is
applied to multiple parents).
In contrast to the crossover type recombination operators that exchange
information between parents, intermediate recombination operators attempt to
TEAM
LRN parents. A canonical version acts
average or blend components across
multiple

Offspring

Figure 33.5. For two-point crossover, two parents are chosen and two crossover points,
k l and k 2 , are selected, typically uniformly across the components. Two offspring are
created by interchanging the \egments defined by the points kl and kz.

Figure 33.6. For uniform crossover, each element of an offspring (here two offspring
art' depicted) is selected from either parent. The example shows that the first element in
both offspring were selected from the first parent. In some applications such duplication
is not allowed. Typically each parent has an equal chance of contributing each elemcnt
to an offspring.
Offspring

Parent 1

Parent 2

Figure 33.7. A geometrical interpretation of intermediate recombination applied to two

parents in a single dimension.

on two parents

and

22. and

creates an offspring x' as the weighted averaze:

where a is a number in [0, I ] and i = 1 , . . . , n (figure 33.7). If a = 0.5, then

the operation is a simple average at each component. Note that this operator
can be extended to act on more than two parents (i.e. a multirecombination) by
the operation
XI'

= aI.rIl TEAM
a?.r?,
LRN . . .

+ a/(s/(r

Real-valued vectors
subject to

273

E./ = I

i =l,...,k

where there are k individuals involved in the multirecombination. This general

procedure is also known as arithmetic crossoi'er (Michalewicz 1996, p 1 12) and
has been described in various other terms in the literature.
In a more generalized manner, recombination operators can take the
following forms (Back et a1 1993, Fogel 1995, pp 146-7):
(33.4)
(33.5)
(33.6)
(33.7)
(33.8)
where S and T denote two arbitrary parents, I I is a uniform random variable
over [O. 11, and i and ,j index the components of a vector and the vector itself,
respectively. The versions are no recombination (33.4). discrete recombination
(or uniform crossover) (33.5), intermediate recombination (33.6), and (33.7)
and (33.8) are the global versions of (33.5) and (33.6). respectively. extended
to include more than two parents (up to as many as the entire population size).
There are several other variations of crossover operators that have been
applied to real-valued vectors.
(i)

The heuristic crosso\'er of Wright ( 1994) takes the form

where ii is a uniform random variable over [0, 11 and 21 and 2 2 are the
two parent vectors subject to the condition that 21 is not worse than 21.
Michalewicz (1996, p 129) noted that this operator uses values of the
objective function to determine a direction to search.
(ii) The simplex crossover of Renders and Bersini ( 1994) selects k > 2 parents
(say the set J ) , determines the best and worst individuals within the selected
group (say 21 and 22, respectively), computes the centroid of the group
without 2 2 (say c ) and computes the reflected vector 2' (the offspring)
obtained from the vector 22 as
2' = c

+ (c

- 2,).

(iii) The geometricul crt,sso\~erof Michalewicz et a1 ( 1996) takes two parents

21 and 22 and produces a single offspring 2' as

2' = [ ( X I lxzl)o.s,. . , ( - T 1 , / S 2 , 1 ) o . s ] .

This operator can be generalized to a multiparent version:

TEAM LRN

273

Recombination

(iv) The jitness-based scan of Eiben et a1 (1904) takes multiple parents arid
generates an offspring where each component is selected from one of the
parents with a probability corresponding to the parents relative fitness. If
a parent has fitness f ( i ) , then the likelihood of selecting each individual
component from that parent is f ( i ) /
f ( j ) ,where j = I , . . . , k and there
are k parents involved in the operator.
( v ) The diagonal multiparent crossover of Eiben et a1 (1994) operates much
like n-point crossover, except that in creating k offspring from k parents,
c >_ 1 crossover points are chosen and the first offspring is constructed to
contain the first segment from parent 1 , the second segment from parent 2,
and so forth. Subsequent offspring are similarly constructed from a rotation
of segments from the parents.

33.3 Permutations
I Whit1ey

Dli rre 1

33.3.I

Iritroduc*tion

An obvious attribute of permutation problems (see Chapter 17) is that simple

crossover operators fail to generate offspring that are permutations. Consider the
following example of simple one-point crossover, when one parent is denoted
with capital letters and the other with lower-case letters:
String 1: A B C D E F G H I
\/

/\
String 2: h d a e i c f b g
Offspriqg 1: A B C e 1 c f b g
Offspring 2: h d a D E F G H I.

Neither of the two offspring represents a legal permutation. Offspring 1

duplicates elements B and C while omitting elements H and D . Offspring 2
has just the opposite problem: it duplicates H and D while omitting B and C.
Davis (1985) and Goldberg and Lingle (1085) defined some of the first
operators for permutation problems. One variant of Daviss order crossover
operator can be described as follows.
Dailiss order crossover. Pick two permutations for recombination. Denote the
first parent as the cut string and the other the filler string. Select two crossoier
points. Copy the sublist of permutation elements between the crossover points
from the cut string directly to the offspring, placing them in the same absolute
position. This will be referred to as the crossover section. Next, starting at the
secorid crossoiler point, find the next element in the filler string that does not
LRN crossover point, place the element
appear in the offspring. Starting at TEAM
the second

Permutations

275

from the filler string into the next available slot in the offspring. Continue
moving the next unused element from the filler string to the offspring. When
the end of the filler string (or the offspring) is reached, wrap around to the
beginning of the string. When done in this way, Daviss order crossover has
the property that Radcliffe (1 994) describes as pure recombination: when two
identical parents are recombined the offspring will also be identical with the
parents. If one does not start copying elements from the filler string starting at
the second crossover point, the recombination may not be pure.
The following is an example of Daviss order crossover, where dots
represent the crossover points. The underscore symbol in the crossover section
corresponds to empty slots in the offspring.
Parent 1: A B . C D E F . G H I
Crossover-section: - - C D E F - - Parent 2: h d . a e i c . f b g
Available elements in order: b g h a i
Offspring: a i C D E F b g h.

Note that the elements in the crossover section preserve relative order,
absolute position, and adjacency from parent 1 . The elements that are copied
from the filler string preserve only the relative order information from the second
parent.

Partially mapped crossover (PMX). Goldberg and Lingle ( 1985) introduced the
partially mapped crossover operator (PMX). PMX shares the following attributes
with Daviss order crossover. One parent string is designated as parent 1, the
other as parent 2. Two crossover sites are selected and all of the elements in
parent 1 between the crossover sites are directly copied to the offspring. This
means that PMX also defines a crossover section in the same manner as order
crossover.
Parent 1: A B . C D E . F G
Crossover-section: - - C D E Parent 2: c f

e b a

d g.

The difference between the two operators is in how PMX copies elements
from parent 2 into the open slots in the offspring after a crossover section has
been defined. Denote the parents as P1 and P2 and the offspring as OS; let
P1, denote the ith element of permutation PI. The following description of
selecting elements from P2 to place in the offspring is based on the article by
Whitley and Yoo (1995).
For those elements between the crossover points in parent 2, if element P2,
has already been copied to the offspring, take no action. In the example given
here, element e in parent 2 requires no processing. We will consider the rest of
TEAM LRN
the elements by considering the positions
in which they appear in the crossover

276

RecombinatitE

section. If the next element at position P2, in parent 2 has not already bezn
copied to the offspring, then find PI; = P2j; if position j has not been filled
in the offspring then assign OSj = P2;. In the example given here, the next
element in the crossover section of parent 2 is b which is in the same position as
D in parent 1 . Element D is located in parent 2 with index 6 and the offspring at
OS6 has not been filled. Copy h to the offspring in the corresponding position.
This yields
Offspring: - - C D E b

A problem occurs when we try to place element A in the offspring. Element

A in parent 2 maps to element E in parent 1 ; E falls in position 3 in parent

2, but position 3 has already been filled in the offspring. The position in the
offspring is filled by C, so we now find element C in parent 2. The position
is unoccupied in the offspring, so element A is placed in the offspring at the
position occupied by C in parent 2. This yields
Offspring: a - C D E b _ .

All of the elements in parent 1 and parent 2 that fall within the crossover
section have now been placed in the offspring. The remaining elements can be
placed by directly copying their positions from parent 2. This yields
Offspring: a f C D E b g.

33.3.2 Order arid positiori crossot-er

Syswerdas ( 1991 ) order crossover-2 and position crossover are different from
either PMX or Daviss order crossover in that there is no contiguous block
which is directly passed to the offspring. Instead several elements are randomly
selected by absolute position.
Order c*ros.sot~er-2.This operator starts by selecting K random positions in
parent 2, where the parents are of length L . The corresponding elements from
parent 2 are then located in parent I and reordered so that they appear in the
same relative order as they appear in parent 2. Elements in parent 1 that do not
correspond to selected elements in parent 2 are passed directly to the offspring.
For example,
Parent 1: A B C D E F C
Parent 2: C F E B A D C
Selected Elements: * * * .

The selected elements in parent 2 are F, B, and A. Thus, the relevant elements
are reordered in parent 1.
Reorder A B - - - F - from parent 1 which yields f b - -

a _.

All other elements are copied directly from parent I .

( f b - - - a

( - LRN
- C D E - C> yields f b C D E a G.
combined with
TEAM

277

Permutations

Position crossover. Syswerda defines a second operator called position

crossover. Using the same example that was used to illustrate Syswerdas order
crossover-2, first pick L - K elements from parent 1 which are to be directly
copied to the offspring. These elements are copied by position. This yields
- - CDE-G.

Next scan parent 2 from left to right and place place each element which
does not yet appear in the offspring in the next available position. This yields
the following progression:
# # C D E # G = > f # C D E # G
= > f b C D E # G
= > f b C D E a G .

Obviously, in this case the two operators generate exactly the same offspring.
Jim Van Zant first pointed out the similarity of these two operators in the
electronic newsgroup The Genetic Algorithm Digest. Whitley and Yoo ( 1995)
show the two operators to be identical using the following argument.
Assume there is one way to produce a target string S by recombining two
parents. Given a pair of strings which can be recombined to produce string S. the
probability of selecting the K key positions using order crossover-2 required to
generate a specific string S is (:)-, while for position crossover the probability
of picking the L - K key elements that will produce exactly the same effect is
( L f K ) - l . Since
= ( L f K ) the probabilities are identical.
Now assume there are R unique ways to recombine two strings to generate a
target string S. The probabilities for each unique recombination event are equal
as shown by the argument in the preceding paragraph. Thus the sum of the
probabilities for the various ways of ways of generating S are equivalent for
order crossover-2 and position crossover. Since the probabilities of generating
any string S are identical, the operators are identical in expectation.
This also means that in practice there is no difference between using order
crossover-2 and position crossover as long as the parameters of the operators
are adjusted to reflect their complementary nature. If position crossover is used
so that X% of the positions are initially copied to the offspring. then order
crossover is identical if (100 - A)% positions are selected as relative order
positions.

(k)

33.3.3 Uniform crossover

Davis uniform crossover (Davis 1991, p 80) is identical to position crossover
and order crossover-2, except that two offspring are generated. A bitstring
is used to denote the selection of positions. Offspring 1 copies the elements
directly from parent 1 in those positions in the bitstring marked by a 1 bit.
Offspring 2 copies the elements from parent 2 in those positions marked by 0
bits. Both offspring then copy the remaining elements from the other parent in
TEAM LRN
relative order.

278

Recombination

33.3.4 Edge recombination

Edge recombination was introduced as a specialized operator for the traveling

salesman problem (TSP). The motivation behind this operator is that it should
preserve the adjacency between permutation elements, since the cost of a tour in
a TSP is directly related to the set of adjacency relationships (i.e. the distances
between cities) that exists between permutation elements. The original edge
recombination operator has gone through three revisions and enhancements over
the years. First, the basic idea behind edge recombination is introduced.
Since adjacency information directly translates into cost, the adjacency
information from two parent strings is extracted and stored in an adjacency
list called the edge table. The edge table really just combines the two
tours into a single graph. Recombination occurs by building an offspring
using the adjacency information stored in the edge table; in other words,
it tries to find a new Hamiltonian circuit in the graph created by merging
the two parent strings. Finding a Hamiltonian circuit in an arbitrary graph
is itself a nondeterministic-polynomial-time (NP) complete problem and edge
recombination must sometimes add edges not contained in the edge table in
order to generate a legal tour. The various enhancements to edge recombination
attempt to reduce the number of foreign edges (edges not found in the edge
table) that must be introduced into an offspring during recombination in order
to maintain a feasible tour.
In the original edge recombination operator, no information was maintained
about common edges that were shared by both parents. As a result the operator
sometimes failed to place an edge in the offspring that appeared in both parents,
resulting in a kind of mutation by omission (Whitley et a1 1991). To solve this
problem, information about shared edges was added to the edge table. Edges
shared by the two parents are marked with a + symbol. The algorithm can be
described as follows.
Consider the following tours as parents to be recombined:
Parent 1: g d m h b j f i a k e c
Parent 2: c e k a g b h i j f m d.
An edge list is constructed for each city in the tour. The edge list for some city
a is composed of all of the cities in the two parents that are adjacent to city a.
If some city is adjacent to a in both parents, this entry is flagged (using a plus
sign). Figure 33.8 shows the edge table which is the collective set of edge lists
for all cities.
The algorithm for edge recombination is as follows.
(i)

Pick a random city as the initial current city. Remove all references to this
city from the edge table.
(ii) Look at the adjacency list of the current city. If there is a common edge
(flagged by +), go to that city next. (Unless the initial city is the current
TEAM LRN
city, there can be only one common
edge; if two common edges existed,

279

Permutations

+j, m, i

1 I

+d, f, h

Figure 33.8. An edge table.

one was used to reach the current city.) Otherwise from the cities on the
current adjacency list pick the next city to be the one whose own adjacency
list is shortest. Ties are broken randomly. Once a city is visited, references
to the city are removed from the adjacency list of other cities and it is no
longer reachable from other cities.
(iii) Repeat step 2 until the tour is complete or a city has been reached that has
no entries in its adjacency list. If not all cities have been visited, randomly
pick a new city to start a new partial tour.
Jsing the edge table in figure 33.8, city a is randomly chosen as the first
city in the tour. City k is chosen as the second city in the tour since the edge
(a,k) occurs in both parent tours. City e is chosen from the edge list of city k
as the next city in the tour since this is the only city remaining in ks edge list.
This procedure is repeated until the partial tour contains the sequence [a k e c].
At this point there is no deterministic choice for the fifth city in the
tour. City c has edges to cities d and g, which both have two unused edges
remaining. Therefore city d is randomly chosen to continue the tour. The
normal deterministic construction of the tour then continues until position 7. At
position 7 another random choice is made between cities f and h. City h is
selected and the normal deterministic construction continues until we arrive at
the following partial tour: [a k e c d m h b g].
In this situation, a failure occurs since there are no edges remaining in the
edge list for city g. When a potential failure occurs during edge-3 recombination,
we attempt to continue construction at a previously unexplored terminal point
in the tour.
A terminal is a city which occurs at either end of a partial tour, where all
edges in the partial tour are inherited from the parents. The terminal is said to
be live if that city still has entries in its edge list; otherwise it is said to be a dead
terminal. Because city a was randomly chosen to start the tour in the previous
example, it serves as a new terminal in the event of a failure. Conceptually this
is the same as inverting the partial tour to build from the other end.
When a failure occurs, there is at most one live terminal in reserve at the
opposite end of the current partial tour. In fact, it is not guaranteed to be live,
since the construction of the partialTEAM
tour LRN
could isolate this terminal city. Once

280

Recombination

both terminals of the current partial tour are found to be dead, a new partial
tour must be initiated. Note that no local information is employed.
We now continue construction of the partial tour [a k e c d m h b g]. The
tour segment is reversed (i.e. [g b h m d c e k a]). Then city i is added to the
tour after city a. The tour is then constructed in the normal fashion. In this
case. there are no further failures. The final offspring tour is [g b h m d c e k
a i f j]. The offspring produced has a single foreign edge (&-g].)
When a failure occurs at both ends of the subtour, edge-3 recombination
starts a new partial tour. However, there is one other possibility, which has
been described as part of the edge-4 operator (Dzubera and Whitley 1994) but
which has not been widely tested.
Assume that the first partial tour has been constructed such that both ends
of the construction lack a liile terminal by which to continue. Since only one
partial tour has been constructed and since initially every city has at least two
edges in the edge table, there must be edges internal to the current partial tour
that represent possible edges to the terminal cities of the partial tour. The edge-4
operator attempts to exploit this fact by inverting part of the partial tour so that a
terminal city is reconnected to an edge which is both internal to the partial tour
and which appeared in the original edge list of the terminal city. This will cause
a previously visited city in the partial tour to move to a terminal position. If this
newly created terminal has cities remaining in its (old) edge list, the offspring
construction can continue. If it does not, one can look for other internal edges
that will allow an inversion. Details on the edge-4 recornbination operator are
given by Dzubera and Whitley (1994).
If one is using just a recombination operator and a mutation operator,
then edge recombination works very well as an operator for the TSP, at least
compared to other recombination operators, but if one is hybridizing such that
tours are being produced by recombination, then improved using 2-opt, then
both the empirical and the theoretical evidence suggests that Muhlenbeins MPX
operator may be more effective (Dzubera and Whitley 1994).

33.3.5 Maxima 1 p rese nuti LY crossover

Muhlenbein (1991, p 331) offers the following pseudocode for the maximal
preservative crossover (MPX) operator. (Numbering of the pseudocode has
been added for clarity.)

PROC crossover(receiver, donor, offspring)

Choose position 0 <= i < nodes and length blow <= k <= b,, randomly.
Extract the string of edges from position i to position j = ( i k ) MOD
nodes from the mate (donor). This is the crossover string.
Copy the crossover string to the offspring.
Add successively further edges until the offspring represents a valid tour.
This is done in the following TEAM
way. LRN

Permutations

28 1

(a) IF an edge from the receiver parent starting at the last city in the
offspring is possible (does not violate a valid tour)
(b) THEN add this edge from the receiver
(c) ELSE IF an edge from the donor starting at the last city in the
offspring is possible
(d) THEN add this edge from the donor
(e) ELSE add that city from the receiver which comes next in the string;
this adds a new edge, which we will mark as an implicit mutation.
The following example illustrates the MPX operator.
Receiver:
Donor :
Initial segment:

G D M H B J F I A K E C
c e k a g b h i j f m d
- - k a g - - - - - - _.

Note that G is connected to D in the receiver, and that element D through

element I can be taken from the receiver without duplicating any of the elements
already in the offspring. This produces the partial tour
- - k a g D M H B J F I.

At this point, there is no edge in either parent that is connected to I and

has that not been used. Here MPX skips cities in the receiver until it finds one
which has not been used. In this case, it reaches E. This causes E and C to be
added to the tour to yield
E C k a g D M H B J F I.

Note that MPX does not transmit adjacency information from parents to
offspring as effectively as the various edge recombination operators, since it
uses less lookahead to avoid a break in the tour construction. At the same time,
when it must introduce a new edge that does not appear in either parent, it skips
to a nearby city in the tour rather than picking a random edge. Assuming that
the tour is partially optimized (for example, if the tour has been improved via
2-opt) then a city nearby in the tour should also be a city nearby in Euclidean
space. This, coupled with the fact that an initial segment is copied from one of
the parents, appears to give MPX an advantage when when combined with an
operator such as 2-opt. Gorges-Schleuter ( 1989) implemented a variant of MPX
that has some notable features that are somewhat like Davis's order crossover
operator. A full description of Gorges-Schleuter's MPX is given by Dzubera
and Whitley (1994).

33.3.6 Cycle crossover

The operators discussed so far are aimed at preserving adjacency information
(such as edge recombination) or relative order information (such as Davis's
uniform order-based crossover). Operators may also emphasize position. Cycle
crossover partitions two parents into a set of cycles: a cycle is a subset of
TEAM LRN subset of positions on both the
elements which is located on a corresponding

282

Recombination

two parent strings. Consider the following example from Oliver et a1 (1987)
where the permutation elements correspond to the alphabetic characters with
numbers to indicate position:
Parentl: h k c e f d b l a i g j
Parent2: a b c d e f g h i j k 1
P o s i t i o n s : 1 2 3 4 5 6 7 8 9 10 11 12.

To find a cycle, pick a position from either parent. Starting with position 1,
elements (h, a) belong to cycle 1 . The elements (h, a) also appear in positions
8 and 9. Thus the cycle is expanded to include positions ( 1 , 8, 9) and the new
elements i and 1 are added to the corresponding subset. Elements i and I appear
in positions 10 and 12, which also causes j to be added to the subset of elements
in the cycle. Note that adding j adds no new elements, so the cycle terminates.
Cycle I includes elements (h, a, i, I, j ) in positions ( I , 8, 9, 10, 12).
Note that element (c) in position 3 forms a unary cycle of one element.
Aside from the unary cycle at element c (denoted U), Oliver et a1 note that
there are three cycles between this set of parents:
Parentl:
h k c e f d b l a i g j
Parent2:
a b c d e f g h i j k l
Cycle Label: 1 2 U 3 3 3 2 1 1 1 2 1.

Recombination can occur by picking some cycles from one parent and
the remaining cycles from the alternate parent. Note that all elements in the
offspring occupy the same positions as in one of the two parents. However, few
applications seem to be position sensitive and cycle crossover is less effective at
preserving adjacency information (as in the TSP) or relative order information
(as in resource scheduling) compared to other operators.

33.3.7 Merge crossover

Blanton and Wainwright ( 1993) construct permutation recombination operators
for multiple vehicle routing with time and capacity constraints. The follo\lring
example of the merge crossover operator MXI uses a global precedence vector.
Given any two elements in the permutation, the global precedence vector
indicates which element has higher priority for processing. Elements which
appear earlier in the vector have higher precedence. In vehicle routing each
customer has a time window in which they must be served, which can be
translated into a global precedence vector: for example, customer X should be
served before customer Y because the time window for X closes before the time
window for Y. The following example illustrates the operator:
Parent 1 :
C F C B A H D I E J
Parent 2:
E B C J D I C A F H
Precedence: A B C D E F C H I J .

A single offspring is constructed. In this case, starting at position 1, we

TEAM LRN
compare C and E from the two parents;
since C has higher precedence, it is

Permutations

283

placed in the offspring. Because C has already been allocated a position in the
offspring, the C which appears later in parent 2 is exchanged with the E in the
initial position of parent 2. This yields
Parent 1:
C F G B A H D I E J
Parent 2 :
C B G J D I <E> A F H
Precedence: A B C D E F G H I J

where the moved E element is bracketed: <E>. Going to position 2, B has

higher precedence than F, so B is kept in position 2. Also, elements F and B
are exchanged in parent 1, which yields
Parent 1:
C B G <F> A H
Parent 2 :
C B G J D I
Precedence: A B C D E F

I E J

<E> A F H
H I J.

Note that one need not actually build a separate offspring, since both parents
are in effect transformed into copies of the same offspring. The resulting
offspring in the above example is
Offspring: C B G F A H D E I J .

The MX-2 operator is similar, except that when an element is added to the
offspring it is deleted from both parents instead of being swapped. Thus, the
process works as follows:
Parent 1:
Parent 2 :
Precedence:

C F G B A H D I E J
E B G J D I C A F H
A B C D E F G H I J.

C is added to the offspring and deleted from both parents

Parent 1: - F G B A H
Parent 2 : E B G J D I
Offspring: C .

D I E J

A F H

Instead of now moving to the second element of each permutation, the first
remaining elements in the parents are compared: in this case, E and F are the
first elements and E is chosen and deleted. The parents are now represented as
follows:
Parent 1: - F G B A H
Parent 2 : - B G J D I
Offspring: C E.

D I

A F H

Element B is chosen to fill position 3 in the offspring, and the construction

continues to produce the offspring
Offspring: C E B F G A H D I J .

Note that, over time, this class of operator will produce offspring that are
closer to the precedence vectordven
if LRN
no selection is applied.
TEAM

284

Recombination

33.3.8 Some other operators

Other interesting operators have been introduced over the years for permutation
problems. Fox and McMahon (1991) introduced an intersection operator that
extracts features common to both parents. Eshelman (1991) used a similar
strategy to build a recombination operator that extracts all common subtours
for the TSP, and assigns all other elements using local search (2-opt) over an
otherwise random assignment. Fox and McMahon also constructed a union
operator. In this case, each permutation is converted into a binary matrix
representation and the offspring is the logical-or of the matrices representing
the parents.
Radcliffe and Surry (1995) have also introduced new operators for the TSP,
largely by looking at different representations and then defining appropriate
operators with respect to the representations. These representations include the
permutation representation, the undirected edge representation, the directed edge
representation, and the corner representation.

33.4 Finite-state machines

David B Fogel

Recombination can be applied to logical structures such as finite-state machines.

There have been a variety of proposals to accomplish this in the literature. Recall
that a finite-state machine (Chapter 18) is a 5-tuple:

M = ( Q , T , P , S , 01
where Q is a finite set, the set of states, T is a finite set, the set of input
symbols, P is a finite set, the set of output symbols, s : Q x T -+ Q, the
next state function, and o : Q x T --+ P , the next output function. Perhaps the
earliest proposal to recombine finite-state machines in simulated evolution can
be found in the work of Fogel (1964) and Fogel et a/ (1966, pp 21-3). The
following extended quotation (Fogel et a1 1966, p 21) may be insightful:
The recombination of individuals of opposite sex appears to benefit
natural evolution. By analogy, why not retain worthwhile traits that
have survived separate evolution by combining the best surviving
machines through some genetic rule; mutating the product to yield
offspring? Note that there is no need to restrict this mating to the
best two surviving individuals. In fact, the most obvious genetic
rule, majority logic, only becomes meaningful with the combination
of more than two machines.
Fogel et a1 (1966) suggested drawing a single state diagram which expresses
the majority logic of an array of finite-state machines. Each state of the majority
logic machine is the composite of TEAM
a state
from each of the original machines.
LRN

285

Finite-state machines

0
0/1,1/0

0/0,1/0

110

111

wi,i/n
WO

1/1

110

n/i

w1, 110

Figure 33.9. Three parent machines (top) are joined by a majority logic operator to
form another machine (bottom). The initial state of each machine is indicated by a short
arrow pointing to that state. Each state in the majority logic machine is a combination
of the states of the three parent machines with the output symbol being chosen as the
majority decision of two of the three parent machines. For example, the state BDF in
the majority logic machine is determined by examining the states B, D, and F in each
of the individual machines. For an input symbol of 0, all three states respond with a
0, therefore this symbol is chosen for the output to an input of 0 in state BDF. For an
input symbol of I , two of the three states respond with a 0, thus, this being the majority
decision, this symbol is chosen for the output to an input of 1 in state BDF. Note that
several states of the majority logic machine are isolated from the start state and could
never be expressed.

Thus the majority machine may have a number of states as great as the product
of the number of states in the original machines. Each transition of the majority
machine is described by that input symbol which caused the respective transition
in the original machines, and by that output symbol which results from the
majority element logic being applied to the output symbols from each of the
original machines (figure 33.9). If there are only two parents to recombine in
this manner, the majority logic machine reduces to the better of the two parents.
Zhou and Grefenstette ( 1 986) used recombination on finite-state automata
TEAM
LRN
applied to binary sequence induction
problems.
The finite-state automata were

286

Recombination

defined in terms of a 5-tuple:

where Q is a finite set of states, S is a finite input alphabet, qo E Q is the initial

state, 6 is the transition function mapping the Cartesian product of Q and S into
Q , and F is the set of final states, a subset of Q. The chosen representation
was
(XI, y1, FI), (x2,y2, F2), - * . (x8-y8- F8)
7

where each ( X I ,Y , , F , ) represented the state i, X, and Y, corresponded to the

destination state of the zero and one arrows from state i , respectively, and F,
was a three-bit code where the first two bits were used to indicate whether or
not there existed an arrow from state i, and the third bit showed whether the
state i was a final state. The maximum number of states was set to eight. The
details of how recombination was implemented on this representation are not
obvious from the article by Zhou and Grefenstette (1986) but it is reasonable to
infer that a simple one-point crossover operator was applied.
Fogel and Fogel (1986) used recombination in a similar manner on finitestate machines by exchanging single states between machines (i.e. output symbol
and next-state transitions for each input symbol for a particular state). Birgmeier
(1996) also used a similar method implemented as uniform crossover between
two machines by state. One offspring was produced from two parents by
choosing each row in the transition table from either parent (with specific
procedures for handling parents with differing numbers of states). Birgmeier
(1996) also offered a new joining operator where the offsprings size is the sum
of the two parents number of states. Both the output and transition matrices
from the two parents are juxtaposed in the offspring and some of the entries
are randomly reset to point to a state in the other half, thus joining the new
machines into one.

33.5 Crossover: parse trees

Peter J Angeline

From an evolutionary computation view, crossover, in its most basic form, is an

operator that exchanges representational material between two parent structures
to produce offspring. Occasionally, it is important to introduce additional
constraints on the crossover operation to ensure that the created children observe
certain necessary constraints of the representation or problem environment.
Parse tree representations (Chapter I9), as typically used in genetic
programming (Koza 1992), require that the crossover operation produce
offspring that are also valid parse trees. In order to remain a valid parse tree,
the structure must have only terminals at the leaf positions of the tree and only
TEAM
LRN
function nodes at each of its internal
positions.
In addition, each function node

Crossover: parse trees

287

of the parse tree must have the correct number of subtrees below it, one for
each argument that the function requires.
Often in genetic programming, a simplification is made so that all functions
and terminals in the primitive language return the same data type. This is
referred to as the closure principle (Koza 1992). The effect is to reduce the
number of syntactic constraints on the programs so that the complexity of the
crossover operation is minimized.
The recursive structure of parse tree representations makes the definition of
crossover for tree representations that adhere to the above caveats surprisingly
simple. Cramer (1985) initially defined the now standard subtree crossover
for parse trees shown in figure 33.10. First, a random subtree is selected
and removed from one of the parents. Note that this leaves a hole in the
parent such that there exists a function that has a null value for one of its
parameters. Next, a random subtree is extracted from the second parent and
inserted at the point in the first parent where its subtree was removed. Now
the hole in the first parent is again filled. The process is completed by
inserting the subtree extracted from the first parent into the position in the
second parent where its subtree was removed. As long as only complete
subtrees are swapped between parents and the closure principle holds, this simple
crossover operation is guaranteed to produce syntactically valid offspring every
execution.
Typically, when evolving parse tree representations, a user-defined limit on
the maximum size of any tree in the population is provided. Subtree crossover
will often increase the size of a given parent such that, over a number of
generations, individuals in an unrestricted population may grow to swamp the
available computational resources. Given a user-defined restriction on subtree
size, expressed as a limit according to either the depth of a tree or the number of
nodes it contains, crossover must enforce this limit. When a crossover operation
is executed that creates one or more offspring that violate the size limitation, the
crossover operation is invalidated and the offspring are restored to their original
forms. What happens next is a matter of choice. Some systems will reject both
children and revert back to selecting two new parents. Other systems attempt
crossover repeatedly either until both offspring fall within the size limit or until
a specified number of attempts is reached. Given the nature of the crossover
operation, the likelihood of performing a valid crossover operation in a small
number of attempts, say five, is fairly good.
Koza ( 1992) popularized the use of subtree crossover for manipulating
parse tree representations in genetic programming. The subtree swapping
crossover of Koza (1992) shares much with the subtree crossover defined
by Cramer (1985) with a few minor differences. The foremost difference is
a bias introduced by Koza (1992) to limit the probability that a leaf node
is selected as the subtree from a parent during crossover. The reasoning
for this bias according to Koza (1992) is that, in most trees, the number
TEAM LRNto the number of nonleaf nodes.
of leaf nodes will be roughly equivalent

288

Recombination

Figure 33.10. An illustration of the crossover operator for parse trees. A subtree is
selected at random from each parent, extracted, and exchanged to create two offspring
trees.

Consequently, the number of subtrees of depth one will be approximately

the number of subtrees of depth greater than one. Merely swapping a leaf
between parents to produce children half of the time will not tend to greatly
advance the evolutionary process, so, during crossover in a genetic program,
the probability that a leaf node is selected is controlled by a bias term called
the leaf frequency. Typically, the leaf frequency is set at about 10%, meaning
that 10% of the time when a subtree is selected a leaf node will be chosen
in a parent while the rest of the time only nonleaf nodes will be chosen.
Koza (1992) offers no empirical validation of this bias term or its assumed
value.
Often it is important to violate the closure principle and allow multiple
types in the parse tree representation in order to more effectively solve a given
problem. This implies that there are some functions such that they cannot be
used as arguments to certain other functions. Crossover in such typed parse
trees, as described by Montana (1995), proceeds much as in subtree crossover
with one caveat to compensate for the additional constraint of multiple return
types. First, a random node is selected in the first parents parse tree. The return
type of the root of the subtree is determined and the selection of crossover points
in the second parent is restricted to only those subtrees that have identical return
TEAM
LRN
types. This ensures that the syntactic
constraints
in both parents are upheld.

Other representations

289

When evolving genetic programs using automatically defined functions

(ADFs), Koza (1994) uses a slightly modified version of subtree crossover.
When crossing two genetic programs with ADFs, if the crossover position in
the first tree is selected to be within a particular subroutine then only crossover
points in the corresponding subroutine in the second parent are considered.
This is similar to the typed crossover of Montana (1995) except that, rather
than restricting the crossover positions in the second parent based on the type
of subtree extracted from the first, it restricts the selection using the functional
origin of the initially selected subtree.

33.6

Other representations

Peter J Angeline and David B Fogel

The use of recombination on the alternative mixed-integer representations, and
those using introns, does not generally vary from the standard usage. All of
the available options of discrete and intermediate recombination apply to the
mixed-integer format offered by Back and Schiitz (1995). Introns are used
with the belief that they will enhance the chances for crossover to recombine
building blocks. Moreover, Wu and Lindsay ( I 995) suggest that the addition of
introns can have an equivalent effect of varying crossover probabilities across
a chromosome, and state the advantages of the noncoding segment method
including the fact that the genetic algorithm does not need to be modified to
handle variable crossover probabilities and that crossover location calculations
are much simpler.

33.7 Multiparent recombination

A E Eiben
33.7.I

Introduction

To make the following survey unambiguous we have to start with setting some
conventions on terminology. The term population will be used for a multiset
of individuals that undergoes selection and reproduction. This terminology
is maintained in genetic algorithms, evolutionary programming, and genetic
programming, but in evolution strategies all p individuals in a ( p , A ) or ( p + A )
strategy are called parents. We, however, use the term parents only for those
individuals that are selected to undergo recombination. In other words, parents
are those individuals that are actually used as inputs for a recombination
operator; the arity of a recombination operator is the number of parents it uses.
The next notion is that of a donor, being a parent that actually contributes to (at
least one of) the alleles of the child(ren) created by the recombination operator.
TEAM
This contribution can be for instance
theLRN
delivery of an allele, as in uniform

290

Recombination

crossover in canonical genetic algorithms (GAS), or the participation in an

averaging operation, as in intermediate recombination in evolutionary strategies
(ESs). As an illustration consider a steady-state GA where 100 individuals
form the population and two of them are chosen as parents to undergo uniform
crossover to create one single offspring. If, by pure chance, the offspring only
inherits alleles from parent 1, then parent I is a donor, and parent 2 is not.

33.7.2 Miscellaneous operators

We begin this survey with papers where the multiparent aspect has an incidetital
character. By an incidental character we mean that the operator is defined and
used in a specific application and has, for instance, a certain fixed arity, or,
even if the definition is general and would allow comparison between different
number of parents, this aspect is not given attention.
The recombination mechanism of Kaufman ( 1967) is applied for evolving
models for a given process, where a model is an array of a number of blocks,
and models may differ in the numbers of blocks they contain. Recombination of
four models to create one new model is defined as follows. The size of the child
(the number of blocks) is equal to the size of each of its parents with probability
0.25. The ith block of the child is chosen with equal probability from those
parents that have at least i blocks. Let us note that there is an exception to this
latter rule of choosing one of the parents blocks, but that exception has a pery
problem-specific reason; therefore we rather present the general idea here.
In an extensive study on bit vector function optimization, stochastic iterated
genetic hill climbing (SIGH) is studied and compared with other techniques, such
as GAS, iterated hill climbing, and simulated annealing ?(Ackley 1987b). SIGH
applies a sophisticated probabilistic voting mechanism with time-dependent
probability distributions (cooling), where rn voters (m being the size of the
population) determine the values of a new bitstring. SIGH is shown to be better
than a GA with one-point and uniform crossover on four out of the six test
functions and the overall conclusion is that it is competitive in speed with a
variety of existing algorithms.
In the introductory paper on the parallel GA ASPARAGOS (Miihlenhein
1989), p-sexual lwting recombination is applied for the quadratic assignment
problem. Let us remark that the name p-sexual is somewhat misleading, as
there are no different genders and no restriction on having one representative of
each gender for recombination. The voting recombination produces one child
of p parents based on a threshold value U . It determines the ith allele of the
child by comparing the ith alleles of the selected parent individuals. If the same
allele is found more often than the threshold U , this allele is included in the
child; other bits are filled in randomly. In the experiments the values p = 7
and U = 5 are used and it worked surprisingly well, but comparison between
TEAM
LRN
this scheme and ordinary two-parent
recombination
was not performed.

Multiparent recombination

29 1

An interesting attempt to combine GAS with the simplex method resulted in

the ternary simplex crossob'er (Bersini and Seront 1992). If x ' , .I-', x 7 are the
three parents sorted in decreasing order of fitness, then the simplex crossover
generates one child x by the following two rules.
(i)

If x,] = x,? then x, = x,];

(ii) if x: # x,? then xt = x;' with probability p and x, = 1 -x;

I -p.

with probability

Using the value p = 0.8, the simplex GA performed better than the standard
GA on the DeJong functions. The authors remark that applying a modified
crossover on more than three parents 'is worth to try'.
The problem of placing actuators on space structures is addressed by Furuya
and Haftka ( 1993). The authors compare different crossovers: among others
they use uniform crossover with two as well as with three parents in a GA
using integer representation. Based on the experimental results they conclude
that the use of three parents did not improve the performance. This might be
related to another conclusion, indicating that for this problem mutation is an
efficient operator and crossover might not be important. Uniform crossover
with an arbitrary number of parents is also used by Aizawa ( 1 994) as part of a
special schema sampling procedure in a GA, but the multiparent feature is only
a side-effect and is not investigated.
A so-called triadic crossover is introduced and tested by Ph1 (1994) for a
multimodal spin-lattice problem. The triadic crossover is defined in terms of
two parents and one extra individual, chosen randomly from the population. The
operator creates one child; it takes the bits in positions where the first parent
and the third individual have identical bits from this parent and the rest of the
bits from the other parent. Clearly, the result is identical to the outcome of
a voting crossover on these three individuals as parents. Although the paper
is primarily concerned with different selection schemes, a comparison between
triadic, one-point, and uniform crossover is made, where triadic crossover turned
out to deliver the best results.

33.7.3 Operators with undejined arity

In the introduction to this section we defined the arity of a recombination
operator as the number of parents it uses. In some cases this number depends
on the outcomes of random drawings; the operator is called without knowing
in advance how many parents will be applied. In this section we treat three
mechanisms of this kind.
Global recombination in ESs allows the use of more than two recombinants
(Back 1996, Schwefel 1995). In ES there are two basic types of recombination,
intermediate and discrete recombination, both having a standard two-parent
TEAMaLRN
variant and a global variant. Given
population of 1-1 individuals global

292

Recombination

recombination creates one offspring x by the following mechanism.

global discrete recombination

=
x:

+ X I (x?

- x:

g loba 1 intermediate recom bina tion

where the two parents xsi, sT (S,, T, E ( 1 , . . . , p } ) are redrawn for each i
anew and so is the contraction factor x l . The above definition applies to the
object variables as well as the strategy parameter part; that is, for the mutation
stepsizes (a)and the rotation angles (a).Observe that the multiparent character
of global recombination is the consequence of redrawing the parents xsi , x T for
each coordinate i . Therefore, probably more than two individuals contribute
to the offspring x , but their number is not defined in advance. It is clear
that investigations on the effects of different numbers of parents on algorithm
performance could not be performed in the traditional ES framework. The option
of using multiple parents can be turned on or off, that is, global recombination
can be used or not, but the arity of the recombination operator is not tunable.
Experimental studies on global versus two-parent recombination are possible,
but so far there are almost no experimental results available on this subject.
Schwefel (1995) notes that appreciable acceleration is obtained by changing,to
a bisexual from an asexual scheme (i.e., adding recombination using two parents
to the mutation-only algorithm), but only a slight further increase is obtained
when changing from bisexual to multisexual recombination (i.e., using global
recombination instead of the two-parent variant). Recall the remark on the name
p-sexual voting. The terms bisexual and multisexual are not appropriate either
for the same reason: individuals have no gender or sex, and recombination can
be applied to any combination of individuals.
Gene-pool recornbination (GPR) was introduced by Muhlenbein and Voigt
(1996) as a multiparent recombination mechanism for discrete domains. It is
defined as a generalization of two-parent recombination (TPR). Applying GPR
is preceded by selecting a gene pool consisting of would-be parents. Applying
GPR the two parent alleles of an offspring are randomly chosen for each locus
with replacement from the gene pool and the offspring allele is computed using
any of the standard recombination schemes for TPR. Theoretical analysis on
infinite populations shows that GPR is mathematically more tractable than TPR.
I f n stands for the number of variables (loci), then the evolution with proportional
selection and GPR is fully described by n equations, while TPR needs 2
equations for the genotypic frequencies. In practice GPR converges about 25%
fdster than TPR for Onemax. The authors conclude that GPR separates the
identification and the search of promising areas of the search space better;
besides it searches more reasonably than does TPR. Voigt and Muhlenhein
(1995) extend GPR to continuous domains by combining it with uniform fuzzy
two-parent recombination (UFTPR) from Voigt et a1 ( 1995). The resulting
uniform fuzzy gene-pool recombination (UFGPR) outperforms UFTPR on the
TEAM
LRN
heritability,
giving it a higher convergence
spherical function in terms of realized

Multiparent recombination

293

speed. The convergence of UFGPR is shown to be about 25% faster than that
of UFTPR.
A very particular mechanism is the linkage e\d\ting genetic operator
(LEGO) as defined by Smith and Fogarty (1996). The mechanism is designed
to detect and propagate blocks of corresponding genes of potentially varying
length during the evolution. Punctuation marks in the chromosomes denote
the beginning and the end of each block and more chromosomes with the
appropriately positioned punctuation marks are considered as donors of a whole
block during the creation of a child. Although the multiparent feature is only a
side-effect, LEGO is a mechanism where more than two parents can contribute
to an offspring.

33.7.4 Operators with tunable a r i o

Unary reproduction operators, such as mutation, are often called asexual, based
on the biological analogies. Sexual reproduction traditionally amounts to
two-parent recombination in evolutionary computation (EC), but the operators
discussed in the previous section show that the sexual character of recombination
can be intensified, in the sense that more than two parents can be recombined.
Nevertheless, this intensification is not graded: the multiparent option can be
turned on or off, but the extent of sexuality (the number of parents) cannot be
tuned. In this section we consider recombination operators that make sexuality
a graded, rather than a Boolean, feature by having an arity that can vary. In
other words, the operators we survey here are called with a certain number of
parents as input, and this number can be modified by the user.
An early paper mentioning multiparent recombination is that of Bremermann
et a1 (1966) on solving linear equations. It presents the definition of three
different multiparent recombination mechanisms, called nz-tuple mating. Given
m binary parent vectors X I , . . . , x m , the majority mating mechanism creates one
offspring vector x by choosing
x; =

if half or more of the parents have .rl = 0

otherwise.

Another mating mechanism for rn binary parent vectors is called mating by

crossing over. Describing it in contemporary terms, the mechanism works by
selecting m - 1 crossover points (identical in each parent) and then composing
one child by selecting exactly one segment from each parent. The third
operator is called mating by averaging and it is defined for vectors of continuous
variables. Quite naturally, the child x of parents X I , . , . , s"' is defined by

TEAM LRN

294

Recombination
-

where E;:, A, = 1. Unfortunately, only very little is reported on the

performance of these operators. It is remarked that using majority mating and
mating by crossing over the results were somewhat inconclusive; no definite
benefit was obtained. Using mating by averaging, however, led to spectaciilar
effects within a linear programming scheme, but these effects are not specified.
Sc*mziiingcrossover has been introduced as a generalization and extension of
uniform crossover in GAS creating one child from Y parents (Eiben 1991, Eiben
et a1 1994). The name is based on the following general procedure scanning
parents and thus building the child from left to right. Let x , . . . . x r be the
selected parents of length L and let s denote the child.
procedure scanning:
begin

INITIALIZE position markers as il := . . . = i , := 1;

5% mark 1st position in each parent
fori=I toi=L
CHOOSE j
.Tr

:= .T

{ I , . . . ,r } ;

% i t h allde cf.r is the i, th allele o f x J

UPDATE position markers il , . . . , i,;

end

The above procedure provides a general framework for a certain style of

multiparent recombination, where the precise execution, hence the exact
definition of the operator, depends on the mechanisms of CHOOSE and
UPDATE. In the simplest case the UPDATE operation can shift the markers
I , j E { I , . . . , r } , can be used. This
one position to the right; that is, i, := i,
is appropriate for bitstrings, integer, and floating-point representation. Scanning
can also be easily adapted to order-based representation, where each individual
is a permutation, if the UPDATE operation shifts to the first allele which is not
in the child yet:

i, := min{k I k 2 i , , x i @

{XI,

. . . , x,,}}

j E { I , . .. ,r } .

Observe that, because of the term k 2 i, above, a marker can remain at the
same position after an UPDATE, and will only be shifted if the allele standing
at that position is included in the child. This guarantees that each offspring will
be a permutation.
Depending on the mechanism of choosing a parent (and thereby an allele)
there are three different versions of scanning. The choice can be deterministic,
choosing a parent containing the allele with the highest number of occurrences
and breaking ties randomly (oc.c.itrrerzce-6asedsc*annirig). Alternatively it can
be random, either unbiased, following a uniform distribution thus giving each
TEAM
LRN
parent an equal chance to deliver its
allele
(imfiwrn scanning), or biased bj the

Multiparent recombination

295

fitness of the parents, where the chance of being chosen is fitness proportional
(fitness-based scanning). Uniform scanning for r = 2 is the same as uniform
crossover, although creating only one child, and it also coincides with discrete
recombination in evolution strategies. The occurrence-based version is very
much like the voting or majority mating mechanism discussed before. but
without the threshold v or with v = [ m / 2 J respectively. The effect of the
number of parents in scanning crossover has been studied in several papers. An
overview of these studies is given in the next subsection.
Diagorzal crossover has been introduced as a generalization of one-point
crossover in GAS (Eiben et a1 1994). In its original form diagonal crossover
creates r children from r parents by selecting r - 1 crossover points in the parents
and composing the children by taking the resulting r chromosome segments from
the parents along the diagonals. Later on, a one-child version was introduced
(van Kemenade et a1 1995). Figure 33.1 1 illustrates both variants. It is easy to
see that for r = 2 diagonal crossover coincides with one-point crossover, and in
some sense it also generalizes traditional two-parent n-point crossover. To be
precise, if we define ( r ,s) segmentation crossover as working on r parents with
J crossover points, diagonal crossover becomes its ( r , r - 1) version, its (2. n )
variant coincides with n-point crossover, and one-point crossover is an instance
of both schemes for ( r , s ) = (2, 1 ) as parameters. The effect of operator arity
for diagonal crossovers will be also discussed in the next subsection.
A recombination mechanism with tunable arity in ES is proposed by
Schwefel and Rudolph (1995). The ( p , K , A, p ) ES provides the possibility
of freely adjusting the number of parents (called ancestors by the authors).
The parameter p stands for the number of parents and global recombination is
redefined for any given set { X I , . . . , x P ) of parents as
=

;j/p)cP

k=l

x,k

pary discrete recombination

p / p intermediate recornbination

where j E { 1 , . . . , p ) is uniform randomly chosen for each i independently.

Let us note that, in the original paper, the above operators are called uniform
crossover and global intermediate recombination respectively. We introduce
the names pary discrete recombination and p / p intermediate recombination
respectively here for the sake of a consequent terminology. (A reason for
using the tenn p / p intermediate recombination instead of pary intermediate
recombination is given below, in the paragraph discussing a paper by Eiben and
Back (1997).) Observe that pary discrete recombination coincides with uniform
scanning crossover, while p / p intermediate recombination is a special case of
mating by averaging. At this time there are no experimental results available
on the effect of p within this framework.
Related work in ESs also uses p as the number of parents as an independent
parameter for recombination (Beyer 1995). For purposes of a theoretical analysis
it is assumed that all parents are different, uniform randomly chosen from the
LRN
population of p individuals. Beyer TEAM
defines
the p / p intermediate recombination

296

Recombination

Figure 33.11. Diagonal crossover (top) and its one-child version (bottom) for three
parents.

and pary discrete recombinations similarly to Schwefel and Rudolph (1995)

and denotes them as intermediate ( p / p ~recombination
)
and dominant ( p / p ~ )
recombination, respectively. The ( p / p , A) evolution strategy is studied on the
spherical function for the special case of p = p . By this latter assumption it is
not possible to draw conclusions on the effect of p, but the analysis shows that
the optimal progress rate @* of the ( p / p ,A) ES is a factor of p higher than
that of the ( p , A ) ES, for both recombination mechanisms. Beyer hypothesizes
that recombination has a statistical error correction effect, called genetic repair,
and this effect can be improved by using more than two parents for creating
TEAM LRN
offspring (Beyer 1996).

MultiDarent recombination

297

Another generalization of global intermediate recombination in evolution

strategies is proposed by Eiben and Back (1997). The new operator is
applied after selecting p parent individuals from the population of p , and the
resampling of two donors xsi and xT for each i takes only these p individuals
into consideration. Note that this operator is also pary, just like the p / p
intermediate recombination as defined above, but utilizes only two donors for
each allele of the offspring. To express this difference, this operator is called
p / 2 intermediate recombination and the operator of Beyer ( 1995) and Schwefel
and Rudolph ( 1 995) is called p / p intermediate recombination. Observe, that
the p / 2 intermediate recombination is a true generalization of the original
intermediate recombination: the case of p = 2 coincides with local intermediate
recombination, while for p = p , it is equal to global intermediate recombination.
While intermediate recombination is based on taking the arithmetical average
of the real-valued alleles of the parents, the geometrical average is computed by
the geometrical crossover. Michalewicz et a1 ( I 996) present the definition for
any ( k 2 2) number of parents, where the offspring of the parents { X I . . . . , x k }
is defined as
,(

<x:>"',

X y '

( x y

...

+ +

where n is the chromosome length and a1 . . . ak = I . The experimental

part of the paper is, however, based on the two-parent version, hence there are
no results on the effect of using more than two parents with this operator.
The same holds for the so-called sphere crossover (Schoenauer and
Michalewicz 1997); the authors give the general definition for k parents, but
the experiments are restricted to the two-parent version. In the general case the
offspring of parents { X I ,. . . , x k } is defined as

33.7.5 The efSects of higher operator arities

In recent years quite a few papers have studied the effect of operator arity on EA
performance, some even in combination with varying selective pressure. Here
we give a brief summary of these results, sorted by articles.
The performance of scanning crossover for different numbers of parents is
studied by Eiben et a1 (1994) in a generational GA with proportional selection.
Bit-coded GAS for function optimization (DeJong functions FI-4 and a function
from Michalewicz) as well as order-based GAS for graph coloring and the TSP
are tested with different mechanisms to CHOOSE. In the bit-coded case more
parents perform better than two; for the TSP and graph coloring two parents
are advisable. Comparing different biases in choosing the child allele, on four
out of the five numerical problems fitness-based scanning outperforms the other
two and occurrence-based scanningTEAM
is theLRN
worst operator.

298

Recombinati(E

Eiben et ul ( 1 995) investigate diagonal crossover, compared to the classical

two-parent n-point crossover and uniform scanning in a steady-state GA with
linear-ranked biased selection ( b = 1.2) and worst-fitness deletion. The test suite
consists of two two-dimensional problems (F2 and a function from Michalewicz)
and four scalable functions (after Ackley, Griewangk, Rastrigin, and Schwefel).
The performance of diagonal crossover and n-point crossover shows a significant
correspondence with r and n , respectively. The best performance is always
obtained with high values, between 10 and 15, where 15 was the maximum
tested. Besides, diagonal crossover is always better than n-point crossover using
the same number of crossover points ( r = n - I), thus representing the sanie
level of disruptiveness. For scanning the relation between r and performance is
less clear, although the best performance is achieved for more than two parents
on five out of the six test functions.
The interaction between selection pressure and the parameters r for diagonal
crossover and n for n-point crossover is investigated by van Kemenade et ul
(1995). A steady-state GA with tournament selection (tournament size between
one and six) combined with random deletion and worst-fitness deletion was
applied to the Griewangk and the Schwefel functions. The disruptiveness of
both operators increases in parallel as the values for r and IZ are raised, but
the experiments show that diagonal crossover consistently outperforms n-point
crossover. The best option proves to be low selection pressure and high r in
diagonal crossover combined with worst-fitness deletion.
Motivated by the difficulties of characterizing the shapes of numerical
objective functions, the effects of operator arity are studied on fitness landscapes
with controllable ruggedness by Eiben and Schippers (1996). The NK
landscapes of Kauffman (1993), where the level of epistasis, hence the
ruggedness of the landscape, can be tuned by the parameter K , are used for this
purpose. The multiple-child and the one-child version of diagonal crossover and
uniform scanning are tested within a steady-state GA with linear-ranked biased
selection ( h = 1.2) and worst-fitness deletion for N = 100 and different values
of K . Two kinds of epistatic interaction, nearest-neighbor interaction ("1)
and
random-neighbor interaction (RNI), are considered. Similarly to earlier findings
(Eiben er a1 1995), the tests show that the performance of uniform scanning
cannot be related to the number of parents. The two versions of diagonal
crossover behave identically, and for both operators there is a consequent
improvement when increasing r . However, as the epistasis (ruggedness of
the landscape) grows from K = 1 to K = 5 the advantage of more parents
becomes smaller. On landscapes with significantly high epistasis ( K = 25)
the relationship between operator arity and algorithm performance seems to
diminish. We illustrate these observations with a figure showing the error
(deviation of the best individual from the optimum) at termination for the case
of NNI in figure 33.12. The final conclusions of this investigation can be very
well related to works of Schaffer and Eshelman ( 1991), Eshelman and Schaffer
(1993) and Hordijk and Manderick
(1995)
TEAM
LRN on the usefulness of (two-parent)

Multiparent recombination

299

Figure 33.12. Illustration of the effect of the number of parents (horizontal axis) on the
error at termination (vertical axis) on NK landscapes with NNI, N = 100, K = 1 (top),
K = 25 (bottom).

recombination. It seems that if and when crossover is useful, that is, on mildly
epistatic problems, then multiparent crossover can be more useful than the twoparent variants.
The results of an extensive study of diagonal crossover for numerical
optimization in GAS are reported by Eiben and van Kemenade (1997). Diagonal
TEAM LRN
crossover is compared to its one-offspring
version and n-point crossover on a test

300

Recombination

suite consisting of eight functions, monitoring the speed, that is, the total number
of evaluations, the accuracy, that is, the median of the best objective function
value found (all functions have an optimum of zero), and the success rate, th;it
is, the percentage of runs where the global optimum is found. In most cases a n
increase of performance can be achieved by increasing the disruptivity of the
crossover operator (using higher values of n for n-point crossover), and even
more improvement is achieved if the disruptivity of the crossover operator and
the number of parents is increased (using more parents for diagonal crossover).
This study gives a strong indication that for diagonal crossover an advantageous
multiparent effect does exist, that is, (i) using this operator with more than tmo
parents increases GA performance and (ii) this improvement is not only the
consequence of the increased number of crossover points.
A recent investigation of Eiben and Back (1097) addresses the working of
multiparent recombination operators in continuous search spaces, in particulz
within ESs. This study compares p / 2 intermediate recombination, pary discrete
recombination, which is identical to uniform scanning crossover, and diagonal
crossover with one child. Experiments are performed on unimodal landscapes
(sphere model and Schwefels double sum), multimodal functions with regularly
arranged optima and a superimposed unimodal topology (Ackley, Griewangk,
and Rastrigin functions) and on the Fletcher-Powell and the Langermarin
functions that have an irregular, random arrangement of local optima. On
the Fletcher-Powell function multiparent recombination does not increase
evolutionary algorithm (EA) performance; besides for the unimodal double sum
increasing operator arity decreases performance. Other conclusions seem to
depend on the operator in question; the greatest consequent improvement on
raising the number of parents is obtained for diagonal crossover.

33.7.6 Conclusions
The idea of applying more than two parents for recombination in an evolutionary
problem solver occurred as early as the 1960s (Bremermann et a1 1966). Several
authors have designed and applied recombination operators with higher arities
for a specific task, or used an existing operator with an arity higher than two
(Kaufman 1967, Muhlenbein 1989, Bersini and Seront 1992, Furuya and Haftka
1993, Aizawa 1994, Pal 1994). Nevertheless, investigations explicitly devoted
to the effect of operator arity on EA performance are still scarce; the study of the
phenomenon of multiparent recombination has just begun. What would such a
study mean? Similarly to the question of whether binary reproduction operators
(crossover with two parents) have advantages over unary ones (using mutation
only), it can be investigated whether or not using more than two parents is
advantageous. In the case of operators with tunable arity this question can be
refined and the relationship between operator arity and algorithm performance
can be studied. It is, of course, questionable whether multiparent recombination
TEAM LRN showing one behavioral pattern.
can be considered as one single phenomenon

Multiparent recombination

30 1

The survey presented here discloses that there are (at least) three different types
of multiparent mechanism with tunable arity:

operators based on allele frequencies among the parents, such as majority

mating, voting recombination, pary discrete recombination, or scanning
crossover;
(ii) operators based on segmenting and recombining the parents, such as mating
by crossing over, diagonal crossover, or (r.s) segmentation crossoLrer;
(iii) operators based on numerical operations, in particular averaging of
(real-valued) alleles, such as mating by averaging, p / p intermediate
recombination, p / 2 intermediate recombination. and geometrical and
spherical crossover.

A priori it cannot be expected that these different schemes show the same
response to raising operator arities. There are also experimental results
supporting differentiation among various multiparent mechanisms. For instance,
there seems to be no clear relationship between the number of parents and
the performance of uniform scanning crossover, while the opposite is true for
diagonal crossover (Eiben and Schippers 1996).
Another aspect multiparent studies have to take into consideration is the
expected different behavior on different types of fitness landscape. As no
single technique would work on every problem, multiparent mechanisms will
have their limitations too. Some studies indicate that on irregular landscapes.
such as NK landscapes with relatively high K values (Eiben and Schippers
1996), or the Fletcher-Powell function (Eiben and Back 1997). they do not
work. On the other hand, on the same Fletcher-Powell function Eiben and van
Kemenade ( I 997) observed an advantage of increasing the number of parents for
diagonal crossover in a GA framework using bit coding of variables, although
they also found indications that this can be an artifact, caused simply by the
increased disruptiveness of the operator for higher arities. Investigations on
multiparent effects related to fitness landscape characteristics smoothly fit into
the tradition of studying the (dis)advantages of two-parent crossovers under
different circumstances (Schaffer and Eshelman 1991, Eshelman and Schaffer
1993, Spears 1993, Hordijk and Manderick 1995).
Let us also touch on the issue of practical difficulties when using multiparent
recombination operators. Introducing operator ari ty as a new parameter implies
an obligation of setting its value. Since so far there are no reliable heuristics for
setting this parameter, finding good values may require numerous tests, prior
to 'real' application of the EA. A solution may be based on previous work on
adapting (Davis 1989) or self-adapting (Spears 1995) the frequency of applying
different operators. Alternatively, a number of competing subpopulations could
be used in the spirit of Schlierkamp-Voosen and Miihlenbein (1996). According
to the latter approach each different arity is used within one subpopulation
and subpopulations with greater progress, that is, with more powerful operators.
LRN
become larger. A first assessment ofTEAM
this technique
can be found in an article by

302

Recombinatior!

Eiben er al ( 1998a). Another recent result indicates the advantage of using more
parents in the context of constraint satisfaction problems (Eiben et a1 1998b).
Concluding this survey we can note the following. Even though there are
no biological analogies of recombination mechanisms where more than two
parent genotypes are mixed in one single recombination act, formally there is
no necessity to restrict the arity of reproduction mechanisms to one (mutation)
or two (crossover) in computer sirnulations. Studying the phenomenon of
multiparent recombination has just begun, but there is already substantial
evidence that applying more than two parents can increase the performancc
of EAs. Considering multiparent recombination mechanisms is thus a sound
design heuristic for practitioners and a challenge for theoretical analysis.

References
Ackley D H 1987a A Connectionist Machine for Genetic Hillclimbing (Boston, MA:
Kluwer)
__ 1987b An empirical study of bit vector function optimization Genetic Algorithms and
Simulated Anneulirtg ed L Davis (San Mateo, CA: Morgan Kaufmann) pp 170-2 I5
Aizawa A N 1994 Evolving SSE: a stochastic schemata exploiter Proc. 1st IEEE Con$ on
Evolutionun Computation (Orlundo, FL, 1990) (Piscataway, NJ: IEEE) pp 5 2 5 4
Altenberg L 1995 The schema theorem and Prices theorem Foundations c f l Genetic
Algorithms 3 ed L Whitley and M Vose (San Mateo, CA: Morgan Kaufmann)
Back T 1996 Evolutionun Algorithms in Theory und Pruc e (New York: Oxfotd
University Press )
Back T, Rudolph G and Schwefel H-P 1993 Evolutionary programming and evolution
strategies: similarities and differences Proc. 2nd Ann. Con$ on Evolutionuiy
Progrumnzing (Sun Diego, CA) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp I 1-22
Back T and Schutz M 1995 Evolution strategies for mixed-integer optimization of optical
multilayer systems Proc. 4th Anti. Conf: on Evolutionury Prugrumming (Sun Diego,
CA, March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge,
MA: MIT Press) pp 33-51
Bersini H and Seront G 1992 In search of a good evolution-optimization crossover
Purullel Problem Sohing frum Nuture, 2 (Proc. 2nd lnt. Conf on Purullel Problem
Soliling jrom Nuture, Brussels, 1992) ed R Manner and B Manderick (Amsterdam:
Elsevier-North-Holland) pp 479-88
Beyer H-G 1995 Toward a theory of evolution strategies: on the benefits of sex-the
( p / p ,A ) theory Etvlutionury Coniput. 3 81-1 1 1
-I996 Basic principles for a unified EA-theory E\dutionury Algorithms and their
Applicutions Workshop (Dugstuhl, 1996)
Birgmeier M I996 Evolutionary programming for the optimization of trellis-coded
modulation schemes Pruc. 5th Ann. Con$ on Evolutionq Prugramming ed L J
Fogel, P J Angeline and T Back (Cambridge, M4: MIT Press)
Blanton J and Wainwright R 1993 Multiple vehicle routing with time and capacity
constraints using genetic algorithms Proc. 5th Int. Cmj; on Genetic Algorithms
(Urburzci-ChcinipLiigri, IL, July 1093) ed S Forrest (San Mateo, CA: Morgan
TEAM LRN
Kaufmann) pp 452-9

References

303

Booker L B 1982 Intelligent Behavior as an Adaptation to the Task Environment Doctoral

Dissertation, Department of Computer and Communication Sciences, University of
Michigan
1987 Improving search in genetic algorithms Genetic Algorithms cind Sirrrulated
Annealing ed L Davis (San Mateo, CA: Morgan Kaufmann)
-I
993 Recombination distributions for genetic algorithms Foitndutions of Genetic
Algorithms 2 ed L Whitley (San Mateo, CA: Morgan Kaufmann)
Bremermann H, Rogson M and Salaff S 1966 Global properties of evolution processes
Natural Automatcl and Useful Simulations ed H Pattee, E Edlsack, L Fein and A
Callahan (Washington, DC: Spartan) pp 3-41
Bridges C L and Goldberg D E 1987 An analysis of reproduction and crossover
in a binary-coded genetic algorithm Proc. 2nd lnt. Coi$ on Genetic Algorithms
(Cambridge, MA, 1987) ed J J Grefenstette (Cambridge, MA: Erlbaum) pp 9-13
Christiansen F B 1989 The effect of population subdivision on multiple loci without
selection Mathematicd Evolutional?! Theory ed M W Feldman (Princeton, NJ:
Princeton University Press)
Cramer N L 1985 A representation for the adaptive generation of simple sequential
programs Proc. I s t Int. Col$ on Genetic Algorithms mid Their App1icutioir.s ed
J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 183-7
Davis L 1985 Applying adaptive algorithms to epistatic domains Proc. Int. Joint Cot$
on Artificial Intelligence
-1
989 Adapting operator probabilities in genetic algorithms Proc. 3rd lnt. Conf on
Genetic Algorithms (Fairfw, VA, June 1989) ed J D Schaffer (San Mateo, CA:
Morgan Kaufmann) pp 61-9
-(ed)
1991 Handbook of Genetic Algorithms (New York: Van Nostrand Reinhold)
De Jong K A 1975 An Analysis of the Behavior of (1 Class of Genetic Adciptiite Systenis
Doctoral Dissertation, Department of Computer and Communication Sciences,
University of Michigan
Dzubera J and Whitley D 1994 Advanced correlation analysis of operators for the
traveling salesman problem Parallel Problem Solving from Nature-PPSN I l l (Proc.
Int. Con$ on Evolutionav Computation and 3rd Conf on Parallel Prohlern Solitirig
from Nature, Jerusalem, October 1994) (Lecture Notes in Computer- Science 866 )
ed Yu Davidor, H-P Schwefel and R Manner (Berlin: Springer) pp 68-77
Eiben A 1991 A method for designing decision support systems for operational planning
PhD Thesis Eindhoven University of Technology
Eiben A and Back T 1997 An empirical investigation of multi-parent recombination
operators in evolution strategies Evolutionary Cornput. 5 347-65
Eiben A E, RauC P-E and Ruttkay Zs 1994 Genetic algorithms with multiparent
recombination Parallel Problem Solving from Nature-PPSN I l l (Proc. Int. Cot$ or1
Evolutionan Computation and 3rd Con$ on Parallel Problem Sohing .from Nature,
Jerusalem, October 1994) (Lecture Notes in Computer Science 866 ) ed Yu Davidor,
H-P Schwefel and R Manner (Berlin: Springer) pp 77-87
Eiben A and Schippers C 1996 Multi-parents niche: rr-ary crossovers on NK-landscapes
Proc. 4th lnt. Cot$ on Parallel Problem Solling from Nuture (Berlin, 1996) (Lucture
Notes in Computer Science 1141) ed H-M Voigt, W Ebeling, I Rechenberg and H-P
TEAM LRN
Schwefel (Berlin: Springer) pp 3 19-28

304

Recombinatioil

Eiben A. Sprinkhuizen-Kuyper 1 and Thijssen B 199th Competing crossovers i n

an adaptive CA framework Proc. 5th IEEE Cot$ on Eidutioiictrv Computcttiov
(Anc.horqye, A K , Muy 1998) (Piscataway, NJ: IEEE) at press
Eiben A. van der Hauw J and van Hemert J 3998b Graph coloring with adaptive
evolutionary algorithms J . Heicristic..s 4 25-46
Eiben A and van Kemenade C 1997 Diagonal crossover in genetic algorithms for
numerical optimization J . Control Cyhernet. 26 437-65
Eiben A E, van Kemenade C H M and Kok J N 1995 Orgy in the computer: multiparent reproduction in genetic algorithms Ati\vinces in Art$cicil L$e (Proc. 3rd. Eitr.
Cot$ oti Art$ciiil Lifk, Grctritrtltr ) (Lectirre Notes iti Art$c*icil Intelligettc~e929) ed
F Morrin. A Moreno, J J Merelo and P Chaccin (Berlin: Springer) pp 934-45
Eshelman L J 1991 The CHC adaptive search algorithm: how to have safe search
when engaging in nontraditional genetic recombination Founcidotis of Genetrc
A1~yorithni.sed G Rawlins (San Mateo, CA: Morgan Kaufmann)
Eshelman L J, Caruana R A and Schaffer J D 1989 Biases in the crossover landscape
Proc. 3rd lnt. Cot$ oti Genetic Algorithm (Fuirjiix, VA, Jitne 1989) ed J D Schaffu
(San Mateo, CA: Morgan Kaufmann) pp 10-19
Eshelman L and Schaffer J 1993 Crossover's niche Proc. 5th lnt. Cot$ on GeneticIL, 1993) ed S Forrest (San Mateo, CA: Morgan
Algorithms ( Ur-htrriir-CIiunt~~cti~~ti,
Kaufmann) pp 9-14
Fogel D 8 1995 Evolittionctr:v C'omputution: Tocrvtrd it Nertv Philosophy oj' Mcic-hitre
Ititrlli,qeticv (Piscataway, NJ: IEEE)
Fogel L J 1964 On the Ocqcitiiwioti of 1titellec.t Doctoral Dissertation. UCLA
Fogel L J and Fogel D B 1986 Artificial intelligence through evolutionary programming
F i n d Report JOr US Army Reseurdi Institute contract no P0-9-X56- 1 102C- I
Fogel L J, Owens A J and Walsh M J 1966 Art$c*icil 1ntrlligetic.e Through Simulcittd
Eidirtioti (New York: Wiley)
Fox B R and McMahon M B 1991 Genetic operators for sequencing problems
Foiititlutiom of' Getietic- Algorithms ed G J E Rawlins (San Mateo, CA: Morgiin
Kaufmann) pp 284-300
Furuya H and Haftka R T 1993 Genetic algorithms for placing actuators on space
structures Proc. 5th Itit. Corzf oti Genetic Algoritlims (Urbetnci-Chtrt?tplrig,t,IL, Jirly
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 5 3 6 4 2
Geiringer H I944 On the probability theory of linkage in Mendelian heredity Atin. Mcirh.
Stlrt. 15 25-57
Goldberg D and Lingle R Jr 1985 Alleles, loci, and the traveling salesman problem Pro(*.
1 s t I t i t . Conj; oti Gettetic A1gorithtii.s (Pittsbirrgh, PA, Jitly 199.5) ed J J Grefenstel te
(Hillsdale, NJ: Erlbaum)
Gorges-Schleuter M 1989 ASPARAGOS: an asynchronous parallel genetic optimization
strategy Pro(*.3rd Itit. Cot$ o t i Gertetic Algorithnis (Faitfku, VA, J i o w 1989) t d
J D Schaffer (San Mateo, CA: Morgan Kaufmann)
Grefenstette J 1986 Optimization of control parameters for genetic algorithms I E l E
Trctns. Syst. M m Cyheru. SMC-16 122-8
Holland J H 1975 Ad(iptcttioti in Naturcil and Artificid Sj:stem.s (Ann Arbor, MI:
University of Michigan Press)
__ 1989 Searching nonlinear functions for high values Appl. Math. Coniput. 32 255-74
Hordijk W and Manderick B 1995 The usefulness of recombination Ad\wct.s in Art$c-icil
TEAM LIje,
LRN GrLinciriLi) (Lectut-e Notes in A rt$c.itil
LIj? ( Proi.. 3rd ltit. Cot!$ o t i A rtiJicitr1

References

305

Intelligence 929) ed F MorBn, A Moreno, J J Merelo and P Chac6n (Berlin:

Springer) pp 908-19
Julstrom B A 1995 What have you done for me lately? Adapting operator probabilities
in a steady-state genetic algorithm Proc. 6th Int. Col$ on Genetic AIgorithms
(Pittburgh, PA, July 1995) ed L J Eshelman (San Francisco, CA: Morgan Kaufmann)
pp 81-7
Kauffman S 1993 Origins of Order: Self0rgani:ation und Selection in E\dution (New
York: Oxford University Press)
Kaufman H 1967 An experimental investigation of process identification by competitive
evolution IEEE T r a m Syst. Sci. Cybernet. SSC-3 I 1 4
Koza J R 1992 Genetic Programming: on the Programming cf Computers bj*Meuns of
Natural Selection (Cambridge, MA: MIT Press)
-1994 Genetic Programming 11: Automatic Disco\-ery c$ Reusable Progrums
(Cambridge, MA: MIT Press)
Liepins G and Vose M 1992 Characterizing crossover in genetic algorithms Ann. Muth.
Art$cial Intell. 5 27-34
Michalewicz Z 1996 Genetic Algorithms + Data Structures = Etvlution Programs 3rd
edn (Berlin: Springer)
Michalewicz Z, Nazhiyath G and Michalewicz M 1996 A note on the usefulness of
geometrical crossover for numerical optimization problems Proc. 5th Ann. Coi$ on
Evolutionary Programming ed L J Fogel, P J Angeline and T Back (Cambridge,
MA: MIT Press) pp 305-12
Mitchell T M 1980 The need for biases in learning generalizations Technical Report
CBM-TR- 1 17, Department of Computer Science, Rutgers University
Montana D J 1995 Strongly typed genetic programming Evolutionai? Comput. 3 199-230
Muhlenbein H 1989 Parallel genetic algorithms, population genetics and combinatorial
optimization Proc. 3rd Int. Con$ on Genetic Algorithms (Fairfus, VA, 1989) ed J D
Schaffer (San Mateo, CA: Morgan Kaufmann) pp 4 1 6 2 1
-1991
Evolution in time and space-the parallel genetic algorithm Foundations of
Genetic Algorithms ed G J E Rawlins (San Mateo, CA: Morgan Kaufmann)
Muhlenbein H and Voigt H-M 1996 Gene pool recombination in genetic algorithms MetaHeuristics: Theon and Applications ed I Osman and J Kelly (Dordrecht: Kluwer)
pp 53-62
Oliver I, Smith D and Holland J 1987 A study of permutation crossover operators on the
traveling salesman problem Proc. 2nd Int. Con5 on Genetic Algorithms (Cambridge,
MA, 1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 224-30
PBI K 1994 Selection schemes with spatial isolation for genetic optimization Parallel
Problem Solving from Nature-PPSN Ill (Proc. Int. Coi$ on E\dutiorrcin?
Computation and 3rd Con5 on Parallel Problem Sol\+zg from Nature, Jerusalem,
1994) (Lecture Notes in Computer Science 866) ed Yu Davidor, H-P Schwefel and
R Manner (Berlin: Springer) pp 170-9
Radcliffe N J 1991 Forma analysis and random respectful recombination Pmc. 4th Int.
Con5 on Genetic Algorithms (San Diego, CA, July 1991) ed R K Belew and L B
Booker (San Mateo, CA: Morgan Kaufmann) pp 222-9
-1994
The algebra of genetic algorithms Ann. Math. Artijcial Intell. 10 339-84
Radcliffe N and Suny P D 1995 Fitness variance of formae and performance prediction
Foundations of Genetic Algorithms 3 ed D Whitley and M Vose (San Mateo, CA:
TEAM LRN
Morgan Kaufmann) pp 5 1-72

306

Recombination

Reed J, Toombs R and Barricelli N A 1967 Simulation of biological evolution arid

machine learning J . Theor. Bid. 17 319-42
Renders J-M and Bersini H I994 Hybridizing genetic algorithms with hill-climbing
methods for global optimization: two possible ways Proc. 1 s t IEEE Conj: on
Evolutionun Computution (Orlmdo, FL, June 1994) (Piscataway, NJ: IEEE)
pp 3 12-7
Robbins R B 19 18 Some applications of mathematics to breeding problems, 111 Genetics
3 375-89
Schaffer J D, Caruana R A, Eshelman L J and Das R 1989 A study of control parameters
affecting online performance of genetic algorithms for function optimization Prcv.
3rd Int. Conj: on Genetic Algorithms (Fuirjkx, VA, June 1989) ed J D Schaffer (San
Mateo, CA: Morgan Kaufmann) pp 51-60
Schaffer J D and Eshelman L 1991 On crossover as an evolutionary viable strategy
Proc. 4th Int. Conj: on Genetic Algorithms ( S a n Diego, CA, 1991) ed R Belew and
L Booker (San Mateo, CA: Morgan Kaufmann) pp 61-8
Schaffer J D and Morishima A 1987 An adaptive crossover distribution mechanism for
genetic algorithms Pruc. 2nd Int. Col$ on Genetic Algorithms (Cumhridge, M A ,
1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 36-40
Schlierkamp-Voosen D and Muhlenbein H 1996 Adaptation of population sizes by
competing subpopulations Proc.. 3rd IEEE Con$ on Evolutionun Computation
(Piscataway, NJ: IEEE) pp 330-5
Schnell F W 1961 Some general formulations of linkage effects in inbreeding Genetics
46 947-57
Schoenauer M and Michalewicz Z 1997 Boundary operators for constrained parameter
optimization problems Proc. 7th Int. Con$ on Genetic Algorithms (Eust Lunsing,
MI, July 1997) ed T Bick (San Mateo, CA: Morgan Kaufmann) pp 320-9
Schwefel H-P I995 Evolution und Optimum Seeking (New York: Wiley)
in
Schwefel H-P and Rudolph G 1995 Contemporary evolution strategies Ad~~unc-es
Art$cinl Lije. (Proc. 3rd Int. Cot$ 011 Artijicial Lije, Grunadu) (Lecture Note5 in
Artijicial Intelligence 929) ed F MorAn, A Moreno, J J Merelo and P Chacdn (Berlin:
Springer) pp 893-907
Smith J and Fogarty T 1996 Recombination strategy adaptation via evolution of gene
linkage Proc. 3rd IEEE Col$ on Evolutionan Computation (Piscataway, NJ: IEEIE)
pp 826-31
Spears W 1993 Crossover or mutation? Foundutions (8Genetic Algorithms-2 ed L
Whitley (San Mateo, CA: Morgan Kaufmann) pp 221-38
-1995 Adapting crossover in evolutionary algorithms Proc. 4th Ann. Con$ o n
Eivlutionun Programming (Sun Diego, CA, 199.5)ed J R McDonnell, R G Reynolds
and D B Fogel (Cambridge, MA: MIT Press) pp 367-84
Spears W M and De Jong K A 1991a An analysis of multi-point crossover ed G Rawlins
Foundutions of Genetic Algorithms (San Mateo, CA: Morgan Kaufmann)
-1991b
On the virtues of pxameterized uniform crossover Proc. 4rh Int. Con$ on
Genetic Algorithms ( S u n Diego, CA, July 1991) ed R K Belew and L B Booker
(San Mateo, CA: Morgan Kaufmann) pp 230-6
Srinivas M and Patnaik L M 1994 Adaptive probabilities of crossover and mutation in
TEAM
LRN
genetic algorithms IEEE Truns. Syst.
Mun
Cyhernet. SMC-24 6 5 6 6 7

References

307

Syswerda G 1989 Uniform crossover in genetic algorithms Proc. 3rd Int. Con$ on Genetic
Algorithms (Fairjkx, VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 2-9
-1
99 1 Schedule optimization using genetic algorithms Handbook of Genetic
Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 3 3 2 4 9
Thierens D and Goldberg D E 1993 Mixing in genetic algorithms Proc. 5th Int. Con$ on
Genetic Algorithms (Urbana-Chumpaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 3 8 4 5
van Kemenade C, Kok J and Eiben A 1995 Raising GA performance by simultaneous
tuning of selective pressure and recombination disruptiveness Proc. 2nd IEEE Con$
on Evolutionary Computation (Perth, 1995) (Piscataway, NJ: IEEE) pp 346-5 1
Voigt H-M and Muhlenbein H 1995 Gene pool recombination and utilization of
covariances for the breeder genetic algorithm Proc. 2nd IEEE Int. Con$ on
Evolutionary Computation (Perth, 1995) (Piscataway, NJ: IEEE) pp 172-7
Voigt H-M, Muhlenbein H and CvetkoviC D 1995 Fuzzy recombination for the breeder
genetic algorithm Proc. 6th Int. Con$ on Genetic Algorithms (Pittsburgh, PA, 1995)
ed L J Eshelman (San Mateo, CA: Morgan Kaufmann) pp 1 0 4 - 1 1
Whitley D, Starkweather T and Shaner D 1991 Traveling salesman and sequence
scheduling: quality solutions using genetic edge recombination Handbook of
Genetic Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 350-72
Whitley D and Yoo N-W 1995 Modeling simple genetic algorithms for permutation
problems Foundations of Genetic Algorithms 3 ed D Whitley and M Vose (San
Mateo, CA: Morgan Kaufmann) pp 163-84
Wright A H 1994 Genetic algorithms for real parameter optimization Foundations of
Genetic Algorithms ed G Rawlins (San Mateo, CA: Morgan Kaufmann) pp 205-1 8
Wu A S and Lindsay R K 1995 Empirical studies of the genetic algorithm with noncoding
segments Evolutionary Comput. 3 121 4 8
Zhou H and Grefenstette J J 1986 Induction of finite automata by genetic algorithms Proc.
1986 IEEE Int. Con5 on Systems, Man, and Cybernetics (Atlanta, GA) pp 170-4

TEAM LRN

Other operators
Russell W Anderson (34.1), David B Fogel (34.2) and
Martin Schiitz (34.3)

34.1

The Baldwin effect

Russell W Andersont
34.1.1 Interactions between learning and evolution

In the course of an evolutionary optimization, solutions are often generated

with low phenotypic fitness even though the corresponding genotype may be
close to an optimum. Without additional information about the local fitnr:ss
landscape, such genetic near misses would be overlooked under strong selection.
Presumably, one could rank near misses by performing a local search and scoring
them according to distance from the nearest optimum. Such evaluations itre
essentially the goal of hybrid algorithms (Chapters 1 1-1 3, Balakrishnan and
Honavar 1995), which combine global search using evolutionary algorithms
and local search using individual learning algorithms. Hybrid algorithms can
exploit learning either actively (via Lamarckian inheritance) or passively (via
the Baldwin effect).
Under Lamarckian algorithms, performance gains from individual learning
are mapped back into the genotype used for the production of the next
generation. This is analogous to Lamarckian inheritance in evolution;uy
theory-whereby characters acquired during a parents lifetime are passed on to
their offspring. Lamarckian inheritance is rejected as a biological mechanism
under the modern synthesis, since it is difficult to envision a process by
which acquired information can be transferred into the gametes. Nevertheless,
the practical utility of Lamarckian algorithms has been demonstrated in some
evolutionary optimization applications (Ackley and Littman 1994, Paechter et
al 1995). Of course, these algorithms are limited to problems where a reverse
mapping from the learned phenotype to genotype is possible.
t This work was supported by the Public Health Foundation and the Kett Foundation. The aurhor
wishes to thank David Fogel and Peter Turney for encouragement and comments.

308

TEAM LRN

The Baldwin effect

309

However, even under purely Darwinian selection, individual learning

influences evolutionary processes, but the underlying mechanisms are subtle.
The Baldwin effect is one such mechanism, whereby learning facilitates the
assimilation of new genetic innovations (Baldwin 1896, Morgan 1896, Osborn
1896, Waddington 1942, Hinton and Nowlan 1987, Maynard Smith 1987,
Anderson 1995a, Turney et a1 1996). Learning allows an individual to complete
and exploit partial genetic programs and thereby survive. In other words,
learning guides evolution by assigning partial credit for genetic near misses.
Individuals with useful genetic variations are thus maintained by learning, and
the corresponding genes increase in frequency in the subsequent generation. As
genetic components necessary for a complex structure accumulate in the gene
pool, functions that previously required supplemental learning are replaced by
genetically determined systems.
Empirical studies can quantify the benefits of incorporating individual
learning into evolutionary algorithms (Belew 1989, French and Messinger 1994,
Nolfi et a1 1994, Whitley et a1 1994, Cecconi et a1 1995). However, a theoretical
treatment of the effects of learning on evolution can strengthen our intuition for
when and how to implement such approaches. This section presents an overview
of the principles underlying the Baldwin effect, beginning with a brief history of
the elucidation and development in evolutionary biology. Computational models
of the Baldwin effect are reviewed and critiqued. The Baldwin effect is then
analyzed using standard quantitative genetics. Given reasonable assumptions of
the effects of learning on fitness and its associated costs, this theoretical approach
builds and strengthens conventional intuition about the effects of individual
learning on evolution. Finally, issues concerning problem formulation, learning
algorithms, and algorithmic design are discussed.
34. I .2 The Baldwin effect in evolutionary biology

Complex biological structures require the coordinated expression of several

genes in order to function properly. Determining how such structures arise
through evolution is problematic because it is often difficult to envision the
evolutionary advantage offered by intermediate forms. Without additional
developmental mechanisms, individuals with incomplete genetic programs
would gain no evolutionary advantage over those devoid of any genetic
components.
Baldwin (1896), Osborn (1896), and Morgan (1896) proposed how
individual learning can facilitate the evolution of complex genetic structures
by protecting partial genetic innovations, or ontogenetic variations: [learning]
supplements such partial co-ordinations, makes them functional, and so keeps the
creature alive (Baldwin 1896). Baldwin further proposed how this individual
advantage of learning guides the process of evolution: the variations which
were utilized for ontogenetic adaptation in the earlier generation, being thus
TEAMwidely
LRN
in the subsequent generation
kept in existence, are utilized more

310

Other operators

(Baldwin 1896). Over evolutionary time, abilities that were previously

maintained by adaptive systems can be replaced by genetically determined
systems (i.e. instincts). Waddington proposed an analogous interaction between
developmental processes and evolution, whereby developmental adaptations
guide or canalize evolutionary change (Waddington 1942, Hinton and
Nowlan 1987). Formal mathematical or analytical models quantifying the
Baldwin effect did not appear in the literature until fairly recently.
The model of Hinton and Nowlan. The first quantitative model demonstrating
the Baldwin effect was constructed by Hinton and Nowlan (1987). They used a
computer simulation to study the effects of individual learning on the evolution
of a population of neural networks. They considered an extremely difficult
problem, where a network conferred a fitness advantage only if it was fully
functioning (all connections wired correctly). Each network was given 20
possible connections, specified by 20 genes.
Briefly consider the difficulty of finding this solution using a pure genetic
algorithm. Under a binary genetic coding scheme (allelic values of either
correct or incorrect), the probability of randomly generating a functional
net is 2*. Note that a net with 19 out of 20 correct connections is no better off
than one with no correct connections. The corresponding fitness landscape has a
singularity at the correct solution with no useful gradient information, analogous
to a putting green (figure 34.1). Finding this solution by a pure genetic algorithm,
then, is the evolutionary equivalent of a hole in one. Of course, given a large
enough random population, an evolutionary algorithm could theoretically find
this solution in one generation.
Hinton and Nowlan modeled a modified version of this problem, where
genes were allowed three alternative forms (alleles): present (l), absent (01. or
plastic (?). Connections specified by plastic alleles could be varied by random
trials during the individuals life span. This allowed an individual to complete
and exploit a partially hard-wired network. Hence, genetic near misses (e.g.
19 out of 20 correct genes) could quickly learn the remaining connection(s)
and differentially survive. The presence of plastic alleles, therefore, softened
the fitness landscape (figure 34.1). Hinton and Nowlan described the effect
of learning ability in their simulation as follows: [learning] alters the shape
of the search space in which evolution operates and thereby provides good
evolutionary paths towards sets of co-adapted alleles. The second aspect of the
Baldwin effect (genetic assimilation) was manifested in the mutation of plastic
alleles into genetically fixed alleles.
Issues raised with computational models. Hinton and Nowlans paper is
regarded as a landmark contribution to understanding the interactions between
learning and evolution (Mitchell and Belew 1995) and has inspired a
TEAM LRNand Meir 1990, Ackley and Littman
proliferation of modeling studies (Fontanari

31 1

The Baldwin effect

Combinations of alleles

Figure 34.1. Schematic representation of the fitness landscape in the model of Hinton
and Nowlan. A two-dimensional representation of genome space in the problem
considered by Hinton and Nowlan (1987). The horizontal axis represents all possible
gene combinations, and the vertical axis represents relative fitness. Without learning, only
one combination of alleles correctly completes the network; hence only one genotype has
higher fitness, and no gradient exists. The presence of plastic alleles radically alters this
fitness landscape. Assume a correct mutation occurs in one of the 20 genes. The advent
of a new correct gene only partially solves the problem. Learning allows individuals
close (in Hamming space) to complete the solution. Thus, these individuals will be
slightly more fit than individuals with no correct genes. Useful genes will thereby be
increased in subsequent generations. Over time, a large number of correct genes will
accumulate in the gene pool, leading to a completely genetically determined structure.

1991, 1994, Whitley and Gruau 1993, Whitley et a1 1994, Balakrishnan and
Honavar 1995, Turney 1995, 1996, Turney et a1 1996). Considering the rather
specific assumptions of their model, it is useful to contemplate which aspects
of their results are general properties. Among the issues raised by this and
subsequent studies are the degree of biological realism, the nature of the fitness
landscape, the computational cost of learning, and the role of learning in static
fitness landscapes.
First, the models assumption of plastic alleles that can mutate into
permanent alleles seems biologically spurious. However, the Baldwin effect
can be manifested in the evolution of a biological structure regardless of
the genetic basis of that structure or the mechanisms underlying the learning
process (Anderson 1995a). The Baldwin effect is simply a consequence of
individual learning on genetic evolution. Subsequent studies have demonstrated
the Baldwin effect using a variety of learning algorithms. Turney (1995, 1996)
has observed a Baldwin effect in a class of hybrid algorithms, combining a
genetic algorithm (GENESIS) and an inductive learning algorithm, where the
Baldwin effect was manifested in shifting biases in the inductive learner. French
and Messinger (1994) investigated the Baldwin effect under various forms of
TEAM
LRN observed the Baldwin effect in a
phenotypic plasticity. Cecconi et al
(1995)

312

Other operators

GA+NN hybrid (a hybrid of a genetic algorithm and a neural network), as

did Nolfi et a1 (1994) and Whitley and Gruau (1993). Unemi et a1 (1994)
demonstrated the Baldwin effect in a GA+RL hybrid (GA and reinforcement
learning; in particular, they studied Q-learning). Whitley et a f (1994) studied
the Baldwin effect with a hybrid of a GA and a simple hill climbing algorithm.
Finally, it is interesting to note that genetic mechanisms closely analogous
to the plastic alleles of Hinton and Nowlan may be in effect in evolutionary
interactions between natural and adaptive antibodies (Anderson I99Sb. 199Oa).
Nevertheless, it is difficult to see how this particular model could be generalized
to learning in neural systems.
Second, the model of Hinton and Nowlan assumed an extremely rugged
fitness landscape. The assumption of an all-or-nothing fitness landscape has
apparently led some to assert that a nonlinear selection function is necessary for
a Baldwin effect to occur (Hightower et a1 1996). This claim is not supported
by rigorous analysis. Learning can alter the shape of any fitness landscape
and therefore can affect evolutionary trajectories. For example, consider linear
directional selection. If learning only serves to change the slope of the selection
function, it will by definition affect its severity.
Third, the observation that learning facilitates evolution, has often been
interpreted as learning accelerates evolution. Although several empirical
studies have demonstrated increased convergence rates for hybrid algorithms
(Parisi et a1 1991, Turney 1995, Ackley and Littman 1991, 1994, Balakrishnan
and Honavar I99S), this more general claim is untenable under many conditions.
Intuitively, learning can slow genetic change by protecting otherwise less
optimal genotypes from selection. Furthermore. individual adaptive abilities can
represent an enormous investment of resources (consider the cerebral cortex in
man!). Since individual learning accrues a computational or biological cost, the
costs und benejts of learning must be weighed before drawing such conclusions.
Fourth, most current hybrid algorithm applications operate on a fixed
problem, or static fitness landscape. An exception is a study by Unemi et
crl (1994), which involves a simulated robot in a maze. They show that the
ability to learn is initially beneficial, but it will eventually be selected out 01 the
gene pool, unless the maze changes dynamically with each new individual trial.
Ultimately, learning has no selective advantage in fixed environments, since,
presumably, once the optimal genotype is found, exploration away from this
optimum only reduces fitness (Stephens 1993, Via 1993, Anderson 199Sa). The
studies by Hinton and Nowlan ( 1987) and Fontanari and Meir (1990) corroborate
this thesis: their simulations showed that as individuals arose with allelic
combinations close to the optimum, the plastic alleles (representing the ability
to learn) were selected out of the gene pool. In other words, the computational
advantage of individual learning decreases over the course of an evolutionary
optimization. Under these conditions, individual learning can only be maintained
in a population subject to changing environmental conditions. A similar case
TEAMinLRN
has been made for phenotypic plasticity
general (West-Eberhard 1989, Steams

The Baldwin effect

313

1989, Scheiner 1993, Via 1993) as well as for sexual versus asexual reproduction
(Maynard-Smith 1978).
34.1.3 Quantitative genetics models
In order to make some of these issues more explicit, it is useful to study
the Baldwin effect under the general assumptions of quantitative genetics.
A quantitative genetics methodology for modeling the effects of learning on
evolution was developed by Anderson (1995a), and the primary results of this
analysis are reviewed in this section. The limitations of this theoretical approach
are well known. For example, quantitative genetics assumes infinite population
sizes. Also, complete analysis is often limited to a single quantitative character.
Nevertheless, such analyses can provide a baseline intuition regarding the effects
of learning and evolution.
All essential elements of an evolutionary process subject to the Baldwin
effect are readily incorporated into a quantitative genetics model. These
elements include (i) a function for the generation of new genotypes through
mutation and/or recombination, (ii) a mapping from genotype to phenotype, (iii)
a model of the effects of learning on phenotype, and (iv) a selection function.
In this section, this methodology is demonstrated for a simple, first-order model,
where only the phenomenological effects of learning on selection are considered.
More advanced models are discussed, which incorporate a model of the learning
process, along with its associated costs and benefits. These analyses illustrate
several underappreciated points: (i) learning generally slows genetic change, (ii)
learning offers no long-term selective advantage in fixed environments, and (iii)
the effects of learning are somewhat independent of the mechanisms underlying
the learning process.
Learning as a phenotypic variance. For a first-order model, consider an
individual whose genotype is a real-valued quantitative character subject to
normal (Gaussian) selection:
(34.1)
where w,(g) represents selection as a function of genotype, ge represents the
optimal genotype, and V , ( t ) is variance of selection as a function of time. A
direct mapping from genotype to phenotype is implicitly assumed.
What effect does learning have on this selection function? Learning allows
an individual to modify its phenotype in response to its environment. Consider
an individual whose genotype ( g i ) is a given distance (Igi - gel) from the
environmental optimum ( g e ) . Regardless of the mechanisms underlying the
learning process, the net effect of learning is to reduce the fitness penalty
associated with this genetic distance. Because of its ability to learn, an
LRN
individual with genotype gi has aTEAM
probability
of modifying its phenotype to

3 14

Other operators

the environmental value g e which is a function of the distance between these

two values. A simple way to model this effect is to specify a phenotypic
variance due to learning (Vl). This is equivalent to increasing the variance of
selection. Thus, learning increases the width of the selection function such that
V, is replaced by V,* = V, Vl.

Fixed selection, constant learning. Consider a population subject to selection

with a fixed environmental optimum. For simplicity, let g e = 0. Assume
an initial Gaussian distribution of genotypes, & ( g ) = N ( m ( t ) , V p ( t ) ) ,where
m ( t ) and V p ( t )are the population mean and variance at time t . Each round of
selection changes the distribution of genotypes according to

The population mean and variance after selection (m*,V,*) can now be
expressed in the form of dynamic equations:

(34.5)
(34.6)
Lastly, mutations are introduced in the production of the next generation of
trials. To model this process, assume a Gaussian mutation function with mean
zero and variance VM. A convolution of the population distribution with the
mutation distribution has the effect of increasing the population variance:

where

(34.8)
Hence, in a fixed environment the population mean m ( t ) will converge on
the optimal genotype (Bulmer 1985), while a mutation-selection equilibrium
variance occurs at
vp (v; 4v, v y 2
v**
=
(34.9)
Peq
2
Inspection of equations ( 3 4 3 , (34.6), and (34.8) illustrates two important points.
TEAM
First, learning slows the convergence
of LRN
both m*(t) and V;(t). Second, once

The Baldwin effect

315

convergence in the mean is complete, the utility of learning is lost, and learning
only reduces fitness.
In a more elaborate version of this model, called the critical learning period
model (Anderson 1995a), a second gene is introduced to regulate the fraction of
an individuals life span devoted to learning (duration of the learning period).
Specification of a critical learning period implicitly assigns cost associated with
learning (the percent of life span not devoted to reproduction). Individuals are
then selected for the optimal combination of genotype and learning investment.
It is easily demonstrated that under these assumptions, learning ability is selected
out of a population subject to fixed selection.
Constant-velocity environments. Next, consider a simple case of changing
selection-a constantly moving optimum, ge(t) = 6 t , where 6 is defined as
the environmental velocity. Let the difference between the population mean
and the environmental optimum be defined as @ = m ( t ) - g , ( t ) . The dynamic
equation for @ is

(34.10)
At equilibrium, @ * ( t )= @ ( t ) ,hence
@eq

vp
~

+ v; 6

(34.11 )

where the equilibrium is expressed as a distance from the optimum. A similar

result can be found in the article by Charlesworth (1993), in his analysis of
the evolution of sex in a variable environment. The equilibrium population
variance remains the same as in the case of a fixed environment. Substituting
(34.9) yields

Thus in an environment where the optimal phenotype is changing at constant

rate, the population mean genotype converges on a constant phase lag ( @ e q ) .
Learning actually increases the phase lag by protecting suboptimal genotypes
from selection. But this model assumes ZO, = 1, so that only the relative
magnitude of selection is accounted for. Strong selection without learning might
actually lead to extinction in rapidly changing selection. Phenotypic variability
(due to learning) has the effect of shielding these marginal genotypes from
selection (Wright 1931).
As environmental conditions change, so will the selective advantage of
learning. The relations derived in this analysis show which equilibria will be
reached for an assumed phenotypic variability, but the model does not yield
information on what would represent the optimal investment in learning. Hence,
a complete model of the benefits of the Baldwin effect must incorporate the costs
associated with learning. The best way to estimate these costs is to develop a
TEAM LRN
model of the underlying learning process.

316

Other operators

Models of learning. A reasonable question to ask is how sensitive are the

effects of learning to the mechanisms underlying the learning process. The
most direct (and exhaustive) method for investigating this question would be
to construct a computer simulation to compare the effects of two learning
processes in an evolutionary program. However, estimates of comparative
performance can also be obtained using quantitative genetics models according
to the following methodology. First, one must develop a model of the learning
process. Next the effects of the learning process must be mapped onto the
selection function. A simple approximation is to construct a probabilistic or
phenomenological model of the effect of learning on phenotype.
Under the critical learning period model (Anderson 1995a), learning consists
of a series of independent trials conducted over a fraction of the individuals
life span, or learning period. This simple model incorporates two important
considerations: the sequential nature of learning and a model of the cost
associated with learning. Despite the more complicated assumptions, the
dynamical response of this model to various forms of selection (fixed, random
variation, and constant velocityj were qualitatively comparable to those derik ed
for the simple additive variance model analyzed here. Longer learning periods
increase the investment in (and cost of) learning: consequently, the amount
of learning investment generally only increased with increased environmental
variability.
Other models of the learning process can be incorporated using the
methodology outlined above. For example, under the critical learning period
model, individuals were not allowed to benefit from successive trials within
the learning period, nor were they allowed to begin exploitation of successful
trials until after the learning period. Removing these two restrictions yields
a seqiienricrl trial-and-error learning rule. Such a learning rule is a more
appropriate model of the learning process in some systems, such as affinity
maturation in the antibody immune system (Milstein 1990) or skill acquisition in
neural systems (Bremermann and Anderson 1991, Anderson 1996bj. For these
initial models, including such details of individual learning was unwarranted,
but any model of learning can be mapped onto a fitness function, although
mapping a sequential trial-and-error learning rule onto a survival probability
may be analytically more difficult. Often it turns out that this mapping masks
the details of the underlying process (Anderson 1995aj. This suggests that the
eflects of individual learning on evolution will be qualitatively the same.
33.1.4 Conclusions

Baldwins essential insight was that if an organism has the ability to learn,
it can exploit genes that only partially determine a structure-increasing the
frequencies of useful genes in subsequent generations. The Baldwin effect has
also been demonstrated to be operative in hybrid evolutionary algorithms. These
TEAM
empirical investigations can be used
to LRN
quantify the benefits of incorporating

Knowledge-augmented operators

317

individual learning into an evolutionary algorithm. Computation time is the

obvious performance criterion; however, such comparisons are often limited
to the particular application. Alternatively, phenomenological models can be
used to generate reasonable estimates of performance expectations, deferring
the arduous task of creating detailed computer simulations.
The introduction of individual learning can radically alter fitness landscapes.
This is especially true if the learning algorithm operates on phenotypes according
to a fundamentally different process. Clearly, if the learning algorithm is
identical to the genetic algorithm, no computational savings are likely to be
manifest.
Under certain conditions, learning slows genetic change by protecting
suboptimal genotypes from selection.
Thus, the benefits of individual
learning will probably be accrued early in optimization, when the population
is far from equilibrium, and learning will eventually impede algorithmic
convergence. Accordingly, for optimizations on fixed fitness landscapes,
a variable-learning-investmentstrategy-where the computational resources
applied toward learning are subject to change-should be considered (Saravanan
et a1 1995, Anderson 1995a).

34.2 Knowledge-augmented operators

Daiid B Fogel
Evolutionary computation methods are broadly useful because they are general
search procedures. The canonical forms of the evolutionary algorithms do not
take advantage of knowledge concerning the problem at hand. For example, in
the canonical genetic algorithm (Holland 1975), a one-point crossover operator
is suggested with a crossover point to be chosen randomly across the parents
chromosomes. However it is generally accepted that the effectiveness of a
particular search operator depends on at least three interrelated factors: (i) the
chosen representation, (ii) the selection criterion, and (iii) the objective function
to be minimized or maximized, subject to the given constraints if applicable.
There is no single best search operator for all problems.
Rather than rely on simple operators that may generate unacceptably
inefficient performance on a particular problem at hand, the search operators
can be tailored for individual applications. For example, in evolution strategies
and evolutionary programming, when searching for the minimum of a quadratic
surface, Rechenberg (1973) showed that the best choice for the standard
deviation when using a zero mean Gaussian mutation operator was
CT

= 1.224 f (x)/II

where f ( z ) is the quadratic function evaluated at the parent vector x, and ii

TEAM
LRN
is the dimensionality of the function.
This
choice of o incorporates knowledge

318

Other operators

about the function being searched i n order to provide the greatest expected rate
of convergence. In this particular case, however, knowledge that the function is
a quadratic surface indicates the use of search algorithms that can take greater
advantage of the available gradient information (e.g. Newton-Gauss).
There are other instances where incorporating domain-specific knowledge
into a search operator can improve the performance of an evolutionary algorithm.
In the traveling salesman problem, under the ob-jective function of minimizing
the Euclidean distance of the circuit of cities, and a representation of simply m
ordered listing of cities to be visited, Fogel (1988) offered a mutation operator
which selected a city at random and placed it in the list at another randomly
chosen position. This operator was not based on any knowledge about the
nature of the problem. In contrast, Fogel (1993) offered an operator that instead
inverted a segment of the listing (i.e. like a 2-opt of Lin and Kernighan (1976)).
The inversion operator in the traveling salesman problem is a knowledgeaugmented operator because it was devised to take advantage of the Euclidean
geometry present in the problem. In the case of a traveling salesmans tour, if
the tour crosses over itself it is always possible to improve the tour by undoing
the crossover (i.e. the diagonals of a quadrangle are always longer in sum than
any two opposite sides). When the two cities just before and after the crossing
point are selected and the listing of cities in between reversed, the crossing is
removed and the tour is improved. Note that this use of inversion is appropriate
in light of the traveling salesman problem, and no broader generality of its
effectiveness as an operator is suggested, or can be defended.
Domain knowledge can also be applied in the use of recombination. For
example, again when considering the traveling salesman problem, Grefenstette et
N I (1985) suggested a heuristic crossover operator that could perform a degree
of local search. The operator constructed an offspring from two parents by
(i) picking a random city as the starting point, (ii) comparing the two edges
leaving the starting cities in the parents and choosing the shorter edge, then (iii)
continuing to extend the partial tour by choosing the shorter of the two edges
in the parents which extend the tour. If a cycle were introduced, a random
edge would be selected. Grefenstette et a1 (1985) noted that offspring were
on average about 10% better than the better parent when implementing this
operator.
In many real-world applications, the physics governing the problem suggests
settings for search parameters. For example, in the problem of dockmg
small molecules into protein binding sites, the intermolecular potential can be
precalculated on a grid. Gehlhaar et a1 (1995) used a grid of 0.2 A, with each
grid point containing the summed interaction energy between an atom at that
point and all protein atoms within 6 A. This suggests that under Gaussian
perturbations following an evolutionary programming or evolution strategy
approach, a standard deviation of several hgstroms would be inappropriate
(i.e. too large).
TEAM
Whenever evolutionary algorithms
areLRN
applied to specific problems with the

Gene duplication and deletion

319

intention of generating the best available optimization performance, knowledge

about the domain of application should be considered in the design of the search
operators (and the representation, selection procedures, and indeed the objective
function itself).

34.3 Gene duplication and deletion

Martin Schiitz
34.3.1 Historical review
The idea of using operators such as gene duplication and deletion in the context
of evolutionary algorithms (EAs) is as old as the algorithms themselves.
Fogel et a1 (1966) seemed to be one of the first experimenting with variablelength genotypes. In their work they evolved finite-state machines of a varying
number of states, therefore making use of operators such as addition and
deletion. Typically, the add a state operator was performed randomly, rather
than a strict duplication. They also suggested a majority logic operator that
essentially created a machine in which each state was the composite of a state
from each of the original machines; that is, this operator duplicated the majority
logic vote at each state of multiple finite-state machines. Concerning engineering
problems Schwefel (1968) was one of the first using gene duplication and
deletion for solving the task of determining the internal shape of a two-phase jet
nozzle with maximum thrust under constant starting conditions. Holland ( 1975,
p 11 1) proposed the concepts of gene duplication and gene deletion in order to
raise the computational power of EAs.

34.3.2 Basic motivations for the use of gene duplication and deletion
From these first attempts concerning variable-length genotypes until now many
researchers have made use of gene duplication and deletion. Four different
motivations may be classified.
(i) Engineering applications. Many difficult optimization tasks arise from
engineering applications in which variable-dimensional mixed-integer problems
have to be solved. Often these problems are of dynamic nature: the optimum
is time dependent. Additionally, in order to obtain a reasonable model of
the system under consideration, a large number of constraints has to be
respected during the optimization. Solving the given task frequently assumes
the integration of expert (engineer) knowledge into the problem solving strategy:
into particular genetic operators in the case of EAs. Many such constrained,
variable-dimensional, mixed-integer, time-varying engineering problems and
their solutions can be found in the handbook by Davis (1991) and in the
TEAM
LRN
proceedings of several conferences,
such
as the International Conference on

320

Other operators

Evolutionary Computation (ICEC), Conference on Genetic Algorithms and

Their Applications (ICGA), Conference on Evolutionary Programming (EP) and
Parallel Problem Solving from Nature (PPSN).
(ii) Rtrisitig the c*ompirtatiotialpokver of EAs. As Goldberg et a1 ( 1989, p 49.3;
see also Goldberg et cil 1990) state, nature has formed its genotypes by
progressing from simple to more complex life forms, thereby using variablelength genotypes. He states that genetic algorithms (GAS) using variable-length
genotypes, thus being able to use duplication and deletion operators,
solve problems by combining relatively short, well-tested building
blocks to form longer, more complex strings that increasingly cover
all features of a problem. . . . Specifically, and more positively, we
assert that allowing more messy strings and operators permits genetic
algorithms to form and exploit tighter, higher performance building
blocks than is possible with random, fixed codings and relatively slow
reordering operators such as inversion.
Transferring this idea to EAs in general hopefully leads to more efficient EAs.
(iii) Extradimensional bypass. One additional motivation underpinning the
usefulness of variable-dimensional genotype lengths is given by the
extmdimerzsionul bypass thesis of Conrad ( 1993) (given more formally
by Conrad and Ebeling ( 1992)), which states (maximization):
As the number of dimensions increases the chance of our sitting on
top of an isolated peak decreases, assuming that the space has random
topographic features. The peaks will be transformed to saddlepoints.
The rate of evolution will then depend on how long it takes to discover
an uphill running pathway that requires a series of short steps and not
on how long it takes to make a long jump from one peak to another.
For example, imagine an alpinist walking in a two-dimensional environment
standing in front of a crater whose top he would like to reach. Even if he
cannot see the highest peak, climbing the crater and walking on the border of
the crater (extradimensional bypass) will lead him to the top. Walking in a onedimensional space would complicate the task. This time the surface consists of
two separated peaks (cut through the crater). If the alpinist climbs the first peak
(eventually the higher one) he sees the highest peak but since one dimension
is lost the desired path along the borderline of the crater does not exist. This
time the alpinist has to descend into the valley in order to solve his task. ,4s
one can recognize, introducing extra dimensions during the course of evolution
may overcome the problem of becoming stuck in a local optimum or, to put it
in other words, decrease the necessity of jumping from one basin of attraction
TEAM LRN
to another.

32 I

Gene duplication and deletion

(iv) ArtiJcial intelligence. Another important field in which variabledimensional techniques have also been used is the domain of artificial
intelligence (AI), especially machine learning (ML) and artijicial life (AL).
Whereas in the field of ML (subordinated fields are, for example, genetic
programming, classifier systems, and artificial neural networks) solving a
possibly variable-dimensional optimization problem (depending on the actual
subordinated field in mind) is one main objective, this aim plays a minor
role in the AL field. AL research concentrates on computer simulations of
simple hypothetical life forms and selforganizing properties emerging from
local interactions within a large number of basic agents (life forms). A second
objective of AL is the question of how to make the agents' behavior adaptive,
thus often leading to agents equipped with internal rules or strategies determining
their behavior. In order to learn/evolve such rule sets, learning strategies such
as EAs are used. Since the number of necessary rules is not known a priori,
a variable-dimensional problem instance arises. Despite the rule learning task,
the modeling of the simple life forms itself makes use of variable-dimensional
genotypes.

34.3.3 Formal description of gene duplication and deletion

From the preceding motivations one can see that solving variable-dimensional
optimization problems with constraints forms one main task forcing the use of
gene duplication and deletion. This sort of optimization (minimization) problem
may be formalized as follows.

Dejinition 34.3.1 (variable-dimensional minimization problem with constraints).

Given f : D 2 X = UZ,G' + R, minimize f (2)subject to

V i E { l ,..., m }
V j E { l , ..., I }
hj(z)=O
2 = ( x i , .. . ,Xnr) E D C X
nE
g;(z)LO

f,g ; , hj : X -+ R.

N arbitrary but not fixed,

gi are called inequality constraints and hj equality constraints. The main

difference concerning a non-variable-dimensional optimization (minimization)
problem is the fact that the dimension of the objective vector 2 may vary; that
is, it is not fixed. As a consequence the parameter space X has to be formulated
as the union of all parameter spaces G' of fixed sizes i . In the context of EAs the
gene space G ought not to be a vector space as usual in classical optimization
(most often a Banach space, e.g. R"), thereby omitting all the comfortable
properties Banach spaces have with respect to analysis. Instead, G might be
TEAM LRN
I5,N,Z,Q,@, R or any other complex
space (not in the strict mathematical

322

Other operators

sense) representable by a complex data structure. The use of G is necessary

because most duplication and deletion operators directly work on semantical
entities represented by G. Davidor (1991a), for example, uses binary encoded
vectors of triples (three angles) for representing a robot trajectory, thus G takes
the form G = B x B x B.
Until now we have presented motivations for the use of variable-length
genotypes in the field of EAs. Unfortunately. nature gives no real hint at
why using a variable-length genotype should be advantageous. A high degree
of genepool diversity and a high flexibility to a changing environment may
be one main benefit of non-fixed gene lengths, thus raising the evolutional
powedadaptability of a population. One interesting fact nature offers is that
gene duplication most often leads to viable individuals, whereas gene deletion
does not (Futuyma 1986, p 78). ( A brief and sufficient introduction into the
concepts of neo-Darwinism, i.e. the synthetic theory of evolution, is given by
Back (1996) and therefore omitted here. The more interested reader is referred
to the book by Futuyma (1986).)
Although nature offers a variety of schemes one central idea of how these
operators may be formalized can be extracted from nature as well as from several
approaches in the context of EAs. Whereas the general working mechanism of
both operators is very simple, the different achievements concerning distinct
applications may vary. In order not to focus on a special construction a
more abstract view of both operators is presented here (sufficing in the present
context).
Imagine a genotype z = (-XI, . . . ,x,) E X consisting of genes xi E G, i E
I , . . . , n ( n corresponds to the actual genotype length) from a gene space G.
The deletion operator del may than be formalized as a function transforming a
given individual a = (2,s ) E I by deleting a gene x,. If I = X x A, is the
individual space, where A, is the strategy parameter space, which depends on
the application and the EA, del has the form

In most cases an application-dependent probability PdeI E (0, 1) is responsible

for the decision of whether deletion should be applied or not. The position i
fixing the gene which has to be deleted is usually uniformly chosen from the
set { 1, . . . , n } . Since deletion and duplication produce genotypes of different
length it is important to notice that the dimension n varies from individual to
individual. Returning to our example (Davidor 1991a), deletions occur only
after a recombination ( p , = 1.0) and typically have a probability of 0.05. A
deleted gene xi has the form xi E Bx Bx B,where each bit vector of length
1 codes for an angle.
Similar to deletion, the duplication operator is simple. Generally, instead
TEAM
LRN
of deletion, a gene is added to the
genotype,
such that the operator may be

Gene duplication and deletion

323

formalized as follows:
dup : I + I , with dup(a) = dup(x1, . . . , s,.. . . , x,,, s)
s) = a.
= ( X I , . . . ,X I , XI, . . . ,
Analogously to deletion a duplication probability Pdup E (0, 1 ) is used and the
index i is usually uniformly chosen. Concerning the policy for introducing the
new gene xf several policies may be distinguished, such as:
0

Duplication. The gene x,! is a duplicate of x,, such that a has the form
a = ( X I , . . . , X I , x;, . . . , x,, s).
Related. The initialization of the new gene xf is context dependent: xfis
generated with help the of the actual values of x , and .rr+l.
Addition. x, is initialized at random.

For example, Davidor ( 1 991a) performs a duplication with a probability of

0.06 only when a recombination takes place. Whereas the duplication and
addition policy is intuitive the related policy may be further divided into two
strategies. First, the added arm-configuration is such that either its end-effector
is positioned at the mid distance between the two adjacent end-effector positions,
or its links have a mid metric value between the corresponding link positions
in the adjacent arm-configurations.
Finally, both operators have to adapt the length of the parameter vector
s E A,. Because this process depends on the form of A, details are omitted
here.
34.3.4 Problems arising when using variable-length genovpes
Despite the fact that variable-length genotypes may enhance the computational
power of EAs (see motivations (ii) and (iii)), the introduction of this new concept
borrowed from nature leads to several problems.
The role of positions in a variable-length genotype is destroyed:
the assignment of corresponding genes xi on different homologous
chromosomes is not possible. In order to construct genetic operators which
are able to generate interpretable individuals, thus being able to respect
semantical blocks on the genotype, the assignment problem has to be
solved.
In particular recombination is faced with the problem of finding the locus
of corresponding genes.
Whereas some authors introduced gene duplication and gene deletion
operators in order to improve the stability of the strings length (Davidor
1991a, p 84) others waive these operators; that is, they believe that variabledimensional recombination suffices for the stabilization of string lengths
TEAM LRN
(see e.g. Harp and Samad 1991).

324

Other operators

34.3.5 Solutions
The e i d u t i o n program approach of Michalewicz ( 1992), i.e. combining
the concept of evolutionary computation with problem-specific chromosome
structures and genetic operators, may be seen as one main concept used to
overcome the problems mentioned above. Although this concept is useful in
practice, it prevents the conception of a more general and formalized view
of variable-length EAs because there no longer exists the EA using the
representation and the set of operators, Instead, for each application problzm
a specialized EA exists. According to Lohmann (1992) and Kost (1993),
for example. the formulation of operators such as gene duplication and
deletion, used in their framework of structural evolution, is strongly application
dependent, thus inhibiting a more formal, general concept of these operators.
Davidor ( I 99 la, b) expressed the need for revised and new genetic operators
for his variable-length robot trajectory optimization problem. In contrast to
the evolution program approach, Schutz ( 1994) formulated an applicationindependent, variable-dimensional mixed-integer evolution strategy (ES), thus
following the course of constructing a more general sort of ES. This offered
Schutz the possibility to be more formal than other researchers. Unfortunately,
this approach is restricted to a class of problems which can easily be mapped
onto the mixed-integer representation he used.
Because most work concerning variable-length genotypes uses the evolution
program approach, a formal analysis of gene duplication and deletion is rarely
found in the literature and is therefore omitted here. As a consequence,
theoretical knowledge about the behavior of gene duplication and deletion is
nearly unknown. Harvey ( 19931, for example, points out that gene-duplication,
followed by mutation of one of the copies, is potentially a powerful method for
evolutionary progress. Most statements concerning nonstandard operators such
as duplication and deletion have the same quality as Harveys: they are far from
being provable.
Because of the lack of theoretical knowledge we proceed by discussing
some solutions used to circumvent the problems which arise when introducing
variable-length genotypes. In the first place, we question how other researchers
have solved the problem of noncomparable loci, i.e. the problem of respecting
the semantics of loci. Mostly this gene assignment problem is solved by
explicitly marking semantical entities on the genotype. The form of the tagging
varies from application to application and is carried out with the help of different
representations.
0

Davidor ( 199la, b) used a binary encoded non-fixed-length vector of arm

configurations, i.e. a vector of triples (three angles), for representing a robot
trajectory, thus defining semantical blocks.
The path of a mobile robot may be a variable-dimensional list of path nodes
(triples consisting of the two Cartesian coordinates and a flag indicating
TEAM LRN
whether a node is feasible or not).

Gene duplication and deletion

325

Harp and Samad (1991) implemented the tagging with the help of a special
and more complex data structure representing the structure and actual
weights of any feedforward net consisting of a variable number of hidden
layers and a variable number of units.
Goldberg et a1 ( I 989, 1990) extended the usual string representation of
GAS by using a list of ordered pairs, with the first component of each tuple
representing the position in the string and the second one denoting the
actual bit value. Using genotypes of fixed length a variable dimension
in the resulting messy GA was achieved by allowing strings not to
contain full gene complement (underspecification) and redundant or even
contradictionary genes (overspecification).
Koza ( 1992, 1994) used rooted point-labeled trees with ordered branches
(LISP expressions), thus having a genotype representing semantics very
well.
Lohmann ( 1992) circumvented the assignment problem using so-called
structural evolution. The basic idea of structural evolution is the separation
of structural and nonstructural parameters, thus leading to a two-level ES:
a multipopulation ES using isolation. While on the level of each population
a parameter optimization, concerning a fixed structure, is carried out, on
the population level several isolated structures compete with each other. In
this way Lohmann was able to handle structural optimization problems with
variable dimension: the dimension of the structural parameter space does
not have to be constant. Since each ES itself worked on a fixed number
of nonstructural parameters (here a vector of reals) no problem occurred on
this level. On the structural level (population level) special genetic operators
and a special selection criterion were formulated. The criticism concerning
structural evolution definitively lies in the basic assumption that structural
and nonstructural parameters can always be separated. Surely, many mixedinteger variable-dimensional problems are not separable. Secondly, on the
structural level the well-known semantical problem exists, but was not discussed.
Schutz (1 994) totally omitted a discussion concerning semantical problems
arising from variable-length genotypes.
If the genotype is sufficiently prepared, problems (especially) concerning
recombination disappear, because the genetic operators may directly use the
tagging in order to construct interpretable individuals. Another important idea
when designing recombination operators for variable-length genotypes is pointed
out by Davidor (1991a). He suggests a matching of parameters according to
their genotypic character instead of to their genotypic position. Essentially, this
leads to a matching on the phenotypic, instead of the genotypic level. Generally,
Davidor points out:

In a complex string structure where the number, size and position of

TEAM LRN
the parameters has no rigid structure,
it is important that the crossover

326

Other operators
occurs between sites that control the same, or at least the most similar,
function in the phenotypic space.

In case of the (two-point) segregation crossover used in his robot trajectory

optimization problem, crossing sites were specified according to the proximity
of the end effector positions.

One may remark that many ideas concerning the use of gene duplication ;ind
deletion exist. Unfortunately, most thoughts have been extremely application
oriented, that is, not formulated generally enough. Probably the construction of
a formal frame will be very complicated in the fBce of the diversity of problems
and solutions.

References
Ackley D and Littman M 1991 Interactions between learning and evolution ArriJlciu[
Lfe I1 (Suntci Fe, N M , Fehruuq 1990) ed C Langton, C Taylor, D Farmer and S
Rasmussen (Redwood City, CA: Addison-Wesley) pp 487-509
-1994
A case for Lamarckian evolution Artijicicil Lfe 111 ed C Langton (Redwood
City, CA: Addison-Wesley) pp 3-10
Anderson R W 1995a Learning and evolution: a quantitative genetics approach J. Theor.
Biol. 175 89-101
-1
99% Genetic mechanisms underlying the Baldwin effect are evident in natural
antibodies Proc. 4th A m . COY$on Esolutionrzn' Progrutnming (Sun Diego, CA,
March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge, MA:
MIT Press) pp 547-63
-1996a How adaptive antibodies facilitate the evolution of natural antibodies
Immunol. Cell Biology 74 286-9 I
-1996b Random-walk learning: a neurobiological correlate to trial-and-error Prog.
Neurul Nemvrk.5 at press
Back T 1996 E\dutioticin Algorithms in Theon cirid Prac-tic-e (New York: Oxford
University Press)
Balakrishnan K and V Honavar I995 Ewlutionury Desigri of Neurul Architectiires:
(1 Prelimitiury k o n o t n y und Guide to Literature Artificial Intelligence Research
Group, Department of Computer Science, Iowa State University, Technical Report
CS TR 95-01
Baldwin J M 1896 A new factor in evolution Am. Nuturulist 30 44 1-5 I
Belew R K 1989 When both individuals and populations search: adding simple learning
to the genetic algorithm Proc. 3rd Int. Con$ on Genetic Algorithms (Fuitjiix, VA,
Jirne 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 3 4 4 1
-1
990 Evolution, learning and culture: computational metaphors for adaptive search
Cot?lples Syst. 4 1 1-49
Bremermann H J and Anderson R W 1991 How the brain adjusts synapses-maybe
Aittomated Reasoning: E s s a y in Horior of Woody Bledsoe ed R S Boyer (New
TEAM LRN
York: Kluwer) pp 119-47

References

327

Bulmer M G 1985 The Mathematical Theoty of Quantitative Genetics (Oxford: Oxford

University Press) pp 150-2
Cecconi F, Menczer F and Belew R K 1995 Muturutiort arid the Etdution qf
Imitative Learnirzg in ArtiJicial Organisms Technical Report CSE 506, University
of California, San Diego; 1996 Adaptive Behuiior 4 at press
Charlesworth B 1993 The evolution of sex and recombination in a varying environment
J. Heredity 84 345-50
Conrad M 1993 Structuring adaptive surfaces for effective evolution Proc. 2nd Ann. Cur$
on Evolutionary Programming (San Diego, CA) ed D B Fogel and W Atmar (La
Jolla, CA: Evolutionary Programming Society) pp 1-1 0
Conrad M and Ebeling W 1992 M V Volkenstein, evolutionary thinking and the structure
of fitness landscapes BioSystems 27 125-8
Davidor Y 199 I a Genetic Algorithms and Robotics. A Heuristic Strategyfor 0ptimi:utiorr
(World ScietztiJc Series in Robotics and Automated Systems I ) (Singapore: World
Scientific)
-I
991 b A genetic algorithm applied to robot trajectory generation Hutidbook of
Genetic Algorithms ed L Davis (New York: Van Nostrand Reinhold) ch 12, pp 14465
Davis L (ed) 1991 Haizdbook ofGerzetic Algorithms (New York: Van Nostrand Reinhold)
Fogel D B 1988 An evolutionary approach to the traveling salesman problem Biological
Cybern. 60 139-44
-1993 Applying evolutionary programming to selected traveling salesman problems
Cybern. Syst. 24 27-36
Fogel L J, Owens A J and Walsh M J 1966 Artijcial Intelligerzc~through Simulated
Evolution (New York: Wiley)
Fontanari J F and Meir R 1990 The effect of learning on the evolution of asexual
populations Complex Syst. 4 40 1- 14
French R and Messinger A 1994 Genes, phenes and the Baldwin effect Artifcial Lijk IV
(Cambridge, MA, July 1994) ed R A Brooks and P Maes (Cambridge, MA: MIT
Press) pp 277-82
Futuyma D J 1986 Evolutionary Biology (Sunderland, MA: Sinauer)
Gehlhaar D K, Verkhivker G, Rejto P A, Fogel D B, Fogel L J and Freer S T
1995 Docking conformationally flexible small molecules into a protein binding
site through evolutionary programming Proc. 4th Ann. Col$ O ~ IE\dutionury
Progrumming (San Diego, CA, March 1995) ed J R McDonnell, R G Reynolds
and D B Fogel (Cambridge, MA: MIT Press) pp 615-27
Goldberg D E, Deb K and Korb B 1990 Messy genetic algorithms revisited: studies in
mixed size and scale Complex Syst. 4
Goldberg D E, Korb B and Deb K 1989 Messy genetic algorithms: motivation, analysis,
and first results Complex Syst. 3
Grefenstette J J, Gopal R, Rosmaita B and Van Gucht D 1985 Genetic algorithms for
the traveling salesman problem Proc. Int. Con5 0 1 1 Genetic Algorithriis arid their
Applications ed J J Grefenstette (Erlbaum) pp 160-6
Harp S A and Samad T 1991 Genetic synthesis of neural network architecture Haiidhook
of Genetic A1gorithm.s ed L Davis (New York: Van Nostrand Reinhold) ch 15,
pp 202-21
Harvey I 1993 The Artijcial Evolution ofAdaptive Behuviour Master's Thesis, University
TEAM LRN
of Sussex

328

Other operators

Hightower R, Forrest S and Perelson A 1996 The Baldwin effect in the immune system:
learning by somatic hypermutation Aduptirte Itidiriducils in E r d r i n g Populcition.\:
Models Lind Algorithms ed R K Belew and M Mitchell (Reading, MA: AddisonWesley) at press
Hiriton G E and Nowlan S J 1987 How learning can guide evolution Comp1e.x Syst. 1
495-502
Holland J H 1975 A~lciptcitioti in Niitiird r i d Artlficid S y s t e m (Ann Arbor, M I :
University of Michigan Press)
Kost B I993 Strict-tiircil Design iiti Er~)lutionStrcrtegic>s Internal Report, Department of
Bionics and Evolution Technique, Technical University of Berlin
Koza J R 1992 Genetic Progrcimniing (Cambridge, MA: MIT Press)
-1
994 Genetic Progrcltnmittg lI (Cambridge, MA: MIT Press)
Lin S and Kernighan B W 1976 An effective heuristic for the traveling salesman problem
Opercit. Res. 21 498-5 16
Lohmann R 1992 Structure evolution and incomplete induction Pcircillel Problem Solrilzg
from Nature. 2 (Proc. 2nd lnt. Cot$ on Peirullel Probletit Soh9ing from Nciticm,
Briissel\, 1992) ed R Manner and B Manderick (Amsterdam: Elsevier) pp 1 7 5 4 5
Manner R and Manderick B (ed5) 1992 Parctllel Problem Sohing jrom Nritiire, 2 (Proc..
2nd Int. Cot$ on Prirrillel Problem Sohirig from Nutitre, Br14.\,\d.\, 1992) ed R
Manner and B Manderick (Amsterdam: Elsevier)
Maynard Smith J 1978 The E\diction of Sex (Cambridge: Cambridge Univer\ity Pres,)
_- 1987 When learning guides evolution Ncitirre 329 76 1-2
Michalewicz Z I992 Genetic Algorithms
Dutci S t r i ~ t u r e s = E\dittioii Progrcirns
(Berlin: Springer)
Milstein C 1990 The Croonian lecture 1989 Antibodies: a paradigm for the biology of
molecular recognition Proc. R. Soc. B 239 1-16
Mitchell M and Belew R K 1995 Preface to G E Hinion and S J Nowlan How learning
can guide evolution A@ t i rte Int li \licluci Is it z E\ vol ring Pop it lu tions : Moclels ( I tzd
Algorithtns ed R K Belew and M Mitchell (Reading, MA: Addison-Wedey)
Morgan C L 1896 On modification and variation Science 4 733-40
Osborn H F 1896 Ontogenic and phylogenic variation Science 4 786-9
Nolfi S. Elman J L and Parisi D 1094 Learning and evolution in neural network5 Adcipt/\-e
BeCiurior 3 5-28
Paechter B, Cumming A, Norman M and Luchian H 1995 Extensions to a memetic
timetabling sy\tem Proc. 1st Int. Conf: oiz the t'rcictice cirzd Tlzeon cf Auto,ncired
fimetublirzg (ICPTAT 95) (Edinburgh, 1995)
Parisi D, Nolfi S and Cecconi F 1991 Learning, behavior, and evolution finrvird c1 Prcictrce
of Aittonotnoits S y s t e m (Proc. 1st Eur. Conj: on Artijicicil L f e (Puris, 1991)) ed F
J Varela and P Bourgine (Cambridge, MA: MIT Press)
Rechenberg I 1973 Ei~olittioiis.\trciteRie: Optinzieritrig Technisctier Systerne n d i
Principien der Biologi.\c*hrtz Erdirtion (Stuttgart: Fromman-HolLboog)
Saravanan N, Fogel D B and Nelson K M 1995 A comparison of methods for selfadaptation in evolutionary algorithms BioSystern\ 36 157-66
Scheiner S M 1993 Genetic\ and evolution of phenotypic plasticity Ann. Re\-. c o l .
Sv\temcit. 24 35-68
S c h Ut7 M I 994 Eirie Er v lu tions J trcrtegiufiir getni.sc.ht-~citz=~'cihligr
O~~tirrrirri~ng.~pro
blerne
LRN
Thesis,
University of Dortmund
rnit rwicihler Ditnen.\ioti DiplomaTEAM

Schwefel H P 1968 Projekt MHD-Staustrahlrohr: Experimentelle Optimierung einer

ZHteiphasendiise Teil I Technischer Bericht 1 1.034/68, 35, AEG Forschungsinstitut,
Berlin
Sober E 1994 The adaptive advantage of learning and a priori prejudice From a
Biological Point of Vieu,: Essay in Evolutionary Philosophy (a collection of essays
by E Sober) (Cambridge: Cambridge University Press) pp 50-70
Stearns S C 1989 The evolutionary significance of phenotypic plasticity-phenotypic
sources of variation among organisms can be described by developmental switches
and reaction norms Bioscience 39 4 3 6 4 5
Stephens D W 1993 Learning and behavioral ecology: incomplete information
and environmental predictability Insect Learning. Ecological and E,dutionan
Perspectives ed D R Papaj and A C Lewis (New York: Chapman and Hall) ch 8,
pp 195-218
Turney P D I995 Cost-sensitive classification: empirical evaluation of a hybrid genetic
decision tree induction algorithm J. ArtiJcial Intell. Res. 2 3 6 9 4 0 9
-1996
How to shift bias: lessons from the Baldwin effect E\dutionury Coinput. at
press
Turney P D, Whitley D and Anderson R W (eds) 1996 Special issue on evolution,
learning, and instinct: 100 years of the Baldwin effect E\dutionan, Cornput. at
press
Unemi T, Nagayoshi M, Hirayama N, Nade T, Yano K and Masujima Y 1994
Evolutionary differentiation of learning abilities-a case study on optimizing
parameter values in Q-learning by a genetic algorithm Artifciul Life IV ( J u l y 1994)
ed R A Brooks and P Maes (Cambridge, MA: MIT Press) pp 331-6
Via S 1993 Adaptive phenotypic plasticity: target or by-product of selection in a variable
environment? Am. Naturalist 142 352-65
Waddington C H 1942 Canalization of development and the inheritance of acquired
characters Nature 150 563-5
Wcislo W T 1989 Behavioral environments and evolutionary change Ann. Re\l. Ecol.
S y ~ t e ~ ~20
~ a137-69
t.
West-Eberhard M J 1989 Phenotypic plasticity and the origins of diversity Ann. Re\-.
Ecol. Systemat. 20 249-78
Whitley D and Gruau F 1993 Adding learning to the cellular development of neural
networks: evolution and the Baldwin effect Evolutionav Cornput. 1 2 13-33
Whitley D, Gordon S and Mathias K 1994 Lamarckian evolution, the Baldwin effect
and function optimization Parallel Problem Soliing from Nature-PPSN 111 (Proc.
h i t . Corzf: on Evolutionary Computation and 3rd Col$ on Parallel Problem Sol\ing
from Nuture, Jerusalem, October 1994) (Lecture Notes in Computer Science 866)
ed Yu Davidor, H P Schwefel and R Manner (Berlin: Springer) pp 6 1 5
Wright S I93 1 Evolution in Mendelian populations Genetics 16 97-1 59

Further reading
More extensive treatments of issues related to the Baldwin effect can be found
in the literature cited in section C3.4.1. The following are notable foundation
TEAM LRN
and review papers.

330

Other oDerators

1. Anderson R W 1 9 9 5 Learning and evolution: a quantitative genetics approach J.

Theor. Biol. 175 89- l0 I

2. Balakrishnan K and V Honavar 1995 Evolutionary Design ojNeuru1 Architectureb:

U Preliminary Tuonorny und Guide to Literature Artificial Intelligence Research
Group, Department of Computer Science, Iowa State University, Technical Report
CS TR 95-01

3. Baldwin J M 1896 A new factor in evolution Ant. Naturalist 30 44 1-5 1

4. Belew R K 1989 When both individuals and pvpulations search: adding simple
learning to the genetic algorithm Proc. 3rd Int. Cot$ on Genetic Algorithms (Fai$ux,
VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 34-41
5. Hinton G E and Nowla Hinton G E and S J Nowlan 1987 How learning can guide
evolution Complex Syst. 1 495-502
6. Morgan C L 1896 On modification and variation Science 4 7 3 3 4 0

7. Sober E 1994 The adaptive advantage of learning and a priori prejudice From U
Biologicd Point of Vieuv: Essay in Evolutionary Philosophy (a collection of essays
by E Sober) (Cambridge: Cambridge University Press) pp 50-70

8. Turney P D 1995 Cost-sensitive classification: empirical evaluation of a hybrid

genetic decision tree induction algorithm J. At-tiJicial lntell. Res. 2 369-409
9. Turney P D, Whitley D and Anderson R W (eds) 1996 Special Issue on evolution,
learning, and instinct: 100 years of the Baldwin effect ELdictionan Comput. at
press
10. Waddington C H 1942 Canalization of development and the inheritance of acquired
characters Nature 150 563-5
1 1 . Wcislo W T 1989 Behavioral environments and evolutionary change Ann. R,:LI.

Ecol. Systemat. 20 137-69

12. Whitley D and F Gruau I993 Adding learning to the cellular development of neural
networks: evolution and the Baldwin effect Eitolutionary Cumput. 1 2 13-33

TEAM LRN

Index, Volume 1
A
Actuator placement on space structures
7
Adaptation 37
Adaptation in Natural and Artijicial
Systems (book) 46
Adaptive behavior 110
Adaptive landscape 36
Adaptive mutation 70
Adaptive Systems Workshop 46
Adaptive topography 24
Air combat maneuvering 9
Airborne pollution 8
Aircraft design 7
Alleles 64, 70, 164, 209, 263, 310, 31 1
Amino acids 33
Animats 1 10
ARGOT 77, 146
Arithmetic crossover 272
Artificial intelligence (AI) 90, 97, 189,
32 1
Artificial life (AL) 2, 321
Artificial neural networks. See Neural
networks
Autoadaptation 43
Automatic control 43, 94
Automatic programming 40
Automatically defined functions (ADFs)
110, 158
Autonomous vehicle controller 8

Baldwin effect 308-17

in evolutionary biology 309-13
Beam search 189
Bellman optimality 116
Bent pipe 49
Bias 66
Bidding procedures 119
Binary representation 60
Binary strings 64, 69, 75, 76, 127, 128,
131-5, 237, 238, 256-70
Binary tournament selection I68

Binary vectors. See Binary strings

Biological systems 2, 256
Biology 7, 9
Biomorphs 228
Bit strings. See Binary strings
Bitwise crossover 73
Bitwise simulated crossover (BSC) 70
Bitwise uniform crossover 73
Boltzmann selection 174, 195-200, 202
mechanisms 195
Boltzmann selection operator 169
Boltzmann tournament selection (BTS)
196
Boltzmann trials 195
double acceptance/rejection 197
single acceptance/rejection 197
Boolean parse tree 249
Branch-and-bound techniques 130
Breeding strategies 45
Bucket brigade algorithm 1 17, I 18
Building block hypothesis (BBH) 70, 90

C
Canonical genetic algorithms I32
Cellular automata (CAs) 7
CHC algorithm 67
Chemistry 7
Chromatids 32
Chromosomes 27, 32, 33, 35, 64
Classification, applications 9, I0
Classifier systems (CFS) 2, 46
Clonal selection theory 37
Combinatorial problems (CES) 76
Combined representations 130, 131
Competitive selection 203
Compress mutation operation 158
Computer-generated force (CGF) 99
Computer programs 103, 109, 1 10
Computer simulation 1
Constant learning 314, 315
Constant-velocity environments 3 15
Control applications 8
Control systems 98

TEAM LRN

33 1

332
Convergence-controlled variation (CCV)
70, 75
Convergence-controlled variation
hypothesis (CCVH) 70. 71
Convergence rate 240, 241
theory 49
Corridor model 241
Creeping random search method 50
Criminal suspects 8
Critical learning period model 315
Crossover 64.65, 68-75, 235
bias 268, 269
in tetrad 32
mathematical characterizations 261,
262
mechanisms 256-6 1
one-point 69
points 257
probability 260
rate 257, 258
two-point 69
uniform 69
Crossover operators 76, 132. 238, 257,
258
characterizations 36 1
Cyanide production 3 1
Cyanogenic hybrid 3 I
Cycle crossover I45

Indes
Discount parameter 1 16
Discrete recombination 270
Disruption analysis 264
Disruptive selection 202
Distribution bias 269
Diversity 192
DNA (deoxyribonucleic acid) 33
Document retrieval 9
Domain-specific knowledge 3 18
Double helix 33
Drift 36, 46, 209
Drug design 98
Dynamic programming (DP) I16

Economics 7, 9
interaction modeling 7
Electrornagnetics 8
Elitist strategy 66, 210
Embryonic development 1 10
Encapsulate operator 158
Endosymbiotic systems 37
Engineering applications 7
Enzymes 33
Epistasis 31, 32
Equivalence 2 14- 18
Euclidean search spaces 81
Eukaryotic cell 33
Evaluations 89
D
noise 2 2 2 4
Darwinian evolution 89, 309
Evolution and learning 308, 309
Darwinism 27
Evolutionary algorithms (EAs) 6, 7,
Deception 72
20-2, 318
Deceptive functions 147, I49
admissible 191
Deceptive problems 72, 73
basic 59
Decision making, multicriterion. See
Boltzmann 195
Multicriterion decision making
common properties 59
Decision variables I27
computational power 320
Defining length 265
development 4 1
Delta coding 77
general outline 59
Deoxyribonucleic acid (DNA) 33
mainstream instances 59
Derivative methods 103-13
strict 19 1
Design applications 6, 7
theory 40, 41
Deterministic hill climbing 1
.SPY d.so specific types and
Dihybrid ratio 31
applications
Dimenxionality 2 1
Evolutionary computation (EC)
Diplodic representation 25 I
advantages (and disadvantages) 20--2
Diploid 27, 35
applications 4- 19
TEAM LRN consensus for name 41
Diploid representations 164

Index

333

discussion 3
population-based 83
history 40-58
steady-state 83
use of term 1
two-membered 38, 50
Eidutioiicrn' Computcition (journal) 47
Exons 33
Evolutionary game theory 37
Expected, infinite-horizon discounted
Evolutionary operation (EVOP) 40
cost 116
Evolutionary processes 37
Expression process 23 1
overview 23-6
Extradimensional bypass thesis 320
principles of 23-6
Evolutionary programming (EP) 1, 60,
136, 163, 167, 217, 218
F
basic concepts 89- 102
Fault diagnosis 7
basic paradigm 94
Feedback networks 97
continuous 95
Feedforward networks 97
convergence properties 100
Fertility factor 192
current directions 97-100
Fertility rate 192
diversification 44
Filters, design 6
early foundations 9 2 4
Financial decision making 9
early versions 95
Finite impulse response (FIR) filters 6
extensions 94-7
Finite-length alphabet 9 1
future research 100
Finite-state machines 33, 60, 9 I , 92, 95,
genesis 90
129, 134. 152, 153. 162, 236-8
history 40, 41, 90-7
Finite-state representations IS 1-4
main components 89
applications 1 5 2 3
main variants of basic paradigm 95
Fitness-hascd scan 273
medical applications 98
Fitness criterion 228
original 95, 96
Fitness evaluation 108
original definition 91
Fitness function 178
overview 40, 41
Fitness functions 172-5
self-adaptive 95, 96
monotonic 190, 191
standard form 89
scaling 66
v. GAS 90
strictly inonotonic 190, I9 I
Evolutionary robotics, see a l s o Robots
Fitness landscapes 229, 308, 31 1
Evolutionary strategies (ESs) 1, 48-5 1,
Fitness measure 235
60, 64, 81-8, 136, 163
Fitness proportional selection (FPS) 2 18
(1
I ) 48, 83
Fitness scaling 174. 175, 187
( 1 + A ) 48
Fitness values 59, 63, 66
(p
1 ) 83
Fixed selection 3 13, 3 15
( p A ) 48, 67, 83, 86, 167, 169,
Flat plate 48
189, 206, 210, 217, 220, 224, 230
Fouirclcttioris of' Getirtic Alpwithitis
( p , A) 189, 206, 210, 220, 222, 231
(FOGA) (workshop) 47
( P CL) 170
Functions 103, 105
alternative method to control internal
Fundamental theorem of genetic
parameters 86
algorithms 177
archetype 81, 82
Fuzzy logic systems 163
contemporary 83-6, 85
Fuzzy neural networks 97
development 40
Fuzzy systems 33
multimembered ( p > I ) 48
nested 86, 87
overview 48-5 1
TEAM LRN

Inde 5

334
G
Game playing 9
programs 45
Game theory 98
Gametes 27
Gametogenesis 27
Gaming 43
GauB-Seidel-like optimization strategy
87
Gaussian distribution 242
Gaussian mutations 24 1
Gaussian selection 3 I3
Geiringers theorem I1 263, 264
Gene duplication and deletion 3 I9
basic motivations 3 19
engineering applications 3 19, 320
formal description 32 1-3
historical review 3 19-2 I
Gene flow 36
Gene frequencies 36
Generation gap methods 205-1 1
historical perspective 206
Generational EAs 207
Generational models 2 16
Generic control problem 1 14
Genes 30, 33, 34, 64, 310
segregating independently 30
GENESIS 31 I
Genetic algorithms (GAS) 1. 2, 59, 60,
64-80, 103, 136, 155, 167
basics 65-8
breeder 67
canonical 64
generational 67
history 40-58
implementation 46, 65
messy 72, 73, 164
operation 70
overview 44-8, 64-80
pseudocode 65
steady-state 67
v. EP 90
see ctlso specific applications
Genetic drift 36, 46, 209
Genetic operators 65, 106-8, 110
Genetic Program Builder (GLIB) 158
Genetic programming (GP) 23, 60, 156,
167
defined 103-8

development 108, 109

functions 106
fundamental concepts 103-8
initialization 106
specialized representation as
executable program 104-6
value 109, 110
Genetics
fundamental concepts 27-33
principles of 27-39
GENITOR system 67, 189, 209
Genomic DNA 35
Genotypes 23, 64, 231, 232
Geomecrical crossover 273
Global convergence 199. 200
Gradient descent 1
Gray code 76, 133
Gray-coded strings 128
Grow mutation operator 248

H
Hamiltonian circuit 139
Hamming cliffs 128
Hamming distance 75, 133, 148
Haploid 27, 35
Hardy--Weinbeg theorem 35
Heating, ventilation and air conditioning
( HVAC) controllers 98
Heterozygote 30
Heuristic crossover 273
H-infinity optimal controllers 8
Hinton and Nowlans model 310, 31 I .
312
Hitchhiking effects 7 1
Homozygotes 30
Hybrid algorithms 308-1 1
Hybridizations 60
Hydrodynamics 8 I
Hyerplanes 72, 177, 178, 179
analysis 69

Identification applications 7, 8
IEEE World Cotigress on Cotnputatioiiul
Intelligence ( WCCI) 4 1
Image processing 9
applications 44
Implicit parallelism 134, 190, 191
TEAM LRN
Incremental models 2 17, 220-2

Index
Inductive bias 268
Infinite impulse response (IIR) filters 6
Information retrieval (IR) systems 9
Information storage 9
Inheritance systems 28, 29, 37
Initialization 74, 89
Insert operator 245, 246
Intelligent behavior 42
Interactive evolution (IE) 228-34
application areas 231, 232
approach 229-3 1
difficulties encountered 23 1
formulation of algorithm 230
further developments and
perspectives 232, 233
history and prospects 228, 229
minimum requirement 228
overview 231
problem definition 229
samples of evolved objects 233
selection 229
standard system 229
Interceptor 94
Interdisciplinary Workshop in Adaptive
Systems 46
Intermediate recombination 85
Interrratiortal Coi2ferenc-e on Gerzetic
Algorithms (ICGA) 47
International Society for Genetic
Algorithms (ISGA) 47
Introns 163, 164
Inverse permutation 14 I , 142
Inversion 145
Inversion operator 77
Island models 36, 77

335
L
Lamarckian inheritance 308
Languages 60
Laplace-distributed mutation 240
Learning
and evolution 308, 309
as phenotypic variance 313. 314
Learning algorithms 308
Learning classifier systems (LCSs)
1 14-23
introduction 117, 118
Michigan approach 118
operational cycle 1 18
Pitt approach 1 I 8
stimulus response 1 18
structure 117
Learning models 316
Learning problems 1 14-1 8
Life cycle model 28
LINAC. See Linear accelerator
Linear accelerator, design 7
Linear-quadratic-Gaussian controllers
CQG) 8
Linear ranking 188, 21 5
Linguistics 9
Linkage, equilibrium 264
LISP 75, 108. 109, 128, 129
Local search (LS) 149

Machine intelligence 2
Machine learning (ML) problems 321
Mapping function I42
Markov decision problem 115
Mask 257
Mathematical analyses 44
Job shop scheduling (JSS) 5
Matrix representations 143-5
Joint plate 48
Maximum independent set problem 132
Juxtapositional phase 74
Medical applications 98
Medicine 44
Meiosis 27, 32
Memory cache replacement policies 5
Mendel, Gregor 28
k-ary alphabet 134
Mendelian experiment 28, 29, 30
Knapsack problems (KP) 6, 132
Knowledge-augmented operators 3 17-1 9
Mendelian inheritance 261
Mendelian ratios 31
Kohonen feature map design 6
Messenger RNA (mRNA) 33, 34
Messy GA 251
TEAM LRN

336
Messy genetic algorithms (mGAs) 72,
73, I64
Metropolis algorithm 196
Military applications 9
Minimax optimization 87
Mi x ed - integer opt i mi zat i on 87
Mixed-integer representation 163
Mixing events 268
Monohybrid ratio 30
Monte Carlo (MC) generators 7
Multicriterion problems 5 0
Multimodal objective functions 84
Multiple criterion decision making
(MCDM) 22
Multiple-input, multiple-output (MIMO)
model 98, 99
Multipoint crossover 266
Multiprocessor system 5
Multivariate Lero-mean Guassian
random variable 137
Mutation 23. 35, 42, 59, 60, 61, 68-75,
89, 108, 152, 228, 237-55
?-opt, .?-opt and k-opt 244, 245
function 239
optimum standard deviations 24 1
successful 2 1 I
Mutation function 3 I4
Mutation operators 84,92, 93. 125, 132,
158. 237-5s
Mutation-selection equilibrium variance
3 14
Mutations 36

Inde?
Nonlinear optimimtion problems I
Nonlinear ranking 188. 189, 2 I5
Nonlinear ranking selection 202. 203
Nonoverlapping populations 205
Nonoverlapping systems 208
Nonregressive evolution 43
Nonuniforni mutation 243
n-point crossover 259. 266

0
Object parameters I36
Object variables 136-8
Objective functions 172, 173, 178
Objective values 192
Offspring machines 42, 152
One-point crossover 258, 265, 266, 271
Online planning/navigating 5
Operator descriptions 149
Operon system 34
Optical character recognition (OCR) 9
Optimimtion methods I , 2, 89, 160
Optimum convergence 24 1
Order-based mutation 246
Ordering schemata (o-schemata) 146.
147
Oren-Luenberger class 85
Ovcrlapping populations 205, 208, 209,
2 10
Ovum 27

Packing problems 6
Pairwise implementation 70
N
Pairwise mating 70, 71. 75
Near misses 308
Parallel genetic algorithms (PGAs) 77
Neighborhood, model 77
PLIrct llel PmhIein Sol 1siirg j mm Ncrtii re
Neo-Darwinism 23, 24, 37
( P P S N ) (workshop) 41
Nesting technique 86
Parallel recombinative simulated
Network design problems 6
annealing (PRSA) 196, 197
Neural networks 6, 43, 44, 99, 163
parameters and their settings 197
design 97
pseudocode for common variation
training 43, 44
I97
Neutral molecular evolution theory 37
working mechanism 196, 197
Niching methods 50
Parameter optimiation 133
No-free-lunch (NFL) theorem 20, 2 I
Parameter settings 22
Nodes 104
Parasites 120, 121
Nondetermi nis t ic-polynomial- ti me ( NP)
Parse tree representation 155-9
complete problems 97
complex numerical function I57
TEAM LRN primitiite language 156, 157
Nondisruptive crossovers 267

Index

337

Parse trees 134, 248-50

Partially matched crossover (PMX)
operators I47
Pattern recognition 45
Payoff matrix 99
Penalty functions 95, 130
Permutations 75, 128, 129, 134, 139-50,
243-6
inverse 141, 142
mapping integers to 140
Pharmaceutical fermentation process
data 8
Phenotype table 31
Phenotypes 23, 31, 232, 313, 314
Pitt approach to rule learning 47
Planning applications 4-6
Pleiotropy 23
Ploidy 35
Point mutation 35
Polygeny 23
Poolwise CCV algorithm 75
Poolwise mating 75
Poolwise methods 71
Poolwise schemes 70
Population models 2 14
Population parameters 228
Population size 198
Populations 35-7
Position-based mutation 246
Positional bias 269
Primordial phase 74
Prisoner's dilemma 98, 99, 153, 154
Probability density function (PDF) 239,
240
Proportional selection 64, 90, 167,
172-82
theory 176-80
Protein secondary-structure
determination 9
Protein segments 9
PRSA. See Parallel recombinative
simulated annealing
Pseudo-Boolean optimization problems
132, 134
Pseudocode 166-70
Punctuated crossover 260

Q
Q-learning 116

Q-values I16
Quantitative genetics models 3 13-16
Query construction 10

Random decisions 48
Random keys 146
Random mutation hill climbing
(RMHC) 243
Random program trees 106
Random search 1
Randomness 1
Rank-based selection 66, 169, 187-94.
209
overview 187, 188
theory 1 9 0 4
Ranking 187, 188, 215, 216, 218, 219
Real-valued parameters 75
Real-valued vectors 60, 128, 134,
136-8, 239-43, 270-4
Recombination 59, 60, 85, 106, 107,
152, 228, 256-307
dynamics 262, 263
events 257
formal analysis 261
Recombination bias 269
Recombination distributions 261, 264,
265, 270
Recombination events 265, 266, 269
Recombination-mutation-selection loop
61
Recurrence relations 263
Reduced surrogate recombination 26 1
Reinforcement learning problem 1 15.
I17
Replacement
biased v. unbiased strategy 67
selection 67
Representation 75-7, 127, 128, 228. 235
alternative 145, 146
guidelines for suitable encoding
160-2
importance of 128-30
nonstandard 163, 164
see c i l s o specific representations
Representations 127-3 1
Reproduction 66
Reproductive plans 45
TEAM LRN
Reproductive rate I92

338
Resource scheduling I39
REVOP 40
Ribonucleic acid (RNA) 33
Robbins equilibrium 264
Robots
applications 9
control systems 8
optimization applications of EP 43
path planning 5
.see also Evolutionary robotics
Robustness 1, 2, 20
Roulette wheel, sampling algorithm 175,
176
Route planning 43
Routing problems 4, 5, 97
Rule-based learning, Pitt approach to 47

Inder

Selection pressure 213, 215, 219, 220,

224
Selection probabilities 175
Selection process
steps involved 172, 187
see cilso specific processes
Selectionist theories 37
Selectike pressure 170, 171
Self-adaptation 43, 83, 84, 238, 241,
242, 248
by collective learning 50
procedures 138
Sequence prediction 43
Set covering problem (SCP) 132
Sex chromosomes 28
S-expressions 75
Shifting balance theory 36
Shrink mutation operator 248
S
Shuffle crossover 26 1
Sigma
scaling 174, 175
Sampling algorithms 175-7
SIMD (small instruction, multiple data)
Scaling 66
parallel machines 50
Scheduling 5
Simplex
crossover 273
Scheduling problems 5
Simplex design 40
Schema analysis 45
Simulated annealing (SA) 196, 197
Schema bias 270
Simulated evolution (SE) 43
Schema processing 44, 130, 134
Simulation applications 7
Schema theorem 134, 177, I78
Simulations 2 13, 2 14, 2 18-22
Schemata 69. 134, 265, 266, 269
Small instruction, multiple data (SIMD)
Scramble mutation operator 245, 246
parallel machines 50
Search operators 228
Soft brood selection 202
introduction 235, 236
Solutions 127, 128
Second-order schemata, transmission
Source apportionment problem 8
probabilities for 268
Speciation 36, 37
Segmented crossover 73
Sperm 27
Selection 24, 25, 37, 45, 59, 89
Sphere model 241
biasing 66, 67
Stack genetic programming 109
introduction 166-7 I
Staff rota 5
operators 166-7 1
Steady-state genetic algorithms 83, 207,
working mechanisms 166, 167
208
Selection algorithm
Steady-state selection 95
monotonic I9 I
Stepping stone model 36
strictly monotonic I9 1
Stimulus-response LCS 1 18
Selection differential 179, 192, 193
Stochastic universal sampling (SUS) 176
Selection intensity 184, 192, 213, 224
Strategy parameters 59, 137, I38
Selection mechanisms, comparison
Strength I18
2 12-27
Strict building block hypothesis (SBBH)
Selection methods 201-4
70, 71, 74, 75
analytic comparison 224, 225
Structural gene 34
Selection operators 60, I67
Substitution theorem 193, 194
TEAM LRN
Selection parameters 225

Index

339

Supersolution 168
Superstrings 199
Supervised learning 1 14, I 15
Swap mutation 245, 246
Switch mutation operator 248
System identification problems 8

T
Takeover time 179, 180, 182, 193
Target sampling rates 177, 190
Terminals 104
Termination criterion 6 1
Testing applications 7
Tetraploid 35
Three-dimensional convergent-divergent
nozzle 49
Threshold 189
Threshold selection 189, 190
Timetables 5
Tournament competition 97
Tournament selection 66, 18 1-6, 20 1,
202, 215
binary 168
concatenation of tournaments 183
formal description 182
loss of diversity 184
operator 168
parameter settings 182
properties 183-5
working mechanism 181, 182
Tournament size 181, 184
Transcription process 33, 34
Transfer RNA (tRNA) 34
Translation process 34
Transmission probabilities for
second-order schemata 268

Transportation problem 4, 5
Traveling salesman problem (TSP) 4,
43, 76, 97, 139, 143, 146, 147,
161, 162, 240, 245, 318
Tree-like hierarchies 75
Tree-structured genetic material 107
Tree-structured representation 104
Truncation selection 189, 217, 218
Two-dimensional foldable plate 8 1
Two-dimensional packing problems 6
Two-membered ES 48, 50
Two-phase nozzle optimization 49
Two-point crossover 267

U
Uncertainty 3
Unequal-area facility layout problem 6
Uniform crossover 73, 85, 259, 267
Uniform scan operator 271

V
Variable lifespan 203
Variable-length genotypes 3 19, 320
Variance 208, 209
Variation 89
Vehicle routing problem 4, 5
Virtual machine design task I05

W
Watson-Crick model 33

Z
Zero-one knapsack problems 6
Zygote 35

TEAM LRN