Introduction to Compilers
and Language Design
Second Edition

Prof. Douglas Thain


University of Notre Dame
Introduction to Compilers and Language Design
Copyright © 2020 Douglas Thain.
Paperback ISBN: 979-8-655-18026-0
Second edition.

Anyone is free to download and print the PDF edition of this book for personal use. Commercial distribution, printing, or reproduction without the author’s consent is expressly prohibited. All other rights are reserved.

You can find the latest version of the PDF edition, and purchase inexpensive hardcover copies, at http://compilerbook.org

Revision Date: January 15, 2021



For Lisa, William, Zachary, Emily, and Alia.


Contributions

I am grateful to the following people for their contributions to this book:

Andrew Litteken drafted the chapter on ARM assembly; Kevin Latimer drew the RegEx to NFA and the LR example figures; Benjamin Gunning fixed an error in LL(1) parse table construction; Tim Shaffer completed the detailed LR(1) example.
And the following people corrected typos:
Sakib Haque (27), John Westhoff (26), Emily Strout (26), Gonzalo Martinez (25), Daniel Kerrigan (24), Brian DuSell (23), Ryan Mackey (20), TJ Dasso (18), Nedim Mininovic (15), Noah Yoshida (14), Joseph Kimlinger (12), Nolan McShea (11), Jongsuh Lee (11), Kyle Weingartner (10), Andrew Litteken (9), Thomas Cane (9), Samuel Battalio (9), Stéphane Massou (8), Luis Prieb (7), William Diederich (7), Jonathan Xu (6), Gavin Inglis (6), Kathleen Capella (6), Edward Atkinson (6), Tanner Juedeman (5), John Johnson (4), Luke Siela (4), Francis Schickel (4), Eamon Marmion (3), Molly Zachlin (3), David Chiang (3), Jacob Mazur (3), Spencer King (2), Yaoxian Qu (2), Maria Aranguren (2), Patrick Lacher (2), Connor Higgins (2), Tango Gu (2), Andrew Syrmakesis (2), Horst von Brand (2), John Fox (2), Jamie Zhang (2), Benjamin Gunning (1), Charles Osborne (1), William Theisen (1), Jessica Cioffi (1), Ben Tovar (1), Ryan Michalec (1), Patrick Flynn (1), Clint Jeffery (1), Ralph Siemsen (1), John Quinn (1), Paul Brunts (1), Luke Wurl (1), Bruce Mardle (1), Dane Williams (1), Thomas Fisher (1), Alan Johnson (1), Jacob Harris (1), Jeff Clinton (1)
Please send any comments or corrections via email to Prof. Douglas
Thain ([email protected]).


Contents

1 Introduction 1
1.1 What is a compiler? . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Why should you study compilers? . . . . . . . . . . . . . . . 2
1.3 What’s the best way to learn about compilers? . . . . . . . . 2
1.4 What language should I use? . . . . . . . . . . . . . . . . . . 2
1.5 How is this book different from others? . . . . . . . . . . . . 3
1.6 What other books should I read? . . . . . . . . . . . . . . . . 4

2 A Quick Tour 5
2.1 The Compiler Toolchain . . . . . . . . . . . . . . . . . . . . . 5
2.2 Stages Within a Compiler . . . . . . . . . . . . . . . . . . . . 6
2.3 Example Compilation . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Scanning 11
3.1 Kinds of Tokens . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 A Hand-Made Scanner . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Finite Automata . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.1 Deterministic Finite Automata . . . . . . . . . . . . . 16
3.4.2 Nondeterministic Finite Automata . . . . . . . . . . . 17
3.5 Conversion Algorithms . . . . . . . . . . . . . . . . . . . . . . 19
3.5.1 Converting REs to NFAs . . . . . . . . . . . . . . . . . 19
3.5.2 Converting NFAs to DFAs . . . . . . . . . . . . . . . . 22
3.5.3 Minimizing DFAs . . . . . . . . . . . . . . . . . . . . . 24
3.6 Limits of Finite Automata . . . . . . . . . . . . . . . . . . . . 26
3.7 Using a Scanner Generator . . . . . . . . . . . . . . . . . . . . 26
3.8 Practical Considerations . . . . . . . . . . . . . . . . . . . . . 28
3.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.10 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4 Parsing 35
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Context Free Grammars . . . . . . . . . . . . . . . . . . . . . 36


4.2.1 Deriving Sentences . . . . . . . . . . . . . . . . . . . . 37


4.2.2 Ambiguous Grammars . . . . . . . . . . . . . . . . . . 38
4.3 LL Grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.1 Eliminating Left Recursion . . . . . . . . . . . . . . . 41
4.3.2 Eliminating Common Left Prefixes . . . . . . . . . . . 42
4.3.3 First and Follow Sets . . . . . . . . . . . . . . . . . . . 43
4.3.4 Recursive Descent Parsing . . . . . . . . . . . . . . . . 45
4.3.5 Table Driven Parsing . . . . . . . . . . . . . . . . . . . 47
4.4 LR Grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4.1 Shift-Reduce Parsing . . . . . . . . . . . . . . . . . . . 50
4.4.2 The LR(0) Automaton . . . . . . . . . . . . . . . . . . 51
4.4.3 SLR Parsing . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.4 LR(1) Parsing . . . . . . . . . . . . . . . . . . . . . . . 59
4.4.5 LALR Parsing . . . . . . . . . . . . . . . . . . . . . . . 62
4.5 Grammar Classes Revisited . . . . . . . . . . . . . . . . . . . 62
4.6 The Chomsky Hierarchy . . . . . . . . . . . . . . . . . . . . . 63
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5 Parsing in Practice 69
5.1 The Bison Parser Generator . . . . . . . . . . . . . . . . . . . 70
5.2 Expression Validator . . . . . . . . . . . . . . . . . . . . . . . 73
5.3 Expression Interpreter . . . . . . . . . . . . . . . . . . . . . . 74
5.4 Expression Trees . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 83

6 The Abstract Syntax Tree 85


6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.2 Declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.3 Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.4 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.5 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.6 Putting it All Together . . . . . . . . . . . . . . . . . . . . . . 95
6.7 Building the AST . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

7 Semantic Analysis 99
7.1 Overview of Type Systems . . . . . . . . . . . . . . . . . . . . 100
7.2 Designing a Type System . . . . . . . . . . . . . . . . . . . . . 103
7.3 The B-Minor Type System . . . . . . . . . . . . . . . . . . . . 106
7.4 The Symbol Table . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.5 Name Resolution . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.6 Implementing Type Checking . . . . . . . . . . . . . . . . . . 113
7.7 Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . 117


7.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118


7.9 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 118

8 Intermediate Representations 119


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.2 Abstract Syntax Tree . . . . . . . . . . . . . . . . . . . . . . . 119
8.3 Directed Acyclic Graph . . . . . . . . . . . . . . . . . . . . . . 120
8.4 Control Flow Graph . . . . . . . . . . . . . . . . . . . . . . . . 125
8.5 Static Single Assignment Form . . . . . . . . . . . . . . . . . 127
8.6 Linear IR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.7 Stack Machine IR . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.8 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.8.1 GIMPLE - GNU Simple Representation . . . . . . . . 130
8.8.2 LLVM - Low Level Virtual Machine . . . . . . . . . . 131
8.8.3 JVM - Java Virtual Machine . . . . . . . . . . . . . . . 132
8.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.10 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 134

9 Memory Organization 135


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.2 Logical Segmentation . . . . . . . . . . . . . . . . . . . . . . . 135
9.3 Heap Management . . . . . . . . . . . . . . . . . . . . . . . . 138
9.4 Stack Management . . . . . . . . . . . . . . . . . . . . . . . . 140
9.4.1 Stack Calling Convention . . . . . . . . . . . . . . . . 141
9.4.2 Register Calling Convention . . . . . . . . . . . . . . 142
9.5 Locating Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
9.6 Program Loading . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 148

10 Assembly Language 149


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
10.2 Open Source Assembler Tools . . . . . . . . . . . . . . . . . . 150
10.3 X86 Assembly Language . . . . . . . . . . . . . . . . . . . . . 152
10.3.1 Registers and Data Types . . . . . . . . . . . . . . . . 152
10.3.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . 154
10.3.3 Basic Arithmetic . . . . . . . . . . . . . . . . . . . . . 156
10.3.4 Comparisons and Jumps . . . . . . . . . . . . . . . . . 158
10.3.5 The Stack . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.3.6 Calling a Function . . . . . . . . . . . . . . . . . . . . 160
10.3.7 Defining a Leaf Function . . . . . . . . . . . . . . . . . 162
10.3.8 Defining a Complex Function . . . . . . . . . . . . . . 163
10.4 ARM Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . 167
10.4.1 Registers and Data Types . . . . . . . . . . . . . . . . 167
10.4.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . 168
10.4.3 Basic Arithmetic . . . . . . . . . . . . . . . . . . . . . 170


10.4.4 Comparisons and Branches . . . . . . . . . . . . . . . 171


10.4.5 The Stack . . . . . . . . . . . . . . . . . . . . . . . . . 173
10.4.6 Calling a Function . . . . . . . . . . . . . . . . . . . . 174
10.4.7 Defining a Leaf Function . . . . . . . . . . . . . . . . . 175
10.4.8 Defining a Complex Function . . . . . . . . . . . . . . 176
10.4.9 64-bit Differences . . . . . . . . . . . . . . . . . . . . . 179
10.5 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 180

11 Code Generation 181


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
11.2 Supporting Functions . . . . . . . . . . . . . . . . . . . . . . . 181
11.3 Generating Expressions . . . . . . . . . . . . . . . . . . . . . 183
11.4 Generating Statements . . . . . . . . . . . . . . . . . . . . . . 188
11.5 Conditional Expressions . . . . . . . . . . . . . . . . . . . . . 192
11.6 Generating Declarations . . . . . . . . . . . . . . . . . . . . . 193
11.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

12 Optimization 195
12.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
12.2 Optimization in Perspective . . . . . . . . . . . . . . . . . . . 196
12.3 High Level Optimizations . . . . . . . . . . . . . . . . . . . . 197
12.3.1 Constant Folding . . . . . . . . . . . . . . . . . . . . . 197
12.3.2 Strength Reduction . . . . . . . . . . . . . . . . . . . . 199
12.3.3 Loop Unrolling . . . . . . . . . . . . . . . . . . . . . . 199
12.3.4 Code Hoisting . . . . . . . . . . . . . . . . . . . . . . . 200
12.3.5 Function Inlining . . . . . . . . . . . . . . . . . . . . . 201
12.3.6 Dead Code Detection and Elimination . . . . . . . . . 202
12.4 Low-Level Optimizations . . . . . . . . . . . . . . . . . . . . 204
12.4.1 Peephole Optimizations . . . . . . . . . . . . . . . . . 204
12.4.2 Instruction Selection . . . . . . . . . . . . . . . . . . . 204
12.5 Register Allocation . . . . . . . . . . . . . . . . . . . . . . . . 207
12.5.1 Safety of Register Allocation . . . . . . . . . . . . . . 208
12.5.2 Priority of Register Allocation . . . . . . . . . . . . . . 208
12.5.3 Conflicts Between Variables . . . . . . . . . . . . . . . 209
12.5.4 Global Register Allocation . . . . . . . . . . . . . . . . 210
12.6 Optimization Pitfalls . . . . . . . . . . . . . . . . . . . . . . . 211
12.7 Optimization Interactions . . . . . . . . . . . . . . . . . . . . 212
12.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
12.9 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . 215

A Sample Course Project 217


A.1 Scanner Assignment . . . . . . . . . . . . . . . . . . . . . . . 217
A.2 Parser Assignment . . . . . . . . . . . . . . . . . . . . . . . . 217
A.3 Pretty-Printer Assignment . . . . . . . . . . . . . . . . . . . . 218
A.4 Typechecker Assignment . . . . . . . . . . . . . . . . . . . . . 218


A.5 Optional: Intermediate Representation . . . . . . . . . . . . . 218


A.6 Code Generator Assignment . . . . . . . . . . . . . . . . . . . 218
A.7 Optional: Extend the Language . . . . . . . . . . . . . . . . . 219

B The B-Minor Language 221


B.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
B.2 Tokens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
B.3 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
B.4 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
B.5 Declarations and Statements . . . . . . . . . . . . . . . . . . . 224
B.6 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
B.7 Optional Elements . . . . . . . . . . . . . . . . . . . . . . . . 225

C Coding Conventions 227

Index 229



List of Figures

2.1 A Typical Compiler Toolchain . . . . . . . . . . . . . . . . . . 5


2.2 The Stages of a Unix Compiler . . . . . . . . . . . . . . . . . 6
2.3 Example AST . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Example Intermediate Representation . . . . . . . . . . . . . 9
2.5 Example Assembly Code . . . . . . . . . . . . . . . . . . . . . 10

3.1 A Simple Hand Made Scanner . . . . . . . . . . . . . . . . . . 12


3.2 Relationship Between REs, NFAs, and DFAs . . . . . . . . . 19
3.3 Subset Construction Algorithm . . . . . . . . . . . . . . . . . 22
3.4 Converting an NFA to a DFA via Subset Construction . . . . 23
3.5 Hopcroft’s DFA Minimization Algorithm . . . . . . . . . . . 24
3.6 Structure of a Flex File . . . . . . . . . . . . . . . . . . . . . . 27
3.7 Example Flex Specification . . . . . . . . . . . . . . . . . . . . 29
3.8 Example Main Program . . . . . . . . . . . . . . . . . . . . . 29
3.9 Example Token Enumeration . . . . . . . . . . . . . . . . . . 30
3.10 Build Procedure for a Flex Program . . . . . . . . . . . . . . . 30

4.1 Two Derivations of the Same Sentence . . . . . . . . . . . . . 38


4.2 A Recursive-Descent Parser . . . . . . . . . . . . . . . . . . . 46
4.3 LR(0) Automaton for Grammar G10 . . . . . . . . . . . . . . 53
4.4 SLR Parse Table for Grammar G10 . . . . . . . . . . . . . . . 56
4.5 Part of LR(0) Automaton for Grammar G11 . . . . . . . . . . 58
4.6 LR(1) Automaton for Grammar G10 . . . . . . . . . . . . . . 61
4.7 The Chomsky Hierarchy . . . . . . . . . . . . . . . . . . . . . 64

5.1 Bison Specification for Expression Validator . . . . . . . . . . 71


5.2 Main Program for Expression Validator . . . . . . . . . . . . 72
5.3 Build Procedure for Bison and Flex Together . . . . . . . . . 72
5.4 Bison Specification for an Interpreter . . . . . . . . . . . . . . 75
5.5 AST for Expression Interpreter . . . . . . . . . . . . . . . . . 76
5.6 Building an AST for the Expression Grammar . . . . . . . . . 78
5.7 Evaluating Expressions . . . . . . . . . . . . . . . . . . . . . . 80
5.8 Printing and Evaluating Expressions . . . . . . . . . . . . . . 81

7.1 The Symbol Structure . . . . . . . . . . . . . . . . . . . . . . . 107


7.2 A Nested Symbol Table . . . . . . . . . . . . . . . . . . . . . . 109


7.3 Symbol Table API . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.4 Name Resolution for Declarations . . . . . . . . . . . . . . . 112
7.5 Name Resolution for Expressions . . . . . . . . . . . . . . . . 112

8.1 Sample DAG Data Structure Definition . . . . . . . . . . . . 120


8.2 Example of Constant Folding . . . . . . . . . . . . . . . . . . 125
8.3 Example Control Flow Graph . . . . . . . . . . . . . . . . . . 126

9.1 Flat Memory Model . . . . . . . . . . . . . . . . . . . . . . . . 135


9.2 Logical Segments . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.3 Multiprogrammed Memory Layout . . . . . . . . . . . . . . 137

10.1 X86 Register Structure . . . . . . . . . . . . . . . . . . . . . . 153


10.2 X86 Register Structure . . . . . . . . . . . . . . . . . . . . . . 154
10.3 Summary of System V ABI Calling Convention . . . . . . . . 160
10.4 System V ABI Register Assignments . . . . . . . . . . . . . . 162
10.5 Example X86-64 Stack Layout . . . . . . . . . . . . . . . . . . 164
10.6 Complete X86 Example . . . . . . . . . . . . . . . . . . . . . . 166
10.7 ARM Addressing Modes . . . . . . . . . . . . . . . . . . . . . 169
10.8 ARM Branch Instructions . . . . . . . . . . . . . . . . . . . . 172
10.9 Summary of ARM Calling Convention . . . . . . . . . . . . . 174
10.10 ARM Register Assignments . . . . . . . . . . . . . . . . . . 175
10.11 Example ARM Stack Frame . . . . . . . . . . . . . . . . . . 177
10.12 Complete ARM Example . . . . . . . . . . . . . . . . . . . . 178

11.1 Code Generation Functions . . . . . . . . . . . . . . . . . . . 182


11.2 Example of Generating X86 Code from a DAG . . . . . . . . 184
11.3 Expression Generation Skeleton . . . . . . . . . . . . . . . . . 186
11.4 Generating Code for a Function Call . . . . . . . . . . . . . . 187
11.5 Statement Generator Skeleton . . . . . . . . . . . . . . . . . . 188

12.1 Timing a Fast Operation . . . . . . . . . . . . . . . . . . . . . 197


12.2 Constant Folding Pseudo-Code . . . . . . . . . . . . . . . . . 198
12.3 Example X86 Instruction Templates . . . . . . . . . . . . . . . 206
12.4 Example of Tree Rewriting . . . . . . . . . . . . . . . . . . . . 207
12.5 Live Ranges and Register Conflict Graph . . . . . . . . . . . 210
12.6 Example of Global Register Allocation . . . . . . . . . . . . . 211


Chapter 1 – Introduction

1.1 What is a compiler?

A compiler translates a program in a source language to a program in a target language. The best-known form of compiler translates a high-level language like C into the native assembly language of a machine so that it can be executed. There are, of course, compilers for other languages such as C++, Java, C#, and Rust, among many others.

The same techniques used in a traditional compiler are also used in any kind of program that processes a language. For example, a typesetting program like TeX translates a manuscript into a PostScript document. A graph-layout program like Dot consumes a list of nodes and edges and arranges them on a screen. A web browser translates an HTML document into an interactive graphical display. To write programs like these, you need to understand and use the same techniques as in traditional compilers.
Compilers exist not only to translate programs, but also to improve them.
A compiler assists a programmer by finding errors in a program at compile
time, so that the user does not have to encounter them at runtime. Usually,
a more strict language results in more compile-time errors. This makes the
programmer’s job harder, but makes it more likely that the program is
correct. For example, the Ada language is infamous among programmers
as challenging to write without compile-time errors, but once working, is
trusted to run safety-critical systems such as the Boeing 777 aircraft.
A compiler is distinct from an interpreter, which reads in a program and then executes it directly, without emitting a translation. This is also sometimes known as a virtual machine. Languages like Python and Ruby are typically executed by an interpreter that reads the source code directly.

Compilers and interpreters are closely related, and it is sometimes possible to exchange one for the other. For example, Java compilers translate Java source code into Java bytecode, which is an abstract form of assembly language. Some implementations of the Java Virtual Machine work as interpreters that execute one instruction at a time. Others work by translating the bytecode into local machine code, and then running the machine code directly. This is known as just-in-time compiling, or JIT.


1.2 Why should you study compilers?

You will be a better programmer. A great craftsman must understand his or her tools, and a programmer is no different. By understanding more deeply how a compiler translates your program into machine language, you will become more skilled at writing effective code and debugging it when things go wrong.
You can create tools for debugging and translating. If you can write a parser for a given language, then you can write all manner of supporting tools that help you (and others) debug your own programs. An integrated development environment like Eclipse incorporates parsers for languages like Java, so that it can highlight syntax, find errors without compiling, and connect code to documentation as you write.
You can create new languages. A surprising number of problems are made easier by expressing them compactly in a custom language. (These are sometimes known as domain-specific languages, or simply little languages.) By learning the techniques of compilers, you will be able to implement little languages and avoid some pitfalls of language design.
You can contribute to existing compilers. While it’s unlikely that you will
write the next great C compiler (since we already have several), language
and compiler development does not stand still. Standards development
results in new language features; optimization research creates new ways
of improving programs; new microprocessors are created; new operating
systems are developed; and so on. All of these developments require the
continuous improvement of existing compilers.
You will have fun while solving challenging problems. Isn’t that enough?

1.3 What’s the best way to learn about compilers?

The best way to learn about compilers is to write your own compiler from beginning to end. While that may sound daunting at first, you will find that this complex task can be broken down into several stages of moderate complexity. The typical undergraduate computer science student can write a complete compiler for a simple language in a semester, broken down into four or five independent stages.

1.4 What language should I use?

Without question, you should use the C programming language and the
X86 assembly language, of course!
Ok, maybe the answer isn’t quite that simple. There is an ever-increasing number of programming languages that all have different strengths and weaknesses. Java is simple, consistent, and portable, albeit not high performance. Python is easy to learn and has great library support, but is weakly typed. Rust offers exceptional static type-safety, but is not (yet) widely used. It is quite possible to write a compiler in nearly any language, and you could use this book as a guide to do so.
However, we really think that you should learn C, write a compiler in C, and use it to compile a C-like language that produces assembly for a widely-used processor, like X86 or ARM. Why? Because it is important for you to learn the ins and outs of technologies that are in wide use, and not just those that are abstractly beautiful.

C is the most widely-used portable language for low-level coding (compilers, libraries, and kernels) and it is also small enough that one can learn how to compile every aspect of C in a single semester. True, C presents some challenges related to type safety and pointer use, but these are manageable for a project the size of a compiler. There are other languages with different virtues, but none as simple and as widely used as C. Once you write a C compiler, then you are free to design your own (better) language.
Likewise, the X86 has been the most widely-deployed computer architecture in desktops, servers, and laptops for several decades. While it is considerably more complex than other architectures like MIPS or SPARC or ARM, one can quickly learn the essential subset of instructions necessary to build a compiler. Of course, ARM is quickly catching up as a popular architecture in the mobile, embedded, and low-power space, so we have included a section on that as well.
That said, the principles presented in this book are widely applicable.
If you are using this as part of a class, your instructor may very well choose
a different compilation language and different target assembly, and that’s
fine too.

1.5 How is this book different from others?

Most books on compilers are very heavy on the abstract theory of scanners, parsers, type systems, and register allocation, and rather light on how the design of a language affects the compiler and the runtime. Most are designed for use in a graduate survey of optimization techniques.

This book takes a broader approach by giving a lighter dose of optimization, and introducing more material on the process of engineering a compiler, the tradeoffs in language design, and considerations for interpretation and translation.
You will also notice that this book doesn’t contain a whole bunch of
fiddly paper-and-pencil assignments to test your knowledge of compiler
algorithms. (Ok, there are a few of those in Chapters 3 and 4.) If you want
to test your knowledge, then write some working code. To that end, the
exercises at the end of each chapter ask you to take the ideas in the chapter,
and either explore some existing compilers, or write parts of your own. If
you do all of them in order, you will end up with a working compiler,
summarized in the final appendix.


1.6 What other books should I read?

For general reference on compilers, I suggest the following books:

• Charles N. Fischer, Ron K. Cytron, and Richard J. LeBlanc Jr., “Crafting a Compiler”, Pearson, 2009.
This is an excellent undergraduate textbook which focuses on object-oriented software engineering techniques for constructing a compiler, with an emphasis on generating output for the Java Virtual Machine.

• Christopher Fraser and David Hanson, “A Retargetable C Compiler: Design and Implementation”, Benjamin/Cummings, 1995.
Also known as the “LCC book”, this book focuses entirely on explaining the C implementation of a C compiler by taking the unusual approach of embedding the literal code into the textbook, so that code and explanation are intertwined.

• Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, “Compilers: Principles, Techniques, and Tools”, Addison Wesley, 2006.
Affectionately known as the “dragon book”, this is a comprehensive treatment of the theory of compilers from scanning through type theory and optimization at an advanced graduate level.

Ok, what are you waiting for? Let’s get to work.


Chapter 2 – A Quick Tour

2.1 The Compiler Toolchain

A compiler is one component in a toolchain of programs used to create executables from source code. Typically, when you invoke a single command to compile a program, a whole sequence of programs is invoked in the background. Figure 2.1 shows the programs typically used in a Unix system for compiling C source code to assembly code.

Source (prog.c) + Headers (stdio.h)
  → Preprocessor (cpp) → Preprocessed Source
  → Compiler (cc1) → Assembly (prog.s)
  → Assembler (as) → Object Code (prog.o)
  → Static Linker (ld) + Libraries (libc.a) → Executable (prog)
  → Dynamic Linker (ld.so) + Dynamic Libraries (libc.so) → Running Process

Figure 2.1: A Typical Compiler Toolchain

• The preprocessor prepares the source code for the compiler proper.
In the C and C++ languages, this means consuming all directives that
start with the # symbol. For example, an #include directive causes
the preprocessor to open the named file and insert its contents into
the source code. A #define directive causes the preprocessor to
substitute a value wherever a macro name is encountered. (Not all
languages rely on a preprocessor.)

• The compiler proper consumes the clean output of the preprocessor. It scans and parses the source code, performs typechecking and other semantic routines, optimizes the code, and then produces assembly language as the output. This part of the toolchain is the main focus of this book.

• The assembler consumes the assembly code and produces object code. Object code is “almost executable” in that it contains raw machine language instructions in the form needed by the CPU. However, object code does not know the final memory addresses in which it will be loaded, and so it contains gaps that must be filled in by the linker.

• The linker consumes one or more object files and library files and combines them into a complete, executable program. It selects the final memory locations where each piece of code and data will be loaded, and then “links” them together by writing in the missing address information. For example, an object file that calls the printf function does not initially know the address of the function. An empty (zero) address will be left where the address must be used. Once the linker selects the memory location of printf, it must go back and write in the address at every place where printf is called.

In Unix-like operating systems, the preprocessor, compiler, assembler, and linker are historically named cpp, cc1, as, and ld respectively. The user-visible program cc simply invokes each element of the toolchain in order to produce the final executable.

2.2 Stages Within a Compiler

In this book, our focus will be primarily on the compiler proper, which is
the most interesting component in the toolchain. The compiler itself can
be divided into several stages:

Character Stream → Scanner → Tokens → Parser → Abstract Syntax Tree → Semantic Routines → Intermediate Representation → Optimizers → Code Generator → Assembly Code

Figure 2.2: The Stages of a Unix Compiler

• The scanner consumes the plain text of a program, and groups together individual characters to form complete tokens. This is much like grouping characters into words in a natural language.


• The parser consumes tokens and groups them together into complete statements and expressions, much like words are grouped into sentences in a natural language. The parser is guided by a grammar which states the formal rules of composition in a given language. The output of the parser is an abstract syntax tree (AST) that captures the grammatical structures of the program. The AST also remembers where in the source file each construct appeared, so it is able to generate targeted error messages, if needed.
• The semantic routines traverse the AST and derive additional meaning (semantics) about the program from the rules of the language and the relationship between elements of the program. For example, we might determine that x + 10 is a float expression by observing the type of x from an earlier declaration, then applying the language rule that addition between int and float values yields a float. After the semantic routines, the AST is often converted into an intermediate representation (IR) which is a simplified form of assembly code suitable for detailed analysis. There are many forms of IR which we will discuss in Chapter 8.
• One or more optimizers can be applied to the intermediate representation, in order to make the program smaller, faster, or more efficient. Typically, each optimizer reads the program in IR format, and then emits the same IR format, so that each optimizer can be applied independently, in arbitrary order.
• Finally, a code generator consumes the optimized IR and transforms it into a concrete assembly language program. Typically, a code generator must perform register allocation to effectively manage the limited number of hardware registers, and instruction selection and sequencing to order assembly instructions in the most efficient form.

2.3 Example Compilation

Suppose we wish to compile this fragment of code into assembly:

height = (width+56) * factor(foo);

The first stage of the compiler (the scanner) will read in the text of
the source code character by character, identify the boundaries between
symbols, and emit a series of tokens. Each token is a small data structure
that describes the nature and contents of each symbol:

id:height = ( id:width + int:56 ) * id:factor ( id:foo ) ;

At this stage, the purpose of each token is not yet clear. For example, factor and foo are simply known to be identifiers, even though one is the name of a function, and the other is the name of a variable. Likewise, we do not yet know the type of width, so the + could potentially represent integer addition, floating point addition, string concatenation, or something else entirely.
The next step is to determine whether this sequence of tokens forms
a valid program. The parser does this by looking for patterns that match
the grammar of a language. Suppose that our compiler understands a
language with the following grammar:

Grammar G1
1. expr → expr + expr
2. expr → expr * expr
3. expr → expr = expr
4. expr → id ( expr )
5. expr → ( expr )
6. expr → id
7. expr → int

Each line of the grammar is called a rule, and explains how various parts of the language are constructed. Rules 1-3 indicate that an expression can be formed by joining two expressions with operators. Rule 4 describes a function call. Rule 5 describes the use of parentheses. Finally, rules 6 and 7 indicate that identifiers and integers are atomic expressions. [1]
The parser looks for sequences of tokens that can be replaced by the
left side of a rule in our grammar. Each time a rule is applied, the parser
creates a node in a tree, and connects the sub-expressions into the abstract
syntax tree (AST). The AST shows the structural relationships between
each symbol: addition is performed on width and 56, while a function
call is applied to factor and foo.
With this data structure in place, we are now prepared to analyze the meaning of the program. The semantic routines traverse the AST and derive additional meaning by relating parts of the program to each other, and to the definition of the programming language. An important component of this process is typechecking, in which the type of each expression is determined, and checked for consistency with the rest of the program. To keep things simple here, we will assume that all of our variables are plain integers.
To generate linear intermediate code, we perform a post-order traversal of the AST and generate an IR instruction for each node in the tree. A typical IR looks like an abstract assembly language, with load/store instructions, arithmetic operations, and an infinite number of registers. For example, this is a possible IR representation of our example program:
[1] The careful reader will note that this example grammar has ambiguities. We will discuss that in some detail in Chapter 4.


ASSIGN
    ID height
    MUL
        ADD
            ID width
            INT 56
        CALL
            ID factor
            ID foo

Figure 2.3: Example AST

LOAD $56 -> r1
LOAD width -> r2
IADD r1, r2 -> r3
ARG foo
CALL factor -> r4
IMUL r3, r4 -> r5
STOR r5 -> height

Figure 2.4: Example Intermediate Representation

The intermediate representation is where most forms of optimization occur. Dead code is removed, common operations are combined, and code is generally simplified to consume fewer resources and run more quickly.

Finally, the intermediate code must be converted to the desired assembly code. Figure 2.5 shows X86 assembly code that is one possible translation of the IR given above. Note that the assembly instructions do not necessarily correspond one-to-one with IR instructions.
MOVQ width, %rax        # load width into rax
ADDQ $56, %rax          # add 56 to rax
MOVQ %rax, -8(%rbp)     # save sum in temporary
MOVQ foo, %rdi          # load foo into arg 0 register
CALL factor             # invoke factor, result in rax
MOVQ -8(%rbp), %rbx     # load sum into rbx
IMULQ %rbx              # multiply rbx by rax
MOVQ %rax, height       # store result into height

Figure 2.5: Example Assembly Code

A well-engineered compiler is highly modular, so that common code elements can be shared and combined as needed. To support multiple languages, a compiler can provide distinct scanners and parsers, each emitting the same intermediate representation. Different optimization techniques can be implemented as independent modules (each reading and writing the same IR) so that they can be enabled and disabled independently. A retargetable compiler contains multiple code generators, so that the same IR can be emitted for a variety of microprocessors.

2.4 Exercises

1. Determine how to invoke the preprocessor, compiler, assembler, and linker manually in your local computing environment. Compile a small complete program that computes a simple expression, and examine the output at each stage. Are you able to follow the flow of the program in each form?

2. Determine how to change the optimization level for your local compiler. Find a non-trivial source program and compile it at multiple levels of optimization. How does the compile time, program size, and run time vary with optimization levels?

3. Search the internet for the formal grammars for three languages that
you are familiar with, such as C++, Ruby, and Rust. Compare them
side by side. Which language is inherently more complex? Do they
share any common structures?


Chapter 3 – Scanning

3.1 Kinds of Tokens

Scanning is the process of identifying tokens from the raw text source code of a program. At first glance, scanning might seem trivial – after all, identifying words in a natural language is as simple as looking for spaces between letters. However, identifying tokens in source code requires the language designer to clarify many fine details, so that it is clear what is permitted and what is not.
Most languages will have tokens in these categories:
• Keywords are words in the language structure itself, like while or
class or true. Keywords must be chosen carefully to reflect the
natural structure of the language, without interfering with the likely
names of variables and other identifiers.
• Identifiers are the names of variables, functions, classes, and other
code elements chosen by the programmer. Typically, identifiers are
arbitrary sequences of letters and possibly numbers. Some languages
require identifiers to be marked with a sentinel (like the dollar sign
in Perl) to clearly distinguish identifiers from keywords.
• Numbers could be formatted as integers, or floating point values, or
fractions, or in alternate bases such as binary, octal or hexadecimal.
Each format should be clearly distinguished, so that the programmer
does not confuse one with the other.
• Strings are literal character sequences that must be clearly distinguished from keywords or identifiers. Strings are typically quoted with single or double quotes, but also must have some facility for containing quotations, newlines, and unprintable characters.
• Comments and whitespace are used to format a program to make it
visually clear, and in some cases (like Python) are significant to the
structure of a program.
When designing a new language, or designing a compiler for an existing language, the first job is to state precisely what characters are permitted in each type of token. Initially, this could be done informally by stating, for example, “An identifier consists of a letter followed by any number of letters and numerals.”, and then assigning a symbolic constant (TOKEN_IDENTIFIER) for that kind of token. As we will see, an informal approach is often ambiguous, and a more rigorous approach is needed.

token_t scan_token( FILE *fp ) {
    int c = fgetc(fp);
    if(c=='*') {
        return TOKEN_MULTIPLY;
    } else if(c=='!') {
        int d = fgetc(fp);
        if(d=='=') {
            return TOKEN_NOT_EQUAL;
        } else {
            ungetc(d,fp);
            return TOKEN_NOT;
        }
    } else if(isalpha(c)) {
        int d;
        do {
            d = fgetc(fp);
        } while(isalnum(d));
        ungetc(d,fp);
        return TOKEN_IDENTIFIER;
    } else if ( ... ) {
        ...
    }
}

Figure 3.1: A Simple Hand-Made Scanner

3.2 A Hand-Made Scanner

Figure 3.1 shows how one might write a scanner by hand, using simple
coding techniques. To keep things simple, we only consider just a few
tokens: * for multiplication, ! for logical-not, != for not-equal, and se-
quences of letters and numbers for identifiers.
The basic approach is to read one character at a time from the input stream (fgetc(fp)) and then classify it. Some single-character tokens are easy: if the scanner reads a * character, it immediately returns TOKEN_MULTIPLY, and the same would be true for addition, subtraction, and so forth.
However, some characters are part of multiple tokens. If the scanner
encounters !, that could represent a logical-not operation by itself, or it
could be the first character in the != sequence representing not-equal-to.


Upon reading !, the scanner must immediately read the next character. If the next character is =, then it has matched the sequence != and returns TOKEN_NOT_EQUAL.

But if the character following ! is something else, then the non-matching character needs to be put back on the input stream using ungetc, because it is not part of the current token. The scanner returns TOKEN_NOT and will consume the put-back character on the next call to scan_token.
In a similar way, once a letter has been identified by isalpha(c), the scanner keeps reading letters or numbers until a non-matching character is found. The non-matching character is put back, and the scanner returns TOKEN_IDENTIFIER.
(We will see this pattern come up in every stage of the compiler: an
unexpected item doesn’t match the current objective, so it must be put
back for later. This is known more generally as backtracking.)
As you can see, a hand-made scanner is rather verbose. As more token types are added, the code can become quite convoluted, particularly if tokens share common sequences of characters. It can also be difficult for a developer to be certain that the scanner code corresponds to the desired definition of each token, which can result in unexpected behavior on complex inputs. That said, for a small language with a limited number of tokens, a hand-made scanner can be an appropriate solution.
For a complex language with a large number of tokens, we need a more formalized approach to defining and scanning tokens. A formal approach will allow us to have greater confidence that token definitions do not conflict and that the scanner is implemented correctly. Further, a formal approach will allow us to make the scanner compact and high performance – surprisingly, the scanner itself can be the performance bottleneck in a compiler, since every single character must be individually considered.

The formal tools of regular expressions and finite automata allow us to state very precisely what may appear in a given token type. Then, automated tools can process these definitions, find errors or ambiguities, and produce compact, high performance code.

3.3 Regular Expressions

Regular expressions (REs) are a language for expressing patterns. They were first described in the 1950s by Stephen Kleene [4] as an element of his foundational work in automata theory and computability. Today, REs are found in slightly different forms in programming languages (Perl), standard libraries (PCRE), text editors (vi), command-line tools (grep), and many other places. We can use regular expressions as a compact and formal way of specifying the tokens accepted by the scanner of a compiler, and then automatically translate those expressions into working code. While easily explained, REs can be a bit tricky to use, and require some practice in order to achieve the desired results.


Let us define regular expressions precisely:

A regular expression s is a string which denotes L(s), a set of strings drawn from an alphabet Σ. L(s) is known as the “language of s.”

L(s) is defined inductively with the following base cases:

• If a ∈ Σ then a is a regular expression and L(a) = {a}.
• ε is a regular expression and L(ε) contains only the empty string.

Then, for any regular expressions s and t:

1. s|t is a RE such that L(s|t) = L(s) ∪ L(t).
2. st is a RE such that L(st) contains all strings formed by the concatenation of a string in L(s) followed by a string in L(t).
3. s* is a RE such that L(s*) = L(s) concatenated zero or more times.

Rule #3 is known as the Kleene closure and has the highest precedence. Rule #2 is known as concatenation. Rule #1 has the lowest precedence and is known as alternation. Parentheses can be added to adjust the order of operations in the usual way.
Here are a few examples using just the basic rules. (Note that a finite RE can indicate an infinite set.)

Regular Expression s    Language L(s)
hello                   { hello }
d(o|i)g                 { dog, dig }
moo*                    { mo, moo, mooo, ... }
(moo)*                  { ε, moo, moomoo, moomoomoo, ... }
a(b|a)*a                { aa, aaa, aba, aaaa, aaba, abaa, ... }

The syntax described so far is entirely sufficient to write any regular expression. But, it is also handy to have a few helper operations built on top of the basic syntax:

s? indicates that s is optional.
    s? can be written as (s|ε)

s+ indicates that s is repeated one or more times.
    s+ can be written as ss*

[a-z] indicates any character in that range.
    [a-z] can be written as (a|b|...|z)

[^x] indicates any character except x.
    [^x] can be written as Σ - x


Regular expressions also obey several algebraic properties, which make it possible to re-arrange them as needed for efficiency or clarity:

Associativity: a|(b|c) = (a|b)|c
Commutativity: a|b = b|a
Distribution: a(b|c) = ab|ac
Idempotency: a** = a*

Using regular expressions, we can precisely state what is permitted in a given token. Suppose we have a hypothetical programming language with the following informal definitions and regular expressions. For each token type, we show examples of strings that match (and do not match) the regular expression.

Informal definition: An identifier is a sequence of capital letters and numbers, but a number must not come first.
Regular expression: [A-Z]+([A-Z]|[0-9])*
Matches strings: PRINT
                 MODE5
Does not match: hello
                4YOU

Informal definition: A number is a sequence of digits with an optional decimal point. For clarity, the decimal point must have digits on both left and right sides.
Regular expression: [0-9]+(.[0-9]+)?
Matches strings: 123
                 3.14
Does not match: .15
                30.

Informal definition: A comment is any text (except a right angle bracket) surrounded by angle brackets.
Regular expression: <[^>]*>
Matches strings: <tricky part>
                 <<<<look left>
Does not match: <this is an <illegal> comment>

3.4 Finite Automata

A finite automaton (FA) is an abstract machine that can be used to represent certain forms of computation. Graphically, an FA consists of a number of states (represented by numbered circles) and a number of edges (represented by labeled arrows) between those states. Each edge is labeled with one or more symbols drawn from an alphabet Σ.

The machine begins in a start state S0. For each input symbol presented to the FA, it moves to the state indicated by the edge with the same label as the input symbol. Some states of the FA are known as accepting states and are indicated by a double circle. If the FA is in an accepting state after all input is consumed, then we say that the FA accepts the input. We say that the FA rejects the input string if it ends in a non-accepting state, or if there is no edge corresponding to the current input symbol.
Every RE can be written as an FA, and vice versa. For a simple regular expression, one can construct an FA by hand. For example, here is an FA for the keyword for:

0 --f--> 1 --o--> 2 --r--> 3        (state 3 accepting)

Here is an FA for identifiers of the form [a-z][a-z0-9]+:

0 --[a-z]--> 1 --[a-z0-9]--> 2      (state 2 accepting, with a self-loop on [a-z0-9])

And here is an FA for numbers of the form ([1-9][0-9]*)|0:

0 --[1-9]--> 1 --[0-9]--> 2         (states 1 and 2 accepting, state 2 looping on [0-9])
0 --0--> 3                          (state 3 accepting)

3.4.1 Deterministic Finite Automata


Each of these three examples is a deterministic finite automaton (DFA). A DFA is a special case of an FA where every state has no more than one outgoing edge for a given symbol. Put another way, a DFA has no ambiguity: for every combination of state and input symbol, there is exactly one choice of what to do next.

Because of this property, a DFA is very easy to implement in software or hardware. One integer (c) is needed to keep track of the current state.

The transitions between states are represented by a matrix (M[s, i]) which encodes the next state, given the current state s and input symbol i. (If the transition is not allowed, we mark it with E to indicate an error.) For each input symbol, we compute c = M[c, i] until all the input is consumed, or an error state is reached.

3.4.2 Nondeterministic Finite Automata

The alternative to a DFA is a nondeterministic finite automaton (NFA). An NFA is a perfectly valid FA, but it has an ambiguity that makes it somewhat more difficult to work with.

Consider the regular expression [a-z]*ing, which represents all lowercase words ending in the suffix ing. It can be represented with the following automaton:

0 --[a-z]--> 0 (self-loop)
0 --i--> 1 --n--> 2 --g--> 3        (state 3 accepting)

Now consider how this automaton would consume the word sing. It
could proceed in two different ways. One would be to move to state 0 on
s, state 1 on i, state 2 on n, and state 3 on g. But the other, equally valid
way would be to stay in state 0 the whole time, matching each letter to the
[a-z] transition. Both ways obey the transition rules, but one results in
acceptance, while the other results in rejection.
The problem here is that state 0 allows for two different transitions on
the symbol i. One is to stay in state 0 matching [a-z] and the other is to
move to state 1 matching i.
Moreover, there is no simple rule by which we can pick one path or
another. If the input is sing, the right solution is to proceed immediately
from state zero to state one on i. But if the input is singing, then we
should stay in state zero for the first ing and proceed to state one for the
second ing.
An NFA can also have an ε (epsilon) transition, which represents the empty string. This transition can be taken without consuming any input symbols at all. For example, we could represent the regular expression a*(ab|ac) with this NFA:

17
18 CHAPTER 3. SCANNING

0 --a--> 0 (self-loop)
0 --ε--> 1 --a--> 2 --b--> 3        (state 3 accepting)
0 --ε--> 4 --a--> 5 --c--> 6        (state 6 accepting)

This particular NFA presents a variety of ambiguous choices. From state zero, it could consume a and stay in state zero. Or, it could take an ε to state one or state four, and then consume an a either way.
There are two common ways to interpret this ambiguity:
• The crystal ball interpretation suggests that the NFA somehow “knows” what the best choice is, by some means external to the NFA itself. In the example above, the NFA would choose whether to proceed to state zero, one, or four before consuming the first character, and it would always make the right choice. Needless to say, this isn’t possible in a real implementation.

• The many-worlds interpretation suggests that the NFA exists in all allowable states simultaneously. When the input is complete, if any of those states are accepting states, then the NFA has accepted the input. This interpretation is more useful for constructing a working NFA, or converting it to a DFA.
Let us use the many-worlds interpretation on the example above. Suppose that the input string is aaac. Initially the NFA is in state zero. Without consuming any input, it could take an epsilon transition to states one or four. So, we can consider its initial state to be all of those states simultaneously. Continuing on, the NFA would traverse these states until accepting the complete string aaac:

States Action
0, 1, 4 consume a
0, 1, 2, 4, 5 consume a
0, 1, 2, 4, 5 consume a
0, 1, 2, 4, 5 consume c
6 accept
In principle, one can implement an NFA in software or hardware by
simply keeping track of all of the possible states. But this is inefficient.
In the worst case, we would need to evaluate all states for all characters
on each input transition. A better approach is to convert the NFA into an
equivalent DFA, as we show below.


3.5 Conversion Algorithms

Regular expressions and finite automata are all equally powerful. For every RE, there is an FA, and vice versa. However, a DFA is by far the most straightforward of the three to implement in software. In this section, we will show how to convert an RE into an NFA, then an NFA into a DFA, and then to optimize the size of the DFA.

Regular Expression --(Thompson's Construction)--> Nondeterministic Finite Automaton --(Subset Construction)--> Deterministic Finite Automaton --(Transition Matrix)--> Code

Figure 3.2: Relationship Between REs, NFAs, and DFAs

3.5.1 Converting REs to NFAs


To convert a regular expression to a nondeterministic finite automaton, we
can follow an algorithm given first by McNaughton and Yamada [5], and
then by Ken Thompson [6].
We follow the same inductive definition of regular expression as given
earlier. First, we define automata corresponding to the base cases of REs:

The NFA for any character a is a start state joined to an accepting state by one edge labeled a; the NFA for an ε transition is the same, with the edge labeled ε:

(start) --a--> ((accept))        (start) --ε--> ((accept))

Now, suppose that we have already constructed NFAs for the regular expressions A and B, indicated below by rectangles. Both A and B have a single start state (on the left) and accepting state (on the right). If we write the concatenation of A and B as AB, then the corresponding NFA is simply A and B connected by an ε transition. The start state of A becomes the start state of the combination, and the accepting state of B becomes the accepting state of the combination:

The NFA for the concatenation AB is:

[ A ] --ε--> [ B ]

19
20 CHAPTER 3. SCANNING

In a similar fashion, the alternation of A and B written as A|B can be expressed as two automata joined by common starting and accepting nodes, all connected by ε transitions:

The NFA for the alternation A|B is:

(start) --ε--> [ A ] --ε--> ((accept))
(start) --ε--> [ B ] --ε--> ((accept))

Finally, the Kleene closure A* is constructed by taking the automaton for A, adding starting and accepting nodes, then adding ε transitions to allow zero or more repetitions:

The NFA for the Kleene closure A* is:

(start) --ε--> [ A ] --ε--> ((accept))
with an ε edge from (start) to ((accept)) allowing zero occurrences,
and an ε edge from the end of A back to its beginning allowing repetition.

Example. Let’s consider the process for an example regular expression a(cat|cow)*. First, we start with the innermost expression cat and assemble it into three transitions resulting in an accepting state. Then, do the same thing for cow, yielding these two FAs:

--c--> --a--> --t--> ((accept))
--c--> --o--> --w--> ((accept))

The alternation of the two expressions cat|cow is accomplished by adding a new starting and accepting node, with epsilon transitions. (The boxes are not part of the graph, but simply highlight the previous graph components carried forward.)

(start) --ε--> [ --c--> --a--> --t--> ] --ε--> ((accept))
(start) --ε--> [ --c--> --o--> --w--> ] --ε--> ((accept))

Then, the Kleene closure (cat|cow)* is accomplished by adding another starting and accepting state around the previous FA, with epsilon transitions between:

(start) --ε--> [ cat|cow automaton ] --ε--> ((accept))
with an ε edge from (start) to ((accept)) allowing zero repetitions,
and an ε edge from the end of the inner automaton back to its beginning.

Finally, the concatenation of a(cat|cow)* is achieved by adding a single state at the beginning for a:

(new start) --a--> [ (cat|cow)* automaton ]

You can easily see that the NFA resulting from the construction algorithm, while correct, is quite complex and contains a large number of epsilon transitions. An NFA representing the tokens for a complete language could end up having thousands of states, which would be very impractical to implement. Instead, we can convert this NFA into an equivalent DFA.


3.5.2 Converting NFAs to DFAs


We can convert any NFA into an equivalent DFA using the technique of subset construction. The basic idea is to create a DFA such that each state in the DFA corresponds to multiple states in the NFA, according to the “many-worlds” interpretation.
Suppose that we begin with an NFA consisting of states N and start state N0. We wish to construct an equivalent DFA consisting of states D and start state D0. Each D state will correspond to multiple N states. First, we define a helper function known as the epsilon closure:

Epsilon closure. ε-closure(n) is the set of NFA states reachable from NFA state n by zero or more ε transitions.

Now we define the subset construction algorithm. First, we create a start state D0 corresponding to ε-closure(N0). Then, for each outgoing character c from the states in D0, we create a new state containing the epsilon closure of the states reachable by c. More precisely:

Subset Construction Algorithm.

Given an NFA with states N and start state N0, create an equivalent DFA with states D and start state D0.

Let D0 = ε-closure(N0).
Add D0 to a list.
While items remain on the list:
    Let d be the next DFA state removed from the list.
    For each character c in Σ:
        Let T contain all NFA states Nk such that Nj ∈ d and Nj --c--> Nk.
        Create new DFA state Di = ε-closure(T).
        If Di is not already in the list, add it to the end.

Figure 3.3: Subset Construction Algorithm


NFA for a(cat|cow)*, with states numbered:

N0 --a--> N1; N1 --ε--> N2; N2 --ε--> N3; N2 --ε--> N13;
N3 --ε--> N4; N3 --ε--> N8;
N4 --c--> N5 --o--> N6 --w--> N7          (the cow branch)
N8 --c--> N9 --a--> N10 --t--> N11        (the cat branch)
N7 --ε--> N12; N11 --ε--> N12; N12 --ε--> N2; N12 --ε--> N13;
with N13 accepting.

Equivalent DFA produced by subset construction:

D0 = { N0 }
D1 = { N1, N2, N3, N4, N8, N13 }
D2 = { N5, N9 }
D3 = { N6 }
D4 = { N7, N12, N13, N2, N3, N4, N8 }
D5 = { N10 }
D6 = { N11, N12, N13, N2, N3, N4, N8 }

D0 --a--> D1; D1 --c--> D2; D2 --o--> D3; D2 --a--> D5;
D3 --w--> D4; D5 --t--> D6; D4 --c--> D2; D6 --c--> D2;
with D1, D4, and D6 accepting.

Figure 3.4: Converting an NFA to a DFA via Subset Construction

Example. Let’s work out the algorithm on the NFA in Figure 3.4. This
is the same NFA corresponding to the RE a(cat|cow)* with each of the
states numbered for clarity.

1. Compute D0 which is −closure(N0 ). N0 has no  transitions, so


D0 = {N0 }. Add D0 to the work list.
2. Remove D0 from the work list. The character a is an outgoing tran-
sition from N0 to N1 . −closure(N1 ) = {N1 , N2 , N3 , N4 , N8 , N13 } so
add all of those to new state D1 and add D1 to the work list.
c c
3. Remove D1 from the work list. We can see that N4 → − N5 and N8 → −
N9 , so we create a new state D2 = {N5 , N9 } and add it to the work
list.
4. Remove D2 from the work list. Both a and o are possible transitions
o a
because of N5 − → N6 and N9 −
→ N10 . So, create a new state D3 for the
o transition to N6 and new state D5 for the a transition to N10 . Add
both D3 and D5 to the work list.
5. Remove D3 from the work list. The only possible transition is N6 → N7
on w, so create a new state D4 containing ε-closure(N7) and add it
to the work list.
6. Remove D5 from the work list. The only possible transition is N10 → N11
on t, so create a new state D6 containing ε-closure(N11) and add it to
the work list.

24 CHAPTER 3. SCANNING

7. Remove D4 from the work list, and observe that the only outgoing
transition c leads to states N5 and N9, which already exist as state D2,
so simply add a transition D4 → D2 on c.

8. Remove D6 from the work list and, in a similar way, add D6 → D2 on c.
9. The work list is empty, so we are done.
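The work-list loop above can be sketched directly in code. The following Python sketch is our own illustration, not the book's code: it encodes the a(cat|cow)* NFA from Figure 3.4 as a transition table (the exact placement of the ε edges is our reconstruction) and runs the subset construction on it.

```python
from collections import deque

EPS = "eps"  # marker for epsilon transitions

# Reconstruction of the a(cat|cow)* NFA: state -> {symbol: successor states}.
nfa = {
    0: {"a": {1}},
    1: {EPS: {2, 13}},
    2: {EPS: {3}},
    3: {EPS: {4, 8}},
    4: {"c": {5}},
    5: {"o": {6}},
    6: {"w": {7}},
    7: {EPS: {12}},
    8: {"c": {9}},
    9: {"a": {10}},
    10: {"t": {11}},
    11: {EPS: {12}},
    12: {EPS: {13}},
    13: {EPS: {2}},
}

def eps_closure(states):
    """All NFA states reachable from `states` via epsilon edges alone."""
    seen, stack = set(states), list(states)
    while stack:
        for t in nfa.get(stack.pop(), {}).get(EPS, set()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construct(start, alphabet):
    """Return (D0, dfa) where dfa maps each DFA state to its out-edges."""
    d0 = eps_closure({start})
    dfa, work = {}, deque([d0])
    while work:
        d = work.popleft()
        if d in dfa:        # already processed
            continue
        dfa[d] = {}
        for c in alphabet:
            # T = all NFA states reachable from d on character c.
            moved = {t for s in d for t in nfa.get(s, {}).get(c, set())}
            if moved:
                di = eps_closure(moved)
                dfa[d][c] = di
                work.append(di)
    return d0, dfa

d0, dfa = subset_construct(0, "acotw")
```

Running this produces seven DFA states, D0 through D6, with the same contents as the walkthrough above.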

3.5.3 Minimizing DFAs


The subset construction algorithm will definitely generate a valid DFA,
but the DFA may possibly be very large (especially if we began with a
complex NFA generated from an RE). A large DFA will have a large
transition matrix that will consume a lot of memory. If it doesn't fit in L1
cache, the scanner could run very slowly. To address this problem, we can
apply Hopcroft's algorithm to shrink a DFA into a smaller (but equivalent)
DFA.

The general approach of the algorithm is to optimistically group together
all possibly-equivalent states S into super-states T. Initially, we place all
non-accepting states of S into super-state T0 and the accepting states into
super-state T1. Then, we examine the outgoing edges from each state
s ∈ Ti. If a given character c has edges that begin in Ti and end in
different super-states, then we consider the super-state to be inconsistent
with respect to c. (Consider an impermissible transition as if it were a
transition to TE, a super-state for errors.) The super-state must then be
split into multiple states that are consistent with respect to c. Repeat this
process for all super-states and all characters c ∈ Σ until no more splits
are required.

DFA Minimization Algorithm.

Given a DFA with states S, create an equivalent DFA with
an equal or fewer number of states T.

First partition S into T such that:
    T0 = non-accepting states of S.
    T1 = accepting states of S.
Repeat:
    ∀ Ti ∈ T:
        ∀ c ∈ Σ:
            If Ti → more than one T state on c,
            then split Ti into multiple T states
            such that c has the same action in each.
Until no more states are split.

Figure 3.5: Hopcroft's DFA Minimization Algorithm


Example. Suppose we have the following non-optimized DFA and
wish to reduce it to a smaller DFA:

[Figure: the original DFA over the alphabet {a, b}, with non-accepting
states 1–4 and accepting state 5.]

We begin by grouping all of the non-accepting states 1, 2, 3, 4 into one
super-state and the accepting state 5 into another super-state, like this:

[Figure: super-state (1,2,3,4), with transitions on a and b within the
super-state and a transition on b to accepting super-state (5).]

Now, we ask whether this graph is consistent with respect to all possible
inputs, by referring back to the original DFA. For example, we observe
that, if we are in super-state (1,2,3,4), then an input of a always goes to
state 2, which keeps us within the super-state. So, this DFA is consistent
with respect to a. However, from super-state (1,2,3,4) an input of b can
either stay within the super-state or go to super-state (5). So, the DFA is
inconsistent with respect to b.

To fix this, we try splitting out one of the inconsistent states (4) into a
new super-state, taking the transitions with it:

[Figure: super-states (1,2,3), (4), and (5), with transitions on a and b,
including 4 → 5 on b.]

Again, we examine each super-state for consistency with respect to
each input character. We observe that super-state (1,2,3) is consistent
with respect to a, but not with respect to b, because b can lead either to
state 3 or to state 4. We attempt to fix this by splitting out state 2 into its
own super-state, yielding this DFA:


[Figure: the minimized DFA with states (1,3), 2, 4, and 5, and transitions
on a and b.]

Again, we examine each super-state and observe that each possible input
is consistent with respect to the super-state, and therefore we have the
minimal DFA.
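The split-until-consistent loop can also be sketched in code. The Python sketch below is our own illustration: the transition table is a reconstruction consistent with the walkthrough above (the printed diagram does not fully survive), and the refinement loop follows Figure 3.5, repeatedly splitting any block of the partition that is inconsistent with respect to some character.

```python
# state -> {symbol: next state}; states 1-4 non-accepting, 5 accepting.
delta = {
    1: {"a": 2, "b": 3},
    2: {"a": 2, "b": 4},
    3: {"a": 2, "b": 3},
    4: {"a": 2, "b": 5},
    5: {"a": 2, "b": 5},
}
accepting = {5}

def minimize(delta, accepting, alphabet):
    # Initial partition: non-accepting states (T0) and accepting states (T1).
    parts = [set(delta) - accepting, set(accepting)]
    changed = True
    while changed:
        changed = False
        for p in parts:
            for c in alphabet:
                # Group the states of p by which block their c-edge enters.
                groups = {}
                for s in p:
                    t = delta[s][c]
                    dest = next(i for i, q in enumerate(parts) if t in q)
                    groups.setdefault(dest, set()).add(s)
                if len(groups) > 1:
                    # p is inconsistent with respect to c: split it.
                    parts.remove(p)
                    parts.extend(groups.values())
                    changed = True
                    break
            if changed:
                break
    return sorted((frozenset(p) for p in parts), key=min)

parts = minimize(delta, accepting, "ab")
```

On this table the partition refines to {1,3}, {2}, {4}, {5}: a four-state DFA equivalent to the original five-state one, matching the example's result. (Hopcroft's full algorithm uses a cleverer work-list to run in O(n log n) time; this naive refinement is quadratic but mirrors the figure directly.)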

3.6 Limits of Finite Automata

Regular expressions and finite automata are powerful and effective at
recognizing simple patterns in individual words or tokens, but they are
not sufficient to analyze all of the structures in a problem. For example,
could you use a finite automaton to match an arbitrary number of nested
parentheses?

It's not hard to write out an FA that could match, say, up to three pairs
of nested parentheses, like this:

[Figure: a four-state FA with states 0–3, advancing on ( and returning
on ), matching up to three pairs of nested parentheses.]

But the key word is arbitrary! To match any number of parentheses
would require an infinite automaton, which is obviously impractical. Even
if we were to apply some practical upper limit (say, 100 pairs) the
automaton would still be impractically large when combined with all the other
elements of a language that must be supported.
For example, a language like Python permits the nesting of parentheses
() for precedence, curly brackets {} to represent dictionaries, and square
brackets [] to represent lists. An automaton to match up to 100 nested
pairs of each in arbitrary order would have 1,000,000 states!
So, we limit ourselves to using regular expressions and finite automata
for the narrow purpose of identifying the words and symbols within a
problem. To understand the higher level structure of a program, we will
instead use parsing techniques introduced in Chapter 4.
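To make the limitation concrete: a recognizer for arbitrarily nested parentheses needs a counter that can grow without bound, so its "state" cannot be drawn from any fixed finite set. A short Python sketch (our own illustration, not from the text) of such a counter-based recognizer:

```python
def balanced(s):
    """Return True if the parentheses in s are properly nested."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1       # no finite bound on the nesting depth
        elif ch == ")":
            depth -= 1
            if depth < 0:    # a ')' with no matching '('
                return False
    return depth == 0
```

The single integer depth stands in for infinitely many FA states; the parsing techniques of Chapter 4 generalize this idea with a stack.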

3.7 Using a Scanner Generator

Because a regular expression precisely describes all the allowable forms
of a token, we can use a program to automatically transform a set of reg-
Exploring the Variety of Random
Documents with Different Content
(quoted on Petrograd Imperial Ballet School), 173f.
Petipa school, 185.
Petit battements, 95.
[Les] Petits Riens (Noverre and Mozart), 91.
Petrograd (Museum), 13;
(Imperial Ballet School), 172;
(opera house), 175.
Petrouchka (Stravinsky), 229ff.
Pharaohs (dancing in the court of), 17.
Philip of Macedonia, 55.
Philippus (Roman consul), 76.
Philosophic symbolism (in Indian dance), 29.
Phœnicians, 57.
Physical exercises, 239.
Pipe (Egyptian), 8, 18.
Pipes (in Graveyard Dance), 22;
(in 15th-cent. Italian ballet), 82.
Pirouette, 94, 97, 150, 163;
(in Egyptian dancing), 18, 20.
Plaasovaya (Russian folk-dance), 140.
Plastomimic choreography, 247ff.
Plato (quoted), iv;
(cited), 52, 58, 67, 69.
Plots (for ballets), 250.
Plutarch (cited), iv, 14, 45, 67.
Poetry, 235.
Pointes, 163, 215.
Poland (folk-dancing), 136.
Pollux, 54.
Polo (Moorish dance), 106.
Polonaise (Polish folk-dance), 136.
Polowetsi dance (Cossack), 140.
Portugal (mediæval strolling ballets), 80f.
Positions. See Steps.
Poushkin, 178.
Prévost, Mme., 100.
Priapus, 54.
Price, Waldemar (Danish ballet dancer), 164.
Primitive dances (rel. to sexual selection), 6.
Primitive peoples, 3ff.
Prince Igor, 228.
Professional dancing, 7;
(Egyptian), 18.
Provence, 80f, 122, 131.
Prussia (Fackeltanz), 128.
Pskoff, 140.
Psyche (French ballet), 92.
Psychology, 1ff, 24, 45, 136, 139.
Pugni, Cesare (ballet composer), 152.
Pygmalion and Galatea (ballet), 99.
Pylades (Roman dancer), 73, 74f.
Pyrrhic dance, 60f.
Pythian games, 54.

Q
Quadrille (French social dance), 122.
Quintilian (quoted), 72.

R
Rabinoff, Max, 188.
Racial characteristics, 11.
‘Ragtime,’ 263.
Rainbow Dance, 192.
Ramble (Indian goddess of dancing), 24f.
Realism, 157, 249f.
Réception d’une jeune Nymphe à la Court de Terpsichore, 152.
Reed pipes. See Pipes.
Reger, Max, 205.
Regnard (quoted), 88.
Reinach, Théodore (cited on Greek arts), 69.
René of Provence (author of mediæval ballet), 80.
Reno (painter of Salome dance), 45.
Rheinländer (German dance), 131.
Rhythm, 1, 2;
(in naturalistic dancing), 196, 198;
(as basis of all arts), 235;
(in Jacques-Dalcroze system), 239, 244;
(in ballet), 250.
Rhythmic gymnastics, 234ff, 240, 249.
Richelieu, 86, 100.
Rigaudon, 148f.
Rimsky-Korsakoff, Nicolai, 151, 152, 171, 183, 224, 226, 254.
Rinaldo and Armida (ballet by Noverre), 90, 99.
Risti Tants (Esthonian folk-dance), 126ff.
Robert of Normandie (ballet), 164.
Robespierre, 93.
Robinson, Louis (cited on dance instinct), 3.
Rodin (quoted), 196.
Romaika (Slavic folk-dance), 137.
Rome (dancing in), 3, 72ff, 247;
(sacred dancing), 9;
(imitation of Greek dances), 10;
(Pyrrhic dance), 60.
Roman Church. See Church.
Romulus, 73.
Rondes (similarity to Eleusinian Mysteries), 67;
(French folk-dance), 121.
Roses of Love (ballet by Noverre), 90.
Rossini, 101, 103, 151.
Rouen, 100.
Roumania (folk-dance), 137f.
Round. See Ronde.
Royal Academy of Dancing (French), 86.
Rubinstein, Anton, 183, 256;
(composed ‘Tarantella’), 124.
Rubinstein, Ida, 45.
Ruggera (Italian folk-dancing), 124.
Rune tunes (Finnish), 63.
Russia (Imperial Ballet), 92;
(influence of, on choreography), 102;
(nationalistic tendencies), 104f;
(folk-dancing), 139ff, 262;
(influences on ballet), 169;
(ballets of opera house), 175;
(influence of Duncan school), 200, 206, 218f.
Russian Imperial Ballet School, 90f, 105, 172.
Russian Imperial Dramatic Dancing School, 180.
Ruthenia (folk-dancing). See Slavic folk-dances.

S
Sacchetto, Rita, 203, 212.
Sacre du Printemps (Stravinsky), 231.
Sacred dancing (in rel. to folk-lore), 9;
(Egyptian), 15;
(Indian), 26;
(Japanese), 38;
(American Indian), 39, 41f;
(Greek), 59, 67ff;
(Roman), 73f.
Sadler, Michael T. H. (quoted on Jacques-Dalcroze School), 235f.
Sahara Graveyard Dance, 21.
Sailor’s Dance (Dutch), 135.
St. Basil (cited), iii.
St. Carlos (celebrated by strolling ballet), 80.
St. Denis, Ruth, 208, 212.
Saint-Léon, 159.
St. Matthew (quoted), 44.
St. Petersburg (court ballet), 90, 161.
See also Petrograd.
Saint-Saëns, Camille, 186.
St. Vitus’ Dance, 129.
Sakuntala (French ballet), 152.
Sallé, Mlle., 94, 99, 100.
Salmacida Spolia (Sir William Davenant), 84.
Salome dances, 44f, 191.
Salome (Richard Strauss), 45.
Saltarello (Italian folk-dance), 124.
Sangalli, Rita, 159.
Sappho, 70, 94.
Sarabande, 146.
Sarasate, Pablo, 108.
Satyr Dance (in Dionysian Mysteries), 68, 69.
Sauvages de la Mer du Sud, [Les] (French ballet), 94.
Savage peoples. See Primitive peoples.
Savinskaya, 206.
Saxony (folk-dancing), 130.
Scaliger, Joseph Justa (cited), 54.
Scandinavia (folk-dances), 2, 133;
(nationalistic tendencies), 104f;
(waltz), 131;
(naturalistic school), 205.
Schafftertanz (of Munich), 129.
Scheherezade (Rimsky-Korsakoff), 152, 226.
Schiller, 166, 250.
Schirjajeff, 182.
Schliemann (Egyptologist), cited, 17.
Schmoller (Saxonian folk-dance), 130.
Schnitzler, Arthur, 166.
Schönberg, Arnold, 205.
Schools of dancing, (Petipa), 185;
(Duncan), 197;
(Jacques-Dalcroze), 197f.
See Academies.
Schopenhauer (cited), 250;
(quoted), 64.
Schleiftänze, 129.
Schreittänze. 129.
Schubert, Franz, 103f, 254.
Scotch Reel, 118f.
Scotland (folk-dancing), 118f.
Scribe, Eugène. 103.
Schuhplatteltanz (Bavarian folk-dance), 129f.
Schumann, Robert, 206.
Sculpture (in rel. to dancing), 173, 196, 235.
Seguidilla (Spanish dance), 50.
Sensationalism, 190.
Seroff, Alexander Nikolayevitch, 104, 171, 181.
Serpentine Dance, 189, 190f.
Servia (folk-dancing).
See Slavic folk-dances.
Setche, Egyptologist (cited), 14.
Seville (church dancing), iv, 78;
(court dancing), 47.
Sex instinct (in rel. to folk-dancing), v, 11, 134, 139.
Shakespeare (cited on the jig), 119.
Sharp, Cecil (quoted on Morris dances), 113f.
Shean Treuse (Scotch folk-dance), 118.
Shintoism (Japanese religion), 36.
Shirley, James, 83.
Sibelius, Jean, 205, 254, 256, 257f.
Siberia (folk-dancing), 140.
Siciliana (Italian folk-dance), 124.
[Le] Sicilien (ballet), 153.
Sieba (ballet), 152.
Siebensprung (Swabian folk-dance), 130.
Singing (in Finnish dances), 133.
Singing ballet, 177f.
Singing Sirens, 57.
Skirt Dance, 189, 212.
Skoliasmos (in Dionysian mysteries), 68f.
Skralat (Swedish folk-dance), 133.
Slavic folk-dances, 136ff.
Sleeping Beauty (Tschaikowsky), 152, 185.
Snake dances (Lithuanian), 135;
(American Indian), 38, 41, 135.
Snegourotchka (Rimsky-Korsakoff). See Snow Maiden.
Snow Maiden (Rimsky-Korsakoff), 152, 177, 183f.
Social dancing (Greek), 54f;
(Polish), 136;
(in 17th cent.), 144ff.
See also Court dancing.
Socrates, 54, 56.
Sokolova (ballerina), 151, 183.
Solomon, Hebrew king, 43, 44.
Sophocles, 62.
Sound (in relation to movement), 238
[La] Source (Delibes), 152.
Spain (religious dancing), iv;
(folk-dancing), 2, 105ff, 210ff;
(choreographic art of Moors), 46, 50f;
(mediæval strolling ballets), 80f.
Spartan dance, 54f, 60.
Spectre de la Rose (ballet), 221, 223, 229.
Spendiaroff, 256.
Spinning top principle, 216.
Stage dancing (in Middle Ages), 81, 148.
See also Professional dancing.
Steps, 2;
(in American Indian dances), 42;
(in courante), 88;
(in classic French ballet), 95f;
(Bolero), 109;
(Seguidilla), 110;
(Hungarian folk-dances), 125f;
(Rigaudon), 149;
(Bournoville’s reform), 163.
Stephania (Roman dancer), 77.
Stewart-Richardson, Lady Constance, 206.
Stockholm (ballet dancing), 161.
Stockholm school, 151.
Stomach Dance (Arabian dance), 3, 21, 22.
Stone Age, 5.
Stramboe, Adolph F., 164.
Strassburg, 129.
Strauss, Johann, 132.
Strauss, Richard, 204f, 232.
Stravinsky, Igor, 185, 229ff.
Strindberg, August, 165.
String instruments (Indian), 27.
Strolling ballets (mediæval), 80f;
(in French Revolution), 93f.
Strophic principle, 63.
Stuck (painter of Salome dance), 45.
Stuttgart (court), 90, 153.
Subra, Mlle. (ballerina), 159.
Su-Chu-Fu (dancing academy), 34.
Suetonius (cited), 76.
Sun’s Darling (English masque), 84.
Svendsen, Johann, 133, 205.
Svetloff (cited), 218.
Swan, The (Saint-Saëns), 186.
Swanhilde (ballet), 167.
Swan Lake (Russian ballet), 152, 184f.
Swabia (folk-dancing), 130.
Sweden (influence on Russian ballet), 169.
See also Scandinavia.
Sword Dance (English), 21, 33, 113, 115ff.
La Sylphide (Delibes), 152, 153, 154, 156, 163.
[Les] Sylphides, 175, 221.
Sylvia (Delibes), 152.
Symbolism (in Indian dancing), 29, 263f;
(in Hungarian folk-dancing), 126;
(in Lada’s dances), 254f;
(in modern ballet), 258, 265.
Symons, Arthur (quoted), 264f.
Symphonic music (as basis for dancing), 200, 206.
Syrinx (Egyptian instrument), iv.
Szolo (Hungarian folk-dance), 126.

T
Tabor (in Morris dance), 115.
Tacitus (cited), 76.
Taglioni, Maria, 11, 151, 152ff, 156, 157, 193.
Taglioni, Salvatore, 151, 152, 161.
Ta-gien (Chinese dance), 32.
Ta-gu (Chinese dance), 32.
Ta-knen (Chinese dance), 32.
Talmud, 43.
Ta-mao (Chinese dance), 32.
Tambourine (in Hebrew dance), 19;
(in Indian dance), 27;
(with bells, Chinese), 32;
(in Greek dances), 71;
(in Spanish dance), 79f, 106;
(in Tarantella), 122.
Taneieff, Sergei Ivanovich, 224.
Tarantella (Italian folk-dance), 122ff.
Tartar tribes, 140.
Tascara (Spanish folk-dance), 111f.
Taubentanz (Black Forest), 130.
Ta-u (Chinese dance), 32.
Tcherepnin, 185, 226, 229.
Technique (Duncan), 199;
(instrumental), 237;
(eurhythmic), 239.
Telemachus, 53.
Telemaque (French ballet), 92.
Teleshova (ballerina), 151, 181.
Telethusa (Roman dancer), 77.
Tempe Restored (Aurelian Townsend), 84f.
Temple dancing (Hebraic), 43, 44;
(Greek), 54f;
(Esthonian), 127.
See also Sacred dancing.
Terpsichore, 10, 57.
Terpsichore (ballet by Handel), 99.
Teu-Kung (Chinese dancing teacher), 31.
Thackeray (quoted on Taglioni), 154.
Thales, 59.
Théatre des Arts, 92.
Theatre of Dionysius, 64f.
Thebes, 19.
Theseus, iv, 54, 69.
They (Chinese monarch), 30.
Tiberius (Roman emperor), 76.
Tichomiroff, 221.
Time, 240f.
Time-marker (in Greek dancing), 70f.
Time-values, 241.
Titans, 59.
Titus (Roman emperor), 34.
Toe-dance, 215.
Toledo (church dancing), iv, 78.
Toreadoren (ballet), 164.
Torra (Murcian folk-dance), 106.
Tourdion (social dance), 150.
Townsend, Aurelian, 84f.
Trepak (Russian folk-dance), 140.
Trescona (Florentine folk-dance), 124.
Triangle (in English Horn dance), 117.
Tripoli (Almeiis dancers in), 21.
Triumph of Love, 87.
Triumph of Peace (James Shirley), 83.
Trouhanova, Natasha, 45, 244, 256f.
Trumpets (in 15th-cent. Italian ballet), 82.
Tschaikowsky, Peter Ilyitch, 104, 151, 152, 171, 177, 183, 184,
185.
Tshamuda (Indian goddess), 26.
Tuileries, 87.
Tunic, ballerina’s, 215.
Tunis (Almeiis dancers in), 21.
Turgenieff, 104, 171;
(quoted on Elssler), 155f.
Tuta, 215.

U
Uchtomsky, Prince (cited), 28.
U-gientze (Chinese dance), 32.
Ulysses, 52.
Urbino, Duke of, 80.
V
Vafva Vadna (Swedish folk-dance), 133f.
Valdemar (Danish ballet), 163, 164.
Valencia, iv, 78, 107f.
Valencian Bishop (advocate of dancing), 78.
Valentine, Gwendoline (ballet dancer), 206.
Vanka (Cossak dance), 140.
Van Staden (Colonel), 179.
Vaudoyer, J. L., 229.
Vaughan, Kate (ballet dancer), 193.
Veie de Noue (in Lou Gue), 80.
Veils (used in Greek dancing), 66, 70.
Venera (Indian goddess), 24.
[La] Ventana (ballet), 166.
Venus of Cailipyge, 76f.
Verbunkes (Hungarian folk-dance), 126.
[La] Vestale (ballet), 153.
Vestris brothers, 91, 101, 148, 151, 162.
Viennese court, 90.
Viennese School, 151.
Villiani, Mme. (ballet dancer), 22, 193.
Vingakersdans (Swedish folk-dance), 134.
Violin (in 15th-cent. Italian ballet), 82;
(in Spanish folk-dance), 107.
Vision of Salome (ballet), 201.
Vocal ballets, 177f.
Vocal music (dependence of dancing upon), 8;
(in Greek dances), 58.
Voisins, Comte Gilbert des, 154.
Volga, 140.
Volinin (Russian ballet dancer), 185, 187, 248.
Volkhonsky, Prince Serge (quoted), 197f, 212f, 215ff, 232, 249.
Voltaire (cited), 99.
Volte (French folk-dance), 131.
Vuillier (quoted on Spanish temple dancing), 79f.
Vulcan, 53.
Vulture Dance (Greek), 69.

W
Wagnerian operas, 63.
Waldteufel, 132.
Waltz, 131f.
Walzer, 131.
War-dances (primitive), 5f;
(Pyrrhic), 60;
(Roman), 73;
(Hungarian), 126.
Warsaw (opera house), 175.
Weber, Carl Maria von, 91, 103, 229.
Weber, Louise, 192.
Weiss, Mme., 159.
Wellman, Christian, 180.
Whistles (in American Indian dances), 41;
(in Morris dance), 115.
Whitehall (masques performed at), 83.
Wiesenthal, Elsa and Grete, 202f, 212.
Wilhelm II, 130.
Wilkinson, Sir Gardner, on Egypt (cited), 18f;
(quoted), 20f.
Women (earliest appearance of, in ballet), 87.
Wood-wind instruments (Indian), 27.
Wsevoloshky, 183.
Würtemberg (folk-dancing), 130.

X
Xenophon (quoted), 55f.
Xeres, iv.

Y
Yorkshire (English sword dance of), 116.
Yu-Wang (Chinese emperor), 33.

Z
Zarzuela (Spanish comic opera), 63f, 106.
Zeus, 59.
Zorongo (Spanish folk-dance), 111.
Zulus (war dances of), 5.
Zunfttänze, 129.
Zwölfmonatstanz (Würtemberg), 130.
Transcriber’s Notes
Punctuation, hyphenation, and spelling were made
consistent when a predominant preference was found in
the original book; otherwise they were not changed.
Simple typographical errors were corrected;
unbalanced quotation marks were remedied when the
change was obvious, and otherwise left unbalanced.
The index was not checked for proper
alphabetization or correct page references, with this
exception: all references to pages iii–vi should be to
pages vii–x. In versions of this eBook that support
hyperlinks, the links have been corrected, but the
displayed page numbers have not been changed in any
version of this eBook.
Page 110: “Albacetex” was printed that way;
probably is a misprint for “Albacete”.
Page 131: “3/4 rhythm” was printed as “3-4 rhythm”
but changed here to conform with the predominant form
of notation throughout the original book.
Page 275: “English Cathedrals” reference to page viii
was printed as “iii-f”; changed here.
*** END OF THE PROJECT GUTENBERG EBOOK THE DANCE ***

Updated editions will replace the previous one—the old editions


will be renamed.

Creating the works from print editions not protected by U.S.


copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the


free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and


Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree
to abide by all the terms of this agreement, you must cease
using and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only


be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project Gutenberg™
works in compliance with the terms of this agreement for
keeping the Project Gutenberg™ name associated with the
work. You can easily comply with the terms of this agreement
by keeping this work in the same format with its attached full
Project Gutenberg™ License when you share it without charge
with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project


Gutenberg:

1.E.1. The following sentence, with active links to, or other


immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the


United States and most other parts of the world at no
cost and with almost no restrictions whatsoever. You may
copy it, give it away or re-use it under the terms of the
Project Gutenberg License included with this eBook or
online at www.gutenberg.org. If you are not located in
the United States, you will have to check the laws of the
country where you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is


derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of
the copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is


posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project


Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute


this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must,
at no additional cost, fee or expense to the user, provide a copy,
a means of exporting a copy, or a means of obtaining a copy
upon request, of the work in its original “Plain Vanilla ASCII” or
other form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,


performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or


providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you


derive from the use of Project Gutenberg™ works
calculated using the method you already use to calculate
your applicable taxes. The fee is owed to the owner of
the Project Gutenberg™ trademark, but he has agreed to
donate royalties under this paragraph to the Project
Gutenberg Literary Archive Foundation. Royalty payments
must be paid within 60 days following each date on
which you prepare (or are legally required to prepare)
your periodic tax returns. Royalty payments should be
clearly marked as such and sent to the Project Gutenberg
Literary Archive Foundation at the address specified in
Section 4, “Information about donations to the Project
Gutenberg Literary Archive Foundation.”

• You provide a full refund of any money paid by a user


who notifies you in writing (or by e-mail) within 30 days
of receipt that s/he does not agree to the terms of the
full Project Gutenberg™ License. You must require such a
user to return or destroy all copies of the works
possessed in a physical medium and discontinue all use
of and all access to other copies of Project Gutenberg™
works.

• You provide, in accordance with paragraph 1.F.3, a full


refund of any money paid for a work or a replacement
copy, if a defect in the electronic work is discovered and
reported to you within 90 days of receipt of the work.

• You comply with all other terms of this agreement for


free distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project


Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except


for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you


discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set


forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied


warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.

Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.

The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact

Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.

Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.

Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.

Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.