Introduction to Compilers and Language Design, 2nd Edition, Douglas Thain
Introduction to Compilers
and Language Design
Second Edition
Anyone is free to download and print the PDF edition of this book for per-
sonal use. Commercial distribution, printing, or reproduction without the
author’s consent is expressly prohibited. All other rights are reserved.
You can find the latest version of the PDF edition, and purchase inexpensive
hardcover copies at http://compilerbook.org
Contents
1 Introduction
1.1 What is a compiler?
1.2 Why should you study compilers?
1.3 What’s the best way to learn about compilers?
1.4 What language should I use?
1.5 How is this book different from others?
1.6 What other books should I read?
2 A Quick Tour
2.1 The Compiler Toolchain
2.2 Stages Within a Compiler
2.3 Example Compilation
2.4 Exercises
3 Scanning
3.1 Kinds of Tokens
3.2 A Hand-Made Scanner
3.3 Regular Expressions
3.4 Finite Automata
3.4.1 Deterministic Finite Automata
3.4.2 Nondeterministic Finite Automata
3.5 Conversion Algorithms
3.5.1 Converting REs to NFAs
3.5.2 Converting NFAs to DFAs
3.5.3 Minimizing DFAs
3.6 Limits of Finite Automata
3.7 Using a Scanner Generator
3.8 Practical Considerations
3.9 Exercises
3.10 Further Reading
4 Parsing
4.1 Overview
4.2 Context Free Grammars
5 Parsing in Practice
5.1 The Bison Parser Generator
5.2 Expression Validator
5.3 Expression Interpreter
5.4 Expression Trees
5.5 Exercises
5.6 Further Reading
7 Semantic Analysis
7.1 Overview of Type Systems
7.2 Designing a Type System
7.3 The B-Minor Type System
7.4 The Symbol Table
7.5 Name Resolution
7.6 Implementing Type Checking
7.7 Error Messages
12 Optimization
12.1 Overview
12.2 Optimization in Perspective
12.3 High Level Optimizations
12.3.1 Constant Folding
12.3.2 Strength Reduction
12.3.3 Loop Unrolling
12.3.4 Code Hoisting
12.3.5 Function Inlining
12.3.6 Dead Code Detection and Elimination
12.4 Low-Level Optimizations
12.4.1 Peephole Optimizations
12.4.2 Instruction Selection
12.5 Register Allocation
12.5.1 Safety of Register Allocation
12.5.2 Priority of Register Allocation
12.5.3 Conflicts Between Variables
12.5.4 Global Register Allocation
12.6 Optimization Pitfalls
12.7 Optimization Interactions
12.8 Exercises
12.9 Further Reading
Index
Chapter 1 – Introduction
The best way to learn about compilers is to write your own compiler from
beginning to end. While that may sound daunting at first, you will find
that this complex task can be broken down into several stages of moder-
ate complexity. The typical undergraduate computer science student can
write a complete compiler for a simple language in a semester, broken
down into four or five independent stages.
Without question, you should use the C programming language and the
X86 assembly language, of course!
Ok, maybe the answer isn’t quite that simple. There is an ever-increasing
number of programming languages that all have different strengths and
weaknesses. Java is simple, consistent, and portable, albeit not high
performance. Python is easy to learn and has great library support, but is
dynamically typed. Rust offers exceptional static type-safety, but is not (yet)
1.5 How is this book different from others?
Most books on compilers are very heavy on the abstract theory of scan-
ners, parsers, type systems, and register allocation, and rather light on
how the design of a language affects the compiler and the runtime. Most
are designed for use in a graduate survey of optimization techniques.
This book takes a broader approach by giving a lighter dose of opti-
mization, and introducing more material on the process of engineering a
compiler, the tradeoffs in language design, and considerations for inter-
pretation and translation.
You will also notice that this book doesn’t contain a whole bunch of
fiddly paper-and-pencil assignments to test your knowledge of compiler
algorithms. (Ok, there are a few of those in Chapters 3 and 4.) If you want
to test your knowledge, then write some working code. To that end, the
exercises at the end of each chapter ask you to take the ideas in the chapter,
and either explore some existing compilers, or write parts of your own. If
you do all of them in order, you will end up with a working compiler,
summarized in the final appendix.
[Figure: the compiler toolchain. Recoverable labels: Headers (stdio.h);
Object Code (prog.o); Static Linker (ld); Libraries (libc.a);
Executable (prog); Dynamic Linker (ld.so); Dynamic Libraries (libc.so);
Running Process.]
• The preprocessor prepares the source code for the compiler proper.
In the C and C++ languages, this means consuming all directives that
start with the # symbol. For example, an #include directive causes
the preprocessor to open the named file and insert its contents into
the source code. A #define directive causes the preprocessor to
substitute a value wherever a macro name is encountered. (Not all
languages rely on a preprocessor.)
other semantic routines, optimizes the code, and then produces as-
sembly language as the output. This part of the toolchain is the main
focus of this book.
• The linker consumes one or more object files and library files and
combines them into a complete, executable program. It selects the
final memory locations where each piece of code and data will be
loaded, and then “links” them together by writing in the missing ad-
dress information. For example, an object file that calls the printf
function does not initially know the address of the function. An
empty (zero) address will be left where the address must be used.
Once the linker selects the memory location of printf, it must go
back and write in the address at every place where printf is called.
In this book, our focus will be primarily on the compiler proper, which is
the most interesting component in the toolchain. The compiler itself can
be divided into several stages:
Character Stream → Scanner → Tokens → Parser → Abstract Syntax Tree →
Semantic Routines → Intermediate Representation → Code Generator →
Assembly Code
(Optimizers read and rewrite the Intermediate Representation.)
• The scanner consumes the plain text of a program, and groups to-
gether individual characters to form complete tokens. This is much
like grouping characters into words in a natural language.
• The parser consumes tokens and groups them together into com-
plete statements and expressions, much like words are grouped into
sentences in a natural language. The parser is guided by a grammar
which states the formal rules of composition in a given language.
The output of the parser is an abstract syntax tree (AST) that cap-
tures the grammatical structures of the program. The AST also re-
members where in the source file each construct appeared, so it is
able to generate targeted error messages, if needed.
• The semantic routines traverse the AST and derive additional mean-
ing (semantics) about the program from the rules of the language
and the relationship between elements of the program. For exam-
ple, we might determine that x + 10 is a float expression by ob-
serving the type of x from an earlier declaration, then applying the
language rule that addition between int and float values yields
a float. After the semantic routines, the AST is often converted into
an intermediate representation (IR) which is a simplified form of
assembly code suitable for detailed analysis. There are many forms
of IR which we will discuss in Chapter 8.
• One or more optimizers can be applied to the intermediate represen-
tation, in order to make the program smaller, faster, or more efficient.
Typically, each optimizer reads the program in IR format, and then
emits the same IR format, so that each optimizer can be applied in-
dependently, in arbitrary order.
• Finally, a code generator consumes the optimized IR and transforms
it into a concrete assembly language program. Typically, a code gen-
erator must perform register allocation to effectively manage the
limited number of hardware registers, and instruction selection and
sequencing to order assembly instructions in the most efficient form.
The first stage of the compiler (the scanner) will read in the text of
the source code character by character, identify the boundaries between
symbols, and emit a series of tokens. Each token is a small data structure
that describes the nature and contents of each symbol:
At this stage, the purpose of each token is not yet clear. For example,
factor and foo are simply known to be identifiers, even though one is
the name of a function, and the other is the name of a variable. Likewise,
we do not yet know the type of width, so the + could potentially rep-
resent integer addition, floating point addition, string concatenation, or
something else entirely.
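As a rough illustration of what such a token might look like, here is a minimal sketch in C. The type names (token_kind, struct token), the helper count_kind, and the exact set of token kinds are assumptions made for this example. The token list assumes the example statement is height = (width+56) * factor(foo); which is inferred from the names that appear in the AST for this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this example; not the book's definitions. */
typedef enum {
    TOKEN_IDENTIFIER, TOKEN_INTEGER, TOKEN_ASSIGN, TOKEN_ADD,
    TOKEN_MULTIPLY, TOKEN_LPAREN, TOKEN_RPAREN, TOKEN_SEMICOLON
} token_kind;

struct token {
    token_kind kind;   /* the nature of the symbol */
    const char *text;  /* the characters that make up the symbol */
};

/* Tokens a scanner might emit for: height = (width+56) * factor(foo); */
static const struct token example_tokens[] = {
    { TOKEN_IDENTIFIER, "height" }, { TOKEN_ASSIGN, "=" },
    { TOKEN_LPAREN, "(" },          { TOKEN_IDENTIFIER, "width" },
    { TOKEN_ADD, "+" },             { TOKEN_INTEGER, "56" },
    { TOKEN_RPAREN, ")" },          { TOKEN_MULTIPLY, "*" },
    { TOKEN_IDENTIFIER, "factor" }, { TOKEN_LPAREN, "(" },
    { TOKEN_IDENTIFIER, "foo" },    { TOKEN_RPAREN, ")" },
    { TOKEN_SEMICOLON, ";" }
};

#define EXAMPLE_TOKEN_COUNT (sizeof example_tokens / sizeof example_tokens[0])

/* Count how many of the n tokens have the given kind. */
size_t count_kind(const struct token *tokens, size_t n, token_kind kind)
{
    size_t i, count = 0;
    assert(tokens != NULL);
    for (i = 0; i < n; i++)
        if (tokens[i].kind == kind)
            count++;
    return count;
}
```

Note that factor and foo carry the same kind, TOKEN_IDENTIFIER. Nothing in the token stream says that one names a function and the other a variable; that distinction is left to later stages.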
The next step is to determine whether this sequence of tokens forms
a valid program. The parser does this by looking for patterns that match
the grammar of a language. Suppose that our compiler understands a
language with the following grammar:
Grammar G1
1. expr → expr + expr
2. expr → expr * expr
3. expr → expr = expr
4. expr → id ( expr )
5. expr → ( expr )
6. expr → id
7. expr → int
Each line of the grammar is called a rule, and explains how various
parts of the language are constructed. Rules 1-3 indicate that an expression
can be formed by joining two expressions with operators. Rule 4 describes
a function call. Rule 5 describes the use of parentheses. Finally, rules 6 and
7 indicate that identifiers and integers are atomic expressions. 1
The parser looks for sequences of tokens that can be replaced by the
left side of a rule in our grammar. Each time a rule is applied, the parser
creates a node in a tree, and connects the sub-expressions into the abstract
syntax tree (AST). The AST shows the structural relationships between
each symbol: addition is performed on width and 56, while a function
call is applied to factor and foo.
With this data structure in place, we are now prepared to analyze the
meaning of the program. The semantic routines traverse the AST and de-
rive additional meaning by relating parts of the program to each other, and
to the definition of the programming language. An important component
of this process is typechecking, in which the type of each expression is
determined, and checked for consistency with the rest of the program. To
keep things simple here, we will assume that all of our variables are plain
integers.
To generate linear intermediate code, we perform a post-order traver-
sal of the AST and generate an IR instruction for each node in the tree. A
typical IR looks like an abstract assembly language, with load/store in-
structions, arithmetic operations, and an infinite number of registers. For
example, this is a possible IR representation of our example program:
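The post-order traversal itself can be sketched in a few lines of C. Everything here is an assumption made for illustration: the struct ast layout, the function names gen_ir and gen_example, and the instruction spellings (LOAD, IADD, IMUL, ARG, CALL, STOR) are invented for this sketch and are not necessarily the IR used elsewhere in this book. Virtual registers are simply numbered upward, since the IR assumes an infinite supply.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { AST_ASSIGN, AST_ADD, AST_MUL, AST_CALL, AST_ID, AST_INT } ast_kind;

struct ast {
    ast_kind kind;
    const char *name;              /* identifier or integer text, for leaves */
    const struct ast *left, *right;
};

/* Emit IR for node n in post-order (children first, then the node).
   Returns the number of the virtual register that holds the result. */
int gen_ir(const struct ast *n, int *next_reg, FILE *out)
{
    int l, r, dst;
    assert(n != NULL && next_reg != NULL && out != NULL);
    switch (n->kind) {
    case AST_ID:
        dst = (*next_reg)++;
        fprintf(out, "LOAD  %s -> %%r%d\n", n->name, dst);
        return dst;
    case AST_INT:
        dst = (*next_reg)++;
        fprintf(out, "LOAD  $%s -> %%r%d\n", n->name, dst);
        return dst;
    case AST_ADD:
    case AST_MUL:
        l = gen_ir(n->left, next_reg, out);
        r = gen_ir(n->right, next_reg, out);
        dst = (*next_reg)++;
        fprintf(out, "%s  %%r%d, %%r%d -> %%r%d\n",
                n->kind == AST_ADD ? "IADD" : "IMUL", l, r, dst);
        return dst;
    case AST_CALL:                 /* left: function name, right: argument */
        r = gen_ir(n->right, next_reg, out);
        dst = (*next_reg)++;
        fprintf(out, "ARG   %%r%d\n", r);
        fprintf(out, "CALL  %s -> %%r%d\n", n->left->name, dst);
        return dst;
    case AST_ASSIGN:               /* left: target name, right: value */
        r = gen_ir(n->right, next_reg, out);
        fprintf(out, "STOR  %%r%d -> %s\n", r, n->left->name);
        return r;
    }
    return -1;
}

/* Build the AST for height = (width+56) * factor(foo) and emit its IR.
   Returns the number of virtual registers used. */
int gen_example(FILE *out)
{
    struct ast height = { AST_ID, "height", NULL, NULL };
    struct ast width  = { AST_ID, "width", NULL, NULL };
    struct ast c56    = { AST_INT, "56", NULL, NULL };
    struct ast factor = { AST_ID, "factor", NULL, NULL };
    struct ast foo    = { AST_ID, "foo", NULL, NULL };
    struct ast add    = { AST_ADD, NULL, &width, &c56 };
    struct ast call   = { AST_CALL, NULL, &factor, &foo };
    struct ast mul    = { AST_MUL, NULL, &add, &call };
    struct ast assign = { AST_ASSIGN, NULL, &height, &mul };
    int next_reg = 1;
    gen_ir(&assign, &next_reg, out);
    return next_reg - 1;
}
```

Calling gen_example(stdout) prints one instruction per interior node and per loaded leaf, in the order the traversal finishes each subtree, so the addition is emitted before the call and both before the multiplication.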
1 The careful reader will note that this example grammar has ambiguities. We will discuss
[Figure: abstract syntax tree for the example program]
ASSIGN
    ID height
    MUL
        ADD
            ID width
            INT 56
        CALL
            ID factor
            ID foo
writing the same IR) so that they can be enabled and disabled indepen-
dently. A retargetable compiler contains multiple code generators, so that
the same IR can be emitted for a variety of microprocessors.
2.4 Exercises
2. Determine how to change the optimization level for your local com-
piler. Find a non-trivial source program and compile it at multiple
levels of optimization. How does the compile time, program size,
and run time vary with optimization levels?
3. Search the internet for the formal grammars for three languages that
you are familiar with, such as C++, Ruby, and Rust. Compare them
side by side. Which language is inherently more complex? Do they
share any common structures?
Chapter 3 – Scanning
Scanning is the process of identifying tokens from the raw text source code
of a program. At first glance, scanning might seem trivial – after all, iden-
tifying words in a natural language is as simple as looking for spaces be-
tween letters. However, identifying tokens in source code requires the
language designer to clarify many fine details, so that it is clear what is
permitted and what is not.
Most languages will have tokens in these categories:
• Keywords are words in the language structure itself, like while or
class or true. Keywords must be chosen carefully to reflect the
natural structure of the language, without interfering with the likely
names of variables and other identifiers.
• Identifiers are the names of variables, functions, classes, and other
code elements chosen by the programmer. Typically, identifiers are
arbitrary sequences of letters and possibly numbers. Some languages
require identifiers to be marked with a sentinel (like the dollar sign
in Perl) to clearly distinguish identifiers from keywords.
• Numbers could be formatted as integers, or floating point values, or
fractions, or in alternate bases such as binary, octal or hexadecimal.
Each format should be clearly distinguished, so that the programmer
does not confuse one with the other.
• Strings are literal character sequences that must be clearly distin-
guished from keywords or identifiers. Strings are typically quoted
with single or double quotes, but also must have some facility for
containing quotations, newlines, and unprintable characters.
• Comments and whitespace are used to format a program to make it
visually clear, and in some cases (like Python) are significant to the
structure of a program.
When designing a new language, or designing a compiler for an exist-
ing language, the first job is to state precisely what characters are permit-
ted in each type of token. Initially, this could be done informally by stating,
for example, “An identifier consists of a letter followed by any number of letters
and numerals.”, and then assigning a symbolic constant (TOKEN_IDENTIFIER)
for that kind of token. As we will see, an informal approach is often am-
biguous, and a more rigorous approach is needed.
Figure 3.1 shows how one might write a scanner by hand, using simple
coding techniques. To keep things simple, we consider just a few
tokens: * for multiplication, ! for logical-not, != for not-equal, and
sequences of letters and numbers for identifiers.
The basic approach is to read one character at a time from the input
stream (fgetc(fp)) and then classify it. Some single-character tokens are
easy: if the scanner reads a * character, it immediately returns
TOKEN_MULTIPLY, and the same would be true for addition, subtraction,
and so forth.
However, some characters are part of multiple tokens. If the scanner
encounters !, that could represent a logical-not operation by itself, or it
could be the first character in the != sequence representing not-equal-to.
Upon reading !, the scanner must immediately read the next character. If
the next character is =, then it has matched the sequence != and returns
TOKEN_NOT_EQUAL.
But, if the character following ! is something else, then the non-matching
character needs to be put back on the input stream using ungetc, because
it is not part of the current token. The scanner returns TOKEN_NOT and will
consume the put-back character on the next call to scan_token.
In a similar way, once a letter has been identified by isalpha(c), then
the scanner keeps reading letters or numbers, until a non-matching char-
acter is found. The non-matching character is put back, and the scanner
returns TOKEN_IDENTIFIER.
(We will see this pattern come up in every stage of the compiler: an
unexpected item doesn’t match the current objective, so it must be put
back for later. This is known more generally as backtracking.)
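The approach described above can be sketched as follows. This is a minimal illustration written for this passage, not a reproduction of Figure 3.1. The token_t enumeration, the TOKEN_EOF and TOKEN_ERROR values, and the decision to skip whitespace are all assumptions added so that the function is complete.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* TOKEN_EOF and TOKEN_ERROR are additions for this sketch. */
typedef enum {
    TOKEN_EOF, TOKEN_MULTIPLY, TOKEN_NOT, TOKEN_NOT_EQUAL,
    TOKEN_IDENTIFIER, TOKEN_ERROR
} token_t;

/* Read one token from fp, consuming exactly the characters it contains. */
token_t scan_token(FILE *fp)
{
    int c;
    assert(fp != NULL);

    /* Assumption: whitespace separates tokens and is otherwise ignored. */
    do {
        c = fgetc(fp);
    } while (c != EOF && isspace(c));

    if (c == EOF)
        return TOKEN_EOF;

    if (c == '*')
        return TOKEN_MULTIPLY;

    if (c == '!') {
        /* Could be ! alone or the start of != so look one character ahead. */
        int d = fgetc(fp);
        if (d == '=')
            return TOKEN_NOT_EQUAL;
        if (d != EOF)
            ungetc(d, fp);      /* not part of this token: put it back */
        return TOKEN_NOT;
    }

    if (isalpha(c)) {
        /* Keep reading letters and numbers until something else appears. */
        int d;
        do {
            d = fgetc(fp);
        } while (d != EOF && isalnum(d));
        if (d != EOF)
            ungetc(d, fp);      /* put back the non-matching character */
        return TOKEN_IDENTIFIER;
    }

    return TOKEN_ERROR;
}
```

Calling scan_token repeatedly on a stream containing count1 != x * !y would yield an identifier, not-equal, an identifier, multiply, logical-not, an identifier, and then TOKEN_EOF. Note that the standard only guarantees one character of pushback with ungetc, which is all this scanner needs.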
As you can see, a hand-made scanner is rather verbose. As more to-
ken types are added, the code can become quite convoluted, particularly
if tokens share common sequences of characters. It can also be difficult
for a developer to be certain that the scanner code corresponds to the de-
sired definition of each token, which can result in unexpected behavior on
complex inputs. That said, for a small language with a limited number of
tokens, a hand-made scanner can be an appropriate solution.
For a complex language with a large number of tokens, we need a more
formalized approach to defining and scanning tokens. A formal approach
will allow us to have a greater confidence that token definitions do not
conflict and the scanner is implemented correctly. Further, a formal ap-
proach will allow us to make the scanner compact and high performance
– surprisingly, the scanner itself can be the performance bottleneck in a
compiler, since every single character must be individually considered.
The formal tools of regular expressions and finite automata allow us
to state very precisely what may appear in a given token type. Then, auto-
mated tools can process these definitions, find errors or ambiguities, and
produce compact, high performance code.
Rule #3 is known as the Kleene closure and has the highest precedence.
Rule #2 is known as concatenation. Rule #1 has the lowest precedence and
is known as alternation. Parentheses can be added to adjust the order of
operations in the usual way.
Here are a few examples using just the basic rules. (Note that a finite
RE can indicate an infinite set.)
Regular Expression s    Language L(s)
hello                   { hello }
d(o|i)g                 { dog, dig }
moo*                    { mo, moo, mooo, ... }
(moo)*                  { ε, moo, moomoo, moomoomoo, ... }
a(b|a)*a                { aa, aaa, aba, aaaa, aaba, abaa, ... }
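If you would like to check these examples by experiment, one option is the POSIX regular expression library, whose extended syntax includes the three basic operations used here. The sketch below assumes a POSIX system that provides <regex.h>. The helper name re_matches and the 256-byte pattern limit are choices made for this example. Because regexec searches for a match anywhere in the text, the helper wraps the pattern as ^(...)$ so that the whole string must belong to the language.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if the entire text is in the language of pattern, else 0. */
int re_matches(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, matched;

    assert(pattern != NULL && text != NULL);

    /* Anchor the pattern so that it must match the whole string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return 0;               /* pattern too long for this sketch */

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;               /* pattern did not compile */

    matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

For instance, re_matches("(moo)*", "") succeeds because the Kleene closure admits zero repetitions, while re_matches("moo*", "m") fails because only the final o is repeated.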
The syntax described so far is entirely sufficient to write any regular
expression. But, it is also handy to have a few helper operations built on
top of the basic syntax:
3.4 Finite Automata
as the input symbol. Some states of the FA are known as accepting states
and are indicated by a double circle. If the FA is in an accepting state after
all input is consumed, then we say that the FA accepts the input. We say
that the FA rejects the input string if it ends in a non-accepting state, or if
there is no edge corresponding to the current input symbol.
Every RE can be written as an FA, and vice versa. For a simple regular
expression, one can construct an FA by hand. For example, here is an FA
for the keyword for:
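0 --f--> 1 --o--> 2 --r--> 3 (accepting)

[Figure: finite automaton with states 0, 1, 2 and edges labeled a-z and 0-9.]

[Figure: finite automaton with states 0 through 3 and edges labeled 0, 1-9, and 0-9.]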
The transitions between states are represented by a matrix (M[s, i]) which
encodes the next state, given the current state and input symbol. (If the
transition is not allowed, we mark it with E to indicate an error.) For each
symbol, we compute s = M[s, i] until all the input is consumed, or an error
state is reached.
0 --i--> 1 --n--> 2 --g--> 3 (accepting), with a self-loop on state 0 labeled [a-z]
Now consider how this automaton would consume the word sing. It
could proceed in two different ways. One would be to move to state 0 on
s, state 1 on i, state 2 on n, and state 3 on g. But the other, equally valid
way would be to stay in state 0 the whole time, matching each letter to the
[a-z] transition. Both ways obey the transition rules, but one results in
acceptance, while the other results in rejection.
The problem here is that state 0 allows for two different transitions on
the symbol i. One is to stay in state 0 matching [a-z] and the other is to
move to state 1 matching i.
Moreover, there is no simple rule by which we can pick one path or
another. If the input is sing, the right solution is to proceed immediately
from state zero to state one on i. But if the input is singing, then we
should stay in state zero for the first ing and proceed to state one for the
second ing.
An NFA can also have an ε (epsilon) transition, which represents the
empty string. This transition can be taken without consuming any input
symbols at all. For example, we could represent the regular expression
a*(ab|ac) with this NFA:
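[Figure: NFA for a*(ab|ac), states 0 through 6.
State 0 has a self-loop on a and ε transitions to states 1 and 4.
Upper branch: 1 --a--> 2 --b--> 3. Lower branch: 4 --a--> 5 --c--> 6.]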
States Action
0, 1, 4 consume a
0, 1, 2, 4, 5 consume a
0, 1, 2, 4, 5 consume a
0, 1, 2, 4, 5 consume c
6 accept
In principle, one can implement an NFA in software or hardware by
simply keeping track of all of the possible states. But this is inefficient.
In the worst case, we would need to evaluate all states for all characters
on each input transition. A better approach is to convert the NFA into an
equivalent DFA, as we show below.
3.5 Conversion Algorithms
Regular expressions, NFAs, and DFAs are all equally powerful. For every
RE, there is an FA, and vice versa. However, a DFA is by far the most
straightforward of the three to implement in software. In this section, we
will show how to convert an RE into an NFA, then an NFA into a DFA,
and then to optimize the size of the DFA.
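The NFA for any character a is a start state joined to an accepting state
by a single edge labeled a. The NFA for an ε transition has the same shape,
with the edge labeled ε.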
Now, suppose that we have already constructed NFAs for the regular
expressions A and B, indicated below by rectangles. Both A and B have
a single start state (on the left) and accepting state (on the right). If we
write the concatenation of A and B as AB, then the corresponding NFA is
simply A and B connected by an ε transition. The start state of A becomes
the start state of the combination, and the accepting state of B becomes the
accepting state of the combination:
A ε B
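[Figures: the NFA constructions for the alternation A|B and the Kleene
closure A*, each built from ε transitions, followed by a step-by-step
construction for the example a(cat|cow)*: the NFAs for cat and cow, then
cat|cow, then (cat|cow)*, and finally a(cat|cow)*.]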
You can easily see that the NFA resulting from the construction algo-
rithm, while correct, is quite complex and contains a large number of ep-
silon transitions. An NFA representing the tokens for a complete language
could end up having thousands of states, which would be very impractical
to implement. Instead, we can convert this NFA into an equivalent DFA.
Epsilon closure.
ε-closure(n) is the set of NFA states reachable from NFA state n by zero
or more ε transitions.
Now we define the subset construction algorithm. First, we create a
start state D0 corresponding to the ε-closure(N0). Then, for each outgoing
character c from the states in D0, we create a new state containing the
epsilon closure of the states reachable by c. More precisely:
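[Figure 3.4: NFA for a(cat|cow)*, with states N0 through N13. Labeled edges:
N0 --a--> N1; N8 --c--> N9 --a--> N10 --t--> N11;
N4 --c--> N5 --o--> N6 --w--> N7. All remaining edges are ε transitions.]

[Figure: the DFA produced by the subset construction.]
DFA state   NFA states
D0          N0
D1          N1, N2, N3, N4, N8, N13
D2          N5, N9
D3          N6
D4          N7, N12, N13, N2, N3, N4, N8
D5          N10
D6          N11, N12, N13, N2, N3, N4, N8
Transitions: D0 --a--> D1; D1 --c--> D2; D2 --o--> D3; D2 --a--> D5;
D3 --w--> D4; D5 --t--> D6; D4 --c--> D2; D6 --c--> D2.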
Example. Let’s work out the algorithm on the NFA in Figure 3.4. This
is the same NFA corresponding to the RE a(cat|cow)* with each of the
states numbered for clarity.
7. Remove D4 from the work list, and observe that the only outgoing
transition c leads to states N5 and N9 which already exist as state D2,
so simply add a transition D4 --c--> D2.
8. Remove D6 from the work list and, in a similar way, add D6 --c--> D2.
9. The work list is empty, so we are done.
[Figure: a DFA with states 1 through 5 over the inputs a and b.]

[Figure: a first attempt at minimization, with two super-states,
(1,2,3,4) and (5).]
Now, we ask whether this graph is consistent with respect to all possi-
ble inputs, by referring back to the original DFA. For example, we observe
that, if we are in super-state (1,2,3,4) then an input of a always goes to
state 2, which keeps us within the super-state. So, this DFA is consistent
with respect to a. However, from super-state (1,2,3,4) an input of b can
either stay within the super-state or go to super-state (5). So, the DFA is
inconsistent with respect to b.
To fix this, we try splitting out one of the inconsistent states (4) into a
new super-state, taking the transitions with it:
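[Figure: the DFA after splitting out state 4, with super-states (1,2,3),
(4), and (5).]

[Figure: the DFA after a further split, with super-states (1,3), (2), (4),
and (5).]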
Again, we examine each super-state and observe that each possible in-
put is consistent with respect to the super-state, and therefore we have the
minimal DFA.
Regular expressions and finite automata are powerful and effective at rec-
ognizing simple patterns in individual words or tokens, but they are not
sufficient to analyze all of the structures in a problem. For example, could
you use a finite automaton to match an arbitrary number of nested paren-
theses?
It’s not hard to write out an FA that could match, say, up to three pairs
of nested parentheses, like this:
0 --(--> 1 --(--> 2 --(--> 3
3 --)--> 2 --)--> 1 --)--> 0
26
Exploring the Variety of Random
Documents with Different Content
(quoted on Petrograd Imperial Ballet School), 173f.
Petipa school, 185.
Petit battements, 95.
[Les] Petits Riens (Noverre and Mozart), 91.
Petrograd (Museum), 13;
(Imperial Ballet School), 172;
(opera house), 175.
Petrouchka (Stravinsky), 229ff.
Pharaohs (dancing in the court of), 17.
Philip of Macedonia, 55.
Philippus (Roman consul), 76.
Philosophic symbolism (in Indian dance), 29.
Phœnicians, 57.
Physical exercises, 239.
Pipe (Egyptian), 8, 18.
Pipes (in Graveyard Dance), 22;
(in 15th-cent. Italian ballet), 82.
Pirouette, 94, 97, 150, 163;
(in Egyptian dancing), 18, 20.
Plaasovaya (Russian folk-dance), 140.
Plastomimic choreography, 247ff.
Plato (quoted), iv;
(cited), 52, 58, 67, 69.
Plots (for ballets), 250.
Plutarch (cited), iv, 14, 45, 67.
Poetry, 235.
Pointes, 163, 215.
Poland (folk-dancing), 136.
Pollux, 54.
Polo (Moorish dance), 106.
Polonaise (Polish folk-dance), 136.
Polowetsi dance (Cossack), 140.
Portugal (mediæval strolling ballets), 80f.
Positions. See Steps.
Poushkin, 178.
Prévost, Mme., 100.
Priapus, 54.
Price, Waldemar (Danish ballet dancer), 164.
Primitive dances (rel. to sexual selection), 6.
Primitive peoples, 3ff.
Prince Igor, 228.
Professional dancing, 7;
(Egyptian), 18.
Provence, 80f, 122, 131.
Prussia (Fackeltanz), 128.
Pskoff, 140.
Psyche (French ballet), 92.
Psychology, 1ff, 24, 45, 136, 139.
Pugni, Cesare (ballet composer), 152.
Pygmalion and Galatea (ballet), 99.
Pylades (Roman dancer), 73, 74f.
Pyrrhic dance, 60f.
Pythian games, 54.
Q
Quadrille (French social dance), 122.
Quintilian (quoted), 72.
R
Rabinoff, Max, 188.
Racial characteristics, 11.
‘Ragtime,’ 263.
Rainbow Dance, 192.
Ramble (Indian goddess of dancing), 24f.
Realism, 157, 249f.
Réception d’une jeune Nymphe à la Court de Terpsichore, 152.
Reed pipes. See Pipes.
Reger, Max, 205.
Regnard (quoted), 88.
Reinach, Théodore (cited on Greek arts), 69.
René of Provence (author of mediæval ballet), 80.
Reno (painter of Salome dance), 45.
Rheinländer (German dance), 131.
Rhythm, 1, 2;
(in naturalistic dancing), 196, 198;
(as basis of all arts), 235;
(in Jacques-Dalcroze system), 239, 244;
(in ballet), 250.
Rhythmic gymnastics, 234ff, 240, 249.
Richelieu, 86, 100.
Rigaudon, 148f.
Rimsky-Korsakoff, Nicolai, 151, 152, 171, 183, 224, 226, 254.
Rinaldo and Armida (ballet by Noverre), 90, 99.
Risti Tants (Esthonian folk-dance), 126ff.
Robert of Normandie (ballet), 164.
Robespierre, 93.
Robinson, Louis (cited on dance instinct), 3.
Rodin (quoted), 196.
Romaika (Slavic folk-dance), 137.
Rome (dancing in), 3, 72ff, 247;
(sacred dancing), 9;
(imitation of Greek dances), 10;
(Pyrrhic dance), 60.
Roman Church. See Church.
Romulus, 73.
Rondes (similarity to Eleusinian Mysteries), 67;
(French folk-dance), 121.
Roses of Love (ballet by Noverre), 90.
Rossini, 101, 103, 151.
Rouen, 100.
Roumania (folk-dance), 137f.
Round. See Ronde.
Royal Academy of Dancing (French), 86.
Rubinstein, Anton, 183, 256;
(composed ‘Tarantella’), 124.
Rubinstein, Ida, 45.
Ruggera (Italian folk-dancing), 124.
Rune tunes (Finnish), 63.
Russia (Imperial Ballet), 92;
(influence of, on choreography), 102;
(nationalistic tendencies), 104f;
(folk-dancing), 139ff, 262;
(influences on ballet), 169;
(ballets of opera house), 175;
(influence of Duncan school), 200, 206, 218f.
Russian Imperial Ballet School, 90f, 105, 172.
Russian Imperial Dramatic Dancing School, 180.
Ruthenia (folk-dancing). See Slavic folk-dances.
S
Sacchetto, Rita, 203, 212.
Sacre du Printemps (Stravinsky), 231.
Sacred dancing (in rel. to folk-lore), 9;
(Egyptian), 15;
(Indian), 26;
(Japanese), 38;
(American Indian), 39, 41f;
(Greek), 59, 67ff;
(Roman), 73f.
Sadler, Michael T. H. (quoted on Jacques-Dalcroze School), 235f.
Sahara Graveyard Dance, 21.
Sailor’s Dance (Dutch), 135.
St. Basil (cited), iii.
St. Carlos (celebrated by strolling ballet), 80.
St. Denis, Ruth, 208, 212.
Saint-Léon, 159.
St. Matthew (quoted), 44.
St. Petersburg (court ballet), 90, 161.
See also Petrograd.
Saint-Saëns, Camille, 186.
St. Vitus’ Dance, 129.
Sakuntala (French ballet), 152.
Sallé, Mlle., 94, 99, 100.
Salmacida Spolia (Sir William Davenant), 84.
Salome dances, 44f, 191.
Salome (Richard Strauss), 45.
Saltarello (Italian folk-dance), 124.
Sangalli, Rita, 159.
Sappho, 70, 94.
Sarabande, 146.
Sarasate, Pablo, 108.
Satyr Dance (in Dionysian Mysteries), 68, 69.
Sauvages de la Mer du Sud, [Les] (French ballet), 94.
Savage peoples. See Primitive peoples.
Savinskaya, 206.
Saxony (folk-dancing), 130.
Scaliger, Joseph Justus (cited), 54.
Scandinavia (folk-dances), 2, 133;
(nationalistic tendencies), 104f;
(waltz), 131;
(naturalistic school), 205.
Schafftertanz (of Munich), 129.
Scheherezade (Rimsky-Korsakoff), 152, 226.
Schiller, 166, 250.
Schirjajeff, 182.
Schliemann (Egyptologist), cited, 17.
Schmoller (Saxonian folk-dance), 130.
Schnitzler, Arthur, 166.
Schönberg, Arnold, 205.
Schools of dancing (Petipa), 185;
(Duncan), 197;
(Jacques-Dalcroze), 197f.
See Academies.
Schopenhauer (cited), 250;
(quoted), 64.
Schleiftänze, 129.
Schreittänze, 129.
Schubert, Franz, 103f, 254.
Scotch Reel, 118f.
Scotland (folk-dancing), 118f.
Scribe, Eugène, 103.
Schuhplatteltanz (Bavarian folk-dance), 129f.
Schumann, Robert, 206.
Sculpture (in rel. to dancing), 173, 196, 235.
Seguidilla (Spanish dance), 50.
Sensationalism, 190.
Seroff, Alexander Nikolayevitch, 104, 171, 181.
Serpentine Dance, 189, 190f.
Servia (folk-dancing).
See Slavic folk-dances.
Setche, Egyptologist (cited), 14.
Seville (church dancing), iv, 78;
(court dancing), 47.
Sex instinct (in rel. to folk-dancing), v, 11, 134, 139.
Shakespeare (cited on the jig), 119.
Sharp, Cecil (quoted on Morris dances), 113f.
Shean Treuse (Scotch folk-dance), 118.
Shintoism (Japanese religion), 36.
Shirley, James, 83.
Sibelius, Jean, 205, 254, 256, 257f.
Siberia (folk-dancing), 140.
Siciliana (Italian folk-dance), 124.
[Le] Sicilien (ballet), 153.
Sieba (ballet), 152.
Siebensprung (Swabian folk-dance), 130.
Singing (in Finnish dances), 133.
Singing ballet, 177f.
Singing Sirens, 57.
Skirt Dance, 189, 212.
Skoliasmos (in Dionysian mysteries), 68f.
Skralat (Swedish folk-dance), 133.
Slavic folk-dances, 136ff.
Sleeping Beauty (Tschaikowsky), 152, 185.
Snake dances (Lithuanian), 135;
(American Indian), 38, 41, 135.
Snegourotchka (Rimsky-Korsakoff). See Snow Maiden.
Snow Maiden (Rimsky-Korsakoff), 152, 177, 183f.
Social dancing (Greek), 54f;
(Polish), 136;
(in 17th cent.), 144ff.
See also Court dancing.
Socrates, 54, 56.
Sokolova (ballerina), 151, 183.
Solomon, Hebrew king, 43, 44.
Sophocles, 62.
Sound (in relation to movement), 238.
[La] Source (Delibes), 152.
Spain (religious dancing), iv;
(folk-dancing), 2, 105ff, 210ff;
(choreographic art of Moors), 46, 50f;
(mediæval strolling ballets), 80f.
Spartan dance, 54f, 60.
Spectre de la Rose (ballet), 221, 223, 229.
Spendiaroff, 256.
Spinning top principle, 216.
Stage dancing (in Middle Ages), 81, 148.
See also Professional dancing.
Steps, 2;
(in American Indian dances), 42;
(in courante), 88;
(in classic French ballet), 95f;
(Bolero), 109;
(Seguidilla), 110;
(Hungarian folk-dances), 125f;
(Rigaudon), 149;
(Bournoville’s reform), 163.
Stephania (Roman dancer), 77.
Stewart-Richardson, Lady Constance, 206.
Stockholm (ballet dancing), 161.
Stockholm school, 151.
Stomach Dance (Arabian dance), 3, 21, 22.
Stone Age, 5.
Stramboe, Adolph F., 164.
Strassburg, 129.
Strauss, Johann, 132.
Strauss, Richard, 204f, 232.
Stravinsky, Igor, 185, 229ff.
Strindberg, August, 165.
String instruments (Indian), 27.
Strolling ballets (mediæval), 80f;
(in French Revolution), 93f.
Strophic principle, 63.
Stuck (painter of Salome dance), 45.
Stuttgart (court), 90, 153.
Subra, Mlle. (ballerina), 159.
Su-Chu-Fu (dancing academy), 34.
Suetonius (cited), 76.
Sun’s Darling (English masque), 84.
Svendsen, Johann, 133, 205.
Svetloff (cited), 218.
Swan, The (Saint-Saëns), 186.
Swanhilde (ballet), 167.
Swan Lake (Russian ballet), 152, 184f.
Swabia (folk-dancing), 130.
Sweden (influence on Russian ballet), 169.
See also Scandinavia.
Sword Dance (English), 21, 33, 113, 115ff.
[La] Sylphide (Delibes), 152, 153, 154, 156, 163.
[Les] Sylphides, 175, 221.
Sylvia (Delibes), 152.
Symbolism (in Indian dancing), 29, 263f;
(in Hungarian folk-dancing), 126;
(in Lada’s dances), 254f;
(in modern ballet), 258, 265.
Symons, Arthur (quoted), 264f.
Symphonic music (as basis for dancing), 200, 206.
Syrinx (Egyptian instrument), iv.
Szolo (Hungarian folk-dance), 126.
T
Tabor (in Morris dance), 115.
Tacitus (cited), 76.
Taglioni, Maria, 11, 151, 152ff, 156, 157, 193.
Taglioni, Salvatore, 151, 152, 161.
Ta-gien (Chinese dance), 32.
Ta-gu (Chinese dance), 32.
Ta-knen (Chinese dance), 32.
Talmud, 43.
Ta-mao (Chinese dance), 32.
Tambourine (in Hebrew dance), 19;
(in Indian dance), 27;
(with bells, Chinese), 32;
(in Greek dances), 71;
(in Spanish dance), 79f, 106;
(in Tarantella), 122.
Taneieff, Sergei Ivanovich, 224.
Tarantella (Italian folk-dance), 122ff.
Tartar tribes, 140.
Tascara (Spanish folk-dance), 111f.
Taubentanz (Black Forest), 130.
Ta-u (Chinese dance), 32.
Tcherepnin, 185, 226, 229.
Technique (Duncan), 199;
(instrumental), 237;
(eurhythmic), 239.
Telemachus, 53.
Telemaque (French ballet), 92.
Teleshova (ballerina), 151, 181.
Telethusa (Roman dancer), 77.
Tempe Restored (Aurelian Townsend), 84f.
Temple dancing (Hebraic), 43, 44;
(Greek), 54f;
(Esthonian), 127.
See also Sacred dancing.
Terpsichore, 10, 57.
Terpsichore (ballet by Handel), 99.
Teu-Kung (Chinese dancing teacher), 31.
Thackeray (quoted on Taglioni), 154.
Thales, 59.
Théatre des Arts, 92.
Theatre of Dionysius, 64f.
Thebes, 19.
Theseus, iv, 54, 69.
They (Chinese monarch), 30.
Tiberius (Roman emperor), 76.
Tichomiroff, 221.
Time, 240f.
Time-marker (in Greek dancing), 70f.
Time-values, 241.
Titans, 59.
Titus (Roman emperor), 34.
Toe-dance, 215.
Toledo (church dancing), iv, 78.
Toreadoren (ballet), 164.
Torra (Murcian folk-dance), 106.
Tourdion (social dance), 150.
Townsend, Aurelian, 84f.
Trepak (Russian folk-dance), 140.
Trescona (Florentine folk-dance), 124.
Triangle (in English Horn dance), 117.
Tripoli (Almeiis dancers in), 21.
Triumph of Love, 87.
Triumph of Peace (James Shirley), 83.
Trouhanova, Natasha, 45, 244, 256f.
Trumpets (in 15th-cent. Italian ballet), 82.
Tschaikowsky, Peter Ilyitch, 104, 151, 152, 171, 177, 183, 184, 185.
Tshamuda (Indian goddess), 26.
Tuileries, 87.
Tunic, ballerina’s, 215.
Tunis (Almeiis dancers in), 21.
Turgenieff, 104, 171;
(quoted on Elssler), 155f.
Tuta, 215.
U
Uchtomsky, Prince (cited), 28.
U-gientze (Chinese dance), 32.
Ulysses, 52.
Urbino, Duke of, 80.
V
Vafva Vadna (Swedish folk-dance), 133f.
Valdemar (Danish ballet), 163, 164.
Valencia, iv, 78, 107f.
Valencian Bishop (advocate of dancing), 78.
Valentine, Gwendoline (ballet dancer), 206.
Vanka (Cossack dance), 140.
Van Staden (Colonel), 179.
Vaudoyer, J. L., 229.
Vaughan, Kate (ballet dancer), 193.
Veie de Noue (in Lou Gue), 80.
Veils (used in Greek dancing), 66, 70.
Venera (Indian goddess), 24.
[La] Ventana (ballet), 166.
Venus of Cailipyge, 76f.
Verbunkes (Hungarian folk-dance), 126.
[La] Vestale (ballet), 153.
Vestris brothers, 91, 101, 148, 151, 162.
Viennese court, 90.
Viennese School, 151.
Villiani, Mme. (ballet dancer), 22, 193.
Vingakersdans (Swedish folk-dance), 134.
Violin (in 15th-cent. Italian ballet), 82;
(in Spanish folk-dance), 107.
Vision of Salome (ballet), 201.
Vocal ballets, 177f.
Vocal music (dependence of dancing upon), 8;
(in Greek dances), 58.
Voisins, Comte Gilbert des, 154.
Volga, 140.
Volinin (Russian ballet dancer), 185, 187, 248.
Volkhonsky, Prince Serge (quoted), 197f, 212f, 215ff, 232, 249.
Voltaire (cited), 99.
Volte (French folk-dance), 131.
Vuillier (quoted on Spanish temple dancing), 79f.
Vulcan, 53.
Vulture Dance (Greek), 69.
W
Wagnerian operas, 63.
Waldteufel, 132.
Waltz, 131f.
Walzer, 131.
War-dances (primitive), 5f;
(Pyrrhic), 60;
(Roman), 73;
(Hungarian), 126.
Warsaw (opera house), 175.
Weber, Carl Maria von, 91, 103, 229.
Weber, Louise, 192.
Weiss, Mme., 159.
Wellman, Christian, 180.
Whistles (in American Indian dances), 41;
(in Morris dance), 115.
Whitehall (masques performed at), 83.
Wiesenthal, Elsa and Grete, 202f, 212.
Wilhelm II, 130.
Wilkinson, Sir Gardner, on Egypt (cited), 18f;
(quoted), 20f.
Women (earliest appearance of, in ballet), 87.
Wood-wind instruments (Indian), 27.
Wsevoloshky, 183.
Würtemberg (folk-dancing), 130.
X
Xenophon (quoted), 55f.
Xeres, iv.
Y
Yorkshire (English sword dance of), 116.
Yu-Wang (Chinese emperor), 33.
Z
Zarzuela (Spanish comic opera), 63f, 106.
Zeus, 59.
Zorongo (Spanish folk-dance), 111.
Zulus (war dances of), 5.
Zunfttänze, 129.
Zwölfmonatstanz (Würtemberg), 130.
Transcriber’s Notes
Punctuation, hyphenation, and spelling were made
consistent when a predominant preference was found in
the original book; otherwise they were not changed.
Simple typographical errors were corrected;
unbalanced quotation marks were remedied when the
change was obvious, and otherwise left unbalanced.
The index was not checked for proper
alphabetization or correct page references, with this
exception: all references to pages iii–vi should be to
pages vii–x. In versions of this eBook that support
hyperlinks, the links have been corrected, but the
displayed page numbers have not been changed in any
version of this eBook.
Page 110: “Albacetex” was printed that way;
probably is a misprint for “Albacete”.
Page 131: “3/4 rhythm” was printed as “3-4 rhythm”
but changed here to conform with the predominant form
of notation throughout the original book.
Page 275: “English Cathedrals” reference to page viii
was printed as “iii-f”; changed here.
*** END OF THE PROJECT GUTENBERG EBOOK THE DANCE ***