Course Overview
PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review
Supplementary material: Theoretical foundations (Finite-state machines)
Finite State Machines (aka Finite Automata)
• An FSM is similar to a compiler in that:
– A compiler recognizes legal programs
in some (source) language.
– A finite-state machine recognizes legal strings
in some language.
• Example: Pascal Identifiers
– sequences of one or more letters or digits,
starting with a letter:
[Diagram: start state S, final state A; S --letter--> A, A --letter | digit--> A]
Finite State Machines (aka Finite Automata)
• In this picture: Nodes are states.
• Edges (arrows) are transitions. Each edge should be labeled with a single character.
In this example, we've used a single edge labeled "letter" to stand for 52 edges
labeled 'a', 'b', ..., 'z', 'A', ..., 'Z'. (Similarly, the label "letter | digit" stands for 62
edges labeled 'a', ..., 'Z', '0', ..., '9'.)
• S is the start state; every FSM has exactly one (a standard convention is to label
the start state "S").
• A is a final state. By convention, final states are drawn using a double circle, and
non-final states are drawn using single circles. An FSM may have more than one final
state.
[Diagram: start state S, final state A; S --letter--> A, A --letter | digit--> A]
Finite State Machines viewed as Graphs
• A state: drawn as a circle
• The start state
• An accepting state: drawn as a double circle
• A transition: drawn as an arrow labeled with an input symbol, e.g. a
Finite State Machines
• Transition
s1 --a--> s2
• Is read
In state s1 on input "a" go to state s2
• If end of input
– If in accepting state => accept
– Otherwise => reject
• If no transition possible (got stuck) => reject
Language defined by FSM
• The language defined by an FSM is the set of
strings accepted by the FSM.
– In the language of the Pascal-identifier FSM:
• x, mp2, XyZzy, position27.
– Not in the language of the Pascal-identifier FSM:
• 123, a?, 13apples.
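The acceptance rules can be tried out on the Pascal-identifier machine with a small Python sketch. The state names S and A come from the diagram; the function name accepts_identifier and the restriction to ASCII letters and digits are assumptions made here for illustration.

```python
import string

LETTERS = set(string.ascii_letters)   # stands for the 52 edges labeled "letter"
DIGITS = set(string.digits)           # stands for the 10 edges labeled "digit"

def accepts_identifier(text: str) -> bool:
    """Run the two-state identifier FSM (states S and A) over text."""
    state = "S"                        # S is the start state
    for ch in text:
        if state == "S" and ch in LETTERS:
            state = "A"                # S --letter--> A
        elif state == "A" and (ch in LETTERS or ch in DIGITS):
            state = "A"                # A --letter | digit--> A
        else:
            return False               # no transition possible: stuck, reject
    return state == "A"                # end of input: accept only in final state A

if __name__ == "__main__":
    for s in ["x", "mp2", "XyZzy", "position27", "123", "a?", "13apples"]:
        print(s, accepts_identifier(s))
```

The explicit if/elif chain mirrors the diagram edge by edge; a table-driven version scales better once a machine has more than a few states.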
Example: Integer Literals
• FSM that accepts integer literals with an
optional + or - sign:
[Diagram: start state S, final state B; S --'+'--> A, S --'-'--> A, S --digit--> B, A --digit--> B, B --digit--> B]
Formal Definition
• Each finite state machine is a 5-tuple (Σ, Q, δ, q, F) that consists of:
– An input alphabet Σ / I
– A set of states Q / S
– A start state q / s0
– A set of accepting states (or final states) F ⊆ Q
– δ is a state transition function: Q × Σ → Q
that encodes transitions state_i --input--> state_j
State-Transition Function
for the integer-literal example:
δ(S, +) = A
δ(S, -) = A
δ(S, digit) = B
δ(A, digit) = B
δ(B, digit) = B
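As a sketch of how the 5-tuple maps onto a data structure, the integer-literal machine can be written down in Python as below. The class name FSM, its field names, and the choice to store δ as a dictionary (a partial function, with missing entries meaning "stuck") are assumptions of this example, not part of the formal definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FSM:
    alphabet: frozenset      # Σ, the input alphabet
    states: frozenset        # Q, the set of states
    delta: dict              # δ, stored as a partial map (state, symbol) -> state
    start: str               # q, the start state
    finals: frozenset        # F, a subset of Q

    def accepts(self, text: str) -> bool:
        state = self.start
        for ch in text:
            state = self.delta.get((state, ch))
            if state is None:          # δ is undefined here: the machine is stuck
                return False
        return state in self.finals

DIGITS = "0123456789"
delta = {("S", "+"): "A", ("S", "-"): "A"}
for d in DIGITS:                       # expand each "digit" edge into 10 edges
    delta[("S", d)] = "B"
    delta[("A", d)] = "B"
    delta[("B", d)] = "B"

int_literal = FSM(
    alphabet=frozenset(DIGITS + "+-"),
    states=frozenset("SAB"),
    delta=delta,
    start="S",
    finals=frozenset({"B"}),
)

if __name__ == "__main__":
    for s in ["42", "+752", "-0", "+", "4-2"]:
        print(repr(s), int_literal.accepts(s))
```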
FSM Examples
[Diagram: start state A, final state B; A --0--> A, A --1--> B, B --1--> B, B --0--> A]
Accepts strings over alphabet {0,1} that end in 1
FSM Examples
[Diagram: five states numbered 1 to 5, with edges labeled a and b]
Accepts strings over alphabet {a,b} that begin and end with the same symbol
FSM Examples
[Diagram: states connected by edges labeled 0, 1, and 2; one state is marked "Start"]
Accepts strings over {0,1,2} such that the sum of digits is a multiple of 3
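One way to read this machine: the only thing worth remembering is the running sum modulo 3, so three states suffice. The sketch below assumes the states are named by their remainder (0, 1, 2), which may differ from the labels in the original diagram; the function name is also just illustrative.

```python
def sum_is_multiple_of_3(text: str) -> bool:
    """Simulate a 3-state FSM whose state is (sum of digits read so far) mod 3."""
    state = 0                          # start state: empty sum, remainder 0
    for ch in text:
        if ch not in ("0", "1", "2"):
            return False               # symbol outside the alphabet: reject
        state = (state + int(ch)) % 3  # reading digit d moves r to (r + d) mod 3
    return state == 0                  # remainder 0 is the only final state

if __name__ == "__main__":
    for s in ["12", "111", "22", "2010"]:
        print(s, sum_is_multiple_of_3(s))
```

Because the transition depends only on the current remainder and the next digit, the number of states stays fixed no matter how long the input is.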
FSM Examples
[Diagram: start state Even, final state Odd; Even --0--> Even, Even --1--> Odd, Odd --0--> Odd, Odd --1--> Even]
Accepts strings over {0,1} that have an odd number of ones
FSM Examples
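[Diagram: states Start, '0', '00', '001' (final); Start --1--> Start, Start --0--> '0', '0' --0--> '00', '0' --1--> Start, '00' --0--> '00', '00' --1--> '001', '001' --0,1--> '001']
Accepts strings over {0,1} that contain the substring 001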
Examples
• Design an FSM to recognize strings with an
equal number of ones and zeros.
– Not possible: the machine would have to count an unbounded
difference, which a finite number of states cannot do
• Design an FSM to recognize strings with an
equal number of substrings "01" and "10".
– Perhaps surprisingly, this is possible
FSM Examples
[Diagram: states connected by edges labeled 0 and 1]
Accepts strings with an equal number of substrings "01" and "10"
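One way to see why finite memory is enough: every "01" is a switch from 0 to 1 and every "10" a switch from 1 to 0, and the two kinds of switch must alternate. The counts are therefore equal exactly when the string is empty or starts and ends with the same symbol. The sketch below is not the machine from the slide; it is an equivalent check under that observation, with hypothetical function names, verified by brute force on short strings.

```python
from itertools import product

def count_overlapping(text: str, pat: str) -> int:
    """Number of positions at which pat occurs in text."""
    return sum(text.startswith(pat, i) for i in range(len(text)))

def equal_01_10(text: str) -> bool:
    """Finite-memory check: remember only the first and the most recent symbol."""
    first = None
    last = None
    for ch in text:
        if first is None:
            first = ch                 # fixed after the first symbol
        last = ch                      # updated on every symbol
    return first == last               # also true for the empty string

def agrees_up_to(max_len: int) -> bool:
    """Compare the finite-memory check with direct counting on all short strings."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            direct = count_overlapping(s, "01") == count_overlapping(s, "10")
            if equal_01_10(s) != direct:
                return False
    return True

if __name__ == "__main__":
    print(agrees_up_to(10))
```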
TEST YOURSELF
• Question 1: Draw a finite-state machine that
accepts Java identifiers
– one or more letters, digits, or underscores,
starting with a letter or an underscore.
• Question 2: Draw a finite-state machine that
accepts only Java identifiers that do not end
with an underscore
TEST YOURSELF
Question 3: What strings does this FSM accept?
Describe the set of accepted strings in English.
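[Diagram: states q0, q1, q2, q3; q0 --1--> q2, q2 --1--> q0, q1 --1--> q3, q3 --1--> q1, q0 --0--> q1, q1 --0--> q0, q2 --0--> q3, q3 --0--> q2]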
Two kinds of Finite State Machines
Deterministic (DFSM):
– No state has more than one outgoing edge with the
same label. [All of the previous FSMs were DFSMs.]
Non-deterministic (NFSM):
– States may have more than one outgoing edge with the
same label.
– Edges may be labeled with ε (epsilon), the empty
string. [Note that some books use the symbol λ.]
– The automaton can make an epsilon transition
without consuming the current input character.
Example of NFSM
• integer-literal example:
[Diagram: start state S, final state B; S --'+'--> A, S --'-'--> A, S --ε--> A, A --digit--> B, B --digit--> B]
Example of NFSM
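[Diagram: states Start, '0', '00', '001' (final); Start --0,1--> Start, Start --0--> '0', '0' --0--> '00', '00' --1--> '001', '001' --0,1--> '001']
Accepts strings over {0,1} that contain the substring 001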
Non-deterministic finite state machines (NFSM)
• sometimes simpler than DFSM
• can be in multiple states at the same time
• NFSM accepts a string if
– there exists a sequence of moves
– starting in the start state,
– ending in a final state,
– that consumes the entire string.
• Examples:
– Consider the integer-literal NFSM on input "+752"
– Consider the second NFSM (substring 001) on input "10110001"
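The "multiple states at the same time" view can be sketched by tracking the set of states the NFSM could currently be in. Everything named below (EPS, eps_closure, nfsm_accepts, the edge dictionaries) is an assumption of this sketch; in particular, the ε edge from S to A is how this example reads the integer-literal NFSM, and "start" names the otherwise unnamed start state of the substring machine.

```python
EPS = ""   # label used in this sketch for epsilon edges

def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in edges.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfsm_accepts(edges, start, finals, text):
    """Accept if some sequence of moves consumes all of text and ends in a final state."""
    current = eps_closure({start}, edges)
    for ch in text:
        moved = set()
        for s in current:
            moved |= set(edges.get((s, ch), ()))
        current = eps_closure(moved, edges)
    return bool(current & finals)

# Integer-literal NFSM: the sign is optional thanks to the epsilon edge S -> A.
int_edges = {("S", "+"): {"A"}, ("S", "-"): {"A"}, ("S", EPS): {"A"}}
for d in "0123456789":
    int_edges[("A", d)] = {"B"}
    int_edges[("B", d)] = {"B"}

# Substring-001 NFSM: on 0 the start state may stay put or guess that 001 begins here.
sub_edges = {
    ("start", "0"): {"start", "0"},
    ("start", "1"): {"start"},
    ("0", "0"): {"00"},
    ("00", "1"): {"001"},
    ("001", "0"): {"001"},
    ("001", "1"): {"001"},
}

if __name__ == "__main__":
    print(nfsm_accepts(int_edges, "S", {"B"}, "+752"))
    print(nfsm_accepts(sub_edges, "start", {"001"}, "10110001"))
```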
Equivalence of DFSM and NFSM
• Theorem:
– For each non-deterministic finite state machine N,
we can construct a deterministic finite state
machine D such that N and D accept the same
language.
– [proof omitted]
• Theorem:
– Every deterministic finite state machine can be
regarded as a non-deterministic finite state
machine that just doesn't use the extra
non-deterministic capabilities.
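The slides omit the proof of the first theorem. The usual argument is the subset construction, in which each DFSM state is a set of NFSM states. The sketch below is a simplified version that assumes the NFSM has no ε edges; the function names and the dictionary encoding of edges are hypothetical, and the substring-001 NFSM is used as the test case with "start" naming its start state.

```python
from itertools import product

def subset_construction(edges, start, finals, alphabet):
    """Build a DFSM whose states are frozensets of NFSM states (no epsilon edges)."""
    d_start = frozenset({start})
    d_delta, d_finals = {}, set()
    todo, seen = [d_start], {d_start}
    while todo:
        S = todo.pop()
        if S & finals:
            d_finals.add(S)            # final if it contains any NFSM final state
        for a in alphabet:
            T = frozenset(t for s in S for t in edges.get((s, a), ()))
            if not T:
                continue               # empty set: the DFSM gets stuck here
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return d_delta, d_start, d_finals

def dfsm_accepts(d_delta, d_start, d_finals, text):
    state = d_start
    for ch in text:
        state = d_delta.get((state, ch))
        if state is None:
            return False
    return state in d_finals

sub_edges = {
    ("start", "0"): {"start", "0"},
    ("start", "1"): {"start"},
    ("0", "0"): {"00"},
    ("00", "1"): {"001"},
    ("001", "0"): {"001"},
    ("001", "1"): {"001"},
}
D = subset_construction(sub_edges, "start", {"001"}, "01")

def agrees_with_substring_test(max_len: int) -> bool:
    """Check the constructed DFSM against a direct substring test on short strings."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if dfsm_accepts(*D, s) != ("001" in s):
                return False
    return True

if __name__ == "__main__":
    print(agrees_with_substring_test(8))
```

The constructed machine can have up to 2^n states for an NFSM with n states, which is the price paid for determinism; only the reachable subsets are built here.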
How to Implement a FSM
A table-driven approach:
• Table:
– one row for each state in the machine, and
– one column for each possible character.
• Table[j][k]
– which state to go to from state j on input character k,
– an empty entry corresponds to the machine getting stuck.
The table-driven program for a DFSM
state = S // S is the start state
repeat {
    k = next character from the input
    if (k == EOF) then { // end of input; accept and reject both stop the program
        if (state is a final state) then accept
        else reject
    }
    state = T[state][k]
    if (state == empty) then reject // got stuck
}
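A runnable counterpart of the table-driven program, as a minimal Python sketch for the integer-literal DFSM. The column constants, the use of None for an empty entry, and the helper names column and run are assumptions of this example. Characters are grouped into three columns (sign, digit, other) to keep the table small, rather than one column per character.

```python
SIGN, DIGIT, OTHER = 0, 1, 2     # column index for each character class
S, A, B = 0, 1, 2                # row index for each state
EMPTY = None                     # an empty table entry: the machine gets stuck

T = [
    # SIGN   DIGIT  OTHER
    [A,      B,     EMPTY],      # row for S
    [EMPTY,  B,     EMPTY],      # row for A
    [EMPTY,  B,     EMPTY],      # row for B
]
FINAL = {B}

def column(ch: str) -> int:
    """Map an input character to its table column."""
    if ch in ("+", "-"):
        return SIGN
    if ch in "0123456789":
        return DIGIT
    return OTHER

def run(text: str) -> bool:
    state = S                            # S is the start state
    for ch in text:                      # the loop ends at end of input
        state = T[state][column(ch)]
        if state is EMPTY:
            return False                 # got stuck: reject
    return state in FINAL                # end of input: accept only in a final state

if __name__ == "__main__":
    for s in ["+752", "17", "-", "12a"]:
        print(repr(s), run(s))
```

Changing the recognized language then means editing only T and FINAL; the driver loop stays the same.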
Finite State Machines
Control Circuits
Example: Vending Machine
• Takes only quarters and dollar bills
• Won't hold more than $1.00
• Sodas cost $.75
• Possible actions (inputs)
– deposit $.25 (25)
– deposit $1.00 ($)
– push button to get soda (soda)
– push button to get money returned (ret)
Example: Vending Machine
• State: description of the internal settings of
the machine, e.g. how much money has been
deposited and not spent
• Finite states: 0, 25, 50, 75, 100
• Rules: determine how inputs can change state
Example: Vending Machine
[State diagram: states 0 (000), 25 (001), 50 (010), 75 (011), 100 (100); edges labeled 25, 100, soda, ret]
Inputs
25 = 00
100 = 01
soda = 10
ret = 11
Example: Vending Machine
state      input   new state
S2 S1 S0   I0 I1   S2 S1 S0
0  0  0    0  0    0  0  1
0  0  1    0  0    0  1  0
0  1  0    0  0    0  1  1
0  1  1    0  0    1  0  0
1  0  0    0  0    1  0  0
0  0  0    0  1    1  0  0
0  0  1    0  1    0  0  1
0  1  0    0  1    0  1  0
0  1  1    0  1    0  1  1
1  0  0    0  1    1  0  0
0  0  0    1  0    0  0  0
0  0  1    1  0    0  0  1
0  1  0    1  0    0  1  0
0  1  1    1  0    0  0  0
1  0  0    1  0    0  0  1
0  0  0    1  1    0  0  0
0  0  1    1  1    0  0  0
0  1  0    1  1    0  0  0
0  1  1    1  1    0  0  0
1  0  0    1  1    0  0  0
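The truth table can be regenerated from the vending machine's rules, which is a handy cross-check when encoding a control circuit by hand. The sketch below is one reading of the table: the function names are hypothetical, and the rule that a dollar bill is taken only when the machine is empty is inferred from the table rows rather than stated in the slides.

```python
STATES = [0, 25, 50, 75, 100]            # cents held, encoded on bits S2 S1 S0
INPUTS = ["25", "100", "soda", "ret"]    # encoded on bits I0 I1 as 00, 01, 10, 11

def step(state: int, inp: str) -> int:
    """Next amount held, given the current amount and one input action."""
    if inp == "25":
        return min(state + 25, 100)          # won't hold more than $1.00
    if inp == "100":
        return 100 if state == 0 else state  # inferred: a dollar fits only when empty
    if inp == "soda":
        return state - 75 if state >= 75 else state  # sodas cost $.75
    if inp == "ret":
        return 0                             # all money is returned
    raise ValueError(f"unknown input: {inp}")

def encode_state(state: int) -> str:
    return format(state // 25, "03b")        # 0 -> 000, 25 -> 001, ..., 100 -> 100

def encode_input(inp: str) -> str:
    return format(INPUTS.index(inp), "02b")

def truth_table():
    """Rows of (state bits, input bits, new state bits), grouped by input."""
    return [
        (encode_state(st), encode_input(inp), encode_state(step(st, inp)))
        for inp in INPUTS
        for st in STATES
    ]

if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```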
Example: Vending Machine
state      input   new state     (X = don't care)
S2 S1 S0   I0 I1   S2 S1 S0
0  0  0    0  0    0  0  1
0  0  1    0  0    0  1  0
0  1  0    0  0    0  1  1
0  1  1    0  0    1  0  0
1  0  0    0  X    1  0  0
0  0  0    0  1    1  0  0
0  0  1    0  1    0  0  1
0  1  0    0  1    0  1  0
0  1  1    0  1    0  1  1
0  0  0    1  0    0  0  0
0  0  1    1  0    0  0  1
0  1  0    1  0    0  1  0
0  1  1    1  0    0  0  0
1  0  0    1  0    0  0  1
X  X  X    1  1    0  0  0
Example: Vending Machine
[Circuit diagram: signals Clock, S0, S1, S2, I0, I1]