Lecture 6
Parsing
[Figure: example built from the tokens ID, =, and INT]
Parsing
● The lexer generates a sequence of tokens, which is the input to the parser
● The parser's output is a parse tree
● Not all token streams are valid programs
● The parser needs to distinguish between valid and invalid programs
● Need a language for describing valid strings of tokens
○ Regular languages are the weakest formal languages widely used
○ Many languages are not regular
■ Balanced parentheses {(^i )^i | i ≥ 0}
■ Arithmetic expressions
○ A finite automaton (FA) cannot remember how many times it has visited a state
○ Need a formalism that can refer to constructs recursively
Context Free Grammars
● Use context-free grammars (CFGs) for parsing
● A CFG consists of:
○ A set of terminals T
○ A set of non-terminals N
○ A start symbol S ∈ N
○ A set of productions of the form:
X → α1 α2 α3 … αn
where X ∈ N and αi ∈ T ∪ N ∪ {ε}
● Context-free means that a non-terminal can be replaced by the
right-hand side of any of its productions, regardless of the symbols around it.
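The four components above map directly onto a small data structure. The following is a minimal sketch, not part of the lecture: the class name `Grammar`, the method `is_well_formed`, and the choice to write ε as an empty tuple are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S, must be a member of N
    productions: tuple        # pairs (X, rhs); rhs is a tuple of symbols, () stands for ε

    def is_well_formed(self) -> bool:
        """Check the side conditions from the definition of a CFG."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# Balanced-parentheses grammar: P → (P) | ε
PARENS = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"P"}),
    start="P",
    productions=(("P", ("(", "P", ")")), ("P", ())),
)
print(PARENS.is_well_formed())
```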
Context Free Grammars
Strings of balanced parentheses as a CFG:
P → (P)
P → ε
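Because P refers to itself, a recognizer for this grammar can be written as a recursive function with one branch per production. This is only a sketch; the function name `derives_P` is hypothetical.

```python
def derives_P(s: str) -> bool:
    """Return True if s can be derived from P using P → (P) | ε."""
    if s == "":                                   # P → ε
        return True
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":
        return derives_P(s[1:-1])                 # P → (P)
    return False


for candidate in ["", "()", "(())", "(()", "()()"]:
    print(repr(candidate), derives_P(candidate))
```

Note that this grammar only produces nested strings of the form (^i )^i, so a string such as ()() is rejected.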
Context Free Grammars
● Begin with a string consisting of the start symbol “S”
● Replace any non-terminal X in the string by the right-hand side
of some production:
X → α1 ... αn
● Repeat the above step until there are no non-terminals in the
string
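The procedure above can be sketched in a few lines. The slide allows any non-terminal to be replaced; this sketch assumes the leftmost one is always chosen, and the names `PRODUCTIONS` and `derive` are hypothetical. The grammar used is P → (P) | ε.

```python
PRODUCTIONS = {"P": [["(", "P", ")"], []]}   # P → (P) | ε, with [] standing for ε


def derive(start, choices):
    """Apply one production per step to the leftmost non-terminal.

    choices[k] is the index of the production used at step k.
    Returns the list of strings visited, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # Find the leftmost non-terminal in the current string.
        i = next(k for k, sym in enumerate(form) if sym in PRODUCTIONS)
        # Replace it by the right-hand side of the chosen production.
        form = form[:i] + PRODUCTIONS[form[i]][choice] + form[i + 1:]
        steps.append(list(form))
    return steps


steps = derive("P", [0, 0, 1])
print(" → ".join("".join(f) or "ε" for f in steps))   # P → (P) → ((P)) → (())
```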
Context Free Grammars
Formally, one derivation step replaces a single non-terminal:
X1 … Xi-1 Xi Xi+1 … Xn → X1 … Xi-1 α1 … αm Xi+1 … Xn
if there is a production
Xi → α1 … αm
Context Free Grammars
Formally,
X1 … Xn →* α1 … αm
if X1 … Xn can be rewritten to α1 … αm in zero or more derivation steps:
X1 … Xn → … → α1 … αm
Language of Context Free Grammars
The language L(G) of a context-free grammar G with start symbol
S is:
{α1 … αn | S →* α1 … αn and αi is a terminal for all i = 1, …, n}
E → id | n | E + E | E R E
S → id = E | if E then S else S
| while E do S
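To make the definition of the language concrete, the sketch below enumerates every terminal string up to a length bound by searching over derivations. It uses the small grammar P → (P) | ε rather than the statement grammar above, and it assumes a grammar in which every non-terminating expansion eventually adds terminals (otherwise the search would not stop). The function name `language` is hypothetical.

```python
from collections import deque


def language(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    found, seen = set(), {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        nts = [i for i, sym in enumerate(form) if sym in productions]
        if not nts:                      # only terminals left: a string of the language
            found.add("".join(form))
            continue
        i = nts[0]                       # expanding the leftmost non-terminal is enough
        for rhs in productions[form[i]]:
            new = form[:i] + tuple(rhs) + form[i + 1:]
            n_terminals = sum(sym not in productions for sym in new)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


PARENS = {"P": [("(", "P", ")"), ()]}    # P → (P) | ε
print(sorted(language(PARENS, "P", 4), key=len))   # ['', '()', '(())']
```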
Parse Trees
● A parse tree represents a derivation: it shows the productions
applied to get from the start symbol to a string of terminals only.
● The start symbol is the root of the tree
● For every production X → α1 … αn used, add an edge from the
left-hand non-terminal X to each symbol αi on the right-hand side;
the αi become the children of X
● Terminals form the leaf nodes of the tree
● Reading the leaf nodes from left to right gives the input
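As a small illustration of these rules, a parse tree can be stored as nested (symbol, children) pairs and its leaves read back from left to right. The representation and the helper names `leaf` and `frontier` are assumptions for this sketch; the tree is the one for id * (id + id) under E → id | E + E | E * E | (E).

```python
# A node is (symbol, children); a terminal (leaf) has an empty child list.
def leaf(sym):
    return (sym, [])


tree = ("E", [
    ("E", [leaf("id")]),
    leaf("*"),
    ("E", [
        leaf("("),
        ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
        leaf(")"),
    ]),
])


def frontier(node):
    """Collect the leaves of the tree from left to right."""
    sym, children = node
    if not children:
        return [sym]
    return [s for child in children for s in frontier(child)]


print(" ".join(frontier(tree)))   # id * ( id + id )
```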
Parse Trees - Example
CFG : E → n | id | E + E | E - E | E * E | (E)
Derivation: E → E * E → id * E → id * (E) → id * (E + E)
→ id * (id + E) → id * (id + id)
Parse Trees - Example
[Figure: parse tree for id * (id + id), built step by step alongside the derivation. The root E has children E, *, E; the left E has the child id; the right E has children (, E, ); that inner E has children E, +, E, each with the child id.]
Leftmost and Rightmost Derivations
● The previous derivation was a leftmost derivation
○ Each step replaced the leftmost non-terminal
● A similar rightmost derivation is possible
○ E → E * E
→ E * (E)
→ E * (E + E)
→ E * (E + id)
→ E * (id + id)
→ id * (id + id)
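Both derivations can be read off the same parse tree by choosing which pending non-terminal to expand next. The sketch below assumes a tree stored as nested (symbol, children) pairs with terminals having no children; the function name `derivation` is hypothetical.

```python
def leaf(sym):
    return (sym, [])


# Parse tree for id * (id + id)
tree = ("E", [
    ("E", [leaf("id")]),
    leaf("*"),
    ("E", [
        leaf("("),
        ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
        leaf(")"),
    ]),
])


def derivation(tree, leftmost=True):
    """List the strings of the leftmost or rightmost derivation of a tree."""
    form = [tree]                        # current string, kept as a list of subtrees
    steps = [[tree[0]]]
    while True:
        # Nodes that still have children are non-terminals not yet expanded.
        pending = [i for i, (_, kids) in enumerate(form) if kids]
        if not pending:
            return [" ".join(s) for s in steps]
        i = pending[0] if leftmost else pending[-1]
        form = form[:i] + form[i][1] + form[i + 1:]
        steps.append([sym for sym, _ in form])


print("\n→ ".join(derivation(tree, leftmost=True)))
print("\n→ ".join(derivation(tree, leftmost=False)))
```

The two calls print different sequences of steps that end in the same string, because the tree is the same in both cases.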
Parse Trees - Example
[Figure: the same parse tree for id * (id + id), built step by step following the rightmost derivation; the right subtree ( E ) is completed before the left E is expanded to id.]
CFG for Conditionals in Python
expr ::= ...
stmt ::= ... | cond | ...
stmts ::= stmt | stmts stmt
bstmt ::= INDENT stmts DEDENT
cond ::= "if" expr ":" bstmt elifs else
elifs ::= ε | "elif" expr ":" bstmt elifs
else ::= ε | "else" ":" bstmt
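One way to see how this grammar constrains a token stream is a hand-written recursive-descent recognizer, a technique the slides have not introduced and which is used here only for illustration. The sketch makes several assumptions: since expr and stmt are elided in the grammar, an expression is a single placeholder token "EXPR" and a non-conditional statement is a single placeholder token "STMT"; the left-recursive stmts rule is handled with a loop; all function names are hypothetical.

```python
def expect(toks, i, kind):
    """Consume one token of the given kind or raise SyntaxError."""
    if i >= len(toks) or toks[i] != kind:
        raise SyntaxError(f"expected {kind!r} at token {i}")
    return i + 1


def parse_stmt(toks, i):
    # stmt ::= ... | cond | ...   (everything except cond is the placeholder STMT)
    if i < len(toks) and toks[i] == "if":
        return parse_cond(toks, i)
    return expect(toks, i, "STMT")


def parse_bstmt(toks, i):
    # bstmt ::= INDENT stmts DEDENT, with stmts being one or more stmt
    i = expect(toks, i, "INDENT")
    i = parse_stmt(toks, i)
    while i < len(toks) and toks[i] != "DEDENT":
        i = parse_stmt(toks, i)
    return expect(toks, i, "DEDENT")


def parse_cond(toks, i):
    # cond ::= "if" expr ":" bstmt elifs else
    i = expect(toks, i, "if")
    i = expect(toks, i, "EXPR")
    i = expect(toks, i, ":")
    i = parse_bstmt(toks, i)
    while i < len(toks) and toks[i] == "elif":      # elifs
        i = expect(toks, i + 1, "EXPR")
        i = expect(toks, i, ":")
        i = parse_bstmt(toks, i)
    if i < len(toks) and toks[i] == "else":         # else
        i = expect(toks, i + 1, ":")
        i = parse_bstmt(toks, i)
    return i                                        # index just past the conditional


toks = ("if EXPR : INDENT STMT DEDENT "
        "elif EXPR : INDENT STMT DEDENT "
        "else : INDENT STMT DEDENT").split()
print(parse_cond(toks, 0) == len(toks))
```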
Question
Which of the strings are in the language given by the CFG:
S → aXa
X → ε | bY
Y → ε | cXc
1. abcba
2. acca
3. aba
4. abcbcba
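Answers to this question can be checked mechanically. The sketch below is one possible recognizer for this specific grammar: each helper returns the set of positions at which a substring derived from X or Y can end. The function names are hypothetical.

```python
def ends_X(s, i):
    """Positions where a substring derived from X, starting at i, can end."""
    ends = {i}                                   # X → ε
    if s[i:i + 1] == "b":                        # X → bY
        ends |= ends_Y(s, i + 1)
    return ends


def ends_Y(s, i):
    """Positions where a substring derived from Y, starting at i, can end."""
    ends = {i}                                   # Y → ε
    if s[i:i + 1] == "c":                        # Y → cXc
        ends |= {j + 1 for j in ends_X(s, i + 1) if s[j:j + 1] == "c"}
    return ends


def in_language(s):
    # S → aXa: an 'a', then X, then a final 'a' that ends the string.
    return s[:1] == "a" and any(
        s[j:j + 1] == "a" and j + 1 == len(s) for j in ends_X(s, 1)
    )


for candidate in ["abcba", "acca", "aba", "abcbcba"]:
    print(candidate, in_language(candidate))
```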
Question
Which of the following are valid derivations for the string below?
CFG : E → n | id | E + E | E - E | E * E
Str : id * id + id
Parse Trees - Example (Left)
CFG : E → n | id | E + E | E - E | E * E
Str : id * id + id
Derivation:
E → E * E
→ id * E
→ id * E + E
→ id * id + E
→ id * id + id
[Figure: parse tree built step by step. The root E has children E, *, E; the left E has the child id; the right E has children E, +, E, each with the child id.]
Parse Trees - Example (Right)
CFG : E → n | id | E + E | E - E | E * E
Str : id * id + id
Derivation:
E → E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id
[Figure: parse tree built step by step. The root E has children E, +, E; the right E has the child id; the left E has children E, *, E, each with the child id.]
Multiple Parse Trees
[Figure: the two parse trees for id * id + id. Left-most derivation: the root is E * E, and its right E expands to E + E. Right-most derivation: the root is E + E, and its left E expands to E * E. All remaining E nodes have the child id.]
Multiple Parse Trees
● A grammar is ambiguous if some string has more than one parse
tree, i.e., more than one leftmost derivation (equivalently, more
than one rightmost derivation)
● Not good for compilation
○ The meaning of some programs is ill-defined
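Ambiguity can be made concrete by counting parse trees. The sketch below assumes a cut-down version of the expression grammar, E → id | E + E | E * E, and counts the trees for a token list by trying every operator as the root. The function name `count_trees` is hypothetical.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of tokens under E → id | E + E | E * E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):                        # trees deriving tokens[lo:hi] from E
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):       # tokens[k] is the operator at the root
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_trees("id * id + id".split()))    # 2
```

Any count above 1 for some string is enough to show that the grammar is ambiguous.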
Ambiguity
S → if E then S
| if E then S else S
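A statement such as if E1 then if E2 then S1 else S2 has two parse trees:
[Figure: left tree: the outer if has children E1, the inner if, and S2, and the inner if has children E2 and S1, so the else belongs to the outer if. Right tree: the outer if has children E1 and the inner if, and the inner if has children E2, S1, and S2, so the else belongs to the inner if.]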
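The two readings can be enumerated with a short sketch. The grammar on the slide has no base case for S, so this code assumes that any token starting with "S" is an atomic statement and that a condition is a single token; the generator name `parses` and the tuple encoding of trees are hypothetical.

```python
def parses(toks, i=0):
    """Yield (tree, next_index) for every parse of one statement at toks[i:]."""
    if i >= len(toks):
        return
    if toks[i].startswith("S"):                  # atomic statement (assumed base case)
        yield toks[i], i + 1
    elif toks[i] == "if" and toks[i + 2:i + 3] == ["then"]:
        cond = toks[i + 1]
        for then_branch, j in parses(toks, i + 3):
            # S → if E then S
            yield ("if", cond, then_branch), j
            # S → if E then S else S
            if toks[j:j + 1] == ["else"]:
                for else_branch, k in parses(toks, j + 1):
                    yield ("if", cond, then_branch, else_branch), k


toks = "if E1 then if E2 then S1 else S2".split()
trees = [tree for tree, end in parses(toks) if end == len(toks)]
for tree in trees:
    print(tree)
```

Only parses that consume the whole token list are kept, which leaves one tree with the else attached to the outer if and one with it attached to the inner if.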