
Lecture 6

This document discusses parsing in compilers. It explains how a lexer generates tokens that are input to a parser. The parser then builds a parse tree from these tokens according to the rules of a context-free grammar. Context-free grammars and parse trees are described in detail.

Uploaded by Vedang Chavan
Copyright © All Rights Reserved

CS327 - Compilers

Parsing

Abhishek Bichhawat 02/02/2024


Parsing
● The lexer generates a sequence of tokens that is the input to the parser
● The output is a parse tree
○ if x == 0 then y = 1 else z = 2
INPUT: IF ID RELOP INT THEN ID = INT ELSE ID = INT
OUTPUT (each node's children are indented beneath it):
IF-THEN-ELSE
  RELOP
    ID
    INT
  THEN
    =
      ID
      INT
  ELSE
    =
      ID
      INT
Parsing
● Not all token streams are valid programs
● The parser needs to distinguish between valid and invalid programs
● We need a language for describing valid sequences of tokens
○ Regular languages are the weakest class of formal languages (in the Chomsky hierarchy)
○ Many languages cannot be expressed as regular languages
■ Balanced parentheses {(ⁱ)ⁱ | i ≥ 0}
■ Arithmetic expressions
○ A finite automaton (FA) does not remember how many times it has passed through a state, so it cannot count
○ We need a language whose rules can refer to constructs recursively
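To make the counting argument concrete, here is a minimal Python sketch (the name is_nested_parens is our own, for illustration) that recognizes {(ⁱ)ⁱ | i ≥ 0}. It relies on an integer counter that can grow without bound; a finite automaton has only a fixed number of states, so for a large enough i it cannot keep track of how many ( it has seen.

```python
def is_nested_parens(s: str) -> bool:
    """Recognize {(^i )^i | i >= 0}: i '(' followed by exactly i ')'."""
    count = 0   # unbounded counter: the memory a finite automaton lacks
    i = 0
    # Phase 1: count the opening parentheses.
    while i < len(s) and s[i] == "(":
        count += 1
        i += 1
    # Phase 2: each ')' cancels one earlier '('.
    while i < len(s) and s[i] == ")":
        count -= 1
        i += 1
    # Accept only if the whole string was consumed and the counts matched.
    return i == len(s) and count == 0


print(is_nested_parens("((()))"), is_nested_parens("(()"))  # True False
```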
Context Free Grammars
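● We use CFGs for parsing
● A CFG consists of:
○ A set of terminals T
○ A set of non-terminals N
○ A start symbol S, which is a non-terminal
○ A set of productions of the form:
X → α1 α2 … αn
where X ∈ N and αi ∈ T ∪ N ∪ {ε}
● Context free means that a non-terminal can be replaced by the right-hand side of any of its productions regardless of the symbols around it (its context); hence the non-terminals can be replaced in any order to get the same result.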
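The four components of the definition map directly onto a small data structure. The sketch below is one possible Python encoding (the CFG class and its is_well_formed method are hypothetical names for illustration): a production is a pair of its left-hand side and a list of right-hand-side symbols, with the empty list standing for ε.

```python
from dataclasses import dataclass


@dataclass
class CFG:
    terminals: set[str]      # T
    nonterminals: set[str]   # N
    start: str               # S
    # Each production X → α1 … αn is stored as (X, [α1, …, αn]); [] encodes ε.
    productions: list[tuple[str, list[str]]]

    def is_well_formed(self) -> bool:
        """Check the definition: S ∈ N, every X ∈ N, every αi ∈ T ∪ N."""
        if self.start not in self.nonterminals:
            return False
        symbols = self.terminals | self.nonterminals
        return all(
            lhs in self.nonterminals and all(a in symbols for a in rhs)
            for lhs, rhs in self.productions
        )


# The balanced-parentheses grammar P → (P) | ε in this encoding.
balanced = CFG(
    terminals={"(", ")"},
    nonterminals={"P"},
    start="P",
    productions=[("P", ["(", "P", ")"]), ("P", [])],
)
print(balanced.is_well_formed())  # True
```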
Context Free Grammars
Strings of balanced parentheses, {(ⁱ)ⁱ | i ≥ 0}, using a CFG:

P → (P)
P → ε
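Each production of P can be read as one branch of a recursive function. The Python sketch below is a hand-written recursive-descent recognizer (our own illustration; parse_P and in_language are hypothetical names): it picks P → (P) when the next character is ( and P → ε otherwise. The recursion depth plays the role of the counter, so it accepts exactly {(ⁱ)ⁱ | i ≥ 0}; note that this grammar does not generate ()().

```python
def parse_P(s: str, pos: int = 0) -> int:
    """Match one P starting at s[pos]; return the index just after it."""
    if pos < len(s) and s[pos] == "(":
        # P → ( P ): consume '(', match the inner P, then require ')'
        pos = parse_P(s, pos + 1)
        if pos < len(s) and s[pos] == ")":
            return pos + 1
        raise SyntaxError(f"expected ')' at position {pos}")
    # P → ε: consume nothing
    return pos


def in_language(s: str) -> bool:
    """True if the whole of s is derivable from P."""
    try:
        return parse_P(s) == len(s)
    except SyntaxError:
        return False


print(in_language("(())"), in_language("()()"))  # True False
```

Because it recurses once per (, very deep nesting will hit Python's default recursion limit; a production parser would use an explicit stack.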
Context Free Grammars
● Begin with a string consisting of the start symbol “S”
● Replace any non-terminal X in the string by the right-hand side
of some production:
X → α1 ... αn
● Repeat the above step until there are no non-terminals in the
string
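The procedure above leaves two choices open at each step: which non-terminal to replace and which production to use. The minimal Python sketch below makes both choices at random, using a seeded random.Random so runs are reproducible. The dict encoding of the grammar (non-terminal to list of right-hand sides, [] for ε), the function name random_derivation, and the max_steps guard are assumptions for illustration.

```python
import random


def random_derivation(grammar, start, rng, max_steps=1000):
    """Start from [start] and repeatedly replace some non-terminal by the
    right-hand side of some production until only terminals remain.
    Returns every intermediate string (as lists of symbols)."""
    form = [start]
    history = [form]
    while any(sym in grammar for sym in form):
        if len(history) > max_steps:
            raise RuntimeError("derivation did not finish within max_steps")
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        i = rng.choice(positions)            # any non-terminal may be chosen
        rhs = rng.choice(grammar[form[i]])   # ... and any of its productions
        form = form[:i] + rhs + form[i + 1:]
        history.append(form)
    return history


balanced = {"P": [["(", "P", ")"], []]}
for form in random_derivation(balanced, "P", random.Random(42)):
    print(" ".join(form) or "ε")
```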
Context Free Grammars
Formally, a single derivation step replaces one non-terminal Xi:
X1 … Xi-1 Xi Xi+1 … Xn → X1 … Xi-1 α1 … αm Xi+1 … Xn
if there is a production
Xi → α1 … αm
Context Free Grammars
Formally,
X1 … Xn →* α1 … αm
if there is a sequence of zero or more derivation steps
X1 … Xn → … → α1 … αm
Language of Context Free Grammars
The language of a context-free grammar G with start symbol S is:
{α1 … αn | S →* α1 … αn and every αi (i = 1, …, n) is a terminal}

Terminals cannot be replaced: once generated, they remain in the string

In the context of Compilers, terminals are tokens of the language
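For small grammars the definition can be explored mechanically. The Python sketch below (language_up_to is a hypothetical helper) does a breadth-first search over sentential forms, always expanding the left-most non-terminal, and collects every all-terminal string reachable from the start symbol with at most max_len terminal symbols. Two assumptions: terminals are the symbols that are not keys of the grammar dict, and no production can grow the string forever without adding terminals (true for the grammars in this lecture, but not for, e.g., X → XX); otherwise the search would not terminate.

```python
from collections import deque


def language_up_to(grammar, start, max_len):
    """Return every terminal string w with start →* w and at most max_len
    terminal symbols. grammar maps each non-terminal to a list of
    right-hand sides (tuples of symbols; () is ε)."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Terminals are never replaced, so a form with too many is a dead end.
        if sum(sym not in grammar for sym in form) > max_len:
            continue
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        if not positions:
            found.add("".join(form))
            continue
        i = positions[0]  # left-most only: every string has a left-most derivation
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return found


balanced = {"P": [("(", "P", ")"), ()]}
print(sorted(language_up_to(balanced, "P", 6), key=len))
# ['', '()', '(())', '((()))']
```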


CFG for Arithmetic Expressions
E → n
E → id
E → E + E
E → E – E
E → E * E
E → (E)
CFG for Arithmetic Expressions
E → n
  | id
  | E + E
  | E – E
  | E * E
  | (E)
CFG for a Language with Conditionals
R → == | > | < | >= | <=

E → id | n | E + E | E R E

S → id = E | if E then S else S
| while E do S
Parse Trees
● A parse tree represents a derivation: it shows the sequence of productions applied to turn the start symbol into a string of only terminals
● The start symbol is the root of the tree
● For every production X → α1 … αn used, add an edge from the left-hand non-terminal X to each (non-)terminal αi on the right-hand side; the αi become the children of X
● Terminals form the leaf nodes of the tree
● Reading the leaves from left to right (an in-order traversal) gives the input
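The tree definition above can be written as a small recursive data type. In this Python sketch (Node and leaves are hypothetical names), reading the leaves from left to right reproduces the input string, as the last bullet states. It assumes a grammar without ε-productions, so every leaf is a terminal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                         # terminal or non-terminal
    children: list["Node"] = field(default_factory=list)

    def leaves(self) -> list[str]:
        """Symbols at the leaves, read left to right."""
        if not self.children:
            return [self.symbol]
        out: list[str] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


# Parse tree of id * (id + id) under E → n | id | E + E | E – E | E * E | (E)
tree = Node("E", [
    Node("E", [Node("id")]),
    Node("*"),
    Node("E", [
        Node("("),
        Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])]),
        Node(")"),
    ]),
])
print(" ".join(tree.leaves()))  # id * ( id + id )
```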
Parse Trees - Example
CFG : E → n | id | E + E | E – E | E * E | (E)

String : id * (id + id)

Derivations: E → E * E → id * E → id * (E) → id * (E + E)
→ id * (id + E) → id * (id + id)
Parse Trees - Example
CFG : E → n | id | E + E | E – E | E * E | (E)
Str : id * (id + id)
Parse tree built by the derivation above (each node's children are indented beneath it):
E
  E
    id
  *
  E
    (
    E
      E
        id
      +
      E
        id
    )
Left-most and Right-most Derivations
● The previous derivation was a left-most derivation
○ Each step replaced the left-most non-terminal
● A right-most derivation, which always replaces the right-most non-terminal, is also possible
○ E → E * E
→ E * (E)
→ E * (E + E)
→ E * (E + id)
→ E * (id + id)
→ id * (id + id)
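To check that the left-most and right-most derivations of id * (id + id) describe the same structure, the Python sketch below (our own illustration; Node, derive_tree, and the list encoding of the grammar are hypothetical) grows a parse tree while it derives: expanding a non-terminal gives its node one child per right-hand-side symbol. Replaying the left-most steps and the right-most steps from these slides yields equal trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list["Node"] = field(default_factory=list)


# "-" stands for the slide's "–".
GRAMMAR = {"E": [["n"], ["id"], ["E", "+", "E"], ["E", "-", "E"],
                 ["E", "*", "E"], ["(", "E", ")"]]}


def derive_tree(choices, rightmost=False):
    """Run a derivation from E, growing the parse tree as we go.
    `frontier` holds the current leaves in left-to-right order, so reading
    their symbols gives the current sentential form."""
    root = Node("E")
    frontier = [root]
    for rhs in choices:
        nts = [k for k, node in enumerate(frontier) if node.symbol in GRAMMAR]
        if not nts:
            raise ValueError("no non-terminal left to replace")
        i = nts[-1] if rightmost else nts[0]
        node = frontier[i]
        if rhs not in GRAMMAR[node.symbol]:
            raise ValueError(f"{node.symbol} -> {rhs} is not a production")
        node.children = [Node(sym) for sym in rhs]  # new edges of the tree
        frontier[i:i + 1] = node.children           # expanded node leaves the frontier
    return root


LEFT = [["E", "*", "E"], ["id"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
RIGHT = [["E", "*", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"], ["id"]]
print(derive_tree(LEFT) == derive_tree(RIGHT, rightmost=True))  # True
```

Only the order in which nodes are added differs: the left-most run fills the tree from left to right, the right-most run from right to left.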
Parse Trees - Example
CFG : E → n | id | E + E | E – E | E * E | (E)
Str : id * (id + id)
The right-most derivation builds the same parse tree, but fills it in from right to left:
E
  E
    id
  *
  E
    (
    E
      E
        id
      +
      E
        id
    )
CFG for Conditionals in Python
expr ::= ...
stmt ::= ... | cond | …
stmts ::= stmt | stmts stmt
bstmt ::= INDENT stmts DEDENT
cond ::= "if" expr ":" bstmt elifs else
elifs ::= ε | "elif" expr ":" bstmt elifs
else ::= ε | "else" ":" bstmt
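The grammar translates almost mechanically into a recursive-descent parser with one method per non-terminal. The Python sketch below is our own illustration, not how Python's own parser is implemented: it assumes the lexer already emits INDENT and DEDENT tokens, and because the slide elides the rules for expr and for the other statement forms, it uses the hypothetical placeholder tokens EXPR and STMT for them. Two rules become loops: stmts ::= stmts stmt is left-recursive (a naive recursive method for it would call itself forever), and elifs is tail-recursive.

```python
class Parser:
    """Recursive-descent parser for the cond rules above."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1
        return tok

    def stmt(self):    # stmt ::= ... | cond | ...   ("STMT" = any other stmt)
        return self.cond() if self.peek() == "if" else self.expect("STMT")

    def stmts(self):   # stmts ::= stmt | stmts stmt, parsed as a loop
        body = [self.stmt()]
        while self.peek() not in ("DEDENT", None):
            body.append(self.stmt())
        return body

    def bstmt(self):   # bstmt ::= INDENT stmts DEDENT
        self.expect("INDENT")
        body = self.stmts()
        self.expect("DEDENT")
        return body

    def cond(self):    # cond ::= "if" expr ":" bstmt elifs else
        self.expect("if")
        branches = [(self.expect("EXPR"), self._colon_block())]
        while self.peek() == "elif":   # elifs ::= ε | "elif" expr ":" bstmt elifs
            self.expect("elif")
            branches.append((self.expect("EXPR"), self._colon_block()))
        orelse = None
        if self.peek() == "else":      # else ::= ε | "else" ":" bstmt
            self.expect("else")
            orelse = self._colon_block()
        return ("cond", branches, orelse)

    def _colon_block(self):
        self.expect(":")
        return self.bstmt()


tokens = ("if EXPR : INDENT STMT DEDENT elif EXPR : INDENT STMT STMT DEDENT "
          "else : INDENT STMT DEDENT").split()
parser = Parser(tokens)
print(parser.cond(), parser.pos == len(tokens))
```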
Question
Which of the following strings are in the language of this CFG:
S → aXa
X → ε | bY
Y → ε | cXc

1. abcba
2. acca
3. aba
4. abcbcba
Question
Which of the following are valid derivations for this CFG:
S → aXa
X → ε | bY
Y → ε | cXc

1. S → aXa → abYa → acXca → acca


2. S → aXa → aa
3. S → aXa → abYa → abcXca → abcbYca → abcbca
4. S → aXa → abYa → abcXcba → abccba
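A candidate derivation is valid only if every arrow is a single derivation step: exactly one non-terminal is replaced by one of its right-hand sides. The Python sketch below checks that condition; the string encoding (one character per symbol, keys of GRAMMAR are the non-terminals, "" for ε) and the names is_step and is_derivation are assumptions for illustration. The example derivation is not one of the four options.

```python
# Hypothetical encoding of S → aXa, X → ε | bY, Y → ε | cXc.
# Keys are the non-terminals; every symbol is one character; "" is ε.
GRAMMAR = {"S": ["aXa"], "X": ["", "bY"], "Y": ["", "cXc"]}


def is_step(grammar, before, after):
    """True if `after` follows from `before` in ONE derivation step, i.e. by
    replacing a single non-terminal with one of its right-hand sides."""
    for i, sym in enumerate(before):
        if sym in grammar:
            for rhs in grammar[sym]:
                if before[:i] + rhs + before[i + 1:] == after:
                    return True
    return False


def is_derivation(grammar, text):
    """Check a derivation written like "S → aXa → aba"."""
    forms = [f.strip() for f in text.split("→")]
    return all(is_step(grammar, a, b) for a, b in zip(forms, forms[1:]))


print(is_derivation(GRAMMAR, "S → aXa → abYa → aba"))  # True
```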
Parse Trees - Example (Left-most)
CFG : E → n | id | E + E | E – E | E * E
Str : id * id + id
Derivation:
E → E * E
→ id * E
→ id * E + E
→ id * id + E
→ id * id + id
Parse tree:
E
  E
    id
  *
  E
    E
      id
    +
    E
      id
Parse Trees - Example (Right-most)
CFG : E → n | id | E + E | E – E | E * E
Str : id * id + id
Derivation:
E → E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id
Parse tree:
E
  E
    E
      id
    *
    E
      id
  +
  E
    id
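Multiple Parse Trees
The string id * id + id has two different parse trees (written here as [root children…]):
Left-most derivation: [E [E id] * [E [E id] + [E id]]]
Right-most derivation: [E [E [E id] * [E id]] + [E id]]
The left-most tree groups the input as id * (id + id); it has the same shape as the parse tree of the parenthesized string:
[E [E id] * [E ( [E [E id] + [E id]] )]]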
Multiple Parse Trees
● A grammar is ambiguous if some string has more than one parse tree; equivalently, some string has more than one left-most (or more than one right-most) derivation
● Ambiguity is bad for compilation
○ The meaning of a program is ill-defined
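For a given string, ambiguity can be checked mechanically by counting its parse trees. The Python sketch below (count_parse_trees is a hypothetical helper, and the input is assumed to be already split into tokens) uses memoized recursion over token spans, a simplified relative of the CYK algorithm, for the grammar E → n | id | E + E | E – E | E * E | (E).

```python
from functools import lru_cache

BINARY_OPS = {"+", "-", "*"}  # "-" stands for the slide's "–"


def count_parse_trees(tokens):
    """Number of distinct parse trees of `tokens` as an E."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):  # number of trees deriving toks[i:j] from E
        total = 0
        if j - i == 1 and toks[i] in ("n", "id"):
            total += 1                              # E → n | id
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            total += count(i + 1, j - 1)            # E → ( E )
        for k in range(i + 1, j - 1):               # E → E op E, op at k
            if toks[k] in BINARY_OPS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_parse_trees("id * id + id".split()))      # 2
print(count_parse_trees("id * ( id + id )".split()))  # 1
```

A count greater than 1 for some string is exactly the ambiguity in the definition above; the parentheses remove it.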
Ambiguity
S → if E then S
| if E then S else S

Str: if E1 then if E2 then S1 else S2

The else can attach to either if, giving two parse trees:
else matched with the outer if:
if
  E1
  if
    E2
    S1
  S2
else matched with the inner if:
if
  E1
  if
    E2
    S1
    S2
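Enumerating every parse tree makes the two readings of the dangling else explicit. In the Python sketch below (parse_trees is a hypothetical helper), trees are nested tuples, and conditions (E1, E2, …) and basic statements (S1, S2, …) are assumed to be single tokens whose names start with E and S. For each span it tries the production without else, then every else token as the split point of the production with else.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Return every parse tree of `tokens` as an S in
        S → if E then S | if E then S else S
    Trees are ("if", cond, then_branch) or ("if", cond, then_branch, else_branch)."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):  # all trees for toks[i:j]
        result = []
        if j - i == 1 and toks[i].startswith("S"):
            result.append(toks[i])  # a basic statement
        is_if = (j - i >= 4 and toks[i] == "if"
                 and toks[i + 1].startswith("E") and toks[i + 2] == "then")
        if is_if:
            cond = toks[i + 1]
            # S → if E then S
            result += [("if", cond, t) for t in trees(i + 3, j)]
            # S → if E then S else S, trying every "else" as the split point
            for k in range(i + 4, j - 1):
                if toks[k] == "else":
                    result += [("if", cond, t, e)
                               for t in trees(i + 3, k) for e in trees(k + 1, j)]
        return tuple(result)

    return list(trees(0, len(toks)))


for tree in parse_trees("if E1 then if E2 then S1 else S2".split()):
    print(tree)
# ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
# ('if', 'E1', ('if', 'E2', 'S1'), 'S2')
```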
