Chapter - Three: Syntax Analysis
Outline
Introduction
Context free grammar (CFG)
Derivation
Parse tree
Ambiguity
Left recursion
Left factoring
Top-down parsing
• Recursive Descent Parsing (RDP)
• Non-recursive predictive parsing
– First and follow sets
– Construction of a predictive parsing table
LR(1) grammars
Syntax error handling
Error recovery in predictive parsing
Panic mode error recovery strategy
Yacc
Introduction
Syntax: the way in which tokens are put together to form
expressions, statements, or blocks of statements.
The rules governing the formation of statements in a programming
language.
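[Figure: position of the parser in the compiler. The source program enters the lexical analyzer, which reads it one character at a time (get next char). The syntax analyzer requests tokens from the lexical analyzer (get next token) and produces a parse tree. Both analyzers report errors (lexical errors and syntax errors) and share the symbol table, which contains a record for each identifier.]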
The syntax analyzer (parser) checks whether a given
source program satisfies the rules implied by a CFG or
not.
If it satisfies, the parser creates the parse tree of that
program.
Otherwise, the parser gives the error messages.
A CFG:
gives a precise syntactic specification of a programming
language.
Some tools (such as yacc) can convert a grammar directly
into a parser.
Parsers can be categorized into two groups:
Top-down parser
The parse tree is created top to bottom, starting from the
root and working down to the leaves.
Bottom-up parser
The parse tree is created bottom to top, starting from the
leaves and working up to the root.
Both top-down and bottom-up parsers scan the input from
left to right (one symbol at a time).
Efficient top-down and bottom-up parsers can be
implemented only for subclasses of context-free grammars:
LL for top-down parsing
LR for bottom-up parsing
Context free grammar (CFG)
A context-free grammar (CFG) is a specification for the
syntactic structure of a programming language.
A context-free grammar is a 4-tuple:
G = (T, N, P, S) where
T is a finite set of terminals (a set of tokens)
N is a finite set of non-terminals (syntactic variables)
P is a finite set of productions of the form
A → α where A is a non-terminal and
α is a string of terminals and non-terminals (including the empty
string)
S ∈ N is a designated start symbol (one of the non-terminal symbols)
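The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not a standard API: the class name Grammar, the method is_well_formed, and the toy grammar S → ( S ) | ε are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    productions: tuple        # P: pairs (head, body); body is a tuple of symbols
    start: str                # S

    def is_well_formed(self) -> bool:
        """Check the conditions stated in the definition of a CFG."""
        if self.start not in self.nonterminals:
            return False
        if self.terminals & self.nonterminals:
            return False  # a symbol cannot be both a terminal and a non-terminal
        symbols = self.terminals | self.nonterminals
        return all(
            head in self.nonterminals and all(s in symbols for s in body)
            for head, body in self.productions
        )


# Hypothetical toy grammar: S -> ( S ) | ε  (the empty tuple encodes ε)
G = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    productions=(("S", ("(", "S", ")")), ("S", ())),
    start="S",
)
print(G.is_well_formed())
```

Keeping the head of every production a single non-terminal is what makes the grammar context-free.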
Example: grammar for simple arithmetic expressions
Notational Conventions Used
Terminals:
Lowercase letters early in the alphabet, such as a, b, c.
Operator symbols such as +, *, and so on.
Punctuation symbols such as parentheses, comma, and so on.
The digits 0,1,. . . ,9.
Boldface strings such as id or if, each of which represents a
single terminal symbol.
Non-terminals:
Uppercase letters early in the alphabet, such as A, B, C.
The letter S is usually the start symbol.
Lowercase, italic names such as expr or stmt.
Uppercase letters may be used to represent non-terminals for
the constructs.
• expr, term, and factor are represented by E, T, F
Grammar symbols
Uppercase letters late in the alphabet, such as X, Y, Z, that is, either
non-terminals or terminals.
Strings of terminals
Lowercase letters late in the alphabet, mainly u, v, x, y ∈ T*
Strings of grammar symbols
Lowercase Greek letters, α, β, γ ∈ (N ∪ T)*
A set of productions A → α1, A → α2, . . . , A → αk with a common head A
(call them A-productions), may be written
A → α1 | α2 | … | αk
α1, α2, . . . , αk are the alternatives for A.
Unless stated otherwise, the head of the first production is the start symbol.
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivation
A derivation is a sequence of replacements of structure names
by choices on the right hand sides of grammar rules.
Example: E → E + E | E – E | E * E | E / E | -E
E→(E)
E → id
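A derivation can be replayed mechanically. The sketch below assumes the grammar above is stored as a dictionary of alternatives (the names GRAMMAR and leftmost_derivation are hypothetical) and always rewrites the leftmost non-terminal, which gives a leftmost derivation of - ( id + id ).

```python
# Grammar from the example; each alternative is a list of symbols.
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "-", "E"], ["E", "*", "E"], ["E", "/", "E"],
          ["-", "E"], ["(", "E", ")"], ["id"]],
}


def leftmost_derivation(start, choices):
    """Apply each chosen alternative to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for index in choices:
        # Position of the leftmost non-terminal in the sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[pos:pos + 1] = GRAMMAR[form[pos]][index]
        steps.append(" ".join(form))
    return steps


# Alternatives used, as indices into GRAMMAR["E"]: -E, (E), E+E, id, id
steps = leftmost_derivation("E", [4, 5, 0, 6, 6])
print(" => ".join(steps))
```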
Parse tree
A parse tree is a graphical representation of a derivation.
It filters out the order in which productions are applied to replace
non-terminals.
Parse tree and Derivation
Grammar: E → E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
[Figure: the sequence of parse trees built by this derivation, from the single node E to the complete tree for -(id + id).]
This is a top-down derivation
because we start building the
parse tree at the top.
Exercise
a) Using the grammar below, draw a parse tree for the
following string:
( ( id . id ) id ( id ) ( ( ) ) )
S → E
E → id
| ( E . E )
| ( L )
| ( )
L → L E
| E
b) Give a rightmost derivation for the string given in (a).
Ambiguity
A grammar that produces more than one parse tree for some
sentence is called an ambiguous grammar.
• Equivalently, it produces more than one leftmost derivation or
• more than one rightmost derivation for the same sentence.
Ambiguity: Example
Example: The arithmetic expression grammar
E → E + E | E * E | ( E ) | id
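Ambiguity of this grammar can be checked by counting parse trees. The sketch below is specific to this one grammar (the function name count_parse_trees is an assumption); it counts, for every token span, the ways that span can be derived from E.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)          # E -> ( E )
        # Try every operator position as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))
```

A count greater than 1 for any sentence shows that the grammar is ambiguous.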
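Ambiguity: example
E → E + E | E * E | ( E ) | - E | id
Construct a parse tree for the expression: id + id * id
[Figure: two step-by-step constructions of a parse tree for id + id * id. In the first, the root is E + E and the right operand expands to E * E; in the second, the root is E * E and the left operand expands to E + E.]
Which parse tree is correct?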
E → E + E | E * E | ( E ) | - E | id
Find a derivation for the expression: id + id * id
[Figure: the two parse trees for id + id * id, one with E + E at the root and one with E * E at the root.]
According to the grammar, both are correct.
Elimination of ambiguity
Precedence/Associativity
These two derivations point out a problem with the grammar:
The grammar has no notion of precedence, or implied order of
evaluation.
To add precedence
Create a non-terminal for each level of precedence
Isolate the corresponding part of the grammar
Force the parser to recognize high-precedence subexpressions first
To add associativity
Left-associative: the next-level (higher-precedence) non-terminal is placed at the end of the
production.
To disambiguate the grammar:
E → E + E | E * E | ( E ) | id
rewrite it as:
E → E + T | T
T → T * F | F
F → ( E ) | id
With the rewritten grammar, id + id * id has a single parse tree.
Left Recursion
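Consider the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
[Figure: repeatedly expanding the leftmost E with E → E + T yields E + T, then E + T + T, and so on, without ever consuming input.]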
Elimination of Left recursion
A grammar is left recursive if it has a non-terminal A
such that there is a derivation
A ⇒+ Aα for some string α.
Top-down parsing methods cannot handle left-recursive
grammars,
so a transformation that eliminates left recursion is needed.
For the expression grammar, the transformation yields:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
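The removal of immediate left recursion for a single non-terminal can be sketched as follows. This is a minimal illustration, assuming alternatives are lists of symbols with [] encoding ε, that the new non-terminal is named by appending an ASCII apostrophe, and that the grammar has no production A → A; the function name is hypothetical.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | ε."""
    # Tails of the left-recursive alternatives (the part after the leading A).
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [head]]
    others = [alt for alt in alternatives if alt[:1] != [head]]
    if not recursive:
        return {head: alternatives}
    new = head + "'"
    return {
        head: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],  # [] encodes ε
    }


result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)
```

Applying the same function to T → T * F | F gives T → F T', T' → * F T' | ε.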
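Generally, we can eliminate immediate left recursion from
the A-productions by the following technique.
First we group the A-productions as:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A. Then we replace the A-productions by:
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε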
Eliminating left-recursion algorithm
i=3, j=1: C → A B | C C | a
          C → B C B | a B | C C | a
i=3, j=2: C → B C B | a B | C C | a
          C → C A B’ C B | a b B’ C B | a B | C C | a
(imm)     C → a b B’ C B C’ | a B C’ | a C’
          C’ → A B’ C B C’ | C C’ | ε
Eliminating left-recursion (more)
Example: Given: S → Aa | b
A → Ac | Sd | ε
Substitute the S-productions in A → Sd to obtain the
following productions:
A → Ac | Aad | bd | ε
Eliminating the immediate left recursion among the
A-productions yields the following grammar:
S → Aa | b
A → bdA’ | A’
A’ → cA’ | adA’ | ε
Left factoring
Given A → αβ1 | αβ2, when processing α we do not know whether to expand A
to αβ1 or to αβ2, but if we rewrite the grammar as
follows:
A → αA’
A’ → β1 | β2
we can immediately expand A to αA’.
Example: given the following grammar:
S → iEtS | iEtSeS | a
E → b
Left factored, this grammar becomes:
S → iEtSS’ | a
S’ → eS | ε
E → b
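One round of left factoring can be sketched in code. The helper names common_prefix and left_factor are assumptions, alternatives are lists of symbols with [] encoding ε, and new non-terminals are named by appending ASCII apostrophes. Alternatives are grouped by their first symbol and each group's longest common prefix is factored out.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(head, alternatives):
    """One round of left factoring on the alternatives of `head`."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {head: []}
    primes = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)       # nothing to factor
            continue
        alpha = common_prefix(group)
        primes += 1
        new = head + "'" * primes
        result[head].append(alpha + [new])
        result[new] = [alt[len(alpha):] for alt in group]  # [] encodes ε
    return result


factored = left_factor("S", [list("iEtS"), list("iEtSeS"), ["a"]])
print(factored)
```

For the example grammar this yields S → iEtSS' | a and S' → ε | eS, the same productions as above with the alternatives of S' listed in a different order.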
Syntax analysis (Parsing)
Every language has rules that prescribe the syntactic
structure of well formed programs.
The syntax can be described using Context Free Grammars
(CFG) notation.
The use of CFGs has several advantages:
a grammar helps in identifying ambiguities
a grammar gives a precise yet easy-to-understand syntactic
specification of a programming language
a tool can automatically produce a
parser from the grammar
a properly designed grammar makes the parser easy to modify
when the language changes
Top-down parsing
Recursive Descent Parsing (RDP)
This method of top-down parsing can be considered as an
attempt to find the left most derivation for an input string.
It may involve backtracking.
Example: G:
S → cAd
A → ab | a
Draw the parse tree for the input string cad using
the above method.
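A recursive descent parser with backtracking for this small grammar can be sketched with one function per non-terminal. The function names are assumptions, and Python generators stand in for backtracking: each function yields every input position at which its non-terminal could end, so a failed alternative simply gives way to the next one.

```python
def parse_A(text, pos):
    """Yield every end position for A -> ab | a, trying the alternatives in order."""
    if text.startswith("ab", pos):
        yield pos + 2
    if text.startswith("a", pos):
        yield pos + 1


def parse_S(text, pos=0):
    """Yield every end position for S -> c A d."""
    if text.startswith("c", pos):
        for after_A in parse_A(text, pos + 1):
            # If 'd' does not follow, the loop backtracks to the next alternative of A.
            if text.startswith("d", after_A):
                yield after_A + 1


def accepts(text):
    """The input is accepted if some parse of S consumes all of it."""
    return any(end == len(text) for end in parse_S(text))


print(accepts("cad"), accepts("cabd"), accepts("cab"))
```

On the input cad, A → ab is tried first and fails because the second character of ab does not match; the parser then falls back to A → a and succeeds.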
Non-recursive predictive parsing
It is possible to build a non-recursive parser by explicitly
maintaining a stack.
This method uses a parsing table that determines the next
production to be applied.
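[Figure: model of a table-driven predictive parser. The predictive parsing program reads the INPUT buffer (for example id + id * id $), keeps a STACK with $ at the bottom and the start symbol E on top, consults the parsing table, and writes the OUTPUT.]
With X the symbol on top of the stack and a the current input symbol, the program distinguishes three cases:
X = a = $: the parser halts and accepts.
X = a ≠ $: the parser pops X and advances to the next input symbol.
X is a non-terminal: the parser replaces X with the right-hand side of the production in table entry M[X, a].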
Predictive Parsing Simulation
INPUT: id + id * id $
PARSING TABLE:
NON-TERMINAL  id        +           *           (         )        $
E             E → TE’                           E → TE’
E’                      E’ → +TE’                         E’ → ε   E’ → ε
T             T → FT’                           T → FT’
T’                      T’ → ε      T’ → *FT’             T’ → ε   T’ → ε
F             F → id                            F → (E)
[Figure: successive snapshots of the stack and of the partially built parse tree as the parser processes id + id * id $, beginning with E → TE’, T → FT’, F → id.]
When Top(Stack) = input = $,
the parser halts and accepts the
input string.
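The simulation can be reproduced with a short table-driven parser. This is a sketch under stated assumptions: the parsing table for the expression grammar is typed in by hand as a dictionary, primes are written with ASCII apostrophes, and the names TABLE, NONTERMINALS and predictive_parse are hypothetical.

```python
# Parsing table M[X, a] for the expression grammar; [] encodes an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]          # top of the stack is the end of the list
    output, i = [], 0
    while True:
        X, a = stack[-1], tokens[i]
        if X == a == "$":
            return output         # accept
        if X not in NONTERMINALS:
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()           # match a terminal
            i += 1
        elif (X, a) in TABLE:
            body = TABLE[(X, a)]
            stack.pop()
            stack.extend(reversed(body))  # leftmost body symbol ends up on top
            output.append(f"{X} -> {' '.join(body) or 'ε'}")
        else:
            raise SyntaxError(f"no table entry for M[{X}, {a}]")


for production in predictive_parse("id + id * id".split()):
    print(production)
```

Pushing the production body in reverse order is the design choice that makes the parser expand the leftmost non-terminal first.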
Example: G:
E → TR
R → +TR
R → -TR
R → ε
T → 0 | 1 | … | 9
Input: 1+2
[Table: parsing table M[X, a] with one column for each of 0, 1, …, 9, +, -, $]
FIRST and FOLLOW
The construction of both top-down and bottom-up parsers
is aided by two functions, FIRST and FOLLOW, associated
with a grammar G.
We need to build a FIRST set and a FOLLOW set
for each symbol in the grammar.
FIRST
FIRST(α) = set of terminals that begin the strings
derived from α.
If α ⇒* ε (that is, α derives ε in zero or more steps), then ε is also in FIRST(α).
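FIRST sets are usually computed by iterating to a fixed point. The sketch below does this for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id; the dictionary encoding and the function names are assumptions, and [] encodes ε.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal begins itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol (or the empty string) can derive ε
    return result


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for A, alternatives in grammar.items():
            for body in alternatives:
                new = first_of_string(body, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
print(FIRST)
```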
FIRST sets for the expression grammar:
FIRST(E) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T) = {(, id}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
Rules to Create FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or A → αBβ with ε ∈ FIRST(β), everything in FOLLOW(A) is in FOLLOW(B).
Exercise:
Find FIRST and FOLLOW sets for the following
grammar G:
E → TR
R → +TR
R → -TR
R → ε
T → 0 | 1 | … | 9
FIRST(E) = FIRST(T) = {0, 1, …, 9}
FIRST(R) = {+, -, ε}
FOLLOW(E) = {$}
FOLLOW(T) = {+, -, $}
FOLLOW(R) = {$}
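The answers to this exercise can be checked by computing FOLLOW with the usual three rules: $ goes into FOLLOW of the start symbol; for A → αBβ, FIRST(β) without ε goes into FOLLOW(B); and if β can derive ε (or is empty), FOLLOW(A) goes into FOLLOW(B). The code is a sketch with hypothetical names, and [] encodes ε.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "E": [["T", "R"]],
    "R": [["+", "T", "R"], ["-", "T", "R"], []],
    "T": [[d] for d in "0123456789"],
}


def first_of(symbols, first):
    """FIRST of a string of symbols; a terminal's FIRST set is itself."""
    out = set()
    for s in symbols:
        f = first.get(s, {s})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for body in alts:
                new = first_of(body, first) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {A: set() for A in grammar}
    follow[start].add(END)                        # rule 1
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for body in alts:
                for i, B in enumerate(body):
                    if B not in grammar:          # only non-terminals have FOLLOW
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}            # rule 2
                    if EPS in rest:
                        new |= follow[A]          # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
print(FOLLOW)
```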
Exercise
Consider the following grammar over the alphabet
{g, h, i, b}:
A → BCD
B → bB | ε
C → Cg | g | Ch | i
D → AB | ε
Fill in the table below with the FIRST and FOLLOW sets for
the non-terminals in this grammar:
FIRST FOLLOW
A
B
C
D
Construction of predictive parsing table
Input: Grammar G
Output: Parsing table M
For each production of the form A → α of the
grammar do:
• For each terminal a in FIRST(α), add A → α to
M[A, a]
• If ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b in
FOLLOW(A)
• If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α
to M[A, $]
• Make each undefined entry of M an error.
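The construction can be sketched for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id. In this sketch the FIRST and FOLLOW sets are typed in by hand rather than computed, all names are hypothetical, and undefined entries are simply absent from the dictionary. Each cell holds a list of productions so that a multiply defined entry, which would mean the grammar is not LL(1), stays visible.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of(symbols):
    """FIRST of a production body; a terminal's FIRST set is itself."""
    out = set()
    for s in symbols:
        f = FIRST.get(s, {s})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def build_table(grammar):
    table = {}
    for head, alternatives in grammar.items():
        for body in alternatives:
            f = first_of(body)
            lookaheads = f - {EPS}
            if EPS in f:
                # FOLLOW(head) already contains $ when $ can follow head.
                lookaheads |= FOLLOW[head]
            for a in lookaheads:
                table.setdefault((head, a), []).append(body)
    return table


TABLE = build_table(GRAMMAR)
conflicts = {cell: prods for cell, prods in TABLE.items() if len(prods) > 1}
print(len(TABLE), "entries;", len(conflicts), "conflicts")
```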
GRAMMAR:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
FIRST SETS:
FIRST(E) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T) = {(, id}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
FOLLOW SETS:
FOLLOW(E) = {), $}
FOLLOW(E’) = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Rules to Build Parsing Table
1. If A → α:
if a ∈ FIRST(α), add A → α to M[A, a]
2. If A → α:
if ε ∈ FIRST(α), add A → α to M[A, b]
for each terminal b ∈ FOLLOW(A)
3. If A → α:
if ε ∈ FIRST(α) and $ ∈ FOLLOW(A),
add A → α to M[A, $]
PARSING TABLE:
NON-TERMINAL  id        +           *           (         )        $
E             E → TE’                           E → TE’
E’                      E’ → +TE’                         E’ → ε   E’ → ε
T             T → FT’                           T → FT’
T’                      T’ → ε      T’ → *FT’             T’ → ε   T’ → ε
F             F → id                            F → (E)
Exercise 1:
Consider the following grammar G. Construct the
predictive parsing table and parse the input string
id + id * id
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FOLLOW(E) = FOLLOW(E’) = {$, )}
FOLLOW(T) = FOLLOW(T’) = {+, $, )}
FOLLOW(F) = {*, +, $, )}
LL(k) Parser
An LL(k) parser scans the input from left to right and constructs a
leftmost derivation, using k symbols of lookahead. The parser described
here looks 1 symbol ahead to choose its next action; therefore, it is known as
an LL(1) parser.
Non-LL(1) Grammar: Examples
LL(1) Grammars
Exercise: Consider the following grammar G:
A’ → A
A → xA | yA | y
a) Find FIRST and FOLLOW sets for G:
b) Construct the LL(1) parse table for this grammar.
c) Explain why this grammar is not LL(1).
d) Transform the grammar into a grammar that is
LL(1).
e) Give the parse table for the grammar created in
(d).
A’ → A
A → xA | yA | y
FIRST(A) = FIRST(A’) = {x, y}
FOLLOW(A) = FOLLOW(A’) = {$}
      x           y                 $
A’    A’ → A      A’ → A
A     A → xA      A → yA, A → y
Not LL(1): multiply defined entry in [A, y].
Left factor:
A’ → A
A → xA | yA’’
A’’ → A | ε
FIRST(A’) = FIRST(A) = {x, y}
FIRST(A’’) = {x, y, ε}
FOLLOW(A) = FOLLOW(A’) = FOLLOW(A’’) = {$}
      x           y           $
A’    A’ → A      A’ → A
A     A → xA      A → yA’’
A’’   A’’ → A     A’’ → A     A’’ → ε
Now G is LL(1).
LL(1) Grammar: Exercise
Given G:
S → iEtSS’ | a
S’ → eS | ε
E → b
FIRST(S) = {i, a}
FIRST(E) = {b}
FIRST(S’) = {e, ε}
FOLLOW(S) = FOLLOW(S’) = {$, e}
FOLLOW(E) = {t}
      a        b        e                  i             t     $
S     S → a                                S → iEtSS’
S’                      S’ → eS, S’ → ε                        S’ → ε
E              E → b
Not LL(1): multiply defined table entry in [S’, e].
Input string: ibtaea
Exercises
3. Given the following grammar:
program → procedure STMT-LIST
STMT-LIST → STMT STMT-LIST | STMT
STMT → do VAR = CONST to CONST begin STMT-LIST end
| ASSN-STMT
Show the parse tree for the following code fragment:
procedure
do i=1 to 100 begin
ASSN-STMT
ASSN-STMT
end
ASSN-STMT
Syntax error handling
Common programming errors can occur at many different
levels:
Lexical errors include misspellings of identifiers, keywords, or
operators, e.g., ebigin instead of begin.
Syntactic errors include misplaced semicolons, extra or
missing braces { }, and a case without an enclosing switch.
Semantic errors include type mismatches between operators
and operands, e.g., a return statement in a Java method with result
type void, or an operator applied to an incompatible operand.
Logical errors can be anything from incorrect reasoning on the part of the programmer, e.g., using the
assignment operator = instead of the comparison operator ==.
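The error handler should be written with the
following goals in mind:
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.
There are four main strategies in error handling:
Panic mode recovery
Phrase-level recovery
Error productions
Global correction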
Panic mode error recovery strategy
The primary error situation occurs with
a non-terminal A on the top of the stack when
the current input token is not in FIRST(A) (or not in FOLLOW(A), if ε ∈
FIRST(A)).
Solution
Build the set of synchronizing tokens directly into the
LL(1) parsing table.
Possible alternatives
1. Pop A from the stack.
2. Successively pop tokens from the input until a token is
seen for which we can restart the parse.
Choose alternative 1 if the current input token is $ or is in
FOLLOW(A) (synch).
Choose alternative 2 if the current input token is not $ and is not
in FIRST(A) ∪ FOLLOW(A) (scan).
Example: Using FOLLOW and FIRST symbols as synchronizing
tokens, the parse table for grammar G:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FOLLOW(E) = FOLLOW(E’) = {$, )}
FOLLOW(T) = FOLLOW(T’) = {+, $, )}
FOLLOW(F) = {*, +, $, )}
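The two alternatives can be sketched on top of a table-driven parser for this grammar. Everything below is an illustrative assumption rather than the table from the slides: the table and FOLLOW sets are typed in by hand, a mismatched terminal on the stack is popped and reported as missing, and leftover input after the stack empties is skipped.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def parse_with_recovery(tokens, start="E"):
    """Parse to the end of the input and return the list of error messages."""
    tokens = list(tokens) + ["$"]
    stack, i, errors = ["$", start], 0, []
    while True:
        X, a = stack[-1], tokens[i]
        if X == "$":
            if a == "$":
                return errors
            errors.append(f"unexpected {a!r} after end of expression")
            i += 1
        elif X not in FOLLOW:                  # X is a terminal
            if X == a:
                i += 1                         # match
            else:
                errors.append(f"missing {X!r} before {a!r}")
            stack.pop()
        elif (X, a) in TABLE:
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))
        elif a == "$" or a in FOLLOW[X]:       # alternative 1 (synch)
            errors.append(f"synch: pop {X} at {a!r}")
            stack.pop()
        else:                                  # alternative 2 (scan)
            errors.append(f"scan: skip {a!r} while expecting {X}")
            i += 1


print(parse_with_recovery("id * + id".split()))
print(parse_with_recovery("id + * id".split()))
```

In the first call the parser expects an F after *, sees + (which is in FOLLOW(F)) and pops F; in the second it expects a T after +, sees * (which is in neither FIRST(T) nor FOLLOW(T)) and skips it.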
Bottom-up parsers:
• build the nodes at the bottom of the parse tree first.
• are suitable for automatic parser generation and handle a larger
class of grammars.
Examples: shift-reduce parsers (or LR(k) parsers)
Bottom-Up Parser
A bottom-up parser, or a shift-reduce parser, begins
at the leaves and works up to the top of the tree.
Consider the grammar:
S → aABe
A → Abc | b
B → d
Bottom-Up Parser: Simulation
INPUT: a b b c d e $
Sentential form    Reduction applied next
a b b c d e        A → b
a A b c d e        A → Abc
a A d e            B → d
[Figure: the parse tree grows from the leaves upward as each reduction is applied.]
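The shift and reduce moves behind this simulation can be replayed with a stack. In the sketch below the sequence of actions is written out by hand, which is an assumption: a real shift-reduce parser would choose its actions from an LR parsing table. The function name and the action encoding are hypothetical, and the replay continues through the final reduction by S → aABe.

```python
def shift_reduce(tokens, actions):
    """Replay shift/reduce actions; return (stack, remaining input) after each one."""
    stack, rest, trace = [], list(tokens), []
    for action in actions:
        if action == "shift":
            stack.append(rest.pop(0))
        else:
            head, body = action
            start = len(stack) - len(body)
            # The handle must be on top of the stack before it is reduced.
            if stack[start:] != body:
                raise ValueError(f"handle {body} is not on top of {stack}")
            del stack[start:]
            stack.append(head)
        trace.append((" ".join(stack), " ".join(rest)))
    return trace


ACTIONS = [
    "shift", "shift", ("A", ["b"]),            # a b     -> a A
    "shift", "shift", ("A", ["A", "b", "c"]),  # a A b c -> a A
    "shift", ("B", ["d"]),                     # a A d   -> a A B
    "shift", ("S", ["a", "A", "B", "e"]),      # a A B e -> S
]
for stack_text, input_text in shift_reduce("a b b c d e".split(), ACTIONS):
    print(f"{stack_text:<10} | {input_text}")
```

Read from bottom to top, the reductions trace out a rightmost derivation of the input.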