Chapter Three: Syntax Analysis

Uploaded by Fedasa Bote

Chapter Three

Syntax Analysis
Outline
 Introduction
 Context free grammar (CFG)
 Derivation
 Parse tree
 Ambiguity
 Left recursion
 Left factoring
 Top-down parsing
• Recursive Descent Parsing (RDP)
• Non-recursive predictive parsing
– First and follow sets
– Construction of a predictive parsing table

Outline
 LR(1) grammars
 Syntax error handling
 Error recovery in predictive parsing
 Panic mode error recovery strategy

 Bottom-up parsing (LR(k) parsing)

 Stack implementation of shift/reduce parsing
 Conflict during shift/reduce parsing
 LR parsers
 Constructing SLR parsing tables
 Canonical LR parsing
 LALR (Reading assignment)

 Yacc

Introduction
 Syntax: the way in which tokens are put together to form
expressions, statements, or blocks of statements.
 The rules governing the formation of statements in a programming
language.

 Syntax analysis: checks whether the sequence of tokens generated
by the lexical analyzer follows the grammatical rules of the
programming language.
 Parsing: the process of analyzing the grammatical structure of a
program's source code to determine its syntactic correctness and
to build a structured representation of it.
 The syntax of a programming language is usually given by the
rules of a context-free grammar (CFG).
Parser

[Figure: the syntax analyzer repeatedly requests the next token from the
lexical analyzer, which in turn reads the next character from the source
program. Both phases consult the symbol table (which contains a record for
each identifier) and report lexical and syntax errors, respectively. The
syntax analyzer produces a parse tree.]
Introduction…
 The syntax analyzer (parser) checks whether a given source
program satisfies the rules implied by a CFG or not.
 If it does, the parser creates the parse tree of that program.
 Otherwise, the parser reports error messages.

 A CFG:
 gives a precise syntactic specification of a programming
language.
 can be converted directly into a parser by tools such as yacc.
Introduction…
 Parsers can be categorized into two groups:
 Top-down parser
 The parse tree is created top to bottom, starting from the
root to leaves.
 Bottom-up parser
 The parse tree is created bottom to top, starting from the
leaves to root.
 Both top-down and bottom-up parsers scan the input from
left to right (one symbol at a time).
 Efficient top-down and bottom-up parsers can be
implemented only for restricted subclasses of context-free
grammars:
 LL grammars for top-down parsing
 LR grammars for bottom-up parsing
Context free grammar (CFG)
 A context-free grammar (CFG) is a specification for the
syntactic structure of a programming language.
 A context-free grammar is a 4-tuple:
G = (T, N, P, S) where
 T is a finite set of terminals (the set of tokens)
 N is a finite set of non-terminals (syntactic variables)
 P is a finite set of productions of the form
A → α, where A is a non-terminal and
α is a string of terminals and non-terminals (possibly the empty
string)
 S ∈ N is a designated start symbol (one of the non-terminal
symbols)
Example: grammar for simple arithmetic expressions

expression  expression + term
expression  expression - term
expression  term
term  term * factor
term  term / factor
term  factor
factor  ( expression )
factor  id

Terminal symbols: id + - * / ( )
Non-terminals: expression, term, factor
Start symbol: expression
Notational Conventions Used
 Terminals:
 Lowercase letters early in the alphabet, such as a, b, c.
 Operator symbols such as +, *, and so on.
 Punctuation symbols such as parentheses, comma, and so on.
 The digits 0,1,. . . ,9.
 Boldface strings such as id or if, each of which represents a
single terminal symbol.
 Non-terminals:
 Uppercase letters early in the alphabet, such as A, B, C.
 The letter S is usually the start symbol.
 Lowercase, italic names such as expr or stmt.
 Uppercase letters may be used to represent non-terminals for
the constructs:
• expr, term, and factor are represented by E, T, F
Notational Conventions Used…
 Grammar symbols
 Uppercase letters late in the alphabet, such as X, Y, Z, that is, either
non-terminals or terminals.
 Strings of terminals
 Lowercase letters late in the alphabet, mainly u, v, …, z ∈ T*
 Strings of grammar symbols
 Lowercase Greek letters, e.g. α, β, γ ∈ (N ∪ T)*
 A set of productions A  α1, A  α2, . . . , A  αk with a common head A
(call them A-productions) may be written
A  α1 | α2 | … | αk
with α1, α2, . . . , αk the alternatives for A.
 The head of the first production is the start symbol.

E  E + T | E - T | T
T  T * F | T / F | F
F  ( E ) | id
Derivation
 A derivation is a sequence of replacements of structure names
by choices on the right hand sides of grammar rules.

 Example: E → E + E | E – E | E * E | E / E | -E
E→(E)
E → id

E => E + E means that E + E is derived from E:
- we can replace E by E + E
- because we have the production rule E → E + E in our grammar.

E => E + E => id + E => id + id : such a sequence of replacements of
non-terminal symbols is called a derivation of id + id from E.
Derivation…
 In general, the one-step derivation is defined by
αAβ => αγβ if there is a production rule A → γ in our grammar,
where α and β are arbitrary strings of terminal and non-terminal
symbols.
α1 => α2 => … => αn (αn is derived from α1, or α1 derives αn)

 At each derivation step, we can choose any of the non-terminals
in the sentential form of G for the replacement.

 Transitive closure =>* (zero or more steps)
 Positive closure =>+ (one or more steps)
Derivation…
 If we always choose the left-most non-terminal in each
derivation step, this derivation is called left-most derivation.
Example: E=>-E=>-(E)=>-(E+E)=>-(id+E)=>-(id+id)
 If we always choose the right-most non-terminal in each
derivation step, this derivation is called right-most
derivation.
Example: E=>-E=>-(E)=>-(E+E)=>-(E+id)=>-(id+id)
 We will see that a top-down parser tries to find the left-most
derivation of the given source program.
 We will see that a bottom-up parser tries to find the right-most
derivation of the given source program, in reverse order.
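The replacement process described above is mechanical enough to sketch in code. The following is my own small illustration (not part of the slides): it replays a derivation by substituting at the leftmost or rightmost non-terminal, with the production bodies supplied as a list of symbol lists.

```python
def derive(start, bodies, nonterminals, leftmost=True):
    """Replay a derivation: each entry of `bodies` is the right-hand side
    used to replace the leftmost (or rightmost) non-terminal."""
    sentential = [start]
    forms = [" ".join(sentential)]
    for body in bodies:
        positions = [i for i, s in enumerate(sentential) if s in nonterminals]
        i = positions[0] if leftmost else positions[-1]
        sentential = sentential[:i] + list(body) + sentential[i + 1:]
        forms.append(" ".join(sentential))
    return forms

# Leftmost derivation of -(id+id):
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
steps = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
forms = derive("E", steps, {"E"})
```

With `leftmost=False` the same production script yields the rightmost derivation, which passes through `- ( E + id )` instead of `- ( id + E )`.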
Parse tree
 A parse tree is a graphical representation of a derivation.
 It filters out the order in which productions are applied to replace
non-terminals.

 A parse tree corresponding to a derivation is a labeled tree


in which:
• the interior nodes are labeled by non-terminals,
• the leaf nodes are labeled by terminals, and
• the children of each internal node represent the
replacement of the associated non-terminal in one step
of the derivation.

Parse tree and Derivation
Grammar: E  E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E => -E => -(E) => -(E + E) => -(id + id)

[Figure: the parse tree grows step by step, from a single E node, to
- E, to - ( E ), to - ( E + E ), and finally to - ( id + id ).]

This is a top-down derivation because we start building the parse tree
at the top (the root).
Exercise
a) Using the grammar below, draw a parse tree for the
following string:
( ( id . id ) id ( id ) ( ( ) ) )
S→E
E → id
|(E.E)
|(L)
|()
L→LE
|E
b) Give a rightmost derivation for the string given in (a).

Ambiguity
 A grammar that produces more than one parse tree for some
sentence is called an ambiguous grammar.
• It produces more than one leftmost derivation, or
• more than one rightmost derivation, for the same sentence.

 We should eliminate ambiguity in the grammar during the
design phase of the compiler.
 An unambiguous grammar should be written to eliminate
the ambiguity.
 E.g. ambiguous grammars (ambiguous because of their operators)
can be disambiguated according to precedence and associativity rules.
Ambiguity: Example
 Example: The arithmetic expression grammar

E → E + E | E * E | ( E ) | id

 permits two distinct leftmost derivations for the
sentence id + id * id:

(a) E => E + E            (b) E => E * E
      => id + E                 => E + E * E
      => id + E * E             => id + E * E
      => id + id * E            => id + id * E
      => id + id * id           => id + id * id
Ambiguity: example
E  E + E | E * E | ( E ) | - E | id
Construct parse trees for the expression: id + id * id

[Figure: two parse trees for id + id * id. In the first, + is at the
root with the * subtree as its right child: id + (id * id). In the
second, * is at the root with the + subtree as its left child:
(id + id) * id.]

Which parse tree is correct?
Ambiguity: example…
E  E + E | E * E | ( E ) | - E | id
Find a derivation for the expression: id + id * id

According to the grammar, both parse trees are correct.

A grammar that produces more than one parse tree for some input
sentence is said to be an ambiguous grammar.

[Figure: the same two parse trees for id + id * id, one grouping the
expression as id + (id * id), the other as (id + id) * id.]
Elimination of ambiguity
Precedence/Association
 These two derivations point out a problem with the grammar:
 The grammar has no notion of precedence, or implied order of
evaluation.
To add precedence:
 Create a non-terminal for each level of precedence
 Isolate the corresponding part of the grammar
 Force the parser to recognize high-precedence sub-expressions first

For algebraic expressions:
 Multiplication and division first (level one)
 Addition and subtraction next (level two)

To add association:
 Left-associative: the next-level (higher-precedence) non-terminal
appears at the end of the production.
Elimination of ambiguity
 To disambiguate the grammar:

E  E + E | E * E | ( E ) | id

 we can use the precedence of operators as follows:

* higher precedence (left associative)
+ lower precedence (left associative)

 We get the following unambiguous grammar:

E  E + T | T
T  T * F | F
F  ( E ) | id

(e.g., for the input id + id * id)
Left Recursion
Consider the grammar:
E  E + T | T
T  T * F | F
F  ( E ) | id

A top-down parser might loop forever when parsing an expression
using this grammar.

[Figure: the parser repeatedly expands E with E  E + T, growing the
tree E, E + T, E + T + T, … without ever consuming any input.]
Elimination of Left recursion
 A grammar is left recursive if it has a non-terminal A
such that there is a derivation
A =>+ Aα for some string α.
 Top-down parsing methods cannot handle left-recursive
grammars,
 so a transformation that eliminates left recursion is needed.

 To eliminate left recursion for a single production:
A  Aα | β can be replaced by the non-left-recursive
productions
A  β A’
A’  α A’ | ε
Elimination of Left recursion…
This left-recursive grammar:
E  E + T | T
T  T * F | F
F  ( E ) | id

can be re-written to eliminate the immediate left recursion:
E  TE’
E’  +TE’ | ε
T  FT’
T’  *FT’ | ε
F  ( E ) | id
Elimination of Left recursion…
 Generally, we can eliminate immediate left recursion from
them by the following technique.
 First we group the A-productions as:

A  Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

where no βi begins with A. Then we replace the A-productions by:
A  β1A’ | β2A’ | … | βnA’
A’  α1A’ | α2A’ | … | αmA’ | ε
Eliminating left-recursion algorithm

 Arrange the non-terminals in some order A1 .... An

for i from 1 to n do {
    for j from 1 to i-1 do {
        replace each production of the form Ai  Ajγ
        by the productions
        Ai  α1γ | .... | αkγ, where
        Aj  α1 | α2 | . . . | αk are all current Aj-productions
    }
    eliminate immediate left recursion among the Ai-productions
}
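The inner "eliminate immediate left recursion" step of this algorithm can be sketched as follows. This is my own sketch, not the slides' code: I represent a production as a list of symbols, use the string "eps" for ε, and name the fresh non-terminal by appending a prime.

```python
def eliminate_immediate(nt, productions):
    """Replace A -> Aa1 | ... | Aam | b1 | ... | bn  (no bi starts with A)
    by      A  -> b1 A' | ... | bn A'
            A' -> a1 A' | ... | am A' | eps."""
    recursive = [p[1:] for p in productions if p[0] == nt]   # the ai tails
    others = [p for p in productions if p[0] != nt]          # the bi bodies
    if not recursive:
        return {nt: productions}                             # nothing to do
    new = nt + "'"
    return {
        nt: [([] if b == ["eps"] else b) + [new] for b in others],
        new: [a + [new] for a in recursive] + [["eps"]],
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | eps
g = eliminate_immediate("E", [["E", "+", "T"], ["T"]])
```

Applied to the A-productions A  Ac | Aad | bd | ε from the example two slides below, it produces A  bdA’ | A’ and A’  cA’ | adA’ | ε, matching the result shown there.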
Example: Left Recursion Elimination
A → BC | a
B → CA | Ab
C → AB | CC | a

Choose the arrangement: A, B, C

i=1: nothing to do

i=2, j=1: substitute the A-productions into B → Ab:
    B → CA | BCb | ab
(imm) eliminate immediate left recursion among the B-productions:
    B → CAB’ | abB’
    B’ → CbB’ | ε

i=3, j=1: substitute the A-productions into C → AB:
    C → BCB | aB | CC | a

i=3, j=2: substitute the B-productions into C → BCB:
    C → CAB’CB | abB’CB | aB | CC | a
(imm) eliminate immediate left recursion among the C-productions:
    C → abB’CBC’ | aBC’ | aC’
    C’ → AB’CBC’ | CC’ | ε
Eliminating left-recursion (more)
 Example: Given: S  Aa | b
A  Ac | Sd | ε
 Substitute the S-productions into A  Sd to obtain the
following A-productions:
A  Ac | Aad | bd | ε
 Eliminating the immediate left recursion among the
A-productions yields the following grammar:

S  Aa | b
A  bdA’ | A’
A’  cA’ | adA’ | ε
Left factoring

 When a non-terminal has two or more productions whose
right-hand sides start with the same grammar symbols, the
grammar is not LL(1) and cannot be used for predictive
parsing.
 A predictive parser (a top-down parser without
backtracking) insists that the grammar be left-factored.
 In general: A  αβ1 | αβ2, where α is a non-empty common
prefix of the two right-hand sides.
Left factoring…
 When processing α we do not know whether to expand A
to αβ1 or to αβ2, but if we re-write the grammar as
follows:
A  αA’
A’  β1 | β2
then we can immediately expand A to αA’.
 Example: given the following grammar:
S  iEtS | iEtSeS | a
E  b
 Left factored, this grammar becomes:
S  iEtSS’ | a
S’  eS | ε
E  b
Left factoring…
The following grammar:
stmt  if expr then stmt else stmt
      | if expr then stmt

cannot be parsed by a predictive parser that looks one element ahead.
But the grammar can be re-written:
stmt  if expr then stmt stmt’
stmt’  else stmt | ε

where ε is the empty string.
Rewriting a grammar to eliminate multiple productions
starting with the same token is called left factoring.
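A one-step left-factoring transformation can be sketched in code. This is my own sketch under stated assumptions: productions are lists of symbols, "eps" stands for ε, and at most one group of alternatives shares a prefix (so a single primed name suffices).

```python
def common_prefix(prods):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*prods):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(nt, productions):
    """Factor A -> ab1 | ab2 | ... into A -> aA', A' -> b1 | b2 | ..."""
    groups = {}
    for p in productions:                       # group by first symbol
        groups.setdefault(p[0], []).append(p)
    new_alts, extra = [], {}
    for alts in groups.values():
        if len(alts) == 1:                      # nothing shared: keep as is
            new_alts.append(alts[0])
            continue
        prefix = common_prefix(alts)
        fresh = nt + "'"                        # fresh non-terminal name
        new_alts.append(prefix + [fresh])
        # an empty tail becomes the eps-alternative
        extra[fresh] = [p[len(prefix):] or ["eps"] for p in alts]
    return {nt: new_alts, **extra}

# S -> iEtS | iEtSeS | a  becomes  S -> iEtSS' | a,  S' -> eps | eS
g = left_factor("S", [["i", "E", "t", "S"],
                      ["i", "E", "t", "S", "e", "S"], ["a"]])
```

This reproduces the result on the previous slide: the shared prefix iEtS is pulled out and S’ carries the two tails ε and eS.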
Syntax analysis (Parsing)
 Every language has rules that prescribe the syntactic
structure of well-formed programs.
 The syntax can be described using Context Free Grammars
(CFG) notation.
 The use of CFGs has several advantages:
 helps in identifying ambiguities
 a grammar gives a precise yet easy to understand syntactic
specification of a programming language
 it is possible to have a tool which produces automatically a
parser using the grammar
 a properly designed grammar helps in modifying the parser
easily when the language changes
Top-down parsing
Recursive Descent Parsing (RDP)
 This method of top-down parsing can be considered as an
attempt to find the left most derivation for an input string.
 It may involve backtracking.

 To construct the parse tree using RDP:
 we create a one-node tree consisting of the start symbol S.
 two pointers, one for the tree and one for the input, indicate
where the parsing process is.
 initially, they are on S and the first input symbol, respectively.
 then we use the first S-production to expand the tree. The tree
pointer is positioned on the left-most symbol of the newly
created sub-tree.
Recursive Descent Parsing (RDP)…

 when the symbol pointed to by the tree pointer matches the
symbol pointed to by the input pointer, both pointers are moved
to the right.
 whenever the tree pointer points to a non-terminal, we
expand it using the first production of that non-terminal.
 whenever the pointers point to different terminals, the
production that was used is not correct, so another
production should be used: we backtrack to the step
just before we replaced the non-terminal and use another
production.
 if we reach the end of the input and the tree pointer passes the
last symbol of the tree, we have finished parsing.
RDP…
 Example: G: S  cAd
A  ab | a
 Draw the parse tree for the input string cad using
the above method.

 Exercise: Consider the following grammar:
S  A
A  A + A | B++
B  y
Draw the parse tree for the input “y+++y++”
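The backtracking behavior on the example grammar S  cAd, A  ab | a can be sketched directly as mutually recursive functions. This is a minimal illustration of my own (not the slides' tree/input-pointer algorithm): generators stand in for backtracking, yielding every input position at which a non-terminal can finish matching.

```python
def parse(inp):
    """Backtracking recursive descent for S -> cAd, A -> ab | a."""
    def S(i):
        # S -> c A d
        if i < len(inp) and inp[i] == "c":
            for j in A(i + 1):                 # try each way A can match
                if j < len(inp) and inp[j] == "d":
                    yield j + 1
    def A(i):
        # try A -> ab first; if the caller fails, backtrack to A -> a
        if inp[i:i + 2] == "ab":
            yield i + 2
        if inp[i:i + 1] == "a":
            yield i + 1
    return any(j == len(inp) for j in S(0))
```

For the input cad, the alternative A  ab does not match, so the parser falls back to A  a and then matches d; both `parse("cad")` and `parse("cabd")` succeed, while `parse("cd")` fails.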
Exercise
 Using the grammar below, draw a parse tree for the
following string using RDP algorithm:
( ( id . id ) id ( id ) ( ( ) ) )
S→E
E → id
|(E.E)
|(L)
|()
L→LE
|E

Non-recursive predictive parsing
 It is possible to build a non-recursive predictive parser by
explicitly maintaining a stack.
 This method uses a parsing table that determines the next
production to be applied.

[Figure: the predictive parsing program reads the INPUT
(id + id * id $), maintains a STACK (initially E $), consults the
PARSING TABLE, and writes the OUTPUT.]

PARSING TABLE:

NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E  TE’ |           |           | E  TE’ |        |
E’           |         | E’  +TE’ |           |         | E’  ε | E’  ε
T            | T  FT’ |           |           | T  FT’ |        |
T’           |         | T’  ε    | T’  *FT’ |         | T’  ε | T’  ε
F            | F  id  |           |           | F  (E) |        |
Non-recursive predictive parsing…
 The input buffer contains the string to be parsed followed
by $ (the right end marker)
 The stack contains a sequence of grammar symbols with $
at the bottom.
 Initially, the stack contains the start symbol of the grammar
followed by $.
 The parsing table is a two dimensional array M[A, a]
where A is a non-terminal of the grammar and a is a
terminal or $.
 The parser program behaves as follows.
 The program always considers
 X, the symbol on top of the stack and
 a, the current input symbol.
Predictive Parsing…
 There are three possibilities:
1. x = a = $ : the parser halts and announces a successful
completion of parsing
2. x = a ≠ $ : the parser pops x off the stack and advances
the input pointer to the next symbol
3. X is a non-terminal : the program consults entry M[X, a]
which can be an X-production or an error entry.
 If M[X, a] = {X  uvw}, X on top of the stack will be replaced
by uvw (u at the top of the stack).
 As an output, any code associated with the X-production can
be executed.
 If M[X, a] = error, the parser calls the error recovery method.
Predictive Parsing algorithm
(a denotes the input symbol pointed to by ip)

set ip to point to the first symbol of w;
set X to the top stack symbol;
while ( X ≠ $ ) { /* stack is not empty */
    if ( X = a ) pop the stack and advance ip;
    else if ( X is a terminal ) error();
    else if ( M[X, a] is an error entry ) error();
    else if ( M[X, a] = X  Y1Y2 … Yk ) {
        output the production X  Y1Y2 … Yk;
        pop the stack;
        push Yk, Yk-1, . . . , Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}
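This loop translates almost line-for-line into code. Below is a sketch under my own representation choices (not the slides' code): the table is a dict keyed by (non-terminal, lookahead), right-hand sides are symbol lists, and "eps" marks ε.

```python
def predictive_parse(table, nonterminals, start, tokens):
    """Table-driven predictive parser; returns the productions applied."""
    tokens = tokens + ["$"]
    stack = ["$", start]
    output, ip = [], 0
    while stack[-1] != "$":
        x, a = stack[-1], tokens[ip]
        if x not in nonterminals:
            if x == a:                          # matching terminal
                stack.pop(); ip += 1
            else:
                raise SyntaxError(f"expected {x!r}, saw {a!r}")
        elif (x, a) not in table:               # error entry
            raise SyntaxError(f"no rule in M[{x}, {a}]")
        else:
            rhs = table[(x, a)]
            output.append((x, rhs))
            stack.pop()
            for sym in reversed(rhs):           # push Y1...Yk, Y1 on top
                if sym != "eps":
                    stack.append(sym)
    if tokens[ip] != "$":
        raise SyntaxError("input left over")
    return output

TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): ["eps"], ("E'", "$"): ["eps"],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): ["eps"], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): ["eps"], ("T'", "$"): ["eps"],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
rules = predictive_parse(TABLE, {"E", "E'", "T", "T'", "F"}, "E",
                         ["id", "+", "id", "*", "id"])
```

On the input id + id * id the parser applies eleven productions, beginning with E  TE’ and ending with E’  ε, matching the simulation on the following slides.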
A Predictive Parser table

Grammar:
E  TE’
E’  +TE’ | ε
T  FT’
T’  *FT’ | ε
F  ( E ) | id

Parsing Table:

NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E  TE’ |           |           | E  TE’ |        |
E’           |         | E’  +TE’ |           |         | E’  ε | E’  ε
T            | T  FT’ |           |           | T  FT’ |        |
T’           |         | T’  ε    | T’  *FT’ |         | T’  ε | T’  ε
F            | F  id  |           |           | F  (E) |        |
Predictive Parsing Simulation

INPUT: id + id * id $

Starting with E $ on the stack, the parser repeatedly consults the
parsing table (stack shown bottom-to-top, top at the right):

STACK          INPUT            ACTION
$ E            id + id * id $   E  TE’
$ E’ T         id + id * id $   T  FT’
$ E’ T’ F      id + id * id $   F  id
$ E’ T’ id     id + id * id $   match id
$ E’ T’        + id * id $      T’  ε
$ E’           + id * id $      E’  +TE’
$ E’ T +       + id * id $      match +
$ E’ T         id * id $        T  FT’
$ E’ T’ F      id * id $        F  id
$ E’ T’ id     id * id $        match id
$ E’ T’        * id $           T’  *FT’
$ E’ T’ F *    * id $           match *
$ E’ T’ F      id $             F  id
$ E’ T’ id     id $             match id
$ E’ T’        $                T’  ε
$ E’           $                E’  ε
$              $                accept

When Top(Stack) = input = $, the parser halts and accepts the
input string.
Non-recursive predictive parsing…
 Example: G:
E  TR
R  +TR
R  -TR
R  ε
T  0 | 1 | … | 9

Input: 1+2

Parsing table M[X, a]:

X \ a | 0     | 1     | … | 9     | +      | -      | $
E     | ETR | ETR | … | ETR | Error  | Error  | Error
R     | Error | Error | … | Error | R+TR | R-TR | Rε
T     | T0  | T1  | … | T9  | Error  | Error  | Error

[Figure: stack trace of parsing the input 1+2 with this table.]
FIRST and FOLLOW
 The construction of both top-down and bottom-up parsers
are aided by two functions, FIRST and FOLLOW, associated
with a grammar G.

 During top-down parsing, FIRST and FOLLOW allow us to


choose which production to apply, based on the next input
symbol.

 During panic-mode error recovery, sets of tokens produced


by FOLLOW can be used as synchronizing tokens.

FIRST and FOLLOW
We need to build a FIRST set and a FOLLOW set
for each symbol in the grammar.

The elements of FIRST and FOLLOW are terminal symbols.

FIRST(α) is the set of terminal symbols that can begin any
string derived from α.

FOLLOW(A) is the set of terminal symbols that can follow A
in some sentential form:
t ∈ FOLLOW(A) ⟺ there exists a derivation S =>* αAtβ
Construction of a predictive parsing table

 Makes use of two functions: FIRST and FOLLOW.

FIRST
 FIRST(α) = the set of terminals that begin the strings
derived from α.
 If α =>* ε (in zero or more steps), ε is in FIRST(α).

 FIRST(X), where X is a grammar symbol, can be found
using the following rules:
1- If X is a terminal, then FIRST(X) = {X}
2- If X is a non-terminal: two cases
Construction of a predictive parsing table…
2- If X is a non-terminal: two cases…
a) If X  ε is a production, then add ε to FIRST(X)
b) For each production X  Y1Y2…Yk, place a in
FIRST(X) if for some i, a Є FIRST(Yi) and ε Є
FIRST(Yj) for all 1 ≤ j < i
If ε Є FIRST(Yj) for all j = 1, …, k, then ε Є FIRST(X)

For any string Y = X1X2…Xn:
a- Add all non-ε symbols of FIRST(X1) to FIRST(Y)
b- Add all non-ε symbols of FIRST(Xi), i ≠ 1, if for all
j < i, ε Є FIRST(Xj)
c- ε Є FIRST(Y) if ε Є FIRST(Xi) for all i
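These rules define a least fixed point, so code can simply iterate until nothing changes. A sketch under my own representation (not the slides' code): a grammar is a dict from non-terminal to lists of productions, each a list of symbols, with "eps" for ε; terminals are any symbols not in the dict.

```python
def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols (rules a, b, c above)."""
    result = set()
    for s in symbols:
        # a terminal's FIRST set is just itself
        result |= first.get(s, {s}) - {"eps"}
        if "eps" not in first.get(s, {s}):
            return result
    result.add("eps")                   # every symbol could derive eps
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    first["eps"] = {"eps"}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for p in prods:
                new = first_of_string(p, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

G = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], ["eps"]],
     "T": [["F", "T'"]], "T'": [["*", "F", "T'"], ["eps"]],
     "F": [["(", "E", ")"], ["id"]]}
first = first_sets(G)
```

For the expression grammar this reproduces the sets shown on the following slides: FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E’) = {+, ε}, FIRST(T’) = {*, ε}.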
Construction of a predictive parsing table…
FOLLOW
 FOLLOW(A) = the set of terminals that can appear
immediately to the right of A in some sentential form.

1- Place $ in FOLLOW(S), where S is the start symbol.

2- If there is a production B  αAβ, then everything in
FIRST(β) except ε should be added to FOLLOW(A).

3- If there is a production B  αA, or B  αAβ with ε Є
FIRST(β), then all elements of FOLLOW(B) should be
added to FOLLOW(A).
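Like FIRST, FOLLOW is a fixed point and can be computed by iteration. A sketch of my own (same dict representation as before; the FIRST sets are supplied precomputed, here hard-coded to the values from the surrounding slides):

```python
def follow_sets(grammar, first, start):
    """Apply rules 1-3 above to a fixed point.
    `first` maps each non-terminal to its FIRST set; 'eps' marks ε."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                      # rule 1
    changed = True
    while changed:
        changed = False
        for b, prods in grammar.items():
            for p in prods:
                for i, a in enumerate(p):
                    if a not in grammar:        # FOLLOW only for non-terminals
                        continue
                    beta = p[i + 1:]
                    add, nullable = set(), True
                    for s in beta:              # rule 2: FIRST(beta) - {eps}
                        add |= first.get(s, {s}) - {"eps"}
                        if "eps" not in first.get(s, {s}):
                            nullable = False
                            break
                    if nullable:                # rule 3: beta =>* eps (or empty)
                        add |= follow[b]
                    if add - follow[a]:
                        follow[a] |= add
                        changed = True
    return follow

G = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], ["eps"]],
     "T": [["F", "T'"]], "T'": [["*", "F", "T'"], ["eps"]],
     "F": [["(", "E", ")"], ["id"]]}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"}}
follow = follow_sets(G, FIRST, "E")
```

This reproduces the FOLLOW sets derived on the next slides: FOLLOW(E) = FOLLOW(E’) = {), $}, FOLLOW(T) = FOLLOW(T’) = {+, ), $}, FOLLOW(F) = {+, *, ), $}.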
Rules to Create FIRST

GRAMMAR:
E  TE’
E’  +TE’ | ε
T  FT’
T’  *FT’ | ε
F  ( E ) | id

FIRST rules:
1. If X is a terminal, FIRST(X) = {X}
2. If X  ε, then ε Є FIRST(X)
3. If X  Y1Y2 ••• Yk and Y1 ••• Yi-1 =>* ε
   and a Є FIRST(Yi), then a Є FIRST(X)

SETS:
FIRST(id) = {id}
FIRST(*) = {*}
FIRST(+) = {+}
FIRST(() = {(}
FIRST()) = {)}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = FIRST(F) = {(, id}
FIRST(E) = FIRST(T) = {(, id}
Rules to Create FOLLOW

GRAMMAR:
E  TE’
E’  +TE’ | ε
T  FT’
T’  *FT’ | ε
F  ( E ) | id

FIRST sets:
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}

FOLLOW rules (A and B are non-terminals, α and β are strings of
grammar symbols):
1. If S is the start symbol, then $ Є FOLLOW(S)
2. If A  αBβ, and a Є FIRST(β) and a ≠ ε, then a Є FOLLOW(B)
3. If A  αB, and a Є FOLLOW(A), then a Є FOLLOW(B)
3a. If A  αBβ and β =>* ε, and a Є FOLLOW(A), then a Є FOLLOW(B)

SETS:
FOLLOW(E) = {), $}
FOLLOW(E’) = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Exercise:
 Find FIRST and FOLLOW sets for the following
grammar G:
E  TR
R  +TR
R  -TR
R  ε
T  0 | 1 | … | 9

FIRST(E) = FIRST(T) = {0, 1, …, 9}
FIRST(R) = {+, -, ε}

FOLLOW(E) = {$}
FOLLOW(T) = {+, -, $}
FOLLOW(R) = {$}
Exercise…
 Consider the following grammar over the alphabet
{ g,h,i,b}
A  BCD
B  bB | ε
C  Cg | g | Ch | i
D  AB | ε
Fill in the table below with the FIRST and FOLLOW sets for
the non-terminals in this grammar:
FIRST FOLLOW
A
B
C
D

Construction of predictive parsing table
 Input: Grammar G
 Output: Parsing table M
 For each production of the form A  α of the
grammar do:
• For each terminal a in FIRST(α), add A  α to
M[A, a]
• If ε Є FIRST(α), add A  α to M[A, b] for each b in
FOLLOW(A)
• If ε Є FIRST(α) and $ Є FOLLOW(A), add A  α
to M[A, $]
• Make each undefined entry of M an error.
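These three rules can be applied mechanically once FIRST and FOLLOW are known. A sketch of my own (same representation as earlier: productions as symbol lists, "eps" for ε, FIRST/FOLLOW hard-coded to the slides' values); a duplicate entry signals that the grammar is not LL(1).

```python
def build_table(grammar, first, follow):
    """Fill M[A, a] by the three rules above; raises on a conflict."""
    table = {}
    for a_nt, prods in grammar.items():
        for p in prods:
            # compute FIRST(alpha) for the right-hand side alpha
            targets, nullable = set(), True
            for s in p:
                if s == "eps":
                    break
                targets |= first.get(s, {s}) - {"eps"}
                if "eps" not in first.get(s, {s}):
                    nullable = False
                    break
            if nullable:                 # rules 2 and 3 ($ is in FOLLOW)
                targets |= follow[a_nt]
            for t in targets:            # rule 1 (and 2/3 via targets)
                if (a_nt, t) in table:
                    raise ValueError(f"multiply-defined entry M[{a_nt}, {t}]")
                table[(a_nt, t)] = p
    return table

G = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], ["eps"]],
     "T": [["F", "T'"]], "T'": [["*", "F", "T'"], ["eps"]],
     "F": [["(", "E", ")"], ["id"]]}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
M = build_table(G, FIRST, FOLLOW)
```

The result has exactly the thirteen non-error entries shown in the table on the next slide.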
Rules to Build Parsing Table

GRAMMAR:
E  TE’
E’  +TE’ | ε
T  FT’
T’  *FT’ | ε
F  ( E ) | id

FIRST SETS:
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}

FOLLOW SETS:
FOLLOW(E) = {), $}
FOLLOW(E’) = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}

1. If A  α: if a Є FIRST(α), add A  α to M[A, a]
2. If A  α: if ε Є FIRST(α), add A  α to M[A, b]
   for each terminal b Є FOLLOW(A)
3. If A  α: if ε Є FIRST(α) and $ Є FOLLOW(A),
   add A  α to M[A, $]

PARSING TABLE:

NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E  TE’ |           |           | E  TE’ |        |
E’           |         | E’  +TE’ |           |         | E’  ε | E’  ε
T            | T  FT’ |           |           | T  FT’ |        |
T’           |         | T’  ε    | T’  *FT’ |         | T’  ε | T’  ε
F            | F  id  |           |           | F  (E) |        |
Example:
 Construct the predictive parsing table for the grammar G:
E  TR
R  +TR
R  -TR
R  ε
T  0 | 1 | … | 9

FIRST(E) = FIRST(T) = {0, 1, …, 9}
FIRST(R) = {+, -, ε}

FOLLOW(E) = {$}
FOLLOW(T) = {+, -, $}
FOLLOW(R) = {$}
Non-recursive predictive parsing…
Exercise 1:
Consider the following grammar G. Construct the
predictive parsing table and parse the input:
id + id * id

E  TE’
E’  +TE’ | ε
T  FT’
T’  *FT’ | ε
F  ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}

FOLLOW(E) = FOLLOW(E’) = {$, )}
FOLLOW(T) = FOLLOW(T’) = {+, $, )}
FOLLOW(F) = {*, +, $, )}

NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E  TE’ |           |           | E  TE’ |        |
E’           |         | E’  +TE’ |           |         | E’  ε | E’  ε
T            | T  FT’ |           |           | T  FT’ |        |
T’           |         | T’  ε    | T’  *FT’ |         | T’  ε | T’  ε
F            | F  id  |           |           | F  (E) |        |
Non-recursive predictive parsing…
Exercise 2:
Let G be the following grammar:
S  [ SX ] | a
X  ε | +SY | Yb
Y  ε | -SXc
A – Find FIRST and FOLLOW sets for the non-terminals
in this grammar.
B – Construct predictive parsing table for the grammar
above.
C – Show a top down parse of the string [a+a-ac]

LL(k) Parser
This parser scans the input from Left to right and constructs
a Leftmost derivation. It looks ahead 1 symbol to choose its
next action; therefore, it is known as an LL(1) parser.

An LL(k) parser looks k symbols ahead to decide its action.

• A grammar whose parsing table has no multiply-defined
entries is called an LL(1) grammar.
• If G is left recursive, ambiguous, or not left-factored, then M
will have at least one multiply-defined entry.
LL(1) Grammars
Determining LL(1) grammars:
 A grammar is LL(1) iff for each pair of A-productions
A  α | β
 For no terminal a can both α and β derive strings beginning
with a
 At most one of α and β can derive the empty string (ε)
 If β =>* ε, then α does not derive any string beginning
with a terminal in FOLLOW(A)

Equivalently:
1. FIRST(α) ∩ FIRST(β) = ∅ (disjoint sets)
2. If β =>* ε then
2.a α does not derive ε
2.b FIRST(α) ∩ FOLLOW(A) = ∅ (disjoint sets)
Non-LL(1) Grammars: Examples

Grammar              Not LL(1), because:

S → Sa | a           Left recursive

S → aS | a           Not left factored:
                     FIRST(aS) ∩ FIRST(a) ≠ ∅

S → aR | ε           For R: both alternatives
R → S | ε            derive ε (S ⇒* ε)

S → aRa              For R:
R → S | ε            FIRST(S) ∩ FOLLOW(R) ≠ ∅
78
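The LL(1) conditions can be checked mechanically for each pair of alternatives. A sketch, with FIRST and FOLLOW supplied as plain sets (`"eps"` stands for ε; the example sets below correspond to the grammars just shown):

```python
# Sketch of the LL(1) conditions for one pair of alternatives
# A -> alpha | beta, given FIRST(alpha), FIRST(beta) and FOLLOW(A)
# as plain sets; "eps" stands for the empty string.
EPS = "eps"

def ll1_pair_ok(first_alpha, first_beta, follow_A):
    # condition 1: FIRST(alpha) and FIRST(beta) must be disjoint
    if (first_alpha - {EPS}) & (first_beta - {EPS}):
        return False
    # at most one alternative may derive eps
    if EPS in first_alpha and EPS in first_beta:
        return False
    # condition 2: if one side derives eps, the other side's FIRST
    # must not intersect FOLLOW(A)
    if EPS in first_beta and first_alpha & follow_A:
        return False
    if EPS in first_alpha and first_beta & follow_A:
        return False
    return True

# S -> aS | a : FIRST sets collide
bad1 = ll1_pair_ok({"a"}, {"a"}, {"$"})
# S' -> eS | eps with FOLLOW(S') = {e,$} : FIRST(eS) meets FOLLOW(S')
bad2 = ll1_pair_ok({"e"}, {EPS}, {"e", "$"})
# E' -> +TE' | eps with FOLLOW(E') = {),$} : fine
good = ll1_pair_ok({"+"}, {EPS}, {")", "$"})
```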
LL(1) Grammars…
 Exercise: Consider the following grammar G:
A' → A
A → xA | yA | y
a) Find FIRST and FOLLOW sets for G.
b) Construct the LL(1) parse table for this grammar.
c) Explain why this grammar is not LL(1).
d) Transform the grammar into a grammar that is
LL(1).
e) Give the parse table for the grammar created in
(d).
79
A’A
AxA | yA | y x y $
A’ A’A A’A
A AxA AyA
FIRST(A)=FIRST(A’)={x,y}
Ay
FOLLOW(A)=FOLLOW(A’)={$}

Now G is LL(1)
Not LL(1): Multiply
x y $ defined entry in [A,y]
A’ A’A A’A
A AxA AyA’’
A’’ A’’A A’’A A’’ε Left factor

FIRST(A’)=FIRST(A)={x,y} A’A
FIRST(A’’)={x,y,ε} AxA | yA’’
FOLLOW(A)=FOLLOW(A’)=FOLLOW(A’’)={$} A’’A | ε
80
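The left-factoring step used above can be sketched as code. Assumptions (ours, not the slides'): alternatives are lists of symbols, [] encodes ε, the fresh nonterminal's name is supplied by the caller, and only one group of conflicting alternatives is handled per call, which suffices for this example.

```python
# Sketch of one round of left factoring: group the alternatives of a
# nonterminal by their first symbol; any group of two or more gets
# that symbol pulled out into a fresh nonterminal.
from collections import defaultdict

def left_factor(bodies, fresh_name):
    groups = defaultdict(list)
    for body in bodies:
        groups[body[0] if body else ""].append(body)  # group by 1st symbol
    new_bodies, new_rules = [], {}
    for head, grp in sorted(groups.items()):
        if head == "" or len(grp) == 1:
            new_bodies.extend(grp)          # nothing to factor out
        else:
            new_bodies.append([head, fresh_name])
            # the factored-out tails; [] here means an eps-alternative
            new_rules[fresh_name] = [body[1:] for body in grp]
    return new_bodies, new_rules

# A -> xA | yA | y  becomes  A -> xA | yA'',  A'' -> A | eps
new_A, extra = left_factor([["x", "A"], ["y", "A"], ["y"]], "A''")
```

This reproduces exactly the transformation shown on the slide.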
LL(1) Grammar: Exercise
 Given G:
S → iEtSS' | a        FIRST(S) = {i,a}
S' → eS | ε           FIRST(E) = {b}
E → b                 FIRST(S') = {e,ε}
                      FOLLOW(S) = FOLLOW(S') = {$,e}
                      FOLLOW(E) = {t}

Is this grammar LL(1)?

No: multiply-defined table entry in M[S', e]

        a       b       e          i             t     $
S       S → a                      S → iEtSS'
S'                      S' → eS                        S' → ε
                        S' → ε
E               E → b

Input string: ibtaea
81
Exercises

1. Given the following grammar:
S → WAB | ABCS
A → B | WB
B → ε | yB
C → z
W → x
a) Find FIRST and FOLLOW sets of the grammar.
b) Construct the LL(1) parse table.
c) Is the grammar LL(1)? Justify your answer.
82
Exercises…

2. Consider the following grammar:
S → ScB | B
B → e | efg | efCg
C → SdC | S

a) Justify whether the grammar is LL(1) or not.
b) If not, transform the grammar into LL(1).
c) Construct the predictive parsing table for the above
grammar.
83
Exercises…
3. Given the following grammar:
program → procedure STMT-LIST
STMT-LIST → STMT STMT-LIST | STMT
STMT → do VAR = CONST to CONST begin STMT-LIST end
     | ASSN-STMT
Show the parse tree for the following code fragment:
procedure
do i=1 to 100 begin
ASSN-STMT
ASSN-STMT
end
ASSN-STMT
84
Exercises…

4. Consider the grammar:
E → BA
A → &BA | ε
B → TRUE | FALSE

note: &, TRUE, FALSE are terminals

A- Construct the LL(1) parse table for this grammar
B- Parse the following input string: TRUE &FALSE &TRUE
85
Syntax error handling
 Common programming errors can occur at many different
levels:
 Lexical errors include misspellings of identifiers, keywords, or
operators: e.g., ebigin instead of begin
 Syntactic errors include misplaced semicolons, extra or
missing braces { }, a case without an enclosing switch, …
 Semantic errors include type mismatches between operators
and operands: e.g., a return statement in a Java method with result
type void, or an operator applied to an incompatible operand
 Logical errors can be anything resulting from incorrect reasoning,
e.g., the assignment operator = used instead of the comparison operator ==
86
Syntax error handling…
 The error handler should be written with the
following goals in mind:

• Errors should be reported clearly and accurately
• The compiler should recover efficiently and detect
subsequent errors
• It should not slow down the whole process
significantly
• It should report the place of the error
• It should also report the type of the error
87
Syntax error handling…
 There are four main strategies in error handling:

 Panic mode error recovery: discard tokens until a
synchronizing token is found.
 Phrase-level recovery: the parser makes a local correction so
that it can continue to parse the rest of the input.
• Replace a comma by a semicolon, delete or insert a semicolon, …

 Error productions: augment the grammar to capture the most
common errors that programmers make.
 Global correction: attempt to achieve the most accurate and
minimal repair of syntactic errors by considering the entire
program's context.
• It seeks to make the fewest possible changes to the erroneous input to
transform it into a valid, syntactically correct program.
88
Error recovery in predictive parsing
 An error is detected in predictive parsing:
 When the terminal on top of the stack does not match
the next input symbol, or
 When there is a non-terminal A on top of the stack, a is
the next input symbol, and M[A, a] = error.
 Panic mode error recovery method:
 Synchronizing tokens and scanning
89
Panic mode error recovery strategy
 The primary error situation occurs with
 a non-terminal A on top of the stack, and
 a current input token that is not in FIRST(A) (or not in
FOLLOW(A), if ε ∈ FIRST(A))

Solution
 Build the set of synchronizing tokens directly into the
LL(1) parsing table.
Possible alternatives
1. Pop A from the stack
2. Successively pop tokens from the input until a token is
seen for which we can restart the parse.
90
Panic mode error recovery…
 Choose alternative 1 (synch) – if the current input token is $ or is in
FOLLOW(A)
 Choose alternative 2 (scan) – if the current input token is not $ and is not
in FIRST(A) ∪ FOLLOW(A)
 Example: Using FOLLOW and FIRST symbols as synchronizing
tokens, the parse table for grammar G:

E → TE'            FIRST(E)=FIRST(T)=FIRST(F)={(,id}
E' → +TE' | ε      FIRST(E')={+,ε}
T → FT'            FIRST(T')={*,ε}
T' → *FT' | ε      FOLLOW(E)=FOLLOW(E')={$,)}
F → ( E ) | id     FOLLOW(T)=FOLLOW(T')={+,$,)}
                   FOLLOW(F)={*,+,$,)}

Parse: +id*+id

NON-       INPUT SYMBOL
TERMINAL   id        +          *          (         )        $
E          E → TE'   scan       scan       E → TE'   synch    synch
E'         scan      E' → +TE'  scan       scan      E' → ε   E' → ε
T          T → FT'   synch      scan       T → FT'   synch    synch
T'         scan      T' → ε     T' → *FT'  scan      T' → ε   T' → ε
F          F → id    synch      synch      F → (E)   synch    synch
91
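A sketch of this recovery scheme on top of a table-driven parser for the same expression grammar: blank table cells are interpreted on the fly as synch or scan using FOLLOW, following the two alternatives above. The dict layout and the error-message strings are our own choices.

```python
# Sketch of panic-mode recovery in a table-driven predictive parser.
# On a blank cell: pop the nonterminal (synch) when the lookahead is
# $ or in FOLLOW(A); skip the input token (scan) otherwise.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
NONTERMINALS = set(FOLLOW)

def parse_with_recovery(tokens):
    """Parse, collecting error-recovery actions instead of aborting."""
    stack, i, errors = ["$", "E"], 0, []
    tokens = tokens + ["$"]
    while stack:
        top, tok = stack[-1], tokens[i]
        if top in NONTERMINALS:
            body = TABLE.get((top, tok))
            if body is not None:
                stack.pop()
                stack.extend(reversed(body))
            elif tok == "$" or tok in FOLLOW[top]:
                errors.append(f"synch: pop {top} on '{tok}'")
                stack.pop()            # alternative 1: pop A
            else:
                errors.append(f"scan: skip '{tok}'")
                i += 1                 # alternative 2: skip input token
        elif top == tok:
            stack.pop()
            i += 1                     # matched terminal (or $)
        else:
            errors.append(f"pop mismatched terminal {top}")
            stack.pop()
    return errors

errors = parse_with_recovery(["+", "id", "*", "+", "id"])
```

On the erroneous input +id*+id this reports two recovery actions: the leading + is scanned past, and F is popped (synch) at the second +, after which parsing completes.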
Bottom-Up and Top-Down Parsers
Top-down parsers:
• Start constructing the parse tree at the top (root) of the
tree and move down towards the leaves.
• Easy to implement by hand, but work only with restricted
grammars.
example: predictive parsers

Bottom-up parsers:
• Build the nodes on the bottom of the parse tree first.
• Suitable for automatic parser generation; handle a larger
class of grammars.
examples: shift-reduce parsers (or LR(k) parsers)
92
Bottom-Up Parser
A bottom-up parser, or shift-reduce parser, begins
at the leaves and works up to the top of the tree.

The reduction steps trace a rightmost derivation
in reverse.

                        S → aABe
Consider the grammar:   A → Abc | b
                        B → d

We want to parse the input string abbcde.
93
Bottom-Up Parser: Simulation

INPUT: a b b c d e $    OUTPUT:

Productions:
S → aABe
A → Abc                 Bottom-Up Parsing
A → b                   Program
B → d
94
Bottom-Up Parser: Simulation

INPUT: a b b c d e $    OUTPUT:  A
                                 |
Productions:                     b
S → aABe
A → Abc                 Bottom-Up Parsing
A → b                   Program
B → d
95
Bottom-Up Parser: Simulation

INPUT: a A b c d e $    OUTPUT:  A
                                 |
Productions:                     b
S → aABe
A → Abc                 Bottom-Up Parsing
A → b                   Program
B → d
96
Bottom-Up Parser: Simulation

INPUT: a A b c d e $    OUTPUT:  A
                                 |
Productions:                     b
S → aABe
A → Abc                 Bottom-Up Parsing
A → b                   Program
B → d

We are not reducing here in this example.
A parser would reduce, get stuck, and then backtrack!
97
Bottom-Up Parser: Simulation

INPUT: a A b c d e $    OUTPUT:    A
                                 / | \
Productions:                    A  b  c
S → aABe                        |
A → Abc                         b
A → b                   Bottom-Up Parsing
B → d                   Program
98
Bottom-Up Parser: Simulation

INPUT: a A d e $        OUTPUT:    A
                                 / | \
Productions:                    A  b  c
S → aABe                        |
A → Abc                         b
A → b                   Bottom-Up Parsing
B → d                   Program
99
Bottom-Up Parser: Simulation

INPUT: a A d e $        OUTPUT:    A        B
                                 / | \     |
Productions:                    A  b  c    d
S → aABe                        |
A → Abc                         b
A → b                   Bottom-Up Parsing
B → d                   Program
100
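The backtracking behaviour hinted at in the simulation can be made concrete with an exhaustive shift-reduce search. This is only a toy sketch for this grammar and input; real LR parsers avoid the search entirely by precomputing parser states.

```python
# Toy sketch: exhaustive shift-reduce parsing with backtracking for
# S -> aABe, A -> Abc | b, B -> d. At each step, first try every
# reduction of a stack suffix, then a shift; on a dead end, return
# None so the caller backtracks.
GRAMMAR = [("S", ["a", "A", "B", "e"]),
           ("A", ["A", "b", "c"]),
           ("A", ["b"]),
           ("B", ["d"])]

def parse(stack, rest, trace):
    if stack == ["S"] and not rest:
        return trace                      # accepted: stack reduced to S
    for head, body in GRAMMAR:            # try reducing a handle
        n = len(body)
        if n <= len(stack) and stack[-n:] == body:
            result = parse(stack[:-n] + [head], rest,
                           trace + [f"reduce {head} -> {' '.join(body)}"])
            if result is not None:
                return result
    if rest:                              # otherwise shift next token
        return parse(stack + [rest[0]], rest[1:],
                     trace + [f"shift {rest[0]}"])
    return None                           # dead end: backtrack

steps = parse([], list("abbcde"), [])
```

The successful trace reduces the second b only after shifting c (via A → Abc), exactly the situation flagged in the simulation: the greedy early reduction A → b of the second b is tried first, fails, and is undone by backtracking.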