
Lec02-Syntax Analysis and LL

Uploaded by

Jojo Lannister
Copyright © All Rights Reserved

SYNTAX ANALYSIS

2ND PHASE OF COMPILER CONSTRUCTION
SECTION 2.1: CONTEXT-FREE GRAMMAR
SYNTAX ANALYZER

• The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar.
  - If it does, the parser creates the parse tree of that program.
  - Otherwise, the parser reports error messages.
• It creates the syntactic structure of the given source program.
  - This syntactic structure is usually a parse tree.
• The syntax analyzer is also known as the parser.
• The syntax of a programming language is described by a context-free grammar (CFG).
• A context-free grammar gives a precise syntactic specification of a programming language.
  - Designing the grammar is an initial phase of the design of a compiler.
  - A grammar can be converted directly into a parser by some tools (parser generators).
PARSER
• The parser works on a stream of tokens.
• The smallest item it handles is a token.

source program → Lexical Analyzer → token → Parser → parse tree
(the parser requests each token from the lexical analyzer: "get next token")
PARSERS (CONT.)
• We categorize parsers into two groups:
  1. Top-Down Parser
     - The parse tree is built from the root to the leaves (top to bottom).
     - The input is scanned from left to right, one symbol at a time.
  2. Bottom-Up Parser
     - The parse tree is built from the leaves up to the root.
     - The input is scanned from left to right, one symbol at a time.
• Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars:
  - LL for top-down parsing
  - LR for bottom-up parsing
WHY DO WE NEED A GRAMMAR?

• A grammar defines a language.
• Some rules must be followed to express or define a language.
• These rules are laid down in the form of production rules (P).
• A context-free grammar (CFG) generates a language called a context-free language (L).
CONTEXT-FREE GRAMMARS (CFG)

A CFG G consists of four components (T, V, S, P):
• T: a finite set of terminals
• V: a finite set of non-terminals (also denoted by N)
• S: the start symbol (the non-terminal with which the grammar starts)
• P: a finite set of production rules
CONTEXT-FREE GRAMMARS (CFG)

Consider the grammar:
S → aAa | b
A → a

G = (T, V, S, P), where
T = {a, b}
V = {S, A}
start symbol: S
P = {S → aAa, S → b, A → a}
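The four components can be written down directly as data. A minimal Python sketch, assuming productions are stored as (head, body) pairs with each body a tuple of symbols; the `Grammar` class and its `validate` method are illustrative names, not part of any library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """G = (T, V, S, P) for a context-free grammar (illustrative sketch)."""
    terminals: frozenset       # T
    nonterminals: frozenset    # V
    start: str                 # S
    productions: tuple         # P: (head, body) pairs; body is a tuple of symbols

    def validate(self) -> None:
        """Raise ValueError if the basic CFG conditions are violated."""
        if self.terminals & self.nonterminals:
            raise ValueError("T and V must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for head, body in self.productions:
            # Context-free: the left side of every production is one non-terminal.
            if head not in self.nonterminals:
                raise ValueError(f"head {head!r} is not in V")
            for sym in body:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r}")


# S → aAa | b ;  A → a
G = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S", "A"}),
    start="S",
    productions=(("S", ("a", "A", "a")), ("S", ("b",)), ("A", ("a",))),
)
G.validate()
print(len(G.productions))  # 3
```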
TERMINAL SYMBOLS

Terminals include:
• lowercase letters early in the alphabet, such as a, b, c
• operator symbols such as +, %
• punctuation symbols such as ( ) , ;
• digits 0, 1, 2, …
• boldface strings such as id or if

Consider the grammar:
S → aAa
S → b;c
A → aA | ε

Here the terminal symbols are {a, b, c, ;}. (ε denotes the empty string; it is not a terminal symbol.)
NON-TERMINAL SYMBOLS

Non-terminals include:
• uppercase letters early in the alphabet, such as A, B, C
• the letter S, usually the start symbol
• lowercase italic names such as expr or stmt

Consider the grammar:
S → aAa
S → bB
A → aA | ε
B → b

Here the non-terminal symbols are {A, B, S}.
PRODUCTION RULES

Production rules are the set of rules that define the grammar G.

Consider the grammar:
S → aAa
A → aA | a

Here we have three production rules:
i. S → aAa
ii. A → aA
iii. A → a
DERIVATION OF A STRING

A string w of terminals is generated by the grammar if, starting with the start symbol, one can apply productions and end up with w.
A sequence of replacements of non-terminal symbols (or the sequence of strings so obtained) is a derivation of w.

Consider the grammar:
S → aAa
A → aA | a

We can derive the sentence aaa from this grammar:
S ⇒ aAa (S → aAa)
  ⇒ aaa (A → a)
DERIVATION OF A STRING

In general, a derivation step is:
αAβ ⇒ αγβ if there is a production rule A → γ in the grammar,
where α and β are arbitrary strings of terminal and non-terminal symbols.

α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)

⇒  : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
DERIVATION OF A STRING

Consider the grammar:
S → aSa | b | aA
A → a

Derived in one step: S ⇒ b
Derived in two steps: S ⇒ aSa ⇒ aba
Derived in multiple steps: S ⇒ aSa ⇒ aaSaa ⇒ aaaSaaa ⇒ aaabaaa
SENTENCE AND SENTENTIAL FORM

• A sentence of L(G) is a string of terminal symbols only.
• A sentential form is a string of terminals and non-terminals derived from the start symbol.

Suppose S ⇒* α:
• If α contains non-terminals, it is called a sentential form of G.
• If α does not contain non-terminals, it is called a sentence of G (a sentence is thus a sentential form that contains only terminals).
LEFT-MOST AND RIGHT-MOST DERIVATIONS

We can derive a string in two ways:
• Left-Most Derivation (LMD)
• Right-Most Derivation (RMD)

In a left-most derivation, each step replaces the leftmost non-terminal of the sentential form, until all non-terminals have been converted into terminals and the string w is obtained.

In a right-most derivation, each step replaces the rightmost non-terminal of the sentential form, until all non-terminals have been converted into terminals and the string w is obtained.
LEFT-MOST DERIVATIONS

Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id
Derive the string id+id*id:

E ⇒ E+E (E → E+E)
  ⇒ id+E (E → id)
  ⇒ id+E*E (E → E*E)
  ⇒ id+id*E (E → id)
  ⇒ id+id*id (E → id)

Another leftmost derivation of the same string:
E ⇒ E*E (E → E*E)
  ⇒ E+E*E (E → E+E)
  ⇒ id+E*E (E → id)
  ⇒ id+id*E (E → id)
  ⇒ id+id*id (E → id)
PARSE TREE FOR LEFT-MOST DERIVATIONS

Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id
Derive the string id+id*id:

E ⇒ E+E (E → E+E)
  ⇒ id+E (E → id)
  ⇒ id+E*E (E → E*E)
  ⇒ id+id*E (E → id)
  ⇒ id+id*id (E → id)

Parse tree, written as node(children): E( E(id) + E( E(id) * E(id) ) )
RIGHT-MOST DERIVATIONS

Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id
Derive the string id+id*id:

E ⇒ E*E (E → E*E)
  ⇒ E*id (E → id)
  ⇒ E+E*id (E → E+E)
  ⇒ E+id*id (E → id)
  ⇒ id+id*id (E → id)

Another rightmost derivation of the same string:
E ⇒ E+E (E → E+E)
  ⇒ E+E*E (E → E*E)
  ⇒ E+E*id (E → id)
  ⇒ E+id*id (E → id)
  ⇒ id+id*id (E → id)
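The only difference between the two derivation orders is which non-terminal each step rewrites. A small Python sketch (the `derive` helper is hypothetical, written for illustration) replays a list of productions on the leftmost or rightmost non-terminal and returns every sentential form:

```python
def derive(start, steps, nonterminals, leftmost=True):
    """Apply each (head, body) production in `steps` to the leftmost (or
    rightmost) non-terminal and return the list of sentential forms."""
    form = [start]
    forms = ["".join(form)]
    for head, body in steps:
        positions = [k for k, sym in enumerate(form) if sym in nonterminals]
        if not positions:
            raise ValueError("sentential form has no non-terminal left")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"cannot expand {head}: the chosen non-terminal is {form[i]}")
        form = form[:i] + list(body) + form[i + 1:]   # one step: αAβ ⇒ αγβ
        forms.append("".join(form))
    return forms


# E → E+E | E*E | id, deriving id+id*id (first derivation on each slide)
plus, times, ident = ("E", ["E", "+", "E"]), ("E", ["E", "*", "E"]), ("E", ["id"])
print(" ⇒ ".join(derive("E", [plus, ident, times, ident, ident], {"E"})))
# E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
print(" ⇒ ".join(derive("E", [times, ident, plus, ident, ident], {"E"}, leftmost=False)))
# E ⇒ E*E ⇒ E*id ⇒ E+E*id ⇒ E+id*id ⇒ id+id*id
```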
RIGHT-MOST DERIVATIONS

Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id
Derive the string id+id*id:

E ⇒ E*E (E → E*E)
  ⇒ E*id (E → id)
  ⇒ E+E*id (E → E+E)
  ⇒ E+id*id (E → id)
  ⇒ id+id*id (E → id)

Parse tree, written as node(children): E( E( E(id) + E(id) ) * E(id) )
SECTION 2.2: AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
A grammar is ambiguous if some sentence has more than one leftmost derivation or more than one rightmost derivation, i.e. it can be derived in more than one way by LMD or by RMD.

Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id
Derive the string id+id*id:

First leftmost derivation:
E ⇒ E+E (E → E+E)
  ⇒ id+E (E → id)
  ⇒ id+E*E (E → E*E)
  ⇒ id+id*E (E → id)
  ⇒ id+id*id (E → id)

Second leftmost derivation:
E ⇒ E*E (E → E*E)
  ⇒ E+E*E (E → E+E)
  ⇒ id+E*E (E → id)
  ⇒ id+id*E (E → id)
  ⇒ id+id*id (E → id)

There is more than one leftmost derivation, so the grammar is ambiguous.
AMBIGUOUS GRAMMAR
A grammar is ambiguous if some sentence has more than one leftmost derivation or more than one rightmost derivation.

Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id
Derive the string id+id*id:

First rightmost derivation:
E ⇒ E*E (E → E*E)
  ⇒ E*id (E → id)
  ⇒ E+E*id (E → E+E)
  ⇒ E+id*id (E → id)
  ⇒ id+id*id (E → id)

Second rightmost derivation:
E ⇒ E+E (E → E+E)
  ⇒ E+E*E (E → E*E)
  ⇒ E+E*id (E → id)
  ⇒ E+id*id (E → id)
  ⇒ id+id*id (E → id)

There is more than one rightmost derivation, so the grammar is ambiguous.
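Ambiguity can be observed mechanically on small inputs by counting parse trees. A Python sketch, assuming the input is already split into tokens; `count_parse_trees` is an illustrative helper (a memoized recursion over sub-ranges of the token list, not a parser from these slides) that tries every operator position as the root of E → E op E:

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under the ambiguous grammar
    E → E+E | E-E | E*E | E/E | (E) | id."""
    ops = {"+", "-", "*", "/"}

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E can derive tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                   # E → id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E → (E)
        for k in range(i + 1, j - 1):                    # E → E op E with op at k
            if tokens[k] in ops:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(("id", "+", "id", "*", "id")))   # 2
```

A count greater than 1 for some sentence is exactly the ambiguity shown by the two derivations above.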
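AMBIGUITY (CONT.)
stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmts

The sentence
if E1 then if E2 then S1 else S2
has two parse trees (written as node(children)):

1. stmt( if expr(E1) then stmt( if expr(E2) then stmt(S1) ) else stmt(S2) )
   (else matches the outer if)
2. stmt( if expr(E1) then stmt( if expr(E2) then stmt(S1) else stmt(S2) ) )
   (else matches the inner, closest if)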
AMBIGUITY (CONT.)

• We prefer the second parse tree (else matches the closest if).
• So we have to disambiguate the grammar to reflect this choice.
• The unambiguous grammar is:

stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt
              | if expr then matchedstmt else unmatchedstmt
SECTION 2.3: LEFT RECURSION AND LEFT FACTORING
LEFT RECURSION
• A grammar is left recursive if it has a non-terminal A such that there is a derivation
  A ⇒+ Aα for some string α.
• Top-down parsing techniques cannot handle left-recursive grammars: when expanding A, the parser may expand A again without consuming any input, and loop forever.
• So we have to convert a left-recursive grammar into an equivalent grammar that is not left recursive.
• Left recursion may appear in a single step of the derivation (immediate left recursion), or it may appear in more than one step of the derivation.
IMMEDIATE LEFT-RECURSION

A → Aα | β, where β does not start with A

Eliminating the immediate left recursion gives an equivalent grammar:
A → βA'
A' → αA' | ε

In general,
A → Aα1 | ... | Aαm | β1 | ... | βn, where β1 ... βn do not start with A

Eliminating the immediate left recursion gives an equivalent grammar:
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε
REMOVING IMMEDIATE LEFT-RECURSION

Consider the grammar:
E → E+T | T      (immediate left recursion)
T → T*F | F      (immediate left recursion)
F → id | (E)     (no immediate left recursion)

E → E+T | T matches A → Aα | β, where A is E, α is +T and β is T.
Applying the rule (A → βA', A' → αA' | ε) we get:
E → TE'
E' → +TE' | ε

T → T*F | F matches A → Aα | β, where A is T, α is *F and β is F.
Applying the rule we get:
T → FT'
T' → *FT' | ε

Final output:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → id | (E)
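The immediate-left-recursion rule translates almost line for line into code. A minimal Python sketch, assuming a non-terminal's alternatives are given as tuples of symbols, with () standing for ε and the new non-terminal named by appending a prime (`eliminate_immediate` and `show` are illustrative names):

```python
def eliminate_immediate(head, bodies):
    """Remove immediate left recursion from the alternatives of `head`.

    bodies: list of tuples of symbols, with () standing for ε.
    Returns {non-terminal: list of bodies} for `head` and, if needed, head'.
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # A → Aα1 | ... | Aαm
    betas = [b for b in bodies if not b or b[0] != head]     # A → β1 | ... | βn
    if not alphas:
        return {head: list(bodies)}                          # nothing to eliminate
    new = head + "'"                                         # the new non-terminal A'
    return {
        head: [beta + (new,) for beta in betas],             # A  → β1A' | ... | βnA'
        new: [alpha + (new,) for alpha in alphas] + [()],    # A' → α1A' | ... | αmA' | ε
    }


def show(rules):
    for nt, bodies in rules.items():
        print(nt, "→", " | ".join("".join(b) or "ε" for b in bodies))


show(eliminate_immediate("E", [("E", "+", "T"), ("T",)]))   # input: E → E+T | T
show(eliminate_immediate("T", [("T", "*", "F"), ("F",)]))   # input: T → T*F | F
```

This prints the same four productions as the final output above (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε).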
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE

Consider the grammar:
S → Aa | b
A → Sc | d

There is no immediate left recursion in this grammar. But by substitution:
S ⇒ Aa ⇒ Sca
or
A ⇒ Sc ⇒ Aac
so the grammar is left recursive (the left recursion takes two steps).

We need to check for and eliminate both immediate left recursion and left recursion that takes more than one step.
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE

Consider the grammar:
S → Aa | b
A → Ac | Sd | f

Order of non-terminals: S, A

for S:
- There is no immediate left recursion in S.

for A:
- Substitute A → Sd with A → Aad | bd, using S → Aa | b. So we have:
  A → Ac | Aad | bd | f   (immediate left recursion in A)
- Here α1 is c; α2 is ad; β1 is bd and β2 is f.
- Applying the rule we get:
  A → bdA' | fA'
  A' → cA' | adA' | ε

Final output:
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE

(Rule: A → Aα | β becomes A → βA', A' → αA' | ε)

Consider the grammar:
S → Aa | b
A → Ac | Sd | f

Order of non-terminals: A, S

for A:
- Eliminate the immediate left recursion in A → Ac | Sd | f, where α is c; β1 is Sd and β2 is f:
  A → SdA' | fA'
  A' → cA' | ε

for S:
- Replace S → Aa with S → SdA'a | fA'a.
- So we have S → SdA'a | fA'a | b.
- Eliminate the immediate left recursion in S, where α is dA'a; β1 is fA'a and β2 is b:
  S → fA'aS' | bS'
  S' → dA'aS' | ε

Final output:
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε
PRACTICE QUESTION: LEFT RECURSION

Remove the left recursion from the grammar given below:

A → Bxy | x
B → CD
C → A | c
D → d
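ELIMINATE LEFT-RECURSION -- ALGORITHM

- Arrange the non-terminals in some order: A1 ... An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
            Ai → Aj γ
        by
            Ai → α1 γ | ... | αk γ
        where Aj → α1 | ... | αk are all the current Aj-productions
      }
    - eliminate the immediate left recursion among the Ai-productions
  }

(The algorithm is guaranteed to work when the grammar has no cycles, i.e. no A ⇒+ A, and no ε-productions.)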
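The algorithm above can be sketched in Python as follows. The grammar representation (a dict from non-terminal to a list of body tuples) and the naming of new non-terminals by appending ' are assumptions of this sketch; it inherits the algorithm's precondition of no cycles and no ε-productions.

```python
def eliminate_left_recursion(grammar, order):
    """Eliminate immediate and indirect left recursion.

    grammar: {non-terminal: [body tuples]}; order: the chosen order A1 ... An.
    Assumes no cycles (A ⇒+ A) and no ε-productions in the input grammar.
    """
    g = {nt: list(bodies) for nt, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai → Aj γ by Ai → α1 γ | ... | αk γ (current Aj-bodies).
            updated = []
            for body in g[ai]:
                if body and body[0] == aj:
                    updated.extend(alpha + body[1:] for alpha in g[aj])
                else:
                    updated.append(body)
            g[ai] = updated
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [b[1:] for b in g[ai] if b and b[0] == ai]
        if alphas:
            new = ai + "'"
            g[ai] = [b + (new,) for b in g[ai] if not b or b[0] != ai]
            g[new] = [a + (new,) for a in alphas] + [()]
    return g


# S → Aa | b ;  A → Ac | Sd | f  with order S, A
g = eliminate_left_recursion(
    {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ("f",)]},
    ["S", "A"])
for head, bodies in g.items():
    print(head, "→", " | ".join("".join(b) or "ε" for b in bodies))
```

With order S, A this reproduces the first worked example (A → bdA' | fA', A' → cA' | adA' | ε); passing `["A", "S"]` reproduces the second one.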
LEFT-FACTORING
Consider the grammar
S → Aa | Ab
or
stmt → if expr then stmt else stmt
     | if expr then stmt

When we see A or if, we cannot determine which production rule to choose to expand S or stmt, since both productions begin with the same leftmost symbol (A in the first example and if in the second example).
LEFT-FACTORING (CONT.)
If there is a grammar
A → αβ1 | αβ2
where α is non-empty and the first symbols of β1 and β2 (if they have one) are different, then
re-write the grammar as follows:
A → αA'
A' → β1 | β2

Now we can immediately expand A to αA', and postpone the choice between β1 and β2 until the input derived from α has been seen.

This rewriting of the grammar is called LEFT FACTORING.
LEFT-FACTORING -- ALGORITHM
• For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix α, say
  A → αβ1 | ... | αβn | γ1 | ... | γm
  convert it into
  A → αA' | γ1 | ... | γm
  A' → β1 | ... | βn
• Repeat until no two alternatives of any non-terminal share a common non-empty prefix.
LEFT-FACTORING – EXAMPLE 1

A → abB | aB | cdg | cdeB | cdfB

α is a; β1 is bB; β2 is B:
A → aA' | cdg | cdeB | cdfB
A' → bB | B

α is cd; β1 is g; β2 is eB; β3 is fB:
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
LEFT-FACTORING – EXAMPLE 2

A → ad | a | ab | abc | b

α is a; β1 is d; β2 is ε; β3 is b; β4 is bc:
A → aA' | b
A' → d | ε | b | bc

α is b; β1 is ε; β2 is c:
A → aA' | b
A' → d | ε | bA''
A'' → ε | c
NON-CONTEXT-FREE LANGUAGE CONSTRUCTS
• Some constructs in programming languages are not context-free. This means that we cannot write a context-free grammar for these constructs.
• L1 = { ωcω | ω is in (a|b)* } is not context-free.
  - This abstracts declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free grammar; the check is left to the semantic analyzer (which is not context-free).
• L2 = { aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1 } is not context-free.
  - This abstracts declaring two functions (one with n parameters, the other with m parameters) and then calling them with actual parameters.
SECTION 2.4: TOP-DOWN PARSING
TOP-DOWN PARSING
• Beginning with the start symbol, try to guess the productions to apply to end up at the user's program.
CHALLENGES IN TOP-DOWN PARSING
• Top-down parsing begins with virtually no information.
  - It begins with just the start symbol, which matches every program.
• How can we know which productions to apply?
  - In general, we can't.
  - There are some grammars for which the best we can do is guess, and backtrack if we're wrong.
• If we have to guess, how do we do it?
TOP-DOWN PARSING
• Top-down parsers:
  1. Recursive-Descent Parsing
     - Backtracking is needed (if a choice of production rule does not work, we backtrack to try other alternatives).
     - It is a general parsing technique, but not widely used.
     - Not efficient.
  2. Predictive Parsing
     - No backtracking.
     - Efficient.
     - Needs a special form of grammar (LL(1) grammars).
     - Recursive Predictive Parsing is a special form of recursive-descent parsing without backtracking.
     - The Non-Recursive (Table-Driven) Predictive Parser is also known as the LL(1) parser.
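RECURSIVE-DESCENT PARSING (USES BACKTRACKING)
• Backtracking is needed.
• It tries to find the leftmost derivation.

S → aBc
B → bc | b

Input: abc
1. Try S → aBc with B → bc, giving the tree S( a B(b c) c ). The symbols a, b and c are matched, but the final c of S finds no input left: this choice fails, so backtrack.
2. Try B → b instead, giving the tree S( a B(b) c ), which matches abc: parsing succeeds.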
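Backtracking can be expressed compactly with Python generators: each symbol yields every input position it can reach, and asking for the next value is the "try the next alternative" step. This is a sketch for the grammar above only; the `GRAMMAR` table and `accepts` helper are illustrative, and like any top-down parser it would loop forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],     # alternatives are tried in this order
}


def parse(symbol, tokens, pos, trace):
    """Yield every position reachable after deriving `symbol` from tokens[pos:].
    When a later match fails, the generator resumes and tries the next alternative."""
    if symbol not in GRAMMAR:                         # terminal: must match the input
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:
        trace.append(f"try {symbol} → {''.join(body)} at position {pos}")
        yield from parse_sequence(body, tokens, pos, trace)


def parse_sequence(body, tokens, pos, trace):
    """Match the symbols of `body` one after another, starting at `pos`."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], tokens, pos, trace):
        yield from parse_sequence(body[1:], tokens, mid, trace)


def accepts(tokens):
    trace = []
    ok = any(end == len(tokens) for end in parse("S", tokens, 0, trace))
    return ok, trace


ok, trace = accepts(list("abc"))
print(ok)                 # True
print(*trace, sep="\n")
```

The trace shows B → bc being tried first and abandoned, then B → b succeeding, as on the slide.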
RECURSIVE PREDICTIVE PARSING
• Each non-terminal corresponds to a procedure.

Ex: A → aBb (this is the only production rule for A)

proc A {
  - match the current token with a, and move to the next token;
  - call B;
  - match the current token with b, and move to the next token;
}
RECURSIVE PREDICTIVE PARSING (CONT.)
A → aBb | bAB

proc A {
  case of the current token {
    'a': - match the current token with a, and move to the next token;
         - call B;
         - match the current token with b, and move to the next token;
    'b': - match the current token with b, and move to the next token;
         - call A;
         - call B;
  }
}
RECURSIVE PREDICTIVE PARSING (CONT.)
• When should ε-productions be applied?

A → aA | bB | ε

• If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
• Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the FOLLOW set of A (the terminals that can follow A in sentential forms).
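As a sketch of these ideas, here is a recursive predictive parser in Python for the expression grammar after left-recursion removal (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id). There is one method per non-terminal, and an ε-production is chosen only when the current token is in the FOLLOW set. The FOLLOW sets are copied from the FOLLOW example later in this lecture; the class name and token spellings such as "id" are assumptions:

```python
class ExprParser:
    """Recursive predictive parser (no backtracking) for
        E → TE'   E' → +TE' | ε   T → FT'   T' → *FT' | ε   F → (E) | id
    """

    FOLLOW_E1 = {")", "$"}        # FOLLOW(E')
    FOLLOW_T1 = {"+", ")", "$"}   # FOLLOW(T')

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.current() != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.current()!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")                       # the whole input must be consumed
        return True

    def E(self):                              # E → TE'
        self.T()
        self.E1()

    def E1(self):
        if self.current() == "+":             # E' → +TE'
            self.match("+")
            self.T()
            self.E1()
        elif self.current() not in self.FOLLOW_E1:
            raise SyntaxError(f"unexpected {self.current()!r} after an expression")
        # otherwise E' → ε, because the current token is in FOLLOW(E')

    def T(self):                              # T → FT'
        self.F()
        self.T1()

    def T1(self):
        if self.current() == "*":             # T' → *FT'
            self.match("*")
            self.F()
            self.T1()
        elif self.current() not in self.FOLLOW_T1:
            raise SyntaxError(f"unexpected {self.current()!r} after a term")
        # otherwise T' → ε, because the current token is in FOLLOW(T')

    def F(self):
        if self.current() == "(":             # F → (E)
            self.match("(")
            self.E()
            self.match(")")
        elif self.current() == "id":          # F → id
            self.match("id")
        else:
            raise SyntaxError(f"unexpected {self.current()!r}, expected '(' or id")


print(ExprParser(["id", "+", "id", "*", "id"]).parse())   # True
```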
TOP-DOWN, PREDICTIVE PARSING: LL(1)
• L: left-to-right scan of the tokens
• L: leftmost derivation
• (1): one token of lookahead
• Construct a leftmost derivation for the sequence of tokens.
• When expanding a non-terminal, we predict the production to use by looking at the next token of the input.
TOP-DOWN, PREDICTIVE PARSING: LL(1)

a grammar → eliminate left recursion → left factor → a grammar suitable for predictive parsing (an LL(1) grammar), with no 100% guarantee

• When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

A → α1 | ... | αn        input: ... a .......
                         (a is the current token)
TOP-DOWN, PREDICTIVE PARSING: LL(1)

stmt → if ...... |
       while ...... |
       begin ...... |
       for .....

• When we are trying to rewrite the non-terminal stmt and the current token is if, we have to choose the first production rule.
• In general, when rewriting stmt, we can uniquely choose the production rule by just looking at the current token.
• We eliminate the left recursion in the grammar and left factor it. But the result may still not be suitable for predictive parsing (not an LL(1) grammar).
NON-RECURSIVE PREDICTIVE PARSING -- LL(1) PARSER
• Non-recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as the LL(1) parser.

Components: an input buffer, a stack and a parsing table, used by the non-recursive predictive parser to produce its output.
LL(1) PARSER
Input buffer
• Contains the string to be parsed. We assume that its end is marked with a special symbol $.

Output
• A production rule representing a step of the derivation sequence (leftmost derivation) of the string in the input buffer.

Stack
• Contains grammar symbols.
• At the bottom of the stack there is a special end-marker symbol $.
• Initially the stack contains only the symbol $ and the start symbol S: $S is the initial stack.
• When the stack is emptied (i.e. only $ is left in the stack), parsing is completed.

Parsing table
• A two-dimensional array M[A, a]
• Each row is a non-terminal symbol.
• Each column is a terminal symbol or the special symbol $.
• Each entry holds a production rule.
LL(1) PARSER – PARSER ACTIONS
• The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
• There are four possible parser actions:

1. If X and a are both $ → the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $) → the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal → the parser looks at the parsing table entry M[X, a]. If M[X, a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack (so Y1 ends up on top). The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above → error.
   - All empty entries in the parsing table are errors.
   - If X is a terminal symbol different from a, this is also an error case.
CONSTRUCTING LL(1) PARSING TABLES
• Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.
• FIRST(α) is the set of terminal symbols that occur as first symbols in strings derived from α, where α is any string of grammar symbols. If α ⇒* ε, then ε is also in FIRST(α).
• FOLLOW(A) is the set of terminals that occur immediately after (follow) the non-terminal A in strings derived from the start symbol.
  - A terminal a is in FOLLOW(A) if S ⇒* αAaβ.
COMPUTE FIRST FOR ANY STRING X
• We want to tell whether a particular non-terminal A derives a string starting with a particular terminal t.
• Intuitively, FIRST(A) is the set of terminals that can be at the start of a string produced by A.
• If we can compute FIRST sets for all non-terminals in a grammar, we can efficiently construct the LL(1) parsing table.
COMPUTE FIRST FOR ANY STRING X
• Initially, for all non-terminals A, set
  FIRST(A) = { t | A → tω for some ω }
  Consider the grammar:
  S → aC | bB
  B → b
  C → c
  FIRST(S) = {a, b}; FIRST(B) = {b}; FIRST(C) = {c}
• For each non-terminal A, for each production A → Bω, set
  FIRST(A) = FIRST(A) ∪ FIRST(B)
  Consider the grammar:
  S → aC | bB | C
  B → b
  C → c
  FIRST(S) = {a, b, c}; FIRST(B) = {b}; FIRST(C) = {c}
  Consider the grammar:
  S → Ab
  A → a
  FIRST(S) = FIRST(A) = {a}
FIRST COMPUTATION WITH EPSILON
• For all non-terminals A where A → ε is a production, add ε to FIRST(A).
  For example, S → a | ε: FIRST(S) = {a, ε}
• For each production A → α, where α is a string of non-terminals whose FIRST sets all contain ε, set
  FIRST(A) = FIRST(A) ∪ {ε}.
  For example, S → AB | c; A → a | ε; B → b | ε:
  FIRST(S) = {a, b, c, ε}; FIRST(A) = {a, ε}; FIRST(B) = {b, ε}
• For each production A → αtω, where α is a string of non-terminals whose FIRST sets all contain ε, set
  FIRST(A) = FIRST(A) ∪ {t}.
  For example, S → ABcD; A → a | ε; B → b | ε; D → d:
  FIRST(S) = {a, b, c}; FIRST(A) = {a, ε}; FIRST(B) = {b, ε}; FIRST(D) = {d}
• For each production A → αBω, where α is a string of non-terminals whose FIRST sets all contain ε, set
  FIRST(A) = FIRST(A) ∪ (FIRST(B) − {ε}).
  For example, S → ABDc | f; A → a | ε; B → b | ε; D → d:
  FIRST(S) = {a, b, d, f}; FIRST(A) = {a, ε}; FIRST(B) = {b, ε}; FIRST(D) = {d}
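The FIRST rules amount to a fixed-point computation: keep applying them until no set grows. A minimal Python sketch, assuming non-terminals are the keys of a dict, every other symbol is a terminal, and () is an ε-body; `first_of_string` covers the "skip over non-terminals whose FIRST sets contain ε" cases from the slide:

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST(X1 X2 ... Xn) given the current FIRST sets of the non-terminals.
    Any symbol without an entry in `first` is treated as a terminal."""
    result = set()
    for sym in symbols:
        if sym not in first:               # a terminal starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}       # FIRST(B) - {ε}
        if EPS not in first[sym]:          # B cannot vanish, so stop here
            return result
    result.add(EPS)                        # every symbol can derive ε (or string is empty)
    return result


def compute_first(grammar):
    """grammar: {non-terminal: [body tuples]}, () meaning an ε-production.
    The FIRST rules are applied repeatedly until no set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first) - first[head]
                if new:
                    first[head] |= new
                    changed = True
    return first


# S → ABDc | f ;  A → a | ε ;  B → b | ε ;  D → d
grammar = {
    "S": [("A", "B", "D", "c"), ("f",)],
    "A": [("a",), ()],
    "B": [("b",), ()],
    "D": [("d",)],
}
first = compute_first(grammar)
print(sorted(first["S"]))   # ['a', 'b', 'd', 'f']
```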
FOLLOW SET
• The FOLLOW set represents the set of terminals that might come after a given non-terminal.
• Formally:
  FOLLOW(A) = { t | S ⇒* αAtω for some α, ω }
  where S is the start symbol of the grammar.
• Informally, FOLLOW(A) contains every terminal that can ever come immediately after A in a derivation.
COMPUTE FOLLOW FOR ANY NON-TERMINAL

RULE 1: If S is the start symbol → $ is in FOLLOW(S).

RULE 2: If A → αBβ is a production rule → everything in FIRST(β) except ε is in FOLLOW(B).

RULE 3: (i) If A → αB is a production rule, or
        (ii) if A → αBβ is a production rule and ε is in FIRST(β)
        → everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any FOLLOW set.
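Rules 1 to 3 can be sketched in Python in the same fixed-point style. To keep the block short it takes the FIRST sets as input (here the values from the FIRST EXAMPLE for the expression grammar) instead of recomputing them; the dict-based grammar encoding, with () for ε, is an assumption of the sketch:

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of symbols; symbols missing from `first` are terminals."""
    result = set()
    for sym in symbols:
        if sym not in first:
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)                      # empty string, or every symbol can vanish
    return result


def compute_follow(grammar, start, first):
    """FOLLOW sets by Rules 1-3, repeated until nothing more can be added."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                                # Rule 1
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue                          # FOLLOW only for non-terminals
                    beta = first_of_string(body[i + 1:], first)
                    add = beta - {EPS}                    # Rule 2
                    if EPS in beta:                       # Rule 3(i): β empty; 3(ii): ε ∈ FIRST(β)
                        add |= follow[a]
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow


G = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
follow = compute_follow(G, "E", FIRST)
for nt in G:
    print(f"FOLLOW({nt}) =", sorted(follow[nt]))
```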
FIRST AND FOLLOW SET EXAMPLE

Consider the grammar:
S → Aa
A → BD
B → b | ε
D → d | ε

FIRST(S) = {b, d, a}
FIRST(A) = {b, d, ε}
FIRST(B) = {b, ε}
FIRST(D) = {d, ε}

FOLLOW(S) = {$} (Rule 1)
FOLLOW(A) = {a} (Rule 2)
FOLLOW(B) = {d, a} (Rule 2; Rule 3(ii))
FOLLOW(D) = {a} (Rule 3(i))
FIRST AND FOLLOW SET EXAMPLE

Consider the grammar:
C → P F class id X Y
P → public | ε
F → final | ε
X → extends id | ε
Y → implements I | ε
I → id J
J → , I | ε

FIRST(C) = {public, final, class}
FIRST(P) = {public, ε}
FIRST(F) = {final, ε}
FIRST(X) = {extends, ε}
FIRST(Y) = {implements, ε}
FIRST(I) = {id}
FIRST(J) = {',', ε}

FOLLOW(C) = {$} (Rule 1)
FOLLOW(P) = {final, class} (Rule 2)
FOLLOW(F) = {class} (Rule 2)
FOLLOW(X) = {implements, $} (Rule 2; Rule 3(ii))
FOLLOW(Y) = {$} (Rule 3(i))
FOLLOW(I) = {$} (Rule 3(i))
FOLLOW(J) = {$} (Rule 3(i))
LL(1) PARSING

Consider the grammar:
E → E+T | T
T → T*F | F
F → (E) | id

Remove immediate left recursion (see REMOVING IMMEDIATE LEFT-RECURSION in Section 2.3):
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST EXAMPLE

Consider the grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
FOLLOW EXAMPLE

FIRST(F) = {(, id}; FIRST(T') = {*, ε}; FIRST(T) = {(, id}; FIRST(E') = {+, ε}; FIRST(E) = {(, id}

1. If S is the start symbol → $ is in FOLLOW(S).
2. If A → αBβ is a production rule → everything in FIRST(β) except ε is in FOLLOW(B).
3. (i) If A → αB is a production rule, or (ii) A → αBβ is a production rule and ε is in FIRST(β) → everything in FOLLOW(A) is in FOLLOW(B).

Consider the following grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

E → TE':
  Rule 1: $ is in FOLLOW(E).
  Rule 2 (α is ε, B is T, β is E'): FIRST(E') except ε, i.e. +, is in FOLLOW(T).
  Rule 3(i) (α is T, B is E'): FOLLOW(E) is in FOLLOW(E').
  Rule 3(ii) (α is ε, B is T, β is E'; FIRST(β) has ε): FOLLOW(E) is in FOLLOW(T).
E' → +TE' | ε:
  Rule 2 (α is +, B is T, β is E'): + is in FOLLOW(T).
  Rule 3(i) (α is +T, B is E'): FOLLOW(E') is in FOLLOW(E') (nothing new).
  Rule 3(ii) (α is +, B is T, β is E'; FIRST(β) has ε): FOLLOW(E') is in FOLLOW(T).
T → FT':
  Rule 2 (α is ε, B is F, β is T'): * is in FOLLOW(F).
  Rule 3(i) (α is F, B is T'): FOLLOW(T) is in FOLLOW(T').
  Rule 3(ii) (α is ε, B is F, β is T'; FIRST(β) has ε): FOLLOW(T) is in FOLLOW(F).
T' → *FT' | ε:
  Rule 2 (α is *, B is F, β is T'): * is in FOLLOW(F).
  Rule 3(i) (α is *F, B is T'): FOLLOW(T') is in FOLLOW(T') (nothing new).
  Rule 3(ii) (α is *, B is F, β is T'; FIRST(β) has ε): FOLLOW(T') is in FOLLOW(F).
F → (E) | id:
  Rule 2 (α is '(', B is E, β is ')'): ) is in FOLLOW(E).

FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
CONSTRUCTING LL(1) PARSING TABLE -- ALGORITHM
• For each production rule A → α of a grammar G:
  - for each terminal a in FIRST(α), add A → α to M[A, a];
  - if ε is in FIRST(α), for each terminal a in FOLLOW(A), add A → α to M[A, a];
  - if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
• All other undefined entries of the parsing table are error entries.
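A Python sketch of this construction, assuming FIRST and FOLLOW of the non-terminals are already known (the values below are the ones computed for the expression grammar in this lecture). Instead of silently overwriting, it records any entry that would be multiply defined; `build_ll1_table` is an illustrative name:

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of symbols; symbols missing from `first` are terminals."""
    result = set()
    for sym in symbols:
        if sym not in first:
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, first, follow):
    """Fill M[A, a] for every production A → α; return (table, conflicts)."""
    table, conflicts = {}, []
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_string(body, first)
            lookaheads = f - {EPS}                 # each terminal a in FIRST(α)
            if EPS in f:
                lookaheads |= follow[head]         # each a in FOLLOW(A), including $
            for a in sorted(lookaheads):
                if (head, a) in table and table[(head, a)] != body:
                    conflicts.append((head, a))    # multiply defined entry
                else:
                    table[(head, a)] = body
    return table, conflicts


G = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

table, conflicts = build_ll1_table(G, FIRST, FOLLOW)
for (nt, a), body in table.items():
    print(f"M[{nt}, {a}] = {nt} → {''.join(body) or 'ε'}")
print("conflicts:", conflicts)   # conflicts: []
```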
CONSTRUCTING LL(1) PARSING TABLE
E → TE': FIRST(TE') = {(, id} → E → TE' into M[E, (] and M[E, id]
E' → +TE': FIRST(+TE') = {+} → E' → +TE' into M[E', +]
E' → ε: FIRST(ε) = {ε} → none,
  but since ε is in FIRST(ε) and FOLLOW(E') = {$, )} → E' → ε into M[E', $] and M[E', )]
T → FT': FIRST(FT') = {(, id} → T → FT' into M[T, (] and M[T, id]
T' → *FT': FIRST(*FT') = {*} → T' → *FT' into M[T', *]
T' → ε: FIRST(ε) = {ε} → none,
  but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +} → T' → ε into M[T', $], M[T', )] and M[T', +]
F → (E): FIRST((E)) = {(} → F → (E) into M[F, (]
F → id: FIRST(id) = {id} → F → id into M[F, id]

LL(1) PARSER – EXAMPLE 1

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(F) = {(, id}      FOLLOW(E) = {$, )}
FIRST(T') = {*, ε}      FOLLOW(E') = {$, )}
FIRST(T) = {(, id}      FOLLOW(T) = {+, ), $}
FIRST(E') = {+, ε}      FOLLOW(T') = {+, ), $}
FIRST(E) = {(, id}      FOLLOW(F) = {+, *, ), $}

FIRST(E') has ε, so E' → ε goes into the columns of FOLLOW(E').
FIRST(T') has ε, so T' → ε goes into the columns of FOLLOW(T').

     | id      | +         | *         | (       | )      | $
E    | E → TE' |           |           | E → TE' |        |
E'   |         | E' → +TE' |           |         | E' → ε | E' → ε
T    | T → FT' |           |           | T → FT' |        |
T'   |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F    | F → id  |           |           | F → (E) |        |
LL(1) PARSER – EXAMPLE 1
Parsing the input id+id with the parsing table above:

Stack     | Input   | Output
$E        | id+id$  | E → TE'
$E'T      | id+id$  | T → FT'
$E'T'F    | id+id$  | F → id
$E'T'id   | id+id$  |
$E'T'     | +id$    | T' → ε
$E'       | +id$    | E' → +TE'
$E'T+     | +id$    |
$E'T      | id$     | T → FT'
$E'T'F    | id$     | F → id
$E'T'id   | id$     |
$E'T'     | $       | T' → ε
$E'       | $       | E' → ε
$         | $       | Accept
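The four parser actions can be sketched as a short Python driver. The table is transcribed from this example; the stack is a Python list whose last element is the top, and the function returns the production outputs (a leftmost derivation) or raises SyntaxError in the error case. The names `TABLE` and `ll1_parse` are illustrative:

```python
# M[X, a] for the expression grammar; () is an ε-production.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Table-driven LL(1) parse; returns the productions output in order."""
    stack = ["$", start]                  # initial stack $S (top is the last element)
    buf = list(tokens) + ["$"]
    i, output = 0, []
    while True:
        x, a = stack[-1], buf[i]
        if x == "$" and a == "$":         # action 1: successful completion
            return output
        if x not in NONTERMINALS:         # x is a terminal (or $ with input left)
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")      # action 4
            stack.pop()                   # action 2: pop and advance the input
            i += 1
        else:
            body = TABLE.get((x, a))
            if body is None:
                raise SyntaxError(f"empty entry M[{x}, {a}]")          # action 4
            stack.pop()                   # action 3: replace X by Y1...Yk
            stack.extend(reversed(body))  # push Yk ... Y1 so that Y1 is on top
            output.append(f"{x} → {''.join(body) or 'ε'}")


for step in ll1_parse(["id", "+", "id"]):
    print(step)
```

The printed productions match the Output column of the trace above.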
LL(1) PARSER – EXAMPLE 2

S → aBa
B → bB | ε

LL(1) parsing table:
    | a       | b      | $
S   | S → aBa |        |
B   | B → ε   | B → bB |

Stack   | Input  | Output
$S      | abba$  | S → aBa
$aBa    | abba$  |
$aB     | bba$   | B → bB
$aBb    | bba$   |
$aB     | ba$    | B → bB
$aBb    | ba$    |
$aB     | a$     | B → ε
$a      | a$     |
$       | $      | Accept, successful completion
LL(1) PARSER – EXAMPLE 2 (CONT.)

Outputs: S → aBa, B → bB, B → bB, B → ε

Derivation (leftmost): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba

Parse tree, written as node(children): S( a B( b B( b B(ε) ) ) a )
A GRAMMAR WHICH IS NOT LL(1)

S → iCtSE | a
E → eS | ε
C → b

FIRST(iCtSE) = {i}      FOLLOW(S) = {$, e}
FIRST(a) = {a}          FOLLOW(E) = {$, e}
FIRST(eS) = {e}         FOLLOW(C) = {t}
FIRST(ε) = {ε}
FIRST(b) = {b}

    | a     | b     | e              | i         | t | $
S   | S → a |       |                | S → iCtSE |   |
E   |       |       | E → eS, E → ε  |           |   | E → ε
C   |       | C → b |                |           |   |

There are two production rules for M[E, e], so the grammar is not LL(1).
A GRAMMAR WHICH IS NOT LL(1) (CONT.)
• What do we have to do if the resulting parsing table contains multiply defined entries?
  - If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
  - If the grammar is not left factored, left factor the grammar.
  - If the new grammar's parsing table still contains multiply defined entries, that grammar is ambiguous or it is inherently not an LL(1) grammar.
• A left-recursive grammar cannot be an LL(1) grammar.
  A → Aα | β
  - Any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα.
  - If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A).
• A grammar that is not left factored cannot be an LL(1) grammar.
  A → αβ1 | αβ2
  - Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).

PROPERTIES OF LL(1) GRAMMARS
• A grammar G is LL(1) if and only if the following conditions hold for any two distinct production rules A → α and A → β:
  1. α and β cannot both derive strings starting with the same terminal.
  2. At most one of α and β can derive ε.
  3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
• In other words, a grammar G is LL(1) iff for any productions A → ω1 and A → ω2, the sets
  FIRST(ω1 FOLLOW(A)) and FIRST(ω2 FOLLOW(A))
  are disjoint.
• This condition is equivalent to saying that there are no conflicts in the parsing table.
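The disjointness condition can be checked directly from FIRST and FOLLOW. A Python sketch, assuming FIRST and FOLLOW are supplied (here the values from the grammar in A GRAMMAR WHICH IS NOT LL(1)); `predict` is an illustrative helper that computes FIRST(ω FOLLOW(A)) as FIRST(ω) − {ε}, plus FOLLOW(A) when ω can derive ε:

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a string of symbols; symbols missing from `first` are terminals."""
    result = set()
    for sym in symbols:
        if sym not in first:
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def predict(head, body, first, follow):
    """FIRST(ω FOLLOW(A)) for the production A → ω."""
    f = first_of_string(body, first)
    return (f - {EPS}) | (follow[head] if EPS in f else set())


def ll1_violations(grammar, first, follow):
    """Pairs A → ω1, A → ω2 whose FIRST(ω FOLLOW(A)) sets overlap."""
    bad = []
    for head, bodies in grammar.items():
        for i in range(len(bodies)):
            for j in range(i + 1, len(bodies)):
                common = (predict(head, bodies[i], first, follow)
                          & predict(head, bodies[j], first, follow))
                if common:
                    bad.append((head, bodies[i], bodies[j], common))
    return bad


# S → iCtSE | a ;  E → eS | ε ;  C → b
G = {"S": [("i", "C", "t", "S", "E"), ("a",)], "E": [("e", "S"), ()], "C": [("b",)]}
FIRST = {"S": {"i", "a"}, "E": {"e", EPS}, "C": {"b"}}
FOLLOW = {"S": {"$", "e"}, "E": {"$", "e"}, "C": {"t"}}
print(ll1_violations(G, FIRST, FOLLOW))   # [('E', ('e', 'S'), (), {'e'})]
```

The single violation is exactly the doubly defined entry M[E, e]; an empty list would mean the table has no conflicts.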
