Lec02-Syntax Analysis and LL
SECTION 2.1: CONTEXT FREE GRAMMAR
SYNTAX ANALYZER
PARSERS (CONT.)
We categorize parsers into two groups:
1. Top-Down Parser
The parse tree is built from the root to the leaves (top to bottom).
The input to the parser is scanned from left to right, one symbol at a time.
2. Bottom-Up Parser
The parse tree is built from the leaves up to the root.
The input to the parser is scanned from left to right, one symbol at a time.
CONTEXT-FREE GRAMMARS (CFG)
G = (T, V, S, P)
T (terminals): {a, b}
V (non-terminals): S, A
S: start symbol
P (productions):
S → aAa
S → b
A → a
TERMINAL SYMBOLS
Terminals include:
Lower-case letters early in the alphabet
Operator symbols such as + and %
Punctuation symbols such as ( ) , ;
Digits 0, 1, 2, …
Boldface strings such as id or if
DERIVATION OF A STRING
S ⇒ aSa ⇒ aaSaa ⇒ aaaSaaa ⇒ aaabaaa
Derived in multiple steps.
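SENTENCE AND SENTENTIAL FORM
Left-Most Derivation
Right-Most Derivation
Consider the Grammar:
E → E+E/E-E/E*E/E/(E)/id
Derive the string ‘id+id*id’ (left-most derivation):
E ⇒ E+E          (E → E+E)
  ⇒ id+E         (E → id)
  ⇒ id+E*E       (E → E*E)
  ⇒ id+id*E      (E → id)
  ⇒ id+id*id     (E → id)
[Parse tree: the root E has children E, +, E; the left E derives id; the right E has children E, *, E, each deriving id.]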
RIGHT-MOST DERIVATIONS
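Consider the Grammar:
E → E+E/E-E/E*E/E/(E)/id
Derive the string ‘id+id*id’ (right-most derivation):
E ⇒ E*E          (E → E*E)
  ⇒ E*id         (E → id)
  ⇒ E+E*id       (E → E+E)
  ⇒ E+id*id      (E → id)
  ⇒ id+id*id     (E → id)
[Parse tree: the root E has children E, *, E; the left E has children E, +, E, each deriving id; the right E derives id.]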
SECTION 2.2: AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
A grammar is ambiguous if it has more than one left-most derivation or more than one right-most derivation for a given sentence, i.e. the sentence can be derived in more than one way using LMD or RMD.
[Figure: two parse trees, labelled 1 and 2, for the same stmt built from E2, S1 and S2.]
AMBIGUITY (CONT.)
• We prefer the second parse tree (else matches with the closest if).
• So, we have to disambiguate our grammar to reflect this choice.
SECTION 2.3: LEFT RECURSION AND LEFT FACTORING
LEFT RECURSION
A grammar is left recursive if it has a non-terminal A such that there is a derivation
A ⇒+ Aα for some string α.
Immediate left recursion:
A → Aα | β        where β does not start with A
Eliminate immediate left recursion:
A → βA'
A' → αA' | ε
This is an equivalent grammar.
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn        where β1 ... βn do not start with A
Eliminate immediate left recursion:
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε
Example:
E → E+T | T        (A → Aα | β)
A is E; α is +T and β is T
Applying the rule we get
E → T E'           (A → βA')
E' → +T E' | ε     (A' → αA' | ε)
T → T*F | F        (A → Aα | β)
A is T; α is *F and β is F
Applying the rule we get
T → F T'           (A → βA')
T' → *F T' | ε     (A' → αA' | ε)
Final Output:
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)
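The rule for immediate left recursion (A → Aα | β becomes A → βA', A' → αA' | ε) can be sketched in Python. This is only an illustration: the function names and the list-of-symbols representation of productions are assumptions made here, not part of the lecture.

```python
EPS = "ε"

def eliminate_immediate(nt, alternatives):
    """Rewrite A → Aα | β as A → βA' and A' → αA' | ε.

    Each alternative is a list of symbols; the empty list stands for ε.
    """
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:                       # no immediate left recursion
        return {nt: alternatives}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }

def show(grammar):
    """Format a grammar as one 'A → x | y' string per non-terminal."""
    return [
        f"{lhs} → " + " | ".join(" ".join(alt) or EPS for alt in alts)
        for lhs, alts in grammar.items()
    ]

print(show(eliminate_immediate("E", [["E", "+", "T"], ["T"]])))
print(show(eliminate_immediate("T", [["T", "*", "F"], ["F"]])))
```

For E → E+T | T the sketch yields E → T E' and E' → + T E' | ε, which agrees with the worked example.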
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Consider the Grammar
S → Aa | b
A → Sc | d
There is no immediate left recursion in the grammar, but by substitution
S ⇒ Aa ⇒ Sca
so the grammar is left recursive.
We need to check for and eliminate both immediate left recursion and indirect left recursion.
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Consider the Grammar
S → Aa | b
A → Ac | Sd | f
Order of non-terminals: S, A
for S:
- There is no immediate left recursion in S.
for A:
- Substitute A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | f
Applying the rule we get:
A → bdA' | fA'
A' → cA' | adA' | ε
Final Output:
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Rule: A → Aα | β becomes A → βA', A' → αA' | ε
Same grammar, order of non-terminals: A, S
for A:
- Eliminate the immediate left-recursion in A → Ac | Sd | f
α is c; β1 is Sd and β2 is f
A → SdA' | fA'
A' → cA' | ε
for S:
- Replace S → Aa with S → SdA'a | fA'a
So, we will have S → SdA'a | fA'a | b
- Eliminate the immediate left-recursion in S
α is dA'a; β1 is fA'a and β2 is b
S → fA'aS' | bS'
S' → dA'aS' | ε
Final Output:
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε
PRACTICE QUESTION: LEFT RECURSION
A → Bxy | x
B → CD
C → A | c
D → d
ELIMINATE LEFT-RECURSION -- ALGORITHM
}
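A possible Python sketch of the usual textbook procedure: fix an order of the non-terminals, substitute earlier non-terminals that appear at the start of a production, then remove immediate left recursion. The function name and the dictionary representation are assumptions made for illustration, and the sketch assumes the grammar has no cycles and no ε-productions in the input.

```python
def eliminate_left_recursion(grammar, order):
    """grammar maps a non-terminal to a list of alternatives (lists of symbols).

    The empty list stands for ε. `order` is the chosen order of non-terminals.
    """
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        # Replace Ai → Aj γ (Aj earlier in the order) by Ai → δ γ for each Aj → δ.
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # Remove immediate left recursion: A → Aα | β becomes A → βA', A' → αA' | ε.
        alphas = [alt[1:] for alt in g[ai] if alt and alt[0] == ai]
        betas = [alt for alt in g[ai] if not alt or alt[0] != ai]
        if alphas:
            new_nt = ai + "'"
            g[ai] = [beta + [new_nt] for beta in betas]
            g[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return g

GRAMMAR = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]}
print(eliminate_left_recursion(GRAMMAR, ["S", "A"]))
print(eliminate_left_recursion(GRAMMAR, ["A", "S"]))
```

Run on S → Aa | b, A → Ac | Sd | f, the two orders give the two different (but equivalent) final grammars of the worked examples, which shows that the result depends on the chosen order.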
LEFT-FACTORING
Consider the Grammar
S → Aa | Ab
OR
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA' | γ1 | ... | γm
A' → β1 | ... | βn
LEFT-FACTORING – EXAMPLE1
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
LEFT-FACTORING – EXAMPLE2
A → ad | a | ab | abc | b
α is a; β1 is d; β2 is ε; β3 is b; β4 is bc
A → aA' | b
A' → d | ε | b | bc
α is b; β1 is ε; β2 is c
A → aA' | b
A' → d | ε | bA''
A'' → ε | c
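The left-factoring steps of Example 2 can be reproduced with a short Python sketch. The helper names and the way new non-terminals are named (by appending primes) are assumptions made for illustration; alternatives are grouped by their first symbol and the longest common prefix of each group is factored out, repeating on the newly created non-terminals.

```python
def common_prefix(alternatives):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(grammar):
    """Left-factor every non-terminal; the empty list stands for ε."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    work = list(g)
    while work:
        nt = work.pop(0)
        groups = {}                      # first symbol -> alternatives
        for alt in g[nt]:
            groups.setdefault(alt[0] if alt else None, []).append(alt)
        new_alts = []
        for first, group in groups.items():
            if first is None or len(group) == 1:
                new_alts += group        # nothing to factor
                continue
            prefix = common_prefix(group)
            new_nt = nt + "'"
            while new_nt in g:           # pick an unused primed name
                new_nt += "'"
            g[new_nt] = [alt[len(prefix):] for alt in group]
            new_alts.append(prefix + [new_nt])
            work.append(new_nt)          # the new non-terminal may need factoring too
        g[nt] = new_alts
    return g

result = left_factor({"A": [["a", "d"], ["a"], ["a", "b"], ["a", "b", "c"], ["b"]]})
print(result)
```

The result corresponds to A → aA' | b, A' → d | ε | bA'', A'' → ε | c.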
NON-CONTEXT FREE LANGUAGE CONSTRUCTS
Some constructs in programming languages are not context-free. This means that we cannot write a context-free grammar for these constructs.
TOP-DOWN PARSING
Beginning with the start symbol, try to guess the
productions to apply to end up at the user's
program.
CHALLENGES IN TOP-DOWN PARSING
every program.
How can we know which productions to apply?
In general, we can't.
TOP-DOWN PARSING
Top-down parser
Recursive-Descent Parsing
Backtracking is needed (If a choice of a production rule does not
Not efficient
Predictive Parsing
No backtracking
Efficient
parser.
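RECURSIVE-DESCENT PARSING (USES BACKTRACKING)
Backtracking is needed.
It tries to find the left-most derivation.
S → aBc
B → bc | b
Input: abc
[Figure: two parse trees for S → aBc. In the first, B → bc is tried, the remaining c of S cannot be matched, so the parser fails and backtracks. In the second, B → b is tried.]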
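A minimal Python sketch of recursive descent with backtracking for the grammar S → aBc, B → bc | b. The generator-based structure and the function names are assumptions chosen for brevity, not the lecture's implementation; each alternative is tried in order, and when a later symbol fails the search falls back to the next alternative.

```python
GRAMMAR = {"S": [["a", "B", "c"]], "B": [["b", "c"], ["b"]]}

def derive(symbol, tokens, pos):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:                      # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:            # try alternatives in order
        yield from match_sequence(alternative, tokens, pos)

def match_sequence(symbols, tokens, pos):
    """Match a sequence of symbols, backtracking over earlier choices."""
    if not symbols:
        yield pos
        return
    for middle in derive(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, middle)

def accepts(text):
    """True if the whole input can be derived from S."""
    return any(end == len(text) for end in derive("S", text, 0))

print(accepts("abc"))   # B → bc is tried first, fails on the final c, then B → b succeeds
```

Note that this sketch would loop forever on a left-recursive grammar, which is one reason left recursion is eliminated before top-down parsing.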
RECURSIVE PREDICTIVE PARSING
Each non-terminal corresponds to a procedure.
A → aBb
proc A {
    - match the current token with a, and move to the next token;
    - call ‘B’;
    - match the current token with b, and move to the next token;
}
RECURSIVE PREDICTIVE PARSING (CONT.)
A → aBb | bAB
proc A {
    case of the current token {
        ‘a’: - match the current token with a, and move to the next token;
             - call ‘B’;
             - match the current token with b, and move to the next token;
        ‘b’: - match the current token with b, and move to the next token;
             - call ‘A’;
             - call ‘B’;
    }
}
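The procedure for A → aBb | bAB might look as follows in Python. The slides do not give the productions for B, so the rule B → c used here is purely hypothetical, as are the class and method names; the point is only that the current token selects the alternative, so no backtracking is needed.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Match the current token and move to the next one."""
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def A(self):
        if self.current == "a":              # A → aBb
            self.match("a")
            self.B()
            self.match("b")
        elif self.current == "b":            # A → bAB
            self.match("b")
            self.A()
            self.B()
        else:
            raise ParseError(f"unexpected token {self.current!r} in A")

    def B(self):                             # hypothetical rule B → c
        self.match("c")

def accepts(text):
    parser = Parser(text)
    try:
        parser.A()
        parser.match("$")                    # the whole input must be consumed
        return True
    except ParseError:
        return False

print(accepts("acb"), accepts("bacbc"), accepts("ab"))
```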
RECURSIVE PREDICTIVE PARSING (CONT.)
When to apply ε-productions.
A → aA | bB | ε
TOP-DOWN, PREDICTIVE PARSING: LL(1)
current token
stmt → if ...... |
       while ...... |
       begin ...... |
       for .....
When we are trying to write the non-terminal stmt, if
[Figure: non-recursive predictive parser, with an input buffer, a stack, a parsing table and an output.]
LL(1) PARSER
Input buffer
Contains the string to be parsed. We will assume that its end is marked with a special symbol $.
Output
A production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
Stack
Contains the grammar symbols.
At the bottom of the stack, there is a special end marker symbol $.
Initially the stack contains only the symbol $ and the starting symbol S ($S is the initial stack).
When the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.
Parsing table
A two-dimensional array M[A,a].
Each row is a non-terminal symbol.
Each column is a terminal symbol or the special symbol $.
Each entry holds a production rule.
LL(1) PARSER – PARSER ACTIONS
The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
There are four possible parser actions.
3. If X is a non-terminal,
the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
COMPUTE FIRST FOR ANY STRING X
Initially, for all non-terminals A, set
FIRST(A) = { t | A → tω for some ω }
Consider the grammar:
S → aC | bB
B → b
C → c
FIRST(S) = {a, b}; FIRST(B) = {b} and FIRST(C) = {c}
FIRST COMPUTATION WITH EPSILON
For all NT A where A → ε is a production, add ε to FIRST(A).
For eg. S → a | ε
FIRST(S) = {a, ε}
For each production A → α, where α is a string of NT whose FIRST sets contain ε, set
FIRST(A) = FIRST(A) ∪ { ε }.
For eg. S → AB | c ; A → a | ε ; B → b | ε
FIRST(S) = {a, b, c, ε} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε}
For each production A → αtω, where α is a string of NT whose FIRST sets contain ε, set
FIRST(A) = FIRST(A) ∪ { t }.
For eg. S → ABcD ; A → a | ε ; B → b | ε ; D → d
FIRST(S) = {a, b, c} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε} ; FIRST(D) = {d}
For each production A → αBω, where α is a string of NT whose FIRST sets contain ε, set
FIRST(A) = FIRST(A) ∪ (FIRST(B) - { ε }).
For eg. S → ABDc | f ; A → a | ε ; B → b | ε ; D → d
FIRST(S) = {a, b, d, f} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε} ; FIRST(D) = {d}
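The FIRST rules can be combined into one fixed-point iteration. The Python sketch below is an illustration under assumed names (`compute_first`, `first_of_string`) and an assumed representation (a production is a list of symbols, the empty list stands for ε); it repeats the rules until no FIRST set grows any further.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given the current FIRST sets."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})    # FIRST of a terminal is the terminal itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                    # this symbol cannot vanish, so stop here
    result.add(EPS)                          # every symbol can derive ε
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                           # repeat until nothing new is added
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

grammar = {
    "S": [["A", "B", "D", "c"], ["f"]],
    "A": [["a"], []],
    "B": [["b"], []],
    "D": [["d"]],
}
print(compute_first(grammar))
```

For S → ABDc | f this gives FIRST(S) = {a, b, d, f}, matching the last example.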
FOLLOW SET
The FOLLOW set represents the set of terminals that might come after a given nonterminal.
Formally:
FOLLOW(A) = { t | S ⇒* αAtω for some α, ω }
COMPUTE FOLLOW FOR ANY STRING X
FIRST(S) = { b, d, a }
FIRST(A) = { b, d, ε }
FIRST(B) = { b, ε }
FIRST(D) = { d, ε }
FOLLOW(S) = { $ } (Rule 1)
FOLLOW(A) = { a } (Rule 2)
FOLLOW(B) = { d, a } (Rule 2; Rule 3(ii))
FOLLOW(D) = { a } (Rule 3)
FIRST AND FOLLOW SET EXAMPLE
Consider the grammar
C → P F class id X Y
P → public | ε
F → final | ε
X → extends id | ε
Y → implements I | ε
I → id J
J → , I | ε
FIRST(C) = { public, final, class }
FIRST(P) = { public, ε }
FIRST(F) = { final, ε }
FIRST(X) = { extends, ε }
FIRST(Y) = { implements, ε }
FIRST(I) = { id }
FIRST(J) = { ‘,’ , ε }
FOLLOW(C) = { $ } (Rule 1)
FOLLOW(P) = { final, class } (Rule 2; Rule 3(ii))
FOLLOW(F) = { class } (Rule 2)
FOLLOW(X) = { implements, $ } (Rule 2; Rule 3(ii))
FOLLOW(Y) = { $ } (Rule 3(i))
FOLLOW(I) = { $ } (Rule 3(i))
FOLLOW(J) = { $ } (Rule 3(i))
LL(1) PARSING
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST EXAMPLE
FIRST(F) = { (, id }
FIRST(T') = { *, ε }
FIRST(T) = { (, id }
FIRST(E') = { +, ε }
FIRST(E) = { (, id }
FOLLOW EXAMPLE
Rules:
1. If S is the start symbol, $ is in FOLLOW(S).
2. If A → αBβ is a production rule, everything in FIRST(β) except ε is in FOLLOW(B).
3. If (i) A → αB is a production rule, or (ii) A → αBβ is a production rule and ε is in FIRST(β), everything in FOLLOW(A) is in FOLLOW(B).
Consider the following grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(F) = { (, id } ; FIRST(T') = { *, ε } ; FIRST(T) = { (, id } ; FIRST(E') = { +, ε } ; FIRST(E) = { (, id }
Applying the rules to each production:
E → TE'
    Rule 1: $ is in FOLLOW(E)
    Rule 2: A → αBβ: α is ε; B is T and β is E'
    Rule 3(i): A → αB: α is T; B is E'
    Rule 3(ii): A → αBβ: α is ε; B is T and β is E'; FIRST of β has ε
E' → +TE' | ε
    Rule 2: A → αBβ: α is +; B is T and β is E'
    Rule 3(i): A → αB: α is +T; B is E'
    Rule 3(ii): A → αBβ: α is +; B is T; β is E'; FIRST of β has ε
T → FT'
    Rule 2: A → αBβ: α is ε; B is F and β is T'
    Rule 3(i): A → αB: α is F; B is T'
    Rule 3(ii): A → αBβ: α is ε; B is F and β is T'; FIRST of β has ε
T' → *FT' | ε
    Rule 2: A → αBβ: α is *; B is F and β is T'
    Rule 3(i): A → αB: α is *F; B is T'
    Rule 3(ii): A → αBβ: α is *; B is F; β is T'; FIRST of β has ε
F → (E) | id
    Rule 2: A → αBβ: α is ‘(’; B is E and β is ‘)’
Result:
FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
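The three FOLLOW rules can also be applied mechanically until no set changes. The following Python sketch does this for the expression grammar, taking the FIRST sets of the FIRST example as given; the function names and the data layout are assumptions made for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
}

def first_of_string(symbols):
    """FIRST of a symbol sequence; the empty sequence gives {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})     # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}

def compute_follow(start):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)                                  # Rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, b in enumerate(alt):
                    if b not in GRAMMAR:                    # only non-terminals have FOLLOW
                        continue
                    beta_first = first_of_string(alt[i + 1:])
                    new = beta_first - {EPS}                # Rule 2
                    if EPS in beta_first:                   # Rule 3(i) and 3(ii)
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

print(compute_follow("E"))
```

An empty β is treated as deriving ε, so Rule 3(i) falls out as a special case of Rule 3(ii) in this sketch.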
CONSTRUCTING LL(1) PARSING TABLE -- ALGORITHM
for each production rule A → α of a grammar G
    for each terminal a in FIRST(α)
        add A → α to M[A,a]
    if ε in FIRST(α)
        for each terminal a in FOLLOW(A)
            add A → α to M[A,a]
    if ε in FIRST(α) and $ in FOLLOW(A)
        add A → α to M[A,$]
Example:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(E') has ε, so add E' → ε in the columns of FOLLOW(E')
FIRST(T') has ε, so add T' → ε in the columns of FOLLOW(T')
     id          +           *           (           )           $
E    E → TE'                             E → TE'
E'               E' → +TE'                           E' → ε      E' → ε
T    T → FT'                             T → FT'
T'               T' → ε      T' → *FT'               T' → ε      T' → ε
F    F → id                              F → (E)
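The table-construction algorithm translates almost line by line into Python. In this sketch the FIRST and FOLLOW sets of the expression grammar are taken as given, and the names (`build_table`, the `(non-terminal, terminal)` dictionary keys) are assumptions made for illustration. Each table entry is kept as a list so that a conflict (more than one production in a cell) would be visible.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"},
    "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"},
}

def first_of_string(symbols):
    """FIRST of a symbol sequence; the empty sequence gives {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}

def build_table():
    table = {}
    for a, alternatives in GRAMMAR.items():
        for alpha in alternatives:
            alpha_first = first_of_string(alpha)
            columns = alpha_first - {EPS}          # terminals in FIRST(α)
            if EPS in alpha_first:
                columns |= FOLLOW[a]               # FOLLOW(A), including $ if present
            for terminal in columns:
                table.setdefault((a, terminal), []).append(alpha)
    return table

table = build_table()
is_ll1 = all(len(entries) == 1 for entries in table.values())
print(len(table), is_ll1)
```

Every cell holds exactly one production here, which is what makes this grammar usable with an LL(1) parser.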
LL(1) PARSER – EXAMPLE 1
Stack       Input     Output
$E          id+id$    E → TE'
$E'T        id+id$    T → FT'
$E'T'F      id+id$    F → id
$E'T'id     id+id$
$E'T'       +id$      T' → ε
$E'         +id$      E' → +TE'
$E'T+       +id$
$E'T        id$       T → FT'
$E'T'F      id$       F → id
$E'T'id     id$
$E'T'       $         T' → ε
$E'         $         E' → ε
$           $         Accept
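The trace of Example 1 can be reproduced with a small table-driven parser. The Python sketch below hard-codes the parsing table of the expression grammar; the function name `ll1_parse` and the use of the built-in `SyntaxError` for rejected input are choices made here for illustration.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the productions of the left-most derivation, or raise SyntaxError."""
    stack = ["$", start]                     # $ at the bottom, start symbol on top
    tokens = list(tokens) + ["$"]            # $ marks the end of the input
    pos, output = 0, []
    while True:
        top, current = stack[-1], tokens[pos]
        if top == "$" and current == "$":
            return output                    # accept
        if top in NONTERMINALS:
            body = TABLE.get((top, current))
            if body is None:
                raise SyntaxError(f"no table entry for M[{top},{current}]")
            stack.pop()
            stack.extend(reversed(body))     # push Yk ... Y1 so that Y1 is on top
            output.append(f"{top} → {' '.join(body) or 'ε'}")
        elif top == current:
            stack.pop()                      # matching terminal: pop and advance
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {current!r}")

for production in ll1_parse(["id", "+", "id"]):
    print(production)
```

The printed productions appear in the same order as the Output column of the trace.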
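LL(1) PARSER – EXAMPLE 2
S → aBa
B → bB | ε
     a           b           $
S    S → aBa
B    B → ε       B → bB
Final step: stack $, input $, output: Accept, Successful
LL(1) PARSER – EXAMPLE2 (CONT.)
Outputs: S → aBa, B → bB, B → bB, B → ε
[Parse tree: S has children a, B, a; B has children b, B; that B has children b, B; the last B derives ε.]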
A GRAMMAR WHICH IS NOT
LL(1)
FIRST(2).
PROPERTIES OF LL(1) GRAMMARS
A grammar G is LL(1) if and only if the following conditions hold for any two distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
In other words, a grammar G is LL(1) iff for any productions A → ω1 and A → ω2, the sets