
CHAPTER FOUR

Syntax Analysis and the Parser


The Role of the Parser (Syntax Analyzer) and Its Parsing Process
Syntax analysis/parser:
• The parser accepts the stream of tokens produced by lexical analysis and checks whether the token stream follows the grammar of the language or not; if it does, the parser produces the parse tree.
• The parser gets information about the various tokens from the symbol table.
cont…
• The methods commonly used in compilers can be classified as:
• top-down
• top-down methods build parse trees from the top (root) to the bottom (leaves)
• bottom-up
• these methods start from the leaves and work their way up to the root.
• In either case, the input to the parser is scanned from left to right, one symbol at a time.
Context-free grammars
• Like regular expressions, context-free grammars describe sets of strings, i.e., languages.
• A CFG is a formal grammar used to generate all possible strings of a given formal language.
• In formal language theory, a context-free language (CFL) is a language generated by a CFG.
• The automaton used to recognize a CFL is the pushdown automaton.
• The automaton used to recognize a regular language (RL) is the finite-state automaton.
• If a grammar's language is accepted by a finite automaton, that grammar is called a regular grammar.
Context-free grammars
• A CFG can define both regular and non-regular languages, whereas a regular expression (RE) only defines regular languages.
• Regular language: a language that an RE can define
• L = {bb, abb, aabb, …}, RE = a*bb
• Non-regular language: a language that cannot be defined by an RE
• L = {Є, ab, aabb, aaabbb, …}
• A CFG is more powerful than a regular expression because it has more expressive power.
Context-free grammars
• A CFG is defined by a 4-tuple G = (V, T, S, P), where
• V = set of variables or non-terminal symbols, denoted by uppercase letters
• T = set of terminal symbols, denoted by lowercase letters
• S = start symbol
• P = set of production rules, each of the form A → α, where α Є (V U T)* and A Є V
• In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form V → w
• where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty).
• It does not matter which symbols surround the nonterminal; the single nonterminal on the left-hand side can always be replaced by the right-hand side.
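For illustration only (not part of the original slides), such a 4-tuple can be represented in code as a map from each non-terminal to its list of alternative right-hand sides; the type alias Grammar and the choice of storing the S → aAb, A → aAb | Є grammar used in the next example are assumptions made here:

#include <iostream>
#include <map>
#include <string>
#include <vector>

// A production A -> alt1 | alt2 | ... is stored as g['A'] = {alt1, alt2, ...}.
// Uppercase letters are non-terminals, lowercase letters are terminals,
// and "" stands for Є (the empty string).
using Grammar = std::map<char, std::vector<std::string>>;

int main() {
    Grammar g;
    g['S'] = {"aAb"};        // S -> aAb   (the a^n b^n grammar of the next example)
    g['A'] = {"aAb", ""};    // A -> aAb | Є

    for (const auto& [lhs, alts] : g)
        for (const std::string& rhs : alts)
            std::cout << lhs << " -> " << (rhs.empty() ? std::string("Є") : rhs) << "\n";
}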
Context-free grammars
• Example: for generating the language of strings with an equal number of a’s and b’s in the form aⁿbⁿ (this is not a regular language because it cannot be recognized by a finite-state machine/automaton), the CFG can be defined as
• G = ({S, A}, {a, b}, S, {S → aAb, A → aAb | Є})
• Example: construct a CFG for the language having any number of a’s over the set ∑ = {a}.
• L = {Є, a, aa, aaa, …}
• RE = a*
• Production rules: S → aS
                    S → Є
Derivation
• The process of deriving a string by applying a set of production rules is called derivation.
• The basic idea of derivation is to consider productions as rewrite rules:
• Whenever we have a nonterminal, we can replace this with the right-hand side of any
production in which the nonterminal appears on the left-hand side.
• We can do this anywhere in a sequence of symbols (terminals and nonterminals) and repeat
doing so until we have only terminals left.
• Grammar rules determine the legal strings of token symbols using derivations.
• A derivation is a sequence of replacements of structure names by choices on the right-hand
sides of grammar rules.
• A derivation begins with a single structure name and ends with a string of token symbols.
• At each step in a derivation, a single replacement is made using one choice from a grammar
rule.
Derivation
• How grammar rules determine a "language", i.e. the set of legal strings of tokens
• Example string of tokens: (number - number) * number
Derivation
• Example 1: A CFG for ab* = { a, ab, abb, abbb, abbbb, . . . . }
1. Terminals: ∑ = {a, b},
2. Nonterminal: N = {S, B}
3. Productions:
P = { S -> aB, B -> bB | ε }
DERIVATION of abbb using the CFG of example 1:
S => aB => abB => abbB => abbbB => abbb
• Example 2: A CFG for aba* = { ab, aba, abaa, abaaa, . . . . }
S -> abA, A -> aA | ε
• DERIVATION of abaaa using the CFG of example 2:
S => abA => abaA=> abaaA => abaaaA => abaaa
Example 3: A CFG for ab*a = { aa, aba, abba, abbba, . . . . }
S -> aBa, B -> bB | ε
• DERIVATION of abbbba using the CFG of example 3:
S => aBa => abBa => abbBa => abbbBa => abbbbBa => abbbba
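To make the rewrite-rule view of derivation concrete, the following sketch (added here as an illustration; the helper name applyLeftmost is an assumption, not from the slides) replaces the leftmost occurrence of a nonterminal in a sentential form and reproduces the derivation of abbb from Example 1:

#include <iostream>
#include <string>

// Replace the leftmost occurrence of nonterminal `nt` (an uppercase letter)
// in the sentential form with the right-hand side `rhs` ("" means epsilon).
std::string applyLeftmost(std::string form, char nt, const std::string& rhs) {
    std::size_t pos = form.find(nt);
    if (pos != std::string::npos)
        form.replace(pos, 1, rhs);
    return form;
}

int main() {
    // Grammar of Example 1:  S -> aB,  B -> bB | epsilon
    std::string form = "S";
    form = applyLeftmost(form, 'S', "aB");   // S     => aB
    form = applyLeftmost(form, 'B', "bB");   // aB    => abB
    form = applyLeftmost(form, 'B', "bB");   // abB   => abbB
    form = applyLeftmost(form, 'B', "bB");   // abbB  => abbbB
    form = applyLeftmost(form, 'B', "");     // abbbB => abbb
    std::cout << form << "\n";               // prints: abbb
}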
Derivation order
There are two types of derivation:
• A leftmost derivation: a derivation in which the leftmost nonterminal is replaced at each
step in the derivation. In any step, the leftmost nonterminal is expanded.
• A rightmost derivation: a derivation in which the rightmost nonterminal is replaced at
each step in the derivation. In any step, the rightmost nonterminal is expanded.
• Example: let G be a CFG for which the production rules are
• S → aAB,  A → bBb,  B → A | Є; derive the string abbbb by:
• Leftmost derivation
• S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb
• Rightmost derivation
• S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb ⇒ abbBbb ⇒ abbbb
Derivation Trees
• The geometrical/diagrammatical representation of a derivation is called a parse tree or derivation tree.
• These trees are also called syntax trees, generation trees, or production trees.
• The yield of a parse tree is the final string obtained by concatenating the labels of the leaves of the tree from left to right, ignoring the nulls (Є). However, if all the leaves are null, the yield is null.
 We can draw a derivation as a tree:
• The root of the tree is the start symbol of the grammar
• i.e. root vertex: must be labelled by the start symbol
• The leaves of the tree are terminals
• i.e. leaves: labelled by terminal symbols or Є
Derivation tree
• Example: Let the production rules of a CFG be
• X → X+X | X*X | X | a over the alphabet {a}.
• The leftmost derivation for the string "a+a*a" may be:
• X ⇒ X+X ⇒ a+X ⇒ a+X*X ⇒ a+a*X ⇒ a+a*a
Derivation tree
• The rightmost derivation for the above string "a+a*a" may be:
• X ⇒ X*X ⇒ X*a ⇒ X+X*a ⇒ X+a*a ⇒ a+a*a
Derivation Trees
• Example: Let a CFG (N, T, P, S) have N = {S}, T = {a, b}, starting symbol S, and
  P = S → SS | aSb | ε. One derivation from the above CFG is “abaabb”:
• S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abaaSbb ⇒ abaabb
Ambiguous grammars
• For ambiguous grammars:
• More than one leftmost derivation and more than one rightmost derivation exist for at least one string.
• The different derivations of such a string correspond to different parse trees.
• For unambiguous grammars:
• A unique leftmost derivation and a unique rightmost derivation exist for every string.
• The leftmost derivation and the rightmost derivation correspond to the same parse tree.
Ambiguity:
• A grammar is said to be ambiguous if it generates more than one parse tree for some input string.
• Example: E → E+E | E*E | (E) | a is ambiguous for the string "a+a*a".
Removal of ambiguity
• An ambiguous grammar has precedence and associativity problems (when different operators, or repeated uses of the same operator, appear in one expression).
• Associativity: solved by recursion (the tree grows in a particular direction).
• Precedence: solved by assigning levels (the higher the precedence, the lower the level in the grammar).
• Example 1: S → S+S | a, string "a+a+a"; an unambiguous version is S → S+a | a.
• Example 2: S → S+S | S*S | a, string "a+a*a"; an unambiguous version is
  S → S+T | T
  T → T*F | F
  F → a
parser
• Top down
  • Recursive descent
    • with backtracking
    • without backtracking
      • Predictive parser
        • LL(1)
• Bottom up
  • Operator precedence
  • LR
    • LR(0)
    • SLR(1)
Parser
• The parser (also called the syntax analyzer) checks whether the token sequence is syntactically correct.
• If the syntax is correct, the parser generates the parse tree.
• If the syntax is not correct, it reports an error message to the user.
• There are two types of parsing technique:
• Top-down parsing: the derivation process starts from the topmost node (root node) and continues until the leaf nodes are reached.
• Bottom-up parsing: the derivation process starts from the leaf nodes, and the production rules are applied in reverse until the root symbol is reached.
Top-Down Parsing
• The process of constructing a parse tree starting from the root and proceeding to the leaves is called top-down parsing (TDP).
• A parser is top-down if it discovers a parse tree from top to bottom.
• A top-down parse corresponds to a preorder traversal of the parse tree.
• A leftmost derivation is applied at each derivation step.
• E.g., S → aABe, A → bc, B → d, string "abcde":
  S ⇒ aABe ⇒ abcBe ⇒ abcde
• A top-down parser can be constructed for a grammar only if it is free from ambiguity and left recursion.
Recursive descent parsing
• It is a top-down parsing technique that constructs the parse tree from the top and reads the input from left to right.
• This technique recursively parses the input to build a parse tree, which may or may not require backtracking.
• In recursive descent parsing we write a recursive procedure for every non-terminal in the grammar, i.e. a procedure is associated with each non-terminal of the grammar.
• The steps for constructing a recursive descent parser (see the example that follows):
• Step 1: If the symbol is a non-terminal, call the corresponding procedure of that non-terminal.
• Step 2: If the symbol is a terminal, compare it with the current input symbol, and if they are the same, advance the input pointer.
• Step 3: If a non-terminal has more than one production, handle all of those alternatives in the corresponding procedure.
Recursive descent parsing Example
Example: E → iE',  E' → +jE' | Є,  input string "i+j$"

#include <iostream>
#include <string>
std::string input = "i+j$";   // input string, terminated by $
std::size_t pos = 0;          // current input position
void Eprime() {               // E' -> + j E' | epsilon
    if (input[pos] == '+') {
        pos++;
        if (input[pos] == 'j') pos++;
        Eprime();
    }                         // else: apply E' -> epsilon, consume nothing
}
void E() {                    // E -> i E'
    if (input[pos] == 'i') pos++;
    Eprime();
}
int main() {
    E();
    if (input[pos] == '$') std::cout << "success";
}
Backtracking
• The parse tree is started from the root node, and the input string is matched against the production rules, which are used to replace the non-terminals.
• Example: S → cAd, A → ab | d, string w = “cdd”
• Trying A → ab first yields cabd, which does not match the input string.
• So the parser backtracks and tries the next production, A → d.
• This yields cdd, which matches the input string. The process is repeated until the input string is produced.
• Limitation: if the given grammar has many alternatives, the cost of backtracking is high.
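A minimal sketch of this backtracking behaviour for the same toy grammar S → cAd, A → ab | d is shown below (added here for illustration; the function names match, A and S are assumptions, not from the slides):

#include <iostream>
#include <string>

std::string input = "cdd";
std::size_t pos = 0;

// Try to match a terminal; advance on success.
bool match(char c) {
    if (pos < input.size() && input[pos] == c) { pos++; return true; }
    return false;
}

// A -> ab | d, with backtracking: if one alternative fails,
// restore the input position and try the next one.
bool A() {
    std::size_t saved = pos;
    if (match('a') && match('b')) return true;   // A -> ab
    pos = saved;                                 // backtrack
    if (match('d')) return true;                 // A -> d
    pos = saved;
    return false;
}

// S -> cAd
bool S() {
    std::size_t saved = pos;
    if (match('c') && A() && match('d')) return true;
    pos = saved;
    return false;
}

int main() {
    if (S() && pos == input.size()) std::cout << "accepted";
    else std::cout << "rejected";
}

On the input "cdd", A first tries the alternative ab, fails, restores the input position, and then succeeds with d, after which S matches the remaining d.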
Predictive parsing
• A predictive parser is a recursive descent parser that has the capability to predict which production is to be used to replace the input string.
• The predictive parser does not suffer from backtracking.
• To accomplish its task, the predictive parser uses a look-ahead pointer, which points to the next input symbol.
• To make the parser backtracking-free, the predictive parser puts some constraints on the grammar and accepts only the class of grammars known as LL(k) grammars.
• It is a type of recursive descent parsing with no backtracking.
• It is also called an LL(1) parser.
LL Parsing
• Uses an explicit stack rather than recursive calls to perform a parse
• LL(k) parsing means that
• k symbols of lookahead are used on the input string (how far ahead the parser looks into the input)
• Generally k = 1, i.e. LL(1)
• The first L means that the token sequence is read from left to right
• The second L means a leftmost derivation is applied at each step
• LL(1) grammar: a grammar whose parsing table has no multiply-defined entries
Definition of LL(1)
• An LL parser consists of:
• an input buffer that holds the input string to be parsed
• a parser stack that holds grammar symbols: non-terminals and tokens
• a parsing table that specifies the parser action
• the LL(1) driver, which interacts with the parser stack, the parsing table, and the input buffer
• Structure of LL(1): the input buffer (e.g. a + b $) feeds the LL(1) driver, which consults the parsing table and manipulates the stack (with $ marking the bottom of the stack).
Construction of predictive LL(1) parsing
• Construction of an LL(1) parser:
1. Elimination of left recursion
2. Left factoring (elimination of common prefixes)
3. Calculation of the FIRST and FOLLOW functions
4. Construction of the parsing table using the FIRST and FOLLOW functions
5. Stack implementation
6. Check whether the input string is accepted by the parser or not
• Finding FIRST and FOLLOW
• The FIRST and FOLLOW sets are needed so that the parser can properly apply the needed production rule at the correct position.
Eliminating left recursion
• A grammar is left recursive if it has a non-terminal A such that there is a production of the form A → Aα.
• A top-down parser cannot handle a grammar that contains a left-recursive production, so a transformation is needed to eliminate left recursion.
• Left recursion in a production may be removed by transforming the grammar in the following way:
• replace A → Aα | β with A → βA', A' → αA' | Є

Example: E → E+T | T
         T → T*F | F
• Left recursion of E → E+T | T is replaced with E → TE', E' → +TE' | Є
• Left recursion of T → T*F | F is replaced with T → FT', T' → *FT' | Є
Left Factoring of Common Prefixes
• Another problem for an LL parser is a common prefix among alternatives.
• An LL(1) parser cannot predict which production to apply.
• The solution is to left-factor the common prefix.
• General form: A → αβ1 | αβ2
• Left factoring solution: A → αA'
                           A' → β1 | β2
• Example: S → iEtS | iEtSeS | a
           E → b
• Left factoring solution: S → iEtSS' | a
                           S' → Є | eS
                           E → b
FIRST function
• FIRST(α) is the set of terminal symbols that can begin a string derived from α; we take only terminal symbols.
• E.g. A → abc | def | ghi
• FIRST(A) = {a, d, g}
• Rules for calculating FIRST functions
rule 1: for a production rule
X → ε, FIRST(X) = {ε}
rule 2: for any terminal symbol a,
FIRST(a) = {a}
FIRST function
rule 3: for a production rule
X → Y1Y2Y3
• For calculating FIRST(X):
• If ε does not belong to FIRST(Y1), then FIRST(X) = FIRST(Y1)
• If ε does belong to FIRST(Y1), then FIRST(X) = {FIRST(Y1) - ε} U FIRST(Y2Y3)
• For calculating FIRST(Y2Y3):
• If ε does not belong to FIRST(Y2), then FIRST(Y2Y3) = FIRST(Y2)
• If ε does belong to FIRST(Y2), then FIRST(Y2Y3) = {FIRST(Y2) - ε} U FIRST(Y3)
• Similarly, we can expand the rule for any production rule
X → Y1Y2Y3 … YN
FOLLOW function
• FOLLOW() function:
• indicates which terminals can follow a non-terminal;
• it checks the right-hand sides of the productions.
• FOLLOW(α) is the set of terminal symbols that can appear immediately to the right of α.
• Rules for calculating FOLLOW functions
rule 1: for the starting symbol S, place $ in FOLLOW(S)
rule 2: for a production rule
A → αB
FOLLOW(B) = FOLLOW(A)
rule 3: for any production rule
A → αBβ
• If ε does not belong to FIRST(β), then FOLLOW(B) = FIRST(β)
• If ε does belong to FIRST(β), then FOLLOW(B) = {FIRST(β) - ε} U FOLLOW(A)
Example of FIRST and FOLLOW
• S → ABCDE    FIRST(S) = {a, b, c}    FOLLOW(S) = {$}
• A → a | Є    FIRST(A) = {a, Є}       FOLLOW(A) = {b, c}
• B → b | Є    FIRST(B) = {b, Є}       FOLLOW(B) = {c}
• C → c        FIRST(C) = {c}          FOLLOW(C) = {d, e, $}
• D → d | Є    FIRST(D) = {d, Є}       FOLLOW(D) = {e, $}
• E → e | Є    FIRST(E) = {e, Є}       FOLLOW(E) = {$}
Example of LL(1) parser
Example: Calculate the FIRST and FOLLOW functions for the given grammar and construct the parse tree for the input string "abd$".
• S → A
  A → aB | Ad
  B → b
  C → g
The given grammar is left recursive, so first remove the left recursion from the grammar.
After eliminating left recursion:
  S → A
  A → aBA'
  A' → dA' | ε
  B → b
  C → g
Step 1: find the FIRST and FOLLOW functions
• FIRST function                  FOLLOW function
  FIRST(S) = FIRST(A) = {a}       FOLLOW(S) = {$}
  FIRST(A) = {a}                  FOLLOW(A) = FOLLOW(S) = {$}
  FIRST(A') = {d, ε}              FOLLOW(A') = FOLLOW(A) = FOLLOW(S) = {$}
  FIRST(B) = {b}                  FOLLOW(B) = {FIRST(A') - ε} U FOLLOW(A) = {d, $}
  FIRST(C) = {g}                  FOLLOW(C) = not applicable, because the non-terminal C does not appear on the right-hand side of any production
Step 2: construct the parsing table using the FIRST and FOLLOW functions

       a          d          b        g        $
S      S → A
A      A → aBA'
A'                A' → dA'                     A' → ε
B                            B → b
C                                     C → g
Step 3: stack implementation using the parsing table
Stack        Input     Production / action
S$           abd$      S → A
A$           abd$      A → aBA'
aBA'$        abd$      pop a and advance (a matches the input)
BA'$         bd$       B → b
bA'$         bd$       pop b and advance
A'$          d$        A' → dA'
dA'$         d$        pop d and advance
A'$          $         A' → ε
$            $         Accept (the input is properly parsed)
Note: after parsing the given input string, if the stack contains only the dollar ($) symbol, then the string is accepted by the parser, and the parser produces the corresponding parse tree.
Step 4: generate the parse tree from the stack implementation

S
  A
    a
    B
      b
    A'
      d
      A'
        ε
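The stack procedure of Step 3 can also be written as a small table-driven driver. The sketch below is an added illustration (the table encoding, the use of 'Z' to stand for A', and names such as table are assumptions, not from the slides); it parses "abd$" and prints the same sequence of productions:

#include <cctype>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Parsing table from Step 2: table[{non-terminal, lookahead}] = right-hand side.
    // 'Z' stands for A', "" stands for epsilon.
    std::map<std::pair<char, char>, std::string> table = {
        {{'S', 'a'}, "A"},  {{'A', 'a'}, "aBZ"},
        {{'Z', 'd'}, "dZ"}, {{'Z', '$'}, ""},
        {{'B', 'b'}, "b"},  {{'C', 'g'}, "g"}};

    std::string input = "abd$";
    std::size_t ip = 0;
    std::vector<char> stack = {'$', 'S'};       // top of the stack is the back

    while (!stack.empty()) {
        char top = stack.back(), look = input[ip];
        if (top == '$' && look == '$') { std::cout << "Accept\n"; break; }
        if (!std::isupper(top)) {               // terminal on top: must match the input
            if (top != look) { std::cout << "Error\n"; break; }
            stack.pop_back(); ip++;             // pop and advance
        } else {                                // non-terminal: consult the table
            auto it = table.find({top, look});
            if (it == table.end()) { std::cout << "Error\n"; break; }
            std::cout << top << " -> "
                      << (it->second.empty() ? std::string("epsilon") : it->second) << "\n";
            stack.pop_back();
            for (auto r = it->second.rbegin(); r != it->second.rend(); ++r)
                stack.push_back(*r);            // push the RHS in reverse order
        }
    }
}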
Bottom-Up Parsing
 Attempts to traverse a parse tree bottom up (post-order traversal)
 Reduces a sequence of tokens to the start symbol
 At each reduction step, the RHS of a production is replaced with the LHS
 A reduction step corresponds to the reverse of a rightmost derivation step
 Example: given the following grammar
    E → E+T | T
    T → T*F | F
    F → ( E ) | id
  a rightmost derivation for id + id * id is shown below:
    E ⇒rm E + T ⇒rm E + T * F ⇒rm E + T * id
      ⇒rm E + F * id ⇒rm E + id * id ⇒rm T + id * id
      ⇒rm F + id * id ⇒rm id + id * id
Bottom-up parsing / Shift-reduce parsing
• Shift-reduce parsing is a process of reducing a string to the start symbol of a grammar.
• The main actions are shift and reduce.
• A bottom-up parser is also known as a shift-reduce parser.
• Shift-reduce parsing uses:
• a stack: to store the symbols of the grammar; initially the bottom of the stack contains $
• an input buffer: to store the input string to be parsed; the end of the input buffer is denoted by $
• $: marks the bottom of the stack as well as the end of the input buffer
Stack Implementation of a Bottom-Up Parser
Basic operations/actions:
• Shift: moving symbols from the input buffer onto the stack.
• Reduce: if the handle (a substring that matches the right-hand side of a production) appears on top of the stack, it is reduced using the appropriate production rule,
• i.e. the RHS of the production rule is popped off the stack and the LHS of the production rule is pushed onto the stack.
• Accept: if only the start symbol is present on the stack and the input buffer is empty, the parsing action is called accept. When the accept action is reached, parsing has completed successfully.
• Error: the situation in which the parser can perform neither a shift action nor a reduce action, and not even an accept action.
Stack Implementation of a Bottom-Up Parser
There are two conflicts which arise in shift-reduce parsing:
• Shift-reduce conflict: may occur when the parser has a choice between a shift action and a reduce action, but it must select only one action.
• Reduce-reduce conflict: may occur when more than one reduction is possible for the corresponding handle.
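As an added illustration, the if-then-else grammar from the left-factoring slide, S → iEtS | iEtSeS | a, gives rise to a classic shift-reduce conflict: after the parser has recognized iEtS and the next input token is e, it can either reduce iEtS to S or shift e in order to continue towards iEtSeS (the well-known dangling-else situation), and a deterministic shift-reduce parser must be told which of the two actions to prefer.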
Example of shift-reduce parsing
Consider the parsing of the input string id + id * id using the grammar:
  E → E+T | T
  T → T*F | F
  F → ( E ) | id

Stack            Input              Action
$                id + id * id $     shift
$ id             + id * id $        reduce by F → id
$ F              + id * id $        reduce by T → F
$ T              + id * id $        reduce by E → T
$ E              + id * id $        shift
$ E +            id * id $          shift
$ E + id         * id $             reduce by F → id
$ E + F          * id $             reduce by T → F
$ E + T          * id $             shift
$ E + T *        id $               shift
$ E + T * id     $                  reduce by F → id
$ E + T * F      $                  reduce by T → T * F
$ E + T          $                  reduce by E → E + T
$ E              $                  accept
We use $ to mark the bottom of the stack as well as the end of the input.
Operator precedence parsing
• A grammar G is called an operator precedence grammar if it meets the following conditions:
• There is no production rule that contains epsilon (ε) on its RHS.
• There is no production rule that contains two adjacent non-terminals on its RHS.
• A parser that reads and understands an operator precedence grammar is called an operator precedence parser.
• Precedence relations can only be established between the terminals of the grammar; non-terminals are ignored.
There are 3 operator precedence relations:
• a > b: terminal a has higher precedence than terminal b.
• a < b: terminal a has lower precedence than terminal b.
• a = b: terminals a and b have the same precedence.
• Precedence table (shown below)
• id (and other operands such as a, b, c) has the highest precedence and $ the lowest; + > + and * > * (left associativity); the entry of $ against $ is A (accept).
      +   -   *   /   ^   id  (   )   $
+     >   >   <   <   <   <   <   >   >
-     >   >   <   <   <   <   <   >   >
*     >   >   >   >   <   <   <   >   >
/     >   >   >   >   <   <   <   >   >
^     >   >   >   >   <   <   <   >   >
id    >   >   >   >   >           >   >
(     <   <   <   <   <   <   <   =
)     >   >   >   >   >           >   >
$     <   <   <   <   <   <   <       A
Operator precedence parsing
• Steps of operator precedence parsing:
1. Check whether the grammar is an operator precedence grammar; if not, convert the given grammar to an operator precedence grammar.
2. Construct the operator precedence relation table.
3. Parse the string.
4. Generate the parse tree.
Example
• Construct an operator precedence parser for E → EAE | id, A → + | *, then parse the string "id+id*id".
• Step 1: convert the given grammar to an operator precedence grammar
• E → E+E | E*E | id
• Step 2: construct the operator precedence table

       id    +    *    $
  id         >    >    >
  +    <     >    <    >
  *    <     >    >    >
  $    <     <    <    A
Step 3: parse the given string "id+id*id"
  Stack           Relation   Input        Action
  $               <          id+id*id$    shift id
  $ id            >          +id*id$      reduce by E → id
  $ E             <          +id*id$      shift +
  $ E +           <          id*id$       shift id
  $ E + id        >          *id$         reduce by E → id
  $ E + E         <          *id$         shift *
  $ E + E *       <          id$          shift id
  $ E + E * id    >          $            reduce by E → id
  $ E + E * E     >          $            reduce by E → E*E
  $ E + E         >          $            reduce by E → E+E
  $ E             A          $            Accept
Step 4: generate the parse tree

        E
      / | \
     E  +  E
     |    / | \
     id  E  *  E
         |     |
         id    id
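The table of Step 2 and the loop of Step 3 can be sketched in code as follows. This is an added, simplified illustration, not part of the slides: it keeps only terminals on the stack (since operator precedence parsing ignores non-terminals), writes 'i' for the token id, and assumes the input is well formed with respect to the table:

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Operator precedence relations from Step 2 ('i' stands for the token id).
    std::map<std::pair<char, char>, char> rel = {
        {{'i', '+'}, '>'}, {{'i', '*'}, '>'}, {{'i', '$'}, '>'},
        {{'+', 'i'}, '<'}, {{'+', '+'}, '>'}, {{'+', '*'}, '<'}, {{'+', '$'}, '>'},
        {{'*', 'i'}, '<'}, {{'*', '+'}, '>'}, {{'*', '*'}, '>'}, {{'*', '$'}, '>'},
        {{'$', 'i'}, '<'}, {{'$', '+'}, '<'}, {{'$', '*'}, '<'}, {{'$', '$'}, 'A'}};

    std::string input = "i+i*i$";        // id + id * id, with $ marking the end
    std::size_t ip = 0;
    std::vector<char> stack = {'$'};     // only terminals are kept on the stack

    while (true) {
        char a = stack.back(), b = input[ip];
        char r = rel.count({a, b}) ? rel[{a, b}] : '?';   // '?' means no relation: error
        if (r == 'A') { std::cout << "Accept\n"; break; }
        if (r == '<' || r == '=') {      // shift the lookahead symbol
            std::cout << "shift " << b << "\n";
            stack.push_back(b); ip++;
        } else if (r == '>') {           // reduce the handle on top of the stack
            char popped;
            do {
                popped = stack.back(); stack.pop_back();
            } while (rel[{stack.back(), popped}] != '<');
            std::cout << "reduce handle ending with " << popped << "\n";
        } else { std::cout << "Error\n"; break; }
    }
}

Run on "i+i*i$", this prints the same sequence of shift and reduce actions as the trace in Step 3.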
Reading assignment
Types of LR parsers
