UNIT II:
Top Down Parsing: Pre Processing Steps of Top Down Parsing, Backtracking, Recursive
Descent Parsing, LL (1) Grammars, Non-recursive Predictive Parsing, Error Recovery in
Predictive Parsing.
Bottom Up Parsing: Introduction, Difference between LR and LL Parsers, Types of LR Parsers,
Shift Reduce Parsing, SLR Parsers, Construction of SLR Parsing Tables, More Powerful LR
Parsers, Construction of CLR (1) and LALR Parsing Tables, Dangling Else Ambiguity, Error
Recovery in LR Parsing, Handling Ambiguous Grammars with LR Parsers.
A compiler consists of several functional modules. One of them is the parser, which takes the
output of the lexical analyzer (usually a list of tokens) and builds a parse tree. The main
responsibility of the parser is to check whether the input string can be generated by the grammar
of the language; this check is the analysis of syntax.
What is a Parser?
The parser is the phase of the compiler that takes a string of tokens as input and, with the help of
the grammar of the language, converts it into the corresponding Intermediate Representation (IR),
typically a parse tree. The parser is also known as the Syntax Analyzer.
Types of Parsers
The parser is mainly classified into two categories.
Top-down Parser
Bottom-up Parser
Both parsing techniques work on the following principles:
1. The parser scans the input string from left to right and constructs either a Left Most
Derivation or a Right Most Derivation.
2. The parser makes use of production rules for choosing the appropriate derivation.
3. The different parsing techniques use different approaches in selecting appropriate rules for
derivation, and finally a parse tree is constructed.
When the parse tree is constructed from the root and expanded towards the leaves, the parser
is called a top-down parser. The parse tree is built from top to bottom.
When the parse tree is constructed from the leaves up to the root, the parser is called a
bottom-up parser. The parse tree is built in a bottom-up manner.
Top-down Parsing:
A top-down parser generates the parse tree for the given input string with the
help of grammar productions by expanding the non-terminals. It starts from the start
symbol and ends at the terminals. It uses leftmost derivation.
Top-down parsers are further classified into 2 types:
1) Recursive descent parser. 2) Non-recursive descent parser.
A recursive descent parser in its general form is also known as a brute force parser or
backtracking parser. It generates the parse tree by trying the alternatives of a production
one by one and backtracking when an alternative fails.
A non-recursive descent parser is also known as an LL(1) parser, predictive parser,
parser without backtracking, or dynamic parser. It uses a parsing table to generate the
parse tree instead of backtracking.
In top-down parsing the parse tree is generated from top to bottom, i.e., from the root to the leaves.
The derivation terminates when the required input string has been derived. The main task in
top-down parsing is to find the appropriate production rule in order to produce the correct input
string.
Let us consider the grammar.
S-> a A b
A-> c d|c
Consider the input string “a c b” as shown below.
a c b
↑ Input Buffer
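The parse tree for this input string can be constructed with the top-down approach: expand S to a A b and match a; try A → c d, which matches c but fails because d does not match b; backtrack and try A → c, which matches c; finally match b.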
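The backtracking behaviour on this grammar can be sketched in Python. This is a minimal illustration, not a production parser; the names GRAMMAR, parse and accepts are hypothetical, and the input is assumed to be a string of space-separated tokens.

```python
# Grammar from the text: S -> a A b ; A -> c d | c
GRAMMAR = {
    "S": [["a", "A", "b"]],
    "A": [["c", "d"], ["c"]],
}


def parse(symbols, tokens, pos):
    """Yield every input position reachable after deriving `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each alternative in order. When one alternative
        # yields nothing, the loop moves on to the next one (backtracking).
        for alternative in GRAMMAR[head]:
            yield from parse(alternative + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        # Terminal: it must match the current input symbol.
        yield from parse(rest, tokens, pos + 1)


def accepts(text):
    """Return True if the whole token string can be derived from S."""
    tokens = text.split()
    return any(end == len(tokens) for end in parse(["S"], tokens, 0))


print(accepts("a c b"))    # first tries A -> c d, fails on d, then A -> c
print(accepts("a c d b"))
print(accepts("a b"))
```

For the input a c b the first alternative A → c d is abandoned when d is compared with b, and the second alternative A → c succeeds, which is the backtracking step described in the text.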
Problems with top-down parsing:
Top-down parsing has certain problems that must be eliminated before the parser can be
implemented. These problems are of 4 types.
1.Back Tracking
2.Left Recursion
3.Left Factoring (alternatives that share a common prefix)
4.Ambiguity
Top down parsing is classified into 2 types.
1)Backtracking
2)Predictive Parsing
Backtracking : A backtracking parser tries different production rules to find a match
for the input string, backtracking each time an alternative fails.
This is more powerful than predictive parsing, but the technique is slower and can require
exponential time in the worst case. Hence backtracking is not preferred for practical compilers.
Predictive Parsing: The predictive parser tries to predict the construction of the tree using one
or more lookahead symbols from the input string. There are 2 types of predictive parsers.
1)Recursive descent Parser
2)LL(1) Parser.
1) Recursive descent Parser:
A parser that uses a collection of recursive procedures for parsing the given input string is
called a recursive descent parser.
Advantages :
1) Recursive descent parsers are simple to build.
2) A recursive descent parser can be constructed directly from the grammar: it is easy to
write a recursive procedure for each non-terminal of the given grammar.
Disadvantages :
1) Recursive descent parsers are not very efficient compared to other parsing techniques.
2) Recursive descent parsers cannot provide good error messages.
3) It is difficult to parse the string if the lookahead needed is arbitrarily long.
Basic steps to construct a recursive descent parser
Step-1
If the next symbol in the production body is a non-terminal, call the procedure
corresponding to that non-terminal.
Step-2
If the symbol is a terminal, it is matched with the lookahead from the input. The
lookahead pointer is advanced when the terminal matches.
Step-3
If a production rule has many alternatives, all these alternatives have to be
combined into a single procedure.
Step-4
The parser is activated by calling the procedure corresponding to the start symbol.
Example:
Consider the grammar
E ➡ num T
T ➡ * num T / ϵ
Procedure E
{
    if lookahead = num then {
        match(num);
        T();
    }
    else
        error;
    if lookahead = $ {
        declare success;
    }
    else
        error;
}
Procedure T
{
    if lookahead = '*' {
        match('*');
        if lookahead = 'num' {
            match(num);
            T();
        }
        else
            error;
    }
    else
        NULL
}
Procedure match(token t)
{
    if lookahead = t
        lookahead = next_token;
    else
        error;
}
Procedure error
{
    Printf("error !!!");
}
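Let us consider the input string 3*4 $. Procedure E matches num (3) and calls T; T matches * and num (4) and calls T again; this call sees $ and takes the ϵ alternative; E then sees $ and declares success.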
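The procedures E, T and match for this grammar can be written as a small runnable Python sketch. It is only an illustration under assumptions: the tokenize helper, the Parser class and the ParseError exception are hypothetical names, and every integer is simply mapped to the token num.

```python
import re


class ParseError(Exception):
    """Raised when the lookahead does not fit the grammar."""


def tokenize(text):
    # Hypothetical lexer: every integer becomes the token "num";
    # any other non-blank character stands for itself.
    return ["num" if lexeme.isdigit() else lexeme
            for lexeme in re.findall(r"\d+|\S", text)]


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def lookahead(self):
        # Past the end of the input the lookahead is the end marker.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):                      # E -> num T
        self.match("num")
        self.T()

    def T(self):                      # T -> * num T | epsilon
        if self.lookahead == "*":
            self.match("*")
            self.match("num")
            self.T()
        # otherwise take T -> epsilon and consume nothing

    def parse(self):
        self.E()                      # start symbol
        self.match("$")               # the whole input must be consumed
        return True


def accepts(text):
    try:
        return Parser(text).parse()
    except ParseError:
        return False


print(accepts("3*4 $"))
print(accepts("3* $"))
```

Each non-terminal has exactly one procedure, and the single lookahead symbol decides which alternative of T to use, so no backtracking is needed for this grammar.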
LL(1) Parser : (Non- Recursive Predictive Parser)
This top-down parsing algorithm is non-recursive. In LL(1), the first L means the input is
scanned from Left to Right, the second L means the parser produces a Left Most Derivation (LMD)
of the input string, and the number 1 means that only 1 input symbol (the lookahead) is used to
predict each step of the parsing process.
Figure: block diagram of an LL(1) parser.
The data structures used by LL(1) are
i)Input buffer
ii)Stack
iii)Parsing table
•The input buffer holds the input string, followed by the end marker $.
•The stack contains a sequence of grammar symbols with $ at the bottom. Initially the start
symbol is pushed onto the stack, on top of $.
•The parsing table is a 2D array represented as M[X,a], where ‘X’ is a non-terminal and ‘a’ is a
terminal or $.
•The parser reads the current symbol from the input buffer, looks at the symbol on the top of
the stack, and then performs one of the following actions.
Let ‘X’ be the symbol on the top of the stack and ‘a’ the current input symbol.
Action 1:
If X = a = $, the parser halts and announces successful completion of parsing.
Action 2:
If X = a ≠ $, the parser pops ‘X’ from the stack and moves the input pointer to the next input
symbol.
Action 3:
If ‘X’ is a non-terminal, the parser consults the parsing table entry M[X,a].
If M[X,a] contains the production X → U V W, the parser replaces ‘X’ on the stack by W, V, U, so that ‘U’ is on the top of the stack.
Action 4:
If M[X,a] is an error entry, the parser calls the error recovery routine.
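The four actions can be sketched as a table-driven loop in Python. This is a minimal illustration, assuming the grammar E → num T, T → * num T | ϵ from the recursive descent example; the parsing table TABLE was filled in by hand for that grammar, and the function name ll1_parse is hypothetical.

```python
# Hand-written parsing table M[X, a] (an assumption for this sketch).
# Each entry holds the body of the production to apply.
TABLE = {
    ("E", "num"): ["num", "T"],       # E -> num T
    ("T", "*"): ["*", "num", "T"],    # T -> * num T
    ("T", "$"): [],                   # T -> epsilon, since $ is in FOLLOW(T)
}
NONTERMINALS = {"E", "T"}


def ll1_parse(tokens):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]     # input buffer with end marker
    stack = ["$", "E"]                # top of the stack is the end of the list
    pos = 0
    applied = []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":                 # Action 1: accept
            return applied
        if X not in NONTERMINALS:
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()                           # Action 2: match terminal
            pos += 1
            continue
        body = TABLE.get((X, a))
        if body is None:                          # Action 4: empty entry
            raise SyntaxError(f"no entry M[{X}, {a}]")
        stack.pop()                               # Action 3: replace X by its
        stack.extend(reversed(body))              # body, leftmost symbol on top
        applied.append(f"{X} -> {' '.join(body) or 'epsilon'}")


print(ll1_parse(["num", "*", "num"]))
```

Pushing the production body in reverse order is what keeps the leftmost symbol on the top of the stack, so the sequence of applied productions forms a leftmost derivation.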
Error Recovery in Predictive Parsing
•An error is detected during predictive parsing when the terminal on the top of the
stack does not match the next input symbol, or when non-terminal A is on the top of
the stack, a is the next input symbol, and the parsing table entry M[A,a] is empty.
Error Recovery Techniques:
Panic-Mode Error Recovery: The parser skips input symbols until a synchronizing token
is found.
Phrase-Level Error Recovery: Each empty entry in the parsing table is filled with a
pointer to a specific error routine that takes care of that error case.
Panic-Mode Error Recovery in LL(1) Parsing:
In panic-mode error recovery, input symbols are skipped until a synchronizing token
is found.
In this recovery method, we use the FOLLOW symbols as synchronizing tokens, and
“synch” entries in the predictive parsing table mark the synchronizing tokens obtained from
each non-terminal’s FOLLOW set.
•We might add keywords that begin statements to the synchronizing sets for the
non-terminals.
•We can add the symbols in FIRST(A) to the synchronizing set for non-terminal A.
•If a terminal on the top of the stack cannot be matched, a simple idea is to pop the terminal
and issue a message saying that the terminal was inserted.
•If a non-terminal can generate the empty string, then the production deriving ε can be used
as a default. This may postpone some error detection, but cannot cause an error to be
missed.
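A minimal Python sketch of panic-mode recovery in a table-driven predictive parser follows. It assumes the small grammar E → num T, T → * num T | ε; the tables TABLE and SYNCH and the function parse_with_recovery are hypothetical names written by hand for this illustration, with FOLLOW(E) = FOLLOW(T) = {$} used as the synchronizing sets.

```python
# Hand-written LL(1) table and synchronizing sets (assumptions for this sketch).
TABLE = {
    ("E", "num"): ["num", "T"],       # E -> num T
    ("T", "*"): ["*", "num", "T"],    # T -> * num T
    ("T", "$"): [],                   # T -> epsilon
}
SYNCH = {"E": {"$"}, "T": {"$"}}      # FOLLOW sets as synchronizing tokens


def parse_with_recovery(tokens):
    """Parse to the end of the input and return the list of error messages."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]
    pos, errors = 0, []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$":
            if a == "$":
                return errors
            errors.append(f"skipped {a!r} after the end of the sentence")
            pos += 1
        elif X not in SYNCH:                      # terminal on top of the stack
            if X == a:
                pos += 1
            else:
                errors.append(f"inserted missing {X!r}")
            stack.pop()                           # pop it in both cases
        elif (X, a) in TABLE:                     # normal expansion
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))
        elif a in SYNCH[X]:                       # "synch" entry: give up on X
            errors.append(f"popped {X} at synchronizing token {a!r}")
            stack.pop()
        else:                                     # empty entry: skip the input
            errors.append(f"skipped {a!r} while expanding {X}")
            pos += 1


print(parse_with_recovery(["num", "*", "*", "num"]))
```

Every iteration either pops the stack or advances the input, so the loop always reaches the end marker, and the parser can report several errors in a single pass instead of stopping at the first one.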
Phrase-level recovery in LL(1) Parsing:
Phrase-level recovery is implemented by filling in the blank entries in the predictive parsing
table with pointers to error routines. Each routine takes care of its own error case
specifically. These error routines can:
•change, insert, or delete input symbols.
•issue appropriate error messages.
•pop items from the stack.
Bottom Up Parsing
Introduction: Bottom-up parsing begins at the terminal symbols (leaf nodes) of a parse
tree and ascends towards the root node, combining smaller units into larger ones until the
entire structure is formed. This method is commonly used in syntax analysis to construct
parse trees for input strings.
Bottom Up Parsing:
• In this method the input string of tokens is taken first, and we try to reduce
it with the help of the grammar until we obtain the start symbol.
• Parsing halts successfully as soon as we reach the start symbol. The
parse tree is constructed from the bottom up, i.e., from the leaves to the root; the input
symbols are placed at the leaf nodes.
•The parser tries to identify the RHS of a production rule and replace it by the corresponding
LHS. This activity is called reduction. Thus the main task of bottom up parsing is to find the
productions that can be used for reduction.
Example:
Consider the grammar
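S → T L ;
T → float | int
L → L , id | id
Now consider the input string “float id , id ;”. It is reduced step by step to the start symbol: float id , id ; ⇒ T id , id ; ⇒ T L , id ; ⇒ T L ; ⇒ S.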
Difference between LR and LL Parsers
LL Parsers | LR Parsers
Does a leftmost derivation. | Does a rightmost derivation in reverse.
Starts with the root non-terminal on the stack. | Ends with the root non-terminal on the stack.
Ends when the stack is empty. | Starts with an empty stack.
Uses the stack for designating what is still to be expected. | Uses the stack for designating what is already seen.
Builds the parse tree top-down. | Builds the parse tree bottom-up.
Continuously pops a non-terminal off the stack, and pushes the corresponding right hand side. | Tries to recognize a right hand side on the stack, pops it, and pushes the corresponding non-terminal.
Expands the non-terminals. | Reduces the non-terminals.
Reads the terminals when it pops one off the stack. | Reads the terminals while it pushes them on the stack.
Pre-order traversal of the parse tree. | Post-order traversal of the parse tree.
Types of LR Parsers: LR parsers are classified into 3 types.
•SLR means Simple LR parser, LALR means Lookahead LR parser, and CLR means Canonical LR
parser, or simply LR parser.
•The overall structure of all three is the same. Their relative power is
SLR(1) ≤ LALR(1) ≤ CLR(1)
•That means CLR parsers handle a larger class of grammars than LALR parsers, and LALR parsers
handle a larger class than SLR parsers.
•LR parsing is a general form of shift reduce parsing. The L stands for scanning the input from
left to right and R stands for constructing a rightmost derivation in reverse.
Benefits of LR parsing:
•Many programming languages are parsed using some variation of an LR parser. It should be
noted that C++ and Perl are exceptions.
•LR parsers can be implemented very efficiently.
•Of all the parsers that scan their input from left to right, LR parsers detect syntactic errors
as soon as possible.
Why LR Parsers:
LR parsing is the most efficient method of bottom up parsing that can be used to
parse a large class of context free grammars. This method is also called
LR(k) parsing.
Here
• L stands for left to right scanning.
• R stands for rightmost derivation in reverse.
• k is the number of lookahead input symbols. When k is omitted, k is assumed to be 1.
Properties of LR parser
LR parsers are widely used for the following reasons.
1.LR parsers can be constructed to recognize most programming language
constructs for which a context free grammar can be written.
2.The class of grammars that can be parsed by an LR parser is a superset of the
class of grammars that can be parsed using predictive parsers.
3.LR parsing uses a non-backtracking shift reduce technique and is still
efficient.
4.LR parsers detect syntax errors as soon as possible on a left to right scan of the input.
Shift Reduce Parsing :
A shift reduce parser attempts to construct the parse tree from the leaves to the root. It
uses the following data structures.
1.Input buffer: used for storing the input string.
2.Stack: used for storing and accessing the grammar symbols, i.e., the LHS and RHS of production rules.
The parser performs the following four operations:
1)Shift. 2) Reduce. 3) Accept. 4) Error.
1) Shift : Moving a symbol from the input buffer onto the stack is called shift.
2) Reduce: If a handle appears on the top of the stack, the RHS is popped from the stack
and the LHS is pushed in its place. This action is called reduce.
3) Accept: If the stack contains only the start symbol and the input buffer is empty at the same
time, the action is called accept.
4) Error: A situation in which the parser can neither shift nor reduce, and cannot
perform the accept action, is called error.
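The four operations can be sketched in Python using the grammar S → T L ;, T → float | int, L → L , id | id from the bottom-up example. This is a simplified illustration, not how LR parsers find handles: the hypothetical find_handle function just takes the longest production body that matches the top of the stack, a greedy rule that is assumed to be adequate only for this small grammar.

```python
# Grammar from the bottom-up example in the text.
PRODUCTIONS = [
    ("S", ["T", "L", ";"]),
    ("T", ["float"]),
    ("T", ["int"]),
    ("L", ["L", ",", "id"]),
    ("L", ["id"]),
]


def find_handle(stack):
    """Return the production with the longest body matching the stack top."""
    best = None
    for head, body in PRODUCTIONS:
        if stack[-len(body):] == body:
            if best is None or len(body) > len(best[1]):
                best = (head, body)
    return best


def shift_reduce(tokens, start="S"):
    """Return the list of parser moves, or raise SyntaxError."""
    stack, buffer, trace = [], list(tokens), []
    while True:
        handle = find_handle(stack)
        if handle:                                  # Reduce
            head, body = handle
            del stack[-len(body):]                  # pop the RHS
            stack.append(head)                      # push the LHS
            trace.append(f"reduce {head} -> {' '.join(body)}")
        elif buffer:                                # Shift
            stack.append(buffer.pop(0))
            trace.append(f"shift {stack[-1]}")
        elif stack == [start]:                      # Accept
            trace.append("accept")
            return trace
        else:                                       # Error
            raise SyntaxError(f"cannot shift or reduce, stack = {stack}")


for move in shift_reduce("float id , id ;".split()):
    print(move)
```

Read from top to bottom, the reduce moves in the trace are the steps of a rightmost derivation of the input in reverse order.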
LR Parser
The structure of an LR parser is as follows:
• It consists of an input buffer for storing the input string, a stack for storing the
grammar symbols and states, and an output.
•A parsing table consists of two parts, namely action and goto.
•There is one parsing program, the driver program, which reads the input symbols one at a
time from the input buffer.
The program works along the following lines.
1.It initializes the stack with the start state and invokes the scanner to get the next
token.
2.It determines Sj, the state currently on the top of the stack, and ai, the current
input symbol.
3.It consults the parsing table entry action[Sj, ai], which can have one of
four values.
i) si means shift and go to state i.
ii) rj means reduce by rule j.
iii) Accept means parsing has completed successfully.
iv) Error indicates a syntax error.
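The driver program can be sketched in Python as follows. Everything specific here is an assumption for illustration: the grammar S → ( S ) | x does not come from these notes, and the ACTION and GOTO tables were built by hand as an SLR(1) table for it. Only the driver loop itself follows the steps listed above.

```python
# Hypothetical grammar (not from the notes):
#   rule 1: S -> ( S )        rule 2: S -> x
RULES = {1: ("S", 3), 2: ("S", 1)}    # rule number -> (LHS, length of RHS)

# Hand-built SLR(1) table: ("s", i) = shift and go to state i,
# ("r", j) = reduce by rule j, ("acc",) = accept, missing entry = error.
ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (3, ")"): ("r", 2), (3, "$"): ("r", 2),
    (4, ")"): ("s", 5),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def lr_parse(tokens):
    """Return the rule numbers used for reductions, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [0]                        # step 1: stack holds the start state
    pos = 0
    reductions = []
    while True:
        state, symbol = stack[-1], tokens[pos]      # step 2
        action = ACTION.get((state, symbol))        # step 3
        if action is None:                          # iv) error
            raise SyntaxError(f"unexpected {symbol!r} in state {state}")
        if action[0] == "s":                        # i) shift
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":                      # ii) reduce
            lhs, length = RULES[action[1]]
            del stack[-length:]                     # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])    # consult the goto part
            reductions.append(action[1])
        else:                                       # iii) accept
            return reductions


print(lr_parse("((x))"))
```

In this sketch the stack stores only states, since each state already determines the grammar symbol that led to it; the goto part of the table is consulted only after a reduction.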