M2 Compiler Design
SYNTAX ANALYSIS
MODULE 2
Role of the Syntax Analyser – Syntax error handling.
Review of Context Free Grammars - Derivation and Parse Trees,
Eliminating Ambiguity.
Basic parsing approaches - Eliminating left recursion, left factoring.
Top-Down Parsing - Recursive Descent parsing, Predictive Parsing, LL(1)
Grammars.
SYNTAX ANALYSIS
The second phase of a compiler is the syntax analyzer, or parser.
The parser receives a stream of tokens from the lexical analyzer and verifies that the string
can be generated by the grammar for the source language by constructing a parse tree.
The term parsing comes from the Latin pars (orationis), meaning part (of speech).
[Figure: Interaction between lexical analyzer and parser. The scanner (lexical analyzer) passes tokens to the parser (syntax analyzer).]
CONTEXT FREE GRAMMAR
(CFG)
A context free grammar is a grammar whose productions are of the form
A → α
where A is a nonterminal and α is a string of terminals and nonterminals (α can also be
empty).
A formal grammar is "context free" if its production rules can be applied regardless of the
context of a nonterminal.
No matter which symbols surround it, the single nonterminal on the left hand side can always be
replaced by the right hand side.
CONTEXT FREE GRAMMAR
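A CFG consists of four components (N, T, P, S):
Terminals
the basic symbols from which strings are formed; in a compiler these are the tokens
Non terminals
syntactic variables that denote sets of strings and help define the language generated by the
grammar
Productions
rules of the form A → α that specify how terminals and nonterminals combine to form strings
Start Symbol
a designated nonterminal; the set of strings it denotes is the language generated by the grammar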
Grammar for simple arithmetic expression
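As an illustration of the four components, the sketch below encodes one common grammar for arithmetic expressions as a Python value. The productions used (E → E + T | T, T → T * F | F, F → (E) | id) and the names `Grammar`, `expr_grammar` and `is_well_formed` are assumptions chosen for this example; they are not taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # (head, body) pairs; each body is a tuple of symbols
    start: str


# Assumed example grammar for arithmetic expressions.
expr_grammar = Grammar(
    nonterminals=frozenset({"E", "T", "F"}),
    terminals=frozenset({"+", "*", "(", ")", "id"}),
    productions=(
        ("E", ("E", "+", "T")), ("E", ("T",)),
        ("T", ("T", "*", "F")), ("T", ("F",)),
        ("F", ("(", "E", ")")), ("F", ("id",)),
    ),
    start="E",
)


def is_well_formed(g):
    """Check that N and T are disjoint, S is in N, and every production
    has a nonterminal head and a body made only of declared symbols."""
    symbols = g.nonterminals | g.terminals
    return (
        g.start in g.nonterminals
        and not (g.nonterminals & g.terminals)
        and all(
            head in g.nonterminals and all(s in symbols for s in body)
            for head, body in g.productions
        )
    )
```

Calling `is_well_formed(expr_grammar)` checks the basic consistency conditions on (N, T, P, S) for this assumed grammar.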
DERIVATION
• A derivation is a sequence of production applications that produces the input string from the
start symbol.
• Beginning with the start symbol, each step replaces a nonterminal by the body of one of
its productions.
• Types:
• Leftmost derivation - the leftmost nonterminal is replaced in each step
• Rightmost derivation - the rightmost nonterminal is replaced in each step
Consider the grammar
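The grammar from the slide is not reproduced in this text. As a stand-in, the sketch below assumes the grammar E → E + E | E * E | id and replays a leftmost and a rightmost derivation of id + id * id. The names `GRAMMAR` and `derive`, and the encoding of each step as a (nonterminal, alternative index) pair, are hypothetical choices for this example.

```python
# Assumed grammar: E -> E + E | E * E | id
GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}


def derive(start, choices, leftmost=True):
    """Apply the chosen productions one per step, always rewriting the
    leftmost (or rightmost) nonterminal of the current sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for head, index in choices:
        positions = [i for i, s in enumerate(form) if s in GRAMMAR]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != head:
            raise ValueError(f"expected to rewrite {head}, found {form[pos]}")
        form[pos:pos + 1] = GRAMMAR[head][index]
        steps.append(" ".join(form))
    return steps


leftmost_steps = derive("E", [("E", 0), ("E", 2), ("E", 1), ("E", 2), ("E", 2)])
rightmost_steps = derive(
    "E", [("E", 0), ("E", 1), ("E", 2), ("E", 2), ("E", 2)], leftmost=False
)
print(" => ".join(leftmost_steps))
print(" => ".join(rightmost_steps))
```

Both derivations end in the same sentence; they differ only in which nonterminal is rewritten at each step.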
PARSE TREE
A parse tree is a hierarchical structure that represents the derivation of an input string from
the grammar. It is also called a derivation tree.
The leaves of the parse tree are labeled by nonterminals or terminals; read from left to
right, they constitute a sentential form, called the yield or frontier of the tree.
PARSING
Parsing is the process of determining whether a string of tokens can be
generated by a grammar.
2 approaches
Top Down Parsing - In top down parsing, parse tree is constructed from top (root) to the
bottom (leaves).
TDP approaches:
Predictive Parser
RECURSIVE DESCENT
PARSING
RECURSIVE DESCENT PARSING
IMPLEMENTATION
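// The procedures appear to correspond to the grammar S → cAd, A → ab | a (inferred from the code).
Procedure S()
{ if nextsymbol = ‘c’
  { advance to the next symbol;
    A();
    if nextsymbol = ‘d’
    { advance to the next symbol;
      return success;
    }
  }
  error;
}
Procedure A()
{ if nextsymbol = ‘a’
  { advance to the next symbol;
    if nextsymbol = ‘b’
      advance to the next symbol;   // A → ab
    return;                         // otherwise A → a
  }
  error;
}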
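A runnable version of the same idea can be sketched in Python. It assumes the grammar S → cAd, A → ab | a, which is inferred from the procedures above rather than stated in the slides, and it uses a generator to try the alternatives of A in order, which is one simple way to express backtracking. The function name `parse` and the helper `match` are hypothetical.

```python
def parse(text):
    """Return True if text is derived from S -> c A d, A -> a b | a."""

    def match(pos, symbol):
        # Position just past `symbol` if it is the next input symbol, else None.
        return pos + 1 if pos < len(text) and text[pos] == symbol else None

    def A(pos):
        # Yield each position where A can end: first A -> a b, then A -> a.
        after_a = match(pos, "a")
        if after_a is not None:
            after_b = match(after_a, "b")
            if after_b is not None:
                yield after_b
            yield after_a

    def S(pos):
        after_c = match(pos, "c")
        if after_c is None:
            return None
        for after_A in A(after_c):  # backtrack over the alternatives of A
            after_d = match(after_A, "d")
            if after_d is not None:
                return after_d
        return None

    # Success only if S consumes the whole input.
    return S(0) == len(text)


print(parse("cad"), parse("cabd"), parse("cd"))
```

On the input cad, the alternative A → ab is tried first and fails at b, so the parser falls back to A → a and then matches d.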
Recursive descent is the most general form of top-down parsing; in general it may require backtracking.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. That is, when
we try to expand A, we may find ourselves again trying to expand A without having consumed any
input.
Backtracking parsers are not very common, because programming language constructs can usually be parsed
without backtracking.
Stack:
initialized with $, to indicate bottom of stack.
Parsing table:
A two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $.
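// When a nonterminal is expanded, the symbols of the production body are pushed in reverse order, so that the leftmost symbol ends up on top of the stack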
EXAMPLE:
Input : id + id * id
Grammar :
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Moves made by predictive parser for the input id+id*id
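The table of moves is not reproduced in this text, but the moves can be regenerated with a small table-driven parser. The sketch below is written under the assumption that the parsing table is the usual one for the example grammar, entered by hand in `TABLE`; the names `TABLE`, `NONTERMINALS` and `predictive_parse` are hypothetical.

```python
# Hand-entered parsing table M[A, a] for the example grammar (an assumption;
# the slide's own table is not available here). An empty body means ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # $ marks the bottom of the stack
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {look}]")
            output.append(f"{top} -> {' '.join(body) or 'ε'}")
            stack.extend(reversed(body))  # reverse so the first symbol is on top
        elif top == look:
            pos += 1  # matched a terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return output


for move in predictive_parse(["id", "+", "id", "*", "id"]):
    print(move)
```

Each printed line corresponds to one expansion step; the steps that match a terminal advance the input without producing output.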
CONSTRUCTION OF PREDICTIVE PARSING TABLE:
Uses 2 functions:
FIRST()
FOLLOW()
These functions allow us to fill in the entries of the
predictive parsing table
FIRST
RULES TO COMPUTE FIRST SET
FOLLOW
RULES TO COMPUTE FOLLOW SET
EXAMPLE:
Calculate FIRST and FOLLOW for the given grammar:
S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
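The exercise can be checked mechanically. The sketch below computes FIRST and FOLLOW for this grammar by the usual fixed-point iteration (repeat the rules until no set changes). The representation of the grammar as a dictionary and the function names `first_of`, `compute_first` and `compute_follow` are assumptions made for this example.

```python
EPS = "ε"
GRAMMAR = {  # an empty body stands for ε
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], []],
    "D": [["E", "F"]],
    "E": [["g"], []],
    "F": [["f"], []],
}
START = "S"


def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε (or the string is empty)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue  # FOLLOW is defined only for nonterminals
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

For example, FOLLOW(B) comes from FIRST(Dh): D can derive ε, so h is included along with g and f.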
ALGORITHM TO CONSTRUCT PREDICTIVE
PARSING TABLE:
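The algorithm itself is not reproduced in this text. A minimal sketch of the standard construction is: for each production A → α, add it to M[A, a] for every terminal a in FIRST(α), and, if ε is in FIRST(α), also for every symbol in FOLLOW(A). The code below applies this to the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, with FIRST and FOLLOW sets entered by hand as an assumption; `build_table` and `is_ll1` are hypothetical names.

```python
EPS = "ε"
PRODUCTIONS = [  # an empty body stands for ε
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
# Hand-entered sets for this grammar (assumed, not computed here).
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}


def first_of(body):
    """FIRST of a production body; terminals are their own FIRST set."""
    result = set()
    for sym in body:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table():
    """Map (nonterminal, terminal) to the list of production bodies in that cell."""
    table = {}
    for head, body in PRODUCTIONS:
        body_first = first_of(body)
        lookaheads = body_first - {EPS}
        if EPS in body_first:
            lookaheads |= FOLLOW[head]
        for terminal in lookaheads:
            table.setdefault((head, terminal), []).append(body)
    return table


def is_ll1(table):
    # No cell may hold more than one production.
    return all(len(entries) == 1 for entries in table.values())


table = build_table()
print(is_ll1(table), len(table))
```

Cells that are absent from the dictionary correspond to error entries in the table.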
LL(1) GRAMMARS
A context-free grammar G whose parsing table has no multiply-defined entries is said to be LL(1).
LL(1) grammars are the class of grammars from which predictive parsers can be constructed.
The first L stands for scanning the input from left to right, the second L for producing a leftmost derivation,
and the 1 for using one input symbol of lookahead at each step to make parsing action decisions.
Not LL(1) Grammar
NB:
The goal of predictive parsing is to construct a top-down parser that
never backtracks. To do so, we must transform a grammar in two ways:
Eliminate Left Recursion
Perform Left factoring
The problem is that if we use a production of the form A → Aα in a top-down derivation, we will fall into an
infinite derivation chain. This is called left recursion.
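The standard remedy for immediate left recursion rewrites A → Aα | β as A → βA', A' → αA' | ε. A minimal sketch of that transformation follows. The function name is hypothetical, and the code assumes the new nonterminal name (the head followed by a prime) is not already used in the grammar. It does not handle indirect left recursion.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    `bodies` is a list of production bodies, each a list of symbols;
    an empty list stands for ε.
    """
    recursive = [b[1:] for b in bodies if b and b[0] == head]   # the α parts
    others = [b for b in bodies if not b or b[0] != head]       # the β parts
    if not recursive:
        return {head: bodies}  # nothing to do
    new_head = head + "'"  # assumed to be an unused name
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }


# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | ε
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to E → E + T | T, this yields E → TE' and E' → +TE' | ε, so the rewritten grammar can be expanded top-down without looping on E.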
An ambiguous sentence has two or more possible meanings within a single sentence or sequence
of words. This can confuse the reader and make the meaning of the sentence unclear.
AMBIGUOUS GRAMMAR
An ambiguous grammar is one that produces more
than one leftmost or more than one rightmost
derivation for the same sentence.
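As a concrete check, the sketch below counts the parse trees of a token string under the grammar E → E + E | E * E | id, which is assumed here as a typical ambiguous grammar and is not taken from the slides. Each parse tree corresponds to exactly one leftmost derivation, so a count above 1 means the sentence has more than one leftmost derivation. The function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

For id + id * id the two trees differ in whether + or * is at the root, which is why the grammar is ambiguous.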