Syntax Analysis in Compiler Design

SYNTAX ANALYSIS
M. Mahasree,
Assistant Professor,
Dept. of CSE,
SRM IST, Ramapuram

Outline
 Definition - Role of parser
 Lexical versus Syntactic Analysis
 Representative Grammars
 Syntax Error Handling
 Error recovery strategies
 Derivations
2

What is Syntax Analysis?
 Syntax Analysis is a second phase of the compiler design process after lexical
analysis in which the given input string is checked for the confirmation of rules and
structure of the formal grammar.
 Syntax Analyzer or Parser analyses the syntactical structure and checks if the
given input is in the correct syntax of the programming language or not.
 It does so by getting the input from the tokens and building a data structure, called a
Syntax tree or Parse tree.
3

Cont.
 The parse tree is constructed by using the pre-defined Grammar of the language and
the input string.
 If the given input string can be produced with the help of the syntax tree, then the
input string is found to be in the correct syntax.
 Otherwise, error is reported by syntax analyzer.
5

Tasks performed by Parser
 The parser helps to apply rules to the code
 Helps to make sure that each opening brace has a corresponding closing balance
 Helps to detect all types of Syntax errors
 Find the position at which error has occurred
 Clear and accurate description of the error
 Recovery from an error to continue and find further errors in the code
 Should not affect compilation of “correct” programs
 The parse must reject invalid texts by reporting syntax errors
7

Types of Parsers
 Three types:
 Universal Parser
 Top-down Parser
 Bottom-up Parser
 Universal Parsers like CYK (Cocke-Younger-Kasami) algorithm and Earley’s
Algorithm can parse any grammar but they are inefficient to use in production
compilers.
 As implied by their names, top-down parsers build parse trees from the top (root) to
the bottom (leaves). Eg. LL parser
 Bottom-up parsers start from the leaves and work their way up to the root. Eg. LR
parser
8

Syntax Error Handling
Types of Errors
1) Lexical
2) Syntactic
3) Semantic
4) Logical
 Lexical errors include misspellings of identiers, keywords, or
operators.
 e.g., the use of an identier elipseSize instead of ellipseSize
 missing quotes around text intended as a string
 Syntactic errors include misplaced semicolons or extra or
missing braces; that is, “{”or “}”
 Example: In C or Java, the appearance of a case statement without
an enclosing switch is a syntactic error
9

Types of Errors
1) Lexical
2) Syntactic
3) Semantic
4) Logical
 Semantic errors include type mismatches between operators and
operands,
 e.g., the return of a value in a Java method with result type void.
 Logical errors occur when executed code does not produce the
expected result.
 incorrect reasoning on the part of the programmer
 The use in a C program of the assignment operator = instead of the
comparison operator = =
10

 The error handler in a parser has goals that are simple to state but
challenging to realize:
 Report the presence of errors clearly and accurately.
 Recover from each error quickly enough to detect subsequent errors.
 Add minimal overhead to the processing of correct programs.
11

Error-Recovery Strategies
1. Panic-Mode Recovery
2. Phrase-Level Recovery
3. Error Productions
4. Global Correction
12

Panic-Mode Recovery
 Once an error is found, the parser intends to find designated
set of synchronizing tokens by discarding input symbols one
at a time.
 Synchronizing tokens are delimiters, semicolon or } whose
role in source program is clear.
 When parser finds an error in the statement, it ignores the rest
of the statement by not processing the input.
 Advantage:
 Simplicity
 Never get into infinite loop
 Disadvantage:
 Additional errors cannot be checked as some of the input
symbols will be skipped.
13

Phrase-Level Recovery
 When a parser finds an error, it tries to take corrective
measures so that the rest of inputs of statement allow the
parser to parse ahead. The corrections may be
 Replacing a prefix by some string.
 Replacing comma by semicolon.
 Deleting extraneous semicolon.
 Inserting missing semicolon.
 Advantage:
 It can correct any input string.
 Disadvantage:
 It is difficult to cope up with actual error if it has occurred before the
point of detection.
14

Error Productions
 The use of the error production method can be incorporated if
the user is aware of common mistakes that are encountered in
grammar in conjunction with errors that produce erroneous
constructs.
 Example: write 5x instead of 5*x
 Advantage:
 If this is used then, during parsing appropriate error messages can be
generated and parsing can be continued.
 Disadvantage:
 The disadvantage is that it’s difficult to maintain.
15

Global Correction
 The parser considers the program in hand as a whole and tries
to figure out what the program is intended to do and tries to
find out a closest match for it, which is error-free.
 When an erroneous input statement X is fed, it creates a parse
tree for some closest error-free statement Y.
 Advantage:
 This may allow the parser to make minimal changes in the source
code.
 Disadvantage:
 Due to the complexity (time and space) of this strategy, it has not
been implemented in practice yet.
16

Lexical Vs Syntactic analysis
17

Lexical Vs Syntactic analysis
18

Representative Grammars - CFG
 Context-free grammars are named as such because any of the production
rules in the grammar can be applied regardless of context—it does not
depend on any other symbols that may or may not be around a given symbol
that is having a rule applied to it.
 A context free grammar G is defined by four tuple format as
G = (V, T, P, S)
where,
G − Grammar
V − Set of variables
T − Set of terminals
P − Set of productions
S − Start symbol
19

Context Free Grammar
 Terminals are symbols from which strings are formed.
Lowercase letters, i.e., a, b, c.
Operators, i.e., +,−, ∗.
Punctuation symbols, i.e., comma, parenthesis.
Digits, i.e., 0, 1, 2, · · · ,9.
Boldface letters, i.e., id, if.
 Non-terminals are syntactic variables that denote a set of strings.
Uppercase letters, i.e., A, B, C.
Lowercase italic names, i.e., expr, stmt.
 Start symbol is the head of the production stated first in the grammar.
 Production is of the form LHS → RHS or head → body, where head
contains only one non-terminal and body contains a collection of
terminals and non-terminals.
20

Context Free Grammar
 Example
E → E + T | E − T | T
T → T ∗ F | T / F | F
F → (E) | id
V = {E, T, F}
T = {+ , −, *, /, ( , ), id}
S = {E}
P :
E → E+T T→ T / F
E → E−T T→ F
E →T F→ (E)
T → T∗ F F→ id
21

Leftmost Derivation
 At each and every step the leftmost non-terminal is expanded by substituting its
corresponding production to derive a string.
E → E + E | E ∗ E | id
23

Rightmost Derivation
 At each and every step the rightmost non-terminal is expanded by substituting its
corresponding production to derive a string.
E → E + E | E ∗ E | id
24

Derivation Examples
 Leftmost
S → SS + | SS ∗ | a
 Rightmost
S → SS + | SS ∗ | a
25

Parse Tree
• Parse tree is a hierarchical structure
which represents the derivation of the
grammar to yield input strings.
• Root node of parse tree has the start
symbol of the given grammar from
where the derivation proceeds.
• Leaves of parse tree represent terminals.
• Each interior node represents
productions of grammar.
• If G : E → E+E | E*E | ( E ) | - E | id is
the grammar, then Parse tree for the
input string - (id + id) is shown.
26

References
 Alfred V Aho, Jeffery D Ullman, Ravi Sethi, "Compilers, Principles
techniques and tools", Pearson Education 2011
 https://www.gatevidyalay.com
 https://electricalvoice.com
28

Syntax Analysis in Compiler Design

More Related Content

What's hot

Similar to Syntax Analysis in Compiler Design

Recently uploaded

Syntax Analysis in Compiler Design