SYNTAX ANALYSIS
M. Mahasree,
Assistant Professor,
Dept. of CSE,
SRM IST, Ramapuram
Outline
 Definition - Role of parser
 Lexical versus Syntactic Analysis
 Representative Grammars
 Syntax Error Handling
 Error recovery strategies
 Derivations
2
What is Syntax Analysis?
 Syntax Analysis is a second phase of the compiler design process after lexical
analysis in which the given input string is checked for the confirmation of rules and
structure of the formal grammar.
 Syntax Analyzer or Parser analyses the syntactical structure and checks if the
given input is in the correct syntax of the programming language or not.
 It does so by getting the input from the tokens and building a data structure, called a
Syntax tree or Parse tree.
3
Cont.
Parse Tree
4
Cont.
 The parse tree is constructed by using the pre-defined Grammar of the language and
the input string.
 If the given input string can be produced with the help of the syntax tree, then the
input string is found to be in the correct syntax.
 Otherwise, error is reported by syntax analyzer.
5
Role of the parser
6
Tasks performed by Parser
 The parser helps to apply rules to the code
 Helps to make sure that each opening brace has a corresponding closing balance
 Helps to detect all types of Syntax errors
 Find the position at which error has occurred
 Clear and accurate description of the error
 Recovery from an error to continue and find further errors in the code
 Should not affect compilation of “correct” programs
 The parse must reject invalid texts by reporting syntax errors
7
Types of Parsers
 Three types:
 Universal Parser
 Top-down Parser
 Bottom-up Parser
 Universal Parsers like CYK (Cocke-Younger-Kasami) algorithm and Earley’s
Algorithm can parse any grammar but they are inefficient to use in production
compilers.
 As implied by their names, top-down parsers build parse trees from the top (root) to
the bottom (leaves). Eg. LL parser
 Bottom-up parsers start from the leaves and work their way up to the root. Eg. LR
parser
8
Syntax Error Handling
Types of Errors
1) Lexical
2) Syntactic
3) Semantic
4) Logical
 Lexical errors include misspellings of identiers, keywords, or
operators.
 e.g., the use of an identier elipseSize instead of ellipseSize
 missing quotes around text intended as a string
 Syntactic errors include misplaced semicolons or extra or
missing braces; that is, “{”or “}”
 Example: In C or Java, the appearance of a case statement without
an enclosing switch is a syntactic error
9
Syntax Error Handling
Types of Errors
1) Lexical
2) Syntactic
3) Semantic
4) Logical
 Semantic errors include type mismatches between operators and
operands,
 e.g., the return of a value in a Java method with result type void.
 Logical errors occur when executed code does not produce the
expected result.
 incorrect reasoning on the part of the programmer
 The use in a C program of the assignment operator = instead of the
comparison operator = =
10
Syntax Error Handling
 The error handler in a parser has goals that are simple to state but
challenging to realize:
 Report the presence of errors clearly and accurately.
 Recover from each error quickly enough to detect subsequent errors.
 Add minimal overhead to the processing of correct programs.
11
Error-Recovery Strategies
1. Panic-Mode Recovery
2. Phrase-Level Recovery
3. Error Productions
4. Global Correction
12
Panic-Mode Recovery
 Once an error is found, the parser intends to find designated
set of synchronizing tokens by discarding input symbols one
at a time.
 Synchronizing tokens are delimiters, semicolon or } whose
role in source program is clear.
 When parser finds an error in the statement, it ignores the rest
of the statement by not processing the input.
 Advantage:
 Simplicity
 Never get into infinite loop
 Disadvantage:
 Additional errors cannot be checked as some of the input
symbols will be skipped.
13
Phrase-Level Recovery
 When a parser finds an error, it tries to take corrective
measures so that the rest of inputs of statement allow the
parser to parse ahead. The corrections may be
 Replacing a prefix by some string.
 Replacing comma by semicolon.
 Deleting extraneous semicolon.
 Inserting missing semicolon.
 Advantage:
 It can correct any input string.
 Disadvantage:
 It is difficult to cope up with actual error if it has occurred before the
point of detection.
14
Error Productions
 The use of the error production method can be incorporated if
the user is aware of common mistakes that are encountered in
grammar in conjunction with errors that produce erroneous
constructs.
 Example: write 5x instead of 5*x
 Advantage:
 If this is used then, during parsing appropriate error messages can be
generated and parsing can be continued.
 Disadvantage:
 The disadvantage is that it’s difficult to maintain.
15
Global Correction
 The parser considers the program in hand as a whole and tries
to figure out what the program is intended to do and tries to
find out a closest match for it, which is error-free.
 When an erroneous input statement X is fed, it creates a parse
tree for some closest error-free statement Y.
 Advantage:
 This may allow the parser to make minimal changes in the source
code.
 Disadvantage:
 Due to the complexity (time and space) of this strategy, it has not
been implemented in practice yet.
16
Lexical Vs Syntactic analysis
17
Lexical Vs Syntactic analysis
18
Representative Grammars - CFG
 Context-free grammars are named as such because any of the production
rules in the grammar can be applied regardless of context—it does not
depend on any other symbols that may or may not be around a given symbol
that is having a rule applied to it.
 A context free grammar G is defined by four tuple format as
G = (V, T, P, S)
where,
G − Grammar
V − Set of variables
T − Set of terminals
P − Set of productions
S − Start symbol
19
Context Free Grammar
 Terminals are symbols from which strings are formed.
Lowercase letters, i.e., a, b, c.
Operators, i.e., +,−, ∗.
Punctuation symbols, i.e., comma, parenthesis.
Digits, i.e., 0, 1, 2, · · · ,9.
Boldface letters, i.e., id, if.
 Non-terminals are syntactic variables that denote a set of strings.
Uppercase letters, i.e., A, B, C.
Lowercase italic names, i.e., expr, stmt.
 Start symbol is the head of the production stated first in the grammar.
 Production is of the form LHS → RHS or head → body, where head
contains only one non-terminal and body contains a collection of
terminals and non-terminals.
20
Context Free Grammar
 Example
E → E + T | E − T | T
T → T ∗ F | T / F | F
F → (E) | id
V = {E, T, F}
T = {+ , −, *, /, ( , ), id}
S = {E}
P :
E → E+T T→ T / F
E → E−T T→ F
E →T F→ (E)
T → T∗ F F→ id
21
Derivations
22
Leftmost Derivation
 At each and every step the leftmost non-terminal is expanded by substituting its
corresponding production to derive a string.
E → E + E | E ∗ E | id
23
Rightmost Derivation
 At each and every step the rightmost non-terminal is expanded by substituting its
corresponding production to derive a string.
E → E + E | E ∗ E | id
24
Derivation Examples
 Leftmost
S → SS + | SS ∗ | a
 Rightmost
S → SS + | SS ∗ | a
25
Parse Tree
• Parse tree is a hierarchical structure
which represents the derivation of the
grammar to yield input strings.
• Root node of parse tree has the start
symbol of the given grammar from
where the derivation proceeds.
• Leaves of parse tree represent terminals.
• Each interior node represents
productions of grammar.
• If G : E → E+E | E*E | ( E ) | - E | id is
the grammar, then Parse tree for the
input string - (id + id) is shown.
26
Parse Tree
27
References
 Alfred V Aho, Jeffery D Ullman, Ravi Sethi, "Compilers, Principles
techniques and tools", Pearson Education 2011
 https://www.gatevidyalay.com
 https://electricalvoice.com
28
THANK YOU
29

Syntax Analysis in Compiler Design

  • 1.
    SYNTAX ANALYSIS M. Mahasree, AssistantProfessor, Dept. of CSE, SRM IST, Ramapuram
  • 2.
    Outline  Definition -Role of parser  Lexical versus Syntactic Analysis  Representative Grammars  Syntax Error Handling  Error recovery strategies  Derivations 2
  • 3.
    What is SyntaxAnalysis?  Syntax Analysis is a second phase of the compiler design process after lexical analysis in which the given input string is checked for the confirmation of rules and structure of the formal grammar.  Syntax Analyzer or Parser analyses the syntactical structure and checks if the given input is in the correct syntax of the programming language or not.  It does so by getting the input from the tokens and building a data structure, called a Syntax tree or Parse tree. 3
  • 4.
  • 5.
    Cont.  The parsetree is constructed by using the pre-defined Grammar of the language and the input string.  If the given input string can be produced with the help of the syntax tree, then the input string is found to be in the correct syntax.  Otherwise, error is reported by syntax analyzer. 5
  • 6.
    Role of theparser 6
  • 7.
    Tasks performed byParser  The parser helps to apply rules to the code  Helps to make sure that each opening brace has a corresponding closing balance  Helps to detect all types of Syntax errors  Find the position at which error has occurred  Clear and accurate description of the error  Recovery from an error to continue and find further errors in the code  Should not affect compilation of “correct” programs  The parse must reject invalid texts by reporting syntax errors 7
  • 8.
    Types of Parsers Three types:  Universal Parser  Top-down Parser  Bottom-up Parser  Universal Parsers like CYK (Cocke-Younger-Kasami) algorithm and Earley’s Algorithm can parse any grammar but they are inefficient to use in production compilers.  As implied by their names, top-down parsers build parse trees from the top (root) to the bottom (leaves). Eg. LL parser  Bottom-up parsers start from the leaves and work their way up to the root. Eg. LR parser 8
  • 9.
    Syntax Error Handling Typesof Errors 1) Lexical 2) Syntactic 3) Semantic 4) Logical  Lexical errors include misspellings of identiers, keywords, or operators.  e.g., the use of an identier elipseSize instead of ellipseSize  missing quotes around text intended as a string  Syntactic errors include misplaced semicolons or extra or missing braces; that is, “{”or “}”  Example: In C or Java, the appearance of a case statement without an enclosing switch is a syntactic error 9
  • 10.
    Syntax Error Handling Typesof Errors 1) Lexical 2) Syntactic 3) Semantic 4) Logical  Semantic errors include type mismatches between operators and operands,  e.g., the return of a value in a Java method with result type void.  Logical errors occur when executed code does not produce the expected result.  incorrect reasoning on the part of the programmer  The use in a C program of the assignment operator = instead of the comparison operator = = 10
  • 11.
    Syntax Error Handling The error handler in a parser has goals that are simple to state but challenging to realize:  Report the presence of errors clearly and accurately.  Recover from each error quickly enough to detect subsequent errors.  Add minimal overhead to the processing of correct programs. 11
  • 12.
    Error-Recovery Strategies 1. Panic-ModeRecovery 2. Phrase-Level Recovery 3. Error Productions 4. Global Correction 12
  • 13.
    Panic-Mode Recovery  Oncean error is found, the parser intends to find designated set of synchronizing tokens by discarding input symbols one at a time.  Synchronizing tokens are delimiters, semicolon or } whose role in source program is clear.  When parser finds an error in the statement, it ignores the rest of the statement by not processing the input.  Advantage:  Simplicity  Never get into infinite loop  Disadvantage:  Additional errors cannot be checked as some of the input symbols will be skipped. 13
  • 14.
    Phrase-Level Recovery  Whena parser finds an error, it tries to take corrective measures so that the rest of inputs of statement allow the parser to parse ahead. The corrections may be  Replacing a prefix by some string.  Replacing comma by semicolon.  Deleting extraneous semicolon.  Inserting missing semicolon.  Advantage:  It can correct any input string.  Disadvantage:  It is difficult to cope up with actual error if it has occurred before the point of detection. 14
  • 15.
    Error Productions  Theuse of the error production method can be incorporated if the user is aware of common mistakes that are encountered in grammar in conjunction with errors that produce erroneous constructs.  Example: write 5x instead of 5*x  Advantage:  If this is used then, during parsing appropriate error messages can be generated and parsing can be continued.  Disadvantage:  The disadvantage is that it’s difficult to maintain. 15
  • 16.
    Global Correction  Theparser considers the program in hand as a whole and tries to figure out what the program is intended to do and tries to find out a closest match for it, which is error-free.  When an erroneous input statement X is fed, it creates a parse tree for some closest error-free statement Y.  Advantage:  This may allow the parser to make minimal changes in the source code.  Disadvantage:  Due to the complexity (time and space) of this strategy, it has not been implemented in practice yet. 16
  • 17.
  • 18.
  • 19.
    Representative Grammars -CFG  Context-free grammars are named as such because any of the production rules in the grammar can be applied regardless of context—it does not depend on any other symbols that may or may not be around a given symbol that is having a rule applied to it.  A context free grammar G is defined by four tuple format as G = (V, T, P, S) where, G − Grammar V − Set of variables T − Set of terminals P − Set of productions S − Start symbol 19
  • 20.
    Context Free Grammar Terminals are symbols from which strings are formed. Lowercase letters, i.e., a, b, c. Operators, i.e., +,−, ∗. Punctuation symbols, i.e., comma, parenthesis. Digits, i.e., 0, 1, 2, · · · ,9. Boldface letters, i.e., id, if.  Non-terminals are syntactic variables that denote a set of strings. Uppercase letters, i.e., A, B, C. Lowercase italic names, i.e., expr, stmt.  Start symbol is the head of the production stated first in the grammar.  Production is of the form LHS → RHS or head → body, where head contains only one non-terminal and body contains a collection of terminals and non-terminals. 20
  • 21.
    Context Free Grammar Example E → E + T | E − T | T T → T ∗ F | T / F | F F → (E) | id V = {E, T, F} T = {+ , −, *, /, ( , ), id} S = {E} P : E → E+T T→ T / F E → E−T T→ F E →T F→ (E) T → T∗ F F→ id 21
  • 22.
  • 23.
    Leftmost Derivation  Ateach and every step the leftmost non-terminal is expanded by substituting its corresponding production to derive a string. E → E + E | E ∗ E | id 23
  • 24.
    Rightmost Derivation  Ateach and every step the rightmost non-terminal is expanded by substituting its corresponding production to derive a string. E → E + E | E ∗ E | id 24
  • 25.
    Derivation Examples  Leftmost S→ SS + | SS ∗ | a  Rightmost S → SS + | SS ∗ | a 25
  • 26.
    Parse Tree • Parsetree is a hierarchical structure which represents the derivation of the grammar to yield input strings. • Root node of parse tree has the start symbol of the given grammar from where the derivation proceeds. • Leaves of parse tree represent terminals. • Each interior node represents productions of grammar. • If G : E → E+E | E*E | ( E ) | - E | id is the grammar, then Parse tree for the input string - (id + id) is shown. 26
  • 27.
  • 28.
    References  Alfred VAho, Jeffery D Ullman, Ravi Sethi, "Compilers, Principles techniques and tools", Pearson Education 2011  https://www.gatevidyalay.com  https://electricalvoice.com 28
  • 29.