Bahir Dar Institute of Technology
Bahir Dar University
Compiler Design
Course Code: CoSc4022
Target Group: 4th Year CSD Students
Instructor: Haileyesus D.
Chapter Four
Semantic Analyzer
Chapter Outline
Introduction
Where are we?
A Short Program
Semantic Analysis
Annotated Abstract Syntax tree (AST)
Syntax-Directed Translation
Syntax-Directed Definitions (SDD)
Evaluation of S-Attributed Definitions
L-Attributed Definitions
Translation Schemes
Introduction
§ Program is lexically well-formed:
§ Identifiers have valid names.
§ Strings are properly terminated.
§ No stray characters.
§ Program is syntactically well-formed:
§ Class declarations have the correct structure.
§ Expressions are syntactically valid.
§ Does this mean that the program is legal?
A short program
class MyClass implements MyInterface {
    string myInteger;
    void doSomething() {
        int[] x = new string;
        x[5] = myInteger * y;
    }
    void doSomething() {
    }
    int fibonacci(int n) {
        return doSomething() + fibonacci(n - 1);
    }
}
Semantic Analysis
Ensure that the program has a well-defined meaning.
Verify properties of the program that aren't caught during the earlier phases:
Variables are declared before they're used.
Expressions have the right types.
Classes don't inherit from nonexistent base classes.
Types are consistent.
Inheritance relationships are correct.
A class is defined only once.
A method in a class is defined only once.
Reserved identifiers are not misused.
…
Once we finish semantic analysis, we know that the user's input program is legal.
A simple semantic check
“Matching identifier declarations with uses”
Important analysis in most languages
If there are multiple declarations, which one to match?
The scope of an identifier is the portion of a program in which that identifier is accessible.
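Matching a use to its declaration can be sketched with a stack of scopes. This is a minimal illustration, assuming the common rule that the innermost declaration wins; the class and method names (`ScopedSymbolTable`, `declare`, `lookup`) are hypothetical and not taken from the course material.

```python
class ScopedSymbolTable:
    """A stack of dictionaries, one per open scope (hypothetical helper)."""

    def __init__(self):
        self.scopes = [{}]              # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        current = self.scopes[-1]
        if name in current:
            raise NameError(f"'{name}' is already declared in this scope")
        current[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is used but never declared")


table = ScopedSymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "string")            # shadows the outer x
inner_type = table.lookup("x")          # resolves to the inner declaration
table.exit_scope()
outer_type = table.lookup("x")          # resolves to the outer declaration
```

Searching the scope stack from the top answers the question "which declaration to match?" when several declarations of the same name are visible.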
Semantic Analysis…
Semantic analysis also gathers useful information about the program for later phases:
E.g. count how many variables are in scope at each point.
Why can't we just do this during parsing?
Limitations of CFGs
How would you prevent duplicate class definitions?
How would you differentiate variables of one type from variables of another type?
How would you ensure classes implement all interface methods?
For most programming languages, these are provably impossible to enforce with a context-free grammar.
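Checks such as duplicate definitions need a memory of every name seen so far, which is why they are done after parsing. A minimal sketch, assuming the parser has already collected the declared names into a list; `find_duplicate_definitions` is a hypothetical helper, and the sample list mirrors the methods of MyClass in the short program.

```python
def find_duplicate_definitions(names):
    """Return the names defined more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


# Method names of MyClass, in declaration order.
methods_of_my_class = ["doSomething", "doSomething", "fibonacci"]
duplicates = find_duplicate_definitions(methods_of_my_class)
print(duplicates)   # ['doSomething']
```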
Semantic Analysis
Semantic analysis can be implemented using an annotated abstract syntax tree (AST).
The input to the semantic analyzer is the abstract syntax tree produced by the syntax analyzer, and the output is an annotated abstract syntax tree.
An annotated abstract syntax tree is a tree that also shows the values of the attributes at each node.
Syntax-Directed Translation (SDT)
SDT is used to drive semantic analysis tasks based on the language’s syntax
structures
What semantic tasks?
Generate AST (abstract syntax tree)
Check type errors
Generate intermediate representation (IR)
What are syntax structures?
Context free grammar (CFG)
Parse tree generated from parser
Syntax-Directed Translation (SDT)
How?
Attach attributes to grammar symbols/parse tree
Attach either rules or program fragments to productions in a grammar
Evaluate attribute values using the semantic actions/semantic rules associated with the production rules.
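One of the semantic tasks listed earlier, checking type errors, can be sketched as the evaluation of a `type` attribute on each expression node. The node shapes (`("num", value)`, `("var", name)`, `("mul", left, right)`) and the environment `env` are assumptions made for this illustration only.

```python
def check_type(node, env):
    """Return the type of an expression node, or raise TypeError.

    Assumed node shapes: ("num", value), ("var", name), ("mul", left, right).
    """
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "var":
        if node[1] not in env:
            raise TypeError(f"undeclared variable '{node[1]}'")
        return env[node[1]]
    if kind == "mul":
        # The type of the parent is computed from the types of its children.
        left = check_type(node[1], env)
        right = check_type(node[2], env)
        if left != "int" or right != "int":
            raise TypeError(f"cannot multiply {left} by {right}")
        return "int"
    raise ValueError(f"unknown node kind '{kind}'")


env = {"myInteger": "string", "n": "int"}
ok_type = check_type(("mul", ("var", "n"), ("num", 2)), env)
```

With this environment, an expression such as `myInteger * n` is rejected because `myInteger` has type `string`.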
Types of Attribute
Attributes can represent anything we need: a string, a type, a number, a memory location, … There are two types (shown here for a production P → C1 C2 C3):
Synthesized attributes: the attribute value at a node is computed from attribute values of its child nodes
P.synthesized_attr = f(C1.attr, C2.attr, C3.attr)
Inherited attributes: the attribute value at a node is computed from attributes of the node's siblings and/or parent
C3.inherited_attr = f(P.attr, C1.attr, C2.attr)
Two Types of Syntax Directed Translation
When we associate semantic rules with productions, we use one of two notations:
Syntax-Directed Definitions
associates a production rule with a set of semantic actions, and we do not say when
they will be evaluated.
don’t specify order of evaluation/translation - hides implementation details
Translation Schemes
indicate the order of evaluation of semantic actions associated with a production rule.
shows more implementation details
Syntax-Directed Definitions (SDD)
An SDD is a context-free grammar together with attributes and rules.
Attributes are associated with grammar symbols and rules are associated with productions.
If X is a symbol and a is one of its attributes, then we write X.a to denote the value of a at a particular parse-tree node labeled X.
Production          Semantic Rules
L → E return        print(E.val)
E → E1 + T          E.val = E1.val + T.val
E → T               E.val = T.val
T → T1 * F          T.val = T1.val * F.val
T → F               T.val = F.val
F → ( E )           F.val = E.val
F → digit           F.val = digit.lexval
Symbols E, T, and F are associated with a synthesized attribute val.
The token digit has a synthesized attribute lexval (it is assumed that it is evaluated by the lexical analyzer).
Annotated Parse Tree
§ Exercise: Consider the following grammar for a simple desk calculator. Give the semantic actions and the annotated parse tree for the string 3*5+4n.
L → E n
E → E1 + T
E → T
T → T1 * F
T → F
F → ( E )
F → digit
Annotated Parse Tree
Production Rules    Semantic Actions
L → E n             L.val = E.val
E → E1 + T          E.val = E1.val + T.val
E → T               E.val = T.val
T → T1 * F          T.val = T1.val * F.val
T → F               T.val = F.val
F → ( E )           F.val = E.val
F → digit           F.val = digit.lexval
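The annotation of the parse tree for 3*5+4n can be sketched as a post-order walk that applies the semantic action of the production found at each node. The `Node` class, the helper names, and the hand-built tree are assumptions made for this illustration; a real compiler would get the tree from the parser.

```python
class Node:
    def __init__(self, symbol, children=(), lexval=None):
        self.symbol = symbol
        self.children = list(children)
        self.lexval = lexval    # set by the lexer, for digit tokens only
        self.val = None         # synthesized attribute, filled in by annotate()


def annotate(node):
    """Post-order walk: annotate the children, then apply this node's rule."""
    if not node.children:
        return                                    # terminals have no rule
    for child in node.children:
        annotate(child)
    kids = [c.symbol for c in node.children]
    if kids == ["digit"]:                         # F -> digit
        node.val = node.children[0].lexval
    elif kids == ["(", "E", ")"]:                 # F -> ( E )
        node.val = node.children[1].val
    elif kids == ["E", "+", "T"]:                 # E -> E1 + T
        node.val = node.children[0].val + node.children[2].val
    elif kids == ["T", "*", "F"]:                 # T -> T1 * F
        node.val = node.children[0].val * node.children[2].val
    else:                                         # E -> T, T -> F, L -> E n
        node.val = node.children[0].val


def show(node, depth=0):
    """Print the tree with the attribute value next to each node."""
    label = node.symbol
    if node.val is not None:
        label += f".val = {node.val}"
    elif node.lexval is not None:
        label += f".lexval = {node.lexval}"
    print("  " * depth + label)
    for child in node.children:
        show(child, depth + 1)


def digit(n):
    return Node("F", [Node("digit", lexval=n)])


# Parse tree for 3*5+4n, built by hand.
t1 = Node("T", [Node("T", [digit(3)]), Node("*"), digit(5)])         # 3*5
e1 = Node("E", [Node("E", [t1]), Node("+"), Node("T", [digit(4)])])  # 3*5+4
root = Node("L", [e1, Node("n")])
annotate(root)
show(root)
```

After the walk, `root.val` is 19 and the node for `3*5` carries the value 15, which is what the annotated parse tree displays.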
Syntax-Directed Definition – Inherited Attributes
Production        Semantic Rules
D → T L           L.in = T.type
T → int           T.type = integer
T → real          T.type = real
L → L1 , id       L1.in = L.in ; addtype(id.entry, L.in)
L → id            addtype(id.entry, L.in)
§ Symbol T is associated with a synthesized attribute type.
§ Symbol L is associated with an inherited attribute in.
§ We can use inherited attributes to track type information.
§ We can use inherited attributes to track whether an identifier appears on the left or right side of an assignment operator ":=" (e.g. a := a + 1).
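The flow of the inherited attribute `in` can be sketched for the declaration `real id1, id2, id3`. The tuple encoding of the parse tree and the dictionary used as a symbol table are assumptions made for this illustration; only `addtype` and the attribute names come from the rules above.

```python
symbol_table = {}


def addtype(entry, type_name):
    """Record the type of an identifier (a dict stands in for the symbol table)."""
    symbol_table[entry] = type_name


def eval_T(keyword):
    # T -> int { T.type = integer }   |   T -> real { T.type = real }
    return {"int": "integer", "real": "real"}[keyword]


def eval_L(node, inherited_type):
    # node is ("L", L1, id) for L -> L1 , id   or   ("L", id) for L -> id
    if len(node) == 3:
        _, l1, ident = node
        eval_L(l1, inherited_type)        # L1.in = L.in
        addtype(ident, inherited_type)    # addtype(id.entry, L.in)
    else:
        _, ident = node
        addtype(ident, inherited_type)    # addtype(id.entry, L.in)


def eval_D(type_keyword, l_node):
    t_type = eval_T(type_keyword)         # synthesized attribute T.type
    eval_L(l_node, t_type)                # L.in = T.type


# real id1, id2, id3
eval_D("real", ("L", ("L", ("L", "id1"), "id2"), "id3"))
print(symbol_table)
```

The type flows sideways from T to L and then down the chain of L nodes, so every identifier in the list ends up with type `real`.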
S-Attributed and L-Attributed Definitions
§ There are two subclasses of syntax-directed definitions:
§ S-Attributed Definitions: only synthesized attributes are used in the syntax-directed definition.
§ L-Attributed Definitions: in addition to synthesized attributes, we may also use inherited attributes in a restricted fashion.
§ Implementation: both S-attributed and L-attributed definitions are easy to implement
§ we can evaluate their semantic rules in a single pass during parsing.
§ However, implementations of S-attributed definitions are a little easier than implementations of L-attributed definitions.
§ An S-attributed SDD can be implemented naturally in conjunction with an LR parser.
S-Attributed Definitions
§ Use only synthesized attributes.
§ Semantic actions are always placed at the right end of the production. Such a definition is also called a postfix SDD.
§ Attributes are evaluated during bottom-up parsing.
L-Attributed Definitions
§ Use both synthesized and inherited attributes, but an inherited attribute may depend only on the parent or on left siblings.
§ Semantic actions may be placed anywhere on the right-hand side (RHS) of a production.
§ Attributes are evaluated by traversing the parse tree depth-first, left to right.
Bottom-Up Eval. of S-Attributed Definitions
We put the values of the synthesized attributes of the grammar symbols onto a parallel stack.
When an entry of the parser stack holds a grammar symbol X (terminal or non-terminal), the corresponding entry in the parallel stack holds the synthesized attribute(s) of X.
We evaluate the values of the attributes during reductions.
A → XYZ    A.a = f(X.x, Y.y, Z.z)    where all attributes are synthesized.
Bottom-Up Eval. of S-Attributed Definitions …
At each shift of digit, we also push digit.lexval onto the val-stack.
At all other shifts, we do not put anything onto the val-stack because other terminals do not have attributes (but we still increment the stack pointer for the val-stack).
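The parallel val-stack can be sketched with a small table-driven parser for the desk-calculator grammar. This is an illustration under stated assumptions, not the course's implementation: the SLR(1) ACTION/GOTO tables are hand-derived for this sketch with states numbered 0 to 13, operands are single digits, `r` is the end-of-line token, and `$` marks end of input.

```python
# Production number -> (LHS, length of RHS, semantic rule).
# Each rule receives the attribute values popped for the RHS, left to right.
PRODUCTIONS = {
    1: ("L", 2, lambda v: v[0]),         # L -> E r    (the slides print E.val)
    2: ("E", 3, lambda v: v[0] + v[2]),  # E -> E1 + T
    3: ("E", 1, lambda v: v[0]),         # E -> T
    4: ("T", 3, lambda v: v[0] * v[2]),  # T -> T1 * F
    5: ("T", 1, lambda v: v[0]),         # T -> F
    6: ("F", 3, lambda v: v[1]),         # F -> ( E )
    7: ("F", 1, lambda v: v[0]),         # F -> d      (d carries digit.lexval)
}

ACTION = {(1, "$"): ("accept", None)}
for state in (0, 5, 8, 9):               # states that expect an operand
    ACTION[(state, "d")] = ("shift", 6)
    ACTION[(state, "(")] = ("shift", 5)
ACTION[(2, "r")] = ("shift", 7)
ACTION[(2, "+")] = ("shift", 8)
ACTION[(10, "+")] = ("shift", 8)
ACTION[(10, ")")] = ("shift", 13)
ACTION[(3, "*")] = ("shift", 9)
ACTION[(11, "*")] = ("shift", 9)
FOLLOW_E = ("r", "+", ")")
FOLLOW_T = FOLLOW_E + ("*",)             # also FOLLOW(F)
for state, prod, lookaheads in [(3, 3, FOLLOW_E), (11, 2, FOLLOW_E),
                                (4, 5, FOLLOW_T), (6, 7, FOLLOW_T),
                                (12, 4, FOLLOW_T), (13, 6, FOLLOW_T),
                                (7, 1, ("$",))]:
    for tok in lookaheads:
        ACTION[(state, tok)] = ("reduce", prod)

GOTO = {(0, "L"): 1, (0, "E"): 2, (0, "T"): 3, (0, "F"): 4,
        (5, "E"): 10, (5, "T"): 3, (5, "F"): 4,
        (8, "T"): 11, (8, "F"): 4, (9, "F"): 12}


def evaluate(text):
    """Parse a single-digit expression and return the value left on the val-stack."""
    tokens = [("d", int(c)) if c.isdigit() else (c, None)
              for c in text if not c.isspace()]
    tokens += [("r", None), ("$", None)]
    states, vals = [0], []               # parser stack and parallel val-stack
    i = 0
    while True:
        kind, lexval = tokens[i]
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected token '{kind}' at position {i}")
        op, arg = action
        if op == "shift":
            states.append(arg)
            vals.append(lexval)          # None for terminals without attributes
            i += 1
        elif op == "reduce":
            lhs, length, rule = PRODUCTIONS[arg]
            popped = vals[-length:]
            del vals[-length:]
            del states[-length:]
            vals.append(rule(popped))    # A.a = f(X.x, Y.y, Z.z)
            states.append(GOTO[(states[-1], lhs)])
        else:                            # accept
            return vals[-1]


print(evaluate("3*5+4"))   # 19
print(evaluate("2+3"))     # 5
```

Pushing `None` for attribute-free terminals keeps both stacks the same height, which corresponds to incrementing the val-stack pointer without storing a value.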
Canonical LR(0) Collection for The Grammar
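(The dot marks the parser position; r is the end-of-line token "return" and d is digit.)
I0:  L' → .L    L → .E r    E → .E+T    E → .T    T → .T*F    T → .F    F → .(E)    F → .d
I1:  L' → L.
I2:  L → E.r    E → E.+T
I3:  E → T.    T → T.*F
I4:  T → F.
I5:  F → (.E)    E → .E+T    E → .T    T → .T*F    T → .F    F → .(E)    F → .d
I6:  F → d.
I7:  L → E r.
I8:  E → E+.T    T → .T*F    T → .F    F → .(E)    F → .d
I9:  T → T*.F    F → .(E)    F → .d
I10: F → (E.)    E → E.+T
I11: E → E+T.    T → T.*F
I12: T → T*F.
I13: F → (E).
Transitions:
from I0:  on L to I1, on E to I2, on T to I3, on F to I4, on ( to I5, on d to I6
from I2:  on r to I7, on + to I8
from I3:  on * to I9
from I5:  on E to I10, on T to I3, on F to I4, on ( to I5, on d to I6
from I8:  on T to I11, on F to I4, on ( to I5, on d to I6
from I9:  on F to I12, on ( to I5, on d to I6
from I10: on ) to I13, on + to I8
from I11: on * to I9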
Bottom-Up Evaluation -- Example
Approach:
1. Semantic stack: stores attributes. (It may be separate from the main parser stack.)
2. For every symbol shifted, store its corresponding attribute on the semantic stack.
3. For every reduction A → αβ, compute the attribute of A by popping the attributes for α and β from the semantic stack, then push the result.
Example 2: Stack for 2 + 3 [figure]
L-Attributed Definitions
A syntax-directed definition is L-attributed if each inherited attribute of Xj, where 1 ≤ j ≤ n, on the right side of A → X1 X2 ... Xn depends only on:
1. the attributes of the symbols X1, ..., Xj-1 to the left of Xj in the production, and
2. the inherited attributes of A.
L-attributed definitions can always be evaluated by a depth-first visit of the parse tree; this means that they can also be evaluated during parsing.
Algorithm: L-Eval(n: Node)
Input: Node of an annotated parse-tree.
Output: Attribute evaluation
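The body of the algorithm is not reproduced here, so the following is only one plausible reading of a depth-first L-attributed evaluator: for each child, left to right, set its inherited attributes, visit it, and finally compute the synthesized attributes of the node itself. The `Node` class, the `inherit`/`synthesize` callbacks, and the example declaration `int a, b` are assumptions made for this sketch.

```python
class Node:
    def __init__(self, symbol, children=(), inherit=None, synthesize=None):
        self.symbol = symbol
        self.children = list(children)
        self.inherit = inherit        # inherit(node, j) -> inherited attrs of child j
        self.synthesize = synthesize  # synthesize(node) -> synthesized attrs of node
        self.attrs = {}


def l_eval(n):
    """Depth-first, left-to-right attribute evaluation."""
    for j, m in enumerate(n.children):
        if n.inherit is not None:
            # May read n.attrs and the attributes of children[0..j-1] only.
            m.attrs.update(n.inherit(n, j))
        l_eval(m)
    if n.synthesize is not None:
        n.attrs.update(n.synthesize(n))


declared = {}


def T(type_name):
    return Node("T", synthesize=lambda n: {"type": type_name})


def L(ident, rest=None):
    # L -> L1 , id  (rest is L1)    or    L -> id  (rest is None)
    def inherit(n, j):
        return {"in": n.attrs["in"]}          # L1.in = L.in

    def synthesize(n):
        declared[ident] = n.attrs["in"]       # side effect: addtype(id.entry, L.in)
        return {}

    children = [rest] if rest is not None else []
    return Node("L", children, inherit, synthesize)


def D(t, l):
    def inherit(n, j):
        # L.in = T.type uses only the attribute of the left sibling T.
        return {"in": n.children[0].attrs["type"]} if j == 1 else {}

    return Node("D", [t, l], inherit)


tree = D(T("integer"), L("b", L("a")))        # int a, b
l_eval(tree)
print(declared)
```

Because each inherited attribute depends only on the parent and on siblings already visited, a single left-to-right pass is enough.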
Translation Schemes
In a syntax-directed definition, we do not say anything about the evaluation times of the semantic rules:
when should the semantic rules associated with a production be evaluated?
A translation scheme is a context-free grammar in which:
attributes are associated with the grammar symbols and
semantic actions, enclosed between braces {}, are inserted within the right sides of productions.
Ex: A → { ... } X { ... } Y { ... }    (each { ... } is a semantic action)
§ Translation schemes indicate the order in which semantic rules and attributes are to be evaluated.
A Translation Scheme Example
A simple translation scheme that converts infix expressions to the corresponding postfix expressions.
E → T R
R → + T { print("+") } R1
R → ε
T → id { print(id.name) }
a+b+c (infix expression) → ab+c+ (postfix expression)
A Translation Scheme Example
§ The depth-first traversal of the parse tree (executing the semantic actions in that order) produces the postfix representation of the infix expression.
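The translation scheme maps directly onto a recursive-descent parser in which each semantic action runs at the position where it appears in the production. This is a minimal sketch, assuming the input is already split into tokens and that every identifier is a valid Python identifier; output is collected in a list instead of being printed, and `to_postfix` is a hypothetical name.

```python
def to_postfix(tokens):
    """Translate an infix token list such as ['a', '+', 'b'] to a postfix string."""
    out = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def T():
        # T -> id { print(id.name) }
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"identifier expected at position {pos}")
        pos += 1
        out.append(tok)

    def R():
        # R -> + T { print("+") } R1   |   R -> epsilon
        nonlocal pos
        if peek() == "+":
            pos += 1
            T()
            out.append("+")     # the action runs after T and before R1
            R()

    def E():
        # E -> T R
        T()
        R()

    E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token '{tokens[pos]}' at position {pos}")
    return "".join(out)


print(to_postfix(list("a+b+c")))   # ab+c+
```

Placing the "+" action between T and R1 is what makes the operator appear immediately after its two operands in the output.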