Syntax Directed Translation
•SDT makes a compiler easier to maintain and modify because it separates the translation process from the parsing method.
•However, SDT does not perform well when the translations are complex.
•In addition, locating and fixing errors is difficult in SDT when the translations are complex.
Syntax Directed Definitions
An SDD is a context-free grammar together with attributes and rules.
Attributes are associated with grammar symbols, and rules are associated with productions.
Attributes may be of many kinds: numbers, types, table references, strings, etc.
Synthesized attributes
A synthesized attribute at node N is defined only in terms of attribute values at the children of N and at N itself.
Inherited attributes
An inherited attribute at node N is defined only in terms of attribute values at N’s parent, N itself, and N’s siblings.
Example of S-attributed SDD
     Production       Semantic Rules
1)   L -> E n         L.val = E.val
2)   E -> E1 + T      E.val = E1.val + T.val
3)   E -> T           E.val = T.val
4)   T -> T1 * F      T.val = T1.val * F.val
5)   T -> F           T.val = F.val
6)   F -> (E)         F.val = E.val
7)   F -> digit       F.val = digit.lexval
Example of mixed attributes
     Production       Semantic Rules
1)   T -> F T’        T’.inh = F.val
                      T.val = T’.syn
2)   T’ -> * F T’1    T’1.inh = T’.inh * F.val
                      T’.syn = T’1.syn
3)   T’ -> ε          T’.syn = T’.inh
4)   F -> digit       F.val = digit.lexval
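The mixed-attribute rules can be sketched as a small recursive-descent evaluator, in which an inherited attribute travels down as a function parameter and a synthesized attribute comes back as a return value. This is only an illustration: the function names, the whitespace-separated token format, and the `evaluate` helper are assumptions, not part of the SDD.

```python
# Sketch of the SDD  T -> F T',  T' -> * F T'1 | epsilon,  F -> digit.
# Assumption: the input is a string of whitespace-separated tokens, e.g. "3 * 5".

def parse_T(tokens, pos=0):
    # T -> F T' :  T'.inh = F.val,  T.val = T'.syn
    f_val, pos = parse_F(tokens, pos)
    return parse_T_prime(tokens, pos, inh=f_val)

def parse_T_prime(tokens, pos, inh):
    if pos < len(tokens) and tokens[pos] == "*":
        # T' -> * F T'1 :  T'1.inh = T'.inh * F.val,  T'.syn = T'1.syn
        f_val, pos = parse_F(tokens, pos + 1)
        return parse_T_prime(tokens, pos, inh=inh * f_val)
    # T' -> epsilon :  T'.syn = T'.inh
    return inh, pos

def parse_F(tokens, pos):
    # F -> digit :  F.val = digit.lexval
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"digit expected at position {pos}")
    return int(tokens[pos]), pos + 1

def evaluate(text):
    tokens = text.split()
    val, pos = parse_T(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return val

print(evaluate("3 * 5"))
```

The left operand reaches the right end of the chain through T’.inh, and the finished product returns through T’.syn, which is why both kinds of attribute are needed here.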
Figure: Annotated parse tree for an SDD with inherited attributes
Evaluation orders for SDD’s
A dependency graph is used to determine the order of
computation of attributes
Dependency graph
For each parse tree node, the dependency graph has a node for
each attribute associated with that node
If a semantic rule defines the value of synthesized
attribute A.b in terms of the value of X.c then the
dependency graph has an edge from X.c to A.b
If a semantic rule defines the value of inherited
attribute B.c in terms of the value of X.a then the
dependency graph has an edge from X.a to B.c
Ordering the evaluation of attributes
If the dependency graph has an edge from node M to node N, then the
attribute at M must be evaluated before the attribute at N
Thus the only allowable orders of evaluation are
those sequences of nodes N1,N2,…,Nk such that if
there is an edge from Ni to Nj then i<j
Such an ordering is called a topological sort of the
graph
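As a rough illustration, a topological sort of a dependency graph can be computed with Kahn's algorithm. The node names below are hypothetical labels for the attribute instances that arise when the mixed-attribute SDD is applied to the input 3 * 5, and the function name is likewise an assumption.

```python
from collections import deque

def topological_order(nodes, edges):
    """Return one evaluation order; an edge (m, n) means m is evaluated before n."""
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for m, n in edges:
        successors[m].append(n)
        indegree[n] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        m = ready.popleft()
        order.append(m)
        for n in successors[m]:
            indegree[n] -= 1
            if indegree[n] == 0:
                ready.append(n)
    if len(order) != len(nodes):
        raise ValueError("dependency graph has a cycle; no evaluation order exists")
    return order

# Hypothetical attribute instances for the input 3 * 5.
nodes = ["F1.val", "F2.val", "T'1.inh", "T'2.inh", "T'2.syn", "T'1.syn", "T.val"]
edges = [
    ("F1.val", "T'1.inh"),   # T'.inh = F.val
    ("T'1.inh", "T'2.inh"),  # T'1.inh = T'.inh * F.val
    ("F2.val", "T'2.inh"),
    ("T'2.inh", "T'2.syn"),  # T'.syn = T'.inh
    ("T'2.syn", "T'1.syn"),  # T'.syn = T'1.syn
    ("T'1.syn", "T.val"),    # T.val = T'.syn
]
print(topological_order(nodes, edges))
```

If the graph contains a cycle, no topological sort exists and the sketch raises an error, which corresponds to an SDD whose attributes cannot be evaluated on that parse tree.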
S-Attributed definitions
An SDD is S-attributed if every attribute is synthesized
The attributes of an S-attributed definition can be evaluated by a
postorder traversal of the parse tree
postorder(N) {
    for (each child C of N, from the left) postorder(C);
    evaluate the attributes associated with node N;
}
S-Attributed definitions can be implemented during bottom-up parsing
without the need to explicitly create parse trees
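The postorder scheme can be tried out on a hand-built parse tree for the input 3 * 5 + 4 n, using the S-attributed SDD for L, E, T and F. This is a minimal sketch: the `Node` class, the helper functions, and the way productions are recognised from the children's symbols are assumptions made for illustration.

```python
class Node:
    def __init__(self, symbol, children=(), lexval=None):
        self.symbol = symbol
        self.children = list(children)
        self.lexval = lexval   # set only on digit leaves (supplied by the lexer)
        self.val = None        # the synthesized attribute

def apply_rule(n):
    kids = n.children
    rhs = [c.symbol for c in kids]
    if n.symbol == "digit":
        n.val = n.lexval                     # terminal leaf
    elif not kids:
        return                               # +, *, n, ( and ) carry no attribute
    elif rhs == ["E", "+", "T"]:
        n.val = kids[0].val + kids[2].val    # E.val = E1.val + T.val
    elif rhs == ["T", "*", "F"]:
        n.val = kids[0].val * kids[2].val    # T.val = T1.val * F.val
    elif rhs == ["(", "E", ")"]:
        n.val = kids[1].val                  # F.val = E.val
    else:
        n.val = kids[0].val                  # L -> E n, E -> T, T -> F, F -> digit

def postorder(n):
    for c in n.children:                     # children from the left
        postorder(c)
    apply_rule(n)                            # then the node itself

def nt(symbol, *children):
    return Node(symbol, children)

def digit(v):
    return Node("digit", lexval=v)

# Parse tree for 3 * 5 + 4 n
tree = nt("L",
          nt("E",
             nt("E", nt("T", nt("T", nt("F", digit(3))), Node("*"), nt("F", digit(5)))),
             Node("+"),
             nt("T", nt("F", digit(4)))),
          Node("n"))
postorder(tree)
print(tree.val)
```

Because every rule reads only the children's attributes, visiting the children first guarantees that each value is available when it is needed.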
L-Attributed definitions
An SDD is L-Attributed if the edges in its dependency graph
go from left to right but not from right to left.
More precisely, each attribute must be either
•Synthesized, or
•Inherited, with the following restriction: if there is a production A->X1X2…Xn and
there is an inherited attribute Xi.a computed by a rule
associated with this production, then the rule may only use:
1. Inherited attributes associated with the head A
2. Either inherited or synthesized attributes associated with the
occurrences of symbols X1,X2,…,Xi-1 located to the left of Xi
3. Inherited or synthesized attributes associated with this occurrence
of Xi itself, but in such a way that there is no cycle in the graph
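The restriction on inherited attributes can be expressed as a simple check on one semantic rule. The encoding below is an assumption made for illustration: position 0 stands for the head A, positions 1..n stand for X1..Xn, and each attribute the rule reads is given as a (position, kind) pair with kind "inh" or "syn". The sketch does not check the no-cycle condition.

```python
def is_l_attributed_rule(target_pos, sources):
    """Check a rule that computes an inherited attribute of the body symbol
    at position target_pos from the attributes listed in sources."""
    for pos, kind in sources:
        if pos == 0 and kind == "inh":
            continue        # inherited attribute of the head A
        if 1 <= pos < target_pos:
            continue        # any attribute of a symbol to the left of Xi
        if pos == target_pos:
            continue        # attribute of Xi itself (cycles are not checked here)
        return False        # head's synthesized attribute, or a symbol to the right
    return True

# T -> F T' with T'.inh = F.val: reads the symbol to the left, so it is allowed.
print(is_l_attributed_rule(2, [(1, "syn")]))
# Hypothetical A -> B C with B.i = C.s: reads a symbol to the right, so it is rejected.
print(is_l_attributed_rule(1, [(2, "syn")]))
```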
Intermediate Code Generation
The following are commonly used intermediate representations:
1. Postfix notation
2. Syntax tree
3. Three-address code
Postfix notation
Syntax Tree
The syntax tree is nothing more than a condensed form of the parse tree.
Figure: parse tree for the string id+id*id, and syntax tree for id+id*id
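To connect the first two representations, the syntax tree for id+id*id can be written as nested tuples and turned into postfix notation by a postorder walk. The tuple encoding and the function name are assumptions chosen for this sketch.

```python
# Syntax tree for id + id * id: interior nodes are (operator, left, right),
# leaves are plain strings.
tree = ("+", "id", ("*", "id", "id"))

def to_postfix(node):
    if isinstance(node, str):
        return [node]                      # operand
    op, left, right = node
    # operands first, operator last
    return to_postfix(left) + to_postfix(right) + [op]

print(" ".join(to_postfix(tree)))          # id id id * +
```

The operator precedence that the parse tree encodes through its shape survives in the postfix string without any parentheses.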
Three Address Code
Three-address code is a sequence of statements of the
form x = y op z. Since a statement involves no more
than three references, it is called a "three-address
statement," and a sequence of such statements is
referred to as three-address code. For example, the
three-address code for the expression A + B * C + D is:
t1 = B * C
t2 = A + t1
t3 = t2 + D
Sometimes a statement might contain fewer than three
references, but it is still called a three-address
statement. The following are the three-address
statements used to represent various programming
language constructs:
Used for representing arithmetic expressions:
Used for representing Boolean expressions:
      op   arg1   arg2   result
(1)   +    a      b      t1
(2)   -    c             t2
(3)   *    t1     t2     t3
(4)   /    t3     d      t4
(5)   =    t4            x
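The five rows above read like quadruples (operator, first argument, second argument, result) for the assignment x = (a + b) * -c / d. The notes do not state the source expression, so that reading is an assumption. Under it, a small generator that walks an expression tree reproduces the same rows; the tuple encoding and function names are hypothetical.

```python
def gen_quads(node, quads, counter):
    """Emit quadruples for node and return the name that holds its value."""
    if isinstance(node, str):
        return node                                   # a variable names itself
    op, *args = node
    names = [gen_quads(a, quads, counter) for a in args]
    counter[0] += 1
    temp = f"t{counter[0]}"                           # fresh temporary
    arg2 = names[1] if len(names) > 1 else ""         # unary operators leave arg2 empty
    quads.append((op, names[0], arg2, temp))
    return temp

def translate_assignment(target, expr):
    quads = []
    result = gen_quads(expr, quads, [0])
    quads.append(("=", result, "", target))
    return quads

# Assumed source expression: x = (a + b) * -c / d
expr = ("/", ("*", ("+", "a", "b"), ("-", "c")), "d")
for number, quad in enumerate(translate_assignment("x", expr), start=1):
    print(f"({number})", *quad)
```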
t1=i*4
t2=addr(a)-4
t3=t2[t1]
a[i,j] is an array of size 30 x 40 with 4 bytes per word
(bpw)
Translation:
t1=i*40
t1=t1+j
t1=t1*4
t2=addr(a)-164
t3=t2[t1]
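The constant 164 can be checked numerically. The sketch below assumes row-major storage, indices that start at 1, and that t2[t1] refers to the location t2 + t1; none of these is stated explicitly in the notes. Under those assumptions the offset is (40 + 1) * 4 = 164, and the three-address sequence agrees with the direct formula base + ((i - 1) * 40 + (j - 1)) * 4.

```python
def element_address(base, i, j, cols=40, bpw=4):
    """Follow the three-address sequence for a[i, j] step by step."""
    t1 = i * cols
    t1 = t1 + j
    t1 = t1 * bpw
    t2 = base - (cols + 1) * bpw      # 164 when cols = 40 and bpw = 4
    return t2 + t1                    # the location that t2[t1] refers to

def direct_address(base, i, j, cols=40, bpw=4):
    # Row-major layout with 1-based indices (an assumption).
    return base + ((i - 1) * cols + (j - 1)) * bpw

base = 1000                           # hypothetical value of addr(a)
mismatches = [(i, j) for i in range(1, 31) for j in range(1, 41)
              if element_address(base, i, j) != direct_address(base, i, j)]
print(len(mismatches))
```

Folding the constant part of the address into t2 means it is computed only once, and only the part that depends on i and j has to be computed for each access.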
main()
{
    int i = 1;
    int a[10];
    while (i <= 10)
        a[i] = 0;
}
The three-address code for the above C program is:
1.i=1
2.if i <= 10 goto(4)
3.goto(8)
4.t1 = i * width
5.t2 = addr(a) - width
6.t2[t1] = 0
7.goto(2)