15CS314J - Compiler Design: Unit III

The unit discusses semantic analysis and intermediate code generation in compiler design. It describes how semantic analysis verifies the meaning and validity of a program by analyzing syntax trees. Different types of attributes like synthesized and inherited are used. Intermediate languages like postfix, triples and quadruples are discussed which help generate machine-independent code and improve optimization.

Uploaded by

KANCHAN -
15CS314J – COMPILER DESIGN

UNIT III
Prepared by:
Dr.R.I.Minu
Associate Professor

UNIT III
• Semantic analysis: syntax tree, evaluation of expression, synthesized attributes, inherited attributes
• Intermediate code generator: intermediate languages (prefix, postfix, quadruple, triple, indirect triples, three-address code), declarations, assignment statements, Boolean expressions, case statements, back patching, procedure calls
SEMANTIC ANALYSIS
• In syntax analysis, we learnt how a parser constructs parse trees.
• The plain parse tree constructed in that phase is generally of little use to a compiler, as it carries no information about how to evaluate the tree.
• The semantics of a language give meaning to its constructs, such as tokens and syntax structures.
• Semantics help interpret symbols, their types, and their relations with each other.
• Semantic analysis judges whether the syntax structure constructed from the source program carries any meaning.
SEMANTIC ANALYSIS
• There are two ways to represent the semantic rules associated with grammar symbols:
• Syntax-Directed Definitions (SDD)
• Syntax-Directed Translation Schemes (SDT)
• Syntax-Directed Definition (SDD): context-free grammar + semantic rules = SDD
• A Syntax-Directed Translation Scheme (SDT) embeds program fragments, called semantic actions, within production bodies.
• SDTs are more efficient than SDDs, as they indicate the order of evaluation of the semantic actions associated with a production rule.
Syntax Directed Definitions
• A syntax-directed definition (SDD) is a context-free grammar together
with attributes and rules.
• Attributes are associated with grammar symbols
• Rules are associated with productions.
SDD- Attributes
• Additional information (attributes) is attached to one or more non-terminals of the grammar in order to provide context-sensitive information.
• Each attribute has a well-defined domain of values, such as integer, float, character, string, and expressions.
• An attribute grammar (when viewed as a parse tree) can pass values or information among the nodes of the tree.

E→E+T
{ E.value = E.value + T.value }
SDD- Attributes
• Semantic attributes may be assigned values from their domain at parse time and evaluated when they are used in assignments or conditions.
• Based on the way the attributes get their values, they can be broadly divided into two categories:
• Synthesized attributes
• Inherited attributes
Synthesized attributes

• These attributes get their values from the attribute values of their child nodes. To illustrate, assume the following production:
S → ABC
• If S takes values from its child nodes (A, B, C), then its attribute is said to be synthesized, as the values of A, B and C are synthesized to S.
• As in our previous example (E → E + T), the parent node E gets its value from its child nodes. Synthesized attributes never take values from their parent nodes or any sibling nodes.
Inherited attributes

• In contrast to synthesized attributes, inherited attributes can take values from the parent and/or siblings. As in the following production,
S → ABC
• A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A, and B.
S-attributed SDT

• If an SDT uses only synthesized attributes, it is called an S-attributed SDT. Its semantic actions are written after the production (at the right end of the right-hand side).
• Attributes in S-attributed SDTs are evaluated during bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.
L-attributed SDT

• This form of SDT uses both synthesized and inherited attributes, with the restriction that values are never taken from right siblings.
• In L-attributed SDTs, a non-terminal can get values from its parent, child, and left-sibling nodes. As in the following production,
S → ABC
• S can take values from A, B, and C (synthesized). A can take values from S only. B can take values from S and A. C can get values from S, A, and B. No non-terminal can get values from a sibling to its right.
• Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.
S-attributed SDT & L-attributed SDT

• We may conclude that if a definition is S-attributed, then it is also L-attributed, as L-attributed definitions include S-attributed definitions.
Syntax Directed Definition
Example of Synthesized attribute
S-attribute
• Steps to compute an S-attributed definition:
• Write the syntax-directed definition, using the appropriate semantic action for each production rule of the given grammar.
• Generate the annotated parse tree and compute the attribute values in a bottom-up manner.
• The value obtained at the root node is the final output.
Parse tree, syntax tree and annotated parse
tree for the input string 5*6+7
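The bottom-up computation of a synthesized attribute for 5*6+7 can be sketched as follows. This is only an illustration: the tuple-based tree shape and the function name val are assumptions, and the grammar is taken to be the usual E → E + T, T → T * F, F → digit.

```python
# Parse-tree nodes are tuples (an assumed encoding):
#   ("digit", lexval) for leaves, (operator, left, right) for interior nodes.
def val(node):
    """Compute the synthesized attribute val in a bottom-up (post-order) walk."""
    if node[0] == "digit":
        return node[1]                 # F -> digit   { F.val = digit.lexval }
    op, left, right = node
    l, r = val(left), val(right)       # children are evaluated before the parent
    if op == "+":
        return l + r                   # E -> E1 + T  { E.val = E1.val + T.val }
    if op == "*":
        return l * r                   # T -> T1 * F  { T.val = T1.val * F.val }
    raise ValueError(f"unknown operator {op!r}")

# 5*6+7 parses as (5*6)+7 because * binds tighter than +.
tree = ("+", ("*", ("digit", 5), ("digit", 6)), ("digit", 7))
print(val(tree))  # 37
```

The value printed for the root corresponds to the annotation at the root of the annotated parse tree.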
Example of inherited attribute

For the string int a, b, c we have to distribute the data type int to all the identifiers a, b and c.
Steps to be followed:
• Construct the syntax-directed definition using semantic actions.
• Annotate the parse tree with inherited attributes by processing it in a top-down fashion.
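A minimal sketch of distributing the type int over a, b and c is given below. It assumes a grammar of the form D → T L, L → L1 , id | id; the names declare, distribute, symbol_table and the attribute name inh are illustrative and not taken from the slides.

```python
symbol_table = {}

def distribute(id_list, inh):
    """L -> L1 , id | id: every identifier receives the inherited type."""
    for name in id_list:
        symbol_table[name] = inh       # plays the role of addtype(id.entry, L.inh)

def declare(type_name, id_list):
    """D -> T L: pass T.type down to L as the inherited attribute L.inh."""
    distribute(id_list, inh=type_name)

declare("int", ["a", "b", "c"])
print(symbol_table)  # {'a': 'int', 'b': 'int', 'c': 'int'}
```

The type flows from the parent and left sibling down to each identifier, which is what makes the attribute inherited rather than synthesized.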
Dependency graph

• The directed graph that represents the interdependencies between synthesized and inherited attributes at nodes in the parse tree is called a dependency graph.
• For the rule X → YZ with the semantic action X.x = f(Y.y, Z.z), X.x is a synthesized attribute that depends on the attributes Y.y and Z.z.
Dependency graph for L-Attribute
(Figure: dependency edges run from a sibling and from parent to child.)
Evaluation Order

• A topological sort of the dependency graph decides the evaluation order in a parse tree.
• The semantic rules in the syntax-directed definition are used in deciding the evaluation order.
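As a sketch of how a topological sort yields an evaluation order, the snippet below uses Python's standard graphlib module (Python 3.9 or later). The dependency of X.x on Y.y and Z.z comes from the rule X.x = f(Y.y, Z.z); the extra edge from Y.y to Z.z is a hypothetical inherited dependency added only to make the example less trivial.

```python
from graphlib import TopologicalSorter

# Each attribute maps to the set of attributes it depends on.
deps = {
    "X.x": {"Y.y", "Z.z"},   # X.x = f(Y.y, Z.z)
    "Z.z": {"Y.y"},          # assumed: Z.z inherits from its left sibling Y
}

# static_order() lists every attribute after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['Y.y', 'Z.z', 'X.x']
```

If the dependency graph contained a cycle, no evaluation order would exist, and TopologicalSorter would raise graphlib.CycleError.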
Evaluation Order
Construction of Syntax Trees
• The nodes of the syntax tree are represented as objects with a suitable number of fields.
• Each object will have an op field that is the label of the node.
• The objects will have additional fields as follows:
• If the node is a leaf, an additional field holds the lexical value for the leaf. A constructor function Leaf(op, val) creates a leaf object. Alternatively, if nodes are viewed as records, then Leaf returns a pointer to a new record for a leaf.
• If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. A constructor function Node takes two or more arguments: Node(op, c1, c2, ..., ck) creates an object with first field op and k additional fields for the k children c1, ..., ck.
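The two constructors can be sketched as small classes. The helper show and the variable names p1, p2 and root are assumptions for illustration; the tree built here is for the expression x*y-5+z.

```python
class Leaf:
    def __init__(self, op, val):
        self.op = op                # label of the node, e.g. "id" or "num"
        self.val = val              # lexical value of the leaf

class Node:
    def __init__(self, op, *children):
        self.op = op                # operator label
        self.children = children    # k additional fields, one per child

def show(n):
    """Render the tree as a fully parenthesised string (illustrative helper)."""
    if isinstance(n, Leaf):
        return str(n.val)
    return "(" + f" {n.op} ".join(show(c) for c in n.children) + ")"

# Build the tree bottom-up, as an S-attributed definition would.
p1 = Node("*", Leaf("id", "x"), Leaf("id", "y"))
p2 = Node("-", p1, Leaf("num", 5))
root = Node("+", p2, Leaf("id", "z"))
print(show(root))  # (((x * y) - 5) + z)
```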
Construction of Syntax Trees
Example 1
Example 1
Example 2 (x*y-5+z)
Example 2 (x*y-5+z) (SDD for this grammar)
Variant of syntax tree- Directed Acyclic Graph
for Expression
• A Directed Acyclic Graph (DAG) depicts the structure of basic blocks, shows how values flow among them, and exposes opportunities for optimization. A DAG also makes transformations on basic blocks easy to apply. In a DAG:
• Leaf nodes represent identifiers, names or constants.
• Interior nodes represent operators.
• Interior nodes also represent the results of expressions or the identifiers/names where the values are to be stored or assigned.
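One common way to build a DAG for an expression is to reuse an existing node whenever the same operator and operands are requested again. The sketch below assumes that approach; get_node, nodes and index are illustrative names, and the expression (a - b) * (c + d) + (a - b) is chosen because it contains a repeated subexpression.

```python
nodes = []     # nodes[i] is the signature (op, operands...) of node i
index = {}     # signature -> node number, used to detect repeats

def get_node(op, *args):
    """Return the number of the node (op, args), creating it only if new."""
    key = (op, *args)
    if key not in index:
        index[key] = len(nodes)
        nodes.append(key)
    return index[key]

a, b, c, d = (get_node("id", name) for name in "abcd")
first = get_node("-", a, b)
product = get_node("*", first, get_node("+", c, d))
second = get_node("-", a, b)            # same signature: the node is shared
root = get_node("+", product, second)

print(first == second)  # True
print(len(nodes))       # 8
```

A plain syntax tree for the same expression would need 11 nodes; the DAG needs 8 because a, b and a - b are each stored once.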
Intermediate code generation
Intermediate code generation
• If we generate machine code directly from source code, then for n target machines we need n optimisers and n code generators. With a machine-independent intermediate code, we need only one optimiser.
• Intermediate code can be either language specific (e.g., bytecode for Java) or language independent (three-address code).
Postfix
• The ordinary (infix) way of writing the sum of a and b is with the operator in the middle: a + b. The postfix notation for the same expression places the operator at the right end: ab +.
• In general, if e1 and e2 are any postfix expressions and + is any binary operator, the result of applying + to the values denoted by e1 and e2 is written in postfix notation as e1 e2 +.
• Example: the postfix representation of the expression (a – b) * (c + d) + (a – b) is ab – cd + * ab – +.
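Postfix notation is what a post-order walk of the expression tree produces. A small sketch, assuming the tree is given as nested tuples (op, left, right) with plain strings as operands:

```python
def postfix(e):
    """Post-order walk: left operand, right operand, then the operator."""
    if isinstance(e, str):
        return e
    op, left, right = e
    return f"{postfix(left)} {postfix(right)} {op}"

# (a - b) * (c + d) + (a - b)
expr = ("+", ("*", ("-", "a", "b"), ("+", "c", "d")), ("-", "a", "b"))
print(postfix(expr))  # a b - c d + * a b - +
```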
Three-Address Code

• Each instruction has a single operand (the result) on the LHS and at most two operands and one operator on the RHS.
• Example: A = -B * (C/D)
T1 = -B
T2 = C/D
T3 = T1*T2
A = T3
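A sequence like the one above can be produced by walking the expression tree and introducing a new temporary for every operator. The following is a sketch under assumed conventions: unary operators are tuples (op, operand), binary operators are (op, left, right), and gen_tac is a hypothetical name.

```python
import itertools

def gen_tac(target, expr):
    """Return three-address statements that assign expr to target."""
    code = []
    temps = itertools.count(1)

    def walk(e):
        if isinstance(e, str):            # a name: no code needed
            return e
        if len(e) == 2:                   # unary operator
            op, operand = e
            rhs = f"{op}{walk(operand)}"
        else:                             # binary operator
            op, left, right = e
            l = walk(left)
            r = walk(right)
            rhs = f"{l} {op} {r}"
        temp = f"T{next(temps)}"          # one new temporary per operator
        code.append(f"{temp} = {rhs}")
        return temp

    result = walk(expr)
    code.append(f"{target} = {result}")
    return code

tac = gen_tac("A", ("*", ("-", "B"), ("/", "C", "D")))
print("\n".join(tac))
```

For this input the four generated statements are T1 = -B, T2 = C / D, T3 = T1 * T2 and A = T3.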
Quadruples

• Each instruction in the quadruples representation is divided into four fields: operator, arg1, arg2, and result.
• E.g.: a = b + (c * d)
Op    arg1   arg2   result
*     c      d      r1
+     b      r1     r2
=     r2            a
Triples

• Each instruction in the triples representation has three fields: op, arg1, and arg2.
• The results of sub-expressions are denoted by the position of the instruction that computes them.
• E.g.: a = b + (c * d)
       Op    arg1   arg2
(0)    *     c      d
(1)    +     b      (0)
(2)    =     a      (1)
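The relationship between the two representations can be sketched by converting quadruples for a = b + (c * d) into triples: every temporary name is replaced by the position of the triple that computes it. The function name to_triples and the tuple layout are assumptions made for this illustration.

```python
quads = [
    ("*", "c", "d", "r1"),
    ("+", "b", "r1", "r2"),
    ("=", "r2", None, "a"),
]

def to_triples(quads):
    position = {}      # temporary name -> index of the triple computing it
    triples = []

    def ref(arg):
        """Replace a temporary by a positional reference such as (0)."""
        return f"({position[arg]})" if arg in position else arg

    for op, arg1, arg2, result in quads:
        if op == "=":
            # Assignment: the target name goes in arg1, the value in arg2.
            triples.append(("=", result, ref(arg1)))
        else:
            triple = (op, ref(arg1), ref(arg2))
            position[result] = len(triples)
            triples.append(triple)
    return triples

for i, t in enumerate(to_triples(quads)):
    print(f"({i})", *t)
```

Because triples refer to each other by position, moving an instruction changes every reference to it, which is the problem indirect triples address.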
Indirect Triples

• This representation is an enhancement over the triples representation. It uses a separate list of pointers to the triples, instead of their positions, to identify instructions.
• E.g.: a = b + (c * d)

Statement   Pointer
(0)         (32)
(1)         (33)
(2)         (34)

        Op    arg1   arg2
(32)    *     c      d
(33)    +     b      (32)
(34)    =     a      (33)


Declaration
Declaration
• In the declarative statement the data items along with their data
types are declared
Declaration
• The body of the T-production consists of nonterminal B, an action, and nonterminal C.
• The action between B and C sets t to B.type and w to B.width.
• If B → int, then B.type is set to integer and B.width is set to 4, the width of an integer.
• If B → float, then B.type is float and B.width is 8, the width of a float.
• The productions for C determine whether T generates a basic type or an array type.
• If C → ɛ, then C.type becomes t and C.width becomes w.
• Otherwise, C specifies an array component.
• The action for C → [ num ] C1 forms C.type by applying the type constructor array to the operands num.value and C1.type.
• The result of applying array might be a tree structure.
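The actions described above can be sketched as two small functions. The names build_type and build_c are hypothetical, and the width rule C.width = num.value × C1.width is an assumption that is not stated in the text above, although it is the usual companion of the array constructor.

```python
BASIC = {"int": ("integer", 4), "float": ("float", 8)}   # B.type, B.width

def build_c(dims, t, w):
    """C -> [ num ] C1 | epsilon, with t and w passed down from B."""
    if not dims:                         # C -> epsilon { C.type = t; C.width = w }
        return t, w
    num, rest = dims[0], dims[1:]
    c1_type, c1_width = build_c(rest, t, w)
    # C.type = array(num.value, C1.type); width rule is assumed (see lead-in)
    return ("array", num, c1_type), num * c1_width

def build_type(basic, dims):
    """T -> B C: the action between B and C sets t and w."""
    t, w = BASIC[basic]
    return build_c(dims, t, w)

print(build_type("int", [2, 3]))  # (('array', 2, ('array', 3, 'integer')), 24)
```

The nested tuple is the tree structure produced by the array constructor for a declaration such as int[2][3].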
Assignment
Assignment
• Expressions can be of type integer, real, array and record.
• As part of the translation of an assignment into three-address code, names are looked up in the symbol table.
Example
Example x=(a+b)*(c+d)
Boolean Expressions
Boolean Expressions
• Boolean expressions serve two primary purposes:
• Computing logical values
• Acting as conditional expressions that alter the flow of control
• Boolean expressions are composed of the Boolean operators &&, || and ! (using the C convention for AND, OR and NOT, respectively) applied to elements that are Boolean variables or relational expressions.
• Relational expressions are of the form E1 rel E2, where E1 and E2 are arithmetic expressions.
Method of translating Boolean expression
• There are two principal methods of representing the value of a Boolean expression.
• The first method encodes true and false numerically and evaluates a Boolean expression analogously to an arithmetic expression.
• 1 is used for true and 0 is used for false.
• The second method is flow of control: the value of a Boolean expression is represented by a position reached in the program.
• Given E1 or E2, if we determine that E1 is true, then we can conclude that the entire expression is true without having to evaluate E2.
Numerical Representation of Boolean Expressions
• Expressed using 1 to denote true and 0 to denote false.
• Expressions are evaluated completely, from left to right.
• Example: consider "a or b and not c". The three-address sequence is
t1 = not c
t2 = b and t1
t3 = a or t2
• A relational expression such as a < b is equivalent to the conditional statement if a < b then 1 else 0, which can be translated into a three-address code sequence.
• emit places three-address statements into an output file in the right format.
• nextstat gives the index of the next three-address statement in the output sequence.
• emit increments nextstat after producing each three-address statement.
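The translation of a relational expression into numerical form can be sketched with emit and nextstat. Here nextstat is written as a function derived from the length of the output list rather than a counter that emit increments, and the starting index 100 and the helper name relop_value are assumptions.

```python
code = []        # the output sequence of three-address statements
START = 100      # assumed index of the first statement

def nextstat():
    """Index of the next three-address statement to be produced."""
    return START + len(code)

def emit(stmt):
    """Append a statement; nextstat() advances as a side effect."""
    code.append(stmt)

def relop_value(left, op, right, temp):
    """Set temp to 1 if 'left op right' holds, otherwise to 0."""
    emit(f"if {left} {op} {right} goto {nextstat() + 3}")
    emit(f"{temp} = 0")
    emit(f"goto {nextstat() + 2}")
    emit(f"{temp} = 1")

relop_value("a", "<", "b", "t1")
for i, stmt in enumerate(code, START):
    print(f"{i}: {stmt}")
```

With the assumed start index, the conditional jump at 100 targets statement 103 (t1 = 1), and the unconditional jump at 102 skips over it to 104.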
Short circuit
• A Boolean expression can be translated into three-address code without generating code for any of the Boolean operators.
• This style of evaluation is called short-circuit or jumping code.
Flow-of-Control Statements

• In these productions,
• Nonterminal B represents a Boolean expression.
• Nonterminal S represents a statement.
• Both B and S have a synthesized attribute code, which gives the translation into three-address instructions.
• For simplicity, we build up the translations B.code and S.code as strings.
Flow-of-Control Statements
• The translation of if (B) S1 consists of B.code followed by S1.code.
• Within B.code are jumps based on the value of B.
• If B is true, control flows to the first instruction of S1.code.
• If B is false, control flows to the instruction immediately following S1.code.
• The labels for the jumps in B.code and S.code are managed using inherited attributes.
• With a statement S, we associate an inherited attribute S.next denoting a label for the instruction immediately after the code for S.
• In some cases, the instruction immediately following S.code is a jump to some label L.
• A jump to a jump to L from within S.code is avoided using S.next.
Control-Flow Translation of Boolean Expressions
• Suppose B is of the form B1 || B2.
• If B1 is true, then we immediately know that B itself is true, so B1.true is the same as B.true.
• If B1 is false, then B2 must be evaluated, so we make B1.false the label of the first instruction in the code for B2.
• The true and false exits of B2 are the same as the true and false exits of B, respectively.
• The translation of B1 && B2 is similar.
• No code is needed for an expression B of the form !B1: just interchange the true and false exits of B to get the true and false exits of B1.
• The constants true and false translate into jumps to B.true and B.false, respectively.
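The rules above can be sketched as a recursive function that receives the true and false exits as inherited labels. The tuple encoding of expressions and the names jump_code and newlabel are assumptions; no fall-through optimization is attempted.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    return f"L{next(_labels)}"

def jump_code(b, true, false):
    """Return jumping code for b; true and false are the inherited exit labels."""
    kind = b[0]
    if kind == "rel":                       # B -> E1 rel E2
        _, left, op, right = b
        return [f"if {left} {op} {right} goto {true}", f"goto {false}"]
    if kind == "!":                         # B -> ! B1: interchange the exits
        return jump_code(b[1], false, true)
    if kind == "||":                        # B1.false = start of the code for B2
        mid = newlabel()
        return jump_code(b[1], true, mid) + [f"{mid}:"] + jump_code(b[2], true, false)
    if kind == "&&":                        # B1.true = start of the code for B2
        mid = newlabel()
        return jump_code(b[1], mid, false) + [f"{mid}:"] + jump_code(b[2], true, false)
    if kind == "true":
        return [f"goto {true}"]
    if kind == "false":
        return [f"goto {false}"]
    raise ValueError(f"unknown expression kind {kind!r}")

# a < b || !(c < d)
expr = ("||", ("rel", "a", "<", "b"), ("!", ("rel", "c", "<", "d")))
code = jump_code(expr, "Ltrue", "Lfalse")
print("\n".join(code))
```

Note that the ! operator produces no instruction of its own: the code for c < d simply jumps to Lfalse when the comparison holds and to Ltrue otherwise.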
Case Statements

• The "switch" or "case" statement is available in a variety of languages


Translation of Switch-Statements
• The intended translation of a switch is code to:
1. Evaluate the expression E.
2. Find the value Vj in the list of cases that is the same as the value of the expression. Recall that the default value matches the expression if none of the values explicitly mentioned in cases does.
3. Execute the statement Sj associated with the value found.
Translation of Switch-Statements – Step 2
• Step (2) is an n-way branch, which can be implemented in one of
several ways.
• If the number of cases is small, say 10 at most, then it is reasonable
to use a sequence of conditional jumps, each of which tests for an
individual value and transfers to the code for the corresponding
statement.
• If the number of values exceeds 10 or so, it is more efficient to
construct a hash table for the values, with the labels of the various
statements as entries.
• If no entry for the value possessed by the switch expression is found,
a jump to the default statement is generated
Translation of Switch-Statements – Step 2
• There is a common special case that can be implemented even more efficiently than by an n-way branch.
• If the values all lie in some small range, say min to max, and the number of different values is a reasonable fraction of max - min, then we can construct an array of max - min + 1 "buckets" (one per value in the inclusive range),
• where bucket j - min contains the label of the statement with value j.
• Any bucket that would otherwise remain unfilled contains the default label.
• To perform the switch:
• Evaluate the expression to obtain the value j.
• Check that it is in the range min to max.
• Transfer indirectly to the table entry at offset j - min.
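The bucket array can be sketched as a plain list indexed by j - min. The function names build_jump_table and dispatch and the label strings are assumptions; the table holds one bucket for every value in the inclusive range min to max.

```python
def build_jump_table(cases, default):
    """cases maps case values to statement labels; returns (min, max, table)."""
    lo, hi = min(cases), max(cases)
    table = [default] * (hi - lo + 1)     # unfilled buckets keep the default label
    for value, label in cases.items():
        table[value - lo] = label         # bucket j - min holds the label for j
    return lo, hi, table

def dispatch(j, lo, hi, table, default):
    """Range check, then an indirect transfer through the table."""
    if lo <= j <= hi:
        return table[j - lo]
    return default

cases = {3: "L1", 4: "L2", 6: "L3"}
lo, hi, table = build_jump_table(cases, "Ldefault")
print(table)                                    # ['L1', 'L2', 'Ldefault', 'L3']
print(dispatch(4, lo, hi, table, "Ldefault"))   # L2
print(dispatch(5, lo, hi, table, "Ldefault"))   # Ldefault
```

The lookup costs one range check and one indexed load regardless of the number of cases, which is why this form beats a chain of conditional jumps when the values are dense.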
Syntax-Directed Translation of Switch-
Statements
• When we see the keyword switch, we generate two new labels, test and next, and a new temporary t.
• Then, as we parse the expression E, we generate code to evaluate E into t.
• After processing E, we generate the jump goto test.
• Then, as we see each case keyword, we create a new label Li and enter it into the symbol table.
• We place in a queue, used only to store cases, a value-label pair consisting of the value Vi of the case constant and Li (or a pointer to the symbol-table entry for Li).
• We process each statement case Vi : Si by emitting the label Li attached to the code for Si, followed by the jump goto next.
Syntax-Directed Translation of Switch-
Statements
Back patching
• A key problem when generating code for Boolean expressions and
flow-of-control statements is that of matching a jump instruction with
the target of the jump.
• For example, the translation of the Boolean expression B in if ( B ) S
contains a jump, for when B is false, to the instruction following the
code for S.
• In a one-pass translation, B must be translated before S is examined.
• What then is the target of the goto that jumps over the code for S?
Backpatching for Boolean Expressions
Backpatching for Boolean Expressions
• Consider semantic action (1) for the production B → B1 || M B2:
B → B1 || M B2 { backpatch(B1.falselist, M.instr);
                 B.truelist = merge(B1.truelist, B2.truelist);
                 B.falselist = B2.falselist; }
• If B1 is true, then B is also true, so the jumps on B1.truelist become part of B.truelist.
• If B1 is false, however, we must next test B2, so the target for the jumps on B1.falselist must be the beginning of the code generated for B2.
• This target is obtained using the marker nonterminal M.
• That nonterminal produces, as a synthesized attribute M.instr, the index of the next instruction, just before the B2 code starts being generated.
• To obtain that instruction index, we associate with the production M → ɛ the semantic action { M.instr = nextinstr; }. The variable nextinstr holds the index of the next instruction to follow.
• This value will be backpatched onto B1.falselist (i.e., each instruction on the list B1.falselist will receive M.instr as its target label).
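The action for B → B1 || M B2 can be traced with a small sketch. Instructions are stored as [text, target] pairs so that a missing target can be filled in later; that storage format, the helper relop and the sample expression x < 100 || x > 200 are assumptions made for illustration.

```python
instrs = []    # each entry is [text, target]; target is None until backpatched

def nextinstr():
    return len(instrs)

def emit(text):
    instrs.append([text, None])

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(lst, target):
    """Fill in target as the jump destination of every instruction on lst."""
    for i in lst:
        instrs[i][1] = target

def relop(left, op, right):
    """B -> E1 rel E2: emit two jumps whose targets are not yet known."""
    truelist = makelist(nextinstr())
    falselist = makelist(nextinstr() + 1)
    emit(f"if {left} {op} {right} goto")
    emit("goto")
    return truelist, falselist

# B -> B1 || M B2
b1_true, b1_false = relop("x", "<", "100")
m_instr = nextinstr()                      # M -> epsilon { M.instr = nextinstr; }
b2_true, b2_false = relop("x", ">", "200")
backpatch(b1_false, m_instr)
b_true = merge(b1_true, b2_true)
b_false = b2_false

for i, (text, target) in enumerate(instrs):
    print(f"{i}: {text} {'_' if target is None else target}")
print(b_true, b_false)  # [0, 2] [3]
```

After the action runs, only instruction 1 has a target (2, the start of the B2 code); the jumps on B.truelist and B.falselist stay open until an enclosing construct backpatches them.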
https://www.youtube.com/watch?v=queUceGJqh0
