Unit 4&5 Final

The document discusses Semantic Analysis in programming languages, detailing its role in computing additional meaning beyond syntax, including type checking and symbol table management. It explains Syntax Directed Translation, including its principles and notations, and introduces intermediate code generation, highlighting the benefits of machine-independent forms and various representations like syntax trees and three-address code. Additionally, it outlines the advantages of three-address code for optimization and target code generation, along with types of statements used in this representation.

Uploaded by

kuttia726

SEMANTIC ANALYSIS

➢ Semantic Analysis computes additional information related to the meaning of the
program once the syntactic structure is known.
➢ In typed languages such as C, semantic analysis involves adding information to the
symbol table and performing type checking.
➢ The information to be computed is beyond the capabilities of standard parsing
techniques; therefore it is not regarded as syntax.
➢ As for Lexical and Syntax analysis, Semantic Analysis needs both a Representation
Formalism and an Implementation Mechanism.
➢ As representation formalism, this lecture uses what are called Syntax Directed
Translations.
SYNTAX DIRECTED TRANSLATION
➢ The Principle of Syntax Directed Translation states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its Parse-Tree.
➢ By Syntax Directed Translations we indicate those formalisms for specifying
translations for programming language constructs guided by context-free grammars.
o We associate Attributes with the grammar symbols representing the language
constructs.
o Values for attributes are computed by Semantic Rules associated with grammar
productions.
➢ Evaluation of Semantic Rules may:
o Generate code;
o Insert information into the Symbol Table;
o Perform semantic checks;
o Issue error messages.
There are two notations for attaching semantic rules:
1. Syntax Directed Definitions. High-level specifications hiding many implementation
details (also called Attribute Grammars).
2. Translation Schemes. More implementation oriented: they indicate the order in
which semantic rules are to be evaluated.

Syntax Directed Definitions
• Syntax Directed Definitions are a generalization of context-free grammars in which:
1. Grammar symbols have an associated set of Attributes;
2. Productions are associated with Semantic Rules for computing the values of attributes.
▪ Such a formalism generates Annotated Parse-Trees, where each node of the tree is a
record with a field for each attribute (e.g., X.a indicates the attribute a of the
grammar symbol X).
▪ The value of an attribute of a grammar symbol at a given parse-tree node is defined by
a semantic rule associated with the production used at that node.

We distinguish between two kinds of attributes:

1. Synthesized Attributes. They are computed from the values of the attributes of the
children nodes.
2. Inherited Attributes. They are computed from the values of the attributes of both
the siblings and the parent nodes.

Syntax Directed Definitions: An Example

• Example. Let us consider the grammar for arithmetic expressions. The Syntax Directed
Definition associates with each nonterminal a synthesized attribute called val.
S-ATTRIBUTED DEFINITIONS
Definition. An S-Attributed Definition is a Syntax Directed Definition that uses only
synthesized attributes.
• Evaluation Order. Semantic rules in an S-Attributed Definition can be evaluated by a
bottom-up, or PostOrder, traversal of the parse-tree.
• Example. The above arithmetic grammar is an example of an S-Attributed Definition. In
the annotated parse-tree for the input 3*5+4n, the node for 3*5 has val = 15 and the
root has val = 19.
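The postorder evaluation of a synthesized attribute can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of parse-tree nodes and the function name val are assumptions made here, and the semantic rules are assumed to be the usual ones for arithmetic (val of a sum is the sum of the children's val, and likewise for products).

```python
# Parse-tree nodes are tuples (a hypothetical encoding chosen for this sketch):
#   ("num", value)         a digit leaf
#   ("+", left, right)     an addition node
#   ("*", left, right)     a multiplication node
def val(node):
    """Compute the synthesized attribute val by a postorder traversal."""
    kind = node[0]
    if kind == "num":
        return node[1]              # val of a leaf is its lexical value
    left = val(node[1])             # children are evaluated first ...
    right = val(node[2])
    if kind == "+":                 # ... then the rule for the production applies
        return left + right
    if kind == "*":
        return left * right
    raise ValueError(f"unknown node kind: {kind}")

# Tree for 3*5+4, with '*' binding tighter than '+'.
tree = ("+", ("*", ("num", 3), ("num", 5)), ("num", 4))
print(val(tree))  # prints 19
```

Because every attribute depends only on the children, a single bottom-up pass is enough; no node is visited twice.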
L-attributed definition
Definition: An SDD is L-attributed if each inherited attribute of Xi in the RHS of
A → X1 X2 ... Xn depends only on
1. attributes of X1, X2, ..., Xi-1 (symbols to the left of Xi in the RHS), and
2. inherited attributes of A.
Restrictions for translation schemes:
1. An inherited attribute of Xi must be computed by an action before Xi.
2. An action must not refer to a synthesized attribute of any symbol to the right of that action.
3. A synthesized attribute for A can only be computed after all the attributes it
references have been computed (usually at the end of the RHS).
UNIT III – INTERMEDIATE CODE GENERATION

INTRODUCTION

The front end translates a source program into an intermediate representation from which
the back end generates target code.

Benefits of using a machine-independent intermediate form are:

1. Retargeting is facilitated. That is, a compiler for a different machine can be created by
attaching a back end for the new machine to an existing front end.

2. A machine-independent code optimizer can be applied to the intermediate representation.

Position of intermediate code generator

parser → static checker → intermediate code generator → intermediate code → code generator

INTERMEDIATE LANGUAGES

Three ways of intermediate representation:

• Syntax tree

• Postfix notation

• Three address code

The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.

Graphical Representations:

Syntax tree:

A syntax tree depicts the natural hierarchical structure of a source program. A dag
(Directed Acyclic Graph) gives the same information but in a more compact way because
common subexpressions are identified. A syntax tree and dag for the assignment statement
a := b * - c + b * - c are as follows:

NOTES.PMR-INSIGNIA.ORG
(a) Syntax tree: assign( a, +( *( b, uminus(c) ), *( b, uminus(c) ) ) )

(b) Dag: assign( a, +( n, n ) ), where the single shared node n is *( b, uminus(c) )

Postfix notation:

Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of
the tree in which a node appears immediately after its children. The postfix notation for the
syntax tree given above is

a b c uminus * b c uminus * + assign
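The postfix string above is what a postorder walk of the syntax tree produces. A minimal sketch, assuming a nested-tuple encoding of the tree (operator first, then children) in which leaves are plain strings; the encoding and the function name postfix are illustrative choices, not part of the notes:

```python
def postfix(node):
    """List the nodes so that each node appears right after its children."""
    if isinstance(node, str):       # a leaf: an identifier name
        return [node]
    op, *children = node
    out = []
    for child in children:          # children first, left to right
        out.extend(postfix(child))
    out.append(op)                  # then the node itself
    return out

b_times_minus_c = ("*", "b", ("uminus", "c"))
tree = ("assign", "a", ("+", b_times_minus_c, b_times_minus_c))
print(" ".join(postfix(tree)))
# prints: a b c uminus * b c uminus * + assign
```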

Syntax-directed definition:

Syntax trees for assignment statements are produced by the syntax-directed definition
below. Non-terminal S generates an assignment statement. The two binary operators + and *
are examples of the full operator set in a typical language. Operator associativities and
precedences are the usual ones, even though they have not been put into the grammar. This
definition constructs the tree from the input a := b * - c + b * - c.

PRODUCTION          SEMANTIC RULE

S → id := E         S.nptr := mknode(‘assign’, mkleaf(id, id.place), E.nptr)

E → E1 + E2         E.nptr := mknode(‘+’, E1.nptr, E2.nptr)

E → E1 * E2         E.nptr := mknode(‘*’, E1.nptr, E2.nptr)

E → - E1            E.nptr := mknode(‘uminus’, E1.nptr)

E → ( E1 )          E.nptr := E1.nptr

E → id              E.nptr := mkleaf(id, id.place)

Syntax-directed definition to produce syntax trees for assignment statements

The token id has an attribute place that points to the symbol-table entry for the identifier.
A symbol-table entry can be found from an attribute id.name, representing the lexeme associated
with that occurrence of id. If the lexical analyzer holds all lexemes in a single array of
characters, then attribute name might be the index of the first character of the lexeme.

Two representations of the syntax tree are as follows. In (a) each node is represented as a
record with a field for its operator and additional fields for pointers to its children. In (b), nodes
are allocated from an array of records and the index or position of the node serves as the pointer
to the node. All the nodes in the syntax tree can be visited by following pointers, starting from
the root at position 10.

Two representations of the syntax tree

(a) Linked records:
assign( id a, +( *( id b, uminus( id c ) ), *( id b, uminus( id c ) ) ) )

(b) Array of records:

position   op       arg1   arg2
 0         id       b
 1         id       c
 2         uminus   1
 3         *        0      2
 4         id       b
 5         id       c
 6         uminus   5
 7         *        4      6
 8         +        3      7
 9         id       a
10         assign   9      8
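The array-of-records layout in (b) can be reproduced with a small sketch in which mknode and mkleaf append a record to a list and return its index. The Python list, the tuple records, and the helper expr_b_times_minus_c are assumptions made for illustration; the order of the calls mirrors a bottom-up evaluation of the semantic rules.

```python
nodes = []  # array of records; a node's index doubles as its pointer

def mkleaf(kind, entry):
    """Append a leaf record and return its position."""
    nodes.append((kind, entry))
    return len(nodes) - 1

def mknode(op, *children):
    """Append an interior record whose fields are child positions."""
    nodes.append((op, *children))
    return len(nodes) - 1

def expr_b_times_minus_c():
    """Build b * - c bottom-up: leaves first, then uminus, then '*'."""
    b = mkleaf("id", "b")
    c = mkleaf("id", "c")
    return mknode("*", b, mknode("uminus", c))

left = expr_b_times_minus_c()      # positions 0..3
right = expr_b_times_minus_c()     # positions 4..7
plus = mknode("+", left, right)    # position 8
a = mkleaf("id", "a")              # position 9
root = mknode("assign", a, plus)   # position 10

for position, record in enumerate(nodes):
    print(position, *record)
```

Following the child positions from root visits every node, which is why the position of the root is all a later phase needs to keep.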

Three-Address Code:

Three-address code is a sequence of statements of the general form

x := y op z

where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on
boolean-valued data. Thus a source language expression like x + y * z might be translated
into the sequence

t1 := y * z
t2 := x + t1

where t1 and t2 are compiler-generated temporary names.

Advantages of three-address code:

• The unraveling of complicated arithmetic expressions and of nested flow-of-control
statements makes three-address code desirable for target code generation and
optimization.

• The use of names for the intermediate values computed by a program allows
three-address code to be easily rearranged, unlike postfix notation.

Three-address code is a linearized representation of a syntax tree or a dag in which
explicit names correspond to the interior nodes of the graph. The syntax tree and dag given
earlier are represented by the three-address code sequences that follow. Variable names can
appear directly in three-address statements.

Three-address code corresponding to the syntax tree and dag given above

(a) Code for the syntax tree        (b) Code for the dag

t1 := - c                           t1 := - c
t2 := b * t1                        t2 := b * t1
t3 := - c                           t5 := t2 + t2
t4 := b * t3                        a := t5
t5 := t2 + t4
a := t5

The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.

Types of Three-Address Statements:

The common three-address statements are:

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical
operation.

2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary
operations include unary minus, logical negation, shift operators, and conversion operators
that, for example, convert a fixed-point number to a floating-point number.

3. Copy statements of the form x := y where the value of y is assigned to x.

4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.

5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator
(<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation
relop to y. If not, the three-address statement following if x relop y goto L is executed next,
as in the usual sequence.

6. param x and call p, n for procedure calls, and return y, where y, representing a returned
value, is optional. For example,
param x1
param x2
...
param xn
call p, n
is generated as part of a call of the procedure p(x1, x2, ..., xn).

7. Indexed assignments of the form x := y[i] and x[i] := y.

8. Address and pointer assignments of the form x := &y, x := *y, and *x := y.

Syntax-Directed Translation into Three-Address Code:

When three-address code is generated, temporary names are made up for the interior
nodes of a syntax tree. For example, id := E consists of code to evaluate E into some temporary
t, followed by the assignment id.place := t.

Given input a := b * - c + b * - c, the three-address code is as shown above. The
synthesized attribute S.code represents the three-address code for the assignment S.
The nonterminal E has two attributes:
1. E.place, the name that will hold the value of E, and
2. E.code, the sequence of three-address statements evaluating E.

Syntax-directed definition to produce three-address code for assignments

PRODUCTION        SEMANTIC RULES

S → id := E       S.code := E.code || gen(id.place ‘:=’ E.place)

E → E1 + E2       E.place := newtemp;
                  E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)

E → E1 * E2       E.place := newtemp;
                  E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)

E → - E1          E.place := newtemp;
                  E.code := E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)

E → ( E1 )        E.place := E1.place;
                  E.code := E1.code

E → id            E.place := id.place;
                  E.code := ‘ ’
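The attributes E.place and E.code can be mimicked by a recursive function that returns a (place, code) pair. The sketch below is an illustration under assumptions: expressions arrive as nested tuples rather than from a parser, newtemp is built on itertools.count, the temporary is allocated after the operands are translated so that the numbering matches the listing for the syntax tree, and unary minus is printed as "-".

```python
import itertools

_counter = itertools.count(1)

def newtemp():
    """Return a fresh temporary name t1, t2, ... on successive calls."""
    return f"t{next(_counter)}"

def gen_expr(node):
    """Return (place, code) for an expression tree, like E.place and E.code."""
    if isinstance(node, str):                 # E -> id
        return node, []
    if node[0] == "uminus":                   # E -> - E1
        p1, c1 = gen_expr(node[1])
        place = newtemp()
        return place, c1 + [f"{place} := - {p1}"]
    op, e1, e2 = node                         # E -> E1 + E2  or  E -> E1 * E2
    p1, c1 = gen_expr(e1)
    p2, c2 = gen_expr(e2)
    place = newtemp()
    return place, c1 + c2 + [f"{place} := {p1} {op} {p2}"]

def gen_assign(name, expr):                   # S -> id := E
    place, code = gen_expr(expr)
    return code + [f"{name} := {place}"]

term = ("*", "b", ("uminus", "c"))
code = gen_assign("a", ("+", term, term))
print("\n".join(code))
```

Concatenating Python lists plays the role of the || operator in the semantic rules.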

Semantic rules generating code for a while statement

S.begin:    E.code
            if E.place = 0 goto S.after
            S1.code
            goto S.begin
S.after:    ...

PRODUCTION            SEMANTIC RULES

S → while E do S1     S.begin := newlabel;
                      S.after := newlabel;
                      S.code := gen(S.begin ‘:’) ||
                                E.code ||
                                gen(‘if’ E.place ‘=’ ‘0’ ‘goto’ S.after) ||
                                S1.code ||
                                gen(‘goto’ S.begin) ||
                                gen(S.after ‘:’)

• The function newtemp returns a sequence of distinct names t1, t2, ... in response to
successive calls.
• Notation gen(x ‘:=’ y ‘+’ z) is used to represent the three-address statement x := y + z.
Expressions appearing instead of variables like x, y and z are evaluated when passed to
gen, and quoted operators or operands, like ‘+’, are taken literally.
• Flow-of-control statements can be added to the language of assignments. The code for
S → while E do S1 is generated using new attributes S.begin and S.after to mark the first
statement in the code for E and the statement following the code for S, respectively.
• The function newlabel returns a new label every time it is called.
• We assume that a non-zero expression represents true; that is, when the value of E
becomes zero, control leaves the while statement.
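The layout of the while code can be sketched as a function that wraps already generated code for E and S1 between two fresh labels. The label format L1, L2, ... and the sample loop are assumptions chosen for illustration only.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    """Return a new label each time it is called (L1, L2, ...)."""
    return f"L{next(_labels)}"

def gen_while(e_code, e_place, s1_code):
    """Assemble S.code for S -> while E do S1 from its parts."""
    begin = newlabel()                        # S.begin
    after = newlabel()                        # S.after
    return (
        [f"{begin}:"]
        + e_code
        + [f"if {e_place} = 0 goto {after}"]  # leave the loop when E is zero
        + s1_code
        + [f"goto {begin}"]
        + [f"{after}:"]
    )

# Hypothetical loop: while i - n do i := i + 1
code = gen_while(["t1 := i - n"], "t1", ["i := i + 1"])
print("\n".join(code))
```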

Implementation of Three-Address Statements:

A three-address statement is an abstract form of intermediate code. In a compiler,
these statements can be implemented as records with fields for the operator and the operands.
Three such representations are:

• Quadruples

• Triples

• Indirect triples

Quadruples:

• A quadruple is a record structure with four fields: op, arg1, arg2 and result.

• The op field contains an internal code for the operator. The three-address statement
x := y op z is represented by placing y in arg1, z in arg2 and x in result.

• The contents of fields arg1, arg2 and result are normally pointers to the symbol-table
entries for the names represented by these fields. If so, temporary names must be entered
into the symbol table as they are created.

Triples:

• To avoid entering temporary names into the symbol table, we might refer to a temporary
value by the position of the statement that computes it.

• If we do so, three-address statements can be represented by records with only three fields:
op, arg1 and arg2.

• The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table
or pointers into the triple structure (for temporary values).

• Since three fields are used, this intermediate code format is known as triples.

      (a) Quadruples                           (b) Triples

      op       arg1   arg2   result            op       arg1   arg2

(0)   uminus   c             t1          (0)   uminus   c
(1)   *        b      t1     t2          (1)   *        b      (0)
(2)   uminus   c             t3          (2)   uminus   c
(3)   *        b      t3     t4          (3)   *        b      (2)
(4)   +        t2     t4     t5          (4)   +        (1)    (3)
(5)   :=       t5            a           (5)   assign   a      (4)

Quadruple and triple representation of three-address statements given above
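The relation between the two representations can be sketched as a conversion: each temporary name in a quadruple is replaced by the position of the triple that computes it. This is an illustrative sketch that assumes every result of a non-copy statement is a temporary; tuples stand in for records, integers for pointers into the triple list, and None for an empty field.

```python
quads = [
    # (op, arg1, arg2, result)
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]

def to_triples(quads):
    """Rewrite quadruples as triples, naming temporaries by position."""
    position = {}   # temporary name -> index of the triple that computes it
    triples = []

    def ref(arg):
        # A known temporary becomes a position; any other name stays as is.
        return position.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == ":=":
            triples.append(("assign", result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples

for i, triple in enumerate(to_triples(quads)):
    print(i, triple)
```

The result field disappears because a position already identifies the value, which is exactly why temporaries no longer need symbol-table entries.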

A ternary operation like x[i] := y requires two entries in the triple structure, as shown
below, while x := y[i] is naturally represented as two operations.

      op       arg1   arg2               op       arg1   arg2

(0)   []=      x      i            (0)   =[]      y      i
(1)   assign   (0)    y            (1)   assign   x      (0)

      (a) x[i] := y                      (b) x := y[i]

Indirect Triples:

• Another implementation of three-address code is that of listing pointers to triples, rather
than listing the triples themselves. This implementation is called indirect triples.

• For example, let us use an array statement to list pointers to triples in the desired order.
Then the triples shown above might be represented as follows:

      statement            op       arg1   arg2

(0)   (14)          (14)   uminus   c
(1)   (15)          (15)   *        b      (14)
(2)   (16)          (16)   uminus   c
(3)   (17)          (17)   *        b      (16)
(4)   (18)          (18)   +        (15)   (17)
(5)   (19)          (19)   assign   a      (18)

Indirect triples representation of three-address statements

DECLARATIONS

As the sequence of declarations in a procedure or block is examined, we can lay out
storage for names local to the procedure. For each local name, we create a symbol-table entry
with information like the type and the relative address of the storage for the name. The relative
address consists of an offset from the base of the static data area or the field for local data in an
activation record.

Declarations in a Procedure:
The syntax of languages such as C, Pascal and Fortran, allows all the declarations in a
single procedure to be processed as a group. In this case, a global variable, say offset, can keep
track of the next available relative address.

In the translation scheme shown below:

• Nonterminal P generates a sequence of declarations of the form id : T.

• Before the first declaration is considered, offset is set to 0. As each new name is seen,
that name is entered in the symbol table with offset equal to the current value of offset,
and offset is incremented by the width of the data object denoted by that name.

• The procedure enter(name, type, offset) creates a symbol-table entry for name, and gives it
type type and relative address offset in its data area.

• Attribute type represents a type expression constructed from the basic types integer and
real by applying the type constructors pointer and array. If type expressions are
represented by graphs, then attribute type might be a pointer to the node representing a
type expression.

• The width of an array is obtained by multiplying the width of each element by the
number of elements in the array. The width of each pointer is assumed to be 4.

Computing the types and relative addresses of declared names

P → { offset := 0 } D

D → D ; D

D → id : T                   { enter(id.name, T.type, offset);
                               offset := offset + T.width }

T → integer                  { T.type := integer;
                               T.width := 4 }

T → real                     { T.type := real;
                               T.width := 8 }

T → array [ num ] of T1      { T.type := array(num.val, T1.type);
                               T.width := num.val × T1.width }

T → ↑ T1                     { T.type := pointer(T1.type);
                               T.width := 4 }
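The offset bookkeeping in this scheme can be sketched as follows. The tuple encoding of type expressions, the dictionary used as symbol table, and the sample declarations are assumptions for illustration; the widths (integer 4, real 8, pointer 4) are the ones stated in the scheme.

```python
def width_of(t):
    """Width of a type expression, using the widths from the scheme above."""
    if t == "integer":
        return 4
    if t == "real":
        return 8
    if t[0] == "pointer":            # ("pointer", T1)
        return 4
    if t[0] == "array":              # ("array", num, T1)
        return t[1] * width_of(t[2])
    raise ValueError(f"unknown type: {t!r}")

def layout(declarations):
    """Assign a relative address to each declared name, in order."""
    symtab = {}
    offset = 0                       # set to 0 before the first declaration
    for name, t in declarations:
        symtab[name] = (t, offset)   # plays the role of enter(name, type, offset)
        offset += width_of(t)
    return symtab, offset

# Hypothetical declarations: i : integer; x : real; a : array[10] of integer; p : ↑real
declarations = [
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
]
symtab, total = layout(declarations)
for name, (t, offset) in symtab.items():
    print(name, offset)
print("total width:", total)
```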

Keeping Track of Scope Information:

When a nested procedure is seen, processing of declarations in the enclosing procedure is
temporarily suspended. This approach will be illustrated by adding semantic rules to the
following language:

P → D

D → D ; D | id : T | proc id ; D ; S

One possible implementation of a symbol table is a linked list of entries for names.

A new symbol table is created when a procedure declaration D → proc id ; D1 ; S is seen,
and entries for the declarations in D1 are created in the new table. The new table points back to
the symbol table of the enclosing procedure; the name represented by id itself is local to the
enclosing procedure. The only change from the treatment of variable declarations is that the
procedure enter is told which symbol table to make an entry in.

For example, consider the symbol tables for procedures readarray, exchange, and
quicksort pointing back to that for the containing procedure sort, consisting of the entire
program. Since partition is declared within quicksort, its table points to that of quicksort.

Symbol tables for nested procedures

table        header points to    entries
sort         nil                 a, x, readarray (to readarray), exchange (to exchange), quicksort (to quicksort)
readarray    sort                i
exchange     sort                (none shown)
quicksort    sort                k, v, partition (to partition)
partition    quicksort           i, j

The semantic rules are defined in terms of the following operations:

1. mktable(previous) creates a new symbol table and returns a pointer to the new table. The
argument previous points to a previously created symbol table, presumably that for the
enclosing procedure.

2. enter(table, name, type, offset) creates a new entry for name name in the symbol table pointed
to by table. Again, enter places type type and relative address offset in fields within the entry.

3. addwidth(table, width) records the cumulative width of all the entries in table in the header
associated with this symbol table.

4. enterproc(table, name, newtable) creates a new entry for procedure name in the symbol table
pointed to by table. The argument newtable points to the symbol table for this procedure
name.

Syntax directed translation scheme for nested procedures

P → M D                     { addwidth(top(tblptr), top(offset));
                              pop(tblptr); pop(offset) }

M → ε                       { t := mktable(nil);
                              push(t, tblptr); push(0, offset) }

D → D1 ; D2

D → proc id ; N D1 ; S      { t := top(tblptr);
                              addwidth(t, top(offset));
                              pop(tblptr); pop(offset);
                              enterproc(top(tblptr), id.name, t) }

D → id : T                  { enter(top(tblptr), id.name, T.type, top(offset));
                              top(offset) := top(offset) + T.width }

N → ε                       { t := mktable(top(tblptr));
                              push(t, tblptr); push(0, offset) }

• The stack tblptr is used to contain pointers to the tables for sort, quicksort, and partition
when the declarations in partition are considered.

• The top element of stack offset is the next available relative address for a local of the
current procedure.

• All semantic actions in the subtrees for B and C in

A → B C {actionA}

are done before actionA at the end of the production occurs. Hence, the action associated
with the marker M is the first to be done.

• The action for nonterminal M initializes stack tblptr with a symbol table for the
outermost scope, created by operation mktable(nil). The action also pushes relative
address 0 onto stack offset.

• Similarly, the nonterminal N uses the operation mktable(top(tblptr)) to create a new
symbol table. The argument top(tblptr) gives the enclosing scope for the new table.

• For each variable declaration id : T, an entry is created for id in the current symbol table.
The top of stack offset is incremented by T.width.

• When the action on the right side of D → proc id ; N D1 ; S occurs, the width of all
declarations generated by D1 is on the top of stack offset; it is recorded using addwidth.
Stacks tblptr and offset are then popped.
At this point, the name of the enclosed procedure is entered into the symbol table of its
enclosing procedure.
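The interplay of the two stacks can be sketched by driving the four operations by hand. The Table class, the helper names begin_scope, declare, end_proc and end_program, and the chosen declarations (all 4-byte integers) are assumptions for illustration; mktable, enter, addwidth and enterproc follow the descriptions given earlier.

```python
class Table:
    def __init__(self, previous):
        self.previous = previous     # link to the enclosing procedure's table
        self.entries = {}
        self.width = 0               # filled in by addwidth

def mktable(previous):
    return Table(previous)

def enter(table, name, type_, offset):
    table.entries[name] = ("var", type_, offset)

def addwidth(table, width):
    table.width = width

def enterproc(table, name, newtable):
    table.entries[name] = ("proc", newtable)

tblptr, offset = [], []              # the two stacks of the scheme

def begin_scope():                   # actions of the markers M and N
    previous = tblptr[-1] if tblptr else None
    tblptr.append(mktable(previous))
    offset.append(0)

def declare(name, type_, width):     # D -> id : T
    enter(tblptr[-1], name, type_, offset[-1])
    offset[-1] += width

def end_proc(name):                  # action at the end of D -> proc id ; N D1 ; S
    t = tblptr.pop()
    addwidth(t, offset.pop())
    enterproc(tblptr[-1], name, t)

def end_program():                   # action at the end of P -> M D
    t = tblptr.pop()
    addwidth(t, offset.pop())
    return t

begin_scope()                        # M: outermost table, for sort
declare("x", "integer", 4)
begin_scope()                        # N: table for quicksort
declare("k", "integer", 4)
declare("v", "integer", 4)
begin_scope()                        # N: table for partition
declare("i", "integer", 4)
declare("j", "integer", 4)
end_proc("partition")
end_proc("quicksort")
sort = end_program()

quicksort = sort.entries["quicksort"][1]
partition = quicksort.entries["partition"][1]
print(sort.width, quicksort.width, partition.width)
```

Each table records only the width of its own variables, since a nested procedure's locals live in a separate activation record.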

ASSIGNMENT STATEMENTS

Suppose that the context in which an assignment appears is given by the following grammar.

P → M D

M → ε

D → D ; D | id : T | proc id ; N D ; S

N → ε

Nonterminal P becomes the new start symbol when these productions are added to those in the
translation scheme shown below.

Translation scheme to produce three-address code for assignments

S → id := E       { p := lookup(id.name);
                    if p ≠ nil then
                        emit(p ‘:=’ E.place)
                    else error }

E → E1 + E2       { E.place := newtemp;
                    emit(E.place ‘:=’ E1.place ‘+’ E2.place) }

E → E1 * E2       { E.place := newtemp;
                    emit(E.place ‘:=’ E1.place ‘*’ E2.place) }

E → - E1          { E.place := newtemp;
                    emit(E.place ‘:=’ ‘uminus’ E1.place) }

E → ( E1 )        { E.place := E1.place }

E → id            { p := lookup(id.name);
                    if p ≠ nil then
                        E.place := p
                    else error }

Reusing Temporary Names

• The temporaries used to hold intermediate values in expression calculations tend to
clutter up the symbol table, and space has to be allocated to hold their values.

• Temporaries can be reused by changing newtemp. The code generated by the rules for
E → E1 + E2 has the general form:

evaluate E1 into t1
evaluate E2 into t2
t := t1 + t2

• The lifetimes of these temporaries are nested like matching pairs of balanced parentheses.

• Keep a count c, initialized to zero. Whenever a temporary name is used as an operand,
decrement c by 1. Whenever a new temporary name is generated, use $c and increase c
by 1.

• For example, consider the assignment x := a * b + c * d - e * f

Three-address code with stack temporaries

statement           value of c

                    0
$0 := a * b         1
$1 := c * d         2
$0 := $0 + $1       1
$1 := e * f         2
$0 := $0 - $1       1
x := $0             0
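The counting rule can be checked with a short sketch that emits the statements for x := a * b + c * d - e * f by hand. The helper names emit_binary and emit_copy, and the convention that a name starting with "$" is a temporary, are assumptions made for this illustration.

```python
c = 0        # count of stack temporaries currently in use
code = []    # (statement, value of c after the statement)

def is_temp(name):
    return name.startswith("$")

def emit_binary(op, left, right):
    """Emit target := left op right, reusing temporaries stack-fashion."""
    global c
    for operand in (left, right):
        if is_temp(operand):
            c -= 1                  # a temporary used as an operand is released
    target = f"${c}"                # a new temporary takes the name $c ...
    c += 1                          # ... and c is increased by 1
    code.append((f"{target} := {left} {op} {right}", c))
    return target

def emit_copy(name, source):
    global c
    if is_temp(source):
        c -= 1
    code.append((f"{name} := {source}", c))

t1 = emit_binary("*", "a", "b")
t2 = emit_binary("*", "c", "d")
t3 = emit_binary("+", t1, t2)
t4 = emit_binary("*", "e", "f")
t5 = emit_binary("-", t3, t4)
emit_copy("x", t5)

for statement, count in code:
    print(f"{statement:<16}{count}")
```

The scheme works because the last temporary created is always the first one consumed, so two names suffice for this expression.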

Addressing Array Elements:

Elements of an array can be accessed quickly if the elements are stored in a block of
consecutive locations. If the width of each array element is w, then the ith element of array A
begins in location

base + ( i - low ) × w

where low is the lower bound on the subscript and base is the relative address of the storage
allocated for the array. That is, base is the relative address of A[low].
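As a quick numeric check of the formula, here is a small sketch; the function name element_address and the sample values (base 100, lower bound 1, 4-byte elements) are assumptions chosen for illustration.

```python
def element_address(base, low, w, i):
    """Relative address of A[i] for an array stored in consecutive locations."""
    return base + (i - low) * w

# A[1..10] of 4-byte elements whose storage starts at relative address 100.
print(element_address(100, 1, 4, 1))   # A[low] sits at base: prints 100
print(element_address(100, 1, 4, 3))   # two elements further on: prints 108
```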

292 Design and Implementation of Compiler

After the syntax tree has been constructed, the compiler must check whether the input program is
type-correct (this is called type checking and is part of semantic analysis). During type checking, a
compiler checks whether the use of names (such as variables, functions, type names) is consistent with
their definition in the program. Consequently, it is necessary to remember declarations so that we can
detect inconsistencies and misuses during type checking. This is the task of a symbol table. Note that a
symbol table is a compile-time data structure; it is not used at run time by statically typed languages.
Formally, a symbol table maps names into declarations (called attributes), such as mapping the variable
name x to its type int. More specifically, a symbol table stores:
• For each type name, its type definition.
• For each variable name, its type. If the variable is an array, it also stores dimension information. It may
also store the storage class, the offset in the activation record, etc.
• For each constant name, its type and value.
• For each function and procedure, its formal parameter list and its output type. Each formal parameter
must have a name, a type, a passing mode (by-reference or by-value), etc.

9.1 OPERATION ON SYMBOL TABLE

We need to implement the following operations for a symbol table:
1. insert ( String key, Object binding )
2. object_lookup ( String key )
3. begin_scope () and end_scope ()
(1) insert (s, t): returns the index of the new entry for string ‘s’ and token ‘t’.
(2) lookup (s): returns the index of the entry for string ‘s’, or 0 if ‘s’ is not found.
(3) begin_scope () and end_scope (): When we have a new block (i.e., when we encounter the token {), we
begin a new scope. When we exit a block (i.e., when we encounter the token }) we remove the scope (this is
the end_scope). When we remove a scope, we remove all declarations inside this scope. So basically, scopes
behave like stacks. One way to implement these functions is to use a stack. When we begin a new scope we
push a special marker onto the stack (e.g., –1). When we insert a new declaration in the hash table using
insert, we also push the bucket number onto the stack. When we end a scope, we pop the stack up to and
including the first –1 marker.
Example 9.1: Consider the following program:
1) {
2) int a;
3) {
4) int a;
5) a = 1;
6) };
7) a = 2;
8) };
We have the following sequence of commands for each line in the source program (we assume that the hash
key for a is 12):
1) push(–1)
2) insert the binding from a to int into the beginning of the list table[12]
   push(12)
3) push(–1)
4) insert the binding from a to int into the beginning of the list table[12]
   push(12)
6) pop()
   remove the head of table[12]
   pop()
8) pop()
   remove the head of table[12]
   pop()
Recall that when we search for a declaration using lookup, we search the bucket list from the beginning to the
end, so that if we have multiple declarations with the same name, the declaration in the innermost scope
overrides the declaration in the outer scope.
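The marker-stack technique can be sketched as below. This is an illustration under assumptions: the table has 16 buckets, the hash function is a toy sum of character codes, lookup returns the binding (or None) rather than an index, and the inner declaration is given type float instead of int so that the shadowing is visible in the output.

```python
MARKER = -1
NBUCKETS = 16

table = [[] for _ in range(NBUCKETS)]    # each bucket is a list, head at index 0
scope_stack = []

def bucket_of(key):
    return sum(map(ord, key)) % NBUCKETS  # toy hash function (an assumption)

def begin_scope():
    scope_stack.append(MARKER)

def insert(key, binding):
    b = bucket_of(key)
    table[b].insert(0, (key, binding))    # new binding goes to the head
    scope_stack.append(b)                 # remember which bucket was touched

def lookup(key):
    for k, binding in table[bucket_of(key)]:
        if k == key:
            return binding                # innermost declaration is found first
    return None

def end_scope():
    while True:
        b = scope_stack.pop()
        if b == MARKER:
            break
        table[b].pop(0)                   # remove the head of that bucket

begin_scope()
insert("a", "int")
begin_scope()
insert("a", "float")
print(lookup("a"))    # inner declaration: prints float
end_scope()
print(lookup("a"))    # outer declaration again: prints int
end_scope()
print(lookup("a"))    # no declaration left: prints None
```

Removing the head is safe because the stack unwinds in reverse insertion order, so the most recent entry of a bucket is always the one being undone.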
(4) Handling Reserved Keywords: The symbol table also handles reserved keywords like ‘PLUS’, ‘MINUS’,
‘MUL’, etc. This can be done in the following manner.
insert (“PLUS”, PLUS);
insert (“MINUS”, MINUS);
In each call, the first ‘PLUS’ or ‘MINUS’ is the lexeme and the second one is the token.

9.2 SYMBOL TABLE IMPLEMENTATION

The data structure for a particular implementation of a symbol table is sketched in figure 9.1. In figure 9.1, a
separate array ‘arr_lexeme’ holds the character strings forming the identifiers. Each string is terminated by an
end-of-string character, denoted by EOS, that may not appear in identifiers. Each entry in the symbol-table
array ‘arr_symbol_table’ is a record consisting of two fields: ‘lexeme_pointer’, pointing to the beginning of a
lexeme, and token. Additional fields can hold attribute values. In figure 9.1, the 0th entry is left empty,
because lookup returns 0 to indicate that there is no entry for a string. The 1st through 7th entries are for
‘a’, ‘plus’, ‘b’, ‘and’, ‘c’, ‘minus’, and ‘d’, where the 2nd, 4th and 6th entries are for reserved keywords.

ARRAY arr_symbol_table (for the input a + b AND c – d)

Position   Lexeme_pointer   Token   Attribute

0          (empty)
1          to a             id
2          to PLUS          plus
3          to b             id
4          to AND           AND
5          to c             id
6          to MINUS         minus
7          to d             id

ARRAY arr_lexeme

a EOS P L U S EOS b EOS A N D EOS c EOS M I N U S EOS d

Figure 9.1: Implemented symbol table


When the lexical analyzer reads a letter, it starts saving letters and digits in a buffer ‘lex_buffer’. The string
collected in lex_buffer is then looked up in the symbol table, using the lookup operation. Since the symbol
table is initialized with entries for the keywords plus, minus, the AND operator and some identifiers, as shown
in figure 9.1, the lookup operation will find these entries if lex_buffer contains one of them, for example plus
or minus. If there is no entry for the string in lex_buffer, i.e., lookup returns 0, then lex_buffer contains a
lexeme for a new identifier. An entry for the identifier is created using insert( ). After the insertion is made,
‘n’ is the index of the symbol-table entry for the string in lex_buffer. This index is communicated to the parser
by setting tokenval to n, and the token in the token field of the entry is returned.
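The two-array organisation and the lookup-then-insert step can be sketched as follows. The Python lists, the "\0" character used for EOS, the helper lexeme_at, the function next_token, and the order in which the keywords are preloaded are assumptions made for illustration.

```python
EOS = "\0"
arr_lexeme = []                  # one long character array holding all lexemes
arr_symbol_table = [None]        # entry 0 stays empty: 0 means "not found"

def lexeme_at(pointer):
    """Read the lexeme that starts at pointer and runs up to the next EOS."""
    end = arr_lexeme.index(EOS, pointer)
    return "".join(arr_lexeme[pointer:end])

def lookup(s):
    """Return the index of the entry for s, or 0 if s is not in the table."""
    for n in range(1, len(arr_symbol_table)):
        lexeme_pointer, _token = arr_symbol_table[n]
        if lexeme_at(lexeme_pointer) == s:
            return n
    return 0

def insert(s, token):
    """Append s to the lexeme array and return the index of its new entry."""
    lexeme_pointer = len(arr_lexeme)
    arr_lexeme.extend(s)
    arr_lexeme.append(EOS)
    arr_symbol_table.append((lexeme_pointer, token))
    return len(arr_symbol_table) - 1

def next_token(lex_buffer):
    """Return (token, tokenval) for the string collected in lex_buffer."""
    n = lookup(lex_buffer)
    if n == 0:                   # unknown string: a new identifier
        n = insert(lex_buffer, "id")
    return arr_symbol_table[n][1], n

# Preload the reserved keywords, then classify a few strings.
insert("plus", "plus")
insert("minus", "minus")
insert("AND", "AND")
print(next_token("a"))       # new identifier
print(next_token("minus"))   # keyword found by lookup
print(next_token("a"))       # second occurrence reuses the same entry
```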

9.3 DATA STRUCTURE FOR SYMBOL TABLE

9.3.1 List

The simplest and easiest-to-implement data structure for a symbol table is a linear list of records. We use a
single array, or a collection of several arrays, to store names and their associated information. New names
are added at the end of the array, and the end of the array is always marked by a pointer known as ‘space’.
When we insert a name into this list, the whole array is searched from ‘space’ back to the beginning. If the
name is not found, we create an entry at ‘space’ and increment ‘space’ by one, or by the width of the data
type. insert( ) and object_lookup( ) are the major operations on such a table, while begin_scope( ) and
end_scope( ) play a minor role; a record may also carry fields such as a ‘token type’ attribute. The first entry
of the table is always left empty, because object_lookup returns ‘0’ to indicate that a string is not in the
symbol table.
Complexity: If a symbol table holds ‘n’ names, then inserting a new name requires examining up to ‘n’ entries
in the worst case. So the cost of one search is O(n), and inserting ‘n’ names costs O(n^2) in the worst case.

Variable   Information (type)   Space (byte)

a          Integer              2
b          Float                4
c          Character            1
d          Long                 4
SPACE

Figure 9.2: Symbol table as list


9.3.2 Self Organizing List

To reduce the time of searching we can add an additional field ‘linker’ to each record or each array index.
When a name is inserted, it is placed at ‘space’ and the linkers to the other existing names are updated.

(a)  Variable   Information          (b)  Variable   Information

     Id1        Info1                     Id1        Info1
     Id2        Info2                     Id2        Info2
     SPACE                                Id3        Info3
                                          SPACE

Figure 9.3: Symbol table as self organizing list

In the above figure, (a) represents the simple list and (b) represents the self organizing list, in which Id1 is
related to Id2 and Id3 is related to Id1.

9.3.3 Hash Table

A hash table, or hash map, is a data structure that associates keys with values. ‘Open hashing’ is a scheme
applied to hash tables in which there is no limit on the number of entries that can be made in the table. The
hash table consists of an array ‘HESH’ and several buckets attached to the array HESH according to the hash
function. If ‘n’ names are stored and searched linearly, finding, inserting or deleting a name takes O(n) time;
the main advantage of a hash table is that, using the hash function, any name can be found in O(1) time on
average. However, the rare worst-case lookup time can be as bad as O(n). A good hash function is essential
for good hash table performance. A poor choice of hash function is likely to lead to clustering, in which
many keys map to the same hash bucket (i.e. collisions become frequent). One organization of a hash table
that resolves conflicts is chaining. The idea is that we have list elements of type:
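class Symbol {
    public String key;      // the name being stored
    public Object binding;  // the information bound to the name
    public Symbol next;     // next element in the same bucket's chain

    public Symbol(String k, Object v, Symbol r) {
        key = k;
        binding = v;
        next = r;
    }
}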
The structure of the hash table looks like this:

a \n

b c d \n

e \n

f g \n

Figure 9.4: Symbol table as hash table (\n represent NULL)
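
A minimal sketch of chaining, written in Python to mirror the Symbol element type above. The table size of 4 and the toy hash function (sum of character codes) are assumptions chosen only to make collisions easy to see; they are not taken from the text.

```python
class Symbol:
    def __init__(self, key, binding, next_symbol=None):
        self.key, self.binding, self.next = key, binding, next_symbol


class HashSymbolTable:
    def __init__(self, size=4):
        self.buckets = [None] * size  # each bucket heads a chain of Symbol nodes

    def _index(self, key):
        # Toy hash function, for illustration only.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def insert(self, key, binding):
        # Link the new symbol in at the front of its bucket's chain.
        i = self._index(key)
        self.buckets[i] = Symbol(key, binding, self.buckets[i])

    def lookup(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.binding
            node = node.next
        return None


hash_table = HashSymbolTable()
for name in "abcdefg":
    hash_table.insert(name, "integer")
```

With this toy hash, 'a' and 'e' land in the same bucket, so looking up 'a' walks past 'e' in the chain. A better hash function would spread the names more evenly and keep the chains short.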

9.3.4 Search Tree

Another approach to organizing the symbol table is to add two link fields to each record, a left child and a
right child, and to use these fields to build a binary search tree. Every name is inserted as a descendant of the
root node so that the binary search tree property always holds: for a node holding name, every name_i in its left
subtree satisfies name_i < name and every name_j in its right subtree satisfies name < name_j. In other words, all
names smaller than name lie in its left subtree and all larger names lie in its right subtree. A name is always
inserted using the binary search tree insertion algorithm.
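
The insertion and lookup just described can be sketched as follows. The function names and the sample identifiers are hypothetical, and names are compared with ordinary string ordering.

```python
class TreeNode:
    def __init__(self, name, info):
        self.name, self.info = name, info
        self.left = self.right = None


def insert(root, name, info):
    """Standard binary search tree insertion; duplicates are ignored."""
    if root is None:
        return TreeNode(name, info)
    if name < root.name:
        root.left = insert(root.left, name, info)
    elif name > root.name:
        root.right = insert(root.right, name, info)
    return root


def lookup(root, name):
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root.info if root is not None else None


def inorder(root):
    """An in-order walk lists the names in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.name] + inorder(root.right)


root = None
for ident in ["u", "a", "b", "c", "x", "y", "sum"]:
    root = insert(root, ident, "integer")
```

The shape of the tree depends on insertion order. Names inserted in sorted order degenerate into a chain, and lookup then costs O(n) rather than O(log n).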
Example 9.2: Create a list, search tree and hash table for the given program
int a, b, c;
int sum(int x, int y)
{
    a = x + y;
    return (a);
}
main()
{
    int u;
    u = sum(5, 6);
}
(i) List

Variable    Information    Space (bytes)
u           Integer        2
a           Integer        2
b           Integer        2
c           Integer        2
x           Integer        2
y           Integer        2
sum         Integer        2
SPACE

Figure 9.5: Symbol table as list for example 9.2

(ii) Hash Table

a b c \n

sum \n

u \n

x y \n

Figure 9.6: Symbol table as hash table for example 9.2

(iii) Search Tree

a x

y
b

c sum

Figure 9.7: Symbol table as search tree for example 9.2

9.4 SYMBOL TABLE HANDLER

Any interface between source and object code, such as the symbol table, must take cognizance of data-related
concepts like storage, addresses and data representation, as well as control-related ones like the location counter,
sequential execution and branch instructions, which are fundamental to nearly all machines on which high-level
language programs execute. Typically, machines provide operations that simulate arithmetic or
logical operations on bit patterns that represent numbers or characters, these patterns being stored in
an array-like structure of memory whose elements are distinguished by addresses. In high-level languages
these addresses are usually given mnemonic names. The context-free syntax of many high-level languages
rarely seems to draw a distinction between the "address" for a variable and the "value" associated with that
variable and stored at its address. Hence we find statements like
X := X + 4
in which the 'X' on the left of the ':=' operator actually represents an address (sometimes called the
'L-value' of 'X') while the 'X' on the right (sometimes called the 'R-value' of 'X') actually represents the value
of the quantity currently residing at the same address. Syntactically, each 'X' in the above
assignment is a 'Designator'. Semantically, these two designators are very different: we shall
refer to the one that represents an address as a 'Variable Designator' and to the one that represents a value as
a 'Value Designator'.

To perform its task, the code generation interface will require the extraction of further information associated
with user-defined identifiers and best kept in the symbol table. In the case of constants we need to record the
associated values and in the case of variables we need to record the associated addresses and storage
demands (the elements of array variables will occupy a contiguous block of memory).
STACK INPUT ACTION

0 id + id * id $ GOTO ( I0 , id ) = s5 ; shift

0 id 5 + id * id $ GOTO ( I5 , + ) = r6 ; reduce by F→id

0F3 + id * id $ GOTO ( I0 , F ) = 3
GOTO ( I3 , + ) = r4 ; reduce by T → F

0T2 + id * id $ GOTO ( I0 , T ) = 2
GOTO ( I2 , + ) = r2 ; reduce by E → T

0E1 + id * id $ GOTO ( I0 , E ) = 1
GOTO ( I1 , + ) = s6 ; shift

0E1+6 id * id $ GOTO ( I6 , id ) = s5 ; shift

0 E 1 + 6 id 5 * id $ GOTO ( I5 , * ) = r6 ; reduce by F → id

0E1+6F3 * id $ GOTO ( I6 , F ) = 3
GOTO ( I3 , * ) = r4 ; reduce by T → F

0E1+6T9 * id $ GOTO ( I6 , T ) = 9
GOTO ( I9 , * ) = s7 ; shift

0E1+6T9*7 id $ GOTO ( I7 , id ) = s5 ; shift

0 E 1 + 6 T 9 * 7 id 5 $ GOTO ( I5 , $ ) = r6 ; reduce by F → id

0 E 1 + 6 T 9 * 7 F 10 $ GOTO ( I7 , F ) = 10
GOTO ( I10 , $ ) = r3 ; reduce by T → T * F

0E1+6T9 $ GOTO ( I6 , T ) = 9
GOTO ( I9 , $ ) = r1 ; reduce by E → E + T

0E1 $ GOTO ( I0 , E ) = 1
GOTO ( I1 , $ ) = accept

TYPE CHECKING

A compiler must check that the source program follows both syntactic and semantic conventions
of the source language.
This checking, called static checking, detects and reports programming errors.

Some examples of static checks:

1. Type checks – A compiler should report an error if an operator is applied to an incompatible
operand. Example: If an array variable and function variable are added together.

NOTES.PMR-INSIGNIA.ORG
2. Flow-of-control checks – Statements that cause flow of control to leave a construct must have
some place to which to transfer the flow of control. Example: A break statement in C transfers
control out of the smallest enclosing while, for or switch statement; an error occurs if no such
enclosing statement exists.

Position of type checker

token stream → parser → syntax tree → type checker → syntax tree → intermediate code generator → intermediate representation

• A type checker verifies that the type of a construct matches that expected by its context.
For example: the arithmetic operator mod in Pascal requires integer operands, so a type
checker verifies that the operands of mod have type integer.

• Type information gathered by a type checker may be needed when code is generated.

TYPE SYSTEMS

The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to language
constructs.

For example : “ if both operands of the arithmetic operators of +,- and * are of type integer, then
the result is of type integer ”

Type Expressions

• The type of a language construct will be denoted by a “type expression.”

• A type expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions.

• The sets of basic types and constructors depend on the language to be checked.

The following are the definitions of type expressions:

1. Basic types such as boolean, char, integer, real are type expressions.

A special basic type, type_error , will signal an error during type checking; void denoting
“the absence of a value” allows statements to be checked.

2. Since type expressions may be named, a type name is a type expression.

3. A type constructor applied to type expressions is a type expression.

Constructors include:
Arrays : If T is a type expression then array (I,T) is a type expression denoting the type
of an array with elements of type T and index set I.

Products : If T1 and T2 are type expressions, then their Cartesian product T1 × T2 is a
type expression.

Records : The difference between a record and a product is that the fields of a record have
names. The record type constructor will be applied to a tuple formed from field names and
field types.
For example:
type row = record
address: integer;
lexeme: array[1..15] of char
end;
var table: array[1..101] of row;
declares the type name row representing the type expression record((address × integer) ×
(lexeme × array(1..15, char))) and the variable table to be an array of records of this type.

Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type
“pointer to an object of type T”.
For example, var p: ↑ row declares variable p to have type pointer(row).

Functions : A function in programming languages maps a domain type D to a range type R.

The type of such function is denoted by the type expression D → R

4. Type expressions may contain variables whose values are type expressions.

Tree representation for char × char → pointer (integer)

            →
          /   \
         ×     pointer
        / \       |
    char   char  integer
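
Type expressions like the one in the tree above can be represented as small immutable objects, one class per constructor. The sketch below is one possible encoding, not the notes' own; all class and field names are assumptions. Frozen dataclasses provide structural equality, which is what a type checker needs when it compares two type expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    name: str


@dataclass(frozen=True)
class Array:
    low: int
    high: int
    elem: object


@dataclass(frozen=True)
class Product:
    left: object
    right: object


@dataclass(frozen=True)
class Pointer:
    target: object


@dataclass(frozen=True)
class Function:
    domain: object
    result: object


CHAR, INTEGER = Basic("char"), Basic("integer")


def show(t):
    """Render a type expression in the notation used in the text."""
    if isinstance(t, Basic):
        return t.name
    if isinstance(t, Array):
        return f"array({t.low}..{t.high}, {show(t.elem)})"
    if isinstance(t, Product):
        return f"{show(t.left)} x {show(t.right)}"
    if isinstance(t, Pointer):
        return f"pointer({show(t.target)})"
    if isinstance(t, Function):
        return f"{show(t.domain)} -> {show(t.result)}"
    raise TypeError("not a type expression")


# char x char -> pointer(integer), the expression drawn as a tree above
example = Function(Product(CHAR, CHAR), Pointer(INTEGER))
```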

Type systems

• A type system is a collection of rules for assigning type expressions to the various parts of
a program.

• A type checker implements a type system. It is specified in a syntax-directed manner.

• Different type systems may be used by different compilers or processors of the same
language.

Static and Dynamic Checking of Types

• Checking done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic.

• Any check can be done dynamically, if the target code carries the type of an element
along with the value of that element.

Sound type system
A sound type system eliminates the need for dynamic checking for type errors because it
allows us to determine statically that these errors cannot occur when the target program runs.
That is, if a sound type system assigns a type other than type_error to a program part, then type
errors cannot occur when the target code for the program part is run.

Strongly typed language

A language is strongly typed if its compiler can guarantee that the programs it accepts
will execute without type errors.

Error Recovery

• Since type checking has the potential for catching errors in programs, it is desirable for the
type checker to recover from errors, so it can check the rest of the input.

• Error handling has to be designed into the type system right from the start; the type
checking rules must be prepared to cope with errors.

SPECIFICATION OF A SIMPLE TYPE CHECKER

Here, we specify a type checker for a simple language in which the type of each identifier
must be declared before the identifier is used. The type checker is a translation scheme that
synthesizes the type of each expression from the types of its subexpressions. The type checker
can handle arrays, pointers, statements and functions.

A Simple Language

Consider the following grammar:

Translation scheme:

P → D ; E
D → D ; D
D → id : T                 { addtype (id.entry, T.type) }
T → char                   { T.type := char }
T → integer                { T.type := integer }
T → ↑ T1                   { T.type := pointer(T1.type) }
T → array [ num ] of T1    { T.type := array (1..num.val, T1.type) }

In the above language,

→ There are two basic types : char and integer ;
→ type_error is used to signal errors;
→ the prefix operator ↑ builds a pointer type. Example , ↑ integer leads to the type expression
pointer ( integer ).

Type checking of expressions

In the following rules, the attribute type for E gives the type expression assigned to the
expression generated by E.

1. E → literal { E.type := char }
   E → num { E.type := integer }
Here, constants represented by the tokens literal and num have type char and integer, respectively.

2. E → id { E.type := lookup ( id.entry ) }
lookup ( e ) is used to fetch the type saved in the symbol table entry pointed to by e.

3. E → E1 mod E2 { E.type := if E1.type = integer and
                               E2.type = integer then integer
                               else type_error }
The expression formed by applying the mod operator to two subexpressions of type integer has
type integer; otherwise, its type is type_error.

4. E → E1 [ E2 ] { E.type := if E2.type = integer and
                               E1.type = array(s,t) then t
                               else type_error }
In an array reference E1 [ E2 ], the index expression E2 must have type integer. The result is
the element type t obtained from the type array(s,t) of E1.

5. E → E1 ↑ { E.type := if E1.type = pointer (t) then t
                          else type_error }
The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is the type t
of the object pointed to by the pointer E.
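
The five rules above translate almost line for line into a recursive function. The sketch below makes several assumptions for illustration: expressions are nested tuples, types are strings or tuples such as ("array", s, t) and ("pointer", t), the symbol table is a plain dictionary, and an undeclared identifier is given type_error.

```python
TYPE_ERROR = "type_error"


def check(expr, symtab):
    """Synthesize the type of expr from the types of its subexpressions."""
    kind = expr[0]
    if kind == "literal":                       # rule 1
        return "char"
    if kind == "num":                           # rule 1
        return "integer"
    if kind == "id":                            # rule 2: lookup(id.entry)
        return symtab.get(expr[1], TYPE_ERROR)
    if kind == "mod":                           # rule 3
        left, right = check(expr[1], symtab), check(expr[2], symtab)
        return "integer" if left == "integer" and right == "integer" else TYPE_ERROR
    if kind == "index":                         # rule 4: E1 [ E2 ]
        array_type, index_type = check(expr[1], symtab), check(expr[2], symtab)
        if index_type == "integer" and isinstance(array_type, tuple) and array_type[0] == "array":
            return array_type[2]
        return TYPE_ERROR
    if kind == "deref":                         # rule 5: E1 ↑
        pointer_type = check(expr[1], symtab)
        if isinstance(pointer_type, tuple) and pointer_type[0] == "pointer":
            return pointer_type[1]
        return TYPE_ERROR
    return TYPE_ERROR


symtab = {
    "a": ("array", 10, "char"),
    "p": ("pointer", "integer"),
    "i": "integer",
}
```

Because type_error is just another value, an error in a subexpression propagates upward and checking can continue on the rest of the input.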

Type checking of statements

Statements do not have values; hence the basic type void can be assigned to them. If an error is
detected within a statement, then type_error is assigned.

Translation scheme for checking the type of statements:

1. Assignment statement:
S → id := E { S.type := if id.type = E.type then void
else type_error }

2. Conditional statement:
S → if E then S1 { S.type := if E.type = boolean then S1.type
else type_error }

3. While statement:
S → while E do S1 { S.type := if E.type = boolean then S1.type
else type_error }

4. Sequence of statements:
S → S1 ; S2 { S.type := if S1.type = void and
S2.type = void then void
else type_error }

Type checking of functions

The rule for checking the type of a function application is :

E → E1 ( E2 ) { E.type := if E2.type = s and
E1.type = s → t then t
else type_error }
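
The statement rules and the function-application rule can be sketched in the same style. To keep the example self-contained, it assumes that expression types have already been computed and are stored directly in the statement tuples; the tuple shapes and function names are hypothetical.

```python
VOID, TYPE_ERROR = "void", "type_error"


def stmt_type(stmt):
    """Return void for a well-typed statement, type_error otherwise."""
    kind = stmt[0]
    if kind == "assign":                 # ("assign", id_type, expr_type)
        return VOID if stmt[1] == stmt[2] else TYPE_ERROR
    if kind in ("if", "while"):          # (kind, condition_type, body_statement)
        return stmt_type(stmt[2]) if stmt[1] == "boolean" else TYPE_ERROR
    if kind == "seq":                    # ("seq", S1, S2)
        first, second = stmt_type(stmt[1]), stmt_type(stmt[2])
        return VOID if first == VOID and second == VOID else TYPE_ERROR
    return TYPE_ERROR


def apply_type(func_type, arg_type):
    """Rule for E1 ( E2 ): func_type is ("fun", s, t), standing for s -> t."""
    if isinstance(func_type, tuple) and func_type[0] == "fun" and func_type[1] == arg_type:
        return func_type[2]
    return TYPE_ERROR


program = ("seq",
           ("assign", "integer", "integer"),
           ("while", "boolean", ("assign", "char", "char")))
```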

SOURCE LANGUAGE ISSUES

Procedures:
A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name, and the statement is the procedure body.
For example, the following is the definition of procedure named readarray :

procedure readarray;
var i : integer;
begin
for i : = 1 to 9 do read(a[i])
end;

When a procedure name appears within an executable statement, the procedure is said to be
called at that point.

Activation trees:
An activation tree is used to depict the way control enters and leaves activations. In an
activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a to
b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the
lifetime of b.

Control stack:

• A control stack is used to keep track of live procedure activations. The idea is to push the
node for an activation onto the control stack as the activation begins and to pop the node
when the activation ends.

• The contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of the control stack, the stack contains the nodes along the path
from n to the root.
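
The relationship between the activation tree and the control stack can be checked with a small simulation. The tree below and its procedure names are made up for illustration. Each snapshot records the stack at the moment an activation begins.

```python
def trace(tree, node, stack, snapshots):
    stack.append(node)                 # activation begins: push its node
    snapshots.append(list(stack))      # stack now holds the path from the root to node
    for child in tree.get(node, []):   # children are called left to right
        trace(tree, child, stack, snapshots)
    stack.pop()                        # activation ends: pop its node


# Hypothetical activation tree: main calls readarray, then quicksort,
# and quicksort calls partition.
activation_tree = {
    "main": ["readarray", "quicksort"],
    "quicksort": ["partition"],
}
snapshots = []
trace(activation_tree, "main", [], snapshots)
```

When partition is on top, the stack reads main, quicksort, partition from bottom to top, which is the path from the root to that node. readarray no longer appears, because its activation ended before quicksort began.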

The Scope of a Declaration:
A declaration is a syntactic construct that associates information with a name.
Declarations may be explicit, such as:
var i : integer ;
or they may be implicit. For example, in Fortran any variable name starting with I is assumed to denote an
integer.

The portion of the program to which a declaration applies is called the scope of that declaration.

Binding of names:
Even if each name is declared once in a program, the same name may denote different
data objects at run time. “Data object” corresponds to a storage location that holds values.

The term environment refers to a function that maps a name to a storage location.
The term state refers to a function that maps a storage location to the value held there.

name --(environment)--> storage --(state)--> value

When an environment associates storage location s with a name x, we say that x is bound
to s. This association is referred to as a binding of x.
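
The two mappings can be modelled as two dictionaries. In this sketch the address 1000 and the helper names l_value and r_value are assumptions. An assignment changes the state but leaves the environment untouched.

```python
environment = {"X": 1000}   # name -> storage location (the address is made up)
state = {1000: 3}           # storage location -> value held there


def l_value(name):
    """The storage location to which the name is bound."""
    return environment[name]


def r_value(name):
    """The value currently held at the name's storage location."""
    return state[environment[name]]


# X := X + 4 reads the R-value on the right and stores through the L-value on the left.
state[l_value("X")] = r_value("X") + 4
```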

STORAGE ORGANISATION

• The executing target program runs in its own logical address space in which each
program value has a location.
• The management and organization of this logical address space is shared between the
compiler, operating system and target machine. The operating system maps the logical
addresses into physical addresses, which are usually spread throughout memory.

Typical subdivision of run-time memory:

Code

Static Data

Stack

free memory

Heap

• Run-time storage comes in blocks of contiguous bytes, where a byte is the smallest unit of
addressable memory. Four bytes form a machine word. Multibyte objects are stored in consecutive
bytes and given the address of the first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints of
the target machine.
• Although a character array of length 10 needs only enough bytes to hold 10 characters, a
compiler may allocate 12 bytes to get alignment, leaving 2 bytes unused.
• This unused space due to alignment considerations is referred to as padding.
• The size of some program objects may be known at compile time, and these objects may be
placed in an area called static.
• The dynamic areas used to maximize the utilization of space at run time are the stack and
the heap.
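
The padding arithmetic in the character-array example can be reproduced with a short helper. This is a sketch with hypothetical function names; it assumes alignment to a 4-byte word, as in the text.

```python
def padded_size(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def padding(size, alignment):
    """Number of unused bytes added to reach the alignment boundary."""
    return padded_size(size, alignment) - size


# A 10-character array aligned to 4-byte words occupies 12 bytes, 2 of them padding.
allocated = padded_size(10, 4)
unused = padding(10, 4)
```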

Activation records:
• Procedure calls and returns are usually managed by a run-time stack called the control
stack.
• Each live activation has an activation record on the control stack, with the root of the
activation tree at the bottom; the latest activation has its record at the top of the stack.
• The contents of the activation record vary with the language being implemented. A typical
activation record may contain the following:

➢ Temporary values, such as those arising from the evaluation of expressions.
➢ Local data belonging to the procedure whose activation record this is.
➢ A saved machine status, with information about the state of the machine just before the
call to the procedure.
➢ An access link, which may be needed to locate data needed by the called procedure but found
elsewhere.
➢ A control link pointing to the activation record of the caller.
➢ Space for the return value of the called function, if any. Not all called procedures
return a value, and if one does, we may prefer to place that value in a register for
efficiency.
➢ The actual parameters used by the calling procedure. Commonly, these are not placed in the
activation record but rather in registers, when possible, for greater efficiency.

STORAGE ALLOCATION STRATEGIES

The different storage allocation strategies are :
1. Static allocation – lays out storage for all data objects at compile time
2. Stack allocation – manages the run-time storage as a stack.
3. Heap allocation – allocates and deallocates storage as needed at run time from a data area
known as heap.

STATIC ALLOCATION
• In static allocation, names are bound to storage as the program is compiled, so there is no
need for a run-time support package.
• Since the bindings do not change at run time, every time a procedure is activated, its
names are bound to the same storage locations.
• Therefore, values of local names are retained across activations of a procedure. That is,
when control returns to a procedure, the values of the locals are the same as they were
when control left the last time.
• From the type of a name, the compiler decides the amount of storage for the name and
decides where the activation records go. At compile time, we can fill in the addresses at
which the target code can find the data it operates on.

STACK ALLOCATION OF SPACE

• All compilers for languages that use procedures, functions or methods as units of
user-defined actions manage at least part of their run-time memory as a stack.
• Each time a procedure is called, space for its local variables is pushed onto a stack, and
when the procedure terminates, that space is popped off the stack.

Calling sequences:
• Procedure calls are implemented by what are known as calling sequences, which consist
of code that allocates an activation record on the stack and enters information into its
fields.
• A return sequence is similar code that restores the state of the machine so the calling
procedure can continue its execution after the call.
• The code in a calling sequence is often divided between the calling procedure (caller) and
the procedure it calls (callee).
• When designing calling sequences and the layout of activation records, the following
principles are helpful:
➢ Values communicated between caller and callee are generally placed at the
beginning of the callee’s activation record, so they are as close as possible to the
caller’s activation record.
➢ Fixed-length items are generally placed in the middle. Such items typically include
the control link, the access link, and the machine status fields.
➢ Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is a dynamically sized array, where the
value of one of the callee’s parameters determines the length of the array.
➢ We must locate the top-of-stack pointer judiciously. A common approach is to have
it point to the end of the fixed-length fields in the activation record. Fixed-length data
can then be accessed by fixed offsets, known to the intermediate-code generator,
relative to the top-of-stack pointer.

Record                          Field                                Responsibility
                                ...
caller’s activation record      Parameters and returned values
                                control link
                                links and saved status
                                temporaries and local data           caller’s responsibility
callee’s activation record      Parameters and returned values       caller’s responsibility
                                control link
                                links and saved status               <-- top_sp
                                temporaries and local data           callee’s responsibility

Division of tasks between caller and callee

• The calling sequence and its division between caller and callee are as follows:
➢ The caller evaluates the actual parameters.
➢ The caller stores a return address and the old value of top_sp into the callee’s
activation record. The caller then increments top_sp to the respective
positions.
➢ The callee saves the register values and other status information.
➢ The callee initializes its local data and begins execution.
• A suitable, corresponding return sequence is:
➢ The callee places the return value next to the parameters.
➢ Using the information in the machine-status field, the callee restores top_sp and
other registers, and then branches to the return address that the caller placed in
the status field.
➢ Although top_sp has been decremented, the caller knows where the return value
is, relative to the current value of top_sp; the caller therefore may use that value.

Variable length data on stack:
• The run-time memory management system must deal frequently with the allocation of
space for objects, the sizes of which are not known at compile time, but which are
local to a procedure and thus may be allocated on the stack.
• The reason to prefer placing objects on the stack is that we avoid the expense of garbage
collecting their space.
• The same scheme works for objects of any type if they are local to the procedure called
and have a size that depends on the parameters of the call.

activation record for p                   control link
                                          pointer to A
                                          pointer to B
                                          pointer to C
arrays of p                               array A
                                          array B
                                          array C
activation record for procedure q         control link      <-- top_sp
called by p
arrays of q                               ...               <-- top

Access to dynamically allocated arrays

• Procedure p has three local arrays, whose sizes cannot be determined at compile time.
The storage for these arrays is not part of the activation record for p.
• Access to the data is through two pointers, top and top_sp. Here, top marks the actual
top of the stack; it points to the position at which the next activation record will begin.
• The second pointer, top_sp, is used to find the local, fixed-length fields of the top activation record.
• The code to reposition top and top_sp can be generated at compile time, in terms of sizes
that will become known at run time.

HEAP ALLOCATION
Stack allocation strategy cannot be used if either of the following is possible :
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.

• Heap allocation parcels out pieces of contiguous storage, as needed for activation records
or other objects.
• Pieces may be deallocated in any order, so over time the heap will consist of alternate
areas that are free and in use.

Position in the          Activation records          Remarks
activation tree          in the heap

      s                  s        [control link]
     / \                 r        [control link]     Retained activation record for r
    r   q(1,9)           q(1,9)   [control link]

• The record for an activation of procedure r is retained when the activation ends.

• Therefore, the record for the new activation q(1, 9) cannot follow that for s physically.

• If the retained activation record for r is deallocated, there will be free space in the heap
between the activation records for s and q.
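
The hole left between the records for s and q can be reproduced with a toy allocator. The sketch below assumes a first-fit policy with no coalescing of adjacent holes, which the notes do not specify. The heap size of 40 and the record size of 10 are arbitrary.

```python
class Heap:
    """Toy first-fit allocator; adjacent free holes are not merged."""

    def __init__(self, size):
        self.free = [(0, size)]   # list of (start, length) holes, kept sorted
        self.used = {}            # record name -> (start, length)

    def allocate(self, name, length):
        for i, (start, hole) in enumerate(self.free):
            if hole >= length:    # first hole that is large enough
                self.used[name] = (start, length)
                if hole == length:
                    del self.free[i]
                else:
                    self.free[i] = (start + length, hole - length)
                return start
        raise MemoryError("no hole large enough")

    def release(self, name):
        self.free.append(self.used.pop(name))
        self.free.sort()


heap = Heap(40)
for record in ["s", "r", "q(1,9)"]:
    heap.allocate(record, 10)
heap.release("r")   # the retained record for r is finally deallocated
```

After r is released, the free list contains a hole at offset 10, between the records for s (offset 0) and q(1,9) (offset 20), in addition to the unused tail of the heap.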

SEMANTIC​ ​ANALYSIS

table​ ​and​ ​performing​ ​type​ ​checking.

Representation​ ​Formalism​ ​and​ ​an​ ​Implementation​ ​Mechanism.

semantic​ ​rules​ ​are​ ​to​ ​be​ ​evaluated.

We​ ​distinguish​ ​between​ ​two​ ​kinds​ ​of​ ​attributes:

Syntax​ ​Directed​ ​Definitions:​ ​An​ ​Example

:Xn​ ​depends​ ​only​ ​on

Benefits of using a machine-independent intermediate form are:

2. A machine-independent code optimizer can be applied to the intermediate representation.

Position of intermediate code generator

parser static intermediate intermediate code

Three ways of intermediate representation:

➢ Three address code

[Figure: (a) Syntax tree (b) Dag]

Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of

a b c uminus * b c uminus * + assign
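
As a minimal sketch of how the postfix list above falls out of a syntax tree, the Python below builds the tree for a := b * -c + b * -c and visits each node after its children. The names Node, postfix and term are illustrative and do not come from the notes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A syntax-tree node: an operator or identifier with zero or more children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def postfix(node: Node) -> List[str]:
    """List the nodes so that each node appears right after its children."""
    out: List[str] = []
    for child in node.children:
        out.extend(postfix(child))
    out.append(node.label)
    return out


def term() -> Node:
    # Subtree for b * -c
    return Node("*", [Node("b"), Node("uminus", [Node("c")])])


# Tree for a := b * -c + b * -c
tree = Node("assign", [Node("a"), Node("+", [term(), term()])])
print(" ".join(postfix(tree)))  # a b c uminus * b c uminus * + assign
```

The edges of the tree do not appear in the output; they can be recovered from the order of the nodes and the number of operands each operator expects.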

PRODUCTION        SEMANTIC RULE
S → id := E       S.nptr := mknode(‘assign’, mkleaf(id, id.place), E.nptr)
E → E1 + E2       E.nptr := mknode(‘+’, E1.nptr, E2.nptr)
E → E1 * E2       E.nptr := mknode(‘*’, E1.nptr, E2.nptr)
E → - E1          E.nptr := mknode(‘uminus’, E1.nptr)
E → id            E.nptr := mkleaf(id, id.place)

Syntax-directed definition to produce syntax trees for assignment statements
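
The semantic rules can be tried out with a small sketch. The Python below is an assumption-laden illustration, not the notes' implementation: nodes live in a list, an nptr is simply a list index, and the hypothetical share flag makes mknode and mkleaf return an existing identical node, which yields a dag instead of a tree.

```python
from typing import Dict, List, Optional, Tuple


class Builder:
    """Allocates nodes in a list; a node pointer (nptr) is a list index."""

    def __init__(self, share: bool = False) -> None:
        self.nodes: List[Tuple] = []
        self.share = share                    # True: reuse identical nodes (dag)
        self._seen: Dict[Tuple, int] = {}

    def _new(self, key: Tuple) -> int:
        if self.share and key in self._seen:
            return self._seen[key]
        self.nodes.append(key)
        self._seen[key] = len(self.nodes) - 1
        return len(self.nodes) - 1

    def mkleaf(self, kind: str, place: str) -> int:
        return self._new((kind, place))

    def mknode(self, op: str, left: int, right: Optional[int] = None) -> int:
        return self._new((op, left, right))


def build(b: Builder) -> int:
    """Apply the semantic rules bottom-up for a := b * -c + b * -c."""
    def term() -> int:
        # E -> id, E -> - E1, E -> E1 * E2
        return b.mknode("*", b.mkleaf("id", "b"),
                        b.mknode("uminus", b.mkleaf("id", "c")))

    e = b.mknode("+", term(), term())                   # E -> E1 + E2
    return b.mknode("assign", b.mkleaf("id", "a"), e)   # S -> id := E


tree, dag = Builder(), Builder(share=True)
build(tree)
build(dag)
print(len(tree.nodes), len(dag.nodes))  # 11 7
```

The dag is smaller because the common subexpression b * -c is built once and referenced twice.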

Two representations of the syntax tree

Three-address code is a sequence of statements of the general form

where t1 and t2 are compiler-generated temporary names.

 The unraveling of complicated arithmetic expressions and of nested flow-of-control

Three-address code is a linearized representation of a syntax tree or a dag in which

Types of Three-Address Statements:

The common three-address statements are:

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical

2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary

3. Copy statements of the form x := y where the value of y is assigned to x.

7. Indexed assignments of the form x := y[i] and x[i] := y.

8. Address and pointer assignments of the form x := &y, x := *y, and *x := y.

Syntax-Directed Translation into Three-Address Code:

Given input a : = b * - c + b * - c, the three-address code is as shown above. The

Syntax-directed definition to produce three-address code for assignments

PRODUCTION        SEMANTIC RULES
S → id := E       S.code := E.code || gen(id.place ‘:=’ E.place)
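
A hedged sketch of this translation in Python follows. It assumes expressions are given as nested tuples and returns the pair (E.place, E.code) from each call; translate, assign and the tuple encoding are illustrative choices, while newtemp follows the role described in the notes.

```python
from itertools import count
from typing import List, Tuple, Union

# "b", ("uminus", e), ("+", e1, e2) or ("*", e1, e2)
Expr = Union[str, Tuple]

_temps = count(1)


def newtemp() -> str:
    """Return a fresh temporary name t1, t2, ... on each call."""
    return f"t{next(_temps)}"


def translate(e: Expr) -> Tuple[str, List[str]]:
    """Return (E.place, E.code) for expression e."""
    if isinstance(e, str):                        # E -> id
        return e, []
    if e[0] == "uminus":                          # E -> - E1
        place1, code1 = translate(e[1])
        place = newtemp()
        return place, code1 + [f"{place} := uminus {place1}"]
    op, left, right = e                           # E -> E1 + E2 | E1 * E2
    place1, code1 = translate(left)
    place2, code2 = translate(right)
    place = newtemp()
    return place, code1 + code2 + [f"{place} := {place1} {op} {place2}"]


def assign(name: str, e: Expr) -> List[str]:
    """S -> id := E: S.code is E.code followed by the copy into id."""
    place, code = translate(e)
    return code + [f"{name} := {place}"]


term = ("*", "b", ("uminus", "c"))
code = assign("a", ("+", term, term))
print("\n".join(code))
```

For the input a := b * -c + b * -c this produces six statements, ending with t5 := t2 + t4 and a := t5.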

if E.place = 0 goto S.after

PRODUCTION             SEMANTIC RULES
S → while E do S1      S.begin := newlabel;
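
Only the first rule for the while statement is visible here, so the sketch below assumes the usual layout: a label S.begin, the code for E, a jump to S.after when E.place is 0, the code for S1, and a jump back to S.begin. The function gen_while and the label names L1, L2 are hypothetical.

```python
from itertools import count
from typing import List

_labels = count(1)


def newlabel() -> str:
    """Return a fresh label L1, L2, ... on each call."""
    return f"L{next(_labels)}"


def gen_while(e_code: List[str], e_place: str, s1_code: List[str]) -> List[str]:
    """Assemble S.code for S -> while E do S1 (assumed standard layout)."""
    begin = newlabel()                            # S.begin := newlabel
    after = newlabel()                            # S.after := newlabel
    return (
        [f"{begin}:"]
        + e_code
        + [f"if {e_place} = 0 goto {after}"]      # leave the loop when E is false
        + s1_code
        + [f"goto {begin}", f"{after}:"]
    )


loop = gen_while(["t1 := i < n"], "t1", ["i := i + 1"])
print("\n".join(loop))
```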

 The function newtemp returns a sequence of distinct names t1,t2,….. in response to

Implementation of Three-Address Statements:

A three-address statement is an abstract form of intermediate code. In a compiler,

(a) Quadruples

      op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   :=       t5            a

(b) Triples

      op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   assign   a      (4)

Quadruple and triple representation of three-address statements given above
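
The relation between the two forms can be sketched in Python: a triple drops the result field and refers to an earlier value by the position of the statement that computed it. This is an illustrative conversion, assuming every result other than the final copy target is a compiler-generated temporary; to_triples is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

Quad = Tuple[str, str, Optional[str], str]      # (op, arg1, arg2, result)
Triple = Tuple[str, str, Optional[str]]         # (op, arg1, arg2)

quads: List[Quad] = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]


def to_triples(quads: List[Quad]) -> List[Triple]:
    """Replace each temporary by the position of the triple that computes it."""
    position: Dict[str, str] = {}               # temporary -> "(i)"
    triples: List[Triple] = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)
        a2 = position.get(arg2, arg2) if arg2 is not None else None
        if op == ":=":
            # A copy becomes: assign target, value
            triples.append(("assign", result, a1))
        else:
            position[result] = f"({len(triples)})"
            triples.append((op, a1, a2))
    return triples


triples = to_triples(quads)
for i, t in enumerate(triples):
    print(f"({i})", *(field or "" for field in t))
```

Triples avoid entering temporaries into the symbol table, at the cost of making statements harder to reorder, since moving a triple changes the positions other triples refer to.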

(a) x[i] := y

      op       arg1   arg2
(0)   []=      x      i
(1)   assign   (0)    y

(b) x := y[i]

      op       arg1   arg2
(0)   =[]      y      i
(1)   assign   x      (0)

 Another implementation of three-address code is that of listing pointers to triples, rather

      statement              op       arg1   arg2
(0)   (14)            (14)   uminus   c

Indirect triples representation of three-address statements

As the sequence of declarations in a procedure or block is examined, we can lay out

In the translation scheme shown below:

➢ Nonterminal P generates a sequence of declarations of the form id : T.

Computing the types and relative addresses of declared names

D → id : T                   { enter(id.name, T.type, offset);
T → integer                  { T.type := integer;
T → real                     { T.type := real;
T → array [ num ] of T1      { T.type := array(num.val, T1.type);
T → ↑ T1                     { T.type := pointer(T1.type);
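
The idea of laying out declared names at increasing relative addresses can be sketched as follows. The widths (4 bytes for integer and pointer, 8 for real) are assumptions for illustration, since the surviving rules do not show them, and layout, width and the tuple encoding of types are hypothetical names.

```python
from typing import List, Tuple, Union

# Type expressions: "integer", "real", ("array", n, elem) or ("pointer", target)
Type = Union[str, Tuple]

# Assumed widths in bytes; real values depend on the target machine.
WIDTH = {"integer": 4, "real": 8, "pointer": 4}


def width(t: Type) -> int:
    """Storage needed for a value of type t."""
    if isinstance(t, str):
        return WIDTH[t]
    if t[0] == "array":
        _, n, elem = t
        return n * width(elem)
    return WIDTH["pointer"]


def layout(decls: List[Tuple[str, Type]]) -> Tuple[List[Tuple[str, Type, int]], int]:
    """Enter each name with its type and relative address; return table and total size."""
    table = []
    offset = 0
    for name, t in decls:
        table.append((name, t, offset))     # plays the role of enter(id.name, T.type, offset)
        offset += width(t)                  # the next name starts after this one
    return table, offset


table, total = layout([
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
print([(name, off) for name, _, off in table], total)
# [('i', 0), ('x', 4), ('a', 12), ('p', 52)] 56
```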

When a nested procedure is seen, processing of declarations in the enclosing procedure is

[Figure: Symbol tables for nested procedures, including readarray, exchange, and quicksort]