Lecture21-22 Compiler Construction

The document discusses compiler construction and intermediate code generation. It covers topics such as data types and type checking, type expressions and constructors, type inference and checking, and variants of syntax trees. The key points are that compilers perform type checking to ensure code makes sense under language rules, and that intermediate representations are constructed to translate from a high-level source program to a low-level target code.

Uploaded by

rohail hanif
Copyright
© All Rights Reserved

Compiler Construction

(CS-636)

Muhammad Bilal Bashir


UIIT, Rawalpindi

Outline

1. Data Types & Type Checking
2. Intermediate Code Generation
3. Variants of Syntax Trees
4. Three-Address Code
5. Static Single-Assignment Form
6. Summary

Semantic Analysis

Lecture: 21-22

Data Types & Type Checking

One of the principal tasks of a compiler is the computation and maintenance of information on data types (type inference)
The compiler uses this information to ensure that each part of the program makes sense under the type rules of the language (type checking)
Data type information can occur in a program in several different forms
Theoretically, a data type is a set of values, or more precisely a set of values together with certain operations on those values

Data Types & Type Checking (Continued)

For instance, the data type integer in a programming language refers to a subset of the mathematical integers, together with the arithmetic operations
In compiler construction, these sets are described by a type expression
Type expressions can occur in several places in a program

Type Expressions & Type Constructors

A programming language always contains a number of built-in types
These predefined types correspond either to numeric data types like int or double, or to elementary types like boolean or char
Such data types are called simple types, in that their values exhibit no explicit internal structure
An interesting predefined type in the C language is the void type
    This type has no values, and so represents the empty set

Type Expressions & Type Constructors (Continued)

In some languages it is possible to define new simple types
    Examples are subrange types in Pascal and enumerated types in C
In Pascal, the subrange of integers from 0 to 9 can be declared as
type Digit = 0..9;

In C, an enumerated type consisting of named values can be declared as
typedef enum {red, green, blue} Color;

Type Expressions & Type Constructors (Continued)

Given a set of predefined types, new data types can be created using type constructors, such as array and record (or struct)
Such constructors can be viewed as functions that take existing types as parameters and return new types with a structure that depends on the constructor
Such types are called structured types
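The idea that constructors are functions from types to types can be sketched in C. This is only an illustration: the names TypeExp, make_simple, make_array, and make_record are hypothetical, records are limited to two fields for brevity, and allocated nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Kinds of type expressions: two simple types plus two constructors. */
typedef enum { T_INT, T_DOUBLE, T_ARRAY, T_RECORD } TypeKind;

typedef struct TypeExp {
    TypeKind kind;
    int size;                  /* T_ARRAY only: number of elements */
    struct TypeExp *elem;      /* T_ARRAY only: element type */
    struct TypeExp *field[2];  /* T_RECORD only: the two field types */
} TypeExp;

/* Allocate a zeroed node of the given kind. */
static TypeExp *new_type(TypeKind kind) {
    TypeExp *t = calloc(1, sizeof *t);
    assert(t != NULL);
    t->kind = kind;
    return t;
}

/* Simple types take no type parameters. */
TypeExp *make_simple(TypeKind kind) {
    assert(kind == T_INT || kind == T_DOUBLE);
    return new_type(kind);
}

/* array constructor: takes an element type, returns a new array type. */
TypeExp *make_array(int size, TypeExp *elem) {
    assert(size > 0 && elem != NULL);
    TypeExp *t = new_type(T_ARRAY);
    t->size = size;
    t->elem = elem;
    return t;
}

/* record constructor: takes two field types, returns a new record type. */
TypeExp *make_record(TypeExp *first, TypeExp *second) {
    assert(first != NULL && second != NULL);
    TypeExp *t = new_type(T_RECORD);
    t->field[0] = first;
    t->field[1] = second;
    return t;
}
```

For example, make_array(10, make_record(make_simple(T_DOUBLE), make_simple(T_INT))) builds the type expression for an array of ten records, each holding a double and an int.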

Type Names, Type Declarations, and Recursive Types

Languages that have a rich set of type constructors usually also have a mechanism for a programmer to assign names to type expressions
Such type declarations (sometimes called type definitions) can be written in C as follows

struct RealIntRec {
    double r;
    int i;
};

Type Names, Type Declarations, and Recursive Types (Continued)

Type declarations cause the declared type names to be entered into the symbol table, just as variable declarations cause variable names to be entered
Type names are associated with attributes in the symbol table in a similar way to variable names
These attributes include the scope and the type expression corresponding to the type name
Since type names can appear in type expressions, questions arise about the recursive use of type names

Type Names, Type Declarations, and Recursive Types (Continued)

In the C programming language, a structure cannot contain a member of its own type directly, because the amount of memory required for the structure would be unknown at the time of declaration
Recursion is instead expressed indirectly through pointers, whose size is known

struct intBST {
    int val;
    struct intBST *left, *right;
};

Type Equivalence

Given the possible type expressions of a language, a type checker must frequently answer the question of when two type expressions represent the same type
This is the question of type equivalence
There are many possible ways for type equivalence to be defined by a language
Type equivalence checking can be seen as a function in a compiler
function typeEqual( t1, t2 : TypeExp ) : Boolean

Type Equivalence (Continued)

The typeEqual() function takes two type expressions and returns true if they represent the same type according to the type equivalence rules of the language
One issue that relates directly to the description of a type equivalence algorithm is the way type expressions are represented within a compiler
One straightforward method is to use a syntax tree representation
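As a rough sketch of how typeEqual() might work over a syntax tree representation, the C code below implements structural equivalence, which is only one of the possible equivalence rules. The TypeNode layout is an assumption made for illustration, records are limited to two fields, and the recursion assumes acyclic type trees (recursive types through pointers would need extra care).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TY_INT, TY_DOUBLE, TY_ARRAY, TY_POINTER, TY_RECORD } TyKind;

/* A type expression stored as a small syntax tree (hypothetical layout). */
typedef struct TypeNode {
    TyKind kind;
    int size;                       /* array length; 0 for other kinds */
    const struct TypeNode *child1;  /* element, pointee, or first field type */
    const struct TypeNode *child2;  /* second field type (records only) */
} TypeNode;

/* Structural equivalence: same constructor applied to equivalent parts. */
bool typeEqual(const TypeNode *t1, const TypeNode *t2) {
    if (t1 == NULL || t2 == NULL)
        return t1 == t2;            /* both absent, or only one absent */
    if (t1->kind != t2->kind)
        return false;
    if (t1->kind == TY_ARRAY) {
        assert(t1->size >= 0 && t2->size >= 0);
        if (t1->size != t2->size)
            return false;           /* arrays must also agree on length */
    }
    return typeEqual(t1->child1, t2->child1) &&
           typeEqual(t1->child2, t2->child2);
}
```

Under this rule two separately declared records with the same field types in the same order are equal. A language that uses name equivalence would instead compare the declared type names.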

Type Inference & Type Checking

Type checking is described in terms of semantic actions based on a representation of types and a typeEqual() operation
The compiler also needs a symbol table for this purpose, along with three of its basic operations: insert, lookup, and delete
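The three symbol table operations can be sketched as follows. This is a deliberately small illustration, not how a production compiler would do it: the names st_insert, st_lookup, and st_delete are hypothetical, the table is a fixed-size array searched linearly (real compilers usually hash), types are stored as strings, and scopes are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 64
#define MAX_NAME 32

typedef struct {
    char name[MAX_NAME];
    char type[MAX_NAME];  /* type kept as a string for brevity */
    int in_use;
} Symbol;

static Symbol table[MAX_SYMS];

/* lookup: return the type recorded for name, or NULL if it is absent. */
const char *st_lookup(const char *name) {
    for (int k = 0; k < MAX_SYMS; k++)
        if (table[k].in_use && strcmp(table[k].name, name) == 0)
            return table[k].type;
    return NULL;
}

/* insert: record name with its type.
   Returns 0 on success, -1 if name is already declared or the table is full. */
int st_insert(const char *name, const char *type) {
    assert(strlen(name) < MAX_NAME && strlen(type) < MAX_NAME);
    if (st_lookup(name) != NULL)
        return -1;
    for (int k = 0; k < MAX_SYMS; k++) {
        if (!table[k].in_use) {
            strcpy(table[k].name, name);
            strcpy(table[k].type, type);
            table[k].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* delete: remove name, for example when its scope ends.
   Returns 0 if an entry was removed, -1 if name was not present. */
int st_delete(const char *name) {
    for (int k = 0; k < MAX_SYMS; k++) {
        if (table[k].in_use && strcmp(table[k].name, name) == 0) {
            table[k].in_use = 0;
            return 0;
        }
    }
    return -1;
}
```

A type checker would call st_insert when it processes a declaration and st_lookup when it meets an identifier in an expression; a failed lookup corresponds to an undeclared-name error.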

Type Inference & Type Checking (Continued)

Consider the following grammar (given as a figure on the original slide)

Intermediate-Code Generation

Back-end of a Compiler

Where Are We Now?

Source code -> Scanner -> Tokens -> Parser -> Syntax Tree -> Semantic Analyzer -> Annotated Tree -> Intermediate Code Generator -> Intermediate code

Intermediate-Code Generation

In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target code
Ideally, details of the source language are confined to the front end, and details of the target machine to the back end
With a suitably defined intermediate representation, a compiler for language i and machine j can then be built by combining the front end for language i with the back end for machine j

Intermediate-Code Generation (Continued)

In the front-end model of a compiler, parsing is followed by static checking and intermediate code generation
Static checking includes type checking, which ensures that operators are applied to compatible operands
Static checking also includes any syntactic checks that remain after parsing
    For example, a break statement in C must be enclosed within a loop (while, do, or for) or a switch statement

Intermediate-Code Generation (Continued)

While translating a program, a compiler may construct a sequence of intermediate representations
Source Program -> High-Level Intermediate Representation -> Low-Level Intermediate Representation -> Target Code
High-level representations are close to the source language and low-level representations are close to the target machine
Abstract syntax trees are a high-level intermediate representation
    They depict the natural hierarchical structure of the source program

Intermediate-Code Generation (Continued)

A low-level representation is suitable for machine-dependent tasks like register allocation and instruction selection
Three-address code can range from high- to low-level, depending upon the choice of operators
For expressions, the differences between syntax trees and three-address code are superficial
    For looping statements, a syntax tree represents the components of a statement, whereas three-address code contains labels and jump instructions to represent the flow of control, as in machine language

Intermediate-Code Generation (Continued)

The choice or design of an intermediate representation varies from compiler to compiler
An intermediate representation may either be an actual language or it may consist of internal data structures that are shared by phases of the compiler
C is a programming language, yet it is often used as an intermediate form
    C is flexible, it compiles into efficient machine code, and its compilers are widely available
    The original C++ compiler consisted of a front end that generated C, treating a C compiler as a back end

Variants of Syntax Trees

Nodes in a syntax tree represent constructs in the source program
The children of a node represent the meaningful components of a construct
A directed acyclic graph (DAG) for an expression identifies the common subexpressions of the expression

Directed Acyclic Graphs for Expressions

A directed acyclic graph (DAG) is a directed graph with no directed cycles
Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands and interior nodes corresponding to operators
A node N in a DAG has more than one parent if N represents a common subexpression
A DAG not only represents expressions more succinctly, it also gives the compiler important clues regarding the generation of efficient code to evaluate the expression

Directed Acyclic Graphs for Expressions (Continued)

Create syntax trees and DAGs for the following expressions
a = a + 10
a + b + (a + b)
a + b + a + b
a + a * (b - c) + (b - c) * d
26
The Value-Number Method for
Constructing DAGs
Often, the nodes of a syntax tree or DAG are stored
in an array of records
Each row of the array represents one record, and
therefore one node
Consider the figure on next slide that shows a DAG
along with an array for expression i = i + 10

The Value-Number Method for Constructing DAGs (Continued)

In the following figure, leaves have one additional field, which holds the lexical value, and interior nodes have two additional fields indicating the left and right children

The Value-Number Method for Constructing DAGs (Continued)

In the array, we refer to nodes by giving the integer index of the record for that node within the array
This integer is called the value number for the node or for the expression represented by the node
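A minimal C sketch of the value-number method is given below. Before creating a node, each function searches the array for an identical record and returns its index if one exists, which is how common subexpressions end up shared. The names leaf and interior and the field layout are assumptions for illustration, value numbers start at 0 here, and the linear search stands in for the hash table a real implementation would normally use.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 100

/* One record per DAG node; the array index is the node's value number. */
typedef struct {
    char op;          /* 'i' = identifier leaf, 'n' = number leaf, else an operator */
    char lexeme[16];  /* leaves only: the lexical value */
    int left, right;  /* interior nodes only: value numbers of the children */
} Node;

static Node nodes[MAX_NODES];
static int node_count = 0;

/* Return the value number of a leaf, creating the record only if needed. */
int leaf(char op, const char *lexeme) {
    for (int k = 0; k < node_count; k++)
        if (nodes[k].op == op && nodes[k].left == -1 &&
            strcmp(nodes[k].lexeme, lexeme) == 0)
            return k;
    assert(node_count < MAX_NODES && strlen(lexeme) < sizeof nodes[0].lexeme);
    nodes[node_count].op = op;
    strcpy(nodes[node_count].lexeme, lexeme);
    nodes[node_count].left = nodes[node_count].right = -1;
    return node_count++;
}

/* Return the value number of <op, left, right>, reusing an existing node. */
int interior(char op, int left, int right) {
    for (int k = 0; k < node_count; k++)
        if (nodes[k].op == op && nodes[k].left == left && nodes[k].right == right)
            return k;
    assert(node_count < MAX_NODES);
    nodes[node_count].op = op;
    nodes[node_count].lexeme[0] = '\0';
    nodes[node_count].left = left;
    nodes[node_count].right = right;
    return node_count++;
}
```

For i = i + 10, calling leaf('i', "i"), leaf('n', "10"), interior('+', 0, 1), and interior('=', 0, 2) in that order fills four records, and the single leaf for i is shared by both of its uses.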

Three-Address Code

In three-address code, there is at most one operator on the right side of an instruction
An expression like x+y*z might be translated into the sequence of three-address instructions
t1 = y*z
t2 = x+t1
t1 and t2 are compiler-generated temporary names
The use of names for the intermediate values computed by a program allows three-address code to be rearranged easily
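One way such a translation could be produced is a post-order walk over the expression tree that creates a fresh temporary for every operator node. The sketch below assumes a hypothetical Expr node type and a gen function invented for this example; it handles binary operators only and appends the generated text to a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

typedef struct Expr {
    char op;                          /* 0 for a leaf, otherwise '+', '-', '*', '/' */
    const char *name;                 /* leaves only: variable name */
    const struct Expr *left, *right;  /* operator nodes only */
} Expr;

static int temp_count = 0;            /* number of temporaries created so far */

/* Append the three-address code for e to out (capacity out_size) and
   store in place the name that holds the value of e. */
void gen(const Expr *e, char *place, char *out, size_t out_size) {
    if (e->op == 0) {                 /* a leaf needs no instruction */
        assert(strlen(e->name) < NAME_LEN);
        strcpy(place, e->name);
        return;
    }
    char l[NAME_LEN], r[NAME_LEN], line[4 * NAME_LEN];
    gen(e->left, l, out, out_size);   /* operands are translated first */
    gen(e->right, r, out, out_size);
    snprintf(place, NAME_LEN, "t%d", ++temp_count);
    snprintf(line, sizeof line, "%s = %s%c%s\n", place, l, e->op, r);
    assert(strlen(out) + strlen(line) < out_size);
    strcat(out, line);
}
```

Applied to the tree for x+y*z with an initially empty buffer, gen appends t1 = y*z followed by t2 = x+t1 and reports t2 as the place holding the result.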

Three-Address Code (Continued)

Exercise
Represent the following DAG as a sequence of three-address instructions (the DAG is a figure on the original slide)

Addresses and Instructions

Three-address code is built from two concepts: addresses and instructions
In object-oriented terms, these concepts correspond to classes, and the various kinds of addresses and instructions correspond to appropriate subclasses
Alternatively, three-address code can be implemented using records with fields for the addresses
    Such records are called quadruples and triples

Addresses and Instructions (Continued)

In a three-address code scheme, an address can be one of the following
    A name: The names that appear in the source program. In an implementation, a source name is replaced by a pointer to its symbol table entry, where all the information about the name is kept
    A constant: In practice, a compiler must deal with many different types of constants and variables
    A compiler-generated temporary: It is useful, especially in optimizing compilers, to create a distinct name each time a temporary is needed

Addresses and Instructions (Continued)

A few examples of three-address instruction forms are listed below
    Assignment instructions of the form x = y op z
    Assignments of the form x = op y
    Copy instructions of the form x = y
    An unconditional jump goto L
    Conditional jumps of the form if x goto L
    Indexed copy instructions of the form x = y[z] or y[z] = x
    etc.

Addresses and Instructions (Continued)

Consider the following statement and its three-address code in the figures
do
    i = i+1;
while( a[i]<v );
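The figure with the translation is not part of this text. One plausible three-address translation is sketched below, assuming that each element of a occupies 8 bytes and that the instruction set includes a conditional jump of the form if x relop y goto L; both are assumptions, not details taken from the slide.

```text
L:  t1 = i + 1
    i = t1
    t2 = i * 8
    t3 = a[t2]
    if t3 < v goto L
```

The label L marks the first instruction of the loop body, and the final conditional jump returns to it as long as a[i] is less than v.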

Quadruples & Triples

The description of three-address instructions specifies the components of each type of instruction, but it does not specify the representation of these instructions in a data structure
In a compiler, these instructions can be implemented as objects or as records with fields for the operator and the operands
Three such representations are called quadruples, triples, and indirect triples

Quadruples

A quadruple, or just quad, has four fields, which we call op, arg1, arg2, and result
    In x = y + z, + is op, y and z are arg1 and arg2, and x is result
The following are some exceptions to this rule
    Instructions with unary operators like x = minus y or x = y do not use arg2
    Operators like param use neither arg2 nor result
    Conditional and unconditional jumps put the target label in result

Quadruples (Continued)

Example: Three-address code for the assignment a = b*-c+b*-c is shown below
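Since the figure is not part of this text, here is a sketch of how the quadruples for a = b*-c+b*-c might be stored in C. The Quad struct and format_quad helper are hypothetical names, unused fields are represented by empty strings, and the temporaries t1 to t5 follow the usual textbook translation in which each occurrence of -c gets its own temporary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *op, *arg1, *arg2, *result;  /* unused fields are "" */
} Quad;

/* Quadruples for a = b*-c+b*-c, one row per instruction. */
static const Quad quads[] = {
    { "minus", "c",  "",   "t1" },
    { "*",     "b",  "t1", "t2" },
    { "minus", "c",  "",   "t3" },
    { "*",     "b",  "t3", "t4" },
    { "+",     "t2", "t4", "t5" },
    { "=",     "t5", "",   "a"  },
};
enum { QUAD_COUNT = sizeof quads / sizeof quads[0] };

/* Render one quad as a three-address instruction into buf. */
void format_quad(const Quad *q, char *buf, size_t n) {
    assert(buf != NULL && n > 0);
    if (q->arg2[0] != '\0')                 /* binary: result = arg1 op arg2 */
        snprintf(buf, n, "%s = %s %s %s", q->result, q->arg1, q->op, q->arg2);
    else if (strcmp(q->op, "=") == 0)       /* copy: result = arg1 */
        snprintf(buf, n, "%s = %s", q->result, q->arg1);
    else                                    /* unary: result = op arg1 */
        snprintf(buf, n, "%s = %s %s", q->result, q->op, q->arg1);
}
```

Formatting the six rows in order gives t1 = minus c, t2 = b * t1, t3 = minus c, t4 = b * t3, t5 = t2 + t4, and a = t5. The unary and copy rows leave arg2 empty, matching the exceptions listed earlier.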

Triples

A triple has only three fields, which we call op, arg1, and arg2
In the earlier example we saw that the result field is used primarily for temporary names
Using triples, we refer to the result of an operation x op y by its position rather than by an explicit temporary name
Consider the figure on the next slide for details

Triples (Continued)

Example: Three-address code using triples
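The figure is not part of this text, so the sketch below shows how the assignment a = b*-c+b*-c from the quadruple example could be written as triples in C. An argument is either a source name or a reference to the position of an earlier triple. The Arg and Triple types, the helper macros, and refs_valid are all names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_NONE, ARG_NAME, ARG_REF } ArgKind;

typedef struct {
    ArgKind kind;
    const char *name;  /* ARG_NAME: a source name */
    int ref;           /* ARG_REF: position of the triple that computes the value */
} Arg;

typedef struct {
    const char *op;
    Arg arg1, arg2;
} Triple;

#define NAME(s) { ARG_NAME, s, -1 }
#define REF(k)  { ARG_REF, NULL, (k) }
#define NONE    { ARG_NONE, NULL, -1 }

/* Triples for a = b*-c+b*-c; results are referred to by position. */
static const Triple triples[] = {
    /* 0 */ { "minus", NAME("c"), NONE   },
    /* 1 */ { "*",     NAME("b"), REF(0) },
    /* 2 */ { "minus", NAME("c"), NONE   },
    /* 3 */ { "*",     NAME("b"), REF(2) },
    /* 4 */ { "+",     REF(1),    REF(3) },
    /* 5 */ { "=",     NAME("a"), REF(4) },
};

/* Returns 1 if every positional reference points at an earlier triple. */
int refs_valid(const Triple *t, int n) {
    assert(t != NULL && n >= 0);
    for (int k = 0; k < n; k++) {
        const Arg *args[2] = { &t[k].arg1, &t[k].arg2 };
        for (int a = 0; a < 2; a++)
            if (args[a]->kind == ARG_REF &&
                (args[a]->ref < 0 || args[a]->ref >= k))
                return 0;
    }
    return 1;
}
```

Because results are identified by position, moving a triple changes the meaning of every reference to it. Indirect triples address this by keeping a separate list of pointers to triples, so the list can be reordered while the triples stay in place.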

Static Single-Assignment Form

Static single-assignment form (SSA) is an intermediate representation that facilitates certain code optimizations
Two aspects distinguish SSA from three-address code
    All assignments in SSA are to variables with distinct names
    SSA uses a notational convention called the φ-function to combine two definitions of the same variable
if( flag ) x = -1; else x = 1;
y = x + a

if( flag ) x1 = -1; else x2 = 1;
x3 = φ(x1,x2)
y = x3 + a

Summary

Any Questions?
