Unit V
This strategy often produces redundant loads and stores. For example, the sequence
of three-address statements
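a = b + c
d = a + e
would, in a statement-by-statement translation, typically be turned into target code along the following lines (the exact mnemonics depend on the target machine; LD, ADD, and ST are used here only for illustration):
LD R0, b
ADD R0, c
ST a, R0
LD R0, a
ADD R0, e
ST d, R0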
Here, the fourth statement is redundant since it loads a value that has just been
stored, and so is the third if a is not subsequently used.
The quality of the generated code is usually judged by its speed and size.
The cost of the generated code should therefore be taken into account while translating.
For example, if the target machine has an "increment" instruction (INC), then the
three-address statement a = a + 1 may be implemented more efficiently by the single
instruction INC a, rather than by a more obvious sequence that loads a into a
register, adds one to the register, and then stores the result back into a.
5. Register allocation
Instructions involving register operands are shorter and faster than those
involving operands in memory.
The use of registers is subdivided into two subproblems:
o Register allocation - selecting the set of variables that will reside in registers at each point in the program
o Register assignment - picking the specific register in which each such variable will reside
Finding an optimal assignment of registers to variables is difficult; the problem is further complicated by hardware constraints such as the register pairs described below.
Example 1: Certain machines require register pairs (an even and the next odd-numbered register) for some operands and results. For example, integer multiplication and integer division involve register pairs.
The multiplication instruction is of the form
M x, y
where x, the multiplicand, is the even register of an even/odd register pair, and the multiplier y is the odd register. The product occupies the entire even/odd register pair.
The division instruction is of the form
D x, y
where the dividend occupies an even/odd register pair whose even register is x, and the divisor is y. After division,
o the even register holds the remainder
o the odd register holds the quotient.
Example 2: Consider the two three-address code sequences below, in which the only difference between (a) and (b) is the operator in the second statement.
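A representative pair of such sequences is:
(a)
t = a + b
t = t * c
t = t / d
(b)
t = a + b
t = t + c
t = t / d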
The shortest assembly-code sequences for (a) and (b) are given below.
Ri stands for register i. SRDA stands for Shift-Right-Double-Arithmetic: SRDA R0,32 shifts the dividend into R1 and clears R0 so that all its bits equal the sign bit. L, ST, and A stand for load, store, and add, respectively.
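One plausible pair of shortest sequences for (a) and (b), following the classic register-pair example, is:
(a)
L    R1, a
A    R1, b
M    R0, c
D    R0, d
ST   R1, t
(b)
L    R0, a
A    R0, b
A    R0, c
SRDA R0, 32
D    R0, d
ST   R1, t
Sequence (b) needs the extra SRDA because the sum is computed in R0 and must be moved into the even/odd register pair before the division, whereas in (a) the multiplication already leaves its result in the pair R0/R1.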
6. Evaluation order
The order in which the computations are performed can affect the efficiency of the
target code.
Some computation orders require fewer registers to hold intermediate results
than others.
Example
Assume that t, u, and v are temporaries, local to the block, while a, b, c, and d are variables that are live on exit from the block.
Assume that there are as many registers as needed and that, when a register's value is no longer needed, the register can be reused.
The table below shows the assembly code generated for each three-address instruction, together with the register and address descriptors before and after its translation.
Three-Address Code | Assembly Code | Register Descriptor | Address Descriptor
(initial setup) | (none) | R1 = empty, R2 = empty, R3 = empty | a in a, b in b, c in c, d in d; t, u, v hold no value
t = a - b | LD R1, a  LD R2, b  SUB R2, R1, R2 | R1 = a, R2 = t, R3 = empty | a in a, b in b, c in c, d in d, t in R2
u = a - c | LD R3, c  SUB R1, R1, R3 | R1 = u, R2 = t, R3 = c | a in a, b in b, c in c and R3, d in d, t in R2, u in R1
v = t + u | ADD R3, R2, R1 | R1 = u, R2 = t, R3 = v | a in a, b in b, c in c, d in d, t in R2, u in R1, v in R3
a = d | LD R2, d | R1 = u, R2 = a and d, R3 = v | a in R2, b in b, c in c, d in d and R2, u in R1, v in R3
d = v + u | ADD R1, R3, R1 | R1 = d, R2 = a, R3 = v | a in R2, b in b, c in c, d in R1, v in R3
Exit | ST a, R2  ST d, R1 | R1 = d, R2 = a, R3 = v | a in a and R2, b in b, c in c, d in d and R1, v in R3
For the first three-address instruction, t = a - b, we need to issue three instructions, since nothing is in a register initially. Thus, a and b are loaded into registers R1 and R2, and the value t is produced in register R2. We can use R2 for t because the value of b previously in R2 is not needed within the block.
The second instruction, u = a - c, does not require a load of a, since it is already in register R1. We can reuse R1 for the result, u, since the value of a, previously in that register, is no longer needed within the block, and its value is in its own memory location. The address descriptor for a is changed to indicate that a is no longer in R1, but is in the memory location called a.
The third instruction, v = t + u, requires only the addition. We can use R3 for the result, v, since the value of c in that register is no longer needed within the block, and c has its value in its own memory location.
The copy instruction, a = d, requires a load of d, since d is not in a register. Register R2's descriptor now holds both a and d; the addition of a to the register descriptor is the result of processing the copy statement.
The fifth instruction, d = v + u, uses two values that are already in registers. Since u is a temporary whose value is no longer needed, we reuse its register R1 for the new value of d. Variable d is now only in R1, and is not in its own memory location. At the end of the basic block, machine code is therefore needed to store the live-on-exit variables a and d into their memory locations.
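The bookkeeping described above can be captured in a small data structure. The following sketch is only a simplified illustration (registers are chosen by the caller, each register holds at most one variable name, and spilling of live values is not handled); all type and function names are invented for this example.

#include <stdio.h>
#include <string.h>

#define NREGS 4

/* Register descriptor: the name of the variable each register currently holds, or NULL. */
static const char *regdesc[NREGS];

/* Address descriptor for one variable: is its value in memory, and/or in a register? */
struct addr { const char *name; int in_memory; int reg; };   /* reg == -1: not in a register */

static struct addr vars[16];
static int nvars;

static struct addr *lookup(const char *name) {
    for (int i = 0; i < nvars; i++)
        if (strcmp(vars[i].name, name) == 0) return &vars[i];
    vars[nvars] = (struct addr){ name, 1, -1 };    /* variables start out in memory only */
    return &vars[nvars++];
}

/* Make sure variable v is in register r, emitting a load if necessary.
   In this toy, the caller must only evict values that are dead or still in memory. */
static void ensure(const char *v, int r) {
    struct addr *d = lookup(v);
    if (d->reg == r) return;                       /* already there: no load needed */
    printf("LD R%d, %s\n", r, v);
    if (regdesc[r]) lookup(regdesc[r])->reg = -1;  /* previous occupant is no longer in r */
    regdesc[r] = v;
    d->reg = r;
}

/* Emit code for x = y op z, with registers ry, rz, rx chosen by the caller. */
static void gen(const char *x, const char *op,
                const char *y, int ry, const char *z, int rz, int rx) {
    ensure(y, ry);
    ensure(z, rz);
    printf("%s R%d, R%d, R%d\n", op, rx, ry, rz);
    if (regdesc[rx]) lookup(regdesc[rx])->reg = -1;   /* rx now holds x and nothing else */
    regdesc[rx] = x;
    struct addr *d = lookup(x);
    d->reg = rx;
    d->in_memory = 0;                                 /* the new value exists only in rx */
}

int main(void) {
    /* First three instructions of the block, registers chosen as in the table:
       t = a - b; u = a - c; v = t + u */
    gen("t", "SUB", "a", 1, "b", 2, 2);
    gen("u", "SUB", "a", 1, "c", 3, 1);
    gen("v", "ADD", "t", 2, "u", 1, 3);
    return 0;
}

Running this sketch prints exactly the six instructions shown in the first three rows of the table above.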
CODE OPTIMIZATION
Principal Sources of Optimization
Peephole Optimization
Machine-Independent Optimization
Basic Blocks
Types of Optimizations
Machine-independent optimizations:
Machine-independent optimizations are program transformations that improve the target code without taking into consideration any properties of the target machine.
Machine-dependent optimizations:
Machine-dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
• We will consider only Machine-Independent Optimizations—i.e., they don’t
take into consideration any properties of the target machine.
• The techniques used are a combination of Control-Flow and Data-Flow
analysis.
• - Control-Flow Analysis: Identifies loops in the flow graph of a program
since such loops are usually good candidates for improvement.
• - Data-Flow Analysis: Collects information about the way variables are
used in a program.
Properties of Optimizing Compilers:
The transformations must produce correct target code.
Dead code should be completely removed.
While applying the optimizing transformations, the semantics of the source program should not be changed.
Basic Blocks
A basic block is a sequence of consecutive statements in which flow of
control enters at the beginning and leaves at the end without any halt or
possibility of branching except at the end.
The following sequence of three-address statements forms a basic block:
t1 : = a * a
t2 : = a * b
t3 : = 2 * t2
t4 : = t1 + t3
t5 : = b * b
t6 : = t4 + t5
Flow Graphs
Flow graph is a directed graph containing the flow-of-control information
for the set of basic blocks making up a program.
The nodes of the flow graph are basic blocks. It has a distinguished initial
node.
Intermediate code for the marked fragment of the program in Figure 5.1 is shown
in Figure 5.2. In this example we assume that integers occupy four bytes. The
assignment x = a[i] is translated into the two three address statements t6=4*i and
x=a[t6] as shown in steps (14) and (15) of Figure 5.2. Similarly, a[j] = x becomes
t10=4*j and a[t10]=x in steps (20) and (21).
Fig 5.3: Flow graph for the program in Figure 5.2
Figure 5.3 is the flow graph for the program in Figure 5.2. Block B1 is the entry
node. All conditional and unconditional jumps to statements in Figure 5.2 have
been replaced in Figure 5.3 by jumps to the block of which the statements are
leaders. In Figure 5.3, there are three loops. Blocks B2 and B3 are loops by
themselves. Blocks B2, B3, B4, and B5 together form a loop, with B2 the only
entry point.
5.1.3 Semantics-Preserving Transformations
There are a number of ways in which a compiler can improve a program without
changing the function it computes. Common subexpression elimination, copy
propagation, dead-code elimination, and constant folding are common examples of
such function-preserving (or semantics preserving) transformations.
Fig 5.6: Copies introduced during common subexpression elimination
In order to eliminate the common subexpression from the statement c = d+e in
Figure 5.6(a), we must use a new variable t to hold the value of d + e. The value of
variable t, instead of that of the expression d + e, is assigned to c in Figure 5.6(b).
Since control may reach c = d+e either after the assignment to a or after the
assignment to b, it would be incorrect to replace c = d+e by either c = a or by c =
b.
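Schematically, the transformation described above looks like this (the block labels B1, B2, B3 are illustrative):
(a)  B1: a = d + e        B2: b = d + e
                 B3: c = d + e
(b)  B1: t = d + e        B2: t = d + e
         a = t                b = t
                 B3: c = t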
The idea behind the copy-propagation transformation is to use v for u, wherever
possible after the copy statement u = v. For example, the assignment x = t3 in
block B5 of Figure 5.5 is a copy. Copy propagation applied to B5 yields the code in
Figure 5.7. This change may not appear to be an improvement, but it gives us the opportunity to eliminate the assignment to x.
After reduction in strength is applied to the inner loops around B2 and B3, the
only use of i and j is to determine the outcome of the test in block B4. We know
that the values of i and t2 satisfy the relationship t2 = 4 * i, while those of j and
t4 satisfy the relationship t4 = 4* j. Thus, the test t2 >= t4 can substitute for i >=
j. Once this replacement is made, i in block B2 and j in block B3 become dead
variables, and the assignments to them in these blocks become dead code that can
be eliminated. The resulting flow graph is shown in Figure 5.9.
Fig 5.9: Flow graph after induction-variable elimination
Note:
1. Code motion, induction-variable elimination and strength reduction are loop optimization techniques.
2. Common subexpression elimination, copy propagation, dead code elimination and constant folding are function-preserving transformations.
PEEPHOLE OPTIMIZATION
While most production compilers produce good code through careful instruction selection and register allocation, a few use an alternative strategy: they generate naive code and then improve the quality of the target code by applying "optimizing" transformations to the target program.
The term "optimizing" is somewhat misleading here, but many simple transformations can significantly improve the running time or space requirement of the target program.
A simple but effective technique for locally improving the target code is
peephole optimization, which is done by examining a sliding window of target
instructions (called the peephole) and replacing instruction sequences within the
peephole by a shorter or faster sequence, whenever possible.
Peephole optimization can also be applied directly after intermediate code generation
to improve the intermediate representation.
The peephole is a small, sliding window on a program. The code in the peephole need
not be contiguous, although some implementations do require this.
Characteristics of peephole optimization:
• Repeated passes over the target code are necessary to get the maximum benefit.
Examples of program transformations that are characteristic of peephole optimizations:
• Redundant-instruction elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
Eliminating Redundant Loads and Stores
If we see the instruction sequence
LD a, R0
ST R0, a
in a target program, we can delete the store instruction, because whenever it is executed, the first instruction will ensure that the value of a has already been loaded into register R0.
Note that if the store instruction had a label, we could not be sure that the first instruction was always executed immediately before the second, and so we could not remove the store instruction.
Put another way, the two instructions have to be in the same basic block for this
transformation to be safe.
Eliminating Unreachable Code
Another opportunity for peephole optimization is the removal of unreachable
instructions. An unlabeled instruction immediately following an unconditional jump
may be removed. This operation can be repeated to eliminate a sequence of instructions.
For example, for debugging purposes, a large program may have within it certain code
fragments that are executed only if a variable debug is equal to 1.
In C, the source code might look like:
#define debug 0
...
if ( debug ) {
    print debugging information
}
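Following the usual treatment of this example (the intermediate code and label names below are illustrative), the test is first translated into intermediate code of the form:
if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
Eliminating the jump over the jump gives:
if debug != 1 goto L2
print debugging information
L2:
Because debug is the constant 0, the condition 0 != 1 is always true, so the conditional jump can be replaced by goto L2; the statements that print debugging information then become unreachable and can be eliminated one at a time.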
Flow-of-Control Optimizations
Peephole optimization can also simplify jumps to jumps. For example, if L1 is the target of exactly one jump, the sequence
goto L1
...
L1: if a < b goto L2
L3:
can be replaced by
if a < b goto L2
goto L3
...
L3:
While the number of instructions in the two sequences is the same, we sometimes skip the unconditional jump in the second sequence, but never in the first. Thus, the second sequence is superior to the first in execution time.
Algebraic Simplification and Reduction in Strength
Algebraic identities can be used by a peephole optimizer to eliminate three-address statements such as
x = x + 0
or
x = x * 1
whenever they appear in the peephole.
Similarly, reduction-in-strength transformations can be applied in the
peephole to replace expensive operations by equivalent cheaper ones on the target
machine. Certain machine instructions are considerably cheaper than others and can
often be used as special cases of more expensive operators.
For example, x^2 is invariably cheaper to implement as x * x than as a call to an exponentiation routine.
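As a concrete sketch, the toy pass below walks a list of three-address instructions, deletes additions of 0 and multiplications by 1, and rewrites multiplication by a power of two as a shift. The instruction representation and all names are invented for this illustration.

#include <stdio.h>
#include <string.h>

/* A toy three-address instruction: result = lhs op constant. */
struct instr {
    const char *op;        /* "+", "*", or "<<" */
    char result[8], lhs[8];
    int rhs_is_const;
    int rhs_const;
    int deleted;
};

/* Returns k if n == 2^k with k > 0, otherwise -1. */
static int log2_exact(int n) {
    int k = 0;
    while (n > 1 && (n & 1) == 0) { n >>= 1; k++; }
    return (n == 1 && k > 0) ? k : -1;
}

/* One peephole pass: algebraic simplification and reduction in strength. */
static void peephole(struct instr *code, int n) {
    for (int i = 0; i < n; i++) {
        struct instr *p = &code[i];
        if (!p->rhs_is_const) continue;
        if (strcmp(p->op, "+") == 0 && p->rhs_const == 0 && strcmp(p->result, p->lhs) == 0)
            p->deleted = 1;                       /* x = x + 0  -> remove */
        else if (strcmp(p->op, "*") == 0 && p->rhs_const == 1 && strcmp(p->result, p->lhs) == 0)
            p->deleted = 1;                       /* x = x * 1  -> remove */
        else if (strcmp(p->op, "*") == 0) {
            int k = log2_exact(p->rhs_const);
            if (k > 0) { p->op = "<<"; p->rhs_const = k; }   /* x = y * 2^k -> x = y << k */
        }
    }
}

int main(void) {
    struct instr code[] = {
        { "+", "x", "x", 1, 0, 0 },               /* x = x + 0   (removed)        */
        { "*", "t", "a", 1, 4, 0 },               /* t = a * 4   (becomes a << 2) */
        { "*", "y", "y", 1, 1, 0 },               /* y = y * 1   (removed)        */
    };
    int n = sizeof code / sizeof code[0];
    peephole(code, n);
    for (int i = 0; i < n; i++)
        if (!code[i].deleted)
            printf("%s = %s %s %d\n", code[i].result, code[i].lhs, code[i].op, code[i].rhs_const);
    return 0;
}

On the sample input, x = x + 0 and y = y * 1 are deleted, and t = a * 4 becomes t = a << 2.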
• The target machine may have hardware instructions to implement certain specific
operations efficiently.
• Detecting situations that permit the use of these instructions can reduce
execution time significantly.
For example, some machines have auto-increment and auto-decrement addressing
modes. These add or subtract one from an operand before or after using its value.
The use of the modes greatly improves the quality of code when pushing or popping a
stack, as in parameter passing. These modes can also be used in code for statements like
x = x + 1.
x = x + 1 → x++
x = x - 1 → x- -
Machine-independent code optimization can be achieved using the following methods: common subexpression elimination, constant folding and constant propagation, dead code elimination, copy propagation, and loop optimizations.
1. Common Subexpression Elimination:
If an expression has already been computed and its operands have not changed since, recomputing it is redundant; the earlier result can be reused and the repeated computation removed.
//Code segment
t1=x*z;
t2=a+b;
t3=p%t2;
t4=x*z; //Redundant expression
t5=a-z;
// after Optimization
t1=x*z;
t2=a+b;
t3=p%t2;
t5=a-z;
2. Constant Folding:
Constant folding is a technique where an expression whose value can be computed at compile time is replaced by its value. Generally, such expressions would be evaluated at runtime; if we replace them with their values, they need not be evaluated at runtime, saving time.
//Code segment
int x= 5+7+c;
//Folding applied
int x=12+c;
Folding can be applied to booleans and integers as well as to floating-point numbers, but one should be careful with floating-point numbers. Constant folding is often interleaved with constant propagation.
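As a sketch of how a compiler might perform this, the example below folds constant subexpressions in a tiny expression tree. The types and helper functions are invented for this illustration; a real compiler folds on its own intermediate representation and must respect the language's overflow and floating-point rules.

#include <stdio.h>
#include <stdlib.h>

/* A tiny expression tree: an integer constant, a named variable, or a binary operation. */
enum kind { CONST, VAR, BINOP };
struct expr {
    enum kind kind;
    int value;              /* CONST */
    const char *name;       /* VAR */
    char op;                /* BINOP: '+', '-', '*' */
    struct expr *left, *right;
};

static struct expr *mk_const(int v) {
    struct expr *e = calloc(1, sizeof *e);
    e->kind = CONST; e->value = v; return e;
}
static struct expr *mk_var(const char *n) {
    struct expr *e = calloc(1, sizeof *e);
    e->kind = VAR; e->name = n; return e;
}
static struct expr *mk_bin(char op, struct expr *l, struct expr *r) {
    struct expr *e = calloc(1, sizeof *e);
    e->kind = BINOP; e->op = op; e->left = l; e->right = r; return e;
}

/* Fold constants bottom-up: a subtree whose operands are both constants
   is replaced by a single constant node. */
static struct expr *fold(struct expr *e) {
    if (e->kind != BINOP) return e;
    e->left = fold(e->left);
    e->right = fold(e->right);
    if (e->left->kind == CONST && e->right->kind == CONST) {
        int l = e->left->value, r = e->right->value;
        int v = (e->op == '+') ? l + r : (e->op == '-') ? l - r : l * r;
        return mk_const(v);
    }
    return e;
}

static void print(struct expr *e) {
    if (e->kind == CONST) printf("%d", e->value);
    else if (e->kind == VAR) printf("%s", e->name);
    else { printf("("); print(e->left); printf(" %c ", e->op); print(e->right); printf(")"); }
}

int main(void) {
    /* x = 5 + 7 + c, parsed as (5 + 7) + c */
    struct expr *e = mk_bin('+', mk_bin('+', mk_const(5), mk_const(7)), mk_var("c"));
    print(fold(e));          /* prints (12 + c) */
    printf("\n");
    return 0;
}

For the expression 5 + 7 + c, the fold pass replaces the subtree 5 + 7 by the constant 12, leaving 12 + c.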
Constant Propagation:
If a variable is assigned a constant value and used in further computations, constant propagation suggests using that constant value directly in the later computations. Consider the example below.
// Code segment
int a = 5;
int c = a * 2;
int z = a;
// After constant propagation
int a = 5;
int c = 5 * 2;
int z = 5;
// After constant folding
int a = 5;
int c = 10;
int z = 5;
3. Dead Code Elimination:
Dead code is code that is never executed or whose result is never used. Removing it reduces the size of the program without changing its behavior. In the example below, the value assigned to z in the first statement is overwritten before it is ever used, so that assignment is dead code.
//Code
int z = a[10];
z = a + y;
printf("%d,%d", z, y);
//After Optimization
z = a + y;
printf("%d,%d", z, y);
In general, assigning a value to a variable and then changing that value just before it is used makes the earlier assignment dead code. Such dead code needs to be deleted in order to achieve optimization.
4. Copy Propagation:
Copy propagation suggests using one variable in place of another wherever assignments of the form x = y occur. These assignments are copy statements. We can use y at all the places where x is required, instead of assigning y to x first. In short, eliminating such copies from the code is copy propagation.
//Code segment
----;
a=b;
z=a+x;
x=z-b;
----;
//After Optimization
----;
z=b+x;
x=z-b;
----;
Another kind of optimization, loop optimization, deals with reducing the time a program spends inside loops.
Loop Optimizations:
A program spends most of its time inside loops, so loops largely determine the time complexity of the program. In order to get optimal and efficient code, loop optimization is required. To apply loop optimization, we first need to detect loops using control-flow analysis with the help of a program flow graph; a cycle in the flow graph indicates the presence of a loop. Note that the code coming from the intermediate code generation phase is in three-address format, in which loops are difficult to identify directly, which is why a program flow graph is required.
The program flow graph consists of basic blocks (the code divided into blocks), with edges showing the flow of execution between them. A cycle in such a graph, for example from block 2 back to block 3, shows the presence of a loop.
Once the loops are detected following Loop Optimization techniques can be applied :
1. Frequency Reduction
2. Algebraic expression simplification
3. Strength Reduction
4. Redundancy Elimination
1. Frequency Reduction:
Frequency reduction is based on the idea that as little code as possible should be executed inside a loop on each iteration. It can be achieved by the following methods:
a. Code Motion:
Many times, statements that remain unchanged on every iteration are included inside a loop. Such statements are loop invariants; they only make the program spend more time inside the loop. Code motion simply moves loop-invariant code outside the loop, reducing the time spent inside it. To understand this, consider the example below.
//Code segment
p=100;
for(i=0;i<p;i++)
{
a=b+40;
if(p/a==0)
printf("%d",p);
}
//After code motion
p=100;
a=b+40;
for(i=0;i<p;i++)
{
if(p/a==0)
printf("%d",p);
}
In the example, before optimizing, the loop-invariant code was evaluated on every iteration of the loop. Once code motion is applied, the frequency of evaluating the loop-invariant code decreases; this is why code motion is also called frequency reduction. The following is also an example of code motion.
//Code segment
----;
while((x+y)>n)
{
----;
}
----;
//After code motion
----;
int t=x+y;
while(t>n)
{
----;
}
----;
b. Loop Unrolling:
If a loop performs the same operation on every iteration, we can perform that operation more than once within a single iteration (or unroll the loop completely). This is called loop unrolling. An unrolled loop performs the evaluation more than once per iteration, reducing the loop overhead.
//Code segment
int i=0;
while(i<5)
{
x[i]=0;
i++;
}
//After loop unrolling: initializes 5 elements to 0
int i=0;
x[i]=0;
i++;
x[i]=0;
i++;
x[i]=0;
i++;
x[i]=0;
i++;
x[i]=0;
i++;
As in the above example, an unrolled loop can be more efficient than the original loop.
c. Loop Jamming:
Combining loops that carry out the same kind of work over the same range is called loop jamming.
//Code segment
for(i=0;i<5;i++)
for(j=0;j<5;j++)
x[i][j]=0;
for(i=0;i<5;i++)
x[i][i]=1;
//After loop jamming
for(i=0;i<5;i++)
{
for(j=0;j<5;j++)
x[i][j]=0;
x[i][i]=1;
}
Thus, instead of executing two separate loops over the same range, the same work can be done by executing the loop only once.
2. Algebraic Expression Simplification:
Statements such as
A=A+0;
x=x*1;
do not result in any useful computation. Such code may seem harmless, but when it appears inside a loop it is evaluated on every iteration, so it is best to eliminate it.
3. Strength Reduction :
It suggests replacing a costly operation like multiplication with a cheaper one.
Example:
a*4
after reduction
a<<2
It is an important optimization for programs where array accesses occur within loops, and it should be used with integer operands only.
4. Redundancy Elimination:
It may happen that a specific expression is repeated in the code many times. Such an expression is redundant, because we can evaluate it once and substitute the evaluated value at its later occurrences. This substitution is redundancy elimination. A simple example is given below.
//Code:
----;
int x=a+b;
----;
int z=a+b+40;
----;
return (a+b)*5;
//After optimization
----;
int x=a+b;
----;
int z=x+40;
----;
return x*5;
Redundancy elimination avoids evaluating the same expressions multiple times resulting in faster
execution.
Chapter 13
Bootstrapping a compiler
13.1 Introduction
When writing a compiler, one will usually prefer to write it in a high-level language.
A possible choice is to use a language that is already available on the machine
where the compiler should eventually run. It is, however, quite common to be in
the following situation:
You have a completely new processor for which no compilers exist yet. Nevertheless, you want to have a compiler that not only targets this processor, but also
runs on it. In other words, you want to write a compiler for a language A, targeting
language B (the machine language) and written in language B.
The most obvious approach is to write the compiler in language B. But if B
is machine language, it is a horrible job to write any non-trivial compiler in this
language. Instead, it is customary to use a process called “bootstrapping”, referring
to the seemingly impossible task of pulling oneself up by the bootstraps.
The idea of bootstrapping is simple: You write your compiler in language A
(but still let it target B) and then let it compile itself. The result is a compiler from
A to B written in B.
It may sound a bit paradoxical to let the compiler compile itself: In order to
use the compiler to compile a program, we must already have compiled it, and to
do this we must use the compiler. In a way, it is a bit like the chicken-and-egg
paradox. We shall shortly see how this apparent paradox is resolved, but first we
will introduce some useful notation.
13.2 Notation
We will use a notation designed by H. Bratman [11]. The notation is hence called
“Bratman diagrams” or, because of their shape, “T-diagrams”.
[T-diagram: a compiler from language A to language B, written in language C]
In order to use this compiler, it must “stand” on a solid foundation, i.e., something
capable of executing programs written in the language C. This “something” can be
a machine that executes C as machine-code or an interpreter for C running on some
other machine or interpreter. Any number of interpreters can be put on top of each
other, but at the bottom of it all, we need a “real” machine.
An interpreter written in the language D and interpreting the language C, is
represented by the diagram
[Diagram: an interpreter for language C, written in language D]
A machine that directly executes a language is drawn with a pointed bottom. The pointed bottom indicates that a machine need not stand on anything; it is itself the foundation that other things must stand on.
When we want to represent an unspecified program (which can be a compiler,
an interpreter or something else entirely) written in language D, we write it as
[Diagram: an unspecified program written in language D; to run it, it is placed on top of a machine that executes D]
Note that the languages must match: The program must be written in the language
that the machine executes.
We can insert an interpreter into this picture:
[Diagram: a program written in language C, running on an interpreter for C written in D, which in turn runs on a machine that executes D]
Note that, also here, the languages must match: The interpreter can only interpret
programs written in the language that it interprets.
We can run a compiler and use this to compile a program:
[Diagram: a source program in language A is compiled into a target program in language B by an A-to-B compiler written in C, running on a machine that executes C]
The input to the compiler (i.e., the source program) is shown at the left of the
compiler, and the resulting output (i.e., the target program) is shown on the right.
Note that the languages match at every connection and that the source and target
program are not “standing” on anything, as they are not executed in this diagram.
We can insert an interpreter in the above diagram:
[Diagram: the same compilation as above, but the A-to-B compiler written in C now runs on an interpreter for C written in D, which runs on a machine that executes D]
It often happens that a compiler does exist for the desired source language, it
just does not run on the desired machine. Let us, for example, assume we want a
compiler for ML to x86 machine code and want this to run on an x86. We have
access to an ML-compiler that generates ARM machine code and runs on an ARM
machine, which we also have access to. One way of obtaining the desired compiler
would be to do binary translation, i.e., to write a compiler from ARM machine
code to x86 machine code. This will allow the translated compiler to run on an
x86, but it will still generate ARM code. We can use the ARM-to-x86 compiler to
translate this into x86 code afterwards, but this introduces several problems:
• We still need to make the ARM-to-x86 compiler run on the x86 machine.
Instead, we can write the ML-to-x86 compiler in ML and compile it with the existing ML-to-ARM compiler on the ARM machine:
[Diagram: the ML-to-x86 compiler written in ML is compiled by the ML-to-ARM compiler (written in ARM code, running on the ARM machine), producing an ML-to-x86 compiler written in ARM code]
Now, we can run the ML-to-x86 compiler on the ARM and let it compile itself¹:
[Diagram: the ML-to-x86 compiler written in ML compiles itself on the ARM, using the ML-to-x86 compiler in ARM code, producing an ML-to-x86 compiler in x86 code]
We have now obtained the desired compiler. Note that the compiler can now be
used to compile itself directly on the x86 platform. This can be useful if the compiler is later extended or, simply, as a partial test of correctness: If the compiler,
when compiling itself, yields a different object code than the one obtained with the
above process, it must contain an error. The converse is not true: Even if the same
target is obtained, there may still be errors in the compiler.
¹ We regard a compiled version of a program as the same program as its source-code version.
It is possible to combine the two above diagrams to a single diagram that covers
both executions:
[Diagram: the two compilations combined: the ML-to-ARM compiler (on the ARM machine) compiles the ML-to-x86 compiler written in ML into ARM code, and that ARM-code compiler (also on the ARM machine) then compiles the same ML source into x86 code]
In this diagram, the ML-to-x86 compiler written in ARM has two roles: It is the
output of the first compilation and the compiler that runs the second compilation.
Such combinations can, however, be a bit confusing: The compiler that is the input to the second compilation step looks like it is also the output of the leftmost
compiler. In this case, the confusion is avoided because the leftmost compiler is
not running and because the languages do not match. Still, diagrams that combine
several executions should be used with care.
The above bootstrapping process relies on an existing compiler for the desired language, albeit running on a different machine. It is, hence, often called "half bootstrapping". When no existing compiler is available, e.g., when a new language has been designed, we need to use a more complicated process called "full bootstrapping".
A common method is to write a QAD ("quick and dirty") compiler using an existing language. This compiler need not generate code for the desired target machine (as long as the generated code can be made to run on some existing platform), nor does it have to generate good code. The important thing is that it allows programs in the new language to be executed. Additionally, the "real" compiler is written in the new language and will be bootstrapped using the QAD compiler.
As an example, let us assume we design a new language “M+”. We, initially,
write a compiler from M+ to ML in ML. The first step is to compile this, so it can
run on some machine:
[Diagram: the QAD M+-to-ML compiler written in ML is compiled by the ML-to-ARM compiler on the ARM machine, producing an M+-to-ML compiler in ARM code]
The QAD compiler can now be used to compile the “real” compiler:
[Diagram: the "real" M+-to-ARM compiler, written in M+, is compiled by the QAD M+-to-ML compiler (in ARM code, on the ARM machine), producing an M+-to-ARM compiler written in ML]
The result is an ML program, which we must in turn compile with the ML compiler:
[Diagram: the M+-to-ARM compiler written in ML is compiled by the ML-to-ARM compiler on the ARM machine, producing an M+-to-ARM compiler in ARM code]
The result of this is a compiler with the desired functionality, but it will probably
run slowly. The reason is that it has been compiled by using the QAD compiler (in
combination with the ML compiler). A better result can be obtained by letting the
generated compiler compile itself:
[Diagram: the "real" M+-to-ARM compiler, written in M+, compiles itself using the M+-to-ARM compiler in ARM code produced above (on the ARM machine), yielding an M+-to-ARM compiler in ARM code]
This yields a compiler with the same functionality as the above, i.e., it will generate
the same code, but, since the “real” compiler has been used to compile it, it will run
faster.
The need for this extra step might be a bit clearer if we had let the "real" compiler generate x86 code instead, as it would then be obvious that the last step is
required to get the compiler to run on the same machine that it targets. We chose
the target language to make a point: Bootstrapping might not be complete even if a
compiler with the right functionality has been obtained.
Using an interpreter
Instead of writing a QAD compiler, we can write a QAD interpreter. In our example, we could write an M+ interpreter in ML. We would first need to compile
this:
[Diagram: the M+ interpreter written in ML is compiled by the ML-to-ARM compiler on the ARM machine, producing an M+ interpreter in ARM code]
We can then use the interpreter to run the "real" compiler and let it compile itself:
[Diagram: the M+-to-ARM compiler, written in M+, compiles itself while running on the M+ interpreter (in ARM code) on the ARM machine, producing an M+-to-ARM compiler in ARM code]
Since the “real” compiler has been used to do the compilation, nothing will be
gained by using the generated compiler to compile itself, though this step can still
be used as a test and for extensions.
Though using an interpreter requires fewer steps, this should not really be a
consideration, as the computer(s) will do all the work in these steps. What is important is the amount of code that needs to be written by hand. For some languages,
a QAD compiler will be easier to write than an interpreter, and for other languages
an interpreter is easier. The relative ease/difficulty may also depend on the language
used to implement the QAD interpreter/compiler.