Intermediate Code Generation
● A compiler converts source code to target code in six phases.
● After semantic analysis, the compiler generates an intermediate code for
the source program. This code represents a program for some abstract
machine and sits between the high-level language and the machine
language. It should be generated in a form that is easy to translate into
the target machine code.
● If a compiler translates the source code directly to target machine code
without generating intermediate code, then a full native compiler is
required for each new machine.
● Intermediate code eliminates the need for a new full compiler for every
unique machine by keeping the analysis portion (front end) the same for
all the compilers.
● Only the second part of the compiler, the synthesis portion (back end),
is changed according to the target machine.
1. Intermediate Languages
a. Infix, Prefix, Postfix
● Infix: (a + b) — human-readable.
● Prefix: + a b — no parentheses.
● Postfix: a b + — used in stack machines.
b. Notation Conversions:
Infix: (a + b) * (c - d) → Prefix: * + a b - c d → Postfix: a b + c d - *
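The conversion above can be sketched in Python. This is a minimal illustration, not a full parser: the function names (tokenize, parse, to_prefix, to_postfix) are made up for this example, only the binary operators + - * / are handled, and the input is assumed to be well formed. The infix string is parsed into a tree, and the two notations are just two traversal orders of that tree.

```python
import re

# Assumed precedence table for this sketch: * and / bind tighter than + and -.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def tokenize(text):
    # Identifiers, integers, operators, and parentheses; whitespace is skipped.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = parse(tokens)
        tokens.pop(0)  # discard the matching ")"
        return node
    return tok  # a leaf: variable name or number

def parse(tokens, min_prec=1):
    """Precedence climbing: returns a leaf string or an (op, left, right) tuple."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Parsing the right side at a higher level keeps operators left-associative.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left

def to_prefix(node):
    # Operator first, then both operands.
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{op} {to_prefix(left)} {to_prefix(right)}"

def to_postfix(node):
    # Both operands first, then the operator.
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{to_postfix(left)} {to_postfix(right)} {op}"

tree = parse(tokenize("(a + b) * (c - d)"))
print(to_prefix(tree))   # * + a b - c d
print(to_postfix(tree))  # a b + c d - *
```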
Types of Intermediate Code Representation
Postfix Notation
In postfix notation, the operator is written after its operands, so no
parentheses are needed. Postfix notation is a linearized representation of
the syntax tree.
Example
● Postfix notation for the expression (a+b) * (c+d) is ab+ cd+ *
● Postfix notation for the expression (a*b) - (c+d) is ab* cd+ -
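Postfix is convenient because it can be evaluated with a single stack and no parentheses. The sketch below is an illustration only: eval_postfix and the env dictionary of variable values are names assumed for this example, and tokens are assumed to be separated by spaces.

```python
import operator

# Binary operators supported by this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_postfix(tokens, env):
    """Evaluate a postfix token list; env maps variable names to values."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])  # operand: push its value
    return stack.pop()

env = {"a": 2, "b": 3, "c": 4, "d": 5}
print(eval_postfix("a b + c d + *".split(), env))  # (2+3) * (4+5) = 45
print(eval_postfix("a b * c d + -".split(), env))  # (2*3) - (4+5) = -3
```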
4. Three-Address Code (TAC)
Each instruction has at most three addresses (two operands and one result).
The right-hand side contains at most one operator, so a complex expression
is broken into a sequence of simple instructions that use temporaries.
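As a rough sketch of how an expression is broken into three-address instructions, the Python function below walks an expression tree bottom-up and emits one instruction per operator. The tuple tree format (op, left, right), the function name to_three_address, and the temporary names t1, t2, ... are assumptions made for this illustration.

```python
from itertools import count

def to_three_address(tree):
    """Return a list of 'x = y op z' strings for a nested (op, left, right) tree."""
    code = []
    temps = count(1)  # generates t1, t2, t3, ...

    def visit(node):
        if isinstance(node, str):
            return node  # a variable is already an address
        op, left, right = node
        lhs = visit(left)
        rhs = visit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {lhs} {op} {rhs}")  # exactly one operator per line
        return temp

    visit(tree)
    return code

# a + b * c - d, written as a tree
tree = ("-", ("+", "a", ("*", "b", "c")), "d")
for line in to_three_address(tree):
    print(line)
# t1 = b * c
# t2 = a + t1
# t3 = t2 - d
```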
Quadruple Representation
📌 Format: (op, arg1, arg2, result)
Each instruction is represented by four fields:
● op – operator
● arg1 – first operand
● arg2 – second operand
● result – where to store the result
✅ Example (for: (a + b) * (c - d)):
(+, a, b, t1)
(-, c, d, t2)
(*, t1, t2, t3)
Disadvantage: every intermediate result needs its own temporary variable
(t1, t2, t3), which takes more memory.
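One way to picture a quadruple table is as a list of four-field records that are executed in order, with every result written to a named location. The sketch below assumes a Quad record type and a run_quads helper invented for this example; it is not a real compiler data structure.

```python
import operator
from typing import NamedTuple

class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: str
    result: str

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run_quads(quads, env):
    """Execute quadruples in order; env gives the values of the source variables."""
    values = dict(env)
    for q in quads:
        # The explicit result field names where the value is stored.
        values[q.result] = OPS[q.op](values[q.arg1], values[q.arg2])
    return values

# (a + b) * (c - d)
quads = [
    Quad("+", "a", "b", "t1"),
    Quad("-", "c", "d", "t2"),
    Quad("*", "t1", "t2", "t3"),
]
print(run_quads(quads, {"a": 1, "b": 2, "c": 7, "d": 3})["t3"])  # (1+2) * (7-3) = 12
```

Because results are referred to by name (t1, t2), the quadruples can be moved around without renumbering anything, at the cost of the extra temporaries.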
Triples Representation
📌 Format: (index) = (op, arg1, arg2)
● No explicit result field.
● Each instruction is referred to by its position (index).
● Used to save memory and simplify internal representation.
✅ Example
Expression: (a + b) * (c - d)
Steps:
(0) = (+, a, b) // t1 = a + b
(1) = (-, c, d) // t2 = c - d
(2) = (*, (0), (1)) // t3 = t1 * t2
● Here, the result of each triple is identified implicitly by its index.
● In triple (2), (0) and (1) refer to the results of the earlier triples.
✅ Advantages:
● Memory-efficient (no need for temp variables like t1, t2)
● Compact structure
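The triple form can be sketched the same way as other three-address code, except that an operand may be the index of an earlier triple instead of a temporary name. In this illustration (the names to_triples and format_triple are assumed), a Python int stands for a reference such as (0), and a string stands for a variable.

```python
def to_triples(tree):
    """Build a triple list from a nested (op, left, right) expression tree."""
    triples = []

    def visit(node):
        if isinstance(node, str):
            return node  # plain variable name
        op, left, right = node
        arg1 = visit(left)
        arg2 = visit(right)
        triples.append((op, arg1, arg2))
        return len(triples) - 1  # the position of the triple names its result

    visit(tree)
    return triples

def format_triple(index, triple):
    def show(arg):
        # Integer arguments are references to earlier triples.
        return f"({arg})" if isinstance(arg, int) else arg
    op, arg1, arg2 = triple
    return f"({index}) = ({op}, {show(arg1)}, {show(arg2)})"

triples = to_triples(("*", ("+", "a", "b"), ("-", "c", "d")))
for i, t in enumerate(triples):
    print(format_triple(i, t))
# (0) = (+, a, b)
# (1) = (-, c, d)
# (2) = (*, (0), (1))
```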
Indirect Triples
📌 Concept:
● An improved version of Triples.
● Solves the problem of instruction reordering in Triples: moving a triple
changes its index, so every reference to it would have to be updated.
● Uses a Pointer Table that points to the actual Triples.
🔧 Structure:
1. Triple Table (like normal Triples)
(0) = (+, a, b)
(1) = (-, c, d)
(2) = (*, (0), (1))
2. Pointer Table
[0] → (1) = (-, c, d)
[1] → (0) = (+, a, b)
[2] → (2) = (*, (0), (1))
Here, the execution order is controlled by the Pointer Table, not the index in the
triple table. You can rearrange pointers without changing actual instructions, as
long as each triple still runs after the triples it refers to.
✅ Advantages:
● Allows easy reordering of code (unlike basic Triples).
● Keeps memory efficiency of Triples.
● More modular and flexible.
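The role of the pointer table can be sketched as a list of triple indices that fixes the execution order while the triple table itself stays untouched. The helper run_indirect and the env dictionary are assumptions for this example; as in the triple form, an int argument refers to the result of another triple.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Triple table for (a + b) * (c - d); never modified below.
triples = [("+", "a", "b"), ("-", "c", "d"), ("*", 0, 1)]

def run_indirect(triples, pointers, env):
    """Execute triples in the order given by the pointer table."""
    results = {}  # triple index -> computed value

    def value(arg):
        return results[arg] if isinstance(arg, int) else env[arg]

    for index in pointers:
        op, arg1, arg2 = triples[index]
        results[index] = OPS[op](value(arg1), value(arg2))
    return results[pointers[-1]]

env = {"a": 1, "b": 2, "c": 7, "d": 3}
print(run_indirect(triples, [0, 1, 2], env))  # original order: 12
print(run_indirect(triples, [1, 0, 2], env))  # (0) and (1) swapped: still 12
```

Only the pointer list changes between the two runs. A pointer order is valid only if every triple comes after the triples it refers to; an order such as [2, 0, 1] would fail here because (2) needs the results of (0) and (1).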
Three-Address Instruction Forms
Instruction Type     Format                 Example            Purpose
Binary Operation     x = y op z             t1 = a + b         Arithmetic/logic computation
Simple Assignment    x = y                  a = b              Copy value
Unary Operation      x = op y               t1 = -a            Negation/logical NOT
Address Assignment   x = &y                 ptr = &a           Get address
Pointer Read         x = *y                 val = *ptr         Read from pointer
Pointer Write        *x = y                 *ptr = 10          Write to pointer
Unconditional Jump   goto L1                goto LOOP          Change control flow
Conditional Jump     if x relop y goto L2   if a < b goto L1   If condition true, jump
Label                L1:                    LOOP:              Target for jumps
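To show how the jump and label forms work together with assignments, here is a small Python sketch of an interpreter for a subset of these instructions. The tuple encoding ("copy", "binop", "goto", "if", "label") and the function run are invented for this illustration, and the pointer and address forms are left out. The sample program adds the numbers 1 to n with a loop.

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOP = {"<": operator.lt, "<=": operator.le, "==": operator.eq}

def run(program, env):
    """Interpret a list of three-address instructions encoded as tuples."""
    values = dict(env)
    # Map each label name to its position so jumps can find their target.
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}

    def val(x):
        # Strings are variable names; anything else is a constant.
        return values[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        if kind == "copy":        # x = y
            _, x, y = ins
            values[x] = val(y)
        elif kind == "binop":     # x = y op z
            _, x, y, op, z = ins
            values[x] = ARITH[op](val(y), val(z))
        elif kind == "goto":      # goto L
            pc = labels[ins[1]]
            continue
        elif kind == "if":        # if x relop y goto L
            _, x, relop, y, target = ins
            if RELOP[relop](val(x), val(y)):
                pc = labels[target]
                continue
        # A label does nothing by itself; execution falls through.
        pc += 1
    return values

program = [
    ("copy", "i", 1),                    # i = 1
    ("copy", "sum", 0),                  # sum = 0
    ("label", "LOOP"),                   # LOOP:
    ("if", "n", "<", "i", "DONE"),       # if n < i goto DONE
    ("binop", "sum", "sum", "+", "i"),   # sum = sum + i
    ("binop", "i", "i", "+", 1),         # i = i + 1
    ("goto", "LOOP"),                    # goto LOOP
    ("label", "DONE"),                   # DONE:
]
print(run(program, {"n": 4})["sum"])  # 1 + 2 + 3 + 4 = 10
```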