
phases of compiler

Uploaded by

21131a4211
Copyright
© All Rights Reserved

UNIT I

Introduction to Compiling
Terminology
• Compiler:
– a program that translates an executable program in one
language into an executable program in another
language
– we expect the program produced by the compiler to be
better, in some way, than the original
• Interpreter:
– a program that reads an executable program and
produces the results of running that program
– usually, this involves executing the source program in
some fashion
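The difference between the two definitions can be sketched on a tiny expression language. This is only an illustration: the nested-tuple program shape and the PUSH/ADD/MUL stack instructions are assumptions made for the example, not part of the slides.

```python
# Programs are nested tuples (op, left, right); leaves are numbers.
# The tuple shape and the PUSH/ADD/MUL instruction names are assumed for illustration.

def interpret(tree):
    """Interpreter: read the program and produce the result of running it."""
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b

def compile_expr(tree):
    """Compiler: translate the program into a program in another language."""
    if isinstance(tree, (int, float)):
        return [f"PUSH {tree}"]
    op, left, right = tree
    opcode = "ADD" if op == "+" else "MUL"
    return compile_expr(left) + compile_expr(right) + [opcode]

program = ("+", 2, ("*", 3, 4))
result = interpret(program)      # a value: 14
target = compile_expr(program)   # a new program for a stack machine
```

The interpreter returns a value, while the compiler returns instructions that still have to be executed by something else.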
Abstract view

[Figure: Source code -> Compiler -> Machine code; the compiler also reports errors]
• Recognize legal (and illegal) programs
• Generate correct code
• Manage storage of all variables and code
• Agree on a format for object (or assembly) code
Front-end, Back-end division

[Figure: Source code -> Front end -> IR -> Back end -> Machine code; both ends report errors]
• Front end maps legal code into IR
• Back end maps IR onto target machine
• Simplify retargeting
• Allows multiple front ends
• Multiple passes -> better code
Front end
[Figure: Source code -> Scanner -> tokens -> Parser -> IR; both stages report errors]

• Recognize legal code


• Report errors
• Produce IR
• Preliminary storage maps
Front end

• Scanner:
– Maps characters into tokens – the basic unit of syntax
• x = x + y becomes <id, x> = <id, x> + <id, y>
– Typical tokens: number, id, +, -, *, /, do, end
– Eliminate white space (tabs, blanks, comments)
• Speed is a key issue, so it is sometimes necessary to write a scanner by hand
instead of using a tool like LEX
Front end
• Parser:
– Recognize context-free syntax
– Guide context-sensitive analysis
– Construct IR
– Produce meaningful error messages
– Attempt error correction
• Parser generators such as YACC automate much of the work
Front end
• Context free grammars are used to represent
programming language syntaxes:

<expr> ::= <expr> <op> <term> | <term>
<term> ::= <number> | <id>
<op> ::= + | -
Front end
• A parser tries to map a
program to the syntactic
elements defined in the
grammar
• A parse can be
represented by a tree
called a parse or syntax
tree
Front end
• A parse tree can be represented more compactly as an
Abstract Syntax Tree (AST)
• The AST is often used as the IR between front end and
back end
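As a rough sketch of how a parser turns tokens into an AST, the function below handles the small expression grammar with <expr>, <term> and <op>. It assumes the scanner has already produced a list of token strings, and the nested-tuple AST shape is a choice made for this example only. The left-recursive rule for <expr> is implemented with a loop, which keeps + and - left-associative.

```python
def parse_expr(tokens):
    """Parse <expr> ::= <expr> <op> <term> | <term> into a nested-tuple AST."""
    pos = 0

    def parse_term():
        # <term> ::= <number> | <id>
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input, expected <number> or <id>")
        tok = tokens[pos]
        if tok.isdigit():
            node = ("number", int(tok))
        elif tok.isidentifier():
            node = ("id", tok)
        else:
            raise SyntaxError(f"expected <number> or <id>, found {tok!r}")
        pos += 1
        return node

    tree = parse_term()
    while pos < len(tokens):
        op = tokens[pos]            # <op> ::= + | -
        if op not in ("+", "-"):
            raise SyntaxError(f"expected + or -, found {op!r}")
        pos += 1
        tree = (op, tree, parse_term())
    return tree

ast = parse_expr(["x", "+", "2", "-", "y"])
```

Each interior node of the result holds an operator and each leaf holds a number or an identifier; the grammar symbols themselves no longer appear, which is what makes the AST more compact than the parse tree.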
Back end
[Figure: IR -> Instruction selection -> Register allocation -> Machine code; both stages report errors]

• Translate IR into target machine code


• Choose instructions for each IR operation
• Decide what to keep in registers at each
point
• Ensure conformance with system interfaces
Back end

• Produce compact fast code


• Use available addressing modes
Back end

• Have a value in a register when used


• Limited resources
• Optimal allocation is difficult
Traditional three pass compiler

[Figure: Source code -> Front end -> IR -> Middle end -> IR -> Back end -> Machine code; all three parts report errors]
• Code improvement analyzes and changes the IR
• Goal is to reduce runtime
Middle end (optimizer)
• Modern optimizers are usually built as a set
of passes
• Typical passes
– Constant propagation
– Common sub-expression elimination
– Redundant store elimination
– Dead code elimination
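Two of these passes can be sketched on straight-line code. The instruction format (dest, op, arg1, arg2), the "const" opcode, and the restriction to + and * are assumptions for this example; a real optimizer also has to deal with branches and loops.

```python
def propagate_constants(code):
    """Constant propagation with folding on straight-line code."""
    known = {}                      # variable name -> known constant value
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)         # replace variables whose value is known
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
        elif isinstance(a, int) and isinstance(b, int):
            known[dest] = a + b if op == "+" else a * b
            op, a, b = "const", known[dest], None
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
        out.append((dest, op, a, b))
    return out

def eliminate_dead_code(code, live_out):
    """Drop instructions whose result is never used (backward liveness scan)."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    return kept[::-1]

code = [
    ("t1", "const", 4, None),
    ("t2", "*", "t1", 2),
    ("t3", "+", "x", "t2"),
    ("t4", "*", "x", "x"),          # result is never used
    ("y", "+", "t3", "t1"),
]
optimized = eliminate_dead_code(propagate_constants(code), live_out={"y"})
```

Running the passes in sequence matters here: once the constants have been propagated, t1 and t2 are no longer read anywhere, so dead code elimination can remove them along with t4.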
The Phases of a Compiler
Phase-1: Lexical Analysis
• The lexical analyzer reads the stream of characters making up the source program and
groups the characters into meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces a token of the form
(token-name, attribute-value), which it passes on to the subsequent phase, syntax analysis.
• Token-name: an abstract symbol that is used during syntax analysis.
• Attribute-value: points to an entry in the symbol table for this token.
Example:
newval := oldval + 12
Tokens:
newval    Identifier
:=        Assignment operator
oldval    Identifier
+         Add operator
12        Number
The lexical analyzer discards white space and comments and reports lexical errors.
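A minimal scanner for this example might look like the sketch below. The token names (id, number, assign, add) and the use of a list index as the symbol-table pointer are assumptions for illustration; only Python's standard re module is used.

```python
import re

# Token classes for the example statement; "skip" matches white space.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("assign", r":="),
    ("add",    r"\+"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table); each token is (token-name, attribute-value)."""
    symbol_table = []    # an identifier's index in this list is its attribute-value
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                 # white space produces no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append((kind, symbol_table.index(lexeme)))
        elif kind == "number":
            tokens.append((kind, int(lexeme)))
        else:
            tokens.append((kind, None))
    return tokens, symbol_table

tokens, table = tokenize("newval := oldval + 12")
```

Operators carry no attribute-value in this sketch because the token-name alone identifies them.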
Phase-2: Syntax Analysis
• Also called parsing.
• The parser uses the first components of the tokens produced by the lexical
analyzer to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream.
• A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation
Phase-3: Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency with
the language definition.
• Gathers type information and saves it in either the syntax tree or the
symbol table, for subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands.
• For example, many programming language definitions require an array
index to be an integer; the compiler must report an error if a floating-point
number is used to index an array.
• Example: newval := oldval+12
The type of the identifier newval must match with the type of expression (oldval+12).
Example: Semantic analysis
• Syntactically correct, but semantically incorrect
• Example: sum = a + b;

Semantic records:
Declaration    Name   Type
int a;         a      integer
double sum;    sum    double
char b;        b      char
Result: data type mismatch
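The check in this example can be sketched as a lookup in the semantic records. The function name and the strict rule used here (both operands and the target must have the same declared type) are assumptions that mirror the slide; a language such as C would instead apply implicit conversions to these types.

```python
def check_assignment(target, left, right, declared):
    """Return the semantic errors found in `target = left + right`."""
    errors = [f"undeclared identifier: {name}"
              for name in (target, left, right) if name not in declared]
    if errors:
        return errors
    if declared[left] != declared[right]:
        errors.append(
            f"data type mismatch: {left} is {declared[left]}, "
            f"{right} is {declared[right]}")
    elif declared[target] != declared[left]:
        errors.append(
            f"data type mismatch: {target} is {declared[target]}, "
            f"expression is {declared[left]}")
    return errors

# Semantic records gathered from the declarations: int a; double sum; char b;
declared = {"a": "int", "sum": "double", "b": "char"}
errors = check_assignment("sum", "a", "b", declared)
```

The statement parses without trouble, so the error can only be found at this stage, once the types recorded for a, b and sum are compared.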
Phase-4: Intermediate Code Generation
After syntax and semantic analysis of the source program, many compilers generate
an explicit low-level or machine-like intermediate representation (a program for an
abstract machine). This intermediate representation should have two important
properties:
• it should be easy to produce and
• it should be easy to translate into the target machine.
The intermediate form considered here is called three-address code, which consists of a
sequence of assembly-like instructions with at most three operands per instruction. Each
operand can act like a register.
This phase bridges the analysis and synthesis phases of translation.
Example:
newval := oldval + fact * 1

Id1 := Id2 + Id3 * 1

Temp1 = inttoreal(1)
Temp2 = Id3 * Temp1
Temp3 = Id2 + Temp2
Id1 = Temp3
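One way to produce such code is a post-order walk over the syntax tree that introduces a fresh temporary for every operator. The sketch below assumes a nested-tuple tree and leaves out the int-to-real conversion step of the example, so its output is one instruction shorter.

```python
import itertools

def generate_tac(target, expr):
    """Translate `target := expr` into a list of three-address instructions."""
    counter = itertools.count(1)    # numbering for fresh temporaries
    code = []

    def emit(node):
        if isinstance(node, str):   # leaf: identifier or constant
            return node
        op, left, right = node
        l = emit(left)              # operands are computed before the operator
        r = emit(right)
        temp = f"Temp{next(counter)}"
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = emit(expr)
    code.append(f"{target} = {result}")
    return code

# Id1 := Id2 + Id3 * 1
tac = generate_tac("Id1", ("+", "Id2", ("*", "Id3", "1")))
```

For the statement above this yields Temp1 = Id3 * 1, then Temp2 = Id2 + Temp1, then Id1 = Temp2.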
Phase-5: Code Optimization
• The compiler looks at large segments of the program to decide how to
improve performance
• The machine-independent code-optimization phase attempts to improve the
intermediate code so that better target code will result.
• Usually better means faster or shorter code, or target code that consumes less power.
• There are simple optimizations that significantly improve the running time of the
target program without slowing down compilation too much.
• Optimization cannot make an inefficient algorithm efficient; it “only makes
an efficient algorithm more efficient”
Example:
• The above intermediate code will be
optimized as:

Temp1 = Id3 * 1
Id1 = Id2 + Temp1
Phase-6: Code Generation
• The last phase of translation is code generation.
• Takes as input an intermediate representation of the source program and maps it into
the target language
• If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into
sequences of machine instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of registers
to hold variables.
Example:
Id1 := Id2 + Id3 * 1

MOV R1, Id3
MUL R1, #1
MOV R2, Id2
ADD R1, R2
MOV Id1, R1
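A deliberately naive code generator for this kind of target can be sketched as follows. It assumes three-address instructions given as (dest, left, op, right) tuples and the destination-first MOV/MUL/ADD convention of the example. It uses a single register and stores every result back to memory, so it emits more instructions than the hand-written sequence, which is the gap that register assignment is meant to close.

```python
OPCODES = {"+": "ADD", "*": "MUL"}   # only the operators used in the example

def operand(x):
    """Write integer constants as immediates (#n) and names as they are."""
    return f"#{x}" if x.isdigit() else x

def generate(tac):
    """Translate (dest, left, op, right) instructions into assembly lines."""
    asm = []
    for dest, left, op, right in tac:
        asm.append(f"MOV R1, {operand(left)}")               # load first operand
        asm.append(f"{OPCODES[op]} R1, {operand(right)}")    # apply the operation
        asm.append(f"MOV {dest}, R1")                        # store the result
    return asm

asm = generate([
    ("Temp1", "Id3", "*", "1"),
    ("Id1", "Id2", "+", "Temp1"),
])
```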
Symbol-Table Management
• The symbol table is a data structure containing a record for each variable name, with
fields for the attributes of the name.
• The data structure should be designed to allow the compiler to find the record for each
name quickly and to store or retrieve data from that record quickly
• These attributes may provide information about the storage allocated for a name, its type,
its scope (where in the program its value may be used), and in the case of procedure
names, such things as the number and types of its arguments, the method of passing each
argument (for example, by value or by reference), and the type returned.

newval    Id1 & attributes
oldval    Id2 & attributes
fact      Id3 & attributes
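A symbol table with fast lookup is often built on hash tables. The sketch below uses one Python dict per scope; the class name, the method names and the attribute values (type "real", scope "inner") are assumptions for illustration, since the slides do not give the declared types.

```python
class SymbolTable:
    """One record per name, with nested scopes (innermost scope is last)."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        """Return the record of the innermost visible declaration, or None."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
for name, ident in (("newval", "Id1"), ("oldval", "Id2"), ("fact", "Id3")):
    table.declare(name, id=ident, type="real")   # the type is assumed

table.enter_scope()
table.declare("fact", id="Id4", type="int")      # shadows the outer fact
inner = table.lookup("fact")
table.exit_scope()
outer = table.lookup("fact")
```

Both declare and lookup are average constant-time dictionary operations per scope, which meets the requirement that records be found and updated quickly.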
Error Handling Routine:
• One of the most important functions of a compiler is the detection and
reporting of errors in the source program. The error message should allow
the programmer to determine exactly where the errors have occurred.
Errors may occur in any or all of the phases of a compiler.
• Whenever a phase of the compiler discovers an error, it must report the
error to the error handler, which issues an appropriate diagnostic message.
Both the table-management and error-handling routines interact with all
phases of the compiler.
The Phases of a Compiler

Phase: Programmer (source code producer)
Output: Source string
Sample: A=B+C;

Phase: Scanner (performs lexical analysis)
Output: Token string and symbol table with names
Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’

Phase: Parser (performs syntax analysis based on the grammar of the programming language)
Output: Parse tree or abstract syntax tree
Sample:
    ;
    |
    =
   / \
  A   +
     / \
    B   C

Phase: Semantic analyzer (type checking, etc)
Output: Annotated parse tree or abstract syntax tree

Phase: Intermediate code generator
Output: Three-address code, quads, or RTL
Sample:
  int2fp B t1
  + t1 C t2
  := t2 A

Phase: Optimizer
Output: Three-address code, quads, or RTL
Sample:
  int2fp B t1
  + t1 #2.3 A

Phase: Code generator
Output: Assembly code
Sample:
  MOVF #2.3,r1
  ADDF2 r1,r2
  MOVF r2,A

Phase: Peephole optimizer
Output: Assembly code
Sample:
  ADDF2 #2.3,r2
  MOVF r2,A
Preprocessors, Compilers,
Assemblers, and Linkers

[Figure: Skeletal Source Program -> Preprocessor -> Source Program -> Compiler -> Target Assembly Program -> Assembler -> Relocatable Object Code -> Linker (together with Libraries and Relocatable Object Files) -> Absolute Machine Code]

Try for example:
gcc -v myprog.c
Context of a Compiler
• The programs that help the compiler convert a skeletal source program
into executable form make up the context of a compiler. They are as
follows:
• Preprocessor:
The preprocessor scans the source code and includes the header files, which
contain relevant information for various functions.
• Compiler:
The compiler passes the source code through various phases and generates the
target assembly code.
• Assembler:
The assembler converts the assembly code into relocatable machine code (object code).
Although this code is in binary form, it cannot be executed yet because it has not
been assigned actual memory addresses.
• Loader/Link Editor:
It performs two functions. Loading consists of taking relocatable machine code, altering the
relocatable addresses, and placing the altered instructions and data in memory at the proper
locations.
The link editor makes a single program from several files of relocatable machine code,
including the library files that the program needs.
The loader/link editor produces the executable or absolute machine code.
