Compiler Design Chapter-1
The Evolution of Programming Languages
• The first electronic computers appeared in the 1940s and were programmed in
machine language, by sequences of 0s and 1s that explicitly told the computer
what operations to execute and in what order.
• The operations themselves were very low level: move data from one location to
another, add the contents of two registers, compare two values, and so on.
• Needless to say, this kind of programming was slow, tedious, and error-prone.
Once written, the programs were also hard to understand and modify.
• The Move to Higher-level Languages: Today, there are thousands of
programming languages. They can be classified in a variety of ways.
• One classification is by generation.
• First-generation languages are the machine languages,
• second-generation languages are the assembly languages, and
• third-generation languages are the higher-level languages like Fortran, Cobol, Lisp, C, C++, C#, and
Java.
• Fourth-generation languages are languages designed for specific applications, like
NOMAD for report generation, SQL for database queries, and PostScript for text
formatting.
• The term fifth-generation language has been applied to logic- and constraint-based
languages like Prolog and OPS5.
Introduction
Machine Language:
• The only language that is “understood” by a computer.
• Varies from machine to machine.
• The only choice in the 1940s.
Example (machine code for b = a + 2):
0001 01 00 00001111
0011 01 10 00000010
0010 01 00 00010011
Assembly Languages:
• Also known as symbolic languages.
• First developed in the 1950s.
• Easier to read and write.
• Assembler converts to machine code.
• Still different for each type of machine.
Example (assembly code for b = a + 2):
MOV a, R1
ADD #2, R1
MOV R1, b
High-Level Languages:
• Developed in the 1960s and later.
• Much easier to read and write.
• Portable to many different computers.
• Languages include C, Pascal, C++, Java, Perl, etc.
• Still must be converted to machine code!
What is a Compiler?
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
• Before this C program can be run, a compiler must
be used to convert the whole program into
machine code.
• Conversion to machine code occurs some time
before the program is run.
Compiler
• A translator can convert one high-level language (HLL) into another HLL, but a
compiler generally converts an HLL into a low-level language (LLL).
• Only a compiled program can be executed, so every program must first be
converted into an object program.
• Compilers are classified as:
1. Single-pass compilers.
2. Multi-pass compilers.
3. Load-and-go compilers.
4. Debugging or optimizing compilers.
• But the basic tasks that any compiler must perform are essentially the
same.
What is an interpreter?
• Example: Java language processors combine compilation and interpretation. A Java
source program may first be compiled into an intermediate form called bytecodes. The
bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is
that bytecodes compiled on one machine can be interpreted on another machine,
perhaps across a network.
• In order to achieve faster processing of inputs to outputs, some Java compilers, called
just-in-time compilers, translate the bytecodes into machine language immediately
before they run the intermediate program to process the input.
Example:
10 REM A simple BASIC program
20 PRINT "Hello World"
30 GO TO 10
• When this BASIC program is run, the interpreter
converts each line in turn into machine code.
• Conversion to machine code occurs while the
program is running.
Assembler, Compiler and Interpreter
Assembler: It takes source code written in assembly language and converts it to machine language. The resulting
executable is fast and small, but assembly code is tedious to write.
Compiler: It takes source code and converts it into executable code. Some compilers produce a binary file
that must then be linked with other libraries of code before it can execute; others
compile straight to executable code; and still others convert it into a kind of tokenized code (such as bytecode) that still needs to
be partly interpreted by a virtual machine (VM).
Interpreter: It does not compile the whole program in advance. Instead, it typically reads source code statement by statement and
carries out each statement on the fly. Most early forms of BASIC were interpreted languages. Execution is slower than for compiled code.
Language-Processing System (or the Context of a Compiler):
Preprocessor:
• Preprocessors produce input to compilers.
• They may perform the following functions:
1. Macro processing.
2. File inclusion.
3. Rational preprocessors.
4. Language extensions.
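The first two functions, macro processing and file inclusion, can be seen in a short C fragment, because the C preprocessor performs both before the compiler proper runs. The macro names RATE and SQUARE and the functions scaled and squared_sum are hypothetical, chosen only for illustration; the fragment defines functions only, so it can be compiled into any program.

```c
#include <stdio.h>   /* file inclusion: the preprocessor pastes in the text of stdio.h */

/* Macro processing: every later use of RATE is replaced by the text 60 */
#define RATE 60

/* A parameterised macro; the extra parentheses guard against
   operator-precedence surprises after textual substitution. */
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the body reads: return initial + rate * 60; */
int scaled(int initial, int rate)
{
    return initial + rate * RATE;
}

/* After preprocessing: return ((a + b) * (a + b)); */
int squared_sum(int a, int b)
{
    return SQUARE(a + b);
}
```

Running only the preprocessing stage (for example with `cc -E`) shows the expanded text that the compiler actually receives.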
Compiler: Analysis and Synthesis
• Synthesis (the machine-dependent, language-independent phase) takes the tree structure
and translates the operations into the target program.
Analysis of the source program:
• Analysis consists of 3 phases:
1. Linear analysis, in which the stream of characters making
up the source program is read from left to right and
grouped into tokens, which are sequences of characters
having a collective meaning.
2. Hierarchical Analysis in which characters or tokens are
grouped hierarchically into nested collections with
collective meaning.
3. Semantic Analysis in which certain checks are performed
to ensure that the components of a program fit together
meaningfully.
Phases of a Compiler
Source Program
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Target Program
Analysis of the source program:
The analysis part consists of three phases:
• Linear analysis, also called lexical analysis or scanning.
• Hierarchical analysis, also called syntax analysis or parsing.
• Semantic analysis.
The synthesis part consists of three phases:
• Intermediate code generation.
• Code optimization.
• Code generation.
Lexical Analysis
• Also known as linear analysis or scanning.
• The stream of characters is grouped into tokens.
• Examples of tokens are identifiers, reserved words, integers, doubles
or floats, delimiters, operators and special symbols. For example:
int a;
a = a + 2;
Lexical analysis example
position = initial + rate*60;
↓ lexical analysis
id1 = id2 + id3 * 60
where id1, id2 and id3 are the tokens for position, initial and rate.
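A minimal scanner for this statement might look like the C sketch below. It assumes a toy token set (identifiers, integer literals and one-character operators) and numbers identifiers id1, id2 and so on in order of first appearance, using a tiny name table. The names tokenize and lookup_or_insert are hypothetical; a real lexer would also record each token's class and source position.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 32
#define MAX_LEN 32

/* Tiny name table (hypothetical): the i-th distinct identifier becomes id<i>. */
static char names[MAX_IDS][MAX_LEN];
static int n_names;

static int lookup_or_insert(const char *name)
{
    for (int i = 0; i < n_names; i++)
        if (strcmp(names[i], name) == 0)
            return i + 1;
    if (n_names == MAX_IDS)
        return 0;                           /* table full: reported as id0 */
    strcpy(names[n_names], name);           /* name is shorter than MAX_LEN */
    return ++n_names;
}

/* Scan src left to right and write the token stream, separated by spaces, to out. */
void tokenize(const char *src, char *out, size_t outsz)
{
    size_t used = 0;
    out[0] = '\0';
    for (const char *p = src; *p; ) {
        char tok[MAX_LEN + 8];
        if (isspace((unsigned char)*p)) {               /* skip white space */
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p) || *p == '_') {  /* identifier */
            char name[MAX_LEN];
            size_t n = 0;
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (n < MAX_LEN - 1) name[n++] = *p;    /* truncate very long names */
                p++;
            }
            name[n] = '\0';
            snprintf(tok, sizeof tok, "id%d", lookup_or_insert(name));
        } else if (isdigit((unsigned char)*p)) {        /* integer literal */
            size_t n = 0;
            while (isdigit((unsigned char)*p)) {
                if (n < sizeof tok - 1) tok[n++] = *p;
                p++;
            }
            tok[n] = '\0';
        } else {                                        /* one-character operator or delimiter */
            tok[0] = *p++;
            tok[1] = '\0';
        }
        int w = snprintf(out + used, outsz - used, used ? " %s" : "%s", tok);
        if (w < 0 || (size_t)w >= outsz - used)
            return;                                     /* output buffer full */
        used += (size_t)w;
    }
}
```

Called on "position = initial + rate*60;", tokenize produces id1 = id2 + id3 * 60 ; (the final token is the delimiter ;).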
Syntax Analysis or Parsing
• Also called hierarchical analysis.
• Groups tokens into grammatical phrases, often
represented by a parse tree.
• Parsing uses a context-free grammar of valid programming-language
structures to find the structure of the input.
• The result of parsing is usually represented by a syntax tree.
Example of grammar rules:
expression → expression + expression | variable | constant
variable → identifier
constant → intconstant | doubleconstant | …
Syntactic Analysis
id1 = id2 + id3 * 60
↓ syntax analysis
      =
     / \
  id1   +
       / \
    id2   *
         / \
      id3   60
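The grammar rule expression → expression + expression is ambiguous (it does not say whether + or * binds tighter), so the recursive-descent sketch below swaps in the usual unambiguous layering, expr → term { + term } and term → factor { * factor }, which gives * higher precedence than +. It parses the token form of the example statement and prints the syntax tree in prefix form. Node, parse_assignment and to_prefix are hypothetical names, and nodes come from a fixed pool to keep the sketch short.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A syntax-tree node: a leaf (identifier or number) or a binary operator. */
typedef struct Node {
    char text[16];
    struct Node *left, *right;    /* both NULL for a leaf */
} Node;

static Node pool[64];             /* fixed node pool keeps the sketch malloc-free */
static int n_nodes;
static const char *cur;           /* current input position */

static Node *make(const char *text, size_t len, Node *l, Node *r)
{
    if (n_nodes == (int)(sizeof pool / sizeof pool[0]))
        return NULL;              /* pool exhausted: treated as a failure */
    Node *n = &pool[n_nodes++];
    if (len >= sizeof n->text) len = sizeof n->text - 1;
    memcpy(n->text, text, len);
    n->text[len] = '\0';
    n->left = l;
    n->right = r;
    return n;
}

static void skip_spaces(void) { while (isspace((unsigned char)*cur)) cur++; }

/* factor -> identifier | number */
static Node *factor(void)
{
    skip_spaces();
    const char *start = cur;
    while (isalnum((unsigned char)*cur)) cur++;
    return cur == start ? NULL : make(start, (size_t)(cur - start), NULL, NULL);
}

/* term -> factor { '*' factor }; the loop makes '*' left-associative */
static Node *term(void)
{
    Node *n = factor();
    skip_spaces();
    while (n && *cur == '*') {
        cur++;
        Node *r = factor();
        n = r ? make("*", 1, n, r) : NULL;
        skip_spaces();
    }
    return n;
}

/* expr -> term { '+' term } */
static Node *expr(void)
{
    Node *n = term();
    skip_spaces();
    while (n && *cur == '+') {
        cur++;
        Node *r = term();
        n = r ? make("+", 1, n, r) : NULL;
        skip_spaces();
    }
    return n;
}

/* assignment -> operand '=' expr; returns NULL on a syntax error */
Node *parse_assignment(const char *src)
{
    cur = src;
    n_nodes = 0;
    Node *lhs = factor();
    skip_spaces();
    if (!lhs || *cur != '=') return NULL;
    cur++;
    Node *rhs = expr();
    skip_spaces();
    if (!rhs || *cur != '\0') return NULL;   /* leftover input is an error */
    return make("=", 1, lhs, rhs);
}

static void append(char *out, size_t sz, const char *s)
{
    size_t len = strlen(out);
    if (len + 1 < sz) strncat(out, s, sz - len - 1);
}

static void prefix_rec(const Node *n, char *out, size_t sz)
{
    if (!n->left) { append(out, sz, n->text); return; }
    append(out, sz, "(");
    append(out, sz, n->text);
    append(out, sz, " ");
    prefix_rec(n->left, out, sz);
    append(out, sz, " ");
    prefix_rec(n->right, out, sz);
    append(out, sz, ")");
}

/* Write the tree in prefix form, e.g. (= id1 (+ id2 (* id3 60))) */
void to_prefix(const Node *n, char *out, size_t sz)
{
    out[0] = '\0';
    prefix_rec(n, out, sz);
}
```

The parser deliberately accepts any operand on the left of =, such as 60 = id1; rejecting that is left to semantic analysis (the L-value check).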
Semantic Analysis
• Checks the source program for semantic errors (e.g. type errors).
• Uses the hierarchical structure determined by syntax analysis to identify the
operators and operands of expressions and statements.
• The parse tree is checked for constructs that violate the semantic rules of the language.
– Semantic rules may be written with an attribute grammar.
• Examples:
– Using undeclared variables
– Function called with improper arguments
• Number and type of arguments
– Array variables used without array syntax
– Type checking of operator arguments
– Left-hand side of an assignment must be a variable (sometimes called an L-value).
• Suppose all identifiers have been declared to be real.
• The constant 60, however, is an integer on its own.
• When * is applied to the real rate and the integer 60,
the integer is converted into a real during type
checking.
• This is done by inserting the operator int_to_real,
which converts an integer into a real.
• The identifiers grouped as tokens are recorded
in a table called the symbol table.
Type checking
      =
     / \
  id1   +
       / \
    id2   *
         / \
      id3   60
↓ type checking
      =
     / \
  id1   +
       / \
    id2   *
         / \
      id3   int_to_real
                 |
                 60
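The conversion step can be sketched as a bottom-up walk over the syntax tree that computes a type for every node and wraps an integer operand in an int_to_real node whenever it meets a real one. As in the text, every identifier is assumed to be declared real. The tree is built by hand, and the node kinds and function names (example_tree, check, to_prefix) are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_INT, T_REAL } Type;
typedef enum { ID, NUM, BINOP, ASSIGN, INT_TO_REAL } Kind;

typedef struct Node {
    Kind kind;
    char text[16];               /* identifier, literal, or operator symbol */
    Type type;                   /* computed by check() */
    struct Node *left, *right;
} Node;

static Node pool[32];            /* fixed pool; the example needs only 8 nodes */
static int used;

static Node *mk(Kind k, const char *text, Node *l, Node *r)
{
    if (used == 32) abort();     /* out of nodes: stop rather than corrupt memory */
    Node *n = &pool[used++];
    n->kind = k;
    snprintf(n->text, sizeof n->text, "%s", text);
    n->type = T_INT;
    n->left = l;
    n->right = r;
    return n;
}

/* Wrap an integer-valued subtree in a conversion node of type real. */
static Node *to_real(Node *n)
{
    Node *c = mk(INT_TO_REAL, "int_to_real", n, NULL);
    c->type = T_REAL;
    return c;
}

/* Hand-built tree for id1 = id2 + id3 * 60 (this sketch has no parser). */
Node *example_tree(void)
{
    used = 0;
    Node *mul = mk(BINOP, "*", mk(ID, "id3", NULL, NULL), mk(NUM, "60", NULL, NULL));
    Node *add = mk(BINOP, "+", mk(ID, "id2", NULL, NULL), mul);
    return mk(ASSIGN, "=", mk(ID, "id1", NULL, NULL), add);
}

/* Type n bottom-up, inserting int_to_real where an integer meets a real. */
Type check(Node *n)
{
    Type lt, rt;
    switch (n->kind) {
    case ID:  n->type = T_REAL; break;        /* all identifiers declared real */
    case NUM: n->type = T_INT;  break;
    case INT_TO_REAL: check(n->left); n->type = T_REAL; break;
    case BINOP:
        lt = check(n->left);
        rt = check(n->right);
        if (lt == T_INT && rt == T_REAL) n->left = to_real(n->left);
        if (lt == T_REAL && rt == T_INT) n->right = to_real(n->right);
        n->type = (lt == T_REAL || rt == T_REAL) ? T_REAL : T_INT;
        break;
    case ASSIGN:
        /* only the right-hand side is converted; assigning a real to an
           integer would need a diagnostic, which this sketch omits */
        lt = check(n->left);
        rt = check(n->right);
        if (lt == T_REAL && rt == T_INT) n->right = to_real(n->right);
        n->type = lt;
        break;
    }
    return n->type;
}

static void append(char *out, size_t sz, const char *s)
{
    size_t len = strlen(out);
    if (len + 1 < sz) strncat(out, s, sz - len - 1);
}

static void show(const Node *n, char *out, size_t sz)
{
    if (n->kind == ID || n->kind == NUM) { append(out, sz, n->text); return; }
    append(out, sz, "(");
    append(out, sz, n->text);
    append(out, sz, " ");
    show(n->left, out, sz);
    if (n->right) { append(out, sz, " "); show(n->right, out, sz); }
    append(out, sz, ")");
}

/* Prefix form of the tree, e.g. (= id1 (+ id2 (* id3 (int_to_real 60)))) */
void to_prefix(const Node *n, char *out, size_t sz)
{
    out[0] = '\0';
    show(n, out, sz);
}
```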
Intermediate Code Generation:
• The intermediate code generation phase transforms the parse tree
into an intermediate-language representation of the source program.
• The three-address code for the statement position = initial + rate*60 is:
temp1 = int_to_real(60)
temp2 = id3*temp1
temp3 = id2 + temp2
id1 = temp3
• This intermediate form has several properties.
• First, each three-address instruction has at most one operator in addition
to the assignment, so the compiler has to decide the order in which operations are done.
• Second, the compiler must generate a temporary name to hold the value
computed by each instruction.
• Third, some three-address instructions have fewer than three
operands, such as the instructions
temp1 = int_to_real(60) and
id1 = temp3
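Three-address code can be produced by a post-order walk of the type-checked tree: both operands are generated first, then one fresh temporary receives the result of each operator, which fixes the order of operations. The C sketch below assumes a hand-built tree for the example statement, assumes the left side of = is a plain identifier, and uses hypothetical names (Node, three_address_code); it reproduces the four instructions shown above, with spaces around the operators.

```c
#include <stdio.h>
#include <string.h>

/* A tree node: op is NULL for a leaf (identifier or literal). */
typedef struct Node {
    const char *op;                  /* "=", "+", "*", or unary "int_to_real" */
    const char *name;                /* leaf text */
    const struct Node *left, *right;
} Node;

static int n_temps;

static void emit(char *code, size_t sz, const char *line)
{
    size_t len = strlen(code);
    if (len + 1 < sz) strncat(code, line, sz - len - 1);
}

/* Generate code for the operands first, then one instruction whose result
   goes to a fresh temporary. The name holding n's value is written to place. */
static void gen(const Node *n, char *code, size_t sz, char *place, size_t psz)
{
    char l[16], r[16], line[64];
    if (!n->op) {                                  /* leaf: already named */
        snprintf(place, psz, "%s", n->name);
        return;
    }
    if (strcmp(n->op, "=") == 0) {                 /* assignment: no temporary */
        gen(n->right, code, sz, r, sizeof r);
        snprintf(line, sizeof line, "%s = %s\n", n->left->name, r);
        emit(code, sz, line);
        snprintf(place, psz, "%s", n->left->name);
        return;
    }
    gen(n->left, code, sz, l, sizeof l);
    if (n->right)
        gen(n->right, code, sz, r, sizeof r);
    snprintf(place, psz, "temp%d", ++n_temps);     /* fresh temporary */
    if (n->right)
        snprintf(line, sizeof line, "%s = %s %s %s\n", place, l, n->op, r);
    else                                           /* unary, e.g. int_to_real */
        snprintf(line, sizeof line, "%s = %s(%s)\n", place, n->op, l);
    emit(code, sz, line);
}

void three_address_code(const Node *root, char *code, size_t sz)
{
    char place[16];
    code[0] = '\0';
    n_temps = 0;
    gen(root, code, sz, place, sizeof place);
}

/* Hand-built, already type-checked tree for id1 = id2 + id3 * 60. */
static const Node n60  = {NULL, "60", NULL, NULL};
static const Node conv = {"int_to_real", NULL, &n60, NULL};
static const Node id3  = {NULL, "id3", NULL, NULL};
static const Node mul  = {"*", NULL, &id3, &conv};
static const Node id2  = {NULL, "id2", NULL, NULL};
static const Node add  = {"+", NULL, &id2, &mul};
static const Node id1  = {NULL, "id1", NULL, NULL};
const Node example = {"=", NULL, &id1, &add};
```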
Code optimization
• The code optimization phase improves the intermediate code, i.e. it reduces
the code by removing repeated or unnecessary instructions from the
intermediate code.
• The intermediate code above can be optimized to the following
code:
temp1 = id3 * 60.0
id1 = id2 + temp1
• The compiler can deduce that the conversion of 60 from integer to real can
be done once and for all at compile time, so the int_to_real operation can be
eliminated.
• temp3 is used only once, to transmit its value to id1, so the value can be
assigned to id1 directly and temp3 eliminated.
temp1 = int_to_real(60)
temp2 = id3*temp1
temp3 = id2 + temp2
id1 = temp3
↓ optimization
temp1 = id3 * 60.0
id1 = id2 + temp1
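The two improvements described above can be sketched as passes over a list of three-address instructions: folding int_to_real applied to an integer literal into a real constant at compile time, and removing a copy x = t when the temporary t is computed by the immediately preceding live instruction and read nowhere else. The Instr layout and the names optimize and print_code are hypothetical, each temporary is assumed to be assigned only once, and unlike the text the sketch does not rename temp2 to temp1.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One three-address instruction: dst = a op b. */
typedef struct {
    char dst[16];
    char op[16];    /* "+", "*", "int_to_real", or "" for a plain copy dst = a */
    char a[16];
    char b[16];     /* empty for unary operators and copies */
    int dead;       /* set when the optimizer deletes the instruction */
} Instr;

/* Number of operand slots in live instructions that read name. */
static int uses(const Instr *code, int n, const char *name)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (!code[i].dead)
            count += (strcmp(code[i].a, name) == 0) + (strcmp(code[i].b, name) == 0);
    return count;
}

void optimize(Instr *code, int n)
{
    /* Pass 1, compile-time conversion: t = int_to_real(60) is deleted and
       later reads of t use the real constant 60.0 instead. */
    for (int i = 0; i < n; i++) {
        if (code[i].dead || strcmp(code[i].op, "int_to_real") != 0 ||
            !isdigit((unsigned char)code[i].a[0]))
            continue;
        char lit[16];
        snprintf(lit, sizeof lit, "%.13s.0", code[i].a);
        for (int j = i + 1; j < n; j++) {
            if (strcmp(code[j].a, code[i].dst) == 0) strcpy(code[j].a, lit);
            if (strcmp(code[j].b, code[i].dst) == 0) strcpy(code[j].b, lit);
        }
        code[i].dead = 1;
    }
    /* Pass 2, copy elimination: for "x = t" where temporary t is computed by
       the previous live instruction and read nowhere else, compute straight into x. */
    for (int i = 1; i < n; i++) {
        if (code[i].dead || code[i].op[0] != '\0' || strncmp(code[i].a, "temp", 4) != 0)
            continue;
        int j = i - 1;
        while (j >= 0 && code[j].dead) j--;
        if (j >= 0 && strcmp(code[j].dst, code[i].a) == 0 && uses(code, n, code[i].a) == 1) {
            strcpy(code[j].dst, code[i].dst);
            code[i].dead = 1;
        }
    }
}

/* Write the live instructions, one per line, to out. */
void print_code(const Instr *code, int n, char *out, size_t sz)
{
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (code[i].dead) continue;
        char line[80];
        if (code[i].op[0] == '\0')
            snprintf(line, sizeof line, "%s = %s\n", code[i].dst, code[i].a);
        else if (code[i].b[0] == '\0')
            snprintf(line, sizeof line, "%s = %s(%s)\n", code[i].dst, code[i].op, code[i].a);
        else
            snprintf(line, sizeof line, "%s = %s %s %s\n", code[i].dst, code[i].a, code[i].op, code[i].b);
        size_t len = strlen(out);
        if (len + 1 < sz) strncat(out, line, sz - len - 1);
    }
}
```

Applied to the four instructions for the example statement, the passes leave temp2 = id3 * 60.0 followed by id1 = id2 + temp2.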
Code Generation:
Reviewing the Entire Process
position := initial + rate * 60
↓ lexical analyzer
id1 := id2 + id3 * 60
↓ syntax analyzer
      :=
     /  \
  id1    +
        / \
     id2   *
          / \
       id3   60
↓ semantic analyzer
      :=
     /  \
  id1    +
        / \
     id2   *
          / \
       id3   inttoreal
                 |
                 60
↓ intermediate code generator (3 address code)
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
↓ code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
↓ final code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Every phase uses the symbol table (entries for position ...., initial ...., rate ....) and reports errors to the error handler.
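The final step can be sketched as a naive code generator for the optimized code. It assumes the two-address MOVF/MULF/ADDF notation used above, handles only * and +, gives each instruction a fresh register, keeps temporaries in registers and stores other results to memory. Its register numbering therefore differs from the listing above (R1 and R2 are swapped), and the names Instr and generate are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One optimized three-address instruction: dst = a op b (op is "+" or "*"). */
typedef struct { const char *dst, *op, *a, *b; } Instr;

#define MAX_TEMPS 8
static const char *temp_name[MAX_TEMPS];   /* temporaries currently in registers */
static int temp_reg[MAX_TEMPS];
static int n_temps;

/* Literal -> #value, register-held temporary -> Rk, anything else -> memory name. */
static void operand(const char *x, char *out, size_t sz)
{
    if (isdigit((unsigned char)x[0])) {
        snprintf(out, sz, "#%s", x);
        return;
    }
    for (int i = 0; i < n_temps; i++)
        if (strcmp(temp_name[i], x) == 0) {
            snprintf(out, sz, "R%d", temp_reg[i]);
            return;
        }
    snprintf(out, sz, "%s", x);
}

static void emit(char *out, size_t sz, const char *line)
{
    size_t len = strlen(out);
    if (len + 1 < sz) strncat(out, line, sz - len - 1);
}

void generate(const Instr *code, int n, char *out, size_t sz)
{
    char a[24], b[24], line[64];
    out[0] = '\0';
    n_temps = 0;
    for (int i = 0; i < n; i++) {
        int r = i + 1;                          /* naive: a new register per instruction */
        /* only * and + are handled in this sketch */
        const char *mnemonic = strcmp(code[i].op, "*") == 0 ? "MULF" : "ADDF";
        operand(code[i].a, a, sizeof a);
        operand(code[i].b, b, sizeof b);
        snprintf(line, sizeof line, "MOVF %s, R%d\n", a, r);          /* load first operand */
        emit(out, sz, line);
        snprintf(line, sizeof line, "%s %s, R%d\n", mnemonic, b, r);  /* R = R op second operand */
        emit(out, sz, line);
        if (strncmp(code[i].dst, "temp", 4) == 0 && n_temps < MAX_TEMPS) {
            temp_name[n_temps] = code[i].dst;   /* result stays in register r */
            temp_reg[n_temps++] = r;
        } else {                                /* store the result to memory */
            snprintf(line, sizeof line, "MOVF R%d, %s\n", r, code[i].dst);
            emit(out, sz, line);
        }
    }
}
```

For temp1 := id3 * 60.0 and id1 := id2 + temp1 it emits the same five instructions as the listing, with registers R1 and R2 exchanged.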
Symbol Tables
• Symbol-table management (or bookkeeping) is the portion of the compiler
that keeps track of the names used by the program and records
information (attributes) about them.
• The data structure used to record this information is called a symbol
table.
• It contains a record for each identifier, with fields
specifying its attributes.
• Attributes for variables include storage, type, scope, etc.
• Attributes for procedures include name, parameters, etc.
• When the lexical analyzer sees an identifier for the first time, it adds it to
the symbol table.
• Symbol-table management is a part of the compiler that interacts with
several of the phases:
– Identifiers are found in lexical analysis and placed in the symbol table.
– During syntax and semantic analysis, type and scope
information is added.
– During code generation, type information is used to determine which
instructions to use.
– During optimization, liveness information (“live analysis”) may be kept in the symbol
table.
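One common way to implement a symbol table is a hash table with one record per identifier, sketched below in C with separate chaining. The record fields (type, scope), the bucket count and the function names sym_insert, sym_lookup and sym_free are illustrative assumptions, not a prescribed layout. sym_insert behaves as the lexical analyzer needs: it adds a name the first time it is seen and returns the existing record afterwards, so later phases can fill in attributes.

```c
#include <stdlib.h>
#include <string.h>

/* One record per identifier, with a few of the attributes named above. */
typedef struct Symbol {
    char *name;
    const char *type;        /* e.g. "real"; filled in during semantic analysis */
    int scope;               /* nesting level of the declaration */
    struct Symbol *next;     /* next record in the same hash bucket */
} Symbol;

#define BUCKETS 211

typedef struct { Symbol *bucket[BUCKETS]; } SymbolTable;

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

Symbol *sym_lookup(const SymbolTable *t, const char *name)
{
    for (Symbol *s = t->bucket[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

/* Add name the first time it is seen; otherwise return the existing record.
   Returns NULL if memory runs out. */
Symbol *sym_insert(SymbolTable *t, const char *name)
{
    Symbol *s = sym_lookup(t, name);
    if (s) return s;
    s = calloc(1, sizeof *s);            /* type = NULL, scope = 0, next = NULL */
    if (!s) return NULL;
    s->name = malloc(strlen(name) + 1);
    if (!s->name) { free(s); return NULL; }
    strcpy(s->name, name);
    unsigned h = hash(name);
    s->next = t->bucket[h];              /* push onto the front of the chain */
    t->bucket[h] = s;
    return s;
}

void sym_free(SymbolTable *t)
{
    for (int i = 0; i < BUCKETS; i++) {
        Symbol *s = t->bucket[i];
        while (s) {
            Symbol *next = s->next;
            free(s->name);
            free(s);
            s = next;
        }
        t->bucket[i] = NULL;
    }
}
```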
Error Handling
• Error handling and reporting also occur across many phases:
– The lexical analyzer reports invalid character sequences.
– The syntax analyzer reports invalid token sequences.
– The semantic analyzer reports type and scope errors.
• The compiler may be able to continue after some errors, but other
errors may stop the process.
• One of the main functions of a compiler is the detection and reporting of
errors in the source program. Error handling involves:
– Detection
– Recovery
– Repair
– Correction
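Detection and recovery in the lexical analyzer can be sketched as follows: an invalid character is reported with its column, skipped (a simple form of panic-mode recovery), and scanning continues so that later errors in the same input are still found. The accepted character set and the function name scan_with_recovery are assumptions made for this sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Scan src and count its tokens. Each invalid character is detected,
   reported with its 1-based column and skipped, and scanning resumes,
   so several errors can be reported in one run.
   Returns the number of errors found. */
int scan_with_recovery(const char *src, int *n_tokens, char *report, size_t sz)
{
    int errors = 0;
    *n_tokens = 0;
    report[0] = '\0';
    for (size_t i = 0; src[i] != '\0'; ) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) {
            i++;
        } else if (isalnum(c) || c == '_') {            /* identifier or number */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            (*n_tokens)++;
        } else if (strchr("=+-*/;()", c) != NULL) {    /* operator or delimiter */
            i++;
            (*n_tokens)++;
        } else {                                        /* lexical error: detect and report */
            char msg[64];
            snprintf(msg, sizeof msg, "error at column %zu: invalid character '%c'\n", i + 1, c);
            size_t len = strlen(report);
            if (len + 1 < sz) strncat(report, msg, sz - len - 1);
            errors++;
            i++;                                        /* recovery: skip it and go on */
        }
    }
    return errors;
}
```

For the input position = initial $ rate # 60; both bad characters are reported (at columns 20 and 27) and the six valid tokens are still counted.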
Compiler-Construction Tools