
CHAPTER-1

COMPILER DESIGN
The Evolution of Programming Languages

• The first electronic computers appeared in the 1940's and were programmed in
machine language by sequences of 0's and 1's that explicitly told the computer
what operations to execute and in what order.

• The operations themselves were very low level: move data from one location to
another, add the contents of two registers, compare two values, and so on.

• Needless to say, this kind of programming was slow, tedious, and error prone. And
once written, the programs were hard to understand and modify.

Compiler Design
The Evolution of Programming Languages
• The Move to Higher-level Languages : Today, there are thousands of
programming languages. They can be classified in a variety of ways.
• One classification is by generation.
• First-generation languages are the machine languages,
• second-generation the assembly languages, and
• third-generation the higher-level languages like Fortran, Cobol, Lisp, C, C++, C#, and
Java.
• Fourth-generation languages are languages designed for specific applications like
NOMAD for report generation, SQL for database queries, and Postscript for text
formatting.
• The term fifth-generation language has been applied to logic- and constraint-based
languages like Prolog and OPS5.

Compiler Design
The Evolution of Programming Languages

• An object-oriented language is one that supports object-oriented
programming, a programming style in which a program consists of a collection
of objects that interact with one another.
• Simula 67 and Smalltalk are the earliest major object-oriented languages.
• Languages such as C++, C#, Java, and Ruby are more recent object-oriented languages.

• Scripting languages are interpreted languages with high-level operators
designed for "gluing together" computations. These computations were
originally called "scripts."
• Awk, JavaScript, Perl, PHP, Python, and Ruby are popular examples of scripting languages.
Programs written in scripting languages are often much shorter than equivalent
programs written in languages like C.

Compiler Design
Introduction
Machine Language:
• The only language that is “understood” by a computer.
• Varies from machine to machine.
• The only choice in the 1940s.

0001 01 00 00001111
0011 01 10 00000010
0010 01 00 00010011
(machine code for b = a + 2)

Compiler Design
Introduction
Assembly Languages:
• Also known as symbolic languages.
• First developed in the 1950s.
• Easier to read and write.
• Assembler converts to machine code.
• Still different for each type of machine.
MOV a, R1
ADD #2, R1
MOV R1, b
(assembly code for b = a + 2)

Compiler Design
Introduction
High-Level Languages:
• Developed in 1960s and later.
• Much easier to read and write.
• Portable to many different computers.
• Languages include C, Pascal, C++, Java, Perl, etc.
• Still must be converted to machine code!

Compiler Design
Introduction
What is a Compiler?

• A compiler acts as a translator, transforming human-oriented programming
languages into computer-oriented machine language.
• A compiler is a program that reads a program written in one language, called the
source language, and translates it into an equivalent program in another language,
called the target language.
• The target program is then run on its input to produce output.
• In short, a compiler is a program that translates an executable program in one
language into an executable program in another language.

Compiler Design
Introduction
#include <stdio.h>
int main()
{
    printf("Hello World");
    return 0;
}
• Before this C program can be run, a compiler must
be used to convert the whole program into
machine code.
• Conversion to machine code occurs some time
before the program is run.

Compiler Design
Introduction

Examples of tasks performed by a compiler

1. Add two numbers
2. Move numbers from one location to another
3. Move information between the CPU and memory

Examples of software translators
• Modula-2 to C
• Java to byte codes

Compiler Design
Compiler
• A translator can also convert one high-level language (HLL) into another HLL, but a
compiler generally converts an HLL into a low-level language (LLL) only.
• Only a compiled program is ready for execution, so a source program must first be
converted into an object program.
• Compilers are classified as:
1. Single-pass compilers
2. Multi-pass compilers
3. Load-and-go compilers
4. Debugging or optimizing compilers
• But the basic tasks that any compiler must perform are essentially the
same.

Compiler Design
Introduction
What is an interpreter?

• An interpreter is another common kind of language processor. Instead of
producing a target program as a translation, an interpreter appears to
directly execute the operations specified in the source program on inputs
supplied by the user.

• The machine-language target program produced by a compiler is usually
much faster than an interpreter at mapping inputs to outputs. An
interpreter, however, can usually give better error diagnostics than a
compiler, because it executes the source program statement by statement.

Compiler Design
Introduction
• Example : Java language processors combine compilation and interpretation. A Java
source program may first be compiled into an intermediate form called byte codes. The
byte codes are then interpreted by a virtual machine. A benefit of this arrangement is
that byte codes compiled on one machine can be interpreted on another machine,
perhaps across a network.
• In order to achieve faster processing of inputs to outputs, some Java compilers, called
just-in-time compilers, translate the byte codes into machine language immediately
before they run the intermediate program to process the input.

Compiler Design
Introduction
Example:
10 REM A simple BASIC program
20 PRINT "Hello World"
30 GO TO 10
• When this BASIC program is run, the interpreter
converts each line in turn into machine code.
• Conversion to machine code occurs while the
program is running.

Compiler Design
Assembler , Compiler and Interpreter
Assembler: It takes source code written in assembly language and converts it to machine language. The executable
is fast and small, but assembly is tedious to write.

Compiler: It takes source code and converts it into executable code. Some compilers produce a binary file
that must then be linked with several other libraries of code before it can execute; some other compilers can
compile straight to executable code; and others convert the source to a sort of tokenized code that still needs to
be semi-interpreted by a virtual machine (VM).

Interpreter: It does not compile code. Instead, it typically reads the source code statement by statement and produces and
executes machine instructions on the fly. Most early forms of BASIC were interpreted languages. Execution is slow.

Compiler Design
Introduction
Language-Processing System (the context of a compiler):

Preprocessor:
• Preprocessors produce input to compilers.
• They may perform the following functions
1. Macro processing
2. File inclusion.
3. Rational preprocessors.
4. Language Extensions.
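
For instance, the first two of these functions are exactly what the C preprocessor performs. The small, self-contained sketch below illustrates them; the SQUARE macro is just an example written for this slide, not part of any standard library.

/* The C preprocessor performs file inclusion and macro processing
 * before the compiler proper ever sees the program. */
#include <stdio.h>               /* file inclusion: the text of stdio.h is inserted here */
#define SQUARE(x) ((x) * (x))    /* macro processing: SQUARE(n) is expanded textually */

int main(void) {
    int n = 5;
    printf("%d\n", SQUARE(n));   /* the compiler proper sees printf("%d\n", ((n) * (n))); */
    return 0;
}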

Compiler Design
Introduction

• Any compiler must perform two major tasks:
1. Analysis of the source program
2. Synthesis of a machine-language program
Compiler Design
Introduction

Analysis-Synthesis Model of Compilation:

• There are two parts to compilation:
• Analysis (the machine-independent, language-dependent phase) determines the
operations implied by the source program, which are recorded in a tree structure.
• Synthesis (the machine-dependent, language-independent phase) takes the tree
structure and translates the operations into the target program.

Source Program → Analysis → Intermediate Code → Synthesis → Target Program

Compiler Design
Introduction
Analysis of the source program:
• Analysis consists of 3 phases:
1. Linear analysis, in which the stream of characters making
up the source program is read from left to right and
grouped into tokens, which are sequences of characters
having a collective meaning.
2. Hierarchical Analysis in which characters or tokens are
grouped hierarchically into nested collections with
collective meaning.
3. Semantic Analysis in which certain checks are performed
to ensure that the components of a program fit together
meaningfully.

Compiler Design
Phases of a Compiler
Source Program
↓
1. Lexical Analyzer
↓
2. Syntax Analyzer
↓
3. Semantic Analyzer
↓
4. Intermediate Code Generator
↓
5. Code Optimizer
↓
6. Code Generator
↓
Target Program

The Symbol-Table Manager and the Error Handler interact with all six phases.
Compiler Design
Phases of a Compiler
Analysis of the source program:
The Analysis Phase consists of 3 phases:
• Linear analysis or Lexical analysis or Scanning.
• Hierarchical analysis or Syntax analysis or parsing.
• Semantic analysis.
The Synthesis Phase consists of 3 phases:
• Intermediate code Generation.
• Code Optimization.
• Code Generation.

Compiler Design
Lexical Analysis
• Stream of characters is grouped into tokens.
• Also known as linear analysis or scanning.
• Groups input into tokens
• Examples of tokens are identifiers, reserved words, integers, doubles
or floats, delimiters, operators and special symbols

int a;
a = a + 2;

int   reserved word
a     identifier
;     special symbol
a     identifier
=     operator
a     identifier
+     operator
2     integer constant
;     special symbol
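
To make this concrete, here is a minimal, self-contained sketch of how a scanner might split a = a + 2; into tokens. It is only an illustration: the token classes and the helper next_token are assumptions made for this example, not the scanner of any real compiler (for instance, keywords such as int are not separated from identifiers here).

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token classes for this tiny example. */
typedef enum { TOK_ID, TOK_INT, TOK_OP, TOK_SYM, TOK_END } TokenKind;

/* Read one token starting at *p, copy its lexeme into buf, return its class. */
static TokenKind next_token(const char **p, char *buf) {
    while (isspace((unsigned char)**p)) (*p)++;          /* skip blanks */
    if (**p == '\0') return TOK_END;
    if (isalpha((unsigned char)**p)) {                    /* identifier (or keyword) */
        int n = 0;
        while (isalnum((unsigned char)**p)) buf[n++] = *(*p)++;
        buf[n] = '\0';
        return TOK_ID;
    }
    if (isdigit((unsigned char)**p)) {                    /* integer constant */
        int n = 0;
        while (isdigit((unsigned char)**p)) buf[n++] = *(*p)++;
        buf[n] = '\0';
        return TOK_INT;
    }
    buf[0] = *(*p)++;                                     /* one-character token */
    buf[1] = '\0';
    return strchr("=+-*/", buf[0]) ? TOK_OP : TOK_SYM;
}

int main(void) {
    const char *src = "a = a + 2;";
    const char *class_name[] = { "identifier", "integer constant", "operator", "special symbol" };
    char lexeme[64];
    TokenKind kind;
    while ((kind = next_token(&src, lexeme)) != TOK_END)
        printf("%-8s %s\n", lexeme, class_name[kind]);
    return 0;
}

Running it prints each lexeme together with its token class, mirroring the table above.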
Compiler Design
Lexical Analysis
• Example: Position := initial + rate * 60
1. The identifier Position.
2. The assignment symbol :=.
3. The identifier initial.
4. The plus sign +.
5. The identifier rate
6. The Multiplication sign *.
7. The number 60.
• Blanks are typically ignored.
• The character sequence forming a token is called the "lexeme"
for the token.

Compiler Design
Lexical analysis
position = initial + rate*60;

Lexical analysis

id1 = id2 + id3*60;

where id1, id2, and id3 are tokens for position, initial, and rate.

Examples of tokens: variable names, keywords, and numbers.

Compiler Design
Syntax Analysis or Parsing
• Also called hierarchical analysis or parsing.
• Groups tokens into grammatical phrases, often
represented by a parse tree.
• Parsing uses a context-free grammar of valid programming
language structures to find the structure of the input.
• Result of parsing usually represented by a syntax tree.
Example of grammar rules:
expression → expression + expression | variable | constant
variable → identifier
constant → intconstant | doubleconstant | …
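
As an illustration, the following minimal sketch parses expressions over single-letter identifiers and integer constants with a recursive-descent parser. It uses the usual layered rewriting of the grammar (expression → term { + term }, term → factor { * factor }) so that * binds tighter than +; all function names are hypothetical and this is not the parser of any particular compiler.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;            /* current position in the input string */

static void expression(void);
static void term(void);
static void factor(void);

/* expression -> term { '+' term } */
static void expression(void) {
    term();
    while (*p == '+') { p++; term(); printf("apply +\n"); }
}

/* term -> factor { '*' factor } */
static void term(void) {
    factor();
    while (*p == '*') { p++; factor(); printf("apply *\n"); }
}

/* factor -> identifier | intconstant */
static void factor(void) {
    if (isalpha((unsigned char)*p)) {
        printf("variable %c\n", *p++);
    } else if (isdigit((unsigned char)*p)) {
        printf("constant ");
        while (isdigit((unsigned char)*p)) putchar(*p++);
        putchar('\n');
    } else {
        fprintf(stderr, "syntax error at '%c'\n", *p);
        exit(1);
    }
}

int main(void) {
    p = "a+b*60";                /* parse a small expression */
    expression();
    return *p == '\0' ? 0 : 1;   /* succeed only if all input was consumed */
}

For the input a+b*60 it reports the variables and constants it finds and the order in which the operators are applied, which corresponds to visiting the parse tree bottom-up.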

Compiler Design
Syntactic Analysis

id1 = id2 + id3*60;

Syntax analysis

id1 +

id2 *

id3 60
Compiler Design
Semantic Analysis
• Checks source program for semantic errors (e.g. type checking).
• Uses hierarchical structure determined by syntactic analysis to determine
operators and operands.
• The parse tree is checked for constructs that violate the semantic rules of the language
– Semantic rules may be written with an attribute grammar
• Examples:
– Using undeclared variables
– Function called with improper arguments
• Number and type of arguments
– Array variables used without array syntax
– Type checking of operator arguments
– Left hand side of an assignment must be a variable (sometimes called an L-
value).

Compiler Design
• Suppose all identifiers have been declared to be real.
• But the constant 60 is an integer.
• When * is applied to the real rate and the integer 60, the
integer is converted into a real during the type
checking process.
• This is achieved by applying the operator int_to_real,
which converts an integer into a real.
• The lexemes grouped into tokens are recorded
in a table called the symbol table.
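
A minimal sketch of this kind of check is shown below. The two-type system (integer and real) and the function check_binary are illustrative assumptions, not the type checker of any particular compiler.

#include <stdio.h>

typedef enum { TYPE_INT, TYPE_REAL } Type;

/* Decide the result type of a binary operator over two operands and report
 * whether an int_to_real conversion has to be inserted for either operand. */
static Type check_binary(Type left, Type right) {
    if (left == right) return left;
    if (left == TYPE_INT)  printf("insert int_to_real() around the left operand\n");
    if (right == TYPE_INT) printf("insert int_to_real() around the right operand\n");
    return TYPE_REAL;                /* mixed integer/real arithmetic is done in real */
}

int main(void) {
    /* rate is declared real, 60 is an integer constant:  rate * 60 */
    Type result = check_binary(TYPE_REAL, TYPE_INT);
    printf("result type: %s\n", result == TYPE_REAL ? "real" : "integer");
    return 0;
}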

Compiler Design
Type checking

Before type checking:

        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   60

After type checking:

        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   int_to_real
                   |
                   60

Compiler Design
Intermediate Code Generation:

• After syntax and semantic analysis, some compilers generate an explicit
intermediate representation of the source program.

• This representation should be easy to produce and easy to translate into
the target program.

• Thus the intermediate code generation phase transforms the parse tree
into an intermediate-language representation of the source program.

• One popular form of intermediate language is called "three-address
code", which resembles assembly language.

Compiler Design
Intermediate Code Generation
• The three-address code for the statement:
temp1 = int_to_real(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
• The intermediate form has several properties.
• First, each three-address instruction has at most one operator in addition
to the assignment, so the compiler has to decide the order in which the
operations are to be done.
• Second, the compiler must generate a temporary name to hold the value
computed by each instruction.
• Third, some three-address instructions may have fewer than three
operands, such as the instructions
temp1 = int_to_real(60) and
id1 = temp3
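
One common way such instructions might be held inside a compiler is as "quadruples" (operator, two arguments, result). The sketch below is illustrative only: the Quad layout and field names are assumptions made for this example, and the printing loop simply reproduces the four instructions above.

#include <stdio.h>

/* A three-address instruction stored as a "quadruple":
 * result = arg1 op arg2   (arg2 is empty for one-operand instructions). */
typedef struct {
    const char *op;        /* "int_to_real", "*", "+", or "=" (a plain copy) */
    const char *arg1;
    const char *arg2;
    const char *result;
} Quad;

int main(void) {
    /* Intermediate code for:  id1 = id2 + id3 * 60  */
    Quad code[] = {
        { "int_to_real", "60",    "",      "temp1" },
        { "*",           "id3",   "temp1", "temp2" },
        { "+",           "id2",   "temp2", "temp3" },
        { "=",           "temp3", "",      "id1"   },
    };
    int n = sizeof code / sizeof code[0];
    for (int i = 0; i < n; i++) {
        if (code[i].arg2[0])                       /* two operands */
            printf("%s = %s %s %s\n", code[i].result, code[i].arg1, code[i].op, code[i].arg2);
        else if (code[i].op[0] == '=')             /* plain copy */
            printf("%s = %s\n", code[i].result, code[i].arg1);
        else                                       /* one-operand operator */
            printf("%s = %s(%s)\n", code[i].result, code[i].op, code[i].arg1);
    }
    return 0;
}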
Compiler Design
Code optimization
• The code optimization phase improves the intermediate code, i.e. it reduces
the code by removing repeated or unwanted instructions from the
intermediate code.
• The intermediate code given above can be optimized to the following
code:
temp1 = id3 * 60.0
id1 = id2 + temp1
• The compiler can deduce that the conversion of 60 from integer to real can
be done once and for all at compile time, so the int_to_real operation can be
eliminated.
• temp3 is used only to transmit its value to id1, so id2 + temp1 can be
assigned directly to id1.
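
The following self-contained sketch performs exactly these two improvements (folding int_to_real of a constant at compile time, then propagating the final copy) on the quadruple list from the previous slide. It is a toy illustration under assumed data structures, not a general optimizer; note that the surviving temporary keeps the name temp2 here, whereas the slide renames it temp1.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Quadruple with modifiable fields (illustrative layout). */
typedef struct { char op[16], arg1[16], arg2[16], result[16]; } Quad;

static int is_integer(const char *s) {
    if (!*s) return 0;
    for (; *s; s++) if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}

int main(void) {
    Quad code[] = {
        { "int_to_real", "60",    "",      "temp1" },
        { "*",           "id3",   "temp1", "temp2" },
        { "+",           "id2",   "temp2", "temp3" },
        { "=",           "temp3", "",      "id1"   },
    };
    int n = sizeof code / sizeof code[0];

    /* 1. Constant folding: int_to_real of an integer constant can be done at
     *    compile time, so replace later uses of temp1 by 60.0 and delete it. */
    for (int i = 0; i < n; i++) {
        if (strcmp(code[i].op, "int_to_real") == 0 && is_integer(code[i].arg1)) {
            char folded[20];
            snprintf(folded, sizeof folded, "%s.0", code[i].arg1);
            for (int j = i + 1; j < n; j++) {
                if (strcmp(code[j].arg1, code[i].result) == 0) strcpy(code[j].arg1, folded);
                if (strcmp(code[j].arg2, code[i].result) == 0) strcpy(code[j].arg2, folded);
            }
            code[i].op[0] = '\0';                  /* mark the instruction as deleted */
        }
    }

    /* 2. Copy propagation: temp3 only feeds the final copy "id1 = temp3",
     *    so write the previous result directly into id1 and drop the copy. */
    if (strcmp(code[n - 1].op, "=") == 0 && strcmp(code[n - 2].result, code[n - 1].arg1) == 0) {
        strcpy(code[n - 2].result, code[n - 1].result);
        code[n - 1].op[0] = '\0';
    }

    for (int i = 0; i < n; i++) {
        if (!code[i].op[0]) continue;              /* skip deleted instructions */
        if (code[i].arg2[0])
            printf("%s = %s %s %s\n", code[i].result, code[i].arg1, code[i].op, code[i].arg2);
        else
            printf("%s = %s\n", code[i].result, code[i].arg1);
    }
    return 0;
}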
Compiler Design
Code optimization

temp1 = int_to_real(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

Optimization

temp1 = id3 * 60.0
id1 = id2 + temp1

Compiler Design
Code Generation:

• The code generation phase converts the intermediate code into target code
consisting of a sequence of machine-code or assembly instructions that perform the
same task.
• The optimized code above can be written using registers R1 and R2, as
given below:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
• The first and second operands of each instruction specify the source and destination.
• The F in each instruction tells us that the instruction deals with floating-point
numbers.
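
A minimal sketch of how a code generator might emit exactly the five instructions above from the optimized three-address code is given below. The quadruple layout, the tiny register descriptor, and the fixed register choice are illustrative assumptions; only the instruction names MOVF, MULF, and ADDF come from the example itself.

#include <stdio.h>
#include <string.h>

/* Quadruple: result := arg1 op arg2 (illustrative layout, as before). */
typedef struct { const char *op, *arg1, *arg2, *result; } Quad;

/* Very small "register descriptor": remembers which value, if any,
 * currently lives in R1 and R2. */
static const char *in_reg[3] = { "", "", "" };   /* indices 1 and 2 are used */

static int find_reg(const char *name) {
    for (int r = 1; r <= 2; r++)
        if (strcmp(in_reg[r], name) == 0) return r;
    return 0;
}

int main(void) {
    /* Optimized intermediate code for:  id1 = id2 + id3 * 60  */
    Quad code[] = {
        { "*", "id3", "60.0",  "temp1" },
        { "+", "id2", "temp1", "id1"   },
    };
    int next = 2;                                 /* hand out R2, then R1 */
    for (int i = 0; i < 2; i++) {
        int r = next--;
        printf("MOVF %s, R%d\n", code[i].arg1, r);
        int r2 = find_reg(code[i].arg2);          /* is arg2 already in a register? */
        if (strcmp(code[i].op, "*") == 0)
            printf("MULF #%s, R%d\n", code[i].arg2, r);   /* constant emitted as an immediate */
        else if (r2)
            printf("ADDF R%d, R%d\n", r2, r);
        else
            printf("ADDF %s, R%d\n", code[i].arg2, r);
        in_reg[r] = code[i].result;               /* the result now lives in Rr */
    }
    /* Store the final result (id1) from the register that holds it. */
    printf("MOVF R%d, id1\n", find_reg("id1"));
    return 0;
}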
Compiler Design
Code Generation:

• The code moves the contents of id3 to register R2.
• It is multiplied by the real constant 60.0.
• The third instruction moves id2 into register R1, and the fourth adds to it the
value in register R2.
• Finally, the value in register R1 is moved into the address of id1.

Compiler Design
Reviewing the Entire Process

position := initial + rate * 60

    | lexical analyzer
    v
id1 := id2 + id3 * 60

    | syntax analyzer
    v
        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   60

    | semantic analyzer
    v
        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   int_to_real
                    |
                    60

    | intermediate code generator
    v
temp1 := int_to_real(60)
temp2 := id3 * temp1          (three-address code)
temp3 := id2 + temp2
id1 := temp3

    | code optimizer
    v
temp1 := id3 * 60.0
id1 := id2 + temp1

    | final code generator
    v
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Throughout these phases the symbol table (position, initial, rate, …) and the error handler are consulted.

Compiler Design
Symbol Tables
• Symbol-table management (book-keeping) is the portion of the compiler
that keeps track of the names used by the program and records
information about them (their attributes).
• The data structure used to record this information is called a symbol
table.
• It contains a record for each identifier, with fields for the identifier's
attributes.
• Attributes for variables include storage, type, scope, etc.
• Attributes for procedures include name, parameters, etc.
• When the lexical analyzer sees an identifier for the first time, it adds it to
the symbol table.
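
A minimal sketch of such a table is given below, using a fixed-size array with linear search; real compilers typically use hash tables, and the field names and attributes here are illustrative assumptions.

#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* One symbol-table record: the identifier and a few of its attributes. */
typedef struct {
    char name[32];
    char type[16];      /* e.g. "real", "integer", "" when not yet declared */
    int  scope_level;
} Symbol;

static Symbol table[MAX_SYMBOLS];
static int count = 0;

/* Return the index of name, inserting it first if the lexical analyzer
 * sees the identifier for the first time. */
static int lookup_or_insert(const char *name, int scope) {
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0) return i;
    strcpy(table[count].name, name);
    table[count].type[0] = '\0';
    table[count].scope_level = scope;
    return count++;
}

int main(void) {
    /* The identifiers from  position := initial + rate * 60  */
    int p = lookup_or_insert("position", 0);
    int i = lookup_or_insert("initial", 0);
    int r = lookup_or_insert("rate", 0);
    strcpy(table[p].type, "real");      /* later phases fill in the attributes */
    strcpy(table[i].type, "real");
    strcpy(table[r].type, "real");
    for (int k = 0; k < count; k++)
        printf("%-10s type=%-8s scope=%d\n", table[k].name, table[k].type, table[k].scope_level);
    return 0;
}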

Compiler Design
Symbol Tables
• Symbol table management is a part of the compiler that interacts with
several of the phases
– Identifiers are found in lexical analysis and placed in the symbol table
– During syntax and semantic analysis, type and scope
information is added
– During code generation, type information is used to determine what
instructions to use
– During optimization, the “live analysis” may be kept in the symbol
table.

Compiler Design
Error Handling
• Error handling and reporting also occur across many phases:
– The lexical analyzer reports invalid character sequences.
– The syntactic analyzer reports invalid token sequences.
– The semantic analyzer reports type and scope errors.
• The compiler may be able to continue past some errors, but other
errors may stop the process.
• An important function of a compiler is the detection and reporting of
errors in the source program. Handling an error may involve:
• Detection
• Recovery
• Repair
• Correction
Compiler Design
Compiler- Construction Tools

Some commonly used compiler-construction tools include:
1. Parser generators that automatically produce syntax analyzers from a
grammatical description of a programming language.
2. Scanner generators that produce lexical analyzers from a regular-
expression description of the tokens of a language.
3. Syntax-directed translation engines that produce collections of
routines for walking a parse tree and generating intermediate code.
4. Code-generator generators that produce a code generator from a
collection of rules for translating each operation of the intermediate
language into the machine language for a target machine.
5. Data-flow analysis engines that facilitate the gathering of information
about how values are transmitted from one part of a program to each
other part. Data-flow analysis is a key part of code optimization.
6. Compiler- construction toolkits that provide an integrated set of
routines for constructing various phases of a compiler.
Compiler Design
Compiler- Construction Tools

• The compiler writer, like any software developer, can
profitably use modern software-development environments
containing tools such as language editors, debuggers,
version managers, profilers, test harnesses, and so on.
•In addition to these general software-development tools,
other more specialized tools have been created to help
implement various phases of a compiler.
•These tools use specialized languages for specifying and
implementing specific components, and many use quite
sophisticated algorithms.
•The most successful tools are those that hide the details of
the generation algorithm and produce components that can
be easily integrated into the remainder of the compiler.
Compiler Design
