CSC303 - Compiler Design - 060624


UNIVERSITY OF DELTA, AGBOR

FACULTY OF COMPUTING
COMPUTER SCIENCE DEPARTMENT
COURSE TITLE: COMPILER CONSTRUCTION & DESIGN

COURSE CODE: CSC 303

LESSON 1

INTRODUCTION
• We generally write computer programs in a high-level language,
one that humans can read and understand. A program written this
way is called source code.
• A computer, however, does not understand a high-level language.
It only understands programs written as binary 0s and 1s,
called machine code.
• To convert source code into machine code, we use either a
compiler or an interpreter.
• Both compilers and interpreters translate a program
written in a high-level language into a form the computer can
execute, but they differ in how they work.
Compiler
• A compiler is a program that converts a program written
in a high-level language into a low-level language (the object or
target language).
• It also reports the errors present in the source program.
[Figure: Source program → Compiler → Target program, with the compiler also producing error messages; the target program then turns input into output.]

Types of Compilers
a). Single-pass compilers: These are compilers that process the
source code only once. Example: the Turbo Pascal compiler.

b). Multi-pass compilers: These are compilers that process the
source code several times while converting it from the high-level
language to the low-level language. Example: the GCC compiler.
[Figure: Single-pass compiler (all passes in one single module) versus multi-pass compiler (first pass, second pass), each translating high-level language to low-level language.]

Compilation process/phase

• Analysis: This phase breaks the source program down into
smaller parts and creates an intermediate representation of
the source program.

• Synthesis: This phase takes the intermediate representation
of the source program as input and creates the desired target
program.
Interpreters

• An interpreter translates and executes code line by line,
which makes errors easier to locate but can slow the program
down.

[Figure: Source program and input → Interpreter → Output, with the interpreter also producing error messages.]
Interpreter                                 Compiler
------------------------------------------  ----------------------------------------
Translates the program one statement at     Scans the entire program and translates
a time.                                     it as a whole into machine code.

Slower execution.                           Faster execution.

Generates no intermediate object code,      Generates intermediate object code that
so it needs less memory.                    must then be linked, so it needs more
                                            memory.

Errors: translation continues until the     Errors: all errors are reported together
first error is encountered, then stops,     after the whole program is scanned, so
so errors are easy to locate.               they are harder to locate.

Interpreters are small in size.             Compilers are large in size.

Example languages: Perl, Python, Ruby,      Example languages: C, C++, Scala, etc.
MATLAB, etc.
Language Processing System
We have learnt that any computer system is made of hardware
and software. The hardware understands a language that is hard
for humans to read and write, so we write programs in a
high-level language, which is easier for us to understand and
remember. These programs are then fed into a series of tools
and OS components to produce code that the machine can run.
This chain of tools is known as a language processing system.
[Figure: Language Processing System. The preprocessor removes directives, adds files, and performs macro expansion.]

Preprocessor
A preprocessor, generally considered a part of the compiler, is a
tool that produces input for the compiler. It deals with the
following:
• It converts the high-level language (source code) to pure
HLL by processing preprocessor directives (#define,
#include <stdio.h>, etc.) and adding the respective files (file
inclusion).
• It performs macro expansion. Some texts also list operator
conversion here (e.g. a++; becomes a = a + 1;), although the C
preprocessor itself does not rewrite operators.
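The two preprocessor tasks above, file inclusion and macro expansion, can be sketched in a few lines of Python. This is only a toy illustration, not how a real C preprocessor is implemented: the function name preprocess and the headers dictionary (header name mapped to header text) are assumptions made for this sketch. It handles only object-like macros, and it would also expand names inside string literals.

```python
import re

def preprocess(source, headers=None, macros=None):
    """Toy preprocessor: file inclusion plus object-like macro expansion."""
    headers = {} if headers is None else headers   # hypothetical: name -> text
    macros = {} if macros is None else macros      # macro name -> replacement
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            name = re.search(r'[<"](.+)[>"]', stripped).group(1)
            # File inclusion: splice in the header text, preprocessed in turn.
            included = preprocess(headers.get(name, ""), headers, macros)
            output.extend(included.splitlines())
        elif stripped.startswith("#define"):
            parts = stripped.split(None, 2)
            # Remember the macro; the directive itself is removed.
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
        else:
            # Macro expansion: replace whole-word occurrences only.
            for name, value in macros.items():
                pattern = rf"\b{re.escape(name)}\b"
                line = re.sub(pattern, lambda m, v=value: v, line)
            output.append(line)
    return "\n".join(output)

source = '#include <stdio.h>\n#define MAX 10\nint a[MAX];\nint MAXIMUM = MAX;'
headers = {"stdio.h": "int printf(const char *fmt, ...);"}
print(preprocess(source, headers))
```

The output is "pure" source text: the directives are gone, the header text is in place, and MAX is replaced by 10 while MAXIMUM is left alone.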
Compiler
The compiler translates high-level language into low-level
machine language. It differs from an interpreter in the way it
reads the source code: a compiler reads the whole source program,
creates tokens, checks semantics, generates intermediate code,
and translates the program as a whole.
Assembler
An assembler translates assembly language programs into machine
code. The output of an assembler is called an object file,
which contains a combination of machine instructions as well
as the data required to place these instructions in memory.
Linker
A linker is a program that links and merges various object
files together to make an executable file. These files might
have been produced by separate assemblers. The major tasks of a
linker are to search for and locate the modules/routines referenced
in a program and to determine the memory locations where this code
will be loaded, so that the program's instructions have absolute
references.
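As a rough illustration of the linker's two tasks (placing modules in memory and resolving references to absolute addresses), here is a minimal Python sketch. The object-file layout used here (a dict with size, defines and refs) and the base address 0x1000 are made up for this example; real object formats are far richer.

```python
def link(object_files, base_address=0x1000):
    """Toy linker: lay out modules, then resolve symbol references.

    Each object file is a dict (a made-up format for this sketch):
      size    - number of memory cells the module occupies
      defines - symbol name -> offset inside the module
      refs    - symbol names the module refers to
    """
    # Pass 1: give each module a load address and record absolute addresses.
    symbols, address = {}, base_address
    for obj in object_files:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = address + offset
        address += obj["size"]
    # Pass 2: every reference must resolve to an absolute address.
    resolved = {}
    for obj in object_files:
        for name in obj["refs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            resolved[name] = symbols[name]
    return symbols, resolved

main_obj = {"size": 8, "defines": {"main": 0}, "refs": ["add"]}
lib_obj = {"size": 4, "defines": {"add": 0}, "refs": []}
symbols, resolved = link([main_obj, lib_obj])
print(symbols, resolved)
```

An undefined or duplicated symbol raises an error, which mirrors the kind of message a linker reports when a referenced routine cannot be located.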
Loader
The loader is a part of the operating system and is responsible for
loading executable files into memory and executing them. It
calculates the size of a program (instructions and data) and creates
memory space for it. It also initializes various registers to initiate
execution.
Native-compiler
A compiler that runs on platform (A) and is capable of generating
executable code for platform (A) is called a native-compiler.
Cross-compiler
A compiler that runs on platform (A) and is capable of generating
executable code for platform (B) is called a cross-compiler.
Source-to-source Compiler
A compiler that takes the source code of one programming
language and translates it into the source code of another
programming language is called a source-to-source compiler.
Compiler-Writing Tools
A number of tools have been developed to help construct
compilers. They range from scanner and parser generators
to complex systems called compiler-compilers,
compiler-generators, or translator-writing systems.
The input specification for these systems may contain:
1. A description of the lexical and syntactic structure of the
source language.
2. A description of what output is to be generated for each
source language construct.
3. A description of the target machine.
The principal aids provided by compiler-compilers are:
1. Scanner generators, which take regular expressions as
input.
2. Parser generators, which take context-free grammars as
input.
NOTE: A compiler is characterized by three languages:
1. source language
2. object language
3. The language in which it is written.
Compiler Architecture
A compiler can broadly be divided into two phases based on the
way it compiles.

Analysis Phase
The analysis phase is known as the front end of the compiler. It
reads the source program, divides it into
core parts, and then checks for lexical, grammar, and syntax
errors. The analysis phase generates an intermediate
representation of the source program and a symbol table, which
are fed to the synthesis phase as input.
[Figure: Working Principle of Compiler]
Synthesis Phase
The synthesis phase is known as the back end of the compiler. It
generates the target program with the help of the intermediate
representation and the symbol table. A compiler can
have many phases and passes.
Pass: A pass is one traversal of the compiler through the
entire program.
Phase: A phase of a compiler is a distinguishable stage that
takes input from the previous stage, processes it, and yields output
that can be used as input for the next stage. A pass can have
more than one phase.
Phases vs. Passes:
Phases of a compiler are the subtasks that must be
performed to complete the compilation process. Passes
refer to the number of times the compiler has to traverse
the entire program.
Phases of Compiler
The compilation process is a sequence of various phases. Each
phase takes input from its previous stage, has its own
representation of source program, and feeds its output to the
next phase of the compiler. Let us understand the phases of a
compiler.
[Figure: Architecture of the Compiler. High-level language → tokens → parse tree → parse tree (verified semantically) → three-address code → optimized code → assembly code.]


Lexical Analysis
The first phase of the compiler is also known as the scanner. It
scans the source code as a stream
of characters and groups them into meaningful lexemes. The lexical
analyzer represents these lexemes in the form of tokens:
<Token-name, attribute-value>
• It reads the source program and converts it into tokens. A lexical
analyzer can be generated with a tool such as LEX.
• Tokens are defined by regular expressions, which the lexical
analyzer understands.
• The lexical analyzer removes white space (spaces, tabs, newlines)
and comments from the source code.
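A minimal sketch of such a scanner in Python is given here, assuming a tiny C-like token set. The token names (KEYWORD, ID, NUMBER, OP, SEP) and the keyword list are choices made for this illustration, not a fixed standard. Each token pattern is a regular expression, and the scanner emits (token-name, attribute-value) pairs while discarding white space and comments.

```python
import re

# Order matters: comments must be tried before the "/" operator.
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*|/\*.*?\*/"),
    ("WHITESPACE", r"\s+"),
    ("NUMBER",     r"\d+"),
    ("ID",         r"[A-Za-z_]\w*"),
    ("OP",         r"[+\-*/=]"),
    ("SEP",        r"[;,(){}]"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)
KEYWORDS = {"int", "return", "if", "while", "for"}

def tokenize(source):
    """Return a list of (token-name, attribute-value) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("COMMENT", "WHITESPACE"):
            continue                      # dropped, never reaches the parser
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("int a = 5; // initial value"))
```

A generated scanner (for example one produced by LEX) works from the same kind of specification, but compiles the patterns into a finite automaton instead of trying them through a general regex engine.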
Syntax Analysis
• The syntax analyzer (parser) takes the tokens one by one and uses
a context-free grammar (CFG) to construct the parse tree. If the
parse tree cannot be constructed, the input is syntactically
incorrect and an error message is displayed.
• The productions of the CFG describe the structure of the
program.
• The input is checked against the grammar to see whether it is in
the expected format.
• Syntax errors are detected when the input does not conform to
the given grammar.
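To make this concrete, here is a small recursive-descent parser in Python for an assumed toy grammar (the grammar and the function name parse_assignment are illustrative, not taken from the course):

    stmt   → id = expr ;
    expr   → term { (+ | -) term }
    term   → factor { (* | /) factor }
    factor → id | number | ( expr )

It returns a simplified tree as nested tuples and raises SyntaxError when no tree can be built. The one-line tokenizer inside it silently skips characters it does not know, which a real scanner would report.

```python
import re

def parse_assignment(text):
    """Parse `id = expr ;` into a nested-tuple tree, or raise SyntaxError."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/=();]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(check, what):
        nonlocal pos
        tok = peek()
        if tok is None or not check(tok):
            raise SyntaxError(f"expected {what}, found {tok!r}")
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            expect(lambda t: t == "(", "'('")
            node = expr()
            expect(lambda t: t == ")", "')'")
            return node
        return expect(lambda t: t[0].isalnum() or t[0] == "_",
                      "identifier or number")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = expect(lambda t: True, "operator")
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = expect(lambda t: True, "operator")
            node = (op, node, term())
        return node

    target = expect(lambda t: t[0].isalpha() or t[0] == "_", "identifier")
    expect(lambda t: t == "=", "'='")
    tree = ("=", target, expr())
    expect(lambda t: t == ";", "';'")
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment("a = b + c * 2;"))
```

Each grammar rule becomes one function, so the call structure of the parser mirrors the shape of the tree it builds.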
Semantic Analysis
• Semantic analysis checks whether the parse tree that was
constructed is meaningful, that is, whether it follows the rules of
the language. For example, it checks type casting, type conversion
issues, and so on.
• The semantic analyzer also keeps track of identifiers, their
types, and expressions, for example whether identifiers are
declared before use.
• The semantic analyzer produces an annotated syntax tree
as its output.
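The two checks mentioned above, declared-before-use and type compatibility, can be sketched as follows. The tree format (strings for leaves, tuples of operator, left, right for inner nodes) and the rule that a float may not be assigned to an int without a cast are assumptions of this sketch; real languages differ on such rules.

```python
def check_types(node, declared):
    """Return the type of an expression tree, or raise on a semantic error.

    node     - an identifier/number string, or a tuple (operator, left, right)
    declared - identifier name -> type name ("int" or "float")
    """
    if isinstance(node, str):
        if node.isdigit():
            return "int"                       # integer constant
        if node not in declared:
            raise NameError(f"'{node}' used before declaration")
        return declared[node]
    op, left, right = node
    left_type = check_types(left, declared)
    right_type = check_types(right, declared)
    if op == "=":
        # Assumed rule for this sketch: no implicit narrowing to int.
        if left_type == "int" and right_type == "float":
            raise TypeError("cannot assign float to int without a cast")
        return left_type
    # Arithmetic: mixing int and float widens the result to float.
    return "float" if "float" in (left_type, right_type) else "int"

declared = {"a": "int", "b": "int", "x": "float"}
print(check_types(("=", "a", ("+", "b", "2")), declared))
```

The type returned for each node is the kind of annotation a semantic analyzer attaches to the syntax tree.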
Intermediate Code Generation
• After semantic analysis, the compiler generates an
intermediate code from the source program.
• This code represents a program for some abstract machine. It
lies between the high-level language and the machine language.
• The intermediate code should be generated in such a way
that it is easy to translate into the target
machine code. The intermediate code may be three-address
code or assembly code.
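As an illustration, the following Python sketch flattens an expression tree into three-address code, where each instruction has at most one operator on its right-hand side. The nested-tuple tree format and the temporary names t1, t2, ... are conventions assumed for this example.

```python
def to_three_address(tree):
    """Flatten ("=", target, expression) into three-address instructions."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"               # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expression = tree
    code.append(f"{target} = {emit(expression)}")
    return code

# a = b + c * 2
for line in to_three_address(("=", "a", ("+", "b", ("*", "c", "2")))):
    print(line)
# prints:
# t1 = c * 2
# t2 = b + t1
# a = t2
```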
Code Optimization
• The next phase performs code optimization; it is an optional
phase. Optimization removes unnecessary lines of code and
rearranges the sequence of statements to speed up program
execution without wasting resources such as CPU time and memory.
The output of this phase is optimized intermediate code.
• In short, the code optimization phase attempts to improve the
intermediate code so that it runs faster and consumes fewer
resources.
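One simple optimization of this kind is constant folding with constant propagation: operations whose operands are known at compile time are computed by the compiler and removed from the code. The sketch below assumes straight-line intermediate code given as (dest, left, op, right) tuples in which folded destinations are temporaries; both the tuple format and the function name are choices made for this illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Constant folding and propagation over (dest, left, op, right) tuples."""
    known = {}                             # name -> constant value found so far
    optimized = []
    for dest, left, op, right in code:
        left = known.get(left, left)       # propagate known constants
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int) and op in OPS:
            known[dest] = OPS[op](left, right)   # computed at compile time
        else:
            known.pop(dest, None)          # dest no longer holds a constant
            optimized.append((dest, left, op, right))
    return optimized, known

code = [("t1", 4, "*", 2), ("t2", "b", "+", "t1"), ("t3", "t1", "-", 3)]
optimized, known = fold_constants(code)
print(optimized)   # [('t2', 'b', '+', 8)]
print(known)       # {'t1': 8, 't3': 5}
```

Three instructions shrink to one, and the remaining instruction uses the constant 8 directly instead of reading t1 at run time.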
Code Generation
• In this phase, the code generator takes the optimized
intermediate code and maps it to the target
machine language.
• The code generator translates the intermediate code into a
sequence of relocatable machine code (assembly code). This
sequence of machine instructions performs the same task as the
intermediate code would.
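A very naive code generator can be sketched as follows. The target here is a made-up machine with a single register R1 and the instructions LOAD, ADD, SUB, MUL, DIV and STORE; it does not correspond to any real instruction set, and a real code generator would also do register allocation and instruction selection.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(code):
    """Map (dest, left, op, right) instructions to a made-up assembly."""
    asm = []
    for dest, left, op, right in code:
        asm.append(f"LOAD R1, {left}")              # R1 <- left operand
        asm.append(f"{MNEMONIC[op]} R1, {right}")   # R1 <- R1 op right operand
        asm.append(f"STORE {dest}, R1")             # dest <- R1
    return asm

# t1 = c * 2 ; a = b + t1
for line in generate([("t1", "c", "*", "2"), ("a", "b", "+", "t1")]):
    print(line)
```

Every intermediate instruction expands into a fixed three-instruction pattern, which is easy to follow but wasteful: t1 is stored and could have stayed in a register.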
Symbol Table
• The symbol table is also known as bookkeeping.
• It is a data structure maintained throughout all the
phases of a compiler. All the identifiers' names, along
with information such as their type and size, are stored here.
• The symbol table makes it easier for the compiler to
quickly search for and retrieve an identifier's record.
• The symbol table is also used for scope management. All
phases interact with the symbol table.
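One common way to combine fast lookup with scope management is a stack of hash tables, one per scope. The class below is a minimal sketch of that idea; the class and method names are chosen for this example only.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier to its attributes."""

    def __init__(self):
        self.scopes = [{}]                 # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name, size):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"'{name}' already declared in this scope")
        scope[name] = {"type": type_name, "size": size}

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                        # undeclared identifier

table = SymbolTable()
table.declare("a", "int", 4)
table.enter_scope()
table.declare("a", "float", 8)             # shadows the outer a
print(table.lookup("a"))                   # {'type': 'float', 'size': 8}
table.exit_scope()
print(table.lookup("a"))                   # {'type': 'int', 'size': 4}
```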
Error Handler
• It is a module that takes care of all errors encountered
during compilation.
• It allows the compilation process to continue even if
errors are encountered.
• The tasks of the error handling process are to detect each
error, report it to the user, and devise and apply a recovery
strategy.
Summary
• A compiler is a program that converts high-level language to
assembly language.
• A linker links all the parts of the program
together for execution. A loader loads them into
memory, and then the program is executed.
• A compiler that runs on one machine and produces executable
code for another machine is called a cross-compiler.
• A compiler is divided into two parts, namely analysis and
synthesis. The compilation process is carried out in various
phases.
• Two or more phases can be combined to form a pass.
• A parser should be able to detect and report any error in the
program.
Assignment
1. Differentiate between a compiler and an interpreter.
2. List five (5) programming languages each that use a
compiler and an interpreter.
3. Write a short note on compiler-writing tools.
4. Differentiate between a linker and a loader.
5. Explain bootstrapping.
6. Differentiate between the analysis phase and the synthesis phase.
7. Describe the phases of the compiler.
MAIN FUNCTION OF LEXICAL ANALYZER
The lexical analysis phase converts the source program into
a stream of tokens. This phase is also called the scanning phase.
Functions
• It reads the input program character by character,
produces a stream of tokens, and passes them to the
syntax analyzer on demand.
• It removes white space and tabs.
• It removes comments from the source program.
• It reports errors together with the line number of each error.
Parse Tree

Suppose we pass the following statement:

a = b + c;
We will get a token sequence like
id = id + id ;
where each id refers to its variable in the symbol table.
Tokens, Lexemes and Pattern
• Tokens: A token is a sequence of characters that can be treated as
a unit or single logical entity. Typical tokens are keywords (for,
if, while), identifiers (variable names), operators (+, -, *, /),
and separators (;).
int a = 5; (has 5 tokens)
int is a keyword, a is an identifier, = is an operator, 5 is a
constant, and ; is a separator.

• Lexemes: A lexeme is a sequence of characters in the source
program that is matched by the pattern for a token, that is, a
sequence of input characters that makes up a single token.
• Pattern: A pattern is a rule describing all the lexemes that can
represent a particular token in a source language. Patterns are
predefined rules, written as regular expressions, that every lexeme
must match to be identified as a valid token.
• Questions: Count the number of tokens
• int max (int i);
  Answer: 7 tokens

• int main( )
{
// 2 variables declared below
int a, b;
a = 10;
return 0;
}
  Answer: 18 tokens (comments are omitted)
• printf("Never give up");
  Answer: 5 tokens

• printf("%d Hello", &x);
  Answer: 8 tokens

• int main( )
{
int a = 10, b = 20;
printf("sum is = %d", a + b);
return 0;
}
  Answer: 27 tokens
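The counts above can be checked mechanically. The Python sketch below is a rough counter written for these exercises only, not a full C lexer: it treats a string literal as one token, skips // comments, and counts every other punctuation character as its own token, so multi-character operators such as == or ++ would be overcounted.

```python
import re

TOKEN = re.compile(r"""
    //[^\n]*            # line comment (matched, then skipped)
  | "(?:\\.|[^"\\])*"   # string literal counts as one token
  | [A-Za-z_]\w*        # keyword or identifier
  | \d+                 # integer constant
  | [^\sA-Za-z_\d]      # any other single non-space character
""", re.VERBOSE)

def count_tokens(source):
    """Count tokens in a small C fragment, ignoring comments."""
    return sum(1 for m in TOKEN.finditer(source)
               if not m.group().startswith("//"))

print(count_tokens("int max (int i);"))              # 7
print(count_tokens('printf("Never give up");'))      # 5
print(count_tokens('printf("%d Hello", &x);'))       # 8
```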
Specifications of Tokens
Let us understand how the language theory considers the
following terms:
• Alphabets
An alphabet is any finite set of symbols. {0,1} is the binary
alphabet,
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the
hexadecimal alphabet, and
{a-z, A-Z} is the alphabet of English letters.
• Strings
Any finite sequence of symbols from an alphabet is called a string.
The length of a string is the total number of
symbols in it, e.g., if the string S is "NIGERIA",
the length of S is 7, denoted by |S| = 7.
A string with no symbols, i.e. a string of zero
length, is known as the empty string and is denoted by
ε (epsilon).
Language
A language is a set of strings over some finite alphabet; the set
itself may be finite or infinite. Because languages are sets,
mathematical set operations can be performed
on them. Regular languages, which include all finite languages,
can be described by means of regular expressions.
Regular Expressions
The lexical analyzer needs to scan and identify only
the valid strings/tokens/lexemes that belong to the
language in hand. It searches for the patterns defined by the
language rules. Regular expressions can
express such languages by defining a pattern for finite strings
of symbols. The grammar defined by regular expressions is
known as regular grammar. The language defined by a
regular grammar is known as a regular language.
A regular expression is an important notation for specifying
patterns. Each pattern matches a set of strings, so regular
expressions serve as names for sets of strings. Programming
language tokens can be described by regular languages. The
specification of regular expressions is an example of a recursive
definition. Regular languages are easy to understand and have
efficient implementations.
Regular expressions obey a number of algebraic laws,
which can be used to manipulate them
into equivalent forms.
Operations
The various operations on languages are:
1. The union of two languages L and M is written as:
L U M = {s | s is in L or s is in M}
2. The concatenation of two languages L and M is written as:
LM = {st | s is in L and t is in M}
3. The Kleene closure of a language L is written as:
L* = the set of strings formed by concatenating zero or more
strings from L.
For a single symbol x, x* means zero or more occurrences of x,
i.e., it generates { ε, x, xx, xxx, xxxx, … }
Notations
If r and s are regular expressions denoting the languages L(r)
and L(s), then
• Union: (r)|(s) is a regular expression denoting L(r) U L(s)
• Concatenation: (r)(s) is a regular expression denoting
L(r)L(s)
• Kleene closure: (r)* is a regular expression denoting
(L(r))*
Note: (r) is a regular expression denoting L(r)
Example:
Given the regular languages
A = {xy, z} and B = {k, mn}
Perform the following
(i) A*  (ii) B*  (iii) A U B  (iv) A o B (concatenation)

Solution
(i) A* = {ɛ, xy, z, xyz, xyxy, zz, xyxyxy, zzz, …}
(ii) B* = {ɛ, k, mn, kmn, kk, mnmn, kkk, mnmnmn…}
(iii) A U B = {xy, z, k, mn}
(iv) A o B = {xyk, xymn, zk, zmn}
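The same operations can be tried out with Python sets of strings. Because A* and B* are infinite, the kleene function below (a name chosen for this sketch) only builds the strings made from at most max_parts pieces, which is enough to reproduce the first few elements listed in the solution.

```python
def union(L, M):
    return L | M

def concat(L, M):
    return {s + t for s in L for t in M}

def kleene(L, max_parts):
    """Strings made of at most max_parts pieces from L (L* itself is infinite)."""
    result = {""}                      # the empty string ε
    layer = {""}
    for _ in range(max_parts):
        layer = concat(layer, L)       # strings made of exactly one more piece
        result |= layer
    return result

A = {"xy", "z"}
B = {"k", "mn"}
print(sorted(union(A, B)))             # ['k', 'mn', 'xy', 'z']
print(sorted(concat(A, B)))            # ['xyk', 'xymn', 'zk', 'zmn']
print(sorted(kleene(A, 2), key=len))
```

With max_parts = 2, kleene(A, 2) contains ε, xy, z, xyxy, xyz, zxy and zz; raising the bound adds longer strings such as xyxyxy and zzz.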
