Compiler Design

The document discusses the process of compiling programs from high-level languages to machine-executable code. It explains that a compiler translates programs written in a source language into an equivalent program in a target language. The compilation process involves several steps like preprocessing, compiling, assembling, linking and loading. These steps are carried out by components like the preprocessor, compiler, assembler, linker and loader as part of the language processing system. The document also describes the structure and phases of a compiler in detail.

Uploaded by

HDKH


COMPILER DESIGN

Programming languages are notations for describing computations to people and to machines.
But, before a program can be run, it first must be translated into a form in which it can be
executed by a computer. The software systems that do this translation are called compilers.

A compiler is a program that can read a program in one language - the source language - and
translate it into an equivalent program in another language - the target language. An
important role of the compiler is to report any errors in the source program that it detects
during the translation process.
LANGUAGE PROCESSING SYSTEM
A computer is a combination of software and hardware. Hardware is simply a piece of
physical equipment whose functions are controlled by the relevant software. Hardware
understands instructions only in binary form, which is just a series of 0s and 1s. Writing
such code directly would be an inconvenient and complicated task for programmers, so we
write programs in a high-level language, which is easier for us to comprehend and remember.
These programs are then fed into a series of tools and operating system (OS) components to
obtain the code that the machine can use. This is known as a language processing system.
In a language processing system, the source code is first preprocessed. The modified source
program is processed by the compiler to form the target assembly program, which is then
translated by the assembler into relocatable object code. The linker and loader then process
the object code to create the target program. Each translator in this chain is defined by the
input it takes and the output it produces; the components are described below.
Components of Language processing system:

• Preprocessor –
The preprocessor inserts the contents of all included header files and expands macros.
(A macro is a piece of code that is given a name. Whenever the name is used, it is
replaced by the contents of the macro. The purpose of macros is either to automate
frequently used sequences or to enable more powerful abstraction.) The preprocessor
takes source code as input and produces modified source code as output. It is also
known as a macro evaluator. Preprocessing is optional: a language that does not
support #include or macros does not need it.

• Compiler –
The compiler takes the modified source code as input. The compiler may produce an
assembly-language program as its output, because assembly language is easier to
produce as output and is easier to debug.

• Assembler –
The target assembly code is then processed by a program called an assembler that
produces relocatable machine code as its output.

• Linker/ Loader –
Large programs are often compiled in pieces, so the relocatable machine code may
have to be linked together with other relocatable object files and library files into the
code that actually runs on the machine. The linker resolves external memory
addresses, where the code in one file may refer to a location in another file. The
loader then puts all of the executable object files into memory for execution.
Put another way, a linker (or link editor) is a program that takes a collection of object
files (created by assemblers and compilers) and combines them into an executable
program. The loader places the linked program in main memory.

• Executable code –
This is the low-level, machine-specific code that the machine can execute directly. Once
the linker and loader have done their job, the object code has finally been converted
into executable code.
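
To make the preprocessor step above concrete, here is a minimal sketch of macro expansion in Python. It is only an illustration under simplifying assumptions: it handles object-like `#define NAME value` lines, does not process `#include`, and does not re-expand macros inside other macros. The function name `preprocess` is hypothetical and not part of any real toolchain.

```python
import re

def preprocess(source: str) -> str:
    """Expand object-like '#define NAME value' macros (toy sketch, not the C preprocessor)."""
    macros = {}
    output = []

    def expand(match):
        # Replace an identifier only if it is a known macro name
        return macros.get(match.group(0), match.group(0))

    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # Remember the macro and emit nothing for the directive itself
            _, name, *body = stripped.split(maxsplit=2)
            macros[name] = body[0] if body else ""
            continue
        # Match whole identifiers, so MAX is not replaced inside MAXIMUM
        output.append(re.sub(r"[A-Za-z_]\w*", expand, line))
    return "\n".join(output)
```

Given the input `#define MAX 100` followed by `int a[MAX];`, the sketch produces `int a[100];`, which is the "modified source code" that would be handed to the compiler.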

STRUCTURE OF A COMPILER

A compiler is a translating program that converts the instructions of a high-level
language into machine-level language. A program that is input to the compiler is called
a source program. The machine-level program that the compiler produces from it
is known as the object code.
Some compilers convert the high-level language to an assembly language as an intermediate
step, whereas others convert it directly to machine code. This process of converting the
source code into machine code is called compilation. The compiler lists all the errors if the
input code does not follow the rules of its language. The main purpose of a compiler is to
translate code written in one language into another without changing the meaning of the program.

Analysis of a Source Program

We can analyze a source program in three main steps, which are further divided into
different phases. The three steps are:

1. Linear Analysis
Here, the characters of the code are read from left to right and grouped into sequences
that have a collective meaning. We call these groups tokens.

2. Hierarchical Analysis
The tokens are grouped hierarchically, in a nested manner, according to their collective meaning.

3. Semantic Analysis
In this step, we check whether the components of the source code fit together meaningfully.
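
The linear analysis step can be sketched as a small tokenizer. This is a minimal illustration, not a production lexer: the token classes (`number`, `id`, `op`) and the function name `tokenize` are assumptions chosen for the example, and each token is reported as a pair of token name and matched text.

```python
import re

# Token classes for the sketch; a real language would define many more
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[+\-*/=]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Scan the characters left to right and group them into (name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":  # whitespace separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For the input `position = initial + rate * 60`, the sketch yields seven tokens: three identifiers, three operators, and one number.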

STRUCTURE/PHASES OF A COMPILER
The compilation process is a sequence of various phases. Each phase takes input from its
previous stage, has its own representation of source program, and feeds its output to the next
phase of the compiler.

A compiler has two main parts, namely the analysis phase and the synthesis
phase. The analysis phase creates an intermediate representation from the given source
code. The synthesis phase creates an equivalent target program from the intermediate
representation.

The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to create an intermediate
representation of the source program. If the analysis part detects that the source program is
either syntactically ill formed or semantically unsound, then it must provide informative
messages, so the user can take corrective action. The analysis part also collects information
about the source program and stores it in a data structure called a symbol table, which is
passed along with the intermediate representation to the synthesis part.
The synthesis part constructs the desired target program from the intermediate representation
and the information in the symbol table. The analysis part is often called the front end of the
compiler; the synthesis part is the back end.

A typical decomposition of a compiler into phases is shown in the following figure. In practice,
several phases may be grouped together, and the intermediate representations between the
grouped phases need not be constructed explicitly. The symbol table, which stores
information about the entire source program, is used by all phases of the compiler. Some
compilers have a machine-independent optimization phase between the front end and the
back end. The purpose of this optimization phase is to perform transformations on the
intermediate representation, so that the back end can produce a better target program than it
would have otherwise produced from an unoptimized intermediate representation.

The phases include:

• Lexical Analysis
The first phase of the compiler works as a text scanner. This phase scans the source code as
a stream of characters and groups it into meaningful lexemes. The lexical analyzer represents
these lexemes in the form of tokens as:
<token-name, attribute-value>

• Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements
are checked against the source code grammar, i.e. the parser checks whether the expression
made by the tokens is syntactically correct.
• Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the language.
For example, it checks that values are assigned between compatible data types and reports
errors such as adding a string to an integer. The semantic analyzer also keeps track of
identifiers, their types and expressions, and checks whether identifiers are declared before
use. The semantic analyzer produces an annotated syntax tree as its output.
• Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code from the source code.
It represents a program for some abstract machine and lies in between the
high-level language and the machine language. This intermediate code should be generated in
such a way that it is easy to translate into the target machine code.

• Code Optimization
The next phase optimizes the intermediate code. Optimization can be thought of as
removing unnecessary lines of code and rearranging the sequence of statements in order to
speed up program execution without wasting resources (CPU, memory).

• Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code
and maps it to the target machine language. The code generator translates the intermediate
code into a sequence of (generally) relocatable machine code. This sequence of machine
instructions performs the same task as the intermediate code would.
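
The last three phases in the list above can be sketched together in Python. The sketch rests on several assumptions that are not from the notes: the syntax tree is a nested tuple `(op, left, right)`, the intermediate code is a list of three-address tuples `(dest, a, op, b)`, constant folding is the only optimization, and the target is a hypothetical one-accumulator machine with `LOAD`, `ADD`, `SUB`, `MUL` and `STORE` instructions.

```python
import operator
from itertools import count

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}

def to_three_address(node, code, temps):
    """Intermediate code generation: flatten the tree into three-address instructions."""
    if not isinstance(node, tuple):
        return node  # identifier or constant leaf
    op, left, right = node
    a = to_three_address(left, code, temps)
    b = to_three_address(right, code, temps)
    dest = f"t{next(temps)}"
    code.append((dest, a, op, b))
    return dest

def fold_constants(code):
    """Code optimization: compute operations on constants at compile time."""
    known = {}  # temporaries whose value is already known
    optimized = []
    for dest, a, op, b in code:
        a, b = known.get(a, a), known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)  # drop the instruction, keep its value
        else:
            optimized.append((dest, a, op, b))
    return optimized

def to_assembly(code):
    """Code generation for the hypothetical accumulator machine."""
    lines = []
    for dest, a, op, b in code:
        lines += [f"LOAD {a}", f"{MNEMONICS[op]} {b}", f"STORE {dest}"]
    return lines

def compile_expression(tree):
    code = []
    to_three_address(tree, code, count(1))
    return to_assembly(fold_constants(code))
```

For the tree `("+", "initial", ("*", "rate", ("+", 50, 10)))`, the unoptimized intermediate code has three instructions; folding `50 + 10` into `60` leaves two, so the generated code needs six machine instructions instead of nine. An expression made only of constants folds away entirely in this sketch, which a real compiler would handle by emitting a load of the final value.
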
Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All identifier
names, along with their types, are stored here. The symbol table makes it easy for the
compiler to quickly search for an identifier's record and retrieve it. The symbol table is also
used for scope management.
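
A minimal sketch of such a symbol table in Python is shown here, assuming a stack of dictionaries with one dictionary per scope. The class and method names (`SymbolTable`, `declare`, `lookup`, `enter_scope`, `exit_scope`) are illustrative choices, not a standard interface.

```python
class SymbolTable:
    """Maps identifier names to types, with nested scopes (toy sketch)."""

    def __init__(self):
        self.scopes = [{}]  # global scope first, innermost scope last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        scope[name] = type_

    def lookup(self, name):
        # Search from the innermost scope outwards, so inner names shadow outer ones
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} used before declaration")
```

Dictionary lookups give the quick search mentioned above, and the scope stack lets an inner declaration of `x` hide an outer one until the inner scope is exited.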

Example of Top-Down Parsing

Suppose the input string provided by the lexical analyzer is ‘abd’, for the following
grammar.
S -> a A d
A -> b | b c
The top-down parser will parse the input string ‘abd’ and will start creating the parse tree
with the start symbol ‘S’. At each step of a top-down parse, the key problem is that of
determining the production to be applied for a non-terminal. Once a production is chosen, the
rest of the parsing process consists of "matching" the terminal symbols in the production body
with the input string.

Now the first input symbol ‘a’ matches the first leaf node of the tree. So the parser will move
ahead and find a match for the second input symbol ‘b’.

But the next leaf node of the tree is a non-terminal, A, which has two productions. Here, the
parser has to choose the A-production that lets the derivation yield the input string ‘abd’.
So the parser selects the production A -> b.

Now the next leaf node ‘b’ matches the second input symbol ‘b’. Further, the third input
symbol ‘d’ matches the last leaf node ‘d’ of the tree, thereby successfully completing the
top-down parse.

Drawback of Top-Down Parsing

• Top-down parsing tries to find the left-most derivation for an input string
ω, which is equivalent to building a parse tree for ω that starts
from the root and creates the nodes in preorder.
• Top-down parsing follows the left-most derivation rather than the right-most
derivation because the parser scans the input string ω from left to right,
one symbol/token at a time. The left-most derivation generates the leaves of
the parse tree in left-to-right order, which matches the input scan order.
• In top-down parsing, each terminal symbol produced by a predicted
production of the grammar is compared with the input symbol pointed to by
the string marker. If the match is successful, the parser can continue. If a
mismatch occurs, the prediction has gone wrong.
• At this point it is essential to reject the previous prediction. The prediction
that led to the mismatching terminal symbol is rejected, and the string
marker (pointer) is reset to the position it had when the rejected production
was chosen. This is known as backtracking.
• Backtracking is the major drawback of top-down parsing.

Backtracking in Top-Down Parsing

Backtracking means that the parser scans the provided input string repeatedly. If one
production of a non-terminal fails to derive the input string, the parser has to go
back to the position where it chose that production and start deriving the string
again using another production of the same non-terminal. This process may need
repeated scans over the input string, and we refer to it as backtracking.
Backtracking parsers are rarely used, because they are quite inefficient for
parsing programming languages.
Top-down parsers start from the root node (start symbol) and match the input string
against the production rules, replacing non-terminals with production bodies when they match.
To understand this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For the input string ‘read’, a top-down parser will behave like this:
It starts with S from the production rules and matches its yield against the left-most
letter of the input, i.e. ‘r’. The very first production of S (S → rXd) matches it. So the
top-down parser advances to the next input letter (i.e. ‘e’). The parser tries to expand the
non-terminal ‘X’ and checks its first production from the left (X → oa). It does not match
the next input symbol. So the top-down parser backtracks to try the next
production rule of X, (X → ea).
Now the parser matches all the input letters in order. The string is
accepted.
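
The walk-through above can be reproduced with a small backtracking parser. The Python sketch below is one possible way to write it, using generators to try each alternative in order; the names `GRAMMAR`, `parse` and `accepts` and the trace format are assumptions made for the illustration. It only recognizes strings and does not build a parse tree.

```python
# The example grammar: each non-terminal maps to its alternatives, in order
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}

def parse(symbols, text, pos, trace):
    """Yield every input position where `symbols` can finish when started at `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        for body in GRAMMAR[first]:
            trace.append(f"{first} -> {''.join(body)} at {pos}")
            # If this alternative fails, the loop retries from the same pos:
            # that reset of the input position is the backtracking step
            for mid in parse(body, text, pos, trace):
                yield from parse(rest, text, mid, trace)
    elif pos < len(text) and text[pos] == first:
        yield from parse(rest, text, pos + 1, trace)

def accepts(text):
    trace = []
    ok = any(end == len(text) for end in parse(["S"], text, 0, trace))
    return ok, trace
```

For `read`, the trace records `S -> rXd at 0`, then the failed attempt `X -> oa at 1`, then `X -> ea at 1`, after which the string is accepted. A sketch like this would recurse forever on a left-recursive grammar, which is one reason such grammars are rewritten before top-down parsing.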

Top-Down Parsing without Backtracking

In top-down parsing without backtracking, once a production rule is applied, it cannot be undone.

Parsers of this kind are of two types:

1. Recursive Descent Parsing
2. Predictive Parsing, also called Non-Recursive Parsing, LL(1) Parsing, or Table-Driven
Parsing

Recursive Descent Parsing

A top-down parser that implements a set of recursive procedures to process the input without
backtracking is known as a recursive-descent parser, and the technique is known as
recursive-descent parsing.
This parsing technique recursively parses the input to build a parse tree.
It is regarded as recursive because it uses a context-free grammar, which is recursive in
nature.
To implement a recursive descent parser, the following properties must hold:
1. The grammar should not be left recursive.
2. The grammar should be left-factored (alternatives should not have common prefixes).
3. The implementation language should support recursion.
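
As an illustration of these properties, here is a minimal recursive-descent sketch in Python with one procedure per non-terminal and no backtracking. The arithmetic grammar is an assumption chosen for the example (it is not from the notes): E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | digit. It is not left recursive and its alternatives share no prefixes, so one symbol of lookahead is enough to pick a production. For brevity the procedures compute the value of the expression instead of building a tree.

```python
class Parser:
    """Recursive-descent evaluator for single-digit arithmetic (toy sketch)."""

    def __init__(self, text):
        self.text = text.replace(" ", "")
        self.pos = 0

    def peek(self):
        # One symbol of lookahead; the empty string marks end of input
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    def parse(self):
        value = self.expr()
        if self.peek() != "":
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
        return value

    def expr(self):                      # E -> T E'
        return self.expr_rest(self.term())

    def expr_rest(self, left):           # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            return self.expr_rest(left + self.term())
        return left

    def term(self):                      # T -> F T'
        return self.term_rest(self.factor())

    def term_rest(self, left):           # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*")
            return self.term_rest(left * self.factor())
        return left

    def factor(self):                    # F -> ( E ) | digit
        if self.peek() == "(":
            self.match("(")
            value = self.expr()
            self.match(")")
            return value
        if self.peek().isdigit():
            digit = self.peek()
            self.match(digit)
            return int(digit)
        raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
```

Calling `Parser("2+3*4").parse()` evaluates the multiplication first because `term` sits below `expr` in the grammar, and a malformed input such as `2+` raises `SyntaxError` as soon as the lookahead fits no production.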