
COMPILER DESIGN

Programming languages are notations for describing computations to people and to machines.
But, before a program can be run, it first must be translated into a form in which it can be
executed by a computer. The software systems that do this translation are called compilers.

A compiler is a program that can read a program in one language - the source language - and
translate it into an equivalent program in another language - the target language. An
important role of the compiler is to report any errors in the source program that it detects
during the translation process.
LANGUAGE PROCESSING SYSTEM
A computer is a combination of software and hardware. Hardware by itself is just electronic equipment; its behaviour is directed by software. To be executed directly by the hardware, code has to be written in binary format, which is just a series of 0s and 1s. Writing such code would be an inconvenient and complicated task for programmers, so we write programs in a high-level language, which is convenient for us to read and understand. These programs are then fed through a series of tools and operating system (OS) components to obtain code that the machine can use. This collection of tools is known as a language processing system.
In a language processing system, the source code is first preprocessed. The modified source program is processed by the compiler to form a target assembly program, which is then translated by the assembler into relocatable object code; this is processed by the linker and loader to create the target program. Each translator in this chain is characterized by the input it takes and the output it produces, as described below.
Components of Language processing system:

 Preprocessor –
The preprocessor includes all header files and expands macros. (A macro is a piece of code that is given a name; whenever the name is used, it is replaced by the contents of the macro by an interpreter or compiler. The purpose of macros is either to automate frequently used sequences of code or to enable more powerful abstraction.) The preprocessor takes source code as input and produces modified source code as output. It is also known as a macro evaluator. Preprocessing is optional: for a language that does not support #include directives or macros, no preprocessing is required.

 
 Compiler –
The compiler takes the modified source code as input. The compiler may produce an
assembly-language program as its output, because assembly language is easier to
produce as output and is easier to debug.

 Assembler –
The target assembly code is then processed by a program called an assembler that
produces relocatable machine code as its output.

 
 Linker/ Loader  –
Large programs are often compiled in pieces, so the relocatable machine code may
have to be linked together with other relocatable object files and library files into the
code that actually runs on the machine. The linker resolves external memory
addresses, where the code in one file may refer to a location in another file. The
loader then puts together all of the executable object files into memory for execution.
In other words, a linker or link editor is a program that takes a collection of object files (created by assemblers and compilers) and combines them into an executable program.
The loader then places the linked program into main memory.

 Executable code –
This is low-level, machine-specific code that the machine can execute directly. Once the linker and loader have done their job, the object code is finally converted into executable code.
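For example, this pipeline can be observed step by step with a typical C toolchain such as GCC: gcc -E file.c runs only the preprocessor, gcc -S file.c stops after the compiler proper and leaves an assembly file, gcc -c file.c runs the assembler and produces a relocatable object file (file.o), and a final gcc file.o -o program invocation performs linking so that the loader can later bring the executable into memory.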

STRUCTURE OF A COMPILER

A compiler is a translating program that translates the instructions of a high-level language into machine-level language. The program given as input to the compiler is called the source program; the machine-level program produced by the compiler is known as the object code.
Some compilers convert the high-level language to an assembly language as an intermediate step, whereas others convert it directly to machine code. This process of converting the source code into machine code is called compilation. The compiler lists all the errors it finds if the input code does not follow the rules of its language. The main purpose of a compiler is to translate code written in one language into another without changing the meaning of the program.

Analysis of a Source Program

We can analyze a source program in three main steps. Moreover, these steps are further divided into
different phases. The three steps are:

1. Linear Analysis
Here, the characters of the code are read from left to right and grouped into sequences that have a
collective meaning. We call these groups tokens.

2. Hierarchical Analysis
Here, the tokens are grouped hierarchically into nested collections according to their collective meaning.

3. Semantic Analysis
In this step, we check whether the components of the source code fit together meaningfully.
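To illustrate the three steps on a single statement, consider the assignment position = initial + rate * 60 (an example chosen for illustration). Linear analysis groups the characters into tokens: the identifiers position, initial and rate, the symbols =, + and *, and the number 60. Hierarchical analysis nests these tokens so that rate * 60 is a subexpression of the addition, which in turn forms the right-hand side of the assignment. Semantic analysis then checks, for instance, that the operand types are compatible, inserting a conversion of the integer 60 if the other operands are floating point.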

STRUCTURE/PHASES OF A COMPILER
The compilation process is a sequence of various phases. Each phase takes input from its
previous stage, has its own representation of source program, and feeds its output to the next
phase of the compiler.

A compiler basically has two parts, the analysis phase and the synthesis phase. The analysis phase creates an intermediate representation from the given source code. The synthesis phase creates an equivalent target program from the intermediate representation.

The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to create an intermediate
representation of the source program. If the analysis part detects that the source program is
either syntactically ill formed or semantically unsound, then it must provide informative
messages, so the user can take corrective action. The analysis part also collects information
about the source program and stores it in a data structure called a symbol table, which is
passed along with the intermediate representation to the synthesis part.
The synthesis part constructs the desired target program from the intermediate representation
and the information in the symbol table. The analysis part is often called the front end of the
compiler; the synthesis part is the back end.

A typical decomposition of a compiler into phases is shown in the following figure. In practice,
several phases may be grouped together, and the intermediate representations between the
grouped phases need not be constructed explicitly. The symbol table, which stores
information about the entire source program, is used by all phases of the compiler. Some
compilers have a machine-independent optimization phase between the front end and the
back end. The purpose of this optimization phase is to perform transformations on the
intermediate representation, so that the back end can produce a better target program than it
would have otherwise produced from an unoptimized intermediate representation.

The phases include:


 Lexical Analysis
The first phase of the compiler, also called the scanner, works as a text scanner. This phase reads the source code as a
stream of characters and groups the characters into meaningful lexemes. The lexical analyzer represents
each lexeme as a token of the form:
<token-name, attribute-value>
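As a minimal sketch (not part of the original text), the following Python fragment shows how a lexical analyzer might produce such pairs; the token names and regular expressions are assumptions chosen for the example.

import re

# Each rule pairs a token name with a regular expression describing its lexemes.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"[ \t]+"),   # whitespace is recognized but discarded
]

def tokenize(source):
    """Yield (token-name, attribute-value) pairs for the given source string."""
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
    for match in re.finditer(pattern, source):
        kind, value = match.lastgroup, match.group()
        if kind != "SKIP":
            yield (kind, value)

print(list(tokenize("position = initial + rate * 60")))
# [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'),
#  ('PLUS', '+'), ('ID', 'rate'), ('TIMES', '*'), ('NUMBER', '60')]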

 Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, the token arrangement
is checked against the grammar of the source language, i.e. the parser checks whether the expression made by
the tokens is syntactically correct.
 Semantic Analysis
Semantic analysis checks whether the parse tree that was constructed follows the rules of the language.
For example, it checks that values are assigned between compatible data types and reports errors such as
adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types and expressions,
and checks whether identifiers are declared before use. The semantic analyzer produces an
annotated syntax tree as its output.
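As a small sketch (the type names and the compatibility table are assumptions for illustration), such a check might look like this in Python:

# Sketch of one semantic check: is the right-hand side type assignable to the left-hand side type?
COMPATIBLE = {("int", "int"), ("float", "float"), ("float", "int"), ("string", "string")}

def check_assignment(lhs_type, rhs_type):
    if (lhs_type, rhs_type) not in COMPATIBLE:
        raise TypeError(f"cannot assign a value of type {rhs_type} to a variable of type {lhs_type}")

check_assignment("float", "int")    # accepted: the integer is widened to a float
check_assignment("int", "string")   # rejected: mixing a string with an integer raises TypeError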
 Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code of the source program for
the target machine. It represents a program for some abstract machine and lies between the
high-level language and the machine language. This intermediate code should be generated in
such a way that it is easy to translate into the target machine code.
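For instance, a widely used intermediate form is three-address code, in which each instruction has at most one operator on its right-hand side. The assignment position = initial + rate * 60 used earlier for illustration could be translated into (t1 and t2 are compiler-generated temporaries):

t1 = rate * 60
t2 = initial + t1
position = t2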

 Code Optimization
The next phase performs code optimization on the intermediate code. Optimization
removes unnecessary lines of code and arranges the sequence of
statements so as to speed up program execution without wasting resources (CPU,
memory).
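Continuing the illustrative example above, an optimizer could notice that the temporary t2 exists only to copy a value into position and eliminate it:

t1 = rate * 60
position = initial + t1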

 Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code
and maps it to the target machine language. The code generator translates the intermediate
code into a sequence of (generally) relocatable machine code. This sequence of machine
instructions performs the same task as the intermediate code.
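As an illustration only, the optimized intermediate code above might be mapped onto a simple register-based target; the mnemonics (LD, MUL, ADD, ST) and registers R1, R2 below are a hypothetical machine invented for the example:

LD  R2, rate        ; load rate into register R2
MUL R2, R2, #60     ; R2 = R2 * 60
LD  R1, initial     ; load initial into register R1
ADD R1, R1, R2      ; R1 = R1 + R2
ST  position, R1    ; store the result into position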
Symbol Table
The symbol table is a data structure maintained throughout all the phases of a compiler. All identifier
names, along with their types, are stored here. The symbol table makes it easier for the
compiler to quickly search for an identifier's record and retrieve it. The symbol table is also used
for scope management.
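A minimal sketch in Python of a symbol table with scope management (the class and method names are assumptions for illustration):

class SymbolTable:
    """Chain of scopes, each mapping an identifier name to its attributes."""
    def __init__(self):
        self.scopes = [{}]              # the innermost scope is the last element

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, attrs):
        self.scopes[-1][name] = attrs   # e.g. attrs = {"type": "float"}

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                     # undeclared identifier

table = SymbolTable()
table.declare("rate", {"type": "float"})
table.enter_scope()
print(table.lookup("rate"))             # found in an enclosing scope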

Example of Top-Down Parsing


Consider that the input string provided by the lexical analyzer is 'abd' for the following
grammar.
S -> a A d
A -> b | b c
The top-down parser will parse the input string 'abd' and will start creating the parse tree
with the start symbol 'S'. At each step of the top-down parse, the key problem is that of
determining the production to be applied for a nonterminal. Once a production is chosen, the
rest of the parsing process consists of "matching" the terminal symbols in the production body
with the input string.

Now the first input symbol ‘a‘ matches the first leaf node of the tree. So the parser will move
ahead and find a match for the second input symbol ‘b‘.

But the next leaf node of the tree is a non-terminal, i.e. A, which has two productions. Here, the
parser has to choose the A-production that allows the remaining input 'bd' to be derived. So the parser
selects the A-production A -> b.

Now the next leaf node 'b' matches the second input symbol 'b'. Further, the third input
symbol 'd' matches the last leaf node 'd' of the tree, thereby successfully completing the
top-down parse.
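The parse corresponds to the leftmost derivation S => a A d => a b d. Note that if the input had instead been 'abcd', the choice A -> b would fail when 'c' is reached, and the parser would need the alternative A -> b c; how a parser makes, or recovers from, this choice is exactly the issue discussed below.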

Drawback of Top-Down Parsing


 Top-down parsing tries to find the left-most derivation for an input string
ω, which is equivalent to generating a parse tree for the input string ω that starts
from the root and produces the nodes in a pre-defined order.
 The reason that top-down parsing follows the left-most derivation for an input
string ω, and not the right-most derivation, is that the input string ω is scanned
by the parser from left to right, one symbol/token at a time. The left-most
derivation generates the leaves of the parse tree in left-to-right order,
which matches the input scan order.
 In top-down parsing, each terminal symbol produced by the predicted
production of the grammar is compared with the input
symbol pointed to by the string marker. If the match is successful, the
parser can continue. If a mismatch occurs, the prediction has gone
wrong.
 At this point it is necessary to reject the previous prediction. The prediction
that led to the mismatching terminal symbol is rejected and the string
marker (pointer) is reset to the position it had when the rejected production
was chosen. This is known as backtracking.
 Backtracking is the major drawback of top-down parsing.

Backtracking in Top-Down Parsing


Backtracking means that the parser may scan the provided input string repeatedly. If one
production of a non-terminal fails to derive the input string, the parser has to go
back to the position where it chose that production and start deriving the string
again using another production of the same non-terminal. This process may require
repeated scans over the input string, and we refer to it as backtracking.
Backtracking parsers are rarely used, because they are quite inefficient at
parsing programming languages.
Top-down parsers start from the root node (start symbol) and match the input string
against the production rules, replacing non-terminals as they go (if matched).
To understand this, take the following example of a CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For the input string 'read', a top-down parser will behave like this:
It will start with S from the production rules and will match its yield against the left-most
letter of the input, i.e. 'r'. The first production of S (S → rXd) matches it, so the
top-down parser advances to the next input letter (i.e. 'e'). The parser then tries to expand the
non-terminal 'X' and checks its productions from the left (X → oa). This does not match
the next input symbol, so the top-down parser backtracks to try the next
production of X, (X → ea).
Now the parser matches all of the input letters in order, and the string is
accepted.
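A minimal sketch in Python of such a backtracking top-down parser for this grammar (the function and data-structure names are assumptions for illustration):

# Backtracking top-down parser sketch for:  S -> rXd | rZd,  X -> oa | ea,  Z -> ai
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}

def parse(symbol, text, pos):
    """Try to derive text starting at pos from symbol; return the new position, or None on failure."""
    if symbol not in GRAMMAR:                  # terminal symbol: it must match the current input symbol
        if pos < len(text) and text[pos] == symbol:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:         # try each alternative in order
        p = pos
        for sym in production:
            p = parse(sym, text, p)
            if p is None:                      # mismatch: backtrack and try the next alternative
                break
        else:
            return p
    return None

print(parse("S", "read", 0) == len("read"))    # True: the string is accepted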

Top-Down Parsing without Backtracking


In this approach, once a production rule is applied, it cannot be undone.

These are of two types:


1. Recursive Descent Parsing
2. Predictive Parsing, also called Non-Recursive Parsing, LL(1) Parsing or Table-Driven
Parsing

Recursive Descent Parsing


A top-down parser that implements a set of recursive procedures to process the input without
backtracking is known as a recursive-descent parser, and this style of parsing is known as recursive-descent parsing.
This parsing technique recursively parses the input to build a parse tree.
It is regarded as recursive because it works on a context-free grammar, which is recursive in
nature.
To implement a recursive-descent parser (a sketch follows the list below), the grammar must have the following
properties:
1. It should not be left recursive.
2. It should be left-factored (alternatives should not have common prefixes).
3. The implementation language should support recursion.
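As a sketch, here is a recursive-descent parser in Python with one symbol of lookahead for the earlier grammar S -> a A d, A -> b | b c, rewritten in left-factored form as A -> b A', A' -> c | epsilon (the class and helper names are assumptions for illustration):

# Recursive-descent parser sketch (no backtracking) for the left-factored grammar:
#   S  -> a A d
#   A  -> b A'
#   A' -> c | epsilon
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def lookahead(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match(self, terminal):
        if self.lookahead() == terminal:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {terminal!r} at position {self.pos}")

    def S(self):                    # S -> a A d
        self.match("a"); self.A(); self.match("d")

    def A(self):                    # A -> b A'
        self.match("b"); self.A_prime()

    def A_prime(self):              # A' -> c | epsilon, decided by one symbol of lookahead
        if self.lookahead() == "c":
            self.match("c")
        # otherwise take the epsilon alternative and consume nothing

def accepts(text):
    parser = Parser(text)
    parser.S()
    return parser.pos == len(text)  # the whole input must be consumed

print(accepts("abd"), accepts("abcd"))   # True True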
