Compiler Design
Programming languages are notations for describing computations to people and to machines.
But, before a program can be run, it first must be translated into a form in which it can be
executed by a computer. The software systems that do this translation are called compilers.
A compiler is a program that can read a program in one language - the source language - and
translate it into an equivalent program in another language - the target language. An
important role of the compiler is to report any errors in the source program that it detects
during the translation process.
LANGUAGE PROCESSING SYSTEM
A computer is a combination of software and hardware. Hardware is simply a piece of
electronic equipment, and its functions are controlled by the relevant software. The
hardware understands only instructions written in binary format, which is just a series of 0s
and 1s. Writing such code directly would be an inconvenient and complicated task for
programmers, so we write programs in a high-level language, which is convenient for us to
comprehend and remember. These programs are then fed into a series of tools and
operating system (OS) components to obtain the code that can be used by the
machine. This is known as a language processing system.
In a language processing system, the source code is first preprocessed. The modified source
program is processed by the compiler to form the target assembly program, which is then
translated by the assembler into relocatable object code. The linker and loader then process
this object code to create the target program. Each language translator in this chain can be
defined by the input it takes and the output it produces, as described below.
Components of Language processing system:
Preprocessor –
The preprocessor includes all header files and expands any macros that are used. (A
macro is a piece of code that is given a name. Whenever the name is used, it is
replaced by the contents of the macro. The purpose of macros is either to automate
frequently used sequences or to enable more powerful abstraction.) It takes source
code as input and produces modified source code as output. The preprocessor is also
known as a macro evaluator. Preprocessing is optional: if a language does not support
#include and macros, it is not required.
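The macro replacement described above can be sketched in a few lines of Python. This is
only a toy illustration, not how a real C preprocessor works: it handles object-like
"#define NAME value" lines only, and the function name preprocess is an assumption made
for this example.

```python
import re

def preprocess(source: str) -> str:
    """Expand object-like '#define NAME value' macros (toy sketch)."""
    macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # '#define NAME replacement text'
            parts = stripped.split(None, 2)
            name = parts[1]
            body = parts[2] if len(parts) > 2 else ""
            macros[name] = body
            continue  # directives do not appear in the modified source
        # Replace whole-word occurrences of each macro name with its body.
        for name, body in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda m: body, line)
        output.append(line)
    return "\n".join(output)

source = """#define SIZE 10
int buffer[SIZE];
int RESIZE = SIZE + 1;"""
print(preprocess(source))
```

Note that RESIZE is left untouched because only whole-word matches of SIZE are replaced.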
Compiler –
The compiler takes the modified source code as input. The compiler may produce an
assembly-language program as its output, because assembly language is easier to
produce as output and is easier to debug.
Assembler –
The target assembly code is then processed by a program called an assembler that
produces relocatable machine code as its output.
Linker/ Loader –
Large programs are often compiled in pieces, so the relocatable machine code may
have to be linked together with other relocatable object files and library files into the
code that actually runs on the machine. The linker resolves external memory
addresses, where the code in one file may refer to a location in another file. The
loader then puts together all of the executable object files into memory for execution.
Put another way, a linker (or link editor) is a program that takes a collection of
object files (created by assemblers and compilers) and combines them into an
executable program. The loader places the linked program in main memory for execution.
Executable code –
It is low-level, machine-specific code that the machine can execute directly. Once
the linker and loader have done their job, the object code has finally been converted
into executable code.
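To make the linker's job of resolving external addresses more concrete, here is a minimal
sketch in Python. The object-file layout used here (dictionaries with "code", "defines"
and "refs" keys) and the instruction names are hypothetical, chosen only for illustration;
real object formats are far richer.

```python
def link(objects):
    """Concatenate object code and patch symbol references with final addresses."""
    symbol_addr = {}
    bases = []
    base = 0
    # Pass 1: give each object a load offset and record where its symbols land.
    for obj in objects:
        bases.append(base)
        for name, offset in obj["defines"].items():
            if name in symbol_addr:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_addr[name] = base + offset
        base += len(obj["code"])
    # Pass 2: copy the code and fill in each external reference.
    image = []
    for obj, _ in zip(objects, bases):
        code = list(obj["code"])
        for offset, name in obj["refs"].items():
            if name not in symbol_addr:
                raise ValueError(f"undefined symbol: {name}")
            code[offset] = symbol_addr[name]
        image.extend(code)
    return image, symbol_addr

# Two hypothetical relocatable objects: main calls helper, defined in another file.
main_obj = {"code": ["CALL", None, "HALT"], "defines": {"main": 0}, "refs": {1: "helper"}}
lib_obj = {"code": ["PUSH", 42, "RET"], "defines": {"helper": 0}, "refs": {}}

image, symbols = link([main_obj, lib_obj])
print(image)    # ['CALL', 3, 'HALT', 'PUSH', 42, 'RET']
print(symbols)  # {'main': 0, 'helper': 3}
```

The placeholder None in main_obj is the unresolved address; after linking it holds 3, the
position of helper in the combined program.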
STRUCTURE OF A COMPILER
A compiler is a translating program that translates the instructions of a high-level
language into machine-level language. The program that is input to the compiler is called
the source program. The machine-level program that the compiler produces from it
is known as the object code.
Some compilers convert the high-level language to an assembly language as an intermediate
step, whereas others convert it directly to machine code. This process of converting the
source code into machine code is called compilation. The compiler lists all the errors if the
input code does not follow the rules of its language. The main purpose of a compiler is to
translate code written in one language into another without changing the meaning of the
program.
We can analyze source code in three main steps, which are further divided into
different phases. The three steps are:
1. Linear Analysis
Here, the compiler reads the characters of the code from left to right and groups together
the characters that have a collective meaning. We call these groups tokens.
2. Hierarchical Analysis
According to their collective meaning, we group the tokens hierarchically in a nested manner.
3. Semantic Analysis
In this step, we check whether the components of the source code fit together meaningfully.
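The first of these steps, linear analysis, can be sketched with regular expressions. The
token names below (NUMBER, ID, OP and so on) and the sample statement are assumptions made
for this example; a real language would define many more token classes.

```python
import re

# Each token class is a named regular expression; order matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Scan the text left to right and group characters into (kind, lexeme) tokens."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace carries no meaning here
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at position {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens

print(tokenize("position = initial + rate * 60"))
```

The seven tokens produced for this statement are the input that hierarchical analysis
would then arrange into a nested structure.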
STRUCTURE/PHASES OF A COMPILER
The compilation process is a sequence of various phases. Each phase takes input from its
previous stage, has its own representation of the source program, and feeds its output to
the next phase of the compiler.
A compiler basically has two parts, namely the analysis phase and the synthesis
phase. The analysis phase creates an intermediate representation from the given source
code. The synthesis phase creates an equivalent target program from the intermediate
representation.
The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to create an intermediate
representation of the source program. If the analysis part detects that the source program is
either syntactically ill formed or semantically unsound, then it must provide informative
messages, so the user can take corrective action. The analysis part also collects information
about the source program and stores it in a data structure called a symbol table, which is
passed along with the intermediate representation to the synthesis part.
The synthesis part constructs the desired target program from the intermediate representation
and the information in the symbol table. The analysis part is often called the front end of the
compiler; the synthesis part is the back end.
Syntax Analysis
Syntax analysis, or parsing, follows lexical analysis. It takes the tokens produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements
are checked against the source language grammar, i.e. the parser checks whether the
expression made by the tokens is syntactically correct.
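As an illustration of how a parser turns tokens into a syntax tree, here is a small
recursive-descent sketch. The grammar (expr, term, factor for arithmetic expressions) and
the use of nested tuples as tree nodes are assumptions made for this example, not something
prescribed by these notes.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree.

    Assumed grammar:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isalnum():
            return advance()
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["a", "+", "b", "*", "2"]))  # ('+', 'a', ('*', 'b', '2'))
```

The nesting of the result shows that multiplication binds tighter than addition, which is
exactly the structure the grammar encodes.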
Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the language.
For example, it checks that assignment of values is between compatible data types, and it
reports errors such as adding a string to an integer. The semantic analyzer also keeps track
of identifiers, their types and expressions, and checks whether identifiers are declared
before use. It produces an annotated syntax tree as output.
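The two checks mentioned above (compatible types, and declaration before use) can be
sketched as a recursive walk over a syntax tree. The tree shape (nested tuples), the type
names "int" and "string", and the function name check are assumptions for this example.
For brevity the sketch returns only the type of the expression rather than a fully
annotated tree.

```python
def check(node, declared):
    """Return the type of an expression node, or raise TypeError on a semantic error.

    `declared` maps identifier names to their declared types.
    """
    if isinstance(node, int):
        return "int"
    if isinstance(node, str):
        if node.startswith('"'):
            return "string"  # string literal, written with its quotes
        if node not in declared:
            raise TypeError(f"identifier {node!r} used before declaration")
        return declared[node]
    op, left, right = node
    left_type = check(left, declared)
    right_type = check(right, declared)
    if left_type != right_type:
        raise TypeError(f"cannot apply {op!r} to {left_type} and {right_type}")
    return left_type

print(check(("+", "count", 1), {"count": "int"}))  # int

try:
    check(("+", '"abc"', 1), {})
except TypeError as err:
    print("semantic error:", err)
```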
Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code from the source code.
It represents a program for some abstract machine and sits between the high-level
language and the machine language. This intermediate code should be generated in
such a way that it is easy to translate into the target machine code.
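One common form of intermediate code is three-address code, in which each instruction has
at most one operator. The sketch below assumes a syntax tree given as nested tuples and
invents temporary names t1, t2, and so on; both choices are assumptions for illustration.

```python
def to_three_address(tree):
    """Flatten a nested-tuple expression tree into three-address instructions."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if not isinstance(node, tuple):
            return str(node)  # a name or constant needs no instruction
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result

code, result = to_three_address(("+", "initial", ("*", "rate", 60)))
for line in code:
    print(line)
# t1 = rate * 60
# t2 = initial + t1
```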
Code Optimization
The next phase optimizes the intermediate code. Optimization can be thought of as
removing unnecessary lines of code and rearranging the sequence of statements to
speed up program execution without wasting resources (CPU, memory).
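Constant folding is one simple example of such an optimization: operations whose operands
are already known are computed at compile time instead of at run time. The sketch below
assumes instructions are (dest, op, left, right) tuples, that each temporary is assigned
only once, and that the final result is not itself a constant. All of these are
simplifications made for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Fold operations on known constants and propagate the results."""
    known = {}       # temporaries whose value is known at compile time
    optimized = []
    for dest, op, left, right in code:
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            # Both operands are constants: compute now, emit no instruction.
            known[dest] = OPS[op](left, right)
        else:
            optimized.append((dest, op, left, right))
    return optimized

code = [
    ("t1", "*", 60, 60),
    ("t2", "*", "rate", "t1"),
    ("t3", "+", "initial", "t2"),
]
print(fold_constants(code))
# [('t2', '*', 'rate', 3600), ('t3', '+', 'initial', 't2')]
```

The first instruction disappears entirely, and its value 3600 is substituted where t1 was
used.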
Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code
and maps it to the target machine language. The code generator translates the intermediate
code into a sequence of (generally) relocatable machine code. This sequence of machine
instructions performs the same task as the intermediate code would.
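The mapping from intermediate code to target instructions can be sketched as follows. The
target here is a made-up single-register machine: the mnemonics LOAD, ADD, SUB, MUL, DIV
and STORE, the register R0, and the "#" prefix for constants are all hypothetical and do
not correspond to any real instruction set.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def operand(value):
    """Render an operand: constants get a '#' prefix, names are used as-is."""
    return f"#{value}" if isinstance(value, int) else str(value)

def generate(code):
    """Translate (dest, op, left, right) instructions into pseudo-assembly."""
    asm = []
    for dest, op, left, right in code:
        asm.append(f"LOAD R0, {operand(left)}")
        asm.append(f"{MNEMONIC[op]} R0, {operand(right)}")
        asm.append(f"STORE {dest}, R0")
    return asm

for line in generate([("t1", "*", "rate", 60), ("t2", "+", "initial", "t1")]):
    print(line)
# LOAD R0, rate
# MUL R0, #60
# STORE t1, R0
# LOAD R0, initial
# ADD R0, t1
# STORE t2, R0
```

A real code generator would also decide which values stay in registers; this sketch
stores every result back to memory to keep the mapping easy to follow.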
Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All the
identifiers' names, along with their types, are stored here. The symbol table makes it easy
for the compiler to quickly search for an identifier's record and retrieve it. The symbol
table is also used for scope management.
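A symbol table with scope management is often built as a stack of dictionaries, one per
scope. The sketch below takes that approach; the class and method names (SymbolTable,
enter_scope, exit_scope, declare, lookup) are assumptions made for this example.

```python
class SymbolTable:
    """A stack of scopes, each mapping identifier names to their types."""

    def __init__(self):
        self.scopes = [{}]  # the first dictionary is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = type_

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None  # not declared anywhere

table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")   # shadows the outer x
print(table.lookup("x"))      # float
table.exit_scope()
print(table.lookup("x"))      # int
print(table.lookup("y"))      # None
```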
Top-Down Parsing
Now the first input symbol 'a' matches the first leaf node of the tree. So the parser will move
ahead and find a match for the second input symbol 'b'.
But the next leaf node of the tree is a non-terminal, i.e. A, that has two productions. Here, the
parser has to choose the A-production that can derive the string 'abd'. So the parser identifies
the A-production A -> b.
Now the next leaf node 'b' matches the second input symbol 'b'. Further, the third input
symbol 'd' matches the last leaf node 'd' of the tree, thereby successfully completing the
top-down parsing.
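The walk-through above can be reproduced with a small backtracking top-down parser. The
grammar is not stated in this section, so the sketch assumes S -> a A d and A -> b | c,
which is consistent with the steps described; treat it as an assumption rather than the
grammar the example was originally built on.

```python
# Assumed grammar: S -> a A d ; A -> b | c
GRAMMAR = {
    "S": [["a", "A", "d"]],
    "A": [["b"], ["c"]],
}

def derive(symbols, text):
    """Return True if the symbol sequence can derive exactly `text`.

    Expands the leftmost symbol first and backtracks over alternative productions.
    """
    if not symbols:
        return text == ""
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each production in order until one succeeds.
        return any(derive(production + rest, text) for production in GRAMMAR[head])
    # Terminal: it must match the next input symbol.
    return text.startswith(head) and derive(rest, text[len(head):])

print(derive(["S"], "abd"))  # True
print(derive(["S"], "abc"))  # False
```

For the input abd the parser matches a, expands A with the production A -> b, and then
matches d, mirroring the steps in the text.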