
Chapter One: Compiler Design

This document provides an overview of compiler design, explaining the role of compilers in translating source code into machine code and detailing the phases of compilation, including analysis and synthesis. It discusses various components such as lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. Additionally, it highlights the importance of error detection, symbol table management, and the functions of preprocessors and assemblers in the compilation process.


Compiler Design

Chapter One
1.1. INTRODUCTION TO COMPILERS

 Computer programs are formulated in a programming language and specify classes of computing
processes. Computers, however, interpret sequences of particular instructions, but not program texts.
Therefore, the program text must be translated into a suitable instruction sequence before it can be
processed by a computer. This translation can be automated, which implies that it can be formulated
as a program itself. The translation program is called a compiler, and the text to be translated is
called source text (or sometimes source code).
 A compiler is a program that reads a program written in one language - the source language - and
translates it into an equivalent program in another language - the target language - without changing
the meaning of the program. More specifically, a compiler takes a computer program and translates it
into an object program. Other tools associated with the compiler are responsible for turning the
object program into executable form.
 The compiler reports to its user the presence of errors in the source program.

Figure 1.1: Major function of Compiler


 Source Program:- Normally a program written in a high-level programming language, that is, a
text built from the rules, symbols, and special words that the language provides for constructing programs.
 Target Program:- Normally the equivalent program in machine code. It contains the binary
representation of the instructions that the computer's hardware can perform.
 Error Message:- A message issued by the compiler when it detects an error, such as a syntax error,
in the source program.
Features of Compiler
 Correctness
 Speed of compilation
 Preserve the correct meaning of the code
 The speed of the target code
 Recognize the legal or illegal program constructs
 Helps in code debugging

Compiled by Mr. Abraham Wolde Department of Computer Science 1


1.1.1. PARTS OF COMPILATION

There are two parts to compilation:-


1) Analysis
2) Synthesis
The analysis part breaks up the source program into constituent pieces and creates an intermediate
representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation. Of the
two parts, synthesis requires the most specialized techniques.
During analysis, the operations implied by the source program are determined and recorded in a
hierarchical structure called a tree.
Often, a special kind of tree called a syntax tree is used, in which each node represents an operation and
the children of a node represent the arguments of the operation.

Many software tools that manipulate source programs first perform some kind of analysis. Some
examples of such tools are:
 Structure editor
 Pretty printers
 Static checkers
 Interpreters

Structure Editor

 A structure editor takes as input a sequence of commands to build a source program.


 The structure editor not only performs the text-creation and modification functions of an ordinary
text editor, but it also analyzes the program text, putting an appropriate hierarchical structure on
the source program.
 For example, it can check that the input is correctly formed, can supply keywords automatically
(e.g., when the user types while, the editor supplies the matching do and reminds the user that a
conditional must come between them), and can jump from a begin or left parenthesis to its
matching end or right parenthesis.

Pretty Printers

 A pretty printer analyzes a program and prints it in such a way that the structure of the program
becomes clearly visible.
 For example, comments may appear in a special font, and statements may appear with an amount
of indentation proportional to the depth of their nesting in the hierarchical organization of the
statements.

Static Checkers

 A static checker reads a program, analyzes it, and attempts to discover potential bugs without
running the program.
 For example, a static checker may detect that parts of the source program can never be executed.
 It can catch logical errors such as trying to use a real variable as a pointer.

Interpreters

 Instead of producing a target program as a translation, an interpreter performs the operations
implied by the source program.


 For an assignment statement, for example, an interpreter might build a tree like Fig. 1.2, and then
carry out the operations at the nodes as it "walks" the tree.
 Interpreters are frequently used to execute command languages, since each operator executed in a
command language is usually an invocation of a complex routine such as an editor or compiler.
 The analysis portion of each of the preceding tools is similar to that of a conventional
compiler.
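
As a rough illustration of the tree-walking idea, the sketch below evaluates the assignment position := initial + rate * 60 in Python. The tuple layout of the tree, the function name evaluate, and the sample values for initial and rate are assumptions made for this example; they do not come from the notes.

```python
# Hypothetical tree for: position := initial + rate * 60
# Interior nodes are (operator, left, right); leaves are names or numbers.
tree = (":=", "position", ("+", "initial", ("*", "rate", 60)))

def evaluate(node, env):
    """Walk the tree and carry out the operation found at each node."""
    if isinstance(node, (int, float)):   # numeric leaf
        return node
    if isinstance(node, str):            # identifier leaf: look up its value
        return env[node]
    op, left, right = node
    if op == ":=":                       # assignment: store the right-hand side
        env[left] = evaluate(right, env)
        return env[left]
    a, b = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator {op!r}")

env = {"initial": 10.0, "rate": 2.5}     # assumed sample values
evaluate(tree, env)
print(env["position"])                   # 10.0 + 2.5 * 60 = 160.0
```

Nothing is translated here: the operations are performed directly while the tree is traversed, which is the point that separates an interpreter from a compiler.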

1.1.2. ANALYSIS OF THE SOURCE PROGRAM

Analysis consists of three parts


1) Linear Analysis:- also called lexical analysis or scanning. The stream of characters making up the
source program is read from left to right and grouped into tokens, which are sequences of characters
having a collective meaning.
2) Hierarchical Analysis:- also called syntax analysis or parsing. The characters or tokens are
grouped hierarchically into nested collections with collective meaning.
3) Semantic Analysis:- certain checks are performed to ensure that the components of a program fit
together meaningfully; i.e., the source program is checked for semantic errors, and type information
is gathered for the subsequent code generation phase.

1.2. THE PHASES OF A COMPILER

 A compiler operates in phases, each of which transforms the source program from one
representation to another.

 A typical decomposition of a compiler is shown in Fig 1.2.
 The first three phases form the bulk of the analysis portion of a compiler.
 Two other activities, symbol table management and error handling, are shown interacting with the
six phases of the compiler.

Symbol Table Management

 An essential function of a compiler is to record the identifiers used in the source program and collect
information about various attributes of each identifier.
 These attributes may provide information about the storage allocated for an identifier, its type, its
scope and, in the case of procedure names, such things as the number and types of its arguments, the
method of passing each argument and the type returned.
 A symbol table is a data structure containing a record for each identifier, with fields for the attributes
of the identifier. The data structure allows us to find the record for each identifier quickly and to
store or retrieve data from that record quickly.
 When an identifier in the source program is detected by the lexical analyzer, the identifier is entered
into the symbol table.
 However, the attributes of an identifier cannot normally be determined during lexical analysis. For
example, in a Pascal declaration like
var position, initial, rate: real;
 The type real is not known when position, initial, and rate are seen by the lexical analyzer.
 The remaining phases enter information about identifiers into the symbol table and then use this
information in various ways.
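
A minimal sketch of such a symbol table in Python is given below. The class names SymbolRecord and SymbolTable, the choice of fields, and the use of a dictionary for fast lookup are assumptions for illustration; real compilers store more attributes and usually handle nested scopes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SymbolRecord:
    name: str
    type: Optional[str] = None     # not yet known during lexical analysis
    offset: Optional[int] = None   # storage is assigned by a later phase

class SymbolTable:
    def __init__(self):
        self._records = {}         # a dict gives fast lookup by name

    def enter(self, name):
        """Add the identifier if it is new; return its record either way."""
        return self._records.setdefault(name, SymbolRecord(name))

    def lookup(self, name):
        """Return the record for name, or None if it was never entered."""
        return self._records.get(name)

table = SymbolTable()

# Lexical analysis: only the names are known at this point.
for lexeme in ("position", "initial", "rate"):
    table.enter(lexeme)

# A later phase has seen ": real" and fills in the type attribute.
for name in ("position", "initial", "rate"):
    table.lookup(name).type = "real"

print(table.lookup("rate"))
```

The two loops mirror the Pascal example: the lexical analyzer can enter the three names, but the type real is only recorded once the whole declaration has been analyzed.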


Figure 1.2: Phases of Compiler (Block diagram of compiler)

Error Detection and Reporting

 Each phase can encounter errors. However, after detecting an error, a phase must deal with that
error, so that compilation can proceed, allowing further errors in the source program to be
detected.
 The lexical phase can detect errors where the characters remaining in the input do not form any
token of the language.
 Errors where the token stream violates the structure rules of the language are determined by the
syntax analysis phase.
 During semantic analysis the compiler tries to detect constructs that have the right syntactic
structure but no meaning to the operation involved.

1.2.1. THE ANALYSIS PHASES OF THE COMPILER

Lexical Analysis

 The lexical analyzer, or scanner, reads the source program one character at a time, carving the
source program into a sequence of atomic units called tokens.
 The lexical analysis phase reads the characters in the source program and groups them into a
stream of tokens.

 Each token represents a logically cohesive sequence of characters, such as an identifier, a keyword
(if, while, etc.), a punctuation character, or a multi-character operator like :=.
 The character sequence forming a token is called the lexeme for the token.
 Certain tokens will be augmented by a "lexical value."
 For example, when an identifier like rate is found, the lexical analyzer not only generates a token,
say id, but also enters the lexeme rate into the symbol table, if it is not already there.
 Consider the assignment statement
position := initial + rate * 60
 The representation of this statement after lexical analysis is
id1 := id2 + id3 * 60
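
The grouping of characters into tokens can be sketched with Python's re module as follows. The token names, the regular expressions, and the function tokenize are assumptions chosen for this small example; they cover only the statement above, not a full language.

```python
import re

# Assumed token classes for this example only.
TOKEN_SPEC = [
    ("num",    r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("assign", r":="),
    ("op",     r"[+\-*/]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbols); identifiers are numbered in order of first appearance."""
    symbols = {}          # lexeme -> symbol table index
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:     # remaining characters form no token: lexical error
            raise SyntaxError(f"no token starts at {text[pos:]!r}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue      # white space separates tokens but is not one
        if kind == "id":
            index = symbols.setdefault(lexeme, len(symbols) + 1)
            tokens.append(("id", index))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbols

tokens, symbols = tokenize("position := initial + rate * 60")
print(" ".join(f"id{value}" if kind == "id" else value for kind, value in tokens))
# id1 := id2 + id3 * 60
```

Each identifier token carries the index of its symbol table entry as its lexical value, so position, initial and rate become id1, id2 and id3.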

Syntax Analysis

 The second stage of translation is called syntax analysis or parsing. In this phase, expressions,
statements, declarations, etc. are identified by using the results of lexical analysis.
 Syntax analysis is aided by techniques based on the formal grammar of the programming
language.
 It groups tokens together into syntactic structures, usually represented as a syntax tree (Fig. 1.11(a)).
 A typical data structure for the tree is shown in Fig. 1.11(b) in which an interior node is a record
with a field for the operator and two fields containing pointers to the records for the left and right
children.
 A leaf is a record with two or more fields, one to identify the token at the leaf, and the others to
record information about the token.
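
The record layout described above can be sketched in Python with a small recursive-descent parser. The classes Leaf and Node, the function parse, and the tiny grammar (assignment, + and * only) are assumptions for illustration, not the parsing method of any particular compiler.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    kind: str       # identifies the token at the leaf: "id" or "num"
    value: str      # information about the token, e.g. "id3" or "60"

@dataclass
class Node:
    op: str         # field for the operator
    left: object    # pointer to the record for the left child
    right: object   # pointer to the record for the right child

def parse(tokens):
    """stmt -> id := expr ; expr -> term {+ term} ; term -> factor {* factor}"""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok.isdigit():
            return Leaf("num", tok)
        if tok.isidentifier():
            return Leaf("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = Node("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = Node("+", node, term())
        return node

    target = factor()
    if target.kind != "id" or peek() != ":=":
        raise SyntaxError("expected 'id := expr'")
    advance()
    tree = Node(":=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse("id1 := id2 + id3 * 60".split())
print(tree)
```

Because term is called from expr, the multiplication ends up deeper in the tree than the addition, which is how the tree records that id3 * 60 is computed first.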

Semantic Analysis

 An important component of semantic analysis is type checking. Here the compiler checks that
each operator has operands that are permitted by the source language specification.

 For example, many programming language definitions require a compiler to report an error every
time a real number is used to index an array.
 However, the language specification may permit some operand coercions. For example, when a
binary arithmetic operator is applied to an integer and a real, the compiler may need to
convert the integer to a real.
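
Both rules, rejecting a real array index and coercing an integer operand to real, can be sketched as follows. The tuple tree format, the "[]" operator for indexing, the ("array", element type) representation, and the function check are assumptions made for this example.

```python
NUMERIC = ("integer", "real")

def check(node, types):
    """Return (type, tree), wrapping integer operands in inttoreal where needed."""
    if isinstance(node, int):             # integer constant
        return "integer", node
    if isinstance(node, str):             # identifier: type comes from the symbol table
        return types[node], node
    op, left, right = node
    ltype, left = check(left, types)
    rtype, right = check(right, types)
    if op == "[]":                        # array indexing: left[right]
        if not isinstance(ltype, tuple):
            raise TypeError("only arrays can be indexed")
        if rtype != "integer":
            raise TypeError("array index must be an integer")
        return ltype[1], (op, left, right)
    if ltype not in NUMERIC or rtype not in NUMERIC:
        raise TypeError(f"operator {op!r} needs numeric operands")
    if ltype != rtype:                    # mixed integer and real: coerce the integer
        if ltype == "integer":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return "real", (op, left, right)
    return ltype, (op, left, right)

types = {"initial": "real", "rate": "real", "a": ("array", "real")}
print(check(("+", "initial", ("*", "rate", 60)), types))
# ('real', ('+', 'initial', ('*', 'rate', ('inttoreal', 60))))
```

With these assumed types, check(("[]", "a", "rate"), types) raises TypeError, because rate is real and therefore not a legal index.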

Intermediate Code Generation

 After syntax and semantic analysis, some compilers generate an explicit intermediate
representation of the source program, which stands between the source text and the final machine
language code.
 This phase bridges the analysis and synthesis phases of translation.
 This intermediate representation should have two important properties. It should be easy to
produce and easy to translate into the target program.
 The intermediate representation can have a variety of forms and one of the forms is called “Three
address code”, which is like the assembly language for a machine in which every memory location
can act like a register.
 Three address code consists of a sequence of instructions, each of which has at most three
operands.
 Three address code for the statement position := initial + rate * 60 is
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

This intermediate form has several properties.

 First, each three-address instruction has at most one operator in addition to the assignment. Thus,
when generating these instructions, the compiler has to decide on the order in which operations
are to be done; the multiplication precedes the addition in the source statement.
 Second, the compiler must generate a temporary name to hold the value computed by each
instruction.
 Third, some "three address" instructions have fewer than three operands, e.g., the first and last
instructions above.
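
One way to produce such instructions is a post-order walk of the syntax tree that invents a new temporary for every operator. The sketch below assumes a tuple tree in which the inttoreal conversion has already been inserted by semantic analysis; the function names gen_tac, new_temp and walk are hypothetical.

```python
def gen_tac(tree):
    """Translate an assignment tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        if not isinstance(node, tuple):       # leaf: identifier or constant
            return str(node)
        if node[0] == "inttoreal":            # one-operand instruction
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({arg})")
            return temp
        op, left, right = node
        left_name = walk(left)                # operands first ...
        right_name = walk(right)
        temp = new_temp()                     # ... then the operator itself
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    result = walk(expr)
    code.append(f"{target} := {result}")
    return code

tree = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", 60))))
for line in gen_tac(tree):
    print(line)
# temp1 := inttoreal(60)
# temp2 := id3 * temp1
# temp3 := id2 + temp2
# id1 := temp3
```

The order of the output follows from visiting children before their parent, so the multiplication is emitted before the addition that uses its result.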
Code Optimization
 The code optimization phase attempts to improve the intermediate code, so that faster-running
machine code that takes less space will result.

 The above intermediate code can be optimized to
temp1 := id3 * 60.0
id1 := id2 + temp1
 The inttoreal operation can be eliminated by converting the integer 60 into the real 60.0 once, at
compile time, and temp3 is used only once, to transmit its value to id1, so it can be eliminated too.
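
The two improvements just described can be sketched as small passes over a list of instructions. The tuple format (dest, op, arg1, arg2), the "copy" operator name, and the functions optimize and show are assumptions for this example; production optimizers rely on far more general analyses.

```python
def optimize(code):
    """Fold inttoreal on constants, remove single-use copies, renumber temporaries."""
    # Pass 1: perform inttoreal on integer constants at compile time.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttoreal" and a.isdigit():
            consts[dest] = str(float(a))      # "60" becomes "60.0"
        else:
            folded.append((dest, op, a, b))

    # Pass 2: if a temporary only transmits its value to a copy, write the target directly.
    uses = {}
    for _, _, a, b in folded:
        for name in (a, b):
            if name is not None:
                uses[name] = uses.get(name, 0) + 1
    result = []
    for dest, op, a, b in folded:
        if (op == "copy" and a.startswith("temp") and result
                and result[-1][0] == a and uses[a] == 1):
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))

    # Pass 3: renumber the surviving temporaries from temp1.
    rename = {}
    def fresh(name):
        if name is not None and name.startswith("temp"):
            return rename.setdefault(name, f"temp{len(rename) + 1}")
        return name
    return [(fresh(d), op, fresh(a), fresh(b)) for d, op, a, b in result]

def show(instr):
    dest, op, a, b = instr
    if op == "copy":
        return f"{dest} := {a}"
    return f"{dest} := {op}({a})" if b is None else f"{dest} := {a} {op} {b}"

code = [("temp1", "inttoreal", "60", None), ("temp2", "*", "id3", "temp1"),
        ("temp3", "+", "id2", "temp2"), ("id1", "copy", "temp3", None)]
for instr in optimize(code):
    print(show(instr))
# temp1 := id3 * 60.0
# id1 := id2 + temp1
```

The sketch only removes a copy when the temporary was defined by the immediately preceding instruction and is used nowhere else, which is a deliberately conservative assumption.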

Code Generation

 The final phase of the compiler is the generation of target code, consisting normally of relocatable
machine code or assembly code. Memory locations are selected for each of the variables used by
the program.
 Then, intermediate instructions are each translated into a sequence of machine instructions that
perform the same task.
 A crucial aspect is the assignment of variables to registers. For example, using registers 1 and 2,
the translation of the optimized intermediate code might become
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
 The first and second operands of each instruction specify a source and destination, respectively.
The F in each instruction tells us that instructions deal with floating-point numbers. The # signifies
that 60.0 is to be treated as a constant.
 This code moves the contents of the address id3 into register 2, then multiplies it by the real
constant 60.0.
 The third instruction moves id2 into register 1 and adds to it the value previously computed in
register 2. Finally, the value in register 1 is moved into the address of id1.
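
A naive translation of this kind can be sketched as below. The instruction tuple format, the two-register pool, and the functions operand and generate are assumptions for illustration; the sketch performs no spilling and would fail if more than two values had to be held in registers at once.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}

def operand(name, location):
    """Render an operand as a register, a #constant, or a memory name."""
    if name in location:
        return location[name]
    if name[0].isdigit():
        return f"#{name}"
    return name

def generate(tac):
    """Translate (dest, op, arg1, arg2) instructions into assembly text."""
    free = ["R1", "R2"]     # assumed pool of available registers
    location = {}           # temporary name -> register currently holding it
    asm = []
    for dest, op, a, b in tac:
        reg = free.pop()
        asm.append(f"MOVF {operand(a, location)}, {reg}")
        asm.append(f"{OPCODES[op]} {operand(b, location)}, {reg}")
        for name in (a, b):             # a consumed temporary frees its register
            if name in location:
                free.append(location.pop(name))
        if dest.startswith("temp"):
            location[dest] = reg        # keep the temporary in the register
        else:
            asm.append(f"MOVF {reg}, {dest}")   # store a program variable
            free.append(reg)
    return asm

tac = [("temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "temp1")]
for line in generate(tac):
    print(line)
# MOVF id3, R2
# MULF #60.0, R2
# MOVF id2, R1
# ADDF R2, R1
# MOVF R1, id1
```

Keeping temp1 in a register rather than in memory is what makes the second instruction able to add R2 directly, which is the register-assignment decision the text calls crucial.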


EXERCISE
Write the output of all the phases of the compiler for the following statements:
1) x=b-c*2
2) I=P*n*r/100
3) E=M * C**2

1.3. COUSINS OF THE COMPILER

The input to a compiler may be produced by one or more preprocessors, and further processing of the
compiler's output may be needed before running machine code is obtained.

1.3.1. Overview of Language Processing System

Figure 1: Language Processing System


Preprocessors
A preprocessor, generally considered a part of the compiler, is a tool that produces input for compilers. It
deals with macro processing, augmentation, file inclusion, language extension, etc. Preprocessors may
perform the following functions:
1) Macro Processing:- A preprocessor may allow a user to define macros that are shorthands for
longer constructs.
2) File Inclusion:- A preprocessor may include header files into the program text. For example, the C
preprocessor causes the contents of the file <global.h> to replace the statement #include <global.h>
when it processes a file containing this statement.
3) Rational Preprocessors:- These processors augment older languages with more modern flow-of-
control and data-structuring facilities. For example, such a preprocessor might provide the user with
built-in macros for constructs like while-statements or if-statements, where none exist in the
programming language itself.
4) Language Extensions:- These preprocessors attempt to add capabilities to the language by what
amounts to built-in macros. For example, the language Equel is a database query language
embedded in C. Statements beginning with ## are taken by the preprocessor to be database-access
statements, unrelated to C, and are translated into procedure calls on routines that perform the
database access.
Macro processors deal with two kinds of statement:-
 macro definition
 macro use
Definitions are normally indicated by some unique character or keyword, like define or macro. They
consist of a name for the macro being defined and a body, forming its definition.

The use of a macro consists of naming the macro and supplying actual parameters, that is, values for its
formal parameters. The macro processor substitutes the actual parameters for the formal parameters in
the body of the macro; the transformed body then replaces the macro use itself.
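
The substitution step can be sketched in Python as follows. The one-line definition syntax "define NAME(p1, p2) body" and the function expand are hypothetical, chosen only to keep the example short; the sketch assumes every use supplies one actual per formal and does not handle nested parentheses in the actual parameters.

```python
import re

# Hypothetical definition syntax: define NAME(formal, formal, ...) body
DEFINITION = re.compile(r"define\s+(\w+)\(([^)]*)\)\s+(.*)")

def expand(source):
    """Record macro definitions and replace each macro use by its transformed body."""
    macros = {}    # macro name -> (formal parameter names, body)
    output = []
    for line in source.splitlines():
        definition = DEFINITION.match(line)
        if definition:
            name, formals, body = definition.groups()
            macros[name] = ([p.strip() for p in formals.split(",")], body)
            continue                   # a definition produces no output itself
        for name, (formals, body) in macros.items():
            def replace_use(use, formals=formals, body=body):
                actuals = [a.strip() for a in use.group(1).split(",")]
                binding = dict(zip(formals, actuals))
                names = "|".join(re.escape(f) for f in formals)
                # Substitute all formals in one pass so actuals are never re-substituted.
                return re.sub(rf"\b({names})\b", lambda m: binding[m.group(1)], body)
            line = re.sub(rf"\b{re.escape(name)}\(([^)]*)\)", replace_use, line)
        output.append(line)
    return "\n".join(output)

program = "define AREA(w, h) (w * h)\ntotal := AREA(x, 3) + 1"
print(expand(program))
# total := (x * 3) + 1
```

Replacing all formals in a single pass matters when an actual parameter has the same name as another formal; a formal-by-formal replacement would rewrite it twice.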

Assemblers

 Some compilers produce assembly code that is passed to an assembler for further processing. Other
compilers perform the job of the assembler, producing relocatable machine code that can be passed
directly to the loader/link-editor.
 Assembly code is a mnemonic version of machine code, in which names are used instead of binary
codes for operations, and names are also given to memory addresses.
 A typical sequence of assembly instructions might be
MOV a, R1
ADD #2, R1
MOV R1, b
 This code moves the contents of the address a into register 1, then adds the constant 2 to it, and
finally stores the result in the location named by b. Thus, it computes b := a + 2.

Two Pass Assembly

 The simplest form of assembler makes two passes over the input, where a pass consists of reading an
input file once. In the first pass, all the identifiers that denote storage locations are found and stored
in a symbol table.
 Identifiers are assigned storage locations as they are encountered for the first time, so after reading
the instructions above, the symbol table might contain the entries shown below. Here we have
assumed that a word, consisting of four bytes, is set aside for each identifier, and that addresses are
assigned starting from byte 0.
Symbol table
Identifier    Address
a             0
b             4

 In the second pass, the assembler scans the input again. This time, it translates each operation code
into the sequence of bits representing that operation in machine language, and it translates each
identifier representing a location into the address given for that identifier in the symbol table.
 The output of the second pass is usually relocatable machine code, meaning that it can be loaded
starting at any location L in memory.
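
The two passes can be sketched as follows for the instructions that compute b := a + 2. The bit patterns in OPCODES are made up for this example, and the helper names parse_line, pass_one and pass_two are hypothetical; the output keeps registers and constants symbolic instead of encoding a full instruction format.

```python
import re

OPCODES = {"MOV": "0001", "ADD": "0011"}    # made-up bit patterns

def is_identifier(operand):
    """True for names of storage locations; False for registers and #constants."""
    return not operand.startswith("#") and not re.fullmatch(r"R\d+", operand)

def parse_line(line):
    mnemonic, rest = line.split(None, 1)
    return mnemonic, [operand.strip() for operand in rest.split(",")]

def pass_one(lines, word_size=4):
    """First pass: assign an address to each identifier when it is first seen."""
    symbols = {}
    for line in lines:
        _, operands = parse_line(line)
        for operand in operands:
            if is_identifier(operand) and operand not in symbols:
                symbols[operand] = len(symbols) * word_size
    return symbols

def pass_two(lines, symbols):
    """Second pass: replace mnemonics by bit codes and identifiers by addresses."""
    output = []
    for line in lines:
        mnemonic, operands = parse_line(line)
        resolved = [str(symbols[o]) if is_identifier(o) else o for o in operands]
        output.append((OPCODES[mnemonic], *resolved))
    return output

program = ["MOV a, R1", "ADD #2, R1", "MOV R1, b"]
symbols = pass_one(program)
print(symbols)                      # {'a': 0, 'b': 4}
for word in pass_two(program, symbols):
    print(word)
# ('0001', '0', 'R1')
# ('0011', '#2', 'R1')
# ('0001', 'R1', '4')
```

The addresses 0 and 4 are relative to the start of the data area, which is what makes the result relocatable: a loader can add the actual starting location L to each of them.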

Loaders and Link-editors

 Usually, a program called a loader performs the two functions of loading and link-editing.
 The process of loading consists of taking relocatable machine code, altering the relocatable
addresses, and placing the altered instructions and data in memory at the proper locations.
 The link-editor allows us to make a single program from several files of relocatable machine code.
These files may have been the result of several different compilations, and one or more may be
library files of routines provided by the system and available to any program that needs them.
 If the files are to be used together in a useful way, there may be some external references, in which
the code of one file refers to a location in another file. This reference may be to a data location
defined in one file and used in another, or it may be to the entry point of a procedure that appears in
the code for one file and is called from another file.
 The relocatable machine code file must retain the information in the symbol table for each data
location or instruction label that is referred to externally. If we do not know in advance what might
be referred to, we in effect must include the entire assembler symbol table as part of the relocatable
machine code.
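
Relocation and the resolution of external references can be sketched together as below. The module layout (a dict with the keys code, defs, relocs and externs), word-sized addressing, and the function link_and_load are assumptions invented for this example; real object file formats are considerably richer.

```python
def link_and_load(modules, load_address):
    """Combine relocatable modules into one memory image starting at load_address."""
    # Lay the modules out one after another and note where each one begins.
    bases, total = [], 0
    for module in modules:
        bases.append(total)
        total += len(module["code"])

    # Global symbol table: each externally visible name -> its final address.
    global_symbols = {}
    for base, module in zip(bases, modules):
        for name, offset in module["defs"].items():
            global_symbols[name] = load_address + base + offset

    memory = []
    for base, module in zip(bases, modules):
        code = list(module["code"])
        for index in module["relocs"]:                  # module-relative addresses
            code[index] += load_address + base
        for index, name in module["externs"].items():   # references into other files
            code[index] = global_symbols[name]
        memory.extend(code)
    return memory

# Word 1 of main holds a module-relative address; word 3 refers to helper in lib.
main = {"code": [10, 2, 20, 0], "defs": {"main": 0}, "relocs": [1], "externs": {3: "helper"}}
lib = {"code": [30, 40], "defs": {"helper": 1}, "relocs": [], "externs": {}}

print(link_and_load([main, lib], load_address=100))
# [10, 102, 20, 105, 30, 40]
```

The defs and externs entries play the role of the symbol table information that each relocatable file must retain: without them the reference to helper could not be filled in.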

1.4. COMPILER CONSTRUCTION TOOLS

In addition to general software development tools, more specialized tools have been developed to
help implement various phases of a compiler.
Some general tools have been created for the automatic design of specific compiler components. These
tools use specialized languages for specifying and implementing the component, and many use
algorithms that are quite sophisticated.
The following is a list of some useful compiler construction tools:
1) Parser generators:- These produce syntax analyzers, normally from input that is based on a
context-free grammar. In early compilers, syntax analysis consumed not only a large fraction of the
running time of a compiler, but a large fraction of the intellectual effort of writing a compiler. This
phase is now considered one of the easiest to implement. Many parser generators utilize powerful
parsing algorithms that are too complex to be carried out by hand.

2) Scanner generators:- These automatically generate lexical analyzers, normally from a specification
based on regular expressions. The basic organization of the resulting lexical analyzer is in effect a
finite automaton.
3) Syntax-directed translation engines:- These produce collections of routines that walk the parse
tree, generating intermediate code. The basic idea is that one or more "translations" are associated with
each node of the parse tree, and each translation is defined in terms of translations at its neighbor
nodes in the tree.
4) Automatic code generators:- Such a tool takes a collection of rules that define the translation of
each operation of the intermediate language into the machine language for the target machine. The
rules must include sufficient detail that we can handle the different possible access methods for data;
e.g., variables may be in registers, in a fixed (static) location in memory, or may be allocated a
position on a stack.
The basic technique is "template matching." The intermediate code statements are replaced by
"templates" that represent sequences of machine instructions, in such a way that the assumptions
about storage of variables match from template to template.
5) Data flow engines:- Much of the information needed to perform good code optimization involves
"data-flow analysis," the gathering of information about how values are transmitted from one part of
a program to each other part. Different tasks of this nature can be performed by essentially the same
routine, with the user supplying details of the relationship between intermediate code statements and
the information being gathered.

Features of compiler construction tools:

Lexical Analyzer Generator: This tool helps in generating the lexical analyzer or scanner of the
compiler. It takes as input a set of regular expressions that define the tokens of the language being
compiled and produces a program that reads the input source code and tokenizes it based on these
regular expressions.
Parser Generator: This tool helps in generating the parser of the compiler. It takes as input a
context-free grammar that defines the syntax of the language being compiled and produces a program
that parses the input tokens and builds an abstract syntax tree.
Code Generation Tools: These tools help in generating the target code for the compiler. They take as
input the abstract syntax tree produced by the parser and produce code that can be executed on the
target machine.
Optimization Tools: These tools help in optimizing the generated code for efficiency and
performance. They can perform various optimizations such as dead code elimination, loop
optimization, and register allocation.
Debugging Tools: These tools help in debugging the compiler itself or the programs that are being
compiled. They can provide debugging information such as symbol tables, call stacks, and runtime
errors.
Profiling Tools: These tools help in profiling the compiler or the compiled code to identify
performance bottlenecks and optimize the code accordingly.
Documentation Tools: These tools help in generating documentation for the compiler and the
programming language being compiled. They can generate documentation for the syntax, semantics,
and usage of the language.

Language Support: Compiler construction tools are designed to support a wide range of
programming languages, including high-level languages such as C++, Java, and Python, as well as
low-level languages such as assembly language.
Cross-Platform Support: Compiler construction tools may be designed to work on multiple
platforms, such as Windows, Mac, and Linux.
User Interface: Some compiler construction tools come with a user interface that makes it easier for
developers to work with the compiler and its associated tools.
