
#Chapter 1 - CD

Copyright © All Rights Reserved

Compiler Design

Mattu University

Instructor Name: Worku.B

Email:[email protected]
Compiler Design

Chapter One

Introduction To Compiler Design


Preliminaries Required

Basic knowledge of programming languages.

Basic knowledge of automata and context-free grammars.


Textbook:
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman,
“Compilers: Principles, Techniques, and Tools,” 2nd ed.,
Addison-Wesley, 2007.

Objectives
At the end of this session, students will be able to:
 Understand the basic concepts and principles of compiler design.
 Understand what a compiler is, what it does, and how it works.
 Be familiar with the cousins of the compiler: interpreters and assemblers.
 Understand why we study compiler design and construction.
 Understand the phases of compilation and the steps of compilation.
LANGUAGE PROCESSING SYSTEM IN COMPILER DESIGN

 The hardware understands only machine language, which humans cannot easily read or write.
✓ So we write programs in a high-level language, which is easier for us to understand and remember.
✓ These programs are then fed through a series of tools and OS components to obtain code that the machine can execute. This chain of tools is known as the language processing system.
Language Processing System in Compiler Design

Source program
  ↓ preprocessor
Modified source program (pure HLL)
  ↓ compiler
Target assembly program
  ↓ assembler
Relocatable machine code
  ↓ linker/loader (together with library files)
Absolute machine code
Language Processing System in Compiler Design

 Preprocessor: It processes directives such as #include (which inserts the contents of a header file) and #define (which defines macros). It takes source code as input and produces modified source code (pure HLL) as output.
 Compiler: The compiler takes the modified source code as input and produces the target code (an assembly program) as output.
 Assembler: The assembler takes the target code as input and produces relocatable machine code as output.
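To make the preprocessor's job concrete, here is a minimal sketch of object-like `#define` expansion. It is an illustration only: the function `preprocess` is hypothetical and handles nothing but `#define NAME value` lines, whereas a real C preprocessor also handles `#include`, function-like macros, conditional compilation, and rescanning of macro bodies.

```python
import re


def preprocess(source: str) -> str:
    """Toy preprocessor: expand object-like `#define NAME value` macros only."""
    macros: dict[str, str] = {}
    output_lines = []
    for line in source.splitlines():
        directive = re.match(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)$", line)
        if directive:
            # Record the macro; the directive line itself is not part of the output.
            macros[directive.group(1)] = directive.group(2).strip()
            continue
        # Replace whole-word uses of every macro defined so far (no rescanning,
        # no function-like macros, no special handling of string literals).
        for name, body in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        output_lines.append(line)
    return "\n".join(output_lines)
```

For example, `preprocess("#define MAX 100\nint a[MAX];")` yields the pure-HLL line `int a[100];`; the word-boundary match keeps an identifier such as `MAXIMUM` untouched.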

CONT…

Linker: The linker (or link editor) is a program that takes a collection of object files (created by assemblers and compilers) and combines them into an executable program.
Loader: The loader places the linked program into main memory for execution.
Executable code: Low-level, machine-specific code that the machine can execute directly.
Once the linker and loader have done their jobs, the object code has finally been converted into executable code.
Differences between Linker/Loader

| Linker | Loader |
|---|---|
| Part of the compiler toolchain; it combines object files with library files. | Part of the operating system. |
| Performs the linking operation. | Loads the program for execution. |
| Connects calls to user-defined functions with the user-defined libraries that define them. | Loads a program by reading the contents of an executable file into memory. |
| Combines object files into a single executable file. | Loads executable files into memory for execution. |
| Input: object files generated by the compiler. | Input: executable files generated by the linker. |
| Output: a single executable file. | Output: the program loaded into memory. |
| Assigns memory addresses to code and data sections. | Allocates memory for the program in the process address space. |
| Resolves external references between object files. | Resolves remaining external references (e.g. to shared libraries) at load time. |
| Does not execute the program. | Starts execution of the program in memory. |
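The linker's two core jobs from the table, assigning addresses and resolving external references between object files, can be sketched with a toy model. Everything here is hypothetical: an "object file" is a small Python record holding a list of code words, the symbols it defines (name to offset), and the code slots that refer to a symbol. Real object formats such as ELF are far richer.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical object file: code words, exported symbols, and symbol references."""
    name: str
    code: list                      # instruction words; referencing slots hold a placeholder
    defines: dict                   # symbol name -> offset within this file's code
    references: dict = field(default_factory=dict)  # code offset -> symbol name needed there


def link(objects: list) -> list:
    """Toy linker: lay files out one after another, then patch symbol references."""
    # Pass 1: give each file a base address and build a global symbol table.
    symbol_table = {}
    base = 0
    for obj in objects:
        for symbol, offset in obj.defines.items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol {symbol!r} in {obj.name}")
            symbol_table[symbol] = base + offset   # relocate: offset -> absolute address
        base += len(obj.code)
    # Pass 2: concatenate the code, replacing each reference with an absolute address.
    image = []
    for obj in objects:
        code = list(obj.code)
        for offset, symbol in obj.references.items():
            if symbol not in symbol_table:
                raise ValueError(f"undefined symbol {symbol!r} referenced in {obj.name}")
            code[offset] = symbol_table[symbol]
        image.extend(code)
    return image
```

Linking a `main.o` whose slot 1 calls `print_msg` with a `lib.o` that defines `print_msg` at offset 0 places the library code right after `main.o` and patches the call with that absolute address; omitting `lib.o` raises an "undefined symbol" error, much like a real linker.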
What is a compiler?
 A compiler is software that converts a program written in a high-level language (HLL), the source program, into an equivalent program in a target language.
 In general, it is a program that reads a program written in one language and translates it into another language.
 Traditionally, compilers translate from high-level languages to low-level languages.

Source program → Compiler → Target program
                    ↓
              Error messages
Cont…
 The source program is normally a program written in a high-level language.
 The target program is normally the equivalent program in machine code (a relocatable object file).

Source language (high-level language) → Compiler → Target language (machine code, relocatable object file)
Cousins of Compiler
A. Assembler: a translator that converts programs written in assembly language into machine code. It:
 Translates mnemonic operation codes into their machine-language equivalents.
 Assigns machine addresses to symbolic labels.
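The assembler's two tasks are classically split across two passes: pass 1 assigns an address to every symbolic label, and pass 2 translates mnemonics into opcodes and replaces label operands with those addresses. In this sketch the opcode numbers, the one-word-per-instruction encoding, and the rule that a label sits on its own line are all invented for illustration; they do not describe any real instruction set.

```python
# Hypothetical opcode table: mnemonic -> numeric opcode (not a real ISA).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMPZ": 0x03, "STORE": 0x04, "HALT": 0xFF}


def assemble(lines: list[str]) -> list[tuple[int, int]]:
    """Two-pass assembler for a toy ISA in which every instruction is one word."""
    # Pass 1: assign a machine address to each symbolic label.
    labels: dict[str, int] = {}
    instructions: list[list[str]] = []
    for line in lines:
        line = line.split(";")[0].strip()           # drop comments and surrounding blanks
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)   # label names the next instruction
            continue
        instructions.append(line.split())
    # Pass 2: translate mnemonics to opcodes and labels/numbers to operand values.
    machine_code = []
    for parts in instructions:
        opcode = OPCODES[parts[0]]
        operand = 0
        if len(parts) > 1:
            arg = parts[1]
            operand = labels[arg] if arg in labels else int(arg)
        machine_code.append((opcode, operand))
    return machine_code
```

Because all labels are collected in pass 1, a forward reference such as `JMPZ done` resolves correctly even though `done:` appears later in the source.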

B. Interpreter: a computer program that translates and executes high-level instructions one at a time, as they are encountered, instead of producing a separate machine-code program.
 It produces the output of each statement as it is interpreted.
Compiler vs. Interpreter

| Compiler | Interpreter |
|---|---|
| Takes the entire program as input. | Takes a single instruction as input. |
| Faster. | Slower. |
| Requires more memory, because intermediate object code is generated. | Requires less memory, because no intermediate code is generated. |
| The program does not need to be compiled every time it is run. | The high-level program is converted into a lower-level form every time it is run. |
| Errors are displayed after the entire program is checked. | Errors are displayed for each instruction as it is interpreted. |
| Debugging is comparatively hard. | Debugging is easier. |
| Examples: C, C++. | Examples: Python, Ruby, BASIC. |
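The table's central contrast, executing each construct as it is encountered versus translating the whole program once and then running the translation, can be seen side by side in a small sketch. The nested-tuple expression format, the stack-machine instruction names, and the functions `interpret`, `compile_expr` and `run` are all hypothetical, chosen only for illustration.

```python
# Expressions are nested tuples: ("num", 3), ("var", "x"), ("+", l, r), ("*", l, r).


def interpret(expr: tuple, env: dict[str, int]) -> int:
    """Interpreter: walk the tree and evaluate each node as it is encountered."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    left, right = interpret(expr[1], env), interpret(expr[2], env)
    return left + right if kind == "+" else left * right


def compile_expr(expr: tuple) -> list[tuple]:
    """Compiler: translate the whole tree once into stack-machine instructions."""
    kind = expr[0]
    if kind == "num":
        return [("PUSH", expr[1])]
    if kind == "var":
        return [("LOAD", expr[1])]
    op = "ADD" if kind == "+" else "MUL"
    return compile_expr(expr[1]) + compile_expr(expr[2]) + [(op,)]


def run(code: list[tuple], env: dict[str, int]) -> int:
    """Execute compiled code; it can be rerun on new inputs without recompiling."""
    stack: list[int] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()
```

For `x + 2 * y`, `compile_expr` produces `LOAD x, PUSH 2, LOAD y, MUL, ADD` once, and `run` can execute that code for any values of `x` and `y`; `interpret` instead re-walks the tree on every evaluation.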
Basic Compiler Design
 A compiler is a large program that takes as input a program written in its source language and produces as output an executable program that we can run.
 To make the compiler easy to modify, we usually design it in a modular way (decomposition).
Two design strategies:
1. Write one “front end” for the compiler (i.e. the lexer, parser, semantic analyzer, and abstract syntax tree generator), and write a separate back end for each platform that you want to support.
2. Write one efficient, highly optimized back end, and write a different front end for each of several languages, such as Fortran, C, C++, and Java.

Source code → Front End → Intermediate code → Back End → Target code
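Both strategies rest on a shared intermediate code: any front end that emits it can be paired with any back end that consumes it. A minimal sketch, assuming a hypothetical intermediate code of `(dest, op, arg1, arg2)` tuples; the literal `IR` below stands in for a front end's output for `x = a + b * c`, and the two back ends (a stack machine and a one-register machine) and their instruction syntax are invented for illustration.

```python
# Intermediate code for `x = a + b * c`, standing in for what any front end might emit.
# Each instruction is (dest, op, arg1, arg2); op "=" is a plain copy.
IR = [("t1", "*", "b", "c"), ("t2", "+", "a", "t1"), ("x", "=", "t2", None)]
MNEMONIC = {"+": "ADD", "*": "MUL"}


def stack_backend(ir):
    """Back end for a hypothetical stack machine."""
    out = []
    for dest, op, a, b in ir:
        out.append(f"PUSH {a}")
        if op != "=":
            out.append(f"PUSH {b}")
            out.append(MNEMONIC[op])          # pops two values, pushes the result
        out.append(f"POP {dest}")
    return out


def register_backend(ir):
    """Back end for a hypothetical machine with one accumulator register R1."""
    out = []
    for dest, op, a, b in ir:
        out.append(f"MOV R1, {a}")
        if op != "=":
            out.append(f"{MNEMONIC[op]} R1, {b}")
        out.append(f"MOV {dest}, R1")
    return out
```

Retargeting the compiler means writing another function like these; the front end is untouched. Adding a language means writing another front end that emits the same tuples.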
Major Parts of Compilers
 There are two major parts of a compiler: analysis and synthesis.
 Analysis (machine-independent): the front end.
 Synthesis (machine-dependent): the back end.
 In the analysis phase, an intermediate representation is created from the given source program. Analysis determines the operations implied by the source program and records them in a tree structure.
 The lexical analyzer, syntax analyzer, and semantic analyzer are the parts of this phase.
 In the synthesis phase, the equivalent target program is created from this intermediate representation.

Cont…
 During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
 During synthesis, the operations recorded in the tree are used to produce the translated code.

Analysis (front end): 1. Lexical analysis, 2. Syntax analysis, 3. Semantic analysis
 Breaks up the source program into its constituent pieces.
 Creates an intermediate representation of the source program.

Synthesis (back end): 4. Intermediate code generation, 5. Optimization, 6. Code generation
 Constructs the target program from the intermediate representation.
 Takes the tree structure and translates the operations into the target program.
Phases of Compiler
 Compiler phases: a compiler operates as a sequence of phases, each of which transforms the source program from one representation into another.
 All phases communicate with the error handler and with the symbol table.
Phases of Compiler (figure)

Source Program
→ Lexical Analyzer
→ Syntax Analyzer
→ Semantic Analyzer
→ Intermediate Code Generator
→ Code Optimizer
→ Code Generation
→ Target Program

The Symbol Table Manager and the Error Handler are connected to every phase.
Cont…

Symbol Table: a data structure used and maintained by the compiler that records every identifier's name along with its type. It helps the compiler work efficiently by letting it find identifiers and their information quickly.
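A symbol table is commonly implemented as a hash table, and for block-structured languages it is often kept as a stack of scopes so that an inner declaration can hide an outer one. A minimal sketch, where the class `SymbolTable` and its methods are hypothetical and each entry stores only a type name, as in the description above.

```python
from typing import Optional


class SymbolTable:
    """Scoped symbol table: a stack of dicts mapping identifier names to their types."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, str]] = [{}]   # start with the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        self.scopes.pop()

    def declare(self, name: str, type_: str) -> None:
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        scope[name] = type_

    def lookup(self, name: str) -> Optional[str]:
        """Search from the innermost scope outward; each dict lookup is O(1) on average."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

The lexical analyzer or parser would call `declare` when it sees a declaration, and later phases such as the semantic analyzer call `lookup` to retrieve an identifier's type.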
Phase I: Lexical Analyzer (Scanner)

Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → Semantic Analysis → Code Generator (IC generator) → Code Optimizer
(Symbol Table shared by the phases)

• Tokens are described formally.
• The scanner breaks the input into tokens.
• It removes white space and comments.
 The lexical analyzer reads the source program character by character and returns the tokens of the source program.
 A token describes a pattern of characters that has the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
Ex1: newval = oldval + 12 => tokens:
  newval   identifier
  =        assignment operator
  oldval   identifier
  +        add operator
  12       a number

 It puts information about identifiers into the symbol table.
 Regular expressions are used to describe tokens (lexical constructs).
 A (deterministic) finite state automaton can be used to implement a lexical analyzer.

Example 2

Input: result = a + b * c / d
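The scanner described above can be sketched with Python's `re` module: each token class is a regular expression, and the classes are combined into one alternation that is matched repeatedly over the input. Note the substitution: Python's regex engine does the matching here, rather than a hand-built DFA. The token class names and the `//` comment syntax are assumptions of this sketch.

```python
import re

# Token classes as (name, regular expression); earlier entries win on ties.
TOKEN_SPECS = [
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r":=|="),
    ("COMMENT",    r"//[^\n]*"),     # must precede OPERATOR so "//" is not read as "/"
    ("OPERATOR",   r"[+\-*/]"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),            # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPECS))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (token class, lexeme) pairs, discarding white space and comments."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at {match.start()}")
        tokens.append((kind, lexeme))
    return tokens
```

Applied to Example 2, `tokenize("result = a + b * c / d")` returns five `IDENTIFIER` tokens (`result`, `a`, `b`, `c`, `d`), one `ASSIGN` token, and the three `OPERATOR` tokens in source order.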
Phase II: Syntax Analyzer
 A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program.
 A syntax analyzer is also called a parser.
 A parse tree describes the syntactic structure of the program.
 A parse tree is constructed by repeatedly applying the rules of a context-free grammar (CFG).

Example: parse tree for position := initial + rate * 60

 In a parse tree, all terminals are at the leaves.
 All inner nodes are non-terminals of the context-free grammar.
Input: result = a + b * c / d

Grammar:
Assign ::= ID '=' Exp
Exp    ::= Exp '+' Exp
         | Exp '*' Exp
         | Exp '/' Exp
         | ID

Parse tree:
Assign
  ID (result)
  '='
  Exp
    Exp
      ID (a)
    '+'
    Exp
      Exp
        ID (b)
      '*'
      Exp
        Exp
          ID (c)
        '/'
        Exp
          ID (d)
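The grammar above is ambiguous: the same input also has other parse trees, for example one that groups it as (a + b) * (c / d). Practical parsers therefore use an equivalent unambiguous grammar that encodes precedence and left associativity. A minimal recursive-descent sketch under that substitution (the grammar in the comment is a standard textbook rewrite, not the one on the slide, and the class `Parser` is hypothetical). It builds a compact syntax tree with operators as interior nodes, so for this input it yields `a + ((b * c) / d)` rather than the tree drawn above.

```python
import re

# Unambiguous grammar substituted for the ambiguous one above (EBNF):
#   Assign ::= ID ('=' | ':=') Exp
#   Exp    ::= Term   { '+' Term }
#   Term   ::= Factor { ('*' | '/') Factor }
#   Factor ::= ID | NUMBER
class Parser:
    """Recursive-descent parser: one method per non-terminal."""

    def __init__(self, source: str) -> None:
        self.tokens = re.findall(r":=|\w+|\S", source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def factor(self):
        tok = self.advance()
        if not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"expected identifier or number, got {tok!r}")
        return tok

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())   # the loop makes * and / left-associative
        return node

    def exp(self):
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())
        return node

    def assign(self):
        target = self.advance()
        if not re.fullmatch(r"[A-Za-z_]\w*", target):
            raise SyntaxError(f"expected identifier, got {target!r}")
        op = self.advance()
        if op not in ("=", ":="):
            raise SyntaxError(f"expected '=' or ':=', got {op!r}")
        tree = (op, target, self.exp())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree
```

The same parser handles the earlier example: `Parser("position := initial + rate * 60").assign()` returns `(":=", "position", ("+", "initial", ("*", "rate", "60")))`.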
Syntax Analyzer (CFG)
 The syntax of a language is specified by a context-free grammar (CFG).
 The rules in a CFG are mostly recursive.
 A syntax analyzer checks whether or not a given program satisfies the rules implied by a CFG.
 If it does, the syntax analyzer creates a parse tree for the given program.

Ex: We use BNF (Backus-Naur Form) to specify a CFG:
assgstmt -> identifier := expression
expression -> identifier
expression -> number
Parsing Techniques
 Depending on how the parse tree is created, there are different parsing techniques.
 These parsing techniques are categorized into two groups:
   Top-Down Parsing
   Bottom-Up Parsing
 Top-Down Parsing:
   Construction of the parse tree starts at the root and proceeds towards the leaves.
   Efficient top-down parsers can easily be constructed by hand.
   Examples: recursive predictive parsing and non-recursive predictive parsing (LL parsing).
 Bottom-Up Parsing:
   Construction of the parse tree starts at the leaves and proceeds towards the root.
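Non-recursive predictive (LL(1)) parsing replaces the recursive procedures with an explicit stack and a parsing table indexed by (non-terminal, lookahead token). The sketch below uses the standard textbook expression grammar with left recursion removed; the table was filled in by hand from the FIRST and FOLLOW sets and is an assumption of this example rather than something generated by a tool. The function name `ll1_parse` is hypothetical.

```python
# LL(1) table for the grammar:
#   E -> T E'      E' -> + T E' | ε
#   T -> F T'      T' -> * F T' | ε
#   F -> ( E ) | id
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens: list[str]) -> list[str]:
    """Return the productions used (a leftmost derivation); raise SyntaxError on failure."""
    tokens = tokens + ["$"]              # end-of-input marker
    stack = ["$", "E"]                   # start symbol on top of the end marker
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top not in NONTERMINALS:      # terminal or "$": must match the input
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, got {lookahead!r}")
            pos += 1
            continue
        body = TABLE.get((top, lookahead))
        if body is None:
            raise SyntaxError(f"no rule for {top} on {lookahead!r}")
        derivation.append(f"{top} -> {' '.join(body) or 'ε'}")
        stack.extend(reversed(body))     # push right-to-left so the first symbol is on top
    return derivation
```

For the token list `id + id * id` the parser predicts eleven productions, starting with `E -> T E'` and ending with `E' -> ε`, without any backtracking: each step is decided by a single table lookup.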

Phase III: Semantic Analyzer
 A semantic analyzer checks the source program for semantic errors and collects type information for code generation.
 Type checking is an important part of the semantic analyzer.
 Normally, semantic information cannot be represented by the context-free grammars used in syntax analyzers.
 Instead, the context-free grammars used in syntax analysis are augmented with attributes (semantic rules):
   the result is a syntax-directed translation,
   i.e. attribute grammars.
Ex: newval := oldval + 12
The type of the identifier newval must match the type of the expression (oldval + 12).
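The type rule in the example can be sketched as a small checker that walks an expression tree, looks identifier types up in a symbol table, and inserts an explicit `inttoreal` conversion wherever an integer meets a real (the same coercion that appears in the intermediate code of Phase IV). The two-type system (`int`, `real`), the tuple tree format, the function names, and the rule that a real value may not be assigned to an int variable are assumptions of this sketch.

```python
# Hypothetical tree format: ("id", name), ("num", value), or (op, left, right).
def check_expr(expr, symbols):
    """Return (type, tree), inserting ("inttoreal", ...) where int meets real."""
    kind = expr[0]
    if kind == "id":
        if expr[1] not in symbols:
            raise TypeError(f"undeclared identifier {expr[1]!r}")
        return symbols[expr[1]], expr
    if kind == "num":
        return ("real" if isinstance(expr[1], float) else "int"), expr
    op, left, right = expr
    left_type, left = check_expr(left, symbols)
    right_type, right = check_expr(right, symbols)
    if left_type == right_type:
        return left_type, (op, left, right)
    # Mixed operands: widen the int side so both are real.
    if left_type == "int":
        left = ("inttoreal", left)
    else:
        right = ("inttoreal", right)
    return "real", (op, left, right)


def check_assign(target, expr, symbols):
    """Check `target := expr`; an int value may widen to real, never the reverse."""
    if target not in symbols:
        raise TypeError(f"undeclared identifier {target!r}")
    target_type = symbols[target]
    expr_type, tree = check_expr(expr, symbols)
    if target_type == expr_type:
        return tree
    if target_type == "real":
        return ("inttoreal", tree)
    raise TypeError(f"cannot assign a {expr_type} value to {target_type} variable {target!r}")
```

With `newval` and `oldval` declared `real`, checking `newval := oldval + 12` succeeds and rewrites the constant as `("inttoreal", ("num", 12))`, while assigning a real expression to an `int` variable is reported as a type error.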
Semantic Analysis

Source language → Scanner (lexical analysis) → Parser (syntax analysis) → syntactic structure → Semantic Analysis → syntactic/semantic structure → Code Generator (IC generator) → Target language
(Supporting components: Code Optimizer, Symbol Table)

• “Meaning”
• Type/error checking
• Intermediate code generation: abstract machine
Phase IV: Intermediate Code Generation
 A compiler may produce explicit intermediate code representing the source program.
 This intermediate code is generally machine-independent (architecture-independent), although its level is close to that of machine code.
 A common form is TAC (three-address code), in which each instruction has at most three addresses (two operands and a result).
Ex: newval := oldval + fact * 1
    id1 := id2 + id3 * 1
    temp1 := inttoreal(1)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3
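Three-address code can be generated by a post-order walk of the type-checked expression tree that allocates a fresh temporary for every interior node. The function `gen_tac` and the tuple tree format are hypothetical; run on the tree for `id1 := id2 + id3 * 1` with the `inttoreal` coercion already inserted, it reproduces the four TAC instructions of the example.

```python
import itertools


def gen_tac(target, tree):
    """Emit three-address code for `target := tree` as a list of strings."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])                   # leaves are used directly as addresses
        if kind == "inttoreal":                   # unary conversion operator
            arg = walk(node[1])
            temp = f"temp{next(counter)}"
            code.append(f"{temp} := inttoreal({arg})")
            return temp
        op, left, right = node
        left_addr, right_addr = walk(left), walk(right)   # operands first (post-order)
        temp = f"temp{next(counter)}"
        code.append(f"{temp} := {left_addr} {op} {right_addr}")
        return temp

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code
```

Each interior node gets its own temporary, which is why the unoptimized code contains the redundant copy `id1 := temp3` that the next phase removes.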
Phase V: Code Optimizer
 The code optimizer improves the code produced by the intermediate code generator in terms of time and space.
 It improves efficiency in a machine-independent way.
Ex:
    temp1 := id3 * 1.0
    id1 := id2 + temp1

Phase VI: Code Generator
 Produces target-language code for a specific architecture.
 The target program is normally a relocatable object file containing machine code.
Ex: (assume an architecture whose instructions have at least one operand that is a machine register)
    MOVE id3,R1
    MULT #1,R1
    ADD id2,R1
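The steps from the four-instruction TAC to the two-instruction version, and then to target code, can be sketched together. The optimizer applies two machine-independent transformations: it performs `inttoreal` of a constant at compile time, and it removes the final copy by writing the last result straight into `id1`. The code generator assumes a hypothetical machine with the MOVE/MULT/ADD style of the example and a single register R1, and assumes each temporary is consumed by the very next instruction (true for tree-generated TAC). It also emits the final store `MOVE R1,id1` and writes the constant as `#1.0`. TAC statements are written as `(dest, op, arg1, arg2)` tuples; all names are illustrative.

```python
def optimize(tac):
    """Machine-independent optimization on (dest, op, arg1, arg2) tuples."""
    # 1. Compile-time conversion: `t := inttoreal(<int literal>)` becomes a constant.
    constants, folded = {}, []
    for dest, op, a, b in tac:
        a, b = constants.get(a, a), constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = str(float(a))
            continue
        folded.append((dest, op, a, b))
    # 2. Copy elimination: `x := t` right after `t := ...` becomes `x := ...`
    #    (safe here because each temporary is used exactly once).
    optimized = []
    for dest, op, a, b in folded:
        if op == ":=" and optimized and optimized[-1][0] == a and a.startswith("temp"):
            optimized.append((dest,) + optimized.pop()[1:])
        else:
            optimized.append((dest, op, a, b))
    # 3. Renumber the surviving temporaries as temp1, temp2, ...
    renames = {}
    for dest, *_ in optimized:
        if dest.startswith("temp"):
            renames[dest] = f"temp{len(renames) + 1}"
    return [tuple(renames.get(x, x) for x in instr) for instr in optimized]


TARGET_OPS = {"+": "ADD", "*": "MULT"}     # only commutative + and * are supported


def operand(x):
    return f"#{x}" if x[0].isdigit() else x   # numeric literals become immediates


def gen_code(tac):
    """Naive code generation for a single-register (R1) machine."""
    asm, in_r1 = [], None                    # in_r1: name of the value currently in R1
    for dest, op, a, b in tac:
        if b == in_r1:                       # operand already in R1: swap (op is commutative)
            a, b = b, a
        if a != in_r1:
            asm.append(f"MOVE {operand(a)},R1")
        asm.append(f"{TARGET_OPS[op]} {operand(b)},R1")
        in_r1 = dest
        if not dest.startswith("temp"):      # program variables are stored back to memory
            asm.append(f"MOVE R1,{dest}")
    return asm
```

On the Phase IV output, `optimize` yields `temp1 := id3 * 1.0` and `id1 := id2 + temp1`, and `gen_code` then produces `MOVE id3,R1`, `MULT #1.0,R1`, `ADD id2,R1`, `MOVE R1,id1`. A real code generator would also need register allocation.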
Compiler-Construction Tools

1) Parser generators that automatically produce syntax analyzers from a grammatical description of a programming language.
2) Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of a language.
3) Syntax-directed translation engines that produce collections of routines for walking a parse tree and generating intermediate code.
4) Code-generator generators that produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.
5) Data-flow analysis engines that facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
6) Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of a compiler.
 Software development tools are available to implement one or more compiler phases:
   Scanner generators
   Parser generators
   Syntax-directed translation engines
   Automatic code generators
   Data-flow engines
 Scanner generators for C/C++: Flex, Lex.
 Parser generators for C/C++: Bison, YACC.
 Available scanner generators for Java:
   JLex, a scanner generator for Java, very similar to Lex.
   JFlex, flex for Java.
 Available parser generators for Java:
   CUP, a parser generator for Java, very similar to YACC.
   BYACC/J, a different version of Berkeley YACC for Java. It is an extension of the standard YACC (a -j flag has been added to generate Java code).

Other compiler tools:
   JavaCC, a parser generator for Java, including a scanner generator and a parser generator. Its input specifications are different from those suitable for Lex/YACC. Also, unlike YACC, JavaCC generates a top-down parser.
   ANTLR, a set of language translation tools (formerly PCCTS). It includes scanner/parser generators for C, C++, and Java.
WHY STUDY COMPILERS?
✓ Increases understanding of language semantics.
✓ Provides general background that a good software engineer needs.
✓ Seeing the machine code generated for language constructs helps in understanding the performance of languages.
✓ Teaches good language design.
✓ New devices may need device-specific languages.
✓ New business fields may need domain-specific languages.
Assignment
1. Compiler design tools
2. Advantages and disadvantages of compiler design
Thank You
