1 Introduction

A compiler translates high-level programming languages into machine language, facilitating the execution of programs. The document outlines the structure and functions of compilers, including analysis and synthesis phases, and discusses the importance of understanding compilers for programming proficiency. It also covers various components of compilation, such as lexical analysis, syntax analysis, and code generation.

Uploaded by yoyolgs889
Copyright © All Rights Reserved

Compiler Construction

Chapter 1
Introduction

Definition
A compiler is an executable program that can
read a program in one high-level language
and translate it into an equivalent executable
program in machine language.

[Figure: program in high-level language -> Compiler -> executable program in machine language; input -> executable program in machine language -> output]

Typical Compilation

Source program -> Compiler -> Target program

The progression of programming languages:
Machine language      c7 06 0000 0002
Assembly language     mov x, 2
High-level language   x = 2

*The first compiler (for FORTRAN) was developed by the team at IBM led by John Backus between 1954 and 1957.

Why Take this Course
Reason #1: understand compilers and
languages
understand the code structure
understand language semantics
understand relation between source code and
generated machine code
become a better programmer

Why Take this Course
Reason #2: nice balance of theory and
practice
Theory
◦ mathematical models: regular
expressions, automata, grammars,
graphs
◦ algorithms that use these models
Practice
◦ Apply theoretical notions to build a real
compiler

Why Take this Course
Reason #3: programming experience
write a large program which manipulates
complex data structures

Programs Related to Compilers
- Interpreters
- Assemblers
- Linkers
- Loaders
- Preprocessors
- Editors
- Debuggers

[Figure: a language-processing system: source program -> preprocessor -> modified source program -> compiler -> target assembly program -> assembler -> relocatable machine code -> linker/loader (with library files) -> target machine code]

Definitions of (translation-related)
Languages
- Source language
- Target language
- Implementation language

Translator
A program, written in the implementation
language, that takes sentences (strings) in
the source language and outputs equivalent
sentences (strings) in the target language.
e.g.
- preprocessor, pretty printer, fortran2c, pascal2c (high to high)
- assembler (low to lower)
- disassembler (lower to low)
- compiler (high to low)

What are the differences between an interpreter and a compiler?
portability
execution speed
with/without object code
with/without optimization
debugging capability
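The "with/without object code" point can be made concrete with a minimal sketch in C. All names here are hypothetical: the node kinds, the functions interpret, compile, example_tree and compile_to_string, and the stack-machine instructions push/load/add/mul. Both functions take the same tree for (n + 1) * 4. An interpreter evaluates it on the spot. A compiler instead translates it into a program that still has to be run later.

```c
#include <stdio.h>
#include <string.h>

/* Node kinds of a tiny expression tree (hypothetical names). */
typedef enum { NUM, VAR_N, ADD, MUL } Kind;

typedef struct Node {
    Kind kind;
    int value;                 /* used only when kind == NUM */
    const struct Node *left;   /* used only for ADD and MUL */
    const struct Node *right;
} Node;

/* Interpreter: walks the tree and computes the value right away. */
int interpret(const Node *e, int n)
{
    switch (e->kind) {
    case NUM:   return e->value;
    case VAR_N: return n;
    case ADD:   return interpret(e->left, n) + interpret(e->right, n);
    default:    return interpret(e->left, n) * interpret(e->right, n); /* MUL */
    }
}

/* Compiler: walks the same tree but only emits instructions for a
   hypothetical stack machine (appended to out); nothing is evaluated. */
void compile(const Node *e, char *out, size_t size)
{
    char line[32];
    switch (e->kind) {
    case NUM:
        snprintf(line, sizeof line, "push %d\n", e->value);
        break;
    case VAR_N:
        strcpy(line, "load n\n");
        break;
    default:                   /* ADD or MUL: operands first, then operator */
        compile(e->left, out, size);
        compile(e->right, out, size);
        strcpy(line, e->kind == ADD ? "add\n" : "mul\n");
        break;
    }
    strncat(out, line, size - strlen(out) - 1);
}

/* The tree for (n + 1) * 4, built from static nodes. */
const Node *example_tree(void)
{
    static const Node n = {VAR_N, 0, NULL, NULL};
    static const Node one = {NUM, 1, NULL, NULL};
    static const Node sum = {ADD, 0, &n, &one};
    static const Node four = {NUM, 4, NULL, NULL};
    static const Node product = {MUL, 0, &sum, &four};
    return &product;
}

/* Runs the compiler on e and returns the generated program as text. */
const char *compile_to_string(const Node *e)
{
    static char program[256];
    program[0] = '\0';
    compile(e, program, sizeof program);
    return program;
}
```

`interpret(example_tree(), 2)` returns 12 immediately. `compile_to_string(example_tree())` instead returns the program load n, push 1, add, push 4, mul. That program is object code: it can be saved and run many times for different values of n.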

Source Code
int expr( int n )
{
    int d;
    d = 4*n*n*(n+1)*(n+1);
    return d;
}

Source Code
Optimized for human readability
Matches human notions of grammar
Uses named constructs such as
variables and procedures

Assembly Code
    .globl _expr
_expr:
    pushl %ebp
    movl %esp,%ebp
    subl $24,%esp
    movl 8(%ebp),%eax
    movl %eax,%edx
    leal 0(,%edx,4),%eax
    movl %eax,%edx
    imull 8(%ebp),%edx
    movl 8(%ebp),%eax
    incl %eax
    imull %eax,%edx
    movl 8(%ebp),%eax
    incl %eax
    imull %eax,%edx
    movl %edx,-4(%ebp)
    movl -4(%ebp),%edx
    movl %edx,%eax
    jmp L2
    .align 4
L2:
    leave
    ret

Assembly Code
Optimized for hardware
Consists of machine instructions
Uses registers and unnamed memory
locations
Much harder to understand by
humans
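A hand translation makes the link between the source code and the assembly listing explicit. In the sketch below, expr_steps (a hypothetical name) performs one C statement per instruction of the listing, taken in the order shown above. The ordinary variables eax and edx stand in for the registers. Note how d makes a round trip through memory at -4(%ebp), which is typical of unoptimized output. Listings of this kind can be produced with a command such as gcc -S, but the exact instructions depend on the compiler version, target and optimization level.

```c
/* The function from the source-code slide. */
int expr(int n)
{
    int d;
    d = 4*n*n*(n+1)*(n+1);
    return d;
}

/* The same computation, one statement per assembly instruction
   (hypothetical name; eax/edx are plain variables, not real registers). */
int expr_steps(int n)
{
    int eax, edx, d;
    eax = n;            /* movl 8(%ebp),%eax     load n                */
    edx = eax;          /* movl %eax,%edx                              */
    eax = edx * 4;      /* leal 0(,%edx,4),%eax  4*n via addressing    */
    edx = eax;          /* movl %eax,%edx                              */
    edx = edx * n;      /* imull 8(%ebp),%edx    4*n*n                 */
    eax = n;            /* movl 8(%ebp),%eax                           */
    eax = eax + 1;      /* incl %eax             n+1                   */
    edx = edx * eax;    /* imull %eax,%edx       4*n*n*(n+1)           */
    eax = n;            /* movl 8(%ebp),%eax                           */
    eax = eax + 1;      /* incl %eax                                   */
    edx = edx * eax;    /* imull %eax,%edx       times (n+1) again     */
    d = edx;            /* movl %edx,-4(%ebp)    store into d          */
    edx = d;            /* movl -4(%ebp),%edx    reload d              */
    eax = edx;          /* movl %edx,%eax        return value in %eax  */
    return eax;
}
```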

The Analysis-Synthesis Model of
Compilation

- There are two parts to compilation: analysis & synthesis.
- During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
- During synthesis, the operations recorded in the tree are used to produce the translated (target) code.

The Front-end and Back-end Model of Compilation

Source Code -> Front End -> Intermediate Code -> Back End -> Target Code

[Figure: phases of a compiler, ending with the target code optimizer]

Preprocessor (or Character Handler)
- throws away the comments
- compresses multiple blank characters
- includes files (including nested included files)
- performs macro expansions (including nested macro expansion)
◦ a macro facility is a text-replacement capability (two aspects: definition & use).
◦ a macro statement is expanded into a set of programming-language statements or other macros.
- handles compiler options (conditional compilation)

(These jobs may be conducted by the lexical analyzer.)
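This minimal C sketch exercises the preprocessor jobs listed above: file inclusion, a macro definition and its use, nested macro expansion, conditional compilation and comment removal. The macro names SQUARE, FOURTH, TRACE and DEBUG and the function fourth_power are hypothetical. With gcc or clang, the -E option stops after preprocessing and prints the expanded text, which is a convenient way to check the expansions noted in the comments.

```c
#include <stdio.h>   /* file inclusion: the header's text is copied in here */

/* Macro definition: every use SQUARE(...) is replaced by this text. */
#define SQUARE(x) ((x) * (x))

/* Nested macro: the expansion of FOURTH contains another macro use,
   which the preprocessor expands in turn. */
#define FOURTH(x) SQUARE(SQUARE(x))

/* Conditional compilation: which definition of TRACE survives depends on
   whether DEBUG is defined, e.g. by compiling with -DDEBUG. */
#ifdef DEBUG
#define TRACE(msg) fprintf(stderr, "trace: %s\n", (msg))
#else
#define TRACE(msg) ((void)0)
#endif

/* Hypothetical helper: after preprocessing, its return statement reads
   return ((((v) * (v))) * (((v) * (v))));                              */
int fourth_power(int v)
{
    /* This comment, like every comment, is removed before compilation. */
    TRACE("computing a fourth power");
    return FOURTH(v);
}
```

`fourth_power(3)` returns 81. When the file is compiled with -DDEBUG, the tracing call is kept instead of being replaced by a no-op.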

Scanner (Lexical Analyzer)
To identify lexical (word) structure
Input: a stream of chars;
Output: a stream of tokens.
A scanner may also enter identifiers into the
symbol table and enter literals into the literal table.
(literals include numeric constants such as
3.1415926535 and quoted strings such as
“Hello, world!”).

An Example: a[index] = 4 + 2 ;

(1) Output of the Scanner :

a ===> identifier
[ ===> left bracket
index ===> identifier
] ===> right bracket
= ===> assignment
4 ===> number
+ ===> plus sign
2 ===> number
; ===> semicolon
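A small hand-written scanner can reproduce the output above. It is a sketch under stated assumptions. The token names (TK_IDENTIFIER and so on) and the functions next_token, count_tokens and token_kind_at are hypothetical. Only the characters used in the example are recognized, and entering names into the symbol and literal tables is left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for the example statement (hypothetical names). */
typedef enum {
    TK_IDENTIFIER, TK_NUMBER, TK_LBRACKET, TK_RBRACKET,
    TK_ASSIGN, TK_PLUS, TK_SEMICOLON, TK_ERROR, TK_EOF
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* first character of the lexeme in the source */
    size_t length;       /* number of characters in the lexeme */
} Token;

/* Returns the token that starts at *p and advances *p past it. */
Token next_token(const char **p)
{
    const char *s = *p;
    Token t;

    while (isspace((unsigned char)*s))       /* skip blanks between tokens */
        s++;
    t.start = s;
    if (*s == '\0') {
        t.kind = TK_EOF;
    } else if (isalpha((unsigned char)*s)) {
        /* letter (letter | digit)*: the loop peeks at the next character
           and stops without consuming it, e.g. at ']' after "index" */
        while (isalnum((unsigned char)*s))
            s++;
        t.kind = TK_IDENTIFIER;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;
        t.kind = TK_NUMBER;
    } else {
        switch (*s) {                         /* single-character tokens */
        case '[': t.kind = TK_LBRACKET;  break;
        case ']': t.kind = TK_RBRACKET;  break;
        case '=': t.kind = TK_ASSIGN;    break;
        case '+': t.kind = TK_PLUS;      break;
        case ';': t.kind = TK_SEMICOLON; break;
        default:  t.kind = TK_ERROR;     break;
        }
        s++;
    }
    t.length = (size_t)(s - t.start);
    *p = s;
    return t;
}

/* Number of tokens in src, not counting the end marker. */
int count_tokens(const char *src)
{
    int n = 0;
    while (next_token(&src).kind != TK_EOF)
        n++;
    return n;
}

/* Kind of the i-th token (counting from 0) in src, or TK_EOF if none. */
TokenKind token_kind_at(const char *src, int i)
{
    Token t = next_token(&src);
    while (i-- > 0 && t.kind != TK_EOF)
        t = next_token(&src);
    return t.kind;
}
```

`count_tokens("a[index] = 4 + 2 ;")` returns 9, one token per line of the scanner output above.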
How are tokens (strings of chars) formed from the underlying character set?
- Usually specified (described) by a sequence of regular expressions.
- Lexical structures are analyzed via finite state automata.
- But scanning has a look-ahead requirement: to recognize a token, the scanner may need to look at characters beyond the end of the token.
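A transition table can drive a finite automaton for one token class: identifiers of the form letter (letter | digit)*. In this sketch the state, class and function names are hypothetical. The loop stops at the first character that has no useful transition. That character is the look-ahead: it has been examined but is left for the next token.

```c
#include <ctype.h>
#include <stddef.h>

/* Input symbols of the automaton: character classes. */
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

/* States: START, IN_ID (accepting) and DEAD (no way to continue). */
enum { START, IN_ID, DEAD, NUM_STATES };

static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Transition table for the regular expression  letter (letter | digit)* */
static const int delta[NUM_STATES][NUM_CLASSES] = {
    /*            letter  digit  other */
    /* START */ { IN_ID,  DEAD,  DEAD },
    /* IN_ID */ { IN_ID,  IN_ID, DEAD },
    /* DEAD  */ { DEAD,   DEAD,  DEAD },
};

/* Returns the length of the identifier at the start of s (0 if there is
   none). The loop stops at the first character whose transition leads to
   DEAD; that look-ahead character is examined but not consumed. */
size_t match_identifier(const char *s)
{
    int state = START;
    size_t len = 0;
    while (s[len] != '\0') {
        int next = delta[state][char_class(s[len])];
        if (next == DEAD)
            break;                  /* s[len] is the look-ahead character */
        state = next;
        len++;
    }
    return state == IN_ID ? len : 0;
}
```

`match_identifier("index] = 4")` returns 5. The scanner can only conclude that the identifier index has ended after peeking at the ].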

Parser (Syntax Analyzer)
- To identify syntax structure
- Input: a stream of tokens
- Output: on a logical level, some representation of a parse tree.
- Determines how the tokens fit together to make up the various syntactic entities of a program.
**Most compilers do not generate a parse tree explicitly but rather go to intermediate code directly as syntax analysis takes place.
- Usually specified via a context-free grammar.
- Syntax structures are analyzed by a DPDA (Deterministic Push-Down Automaton).

Predefined context-free grammar

expression → assign-expression
           | subscript-expression
           | additive-expression
           | identifier
           | number
assign-expression → expression = expression
subscript-expression → expression [ expression ]
additive-expression → expression + expression
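The grammar above is ambiguous (4 + 2 + 1 has two parse trees) and left-recursive, so a deterministic parser normally works from a rewritten version. The recursive-descent sketch below assumes one such rewrite. It is not taken from the slides, and it makes + left-associative and = right-associative:

assign → additive [ = assign ]
additive → postfix { + postfix }
postfix → primary { [ assign ] }
primary → identifier | number

Each nonterminal becomes one function, and the C call stack plays the role of the pushdown automaton's stack. All names are hypothetical. The parser assumes well-formed input and has no error recovery.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* AST node: op is "=", "[]" or "+" for interior nodes and NULL for leaves. */
typedef struct Ast {
    const char *op;
    const char *text;               /* leaf lexeme, pointing into the input */
    size_t len;                     /* length of the leaf lexeme */
    struct Ast *left, *right;
} Ast;

static Ast pool[64];                /* fixed node pool keeps the sketch short */
static int used;
static const char *cur;             /* current input position */
static char out[256];               /* printed result */

static Ast *new_node(const char *op, Ast *left, Ast *right)
{
    if (used == 64) abort();        /* sketch: no recovery from pool overflow */
    pool[used] = (Ast){op, NULL, 0, left, right};
    return &pool[used++];
}

static void skip_space(void) { while (isspace((unsigned char)*cur)) cur++; }
static Ast *parse_assign(void);

/* primary -> identifier | number   (well-formed input assumed) */
static Ast *parse_primary(void)
{
    Ast *a = new_node(NULL, NULL, NULL);
    skip_space();
    a->text = cur;
    while (isalnum((unsigned char)*cur))
        cur++;
    a->len = (size_t)(cur - a->text);
    skip_space();
    return a;
}

/* postfix -> primary { '[' assign ']' } */
static Ast *parse_postfix(void)
{
    Ast *a = parse_primary();
    while (*cur == '[') {
        cur++;
        a = new_node("[]", a, parse_assign());
        if (*cur == ']')
            cur++;
        skip_space();
    }
    return a;
}

/* additive -> postfix { '+' postfix }   (left-associative) */
static Ast *parse_additive(void)
{
    Ast *a = parse_postfix();
    while (*cur == '+') {
        cur++;
        a = new_node("+", a, parse_postfix());
    }
    return a;
}

/* assign -> additive [ '=' assign ]   (right-associative) */
static Ast *parse_assign(void)
{
    Ast *a = parse_additive();
    if (*cur == '=') {
        cur++;
        a = new_node("=", a, parse_assign());
    }
    return a;
}

static void cat(const char *s, size_t n)
{
    size_t used_len = strlen(out);  /* snprintf truncates instead of overflowing */
    snprintf(out + used_len, sizeof out - used_len, "%.*s", (int)n, s);
}

/* Appends the tree in prefix form, e.g. "(+ 4 2)". */
static void print_ast(const Ast *a)
{
    if (a->op == NULL) { cat(a->text, a->len); return; }
    cat("(", 1); cat(a->op, strlen(a->op)); cat(" ", 1);
    print_ast(a->left); cat(" ", 1);
    print_ast(a->right); cat(")", 1);
}

/* Parses one statement (a trailing ';' is ignored); returns its AST in prefix form. */
const char *parse_to_prefix(const char *src)
{
    used = 0;  cur = src;  out[0] = '\0';
    print_ast(parse_assign());
    return out;
}
```

`parse_to_prefix("a[index] = 4 + 2 ;")` returns "(= ([] a index) (+ 4 2))", a textual form of the abstract syntax tree for the example.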

(2) Output of the parser – parse tree (logical level)

[Figure: parse tree for a[index] = 4 + 2, with structure names at the interior nodes and tokens at the leaves]

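(2)’ Output of the parser – Abstract Syntax Tree (AST)
(condensed parse tree)

[Figure: AST for a[index] = 4 + 2: an assignment node whose children are a [] node (a subscripted by index) and a + node (4 and 2)]
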
Semantic Analyzer
==> Semantic Structure
- What is the program supposed to do?
- Semantic analysis can be done during the syntax analysis phase, the intermediate code generator phase, or the final code generator.
- typical static semantic features include
declarations and type checking.
- information (attributes) gathered can be
either added to the tree as annotations or
entered into the symbol table.
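Here is a minimal sketch of static type checking on the AST for a[index] = 4 + 2. The computed types are stored in the tree as annotations, and the declarations are kept in a tiny symbol table. The type names, node kinds and functions (declare, check, check_example) are hypothetical. Only the rules this example needs are modeled: a subscript must be an integer, + needs operands of the same numeric type, and both sides of = must agree.

```c
#include <stddef.h>
#include <string.h>

typedef enum { T_ERROR, T_INT, T_REAL, T_ARRAY_OF_INT } Type;
typedef enum { E_ID, E_NUM, E_ADD, E_INDEX, E_ASSIGN } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;           /* identifier name (E_ID only) */
    struct Expr *left, *right;
    Type type;                  /* annotation filled in by check() */
} Expr;

/* A tiny symbol table; entries are added by declare(). */
typedef struct { const char *name; Type type; } Symbol;
static Symbol symtab[8];
static int nsyms;

void declare(const char *name, Type type)
{
    if (nsyms < 8)
        symtab[nsyms++] = (Symbol){name, type};
}

/* Undeclared identifiers get T_ERROR. */
static Type lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].type;
    return T_ERROR;
}

/* Computes the type of every node bottom-up and records it as an annotation;
   T_ERROR marks a type mismatch (or an undeclared name) in that subtree. */
Type check(Expr *e)
{
    Type l = T_ERROR, r = T_ERROR;
    if (e->left)  l = check(e->left);
    if (e->right) r = check(e->right);
    switch (e->kind) {
    case E_ID:     e->type = lookup(e->name); break;
    case E_NUM:    e->type = T_INT; break;    /* only integer literals here */
    case E_ADD:    e->type = (l == r && (l == T_INT || l == T_REAL)) ? l : T_ERROR; break;
    case E_INDEX:  e->type = (l == T_ARRAY_OF_INT && r == T_INT) ? T_INT : T_ERROR; break;
    case E_ASSIGN: e->type = (l != T_ERROR && l == r) ? l : T_ERROR; break;
    }
    return e->type;
}

/* Builds a[index] = 4 + 2 (literal values omitted, only types matter) with
   'a' declared as an array of int and 'index' with the given type. */
Type check_example(Type index_type)
{
    Expr a = {E_ID, "a", NULL, NULL, T_ERROR};
    Expr idx = {E_ID, "index", NULL, NULL, T_ERROR};
    Expr four = {E_NUM, NULL, NULL, NULL, T_ERROR};
    Expr two = {E_NUM, NULL, NULL, NULL, T_ERROR};
    Expr sub = {E_INDEX, NULL, &a, &idx, T_ERROR};
    Expr sum = {E_ADD, NULL, &four, &two, T_ERROR};
    Expr assign = {E_ASSIGN, NULL, &sub, &sum, T_ERROR};

    nsyms = 0;
    declare("a", T_ARRAY_OF_INT);
    declare("index", index_type);
    return check(&assign);
}
```

`check_example(T_INT)` yields T_INT. If index is declared as T_REAL instead, the subscript becomes T_ERROR, and so does the whole assignment. That is the point where a type-mismatch error would be reported.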

(3) Output of the semantic analyzer – annotated AST

[Figure: the AST annotated with attributes such as data types, e.g. the type of a: an array with subscripts from a range]

(3) Output of the semantic analyzer (cont’d)

- checks the consistency of the data types of ‘a’, ‘index’, and 4 + 2, or
- reports a type-mismatch error if they are not consistent.

The time ratio for scanning, parsing,
and semantic processing is 30:25:45.

Source Code Optimizer

(4) Output of the Source Code Optimizer

[Figure: the annotated tree after optimization, in which the subtree 4 + 2 has been folded into the constant 6]
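Constant folding is the optimization that turns 4 + 2 into 6 in the example. It can be sketched as a bottom-up pass over the tree. The node layout and the names fold_constants and fold_example are hypothetical, and only + is folded.

```c
#include <stddef.h>

typedef enum { N_NUM, N_ID, N_ADD, N_INDEX, N_ASSIGN } NodeKind;

typedef struct Tree {
    NodeKind kind;
    int value;                  /* constant value (N_NUM only) */
    struct Tree *left, *right;  /* NULL for leaves */
} Tree;

/* Constant folding: bottom-up, every + node whose operands are both
   constants is replaced by a single constant node holding their sum. */
void fold_constants(Tree *t)
{
    if (t == NULL)
        return;
    fold_constants(t->left);
    fold_constants(t->right);
    if (t->kind == N_ADD && t->left->kind == N_NUM && t->right->kind == N_NUM) {
        t->kind = N_NUM;
        t->value = t->left->value + t->right->value;
        t->left = t->right = NULL;      /* the old operand nodes are dropped */
    }
}

/* Builds a[index] = 4 + 2 (identifier names omitted), folds it,
   and returns the root. */
Tree *fold_example(void)
{
    static Tree a = {N_ID, 0, NULL, NULL};
    static Tree idx = {N_ID, 0, NULL, NULL};
    static Tree four = {N_NUM, 4, NULL, NULL};
    static Tree two = {N_NUM, 2, NULL, NULL};
    static Tree sub = {N_INDEX, 0, &a, &idx};
    static Tree sum = {N_ADD, 0, &four, &two};
    static Tree root = {N_ASSIGN, 0, &sub, &sum};
    fold_constants(&root);
    return &root;
}
```

After `fold_example()`, the right child of the assignment is a single constant node with value 6. The subscript subtree is unchanged.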

Intermediate Code Generator

- Transforms the parse tree (logical level) into an intermediate-language representation, e.g., three-address code: A = B op C (op is a binary operator).
- Differences between intermediate code and assembly code:
◦ Assembly code specifies the registers to be used for each operation; intermediate code does not.
◦ Intermediate code can in fact be any internal representation, such as the syntax tree.
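Here is a minimal sketch of generating three-address code from the syntax tree of a[index] = 4 + 2. Each interior node gets a fresh temporary t1, t2, ..., and each instruction has at most one operator. The node kinds and the functions gen, emit and three_address_example are hypothetical. The tree is the unoptimized one, so 4 + 2 is still computed.

```c
#include <stdio.h>
#include <string.h>

typedef enum { IR_LEAF, IR_ADD, IR_INDEX, IR_ASSIGN } IrKind;

typedef struct Node3 {
    IrKind kind;
    const char *text;                 /* identifier or number (leaves only) */
    const struct Node3 *left, *right;
} Node3;

static char code[512];                /* generated three-address code */
static int temps;                     /* temporaries t1, t2, ... used so far */

static void emit(const char *instr)
{
    strncat(code, instr, sizeof code - strlen(code) - 1);
    strncat(code, "\n", sizeof code - strlen(code) - 1);
}

/* Generates code for e; writes into addr the name that holds its value. */
static void gen(const Node3 *e, char *addr, size_t size)
{
    char l[16], r[16], instr[64] = "";
    switch (e->kind) {
    case IR_LEAF:                     /* a name or constant needs no code */
        snprintf(addr, size, "%s", e->text);
        return;
    case IR_ADD:                      /* t = l + r */
        gen(e->left, l, sizeof l);
        gen(e->right, r, sizeof r);
        snprintf(addr, size, "t%d", ++temps);
        snprintf(instr, sizeof instr, "%s = %s + %s", addr, l, r);
        break;
    case IR_INDEX:                    /* t = array[index], used as a value */
        gen(e->left, l, sizeof l);
        gen(e->right, r, sizeof r);
        snprintf(addr, size, "t%d", ++temps);
        snprintf(instr, sizeof instr, "%s = %s[%s]", addr, l, r);
        break;
    case IR_ASSIGN:
        gen(e->right, r, sizeof r);   /* value to be stored */
        if (e->left->kind == IR_INDEX) {
            char base[16], idx[16];
            gen(e->left->left, base, sizeof base);
            gen(e->left->right, idx, sizeof idx);
            snprintf(instr, sizeof instr, "%s[%s] = %s", base, idx, r);
        } else {
            gen(e->left, l, sizeof l);
            snprintf(instr, sizeof instr, "%s = %s", l, r);
        }
        snprintf(addr, size, "%s", r);
        break;
    }
    emit(instr);
}

/* Translates a[index] = 4 + 2 into three-address code. */
const char *three_address_example(void)
{
    static const Node3 a = {IR_LEAF, "a", NULL, NULL};
    static const Node3 idx = {IR_LEAF, "index", NULL, NULL};
    static const Node3 four = {IR_LEAF, "4", NULL, NULL};
    static const Node3 two = {IR_LEAF, "2", NULL, NULL};
    static const Node3 sub = {IR_INDEX, NULL, &a, &idx};
    static const Node3 sum = {IR_ADD, NULL, &four, &two};
    static const Node3 root = {IR_ASSIGN, NULL, &sub, &sum};
    char result[16];

    code[0] = '\0';
    temps = 0;
    gen(&root, result, sizeof result);
    return code;
}
```

The generated code is the two instructions t1 = 4 + 2 and a[index] = t1. Neither mentions a register, which is the difference from assembly code noted above.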
Advanced Code Optimizer

- Detection of undefined variables
- Detection of loop-invariant computation
- Constant folding
- Removal of induction variables
- Elimination of common subexpressions
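Three of the optimizations listed can be illustrated at source level. The two functions below (hypothetical names) compute the same sum. The second has been transformed by hand the way an optimizer would transform the intermediate code, assuming n >= 0 and no integer overflow.

```c
/* Before: x * y is recomputed on every iteration although it never
   changes inside the loop, and i * 4 is recomputed by multiplication. */
int sum_before(const int *v, int n, int x, int y)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i] * (x * y) + i * 4;
    return s;
}

/* After (by hand, assuming n >= 0 and no overflow):
   - loop-invariant code motion: x * y is computed once, before the loop;
   - strength reduction: k tracks i * 4 and is updated by addition;
   - induction-variable removal: i is gone, the pointer p walks the array. */
int sum_after(const int *v, int n, int x, int y)
{
    int s = 0;
    int xy = x * y;                 /* hoisted loop-invariant computation */
    int k = 0;                      /* equals i * 4 in the original loop */
    for (const int *p = v; p < v + n; p++) {
        s += *p * xy + k;
        k += 4;
    }
    return s;
}
```

For v = {1, 2, 3}, x = 2 and y = 5, both versions return 10 + 24 + 38 = 72.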

Elimination of common subexpressions

A = B + C + D
E = B + C + F

might become

T = B + C
A = T + D
E = T + F
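A = B + C + D groups as (B + C) + D, so B + C really is a common subexpression. The sketch below (struct and function names are hypothetical) works on three-address code for one basic block. It replaces a recomputation with a copy of the earlier result, provided that neither operand nor that result has been reassigned in between. It does not treat B + C and C + B as equal.

```c
#include <stdio.h>
#include <string.h>

/* One three-address instruction: result = arg1 op arg2, or a plain copy
   result = arg1 when op is '='. */
typedef struct {
    char result[8], arg1[8], arg2[8];
    char op;
} Quad;

static int defines(const Quad *q, const char *name)
{
    return strcmp(q->result, name) == 0;
}

/* True if some quad strictly between positions from and to assigns name. */
static int redefined(const Quad *code, int from, int to, const char *name)
{
    for (int k = from + 1; k < to; k++)
        if (defines(&code[k], name))
            return 1;
    return 0;
}

/* Local common-subexpression elimination inside one basic block: if an
   earlier quad already computed the same arg1 op arg2, and neither the
   operands nor that quad's result changed since, reuse its result. */
void eliminate_common_subexpressions(Quad *code, int count)
{
    for (int i = 0; i < count; i++) {
        Quad *q = &code[i];
        if (q->op == '=')
            continue;                         /* copies compute nothing */
        for (int j = i - 1; j >= 0; j--) {
            const Quad *p = &code[j];
            if (defines(p, q->arg1) || defines(p, q->arg2))
                break;                        /* an operand changed here */
            if (p->op == q->op && strcmp(p->arg1, q->arg1) == 0 &&
                strcmp(p->arg2, q->arg2) == 0 && !redefined(code, j, i, p->result)) {
                q->op = '=';                  /* q becomes: result = p->result */
                strcpy(q->arg1, p->result);
                q->arg2[0] = '\0';
                break;
            }
        }
    }
}

/* Runs the pass on A = B + C + D; E = B + C + F and returns the result. */
const char *cse_example(void)
{
    Quad block[] = {
        {"t1", "B", "C", '+'},
        {"A", "t1", "D", '+'},
        {"t2", "B", "C", '+'},
        {"E", "t2", "F", '+'},
    };
    static char out[256];
    int count = (int)(sizeof block / sizeof block[0]);

    eliminate_common_subexpressions(block, count);
    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        char line[32];
        if (block[i].op == '=')
            snprintf(line, sizeof line, "%s = %s\n", block[i].result, block[i].arg1);
        else
            snprintf(line, sizeof line, "%s = %s %c %s\n", block[i].result,
                     block[i].arg1, block[i].op, block[i].arg2);
        strncat(out, line, sizeof out - strlen(out) - 1);
    }
    return out;
}
```

`cse_example()` turns t2 = B + C into the copy t2 = t1. A copy-propagation pass (not shown) would then rewrite E = t2 + F as E = t1 + F, which is the T form shown above.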

Code Generator

(5) Output of the code generator

Mov R0, index    // value of index -> R0
Mov R1, &a       // address of a -> R1
Add R1, R0       // add R0 to R1
Mov *R1, 6       // constant 6 -> address in R1
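This minimal sketch covers the last step for a single statement shape: choosing instructions for array[index] = constant in the same pseudo-assembly. The function name gen_array_store is hypothetical, and so is the Mul instruction. When array elements occupy more than one addressable unit, the index must be scaled before it is added to the base address. With elem_size equal to 1, the output is the listing above.

```c
#include <stdio.h>
#include <string.h>

static void append_line(char *out, size_t size, const char *line)
{
    strncat(out, line, size - strlen(out) - 1);
}

/* Emits slide-style pseudo-assembly for  array[index] = constant.
   elem_size is the size of one element in addressable units; the Mul
   instruction used for scaling is an assumption of this sketch. */
const char *gen_array_store(const char *array, const char *index,
                            int constant, int elem_size)
{
    static char out[256];
    char line[64];

    out[0] = '\0';
    snprintf(line, sizeof line, "Mov R0, %s\n", index);     /* value of index -> R0 */
    append_line(out, sizeof out, line);
    if (elem_size != 1) {                                     /* offset = index * size */
        snprintf(line, sizeof line, "Mul R0, %d\n", elem_size);
        append_line(out, sizeof out, line);
    }
    snprintf(line, sizeof line, "Mov R1, &%s\n", array);    /* address of array -> R1 */
    append_line(out, sizeof out, line);
    append_line(out, sizeof out, "Add R1, R0\n");             /* element address in R1 */
    snprintf(line, sizeof line, "Mov *R1, %d\n", constant); /* store the constant */
    append_line(out, sizeof out, line);
    return out;
}
```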