TK3163 – Compiler Constructions
1: Introduction to Compilers
Dr. Bahari Idrus
Prof. Dr. Mohd Juzaiddin Ab Aziz
Agenda for Today
Introduction
Language Processors
Why study compilers
The structure of a compiler
Front end
Back end
Introduction
Is this correct?
Nasi makan harimau.
Rice eats tiger.
9/30/2002 © 2002 Hal Perkins & UW CSE A-3
Introduction
Which one is correct?
A. Nama saya Ahmad.
B. Na masa yaAh mad.
C. Ahmad saya Nama.
Introduction
Which one is a correct addition
operation?
A. 12 + 3 = 15
B. X+Y=Z
C. 2x + Y = Z
D. +a=bc
Introduction
If x == y then
z = 1;
Else
z = 2;
Introduction
Programming languages are notations for
describing computations to people and to
machines (computers).
Before a program can be run, it first must
be translated into a form in which it can
be executed by a computer.
The software systems that do this
translation are called compilers.
Introduction
This course is about how to design and
implement compilers.
We shall discover that a few basic ideas can be
used to construct translators for various languages
and machines.
Introduction
Besides compilers, the principles and techniques
for compiler design apply to so many other
domains that they are likely to be reused many
times in the career of a computer scientist.
The study of compiler writing touches upon
programming languages, machine architecture,
language theory, algorithms, and software
engineering.
Language Processors – What
is a compiler?
A program that reads a program written in one
language (source language) and translates it
into another language (target language).
Source language → Compiler → Target language
An important role of the compiler is to report
any errors in the source program that it detects
during the translation process.
Language Processors (cont)
If the target program is an executable
machine-language program, it can
then be called by the user to process
inputs and produce outputs.
input → Target Program → output
Language Processors – Interpreter
An interpreter is another common kind of
language processor.
Instead of producing a target program as a
translation, an interpreter appears to directly
execute the operations specified in the source
program on inputs supplied by the user.
Source program + Input → Interpreter → Output (and error messages)
The Advantages of Compilers
and Interpreters
The machine-language target program produced by a
compiler is usually much faster than an interpreter at
mapping inputs to outputs.
An interpreter, however, can usually give better
error diagnostics than a compiler because it executes
the source program statement by statement.
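The contrast between the two kinds of language processor can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny expression language, the stack machine, and the names `interpret`, `compile_expr` and `run` are invented for this example and do not come from the slides. It shows the difference in structure (translate once, then run many times, versus executing the source form directly), not the difference in speed.

```python
# Tiny expression language: nested tuples such as ("+", 2, ("*", "x", 3)).
# All names below are hypothetical and exist only for this illustration.

def interpret(expr, env):
    """Interpreter: walk the source form and compute the result directly."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):              # variable reference
        return env[expr]
    op, left, right = expr
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b


def compile_expr(expr):
    """Compiler: translate once into instructions for a made-up stack machine."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    if isinstance(expr, str):
        return [("LOAD", expr)]
    op, left, right = expr
    opcode = "ADD" if op == "+" else "MUL"
    return compile_expr(left) + compile_expr(right) + [(opcode,)]


def run(program, env):
    """Execute the target program on inputs supplied by the user."""
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


source = ("+", 2, ("*", "x", 3))
target = compile_expr(source)           # translation happens once
print(interpret(source, {"x": 4}))      # 14, executed directly from the source form
print(run(target, {"x": 4}))            # 14, executed from the target program
print(run(target, {"x": 10}))           # 32, same target program, new input
```

The interpreter revisits the source structure on every run, while the compiled instruction list is produced once and reused, which is one reason compiled target programs tend to map inputs to outputs faster.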
Common Issues
Compilers and interpreters both must
read the input – a stream of characters
– and “understand” it; analysis
w h i l e ( k < l e n g t h ) { <nl> <tab> i f ( a [ k ] > 0
) <nl> <tab> <tab>{ n P o s + + ; } <nl> <tab> }
Typical Implementations
Compilers
FORTRAN, C, C++, Java, COBOL, etc.
The target program produced by a compiler is
usually much faster than an interpreter at
mapping inputs to outputs.
Interpreters
Perl, Python, PostScript printers, Java VM
Usually give better error diagnostics than a compiler,
because they execute the source program
statement by statement.
Task 1
1. What is the difference between a compiler
and an interpreter?
2. What are the advantages of (a) a compiler
over an interpreter, and (b) an interpreter over
a compiler?
Why Study Compilers? (1)
Why study compilers?
Become a better programmer(!)
Insight into interaction between languages,
compilers, and hardware.
Understanding of implementation
techniques.
Better intuition about what your code does.
Why Study Compilers? (2)
Compiler techniques are everywhere
Parsing (little languages, interpreters)
Database engines
AI: domain-specific languages
Text processing
TeX/LaTeX -> DVI -> PostScript -> PDF
Hardware: VHDL; model-checking tools
Mathematics (Mathematica, MATLAB)
Why Study Compilers? (3)
Fascinating blend of theory and
engineering
Direct applications of theory to practice
Parsing, scanning, static analysis
Some very difficult problems (NP-hard or
worse)
Resource allocation, “optimization”, etc.
Need to come up with good-enough solutions
Structure of a compiler
Front end: analysis
Read source program and understand its structure
and meaning
Back end: synthesis
Generate equivalent target language program
Source Language → Front End (language specific) → Intermediate Language → Back End (machine specific) → Target Language
Compiler Architecture
Front End – language specific (Analysis):
Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → Intermediate Language
Back End – machine specific (Synthesis):
Intermediate Language → Code Optimizer → Intermediate Language → Code Generator → Target language
Symbol Table
Implications
Must generate correct code
Must manage storage of all variables
Must agree with OS & linker on target format
Source → Front End → Back End → Target
More Implications
Need some sort of Intermediate
Representation (IR)
Front end maps source into IR
Back end maps IR to target machine code
Source → Front End → Back End → Target
Front End
source → Scanner → tokens → Parser → IR
Split into two parts
Scanner: Responsible for converting character
stream to token stream
Also strips out white space, comments
Parser: Reads token stream; generates IR
Both of these can be generated automatically
Source language specified by a formal grammar
Tools read the grammar and generate scanner &
parser (either table-driven or hard coded)
Lexical Analysis - Scanning
[Diagram: Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → Semantic Analysis (IC generator) → Code Optimizer → Code Generator; Symbol Table]
• Tokens described formally
• Breaks input into tokens
• Removes white space
The lexical analyzer reads the stream of characters making up
the source program and groups the characters into meaningful
sequences called lexemes.
For each lexeme, the lexical analyzer produces as output a
token of the form:
<token-name, attribute-value>
Tokens
Token stream: Each significant lexical
chunk of the program is represented by
a token
Operators & Punctuation: {}[]!+-=*;: …
Keywords: if, while, return, goto
Identifiers: id & actual name
Constants: kind & value; int, floating-point,
character, string, …
Scanner Example (1)
Source program: position:=initial+rate*60
Lexical analysis: <id,1><=><id,2><+><id,3><*><60>
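The mapping from the source statement to `<id,n>` tokens can be reproduced with a small scanner. This is a minimal sketch, not the course's implementation: the function name `scan`, the regular expression, and the choice to number identifiers in order of first appearance are assumptions made for illustration.

```python
import re

def scan(source):
    """Group characters into lexemes and emit <token-name, attribute-value> tokens."""
    symbol_table = {}   # identifier lexeme -> entry number, in order of first appearance
    tokens = []
    # Lexeme patterns: identifiers, integer literals, ':=' and one-character operators.
    # re.findall skips anything else (such as white space) without reporting it.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|:=|[=+\-*/]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            entry = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(f"<id,{entry}>")
        elif lexeme == ":=":
            tokens.append("<=>")          # the slide writes the assignment token as <=>
        else:
            tokens.append(f"<{lexeme}>")  # operators and numbers stand for themselves
    return tokens, symbol_table


tokens, table = scan("position:=initial+rate*60")
print("".join(tokens))   # <id,1><=><id,2><+><id,3><*><60>
print(table)             # {'position': 1, 'initial': 2, 'rate': 3}
```

The attribute value of each `id` token is an index into the symbol table, so later phases can look up the identifier's name and, eventually, its type and storage.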
Scanner Example (2)
Input text
// this statement does very little
if (x >= y) y = 42;
Token Stream
IF LPAREN ID(x) GEQ ID(y)
RPAREN ID(y) BECOMES INT(42) SCOLON
Note: tokens are atomic items, not character
strings
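A scanner that produces this token stream can be sketched with one regular expression per token kind. The token names follow the slide; the table `TOKEN_SPEC`, the function `tokenize`, and the representation of a token as a `(kind, value)` pair are assumptions for illustration only.

```python
import re

KEYWORDS = {"if": "IF", "while": "WHILE", "return": "RETURN", "goto": "GOTO"}

# One pattern per token kind; order matters (">=" must be tried before "=").
TOKEN_SPEC = [
    ("SKIP",    r"//[^\n]*|\s+"),   # comments and white space are stripped
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("GEQ",     r">="),
    ("BECOMES", r"="),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SCOLON",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Convert a character stream into a list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            tokens.append((KEYWORDS[lexeme], None))   # keywords get their own kind
        elif kind == "ID":
            tokens.append(("ID", lexeme))
        elif kind == "INT":
            tokens.append(("INT", int(lexeme)))
        else:
            tokens.append((kind, None))
    return tokens


text = "// this statement does very little\nif (x >= y) y = 42;"
for kind, value in tokenize(text):
    print(kind if value is None else f"{kind}({value})", end=" ")
# IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON
```

Each token is a small structured value (a kind plus an optional attribute) rather than a substring of the input, which is what the note about atomic items refers to.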
Syntax Analysis - Parsing
[Diagram: Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → Code Optimizer → Code Generator → Target language; Symbol Table]
• Tokens organized into a syntax tree that describes structure
• Error checking (syntax)
• Common output from a parser is an abstract syntax tree
Parser Example (1)
Lexical analysis: <id,1><=><id,2><+><id,3><*><60>
Syntax analysis:
          =
        /   \
   <id,1>    +
           /   \
      <id,2>    *
              /   \
         <id,3>    60
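One way to obtain a tree of this shape is a small recursive-descent parser in which `*` binds tighter than `+`. This is a sketch under assumptions: the layered rules in the comments (assignment, expr, term, factor), the function name `parse_assignment`, and the use of plain strings as tokens are illustrative choices, not the method prescribed by the slides.

```python
def parse_assignment(tokens):
    """Parse  ID '=' expr  and return a nested-tuple syntax tree.

    Assumed rules (layered so that '*' binds tighter than '+'):
        assignment ::= ID '=' expr
        expr       ::= term   { '+' term }
        term       ::= factor { '*' factor }
        factor     ::= ID | NUMBER
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None or token in ("=", "+", "*"):
            raise SyntaxError(f"expected an operand, got {token!r}")
        return advance()

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assignment(["id1", "=", "id2", "+", "id3", "*", "60"]))
# ('=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

The nested tuple mirrors the tree: the multiplication sits below the addition, so it is evaluated first.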
Parser Example (2)
Token Stream Input:
IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON
Abstract Syntax Tree:
            ifStmt
           /      \
        >=          assign
       /  \         /    \
   ID(x)  ID(y)  ID(y)  INT(42)
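The step from this token stream to the abstract syntax tree can be sketched for the single statement shape used here. The rule in the docstring, the helper `expect`, and the tuple encoding of tree nodes are assumptions for illustration; a real parser would handle general conditions and statements rather than this one fixed pattern.

```python
def parse_if(tokens):
    """Toy rule: ifStmt ::= IF LPAREN ID GEQ ID RPAREN ID BECOMES INT SCOLON"""
    pos = 0

    def expect(kind):
        """Consume the next token if it has the given kind; otherwise report a syntax error."""
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        value = tokens[pos][1]
        pos += 1
        return value

    expect("IF")
    expect("LPAREN")
    left = expect("ID")
    expect("GEQ")
    right = expect("ID")
    expect("RPAREN")
    target = expect("ID")
    expect("BECOMES")
    value = expect("INT")
    expect("SCOLON")
    # Punctuation (parentheses, semicolon) is checked but does not appear in the tree.
    return ("ifStmt",
            (">=", ("ID", left), ("ID", right)),
            ("assign", ("ID", target), ("INT", value)))


stream = [("IF", None), ("LPAREN", None), ("ID", "x"), ("GEQ", None), ("ID", "y"),
          ("RPAREN", None), ("ID", "y"), ("BECOMES", None), ("INT", 42), ("SCOLON", None)]
print(parse_if(stream))
# ('ifStmt', ('>=', ('ID', 'x'), ('ID', 'y')), ('assign', ('ID', 'y'), ('INT', 42)))
```

The tree keeps only what later phases need: the condition and the assignment, with the concrete punctuation dropped.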
Parser Example (3)
Grammar:
Exp ::= Exp ‘+’ Exp
| Exp ‘-’ Exp
| Exp ‘*’ Exp
| Exp ‘/’ Exp
| ID
Assign ::= ID ‘=’ Exp
Parse tree:
Assign
  ID ‘=’ Exp
         Exp ‘+’ Exp
         ID      Exp ‘*’ Exp
                 ID      Exp ‘/’ Exp
                         ID      ID
Semantic Analysis
[Diagram: Source language → Scanner (lexical analysis) → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → syntactic/semantic structure → Code Generator → Target language; Symbol Table]
• “Meaning”
• Type/Error Checking
• Intermediate Code Generation – abstract machine
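Type checking and intermediate code generation can be sketched together on a small tree. Everything here is an assumption for illustration: the tuple-based tree, the two types `int` and `float`, the declared types of `A`, `B` and `C`, the conversion operator name `int2fp`, and the quad layout. It is a sketch of the idea, not the course's definition of either phase.

```python
from itertools import count

def check(node, types):
    """Type-check a tree; return (tree with int2fp conversions inserted, type)."""
    if isinstance(node, str):
        return node, types[node]            # KeyError means an undeclared identifier
    op, left, right = node
    left, lt = check(left, types)
    right, rt = check(right, types)
    if op == "=":
        if lt == "float" and rt == "int":
            right = ("int2fp", right)        # widen the value being stored
        elif lt != rt:
            raise TypeError(f"cannot assign {rt} to {lt} variable {left}")
        return ("=", left, right), lt
    if lt != rt:                             # mixed int/float operands: widen the int
        if lt == "int":
            left = ("int2fp", left)
        else:
            right = ("int2fp", right)
        lt = "float"
    return (op, left, right), lt


def gen(node, code, temps):
    """Append quads to `code`; return the name that holds the node's value."""
    if isinstance(node, str):
        return node
    if node[0] == "int2fp":
        src = gen(node[1], code, temps)
        dst = f"t{next(temps)}"
        code.append(("int2fp", src, dst))
        return dst
    op, left, right = node
    if op == "=":
        src = gen(right, code, temps)
        code.append((":=", src, left))
        return left
    a = gen(left, code, temps)
    b = gen(right, code, temps)
    dst = f"t{next(temps)}"
    code.append((op, a, b, dst))
    return dst


types = {"A": "float", "B": "int", "C": "float"}   # assumed declarations
tree, _ = check(("=", "A", ("+", "B", "C")), types)
code = []
gen(tree, code, count(1))
for quad in code:
    print(*quad)
# int2fp B t1
# + t1 C t2
# := t2 A
```

The checker decides where a conversion is needed, and the generator then flattens the annotated tree into one operation per line, using fresh temporaries for intermediate values.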
Back End
Responsibilities
Translate IR into target machine code
Should produce fast, compact code
Should use machine resources effectively
Registers
Instructions
Memory hierarchy
Back End Structure
Typically split into two major parts with
sub phases
“Optimization” – code improvements
May well translate parser IR into another IR
Code generation
Instruction selection & scheduling
Register allocation
The Result
Input:
if (x >= y)
    y = 42;
Output:
    mov eax,[ebp+16]
    cmp eax,[ebp-8]
    jl L17
    mov [ebp-8],42
L17:
Optimization
[Diagram: Source language → Scanner (lexical analysis) → Parser (syntax analysis) → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → syntactic/semantic structure → Code Generator → Target language; Symbol Table]
• Improving efficiency (machine independent)
• Finding optimal code is NP-hard (or worse)
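Constant folding is one simple machine-independent improvement: subexpressions whose operands are all known at compile time are evaluated by the compiler instead of at run time. The sketch below is illustrative only; the tuple-based tree and the function name `fold` are assumptions, and division is left out to avoid dealing with division by zero.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if not isinstance(node, tuple):
        return node                        # a variable name or a literal
    op, left, right = node
    left, right = fold(left), fold(right)  # fold the children first
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)        # both operands known: compute now
    return (op, left, right)


print(fold(("+", "x", ("*", 60, 60))))     # ('+', 'x', 3600)
print(fold(("*", 2, ("+", 3, 4))))         # 14
```

This is a local, cheap rewrite. It improves the code without claiming to find the best possible program, which is the practical stance the slide suggests.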
Code Generation
[Diagram: Source language → Scanner (lexical analysis) → Parser (syntax analysis) → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → syntactic/semantic structure → Code Generator → Target language; Symbol Table]
• IC to real machine code
• Memory management, register allocation,
instruction selection, instruction scheduling, …
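The translation from intermediate code to machine-like instructions can be sketched for a hypothetical load/store machine. The mnemonics `LD`, `ST`, `ADD`, `SUB`, `MUL`, the unlimited supply of registers, and the quad format `(op, arg1, arg2, result)` are all assumptions for this example; it also assumes each temporary is used exactly once, so its register may be overwritten. Real code generators must deal with a finite register set and instruction scheduling.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}

def generate(quads):
    """Translate (op, arg1, arg2, result) quads into instructions for a
    hypothetical load/store machine with unlimited registers."""
    asm = []
    reg_of = {}        # temporary name -> register currently holding it
    next_reg = 1

    def in_register(name):
        """Return a register holding `name`, loading it first if necessary."""
        nonlocal next_reg
        if name in reg_of:
            return reg_of[name]
        reg = f"R{next_reg}"
        next_reg += 1
        asm.append(f"LD {reg}, {name}")
        return reg

    for op, arg1, arg2, result in quads:
        if op == ":=":
            asm.append(f"ST {result}, {in_register(arg1)}")
        else:
            reg = in_register(arg1)
            src2 = reg_of.get(arg2, arg2)   # register if a temporary, else name or immediate
            asm.append(f"{MNEMONIC[op]} {reg}, {reg}, {src2}")
            reg_of[result] = reg            # the result replaces arg1 in its register
    return asm


# a = b + c * 60, already lowered to quads
quads = [("*", "c", "#60", "t1"),
         ("+", "b", "t1", "t2"),
         (":=", "t2", None, "a")]
for line in generate(quads):
    print(line)
# LD R1, c
# MUL R1, R1, #60
# LD R2, b
# ADD R2, R2, R1
# ST a, R2
```

Even this naive version makes two of the listed decisions visible: which instruction implements each operation (instruction selection) and which register holds each value (register allocation).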
[Figure: Translation of statement]
The Phases of a Compiler
Phase: Programmer
Output: Source string
Sample: A=B+C;

Phase: Scanner (performs lexical analysis)
Output: Token string, and symbol table for identifiers
Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’

Phase: Parser (performs syntax analysis based on the grammar of the programming language)
Output: Parse tree or abstract syntax tree
Sample:
    ;
    |
    =
   / \
  A   +
     / \
    B   C

Phase: Semantic analyzer (type checking, etc.)
Output: Parse tree or abstract syntax tree

Phase: Intermediate code generator
Output: Three-address code, quads, or RTL
Sample:
  int2fp B t1
  + t1 C t2
  := t2 A

Phase: Optimizer
Output: Three-address code, quads, or RTL
Sample:
  int2fp B t1
  + t1 #2.3 A

Phase: Code generator
Output: Assembly code
Sample:
  MOVF #2.3,r1
  ADDF2 r1,r2
  MOVF r2,A

Phase: Peephole optimizer
Output: Assembly code
Sample:
  ADDF2 #2.3,r2
  MOVF r2,A
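The last step of the sample, where the peephole optimizer shortens the code generator's output, can be sketched as a pattern match over adjacent instructions. The instruction tuples `(opcode, source, destination)` and the function name `peephole` are assumptions for illustration, and the rewrite is only safe under the stated assumption that the intermediate register is not read again afterwards; a real optimizer would check that.

```python
def peephole(instructions):
    """Fold  MOVF #const,rX ; ADDF2 rX,rY  into  ADDF2 #const,rY.

    Assumes rX is not used after the pair (no liveness check is done here).
    """
    out = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if (nxt is not None
                and cur[0] == "MOVF" and cur[1].startswith("#")
                and nxt[0] == "ADDF2" and nxt[1] == cur[2]
                and nxt[2] != cur[2]):        # rX and rY must be different registers
            out.append(("ADDF2", cur[1], nxt[2]))
            i += 2                            # both instructions were consumed
        else:
            out.append(cur)
            i += 1
    return out


before = [("MOVF", "#2.3", "r1"), ("ADDF2", "r1", "r2"), ("MOVF", "r2", "A")]
for op, src, dst in peephole(before):
    print(f"{op} {src},{dst}")
# ADDF2 #2.3,r2
# MOVF r2,A
```

The optimizer looks only at a small sliding window of instructions, which is why this kind of improvement is cheap to apply after code generation.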