CS-441: Compiler Construction
By:
Muhammad Nadeem
Assistant Professor
International Islamic University, Islamabad
Compiler Construction
Lecture 1
What is a compiler?
It is a language processor (a translator).
It is a program that reads a program in one language (Java, C, Lisp, C#, Pascal, etc.), the source language, and translates it into an equivalent program in another language, the target language.
A compiler also needs to report any errors it detects in the source program during the translation process (e.g., a missing semicolon at the end of a statement).
Why compilers?
Programming in machine (or assembly) language is tedious, error-prone, and machine-dependent.
Historical note: In 1954, IBM started developing the FORTRAN language and its compiler.
Why study compiler theory?
Besides being required…
Prerequisite for developing advanced compilers, a field that continues to be active as new computer architectures emerge
Useful for developing software tools that parse source code or strings
E.g., editors, debuggers, interpreters, preprocessors, …
Important to understand how compilers work in order to program more effectively
Compilation Process
Source Code (something we can understand easily)
--> Compilation Process
--> Object Code (something that the computer can understand easily)
The Compilation Process also produces Error Messages.
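Phases of a Compiler
Two parts: Analysis and Synthesis
Source Code
--> Lexical Analyzer
--> Syntax Analyzer
--> Semantic Analyzer
--> Intermediate Code Generator
--> Code Optimizer
--> Code Generator
--> Object Code
Analysis covers the front of this pipeline and Synthesis the back.
The Symbol Table Manager and the Error Handler work alongside the phases.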
Running example
Source Code --> Position = initial + rate*60
Lexical Analyzer (Scanner)
It reads a stream of characters and groups the
characters into tokens
Learn by Example
Position = initial + rate*60
Tokens Generated
1. Identifier#1 Position
2. Assignment Operator =
3. Identifier#2 initial
4. Addition Operator +
5. Identifier#3 rate
6. Multiplication Operator *
7. Number 60
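The grouping of characters into tokens can be sketched in a few lines of Python. This is only an illustration under assumptions: the token class names are taken from the list above, the regular expressions and the `tokenize` helper are hypothetical, and the identifier numbering (#1, #2, #3) is left out.

```python
import re

# Token classes named after the slide's list; the patterns are assumptions.
TOKEN_SPEC = [
    ("Number", r"\d+"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Assignment Operator", r"="),
    ("Addition Operator", r"\+"),
    ("Multiplication Operator", r"\*"),
    ("Division Operator", r"/"),
    ("Skip", r"\s+"),  # whitespace separates tokens and is discarded
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]


def tokenize(text):
    """Group the characters of `text` into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            if match:
                if name != "Skip":
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No pattern matched at this position: a lexical error.
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens


for token_class, lexeme in tokenize("Position = initial + rate*60"):
    print(token_class, lexeme)
```

The same helper can be used to check an answer to the exercise that follows.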
Learn by doing
Percentage = Marks_Obtained / Total * 100
Scanning/Tokenization
Input File <--> Token Buffer
What does the Token Buffer contain?
Token being identified
Why a two-way (<-->) street?
Characters can be read and unread
Unreading is needed to detect the termination of a token
Example
Input: main()
Token Buffer after each step (newest character on the left):
m
am
iam
niam
(niam <-- "(" is read, but it cannot be part of the token
niam <-- "(" is unread
Keyword: main
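Reading and unreading can be sketched as follows. The `CharSource` class and the `next_word` function are hypothetical helpers written for this illustration; they are not part of any particular scanner generator.

```python
import io


class CharSource:
    """Character stream that supports reading and unreading (hypothetical helper)."""

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self._pushed_back = []

    def read(self):
        """Return the next character, or '' at the end of the input."""
        if self._pushed_back:
            return self._pushed_back.pop()
        return self._stream.read(1)

    def unread(self, ch):
        """Give a character back so that the next read() returns it again."""
        if ch:
            self._pushed_back.append(ch)


def next_word(source):
    """Collect letters in the token buffer until a non-letter ends the token."""
    buffer = []
    while True:
        ch = source.read()
        if ch.isalpha():
            buffer.append(ch)
        else:
            # The terminating character belongs to the next token: unread it.
            source.unread(ch)
            return "".join(buffer)


source = CharSource("main()")
print(next_word(source))  # the word collected from the input
print(source.read())      # the character that ended it is still available
```

The scanner only learns that `main` is complete by reading one character too many, which is why the street between the input file and the token buffer runs both ways.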
Lexical Analyzer --> id1 = id2 + id3*number
Syntax Analyzer (Parser)
Uses the tokens produced by the lexical analyzer
to create a tree-like intermediate representation.
Parse tree depicts the grammatical structure of
the token stream.
Example
Source Code --> Position = initial + rate*60
Lexical Analyzer --> id1 = id2 + id3 * number
Syntax Analyzer --> Parse Tree / Syntax Tree (children indented under their operator)
=
    id1 (position)
    +
        id2 (initial)
        *
            id3 (rate)
            number (60)
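A minimal sketch of how such a tree could be built from the token stream is given below. The small expression grammar inside `parse_assignment` is an assumption made for this illustration (multiplication binds tighter than addition, as in the tree for the running example), and the nested-tuple representation is just one convenient choice.

```python
def parse_assignment(tokens):
    """Build a syntax tree (nested tuples) for: target = operand (+|*) operand ..."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None or token in ("=", "+", "*"):
            raise SyntaxError(f"operand expected, found {token!r}")
        return advance()

    def term():
        # '*' is handled at a lower level than '+', so it binds tighter.
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("'=' expected")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assignment("id1 = id2 + id3 * number".split()))
```

Each tuple is (operator, left child, right child), so the `*` node ends up below the `+` node, and the `+` node below `=`.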
Syntax Analyzer (Parser)
Learn by doing
Percentage = Marks_Obtained / Total * 100
Parser
Maps the code onto the rules of a grammar
Controls the overall operation
Asks the scanner to produce a token
Failure: Syntax Error!
Success:
Does nothing and returns to get the next token, or
Takes a semantic action
Grammar Rules
<C-PROG> --> MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
<PARAMS> --> NULL
<PARAMS> --> VAR <VAR-LIST>
<VAR-LIST> --> , VAR <VAR-LIST>
<VAR-LIST> --> NULL
<MAIN-BODY> --> CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> --> <TYPE> VAR <VAR-LIST> ;
<ASSIGN-STMT> --> VAR = <EXPR> ;
<EXPR> --> VAR
<EXPR> --> VAR <OP> <EXPR>
<OP> --> +
<OP> --> -
<TYPE> --> INT
<TYPE> --> FLOAT
Demo
main() {
int a,b;
a = b;
}
Parser (to Scanner): "Please, get me the next token"
Scanner fills the Token Buffer one character at a time (newest character on the left):
m
am
iam
niam
(niam
niam
Scanner (to Parser): Token: main
Parser: "I recognize this"
Parsing (Matching)
Start matching using a rule
When a match takes place at a certain position, move further (get the next token and repeat)
If expansion needs to be done, choose the appropriate rule (How to decide which rule to choose?)
If no rule is found, declare an error
If several rules are found, the parser cannot decide with the current token alone; the grammar (set of rules) may be ambiguous or may need more lookahead
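One common answer to "which rule to choose?" is to look at the next token. The sketch below applies that idea to three nonterminals from the grammar rules slide. The `RULES` table and the `candidates` function are hypothetical, and the check is deliberately simplified: it inspects only the first symbol of each alternative and assumes that symbol is a terminal.

```python
# Each nonterminal maps to its alternatives; [] stands for the NULL rule.
RULES = {
    "<VAR-LIST>": [[",", "VAR", "<VAR-LIST>"], []],
    "<TYPE>": [["INT"], ["FLOAT"]],
    "<EXPR>": [["VAR"], ["VAR", "<OP>", "<EXPR>"]],
}


def candidates(nonterminal, lookahead):
    """Return the alternatives of `nonterminal` that can start with `lookahead`.

    A NULL alternative is used only as a fallback when no other
    alternative starts with the lookahead token.
    """
    alternatives = RULES[nonterminal]
    matching = [alt for alt in alternatives if alt and alt[0] == lookahead]
    if not matching:
        matching = [alt for alt in alternatives if not alt]
    return matching


print(candidates("<TYPE>", "INT"))      # one rule: expand it
print(candidates("<VAR-LIST>", ";"))    # only the NULL rule applies
print(candidates("<TYPE>", "VAR"))      # no rule: syntax error
print(candidates("<EXPR>", "VAR"))      # two rules: one token is not enough
```

Both `<EXPR>` alternatives begin with VAR, so a single token of lookahead cannot separate them. A parser can read the shared VAR first and decide afterwards, which amounts to left-factoring the two rules.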
Scanning & Parsing Combined
main() {
int a,b;
a = b;
}
For every token the Parser asks the Scanner: "Please, get me the next token". The Scanner's answers, and the rules the Parser matches them against, in order:
Token: MAIN
<C-PROG> --> MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
Token: OPENPAR
Matches OPENPAR in <C-PROG>
Token: CLOSEPAR
<PARAMS> --> NULL
Then matches CLOSEPAR in <C-PROG>
Token: CURLYOPEN
<MAIN-BODY> --> CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
Token: INT
<DECL-STMT> --> <TYPE> VAR <VAR-LIST> ;
<TYPE> --> INT
Token: VAR
Matches VAR in <DECL-STMT>
Token: ',' [COMMA]
<VAR-LIST> --> , VAR <VAR-LIST>
Token: VAR
Matches VAR in <VAR-LIST>
Token: ';'
<VAR-LIST> --> NULL
Then matches ; at the end of <DECL-STMT>
Token: VAR
<ASSIGN-STMT> --> VAR = <EXPR> ;
Token: '='
Matches = in <ASSIGN-STMT>
Token: VAR
<EXPR> --> VAR
Token: ';'
Matches ; at the end of <ASSIGN-STMT>
Token: CURLYCLOSE
Matches CURLYCLOSE in <MAIN-BODY>
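The walkthrough above can be reproduced with a small recursive-descent parser, sketched here under assumptions: one method per nonterminal of the slide grammar, a plain list of token names standing in for the scanner, and hypothetical names (`Parser`, `parse`, `trace`) throughout. It is one possible way to implement the matching, not necessarily the one the slides have in mind.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = iter(tokens)  # stands in for the scanner
        self.trace = []             # rules applied, in order
        self.current = None
        self.advance()

    def advance(self):
        """'Please, get me the next token.'"""
        self.current = next(self.tokens, "EOF")

    def expect(self, kind):
        if self.current != kind:
            raise SyntaxError(f"expected {kind}, found {self.current}")
        self.advance()

    def c_prog(self):
        self.trace.append("<C-PROG>")
        self.expect("MAIN")
        self.expect("OPENPAR")
        self.params()
        self.expect("CLOSEPAR")
        self.main_body()
        self.expect("EOF")

    def params(self):
        if self.current == "VAR":
            self.trace.append("<PARAMS> --> VAR <VAR-LIST>")
            self.advance()
            self.var_list()
        else:
            self.trace.append("<PARAMS> --> NULL")

    def var_list(self):
        if self.current == ",":
            self.trace.append("<VAR-LIST> --> , VAR <VAR-LIST>")
            self.advance()
            self.expect("VAR")
            self.var_list()
        else:
            self.trace.append("<VAR-LIST> --> NULL")

    def main_body(self):
        self.trace.append("<MAIN-BODY>")
        self.expect("CURLYOPEN")
        self.decl_stmt()
        self.assign_stmt()
        self.expect("CURLYCLOSE")

    def decl_stmt(self):
        self.trace.append("<DECL-STMT>")
        if self.current not in ("INT", "FLOAT"):
            raise SyntaxError(f"type expected, found {self.current}")
        self.trace.append(f"<TYPE> --> {self.current}")
        self.advance()
        self.expect("VAR")
        self.var_list()
        self.expect(";")

    def assign_stmt(self):
        self.trace.append("<ASSIGN-STMT>")
        self.expect("VAR")
        self.expect("=")
        self.expr()
        self.expect(";")

    def expr(self):
        # Both <EXPR> rules start with VAR, so decide after reading it.
        self.expect("VAR")
        if self.current in ("+", "-"):
            self.trace.append("<EXPR> --> VAR <OP> <EXPR>")
            self.advance()
            self.expr()
        else:
            self.trace.append("<EXPR> --> VAR")


def parse(tokens):
    parser = Parser(tokens)
    parser.c_prog()
    return parser.trace


# Token names for: main() { int a,b; a = b; }
DEMO = ["MAIN", "OPENPAR", "CLOSEPAR", "CURLYOPEN", "INT", "VAR", ",", "VAR",
        ";", "VAR", "=", "VAR", ";", "CURLYCLOSE"]
for rule in parse(DEMO):
    print(rule)
```

A missing token, such as the semicolon after the declaration, makes `expect` raise `SyntaxError`, which corresponds to the "Failure: Syntax Error!" case.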
What Is Happening?
During/after parsing?
Tokens get gobbled
Symbol tables
Variables have attributes
Declarations attach attributes to variables
Semantic actions
What are semantic actions?
Semantic checks
Symbol Table
int a,b;
Declares a and b
Within current scope
Type integer
Use of a and b now legal
Basic Symbol Table
Name    Type    Scope
a       int     "main"
b       int     "main"
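A basic symbol table of this kind can be sketched as a dictionary keyed by scope and name. The `SymbolTable` class and its method names are hypothetical and chosen for this illustration; real compilers usually store more attributes and handle nested scopes.

```python
class SymbolTable:
    """Maps (scope, name) to a type: a minimal, hypothetical symbol table."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_, scope):
        """Semantic action for a declaration: attach a type to a name."""
        key = (scope, name)
        if key in self._entries:
            raise NameError(f"'{name}' is already declared in scope '{scope}'")
        self._entries[key] = type_

    def lookup(self, name, scope):
        """Semantic check for a use: the name must have been declared."""
        try:
            return self._entries[(scope, name)]
        except KeyError:
            raise NameError(f"'{name}' is not declared in scope '{scope}'") from None

    def rows(self):
        """Return the table as (Name, Type, Scope) rows in declaration order."""
        return [(name, type_, scope) for (scope, name), type_ in self._entries.items()]


table = SymbolTable()
for name in ("a", "b"):          # int a,b;
    table.declare(name, "int", "main")

# a = b; both uses are legal because a and b are declared in "main".
print(table.lookup("a", "main"), table.lookup("b", "main"))
print(table.rows())
```

Looking up a name that was never declared raises `NameError`, which plays the role of a semantic error here.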
Let's Revise!
Source Code --> Position = initial + rate*60
Lexical Analyzer --> id1 = id2 + id3*number
Syntax Analyzer -->
=
    id1 (position)
    +
        id2 (initial)
        *
            id3 (rate)
            number (60)
The tree is passed on to the Semantic Analyzer, followed by the Intermediate Code Generator, Code Optimizer, and Code Generator, which produces the Object Code.
The Symbol Table Manager and the Error Handler work alongside the phases.