Compiler Construction
Lecture 3
Topics Covered in Lecture 2
[Diagram: phases of a compiler]
Source Code -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator -> Code Optimizer -> Code Generator -> Object Code
The Symbol Table Manager and the Error Handler interact with every phase.
Lexical Analyzer (Part One)
Lexical Analysis
INPUT: sequence of characters
OUTPUT: sequence of tokens
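[Diagram: Input -> (character) -> Scanner -> (token) -> Parser]
The Parser requests tokens from the Scanner with Next_token().
The Scanner requests characters from the Input with Next_char().
The Symbol Table is connected to the Scanner and the Parser.
A lexical analyzer is generally a subroutine of the parser.
A symbol table is a data structure containing a record of each identifier along with its attributes.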
Role of Lexical Analyzer
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes keywords
5. Recognizes identifiers
6. Correlates error messages with the source program
1. Removal of white space
By white space we mean
Blanks
Tabs
New lines
Why ?
White space is generally used only for formatting source code.
A = B + C
is equivalent to
A=B+C
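A minimal Java sketch of this step, assuming white space means only blanks, tabs and new lines (plus the carriage return that accompanies new lines on some systems). The class and method names are hypothetical, not from the lecture. A real scanner first uses white space to separate lexemes (so that `int A` does not become `intA`) and only then discards it; this sketch only shows that the remaining characters of the expression are unchanged.

```java
// Hypothetical helper, not part of the lecture: drops blanks, tabs and new lines.
class WhitespaceSkipper {
    // Assumption: '\r' is treated as part of a new line.
    static boolean isWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    static String stripWhitespace(String source) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (!isWhiteSpace(c)) {
                out.append(c); // keep every character that is not white space
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace("A = B + C")); // prints A=B+C
    }
}
```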
1. Removal of white space
Learn by Example
// This is beginning of my code
int A;
int B = 2;
int
C = 33;
A = B + C;
/* This is
end of
my code
*/
1. Removal of white space
Learn by Doing
// This is beginning of my code
int A ;
A = A
*
A
;
/* This is
end of
my code
*/
2. Removal of comments
Why ?
Comments are user-added text that contributes nothing to the compiled program.
Example in Java
// This is beginning of my code      <- means nothing to the program
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is                           <- means nothing to the program
end of
my code
*/
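Comment removal can be sketched in Java as follows. This is an illustration under stated assumptions, not the lecture's implementation: the class name `CommentStripper` is hypothetical, and the sketch assumes the source contains no string or character literals in which `//` or `/*` could appear.

```java
// Hypothetical helper: removes // line comments and /* ... */ block comments.
// Assumption: no string or character literals contain "//" or "/*".
class CommentStripper {
    static String stripComments(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("//", i)) {
                // Skip to the end of the line; the new line itself is kept.
                while (i < src.length() && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                // In this sketch an unterminated comment swallows the rest of the input.
                i = (end < 0) ? src.length() : end + 2;
            } else {
                out.append(src.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "// This is beginning of my code\nint A;\n/* This is\nend of\nmy code\n*/\n";
        System.out.print(stripComments(src));
    }
}
```

New lines outside comments are deliberately kept, so that line counting for error messages still works after comments are gone.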
3. Recognizes constants/numbers
How is recognition done?
If the source code contains a run of consecutive digits, that run is recognized as a constant.
Example in Java
// This is beginning of my code
int A;
int B = 2 ;
int C = 33 ;
A = B + C;
/* This is
end of
my code
*/
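The digit-run rule can be sketched in Java like this. The class `ConstantFinder` is a hypothetical name, and the sketch assumes comments have already been removed. Digits that belong to a word such as `B2` are skipped, since they are part of an identifier rather than a constant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects runs of consecutive digits as constants.
// Assumption: comments were removed before this step.
class ConstantFinder {
    static List<String> findConstants(String src) {
        List<String> constants = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isLetter(c)) {
                // A word (keyword or identifier): its digits are not constants.
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) {
                    i++;
                }
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                constants.add(src.substring(start, i)); // one maximal run of digits
            } else {
                i++; // white space, operators, punctuation
            }
        }
        return constants;
    }

    public static void main(String[] args) {
        System.out.println(findConstants("int A;\nint B = 2 ;\nint C = 33 ;\nA = B + C;")); // prints [2, 33]
    }
}
```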
4. Recognizes keywords
Keywords in C and Java
if, else, for, while, do, return, etc.
How is recognition done?
By comparing each combination of letters (with or without digits) in the source code
against the keywords predefined in the grammar of the programming language.
Example in Java
int A;
int B = 2 ;
int C = 33 ;
if ( B < C )
A = B + C;
else
A = C - B;
"int" is considered a keyword if the character sequence is: 1. I  2. N  3. T
"if" is considered a keyword if the character sequence is: 1. I  2. F
"else" is considered a keyword if the character sequence is: 1. E  2. L  3. S  4. E
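The comparison against predefined keywords can be sketched in Java as a set lookup. The keyword set below is an assumed, illustrative subset (the slide's list plus `int`), not the full keyword list of C or Java, and the class name `KeywordTable` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: decides whether a lexeme is a keyword by set lookup.
class KeywordTable {
    // Assumption: a small illustrative subset of keywords only.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "if", "else", "for", "while", "do", "return"));

    static boolean isKeyword(String lexeme) {
        // The comparison is case-sensitive, as in C and Java.
        return KEYWORDS.contains(lexeme);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"int", "if", "else", "If", "A"}) {
            System.out.println(word + " -> " + (isKeyword(word) ? "keyword" : "not a keyword"));
        }
    }
}
```

Because the lookup is case-sensitive, `If` is not a keyword here; in C and Java it would be treated as an identifier.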
5. Recognizes identifiers
What are identifiers ?
Names of variables, functions, arrays, etc.
How is recognition done?
If a combination of letters (with or without digits) in the source code is not a keyword,
then the compiler considers it an identifier.
Where is the identifier stored?
When an identifier is detected, it is entered into the symbol table.
Example in Java
// This is beginning of my code
int A;
int B2 = 2 ;
int C4R = 33 ;
A = B + C;
/* This is
end of
my code
*/
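A rough Java sketch of entering identifiers into a symbol table. The attribute stored here (the line where the identifier was first seen) is an assumption chosen for illustration; the lecture does not say which attributes are recorded. The class name, the keyword subset and the method name are likewise hypothetical, and the sketch assumes comments were already removed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: every word that is not a keyword is entered into the symbol table.
class SymbolTableDemo {
    // Assumption: illustrative keyword subset only.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "if", "else", "for", "while", "do", "return"));

    // Maps each identifier to the line where it first appears (an assumed attribute).
    static Map<String, Integer> buildSymbolTable(String src) {
        Map<String, Integer> table = new LinkedHashMap<>();
        int line = 1;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') {
                line++;
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                if (!KEYWORDS.contains(word)) {
                    table.putIfAbsent(word, line); // keep the first occurrence only
                }
            } else if (Character.isDigit(c)) {
                // Skip a numeric constant; no error handling in this sketch.
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
            } else {
                i++;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        String src = "int A;\nint B2 = 2 ;\nint C4R = 33 ;\nA = B + C;";
        System.out.println(buildSymbolTable(src)); // prints {A=1, B2=2, C4R=3, B=4, C=4}
    }
}
```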
6. Correlates error messages with the source program
How ?
Keeps track of the number of new line characters seen in the source code.
Tells the line number when an error message is to be generated,
e.g. "Error Message at line 1".
Example in Java
1. This is beginning of my code
2. int A;
3. int B2 = 2 ;
4. int C4R = 33 ;
5. A = B + C;
6. /* This is
7. end of
8. my code
9. */
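The line-counting idea can be sketched in Java as below. The helper `lineOf` and the class name are hypothetical, and the `#` character is used only as a stand-in for some illegal symbol. A real scanner would typically increment a counter as it reads each new line rather than recount from the start, but the resulting number is the same.

```java
// Hypothetical helper: finds the line number of a character by counting new lines.
class LineTracker {
    // Returns the 1-based line number of the character at the given offset.
    static int lineOf(String src, int offset) {
        int line = 1;
        for (int i = 0; i < offset && i < src.length(); i++) {
            if (src.charAt(i) == '\n') {
                line++; // one more new line character seen
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String src = "int A;\nint B2 = 2 ;\nint C4R = 3 # 3 ;\n";
        int bad = src.indexOf('#'); // '#' stands in for an illegal symbol
        System.out.println("Error Message at line " + lineOf(src, bad)); // prints Error Message at line 3
    }
}
```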
Errors generated by Lexical Analyzer
1. Illegal symbols
Example: =>
2. Illegal identifiers
Example: 2ab
3. Unterminated comments
Example: /* This is beginning of my code
Learn by example
// Beginning of Code
int a char } switch b[2] =;
// end of code
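No error is generated by the lexical analyzer.
Why ?
Each lexeme on its own is a valid token. Detecting that the tokens appear in an invalid order is the job of the syntax analyzer.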
Terminologies
Token
A classification for a common set of strings
Examples:
Identifier, Integer, Float, LeftParen
Lexeme
The actual sequence of characters that matches a pattern and has a given Token class.
Examples:
Identifier: Name, Data, x
Integer: 345, 2, 0, 629
Pattern
The rules that characterize the set of strings for a token
Example:
Integer: A digit followed by zero or more digits
Identifier: A letter followed by zero or more letters or digits
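The two patterns can be written as regular expressions. The Java sketch below assumes that "character" in the identifier pattern means an ASCII letter; the class name and the `classify` method are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the Integer and Identifier patterns as regular expressions.
class TokenPatterns {
    // A digit followed by zero or more digits.
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    // Assumption: a letter followed by zero or more letters or digits (ASCII only).
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    static String classify(String lexeme) {
        if (INTEGER.matcher(lexeme).matches()) {
            return "Integer";
        }
        if (IDENTIFIER.matcher(lexeme).matches()) {
            return "Identifier";
        }
        return "no match";
    }

    public static void main(String[] args) {
        for (String lexeme : new String[] {"345", "Data", "x", "2ab"}) {
            System.out.println(lexeme + " -> " + classify(lexeme));
        }
    }
}
```

Under these two patterns `2ab` matches neither, which is consistent with it being an illegal identifier.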
Learn by Example:
Input string: size := r * 32 + c
Identify the <token, lexeme> pairs
1. <id, size>
2. <assign, :=>
3. <id, r>
4. <arith_symbol, *>
5. <integer, 32>
6. <arith_symbol, +>
7. <id, c>
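The pairs above can be reproduced with a small hand-written scanner. The Java sketch below is an illustration only: the class name `PairScanner` is hypothetical, every word is reported as `id` (no keyword check), only `:=` is treated as the assignment symbol as in this example, and any other character is reported as an illegal symbol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner sketch for the token classes used in this example.
class PairScanner {
    static List<String> scan(String input) {
        List<String> pairs = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // white space only separates lexemes
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                pairs.add("<id, " + input.substring(start, i) + ">");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                pairs.add("<integer, " + input.substring(start, i) + ">");
            } else if (input.startsWith(":=", i)) {
                pairs.add("<assign, :=>");
                i += 2;
            } else if ("+-*/".indexOf(c) >= 0) {
                pairs.add("<arith_symbol, " + c + ">");
                i++;
            } else {
                throw new IllegalArgumentException("Illegal symbol '" + c + "' at position " + i);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String pair : scan("size := r * 32 + c")) {
            System.out.println(pair);
        }
    }
}
```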
Learn by Doing
Input string:
position = initial + rate * 60
Identify the <token, lexeme> pairs
Let's Revise!
Lexical Analysis
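[Diagram: Input -> (character) -> Scanner -> (token) -> Parser]
The Parser requests tokens from the Scanner with Next_token().
The Scanner requests characters from the Input with Next_char().
The Symbol Table is connected to the Scanner and the Parser.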
Role of Lexical Analyzer
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes keywords
5. Recognizes identifiers
6. Correlates error messages with the source program
Terminologies
Token
Identifier, Integer, Float, LeftParen
Lexeme
Identifier: Name, Data, x
Integer: 345, 2, 0, 629
Pattern
Example:
Integer: A digit followed by zero or more digits
Identifier: A letter followed by zero or more letters or digits
Homework
Identify the <token, lexeme> pairs
1. for ( int x= 0; x<=5; x++)
2. B= (( c + a) * d ) / f
3. while ( a < 5 )
a= a+1
4. char MyCourse[5];
5. if ( a< b)
a=a*a;
else
b=b*b;
Assignment-1
Write a program in C++ or Java that reads a
source file and performs the following
operations:
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes Identifiers
Due Date: 28th Nov, 2014