Lec 3-Compiler Construction

The document discusses the role and functions of a lexical analyzer in compiler construction. A lexical analyzer removes whitespace and comments from source code. It recognizes constants, keywords, and identifiers by comparing character sequences to predefined patterns. It correlates error messages to line numbers in the source code and communicates detected errors to other compiler components like the parser.

Uploaded by

Bilal Riaz

Compiler Construction
Lecture 3

Topics Covered in Lecture 2

[Diagram: phases of a compiler]
Source Code
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Object Code
The Symbol Table Manager and the Error Handler work alongside these phases.

Lexical Analyzer
(Part One)

Lexical Analysis

INPUT: sequence of characters
OUTPUT: sequence of tokens

[Diagram: Input -> (character) -> Scanner -> (token) -> Parser. The scanner reads the input with Next_char(), the parser requests tokens with Next_token(), and both the scanner and the parser use the Symbol Table.]

A lexical analyzer is generally a subroutine of the parser.
A symbol table is a data structure containing a record of each identifier along with its attributes.

Role of Lexical Analyzer

1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes identifiers
6. Correlates error messages with the source program

1. Removal of white space

By white space we mean:
Blanks
Tabs
New lines

Why?
White space is generally used only for formatting source code, so
A = B + C
is equivalent to
A=B+C
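A minimal Java sketch of this idea. The class WhitespaceDemo and the helper splitTokens are hypothetical names, not part of the lecture. The sketch assumes that every non-alphanumeric character is a lexeme on its own, which is enough to show that both spellings of the statement give the same lexemes.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceDemo {
    // Hypothetical helper: splits a statement into lexemes, skipping white space.
    // Simplification: every non-alphanumeric character is its own lexeme.
    static List<String> splitTokens(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++; // white space only separates lexemes, it is never emitted
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) {
                    i++;
                }
                out.add(src.substring(start, i));
            } else {
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitTokens("A = B + C")); // prints [A, =, B, +, C]
        System.out.println(splitTokens("A=B+C"));     // prints [A, =, B, +, C]
    }
}
```

Note that the white space is skipped between lexemes, not deleted from the text beforehand: deleting it first would turn "int A" into "intA".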

1. Removal of white space


Learn by Example
// This is beginning of my code
int A;
int B = 2;
int
C = 33;
A = B + C;
/* This is
end of
my code
*/

1. Removal of white space


Learn by Doing
// This is beginning of my code
int A ;
A = A
*
A
;
/* This is
end of
my code
*/

2. Removal of comments

Why?
Comments are user-added strings that do not contribute to the compiled program.

Example in Java
// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is
end of
my code
*/

Both the // line and the /* ... */ block mean nothing to the program.
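Comment removal could be sketched in Java as follows. CommentStripper and strip are hypothetical names. The sketch assumes only // and /* */ comments exist and, as a simplification, does not handle comment markers inside string or character literals.

```java
class CommentStripper {
    // Hypothetical helper: returns src with // and /* */ comments removed.
    // Simplification: string and character literals are not handled.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("//", i)) {
                // Skip to the end of the line; the new line itself is kept.
                while (i < src.length() && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                // An unterminated comment swallows the rest of the input here.
                i = (end < 0) ? src.length() : end + 2;
            } else {
                out.append(src.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "// This is beginning of my code\nint A;\n/* This is\nend of\nmy code\n*/\n";
        System.out.print(strip(code));
    }
}
```

New line characters outside comments are kept on purpose, so that line counting for error messages still works after the comments are gone.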

3. Recognizes constants/numbers

How is recognition done?
If the source code contains a run of consecutive digits, it is recognized as a constant.

Example in Java
// This is beginning of my code
int A;
int B = 2 ;
int C = 33 ;
A = B + C;
/* This is
end of
my code
*/

Here 2 and 33 are recognized as constants.
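The rule above can be sketched in Java like this. ConstantScanner and scanInteger are hypothetical names, and the sketch assumes only unsigned decimal integers count as constants.

```java
class ConstantScanner {
    // Hypothetical helper: returns the run of digits that starts at index `start`,
    // or an empty string if there is no digit at that position.
    static String scanInteger(String src, int start) {
        int i = start;
        while (i < src.length() && src.charAt(i) >= '0' && src.charAt(i) <= '9') {
            i++;
        }
        return src.substring(start, i);
    }

    public static void main(String[] args) {
        String line = "int C = 33 ;";
        int firstDigit = line.indexOf('3');
        System.out.println(scanInteger(line, firstDigit)); // prints 33
    }
}
```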

4. Recognizes keywords

Keywords in C and Java
if, else, for, while, do, return, etc.

How is recognition done?
By comparing each sequence of letters (with or without digits) in the source code with the keywords predefined in the grammar of the programming language.

Example in Java
int A;
int B = 2 ;
int C = 33 ;
if ( B < C )
A = B + C;
else
A = C - B;

Considered a keyword if the character sequence is: 1. i  2. n  3. t
Considered a keyword if the character sequence is: 1. i  2. f
Considered a keyword if the character sequence is: 1. e  2. l  3. s  4. e
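One way to sketch this comparison in Java is a lookup in a set of reserved words. KeywordTable and isKeyword are hypothetical names, and the set below lists only a handful of keywords for illustration. Set.of requires Java 9 or later.

```java
import java.util.Set;

class KeywordTable {
    // A few keywords only; a real table lists every keyword of the language.
    static final Set<String> KEYWORDS =
            Set.of("int", "if", "else", "for", "while", "do", "return");

    // True if the character sequence matches a predefined keyword exactly.
    static boolean isKeyword(String lexeme) {
        return KEYWORDS.contains(lexeme);
    }

    public static void main(String[] args) {
        System.out.println(isKeyword("int"));  // prints true
        System.out.println(isKeyword("else")); // prints true
        System.out.println(isKeyword("B"));    // prints false
    }
}
```

The comparison is case-sensitive, as it is in C and Java, so "If" would not be recognized as the keyword "if".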

5. Recognizes identifiers

What are identifiers?
Names of variables, functions, arrays, etc.

How is recognition done?
If a sequence of letters (with or without digits) in the source code is not a keyword, the compiler considers it an identifier.

Where is an identifier stored?
When an identifier is detected, it is entered into the symbol table.

Example in Java
// This is beginning of my code
int A;
int B2 = 2 ;
int C4R = 33 ;
A = B + C;
/* This is
end of
my code
*/
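A rough Java sketch of this rule, under stated assumptions: IdentifierDemo and classify are hypothetical names, the keyword set is deliberately tiny, and the symbol table stores only one illustrative attribute (the line of first occurrence).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class IdentifierDemo {
    static final Set<String> KEYWORDS = Set.of("int", "if", "else");

    // Symbol table: identifier name -> line of first occurrence (illustrative attribute).
    static final Map<String, Integer> symbolTable = new LinkedHashMap<>();

    // Classifies a word made of letters and digits that starts with a letter.
    // Anything that is not a keyword is treated as an identifier and recorded.
    static String classify(String word, int line) {
        if (KEYWORDS.contains(word)) {
            return "keyword";
        }
        symbolTable.putIfAbsent(word, line);
        return "id";
    }

    public static void main(String[] args) {
        System.out.println(classify("int", 2)); // prints keyword
        System.out.println(classify("B2", 3));  // prints id
        System.out.println(classify("C4R", 4)); // prints id
        System.out.println(symbolTable);        // prints {B2=3, C4R=4}
    }
}
```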

6. Correlates error messages with the source program

How?
Keeps track of the number of new line characters seen in the source code.
Tells the line number when an error message is to be generated.

Example in Java
1. This is beginning of my code
2. int A;
3. int B2 = 2 ;
4. int C4R = 33 ;
5. A = B + C;
6. /* This is
7. end of
8. my code
9. */

Error message at line 1
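The bookkeeping could look like the following Java sketch. LineTracker, advance, and lineOf are hypothetical names. The sketch assumes lines are numbered from 1 and that '\n' ends a line.

```java
class LineTracker {
    private int line = 1; // lines are numbered from 1

    // Call once for every character the scanner consumes.
    void advance(char c) {
        if (c == '\n') {
            line++;
        }
    }

    int currentLine() {
        return line;
    }

    // Hypothetical helper: the line number on which offset `pos` of src lies.
    static int lineOf(String src, int pos) {
        LineTracker tracker = new LineTracker();
        for (int i = 0; i < pos; i++) {
            tracker.advance(src.charAt(i));
        }
        return tracker.currentLine();
    }

    public static void main(String[] args) {
        String src = "int A;\nint B2 = 2 ;\nint C4R = 33 $;";
        int pos = src.indexOf('$');
        System.out.println("Illegal symbol at line " + lineOf(src, pos)); // prints Illegal symbol at line 3
    }
}
```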

Errors generated by Lexical Analyzer

1. Illegal symbols
Example: =>
2. Illegal identifiers
Example: 2ab
3. Unterminated comments
Example: /* This is beginning of my code
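Two of these checks can be sketched in Java as follows. LexicalErrors, isIllegalIdentifier, and isUnterminatedComment are hypothetical names. The sketch assumes an identifier is illegal when it starts with a digit but also contains a letter, which covers the 2ab example.

```java
class LexicalErrors {
    // True if the word starts with a digit but also contains a letter, e.g. "2ab".
    static boolean isIllegalIdentifier(String word) {
        if (word.isEmpty() || !Character.isDigit(word.charAt(0))) {
            return false;
        }
        for (char c : word.toCharArray()) {
            if (Character.isLetter(c)) {
                return true;
            }
        }
        return false; // digits only: this is a constant, not an identifier
    }

    // True if a comment opened with /* at index `open` is never closed with */.
    static boolean isUnterminatedComment(String src, int open) {
        return src.startsWith("/*", open) && src.indexOf("*/", open + 2) < 0;
    }

    public static void main(String[] args) {
        System.out.println(isIllegalIdentifier("2ab")); // prints true
        System.out.println(isUnterminatedComment("/* This is beginning of my code", 0)); // prints true
    }
}
```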

Learn by example
// Beginning of Code
int a char } switch b[2] =;
// end of code

No error generated.
Why?
Each lexeme is a valid token on its own. Checking the order of the tokens is the job of the syntax analyzer.

Terminologies

Token
A classification for a common set of strings.
Examples: Identifier, Integer, Float, LeftParen

Lexeme
The actual sequence of characters that matches a pattern and has a given token class.
Examples:
Identifier: Name, Data, x
Integer: 345, 2, 0, 629

Pattern
The rules that characterize the set of strings for a token.
Examples:
Integer: a digit optionally followed by more digits
Identifier: a letter optionally followed by letters or digits
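The two example patterns can be written as regular expressions. The Java sketch below uses java.util.regex.Pattern. TokenPatterns and its method names are hypothetical, and the character classes assume ASCII letters and digits only.

```java
import java.util.regex.Pattern;

class TokenPatterns {
    // Integer: a digit optionally followed by more digits.
    static final Pattern INTEGER = Pattern.compile("[0-9][0-9]*");
    // Identifier: a letter optionally followed by letters or digits.
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    static boolean isInteger(String lexeme) {
        return INTEGER.matcher(lexeme).matches();
    }

    static boolean isIdentifier(String lexeme) {
        return IDENTIFIER.matcher(lexeme).matches();
    }

    public static void main(String[] args) {
        System.out.println(isInteger("629"));     // prints true
        System.out.println(isIdentifier("Data")); // prints true
        System.out.println(isIdentifier("2ab"));  // prints false
    }
}
```

Here "629" and "Data" are lexemes, Integer and Identifier are their tokens, and the two regular expressions are the patterns.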

Learn by Example:
Input string: size := r * 32 + c
Identify the <token, lexeme> pairs
1. <id, size>
2. <assign, :=>
3. <id, r>
4. <arith_symbol, *>
5. <integer, 32>
6. <arith_symbol, +>
7. <id, c>
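
The pairs above can be reproduced with a small hand-written scanner. The Java sketch below is one possible way to do it, not the lecture's implementation: PairLexer and tokenize are hypothetical names, and the sketch only knows the symbols used in this example (:=, * and +).

```java
import java.util.ArrayList;
import java.util.List;

class PairLexer {
    // Hypothetical tokenizer for the small expression language of this example.
    static List<String> tokenize(String src) {
        List<String> pairs = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++; // white space is skipped, not reported
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) {
                    i++;
                }
                pairs.add("<id, " + src.substring(start, i) + ">");
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                pairs.add("<integer, " + src.substring(start, i) + ">");
            } else if (src.startsWith(":=", i)) {
                i += 2;
                pairs.add("<assign, :=>");
            } else if (c == '+' || c == '*') {
                i++;
                pairs.add("<arith_symbol, " + c + ">");
            } else {
                throw new IllegalArgumentException("Illegal symbol '" + c + "' at offset " + i);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String pair : tokenize("size := r * 32 + c")) {
            System.out.println(pair);
        }
    }
}
```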


Learn by Doing
Input string: position = initial + rate * 60
Identify the <token, lexeme> pairs

Let's Revise!

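Lexical Analysis

[Diagram: Input -> (character) -> Scanner -> (token) -> Parser. The scanner reads the input with Next_char(), the parser requests tokens with Next_token(), and both the scanner and the parser use the Symbol Table.]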

Role of Lexical Analyzer

1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes identifiers
6. Correlates error messages with the source program

Terminologies

Token
Examples: Identifier, Integer, Float, LeftParen

Lexeme
Identifier: Name, Data, x
Integer: 345, 2, 0, 629

Pattern
Examples:
Integer: a digit optionally followed by more digits
Identifier: a letter optionally followed by letters or digits

Homework
Identify the <token, lexeme> pairs
1. for ( int x= 0; x<=5; x++)
2. B= (( c + a) * d ) / f
3. while ( a < 5 )
   a= a+1
4. char MyCourse[5];
5. if ( a< b)
   a=a*a;
   else
   b=b*b;

Assignment-1
Write a program in C++ or Java that reads a source file and performs the following operations:
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes Identifiers
Due Date: 28th Nov, 2014