File 1675742677 110405 LexicalAnalysis-Continue1

The document discusses the role of lexical analyzers in programming, detailing the process of token recognition using regular expressions and finite automata. It explains the architecture of a transition-diagram-based lexical analyzer and the differences between deterministic and nondeterministic finite automata. Additionally, it covers the conversion of regular expressions to finite automata and the implementation of a DFA using a table-driven approach.

Uploaded by

ifexplora

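The role of lexical analyzer
[Figure: source program -> Lexical Analyzer -> (token) -> Parser -> to semantic analysis. The parser requests tokens with getNextToken; both the lexical analyzer and the parser use the symbol table.]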
Regular expressions
Ɛ is a regular expression, L(Ɛ) = {Ɛ}
If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
(r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
(r)(s) is a regular expression denoting the language L(r)L(s)
(r)* is a regular expression denoting (L(r))*
(r) is a regular expression denoting L(r)
Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn

Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions
One or more instances: (r)+
Zero or one instance: r?
Character classes: [abc]

Example:
letter_ -> [A-Za-z_]
digit -> [0-9]
id -> letter_ (letter_ | digit)*
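The id definition can be checked by hand with a short loop. The following is a minimal sketch in C, assuming ASCII input and a whole-string match; the names is_letter_, is_digit and is_identifier are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* letter_ -> [A-Za-z_]  (assumes an ASCII-compatible character set) */
static bool is_letter_(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* digit -> [0-9] */
static bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* id -> letter_ (letter_ | digit)*
 * Returns true only if the whole string s matches the pattern. */
bool is_identifier(const char *s) {
    assert(s != NULL);
    if (!is_letter_(s[0]))      /* first character must be letter_ */
        return false;           /* this also rejects the empty string */
    for (const char *p = s + 1; *p != '\0'; ++p) {
        if (!is_letter_(*p) && !is_digit(*p))
            return false;
    }
    return true;
}
```

With this sketch, is_identifier("_tmp1") holds, while is_identifier("1x") does not, because a digit cannot start an identifier.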
Recognition of tokens
Starting point is the language grammar to
understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt
|Ɛ
expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.)
The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
We also need to handle whitespace:
ws -> (blank | tab | newline)+
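To make the number pattern concrete, here is a small hand-written matcher in C. It is only a sketch under stated assumptions: it follows number -> digits (. digits)? (E [+-]? digits)?, it matches the whole string rather than the longest prefix, and the names skip_digits and is_number are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* digits -> digit+
 * Advances *i past a run of digits; returns false if there is none. */
static bool skip_digits(const char *s, size_t *i) {
    size_t start = *i;
    while (s[*i] >= '0' && s[*i] <= '9')
        ++*i;
    return *i > start;
}

/* number -> digits (. digits)? (E [+-]? digits)? */
bool is_number(const char *s) {
    assert(s != NULL);
    size_t i = 0;
    if (!skip_digits(s, &i))                 /* digits */
        return false;
    if (s[i] == '.') {                       /* optional fraction */
        ++i;
        if (!skip_digits(s, &i))
            return false;
    }
    if (s[i] == 'E') {                       /* optional exponent */
        ++i;
        if (s[i] == '+' || s[i] == '-')
            ++i;
        if (!skip_digits(s, &i))
            return false;
    }
    return s[i] == '\0';                     /* nothing may follow */
}
```

A real lexical analyzer would instead stop at the longest matching prefix and retract, as the transition-diagram code for relop does.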
Transition diagrams
[Figure: transition diagram for relop]
Transition diagrams (cont.)
[Figure: transition diagram for reserved words and identifiers]
Transition diagrams (cont.)
[Figure: transition diagram for unsigned numbers]
Transition diagrams (cont.)
[Figure: transition diagram for whitespace]
Architecture of a transition-diagram-based lexical analyzer
TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1) { /* repeat character processing until a
                   return or failure occurs */
        switch (state) {
        case 0: c = nextchar();
                if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else fail(); /* lexeme is not a relop */
                break;
        case 1: …
        …
        case 8: retract(); /* give back the character read past the lexeme */
                retToken.attribute = GT;
                return(retToken);
        }
    }
}
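The slide code elides the middle states. A self-contained version might look like the sketch below, derived from the relop pattern < | > | <= | >= | = | <>. The intermediate state numbers 2, 3, 4 and 7 are assumed, since the diagram itself is not reproduced here, and the names Relop, NOT_RELOP and get_relop are hypothetical. Instead of nextchar(), retract() and fail(), the sketch reads from a string with an explicit position.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NOT_RELOP, LT, LE, EQ, NE, GT, GE } Relop;

/* Tries to recognize a relop starting at input[*pos]. On success it
 * advances *pos past the lexeme and returns its attribute; on failure
 * it leaves *pos unchanged and returns NOT_RELOP. */
Relop get_relop(const char *input, size_t *pos) {
    assert(input != NULL && pos != NULL);
    size_t p = *pos;
    int state = 0;
    Relop result = NOT_RELOP;
    while (result == NOT_RELOP) {
        char c;
        switch (state) {
        case 0:
            c = input[p++];
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else return NOT_RELOP;            /* lexeme is not a relop */
            break;
        case 1:
            c = input[p++];
            if (c == '=') result = LE;        /* assumed state 2 */
            else if (c == '>') result = NE;   /* assumed state 3 */
            else { --p; result = LT; }        /* assumed state 4: retract */
            break;
        case 5:
            result = EQ;
            break;
        case 6:
            c = input[p++];
            if (c == '=') result = GE;        /* assumed state 7 */
            else { --p; result = GT; }        /* state 8: retract */
            break;
        }
    }
    *pos = p;
    return result;
}
```

The retract steps matter: on input "<a" the recognizer reads 'a' to learn that the lexeme is just "<", then backs up so that 'a' is available to the next token.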
Lexical Analyzer Generator - Lex
Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
Structure of Lex programs

declarations
%%
translation rules Pattern {Action}
%%
auxiliary functions
Example
%{
    /* definitions of manifest constants
    LT, LE, EQ, NE, GT, GE,
    IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%
{ws}     {/* no action and no return */}
if       {return(IF);}
then     {return(THEN);}
else     {return(ELSE);}
{id}     {yylval = (int) installID(); return(ID);}
{number} {yylval = (int) installNum(); return(NUMBER);}
…
%%
int installID() {/* function to install the lexeme, whose first
                    character is pointed to by yytext, and whose
                    length is yyleng, into the symbol table and
                    return a pointer thereto */
}
int installNum() {/* similar to installID, but puts numerical
                     constants into a separate table */
}
Finite Automata
Regular expressions = specification
Finite automata = implementation

A finite automaton consists of
An input alphabet Σ
A set of states S
A start state n
A set of accepting states F ⊆ S
A set of transitions state --input--> state
Finite Automata
Transition
s1 --a--> s2
Is read
In state s1 on input “a” go to state s2

If end of input
If in accepting state => accept, otherwise => reject
If no transition possible => reject
Finite Automata State Graphs
• A state
• The start state
• An accepting state
• A transition (labeled with an input symbol, here a)
A Simple Example
A finite automaton that accepts only “1”

A finite automaton accepts a string if we can follow transitions labeled with the characters in the string from the start to some accepting state

Another Simple Example
A finite automaton accepting any number of 1’s followed by a single 0
Alphabet: {0,1}

Check that “1110” is accepted but “110…” is not
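The acceptance rule can be traced in code for this automaton. The sketch below assumes a two-state machine (the state names START and ACCEPT are hypothetical, since the state graph is not reproduced): START loops on 1 and moves to ACCEPT on 0, and ACCEPT has no outgoing transitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Accepts any number of 1's followed by a single 0. */
bool accepts_ones_then_zero(const char *s) {
    assert(s != NULL);
    enum { START, ACCEPT } state = START;
    for (; *s != '\0'; ++s) {
        if (state == START && *s == '1')
            state = START;          /* loop on 1 */
        else if (state == START && *s == '0')
            state = ACCEPT;         /* the single 0 */
        else
            return false;           /* no transition possible => reject */
    }
    return state == ACCEPT;         /* end of input: accept iff accepting */
}
```

On "1110" the loop ends in ACCEPT, so the string is accepted. On "110" followed by any further symbol, the machine is in ACCEPT with input left and no transition to take, so it rejects.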

And Another Example
Alphabet {0,1}
What language does this recognize?
[Figure: state graph with transitions labeled 0 and 1]
And Another Example
Alphabet still { 0, 1 }
[Figure: state graph with a transition labeled 1]
The operation of the automaton is not completely defined by the input
On input “11” the automaton could be in either state
Epsilon Moves
Another kind of transition: Ɛ-moves
A --Ɛ--> B
• Machine can move from state A to state B without reading input
Deterministic and Nondeterministic Automata
Deterministic Finite Automata (DFA)
One transition per input per state
No Ɛ-moves
Nondeterministic Finite Automata (NFA)
Can have multiple transitions for one input in a given state
Can have Ɛ-moves
Finite automata have finite memory
Need only to encode the current state
Execution of Finite Automata
A DFA can take only one path through the state graph
Completely determined by input

NFAs can choose
Whether to make Ɛ-moves
Which of multiple transitions for a single input to take
Acceptance of NFAs
An NFA can get into multiple states
[Figure: NFA state graph with transitions labeled 1, 0, 1]
• Input: 1 0 1
• Rule: NFA accepts if it can get in a final state
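One way to apply the acceptance rule is to track the set of all states the NFA could be in after each symbol. The sketch below uses a bitmask for that set. The NFA in it is a hypothetical stand-in, not the one from the slide's figure: state 0 loops on 0 and 1 and may also move to the accepting state 1 on input 1, so it accepts strings ending in 1. It has no Ɛ-moves, which keeps the simulation short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_STATES = 2 };

/* delta[state][symbol] is a bitmask of possible successor states.
 * ASSUMPTION: this two-state NFA is made up for illustration. */
static const unsigned delta[NUM_STATES][2] = {
    /* state 0 */ { 1u << 0, (1u << 0) | (1u << 1) },
    /* state 1 */ { 0u, 0u },
};
static const unsigned accepting = 1u << 1;

bool nfa_accepts(const char *input) {
    assert(input != NULL);
    unsigned current = 1u << 0;              /* start in state 0 only */
    for (; *input != '\0'; ++input) {
        if (*input != '0' && *input != '1')
            return false;                    /* symbol outside the alphabet */
        unsigned next = 0;
        for (int s = 0; s < NUM_STATES; ++s)
            if (current & (1u << s))         /* NFA may be in state s */
                next |= delta[s][*input - '0'];
        current = next;
    }
    /* Accept if at least one possible state is a final state. */
    return (current & accepting) != 0;
}
```

On input 1 0 1 the set of possible states goes {0} -> {0,1} -> {0} -> {0,1}; the last set contains the final state, so the input is accepted.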
NFA vs. DFA (1)
NFAs and DFAs recognize the same set of
languages (regular languages)

DFAs are easier to implement

There are no choices to consider

NFA vs. DFA (2)
For a given language the NFA can be simpler than the DFA
[Figure: an NFA and an equivalent DFA, with transitions labeled 0 and 1]
• DFA can be exponentially larger than NFA
Regular Expressions to Finite Automata
High-level sketch
Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven Implementation of DFA
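Regular Expressions to NFA (1)
For each kind of rexp, define an NFA
Notation: NFA for rexp A
• For Ɛ
[Figure: a single Ɛ-transition from the start state to an accepting state]
• For input a
[Figure: a single transition labeled a from the start state to an accepting state]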
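Regular Expressions to NFA (2)
• For AB
[Figure: the NFA for A followed by an Ɛ-move into the NFA for B]
• For A | B
[Figure: a new start state with Ɛ-moves into the NFAs for A and B, whose accepting states have Ɛ-moves to a new accepting state]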
Regular Expressions to NFA (3)
• For A*
[Figure: the NFA for A with Ɛ-moves that allow skipping A or repeating it]
Example of RegExp -> NFA conversion
Consider the regular expression
(1 | 0)*1
The NFA is
[Figure: NFA with states A through J; transitions C --1--> E, D --0--> F and I --1--> J, all other transitions are Ɛ-moves]
Next
Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven Implementation of DFA
NFA to DFA. The Trick
Simulate the NFA
Each state of resulting DFA
= a non-empty subset of states of the NFA
Start state
= the set of NFA states reachable through Ɛ-moves from NFA start state
Add a transition S --a--> S’ to DFA iff
S’ is the set of NFA states reachable from the states in S after seeing the input a, considering Ɛ-moves as well
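The construction can be sketched in C with NFA state sets stored as bitmasks. The NFA below is the ten-state machine for (1 | 0)*1 with states A through J. Its Ɛ-move list is an assumption, reconstructed so that it yields the DFA states ABCDHI, FGABCDHI and EJGABCDHI named in the example; the identifiers qA..qJ, eps, step, eps_closure, dfa_move and subset_construction are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { qA, qB, qC, qD, qE, qF, qG, qH, qI, qJ, NUM_NFA_STATES };
#define BIT(s) (1u << (s))

/* eps[s]: states reachable from s by ONE Ɛ-move (assumed edge list). */
static const unsigned eps[NUM_NFA_STATES] = {
    [qA] = BIT(qB) | BIT(qH), [qB] = BIT(qC) | BIT(qD),
    [qE] = BIT(qG),           [qF] = BIT(qG),
    [qG] = BIT(qA) | BIT(qH), [qH] = BIT(qI),
};
/* step[s][a]: states reachable from s on input symbol a (0 or 1). */
static const unsigned step[NUM_NFA_STATES][2] = {
    [qC] = { 0, BIT(qE) }, [qD] = { BIT(qF), 0 }, [qI] = { 0, BIT(qJ) },
};

/* Ɛ-closure: keep adding Ɛ-successors until the set stops growing. */
unsigned eps_closure(unsigned set) {
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NUM_NFA_STATES; ++s)
            if (set & BIT(s)) set |= eps[s];
    } while (set != prev);
    return set;
}

/* DFA transition: NFA states reachable from `set` after seeing a. */
unsigned dfa_move(unsigned set, int a) {
    assert(a == 0 || a == 1);
    unsigned out = 0;
    for (int s = 0; s < NUM_NFA_STATES; ++s)
        if (set & BIT(s)) out |= step[s][a];
    return eps_closure(out);
}

/* Fills dstates[] with the reachable non-empty subsets (index 0 is
 * the DFA start state) and returns how many were found. */
size_t subset_construction(unsigned dstates[], size_t capacity) {
    assert(dstates != NULL && capacity > 0);
    size_t count = 0;
    dstates[count++] = eps_closure(BIT(qA));
    for (size_t i = 0; i < count; ++i) {          /* worklist of DFA states */
        for (int a = 0; a < 2; ++a) {
            unsigned t = dfa_move(dstates[i], a);
            if (t == 0) continue;                 /* no transition on a */
            size_t k = 0;
            while (k < count && dstates[k] != t) ++k;
            if (k == count) {                     /* new DFA state */
                assert(count < capacity);
                dstates[count++] = t;
            }
        }
    }
    return count;
}
```

Only subsets that are actually reachable from the start state are generated, which is why this NFA produces three DFA states rather than every non-empty subset of its ten states.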
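NFA -> DFA Example
[Figure: the NFA for (1 | 0)*1 with states A through J, and the resulting DFA]
DFA states: ABCDHI (start), FGABCDHI, EJGABCDHI
From every DFA state, input 0 leads to FGABCDHI and input 1 leads to EJGABCDHI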
NFA to DFA. Remark
An NFA may be in many states at any time
How many different states?
If there are N states, the NFA must be in some subset of those N states
How many non-empty subsets are there?
2^N - 1 = finitely many, but exponentially many
Implementation
A DFA can be implemented by a 2D table T
One dimension is “states”
Other dimension is “input symbols”
For every transition Si --a--> Sk define T[i,a] = k
DFA “execution”
If in state Si and input a, read T[i,a] = k and skip to state Sk
Very efficient
Table Implementation of a DFA
[Figure: DFA with states S, T, U; from every state, input 0 leads to T and input 1 leads to U]

      0   1
S     T   U
T     T   U
U     T   U
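The table translates directly into a two-dimensional array. The following C sketch encodes the S/T/U table above. Which states are accepting is not stated with the table, so the code assumes U is the only accepting state; dfa_accepts and the array names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { S, T, U, NUM_STATES };

/* table[i][a] = k encodes the transition Si --a--> Sk. */
static const int table[NUM_STATES][2] = {
    /* S */ { T, U },
    /* T */ { T, U },
    /* U */ { T, U },
};

/* ASSUMPTION: U is the only accepting state (not given with the table). */
static const bool accepting[NUM_STATES] = { false, false, true };

bool dfa_accepts(const char *input) {
    assert(input != NULL);
    int state = S;
    for (; *input != '\0'; ++input) {
        if (*input != '0' && *input != '1')
            return false;                    /* symbol outside the alphabet */
        state = table[state][*input - '0'];  /* one lookup per symbol */
    }
    return accepting[state];
}
```

Each input symbol costs one array lookup, which is what makes the table-driven approach efficient. Under the stated assumption the machine accepts exactly the strings that end in 1.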
