Lexical Analyzer
Topics to be covered
• Interaction of scanner & parser
• Token, Pattern & Lexemes
• Input buffering
• Specification of tokens
• Regular expression & Regular definition
• Transition diagram
• Hard coding & automatic generation lexical analyzers
• Finite automata
• Regular expression to NFA using Thompson's rule
• Conversion from NFA to DFA using subset construction method
• DFA optimization
Interaction with Scanner & Parser
[Figure: the source program feeds the lexical analyzer; the parser sends "Get next token" to the lexical analyzer, which returns a token; the lexical analyzer and the parser both use the symbol table.]
• Upon receiving a “Get next token” command from the parser, the lexical analyzer reads input characters until it can identify the next token.
• The lexical analyzer also strips out comments and white space (blanks, tabs, and newline characters) from the source program.
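The pull-style interaction described above can be sketched as follows. This is a minimal illustration, not a definitive design: the class name `Scanner`, the token names, the `#` comment style, and the helper `parse_all` (a stand-in for a real parser) are all assumptions made for the example.

```python
import re

# Hypothetical token patterns; the names and the '#' comment style are assumptions.
TOKEN_SPEC = [
    ("SKIP", r"[ \t\n]+|#[^\n]*"),    # white space and comments are stripped
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("OP", r"[=+*]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class Scanner:
    def __init__(self, source):
        self.source = source
        self.pos = 0

    def get_next_token(self):
        """Read characters until the next token is identified; None at end of input."""
        while self.pos < len(self.source):
            m = MASTER.match(self.source, self.pos)
            if m is None:
                raise SyntaxError(f"unexpected character {self.source[self.pos]!r}")
            self.pos = m.end()
            if m.lastgroup != "SKIP":          # comments and blanks never reach the parser
                return (m.lastgroup, m.group())
        return None


def parse_all(scanner):
    """Stand-in for a parser: it only pulls tokens on demand."""
    tokens = []
    while (tok := scanner.get_next_token()) is not None:
        tokens.append(tok)
    return tokens


tokens = parse_all(Scanner("total = sum + 45  # add\n"))
```

The scanner does no work until the parser asks for a token, which is the essence of the "Get next token" arrow in the figure.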
Why separate lexical analysis & parsing?
1. Simplicity in design.
2. Improves compiler efficiency.
3. Enhances compiler portability.
Token, Pattern & Lexemes
Token
Sequence of characters having a collective meaning is known as a token.
Categories of Tokens:
1. Identifier
Pattern
The set of rules associated with a token is called a pattern.
Example: “non-empty sequence of digits”, “letter followed by letters and digits”
Lexemes
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
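To make the three terms concrete, the sketch below matches each lexeme against a pattern and reports the token it belongs to. The statement `total = sum + 45` is an assumption pieced together from the lexemes listed above, and the category names and the helper `token_of` are hypothetical.

```python
import re

# One pattern per token category; the category names are illustrative assumptions.
PATTERNS = {
    "identifier": r"[A-Za-z][A-Za-z0-9]*",   # letter followed by letters and digits
    "constant": r"[0-9]+",                   # non-empty sequence of digits
    "operator": r"[=+]",
}


def token_of(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    for token, pattern in PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            return token
    return None


# Lexemes are separated by blanks here only to keep the example short.
classified = [(lexeme, token_of(lexeme)) for lexeme in "total = sum + 45".split()]
```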
Input buffering
There are mainly two techniques for input buffering:
1. Buffer pairs
2. Sentinels
Buffer pairs
The lexical analyzer scans the input string from left to right, one character at a time.
The buffer is divided into two N-character halves, where N is the number of characters on one disk block.
[Figure: buffer pair holding E = M * C * * 2 eof, with the lexeme_beginning pointer at the start of the current lexeme and the forward pointer scanning ahead of it.]
In buffer pairs we must check, each time we move the forward pointer, that we have not moved off one of the buffers.
Thus, for each character read, we make two tests: one for the end of the buffer and one for the character itself.
We can combine the buffer-end test with the test for the current character, reducing the two tests to one, if we extend each buffer half to hold a sentinel character at the end.
The sentinel is a special character that cannot be part of the source program; a natural choice is the character EOF.
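Sentinels
[Figure: the same buffer pair with an eof sentinel at the end of each half: E = M * eof in the first half, C * * 2 eof in the second half, followed by the closing eof sentinel.]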
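A sentinel-based buffer pair can be sketched as below. This is an illustration under stated assumptions rather than a production reader: the class name `SentinelBuffer`, the tiny half size `N = 4`, and the use of `"\0"` as the eof sentinel are all choices made for the example, and the lexeme_beginning pointer is left out to keep the sketch short.

```python
import io

EOF = "\0"   # sentinel; assumed never to occur in the source text
N = 4        # tiny half size so the example crosses buffer boundaries


class SentinelBuffer:
    def __init__(self, stream):
        self.stream = stream
        # Two halves of N characters, each followed by one sentinel slot.
        self.buf = [EOF] * (2 * N + 2)
        self.forward = 0
        self._load(0)

    def _load(self, start):
        """Fill one half; unused slots hold EOF to mark the real end of input."""
        chunk = self.stream.read(N)
        for i in range(N):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        ch = self.buf[self.forward]
        self.forward += 1
        if ch != EOF:
            return ch                        # common case: a single test
        if self.forward - 1 == N:            # sentinel at end of first half
            self._load(N + 1)
            self.forward = N + 1
            return self.next_char()
        if self.forward - 1 == 2 * N + 1:    # sentinel at end of second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                          # EOF inside a half: end of input


reader = SentinelBuffer(io.StringIO("E=M*C**2"))
chars = []
while (c := reader.next_char()) is not None:
    chars.append(c)
```

Only when the character read is the sentinel does the code pay for the extra position checks, which is the saving the sentinel technique is after.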
Operation                   Written   Definition
Union of L and M            L ∪ M     { s | s is in L or s is in M }
Concatenation of L and M    LM        { st | s is in L and t is in M }
Regular expression
L = Zero or More Occurrences of a = a*
a* = { 𝜖, a, aa, aaa, aaaa, aaaaa, … } (infinite)
L = One or More Occurrences of a = a+
a+ = { a, aa, aaa, aaaa, aaaaa, … } (infinite)
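The language operations above can be tried out on small finite sets. In this sketch the function names are hypothetical, and because a* and a+ are infinite, the closures are cut off at an assumed bound `max_n` on the number of repetitions.

```python
from itertools import product


def union(L, M):
    return L | M


def concat(L, M):
    return {s + t for s, t in product(L, M)}


def power(L, n):
    """L concatenated with itself n times; L^0 is the set holding the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def kleene_star(L, max_n):
    """Union of L^0 .. L^max_n: a finite prefix of the infinite set L*."""
    result = set()
    for n in range(max_n + 1):
        result |= power(L, n)
    return result


def positive_closure(L, max_n):
    """Union of L^1 .. L^max_n: a finite prefix of the infinite set L+."""
    result = set()
    for n in range(1, max_n + 1):
        result |= power(L, n)
    return result
```

For example, `kleene_star({"a"}, 3)` contains the empty string, while `positive_closure({"a"}, 3)` does not.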
Precedence and associativity of operators
Operator         Precedence   Associativity
Kleene *         1            left
Concatenation    2            left
Union |          3            left
Regular expression examples
7. 0 or more occurrences of either a or b or both: (a | b)*
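[Legend for transition diagrams; the symbols themselves are lost in extraction. Conventionally: a circle is a state, an arrow is a transition, an arrow labelled "start" marks the start state, and a double circle is a final state.]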
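Transition Diagram : Relational operator
[Figure reconstructed as a table. States 0, 1 and 6 are not labelled in the extracted text and follow the usual numbering.]
From   Input   To   Action
0      <       1
1      =       2    return (relop,LE)
1      >       3    return (relop,NE)
1      other   4    return (relop,LT)
0      =       5    return (relop,EQ)
0      >       6
6      =       7    return (relop,GE)
6      other   8    return (relop,GT)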
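The relational-operator diagram can be hard coded as nested tests, one per state. This is a sketch under assumptions: the function name `relop`, the return shape, and the choice to leave the "other" character unconsumed in states 4 and 8 are made for the example.

```python
def relop(text, start=0):
    """Return ((token, attribute), next_index), or None if no relop starts here."""

    def peek(i):
        return text[i] if i < len(text) else ""   # "" plays the role of 'other'

    c = peek(start)
    if c == "<":                                   # state 1
        nxt = peek(start + 1)
        if nxt == "=":
            return ("relop", "LE"), start + 2      # state 2
        if nxt == ">":
            return ("relop", "NE"), start + 2      # state 3
        return ("relop", "LT"), start + 1          # state 4: 'other' is not consumed
    if c == "=":
        return ("relop", "EQ"), start + 1          # state 5
    if c == ">":                                   # state 6
        if peek(start + 1) == "=":
            return ("relop", "GE"), start + 2      # state 7
        return ("relop", "GT"), start + 1          # state 8: 'other' is not consumed
    return None
```

The second element of the result tells the caller where scanning should resume, so `relop("<5")` stops right before the `5`.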
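Transition diagram : Unsigned number
[Figure: transition diagram for unsigned numbers with edges labelled digit and E; the remaining labels and most state numbers are lost in extraction.]
Examples: 5280, 39.37, 1.894 E - 4, 2.56 E + 7, 45 E + 6, 96 E 2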
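The shape of the examples above can be captured by a regular expression: digits, an optional fraction, and an optional exponent with an optional sign. The sketch assumes the blanks around E and the sign in the slide examples are typographic, so the lexemes are written without them; the name `is_unsigned_number` is hypothetical.

```python
import re

# digit+ ( . digit+ )? ( E (+|-)? digit+ )?
UNSIGNED = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")


def is_unsigned_number(lexeme):
    """True if the whole lexeme is an unsigned number."""
    return UNSIGNED.fullmatch(lexeme) is not None


examples = ["5280", "39.37", "1.894E-4", "2.56E+7", "45E+6", "96E2"]
results = [is_unsigned_number(x) for x in examples]
```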
Hard coding & automatic generation lexical analyzers
Lexical analysis is about identifying patterns in the input.
To recognize a pattern, a transition diagram is constructed and implemented directly in code. This is known as a hard-coded lexical analyzer.
Example: to represent an identifier in ‘C’, the first character must be a letter and the remaining characters are either letters or digits.
To recognize this pattern, a hard-coded lexical analyzer works from a transition diagram.
An automatically generated lexical analyzer takes a special notation as input.
For example, the lex compiler tool takes regular expressions as input and generates a lexical analyzer that finds the lexemes matching those regular expressions.
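[Figure: transition diagram for an identifier. Start enters state 1; a Letter leads to state 2; state 2 loops on Letter or digit and then moves to final state 3.]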
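A hard-coded recognizer for the identifier diagram might look like the sketch below, with one branch per state. The function name `scan_identifier` and the restriction to ASCII letters and digits are assumptions; the move from state 2 to state 3 is taken on any other character, which is left unconsumed.

```python
def scan_identifier(text, start=0):
    """Return (lexeme, next_index) if an identifier starts at `start`, else None."""
    state = 1
    i = start
    while True:
        ch = text[i] if i < len(text) else ""   # "" stands for end of input
        is_letter = ch.isascii() and ch.isalpha()
        is_digit = ch.isascii() and ch.isdigit()
        if state == 1:
            if not is_letter:
                return None                     # first character must be a letter
            state, i = 2, i + 1
        elif state == 2:
            if is_letter or is_digit:
                i += 1                          # loop on letter or digit
            else:
                state = 3                       # any other character ends the lexeme
        else:                                   # state 3: accept
            return text[start:i], i
```

Calling `scan_identifier("sum1+2")` stops at the `+`, which stays in the input for the next token.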
Finite Automata
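Types of finite automata
Types of finite automata are:
1. Deterministic finite automata (DFA)
2. Nondeterministic finite automata (NFA)
[Figure: example automata over states 1 to 4 with transitions labelled a and b; the edge layout is lost in extraction.]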
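Regular expression to NFA using Thompson's rule
[Figure: Thompson's rules for 𝜖, for a single symbol a, and for concatenation, where N(s) is followed directly by N(t). Example for ab: start enters state 1, a leads to state 2, b leads to state 3.]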
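[Figure: Thompson's rules for union s|t and for Kleene closure s*, built from N(s), N(t) and 𝜖 transitions. Example for a|b on states 1 to 6: state 1 has 𝜖 moves to 2 and 4, a leads from 2 to 3, b leads from 4 to 5, and 3 and 5 have 𝜖 moves to 6. A second example on states 1, 2, 3 with 𝜖 transitions is not recoverable from the extracted text.]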
[Figure: NFAs for a*b and b*ab; only the state labels (1, 2, 3 and 1, 2, 3, 5) and some 𝜖 edges survive in the extracted text.]
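Thompson's rules compose small NFAs into larger ones, which the sketch below mirrors with one function per rule. The class `NFA`, the function names, the edge-list representation, and the `accepts` simulator are assumptions for illustration; concatenation merges the accept state of N(s) with the start state of N(t).

```python
from itertools import count

EPS = None          # label used for 𝜖 moves
_ids = count()      # fresh state numbers


class NFA:
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges  # edges: (src, label, dst)


def symbol(a):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, a, f)])


def concat(n1, n2):
    # Merge n2's start state into n1's accept state.
    ren = lambda q: n1.accept if q == n2.start else q
    return NFA(n1.start, n2.accept, n1.edges + [(ren(s), a, ren(d)) for s, a, d in n2.edges])


def union(n1, n2):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, n1.start), (s, EPS, n2.start), (n1.accept, EPS, f), (n2.accept, EPS, f)]
    return NFA(s, f, n1.edges + n2.edges + extra)


def star(n):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, n.start), (n.accept, EPS, f), (n.accept, EPS, n.start), (s, EPS, f)]
    return NFA(s, f, n.edges + extra)


def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, a, dst in nfa.edges:
            if src == q and a is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    current = eps_closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for src, a, dst in nfa.edges if src in current and a == ch}
        current = eps_closure(nfa, moved)
    return nfa.accept in current


# (a|b)*abb, built bottom-up
nfa = concat(concat(concat(star(union(symbol("a"), symbol("b"))), symbol("a")), symbol("b")), symbol("b"))
```

The NFA built for (a|b)*abb has 11 states, although the state numbers it assigns are arbitrary.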
Exercise
Convert the following regular expressions to NFA:
1. abba
2. bb(a)*
3. (a|b)*
4. a* | b*
5. a(a)*ab
6. aa*+ bb*
7. (a+b)*abb
8. 10(0+1)*1
9. (a+b)*a(a+b)
10. (0+1)*010(0+1)*
11. (010+00)*(10)*
12. 100(1)*00(0+1)*
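Conversion from NFA to DFA using subset construction method
Subset construction algorithm
OPERATION        DESCRIPTION
𝜖-Closure(s)     Set of NFA states reachable from NFA state s on 𝜖 transitions alone.
𝜖-Closure(T)     Set of NFA states reachable from some state s in T on 𝜖 transitions alone.
Move(T, a)       Set of NFA states to which there is a transition on input symbol a from some state s in T.
[Figure: NFA with states 0 to 10 used in the example. 𝜖 transitions: 0 to 1, 0 to 7, 1 to 2, 1 to 4, 3 to 6, 5 to 6, 6 to 1, 6 to 7. Transitions on a: 2 to 3, 7 to 8. Transitions on b: 4 to 5, 8 to 9, 9 to 10. State 0 is the start state and state 10 is the accepting state.]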
Conversion from NFA to DFA
𝜖- Closure(0)= {0, 1, 7, 2, 4}
= {0,1,2,4,7} ---- A
A= {0, 1, 2, 4, 7}
Move(A,a) = {3,8}
𝜖- Closure(Move(A,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
A= {0, 1, 2, 4, 7}
Move(A,b) = {5}
𝜖- Closure(Move(A,b)) = {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
B = {1, 2, 3, 4, 6, 7, 8}
Move(B,a) = {3,8}
𝜖- Closure(Move(B,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
B= {1, 2, 3, 4, 6, 7, 8}
Move(B,b) = {5,9}
𝜖- Closure(Move(B,b)) = {5, 6, 7, 1, 2, 4, 9}
= {1,2,4,5,6,7,9} ---- D
C= {1, 2, 4, 5, 6 ,7}
Move(C,a) = {3,8}
𝜖- Closure(Move(C,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
C= {1, 2, 4, 5, 6, 7}
Move(C,b) = {5}
𝜖- Closure(Move(C,b))= {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
D= {1, 2, 4, 5, 6, 7, 9}
Move(D,a) = {3,8}
𝜖- Closure(Move(D,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
D= {1, 2, 4, 5, 6, 7, 9}
Move(D,b) = {5,10}
𝜖- Closure(Move(D,b)) = {5, 6, 7, 1, 2, 4, 10}
= {1,2,4,5,6,7,10} ---- E
E= {1, 2, 4, 5, 6, 7, 10}
Move(E,a) = {3,8}
𝜖- Closure(Move(E,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
E= {1, 2, 4, 5, 6, 7, 10}
Move(E,b)= {5}
𝜖- Closure(Move(E,b))= {5,6,7,1,2,4}
= {1,2,4,5,6,7} ---- C
Transition Table
States                  a   b
A = {0,1,2,4,7}         B   C
B = {1,2,3,4,6,7,8}     B   D
C = {1,2,4,5,6,7}       B   C
D = {1,2,4,5,6,7,9}     B   E
E = {1,2,4,5,6,7,10}    B   C
[Figure: DFA with states A to E drawn from the transition table.]
Note:
• The accepting state in the NFA is 10.
• 10 is an element of E.
• So, E is the accepting state in the DFA.
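The worked example can be reproduced mechanically. In the sketch below, the NFA transitions are transcribed from the 𝜖-Closure and Move results above (an assumption, since the figure itself did not survive), and the names `NFA`, `eps_closure`, `move` and `subset_construction` are hypothetical.

```python
EPS = "eps"

# Transitions inferred from the closures computed in the worked example.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
NFA_ACCEPT = 10


def eps_closure(states):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    return {t for s in states for t in NFA.get((s, symbol), ())}


def subset_construction(start, alphabet):
    start_set = eps_closure({start})
    names = {start_set: "A"}            # DFA state name for each set of NFA states
    worklist = [start_set]
    table = {}
    while worklist:
        T = worklist.pop(0)             # process states in the order they were found
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in names:
                names[U] = chr(ord("A") + len(names))
                worklist.append(U)
            table[(names[T], a)] = names[U]
    return names, table


names, table = subset_construction(0, "ab")
accepting = [name for subset, name in names.items() if NFA_ACCEPT in subset]
```

For this NFA every Move result is non-empty; an NFA with missing transitions would produce an empty set, which this sketch would simply name as one more (dead) state.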
Exercise
Convert the following regular expressions to DFA using the subset construction method:
1. (a+b)*a(a+b)
2. (a+b)*ab*a
DFA optimization
Transition table of the DFA
States   a   b
A        B   C
B        B   D
C        B   C
D        B   E
E        B   C
Now no more splitting is possible.
If we choose A as the representative for group (AC), then we obtain the reduced transition table.
Optimized Transition Table
States   a   b
A        B   A
B        B   D
D        B   E
E        B   A
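The splitting procedure can be sketched as partition refinement: start from the groups of non-accepting and accepting states, and keep splitting any group whose members disagree on which group each input leads to. The function names `minimize` and `reduced_table` are hypothetical, and E is taken as the only accepting state, as in the subset-construction example.

```python
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}


def minimize(dfa, accepting):
    """Return the final partition of states as a list of sets."""
    partition = [g for g in (set(dfa) - accepting, set(accepting)) if g]
    while True:
        def group_of(state):
            return next(i for i, g in enumerate(partition) if state in g)

        new_partition = []
        for g in partition:
            buckets = {}
            for s in g:
                # States stay together only if every input leads to the same group.
                key = tuple(group_of(dfa[s][a]) for a in sorted(dfa[s]))
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):
            return new_partition            # no more splitting is possible
        partition = new_partition


def reduced_table(dfa, groups):
    """Rewrite the table using the smallest name in each group as its representative."""
    rep = {s: min(g) for g in groups for s in g}
    return {rep[s]: {a: rep[t] for a, t in dfa[s].items()} for s in dfa}


groups = minimize(DFA, ACCEPTING)
optimized = reduced_table(DFA, groups)
```

On this table the refinement ends with the groups (AC), (B), (D) and (E), and choosing A for (AC) gives the optimized table shown above.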
Conversion from regular expression to DFA
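Rules to compute nullable, firstpos, lastpos
• Node n is a leaf labelled 𝜖:
  nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅
• Node n is a leaf with position i:
  nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}
• Node n is an or node with children c1 and c2:
  nullable(n) = nullable(c1) or nullable(c2)
  firstpos(n) = firstpos(c1) ∪ firstpos(c2)
  lastpos(n) = lastpos(c1) ∪ lastpos(c2)
• Node n is a concatenation node with children c1 and c2:
  nullable(n) = nullable(c1) and nullable(c2)
  firstpos(n) = if (nullable(c1)) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
  lastpos(n) = if (nullable(c2)) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
• Node n is a star node with child c1:
  nullable(n) = true; firstpos(n) = firstpos(c1); lastpos(n) = lastpos(c1)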
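Rules to compute followpos
1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).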
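The rules for nullable, firstpos, lastpos and followpos can be evaluated in one bottom-up pass over the syntax tree. The sketch below assumes the expression is (a|b)*abb# with positions 1 to 6, which is consistent with the followpos values in the worked example; the tuple encoding of nodes and the name `analyze` are hypothetical. It applies the concatenation rule and the standard companion rule for star nodes (every position in firstpos(n) joins followpos(i) for each i in lastpos(n)).

```python
from collections import defaultdict

# Hypothetical node encodings:
#   ("leaf", symbol, position), ("or", c1, c2), ("cat", c1, c2), ("star", c1)


def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos); fill followpos as a side effect."""
    kind = node[0]
    if kind == "leaf":
        i = node[2]
        return False, {i}, {i}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                       # followpos rule for concatenation
            followpos[i] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f1, l1 = analyze(node[1], followpos)
        for i in l1:                       # followpos rule for star
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node kind: {kind}")


# (a|b)*abb# with positions 1..6 (an assumption, see the lead-in)
tree = ("cat", ("cat", ("cat", ("cat",
        ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
        ("leaf", "a", 3)), ("leaf", "b", 4)), ("leaf", "b", 5)), ("leaf", "#", 6))

followpos = defaultdict(set)
root_nullable, root_firstpos, root_lastpos = analyze(tree, followpos)
```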
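[Figure: syntax tree annotated with firstpos. Rules shown beside it:]
• Star node n with child c1: firstpos(n) = firstpos(c1)
• Concatenation node n with children c1 and c2: if (nullable(c1)) then firstpos(n) = firstpos(c1) ∪ firstpos(c2), else firstpos(n) = firstpos(c1)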
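Conversion from regular expression to DFA
Step 3: Calculate lastpos
[Figure: syntax tree annotated with lastpos. Rules shown beside it:]
• Or node n with children c1 and c2: lastpos(n) = lastpos(c1) ∪ lastpos(c2)
• Star node n with child c1: lastpos(n) = lastpos(c1)
• Concatenation node n with children c1 and c2: if (nullable(c2)) then lastpos(n) = lastpos(c1) ∪ lastpos(c2), else lastpos(n) = lastpos(c2)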
Conversion from regular expression to DFA
Step 4: Calculate followpos
[Figure: syntax tree annotated with firstpos and lastpos. The followpos table is filled in one concatenation node at a time: followpos(5) = 6, followpos(4) = 5, followpos(3) = 4, then followpos(2) = 3 and followpos(1) = 3. The star node then extends followpos(1) and followpos(2) to 1, 2, 3.]
Position   followpos
5          6
4          5
3          4
2          1, 2, 3
1          1, 2, 3
States          a   b
A = {1,2,3}     B   A
B = {1,2,3,4}
State B
δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} = B
[Figure: resulting DFA; not recoverable from the extracted text.]
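The remaining states follow the same pattern as state B: for each input symbol, take the union of followpos(i) over the positions i in the state that carry that symbol. The sketch below assumes the expression is (a|b)*abb#, so positions 1 and 3 carry a, positions 2, 4 and 5 carry b, and position 6 is the end marker; the names `FOLLOWPOS`, `SYMBOL_AT` and `build_dfa` are hypothetical.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}   # assumed labelling
START = frozenset({1, 2, 3})                                    # firstpos of the root
END_POSITION = 6


def build_dfa(start, alphabet):
    names = {start: "A"}
    worklist = [start]
    table = {}
    while worklist:
        S = worklist.pop(0)
        for a in alphabet:
            # Union of followpos(i) for every position i in S labelled a.
            U = frozenset(p for i in S if SYMBOL_AT[i] == a for p in FOLLOWPOS[i])
            if not U:
                continue                    # no transition on this symbol
            if U not in names:
                names[U] = chr(ord("A") + len(names))
                worklist.append(U)
            table[(names[S], a)] = names[U]
    return names, table


names, table = build_dfa(START, "ab")
accepting = [name for state, name in names.items() if END_POSITION in state]
```

A state is accepting when it contains the position of the end marker #, which here is the case only for {1,2,3,6}.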
Conversion from regular expression to DFA
Construct a DFA for the following regular expression:
1. (c | d)*c#