Unit II Regular Expression
UNIT II:
Regular Expressions
Representator
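Example: (a + b · c)* describes the language {a, bc}* = {ε, a, bc, aa, abc, bca, ...}
Recursive definition
Primitive regular expressions: ∅, ε, a (for each symbol a of the alphabet)
Given regular expressions r1 and r2, the following are regular expressions:
r1 + r2
r1 · r2
r1*
(r1)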
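Languages of Regular Expressions
L(r): the language of regular expression r
Definition
For primitive regular expressions:
L(∅) = ∅
L(ε) = {ε}
L(a) = {a}
Definition (continued)
For regular expressions r1 and r2:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)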
Example
Regular expression: (a + b) · a*
L((a + b) · a*) = L((a + b)) L(a*)
= L(a + b) L(a*)
= (L(a) ∪ L(b)) (L(a))*
= ({a} ∪ {b}) ({a})*
= {a, b} {ε, a, aa, aaa, ...}
= {a, aa, aaa, ..., b, ba, baa, ...}
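The derivation above can be replayed mechanically. The following is a minimal sketch, assuming regular expressions are written as nested tuples (a hypothetical encoding chosen for this illustration, not part of the slides) and that only strings up to a length bound `n` are enumerated.

```python
def lang(r, n):
    """Return the strings of length <= n in L(r), following the recursive definition."""
    kind = r[0]
    if kind == "empty":                      # L(∅) = ∅
        return set()
    if kind == "eps":                        # L(ε) = {ε}
        return {""}
    if kind == "sym":                        # L(a) = {a}
        return {r[1]} if n >= 1 else set()
    if kind == "union":                      # L(r1 + r2) = L(r1) ∪ L(r2)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                        # L(r1 r2) = L(r1) L(r2)
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    if kind == "star":                       # L(r1*) = (L(r1))*
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                      # keep appending non-empty pieces
            frontier = {u + v for u in frontier for v in base
                        if v and len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


# (a + b) · a*
r = ("cat", ("union", ("sym", "a"), ("sym", "b")), ("star", ("sym", "a")))
print(sorted(lang(r, 3)))
```

The bound `n` is only there to keep the infinite language finite for printing; it is not part of the definition.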
Example
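Regular expression r = (a + b)* (a + bb)
L(r) = {a, bb, aa, abb, ba, bbb, ...} (all strings over {a, b} ending in a or bb)
Example
Regular expression r = (aa)* (bb)* b
L(r) = {a^(2n) b^(2m) b : n, m ≥ 0}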
Equivalent Regular Expressions
Definition: Regular expressions r1 and r2 are equivalent if L(r1) = L(r2)
Languages generated by regular expressions = regular languages
• Find an RE for the language of strings of length exactly 2 over the alphabet {a,b}
• Find an RE for the language of strings of length at least 2 over the alphabet {a,b}
• Find an RE for the language of strings of length at most 2 over the alphabet {a,b}
• Find an RE for the language of strings of even length over the alphabet {a,b}
• Find an RE for the language of strings of odd length over the alphabet {a,b}
• Find an RE for the language of strings whose length is divisible by 3 over the alphabet {a,b}
• Find an RE for the language of strings whose length is congruent to 2 mod 3 over the alphabet {a,b}
• Find an RE for the language of strings with exactly two a's over the alphabet {a,b}
• Find an RE for the language of strings with at least two a's over the alphabet {a,b}
• Find an RE for the language of strings with at most two a's over the alphabet {a,b}
• Find an RE for the language of strings with an even number of a's over the alphabet {a,b}
• Find an RE for the language of strings that start with a over the alphabet {a,b}
• Find an RE for the language of strings that end with a over the alphabet {a,b}
• Find an RE for the language of strings that contain a over the alphabet {a,b}
• Find an RE for the language of strings that start and end with different symbols over the alphabet {a,b}
• Find an RE for the language of strings that start and end with the same symbol over the alphabet {a,b}
• Give REs for the following over the alphabet {0,1} (Aug-2015, 6 Marks)
– All binary strings with at least one 0
– All binary strings with at most one 0
• Find an RE for the language of all strings containing the substring 00
• Find an RE for the language of strings in {a, b}* ending with b and not containing aa
• Find an RE for the language of strings that do not contain two consecutive a's over the alphabet {a,b}
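One way to sanity-check a candidate answer to these exercises is to compare it against a direct description of the language on all short strings. The sketch below assumes Python's `re` syntax, where the union `+` of the slides is written `|`; the helper name `agrees` and the sample candidates are illustrative, not official solutions. Agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product


def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def agrees(pattern, predicate, alphabet="ab", max_len=8):
    """True if `pattern` matches exactly the strings satisfying `predicate`
    among all strings of length <= max_len."""
    rx = re.compile(pattern)
    return all(bool(rx.fullmatch(w)) == bool(predicate(w))
               for w in strings(alphabet, max_len))


# Candidate answers (assumptions to be checked, written in Python syntax)
print(agrees("(a|b)(a|b)", lambda w: len(w) == 2))                 # length exactly 2
print(agrees("b*ab*ab*", lambda w: w.count("a") == 2))             # exactly two a's
print(agrees("(b|ab)+", lambda w: w.endswith("b") and "aa" not in w))
```

A wrong candidate shows up as `False`; for instance `(a|b)*a` does not describe "starts with a".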
Review
• Regular Expression
Identities Related to Regular Expressions
• Given regular expressions R, P, Q, L, M, and N, the following
identities hold:
• ∅* = ε
• ε* = ε
• RR* = R*R
• R*R* = R*
• (R*)* = R*
• (PQ)*P =P(QP)*
• (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
• R + ∅ = ∅ + R = R (The identity for union)
• R ε = ε R = R (The identity for concatenation)
• ∅ L = L ∅ = ∅ (The annihilator for concatenation)
• R + R = R (Idempotent law)
• L (M + N) = LM + LN (Left distributive law)
• (M + N) L = ML + NL (Right distributive law)
• ε + RR* = ε + R*R = R*
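Identities like these can be spot-checked by comparing both sides on every short string. This is a rough sketch under two assumptions: the expressions are transcribed into Python `re` syntax (`|` for `+`), and the helper name `same_language_up_to` is made up for this illustration. A bounded check can refute an identity but cannot prove it.

```python
import re
from itertools import product


def same_language_up_to(p1, p2, alphabet="ab", max_len=6):
    """Compare two patterns on all strings of length <= max_len."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return False
    return True


# (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
forms = ["(a|b)*", "(a*b*)*", "(a*|b*)*", "(a|b*)*", "a*(ba*)*"]
print(all(same_language_up_to(forms[0], f) for f in forms[1:]))

# (PQ)*P = P(QP)* with P = a, Q = b
print(same_language_up_to("(ab)*a", "a(ba)*"))
```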
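Theorem
Languages generated by regular expressions = regular languages
Proof:
Part 1: languages generated by regular expressions ⊆ regular languages
(for any regular expression r, the language L(r) is regular)
Part 2: languages generated by regular expressions ⊇ regular languages
(for any regular language L, there is a regular expression r with L(r) = L)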
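Proof - Part 1
Languages generated by regular expressions ⊆ regular languages
Induction basis: the primitive regular expressions ∅, ε, and a correspond to NFAs M1, M2, and M3 that accept regular languages:
L(M1) = ∅ = L(∅)
L(M2) = {ε} = L(ε)
L(M3) = {a} = L(a)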
Inductive Hypothesis
Suppose
that for regular expressions r1 and r2 ,
L(r1) and L(r2 ) are regular languages
Inductive Step
We will prove that L(r1 + r2), L(r1 r2), L(r1*), and L((r1)) are regular languages
Using the regular closure of these operations,
we can construct recursively the NFA M
that accepts L(M ) = L(r )
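Example: r = r1 + r2
L(M1) = L(r1), L(M2) = L(r2), L(M) = L(r)
[Figure: NFAs M1 and M2 for regular expressions r1 and r2, combined into an NFA M for r1 + r2]
Concatenation
[Figure: NFA for r1 r2, built from M1 followed by M2]
Star Operation
[Figure: NFA for r1*, built from M1]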
Example
r1 = a*b, with NFA M1 and L1 = L(r1) = {a^n b : n ≥ 0}
r2 = ba, with NFA M2 and L2 = L(r2) = {ba}
[Figure: NFAs M1 and M2]
Example (union)
[Figure: NFA for r1 + r2, accepting L1 ∪ L2]
Example (concatenation)
[Figure: NFA for r1 r2, accepting L1 L2]
• An NFA corresponding to ((aa + b)*(aba)*bab)*
[Figure: NFA with ε-moves assembled from sub-automata for aa + b, aba, and bab]
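The recursive construction in Part 1 can be sketched in code. The version below is one common way to do it (often called Thompson's construction) and is not necessarily the exact layout drawn on the slides; the tuple encoding of regular expressions and the function names are assumptions made for this illustration. `None` stands for an ε-move.

```python
from itertools import count

_fresh = count()  # supplies fresh state names


def build(r, delta):
    """Add an ε-NFA for regex tree r to the edge list `delta`; return (start, accept)."""
    s, f = next(_fresh), next(_fresh)
    kind = r[0]
    if kind == "sym":                         # single symbol
        delta.append((s, r[1], f))
    elif kind == "union":                     # r1 + r2: branch into either machine
        for sub in r[1:]:
            s1, f1 = build(sub, delta)
            delta += [(s, None, s1), (f1, None, f)]
    elif kind == "cat":                       # r1 r2: run M1, then M2
        s1, f1 = build(r[1], delta)
        s2, f2 = build(r[2], delta)
        delta += [(s, None, s1), (f1, None, s2), (f2, None, f)]
    elif kind == "star":                      # r1*: skip, or loop through M1
        s1, f1 = build(r[1], delta)
        delta += [(s, None, f), (s, None, s1), (f1, None, s)]
    else:
        raise ValueError(f"unknown node {kind!r}")
    return s, f


def closure(states, delta):
    """All states reachable from `states` using ε-moves only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, a, t in delta:
            if p == q and a is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(r, w):
    """Build the NFA for r and simulate it on w."""
    delta = []
    start, accept = build(r, delta)
    current = closure({start}, delta)
    for ch in w:
        moved = {t for p, a, t in delta if p in current and a == ch}
        current = closure(moved, delta)
    return accept in current


# r1 + r2 with r1 = a*b and r2 = ba, as in the example
r1 = ("cat", ("star", ("sym", "a")), ("sym", "b"))
r2 = ("cat", ("sym", "b"), ("sym", "a"))
r = ("union", r1, r2)
print([w for w in ["b", "aab", "ba", "bab", ""] if accepts(r, w)])
```

Each case introduces one new start state and one new accept state, which keeps every sub-automaton with a single accept state and makes the pieces easy to glue together.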
Problems
• Construct an NFA with null-moves, which
accepts the language defined by:
• ((0 + 1)* 10 + (00)* (11)*)*
• 01[((10)+ + 111)* + 0]* 1
• (a / b)* ab.
Problems
• Construct a DFA for the RE 10 + (0 + 11)
• Construct a DFA for the regular expression (0 + 1)*, (00 + 11)
Review
• Regular Expression
• Construction of RE from Language
• RE and DFA
• Conversion of RE to DFA
Proof - Part 2
Languages generated by regular expressions ⊇ regular languages
(for any regular language L, there is a regular expression r with L(r) = L)
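Example: an NFA M and the corresponding generalized transition graph
[Figure: M has transitions labelled a, c, and a, b; in the generalized transition graph the label a, b becomes the regular expression a + b]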
Problem
Construct a regular expression corresponding
to the automata given below:
q1 = q1a + q3a + ε
= q1a + q2aa + ε (Substituting value of q3)
Transition labels are regular expressions
[Figure: generalized transition graph on states q0, q1, q2 with edge labels a, b, and a + b]
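In General
• Removing a state q that lies between states qi and qj:
[Figure: the edges around q carry labels a, b, c, d, e; after q is removed, the edges between qi and qj carry regular expressions built from these labels]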
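By repeating the process until two states are left, the resulting graph is:
[Figure: the initial graph reduced to a graph on states q0 and qf, with a loop r1 on q0, an edge r2 from q0 to qf, an edge r3 from qf to q0, and a loop r4 on qf]
The resulting regular expression:
r = r1* r2 (r4 + r3 r1* r2)*
L(r) = L(M) = L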
End of Proof-Part 2
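State removal can also be tried out programmatically. The following is a minimal sketch under several assumptions: it starts from a DFA rather than a general NFA, it adds a fresh start state `_S` and a fresh final state `_F` (hypothetical names) so that every original state can be removed, and it emits Python `re` syntax, with `|` in place of `+`. The output is correct but far from the shortest possible expression.

```python
import re


def union(x, y):
    """x + y, where None stands for 'no edge' (the empty language)."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(*parts):
    """Concatenation; missing edges make the whole path impossible."""
    return None if any(p is None for p in parts) else "".join(parts)


def star(x):
    """x*, where a missing or empty loop contributes only the empty string."""
    return "" if not x else f"(?:{x})*"


def dfa_to_regex(states, delta, start, accepting):
    """delta maps (state, symbol) -> state. Returns a Python regex string or None."""
    S, F = "_S", "_F"
    R = {}
    for (p, a), q in delta.items():           # edge labels start as single symbols
        R[p, q] = union(R.get((p, q)), re.escape(a))
    R[S, start] = ""                          # ε-edge into the old start state
    for f in accepting:
        R[f, F] = ""                          # ε-edges out of old accept states
    remaining = list(states)
    for k in states:                          # remove the states one by one
        remaining.remove(k)
        loop = star(R.get((k, k)))
        for p in remaining + [S]:
            for q in remaining + [F]:
                via = concat(R.get((p, k)), loop, R.get((k, q)))
                R[p, q] = union(R.get((p, q)), via)
    return R.get((S, F))


# Hypothetical DFA: strings over {a, b} with an even number of a's
delta = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
rx = dfa_to_regex([0, 1], delta, 0, {0})
print(rx)
```

The update `R[p, q] + R[p, k] R[k, k]* R[k, q]` is the code form of rerouting every path that went through the removed state.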
Standard Representations
of Regular Languages
Regular languages can be represented by DFAs, NFAs, or regular expressions.
When we say "we are given a regular language L", we mean that L is given in one of these standard representations.
Non-regular languages (Pumping Lemma)
Non-regular languages: {a^n b^n : n ≥ 0}, {v v^R : v ∈ {a, b}*}
Regular languages: a*b, b*c + a, b + c(a + b)*, etc.
How can we prove that a language L
is not regular?
The Pigeonhole Principle
With more pigeons than pigeonholes (for example, 3 pigeonholes), a pigeonhole must contain at least two pigeons.
In general: n pigeons, m pigeonholes, n > m
There is a pigeonhole with at least 2 pigeons.
The Pigeonhole Principle and DFAs
Consider a DFA with 4 states q1, q2, q3, q4
[Figure: DFA on states q1, q2, q3, q4 with transitions labelled a and b]
Consider the walk of a "long" string: aaaab (length at least 4)
Walk of aaaab: q1 -a-> q2 -a-> q3 -a-> q2 -a-> q3 -b-> q4
A state is repeated as a result of the pigeonhole principle
Pigeons (walk states): q1, q2, q3, q2, q3, q4
Nests (automaton states): q1, q2, q3, q4
Repeated states: q2, q3
Consider the walk of a "long" string: aabb (length at least 4)
Walk of aabb: q1 -a-> q2 -a-> q3 -b-> q4 -b-> q4
A state is repeated as a result of the pigeonhole principle
Pigeons (walk states): q1, q2, q3, q4, q4
Nests (automaton states): q1, q2, q3, q4
Repeated state: q4
In general: if |w| ≥ number of states of the DFA, then by the pigeonhole principle a state is repeated in the walk of w
Walk of w = σ1 σ2 ... σk in an arbitrary DFA:
q1 -σ1-> ... -σi-> qi -σ(i+1)-> ... -σj-> qi -σ(j+1)-> ... -σk-> qz
qi is the repeated state
|w| ≥ number of states of DFA = m
Pigeons (walk states): q1 ... qi ... qi ... qz are more than m states
Nests (automaton states): m states
Take a string w ∈ L with |w| ≥ m (the number of states of the DFA)
Walk in the DFA of w = σ1 σ2 ... σk: some state q is repeated
There could be many states repeated. Take q to be the first state repeated, so that all states visited before the second visit to q are unique.
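The repeated state can be located by simply following the walk. The sketch below assumes a hypothetical 3-state DFA for "strings ending in ab" (the slides' own DFA is not reproduced here); it splits w at the first repeated state into the part before the loop, the loop itself, and the rest.

```python
def walk(delta, start, w):
    """Return the list of states visited while reading w."""
    states = [start]
    for ch in w:
        states.append(delta[states[-1], ch])
    return states


def split_at_first_repeat(delta, start, w):
    """Split w as (x, y, z), where y labels the loop on the first repeated state.
    Returns None if no state repeats (w is shorter than the number of states)."""
    first_seen = {}
    for pos, q in enumerate(walk(delta, start, w)):
        if q in first_seen:
            i, j = first_seen[q], pos
            return w[:i], w[i:j], w[j:]
        first_seen[q] = pos
    return None


def accepts(delta, start, accepting, w):
    return walk(delta, start, w)[-1] in accepting


# Hypothetical DFA over {a, b} accepting strings that end in "ab"
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 0}

x, y, z = split_at_first_repeat(delta, 0, "aab")
print(repr(x), repr(y), repr(z))
print(all(accepts(delta, 0, {2}, x + y * i + z) for i in range(5)))
```

Because the split uses the first repetition, the states visited while reading x and y are all distinct apart from the repeated one, so the loop ends within the first m symbols.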
Review
• DFA to RE using State Elimination Method
Non-regular languages
Regular Languages: Grand Unification
L( FA) = L( RG ) (Construction)
L( RG ) = L( RE ) (Solving linear
equations) study later
Pumping Lemma for Regular Languages
• It is a necessary condition.
– Every regular language satisfies it.
– If a language violates it, it is not regular.
• RL ⇒ PL, equivalently: not PL ⇒ not RL
• It is not a sufficient condition.
– Not every non-regular language violates it.
• not RL =>? PL or not PL (no conclusion)
We can write w = xyz
[Figure: walk of w through the DFA; x = σ1 ... σi leads to q, y = σ(i+1) ... σj is the loop from q back to q, z = σ(j+1) ... σk is the rest of the walk]
Observation: length |xy| ≤ m (the number of states of the DFA), since in xy no state is repeated (except q)
Observation: length |y| ≥ 1, since there is at least one transition in the loop
We do not care about the form of string z
Additional string: the string xz is accepted (do not follow loop y)
Additional string: the string xyyz is accepted (follow loop y 2 times)
Additional string: the string xyyyz is accepted (follow loop y 3 times)
In general: the string x y^i z is accepted for i = 0, 1, 2, ... (follow loop y i times)
Therefore: x y^i z ∈ L for i = 0, 1, 2, ...
In other words, we described the Pumping Lemma: for any string w ∈ L with |w| ≥ m
• we can write w = x y z
• with |xy| ≤ m and |y| ≥ 1
• such that x y^i z ∈ L for i = 0, 1, 2, ...
For all sufficiently long strings (w),
there exists a non-null prefix (xy) and substring (y)
such that for all repetitions of the substring (y)
we get strings in the language.
∀ w ∈ L : |w| ≥ m ⇒
∃ x, y, z : (xyz = w) ∧
(|xy| ≤ m) ∧ (|y| > 0) ∧
(∀ i : i ≥ 0 ⇒ x y^i z ∈ L)
In the book:
1. Assume L is regular, and let m be the critical length
2. Pick a string w ∈ L with |w| ≥ m
3. Write w = xyz
4. Show that w' = x y^i z ∉ L for some i ≠ 1
Theorem: The language L = {a^n b^n : n ≥ 0} is not regular
Proof: Assume for contradiction that L is a regular language
Since L is infinite
we can apply the Pumping Lemma
Let m be the critical length for L
We pick w = a^m b^m
From the Pumping Lemma:
we can write w = a^m b^m = x y z
with lengths |xy| ≤ m, |y| ≥ 1
Since |xy| ≤ m, both x and y lie within the leading block a^m
Thus: y = a^k, 1 ≤ k ≤ m
From the Pumping Lemma: x y^i z ∈ L for i = 0, 1, 2, ...
Thus: x y^2 z ∈ L
x y^2 z = a^(m+k) b^m
Thus: a^(m+k) b^m ∈ L, k ≥ 1
BUT: L = {a^n b^n : n ≥ 0}, so a^(m+k) b^m ∉ L
CONTRADICTION!!!
Therefore: Our assumption that L
is a regular language is not true
END OF PROOF
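The case analysis in the proof can be checked by brute force for small values of m. This is only an illustrative sketch: the function names are made up for the example, and running it for a handful of m values does not replace the proof, which covers every m.

```python
def in_L(s):
    """Membership test for L = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def splits(w, m):
    """Every way to write w = xyz with |xy| <= m and |y| >= 1."""
    for j in range(1, min(m, len(w)) + 1):    # j = |xy|
        for i in range(j):                    # i = |x|, so |y| = j - i >= 1
            yield w[:i], w[i:j], w[j:]


def every_split_can_be_pumped_out(w, m, in_lang, max_i=3):
    """True if each admissible split has some i with x y^i z outside the language."""
    return all(any(not in_lang(x + y * i + z) for i in range(max_i + 1))
               for x, y, z in splits(w, m))


for m in range(1, 7):
    w = "a" * m + "b" * m
    print(m, every_split_can_be_pumped_out(w, m, in_L))
```

The same helper can be pointed at the other languages in this section by swapping the membership test and the chosen string w.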
Non-regular languages: {a^n b^n : n ≥ 0}
Regular languages: L(a*b*)
More Applications
of the Pumping Lemma
For any string w ∈ L with |w| ≥ m (the critical length)
• we can write w = x y z
• with |xy| ≤ m and |y| ≥ 1
• such that x y^i z ∈ L for i = 0, 1, 2, ...
Non-regular languages: L = {v v^R : v ∈ Σ*}
Theorem: The language
L = {v v^R : v ∈ Σ*}, Σ = {a, b}
is not regular
Proof: Assume for contradiction that L is a regular language
Since L is infinite
we can apply the Pumping Lemma
Let m be the critical length for L
We pick w = a^m b^m b^m a^m
From the Pumping Lemma:
we can write w = a^m b^m b^m a^m = x y z
with lengths |xy| ≤ m, |y| ≥ 1
Since |xy| ≤ m, both x and y lie within the leading block a^m
Thus: y = a^k, 1 ≤ k ≤ m
From the Pumping Lemma: x y^i z ∈ L for i = 0, 1, 2, ...
Thus: x y^2 z ∈ L
x y^2 z = a^(m+k) b^m b^m a^m
Thus: a^(m+k) b^m b^m a^m ∈ L, k ≥ 1
BUT: L = {v v^R : v ∈ Σ*}, so a^(m+k) b^m b^m a^m ∉ L
CONTRADICTION!!!
Therefore: Our assumption that L
is a regular language is not true
END OF PROOF
Non-regular languages: L = {a^n b^l c^(n+l) : n, l ≥ 0}
Theorem: The language
L = {a^n b^l c^(n+l) : n, l ≥ 0}
is not regular
Proof: Assume for contradiction that L is a regular language
Since L is infinite
we can apply the Pumping Lemma
Let m be the critical length of L
Pick a string w ∈ L with length |w| ≥ m
We pick w = a^m b^m c^(2m)
From the Pumping Lemma:
we can write w = a^m b^m c^(2m) = x y z
with lengths |xy| ≤ m, |y| ≥ 1
Since |xy| ≤ m, both x and y lie within the leading block a^m
Thus: y = a^k, 1 ≤ k ≤ m
From the Pumping Lemma: x y^i z ∈ L for i = 0, 1, 2, ...
Thus: x y^0 z = xz ∈ L
xz = a^(m−k) b^m c^(2m)
Thus: a^(m−k) b^m c^(2m) ∈ L, k ≥ 1
BUT: L = {a^n b^l c^(n+l) : n, l ≥ 0}, and (m − k) + m ≠ 2m, so a^(m−k) b^m c^(2m) ∉ L
CONTRADICTION!!!
Therefore: Our assumption that L
is a regular language is not true
END OF PROOF
Non-regular languages: L = {a^(n!) : n ≥ 0}
Theorem: The language L = {a^(n!) : n ≥ 0}
is not regular
n! = 1 · 2 · ... · (n − 1) · n
Proof: Assume for contradiction that L is a regular language
Since L is infinite
we can apply the Pumping Lemma
Let m be the critical length for L; we may take m > 1, since any larger value also works as a critical length
We pick w = a^(m!)
From the Pumping Lemma:
we can write w = a^(m!) = x y z
with lengths |xy| ≤ m, |y| ≥ 1
w = xyz = a^m a^(m!−m), where x and y lie within the first m symbols
Thus: y = a^k, 1 ≤ k ≤ m
From the Pumping Lemma: x y^i z ∈ L for i = 0, 1, 2, ...
Thus: x y^2 z ∈ L
x y^2 z = a^(m+k) a^(m!−m) = a^(m!+k)
Thus: a^(m!+k) ∈ L, 1 ≤ k ≤ m
Since L = {a^(n!) : n ≥ 0}, there must exist p such that m! + k = p!
However, for m > 1 and 1 ≤ k ≤ m:
m! < m! + k ≤ m! + m < m! · m + m! = (m + 1)!
so m! + k ≠ p! for any p
BUT: L = {a^(n!) : n ≥ 0}, so a^(m!+k) ∉ L
CONTRADICTION!!!
Therefore: Our assumption that L
is a regular language is not true
END OF PROOF
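The arithmetic step of this proof, that m! + k is never a factorial when 1 ≤ k ≤ m, can be checked numerically for small m. The sketch below uses made-up helper names and also shows why the case m = 1 has to be excluded (1! + 1 = 2 = 2!).

```python
from math import factorial


def is_factorial(x):
    """True if x = p! for some p >= 1."""
    p, f = 1, 1
    while f < x:
        p += 1
        f *= p
    return f == x


def pumped_lengths_miss_L(m):
    """True if m! + k is not a factorial for every k with 1 <= k <= m."""
    return all(not is_factorial(factorial(m) + k) for k in range(1, m + 1))


print(pumped_lengths_miss_L(1))                       # the excluded case m = 1
print(all(pumped_lengths_miss_L(m) for m in range(2, 9)))
```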
Review
• Pumping Lemma
Closure Properties of
Regular Languages
Regular Languages
• If ∑ is an alphabet, the set R of regular
languages over ∑ is defined as follows.
• 1. The language ∅ is an element of R, and
for every a ∈ ∑, the language {a} is in R.
• 2. For any two languages L1 and L2 in R, the
three languages L1 ∪ L2, L1L2, and L1* are
elements of R.
For regular languages L1 and L2
we will prove that the following are regular languages:
Union: L1 ∪ L2
Concatenation: L1L2
Star: L1*
Reversal: L1^R
Complement: L1' (that is, ∑* − L1)
Intersection: L1 ∩ L2
We say: regular languages are closed under union, concatenation, star, reversal, complement, and intersection
A useful transformation: use one accept state
[Figure: an NFA with 2 accept states and an equivalent NFA with 1 accept state]
In general: an NFA can be converted to an equivalent NFA with a single accepting state
Extreme case
[Figure]
Take NFAs M1 and M2 with L(M1) = L1 and L(M2) = L2
Example
M1: L1 = {a^n b : n ≥ 0}
M2: L2 = {ba}
[Figure: NFAs M1 and M2]
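Union
NFA for L1 ∪ L2
[Figure: NFA built from M1 and M2]
Example
L1 = {a^n b}, L2 = {ba}
[Figure: NFA for L1 ∪ L2]
Concatenation
NFA for L1L2
[Figure: NFA built from M1 followed by M2]
Example
L1 = {a^n b}, L2 = {ba}
[Figure: NFA for L1L2]
Star Operation
NFA for L1*: accepts w = w1 w2 ... wk with each wi ∈ L1
[Figure: NFA built from M1]
Example
L1 = {a^n b}
[Figure: NFA for L1*]
Reverse
NFA for L1^R, obtained from the NFA M1 for L1
Example
M1: L1 = {a^n b}
Reversed M1: L1^R = {b a^n}
[Figure: M1 and the NFA for L1^R]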
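Complement
DFA for the complement L1', obtained from the DFA M1 for L1
Example
M1: L1 = {a^n b}
Complemented M1: L1' = {a, b}* − {a^n b}
[Figure: M1 and the DFA for the complement, on the same states and transitions]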
Intersection
L1 regular, L2 regular: we show that L1 ∩ L2 is regular
DeMorgan's Law: L1 ∩ L2 = (L1' ∪ L2')', where L' denotes the complement of L
L1, L2 regular
⇒ L1', L2' regular
⇒ L1' ∪ L2' regular
⇒ (L1' ∪ L2')' regular
⇒ L1 ∩ L2 regular
Example
L1 = {a^n b} regular
L2 = {ab, ba} regular
L1 ∩ L2 = {ab} regular
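Another Proof for Intersection Closure
Machine M1: DFA for L1
Machine M2: DFA for L2
States in M: (qi, pj), where qi is a state in M1 and pj is a state in M2
Transitions: if M1 has the transition q1 -a-> q2 and M2 has the transition p1 -a-> p2, then M has the new transition (q1, p1) -a-> (q2, p2)
Initial state: if q0 is the initial state of M1 and p0 is the initial state of M2, then (q0, p0) is the new initial state of M
Accept states: if qi is an accept state of M1 and pj, pk are accept states of M2, then (qi, pj) and (qi, pk) are accept states of M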
Example
L1 = {a^n b : n ≥ 0}, with DFA M1 on states q0, q1, q2
L2 = {a b^m : m ≥ 0}, with DFA M2 on states p0, p1, p2
[Figure: DFAs M1 and M2]
Automaton for intersection
L = {a^n b} ∩ {a b^m} = {ab}
[Figure: product DFA on the states (q0, p0), (q0, p1), (q1, p1), (q2, p2), (q1, p2), (q0, p2), (q2, p1)]
M simulates in parallel M1 and M2
M accepts string w if and only if
M1 accepts string w
and M2 accepts string w
L(M) = L(M1) ∩ L(M2)
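The parallel simulation can be written out directly. In this sketch a DFA is a triple (transition dict, start state, accept set), which is a representation chosen for the example; the transitions of M1 and M2 below are the natural DFAs for {a^n b} and {a b^m} and are reconstructed here as an assumption, since the slide figures are not available.

```python
from itertools import product


def product_dfa(d1, d2):
    """Build the DFA whose states are pairs and which runs d1 and d2 in parallel."""
    delta1, start1, acc1 = d1
    delta2, start2, acc2 = d2
    alphabet = {a for (_, a) in delta1}
    states1 = {q for (q, _) in delta1}
    states2 = {p for (p, _) in delta2}
    delta = {((q, p), a): (delta1[q, a], delta2[p, a])
             for q, p, a in product(states1, states2, alphabet)}
    accepting = {(q, p) for q in acc1 for p in acc2}   # both components accept
    return delta, (start1, start2), accepting


def accepts(dfa, w):
    delta, state, accepting = dfa
    for ch in w:
        state = delta[state, ch]
    return state in accepting


# M1 for {a^n b : n >= 0}; q2 is a trap state
M1 = ({("q0", "a"): "q0", ("q0", "b"): "q1",
       ("q1", "a"): "q2", ("q1", "b"): "q2",
       ("q2", "a"): "q2", ("q2", "b"): "q2"}, "q0", {"q1"})
# M2 for {a b^m : m >= 0}; p2 is a trap state
M2 = ({("p0", "a"): "p1", ("p0", "b"): "p2",
       ("p1", "a"): "p2", ("p1", "b"): "p1",
       ("p2", "a"): "p2", ("p2", "b"): "p2"}, "p0", {"p1"})

M = product_dfa(M1, M2)
words = ["".join(t) for n in range(5) for t in product("ab", repeat=n)]
print([w for w in words if accepts(M, w)])
```

Changing the accept set to pairs where at least one component accepts would give a DFA for the union instead, using the same states and transitions.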
Applications of Regular Expression
• Regular Expressions in Lexical Analysis
• Regular Expressions in Web Search Engines
• Regular Expressions in Software Engineering