UNIT 2
Syllabus: Regular Expressions, Finite Automata and
Regular Expressions, Applications of Regular
Expressions, Algebraic Laws for Regular Expressions,
Properties of Regular Languages-
Pumping Lemma for Regular Languages, Applications
of the Pumping Lemma, Closure Properties of
Regular Languages, Decision Properties of Regular
Languages, Equivalence and Minimization of
Automata.
===============================================================
Regular Expression:
Just as finite automata are used to recognize patterns of strings, regular
expressions are used to generate patterns of strings. A regular expression is
an algebraic formula whose value is a pattern consisting of a set of strings,
called the language of the expression.
1. A regular expression is used to specify a language, and it does so precisely.
2. Regular expressions are very intuitive.
3. Regular expressions are very useful in a variety of contexts.
4. Given a regular expression, an NFA-ε can be constructed from
it automatically.
5. Thus, so can an NFA, a DFA, and a corresponding program, all
automatically!
Operands in a regular expression can be:
characters from the alphabet over which the regular expression is defined.
variables whose values are any pattern defined by a regular expression.
epsilon which denotes the empty string containing no characters.
null which denotes the empty set of strings.
• Let Σ be an alphabet. The regular expressions over Σ are:
– Ø Represents the empty set { }
– ε Represents the set {ε}
– a Represents the set {a}, for any symbol a in Σ
Operators used in regular expressions include:
Union: If R1 and R2 are regular expressions, then R1 | R2 (also written
as R1 U R2 or R1 + R2) is also a regular expression.
L(R1|R2) = L(R1) U L(R2).
Concatenation: If R1 and R2 are regular expressions, then R1R2 (also
written as R1.R2) is also a regular expression.
L(R1R2) = L(R1) concatenated with L(R2).
Kleene closure: If R1 is a regular expression, then R1* (the Kleene
closure of R1) is also a regular expression.
L(R1*) = {epsilon} U L(R1) U L(R1R1) U L(R1R1R1) U ...
Let r and s be regular expressions that represent the sets R and S, respectively.
– r+s Represents the set R U S (precedence 3)
– rs Represents the set RS (precedence 2)
– r* Represents the set R* (highest precedence)
– (r) Represents the set R (not an operator; provides precedence)
If r is a regular expression, then L(r) is used to denote the corresponding
language.
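The three operators can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original notes: the helper names union, concat and kleene_star are illustrative, and because L(R1*) is usually infinite, the star is cut off at a chosen maximum length.

```python
def union(l1, l2):
    """L1 U L2: every string that is in either language."""
    return l1 | l2

def concat(l1, l2):
    """L1 L2: every string of L1 followed by every string of L2."""
    return {x + y for x in l1 for y in l2}

def kleene_star(l, max_len):
    """Strings of L* of length <= max_len (L* itself is usually infinite)."""
    result = {""}      # epsilon is always in L*
    frontier = {""}
    while frontier:
        # extend the newest strings by one more factor from L, keep only new short ones
        frontier = {w for w in concat(frontier, l) if len(w) <= max_len} - result
        result |= frontier
    return result

print(sorted(union({"0"}, {"1"})))          # ['0', '1']
print(sorted(concat({"a", "ab"}, {"b"})))   # ['ab', 'abb']
print(sorted(kleene_star({"ab"}, 6)))       # ['', 'ab', 'abab', 'ababab']
```

Representing a language as a Python set only works for finite languages, which is why the star needs the length bound.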
Some RE Examples
Regular Expression      Regular Set
(0+10*)                 L = { 0, 1, 10, 100, 1000, 10000, … }
(0*10*)                 L = { 1, 01, 10, 010, 0010, … } (strings with exactly one 1)
(0+ε)(1+ε)              L = { ε, 0, 1, 01 }
(a+b)*                  Set of strings of a’s and b’s of any length, including the
                        null string. So L = { ε, a, b, aa, ab, bb, ba, aaa, … }
(a+b)*abb               Set of strings of a’s and b’s ending with the string abb.
                        So L = { abb, aabb, babb, aaabb, ababb, … }
(11)*                   Set consisting of an even number of 1’s, including the empty
                        string. So L = { ε, 11, 1111, 111111, … }
(aa)*(bb)*b             Set of strings consisting of an even number of a’s followed
                        by an odd number of b’s.
                        So L = { b, aab, aabbb, aabbbbb, aaaab, aaaabbb, … }
(aa + ab + ba + bb)*    Strings of a’s and b’s of even length, obtained by
                        concatenating any combination of the strings aa, ab, ba
                        and bb, including null.
                        So L = { ε, aa, ab, ba, bb, aaab, aaba, … }
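The example sets can be reproduced mechanically. The sketch below is an illustration rather than part of the original notes: it uses Python's re module, where union is written | instead of +, and ε is written as an empty alternative. The helper name language and the length bound are assumptions of this sketch.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that `pattern` matches entirely,
    listed shortest first."""
    regex = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if regex.fullmatch(w):
                words.append(w)
    return words

print(language(r"0|10*", "01", 3))       # ['0', '1', '10', '100']
print(language(r"(0|)(1|)", "01", 3))    # ['', '0', '1', '01']
print(language(r"(a|b)*abb", "ab", 4))   # ['abb', 'aabb', 'babb']
print(language(r"(11)*", "1", 6))        # ['', '11', '1111', '111111']
```

The empty string '' in the output plays the role of ε.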
Closure Properties of Regular Languages:
Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism,
Inverse Homomorphism
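Recall that a closure property is a statement that a certain operation on
languages, when applied to languages in a class (e.g., the regular languages), produces a
result that is also in that class. For regular languages, we can use any of their representations
(regular expressions, DFAs, NFAs) to prove a closure property. The worked examples under each
property below illustrate it on specific languages; a general proof constructs a regular
expression or automaton for the result from those of the operands.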
Regular Sets
Any set that represents the value of the Regular Expression is called a Regular Set.
Properties of Regular Sets/ Regular Languages:
Property 1 The union of two regular sets is regular.
Proof:
Let us take two regular expressions
RE1 = a(aa)* and RE2 = (aa)*
So, L1= {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
L2={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∪ L2 ={ ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,…}
(Strings of all possible lengths including Null)
RE (L1 ∪ L2) = a* (which is a regular expression itself)
Hence, proved.
Property 2 The intersection of two regular sets is regular.
Proof:
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So,
L1 = { a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)*
which is a regular expression itself.
Hence, proved.
Property 3 The complement of a regular set is regular.
Proof:
Let us take a regular expression: RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
The complement of L, written L', is the set of all strings over {a} that are not in L.
So, L' = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L') = a(aa)* which is a regular expression itself.
Hence, proved.
Property 4 The difference of two regular sets is regular.
Proof:
Let us take two regular expressions:
RE1 = a (a*) and RE2 = (aa)*
So,
L1 = {a, aa, aaa, aaaa, …………..} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....} (Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)*
which is a regular expression.
Hence, proved.
Property 5 The reversal of a regular set is regular.
Proof:
We have to prove that L^R is also regular if L is a regular set.
Let L = {01, 10, 11}
RE (L) = 01 + 10 + 11
L^R = {10, 01, 11}
RE (L^R) = 10 + 01 + 11
which is regular.
Hence, proved.
Property 6 The closure of a regular set is regular.
Proof:
If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null), i.e.,
RE (L) = a (aa)*
L* = {ε, a, aa, aaa, aaaa, aaaaa, ……………} (Strings of all lengths including Null, since ε is always in L*)
RE (L*) = a*
Hence, proved.
Property 7 The concatenation of two regular sets is regular.
Proof:
Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
L2 = {01, 010, 011, .....} (Set of strings beginning with 01)
Then, L1 L2 = {001, 0010, 0011, 0001, 1001, 00010, ......}
Set of strings containing 001 as a substring (a string ending in 0 followed by a
string beginning with 01), which can be represented by the
RE: (0+1)*001(0+1)*
Hence, proved.
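The example languages used for Properties 1 to 7 can be checked by brute force on all strings up to a fixed length. This is a sketch, not a proof: it only compares finitely many strings, the helper name lang and the bound N are assumptions, and Python's | stands for +. Note that L* always contains ε, and that concatenating "ends in 0" with "begins with 01" yields exactly the strings containing 001.

```python
import re
from itertools import product

def lang(pattern, alphabet, max_len):
    """Set of strings over `alphabet`, of length <= max_len, matched entirely by `pattern`."""
    regex = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n))
    return {w for w in words if regex.fullmatch(w)}

N = 8  # length bound, an arbitrary choice for this sketch

odd      = lang(r"a(aa)*", "a", N)
even     = lang(r"(aa)*", "a", N)
nonempty = lang(r"aa*", "a", N)
all_a    = lang(r"a*", "a", N)

assert odd | even == all_a                          # Property 1: union
assert nonempty & even == lang(r"aa(aa)*", "a", N)  # Property 2: intersection
assert all_a - even == odd                          # Property 3: complement over {a}
assert nonempty - even == odd                       # Property 4: difference
reversed_l = {w[::-1] for w in lang(r"01|10|11", "01", N)}
assert reversed_l == lang(r"10|01|11", "01", N)     # Property 5: reversal
assert lang(r"(a(aa)*)*", "a", N) == all_a          # Property 6: closure, includes ε

# Property 7: every string of L1 followed by every string of L2, cut off at length N
l1 = lang(r"(0|1)*0", "01", N)
l2 = lang(r"01(0|1)*", "01", N)
concatenation = {x + y for x in l1 for y in l2 if len(x + y) <= N}
assert concatenation == lang(r"(0|1)*001(0|1)*", "01", N)

print("all closure examples agree up to length", N)
```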
Identities Related to Regular Expressions:
Given R, P, Q, L, M, N as regular expressions, the following identities hold:
1. Ø* = ε
2. ε* = ε
3. RR* = R*R
4. R*R* = R*
5. (R*)* = R*
6. (PQ)*P = P(QP)*
7. (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
8. R + Ø = Ø + R = R (The identity for union)
9. Rε = εR = R (The identity for concatenation)
10. ØL = LØ = Ø (The annihilator for concatenation)
11. R + R = R (Idempotent law)
12. L (M + N) = LM + LN (Left distributive law)
13. (M + N) L = ML + NL (Right distributive law)
14. ε + RR* = ε + R*R = R*
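Several of these identities can be sanity-checked on concrete languages. The sketch below is illustrative only: languages are represented as finite Python sets cut off at MAX_LEN, the helper names cat and star and the particular choices of P, Q, L, M, N and R are assumptions, and agreement up to a length bound is evidence rather than a proof.

```python
MAX_LEN = 6  # only strings up to this length are compared

def cat(l1, l2):
    """Concatenation of two languages, cut off at MAX_LEN."""
    return {x + y for x in l1 for y in l2 if len(x + y) <= MAX_LEN}

def star(l):
    """Kleene closure of a language, cut off at MAX_LEN."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, l) - result
        result |= frontier
    return result

EMPTY, EPS = set(), {""}          # Ø and ε
A, B = {"a"}, {"b"}
everything = star(A | B)          # (a+b)*

assert star(EMPTY) == EPS and star(EPS) == EPS           # Ø* = ε, ε* = ε
assert star(cat(star(A), star(B))) == everything         # (a*b*)*
assert star(star(A) | star(B)) == everything             # (a*+b*)*
assert star(A | star(B)) == everything                   # (a+b*)*
assert cat(star(A), star(cat(B, star(A)))) == everything # a*(ba*)*

P, Q = {"a", "ab"}, {"b"}
assert cat(star(cat(P, Q)), P) == cat(P, star(cat(Q, P)))  # (PQ)*P = P(QP)*

L, M, N = {"a", "ba"}, {"b"}, {"", "ab"}
assert cat(L, M | N) == cat(L, M) | cat(L, N)            # left distributive law
assert cat(M | N, L) == cat(M, L) | cat(N, L)            # right distributive law

R = {"ab", "b"}
assert EPS | cat(R, star(R)) == star(R)                  # ε + RR* = R*
assert cat(R, star(R)) == cat(star(R), R)                # RR* = R*R
print("identities hold for all strings up to length", MAX_LEN)
```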
Arden’s Theorem
In order to find out a regular expression of a Finite Automaton, we use Arden’s
Theorem along with the properties of regular expressions.
Statement:
Let P and Q be two regular expressions.
If P does not contain the null string ε,
then the equation R = Q + RP has a unique solution, namely R = QP*.
Proof:
R=Q+RP
R = Q + (Q + RP) P [After putting the value R = Q + RP]
= Q + QP + RPP
When we put the value of R recursively again and again, we get the following
equation:
R = Q + QP + QP^2 + QP^3 + …
R = Q (ε + P + P^2 + P^3 + …)
R = QP* [As P* represents (ε + P + P^2 + P^3 + …)]
Hence, proved.
Assumptions for Applying Arden’s Theorem
1. The transition diagram must not have NULL transitions
2. It must have only one initial state
Method
Step 1: Create equations of the following form for all the states of the DFA having
n states with initial state q1.
q1 = q1R11 + q2R21 + … + qnRn1 + ε
q2 = q1R12 + q2R22 + … + qnRn2
……………………………
qn = q1R1n + q2R2n + … + qnRnn
Rij represents the set of labels of edges from qi to qj;
if no such edge exists, then Rij = Ø.
Step 2: Solve these equations to get the equation for the final state in terms of Rij.
CONVERSION OF FINITE AUTOMATA TO REGULAR EXPRESSIONS
Problem
Construct a regular expression corresponding to the automaton given below:
[Figure: finite automaton with states q1, q2 and q3 and transitions labeled a and b]
Solution
Here the initial state and the final state are both q1.
The equations for the three states q1, q2, and q3 are as follows:
q1 = q1a + q3a + ε (the ε term is because q1 is the initial state)
q2 = q1b + q2b + q3b
q3 = q2a
Now, we will solve these three equations:
q2 = q1b + q2b + q3b
= q1b + q2b + (q2a)b (Substituting value of q3)
= q1b + q2(b + ab)
= q1b(b + ab)* (Applying Arden’s Theorem)
q1 = q1a + q3a + ε
= q1a + q2aa + ε (Substituting value of q3)
= q1a + q1b(b + ab)*aa + ε (Substituting value of q2)
= q1(a + b(b + ab)*aa) + ε
= ε(a + b(b + ab)*aa)* (Applying Arden’s Theorem)
= (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
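The result can be cross-checked against the automaton itself. The sketch below is an illustration, not part of the original solution: the transition table DELTA is read off the three state equations (q1 = q1a + q3a + ε, q2 = q1b + q2b + q3b, q3 = q2a), with q1 taken as both initial and final state, and the regular expression is written in Python syntax with | for +.

```python
import re
from itertools import product

# Transition table reconstructed from the state equations (an assumption of this
# sketch): "qi R" inside the equation for qj means an edge qi --R--> qj.
DELTA = {
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q3", ("q2", "b"): "q2",
    ("q3", "a"): "q1", ("q3", "b"): "q2",
}

def accepts(word, start="q1", finals=frozenset({"q1"})):
    """Run the DFA on `word` and report whether it stops in a final state."""
    state = start
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in finals

ARDEN_RE = re.compile(r"(a|b(b|ab)*aa)*")   # (a + b(b + ab)*aa)*

# Compare automaton and expression on every string over {a, b} up to length 10.
mismatches = [
    w
    for n in range(11)
    for w in map("".join, product("ab", repeat=n))
    if accepts(w) != bool(ARDEN_RE.fullmatch(w))
]
assert not mismatches
print("automaton and expression agree on", 2 ** 11 - 1, "strings")
```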
Problem
Construct a regular expression corresponding to the automaton given below:
[Figure: finite automaton with states q1, q2 and q3 and transitions labeled 0 and 1]
Solution :
Here the initial state is q1 and the final state is q2. Now we write down the
equations:
q1 = q1 0 + ε
q2 = q1 1 + q2 0
q3 = q2 1 + q3 0 + q3 1
Now, we will solve these three equations:
q1 = ε0* [By Arden’s theorem]
So, q1 = 0* [As εR = R]
q2 = 0*1 + q2 0
So, q2 = 0*1(0)* [By Arden’s theorem]
Hence, the regular expression is 0*10*.
Construction of a FA from an RE
We can use Thompson's Construction to find a Finite
Automaton from a Regular Expression. We reduce the regular
expression into its smallest sub-expressions, convert these to
NFAs, and finally convert the result to a DFA.
Some basic regular expressions are the following:
Case 1: For a regular expression ‘a’, we can construct the
following FA:
[Figure: finite automata for RE = a. A single transition labeled a leads from q1 to the final state.]
Case 2: For a regular expression ‘ab’, we can construct the
following FA:
[Figure: finite automata for RE = ab. A transition labeled a followed by a transition labeled b leads from q1 to the final state.]
Case 3: For a regular expression (a+b), we can construct the
following FA:
[Figure: finite automata for RE = (a+b)]
Case 4: For a regular expression (a+b)*, we can construct the
following FA:
[Figure: finite automata for RE = (a+b)*, with a transition labeled a,b]
Method:
Step 1: Construct an NFA with Null moves from the given regular
expression.
Step 2: Remove Null transition from the NFA and convert it into
its equivalent DFA.
Problem: Convert the following RE into its equivalent DFA: 1 (0 + 1)* 0
Solution:
We will concatenate three expressions "1", "(0 + 1)*" and "0"
[Figure: NDFA with NULL transitions for RE 1 (0 + 1)* 0. Path q0 --1--> q1 --ε--> q2 --ε--> q3 --0--> final state, with a loop labeled 0,1.]
Now we will remove the є transitions. After we remove the є
transitions from the NDFA, we get the following:
[Figure: NDFA without NULL transitions for RE 1 (0 + 1)* 0. Path q0 --1--> q2 --0--> final state, with a loop labeled 0,1 on q2.]
It is an NDFA corresponding to the RE: 1 (0 + 1)* 0. If you want
to convert it into a DFA, simply apply the method of converting
NDFA to DFA discussed in Chapter 1.
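The last step can be sketched in Python, assuming the conversion meant here is the usual subset construction. The transition table below encodes the ε-free NDFA described above (q0 on 1 to q2, a loop on q2 for 0 and 1, q2 on 0 to the final state); the name qf for the final state and all function names are hypothetical, chosen for this sketch.

```python
import re
from itertools import product

# ε-free NDFA for 1 (0 + 1)* 0; "qf" is a hypothetical name for the final state.
NFA = {
    ("q0", "1"): {"q2"},
    ("q2", "0"): {"q2", "qf"},
    ("q2", "1"): {"q2"},
}
START, FINALS, ALPHABET = "q0", {"qf"}, "01"

def subset_construction(nfa, start, finals, alphabet):
    """Return (transitions, start state, final states) of an equivalent DFA.
    Each DFA state is a frozenset of NDFA states."""
    start_set = frozenset({start})
    transitions, seen, todo = {}, {start_set}, [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # all NDFA states reachable from any state in `current` on `symbol`
            target = frozenset(t for s in current for t in nfa.get((s, symbol), ()))
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return transitions, start_set, {s for s in seen if s & finals}

DFA, DFA_START, DFA_FINALS = subset_construction(NFA, START, FINALS, ALPHABET)

def dfa_accepts(word):
    state = DFA_START
    for symbol in word:
        state = DFA[(state, symbol)]
    return state in DFA_FINALS

# The DFA should agree with the expression 1(0|1)*0 on every short binary string.
pattern = re.compile(r"1(0|1)*0")
for n in range(9):
    for letters in product(ALPHABET, repeat=n):
        word = "".join(letters)
        assert dfa_accepts(word) == bool(pattern.fullmatch(word))

print(len({state for state, _ in DFA}), "DFA states")  # 4, including the empty (dead) state
```

The empty frozenset acts as the dead state that absorbs every string not starting with 1.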
Pumping Lemma for Regular Languages
Theorem
Let L be a regular language. Then there exists a constant ‘c’ such that every string
w in L with |w| ≥ c can be broken into three strings, w = xyz, such that:
1. |y| > 0
2. |xy| <= c
3. For all k >= 0, the string x y^k z is also in L.
Applications of Pumping Lemma:
Pumping Lemma is to be applied to show that certain languages are not regular. It
should never be used to show a language is regular.
1. If L is regular, it satisfies Pumping Lemma.
2. If L does not satisfy Pumping Lemma, it is non-regular.
Method to prove that a language L is not regular:
1. At first, we have to assume that L is regular.
2. So, the pumping lemma should hold for L.
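3. Use the pumping lemma to obtain a contradiction:
a. Select w in L such that |w| >= c.
b. Consider every way of writing w = xyz with |y| >= 1 and |xy| <= c
(the lemma only guarantees that some such split exists, so the argument must cover all of them).
c. For each such split, select k such that the resulting string x y^k z is not in L.
4. This contradicts the pumping lemma, so L is not regular.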
Problem :
Prove that L = {a^i b^i | i ≥ 0} is not regular.
Solution:
1. At first, we assume that L is regular and n is the number of states of a DFA accepting L.
2. Let w = a^n b^n. Thus |w| = 2n >= n.
3. By pumping lemma, let w = xyz, where |xy| <= n and |y| >= 1.
4. Since |xy| <= n, x and y contain only a’s. Let x = a^p, y = a^q, and z = a^r b^n,
where p + q + r = n and q ≠ 0. Thus |y| ≠ 0.
5. Let k = 2. Then x y^2 z = a^p a^{2q} a^r b^n.
6. Number of a’s = (p + 2q + r) = (p + q + r) + q = n + q
7. Hence, x y^2 z = a^{n+q} b^n. Since q ≠ 0, x y^2 z is not of the
form a^n b^n.
8. Thus, x y^2 z is not in L. Hence L is not regular.
What is the Pumping Lemma (for Regular Languages)?
The Pumping Lemma helps us prove that a language is not regular.
It says:
If a language is regular, then any long enough string in the language can
be broken into three parts x, y, z such that the middle part y can be
pumped (repeated any number of times) and the string still stays in the
language.
Conditions of the Pumping Lemma
If a language L is regular, then there exists a number p (pumping length)
such that every string s in L with |s| >= p can be split into:
s = xyz
Such that:
1. |xy| <= p -- the first two parts lie within the first p letters
2. |y| >= 1 -- y is not empty
3. x y^i z is in L for all i >= 0 -- you can repeat or remove y, and the result is
still in the language.
Let's Use It to Prove a Language is Not Regular
Language:
L = { a^n b^n , n >= 0 }
This means: some number of a’s followed by the same number of b’s
Examples: ε, ab, aabb, aaabbb, etc.
1. Assume L is regular.
By the Pumping Lemma, there exists a pumping length p.
2. Choose a string from the language:
s = a^p b^p
This string is in L because it has p a’s followed by p b’s.
3. Split s into xyz:
Since the first p letters are all a, and |xy| ≤ p, both x and y contain only
a’s.
Let’s say:
x = a^k
y = a^m, where m ≥ 1
z = a^{p-k-m} b^p
So:
s = xyz = a^p b^p
4. Pump the string:
Try i = 2
This means y appears twice:
x y^2 z = xyyz = a^k a^{2m} a^{p-k-m} b^p = a^{p+m} b^p
Now, we have more a’s than b’s.
So:
x y^2 z is not in L
5. Contradiction!
The pumped string is not in the language L.
That contradicts the Pumping Lemma.
Final Conclusion:
Since the string could not be pumped and stay in the language:
L = { a^n b^n , n ≥ 0 } is not regular.
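The argument can be replayed mechanically for small pumping lengths. The sketch below is only an illustration (the function names and the bound max_i are assumptions): for each p it takes s = a^p b^p, tries every split allowed by the lemma, and confirms that none of them keeps x y^i z inside the language for all tested i. A finite check like this illustrates the proof but does not replace it.

```python
def in_L(w):
    """Membership test for L = { a^n b^n : n >= 0 }."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def surviving_splits(s, p, max_i=3):
    """Splits s = xyz with |xy| <= p and |y| >= 1 for which x y^i z stays in L
    for every i in 0..max_i (only finitely many i are tried)."""
    good = []
    for xy_len in range(1, min(p, len(s)) + 1):
        for y_len in range(1, xy_len + 1):
            x = s[:xy_len - y_len]
            y = s[xy_len - y_len:xy_len]
            z = s[xy_len:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good

for p in range(1, 9):
    s = "a" * p + "b" * p
    assert in_L(s)
    # every allowed split puts y inside the a's, so pumping unbalances the string
    assert surviving_splits(s, p) == []

print("no split of a^p b^p survives pumping for p = 1..8")
```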