Formal Language and Automata Theory
Chapter 4
Patterns, Regular Expressions and Finite Automata
(includes lectures 7, 8, 9)
Transparency No. 4-1
Patterns and their defined languages
Patterns, regular expression & FAs
Σ: a finite alphabet.
A pattern is a string of symbols representing a set of strings in Σ*.
The set of all patterns is defined inductively as follows:
1. atomic patterns: a ∈ Σ, ε, ∅, #, @.
2. compound patterns: if α and β are patterns, then so are: α + β, α ∩ β, α*, α⁺, ~α and αβ.
For each pattern α, L(α) is the language represented by α and is defined inductively as follows:
1. L(a) = {a}, L(ε) = {ε}, L(∅) = {}, L(#) = Σ, L(@) = Σ*.
2. If L(α) and L(β) have been defined, then
   L(α + β) = L(α) ∪ L(β), L(α ∩ β) = L(α) ∩ L(β),
   L(α⁺) = L(α)⁺, L(α*) = L(α)*,
   L(~α) = Σ* - L(α), L(αβ) = L(α) L(β).
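The inductive definition of L can be read directly as a recursive membership test. The following is a minimal sketch, not an efficient matcher: the tuple encoding of patterns, the helper `cat`, and all names are assumptions made for illustration, and every character of the input is assumed to belong to the alphabet.

```python
from functools import lru_cache, reduce

# Hypothetical tuple encoding of patterns (not from the slides):
#   ("sym", "a")  ("eps",)  ("empty",)  ("hash",) for #  ("at",) for @
#   ("or", p, q)  ("and", p, q)  ("cat", p, q)  ("star", p)  ("plus", p)  ("not", p)

@lru_cache(maxsize=None)
def matches(p, x):
    """True iff x is in L(p); assumes every character of x is in the alphabet."""
    kind = p[0]
    if kind == "sym":
        return x == p[1]
    if kind == "eps":
        return x == ""
    if kind == "empty":
        return False
    if kind == "hash":                      # L(#) = Sigma
        return len(x) == 1
    if kind == "at":                        # L(@) = Sigma*
        return True
    if kind == "or":
        return matches(p[1], x) or matches(p[2], x)
    if kind == "and":
        return matches(p[1], x) and matches(p[2], x)
    if kind == "not":
        return not matches(p[1], x)
    if kind == "cat":                       # try every split x = uv
        return any(matches(p[1], x[:i]) and matches(p[2], x[i:])
                   for i in range(len(x) + 1))
    if kind == "plus":                      # x in L(p), or x = uv with u nonempty
        return matches(p[1], x) or any(
            matches(p[1], x[:i]) and matches(p, x[i:]) for i in range(1, len(x)))
    if kind == "star":
        return x == "" or matches(("plus", p[1]), x)
    raise ValueError(f"unknown pattern kind: {kind}")

def cat(*ps):
    """Concatenate several patterns left to right."""
    return reduce(lambda l, r: ("cat", l, r), ps)

A, AT, HASH = ("sym", "a"), ("at",), ("hash",)
three_as = cat(AT, A, AT, A, AT, A, AT)          # @a@a@a@
no_a = ("star", ("and", HASH, ("not", A)))       # star of (# intersect ~a)
print(matches(three_as, "banana"), matches(no_a, "bcb"))   # prints: True True
```

Trying every split point makes concatenation and star exponential without the cache; memoizing on (pattern, string) keeps the sketch usable for short inputs.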
Transparency No. 4-2
More on patterns
We say that a string x matches a pattern α iff x ∈ L(α).
Some examples:
1. Σ* = L(@) = L(#*)
2. L(x) = {x} for any x ∈ Σ*
3. For any x1,...,xn in Σ*, L(x1 + x2 + ... + xn) = {x1, x2, ..., xn}.
4. {x | x contains at least 3 a's} = L(@a@a@a@)
5. Σ - {a} = L(# ∩ ~a)
6. {x | x does not contain a} = L((# ∩ ~a)*)
7. {x | every a in x is followed sometime later by a b}
   = {x | either no a in x, or ∃ b in x followed by no a}
   = L((# ∩ ~a)* + @b(# ∩ ~a)*)
Transparency No. 4-3
More on pattern matching
Some interesting and important questions:
1. How hard is it to determine if a given input string x matches a given pattern α?
   ==> efficient algorithm exists
2. Can every set be represented by a pattern?
   ==> no! the set {a^n b^n | n > 0} cannot be represented by any pattern.
3. How to determine if two given patterns α and β are equivalent (i.e., L(α) = L(β))? --- an exercise!
4. Which operations are redundant?
   ε = ~(#⁺ ∩ @) = ∅* ; α⁺ = α α*
   # = a1 + a2 + ... + an if Σ = {a1,...,an}
   α + β = ~(~α ∩ ~β) ; α ∩ β = ~(~α + ~β)
   It can be shown that ~ is redundant.
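The redundancy identities in item 4 can be sanity-checked by comparing languages restricted to strings of bounded length. This is only a sketch under stated assumptions: the alphabet {a, b}, the bound N = 4, and the two sample languages are arbitrary choices, and agreement up to length N is evidence, not a proof.

```python
from itertools import product

SIGMA = "ab"   # assumed two-letter alphabet
N = 4          # assumed length bound: only strings of length <= N are compared

# All strings over SIGMA of length at most N; plays the role of Sigma*.
UNIVERSE = {"".join(w) for k in range(N + 1) for w in product(SIGMA, repeat=k)}

def cat(A, B):
    """Concatenation, truncated to length N."""
    return {u + v for u in A for v in B if len(u) + len(v) <= N}

def star(A):
    """Kleene star, truncated to length N (iterate until nothing new appears)."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, A) - result
        result |= frontier
    return result

def plus(A):
    """A+ computed independently of star: A, AA, AAA, ... up to length N."""
    result, layer = set(), set(A)
    while not layer <= result:
        result |= layer
        layer = cat(layer, A)
    return result

def neg(A):
    """Complement relative to the bounded universe."""
    return UNIVERSE - A

EPS, EMPTY, HASH, AT = {""}, set(), set(SIGMA), UNIVERSE

alpha = {"a", "ab", "bba"}      # arbitrary sample languages (assumptions)
beta = {"ab", "b", ""}

checks = {
    "eps = ~(#+ & @)": EPS == neg(plus(HASH) & AT),
    "eps = empty*": EPS == star(EMPTY),
    "alpha+ = alpha alpha*": plus(alpha) == cat(alpha, star(alpha)),
    "# = a1 + ... + an": HASH == set().union(*({c} for c in SIGMA)),
    "alpha + beta = ~(~alpha & ~beta)": alpha | beta == neg(neg(alpha) & neg(beta)),
    "alpha & beta = ~(~alpha + ~beta)": alpha & beta == neg(neg(alpha) | neg(beta)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Truncation is safe here because every factor of a string of length at most N also has length at most N, so each operation commutes with the restriction to the bounded universe.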
Transparency No. 4-4
Equivalence of patterns, regular expr. & FAs
Recall that regular expressions are those patterns that can be built from: a ∈ Σ, ε, ∅, +, · (concatenation) and *.
Notational conventions:
   α + βρ means α + (βρ)
   α + β* means α + (β*)
   α β* means α (β*)
Theorem 8: Let A ⊆ Σ*. Then the following are equivalent:
1. A is regular (i.e., A = L(M) for some FA M),
2. A = L(α) for some pattern α,
3. A = L(β) for some regular expression β.
pf: Trivial part: (3) => (2). (2) => (1) to be proved now! (1) => (3) later.
Transparency No. 4-5
(2) => (1) : Every set represented by a pattern is regular
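Pf: By induction on the structure of pattern α.
Basis: α is atomic (by construction!):
1. α = a: [figure: FA; surviving edge label: a]
2. α = ε: [figure: FA; surviving edge label: ε]
3. α = ∅: [figure: FA]
4. α = #: [figure: FA; surviving edge label: a,b,c,...]
5. α = @ = #*: [figure: FA; surviving edge labels: ε and a,b,c,...]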
Transparency No. 4-6
Inductive cases: Let M1 and M2 be any FAs accepting L(β) and L(γ), respectively.
6. α = βγ: => L(α) = L(M1 · M2)
7. α = β*: => L(α) = L(M1*)
8. α = β + γ, α = ~β or α = β ∩ γ: By ind. hyp., L(β) and L(γ) are regular. Hence, by the closure properties of regular languages, L(α) is regular, too.
9. α = β⁺ = β β*: Similar to case 8.
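Case 8 relies on closure under union, intersection and complement. One standard way to realize these closures is the product construction on complete deterministic automata; the sketch below assumes M1 and M2 are already such DFAs (an NFA would first need the subset construction). The tuple layout and the helper `mod_dfa` are assumptions for illustration.

```python
def mod_dfa(k):
    """DFA over {a} accepting a^n with n divisible by k, i.e. L((a^k)*)."""
    states = set(range(k))
    delta = {(i, "a"): (i + 1) % k for i in states}
    return states, {"a"}, delta, 0, {0}

def product_dfa(d1, d2, accept):
    """Run d1 and d2 in parallel; accept(in_F1, in_F2) decides the final states."""
    (Q1, sigma, t1, s1, F1), (Q2, _, t2, s2, F2) = d1, d2
    states = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (t1[p, a], t2[q, a]) for (p, q) in states for a in sigma}
    finals = {(p, q) for (p, q) in states if accept(p in F1, q in F2)}
    return states, sigma, delta, (s1, s2), finals

def complement(d):
    """Swap final and non-final states of a complete DFA."""
    Q, sigma, t, s, F = d
    return Q, sigma, t, s, Q - F

def accepts(d, x):
    """Follow the unique run of the DFA on x."""
    Q, sigma, t, s, F = d
    state = s
    for a in x:
        state = t[state, a]
    return state in F

m3, m5 = mod_dfa(3), mod_dfa(5)
union = product_dfa(m3, m5, lambda f1, f2: f1 or f2)      # beta + gamma
inter = product_dfa(m3, m5, lambda f1, f2: f1 and f2)     # beta intersect gamma
print([n for n in range(16) if accepts(union, "a" * n)])  # [0, 3, 5, 6, 9, 10, 12, 15]
```

With `m3` and `m5`, the union automaton has 15 states and accepts the language of the pattern (aaa)* + (aaaaa)*.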
Transparency No. 4-7
Some example patterns & their equivalent FAs
1. (aaa)* + (aaaaa)*
Transparency No. 4-8
(1) => (3): Regular languages can be represented by reg. expr.
M = (Q, Σ, δ, S, F): an NFA; X ⊆ Q: a set of states; m, n ∈ Q: two states.
π_X(m,n) =def {y ∈ Σ* | ∃ a path from m to n labeled y and all intermediate states ∈ X}.
Note: L(M) = ?
π_X(m,n) can be shown to be representable by a regular expr., by induction as follows:
Let D(m,n) = {a | (m --a--> n) ∈ δ} = {a1,...,ak} (k ≥ 0) = the set of symbols by which we can reach n from m. Then:
Basic case: X = ∅:
1.1 if m ≠ n: π_∅(m,n) = {a1, a2, ..., ak} = L(a1 + a2 + ... + ak) if k > 0,
                       = {} = L(∅) if k = 0.
1.2 if m = n: π_∅(m,n) = {a1, a2, ..., ak, ε} = L(a1 + a2 + ... + ak + ε) if k > 0,
                       = {ε} = L(ε) if k = 0.
Transparency No. 4-9
Continued
2. For nonempty X, let q be any state in X. Then:
   π_X(m,n) = π_{X-{q}}(m,n) ∪ π_{X-{q}}(m,q) (π_{X-{q}}(q,q))* π_{X-{q}}(q,n).
By ind. hyp. (why?), there are regular expressions α, β, γ, ρ with
   L([α, β, γ, ρ]) = [π_{X-{q}}(m,n), π_{X-{q}}(m,q), π_{X-{q}}(q,q), π_{X-{q}}(q,n)].
Hence π_X(m,n) = L(α) ∪ L(β) L(γ)* L(ρ) = L(α + βγ*ρ) and can be represented as a reg. expr.
Finally, L(M) = {x | s --x--> f, s ∈ S, f ∈ F} = ∪_{s ∈ S, f ∈ F} π_Q(s,f) is representable by a regular expression.
Transparency No. 4-10
Some examples
Example (9.3): M:
          0      1
  >pF    {p}    {q}
   q     {r}    {}
   r     {p}    {q}
L(M) = π_{p,q,r}(p,p) = π_{p,r}(p,p) + π_{p,r}(p,q) (π_{p,r}(q,q))* π_{p,r}(q,p)
π_{p,r}(p,p) = ?
π_{p,r}(p,q) = ?
π_{p,r}(q,q) = ?
π_{p,r}(q,p) = ?
Hence L(M) = ?
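The inductive definition of the path languages translates into a short recursive procedure. The sketch below is one possible rendering, with several assumptions: expressions are emitted in Python `re` syntax (`|` instead of +), `None` stands for the empty set, the transitions are my reading of the table for Example 9.3, and the result is only cross-checked against a direct simulation on short strings. The output is machine-generated and unsimplified, so it will not look like a hand-derived answer.

```python
import re
from itertools import product

# Regular expressions are built as Python `re` strings; None stands for the empty set.
def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"

def concat(*parts):
    return None if any(p is None for p in parts) else "".join(parts)

def star(r):
    return "" if r in (None, "") else f"(?:{r})*"

def pi(delta, X, m, n):
    """Regex for the labels of paths m -> n whose intermediate states all lie in X."""
    if not X:                                   # basic case: X is empty
        r = "" if m == n else None              # epsilon is included when m = n
        for (u, a, v) in sorted(delta):
            if (u, v) == (m, n):
                r = union(r, re.escape(a))
        return r
    q, rest = X[0], X[1:]                       # inductive case: pick q in X
    return union(pi(delta, rest, m, n),
                 concat(pi(delta, rest, m, q),
                        star(pi(delta, rest, q, q)),
                        pi(delta, rest, q, n)))

def accepts(delta, starts, finals, x):
    """Direct simulation of the NFA, used only to cross-check the regex."""
    current = set(starts)
    for a in x:
        current = {v for (u, b, v) in delta if u in current and b == a}
    return bool(current & set(finals))

# Example 9.3 as read from the transition table (p is both start and final).
delta = {("p", "0", "p"), ("p", "1", "q"), ("q", "0", "r"),
         ("r", "0", "p"), ("r", "1", "q")}
regex = pi(delta, ("q", "p", "r"), "p", "p")    # eliminate q first, as in the example
words = ["".join(w) for k in range(7) for w in product("01", repeat=k)]
agree = all((re.fullmatch(regex, w) is not None) == accepts(delta, ["p"], ["p"], w)
            for w in words)
print(regex)
print(agree)
```

For a machine with several start or final states, the union of `pi(delta, Q, s, f)` over all pairs (s, f) would be taken, as in the last step of the proof.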
Transparency No. 4-11
Another approach
The previous method is easy to prove correct and easy to implement on a computer, but hard to carry out by hand.
The strategy of the new method: reduce the number of states in the target FA and encode path information by regular expressions on the edges, until only one or two states remain: one is the start state and one is the final state.
Transparency No. 4-12
Steps
0. Assume the machine M has only one start state and one final state. The two may be identical.
1. While there exists a third state p that is neither start nor final:
1.1 (Merge edges) For each pair of states (q,r) with more than one edge from q to r, labeled t1, t2, ..., tn respectively, merge these edges into a new one labeled with the regular expression t1 + t2 + ... + tn.
1.2 (Replace state p by edges; remove state) Let (p1, α1, p), ..., (pn, αn, p), where pi ≠ p, be the collection of all edges in M with p as the destination state, and (p, β1, q1), ..., (p, βm, qm), where qj ≠ p, be the collection of all edges with p as the start state. Let t be the label of the loop (p, t, p); if p has no loop, t* is taken to be ε. Now the state p together with all its connecting edges can be removed and replaced by a set of m x n new edges: { (pi, αi t* βj, qj) | i in [1,n] and j in [1,m] }. The new machine is equivalent to the old one.
Transparency No. 4-13
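Merge edges: parallel edges labeled α, β, γ between the same two states are replaced by one edge labeled α + β + γ.
Replace state by edges: state p has incoming edges labeled α1, α2, α3 from p1, p2, p3, a loop labeled γ, and outgoing edges labeled β1, β2 to q1, q2. After p is removed, the new edges are:
   p1 → q1: α1 γ* β1     p1 → q2: α1 γ* β2
   p2 → q1: α2 γ* β1     p2 → q2: α2 γ* β2
   p3 → q1: α3 γ* β1     p3 → q2: α3 γ* β2
Note: {p1,p2,p3} may intersect with {q1,q2}.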
Transparency No. 4-14
2. Perform 1.1 once again (merge edges). // There are one or two states now.
3. Two cases to consider:
3.1 The final machine has only one state, which is both start and final. If there is an edge labeled t on the state, then t* is the result; otherwise the result is ε.
3.2 The machine has one start state s and one final state f. Let (s, ss, s), (s, sf, f), (f, fs, s) and (f, ff, f) be the collection of all edges in the machine, where (sf) means the regular expression or label on the edge from s to f. The result then is
[ (ss) + (sf) (ff)* (fs) ]* (sf) (ff)*
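Steps 0 to 3 can be sketched as a small program. This is an illustration under assumptions, not a definitive implementation: labels are written in Python `re` syntax, `None` stands for a missing edge (the empty set), edges are merged as soon as they are created rather than in a separate pass, and the sample machine is the three-state automaton of Example (9.3) as I read its table.

```python
import re
from itertools import product

def alt(r, s):
    """r + s, where None stands for the empty set (no edge)."""
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"

def seq(*parts):
    return None if any(p is None for p in parts) else "".join(parts)

def star(r):
    return "" if r in (None, "") else f"(?:{r})*"

def eliminate(states, transitions, start, final):
    """State elimination; transitions is a set of (source, symbol, target) triples."""
    E = {}                                           # (u, v) -> merged label
    for (u, a, v) in sorted(transitions):            # step 1.1: merge edges
        E[u, v] = alt(E.get((u, v)), re.escape(a))
    for p in states:                                 # step 1.2: remove state p
        if p in (start, final):
            continue
        loop = star(E.get((p, p)))                   # t*, or epsilon without a loop
        ins = [(u, r) for (u, v), r in E.items() if v == p and u != p]
        outs = [(v, r) for (u, v), r in E.items() if u == p and v != p]
        E = {k: r for k, r in E.items() if p not in k}
        for (u, a) in ins:
            for (v, b) in outs:
                E[u, v] = alt(E.get((u, v)), seq(a, loop, b))
    ss, sf = E.get((start, start)), E.get((start, final))
    fs, ff = E.get((final, start)), E.get((final, final))
    if start == final:                               # case 3.1
        return star(ss)
    return seq(star(alt(ss, seq(sf, star(ff), fs))), sf, star(ff))   # case 3.2

def accepts(transitions, start, final, x):
    """Direct NFA simulation, used only to cross-check the result."""
    current = {start}
    for a in x:
        current = {v for (u, b, v) in transitions if u in current and b == a}
    return final in current

# Example (9.3): p is both the start and the final state (assumed reading of the table).
T = {("p", "0", "p"), ("p", "1", "q"), ("q", "0", "r"), ("r", "0", "p"), ("r", "1", "q")}
regex = eliminate(["p", "q", "r"], T, "p", "p")
words = ["".join(w) for k in range(7) for w in product("01", repeat=k)]
print(regex)
print(all((re.fullmatch(regex, w) is not None) == accepts(T, "p", "p", w) for w in words))
```

A return value of `None` would mean that no string is accepted, which happens when the start and final states differ and no edge from start to final survives.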
Transparency No. 4-15
Example
          0         1
  >p     {p,r}     {q,r}
   q     {r}       {p,q,r}
   rF    {p,q}     {q,r}
1. Another representation: [figure: state diagram of the same machine, with edges labeled 0, 1 and 0,1]
Transparency No. 4-16
Merge edges
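[Figure: in the state diagram, every edge labeled 0,1 is relabeled 0+1.]
Edges after merging, as read from the transition table:
   p → p: 0      p → q: 1      p → r: 0+1
   q → p: 1      q → q: 1      q → r: 0+1
   r → p: 0      r → q: 0+1    r → r: 1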
Transparency No. 4-17
remove q
[Figure: state q and its edges (p → q: 1, r → q: 0+1, loop on q: 1, q → p: 1, q → r: 0+1) are replaced by new edges.]
Edges after removing q:
   p → p: 0, 11*1
   p → r: 0+1, 11*(0+1)
   r → p: 0, (0+1)1*1
   r → r: 1, (0+1)1*(0+1)
Transparency No. 4-18
Form the final result
[Figure: the two remaining states, >p and rF, with merged edges.]
   p → p: 0 + 11*1
   p → r: 0+1 + 11*(0+1)
   r → p: 0 + (0+1)1*1
   r → r: 1 + (0+1)1*(0+1)
Final result: [ (pp) + (pr) (rr)* (rp) ]* (pr) (rr)*
= [ (0+11*1) + (0+1+11*(0+1)) (1+(0+1)1*(0+1))* (0+(0+1)1*1) ]* (0+1+11*(0+1)) (1+(0+1)1*(0+1))*
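A hand-derived expression of this size is easy to get wrong, so a mechanical cross-check is useful. The sketch below translates the final expression into Python `re` syntax and compares it with a direct simulation of the example NFA on all binary strings up to an assumed length bound of 7. The helper names and the dictionary form of the transition table are assumptions for illustration; an empty list of mismatches indicates agreement on the strings tested, which is not a proof of equivalence.

```python
import re
from itertools import product

# The hand-derived expression, written in the slides' notation.
SLIDE = ("[ (0+11*1) + (0+1+11*(0+1)) (1+(0+1)1*(0+1))* (0+(0+1)1*1) ]* "
         "(0+1+11*(0+1)) (1+(0+1)1*(0+1))*")

def to_python_re(expr):
    """Translate +, [ ] and spaces into Python `re` syntax (| and parentheses)."""
    return (expr.replace(" ", "").replace("+", "|")
                .replace("[", "(").replace("]", ")"))

# Transition table of the example NFA: start state p, final state r.
DELTA = {
    ("p", "0"): {"p", "r"}, ("p", "1"): {"q", "r"},
    ("q", "0"): {"r"},      ("q", "1"): {"p", "q", "r"},
    ("r", "0"): {"p", "q"}, ("r", "1"): {"q", "r"},
}

def accepts(x, start="p", final="r"):
    """Simulate the NFA by tracking the set of reachable states."""
    current = {start}
    for a in x:
        current = set().union(*(DELTA[s, a] for s in current))
    return final in current

PATTERN = re.compile(to_python_re(SLIDE))
words = ["".join(w) for k in range(8) for w in product("01", repeat=k)]
mismatches = [w for w in words if (PATTERN.fullmatch(w) is not None) != accepts(w)]
print(mismatches)
```

The translation works because the slides' precedence (star binds tightest, then concatenation, then +) coincides with that of `*`, juxtaposition and `|` in `re`.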
Transparency No. 4-19