Lecture Slides Regular Expressions
Designing Regular Expressions
● Let Σ = {a, b}.
● Let L = { w ∈ Σ* | w contains aa as a substring }.
(a ∪ b)*aa(a ∪ b)*
Σ*aaΣ* (equivalent, writing Σ for a ∪ b)
Strings in L: bbabbbaabab, aaaa, bbbbbabbbbaabbbbb
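As a quick sanity check, here is a sketch using Python's `re` module (an assumption: the slides use mathematical notation, not any particular regex library). The character class `[ab]` plays the role of Σ = a ∪ b, and `fullmatch` requires the whole string to match, which is how a regular expression defines a language. The helper name `in_language` is hypothetical.

```python
import itertools
import re

# Σ*aaΣ* over Σ = {a, b}; the character class [ab] stands for a ∪ b.
CONTAINS_AA = re.compile(r"[ab]*aa[ab]*")


def in_language(w: str) -> bool:
    """True if the regex matches all of w, i.e. w is in L."""
    return CONTAINS_AA.fullmatch(w) is not None


for w in ["bbabbbaabab", "aaaa", "bbbbbabbbbaabbbbb", "ababab"]:
    print(w, in_language(w))

# Cross-check against a direct substring test on every string up to length 6.
words = ["".join(t) for n in range(7) for t in itertools.product("ab", repeat=n)]
assert all(in_language(w) == ("aa" in w) for w in words)
```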
Designing Regular Expressions
● Let Σ = {a, b}.
● Let L = { w ∈ Σ* | |w| = 4 }.
The length of a string w is denoted |w|.
ΣΣΣΣ
Strings in L: aaaa, baba, bbbb, baaa
Σ⁴
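In Python's `re` module (used here purely as an illustration), Σ becomes the character class `[ab]` and the n-fold repetition Rⁿ is written `R{n}`. This sketch brute-forces all strings over {a, b} up to length 6 and keeps those matched by `[ab]{4}`; exactly the 2⁴ = 16 strings of length 4 survive.

```python
import itertools
import re

# Σ⁴ = ΣΣΣΣ over Σ = {a, b}; {4} is Python's "exactly four copies" quantifier.
LENGTH_FOUR = re.compile(r"[ab]{4}")


def all_strings(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for t in itertools.product("ab", repeat=n):
            yield "".join(t)


matches = [w for w in all_strings(6) if LENGTH_FOUR.fullmatch(w)]
print(len(matches))  # 16
```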
Designing Regular Expressions
● Let Σ = {a, b}.
● Let L = { w ∈ Σ* | w contains at most one a }.
Here are some candidate regular expressions for the language L. Which of these are correct?
Σ*aΣ*
b*ab* ∪ b*
b*(a ∪ ε)b*
b*a*b* ∪ b*
b*(a* ∪ ε)b*
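One way to test the candidates, sketched in Python under the assumption that each one is translated by hand into `re` syntax (∪ becomes `|`, ε becomes an empty alternative, Σ becomes `[ab]`): compare each candidate with the definition "at most one a" on every string up to length 6. Passing a finite check like this is evidence rather than proof, but each wrong candidate already fails on a very short string (ε or aa). The names `CANDIDATES`, `at_most_one_a`, and `correct_candidates` are hypothetical.

```python
import itertools
import re

# Hand translations into Python's re syntax: ∪ -> |, ε -> empty alternative,
# Σ -> [ab]. (?:...) is a non-capturing group.
CANDIDATES = {
    "Σ*aΣ*": r"[ab]*a[ab]*",
    "b*ab* ∪ b*": r"b*ab*|b*",
    "b*(a ∪ ε)b*": r"b*(?:a|)b*",
    "b*a*b* ∪ b*": r"b*a*b*|b*",
    "b*(a* ∪ ε)b*": r"b*(?:a*|)b*",
}


def at_most_one_a(w: str) -> bool:
    """The definition of L, checked directly."""
    return w.count("a") <= 1


def correct_candidates(max_len: int = 6) -> list:
    """Names of candidates that agree with L on every string up to max_len."""
    words = ["".join(t) for n in range(max_len + 1)
             for t in itertools.product("ab", repeat=n)]
    return [name for name, rx in CANDIDATES.items()
            if all((re.fullmatch(rx, w) is not None) == at_most_one_a(w)
                   for w in words)]


print(correct_candidates())  # ['b*ab* ∪ b*', 'b*(a ∪ ε)b*']
```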
A correct choice: b*(a ∪ ε)b*
Strings in L: bbbbabbb, bbbbbb, abbb, a
b*a?b* (the same expression, writing R? for R ∪ ε)
A More Elaborate Design
● Let Σ = { a, ., @ }, where a represents “some letter.”
● Let's make a regex for email addresses.
aa*
aa*(.aa*)*
aa*(.aa*)*@
aa*(.aa*)*@aa*.aa*
aa*(.aa*)*@aa*.aa*(.aa*)*
a⁺(.aa*)*@aa*.aa*(.aa*)*
a⁺(.a⁺)*@a⁺(.a⁺)⁺ (each aa* rewritten as a⁺, and a⁺.a⁺(.a⁺)* written as a⁺(.a⁺)⁺)
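A rough way to try the final expression on concrete text, assuming Python's `re` module: the hypothetical helper `abstract` replaces every letter with the symbol a, mirroring the slide's convention that a stands for "some letter," and then the slide's regex is matched verbatim. `+` and `*` mean the same thing in Python, but `.` must be escaped as `\.`, since an unescaped dot matches any character. Real email syntax is far more permissive (digits, hyphens, and so on are rejected here), so this only models the slide's simplified language. The sample addresses are made up.

```python
import re

# The slide's regex over the abstract alphabet {a, ., @}; "." is escaped
# because in Python an unescaped dot means "any character".
EMAIL = re.compile(r"a+(\.a+)*@a+(\.a+)+")


def abstract(s: str) -> str:
    """Replace each letter with the symbol a, keeping '.' and '@' as-is.

    Assumption: anything str.isalpha() accepts counts as a letter.
    """
    return "".join("a" if ch.isalpha() else ch for ch in s)


def looks_like_email(s: str) -> bool:
    return EMAIL.fullmatch(abstract(s)) is not None


print(looks_like_email("first.last@mail.example.org"))  # True
print(looks_like_email("alice@localhost"))              # False: no dot after @
```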
For Comparison
[Figure: a finite automaton for the same email-address language, with start state q0, states q1 through q8, and transitions on a, ., and @.]
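The automaton in the figure could not be recovered from this copy, so the table below is our own hand-built DFA for a⁺(.a⁺)*@a⁺(.a⁺)⁺, not necessarily the one on the slide. It illustrates the comparison being made: even a small DFA needs an explicit transition table with a dead state, while the regex fits on one line. The sketch checks that the two agree on every string over {a, ., @} of length at most 7; the names `DELTA`, `dfa_accepts`, and `agree_up_to` are hypothetical.

```python
import itertools
import re

# Hand-built DFA over {a, ., @}; any missing entry means the implicit dead state.
DELTA = {
    (0, "a"): 1,                              # first letter of the local part
    (1, "a"): 1, (1, "."): 2, (1, "@"): 3,
    (2, "a"): 1,                              # a letter must follow each dot
    (3, "a"): 4,                              # first letter of the domain
    (4, "a"): 4, (4, "."): 5,
    (5, "a"): 6,
    (6, "a"): 6, (6, "."): 5,                 # 6: domain has at least one dot
}
ACCEPTING = {6}


def dfa_accepts(w: str) -> bool:
    state = 0
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:  # fell into the dead state
            return False
    return state in ACCEPTING


EMAIL = re.compile(r"a+(\.a+)*@a+(\.a+)+")


def agree_up_to(max_len: int) -> bool:
    """Compare the DFA with the regex on every string up to max_len."""
    for n in range(max_len + 1):
        for t in itertools.product("a.@", repeat=n):
            w = "".join(t)
            if dfa_accepts(w) != (EMAIL.fullmatch(w) is not None):
                return False
    return True


print(agree_up_to(7))  # True
```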
Shorthand Summary
● Rⁿ is shorthand for RR … R (n times).
● Edge case: define R⁰ = ε.
● Σ is shorthand for “any character in Σ.”
● R? is shorthand for (R ∪ ε), meaning “zero or one copies of R.”
● R⁺ is shorthand for RR*, meaning “one or more copies of R.”
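Python's `re` module happens to use the same `?` and `+` suffixes and writes Rⁿ as `R{n}`. The sketch below takes R = ab as an arbitrary sample (an assumption, not from the slides) and brute-forces each shorthand against its expansion on all strings over {a, b} up to length 8. That is evidence for the identities on this one R, not a proof; `PAIRS` and `equivalent` are hypothetical names.

```python
import itertools
import re

R = "(?:ab)"  # an arbitrary sample R; any regex could be substituted

# Each shorthand next to its expansion, both in Python syntax.
PAIRS = [
    (R + "{3}", R + R + R),      # R³ = RRR
    (R + "{0}", ""),             # R⁰ = ε
    (R + "?", f"(?:{R}|)"),      # R? = R ∪ ε
    (R + "+", R + R + "*"),      # R⁺ = RR*
]


def words(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for t in itertools.product("ab", repeat=n):
            yield "".join(t)


def equivalent(p, q, max_len=8):
    """Brute-force check that p and q match the same strings up to max_len."""
    return all((re.fullmatch(p, w) is None) == (re.fullmatch(q, w) is None)
               for w in words(max_len))


print(all(equivalent(p, q) for p, q in PAIRS))  # True
```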
The Lay of the Land
[Figure: diagram relating the languages you can build a DFA for, the languages you can build an NFA for, and the regular languages.]
Generalizing NFAs
[Figure: an NFA with start state q₀, states q₁ through q₄, and transitions labeled a, b, Σ, and ε.]
These transition labels (a, b, Σ, and ε) are all regular expressions!
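[Figure: a generalized NFA with start state q₀ and states q₁, q₂, and q₃, whose transitions are labeled with regular expressions such as ab ∪ b, a, ab*, and a*b?a*.]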
Note: Actual NFAs aren't allowed to have transitions like these. This is just a thought experiment.
Sample input: aaabaabbb
Key Idea 1: Imagine that we can label
transitions in an NFA with arbitrary regular
expressions.
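To make Key Idea 1 concrete, here is a sketch (our own, assuming labels are written in Python `re` syntax) of what it would mean for such a generalized NFA to accept a string: the input must split into consecutive pieces, each of which fully matches the label on the edge taken. The search explores configurations (state, characters consumed), so ε-labeled cycles cannot loop forever. The automaton in the example and the name `gnfa_accepts` are hypothetical, not the one in the figure.

```python
import re


def gnfa_accepts(edges, start, accepting, w):
    """Decide whether a generalized NFA accepts w.

    edges is a list of (source, regex, target) triples in Python re syntax.
    w is accepted if it splits into pieces x1 x2 ... xk that follow a path
    from start to an accepting state, each piece fully matching its edge label.
    """
    n = len(w)
    seen = {(start, 0)}  # configurations: (state, characters consumed)
    frontier = [(start, 0)]
    while frontier:
        state, i = frontier.pop()
        for src, rx, dst in edges:
            if src != state:
                continue
            for j in range(i, n + 1):  # j == i lets an edge consume ε
                if re.fullmatch(rx, w[i:j]) and (dst, j) not in seen:
                    seen.add((dst, j))
                    frontier.append((dst, j))
    return any((q, n) in seen for q in accepting)


# Hypothetical GNFA: q0 --(ab ∪ b)--> q1 --(a*b?a*)--> q2, with q2 accepting.
EDGES = [("q0", "ab|b", "q1"), ("q1", "a*b?a*", "q2")]
print(gnfa_accepts(EDGES, "q0", {"q2"}, "baaba"))  # True
print(gnfa_accepts(EDGES, "q0", {"q2"}, "bbb"))    # False
```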
Generalizing NFAs
[Figure: a generalized NFA in which the start state q₀ has a single transition, labeled ab ∪ b, to q₁.]
Is there a simple regular expression for the language of this generalized NFA?
[Figure: a generalized NFA in which the start state q₀ has a single transition, labeled a⁺(.a⁺)*@a⁺(.a⁺)⁺, to q₁.]
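Key Idea 2: If we can convert an NFA into a generalized NFA that looks like this...
[Figure: a start state q₀ with a single transition, labeled some-regex, to q₁.]
...then that one label is a regular expression for the original NFA's language.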
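From NFAs to Regular Expressions
[Figure: a two-state generalized NFA. The start state q₁ has a self-loop labeled R₁₁ and a transition labeled R₁₂ to q₂; the accepting state q₂ has a self-loop labeled R₂₂ and a transition labeled R₂₁ back to q₁.]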
Here, R₁₁, R₁₂, R₂₁, and R₂₂ are arbitrary regular expressions.
Question: Can we get a clean regular expression from this NFA?
Key Idea 3: Somehow transform this NFA so that it looks like this:
[Figure: a start state q₀ with a single transition, labeled some-regex, to q₁.]
The first step is going to be a bit weird...
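[Figure: the same two-state NFA with a new start state qs and a new accepting state qf; an ε-transition runs from qs to q₁ and another from q₂ to qf.]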
Could we eliminate this state (q₁) from the NFA?
[Figure: a new transition from qs to q₂, labeled ε R₁₁* R₁₂, shortcuts through q₁.]
Note: We're using concatenation and Kleene closure in order to skip this state.
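[Figure: with q₁ removed, only qs, q₂, and qf remain. The path q₂ → q₁ → q₂ becomes a second self-loop on q₂, labeled R₂₁ R₁₁* R₁₂, alongside the existing self-loop R₂₂; q₂ → qf is still labeled ε.]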
Note: We're using union to combine these transitions together.
From NFAs to Regular Expressions
[Figure: qs → q₂ labeled R₁₁* R₁₂, a single self-loop on q₂ labeled R₂₂ ∪ R₂₁R₁₁*R₁₂, and q₂ → qf labeled ε.]
From NFAs to Regular Expressions
[Figure: with q₂ eliminated as well, only qs and qf remain, joined by a single transition labeled:]
R₁₁* R₁₂ (R₂₂ ∪ R₂₁R₁₁*R₁₂)*
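One way to sanity-check the derived expression is to instantiate the four labels with single letters, which turns the two-state automaton into an ordinary DFA. The choice R₁₁ = a, R₁₂ = b, R₂₁ = b, R₂₂ = a below is an arbitrary assumption; with it, the automaton accepts exactly the strings with an odd number of b's. The sketch builds the formula in Python `re` syntax (the helper `two_state_regex` is hypothetical) and compares it with a direct simulation on every string over {a, b} up to length 8.

```python
import itertools
import re


def two_state_regex(r11, r12, r21, r22):
    """R11* R12 (R22 ∪ R21 R11* R12)*, written in Python regex syntax."""
    return f"(?:{r11})*(?:{r12})(?:(?:{r22})|(?:{r21})(?:{r11})*(?:{r12}))*"


# Hypothetical instance: R11 = a, R12 = b, R21 = b, R22 = a.
DELTA = {("q1", "a"): "q1", ("q1", "b"): "q2",
         ("q2", "b"): "q1", ("q2", "a"): "q2"}


def run(w):
    """Run the two-state automaton (start q1, accepting q2) on w."""
    state = "q1"
    for ch in w:
        state = DELTA[(state, ch)]
    return state == "q2"


rx = two_state_regex("a", "b", "b", "a")
words = ["".join(t) for n in range(9) for t in itertools.product("ab", repeat=n)]
print(all((re.fullmatch(rx, w) is not None) == run(w) for w in words))  # True
```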
[Figure: the original two-state NFA with states q₁ and q₂, shown again for comparison.]
The State-Elimination Algorithm
● Start with an NFA N for the language L.
● Add a new start state qs and a new accepting state qf to the NFA.
● Add an ε-transition from qs to the old start state of N.
● Add ε-transitions from each accepting state of N to qf, then mark those states as not accepting.
● Repeatedly remove states other than qs and qf from the NFA by “shortcutting” them, until only two states remain: qs and qf.
● The label on the transition from qs to qf is then a regular expression for the language of N.
The State-Elimination Algorithm
● To eliminate a state q from the automaton, do the following for each pair of states q₀ and q₁ other than q (q₀ and q₁ may be the same state) where there's a transition from q₀ into q and a transition from q into q₁:
  ● Let Rin be the regex on the transition from q₀ to q.
  ● Let Rout be the regex on the transition from q to q₁.
  ● If there is a regular expression Rstay on a transition from q to itself, add a new transition from q₀ to q₁ labeled ((Rin)(Rstay)*(Rout)).
  ● If there isn't, add a new transition from q₀ to q₁ labeled ((Rin)(Rout)).
● Then delete q along with all of its transitions.
● If a pair of states has multiple transitions between them labeled R₁, R₂, …, Rₖ, replace them with a single transition labeled R₁ ∪ R₂ ∪ … ∪ Rₖ.
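The procedure above is short enough to sketch directly. The Python below is a minimal illustration, not a reference implementation: it assumes transition labels are Python `re` strings, uses the empty string for ε, wraps every piece in a non-capturing group `(?:…)` so operator precedence never matters, and eliminates states in the order given. All function names are hypothetical, and the example NFA (for "contains aa as a substring") is our own; the final check compares the resulting regex with `"aa" in w` on all strings up to length 7.

```python
import itertools
import re


def union(r1, r2):
    """R1 ∪ R2 as a non-capturing Python alternation."""
    return f"(?:{r1}|{r2})"


def concat(*parts):
    """Concatenate regexes; "" stands for ε and is simply dropped."""
    return "".join(f"(?:{p})" for p in parts if p != "")


def star(r):
    """Kleene closure; ε* is just ε."""
    return "" if r == "" else f"(?:{r})*"


def add_edge(g, p, q, r):
    """Add p --r--> q, merging parallel transitions with union."""
    g[(p, q)] = union(g[(p, q)], r) if (p, q) in g else r


def state_elimination(states, edges, start, accepting):
    """Return a Python regex for the automaton's language (None if empty).

    edges maps (p, q) to a regex string, with "" meaning ε. The new state
    names "qs" and "qf" are assumed not to clash with existing states.
    """
    g = {}
    for (p, q), r in edges.items():
        add_edge(g, p, q, r)
    add_edge(g, "qs", start, "")        # ε from qs to the old start state
    for f in accepting:
        add_edge(g, f, "qf", "")        # ε from each old accepting state to qf
    for q in states:                    # shortcut every original state
        stay = star(g.pop((q, q), ""))  # (Rstay)*, or ε if q has no self-loop
        ins = [(p, r) for (p, t), r in g.items() if t == q]
        outs = [(t, r) for (p, t), r in g.items() if p == q]
        g = {k: r for k, r in g.items() if q not in k}  # delete q's edges
        for (p, r_in), (t, r_out) in itertools.product(ins, outs):
            add_edge(g, p, t, concat(r_in, stay, r_out))  # p may equal t
    return g.get(("qs", "qf"))


# Hypothetical example: an NFA for "contains aa as a substring".
rx = state_elimination(
    ["p0", "p1", "p2"],
    {("p0", "p0"): "a|b", ("p0", "p1"): "a",
     ("p1", "p2"): "a", ("p2", "p2"): "a|b"},
    "p0", {"p2"},
)
words = ["".join(t) for n in range(8) for t in itertools.product("ab", repeat=n)]
print(all((re.fullmatch(rx, w) is not None) == ("aa" in w) for w in words))  # True
```

The expressions this produces are correct but far from minimal, and the order in which states are eliminated affects how long they get.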
Our Transformations