Lecture Slides Regular Expressions

This document discusses regular expressions and regular languages. It introduces atomic and compound regular expressions, how to construct regular languages using closure properties, and how to design regular expressions to represent specific languages.

Uploaded by

ykupeli
Copyright
© © All Rights Reserved

Regular Expressions

Recap from Last Time


Regular Languages

A language L is a regular language iff
there is a DFA D such that ℒ(D) = L.

Theorem: The following are equivalent:

L is a regular language.

There is a DFA for L.

There is an NFA for L.
Language Concatenation

If w ∈ Σ* and x ∈ Σ*, then wx is the
concatenation of w and x.

If L₁ and L₂ are languages over Σ, the
concatenation of L₁ and L₂ is the language L₁L₂
defined as
L₁L₂ = { wx | w ∈ L₁ and x ∈ L₂ }

Example: if L₁ = { a, ba, bb } and L₂ = { aa, bb },
then
L₁L₂ = { aaa, abb, baaa, babb, bbaa, bbbb }
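The definition above can be checked mechanically. A minimal sketch, assuming languages are modeled as Python sets of strings (the helper name `concat` is made up for illustration):

```python
def concat(l1, l2):
    """Return the concatenation L1L2 = { wx | w in L1 and x in L2 }."""
    return {w + x for w in l1 for x in l2}


L1 = {"a", "ba", "bb"}
L2 = {"aa", "bb"}

# Every string of L1 followed by every string of L2.
print(sorted(concat(L1, L2)))
# ['aaa', 'abb', 'baaa', 'babb', 'bbaa', 'bbbb']
```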
Lots and Lots of Concatenation

Consider the language L = { aa, b }

L⁰ = {ε}

LL = L² is the set of strings formed by concatenating pairs of
strings in L.
{ aaaa, aab, baa, bb }

LLL = L³ is the set of strings formed by concatenating triples of
strings in L.
{ aaaaaa, aaaab, aabaa, aabb, baaaa, baab, bbaa, bbb }

LLLL = L⁴ is the set of strings formed by concatenating
quadruples of strings in L.
{ aaaaaaaa, aaaaaab, aaaabaa, aaaabb, aabaaaa,
aabaab, aabbaa, aabbb, baaaaaa, baaaab, baabaa,
baabb, bbaaaa, bbaab, bbbaa, bbbb }
The Kleene Closure

An important operation on languages is
the Kleene Closure, which is defined as
L* = { w ∈ Σ* | ∃n ∈ ℕ. w ∈ Lⁿ }
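Powers of a language and a finite slice of the Kleene closure can be computed directly. A minimal sketch, assuming languages are Python sets of strings; the names `power` and `kleene_up_to` are illustrative, and since L* is usually infinite only the union of the powers up to a chosen bound is built:

```python
def concat(l1, l2):
    """L1L2 = { wx | w in L1 and x in L2 }."""
    return {w + x for w in l1 for x in l2}


def power(lang, n):
    """L^n, built from L^0 = {ε} by concatenating with L n times."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


def kleene_up_to(lang, max_n):
    """Finite approximation of L*: the union of L^0, L^1, ..., L^max_n."""
    out = set()
    for n in range(max_n + 1):
        out |= power(lang, n)
    return out


L = {"aa", "b"}
print(sorted(power(L, 2)))   # ['aaaa', 'aab', 'baa', 'bb']
print(len(power(L, 4)))      # 16
```

A string belongs to L* exactly when it shows up in `power(L, n)` for some n, so `kleene_up_to` can confirm membership but can never rule it out on its own.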
Closure Properties

Theorem: If L₁ and L₂ are regular
languages over an alphabet Σ, then so are
the following languages:

L̄₁ (the complement of L₁)

L₁ ∪ L₂

L₁ ∩ L₂

L₁L₂

L₁*

These properties are called closure
properties of the regular languages.
New Stuff!
Another View of Regular Languages
Rethinking Regular Languages

We currently have several tools for
showing a language L is regular:

Construct a DFA for L.

Construct an NFA for L.

Combine several simpler regular languages
together via closure properties to form L.

Today we expand on this last idea.
Constructing Regular Languages

Idea: Build up all regular languages as
follows:

Start with a small set of simple languages we
already know to be regular.

Using closure properties, combine these
simple languages together to form more
elaborate languages.

This is a bottom-up approach to the
regular languages.
Regular Expressions

Regular expressions are a way of describing a language
via a string representation.

They’re used just about everywhere:

They’re built into the JavaScript language and used for data
validation.
● They’re used in the UNIX grep and flex tools to search files
and build compilers.

They’re employed to clean and scrape data for large-scale
analysis projects.

Conceptually, regular expressions are strings describing
how to assemble a larger language out of smaller pieces.
Atomic Regular Expressions

The regular expressions begin with three
simple building blocks.

The symbol Ø is a regular expression that
represents the empty language Ø.

For any a ∈ Σ, the symbol a is a regular
expression for the language {a}.

The symbol ε is a regular expression that
represents the language {ε}.

Remember: {ε} ≠ Ø!

Remember: {ε} ≠ ε!
Compound Regular Expressions
If R₁ and R₂ are regular expressions, R₁R₂ is a
regular expression for the concatenation of
the languages of R₁ and R₂.

If R₁ and R₂ are regular expressions, R₁ ∪ R₂ is
a regular expression for the union of the
languages of R₁ and R₂.

If R is a regular expression, R* is a regular
expression for the Kleene closure of the
language of R.

If R is a regular expression, (R) is a regular
expression with the same meaning as R.
Operator Precedence

Here’s the operator precedence for
regular expressions:
(R)
R*
R₁R₂
R₁ ∪ R₂

So ab*c∪d is parsed as ((a(b*))c)∪d
Regular Expression Examples

The regular expression trick∪treat represents the
language
{ trick, treat }.

The regular expression booo* represents the
regular language
{ boo, booo, boooo, … }.

The regular expression candy!(candy!)*
represents the regular language
{ candy!, candy!candy!, candy!candy!candy!,
… }.
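These three examples can be tried out in a programming-language regex engine. A minimal sketch using Python's `re` module, where union is written `|` instead of ∪ and `re.fullmatch` asks whether the whole string is in the language (the helper name `matches` is made up for illustration):

```python
import re


def matches(pattern, s):
    """True if the entire string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None


# trick ∪ treat
print(matches(r"trick|treat", "treat"))               # True
# booo*: "bo" is too short, since two o's are required
print(matches(r"booo*", "bo"))                        # False
print(matches(r"booo*", "boooo"))                     # True
# candy!(candy!)*: one or more copies of "candy!"
print(matches(r"candy!(candy!)*", "candy!candy!"))    # True
```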
Regular Expressions, Formally

The language of a regular expression is the
language described by that regular expression.

Formally:

ℒ(ε) = {ε}

ℒ(Ø) = Ø

ℒ(a) = {a}

ℒ(R₁R₂) = ℒ(R₁) ℒ(R₂)

ℒ(R₁ ∪ R₂) = ℒ(R₁) ∪ ℒ(R₂)

ℒ(R*) = ℒ(R)*

ℒ((R)) = ℒ(R)

Worthwhile activity: Apply this recursive definition to a(b∪c)((d)) and see what you get.
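The recursive definition translates almost line for line into code. A minimal sketch, assuming regular expressions are encoded as nested tuples such as `("cat", r1, r2)` (an encoding invented here, not part of the slides). Because ℒ(R*) is usually infinite, the function only returns the strings of length at most `max_len`:

```python
def language(r, max_len):
    """Strings of L(r) with length <= max_len, one case per rule."""
    kind = r[0]
    if kind == "eps":                      # L(ε) = {ε}
        return {""}
    if kind == "empty":                    # L(Ø) = Ø
        return set()
    if kind == "sym":                      # L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "cat":                      # L(R1R2) = L(R1) L(R2)
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {w + x for w in left for x in right if len(w + x) <= max_len}
    if kind == "union":                    # L(R1 ∪ R2) = L(R1) ∪ L(R2)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "star":                     # L(R*) = L(R)*
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                    # keep appending until nothing new fits
            frontier = {w + x for w in frontier for x in base
                        if len(w + x) <= max_len} - result
            result |= frontier
        return result
    if kind == "paren":                    # L((R)) = L(R)
        return language(r[1], max_len)
    raise ValueError(f"unknown node: {kind}")


a, b, c, d = (("sym", ch) for ch in "abcd")
# a(b∪c)((d))
expr = ("cat", ("cat", a, ("paren", ("union", b, c))), ("paren", ("paren", d)))
print(sorted(language(expr, 3)))           # ['abd', 'acd']
```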
Designing Regular Expressions

Let Σ = {a, b}.

Let L = { w ∈ Σ* | w contains aa as a
substring }.
(a ∪ b)*aa(a ∪ b)*

bbabbbaabab
aaaa
bbbbbabbbbaabbbbb

Using Σ as shorthand for (a ∪ b), the same regular expression can be written as
Σ*aaΣ*
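One way to gain confidence in a design like this is to compare it against the plain-English definition on every short string. A minimal sketch using Python's `re` module, with (a ∪ b) written as `(a|b)`; the helper names are made up for illustration, and agreement up to a length bound is evidence rather than a proof:

```python
import re
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


PATTERN = re.compile(r"(a|b)*aa(a|b)*")


def agrees_up_to(max_len):
    """True if the regex and 'contains aa' agree on all short strings."""
    return all((PATTERN.fullmatch(w) is not None) == ("aa" in w)
               for w in all_strings("ab", max_len))


print(agrees_up_to(8))
```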
Designing Regular Expressions

Let Σ = {a, b}.

Let L = { w ∈ Σ* | |w| = 4 }.

The length of a string w is denoted |w|.

ΣΣΣΣ

aaaa
baba
bbbb
baaa

Using exponent shorthand, the same regular expression can be written as
Σ⁴
Designing Regular Expressions

Let Σ = {a, b}.

Let L = { w ∈ Σ* | w contains at most one a }.

Here are some candidate regular expressions for
the language L. Which of these are correct?
Σ*aΣ*
b*ab* ∪ b*
b*(a ∪ ε)b*
b*a*b* ∪ b*
b*(a* ∪ ε)b*

b*(a ∪ ε)b*

bbbbabbb
bbbbbb
abbb
a

Using the ? shorthand, the same regular expression can be written as
b*a?b*
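The five candidates from the earlier slide can be tested the same way: translate each into Python `re` syntax and look for the first short string on which it disagrees with "contains at most one a". A minimal sketch; the translations (`[ab]` for Σ, `|` for ∪, an empty alternative for ε) and the helper names are assumptions made for illustration:

```python
import re
from itertools import product

CANDIDATES = {
    "Σ*aΣ*": r"[ab]*a[ab]*",
    "b*ab* ∪ b*": r"b*ab*|b*",
    "b*(a ∪ ε)b*": r"b*(a|)b*",
    "b*a*b* ∪ b*": r"b*a*b*|b*",
    "b*(a* ∪ ε)b*": r"b*(a*|)b*",
}


def in_language(w):
    """The target language: strings with at most one a."""
    return w.count("a") <= 1


def first_disagreement(pattern, max_len=6):
    """Shortest string where the regex and the definition differ, else None."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            if (re.fullmatch(pattern, w) is not None) != in_language(w):
                return w
    return None


for name, pattern in CANDIDATES.items():
    print(name, "->", repr(first_disagreement(pattern)))
```

A result of `None` means no counterexample was found up to the length bound. The first candidate fails on the empty string, which has no a at all, and the two candidates containing a* fail on aa.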
A More Elaborate Design

Let Σ = { a, ., @ }, where a represents
“some letter.”

Let's make a regex for email addresses, building it up piece by piece:

aa*
aa*(.aa*)*
aa*(.aa*)*@
aa*(.aa*)*@aa*.aa*
aa*(.aa*)*@aa*.aa*(.aa*)*

Rewriting with the ⁺ shorthand and simplifying:

a⁺(.a⁺)*@a⁺.a⁺(.a⁺)*
a⁺(.a⁺)*@a⁺(.a⁺)⁺

[email protected]
[email protected]
[email protected]
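The finished design carries over to a real regex engine with one change: in Python's `re` syntax an unescaped `.` means "any character", so the literal dot has to be written `\.`. A minimal sketch over the slide's three-symbol alphabet; the test strings are made up for illustration, and this is the toy language from the slides, not a validator for real email addresses:

```python
import re

# a+(.a+)*@a+(.a+)+ with the literal dots escaped
EMAIL = re.compile(r"a+(\.a+)*@a+(\.a+)+")


def looks_like_email(s):
    """True if s is in the toy email language over { a, ., @ }."""
    return EMAIL.fullmatch(s) is not None


print(looks_like_email("aa.a@aaa.aa"))   # True
print(looks_like_email("a@a"))           # False: the domain needs a dot
print(looks_like_email(".a@a.a"))        # False: cannot start with a dot
```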
For Comparison

a⁺(.a⁺)*@a⁺(.a⁺)⁺

[Figure: a DFA for the same language, with states q₀ through q₈ and transitions on a, ., and @.]
Shorthand Summary

Rⁿ is shorthand for RR … R (n times).

Edge case: define R⁰ = ε.

Σ is shorthand for “any character in Σ.”

R? is shorthand for (R ∪ ε), meaning
“zero or one copies of R.”

R⁺ is shorthand for RR*, meaning “one or
more copies of R.”
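Python's `re` module happens to offer the same shorthands (`?`, `+`, `{n}`, and a character class like `[ab]` in place of Σ), so each one can be compared against its expansion. A minimal sketch; `same_language` is a made-up helper that only checks strings up to a length bound, so it gives evidence rather than a proof:

```python
import re
from itertools import product


def same_language(p1, p2, alphabet="ab", max_len=6):
    """True if p1 and p2 accept exactly the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(p1, w)) != bool(re.fullmatch(p2, w)):
                return False
    return True


print(same_language(r"(ab)?", r"(ab|)"))        # R? versus (R ∪ ε)
print(same_language(r"(ab)+", r"(ab)(ab)*"))    # R⁺ versus RR*
print(same_language(r"(ab){3}", r"ababab"))     # R³ versus RRR
print(same_language(r"(ab){0}", r""))           # R⁰ versus ε
print(same_language(r"[ab]", r"a|b"))           # Σ versus a ∪ b
```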
The Lay of the Land
[Figure: the languages you can build a DFA for, the languages you can build an NFA for, and the regular languages are the same set. Shown alongside: the languages you can write a regex for.]
The Power of Regular Expressions
Theorem: If R is a regular expression,
then ℒ(R) is regular.
Proof idea: Use induction!

The atomic regular expressions all represent
regular languages.

The combination steps represent closure
properties.

So anything you can make from them must
be regular!
Thompson’s Algorithm

In practice, many regex matchers use an
algorithm called Thompson's algorithm
to convert regular expressions into NFAs
(and, from there, to DFAs).

Read Sipser if you’re curious!

Fun fact: the “Thompson” here is Ken
Thompson, one of the co-inventors of
Unix!
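To give a feel for the regex-to-NFA direction, here is a minimal sketch in the style of Thompson's construction. It assumes regular expressions are encoded as nested tuples (an encoding invented here), and it simulates the resulting ε-NFA directly rather than converting it to a DFA. Details such as how many ε-edges are added differ between presentations, so treat this as an illustration rather than the version in Sipser:

```python
from itertools import count


def build(r, edges, fresh):
    """Add an NFA fragment for r to edges; return its (start, accept) states.
    Edges are (source, label, target) triples; label None means ε."""
    s, t = next(fresh), next(fresh)
    kind = r[0]
    if kind == "eps":
        edges.append((s, None, t))
    elif kind == "sym":
        edges.append((s, r[1], t))
    elif kind == "cat":                    # run r1, then r2
        s1, t1 = build(r[1], edges, fresh)
        s2, t2 = build(r[2], edges, fresh)
        edges += [(s, None, s1), (t1, None, s2), (t2, None, t)]
    elif kind == "union":                  # branch into r1 or r2
        s1, t1 = build(r[1], edges, fresh)
        s2, t2 = build(r[2], edges, fresh)
        edges += [(s, None, s1), (s, None, s2), (t1, None, t), (t2, None, t)]
    elif kind == "star":                   # skip r1, or run it and loop back
        s1, t1 = build(r[1], edges, fresh)
        edges += [(s, None, t), (s, None, s1), (t1, None, s1), (t1, None, t)]
    elif kind != "empty":                  # Ø adds no edges: t is unreachable
        raise ValueError(f"unknown node: {kind}")
    return s, t


def eps_closure(states, edges):
    """All states reachable from states using only ε-edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(r, w):
    """Build the NFA for r and run it on w, tracking the set of live states."""
    edges = []
    start, accept = build(r, edges, count())
    current = eps_closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return accept in current


a, b, c, d = (("sym", ch) for ch in "abcd")
regex = ("union", ("cat", ("cat", a, ("star", b)), c), d)   # ab*c ∪ d
print(accepts(regex, "abbc"), accepts(regex, "d"), accepts(regex, "ab"))
```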
The Power of Regular Expressions
Theorem: If L is a regular language,
then there is a regular expression for L.
This is not obvious!
Proof idea: Show how to convert an
arbitrary NFA into a regular expression.
Generalizing NFAs

[Figure: an NFA with states q₀ through q₄ whose transitions are labeled a, b, Σ, and ε.]

These are all regular expressions!
Generalizing NFAs

[Figure: a generalized NFA with states q₀ through q₃ whose transitions are labeled ab ∪ b, a, ab*, and a*b?a*, traced step by step on the input aaabaabbb.]

Note: Actual NFAs aren't allowed to have transitions like these. This is just a thought experiment.
Key Idea 1: Imagine that we can label
transitions in an NFA with arbitrary regular
expressions.
Generalizing NFAs

[Figure: a generalized NFA with a single transition from q₀ to q₁ labeled ab ∪ b.]

Is there a simple regular expression for the language of this generalized NFA?

[Figure: the same two-state generalized NFA, with the transition from q₀ to q₁ labeled a⁺(.a⁺)*@a⁺(.a⁺)⁺.]

Is there a simple regular expression for the language of this generalized NFA?
Key Idea 2: If we can convert an NFA into
a generalized NFA that looks like this...

[Figure: start state q₀ with a single transition to q₁ labeled some-regex.]

...then we can easily read off a regular
expression for the original NFA.
From NFAs to Regular Expressions

[Figure: a generalized NFA with two states. The start state q₁ has a self-loop labeled R₁₁ and a transition to q₂ labeled R₁₂. The accepting state q₂ has a self-loop labeled R₂₂ and a transition back to q₁ labeled R₂₁.]

Here, R₁₁, R₁₂, R₂₁, and R₂₂ are arbitrary regular expressions.

Question: Can we get a clean regular expression from this NFA?

Key Idea 3: Somehow transform this NFA so that it looks like this:

[Figure: start state q₀ with a single transition to q₁ labeled some-regex.]

The first step is going to be a bit weird...

[Figure: a new start state qs is added with an ε-transition to q₁, and a new accepting state qf is added with an ε-transition from q₂.]

Could we eliminate this state from the NFA?

[Figure: q₁ is highlighted. A new transition from qs to q₂ is labeled ε R₁₁* R₁₂.]

Note: We're using concatenation and Kleene closure in order to skip this state.

[Figure: a new transition from q₂ to itself is labeled R₂₁ R₁₁* R₁₂. Then q₁ is removed, and ε R₁₁* R₁₂ is simplified to R₁₁* R₁₂.]

[Figure: the two self-loops on q₂ are merged into one labeled R₂₂ ∪ R₂₁ R₁₁* R₁₂.]

Note: We're using union to combine these transitions together.

What should we put on this transition?

[Figure: q₂ is eliminated. The new transition from qs to qf is labeled R₁₁* R₁₂ (R₂₂ ∪ R₂₁R₁₁*R₁₂)* ε, which simplifies to R₁₁* R₁₂ (R₂₂ ∪ R₂₁R₁₁*R₁₂)*.]

[Figure: the final automaton, start state qs with a single transition to qf labeled R₁₁* R₁₂ (R₂₂ ∪ R₂₁R₁₁*R₁₂)*, shown next to the original two-state NFA.]
The State-Elimination Algorithm

Start with an NFA N for the language L.

Add a new start state qs and accept state qf to the NFA.
● Add an ε-transition from qs to the old start state of N.
● Add ε-transitions from each accepting state of N to qf,
then mark them as not accepting.

Repeatedly remove states other than qs and qf
from the NFA by “shortcutting” them until only
two states remain: qs and qf.

The transition from qs to qf is then a regular
expression for the NFA.
The State-Elimination Algorithm

To eliminate a state q from the automaton, do the following
for each pair of states q₀ and q₁, where there's a transition
from q₀ into q and a transition from q into q₁:
● Let Rin be the regex on the transition from q₀ to q.
● Let Rout be the regex on the transition from q to q₁.
● If there is a regular expression Rstay on a transition from q
to itself, add a new transition from q₀ to q₁ labeled
((Rin)(Rstay)*(Rout)).

If there isn't, add a new transition from q₀ to q₁ labeled
((Rin)(Rout))

If a pair of states has multiple transitions between them
labeled R₁, R₂, …, Rₖ, replace them with a single transition
labeled R₁ ∪ R₂ ∪ … ∪ Rₖ.
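The steps above can be sketched in code. This is a minimal illustration, not an optimized implementation: it assumes the automaton is given as a dictionary from (source, target) pairs to regex strings in Python `re` syntax, that parallel transitions have already been merged with `|`, and that no existing state is named `qs` or `qf`. Every label is wrapped in a non-capturing group `(?:...)` so that precedence is never an issue:

```python
import re


def state_elimination(states, start, accepting, transitions):
    """Return a regex string for the automaton, or None if it accepts nothing.
    transitions maps (source, target) -> regex string."""
    qs, qf = "qs", "qf"
    label = dict(transitions)
    label[(qs, start)] = ""                 # ε-transition to the old start state
    for q in accepting:
        label[(q, qf)] = ""                 # ε-transitions to the new accept state

    for q in states:                        # eliminate the original states one by one
        stay = label.pop((q, q), None)
        loop = f"(?:{stay})*" if stay is not None else ""
        ins = {s: r for (s, d), r in label.items() if d == q}
        outs = {d: r for (s, d), r in label.items() if s == q}
        for s, r_in in ins.items():
            for d, r_out in outs.items():
                shortcut = f"(?:{r_in}){loop}(?:{r_out})"   # (Rin)(Rstay)*(Rout)
                if (s, d) in label:         # merge with an existing transition
                    label[(s, d)] = f"(?:{label[(s, d)]})|(?:{shortcut})"
                else:
                    label[(s, d)] = shortcut
        label = {(s, d): r for (s, d), r in label.items() if q not in (s, d)}
    return label.get((qs, qf))


# The two-state example from the slides, with R11=a, R12=b, R21=c, R22=d.
result = state_elimination(
    states=["q1", "q2"], start="q1", accepting={"q2"},
    transitions={("q1", "q1"): "a", ("q1", "q2"): "b",
                 ("q2", "q1"): "c", ("q2", "q2"): "d"},
)
print(result)
print(re.fullmatch(result, "aabdcab") is not None)
```

The printed regex is verbose because nothing is simplified, but it has the same shape as R₁₁* R₁₂ (R₂₂ ∪ R₂₁R₁₁*R₁₂)*. Eliminating the states in a different order generally gives a different but equivalent expression.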
Our Transformations

[Figure: DFA to NFA by direct conversion; NFA to DFA by the subset construction; NFA to Regexp by state elimination; Regexp to NFA by Thompson's algorithm.]


Theorem: The following are all equivalent:
· L is a regular language.
· There is a DFA D such that ℒ(D) = L.
· There is an NFA N such that ℒ(N) = L.
· There is a regular expression R such that ℒ(R) = L.
Why This Matters

The equivalence of regular expressions
and finite automata has practical
relevance.

Regular expression matchers have all the
power of DFAs and NFAs available to them.

This is also hugely significant
theoretically: the regular languages can be
assembled “from scratch” using a small
number of operations!
Next Time

Applications of Regular Languages

Answering “so what?”

Intuiting Regular Languages

What makes a language regular?

The Myhill-Nerode Theorem

The limits of regular languages.
