0% found this document useful (0 votes)
88 views4 pages

Algebraic Laws for Regular Expressions

This document discusses finite automata and regular expressions. Finite automata are abstract machines that recognize patterns using a finite set of states, an alphabet, and transitions between states labeled with alphabet symbols. Regular expressions formally define languages using operators like union, concatenation, and Kleene closure. Regular expressions can be converted to finite automata models like non-deterministic finite automata (NFAs) and deterministic finite automata (DFAs) using constructions like Thompson's algorithm. NFAs and DFAs are also interconvertible using the subset construction algorithm.

Uploaded by

Manoj Kumar
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
88 views4 pages

Algebraic Laws for Regular Expressions

This document discusses finite automata and regular expressions. Finite automata are abstract machines that recognize patterns using a finite set of states, an alphabet, and transitions between states labeled with alphabet symbols. Regular expressions formally define languages using operators like union, concatenation, and Kleene closure. Regular expressions can be converted to finite automata models like non-deterministic finite automata (NFAs) and deterministic finite automata (DFAs) using constructions like Thompson's algorithm. NFAs and DFAs are also interconvertible using the subset construction algorithm.

Uploaded by

Manoj Kumar
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

• Finite Automata – formal or abstract

machine to recognize patterns


Patterns, Automata, and
• Regular expressions – formal notation to
Regular Expressions describe/generate patterns

Finite Automata Regular Expressions


• A finite collection of states • Defines a set of strings over the characters contained in
• An alphabet some alphabet, defining a language
• A set of transitions between those states labeled • Atomic operand can be
– a character,
with symbols in the alphabet
– the symbol ε,
• A start state, S0 – the symbol Φ, or
• One or more final states – a variable whose value can be any pattern defined by a regular
expression
• Three basic operations/operators
Deterministic – single-valued transition, no epsilon
– Union – e.g., a|b
transitions – Concatenation – e.g., ab
Non-deterministic – multi-valued transitions – Closure – (Kleene closure) – e.g., a* - where a can be a set
concatenated with itself zero or more times

Precedence of Regular Expression Algebraic Laws for Regular


Operators Expressions
• Closure (highest) • Identity for Union Φ|R = R|Φ= R
• Identity for concatenation εR=Rε=R
• Concatenation • Associativity and commutativity of union
• Union R|S=S|R, ((R|S)|T) = (R|(S|T))
• Associativity of concatenation (RS)T = R(ST)
• Non-commutativity of concatenation
• Left distributivity of concatenation over union
(R(S|T)) = (RS|RT)
• Right distributivity of concatenation over union
((S|T)R) = (SR|TR)
• Idempotence of union (R | R) = R

1
RE <-> DFA Automated RE->NFA

State-Elimination
• Build NFA for each term, connect them
a b
Construction
DFA
with moves s0 s1 s2 s3
RE
– Concatenate - ab
a  b
s0 s1 s2
Subset s3
Thompson’s
Construction
Construction
– Union – a|b
 s0
a

s1
NFA s4
 s2
b  s5

– Kleene Closure – a*  s3

 a

s2 s0 s1 s3


Thompson’s Construction NFA->DFA


• Each NFA has a single start state and a single • Subset construction algorithm
– Each state in DFA corresponds to a set of states in NFA
final state, with no transitions leaving the final
state and only the initial transition entering the q0 ε-closure(n0)
start state initialize Q with {q0}

• An  -move always connects two states that were


while (Q is still changing)

for each qi Q
start or final states for each character  
• A state has at most 2 entering and 2 exiting  - t ε-closure(move(qi, )) 
T[qi, ]  t
moves, and at most 1 entering and 1 exiting 
if t Q then
move on a symbol in the alphabet add t to Q

Example DFA Minimization


 
NFA P P {SF, {S - SF}
while (P is still changing)
m a i n
S0 Sm Sa Si Sn T 0
for each set p  P
 -m  -a,m  -i,m  -n,m for each  

Corresponding DFA partition p by 


m a i
S0 S0,Sm S0,Sa S0,Si into p1, p2, p3, … pk
n
 -m m m m T T U p1, p2, p3, … pk
m m m
if T P then
m
a So,Sn,
i
S0,Sn, P T
S0,Sn S0,Sn,
Sa Si  -m
 -a,m
Sm
 -i,m

2
Automated DFA->RE Regular Expression and DFA
Identifier
for i = 1 to N
for j = 1 to N letter (a|b|c| … |z|A|B|C| … |Z)
Rij0 = {a| (si,a) = sj} digit (0|1|2|3|4|5|6|7|8|9)
if (i == j)
Rij0 = Rij0 U 
id letter (letter|digit)*
letter
for k = 1 to N digit
for i = 1 to N
letter
for j = 1 to N S0 S1

Rijk = Rikk-1 Rkkk-1 Rkjk-1 URijk-1


digit
L = U R1jN accept
s S
j


F S2 error
letter
digit

Implementing Scanners Code for Semi-Mechanical Pure


(Recognizer) DFA
state = S0; /* code for S0 */
• Ad-hoc done = false;
token_value = “” /* empty string */
token_type = error;

• Semi-mechanical pure DFA char = next_char();


while (not done) {
class = char_class[char];

• Table-driven DFA switch(state) {


case S0:
switch (class)
case ‘letter’: token_type = identifier; token_value = token_value+char;
state = S1; char = next_char(); break;
case ‘digit’: done = true; break;
case ‘other’: done = true; break;
break;
case S1:
switch(class)
case ‘letter’:
case ‘digit’: token_value = token_value+char; char = next_char(); break;
case ‘other’: done = true; break;
break;
}
}
return(token_type);

Table-Driven Recognizer Tables Driving the Recognizer


a-z A-Z 0-9 other
letter char_class
digit value letter letter digit other
letter other
S0 S1 S2

digit
other
class S0 S1 S2 S3
accept

S3 error letter S1 S1 -- --
next_state
digit S3 S1 -- --
other S3 S2 -- --
To change language, we can just change tables

3
Table-Driven Recognizer/Scanner Error Recovery
char = next_char();
state = S0; /* code for S0 */
token_value = “” /* empty string */
• E.g., illegal character
while (not done) {
class = char_class[char]; • What should the scanner do?
state = next_state[class,state];
switch(state) {
case S2: /* accept state */
– Report the error
token_type = identifier;
done = true; break; – Try to correct it?
case S3: /* error */
token_type = error;
done = true; break;
• Error correction techniques
default: /* building an id */
token_value = token_value + char; – Minimum distance corrections
char = next_char; break;

}
} – Hard token recovery
return(token_type);
– Skip until match

Scanner Summary
• Break up input into tokens
• Catch lexical errors
• Difficulty affected by language design
• Issues
– Input buffering
– Lookahead
– Error recovery
• Scanner generators
– Tokens specified by regular expressions
– Regular expression -> DFA
– Highly efficient in practice

You might also like