Class Notes (Unit I - FA and Regular Expression)

The document discusses Finite-State Automata (FSA) and Regular Expressions (RE) as methods for describing regular languages, specifically focusing on the 'sheeptalk' language. It explains how an FSA can recognize strings through a state-transition mechanism and introduces concepts such as deterministic and non-deterministic FSAs, along with algorithms for string recognition. Additionally, it covers the formal definition of languages and the use of automata in generating and recognizing strings within those languages.

Finite-State Automata

• An RE is one way of describing an FSA.

• An RE is one way of characterizing a particular kind of formal language called a regular language.

• Both regular expressions and finite-state automata can be used to describe regular languages.

• The relation among these three theoretical constructions is sketched out in the figure on the next slide, which was suggested by Martin Kay.

Finite-State Automata

[Figure: the relation among regular expressions, regular languages, and finite-state automata, after Martin Kay]
Using an FSA to Recognize Sheeptalk
• As we defined the sheep language in Part 1, it is any string from the following (infinite) set:
baa!
baaa!
baaaa!
baaaaa!
baaaaaa!
...

• The regular expression for this kind of ‘sheep talk’ is /baa+!/. Figure 2.10 shows an automaton for modeling this regular expression.
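The sheeptalk pattern can be tried directly with Python's re module; a minimal sketch (the pattern itself is from the slide, the surrounding code is illustrative):

```python
import re

# /baa+!/ : a 'b', two or more 'a's, then '!'
sheeptalk = re.compile(r"baa+!")

print(bool(sheeptalk.fullmatch("baa!")))    # True
print(bool(sheeptalk.fullmatch("baaaa!")))  # True
print(bool(sheeptalk.fullmatch("ba!")))     # False: too few a's
```

Note that fullmatch requires the whole string to match, which is the sense in which the automaton "recognizes" a string.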
Using an FSA to Recognize Sheeptalk
• The automaton (i.e. machine, also called finite automaton,
finite-state automaton, or FSA) recognizes a set of strings, in
this case the strings characterizing sheep talk, in the same
way that a regular expression does.
• We represent the automaton as a directed graph: a finite
set of vertices (also called nodes), together with a set of
directed links between pairs of vertices called arcs.
• We’ll represent vertices with circles and arcs with arrows.
• The automaton has five states, which are represented by nodes in the graph.
• State 0 is the start state, which we represent by the incoming arrow.
• State 4 is the final state or accepting state, which we represent by the double circle. The automaton also has four transitions, which we represent by arcs in the graph.
Using an FSA to Recognize Sheeptalk

• The FSA can be used for recognizing (we also say accepting) strings in the following way.
• First, think of the input as being written on a long tape broken up into cells, with one symbol written in each cell of the tape, as in Figure 2.11.
Using an FSA to Recognize Sheeptalk

• The machine starts in the start state (q0), and iterates the
following process:
▪ Check the next letter of the input. If it matches the symbol on an arc
leaving the current state, then cross that arc, move to the next state,
and also advance one symbol in the input.
▪ If we are in the accepting state (q4) when we run out of input, the
machine has successfully recognized an instance of sheeptalk.
▪ If the machine never gets to the final state, either because it runs out
of input, or it gets some input that doesn’t match an arc (as in Figure
2.11), or if it just happens to get stuck in some non-final state, we say
the machine rejects or fails to accept an input.

Using an FSA to Recognize Sheeptalk
• We can also represent an automaton with a state-transition
table. As in the graph notation, the state-transition table
represents the start state, the accepting states, and what
transitions leave each state with which symbols.
• Here’s the state-transition table for the FSA of Figure 2.10.

Using an FSA to Recognize Sheeptalk
• The first row of the table can be read: “if we’re in state 0 and we see the input b we must go to state 1. If we’re in state 0 and we see the input a or !, we fail”.

• More formally, a finite automaton is defined by the following 5 parameters:
▪ Q: a finite set of N states q0, q1, …, qN−1
▪ Σ: a finite input alphabet of symbols
▪ q0: the start state
▪ F: the set of final states, F ⊆ Q
▪ δ(q,i): the transition function or transition matrix between states. Given a state q ∈ Q and an input symbol i ∈ Σ, δ(q,i) returns a new state q′ ∈ Q. δ is thus a relation from Q × Σ to Q.
Using an FSA to Recognize Sheeptalk
• For the sheeptalk automaton in Figure 2.10, Q = {q0, q1, q2, q3, q4}, Σ = {a, b, !}, F = {q4}, and δ(q,i) is defined by the transition table in Figure 2.12.
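The 5-tuple for this automaton can be written down directly as data; a sketch in Python (the dict representation and names are my own, not from the text):

```python
# Sheeptalk FSA of Figure 2.10 as plain data.
Q = {0, 1, 2, 3, 4}        # the five states q0..q4
SIGMA = {"a", "b", "!"}    # input alphabet
START = 0                  # start state q0
FINALS = {4}               # F, the set of final states

# delta as a dict from (state, symbol) to the next state; keys that are
# absent correspond to the empty cells of the transition table.
DELTA = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # the self-loop on q3
    (3, "!"): 4,
}
```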

• Figure 2.13 presents an algorithm for recognizing a string using a state-transition table. The algorithm is called D-RECOGNIZE for ‘deterministic recognizer’.

• A deterministic algorithm is one that has no choice points; the algorithm always knows what to do for any input.

• But there are also non-deterministic automata that must make decisions about which states to move to.
Using an FSA to Recognize Sheeptalk

[Figure 2.13: The D-RECOGNIZE algorithm]
Using an FSA to Recognize Sheeptalk
• D-RECOGNIZE takes as input a tape and an automaton. It returns accept if the string it is pointing to on the tape is accepted by the automaton, and reject otherwise.

• Note that since D-RECOGNIZE assumes it is already pointing at the string to be checked, its task is only a subpart of the general problem that we often use regular expressions for: finding a string in a corpus.

• D-RECOGNIZE begins by initializing the variables index and current-state to the beginning of the tape and the machine’s initial state.

• D-RECOGNIZE then enters a loop that drives the rest of the algorithm. It first checks whether it has reached the end of its input. If so, it either accepts the input (if the current state is an accept state) or rejects the input (if not).
Using an FSA to Recognize Sheeptalk
• If there is input left on the tape, D-RECOGNIZE looks at the transition table to decide which state to move to.

• The variable current-state indicates which row of the table to consult, while the current symbol on the tape indicates which column of the table to consult.

• The resulting transition-table cell is used to update the variable current-state, and index is incremented to move forward on the tape.

• If the transition-table cell is empty then the machine has nowhere to go and must reject the input.

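The loop just described can be sketched in Python (a paraphrase of D-RECOGNIZE, not the book's exact pseudocode; the dict-based transition table is an illustrative representation):

```python
def d_recognize(tape, delta, start, finals):
    """Deterministic recognizer: walk the transition table over the tape."""
    index, state = 0, start
    while index < len(tape):                 # end-of-input check drives the loop
        key = (state, tape[index])
        if key not in delta:                 # empty transition-table cell
            return "reject"
        state = delta[key]                   # move to the next machine-state
        index += 1                           # advance on the tape
    return "accept" if state in finals else "reject"

# Sheeptalk FSA of Figure 2.10:
delta = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
print(d_recognize("baaa!", delta, 0, {4}))  # accept
print(d_recognize("abc",   delta, 0, {4}))  # reject
```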
Using an FSA to Recognize Sheeptalk
• Figure 2.14 traces the execution of this algorithm on the sheep language
FSA given the sample input string baaa!.

Using an FSA to Recognize Sheeptalk

• Before examining the beginning of the tape, the machine is in state q0.

• Finding a b on the input tape, it changes to state q1, as indicated by the contents of transition-table[q0,b] in Figure 2.12.

• It then finds an a and switches to state q2; another a puts it in state q3; a third a leaves it in state q3, where it reads the ‘!’ and switches to state q4.

• Since there is no more input, the End-of-input condition at the beginning of the loop is satisfied for the first time and the machine halts in q4.

• State q4 is an accepting state, so the machine has accepted the string baaa! as a sentence in the sheep language.
Using an FSA to Recognize Sheeptalk

• The algorithm will fail whenever there is no legal transition for a given combination of state and input.

• The input abc will fail to be recognized, since there is no legal transition out of state q0 on the input a (i.e. this entry of the transition table in Figure 2.12 has a ∅).

• Even if the automaton had allowed an initial a, it would certainly have failed on c, since c isn’t even in the sheeptalk alphabet!

• We can think of these ‘empty’ elements in the table as if they all pointed at one ‘empty’ state, which we might call the fail state or sink state.

• In a sense, then, we could view any machine with empty transitions as if we had augmented it with a fail state, and drawn in all the extra arcs, so we always have somewhere to go from any state on any possible input.
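The fail-state augmentation can be sketched as completing a partial transition table (the dict representation and names are illustrative, not from the text):

```python
def add_fail_state(delta, states, alphabet, fail="qF"):
    """Complete a partial transition table: every missing (state, symbol)
    cell is redirected to a sink state that loops to itself forever."""
    full = dict(delta)
    for q in list(states) + [fail]:
        for sym in alphabet:
            full.setdefault((q, sym), fail)   # only fills the empty cells
    return full

# Sheeptalk FSA of Figure 2.10, then filled in as in Figure 2.15:
delta = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
full = add_fail_state(delta, {0, 1, 2, 3, 4}, {"a", "b", "!"})
print(full[(0, "a")])     # qF: an initial 'a' goes straight to the fail state
print(full[("qF", "a")])  # qF: no escape from the sink
```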
Using an FSA to Recognize Sheeptalk
• Just for completeness, Figure 2.15 shows the FSA from Figure 2.10 with the
fail state qF filled in.

Formal Languages
• We can use the same graph in Figure 2.10 as an automaton for
GENERATING sheeptalk.
• If we do, we would say that the automaton starts at state q0, and crosses
arcs to new states, printing out the symbols that label each arc it follows.

• When the automaton gets to the final state it stops. Notice that at state 3, the automaton has to choose between printing out a ! and going to state 4, or printing out an a and returning to state 3.
• Let’s say for now that we don’t care how the machine makes this
decision; maybe it flips a coin.
• For now, we don’t care which exact string of sheeptalk we generate, as
long as it’s a string captured by the regular expression for sheeptalk
above.
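The coin-flip walk can be sketched as a tiny generator (illustrative only; the 0.5 probability stands in for "flips a coin"):

```python
import random

def generate_sheeptalk(rng=random):
    """Walk the Figure 2.10 automaton from q0, emitting arc labels;
    at state 3, 'flip a coin' between another 'a' (stay) and '!' (finish)."""
    out = ["b", "a", "a"]          # q0 -b-> q1 -a-> q2 -a-> q3
    while rng.random() < 0.5:      # coin flip at q3
        out.append("a")            # self-loop: print another 'a'
    out.append("!")                # q3 -!-> q4, the final state
    return "".join(out)

s = generate_sheeptalk()
print(s)  # e.g. 'baaa!' -- always a string of sheeptalk
```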
Formal Languages
• Key concept #1: Formal Language: A model which can both generate and recognize all and only the strings of a formal language acts as a definition of the formal language.

• A formal language is a set of strings, each string composed of symbols from a finite symbol set called an alphabet (the same alphabet used above for defining an automaton!).

• The alphabet for the sheep language is the set Σ = {a, b, !}.
• Given a model m (such as a particular FSA), we can use L(m) to mean
“the formal language characterized by m”.
• So the formal language defined by our sheeptalk automaton m in Figure
2.10 (and Figure 2.12) is the infinite set:

L(m) = {baa!, baaa!, baaaa!, baaaaa!, baaaaaa!...}


Formal Languages
• The usefulness of an automaton for defining a language is that it can
express an infinite set in a closed form.
• A formal language may bear no resemblance at all to a real language
(natural language), but
▪ We often use a formal language to model part of a natural language,
such as parts of the phonology, morphology, or syntax.

• The term generative grammar is used in linguistics to mean a grammar of a formal language; the origin of the term is this use of an automaton to define a language by generating all possible strings.
Another Example

[Example figures not reproduced]
Non-Deterministic FSAs
• Consider the sheeptalk automaton in Figure 2.18, which is much like our
first automaton in Figure 2.10:

• The only difference between this automaton and the previous one is that
here in Figure 2.18 the self-loop is on state 2 instead of state 3.
Non-Deterministic FSAs
• Consider using this network as an automaton for recognizing sheeptalk.
• When we get to state 2, if we see an a we don’t know whether to remain
in state 2 or go on to state 3.

• Automata with decision points like this are called non-deterministic FSAs (or NFSAs).

• Recall by contrast that Figure 2.10 specified a deterministic automaton, i.e. one whose behavior during recognition is fully determined by the state it is in and the symbol it is looking at. A deterministic automaton can be referred to as a DFSA.

• That is not true of the machine in Figure 2.18 (NFSA #1).
Non-Deterministic FSAs
• There is another common type of non-determinism, which can be caused by arcs that have no symbols on them (called ε-transitions).

• The automaton in Figure 2.19 defines the exact same language as the last one, or our first one, but it does so with an ε-transition.
Non-Deterministic FSAs
• We interpret this new arc as follows: if we are in state 3, we are
allowed to move to state 2 without looking at the input, or
advancing our input pointer.

• So this introduces another kind of non-determinism – we might not know whether to follow the ε-transition or the ! arc.
Using an NFSA to Accept Strings

• There are three standard solutions to the problem of choice in non-deterministic models:

1. Backup: Whenever we come to a choice point, we could put a marker to mark where we were in the input, and what state the automaton was in. Then if it turns out that we took the wrong choice, we could back up and try another path.

2. Look-ahead: We could look ahead in the input to help us decide which path to take.

3. Parallelism: Whenever we come to a choice point, we could look at every alternative path in parallel.
Using an NFSA to Accept Strings
• The backup approach suggests that we should blithely make choices
that might lead to deadends, knowing that we can always return to
unexplored alternative choices.

• There are two keys to this approach: we need to remember all the
alternatives for each choice point, and we need to store sufficient
information about each alternative so that we can return to it when
necessary.

• When a backup algorithm reaches a point in its processing where no progress can be made (because it runs out of input, or has no legal transitions), it returns to a previous choice point, selects one of the unexplored alternatives, and continues from there.
Using an NFSA to Accept Strings
• Applying this notion to our nondeterministic recognizer, we need only
remember two things for each choice point: the state, or node, of the
machine that we can go to and the corresponding position on the tape.

• We will call the combination of the node and position the search-state of
the recognition algorithm.

• To avoid confusion, we will refer to the state of the automaton (as opposed to the state of the search) as a node or a machine-state.

• Figure 2.21 presents a recognition algorithm based on this approach.
Using an NFSA to Accept Strings
• Before going on to describe the main part of this algorithm, we should
note two changes to the transition table that drives it.
• First, in order to represent nodes that have outgoing ε-transitions, we add a new ε-column to the transition table. If a node has an ε-transition, we list the destination node in the ε-column for that node’s row.
• The second addition is needed to account for multiple transitions to
different nodes from the same input symbol.
• We let each cell entry consist of a list of destination nodes rather than a
single node.

• Figure 2.20 shows the transition table for the machine in Figure 2.18 (NFSA
#1).
• While it has no ε-transitions, it does show that in machine-state q2 the input a can lead back to q2 or on to q3.
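These two table changes can be sketched as data (the representation is illustrative):

```python
# Transition table for NFSA #1 (Figures 2.18/2.20): each cell is a *list*
# of destination nodes, and EPS plays the role of the extra epsilon-column.
NFSA_DELTA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],   # the non-deterministic cell: stay in q2 or move to q3
    (3, "!"): [4],
}
EPS = {}                # NFSA #1 has no epsilon-transitions; the machine of
                        # Figure 2.19 would instead have EPS = {3: [2]}

print(NFSA_DELTA[(2, "a")])  # [2, 3]
```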
Using an NFSA to Accept Strings

[Figure 2.20: The transition table for NFSA #1]
Using an NFSA to Accept Strings

• Figure 2.21 shows the algorithm for using a non-deterministic FSA to recognize an input string.

• The function ND-RECOGNIZE uses the variable agenda to keep track of all the currently unexplored choices generated during the course of processing.

• Each choice (search-state) is a tuple consisting of a node (state) of the machine and a position on the tape.

• The variable current-search-state represents the branch choice being currently explored.
Using an NFSA to Accept Strings
• ND-RECOGNIZE begins by creating an initial search-state and placing it
on the agenda.

• For now we don’t specify what order the search-states are placed on the
agenda.

• This search-state consists of the initial machine-state of the machine and a pointer to the beginning of the tape.

• The function NEXT is then called to retrieve an item from the agenda and assign it to the variable current-search-state.
Using an NFSA to Accept Strings
• As with D-RECOGNIZE, the first task of the main loop is to determine if the
entire contents of the tape have been successfully recognized.

• This is done via a call to ACCEPT-STATE?, which returns accept if the current search-state contains both an accepting machine-state and a pointer to the end of the tape.

• If we’re not done, the machine generates a set of possible next steps by calling GENERATE-NEW-STATES, which creates search-states for any ε-transitions and any normal input-symbol transitions from the transition table.

• All of these search-state tuples are then added to the current agenda.
Using an NFSA to Accept Strings
• Finally, we attempt to get a new search-state to process from the
agenda.

• If the agenda is empty we’ve run out of options and have to reject the
input.

• Otherwise, an unexplored option is selected and the loop continues.

• It is important to understand why ND-RECOGNIZE returns a value of reject only when the agenda is found to be empty.

• Unlike D-RECOGNIZE, it does not return reject when it reaches the end of the tape in a non-accept machine-state or when it finds itself unable to advance the tape from some machine-state.
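The agenda loop can be sketched in Python (a paraphrase of ND-RECOGNIZE under a dict-of-lists table representation of my own; here NEXT happens to pop the newest search-state):

```python
def nd_recognize(tape, delta, eps, start, finals):
    """Agenda-based non-deterministic recognizer. delta maps
    (node, symbol) -> list of nodes; eps maps node -> list of nodes
    reachable by epsilon-transitions. Search-states are (node, index)."""
    agenda = [(start, 0)]                        # initial search-state
    while agenda:
        node, index = agenda.pop()               # NEXT (here: LIFO)
        if index == len(tape) and node in finals:
            return "accept"                      # ACCEPT-STATE?
        # GENERATE-NEW-STATES: epsilon arcs, then normal input transitions
        for dest in eps.get(node, []):
            agenda.append((dest, index))         # no tape advance on epsilon
        if index < len(tape):
            for dest in delta.get((node, tape[index]), []):
                agenda.append((dest, index + 1))
    return "reject"                              # agenda empty: all paths failed

# NFSA #1 (Figure 2.18): self-loop on q2 instead of q3
delta = {(0, "b"): [1], (1, "a"): [2], (2, "a"): [2, 3], (3, "!"): [4]}
print(nd_recognize("baaa!", delta, {}, 0, {4}))  # accept
print(nd_recognize("baa",   delta, {}, 0, {4}))  # reject
```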
Using an NFSA to Accept Strings

• This is because, in the non-deterministic case, such roadblocks only indicate failure down a given path, not overall failure.

• We can only be sure we can reject a string when all possible choices have been examined and found lacking.

• Figure 2.22 illustrates the progress of ND-RECOGNIZE as it attempts to handle the input baaa!.
Using an NFSA to Accept Strings

• Each strip illustrates the state of the algorithm at a given point in its processing.

• The current-search-state variable is captured by the solid bubbles representing the machine-state, along with the arrow representing progress on the tape.
Using an NFSA to Accept Strings
• Each strip lower down in the figure represents progress from one current-
search-state to the next.

• Little of interest happens until the algorithm finds itself in state q2 while
looking at the second a on the tape.

• An examination of the entry for transition-table[q2,a] returns both q2 and q3.

• Search-states are created for each of these choices and placed on the agenda.

• Unfortunately, our algorithm chooses to move to state q3, a move that results in neither an accept state nor any new states, since the entry for transition-table[q3,a] is empty.
Using an NFSA to Accept Strings

• At this point, the algorithm simply asks the agenda for a new
state to pursue.

• Since the choice of returning to q2 from q2 is the only unexamined choice on the agenda, it is returned with the tape pointer advanced to the next a.

• Somewhat diabolically, ND-RECOGNIZE finds itself faced with the same choice.
Using an NFSA to Accept Strings

• The entry for transition-table[q2,a] still indicates that looping back to q2 or advancing to q3 are valid choices.

• As before, states representing both are placed on the agenda. These search-states are not the same as the previous ones, since their tape index values have advanced.

• This time the agenda provides the move to q3 as the next move. The move to q4, and success, is then uniquely determined by the tape and the transition table.
Using an NFSA to Accept Strings

[Figure 2.22: Trace of ND-RECOGNIZE on the input baaa!]
Recognition as Search
• ND-RECOGNIZE accomplishes the task of recognizing strings in a regular language by providing a way to systematically explore all the possible paths through a machine.

• If this exploration yields a path ending in an accept state, it accepts the string; otherwise it rejects it.

• This systematic exploration is made possible by the agenda mechanism, which on each iteration selects a partial path to explore and keeps track of any remaining, as yet unexplored, partial paths.

• Algorithms such as ND-RECOGNIZE, which operate by systematically searching for solutions, are known as state-space search algorithms.
Recognition as Search
• In such algorithms, the problem definition creates a space of possible
solutions; the goal is to explore this space, returning an answer when one
is found or rejecting the input when the space has been exhaustively
explored.

• In ND-RECOGNIZE, search-states consist of pairings of machine-states with positions on the input tape.

• The state-space consists of all the pairings of machine-state and tape position that are possible given the machine in question.

• The goal of the search is to navigate through this space from one state to another, looking for a pairing of an accept state with an end-of-tape position.
Recognition as Search

• The key to the effectiveness of such programs is often the order in which
the states in the space are considered.

• A poor ordering of states may lead to the examination of a large number of unfruitful states before a successful solution is discovered.

• Unfortunately, it is typically not possible to tell a good choice from a bad one, and often the best we can do is to ensure that each possible solution is eventually considered.
Recognition as Search
• You may have noticed that the ordering of states in ND-RECOGNIZE
has been left unspecified.

• We know only that unexplored states are added to the agenda as they are created and that the (undefined) function NEXT returns an unexplored state from the agenda when asked.

• How should the function NEXT be defined?
Recognition as Search
• Consider an ordering strategy where the states that are considered
next are the most recently created ones.

• Such a policy can be implemented by placing newly created states at the front of the agenda and having NEXT return the state at the front of the agenda when called.

• Thus the agenda is implemented by a stack.

• This is commonly referred to as a depth-first search or Last-In-First-Out (LIFO) strategy.
Recognition as Search

• Such a strategy dives into the search space, following newly developed leads as they are generated.

• It will only return to consider earlier options when progress along a current lead has been blocked.

• The trace of the execution of ND-RECOGNIZE on the string baaa! shown in Figure 2.22 illustrates a depth-first search.
Recognition as Search

• The algorithm hits the first choice point after seeing ba when it
has to decide whether to stay in q2 or advance to state q3.

• At this point, it chooses one alternative and follows it until it is sure it’s wrong.

• The algorithm then backs up and tries another, older alternative.
Recognition as Search

• The second way to order the states in the search space is to consider states in the order in which they are created.

• Such a policy can be implemented by placing newly created states at the back of the agenda and still having NEXT return the state at the front of the agenda.

• Thus the agenda is implemented via a queue.

• This is commonly referred to as a breadth-first search or First-In-First-Out (FIFO) strategy.
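The two policies differ only in which end of the agenda NEXT takes from. A deque sketch (pushing new states at the back, so taking the back is LIFO and taking the front is FIFO, equivalent to the front-of-agenda formulation in the text):

```python
from collections import deque

agenda = deque(["s1", "s2", "s3"])   # search-states, created in this order

# Depth-first / LIFO: NEXT returns the most recently created state.
def next_lifo(agenda):
    return agenda.pop()

# Breadth-first / FIFO: NEXT returns the earliest created state.
def next_fifo(agenda):
    return agenda.popleft()

print(next_lifo(agenda))   # s3 -- dive into the newest lead
print(next_fifo(agenda))   # s1 -- sweep the space one ply at a time
```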
Recognition as Search

• Consider a different trace of the execution of ND-RECOGNIZE on the string baaa!, as shown in Figure 2.23.

• Again, the algorithm hits its first choice point after seeing ba, when it has to decide whether to stay in q2 or advance to state q3.

• But now, rather than picking one choice and following it up, we imagine examining all possible choices, expanding one ply of the search tree at a time.
Recognition as Search

[Figure 2.23: Breadth-first trace of ND-RECOGNIZE on the input baaa!]
Recognition as Search

• Both algorithms (depth-first search and breadth-first search) have their own disadvantages; for example, both can enter an infinite loop under certain circumstances. Depth-first search is normally preferred for its more efficient use of memory.

• For larger problems, more complex search techniques such as dynamic programming or A* must be used.
Regular Languages and FSAs
• The class of languages that are definable by regular expressions is exactly the same as the class of languages that are characterizable by FSAs (deterministic or non-deterministic).
▪ These languages are called regular languages.
• The class of regular languages over Σ is formally defined as follows:
1. ∅ is an RL
2. ∀a ∈ Σ ∪ {ε}, {a} is an RL
3. If L1 and L2 are RLs, then so are:
a) L1 · L2 = {xy | x ∈ L1 and y ∈ L2}, the concatenation of L1 and L2
b) L1 ∪ L2, the union or disjunction of L1 and L2
c) L1*, the Kleene closure of L1

• All and only the sets of languages which meet the above properties are regular languages.
Regular Languages and FSAs
• Regular languages are also closed under the following operations (where Σ* means the infinite set of all possible strings formed from the alphabet Σ):

❖ Intersection: if L1 and L2 are regular languages, then so is L1 ∩ L2, the language consisting of the set of strings that are in both L1 and L2.

❖ Difference: if L1 and L2 are regular languages, then so is L1 − L2, the language consisting of the set of strings that are in L1 but not L2.

❖ Complementation: if L1 is a regular language, then so is Σ* − L1, the set of all possible strings that aren’t in L1.

❖ Reversal: if L1 is a regular language, then so is L1ᴿ, the language consisting of the set of reversals of all the strings in L1.
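Closure under intersection, for example, can be made concrete with the classic product construction: run the two DFAs in lock-step and accept only when both do. A minimal sketch (the machines and dict representation are illustrative):

```python
def intersect_dfas(delta1, start1, finals1, delta2, start2, finals2, tape):
    """Product construction: simulate both DFAs in lock-step on the tape.
    The product's state is the pair (state-of-M1, state-of-M2); it accepts
    iff the run ends with both components in a final state."""
    s1, s2 = start1, start2
    for sym in tape:
        if (s1, sym) not in delta1 or (s2, sym) not in delta2:
            return False                      # an empty cell rejects
        s1, s2 = delta1[(s1, sym)], delta2[(s2, sym)]
    return s1 in finals1 and s2 in finals2

# L1: sheeptalk /baa+!/ ;  L2: even-length strings over {a, b, !}
sheep = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
even = {(p, sym): 1 - p for p in (0, 1) for sym in "ab!"}
print(intersect_dfas(sheep, 0, {4}, even, 0, {0}, "baaa!"))   # False: length 5 is odd
print(intersect_dfas(sheep, 0, {4}, even, 0, {0}, "baaaa!"))  # True: in both languages
```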