
Lexical Analysis Finite Automata: CMSC 331, Some Material © 1998 by Addison Wesley Longman, Inc

The document discusses finite automata and how they are used for lexical analysis. Finite automata can be deterministic or non-deterministic and are represented by state diagrams or transition tables. Regular expressions are converted to finite automata to recognize language tokens, which is used in scanner generation.

Lexical analysis

Finite Automata

CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc.

Finite Automata (FA)


An FA is also called a Finite State Machine (FSM).
An abstract model of a computing entity.
Decides whether to accept or reject a string.
Every regular expression can be represented as an FA, and vice versa.
Two types of FAs:
Non-deterministic (NFA): may have more than one alternative action for the same input symbol.
Deterministic (DFA): has at most one action for a given input symbol.
Example: how do we write a program to recognize the Java keyword int?

[Figure: q0 --i--> q1 --n--> q2 --t--> q3, with q0 the start state and q3 the final state]
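One way to answer the question is to turn each edge of the diagram into one branch of a loop over the input. A minimal sketch in Python, assuming states are numbered 0 to 3 to mirror q0 to q3; `is_int_keyword` is a hypothetical name:

```python
def is_int_keyword(s: str) -> bool:
    """Recognize exactly the string "int" by walking q0 -> q1 -> q2 -> q3."""
    state = 0                              # q0: start state
    for ch in s:
        if state == 0 and ch == "i":
            state = 1                      # q0 --i--> q1
        elif state == 1 and ch == "n":
            state = 2                      # q1 --n--> q2
        elif state == 2 and ch == "t":
            state = 3                      # q2 --t--> q3
        else:
            return False                   # no edge for this symbol: reject
    return state == 3                      # accept only if the input ends in q3

for word in ["int", "in", "integer", "Int"]:
    print(word, is_int_keyword(word))
```

A real scanner would not simply reject `integer`; it would typically combine this automaton with the one for identifiers and prefer the longest match, so `integer` becomes an identifier token.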

RE and Finite State Automaton (FA)


A regular expression is a declarative way to describe tokens:
it describes what a token is, but not how to recognize it.
An FA describes how the token is recognized.
An FA is easy to simulate with a computer program.
Regular expressions and FAs have the same expressive power: every regular expression has an equivalent FA, and every FA has an equivalent regular expression.
A scanner generator (such as lex) bridges the gap between regular expressions and FAs.
[Figure: Regular expression → Scanner generator → Finite automaton (scanner program); String stream → scanner program → Tokens]
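A small illustration of the what/how split, using a token chosen here as an example (an unsigned integer literal, not one from the slides): a regular expression says what the token is, while a hand-written two-state automaton says how to recognize it. This is a sketch; `unsigned_int_fa` is a hypothetical name.

```python
import re

# Declarative description (WHAT): an unsigned integer is one or more digits.
UNSIGNED_INT = re.compile(r"[0-9]+")

# Procedural description (HOW): a two-state automaton.
# State 0: start, nothing read yet (not final).
# State 1: at least one digit read (final).
def unsigned_int_fa(s: str) -> bool:
    state = 0
    for ch in s:
        if "0" <= ch <= "9":
            state = 1           # both states move to state 1 on a digit
        else:
            return False        # no edge for a non-digit: reject
    return state == 1

# Both descriptions agree on a few sample strings.
for s in ["", "7", "2024", "12a", "a12"]:
    assert unsigned_int_fa(s) == (UNSIGNED_INT.fullmatch(s) is not None)
```

Python's `re` is a backtracking matcher, not a DFA simulator like the code lex generates, so here it only serves as the declarative reference.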

Inside a scanner generator

Main components of scanner generation (e.g., Lex):
Convert a regular expression to a non-deterministic finite automaton (NFA).
Convert the NFA to a deterministic finite automaton (DFA).
Improve the DFA to minimize the number of states.
Generate a program in C or some other language to simulate the DFA.
[Figure: inside the scanner generator: RE --Thompson construction--> NFA --Subset construction--> DFA --Minimization--> Minimized DFA --DFA simulation--> Program]
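A compact sketch of the first step, Thompson's construction, under two stated assumptions: the RE has already been parsed into a tuple-based syntax tree with hypothetical node kinds `"sym"`, `"cat"`, `"alt"` and `"star"`, and concatenation is joined with an ε-edge rather than by merging states (a common variant of the textbook construction). A small ε-closure simulator is included only to check the result.

```python
from itertools import count

EPS = None  # label used for epsilon edges

class NFA:
    """Epsilon-NFA stored as adjacency lists: state -> [(label, target)]."""
    def __init__(self):
        self.edges = {}
        self._ids = count()

    def new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

def thompson(nfa, node):
    """Build the fragment for one syntax-tree node; return (start, accept)."""
    kind = node[0]
    if kind == "sym":                      # single symbol: s --c--> f
        s, f = nfa.new_state(), nfa.new_state()
        nfa.add_edge(s, node[1], f)
    elif kind == "cat":                    # r1 r2: link r1's accept to r2's start
        s, f1 = thompson(nfa, node[1])
        s2, f = thompson(nfa, node[2])
        nfa.add_edge(f1, EPS, s2)
    elif kind == "alt":                    # r1 | r2: new start/accept around both
        s, f = nfa.new_state(), nfa.new_state()
        for child in node[1:]:
            cs, cf = thompson(nfa, child)
            nfa.add_edge(s, EPS, cs)
            nfa.add_edge(cf, EPS, f)
    elif kind == "star":                   # r*: skip r, enter r, or repeat r
        s, f = nfa.new_state(), nfa.new_state()
        cs, cf = thompson(nfa, node[1])
        for src, dst in [(s, cs), (s, f), (cf, cs), (cf, f)]:
            nfa.add_edge(src, EPS, dst)
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return s, f

def eps_closure(nfa, states):
    """All states reachable from `states` using only epsilon edges."""
    seen, stack = set(states), list(states)
    while stack:
        for label, r in nfa.edges[stack.pop()]:
            if label is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def nfa_accepts(nfa, start, accept, text):
    current = eps_closure(nfa, {start})
    for ch in text:
        current = eps_closure(nfa, {r for q in current
                                    for label, r in nfa.edges[q] if label == ch})
    return accept in current

# (a|b)*abb as a syntax tree (parsing the RE text is left out).
A, B = ("sym", "a"), ("sym", "b")
RE = ("cat", ("cat", ("cat", ("star", ("alt", A, B)), A), B), B)
nfa = NFA()
start, accept = thompson(nfa, RE)
```

With ε-edges for concatenation this variant builds 14 states for (a|b)*abb; merging states instead would give a smaller NFA that accepts the same strings.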

Non-deterministic Finite Automata (NFA)

An NFA (Non-deterministic Finite Automaton) is a 5-tuple $(S, \Sigma, \delta, s_0, F)$:

$S$: a set of states;
$\Sigma$: the symbols of the input alphabet;
$\delta$: a transition function, move(state, symbol) → a set of states, i.e. $\delta: S \times \Sigma \to 2^{S}$;
$s_0$: $s_0 \in S$, the start state;
$F$: $F \subseteq S$, a set of final or accepting states.

Non-deterministic: a (state, symbol) pair can be mapped to a set of states.
Finite: the number of states is finite.
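The 5-tuple maps directly onto plain data. A minimal sketch, assuming a hypothetical `NFA5` container and filled in with the NFA for (a|b)*abb that appears later in these slides; the method checks only the side conditions stated in the definition.

```python
from dataclasses import dataclass

@dataclass
class NFA5:
    """Hypothetical container mirroring the 5-tuple (S, Sigma, delta, s0, F)."""
    states: frozenset    # S
    alphabet: frozenset  # Sigma
    move: dict           # delta: (state, symbol) -> set of states
    start: str           # s0
    finals: frozenset    # F

    def is_well_formed(self) -> bool:
        """s0 in S, F a subset of S, and every transition stays inside S and Sigma."""
        return (self.start in self.states
                and self.finals <= self.states
                and all(q in self.states and c in self.alphabet and targets <= self.states
                        for (q, c), targets in self.move.items()))

example = NFA5(
    states=frozenset({"q0", "q1", "q2", "q3"}),
    alphabet=frozenset({"a", "b"}),
    move={("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
          ("q1", "b"): {"q2"}, ("q2", "b"): {"q3"}},
    start="q0",
    finals=frozenset({"q3"}),
)
print(example.is_well_formed())
# Non-determinism is visible in the data: one (state, symbol) pair maps to two states.
print(example.move[("q0", "a")])
```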


Transition Diagram
An FA can be represented using a transition diagram.
Corresponding to the FA definition, a transition diagram has:
States, represented by circles;
An alphabet ($\Sigma$), represented by labels on edges;
Transitions, represented by labeled directed edges between states (the label is the input symbol);
One start state, shown with an incoming arrowhead;
One or more final states, represented by double circles.

Example transition diagram to recognize (a|b)*abb

[Figure: q0 has a self-loop labeled a, b; q0 --a--> q1 --b--> q2 --b--> q3; q0 is the start state and q3 (double circle) is the final state]

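Simple examples of FAs

a: start → q0 --a--> q1; q1 is final.
a*: start → q0; q0 is final and has a self-loop on a.
a+: start → q0 --a--> q1; q1 is final and has a self-loop on a.
(a|b)*: start → q0; q0 is final and has a self-loop on a, b.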
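The four diagrams can be written as transition dictionaries and run by one small driver. A sketch, assuming a hypothetical `run_dfa` helper in which a missing dictionary entry stands for a missing edge (and therefore rejection):

```python
def run_dfa(delta, start, finals, text):
    """Follow one edge per symbol; a missing edge means reject."""
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in finals

# (transition function, start state, final states) for each diagram
SIMPLE_FAS = {
    "a":      ({("q0", "a"): "q1"}, "q0", {"q1"}),
    "a*":     ({("q0", "a"): "q0"}, "q0", {"q0"}),
    "a+":     ({("q0", "a"): "q1", ("q1", "a"): "q1"}, "q0", {"q1"}),
    "(a|b)*": ({("q0", "a"): "q0", ("q0", "b"): "q0"}, "q0", {"q0"}),
}

for regex, (delta, start, finals) in SIMPLE_FAS.items():
    accepted = [w for w in ["", "a", "aa", "ab", "b"] if run_dfa(delta, start, finals, w)]
    print(f"{regex:7} accepts {accepted}")
```

The difference between a* and a+ shows up only in which state is final: a* accepts the empty string because its start state is final.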

Procedure for defining a DFA/NFA

Define the input alphabet and the initial state.
Draw the transition diagram.
Check:
Do all states have outgoing arcs labeled with all the input symbols? (DFA)
Are any final states missing?
Are there any duplicate states?
Can all strings in the language be accepted?
Are any strings not in the language accepted?
Name all the states.
Define $(S, \Sigma, \delta, q_0, F)$.
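Two of the checks lend themselves to automation: listing (state, symbol) pairs with no outgoing arc, and comparing the automaton against a reference definition of the language on every short string. A sketch under the assumption that the language can be written as a Python predicate; the flawed DFA below is a hypothetical example, not one from the slides.

```python
from itertools import product

def missing_arcs(delta, states, alphabet):
    """(state, symbol) pairs that have no outgoing arc."""
    return [(q, c) for q in sorted(states) for c in sorted(alphabet) if (q, c) not in delta]

def run(delta, start, finals, text):
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:          # missing arc: treat as rejection
            return False
    return state in finals

def disagreements(delta, start, finals, alphabet, in_language, max_len):
    """Strings up to max_len classified differently by the DFA and the predicate.
    Covers both checks: strings in L that are rejected, strings not in L that are accepted."""
    found = []
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            w = "".join(letters)
            if run(delta, start, finals, w) != in_language(w):
                found.append(w)
    return found

# Hypothetical flawed attempt at a+ over {a, b}: b-arcs are missing and q0 is wrongly final.
delta = {("q0", "a"): "q1", ("q1", "a"): "q1"}
print(missing_arcs(delta, {"q0", "q1"}, {"a", "b"}))   # [('q0', 'b'), ('q1', 'b')]
print(disagreements(delta, "q0", {"q0", "q1"}, {"a", "b"},
                    lambda w: len(w) > 0 and set(w) <= {"a"}, max_len=3))   # ['']
```

Testing short strings can reveal errors but cannot prove a DFA correct; it complements, rather than replaces, the reasoning in the checklist.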

Example of constructing an FA
Construct a DFA that accepts a language L over the alphabet {0, 1} such that L is the set of all strings with any number of 0s followed by any number of 1s.
Regular expression: 0*1*
$\Sigma = \{0, 1\}$
Draw the initial state of the transition diagram:

[Figure: a single state marked Start]


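Example of constructing an FA

Draft the transition diagram:

[Figure: Start state --0--> second state (self-loop on 0) --1--> third state (final, self-loop on 1)]

Is 111 accepted?
No: the leftmost state is missing an arc for input 1. Adding it:

[Figure: the same diagram with an extra arc labeled 1 from the Start state to the third state]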


Example of constructing an FA
Is 00 accepted?
No. Fix: make the leftmost two states final states as well:
First state from the left: $\varepsilon$ is then also accepted.
Second state from the left: strings with 0s only are then also accepted.

[Figure: the diagram with the leftmost two states now drawn as final states]


Example of constructing an FA
The leftmost two states are duplicates: their arcs point to the same states with the same symbols, so they can be merged.

[Figure: the merged start state (final, self-loop on 0) --1--> the other final state (self-loop on 1)]

Check that it is correct:
All strings in the language can be accepted:
$\varepsilon$, the empty string, is accepted;
strings with only 0s or only 1s are accepted.
No strings outside the language are accepted.

Name all the states:

[Figure: final DFA: Start → q0; q0 is final with a self-loop on 0; q0 --1--> q1; q1 is final with a self-loop on 1]
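The final DFA can be checked mechanically against its regular expression. A sketch that encodes q0 and q1 as named in the last diagram and compares the result with Python's `re.fullmatch` for every string over {0, 1} of length at most 8 (the length bound is an arbitrary choice):

```python
import re
from itertools import product

# Final DFA for 0*1*: both q0 and q1 are final.
# q1 has no arc for 0: reading a 0 after a 1 falls into an implicit dead state.
DELTA = {("q0", "0"): "q0", ("q0", "1"): "q1", ("q1", "1"): "q1"}
START, FINALS = "q0", {"q0", "q1"}

def accepts(w):
    state = START
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in FINALS

# Exhaustive check against the regular expression for all strings of length <= 8.
for n in range(9):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts(w) == (re.fullmatch(r"0*1*", w) is not None), w
```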

How does an FA work?


NFA definition for (a|b)*abb:

S = {q0, q1, q2, q3}
$\Sigma = \{a, b\}$
Transitions: move(q0, a) = {q0, q1}, move(q0, b) = {q0}, ....
s0 = q0
F = {q3}

[Figure: transition diagram representation: q0 has a self-loop on a, b; q0 --a--> q1 --b--> q2 --b--> q3; q3 is final]


Non-determinism:
exiting one state, there are multiple edges labeled with the same symbol, or
there are epsilon edges.
How does an FA work? Input: ababb
One path:
move(0, a) = 1
move(1, b) = 2
move(2, a) = ? (undefined)
REJECT on this path!

Another path:
move(0, a) = 0
move(0, b) = 0
move(0, a) = 1
move(1, b) = 2
move(2, b) = 3
ACCEPT!
Since at least one path reaches the final state after reading the whole input, ababb is accepted.
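Instead of guessing a path and backing up when it fails, a program can follow all paths at once by keeping the set of states the NFA might be in after each symbol. A sketch for the (a|b)*abb NFA above; `simulate` is a hypothetical name.

```python
# NFA for (a|b)*abb; a missing key means "no transition".
MOVE = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
        ("q1", "b"): {"q2"}, ("q2", "b"): {"q3"}}
START, FINALS = "q0", {"q3"}

def simulate(text):
    """Track every state the NFA could be in; return (accepted, trace of state sets)."""
    current = {START}
    trace = [sorted(current)]
    for ch in text:
        current = set().union(*(MOVE.get((q, ch), set()) for q in current))
        trace.append(sorted(current))
    return bool(current & FINALS), trace

accepted, trace = simulate("ababb")
for step in trace:
    print(step)
print("ACCEPT" if accepted else "REJECT")
```

For ababb the sets are {q0}, {q0, q1}, {q0, q2}, {q0, q1}, {q0, q2}, {q0, q3}: the rejected path and the accepting path from the slide are both contained in this one pass. These sets of states are also the idea behind converting an NFA into a DFA.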


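FA for (a|b)*abb

[Figure: the NFA for (a|b)*abb: q0 has a self-loop on a, b; q0 --a--> q1 --b--> q2 --b--> q3; q3 is final]

What does it mean that a string is accepted by an FA?

An FA accepts an input string x iff there is a path from the start state to a final state such that the edge labels along this path spell out x.
A path for aabb:
q0 --a--> q0 --a--> q1 --b--> q2 --b--> q3
Is aab acceptable?
q0 --a--> q0 --a--> q1 --b--> q2
q0 --a--> q0 --a--> q0 --b--> q0
No: a final state must be reached.
In general, there could be several paths.
Is aabbb acceptable?
q0 --a--> q0 --a--> q1 --b--> q2 --b--> q3
No: this path reaches q3 after aabb, but q3 has no edge for the last b. The labels on the path must spell out the entire string.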
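The definition of acceptance ("there is a path...") can be implemented literally as a depth-first search that tries each alternative edge in turn and backs up on failure. A sketch for the same NFA; `find_path` and `show` are hypothetical helper names.

```python
MOVE = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
        ("q1", "b"): {"q2"}, ("q2", "b"): {"q3"}}
START, FINALS = "q0", {"q3"}

def find_path(text, state=START, i=0):
    """Return the states of a path whose labels spell out all of `text`
    and that ends in a final state, or None if no such path exists."""
    if i == len(text):
        return [state] if state in FINALS else None
    for nxt in sorted(MOVE.get((state, text[i]), set())):
        rest = find_path(text, nxt, i + 1)
        if rest is not None:
            return [state] + rest
    return None

def show(text, path):
    """Format a path as q0 --a--> q0 --a--> q1 ..."""
    parts = [path[0]]
    for ch, q in zip(text, path[1:]):
        parts.append(f"--{ch}--> {q}")
    return " ".join(parts)

for w in ["aabb", "aab", "aabbb"]:
    path = find_path(w)
    print(w, show(w, path) if path else "not accepted")
```

Backtracking can revisit the same (state, position) pair many times, which is one reason practical scanners follow all paths at once or use a DFA instead.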


Transition table
A transition table is a good way to implement an FSA:
One row for each state, S.
One column for each symbol, A.
The entry in cell (S, A) gives the state or set of states that can be reached from state S on input A.

A Nondeterministic Finite Automaton (NFA) has at least one cell with more than one state.
A Deterministic Finite Automaton (DFA) has a single state in every cell.
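Transition table for (a|b)*abb. Rows are STATES, columns are INPUT symbols (> marks the start state, * marks a final state, - means no transition):

STATES | a        | b
>q0    | {q0, q1} | q0
q1     | -        | q2
q2     | -        | q3
*q3    | -        | -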
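The NFA/DFA distinction can be read straight off such a table. A sketch that stores the table above as nested dictionaries (an assumption about representation; empty sets play the role of "-" cells) and lists the cells that make it non-deterministic:

```python
# NFA transition table for (a|b)*abb.
# Rows are states, columns are input symbols; each cell is a SET of states
# (an empty set corresponds to a "-" cell).
TABLE = {
    "q0": {"a": {"q0", "q1"}, "b": {"q0"}},
    "q1": {"a": set(), "b": {"q2"}},
    "q2": {"a": set(), "b": {"q3"}},
    "q3": {"a": set(), "b": set()},
}

def nondeterministic_cells(table):
    """Cells holding more than one state; any such cell makes the table an NFA."""
    return [(state, symbol)
            for state, row in table.items()
            for symbol, cell in row.items()
            if len(cell) > 1]

print(nondeterministic_cells(TABLE))   # [('q0', 'a')]
```

Empty cells are not counted here; the DFA condition (a single state in every cell) additionally requires every cell to be filled, which an explicit dead state can provide.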

DFA (Deterministic Finite Automaton)

A special case of NFA where the transition function maps the pair (state, symbol) to one state.

When represented by a transition diagram, for each state S and symbol a, there is at most one edge labeled a leaving S;
When represented by a transition table, each entry in the table is a single state.
There are no $\varepsilon$-transitions.

Example: DFA for (a|b)*abb


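Rows are STATES, columns are INPUT symbols (> marks the start state, * marks the final state):

STATES | a  | b
>q0    | q1 | q0
q1     | q1 | q2
q2     | q1 | q3
*q3    | q1 | q0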

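Recall the NFA:

[Figure: the NFA for (a|b)*abb: q0 has a self-loop on a, b; q0 --a--> q1 --b--> q2 --b--> q3; q3 is final]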
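Because each DFA cell holds exactly one state, simulating the DFA table for (a|b)*abb needs neither backtracking nor a set of states: one table lookup per input symbol. A sketch, assuming the table is stored as nested dictionaries and that symbols outside {a, b} are rejected:

```python
# DFA transition table for (a|b)*abb: exactly one next state per cell.
DFA = {
    "q0": {"a": "q1", "b": "q0"},
    "q1": {"a": "q1", "b": "q2"},
    "q2": {"a": "q1", "b": "q3"},
    "q3": {"a": "q1", "b": "q0"},
}
START, FINALS = "q0", {"q3"}

def dfa_accepts(text):
    state = START
    for ch in text:
        row = DFA[state]
        if ch not in row:   # symbol outside the alphabet {a, b}
            return False
        state = row[ch]     # exactly one choice: no backtracking needed
    return state in FINALS

for w in ["abb", "ababb", "aabbb"]:
    print(w, dfa_accepts(w))
```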


DFA to program
An NFA is more concise, but not as easy to implement.
In a DFA, the transition table has no alternative options, so a DFA is easily simulated by an algorithm.
Every NFA can be converted to an equivalent DFA.

[Figure: inside the scanner generator: RE --Thompson construction--> NFA --Subset construction--> DFA --Minimization--> Minimized DFA --DFA simulation--> Program]

What does equivalent mean?

There are general algorithms that can take a DFA and produce a minimal DFA.
Minimal in what sense?

There are programs that take a regular expression and produce a program based on a minimal DFA to recognize strings defined by the RE.
You can find out more in 451 (automata theory) and/or 431 (compiler design).
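A sketch of the subset construction step from the pipeline, applied to the slides' NFA for (a|b)*abb. Each DFA state is the set of NFA states reachable on the same input, and "equivalent" here means the two automata accept exactly the same strings. Because this particular NFA has no ε-edges, the ε-closure step of the general algorithm is left out; that simplification would not hold for an NFA built by Thompson's construction.

```python
from collections import deque

# NFA for (a|b)*abb (no epsilon edges, so no epsilon-closure step is needed).
NFA_MOVE = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
            ("q1", "b"): {"q2"}, ("q2", "b"): {"q3"}}
NFA_START, NFA_FINALS, ALPHABET = "q0", {"q3"}, ("a", "b")

def subset_construction():
    """Build DFA states as frozensets of NFA states, exploring breadth-first."""
    start = frozenset({NFA_START})
    dfa, queue = {}, deque([start])
    while queue:
        S = queue.popleft()
        if S in dfa:
            continue
        dfa[S] = {}
        for c in ALPHABET:
            T = frozenset(r for q in S for r in NFA_MOVE.get((q, c), ()))
            dfa[S][c] = T
            if T not in dfa:
                queue.append(T)
    finals = {S for S in dfa if S & NFA_FINALS}   # contains an NFA final state
    return start, dfa, finals

start, dfa, finals = subset_construction()
for S, row in dfa.items():
    print(sorted(S), {c: sorted(T) for c, T in row.items()}, "final" if S in finals else "")
```

The four resulting states {q0}, {q0, q1}, {q0, q2} and {q0, q3} play the roles of q0, q1, q2 and q3 in the DFA table for (a|b)*abb shown earlier.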
