Lecture 1 To 4 Theory of Computation


THEORY OF COMPUTATION

DR O. J Falana

CSC 206: THEORY OF COMPUTATION (2 Units)

Formal grammars and automata; meaning of alphabet, string, concatenation, language
and level of language; regular expression, regular grammar and context-free languages;
deterministic and non-deterministic parsing of context-free languages; recursive
language, finite state automata, Turing machine, pumping lemma, Chomsky normal form
and CYK algorithm. Prerequisite: MTS 101.
Outlines
1. Introduction to Computation Theory
Definition and importance of Computation Theory
Relationship between Computation Theory and Computer Science
Overview of key topics in the course

2. Formal Grammars and Automata


Definition of Grammar and Automata
Types of Formal Grammars (Regular, Context-Free, Context-Sensitive, Unrestricted)
Automata Theory: Finite Automata, Pushdown Automata, Turing Machines

3. Alphabet, Strings, Concatenation, Language, and Levels of Language


Definition of Alphabet (Σ), Strings and their properties
String operations (Concatenation, Substring, Reverse)
Definition of Language and Examples, Chomsky Hierarchy of Languages
4. Regular Expressions and Regular Grammar
Definition of Regular Expressions, Operations on Regular Expressions (Union, Concatenation,
Kleene Star), Regular Grammars and their Types (Left-Linear, Right-Linear), Equivalence
between Regular Expressions and Finite Automata

5. Context-Free Languages and Parsing


Definition of Context-Free Languages, Context-Free Grammars (CFGs) and their
Representation, Derivations and Parse Trees, Deterministic vs. Non-Deterministic Parsing
Applications of CFGs in Programming Languages

6. Recursive Languages and Decidability


Definition of Recursive and Recursively Enumerable Languages, Decidability and
Undecidability in Computation, Examples of Recursive and Non-Recursive Languages
7. Finite State Automata (FSA)
Definition and Types (Deterministic and Non-Deterministic), Transition Functions and State Diagrams, Conversion between NFA and DFA
8. Turing Machines and Computability
Introduction to Turing Machines, Components and Functioning of a Turing Machine
Variants of Turing Machines (Multi-Tape, Non-Deterministic), Church-Turing Thesis
and Computability

9. Pumping Lemma
Pumping Lemma for Regular Languages, Pumping Lemma for Context-Free Languages
Applications in Proving Language Non-Regularity

10. Chomsky Normal Form (CNF) and CYK Algorithm


Definition of Chomsky Normal Form, Conversion of CFG to CNF
CYK Algorithm for Parsing Strings in CNF, Application in Compiler Design
LECTURE 1
Introduction to Computation Theory
Definition of Computation Theory
The Theory of Computation is a branch of computer science that focuses on the study of
problems that can be solved using algorithms and the efficiency of these solutions.

Automata theory, a central part of the Theory of Computation, is a field within computer
science and mathematics that studies abstract machines. By analyzing mathematical models
of how machines perform calculations, it reveals the capabilities and limitations of
computation.

It answers fundamental questions like:

• What problems can be solved using a computer?
• How efficient can these solutions be?
• Are there problems that are unsolvable by any computer?
1.2 Branches of Computation Theory
The theory of computation is divided into three main areas:

A. Automata Theory
• Studies abstract computing devices (automata) and their capabilities.
• Explains how computers process languages and recognize patterns.

B. Computability Theory
• Focuses on what problems are solvable using algorithms.
• Deals with decidable vs. undecidable problems.

C. Complexity Theory
• Examines how efficiently problems can be solved.
• Categorizes problems into complexity classes such as P, NP and NP-complete.
1.3 Why Study Computation Theory?
1. Understanding Computation Limits – Helps in identifying which problems are
solvable and which are impossible to compute.
2. Algorithm Design – Provides techniques for designing better and more efficient
algorithms.
3. Artificial Intelligence & Machine Learning – Automata concepts are used in pattern
recognition, natural language processing, and AI.
4. Compiler Design – Used to design programming language parsers and interpreters.
5. Cybersecurity & Cryptography – Complexity theory is used to design secure
cryptographic algorithms.
6. Regular Expressions (RE) in Systems – Regular expressions are powerful tools for
pattern matching and text processing, used extensively in many systems.
7. Finite Automata in Modeling Systems – Finite automata (FA) are used to model
protocols, such as those in network communication, and to design electronic circuits
that operate according to a set of predefined rules or states.
1.4 Fundamental Concepts in Computation Theory
The study of computation is based on fundamental mathematical models
and formal systems, including:

• Alphabet (Σ) – A set of symbols used in computation. Example: Σ = {0, 1}.
• String – A sequence of symbols from an alphabet. Example: "1011".
• Language (L) – A set of valid strings formed using an alphabet.
• Formal Grammar – A set of rules that define how strings are generated in
a language.
• Automata (Computing Models) – Machines that process languages (e.g.,
Finite Automata, Pushdown Automata, Turing Machines).
1.5 Basic Terminologies of Theory of
Computation
Now, let’s understand the basic terminologies, which are important
and frequently used in the Theory of Computation.
1. Symbol
A symbol (often also called a character) is the smallest building block;
it can be any letter, digit, or picture.

2. Alphabets (Σ)
A finite, non-empty set of symbols used to construct strings and
languages. For example, Σ = {a, b}.
3. String
A string is a finite sequence of symbols from some alphabet. A string
is generally denoted as w and the length of a string is denoted
as |w|. The empty string is the string with zero occurrences of symbols;
it is represented as ε, and |ε| = 0.
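The definitions of symbol, alphabet and string can be tried out directly with Python strings. The sketch below is only an illustration: it assumes one-character symbols, represents Σ as a Python set and ε as the empty string "", and the helper names `is_string_over` and `strings_of_length` are hypothetical.

```python
from itertools import product

SIGMA = {"0", "1"}   # the alphabet Σ = {0, 1}
EPSILON = ""         # the empty string ε

def is_string_over(w: str, sigma: set) -> bool:
    """True if every symbol of w belongs to the alphabet sigma."""
    return all(symbol in sigma for symbol in w)

def strings_of_length(k: int, sigma: set) -> list:
    """All strings of length exactly k over sigma (the set Σ^k), in sorted order."""
    return ["".join(p) for p in product(sorted(sigma), repeat=k)]

w = "1011"
print(len(w))                          # |w| = 4
print(len(EPSILON))                    # |ε| = 0
print(w + "01")                        # concatenation of 1011 and 01: 101101
print(w + EPSILON == w)                # ε leaves a string unchanged: True
print(is_string_over("1021", SIGMA))   # False, because 2 is not in Σ
print(strings_of_length(2, SIGMA))     # ['00', '01', '10', '11']
```

Since Σ^k has |Σ|^k elements, there are 2^k binary strings of length k.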
2. GRAMMARS: REGULAR EXPRESSIONS AND
LANGUAGES, CONTEXT-FREE LANGUAGES
Having learnt about strings and alphabets in the previous lecture, we now turn to another important
concept in formal language and automata theory: grammar.
Learning Outcomes
At the end of this lecture, you should be able to:
• define Grammar
• state the types of grammars available in the field of Computer Science
• describe the class of automata that can recognize strings generated by each grammar
• identify strings that are generated by a particular grammar
• describe the Chomsky hierarchy
• explain the relevance of grammar and formal languages to computer programming.

Lecture 2: Grammars and Automata
GRAMMAR
Grammar is a set of rules for forming strings in a formal language.
The rules describe how to form strings from the language's alphabet that are valid
according to the language's syntax. A grammar does not describe the meaning of the
strings or what can be done with them in whatever context - only their form.
Formal language theory, which is the discipline that studies formal grammars and
languages, is a branch of Applied Mathematics. Its applications are found in theoretical
computer science, theoretical linguistics, formal semantics, mathematical logic, and other
areas.
Grammar is a set of rules for rewriting strings, along with a "start symbol" from which
rewriting must start.
Therefore, a grammar is usually thought of as a language generator. However, it can also
sometimes be used as the basis for a "recognizer", a function in computing that
determines whether a given string belongs to the language or is grammatically incorrect.
To describe such recognisers, formal language theory uses separate formalisms, known as
automata theory.
Elements of a Grammar
• A grammar is built from two basic kinds of symbols:
• Terminal Symbols: Terminal symbols are
the components of the sentences generated using
the grammar and are represented using lowercase letters
like a, b, c, etc.
• Non-Terminal Symbols: Non-terminal symbols are
those symbols that take part in the generation of the
sentence but do not appear in the final sentence.
Non-Terminal Symbols are also called Auxiliary
Symbols and Variables. These symbols are represented
using capital letters like A, B, C, etc.
Representation of Grammar

• Any grammar can be represented by a 4-tuple <N, T, P, S>
• N – Finite Non-Empty Set of Non-Terminal Symbols.
• T – Finite Set of Terminal Symbols.
• P – Finite Non-Empty Set of Production Rules.
• S – Start Symbol (Symbol from where we start
producing our sentences or strings).
Production Rules

• A production or production rule in computer
science is a rewrite rule specifying a symbol
substitution that can be recursively performed to
generate new symbol sequences. It is of the
form α -> β, where α contains at least one Non-Terminal Symbol
(in the grammars below usually exactly one) and can be replaced by β,
which is a string of Terminal Symbols, Non-Terminal Symbols, or both.
• Example: Consider the Grammar G1 = <N, T, P, S>, where
N = {A} #Set of non-terminal symbols
T = {a, b} #Set of terminal symbols
P = {A->Aa, A->Ab, A->a, A->b, A->ϵ} #Set of all production rules
S = A #Start Symbol
Derivation of strings
• Since the start symbol is A, we can first produce Aa, Ab,
a, b or ϵ. Each remaining A can again be replaced using the
production rules, and hence this grammar can be used
to produce exactly the strings of the form (a+b)*.
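Why does G1 produce every string of (a+b)*? The rules A->Aa and A->Ab add symbols at the right end, so any string can be built from its last symbol backwards, finishing with A->a, A->b or A->ϵ. The hypothetical helper below constructs such a derivation for a given string; it is a sketch of this argument, not part of the original notes.

```python
def derivation_in_g1(w: str) -> list:
    """Return the sentential forms of a derivation A => ... => w in G1,
    where P = {A->Aa, A->Ab, A->a, A->b, A->ϵ}."""
    if any(symbol not in "ab" for symbol in w):
        raise ValueError("G1 only generates strings over {a, b}")
    if w == "":
        return ["A", "ϵ"]                 # one step: A -> ϵ
    forms = ["A"]
    # Apply A -> Aa or A -> Ab to place the symbols of w from right to left ...
    for i in range(len(w) - 1, 0, -1):
        forms.append("A" + w[i:])
    # ... then A -> a or A -> b produces the first symbol.
    forms.append(w)
    return forms

print(" => ".join(derivation_in_g1("abb")))   # A => Ab => Abb => abb
print(" => ".join(derivation_in_g1("ba")))    # A => Aa => ba
```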
A grammar mainly consists of a set of rules for transforming strings.
To generate a string in the language, one begins with a string consisting of only a single
start symbol. The production rules are then applied in any order, until a string that contains
neither the start symbol nor designated nonterminal symbols is produced.
The language formed by the grammar consists of all distinct strings that can be generated
in this manner.
Each sequence of production rules applied to the start symbol that ends in a string of
terminals yields a string in the language; different sequences may yield the same string.
If some string has more than one parse tree (equivalently, more than one leftmost
derivation), the grammar is said to be ambiguous.
2. The Semantics of Grammars
Semantics is the linguistic and philosophical study of meaning, in language, programming
languages and formal logics.
It is concerned with the relationship between signifiers, like words, phrases, signs and
symbols, and their denotations.

For example, assume the alphabet consists of a and b, the start symbol is S, and we have
the following production rules:
1. S => aSb
2. S => ba
Then we start with S, and can choose a rule to apply to it. If we choose rule 1, we obtain
the string aSb. If we choose rule 1 again, we replace S with aSb and obtain the string
aaSbb. If we now choose rule 2, we replace S with ba and obtain the string aababb, and are
done.
We can write this series of choices more briefly, using symbols:
S => aSb => aaSbb => aababb.
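The derivation above can be replayed mechanically. In this sketch (the function name `apply_rules` is hypothetical) each step rewrites the leftmost S, which is the only non-terminal in this grammar, using the rule number chosen at that step.

```python
# Production rules of the example grammar, numbered as in the text.
RULES = {1: ("S", "aSb"), 2: ("S", "ba")}

def apply_rules(choices: list, start: str = "S") -> list:
    """Apply the chosen rules in order, each time rewriting the leftmost
    occurrence of the rule's left-hand side; return every sentential form."""
    forms = [start]
    for number in choices:
        lhs, rhs = RULES[number]
        current = forms[-1]
        if lhs not in current:
            raise ValueError(f"rule {number} cannot be applied to {current!r}")
        forms.append(current.replace(lhs, rhs, 1))
    return forms

print(" => ".join(apply_rules([1, 1, 2])))   # S => aSb => aaSbb => aababb
```

Other choice lists give other strings: [2] gives S => ba and [1, 2] gives S => aSb => abab. Applying rule 1 n times and then rule 2 always produces a^n ba b^n.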
Example 1: Consider the grammar G over the alphabet {a, b, c} with start symbol S, where P
consists of the following production rules:
1. S => aBSc
2. S => abc
3. Ba => aB
4. Bb => bb
Show that this grammar generates the language L(G) = {a^n b^n c^n | n ≥ 1}.
Solution: the language is the set of strings that consist of one or more a's, followed by the
same number of b's, followed by the same number of c's.
Some examples of the derivation of strings in L(G) are:
S => aBSc => aBabcc => aaBbcc => aabbcc
OR
S => aBSc => aBaBScc => aBaBabccc => aaBBabccc => aaBaBbccc => aaaBBbccc => aaaBbbccc =>
aaabbbccc
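Listing derivations by hand quickly becomes error-prone. The sketch below (hypothetical helper `generate`) explores all sentential forms breadth-first, applying every production at every position where its left-hand side occurs, and collects the strings that contain only terminals. It assumes upper-case letters are non-terminals and lower-case letters are terminals. Since no production of G makes a string shorter, a form longer than `max_len` can never shrink back, so it is safe to discard it.

```python
from collections import deque

# Productions of G as (left-hand side, right-hand side) string pairs.
PRODUCTIONS = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]

def generate(productions, start="S", max_len=9):
    """Return every terminal string of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    terminal_strings = set()
    while queue:
        form = queue.popleft()
        if form.islower():               # only terminals a, b, c are left
            terminal_strings.add(form)
            continue
        for lhs, rhs in productions:
            # Try the rule at every position where its left-hand side occurs.
            pos = form.find(lhs)
            while pos != -1:
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                pos = form.find(lhs, pos + 1)
    return sorted(terminal_strings, key=len)

print(generate(PRODUCTIONS))   # ['abc', 'aabbcc', 'aaabbbccc']
```

The output agrees with L(G) = {a^n b^n c^n | n ≥ 1} for n ≤ 3; an enumeration like this supports the claim but does not prove it for all n.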
Language theory is a branch of Mathematics concerned with describing languages as a set
of operations over an alphabet. It is closely linked with automata theory, as automata are
used to generate and recognize formal languages.
There are several classes of formal languages, each allowing more complex language
specification than the one before it (the Chomsky hierarchy), and each corresponding to a
class of automata which recognizes it.
Because automata are used as models for computation, formal languages are the preferred
mode of specification for any problem that must be computed.

TYPES OF GRAMMARS

1. REGULAR EXPRESSIONS: Regular expressions are a means of describing regular languages.
Before formally defining the notion of a regular expression, we give some examples.
Consider the expression:
(0 ∪ 1)01*.
The language described by this expression is the set of all binary strings:
1. that start with either 0 or 1 (this is indicated by (0 ∪ 1)),
2. for which the second symbol is 0 (this is indicated by 0), and
3. that end with zero or more 1s (this is indicated by 1*).
That is, the language described by this expression is:
{00, 001, 0011, 00111, . . . , 10, 101, 1011, 10111, . . .}

Here are some more examples (in all cases, the alphabet is {0, 1}):
• The language {w : w contains exactly two 0s} can be described by the expression
1*01*01*
• The language {w : w contains at least two 0s} can be described by the expression
(0 ∪ 1)*0(0 ∪ 1)*0(0 ∪ 1)*.
• The language {w : 1011 is a substring of w} can be described by the expression
(0 ∪ 1)*1011(0 ∪ 1)*.
• The language {w : the length of w is odd} can be described by the expression:
(0 ∪ 1)((0 ∪ 1)(0 ∪ 1))*.
• The language {1011, 0} can be described by the expression:
1011 ∪ 0.
• The language {w : the first and last symbols of w are equal} can be described by the
expression: 0(0 ∪ 1)*0 ∪ 1(0 ∪ 1)*1 ∪ 0 ∪ 1.
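Python's `re` module can be used to experiment with these expressions, writing `|` for ∪. The sketch below (the dictionary `EXAMPLES` and helper names are hypothetical) compares each expression with a direct test of the property that defines its language, over every binary string up to a fixed length. Such a check is evidence, not a proof. Note that `re` supports many features beyond formal regular expressions; only `|`, `*` and parentheses are used here.

```python
import re
from itertools import product

# Each entry: the expression from the text in Python syntax ('|' for ∪),
# and a direct test of the property that defines the language.
EXAMPLES = {
    "second symbol 0, then 1s": (r"(0|1)01*",
                                 lambda w: len(w) >= 2 and w[1] == "0" and set(w[2:]) <= {"1"}),
    "exactly two 0s": (r"1*01*01*", lambda w: w.count("0") == 2),
    "at least two 0s": (r"(0|1)*0(0|1)*0(0|1)*", lambda w: w.count("0") >= 2),
    "1011 is a substring": (r"(0|1)*1011(0|1)*", lambda w: "1011" in w),
    "odd length": (r"(0|1)((0|1)(0|1))*", lambda w: len(w) % 2 == 1),
    "the set {1011, 0}": (r"1011|0", lambda w: w in {"1011", "0"}),
    "first symbol equals last": (r"0(0|1)*0|1(0|1)*1|0|1",
                                 lambda w: len(w) > 0 and w[0] == w[-1]),
}

def all_binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

def agrees(pattern, predicate, max_len=8):
    """True if the regex and the predicate accept exactly the same strings up to max_len."""
    regex = re.compile(pattern)
    return all((regex.fullmatch(w) is not None) == predicate(w)
               for w in all_binary_strings(max_len))

for name, (pattern, predicate) in EXAMPLES.items():
    print(f"{name:26} {pattern:24} {agrees(pattern, predicate)}")
```

`re.fullmatch` is used rather than `re.search` because a regular expression describes whole strings, not substrings.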

After these examples, we give a formal definition of regular expressions:

Regular Expressions
Regular expressions are symbolic notations used to define search
patterns in strings. They describe regular languages and are
commonly used in tasks such as validation, searching, and parsing.
A regular expression over an alphabet Σ is defined as follows:
1. Base Cases
• Empty set: ∅ is a regular expression that represents the empty language ∅.
• Empty string: ε is a regular expression that represents the language {ε}.
• Single symbols: Any symbol a ∈ Σ is a regular expression that represents {a}.
2. Recursive Rules
• Union (OR, denoted by "|", "∪" or "+"): If R₁ and R₂ are regular expressions, then R₁ | R₂ represents the set of strings in
R₁ or R₂.
• Concatenation (denoted by "•" or simply by juxtaposition): If R₁ and R₂ are regular expressions, then R₁R₂ represents
strings formed by a string of R₁ followed by a string of R₂.
• Kleene Star (denoted by "*"): If R is a regular expression, then R* represents zero or more strings of R concatenated together.
Regular Languages
Regular languages are the class of languages that can be represented
using finite automata, regular expressions, or regular grammars.
These languages have predictable patterns and are computationally
efficient to recognize.
Properties of Regular Languages
1. Closure Properties
Regular languages are closed under operations like union, intersection,
concatenation, Kleene star and complement.
•Union: If L1 and L2 are two regular languages, their union L1 ∪
L2 will also be regular. For example, L1 = {a^n | n ≥ 0} and L2 = {b^n |
n ≥ 0}; L3 = L1 ∪ L2 = {w | w = a^n or w = b^n, n ≥ 0} is also regular.
•Intersection: If L1 and L2 are two regular languages, their
intersection L1 ∩ L2 will also be regular. For example, L1 = {a^m b^n | n ≥
0 and m ≥ 0} and L2 = {a^m b^n | n ≥ 0 and m ≥ 0} ∪ {b^n a^m | n ≥ 0 and m ≥ 0};
L3 = L1 ∩ L2 = {a^m b^n | n ≥ 0 and m ≥ 0} is also regular.
•Concatenation: If L1 and L2 are two regular languages, their
concatenation L1.L2 will also be regular. For example, L1 = {a^n | n ≥
0} and L2 = {b^n | n ≥ 0}; L3 = L1.L2 = {a^m b^n | m ≥ 0 and n ≥ 0} is
also regular.
•Kleene Closure: If L1 is a regular language, its Kleene closure L1* will
also be regular. For example, L1 = (a ∪ b), L1* = (a ∪ b)*.
•Complement: If L(G) is a regular language over Σ, its complement L’(G) = Σ* − L(G) will
also be regular. The complement of a language is found by removing
the strings of L(G) from the set of all possible strings Σ*. For
example, with Σ = {a}, if L(G) = {a^n | n > 3} then L’(G) = {a^n | n ≤ 3}.
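Two of these closure properties have short constructive proofs on deterministic finite automata: the complement swaps final and non-final states, and the intersection runs two DFAs side by side (the product construction). The sketch below assumes complete DFAs (every state has a move on every symbol) over the same alphabet; the class `DFA` is hypothetical, and the even-length language used for the intersection is a made-up second example.

```python
from itertools import product

class DFA:
    """A complete DFA; delta maps (state, symbol) to the next state."""

    def __init__(self, states, alphabet, delta, start, finals):
        self.states, self.alphabet = set(states), set(alphabet)
        self.delta, self.start, self.finals = delta, start, set(finals)

    def accepts(self, w):
        q = self.start
        for symbol in w:
            q = self.delta[(q, symbol)]
        return q in self.finals

    def complement(self):
        # Same transitions; final and non-final states are swapped.
        return DFA(self.states, self.alphabet, self.delta, self.start,
                   self.states - self.finals)

    def intersection(self, other):
        # Product construction: a state is a pair (state of self, state of other).
        pairs = set(product(self.states, other.states))
        delta = {((p, q), a): (self.delta[(p, a)], other.delta[(q, a)])
                 for (p, q) in pairs for a in self.alphabet}
        finals = {(p, q) for (p, q) in pairs
                  if p in self.finals and q in other.finals}
        return DFA(pairs, self.alphabet, delta, (self.start, other.start), finals)

# L(G) = {a^n | n > 3} over Σ = {a}: state i counts a's, state 4 means "4 or more".
more_than_3 = DFA(range(5), "a", {(i, "a"): min(i + 1, 4) for i in range(5)}, 0, {4})
at_most_3 = more_than_3.complement()
print([n for n in range(8) if at_most_3.accepts("a" * n)])   # [0, 1, 2, 3]

# Made-up second language for the intersection: strings of even length over {a}.
even = DFA({0, 1}, "a", {(0, "a"): 1, (1, "a"): 0}, 0, {0})
both = more_than_3.intersection(even)
print([n for n in range(10) if both.accepts("a" * n)])       # [4, 6, 8]
```

Union then follows from De Morgan's law: L1 ∪ L2 is the complement of (L1' ∩ L2').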
Definition:
Let Σ be a non-empty alphabet.
1. ε is a regular expression.
2. ∅ is a regular expression.
3. For each a ∈ Σ, a is a regular expression.
4. If R1 and R2 are regular expressions, then R1 ∪ R2 is a regular expression.
5. If R1 and R2 are regular expressions, then R1 R2 is a regular expression.
6. If R is a regular expression, then R* is a regular expression.

You can regard 1, 2, and 3 as being the “building blocks” of regular expressions. Items 4, 5
and 6 give rules that can be used to combine regular expressions into new (and larger)
regular expressions.

To give an example, we claim that:
(0 ∪ 1)*101(0 ∪ 1)*
is a regular expression (where the alphabet Σ is equal to {0, 1}). In order to prove this, we
have to show that this expression can be built using the “rules” given in the definition above.
Here we go:
• By point 3, 0 is a regular expression.
• By point 3, 1 is a regular expression.
• Since 0 and 1 are regular expressions, by point 4, 0 ∪ 1 is a regular expression.
• Since 0 ∪ 1 is a regular expression, by point 6, (0 ∪ 1)* is a regular expression.
• Since 1 and 0 are regular expressions, by point 5, 10 is a regular expression.
• Since 10 and 1 are regular expressions, by point 5, 101 is a regular expression.
• Since (0 ∪ 1)* and 101 are regular expressions, by point 5, (0 ∪ 1)*101 is a
regular expression.
• Since (0 ∪ 1)*101 and (0 ∪ 1)* are regular expressions, by point 5, (0 ∪ 1)*101(0 ∪ 1)* is a
regular expression.
Next we define the language that is described by a regular expression.

Let Σ be a non-empty alphabet.


1. The regular expression ε describes the language {ε}.
2. The regular expression ∅ describes the language ∅.
3. For each a ∈ Σ, the regular expression a describes the language {a}.
4. Let R1 and R2 be regular expressions and let L1 and L2 be the languages described by
them, respectively. The regular expression R1 ∪ R2 describes the language L1 ∪ L2.
5. Let R1 and R2 be regular expressions and let L1 and L2 be the languages described by
them, respectively. The regular expression R1 R2 describes the language L1 L2.
6. Let R be a regular expression and let L be the language described by it. The regular
expression R* describes the language L*.
For example:
• The regular expression (0 ∪ ε)(1 ∪ ε) describes the language {01, 0, 1, ε}.
• The regular expression 0 ∪ ε describes the language {0, ε}, whereas the regular
expression 1* describes the language {ε, 1, 11, 111, . . .}.
Therefore, the regular expression (0 ∪ ε)1* describes the language {0, 01, 011, 0111, . . . , ε,
1, 11, 111, . . .}.

Observe that this language is also described by the regular expression 01* ∪ 1*.
• The regular expression 1*∅ describes the empty language, i.e., the language ∅, because concatenating any language with ∅ gives ∅.
• The regular expression ∅* describes the language {ε}, because choosing zero strings from ∅ still yields ε.
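The six rules that assign a language to each regular expression translate almost line by line into a recursive function. In the sketch below, a regular expression is encoded as a nested tuple; this encoding and the function name `lang` are assumptions made for illustration. Because L(R*) is usually infinite, the function only returns the strings of length at most n.

```python
def lang(r, n):
    """Return the strings of length <= n in the language described by r.

    A regular expression r is written as a nested tuple that mirrors rules 1-6:
    ("eps",), ("empty",), ("sym", a), ("union", r1, r2), ("concat", r1, r2), ("star", r1).
    """
    kind = r[0]
    if kind == "eps":                       # rule 1: ε describes {ε}
        return {""}
    if kind == "empty":                     # rule 2: ∅ describes ∅
        return set()
    if kind == "sym":                       # rule 3: a describes {a}
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "union":                     # rule 4: L1 ∪ L2
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":                    # rule 5: L1 L2
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                      # rule 6: L*
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                     # append one more string from the base each round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown constructor {kind!r}")

EPS, EMPTY = ("eps",), ("empty",)
zero, one = ("sym", "0"), ("sym", "1")

print(sorted(lang(("concat", ("union", zero, EPS), ("union", one, EPS)), 3)))
# ['', '0', '01', '1']
print(lang(("concat", ("star", one), EMPTY), 3))   # set()
print(lang(("star", EMPTY), 3))                    # {''}

r1 = ("concat", ("union", zero, EPS), ("star", one))              # (0 ∪ ε)1*
r2 = ("union", ("concat", zero, ("star", one)), ("star", one))    # 01* ∪ 1*
print(lang(r1, 5) == lang(r2, 5))                  # True
```

Comparing `lang(r1, n)` and `lang(r2, n)` for a fixed n only checks the two expressions on short strings; it is a sanity check, not a proof of equivalence.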

2. CONTEXT-FREE GRAMMARS
A Context-Free Grammar (CFG) is more powerful than a regular grammar and is
defined as a 4-tuple:
G = (V, Σ, P, S)
A context-free grammar is a set of recursive rules used to generate patterns of strings.
Context-free grammars are used for defining the syntax of programming languages and
in their compilation.
Context-free grammars (CFGs) are used to describe context-free languages. A context-free
grammar can describe all regular languages and more, but context-free grammars cannot
describe all possible languages.
Context-free grammars are studied in fields of theoretical computer science, compiler
design (in particular parsing), and linguistics. CFGs are used to describe programming
languages and to build parser programs in compilers.
Definition:
A Context-Free Grammar (CFG) is a 4-tuple (V, T, S, P) where:
(i) V is a finite set called the variables (the set of non-terminal symbols). Typically,
non-terminals are represented by uppercase letters (e.g., S, A,
B).
(ii) T is a finite set, disjoint from V, called the terminals. They are usually
represented by lowercase letters (e.g., a, b, c) or specific symbols.
(iii) P is a finite set of rules, each of the form A => α. The left-hand side A can only be a
single variable; it cannot be a terminal. The right-hand side α can be a string of variables,
terminals, or a combination of both (possibly empty).
(iv) S ∈ V is the start variable.
Examples of Context-Free Languages:
(a) The grammar G = ({S}, {a, b}, S, P) with productions
S => aSa, S => bSb,
S => λ is context free.

S => aSa
=> aaSaa
=> aabSbaa
=> aabbaa

(b) The grammar G, with production rules given by
S => abB,
A => aaBb, B => bbAa,
A => λ is context free.
Relationship of CFG with other computation models
Context-free languages are described by context-free grammars and are recognized
by pushdown automata, just as regular languages are recognized by finite state
machines. Finite state machines can recognize some context-free languages (the regular
ones), but not all. Turing machines can recognize all regular languages, all context-free
languages, and more.
Any language that can be described by a regular expression can also be generated by a
context-free grammar. All regular languages are context-free languages, but not all
context-free languages are regular.
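Grammar (a) in the examples above generates exactly the even-length palindromes over {a, b}, i.e. strings of the form x x^R where x^R is x reversed. A pushdown automaton for this language pushes the first half of the input onto its stack, guesses nondeterministically where the middle is, and then pops one symbol for each symbol it reads. The sketch below simulates that nondeterminism by exploring every reachable configuration; the function name and the configuration encoding are assumptions made for illustration.

```python
def accepts_even_palindrome(w: str) -> bool:
    """Simulate a nondeterministic PDA for {x x^R : x in {a, b}*}.

    A configuration is (phase, position in w, stack). In phase 'push' the
    machine pushes each symbol it reads; at any moment it may guess that it
    has reached the middle and switch to phase 'pop', where every symbol read
    must match (and pop) the top of the stack."""
    configs = {("push", 0, ())}
    seen = set()
    while configs:
        config = configs.pop()
        if config in seen:
            continue
        seen.add(config)
        phase, i, stack = config
        if phase == "pop" and i == len(w) and not stack:
            return True                          # input consumed, stack empty: accept
        if phase == "push":
            configs.add(("pop", i, stack))       # ε-move: guess the middle is here
            if i < len(w) and w[i] in "ab":
                configs.add(("push", i + 1, stack + (w[i],)))
        elif i < len(w) and stack and stack[-1] == w[i]:
            configs.add(("pop", i + 1, stack[:-1]))
    return False

print([w for w in ["", "abba", "aba", "abab", "baab"] if accepts_even_palindrome(w)])
# ['', 'abba', 'baab']
```

A finite automaton cannot do this, because the amount of information to remember (the whole first half) is unbounded; the stack is what provides that memory.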

There are grammars called context-sensitive grammars which are more powerful
(meaning they can generate more complex languages that might require more memory)
than both regular and context-free grammars.
3. CONTEXT SENSITIVE GRAMMARS AND LANGUAGES
A context-sensitive grammar is a formal grammar in which the left-hand sides and right-
hand sides of any production rules may be surrounded by a context of terminal and non-
terminal symbols.
Context-sensitive grammars are more general than context-free grammars, in the sense that
there are languages that can be described by a CSG but not by any context-free grammar.
A context-sensitive language is a language generated by a context-sensitive grammar.
Definition:
A context-sensitive grammar is one whose productions are all of the form
xAy => xvy
where A ∈ V, x, y ∈ (V ∪ T)* and v ∈ (V ∪ T)+ (that is, v is not empty).
“Context-sensitive” reflects the fact that the actual string modification is given by A => v,
while x and y provide the context in which the rule may be applied.
For example: S => abc | aAbc
Ab => bA
Ac => Bbcc
bB => Bb
aB => aa | aaA
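A derivation in a grammar like this one can be checked step by step: each sentential form must follow from the previous one by rewriting a single occurrence of some left-hand side. In the sketch below, the rule list transcribes the example grammar, the derivation of aabbcc was worked out by hand for illustration, and the helper names `one_step` and `is_derivation` are hypothetical.

```python
# Productions of the context-sensitive example ('|' alternatives listed separately).
CSG_RULES = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
             ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]

def one_step(u: str, v: str, rules) -> bool:
    """True if v follows from u by rewriting one occurrence of some rule's left side."""
    for lhs, rhs in rules:
        pos = u.find(lhs)
        while pos != -1:
            if u[:pos] + rhs + u[pos + len(lhs):] == v:
                return True
            pos = u.find(lhs, pos + 1)
    return False

def is_derivation(forms, rules) -> bool:
    """Check that each consecutive pair of sentential forms is a single derivation step."""
    return all(one_step(u, v, rules) for u, v in zip(forms, forms[1:]))

# S => aAbc => abAc => abBbcc => aBbbcc => aabbcc
derivation = ["S", "aAbc", "abAc", "abBbcc", "aBbbcc", "aabbcc"]
print(is_derivation(derivation, CSG_RULES))                  # True
print(is_derivation(["S", "aAbc", "aAbbcc"], CSG_RULES))     # False
```

The rules Ab => bA and bB => Bb only move a variable past a neighbouring b, which is exactly where the surrounding context matters.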

RELATIONSHIPS BETWEEN GRAMMARS
THEOREM 1: Every context-free language is context-sensitive.
THEOREM 2: There exists a context-sensitive language that is not context-free.
THEOREM 3: Every context-sensitive language is recursive.
Other Forms of Generative Grammars
Many extensions and variations on Chomsky's original hierarchy of formal grammars have
been developed, both by linguists and by computer scientists, usually either in order to
increase their expressive power or in order to make them easier to analyse or parse.
Some forms of grammars developed include:
• Tree-adjoining grammars increase the expressiveness of conventional generative
grammars by allowing rewrite rules to operate on parse trees instead of just strings.
• Affix grammars and attribute grammars allow rewrite rules to be augmented with
semantic attributes and operations, useful both for increasing grammar expressiveness
and for constructing practical language translation tools.
• Analytic Grammars

THE CHOMSKY HIERARCHY
The Chomsky hierarchy is a hierarchy of the classes of formal grammars.
The Chomsky Hierarchy, as originally defined by Noam Chomsky in 1956, comprises
four types of languages, their associated grammars, and the types of machines
that recognize them.

The Chomsky Hierarchy is shown in Table 1:

Table 1 : Chomsky Hierarchy
• The Unrestricted grammars are classified as Type 0.
• Type 1 grammars generate context-sensitive languages.
• Type 2 grammars generate context-free languages and
• Type 3 grammars generate regular languages.

The languages generated by each type of grammar are recognised by a particular type of automaton.
For example:
Type-2 grammars generate languages recognised by pushdown automata.
Type-3 grammars generate languages recognised by finite state automata.
The unrestricted grammars can express any language that can be accepted by a
Turing machine.
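The type of a grammar can be read off the shapes of its productions. The sketch below (hypothetical function `chomsky_type`) returns the most restrictive type that every production satisfies. It assumes upper-case letters are non-terminals, lower-case letters are terminals and "" is the empty string. Type 1 is tested with the non-contracting condition |α| ≤ |β|, which is known to generate the same languages as the xAy => xvy form (setting aside the empty string); the test ignores the usual special case S => λ and does not check that every left-hand side contains a non-terminal.

```python
def chomsky_type(productions) -> int:
    """Return the most restrictive Chomsky type (3, 2, 1 or 0) that fits every production.

    Assumed conventions: upper-case letters are non-terminals, lower-case letters
    are terminals, and "" stands for the empty string.
    """
    def is_var(c):
        return c.isupper()

    def linear(lhs, rhs, side):
        # Type 3: A -> xB (right-linear) or A -> Bx (left-linear), or A -> x,
        # where x is a possibly empty string of terminals.
        if len(lhs) != 1 or not is_var(lhs):
            return False
        if side == "right":
            body = rhs[:-1] if rhs and is_var(rhs[-1]) else rhs
        else:
            body = rhs[1:] if rhs and is_var(rhs[0]) else rhs
        return not any(is_var(c) for c in body)

    if (all(linear(l, r, "right") for l, r in productions)
            or all(linear(l, r, "left") for l, r in productions)):
        return 3
    if all(len(l) == 1 and is_var(l) for l, _ in productions):
        return 2      # Type 2: a single non-terminal on every left-hand side
    if all(len(l) <= len(r) for l, r in productions):
        return 1      # Type 1, tested in its non-contracting form |α| <= |β|
    return 0

G1 = [("A", "Aa"), ("A", "Ab"), ("A", "a"), ("A", "b"), ("A", "")]
PALINDROMES = [("S", "aSa"), ("S", "bSb"), ("S", "")]
ANBNCN = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]
CONTRACTING = [("S", "aS"), ("aS", "b")]   # made-up rules, only to show Type 0

print([chomsky_type(g) for g in (G1, PALINDROMES, ANBNCN, CONTRACTING)])  # [3, 2, 1, 0]
```

The checks run from the most restrictive type downwards, so a grammar is reported at the highest level of the hierarchy it fits.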

PARSING
A grammar can be used in two ways:
(a) Using the grammar to generate strings of the language.
(b) Using the grammar to recognize the strings.

“Parsing” a string is finding a derivation (or a derivation tree) for that string.
Parsing a string also recognizes it: if a derivation is found, the string belongs to the
language. In practice, strings of a context-free language are recognized by parsing them.
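For the grammar S => aSa | bSb | λ from the examples above, parsing is especially simple, because the outermost symbols of the remaining input decide which rule was used. The hypothetical parser below returns the derivation itself, which makes the link between parsing and recognition concrete. It is specific to this grammar; general methods such as the CYK algorithm listed in the course outline work for any context-free grammar in Chomsky normal form.

```python
def parse_palindrome(w: str):
    """Return a derivation of w in the grammar S => aSa | bSb | λ, or None if w
    is not in the language.

    The outermost symbols of the remaining input decide the rule, so the parser
    peels off one matching pair per step and never needs to backtrack."""
    forms = ["S"]
    prefix, rest = "", w
    while rest:
        if len(rest) < 2 or rest[0] != rest[-1] or rest[0] not in "ab":
            return None                      # no production can produce this part
        prefix += rest[0]                    # apply S => aSa or S => bSb
        rest = rest[1:-1]
        forms.append(prefix + "S" + prefix[::-1])
    forms.append(prefix + prefix[::-1] or "λ")   # finish with S => λ
    return forms

print(" => ".join(parse_palindrome("aabbaa")))
# S => aSa => aaSaa => aabSbaa => aabbaa
print(parse_palindrome("abab"))   # None
```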

CONCLUSION
In this lecture, you have been introduced to the concept of formal grammars. Grammars
are very important in the field of automata theory since they are the building blocks of
languages.

SELF EXERCISE
1. What do you understand by grammars?
2. Give examples of context-free grammars.
3. Distinguish among the following grammar types:
a. Regular Grammars
b. Context-Free Grammars
c. Analytic Grammars.
4. Discuss the Chomsky hierarchy.
5. What is the relationship amongst the various types of grammars described in the Chomsky
hierarchy?
COURSE OUTLINE

FINITE AUTOMATA
• NFA
• Regular Expressions
• Regular Languages
• Two-way finite automata
• Finite automata with output

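Definition: A Nondeterministic Finite Automaton (NFA) is also
defined by a 5-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of states, Σ is the input
alphabet, δ: Q × Σ → 2^Q is the transition function, q0 ∈ Q is the initial state and F ⊆ Q is
the set of final states.

An NFA differs from a DFA in that the range of δ in an NFA is the power set 2^Q. A string is
accepted by an NFA if there is some sequence of possible moves that will put the machine in a
final state at the end of the string.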

Example 1: Obtain an NFA for a language consisting of all strings over {0,1} containing a 1 in
the third position from the end.
Solution:

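q1 is the initial state and q4 is the final state.

Note that this is an NFA: on input 1 the machine in state q1 may either stay in q1 or move to q2,
i.e. δ(q1, 1) = {q1, q2}. From q2 either symbol leads to q3 (δ(q2, 0) = δ(q2, 1) = q3), and from
q3 either symbol leads to q4.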
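NFA acceptance ("some sequence of moves ends in a final state") can be checked by tracking the set of all states the machine could be in. Because the state diagram of Example 1 is not reproduced here, the transition table below assumes the standard construction for this language: q1 loops on 0 and 1 and, on reading a 1, may guess that it is the third symbol from the end. The names `DELTA` and `nfa_accepts` are illustrative.

```python
# Assumed transitions (standard construction): (state, symbol) -> set of states.
# Missing pairs, such as any move out of q4, lead to the empty set.
DELTA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},   # on a 1, q1 may guess "third from the end"
    ("q2", "0"): {"q3"}, ("q2", "1"): {"q3"},
    ("q3", "0"): {"q4"}, ("q3", "1"): {"q4"},
}
START, FINALS = "q1", {"q4"}

def nfa_accepts(w: str) -> bool:
    """Follow all possible moves at once by tracking the set of reachable states."""
    current = {START}
    for symbol in w:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & FINALS)

for w in ["100", "0101", "011", "1"]:
    print(w, nfa_accepts(w))   # True, True, False, False
```

The sets of states visited here are exactly the states of the DFA that the subset construction would build from this NFA.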

Example 2: Determine an NFA accepting the language

Solution:

We shall come back to NFAs later.
REGULAR EXPRESSION
Regular Languages.
The regular languages are those languages that can be constructed from the “big three” set
operations viz., (a) Union (b) Concatenation (c) Kleene star. A regular language is defined as
follows.

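Definition: Let Σ be an alphabet. The class of “regular languages” over Σ is defined inductively
as follows:
(i) ∅ is a regular language, and {a} is a regular language for each a ∈ Σ;
(ii) if L1 and L2 are regular languages, then so are L1 ∪ L2, L1L2 and L1*;
(iii) no other language over Σ is regular.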
Regular Expressions:
Regular expressions are designed to represent regular languages with a mathematical
tool, a tool built from a set of primitives and operations. This representation involves
a combination of strings of symbols from some alphabet Σ, parentheses and the
operators +, ⋅ and *. A regular expression is obtained from the symbols of Σ (e.g., a, b, c),
the empty string ε and the empty set ∅ with the operations +, ⋅ and * (i.e., union,
concatenation and Kleene star).

Examples:
0 + 1 represents the set {0, 1}
1 represents the set {1}
0 represents the set {0}
(0 + 1)1 represents the set {01, 11}
(a + b).(b + c) represents the set {ab, bb, ac, bc}
(0 + 1)* = ε + (0 + 1) + (0 + 1)(0 + 1) + … = Σ*
(0 + 1)+ = (0 + 1)(0 + 1)* = Σ+ = Σ* − {ε}
Building Regular Expressions
Assume that Σ = {a, b, c}.
a* means “zero or more instances of a concatenated together”, so a* = {λ, a, aa, aaa,
…}.
To say “zero or more ab’s”, we write (ab)* = {λ, ab, abab, …}.

Also, a+ means one or more instances of a concatenated together,
e.g. a+ = {a, aa, aaa, …}.
Languages defined by Regular Expressions
There is a very simple correspondence between regular expressions and the languages they denote:

Regular Expression      L(Regular Expression)
x, for each x ∈ Σ       {x}
λ                       {λ}
∅                       {} (the empty set)
(r1)                    L(r1)
r1*                     L(r1)*
r1 r2                   L(r1)L(r2)
r1 + r2                 L(r1) ∪ L(r2)
TWO-WAY FINITE AUTOMATA
Two-way finite automata are machines that can read the input string in either direction.
These machines have a “read head”, which can move left or right over the input
string. Like ordinary finite automata, two-way finite automata have a finite set Q of
states, and they can be either deterministic (2DFA) or nondeterministic (2NFA). They
accept only regular sets, just like ordinary finite automata. Let us assume that the
symbols of the input string occupy the cells of a finite tape, one symbol per cell, so that
the tape holds ⊢ a1 a2 … an ⊣: the left and right end markers ⊢ and ⊣ enclose the input
string. The end markers are not part of the input alphabet Σ.
Definition:
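A 2DFA is an octuple M = (Q, Σ, ⊢, ⊣, δ, s, t, r)
where Q is a finite set of states,
Σ is a finite input alphabet,
⊢ is the left end marker, ⊢ ∉ Σ,
⊣ is the right end marker, ⊣ ∉ Σ,
δ: Q × (Σ ∪ {⊢, ⊣}) → Q × {L, R} is the transition function,
s ∈ Q is the start state,
t ∈ Q is the accept state, and
r ∈ Q is the reject state, r ≠ t,

such that for all states q:
• δ(q, ⊢) = (u, R) for some u ∈ Q, so the head never moves off the left end of the tape;
• δ(q, ⊣) = (v, L) for some v ∈ Q, so the head never moves off the right end of the tape;
• for every b ∈ Σ ∪ {⊢}, δ(t, b) = (t, R) and δ(r, b) = (r, R), while δ(t, ⊣) = (t, L) and
δ(r, ⊣) = (r, L), so once the machine enters t or r it stays there.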
δ takes a state and a symbol as arguments and returns a new state
and a direction to move the head i.e., if δ(p, b) = (q, d), then
whenever the machine is in state p and scanning a tape cell
containing symbol b, it moves its head one cell in the direction d and
enters the state q.
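The way δ(p, b) = (q, d) drives the head can be illustrated with a small simulation. The machine in the sketch is hypothetical: it accepts strings over {a, b} that contain at least one a and at least one b, by first scanning right for an a and then scanning left for a b. The end markers are written ⊢ and ⊣ in the code. An ordinary one-way DFA can recognize this language too (2DFAs accept only regular sets); the example only shows the two-way head movement. For simplicity the simulation stops as soon as it enters t or r instead of modelling the moves that keep the machine in those states.

```python
LEFT, RIGHT = "⊢", "⊣"    # end markers; they are not in Σ = {a, b}

# Hypothetical 2DFA: accept strings containing at least one a and at least one b.
# Pass 1 (states s, p) moves right looking for an a; pass 2 (state q) moves left
# looking for a b. Each entry is delta(state, symbol) = (new state, direction).
DELTA = {
    ("s", LEFT): ("s", "R"), ("s", "b"): ("s", "R"), ("s", "a"): ("p", "R"),
    ("s", RIGHT): ("r", "L"),                 # reached the right end without an a
    ("p", "a"): ("p", "R"), ("p", "b"): ("p", "R"),
    ("p", RIGHT): ("q", "L"),                 # an a was seen: turn around
    ("q", "a"): ("q", "L"), ("q", "b"): ("t", "R"),
    ("q", LEFT): ("r", "R"),                  # back at the left end without a b
}

def run_2dfa(w, start="s", accept="t", reject="r", max_steps=10_000):
    """Move the head one cell per step as dictated by delta; stop in t or r."""
    tape = [LEFT, *w, RIGHT]
    state, head = start, 0
    for _ in range(max_steps):
        if state == accept:
            return True
        if state == reject:
            return False
        state, direction = DELTA[(state, tape[head])]
        head += 1 if direction == "R" else -1
    raise RuntimeError("step limit reached; the machine may be looping")

for w in ["ab", "ba", "aa", "bbb", ""]:
    print(repr(w), run_2dfa(w))
```

The step limit is a safeguard: unlike a one-way automaton, a two-way machine can in general move back and forth forever.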

FINITE AUTOMATA WITH OUTPUT
Definition: A finite-state machine M = (Q, Σ, O, δ, λ, q0) consists of a finite set Q of states, a finite
input alphabet Σ, a finite output alphabet O, a transition function δ that assigns a new state to each
state–input pair, an output function λ that assigns an output to each state–input pair, and an initial
state q0. Let M = (Q, Σ, O, δ, λ, q0) be a finite-state machine. A state table is used to denote the
values of the transition function δ and the output function λ for all pairs of states and inputs.

Mealy Machine: Usually, finite automata have binary output, i.e., they either accept the string or do
not accept it. This is decided on the basis of whether a final state is reached from the initial
state. Removing this restriction, we consider a model whose outputs can be chosen from
some other alphabet.
The value of the output function F(t) is, in the most general case, a function of the present state q(t) and
the present input x(t):
F(t) = λ(q(t), x(t))
where λ is called the output function. This model is called the “Mealy machine”.
A Mealy machine is a six-tuple (Q, Σ, O, δ, λ, q0) where all the symbols except λ have the same meaning
as discussed in the section above. λ is the output function mapping Q × Σ into O.
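A Mealy machine can be simulated directly from its two tables δ and λ. The machine below is made up for illustration: with Σ = O = {0, 1}, it outputs 1 when the current input symbol equals the previous one and 0 otherwise, so its state only needs to remember the previous symbol. The names `DELTA`, `LAMBDA` and `run_mealy` are hypothetical.

```python
# Hypothetical Mealy machine over Σ = O = {0, 1}. The state remembers the previous
# input symbol ("start" before anything has been read).
STATES = ("start", "0", "1")
DELTA = {(q, x): x for q in STATES for x in "01"}                     # δ(q, x)
LAMBDA = {(q, x): "1" if q == x else "0" for q in STATES for x in "01"}  # λ(q, x)

def run_mealy(inputs: str, q0: str = "start") -> str:
    """Return the outputs F(1)F(2)..., where F(t) = λ(q(t), x(t))."""
    state, outputs = q0, []
    for x in inputs:
        outputs.append(LAMBDA[(state, x)])   # output depends on the state AND the input
        state = DELTA[(state, x)]
    return "".join(outputs)

print(run_mealy("0011101"))   # 0101100
```

Each output symbol is produced on a transition, which is what distinguishes a Mealy machine from a model whose output depends on the state alone.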

Moore Machine: (To be continued)

TWO-WAY FINITE AUTOMATA
Two-way finite automata are machines that can read the input string in either direction.
This type of machine has a “read head” that can move left or right over the input
string. Like ordinary finite automata, two-way finite automata have a finite set Q of
states, and they can be either deterministic (2DFA) or nondeterministic (2NFA). They
accept only regular sets, just like ordinary finite automata. Let us assume that the
symbols of the input string occupy the cells of a finite tape, one symbol per cell.
The left and right end markers ⊢ and ⊣ enclose the input string. The
end markers are not part of the input alphabet Σ.

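Definition:
A 2DFA is an octuple M = (Q, Σ, ⊢, ⊣, δ, s, t, r)
where Q is a finite set of states,
Σ is a finite input alphabet,
⊢ is the left end marker, ⊢ ∉ Σ,
⊣ is the right end marker, ⊣ ∉ Σ,
δ: Q × (Σ ∪ {⊢, ⊣}) → Q × {L, R} is the transition function,
s ∈ Q is the start state,
t ∈ Q is the accept state, and
r ∈ Q is the reject state, r ≠ t,

such that for all states q,
δ(q, ⊢) = (u, R) for some u ∈ Q,
δ(q, ⊣) = (v, L) for some v ∈ Q,
and for all symbols b ∈ Σ ∪ {⊢},
δ(t, b) = (t, R), δ(r, b) = (r, R),
δ(t, ⊣) = (t, L), δ(r, ⊣) = (r, L).
That is, the head never moves off either end marker, and once the machine enters t or r it stays there.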

δ takes a state and a symbol as arguments and returns a new state
and a direction to move the head i.e., if δ(p, b) = (q, d), then
whenever the machine is in state p and scanning a tape cell
containing symbol b, it moves its head one cell in the direction d and
enters the state q.
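The Python sketch below simulates a 2DFA by keeping a state and a head position on the tape ⊢ w ⊣. The example machine is hypothetical (its states s, c, t, r and the language "strings over {a, b} whose last symbol is a" are our own choice): it scans right to ⊣, steps back one cell, and inspects the last symbol. As a simplification, the sketch stops as soon as t or r is entered rather than letting the machine keep moving, and it gives up after `max_steps` because a 2DFA can loop forever.

```python
# Hypothetical 2DFA over Σ = {a, b} that accepts strings whose last symbol is a.
LEFT_END, RIGHT_END = "⊢", "⊣"

DELTA = {  # δ(p, b) = (q, d): in state p reading b, enter q and move in direction d
    ("s", LEFT_END): ("s", "R"),
    ("s", "a"): ("s", "R"),
    ("s", "b"): ("s", "R"),
    ("s", RIGHT_END): ("c", "L"),  # hit ⊣: turn around and check the last symbol
    ("c", "a"): ("t", "R"),        # last symbol is a: go to the accept state t
    ("c", "b"): ("r", "R"),        # last symbol is b: go to the reject state r
    ("c", LEFT_END): ("r", "R"),   # empty input: reject
    ("c", RIGHT_END): ("r", "L"),  # unreachable; listed so δ is total on c
}


def run_2dfa(word, delta=DELTA, start="s", accept="t", reject="r", max_steps=10_000):
    tape = [LEFT_END, *word, RIGHT_END]  # input enclosed by the end markers
    state, head = start, 0               # head starts on the left end marker
    for _ in range(max_steps):
        if state == accept:
            return True
        if state == reject:
            return False
        state, direction = delta[(state, tape[head])]
        head += 1 if direction == "R" else -1
    raise RuntimeError("no decision after max_steps; the 2DFA may be looping")


print(run_2dfa("ba"))  # True
print(run_2dfa("ab"))  # False
```

Since every δ(q, ⊢) moves right and every δ(q, ⊣) moves left, the head index never leaves the tape.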

DETERMINISTIC FINITE AUTOMATA – DFA
What is an Automaton? An automaton is an abstract model of a digital computer.
It has a mechanism to read input, which is a string over a given alphabet. This
input is placed on an “input file”, which the automaton can read but cannot
change.
The input file is divided into cells, each of which can hold
one symbol. The automaton has a temporary “storage”
device, which has an unlimited number of cells, the contents of
which can be altered by the automaton. The automaton has a
control unit, which is said to be in one of a finite number of
“internal states”. The automaton can change state in a
defined way.
(Figure: A model of an automaton)
Types of Automata: There are two types of automata:
(a) Deterministic Automata
(b) Non-deterministic Automata
A deterministic automaton is one in which each move (i.e., transition
from one state to another) is uniquely determined by the current configuration:
if the internal state, input and contents of the storage are known, it is
possible to predict the future behaviour of the automaton. An automaton
without this property is a non-deterministic automaton.
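Definition:

A Deterministic Finite Automaton (DFA) is a 5-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of
internal states, Σ is a finite set of input symbols (the input alphabet), δ: Q × Σ → Q is the
transition function, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states.

The input mechanism can move only from left to right and reads
exactly one symbol on each step. The transitions from one internal
state to another are governed by the transition function δ.
If δ(q0, a) = q1, then if the DFA is in state q0 and the current
input symbol is a, the DFA will go into state q1.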
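A DFA can be represented directly as a lookup table for δ. The Python sketch below (the helper name `run_dfa` and the example automaton are hypothetical) reads the input left to right, one symbol per step, takes exactly one transition per symbol, and accepts if the state reached at the end is in F. The example machine uses δ(q0, a) = q1 as in the text and accepts strings over {a, b} with an even number of a's.

```python
def run_dfa(delta, start, finals, word):
    """Simulate a DFA: read `word` left to right, one symbol per step."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # exactly one next state per (state, symbol)
    return state in finals


# Hypothetical example: strings over {a, b} with an even number of a's.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}
print(run_dfa(DELTA, "q0", {"q0"}, "abab"))  # True (two a's)
print(run_dfa(DELTA, "q0", {"q0"}, "ab"))    # False (one a)
```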


Discussion:
The machine M accepts an input string as long as three consecutive b’s have not
been read. It should be noted that:
(i) M is in state qi (where i = 0, 1, or 2) immediately after reading a run of i
consecutive b’s that either began the input string or was preceded by an ‘a’.
(ii) If an ‘a’ is read while M is in state q0, q1, or q2, M returns to its initial state q0.

q0, q1 and q2 are “final states” (as given in the problem). Therefore any input string
not containing three consecutive b’s will be accepted. If three consecutive b’s are read,
state q3 is reached; q3 is not a final state, and M
remains in this state irrespective of the rest of the string. This state q3
is called a “dead state”, and M is said to be “trapped” at q3. The DFA
schematic based on the discussion above is shown below.

(Figure: Finite automaton with four states)
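The discussion maps directly onto a transition table. In the Python sketch below (state names follow the discussion; the helper name `accepts` is our own), an ‘a’ sends q0, q1 or q2 back to q0, a ‘b’ moves qi to q(i+1), and q3 is a dead state that loops on both symbols.

```python
# DFA of Example 1 over {a, b}: q_i (i = 0, 1, 2) means "the current run of b's
# has length i"; q3 is the dead state reached after three consecutive b's.
DELTA = {}
for i in range(3):
    DELTA[(f"q{i}", "a")] = "q0"         # an 'a' resets the run of b's
    DELTA[(f"q{i}", "b")] = f"q{i + 1}"  # a 'b' extends the run
DELTA[("q3", "a")] = DELTA[("q3", "b")] = "q3"  # trapped in the dead state
FINALS = {"q0", "q1", "q2"}


def accepts(word):
    state = "q0"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINALS


print(accepts("abbabb"))  # True: no run of three b's
print(accepts("abbba"))   # False: 'bbb' traps M in q3
```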
Example 2: Determine the DFA schematic for M = (Q, Σ, δ, q1, F)
where Q = {q1, q2, q3}, Σ = {0, 1}, q1 is the start state, F = {q2}
and δ is given by the table below.

(Table: transition function δ)

Also determine a language L recognized by the DFA.
Solution:

From the given table for δ, the DFA is drawn, where q2 is the only final state.
(Note that a DFA “accepts” a string but “recognizes” a language: “accept” is used
for strings and “recognize” for languages.)
It can be seen that the DFA accepts strings that have at least one 1 and an even
number of 0s following the last 1. Hence the language L is given by
L = {w | w contains at least one 1 and an even number of 0s follow the last 1}.
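The transition table for this example is not reproduced in these notes, so the Python sketch below assumes one table that is consistent with the stated language: q1 loops on 0 and moves to q2 on 1; q2 loops on 1 and moves to q3 on 0; q3 moves back to q2 on either symbol. The assumption is then checked by comparing the DFA against a direct test of the language description on every string of length up to 8. The helper names are hypothetical.

```python
from itertools import product

# ASSUMED transition table (the original table is not shown here).
DELTA = {
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q2", ("q3", "1"): "q2",
}


def dfa_accepts(word, start="q1", finals=frozenset({"q2"})):
    state = start
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in finals


def in_language(word):
    """At least one 1, and an even number of 0s after the last 1."""
    if "1" not in word:
        return False
    tail = word[word.rindex("1") + 1:]  # the part after the last 1 (all 0s)
    return len(tail) % 2 == 0


# Compare the DFA with the language description on all strings of length <= 8.
mismatches = [
    w for n in range(9)
    for w in ("".join(p) for p in product("01", repeat=n))
    if dfa_accepts(w) != in_language(w)
]
print(mismatches)  # []
```

Under this assumed table, q2 means "a 1 has been seen and an even number of 0s follow the last 1", and q3 means the count of trailing 0s is odd.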
Example 3: Sketch the DFA given

M = ({q1, q2}, {0, 1}, δ, q1, {q2})

and δ is given by

δ(q1, 0) = q1
δ(q2, 0) = q1
δ(q1, 1) = q2
δ(q2, 1) = q2

Determine a language L(M) that the DFA recognizes.


Solution:
From the given data, it is easy to predict the schematic of the DFA as
follows.
Internal states = q1, q2.
Symbols = 0, 1.
Transition function = δ (as defined above in the given problem).
q1 = Initial state
q2 = Final state

(Figure: The state diagram of the DFA)

If a string ends in a 0, it is “rejected”; it is “accepted” only if
the string ends in a 1.
Therefore the language L(M) = {w | w ends in a 1}.
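To see why L(M) = {w | w ends in a 1}, it helps to trace the states M visits. The Python sketch below encodes δ exactly as given in this example; `trace` is a hypothetical helper that returns the list of states visited, and the last state is q2 exactly when the last symbol read was a 1.

```python
# δ exactly as given in Example 3.
DELTA = {
    ("q1", "0"): "q1", ("q2", "0"): "q1",
    ("q1", "1"): "q2", ("q2", "1"): "q2",
}


def trace(word, start="q1"):
    """Return the list of states visited while reading `word`, starting at `start`."""
    states = [start]
    for symbol in word:
        states.append(DELTA[(states[-1], symbol)])
    return states


def accepts(word):
    return trace(word)[-1] == "q2"  # q2 is the only final state


print(trace("0101"))    # ['q1', 'q1', 'q2', 'q1', 'q2']
print(accepts("0110"))  # False: ends in 0
```

Every transition on 0 leads to q1 and every transition on 1 leads to q2, so the state after each step depends only on the symbol just read.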

Example 4: Design a DFA, the language recognized by the
automaton being
L = {aⁿb : n ≥ 0}
Solution:
The DFA accepts all strings consisting of an
arbitrary number of a’s, followed by a single b. All other
input strings are rejected.
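Since the DFA diagram is not reproduced here, the Python sketch below gives one possible DFA for L = {aⁿb : n ≥ 0} under our own state names: q0 reads any number of a's, q1 is reached by the single b and is the only final state, and q2 is a dead state for anything that follows. It is a reasonable design, not necessarily the one drawn in the original slide.

```python
# One possible DFA for L = {a^n b : n >= 0}; state names are our own choice.
DELTA = {
    ("q0", "a"): "q0",  # any number of a's
    ("q0", "b"): "q1",  # the single b
    ("q1", "a"): "q2",  # anything after the b is an error
    ("q1", "b"): "q2",
    ("q2", "a"): "q2",  # dead state: trapped forever
    ("q2", "b"): "q2",
}
FINALS = {"q1"}


def accepts(word):
    state = "q0"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINALS


for w in ["b", "aaab", "", "aa", "abb", "aba"]:
    print(repr(w), accepts(w))
```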
Example 5: Obtain the state table and state transition diagram
(DFA schematic) of the finite state automaton M = (Q, Σ, δ, q0, F),
where Q = {q0, q1, q2, q3}, Σ = {a, b}, q0 is the initial state, F is
the set of final states, and the transitions are defined by
δ(q0, a) = q2    δ(q0, b) = q1
δ(q1, a) = q3    δ(q1, b) = q0
δ(q2, a) = q0    δ(q2, b) = q3
δ(q3, a) = q1    δ(q3, b) = q2

Solution:
The state table is as shown below (→ marks the initial state).

State    a     b
→q0      q2    q1
q1       q3    q0
q2       q0    q3
q3       q1    q2

With the given definitions, the state transition diagram (DFA schematic) is drawn directly from this table.
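One way to read this automaton is that each state records the parity of the number of a's and b's read so far (q0: both even, q1: a's even and b's odd, q2: a's odd and b's even, q3: both odd). That interpretation is our own observation rather than part of the original problem, so the Python sketch below encodes the given δ and checks it exhaustively for all strings of length up to 8.

```python
from itertools import product

# δ exactly as given in Example 5.
DELTA = {
    ("q0", "a"): "q2", ("q0", "b"): "q1",
    ("q1", "a"): "q3", ("q1", "b"): "q0",
    ("q2", "a"): "q0", ("q2", "b"): "q3",
    ("q3", "a"): "q1", ("q3", "b"): "q2",
}
# (parity of #a, parity of #b) -> state; 0 = even, 1 = odd.
PARITY_STATE = {(0, 0): "q0", (0, 1): "q1", (1, 0): "q2", (1, 1): "q3"}


def final_state(word, start="q0"):
    state = start
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state


# Check the parity interpretation on every string over {a, b} of length <= 8.
for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        expected = PARITY_STATE[(w.count("a") % 2, w.count("b") % 2)]
        assert final_state(w) == expected, w
print("parity interpretation holds for all strings up to length 8")
```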
Example 6: Obtain the DFA that accepts/recognizes the language
L(M) = {w | w ∈ {a, b, c}* and w contains the pattern abac}
(Note: This is an application of DFAs involving searching a text for a
specified pattern.)
Solution: Let us begin by “hard coding” the pattern into the
machine’s states as shown in fig. (a) below.
As the pattern ‘abac’ has length four, four states are
required in addition to the initial state q0 to remember the
pattern. q4 is the only accepting state required, and this state
can be reached only after reading ‘abac’. The complete DFA
is as shown below in this slide.
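The “hard coding plus fall back” idea can be automated. The Python sketch below builds a pattern-searching DFA using the standard construction also used in Knuth-Morris-Pratt string matching: state i means that the longest prefix of the pattern which is a suffix of the input read so far has length i. `build_pattern_dfa` and `contains_pattern` are hypothetical helper names, and the resulting automaton may be drawn differently from the one on the slide while recognizing the same language.

```python
def build_pattern_dfa(pattern, alphabet):
    """DFA whose state i (0..m) is the length of the longest prefix of `pattern`
    that is a suffix of the input read so far; state m is absorbing (accepting)."""
    m = len(pattern)
    delta = {}
    for i in range(m + 1):
        for c in alphabet:
            if i == m:  # pattern already found: stay in the accepting state
                delta[(i, c)] = m
                continue
            text = pattern[:i] + c
            k = i + 1  # longest candidate prefix length
            while k > 0 and not text.endswith(pattern[:k]):
                k -= 1  # fall back to a shorter prefix
            delta[(i, c)] = k
    return delta


def contains_pattern(word, pattern="abac", alphabet="abc"):
    delta = build_pattern_dfa(pattern, alphabet)
    state = 0  # q0
    for c in word:
        state = delta[(state, c)]
    return state == len(pattern)  # q4 for 'abac'


print(build_pattern_dfa("abac", "abc")[(3, "b")])  # 2: after 'abab', 'ab' is still matched
```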
Exercise: Determine the languages recognized by the following FAs:

(1) FA (i) and FA (ii) (figures)

(2) *Construct a finite state machine that accepts only positive integers that are evenly
divisible by 4. Hint: Use Σ = {0, 1}.

* Intermediate
Language of a DFA
A DFA A accepts a string w if there is a path from q0 to an accepting (or final) state that is
labeled by w,

i.e., L(A) = { w | δ(q0, w) ∈ F },
where δ(q0, w) denotes the state reached from q0 after reading all of w.

That is, L(A) = all strings that lead to an accepting state from q0.
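The extended transition function can be defined recursively: δ(q, ε) = q and δ(q, xa) = δ(δ(q, x), a). The Python sketch below uses that definition to list the members of L(A) up to a given length, taking the DFA of Example 3 (strings ending in 1) as the automaton; the helper names `delta_hat` and `language_up_to` are hypothetical.

```python
from itertools import product

# DFA of Example 3: accepts strings over {0, 1} that end in 1.
DELTA = {
    ("q1", "0"): "q1", ("q2", "0"): "q1",
    ("q1", "1"): "q2", ("q2", "1"): "q2",
}
START, FINALS, SIGMA = "q1", {"q2"}, "01"


def delta_hat(q, w):
    """Extended transition function: δ(q, ε) = q, δ(q, xa) = δ(δ(q, x), a)."""
    if w == "":
        return q
    return DELTA[(delta_hat(q, w[:-1]), w[-1])]


def language_up_to(n):
    """All w with |w| <= n and δ(q0, w) in F, in order of length."""
    return [
        "".join(p)
        for k in range(n + 1)
        for p in product(SIGMA, repeat=k)
        if delta_hat(START, "".join(p)) in FINALS
    ]


print(language_up_to(2))  # ['1', '01', '11']
```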

Non-deterministic Finite Automata (NFA)
A Non-deterministic Finite Automaton (NFA) is, of course, “non-deterministic”,
implying that the machine can exist in more than one state at the same time.
Transitions could be non-deterministic: for example, from state qi, input 1 could lead to both qj and qk.
Each transition therefore maps to a set of states.

A Non-deterministic Finite Automaton (NFA) consists of:
Q ==> a finite set of states
Σ ==> a finite set of input symbols (alphabet)
q0 ==> a start state
F ==> a set of accepting states
δ ==> a transition function, which is a mapping from Q × Σ to subsets of Q
An NFA is thus also defined by the 5-tuple
(Q, Σ, q0, F, δ)

How to use an NFA?
Input: a word w in Σ*
Question: Is w accepted by the NFA?
Steps:
Start at the “start state” q0.
For every input symbol in the sequence w, determine all possible next states from all current
states, given the current input symbol in w and the transition function.
If, after all symbols in w are consumed, at least one of the current states is a final state, then
accept w;
otherwise, reject w.
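These steps translate almost line for line into code. The Python sketch below keeps the set of all current states and replaces it, symbol by symbol, with the union of the possible next states; a missing table entry is treated as Φ. `nfa_accepts` is a hypothetical helper name, and the example NFA is the one for strings containing 01 whose transition table appears in these notes.

```python
def nfa_accepts(delta, start, finals, word):
    """Simulate an NFA by tracking the set of all current states."""
    current = {start}
    for symbol in word:
        # Union of δ(q, symbol) over all current states; missing entries mean Φ.
        current = {nxt for q in current for nxt in delta.get((q, symbol), set())}
    return bool(current & finals)  # accept if some current state is final


# The NFA for strings containing 01.
DELTA_01 = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},                       # δ(q1, 0) = Φ
    ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"},
}
print(nfa_accepts(DELTA_01, "q0", {"q2"}, "1001"))  # True
print(nfa_accepts(DELTA_01, "q0", {"q2"}, "1100"))  # False
```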

Regular expression: (0+1)*01(0+1)*
NFA for strings containing 01
(Figure: start → q0; q0 loops on 0, 1; q0 goes to q1 on 0; q1 goes to q2 on 1; final state q2 loops on 0, 1.)
Why is this non-deterministic? What will happen if at state q1 an input of 0 is received?
• Q = {q0, q1, q2}
• Σ = {0, 1}
• start state = q0
• F = {q2}
• Transition table:

States    0           1
→q0       {q0, q1}    {q0}
q1        Φ           {q2}
*q2       {q2}        {q2}
What is an “error state”?
A DFA for recognizing the keyword “while”:
(Figure: q0 goes to q1 on w, q1 to q2 on h, q2 to q3 on i, q3 to q4 on l, q4 to q5 on e; any other input symbol leads to the error state qerr, which loops on any symbol.)
An NFA for the same purpose:
(Figure: the same chain from q0 to q5 on w, h, i, l, e, with no error state drawn; transitions into a dead state are implicit.)
Note: Omitting error states from the diagram is just a matter of design convenience
(one that is generally followed for NFAs); this feature should not be confused with the notion of non-determinism.
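The same convenience carries over to code. The Python sketch below (hypothetical names) stores only the transitions that spell out “while” and treats every missing entry as a move into qerr. The machine is still deterministic; leaving qerr out of the table is only a shorthand.

```python
KEYWORD = "while"

# States q0..q5: q_i means the first i letters of "while" have been read.
# Only the "correct" transitions are listed; any missing entry is an
# implicit transition into the error (dead) state qerr.
DELTA = {(f"q{i}", ch): f"q{i + 1}" for i, ch in enumerate(KEYWORD)}
FINAL = f"q{len(KEYWORD)}"  # q5


def recognizes_keyword(word):
    state = "q0"
    for ch in word:
        state = DELTA.get((state, ch), "qerr")  # undefined -> error state
        if state == "qerr":
            return False  # qerr loops on every symbol, so we can stop early
    return state == FINAL


print(recognizes_keyword("while"))   # True
print(recognizes_keyword("whiles"))  # False
```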


Example #2
Build an NFA for the following language:
L = { w | w ends in 01 }
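One possible answer, sketched in Python under our own state names: q0 loops on 0 and 1, and on a 0 it also “guesses” that the final 01 has started by moving to q1; q1 moves to the accepting state q2 on 1; q2 has no outgoing transitions, so any further symbol kills that branch. This is a common construction, not necessarily the one intended in the lecture.

```python
# One possible NFA for L = {w | w ends in 01}; missing entries mean Φ.
DELTA = {
    ("q0", "0"): {"q0", "q1"},  # stay, or guess that the final "01" starts here
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
FINALS = {"q2"}


def accepts(word):
    current = {"q0"}
    for symbol in word:
        current = {n for q in current for n in DELTA.get((q, symbol), set())}
    return bool(current & FINALS)


print(accepts("1101"))  # True
print(accepts("011"))   # False
```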
Other examples
Keyword recognizer (e.g., if, then, else, while, for, include, etc.)
Strings where the first symbol is present somewhere later on at least once

Language of an NFA
An NFA accepts w if there exists at least one path from the start state to an accepting (or
final) state that is labeled by w:
L(N) = { w | δ(q0, w) ∩ F ≠ Φ }, where δ(q0, w) is the set of states reachable from q0 by reading w.

Advantages of NFA
Great for modeling regular expressions
String processing - e.g., grep, lexical analyzer

Could a non-deterministic state machine be implemented in practice?


Probabilistic models could be viewed as extensions of non-deterministic state machines
(e.g., toss of a coin, a roll of dice)
They are not the same though
A parallel computer could exist in multiple “states” at the same time

Technologies for NFAs

Micron’s Automata Processor (introduced in 2013)

2D array of MISD (multiple instruction single data) fabric with
thousands to millions of processing elements.
Each input symbol is fed to all states (i.e., cores) at once.
Non-determinism using circuits
http://www.micronautomata.com/

But DFAs and NFAs are equivalent in their power to capture languages!
Differences: DFA vs. NFA
DFA
1. All transitions are deterministic: each transition leads to exactly one state.
2. For each state, a transition on every possible symbol (of the alphabet) should be defined.
3. Accepts input if the last state visited is in F.
4. Sometimes harder to construct because of the number of states.
5. Practical implementation is feasible.
NFA
1. Some transitions could be non-deterministic: a transition could lead to a subset of states.
2. Not all symbol transitions need to be defined explicitly (if undefined, they go to an error
state; this is just a design convenience, not to be confused with “non-determinism”).
3. Accepts input if one of the last states is in F.
4. Generally easier than a DFA to construct.
5. Practical implementations are limited but emerging (e.g., Micron automata processor).
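The equivalence of DFAs and NFAs is usually shown with the subset construction, in which each DFA state is a set of NFA states. The Python sketch below (a hypothetical helper `subset_construction` that explores only reachable subsets, for NFAs without ε-moves) converts the NFA for strings containing 01 from these notes into a DFA.

```python
from collections import deque


def subset_construction(nfa_delta, start, finals, alphabet):
    """Convert an NFA (no ε-moves) to a DFA whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            nxt = frozenset(n for q in current for n in nfa_delta.get((q, a), set()))
            dfa_delta[(current, a)] = nxt  # the empty set would act as a dead state
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    dfa_finals = {S for S in seen if S & finals}  # any subset containing a final state
    return dfa_delta, start_set, dfa_finals, seen


NFA_01 = {  # NFA for strings containing 01 (δ(q1, 0) = Φ is simply omitted)
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
    ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"},
}
delta, start, finals, states = subset_construction(NFA_01, "q0", {"q2"}, "01")


def dfa_accepts(word):
    state = start
    for a in word:
        state = delta[(state, a)]  # exactly one next state, as in any DFA
    return state in finals


print(len(states))  # 4
```

For this NFA only four of the eight possible subsets are reachable: {q0}, {q0, q1}, {q0, q2} and {q0, q1, q2}.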

Regular Expressions

Reading: Chapter 3
