Lesson 22 CFG and CNF 11012023 022658pm 21022024 043832pm
Theory Of Automata
Instructor: Tahir Iqbal
Context Free Grammar (CFG)
The earliest computers accepted no
instructions other than those of their own assembly
language. Every procedure, no matter how
complicated, had to be encoded in a set of
instructions such as LOAD, STORE and ADD the
contents of two registers. The major
problem was how to enter mathematical formulas
such as the following
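[Formula for S: a two-dimensional expression involving the pairs 8 and 0, 7 and 10, 11 and 10, superscripts 2 and a division by 2; its exact layout is not recoverable from the extracted text.]
or
\[ A = \frac{\frac{1}{2} + 9}{4 + \frac{8}{21} + \frac{5}{3 + \frac{1}{2}}} \]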
So it was necessary to develop a way of
writing such expressions in one line of
standard typewriter symbols, so that
a high level language could be invented.
Before the invention of computers, no one
would have dreamed of writing such a
complicated formula with parentheses, e.g. the
right side of the formula for A can be written as
((1/2)+9)/(4+(8/21)+(5/(3+(1/2))))
A program written in a high level language is converted into
assembly language code by a program called a
compiler.
The compiler takes the user's program as
its input and prints out an equivalent program
written in assembly language.
Like spoken languages, high level languages
for computers also have a grammar. But
in the case of computers, the grammatical rules
do not involve the meaning of the words.
Grammatical rules
that involve the meaning of words are called
semantic, while those that do not involve the
meaning of the words are called syntactic.
E.g. in English the sentence "Buildings sing"
is ruled out by its meaning, while in a computer language
one number is as good as another,
e.g. X = B + 10, X = B + 999
Following is a remark
Remark
In general, the rules of computer language
grammar, are all syntactic and not semantic.
A law of grammar is in reality a suggestion
for possible substitutions.
CFG terminologies
CFG
A CFG is a collection of the following
1. An alphabet Σ of letters called terminals, from
which the strings are formed that will be the
words of the language.
2. A set of symbols called non-terminals, one of
which, S, stands for "start here".
3. A finite set of productions of the form
non-terminal → finite string of terminals and/or
non-terminals.
Following is a note in this regard
Note
The terminals are designated by small
letters, while the non-terminals are
designated by capital letters.
There is at least one production that has the
non-terminal S as its left side.
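The conventions in this note can be sketched in code. The following is only an illustration: the dictionary layout and the function name check_cfg are assumptions made here, with the empty Python string standing for the null string.

```python
# Hypothetical representation: a CFG as a dict that maps each non-terminal
# (a capital letter) to the list of right sides of its productions.
def check_cfg(productions, start="S"):
    """Return True if the productions follow the conventions of the note."""
    if start not in productions or not productions[start]:
        return False  # at least one production must have S as its left side
    for left, alternatives in productions.items():
        if not (len(left) == 1 and left.isupper()):
            return False  # a left side is a single capital letter
        for right in alternatives:
            if not all(ch.isalpha() for ch in right):
                return False  # right sides consist of letters only
    return True

grammar = {"S": ["aS", ""]}        # S -> aS and S -> null string
print(check_cfg(grammar))          # True
print(check_cfg({"X": ["a"]}))     # False: no production for S
```

Small letters on a right side are read as terminals and capital letters as non-terminals, exactly as the note prescribes.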
Context Free Language (CFL)
The language generated by a CFG is called a
Context Free Language (CFL).
Example:
Σ = {a}
productions:
1. S → aS
2. S → Λ
Applying production (1) six times and then
production (2) once, the word aaaaaa is
generated as
S ⇒ aS
⇒ aaS
⇒ aaaS
⇒ aaaaS
⇒ aaaaaS
⇒ aaaaaaS
⇒ aaaaaaΛ
= aaaaaa
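The derivation above can be replayed mechanically. This is a minimal sketch, assuming the working string is kept as a Python string and the null string is written as the empty string; the helper name apply is chosen here for illustration.

```python
def apply(working, left, right):
    """Replace the first occurrence of the non-terminal `left` by `right`."""
    if left not in working:
        raise ValueError("production is not applicable")
    return working.replace(left, right, 1)

steps = ["S"]
for _ in range(6):                        # production 1, S -> aS, six times
    steps.append(apply(steps[-1], "S", "aS"))
steps.append(apply(steps[-1], "S", ""))   # production 2, S -> null string
print(" => ".join(steps))
```

The last entry of steps is the generated word, and every earlier entry is a working string that still contains S.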
Example
Σ = {a}
productions:
1. S → SS
2. S → a
3. S → Λ
This grammar also defines the language
expressed by a*.
Note: Λ has a special status. It is not a
non-terminal, since no production has Λ as its
left side. For a certain non-terminal N there may
be a production N → Λ. This simply means that N
can be deleted when it comes up in the working
string.
Example
Σ = {a,b}
productions:
1. S → X
2. S → Y
3. X → Λ
4. Y → aY
5. Y → bY
6. Y → a
7. Y → b
All words of this language are either of X-type
or of Y-type, i.e. while generating a word the
first production used is S → X or S → Y. The
only word of X-type is Λ, while the words
of Y-type are the non-empty finite strings of a's and
b's, i.e. (a+b)+. Thus the language
defined is expressed by (a+b)*.
Example
Σ = {a,b}
productions:
1. S → aS
2. S → bS
3. S → a
4. S → b
5. S → Λ
This grammar also defines the language
expressed by (a+b)*.
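The claim that both of the last two grammars define (a+b)* can be checked for short words. The sketch below is not a general algorithm: it assumes that every production which keeps a non-terminal also adds a terminal, which holds for these two grammars, and the function name words_up_to is an assumption.

```python
from collections import deque
from itertools import product

def words_up_to(productions, max_len, start="S"):
    """Collect all words of length <= max_len by breadth-first expansion
    of the leftmost non-terminal of each working string."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        working = queue.popleft()
        pos = next((i for i, ch in enumerate(working) if ch.isupper()), None)
        if pos is None:
            found.add(working)          # no non-terminal left: a word
            continue
        for right in productions[working[pos]]:
            new = working[:pos] + right + working[pos + 1:]
            # prune: terminals already written can never disappear
            if sum(ch.islower() for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

first = {"S": ["X", "Y"], "X": [""], "Y": ["aY", "bY", "a", "b"]}
second = {"S": ["aS", "bS", "a", "b", ""]}
all_strings = {"".join(p) for n in range(4) for p in product("ab", repeat=n)}
print(words_up_to(first, 3) == all_strings)    # True
print(words_up_to(second, 3) == all_strings)   # True
```

Agreement up to length 3 is of course only evidence, not a proof that the languages coincide.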
Example
Σ = {a,b}
productions:
1. S → XaaX
2. X → aX
3. X → bX
4. X → Λ
This grammar defines the language
expressed by (a+b)*aa(a+b)*.
Summing Up
Context Free Grammar (CFG), terminals, non-terminals,
productions, Context Free Language (CFL),
examples.
Example
Σ = {a,b}
productions:
1. S → SS
2. S → XS
3. S → Λ
4. S → YSY
5. X → aa
6. X → bb
7. Y → ab
8. Y → ba
This grammar generates the EVEN-EVEN
language.
Example
Σ = {a,b}
productions:
1. S → aB
2. S → bA
3. A → a
4. A → aS
5. A → bAA
6. B → b
7. B → bS
8. B → aBB
This grammar generates the language
EQUAL (the language of strings with the
number of a's equal to the number of b's).
Note
If the same non-terminal
has more than one production, they can be
written on a single line, e.g.
S → aS, S → bS, S → Λ can be written as
S → aS|bS|Λ
It may also be noted that the productions
S → SS|Λ always define a language that
is closed w.r.t. concatenation, i.e. a language
expressed by an RE of type r*. Similarly, the
production S → SS without S → Λ defines a
language expressed by an RE of type r+.
Example
Consider the following CFG, Σ = {a,b}
productions:
1. S → YXY
2. Y → aY|bY|Λ
3. X → bbb
It can be observed that, using prod. 2, Y
generates Λ, a and b. Y
also generates all the combinations of a and
b. Thus Y generates the strings expressed by
(a+b)*. It may also be observed that the
above CFG generates the language expressed
by (a+b)*bbb(a+b)*. Following are four
words generated by the given CFG
(1) S ⇒ YXY
⇒ aYbbbΛ
⇒ abYbbb
⇒ abΛbbb
= abbbb
(2) S ⇒ YXY
⇒ bYbbbaY
⇒ bΛbbbabY
⇒ bbbbabbY
⇒ bbbbabbaY
⇒ bbbbabbaΛ
= bbbbabba
(3) S ⇒ YXY
⇒ ΛbbbaY
⇒ bbbabY
⇒ bbbabaY
⇒ bbbabaΛ
= bbbaba
(4) S ⇒ YXY
⇒ bYbbbaY
⇒ bΛbbbaΛ
= bbbba
Example
Consider the following CFG
1. S → SS|XaXaX|Λ
2. X → bX|Λ
It can be observed that, using prod. 2, X
generates Λ. X generates any number of
b's. Thus X generates the strings
expressed by b*. It may also be observed
that the above CFG generates the
language expressed by (b*ab*ab*)*.
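Example
Consider the following CFG
Σ = {a,b}
productions:
S → aSa|bSb|a|b|Λ
The above CFG generates the language
PALINDROME. It may be noted that the
CFG
S → aSa|bSb|a|b|aa|bb generates the language
NON-NULL PALINDROME. Simply dropping S → Λ is not
enough, since S → aSa|bSb|a|b generates only the
palindromes of odd length.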
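The productions of the PALINDROME grammar, including the null production, can be read directly as a recursive test. The following is a sketch under that reading; the function name derives is an assumption, and the empty Python string stands for the null string.

```python
from itertools import product

def derives(word):
    """Check whether S can derive `word` with S -> aSa | bSb | a | b | null."""
    if word == "" or word in ("a", "b"):
        return True                    # S -> null string, S -> a, S -> b
    if len(word) >= 2 and word[0] == word[-1] and word[0] in "ab":
        return derives(word[1:-1])     # S -> aSa or S -> bSb
    return False

# Compare with the usual definition (word equals its reverse) on short strings.
short = ["".join(p) for n in range(6) for p in product("ab", repeat=n)]
print(all(derives(w) == (w == w[::-1]) for w in short))   # True
```

Each recursive call strips one matching pair of outer letters, which mirrors one application of S → aSa or S → bSb.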
Example
Consider the following CFG
Σ = {a,b}
productions:
S → aSb|ab|Λ
It can be observed that the CFG generates the
language {a^n b^n : n = 0,1,2,3, …}. It may also be
noted that the language {a^n b^n : n = 1,2,3, …} can
be generated by the following CFG S → aSb|ab
Task
Construct a CFG that generates the language
L = {w ∈ {a,b}* : length(w) ≥ 2 and the second
letter of w from the right is a}
Example
Consider the following CFG
(1) S → aXb|bXa (2) X → aX|bX|Λ
The above CFG generates the language of
strings, defined over Σ = {a,b}, beginning and
ending in different letters.
Task
Construct the CFG for the language of strings,
defined over Σ = {a,b}, beginning and ending in
the same letter.
Trees
Just as any sentence of the English language can be
expressed by a parse tree, any word generated
by a given CFG can also be expressed by a
parse tree, e.g.
consider the following CFG
S → AA
A → AAA|bA|Ab|a
Obviously, baab can be generated by the above
CFG. To express the word baab as a parse tree,
start with S. Replace S by the string AA of
nonterminals, drawing downward lines
from S to each character of this string as
follows
      S
     / \
    A   A
Now let the left A be replaced by bA and the
right one by Ab, then the tree will be
        S
      /   \
     A     A
    / \   / \
   b   A A   b
Replacing both remaining A's by a, the above tree will
be
        S
      /   \
     A     A
    / \   / \
   b   A A   b
       | |
       a a
Thus the word baab is generated. The above
tree generating the word baab is also called a
syntax tree, generation tree or
derivation tree.
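The tree for baab can also be written down as a data structure. This is a minimal sketch, assuming each node is a pair of a symbol and a list of children; the names leaf and word_of are chosen here for illustration.

```python
def leaf(symbol):
    """A terminal node: a symbol with no children."""
    return (symbol, [])

# S -> AA, left A -> bA, right A -> Ab, both inner A -> a
tree = ("S", [
    ("A", [leaf("b"), ("A", [leaf("a")])]),
    ("A", [("A", [leaf("a")]), leaf("b")]),
])

def word_of(node):
    """Read the leaves from left to right to recover the generated word."""
    symbol, children = node
    if not children:
        return symbol
    return "".join(word_of(child) for child in children)

print(word_of(tree))   # baab
```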
Solution of the task
Construct a CFG that generates the language L =
{w ∈ {a,b}* : length(w) ≥ 2 and the second letter of
w from the right is a}
It can be observed that the language L can be
expressed by (a+b)*(aa+ab). Here the
nonterminal S should be replaced by the string XY of
nonterminals, where X should generate
the strings corresponding to (a+b)* and Y
should generate the strings corresponding to
(aa+ab). Thus the required CFG may be
(1) S → XY (2) X → aX|bX|Λ (3) Y → aa|ab
Task
Construct the CFG for the language of strings,
defined over Σ = {a,b}, beginning and ending in
the same letter.
Solution:
It may be noted that the above language
contains the strings a and b as well. So
the required CFG must also generate the strings a and
b. Thus the required CFG
may be
S → aXa|bXb|a|b
X → aX|bX|Λ
Example
Consider the following CFG
S → S+S|S*S|number
where S and number are non-terminals and the
operators behave like terminals.
The above CFG creates ambiguity, as the
expression 3+4*5 has two possible readings,
(3+4)*5=35 and 3+(4*5)=23, which can be
expressed by the following production trees
(i)     S              (ii)      S
      / | \                    / | \
     S  +  S                  S  *  S
     |   / | \              / | \   |
     3  S  *  S            S  +  S  5
        |     |            |     |
        4     5            3     4
The expressions can be calculated starting
from the bottom to the top, replacing each
nonterminal by the result of its calculation, e.g.
(i)     S                  S
      / | \              / | \
     3  +  S      ⇒     3  +  20     ⇒   23
         / | \
        4  *  5
Similarly
(ii)      S                S
        / | \            / | \
       S  *  5    ⇒     7  *  5      ⇒   35
     / | \
    3  +  4
The ambiguity that has been observed in this
example can be removed with a change in the
CFG as discussed in the following example
Example
S → (S+S)|(S*S)|number
where S and number are nonterminals, while
(, *, +, ) and the numbers are terminals.
Here it can be observed that
1. S ⇒ (S+S)
⇒ (S+(S*S))
⇒ (3+(4*5)) = 23
2. S ⇒ (S*S)
⇒ ((S+S)*S)
⇒ ((3+4)*5) = 35
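The ambiguity of the grammar without parentheses, S → S+S|S*S|number, can be made concrete by computing the value of every production tree of 3+4*5. The sketch below assumes the expression is given as a list of tokens; the function name values is an assumption.

```python
def values(tokens):
    """Return the value of every production tree of `tokens`
    under S -> S+S | S*S | number."""
    if len(tokens) == 1:
        return [tokens[0]]                  # S -> number
    results = []
    for i in range(1, len(tokens), 2):      # choose the operator at the root
        for left in values(tokens[:i]):
            for right in values(tokens[i + 1:]):
                if tokens[i] == "+":
                    results.append(left + right)
                else:
                    results.append(left * right)
    return results

print(sorted(values([3, "+", 4, "*", 5])))   # [23, 35]
```

With the parenthesized grammar the root operator is fixed by the outermost pair of parentheses, so only one tree remains for each expression.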
Polish Notation (o-o-o)
There is another notation for arithmetic
expressions for the CFG S → S+S|S*S|number.
Consider the derivation trees (i) and (ii) of 3+4*5
given above. The arithmetic expressions shown by
these trees can be calculated from the following trees,
respectively
(i)   S          (ii)    S
      |                  |
      +                  *
     / \                / \
    3   *              +   5
       / \            / \
      4   5          3   4
Here most of the S's are eliminated.
The branches are connected directly with the
operators. Moreover, the operators + and * are
no longer terminals, as they are to be replaced
by numbers (results).
To write the arithmetic expression, traverse
the tree starting from the left side of S and
going onward around the tree. The arithmetic
expressions will be as under
(i) + 3 * 4 5 (ii) * + 3 4 5
The above notation is called operator prefix
notation.
To evaluate such a string of characters, the first
substring (from the left) of the form
operator-operand-operand (o-o-o) is found and
replaced by its value; this is repeated until a
single number remains, e.g.
(i) + 3 * 4 5 = + 3 20 = 23
(ii) * + 3 4 5 = * 7 5 = 35
It may be noted that 4*5+3 is an infix
arithmetic expression, while an arithmetic expression in
(o-o-o) form is a prefix arithmetic expression.
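The o-o-o rule translates almost word for word into code. This is a sketch only, assuming the expression is a list whose operators are the strings "+" and "*" and whose operands are integers; the function name evaluate_prefix is an assumption.

```python
def evaluate_prefix(tokens):
    """Repeatedly replace the first operator-operand-operand triple by its value."""
    tokens = list(tokens)
    while len(tokens) > 1:
        for i in range(len(tokens) - 2):
            op, x, y = tokens[i:i + 3]
            if op in ("+", "*") and isinstance(x, int) and isinstance(y, int):
                tokens[i:i + 3] = [x + y if op == "+" else x * y]
                break
        else:
            raise ValueError("not a valid prefix expression")
    return tokens[0]

print(evaluate_prefix(["+", 3, "*", 4, 5]))   # 23
print(evaluate_prefix(["*", "+", 3, 4, 5]))   # 35
```

Scanning from the left on every pass is simple rather than fast; it is chosen here because it follows the rule exactly as stated.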
Consider another example as follows
Example
To calculate the arithmetic expression of the
following tree
              S
              |
              *
             / \
            +   6
           / \
          *   5
         / \
        +   +
       / \ / \
      1  2 3  4
it can be written as
* + * + 1 2 + 3 4 5 6
The above arithmetic expression in (o-o-o)
form can be calculated as
* + * + 1 2 + 3 4 5 6 = * + * 3 + 3 4 5 6
= * + * 3 7 5 6 = * + 21 5 6 = * 26 6 = 156.
Following is a note
Note
The previous prefix arithmetic expression can be
converted into the following infix arithmetic
expression as
*+*+1 2+3 4 5 6
= *+*+1 2 (3+4) 5 6
= *+*(1+2) (3+4) 5 6
= *(((1+2)*(3+4)) + 5) 6
= (((1+2)*(3+4)) + 5)*6
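The conversion from prefix to infix can be sketched recursively: an operator takes the next two complete sub-expressions as its operands. The token-list input and the function name to_infix are assumptions made for this illustration, and the result is fully parenthesized.

```python
def to_infix(tokens):
    """Convert a prefix token list to a fully parenthesized infix string."""
    def parse(pos):
        token = tokens[pos]
        if token in ("+", "*"):
            left, pos = parse(pos + 1)      # first operand
            right, pos = parse(pos)         # second operand
            return "(" + left + token + right + ")", pos
        return str(token), pos + 1          # a number
    text, end = parse(0)
    if end != len(tokens):
        raise ValueError("extra tokens after the expression")
    return text

print(to_infix(["*", "+", "*", "+", 1, 2, "+", 3, 4, 5, 6]))
# ((((1+2)*(3+4))+5)*6)
```

Apart from the outermost pair of parentheses this is the same infix expression as the one obtained by hand in the note.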
Task
Convert the following infix expressions
into the corresponding prefix expressions.
Calculate the values of the expressions as
well
1. 2*(3+4)*5
2. ((4+5)*6)+4
Ambiguous CFG
A CFG is said to be ambiguous if there
exists at least one word of its language that
can be generated by different production
trees.
Example: Consider the following CFG
S → aS|Sa|a
The word aaa can be generated by the
following three different trees
    S          S          S
   / \        / \        / \
  a   S      a   S      S   a
     / \        / \    / \
    a   S      S   a  S   a
        |      |      |
        a      a      a
Thus the above CFG is ambiguous, while the
CFG S → aS|a is not ambiguous, as neither the
word aaa nor any other word can be derived
from more than one production tree. The
derivation tree for aaa is as follows
    S
   / \
  a   S
     / \
    a   S
        |
        a
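Ambiguity can be tested for a particular word by counting its production trees. The following sketch assumes a grammar without null productions and without cycles of unit productions, so that every symbol produces at least one letter; the function name count_trees is an assumption.

```python
from functools import lru_cache

def count_trees(productions, word, start="S"):
    """Count the production trees of `word` (no null productions assumed)."""
    @lru_cache(maxsize=None)
    def symbol(sym, text):
        if sym.islower():                       # a terminal matches itself only
            return 1 if text == sym else 0
        return sum(sequence(right, text) for right in productions[sym])

    @lru_cache(maxsize=None)
    def sequence(symbols, text):
        if not symbols:
            return 1 if text == "" else 0
        total = 0
        # the first symbol takes `cut` letters, leaving at least one per symbol
        for cut in range(1, len(text) - len(symbols) + 2):
            first = symbol(symbols[0], text[:cut])
            if first:
                total += first * sequence(symbols[1:], text[cut:])
        return total

    return symbol(start, word)

print(count_trees({"S": ["aS", "Sa", "a"]}, "aaa"))   # 4
print(count_trees({"S": ["aS", "a"]}, "aaa"))         # 1
```

For S → aS|Sa|a the count for aaa is 4, so the three trees drawn above are examples rather than a complete list; any count greater than 1 shows ambiguity.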
Recap lecture 33
Example of trees, Polish Notation,
examples, Ambiguous CFG, example,
Solution of the Task
Convert the following infix expressions into the
corresponding prefix expressions. Calculate the
values of the expressions as well
1. 2*(3+4)*5
Solution: 2*(+ 3 4)*5 = (* 2 + 3 4)*5 = * * 2 + 3 4 5, which can be calculated as
* * 2 + 3 4 5 = * * 2 7 5 = * 14 5 = 70
2. ((4+5)*6)+4
Solution: ((+ 4 5)*6)+4 = (* + 4 5 6)+4 = + * + 4 5 6 4, which can
be calculated as
+ * + 4 5 6 4 = + * 9 6 4 = + 54 4 = 58
Example
Consider the following CFG
S → aS | bS | aaS | Λ
It can be observed that the word aaa can be
derived from more than one production
tree. Thus, the above CFG is ambiguous.
This ambiguity can be removed by
removing the production S → aaS
Following is another example
Example
Consider the CFG of the language
PALINDROME
S → aSa|bSb|a|b|Λ
It may be noted that this CFG is unambiguous,
as every word of the language
PALINDROME can be generated by only one
production tree.
It may be noted that if the production
S → aaSaa is added to the given CFG, the CFG
thus obtained will no longer be unambiguous.
Total language tree
For a given CFG, consider the tree that has the start symbol S
as its root and whose nodes are working strings
of terminals and non-terminals, where the
descendants of each node are all possible
results of applying every applicable production to the
working string. This tree is called the total
language tree. Following is an example of a
total language tree
Example
Consider the following CFG
S → aa|bX|aXX
X → ab|b, then the total language tree for the
given CFG may be (each level of indentation is one level of the tree)
S
  aa
  bX
    bab
    bb
  aXX
    aabX
      aabab
      aabb
    abX
      abab
      abb
    aXab
      aabab
      abab
    aXb
      aabb
      abb
It may be observed from the previous total
language tree that dropping the repeated
words, the language generated by the given
CFG is
{aa, bab, bb, aabab, aabb, abab, abb}
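The total language tree of the CFG S → aa|bX|aXX, X → ab|b can be walked mechanically. This is a sketch under the assumption that the tree is finite, as it is for this CFG; the function names expand and language are chosen here.

```python
def expand(productions, working):
    """All results of applying one production to one non-terminal of `working`."""
    children = []
    for pos, sym in enumerate(working):
        for right in productions.get(sym, []):
            child = working[:pos] + right + working[pos + 1:]
            if child not in children:
                children.append(child)
    return children

def language(productions, start="S"):
    """Walk the whole tree and collect its words (terminates for finite trees only)."""
    words, stack = set(), [start]
    while stack:
        working = stack.pop()
        if all(ch not in productions for ch in working):
            words.add(working)              # no non-terminal left: a word
        stack.extend(expand(productions, working))
    return words

cfg = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}
print(expand(cfg, "aXX"))       # ['aabX', 'abX', 'aXab', 'aXb']
print(sorted(language(cfg)))
```

Collecting the words in a set drops the repeated words automatically, which is the step described in the text.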
Example
Consider the following CFG
S → X|b, X → aX
then the following will be the total language tree of
the above CFG
S
  X
    aX
      aaX
        …
  b
Note: the only word in this language is b, since
the production X → aX never removes the nonterminal X.
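[Figure: transition graphs with initial state S-, final state + and intermediate states A and B, with edges labelled aa, bb, a and b; the layout and the CFG they belong to are missing from the extracted text.]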
The corresponding RE may be (aa+bb)+.
Following is another example
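Example
Consider the following CFG
S → aaS|bbS|abX|baX|Λ
X → aaX|bbX|abS|baS,
then the corresponding TG will be
[Figure: TG with two states, S (both initial and final, marked ±) and X; each state has a loop labelled aa,bb, and edges labelled ab,ba run from S to X and from X back to S.]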
A → a
B → b
C → AB
S → CC|BC|AC|CB|CA|AA|AB|BA|BB|a|b
is the CFG in CNF.
Task
Convert the following CFG to be in CNF
S → ABAB
A → a|Λ
B → b|Λ
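Whether a CFG is in CNF can be checked production by production, using the standard requirement that every right side is either exactly two nonterminals or a single terminal. The sketch below assumes the dictionary layout used for illustration only, with the empty string standing for Λ; the function name is_cnf and the small second grammar are assumptions, not taken from the lecture.

```python
def is_cnf(productions):
    """True if every right side is two nonterminals or one terminal."""
    for alternatives in productions.values():
        for right in alternatives:
            two_nonterminals = len(right) == 2 and right.isupper()
            one_terminal = len(right) == 1 and right.islower()
            if not (two_nonterminals or one_terminal):
                return False
    return True

task = {"S": ["ABAB"], "A": ["a", ""], "B": ["b", ""]}
small = {"S": ["AB"], "A": ["a"], "B": ["b"]}   # hypothetical grammar in CNF
print(is_cnf(task))    # False: ABAB is too long and the null productions are not allowed
print(is_cnf(small))   # True
```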
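Example
To construct an FA that accepts the language generated by the grammar
S → abA
A → baB
B → aA|bb
The language can be identified by the three
words generated as follows
(i) S ⇒ abA
⇒ abbaB (using A → baB)
⇒ abbabb (using B → bb)
(ii) S ⇒ abA
⇒ abbaB (using A → baB)
⇒ abbaaA (using B → aA)
⇒ abbaabaB (using A → baB)
⇒ abbaababb (using B → bb)
(iii) S ⇒ abA
⇒ abbaB (using A → baB)
⇒ abbaaA (using B → aA)
⇒ abbaabaB (using A → baB)
⇒ abbaabaaA (using B → aA)
⇒ abbaabaabaB (using A → baB)
⇒ abbaabaababb (using B → bb)
which shows that the corresponding language
has RE abba(aba)*bb. Thus the FA
accepting the language of the given CFG may be
[Figure: FA with the states S- (initial), A, B and + (final) joined by the path S- a, b A b, a B b, b +, with an a-edge leading from B back to A; the remaining transitions appear to lead to a state with a loop labelled a,b. The exact layout is not recoverable from the extracted text.]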
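The RE abba(aba)*bb can be compared with words derived from the grammar S → abA, A → baB, B → aA|bb. This is a small sketch using Python's re module; the function name derive and the way the choices for B are passed in are assumptions.

```python
import re

def derive(choices):
    """Follow S -> abA, then A -> baB and one choice for B ('aA' or 'bb') per round."""
    working = "abA"                               # S -> abA
    for choice in choices:
        working = working.replace("A", "baB")     # A -> baB
        working = working.replace("B", choice)    # B -> aA or B -> bb
    return working

pattern = re.compile(r"abba(aba)*bb")
words = [derive(["aA"] * n + ["bb"]) for n in range(4)]
print(words[:3])   # ['abbabb', 'abbaababb', 'abbaabaababb']
print(all(pattern.fullmatch(w) for w in words))   # True
```

The first three words are exactly the words (i), (ii) and (iii) derived above.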
Left most derivation
Definition: The derivation of a word w,
generated by a CFG, such that at each step, a
production is applied to the left most
nonterminal in the working string, is said to be
left most derivation.
It is to be noted that the nonterminal that
occurs first from the left in the working string,
is said to be left most nonterminal. Following
is an example
Example
Consider the following CFG
S → XY
X → XX|a
Y → YY|b
then following are the two left most
derivations of aaabb
(1) S ⇒ XY
⇒ XXY
⇒ aXY
⇒ aXXY
⇒ aaXY
⇒ aaaY
⇒ aaaYY
⇒ aaabY
⇒ aaabb
(2) S ⇒ XY
⇒ XXY
⇒ XXXY
⇒ aXXY
⇒ aaXY
⇒ aaaY
⇒ aaaYY
⇒ aaabY
⇒ aaabb
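That both derivations of aaabb are left most can be verified step by step. The sketch below assumes a derivation is given as a list of working strings; the function name is_leftmost is an assumption.

```python
def is_leftmost(steps, productions):
    """Check that each step rewrites the left most nonterminal by some production."""
    for before, after in zip(steps, steps[1:]):
        pos = next((i for i, ch in enumerate(before) if ch.isupper()), None)
        if pos is None:
            return False                    # nothing left to rewrite
        candidates = [before[:pos] + right + before[pos + 1:]
                      for right in productions[before[pos]]]
        if after not in candidates:
            return False
    return True

cfg = {"S": ["XY"], "X": ["XX", "a"], "Y": ["YY", "b"]}
first = ["S", "XY", "XXY", "aXY", "aXXY", "aaXY", "aaaY", "aaaYY", "aaabY", "aaabb"]
second = ["S", "XY", "XXY", "XXXY", "aXXY", "aaXY", "aaaY", "aaaYY", "aaabY", "aaabb"]
print(is_leftmost(first, cfg), is_leftmost(second, cfg))   # True True
print(is_leftmost(["S", "XY", "Xb"], cfg))                  # False: Y was rewritten before X
```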
Theorem
Any word that can be generated by a certain CFG
has also a left most derivation.
It is to be noted that the above theorem can be
stated for right most derivation as well.
Example: Consider the following CFG
S → YX
X → XX|b
Y → YY|a
Following are the left most and right most
derivations of abbbb
Left most derivation:
S ⇒ YX
⇒ aX
⇒ aXX
⇒ abX
⇒ abXX
⇒ abbX
⇒ abbXX
⇒ abbbX
⇒ abbbb
Right most derivation:
S ⇒ YX
⇒ YXX
⇒ YXb
⇒ YXXb
⇒ YXbb
⇒ YXXbb
⇒ YXbbb
⇒ Ybbbb
⇒ abbbb
A new format for FAs
A class of machines (FAs) has been discussed
that accept the regular languages, i.e.
corresponding to every regular language there is a
machine in this class accepting that language,
and corresponding to every machine of this class
there is a regular language accepted by that
machine. It has also been discussed that there
is a CFG corresponding to every regular language,
and that CFGs define some nonregular
languages as well.
Is there a class of
machines accepting the CFLs? The answer is
yes. The new machines which are to be defined
are more powerful and can be constructed with
the help of FAs in a new format.
To define the new format of an FA, the
following are to be defined
Summing Up
• Chomsky Normal Form, Theorem regarding
CNF, examples of converting CFG to be in
CNF, Example of an FA corresponding to
Regular CFG, Left most and Right most