CHOMSKY NORMAL FORM
A CFG is in Chomsky Normal Form (CNF) if every production has one of the following two forms:
NT → T
A non-terminal (NT) on the left-hand side and a single terminal (T) on the right-hand side.
Example:
A→a
E → id
NT → NT1 NT2
A non-terminal (NT) on the left-hand side and two consecutive non-terminals
(NT1, NT2) on the right-hand side.
Example:
A → BC
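The two permitted forms can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: the dictionary representation of a grammar and the name is_cnf are assumptions made for illustration, and any symbol that is not a key of the dictionary is treated as a terminal.

```python
def is_cnf(grammar):
    """Return True if every production is NT -> T or NT -> NT1 NT2.

    Assumed representation: `grammar` maps each non-terminal to a list of
    bodies, and a body is a tuple of symbols. A symbol that is not a key
    of `grammar` counts as a terminal.
    """
    for bodies in grammar.values():
        for body in bodies:
            if len(body) == 1 and body[0] not in grammar:
                continue  # form NT -> T
            if len(body) == 2 and all(sym in grammar for sym in body):
                continue  # form NT -> NT1 NT2
            return False
    return True


# A -> BC | a, B -> b, C -> c  (both forms only)
good = {"A": [("B", "C"), ("a",)], "B": [("b",)], "C": [("c",)]}
# S -> ASA | aB, A -> B, B -> b | ε  (too long, mixed, unit and ε bodies)
bad = {"S": [("A", "S", "A"), ("a", "B")], "A": [("B",)], "B": [("b",), ()]}

print(is_cnf(good))  # True
print(is_cnf(bad))   # False
```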
If the grammar is not in the forms given above, two tasks remain:
1. Arrange that every production body of length two or more consists only of
variables.
2. Break every body of length three or more into a cascade of productions, each
with a body consisting of exactly two variables.
Algorithm to convert a CFG into Chomsky Normal Form:
Step 1. If the start symbol S occurs on some right side, create a new start symbol S1
and a new production S1→ S.
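Step 1 can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the notes' own code: the grammar is a dictionary from non-terminals to lists of bodies (tuples of symbols), the function name add_start_symbol is hypothetical, and the new name S1 is assumed not to be in use already.

```python
def add_start_symbol(grammar, start, new_start="S1"):
    """Add new_start -> start when `start` occurs in some production body.

    Returns the (possibly extended) grammar and the start symbol to use.
    Assumption: `new_start` is not already a symbol of the grammar.
    """
    occurs_on_rhs = any(start in body
                        for bodies in grammar.values()
                        for body in bodies)
    if not occurs_on_rhs:
        return dict(grammar), start
    extended = {new_start: [(start,)]}
    extended.update(grammar)
    return extended, new_start


# S -> ASA | aB, A -> B | S, B -> b | ε   (the empty tuple stands for ε)
grammar = {
    "S": [("A", "S", "A"), ("a", "B")],
    "A": [("B",), ("S",)],
    "B": [("b",), ()],
}
extended, new_start = add_start_symbol(grammar, "S")
print(new_start)        # S1
print(extended["S1"])   # [('S',)]
```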
Step 2. Remove useless symbols.
Removing useless symbols from the grammar does not affect the language
generated by the grammar.
A symbol is useful only if it is both generating and reachable:
1. X is generating if X ⇒* w for some terminal string w. Every terminal is
generating, since w can be that terminal itself, which is derived in zero
steps.
2. X is reachable if there is a derivation S ⇒* αXβ for some α and β.
Symbols that are not generating are eliminated first, and then symbols that are
not reachable.
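Both notions can be computed by simple fixed-point iteration. The Python sketch below is an illustration under assumptions rather than a definitive implementation: the grammar is a dictionary from non-terminals to lists of bodies (tuples of symbols), a non-terminal without productions is listed with an empty list, and the function names are hypothetical.

```python
def generating_symbols(grammar):
    """Non-terminals that derive some terminal string."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            for body in bodies:
                # Terminals (symbols that are not keys) are always generating.
                if all(s not in grammar or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def reachable_symbols(grammar, start):
    """Non-terminals that occur in some derivation from `start`."""
    reachable, todo = {start}, [start]
    while todo:
        head = todo.pop()
        for body in grammar.get(head, []):
            for sym in body:
                if sym in grammar and sym not in reachable:
                    reachable.add(sym)
                    todo.append(sym)
    return reachable


def remove_useless(grammar, start):
    """Drop non-generating symbols first, then unreachable ones."""
    gen = generating_symbols(grammar)
    pruned = {
        head: [b for b in bodies
               if all(s not in grammar or s in gen for s in b)]
        for head, bodies in grammar.items() if head in gen
    }
    reach = reachable_symbols(pruned, start)
    return {head: bodies for head, bodies in pruned.items() if head in reach}


# S -> AB | a, A -> b, and B has no productions (so B is not generating).
example = {"S": [("A", "B"), ("a",)], "A": [("b",)], "B": []}
print(remove_useless(example, "S"))  # {'S': [('a',)]}
```

In this small example, removing B deletes S → AB, after which A is no longer reachable, so only S → a survives.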
Step 3. Remove null productions (ε-productions).
A variable A is nullable if A ⇒* ε. The nullable variables are found inductively.
Basis:
If A → ε is a production of G, then A is nullable.
Induction:
Dr.C.Sathiya Kumar, Associate Professor, VIT Univerrsity Page 1
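If there is a production B → C1C2…Ck, where each Ci is nullable, then B
is nullable. Each Ci must be a variable to be nullable, so we only have to
consider productions with all-variable bodies.
Once the nullable variables are known, each production is replaced by all the
versions obtained by omitting any subset of the nullable variables in its body,
and the ε-productions themselves are dropped.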
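The basis and induction above amount to a fixed-point computation, sketched here in Python. The representation (a dictionary from non-terminals to lists of bodies, with the empty tuple standing for ε) and the name nullable_symbols are assumptions made for this illustration.

```python
def nullable_symbols(grammar):
    """Return the set of nullable variables of `grammar`."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # all() over an empty body is True, which covers the basis
            # A -> ε; longer bodies need every symbol to be nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# S -> ASA | aB, A -> B | S, B -> b | ε
grammar = {
    "S": [("A", "S", "A"), ("a", "B")],
    "A": [("B",), ("S",)],
    "B": [("b",), ()],
}
print(sorted(nullable_symbols(grammar)))  # ['A', 'B']
```

Here B is nullable by the basis, A is nullable through A → B, and S is not nullable because both of its bodies contain a symbol that cannot vanish.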
Step 4. Remove unit productions.
A unit production has the form A → B, where A and B are both variables. Unit
productions can complicate certain proofs, and they introduce extra steps into
derivations that technically need not be there.
Step 5. Replace each production A → B1…Bn where n > 2 with A → B1C, where C is a
new variable with C → B2…Bn. Repeat this step for all productions having more
than two symbols on the right side.
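The repeated splitting in Step 5 can be sketched in Python as follows. This is an illustrative sketch under assumptions: the grammar is a dictionary from non-terminals to lists of bodies (tuples of symbols), the function name binarize is hypothetical, and the fresh variables are named X1, X2, … on the assumption that these names are not already used. Identical tails share one fresh variable.

```python
from collections import deque


def binarize(grammar, prefix="X"):
    """Split every body longer than two symbols into a cascade of pairs."""
    result = {head: [] for head in grammar}
    tail_vars = {}  # tail of a body -> fresh variable deriving that tail
    work = deque((head, body)
                 for head, bodies in grammar.items() for body in bodies)
    while work:
        head, body = work.popleft()
        if len(body) > 2:
            tail = body[1:]
            if tail not in tail_vars:
                fresh = f"{prefix}{len(tail_vars) + 1}"  # hypothetical naming
                tail_vars[tail] = fresh
                result[fresh] = []
                work.append((fresh, tail))  # the tail may need splitting too
            body = (body[0], tail_vars[tail])
        if body not in result[head]:
            result[head].append(body)
    return result


# A -> BCDE becomes A -> B X1, X1 -> C X2, X2 -> DE
long_rule = {"A": [("B", "C", "D", "E")]}
print(binarize(long_rule))
# {'A': [('B', 'X1')], 'X1': [('C', 'X2')], 'X2': [('D', 'E')]}
```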
Step 6. If any production has the form A → aB, where a is a terminal and A, B are
non-terminals, replace it with A → XB and X → a, where X is a new non-terminal.
Repeat this step for every production of the form A → aB.
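A Python sketch of Step 6 follows. It is slightly more general than the step as stated, since it replaces a terminal at any position of a body of length two or more, not only in the form A → aB. The dictionary representation, the function name lift_terminals and the naming scheme T_a for the new non-terminals are all assumptions made for illustration.

```python
def lift_terminals(grammar, prefix="T_"):
    """Replace terminals inside bodies of length >= 2 by new non-terminals."""
    result = {head: [] for head in grammar}
    term_vars = {}  # terminal -> new non-terminal deriving exactly it
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) >= 2:
                new_body = []
                for sym in body:
                    if sym in grammar:
                        new_body.append(sym)  # already a non-terminal
                    else:
                        # Hypothetical naming; assumed not to clash.
                        term_vars.setdefault(sym, prefix + sym)
                        new_body.append(term_vars[sym])
                body = tuple(new_body)
            result[head].append(body)
    for terminal, var in term_vars.items():
        result[var] = [(terminal,)]
    return result


# S -> AX | aB | a, A -> b, B -> b, X -> SA
grammar = {
    "S": [("A", "X"), ("a", "B"), ("a",)],
    "A": [("b",)],
    "B": [("b",)],
    "X": [("S", "A")],
}
lifted = lift_terminals(grammar)
print(lifted["S"])    # [('A', 'X'), ('T_a', 'B'), ('a',)]
print(lifted["T_a"])  # [('a',)]
```

Bodies consisting of a single terminal, such as S → a, are already in CNF and are left untouched.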
Problem
Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε
Solution
(1) Since S appears on the R.H.S, we add a new start symbol S1 and the production
S1 → S. The production set becomes,
S1→S
S→ ASA | aB
A→B|S
B→b|ε
(2) Remove useless symbols.
There are no useless symbols, so the production set is unchanged:
S1→S
S→ ASA | aB
A→B|S
B→b|ε
(3) Remove the null productions.
B → ε is the only null production in the grammar; A is also nullable, because
A → B and B → ε.
After removing B → ε, the production set becomes,
S1→S
S→ ASA | aB | a
A→B|S|ε        (substitute B by ε)
B→b
After removing A → ε, the production set becomes,
S1→S
S→ ASA | aB | a | AS | SA | S        (substitute A by ε)
A→B|S
B→b
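The result of step (3) can be reproduced with a short Python sketch. It removes all null productions in one pass instead of one at a time, by finding the nullable variables and then emitting every version of each body with any subset of nullable symbols omitted. The dictionary representation and the function name remove_null_productions are assumptions made for illustration.

```python
from itertools import product


def remove_null_productions(grammar):
    """Eliminate ε-productions (the empty tuple stands for ε)."""
    # Nullable variables, found by fixed-point iteration.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                    all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True

    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol is either kept or omitted.
            options = [[(s,), ()] if s in nullable else [(s,)] for s in body]
            for combo in product(*options):
                new_body = tuple(s for part in combo for s in part)
                if new_body and new_body not in new_bodies:  # drop ε itself
                    new_bodies.append(new_body)
        result[head] = new_bodies
    return result


# Production set after step (2)
grammar = {
    "S1": [("S",)],
    "S": [("A", "S", "A"), ("a", "B")],
    "A": [("B",), ("S",)],
    "B": [("b",), ()],
}
cleaned = remove_null_productions(grammar)
for head, bodies in cleaned.items():
    print(head, "->", " | ".join("".join(body) for body in bodies))
```

For this grammar the sketch yields the same production set as the hand computation, up to the order in which the alternatives are listed.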
(4) Remove the unit productions.
Here S → S, S1 → S, A → B, and A → S are the unit productions.
After removing S → S, the production set becomes,
S1→S
S→ ASA | aB | a | AS | SA
A→B|S
B→b
(S → S is simply removed, since it gives nothing beyond S → ASA | aB | a | AS | SA.)
After removing S1→ S, the production set becomes,
S1→ ASA | aB | a | AS | SA        (substitute S by ASA | aB | a | AS | SA)
S→ ASA | aB | a | AS | SA
A→B|S
B→b
After removing A→ B, the production set becomes,
S1 → ASA | aB | a | AS | SA
S→ ASA | aB | a | AS | SA
A→S|b        (substitute B by b)
B→b
After removing A→ S, the production set becomes,
S1 → ASA | aB | a | AS | SA
S→ ASA | aB | a | AS | SA
A → b |ASA | aB | a | AS | SA        (substitute S by ASA | aB | a | AS | SA)
B→b
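Step (4) can be cross-checked with a Python sketch that removes all unit productions at once: for each variable it collects the variables reachable through unit productions alone and copies their non-unit bodies. The dictionary representation and the function name remove_unit_productions are assumptions made for illustration.

```python
def remove_unit_productions(grammar):
    """Replace unit productions A -> B by the non-unit bodies they lead to."""
    def is_unit(body):
        return len(body) == 1 and body[0] in grammar

    result = {}
    for head in grammar:
        # Variables reachable from `head` using unit productions only.
        closure, todo = [head], [head]
        while todo:
            var = todo.pop()
            for body in grammar[var]:
                if is_unit(body) and body[0] not in closure:
                    closure.append(body[0])
                    todo.append(body[0])
        # Copy every non-unit body of every variable in the closure.
        new_bodies = []
        for var in closure:
            for body in grammar[var]:
                if not is_unit(body) and body not in new_bodies:
                    new_bodies.append(body)
        result[head] = new_bodies
    return result


# Production set after step (3)
s_bodies = [("A", "S", "A"), ("a", "B"), ("a",), ("A", "S"), ("S", "A")]
grammar = {
    "S1": [("S",)],
    "S": s_bodies + [("S",)],
    "A": [("B",), ("S",)],
    "B": [("b",)],
}
no_units = remove_unit_productions(grammar)
for head, bodies in no_units.items():
    print(head, "->", " | ".join("".join(body) for body in bodies))
```

The output agrees with the production set obtained by hand above: S1 and S receive the five non-unit bodies of S, and A receives b together with those five bodies.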
(5) Now find the productions with more than two variables on the R.H.S.
Here S1 → ASA, S → ASA and A → ASA have more than two non-terminals on the
R.H.S. Applying step 5 of the algorithm, with a new variable X → SA, gives the
following production set,
S1→ AX | aB | a | AS | SA
S→ AX | aB | a | AS | SA
A → b |AX | aB | a | AS | SA
B→b
X → SA
(6) The productions S1 → aB, S → aB and A → aB have a terminal followed by a
non-terminal, so they are replaced using step 6 of the algorithm. The final
production set becomes,
S1→ AX | YB | a | AS | SA
S→ AX | YB | a | AS | SA
A → b |AX | YB | a | AS | SA
B→b
X → SA
Y→a