Converting a context-free grammar (CFG) to Chomsky Normal Form (CNF) is an important step in many parsing algorithms, such as the CYK algorithm, and helps in understanding the structure of languages. A CFG is in CNF if every production rule has one of the following forms:
- A non-terminal generating a terminal (e.g., X→x)
- A non-terminal generating two non-terminals (e.g., X→YZ)
- The start symbol generating ε (e.g., S→ε), allowed only if S does not appear on the right-hand side of any rule
Consider the following grammars:
G1 = {S→a, S→AZ, A→a, Z→z}
G2 = {S→a, S→aZ, Z→a}
Grammar G1 is in CNF because every production rule has one of the allowed forms. Grammar G2 is not in CNF because the rule S→aZ has a terminal followed by a non-terminal on its right-hand side, which none of the CNF forms allow.
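To make the three allowed forms concrete, here is a minimal checker sketch in Python. It assumes a grammar is a dict mapping each non-terminal to a list of right-hand sides, each RHS being a tuple of symbols with `()` standing for ε; any symbol that is not a dict key is treated as a terminal. The name `is_cnf` and this representation are illustrative assumptions, not a standard API.

```python
# Grammar: non-terminal -> list of RHS tuples; () is ε; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def is_cnf(grammar: Grammar, start: str) -> bool:
    """Return True if every production has one of the three CNF forms."""
    nonterminals = set(grammar)
    # S -> ε is allowed only if the start symbol never appears on a RHS.
    start_on_rhs = any(start in rhs for rhss in grammar.values() for rhs in rhss)
    for head, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) == 0:
                # X -> ε: only the start symbol, and only if it is not on any RHS
                if head != start or start_on_rhs:
                    return False
            elif len(rhs) == 1:
                # X -> x: the single symbol must be a terminal
                if rhs[0] in nonterminals:
                    return False
            elif len(rhs) == 2:
                # X -> YZ: both symbols must be non-terminals
                if not all(sym in nonterminals for sym in rhs):
                    return False
            else:
                return False  # more than two symbols is never allowed
    return True


G1 = {"S": [("a",), ("A", "Z")], "A": [("a",)], "Z": [("z",)]}
G2 = {"S": [("a",), ("a", "Z")], "Z": [("a",)]}
print(is_cnf(G1, "S"))  # True
print(is_cnf(G2, "S"))  # False: S -> aZ mixes a terminal and a non-terminal
```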
Key Properties of CNF
- A single CFG can be converted into different equivalent CNF forms.
- The CNF grammar generates the same language as the original CFG.
- CNF is widely used in parsing algorithms such as:
  - Cocke-Younger-Kasami (CYK) algorithm for membership checking.
  - Bottom-up parsers in compilers.
- For a string of length n ≥ 1, every CNF derivation takes exactly 2n-1 steps: n-1 applications of rules X→YZ and n applications of rules X→x.
- Every CFG has an equivalent grammar in CNF; the rule S→ε is needed only when the language contains ε.
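The CYK membership check mentioned above relies on every rule having one of the CNF forms, because it only ever combines two adjacent substrings. A minimal sketch, assuming the same dict-of-tuples grammar representation (non-terminal to list of RHS tuples, `()` for ε) and that each character of the input is one terminal; the function name `cyk` is illustrative, not from a library:

```python
# Grammar: non-terminal -> list of RHS tuples; () is ε; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def cyk(grammar: Grammar, start: str, word: str) -> bool:
    """CYK membership test. Assumes `grammar` is already in CNF and that
    each character of `word` is one terminal symbol."""
    n = len(word)
    if n == 0:
        # In CNF only the rule start -> ε derives the empty string.
        return () in grammar.get(start, [])
    # table[i][l - 1] holds the non-terminals that derive word[i : i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, rhss in grammar.items():
            if (ch,) in rhss:  # rule head -> ch
                table[i][0].add(head)
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            for split in range(1, length):  # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, rhss in grammar.items():
                    for rhs in rhss:
                        # rule head -> BC with B deriving the left part, C the right
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            table[i][length - 1].add(head)
    return start in table[0][n - 1]


G1 = {"S": [("a",), ("A", "Z")], "A": [("a",)], "Z": [("z",)]}
print(cyk(G1, "S", "az"))  # True
print(cyk(G1, "S", "za"))  # False
```

The three nested loops over span length, start position, and split point give the familiar O(n^3 · |G|) running time.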
Steps to Convert CFG to CNF
Step 1: Eliminate the Start Symbol from RHS
If the start symbol S appears on the right-hand side (RHS) of any production, add a new start symbol S0 and the production S0→S.
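Step 1 can be sketched as a small Python function over an assumed dict-of-tuples grammar representation (non-terminal to list of RHS tuples, `()` for ε); `add_new_start` and the default name `S0` are hypothetical choices for illustration.

```python
# Grammar: non-terminal -> list of RHS tuples; () is ε; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def add_new_start(grammar: Grammar, start: str, new_start: str = "S0") -> tuple[Grammar, str]:
    """If `start` occurs on any RHS, add new_start -> start and return the new start."""
    if not any(start in rhs for rhss in grammar.values() for rhs in rhss):
        return grammar, start  # nothing to do
    if new_start in grammar:
        raise ValueError(f"{new_start!r} is already used as a non-terminal")
    # Put the new start rule first; the existing rules are copied unchanged.
    new_grammar: Grammar = {new_start: [(start,)]}
    new_grammar.update({head: list(rhss) for head, rhss in grammar.items()})
    return new_grammar, new_start


example: Grammar = {
    "S": [("A", "S", "B")],
    "A": [("a", "A", "S"), ("a",), ()],
    "B": [("S", "b", "S"), ("A",), ("b", "b")],
}
g, s = add_new_start(example, "S")
print(s, list(g))  # S0 ['S0', 'S', 'A', 'B']
```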
Step 2: Remove Null, Unit, and Useless Productions
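- Null (ε) productions: Find every non-terminal that can derive ε. Remove the ε-rules and, for each rule whose RHS contains such non-terminals, add a copy for every way of omitting some of those occurrences (e.g., A→ε and S→ASB give S→ASB | SB). Only the start symbol may keep an ε-rule, when the language contains ε.
- Unit productions: If a rule has a single non-terminal on the RHS (e.g., A→B), replace it with B's productions.
- Useless productions: Remove non-reachable or non-generating symbols from the grammar.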
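A minimal Python sketch of null- and unit-production removal, assuming the same dict-of-tuples grammar representation (`()` is ε, symbols that are not dict keys are terminals) and that Step 1 has already been applied, so the start symbol is not on any RHS. The helper names are hypothetical, and useless-symbol removal is not shown. Nullable non-terminals are found first by fixed-point iteration, so chained cases such as B→A with A→ε are handled in one pass rather than one rule at a time.

```python
from itertools import product

# Grammar: non-terminal -> list of RHS tuples; () is ε; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def nullable_set(grammar: Grammar) -> set[str]:
    """Non-terminals that can derive ε, found by fixed-point iteration."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, rhss in grammar.items():
            # all() is True for the empty RHS, so X -> ε marks X as nullable.
            if head not in nullable and any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(head)
                changed = True
    return nullable


def remove_null(grammar: Grammar, start: str) -> Grammar:
    """Drop ε-rules and add every variant with nullable symbols omitted.
    Assumes Step 1 was applied, so `start` is not on any RHS."""
    nullable = nullable_set(grammar)
    result: Grammar = {}
    for head, rhss in grammar.items():
        new_rhss: list[tuple[str, ...]] = []
        for rhs in rhss:
            # Each nullable symbol may be kept or dropped; other symbols are always kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                variant = tuple(sym for part in choice for sym in part)
                if variant and variant not in new_rhss:
                    new_rhss.append(variant)
        result[head] = new_rhss
    if start in nullable:
        result[start].append(())  # the start symbol alone may keep S -> ε
    return result


def remove_units(grammar: Grammar) -> Grammar:
    """Replace unit rules X -> Y by the non-unit rules reachable from X."""
    nonterminals = set(grammar)

    def is_unit(rhs: tuple[str, ...]) -> bool:
        return len(rhs) == 1 and rhs[0] in nonterminals

    result: Grammar = {}
    for head in grammar:
        # Non-terminals reachable from head through unit rules (head included).
        reach, stack = {head}, [head]
        while stack:
            for rhs in grammar[stack.pop()]:
                if is_unit(rhs) and rhs[0] not in reach:
                    reach.add(rhs[0])
                    stack.append(rhs[0])
        new_rhss: list[tuple[str, ...]] = []
        for nt in [head] + sorted(reach - {head}):
            for rhs in grammar[nt]:
                if not is_unit(rhs) and rhs not in new_rhss:
                    new_rhss.append(rhs)
        result[head] = new_rhss
    return result


# The worked example's grammar after Step 1 (S0 -> S added).
step1: Grammar = {
    "S0": [("S",)],
    "S": [("A", "S", "B")],
    "A": [("a", "A", "S"), ("a",), ()],
    "B": [("S", "b", "S"), ("A",), ("b", "b")],
}
for head, rhss in remove_units(remove_null(step1, "S0")).items():
    print(head, "->", " | ".join("".join(rhs) for rhs in rhss))
# S0 -> ASB | AS | SB
# S -> ASB | AS | SB
# A -> aAS | aS | a
# B -> SbS | bb | aAS | aS | a
```

This matches the grammar reached at the end of Step 2 in the worked example, up to the order of alternatives; the trivial rule S→S is dropped automatically because it is a unit rule.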
Step 3: Replace Terminals in Mixed Productions
Replace every terminal that appears on an RHS together with other symbols (terminals or non-terminals) by a new non-terminal that generates it. For example, the production rule X→xY can be decomposed as X→ZY, Z→x.
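Step 3 sketched in Python over the same assumed dict-of-tuples representation: every terminal inside an RHS of length two or more is replaced by a fresh non-terminal, with one new non-terminal per terminal (as the worked example does with X→a and Y→b). The function name and the `T0, T1, ...` naming scheme are hypothetical.

```python
# Grammar: non-terminal -> list of RHS tuples; () is ε; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def replace_mixed_terminals(grammar: Grammar) -> Grammar:
    """In every RHS with two or more symbols, replace each terminal t by a
    fresh non-terminal N and add the rule N -> t (one N per terminal)."""
    nonterminals = set(grammar)
    used = nonterminals | {s for rhss in grammar.values() for rhs in rhss for s in rhs}
    fresh: dict[str, str] = {}  # terminal -> its new non-terminal
    counter = 0

    def name_for(terminal: str) -> str:
        nonlocal counter
        if terminal not in fresh:
            # Hypothetical naming scheme T0, T1, ... skipping symbols already in use.
            while f"T{counter}" in used:
                counter += 1
            fresh[terminal] = f"T{counter}"
            used.add(fresh[terminal])
        return fresh[terminal]

    result: Grammar = {}
    for head, rhss in grammar.items():
        new_rhss = []
        for rhs in rhss:
            if len(rhs) >= 2:  # single-terminal rules X -> x are already fine
                rhs = tuple(s if s in nonterminals else name_for(s) for s in rhs)
            new_rhss.append(rhs)
        result[head] = new_rhss
    for terminal, nt in fresh.items():
        result[nt] = [(terminal,)]
    return result


g = replace_mixed_terminals({"X": [("x", "Y")], "Y": [("y",)]})
print(g)  # {'X': [('T0', 'Y')], 'Y': [('y',)], 'T0': [('x',)]}
```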
Step 4: Reduce Productions with More Than Two Non-Terminals
Split every RHS with more than two non-terminals into a chain of two-symbol rules. For example, the production rule X→XYZ can be decomposed as X→PZ, P→XY.
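Step 4 can be sketched the same way, again assuming the dict-of-tuples representation. This version groups symbols from the left, matching the decomposition of X→XYZ into X→PZ, P→XY, and reuses one new non-terminal for each repeated pair (as the worked example does when S0→ASB and S→ASB both become PB). `binarize` and the `P0, P1, ...` names are hypothetical.

```python
# Grammar: non-terminal -> list of RHS tuples; () is ε; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def binarize(grammar: Grammar) -> Grammar:
    """Split every RHS longer than two symbols from the left:
    X -> Y1 Y2 Y3 ... becomes N1 -> Y1 Y2, N2 -> N1 Y3, ..., X -> N(k-2) Yk."""
    used = set(grammar) | {s for rhss in grammar.values() for rhs in rhss for s in rhs}
    pair_name: dict[tuple[str, str], str] = {}  # one new non-terminal per pair
    counter = 0

    def name_for(pair: tuple[str, str]) -> str:
        nonlocal counter
        if pair not in pair_name:
            # Hypothetical naming scheme P0, P1, ... skipping symbols already in use.
            while f"P{counter}" in used:
                counter += 1
            pair_name[pair] = f"P{counter}"
            used.add(pair_name[pair])
        return pair_name[pair]

    result: Grammar = {}
    for head, rhss in grammar.items():
        new_rhss = []
        for rhs in rhss:
            while len(rhs) > 2:
                # Replace the two leftmost symbols by the non-terminal for that pair.
                rhs = (name_for((rhs[0], rhs[1])),) + rhs[2:]
            new_rhss.append(rhs)
        result[head] = new_rhss
    for (first, second), nt in pair_name.items():
        result[nt] = [(first, second)]
    return result


g = binarize({"S0": [("A", "S", "B")], "S": [("A", "S", "B")], "A": [("a",)], "B": [("b",)]})
print(g["S0"], g["S"], g["P0"])  # [('P0', 'B')] [('P0', 'B')] [('A', 'S')]
```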
Example: Converting a CFG to CNF
Let us convert the following grammar G to CNF:
S → ASB
A → aAS | a | ε
B → SbS | A | bb
Step 1.
As the start symbol S appears on the RHS, we create a new production rule S0→S. The grammar becomes:
S0→S
S → ASB
A → aAS | a | ε
B → SbS | A | bb
Step 2.
As the grammar contains the null production A→ε, removing it yields:
S0→S
S → ASB | SB
A → a | aAS | aS
B → SbS | A | ε | bb
This creates a new null production B→ε; removing it yields:
S0→S
S → AS | S | ASB | SB
A → a | aAS | aS
B → SbS | A | bb
This leaves the unit production B→A; replacing it with A's productions yields:
S0→S
S → AS | ASB | SB | S
A → a | aAS | aS
B → SbS | bb | aAS | aS | a
Next, replacing the unit production S0→S with S's productions yields:
S0→ AS | ASB | SB | S
S → AS | ASB | SB | S
A → aAS | aS | a
B → SbS | bb | aAS | aS | a
The remaining unit productions S→S and S0→S add nothing: S→S derives only S itself, and S0 already has all of S's other productions. Removing them yields:
S0→ AS | ASB | SB
S → AS | ASB | SB
A → aAS | aS | a
B → SbS | bb | aAS | aS | a
Step 3.
In the production rules A→aAS | aS and B→SbS | aAS | aS, the terminals a and b appear on the RHS together with non-terminals. Replacing them with new non-terminals X→a and Y→b yields:
S0→ AS | ASB | SB
S → AS | ASB | SB
A → XAS | XS |a
B → SYS | bb | XAS | XS |a
X →a
Y→b
Also, B→bb is not in CNF; replacing each b with Y yields:
S0→ AS | ASB | SB
S → AS | ASB | SB
A → XAS | XS | a
B → SYS | YY | XAS | XS | a
X → a
Y → b
Step 4.
In the production rules S0→ASB and S→ASB, the RHS has more than two symbols; replacing AS with a new non-terminal P yields:
S0→ AS | PB | SB
S → AS | PB | SB
A → XAS | XS | a
B → SYS | YY | XAS | XS | a
X → a
Y → b
P → AS
Similarly, A→XAS (and B→XAS) has more than two symbols; replacing XA with a new non-terminal R yields:
S0→ AS | PB | SB
S → AS | PB | SB
A → RS | XS | a
B → SYS | YY | RS | XS | a
X → a
Y → b
P → AS
R → XA
Similarly, B→SYS has more than two symbols; replacing SY with a new non-terminal T yields:
S0→ AS | PB | SB
S → AS | PB | SB
A → RS | XS | a
B → TS | YY | RS | XS | a
X → a
Y → b
P → AS
R → XA
T → SY
This is the required CNF for the given grammar.
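As a mechanical check of the result, the sketch below encodes the final grammar in an assumed dict-of-tuples form (non-terminal to list of RHS tuples, non-keys are terminals), confirms that every rule has a CNF shape, and computes which non-terminals can derive a terminal string. The helper names `in_cnf_shape` and `generating` are hypothetical.

```python
# Grammar: non-terminal -> list of RHS tuples; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]

final: Grammar = {
    "S0": [("A", "S"), ("P", "B"), ("S", "B")],
    "S": [("A", "S"), ("P", "B"), ("S", "B")],
    "A": [("R", "S"), ("X", "S"), ("a",)],
    "B": [("T", "S"), ("Y", "Y"), ("R", "S"), ("X", "S"), ("a",)],
    "X": [("a",)],
    "Y": [("b",)],
    "P": [("A", "S")],
    "R": [("X", "A")],
    "T": [("S", "Y")],
}


def in_cnf_shape(grammar: Grammar) -> bool:
    """Every RHS is one terminal or two non-terminals (this grammar has no ε-rules)."""
    nts = set(grammar)
    return all(
        (len(rhs) == 1 and rhs[0] not in nts) or (len(rhs) == 2 and set(rhs) <= nts)
        for rhss in grammar.values()
        for rhs in rhss
    )


def generating(grammar: Grammar) -> set[str]:
    """Non-terminals that derive at least one terminal string (fixed point)."""
    nts = set(grammar)
    gen: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, rhss in grammar.items():
            # A rule is productive if each symbol is a terminal or already generating.
            if head not in gen and any(all(s in gen or s not in nts for s in rhs) for rhs in rhss):
                gen.add(head)
                changed = True
    return gen


print(in_cnf_shape(final))        # True
print(sorted(generating(final)))  # ['A', 'B', 'R', 'X', 'Y']
```

The check confirms the CNF shape, but it also shows that S and S0 are non-generating: the only original rule for S is S→ASB, and every rule for S keeps an S (directly or through P) on its right-hand side, so no terminal string can be derived and the language of this example grammar is empty. The conversion steps are still carried out correctly; applying the useless-production removal from Step 2 would simply leave no rules for the start symbol.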