Open In App

Converting Context Free Grammar to Chomsky Normal Form

Last Updated : 16 Apr, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

Chomsky Normal Form (CNF) is a way to simplify context-free grammars (CFGs) so that all production rules follow specific patterns. In CNF, each rule either produces two non-terminal symbols, or a single terminal symbol, or, in some cases, the empty string. Converting a CFG to CNF is an important step in many parsing algorithms, like the CYK algorithm, and helps in understanding the structure of languages.

What is Chomsky Normal Form (CNF)

A context free grammar (CFG) is in Chomsky Normal Form (CNF) if all production rules satisfy the following conditions:

  • A non-terminal generating a terminal (e.g.; X→ x)
  • A non-terminal generating two non-terminals (e.g.; X→YZ)
  • Start symbol generating ε. (e.g.; S→ ε)

Consider the following grammars,

G1 = {S→a, S→AZ, A→a, Z→z}
G2 = {S→a, S→aZ, Z→a}

The grammar G1 is in CNF as production rules satisfy the rules specified for CNF. However, the grammar G2 is not in CNF as the production rule S→aZ contains terminal followed by non-terminal which does not satisfy the rules specified for CNF.

Key Properties of CNF

  • A single CFG can be converted into different equivalent CNF forms.
  • CNF produces the same language as the original CFG.
  • CNF is widely used in parsing algorithms such as:
    • Cocke – Younger – Kasami (CYK) algorithm for membership checking.
    • Bottom-up parsers in compilers.
  • For a string of length n, a CNF derivation requires at most 2n-1 derivation steps.
  • Any CFG that does not generate ε has an equivalent CNF.

Steps to Convert CFG to CNF

Step 1: Eliminate the Start Symbol from RHS

If start symbol S is at the RHS of any production in the grammar, create a new production as: S0→S where S0 is the new start symbol.

Step 2: Remove Null, Unit, and Useless Productions

  • Null (ε) Productions: If a rule contains ε, remove it by modifying other rules accordingly.
  • Unit Productions: If a rule has a single nonterminal on the RHS (e.g., [Tex]A→B[/Tex]), replace it with B’s productions.
  • Useless Productions: Remove non-reachable or nongenerating symbols from the grammar.

Step 3: Replace Terminals in Mixed Productions

Eliminate terminals from RHS if they exist with other terminals or non-terminals. e.g. , production rule X→ xY can be decomposed as: X→ZY, Z→x.

Step 4: Reduce Productions with More Than Two Non-Terminals

Eliminate RHS with more than two non-terminals. e.g,; production rule X→XYZ can be decomposed as: X→PZ, P→XY

Example: Converting a CFG to CNF

Let us take an example to convert CFG to CNF. Consider the given grammar G1:

S → ASB
A → aAS | a | ε
B → SbS | A | bb

Step 1.

As start symbol S appears on the RHS, we will create a new production rule S0→S. Therefore, the grammar will become:

S0→S
S → ASB
A → aAS | a | ε
B → SbS | A | bb

Step 2.

As grammar contains null production A→ ε, its removal from the grammar yields:

S0→S
S → ASB | SB
A → a | aAS | aS
B → SbS | A | ε | bb

Now, it creates null production B→ ε, its removal from the grammar yields:

S0→S
S → AS | S | ASB | SB
A → a | aAS | aS
B → SbS | A | bb

Now, it creates unit production B→A, its removal from the grammar yields:

S0→S
S → AS | ASB | SB | S
A → a | aAS | aS
B → SbS | bb | aAS | aS | a

Also, removal of unit production S0→S from grammar yields:

S0→ AS | ASB | SB | S
S → AS | ASB | SB | S
A → aAS | aS | a
B → SbS | bb | aAS | aS | a

Also, removal of unit production S→S and S0→S from grammar yields:

S0→ AS | ASB | SB
S → AS | ASB | SB
A → aAS | aS | a
B → SbS | bb | aAS | aS | a

Step 3.

In production rule A→aAS | aS and B→ SbS | aAS | aS, terminals a and b exist on RHS with non-terminates. Removing them from RHS:

S0→ AS | ASB | SB
S → AS | ASB | SB
A → XAS | XS |a
B → SYS | bb | XAS | XS |a
X →a
Y→b

Also, B→ bb can’t be part of CNF, removing it from grammar yields:

S0→ AS | ASB | SB
S → AS | ASB | SB
A → XAS | XS | a
B → SYS | YY | XAS | XS | a
X → a
Y → b

Step 4:

In production rule S0→ASB, S→ASB RHS has more than two symbols, removing it from grammar yields:

S0→ AS | PB | SB
S → AS | PB | SB
A → XAS | XS | a
B → SYS | YY | XAS | XS | a
X → a
Y → b
P → AS

Similarly, A→XAS has more than two symbols, removing it from grammar yields:

S0→ AS | PB | SB
S → AS | PB | SB
A → RS | XS | a
B → SYS | YY | RS | XS | a
X → a
Y → b
P → AS
R → XA

Similarly, B→SYS has more than two symbols, removing it from grammar yields:

S0→ AS | PB | SB
S → AS | PB | SB
A → RS | XS | a
B → TS | YY | RS | XS | a
X → a
Y → b
P → AS
R → XA
T → SY

So this is the required CNF for given grammar.



Next Article

Similar Reads