CYK Algorithm
• Purpose: The CYK (Cocke-Younger-Kasami) algorithm decides whether a sentence can be generated
by a context-free grammar (CFG) and can also be used to find its possible parse trees.
• Applicability: CYK works only on grammars in Chomsky Normal Form (CNF). Hence, converting a
CFG to CNF is an essential prerequisite for using this algorithm.
1. Input:
o A string w of length n.
o A CFG in CNF.
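2. Matrix Initialization:
o Create an n x n table, where each cell (i, j) with i ≤ j stores the set of non-terminal
symbols that can generate the substring w[i:j], i.e., characters i through j of w
(1-based, inclusive).
o Fill each diagonal cell (i, i) with every non-terminal A that has a rule A → w[i].
3. Filling the Table:
o For substrings of length 2, 3, ..., n, try every split of w[i:j] into w[i:k] and
w[k+1:j]. Add A to cell (i, j) whenever there is a rule A → BC with B in cell (i, k)
and C in cell (k+1, j).
4. Determining Validity:
o If the start symbol (e.g., S) is in the cell (1, n) at the end of the process, then the
string can be generated by the CFG.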
CFG in CNF:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
1. Fill for Substrings of length 1 in "baaba":
o For "b": B (from B → b)
o For "a": A, C (from A → a and C → a)
2. Fill for Substrings of increasing length, combining results according to the production
rules.
o For the full sentence "baaba", the cell is filled by looking for splits whose two parts
match S → AB or S → BC, based on the smaller substring matches.
By filling out the matrix systematically, we check for combinations that satisfy the production rules,
determining if S can derive the full sentence.
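The procedure above can be sketched in Python. This is a minimal illustration, not a reference implementation: the names cyk_table, cyk_accepts, TERMINAL_RULES and BINARY_RULES are chosen here for the example, the grammar is encoded by hand from the rules listed above, and the code uses 0-based indices where the text uses 1-based positions.

```python
from itertools import product

def cyk_table(word, terminal_rules, binary_rules):
    """Return table[i][j] = set of non-terminals deriving word[i..j] (0-based, inclusive)."""
    n = len(word)
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Substrings of length 1: use rules of the form A -> a.
    for i, ch in enumerate(word):
        table[i][i] = set(terminal_rules.get(ch, ()))
    # Longer substrings: try every split point k between i and j.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary_rules.get((left, right), set())
    return table

def cyk_accepts(word, start, terminal_rules, binary_rules):
    """True if the start symbol derives the whole word."""
    if not word:
        return False  # assumption: the empty string is rejected in this sketch
    table = cyk_table(word, terminal_rules, binary_rules)
    return start in table[0][len(word) - 1]

# Example grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
TERMINAL_RULES = {"a": {"A", "C"}, "b": {"B"}}
BINARY_RULES = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}

print(cyk_accepts("baaba", "S", TERMINAL_RULES, BINARY_RULES))  # True
print(cyk_accepts("bb", "S", TERMINAL_RULES, BINARY_RULES))     # False
```

Indexing the binary rules by their right-hand side pair makes the inner lookup a single dictionary access; the three nested loops over length, start and split point give the usual cubic running time in n.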
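A CFG can be converted to Chomsky Normal Form (CNF) to make it suitable for algorithms like CYK.
In CNF, production rules take one of two forms:
o A → BC, where B and C are non-terminals.
o A → a, where a is a terminal.
The conversion proceeds in steps:
1. Eliminate ε-productions:
o If any non-terminal generates an empty string, adjust other rules to account for this
possibility.
2. Eliminate unit productions:
o If there are rules where one non-terminal directly leads to another, replace them
with equivalent productions.
3. Replace terminals in mixed rules:
o If any rule has both terminals and non-terminals, introduce new non-terminals to
represent each terminal.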
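The last step, introducing new non-terminals for terminals inside longer rules, can be sketched as follows. This is an illustrative sketch under assumptions: the function name lift_terminals, the rule encoding (a dict from non-terminal to a list of right-hand-side tuples) and the fresh names of the form T_a are all hypothetical choices made here, and the sketch assumes no existing non-terminal already uses such a name.

```python
def lift_terminals(rules, nonterminals):
    """Replace terminals inside rules of length >= 2 by fresh non-terminals.

    rules: dict mapping a non-terminal to a list of right-hand sides (tuples of symbols).
    """
    new_rules = {head: [] for head in rules}
    fresh = {}  # terminal -> hypothetical new non-terminal standing for it
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) < 2:
                new_rules[head].append(body)  # A -> a is already fine
                continue
            new_body = []
            for sym in body:
                if sym in nonterminals:
                    new_body.append(sym)
                else:
                    new_body.append(fresh.setdefault(sym, f"T_{sym}"))
            new_rules[head].append(tuple(new_body))
    # Each fresh non-terminal derives exactly the terminal it replaces.
    for terminal, name in fresh.items():
        new_rules[name] = [(terminal,)]
    return new_rules

# Small hypothetical grammar with one mixed rule, S -> aB.
RULES = {
    "S": [("A", "B"), ("A", "C"), ("a", "B")],
    "A": [("a",)],
    "B": [("b",)],
    "C": [("c",)],
}
LIFTED = lift_terminals(RULES, set(RULES))
print(LIFTED["S"])    # [('A', 'B'), ('A', 'C'), ('T_a', 'B')]
print(LIFTED["T_a"])  # [('a',)]
```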
Example of Converting CFG to CNF
Original CFG:
S → AB | AC | aB
A → a | ε
B → b
C → c
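Before converting, it can help to list which rules of the original CFG break the CNF forms. The sketch below does this for the grammar above; the function name cnf_violations, the rule encoding (tuples of symbols, with the empty tuple standing for ε) and the wording of the labels are assumptions made for this illustration.

```python
def cnf_violations(rules, nonterminals):
    """List (head, body, reason) for every rule that is not A -> BC or A -> a."""
    problems = []
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                problems.append((head, body, "epsilon rule"))
            elif len(body) == 1 and body[0] in nonterminals:
                problems.append((head, body, "unit rule"))
            elif len(body) >= 2 and any(s not in nonterminals for s in body):
                problems.append((head, body, "terminal inside a longer rule"))
            elif len(body) > 2:
                problems.append((head, body, "more than two symbols"))
    return problems

# Original CFG: S -> AB | AC | aB, A -> a | ε, B -> b, C -> c
ORIGINAL = {
    "S": [("A", "B"), ("A", "C"), ("a", "B")],
    "A": [("a",), ()],
    "B": [("b",)],
    "C": [("c",)],
}
for head, body, reason in cnf_violations(ORIGINAL, set(ORIGINAL)):
    print(head, "->", " ".join(body) or "ε", ":", reason)
# S -> a B : terminal inside a longer rule
# A -> ε : epsilon rule
```

An empty result means every rule already has one of the two CNF forms.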
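Step-by-Step Conversion:
1. Eliminate ε-productions: A → ε makes A nullable, so every rule that uses A also gets a
variant with A dropped (S → AB adds S → B, S → AC adds S → C), and A → ε is removed.
Updated Rules:
S → AB | AC | aB | B | C
A → a
B → b
C → c
2. Eliminate unit productions: S → B and S → C are replaced by the rules of B and C,
giving S → b and S → c.
Updated Rules:
S → AB | AC | aB | b | c
A → a
B → b
C → c
3. Replace terminals in mixed rules: in S → aB, the terminal a is replaced by a new
non-terminal A1 with A1 → a.
Updated Rules:
S → AB | AC | A1 B | b | c
A → a
A1 → a
B → b
C → c
• Every rule now has either two non-terminals or a single terminal on its right-hand side, so
the final CFG is in CNF.
Final CNF:
S → AB | AC | A1 B | b | c
A → a
A1 → a
B → b
C → c
Now, this CNF grammar can be used directly in the CYK algorithm.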
Example of Filling the CYK Table
CFG in CNF (the grammar used for the sentence "baaba"):
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
The CYK table will be a 5x5 table (since there are 5 characters in the sentence "baaba"). Each cell (i,
j) in the table represents the set of non-terminal symbols that can generate the substring from
position i to j in the input string.
We fill each cell (i, j) by considering all possible divisions of the substring w[i:j] and looking at
combinations of non-terminals that can produce these substrings based on the production rules.
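Filling one cell from its splits can be sketched as below. The names combine and fill_cell and the dictionary cells keyed by 1-based (i, j) pairs are hypothetical choices for this illustration; the rules are the example grammar indexed by right-hand side.

```python
def combine(left_cell, right_cell, binary_rules):
    """Non-terminals A with a rule A -> BC, B in left_cell and C in right_cell."""
    result = set()
    for left in left_cell:
        for right in right_cell:
            result |= binary_rules.get((left, right), set())
    return result

def fill_cell(cells, i, j, binary_rules):
    """Union over every split of w[i:j] into w[i:k] and w[k+1:j] (1-based, inclusive)."""
    result = set()
    for k in range(i, j):
        result |= combine(cells[(i, k)], cells[(k + 1, j)], binary_rules)
    return result

# S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a, indexed by right-hand side.
BINARY_RULES = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}

# Diagonal cells for the prefix "baa" of "baaba".
cells = {(1, 1): {"B"}, (2, 2): {"A", "C"}, (3, 3): {"A", "C"}}
cells[(1, 2)] = fill_cell(cells, 1, 2, BINARY_RULES)  # "ba"
cells[(2, 3)] = fill_cell(cells, 2, 3, BINARY_RULES)  # "aa"
cells[(1, 3)] = fill_cell(cells, 1, 3, BINARY_RULES)  # "baa", two possible splits

print(sorted(cells[(1, 2)]))  # ['A', 'S']
print(sorted(cells[(2, 3)]))  # ['B']
print(sorted(cells[(1, 3)]))  # []
```

Cells must be filled in order of increasing substring length, because fill_cell reads only cells that cover shorter substrings.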
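CYK Table Filling
i\j   1       2       3       4       5
1     B       S, A    {}      {}      S, A, C
2     -       A, C    B       B       S, A, C
3     -       -       A, C    S, C    B
4     -       -       -       B       S, A
5     -       -       -       -       A, C
({} marks an empty set; - marks unused cells with i > j.)
1. Length = 2:
▪ (1,2) "ba": B with A, C: A → BA and S → BC result in S, A
▪ (2,3) "aa": A, C with A, C: B → CC results in B
▪ (3,4) "ab": A, C with B: S → AB and C → AB result in S, C
▪ (4,5) "ba": B with A, C: A → BA and S → BC result in S, A
2. Length = 3:
▪ (1,3) "baa": no split matches a rule, so the cell is {}
▪ (2,4) "aab": split a|ab, A, C with S, C: B → CC results in B
▪ (3,5) "aba": split ab|a, S, C with A, C: B → CC results in B
3. Length = 4:
▪ (1,4) "baab": no split matches a rule, so the cell is {}
▪ (2,5) "aaba": split a|aba, A, C with B: S → AB and C → AB; split aa|ba, B with S, A:
A → BA; split aab|a, B with A, C: A → BA and S → BC. Result: S, A, C
4. Length = 5:
▪ (1,5) "baaba": split b|aaba, B with S, A, C: A → BA and S → BC; split ba|aba, S, A
with B: S → AB and C → AB. Result: S, A, C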
• The final cell (1,5) contains S, indicating that the sentence "baaba" is derivable from the
start symbol and valid according to the grammar.
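Since CYK can also be used to find parse trees, one way to recover a tree is to store, for each non-terminal in a cell, how it was derived. The sketch below keeps only the first derivation found per cell entry, so it returns one tree rather than all of them; the names cyk_parse and tree_yield and the nested-tuple tree format are assumptions made for this illustration.

```python
def cyk_parse(word, start, terminal_rules, binary_rules):
    """Return one parse tree as nested tuples, or None if the word is rejected."""
    n = len(word)
    if n == 0:
        return None
    # back[i][j][X] records how X derives word[i..j] (0-based, inclusive).
    back = [[{} for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head in terminal_rules.get(ch, ()):
            back[i][i][head] = ch
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for left in back[i][k]:
                    for right in back[k + 1][j]:
                        for head in binary_rules.get((left, right), ()):
                            # Keep the first derivation found; later ones are ignored.
                            back[i][j].setdefault(head, (k, left, right))

    def build(i, j, head):
        entry = back[i][j][head]
        if i == j:
            return (head, entry)  # leaf: (non-terminal, terminal)
        k, left, right = entry
        return (head, build(i, k, left), build(k + 1, j, right))

    if start not in back[0][n - 1]:
        return None
    return build(0, n - 1, start)

def tree_yield(tree):
    """Concatenate the terminals at the leaves of a tree built by cyk_parse."""
    if len(tree) == 2:
        return tree[1]
    return tree_yield(tree[1]) + tree_yield(tree[2])

# S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
TERMINAL_RULES = {"a": ("A", "C"), "b": ("B",)}
BINARY_RULES = {
    ("A", "B"): ("S", "C"),
    ("B", "C"): ("S",),
    ("B", "A"): ("A",),
    ("C", "C"): ("B",),
}

tree = cyk_parse("baaba", "S", TERMINAL_RULES, BINARY_RULES)
print(tree)
print(tree_yield(tree) == "baaba")  # True
```

To enumerate every parse tree instead, each cell entry would need to keep a list of derivations rather than a single one.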