CYK Algorithm

Uploaded by

purid9991
Introduction to the CYK Algorithm

The Cocke-Younger-Kasami (CYK) algorithm is a parsing algorithm used in Natural Language
Processing (NLP) to determine whether a given string can be generated by a given context-free
grammar (CFG). It requires the grammar to be in Chomsky Normal Form (CNF).

• Purpose: The CYK algorithm is primarily used to determine the validity of a sentence based
on a CFG and to find its possible parse trees.

• Applicability: CYK works only on grammars that are in CNF. Hence, converting CFG to CNF
is an essential prerequisite for using this algorithm.

Steps in the CYK Algorithm

1. Input:

o A string w of length n.

o A CFG in CNF.

2. Matrix Initialization:

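o Create an n x n table, where each cell (i, j) with i ≤ j stores the set of non-terminal symbols
that can generate the substring of w from position i to position j (1-based, inclusive).
Only the upper triangle of the table is used.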

3. Filling the Matrix:

o For substrings of length 1, check which non-terminals directly generate the terminals.

o For longer substrings, try every split into two shorter parts and add each non-terminal A
that has a rule A → BC where B generates the left part and C generates the right part.

4. Determining Validity:

o If the start symbol (e.g., S) is in the cell (1, n) at the end of the process, then the
string can be generated by the CFG.
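
The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name cyk_recognize, the rule encoding (pairs and triples), and the toy grammar S → AB, A → a, B → b are assumptions made for this example. The code uses 0-based positions, so the cell called (1, n) in the text is table[0][n - 1].

```python
def cyk_recognize(word, terminal_rules, binary_rules, start="S"):
    """Return True if `word` can be derived from `start` under a CNF grammar.

    terminal_rules: iterable of (head, terminal) pairs, e.g. ("A", "a")
    binary_rules:   iterable of (head, left, right) triples, e.g. ("S", "A", "B")
    """
    n = len(word)
    if n == 0:
        return False  # the CNF rule forms used here cannot derive the empty string

    # table[i][j] holds the non-terminals deriving word[i..j] (0-based, inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Substrings of length 1: rules of the form A -> a
    for i, ch in enumerate(word):
        for head, terminal in terminal_rules:
            if terminal == ch:
                table[i][i].add(head)

    # Longer substrings, shortest first
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split into word[i..k] and word[k+1..j]
                for head, left, right in binary_rules:
                    if left in table[i][k] and right in table[k + 1][j]:
                        table[i][j].add(head)

    return start in table[0][n - 1]


# Toy grammar (an assumption for illustration): S -> AB, A -> a, B -> b
toy_terminal = [("A", "a"), ("B", "b")]
toy_binary = [("S", "A", "B")]
print(cyk_recognize("ab", toy_terminal, toy_binary))
print(cyk_recognize("ba", toy_terminal, toy_binary))
```

The three nested loops over length, start position, and split point give the usual O(n^3) running time in the length of the string, multiplied by the number of binary rules.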

Example Using the CYK Algorithm

CFG in CNF:

S → AB | BC

A → BA | a

B → CC | b

C → AB | a

Input Sentence: "baaba"


Step-by-Step Solution:

1. Initialize the Table with substrings of length 1:

o For "b": B

o For "a": A, C (both A → a and C → a apply)

2. Fill for Substrings of increasing length, combining results according to the production
rules. For the full sentence "baaba", this means looking for a split where S → AB or S → BC
matches the non-terminals already found for the two parts.

3. Check Cell (1, n):

o If it contains S, the sentence is derivable from the grammar.

By filling out the matrix systematically, we check for combinations that satisfy the production rules,
determining if S can derive the full sentence.

Converting CFG to CNF

A CFG can be converted to CNF to make it suitable for algorithms like CYK. In CNF, production rules
take one of two forms:

• A → BC (where B and C are non-terminal symbols)

• A → a (where a is a terminal symbol)

Steps to Convert a CFG to CNF:

1. Remove Nullable Productions (A → ε):

o If any non-terminal generates an empty string, adjust other rules to account for this
possibility.

2. Eliminate Unit Productions (A → B):

o If there are rules where one non-terminal directly leads to another, replace them
with equivalent productions.

3. Reduce Rules to Binary Productions (A → B C):

o For productions with more than two non-terminals, introduce intermediate variables.

4. Convert Terminals in Mixed Rules:

o If any rule has both terminals and non-terminals, introduce new non-terminals to
represent each terminal.
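
Step 3 is the most mechanical of the four, and a short sketch may help. The function below is an illustration under assumptions: the name binarize, the (head, body) rule encoding, and the fresh non-terminal names X1, X2, ... are all hypothetical, and the names are assumed not to clash with symbols already in the grammar.

```python
def binarize(rules):
    """Split right-hand sides longer than two symbols into chains of binary rules.

    `rules` is a list of (head, body) pairs, where body is a tuple of symbols.
    Fresh non-terminals are named X1, X2, ... (hypothetical names).
    """
    result = []
    counter = 0
    for head, body in rules:
        while len(body) > 2:
            counter += 1
            fresh = f"X{counter}"
            # Keep the first symbol and delegate the rest to a fresh non-terminal
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, body))
    return result


# S -> A B C D becomes S -> A X1, X1 -> B X2, X2 -> C D
for head, body in binarize([("S", ("A", "B", "C", "D")), ("A", ("a",))]):
    print(head, "->", " ".join(body))
```

Rules that are already binary or that produce a single terminal pass through unchanged.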

Example of Converting CFG to CNF

Original CFG:

S → AB | AC | aB

A → a | ε

B → b

C → c

Step-by-Step Conversion:

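1. Remove Nullable Production (A → ε):

• A is nullable, so every rule that mentions A also needs a variant with A left out:
S → AB adds S → B, and S → AC adds S → C.

Updated Rules:

S → AB | AC | aB | B | C

A → a

B → b

C → c

2. Eliminate Unit Productions:

• Step 1 introduced the unit productions S → B and S → C. Replace each with the productions
of its target: S → b (from B → b) and S → c (from C → c).

Updated Rules:

S → AB | AC | aB | b | c

A → a

B → b

C → c

3. Convert to Binary Productions:

• Rules like S → AC remain, as they are already binary.

• For S → aB, add a new non-terminal A1 → a and replace S → aB with S → A1 B.

Updated Rules:

S → AB | AC | A1 B | b | c

A → a

A1 → a

B → b

C → c

4. Ensure CNF Compliance (terminals with non-terminals):

• Every production now has either two non-terminals or a single terminal on its right-hand
side, so the grammar is in CNF.

Final CNF:

S → AB | AC | A1 B | b | c

A → a

A1 → a

B → b

C → c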

Now, this CNF grammar can be used directly in the CYK algorithm.
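
Whether a grammar really is in CNF can be checked mechanically before it is handed to CYK. The sketch below is one possible check, with assumptions stated up front: the function name cnf_violations and the dictionary encoding are made up for this example, every key of the dictionary is treated as a non-terminal, every other symbol as a terminal, and ε is written as an empty tuple. It is run here on the original (unconverted) grammar from this example.

```python
def cnf_violations(grammar):
    """Return the (head, body) rules that are neither A -> BC nor A -> a.

    `grammar` maps each non-terminal to a list of right-hand sides (tuples).
    Symbols that are not keys of `grammar` are treated as terminals.
    """
    nonterminals = set(grammar)
    bad = []
    for head, bodies in grammar.items():
        for body in bodies:
            is_terminal_rule = len(body) == 1 and body[0] not in nonterminals
            is_binary_rule = len(body) == 2 and all(s in nonterminals for s in body)
            if not (is_terminal_rule or is_binary_rule):
                bad.append((head, body))
    return bad


# Original grammar: S -> AB | AC | aB, A -> a | ε, B -> b, C -> c
original = {
    "S": [("A", "B"), ("A", "C"), ("a", "B")],
    "A": [("a",), ()],  # () stands for ε
    "B": [("b",)],
    "C": [("c",)],
}
for head, body in cnf_violations(original):
    print(head, "->", " ".join(body) or "ε")
```

The check flags the mixed rule S → aB and the nullable rule A → ε. A unit production such as S → B would be flagged as well, because its single right-hand symbol is a non-terminal.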

Example Sentence: "baaba"

Step 1: Define the Grammar in CNF

We have a grammar in Chomsky Normal Form (CNF) as follows:

S → AB | BC

A → BA | a

B → CC | b

C → AB | a

Step 2: Set Up the CYK Table

The CYK table will be a 5x5 table (since there are 5 characters in the sentence "baaba"). Each cell (i,
j) in the table represents the set of non-terminal symbols that can generate the substring from
position i to j in the input string.

Step 3: Fill the Table Using the CYK Algorithm

1. Initialize the Diagonal (Substrings of Length 1):

o For "b" (first character), cell (1,1): B

o For "a" (second character), cell (2,2): A, C

o For "a" (third character), cell (3,3): A, C

o For "b" (fourth character), cell (4,4): B

o For "a" (fifth character), cell (5,5): A, C
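
The diagonal can be filled with a single lookup per character if the terminal rules are inverted first. The snippet below is a small sketch of that idea; the variable names (producers, diagonal) are chosen for this example only.

```python
from collections import defaultdict

# Terminal rules of the example grammar: A -> a, B -> b, C -> a
terminal_rules = [("A", "a"), ("B", "b"), ("C", "a")]

# Invert the rules once: terminal -> set of non-terminals that produce it
producers = defaultdict(set)
for head, terminal in terminal_rules:
    producers[terminal].add(head)

word = "baaba"
# diagonal[i] is the content of cell (i + 1, i + 1) in the 1-based table
diagonal = [set(producers.get(ch, set())) for ch in word]

for position, (ch, cell) in enumerate(zip(word, diagonal), start=1):
    print(f"({position},{position}) {ch!r}: {sorted(cell)}")
```

A character with no producing rule would yield an empty cell, which already rules the whole string out.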

2. Filling the Rest of the Table:

We fill each cell (i, j) by considering all possible divisions of the substring w[i:j] and looking at
combinations of non-terminals that can produce these substrings based on the production rules.
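
CYK Table Filling

i\j | 1 | 2    | 3    | 4    | 5
1   | B | S, A | {}   | {}   | S, A, C
2   |   | A, C | B    | B    | S, A, C
3   |   |      | A, C | S, C | B
4   |   |      |      | B    | S, A
5   |   |      |      |      | A, C

Steps to Fill the Table:

1. Length = 2:

• For (1,2), "ba":

▪ B with A, C: A → BA and S → BC, results in S, A

• For (2,3), "aa":

▪ A, C with A, C: only B → CC applies, results in B

• For (3,4), "ab":

▪ A, C with B: S → AB and C → AB, results in S, C

• For (4,5), "ba":

▪ B with A, C: A → BA and S → BC, results in S, A

2. Length = 3:

• For (1,3), "baa":

▪ Split b|aa gives B with B, split ba|a gives S, A with A, C: no rule matches, hence {}

• For (2,4), "aab":

▪ Split a|ab gives A, C with S, C: B → CC. Split aa|b gives B with B: nothing. Results in B

• For (3,5), "aba":

▪ Split a|ba gives A, C with S, A: nothing. Split ab|a gives S, C with A, C: B → CC. Results in B

3. Length = 4:

• For (1,4), "baab":

▪ Split b|aab gives B with B, split ba|ab gives S, A with S, C, and split baa|b has an empty
left cell: no rule matches, hence {}

• For (2,5), "aaba":

▪ Split a|aba gives A, C with B: S → AB, C → AB. Split aa|ba gives B with S, A: A → BA.
Split aab|a gives B with A, C: A → BA, S → BC. Results in S, A, C

4. Length = 5:

• For (1,5), "baaba":

▪ Split b|aaba gives B with S, A, C: A → BA, S → BC. Split ba|aba gives S, A with B: S → AB,
C → AB. Splits baa|ba and baab|a contribute nothing because (1,3) and (1,4) are empty.
Results in S, A, C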
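
Filling the table by hand is error-prone, so it is worth recomputing it mechanically. The sketch below does that for the example grammar. The names TERMINAL_RULES, BINARY_RULES, and cyk_table are assumptions for this illustration; the rules are stored inverted (right-hand side to set of heads) so each pair of non-terminals needs only one lookup. Positions are 1-based to match the table.

```python
# Example grammar, stored inverted: right-hand side -> heads that produce it
TERMINAL_RULES = {"a": {"A", "C"}, "b": {"B"}}
BINARY_RULES = {
    ("A", "B"): {"S", "C"},  # S -> AB, C -> AB
    ("B", "C"): {"S"},       # S -> BC
    ("B", "A"): {"A"},       # A -> BA
    ("C", "C"): {"B"},       # B -> CC
}


def cyk_table(word):
    """Return {(i, j): set of non-terminals} with 1-based, inclusive positions."""
    n = len(word)
    table = {(i, i): set(TERMINAL_RULES.get(word[i - 1], ())) for i in range(1, n + 1)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # left part (i, k), right part (k + 1, j)
                for left in table[(i, k)]:
                    for right in table[(k + 1, j)]:
                        cell |= BINARY_RULES.get((left, right), set())
            table[(i, j)] = cell
    return table


word = "baaba"
table = cyk_table(word)
for i in range(1, len(word) + 1):
    row = []
    for j in range(1, len(word) + 1):
        row.append("-" if j < i else "{" + ",".join(sorted(table[(i, j)])) + "}")
    print(i, "  ".join(f"{cell:9}" for cell in row))

print("derivable:", "S" in table[(1, len(word))])
```

Each printed row corresponds to one row i of the table, with "-" marking the unused cells below the diagonal.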

Step 4: Check the Top Cell (1, n)

• The final cell (1,5) contains S, indicating that the sentence "baaba" is derivable from the
start symbol and valid according to the grammar.
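
Membership is only half of what CYK offers: if each cell also remembers how a non-terminal was derived, a parse tree can be read back from the top cell. The following is a hedged sketch of that extension. The names cyk_parse and leaves, the nested-tuple tree format, and the choice to keep only the first derivation found per non-terminal are all assumptions of this example, so it returns one tree rather than every possible tree.

```python
def cyk_parse(word, terminal_rules, binary_rules, start="S"):
    """Return one parse tree as nested tuples, or None if `word` is not derivable."""
    n = len(word)
    if n == 0:
        return None
    # back[(i, j)][head] records how `head` derives word[i..j] (0-based, inclusive)
    back = {}
    for i, ch in enumerate(word):
        back[(i, i)] = {head: ch for head, t in terminal_rules if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell = {}
            for k in range(i, j):
                for head, left, right in binary_rules:
                    if left in back[(i, k)] and right in back[(k + 1, j)]:
                        # Keep only the first derivation found for each head
                        cell.setdefault(head, (k, left, right))
            back[(i, j)] = cell

    def build(i, j, head):
        entry = back[(i, j)][head]
        if i == j:
            return (head, entry)  # leaf: (non-terminal, terminal)
        k, left, right = entry
        return (head, build(i, k, left), build(k + 1, j, right))

    return build(0, n - 1, start) if start in back[(0, n - 1)] else None


def leaves(tree):
    """Concatenate the terminals at the leaves of a tree returned by cyk_parse."""
    if len(tree) == 2:
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


terminal_rules = [("A", "a"), ("B", "b"), ("C", "a")]
binary_rules = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
                ("B", "C", "C"), ("C", "A", "B")]
tree = cyk_parse("baaba", terminal_rules, binary_rules)
print(tree)
print(leaves(tree))
```

Reading the leaves of the returned tree from left to right spells out the input sentence again, which is a quick sanity check on the back-pointers.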
