
8 LCS 19 01 2024

The document discusses dynamic programming solutions to edit distance and longest common subsequence problems. It explains that edit distance between two strings is the minimum number of edits needed to change one string into the other. A dynamic programming solution uses a table to store solutions to subproblems to find the edit distance between prefixes of the strings. Longest common subsequence finds the longest subsequence common to two sequences. A dynamic programming approach uses a table to store the length of the LCS of prefixes, recursively defining the problem based on matching last characters of the prefixes. Both problems can be solved in O(mn) time using dynamic programming where m and n are the lengths of the input strings.

Uploaded by

Baladhithya T

BCSE204L Design and Analysis of Algorithms
Module-2
Topic-2: Dynamic Programming-problems
Edit distance
PROBLEM:
• When a spell checker encounters a possible misspelling, it looks in its dictionary for other words that are close by.
What is the appropriate notion of closeness in this case?
• A natural measure of the distance between two strings is the extent to which they can be aligned, or matched up
• An alignment is simply a way of writing the strings one above the other
• Two possible alignments of SNOWY and SUNNY:

S − N O W Y        − S N O W − Y
S U N N − Y        S U N − − N Y
Cost: 3            Cost: 5

➢“−” indicates a “gap”; any number of these can be placed in either string
➢Cost of an alignment is the number of columns in which the letters differ
➢Edit distance between two strings is the cost of their best possible alignment
➢Edit distance means the minimum number of edits—insertions, deletions, and
substitutions of characters—needed to transform the first string into the second
➢For the first alignment, the cost of 3 corresponds to three edits: insert U, substitute
O → N, and delete W.
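
The cost rule above (count the columns in which the two rows differ) can be sketched in a few lines of Python. The function name `alignment_cost` and the use of an ASCII "-" for the gap symbol are assumptions made for this illustration only.

```python
def alignment_cost(top: str, bottom: str, gap: str = "-") -> int:
    """Count the columns of an alignment in which the two rows differ."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    # A column made of two gaps carries no information, so reject it.
    if any(a == gap and b == gap for a, b in zip(top, bottom)):
        raise ValueError("a column cannot contain two gaps")
    # A gap never equals a letter, so gap columns are counted as differing.
    return sum(1 for a, b in zip(top, bottom) if a != b)


# SNOWY over SUNNY with one gap in each row:
# the differing columns are (-, U), (O, N) and (W, -).
print(alignment_cost("S-NOWY", "SUNN-Y"))  # 3
```

This only scores one given alignment; the edit distance is the minimum of this cost over all possible alignments.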
A dynamic programming solution
• For any DP problem, the most crucial question is: what are the subproblems?
❖Goal is to find the edit distance between two strings x[1 · · · m] and y[1 · · · n]
• Subproblem will be finding the edit distance between some prefix of the first
string, x[1 · · · i], and some prefix of the second, y[1 · · · j]

• If the subproblem is E(i, j), our final objective, then, is to compute E(m, n)
• In the best alignment between x[1 · · · i] and y[1 · · · j], the rightmost
column can only be one of three things:
x[i] over −, or − over y[j], or x[i] over y[j]
• The first case incurs a cost of 1 for this particular column, and it remains to
align x[1 · · · i − 1] with y[1 · · · j] → subproblem E(i − 1, j)
• The second case, also with cost 1, still needs to align x[1 · · · i] with y[1 · · · j − 1]
→ subproblem E(i, j − 1)
• The final case costs 1 (if x[i] != y[j]) or 0 (if x[i] == y[j]), and it remains to
align x[1 · · · i − 1] with y[1 · · · j − 1] → subproblem E(i − 1, j − 1)
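➢ Thus, we have expressed E(i, j) in terms of three smaller
subproblems E(i − 1, j), E(i, j − 1), E(i − 1, j − 1)
➢ As we have no idea which of them is the right one, we need to try them all
and pick the best:
E(i, j) = min{1 + E(i − 1, j), 1 + E(i, j − 1), diff(i, j) + E(i − 1, j − 1)}
where diff(i, j) = 0 if x[i] == y[j] and 1 otherwise
➢ The base cases are E(i, 0) = i and E(0, j) = j, since aligning a prefix with the
empty string costs one edit per character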
• The answers to all the subproblems E(i, j) form a two-dimensional table
• In what order should these subproblems be solved?
• Any order is fine as long as E(i − 1, j), E(i, j − 1), and E(i − 1, j − 1) are
computed before E(i, j).
• For instance, we could fill in the table one row at a time, from top row to
bottom row, and moving left to right across each row. Or alternatively, we
could fill it in column by column.
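
The row-by-row table filling described above can be sketched as follows. This is a minimal illustration, not necessarily the exact pseudocode of the slides: the function name `edit_distance` is made up here, the strings are 0-indexed (so x[i - 1] plays the role of x[i]), and the base cases E(i, 0) = i and E(0, j) = j are the usual ones for this recurrence.

```python
def edit_distance(x: str, y: str) -> int:
    """Edit distance between x and y via a bottom-up DP table."""
    m, n = len(x), len(y)
    # E[i][j] holds the edit distance between x[:i] and y[:j].
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i  # delete all i characters of the prefix of x
    for j in range(n + 1):
        E[0][j] = j  # insert all j characters of the prefix of y
    # Fill one row at a time, left to right, so that the three
    # neighbouring subproblems are always available.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(
                1 + E[i - 1][j],         # x[i] aligned with a gap
                1 + E[i][j - 1],         # y[j] aligned with a gap
                diff + E[i - 1][j - 1],  # x[i] aligned with y[j]
            )
    return E[m][n]


print(edit_distance("SNOWY", "SUNNY"))  # 3
```

Each of the (m + 1)(n + 1) entries takes constant time, so the sketch runs in O(mn) time; it also uses O(mn) space, which could be reduced to two rows if only the distance is needed.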
Longest Common Subsequence (LCS)
• The longest common subsequence (LCS)
problem is the problem of finding the
longest subsequence common to all
sequences in a set of sequences
• Unlike substrings, subsequences are NOT
required to occupy consecutive positions
within the original sequences
LCS Example
• Seq1: (𝐴𝐵𝐶𝐷), Seq2: (𝐴𝐶𝐵𝐴𝐷)
• Length-2 common subsequences:
(𝐴𝐵), (𝐴𝐶), (𝐴𝐷), (𝐵𝐷), and (𝐶𝐷), totaling to 5
• Length-3 common subsequences: (𝐴𝐵𝐷) and (𝐴𝐶𝐷),
totaling to 2
• No common subsequences longer than 3
• So (𝐴𝐵𝐷) and (𝐴𝐶𝐷) are their longest common
subsequences and length of LCS = 3
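
The counts in this example can be checked by brute force. The sketch below enumerates every length-k subsequence of each string with `itertools.combinations` and intersects the two sets; the helper names `subsequences` and `common_subsequences` are made up for this illustration. This is exponential in general and is only meant for tiny inputs like the ones above.

```python
from itertools import combinations


def subsequences(s: str, k: int) -> set:
    """All distinct subsequences of s that have length k."""
    # combinations() keeps the original left-to-right order of characters.
    return {"".join(chars) for chars in combinations(s, k)}


def common_subsequences(a: str, b: str, k: int) -> set:
    """Distinct length-k subsequences shared by a and b."""
    return subsequences(a, k) & subsequences(b, k)


for k in (2, 3, 4):
    print(k, sorted(common_subsequences("ABCD", "ACBAD", k)))
# 2 ['AB', 'AC', 'AD', 'BD', 'CD']
# 3 ['ABD', 'ACD']
# 4 []
```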
LCS DP- The idea
• Let the input sequences be 𝑋[0..𝑚 − 1] and
𝑌[0..𝑛 − 1] of lengths 𝑚 and 𝑛 respectively.
• Let 𝐿(𝑋[0..𝑚 − 1], 𝑌[0..𝑛 − 1]) be the length of
the LCS of the two sequences 𝑋 and 𝑌.
• The recursive definition of 𝐿(𝑋[0..𝑚 − 1], 𝑌[0..𝑛 − 1])
is framed based on whether the last characters of
the two sequences match or not
LCS DP- The idea
• If the last characters of both sequences match
(𝑋[𝑚 − 1] == 𝑌[𝑛 − 1]) then
𝐿(𝑋[0..𝑚 − 1], 𝑌[0..𝑛 − 1]) = 1 + 𝐿(𝑋[0..𝑚 − 2], 𝑌[0..𝑛 − 2])
• If the last characters of both sequences do not match
(𝑋[𝑚 − 1] ≠ 𝑌[𝑛 − 1]) then
𝐿(𝑋[0..𝑚 − 1], 𝑌[0..𝑛 − 1]) =
MAX{𝐿(𝑋[0..𝑚 − 2], 𝑌[0..𝑛 − 1]), 𝐿(𝑋[0..𝑚 − 1], 𝑌[0..𝑛 − 2])}
• If either sequence is empty, the length of the LCS is 0
LCS DP- An Example
• Consider the input strings “𝐴𝐺𝐺𝑇𝐴𝐵” and “𝐺𝑋𝑇𝑋𝐴𝑌𝐵”.
Last characters match for the strings
• So length of LCS can be written as:
𝐿(𝐴𝐺𝐺𝑇𝐴𝐵, 𝐺𝑋𝑇𝑋𝐴𝑌𝐵) =
1 + 𝐿(𝐴𝐺𝐺𝑇𝐴, 𝐺𝑋𝑇𝑋𝐴𝑌)
• Consider the input strings “𝐴𝐵𝐶𝐷𝐺𝐻” and “𝐴𝐸𝐷𝐹𝐻𝑅”.
Last characters do not match for the strings
• So length of LCS can be written as: 𝐿(𝐴𝐵𝐶𝐷𝐺𝐻, 𝐴𝐸𝐷𝐹𝐻𝑅) =
𝑀𝐴𝑋 {𝐿(𝐴𝐵𝐶𝐷𝐺, 𝐴𝐸𝐷𝐹𝐻𝑅), 𝐿(𝐴𝐵𝐶𝐷𝐺𝐻, 𝐴𝐸𝐷𝐹𝐻)}
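
The two cases above translate almost directly into a recursive function. The sketch below is one possible rendering, with memoization added via `functools.lru_cache` so that each pair of prefix lengths is solved only once; the name `lcs_length_recursive` and the choice to recurse on prefix lengths rather than on string copies are assumptions of this illustration.

```python
from functools import lru_cache


def lcs_length_recursive(x: str, y: str) -> int:
    """Length of the LCS of x and y, following the recursive definition."""

    @lru_cache(maxsize=None)
    def L(m: int, n: int) -> int:
        # L(m, n) is the LCS length of x[0..m-1] and y[0..n-1].
        if m == 0 or n == 0:
            return 0  # an empty prefix has no common subsequence
        if x[m - 1] == y[n - 1]:
            # Last characters match: keep the character, shrink both prefixes.
            return 1 + L(m - 1, n - 1)
        # Last characters differ: drop one of them and take the better result.
        return max(L(m - 1, n), L(m, n - 1))

    return L(len(x), len(y))


print(lcs_length_recursive("AGGTAB", "GXTXAYB"))  # 4
print(lcs_length_recursive("ABCDGH", "AEDFHR"))   # 3
```

Without the cache the same subproblems would be recomputed many times; with it there are at most (m + 1)(n + 1) distinct calls. For very long inputs the recursion depth could become a concern in Python, which is one reason a table-based version is often preferred.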
LCS DP Pseudocode
• Find LCS of ABCBDAB and BDCABA
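
As a hedged stand-in for the pseudocode, here is a minimal bottom-up sketch in Python that fills the table of prefix LCS lengths and then walks back through it to recover one LCS. The function name `lcs` and the tie-breaking rule in the traceback are choices made for this illustration; when several longest common subsequences exist, a different tie-breaking rule may return a different one of the same length.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # L[i][j] holds the LCS length of x[:i] and y[:j]; row 0 and column 0 stay 0.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = 1 + L[i - 1][j - 1]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from L[m][n] to collect the characters of one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1  # the value came from the cell above
        else:
            j -= 1  # the value came from the cell to the left
    return "".join(reversed(out))


result = lcs("ABCBDAB", "BDCABA")
print(result, len(result))  # the length is 4
```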
Analysis
• Time complexity
• If m and n are the lengths of the sequences
• Then filling an m × n table takes constant time per entry
• Therefore, the time complexity is O(mn)
