
Lecture - 21 - Dynamic Programming - LCS

The document discusses the Longest Common Subsequence (LCS) algorithm, which identifies the longest subsequence common to given sequences without requiring consecutive elements. It outlines applications of LCS in various fields such as file comparison, bioinformatics, plagiarism detection, and data compression. The document also contrasts brute force and dynamic programming approaches for calculating LCS, highlighting the inefficiency of brute force and the systematic method of dynamic programming.

Uploaded by

zkashif139

DESIGN AND ANALYSIS OF ALGORITHMS

Dr. Kashif Ayyub


Longest Common Subsequence (LCS)
• The longest common subsequence (LCS) of a set of sequences is the longest
sequence that appears as a subsequence of every one of them. The elements of a
subsequence must keep their relative order, but they are not required to occupy
consecutive positions within the original sequences.
• Example:
• S1 = {B, C, D, A, A, C, D}
• S2 = {A, C, D, B, A, C}
• Common subsequences = {B, C}, {C, D, A, C}, {D, A, C}, {A, A, C}, {A, C}, {C, D}, ...
• Longest common subsequence = {C, D, A, C}
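The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper name `is_subsequence` is an assumption introduced here for illustration. It tests whether a candidate keeps its order inside a sequence without requiring consecutive positions.

```python
def is_subsequence(candidate, sequence):
    """Return True if candidate can be obtained from sequence by deleting elements."""
    it = iter(sequence)
    # 'item in it' advances the iterator until it finds item (or runs out),
    # so the matches are forced to appear in left-to-right order.
    return all(item in it for item in candidate)


S1 = ["B", "C", "D", "A", "A", "C", "D"]
S2 = ["A", "C", "D", "B", "A", "C"]
candidate = ["C", "D", "A", "C"]

# The candidate is common only if it is a subsequence of both sequences.
print(is_subsequence(candidate, S1) and is_subsequence(candidate, S2))  # True
```

Because the iterator is consumed as it is searched, a candidate such as {D, C, B} would be rejected for S2 even though all three symbols occur in it.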

LCS Applications
Application Area | Use of LCS
File/version comparison | Detects unchanged lines between file versions to highlight additions/deletions during software development (git diff).
Bioinformatics | Aligns DNA, RNA, or protein sequences to find functional, structural, or evolutionary relationships.
Plagiarism detection | Finds long, common sequences of words or phrases across documents to detect copied content.
Data compression | Identifies repeated subsequences to store only changes instead of entire data, reducing storage needs.
Spell checking | Compares user-typed words with dictionary entries by finding the closest word with the longest shared character subsequence.
NLP/Text similarity | Measures similarity between sentences or paragraphs by evaluating their longest matching sequence of words.
Record matching | Matches inconsistent or slightly varied records (e.g., “Kashif Ayyub” vs “Kashif Ayub”) in databases by finding common subsequences.

Brute Force Approach
• Given two strings:
• X = "AB"
• Y = "ACB"

• Goal: Find the Longest Common Subsequence (LCS) using Brute Force.
• Step 1: Generate All Subsequences
Subsequences of X = "AB": "", "A", "B", "AB"
Subsequences of Y = "ACB": "", "A", "C", "B", "AC", "AB", "CB", "ACB"

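Step 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the lecture's own code; the function name `all_subsequences` is an assumption. It relies on `itertools.combinations`, which yields index tuples in increasing order and therefore preserves the original order of the characters.

```python
from itertools import combinations


def all_subsequences(s):
    """Return every distinct subsequence of s, shortest first."""
    found = []
    for length in range(len(s) + 1):
        # Each combination is a set of positions to keep, in increasing order.
        for positions in combinations(range(len(s)), length):
            candidate = "".join(s[p] for p in positions)
            if candidate not in found:
                found.append(candidate)
    return found


print(all_subsequences("AB"))   # ['', 'A', 'B', 'AB']
print(all_subsequences("ACB"))  # ['', 'A', 'C', 'B', 'AC', 'AB', 'CB', 'ACB']
```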
Brute Force Approach
• Step 2: Compare Each Subsequence of X with Each Subsequence of Y
X Subsequence | Matching Y Subsequences | Is Common?
"" | all ("" is a subsequence of any string) | ✅
"A" | "A", "AC", "AB", "ACB" | ✅
"B" | "B", "AB", "CB", "ACB" | ✅
"AB" | "AB", "ACB" | ✅

Now pick the longest of the common subsequences:

• "", "A", "B", "AB" → Answer: "AB"

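Putting the two steps together gives the brute-force procedure. The sketch below is one possible rendering under stated assumptions: the names `subsequences` and `lcs_brute_force` are hypothetical, and the comparison counter is added only to make the cost of the pairwise comparison visible.

```python
from itertools import combinations


def subsequences(s):
    """All subsequences of s, one entry per choice of positions to keep."""
    return [
        "".join(s[p] for p in positions)
        for length in range(len(s) + 1)
        for positions in combinations(range(len(s)), length)
    ]


def lcs_brute_force(x, y):
    """Return (longest common subsequence, number of pairs compared)."""
    best, comparisons = "", 0
    for sub_x in subsequences(x):
        for sub_y in subsequences(y):
            comparisons += 1
            # A common subsequence is one that shows up in both lists.
            if sub_x == sub_y and len(sub_x) > len(best):
                best = sub_x
    return best, comparisons


print(lcs_brute_force("AB", "ACB"))  # ('AB', 32)
```

For X = "AB" and Y = "ACB" the loop compares 4 × 8 = 32 pairs, and the count doubles with every extra character in either string.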
Brute Force Approach
• X of length m → $2^m$ subsequences
• Y of length n → $2^n$ subsequences
• Compare each pair → $2^m \times 2^n$ comparisons
• Each comparison inspects up to min(m, n) characters, so the total cost is $O(2^m \cdot 2^n \cdot \min(m, n))$

• Example: m = 10, n = 10 → $2^{10} \times 2^{10}$ = 1,048,576 comparisons!

• This is exponential time, highly inefficient and impractical for even moderate
values of m and n.

LCS using DP - Algorithm
• Instead of checking all subsequences (like Brute Force), DP solves smaller
subproblems and stores their results in a table, building up to the final
answer.
• First we’ll find the length of LCS. Later we’ll modify the algorithm to find LCS
itself.
• Define X_i, Y_j to be the prefixes of X and Y of length i and j respectively
• Define c[i,j] to be the length of LCS of X_i and Y_j
• Then the length of LCS of X and Y will be c[m,n]

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i,j-1],\ c[i-1,j]) & \text{otherwise}
\end{cases}
\]

LCS – Recursive Solution
• We start with i = j = 0 (empty prefixes of X and Y)

• Since X_0 and Y_0 are empty strings, their LCS is always empty (i.e. c[0,0] = 0)
• The LCS of an empty string and any other string is empty, so for every i and j:
c[0,j] = c[i,0] = 0

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i,j-1],\ c[i-1,j]) & \text{otherwise}
\end{cases}
\]

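The recurrence and its base cases translate almost directly into a memoized recursive function. This is a sketch for illustration only: the name `lcs_length_recursive` is an assumption, and `functools.lru_cache` stands in for the table so that each c(i, j) is computed once. Note that the slides index symbols from 1 while Python strings start at 0.

```python
from functools import lru_cache


def lcs_length_recursive(x, y):
    """Length of the LCS of x and y, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # Base case: an empty prefix has an empty LCS with anything.
        if i == 0 or j == 0:
            return 0
        # x[i] in the slides is x[i - 1] in Python.
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length_recursive("ABCB", "BDCAB"))  # 3
```

Without the cache the same subproblems would be recomputed many times; with it, at most (m+1)(n+1) distinct calls are evaluated. For long inputs the recursion depth can exceed Python's default limit, which is one reason to fill the table with loops instead.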
LCS – Recursive Solution
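• When we calculate c[i,j], we consider two cases:

• First case: x[i] = y[j]. One more symbol in strings X and Y matches, so the length of the LCS of X_i and
Y_j equals the length of the LCS of the smaller strings X_{i-1} and Y_{j-1}, plus 1

• Second case: x[i] != y[j]

• As the symbols don’t match, our solution is not improved, and the length of LCS(X_i, Y_j) is the
same as before (i.e. the maximum of LCS(X_i, Y_{j-1}) and LCS(X_{i-1}, Y_j))

• Why not just take the length of LCS(X_{i-1}, Y_{j-1})? Because x[i] may still match an earlier
symbol of Y (or y[j] an earlier symbol of X). For X = "AB" and Y = "BC", x[2] != y[2] and
LCS(X_1, Y_1) = LCS("A", "B") has length 0, yet LCS(X_2, Y_2) has length 1 because x[2] = B matches y[1].

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i,j-1],\ c[i-1,j]) & \text{otherwise}
\end{cases}
\]
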
LCS - Algorithm
LCS-Length(X, Y)
m = length(X) // get the # of symbols in X
n = length(Y) // get the # of symbols in Y
for i = 1 to m c[i,0] = 0 // special case: Y_0
for j = 0 to n c[0,j] = 0 // special case: X_0 (also sets c[0,0])
for i = 1 to m // for all X_i
    for j = 1 to n // for all Y_j
        if ( x[i] == y[j] )
            c[i,j] = c[i-1,j-1] + 1
        else c[i,j] = max( c[i-1,j], c[i,j-1] )
return c[m,n] // return LCS length for X and Y

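As a runnable counterpart to the pseudocode, here is a minimal Python sketch. The function name `lcs_table` is an assumption, and returning the whole table rather than only c[m,n] is a choice made here so that the intermediate values can be inspected.

```python
def lcs_table(x, y):
    """Build the (m+1) x (n+1) table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    # Row 0 and column 0 are the empty-prefix base cases, so start from all zeros.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:  # x[i] == y[j] in the 1-based slide notation
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


table = lcs_table("ABCB", "BDCAB")
for row in table:
    print(row)
print(table[-1][-1])  # 3
```

The last entry, `table[-1][-1]`, corresponds to c[m,n], the length of the LCS of the full strings.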
LCS - Example
We’ll see how the LCS algorithm works on the following example:
• X = ABCB
• Y = BDCAB

• What is the Longest Common Subsequence of X and Y?

• LCS(X, Y) = BCB
• In X = A B C B it uses the symbols at positions 2, 3 and 4
• In Y = B D C A B it uses the symbols at positions 1, 3 and 5

LCS – Example (0)
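i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  |  |  |  |  |  |
1 | A |  |  |  |  |  |
2 | B |  |  |  |  |  |
3 | C |  |  |  |  |  |
4 | B |  |  |  |  |  |

X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5], i.e. (m+1) × (n+1) = 5 × 6 entries
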
LCS – Example (1)
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 |  |  |  |  |
2 | B | 0 |  |  |  |  |
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

for i = 1 to m c[i,0] = 0
for j = 0 to n c[0,j] = 0

LCS – Example (2)
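i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 |  |  |  |
2 | B | 0 |  |  |  |  |
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

c[1,1]: x[1] = A and y[1] = B differ, so c[1,1] = max(c[0,1], c[1,0]) = 0
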
LCS – Example (3)
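i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 |  |
2 | B | 0 |  |  |  |  |
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

c[1,2], c[1,3]: A differs from D and from C, so both entries are the max of the cells above and to the left, which is 0
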
LCS – Example (4)
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 |
2 | B | 0 |  |  |  |  |
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

c[1,4]: x[1] = A matches y[4] = A, so c[1,4] = c[0,3] + 1 = 1

LCS – Example (5)
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 |  |  |  |  |
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

c[1,5]: A and B differ, so c[1,5] = max(c[0,5], c[1,4]) = max(0, 1) = 1

LCS – Example (6)
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 |  |  |  |
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

c[2,1]: x[2] = B matches y[1] = B, so c[2,1] = c[1,0] + 1 = 1

LCS – Example (7)
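i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 |
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

c[2,2], c[2,3], c[2,4]: B differs from D, C and A, so each entry is the max of the cells above and to the left, which is 1
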
LCS – Example (8)
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 |  |  |  |  |
4 | B | 0 |  |  |  |  |

c[2,5]: x[2] = B matches y[5] = B, so c[2,5] = c[1,4] + 1 = 2

LCS – Example (9)
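i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 | 1 | 1 |  |  |
4 | B | 0 |  |  |  |  |

c[3,1], c[3,2]: C differs from B and from D, so c[3,1] = max(c[2,1], c[3,0]) = 1 and c[3,2] = max(c[2,2], c[3,1]) = 1
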
LCS – Example (10)
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 | 1 | 1 | 2 |  |
4 | B | 0 |  |  |  |  |

c[3,3]: x[3] = C matches y[3] = C, so c[3,3] = c[2,2] + 1 = 2

LCS – Example (11)
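i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 | 1 | 1 | 2 | 2 | 2
4 | B | 0 |  |  |  |  |

c[3,4], c[3,5]: C differs from A and from B, so c[3,4] = max(c[2,4], c[3,3]) = 2 and c[3,5] = max(c[2,5], c[3,4]) = 2
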
LCS – Example (12)
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 | 1 | 1 | 2 | 2 | 2
4 | B | 0 | 1 |  |  |  |

c[4,1]: x[4] = B matches y[1] = B, so c[4,1] = c[3,0] + 1 = 1

LCS – Example (13)
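i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 | 1 | 1 | 2 | 2 | 2
4 | B | 0 | 1 | 1 | 2 | 2 |

c[4,2], c[4,3], c[4,4]: B differs from D, C and A, so c[4,2] = max(c[3,2], c[4,1]) = 1, c[4,3] = max(c[3,3], c[4,2]) = 2 and c[4,4] = max(c[3,4], c[4,3]) = 2
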
LCS – Example (14)
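i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 | 1 | 1 | 2 | 2 | 2
4 | B | 0 | 1 | 1 | 2 | 2 | 3

c[4,5]: x[4] = B matches y[5] = B, so c[4,5] = c[3,4] + 1 = 3, which is the length of the LCS of X and Y
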
How to find actual LCS?
• So far, we have just found the length of the LCS, but not the LCS itself.
• We want to modify this algorithm to make it output the Longest Common
Subsequence of X and Y
• Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1]
• For each c[i,j] we can say how it was acquired. For example, in this corner of the table:

c[i-1,j-1] = 2 | c[i-1,j] = 2
c[i,j-1] = 2 | c[i,j] = 3

here c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3

How to find actual LCS?
• Remember that

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i,j-1],\ c[i-1,j]) & \text{otherwise}
\end{cases}
\]

• So we can start from c[m,n] and go backwards

• Look first to see if the 2nd case above was true, i.e. whether c[i,j] equals c[i-1,j] or c[i,j-1];
if so, move to that cell
• If not, then c[i,j] = c[i-1,j-1] + 1, so remember x[i] (because x[i] is a part of
the LCS) and move to c[i-1,j-1]
• When i=0 or j=0 (i.e. we reached the beginning), output the remembered letters
in reverse order

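The backward walk described above can also be written with a loop instead of recursion. The following is a hedged sketch, not the lecture's own code: the function name `lcs` and the list `remembered` are assumptions, and the table is rebuilt inside the function only to keep the example self-contained.

```python
def lcs(x, y):
    """Return one longest common subsequence of the strings x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from c[m][n], remembering the matched symbols.
    remembered = []
    i, j = m, n
    while i > 0 and j > 0:
        if c[i][j] == c[i - 1][j]:      # 2nd case: value came from above
            i -= 1
        elif c[i][j] == c[i][j - 1]:    # 2nd case: value came from the left
            j -= 1
        else:                           # 1st case: x[i] matched y[j]
            remembered.append(x[i - 1])
            i, j = i - 1, j - 1
    # Symbols were collected from the end of the LCS to its start.
    return "".join(reversed(remembered))


print(lcs("ABCB", "BDCAB"))  # BCB
```

When several longest common subsequences exist, the order of the two checks (up before left) decides which one is returned; any of them has the same length.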
How to find actual LCS? - Algorithm
• Here’s a recursive algorithm to do this:

LCS_print(x, m, n, c) {
    if (m == 0 || n == 0) // reached the beginning: nothing left to print
        return;
    if (c[m][n] == c[m-1][n]) // go up?
        LCS_print(x, m-1, n, c);
    else if (c[m][n] == c[m][n-1]) // go left?
        LCS_print(x, m, n-1, c);
    else { // it was a match!
        LCS_print(x, m-1, n-1, c);
        print(x[m]); // print after recursive call
    }
}

How to find actual LCS? - Example
i | Xi | j=0 | j=1 (B) | j=2 (D) | j=3 (C) | j=4 (A) | j=5 (B)
0 |  | 0 | 0 | 0 | 0 | 0 | 0
1 | A | 0 | 0 | 0 | 0 | 1 | 1
2 | B | 0 | 1 | 1 | 1 | 1 | 2
3 | C | 0 | 1 | 1 | 2 | 2 | 2
4 | B | 0 | 1 | 1 | 2 | 2 | 3

LCS_print(x, m, n, c) starts at the bottom-right entry c[4,5] = 3

How to find actual LCS? - Example
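Following LCS_print from c[4,5] = 3 in the completed table:
• c[4,5]: neither c[3,5] = 2 nor c[4,4] = 2 equals 3, so x[4] = B is a match; move to c[3,4]
• c[3,4] = 2 differs from c[2,4] = 1 but equals c[3,3] = 2; go left to c[3,3]
• c[3,3]: neither c[2,3] = 1 nor c[3,2] = 1 equals 2, so x[3] = C is a match; move to c[2,2]
• c[2,2] = 1 differs from c[1,2] = 0 but equals c[2,1] = 1; go left to c[2,1]
• c[2,1]: neither c[1,1] = 0 nor c[2,0] = 0 equals 1, so x[2] = B is a match; move to c[1,0] and stop

LCS (reversed order): B C B

LCS (straight order): BCB
(this string turned out to be a palindrome)
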
LCS – Running Time
• The LCS algorithm calculates the value of each entry of the array c[0..m, 0..n]
• So what is the running time?
• O(mn): each of the (m+1)(n+1) entries is computed in O(1) time from at most three
neighbouring entries. Walking back from c[m,n] to recover the LCS itself adds only O(m + n) steps.