Lecture - 21 - Dynamic Programming - LCS
LCS Applications
• File/version comparison: detects unchanged lines between file versions to highlight additions/deletions during software development (e.g., git diff).
• Bioinformatics: aligns DNA, RNA, or protein sequences to find functional, structural, or evolutionary relationships.
• Plagiarism detection: finds long common sequences of words or phrases across documents to detect copied content.
• Data compression: identifies repeated subsequences so that only the changes are stored instead of the entire data, reducing storage needs.
• Spell checking: compares a user-typed word with dictionary entries and suggests the closest one, i.e., the word sharing the longest character subsequence.
• NLP/text similarity: measures similarity between sentences or paragraphs by their longest matching sequence of words.
• Record matching: matches inconsistent or slightly varied records (e.g., “Kashif Ayyub” vs “Kashif Ayub”) in databases by finding common subsequences.
Brute Force Approach
• Given two strings:
• X = "AB"
• Y = "ACB"
• Goal: Find the Longest Common Subsequence (LCS) using Brute Force.
• Step 1: Generate All Subsequences
  • Subsequences of X = "AB" (2^2 = 4): "", "A", "B", "AB"
  • Subsequences of Y = "ACB" (2^3 = 8): "", "A", "C", "B", "AC", "AB", "CB", "ACB"
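Step 1 can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the helper name `all_subsequences` is hypothetical, and it assumes subsequences are formed by choosing increasing index sets with the standard-library `itertools.combinations`.

```python
from itertools import combinations


def all_subsequences(s: str) -> list[str]:
    """Return every subsequence of s (including ""), one per choice of indices."""
    result = []
    for k in range(len(s) + 1):                       # subsequence length 0..len(s)
        for idx in combinations(range(len(s)), k):    # increasing index tuples
            result.append("".join(s[i] for i in idx))
    return result


print(all_subsequences("AB"))    # ['', 'A', 'B', 'AB']
print(all_subsequences("ACB"))   # ['', 'A', 'C', 'B', 'AC', 'AB', 'CB', 'ACB']
```

Duplicates are kept on purpose (e.g., "ABCB" yields "B" twice), so a string of length m always produces exactly 2^m entries.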
Brute Force Approach
• Step 2: Compare Each Subsequence of X with Each Subsequence of Y
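X subsequence   Matching Y subsequences     Common?
""              all                         ✅
"A"             "A", "AC", "AB", "ACB"      ✅
"B"             "B", "AB", "CB", "ACB"      ✅
"AB"            "AB", "ACB"                 ✅
("" is a subsequence of any string.)
• Step 3: All four subsequences of X are common; the longest is "AB", so LCS(X, Y) = "AB" (length 2).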
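Steps 1 and 2 together give a brute-force LCS. The sketch below uses hypothetical helper names (`all_subsequences`, `brute_force_lcs`, not from the lecture) and tests every (X subsequence, Y subsequence) pair for equality, keeping the longest match.

```python
from itertools import combinations


def all_subsequences(s: str) -> list[str]:
    """Every subsequence of s (including ""), one per increasing index set."""
    return ["".join(s[i] for i in idx)
            for k in range(len(s) + 1)
            for idx in combinations(range(len(s)), k)]


def brute_force_lcs(x: str, y: str) -> str:
    """Compare all 2^m * 2^n subsequence pairs and keep the longest equal pair."""
    best = ""
    ys = all_subsequences(y)              # generate Y's subsequences once
    for sx in all_subsequences(x):
        for sy in ys:
            if sx == sy and len(sx) > len(best):
                best = sx
    return best


print(brute_force_lcs("AB", "ACB"))       # AB
print(brute_force_lcs("ABCB", "BDCAB"))   # BCB
```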
Brute Force Approach
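• X of length m → $2^m$ subsequences
• Y of length n → $2^n$ subsequences
• Compare each pair → $2^m \times 2^n$ comparisons
• Each comparison checks up to min(m, n) characters, so the total time is $O(2^m \cdot 2^n \cdot \min(m,n)) = O(2^{m+n}\min(m,n))$.
• This is exponential time, which makes brute force highly inefficient and impractical for even moderate values of m and n.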
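Plugging numbers into the pair count makes the growth concrete. The first line uses the running example (m = 2, n = 3); the second uses m = n = 30, an illustrative size chosen here rather than one from the lecture.

```latex
\begin{aligned}
m = 2,\ n = 3 &:\quad 2^{2} \cdot 2^{3} = 4 \cdot 8 = 32 \text{ pairs} \\
m = n = 30 &:\quad 2^{30} \cdot 2^{30} = 2^{60} \approx 1.15 \times 10^{18} \text{ pairs}
\end{aligned}
```

Assuming a hypothetical machine that checks $10^9$ pairs per second, $2^{60}$ pairs would still take about $1.15 \times 10^{9}$ seconds, roughly 36 years.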
LCS using DP - Algorithm
• Instead of checking all subsequences (like Brute Force), DP solves smaller
subproblems and stores their results in a table, building up to the final
answer.
• First we’ll find the length of the LCS. Later we’ll modify the algorithm to find the LCS itself.
• Define Xi, Yj to be the prefixes of X and Y of length i and j, respectively.
• Define c[i,j] to be the length of the LCS of Xi and Yj.
• Then the length of the LCS of X and Y is c[m,n], where
$$
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise}
\end{cases}
$$
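A minimal bottom-up sketch of this recurrence in Python. It assumes 0-based Python strings, so the slides' x[i] becomes `x[i - 1]`; the name `lcs_length` is hypothetical.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the LCS of x and y, filling c[i][j] for all prefixes.

    Row i / column j stand for the prefixes X_i and Y_j, so the table has
    (m+1) x (n+1) cells; row 0 and column 0 are the empty prefixes.
    """
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:                  # x[i] = y[j] on the slide
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[m][n]


print(lcs_length("ABCB", "BDCAB"))   # 3
print(lcs_length("AB", "ACB"))       # 2
```

Each of the (m+1)(n+1) cells takes constant work, so this runs in O(mn) time and space instead of exponential time.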
LCS – Recursive Solution
• We start with i = j = 0 (empty prefixes of X and Y).
• Since X0 and Y0 are empty strings, their LCS is empty (i.e., c[0,0] = 0).
• The LCS of the empty string and any other string is empty, so for every i and j: c[0,j] = c[i,0] = 0.
$$
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x[i] = y[j], \\
\max(c[i,j-1],\, c[i-1,j]) & \text{if } i, j > 0 \text{ and } x[i] \neq y[j]
\end{cases}
$$
LCS – Recursive Solution
• When we calculate c[i,j], we consider two cases:
• First case: x[i] = y[j]. One more symbol in X and Y matches, so the length of the LCS of Xi and Yj equals the length of the LCS of the shorter prefixes Xi-1 and Yj-1, plus 1.
• Second case: x[i] ≠ y[j]. The symbols don’t match, so the solution does not improve, and the length of LCS(Xi, Yj) is the larger of the lengths of LCS(Xi, Yj-1) and LCS(Xi-1, Yj).
• Example: for X = ABCB and Y = BDCAB, LCS(X, Y) = BCB.
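The two cases, together with the base case c[0,j] = c[i,0] = 0, can also be evaluated top-down: recurse from c[m,n] and cache each result so every (i, j) pair is solved once. A sketch assuming Python's `functools.lru_cache` as the memo table; `lcs_length_memo` is a hypothetical name.

```python
from functools import lru_cache


def lcs_length_memo(x: str, y: str) -> int:
    """Top-down (memoized) evaluation of the LCS recurrence."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        if i == 0 or j == 0:                   # an empty prefix has an empty LCS
            return 0
        if x[i - 1] == y[j - 1]:               # first case: x[i] = y[j]
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))   # second case: no match

    return c(len(x), len(y))


print(lcs_length_memo("ABCB", "BDCAB"))   # 3
```

Each call lowers i or j, so calls can nest about m + n deep; with CPython's default recursion limit of 1000, the bottom-up table is the safer choice for long strings.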
LCS – Example (0)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0
1  A
2  B
3  C
4  B
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..m, 0..n], i.e., (m+1) × (n+1) = 5 × 6 cells
LCS – Example (1)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0
2  B        0
3  C        0
4  B        0
for i = 0 to m: c[i,0] = 0
for j = 0 to n: c[0,j] = 0
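In Python, the allocation and initialization above might look like this (a sketch; variable names are illustrative). Building the rows with a list comprehension matters: multiplying a list of lists repeats one row object, so every row would alias the same list.

```python
m, n = 4, 5  # m = |X| for X = "ABCB", n = |Y| for Y = "BDCAB"

# m+1 independent rows of n+1 cells: indices 0..m and 0..n are all valid.
# None marks interior cells that the DP will fill later.
c = [[None] * (n + 1) for _ in range(m + 1)]

# Base cases: the first column and the first row are 0.
for i in range(m + 1):
    c[i][0] = 0
for j in range(n + 1):
    c[0][j] = 0

# Pitfall: multiplying the outer list repeats ONE row object m+1 times,
# so writing to one "row" changes all of them.
bad = [[0] * (n + 1)] * (m + 1)
bad[1][1] = 7

print(len(c), len(c[0]))      # 5 6
print(bad[0][1], bad[4][1])   # 7 7
```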
LCS – Example (2)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0
2  B        0
3  C        0
4  B        0
if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
x[1] = A ≠ y[1] = B, so c[1,1] = max(c[0,1], c[1,0]) = 0.
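One application of the update rule can be sketched as a hypothetical helper `fill_cell`. It assumes the table is filled row by row, left to right, so c[i-1][j-1], c[i-1][j] and c[i][j-1] are ready when c[i][j] is computed; the slide's 1-based x[i] becomes `x[i - 1]` on a Python string.

```python
def fill_cell(c: list[list[int]], x: str, y: str, i: int, j: int) -> None:
    """Apply the update rule to c[i][j] (i, j >= 1), using 1-based slide indices."""
    if x[i - 1] == y[j - 1]:                    # x[i] == y[j]
        c[i][j] = c[i - 1][j - 1] + 1
    else:
        c[i][j] = max(c[i - 1][j], c[i][j - 1])


x, y = "ABCB", "BDCAB"
c = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]

fill_cell(c, x, y, 1, 1)          # A vs B: no match
print(c[1][1])                    # 0

for j in range(2, len(y) + 1):    # finish row 1, left to right
    fill_cell(c, x, y, 1, j)
print(c[1])                       # [0, 0, 0, 0, 1, 1]
```

Finishing row 1 this way reproduces row 1 of Example (5).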
LCS – Example (3)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0
2  B        0
3  C        0
4  B        0
x[1] = A differs from y[2] = D and y[3] = C, so c[1,2] = c[1,3] = 0.
LCS – Example (4)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1
2  B        0
3  C        0
4  B        0
x[1] = A = y[4], so c[1,4] = c[0,3] + 1 = 1.
LCS – Example (5)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0
3  C        0
4  B        0
x[1] = A ≠ y[5] = B, so c[1,5] = max(c[0,5], c[1,4]) = max(0, 1) = 1.
LCS – Example (6)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1
3  C        0
4  B        0
x[2] = B = y[1], so c[2,1] = c[1,0] + 1 = 1.
LCS – Example (7)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1
3  C        0
4  B        0
x[2] = B differs from y[2] = D, y[3] = C, and y[4] = A, so each cell takes max(c[1,j], c[2,j-1]) = 1.
LCS – Example (8)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0
4  B        0
x[2] = B = y[5], so c[2,5] = c[1,4] + 1 = 2.
LCS – Example (9)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0  1  1
4  B        0
x[3] = C differs from y[1] = B and y[2] = D, so c[3,1] = c[3,2] = 1.
LCS – Example (10)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0  1  1  2
4  B        0
x[3] = C = y[3], so c[3,3] = c[2,2] + 1 = 2.
LCS – Example (11)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0  1  1  2  2  2
4  B        0
x[3] = C differs from y[4] = A and y[5] = B, so c[3,4] = max(1, 2) = 2 and c[3,5] = max(2, 2) = 2.
LCS – Example (12)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0  1  1  2  2  2
4  B        0  1
x[4] = B = y[1], so c[4,1] = c[3,0] + 1 = 1.
LCS – Example (13)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0  1  1  2  2  2
4  B        0  1  1  2  2
x[4] = B differs from y[2] = D, y[3] = C, and y[4] = A, so c[4,2] = 1, c[4,3] = 2, and c[4,4] = 2.
LCS – Example (14)
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0  1  1  2  2  2
4  B        0  1  1  2  2  3
x[4] = B = y[5], so c[4,5] = c[3,4] + 1 = 3. The length of the LCS is c[m,n] = c[4,5] = 3.
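Running all 20 cell updates in a loop reproduces the final table. The sketch below (hypothetical name `lcs_table`) fills the table row by row and prints it with X down the side and Y across the top.

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """Fill the whole (m+1) x (n+1) table row by row, left to right."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


x, y = "ABCB", "BDCAB"
c = lcs_table(x, y)
print("     " + "  ".join(y))                 # column labels y[1..n]
for i, row in enumerate(c):
    label = x[i - 1] if i > 0 else " "        # row labels x[1..m]
    print(label, "  ".join(str(v) for v in row))
```

The printed rows should match Example (14), with c[4][5] = 3 in the bottom-right corner.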
How to find actual LCS?
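• So far, we have found only the length of the LCS, not the LCS itself.
• We now modify the algorithm so that it outputs a longest common subsequence of X and Y.
• Each c[i,j] depends on c[i-1,j-1] (when x[i] = y[j]) or on c[i-1,j] and c[i,j-1] (otherwise).
• So for each c[i,j] we can tell how it was obtained: diagonally from c[i-1,j-1] after a match, from above (c[i-1,j]), or from the left (c[i,j-1]).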
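One way to record how each c[i,j] was obtained is a second table of step labels filled alongside c. This is a sketch under a convention chosen here, not stated in the slides: "D" (diagonal) whenever x[i] = y[j], otherwise "U" (up) when c[i-1,j] ≥ c[i,j-1] and "L" (left) otherwise. The name `lcs_directions` is hypothetical.

```python
def lcs_directions(x: str, y: str) -> tuple[list[list[int]], list[list[str]]]:
    """Return the length table c and a label table b recording each cell's source."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [["."] * (n + 1) for _ in range(m + 1)]   # "." marks the empty-prefix border
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "D"                     # diagonal: a match
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "U"                     # up: copied from c[i-1][j]
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "L"                     # left: copied from c[i][j-1]
    return c, b


c, b = lcs_directions("ABCB", "BDCAB")
for row in b[1:]:
    print(" ".join(row[1:]))   # labels for rows 1..m, columns 1..n
```

For X = ABCB and Y = BDCAB, following the labels from (4,5) gives D, L, D, L, D, the same path LCS_print takes on this table.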
How to find actual LCS? - Algorithm
• Here’s a recursive algorithm to do this:
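LCS_print(x, m, n, c) {
    if (m == 0 || n == 0)            // empty prefix: nothing left to print
        return;
    if (c[m][n] == c[m-1][n])        // go up?
        LCS_print(x, m-1, n, c);
    else if (c[m][n] == c[m][n-1])   // go left?
        LCS_print(x, m, n-1, c);
    else {                           // it was a match!
        LCS_print(x, m-1, n-1, c);
        print(x[m]);                 // print after recursive call
    }
}
• The initial call is LCS_print(x, m, n, c) on the filled table, with x indexed from 1.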
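A direct Python translation of LCS_print, offered as a sketch: it collects symbols in a list instead of printing so the result can be reused, and it keeps the `i == 0 or j == 0` stopping condition. `lcs_table` and `lcs_print` are hypothetical names.

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """Bottom-up LCS length table with (m+1) x (n+1) cells."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def lcs_print(x: str, i: int, j: int, c: list[list[int]], out: list[str]) -> None:
    """Walk back from c[i][j], appending matched symbols to out in order."""
    if i == 0 or j == 0:                  # empty prefix: nothing left to emit
        return
    if c[i][j] == c[i - 1][j]:            # go up?
        lcs_print(x, i - 1, j, c, out)
    elif c[i][j] == c[i][j - 1]:          # go left?
        lcs_print(x, i, j - 1, c, out)
    else:                                 # it was a match
        lcs_print(x, i - 1, j - 1, c, out)
        out.append(x[i - 1])              # slide's x[i]; emitted after the call


x, y = "ABCB", "BDCAB"
c = lcs_table(x, y)
out: list[str] = []
lcs_print(x, len(x), len(y), c, out)
print("".join(out))   # BCB
```

Appending after the recursive call is what puts the symbols in left-to-right order, mirroring where print(x[m]) sits in the pseudocode.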
How to find actual LCS? - Example
         j  0  1  2  3  4  5
i  Xi   Yj     B  D  C  A  B
0           0  0  0  0  0  0
1  A        0  0  0  0  1  1
2  B        0  1  1  1  1  2
3  C        0  1  1  2  2  2
4  B        0  1  1  2  2  3
• Calling LCS_print(x, 4, 5, c) walks back from c[4,5] = 3:
  (4,5) match B → (3,4) left → (3,3) match C → (2,2) left → (2,1) match B → (1,0) stop.
• Because each symbol is printed after its recursive call returns, the output is B C B, i.e., LCS(X, Y) = BCB.
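Because LCS_print recurses once per step, its depth can reach about m + n, which can exceed CPython's default recursion limit of 1000 for long inputs. A loop-based sketch of the same walk (hypothetical name `lcs_string`) avoids that; it collects symbols back to front and reverses them at the end.

```python
def lcs_string(x: str, y: str) -> str:
    """Fill the LCS table, then walk back from (m, n) with a loop."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if c[i][j] == c[i - 1][j]:        # go up
            i -= 1
        elif c[i][j] == c[i][j - 1]:      # go left
            j -= 1
        else:                             # match: x[i] (1-based) is in the LCS
            out.append(x[i - 1])
            i -= 1
            j -= 1
    return "".join(reversed(out))         # symbols were collected back to front


print(lcs_string("ABCB", "BDCAB"))   # BCB
```

For example, two strings of 1,200 A's need 1,200 diagonal steps, more than the default limit allows for the recursive version, while the loop handles them without changing any settings.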