
Lecture Notes

The document discusses the longest common subsequence (LCS) problem and dynamic programming approach. It defines the LCS problem as finding the longest subsequence that is common to two given sequences. It presents a dynamic programming algorithm that builds the solution from optimal solutions to overlapping subproblems. The algorithm uses a 2D array to store the lengths of LCS between prefixes of the input sequences, with running time of O(mn) where m and n are sequence lengths. An example demonstrates how the algorithm works step-by-step to find the LCS of two sample sequences.

Uploaded by

ananyaxx09

MA 515: Introduction to Algorithms &

MA353 : Design and Analysis of Algorithms


[3-0-0-6]
Lecture 27
http://www.iitg.ernet.in/psm/indexing_ma353/y09/index.html

Partha Sarathi Mandal


[email protected]
Dept. of Mathematics, IIT Guwahati
Mon 10:00-10:55 Tue 11:00-11:55 Fri 9:00-9:55
Class Room : 2101
Dynamic programming

• One of the most important algorithm tools!
• Very common interview question
• Method for solving problems where optimal solutions can be defined in terms of optimal solutions to sub-problems AND
• the sub-problems are overlapping
Identifying a dynamic programming problem

• The solution can be defined with respect to solutions to subproblems
• The subproblems created are overlapping, that is, we see the same subproblems repeated
Two main ideas for dynamic programming

• Identify a solution to the problem with respect to smaller subproblems
– F(n) = F(n-1) + F(n-2)
• Bottom up: start with solutions to the smallest problems and build solutions to the larger problems, using an array to store solutions to subproblems

Fib(n):
    F[0] := 0; F[1] := 1;
    for i := 2 to n do
        F[i] := F[i-1] + F[i-2]
    return F[n]
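The bottom-up Fib(n) pseudocode above could be written in Python roughly as follows. This is a minimal sketch; the function name `fib` and the explicit guard for n < 2 are choices made for this illustration, not part of the slides.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number using a bottom-up table."""
    if n < 2:
        return n  # F[0] = 0 and F[1] = 1 are the smallest subproblems
    # F[i] stores the solution to subproblem i, so each one is computed once.
    F = [0] * (n + 1)
    F[1] = 1
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]


print(fib(10))  # 55
```

Each entry is filled from two entries that are already in the array, so the loop does a constant amount of work per subproblem.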
Longest common subsequence (LCS)
• For a sequence X = x1, x2, …, xn, a subsequence is a subset of the sequence defined by a set of increasing indices (i1, i2, …, ik) where 1 ≤ i1 < i2 < … < ik ≤ n

X = A B A C D A B A B

ABA? Yes (e.g. indices 1, 2, 3)
ACA? Yes (e.g. indices 1, 4, 6)
DCA? No (no C occurs after the D)
AADAA? Yes (indices 1, 3, 5, 6, 8)
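Whether a candidate string is a subsequence of X can be checked with a single left-to-right scan. The sketch below is one way to do it; the helper name `is_subsequence` is hypothetical and not taken from the slides.

```python
def is_subsequence(s: str, x: str) -> bool:
    """Return True if s can be obtained from x by deleting zero or more symbols."""
    i = 0  # index of the next symbol of s that still has to be matched
    for symbol in x:
        if i < len(s) and symbol == s[i]:
            i += 1  # matched s[i] at a later position of x than the previous match
    return i == len(s)


X = "ABACDABAB"
for candidate in ["ABA", "ACA", "DCA", "AADAA"]:
    print(candidate, is_subsequence(candidate, X))
```

Matching each symbol at the earliest possible position is enough: if any increasing set of indices works, the greedy one does too.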
LCS problem

• Given two sequences X and Y, a common subsequence is a subsequence that occurs in both X and Y
• Given two sequences X = x1, x2, …, xm and Y = y1, y2, …, yn, what is the longest common subsequence?

X = A B C B D A B
Y = B D C A B A

The sequence B, D, A, B, of length 4, is an LCS of X and Y, since there is no common subsequence of length 5 or greater. The LCS need not be unique: B, C, B, A is another common subsequence of length 4.
LCS problem

Application:
comparison of two DNA strings

Brute force algorithm would compare each subsequence of X with the symbols in Y
LCS Algorithm

• Brute-force algorithm: For every subsequence of X, check if it’s a subsequence of Y
– How many subsequences of X are there?
– What will be the running time of the brute-force algorithm?
LCS Algorithm

• If |X| = m, |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons)
• So the running time of the brute-force algorithm is O(n·2^m)
• Notice that the LCS problem has optimal substructure: solutions of subproblems are parts of the final solution.
• Subproblems: “find LCS of pairs of prefixes of X and Y”
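To make the brute-force cost concrete, here is a small sketch that enumerates index subsets of X from longest to shortest and tests each against Y. The names `lcs_brute_force` and `is_subsequence` are hypothetical; the code is only practical for very short strings.

```python
from itertools import combinations


def is_subsequence(s: str, y: str) -> bool:
    """Check in O(len(y)) time whether s is a subsequence of y."""
    i = 0
    for symbol in y:
        if i < len(s) and symbol == s[i]:
            i += 1
    return i == len(s)


def lcs_brute_force(x: str, y: str) -> str:
    """Return one LCS of x and y by trying subsequences of x, longest first."""
    for k in range(len(x), 0, -1):
        # Every size-k set of increasing indices defines one subsequence of x.
        for indices in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in indices)
            if is_subsequence(candidate, y):
                return candidate
    return ""  # only the empty subsequence is common


print(len(lcs_brute_force("ABCBDAB", "BDCABA")))  # 4
```

In the worst case all 2^m index subsets are generated and each check scans Y once, which is where the O(n·2^m) bound comes from.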
LCS Algorithm

• First we’ll find the length of the LCS. Later we’ll modify the algorithm to find the LCS itself.
• Define Xi, Yj to be the prefixes of X and Y of length i and j respectively
• Define c[i,j] to be the length of the LCS of Xi and Yj
• Then the length of the LCS of X and Y will be c[m,n]
• c[m,n] is the final solution.
Step 1: Define the problem with respect to subproblems

X = A B C B D A ?
Y = B D C A B ?

Is the last character part of the LCS?
Two cases: either the last characters are the same or they’re different.

If they’re the same (e.g. X = A B C B D A A, Y = B D C A B A), the matching character is part of the LCS:

$\mathrm{LCS}(X, Y) = \mathrm{LCS}(X_{m-1}, Y_{n-1}) + 1$

If they’re different (e.g. X = A B C B D A B, Y = B D C A B A), the LCS cannot use both last characters, so it is found either without the last character of X or without the last character of Y:

$\mathrm{LCS}(X, Y) = \mathrm{LCS}(X_{m-1}, Y)$ or $\mathrm{LCS}(X, Y) = \mathrm{LCS}(X, Y_{n-1})$

We do not know in advance which one is better, so we take the maximum. Here LCS(·,·) denotes the length of the longest common subsequence:

$$
\mathrm{LCS}(X, Y) =
\begin{cases}
1 + \mathrm{LCS}(X_{m-1}, Y_{n-1}) & \text{if } x_m = y_n \\
\max\bigl(\mathrm{LCS}(X_{m-1}, Y),\ \mathrm{LCS}(X, Y_{n-1})\bigr) & \text{otherwise}
\end{cases}
$$
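The recurrence can be transcribed almost literally as a recursive function. The sketch below adds two things that are not on this slide: the base case (an empty prefix has LCS length 0) and memoization via `functools.lru_cache`, so that each overlapping subproblem is solved only once. This is a top-down variant, not the bottom-up table built in the following slides.

```python
from functools import lru_cache


def lcs_length(x: str, y: str) -> int:
    """Length of the LCS of x and y, computed top-down with memoization."""

    @lru_cache(maxsize=None)
    def solve(m: int, n: int) -> int:
        # solve(m, n) is the LCS length of the prefixes x[:m] and y[:n].
        if m == 0 or n == 0:
            return 0  # assumed base case: an empty prefix shares nothing
        if x[m - 1] == y[n - 1]:
            return 1 + solve(m - 1, n - 1)
        return max(solve(m - 1, n), solve(m, n - 1))

    return solve(len(x), len(y))


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

Without the cache the same (m, n) pairs would be recomputed many times, which is exactly the overlap that makes this a dynamic programming problem.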
Step 2: Build the solution from the bottom up

$$
\mathrm{LCS}(X, Y) =
\begin{cases}
\mathrm{LCS}(X_{m-1}, Y_{n-1}) + 1 & \text{if } x_m = y_n \\
\max\bigl(\mathrm{LCS}(X_{m-1}, Y),\ \mathrm{LCS}(X, Y_{n-1})\bigr) & \text{otherwise}
\end{cases}
$$

What types of subproblem solutions do we need to store?

LCS(Xi, Yj): two different indices

$$
c[i, j] =
\begin{cases}
1 + c[i-1, j-1] & \text{if } x_i = y_j \\
\max\bigl(c[i-1, j],\ c[i, j-1]\bigr) & \text{otherwise}
\end{cases}
$$
LCS recursive solution

$$
c[i, j] =
\begin{cases}
c[i-1, j-1] + 1 & \text{if } x_i = y_j \\
\max\bigl(c[i, j-1],\ c[i-1, j]\bigr) & \text{otherwise}
\end{cases}
$$

• We start with i = j = 0 (empty prefixes of X and Y)
• Since X0 and Y0 are empty strings, their LCS is always empty (i.e. c[0,0] = 0)
• The LCS of the empty string and any other string is empty, so for every i and j: c[0,j] = c[i,0] = 0
• When we calculate c[i,j], we consider two cases:
• First case: xi = yj: one more symbol in strings X and Y matches, so the length of the LCS of Xi and Yj equals the length of the LCS of the smaller strings Xi-1 and Yj-1, plus 1
• Second case: xi ≠ yj
• As the symbols don’t match, our solution is not improved, and the length of LCS(Xi, Yj) is the same as before (i.e. the maximum of LCS(Xi, Yj-1) and LCS(Xi-1, Yj))

Why not just take the length of LCS(Xi-1, Yj-1)?
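One way to approach this question is a small counterexample. The strings below are chosen only for illustration and do not appear in the slides. The last symbols differ, yet dropping both of them loses a match between the last symbol of X and an earlier symbol of Y:

```latex
\begin{align*}
X &= \texttt{AB}, \quad Y = \texttt{BC}, \quad x_2 = \texttt{B} \neq \texttt{C} = y_2 \\
c[1,1] &= \mathrm{LCS}(\texttt{A}, \texttt{B}) = 0 \\
c[1,2] &= \mathrm{LCS}(\texttt{A}, \texttt{BC}) = 0 \\
c[2,1] &= \mathrm{LCS}(\texttt{AB}, \texttt{B}) = 1 \\
c[2,2] &= \max\bigl(c[2,1],\ c[1,2]\bigr) = 1 \neq c[1,1]
\end{align*}
```

So c[i-1,j-1] can underestimate the answer, while the maximum of the two neighbors never does.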


LCS Length Algorithm
LCS-Length(X, Y)
    m = length(X)                  // get the # of symbols in X
    n = length(Y)                  // get the # of symbols in Y
    for i = 0 to m  c[i,0] = 0     // special case: Y0
    for j = 0 to n  c[0,j] = 0     // special case: X0
    for i = 1 to m                 // for all Xi
        for j = 1 to n             // for all Yj
            if ( xi == yj )
                c[i,j] = c[i-1,j-1] + 1
            else c[i,j] = max( c[i-1,j], c[i,j-1] )
    return c
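A possible Python rendering of LCS-Length is sketched below. The function name `lcs_length_table` is hypothetical. Python strings are 0-indexed, so the symbol written xi in the pseudocode is `x[i - 1]` here, and row 0 and column 0 are zeroed when the table is allocated.

```python
def lcs_length_table(x: str, y: str) -> list[list[int]]:
    """Return the (m+1) x (n+1) table c, where c[i][j] = LCS length of x[:i], y[:j]."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: the LCS with an empty prefix is empty.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


table = lcs_length_table("ABCB", "BDCAB")
for row in table:
    print(row)
print(table[4][5])  # 3
```

Filling row by row guarantees that the three entries each cell depends on are already available.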
LCS Example
We’ll see how the LCS algorithm works on the following example:
• X = ABCB
• Y = BDCAB
What is the Longest Common Subsequence of X and Y?
LCS(X, Y) = BCB
X = A B C B
Y = B D C A B
LCS Example (0)
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5] (5 rows, 6 columns)

      j    0   1   2   3   4   5
 i        yj   B   D   C   A   B
 0   xi
 1   A
 2   B
 3   C
 4   B
LCS Example (1)
for i = 0 to m c[i,0] = 0
for j = 0 to n c[0,j] = 0

      j    0   1   2   3   4   5
 i        yj   B   D   C   A   B
 0   xi    0   0   0   0   0   0
 1   A     0
 2   B     0
 3   C     0
 4   B     0
LCS Example (2)-(5): row i = 1 (x1 = A)
if ( xi == yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

• (2) j = 1: A ≠ B, so c[1,1] = max(c[0,1], c[1,0]) = 0
• (3) j = 2, 3: A ≠ D and A ≠ C, so c[1,2] = c[1,3] = 0
• (4) j = 4: A = A, so c[1,4] = c[0,3] + 1 = 1
• (5) j = 5: A ≠ B, so c[1,5] = max(c[0,5], c[1,4]) = 1

      j    0   1   2   3   4   5
 i        yj   B   D   C   A   B
 0   xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0
 3   C     0
 4   B     0
LCS Example (6)-(8): row i = 2 (x2 = B)
• (6) j = 1: B = B, so c[2,1] = c[1,0] + 1 = 1
• (7) j = 2, 3, 4: B ≠ D, B ≠ C, B ≠ A, so each entry is max(c[1,j], c[2,j-1]) = 1
• (8) j = 5: B = B, so c[2,5] = c[1,4] + 1 = 2

      j    0   1   2   3   4   5
 i        yj   B   D   C   A   B
 0   xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0   1   1   1   1   2
 3   C     0
 4   B     0
LCS Example (10)-(12): row i = 3 (x3 = C)
• (10) j = 1, 2: C ≠ B and C ≠ D, so c[3,1] = c[3,2] = 1
• (11) j = 3: C = C, so c[3,3] = c[2,2] + 1 = 2
• (12) j = 4, 5: C ≠ A and C ≠ B, so c[3,4] = c[3,5] = 2

      j    0   1   2   3   4   5
 i        yj   B   D   C   A   B
 0   xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0   1   1   1   1   2
 3   C     0   1   1   2   2   2
 4   B     0
LCS Example (13)-(15): row i = 4 (x4 = B)
• (13) j = 1: B = B, so c[4,1] = c[3,0] + 1 = 1
• (14) j = 2, 3, 4: B ≠ D, B ≠ C, B ≠ A, so c[4,2] = 1, c[4,3] = 2, c[4,4] = 2
• (15) j = 5: B = B, so c[4,5] = c[3,4] + 1 = 3

      j    0   1   2   3   4   5
 i        yj   B   D   C   A   B
 0   xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0   1   1   1   1   2
 3   C     0   1   1   2   2   2
 4   B     0   1   1   2   2   3

The length of the LCS is c[m,n] = c[4,5] = 3.
LCS Algorithm Running Time

• The LCS algorithm calculates the value of each entry of the array c[0..m, 0..n]
• So what is the running time?
• O(mn)
• since each c[i,j] is calculated in constant time, and there are (m+1)(n+1) = O(mn) elements in the array
How to find actual LCS

• So far, we have just found the length of the LCS, but not the LCS itself.
• We want to modify this algorithm to make it output the Longest Common Subsequence of X and Y
• Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1]
• For each c[i,j] we can say how it was acquired. For example, in this corner of the table

    2   2
    2   3

the lower-right entry is c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3
How to find actual LCS - continued

• Remember that

$$
c[i, j] =
\begin{cases}
c[i-1, j-1] + 1 & \text{if } x[i] = y[j] \\
\max\bigl(c[i, j-1],\ c[i-1, j]\bigr) & \text{otherwise}
\end{cases}
$$

• So we can start from c[m,n] and go backwards
• Whenever x[i] = y[j] (so c[i,j] = c[i-1,j-1] + 1), remember x[i] (because x[i] is a part of the LCS) and move to c[i-1,j-1]
• Otherwise move to whichever of c[i-1,j] and c[i,j-1] is larger
• When i = 0 or j = 0 (i.e. we reached the beginning), output remembered letters in reverse order
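The backward walk could be sketched in Python as follows. The function name `lcs` is hypothetical, and the tie-breaking rule (move up when the two neighbors are equal) is an assumption of this sketch; a different rule may return a different, equally long LCS.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # Fill the length table exactly as in LCS-Length.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from c[m][n], remembering the matched symbols.
    remembered = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            remembered.append(x[i - 1])  # this symbol is part of the LCS
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # assumed tie-break: prefer moving up
        else:
            j -= 1
    # Symbols were collected from the end, so reverse them.
    return "".join(reversed(remembered))


print(lcs("ABCB", "BDCAB"))  # BCB
```

The walk takes at most m + n steps, so recovering the LCS adds only O(m + n) work on top of filling the table.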
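Finding LCS

      j    0   1   2   3   4   5
 i        yj   B   D   C   A   B
 0   xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0   1   1   1   1   2
 3   C     0   1   1   2   2   2
 4   B     0   1   1   2   2   3

Finding LCS (2)
Start at c[4,5] = 3 and walk back:
• c[4,5]: x4 = y5 = B, remember B, move to c[3,4]
• c[3,4]: C ≠ A, move to the larger neighbor c[3,3] = 2
• c[3,3]: x3 = y3 = C, remember C, move to c[2,2]
• c[2,2]: B ≠ D, move to the larger neighbor c[2,1] = 1
• c[2,1]: x2 = y1 = B, remember B, move to c[1,0]; j = 0, so stop

LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)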
Complexity Analysis

• After we have filled the array c[ ], we can use this data to find the characters that constitute the Longest Common Subsequence
• The algorithm runs in O(mn), which is much better than the brute-force algorithm: O(n·2^m)
