Algorithms: Freely Using The Textbook by Cormen, Leiserson, Rivest, Stein
Péter Gács
Fall 2010
The class structure
Algorithm: INSERTION-SORT(A)
1 for j ← 2 to A.size do
2 key ← A[j]
// Insert A[j] into the sorted sequence A[1 . . j − 1]
3 i ← j − 1
4 while i > 0 and A[i] > key do
5 A[i + 1] ← A[i]
6 i ← i − 1
7 A[i + 1] ← key
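A direct transcription into Python, as a minimal sketch (0-based indexing, so the bounds shift by one; this is not the textbook's code):

```python
def insertion_sort(a):
    """Sort the list a in place, following the pseudocode above."""
    for j in range(1, len(a)):       # pseudocode: j = 2 .. A.size
        key = a[j]
        # Insert a[j] into the sorted prefix a[0 .. j-1].
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]          # shift the larger element right
            i -= 1
        a[i + 1] = key

a = [5, 2, 4, 6, 1, 3]
insertion_sort(a)                    # a becomes [1, 2, 3, 4, 5, 6]
```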
Pseudocode conventions
Assignment
Indentation
Objects: Arrays are also objects.
Meaning of A[3 . . 10].
Parameters: Ordinary parameters are passed by value.
Objects are always treated as a pointer to the body of data, which itself is not copied.
Analyzing algorithms
In the insertion sort, every time A[i] > key is found, two
assignments are made. So we perform 2 comparisons (cost c1) and
2 assignments (cost c2). But the bound c2 papers over some
differences. The assignment
A[i + 1] ← A[i]
could be costly (who knows how much data is in A[i]?). Even
without changing the algorithm, the choice of how the data is
stored can influence this cost significantly. Improvements:
A[i] should not contain the actual data if it is large, only the
address of the place where it can be found (a link).
Instead of an array, use a linked list. Then insertion does not
involve pushing back everything above.
(figure: array cells A[1], A[2], . . . , A[n], each holding a link to the actual data)
The array contains links to the actual data, so the copying during
sorting has a fixed low cost.
When moving an element to a different place in a linked list, most
intermediate elements are not touched.
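In Python this representation comes for free: a list stores references, so sorting moves pointers, never the records themselves. A small illustration (the record layout is my own, hypothetical example):

```python
# The list holds only references; each record may be arbitrarily large.
records = [{"key": 3, "payload": "x" * 10**6},
           {"key": 1, "payload": "y" * 10**6}]
# Sorting rearranges the references; the megabyte payloads are not copied.
records.sort(key=lambda r: r["key"])
print([r["key"] for r in records])   # [1, 3]
```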
Worst case, average case
Here are two functions that are not comparable. Let $f(n) = n^2$, and
for $k = 0, 1, 2, \ldots$ we define $g(n)$ recursively as follows. Let $n_k$ be
the sequence defined by $n_0 = 1$, $n_{k+1} = 2^{n_k}$. So $n_1 = 2$, $n_2 = 4$,
$n_3 = 16$, and so on. Let $g(n) = n_{k+1}$ for $n_k < n \le n_{k+1}$. So,
$$g(n_k + 1) = g(n_k + 2) = \cdots = g(n_{k+1}) = n_{k+1} = 2^{n_k}.$$
This gives $g(n) = 2^{n-1}$ for $n = n_k + 1$ and $g(n) = n$ for $n = n_{k+1}$.
Function $g(n)$ is sometimes much bigger than $f(n) = n^2$ and
sometimes much smaller: these functions are incomparable under the
asymptotic order $\overset{*}{<}$.
Some function classes
$$\log_b x = \frac{\log_2 x}{\log_2 b} = (\log_2 x)(\log_b 2).$$
These are all equivalence classes under $\overset{*}{=}$.
Some simplification rules
$n^2 - 3\log\log n \overset{*}{=} n^2$,
$3 + 1/n \overset{*}{=} 1$,
$(1.2)^{n-1} + \sqrt{n} \overset{*}{=} (1.2)^n$,
$n + \log n \overset{*}{=} n$.

Exercise: sort the following functions by growth rate:
$\sqrt{5n/2^n}$, $\log n / n$, $\log\log n$, $n/\log\log n$, $n \log^2 n$, $n^2$, $(1.2)^n$.

Solution:
$$\sqrt{5n/2^n} \;\overset{*}{<}\; \log n / n \;\overset{*}{<}\; 1 \;\overset{*}{<}\; \log\log n \;\overset{*}{<}\; n/\log\log n \;\overset{*}{<}\; n \log^2 n \;\overset{*}{<}\; n^2 \;\overset{*}{<}\; (1.2)^n.$$
Sums of series
$6 + 18 + \cdots + 2 \cdot 3^n \overset{*}{=} 3^n$ is easy.
Lower bound, with $k = \lceil n/2 \rceil$:
$$1^2 + \cdots + n^2 \ge k^2 + (k+1)^2 + \cdots + n^2 \ge (n/2 - 1)(n/2)^2 \overset{*}{=} n^3.$$
Infinite series
The sum in question is
$$\overset{*}{<}\; 1 + q + q^2 + \cdots = \frac{1}{1-q}, \qquad \text{where } q = 3^{-1/2}.$$
Another example:
Lower bound:
...
Assume $n = 2^k$:
Perform first the jobs at the bottom level, then those on the next
level, and so on. In passes $k$ and $k + 1$:
(figure: runs of length $2^{k-1}$ are merged into runs of length $2^k$)
Sorting without random access
What if we have to sort a list A so large that it does not fit
into random access memory?
If we are at a position p we can inspect or change A[p], but in
one step, we can only move to position p + 1 or p − 1.
Note that Algorithm MERGE(A, p, q, r) passes through the arrays
A, L, R in one direction only, without jumping around. But both the
recursive and the nonrecursive MERGE-SORT(A) use random access.
We will modify the nonrecursive MERGE-SORT(A) to work
without random access.
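A sketch of the idea in Python (the function names are mine): merge consumes each input run strictly left to right, so the runs could just as well be sequential files; the bottom-up driver below keeps lists for brevity, but it never needs random access inside a run.

```python
def merge(left, right):
    """Merge two sorted runs, scanning each in one direction only."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def merge_sort_bottom_up(a):
    """Nonrecursive merge sort: merge runs of length 1, 2, 4, ..."""
    runs = [[x] for x in a]
    while len(runs) > 1:
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(merge_sort_bottom_up([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```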
Runs
(figure: decision tree for sorting three elements a, b, c: the root compares a < b; its children compare a < c and b < c, with Y/N branches)
Indeed, when we increase the height by 1, each leaf can give rise to
at most two children, so the number of leaves can at most double.
We want a lower bound. Let us show that for n large enough, this is
≥ 0.8 n log n.
Indeed, if there is any shorter path to some leaf l, take a leaf l′ with
longest path, and move it under l. This decreases the average
path length.
1 x ← A[i]
2 if i < j then for k ← i + 1 to j do A[k − 1] ← A[k]
3 else if j < i then for k ← i − 1 downto j do A[k + 1] ← A[k]
4 A[j] ← x
A[PARENT(i)] ≥ A[i], where
PARENT(i) = ⌊i/2⌋, LEFT(i) = 2i, RIGHT(i) = 2i + 1.
(figure: a heap with A[1] = 15 at the root and children A[2] = 12, A[3] = 9)
The naive estimate for the cost is O(n log n), if n = A.heap-size. But
more analysis shows:
Though there is an exact formula for the last sum, a rough estimate
is more instructive for us. Since $h \le c\,2^{h/2}$ with some
constant $c$, write:
$$\sum_{h \ge 0} h 2^{-h} \le c \sum_{h \ge 0} 2^{-h/2} < \infty.$$
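A sketch of the linear-time construction in Python (0-based indexing, so LEFT(i) = 2i + 1 and RIGHT(i) = 2i + 2; the helper names are mine):

```python
def sift_down(a, i, n):
    """Restore the max-heap property at index i, assuming both subtrees are heaps."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def build_max_heap(a):
    """O(n) overall: nodes at height h cost O(h), and the sum above converges."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # leaves need no work
        sift_down(a, i, n)

a = [3, 9, 15, 1, 12]
build_max_heap(a)   # a[0] is now 15, and A[PARENT(i)] >= A[i] throughout
```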
$$\underbrace{p, \ldots, i}_{\le x}\;,\quad \underbrace{i + 1, \ldots, j - 1}_{> x}$$
Algorithm 6.2: QUICKSORT(A, p, r)
1 if p < r then
2 q ← PARTITION(A, p, r)
3 QUICKSORT(A, p, q − 1); QUICKSORT(A, q + 1, r)
Performance analysis
Worst case
T (n) = T (q − 1) + T (n − q) + n − 1.
1 q ← RANDOMIZED-PARTITION(A, p, r)
2 RANDOMIZED-QUICKSORT(A, p, q − 1)
3 RANDOMIZED-QUICKSORT(A, q + 1, r)
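The same in Python, as a hedged sketch (Lomuto partition with a uniformly random pivot; the function names are mine, not the textbook's):

```python
import random

def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's final index."""
    k = random.randint(p, r)             # uniform pivot choice
    a[k], a[r] = a[r], a[k]
    x, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def randomized_quicksort(a, p, r):
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)

a = [3, 1, 4, 1, 5, 9, 2, 6]
randomized_quicksort(a, 0, len(a) - 1)   # a is now sorted
```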
Expected value
In order to analyze randomized quicksort, we learn some
probability concepts. I assume that you learned about random
variables already in an introductory course (Probability in
Computing, or its equivalent).
For a random variable X with possible values $a_1, \ldots, a_n$, its
expected value EX is defined as
$$a_1 P\{X = a_1\} + \cdots + a_n P\{X = a_n\}.$$
Example: for the result Z of a fair die roll,
$$EZ = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.$$
For the indicator Y of the event $\{Z \ge 5\}$,
$$EY = 1 \cdot P\{Z \ge 5\} + 0 \cdot P\{Z < 5\} = P\{Z \ge 5\}.$$
Sum theorem
E(X + Y ) = EX + EY.
Let the sorted order be $z_1 < z_2 < \cdots < z_n$. If $i < j$ then let
$Z_{ij} = \{z_i, z_{i+1}, \ldots, z_j\}$. Let $C_{ij}$ be the indicator of the event
that $z_i$ and $z_j$ are ever compared, and let $\pi_{ij}$ be the first element
of $Z_{ij}$ chosen as a pivot: $z_i$ and $z_j$ are compared if and only if
$\pi_{ij} \in \{z_i, z_j\}$. For each $x \in Z_{ij}$,
$$P\{\pi_{ij} = x\} = \frac{1}{j - i + 1}.$$
It follows that $P\{C_{ij} = 1\} = E C_{ij} = \frac{2}{j - i + 1}$. The expected number
of comparisons is, with $k = j - i + 1$:
$$\sum_{1 \le i < j \le n} E C_{ij} = \sum_{1 \le i < j \le n} \frac{2}{j - i + 1}
= 2 \sum_{k=2}^{n} \sum_{i=1}^{n-k+1} \frac{1}{k}
= 2 \sum_{k=2}^{n} \frac{n - k + 1}{k}
= 2\left(\frac{n-1}{2} + \frac{n-2}{3} + \cdots + \frac{1}{n}\right)
= 2(n+1)\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right) - 4n.$$
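The formula is easy to sanity-check numerically; below is a throwaway harness of mine (not part of the notes) that counts pivot comparisons on random inputs and compares the average with $2(n+1)H_n - 4n$:

```python
import random

def quicksort_comparisons(a):
    """Count key comparisons: each non-pivot element meets the pivot once."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    less = [x for x in rest if x < pivot]    # the two scans together stand for
    more = [x for x in rest if x >= pivot]   # the len(rest) comparisons of PARTITION
    return len(rest) + quicksort_comparisons(less) + quicksort_comparisons(more)

n = 200
avg = sum(quicksort_comparisons(random.sample(range(n), n))
          for _ in range(200)) / 200
h_n = sum(1 / k for k in range(1, n + 1))
print(avg, 2 * (n + 1) * h_n - 4 * n)        # two numbers close to each other
```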
The median of a list is the middle element in the sorted order (or
the first of the two middle ones, if the length is even). We will try
to find it faster than by full sorting.
More generally, we may want to find the ith element in the
sorted order. (This generalizes minimum, maximum and
median.) This is called selection.
We will see that all selection tasks can be performed in linear
time.
Selection in expected linear time
$$\tau_0 \le an + \tau_1, \qquad E\tau_0 \le an + E\tau_1.$$
Theorem:
$$E\tau_1 \le \frac{1}{n} \sum_{k=1}^{n} T(\max(k - 1, n - k)) \le \frac{2}{n} \sum_{k=n/2}^{n-1} T(k).$$
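A sketch of the corresponding procedure (RANDOMIZED-SELECT in the textbook) in Python; the recurrence above gives its expected linear running time:

```python
import random

def select(a, i):
    """Return the i-th smallest element of a (i = 1 gives the minimum)."""
    assert 1 <= i <= len(a)
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    if i <= len(less):
        return select(less, i)
    equal = [x for x in a if x == pivot]
    if i <= len(less) + len(equal):
        return pivot
    greater = [x for x in a if x > pivot]
    return select(greater, i - len(less) - len(equal))

print(select([7, 1, 5, 3, 9, 4, 8], 4))   # 5, the median
```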
a = qb + r, 0 ≤ r < b.
u ≡ v (mod b)
d = xa + y b > 0
30 = 1 · 21 + 9
21 = 2 · 9 + 3
9 = 3 · 3
gcd(30, 21) = 3
Running time
Assume b < a:
$$a = bq + r \ge b + r > 2r, \quad \text{so } a/2 > r \text{ and } ab/2 > br.$$
From a = bq + r, by induction,
$$d = x'b + y'r = x'b + y'(a - qb) = y'a + (x' - qy')b.$$
Algorithm 8.2: EXTENDED-EUCLID(a, b)
Returns (d, x, y) where d = gcd(a, b) = xa + yb.
1 if b = 0 then return (a, 1, 0)
2 else
3 (d′, x′, y′) ← EXTENDED-EUCLID(b, a mod b)
4 return (d′, y′, x′ − ⌊a/b⌋ y′)
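The same in Python, as a sketch (returning the triple (d, x, y) as above):

```python
def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) = x*a + y*b."""
    if b == 0:
        return (a, 1, 0)
    d, x1, y1 = extended_euclid(b, a % b)
    return (d, y1, x1 - (a // b) * y1)

print(extended_euclid(30, 21))   # (3, -2, 3): indeed -2*30 + 3*21 = 3
```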
Nonrecursive version
To solve ax + by = c: a solution exists iff d = gcd(a, b) divides c.
With ax₀ + by₀ = d,
x = x₀(c/d), y = y₀(c/d).
Multiplicative inverse
aa′ mod m = 1.
a1 ± a2 ≡ b1 ± b2 (mod m),
a1 a2 ≡ b1 b2 (mod m).
Verify this in the lab. Division is also possible when
there is a multiplicative inverse:
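A sketch of division via the inverse in Python; the inverse exists iff gcd(a, m) = 1 (the built-in pow(a, -1, m), available since Python 3.8, computes the same thing extended Euclid does):

```python
from math import gcd

def mod_inverse(a, m):
    """Return a' with (a * a') % m == 1; exists iff gcd(a, m) == 1."""
    if gcd(a, m) != 1:
        raise ValueError("no inverse modulo m")
    return pow(a, -1, m)

# "b / a" modulo m means b * mod_inverse(a, m) % m.
assert (7 * mod_inverse(7, 40)) % 40 == 1   # the inverse of 7 mod 40 is 23
```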
These are abstract data types, just like heaps. Various subtypes
exist, depending on which operations we want to support:
SEARCH(S, k)
INSERT(S, k)
DELETE(S, k)
A data type supporting these operations is called a dictionary.
Other interesting operations:
MINIMUM(S), MAXIMUM(S)
SUCCESSOR(S, x) (given a pointer to x), PREDECESSOR(S, x)
UNION(S1 , S2 )
Elementary data structures
$$x = \langle x_1, x_2, \ldots, x_d \rangle, \quad x_i < m,$$
$$\delta = x_1 - y_1, \quad W = r_2(x_2 - y_2) + \cdots + r_d(x_d - y_d),$$
$$A - B \equiv r_1 \delta + W \pmod{m}.$$
Ordered left-to-right, items placed into the tree nodes (not only
in leaves).
Inorder tree walk.
Search, min, max, successor, predecessor, insert, delete. All in
O(h) steps.
Searches
TRANSPLANT(T, u, v),
The records are kept in the tree nodes, sorted, “between” the
edges going to the subtrees.
Every path from root to leaf has the same length.
Every internal node has at least t − 1 and at most 2t − 1
records (t to 2t children).
Total number of nodes $n \ge 1 + t + \cdots + t^h = \frac{t^{h+1} - 1}{t - 1}$, showing
$h = O(\log_t n)$.
Maintaining a B-tree
1 r ← T.root
2 if r.n < 2t − 1 then B-TREE-INSERT-NONFULL(r, k)
3 else // Add new root above r before splitting it.
4 s ← new node; s.leaf ← false; s.n ← 0
5 s.c1 ← r; T.root ← s
6 B-TREE-SPLIT-CHILD(s, 1)
7 B-TREE-INSERT-NONFULL(s, k)
Deletion
The diff program in Unix: what does it mean to say that we find
the places where two files differ (including insertions and
deletions)? Or, what does it mean to keep the “common” parts?
Let it mean the longest subsequence present in both:
X = a b c b d a b
Y = b d c a b a
A longest common subsequence: b c b a.
Running through all subsequences would take exponential
time. There is a faster solution, recognizing that we only want
to find some longest subsequence.
Let c[i, j] be the length of the longest common subsequence of
the prefixes X[1 . . i] and Y[1 . . j]. Recursion:
$$c[i, j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0,\\ c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\ \max(c[i, j-1], c[i-1, j]) & \text{otherwise.} \end{cases}$$
Algorithm 11.1: LCS-LENGTH(X, Y)
1 m ← X.length; n ← Y.length
2 b[1 . . m, 1 . . n], c[0 . . m, 0 . . n] ← new tables
3 for i = 1 to m do c[i, 0] ← 0
4 for j = 1 to n do c[0, j] ← 0
5 for i = 1 to m do
6 for j = 1 to n do
7 if x_i = y_j then
8 c[i, j] ← c[i − 1, j − 1] + 1
9 b[i, j] ← {↖}
10 else
11 c[i, j] ← max(c[i, j − 1], c[i − 1, j]); b[i, j] ← ∅
12 if c[i − 1, j] = c[i, j] then b[i, j] ← {↑}
13 if c[i, j − 1] = c[i, j] then b[i, j] ← b[i, j] ∪ {←}
14 return c, b
Algorithm 11.2: LCS-PRINT(b, X , i, j)
Print the longest common subsequence of X [1 . . i] and
Y [1 . . j] using the table b computed above.
1 if i = 0 or j = 0 then return
2 if ↖ ∈ b[i, j] then LCS-PRINT(b, X, i − 1, j − 1); print x_i
3 else if ↑ ∈ b[i, j] then LCS-PRINT(b, X, i − 1, j)
4 else LCS-PRINT(b, X, i, j − 1)
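Both routines in Python, as a compact sketch (0-based; the traceback walks the c table directly instead of storing the arrow table b):

```python
def lcs(x, y):
    """Return one longest common subsequence of the sequences x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    out, i, j = [], m, n            # walk back through the table
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("abcbdab", "bdcaba"))     # 'bcba' (one of possibly several)
```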
The knapsack problem
maximize $w_1 x_1 + \cdots + w_n x_n$
subject to $a_1 x_1 + \cdots + a_n x_n \le b$,
$x_i = 0, 1, \quad i = 1, \ldots, n$.
$$m_k(p) = \min\{\, a_1 x_1 + \cdots + a_k x_k : w_1 x_1 + \cdots + w_k x_k = p \,\}.$$
Recursion for the sets of achievable values:
$A_k(p) = A_{k-1}(p) \cup (a_k + A_{k-1}(p - w_k))$,
where $x + \{u_1, \ldots, u_q\} = \{x + u_1, \ldots, x + u_q\}$.
Now the optimum is max{ p : b ∈ An (p) }.
Complexity: O(nw b).
Example: Let {a1 , . . . , a5 } = {2, 3, 5, 7, 11}, w i = 1, b = 13.
The table of $A_k(p)$ for the example (rows k = 1, . . . , 5, columns p = 1, . . . , 4):

k \ p   1                   2                           3           4
1       {2}                 {}                          {}          {}
2       {2, 3}              {5}                         {}          {}
3       {2, 3, 5}           {5, 7, 8}                   {10}        {}
4       {2, 3, 5, 7}        {5, 7, 8, 9, 10, 12}        {10, 12}    {}
5       {2, 3, 5, 7, 11}    {5, 7, 8, 9, 10, 12, 13}    {10, 12}    {}

max{ p : 13 ∈ A₅(p) } = 2.
Application: money changer problem. Produce the sum b using
smallest number of coins of denominations a1 , . . . , an (at most one
of each).
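A sketch of the table computation for the example in Python (all $w_i = 1$, so $A_k(p)$ collects the sums producible with exactly p of the first k items; the helper name is mine):

```python
def achievable_sums(items, budget):
    """a[p] = set of sums <= budget producible with exactly p items."""
    a = [set() for _ in range(len(items) + 1)]
    a[0].add(0)
    for x in items:
        for p in range(len(items) - 1, -1, -1):   # downward: each item used once
            a[p + 1] |= {s + x for s in a[p] if s + x <= budget}
    return a

a = achievable_sums([2, 3, 5, 7, 11], 13)         # reproduces the table above
print(max(p for p in range(6) if 13 in a[p]))     # 2, e.g. 13 = 2 + 11
```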
Greedy algorithms
Activity selection
$[s_i, f_i)$
(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10),
(8, 11), (8, 12), (2, 13), (12, 14).
Example: G is the path graph 1 — 2 — 3 — 4.
Trace of MAX-INDEP-LB(G, b):
(¬1) : (G \ {1}, 0)
(¬1, ¬2) : (G \ {1, 2}, 0)
(¬1, ¬2, ¬3) : (G \ {1, 2, 3}, 0) returns {4}, b′ = 1
(¬1, ¬2, 3) : (G \ {1, 2, 3, 4}, 1 − 1 = 0) returns ∅,
so (¬1, ¬2) : (G \ {1, 2}, 0) returns {4}, b′ = 1.
(¬1, 2) : (G \ {1, 2, 3}, 1 − 1 = 0) returns {4},
so (¬1) : (G \ {1}, 0) returns {2, 4}, b′ = 2.
(1) : (G \ {1, 2}, 2 − 1 = 1)
(1, ¬2, ¬3) : (G \ {1, 2, 3}, 1) returns ∅ by bound, b′ = 2.
(1, ¬2, 3) : (G \ {1, 2, 3, 4}, 1 − 1 = 0) returns ∅,
so (1) : (G \ {1, 2}, 2 − 1 = 1) returns ∅ by bound,
giving the final return {2, 4}.
Graph representation
Directed graph G = (V, E). Call some vertex s the source.
Breadth-first search finds a shortest path (by number of edges)
from the source s to every vertex reachable from s.
Rough description of breadth-first search:
The distance x.d from s gets computed.
Black nodes: already visited.
Gray nodes: the frontier.
White nodes: unvisited.
Algorithm 14.1: Generic graph search
1 paint s gray
2 while there are gray nodes do
3 take a gray node u, paint it black
4 for each white neighbor v of u do
5 v.d ← u.d + 1
6 make v a gray child of u
To turn this into breadth-first search, make the set of gray nodes a
queue Q. Now Q is represented in two ways: as a queue, and as a
subset of V (the gray nodes). The two representations must remain
synchronized.
Analysis: O(|V | + |E|) steps, since every link is handled at most
once.
Algorithm 14.2: BFS(G, s)
Input: graph G = (V, Adj).
Returns: the tree parents u.π, and the distances u.d.
Queue Q makes this search breadth-first.
1 for u ∈ V \ {s} do
2 u.color ← white; u.d ← ∞; u.π ← nil
3 s.color ← gray; s.d ← 0; Q ← {s}
4 while Q ≠ ∅ do
5 u ← Q.head; DEQUEUE(Q)
6 for v ∈ u.Adj do
7 if v.color = white then
8 v.color ← gray; v.d ← u.d + 1; v.π ← u
9 ENQUEUE(Q, v)
10 u.color ← black
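The same algorithm in Python, as a sketch (adjacency lists in a dict, collections.deque as the queue; membership in d replaces the white/gray/black colors):

```python
from collections import deque

def bfs(adj, s):
    """Return distances d and tree parents pi for every node reachable from s."""
    d, pi = {s: 0}, {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in d:          # v is white: not yet discovered
                d[v], pi[v] = d[u] + 1, u
                q.append(v)         # v is now gray: in the queue
    return d, pi                    # dequeued nodes are black

adj = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(bfs(adj, "s")[0])             # {'s': 0, 'a': 1, 'b': 1, 'c': 2}
```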
Personaleinkommensteuerschätzungskommissionsmitgliedsreisekostenrechnungsergänzungsrevisionsfund
(Mark Twain; roughly: “personal income tax estimation commission member's travel expense accounting supplement revision fund”)
(figure: DFS discovery (start) and finishing times)
Classification of edges:
Tree edges
Back edges
Forward edges (nonexistent in an undirected graph)
Cross edges (nonexistent in an undirected graph)
Topological sort
DECREASE-KEY(Q, v, k),
Cost: O(n³).
Applications
It depends. If every worker is familiar with only one and the same
job (say, digging), then no.
Example: At a dance party with 300 students, every boy knows
50 girls and every girl knows 50 boys. Can they all dance
simultaneously so that only pairs who know each other dance with
each other?
Bipartite graph: left set A (of girls), right set B (of boys).
Matching, perfect matching.
For S ⊆ A let
N(S) ⊆ B denote the set of neighbors of the elements of S.
The total flow entering a non-end node equals the total flow
leaving it:
$$\sum_v f(u, v) = 0 \quad \text{for all } u \in V \setminus \{s, t\}.$$
(figure: the example flow network on nodes s, v1, v2, v3, v4, t; each edge is labeled flow/capacity, e.g. 12/12 on (v1, v3), 12/20 on (v3, t), 9/16 on (s, v1))

Increment it along the path s–v2–v4–t by 4. Then increment along
the path s–v2–v4–v3–t by 6. Now we are stuck: there is no more
direct way to augment.

New idea: increase the flow (by 1) on (s, v1), decrease it on (v1, v2),
increase it on (v2, v4, v3, t).
Residual network, augmenting path
(figure: the improved flow, of value 23: 12/12 on (v1, v3), 19/20 on (v3, t), 10/16 on (s, v1), and 13/13, 4/4, 11/14, 7/7, 2/4 on the remaining edges)
This cannot be improved: look at the cut (S, T ) with T = {v3 , t}.
Method of augmenting paths
(figure: a flow together with a cut (S, T); edges again labeled flow/capacity)
The equivalence of the first two statements says that the size of the
maximum flow is equal to the size of the minimum cut.
Proof: 1 ⇒ 2 and 2 ⇒ 3 are obvious. The crucial step is 3 ⇒ 1 .
Given f with no augmenting paths, we construct (S, T ): let S be
the nodes reachable from s in the residual network G f .
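A sketch of the augmenting-path method in Python, with the paths found by BFS in the residual network (the Edmonds-Karp variant, which gives the polynomial bound quoted below). The capacities are the textbook's example network, which the figures above appear to show; the names are mine:

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """cap: dict (u, v) -> capacity. Returns the value of a maximum flow."""
    adj = defaultdict(set)
    for (u, v) in cap:
        adj[u].add(v); adj[v].add(u)      # residual edges can point backwards
    flow = defaultdict(int)               # skew-symmetric: flow[u,v] = -flow[v,u]
    residual = lambda u, v: cap.get((u, v), 0) - flow[(u, v)]
    total = 0
    while True:
        parent, q = {s: None}, deque([s]) # BFS for an augmenting path
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u; q.append(v)
        if t not in parent:
            return total                  # no augmenting path: flow is maximum
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        delta = min(residual(u, v) for (u, v) in path)
        for (u, v) in path:
            flow[(u, v)] += delta
            flow[(v, u)] -= delta         # a "decrease" on the reverse edge
        total += delta

cap = {("s", "v1"): 16, ("s", "v2"): 13, ("v2", "v1"): 4, ("v1", "v3"): 12,
       ("v3", "v2"): 9, ("v2", "v4"): 14, ("v4", "v3"): 7,
       ("v3", "t"): 20, ("v4", "t"): 4}
print(max_flow(cap, "s", "t"))            # 23, matching the cut (S, T) above
```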
Proof of the marriage theorem
Let H = S ∩ A, H′ = N(H). Then
$$n > c(S, T) = c(\{s\}, T) + c(S \cap A, T \cap B) + c(S, \{t\}) \ge (n - |H|) + |H' \cap T| + |H' \cap S| = n - |H| + |H'|,$$
hence |H| > |H′|.
Polynomial-time algorithm
O(m²n).
where y runs over the possible values, which we will call witnesses.
For λ > 0, an algorithm A(x) is a λ-approximation if its result is
always within a factor λ of the optimum.
Maximum cut: find S ⊆ V maximizing |{ {u, v} ∈ E : u ∈ S, v ∉ S }|.
Greedy algorithm:
Repeat: find a point on one side of the cut whose moving
to the other side increases the cut size.
Theorem If you cannot improve anymore with this algorithm
then you are within a factor 2 of the optimum.
Generalize maximum cut to the case where edges e have weights
$w_e$, that is, maximize
$$\sum_{u \in S,\, v \notin S} w_{uv}.$$
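A sketch of the greedy improvement step for the weighted version (edge weights in a dict keyed by vertex pairs; the names are mine):

```python
def cut_weight(s, w):
    """Total weight of the edges with exactly one endpoint in s."""
    return sum(wt for (u, v), wt in w.items() if (u in s) != (v in s))

def local_max_cut(vertices, w):
    """Flip single vertices across the cut while that strictly increases its weight."""
    s, best, improved = set(), 0, True
    while improved:
        improved = False
        for x in vertices:
            s ^= {x}                       # tentatively move x to the other side
            wt = cut_weight(s, w)
            if wt > best:
                best, improved = wt, True  # keep the move
            else:
                s ^= {x}                   # undo it
    return s, best

w = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 3}
print(local_max_cut(["a", "b", "c"], w))   # ({'a', 'b'}, 5): b-c and a-c cross
```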
Algorithm: APPROX-VERTEX-COVER(G)
1 C ← ∅
2 E′ ← E[G]
3 while E′ ≠ ∅ do
4 let (u, v) be an arbitrary edge in E′
5 C ← C ∪ {u, v}
6 remove from E′ every edge incident on either u or v
7 return C
Theorem Approx_Vertex_Cover has a ratio bound of 2.
We will show a graph where the greedy algorithm (always pick the
vertex covering the most uncovered edges) does not even have a
constant approximation ratio.
A = {a1 , . . . , an }.
B2 = {b2,1 , . . . , b2,⌊n/2⌋ }. Here b2,1 is connected to a1, a2,
further b2,2 to a3, a4, and so on.
B3 = {b3,1 , . . . , b3,⌊n/3⌋ }. Here b3,1 is connected to a1, a2, a3,
further b3,2 to a4, a5, a6, and so on.
and so on.
Bn consists of a single point connected to all elements of A.
The n elements of A cover all edges. The greedy algorithm instead:
each element of A has degree ≤ n − 1, so greedy chooses the point
of Bn (degree n) first. Then Bn−1, then Bn−2, and so on, down to B2.
The total number of points chosen is
$$|B_2| + \cdots + |B_n| = \sum_{k=2}^{n} \lfloor n/k \rfloor \overset{*}{=} n \log n.$$
Can it be worse than this? No, even for a more general problem:
set cover.
Given (X , F ): a set X and a family F of subsets of X , find a
min-size subset of F covering X .
Example: Smallest committee with people covering all skills.
Generalization: Set S has weight w(S) > 0. We want a
minimum-weight set cover.
The algorithm Greedy_Set_Cover(X , F ):
1 U ← X
2 C ← ∅
3 while U ≠ ∅ do
4 select an S ∈ F that maximizes |S ∩ U| / w(S)
5 U ← U \ S
6 C ← C ∪ {S}
7 return C
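The same in Python, as a sketch (the family as a dict keyed by name; with all weights 1 this is the unweighted greedy cover, and the committee example is my own illustration):

```python
def greedy_set_cover(x, family, weight=None):
    """family: dict name -> set. Repeatedly pick the set maximizing |S ∩ U| / w(S)."""
    weight = weight or {name: 1 for name in family}
    u, cover = set(x), []
    while u:
        best = max(family, key=lambda name: len(family[name] & u) / weight[name])
        if not family[best] & u:
            raise ValueError("X cannot be covered by the family")
        cover.append(best)
        u -= family[best]
    return cover

skills = {"anna": {"law", "math"}, "bela": {"math"}, "cili": {"law", "chem"}}
print(greedy_set_cover({"law", "math", "chem"}, skills))   # ['anna', 'cili']
```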
Analysis
maximize $\sum_j w_j x_j$
subject to $\sum_j a_j x_j \le b$,
$x_i = 0, 1, \quad i = 1, \ldots, n$.

Idea for approximation: break each $w_i$ into a smaller number of big
chunks, and use dynamic programming. Let $r > 0$, $w_i' = \lfloor w_i / r \rfloor$:

maximize $\sum_j w_j' x_j$
subject to $\sum_j a_j x_j \le b$,
$x_i = 0, 1, \quad i = 1, \ldots, n$.
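A sketch of the rounding idea in Python: solve the scaled problem exactly by a value-indexed dynamic program, so the table size (and the error, at most n·r) is controlled by r. The function names and test values are mine:

```python
def knapsack_max_value(values, sizes, b):
    """Exact DP: d[p] = minimum total size achieving value exactly p."""
    total, inf = sum(values), float("inf")
    d = [0] + [inf] * total
    for v, a in zip(values, sizes):
        for p in range(total, v - 1, -1):   # downward: each item used at most once
            d[p] = min(d[p], d[p - v] + a)
    return max(p for p in range(total + 1) if d[p] <= b)

def approx_knapsack(values, sizes, b, r):
    """Round values down to multiples of r, then solve exactly on the small table."""
    return r * knapsack_max_value([v // r for v in values], sizes, b)

print(knapsack_max_value([60, 100, 120], [10, 20, 30], 50))   # 220 (exact)
print(approx_knapsack([60, 100, 120], [10, 20, 30], 50, 10))  # 220 here; within n*r in general
```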
Examples
shortest vs. longest simple paths
compositeness
subset sum
perfect matching
graph isomorphism
3-coloring
Ultrasound test of sex of fetus.
Decision problems vs. optimization problems vs. search problems.
V (x, w)
with yes/no values that verifies, for a given input x and witness
(certificate) w, whether w is indeed a witness for x.
Decision problem: Is there a witness (say, a Hamilton cycle)?
Search problem: Find a witness!
The same decision problem may belong to very different
verification functions (search problems):
Examples
Reducing matching to maximum flow.
Vertex cover to maximum independent set: C is a vertex cover
iff V \ C is an independent set.
NP-hardness.
NP-completeness.
Satisfiability of Boolean formulas
Boolean formulas.