Texts and Monographs in Computer Science
Suad Alagic
Object-Oriented Database Programming
1989. XV, 320 pages, 84 illus.
Suad Alagic
Relational Database Technology
1986. XI, 259 pages, 114 illus.
Suad Alagic and Michael A. Arbib
The Design of Well-Structured and Correct Programs
1978. X, 292 pages, 68 illus.
S. Thomas Alexander
Adaptive Signal Processing: Theory and Applications
1986. IX, 179 pages, 42 illus.
Krzysztof R. Apt and Ernst-Rüdiger Olderog
Verification of Sequential and Concurrent Programs
1991. XVI, 441 pages
Michael A. Arbib, A.J. Kfoury, and Robert N. Moll
A Basis for Theoretical Computer Science
1981. VIII, 220 pages, 49 illus.
Friedrich L. Bauer and Hans Wössner
Algorithmic Language and Program Development
1982. XVI, 497 pages, 109 illus.
Kaare Christian
A Guide to Modula-2
1986. XIX, 436 pages, 46 illus.
Edsger W. Dijkstra
Selected Writings on Computing: A Personal Perspective
1982. XVII, 362 pages, 13 illus.
Edsger W. Dijkstra and Carel S. Scholten
Predicate Calculus and Program Semantics
1990. XI, 220 pages
W.H.J. Feijen, A.J.M. van Gasteren, D. Gries, and J. Misra, Eds.
Beauty Is Our Business: A Birthday Salute to Edsger W. Dijkstra
1990. XX, 453 pages, 21 illus.
P.A. Fejer and D.A. Simovici
Mathematical Foundations of Computer Science, Volume I:
Sets, Relations, and Induction
1990. X, 425 pages, 36 illus.
continued after index

The Design
and Analysis
of Algorithms

Dexter C. Kozen

With 72 Illustrations

Springer-Verlag
New York Berlin Heidelberg London Paris
Tokyo Hong Kong Barcelona Budapest

Dexter C. Kozen
Department of Computer Science
Cornell University
Upson Hall
Ithaca, NY 14853-7501
USA
Series Editor:
David Gries
Department of Computer Science
Cornell University
Upson Hall
Ithaca, NY 14853-7501
USA
Library of Congress Cataloging-in-Publication Data
Kozen, Dexter, 1951-
The design and analysis of algorithms / Dexter C. Kozen.
p. cm.
Includes bibliographical references and index.
ISBN 0-387-97687-6
1. Computer algorithms. I. Title.
QA76.9.A43K69 1991
005.1—dc20
Printed on acid-free paper.
© 1992 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New
York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly
analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or here-
after developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication,
even if the former are not especially identified, is not to be taken as a sign that such names,
as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used
freely by anyone.
91-38759
Production managed by Bill Imbornoni; manufacturing supervised by Jacqui Ashri.
Photocomposed from a LaTeX file.
Printed and bound by R.R. Donnelley & Sons, Inc., Harrisonburg, VA.
Printed in the United States of America.
987654321
ISBN 0-387-97687-6 Springer-Verlag New York Berlin Heidelberg
ISBN 3-540-97687-6 Springer-Verlag Berlin Heidelberg New York

To my wife Frances
and my sons Alexander, Geoffrey, and Timothy

Preface
These are my lecture notes from CS681: Design and Analysis of Algo-
rithms, a one-semester graduate course I taught at Cornell for three consec-
utive fall semesters from ’88 to ’90. The course serves a dual purpose: to
cover core material in algorithms for graduate students in computer science
preparing for their PhD qualifying exams, and to introduce theory students to
some advanced topics in the design and analysis of algorithms. The material
is thus a mixture of core and advanced topics.
At first I meant these notes to supplement and not supplant a textbook,
but over the three years they gradually took on a life of their own. In addition
to the notes, I depended heavily on the texts
• A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis
of Computer Algorithms. Addison-Wesley, 1975.
• M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide
to the Theory of NP-Completeness. W. H. Freeman, 1979.
• R. E. Tarjan, Data Structures and Network Algorithms. SIAM Regional
Conference Series in Applied Mathematics 44, 1983.
and still recommend them as excellent references.
The course consists of 40 lectures. The notes from these lectures were
prepared using scribes. At the beginning of each lecture, I would assign a
scribe who would take notes for the entire class and prepare a raw LaTeX
source, which I would then doctor and distribute. In addition to the 40 lec-
tures, I have included 10 homework sets and several miscellaneous homework
exercises, all with complete solutions. The notes that were distributed are
essentially as they appear here; no major reorganization has been attempted.
There is a wealth of interesting topics, both classical and current, that I
would like to have touched on but could not for lack of time. Many of these,
such as computational geometry and factoring algorithms, could fill an entire
semester. Indeed, one of the most difficult tasks was deciding how best to
spend a scant 40 lectures.
I wish to thank all the students who helped prepare these notes and who
kept me honest: Mark Aagaard, Mary Ann Branch, Karl-Friedrich Böhringer,
Thomas Bressoud, Suresh Chari, Sofoklis Efremidis, Ronen Feldman, Ted
Fischer, Richard Huff, Michael Kalantar, Steve Kautz, Dani Lischinski, Pe-
ter Bro Miltersen, Marc Parmet, David Pearson, Dan Proskauer, Uday Rao,
Mike Reiter, Gene Ressler, Alex Russell, Laura Sabel, Aravind Srinivasan,
Sridhar Sundaram, Ida Szafranska, Filippo Tampieri, and Sam Weber. I am
especially indebted to my teaching assistants Mark Novick (fall ’88), Alessan-
dro Panconesi (fall ’89), and Kjartan Stefansson (fall ’90) for their help with
proofreading, preparation of solution sets, and occasional lecturing. I am also
indebted to my colleagues László Babai, Gianfranco Bilardi, Michael Luby,
Keith Marzullo, Erik Meineche Schmidt, Bernd Sturmfels, Éva Tardos, Steve
Vavasis, Sue Whitesides, and Rich Zippel for valuable comments and interest-
ing exercises. Finally, I wish to express my sincerest gratitude to my colleague
Vijay Vazirani, who taught the course in fall ’87 and who was an invaluable
source of help.
I would be most grateful for any suggestions or criticism from readers.
Cornell University Dexter Kozen
Ithaca, NY December 1990
Contents

Preface

I Lectures
1 Algorithms and Their Complexity
2 Topological Sort and MST
3 Matroids and Independence
4 Depth-First and Breadth-First Search
5 Shortest Paths and Transitive Closure
6 Kleene Algebra
7 More on Kleene Algebra
8 Binomial Heaps
9 Fibonacci Heaps
10 Union-Find
11 Analysis of Union-Find
12 Splay Trees
13 Random Search Trees
14 Planar and Plane Graphs
15 The Planar Separator Theorem
16 Max Flow
17 More on Max Flow
18 Still More on Max Flow
19 Matching
20 More on Matching
21 Reductions and NP-Completeness
22 More on Reductions and NP-Completeness
23 More NP-Complete Problems
24 Still More NP-Complete Problems
25 Cook's Theorem
26 Counting Problems and #P
27 Counting Bipartite Matchings
28 Parallel Algorithms and NC
29 Hypercubes and the Gray Representation
30 Integer Arithmetic in NC
31 Csanky's Algorithm
32 Chistov's Algorithm
33 Matrix Rank
34 Linear Equations and Polynomial GCDs
35 The Fast Fourier Transform (FFT)
36 Luby's Algorithm
37 Analysis of Luby's Algorithm
38 Miller's Primality Test
39 Analysis of Miller's Primality Test
40 Probabilistic Tests with Polynomials

II Homework Exercises
Homework 1
Homework 2
Homework 3
Homework 4
Homework 5
Homework 6
Homework 7
Homework 8
Homework 9
Homework 10
Miscellaneous Exercises

III Homework Solutions
Homework 1 Solutions
Homework 2 Solutions
Homework 3 Solutions
Homework 4 Solutions
Homework 5 Solutions
Homework 6 Solutions
Homework 7 Solutions
Homework 8 Solutions
Homework 9 Solutions
Homework 10 Solutions
Solutions to Miscellaneous Exercises

Bibliography
Index

I Lectures

Lecture 1 Algorithms and Their Complexity
This is a course on the design and analysis of algorithms intended for first-
year graduate students in computer science. Its purposes are mixed: on the
one hand, we wish to cover some fairly advanced topics in order to provide
a glimpse of current research for the benefit of those who might wish to spe-
cialize in this area; on the other, we wish to introduce some core results and
techniques which will undoubtedly prove useful to those planning to specialize
in other areas.
We will assume that the student is familiar with the classical material nor-
mally taught in upper-level undergraduate courses in the design and analysis
of algorithms. In particular, we will assume familiarity with:
• sequential machine models, including Turing machines and random ac-
cess machines (RAMs)
• discrete mathematical structures, including graphs, trees, and dags, and
their common representations (adjacency lists and matrices)
• fundamental data structures, including lists, stacks, queues, arrays, bal-
anced trees
• fundamentals of asymptotic analysis, including O(·), o(·), and Ω(·) no-
tation, and techniques for the solution of recurrences
• fundamental programming techniques, such as recursion, divide-and-
conquer, dynamic programming
• basic sorting and searching algorithms.
These notions are covered in the early chapters of [3, 39, 100].
Familiarity with elementary algebra, number theory, and discrete proba-
bility theory will be helpful. In particular, we will be making occasional use of
the following concepts: linear independence, basis, determinant, eigenvalue,
polynomial, prime, modulus, Euclidean algorithm, greatest common divisor,
group, ring, field, random variable, expectation, conditional probability, con-
ditional expectation. Some excellent classical references are [69, 49, 33].
The main emphasis will be on asymptotic worst-case complexity. This
measures how the worst-case time or space complexity of a problem grows
with the size of the input. We will also spend some time on probabilistic
algorithms and analysis.
1.1 Asymptotic Complexity

Let f and g be functions N → N, where N denotes the natural numbers
{0, 1, ...}. Formally,

• f is O(g) if

    ∃c ∈ N ∀∞n f(n) ≤ c · g(n) ,

where the notation ∀∞n means "for all but finitely many n";

• f is o(g) if

    ∀c ∈ N ∀∞n c · f(n) ≤ g(n) ;

• f is Ω(g) if g is O(f);

• f is Θ(g) if f is both O(g) and Ω(g).
There is one cardinal rule:

Always use O and o for upper bounds and Ω for lower bounds. Never
use O for lower bounds.
There is some disagreement about the definition of Ω. Some authors (such
as [43]) prefer the definition as given above. Others (such as [108]) prefer: f
is Ω(g) if f is not o(g); in other words, f is Ω(g) if

    ∃c ∈ N ∃∞n c · f(n) ≥ g(n) .

(The notation ∃∞n means "there exist infinitely many n".)
presumably easier to establish, but the former gives sharper results. We won’t
get into the fray here, but just comment that neither definition precludes
algorithms from taking less than the stated bound on certain inputs. For
example, the assertion, "The running time of mergesort is Ω(n log n)" says
that there is a c such that for all but finitely many n, there is some input
sequence of length n on which mergesort makes at least (1/c) · n log n comparisons.
There is nothing to prevent mergesort from taking less time on some other
input of length n.
The exact interpretation of statements involving O, o, and Ω depends on
assumptions about the underlying model of computation, how the input is
presented, how the size of the input is determined, and what constitutes a
single step of the computation. In practice, authors often do not bother to
write these down. For example, "The running time of mergesort is O(n log n)"
means that there is a fixed constant c such that for any n elements drawn from
a totally ordered set, at most cnlogn comparisons are needed to produce a
sorted array. Here nothing is counted in the running time except the number
of comparisons between individual elements, and each comparison is assumed
to take one step; other operations are ignored. Similarly, nothing is counted
in the input size except the number of elements; the size of each element
(whatever that may mean) is ignored.
It is important to be aware of these unstated assumptions and understand
how to make them explicit and formal when reading papers in the field. When
making such statements yourself, always have your underlying assumptions in
mind. Although many authors don’t bother, it is a good habit to state any
assumptions about the model of computation explicitly in any papers you
write.
The question of what assumptions are reasonable is more often than not a
matter of esthetics. You will become familiar with the standard models and
assumptions from reading the literature; beyond that, you must depend on
your own conscience.
1.2 Models of Computation
Our principal model of computation will be the unit-cost random access ma-
chine (RAM). Other models, such as uniform circuits and PRAMs, will be
introduced when needed. The RAM model allows random access and the use
of arrays, as well as unit-cost arithmetic and bit-vector operations on arbi-
trarily large integers; see [3].
For graph algorithms, arithmetic is often unnecessary. Of the two main
representations of graphs, namely adjacency matrices and adjacency lists, the
former requires random access and Ω(n²) array storage; the latter, only linear
storage and no random access. (For graphs, linear means O(n + m), where
n is the number of vertices of the graph and m is the number of edges.) The
most esthetically pure graph algorithms are those that use the adjacency list
representation and only manipulate pointers. To express such algorithms one
can formulate a very weak model of computation with primitive operators
equivalent to car, cdr, cons, eq, and nil of pure LISP; see also [99].
1.3 A Grain of Salt
No mathematical model can reflect reality with perfect accuracy. Mathemat-
ical models are abstractions; as such, they are necessarily flawed.
For example, it is well known that it is possible to abuse the power of
unit-cost RAMs by encoding horrendously complicated computations in large
integers and solving intractable problems in polynomial time [50]. However,
this violates the unwritten rules of good taste. One possible preventative
measure is to use the log-cost model; but when used as intended, the unit-cost
model reflects experimental observation more accurately for data of moderate
size (since multiplication really does take one unit of time), besides making
the mathematical analysis a lot simpler.
Some theoreticians consider asymptotically optimal results as a kind of
Holy Grail, and pursue them with a relentless frenzy (present company not
necessarily excluded). This often leads to contrived and arcane solutions that
may be superior by the measure of asymptotic complexity, but whose con-
stants are so large or whose implementation would be so cumbersome that
no improvement in technology would ever make them feasible. What is the
value of such results? Sometimes they give rise to new data structures or
new techniques of analysis that are useful over a range of problems, but more
often than not they are of strictly mathematical interest. Some practitioners
take this activity as an indictment of asymptotic complexity itself and refuse
to admit that asymptotics have anything at all to say of interest in practical
software engineering.
Nowhere is the argument more vociferous than in the theory of parallel
computation. There are those who argue that many of the models of compu-
tation in common use, such as uniform circuits and PRAMs, are so inaccurate
as to render theoretical results useless. We will return to this controversy later
on when we talk about parallel machine models.
Such extreme attitudes on either side are unfortunate and counterproduc-
tive. By now asymptotic complexity occupies an unshakable position in our
computer science consciousness, and has probably done more to guide us in
improving technology in the design and analysis of algorithms than any other
mathematical abstraction. On the other hand, one should be aware of its lim-
itations and realize that an asymptotically optimal solution is not necessarily
the best one.
A good rule of thumb in the design and analysis of algorithms, as in life, is
to use common sense, exercise good taste, and always listen to your conscience.
1.4 Strassen’s Matrix Multiplication Algorithm
Probably the single most important technique in the design of asymptotically
fast algorithms is divide-and-conquer. Just to refresh our understanding of this
technique and the use of recurrences in the analysis of algorithms, let’s take a
look at Strassen’s classical algorithm for matrix multiplication and some of its
progeny. Some of these examples will also illustrate the questionable lengths
to which asymptotic analysis can sometimes be taken.
The usual method of matrix multiplication takes 8 multiplications and 4
additions to multiply two 2 × 2 matrices, or in general O(n³) arithmetic oper-
ations to multiply two n × n matrices. However, the number of multiplications
can be reduced. Strassen [97] published one such algorithm for multiplying
2 × 2 matrices using only 7 multiplications and 18 additions:

    [a b] [e f]   [s1 + s2 − s4 + s6    s4 + s5          ]
    [c d] [g h] = [s6 + s7              s2 − s3 + s5 − s7]

where

    s1 = (b − d) · (g + h)
    s2 = (a + d) · (e + h)
    s3 = (a − c) · (e + f)
    s4 = (a + b) · h
    s5 = a · (f − h)
    s6 = d · (g − e)
    s7 = (c + d) · e .
Assume for simplicity that n is a power of 2. (This is not the last time you will
hear that.) Apply the 2 × 2 algorithm recursively on a pair of n × n matrices
by breaking each of them up into four square submatrices of size n/2 × n/2:

    [A B] [E F]   [S1 + S2 − S4 + S6    S4 + S5          ]
    [C D] [G H] = [S6 + S7              S2 − S3 + S5 − S7]
where

    S1 = (B − D) · (G + H)
    S2 = (A + D) · (E + H)
    S3 = (A − C) · (E + F)
    S4 = (A + B) · H
    S5 = A · (F − H)
    S6 = D · (G − E)
    S7 = (C + D) · E .
Everything is the same as in the 2 × 2 case, except now we are manipulat-
ing n/2 × n/2 matrices instead of scalars. (We have to be slightly cautious, since
matrix multiplication is not commutative.) Ultimately, how many scalar oper-
ations (+, −, ·) does this recursive algorithm perform in multiplying two n × n
matrices? We get the recurrence

    T(n) = 7T(n/2) + dn²

with solution

    T(n) = (1 + (4/3)d) n^(log₂ 7) + O(n²)
         = O(n^(log₂ 7))
         = O(n^(2.81...)) ,

which is o(n³). Here d is a fixed constant, and dn² represents the time for the
matrix additions and subtractions.
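To make the recursion concrete, here is a minimal Python sketch of the scheme
just described (our code, not from the text); it assumes n is a power of 2,
represents matrices as lists of lists, and performs exactly 7 recursive
multiplications and 18 block additions/subtractions per level.

    def mat_add(X, Y):
        return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

    def mat_sub(X, Y):
        return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

    def strassen(X, Y):
        n = len(X)
        if n == 1:
            return [[X[0][0] * Y[0][0]]]        # base case: scalars
        h = n // 2
        def quad(M, i, j):                      # the (i,j) block, size h x h
            return [row[j*h:(j+1)*h] for row in M[i*h:(i+1)*h]]
        A, B, C, D = quad(X,0,0), quad(X,0,1), quad(X,1,0), quad(X,1,1)
        E, F, G, H = quad(Y,0,0), quad(Y,0,1), quad(Y,1,0), quad(Y,1,1)
        S1 = strassen(mat_sub(B, D), mat_add(G, H))
        S2 = strassen(mat_add(A, D), mat_add(E, H))
        S3 = strassen(mat_sub(A, C), mat_add(E, F))
        S4 = strassen(mat_add(A, B), H)
        S5 = strassen(A, mat_sub(F, H))
        S6 = strassen(D, mat_sub(G, E))
        S7 = strassen(mat_add(C, D), E)
        TL = mat_add(mat_sub(mat_add(S1, S2), S4), S6)   # S1 + S2 - S4 + S6
        TR = mat_add(S4, S5)                             # S4 + S5
        BL = mat_add(S6, S7)                             # S6 + S7
        BR = mat_sub(mat_add(mat_sub(S2, S3), S5), S7)   # S2 - S3 + S5 - S7
        return [r1 + r2 for r1, r2 in zip(TL, TR)] + \
               [r1 + r2 for r1, r2 in zip(BL, BR)]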
This is already a significant asymptotic improvement over the naive algo-
rithm, but can we do even better? In general, an algorithm that uses c multi-
plications to multiply two d × d matrices, used as the basis of such a recursive
algorithm, will yield an O(n^(log_d c)) algorithm. To beat Strassen's algorithm, we
must have c < d^(log₂ 7). For a 3 × 3 matrix, we need c < 3^(log₂ 7) = 21.8..., but
the best known algorithm uses 23 multiplications.
In 1978, Victor Pan [83, 84] showed how to multiply 70 × 70 matrices using
143640 multiplications. This gives an algorithm of approximately O(n^(2.795...)).
The asymptotically best algorithm known to date, which is achieved by en-
tirely different methods, is O(n^(2.376...)) [25]. Every algorithm must be Ω(n²),
since it has to look at all the entries of the matrices; no better lower bound is
known.

Lecture 2 Topological Sort and MST
A recurring theme in asymptotic analysis is that it is often possible to get
better asymptotic performance by maintaining extra information about the
structure. Updating this extra information may slow down each individual
step; this additional cost is sometimes called overhead. However, it is often
the case that a small amount of overhead yields dramatic improvements in the
asymptotic complexity of the algorithm.
To illustrate, let’s look at topological sort. Let G = (V, E) be a directed
acyclic graph (dag). The edge set E of the dag G induces a partial order (a
reflexive, antisymmetric, transitive binary relation) on V, which we denote
by E* and define by: uE*v if there exists a directed E-path of length 0 or
greater from u to v. The relation E* is called the reflexive transitive closure
of E.
Proposition 2.1 Every partial order extends to a total order (a partial order
in which every pair of elements is comparable).
Proof. If R is a partial order that is not a total order, then there exist u,v
such that neither uRv nor vRu. Extend R by setting

    R := R ∪ {(x, y) | xRu and vRy} .

The new R is a partial order extending the old R, and in addition now uRv.
Repeat until there are no more incomparable pairs. □
In the case of a dag G = (V, E) with associated partial order E*, to say
that a total order ≤ extends E* is the same as saying that if uE*v then u ≤ v.
Such a total order is called a topological sort of the dag G. A naive O(n³)
algorithm to find a topological sort can be obtained from the proof of the
above proposition.
Here is a faster algorithm, although still not optimal.
Algorithm 2.2 (Topological Sort II)
1. Start from any vertex and follow edges backwards until finding a
vertex u with no incoming edges. Such a u must be encountered
eventually, since there are no cycles and the dag is finite.
2. Make u the next vertex in the total order.
3. Delete u and all adjacent edges and go to step 1.
Using the adjacency list representation, the running time of this algorithm is
O(n) steps per iteration for n iterations, or O(n²).
The bottleneck here is step 1. A minor modification will allow us to perform
this step in constant time. Assume the adjacency list representation of the
graph associates with each vertex two separate lists, one for the incoming
edges and one for the outgoing edges. If the representation is not already of
this form, it can easily be put into this form in linear time. The algorithm
will maintain a queue of vertices with no incoming edges. This will reduce the
cost of finding a vertex with no incoming edges to constant time at a slight
extra overhead for maintaining the queue.
Algorithm 2.3 (Topological Sort III)
1. Initialize the queue by traversing the graph and inserting each v
whose list of incoming edges is empty.
2. Pick a vertex u off the queue and make u the next vertex in the
total order.
3. Delete u and all outgoing edges (u, v). For each such v, if its list
of incoming edges becomes empty, put v on the queue. Go to step
2.
Step 1 takes time O(n). Step 2 takes constant time, thus O(n) time over all
iterations. Step 3 takes time O(m) over all iterations, since each edge can be
deleted at most once. The overall time is O(m + n).
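A Python sketch of Algorithm 2.3 (ours, not from the text); it assumes the
dag is given as a dict mapping every vertex to its list of successors, and it
maintains in-degree counts in place of explicit incoming-edge lists.

    from collections import deque

    def topological_sort(succ):
        # succ: dict mapping each vertex to the list of its successors
        indeg = {u: 0 for u in succ}
        for u in succ:
            for v in succ[u]:
                indeg[v] += 1
        queue = deque(u for u in succ if indeg[u] == 0)   # step 1
        order = []
        while queue:
            u = queue.popleft()                           # step 2
            order.append(u)
            for v in succ[u]:                             # step 3: "delete" u
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        return order   # a topological sort when the graph is acyclic

Each vertex and each edge is handled a constant number of times, giving the
stated O(m + n) bound.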
Later we will see a different approach involving depth-first search.
2.1 Minimum Spanning Trees
Let G = (V, E) be a connected undirected graph.
Definition 2.4 A forest in G is a subgraph F = (V, E') with no cycles. Note
that F has the same vertex set as G. A spanning tree in G is a forest with
exactly one connected component. Given weights w : E → N (edges are
assigned weights over the natural numbers), a minimum (weight) spanning
tree (MST) in G is a spanning tree T whose total weight (sum of the weights
of the edges in T) is minimum over all spanning trees. □
Lemma 2.5 Let F = (V, E) be an undirected graph, c the number of con-
nected components of F, m = |E|, and n = |V|. Then F has no cycles iff
c + m = n.
Proof.
(→) By induction on m. If m = 0, then there are n vertices and each
forms a connected component, so c = n. If an edge is added without forming
a cycle, then it must join two components. Thus m is increased by 1 and c is
decreased by 1, so the equation c+ m = n is maintained.
(←) Suppose that F has at least one cycle. Pick an arbitrary cycle and
remove an edge from that cycle. Then m decreases by 1, but c and n remain
the same. Repeat until there are no more cycles. When done, the equation
c + m = n holds, by the preceding paragraph; but then it could not have held
originally. □
We use a greedy algorithm to produce a minimum weight spanning tree.
This algorithm is originally due to Kruskal [66].
Algorithm 2.6 (Greedy Algorithm for MST)
1. Sort the edges by weight.
2. For each edge on the list in order of increasing weight, include that
edge in the spanning tree if it does not form a cycle with the edges
already taken; otherwise discard it.
The algorithm can be halted as soon as n − 1 edges have been kept, since we
know we have a spanning tree by Lemma 2.5.
Step 1 takes time O(m log m) = O(m log n) using any one of a number of
general sorting methods, but can be done faster in certain cases, for example
if the weights are small integers so that bucket sort can be used.
Later on, we will give an almost linear time implementation of step 2, but
for now we will settle for O(n log n). We will think of including an edge e in the
spanning tree as taking the union of two disjoint sets of vertices, namely the
vertices in the connected components of the two endpoints of e in the forest
being built. We represent each connected component as a linked list. Each
list element points to the next element and has a back pointer to the head of
the list. Initially there are no edges, so we have n lists, each containing one
vertex. When a new edge (u,v) is encountered, we check whether it would
form a cycle, i.e. whether u and v are in the same connected component,
by comparing back pointers to see if u and v are on the same list. If not,
we add (u,v) to the spanning tree and take the union of the two connected
components by merging the two lists. Note that the lists are always disjoint,
so we don’t have to check for duplicates.
Checking whether u and v are in the same connected component takes
constant time. Each merge of two lists could take as much as linear time,
since we have to traverse one list and change the back pointers, and there
are n − 1 merges; this will give O(n²) if we are not careful. However, if we
maintain counters containing the size of each component and always merge
the smaller into the larger, then each vertex can have its back pointer changed
at most log n times, since each time the size of its component at least doubles.
If we charge the change of a back pointer to the vertex itself, then there are at
most log n changes per vertex, or at most n log n in all. Thus the total time
for all list merges is O(n log n).
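The following Python sketch (ours) renders this implementation directly: comp
plays the role of the back pointers, members holds the component lists, and the
smaller list is always merged into the larger.

    def kruskal(vertices, edges):
        # edges: list of (weight, u, v) triples; sorting is step 1
        comp = {v: v for v in vertices}        # back pointer: head of v's list
        members = {v: [v] for v in vertices}   # the component lists
        tree = []
        for w, u, v in sorted(edges):
            if comp[u] == comp[v]:
                continue                       # u, v already connected: discard
            if len(members[comp[u]]) < len(members[comp[v]]):
                u, v = v, u                    # merge smaller into larger
            big, small = comp[u], comp[v]
            for x in members[small]:
                comp[x] = big                  # change back pointers
            members[big] += members[small]
            del members[small]
            tree.append((u, v, w))
            if len(tree) == len(vertices) - 1:
                break                          # spanning tree complete (Lemma 2.5)
        return tree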
2.2 The Blue and Red Rules
Here is a more general approach encompassing most of the known algorithms
for the MST problem. For details and references, see [100, Chapter 6], which
proves the correctness of the greedy algorithm as a special case of this more
general approach. In the next lecture, we will give an even more general
treatment.
Let G = (V, E) be an undirected connected graph with edge weights w :
E → N. Consider the following two rules for coloring the edges of G, which
Tarjan [100] calls the blue rule and the red rule:
Blue Rule: Find a cut (a partition of V into two disjoint sets X and
V — X) such that no blue edge crosses the cut. Pick an uncolored edge
of minimum weight between X and V — X and color it blue.
Red Rule: Find a cycle (a path in G starting and ending at the same
vertex) containing no red edge. Pick an uncolored edge of maximum
weight on that cycle and color it red.
The greedy algorithm is just a repeated application of a special case of the
blue rule. We will show next time:
Theorem 2.7 Starting with all edges uncolored, if the blue and red rules are
applied in arbitrary order until neither applies, then the final set of blue edges
forms a minimum spanning tree.

Lecture 3 Matroids and Independence
Before we prove the correctness of the blue and red rules for MST, let’s first
discuss an abstract combinatorial structure called a matroid. We will show
that the MST problem is a special case of the more general problem of find-
ing a minimum-weight maximal independent set in a matroid. We will then
generalize the blue and red rules to arbitrary matroids and prove their cor-
rectness in this more general setting. We will show that every matroid has a
dual matroid, and that the blue and red rules of a matroid are the red and
blue rules, respectively, of its dual. Thus, once we establish the correctness of
the blue rule, we get the red rule for free.
We will also show that a structure is a matroid if and only if the greedy
algorithm always produces a minimum-weight maximal independent set for
any weighting.
Definition 3.1 A matroid is a pair (S, I) where S is a finite set and I is a
family of subsets of S such that

(i) if J ∈ I and I ⊆ J, then I ∈ I;

(ii) if I, J ∈ I and |I| < |J|, then there exists an x ∈ J − I such that
I ∪ {x} ∈ I.

The elements of I are called independent sets and the subsets of S not in I
are called dependent sets. □
This definition is supposed to capture the notion of independence in a
general way. Here are some examples:
1. Let V be a vector space, let S be a finite subset of V, and let I ⊆ 2^S be
the family of linearly independent subsets of S. This example justifies
the term "independent".

2. Let A be a matrix over a field, let S be the set of rows of A, and let
I ⊆ 2^S be the family of linearly independent subsets of S.
3. Let G = (V, E) be a connected undirected graph. Let S = E and let I
be the set of forests in G. This example gives the MST problem of the
previous lecture.

4. Let G = (V, E) be a connected undirected graph. Let S = E and let
I be the set of subsets E' ⊆ E such that the graph (V, E − E') is
connected.
5. Elements a1, ..., an of a field are said to be algebraically independent
over a subfield k if there is no nontrivial polynomial p(x1, ..., xn) with
coefficients in k such that p(a1, ..., an) = 0. Let S be a finite set of
elements and let I be the set of subsets of S that are algebraically
independent over k.
Definition 3.2 A cycle (or circuit) of a matroid (S, I) is a setwise minimal
(i.e., minimal with respect to set inclusion) dependent set. A cut (or cocircuit)
of (S, I) is a setwise minimal subset of S intersecting all maximal independent
sets. □
The terms circuit and cocircuit are standard in matroid theory, but we
will continue to use cycle and cut to maintain the intuitive connection with
the special case of MST. However, be advised that cuts in graphs as defined in
the last lecture are unions of cuts as defined here. For example, in the graph
    [figure omitted: a small graph on vertices s, t, u]
the set {(s,u), (t,u)} forms a cut in the sense of MST, but not a cut in
the sense of the matroid, because it is not minimal. However, a moment’s
thought reveals that this difference is inconsequential as far as the blue rule
is concerned.
Let the elements of S be weighted. We wish to find a setwise maximal
independent set whose total weight is minimum among all setwise maximal
independent sets. In this more general setting, the blue and red rules become:
Blue Rule: Find a cut with no blue element. Pick an uncolored ele-
ment of the cut of minimum weight and color it blue.
Red Rule: Find a cycle with no red element. Pick an uncolored element
of the cycle of maximum weight and color it red.
3.1 Matroid Duality
As the astute reader has probably noticed by now, there is some kind of duality
afoot. The similarity between the blue and red rules is just too striking to be
mere coincidence.
Definition 3.3 Let (S, I) be a matroid. The dual matroid of (S, I) is (S, I*),
where

    I* = {subsets of S disjoint from some maximal element of I} .

In other words, the maximal elements of I* are the complements in S of the
maximal elements of I. □
The examples 3 and 4 above are duals. Note that I** = I. Be careful: it
is not the case that a set is independent in a matroid iff it is dependent in its
dual. For example, except in trivial cases, ∅ is independent in both matroids.
Theorem 3.4
1. Cuts in (S, I) are cycles in (S, I*).

2. The blue rule in (S, I) is the red rule in (S, I*) with the ordering of the
weights reversed.
3.2 Correctness of the Blue and Red Rules
Now we prove the correctness of the blue and red rules in arbitrary matroids.
A proof for the special case of MST can be found in Tarjan's book [100,
Chapter 6]; Lawler [70] states the blue and red rules for arbitrary matroids
but omits a proof of correctness.
Definition 3.5 Let (S, I) be a matroid with dual (S, I*). An acceptable
coloring is a pair of disjoint sets B ∈ I (the blue elements) and R ∈ I* (the
red elements). An acceptable coloring B, R is total if B ∪ R = S, i.e. if B is a
maximal independent set and R is a maximal independent set in the dual. An
acceptable coloring B', R' extends or is an extension of an acceptable coloring
B, R if B ⊆ B' and R ⊆ R'. □
Lemma 3.6 Any acceptable coloring has a total acceptable extension.
Proof. Let B, R be an acceptable coloring. Let U* be a maximal element
of I* extending R, and let U = S − U*. Then U is a maximal element of
I disjoint from R. As long as |B| < |U|, select elements of U and add them
to B, maintaining independence. This is possible by axiom (ii) of matroids.
Let B̂ be the resulting set. Since all maximal independent sets have the same
cardinality (Exercise 1a, Homework 1), B̂ is a maximal element of I containing
B and disjoint from R. The desired total extension is B̂, S − B̂. □
Lemma 3.7 A cut and a cycle cannot intersect in exactly one element.
Proof. Let C be a cut and D a cycle. Suppose that C ∩ D = {x}. Then
D − {x} is independent and C − {x} is independent in the dual. Color D − {x}
blue and C − {x} red; by Lemma 3.6, this coloring extends to a total acceptable
coloring. But depending on the color of x, either C is all red or D is all blue;
this is impossible in an acceptable coloring, since D is dependent and C is
dependent in the dual. □
Suppose B is independent and B ∪ {x} is dependent. Then B ∪ {x} contains
a minimal dependent subset or cycle C, called the fundamental cycle¹ of x and
B. The cycle C must contain x, because C − {x} is contained in B and is
therefore independent.
Lemma 3.8 (Exchange Lemma) Let B,R be a total acceptable coloring.
(i) Let x ∈ R and let y lie on the fundamental cycle of x and B. If the
colors of x and y are exchanged, the resulting coloring is acceptable.

(ii) Let y ∈ B and let x lie on the fundamental cut of y and R (the funda-
mental cut of y and R is the fundamental cycle of y and R in the dual
matroid). If the colors of x and y are exchanged, the resulting coloring
is acceptable.
Proof. By duality, we need only prove (i). Let C be the fundamental cycle
of x and B and let y lie on C. If y = x, there is nothing to prove. Otherwise
y ∈ B. The set C − {y} is independent since C is minimal. Extend C − {y} by
adding elements of B as in the proof of Lemma 3.6 until achieving a maximal
independent set B'. Then B' = (B − {y}) ∪ {x}, and the total acceptable
coloring B', S − B' is obtained from B, R by switching the colors of x and y.
□
A total acceptable coloring B, R is called optimal if B is of minimum weight
among all maximal independent sets; equivalently, if R is of maximum weight
among all maximal independent sets in the dual matroid.
Lemma 3.9 If an acceptable coloring has an optimal total extension before
execution of the blue or red rule, then so has the resulting coloring afterwards.
Proof. We prove the case of the blue rule; the red rule follows by duality.
Let B, R be an acceptable coloring with optimal total extension B̂, R̂. Let A
be a cut containing no blue elements, and let x be an uncolored element of
A of minimum weight. If x ∈ B̂, we are done, so assume that x ∈ R̂. Let C
be the fundamental cycle of x and B̂. By Lemma 3.7, A ∩ C must contain
another element besides x, say y. Then y ∈ B̂, and y ∉ B because there are
no blue elements of A. By Lemma 3.8, the colors of x and y in B̂, R̂ can be
exchanged to obtain a total acceptable coloring B̂', R̂' extending B ∪ {x}, R.
Moreover, B̂' is of minimum weight, because the weight of x is no more than
that of y. □

¹We say "the" because it is unique (Exercise 1b, Homework 1), although we do not need
to know this for our argument.
We also need to know
Lemma 3.10 If an acceptable coloring is not total, then either the blue or red
rule applies.

Proof. Let B, R be an acceptable coloring with uncolored element x. By
Lemma 3.6, B, R has a total extension B̂, R̂. By duality, assume without loss
of generality that x ∈ B̂. Let C be the fundamental cut of x and R̂. Since all
elements of C besides x are in R̂, none of them are blue in B. Thus the blue
rule applies. □
Combining Lemmas 3.9 and 3.10, we have
Theorem 3.11 If we start with an uncolored weighted matroid and apply the
blue or red rules in any order until neither applies, then the resulting coloring
is an optimal total acceptable coloring.
What is really going on here is that all the subsets of the maximal inde-
pendent sets of minimum weight form a submatroid of (S, I), and the blue rule
gives a method for implementing axiom (ii) for this matroid; see Miscellaneous
Exercise 1.
3.3 Matroids and the Greedy Algorithm
We have shown that if (S, I) is a matroid, then the greedy algorithm produces
a maximal independent set of minimum weight. Here we show the converse:
if (S, I) is not a matroid, then the greedy algorithm fails for some choice of
integer weights. Thus the abstract concept of matroid captures exactly when
the greedy algorithm works.
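In code, the greedy algorithm needs nothing more than an independence oracle;
a minimal Python sketch (our formulation, not from the text):

    def greedy(S, independent, w):
        # S: the elements; independent: oracle on sets; w: weight function
        B = set()
        for x in sorted(S, key=w):             # scan by increasing weight
            if independent(B | {x}):
                B.add(x)                       # keep x; otherwise discard it
        return B                               # a maximal independent set

For Example 3 above, independent would test that the chosen edge set is a
forest, and greedy specializes to Kruskal's algorithm.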
Theorem 3.12 ([32]; see also [70]) A system (S, I) satisfying axiom (i) of
matroids is a matroid (i.e., it satisfies (ii)) if and only if for all weight as-
signments w : S → N, the greedy algorithm gives a minimum-weight maximal
independent set.
Proof. The direction (→) has already been shown. For (←), let (S, I)
satisfy (i) but not (ii). There must be A, B such that A, B ∈ I, |A| < |B|,
but for no x ∈ B − A is A ∪ {x} ∈ I.

Assume without loss of generality that B is a maximal independent set.
If it is not, we can add elements to B maintaining the independence of B; for
any element that we add to B that can also be added to A while preserving
the independence of A, we do so. This process never changes the fact that
|A| < |B| and for no x ∈ B − A is A ∪ {x} ∈ I.

Now we assign weights w : S → N. Let a = |A − B| and b = |B − A|.
Then a < b. Let h be a huge number, h ≫ a, b. (Actually h > b² will do.)
Case 1 If A is a maximal independent set, assign

    w(x) = a + 1   for x ∈ B − A
    w(x) = b + 1   for x ∈ A − B
    w(x) = 0       for x ∈ A ∩ B
    w(x) = h       for x ∉ A ∪ B.

Thus

    w(A) = a(b + 1) = ab + a
    w(B) = b(a + 1) = ab + b.

This weight assignment forces the greedy algorithm to choose B when in fact
A is a maximal independent set of smaller weight.
Case 2 If A is not a maximal independent set, assign

    w(x) = 0   for x ∈ A
    w(x) = b   for x ∈ B − A
    w(x) = h   for x ∉ A ∪ B.

All the elements of A will be chosen first, and then a huge element outside of
A ∪ B must be chosen, since A is not maximal. Thus the minimum-weight
maximal independent set B was not chosen. □

Lecture 4 Depth-First and Breadth-First
Search
Depth-first search (DFS) and breadth-first search (BFS) are two of the most
useful subroutines in graph algorithms. They allow one to search a graph
in linear time and compile information about the graph. They differ in that
the former uses a stack (LIFO) discipline and the latter uses a queue (FIFO)
discipline to choose the next edge to explore.
Undirected depth-first search produces in linear time a numbering of the
vertices called the depth-first numbering and a particular spanning tree called
the depth-first spanning tree of each connected component. This is done as
follows. Choose an arbitrary vertex u, which will become the root of the tree.
Push all edges (u, v) ∈ E onto the stack. Assign u the DFS number 0 and
set the DFS counter c to 1. Now repeat the following activity until the stack
becomes empty. Let (x, y) be the top element of the stack. This is the next
edge to explore. The vertex x has a DFS number already (this is an invariant
of the loop). If y has no DFS number, assign it the DFS number c, increment
c, push all edges (y, z) ∈ E onto the stack, and make the (directed) edge (x, y)
a tree edge. Otherwise, if y has a DFS number already, just pop (x, y) off the
stack.
The tree edges form a directed spanning tree of the connected component
of u rooted at u. It is a dag rooted at u, since tree edges (x,y) only go from
lower numbered vertices to higher numbered vertices. It is a tree, since no
vertex has indegree greater than one; this is because (x,y) becomes a tree
edge only if y has no DFS number, and thereafter y has a DFS number. It is
a spanning tree, since it is easily shown inductively that every vertex in the
connected component of u eventually receives a DFS number. This spanning
tree is called the depth-first spanning tree.
We can repeat the whole process with a new arbitrarily chosen unvisited
vertex to search the other connected components.
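A direct Python transcription of this procedure (our code, not from the text),
with the graph given as a dict adj of neighbor lists; it returns the DFS numbers
and the tree edges of the component of u:

    def dfs_component(adj, u):
        dfs = {u: 0}                      # DFS numbers
        c = 1
        tree = []                         # tree edges, directed (x, y)
        stack = [(u, v) for v in adj[u]]
        while stack:
            x, y = stack[-1]
            if y not in dfs:
                dfs[y] = c                # first visit: (x, y) is a tree edge
                c += 1
                tree.append((x, y))
                stack.extend((y, z) for z in adj[y])
            else:
                stack.pop()               # y already numbered: pop (x, y)
        return dfs, tree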
The non-tree edges (x, y) are called back edges and are directed from higher
numbered to lower numbered vertices. When we draw a DFS tree, we usually
draw the root at the top, the tree edges pointing down (hence the term depth-
first), and the back edges pointing up.
Back edges out of v can only go to ancestors of v in the DFS tree. There
cannot be a back edge to a nonancestor, since that edge would have been
explored earlier from the other direction and would have been a tree edge.
DFS takes time O(m +n) where n is the number of vertices and m is the
number of edges, since each edge is stacked at most once in each direction,
and each edge and vertex requires a constant amount of processing.
See [3, 78] for an alternative treatment.
4.1 Biconnected Components
Let G = (V, E) be a connected undirected graph.
Definition 4.1 A vertex v is an articulation point if its removal disconnects
the graph. □
Definition 4.2 A connected graph is called biconnected if any pair of distinct
vertices u and v lie on a simple cycle (one with no repeated vertices). □
Note that according to this definition, a graph with two vertices connected by
a single edge is biconnected (no one said anything about not repeating edges).
If G is not biconnected, we define the biconnected components of G in terms
of an equivalence relation on edges:
Definition 4.3 For e, e' ∈ E, define e ≡ e' if e and e' lie on a simple cycle.
□
Lemma 4.4 The relation ≡ is an equivalence relation (reflexive, symmetric,
and transitive).
Proof. Reflexivity e ≡ e follows from the fact that the edge e and its two
endpoints constitute a simple cycle. The relation is symmetric, since e and
e' can be interchanged in the definition of ≡. The hard one is transitivity.
Suppose (u,v) ≡ (u',v') and (u',v') ≡ (u'',v''). Let c and c' be the two simple
cycles involved, respectively. Assume u, u', v', v occur in that order around c.
Let x be the first vertex on the segment of c from u to u' that also lies in c';
x must exist since u' ∈ c', at least. Let y be the first vertex on the segment of
c from v to v' that also lies in c'; y must exist since v' ∈ c'. Also, x ≠ y since
c is simple. Let p be the path from x to y in c containing (u,v) and let p' be
the path from x to y in c' containing (u'',v''). Then p and p' intersect only in
x and y, and together form a simple cycle containing (u,v) and (u'',v''). □
Definition 4.5 The equivalence classes of ≡ are called biconnected compo-
nents. □
Lemma 4.6 The vertex a is an articulation point iff a is contained in at least
two biconnected components.
Proof. Suppose the removal of a disconnects the graph. Then there exist
u and v adjacent to a such that every path from u to v goes through a. Then
the edges (u,a) and (a,v) cannot lie on a simple cycle, thus are in different
biconnected components.

Conversely, suppose u and v are adjacent to a and (u,a) ≢ (a,v). Then
all paths between u and v must go through a. Thus if a is removed, there is
no path between u and v, so G is disconnected. □
Below, when using the terms “descendant” and “ancestor” in a depth-first
search tree, we will always consider a vertex u to be a descendant of itself and
an ancestor of itself. In other words, we take the descendant and ancestor
relations to be reflexive. If we want to exclude u, we do so explicitly by using
the terms “proper descendant” and “proper ancestor”.
Lemma 4.7 Let (u,v) and (v,w) be two adjacent tree edges in a depth-first
search tree of G. Then (u,v) ≡ (v,w) if and only if there exists a back edge
from some descendant of w to some ancestor of u.
Proof.
(→) If there exists a back edge from some descendant w' of w to some
ancestor u' of u, then (u,v) and (v,w) are edges in a simple cycle consisting
of the back edge (w', u') along with the path of tree edges from u' to w'. Thus
(u,v) ≡ (v,w).

(←) Suppose (u,v) ≡ (v,w). Then there must be a simple cycle containing
them. This cycle must contain the edges (u,v) and (v,w) in this order, since
it may only go through v once. Consider the subtree of the depth-first tree
rooted at w. The simple cycle must contain a back edge (w', u') out of this
subtree, since it must get back to u eventually. (Before coming out, the path
inside the subtree can be quite complicated, since it can traverse tree and back
edges in either direction—don't forget that the graph is undirected.) Then w'
is a descendant of w and u' is an ancestor of w'. Since u' is not in the subtree
rooted at w, it must be an ancestor of v. But it cannot be v because v cannot
be used twice on the cycle. Therefore u' must be an ancestor of u. □
The biconnected components can be found from a DFS tree as follows.
Assume the vertices are named by their DFS numbers. We compute a value
for each vertex v, called low(v), which gives the DFS number of the lowest
numbered vertex z (i.e., the highest in the tree) such that there is a back edge
from some descendant of v to z. By Lemmas 4.6 and 4.7, a vertex u will be
an articulation point, and the biconnected component of the tree edge (u,v)
will lie entirely in the subtree rooted at u, if low(v) ≥ u. We can inductively
compute low(v) as follows:
    x = min{low(w) | w is an immediate descendant of v}
    y = min{z | z is reachable by a back edge from v}
    low(v) = min(x, y) .
The values low(v) can be computed simultaneously with the construction of
the DFS tree in linear time. As soon as an articulation point u is discovered
with (u,v) a tree edge such that low(v) ≥ u, the biconnected component
containing the edge (u,v) can be deleted from the graph. See [3, 78] for more
details.
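A recursive Python sketch of the low computation (ours; recursion replaces the
explicit stack, and the x and y terms of the definition above are marked in
comments). The special case for the root, which is an articulation point iff it
has more than one tree child, is standard but not spelled out in the text.

    def articulation_points(adj, root):
        dfs, low, points = {}, {}, set()
        counter = [0]
        def visit(v, parent):
            dfs[v] = low[v] = counter[0]
            counter[0] += 1
            children = 0
            for w in adj[v]:
                if w not in dfs:                     # tree edge (v, w)
                    children += 1
                    visit(w, v)
                    low[v] = min(low[v], low[w])     # the x term
                    if v != root and low[w] >= dfs[v]:
                        points.add(v)                # component under w cut off at v
                elif w != parent:
                    low[v] = min(low[v], dfs[w])     # back edge: the y term
            if v == root and children > 1:
                points.add(v)                        # root is a special case
        visit(root, None)
        return points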
4.2 Directed DFS
The DFS procedure on directed graphs is similar to DFS on undirected graphs,
except that we only follow edges from sources to sinks. Four types of edges
can result:
• tree edges to a vertex not yet visited
• back edges to an ancestor
• forward edges to a descendant previously visited
• cross edges to a vertex previously visited that is neither an ancestor nor
a descendant.
There can be no cross edges to a higher numbered vertex; such an edge would
have been a tree edge. If we mark the vertex y when the tree edge (x, y) is
popped to indicate that the subtree below y has been completely explored,
we can recognize each of these four cases when we explore the edge (u,v) by
checking marks and comparing DFS numbers:
(u,v) is a if
tree edge DFS(v) does not exist
back edge DFS(v) < DFS(u) and v is not marked
forward edge | DFS(v) > DFS(u)
cross edge DFS(v) < DFS(u) and v is markedBb 4 TH- FIRS’ ND BREADTH-FIRS' RCH 23
The directed DFS tree can be constructed in linear time; see [3, 78] for details.
The first application of directed DFS is determining acyclicity:
Theorem 4.8 A directed graph is acyclic iff its DFS forest has no back edges.
Proof. If there is a back edge, the graph is surely cyclic. Conversely, if
there are no back edges, consider the postorder numbering of the DFS forest:
traverse the forest in depth-first order, but number the vertices in the order
they are last seen. Then tree edges, forward edges, and cross edges all go from
higher numbered to lower numbered vertices, so there can be no cycles. □
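The proof translates directly into a linear-time acyclicity test; a Python sketch
(ours, not from the text), numbering each vertex when it is last seen:

    def is_acyclic(succ):
        post, visited = {}, set()
        counter = [0]
        def visit(u):
            visited.add(u)
            for v in succ[u]:
                if v not in visited:
                    visit(v)
            post[u] = counter[0]          # number u in the order last seen
            counter[0] += 1
        for u in succ:
            if u not in visited:
                visit(u)
        # acyclic iff no edge goes from lower to higher postorder number
        return all(post[u] > post[v] for u in succ for v in succ[u])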
4.3 Strong Components
Definition 4.9 Let G = (V, E) be a directed graph. For u, v ∈ V, define
u ≡ v if u and v lie on a directed cycle in G. This is an equivalence relation,
and its equivalence classes are called strongly connected components or just
strong components. A graph G is said to be strongly connected if for any pair
of vertices u, v there is a directed cycle in G containing u and v; i.e., if G has
only one strong component. □
The strong components of a directed graph can be computed in linear time
using directed depth-first search. The algorithm is similar to the algorithm
for biconnected components in undirected graphs; see [3] for details.
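As an illustration, here is a Python sketch (ours) of one standard linear-time
method, the two-pass algorithm usually attributed to Kosaraju; it is not
necessarily the single-pass algorithm alluded to in the text.

    def strong_components(succ):
        # first pass: record vertices in postorder
        order, visited = [], set()
        def visit(u):
            visited.add(u)
            for v in succ[u]:
                if v not in visited:
                    visit(v)
            order.append(u)
        for u in succ:
            if u not in visited:
                visit(u)
        # second pass: DFS on the reverse graph in reverse postorder
        pred = {u: [] for u in succ}
        for u in succ:
            for v in succ[u]:
                pred[v].append(u)
        comp = {}
        for u in reversed(order):
            if u not in comp:
                stack = [u]
                while stack:
                    x = stack.pop()
                    if x not in comp:
                        comp[x] = u        # u names the strong component [x]
                        stack.extend(pred[x])
        return comp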
4.4 Strong Components and Partial Orders
Strong components are important in the representation of partial orders. Fi-
nite partial orders are often represented as the reflexive transitive closures E*
of dags G = (V, E) (recall (u,v) ∈ E* iff there exists an E-path from u to
v of length 0 or greater). If G is not acyclic, then the relation E* does not
satisfy the antisymmetry law, and is thus not a partial order. However, it is
still reflexive and transitive. Such a relation is called a preorder or sometimes
a quasiorder.
Given an arbitrary preorder (P, ≼), define x ≈ y if x ≼ y and y ≼ x.
This is an equivalence relation, and we can collapse its equivalence classes
into single points to get a partial order. This construction is called a quotient
construction. Formally, let [x] denote the ≈-class of x and let P/≈ denote the
set of all such classes; i.e.,

    [x] = {y | y ≈ x}
    P/≈ = {[x] | x ∈ P} .
The preorder ≼ induces a preorder, also denoted ≼, on P/≈ in a natural
way: [x] ≼ [y] if x ≼ y in P. (The choice of x and y in their respective
equivalence classes doesn't matter.) It is easily shown that the preorder ≼ is
actually a partial order on P/≈; intuitively, by collapsing equivalence classes,
we identified those elements that caused antisymmetry to fail.
Forming the strong components of a directed (not necessarily acyclic)
graph G = (V, E) allows us to perform this operation effectively on the
preorder (V, E*). We form a quotient graph G/≡ by collapsing the strong
components of G into single vertices:

    [v] = {u | u ≡ v}   (the strong component of v)
    V/≡ = {[v] | v ∈ V}
    E' = {([u], [v]) | (u,v) ∈ E}
    G/≡ = (V/≡, E') .
It is not hard to show that G/≡ is acyclic. Moreover,
Theorem 4.10 The partial orders (V/≈, E*) and (V/≡, (E')*) are isomor-
phic.
In other words, the partial order represented by the collapsed graph is the
same as the collapse of the preorder represented by the original graph.

Lecture 5 Shortest Paths and Transitive
Closure
Closure
5.1 Single-Source Shortest Paths
Let G = (V, E) be an undirected graph and let ℓ be a function assigning
a nonnegative length to each edge. Extend ℓ to domain V × V by defining
ℓ(v,v) = 0 and ℓ(u,v) = ∞ if (u,v) ∉ E. Define the length² of a path
p = e1 e2 ... ek to be ℓ(p) = Σ_{i=1}^{k} ℓ(ei). For u, v ∈ V, define the distance
d(u,v) from u to v to be the length of a shortest path from u to v, or ∞ if
no such path exists. The single-source shortest path problem is to find, given
s ∈ V, the value of d(s,u) for every other vertex u in the graph.
If the graph is unweighted (i.e., all edge lengths are 1), we can solve the
problem in linear time using BFS. For the more general case, here is an algo-
rithm due to Dijkstra [28]. Later on we will give an O(m + n log n) implemen-
tation using Fibonacci heaps. The algorithm is a type of greedy algorithm: it
builds a set X vertex by vertex, always taking vertices closest to X.
² In this context, the terms "length" and "shortest" applied to a path refer to ℓ, not the
number of edges in the path.
Algorithm 5.1 (Dijkstra's Algorithm)

    X := {s};
    D(s) := 0;
    for each u ∈ V − {s} do
        D(u) := ℓ(s, u);
    while X ≠ V do
        let u ∈ V − X such that D(u) is minimum;
        X := X ∪ {u};
        for each edge (u,v) with v ∈ V − X do
            D(v) := min(D(v), D(u) + ℓ(u, v))
    end while
The final value of D(u) is d(s, u). This algorithm can be proved correct by
showing that the following two invariants are maintained by the while loop:

• for any u, D(u) is the distance from s to u along a shortest path through
only vertices in X;

• for any u ∈ X and v ∉ X, D(u) ≤ D(v).
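Replacing the linear scan for the minimum by a binary heap gives an O(m log n)
implementation; a Python sketch using heapq (our code; the Fibonacci-heap
version promised above improves this to O(m + n log n)):

    import heapq

    def dijkstra(adj, s):
        # adj: dict mapping each vertex u to a list of (v, length) pairs
        D = {s: 0}
        heap = [(0, s)]
        X = set()                          # the set X of Algorithm 5.1
        while heap:
            d, u = heapq.heappop(heap)
            if u in X:
                continue                   # stale heap entry
            X.add(u)
            for v, l in adj[u]:
                if v not in D or d + l < D[v]:
                    D[v] = d + l
                    heapq.heappush(heap, (D[v], v))
        return D                           # D[u] = d(s, u) for reachable u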
5.2 Reflexive Transitive Closure
Let E denote the adjacency matrix of the directed graph G = (V, E). Using
Boolean matrix multiplication, the matrix E² has a 1 in position uv iff there
is a path of length exactly 2 from vertex u to vertex v; i.e., iff there exists a
vertex w such that (u,w), (w,v) ∈ E. Similarly, one can prove by induction
on k that (Eᵏ)uv = 1 iff there is a path of length exactly k from u to v.
The reflexive transitive closure of G is

    E* = I ∨ E ∨ E² ∨ E³ ∨ ⋯
       = I ∨ E ∨ E² ∨ ⋯ ∨ E^(n−1)
       = (I ∨ E)^(n−1) .

The infinite join is equal to the finite one because if there is a path connecting
u and v, then there is one of length at most n − 1.
Suppose that two n × n Boolean matrices can be multiplied in time M(n).
Then E* = (I ∨ E)^(n−1) can be calculated in time O(M(n) log n) by squaring
I ∨ E log n times. We will show below how to calculate E* in time O(M(n)).
Conversely, if there is an algorithm to compute E* in time T(n), then M(n)
is O(T(n)) (under the reasonable assumption that M(3n) is O(M(n))): to
multiply A and B, place them strategically into a 3n × 3n matrix, then take
its reflexive transitive closure:

    [0 A 0]*   [I A AB]
    [0 0 B]  = [0 I B ]
    [0 0 0]    [0 0 I ] .
The product AB can be read off from the upper right-hand block.
Here is a divide and conquer algorithm to find E* in time O(M(n)).
Algorithm 5.2 (Reflexive Transitive Closure)

1. Divide E into 4 submatrices A, B, C, D of size roughly n/2 × n/2 such
that A and D are square:

        E = [A B]
            [C D]

2. Recursively compute D*. Compute

        F = A + BD*C .

Recursively compute F*.

3. Set

        E* = [F*      F*BD*         ]
             [D*CF*   D* + D*CF*BD*] .
Essentially, we are partitioning the set of vertices into two disjoint sets U
and V, where A describes the edges from U to U, B describes edges from U
to V, C describes edges from V to U, and D describes edges from V to V.
We compute reflexive transitive closures on these sets recursively and use this
information to describe the reflexive transitive closure of E. Note that we
compute two reflexive transitive closures, a few matrix multiplications (whose
complexity is given by M) and a few matrix additions (whose complexity is
assumed to be quadratic) of matrices of roughly half the size of E. This gives
the recurrence
    T(n) = 2T(n/2) + cM(n/2) + d(n/2)²

where c and d are constants. Under the quite reasonable assumption that
M(2n) ≥ 4M(n), the solution to this recurrence is O(M(n)).
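A Python sketch of Algorithm 5.2 over Boolean matrices (our code, using
numpy, with n a power of 2 for simplicity); replacing the Boolean operations
by min and + as in Section 5.3 below gives the all-pairs shortest path variant.

    import numpy as np

    def bmul(X, Y):
        # Boolean matrix product: OR of ANDs
        return (X.astype(int) @ Y.astype(int)) > 0

    def closure(E):
        # reflexive transitive closure of a square Boolean matrix
        n = E.shape[0]
        if n == 1:
            return np.ones((1, 1), dtype=bool)    # E* = [[True]] on one vertex
        h = n // 2
        A, B = E[:h, :h], E[:h, h:]
        C, D = E[h:, :h], E[h:, h:]
        Ds = closure(D)                           # D*
        Fs = closure(A | bmul(bmul(B, Ds), C))    # F = A + B D* C, then F*
        TR = bmul(bmul(Fs, B), Ds)                # F* B D*
        BL = bmul(bmul(Ds, C), Fs)                # D* C F*
        BR = Ds | bmul(bmul(BL, B), Ds)           # D* + D* C F* B D*
        return np.vstack([np.hstack([Fs, TR]), np.hstack([BL, BR])])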
5.3 All-Pairs Shortest Paths
Let E denote the adjacency matrix of a directed graph with edge weights.
Replace the 1's in E by the edge weights and the 0's by ∞. Apply Algorithm
5.2 to calculate E*, except use + instead of ∧ and min instead of ∨. We will
show next time that this solves the all-pairs shortest path problem.

Lecture 6 Kleene Algebra
Consider a binary relation on an n element set represented by an n × n Boolean
matrix E. Recall from the last lecture that we can compute the reflexive
transitive closure of E by divide-and-conquer as follows: partition E into four
submatrices A, B, C, D of size roughly n/2 × n/2 such that A and D are square:

    E = [A B]
        [C D]

By induction, construct the matrices D*, F = A + BD*C, and F*, then take

    E* = [F*      F*BD*         ]   (1)
         [D*CF*   D* + D*CF*BD*] .
We will prove that the matrix E* as defined in (1) is indeed the reflexive
transitive closure of E, but the proof will be carried out in a more abstract
setting which will allow us to use the same construction in other applications.
For example, we will be able to compute the lengths of the shortest paths
between all pairs of points in a weighted directed graph using the same general
algorithm, but with a different interpretation of the basic operations.
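For instance, here is a minimal Python sketch of such a reinterpretation, applied to the repeated-squaring closure of the previous lecture: ∨ becomes min and ∧ becomes +, with ∞ playing the role of the Boolean 0; the function names are assumptions for this example.

INF = float('inf')

def minplus_mult(A, B):
    # Matrix 'product' with + in place of 'and' and min in place of 'or'.
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[min(A[i][k] + B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def all_pairs_shortest_paths(W):
    # W[i][j] is the weight of edge (i, j), or INF if there is no edge.
    # Repeated squaring in the (min, +) interpretation; assumes no
    # negative cycles.
    n = len(W)
    S = [[0 if i == j else W[i][j] for j in range(n)] for i in range(n)]
    k = 1
    while k < n - 1:
        S = minplus_mult(S, S)
        k *= 2
    return S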
How did we come up with the expressions in (1)? This is best motivated by
considering a simple finite-state automaton over the alphabet Σ = {a, b, c, d}
with states s, t and transitions $s \xrightarrow{a} s$, $s \xrightarrow{b} t$, $t \xrightarrow{c} s$, $t \xrightarrow{d} t$:

[Diagram: a two-state automaton with a self-loop labeled a on s, a self-loop labeled d on t, an edge labeled b from s to t, and an edge labeled c from t to s.]
For each pair of states u, v, consider the set of input strings in Σ* taking
state u to state v in this automaton. Each such set is a regular subset of Σ*
and is represented by a regular expression corresponding to the expressions
appearing in (1):

$$\begin{aligned}
s \to s &: \; f^* \\
s \to t &: \; f^*bd^* \\
t \to s &: \; d^*cf^* \\
t \to t &: \; d^* + d^*cf^*bd^*,
\end{aligned}$$

where f = a + bd*c. (See [3, §9.1, pp. 318–319] for more information on finite
automata and regular expressions.)
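To see the correspondence concretely, the following small Python sketch assembles these four expressions as strings, following the block pattern of (1); the string encoding is purely illustrative and not part of the development.

def star(x):
    # Parenthesize compound expressions before starring.
    return x + '*' if len(x) == 1 else '(' + x + ')*'

a, b, c, d = 'a', 'b', 'c', 'd'
f = a + '+' + b + star(d) + c                       # f = a + bd*c
exprs = {
    ('s', 's'): star(f),                            # f*
    ('s', 't'): star(f) + b + star(d),              # f*bd*
    ('t', 's'): star(d) + c + star(f),              # d*cf*
    ('t', 't'): star(d) + '+' + star(d) + c + star(f) + b + star(d),
}                                                   # d* + d*cf*bd*
print(exprs[('s', 't')])                            # prints (a+bd*c)*bd*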
6.1 Definition of Kleene Algebras
The appropriate level of abstraction we are seeking is Kleene algebra. This
concept goes back to Kleene [61], but received significant impetus from the
work of Conway [21]. The definition here is from [63].
Definition 6.1 A (*-continuous) Kleene algebra is any structure of the form

$$\mathcal{K} = (S, +, \cdot, {}^*, 0, 1)$$

where S is a set of elements, + and · are binary operations S × S → S, * is
a unary operation S → S, and 0 and 1 are distinguished elements of S,
satisfying the axioms
$$\begin{aligned}
a + (b + c) &= (a + b) + c && \text{($+$ is associative)} && (2) \\
a + b &= b + a && \text{($+$ is commutative)} && (3) \\
a + a &= a && \text{($+$ is idempotent)} && (4) \\
a + 0 = 0 + a &= a && \text{(0 is an identity for $+$)} && (5) \\
a \cdot (b \cdot c) &= (a \cdot b) \cdot c && \text{($\cdot$ is associative)} && (6) \\
a \cdot 1 = 1 \cdot a &= a && \text{(1 is an identity for $\cdot$)} && (7) \\
0 \cdot a = a \cdot 0 &= 0 && \text{(0 is an annihilator for $\cdot$)} && (8) \\
a \cdot (b + c) &= a \cdot b + a \cdot c && \text{($\cdot$ distributes over $+$)} && (9) \\
(b + c) \cdot a &= b \cdot a + c \cdot a && && (10)
\end{aligned}$$
plus the following axiom to deal with the * operator, which will require further
explanation:
$$ab^*c = \sup_{n \ge 0} ab^nc \tag{11}$$

where $b^0 = 1$ and $b^{n+1} = b \cdot b^n$.
Axioms (2–5) say that the structure (S, +, 0) is an idempotent commutative
monoid. Axioms (6–7) say that (S, ·, 1) is a monoid. Axioms (8–10) describe
how these two monoid structures interact. Altogether, Axioms (2–10) say
that K is an idempotent semiring.
The axiom (11) asserts the existence of the supremum or least upper bound
of a certain set with respect to a certain partial order. In any idempotent
semiring, there is a natural partial order defined by

$$a \le b \iff a + b = b. \tag{12}$$

The axiom (11) says that with respect to this order, the supremum of the set
$\{ab^nc \mid n \ge 0\}$ exists and is equal to ab*c.
The postulate (11) captures axiomatically the behavior of reflexive transi-
tive closure of a binary relation. It also captures the behavior of the Kleene *
operator of formal language theory. In addition, there are many nonstandard
examples of Kleene algebras that are useful in various contexts. We will give
several examples below.
Instead of Kleene algebras, many authors (such as [3, 78]) use so-called
closed semirings. These structures are strongly related to Kleene algebras,
but are defined in terms of a countable summation operator Σ instead of a
supremum. In closed semirings, the * operator is not a primitive operator but
is defined in terms of Σ by

$$a^* = \sum_{n \ge 0} a^n.$$
The countable summation operator Σ, which sums a countably infinite se-
quence of elements, is postulated not to depend on the order of the elements
in the sequence or their multiplicity, and thus is essentially a supremum. The
operator is also postulated to satisfy an infinite distributivity property that
we get for free for all suprema of interest by stating the axiom as we did in
(11).
The main drawback with closed semirings is that the suprema of all count-
able sets are required to exist, which is too many. Although every closed
semiring is a Kleene algebra, there are definitely Kleene algebras that are not
closed semirings. The most important example of such a Kleene algebra is the
family RegΣ of regular subsets of Σ*, where Σ is a finite alphabet (Example
6.2 below). This example is important because it is the free Kleene algebra
freely generated by Σ, which essentially says that an equation between regular
expressions over Σ holds in all Kleene algebras if and only if it holds in RegΣ.
We will find this fact very useful in reducing arguments about Kleene algebras
in general to arguments about regular subsets of Σ*.
Kleene algebras were studied extensively in the monograph of Conway [21].
It is possible to axiomatize the equational theory of Kleene algebras in a purely
finitary way [65]. The precise relationship between Kleene algebras and closed
semirings is drawn in [64].
6.2 Examples of Kleene Algebras
Kleene algebras abound in computer science. Here are some examples.
Example 6.2 Let Σ be a finite alphabet and let RegΣ denote the family of
regular sets over Σ with the following operations:

$$\begin{aligned}
A + B &= A \cup B \\
A \cdot B &= \{xy \mid x \in A,\ y \in B\} \\
A^* &= \{x_1x_2\cdots x_n \mid n \ge 0 \text{ and } x_i \in A,\ 1 \le i \le n\}
\end{aligned}$$

[...]

The map a ↦ {a} over RegΣ extends to the map

$$R : \mathrm{RExp}_\Sigma \to \mathrm{Reg}_\Sigma$$

in which R(α) is the regular set denoted by the regular expression α in the
usual sense. The interpretation R is called the standard interpretation over
RegΣ.
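As a small illustration of the operations of Example 6.2, here is a Python sketch on finite languages represented as sets of strings; since A* is infinite in general, the star below is truncated at a bound, an assumption made purely so that the example terminates.

def ka_plus(A, B):
    # A + B = A union B
    return A | B

def ka_times(A, B):
    # A . B = {xy | x in A, y in B}
    return {x + y for x in A for y in B}

def ka_star(A, bound=3):
    # Truncated A*: words that are products of at most `bound` factors
    # from A; the true A* takes the union over all n >= 0.
    result, layer = {''}, {''}
    for _ in range(bound):
        layer = ka_times(layer, A)
        result |= layer
    return result

print(sorted(ka_star({'a', 'b'}, bound=2)))
# ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']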
The following lemma generalizes (11).
Lemma 7.1 Let R : Σ → RegΣ be the standard interpretation over RegΣ,
and let I : Σ → K be any interpretation over any Kleene algebra K. For any
regular expression α over Σ,

$$I(\alpha) = \sup_{x \in R(\alpha)} I(x). \tag{13}$$
Note that since R(α) is a regular set of strings over the alphabet Σ, the x
in (13) denotes a string. Strings over Σ are themselves regular expressions
over Σ, so the expression I(x) makes sense. The equation (13) states that the
supremum of the possibly infinite set

$$\{I(x) \mid x \in R(\alpha)\} \subseteq K$$

exists and is equal to I(α). We leave the proof of Lemma 7.1 as an exercise
(Homework 3, Exercise 2).
It follows that for any pair α, β of regular expressions over Σ, the equation
α = β is a logical consequence of the axioms of Kleene algebra, i.e., it holds
under all interpretations over all Kleene algebras, if and only if it holds under
the standard interpretation R over RegΣ. A fancy way of saying this is that
RegΣ is the free Kleene algebra on free generators Σ.

Theorem 7.2 Let α and β be regular expressions over Σ and let R be the
standard interpretation over RegΣ. Then

$$I(\alpha) = I(\beta)$$

for all interpretations I over Kleene algebras if and only if

$$R(\alpha) = R(\beta).$$
Proof. (⇒) This follows immediately from the fact that RegΣ is a Kleene
algebra and R is an interpretation over RegΣ.

(⇐) Suppose R(α) = R(β). Then

$$\begin{aligned}
I(\alpha) &= \sup_{x \in R(\alpha)} I(x) && \text{by Lemma 7.1} \\
&= \sup_{x \in R(\beta)} I(x) && \text{by the assumption } R(\alpha) = R(\beta) \\
&= I(\beta), && \text{again by Lemma 7.1.}
\end{aligned}$$
7.1 Matrix Kleene Algebras
The collection M(n, K) of n × n matrices with elements in a Kleene algebra
K again forms a Kleene algebra, provided the Kleene algebra operators on
M(n, K) are defined appropriately. We always define + as ordinary matrix
addition, · as ordinary matrix multiplication, 0 as the zero matrix, 1 as the
identity matrix, and * recursively by equation (1) of the previous lecture. We
must show that all the axioms of Kleene algebra are satisfied by M(n, K)
under these definitions. For example, in M(2, K) the identity elements for +
and · are

$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

respectively, and the operations +, ·, and * are given by

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a+e & b+f \\ c+g & d+h \end{bmatrix}$$

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \end{bmatrix}$$

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{\!*} = \begin{bmatrix} f^* & f^*bd^* \\ d^*cf^* & d^* + d^*cf^*bd^* \end{bmatrix}$$

where f = a + bd*c. Note that A ≤ B in the natural order on M(n, K) defined
by (12) if and only if $A_{ij} \le B_{ij}$ for all $1 \le i, j \le n$.
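As a sketch, the 2 × 2 star can be written generically over any Kleene algebra whose operations are supplied as plain Python functions; instantiating it with the Boolean operations (+ as or, · as and, x* = 1) recovers the reflexive transitive closure of a two-vertex graph. The function names are assumptions for this example.

def star_2x2(M, plus, times, star):
    # Star of [[a, b], [c, d]] over a Kleene algebra, following the
    # display above, with f = a + b d* c.
    (a, b), (c, d) = M
    ds = star(d)
    f = plus(a, times(b, times(ds, c)))
    fs = star(f)
    return [
        [fs, times(fs, times(b, ds))],             # f*,    f*bd*
        [times(ds, times(c, fs)),                  # d*cf*
         plus(ds, times(ds, times(c, times(fs, times(b, ds)))))],
    ]                                              # d* + d*cf*bd*

# Boolean Kleene algebra: two vertices with an edge in each direction.
print(star_2x2([[False, True], [True, False]],
               plus=lambda x, y: x or y,
               times=lambda x, y: x and y,
               star=lambda x: True))
# prints [[True, True], [True, True]]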
Now let A, B, C be arbitrary matrices over an arbitrary Kleene algebra K.
Let $a_{ij}$, $b_{ij}$, $c_{ij}$ denote the $ij^{th}$ elements of A, B, and C, respectively. Let I
be the interpretation

$$I(a_{ij}) = a_{ij}, \qquad I(b_{ij}) = b_{ij}, \qquad I(c_{ij}) = c_{ij}.$$

Then

$$\begin{aligned}
(AB^*C)_{ij} &= I((AB^*C)_{ij}) \\
&= \sup\,\{I(x) \mid x \in R((AB^*C)_{ij})\} && \text{by Lemma 7.1} \\
&= \sup\,\{I(x) \mid x \in \bigcup_{k \ge 0} R((AB^kC)_{ij})\} \\
&= \sup_{k \ge 0}\, \sup\,\{I(x) \mid x \in R((AB^kC)_{ij})\} \\
&= \sup_{k \ge 0}\, I((AB^kC)_{ij}) \\
&= \sup_{k \ge 0}\, (AB^kC)_{ij}.
\end{aligned}$$