0% found this document useful (0 votes)
329 views332 pages

The Design and Analysis of Algorithms - 1992 Kozen, Dext

The document lists various texts and monographs in computer science, authored by notable figures such as Suad Alagic and Edsger W. Dijkstra, covering topics from database programming to algorithm design. It includes a preface by Dexter C. Kozen, detailing his lecture notes for a graduate course on the design and analysis of algorithms, which combines core and advanced topics. The course material emphasizes asymptotic complexity and includes numerous lectures and homework exercises, aiming to prepare students for PhD qualifying exams.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
329 views332 pages

The Design and Analysis of Algorithms - 1992 Kozen, Dext

The document lists various texts and monographs in computer science, authored by notable figures such as Suad Alagic and Edsger W. Dijkstra, covering topics from database programming to algorithm design. It includes a preface by Dexter C. Kozen, detailing his lecture notes for a graduate course on the design and analysis of algorithms, which combines core and advanced topics. The course material emphasizes asymptotic complexity and includes numerous lectures and homework exercises, aiming to prepare students for PhD qualifying exams.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 332
Texts and Monographs in Computer Science SSS ‘Suad Alagic Object-Oriented Database Programming 1989, XV, 320 pages, 84 illus. ‘Suad Alagic Relational Database Technology 1986. XI, 259 pages, 114 illus. Suad Alagic and Michael A. Arbib The Design of Well-Structured and Correct Programs 1978. X, 292 pages, 68 illus. S. Thomas Alexander Adaptive Signal Processing: Theory and Applications 1986. IX, 179 pages, 42 illus. Krzysztof R. Apt and Ernst-Ridiger Olderog Verification of Sequential and Concurrent Programs 1991. XVI, 441 pages Michael A. Arbib, AJ. Kfoury, and Robert N. Moll A Basis for Theoretical Computer Sclence 1981. Vill, 220 pages, 49 illus. Friedrich L. Bauer and Hans Wossner Algorithmic Language and Program Development 1982, XVI, 497 pages, 109 Illus. Kaare Christian A Gulde to Modula-2 1986. XIX, 436 pages, 46 illus. Edsger W. Dijkstra Selected Writings on Computing: A Personal Perspective 1982. XVII, 362 pages, 13 illus. Edsger W. Dijkstra and Carel S. Scholten Predicate Calculus and Program Semantics 1990. Xl, 220 pages W.H.J. Feijen, Au.M. van Gasteren, D. Gries, and J. Misra, Eds. Beauty Is Our Bualness: A Birthday Salute to Edsger W. Dijkstra 1990. XX, 453 pages, 21 illus, P.A. Fejer and D.A. Simovici Mathematical Foundations of Computer Science, Volume |: Sets, Relations, and Induction 1990. X, 425 pages, 36 illus. continued after index The Design and Analysis of Algorithms Dexter C. Kozen With 72 Illustrations ® Springer-Verlag New York Berlin Heidelberg London Paris Tokyo Hong Kong Barcelona Budapest Dexter C. Kozen Department of Computer Science Cornell University Upson Hall Ithaca, NY 14853-7501 USA Series Editor: David Gries Department of Computer Science Cornell University Upson Hall Ithaca, NY 14853-7501 USA Library of Congress Cataloging-in-Publication Data Kozen, Dexter, 1951- _. The design and analysis of algorithms / Dexter C. Kozen. Pp. om. Includes bibliographical references and index. ISBN 0-387-97687-6 1. Computer algorithms. |. Title. QA76,9.A43K69 1991 005,1—de20 Printed on acid-free paper. © 1992 Springer-Verlag New York, Inc. All rights reserved, This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or here- after developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone. 91-38759 Production managed by Bill Imbornoni; manufacturing supervised by Jacqui Ashri. Photocomposed from a LaTeX file. Printed and bound by R.R. Donnelley & Sons, Inc., Harrisonburg, VA. Printed in the United States of America. 987654321 ISBN 0-387-97687-6 Springer-Veriag New York Berlin Heidelberg ISBN 3-540-97687-6 Springer-Verlag Berlin Heidelberg New York To my wife Frances and my sons Alexander, Geoffrey, and Timothy Preface These are my lecture notes from CS681: Design and Analysis of Algo- rithms, a one-semester graduate course I taught at Cornell for three consec- utive fall semesters from ’88 to 90. 
The course serves a dual purpose: to cover core material in algorithms for graduate students in computer science preparing for their PhD qualifying exams, and to introduce theory students to some advanced topics in the design and analysis of algorithms. The material is thus a mixture of core and advanced topics. At first I meant these notes to supplement and not supplant a textbook, but over the three years they gradually took on a life of their own. In addition to the notes, I depended heavily on the texts e A.V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms. Addison-Wesley, 1975. « M. R. Garey and D. S. Johnson, Computers and Intractibility: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. e R.E. Tarjan, Data Structures and Network Algorithms. SIAM Regional Conference Series in Applied Mathematics 44, 1983. and still recommend them as excellent references. The course consists of 40 lectures. The notes from these lectures were prepared using scribes. At the beginning of each lecture, I would assign a scribe who would take notes for the entire class and prepare a raw JATgX source, which I would then doctor and distribute. In addition to the 40 lec- tures, I have included 10 homework sets and several miscellaneous homework exercises, all with complete solutions. The notes that were distributed are essentially as they appear here; no major reorganization has been attempted. There is a wealth of interesting topics, both classical and current, that I would like to have touched on but could not for lack of time. Many of these, such as computational geometry and factoring algorithms, could fill an entire semester. Indeed, one of the most difficult tasks was deciding how best to spend a scant 40 lectures. I wish to thank all the students who helped prepare these notes and who kept me honest: Mark Aagaard, Mary Ann Branch, Karl-Friedrich Bohringer, Thomas Bressoud, Suresh Chari, Sofoklis Efremidis, Ronen Feldman, Ted vii Fischer, Richard Huff, Michael Kalantar, Steve Kautz, Dani Lischinski, Pe- ter Bro Miltersen, Marc Parmet, David Pearson, Dan Proskauer, Uday Rao, Mike Reiter, Gene Ressler, Alex Russell, Laura Sabel, Aravind Srinivasan, Sridhar Sundaram, Ida Szafranska, Filippo Tampieri, and Sam Weber. I am especially indebted to my teaching assistants Mark Novick (fall ’88), Alessan- dro Panconesi (fall ’89), and Kjartan Stefansson (fall ’90) for their help with proofreading, preparation of solution sets, and occasional lecturing. I am also indebted to my colleagues Lészl6 Babai, Gianfranco Bilardi, Michael Luby, Keith Marzullo, Erik Meineche Schmidt, Bernd Sturmfels, Eva Tardos, Steve Vavasis, Sue Whitesides, and Rich Zippel for valuable comments and interest- ing exercises. Finally, I wish to express my sincerest gratitude to my colleague Vijay Vazirani, who taught the course in fall ’87 and who was an invaluable source of help. I would be most grateful for any suggestions or criticism from readers. Cornell University Dexter Kozen Ithaca, NY December 1990 viii Contents Preface vii I Lectures 1 Algorithms and Their Complexity 3 2 Topological Sort and MST 3 Matroids and Independence. . . . 13 4 Depth-First and Breadth-First Search 19 5 Shortest Paths and Transitive Closure... . tae 25 6 Kleene Algebra... 1... ee ee eee 7 More on Kleene Algebra... . . 8 Binomial Heaps se 9 Fibonacci Heaps . . 10 Union-Find..... 11 Analysis of Union-Find 12 Splay Trees 13 Random Search Trees 14 Planar and Plane Graphs .. . . 15 The Planar Separator Theorem . 
16 MaxFlow ............ 17 More on Max Flow ...... 18 Still More on Max Flow... . 19 Matching 20 More on Matching 21 Reductions and NP-Completeness 22 More on Reductions and NP-Completeness 116 23 More NP-Complete Problems 122 24 Still More NP-Complete Problems 128 25 Cook’s Theorem ......... 134 26 Counting Problems and #P 138 27 Counting Bipartite Matchings 144 28 Parallel Algorithms and NC . 151 29 Hypercubes and the Gray Representation 156 30 Integer Arithmeticin NC .......... wae . + 160 31 Csanky’s Algorithm .............0...000. - + 166 Chistov's Algorithm Matrix Rank Linear Equations and Polynomial GCDs. . . The Fast Fourier Transform (FFT) Luby’s Algorithm ........ Analysis of Luby’s Algorithm Miller’s Primality Test Analysis of Miller’s Primality Test ...... Probabilistic Tests with Polynomials II Homework Exercises Homework 1 219 Homework 2 . 220 Homework 3 . 221 Homework 4 222 Homework5....... 223 Homework 6....... 224 Homework 7 225 Homework 8 226 Homework 9....... 227 Homework 10 ...... 228 Miscellaneous Exercises 230 III Homework Solutions Homework 1 Solutions 239 Homework 2 Solutions . 242 Homework 3 Solutions . 245 Homework 4 Solutions . 250 Homework 5 Solutions . 252 Homework 6 Solutions 254 Homework 7 Solutions 257 Homework 8 Solutions 260 Homework 9 Solutions 262 Homework 10 Solutions tae oe 268 Solutions to Miscellaneous Exercises .............. - + 272 Bibliography 301 Index 309 I Lectures Lecture 1 Algorithms and Their Complexity This is a course on the design and analysis of algorithms intended for first- year graduate students in computer science. Its purposes are mixed: on the one hand, we wish to cover some fairly advanced topics in order to provide a glimpse of current research for the benefit of those who might wish to spe- cialize in this area; on the other, we wish to introduce some core results and techniques which will undoubtedly prove useful to those planning to specialize in other areas. We will assume that the student is familiar with the classical material nor- mally taught in upper-level undergraduate courses in the design and analysis of algorithms. In particular, we will assume familiarity with: © sequential machine models, including Turing machines and random ac- cess machines (RAMs) © discrete mathematical structures, including graphs, trees, and dags, and their common representations (adjacency lists and matrices) e fundamental data structures, including lists, stacks, queues, arrays, bal- anced trees « fundamentals of asymptotic analysis, including O(-), o(-), and 2(-) no- tation, and techniques for the solution of recurrences « fundamental programming techniques, such as recursion, divide-and- conquer, dynamic programming e basic sorting and searching algorithms. These notions are covered in the early chapters of [3, 39, 100]. 3 4 LECTURE 1 ALGORITHMS AND THEIR COMPLEXITY Familiarity with elementary algebra, number theory, and discrete proba- bility theory will be helpful. In particular, we will be making occasional use of the following concepts: linear independence, basis, determinant, eigenvalue, polynomial, prime, modulus, Euclidean algorithm, greatest common divisor, group, ring, field, random variable, expectation, conditional probability, con- ditional expectation. Some excellent classical references are [69, 49, 33]. The main emphasis will be on asymptotic worst-case complexity. This measures how the worst-case time or space complexity of a problem grows with the size of the input. 
We will also spend some time on probabilistic algorithms and analysis. 1.1. Asymptotic Complexity Let f and g be functions NW — N, where N denotes the natural numbers {0,1,...}. Formally, © f is O(g) if dee N Yn f(n) = g(n). e f is Q(g) if f is both O(g) and Q(g). There is one cardinal rule: Always use O and o for upper bounds and 2 for lower bounds. Never use O for lower bounds. LECTURE 1_ALGORITHMS AND THEIR COMPLEXITY There is some disagreement about the definition of 2. Some authors (such as [43]) prefer the definition as given above. Others (such as [108]) prefer: f is Q(g) if g is not o(f); in other words, f is Q(g) if 3ceN Sn f(a) >>-9(n). (The notation J means “there exist infinitely many”.) The latter is weaker and presumably easier to establish, but the former gives sharper results. We won’t get into the fray here, but just comment that neither definition precludes algorithms from taking less than the stated bound on certain inputs. For example, the assertion, “The running time of mergesort is Q(nlogn)” says that there is a c such that for all but finitely many n, there is some input sequence of length n on which mergesort makes at least Anlog n comparisons. There is nothing to prevent mergesort from taking less time on some other input of length n. The exact interpretation of statements involving O, 0, and 2 depends on assumptions about the underlying model of computation, how the input is presented, how the size of the input is determined, and what constitutes a single step of the computation. In practice, authors often do not bother to write these down. For example, “The running time of mergesort is O(n logn)” means that there is a fixed constant c such that for any n elements drawn from a totally ordered set, at most cnlogn comparisons are needed to produce a sorted array. Here nothing is counted in the running time except the number of comparisons between individual elements, and each comparison is assumed to take one step; other operations are ignored. Similarly, nothing is counted in the input size except the number of elements; the size of each element (whatever that may mean) is ignored. It is important to be aware of these unstated assumptions and understand how to make them explicit and formal when reading papers in the field. When making such statements yourself, always have your underlying assumptions in mind. Although many authors don’t bother, it is a good habit to state any assumptions about the model of computation explicitly in any papers you write. The question of what assumptions are reasonable is more often than not a matter of esthetics. You will become familiar with the standard models and assumptions from reading the literature; beyond that, you must depend on your own conscience. 1.2 Models of Computation Our principal model of computation will be the unit-cost random access ma- chine (RAM), Other models, such as uniform circuits and PRAMs, will be introduced when needed. The RAM model allows random access and the use 6 LECTURE 1 ALGORITHMS AND THEIR COMPLEXITY of arrays, as well as unit-cost arithmetic and bit-vector operations on arbi- trarily large integers; see [3]. For graph algorithms, arithmetic is often unnecessary. Of the two main representations of graphs, namely adjacency matrices and adjacency lists, the former requires random access and 92(n”) array storage; the latter, only linear storage and no random access. (For graphs, linear means O(n + m), where n is the number of vertices of the graph and m is the number of edges.) 
The most esthetically pure graph algorithms are those that use the adjacency list representation and only manipulate pointers. To express such algorithms one can formulate a very weak model of computation with primitive operators equivalent to car, edr, cons, eq, and nil of pure LISP; see also [99]. 1.3. A Grain of Salt No mathematical model can reflect reality with perfect accuracy. Mathemat- ical models are abstractions; as such, they are necessarily flawed. For example, it is well known that it is possible to abuse the power of unit-cost RAMs by encoding horrendously complicated computations in large integers and solving intractible problems in polynomial time [50]. However, this violates the unwritten rules of good taste. One possible preventative measure is to use the log-cost model; but when used as intended, the unit-cost model reflects experimental observation more accurately for data of moderate size (since multiplication really does take one unit of time), besides making the mathematical analysis a lot simpler. Some theoreticians consider asymptotically optimal results as a kind of Holy Grail, and pursue them with a relentless frenzy (present company not necessarily excluded). This often leads to contrived and arcane solutions that may be superior by the measure of asymptotic complexity, but whose con- stants are so large or whose implementation would be so cumbersome that no improvement in technology would ever make them feasible. What is the value of such results? Sometimes they give rise to new data structures or new techniques of analysis that are useful over a range of problems, but more often than not they are of strictly mathematical interest. Some practitioners take this activity as an indictment of asymptotic complexity itself and refuse to admit that asymptotics have anything at all to say of interest in practical software engineering. Nowhere is the argument more vociferous than in the theory of parallel computation. There are those who argue that many of the models of compu- tation in common use, such as uniform circuits and PRAMs, are so inaccurate as to render theoretical results useless. We will return to this controversy later on when we talk about parallel machine models. Such extreme attitudes on either side are unfortunate and counterproduc- tive. By now asymptotic complexity occupies an unshakable position in our computer science consciousness, and has probably done more to guide us in LECTURE 1_ ALGORITHMS AND THEIR COMPLEXITY 7 improving technology in the design and analysis of algorithms than any other mathematical abstraction. On the other hand, one should be aware of its lim- itations and realize that an asymptotically optimal solution is not necessarily the best one. A good rule of thumb in the design and analysis of algorithms, as in life, is to use common sense, exercise good taste, and always listen to your conscience. 1.4 Strassen’s Matrix Multiplication Algorithm Probably the single most important technique in the design of asymptotically fast algorithms is divide-and-conquer. Just to refresh our understanding of this technique and the use of recurrences in the analysis of algorithms, let’s take a look at Strassen’s classical algorithm for matrix multiplication and some of its progeny. Some of these examples will also illustrate the questionable lengths to which asymptotic analysis can sometimes be taken. 
The usual method of matrix multiplication takes 8 multiplications and 4 additions to multiply two 2 x 2 matrices, or in general O(n) arithmetic oper- ations to multiply two n x n matrices. However, the number of multiplications can be reduced. Strassen [97] published one such algorithm for multiplying 2 x 2 matrices using only 7 multiplications and 18 additions: [? al[s i _ [steyerte sa+ 85 ed gh 86+ 87 82 — 83 + 85 — 87 where 81 = (b-d)-(9+h) 82 = (a+d)-(e+h) 83 = (a—0)-(e+ f) s4 = h-(a+b) 8 = a-(f-h) 86 = d-(g-e) $87 = e-(c+d). Assume for simplicity that n is a power of 2. (This is not the last time you will hear that.) Apply the 2 x 2 algorithm recursively on a pair of n x n matrices by breaking each of them up into four square submatrices of size Bxh AB) IE F] _ [ 5+%-S44+56 Sat Ss CD GH|~ Sot 8; S_— 53 + Ss — Sy where S, = (B-D)-(G+H) 8 LECTURE 1 ALGORITHMS AND THEIR COMPLEXITY Sy = (A+D)-(E+H) S35 = (A-C)-(E4+FP) Sa H-(A+B) Ss = A-(F-H) Ss = D-(G-B) S; = E.(C+D). I tl Everything is the same as in the 2 x 2 case, except now we are manipulat- ing } x } matrices instead of scalars. (We have to be slightly cautious, since matrix multiplication is not commutative.) Ultimately, how many scalar oper- ations (+, —,-) does this recursive algorithm perform in multiplying two n x n matrices? We get the recurrence T(n) = 70(5) + dn? with solution T(n) (i+ fan's? + O(n?) O(n's2") = O(n?) I which is o(n*). Here d is a fixed constant, and dn? represents the time for the matrix additions and subtractions. This is already a significant asymptotic improvement over the naive algo- rithm, but can we do even better? In general, an algorithm that uses c multi- plications to multiply two d x d matrices, used as the basis of such a recursive algorithm, will yield an O(n'°##°) algorithm. To beat Strassen’s algorithm, we must have c < d'°627, For a 3 x 3 matrix, we need c < 37 = 21.8,.., but the best known algorithm uses 23 multiplications. In 1978, Victor Pan (83, 84] showed how to multiply 70 x 70 matrices using 143640 multiplications, This gives an algorithm of approximately O(n?7°~), The asymptotically best algorithm known to date, which is achieved by en- tirely different methods, is O(n?3"--) [25], Every algorithm must be 2(n?), since it has to look at all the entries of the matrices; no better lower bound is known, Lecture 2 Topological Sort and MST A recurring theme in asymptotic analysis is that it is often possible to get better asymptotic performance by maintaining extra information about the structure. Updating this extra information may slow down each individual step; this additional cost is sometimes called overhead. However, it is often the case that a small amount of overhead yields dramatic improvements in the asymptotic complexity of the algorithm, To illustrate, let’s look at topological sort. Let G = (V, E) be a directed acyclic graph (dag). The edge set E of the dag G induces a partial order (a reflexive, antisymmetric, transitive binary relation) on V, which we denote by E* and define by: uE*v if there exists a directed E-path of length 0 or greater from u to v. The relation E* is called the reflexive transitive closure of E. Proposition 2.1 Every partial order extends to a total order (a partial order in which every pair of elements is comparable). Proof. If R is a partial order that is not a total order, then there exist u,v such that neither wRv nor vRu. Extend R by setting R = RU{(z,y) | eRu and vRy} . The new R is a partial order extending the old R, and in addition now uRv. 
Repeat until there are no more incomparable pairs. a 10 LECTURE 2_TopoLoGicaL Sorr AND MST In the case of a dag G = (V,£) with associated partial order E*, to say that a total order < extends E* is the same as saying that if uZv then u < v. Such a total order is called a topological sort of the dag G. A naive O(n*) algorithm to find a topological sort can be obtained from the proof of the above proposition. Here is a faster algorithm, although still not optimal. Algorithm 2.2 (Topological Sort II) 1. Start from any vertex and follow edges backwards until finding a vertex u with no incoming edges. Such a u must be encountered eventually, since there are no cycles and the dag is finite. 2, Make u the next vertex in the total order. 3. Delete u and all adjacent edges and go to step 1. Using the adjacency list representation, the running time of this algorithm is O(n) steps per iteration for n iterations, or O(n?). The bottleneck here is step 1, A minor modification will allow us to perform this step in constant time. Assume the adjacency list representation of the graph associates with each vertex two separate lists, one for the incoming edges and one for the outgoing edges. If the representation is not already of this form, it can easily be put into this form in linear time. The algorithm will maintain a queue of vertices with no incoming edges. This will reduce the cost of finding a vertex with no incoming edges to constant time at a slight extra overhead for maintaining the queue. Algorithm 2.3 (Topological Sort III) 1. Initialize the queue by traversing the graph and inserting each v whose list of incoming edges is empty. 2. Pick a vertex u off the queue and make u the next vertex in the total order. 3. Delete u and all outgoing edges (u,v). For each such », if its list of incoming edges becomes empty, put v on the queue. Go to step 2. Step 1 takes time O(n). Step 2 takes constant time, thus O(n) time over all iterations. Step 3 takes time O(m) over all iterations, since each edge can be deleted at most once. The overall time is O(m + n). Later we will see a different approach involving depth first search. LecTurE 2_Topo.ocicaL Sort AND MST 11 2.1 Minimum Spanning Trees Let G = (V, E) be a connected undirected graph. Definition 2.4 A forest in G is a subgraph F = (V, E’) with no cycles. Note that F has the same vertex set as G. A spanning tree in G is a forest with exactly one connected component. Given weights w : E + N (edges are assigned weights over the natural numbers), a minimum (weight) spanning tree (MST) in G is a spanning tree T’ whose total weight (sum of the weights of the edges in 7’) is minimum over all spanning trees. a Lemma 2.5 Let F = (V,E) be an undirected graph, c the number of con- nected components of F, m = |E|, and n = |V|. Then F has no cycles iff c+m=n, Proof. (—) By induction on m. If m = 0, then there are n vertices and each forms a connected component, so c = n. If an edge is added without forming a cycle, then it must join two components. Thus m is increased by 1 and c is decreased by 1, so the equation c+ m = n is maintained. (+) Suppose that F has at least one cycle. Pick an arbitrary cycle and remove an edge from that cycle. Then m decreases by 1, but c and n remain the same. Repeat until there are no more cycles. When done, the equation c+m =n holds, by the preceding paragraph; but then it could not have held originally. a We use a greedy algorithm to produce a minimum weight spanning tree. This algorithm is originally due to Kruskal [66]. 
Algorithm 2.6 (Greedy Algorithm for MST) 1. Sort the edges by weight. 2. For each edge on the list in order of increasing weight, include that edge in the spanning tree if it does not form a cycle with the edges already taken; otherwise discard it. The algorithm can be halted as soon as n — 1 edges have been kept, since we know we have a spanning tree by Lemma 2.5. Step 1 takes time O(m log m) = O(m log n) using any one of a number of general sorting methods, but can be done faster in certain cases, for example if the weights are small integers so that bucket sort can be used. Later on, we will give an almost linear time implementation of step 2, but for now we will settle for O(n log n). We will think of including an edge e in the spanning tree as taking the union of two disjoint sets of vertices, namely the vertices in the connected components of the two endpoints of e in the forest 12 LECTURE 2_TopoLocicaL SorTr AND MST being built. We represent each connected component as a linked list. Each list element points to the next element and has a back pointer to the head of the list. Initially there are no edges, so we have n lists, each containing one vertex. When a new edge (u,v) is encountered, we check whether it would form a cycle, i.e. whether u and v are in the same connected component, by comparing back pointers to see if u and v are on the same list. If not, we add (u,v) to the spanning tree and take the union of the two connected components by merging the two lists. Note that the lists are always disjoint, so we don’t have to check for duplicates. Checking whether u and v are in the same connected component takes constant time. Each merge of two lists could take as much as linear time, since we have to traverse one list and change the back pointers, and there are n — 1 merges; this will give O(n?) if we are not careful. However, if we maintain counters containing the size of each component and always merge the smaller into the larger, then each vertex can have its back pointer changed at most log n times, since each time the size of its component at least doubles. If we charge the change of a back pointer to the vertex itself, then there are at most logn changes per vertex, or at most nlogn in all. Thus the total time for all list merges is O(n logn). 2.2 The Blue and Red Rules Here is a more general approach encompassing most of the known algorithms for the MST problem. For details and references, see (100, Chapter 6], which proves the correctness of the greedy algorithm as a special case of this more general approach. In the next lecture, we will give an even more general treatment. Let G = (V,£) be an undirected connected graph with edge weights w : E—N. Consider the following two rules for coloring the edges of G, which Tarjan [100] calls the blue rule and the red rule: Blue Rule: Find a cut (a partition of V into two disjoint sets X and V — X) such that no blue edge crosses the cut. Pick an uncolored edge of minimum weight between X and V — X and color it blue. Red Rule: Find a cycle (a path in G starting and ending at the same vertex) containing no red edge. Pick an uncolored edge of maximum weight on that cycle and color it red. The greedy algorithm is just a repeated application of a special case of the blue rule. We will show next time: Theorem 2.7 Starting with all edges uncolored, if the blue and red rules are applied in arbitrary order until neither applies, then the final set of blue edges forms a minimum spanning tree. 
Lecture 3 Matroids and Independence Before we prove the correctness of the blue and red rules for MST, let’s first discuss an abstract combinatorial structure called a matroid. We will show that the MST problem is a special case of the more general problem of find- ing a minimum-weight maximal independent set in a matroid. We will then generalize the blue and red rules to arbitrary matroids and prove their cor- rectness in this more general setting. We will show that every matroid has a dual matroid, and that the blue and red rules of a matroid are the red and blue rules, respectively, of its dual. Thus, once we establish the correctness of the blue rule, we get the red rule for free. We will also show that a structure is a matroid if and only if the greedy algorithm always produces a minimum-weight maximal independent set for any weighting. Definition 3.1 A matroid is a pair (S,Z) where S is a finite set and Z is a family of subsets of S such that () if JeZ andi C J, then €T; (ii) if I,J € Z and |J| < |J|, then there exists an 2 € J — J such that TU {x} eZ. The elements of Z are called independent sets and the subsets of S not in I are called dependent sets. “Oo This definition is supposed to capture the notion of independence in a general way. Here are some examples: 13 14 LECTURE 3_MATROIDS AND INDEPENDENCE 1, Let V be a vector space, let 5 be a finite subset of V, and let J C 25 be the family of linearly independent subsets of S. This example justifies the term “independent”. 2. Let A be a matrix over a field, let S be the set of rows of A, and let I C 2 be the family of linearly independent subsets of 3. 3. Let G = (V,E) be a connected undirected graph. Let S = E and let Z be the set of forests in G. This example gives the MST problem of the previous lecture. 4, Let G = (V,E) be a connected undirected graph. Let S = E and let T be the set of subsets E’ C E such that the graph (V,E — E’) is connected. 5. Elements a1,...,@n of a field are said to be algebraically independent over a subfield & if there is no nontrivial polynomial p(x,...,2n) with coefficients in k such that p(ai,...,@n) = 0. Let $ be a finite set of elements and let Z be the set of subsets of S that are algebraically independent over k. Definition 3.2 A cycle (or circuit) of a matroid ($,Z) is a setwise minimal (i.e., minimal with respect to set inclusion) dependent set. A cut (or cocircuit) of (S,Z) is a setwise minimal subset of $ intersecting all maximal independent sets. Q The terms circuit and cocircuit are standard in matroid theory, but we will continue to use cycle and cut to maintain the intuitive connection with the special case of MST. However, be advised that cuts in graphs as defined in the last lecture are unions of cuts as defined here. For example, in the graph t s the set {(s,u), (t,u)} forms a cut in the sense of MST, but not a cut in the sense of the matroid, because it is not minimal. However, a moment’s thought reveals that this difference is inconsequential as far as the blue rule is concerned. Let the elements of S be weighted. We wish to find a setwise maximal independent set whose total weight is minimum among all setwise maximal independent sets. In this more general setting, the blue and red rules become: Blue Rule: Find a cut with no blue element. Pick an uncolored ele- ment of the cut of minimum weight and color it blue. Red Rule: Find a cycle with no red element. Pick an element of the cycle of maximum weight and color it red. 
LECTURE 3 MATROIDS AND INDEPENDENCE 15, 3.1 Matroid Duality As the astute reader has probably noticed by now, there is some kind of duality afoot. The similarity between the blue and red rules is just too striking to be mere coincidence. Definition 3.3 Let (S,Z) be a matroid. The dual matroid of (S,Z) is (S,Z*), where I* = {subsets of S disjoint from some maximal element of Z}. In other words, the maximal elements of Z* are the complements in S of the maximal elements of Z. Qo The examples 3 and 4 above are duals. Note that I** =. Be careful: it is not the case that a set is independent in a matroid iff it is dependent in its dual. For example, except in trivial cases, @ is independent in both matroids, Theorem 3.4 1. Cuts in (S,Z) are cycles in (S,I*). 2. The blue rule in (S,Z) is the red rule in (S,Z*) with the ordering of the weights reversed. 3.2 Correctness of the Blue and Red Rules Now we prove the correctness of the blue and red rules in arbitrary matroids, A proof for the special case of MST can be found in Tarjan’s book [100, Chapter 6]; Lawler [70] states the blue and red rules for arbitrary matroids but omits a proof of correctness. Definition 3.5 Let ($,Z) be a matroid with dual (5,Z*). An acceptable coloring is a pair of disjoint sets B ¢ I (the blue elements) and R € I* (the red elements). An acceptable coloring B, R is total if BUR = S,ieif Bisa maximal independent set and R is a maximal independent set in the dual. An acceptable coloring B’, R’ extends or is an extension of an acceptable coloring B,RifBC BandRC R. a Lemma 3.6 Any acceptable coloring has a total acceptable extension. Proof. Let B, R be an acceptable coloring. Let U* be a maximal element of T* extending R, and let U = S$ —U*. Then U is a maximal element of T disjoint from R. As long as |B| < |U|, select elements of U and add them to B, maintaining independence. This is possible by axiom (ii) of matroids. Let B be the resulting set. Since all maximal independent sets have the same cardinality (Exercise 1a, Homework 1), B is a maximal element of I containing B and disjoint from R. The desired total extension is 8, S — B. o 16 LECTURE 3_MATROIDS AND INDEPENDENCE Lemma 3.7 A cut and a cycle cannot intersect in exactly one element. Proof. Let C be a cut and D a cycle. Suppose that CN D = {x}. Then D-—{z} is independent and C — {z} is independent in the dual. Color D— {x} blue and C'—{z} red; by Lemma 3.6, this coloring extends to a total acceptable coloring. But depending on the color of 2, either C' is all red or D is all blue; this is impossible in an acceptable coloring, since D is dependent and C is dependent in the dual. oO Suppose B is independent and BU{z} is dependent. Then BU{z} contains a minimal dependent subset or cycle C,, called the fundamental cycle! of z and B. The cycle C must contain x, because C — {x} is contained in B and is therefore independent. Lemma 3.8 (Exchange Lemma) Let B,R be a total acceptable coloring. (i) Let x € R and let y lie on the fundamental cycle of x and B. If the colors of x and y are exchanged, the resulting coloring is acceptable. (ii) Let y € B and let x lie on the fundamental cut of y and R (the funda- mental cut of y and R is the fundamental cycle of y and R in the dual matroid). If the colors of x and y are exchanged, the resulting coloring is acceptable. Proof. By duality, we need only prove (i). Let C’ be the fundamental cycle of x and B and let y lie on C. If y = 2, there is nothing to prove. Otherwise y € B. The set C'— {y} is independent since C’ is minimal. 
Extend C—{y} by adding elements of |B| as in the proof of Lemma 3.6 until achieving a maximal independent set B’. Then B’ = (B — {y}) U {x}, and the total acceptable coloring B’, S — B’ is obtained from B, R by switching the colors of x and y. Qa A total acceptable coloring B, F is called optimal if B is of minimum weight among all maximal independent sets; equivalently, if R is of maximum weight among all maximal independent sets in the dual matroid. Lemma 3.9 Jf an acceptable coloring has an optimal total extension before execution of the blue or red rule, then so has the resulting coloring afterwards. Proof. We prove the case of the blue rule; the red rule follows by duality. Let B,R be an acceptable coloring with optimal total extension B, R. Let A be a cut containing no blue elements, and let x be an uncolored element of A of minimum weight. If x € B, we are done, so assume that x € R. Let C be the fundamental cycle of x and B. By Lemma 3.7, ANC must contain TWe say “the” because it is unique (Exercise 1b, Homework 1), although we do not need to know this for our argument. LECTURE 3_MarTRoIpS AND INDEPENDENCE 17 another element besides z, say y. Then y € B, and y ¢ B because there are no blue elements of A. By Lemma 3.8, the colors of x and y in B,R can be exchanged to obtain a total acceptable coloring B’, R extending BU {zr}, R. Moreover, 8’ is of minimum weight, because the weight of x is no more than that of y. a We also need to know Lemma 3.10 Jf an acceptable coloring is not total, then either the blue or red rule applies. Proof. Let B, R be an acceptable coloring with uncolored element z. By Lemma 3.6, B, 2 has a total extension B,R. By duality, assume without loss of generality that 2 € B. Let C be the fundamental cut of « and R. Since all elements of C’ besides « are in R, none of them are blue in B. Thus the blue rule applies. a Combining Lemmas 3.9 and 3.10, we have Theorem 3.11 If we start with an uncolored weighted matroid and apply the blue or red rules in any order until neither applies, then the resulting coloring is an optimal total acceptable coloring. What is really going on here is that all the subsets of the maximal inde pendent sets of minimal weight form a submatroid of (S,Z), and the blue rule gives a method for implementing axiom (ii) for this matroid; see Miscellaneous Exercise 1. 3.3 Matroids and the Greedy Algorithm We have shown that if (S,T) is a matroid, then the greedy algorithm produces a maximal independent set of minimum weight. Here we show the converse: if ($,Z) is not a matroid, then the greedy algorithm fails for some choice of integer weights. Thus the abstract concept of matroid captures exactly when the greedy algorithm works. Theorem 3.12 ([32]; see also [70]) A system (S,Z) satisfying axiom (i) of matroids is a matroid (i.e., it satisfies (ii)) if and only if for all weight as- signments w: SN, the greedy algorithm gives a minimum-weight maximal independent set. Proof. The direction (+) has already been shown. For (+), let (S,Z) Satisfy (i) but not (ii). There must be A, B such that A,B € Z, |A| < |B, but for nox € B- A is AU{x} ET. Assume without loss of generality that B is a mazimal independent set. If it is not, we can add elements to B maintaining the independence of B; for 18 Lecrurge 3_MATROIDS AND INDEPENDENCE any element that we add to B that can also be added to A while preserving the independence of A, we do so. This process never changes the fact that |A| < |B| and for no x € B- Ais AU{z} eT. Now we assign weights vw: S + N. 
Let a = |A— Bl and 6 = |B — Al. Then a a,b. (Actually h > 6? will do.) Case 1 If A is a maximal independent set, assign w(z)=a+1 force B-A w(x)=b+1 force A-—B w(x) =0 force ANB w(t) =h forz AUB. Thus w(A) w(B) a(b +1) ab+a b(a+1) = ab+o. This weight assignment forces the greedy algorithm to choose B when in fact A is a maximal independent set of smaller weight. Case 2 If A is not a maximal independent set, assign w(x)=0 force A w(x)=b frre B-A w(x)=h forx ¢AUB. All the elements of A will be chosen first, and then a huge element outside of AUB must be chosen, since A is not maximal. Thus the minimum-weight maximal independent set B was not chosen. a Lecture 4 Depth-First and Breadth-First Search Depth-first search (DFS) and breadth-first search (BFS) are two of the most useful subroutines in graph algorithms. They allow one to search a graph in linear time and compile information about the graph. They differ in that the former uses a stack (LIFO) discipline and the latter uses a queue (FIFO) discipline to choose the next edge to explore. Undirected depth-first search produces in linear time a numbering of the vertices called the depth-first numbering and a particular spanning tree called the depth-first spanning tree of each connected component. This is done as follows. Choose an arbitrary vertex u, which will become the root of the tree. Push all edges (u,v) € E onto the stack. Assign u the DFS number 0 and set the DFS counter c to 1. Now repeat the following activity until the stack becomes empty. Let (z,y) be the top element of the stack. This is the next edge to explore. The vertex x has a DFS number already (this is an invariant of the loop). If y has no DFS number, assign it the DFS number c, increment ¢, push all edges (y, z) € E onto the stack, and make the (directed) edge (x, y) a tree edge. Otherwise, if y has a DFS number already, just pop (x,y) off the stack. The tree edges form a directed spanning tree of the connected component of u rooted at u. It is a dag rooted at u, since tree edges (x,y) only go from lower numbered vertices to higher numbered vertices. It is a tree, since no vertex has indegree greater than one; this is because (x,y) becomes a tree edge only if y has no DFS number, and thereafter y has a DFS number. It is 19 20 Lecrure 4_DEpTH-FirsT AND BREADTH-FIRST SEARCH a spanning tree, since it is easily shown inductively that every vertex in the connected component of u eventually receives a DFS number. This spanning tree is called the depth-first spanning tree. We can repeat the whole process with a new arbitrarily chosen unvisited vertex to search the other connected components. The non-tree edges (x, y) are called back edges and are directed from higher numbered to lower numbered vertices. When we draw a DFS tree, we usually draw the root at the top, the tree edges pointing down (hence the term depth- first), and the back edges pointing up. Back edges out of » can only go to ancestors of v in the DFS tree. There cannot be a back edge to a nonancestor, since that edge would have been explored earlier from the other direction and would have been a tree edge. DFS takes time O(m +n) where n is the number of vertices and m is the number of edges, since each edge is stacked at most once in each direction, and each edge and vertex requires a constant amount of processing. See (3, 78] for an alternative treatment. 4.1 Biconnected Components Let G = (V, E) be a connected undirected graph. 
Definition 4.1 A vertex v is an articulation point if its removal disconnects the graph. Oo Definition 4.2 A connected graph is called biconnected if any pair of distinct vertices u and » lie on a simple cycle (one with no repeated vertices). Q Note that according to this definition, a graph with two vertices connected by a single edge is biconnected (no one said anything about not repeating edges). If G is not biconnected, we define the biconnected components of G in terms of an equivalence relation on edges: Definition 4.3 For e,e’ € E, define e = ¢’ if e and e’ lie ona simple cycle. Oo Lemma 4.4 The relation = is an equivalence relation (reflexive, symmetric, and transitive). Proof. Reflexivity e = e follows from the fact that the edge e and its two endpoints constitute a simple cycle. The relation is symmetric, since e and e’ can be interchanged in the definition of =. The hard one is transitivity. Suppose (u,v) = (u’,v’) and (u,v) = (u,v). Let cand ¢ be the two simple cycles involved, respectively. Assume u,w’,v’,v occur in that order around ¢. Let x be the first vertex on the segment of c from u to wu’ that also lies in c; x must exist since u’ € ¢, at least. Let y be the first vertex on the segment of LECTURE 4 DepTH-FIRsT AND BREADTH-FIRST SEARCH 21 ¢ from v to v’ that also lies in c’; y must exist since v' € c. Also, « # y since cis simple. Let p be the path from « to y in c containing (u,v) and let p’ be the path from « to y in ¢ containing (u,v”). Then p and p’ intersect only in x and y, and together form a simple cycle containing (u,v) and (u",v”). 2 Definition 4.5 The equivalence classes of = are called biconnected compo- nents. o Lemma 4.6 The vertex a is an articulation point iff a is contained in at least two biconnected components. Proof. Suppose the removal of a disconnects the graph. Then there exist wand v adjacent to a such that every path from u to v goes through a. Then the edges (u,a) and (a,v) cannot lie on a simple cycle, thus are in different biconnected components. Conversely, suppose u and v are adjacent to a and (u,a) # (a,v). Then all paths between u and v must go through a. Thus if a is removed, there is no path between u and v, so G is disconnected. Q Below, when using the terms “descendant” and “ancestor” in a depth-first search tree, we will always consider a vertex u to be a descendant of itself and an ancestor of itself. In other words, we take the descendant and ancestor relations to be reflexive. If we want to exclude u, we do so explicitly by using the terms “proper descendant” and “proper ancestor”. Lemma 4.7 Let (u,v) and (v,w) be two adjacent tree edges in a depth-first search tree of G. Then (u,v) = (v,w) if and only if there exists a back edge from some descendant of w to some ancestor of u. Proof. (—+) If there exists a back edge from some descendant w! of w to some ancestor u’ of u, then (u,v) and (v,w) are edges in a simple cycle consisting of the back edge (w’, u’) along with the path of tree edges from u' to w’. Thus (u,v) = (v, w). (+) Suppose (u,v) = (v,w). Then there must be a simple cycle containing them. This cycle must contain the edges (u,v) and (v, w) in this order, since it may only go through v once. Consider the subtree of the depth-first tree Tooted at w. The simple cycle must contain a back edge (w', u') out of this subtree, since it must get back to u eventually. 
(Before coming out, the path inside the subtree can be quite complicated, since it can traverse tree and back edges in either direction—don’t forget that the graph is undirected.) Then w! is a descendant of w and w' is an ancestor of w’. Since w’ is not in the subtree rooted at w, it must be an ancestor of v. But it cannot be v because v cannot be used twice on the cycle. Therefore u’ must be an ancestor of u. Q 22 LEcTu DepTH-First AND BREADTH-FirsT SEARCH The biconnected components can be found from a DFS tree as follows. Assume the vertices are named by their DFS numbers. We compute a value for each vertex v, called low(v), which gives the DFS number of the lowest numbered vertex 2 (i.e. the highest in the tree) such that there is a back edge from some descendant of v to z. By Lemmas 4.6 and 4.7, a vertex u will be an articulation point, and the biconnected component of the tree edge (u, v) will lie entirely in the subtree rooted at u, if low(v) > u. We can inductively compute low(v) as follows: rz y low(v) min{low(w) | w is an immediate descendant of v} min{z | z is reachable by a back edge from v} = min(z,y) . The values low(v) can be computed simultaneously with the construction of the DFS tree in linear time. As soon as an articulation point u is discovered with (u,v) a tree edge such that low(v) > u, the biconnected component containing the edge (u,v) can be deleted from the graph. See (3, 78] for more details. 4.2 Directed DFS The DFS procedure on directed graphs is similar to DFS on undirected graphs, except that we only follow edges from sources to sinks. Four types of edges can result: © tree edges to a vertex not yet visited e back edges to an ancestor forward edges to a descendant previously visited © cross edges to a vertex previously visited that is neither an ancestor nor a descendant. There can be no cross edges to a higher numbered vertex; such an edge would have been a tree edge. If we mark the vertex y when the tree edge (z,y) is popped to indicate that the subtree below y has been completely explored, we can recognize each of these four cases when we explore the edge (u,v) by checking marks and comparing DFS numbers: (u,v) is a if tree edge DFS(v) does not exist back edge DFS(v) < DFS(u) and v is not marked forward edge | DFS(v) > DFS(u) cross edge DFS(v) < DFS(u) and v is marked Bb 4 TH- FIRS’ ND BREADTH-FIRS' RCH 23 The directed DFS tree can be constructed in linear time; see (3, 78] for details. The first application of directed DFS is determining acyclicity: Theorem 4.8 A directed graph is acyclic iff its DF'S forest has no back edges. Proof. If there is a back edge, the graph is surely cyclic. Conversely, if there are no back edges, consider the postorder numbering of the DFS forest: traverse the forest in depth-first order, but number the vertices in the order they are ast seen. Then tree edges, forward edges, and cross edges all go from higher numbered to lower numbered vertices, so there can be no cycles. © 4.3 Strong Components Definition 4.9 Let G = (V,E) be a directed graph. For u,v € V, define u =v if u and v lie on a directed cycle in G. This is an equivalence relation, and its equivalence classes are called strongly connected components or just strong components. A graph G is said to be strongly connected if for any pair of vertices u,v there is a directed cycle in G containing u and 0; i.e., if G has only one strong component. a The strong components of a directed graph can be computed in linear time using directed depth-first search. 
The algorithm is similar to the algorithm for biconnected components in undirected graphs; see [3] for details. 4.4 Strong Components and Partial Orders Strong components are important in the representation of partial orders. Fi- nite partial orders are often represented as the reflexive transitive closures E* of dags G = (V,E) (recall (u,v) € E* iff there exists an E-path from u to v of length 0 or greater). If G is not acyclic, then the relation E* does not satisfy the antisymmetry law, and is thus not a partial order. However, it is still reflexive and transitive. Such a relation is called a preorder or sometimes @ quasiorder. Given an arbitrary preorder (P,~), define x ~ y if x y andy Xz. This is an equivalence relation, and we can collapse its equivalence classes into single points to get a partial order. This construction is called a quotient construction. Formally, let [2] denote the ~-class of x and let P/s denote the set of all such classes; i.e., (2) Pix tyly =z} {[z]| 2 P}. The preorder < induces a preorder, also denoted x, on P/* in a natural way: [x] = [y] if 2 < y in P. (The choice of x and y in their respective equivalence classes doesn’t matter.) It is easily shown that the preorder ~< is 2 ‘URE DeprTu-FIRsT AND BREA) -FIRST SEARCH actually a partial order on P/*; intuitively, by collapsing equivalence classes, we identified those elements that caused antisymmetry to fail. Forming the strong components of a directed (not necessarily acyclic) graph G = (V,£) allows us to perform this operation effectively on the preorder (V,E*). We form a quotient graph G/= by collapsing the strong components of G into single vertices: fe] {u | u = v} (the strong component of v) V/= = {hl |vev} E = {((u, fel) | (ue) € B} G/= = (V/=,B). It is not hard to show that G/= is acyclic. Moreover, Theorem 4.10 The partial orders (V/~, E*) and (V/=,(E’)*) are isomor- phic. In other words, the partial order represented by the collapsed graph is the same as the collapse of the preorder represented by the original graph. Lecture 5 Shortest Paths and Transitive Closure 5.1 Single-Source Shortest Paths Let G = (V,£) be an undirected graph and let @ be a function assigning a nonnegative length to each edge. Extend £ to domain V x V by defining &v,v) = 0 and £(u,v) = oo if (u,v) ¢ EB. Define the length? of a path P = €1€2...€, to be &(p) = Dh (ei). For u,v € V, define the distance d(u,v) from u to v to be the length of a shortest path from wu to v, or oo if no such path exists. The single-source shortest path problem is to find, given 5 €V, the value of d(s,u) for every other vertex u in the graph. If the graph is unweighted (i.e., all edge lengths are 1), we can solve the problem in linear time using BFS. For the more general case, here is an algo- tithm due to Dijkstra [28]. Later on we will give an O(m +n log n) implemen- tation using Fibonacci heaps. The algorithm is a type of greedy algorithm: it builds a set X vertex by vertex, always taking vertices closest to X. ——____ In this context, the terms “length” and “shortest” applied to a path refer to é, not the number of edges in the path. 25 26 LECTURE 5 SHORTEST PATHS AND TRANSITIVE CLOSURE Algorithm 5.1 (Dijkstra’s Algorithm) X:= {3}; D(s) :=0; for each u € V — {s} do D(u) := &(s, u); while X # V do let ue V —X such that D(u) is minimum; X = Xv {u}; for each edge (u,v) with v € V — X do D(v) := min(D(v), D(u) + &(u, v)) end while The final value of D(u) is d(s,u). 
This algorithm can be proved correct: by showing that the following two invariants are maintained by the while loop: e for any u, D(u) is the distance from s to u along a shortest path through only vertices in X; © for any u€ X, 0 ¢ X, D(u) < D(v). 5.2 Reflexive Transitive Closure Let E denote the adjacency matrix of the directed graph G = (V, E). Using Boolean matrix multiplication, the matrix E? has a 1 in position wv iff there is a path of length exactly 2 from vertex u to vertex v; ie., iff there exists a vertex w such that (u,w),(w,v) € E. Similarly, one can prove by induction on k that (E*),, = 1 iff there is a path of length exactly k from u to v. The reflexive transitive closure of G is E* IVEVE*y..- IVEVE*y..-vE™ (vey. The infinite join is equal to the finite one because if there is a path connecting u and v, then there is one of length at most n — 1. Suppose that two n x n Boolean matrices can be multiplied in time M(n). Then E* = (I v E)""! can be calculated in time O(M(n) logn) by squaring E logn times. We will show below how to calculate E* in time O(M(n)). Conversely, if there is an algorithm to compute E* in time T(n), then M(n) is O(T(n)) (under the reasonable assumption that M(3n) is O(M(n))): to multiply A and B, place them strategically into a 3n x 3n matrix, then take its reflexive transitive closure: 0 AO 008B 000 * I A AB or B oo 7 LECTURE 5 SHORTEST PATHS AND TRANSITIVE CLOSURE 27 The product AB can be read off from the upper right-hand block. Here is a divide and conquer algorithm to find E* in time M(n). Algorithm 5.2 (Refiexive Transitive Closure) 1. Divide E into 4 submatrices A, B,C, D of size roughly 3 X J such that A and D are square. ra 2. Recursively compute D*. Compute F = A+BD*C. Recursively compute F*. 3. Set BY = F* F*BD* D*CF* | D¥ + D*CF*BD Essentially, we are partitioning the set of vertices into two disjoint sets U and V, where A describes the edges from U to U, B describes edges from U to V, C describes edges from V to U, and D describes edges from V to V. We compute reflexive transitive closures on these sets recursively and use this information to describe the reflexive transitive closure of E. Note that we compute two reflexive transitive closures, a few matrix multiplications (whose complexity is given by M) and a few matrix additions (whose complexity is assumed to be quadratic) of matrices of roughly half the size of E. This gives the recurrence = 2 2 Rye Tn) = 27(5) +eM(5) +4(5) where c and d are constants. Under the quite reasonable assumption that M(2n) > 4M(n), the solution to this recurrence is O(M(n)). 5.3. All-Pairs Shortest Paths Let E denote the adjacency matrix of a directed graph with edge weights. Replace the 1's in E by the edge weights and the 0’s by oo. Apply Algorithm 5.2 to calculate E*, except use + instead of A and min instead of V. We will show next time that this solves the all-pairs shortest path problem. Lecture 6 Kleene Algebra Consider a binary relation on an n element set represented by an nxn Boolean matrix E. Recall from the last lecture that we can compute the reflexive transitive closure of E by divide-and-conquer as follows: partition E into four submatrices A, B,C, D of size roughly $ x 3 such that A and D are square: + fat By induction, construct the matrices D*, F = A+ BD*C, and F*, then take BY = F* F* BD* | (1) D*CF* | D* + D*CF* BD . 
We will prove that the matrix E* as defined in (1) is indeed the reflexive transitive closure of E, but the proof will be carried out in a more abstract setting which will allow us to use the same construction in other applications. For example, we will be able to compute the lengths of the shortest paths between all pairs of points in a weighted directed graph using the same general algorithm, but with a different interpretation of the basic operations.

How did we come up with the expressions in (1)? This is best motivated by considering a simple finite-state automaton over the alphabet Σ = {a, b, c, d} with states s and t and transitions s → s labeled a, s → t labeled b, t → s labeled c, and t → t labeled d.

For each pair of states u, v, consider the set of input strings in Σ* taking state u to state v in this automaton. Each such set is a regular subset of Σ* and is represented by a regular expression corresponding to the expressions appearing in (1):

    s → s :  f*
    s → t :  f*bd*
    t → s :  d*cf*
    t → t :  d* + d*cf*bd*,

where f = a + bd*c. (See [3, §9.1, pp. 318–319] for more information on finite automata and regular expressions.)

6.1 Definition of Kleene Algebras

The appropriate level of abstraction we are seeking is Kleene algebra. This concept goes back to Kleene [61], but received significant impetus from the work of Conway [21]. The definition here is from [63].

Definition 6.1  A (*-continuous) Kleene algebra is any structure of the form

    K  =  (S, +, ·, *, 0, 1)

where S is a set of elements, + and · are binary operations S × S → S, * is a unary operation S → S, and 0 and 1 are distinguished elements of S, satisfying the axioms

    a + (b + c)  =  (a + b) + c      (+ is associative)             (2)
    a + b        =  b + a            (+ is commutative)             (3)
    a + a        =  a                (+ is idempotent)              (4)
    a + 0        =  0 + a  =  a      (0 is an identity for +)       (5)
    a·(b·c)      =  (a·b)·c          (· is associative)             (6)
    a·1          =  1·a  =  a        (1 is an identity for ·)       (7)
    a·0          =  0·a  =  0        (0 is an annihilator for ·)    (8)
    a·(b + c)    =  a·b + a·c        (· distributes over +)         (9)
    (b + c)·a    =  b·a + c·a                                       (10)

plus the following axiom to deal with the * operator, which will require further explanation:

    ab*c  =  sup_{n ≥ 0} ab^n c                                     (11)

where

    b^0  =  1,    b^{n+1}  =  b·b^n.

Axioms (2–5) say that the structure (S, +, 0) is an idempotent commutative monoid. Axioms (6–7) say that (S, ·, 1) is a monoid. Axioms (8–10) describe how these two monoid structures interact. Altogether, Axioms (2–10) say that K is an idempotent semiring.

The axiom (11) asserts the existence of the supremum or least upper bound of a certain set with respect to a certain partial order. In any idempotent semiring, there is a natural partial order defined by

    a ≤ b   iff   a + b = b.                                        (12)

Axiom (11) says that, with respect to this order, the supremum of the set {ab^n c | n ≥ 0} exists and is equal to ab*c.

The postulate (11) captures axiomatically the behavior of reflexive transitive closure of a binary relation. It also captures the behavior of the Kleene * operator of formal language theory. In addition, there are many nonstandard examples of Kleene algebras that are useful in various contexts. We will give several examples below.

Instead of Kleene algebras, many authors (such as [3, 78]) use so-called closed semirings. These structures are strongly related to Kleene algebras, but are defined in terms of a countable summation operator Σ instead of a supremum. In closed semirings, the * operator is not a primitive operator but is defined in terms of Σ by

    a*  =  Σ_{n ≥ 0} a^n.

The countable summation operator Σ, which sums a countably infinite sequence of elements, is postulated not to depend on the order of the elements in the sequence or their multiplicity, and thus is essentially a supremum.
The Σ operator is also postulated to satisfy an infinite distributivity property, which we get for free for all suprema of interest by stating the axiom as we did in (11).

The main drawback with closed semirings is that the suprema of all countable sets are required to exist, which is too many. Although every closed semiring is a Kleene algebra, there are definitely Kleene algebras that are not closed semirings. The most important example of such a Kleene algebra is the family Reg_Σ of regular subsets of Σ*, where Σ is a finite alphabet (Example 6.2 below). This example is important because it is the free Kleene algebra freely generated by Σ, which essentially says that an equation between regular expressions over Σ holds in all Kleene algebras if and only if it holds in Reg_Σ. We will find this fact very useful in reducing arguments about Kleene algebras in general to arguments about regular subsets of Σ*.

Kleene algebras were studied extensively in the monograph of Conway [21]. It is possible to axiomatize the equational theory of Kleene algebras in a purely finitary way [65]. The precise relationship between Kleene algebras and closed semirings is drawn in [64].

6.2 Examples of Kleene Algebras

Kleene algebras abound in computer science. Here are some examples.

Example 6.2  Let Σ be a finite alphabet and let Reg_Σ denote the family of regular sets over Σ with the following operations:

    A + B  =  A ∪ B
    A · B  =  {xy | x ∈ A, y ∈ B}
    A*     =  {x₁x₂⋯xₙ | n ≥ 0 and xᵢ ∈ A, 1 ≤ i ≤ n}
    0      =  ∅
    1      =  {ε}.

The interpretation a ↦ {a} over Reg_Σ extends to the map R : RExp_Σ → Reg_Σ (where RExp_Σ denotes the set of regular expressions over Σ) in which R(α) is the regular set denoted by the regular expression α in the usual sense. The interpretation R is called the standard interpretation over Reg_Σ. The following lemma generalizes (11).

Lemma 7.1  Let R : Σ → Reg_Σ be the standard interpretation over Reg_Σ, and let I : Σ → K be any interpretation over any Kleene algebra K. For any regular expression α over Σ,

    I(α)  =  sup_{x ∈ R(α)} I(x).                                   (13)

Note that since R(α) is a regular set of strings over the alphabet Σ, the x in (13) denotes a string. Strings over Σ are themselves regular expressions over Σ, so the expression I(x) makes sense. The equation (13) states that the supremum of the possibly infinite set

    {I(x) | x ∈ R(α)}  ⊆  K

exists and is equal to I(α). We leave the proof of Lemma 7.1 as an exercise (Homework 3, Exercise 2).

It follows that for any pair α, β of regular expressions over Σ, the equation α = β is a logical consequence of the axioms of Kleene algebra, i.e., it holds under all interpretations over all Kleene algebras, if and only if it holds under the standard interpretation R over Reg_Σ. A fancy way of saying this is that Reg_Σ is the free Kleene algebra on free generators Σ.

Theorem 7.2  Let α and β be regular expressions over Σ and let R be the standard interpretation over Reg_Σ. Then I(α) = I(β) for all interpretations I over Kleene algebras if and only if R(α) = R(β).

Proof. (→) This follows immediately from the fact that Reg_Σ is a Kleene algebra and R is an interpretation over Reg_Σ.

(←) Suppose R(α) = R(β). Then

    I(α)  =  sup_{x ∈ R(α)} I(x)     by Lemma 7.1
          =  sup_{x ∈ R(β)} I(x)     by the assumption R(α) = R(β)
          =  I(β),                   again by Lemma 7.1.  □

7.1 Matrix Kleene Algebras

The collection M(n, K) of n × n matrices with elements in a Kleene algebra K again forms a Kleene algebra, provided the Kleene algebra operators on M(n, K) are defined appropriately. We always define + as ordinary matrix addition, · as ordinary matrix multiplication, 0 as the zero matrix, 1 as the identity matrix, and * recursively by equation (1) of the previous lecture.
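Here is a sketch of how the M(2, K) operations might look in code, parameterized by the scalar operations of K; the KleeneAlgebra record and the function names are illustrative, not part of the text. The 2 × 2 star follows equation (1) with a = M[0][0], b = M[0][1], c = M[1][0], d = M[1][1]; the general n × n case would recurse on blocks in the same way.

    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass(frozen=True)
    class KleeneAlgebra:
        # The scalar structure (S, +, ., *, 0, 1) of Definition 6.1.
        plus: Callable[[Any, Any], Any]
        times: Callable[[Any, Any], Any]
        star: Callable[[Any], Any]
        zero: Any
        one: Any

    def mat2_add(K, M, N):
        # Matrix + in M(2, K): componentwise K.plus.
        return [[K.plus(M[i][j], N[i][j]) for j in range(2)] for i in range(2)]

    def mat2_mul(K, M, N):
        # Matrix . in M(2, K): the usual product, with + and . taken in K.
        return [[K.plus(K.times(M[i][0], N[0][j]), K.times(M[i][1], N[1][j]))
                 for j in range(2)] for i in range(2)]

    def mat2_star(K, M):
        # Matrix * in M(2, K), following equation (1).
        a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
        ds = K.star(d)                                    # d*
        f = K.plus(a, K.times(b, K.times(ds, c)))         # f = a + b d* c
        fs = K.star(f)                                    # f*
        return [[fs, K.times(fs, K.times(b, ds))],        # [ f*       f* b d*           ]
                [K.times(ds, K.times(c, fs)),             # [ d* c f*  d* + d* c f* b d* ]
                 K.plus(ds, K.times(K.times(ds, c), K.times(fs, K.times(b, ds))))]]

Instantiating K with the Boolean algebra (∨ for +, ∧ for ·, a* = 1) makes mat2_star compute reflexive transitive closure; a (min, +) instantiation, corresponding to Section 5.3, is sketched at the end of this lecture.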
We must show that all the axioms of Kleene algebra are satisfied by M(n, K) under these definitions. For example, in M(2, K) the identity elements for + and · are

    [ 0  0 ]        and        [ 1  0 ]
    [ 0  0 ]                   [ 0  1 ]

respectively, and the operations +, ·, and * are given by

    [ a  b ]     [ e  f ]      [ a+e  b+f ]
    [ c  d ]  +  [ g  h ]  =   [ c+g  d+h ]

    [ a  b ]     [ e  f ]      [ ae+bg  af+bh ]
    [ c  d ]  ·  [ g  h ]  =   [ ce+dg  cf+dh ]

    [ a  b ]*                  [ f*      f*bd*          ]
    [ c  d ]               =   [ d*cf*   d* + d*cf*bd*  ]

where f = a + bd*c.

Note that A ≤ B in the natural order on M(n, K) defined by (12) if and only if A_{ij} ≤ B_{ij} for all 1 ≤ i, j ≤ n.

The semiring axioms (2–10) are straightforward to verify for M(n, K). The interesting axiom is *-continuity (11): we must show that for matrices A, B, C of the appropriate dimensions,

    AB*C  =  sup_{k ≥ 0} AB^k C,

where the supremum is taken componentwise with respect to the order (12). The key observation is the one that motivated (1) in the first place: if the entries of A, B, and C are taken to be distinct letters, then the ij-th entry of AB*C is a regular expression denoting exactly the strings denoted by the ij-th entries of the matrices AB^k C, k ≥ 0; that is,

    R((AB*C)_{ij})  =  ⋃_{k ≥ 0} R((AB^k C)_{ij}).

Now let A, B, C be arbitrary matrices over an arbitrary Kleene algebra K. Let a_{ij}, b_{ij}, c_{ij} denote the ij-th elements of A, B, and C, respectively. Let I be the interpretation

    I(a_{ij}) = a_{ij},    I(b_{ij}) = b_{ij},    I(c_{ij}) = c_{ij}.

Then

    (AB*C)_{ij}  =  I((AB*C)_{ij})
                 =  sup {I(x) | x ∈ R((AB*C)_{ij})}                     by Lemma 7.1
                 =  sup {I(x) | x ∈ ⋃_{k ≥ 0} R((AB^k C)_{ij})}
                 =  sup_{k ≥ 0} sup {I(x) | x ∈ R((AB^k C)_{ij})}
                 =  sup_{k ≥ 0} I((AB^k C)_{ij})
                 =  sup_{k ≥ 0} (AB^k C)_{ij},

as required.
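As an illustration of the claim in Section 5.3, here is a small self-contained sketch of the 2 × 2 case of (1) read over the (min, +) interpretation: + becomes min, · becomes addition of edge lengths, 0 becomes ∞, 1 becomes 0, and a* = min(0, a, a+a, ...) = 0 whenever a ≥ 0. The function names and the two-vertex example are mine; the same result can be obtained by instantiating the parameterized sketch above with these scalar operations.

    import math

    def star_minplus(a):
        # In the (min, +) interpretation, a* = min(0, a, 2a, ...), which is 0
        # when a >= 0 (and diverges to -infinity when a < 0, i.e., a negative cycle).
        return 0.0 if a >= 0 else -math.inf

    def closure2_minplus(a, b, c, d):
        # Shortest distances in a 2-vertex weighted digraph with edge lengths
        # a (1->1), b (1->2), c (2->1), d (2->2), via equation (1) over (min, +).
        ds = star_minplus(d)
        f = min(a, b + ds + c)                   # f = a + b d* c
        fs = star_minplus(f)
        return [[fs, fs + b + ds],               # [ f*        f* b d*           ]
                [ds + c + fs,                    # [ d* c f*   d* + d* c f* b d* ]
                 min(ds, ds + c + fs + b + ds)]]

For a graph with a single edge 1 → 2 of length 2 and a single edge 2 → 1 of length 5 (and no self-loops, so a = d = ∞), closure2_minplus(math.inf, 2.0, 5.0, math.inf) returns [[0.0, 2.0], [5.0, 0.0]], the matrix of distances d(u, v).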
