Math 108 Notes: Combinatorics
                                                   ARUN DEBRAY
                                                   JUNE 11, 2013
These notes were taken in Stanford’s Math 108 class in Spring 2013, taught by Professor Kannan Soundararajan. I
live-TEXed them using vim, and as such there may be typos; please send questions, comments, complaints, and corrections
to [email protected].
                                                      Contents
   1.    Introduction to Graph Theory: 4/1/13
   2.    More Graph Theory: 4/3/13
   3.    Cayley’s Theorem: 4/8/13
   4.    Another Proof of Cayley’s Theorem: 4/10/13
   5.    Ramsey Theory: 4/15/13
   6.    More Ramsey Theory: 4/17/13
   7.    Turán’s Theorem: 4/22/13
   8.    Coloring the Vertices of a Graph: 4/24/13
   9.    Probabilistic Constructions: The Legacy of Paul Erdős: 4/25/13
   10.    Chromatic Polynomials: 4/29/13
   11.    Planar Graphs and Matchings: 5/1/13
   12.    The Gale-Shapley Theorem and Network Flow: 5/6/13
   13.    Network Flow II: 5/13/13
   14.    Network Flow III: 5/15/13
   15.    Sperner’s Theorem: 5/20/13
   16.    The Principle of Inclusion and Exclusion: 5/22/13
   17.    The Twelvefold Way: 5/29/13
   18.    Combinatorial Functions: 6/3/13
   19.    More Techniques for Asymptotics: 6/5/13
Definition 1.1. A graph is an object consisting of a set of vertices V and a set of edges E such that each edge e is a
pair of vertices {x, y }, such that edge e is thought to connect vertices x and y .
Example 1.2. If G1 = (V, E1 ) with V = {1, 2, 3, 4} and E1 = {{1, 3}, {3, 4}, {2, 4}, {1, 4}} and G2 = (V, E2 )
with E2 = E1 ∪ {{2, 2}} (that is, G1 together with a loop at vertex 2), the graphs are as follows.
[Figure: drawings of G1 and G2 on the vertices 1, 2, 3, 4.]
Definition 1.3. A simple graph is a graph that contains no loops and no multiple edges, such as G1 in Example 1.2
above, but not G2 .
Definition 1.4. A finite graph G = (V, E) is a graph in which both V and E are finite sets.
Definition 1.5. If vi , vj ∈ V are joined by an edge, they are called adjacent. Then, the adjacency matrix of a
graph G is a |V | × |V | matrix A such that aij is equal to the number of edges joining vi and vj .
   If the graph is undirected, then the matrix is symmetric.
   In a simple graph, the adjacency matrix consists of only zeroes and ones. A symmetric matrix with real entries
has nice properties, so the subject of spectral graph theory attempts to understand a graph by performing linear
algebra on its adjacency matrix.
Definition 1.6. The degree of a vertex is the number of edges coming out of it. Note that for loops on an
undirected graph, each loop counts twice. The degree of a vertex is also referred to as its valency.
   Thus, $\deg(v_i) = \sum_{j=1}^{n} a_{ij}$ (summing a row of the adjacency matrix, or equivalently a column, since
the matrix is symmetric), which does require the convention given above.
Definition 1.7. A graph is called regular if all vertices have the same degree.
   If A is the adjacency matrix of a k-regular graph, then k is an eigenvalue, because $A(1, \dots, 1)^T = (k, \dots, k)^T$. In
particular, since the matrix is symmetric for an undirected graph, the eigenvalues are all real.
   Understanding the adjacency matrix is extremely helpful for applications such as random walks on graphs.
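   As a small illustration of Definitions 1.5 to 1.7, here is a sketch in Python (not part of the lecture) that builds the adjacency matrix of G1 from Example 1.2 and checks the all-ones eigenvector on K4. The helper names are made up for this example, and the code assumes graphs without loops, so the loop convention of Definition 1.6 never comes into play.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of a loop-free undirected graph on vertices 1..n.

    a[i][j] counts the edges joining vertex i+1 and vertex j+1.
    """
    a = [[0] * n for _ in range(n)]
    for x, y in edges:
        a[x - 1][y - 1] += 1
        a[y - 1][x - 1] += 1
    return a


def degrees(a):
    """Degree of each vertex, obtained by summing each row."""
    return [sum(row) for row in a]


def mat_vec(a, v):
    """Plain matrix-vector product A v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


# G1 from Example 1.2.
g1 = adjacency_matrix(4, [(1, 3), (3, 4), (2, 4), (1, 4)])
print(degrees(g1))

# K4 is 3-regular, so A applied to the all-ones vector gives (3, 3, 3, 3).
k4 = adjacency_matrix(4, [(i, j) for i in range(1, 5) for j in range(i + 1, 5)])
print(mat_vec(k4, [1, 1, 1, 1]))
```

   The same check works for any k-regular graph: every row sums to k, which is exactly the eigenvector equation for the all-ones vector.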
Lemma 1.8 (Handshaking1). In any graph, the number of vertices of odd degree is even.
Proof. It can be assumed that there are no loops, since removing the loops of a graph doesn’t change the parity
of the degree of any vertex.
   Consider every ordered pair (x, y ) which forms an edge {x, y }, so that (x, y ) is counted if (y , x) is. Thus,
there are 2|E| such pairs.2 However, this is also equal to the sum
$$\sum_{x \in V} \sum_{\{x, y\} \in E} 1 = \sum_{x \in V} \deg(x),$$
which is also equal to the sum of the elements in the adjacency matrix.
   Thus, $\sum_{x \in V} \deg(x) = 2|E|$. Notice that throwing out the loops isn’t necessary to get this result. Thus, the
number of vertices with odd degree must be even; otherwise, the sum would be odd. □
   The lemma owes its name to a scenario where a bunch of people at a party exchange some handshakes, and
then count the sum of the number of handshakes that each person made.
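   The double count in the proof can be checked numerically. The sketch below (an illustration only; the random multigraph generator and its parameters are assumptions of this example) counts each edge once at each endpoint, so a loop adds 2 to the degree of its vertex, and then tests both conclusions of Lemma 1.8.

```python
import random
from collections import Counter


def degree_counts(edges):
    """Degree of every vertex; each edge is counted once per endpoint."""
    deg = Counter()
    for x, y in edges:
        deg[x] += 1
        deg[y] += 1  # for a loop (x == y) this adds 2 to deg[x] in total
    return deg


def random_multigraph(n, m, seed):
    """m random edges (loops and repeats allowed) on vertices 0..n-1."""
    rng = random.Random(seed)
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]


def handshake_holds(edges):
    deg = degree_counts(edges)
    degree_sum_ok = sum(deg.values()) == 2 * len(edges)
    odd_count_even = sum(1 for d in deg.values() if d % 2 == 1) % 2 == 0
    return degree_sum_ok and odd_count_even


print(all(handshake_holds(random_multigraph(8, 13, seed)) for seed in range(100)))
```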
   Some more examples of graphs:
    (1) The complete graph on n vertices, denoted Kn , is a simple graph on vertices V = {1, . . . , n} with
        E = {{i , j} : i , j ∈ V, i ≠ j} (i.e. every pair of distinct vertices is joined by an edge). For example,
        [Figure: drawings of K3 on the vertices 1, 2, 3 and K4 on the vertices 1, 2, 3, 4.]
   1This is of course not to be confused with the Handwaving Lemma.
   2This is a common trick in combinatorics — double-counting something, or counting in two different ways, can show that two
things must be equal.
        Kn has $\binom{n}{2} = n(n-1)/2$ edges. Thus, if G is any finite simple graph with |V | = n, then $|E| \le \binom{n}{2}$,
of research: what kinds of structures tend to exist in large, seemingly random graphs?
   The idea of investigating graphs probabilistically is called the probabilistic method, due to Erdős. Szemerédi
also did a lot in this area (and won the Abel Prize in 2012 for his work).
Exercise 1.10. If G is a k-regular bipartite graph with parts A and B (which already implies that there aren’t any loops), then show
that −k is an eigenvalue of the adjacency matrix. (Hint: try to guess an eigenvector, such as the vector whose i-th entry is 1 if xi ∈ A
and −1 if xi ∈ B.)
Definition 1.11. A walk is an alternating sequence of vertices and edges $v_1, e_1, v_2, e_2, \dots, e_{n-1}, v_n$, such that
edge $e_k$ connects $v_k$ and $v_{k+1}$. This is exactly what you would think it is.
   Notice that a walk can visit an edge or vertex many times.
Definition 1.12. A walk is called a trail if no edge is repeated in it.3
Example 1.13. v1 , e1 , v2 , e2 , v3 , e3 , v4 , e4 , v5 , e5 , v6 , e5 , v3 , e2 , v2 , e1 , v1 is a walk that isn’t a trail.
Definition 1.14. A closed walk is (again) exactly what you might think it to be: a walk in which the starting
and ending vertices are the same. One can similarly define a closed trail.
Definition 1.15. A path is a walk in which all vertices are distinct.4
Definition 1.16. An Euler trail is a trail through a graph that traverses every edge exactly once.
   Not every graph has an Euler trail: look at K4 .
Theorem 1.17 (Euler). A connected graph has an Euler trail iff there are at most two vertices with odd degree.
   Connectedness will be defined in the next lecture, and its definition will not come as a surprise. The proof will
also be given in the next lecture. Note also that by Lemma 1.8, the condition of at most 2 is equivalent to
specifying either 0 or 2 such vertices.
   Last lecture, we also saw Euler trails. A closed Euler trail is also called an Euler tour. Euler was interested
in a question about the bridges of Königsberg: specifically, if the graph created from the bridges and islands
admitted an Euler tour. By Theorem 1.17, this isn’t possible. So let’s prove the theorem.
Proof of Theorem 1.17. Let G be a connected graph. If it has an Euler tour, then every edge going into a
vertex must come out of it, so the degree of every vertex is even. If it only has an Euler trail, then the starting
and ending points don’t necessarily meet this, so they might have odd degree, but that makes for at most 2 such
vertices.
   If G has only vertices with even degree, start at an arbitrary vertex and build a trail by repeatedly choosing unused edges.
Since every degree is even, the trail can only get stuck back at its starting vertex, so it is a closed trail, but it might not
get all of the edges. Thus, remove the edges
contained in this trail, leaving a smaller graph. This graph isn’t necessarily connected, but it has several connected
components, each of which has vertices of only even degree. By induction, we can assume that each of these
components has an Euler tour, since they all have fewer edges than G.5 Then, an Euler tour can be made by
joining the edges in the original trail with each of these tours at some shared vertex.
   If G has instead two vertices of odd degree, take the graph G′ to be G with a new edge e between them.
Then, G′ has an Euler tour by above, so taking that tour and removing e gives an Euler trail on G. □
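   The proof is constructive, and a sketch of it in Python may help. The code below is an assumption-laden illustration rather than the lecture's method: it uses the standard stack-based formulation usually attributed to Hierholzer, which splices the closed sub-tours together implicitly instead of by explicit induction, and the function name `euler_trail` is made up for this example.

```python
from collections import defaultdict


def euler_trail(edges):
    """Return the vertices of an Euler trail in order, or None if there is none.

    `edges` is a list of (x, y) pairs of an undirected multigraph.
    """
    if not edges:
        return []
    adj = defaultdict(list)  # vertex -> list of (neighbour, edge index)
    for idx, (x, y) in enumerate(edges):
        adj[x].append((y, idx))
        adj[y].append((x, idx))
    odd = [v for v in adj if len(adj[v]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None  # Theorem 1.17: more than two odd vertices
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, trail = [start], []
    while stack:
        v = stack[-1]
        # Drop edges that were already traversed from their other endpoint.
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append(w)
        else:
            trail.append(stack.pop())  # v is finished; emit it
    if len(trail) != len(edges) + 1:
        return None  # some edge was never reached: the graph is not connected
    return trail[::-1]


# The bridges of Königsberg: four land masses, seven bridges, all degrees odd.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(euler_trail(konigsberg))
print(euler_trail([(1, 2), (2, 3)]))
```

   On the Königsberg multigraph the function reports that no trail exists, in line with the discussion above.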
Definition 2.3. A tree is a connected graph that has no cycles.
   These are called trees because many of them can be made to look like trees with trunks and branches.
   A tree with n vertices must have n − 1 edges, which can be proven straightforwardly by induction. Additionally,
any two vertices are connected by a unique path: a tree is connected, so such a path exists, but if there were
two distinct paths p1 and p2 , one could obtain a cycle from them, by taking p1 ∪ p2 \ (p1 ∩ p2 ).
   If one adds an edge to a tree without adding any vertices, then it connects two vertices v1 and v2 . However,
there was already a path between these vertices and this edge gives another path, so the result contains a cycle and is not a tree. If
one removes an edge from a tree, the result is disconnected, since that edge was the unique path between its endpoints. Thus, if one removes an edge, it splits into two
trees, so the formula for the number of edges (i.e. |V | − 1) follows by induction.
   For any connected graph, define the distance between two vertices x and y to be d(x, y ), the smallest number
of edges in any path between x and y . This can be used to show that trees are bipartite: take some vertex v
and let A be the set of vertices of even distance from v , and B be the set of vertices of odd distance from v .
Then, having edges from A to B is possible, but there will never be an edge from A to A, or B to B, since together
with the paths back to v this would create a cycle (of odd length).
   This is a special case of a more general theorem:
Theorem 2.4. A graph that has no cycles of odd length is bipartite.
  This can be proven by a modification of the above argument, as well as reasoning that a bipartite graph
cannot have cycles of odd length by trying to sort the vertices in the cycle into sets.
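   The even/odd distance argument translates directly into a breadth-first search. The following sketch is illustrative only: vertices are assumed to be numbered 0, . . . , n − 1, and `bipartition` is a made-up name. It computes distances from a root in each connected component and reports failure as soon as an edge joins two vertices whose distances have the same parity.

```python
from collections import deque


def bipartition(n, edges):
    """Split vertices 0..n-1 into (A, B) so that every edge joins A to B.

    Returns None if that is impossible, i.e. if some cycle has odd length.
    """
    adj = [[] for _ in range(n)]
    for x, y in edges:
        adj[x].append(y)
        adj[y].append(x)
    dist = [None] * n
    for root in range(n):  # one breadth-first search per connected component
        if dist[root] is not None:
            continue
        dist[root] = 0
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if dist[w] is None:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                elif (dist[w] - dist[v]) % 2 == 0:
                    return None  # an edge inside A or inside B
    a = [v for v in range(n) if dist[v] % 2 == 0]
    b = [v for v in range(n) if dist[v] % 2 == 1]
    return a, b


print(bipartition(4, [(0, 1), (1, 2), (1, 3)]))  # a tree
print(bipartition(3, [(0, 1), (1, 2), (2, 0)]))  # a triangle
```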
   5This would need to be formalized for the graph with one edge and then making an inductive assumption, but it’s just as valid.
Definition 2.5. An isomorphism of graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ), written f : G1 → G2 , is a bijection
V1 → V2 that preserves edges: {x, y } ∈ E1 if and only if {f (x), f (y )} ∈ E2 .
  For example, one can use an isomorphism to make K4 planar, by moving one of the edges around the graph.
There’s a website called planarity.net, which allows one to try and make a graph planar.
Definition 2.6. An automorphism of a graph is an isomorphism with itself.
   One can try to enumerate all trees of a given size. For example, the trees with 3 vertices aren’t very interesting;
they’re just paths of two edges. Thus, one might consider labelled trees, where the vertices are labelled with
{1, . . . , n} and the labelling matters. There aren’t 6 labelled trees with 3 vertices, though, because
1 − 2 − 3 is the same labelled tree as 3 − 2 − 1. Thus, there are 3 labelled trees with 3 vertices.
Theorem 2.7 (Cayley). There are $n^{n-2}$ trees with n labelled vertices.
   The proof of this theorem will require the following lemma:
Lemma 2.8. Every tree has at least two vertices of degree 1.
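Proof. Let di be the degree of vertex i ; then, $\sum_{i=1}^{n} d_i = 2|E| = 2n - 2$. Since di ≥ 1, then at most n − 2 have
degree at least 2 (if n − 1 or more did, the sum would be at least 2(n − 1) + 1 = 2n − 1), so at least two vertices have degree 1. □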
   There are plenty of other ways to prove this. Now, it will be possible to prove Cayley’s Theorem. Two proofs
will be given:
Proof of Theorem 2.7.
Definition 3.3. A Prüfer code is a sequence of n − 2 numbers [y1 , . . . , yn−2 ] with 1 ≤ y1 , . . . , yn−2 ≤ n.
   Thus, there are $n^{n-2}$ distinct Prüfer codes, and the goal is to show that there is a bijective correspondence
between labelled trees and Prüfer codes. Generate a Prüfer code from a tree in the following way: of all of the
vertices with degree 1, let x1 be the vertex with the smallest label. Then, let y1 be the unique vertex to which
x1 is connected. Then, remove x1 and the edge connecting them, and repeat; the next entry in the Prüfer code is the
next yi found by this algorithm. Then, continue until n − 2 numbers are found.
Claim. If [y1 , . . . , yn−2 ] is the Prüfer code for a tree and the algorithm is run for one more step, then yn−1 = n.
Proof. A vertex is removed only if it is the least leaf in the current tree, which will never be true for
vertex n (there are always at least two leaves), and at the last step, there are only two vertices, n and a for some a < n, which are connected, so yn−1
is whatever is connected to a, which is n. □
                    Figure 2. A labelled tree with Prüfer code [1, 2, 2, 1, 9, 9, 9].
   This is why the Prüfer code is written as an (n − 2)-tuple, rather than an (n − 1)-tuple; if it were of size
(n − 1), the last entry wouldn’t have any meaning.
   Now, given a Prüfer code, it is possible to obtain a tree. First observe that each vertex v of the tree will
appear in its Prüfer code exactly deg(v ) − 1 times, since it is removed once (in which case it isn’t added to the
Prüfer code), but the remaining deg(v ) − 1 times the other vertex on the considered edge is removed, and v is
added to the code. Thus, given a Prüfer code, such as [1, 2, 2, 1, 9, 9, 9] as in Figure 2, one can reconstruct the
tree according to the following algorithm:
       • First, take the first entry y1 of the code and the smallest vertex x1 that has not yet been set aside and
         does not appear in the remaining code, and connect them.
       • Remove y1 from the front of the Prüfer code and set x1 aside as finished, updating the degrees left to add
         based on the new, shorter code. Then, repeat these steps; when the code is empty, connect the two
         vertices that have not been set aside.
Though this seems a little fuzzy, this is a bijective correspondence between a tree and its Prüfer code, since each
one can be used to obtain the other, and a more rigorous proof of this can be given by induction.
   Then, there are clearly $n^{n-2}$ Prüfer codes, so there are $n^{n-2}$ labelled trees. □
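   Both directions of the correspondence are short enough to write out. The sketch below is an illustration, not the lecture's own formulation: it assumes vertices labelled 1, . . . , n, represents a tree as a list of edges, and uses a heap to find the smallest leaf. The names `prufer_encode` and `prufer_decode` are made up for this example.

```python
import heapq


def prufer_encode(n, edges):
    """Prüfer code of a labelled tree on vertices 1..n (n >= 2)."""
    adj = {v: set() for v in range(1, n + 1)}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        x = heapq.heappop(leaves)  # smallest remaining leaf
        (y,) = adj[x]              # its unique remaining neighbour
        code.append(y)
        adj[y].remove(x)
        if len(adj[y]) == 1:       # y has just become a leaf
            heapq.heappush(leaves, y)
    return code


def prufer_decode(code):
    """Edge list of the labelled tree on 1..len(code)+2 with this Prüfer code."""
    n = len(code) + 2
    remaining = [0] * (n + 1)  # remaining[v] = occurrences of v left in the code
    for y in code:
        remaining[y] += 1
    leaves = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(leaves)
    edges = []
    for y in code:
        x = heapq.heappop(leaves)  # smallest vertex absent from the remaining code
        edges.append((x, y))
        remaining[y] -= 1
        if remaining[y] == 0:
            heapq.heappush(leaves, y)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))  # last two vertices
    return edges


tree = prufer_decode([1, 2, 2, 1, 9, 9, 9])
print(tree)
print(prufer_encode(9, tree))
```

   Decoding the code [1, 2, 2, 1, 9, 9, 9] and encoding the result again returns the same code, and the final edge joins vertex 8 to vertex 9, consistent with the claim that the last step always involves vertex n.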
   The binomial coefficient is the number of ways to choose k elements out of an n-element set, $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.
It appears in the binomial theorem, $(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$.
  There are several ways to prove it: induction on n is straightforward, but one can also directly observe that
when multiplying out $(x + y)^n$ and looking for a particular coefficient, one chooses k xs and n − k ys.
  The binomial coefficient can be generalized to the multinomial coefficient:
$$\left( \sum_{i=1}^{\ell} x_i \right)^{n} = \sum_{k_1 + \cdots + k_\ell = n} \binom{n}{k_1, \dots, k_\ell} \prod_{i=1}^{\ell} x_i^{k_i}. \tag{3.5}$$
Here, $\binom{n}{k_1, \dots, k_\ell} = n!/(k_1! \cdots k_\ell!)$ is called the multinomial coefficient. This can be thought of as assigning n
people to ℓ teams of sizes $k_1, \dots, k_\ell$, so the recursive formulation is
$$\binom{n}{k_1, \dots, k_\ell} = \sum_{j=1}^{\ell} \binom{n-1}{k_1, \dots, k_{j-1}, k_j - 1, k_{j+1}, \dots, k_\ell}. \tag{3.6}$$
A complete definition does also require that $\binom{n}{k_1, \dots, k_\ell} = 0$ if some $k_j < 0$.
  The multinomial coefficient reduces to the binomial coefficient when ℓ = 2, with k1 = k and k2 = n − k.
  Then, (3.5) will be used in a second proof of Cayley’s Theorem to show that
$$(1 + \cdots + 1)^{n-2} = \sum_{\substack{k_1, \dots, k_n \ge 0 \\ k_1 + \cdots + k_n = n-2}} \binom{n-2}{k_1, \dots, k_n}. \tag{3.7}$$
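   The recursion (3.6) and the identity (3.7) are easy to sanity-check by brute force for small n. The sketch below is illustrative only, with made-up helper names; it takes the multinomial coefficient to be 0 whenever some entry is negative, following the convention stated above.

```python
from itertools import product
from math import factorial


def multinomial(n, ks):
    """n! / (k_1! ... k_l!), or 0 if some k_j < 0 or the k_j do not sum to n."""
    if any(k < 0 for k in ks) or sum(ks) != n:
        return 0
    result = factorial(n)
    for k in ks:
        result //= factorial(k)
    return result


def recursion_rhs(n, ks):
    """Right-hand side of (3.6): lower one entry at a time and sum."""
    return sum(
        multinomial(n - 1, ks[:j] + (ks[j] - 1,) + ks[j + 1:])
        for j in range(len(ks))
    )


def compositions(total, parts):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    return [ks for ks in product(range(total + 1), repeat=parts) if sum(ks) == total]


# (3.6) for n = 5 people on 3 teams.
print(all(multinomial(5, ks) == recursion_rhs(5, ks) for ks in compositions(5, 3)))

# (3.7): the multinomial coefficients of n - 2 over n parts sum to n^(n-2).
for n in range(2, 7):
    print(n, sum(multinomial(n - 2, ks) for ks in compositions(n - 2, n)), n ** (n - 2))
```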
                                      4. Another Proof of Cayley’s Theorem: 4/10/13
   Another proof of Cayley’s Theorem can be given by enumerating all trees with given properties. This will be
an example of enumerative combinatorics.
Second proof of Theorem 2.7. Let $t(n; d_1, \dots, d_n)$ be the number of trees on n vertices such that vertex i has
degree $d_i$. Then, all of the $d_i \ge 1$ and $\sum_{i=1}^{n} d_i = 2n - 2$. Thus, $\sum_{i=1}^{n} (d_i - 1) = n - 2$, since n terms have been
taken away.
Claim.
$$t(n; d_1, \dots, d_n) = \binom{n-2}{d_1 - 1, \dots, d_n - 1}.$$
   Supposing this claim, the total number of trees is
$$\sum_{d_1, \dots, d_n} t(n; d_1, \dots, d_n) = \sum_{d_1, \dots, d_n} \binom{n-2}{d_1 - 1, \dots, d_n - 1} = \sum_{\substack{k_1, \dots, k_n \ge 0 \\ \sum k_i = n-2}} \binom{n-2}{k_1, \dots, k_n} = (\underbrace{1 + \cdots + 1}_{n \text{ times}})^{n-2} = n^{n-2}$$
by (3.7).
Proof of the claim. It’s probably worth checking this claim for n = 2 or n = 3 to build intuition, since these
cases are pretty simple. The general case will be given by a recursive formula that is similar to the one given for
multinomial coefficients in (3.6).
    Of the n vertices, some vertex has degree 1 (by Lemma 2.8): $d_i = 1$. Then, vertex i can be connected to any of the vertices
1, . . . , i − 1, i + 1, . . . , n. If it’s connected to vertex 1, the number of options is
$$t(n-1; d_1 - 1, d_2, d_3, \dots, d_{i-1}, d_{i+1}, \dots, d_n) = \binom{n-3}{d_1 - 2, d_2 - 1, \dots, d_{i-1} - 1, d_{i+1} - 1, \dots, d_n - 1}$$
by the inductive assumption, since removing that vertex and edge leaves behind a tree with the given degrees of its vertices.
Taking the sum over all vertices j ≠ i that vertex i could be connected to,
$$t(n; d_1, \dots, d_n) = \sum_{j \ne i} \binom{n-3}{d_1 - 1, \dots, d_j - 2, \dots, d_{i-1} - 1, d_{i+1} - 1, \dots, d_n - 1} = \binom{n-2}{d_1 - 1, \dots, d_n - 1}$$
by the recursion (3.6) for multinomial coefficients: the entry $d_i - 1 = 0$ can be dropped, and the term with j = i vanishes. □
           • Start at A, letting S = {A}.
           • At every stage, there is a set S of vertices such that the shortest path from A to these vertices
             is known. Here, find a v ∈ V \ S such that the distance from A to some vertex u ∈ S plus the
             weight of the edge from u to v is minimized, over all u ∈ S adjacent to v , over all v ∈ V \ S.
             Add this v to S, recording that sum as its distance from A.
           Since G is connected, there will always be such v and u until S = V , so the algorithm can always continue. Of course,
        there’s a bit more to think about to show that the distances found are correct, but this is not so bad.
           If the weights aren’t assumed to be nonnegative, a shortest route need not exist (a walk could cross a
        negative edge back and forth, making its total weight arbitrarily small), so the problem
        wouldn’t be well-formed, and the problem wouldn’t have as much physical meaning.
    (2) Given some weighted, connected graph G, one problem is to find a minimal spanning tree for G: to
        find a spanning tree T such that $\sum_{e \in T} w(e)$ is minimum; such a tree is called a minimal spanning tree.
        This has applications such as laying pipe efficiently in order to cover every city in an area.
           In the previous lecture, it was shown that every connected graph has a spanning tree, so there must
        be a minimal spanning tree as well. Kruskal’s algorithm is the standard solution here, but there are
        others, such as reverse-delete. They are all of the same family, referred to as greedy algorithms, and do
        the following: at each step, Kruskal’s algorithm finds the cheapest edge that hasn’t been used and doesn’t form a cycle, and
        adds it to the tree, whereas reverse-delete removes from G the most expensive edge whose removal doesn’t disconnect G.8
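   The shortest-path procedure from item (1) can be sketched as follows. This is an illustration under assumptions, not the lecture's exact formulation: the graph is given as a dictionary of adjacency lists with nonnegative weights, and a heap replaces the search over all pairs (u, v), which selects the same vertex at each stage but faster. The example graph is invented.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from `source`.

    adj maps each vertex to a list of (neighbour, weight) pairs, weight >= 0.
    """
    dist = {source: 0}
    done = set()  # the set S of vertices whose distance is final
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in adj[u]:
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# A small hypothetical example: the direct edge A-C is longer than going via B.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 2)],
    "C": [("A", 4), ("B", 2), ("D", 1)],
    "D": [("C", 1)],
}
print(dijkstra(graph, "A"))
```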
Another solution is Prim’s Algorithm. This builds up a set of vertices S such that at each step, one
chooses a vertex from V \ S and connects it in the cheapest way to a vertex in S. This differs from Kruskal’s
algorithm in that it requires the tree to always be connected as it is built, whereas Kruskal’s doesn’t.
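   To make the contrast concrete, here is a sketch of both greedy algorithms in Python. It is an illustration under assumptions: vertices are numbered 0, . . . , n − 1, edges are (weight, x, y) triples, Kruskal's cycle test uses a simple union-find structure, and Prim's choice of the cheapest connecting edge uses a heap. The example graph is invented.

```python
import heapq


def kruskal(n, edges):
    """Minimal spanning tree of a connected graph; edges are (weight, x, y)."""
    parent = list(range(n))

    def find(v):  # representative of the component containing v
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, x, y in sorted(edges):  # cheapest edges first
        rx, ry = find(x), find(y)
        if rx != ry:  # the edge joins two components, so it forms no cycle
            parent[rx] = ry
            tree.append((w, x, y))
    return tree


def prim(n, edges, start=0):
    """Same problem, but the partial tree stays connected throughout."""
    adj = [[] for _ in range(n)]
    for w, x, y in edges:
        adj[x].append((w, x, y))
        adj[y].append((w, y, x))
    in_tree = {start}  # the set S
    heap = list(adj[start])
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < n:
        w, u, v = heapq.heappop(heap)  # cheapest edge leaving S
        if v in in_tree:
            continue
        in_tree.add(v)
        tree.append((w, u, v))
        for e in adj[v]:
            if e[2] not in in_tree:
                heapq.heappush(heap, e)
    return tree


example = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
print(kruskal(4, example))
print(prim(4, example))
```

   On this example the two algorithms return the same three edges, with total weight 7; the weights are distinct, so the minimal spanning tree is unique.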
Claim. Prim’s Algorithm produces an MST.9
   Another easy argument is that R(p, q) = R(q, p) for reasons of symmetry. Now, the general theorem can be
tackled:
  8This seems like it would be more expensive in terms of running time, but this is a math class, so nobody cares.
  9There could be many such MSTs, though if the weights are distinct, then there is exactly one.
                           Figure 3. A 2-coloring of K5 which contains no monochromatic triangles.
Proof of Theorem 5.1. Proceed by induction on p + q, and bound the number of vertices that are needed.
Suppose that some red/blue coloring of Kn contains neither a blue Kp nor a red Kq .
   Pick some vertex v , which is connected to n − 1 vertices. Of these edges, suppose b of them are blue and
r are red. The b blue edges connect to b vertices forming a complete graph Kb . This can’t contain any blue
$K_{p-1}$ (which together with v would form a blue Kp ) or red Kq . Similarly, looking at the r vertices connected to v by
red edges, they cannot contain a red $K_{q-1}$ or a blue Kp for the same reasons. Thus, b ≤ R(p − 1, q) − 1 and
r ≤ R(p, q − 1) − 1. Thus, n = b + r + 1 ≤ R(p − 1, q) + R(p, q − 1) − 1. In other words, this is an upper
bound for R(p, q): R(p, q) ≤ R(p − 1, q) + R(p, q − 1). □
   This proof looks locally at a graph, since nobody knows how to look globally at a graph to make these sorts
of proofs.
   Notice that this recurrence looks just like Pascal’s rule for binomial coefficients, and thus:
Corollary 5.4. $R(p, q) \le \binom{p+q-2}{p-1}$.
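   The step from the recurrence to the corollary can be checked mechanically. The sketch below iterates the recurrence as if it held with equality, which gives the largest value it allows. The base cases R(1, q) = R(p, 1) = 1 are an assumption of this example (a single vertex counts as a monochromatic K1); the notes may use different base cases.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def ramsey_upper(p, q):
    """Upper bound for R(p, q) from R(p, q) <= R(p-1, q) + R(p, q-1).

    Base cases R(1, q) = R(p, 1) = 1 are assumed for this illustration.
    """
    if p == 1 or q == 1:
        return 1
    return ramsey_upper(p - 1, q) + ramsey_upper(p, q - 1)


for p, q in [(3, 3), (3, 4), (4, 4), (5, 5)]:
    print(p, q, ramsey_upper(p, q), comb(p + q - 2, p - 1))
```

   With these base cases the bound coincides with the binomial coefficient in Corollary 5.4 because both satisfy the same recurrence with the same boundary values.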
The goal is to show this is strictly less than the number of possible two-colorings of Kn , which is $2^{\binom{n}{2}}$, since this
would imply there is some coloring for which there are no such monochromatic subgraphs Kp .
    This bound can be found by double-counting, calculating the sum in two ways to obtain two different kinds of
information. For a given red or blue Kp , count the number of 2-colorings of Kn in which it is monochromatic.
Choose p vertices from the n, which can be done in $\binom{n}{p}$ ways, and pick either red or blue. Then, there are
$2^{\binom{n}{2} - \binom{p}{2}}$ ways to color the rest of the graph, so the sum (5.5) is equal to $2 \binom{n}{p} 2^{\binom{n}{2} - \binom{p}{2}}$.
    Thus, we want $2 \binom{n}{p} 2^{\binom{n}{2} - \binom{p}{2}} < 2^{\binom{n}{2}}$. Rearranging, this is $\binom{n}{p} < 2^{\binom{p}{2} - 1}$. Since $\binom{n}{p} = \frac{n(n-1)\cdots(n-p+1)}{p!} \le n^p/p!$,
it is sufficient to check if $n^p/p! \le 2^{\binom{p}{2} - 1}$, or that $n < \left( 2^{p(p-1)/2 - 1}\, p! \right)^{1/p}$. The factorial term dominates (indeed $p! \ge 2^{p-1}$), so it’s
sufficient to have $n < \left( 2^{p^2/2 + p/2 - 2} \right)^{1/p}$. It’s possible to do better with more care, again, but it’s enough to
have $n < 2^{p/2}$ (when p ≥ 4). Thus, we have the following theorem:
Theorem 5.6. There exists a coloring of $K_{\lfloor 2^{p/2} \rfloor}$ with no monochromatic Kp . In other words, $R(p, p) \ge 2^{p/2}$.
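   To see how much is lost in the simplifications, one can evaluate the exact counting condition $2\binom{n}{p} < 2^{\binom{p}{2}}$ directly. The sketch below is illustrative only, with made-up function names; it finds the largest n for which the counting argument still guarantees a good coloring and compares it with $\lfloor 2^{p/2} \rfloor$.

```python
from math import comb


def counting_argument_works(n, p):
    """True if 2 * C(n, p) * 2^(C(n,2) - C(p,2)) < 2^C(n,2).

    After cancelling 2^(C(n,2) - C(p,2)) this is 2 * C(n, p) < 2^C(p, 2).
    """
    return 2 * comb(n, p) < 2 ** comb(p, 2)


def largest_guaranteed_n(p):
    """Largest n for which the counting argument yields a coloring of K_n
    with no monochromatic K_p."""
    n = p - 1  # K_{p-1} trivially contains no K_p
    while counting_argument_works(n + 1, p):
        n += 1
    return n


for p in range(3, 11):
    print(p, largest_guaranteed_n(p), int(2 ** (p / 2)))
```

   For every p in this range the exact condition allows a somewhat larger n than the stated bound, which is the slack referred to by "it's possible to do better with more care".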
   Thus, the bounds are $(\sqrt{2})^p \le R(p, p) \le 4^p$, so is $R(p, p) \approx c^p$ for some c? Maybe. Nobody knows. A big
open problem is to improve these bounds; even changing the 4 to a 3.999 would be a major improvement.
   Using Stirling’s Formula, which approximates $n! \approx \sqrt{2\pi n}\,(n/e)^n$, the actual bound for large numbers is
something like $R(p, p) \ge p\,2^{p/2}/(e\sqrt{2})$. If you’re willing to work a lot harder, there’s a better result (Spencer
1980) that gives a factor of two, which is the best known: $R(p, p) \ge p\sqrt{2}\,2^{p/2}/e$.
   The best known upper bound is due to Conlon in 2011, and shows that $R(p, p) \le 4^p/p^A$ for any A for
sufficiently large p. Nonetheless, this is still nowhere near $3.999^p$.
Theorem 6.1. For given p1 , . . . , pr , if n is sufficiently large, then every r -coloring of Kn will contain some
$K_{p_i}$ that is monochromatic of color i .
Proof of Theorem 6.1. The proof will not be much different than the proof of Theorem 5.1: proceed by induction
on p1 + · · · + pr , and suppose Kn is r -colored but there is no $K_{p_i}$ of color i .
    Pick some vertex v . Then, there are strictly less than R(p1 − 1, p2 , . . . , pr ) vertices connected to v such that
the connecting edge has color 1, since otherwise the theorem would be satisfied (among those vertices there would be either
a $K_{p_j}$ of some color j ≥ 2, or a $K_{p_1 - 1}$ of color 1, which becomes a monochromatic $K_{p_1}$ by adjoining v ). Similarly, the number of edges at v of color i
must be less than R(p1 , . . . , pi−1 , pi − 1, pi+1 , . . . , pr ). Thus, summing up,
$$n \le 1 - r + \sum_{i=1}^{r} R(p_1, \dots, p_{i-1}, p_i - 1, p_{i+1}, \dots, p_r),$$
   There are a lot of complicated versions of this result. One of the more beautiful examples:
Theorem 6.4 (Van der Waerden). In any r -coloring of the integers, there exist arbitrarily long monochromatic
arithmetic progressions (i.e. sequences of the form a, a + d, a + 2d, . . . , a + (k − 1)d).
   There is a finite version of this theorem, as with Schur’s theorem: there is some number N(r, k) such that an
arithmetic progression of length k always exists if [1, N(r, k)] is r -colored.
Theorem 6.5 (Gowers). An upper bound for N(r, k) is
$$N(r, k) \le 2^{2^{r^{2^{2^{k+9}}}}}.$$
   Gowers got a Fields medal in part because of this theorem. It leads to lots of useful things, such as the
theorem that the primes contain arbitrarily long arithmetic progressions.
   Another example involves playing Tic-Tac-Toe in high dimensions. Suppose one has a d-dimensional board
of size k × k × · · · × k. Write the points of $\{1, \dots, k\}^d$ as d-tuples of coordinates $(x_1, \dots, x_d)$. Then, define a
combinatorial line to be a set of k such d-tuples in which each coordinate is either constant or equal to a common
value that runs over 1, . . . , k, with at least one coordinate varying (e.g. (1, 1, ∗, ∗, 3, 5), where ∗ takes the same
value in both components; ∗ can be thought of as a wildcard). This sort of line is more restrictive than the lines
allowed in Tic-Tac-Toe, and is sometimes known as a Hales-Jewett line.
Theorem 6.6 (Hales-Jewett). If d is large enough (depending on k and r) and $\{1, \dots, k\}^d$ is r-colored, then
there exists a monochromatic combinatorial line.
   This means that for a board of sufficiently high dimension, there is always a winner in Tic-Tac-Toe.
Note that this is explicitly not true when r = 2 and d = 2 (on the usual 3 × 3 board, a game can end in a draw).
A statement called the density version of this theorem also deals with how much of $\{1, \dots, k\}^d$ must be colored
before the statement holds.
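   The definition of a combinatorial line is easy to misread, so here is a small Python sketch that enumerates the lines as wildcard templates and searches by brute force for a coloring with no monochromatic line. The function names are made up for this illustration, and the search is only feasible for tiny boards.

```python
from itertools import product


def combinatorial_lines(k, d):
    """Yield every combinatorial line in {1..k}^d as a list of k points."""
    for template in product(list(range(1, k + 1)) + ["*"], repeat=d):
        if "*" not in template:
            continue  # a line needs at least one wildcard coordinate
        yield [
            tuple(x if c == "*" else c for c in template)
            for x in range(1, k + 1)
        ]


def has_monochromatic_line(coloring, k, d):
    """`coloring` maps each point of {1..k}^d to a color."""
    return any(
        len({coloring[p] for p in line}) == 1
        for line in combinatorial_lines(k, d)
    )


def find_line_free_coloring(k, d, r):
    """Exhaustive search for an r-coloring with no monochromatic line."""
    points = list(product(range(1, k + 1), repeat=d))
    for colors in product(range(r), repeat=len(points)):
        coloring = dict(zip(points, colors))
        if not has_monochromatic_line(coloring, k, d):
            return coloring
    return None


print(len(list(combinatorial_lines(3, 2))))  # 7
```

   On the 3 × 3 board there are only 7 combinatorial lines (three rows, three columns, one diagonal), and `find_line_free_coloring(3, 2, 2)` finds a 2-coloring avoiding all of them. For k = 2, d = 2 the same search returns `None`, so dimension 2 already suffices there.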
   It is possible to restate Ramsey’s theorem in a different way: for any graph G on n vertices, one can color Kn
by taking an edge to be red if it’s in G, and blue if it’s not. This uses the following terminology.
Definition 6.7. An independent set on a simple graph G is a set of vertices such that no edge connects any of
the vertices. A clique is the opposite: it is a set of vertices where all possible edges between them exist.
   Thus, Ramsey’s theorem states that if n ≥ R(p, q), then any simple graph G on n vertices either contains p
independent vertices or a q-clique.
   Thus, one can ask how many edges a simple graph on n vertices can contain given that it has no triangles. For
example, one can have a complete bipartite graph, with the two sets of sizes a and (n − a). Thus, one can obtain a
graph with a(n − a) edges without triangles, which is maximized at about $n^2/4$, when a ≈ n/2. Specifically, one
has $\lfloor n/2 \rfloor (n - \lfloor n/2 \rfloor)$ edges, which is about half as many as in a complete graph. However, this is a highly
nonrandom graph, since it’s so structured, so one could have some theory as to whether a graph looks random
or looks structured, or whether these patterns are true in random graphs as well.
Theorem 6.8 (Turán). If G is a simple graph on n vertices that has more than $\lfloor n/2 \rfloor (n - \lfloor n/2 \rfloor)$ edges, then
G contains a triangle.
Proof. Let K(n) be the maximum number of edges that a simple, triangle-free graph on n vertices can have. A
bipartite graph gives a lower bound of $K(n) \ge \lfloor n/2 \rfloor (n - \lfloor n/2 \rfloor)$ edges.
    For some small values: K(1) = 0, K(2) = 1, K(3) = 2, etc. By induction, an upper bound can be discovered.
    Choose an edge e = {1, 2} and remove it and its two vertices. Then, at most K(n − 2) edges remain. Back
in the original graph G, each of the other n − 2 vertices can be connected to at most one of vertices 1 or 2 (or
else there would be a triangle), so K(n) ≤ K(n − 2) + (n − 2) + 1 = K(n − 2) + n − 1. Applying this repeatedly,
one can check by induction that $K(n) \le \lfloor n/2 \rfloor (n - \lfloor n/2 \rfloor)$. □
  It’s actually possible to prove a stronger result, which is that the only graph with K(n) edges is the bipartite
one mentioned above.
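   For very small n the theorem can be checked directly. The following Python sketch tries every simple graph on n vertices and records the most edges a triangle-free one can have; the function names are illustrative, and the exhaustive search is exponential in n(n − 1)/2, so it is only meant for n up to about 6.

```python
from itertools import combinations


def max_triangle_free_edges(n):
    """Largest edge count of a triangle-free simple graph on n vertices,
    found by trying every graph (tiny n only)."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        if len(edges) <= best:
            continue  # cannot improve on the current record
        if any((a, b) in edges and (a, c) in edges and (b, c) in edges
               for a, b, c in triples):
            continue  # contains a triangle
        best = len(edges)
    return best


def bipartite_bound(n):
    """floor(n/2) * (n - floor(n/2)), the size of the balanced bipartite graph."""
    return (n // 2) * (n - n // 2)


for n in range(1, 6):
    print(n, max_triangle_free_edges(n), bipartite_bound(n))
```

   The two columns printed agree for each n, matching the statement that the balanced complete bipartite graph is extremal.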
Theorem 7.2. If G is simple and has no 3- or 4-cycles, then G has at most $O(n^{3/2})$ edges.
   This is substantially smaller than the triangle case, and as the girth increases, the exponent on n decreases.
Proof of Theorem 7.2. Proof by double-counting: pick a vertex u and look at the set {{v, w} | uv ∈ E, uw ∈ E}.
If d(u) = deg u, then the size of this set is $\binom{d(u)}{2} = d(u)(d(u) - 1)/2$. Then, for any pair {v, w} there is
at most one u that it comes from, or else there would be a 4-cycle. Thus, the number of “tents” of edges
v − u − w is at most the number of pairs of vertices:
$$\sum_{u \in V} \binom{d(u)}{2} \le \binom{n}{2}.$$
By the Handshaking Lemma, $\sum_{u \in V} d(u) = 2|E|$, so
$$\sum_{u \in V} \frac{d(u)^2 - d(u)}{2} = \frac{1}{2} \sum_{u \in V} d(u)^2 - |E|.$$
This requires a little bit of analysis; specifically, take the Cauchy-Schwarz Inequality
$$\left( \sum_{i=1}^{n} x_i y_i \right)^2 \le \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right).$$
Then,
$$4|E|^2 = \left( \sum_{u \in V} d(u) \cdot 1 \right)^2 \le \left( \sum_{u \in V} 1 \right) \left( \sum_{u \in V} d(u)^2 \right) = n \left( \sum_{u \in V} d(u)^2 \right).$$
Thus, $\frac{1}{2} \sum_{u \in V} d(u)^2 \ge 2|E|^2/n$, so
$$\frac{2|E|^2}{n} - |E| \le \frac{n(n-1)}{2} \implies 4|E|^2 - 2n|E| \le n^2(n-1).$$
Intuitively, the use of Cauchy-Schwarz is due to considering the most evenly distributed case, or a pigeonhole
argument. But now, the quadratic formula can be used to find the maximum number of edges, giving
$$|E| \le \frac{2n + \sqrt{4n^2 + 16n^2(n-1)}}{8} = \frac{n + n\sqrt{4n-3}}{4}. \qquad \square$$
   This also gives a more concrete upper bound: G must have at most $n(1 + \sqrt{4n-3})/4$ edges. This is
not a tight bound, but no tight bound is known.
   In some sense, the Cauchy-Schwarz Inequality is a statement that the largest number of edges occurs when
they are evenly distributed.
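   As a sanity check on the bound $n(1 + \sqrt{4n-3})/4$, here is a short Python sketch that evaluates it and tests the key step of the double count (no pair of vertices has two common neighbors) on a concrete graph. The Petersen graph is used as the example; it is not mentioned in the lecture and is chosen here only because it has no 3- or 4-cycles. The function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def c4_free_edge_bound(n):
    """The bound n(1 + sqrt(4n - 3))/4 on the number of edges."""
    return n * (1 + sqrt(4 * n - 3)) / 4


def tents_are_distinct(n, edges):
    """True if no pair {v, w} has two common neighbors, i.e. every
    'tent' v - u - w is determined by its endpoints (no 4-cycle)."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return all(
        len(adj[v] & adj[w]) <= 1 for v, w in combinations(range(n), 2)
    )


# Petersen graph: outer 5-cycle, inner pentagram, and five spokes.
petersen = (
    [(i, (i + 1) % 5) for i in range(5)]
    + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    + [(i, i + 5) for i in range(5)]
)

print(len(petersen), c4_free_edge_bound(10))
```

   The Petersen graph has 15 edges on 10 vertices, under the bound of roughly 17.7, while a plain 4-cycle fails the `tents_are_distinct` check as expected.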
Definition 7.3. If G is a simple graph, a Hamiltonian circuit on G is a closed path that travels through each
vertex in the graph exactly once.
   This is different from the Eulerian circuit, which travels every edge exactly once.
Theorem 7.4 (G. Dirac11). Suppose that G is a simple graph on n vertices and each vertex has degree at least
n/2. Then, G contains a Hamiltonian cycle.
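Proof. Take a maximal counterexample G (i.e. one such that if any edge is added, then there is a Hamiltonian
cycle). G is not complete, so pick nonadjacent vertices u and v; then G ∪ uv contains a Hamiltonian cycle, which
must contain uv. Removing uv from it leaves a path u = u1 , u2 , . . . , un = v in G through every vertex.
Suppose that for some i , u = u1 is connected to ui+1 and v = un is connected to ui . Then
u1 , ui+1 , ui+2 , . . . , un , ui , ui−1 , . . . , u1 is a Hamiltonian cycle in G, which is a contradiction. But such an i
exists: u1 is connected to at least n/2 of the vertices ui+1 and un is connected to at least n/2 of the vertices ui ,
and there are only n − 1 possible values of i , so some i satisfies both by the pigeonhole principle. □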
Definition 7.5. The chromatic number of a graph G is the minimum number of colors necessary to color the
vertices of G such that no two adjacent vertices share the same color.
   To think about the definition, here’s a nice toy unsolved problem: color all of the points of the plane with χ
colors. Must there be two points exactly distance 1 apart that have the same color? For χ ≤ 3 there must be,
for χ ≥ 7 there need not be, and the smallest χ for which such pairs can be avoided is somewhere between 4
and 7, inclusive.
   For χ = 3 with colors R, B, and W, suppose no two points at distance 1 share a color. Take some equilateral
triangle of side 1; each vertex must have a different color. Then, take the reflection of that triangle through the
line between the red and blue vertices; the remaining vertex must also be white. These two white vertices are at
distance √3, and the same argument applies to any pair of points at distance √3. Thus, all points on a circle of
radius √3 centered at a white point must be white; taking any two points on that circle which are distance 1
apart gives a contradiction, so any 3-coloring of the plane has points of the same color at distance 1 apart.
   11Not the famous Dirac.
                                  8. Coloring the Vertices of a Graph: 4/24/13
   A lot of the problems in this class can be difficult: they tend to require some sort of clever trick. There are
various ways to think about this, but one good one is due to Pólya: if you can’t solve a problem, then there is an
easier problem you can solve. Solving the smaller problem (or applying the same heuristic to it) will give insight
into the larger one.
   A big question in vertex coloring is: given a graph G, color the vertices such that no adjacent vertices have
the same color. How many colors are necessary? As discussed in the previous lecture, take the set of points in
the plane; then, if they are 3-colored, then there always exist 2 points of distance 1 apart with the same color.
Thus, at least four colors are necessary to ensure there are no two points of distance 1 with the same color, but
not much more than that is known.
   However, is it possible to prove that an upper bound exists? Imagine tiling the plane with squares with diagonal
distance slightly smaller than 1. Take each 3 × 3 set of these squares and color each one with a different color,
and then repeat this. Thus, any two points with the same color are either within the same square (so they have
distance less than 1) or in different squares more than distance 1 apart. Thus, the problem has an upper bound
of 9, and there’s a more clever way of using hexagons to show that 7 colors is also sufficient. Thus, the answer
to the problem is one of 4, 5, 6, or 7, but nobody knows which.
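   The 9-coloring by squares can be spot-checked numerically. The Python sketch below assumes a square side of 0.7 (so the diagonal is about 0.99) and half-open squares; both choices, and the names used, are assumptions of this illustration. It samples random pairs of points at distance 1 and counts how many pairs receive the same color. A random sample is of course not a proof.

```python
import math
import random

SIDE = 0.7  # assumed side length; the diagonal 0.7 * sqrt(2) is below 1


def color(x, y):
    """Color of (x, y): the position of its square inside a 3 x 3 block."""
    return (math.floor(x / SIDE) % 3, math.floor(y / SIDE) % 3)


def unit_pair_sample(rng):
    """A random point together with a second point at distance 1 from it."""
    x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
    theta = rng.uniform(0, 2 * math.pi)
    return (x, y), (x + math.cos(theta), y + math.sin(theta))


rng = random.Random(0)
clashes = sum(
    color(*p) == color(*q)
    for p, q in (unit_pair_sample(rng) for _ in range(100_000))
)
print(clashes)  # 0
```

   No clashes are found, as the argument predicts: two points in the same square are closer than 1, and two same-colored points in different squares are separated by at least two full squares in some coordinate, which is more than 1.4.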
Definition 8.1. The chromatic number χ(G) of a finite graph G is the smallest number r such that the vertices
of G can be r -colored without two adjacent vertices being the same color.
   It can be assumed that G is simple: the existence of loops would cause issues with the statement of the
problem, so they won’t be considered, and any additional edges don’t change the adjacency of the problem, and
therefore don’t change the chromatic number.
   χ(Kn ) = n, since all pairs of vertices are adjacent. Thus, for any G with n vertices, χ(G) ≤ n, since each
vertex can be given a different color. If χ(G) = 1, then G has no edges. If χ(G) = 2, then let V1 be the set
of vertices of color 1 and V2 be the set of vertices of color 2. Then, there are no edges in V1 or in V2 , so G is
bipartite.
   More generally, if χ(G) = r, then the set V of vertices of G can be written as $V = \bigcup_{i=1}^{r} V_i$, where the $V_i$ are
disjoint, and there are no edges entirely within any given $V_i$. In general, if there are more edges, the chromatic
number tends to increase. Thanks to Turán’s theorem, there exist triangle-free graphs with arbitrarily large
chromatic number.
   Is it possible to do better than this?
Proposition 8.2. Suppose ∆ = maxv ∈V deg v . Then, χ(G) ≤ ∆ + 1.
Proof. Use a greedy algorithm: build up the graph vertex-by-vertex. When any particular vertex is added, it
must be connected to at most ∆ vertices, so it can be given a color that is distinct from the color of each of
these vertices, since there are ∆ + 1 colors.                                                               
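   The greedy argument translates directly into code. This is a minimal Python sketch; the adjacency-dictionary representation and the function name are choices made for the illustration.

```python
def greedy_coloring(adj):
    """Color vertices one at a time, giving each the smallest color not
    already used by a colored neighbor. `adj` maps each vertex to the
    set of its neighbors. Uses at most (max degree + 1) colors."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# 5-cycle: maximum degree 2, so colors 0, 1, 2 suffice.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
coloring = greedy_coloring(cycle5)
print(coloring)
```

   On the 5-cycle the greedy method uses three colors, which shows that the bound ∆ + 1 cannot be improved for odd cycles.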
   In general, this is only a modest strengthening of the theorem, and it’s possible to do a bit better.
Theorem 8.3 (Brooks). If G is a connected simple graph, then χ(G) ≤ ∆ except in two cases:
      • G = K∆+1 , or
      • ∆ = 2 and G is an odd-length cycle.
   It’s also possible to state a lower bound.
Definition 8.4. Define the clique number of G to be the largest ℓ such that G contains an ℓ-clique (i.e. a Kℓ).
   All of the vertices in a clique must be colored different colors, so χ is at least the clique number. Additionally,
if G is split into r = χ(G) independent sets V1 , . . . , Vr (the color classes), then there is some Vj such that
|Vj | ≥ n/r . Thus, another lower bound is found: χ(G) ≥ n/|Vj |, and in particular χ(G) ≥ n/α, where α is the
size of the largest independent set in G. This may be better or worse than the previous lower bound. Combining,
χ(G) ≥ max(ℓ, n/α).
Definition 8.5. The chromatic function χ(G, k) is the number of different ways in which one can k-color the
graph G.
    In this scheme, two colorings are the same if each vertex has the same color in the colorings. For example, a
triangle can be k-colored in k(k − 1)(k − 2) ways. In general, χ(G, k) = 0 for 0 ≤ k < χ(G).
    If G = Kn , then there are k ways to color the first vertex, k − 1 ways to color the second, etc. Thus,
$\chi(K_n, k) = k(k-1) \cdots (k-n+1)$, which also neatly implies that χ(Kn , k) = 0 when 0 ≤ k < n.
Theorem 8.6. For any graph G on n vertices, the chromatic function is a polynomial of degree n in k.
Proof. Divide V into r nonempty independent sets V1 , . . . , Vr . Then, given such a partition, one can color all of
the points in each Vi with the same color, and use different colors for different sets. There are
$k(k-1) \cdots (k-r+1)$ such ways of doing this. Then, this must be summed over all ways of partitioning V into
independent sets:
$$\chi(G, k) = \sum_{\substack{\text{partitions of } V \\ \text{into independent sets}}} k(k-1) \cdots (k-r+1),$$
where r is the number of sets in the partition. Thus, it is a finite linear combination of polynomials, so it is a
polynomial. Its degree comes from the partition into sets of single points, since every other partition has fewer
sets. This gives $k(k-1) \cdots (k-n+1)$, so the degree is n. The terms of degree n − 1 come from partitions into a
doubleton V1 and singletons V2 , . . . , Vn−1 ; the doubleton must be a pair of nonadjacent vertices, so there are
$\binom{n}{2} - |E|$ such partitions, each contributing $k(k-1) \cdots (k-n+2)$. So the sum is
$$k(k-1) \cdots (k-n+1) + \left( \binom{n}{2} - |E| \right) k(k-1) \cdots (k-n+2) + \cdots,$$
where everything not written has degree at most n − 2. Thus, the coefficient of $k^n$ is 1, and that of $k^{n-1}$ is
$-\binom{n}{2} + \binom{n}{2} - |E| = -|E|$.
   This proof will be continued in the next lecture, but notice that the constant term is 0 (since a graph with at
least one vertex cannot be colored using no colors), the leading coefficient is 1, the degree is n, and the second
coefficient is −|E| in a simple graph. Additionally, this polynomial is 0 for integers 0 ≤ k < χ.
   It will also be possible to show that the coefficients of the polynomial alternate in sign.
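   The partition formula for χ(G, k) can be compared against a direct count. The Python sketch below computes the number of proper k-colorings in two ways: by brute force over all assignments, and by summing k(k − 1) · · · (k − r + 1) over partitions of the vertex set into r independent sets. All names are illustrative, and both methods are only practical for small graphs.

```python
from itertools import product


def count_colorings(n, edges, k):
    """chi(G, k) by brute force over all k^n color assignments."""
    return sum(
        all(c[a] != c[b] for a, b in edges)
        for c in product(range(k), repeat=n)
    )


def independent_partitions(n, edges):
    """Yield the number of blocks of each partition of {0..n-1} into
    independent sets (each partition is produced exactly once)."""
    edge_set = {frozenset(e) for e in edges}

    def extend(v, blocks):
        if v == n:
            yield len(blocks)
            return
        for block in blocks:  # put v into an existing block if allowed
            if all(frozenset((u, v)) not in edge_set for u in block):
                block.append(v)
                yield from extend(v + 1, blocks)
                block.pop()
        blocks.append([v])    # or start a new block with v
        yield from extend(v + 1, blocks)
        blocks.pop()

    yield from extend(0, [])


def falling(k, r):
    """k(k - 1)...(k - r + 1)."""
    out = 1
    for i in range(r):
        out *= k - i
    return out


def chromatic_by_partitions(n, edges, k):
    return sum(falling(k, r) for r in independent_partitions(n, edges))


triangle = [(0, 1), (1, 2), (0, 2)]
print([count_colorings(3, triangle, k) for k in range(5)])  # [0, 0, 0, 6, 24]
```

   For the triangle both methods give k(k − 1)(k − 2), and the values vanish for k < 3 = χ, as noted above.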
its edges with 2 colors, red and blue. The goal is to find a monochromatic k-clique (i.e. a Kk ⊂ Kn whose edges
are either all red or all blue). Then, the Ramsey number R(k, k) is the smallest number n such that this holds
true for any coloring of Kn . The key here is that of course such a Kk can be created, but the condition is that it
is present in all colorings.
Theorem 9.2 (Erdös). If $\binom{n}{k} 2^{1 - \binom{k}{2}} < 1$, then R(k, k) > n.
Proof. One way to find a lower bound is to find a coloring of Kn without any monochromatic Kk . But this is
hard, so colorings will be chosen at random. Let S be the set of colorings of Kn , so that $|S| = 2^{\binom{n}{2}}$, and let
P be the uniform distribution on S. Let f (x) = 1 if there exists no monochromatic Kk in the coloring x and
f (x) = 0 otherwise. Then, the goal is to show that P(f (x) = 1) > 0, or equivalently that P(f (x) = 0) < 1.
   There are $\binom{n}{k}$ choices of k vertices in Kn , and each such choice is equivalent to a subgraph Kk of Kn . The
probability that a given such Kk is monochromatic is $2^{1 - \binom{k}{2}}$, so by the union bound
$P(f(x) = 0) \le \binom{n}{k} 2^{1 - \binom{k}{2}} < 1$.
   Thus, $R(k, k) \ge 2^{k/2}$ for all k ≥ 3, and thus the Ramsey numbers grow at least exponentially.
   Notice how easy this is compared to a construction. If one wants an algorithm, there’s the possibility of an
exhaustive search, which takes exponential time and is unrealistic for large n, or a smarter way: if $n = 2^{k/2}$,
then $\binom{n}{k} 2^{1 - \binom{k}{2}} \ll 1$, so there is a very small probability that any given graph doesn’t have the property. Since
it’s easy to check, one can just randomly guess a graph, and guess again in the unlikely event that the graph is
bad. Thus, this nonconstructive proof provides an algorithm, albeit a probabilistic one.
    In some ways, this is elementary, but not trivial: the proof is very easy to follow but hard to create. In some
sense, it’s really nice: someone else does all of the hard work.
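   The guess-and-check algorithm just described is short enough to write down. This Python sketch computes the largest n allowed by the hypothesis of Theorem 9.2 and then samples uniform random colorings until one has no monochromatic Kk. The function names, the retry limit, and the random seed are assumptions of this illustration; the check itself is brute force over all k-subsets.

```python
import random
from itertools import combinations
from math import comb


def erdos_n(k):
    """Largest n with C(n, k) * 2^(1 - C(k, 2)) < 1, for k >= 3.
    The theorem then gives R(k, k) > n."""
    n = k
    # integer form of the inequality: 2 * C(n, k) < 2^C(k, 2)
    while 2 * comb(n + 1, k) < 2 ** comb(k, 2):
        n += 1
    return n


def has_mono_clique(coloring, n, k):
    """`coloring` maps each pair (i, j) with i < j to 0 or 1."""
    return any(
        len({coloring[e] for e in combinations(s, 2)}) == 1
        for s in combinations(range(n), k)
    )


def random_good_coloring(n, k, rng, max_tries=1000):
    """Guess uniform random colorings until one has no monochromatic K_k."""
    for _ in range(max_tries):
        coloring = {e: rng.randrange(2) for e in combinations(range(n), 2)}
        if not has_mono_clique(coloring, n, k):
            return coloring
    return None


print(erdos_n(4))  # 6
good = random_good_coloring(6, 4, random.Random(1))
```

   For k = 4 the hypothesis holds up to n = 6, and each random coloring of K6 is bad with probability at most 15/32, so a handful of guesses is enough in practice.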
Definition 9.3. A tournament on V = {1, . . . , n} is a directed graph T = (V, E) of n vertices such that for any
distinct x, y ∈ V , either (x, y ) ∈ E or (y , x) ∈ E.12
   Tournament graphs can be used to represent some competition, where the direction of the edge between x
and y indicates the winner of a match between them.
   A tournament has property $S_k$ if for any subset of k players there exists some other player who beat all of
them. Clearly, this requires k < n, and the “for any subset” aspect makes it difficult unless $n \gg k$.
What is the minimum value of n such that there exists a tournament on n vertices with $S_k$? The probabilistic
method can provide an upper bound by showing that a randomly chosen tournament T will have $S_k$ with positive
probability.
Claim (Erdös, 1963). If $\binom{n}{k} (1 - 2^{-k})^{n-k} < 1$, then there exists a tournament on n vertices with $S_k$.
Proof. It will be shown that a random tournament fails to satisfy $S_k$ with probability strictly less than 1. For
each fixed subset K of size k, let $A_K$ be the event that no player outside K beat every member of K. Then
$P(A_K) = (1 - 2^{-k})^{n-k}$: iterate over all $j \notin K$ (there are n − k such j) and check whether j beat every member
of K. Since the tournament was chosen uniformly at random, each j fails to do so with probability $1 - 2^{-k}$,
independently of the others, so
$$P(T \text{ doesn't have } S_k) = P\Bigl( \bigcup_{\substack{K \subseteq V \\ |K| = k}} A_K \Bigr) \le \sum_{\substack{K \subseteq V \\ |K| = k}} P(A_K) = \binom{n}{k} (1 - 2^{-k})^{n-k},$$
which is less than 1 by hypothesis. □
   This bound was chosen to be easy to prove, as with the probability distribution. A better bound can certainly
be found, but it requires much more work and is more elaborate.
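   To get a feel for the numbers, the following Python sketch finds the smallest n satisfying the hypothesis of the claim (using exact rational arithmetic) and checks property Sk directly on explicit tournaments. The function names are illustrative. The 7-player example, in which x beats y when x − y is a nonzero square mod 7, is not from the lecture; it is included only as a small explicit tournament to test.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def smallest_n_from_bound(k):
    """Smallest n > k with C(n, k) * (1 - 2^-k)^(n - k) < 1."""
    q = 1 - Fraction(1, 2 ** k)
    n = k + 1
    while comb(n, k) * q ** (n - k) >= 1:
        n += 1
    return n


def has_property_S(n, beats, k):
    """`beats(x, y)` is True when x beat y. Checks that every k-subset of
    players is beaten entirely by some player outside it."""
    return all(
        any(all(beats(j, x) for x in K) for j in range(n) if j not in K)
        for K in combinations(range(n), k)
    )


def paley7(x, y):
    """x beats y when x - y is a nonzero square mod 7 (squares: 1, 2, 4)."""
    return (x - y) % 7 in {1, 2, 4}


print(smallest_n_from_bound(1), smallest_n_from_bound(2))  # 3 21
print(has_property_S(7, paley7, 2))  # True
```

   For k = 2 the probabilistic bound needs n = 21, while the explicit 7-player tournament already has S2, which illustrates that the bound was chosen for ease of proof rather than sharpness.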
Remark.
   (1) The bound guaranteed by the theorem is a chore to calculate, but ends up being $O(k^2 2^k)$.
   (2) There is a construction of such a T when n ≥ (1 + δ)k for some δ > 0, which is a little better.
Definition 9.4. A family of sets F is called intersecting (also a family of intersecting sets) if for any A, B ∈ F ,
A ∩ B ≠ ∅.
Theorem 9.5 (Erdös-Ko-Rado). If F is an intersecting family of k-subsets13 of {0, 1, . . . , n − 1} and n ≥ 2k,
then $|F| \le \binom{n-1}{k-1}$.
  12When this graph is taken as an undirected graph, one obtains a complete graph.
  13i.e. they are all of size k.
Proof. This proof is due to Katona in 1972. Let $A_s = \{s, s + 1 \bmod n, \dots, (s + k - 1) \bmod n\}$ for
s = 0, . . . , n − 1, so that there are n of them.
Lemma 9.6. F contains at most k of these sets.
    Assuming the lemma (whose proof is skipped, but very easy), let σ be a uniformly chosen permutation
of {0, . . . , n − 1} and i uniformly chosen in {0, . . . , n − 1} independently of σ. Then, let A = {σ(i ), σ(i +
1 mod n), . . . , σ(i + k − 1 mod n)}. Clearly, P(A ∈ F | σ) ≤ k/n since there are n choices for i , and only k can
be present by the lemma (applied to the family relabeled by σ), so uncondition: P(A ∈ F) ≤ k/n. But since σ
and i were chosen at random, A is uniformly distributed among all k-subsets of {0, . . . , n − 1}, so
$P(A \in F) = |F| / \binom{n}{k}$. Thus $|F| \le \frac{k}{n} \binom{n}{k} = \binom{n-1}{k-1}$. □
    This upper bound is actually tight: take F = {A ⊂ {0, . . . , n − 1} | |A| = k, 0 ∈ A}, so there are
$\binom{n-1}{k-1}$ such sets that all intersect.
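   For tiny parameters the Erdös-Ko-Rado bound can be confirmed by exhaustive search. This Python sketch tries every family of k-subsets of {0, . . . , n − 1}; the function name is illustrative, and the search is doubly exponential, so it is only meant for cases like n = 5, k = 2.

```python
from itertools import combinations
from math import comb


def max_intersecting_family(n, k):
    """Size of the largest intersecting family of k-subsets of {0..n-1},
    by brute force over all families (tiny n only)."""
    subsets = [frozenset(s) for s in combinations(range(n), k)]
    best = 0
    for mask in range(1 << len(subsets)):
        family = [s for i, s in enumerate(subsets) if mask >> i & 1]
        if len(family) > best and all(
            a & b for a, b in combinations(family, 2)
        ):
            best = len(family)
    return best


print(max_intersecting_family(5, 2), comb(4, 1))  # 4 4
```

   The maximum found matches $\binom{n-1}{k-1}$, attained for instance by the sets containing 0. The final step of the proof also relies on the identity $\frac{k}{n}\binom{n}{k} = \binom{n-1}{k-1}$, which is easy to spot-check with `comb`.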
    The beauty of the probabilistic method is that there is no probability in the questions it tackles. Thus, it can
be used to say a lot of very deep things about how probability relates to the rest of mathematics, and in more
ways than just counting. This relates probability to group theory, topology, etc., and is the most far-reaching
use of probability in mathematics.
  If G is not connected, so that G = G1 ∪ G2 , where G1 , G2 are its connected components, then χ(G, k) =
χ(G1 , k)χ(G2 , k).
Theorem 10.1 (Erdös). There exist simple connected graphs with arbitrarily large girth and arbitrarily large
chromatic number.
   The probabilistic method is also useful here.
Definition 10.2. A planar graph is a graph that can be drawn in the plane such that no edges cross (they only
meet at vertices).
   There are many ways of drawing a graph; K4 is planar, even though the intuitive representation of it is not
planar, since there exists a drawing of it that is planar. Technically, in order to speak of the “inside” and “outside”
of curves (the edges), one needs to prove something complicated like the Jordan Curve Theorem, but here, any
edge can be represented by a piecewise linear curve, for which the theorem isn’t necessary. Thus,
    (1) there are no topological problems,
    (2) and even if there were, we wouldn’t care, anyway.
A planar graph thus separates the plane into regions, called faces. The number of faces isn’t obviously well-defined,
since it depends on the way that the graph is drawn. However, the following is known.
Theorem 10.3 (Euler). Suppose a connected planar graph G has V vertices, E edges, and divides the plane into
F faces. Then, V − E + F = 2.
Example 10.4. If G is a tree, then there is one face (since there are no cycles), so V − E + 1 = 2, or
V = E + 1.
   There are hundreds of proofs of Euler’s theorem, and here’s one:
Proof of Theorem 10.3. Proceed by induction on F . The base case is due to Example 10.4, so suppose that G
is not a tree, so that there exists a cycle in G. Take some edge in the cycle and delete it; the graph stays
connected and the number of faces drops by one, so by induction V − (E − 1) + (F − 1) = 2, so V − E + F = 2. □
   Thus, the number of faces for a planar graph is actually well-defined. The 2 in Euler’s theorem relates to the
surface, and is called the Euler characteristic of a surface: for example, on a torus there would be a different
constant.
   Another useful fact is that a connected, simple planar graph doesn’t have “too many” edges, which allows
for a nice bound on the chromatic number.
Theorem 10.5. Every planar graph can be colored using five colors.
   Actually, only four are necessary:
Theorem 10.6 (Four-Color Conjecture). Every planar graph can be colored using four colors.
   This was proven using a computer, which sounds scary until one realizes that the 1970s didn’t have a lot of
computing power compared to today. There were philosophical concerns about using a computer-generated
proof, but 40 years later most people accept it. However, a better proof was given in 1997 by Seymour et al.,
which only required 5 minutes of computing time.
   After all, we have computers, so why not use them? A computer-aided proof was also used to verify the
Kepler conjecture, which stated that a hexagonal arrangement is optimal for packing spheres.
   The statement that a planar graph doesn’t have too many edges can be clarified: each face gives rise to at
least 3 edges, but each edge could be counted twice (or not at all for some edges), so E ≥ 3F/2. Using Euler’s
formula, 2 = V − E + F ≤ V − E + 2E/3, so E/3 ≤ V − 2, or E ≤ 3V − 6. This is relatively low given how
many edges are possible.
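   The inequality E ≤ 3V − 6 gives a quick necessary test for planarity. Here is a minimal Python sketch; the function names are illustrative, and the test can only rule graphs out, never confirm that a graph is planar.

```python
def may_be_planar(v, e):
    """Necessary condition E <= 3V - 6 for a simple planar graph with
    V >= 3 vertices. Passing the test does not prove planarity."""
    return e <= 3 * v - 6


def faces(v, e):
    """Number of faces of a connected planar graph, from V - E + F = 2."""
    return 2 - v + e


print(may_be_planar(5, 10))  # False: K5 has 10 edges but 3*5 - 6 = 9
print(may_be_planar(4, 6))   # True: K4 passes (and is in fact planar)
print(faces(8, 12))          # 6: the cube graph has 6 faces
```

   Since the sum of the degrees is 2E ≤ 6V − 12, the average degree of a planar graph is below 6, so some vertex has degree at most 5. Note that K3,3 (V = 6, E = 9) passes this test even though it is not planar, so the condition is not sufficient.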
   This leads to a (not strict) upper bound on the chromatic number of a planar graph:
Theorem 11.1 (Six-Color Theorem). Any planar graph can be 6-colored, or if G is a planar graph, then χ(G) ≤ 6.
Proof. Proceed by induction on the number of vertices. If V ≤ 6, then of course G is 6-colorable, because each
vertex can be assigned a different color.
   In general, let x be a vertex of degree at most 5, as per the claim. Then, by induction G − x is 6-colored,
because it is also planar and has strictly fewer vertices. Then, bring that coloring to G; at most five different
colors are adjacent to x, so it can be given the remaining color.                                              
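   The induction in this proof is effectively an algorithm: repeatedly remove a vertex of degree at most 5, then color the vertices in the reverse order of removal. The Python sketch below follows that idea. The function name and the adjacency-dictionary input are assumptions of the illustration, and the code does not test planarity; it simply fails if no low-degree vertex is available.

```python
def six_color(adj):
    """Color a graph in which every subgraph has a vertex of degree <= 5
    (true for planar graphs) with colors 0..5. `adj` maps each vertex to
    the set of its neighbors."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while remaining:
        # the vertex x of the proof: minimum degree in what is left
        x = min(remaining, key=lambda v: len(remaining[v]))
        if len(remaining[x]) > 5:
            raise ValueError("no vertex of degree <= 5; graph is not planar")
        for u in remaining.pop(x):
            remaining[u].discard(x)
        order.append(x)
    color = {}
    for x in reversed(order):
        # at most five neighbors of x are already colored at this point
        used = {color[u] for u in adj[x] if u in color}
        color[x] = min(c for c in range(6) if c not in used)
    return color


# Octahedron: each vertex is adjacent to all others except its antipode.
octahedron = {
    v: {u for u in range(6) if u != v and u != (v + 3) % 6} for v in range(6)
}
coloring = six_color(octahedron)
```

   On the octahedron this produces a proper coloring, while on K7, where every vertex has degree 6, it raises an error, consistent with K7 not being planar.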
   This proof doesn’t use x5 at all, which made people think there was some clever trick that would allow it to
be reduced to four colors. In fact, the discoverer of the proof, Kempe, published it as a proof of the Four-Color
theorem, and it was accepted as such for ten years. This was not the only such false proof, so much that when
the real proof was reported, people were wary of reporting it!
   Moving to matchings of graphs, Gale and Shapley’s solution to the stable marriage problem recently won a
Nobel Prize in Economics.15 The paper itself was called College Admissions and the Stability of Marriage, one of
the most flavorful names for a math paper in a long time, and one that illustrates some of the many applications
of this problem (there are lots of them, in fields as diverse as organ transplants).
Definition 11.2. A matching of a graph G is a subset of the edge set such that no vertex appears in more than
one edge of the matching. A perfect matching is when all vertices appear exactly once.
   One can consider a bipartite graph on a set M of men and a set W of women, and ask whether there exists
a matching such that every vertex in M is in an edge (which requires |W | ≥ |M|).16 Additionally, for any
m ∈ M, deg(m) ≥ 1, or else no suitable matching would exist. More generally, if I ⊆ M, then
$\left| \bigcup_{i \in I} W(i) \right| \ge |I|$, where W (i ) is the set of vertices neighboring i . Clearly, these are necessary conditions.
Theorem 11.3 (Hall’s Marriage Theorem). These conditions are sufficient; if for any I ⊆ M,
$\left| \bigcup_{i \in I} W(i) \right| \ge |I|$, then G admits a matching in which all elements of M are paired up.
Corollary 11.4. If G = M ∪ W is a k-regular bipartite graph and |M| = |W |, then G has a perfect matching.
Proof. Exercise; it is necessary to check that this condition implies the one in the theorem.
   14This requires that x2 or x4 be between x1 and x3 in the plane, but strictly speaking this isn’t always the case. However, the
colors can be rearranged such that x2 does lie between x1 and x3 , since some colors must lie between other colors around x.
   15Since Gale was dead, Shapley won the prize, along with Roth, a Stanford business professor.
   16If one wishes to have a perfect matching, it is necessary that |M| = |W |.
   Theorem 11.3 can be reformulated as: if A1 , . . . , AN are some finite sets such that for any I ⊆ {1, . . . , N} we
have $\left| \bigcup_{i \in I} A_i \right| \ge |I|$, then it is possible to find distinct elements a1 , . . . , aN with ai ∈ Ai . This is called a system
of distinct representatives (SDR).
   This has applications to group theory: suppose G is a finite group and H ≤ G. Then, there exist g1 , . . . , gm ∈ G
(where m = [G : H]) such that $G = \bigcup_{i=1}^{m} g_i H = \bigcup_{i=1}^{m} H g_i$.
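   The SDR formulation of Hall’s theorem is easy to experiment with. The Python sketch below checks Hall’s condition by brute force and, separately, finds an SDR when one exists. Note that the search uses augmenting paths, a standard bipartite-matching technique, rather than the inductive argument of the proof; the function names are illustrative.

```python
from itertools import combinations


def hall_condition(sets):
    """Check |union of A_i for i in I| >= |I| for every nonempty I."""
    n = len(sets)
    return all(
        len(set().union(*(sets[i] for i in I))) >= r
        for r in range(1, n + 1)
        for I in combinations(range(n), r)
    )


def find_sdr(sets):
    """Return a list of distinct representatives (reps[i] in sets[i]),
    or None if no SDR exists."""
    owner = {}  # element -> index of the set it currently represents

    def assign(i, seen):
        for a in sets[i]:
            if a in seen:
                continue
            seen.add(a)
            # take a if it is free, or if its current owner can switch
            if a not in owner or assign(owner[a], seen):
                owner[a] = i
                return True
        return False

    for i in range(len(sets)):
        if not assign(i, set()):
            return None
    reps = [None] * len(sets)
    for a, i in owner.items():
        reps[i] = a
    return reps


good = [{1, 2}, {2, 3}, {1, 3}]
bad = [{1}, {1}, {1, 2}]
print(hall_condition(good), find_sdr(good) is not None)  # True True
print(hall_condition(bad), find_sdr(bad))                # False None
```

   In the second example the first two sets together contain only one element, so Hall’s condition fails and no SDR exists, in agreement with the theorem.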
Proof of Theorem 11.3. Proceed by induction on N. If N = 1, then of course an SDR exists for one set; just
choose any element of the set.
  Call a collection of sets {Ai : i ∈ I} with 1 ≤ |I| < N critical if $\left| \bigcup_{i \in I} A_i \right| = |I|$ (e.g. ten men who among
them know exactly ten women). Thus, there are two cases:
      • Suppose there are no critical collections. Then, pick some aN ∈ AN , and let Ãi = Ai − {aN } for i < N.
        For any I ⊆ {1, . . . , N − 1}, $\left| \bigcup_{i \in I} A_i \right| \ge |I| + 1$, so $\left| \bigcup_{i \in I} \tilde{A}_i \right| \ge |I|$; in particular each such set
        is nonempty, and Ã1 , . . . , ÃN−1 satisfies the hypothesis. By induction, there is an SDR ã1 , . . . , ãN−1 ,
        and each element is distinct from aN . Thus, ã1 , . . . , ãN−1 , aN is an SDR for the whole collection.
Proof of Theorem 12.1. The proof is essentially the “old-fashioned marriage proposal.” To wit:
    (1) Each man proposes to the woman he likes most.17 Each man proposes to only one woman, and keeps
        the proposal until it is rejected.
    (2) If all women get a proposal, they all accept, and a stable matching is found.
    (3) Otherwise, some women get more than one proposal. Each such woman accepts the one she prefers
        most, and rejects the rest. A woman who has only one proposal still waits; after all, someone better
        might come along.
    (4) The men who were rejected each propose to the woman they most prefer among those who have not yet
        turned them down. Then, return to step 2.
17Of course, you can switch the women and the men in this scenario, which is better for the women and worse for the men. . .
This algorithm terminates, because each time a man is rejected, the number of choices he has dwindles, but
since everyone would prefer to be married, then eventually someone accepts. The matching found is stable,
because if it weren’t, then suppose m1 , w1 and m2 , w2 were unstable with m1 and w2 preferring each other.
Then, m1 must have proposed to w2 before w1 , and therefore rejected by her, which means that w2 prefers m2
to m1 , which generates stability.18                                                                      
   Note that this stable matching is not necessarily unique: one might obtain a different pairing by letting the
women propose and the men choose. It turns out, though, that the algorithm discussed above is optimal
for the men: each man gets the best partner he could have in any stable pairing, and each woman gets the worst
partner she could have such that the pairing is still stable.
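The proposal procedure above can be sketched in Python. This is only an illustrative sketch, assuming equally many men and women and complete preference lists; it processes one rejected man at a time instead of in rounds, which reaches the same matching. The names `stable_matching` and `is_stable` and the sample preferences are made up for the example.

```python
from collections import deque

def stable_matching(men_prefs, women_prefs):
    """Men-proposing deferred acceptance.

    men_prefs[m] lists the women in m's order of preference (best first);
    women_prefs[w] lists the men in w's order of preference (best first).
    Returns a dict mapping each woman to her final partner.
    """
    # rank[w][m] is the position of m on w's list (lower is better).
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # next woman on m's list to propose to
    holding = {}                             # woman -> man whose proposal she holds
    free = deque(men_prefs)                  # men with no outstanding proposal

    while free:
        m = free.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = holding.get(w)
        if current is None:
            holding[w] = m            # first proposal: she waits with it
        elif rank[w][m] < rank[w][current]:
            holding[w] = m            # she trades up and rejects the old proposal
            free.append(current)
        else:
            free.append(m)            # she rejects the new proposal
    return holding

def is_stable(matching, men_prefs, women_prefs):
    """Check that no man and woman prefer each other to their partners."""
    partner_of_man = {m: w for w, m in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == partner_of_man[m]:
                break  # the remaining women are ones m likes less than his partner
            wp = women_prefs[w]
            if wp.index(m) < wp.index(matching[w]):
                return False
    return True

men = {"a": ["x", "y", "z"], "b": ["x", "z", "y"], "c": ["y", "x", "z"]}
women = {"x": ["b", "a", "c"], "y": ["a", "c", "b"], "z": ["a", "b", "c"]}
matching = stable_matching(men, women)
print(matching, is_stable(matching, men, women))
```

Swapping the two arguments gives the version in which the women propose.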
   The last graph-theoretic topic of this class will be network flow. Here, one takes a directed graph and two
distinguished nodes s and t to try to get something from s to t.
Definition 12.2. A flow on a directed graph is a real-valued nonnegative function on the edges of the graph, such
that the net flow of all values at a vertex other than s or t (signed by whether they’re coming in or going out) is
zero. Formally, if v ∈ V − {s, t}, then $\sum_x f(v, x) - \sum_y f(y, v) = 0$ (where the sums are over all x, y where
those edges exist, or such that the flow is defined).
   Note that negative weights could be used instead of directed edges, but one is more intuitive than the other.
   Then, there can be a capacity on each edge c(x, y) ≥ 0, representing the maximum possible flow through the
pipe x to y, so the only flows considered are those for which f(x, y) ≤ c(x, y). Define the volume of a flow f
to be $|f| = \sum_v f(s, v) - \sum_u f(u, s)$. Then, what is the flow with the maximum volume?
Claim. Since there are no points of accumulation, the amount flowing out of s must be equal to the
amount flowing into t.
Proof.
\[ \sum_{v \ne s,t} \left( \sum_w f(v, w) - \sum_u f(u, v) \right) = 0, \]
and
\[ \sum_{v} \left( \sum_w f(v, w) - \sum_u f(u, v) \right) = 0, \]
so
\[ \sum_{v = s,t} \left( \sum_w f(v, w) - \sum_u f(u, v) \right) = 0, \]
which is just the difference in the net flow out of s and the net flow into t. □
     More generally, one could take some set S ⊆ V , and look at the flow from things in S to things not in S.
Claim.
\[ \sum_{\substack{u \in S \\ v \notin S}} f(u, v) - \sum_{\substack{u \notin S \\ v \in S}} f(u, v) = \sum_{u \in S} \sum_{v} f(u, v) - \sum_{v \in S} \sum_{u} f(u, v). \]
That is, if the things in S are included in the flow calculation, the result doesn’t change: this is just the sum of
the net flows at u for each u ∈ S.
     If S ⊆ V − {s, t}, then this is zero, and if s ∈ S but t ∉ S, then this is vol(f) = |f|.
Definition 12.3. A cut is a collection of vertices S with s ∈ S but t ∈ S^c, or a partition such that s is in one
half, but t is in the other.
     Thus, for any cut, we have
\[ |f| = \sum_{\substack{u \in S \\ v \in S^c}} f(u, v) - \sum_{\substack{u \in S^c \\ v \in S}} f(u, v). \]
18Notice the lack of formalism in this proof. This is okay, because the original paper didn’t have any either!
                                                13. Network Flow II: 5/13/13
    Recall that on a directed graph G, a flow is a nonnegative function f(x, y) on the edges of G (i.e. f(x, y) = 0
if there is no edge from x to y), such that for any v ≠ s, t, $\sum_w f(v, w) - \sum_w f(w, v) = 0$ (intuitively, there are
no leaks in the graph). If the maximum possible flow is c(x, y), which is some other nonnegative function, one
considers only flows such that 0 ≤ f(x, y) ≤ c(x, y). In some situations, it will be assumed that the capacity is
finite, which doesn’t make too much of a difference if it’s sufficiently large.
    Suppose S is any subset of the vertex set V, and $\bar{S} = V - S$. Then,
\[ \sum_{\substack{x \in S \\ y \in \bar{S}}} f(x, y) - \sum_{\substack{x \in S \\ y \in \bar{S}}} f(y, x) = \sum_{x \in S} \sum_{y \in V} f(x, y) - \sum_{y \in S} \sum_{x \in V} f(x, y). \]
This is zero if S ⊆ V − {s, t}, and if s ∈ S but t ∉ S, then this becomes |f|. Recall that a cut is a partition
of G such that s is on one side and t is on the other. Then, the capacity of a cut is $\sum_{x \in S,\, y \in \bar{S}} c(x, y)$, which
represents the amount that can flow at the boundary. Then, since f(x, y) ≤ c(x, y) for all x and y,
\[ |f| = \sum_{\substack{x \in S \\ y \in \bar{S}}} f(x, y) - \sum_{\substack{x \in S \\ y \in \bar{S}}} f(y, x) \le \sum_{\substack{x \in S \\ y \in \bar{S}}} c(x, y). \]
Thus, the capacity of any cut is at least the volume of the flow. This leads to something a little more interesting:
if the goal is to find a flow of maximum volume, then the maximum volume is at most the capacity of any
cut, and hence at most the minimum capacity of a cut. Since there are finitely many cuts, the minimum
capacity clearly exists, but the maximum flow is more nuanced: just because the volume is bounded above
doesn’t mean it attains a maximum, as with the set {x ∈ Q : x² < 2}. This requires a little bit of
elementary analysis to address fully: specifically, using the Bolzano-Weierstrass theorem, it is possible to show
that a maximal flow exists by taking a sequence of flows whose volumes approach the supremum and passing to
subsequences that converge on each edge.
Theorem 13.1. In fact, these quantities are equal: the maximum flow is equal to the minimum cut.
   This leads to some number of algorithms for finding the maximum flow given the capacities. The maximum
flow might not be unique, however; several different flows could lead to the same total flow. The following proof
is due to Ford and Fulkerson.
Proof of Theorem 13.1. Take a maximal flow f. Then, the goal is to construct a cut S such that the capacity
of S is equal to |f|. First, put s ∈ S. Then, pick any v ∉ S, and check if there is an x ∈ S such that
f(x, v) < c(x, v) or f(v, x) > 0. If so, then add v to S. Then, repeat as long as there are vertices to add.
Then, there are two things to show: first, that S is a cut (so that t ∉ S), and second, that the capacity of S is |f|.
    Suppose t ∈ S. Then, there is a sequence x0 = s, x1 , . . . , xn = t such that for all j, either c(xj , xj+1 ) −
f (xj , xj+1 ) > 0 (so there’s more room) or f (xj+1 , xj ) > 0 (so there is backflow). Let εj = max(c(xj , xj+1 ) −
f (xj , xj+1 ), f (xj+1 , xj )), so that εj > 0, and ε = min εj . Then, adjust the flow along this sequence: for each j,
if εj comes from the first case, increase f (xj , xj+1 ) by ε, and if it comes from the second case, reduce the
backflow f (xj+1 , xj ) by ε.
    This is a valid flow that satisfies the capacity bounds, and its volume is |f | + ε. This is very much like an
Euler trail, with something going in and something coming out. Then, since the volume is increased, f isn’t a
maximal flow, which is a problem. Thus, it is necessary that t 6∈ S, so S is a cut.
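    By construction, if x ∈ S and y ∉ S, then f(x, y) = c(x, y) and f(y, x) = 0, since otherwise y would have been
added to S. Thus, writing $\bar{S} = V - S$, the capacity of this cut is
\[ \sum_{\substack{x \in S \\ y \in \bar{S}}} c(x, y) = \sum_{\substack{x \in S \\ y \in \bar{S}}} f(x, y) - \sum_{\substack{x \in S \\ y \in \bar{S}}} f(y, x) = |f|. \]
□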
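The proof suggests an algorithm: keep pushing ε along such sequences until t can no longer be reached, then read off S. Here is a rough Python sketch, assuming integer capacities (so that the process stops). Choosing the sequence by breadth-first search is an assumption of this sketch, not something the proof requires, and the "more room" and "backflow" cases are merged into a single residual capacity. The function name and the sample network are made up.

```python
from collections import deque

def max_flow_min_cut(capacity, s, t):
    """capacity maps directed edges (x, y) to nonnegative integers.

    Returns (volume, S), where S is the source side of a minimum cut.
    """
    # residual[(x, y)] = room left on (x, y) plus backflow on (y, x) that can be cancelled.
    residual, neighbors = {}, {}
    for (x, y), c in capacity.items():
        residual[(x, y)] = residual.get((x, y), 0) + c
        residual.setdefault((y, x), 0)
        neighbors.setdefault(x, set()).add(y)
        neighbors.setdefault(y, set()).add(x)

    def grow_from_source():
        # Build S as in the proof: add v whenever some x already in S can still push toward v.
        parent = {s: None}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for v in neighbors.get(x, ()):
                if v not in parent and residual[(x, v)] > 0:
                    parent[v] = x
                    queue.append(v)
        return parent

    volume = 0
    while True:
        parent = grow_from_source()
        if t not in parent:
            return volume, set(parent)   # t is unreachable, so S is a cut
        # Recover the sequence s = x0, ..., xn = t and its bottleneck epsilon.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        eps = min(residual[edge] for edge in path)
        for (x, y) in path:
            residual[(x, y)] -= eps
            residual[(y, x)] += eps
        volume += eps

capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
volume, S = max_flow_min_cut(capacity, "s", "t")
cut_capacity = sum(c for (x, y), c in capacity.items() if x in S and y not in S)
print(volume, S, cut_capacity)
```

On this small network the volume and the capacity of the cut that comes out agree, as the theorem says they must.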
Proof of Theorem 11.3. Add to G two vertices: s, connected to every vertex in V1 , and t, connected to every
vertex in V2 . Then, give capacity 1 to all edges adjacent to s or t, and some large integer capacity, such as n, to
every other edge. Then, the maximum flow must be equal to the minimum cut, so suppose S is a cut
that contains s but not t. If x ∈ S ∩ V1 and y ∈ V2 − S and (x, y ) is an edge in G, then this is fine, because
c(x, y ) = n. Otherwise, the neighbors of S ∩ V1 in V2 are all also in S, and there are at least |S ∩ V1 | of them
by the precondition; each contributes an edge to t that crosses the cut, as does each edge from s to a vertex of
V1 outside S. Thus, the capacity of the cut is at least n, so the theorem follows. □
   There are lots of variants on Menger’s theorem, and here’s one related to network flow and edge connectivity:
Theorem 14.1 (Menger). Suppose G is an undirected graph and s, t ∈ V (G). Then, let p be the maximum
number of paths from s to t such that all used edges are disjoint, and let p′ be the minimum number of edges
required to disconnect s and t. Then, p = p′.
Proof. Consider a directed graph in which every edge {x, y} in G corresponds to two edges $\overrightarrow{xy}$ and $\overrightarrow{yx}$, and give
each edge a capacity of 1. Then, the max flow with f(x, y) = 0 or 1 is the number of edge-disjoint paths from
s to t, which makes sense by thinking about how the flow works, and the min cut is the smallest number of
edges necessary to disconnect s and t, so since the max-flow and min-cut are equal, p = p′. □
  Turning to enumerative combinatorics, consider some famous theorems: first, Hall’s theorem, Theorem 11.3
on SDRs, can be considered enumerative.
Definition 14.2. A poset, or partially ordered set, is a set S with an order relation < such that not all elements
need be comparable, but if x, y ∈ S are comparable, then exactly one of x < y, y < x, or x = y holds, and if
x < y and y < z, then x < z.
Example 14.3. Let X be a finite set and S be its power set, ordered by inclusion, where if A, B ⊆ X, then
A < B iff A ⊊ B (equivalently, A ≤ B iff A ⊆ B). This is clearly a partial order, but there exist A ≠ B such that
A ≮ B and B ≮ A.
Definition 14.4. A chain in a poset is a sequence x1 < · · · < xn , and an anti-chain is a set {x1 , . . . , xn } such
that no two elements are comparable.
Theorem 14.5 (Sperner). If X is a finite set of size n and S is its power set ordered by inclusion, then the size
of a largest antichain in S is $\binom{n}{\lfloor n/2 \rfloor}$.
   Some examples of antichains are sets of singleton sets, such as {x1 }, . . . , {xn }, and note that
$\max_k \binom{n}{k} = \binom{n}{\lfloor n/2 \rfloor}$.
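then
\[ \sum_{A \in \mathcal{A}} 1 \le \sum_{A \in \mathcal{A},\, |A| = k} \frac{\binom{n}{\lfloor n/2 \rfloor}}{\binom{n}{k}} \le \binom{n}{\lfloor n/2 \rfloor}, \]
so $|\mathcal{A}| \le \binom{n}{\lfloor n/2 \rfloor}$. □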
Theorem 15.1 (Dilworth). If S is any finite poset, then the minimum number of chains needed to partition S
into chains is equal to the size of the largest antichain.
   Showing ≥ is considerably easier than ≤, and the latter is beyond the scope of this class.
   In a set X, a family of k-subsets A = {A1 , . . . , Aℓ } is called intersecting if Ai ∩ Aj ≠ ∅ for all 1 ≤ i, j ≤ ℓ. If
|X| = n, then the size of the largest intersecting family is at most $\binom{n}{k}$, and if k > n/2 then all k-element subsets
   This leads into various problems in enumeration. For example, one might count functions from n-element
sets to k-element sets.
   Let S be a set of n elements and A1 , . . . , Ak be a collection of subsets of S. Then, one can use the principle
of inclusion and exclusion to calculate the number of elements of $S - \bigcup_{i=1}^k A_i = \bigcap_{i=1}^k (S - A_i)$ given
$\left|\bigcap_{i \in I} A_i\right|$ for I ⊆ {1, . . . , k}.
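Theorem 15.3 (Principle of Inclusion and Exclusion).
\[ \left| S - \bigcup_{i=1}^k A_i \right| = n - \sum_{i=1}^k |A_i| + \sum_{1 \le i < j \le k} |A_i \cap A_j| - \sum_{1 \le i < j < \ell \le k} |A_i \cap A_j \cap A_\ell| + \cdots \]
\[ = n + \sum_{\substack{I \subseteq \{1, \dots, k\} \\ I \ne \emptyset}} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right|. \qquad (15.4) \]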
  In some sense, one starts by overcounting, then undercounting, then overcounting, and so on.
  A neat application of this is that if S = {1, . . . , N} and p ≤ √N is prime, then let Ap = {n ≤ N | p | n}. If
π(x) represents the number of primes at most x, then $S - \bigcup_{p \le \sqrt{N}} A_p$ consists of 1 together with the primes
between √N and N. Thus, the principle says that
\[ 1 + \pi(N) - \pi(\sqrt{N}) = N - \sum_{p \le \sqrt{N}} |A_p| + \sum_{p < q \le \sqrt{N}} |A_p \cap A_q| - \cdots \]
More generally, for d not necessarily prime, let Ad = {n ≤ N | d | n}, so that |Ad | = ⌊N/d⌋. The numbers that
are interesting here are products of distinct primes d = p1 · · · pℓ (i.e. are square-free). This number d appears in
the formula with sign (−1)^ℓ.
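As a sanity check, the sieve identity can be evaluated directly for small N. The following Python sketch (with made-up helper names `primes_up_to` and `sieve_count`, and naive trial division, which is only sensible for tiny N) sums the signed terms over all square-free products d of primes up to √N and compares the result with a direct count of primes.

```python
from itertools import combinations
from math import isqrt, prod

def primes_up_to(m):
    """Primes <= m by trial division (fine for small m)."""
    return [p for p in range(2, m + 1) if all(p % q for q in range(2, isqrt(p) + 1))]

def sieve_count(N):
    """Count n <= N divisible by no prime p <= sqrt(N), by inclusion and exclusion."""
    small = primes_up_to(isqrt(N))
    total = 0
    for ell in range(len(small) + 1):
        for subset in combinations(small, ell):
            d = prod(subset)                  # square-free; d = 1 for the empty product
            total += (-1) ** ell * (N // d)   # |A_d| = floor(N / d), with sign (-1)^ell
    return total

N = 100
direct = 1 + len(primes_up_to(N)) - len(primes_up_to(isqrt(N)))
print(sieve_count(N), direct)
```

For N = 100 both sides come out to 22: the number 1 plus the 21 primes between 10 and 100.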
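   In number theory, the Möbius function is an important concept:
\[ \mu(d) = \begin{cases} 1, & d = 1 \\ (-1)^\ell, & d = p_1 \cdots p_\ell \text{ with the } p_i \text{ distinct primes} \\ 0, & p^2 \mid d \text{ for some prime } p. \end{cases} \]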
Proof of Theorem 15.3. The formula for $\left|\bigcup_{i=1}^k A_i\right|$ will be found, namely
$\left|\bigcup_{i=1}^k A_i\right| = \sum_{\emptyset \ne I \subseteq \{1, \dots, k\}} (-1)^{|I|-1} \left|\bigcap_{i \in I} A_i\right|$. Pick some a ∈ S and suppose that it appears
in r of the A1 , . . . , Ak , so that 0 ≤ r ≤ k. Then, the left-hand side of this formula counts a once if
r ≥ 1 and zero times if r = 0. The right-hand side counts a zero times if r = 0, and if r ≥ 1, then suppose
a ∈ Ai1 , . . . , Air , and in the right-hand side a is counted
\[ \sum_{\emptyset \ne I \subseteq \{i_1, \dots, i_r\}} (-1)^{|I|-1} \cdot 1 = \sum_{\ell=1}^{r} (-1)^{\ell-1} \binom{r}{\ell} \]
times. Since $\sum_{\ell=0}^{r} (-1)^{\ell} \binom{r}{\ell} = (1 - 1)^r = 0$, the quantity calculated must be 1. The formula (15.4) is the
complement of this one, and follows as a result. □
   Consider the set of permutations on {1, . . . , n}, which are required to be bijections.20 There are n!
permutations, but consider the number of derangements: permutations which have no fixed points. Imagine putting n
letters in n envelopes such that no letter goes in the correct envelope.21 How many derangements are there?
Let Aℓ be the set of permutations that fix ℓ; then, none of the derangements belong to any of these sets, so the
total number of derangements is
\[ \left| \bigcap_{\ell=1}^{n} (S_n - A_\ell) \right| = n! + \sum_{\substack{I \subseteq \{1, \dots, n\} \\ I \ne \emptyset}} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right|. \]
   19There is a sculpture near the Cantor Arts Center that is relevant to this sieve.
   20Recall that a function f : A → B is injective if whenever f (a) = f (b) for a, b ∈ A, then a = b. A function is surjective if it is onto:
f : A → B is surjective if every b ∈ B has an a ∈ A such that f (a) = b. A bijection is a function that is both injective and surjective.
    21Apparently this is a homework question in one of the professor’s other classes. Oops.
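In $\bigcap_{i \in I} A_i$, the elements of I are fixed and the rest are permuted, so there are (n − |I|)! such permutations, so
this becomes
\[ \left| \bigcap_{\ell=1}^{n} (S_n - A_\ell) \right| = n! + \sum_{\ell=1}^{n} \sum_{|I| = \ell} (-1)^{\ell} (n - \ell)! = n! + \sum_{\ell=1}^{n} (-1)^{\ell} \binom{n}{\ell} (n - \ell)! = n! + \sum_{\ell=1}^{n} (-1)^{\ell} \frac{n!}{\ell!} \]
\[ = n! \left( 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!} \right), \]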
which is asymptotic to n!/e as n → ∞. Thus, a positive proportion of permutations (about 1/e ≈ 37%) are
derangements for large n. There is literature on what a random permutation might look like (which has
applications, such as shuffling cards). The probability of having k fixed points is approximately 1/(e · k!), which
forms a Poisson distribution. There are lots of similar problems: imagine a set of 100 prisoners who are given
numbers on their backs (which they cannot see). There are 100 boxes, and each prisoner can look at 50. If all
of them see their numbers, they can go free. There’s a reasonably good strategy that they can employ; can you
find it?
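The derangement count is easy to check by brute force for small n. This Python sketch compares the alternating-sum formula with direct enumeration and with the nearest integer to n!/e; the function names are made up for the example.

```python
from itertools import permutations
from math import e, factorial

def derangements_formula(n):
    # n! (1 - 1/1! + 1/2! - ... + (-1)^n / n!), kept in exact integer arithmetic.
    return sum((-1) ** ell * (factorial(n) // factorial(ell)) for ell in range(n + 1))

def derangements_brute(n):
    # Count permutations of {0, ..., n-1} with no fixed point.
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

for n in range(1, 8):
    print(n, derangements_formula(n), derangements_brute(n), round(factorial(n) / e))
```

The three columns agree for these n, which illustrates how quickly the ratio to n! settles near 1/e.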
   A problem called the twelve-fold way asks for the number of functions f : N → K, where N = {1, . . . , n} and
K = {1, . . . , k}. One may wish to count arbitrary functions, injections, or surjections, giving three options. Then,
there are four ways of thinking of the functions as the same (which leads to the number twelve): there may be
no restriction, or one may wish to consider them up to a permutation of N, or of K, or of both.
   This can be imagined as placing n balls in k boxes, such that there are no restrictions on the number of balls
in each box, at most one ball in each box, or at least one ball in each box. One could have the balls colored with
n colors and the boxes labeled, or the balls have the same color and the boxes have different labels, or the balls
have different colors and the boxes look identical, or the balls and boxes are indistinguishable among themselves.
This forms a natural collection of enumeration problems, five of which are difficult and seven of which are easy.
 Case i. There are k^n functions N → K, where all balls and boxes are distinct.
Case ii. How many injective functions are there? There are k choices for the location for the first ball, and
          k − 1 for the second, k − 2 choices for the third, etc. If k < n, then there are zero choices, so the total
          number of functions is k(k − 1)(k − 2) · · · (k − n + 1), a number called the falling factorial.
Case iii. How many surjective functions are there? We will return to this next week, using the principle of
          inclusion and exclusion or setting up a recurrence. This would be a good thing to think about. Let Ai be
          the set of balls in box i; since the function is surjective, these are all nonempty sets. Thus, the
          goal is to partition {1, . . . , n} into k nonempty sets. This is different from writing n as the sum of k
          numbers because the elements of the set are distinct.
Case iv. Suppose the balls are indistinguishable, but the boxes aren’t. Then, the goal is to consider all such
          functions, so if box i contains ai balls, then a1 + · · · + ak = n, and it is possible to permute the ai .
          There are two ways to approach this:
             Imagine all of the balls are red. Put some partitions, represented by k − 1 green balls, between the
          red ones, such that green balls may be adjacent. Then, everything to the left of the first green ball goes
          in box 1, everything between the first and the second goes to box 2, etc. Thus, the goal is to choose
          locations for the k − 1 green balls among the total of n + k − 1 slots (after which the red balls can be
          added), giving a total of $\binom{n+k-1}{k-1}$.
             The second proof uses a very useful idea called a generating function. For each box i, take a series
          (1 + x + x² + x³ + · · · ). Then, the number of balls in box i is the exponent of the term chosen from
          the i-th series, and the coefficient of x^n in the product of all of these series is the number we want: it
          counts all distinguishable distributions of the n balls into k boxes. The geometric series converges to
          1/(1 − x), so if F (x) = 1/(1 − x)^k represents the product, one can compute the nth derivative at zero,
          which once divided by n! is the coefficient of x^n. This becomes $\binom{n+k-1}{k-1}$.
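Both arguments for Case iv can be checked on small cases. In the Python sketch below (function names are made up), one routine enumerates the solutions of a1 + · · · + ak = n directly, and the other multiplies out k copies of the series, truncated at x^n since higher powers cannot contribute, and reads off the coefficient of x^n.

```python
from itertools import product
from math import comb

def count_by_brute_force(n, k):
    # Tuples (a_1, ..., a_k) of nonnegative integers with a_1 + ... + a_k = n.
    return sum(1 for a in product(range(n + 1), repeat=k) if sum(a) == n)

def count_by_generating_function(n, k):
    # Multiply k copies of 1 + x + ... + x^n, dropping powers above x^n.
    coeffs = [1] + [0] * n                 # the constant polynomial 1
    for _ in range(k):
        new = [0] * (n + 1)
        for i, c in enumerate(coeffs):
            for j in range(n + 1 - i):     # pick x^j from the next factor
                new[i + j] += c
        coeffs = new
    return coeffs[n]

n, k = 4, 3
print(count_by_brute_force(n, k), count_by_generating_function(n, k), comb(n + k - 1, k - 1))
```

For n = 4 and k = 3 all three numbers are 15.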
Case vi. If the functions are required to be surjective, the goal is to put n identical balls into k boxes such that
         each box has at least one ball. After removing one ball from each box, the goal is to put n − k balls
         in k boxes with no conditions, which reduces to Case iv, giving $\binom{n-k+k-1}{k-1} = \binom{n-1}{k-1}$, and there are of
         course no ways if k > n. This illustrates another way to approach Case iv: one chooses ℓ boxes for
         some 1 ≤ ℓ ≤ k that have at least one ball, so the total number is
\[ \sum_{\ell=1}^{k} \binom{k}{\ell} \binom{n-1}{\ell-1} = \binom{n+k-1}{k-1}. \]
\[ k!\, S(n, k) = \sum_{\ell=0}^{k} (-1)^{\ell} \binom{k}{\ell} (k - \ell)^n, \]
 It will also be nice to know pk (n), the number of partitions of n into k parts. A recursive formula is given
 by noticing that if one wishes to write n = a1 + · · · + ak such that a1 ≥ · · · ≥ ak ≥ 1, then either ak = 1,
 which gives pk−1 (n − 1) options, or ak ≥ 2, in which case all of the ai are at least 2, so
 n − k = (a1 − 1) + · · · + (ak − 1) is an expression with every term at least 1, contributing pk (n − k) options.
 Thus, pk (n) = pk−1 (n − 1) + pk (n − k).
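The recurrence for pk (n) translates directly into a memoized function. The base cases below (p0 (0) = 1, and pk (n) = 0 when n < k) are not spelled out in the notes and are assumptions of this sketch, as are the function names; the brute-force counter is only there as a cross-check.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(k, n):
    """Number of ways to write n = a_1 + ... + a_k with a_1 >= ... >= a_k >= 1."""
    if k == 0:
        return 1 if n == 0 else 0   # only the empty sum has zero parts
    if n < k:
        return 0                    # k parts of size >= 1 need n >= k
    # Either the smallest part is 1 (remove it), or every part is >= 2 (subtract 1 from each).
    return p(k - 1, n - 1) + p(k, n - k)

def p_brute(k, n):
    """Count nonincreasing sequences of k positive integers summing to n."""
    def rec(parts_left, remaining, largest):
        if parts_left == 0:
            return 1 if remaining == 0 else 0
        return sum(rec(parts_left - 1, remaining - a, a)
                   for a in range(1, min(largest, remaining) + 1))
    return rec(k, n, n)

print(p(3, 7), sum(p(k, 10) for k in range(1, 11)))
```

Here p(3, 7) counts 5+1+1, 4+2+1, 3+3+1 and 3+2+2, and summing over k gives the total number of partitions of 10.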
                                           18. Combinatorial Functions: 6/3/13
    Recall the definitions of some functions given in the Twelvefold Way: the binomial coefficients are probably
familiar, but also S(n, k), the Stirling number of the second kind, which is the number of set partitions of
{1, . . . , n} into k nonempty sets, which obeys the relation S(n, k) = kS(n − 1, k) + S(n − 1, k − 1). Additionally,
there is the function pk (n), which is the number of partitions of n as n = a1 + · · · + ak such that
a1 ≥ a2 ≥ · · · ≥ ak ≥ 1. This obeys pk (n) = pk−1 (n − 1) + pk (n − k), and there are other relations given in the textbook.
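The recurrence for S(n, k) can be cross-checked against the inclusion and exclusion count of surjections, which should equal k! S(n, k). A small Python sketch follows; the base cases S(0, 0) = 1 and S(n, 0) = S(0, k) = 0 otherwise are assumptions of the sketch, and the function names are made up.

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of partitions of {1, ..., n} into k nonempty sets."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    # Element n either joins one of the k existing blocks or forms a block of its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def surjections(n, k):
    """Surjections from an n-set onto a k-set, by inclusion and exclusion."""
    return sum((-1) ** ell * comb(k, ell) * (k - ell) ** n for ell in range(k + 1))

print(stirling2(4, 2), surjections(4, 2))
print(sum(stirling2(5, k) for k in range(1, 6)))   # all set partitions of a 5-element set
```

For example, S(4, 2) = 7 and there are 14 = 2! · 7 surjections from a 4-element set onto a 2-element set.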
    Consider bijections of {1, . . . , n}. Each bijection π can be written as a cycle decomposition by seeing where 1
goes, then π(1), etc., until 1 is reached again. This is written as (1 π(1) π(π(1)) · · · ). Then, repeat with a
number that wasn’t in the first cycle, and so on. Some numbers are fixed points: if π(i ) = i , then the cycle
is (i ). The Stirling number of the first kind is the number of permutations of {1, . . . , n} which have exactly k
cycles. Then, summing over k, one would obtain all n! permutations. Notice that the 100 prisoners problem
presented earlier has a nice solution in terms of cycles: what is the probability there is a cycle of size greater than 50?
    This comes up in “nature:” 52! is the number of possible shuffles for a deck of cards. One could use this
to determine whether a deck has been shuffled correctly: does it “look like” a random permutation? Markers
indicating too many cards in the same position are a red flag.
    In all of the n! permutations, how many n-cycles are there? There are (n − 1)!, because some cycle
decompositions are identical, such as (1 2 3) and (2 3 1).
    One can also use the principle of inclusion and exclusion: for example, how many permutations of {1, . . . , n}
have no cycle of length greater than n/2? By size constraints, each permutation must have at most one such
cycle, so choosing the k elements of the long cycle, arranging them in a cycle, and permuting the rest, the answer is
\[ n! - \sum_{n \ge k > n/2} \; \sum_{\pi \text{ has a cycle of length } k} 1 = n! - \sum_{n \ge k > n/2} \binom{n}{k} (k - 1)!\, (n - k)! = n! - \sum_{n \ge k > n/2} \frac{n!}{k}. \]
Notice that if k ≤ n/2 this wouldn’t work, because some things would be double-counted.
   Recall the prisoners problem: one solution is for prisoner n to look at box n and get a number, then look
at the box with that number, and so on. The goal is to follow the cycles, and the prisoners can go free if all
cycles are of length at most 50 (otherwise, the prisoners on a longer cycle will not see the boxes with their own
numbers). This is optimal, though it’s not easy to show it. As shown above, the number of permutations for
which this holds is $100! \left( 1 - \sum_{k=51}^{100} \frac{1}{k} \right)$. An approximation can be given as
\[ \log \frac{101}{51} = \int_{51}^{101} \frac{dt}{t} \le \sum_{k=51}^{100} \frac{1}{k} \le \int_{50}^{100} \frac{dt}{t} = \log 2, \]
showing that the chance of failure is at least log(101/51) ≈ 0.683 and at most log 2 ≈ 0.693, so the chance of
success is a little under 1/3, which is pretty nice. Notice that if this fails, then more than 50 people fail to see
their number.
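The success probability for the cycle-following strategy can be computed exactly, and the counting formula can be checked by brute force on a small number of prisoners. In this Python sketch the function names are made up, and each of n prisoners is assumed to open ⌊n/2⌋ boxes.

```python
from fractions import Fraction
from itertools import permutations
from math import log

def success_probability(n):
    # 1 - sum over n >= k > n/2 of 1/k, as an exact fraction.
    return 1 - sum(Fraction(1, k) for k in range(n // 2 + 1, n + 1))

def longest_cycle(perm):
    """Length of the longest cycle of a permutation given as a tuple of images."""
    seen, best = set(), 0
    for start in range(len(perm)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        best = max(best, length)
    return best

def brute_force_probability(n):
    perms = list(permutations(range(n)))
    good = sum(1 for p in perms if longest_cycle(p) <= n // 2)
    return Fraction(good, len(perms))

print(brute_force_probability(6), success_probability(6))
print(float(success_probability(100)), 1 - log(2), 1 - log(101 / 51))
```

For 100 prisoners the exact value lands between the two integral bounds, at roughly 0.31.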
    Further questions can be asked: what happens if fifty boxes are replaced by 35? In general, if they must
inspect n/u out of n boxes, then as n gets large one sees the Dickman-de Bruijn function ρ(u): we saw that
ρ(2) = 1 − ln 2, and this number is always positive. This actually has to do with unpublished work of Ramanujan
(of course). This function occurs in lots of ways: if one factors a random n (whatever random means here)
into primes $n = p_1^{\alpha_1} \cdots p_k^{\alpha_k}$, then what is the chance that all of these prime factors are less than √n? This
ends up being 1 − ln 2, and there are all sorts of interesting connections between what a random number looks
like, a random permutation looks like, and a random polynomial looks like, and there’s a lot of structure here
(Don Knuth did some work here, too). This function satisfies a differential-difference equation, funnily enough:
uρ′(u) = −ρ(u − 1), and ρ(u) = 1 for 0 ≤ u ≤ 1 (the prisoners always succeed). Ramanujan wrote down what
the principle of inclusion and exclusion would give for this in cases for u ≤ 6, and then said “and so on.”
    These sorts of things are useful in case one doesn’t care as much about exact formulas, but instead the
asymptotics of how these combinatorial functions behave over the long run. Generally, one would use tools of
analysis or calculus to determine this. This also relates to the philosophical idea of a good formula, which is an
expression for a function that is ideally easy to compute or easy to understand.
    One of the most basic and useful identities is Stirling’s formula for n!. We don’t know how to compute
factorials exactly faster than O(n) (just multiplying things together). Since there doesn’t seem to be a great
formula, there’s at least one for an approximation:
\[ n! \sim \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n, \]
where ∼ means that
\[ \lim_{n \to \infty} \frac{n!}{\sqrt{2 \pi n}\, (n/e)^n} = 1. \]
This can be used to understand how pk (n) or S(n, k) behave as n → ∞. One could also consider the horrible
Bell numbers $B(n) = \sum_{k=1}^{n} S(n, k)$ (all possible partitions of {1, . . . , n} into nonempty sets), which have no
good formula, but asymptotically are better to understand, and similarly $p(n) = \sum_{k=1}^{n} p_k(n)$. One can count
trees on n vertices up to isomorphism, for example, but this is a headache for general n, and the best case is an
asymptotic understanding.
   Now for some formulas: one useful technique is to compare a sum with an integral of the expression being
summed.
\[ \log N! = \sum_{k=1}^{N} \log k \le \sum_{k=1}^{N} \int_{k}^{k+1} \log t \, dt = \int_{1}^{N+1} \log t \, dt. \]
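Bounding each term from below instead, $\log k \ge \int_{k-1}^{k} \log t \, dt$, gives
$\log N! \ge \int_{1}^{N} \log t \, dt = N \log N - N + 1$. Exponentiating, $N! \ge (N/e)^{N} e$, which is nice to know.
The integral for the upper bound can also be evaluated, giving (N + 1) log(N + 1) − N. Since
log(N + 1) − log N = log(1 + 1/N), this can be expanded out using a Taylor series: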
\[ \log(1 + x) = \int_{0}^{x} \frac{dt}{1+t} = \int_{0}^{x} (1 - t + t^{2} - t^{3} + \cdots) \, dt = x - \frac{x^{2}}{2} + \frac{x^{3}}{3} - \cdots, \]
so log(1 + 1/N) ≤ 1/N, and the upper bound simplifies to log N! ≤ (N + 1)(log N + 1/N) − N. In summary,
\[ \left(\frac{N}{e}\right)^{N} e \le N! \le \frac{N^{N+1}}{e^{N}} \, e^{1+1/N}, \]
and the two bounds differ by a factor of $N e^{1/N}$. Stirling’s formula indicates that this difference is split, up to a
constant, as the $\sqrt{N}$ term.
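The two-sided bound can be checked numerically. The sketch below works with logarithms of all three quantities; `log_factorial_bounds` is a hypothetical helper name, and `math.lgamma(N + 1)` stands in for log N!.

```python
import math

def log_factorial_bounds(N: int) -> tuple[float, float, float]:
    """Return (log lower bound, log N!, log upper bound) from the summary above."""
    lower = N * math.log(N) - N + 1              # log of (N/e)^N * e
    exact = math.lgamma(N + 1)                   # log N!
    upper = (N + 1) * (math.log(N) + 1 / N) - N  # log of N^(N+1) * e^(1+1/N) / e^N
    return lower, exact, upper

for N in (5, 50, 500):
    lower, exact, upper = log_factorial_bounds(N)
    # Gap between the bounds is log N + 1/N; Stirling places log N! about
    # (1/2) log N above the lower bound, up to an additive constant.
    print(N, exact - lower, 0.5 * math.log(N), upper - lower)
```

Comparing the second and third printed columns shows the gap to the lower bound tracking half of log N plus a roughly constant offset, consistent with the square-root factor in Stirling’s formula.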
    In general, converting to an integral is a good way to approach a solution; for example, the harmonic sum
$\sum_{k=1}^{N} 1/k$ should be approximately log N. Similarly,
\[ \sum_{n=1}^{N} n^{k} \approx \int_{1}^{N} x^{k} \, dx \approx \frac{N^{k+1}}{k+1}, \]
and this approximation can be made precise with the upper and lower bounds, as seen in the previous examples.
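Both comparisons are easy to test by brute force. In this sketch the function names `harmonic` and `power_sum` are invented for illustration, and the chosen values of N and k are arbitrary.

```python
import math

def harmonic(N: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/N."""
    return sum(1 / k for k in range(1, N + 1))

def power_sum(N: int, k: int) -> int:
    """Exact value of 1^k + 2^k + ... + N^k."""
    return sum(n**k for n in range(1, N + 1))

N = 1000
# Sum-versus-integral bounds: log(N + 1) <= H_N <= 1 + log N.
print(math.log(N + 1), harmonic(N), 1 + math.log(N))

for k in (1, 2, 3):
    # Ratio of the sum to N^(k+1)/(k+1); it should be close to 1 for large N.
    print(k, power_sum(N, k) / (N ** (k + 1) / (k + 1)))
```

The same integral comparison used for log N! sandwiches the power sum between $N^{k+1}/(k+1)$ and $(N+1)^{k+1}/(k+1)$, which is why the ratio tends to 1.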
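    Another approximation can be made: since $\log k \approx \int_{k-1}^{k} \log t \, dt$, then for k ≥ 2,
\[ \int_{k-1}^{k} \log k \, dt - \int_{k-1}^{k} \log t \, dt = \int_{k-1}^{k} \log\left(\frac{k}{t}\right) dt. \]
If t = k − y, then 0 ≤ y ≤ 1 is an appropriate change of variables, yielding
\[ \int_{k-1}^{k} \log\left(\frac{k}{t}\right) dt = \int_{0}^{1} \log\left(\frac{1}{1 - y/k}\right) dy. \]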
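The Taylor series $\log(1/(1 - x)) = x + x^{2}/2 + x^{3}/3 + \cdots$ gives
\[ \int_{0}^{1} \log\left(\frac{1}{1 - y/k}\right) dy = \int_{0}^{1} \sum_{\ell=1}^{\infty} \frac{y^{\ell}}{k^{\ell}} \cdot \frac{1}{\ell} \, dy = \sum_{\ell=1}^{\infty} \frac{1}{k^{\ell}} \cdot \frac{1}{\ell} \int_{0}^{1} y^{\ell} \, dy = \sum_{\ell=1}^{\infty} \frac{1}{k^{\ell} \, \ell (\ell + 1)}. \]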
Thus,
\[ \log N! = \sum_{k=2}^{N} \log k = \int_{1}^{N} \log t \, dt + \sum_{k=2}^{N} \left( \log k - \int_{k-1}^{k} \log t \, dt \right). \]
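The decomposition can be verified numerically by replacing each correction term with its series $\sum_{\ell \ge 1} 1/(k^{\ell} \ell(\ell+1))$. This is a sketch under stated assumptions: the names `correction` and `log_factorial_via_corrections` are hypothetical, the series is truncated at 60 terms (an arbitrary cutoff that is ample for k ≥ 2), and `math.lgamma` supplies the reference value of log N!.

```python
import math

def correction(k: int, terms: int = 60) -> float:
    """Truncated series for log k - (integral of log t from k-1 to k), k >= 2."""
    total, power = 0.0, 1.0
    for l in range(1, terms + 1):
        power /= k                        # power == 1 / k**l
        total += power / (l * (l + 1))
    return total

def log_factorial_via_corrections(N: int) -> float:
    """Integral of log t over [1, N] plus the correction terms for k = 2..N."""
    integral = N * math.log(N) - N + 1    # closed form of the integral
    return integral + sum(correction(k) for k in range(2, N + 1))

for N in (5, 50, 500):
    print(N, log_factorial_via_corrections(N), math.lgamma(N + 1))
```

The leading term of each correction is 1/(2k), so their sum grows like (1/2) log N, which again matches the square-root factor in Stirling’s formula.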