ECE 606, Algorithms
Mahesh Tripunitara
[email protected]
ECE, University of Waterloo
Acknowledgements
Contents
Lecture 1 5
Lecture 2 65
Lecture 3 107
Lecture 4 147
Lecture 5 201
Lecture 6 259
Lecture 7 297
Lecture 8 341
Lecture 9 385
Lecture 10 403
Lecture 11 419
Lecture 12 437
Lecture 1
• Introduction to Python 3.
Many chapters of this book touch on the elements of discrete mathematics. This
chapter reviews more completely the notations, definitions, and elementary prop-
erties of sets, relations, functions, graphs, and trees. Readers already well versed
in this material need only skim this chapter.
B.1 Sets
• N denotes the set of natural numbers, that is, the set {0, 1, 2, . . .}.2
1 A variation of a set, which can contain the same object more than once, is called a multiset.
2 Some authors start the natural numbers with 1 instead of 0. The modern trend seems to be to start
with 0.
If all the elements of a set A are contained in a set B, that is, if x ∈ A implies
x ∈ B, then we write A ⊆ B and say that A is a subset of B. A set A is a
proper subset of B, written A ⊂ B, if A ⊆ B but A ≠ B.
All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.
A ∩ B = {x : x ∈ A and x ∈ B} .
A ∪ B = {x : x ∈ A or x ∈ B} .
A − B = {x : x ∈ A and x ∉ B} .
Idempotency laws:
A ∩ A = A ,
A ∪ A = A .
Copyright @ 2001. MIT Press.
Commutative laws:
A ∩ B = B ∩ A ,
A ∪ B = B ∪ A .
Figure B.1 A Venn diagram illustrating the first of DeMorgan’s laws (B.2). Each of the sets A, B,
and C is represented as a circle.
Associative laws:
A ∩ (B ∩ C) = (A ∩ B) ∩ C ,
A ∪ (B ∪ C) = (A ∪ B) ∪ C .
Distributive laws:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) ,    (B.1)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) .
Absorption laws:
A ∩ (A ∪ B) = A ,
A ∪ (A ∩ B) = A .
DeMorgan’s laws:
A − (B ∩ C) = (A − B) ∪ (A − C) ,    (B.2)
A − (B ∪ C) = (A − B) ∩ (A − C) .
The first of DeMorgan’s laws is illustrated in Figure B.1, using a Venn diagram, a
graphical picture in which sets are represented as regions of the plane.
Often, all the sets under consideration are subsets of some larger set U called the
universe. For example, if we are considering various sets made up only of integers,
the set Z of integers is an appropriate universe. Given a universe U , we define the
complement of a set A as A = U − A. For any set A ⊆ U , we have the following
laws:
$\overline{\overline{A}} = A$ ,
$A \cap \overline{A} = \emptyset$ ,
$A \cup \overline{A} = U$ .
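The laws above are easy to spot-check with Python's built-in set type; the particular sets A, B, C and the universe U below are our own illustrative choices:

```python
# Spot-check of the set laws on concrete (arbitrarily chosen) finite sets.
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}
U = set(range(10))  # a universe containing A, B, C

assert A & (B | C) == (A & B) | (A & C)   # distributive law (B.1)
assert A - (B & C) == (A - B) | (A - C)   # DeMorgan's law (B.2)
assert U - (U - A) == A                   # double complement
assert A & (U - A) == set()               # A and its complement are disjoint
assert A | (U - A) == U                   # their union is the universe
```

A check on one example is not a proof, of course; it is only a quick sanity check.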
DeMorgan’s laws (B.2) can be rewritten with complements. For any two sets
B, C ⊆ U , we have

$\overline{B \cap C} = \overline{B} \cup \overline{C}$ ,
$\overline{B \cup C} = \overline{B} \cap \overline{C}$ .
Two sets A and B are disjoint if they have no elements in common, that is, if
A ∩ B = ∅. A collection S = {Si} of nonempty sets forms a partition of a set S if

• the sets are pairwise disjoint, that is, Si, Sj ∈ S and i ≠ j imply Si ∩ Sj = ∅, and

• their union is S, that is, $S = \bigcup_{S_i \in \mathcal{S}} S_i$ .
The set of all subsets of a set S, including the empty set and S itself, is denoted 2^S
and is called the power set of S. For example, 2^{a,b} = {∅, {a} , {b} , {a, b}}. The
power set of a finite set S has cardinality 2^{|S|}.
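For small sets, the power set can be materialized directly; a sketch using itertools, with the function name `power_set` our own:

```python
from itertools import combinations

def power_set(S):
    # All subsets of S, one size at a time: sizes 0, 1, ..., |S|.
    S = list(S)
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

P = power_set({'a', 'b'})
assert len(P) == 2 ** 2          # |2^S| = 2^|S|
assert set() in P and {'a', 'b'} in P
```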
We sometimes care about setlike structures in which the elements are ordered.
An ordered pair of two elements a and b is denoted (a, b) and can be defined
formally as the set (a, b) = {a, {a, b}}. Thus, the ordered pair (a, b) is not the
same as the ordered pair (b, a).
The Cartesian product of two sets A and B, denoted A × B, is the set of all
ordered pairs such that the first element of the pair is an element of A and the
second element of the pair is an element of B. More formally,

A × B = {(a, b) : a ∈ A and b ∈ B} .
Exercises
B.1-1
Draw Venn diagrams that illustrate the first of the distributive laws (B.1).
B.1-2
Prove the generalization of DeMorgan’s laws to any finite collection of sets:
$\overline{A_1 \cap A_2 \cap \cdots \cap A_n} = \overline{A_1} \cup \overline{A_2} \cup \cdots \cup \overline{A_n}$ ,
$\overline{A_1 \cup A_2 \cup \cdots \cup A_n} = \overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_n}$ .
B.1-3
Prove the generalization of equation (B.3), which is called the principle of inclusion and exclusion:

$|A_1 \cup A_2 \cup \cdots \cup A_n| = \sum_i |A_i| - \sum_{i<j} |A_i \cap A_j| + \sum_{i<j<k} |A_i \cap A_j \cap A_k| - \cdots + (-1)^{n-1}\,|A_1 \cap A_2 \cap \cdots \cap A_n|$ .
B.1-4
Show that the set of odd natural numbers is countable.
B.1-5
Show that for any finite set S, the power set 2^S has 2^{|S|} elements (that is, there
are 2^{|S|} distinct subsets of S).
B.1-6
Give an inductive definition for an n-tuple by extending the set-theoretic definition
for an ordered pair.
B.2 Relations
Proof For the first part of the proof, we must show that the equivalence classes
of R are nonempty, pairwise-disjoint sets whose union is A. Because R is reflex-
ive, a ∈ [a], and so the equivalence classes are nonempty; moreover, since every
element a ∈ A belongs to the equivalence class [a], the union of the equivalence
classes is A. It remains to show that the equivalence classes are pairwise dis-
joint, that is, if two equivalence classes [a] and [b] have an element c in common,
then they are in fact the same set. Now a R c and b R c, which by symmetry and
transitivity imply a R b. Thus, for any arbitrary element x ∈ [a], we have x R a
implies x R b, and thus [a] ⊆ [b]. Similarly, [b] ⊆ [a], and thus [a] = [b].
For the second part of the proof, let A = {Ai } be a partition of A, and define
R = {(a, b) : there exists i such that a ∈ Ai and b ∈ Ai }. We claim that R is an
equivalence relation on A. Reflexivity holds, since a ∈ Ai implies a R a. Symme-
try holds, because if a R b, then a and b are in the same set Ai , and hence b R a. If
a R b and b R c, then all three elements are in the same set, and thus a R c and tran-
sitivity holds. To see that the sets in the partition are the equivalence classes of R,
observe that if a ∈ Ai , then x ∈ [a] implies x ∈ Ai , and x ∈ Ai implies x ∈ [a].
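The first part of this proof can be exercised computationally: given an equivalence relation presented as a set of pairs, collect the classes and observe that they partition the underlying set. A small sketch (helper name ours), using "congruent modulo 3" on {0, . . . , 8}:

```python
def equivalence_classes(A, R):
    # R is a set of pairs; assumed reflexive, symmetric and transitive on A.
    classes = []
    for a in A:
        cls = frozenset(x for x in A if (a, x) in R)
        if cls not in classes:
            classes.append(cls)
    return classes

A = range(9)
R = {(a, b) for a in A for b in A if (a - b) % 3 == 0}
classes = equivalence_classes(A, R)

assert len(classes) == 3                       # three residue classes mod 3
assert set().union(*classes) == set(A)         # their union is A
# and they are pairwise disjoint:
assert all(c & d == set() for c in classes for d in classes if c != d)
```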
For example, in a collection of different-sized boxes there may be several maximal boxes that don’t
fit inside any other box, yet no single “maximum” box into which any other box
will fit.3
3 To be precise, in order for the “fit inside” relation to be a partial order, we need to view a box as
fitting inside itself.
EBSCO : eBook Collection (EBSCOhost) - printed on 8/4/2019 9:14 PM via UNIV OF WATERLOO
AN: 139815 ; Cormen, Thomas H..; Introduction to Algorithms
Account: s9860349.main.ehost
For example, the relation “≤” is a total order on the natural numbers, but the “is a
descendant of” relation is not a total order on the set of all people, since there are
individuals neither of whom is descended from the other.
Exercises
B.2-1
Prove that the subset relation “⊆” on all subsets of Z is a partial order but not a
total order.
B.2-2
Show that for any positive integer n, the relation “equivalent modulo n” is an equiv-
alence relation on the integers. (We say that a ≡ b (mod n) if there exists an
integer q such that a − b = qn.) Into what equivalence classes does this relation
partition the integers?
B.2-3
Give examples of relations that are
a. reflexive and symmetric but not transitive,
b. reflexive and transitive but not symmetric,
c. symmetric and transitive but not reflexive.
B.2-4
Let S be a finite set, and let R be an equivalence relation on S × S. Show that if
in addition R is antisymmetric, then the equivalence classes of S with respect to R
are singletons.
B.2-5
Professor Narcissus claims that if a relation R is symmetric and transitive, then it
is also reflexive. He offers the following proof. By symmetry, a R b implies b R a.
Transitivity, therefore, implies a R a. Is the professor correct?
B.3 Functions
Given two sets A and B, a function f is a binary relation on A × B such that for
all a ∈ A, there exists precisely one b ∈ B such that (a, b) ∈ f . The set A is called
the domain of f , and the set B is called the codomain of f . We sometimes write
f : A → B.
A function is a surjection if its range is its codomain. For example, the function
f (n) = ⌊n/2⌋ is a surjective function from N to N, since every element in N
appears as the value of f for some argument. In contrast, the function f (n) = 2n
is not a surjective function from N to N, since no argument to f can produce 3 as a
value. The function f (n) = 2n is, however, a surjective function from the natural
numbers to the even numbers.
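On finite restrictions of domain and codomain, these definitions can be checked mechanically; a sketch with helper names of our own choosing:

```python
def is_injective(f, domain):
    # f is injective on domain if distinct arguments give distinct values.
    images = [f(a) for a in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    # f is surjective if its range equals its codomain.
    return set(f(a) for a in domain) == set(codomain)

N = range(100)
# f(n) = n // 2 maps {0..99} onto {0..49}, hitting each value twice.
assert is_surjective(lambda n: n // 2, N, range(50))
assert not is_injective(lambda n: n // 2, N)
# f(n) = 2n is injective but misses every odd number.
assert is_injective(lambda n: 2 * n, N)
assert not is_surjective(lambda n: 2 * n, N, N)
```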
Exercises
B.3-1
Let A and B be finite sets, and let f : A → B be a function. Show that
a. if f is injective, then |A| ≤ |B|;

b. if f is surjective, then |A| ≥ |B|.
B.3-2
Is the function f (x) = x + 1 bijective when the domain and the codomain are N?
Is it bijective when the domain and the codomain are Z?
Examples of sentences that are not propositions:
1. “Hey, you!”
Consider the propositions (i) “the glass is not empty,” (ii) “the glass is not
full,” and (iii) “the glass is neither empty nor full.” We are interested in the
semantics of the compound proposition (iii), that is, what its truth value is
as a function of the truth values of its constituent, atomic propositions.
To clarify what we mean, suppose the glass is indeed empty. Then Propo-
sition (i) above is false. This implies that Proposition (iii) is false as well.
Similarly, suppose Proposition (iii) is false. Then at least one of Proposition
(i) and (ii) is false. A customary way, in propositional logic, to specify a se-
mantics for a proposition that is composed of other propositions is to specify
a truth table. For our example of Propositions (i)–(iii) above, such a truth
table may look like the following.
“the glass is not empty”   “the glass is not full”   “the glass is neither empty nor full”
true                       true                      true
true                       false                     false
false                      true                      false
false                      false                     false
An important aspect of logic is to carefully distinguish syntax from seman-
tics. Syntax refers to the way we write things down. Semantics refers to
what they mean. We now specify a syntax for compound propositions. We
then clarify what the semantics of each is, via truth tables. The manner
in which we specify a syntax for compound propositions is by introducing
logical connectives, and then asserting that the use of such connectives in
particular ways is syntactically valid.
• ¬p: negation.
• p ∧ q: conjunction.
• p ∨ q: disjunction.
• p =⇒ q: implication.
• p ⇐= q: inference.
• p ⇐⇒ q: if and only if.
Given the above syntax for the use of logical connectives to make new propo-
sitions, we can further propose rules via which even more propositions can
be derived. They would be similar to the axioms of boolean algebra, which
we encounter in the context of digital circuits. We present an example here,
but leave more for a future course. For this course, we focus on employing
semantics, which we specify using truth tables, to infer more propositions.
Similarly, in the context of digital circuits, we usually employ “truth tables,”
like those employed here, rather than proofs based on the axioms of boolean
algebra.
We point out that more connectives can be introduced, for example, ⊕,
“exclusive-or.” It turns out that not all of the connectives are necessary, in
the sense that given a smaller set of them, we can realize the others: in
propositional logic, parenthesization together with ¬, ∨ and ∧ suffice, and
all other connectives can be defined using those three only. We introduce
=⇒ , ⇐= and ⇐⇒ as well because those are used heavily in this course
for proofs, and so it is useful to directly specify and understand those
connectives too. Similarly, XOR gates are convenient to have in the context
of digital circuits, even though their functionality can be realized from NOT,
AND and OR gates only.
As an example of the use of purely syntactic derivation, see proofwiki.org/
wiki/Rule_of_Material_Implication/Formulation_1/Forward_Implication/Proof,
which shows a derivation from p =⇒ q to ¬p ∨ q.
• parenthesization:
  p  (p)
  T  T
  F  F
The above truth table merely emphasizes that the truth value of p is
unaffected by parenthesization.
• negation:
  p  ¬p
  T  F
  F  T
Example: suppose “the Sun is hot” is true. Then, “the Sun is not hot”
is false. The second statement is the manner in which we customarily
write the negation of “the Sun is hot” in English.
• conjunction:
  p  q  p ∧ q
  T  T  T
  T  F  F
  F  T  F
  F  F  F
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then, “the Moon is made of cheese and the Sun is hot”
is false.
• disjunction:
  p  q  p ∨ q
  T  T  T
  T  F  T
  F  T  T
  F  F  F
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then, “either the Moon is made of cheese, or the Sun
is hot, or both” is true.
• implication:
  p  q  p =⇒ q
  T  T  T
  T  F  F
  F  T  T
  F  F  T
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then:
– “If the Sun is hot, then the Moon is made of cheese” is false.
– “If the Moon is made of cheese, then the Sun is hot” is true.
– “If the Sun is not hot, then the Moon is made of cheese” is true.
The last two examples illustrate that, in propositional logic, “if p then
q” may have a very different meaning than in natural language. In
English, it is often used, for instance, to imply a causal relationship
between p and q. But given a premise p that is false – for example,
“the Sun is not hot” – the implication p =⇒ q is true for any q,
even a completely unrelated proposition q such as “the Moon is made
of cheese.” So the current truth of p =⇒ q does not mean that,
when the Sun eventually cools, the Moon will then be composed en-
tirely of fermented curd; rather, when the Sun cools, the implication
itself will be false: in our truth-functional semantics, the truth value of
the compound proposition reflects only the specific truth values of the
constituent propositions, and no more profound relationship between
those constituent propositions. It may be helpful to think of “if p then
q” as shorthand for, “(in any row of the truth table in which p =⇒ q
is true), if p is true, then q is true.”
In mathematics, because we use these same truth-functional semantics,
if p is false, we say that p =⇒ q is vacuously true, to mean that the
implication is true simply by virtue of the falsity of its premise. For
example, if p is “x is an element of the empty set,” and q is “x has
property Q,” then p =⇒ q is (vacuously) true, whatever the property
Q: the elements of the empty set can be said to have any property that
you like, because there are no such elements.
It is not necessary to read p =⇒ q as “if p then q”; another common
way is to say “p only if q.” Again, the proper interpretation is
truth-functional: in our truth-functional semantics, the statements “if p
then q” and “p only if q” are completely equivalent.

• inference:
  p  q  p ⇐= q
  T  T  T
  T  F  T
  F  T  F
  F  F  T
• if and only if:
  p  q  p ⇐⇒ q
  T  T  T
  T  F  F
  F  T  F
  F  F  T

Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then:
– “The Sun is hot if and only if the Moon is made of cheese” is false.
– “The Moon is made of cheese if and only if the Sun is not hot” is
true.
Given the above semantics via truth tables, we can now infer several more
propositions. For example:

Claim. (p =⇒ q) ⇐⇒ (¬p ∨ q).

Proof. By truth-table.
p q ¬p p =⇒ q ¬p ∨ q (p =⇒ q) ⇐⇒ (¬p ∨ q)
F F T T T T
F T T T T T
T F F F F T
T T F T T T
We claim that the above is a valid proof for the claim because for every
possible combination of truth values for p and q, we have shown that the
proposition in the claim is true. We now make and prove two more claims.
The first, which is an implication, has a special name, and is useful for
carrying out some proofs. Given p =⇒ q, we call the proposition ¬q =⇒
¬p its contrapositive. The contrapositive of an implication is different from
the converse: the converse of p =⇒ q is q =⇒ p. It turns out that
(p =⇒ q) ⇐⇒ (¬q =⇒ ¬p), that is, an implication and its contrapositive
are completely equivalent from the standpoint of their respective truth values.
However, given a proposition p =⇒ q, its converse, q =⇒ p, is not
necessarily true.
For example, suppose you know that if it rains, then I carry an umbrella. You
happen to observe that I am carrying an umbrella. Can you infer anything,
for example, that it is raining? The answer is no, not necessarily. On the
other hand, suppose you observe that I am not carrying an umbrella. Can
you infer anything? The answer is yes, you can infer that it is not raining.
p  q  ¬p  ¬q  p =⇒ q  ¬q =⇒ ¬p  (p =⇒ q) ⇐⇒ (¬q =⇒ ¬p)
F  F  T   T   T        T          T
F  T  T   F   T        T          T
T  F  F   T   F        F          T
T  T  F   F   T        T          T
We now assert something that is perhaps not as easy to prove, if only because
it involves three propositions, p, q and r. But again, careful use of the truth
table enables us to carry out the proof somewhat mechanically.
Claim 3. (p =⇒ q) =⇒ (p ∨ r =⇒ q ∨ r).
p  q  r  p ∨ r  q ∨ r  p =⇒ q  p ∨ r =⇒ q ∨ r  (p =⇒ q) =⇒ (p ∨ r =⇒ q ∨ r)
F  F  F  F      F      T        T                 T
F  F  T  T      T      T        T                 T
F  T  F  F      T      T        T                 T
F  T  T  T      T      T        T                 T
T  F  F  T      F      F        F                 T
T  F  T  T      T      F        T                 T
T  T  F  T      T      T        T                 T
T  T  T  T      T      T        T                 T
Perhaps the trickiest part of the truth table in the above proof is intuiting
the truth value of the last column when p =⇒ q is false. Recall that the
proposition φ =⇒ ψ is true whenever φ is false. And in this case, φ is
p =⇒ q.
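Each truth-table proof above is a finite, mechanical check, and so can be delegated to a few lines of Python; the helper `implies` below is our own encoding of the table for =⇒ :

```python
from itertools import product

def implies(p, q):
    # p => q is false only in the row where p is true and q is false.
    return not (p and not q)

booleans = [False, True]

# (p => q) <=> (not p or q)
assert all(implies(p, q) == ((not p) or q)
           for p, q in product(booleans, repeat=2))

# An implication is equivalent to its contrapositive: (p => q) <=> (not q => not p)
assert all(implies(p, q) == implies(not q, not p)
           for p, q in product(booleans, repeat=2))

# Claim 3: (p => q) => (p or r => q or r)
assert all(implies(implies(p, q), implies(p or r, q or r))
           for p, q, r in product(booleans, repeat=3))
```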
A number of other useful propositions can similarly be inferred from the
truth tables. Following are some useful propositions, and names we associate
with them when perceived as properties.
• (p ∨ q) ⇐⇒ (q ∨ p) – commutativity of ∨.
• (p ∧ q) ⇐⇒ (q ∧ p) – commutativity of ∧.
• (p =⇒ q) ⇐⇒ (q ⇐= p).
• (p ⇐⇒ q) ⇐⇒ ((p =⇒ q) ∧ (p ⇐= q)).
Sometimes, when we use the same quantifier over multiple variables, we write
one instance of a quantifier only, and not several. For example:
∀ real a, b, (a ≤ b ∨ b ≤ a)
Proof techniques
We now discuss proof techniques that are useful in this course and, in the
future, in your engineering profession. The mindset and systematic thinking
that working out a proof develops are critical to one’s success as an engineer.
The kinds of proofs we develop, and the underlying mindsets and techniques
we use, are not only of esoteric or theoretical interest; they have immediate,
practical consequence. The precise communication that such proofs require
is also very valuable for an engineer to develop. Precise technical
communication is an invaluable skill, highly prized not only in academia,
but also in industry and business settings. We return to this somewhat
philosophical discussion once we have discussed the proof techniques we
seek to impart as part of this course.
For example, consider the following claim, and its proof by contradic-
tion.
Claim 5. √2 is not rational.
Another example, which was on the final exam of the Spring’18 offering
of the course, is the following claim. We define an even number as
follows: x is an even number if x = 2y, where y is an integer.
Claim 6. If a, b, c are positive integers, then at least one of a − b, b −
c, c − a is even.
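Before proving Claim 6, one can sanity-check it by brute force over small values:

```python
from itertools import product

# At least one of a-b, b-c, c-a is even, for all positive a, b, c up to 24.
assert all(any(d % 2 == 0 for d in (a - b, b - c, c - a))
           for a, b, c in product(range(1, 25), repeat=3))
```

(Indeed, by the pigeonhole principle two of a, b, c share a parity, so their difference is even.)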
Claim 7. For x, y positive integers, $\left(\sum_{i=1}^{x} i = \sum_{i=1}^{y} i\right) \implies (x = y)$.
Claim 8. Given any two real numbers, x, y such that x < y, there
exists a real number z such that x < z < y.
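For Claim 8 the standard witness is the midpoint z = (x + y)/2; a quick numeric check (over the reals the argument is exact; with floating point it can fail only for adjacent representable values):

```python
def witness(x, y):
    # Midpoint of x and y; satisfies x < z < y whenever x < y.
    return (x + y) / 2

for x, y in [(0.0, 1.0), (-3.5, -3.4), (1e-9, 2e-9)]:
    z = witness(x, y)
    assert x < z < y
```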
– We first prove that the statement is true for the base case. The
base case is the statement for the first natural number, 0.
– We then prove the step, i.e., the following implication: if the state-
ment is true for all natural numbers, 0, 1, . . . , i − 1, then the state-
ment is true for the natural number i.
Together, the two steps above prove the statement for all items in the
sequence, for example, every natural number. This is because proving
the (i) base case, i.e., the statement for 0, and, (ii) the step, implies
that the statement is true for the second natural number, 1. This, with
the step, in turn implies that the statement is true for 2. And, 3, and
so on, for all natural numbers. Following is an example.
Claim 9. 1 + 2 + · · · + n = n(n + 1)/2.
Proof. By induction on n.
Base case: n = 1. When n = 1, the left-hand side is 1, and the right-hand
side is (1 × 2)/2 = 1. Thus, we have proved that the statement is true
for the base case.
Step: we adopt the induction assumption, that the statement is true
for all n = 1, 2, . . . , i − 1, for some i ≥ 2. Under that premise, we seek
to prove the statement for n = i. We observe:
1 + 2 + · · · + (i − 1) + i = (i − 1)i/2 + i        (∵ induction assumption)
                            = (i² − i + 2i)/2
                            = (i² + i)/2
                            = i(i + 1)/2
Thus, we have proven the base case and the step, and therefore we have
successfully carried out our proof by induction on n.
As the base case, we have proved that the statement is true when n = 1. As
a consequence of proving the step, then, we have proved that the statement
is true for n = 2. And with that, and as a consequence of the step, we have
proved that the statement is true for n = 3. And so on.
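Claim 9 can also be checked numerically for many n; this is no substitute for the induction, but it is reassuring:

```python
# 1 + 2 + ... + n == n(n+1)/2, checked for n = 1 .. 1999.
assert all(sum(range(1, n + 1)) == n * (n + 1) // 2 for n in range(1, 2000))
```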
We now carry out several proofs as examples to demonstrate the above strate-
gies. We begin with a problem from the final exam of the Spring’18 offering
of the course.
Claim 10. For every non-negative integer n ≥ 12, there exist non-negative
integers m1 , m2 such that n = 4m1 + 5m2 .
Proof. By induction on n.
Base cases: we prove the statement for the following cases: n = 12, 13, 14, 15.
The reason we consider several base cases becomes apparent once we get in
to proving the step. We observe:
• 12 = 4 × 3 + 5 × 0.
• 13 = 4 × 2 + 5 × 1.
• 14 = 4 × 1 + 5 × 2.
• 15 = 4 × 0 + 5 × 3.
Step: we assume that the assertion is true for all n = 12, 13, . . . , i − 1 for
some i ≥ 13. For n = i, we first observe that i = i − 1 + 1 = 4k1 + 5k2 + 1,
for some non-negative integers k1 , k2 , from the induction assumption. We do
a case analysis.
Case (i): k1 > 0. Then, i = 4k1 + 5k2 + 1 = 4(k1 − 1) + 5(k2 + 1).
Case (ii): k1 = 0. Then, because i > 12, k2 ≥ 3. Then, i = 5k2 + 1 =
4 × 4 + 5(k2 − 3).
The reason we prove several base cases is to address Case (ii) of the step:
the smallest n whose representation has k2 ≥ 3 is n = 15. By addressing several
base cases, we ensure that our proof is indeed correct, i.e., that we can indeed
make the inductive argument.
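The proof of Claim 10 is constructive in flavor, and translates directly into a search procedure; the function name below is our own:

```python
def as_4s_and_5s(n):
    # Find non-negative m1, m2 with n == 4*m1 + 5*m2, or None if none exist.
    for m2 in range(n // 5 + 1):
        if (n - 5 * m2) % 4 == 0:
            return (n - 5 * m2) // 4, m2
    return None

# Claim 10: a representation exists for every n >= 12 ...
assert all(as_4s_and_5s(n) is not None for n in range(12, 500))
# ... and indeed none exists for n = 11.
assert as_4s_and_5s(11) is None
```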
Claim 11. For every non-negative integer n, exactly one of the following is
true:
We need to be careful here in that the statement says that exactly one of
those cases is true. That is, for a particular n, one of the cases is true, and
neither of the others is true. We need to prove both those properties.
the horse that we first removed. Again, we are left with i − 1 horses which,
by the induction assumption must all have the same colour.
A flaw in the above proof is in the manner in which we prove the step. While
it is certainly ok to remove a horse, call it H, from the set and then assert
that the remainder all have the same colour, what we now need to do is prove
that H has the same colour as the other i−1 horses. We cannot again appeal
to the induction assumption to do that, as the above flawed proof does.
We now present one more correct example of proof by induction. In the
following claim, we address a situation that there appears to be more than
one choice for the parameter on which we carry out induction.
Claim 12. Suppose n is a natural number whose digits, in order of most-
to least-significant, are n_{k−1} n_{k−2} . . . n_0, where each n_i is one of 0, . . . , 9. If
the sum of the digits of n, $S_n = \sum_{i=0}^{k-1} n_i$, is divisible by 3, then n is divisible
by 3.
• Suppose $\sum_{j=0}^{i-2} n_j$ is divisible by 3. Then, for S_n to be divisible by 3,
n_{i−1} must be divisible by 3, by Claim 4. That is, n_{i−1} = 3a for some
natural number a. Then, n = 10^{i−1} × 3a + (n_{i−2} . . . n_0)_{10}. As $\sum_{j=0}^{i-2} n_j$ is
divisible by 3, by the induction assumption, (n_{i−2} . . . n_0)_{10} is divisible
by 3. Therefore, by Claim 4, n = 10^{i−1} × 3a + (n_{i−2} . . . n_0)_{10} is divisible
by 3, because it is the sum of two numbers, each of which is divisible
by 3.
• Suppose $\sum_{j=0}^{i-2} n_j$ is not divisible by 3. Then, $\sum_{j=0}^{i-2} n_j = 3a + b$, for some
natural number a, and for b either 1 or 2. We now do a case analysis
of those two cases for b.
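Claim 12 is easy to check by brute force before, or after, proving it:

```python
# If the digit sum of n is divisible by 3, then n is divisible by 3.
for n in range(100000):
    if sum(int(d) for d in str(n)) % 3 == 0:
        assert n % 3 == 0
```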
Disproof
If we simply try to prove the claim that n² − n + 11 is prime for every
natural number n, we will never succeed. But we may succeed in disproving
it: it turns out that the claim is false. That is, there exists natural n such
that n² − n + 11 is not prime.
Such an n is 11, and this is called a counterexample to the claim: an example
of a specific n for which the claim does not hold. Producing a counterexample
is an effective way of refuting a statement of the form “for all . . . ”
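The counterexample can be found mechanically; the helper `is_prime` below is our own naive trial-division test:

```python
def is_prime(m):
    # Naive trial division; adequate for small m.
    if m < 2:
        return False
    return all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

smallest = next(n for n in range(200) if not is_prime(n * n - n + 11))
assert smallest == 11   # 11^2 - 11 + 11 = 121 = 11 * 11
```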
For another disproof by counterexample, consider the statement, “no mam-
mal lays eggs.” This can be seen as the negation of a statement with the
“there exists” quantifier, which can in turn be rephrased as a statement
with “for all. . . ”
be a natural number, i.e., ≥ 1.) (ii) Suppose q is even and p is odd. Then,
q = 2 and p = 515, which contradicts the assumption that p is prime.
We now consider a more complex example, a statement that involves two
quantifiers. This example illustrates the utility of first carefully negating the
statement, and then choosing a strategy when trying to disprove the original
statement.
Claim 16. ∀m ∈ Z, ∃n ∈ N, 1/m − 1/n > 1/2.
The above statement is not true. Its negation, which we seek to prove, is:

∃m ∈ Z, ∀n ∈ N, 1/m − 1/n ≤ 1/2 .
To disprove the above claim, we observe that when p is false, q is false and
r is true, the statement is not true. We observe that for some other truth-
assignments, the statement is true; for example, p = true, q = false, r = true.
But that is immaterial to the fact that the claim is false.
Another example of the use of this kind of logic and these proof techniques is the
following claim and proof. In the proof, we directly use “ ⇐⇒ .” We could,
instead, first establish $\overline{\overline{A}} \subseteq A$, and then $\overline{\overline{A}} \supseteq A$. The symbol “\” denotes
set difference.

Claim 18. $\overline{\overline{A}} = A$.
Proof.

$x \in \overline{\overline{A}} \iff x \in U \setminus \overline{A}$
$\iff x \in U \wedge x \notin \overline{A}$
$\iff x \in U \wedge x \notin (U \setminus A)$
$\iff x \in U \wedge \neg(x \in U \setminus A)$
$\iff x \in U \wedge \neg(x \in U \wedge \neg(x \in A))$
$\iff x \in U \wedge (x \notin U \vee x \in A)$
$\iff (x \in U \wedge x \notin U) \vee (x \in U \wedge x \in A)$
$\iff \mathrm{false} \vee (x \in U \wedge x \in A)$
$\iff x \in U \wedge x \in A$
$\iff x \in U \cap A \iff x \in A$
As another somewhat more general example of the use of logic in the context
of sets, we discuss Russell’s paradox, which points out that the set builder
notation should be used with care.
S = {x | x ∉ x}

That is, S is the set of all sets that do not contain themselves. Now, we ask:
does S contain itself?
{x ∈ A | conditions on x}
That is, we must specify of what superset A this set being specified is a
subset. And the conditions that appear after “ | ” are then used to specify
which members of A are members of this set. Under these requirements, the
earlier specification, S = {x | x ∉ x}, is no longer allowed.
And if we specify, for example, S = {x ∈ A | x ∉ x}, we no longer have a
paradox. Because suppose S = {x ∈ A | x ∉ x} is our specification of S, and
we again ask: is S ∈ S?

S ∈ S =⇒ S ∈ A ∧ S ∉ S =⇒ S ∉ S
S ∉ S =⇒ S ∉ A ∨ (S ∈ A ∧ S ∉ S)
Discrete probability
This chapter reviews elementary combinatorics and probability theory. If you have
a good background in these areas, you may want to skim the beginning of the
chapter lightly and concentrate on the later sections. Most of the chapters do not
require probability, but for some chapters it is essential.
Section C.1 reviews elementary results in counting theory, including standard
formulas for counting permutations and combinations. The axioms of probability
and basic facts concerning probability distributions are presented in Section C.2.
Random variables are introduced in Section C.3, along with the properties of ex-
pectation and variance. Section C.4 investigates the geometric and binomial dis-
tributions that arise from studying Bernoulli trials. The study of the binomial dis-
tribution is continued in Section C.5, an advanced discussion of the “tails” of the
distribution.
C.1 Counting
Counting theory tries to answer the question “How many?” without actually enu-
merating how many. For example, we might ask, “How many different n-bit num-
bers are there?” or “How many orderings of n distinct elements are there?” In this
section, we review the elements of counting theory. Since some of the material
assumes a basic understanding of sets, the reader is advised to start by reviewing
the material in Section B.1.
The rule of sum says that the number of ways to choose an element from one
of two disjoint sets is the sum of the cardinalities of the sets; this follows from
equation (B.3). For example, each position on a car’s license plate is a letter
or a digit. The number of possibilities for each position is therefore
26 + 10 = 36.
Strings
A string over a finite set S is a sequence of elements of S. For example, there are
8 binary strings of length 3:
000, 001, 010, 011, 100, 101, 110, 111 .
We sometimes call a string of length k a k-string. A substring s′ of a string s
is an ordered sequence of consecutive elements of s. A k-substring of a string
is a substring of length k. For example, 010 is a 3-substring of 01101001 (the
3-substring that begins in position 4), but 111 is not a substring of 01101001.
A k-string over a set S can be viewed as an element of the Cartesian product S^k
of k-tuples; thus, there are |S|^k strings of length k. For example, the number of
binary k-strings is 2^k. Intuitively, to construct a k-string over an n-set, we have n
ways to pick the first element; for each of these choices, we have n ways to pick the
second element; and so forth k times. This construction leads to the k-fold product
n · n · · · n = n^k as the number of k-strings.
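The counting argument for k-strings corresponds directly to itertools.product:

```python
from itertools import product

S = ['a', 'b', 'c']          # an example 3-set
for k in range(5):
    # There are |S|^k k-strings over S.
    assert len(list(product(S, repeat=k))) == len(S) ** k
```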
Permutations
A permutation of a finite set S is an ordered sequence of all the elements of S,
with each element appearing exactly once. For example, if S = {a, b, c}, there
are 6 permutations of S:
abc, acb, bac, bca, cab, cba .
There are n! permutations of a set of n elements, since the first element of the
sequence can be chosen in n ways, the second in n − 1 ways, the third in n − 2
ways, and so on. A k-permutation of S is an ordered sequence of k elements
of S, with no element appearing more than once. The number of k-permutations
of an n-set is

n (n − 1)(n − 2) · · · (n − k + 1) = n!/(n − k)! .    (C.1)
Combinations
A k-combination of an n-set S is simply a k-subset of S. For example, there are
six 2-combinations of the 4-set {a, b, c, d}:
ab, ac, ad, bc, bd, cd .
(Here we use the shorthand of denoting the 2-set {a, b} by ab, and so on.) We can
construct a k-combination of an n-set by choosing k distinct (different) elements
from the n-set.
The number of k-combinations of an n-set can be expressed in terms of the num-
ber of k-permutations of an n-set. For every k-combination, there are exactly k!
permutations of its elements, each of which is a distinct k-permutation of the n-set.
Thus, the number of k-combinations of an n-set is the number of k-permutations
divided by k!; from equation (C.1), this quantity is

$\frac{n!}{k!\,(n-k)!}$ .    (C.2)
For k = 0, this formula tells us that the number of ways to choose 0 elements from
an n-set is 1 (not 0), since 0! = 1.
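The relationship between k-combinations, k-permutations and equation (C.2) can be confirmed with itertools and the math module:

```python
import math
from itertools import combinations, permutations

n, k = 5, 3
n_combinations = len(list(combinations(range(n), k)))
n_permutations = len(list(permutations(range(n), k)))

# Each k-combination gives rise to exactly k! k-permutations ...
assert n_combinations == n_permutations // math.factorial(k)
# ... and equation (C.2): n! / (k! (n-k)!)
assert n_combinations == math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
# For k = 0, there is exactly one way to choose nothing.
assert len(list(combinations(range(n), 0))) == 1
```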
Binomial coefficients

We use the notation $\binom{n}{k}$ (read “n choose k”) to denote the number of k-combinations
of an n-set. From equation (C.2), we have

$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ .
$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$ .    (C.4)
Binomial bounds
We sometimes need to bound the size of a binomial coefficient. For 1 ≤ k ≤ n, we
have the lower bound
$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \left(\frac{n}{k}\right)\left(\frac{n-1}{k-1}\right)\cdots\left(\frac{n-k+1}{1}\right) \ge \left(\frac{n}{k}\right)^{k} .$
Taking advantage of the inequality k! ≥ (k/e)k derived from Stirling’s approxima-
tion (3.17), we obtain the upper bounds
$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} \le \frac{n^k}{k!} \le \left(\frac{en}{k}\right)^{k} .$    (C.5)
For all 0 ≤ k ≤ n, we can use induction (see Exercise C.1-12) to prove the bound

$\binom{n}{k} \le \frac{n^n}{k^k\,(n-k)^{n-k}}$ ,    (C.6)

where for convenience we assume that $0^0 = 1$. For k = λn, where 0 ≤ λ ≤ 1, this
bound can be rewritten as

$\binom{n}{\lambda n} \le \frac{n^n}{(\lambda n)^{\lambda n}\,((1-\lambda)n)^{(1-\lambda)n}}$ .
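The lower bound (n/k)^k and the upper bound (en/k)^k from (C.5) can be verified numerically for small n and k:

```python
import math

for n in range(1, 30):
    for k in range(1, n + 1):
        c = math.comb(n, k)
        assert (n / k) ** k <= c            # lower bound (n/k)^k
        assert c <= (math.e * n / k) ** k   # upper bound (en/k)^k, from (C.5)
```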
C.1-15
Show that for any integer n ≥ 0,
Σ_{k=0}^{n} k (n choose k) = n 2^{n−1} . (C.11)
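Identity (C.11) is straightforward to sanity-check for small n:

```python
# Checking sum_k k*C(n,k) = n * 2^(n-1) for n = 0..11.
import math

for n in range(12):
    lhs = sum(k * math.comb(n, k) for k in range(n + 1))
    assert lhs == n * 2 ** (n - 1)
print("identity (C.11) holds for all n < 12")
```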
C.2 Probability
Probability is an essential tool for the design and analysis of probabilistic and ran-
domized algorithms. This section reviews basic probability theory.
We define probability in terms of a sample space S, which is a set whose el-
ements are called elementary events. Each elementary event can be viewed as a
possible outcome of an experiment. For the experiment of flipping two distinguish-
able coins, we can view the sample space as consisting of the set of all possible
2-strings over {H, T }:
S = {HH, HT, TH , TT } .
An event is a subset1 of the sample space S. For example, in the experiment of
flipping two coins, the event of obtaining one head and one tail is {HT, TH }. The
event S is called the certain event, and the event ∅ is called the null event. We say
that two events A and B are mutually exclusive if A ∩ B = ∅. We sometimes treat
an elementary event s ∈ S as the event {s}. By definition, all elementary events are
mutually exclusive.
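Sample spaces and events map naturally onto Python sets; a sketch of the two-coin experiment:

```python
# The sample space of two distinguishable coin flips, with events as subsets.
from itertools import product

S = {''.join(p) for p in product('HT', repeat=2)}
assert S == {'HH', 'HT', 'TH', 'TT'}

one_head_one_tail = {'HT', 'TH'}              # the event "one head and one tail"
assert one_head_one_tail <= S                 # every event is a subset of S
assert one_head_one_tail & {'TT'} == set()    # mutually exclusive with "two tails"
```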
Axioms of probability
A probability distribution Pr {} on a sample space S is a mapping from events of S
to real numbers such that the following probability axioms are satisfied:
1. Pr {A} ≥ 0 for any event A.
2. Pr {S} = 1.
3. Pr {A ∪ B} = Pr {A} + Pr {B} for any two mutually exclusive events A and B.
More generally, for any (finite or countably infinite) sequence of events
A1, A2, . . . that are pairwise mutually exclusive,
Pr {∪_i Ai} = Σ_i Pr {Ai} .
1 For a general probability distribution, there may be some subsets of the sample space S that are not
considered to be events. This situation usually arises when the sample space is uncountably infinite.
The main requirement is that the set of events of a sample space be closed under the operations of
taking the complement of an event, forming the union of a finite or countable number of events, and
taking the intersection of a finite or countable number of events. Most of the probability distributions
we shall see are over finite or countable sample spaces, and we shall generally consider all subsets of
a sample space to be events. A notable exception is the continuous uniform probability distribution,
which will be presented shortly.
If S is finite and every elementary event s ∈ S has probability

Pr {s} = 1/|S| ,
then we have the uniform probability distribution on S. In such a case the experi-
ment is often described as “picking an element of S at random.”
As an example, consider the process of flipping a fair coin, one for which the
probability of obtaining a head is the same as the probability of obtaining a tail, that
is, 1/2. If we flip the coin n times, we have the uniform probability distribution
defined on the sample space S = {H, T}^n , a set of size 2^n . Each elementary event
in S can be represented as a string of length n over {H, T}, and each occurs with
probability 1/2^n .
The conditional probability of an event A given that another event B occurs is
defined as

Pr {A | B} = Pr {A ∩ B} / Pr {B} (C.14)

whenever Pr {B} ≠ 0. As an example, suppose that we flip two fair coins and that
a friend tells you that at least one of the coins showed a head. What is the probability that both
coins are heads? The information given eliminates the possibility of two tails. The
three remaining elementary events are equally likely, so we infer that each occurs
with probability 1/3. Since only one of these elementary events shows two heads,
the answer to our question is 1/3.
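The computation is direct to reproduce by counting, using exact rational arithmetic:

```python
# Pr{both heads | at least one head} by counting equally likely outcomes.
from fractions import Fraction
from itertools import product

S = [''.join(p) for p in product('HT', repeat=2)]
at_least_one_head = [s for s in S if 'H' in s]          # eliminates TT
both_heads = [s for s in at_least_one_head if s == 'HH']
answer = Fraction(len(both_heads), len(at_least_one_head))
print(answer)                                            # 1/3
```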
A collection A1, A2, . . . , An of events is pairwise independent if

Pr {Ai ∩ Aj} = Pr {Ai} Pr {Aj}

for all 1 ≤ i < j ≤ n. We say that the events of the collection are (mutually)
independent if every k-subset Ai1, Ai2, . . . , Aik of the collection, where 2 ≤ k ≤ n
and 1 ≤ i1 < i2 < · · · < ik ≤ n, satisfies

Pr {Ai1 ∩ Ai2 ∩ · · · ∩ Aik} = Pr {Ai1} Pr {Ai2} · · · Pr {Aik} .
For example, suppose we flip two fair coins. Let A1 be the event that the first coin
is heads, let A2 be the event that the second coin is heads, and let A3 be the event
that exactly one of the two coins is heads. Then each pair of these events is
independent, but the three events are not mutually independent, since
Pr {A1 ∩ A2 ∩ A3} = 0 while Pr {A1} Pr {A2} Pr {A3} = 1/8.
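With A3 taken to be "exactly one head" (one standard choice for this example), the pairwise-but-not-mutual independence can be checked exhaustively:

```python
# Three events on two fair coins: pairwise independent, not mutually independent.
from fractions import Fraction
from itertools import product

S = [''.join(p) for p in product('HT', repeat=2)]

def pr(event):
    """Probability of an event (a predicate on outcomes), uniform distribution."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

A1 = lambda s: s[0] == 'H'            # first coin is heads
A2 = lambda s: s[1] == 'H'            # second coin is heads
A3 = lambda s: s.count('H') == 1      # exactly one head

for e, f in [(A1, A2), (A1, A3), (A2, A3)]:
    assert pr(lambda s: e(s) and f(s)) == pr(e) * pr(f)   # each pair independent

triple = pr(lambda s: A1(s) and A2(s) and A3(s))
assert triple == 0                                        # but the triple fails:
assert triple != pr(A1) * pr(A2) * pr(A3)                 # 0 != 1/8
```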
Bayes’s theorem
From the definition of conditional probability (C.14) and the commutative law
A ∩ B = B ∩ A, it follows that for two events A and B, each with nonzero proba-
bility,
Pr { A ∩ B} = Pr {B} Pr {A | B} (C.16)
= Pr { A} Pr {B | A} .
Solving for Pr {A | B}, we obtain

Pr {A | B} = Pr {A} Pr {B | A} / Pr {B} , (C.17)
which is known as Bayes’s theorem. The denominator Pr {B} is a normalizing
constant that we can reexpress as follows. Since B = (B ∩ A) ∪ (B ∩ Ā), and B ∩ A
and B ∩ Ā are mutually exclusive events,

Pr {B} = Pr {B ∩ A} + Pr {B ∩ Ā}
       = Pr {A} Pr {B | A} + Pr {Ā} Pr {B | Ā} .
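A worked instance of (C.17), with hypothetical numbers chosen for illustration: a condition A with prior 1%, and a test B that fires with probability 0.95 given A and 0.05 given Ā:

```python
# Bayes's theorem (C.17): the posterior Pr{A | B} from the prior and likelihoods.
p_A = 0.01                        # Pr{A}: prior (assumed value)
p_B_given_A = 0.95                # Pr{B | A} (assumed value)
p_B_given_notA = 0.05             # Pr{B | Ā} (assumed value)

# normalizing constant, as re-expressed in the text
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA
p_A_given_B = p_A * p_B_given_A / p_B
print(round(p_A_given_B, 3))      # about 0.16: far smaller than Pr{B | A}
```

Even a fairly accurate test gives a small posterior here because the prior is small; the normalizing constant Pr {B} is dominated by the B ∩ Ā term.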
Exercises
C.2-1
Prove Boole’s inequality: For any finite or countably infinite sequence of events
A1 , A2 , . . .,
Pr {A1 ∪ A2 ∪ · · ·} ≤ Pr { A1 } + Pr { A2 } + · · · . (C.18)
C.2-2
Professor Rosencrantz flips a fair coin once. Professor Guildenstern flips a fair
coin twice. What is the probability that Professor Rosencrantz obtains more heads
than Professor Guildenstern?
C.2-3
A deck of 10 cards, each bearing a distinct number from 1 to 10, is shuffled to mix
the cards thoroughly. Three cards are removed one at a time from the deck. What
is the probability that the three cards are selected in sorted (increasing) order?
C.2-4
Describe a procedure that takes as input two integers a and b such that 0 < a < b
and, using fair coin flips, produces as output heads with probability a/b and tails
with probability (b − a)/b. Give a bound on the expected number of coin flips,
which should be O(1). (Hint: Represent a/b in binary.)
C.2-5
Prove that
Pr {A | B} + Pr {Ā | B} = 1 .
C.2-6
Prove that for any collection of events A1 , A2 , . . . , An ,
Pr { A1 ∩ A2 ∩ · · · ∩ An } = Pr {A1 } · Pr {A2 | A1 } · Pr { A3 | A1 ∩ A2 } · · ·
Pr {An | A1 ∩ A2 ∩ · · · ∩ An−1 } .
C.2-7
Show how to construct a set of n events that are pairwise independent but such that
no subset of k > 2 of them is mutually independent.
C.2-8
Two events A and B are conditionally independent, given C, if
Pr { A ∩ B | C} = Pr { A | C} · Pr {B | C} .
Give a simple but nontrivial example of two events that are not independent but are
conditionally independent given a third event.
C.2-9
You are a contestant in a game show in which a prize is hidden behind one of three
curtains. You will win the prize if you select the correct curtain. After you have
picked one curtain but before the curtain is lifted, the emcee lifts one of the other
curtains, knowing that it will reveal an empty stage, and asks if you would like
to switch from your current selection to the remaining curtain. How would your
chances change if you switch?
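A simulation lets you check your answer empirically. This sketch models the curtains by number and seeds the generator for reproducibility; when the emcee has a choice of empty curtains, it deterministically opens the lowest-numbered one, which does not affect the winning rates:

```python
# Simulating the three-curtain game: staying versus switching.
import random

random.seed(1)

def play(switch):
    prize = random.randrange(3)
    pick = random.randrange(3)
    # the emcee opens a curtain that is neither the pick nor the prize
    opened = next(c for c in range(3) if c != pick and c != prize)
    if switch:
        pick = next(c for c in range(3) if c != pick and c != opened)
    return pick == prize

trials = 100_000
stay = sum(play(switch=False) for _ in range(trials)) / trials
flip = sum(play(switch=True) for _ in range(trials)) / trials
print(stay, flip)                 # the two winning rates differ noticeably
```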
C.2-10
A prison warden has randomly picked one prisoner among three to go free. The
other two will be executed. The guard knows which one will go free but is forbid-
den to give any prisoner information regarding his status. Let us call the prisoners
X , Y , and Z . Prisoner X asks the guard privately which of Y or Z will be executed,
arguing that since he already knows that at least one of them must die, the guard
won’t be revealing any information about his own status. The guard tells X that Y
is to be executed. Prisoner X feels happier now, since he figures that either he or
prisoner Z will go free, which means that his probability of going free is now 1/2.
Is he right, or are his chances still 1/3? Explain.
C.3 Discrete random variables
A (discrete) random variable X is a function from a finite or countably infinite
sample space S to the real numbers. Random variables can also be defined
for uncountably infinite sample spaces, but they raise technical issues that are
unnecessary to address for our purposes. Henceforth, we shall assume that random
variables are discrete.
For a random variable X and a real number x, we define the event X = x to be
{s ∈ S : X (s) = x}; thus,
Pr {X = x} = Σ_{s ∈ S : X(s) = x} Pr {s} .
The function

f (x) = Pr {X = x}

is the probability density function of the random variable X . From the probability
axioms, Pr {X = x} ≥ 0 and Σ_x Pr {X = x} = 1.
As an example, consider the experiment of rolling a pair of ordinary, 6-sided
dice. There are 36 possible elementary events in the sample space. We assume
that the probability distribution is uniform, so that each elementary event s ∈ S is
equally likely: Pr {s} = 1/36. Define the random variable X to be the maximum of
the two values showing on the dice. We have Pr {X = 3} = 5/36, since X assigns
a value of 3 to 5 of the 36 possible elementary events, namely, (1, 3), (2, 3), (3, 3),
(3, 2), and (3, 1).
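The dice computation is easy to replicate by enumerating the 36 outcomes:

```python
# Pr{X = 3} for X = the maximum of two fair dice, by enumeration.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
assert len(outcomes) == 36
pr_3 = Fraction(sum(1 for a, b in outcomes if max(a, b) == 3), len(outcomes))
assert pr_3 == Fraction(5, 36)    # the 5 outcomes listed in the text
```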
It is common for several random variables to be defined on the same sample
space. If X and Y are random variables, the function
f (x, y) = Pr {X = x and Y = y}
is the joint probability density function of X and Y . For a fixed value y,

Pr {Y = y} = Σ_x Pr {X = x and Y = y} ,

and similarly, for a fixed value x,

Pr {X = x} = Σ_y Pr {X = x and Y = y} .
We define two random variables X and Y to be independent if for all x and y, the
events X = x and Y = y are independent or, equivalently, if for all x and y, we
have Pr {X = x and Y = y} = Pr {X = x} Pr {Y = y}.
Given a set of random variables defined over the same sample space, one can
define new random variables as sums, products, or other functions of the original
variables.
The simplest and most useful summary of the distribution of a random variable is
the “average” of the values it takes on. The expected value (or, synonymously,
expectation or mean) of a discrete random variable X is
E [X ] = Σ_x x Pr {X = x} , (C.19)
which is well defined if the sum is finite or converges absolutely. Sometimes the
expectation of X is denoted by µ X or, when the random variable is apparent from
context, simply by µ.
Consider a game in which you flip two fair coins. You earn $3 for each head but
lose $2 for each tail. The expected value of the random variable X representing
your earnings is
E [X ] = 6 · Pr {2 H’s} + 1 · Pr {1 H, 1 T} − 4 · Pr {2 T’s}
= 6(1/4) + 1(1/2) − 4(1/4)
= 1.
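The same expectation, computed exactly by enumerating the four equally likely flip sequences:

```python
# E[X] for the coin game: +3 dollars per head, -2 per tail, over two fair flips.
from fractions import Fraction
from itertools import product

E = sum(Fraction(1, 4) * (3 * f.count('H') - 2 * f.count('T'))
        for f in (''.join(p) for p in product('HT', repeat=2)))
assert E == 1                     # matches the calculation in the text
```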
The expectation of the sum of two random variables is the sum of their expecta-
tions, that is,
E [X + Y ] = E [X ] + E [Y ] , (C.20)
whenever E [X ] and E [Y ] are defined. We call this property linearity of expecta-
tion, and it holds even if X and Y are not independent. It also extends to finite and
absolutely convergent summations of expectations. Linearity of expectation is the
key property that enables us to perform probabilistic analyses by using indicator
random variables (see Section 5.2).
If X is any random variable, any function g(x) defines a new random variable
g(X ). If the expectation of g(X ) is defined, then

E [g(X )] = Σ_x g(x) Pr {X = x} .

Letting g(x) = ax, we have E [a X ] = aE [X ] for any constant a; combining this
with equation (C.20), for any two random variables X and Y and any constant a,

E [a X + Y ] = aE [X ] + E [Y ] . (C.22)
When two random variables X and Y are independent and each has a defined
expectation,

E [X Y ] = Σ_x Σ_y x y Pr {X = x and Y = y}
         = Σ_x Σ_y x y Pr {X = x} Pr {Y = y}
         = ( Σ_x x Pr {X = x} ) ( Σ_y y Pr {Y = y} )
         = E [X ] E [Y ] .
In general, when n random variables X 1 , X 2 , . . . , X n are mutually independent,
E [X 1 X 2 · · · X n ] = E [X 1 ] E [X 2 ] · · · E [X n ] . (C.23)
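A quick exact check of E [XY] = E [X] E [Y] for two independent fair dice:

```python
# E[XY] = E[X] E[Y] for two independent fair dice, with exact arithmetic.
from fractions import Fraction
from itertools import product

faces = range(1, 7)
E_X = sum(Fraction(x, 6) for x in faces)                      # 7/2
E_XY = sum(Fraction(x * y, 36) for x, y in product(faces, repeat=2))
assert E_XY == E_X * E_X                                      # 49/4
```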
When a random variable X takes on values from the set of natural numbers
N = {0, 1, 2, . . .}, there is a nice formula for its expectation:
E [X ] = Σ_{i=0}^{∞} i Pr {X = i}
       = Σ_{i=0}^{∞} i (Pr {X ≥ i} − Pr {X ≥ i + 1})
       = Σ_{i=1}^{∞} Pr {X ≥ i} , (C.24)
since each term Pr {X ≥ i} is added in i times and subtracted out i −1 times (except
Pr {X ≥ 0}, which is added in 0 times and not subtracted out at all).
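Formula (C.24) can be checked exactly on a finite example, reusing X = the maximum of two fair dice:

```python
# Checking E[X] = sum_{i>=1} Pr{X >= i} (C.24) for X = max of two fair dice.
from fractions import Fraction
from itertools import product

values = [max(a, b) for a, b in product(range(1, 7), repeat=2)]
n = len(values)
direct = sum(Fraction(v, n) for v in values)                     # definition (C.19)
tails = sum(Fraction(sum(1 for v in values if v >= i), n)        # tail sums
            for i in range(1, 7))
assert direct == tails
```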
When we apply a convex function f (x) to a random variable X , Jensen’s in-
equality gives us
E [ f (X )] ≥ f (E [X ]) , (C.25)
provided that the expectations exist and are finite. (A function f (x) is convex if for
all x and y and for all 0 ≤ λ ≤ 1, we have f (λx +(1−λ)y) ≤ λ f (x)+(1−λ) f (y).)
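A minimal instance of Jensen's inequality, with the convex function f(x) = x² and a fair die:

```python
# Jensen's inequality: E[X^2] >= (E[X])^2 for a fair six-sided die.
from fractions import Fraction

faces = range(1, 7)
E_X = sum(Fraction(x, 6) for x in faces)           # 7/2
E_fX = sum(Fraction(x * x, 6) for x in faces)      # E[X^2] = 91/6
assert E_fX >= E_X ** 2                            # 91/6 >= 49/4
```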
C.4 The geometric and binomial distributions
For a geometrically distributed random variable X (the number of Bernoulli trials,
each with success probability p and failure probability q = 1 − p, needed to obtain
the first success), the expectation is E [X ] = 1/p and the variance is

Var [X ] = q/p^2 . (C.32)
As an example, suppose we repeatedly roll two dice until we obtain either a
seven or an eleven. Of the 36 possible outcomes, 6 yield a seven and 2 yield an
eleven. Thus, the probability of success is p = 8/36 = 2/9, and we must roll
1/ p = 9/2 = 4.5 times on average to obtain a seven or eleven.
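A seeded Monte Carlo sketch of the seven-or-eleven experiment, whose average should land near 1/p = 4.5:

```python
# Expected number of rolls of two dice until a seven or eleven appears.
import random

random.seed(0)

def rolls_until_7_or_11():
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) + random.randint(1, 6) in (7, 11):
            return n

trials = 100_000
avg = sum(rolls_until_7_or_11() for _ in range(trials)) / trials
print(avg)                        # close to the theoretical 4.5
```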
Figure C.1 A geometric distribution with probability p = 1/3 of success and a probability
q = 1 − p of failure. The expectation of the distribution is 1/ p = 3.
Figure C.2 The binomial distribution b(k; 15, 1/3) resulting from n = 15 Bernoulli trials, each
with probability p = 1/3 of success. The expectation of the distribution is np = 5.
Σ_{k=0}^{n} b(k; n, p) = 1 . (C.35)

If the random variable X counts the number of successes in n Bernoulli trials,
each with success probability p and failure probability q = 1 − p, then its
expectation is

E [X ] = Σ_{k=0}^{n} k b(k; n, p)
       = Σ_{k=1}^{n} k (n choose k) p^k q^{n−k}
       = np Σ_{k=1}^{n} (n−1 choose k−1) p^{k−1} q^{n−k}
       = np Σ_{k=0}^{n−1} (n−1 choose k) p^k q^{(n−1)−k}
       = np Σ_{k=0}^{n−1} b(k; n − 1, p)
       = np . (C.36)
By using the linearity of expectation, we can obtain the same result with sub-
stantially less algebra. Let X i be the random variable describing the number of
successes in the ith trial. Then E [X i ] = p · 1 + q · 0 = p, and by linearity of
expectation (equation (C.20)), the expected number of successes for n trials is
E [X ] = E [ Σ_{i=1}^{n} X_i ]
       = Σ_{i=1}^{n} E [X_i]
       = Σ_{i=1}^{n} p
       = np . (C.37)
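Both derivations of the mean can be confirmed exactly, here for the distribution b(k; 15, 1/3) of Figure C.2:

```python
# Checking E[X] = np for the binomial distribution, using exact rationals.
import math
from fractions import Fraction

n, p = 15, Fraction(1, 3)
q = 1 - p
mean = sum(k * math.comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1))
assert mean == n * p              # np = 5, as in Figure C.2
```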
The same approach can be used to calculate the variance of the distribution.
Using equation (C.26), we have Var [X_i] = E [X_i^2] − E^2 [X_i]. Since X_i only takes
on the values 0 and 1, we have E [X_i^2] = E [X_i] = p, and hence
Var [X i ] = p − p2 = pq . (C.38)
To compute the variance of X , we take advantage of the independence of the n
trials; thus, by equation (C.28),
Var [X ] = Var [ Σ_{i=1}^{n} X_i ]
         = Σ_{i=1}^{n} Var [X_i]
         = Σ_{i=1}^{n} pq
         = npq . (C.39)
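The variance (C.39) can be checked the same way, again for b(k; 15, 1/3):

```python
# Checking Var[X] = npq for the binomial distribution, exactly.
import math
from fractions import Fraction

n, p = 15, Fraction(1, 3)
q = 1 - p
b = [math.comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
mean = sum(k * bk for k, bk in enumerate(b))
var = sum((k - mean) ** 2 * bk for k, bk in enumerate(b))
assert var == n * p * q           # npq = 10/3
```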
As can be seen from Figure C.2, the binomial distribution b(k; n, p) increases
as k runs from 0 to n until it reaches the mean np, and then it decreases. We can
prove that the distribution always behaves in this manner by looking at the ratio of
successive terms:
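The ratio in question is b(k; n, p)/b(k − 1; n, p) = (n − k + 1)p/(kq), which exceeds 1 exactly when k < (n + 1)p, so the terms rise up to the mode near np and fall afterward. A numerical check of this rise-then-fall shape for the distribution of Figure C.2:

```python
# b(k; 15, 1/3) increases up to its mode and decreases afterward.
import math
from fractions import Fraction

n, p = 15, Fraction(1, 3)
q = 1 - p
b = [math.comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
mode = max(range(n + 1), key=lambda k: b[k])
assert mode == 5                                       # the mean np
assert all(b[k - 1] <= b[k] for k in range(1, mode + 1))   # rising to the mode
assert all(b[k] >= b[k + 1] for k in range(mode, n))       # falling afterward
```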
Introduction to Python 3