
ECE 606, Algorithms

Mahesh Tripunitara
[email protected]
ECE, University of Waterloo

Acknowledgements

Considerable material is straight out of CLRS: T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, “Introduction to Algorithms,” MIT Press, Editions II and III. Some material is from the following.

• J. Thistle and M. Tripunitara, ECE 108 “textbook.”

• W. Conradie and V. Goranko, “Logic and Discrete Mathematics, a Concise Introduction,” Wiley.

• G. P. Hochschild, “Perspectives of Elementary Mathematics,” Springer-Verlag.

• P. J. Cameron, “Sets, Logic and Categories,” Springer.

• M. Huth and M. Ryan, “Logic in Computer Science, Modelling and Reasoning about Systems,” Cambridge.

• S. Dasgupta, C. Papadimitriou and U. Vazirani, “Algorithms,” McGraw-Hill.

• Elyse Yeager, http://www.math.ubc.ca/~elyse/220/2016/9Disproof.pdf

• J. Kleinberg and E. Tardos, “Algorithm Design,” Addison Wesley Longman.

• S. Arora and B. Barak, “Computational Complexity – a Modern Approach,” Cambridge.

• J. Hopcroft, R. Motwani and J. Ullman, “Introduction to Automata Theory, Languages, and Computation,” Addison Wesley.

• V. Vazirani, “Approximation Algorithms,” Springer.



Contents

Lecture 1 5

Lecture 2 65

Lecture 3 107

Lecture 4 147

Lecture 5 201

Lecture 6 259

Lecture 7 297

Lecture 8 341

Lecture 9 385

Lecture 10 403

Lecture 11 419

Lecture 12 437
Lecture 1

• Discrete math review.

• Introduction to Python 3.

Discrete math is a collection of branches of mathematics that deals with discrete, as opposed to continuous, structures. An example of a discrete
structure is the set of integers, {. . . , −2, −1, 0, 1, . . .}. We call those “dis-
crete” because they are a collection of “distinct and unconnected elements,”
as defined in the Merriam-Webster’s dictionary. The real numbers, on the
other hand, are not discrete: between any two real numbers, we can find
another real number.
Logic is a set of principles for systematic reasoning. For example, if I know
that Arthur is a cat, and that every cat is a carnivore, then I can infer that
Arthur is a carnivore.
Discrete math and logic are useful for understanding and analyzing algo-
rithms. We review the following here.

• Sets and functions.

• Propositional logic, and the universal and existential quantifiers.

• Techniques for proving an assertion or its negation.

• Discrete probability and expectation.


Sets, relations and functions

From Cormen, et al., “Introduction to Algorithms.”


B Sets, Etc.
All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.

Many chapters of this book touch on the elements of discrete mathematics. This
chapter reviews more completely the notations, definitions, and elementary prop-
erties of sets, relations, functions, graphs, and trees. Readers already well versed
in this material need only skim this chapter.

B.1 Sets

A set is a collection of distinguishable objects, called its members or elements. If an object x is a member of a set S, we write x ∈ S (read “x is a member of S” or, more briefly, “x is in S”). If x is not a member of S, we write x ∉ S. We
can describe a set by explicitly listing its members as a list inside braces. For
example, we can define a set S to contain precisely the numbers 1, 2, and 3 by
writing S = {1, 2, 3}. Since 2 is a member of the set S, we can write 2 ∈ S, and
since 4 is not a member, we have 4 ∉ S. A set cannot contain the same object more than once,¹ and its elements are not ordered. Two sets A and B are equal, written
A = B, if they contain the same elements. For example, {1, 2, 3, 1} = {1, 2, 3} =
{3, 2, 1}.
We adopt special notations for frequently encountered sets.
• ∅ denotes the empty set, that is, the set containing no members.
• Z denotes the set of integers, that is, the set {. . . , −2, −1, 0, 1, 2, . . .}.
• R denotes the set of real numbers.
Copyright @ 2001. MIT Press.

• N denotes the set of natural numbers, that is, the set {0, 1, 2, . . .}.²

1 A variation of a set, which can contain the same object more than once, is called a multiset.

2 Some authors start the natural numbers with 1 instead of 0. The modern trend seems to be to start
with 0.

If all the elements of a set A are contained in a set B, that is, if x ∈ A implies x ∈ B, then we write A ⊆ B and say that A is a subset of B. A set A is a proper subset of B, written A ⊂ B, if A ⊆ B but A ≠ B. (Some authors use the symbol “⊂” to denote the ordinary subset relation, rather than the proper-subset relation.) For any set A, we have A ⊆ A. For two sets A and B, we have A = B if and only if A ⊆ B and B ⊆ A. For any three sets A, B, and C, if A ⊆ B and B ⊆ C, then A ⊆ C. For any set A, we have ∅ ⊆ A.
We sometimes define sets in terms of other sets. Given a set A, we can define a
set B ⊆ A by stating a property that distinguishes the elements of B. For example,
we can define the set of even integers by {x : x ∈ Z and x/2 is an integer}. The
colon in this notation is read “such that.” (Some authors use a vertical bar in place
of the colon.)
Given two sets A and B, we can also define new sets by applying set operations:
• The intersection of sets A and B is the set

A ∩ B = {x : x ∈ A and x ∈ B} .

• The union of sets A and B is the set

A ∪ B = {x : x ∈ A or x ∈ B} .

• The difference between two sets A and B is the set

A − B = {x : x ∈ A and x ∉ B} .
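Since this course also introduces Python 3, here is a small, self-contained sketch of these three operations using Python’s built-in set type; the sets A and B below are arbitrary examples chosen for illustration.

```python
# Illustrating intersection, union and difference with Python 3's
# built-in set type. A and B are arbitrary example sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}

assert A & B == {3, 4}             # A ∩ B
assert A | B == {1, 2, 3, 4, 5}    # A ∪ B
assert A - B == {1, 2}             # A − B
```

The operators &, | and - correspond directly to ∩, ∪ and −; Python also offers the named methods intersection, union and difference.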

Set operations obey the following laws.


Empty set laws:
A ∩ ∅ = ∅ ,
A ∪ ∅ = A .

Idempotency laws:
A ∩ A = A ,
A ∪ A = A .

Commutative laws:
A ∩ B = B ∩ A ,
A ∪ B = B ∪ A .

[Figure B.1 omitted: Venn diagrams for A − (B ∩ C) = (A − B) ∪ (A − C).]
Figure B.1 A Venn diagram illustrating the first of DeMorgan’s laws (B.2). Each of the sets A, B,
and C is represented as a circle.

Associative laws:
A ∩ (B ∩ C) = (A ∩ B) ∩ C ,
A ∪ (B ∪ C) = (A ∪ B) ∪ C .

Distributive laws:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) ,   (B.1)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) .

Absorption laws:
A ∩ (A ∪ B) = A ,
A ∪ (A ∩ B) = A .

DeMorgan’s laws:
A − (B ∩ C) = (A − B) ∪ (A − C) ,   (B.2)
A − (B ∪ C) = (A − B) ∩ (A − C) .

The first of DeMorgan’s laws is illustrated in Figure B.1, using a Venn diagram, a
graphical picture in which sets are represented as regions of the plane.
Often, all the sets under consideration are subsets of some larger set U called the
universe. For example, if we are considering various sets made up only of integers,
the set Z of integers is an appropriate universe. Given a universe U , we define the
complement of a set A as $\overline{A} = U - A$. For any set A ⊆ U , we have the following laws:

$\overline{\overline{A}} = A$ ,
$A \cap \overline{A} = \emptyset$ ,
$A \cup \overline{A} = U$ .

DeMorgan’s laws (B.2) can be rewritten with complements. For any two sets
B, C ⊆ U , we have

$\overline{B \cap C} = \overline{B} \cup \overline{C}$ ,
$\overline{B \cup C} = \overline{B} \cap \overline{C}$ .
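These complement-form laws are easy to check mechanically for small finite sets in Python; the universe U and the sets B and C below are arbitrary examples.

```python
# Verify DeMorgan's laws in complement form over a small finite
# universe. U, B and C are arbitrary example sets.
U = set(range(10))
B = {1, 2, 3}
C = {2, 3, 4, 5}

def comp(X):
    # Complement of X with respect to the universe U.
    return U - X

assert comp(B & C) == comp(B) | comp(C)
assert comp(B | C) == comp(B) & comp(C)
```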
Two sets A and B are disjoint if they have no elements in common, that is, if
A ∩ B = ∅. A collection 𝒮 = {Si } of nonempty sets forms a partition of a set S if

• the sets are pairwise disjoint, that is, Si , Sj ∈ 𝒮 and i ≠ j imply Si ∩ Sj = ∅, and

• their union is S, that is, S = ⋃_{Si ∈ 𝒮} Si .

In other words, 𝒮 forms a partition of S if each element of S appears in exactly one Si ∈ 𝒮.
The number of elements in a set is called the cardinality (or size) of the set,
denoted |S|. Two sets have the same cardinality if their elements can be put into
a one-to-one correspondence. The cardinality of the empty set is |∅| = 0. If
the cardinality of a set is a natural number, we say the set is finite; otherwise, it
is infinite. An infinite set that can be put into a one-to-one correspondence with
the natural numbers N is countably infinite; otherwise, it is uncountable. The
integers Z are countable, but the reals R are uncountable.
For any two finite sets A and B, we have the identity
|A ∪ B| = |A| + |B| − |A ∩ B| , (B.3)
from which we can conclude that
|A ∪ B| ≤ |A| + |B| .
If A and B are disjoint, then |A ∩ B| = 0 and thus |A ∪ B| = |A| + |B|. If A ⊆ B,
then |A| ≤ |B|.
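Identity (B.3) and its disjoint-set special case can likewise be checked on small examples in Python; A, B and C below are arbitrary.

```python
# Inclusion-exclusion for two finite sets: |A ∪ B| = |A| + |B| − |A ∩ B|.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
assert len(A | B) == len(A) + len(B) - len(A & B)

# For disjoint sets the intersection term vanishes.
C = {7, 8}
assert A & C == set()
assert len(A | C) == len(A) + len(C)
```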
A finite set of n elements is sometimes called an n-set. A 1-set is called a
singleton. A subset of k elements of a set is sometimes called a k-subset.
The set of all subsets of a set S, including the empty set and S itself, is denoted 2^S and is called the power set of S. For example, the power set of {a, b} is 2^{a,b} = {∅, {a} , {b} , {a, b}}. The power set of a finite set S has cardinality 2^|S| .
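Python has no built-in power-set operation, but one can be sketched with itertools by enumerating subsets of every size; the helper name power_set is chosen here for illustration.

```python
from itertools import chain, combinations

def power_set(S):
    # All subsets of S, of every size from 0 to |S|, as frozensets
    # (frozensets are hashable, so they can be collected into a set).
    elems = list(S)
    subsets = chain.from_iterable(
        combinations(elems, k) for k in range(len(elems) + 1))
    return {frozenset(c) for c in subsets}

assert power_set({'a', 'b'}) == {frozenset(), frozenset({'a'}),
                                 frozenset({'b'}), frozenset({'a', 'b'})}
assert len(power_set({1, 2, 3})) == 2 ** 3   # |2^S| = 2^|S|
```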
We sometimes care about setlike structures in which the elements are ordered.
An ordered pair of two elements a and b is denoted (a, b) and can be defined
formally as the set (a, b) = {a, {a, b}}. Thus, the ordered pair (a, b) is not the
same as the ordered pair (b, a).

The Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs such that the first element of the pair is an element of A and the second is an element of B. More formally,


A × B = {(a, b) : a ∈ A and b ∈ B} .
For example, {a, b} × {a, b, c} = {(a, a), (a, b), (a, c), (b, a), (b, b), (b, c)}.
When A and B are finite sets, the cardinality of their Cartesian product is
|A × B| = |A| · |B| . (B.4)
The Cartesian product of n sets A1 , A2 , . . . , An is the set of n-tuples
A1 × A2 × · · · × An = {(a1 , a2 , . . . , an ) : ai ∈ Ai , i = 1, 2, . . . , n} ,
whose cardinality is
|A1 × A2 × · · · × An | = |A1 | · |A2 | · · · |An |
if all sets are finite. We denote an n-fold Cartesian product over a single set A by the set A^n = A × A × · · · × A , whose cardinality is |A^n| = |A|^n if A is finite. An n-tuple can also be viewed as a
finite sequence of length n (see page 1078).
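In Python, itertools.product realizes Cartesian products directly; a small sketch, including the n-fold case:

```python
from itertools import product

A = {'a', 'b'}
B = {'a', 'b', 'c'}

# All ordered pairs (a, b) with a ∈ A and b ∈ B.
AxB = set(product(A, B))
assert AxB == {('a', 'a'), ('a', 'b'), ('a', 'c'),
               ('b', 'a'), ('b', 'b'), ('b', 'c')}
assert len(AxB) == len(A) * len(B)                     # |A × B| = |A| · |B|
assert len(set(product(A, repeat=3))) == len(A) ** 3   # |A^3| = |A|^3
```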

Exercises

B.1-1
Draw Venn diagrams that illustrate the first of the distributive laws (B.1).

B.1-2
Prove the generalization of DeMorgan’s laws to any finite collection of sets:
$\overline{A_1 \cap A_2 \cap \cdots \cap A_n} = \overline{A_1} \cup \overline{A_2} \cup \cdots \cup \overline{A_n}$ ,
$\overline{A_1 \cup A_2 \cup \cdots \cup A_n} = \overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_n}$ .

B.1-3 
Prove the generalization of equation (B.3), which is called the principle of inclu-
sion and exclusion:
|A1 ∪ A2 ∪ · · · ∪ An | = |A1 | + |A2 | + · · · + |An |
− |A1 ∩ A2 | − |A1 ∩ A3 | − · · · (all pairs)
+ |A1 ∩ A2 ∩ A3 | + · · · (all triples)
⋮
+ (−1)^{n−1} |A1 ∩ A2 ∩ · · · ∩ An | .

B.1-4
Show that the set of odd natural numbers is countable.

B.1-5
Show that for any finite set S, the power set 2^S has 2^|S| elements (that is, there are 2^|S| distinct subsets of S).

B.1-6
Give an inductive definition for an n-tuple by extending the set-theoretic definition
for an ordered pair.

B.2 Relations

A binary relation R on two sets A and B is a subset of the Cartesian product A × B. If (a, b) ∈ R, we sometimes write a R b. When we say that R is a binary relation
on a set A, we mean that R is a subset of A × A. For example, the “less than”
relation on the natural numbers is the set {(a, b) : a, b ∈ N and a < b}. An n-ary
relation on sets A1 , A2 , . . . , An is a subset of A1 × A2 × · · · × An .
A binary relation R ⊆ A × A is reflexive if
aRa
for all a ∈ A. For example, “=” and “≤” are reflexive relations on N, but “<” is
not. The relation R is symmetric if
a R b implies b R a
for all a, b ∈ A. For example, “=” is symmetric, but “<” and “≤” are not. The
relation R is transitive if
a R b and b R c imply a R c
for all a, b, c ∈ A. For example, the relations “<,” “≤,” and “=” are transitive, but
the relation R = {(a, b) : a, b ∈ N and a = b − 1} is not, since 3 R 4 and 4 R 5 do
not imply 3 R 5.
A relation that is reflexive, symmetric, and transitive is an equivalence relation.
For example, “=” is an equivalence relation on the natural numbers, but “<” is not.

If R is an equivalence relation on a set A, then for a ∈ A, the equivalence class of a is the set [a] = {b ∈ A : a R b}, that is, the set of all elements equivalent to a.
For example, if we define R = {(a, b) : a, b ∈ N and a + b is an even number},
then R is an equivalence relation, since a + a is even (reflexive), a + b is even
implies b + a is even (symmetric), and a + b is even and b + c is even imply a + c
is even (transitive). The equivalence class of 4 is [4] = {0, 2, 4, 6, . . .}, and the

equivalence class of 3 is [3] = {1, 3, 5, 7, . . .}. A basic theorem of equivalence


classes is the following.

Theorem B.1 (An equivalence relation is the same as a partition)


The equivalence classes of any equivalence relation R on a set A form a partition
of A, and any partition of A determines an equivalence relation on A for which the
sets in the partition are the equivalence classes.

Proof For the first part of the proof, we must show that the equivalence classes
of R are nonempty, pairwise-disjoint sets whose union is A. Because R is reflex-
ive, a ∈ [a], and so the equivalence classes are nonempty; moreover, since every
element a ∈ A belongs to the equivalence class [a], the union of the equivalence
classes is A. It remains to show that the equivalence classes are pairwise dis-
joint, that is, if two equivalence classes [a] and [b] have an element c in common,
then they are in fact the same set. Now a R c and b R c, which by symmetry and
transitivity imply a R b. Thus, for any arbitrary element x ∈ [a], we have x R a
implies x R b, and thus [a] ⊆ [b]. Similarly, [b] ⊆ [a], and thus [a] = [b].
For the second part of the proof, let A′ = {Ai } be a partition of A, and define R = {(a, b) : there exists i such that a ∈ Ai and b ∈ Ai }. We claim that R is an
equivalence relation on A. Reflexivity holds, since a ∈ Ai implies a R a. Symme-
try holds, because if a R b, then a and b are in the same set Ai , and hence b R a. If
a R b and b R c, then all three elements are in the same set, and thus a R c and tran-
sitivity holds. To see that the sets in the partition are the equivalence classes of R,
observe that if a ∈ Ai , then x ∈ [a] implies x ∈ Ai , and x ∈ Ai implies x ∈ [a].
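Theorem B.1 can be watched in action on a finite example. The sketch below computes the equivalence classes of the relation “a + b is even” on {0, . . . , 7} (the cutoff 8 is an arbitrary choice) and checks that they form a partition.

```python
# Equivalence classes of R = {(a, b) : a + b is even} on a finite
# slice of N, checked to form a partition of that slice.
A = set(range(8))

def related(a, b):
    return (a + b) % 2 == 0

classes = {frozenset(b for b in A if related(a, b)) for a in A}
assert classes == {frozenset({0, 2, 4, 6}), frozenset({1, 3, 5, 7})}

# Partition checks: classes are nonempty, pairwise disjoint,
# and their union is A.
assert all(len(c) > 0 for c in classes)
assert set().union(*classes) == A
assert all(c == d or not (c & d) for c in classes for d in classes)
```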

A binary relation R on a set A is antisymmetric if


a R b and b R a imply a = b .
For example, the “≤” relation on the natural numbers is antisymmetric, since a ≤ b
and b ≤ a imply a = b. A relation that is reflexive, antisymmetric, and transitive
is a partial order, and we call a set on which a partial order is defined a partially
ordered set. For example, the relation “is a descendant of” is a partial order on the
set of all people (if we view individuals as being their own descendants).
In a partially ordered set A, there may be no single “maximum” element a such that b R a for all b ∈ A. Instead, there may be several maximal elements a such that for no b ∈ A, where b ≠ a, is it the case that a R b. For example, in a collection of different-sized boxes there may be several maximal boxes that don’t fit inside any other box, yet no single “maximum” box into which any other box will fit.³

3 To be precise, in order for the “fit inside” relation to be a partial order, we need to view a box as
fitting inside itself.


A partial order R on a set A is a total or linear order if for all a, b ∈ A, we have a R b or b R a, that is, if every pairing of elements of A can be related by R.

For example, the relation “≤” is a total order on the natural numbers, but the “is a
descendant of” relation is not a total order on the set of all people, since there are
individuals neither of whom is descended from the other.

Exercises

B.2-1
Prove that the subset relation “⊆” on all subsets of Z is a partial order but not a
total order.

B.2-2
Show that for any positive integer n, the relation “equivalent modulo n” is an equiv-
alence relation on the integers. (We say that a ≡ b (mod n) if there exists an
integer q such that a − b = qn.) Into what equivalence classes does this relation
partition the integers?

B.2-3
Give examples of relations that are
a. reflexive and symmetric but not transitive,
b. reflexive and transitive but not symmetric,
c. symmetric and transitive but not reflexive.

B.2-4
Let S be a finite set, and let R be an equivalence relation on S × S. Show that if
in addition R is antisymmetric, then the equivalence classes of S with respect to R
are singletons.

B.2-5
Professor Narcissus claims that if a relation R is symmetric and transitive, then it
is also reflexive. He offers the following proof. By symmetry, a R b implies b R a.
Transitivity, therefore, implies a R a. Is the professor correct?

B.3 Functions

Given two sets A and B, a function f is a binary relation on A × B such that for
all a ∈ A, there exists precisely one b ∈ B such that (a, b) ∈ f . The set A is called
the domain of f , and the set B is called the codomain of f . We sometimes write

f : A → B; and if (a, b) ∈ f , we write b = f (a), since b is uniquely determined by the choice of a.

Intuitively, the function f assigns an element of B to each element of A. No element of A is assigned two different elements of B, but the same element of B
can be assigned to two different elements of A. For example, the binary relation
f = {(a, b) : a, b ∈ N and b = a mod 2}
is a function f : N → {0, 1}, since for each natural number a, there is exactly one
value b in {0, 1} such that b = a mod 2. For this example, 0 = f (0), 1 = f (1),
0 = f (2), etc. In contrast, the binary relation
g = {(a, b) : a, b ∈ N and a + b is even}
is not a function, since (1, 3) and (1, 5) are both in g, and thus for the choice a = 1,
there is not precisely one b such that (a, b) ∈ g.
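For finite relations, the defining property of a function can be checked directly: every element of the domain must occur as a first coordinate exactly once. A sketch using these two example relations (the helper name is_function is chosen here for illustration):

```python
# A finite relation rel ⊆ A × B is a function iff each a in the
# domain appears as a first coordinate of exactly one pair.
def is_function(rel, domain):
    firsts = [a for (a, _) in rel]
    return all(firsts.count(a) == 1 for a in domain)

A = {0, 1, 2, 3}
f = {(a, a % 2) for a in A}                              # b = a mod 2
g = {(a, b) for a in A for b in A if (a + b) % 2 == 0}   # a + b is even

assert is_function(f, A)        # f is a function
assert not is_function(g, A)    # g relates 1 to both 1 and 3
```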
Given a function f : A → B, if b = f (a), we say that a is the argument of f
and that b is the value of f at a. We can define a function by stating its value for
every element of its domain. For example, we might define f (n) = 2n for n ∈ N,
which means f = {(n, 2n) : n ∈ N}. Two functions f and g are equal if they have
the same domain and codomain and if, for all a in the domain, f (a) = g(a).
A finite sequence of length n is a function f whose domain is the set of n integers {0, 1, . . . , n − 1}. We often denote a finite sequence by listing its values: ⟨ f (0), f (1), . . . , f (n − 1)⟩. An infinite sequence is a function whose domain
is the set N of natural numbers. For example, the Fibonacci sequence, defined by
recurrence (3.21), is the infinite sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, . . ..
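Viewed this way, an infinite sequence in code is just a function on N; the sketch below evaluates the Fibonacci sequence at a given index.

```python
# The Fibonacci sequence as a function fib : N → N; fib(n) is the
# value of the infinite sequence at index n.
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert [fib(n) for n in range(9)] == [0, 1, 1, 2, 3, 5, 8, 13, 21]
```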
When the domain of a function f is a Cartesian product, we often omit the extra
parentheses surrounding the argument of f . For example, if we had a function
f : A1 × A2 × · · · × An → B, we would write b = f (a1 , a2 , . . . , an ) instead
of b = f ((a1 , a2 , . . . , an )). We also call each ai an argument to the function f ,
though technically the (single) argument to f is the n-tuple (a1 , a2 , . . . , an ).
If f : A → B is a function and b = f (a), then we sometimes say that b is the
image of a under f . The image of a set A′ ⊆ A under f is defined by
f (A′) = {b ∈ B : b = f (a) for some a ∈ A′} .
The range of f is the image of its domain, that is, f (A). For example, the range of the function f : N → N defined by f (n) = 2n is f (N) = {m : m = 2n for some n ∈ N}.
A function is a surjection if its range is its codomain. For example, the function f (n) = ⌊n/2⌋ is a surjective function from N to N, since every element in N
appears as the value of f for some argument. In contrast, the function f (n) = 2n
is not a surjective function from N to N, since no argument to f can produce 3 as a
value. The function f (n) = 2n is, however, a surjective function from the natural

numbers to the even numbers. A surjection f : A → B is sometimes described as mapping A onto B. When we say that f is onto, we mean that it is surjective.

A function f : A → B is an injection if distinct arguments to f produce distinct values, that is, if a ≠ a′ implies f (a) ≠ f (a′). For example, the function
f (n) = 2n is an injective function from N to N, since each even number b is the
image under f of at most one element of the domain, namely b/2. The function f (n) = ⌊n/2⌋ is not injective, since the value 1 is produced by two arguments: 2
and 3. An injection is sometimes called a one-to-one function.
A function f : A → B is a bijection if it is injective and surjective. For example,
the function f (n) = (−1)^n ⌈n/2⌉ is a bijection from N to Z:
0 → 0,
1 → −1 ,
2 → 1,
3 → −2 ,
4 → 2,
..
.
The function is injective, since no element of Z is the image of more than one
element of N. It is surjective, since every element of Z appears as the image of
some element of N. Hence, the function is bijective. A bijection is sometimes
called a one-to-one correspondence, since it pairs elements in the domain and
codomain. A bijection from a set A to itself is sometimes called a permutation.
When a function f is bijective, its inverse f^{−1} is defined as
f^{−1}(b) = a if and only if f (a) = b .
For example, the inverse of the function f (n) = (−1)^n ⌈n/2⌉ is

f^{−1}(m) = 2m        if m ≥ 0 ,
            −2m − 1   if m < 0 .
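Both the bijection and its inverse can be transcribed and round-trip-checked in Python (a sketch; the names f and f_inv are chosen here):

```python
from math import ceil

# The bijection f : N → Z, f(n) = (−1)^n * ceil(n/2), and its inverse.
def f(n):
    return (-1) ** n * ceil(n / 2)

def f_inv(m):
    return 2 * m if m >= 0 else -2 * m - 1

assert [f(n) for n in range(5)] == [0, -1, 1, -2, 2]
# Round trip: applying f and then f_inv recovers n.
assert all(f_inv(f(n)) == n for n in range(100))
```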

Exercises

B.3-1
Let A and B be finite sets, and let f : A → B be a function. Show that
a. if f is injective, then |A| ≤ |B|;

b. if f is surjective, then |A| ≥ |B|.

B.3-2
Is the function f (x) = x + 1 bijective when the domain and the codomain are N?
Is it bijective when the domain and the codomain are Z?

Propositional logic, ∀ and ∃

We now discuss some logic, mostly propositional logic. We then augment it with the universal and existential quantifiers.

Definition 1 (Proposition). A proposition is a statement with which we are able to associate true or false. Each of true and false is called a truth value.

Examples of propositions:

1. “The Earth is flat.”

2. “Not all birds can fly.”

3. “A dog is a mammal, and not a bird.”

Of course, in the above, Proposition (1) happens to be false, and Propositions (2) and (3) are true. In this context, we should make the rather important
observation that in assessing the truth value of a proposition, e.g., “the Earth
is flat,” we do so in the context of a domain of discourse. For example, if the
domain of discourse is a sci-fi novel in which there is an entity called Earth
which the novel specifies to be flat, then the truth value of the proposition
“the Earth is flat” is true in that domain of discourse. Usually, the domain
of discourse is clear from context. For example, when we said confidently
that Proposition (1) above is false, we were presumably adopting a domain
of discourse in which “Earth” refers to the planet on which we reside.

Examples of statements that are not propositions:

1. “Hey, you!”

2. “Which way is the hotel?”

3. “This statement is false.”

4. “The variable x is non-negative.”



The first of the above is an exclamation, and the second is a question. As for the third, if it is true, then it is false, and if it is false, then it is true.
Thus, we are able to associate neither true nor false with that statement. The
fourth refers to a variable that can take on one of several values. Without
knowledge of exactly what value x takes at a given moment, we cannot assess
the truthfulness of the statement.
In this context, it is interesting and fun to address an old riddle. Suppose
one is faced with two persons, call them Alice and Bob, one of whom always
speaks the truth, and the other of whom always lies. What questions, when
asked of Alice and/or Bob, would reveal which one amongst them is the
truth-teller, and which one is the liar?
Suppose we ask one of them, say Alice, whether the other, Bob, would say
‘yes’ if asked whether Alice is the liar. If Alice is the truth-teller, then she
would say ‘yes,’ because Bob is the liar, and he would answer ‘yes’ to our
question to him, when the correct answer is ‘no.’ If Alice is the liar, then
she would say ‘no,’ because Bob, as the truth-teller, would say ‘yes’ if we
asked him whether Alice is the liar, and because Alice always lies, she would
negate that expected response from Bob.
While devising the right question to ask above certainly takes creativity,
underlying the entire exercise is careful logical reasoning. Communicating
and inculcating this is exactly our intent with our discussions on propositional
logic.
To develop an understanding of propositional logic, we will often deal with
propositions abstractly. Specifically, we will adopt usages such as: “Assume
that p is a proposition.” When we say that, we do not know exactly what
the proposition p is. All we know is that p is either true or false.
Given propositions, we can compose them in certain ways to yield other
propositions. Some refer to such a new proposition as a compound proposi-
tion. A proposition that is not compound is called an atomic proposition.
The third example of a proposition above, “A dog is a mammal, and not a
bird” is an example of a compound proposition. As another example, consider
the following two propositions: (i) “The glass is not empty.” (ii) “The glass
is not full.” We can compose them and say, (iii) “The glass is neither empty
nor full.” Given such a compound proposition, it is necessary to clarify its

semantics, that is, what the truth value of the compound proposition (iii) is
as a function of the truth values of its constituent, atomic propositions.
To clarify what we mean, suppose the glass is indeed empty. Then Propo-
sition (i) above is false. This implies that Proposition (iii) is false as well.
Similarly, suppose Proposition (iii) is false. Then at least one of Proposition
(i) and (ii) is false. A customary way, in propositional logic, to specify a se-
mantics for a proposition that is composed of other propositions is to specify
a truth table. For our example of Propositions (i)–(iii) above, such a truth
table may look like the following.

If “the glass is      and “the glass is     then “the glass is neither
not empty” is         not full” is          empty nor full” is
true                  true                  true
true                  false                 false
false                 true                  false
false                 false                 false
An important aspect of logic is to carefully distinguish syntax from seman-
tics. Syntax refers to the way we write things down. Semantics refers to
what they mean. We now specify a syntax for compound propositions. We
then clarify what the semantics of each is, via truth tables. The manner
in which we specify a syntax for compound propositions is by introducing
logical connectives, and then asserting that the use of such connectives in
particular ways is syntactically valid.

Logical connectives – syntax. Given that each of p and q is a proposition, so are the following:

• (p): parenthesization – used to force precedence.

• ¬p: negation.

• p ∧ q: conjunction.

• p ∨ q: disjunction.

• p =⇒ q: implication.

• p ⇐= q: inference.
• p ⇐⇒ q: if and only if.

Given the above syntax for the use of logical connectives to make new propo-
sitions, we can further propose rules via which even more propositions can
be derived. They would be similar to the axioms of boolean algebra, which
we encounter in the context of digital circuits. We present an example here,
but leave more for a future course. For this course, we focus on employing
semantics, which we specify using truth tables, to infer more propositions.
Similarly, in the context of digital circuits, we usually employ “truth tables,”
like those employed here, rather than proofs based on the axioms of boolean
algebra.
We point out that more connectives can be introduced, for example, ⊕, “exclusive-or.” Not all of the connectives are necessary, in the sense that given a smaller set of them, we can realize the others: in propositional logic, (·), ¬, ∨ and ∧ suffice, and all other connectives can be defined using those only. We introduced =⇒ , ⇐= and ⇐⇒ as well because those are used heavily in this course for proofs. Consequently, it is useful to directly specify and understand those connectives too. Similarly, XOR gates are convenient to have in the context of digital circuits, even though their functionality can be realized from NOT, AND and OR gates only.
As an example of the use of purely syntactic derivation, see proofwiki.org/
wiki/Rule_of_Material_Implication/Formulation_1/Forward_Implication/Proof,
which shows a derivation from p =⇒ q to ¬p ∨ q.
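Besides the syntactic derivation just cited, the same equivalence can be confirmed semantically by brute force over truth values, using the standard truth table for implication (p =⇒ q is false only when p is true and q is false); a sketch in Python:

```python
from itertools import product

# The truth table of p => q, row by row: false only when p is true
# and q is false.
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

# Check exhaustively that "not p or q" realizes exactly this table.
for p, q in product([True, False], repeat=2):
    assert IMPLIES[(p, q)] == ((not p) or q)
```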

Logical connectives – semantics. The following truth tables are customarily associated with the above propositions that are formed using logical connectives. A truth table specifies, for every possibility of a truth value for the constituent propositions, what the truth value of a compound proposition is. We use T for true, and F for false.

p (p)
• parenthesization: T T
F F

The above truth table merely emphasizes that the truth value of p is
unaffected by parenthesization.

p ¬p
• negation: T F
F T
Example: suppose “the Sun is hot” is true. Then, “the Sun is not hot”
is false. The second statement is the manner in which we customarily
write the negation of “the Sun is hot” in English.

p q p∧q
T T T
• conjunction: T F F
F T F
F F F
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then, “the Moon is made of cheese and the Sun is hot”
is false.
p q p∨q
T T T
• disjunction: T F T
F T T
F F F
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then, “either the Moon is made of cheese, or the Sun
is hot, or both” is true.

p q p =⇒ q
T T T
• implication: T F F
F T T
F F T
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then:

– “If the Sun is hot, then the Moon is made of cheese” is false.
– “If the Moon is made of cheese, then the Sun is hot” is true.

– “If the Sun is not hot, then the Moon is made of cheese” is true.

The last two examples illustrate that, in propositional logic, “if p then
q” may have a very different meaning than in natural language. In
English, it is often used, for instance, to imply a causal relationship
between p and q. But given a premise p that is false – for example,
“the Sun is not hot” – the implication p =⇒ q is true for any q,
even a completely unrelated proposition q such as “the Moon is made
of cheese.” So the current truth of p =⇒ q does not mean that,
when the Sun eventually cools, the Moon will then be composed en-
tirely of fermented curd; rather, when the Sun cools, the implication
itself will be false: in our truth-functional semantics, the truth value of
the compound proposition reflects only the specific truth values of the
constituent propositions, and no more profound relationship between
those constituent propositions. It may be helpful to think of “if p then
q” as shorthand for, “(in any row of the truth table in which p =⇒ q
is true), if p is true, then q is true.”
In mathematics, because we use these same truth-functional semantics,
if p is false, we say that p =⇒ q is vacuously true, to mean that the
implication is true simply by virtue of the falsity of its premise. For
example, if p is “x is an element of the empty set,” and q is “x has
property Q,” then p =⇒ q is (vacuously) true, whatever the property
Q: the elements of the empty set can be said to have any property that
you like, because there are no such elements.
It is not necessary to read p =⇒ q as “if p then q”; another com-
mon way is to say “p only if q.” Again, the proper interpretation is
truth-functional. In other words, in our truth-functional semantics, the
following two statements are completely equivalent:

– If the Sun is hot, then the Moon is made of cheese.


– The Sun is hot only if the Moon is made of cheese.

• inference:

      p   q   p ⇐= q
      T   T      T
      T   F      T
      F   T      F
      F   F      T

Here the compound proposition is a different way of writing q =⇒ p.
It is commonly read, “p if q,” but should be interpreted only truth-functionally,
and not as implying some deeper relationship between p and q.
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then:

– “the Sun is hot if the Moon is made of cheese” is true.


– “the Moon is made of cheese if the Sun is hot” is false.
– “the Moon is made of cheese if the Sun is not hot” is true.
• if and only if:

      p   q   p ⇐⇒ q
      T   T      T
      T   F      F
      F   T      F
      F   F      T
Example: suppose “the Moon is made of cheese” is false, and “the Sun
is hot” is true. Then:

– “The Sun is hot if and only if the Moon is made of cheese” is false.
– “The Moon is made of cheese if and only if the Sun is not hot” is
true.

Given the above semantics via truth tables, we can now infer several more
propositions.

Claim 1. (p =⇒ q) ⇐⇒ (¬p ∨ q).

Proof. By truth-table.
p q ¬p p =⇒ q ¬p ∨ q (p =⇒ q) ⇐⇒ (¬p ∨ q)
F F T T T T
F T T T T T
T F F F F T
T T F T T T

We claim that the above is a valid proof for the claim because for every
possible combination of truth values for p and q, we have shown that the
proposition in the claim is true. We now make and prove two more claims.
The first, which is an implication, has a special name, and is useful for
carrying out some proofs. Given p =⇒ q, we call the proposition ¬q =⇒
¬p its contrapositive. The contrapositive of an implication is different from
the converse: the converse of p =⇒ q is q =⇒ p. It turns out that
(p =⇒ q) ⇐⇒ (¬q =⇒ ¬p), that is, an implication and its contrapositive
are completely equivalent from the standpoint of their respective truth values.
However, given a proposition p =⇒ q, its converse, q =⇒ p, is not
necessarily true.
For example, suppose you know that if it rains, then I carry an umbrella. You
happen to observe that I am carrying an umbrella. Can you infer anything,
for example, that it is raining? The answer is no, not necessarily. On the
other hand, suppose you observe that I am not carrying an umbrella. Can
you infer anything? The answer is yes, you can infer that it is not raining.

Claim 2. (p =⇒ q) ⇐⇒ (¬q =⇒ ¬p).

Proof. We prove by truth table.

      p   q   ¬p   ¬q   p =⇒ q   ¬q =⇒ ¬p   (p =⇒ q) ⇐⇒ (¬q =⇒ ¬p)
      F   F    T    T      T          T                  T
      F   T    T    F      T          T                  T
      T   F    F    T      F          F                  T
      T   T    F    F      T          T                  T

We now assert something that is perhaps not as easy to prove, if only because
it involves three propositions, p, q and r. But again, careful use of the truth
table enables us to carry out the proof somewhat mechanically.

Claim 3. (p =⇒ q) =⇒ (p ∨ r =⇒ q ∨ r).

Proof. By truth table.

      p   q   r   p∨r   q∨r   p =⇒ q   p∨r =⇒ q∨r   (p =⇒ q) =⇒ (p∨r =⇒ q∨r)
      F   F   F    F     F       T           T                    T
      F   F   T    T     T       T           T                    T
      F   T   F    F     T       T           T                    T
      F   T   T    T     T       T           T                    T
      T   F   F    T     F       F           F                    T
      T   F   T    T     T       F           T                    T
      T   T   F    T     T       T           T                    T
      T   T   T    T     T       T           T                    T

Perhaps the trickiest part of the truth table in the above proof is intuiting
the truth value of the last column when p =⇒ q is false. Recall that the
proposition φ =⇒ ψ is true whenever φ is false. And in this case, φ is
p =⇒ q.
A number of other useful propositions can similarly be inferred from the
truth tables. Following are some useful propositions, and names we associate
with them when perceived as properties.

• (p ∨ q) ⇐⇒ (q ∨ p) – commutativity of ∨.

• (p ∧ q) ⇐⇒ (q ∧ p) – commutativity of ∧.

• ((p ∨ q) ∨ r) ⇐⇒ (p ∨ (q ∨ r)) – associativity of ∨.

• ((p ∧ q) ∧ r) ⇐⇒ (p ∧ (q ∧ r)) – associativity of ∧.

• (¬(p ∨ q)) ⇐⇒ (¬p ∧ ¬q) – De Morgan’s law (¬ over ∨).

• (¬(p ∧ q)) ⇐⇒ (¬p ∨ ¬q) – De Morgan’s law (¬ over ∧).

• (p ∨ (q ∧ r)) ⇐⇒ ((p ∨ q) ∧ (p ∨ r)) – distributivity of ∨ over ∧.

• (p ∧ (q ∨ r)) ⇐⇒ ((p ∧ q) ∨ (p ∧ r)) – distributivity of ∧ over ∨.

• (p =⇒ q) ⇐⇒ (q ⇐= p).

• (p ⇐⇒ q) ⇐⇒ ((p =⇒ q) ∧ (p ⇐= q)).
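All of these equivalences can be checked the same way the truth-table proofs above work: enumerate every row. A short Python sketch of that mechanical check follows; the helper names `implies` and `tautology` are ours, chosen for illustration, not taken from any library.

```python
from itertools import product

def implies(p, q):
    # Truth table of p => q: false only when p is true and q is false.
    return (not p) or q

def tautology(f, arity):
    # f is a tautology iff it evaluates to true on every row of its truth table.
    return all(f(*row) for row in product([False, True], repeat=arity))

# Claim 1: an implication is equivalent to not-p or q.
assert tautology(lambda p, q: implies(p, q) == ((not p) or q), 2)

# Claim 2: an implication is equivalent to its contrapositive.
assert tautology(lambda p, q: implies(p, q) == implies(not q, not p), 2)

# De Morgan's laws.
assert tautology(lambda p, q: (not (p or q)) == ((not p) and (not q)), 2)
assert tautology(lambda p, q: (not (p and q)) == ((not p) or (not q)), 2)

# Distributivity of "or" over "and" (a three-variable, eight-row table).
assert tautology(lambda p, q, r: (p or (q and r)) == ((p or q) and (p or r)), 3)

print("all equivalences verified")
```

Note that `tautology` simply automates the proof style used above: a proposition over n variables is checked on all 2^n truth assignments.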

Quantifiers We now introduce constructs that are not part of propositional
logic, but of a richer logic called predicate logic. However, as they are
useful for this course in intuiting properties in various contexts, we introduce
and discuss them here. The constructs are called quantifiers, and they are
useful when we want to make assertions that have variables in them.
An example of the use of a quantifier is the following: “every star is hot.”
Another way of saying the same thing, while explicating the use of a variable
and a quantifier, is: “for every star x, x is hot.” The “for every” part is
a quantifier, specifically the universal quantifier. The other quantifier of
interest to us is the existential quantifier. An example of its use is: “there
exists x such that x is a bird and x can fly.” (More simply, in English we
would say, “there exists a bird that can fly,” or “some birds can fly.”)
The notation we use for the universal quantifier is “∀” and for the existential
quantifier is “∃.” For example, we might write: “∃ rational y such that y² =
2.” As another example, “∀ integer x, x³ is an integer.” We can use the
logical connectives ¬, ∨ and ∧ along with quantifiers. For example, to express
that there exists no rational y such that y² = 2, we could write:
“¬(∃ rational y such that y² = 2).”
In the context of that last example, it is useful to be able to intuit equivalent
assertions. We could equivalently assert: “∀ rational y, ¬(y² = 2),” for
that example, or, “∀ rational y, y² ≠ 2,” if we define the symbol “≠” as the
complement of “=.” Indeed, following are the rules, in general, for negating
an assertion with a quantifier. In the following, we assume that p(x) is an
assertion that involves the variable x.

• ¬(∃x, p(x)) ⇐⇒ ∀x, ¬p(x).


• ¬(∀x, p(x)) ⇐⇒ ∃x, ¬p(x).
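Over a finite domain, “for all” and “there exists” are just Python's `all` and `any`, and the two negation rules become identities one can test directly. A small sketch; the predicate `p` below is an arbitrary example of our own choosing.

```python
# Over a finite domain, "forall" is all() and "exists" is any(); the two
# quantifier-negation rules then become checkable identities.
domain = range(-10, 11)

def p(x):
    # An arbitrary example predicate, chosen only for illustration.
    return x * x > 5

# not (exists x, p(x))  <=>  forall x, not p(x)
assert (not any(p(x) for x in domain)) == all(not p(x) for x in domain)

# not (forall x, p(x))  <=>  exists x, not p(x)
assert (not all(p(x) for x in domain)) == any(not p(x) for x in domain)

print("negation rules hold on this domain")
```

The identities hold for any predicate and any finite domain; swapping in a different `p` does not change the outcome.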

We can quantify over more than one variable. For example: “∀ positive
integer a, ∃ real b such that b = √a.” Note that, when different quantifiers
are used, as in this example, their order matters: in general, “∀ person a,
∃ person b such that b is a’s mother” is not equivalent to “∃ person b, such
that, ∀ person a, b is a’s mother”; the first formula asserts that every person
has a mother, the second that there is a person who is mother to everyone
(even herself).

Sometimes, when we use the same quantifier over multiple variables, we write
one instance of a quantifier only, and not several. For example:

∀ real a, b, (a ≤ b ∨ b ≤ a)

when we really should write “∀ real a, ∀ real b . . ..”


We have already been using quantifiers implicitly. For example, consider
Claim 3 above. When we refer to p, q and r in the statement of the claim,
what we really mean to say is, “for all propositions p, q and r, it is true
that. . . ” The “for all” quantifiers on each of p, q and r were left implicit in
the statement of the claim.

Proof techniques

We now discuss proof techniques that are useful in this course, and, in future,
in your engineering profession. The mindset and systematic thinking
that working out a proof develops is critical to one’s success as an engineer.
The kinds of proofs we develop, and the underlying mindsets and techniques
we use, are not only of esoteric or theoretical interest. They have immediate,
practical consequence. The precise communication that such proofs
require is also very valuable for an engineer to develop. Precise technical
communication is an invaluable skill that is highly prized not only in
academia, but also in industry and business settings. We return to this somewhat
philosophical discussion once we have discussed the proof techniques
we seek to impart as part of this course.

Logical deduction The overarching technique we use is logical deduction:
going from a set of known or assumed statements to new statements, typically
derived by logical implication. We have already seen some examples
of this in our discussions on logic in this chapter.
Consider the following joke. Three logicians walk into a bar. The bartender
asks, “would y’all like something to drink?” Logician 1 says, “I don’t know.”
Logician 2 says, “I don’t know.” Logician 3 says, “yes.”
The joke is a play on the wording of the bartender’s question, specifically,
her use of “all.” She seems to be asking whether all three of the logicians
want a drink. Presumably, each of Logicians 1 and 2 would like a drink. But
they do not know yet as to whether all of them want a drink. Therefore,
they are compelled to say, “I don’t know.” Logician 3 infers that the other
two would each like a drink; otherwise, one of them would have said, “no.”
She knows that she wants a drink herself, and therefore says, “yes.”
Imagine that Logician 3 had said, “no.” Then, presumably Logicians 1 and
2 want a drink each, but Logician 3 does not. While this is admittedly a
joke, it exercises logical deduction in a good way. Such logical deduction is
at the foundations of every proof we carry out. Following are some specific
strategies one could adopt to carry out a proof. Each strategy provides a
kind of framework within which logical deduction is used. More than one
strategy may be useful in carrying out a proof, and a proof does not require
any particular strategy to be adopted to be carried out successfully. It is
also important to recognize when one has successfully carried out a proof;
the strategy helps with this aspect as well.
Some of the strategies that arise in this course, and in future courses, are:

• Case analysis: we enumerate, exhaustively, all possible cases that can
  occur, and prove each in turn. Following is an example.

  Claim 4. For any three natural numbers x, y, z, where x + y = z, if
  any two of x, y, z are divisible by 3, then so is the third.

Proof. By case analysis.

  1. x, y are divisible by 3. Then, x = 3a, y = 3b for some natural
     numbers a, b. Then, because z = x + y, z = 3(a + b), which implies
     that z is divisible by 3.
  2. x, z are divisible by 3. Then, x = 3a, z = 3b for some natural
     numbers a, b. As y is a natural number, i.e., y ≥ 0, and x + y = z,
     we have b ≥ a. And, y = 3(b − a). As b ≥ a, b − a is a natural number,
     and therefore y is a natural number that is divisible by 3.
  3. y, z are divisible by 3. This is identical to the previous case, as x
     and y are interchangeable.

An interesting observation about the above claim is that its converse
is not necessarily true. That is, for three natural numbers x, y, z with
x + y = z, if one of them is divisible by 3, it does not necessarily imply that
the other two are as well. A counterexample can be used to establish
this; one such counterexample is x = 1, y = 2, z = 3.
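The case analysis above can be sanity-checked by machine over a small range. A finite check is of course not a proof; it only exercises the claim and the counterexample to its converse. The helper structure below is our own sketch.

```python
# Exhaustively check Claim 4 on a small range: for x + y = z, if any two
# of x, y, z are divisible by 3, so is the third. Not a proof, just a
# sanity test of the case analysis.
N = 50
for x in range(N):
    for y in range(N):
        z = x + y
        divisible = [w % 3 == 0 for w in (x, y, z)]
        if sum(divisible) >= 2:      # any two divisible by 3 ...
            assert all(divisible)    # ... forces the third

# Counterexample to the converse: exactly one of x, y, z divisible by 3.
x, y, z = 1, 2, 3
assert z % 3 == 0 and x % 3 != 0 and y % 3 != 0
print("ok")
```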

• Contradiction: we recall the truth table for an implication φ =⇒ ψ,
  and observe that the only case in which such a proposition is false is
  when φ is true and ψ is false. For a proof by contradiction of a
  proposition φ =⇒ ψ, we assume that the premise, φ, is true, and yet, the
  conclusion, ψ, is false. We then establish by logical deduction that
  something that is false must be true, or that something that is true
  must be false – this is the contradiction we deduce.

For example, consider the following claim, and its proof by contradiction.

Claim 5. √2 is not rational.

Proof. To perceive the statement of the claim as an implication, we can
rephrase it as: x = √2 =⇒ x is not a rational number.
For the purpose of contradiction, assume that x = √2, and x is rational.
Then, x = p/q, where p and q are integers. We assume, without loss of
generality, that p and q have only 1 as a common factor, i.e., p/q is in
its simplest form. Then, x² = 2 = p²/q² =⇒ p² = 2q².
Thus, p² is even. This implies that p is even, because if p is odd, then p
is of the form 2k + 1 where k is an integer, and (2k + 1)² = 4k² + 4k + 1,
which is odd. Thus, p = 2y, for some integer y.
Therefore, p²/2 = (2y)²/2 = 2y² = q². Thus, q² is even as well, and
therefore q is even. Thus, both p and q are even, which means p/q is
not in its simplest form, which is our desired contradiction.

Another example, which was on the final exam of the Spring’18 offering
of the course is the following claim. We define an even number as
follows: x is an even number if x = 2y, where y is an integer.
Claim 6. If a, b, c are positive integers, then at least one of a − b, b −
c, c − a is even.

An example is a = 13, b = 8, c = 5. Then, c − a = −8, which is even.

Proof. Assume, for the purpose of contradiction, that none of a − b, b −
c, c − a is even. Then, a − b = 2k + 1 for some integer k, and b − c = 2l + 1
for some integer l. Then, c − a = −(b − c + a − b) = −(2l + 1 + 2k + 1) =
2(−l − k − 1), which is even because l, k are integers.
This contradicts our assumption that c − a is odd.

• Contrapositive: recall that (φ =⇒ ψ) ⇐⇒ (¬ψ =⇒ ¬φ); the two
  implications are contrapositives of one another. Given a claim φ =⇒ ψ,
  a proof by contrapositive proves, instead, ¬ψ =⇒ ¬φ.
  Following is an example of proof by contrapositive.

Claim 7. For x, y positive integers, (Σ_{i=1}^{x} i = Σ_{i=1}^{y} i) =⇒ (x = y).

Proof. We prove the contrapositive, that is, for x, y positive integers,

      (x ≠ y) =⇒ (Σ_{i=1}^{x} i ≠ Σ_{i=1}^{y} i).

Given that x ≠ y, either (i) x > y or (ii) x < y. In case (i),

      Σ_{i=1}^{x} i = Σ_{i=1}^{y} i + Σ_{i=y+1}^{x} i ≥ (y + 1) + Σ_{i=1}^{y} i > Σ_{i=1}^{y} i,

because x ≥ y + 1 and y + 1 > 0. This implies that Σ_{i=1}^{x} i ≠ Σ_{i=1}^{y} i,
as desired.
Case (ii) is proven identically, by interchanging x and y.

• Construction: this is typically for statements of the form “there
  exists. . . ” That is, a natural way to prove that something exists is to
  construct, or present, one. For example, if we all agree on what an
  elephant is, and I am challenged to prove that elephants exist, I can
  simply produce and present an elephant. Following is an example.

Claim 8. Given any two real numbers, x, y such that x < y, there
exists a real number z such that x < z < y.

Proof. By construction. Let z = (x + y)/2. Then z is real because the
sum of two real numbers is real, and dividing a real by another that is
not zero yields a real. To establish that x < z < y, we observe:

      x < z < y ⇐= x < (x + y)/2 < y
                ⇐= 2x < x + y < 2y
                ⇐= x + x < x + y < y + y
                ⇐= x < y

The above proof demonstrates a useful strategy: to begin with what
we seek to prove, and then work backwards to a sufficient condition for
that to be true, in this case, x < y, which we know to be true.

• Induction: a proof by induction is usually put to use when we have a
  statement that involves a universal quantifier, for a sequence of items,
  for example, all natural numbers. A proof by induction is structured
  as follows:

  – We first prove that the statement is true for the base case. The
    base case is the statement for the first natural number, 0.
  – We then prove the step, i.e., the following implication: if the
    statement is true for all natural numbers 0, 1, . . . , i − 1, then the
    statement is true for the natural number i.

  Together, the two steps above prove the statement for all items in the
  sequence, for example, every natural number. This is because proving
  (i) the base case, i.e., the statement for 0, and (ii) the step, implies
  that the statement is true for the second natural number, 1. This, with
  the step, in turn implies that the statement is true for 2. And 3, and
  so on, for all natural numbers. Following is an example.
Claim 9. 1 + 2 + . . . + n = n(n + 1)/2.

Proof. By induction on n.
Base case: n = 1. When n = 1, the left-hand side is 1. And the
right-hand side is (1 × 2)/2 = 1. Thus, we have proved that the statement is true
for the base case.
Step: we adopt the induction assumption, that the statement is true
for all n = 1, 2, . . . , i − 1, for some i ≥ 2. Under that premise, we seek
to prove the statement for n = i. We observe:

      1 + 2 + . . . + (i − 1) + i = (i − 1)i/2 + i      (by the induction assumption)
                                  = (i² − i + 2i)/2
                                  = (i² + i)/2
                                  = i(i + 1)/2

Thus, we have proven the base case and the step, and therefore we have
successfully carried out our proof by induction on n.

As the base case, we have proved that the statement is true when n = 1. As
a consequence of proving the step, then, we have proved that the statement
is true for n = 2. And with that, and as a consequence of the step, we have
proved that the statement is true for n = 3. And so on.
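While the induction proof covers every n, the closed form is also easy to spot-check by machine for many values, which is a useful habit when first writing down such a formula. A short sketch:

```python
# Spot-check the closed form 1 + 2 + ... + n = n(n+1)/2 for many n.
# The induction proof covers all n; this is only a finite sanity check.
for n in range(1, 1000):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("formula holds for n = 1 .. 999")
```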
We now carry out several proofs as examples to demonstrate the above strate-
gies. We begin with a problem from the final exam of the Spring’18 offering
of the course.

Claim 10. For every integer n ≥ 12, there exist non-negative
integers m₁, m₂ such that n = 4m₁ + 5m₂.

Proof. By induction on n.
Base cases: we prove the statement for the following cases: n = 12, 13, 14, 15.
The reason we consider several base cases becomes apparent once we get into
proving the step. We observe:

• 12 = 4 × 3 + 5 × 0.

• 13 = 4 × 2 + 5 × 1.

• 14 = 4 × 1 + 5 × 2.

• 15 = 4 × 0 + 5 × 3.

Step: we assume that the assertion is true for all n = 12, 13, . . . , i − 1 for
some i ≥ 13. For n = i, we first observe that i = (i − 1) + 1 = 4k₁ + 5k₂ + 1,
for some non-negative integers k₁, k₂, from the induction assumption. We do
a case analysis.
Case (i): k₁ > 0. Then, i = 4k₁ + 5k₂ + 1 = 4(k₁ − 1) + 5(k₂ + 1).
Case (ii): k₁ = 0. Then, because i > 12, k₂ ≥ 3. Then, i = 5k₂ + 1 =
4 × 4 + 5(k₂ − 3).

The reason we prove several base cases is to address Case (ii) of the step:
the smallest n for which k₂ ≥ 3 is n = 15. By addressing several
base cases, we ensure that our proof is indeed correct, i.e., that we can indeed
make the inductive argument.
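The existence statement of Claim 10 also invites a brute-force check: for each n, search for a representation. The helper `representation` below is our own sketch; the search bounds follow from m₁ ≤ n/4 and m₂ ≤ n/5.

```python
# For each n >= 12, search for non-negative m1, m2 with n = 4*m1 + 5*m2.
def representation(n):
    for m1 in range(n // 4 + 1):
        for m2 in range(n // 5 + 1):
            if 4 * m1 + 5 * m2 == n:
                return (m1, m2)
    return None

# Every n in a finite range at or above 12 is representable.
for n in range(12, 500):
    assert representation(n) is not None

# 11 is not representable, so the bound 12 in the claim is tight.
assert representation(11) is None
print("ok")
```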
Claim 11. For every non-negative integer n, exactly one of the following is
true:

• there exists a non-negative integer m such that n = 3m.


• there exists a non-negative integer m such that n = 3m + 1.
• there exists a non-negative integer m such that n = 3m + 2.

We need to be careful here in that the statement says that exactly one of
those cases is true. That is, for a particular n, one of the cases is true, and
neither of the others is true. We need to prove both those properties.

Proof. By induction on n. Again, we are careful to address several base cases.
Base cases: for each of n = 0, 1, 2, we prove the first part by construction,
i.e., by producing an m that demonstrates that the statement is true.
For n = 0, we observe that 0 = 3 × 0, i.e., m = 0, which proves that the
statement is true. For n = 1, we again propose m = 0, and observe that
n = 1 = 3 × 0 + 1. And for n = 2, we propose m = 0, and observe that
n = 2 = 3 × 0 + 2. Thus, we have shown one part of the statement for each
of n = 0, 1, 2, which is that there exists such an m.
We now prove the other part of the statement: that given that 0 = 3m for
some m, it can be neither 3m′ + 1 nor 3m′ + 2 for any non-negative
integer m′. Suppose, for the purpose of contradiction, there exists such an
m′, that is, 0 = 3m′ + 1. Then, m′ = −1/3, which contradicts the assumption
that m′ is a non-negative integer. Similarly, 0 = 3m′ + 2 =⇒ m′ = −2/3,
again a contradiction.
And similarly, if 1 = 3m′, then m′ = 1/3, and if 1 = 3m′ + 2, then m′ = −1/3,
in each case a contradiction to the assumption that m′ is a non-negative
integer. And finally, if 2 = 3m′, then m′ = 2/3, and if 2 = 3m′ + 1, then
m′ = 1/3.

Step: we assume that the statement is true for all n = 0, 1, 2, . . . , i − 1 for
some i ≥ 1. For n = i, we do a case analysis, and in each case, produce an
m.

• if i − 1 = 3m for some non-negative integer m, then i = 3m + 1.

• if i − 1 = 3m + 1 for some non-negative integer m, then i = 3m + 2.

• if i − 1 = 3m + 2 for some non-negative integer m, then i = 3(m + 1).
  And because m is a non-negative integer, so is m + 1.

To establish that no other case applies, assume, for the purpose of
contradiction, that a non-negative integer m′ exists that corresponds to one of the
other cases. We again do a case analysis.

• if i = 3m and i = 3m′ + 1, then m′ = m − 1/3, which is a contradiction
  to the assumption that m′ is a non-negative integer. And if i = 3m′ + 2,
  then m′ = m − 2/3, which is a similar contradiction.

• if i = 3m + 1 and i = 3m′, then m′ = m + 1/3, and if i = 3m′ + 2, then
  m′ = m − 1/3, each of which is a contradiction.

• if i = 3m + 2 and i = 3m′, then m′ = m + 2/3, and if i = 3m′ + 1, then
  m′ = m + 1/3, both of which contradict our assumption that m′ is a
  non-negative integer.
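Both parts of Claim 11, existence and uniqueness, can be exercised together over a finite range by counting how many of the three forms match each n. This is a sketch of our own, not part of the proof:

```python
# Claim 11: every non-negative n matches exactly one of the forms
# 3m, 3m+1, 3m+2 (with m a non-negative integer). Counting the matching
# forms checks existence and uniqueness at once, for a finite range.
for n in range(10000):
    matches = [r for r in (0, 1, 2) if (n - r) >= 0 and (n - r) % 3 == 0]
    assert len(matches) == 1          # exactly one form applies
    assert matches[0] == n % 3        # and it is the remainder of n mod 3
print("ok")
```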

We now consider a proof by induction for a statement that is obviously not
true. The statement is: all horses have the same colour. The “proof” is as
follows. For the base case, pick a horse. Obviously it is the same colour as
itself. Therefore, the base case has been proved. The induction assumption
is that given up to n = i − 1 horses, for some i ≥ 2, they all have the
same colour. Now consider that we are given n = i horses. We pick some
horse, and temporarily remove it from the set. Then we are left with i − 1
horses which, by the induction assumption, all have the same colour. We now
temporarily remove one of those i − 1 horses from the set, and add back in
the horse that we first removed. Again, we are left with i − 1 horses which,
by the induction assumption, must all have the same colour.
A flaw in the above proof is in the manner in which we prove the step. While
it is certainly ok to remove a horse, call it H, from the set and then assert
that the remainder all have the same colour, what we now need to do is prove
that H has the same colour as the other i−1 horses. We cannot again appeal
to the induction assumption to do that, as the above flawed proof does.
We now present one more correct example of proof by induction. In the
following claim, we address a situation in which there appears to be more than
one choice for the parameter on which we carry out induction.
Claim 12. Suppose n is a natural number whose digits, in order of most-
to least-significant, are n_{k−1} n_{k−2} . . . n_0, where each n_i is one of 0, . . . , 9. If
the sum of the digits of n, S_n = Σ_{i=0}^{k−1} n_i, is divisible by 3, then n is divisible
by 3.

An example is n = 809173. Then, S_n = 28, which is not divisible by 3.
Therefore, from the statement in the claim, we cannot infer anything as to
whether n is divisible by 3. On the other hand, the digits of 82907370 add
up to 36, and therefore, if the claim is true, then 82907370 is divisible by 3.
We emphasize that the implication in the statement goes in one direction
only: “. . . if S_n is divisible by 3, then n is divisible by 3. . . ” It says nothing
about what S_n may be if n is divisible by 3.
The above claim presents an example of where if we choose to carry out a
proof by induction, then we need to clearly say on what parameter we carry
out induction. For the above claim, there appear to be at least two choices:
induction on n, and induction on k. In the following proof, we carry out
induction on k, i.e., the number of digits when we write n in decimal.

Proof. Base case: k = 1. Then, n = n_0 = S_n, i.e., n has only one digit.
Then, for S_n to be divisible by 3, S_n must be one of 0, 3, 6 or 9. In each case,
because n = S_n, we observe that n is divisible by 3 as well.
Step: our induction assumption is that given any n that has k = 1, . . . , i − 1
digits, for some i ≥ 2, if S_n is divisible by 3, then so is n. We need to
now prove that given some n of i digits, if S_n is divisible by 3, then so is n.
Henceforth, we use the notation (·)_{10} to indicate when we write a number in
base-10, i.e., its digits from most- to least-significant.
We have n = (n_{i−1} n_{i−2} . . . n_0)_{10}. Therefore, n = 10^{i−1} n_{i−1} + 10^{i−2} n_{i−2} + . . . +
10^0 n_0 = 10^{i−1} n_{i−1} + (n_{i−2} . . . n_0)_{10}. Also, S_n = Σ_{j=0}^{i−1} n_j = n_{i−1} + Σ_{j=0}^{i−2} n_j. We do
a case analysis on Σ_{j=0}^{i−2} n_j as to whether it is divisible by 3. We appeal often
to Claim 4. Recall that that claim is: given three natural numbers x, y, z
such that x + y = z and any two are divisible by 3, then so is the third.

• Suppose Σ_{j=0}^{i−2} n_j is divisible by 3. Then, for S_n to be divisible by 3,
  n_{i−1} must be divisible by 3, by Claim 4. That is, n_{i−1} = 3a for some
  natural number a. Then, n = 10^{i−1} × 3a + (n_{i−2} . . . n_0)_{10}. As Σ_{j=0}^{i−2} n_j is
  divisible by 3, by the induction assumption, (n_{i−2} . . . n_0)_{10} is divisible
  by 3. Therefore, by Claim 4, n = 10^{i−1} × 3a + (n_{i−2} . . . n_0)_{10} is divisible
  by 3, because it is the sum of two numbers, each of which is divisible
  by 3.
• Suppose Σ_{j=0}^{i−2} n_j is not divisible by 3. Then, Σ_{j=0}^{i−2} n_j = 3a + b, for some
  natural number a, and for b either 1 or 2. We now do a case analysis
  of those two cases for b.

  – If b = 1, then n_{i−1} = 3a′ + 2 for some natural number a′, because
    otherwise, S_n is not divisible by 3. And we have:

        n = 10^{i−1} n_{i−1} + (n_{i−2} . . . n_0)_{10}
          = 10^{i−1} (3a′ + 2) + (n_{i−2} . . . n_0)_{10}
          = 10^{i−1} × 3a′ + 10^{i−2} × 20 + (n_{i−2} . . . n_0)_{10}

    Now, we do a further case analysis on n_{i−2}:

    ∗ If n_{i−2} = 0, then we choose to write n as:

          n = 10^{i−1} × 3a′ + 10^{i−2} × 18 + (2 n_{i−3} . . . n_0)_{10}

      Now, each of 10^{i−1} × 3a′ and 10^{i−2} × 18 is divisible by 3.
      And the digits of (2 n_{i−3} . . . n_0)_{10} sum to a multiple of 3, because
      Σ_{j=0}^{i−2} n_j = 3a + 1. Therefore, by the induction assumption,
      (2 n_{i−3} . . . n_0)_{10} is divisible by 3. Thus, n is the sum of three
      numbers, each of which is divisible by 3, and therefore n is
      divisible by 3.
    ∗ If n_{i−2} > 0, then we choose to write n as:

          n = 10^{i−1} × 3a′ + 10^{i−2} × 21 + ((n_{i−2} − 1) n_{i−3} . . . n_0)_{10}

      Again, n is the sum of three numbers each of which is divisible
      by 3.
  – If b = 2, then n_{i−1} = 3a′ + 1 for some natural number a′, because
    otherwise, S_n is not divisible by 3. And we have:

        n = 10^{i−1} n_{i−1} + (n_{i−2} . . . n_0)_{10}
          = 10^{i−1} (3a′ + 1) + (n_{i−2} . . . n_0)_{10}
          = 10^{i−1} × 3a′ + 10^{i−2} × 10 + (n_{i−2} . . . n_0)_{10}

    As before, we do a further case analysis on n_{i−2}:

    ∗ If n_{i−2} = 0 or n_{i−2} = 1, then we choose to write n as:

          n = 10^{i−1} × 3a′ + 10^{i−2} × 9 + ((n_{i−2} + 1) n_{i−3} . . . n_0)_{10}

      And n is the sum of three numbers each of which is divisible
      by 3.
    ∗ If n_{i−2} ≥ 2, then we choose to write n as:

          n = 10^{i−1} × 3a′ + 10^{i−2} × 12 + ((n_{i−2} − 2) n_{i−3} . . . n_0)_{10}

      And n is the sum of three numbers each of which is divisible
      by 3.
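The direction proved in Claim 12 can also be exercised over a finite range. The helper `digit_sum` is our own; note the check tests only the one direction the claim asserts.

```python
# Check Claim 12 over a range: if the digit sum of n is divisible by 3,
# then n is divisible by 3. (The converse also happens to hold, but the
# claim, and this check, only assert the one direction.)
def digit_sum(n):
    return sum(int(d) for d in str(n))

for n in range(1, 100000):
    if digit_sum(n) % 3 == 0:
        assert n % 3 == 0
print("ok")
```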

Disproof

Sometimes we are faced with a statement that we do not know to be true or
false, and we may need to find out which it is. In such a case, we can see
whether we can prove it, or disprove it.
Consider the following example.
Consider the following example.
Claim 13. For every natural number n, n² − n + 11 is prime.

If we simply try to prove this statement, we will never succeed. But we may
succeed in disproving it: it turns out that the above claim is false. That is,
there exists natural n such that n2 − n + 11 is not prime.
Such an n is 11, and this is called a counterexample to the claim: an example
of a specific n for which the claim does not hold. Producing a counterexample
is an effective way of refuting a statement of the form “for all . . . ”
For another disproof by counterexample, consider the statement, “no mam-
mal lays eggs.” This can be seen as the negation of a statement with the
“there exists” quantifier. Which can in turn be rephrased as a statement
with “for all. . . ”

      No mammal lays eggs ⇐⇒ ∄ a mammal that lays eggs
                          ⇐⇒ ¬(∃ a mammal that lays eggs)
                          ⇐⇒ ∀ mammals m, m does not lay eggs

As a counterexample to the latter statement, we could present the platypus,
which is an egg-laying mammal. A note of caution: sometimes it is not
obvious that something that is presented as a counterexample is indeed a
counterexample. In such a situation, we need to prove that it is indeed a
counterexample. For example, we need to prove that the platypus that we
present as a counterexample is indeed a mammal, and does lay eggs.
For our counterexample for Claim 13, as proof that n = 11 is indeed a valid
counterexample, we would observe that 11² − 11 + 11 = 121, which is not
prime because it has a divisor, 11, that is neither itself nor 1. The proof of
Claim 5 of Chapter 2 establishes the non-obvious fact that the square root
of 2 is a valid counterexample to the claim that all real numbers are rational.
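Counterexamples to "for all" claims over the naturals can often be found by a direct search. The sketch below, with our own helper `is_prime`, finds the smallest counterexample to Claim 13:

```python
# Search for the smallest counterexample to "n*n - n + 11 is prime".
def is_prime(m):
    if m < 2:
        return False
    return all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

n = 0
while is_prime(n * n - n + 11):
    n += 1
print(n, n * n - n + 11)   # -> 11 121
```

The search stops at n = 11, where the formula yields 121 = 11 × 11, matching the counterexample above.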

Why is the presentation of a counterexample a valid way of disproving a
statement of the form “for all . . . ”? The reason is that we are proving the
negation of the statement. That is, to prove that a statement S is false, we
prove ¬S to be true. Thus, if P(x) is a statement about x, and S = ∀x, P(x),
then:

      ¬S ⇐⇒ ¬(∀x, P(x)) ⇐⇒ ∃x, ¬P(x)

And then, a counterexample is a proof by construction of ∃x, ¬P(x).


We consider one more example of a claim that we are able to disprove by
counterexample.

Claim 14. For all sets A, B and C, A × C = B × C =⇒ A = B.

As a counterexample, pick C = ∅ and let A, B be any sets such that A ≠ B.


To disprove a statement of a form other than “for all . . . ” we simply negate
the statement, and prove that this negation is true. For example, to disprove
a statement of the form “there exists . . . ”, we need to prove a statement of
the form “for all . . . ” That is:

      Let S = ∃x, P(x).
      Then, ¬S = ¬(∃x, P(x)) ⇐⇒ ∀x, ¬P(x).

Claim 15. There exist primes p, q such that p − q = 513.

The above claim is false. Its negation is:

      For all primes p, q, p − q ≠ 513

We can prove this by contradiction. Suppose there exist primes p, q such
that p − q = 513. (Observe that this is exactly the statement of Claim 15.)
Then, because 513 is odd, one of p, q is even and the other is odd. We now
do a case analysis.
(i) Suppose p is even, and q is odd. Then, p = 2, as that is the only even
prime. Then, q = −511, which contradicts the assumption that q is prime.
(We adopt the customary condition that for a number to be prime, it must
be a natural number ≥ 2.) (ii) Suppose q is even and p is odd. Then,
q = 2 and p = 515 = 5 × 103, which contradicts the assumption that p is prime.
We now consider a more complex example, a statement that involves two
quantifiers. This example illustrates the utility of first carefully negating the
statement, and then choosing a strategy when trying to disprove the original
statement.

Claim 16. ∀m ∈ Z, ∃n ∈ N, 1/m − 1/n > 1/2.

The above statement is not true. Its negation, which we seek to prove, is:

∃m ∈ Z, ∀n ∈ N, 1/m − 1/n ≤ 1/2

We can prove this statement by construction of a suitable m, and then prov-


ing that for that choice of m, the “∀n . . .” part is true. Choose m = 2. Then,
we perform a case-analysis on n.

When n = 1, 1/2 − 1/1 = −1/2 ≤ 1/2.

When n = 2, 1/2 − 1/2 = 0 ≤ 1/2.

When n ≥ 3, 0 < 1/n < 1/2. Therefore, 1/2 − 1/n < 1/2.
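The case analysis above can be spot-checked numerically; the Python snippet below (the range bound is an arbitrary choice of ours) verifies the negated claim for m = 2 over many values of n:

```python
# Negation of Claim 16 with m = 2: for every natural number n >= 1,
# 1/m - 1/n <= 1/2.
m = 2
assert all(1 / m - 1 / n <= 1 / 2 for n in range(1, 10_000))
```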
We conclude with an example of a logical implication that does not hold.
We use this example to illustrate that when an implication is false, there
is some assignment of truth-values to the constituent propositions that
causes it not to hold; for other assignments, it may or may not hold. And
of course, for a proposition to be true, it must be true under all
truth-assignments to its constituent propositions.

Claim 17. (p =⇒ q) =⇒ ((p ∨ r) =⇒ q)

To disprove the above claim, we observe that when p is false, q is false and
r is true, the statement is not true. We observe that for some other truth-
assignments, the statement is true; for example, p = true, q = false, r = true.
But that is immaterial to the fact that the claim is false.
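Since the claim involves only three propositional variables, its falsity can also be confirmed by brute-force truth-table enumeration, sketched here in Python:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # Material implication: a => b is equivalent to (not a) or b.
    return (not a) or b

# Collect all assignments to (p, q, r) that falsify
# (p => q) => ((p or r) => q).
falsifying = [
    (p, q, r)
    for p, q, r in product([False, True], repeat=3)
    if not implies(implies(p, q), implies(p or r, q))
]

assert (False, False, True) in falsifying  # the assignment from the text
assert falsifying                          # so the claim is not a tautology
```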

Another example of the use of this kind of logic and these proof techniques
is the following claim and proof. In the proof, we directly use "⇐⇒." We
could, instead, first establish A̿ ⊆ A, and then A̿ ⊇ A. The symbol "\"
denotes set difference, A̅ denotes the complement U \ A of the set A with
respect to the universe U, and A̿ denotes the complement of A̅.

Claim 18. A̿ = A.

Proof.

x ∈ A̿ ⇐⇒ x ∈ U \ A̅
⇐⇒ x ∈ U ∧ x ∉ A̅
⇐⇒ x ∈ U ∧ x ∉ (U \ A)
⇐⇒ x ∈ U ∧ ¬(x ∈ U \ A)
⇐⇒ x ∈ U ∧ ¬(x ∈ U ∧ ¬(x ∈ A))
⇐⇒ x ∈ U ∧ (x ∉ U ∨ x ∈ A)
⇐⇒ (x ∈ U ∧ x ∉ U) ∨ (x ∈ U ∧ x ∈ A)
⇐⇒ false ∨ (x ∈ U ∧ x ∈ A)
⇐⇒ x ∈ U ∧ x ∈ A
⇐⇒ x ∈ U ∩ A ⇐⇒ x ∈ A
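The claim can also be sanity-checked on a small finite universe; the universe U and the set A below are our own illustrative choices:

```python
# Double complement over a finite universe: U \ (U \ A) = A for A ⊆ U.
U = set(range(10))
A = {2, 3, 5, 7}

def complement(X):
    return U - X

assert complement(complement(A)) == A
```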

As another, somewhat more general, example of the use of logic in the context
of sets, we discuss Russell's paradox, which points out that the set-builder
notation should be used with care.

Russell’s paradox A set is a collection of items. A set itself can be per-


ceived as an item. Therefore, it is possible to specify a set of sets. For
example, {{1}, ∅, {1, 2, 3, 4, 5}} is a set of sets of integers, which has three
members. An immediate question that then arises is: can a set be a member
of itself? It does not seem meaningful to allow this, and therefore we may
mandate that no set is allowed to be a member of itself.
However, it turns out that this by itself does not preclude contradictions that
can occur in the specification of a set. A particular contradiction is Russell’s

paradox, which is demonstrated by the following specification of a set using


set-builder notation.

Let S = {x | x is a set with the property that x ∉ x}

That is, S is the set of all sets that do not contain themselves. Now, we ask:
does S contain itself?

• If the answer is ‘yes,’ then:

S ∈ S =⇒ S is a set that does not contain itself =⇒ S ∉ S

Thus, we have a contradiction.

• If the answer is ‘no,’ then:

S ∉ S =⇒ S is a set that does not contain itself =⇒ S ∈ S

Thus, we again have a contradiction.

A thorough discussion on “clean” specifications of sets and other constructs is


beyond the scope of this course. The above discussion on Russell’s paradox
reveals, however, that care must be taken. A quick “hack” is to restrict
the manner in which the set-builder notation is used. We require that when
specifying a set using the set-builder notation, it must look like the following:

{x ∈ A | conditions on x}

That is, we must specify of what superset A this set being specified is a
subset. And the conditions that appear after “ | ” are then used to specify
which members of A are members of this set. Under these requirements, the
earlier specification, S = {x | x ∉ x}, is no longer allowed.
If we specify, for example, S = {x ∈ A | x ∉ x}, we no longer have a
paradox. For suppose S = {x ∈ A | x ∉ x} is our specification of S, and
we again ask: is S ∈ S?

• If the answer is ‘yes,’ then:

S ∈ S =⇒ S ∈ A ∧ S ∉ S =⇒ S ∉ S

Thus, we have a contradiction.

• If the answer is ‘no,’ then:

S ∉ S =⇒ S ∉ A ∨ (S ∈ A ∧ S ∉ S)

Now, if S ∈ A, then S ∈ A ∧ S ∉ S =⇒ S ∈ S, a contradiction.

Thus, we have a possibility without a contradiction, and that is that S ∉ A.
This implies S ∉ S, and the answer to the question "is S ∈ S?" is "no."

Discrete probability

From Cormen, et al., “Introduction to Algorithms.”


C Counting and Probability
All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.

This chapter reviews elementary combinatorics and probability theory. If you have
a good background in these areas, you may want to skim the beginning of the
chapter lightly and concentrate on the later sections. Most of the chapters do not
require probability, but for some chapters it is essential.
Section C.1 reviews elementary results in counting theory, including standard
formulas for counting permutations and combinations. The axioms of probability
and basic facts concerning probability distributions are presented in Section C.2.
Random variables are introduced in Section C.3, along with the properties of ex-
pectation and variance. Section C.4 investigates the geometric and binomial dis-
tributions that arise from studying Bernoulli trials. The study of the binomial dis-
tribution is continued in Section C.5, an advanced discussion of the “tails” of the
distribution.

C.1 Counting

Counting theory tries to answer the question “How many?” without actually enu-
merating how many. For example, we might ask, “How many different n-bit num-
bers are there?” or “How many orderings of n distinct elements are there?” In this
section, we review the elements of counting theory. Since some of the material
assumes a basic understanding of sets, the reader is advised to start by reviewing
the material in Section B.1.
Copyright @ 2001. MIT Press.

Rules of sum and product


A set of items that we wish to count can sometimes be expressed as a union of
disjoint sets or as a Cartesian product of sets.
The rule of sum says that the number of ways to choose an element from one
of two disjoint sets is the sum of the cardinalities of the sets. That is, if A and B
are two finite sets with no members in common, then |A ∪ B| = |A| + |B|, which
follows from equation (B.3). For example, each position on a car’s license plate
is a letter or a digit. The number of possibilities for each position is therefore
26 + 10 = 36, since there are 26 choices if it is a letter and 10 choices if it is a
digit.
The rule of product says that the number of ways to choose an ordered pair is the
number of ways to choose the first element times the number of ways to choose the
second element. That is, if A and B are two finite sets, then |A × B| = |A| · |B|,
which is simply equation (B.4). For example, if an ice-cream parlor offers 28
flavors of ice cream and 4 toppings, the number of possible sundaes with one scoop
of ice cream and one topping is 28 · 4 = 112.
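Both rules are easy to confirm by enumeration; the following Python sketch mirrors the license-plate and sundae examples:

```python
from itertools import product

# Rule of sum: for disjoint sets, |A ∪ B| = |A| + |B|.
letters = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
digits = set("0123456789")
assert len(letters | digits) == len(letters) + len(digits) == 36

# Rule of product: |A × B| = |A| · |B|.
flavors, toppings = range(28), range(4)
assert len(list(product(flavors, toppings))) == 28 * 4 == 112
```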

Strings
A string over a finite set S is a sequence of elements of S. For example, there are
8 binary strings of length 3:
000, 001, 010, 011, 100, 101, 110, 111 .
We sometimes call a string of length k a k-string. A substring s′ of a string s
is an ordered sequence of consecutive elements of s. A k-substring of a string
is a substring of length k. For example, 010 is a 3-substring of 01101001 (the
3-substring that begins in position 4), but 111 is not a substring of 01101001.
A k-string over a set S can be viewed as an element of the Cartesian product S^k
of k-tuples; thus, there are |S|^k strings of length k. For example, the number of
binary k-strings is 2^k. Intuitively, to construct a k-string over an n-set, we have n
ways to pick the first element; for each of these choices, we have n ways to pick the
second element; and so forth k times. This construction leads to the k-fold product
n · n · · · n = n^k as the number of k-strings.

Permutations
A permutation of a finite set S is an ordered sequence of all the elements of S,
with each element appearing exactly once. For example, if S = {a, b, c}, there
are 6 permutations of S:
abc, acb, bac, bca, cab, cba .
There are n! permutations of a set of n elements, since the first element of the
sequence can be chosen in n ways, the second in n − 1 ways, the third in n − 2


ways, and so on.
A k-permutation of S is an ordered sequence of k elements of S, with no element
appearing more than once in the sequence. (Thus, an ordinary permutation is just
an n-permutation of an n-set.) The twelve 2-permutations of the set {a, b, c, d} are
ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc .

The number of k-permutations of an n-set is

n(n − 1)(n − 2) · · · (n − k + 1) = n!/(n − k)! ,   (C.1)

since there are n ways of choosing the first element, n − 1 ways of choosing the
second element, and so on until k elements are selected, the last being a selection
from n − k + 1 elements.
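Equation (C.1) can be checked by direct enumeration, e.g., for the twelve 2-permutations of {a, b, c, d}:

```python
from itertools import permutations
from math import factorial

n, k = 4, 2
perms = list(permutations("abcd", k))

# n!/(n-k)! = 4!/2! = 12 two-permutations.
assert len(perms) == factorial(n) // factorial(n - k) == 12
```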

Combinations
A k-combination of an n-set S is simply a k-subset of S. For example, there are
six 2-combinations of the 4-set {a, b, c, d}:
ab, ac, ad, bc, bd, cd .
(Here we use the shorthand of denoting the 2-set {a, b} by ab, and so on.) We can
construct a k-combination of an n-set by choosing k distinct (different) elements
from the n-set.
The number of k-combinations of an n-set can be expressed in terms of the num-
ber of k-permutations of an n-set. For every k-combination, there are exactly k!
permutations of its elements, each of which is a distinct k-permutation of the n-set.
Thus, the number of k-combinations of an n-set is the number of k-permutations
divided by k!; from equation (C.1), this quantity is

n!/(k! (n − k)!) .   (C.2)

For k = 0, this formula tells us that the number of ways to choose 0 elements from
an n-set is 1 (not 0), since 0! = 1.
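Equation (C.2), and the relationship between k-combinations and k-permutations, can likewise be verified by enumeration:

```python
from itertools import combinations, permutations
from math import comb, factorial

n, k = 4, 2

# Each 2-combination of {a,b,c,d} corresponds to k! = 2 of its 2-permutations.
assert len(list(combinations("abcd", k))) == len(list(permutations("abcd", k))) // factorial(k)

# Equation (C.2): n!/(k!(n-k)!) = 6, which is also math.comb(4, 2).
assert comb(n, k) == factorial(n) // (factorial(k) * factorial(n - k)) == 6
```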

Binomial coefficients

We use the notation C(n, k) (read "n choose k") to denote the number of
k-combinations of an n-set. From equation (C.2), we have

C(n, k) = n!/(k! (n − k)!) .

This formula is symmetric in k and n − k:

C(n, k) = C(n, n − k) .   (C.3)
These numbers are also known as binomial coefficients, due to their appearance in
the binomial expansion:
(x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^{n−k} .   (C.4)

A special case of the binomial expansion occurs when x = y = 1:

2^n = Σ_{k=0}^{n} C(n, k) .

This formula corresponds to counting the 2^n binary n-strings by the number of 1's
they contain: there are C(n, k) binary n-strings containing exactly k 1's, since there
are C(n, k) ways to choose k out of the n positions in which to place the 1's.
There are many identities involving binomial coefficients. The exercises at the
end of this section give you the opportunity to prove a few.

Binomial bounds
We sometimes need to bound the size of a binomial coefficient. For 1 ≤ k ≤ n, we
have the lower bound
 
C(n, k) = n(n − 1) · · · (n − k + 1) / (k(k − 1) · · · 1)
        = (n/k) ((n − 1)/(k − 1)) · · · ((n − k + 1)/1)
        ≥ (n/k)^k .

Taking advantage of the inequality k! ≥ (k/e)^k derived from Stirling's approxima-
tion (3.17), we obtain the upper bounds

C(n, k) = n(n − 1) · · · (n − k + 1) / (k(k − 1) · · · 1)
        ≤ n^k / k!
        ≤ (en/k)^k .   (C.5)

For all 0 ≤ k ≤ n, we can use induction (see Exercise C.1-12) to prove the bound

C(n, k) ≤ n^n / (k^k (n − k)^{n−k}) ,   (C.6)

where for convenience we assume that 0^0 = 1. For k = λn, where 0 ≤ λ ≤ 1, this
bound can be rewritten as

C(n, λn) ≤ n^n / ((λn)^{λn} ((1 − λ)n)^{(1−λ)n}) .
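A numeric spot-check of these bounds over a modest range of n and k (the range is an arbitrary choice of ours):

```python
from math import comb, e

for n in range(1, 40):
    for k in range(1, n + 1):
        c = comb(n, k)
        # Lower and upper bounds: (n/k)^k <= C(n,k) <= (en/k)^k, per (C.5).
        assert (n / k) ** k <= c <= (e * n / k) ** k
        # Bound (C.6), in exact integer arithmetic; note 0**0 == 1 in Python.
        assert c * k**k * (n - k) ** (n - k) <= n**n
```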

C.1-15 
Show that for any integer n ≥ 0,

Σ_{k=0}^{n} k C(n, k) = n 2^{n−1} .   (C.11)

C.2 Probability

Probability is an essential tool for the design and analysis of probabilistic and ran-
domized algorithms. This section reviews basic probability theory.
We define probability in terms of a sample space S, which is a set whose el-
ements are called elementary events. Each elementary event can be viewed as a
possible outcome of an experiment. For the experiment of flipping two distinguish-
able coins, we can view the sample space as consisting of the set of all possible
2-strings over {H, T }:
S = {HH, HT, TH , TT } .
An event is a subset1 of the sample space S. For example, in the experiment of
flipping two coins, the event of obtaining one head and one tail is {HT, TH }. The
event S is called the certain event, and the event ∅ is called the null event. We say
that two events A and B are mutually exclusive if A ∩ B = ∅. We sometimes treat
an elementary event s ∈ S as the event {s}. By definition, all elementary events are
mutually exclusive.

Axioms of probability
A probability distribution Pr {} on a sample space S is a mapping from events of S
to real numbers such that the following probability axioms are satisfied:
1. Pr {A} ≥ 0 for any event A.
2. Pr {S} = 1.

1 For a general probability distribution, there may be some subsets of the sample space S that are not
considered to be events. This situation usually arises when the sample space is uncountably infinite.
The main requirement is that the set of events of a sample space be closed under the operations of
taking the complement of an event, forming the union of a finite or countable number of events, and
taking the intersection of a finite or countable number of events. Most of the probability distributions
we shall see are over finite or countable sample spaces, and we shall generally consider all subsets of
a sample space to be events. A notable exception is the continuous uniform probability distribution,
which will be presented shortly.

3. Pr{A ∪ B} = Pr{A} + Pr{B} for any two mutually exclusive events A
and B. More generally, for any (finite or countably infinite) sequence of events
A1, A2, . . . that are pairwise mutually exclusive,

Pr{∪_i A_i} = Σ_i Pr{A_i} .

We call Pr { A} the probability of the event A. We note here that axiom 2 is a


normalization requirement: there is really nothing fundamental about choosing 1
as the probability of the certain event, except that it is natural and convenient.
Several results follow immediately from these axioms and basic set theory (see
Section B.1). The null event ∅ has probability Pr {∅} = 0. If A ⊆ B, then
Pr{A} ≤ Pr{B}. Using A̅ to denote the event S − A (the complement of A),
we have Pr{A̅} = 1 − Pr{A}. For any two events A and B,
Pr {A ∪ B} = Pr {A} + Pr {B} − Pr { A ∩ B} (C.12)
≤ Pr {A} + Pr {B} . (C.13)
In our coin-flipping example, suppose that each of the four elementary events
has probability 1/4. Then the probability of getting at least one head is
Pr {HH, HT, TH } = Pr {HH} + Pr {HT} + Pr {TH }
= 3/4 .
Alternatively, since the probability of getting strictly less than one head is
Pr {TT } = 1/4, the probability of getting at least one head is 1 − 1/4 = 3/4.
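Both computations can be reproduced by enumerating the sample space with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Two distinguishable fair coins: uniform distribution over {H,T}^2.
sample_space = list(product("HT", repeat=2))
p = Fraction(1, len(sample_space))  # each elementary event has probability 1/4

at_least_one_head = [s for s in sample_space if "H" in s]
assert len(at_least_one_head) * p == Fraction(3, 4)

# Complement route: 1 - Pr{TT} = 3/4.
assert 1 - p == Fraction(3, 4)
```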

Discrete probability distributions


A probability distribution is discrete if it is defined over a finite or countably infinite
sample space. Let S be the sample space. Then for any event A,

Pr{A} = Σ_{s∈A} Pr{s} ,

since elementary events, specifically those in A, are mutually exclusive. If S is


finite and every elementary event s ∈ S has probability
Pr {s} = 1/ |S| ,
then we have the uniform probability distribution on S. In such a case the experi-
ment is often described as “picking an element of S at random.”
As an example, consider the process of flipping a fair coin, one for which the
probability of obtaining a head is the same as the probability of obtaining a tail, that
is, 1/2. If we flip the coin n times, we have the uniform probability distribution
defined on the sample space S = {H, T}^n, a set of size 2^n. Each elementary event
in S can be represented as a string of length n over {H, T}, and each occurs with
probability 1/2^n. The event

A = {exactly k heads and exactly n − k tails occur}

is a subset of S of size |A| = C(n, k), since there are C(n, k) strings of length n
over {H, T} that contain exactly k H's. The probability of event A is thus
Pr{A} = C(n, k)/2^n.

Continuous uniform probability distribution


The continuous uniform probability distribution is an example of a probability
distribution in which not all subsets of the sample space are considered to be
events. The continuous uniform probability distribution is defined over a closed
interval [a, b] of the reals, where a < b. Intuitively, we want each point in the
interval [a, b] to be “equally likely.” There is an uncountable number of points,
however, so if we give all points the same finite, positive probability, we cannot
simultaneously satisfy axioms 2 and 3. For this reason, we would like to associate
a probability only with some of the subsets of S in such a way that the axioms are
satisfied for these events.
For any closed interval [c, d], where a ≤ c ≤ d ≤ b, the continuous uniform
probability distribution defines the probability of the event [c, d] to be
Pr{[c, d]} = (d − c)/(b − a) .
Note that for any point x = [x, x], the probability of x is 0. If we remove the
endpoints of an interval [c, d], we obtain the open interval (c, d). Since [c, d] =
[c, c] ∪ (c, d) ∪ [d, d], axiom 3 gives us Pr {[c, d]} = Pr {(c, d)}. Generally, the set
of events for the continuous uniform probability distribution is any subset of the
sample space [a, b] that can be obtained by a finite or countable union of open and
closed intervals.

Conditional probability and independence


Sometimes we have some prior partial knowledge about the outcome of an exper-
iment. For example, suppose that a friend has flipped two fair coins and has told
you that at least one of the coins showed a head. What is the probability that both
coins are heads? The information given eliminates the possibility of two tails. The
three remaining elementary events are equally likely, so we infer that each occurs
with probability 1/3. Since only one of these elementary events shows two heads,
the answer to our question is 1/3.

Conditional probability formalizes the notion of having prior partial knowledge


of the outcome of an experiment. The conditional probability of an event A given
that another event B occurs is defined to be

Pr{A | B} = Pr{A ∩ B} / Pr{B}   (C.14)

whenever Pr{B} ≠ 0. (We read "Pr{A | B}" as "the probability of A given B.")
Intuitively, since we are given that event B occurs, the event that A also occurs
is A ∩ B. That is, A ∩ B is the set of outcomes in which both A and B occur. Since
the outcome is one of the elementary events in B, we normalize the probabilities
of all the elementary events in B by dividing them by Pr {B}, so that they sum to 1.
The conditional probability of A given B is, therefore, the ratio of the probability
of event A ∩ B to the probability of event B. In the example above, A is the event
that both coins are heads, and B is the event that at least one coin is a head. Thus,
Pr {A | B} = (1/4)/(3/4) = 1/3.
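The two-coin computation of Pr{A | B} can be replayed by enumeration (the helper pr is our own):

```python
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))

def pr(event):
    # Uniform distribution: each elementary event has probability 1/4.
    return Fraction(len(event), len(sample_space))

A = {("H", "H")}                            # both coins heads
B = {s for s in sample_space if "H" in s}   # at least one head

# Pr{A | B} = Pr{A ∩ B} / Pr{B} = (1/4)/(3/4) = 1/3.
assert pr(A & B) / pr(B) == Fraction(1, 3)
```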
Two events are independent if
Pr {A ∩ B} = Pr { A} Pr {B} , (C.15)
which is equivalent, if Pr{B} ≠ 0, to the condition
Pr {A | B} = Pr { A} .
For example, suppose that two fair coins are flipped and that the outcomes are
independent. Then the probability of two heads is (1/2)(1/2) = 1/4. Now suppose
that one event is that the first coin comes up heads and the other event is that
the coins come up differently. Each of these events occurs with probability 1/2,
and the probability that both events occur is 1/4; thus, according to the definition
of independence, the events are independent—even though one might think that
both events depend on the first coin. Finally, suppose that the coins are welded
together so that they both fall heads or both fall tails and that the two possibilities
are equally likely. Then the probability that each coin comes up heads is 1/2, but
the probability that they both come up heads is 1/2 ≠ (1/2)(1/2). Consequently,
the event that one comes up heads and the event that the other comes up heads are
not independent.
A collection A1, A2, . . . , An of events is said to be pairwise independent if

Pr{Ai ∩ Aj} = Pr{Ai} Pr{Aj}

for all 1 ≤ i < j ≤ n. We say that the events of the collection are (mutually)
independent if every k-subset Ai1, Ai2, . . . , Aik of the collection, where 2 ≤ k ≤ n
and 1 ≤ i1 < i2 < · · · < ik ≤ n, satisfies

Pr{Ai1 ∩ Ai2 ∩ · · · ∩ Aik} = Pr{Ai1} Pr{Ai2} · · · Pr{Aik} .

For example, suppose we flip two fair coins. Let A1 be the event that the first coin
is heads, let A2 be the event that the second coin is heads, and let A3 be the event
that the two coins are different. We have


Pr {A1 } = 1/2 ,
Pr {A2 } = 1/2 ,
Pr {A3 } = 1/2 ,
Pr { A1 ∩ A2 } = 1/4 ,
Pr { A1 ∩ A3 } = 1/4 ,
Pr { A2 ∩ A3 } = 1/4 ,
Pr { A1 ∩ A2 ∩ A3 } = 0.
Since for 1 ≤ i < j ≤ 3, we have Pr {Ai ∩ A j } = Pr {Ai } Pr {A j } = 1/4, the
events A1, A2, and A3 are pairwise independent. The events are not mutually
independent, however, because Pr{A1 ∩ A2 ∩ A3} = 0 and
Pr{A1} Pr{A2} Pr{A3} = 1/8 ≠ 0.
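The pairwise-but-not-mutual independence of A1, A2, A3 checks out by enumeration:

```python
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))

def pr(event):
    return Fraction(len(event), len(sample_space))

A1 = {s for s in sample_space if s[0] == "H"}   # first coin heads
A2 = {s for s in sample_space if s[1] == "H"}   # second coin heads
A3 = {s for s in sample_space if s[0] != s[1]}  # the coins differ

# Pairwise independent:
assert pr(A1 & A2) == pr(A1) * pr(A2)
assert pr(A1 & A3) == pr(A1) * pr(A3)
assert pr(A2 & A3) == pr(A2) * pr(A3)

# But not mutually independent:
assert pr(A1 & A2 & A3) == 0
assert pr(A1) * pr(A2) * pr(A3) == Fraction(1, 8)
```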

Bayes’s theorem
From the definition of conditional probability (C.14) and the commutative law
A ∩ B = B ∩ A, it follows that for two events A and B, each with nonzero proba-
bility,
Pr { A ∩ B} = Pr {B} Pr {A | B} (C.16)
= Pr { A} Pr {B | A} .
Solving for Pr{A | B}, we obtain

Pr{A | B} = Pr{A} Pr{B | A} / Pr{B} ,   (C.17)

which is known as Bayes's theorem. The denominator Pr{B} is a normalizing
constant that we can reexpress as follows. Since B = (B ∩ A) ∪ (B ∩ A̅), and
B ∩ A and B ∩ A̅ are mutually exclusive events,

Pr{B} = Pr{B ∩ A} + Pr{B ∩ A̅}
      = Pr{A} Pr{B | A} + Pr{A̅} Pr{B | A̅} .
Substituting into equation (C.17), we obtain an equivalent form of Bayes's theo-
rem:

Pr{A | B} = Pr{A} Pr{B | A} / (Pr{A} Pr{B | A} + Pr{A̅} Pr{B | A̅}) .

Bayes’s theorem can simplify the computing of conditional probabilities. For


example, suppose that we have a fair coin and a biased coin that always comes up
heads. We run an experiment consisting of three independent events: one of the
two coins is chosen at random, the coin is flipped once, and then it is flipped again.
Suppose that the chosen coin comes up heads both times. What is the probability
that it is biased?
We solve this problem using Bayes’s theorem. Let A be the event that the bi-
ased coin is chosen, and let B be the event that the coin comes up heads both
times. We wish to determine Pr{A | B}. We have Pr{A} = 1/2, Pr{B | A} = 1,
Pr{A̅} = 1/2, and Pr{B | A̅} = 1/4; hence,

Pr{A | B} = ((1/2) · 1) / ((1/2) · 1 + (1/2) · (1/4))
          = 4/5 .
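The same Bayes computation in exact arithmetic (the variable names are our own):

```python
from fractions import Fraction

pr_A = Fraction(1, 2)              # biased coin chosen
pr_not_A = Fraction(1, 2)          # fair coin chosen
pr_B_given_A = Fraction(1)         # biased coin shows heads twice, certainly
pr_B_given_not_A = Fraction(1, 4)  # fair coin shows heads twice

pr_A_given_B = (pr_A * pr_B_given_A) / (
    pr_A * pr_B_given_A + pr_not_A * pr_B_given_not_A
)
assert pr_A_given_B == Fraction(4, 5)
```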

Exercises

C.2-1
Prove Boole’s inequality: For any finite or countably infinite sequence of events
A1 , A2 , . . .,
Pr {A1 ∪ A2 ∪ · · ·} ≤ Pr { A1 } + Pr { A2 } + · · · . (C.18)

C.2-2
Professor Rosencrantz flips a fair coin once. Professor Guildenstern flips a fair
coin twice. What is the probability that Professor Rosencrantz obtains more heads
than Professor Guildenstern?

C.2-3
A deck of 10 cards, each bearing a distinct number from 1 to 10, is shuffled to mix
the cards thoroughly. Three cards are removed one at a time from the deck. What
is the probability that the three cards are selected in sorted (increasing) order?

C.2-4 
Describe a procedure that takes as input two integers a and b such that 0 < a < b
and, using fair coin flips, produces as output heads with probability a/b and tails
with probability (b − a)/b. Give a bound on the expected number of coin flips,
which should be O(1). (Hint: Represent a/b in binary.)

C.2-5
Prove that
Pr{A | B} + Pr{A̅ | B} = 1 .

C.2-6
Prove that for any collection of events A1 , A2 , . . . , An ,
Pr { A1 ∩ A2 ∩ · · · ∩ An } = Pr {A1 } · Pr {A2 | A1 } · Pr { A3 | A1 ∩ A2 } · · ·
Pr {An | A1 ∩ A2 ∩ · · · ∩ An−1 } .

C.2-7 
Show how to construct a set of n events that are pairwise independent but such that
no subset of k > 2 of them is mutually independent.

C.2-8 
Two events A and B are conditionally independent, given C, if
Pr { A ∩ B | C} = Pr { A | C} · Pr {B | C} .
Give a simple but nontrivial example of two events that are not independent but are
conditionally independent given a third event.

C.2-9 
You are a contestant in a game show in which a prize is hidden behind one of three
curtains. You will win the prize if you select the correct curtain. After you have
picked one curtain but before the curtain is lifted, the emcee lifts one of the other
curtains, knowing that it will reveal an empty stage, and asks if you would like
to switch from your current selection to the remaining curtain. How would your
chances change if you switch?

C.2-10 
A prison warden has randomly picked one prisoner among three to go free. The
other two will be executed. The guard knows which one will go free but is forbid-
den to give any prisoner information regarding his status. Let us call the prisoners
X , Y , and Z . Prisoner X asks the guard privately which of Y or Z will be executed,
arguing that since he already knows that at least one of them must die, the guard
won’t be revealing any information about his own status. The guard tells X that Y
is to be executed. Prisoner X feels happier now, since he figures that either he or
prisoner Z will go free, which means that his probability of going free is now 1/2.
Is he right, or are his chances still 1/3? Explain.

C.3 Discrete random variables

A (discrete) random variable X is a function from a finite or countably infinite


sample space S to the real numbers. It associates a real number with each possible
outcome of an experiment, which allows us to work with the probability distribu-


tion induced on the resulting set of numbers. Random variables can also be defined
for uncountably infinite sample spaces, but they raise technical issues that are un-
necessary to address for our purposes. Henceforth, we shall assume that random
variables are discrete.
For a random variable X and a real number x, we define the event X = x to be
{s ∈ S : X(s) = x}; thus,

Pr{X = x} = Σ_{s∈S : X(s)=x} Pr{s} .

The function

f(x) = Pr{X = x}

is the probability density function of the random variable X. From the probability
axioms, Pr{X = x} ≥ 0 and Σ_x Pr{X = x} = 1.
As an example, consider the experiment of rolling a pair of ordinary, 6-sided
dice. There are 36 possible elementary events in the sample space. We assume
that the probability distribution is uniform, so that each elementary event s ∈ S is
equally likely: Pr {s} = 1/36. Define the random variable X to be the maximum of
the two values showing on the dice. We have Pr {X = 3} = 5/36, since X assigns
a value of 3 to 5 of the 36 possible elementary events, namely, (1, 3), (2, 3), (3, 3),
(3, 2), and (3, 1).
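The value Pr{X = 3} = 5/36 is easy to confirm by enumerating the 36 elementary events:

```python
from fractions import Fraction
from itertools import product

# X = maximum of two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(len(event), len(sample_space))

assert pr({s for s in sample_space if max(s) == 3}) == Fraction(5, 36)
```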
It is common for several random variables to be defined on the same sample
space. If X and Y are random variables, the function
f (x, y) = Pr {X = x and Y = y}
is the joint probability density function of X and Y. For a fixed value y,

Pr{Y = y} = Σ_x Pr{X = x and Y = y} ,

and similarly, for a fixed value x,

Pr{X = x} = Σ_y Pr{X = x and Y = y} .

Using the definition (C.14) of conditional probability, we have

Pr{X = x | Y = y} = Pr{X = x and Y = y} / Pr{Y = y} .
We define two random variables X and Y to be independent if for all x and y, the
events X = x and Y = y are independent or, equivalently, if for all x and y, we
have Pr {X = x and Y = y} = Pr {X = x} Pr {Y = y}.
Given a set of random variables defined over the same sample space, one can
define new random variables as sums, products, or other functions of the original
variables.

Expected value of a random variable


The simplest and most useful summary of the distribution of a random variable is
the "average" of the values it takes on. The expected value (or, synonymously,
expectation or mean) of a discrete random variable X is

E[X] = Σ_x x Pr{X = x} ,   (C.19)

which is well defined if the sum is finite or converges absolutely. Sometimes the
expectation of X is denoted by µ_X or, when the random variable is apparent from
context, simply by µ.
Consider a game in which you flip two fair coins. You earn $3 for each head but
lose $2 for each tail. The expected value of the random variable X representing
your earnings is
E [X ] = 6 · Pr {2 H’s} + 1 · Pr {1 H, 1 T} − 4 · Pr {2 T’s}
= 6(1/4) + 1(1/2) − 4(1/4)
= 1.
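The same expectation, computed directly from definition (C.19) by enumerating the four outcomes:

```python
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))

def earnings(s):
    # $3 per head, -$2 per tail.
    return 3 * s.count("H") - 2 * s.count("T")

E_X = sum(Fraction(1, 4) * earnings(s) for s in sample_space)
assert E_X == 1
```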
The expectation of the sum of two random variables is the sum of their expecta-
tions, that is,
E [X + Y ] = E [X ] + E [Y ] , (C.20)
whenever E [X ] and E [Y ] are defined. We call this property linearity of expecta-
tion, and it holds even if X and Y are not independent. It also extends to finite and
absolutely convergent summations of expectations. Linearity of expectation is the
key property that enables us to perform probabilistic analyses by using indicator
random variables (see Section 5.2).
If X is any random variable, any function g(x) defines a new random variable
g(X ). If the expectation of g(X ) is defined, then

E [g(X )] = g(x) Pr {X = x} .
x

Letting g(x) = ax, we have for any constant a,


E [a X ] = aE [X ] . (C.21)
Consequently, expectations are linear: for any two random variables X and Y and
any constant a,
E [a X + Y ] = aE [X ] + E [Y ] . (C.22)
When two random variables X and Y are independent and each has a defined
expectation,

E[XY] = Σ_x Σ_y xy Pr{X = x and Y = y}
      = Σ_x Σ_y xy Pr{X = x} Pr{Y = y}
      = (Σ_x x Pr{X = x}) (Σ_y y Pr{Y = y})
      = E[X] E[Y] .
In general, when n random variables X1, X2, . . . , Xn are mutually independent,

E[X1 X2 · · · Xn] = E[X1] E[X2] · · · E[Xn] .   (C.23)
When a random variable X takes on values from the set of natural numbers
N = {0, 1, 2, …}, there is a nice formula for its expectation:

E[X] = Σ_{i=0}^∞ i Pr{X = i}
     = Σ_{i=0}^∞ i (Pr{X ≥ i} − Pr{X ≥ i + 1})
     = Σ_{i=1}^∞ Pr{X ≥ i} ,   (C.24)

since each term Pr{X ≥ i} is added in i times and subtracted out i − 1 times (except
Pr{X ≥ 0}, which is added in 0 times and not subtracted out at all).
When we apply a convex function f(x) to a random variable X, Jensen's
inequality gives us

E[f(X)] ≥ f(E[X]) ,   (C.25)

provided that the expectations exist and are finite. (A function f(x) is convex if for
all x and y and for all 0 ≤ λ ≤ 1, we have f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).)
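A quick illustration of Jensen's inequality (added here as a sketch) with the convex function f(x) = x² and X taking −1 and +1 with probability 1/2 each:

```python
# Jensen's inequality (C.25) for the convex function f(x) = x^2:
# E[f(X)] >= f(E[X]) for X taking -1 and +1 with probability 1/2 each.
pmf = {-1: 0.5, 1: 0.5}
f = lambda x: x * x
e_fx = sum(f(x) * p for x, p in pmf.items())  # E[X^2] = 1
f_ex = f(sum(x * p for x, p in pmf.items()))  # f(E[X]) = f(0) = 0
print(e_fx >= f_ex)  # True
```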

Variance and standard deviation


The expected value of a random variable does not tell us how "spread out" the
variable's values are. For example, if we have random variables X and Y for which
Pr{X = 1/4} = Pr{X = 3/4} = 1/2 and Pr{Y = 0} = Pr{Y = 1} = 1/2, then
both E[X] and E[Y] are 1/2, yet the actual values taken on by Y are farther from
the mean than the actual values taken on by X.
The notion of variance mathematically expresses how far from the mean a
random variable's values are likely to be. The variance of a random variable X
with mean E[X] is

Var[X] = E[(X − E[X])²] .

C.4 The geometric and binomial distributions


A coin flip is an instance of a Bernoulli trial, which is defined as an experiment
with only two possible outcomes: success, which occurs with probability p, and
failure, which occurs with probability q = 1 − p. When we speak of Bernoulli
trials collectively, we mean that the trials are mutually independent and, unless we
specifically say otherwise, that each has the same probability p for success. Two
important distributions arise from Bernoulli trials: the geometric distribution and
the binomial distribution.

The geometric distribution


Suppose we have a sequence of Bernoulli trials, each with a probability p of
success and a probability q = 1 − p of failure. How many trials occur before we
obtain a success? Let the random variable X be the number of trials needed to
obtain a success. Then X has values in the range {1, 2, …}, and for k ≥ 1,
Pr{X = k} = q^{k−1} p ,   (C.30)

since we have k − 1 failures before the one success. A probability distribution
satisfying equation (C.30) is said to be a geometric distribution. Figure C.1
illustrates such a distribution.
Assuming that q < 1, the expectation of a geometric distribution can be
calculated using identity (A.8):

E[X] = Σ_{k=1}^∞ k q^{k−1} p
     = (p/q) Σ_{k=0}^∞ k q^k
     = (p/q) · q/(1 − q)²
     = 1/p .   (C.31)
Thus, on average, it takes 1/ p trials before we obtain a success, an intuitive result.
The variance, which can be calculated similarly, but using Exercise A.1-3, is

Var[X] = q/p² .   (C.32)
As an example, suppose we repeatedly roll two dice until we obtain either a
seven or an eleven. Of the 36 possible outcomes, 6 yield a seven and 2 yield an
eleven. Thus, the probability of success is p = 8/36 = 2/9, and we must roll
1/ p = 9/2 = 4.5 times on average to obtain a seven or eleven.
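The dice example can be reproduced exactly. The snippet below (illustrative, not from the text) counts the winning outcomes among the 36 possibilities and applies (C.31) and (C.32):

```python
from fractions import Fraction
from itertools import product

# Rolling two dice until the sum is 7 or 11: success probability per roll,
# then E[X] = 1/p and Var[X] = q/p^2 from (C.31) and (C.32).
wins = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b in (7, 11))
p = Fraction(wins, 36)        # 8/36 = 2/9
q = 1 - p
print(p, 1 / p, q / p**2)     # 2/9 9/2 63/4
```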
Figure C.1 A geometric distribution with probability p = 1/3 of success and a probability
q = 1 − p of failure. The height of the bar at k is (2/3)^{k−1}(1/3); the expectation of the
distribution is 1/p = 3.

The binomial distribution


How many successes occur during n Bernoulli trials, where a success occurs with
probability p and a failure with probability q = 1 − p? Define the random
variable X to be the number of successes in n trials. Then X has values in the
range {0, 1, …, n}, and for k = 0, …, n,

Pr{X = k} = (n choose k) p^k q^{n−k} ,   (C.33)

since there are (n choose k) ways to pick which k of the n trials are successes, and the
probability that each occurs is p^k q^{n−k}.
A probability distribution satisfying equation
(C.33) is said to be a binomial distribution. For convenience, we define the family
of binomial distributions using the notation

b(k; n, p) = (n choose k) p^k (1 − p)^{n−k} .   (C.34)
Figure C.2 illustrates a binomial distribution. The name “binomial” comes from
the fact that (C.33) is the kth term of the expansion of ( p + q)n . Consequently,
since p + q = 1,
Σ_{k=0}^n b(k; n, p) = 1 ,   (C.35)

as is required by axiom 2 of the probability axioms.

Figure C.2 The binomial distribution b(k; 15, 1/3) resulting from n = 15 Bernoulli trials, each
with probability p = 1/3 of success. The expectation of the distribution is np = 5.


We can compute the expectation of a random variable having a binomial
distribution from equations (C.8) and (C.35). Let X be a random variable that
follows the binomial distribution b(k; n, p), and let q = 1 − p. By the definition
of expectation, we have

E[X] = Σ_{k=0}^n k Pr{X = k}
     = Σ_{k=0}^n k b(k; n, p)
     = Σ_{k=1}^n k (n choose k) p^k q^{n−k}
     = np Σ_{k=1}^n (n−1 choose k−1) p^{k−1} q^{n−k}
     = np Σ_{k=0}^{n−1} (n−1 choose k) p^k q^{(n−1)−k}
     = np Σ_{k=0}^{n−1} b(k; n − 1, p)
     = np .   (C.36)
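Equation (C.36) can also be confirmed numerically. An illustrative check (added here), using n = 15 and p = 1/3, the parameters of Figure C.2:

```python
from math import comb

# Check E[X] = np (C.36) by summing k * b(k; n, p) for n = 15, p = 1/3.
n, p = 15, 1 / 3
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(abs(mean - n * p) < 1e-9)  # True
```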
By using the linearity of expectation, we can obtain the same result with
substantially less algebra. Let X_i be the random variable describing the number
of successes in the ith trial. Then E[X_i] = p · 1 + q · 0 = p, and by linearity of
expectation (equation (C.20)), the expected number of successes for n trials is
 
E[X] = E[Σ_{i=1}^n X_i]
     = Σ_{i=1}^n E[X_i]
     = Σ_{i=1}^n p
     = np .   (C.37)
The same approach can be used to calculate the variance of the distribution.
Using equation (C.26), we have Var[X_i] = E[X_i²] − E²[X_i]. Since X_i only
takes on the values 0 and 1, we have E[X_i²] = E[X_i] = p, and hence

Var[X_i] = p − p² = pq .   (C.38)

To compute the variance of X, we take advantage of the independence of the n
trials; thus, by equation (C.28),

Var[X] = Var[Σ_{i=1}^n X_i]
       = Σ_{i=1}^n Var[X_i]
       = Σ_{i=1}^n pq
       = npq .   (C.39)
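Likewise, (C.39) can be checked against the definition of variance. An illustrative sketch (added here) with the same parameters as Figure C.2:

```python
from math import comb

# Check Var[X] = npq (C.39) against the definition E[(X - np)^2] for the
# binomial distribution b(k; n, p) with n = 15, p = 1/3.
n, p = 15, 1 / 3
q = 1 - p
b = lambda k: comb(n, k) * p**k * q**(n - k)
var = sum((k - n * p) ** 2 * b(k) for k in range(n + 1))
print(abs(var - n * p * q) < 1e-9)  # True
```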
As can be seen from Figure C.2, the binomial distribution b(k; n, p) increases
as k runs from 0 to n until it reaches the mean np, and then it decreases. We can
prove that the distribution always behaves in this manner by looking at the ratio of
successive terms, b(k; n, p)/b(k − 1; n, p) = ((n − k + 1)/k)(p/q), which is at
least 1 exactly when k ≤ (n + 1)p.

Introduction to Python 3

We conclude with an introduction to Python 3.

• Look over and try some examples from https://docs.python.org/3/tutorial/
• In particular, it is useful to peruse and try out, from
https://docs.python.org/3/library/index.html#library-index, the following:
– Built-in Functions
– Built-in Constants
– Built-in Types
∗ The principal built-in types are: numerics, sequences, mappings,
classes, instances and exceptions. Of these, you will most likely not
have to worry about classes, instances and exceptions. The code you
will need for this course will be quite straightforward.
• You can of course install Python 3 on your personal device. Also,
Python 3 is installed on eceUbuntu. As a student in ECE, you should
already have an account on eceUbuntu. If you are on campus, you
should be able to simply ssh eceUbuntu. If you are off campus, you can
either install the campus VPN:
https://uwaterloo.ca/information-systems-technology/
services/virtual-private-network-vpn
Or first ssh ecelinux4.uwaterloo.ca and then immediately ssh -X eceUbuntu,
as the instructions say.
• It is useful to do the following. Better yet, add it to your .bashrc file.
[alice@ecetesla2 ∼]$ alias python=’/usr/bin/python3’
• Example python code on Learn:
– ask.py, and,
– romandecimal.py, romandecimalsolution.py, tester-rnstringtodec.py
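As a warm-up with the built-in types listed above, here is a small, hypothetical example (the names and values are made up, not from the course materials) using a mapping, a sequence, and numerics:

```python
# A short warm-up with Python 3 built-in types: a dict (mapping),
# a sorted list of its keys (sequence), and numeric arithmetic.
marks = {"alice": 91, "bob": 78, "carol": 85}   # mapping: name -> mark
names = sorted(marks)                            # sequence of keys, sorted
average = sum(marks.values()) / len(marks)       # numerics
print(names, round(average, 2))  # ['alice', 'bob', 'carol'] 84.67
```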