
ORIE 4520: Stochastics at Scale Homework 1: Solutions

Fall 2015 Sid Banerjee ([email protected])

Problem 1: (Practice with Asymptotic Notation)


An essential requirement for understanding scaling behavior is comfort with asymptotic (or ‘big-O’)
notation. In this problem, you will prove some basic facts about such asymptotics.

Part (a)
Given any two functions f (·) and g(·), show that f (n) + g(n) = Θ(max{f (n), g(n)}).

Solution: Note – Unless mentioned otherwise, we will always consider functions from the positive
integers to the non-negative real numbers.
To show f (n)+g(n) = Θ(max{f (n), g(n)}), we need to show f (n)+g(n) = Ω(max{f (n), g(n)})
and f (n) + g(n) = O(max{f (n), g(n)}).
First, since the functions are non-negative, we have that f (n) + g(n) ≥ f (n) and f (n) + g(n) ≥
g(n) – combining these, we get that f (n) + g(n) ≥ max{f (n), g(n)} for all n; thus f (n) + g(n) =
Ω(max{f (n), g(n)}). On the other hand, we also have that f (n) + g(n) ≤ 2 max{f (n), g(n)} for all
n; thus f (n) + g(n) = O(max{f (n), g(n)}). This completes the proof.

Part (b)
An algorithm ALG consists of two tunable sub-algorithms ALGA and ALGB , which have to be
executed serially (i.e., one run of ALG involves first executing ALGA followed by ALGB ). Moreover,
given any function f (n), we can tune the two algorithms such that one run of ALGA takes time
O(f (n)) and ALGB takes time O(n/f (n)). How should we choose f to minimize the overall runtime
of ALG (i.e., to ensure the runtime of ALG is O(h(n)) for the smallest-growing function h)?
How would your answer change if ALGA and ALGB could be executed in parallel, and we have
to wait for both to finish?

Solution: Since the two algorithms are run sequentially, the total runtime is O(f(n) + n/f(n)) – from the previous part, this is the same as O(max{f(n), n/f(n)}). Now, in order to minimize this, note that increasing f(n) increases the first term and decreases the second, so the maximum is minimized when the two terms are equal. Thus, we should choose f(n) = √n, and hence h(n) = √n.
In case the two ran in parallel, the runtime would be O(max{f(n), n/f(n)}) – clearly this has the same optimal choice of f and the same optimal runtime!
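As a quick numerical sanity check (not part of the proof), the following Python sketch compares max{f(n), n/f(n)} for a few candidate tunings of f; the candidates and names are illustrative choices, and the balanced choice f(n) = √n gives the slowest-growing maximum:

import math

n_values = [10**3, 10**6, 10**9]

# Candidate tunings f(n); the overall runtime is governed by max{f(n), n/f(n)}.
candidates = {
    "f(n) = log n": lambda n: math.log(n),
    "f(n) = sqrt(n)": lambda n: math.sqrt(n),
    "f(n) = n/log n": lambda n: n / math.log(n),
}

for name, f in candidates.items():
    costs = [max(f(n), n / f(n)) for n in n_values]
    print(name, [round(c, 1) for c in costs])
# Only the balanced choice keeps max{f(n), n/f(n)} at sqrt(n); the other two
# candidates are dominated by their larger term, which grows much faster.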

Part (c)
We are given a recursive algorithm which, given an input of size n, splits it into 2 problems of size
n/2, solves each recursively, and then combines the two parts in time O(n). Thus, if T (n) denotes
the runtime for the algorithm on an input of size n, then we have:

T (n) = 2T (n/2) + O(n)

Prove that T (n) = O(n log n).


Hint: Note that for a constant size input, the algorithm takes O(1) time. How many recursions
does it require to reduce a problem of size n to constant size subproblems? What is the total runtime
overhead at each recursive level?

Solution: We will solve this via an explicit counting argument, which I find instructive in understanding how runtimes accumulate in a recursion. Let k ∈ {0, 1, . . . , K} denote the levels of the recursion tree – here k = 0 is the original problem of size n, k = 1 is the first level of recursion with two subproblems of size n/2, and extending this, at level k we have 2^k subproblems, each of size n/2^k; at level K the subproblems are of constant size. Now observe the following:
• The subproblems reach constant size after K = ⌈log₂ n⌉ recursive levels. Moreover, the time taken to solve a subproblem of constant size is O(1).
• The combine overhead for a subproblem of size m is O(m) – thus the total overhead at level k is 2^k · O(n/2^k) = O(n).
Putting this together – O(n) overhead per level over K = O(log n) levels, plus O(1) work for each of the at most n constant-size subproblems – we get that T(n) = O(n log n).
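To see the level-by-level accounting concretely, here is a small Python sketch (an illustration, assuming the combine overhead is exactly n units of work; the helper name is illustrative) that tallies the total work of the recurrence and compares it against n log₂ n:

import math

def total_work(n):
    """Exact work of the recurrence T(n) = 2*T(n/2) + n, with T(1) = 1."""
    if n <= 1:
        return 1
    half = n // 2
    # two recursive subproblems plus the O(n) combine overhead (here: exactly n)
    return total_work(half) + total_work(n - half) + n

for n in [2**10, 2**14, 2**18]:
    print(n, round(total_work(n) / (n * math.log2(n)), 3))
# The ratio T(n) / (n log2 n) stays bounded by a constant, consistent with T(n) = O(n log n).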

Problem 2: (Some important asymptotes)


Part (a)
In class, we defined the harmonic number H_n = Σ_{i=1}^{n} 1/i. Argue that:

∫_1^{n+1} (1/x) dx ≤ H_n ≤ 1 + ∫_1^{n} (1/x) dx

Thus, prove that H_n = Θ(ln n).


Hint: Bound the 1/x function from above and below by a step function.

Solution: The idea is to represent the harmonic number as an area under a curve (see Figure 1). Essentially, we have that H_n is the total area of a set of rectangles of width 1 and height 1/i, i ∈ {1, 2, . . . , n}. Now suppose we define the step function fu(x) = 1/i for x ∈ [i, i + 1) (i.e., the rectangles lying above the 1/x curve). Then we have:

H_n = ∫_1^{n+1} fu(x) dx ≥ ∫_1^{n+1} (1/x) dx = ln(n + 1)

On the other hand, if we define fl(x) = 1/i for x ∈ (i − 1, i], then fl(x) ≤ 1/x on each interval (i − 1, i], and hence:

H_n = 1 + Σ_{i=2}^{n} 1/i = 1 + ∫_1^{n} fl(x) dx ≤ 1 + ∫_1^{n} (1/x) dx = 1 + ln(n)

Thus we have that ln(n + 1) ≤ H_n ≤ 1 + ln n, and hence H_n = Θ(ln n).
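A quick numerical check of these bounds (purely illustrative; the helper name is ours):

import math

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

for n in [10, 100, 1000, 10000]:
    H = harmonic(n)
    lower, upper = math.log(n + 1), 1 + math.log(n)
    print(n, round(lower, 4), round(H, 4), round(upper, 4))
# Every row satisfies ln(n+1) <= H_n <= 1 + ln(n), matching the bounds above.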


Figure 1: "Integral Test" by Jim.belk. Licensed under Public Domain via Commons, https://commons.wikimedia.org/wiki/File:Integral_Test.svg#/media/File:Integral_Test.svg

Part (b)
Next, we try to find the asymptotic growth of n!. As in the previous part, argue that:

∫_1^{n} ln x dx ≤ ln(n!) ≤ ∫_1^{n+1} ln x dx

Thus, prove that ln(n!) = Θ(n ln n).

Solution: This proceeds in a very similar fashion as above, except that now ln x is an increasing function. We first compare against the step function fl(x) = ln i for x ∈ [i, i + 1), over x ∈ [1, n + 1], to get:

ln(n!) = ∫_1^{n+1} fl(x) dx ≤ ∫_1^{n+1} ln x dx = [x ln x − x]_1^{n+1} = (n + 1) ln(n + 1) − n

To lower bound, we use the step function fu(x) = ln i for x ∈ (i − 1, i], over x ∈ [1, n], to get:

ln(n!) = Σ_{i=2}^{n} ln i = ∫_1^{n} fu(x) dx ≥ ∫_1^{n} ln x dx = [x ln x − x]_1^{n} = n ln n − n + 1

Combining the two bounds and using the fact that n = O(n ln n), we get that ln(n!) = Θ(n ln n).
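A small numerical check of these integral bounds (illustrative only):

import math

for n in [5, 50, 500]:
    log_fact = math.lgamma(n + 1)           # ln(n!)
    lower = n * math.log(n) - n + 1         # integral of ln x from 1 to n
    upper = (n + 1) * math.log(n + 1) - n   # integral of ln x from 1 to n+1
    print(n, round(lower, 3), round(log_fact, 3), round(upper, 3))
# Every row satisfies lower <= ln(n!) <= upper, as derived above.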

Part (c)
(Stirling’s approximation) We now improve the estimate in the previous part to get the familiar
form of Stirling’s approximation. First, argue that for any integer i ≥ 1, we have:
∫_i^{i+1} ln x dx ≥ (ln i + ln(i + 1)) / 2


Using this, show that:

n! ≤ e √n (n/e)^n

Hint: Given any i ≥ 1, where does the line joining the two points (i, ln i) and (i + 1, ln(i + 1)) lie with respect to the function ln x?

Solution: We again want to use the integral bounding trick – here, however, we use the added property that ln x is a concave function, and hence for any positive integer i, the line segment joining (i, ln i) and (i + 1, ln(i + 1)) lies below the curve ln x for x ∈ [i, i + 1]. Moreover, the area of the trapezoid bounded by this line segment, the x-axis, and the lines x = i and x = i + 1 is (ln i + ln(i + 1))/2 (recall – the area of a trapezoid is 1/2 · (height) · (sum of lengths of parallel sides)). Thus we have that:

∫_i^{i+1} ln x dx ≥ (ln i + ln(i + 1)) / 2

Now sum over i ∈ {1, 2, . . . , n − 1}: on the right hand side, each ln i with 1 < i < n appears in two consecutive terms with weight 1/2, while ln 1 = 0 and ln n appear only once with weight 1/2, so the right hand sides add up to ln(n!) − (ln n)/2. We therefore have:

Σ_{i=1}^{n−1} ∫_i^{i+1} ln x dx ≥ ln(n!) − (ln n)/2

However, the left hand side is just ∫_1^{n} ln x dx = n ln n − n + 1. Thus we get:

ln(n!) ≤ n ln n − n + 1 + (ln n)/2

Exponentiating both sides, we get:

n! ≤ e^{n ln n − n + 1 + (ln n)/2} = e √n (n/e)^n
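The following small check (illustrative only) confirms the bound numerically and shows how close it is to n!:

import math

for n in [5, 10, 20, 50]:
    exact = math.factorial(n)
    bound = math.e * math.sqrt(n) * (n / math.e) ** n
    print(n, bound >= exact, round(bound / exact, 4))
# The bound e*sqrt(n)*(n/e)^n always dominates n!, and the ratio approaches
# e / sqrt(2*pi) ≈ 1.084, consistent with the full Stirling formula.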

Problem 3: (The Geometric Distribution)


A random variable X is said to have a Geometric(p) distribution if for any integer k ≥ 1, we have P[X = k] = p(1 − p)^{k−1}.

Part (a)
Suppose we repeatedly toss a coin which gives HEADS with probability p. Argue that the number
of tosses until we see the first HEADS is distributed as Geometric(p).

Solution: Our sample space Ω consists of all sequences over the alphabet {H, T} that end with H (HEADS) and contain no other H's, i.e., Ω = {H, TH, TTH, . . .}. Let X be the total number of tosses, including the terminating HEADS. The event {X = k} corresponds to the single sequence of k − 1 TAILS followed by one HEADS, and since the tosses are independent, P[X = k] = (1 − p)^{k−1} p. Therefore, the number of tosses until we see the first HEADS is distributed as Geometric(p).
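A quick simulation (illustrative only; the helper name is ours) comparing the empirical distribution of the toss count against p(1 − p)^{k−1}:

import random

def tosses_until_heads(p):
    """Toss a p-biased coin until HEADS; return the total number of tosses."""
    k = 1
    while random.random() >= p:  # probability p of HEADS on each toss
        k += 1
    return k

p, trials = 0.3, 200_000
samples = [tosses_until_heads(p) for _ in range(trials)]
for k in range(1, 6):
    empirical = sum(1 for x in samples if x == k) / trials
    exact = p * (1 - p) ** (k - 1)
    print(k, round(empirical, 4), round(exact, 4))
# The empirical frequencies match the Geometric(p) pmf up to sampling noise.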


Part (b)
(Memoryless property) Using the definition of conditional probability, prove that for any integers
i, k ≥ 1, the random variable X obeys:

P[X = k + i|X > k] = P[X = i]

Also convince yourself that this follows immediately from the characterization of the Geometric
r.v. in Part (a).

Solution: By the definition of conditional probability, and since the event {X = k + i} is contained in {X > k} (as i ≥ 1),

P[X = k + i | X > k] = P[X = k + i, X > k] / P[X > k] = P[X = k + i] / P[X > k] = p(1 − p)^{k+i−1} / (1 − p)^{k} = p(1 − p)^{i−1} = P[X = i].

Note that here we used that P[X > k] = (1 − p)^{k}. The event "X > k" means that at least k + 1 tosses are required. This is exactly equivalent to saying that the first k tosses are all TAILS, and the probability of this event is precisely (1 − p)^{k}.
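A small numerical verification of the memoryless property using the exact pmf (illustrative only; the helper name is ours):

p = 0.3

def pmf(k):
    return p * (1 - p) ** (k - 1)   # P[X = k] for Geometric(p)

k = 4
tail = (1 - p) ** k                  # P[X > k]
for i in range(1, 5):
    conditional = pmf(k + i) / tail  # P[X = k + i | X > k]
    print(i, round(conditional, 6), round(pmf(i), 6))  # the two columns agree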

Part (c)
Show that: (i) E[X] = 1/p, and (ii) Var[X] = (1 − p)/p².
Hint: Note that by the memoryless property, a Geometric(p) random variable X is 1 with probability
p, and 1+Y with probability (1−p), where Y also has a Geometric(p) distribution. Now try writing
the expectation and variance recursively.

Solution: Note that by the memoryless property, a Geometric(p) random variable X equals 1 with probability p (the first toss is HEADS), and equals 1 + Y with probability 1 − p, where Y also has a Geometric(p) distribution. Therefore, conditioning on the first toss,

E[X] = p · 1 + (1 − p) · E[1 + Y] = 1 + (1 − p) E[Y] = 1 + (1 − p) E[X].

Solving for E[X], we get E[X] = 1/p.


Next, recall that Var[X] = E[X²] − (E[X])² = E[X²] − 1/p². So we first need to calculate E[X²]. Conditioning on the first toss again,

E[X²] = p · 1 + (1 − p) · E[(1 + Y)²]
      = p + (1 − p)(1 + 2 E[Y] + E[Y²]) = p + (1 − p)(1 + 2 E[X] + E[X²])
      = p + (1 − p)(1 + 2/p + E[X²]).

Simplifying, we get E[X²] = (2 − p)/p², and hence Var[X] = E[X²] − 1/p² = (1 − p)/p².
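A short check of both formulas against the exact distribution, truncated at a large cutoff (illustrative only):

p = 0.3
cutoff = 10_000   # truncation point; the neglected tail mass is negligible

mean = sum(k * p * (1 - p) ** (k - 1) for k in range(1, cutoff))
second_moment = sum(k * k * p * (1 - p) ** (k - 1) for k in range(1, cutoff))
var = second_moment - mean ** 2

print(round(mean, 6), 1 / p)             # both are ≈ 3.333333
print(round(var, 6), (1 - p) / p ** 2)   # both are ≈ 7.777778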


Problem 4: (Upper Bounds on Collision Probabilities)


Let X_{m,n} denote the number of collisions when m balls are dropped u.a.r. into n bins. In class, we showed that the expected number of collisions is E[X_{m,n}] = (m choose 2)/n = m(m − 1)/(2n). We now upper bound the probability that no collision occurs.
Assume that n > m (clearly this is required for no collisions!). First, using the law of total probability, argue that:

P[No collisions when m balls dropped u.a.r. in n bins] = Π_{i=1}^{m−1} (1 − i/n)

Next, using the inequality e^{−x} ≥ 1 − x, simplify the above to show:

P[No collisions when m balls dropped u.a.r. in n bins] ≤ e^{−E[X_{m,n}]}

Solution: Let D_i be the event that there is no collision after having thrown in the i-th ball. If there is no collision after throwing in i balls, then they must all be occupying different bins, so the probability of no collision upon throwing in the (i + 1)-st ball is exactly (n − i)/n. That is,

P[D_{i+1} | D_i] = (n − i)/n = 1 − i/n.

Also note that P[D_1] = 1. The probability of no collision at the end of the game can now be computed via

P[D_m] = P[D_m | D_{m−1}] · P[D_{m−1}] = · · · = Π_{i=1}^{m−1} P[D_{i+1} | D_i] = Π_{i=1}^{m−1} (1 − i/n).

Now we can use the inequality 1 − x ≤ e^{−x} on each factor of the above product. Since Σ_{i=1}^{m−1} i = m(m − 1)/2, this gives:

P[D_m] ≤ Π_{i=1}^{m−1} e^{−i/n} = e^{−m(m−1)/(2n)} = e^{−E[X_{m,n}]}.
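A quick numerical comparison of the exact no-collision probability and the bound e^{−E[X_{m,n}]} (illustrative only, using the classic birthday setting n = 365):

import math

n = 365  # number of bins
for m in [10, 23, 40, 60]:
    exact = math.prod(1 - i / n for i in range(1, m))
    bound = math.exp(-m * (m - 1) / (2 * n))
    print(m, round(exact, 4), round(bound, 4))
# The bound always lies above the exact probability, as the proof guarantees.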

Problem 5: (Posterior Confidence in Verifying Matrix Multiplication)


In class, we saw Freivald’s algorithm for checking matrix multiplication, which, given matrices A, B
and C, returned the following:
• If AB = C, then the algorithm always returned TRUE

• If AB ≠ C, then the algorithm returned TRUE with probability at most 1/2

Part (a)
Given any ε > 0, how many times do we need to run Freivald's algorithm to be sure that {AB = C} with probability greater than 1 − ε?


Part (b)
Suppose we started with the belief that the events {AB = C} and {AB ≠ C} were equally likely (i.e., P[AB = C] = P[AB ≠ C] = 1/2). Moreover, suppose k independent runs of Freivald's algorithm all returned TRUE. Then what is our new (or posterior) belief that {AB = C}?

Solution:

Part (a)
If AB ≠ C, each run of Freivald's algorithm returns TRUE with probability at most 1/2, so the probability that k independent runs all return TRUE is at most 1/2^k. Thus, if we declare {AB = C} only when every run returns TRUE, the probability that we are wrong is at most 1/2^k. To guarantee confidence greater than 1 − ε we want 1/2^k ≤ ε, which means k ≥ log₂(1/ε), and hence k = ⌈log₂(1/ε)⌉ runs suffice.
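Below is a minimal Python sketch of the repeated-run verifier. It assumes the single-run check from class is the standard one (pick a uniformly random 0/1 vector r and compare A(Br) against Cr in O(n²) time); the function names are illustrative:

import math
import random

def freivalds_single_run(A, B, C, n):
    """One run: test whether A(B r) equals C r for a random 0/1 vector r."""
    r = [random.randint(0, 1) for _ in range(n)]
    Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
    ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
    Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
    return ABr == Cr

def verify(A, B, C, n, eps):
    """Repeat the check ceil(log2(1/eps)) times; any failing run certifies AB != C."""
    k = math.ceil(math.log2(1 / eps))
    return all(freivalds_single_run(A, B, C, n) for _ in range(k))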

Part (b)
Let I be our information that k independent runs of Freivald’s algorithm all returned TRUE. Now
we simply need to use Bayes’ Theorem to find the posterior:

P[AB = C | I] = P[I | AB = C] P[AB = C] / ( P[I | AB = C] P[AB = C] + P[I | AB ≠ C] P[AB ≠ C] )
             ≥ (1 · 1/2) / (1 · 1/2 + 2^{−k} · 1/2) = 1 / (1 + 2^{−k}),

since P[I | AB = C] = 1 while P[I | AB ≠ C] ≤ 2^{−k}. Thus, after k runs that all return TRUE, our posterior belief that {AB = C} is at least 1/(1 + 2^{−k}).
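A tiny numerical illustration of how quickly this posterior approaches 1 (assuming the worst case, in which a false TRUE occurs with probability exactly 1/2 per run):

for k in range(1, 8):
    posterior = 1 / (1 + 2 ** (-k))
    print(k, round(posterior, 6))
# After 7 runs that all return TRUE, the posterior already exceeds 0.992.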
