Rahul Jha (210802) January 21, 2025 1
1 1 Marks
1. For any two events A and B, P (A ∪ B) = P (A) + P (B) - P (A ∩ B).
A: We know from the axioms of probability that for any two disjoint events A, B we
have P(A ∪ B) = P(A) + P(B). Now,

(1) A = (A ∩ B) ∪ (A ∩ B^c)
(2) P(A) = P(A ∩ B) + P(A ∩ B^c)
(3) A ∪ B = ((A ∪ B) ∩ B) ∪ ((A ∪ B) ∩ B^c) = B ∪ (A ∩ B^c)
(4) P(A ∪ B) = P(B) + P(A ∩ B^c)
(5) P(A ∪ B) = P(B) + P(A) − P(A ∩ B)

Here equation (5) follows from equations (2) and (4). Note that equations (1) and (3) each break a set into
a union of two disjoint sets, so the additivity rule for disjoint events applies. □
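As a quick sanity check (separate from the proof), inclusion-exclusion can be verified exhaustively on a small uniform sample space; the sets A and B below are arbitrary choices:

```python
# Exhaustive check of inclusion-exclusion on a small uniform sample space.
from fractions import Fraction

omega = set(range(12))
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6}

def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
```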
2. For any two random variables X, Y , E[X + Y ] = E[X] + E[Y ].
A: I use the following definition of discrete expectation: E[X] = Σ_x x P(X = x), where the
summation is over the image of the sample space under X. WLOG, we can assume that
both X, Y are defined on the same sample space. Also, Σ_x P(X = x) = 1, and for a joint
distribution, Σ_y P(X = x, Y = y) = P(X = x) (and symmetrically for Y). Now,

E[X + Y] = Σ_y Σ_x (x + y) P(X = x, Y = y)
E[X + Y] = Σ_y Σ_x x P(X = x, Y = y) + Σ_y Σ_x y P(X = x, Y = y)
E[X + Y] = Σ_x x Σ_y P(X = x, Y = y) + Σ_y y Σ_x P(X = x, Y = y)
E[X + Y] = Σ_x x P(X = x) + Σ_y y P(Y = y)
E[X + Y] = E[X] + E[Y]

Note that independence of X and Y is never used. □
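The fact that no independence is needed can be checked exactly on a small, deliberately dependent joint pmf (the probabilities below are illustrative):

```python
# Check E[X+Y] = E[X] + E[Y] on a joint pmf where X and Y are dependent.
from fractions import Fraction

# Joint pmf P(X=x, Y=y); the probabilities sum to 1 and do not factorize.
joint = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

E_sum = sum(p * (x + y) for (x, y), p in joint.items())
E_X = sum(p * x for (x, y), p in joint.items())
E_Y = sum(p * y for (x, y), p in joint.items())
```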
3. Var(cX) = c²Var(X) where c is a constant and X is a random variable.
A: Using the definition of variance, for a random variable X, Var(X) = E[(X − E[X])²].
Also, by linearity of expectation, we have E[cX] = cE[X]. Now,

Var(cX) = E[(cX − E[cX])²]
Var(cX) = E[(cX − cE[X])²] = E[c²(X − E[X])²]
Var(cX) = c²E[(X − E[X])²]
Var(cX) = c²Var(X) □
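A quick exact check of the identity on an arbitrary small pmf (the values and the constant c are illustrative):

```python
# Numeric check of Var(cX) = c^2 Var(X) on a small pmf.
from fractions import Fraction

pmf = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}
c = 3

def var(pmf):
    """Variance of a finitely supported pmf, computed exactly."""
    mean = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())

# The pmf of cX: every outcome is scaled by c, probabilities unchanged.
scaled = {c * x: p for x, p in pmf.items()}
```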
4. Prove that with probability at least 1 − 1/k, a uniformly random permutation σ : [n] → [n]
has at most k fixed points.
A: Consider the event that a permutation has at least k + 1 fixed points. Its probability
can be bounded as follows.

1. Choose k + 1 of the n points to be fixed: this can be done in nCk+1 = n!/((k+1)!(n−k−1)!) ways.
2. Permute the remaining (n − k − 1) points in (n − k − 1)! ways.
3. Every permutation with at least k + 1 fixed points is counted at least once this way, so
the number of such permutations is at most n!/(k+1)!, and the probability is at most 1/(k+1)!.

We therefore have P(at most k fixed points) ≥ 1 − 1/(k+1)!.

Proving the bound is now easy: k < (k+1)! ⟹ 1/k > 1/(k+1)! ⟹ 1 − 1/k <
1 − 1/(k+1)!. Finally, P(at most k fixed points) ≥ 1 − 1/(k+1)! > 1 − 1/k. □
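The bound can be confirmed exhaustively for small n (a sanity check, not a proof; the ranges of n and k below are illustrative):

```python
# Exhaustive check: the fraction of permutations of [n] with at most k
# fixed points is at least 1 - 1/k, for all small n and 1 <= k <= n.
from itertools import permutations
from fractions import Fraction
from math import factorial

def prob_at_most_k_fixed(n, k):
    """Exact probability that a uniform permutation of [n] has <= k fixed points."""
    good = sum(
        1 for perm in permutations(range(n))
        if sum(perm[i] == i for i in range(n)) <= k
    )
    return Fraction(good, factorial(n))

checks = all(
    prob_at_most_k_fixed(n, k) >= 1 - Fraction(1, k)
    for n in range(1, 7)
    for k in range(1, n + 1)
)
```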
5. Consider a particle that does an unbiased random walk on the real line. It starts at position
0. For any i, if the particle is at i, it moves to position i + 1 with probability 1/2 and to
position i − 1 with probability 1/2. Prove that after n steps, with probability at least 1 − 10/√n, the
distance of the particle from the start, i.e., 0, is at most √(n ln n).
A: We can write the position as S_n = ΣX_i, where for each i, −1 ≤ X_i ≤ 1 and E[X_i] = 0.
The problem now reduces to bounding P(|S_n| ≤ √(n ln n)). We can simply use Hoeffding's
inequality, which gives

P(|S_n| ≥ √(n ln n)) ≤ 2e^{−2(n ln n)/(4n)} = 2e^{−(ln n)/2} = 2/√n.

The desired probability is then P(|S_n| ≤ √(n ln n)) ≥ 1 − 2/√n ≥ 1 − 10/√n. □
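A Monte Carlo sanity check of the concentration claim (the values of n and the trial count are illustrative, and this is evidence, not a proof):

```python
# After n steps of an unbiased +/-1 walk, |S_n| <= sqrt(n ln n) should hold
# with probability at least 1 - 2/sqrt(n); estimate that probability by simulation.
import math
import random

random.seed(0)
n = 400
trials = 2000
threshold = math.sqrt(n * math.log(n))

inside = sum(
    abs(sum(random.choice((-1, 1)) for _ in range(n))) <= threshold
    for _ in range(trials)
)
fraction_inside = inside / trials
```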
2 2 Marks
1. If X, Y are independent random variables then E[XY ] = E[X]E[Y ].
A: If X, Y are independent random variables, then the joint distribution factorizes
as P(X = x, Y = y) = P(X = x)P(Y = y). Now,

E[XY] = Σ_x Σ_y xy P(X = x, Y = y)
E[XY] = Σ_x Σ_y xy P(X = x)P(Y = y)
E[XY] = (Σ_x x P(X = x))(Σ_y y P(Y = y))
E[XY] = E[X]E[Y]. □
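A quick exact check with a product joint pmf (the marginals below are arbitrary choices):

```python
# Check E[XY] = E[X]E[Y] when the joint pmf is the product of the marginals.
from fractions import Fraction
from itertools import product

px = {0: Fraction(1, 3), 1: Fraction(2, 3)}
py = {-1: Fraction(1, 2), 2: Fraction(1, 2)}

# Independence: P(X=x, Y=y) = px[x] * py[y].
E_XY = sum(px[x] * py[y] * x * y for x, y in product(px, py))
E_X = sum(p * x for x, p in px.items())
E_Y = sum(p * y for y, p in py.items())
```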
2. E[X] = P (A)E[X|A] + P (Ac )E[X|Ac ]
A: Conditioning on A, by the law of total probability, P(X = x) = P(A)P(X = x | A) + P(A^c)P(X = x | A^c). Now,

E[X] = Σ_x x P(X = x)
E[X] = Σ_x x (P(A)P(X = x | A) + P(A^c)P(X = x | A^c))
E[X] = P(A) Σ_x x P(X = x | A) + P(A^c) Σ_x x P(X = x | A^c)
E[X] = P(A)E[X | A] + P(A^c)E[X | A^c].

This is as required. □
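A quick check of the identity on a fair six-sided die with A = "the roll is even":

```python
# Law of total expectation on a fair six-sided die, conditioning on parity.
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)  # uniform pmf

E_X = sum(p * x for x in outcomes)
evens = [x for x in outcomes if x % 2 == 0]
odds = [x for x in outcomes if x % 2 == 1]

P_A = Fraction(len(evens), 6)                 # P(roll is even)
E_given_A = Fraction(sum(evens), len(evens))  # E[X | even]
E_given_Ac = Fraction(sum(odds), len(odds))   # E[X | odd]
total = P_A * E_given_A + (1 - P_A) * E_given_Ac
```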
3. If X1, X2, . . . , Xn are independent random variables then Var(ΣX_i) = ΣVar(X_i).
A: WLOG, E[X_i] = 0; we can make this assumption because we can always define Y_i = X_i − E[X_i],
which changes neither the variances nor the independence. Using the first problem,

Var(ΣX_i) = E[(ΣX_i)²]
= E[ΣX_i² + Σ_{i≠j} X_i X_j]
= ΣE[X_i²] + Σ_{i≠j} E[X_i X_j]
= ΣE[X_i²] + Σ_{i≠j} E[X_i]E[X_j]
= ΣE[X_i²] = ΣVar(X_i),

where the fourth line uses independence and the last line uses E[X_i] = 0. □
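An exact check for two independent variables, using the fact that the pmf of a sum of independent variables is the convolution of the marginals (the pmfs below are illustrative):

```python
# Exact check that Var(X1 + X2) = Var(X1) + Var(X2) for independent X1, X2.
from fractions import Fraction
from itertools import product

pmf1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
pmf2 = {-1: Fraction(1, 3), 2: Fraction(2, 3)}

def var(pmf):
    """Variance of a finitely supported pmf, computed exactly."""
    mean = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())

# Independence: the pmf of X1 + X2 is the convolution of the marginals.
sum_pmf = {}
for (x1, p1), (x2, p2) in product(pmf1.items(), pmf2.items()):
    sum_pmf[x1 + x2] = sum_pmf.get(x1 + x2, Fraction(0)) + p1 * p2
```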
4. Prove Markov inequality.
A: For a non-negative random variable X and any c > 0, we start as follows:

E[X] = Σ_x x P(X = x)
≥ Σ_{x ≥ c} x P(X = x)
≥ c Σ_{x ≥ c} P(X = x)
= c P(X ≥ c)

Rearranging, P(X ≥ c) ≤ E[X]/c. Now, choosing c = aE[X] for some a > 0, we get
P(X ≥ aE[X]) ≤ E[X]/(aE[X]) = 1/a, as required. □
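A quick check of the bound P(X ≥ c) ≤ E[X]/c on a small non-negative pmf (values illustrative):

```python
# Empirical check of Markov's inequality on a non-negative pmf.
from fractions import Fraction

pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
E_X = sum(p * x for x, p in pmf.items())  # = 5/4

# For each threshold c, the exact tail P(X >= c) must not exceed E[X]/c.
markov_holds = all(
    sum(p for x, p in pmf.items() if x >= c) <= E_X / c
    for c in (Fraction(1), Fraction(2), Fraction(3), Fraction(4))
)
```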
5. Prove Chebyshev’s inequality.
A. First, we note that the event |X − E[X]| ≥ t is equivalent to the event (X − E[X])² ≥ t².
Now, using the transformation Y = (X − E[X])², we are interested in P(Y ≥ t²). By
definition, Y takes only non-negative values, so we can apply Markov's inequality:

P(Y ≥ t²) ≤ E[Y]/t² = Var(X)/t²,

as required. □
6. Show that pairwise independence does not imply independence.
A. We can show this directly with a counterexample. Consider a fair 4-sided die, and consider the events
A = {1, 2}, B = {2, 3} and C = {2, 4}. Clearly, P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C)
and P(B ∩ C) = P(B)P(C) (each side equals 1/4), so the events are pairwise independent; but
P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A)P(B)P(C), so they are not mutually independent. □
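The counterexample can be verified by direct enumeration:

```python
# Pairwise independence without mutual independence, checked by enumeration.
from fractions import Fraction

omega = {1, 2, 3, 4}
A, B, C = {1, 2}, {2, 3}, {2, 4}

def prob(event):
    """Probability of an event under the uniform measure on the 4-sided die."""
    return Fraction(len(event & omega), len(omega))

pairwise = (
    prob(A & B) == prob(A) * prob(B)
    and prob(A & C) == prob(A) * prob(C)
    and prob(B & C) == prob(B) * prob(C)
)
mutual = prob(A & B & C) == prob(A) * prob(B) * prob(C)
```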
7. Compare Markov, Chebyshev and Hoeffding Bounds.
A. Define the random variable X_i as 1 if the i-th throw is a 6 and 0 otherwise, and let S = ΣX_i. Clearly
S ∼ Binomial(N, 1/6). We know E[S] = N/6 and Var(S) = 5N/36 from properties of the
binomial distribution. Now,

1. Markov bound: P(S ≥ N/4) = P(S ≥ (3/2)E[S]) ≤ 2/3. The bound is independent of N.
2. Chebyshev bound: P(|S − E[S]| ≥ N/12) ≤ Var(S)/(N/12)² = (5N/36)/(N²/144) = 20/N. The bound becomes
stronger as N increases.
3. Hoeffding bound: P(|S − E[S]| ≥ N/12) ≤ 2e^{−N/72}. The bound decreases expo-
nentially as N increases.
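Evaluating the three bounds at a concrete N (the value 720 is illustrative) makes the ordering visible:

```python
# Compare the three tail bounds for S ~ Binomial(N, 1/6) at deviation N/12.
import math

def bounds(N):
    markov = 2 / 3                      # P(S >= (3/2) E[S]) <= 2/3, independent of N
    chebyshev = 20 / N                  # Var(S) / (N/12)^2 = 20/N
    hoeffding = 2 * math.exp(-N / 72)   # 2 exp(-2 (N/12)^2 / N)
    return markov, chebyshev, hoeffding

m, c, h = bounds(720)
```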
3 3 Marks
1. It is promised that a given coin is either fair (Pr(Head) = 1/2) or biased with Pr(Head) =
1/2 + ϵ where 0 < ϵ < 1/2. Show that 100/ϵ² coin tosses are sufficient to correctly determine the
type of coin (fair or biased) with at least 4/5 probability, i.e., give an algorithm that will
need at most 100/ϵ² coin tosses, and should have the following guarantee: if the coin is fair, the
algorithm will return ‘fair’ with probability at least 4/5, and if the coin is biased, the algorithm
will return ‘biased’ with probability at least 4/5.
A. Consider the following algorithm.

1. Define δ = ϵ/2. Toss the coin n = 100/ϵ² times, and let the i-th outcome be represented as X_i.
Clearly, if we define heads to be 1 and tails to be 0, then X_i ∈ [0, 1] and is therefore a bounded r.v.
2. Compute X = ΣX_i and p̂ = X/n. If p̂ > 0.5 + δ, declare the coin biased, else declare
the coin fair.
3. Note that p_fair = 0.5 and p_biased = 0.5 + ϵ.

I claim that this rule works, and the proof goes as follows.

1. Coin is fair: The strategy returns biased with probability at most P(|p̂ − p_fair| ≥
ϵ/2), which by Hoeffding's inequality is bounded by 2e^{−2n(ϵ/2)²} = 2e^{−2·(100/ϵ²)·(ϵ²/4)} = 2e^{−50}.
Therefore P(|p̂ − p_fair| ≥ ϵ/2) ≤ 2/e^{50} ≤ 1/5, and the strategy returns fair with
probability at least 4/5.
2. Coin is biased: The strategy returns fair with probability at most P(|p̂ − p_biased| ≥
ϵ/2), which by the same Hoeffding bound satisfies P(|p̂ − p_biased| ≥ ϵ/2) ≤ 2/e^{50} ≤ 1/5.
Therefore the strategy returns biased with probability at least 4/5. □
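A Monte Carlo sanity check of the rule (the value of ϵ and the number of repetitions are illustrative; this is evidence, not a proof):

```python
# Simulate the coin-testing rule: n = ceil(100/eps^2) tosses, threshold 0.5 + eps/2.
import math
import random

random.seed(1)
eps = 0.2
n = math.ceil(100 / eps ** 2)  # 2500 tosses

def classify(p_heads):
    """Run the rule once on a coin with the given head probability."""
    heads = sum(random.random() < p_heads for _ in range(n))
    return "biased" if heads / n > 0.5 + eps / 2 else "fair"

runs = 200
fair_ok = sum(classify(0.5) == "fair" for _ in range(runs)) / runs
biased_ok = sum(classify(0.5 + eps) == "biased" for _ in range(runs)) / runs
```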
2.
A. Let N = 100000 log(1/δ), and let the probability of correct estimation be p ≥ 2/3. Repeating the
algorithm N times, let X_i be the random variable that is 1 if the i-th run estimates the
mean to be within ϵ distance, and 0 otherwise. Consider S = ΣX_i. If we can show that
S ≥ N/2, we are done, because then more than half the runs are correct and the median necessarily
estimates the mean correctly. Now consider the following proof that, with N as defined, this holds
with probability at least 1 − δ.

Since p ≥ 2/3, we have E[S] ≥ 2N/3, so the event S < N/2 implies |S − E[S]| ≥ N/6, and hence
P(S < N/2) ≤ P(|S − E[S]| ≥ N/6) ≤ 2e^{−2(N/6)²/N} = 2e^{−N/18} by Hoeffding's inequality.
With N = 100000 log(1/δ), 2e^{−N/18} = 2δ^{100000/18} ≤ δ for any δ < 1/2.
This proves that the median is an incorrect estimate with probability at most δ. □
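The median-of-repetitions trick can be sketched in simulation; the base estimator below (the mean of 100 Bernoulli(1/2) samples, which lands within ϵ = 0.05 of the truth only about 2/3 of the time) is an illustrative stand-in for the algorithm being amplified:

```python
# Median-of-repetitions amplification (sketch): a base estimator that is
# within eps of the true mean with probability ~2/3 becomes far more
# reliable after taking the median of N independent runs.
import random
import statistics

random.seed(2)
true_mean = 0.5
eps = 0.05

def base_estimate():
    # Mean of 100 Bernoulli(1/2) samples: within eps of 1/2 only ~2/3 of the time.
    return sum(random.random() < true_mean for _ in range(100)) / 100

N = 99  # odd number of repetitions, so the median is itself a sample value
median_est = statistics.median(base_estimate() for _ in range(N))
```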
3.
A. Intuitively, the regret should simply scale by x; otherwise it would be possible to get a
better regret bound in the [0, 1] rewards case by simply scaling the rewards. The formal
proof goes as follows.

1. Let m be the number of exploration rounds for each arm, and T the total number of
rounds (fixed).
2. As rewards ∈ [0, x], Hoeffding's inequality takes the form P(|µ̂_a − µ_a| ≥ ϵ) ≤ 2e^{−2ϵ²m/x²}
for each arm a.
3. To make the RHS at most 2/T^10, we can set ϵ = x√(5 ln T/m). A union bound over the K arms
then gives P(Bad) ≤ 2K/T^10 ≤ 2/T^9 (for K ≤ T), where Bad is the event that |µ̂_a − µ_a| ≥ ϵ
for some arm a.
4. Now, E[R(T)] = P(Bad)E[R(T)|Bad] + P(Good)E[R(T)|Good], where Good is the complement
of the Bad event described in the previous bullet. Taking the worst-case regret xT in the Bad
case, E[R(T)] ≤ 2x/T^8 + E[R(T)|Good].
5. To calculate E[R(T)|Good], we charge a regret of at most x per round during the exploration
phase and µ* − µ_a for the rest of the rounds, where a is the arm chosen after exploration
(so that, on Good, µ* − µ_a ≤ 2ϵ):

E[R(T)|Good] ≤ mKx + (T − mK)(µ* − µ_a)
mKx + (T − mK)(µ* − µ_a) ≤ mKx + T(µ* − µ_a) ≤ mKx + 2Tϵ
E[R(T)|Good] ≤ mKx + 2Tx√(5 ln T/m) = x(mK + 2T√(5 ln T/m))

This is just the regret in the [0, 1] case multiplied by x, so optimizing m as in that case,
the regret bound is O(xT^{2/3}K^{1/3}(log T)^{1/3}). □
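The Good-event bound above can be written as a function to confirm that it scales linearly in the reward range x (the values of T, K and the choice of m below are illustrative):

```python
# The Good-event regret bound x*(m*K + 2*T*sqrt(5 ln T / m)) scales linearly in x.
import math

def regret_bound(x, T, K, m):
    """Upper bound on expected regret under the Good event."""
    return x * (m * K + 2 * T * math.sqrt(5 * math.log(T) / m))

T, K = 10_000, 10
m = int((T / K) ** (2 / 3) * math.log(T) ** (1 / 3))  # standard explore-then-commit choice
scales_linearly = math.isclose(regret_bound(3.0, T, K, m),
                               3.0 * regret_bound(1.0, T, K, m))
```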
4.
A.

a. If both arms have a mean of 1/2, then the regret of choosing any arm is 0, so the best
upper bound is 0. We can set the number of exploration rounds to 0 in this case.
b. Under the Good event, the maximum regret is 1000m·(ln T)^{1/3}/T^{1/3} + (T − 2m)·1000·(ln T)^{1/3}/T^{1/3},
which at m = T/3 becomes O(T^{2/3}(ln T)^{1/3}). P(Bad) can be made vanishingly small.
c. If one arm has a mean of 1/2 and the other of 1/2 + 1/√T, then under the Good event the
regret even if the smaller arm is chosen is m/√T + P(the 1/2 arm is chosen)·(T − 2m)/√T =
O(T^{1/2}). The probability of Bad can be made arbitrarily small. The regret bound then
is O(T^{1/2}). □