
UC Berkeley

Department of Electrical Engineering and Computer Sciences

EECS 126: Probability and Random Processes

Problem Set 2
Fall 2019

Self-Graded Scores Due: 11:59 PM, Monday, September 16, 2019


Submit your self-graded scores via the Google form:
https://forms.gle/HCgYqRiM1qMnRyoQA.
Make sure you use your SORTABLE NAME on CalCentral.

1. Secret Hitler
In the game of Secret Hitler, there are 9 players. 4 of them are Fascist and
5 of them are Liberal. There is also a deck of 17 cards containing 11 Fascist
“policies” and 6 Liberal “policies”. Fascists want to play Fascist policies, and
Liberals want to play Liberal policies. Here’s how the play proceeds.

• A President and a Chancellor are chosen uniformly at random from the 9 players.
• The President draws 3 policies from the deck and gives 2 to the Chancellor.
• The Chancellor chooses one to play.

Now suppose you are the Chancellor, but the President gave you 2 Fascist policies.
Being a Liberal, you wonder: did the President just happen to draw 3 Fascist
policies, or was the President a Fascist who secretly discarded a Liberal policy?
In this scenario, what’s the probability that the President is Fascist? Let’s
assume that Fascist presidents always try to discard Liberal policies.
Solution: Let’s define the following.

• Let F be the event that the President is Fascist.


• Let O be the event that you observe 2 Fascist policies.

By Bayes' rule,
$$P(F \mid O) = \frac{P(O \mid F)\,P(F)}{P(O \mid F)\,P(F) + P(O \mid F^c)\,P(F^c)}.$$
You know that $P(F) = \frac{4}{8}$ and $P(F^c) = \frac{4}{8}$ (you are Liberal, so the President is uniformly distributed among the other 8 players, 4 of whom are Fascist). For $P(O \mid F)$, there are two cases.

• The President drew 3 Fascist policies.
• The President drew 2 Fascist policies and 1 Liberal policy, but discarded the Liberal policy.

On the other hand, for $P(O \mid F^c)$, there is only one case.

• The President drew 3 Fascist policies.
The probability that the President draws 3 Fascist policies is
$$\frac{\binom{11}{3}\binom{6}{0}}{\binom{17}{3}} = \frac{\frac{11 \cdot 10 \cdot 9}{6} \cdot 1}{\frac{17 \cdot 16 \cdot 15}{6}} = \frac{33}{136}.$$
The probability that the President draws 2 Fascist policies and 1 Liberal policy is
$$\frac{\binom{11}{2}\binom{6}{1}}{\binom{17}{3}} = \frac{\frac{11 \cdot 10}{2} \cdot 6}{\frac{17 \cdot 16 \cdot 15}{6}} = \frac{66}{136}.$$
Putting everything together, we get
$$P(F \mid O) = \frac{\bigl(\frac{33}{136} + \frac{66}{136}\bigr) \cdot \frac{4}{8}}{\bigl(\frac{33}{136} + \frac{66}{136}\bigr) \cdot \frac{4}{8} + \frac{33}{136} \cdot \frac{4}{8}} = \frac{99}{99 + 33} = \frac{3}{4}.$$
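As a quick sanity check (not part of the original solution), the answer can be verified with a short Monte Carlo simulation in Python. The discard rules below are the assumptions spelled out above: a Fascist President discards a Liberal policy whenever possible, and (implicitly, from the single case for $P(O \mid F^c)$) a Liberal President discards a Fascist policy whenever possible.

```python
import random

# Monte Carlo sanity check for P(F | O) = 3/4 (a sketch, not part of the original solution).
def trial(rng):
    president_fascist = rng.random() < 4 / 8        # 4 Fascists among the 8 other players
    deck = ['F'] * 11 + ['L'] * 6
    draw = rng.sample(deck, 3)                      # President draws 3 policies
    if president_fascist:
        discard = 'L' if 'L' in draw else 'F'       # Fascist discards a Liberal when possible
    else:
        discard = 'F' if 'F' in draw else 'L'       # Liberal discards a Fascist when possible
    draw.remove(discard)
    return president_fascist, draw.count('F') == 2  # O: Chancellor is handed 2 Fascist policies

rng = random.Random(0)
n_obs = n_fascist_and_obs = 0
for _ in range(200_000):
    fascist, observed = trial(rng)
    n_obs += observed
    n_fascist_and_obs += fascist and observed
print(n_fascist_and_obs / n_obs)                    # should be close to 0.75
```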

2. Packet Routing
Packets arriving at a switch are routed to either destination A (with probability
p) or destination B (with probability 1 − p). The destination of each packet is
chosen independently of the others. In the time interval [0, 1], the number of
arriving packets is Poisson(λ).

(a) Show that the number of packets routed to A is Poisson distributed. With
what parameter?
(b) Are the numbers of packets routed to A and to B independent?

Solution:

(a) Let X, Y be random variables which are equal to the number of packets
routed to the destinations A, B respectively. Let Z = X + Y . We are
given that Z ∼ Poisson(λ). We prove that X has the Poisson distribution
with mean pλ.

$$\begin{aligned}
P(X = x) &= \sum_{z=x}^{\infty} P(X = x, Z = z) \\
&= \sum_{z=x}^{\infty} P(Z = z)\, P(X = x \mid Z = z) \\
&= \sum_{z=x}^{\infty} \frac{e^{-\lambda}\lambda^z}{z!} \binom{z}{x} p^x (1-p)^{z-x} \\
&= \sum_{z=x}^{\infty} e^{-\lambda} \frac{\lambda^z}{z!} \cdot \frac{z!}{x!\,(z-x)!}\, p^x (1-p)^{z-x} \\
&= \frac{e^{-\lambda}(\lambda p)^x}{x!} \sum_{z=x}^{\infty} \frac{(\lambda(1-p))^{z-x}}{(z-x)!} \\
&= \frac{e^{-\lambda}(\lambda p)^x}{x!}\, e^{\lambda(1-p)} \\
&= \frac{e^{-\lambda p}(\lambda p)^x}{x!}.
\end{aligned}$$

(b) We prove that X and Y are independent.



$$\begin{aligned}
P(X = x, Y = y) &= \sum_{z=0}^{\infty} P(X = x, Y = y, Z = z) \\
&= \sum_{z=0}^{\infty} P(X = x, Y = y \mid Z = z)\, P(Z = z) \\
&= P(X = x, Y = y \mid Z = x+y)\, P(Z = x+y) \\
&= \frac{(x+y)!}{x!\,y!}\, p^x (1-p)^y \cdot \frac{e^{-\lambda}\lambda^{x+y}}{(x+y)!} \\
&= \frac{e^{-\lambda p}(\lambda p)^x}{x!} \cdot \frac{e^{-\lambda(1-p)}(\lambda(1-p))^y}{y!} \\
&= P(X = x)\, P(Y = y).
\end{aligned}$$
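As an aside (not part of the original solution), this Poisson splitting property is easy to see in a short simulation sketch; the parameter values below are arbitrary.

```python
import numpy as np

# Simulation sketch: thin a Poisson(lam) packet count with probability p and check
# that the A-count looks Poisson(p*lam) and is uncorrelated with the B-count.
rng = np.random.default_rng(0)
lam, p, trials = 10.0, 0.3, 200_000

z = rng.poisson(lam, size=trials)     # total number of packets in [0, 1]
x = rng.binomial(z, p)                # packets routed to A
y = z - x                             # packets routed to B

print(x.mean(), p * lam)              # E[X] should be close to p*lam
print(x.var(), p * lam)               # for a Poisson, the variance equals the mean
print(np.corrcoef(x, y)[0, 1])        # close to 0, consistent with independence
```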

3. Compact Arrays
Consider an array of n entries, where n is a positive integer. Each entry is
chosen uniformly at random from {0, . . . , 9}. We want to make the array more
compact, by putting all of the non-zero entries together at the front of the
array. As an example, suppose we have the array

[6, 4, 0, 0, 5, 3, 0, 5, 1, 3].

After making the array compact, it now looks like

[6, 4, 5, 3, 5, 1, 3, 0, 0, 0].

Let i be a fixed positive integer in {1, . . . , n}. Suppose that the ith entry of
the array is non-zero (assume that the array is indexed starting from 1). Let
X be a random variable which is equal to the index to which the ith entry has
been moved after making the array compact. Calculate E[X] and var(X).
Solution:

Let $X_j$ be the indicator that the $j$th entry of the original array is 0, for $j \in \{1, \ldots, i-1\}$. Then, the $i$th entry is moved backwards $\sum_{j=1}^{i-1} X_j$ positions, so
$$E[X] = i - \sum_{j=1}^{i-1} E[X_j] = i - \frac{i-1}{10} = \frac{9i+1}{10}.$$
The variance is also easy to compute, since the $X_j$ are independent. Then, $\operatorname{var}(X_j) = (1/10)(9/10) = 9/100$, so
$$\operatorname{var}(X) = \operatorname{var}\Bigl(i - \sum_{j=1}^{i-1} X_j\Bigr) = \sum_{j=1}^{i-1} \operatorname{var}(X_j) = \frac{9(i-1)}{100}.$$
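A small simulation sketch (not part of the original solution) confirms both formulas; the values of n and i below are arbitrary.

```python
import numpy as np

# Simulation sketch: condition on the i-th entry being non-zero and compare the
# empirical mean and variance of its new index X with (9i+1)/10 and 9(i-1)/100.
rng = np.random.default_rng(0)
n, i, trials = 20, 7, 200_000

samples = []
while len(samples) < trials:
    arr = rng.integers(0, 10, size=n)            # uniform entries in {0, ..., 9}
    if arr[i - 1] == 0:                          # keep only arrays whose i-th entry is non-zero
        continue
    zeros_before = np.count_nonzero(arr[: i - 1] == 0)
    samples.append(i - zeros_before)             # new (1-indexed) position after compaction
samples = np.array(samples)

print(samples.mean(), (9 * i + 1) / 10)
print(samples.var(), 9 * (i - 1) / 100)
```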

4. Message Segmentation
The number of bytes N in a message has a geometric distribution with parameter
p. Suppose that the message is segmented into packets, with each packet
containing m bytes if possible, and any remaining bytes being put in the last
packet. Let Q denote the number of full packets in the message, and let R
denote the number of bytes left over.

(a) Find the joint PMF of Q and R. Pay attention to the support of the joint PMF.
(b) Find the marginal PMFs of Q and R.
(c) Repeat part (b), given that we know that N > m.

Note: you can use the formulas


$$\sum_{k=0}^{n} a^k = \frac{1 - a^{n+1}}{1 - a}, \quad \text{for } a \neq 1,$$
$$\sum_{k=0}^{\infty} x^k = \frac{1}{1 - x}, \quad \text{for } |x| < 1,$$

in order to simplify your answer.


Solution:

(a) Given any $N$ there is a unique way to write it as $N = Qm + R$, where $R \in \{0, 1, \ldots, m-1\}$. Therefore
$$P(Q = q, R = r) = P(N = qm + r) = (1-p)^{qm+r-1}\, p.$$
The range of values of $(Q, R)$ is $\{(q, r) \mid q \geq 0,\ 0 \leq r < m\} \setminus \{(0, 0)\}$. Note that since $P(N = 0) = 0$ we cannot have both $Q = 0$ and $R = 0$ at the same time.

(b) The marginal PMF of $Q$ is
$$P(Q = q) = \sum_{r=0}^{m-1} (1-p)^{qm+r-1} p = (1-p)^{qm-1} p \sum_{r=0}^{m-1} (1-p)^r = (1-p)^{qm-1}\bigl(1 - (1-p)^m\bigr), \quad \text{for } q = 1, 2, \ldots,$$
$$P(Q = 0) = \sum_{r=1}^{m-1} (1-p)^{r-1} p = 1 - (1-p)^{m-1}.$$
The marginal PMF of $R$ is
$$P(R = r) = \sum_{q=0}^{\infty} (1-p)^{qm+r-1} p = \frac{p(1-p)^{r-1}}{1 - (1-p)^m}, \quad \text{for } r = 1, 2, \ldots, m-1,$$
$$P(R = 0) = \sum_{q=1}^{\infty} (1-p)^{qm-1} p = p(1-p)^{m-1} \sum_{q=0}^{\infty} \bigl((1-p)^m\bigr)^q = \frac{p(1-p)^{m-1}}{1 - (1-p)^m}.$$

(c) Due to the memoryless property of the geometric distribution, the PMF of $R$ will be exactly the same as in part (b), while the PMF of $Q$ is
$$P(Q = q) = (1-p)^{(q-1)m-1}\bigl(1 - (1-p)^m\bigr), \quad \text{for } q = 2, 3, \ldots,$$
$$P(Q = 1) = 1 - (1-p)^{m-1}.$$

5. Almost Fixed Points of a Permutation


Let Ω be the set of all permutations of the numbers 1, 2, . . . , n. Define an almost
fixed point as follows: place the numbers i ∈ {1, 2, . . . , n} around a circle in
clockwise order (so that 1 and n are next to each other). If ω(i) is next to i
on the circle (or is equal to i), we say that i is an almost fixed point of ω. So,
for the permutation ω(1) = 5, ω(2) = 3, ω(3) = 1, ω(4) = 4, ω(5) = 2, the almost
fixed points are 1, 2, and 4.
Now, let X(ω) denote the number of almost fixed points in ω ∈ Ω. Find E[X]
and var(X). You may assume that n ≥ 5.
Solution:

To find the expectation, write $X$ as a sum of indicator random variables: let $\mathbf{1}_i$ indicate whether $i$ is an almost fixed point. Then
$$E[X] = E\Bigl[\sum_i \mathbf{1}_i\Bigr] = \sum_i E[\mathbf{1}_i] = \sum_i P(i \text{ is an almost fixed point}) = n \cdot \frac{3}{n} = 3.$$
(We get $P(i \text{ is an almost fixed point}) = 3/n$ by counting permutations: there are 3 choices for what $i$ can map to, and once we pick one, we can permute the rest in $(n-1)!$ ways, giving $3(n-1)!/n! = 3/n$.) Note that this calculation is valid so long as there really are three choices; for $n = 1$ and $n = 2$, the expectations are 1 and 2, respectively.

Continuing from the sum interpretation above, we get
$$E[X^2] = E\Bigl[\sum_{(i,j)} \mathbf{1}_i \mathbf{1}_j\Bigr] = E\Bigl[\sum_i \mathbf{1}_i\Bigr] + 2\,E\Bigl[\sum_{i<j} \mathbf{1}_i \mathbf{1}_j\Bigr].$$
We note that the first sum is equal to $E[X]$ since $\mathbf{1}_i^2 = \mathbf{1}_i$. Now, we look at the probability that both $i$ and $j$ are almost fixed points; to do so, we find it convenient to divide the pairs into three sets:
$$A_1 = \{\{i,j\} \mid j = i+1 \pmod n\}, \quad A_2 = \{\{i,j\} \mid j = i+2 \pmod n\}, \quad A_3 = \{\{i,j\} \mid \{i,j\} \notin A_1 \cup A_2\}.$$
We now need to calculate the sizes of the sets $A_1$, $A_2$, and $A_3$. When $n$ is sufficiently large (for this problem, $n \geq 5$), we have $|A_1| = |A_2| = n$ and $|A_3| = \binom{n}{2} - 2n$.
Thus, the second sum can be rewritten as
$$2\sum_{i<j} \mathbf{1}_i\mathbf{1}_j = 2\sum_{\{i,j\}\in A_1} \mathbf{1}_i\mathbf{1}_j + 2\sum_{\{i,j\}\in A_2} \mathbf{1}_i\mathbf{1}_j + 2\sum_{\{i,j\}\in A_3} \mathbf{1}_i\mathbf{1}_j.$$
Computing the pair probabilities (for adjacent $i, j$ the two sets of three allowed values overlap in two positions, giving 7 valid choices of $(\omega(i), \omega(j))$; at distance two they overlap in one position, giving 8; otherwise they do not overlap, giving 9) yields
$$E\Bigl[2\sum_{i<j} \mathbf{1}_i\mathbf{1}_j\Bigr] = 2\sum_{\{i,j\}\in A_1} \frac{7}{n(n-1)} + 2\sum_{\{i,j\}\in A_2} \frac{8}{n(n-1)} + 2\sum_{\{i,j\}\in A_3} \frac{9}{n(n-1)} = 2\Bigl(\frac{15}{n-1} + \Bigl(\binom{n}{2} - 2n\Bigr)\frac{9}{n(n-1)}\Bigr),$$
which again holds only for $n \geq 5$. So the final answer is
$$\operatorname{var}(X) = 3 + 2\Bigl(\frac{15}{n-1} + \Bigl(\binom{n}{2} - 2n\Bigr)\frac{9}{n(n-1)}\Bigr) - 9 = 3 - \frac{6}{n-1},$$
which holds for $n \geq 5$.
We note that the variance for $n = 1, 2, 3$ is 0, since every point is then an almost fixed point. For $n = 4$ we count the sizes of the sets as $|A_1| = 4$, $|A_2| = 2$, $|A_3| = 0$, and the probability that both $i$ and $j$ are almost fixed points when $\{i,j\} \in A_2$ becomes $\frac{7}{n(n-1)}$. Simplifying what we had before, we now get $3 + 2\bigl(4\cdot\frac{7}{12} + 2\cdot\frac{7}{12}\bigr) - 9 = 1$ as the variance.
Alternate Solution
The tricky part in the above solution is calculating
$$E\Bigl[\sum_{i}\sum_{j \neq i} \mathbf{1}_i \mathbf{1}_j\Bigr] = \sum_{i=1}^{n} \sum_{j \neq i} P(\mathbf{1}_i = 1, \mathbf{1}_j = 1) = \sum_{i=1}^{n} \sum_{j \neq i} P(\mathbf{1}_j = 1 \mid \mathbf{1}_i = 1)\, P(\mathbf{1}_i = 1) = \frac{3}{n} \sum_{i=1}^{n} \sum_{j \neq i} P(\mathbf{1}_j = 1 \mid \mathbf{1}_i = 1).$$
To find $\sum_{j \neq i} P(\mathbf{1}_j = 1 \mid \mathbf{1}_i = 1)$, note that regardless of which value $\omega(i)$ takes, there are only two cases for $j$: in two cases $j$ is obstructed by $\omega(i)$ (one of its three allowed values is already used) and has two available positions for $\omega(j)$, while in the remaining $n-3$ cases $j$ is not obstructed and has three available positions for $\omega(j)$. This means the above inner sum is $\frac{2}{n-1}\cdot 2 + \frac{3}{n-1}\cdot(n-3)$. Since this is independent of $i$, we get
$$E\Bigl[\sum_{i}\sum_{j \neq i} \mathbf{1}_i \mathbf{1}_j\Bigr] = 3\Bigl(\frac{2}{n-1}\cdot 2 + \frac{3}{n-1}\cdot(n-3)\Bigr) = 3 \cdot \frac{3n - 5}{n-1} = 9 - \frac{6}{n-1}.$$
Combining this with the other terms in the variance, we get
$$\operatorname{var}(X) = 3 + 9 - \frac{6}{n-1} - 9 = 3 - \frac{6}{n-1}.$$
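A short simulation sketch (not part of the original solution) agrees with $E[X] = 3$ and $\operatorname{var}(X) = 3 - 6/(n-1)$ for $n \geq 5$; the value of n below is arbitrary.

```python
import numpy as np

# Simulation sketch: sample random permutations, count almost fixed points
# (omega(i) equal or adjacent to i on the circle), and compare mean/variance.
rng = np.random.default_rng(0)
n, trials = 8, 200_000

idx = np.arange(n)
counts = np.empty(trials)
for t in range(trials):
    omega = rng.permutation(n)                  # 0-indexed permutation
    diff = (omega - idx) % n                    # circular offset from i to omega(i)
    counts[t] = np.count_nonzero((diff <= 1) | (diff == n - 1))

print(counts.mean(), 3)
print(counts.var(), 3 - 6 / (n - 1))
```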

6. Soliton Distribution
This question pertains to the fountain codes introduced in the lab.
Say that you wish to send n chunks of a message, X1 , . . . , Xn , across a channel,
but alas the channel is a packet erasure channel: each of the packets you
send is erased with probability pe > 0. Instead of sending the n chunks directly
through the channel, we will send n packets through the channel,
call them Y1 , . . . , Yn . How do we choose the packets Y1 , . . . , Yn ? Let p(·) be a
probability distribution on {1, . . . , n}; this represents the degree distribution
of the packets.

(i) For i = 1, . . . , n, pick Di (a random variable) according to the distribution p(·). Then, choose Di random chunks among X1 , . . . , Xn , and “assign” Yi to the Di chosen chunks.
(ii) For i = 1, . . . , n, let Yi be the XOR of all of the chunks assigned for Yi
(the number of chunks assigned for Yi is called the degree of Yi ).
(iii) Send each Yi across the channel, along with metadata which describes
which chunks were assigned to Yi .

The receiver on the other side of the channel receives the packets Y1 , . . . , Yn (for
simplicity, assume that no packets are erased by the channel; in this problem,
we are just trying to understand what we should do in the ideal situation of
no channel noise), and decoding proceeds as follows:

(i) If there is a received packet Y with only one assigned chunk Xj , then
set Xj = Y . Then, “peel off” Xj : for all packets Yi that Xj is assigned
to, replace Yi with Yi XOR Xj . Remove Y and Xj (notice that this may
create new degree-one packets, which allows decoding to continue).
(ii) Repeat the above step until all chunks have been decoded, or there are
no remaining degree-one packets (in which case we declare failure).
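As a reference for the lab (this sketch is not part of the problem), the peeling procedure in steps (i)–(ii) can be written in a few lines of Python; the packet representation below, pairs of (XOR value, set of assigned chunk indices), is an illustrative assumption.

```python
def peel_decode(packets, n):
    """Minimal peeling-decoder sketch: packets are (value, set of assigned chunk indices)."""
    packets = [(val, set(idx)) for val, idx in packets]
    decoded = {}
    progress = True
    while progress and len(decoded) < n:
        progress = False
        for k, (val, idx) in enumerate(packets):
            if len(idx) == 1:                          # found a degree-one packet
                j = next(iter(idx))
                decoded[j] = val
                for kk, (v2, idx2) in enumerate(packets):
                    if j in idx2:                      # peel X_j off every packet it touches
                        packets[kk] = (v2 ^ val, idx2 - {j})
                progress = True
                break
    return decoded if len(decoded) == n else None      # None signals decoding failure

# Toy example with chunks [5, 9, 12] (integers so XOR is well defined):
pkts = [(5, {0}), (5 ^ 9, {0, 1}), (9 ^ 12, {1, 2})]
print(peel_decode(pkts, 3))                            # {0: 5, 1: 9, 2: 12}
```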

In the lab, you will play around with the algorithm and watch it in action.
Here, our goal is to work out what a good degree distribution p(·) is.
Intuitively, a good degree distribution needs to occasionally have prolific packets
that have high degree; this is to ensure that every chunk is connected to at
least one packet. However, we need “most” of the packets to have low degree
to make decoding easier. Ideally, we would like to choose p(·) such that at each
step of the algorithm, there is exactly one degree-one packet.

(a) Suppose that when k chunks have been recovered (k = 0, 1, . . . , n − 1), the expected number of packets of degree d (for d > 1) is fk (d).
Assuming we are in the ideal situation where there is exactly one degree-
one packet for any k : What is the probability that a given degree d
packet is connected to the chunk we are about to peel off? Based on that,
what is the expected number of packets of degree d whose degrees are
reduced by one after the (k + 1)st chunk is peeled off?
(b) We want fk (1) = 1 for all k = 0, 1, . . . , n − 1. Show that in order for this
to hold, we must have fk (d) = (n − k)/[d(d − 1)] for all d = 2, . . . , n. From
this, deduce what p(d) must be, for d = 1, . . . , n. (This is called the ideal
soliton distribution.)
[Hint: You should get two different recursion equations, since the only
degree-1 nodes at peeling step k + 1 come from the peeling of degree-2 nodes
at step k; for higher degrees d, however, there is some probability that
degree-d nodes remain from the previous iteration and some probability that
they come from degree-(d + 1) nodes that lose an edge when the chunk is
peeled off.]
(c) Find the expectation of the distribution p(·).

In practice, the ideal soliton distribution does not perform very well because it
is not enough to design the distribution to work well in expectation.
Solution:

(a) Of the fk (d) packets with degree d, each packet has probability d/(n − k)
(since there are n − k chunks remaining) of being connected to the chunk
which is peeled off at iteration k + 1. Thus, by linearity, the answer is
fk (d) · d/(n − k).
(b) We certainly need f0 (1) = 1 and 1 = f1 (1) = f0 (2) · 2/n, so f0 (2) = n/2.
For k = 0, 1, . . . , n − 1, we have 1 = fk+1 (1) = fk (2) · 2/(n − k), so
fk (2) = (n − k)/2.
Proceed by induction. Suppose that for all d ≤ d0 , where d0 = 2, . . . , n − 1,
we know that fk (d) = (n − k)/[d(d − 1)]. Then, for k = 0, 1, . . . , n − d − 1,

$$\frac{n-k-1}{d(d-1)} = f_{k+1}(d) = f_k(d+1)\,\frac{d+1}{n-k} + f_k(d)\Bigl(1 - \frac{d}{n-k}\Bigr) = f_k(d+1)\,\frac{d+1}{n-k} + \frac{n-k}{d(d-1)}\Bigl(1 - \frac{d}{n-k}\Bigr),$$

so $f_k(d+1) = (n-k)/[d(d+1)]$.

Note that f0 (d), the expected number of degree-d received packets at the
beginning of the algorithm, is exactly np(d), so:
1
 ,
 d=1
n
p(d) = 1

 , d = 2, . . . , n
d(d − 1)

(c) The expectation is


$$\sum_{d=1}^{n} d\, p(d) = \frac{1}{n} + \sum_{d=2}^{n} d \cdot \frac{1}{d(d-1)} = \frac{1}{n} + \sum_{d=2}^{n} \frac{1}{d-1} = \sum_{d=1}^{n} \frac{1}{d} \approx \ln n.$$
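As a check (not part of the original solution), sampling degrees from the ideal soliton distribution shows the empirical mean matching $H_n = \sum_{d=1}^{n} 1/d \approx \ln n$; the value of n below is arbitrary.

```python
import numpy as np

# Simulation sketch: sample from the ideal soliton distribution and compare the
# empirical mean degree with the harmonic number H_n.
rng = np.random.default_rng(0)
n = 100

p = np.array([1 / n] + [1 / (d * (d - 1)) for d in range(2, n + 1)])
assert abs(p.sum() - 1) < 1e-9          # the probabilities telescope to 1

degrees = rng.choice(np.arange(1, n + 1), size=200_000, p=p)
print(degrees.mean(), sum(1 / d for d in range(1, n + 1)))   # both close to H_n
```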

7. [Bonus] Connected Random Graph


The bonus question is just for fun. You are not required to submit the bonus
question, but do give it a try and write down your progress.
We start with the empty graph on n vertices, and iteratively we keep on adding
undirected edges {u, v}, chosen uniformly at random from the edges not yet
present in the graph, until the graph is connected. Let X be a random
variable which is equal to the total number of edges of the graph. Show that
E[X] = O(n log n).
Hint: consider the random variable Xk which is equal to the number of edges
added while there are k connected components, until there are k − 1 connected
components. Don’t try to calculate E[Xk ], an upper bound is enough.
Solution:
The hint suggests that we should follow the approach used in the coupon collector problem. Indeed, observe that we can write $X = \sum_{k=2}^{n} X_k$. Suppose that $p_k$ is the probability that we add an edge which brings us to $k-1$ connected components at the time when we first have $k$ connected components, and let $Y_k$ be a geometric random variable with probability of success $p_k$. Note that as we continue to add edges, the probability of producing $k-1$ connected components from $k$ connected components will increase, starting from $p_k$. Then, $E[X_k] \leq E[Y_k]$.¹
In order to bound $p_k$, assume that there are $k$ connected components so far and $u$ is one endpoint of the edge that we are currently adding. Then there are at least $k-1$ other vertices to which we can connect $u$ and reduce the number of components. In total there are $n-1$ other vertices to which we can connect $u$, so the probability that this edge reduces the number of components is
$$p_k \geq \frac{k-1}{n-1}.$$
Putting it all together,
$$E[X] = \sum_{k=2}^{n} E[X_k] \leq \sum_{k=2}^{n} E[Y_k] = \sum_{k=2}^{n} \frac{1}{p_k} \leq \sum_{k=2}^{n} \frac{n-1}{k-1} = (n-1) H_{n-1} = O(n \log n).$$

¹ Intuitively, $Y_k$ is “larger” than $X_k$, although it takes some care to say precisely what this means. We say that $Y_k$ stochastically dominates $X_k$ if $P(Y_k \geq x) \geq P(X_k \geq x)$ for each $x$. Here, $Y_k$ does indeed stochastically dominate $X_k$, and this fact implies $E[Y_k] \geq E[X_k]$. To explain why, we use a coupling argument: suppose that each time we add an edge, we flip a coin with heads probability $p_k$; if the coin comes up heads, then we add an edge to the graph that connects two connected components. If the coin comes up tails, then we still have a chance of adding an edge that connects two connected components (this is because the probability of connecting two connected components is $\geq p_k$ and can be strictly larger); however, it is clear that in this case, the number of added edges until we have $k-1$ connected components (starting with $k$ connected components) is at most the number of flips until we see heads, i.e., $X_k \leq Y_k$. Thus, $E[X_k] \leq E[Y_k]$.
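A simulation sketch (not part of the original solution) of the edge-adding process; the union-find helper and the parameter values are illustrative.

```python
import math
import random

class DSU:                                   # union-find to track connected components
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def edges_until_connected(n, rng):
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    rng.shuffle(edges)                       # equivalent to drawing new edges uniformly at random
    dsu, components = DSU(n), n
    for count, (u, v) in enumerate(edges, start=1):
        if dsu.union(u, v):
            components -= 1
            if components == 1:
                return count
    return len(edges)

rng = random.Random(0)
n, trials = 100, 500
avg = sum(edges_until_connected(n, rng) for _ in range(trials)) / trials
print(avg, n * math.log(n))                  # the average grows like O(n log n)
```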
