E2 211 DIGITAL COMMUNICATION
Problem Set 1
1. Definition: The uncertainty (entropy) H(X) of a discrete random variable X is defined by
H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)
Prove the following,
(i) H(X) ≥ 0.
(ii) H_b(X) = (\log_b a) H_a(X), where H_a(X) \triangleq -\sum_{x \in \mathcal{X}} p(x) \log_a p(x).
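Both claims can be sanity-checked numerically before writing the proof. A minimal Python sketch follows; the three-symbol pmf is an arbitrary illustration (not part of the problem), and the base-change identity is checked for a = e, b = 2:

```python
import math

# Arbitrary example pmf on a 3-symbol alphabet (illustrative values only).
p = [0.5, 0.25, 0.25]

def entropy(pmf, base=2):
    """H(X) = -sum_x p(x) log_base p(x); terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log(px, base) for px in pmf if px > 0)

H2 = entropy(p, base=2)        # entropy in bits
He = entropy(p, base=math.e)   # entropy in nats

print(H2 >= 0)                                      # (i) H(X) >= 0
print(math.isclose(H2, math.log(math.e, 2) * He))   # (ii) H_b(X) = (log_b a) H_a(X) with a = e, b = 2
```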
2. Definition: The joint entropy H(X, Y ) of a pair of discrete random variables (X, Y )
with a joint distribution p(x, y) is defined as,
H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)
Definition: If (X, Y ) ∼ p(x, y), the conditional entropy H(Y |X) is defined as,
H(Y|X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x)
Prove the following,
(i) H(X, Y ) = H(X) + H(Y |X) (Chain rule)
(ii) H(X, Y |Z) = H(X|Z) + H(Y |X, Z)
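A quick numerical check of the chain rule in (i), on an arbitrary joint pmf (the values below are only an illustration), can help catch sign or conditioning mistakes:

```python
from math import isclose, log2

# Hypothetical joint pmf p(x, y) on {0,1} x {0,1}, chosen only for illustration.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# H(X, Y) from the definition above.
H_XY = -sum(v * log2(v) for v in p.values() if v > 0)

# Marginal p(x) and H(X).
p_x = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
H_X = -sum(v * log2(v) for v in p_x.values() if v > 0)

# H(Y|X) = -sum_{x,y} p(x, y) log p(y|x), with p(y|x) = p(x, y)/p(x).
H_Y_given_X = -sum(v * log2(v / p_x[x]) for (x, y), v in p.items() if v > 0)

print(isclose(H_XY, H_X + H_Y_given_X))   # chain rule: H(X, Y) = H(X) + H(Y|X)
```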
3. Definition: The relative entropy between two probability mass functions p(x) and q(x)
is defined as,
D(p||q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}
Definition: Consider two random variables X and Y with a joint probability mass function
p(x, y) and marginal probability mass functions p(x) and p(y). The mutual information
I(X; Y ) is the relative entropy between the joint distribution and the product distribution
p(x)p(y).
I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}
Prove the following,
(i) I(X; Y ) = H(X) − H(X|Y )
(ii) I(X; Y ) = H(Y ) − H(Y |X)
(iii) I(X; Y ) = H(X) + H(Y ) − H(X, Y )
(iv) I(X; Y ) = I(Y ; X)
(v) I(X; X) = H(X)
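Identities such as (iii) are easy to verify numerically on a small example before proving them in general. A minimal sketch, again with an arbitrary illustrative joint pmf:

```python
from math import isclose, log2

# Hypothetical joint pmf p(x, y), chosen only for illustration.
p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
xs = {x for x, _ in p}
ys = {y for _, y in p}

p_x = {x: sum(p[(x, y)] for y in ys) for x in xs}   # marginal of X
p_y = {y: sum(p[(x, y)] for x in xs) for y in ys}   # marginal of Y

def H(pmf):
    return -sum(v * log2(v) for v in pmf.values() if v > 0)

# I(X;Y) directly from the definition, i.e. D(p(x, y) || p(x)p(y)).
I = sum(v * log2(v / (p_x[x] * p_y[y])) for (x, y), v in p.items() if v > 0)

print(isclose(I, H(p_x) + H(p_y) - H(p)))   # (iii) I(X;Y) = H(X) + H(Y) - H(X, Y)
print(I >= 0)                               # consistent with Problem 6
```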
4. Definition: The conditional mutual information of random variables X and Y given
Z is defined by,
I(X; Y |Z) = H(X|Z) − H(X|Y, Z)
Definition: For joint probability mass functions p(x, y) and q(x, y), the conditional relative entropy D(p(y|x)||q(y|x)) is the average, over the probability mass function p(x), of the relative entropies between the conditional probability mass functions p(y|x) and q(y|x).
D(p(y|x)||q(y|x)) = \sum_{x} p(x) \sum_{y} p(y|x) \log \frac{p(y|x)}{q(y|x)}
Prove the following chain rules,
(i) Chain rule for entropy
H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i | X_{i-1}, \ldots, X_1).
(ii) Chain rule for information
I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y | X_{i-1}, X_{i-2}, \ldots, X_1).
(iii) Chain rule for relative entropy
D(p(x, y)||q(x, y)) = D(p(x)||q(x)) + D(p(y|x)||q(y|x)).
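The chain rule in (iii) can likewise be checked numerically. The sketch below uses two hypothetical joint pmfs on {0,1} x {0,1} (illustrative values only) and compares both sides:

```python
from math import isclose, log2

# Two hypothetical joint pmfs p(x, y) and q(x, y), for illustration only.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

marginal = lambda joint, x: sum(v for (a, _), v in joint.items() if a == x)
p_x = {x: marginal(p, x) for x in (0, 1)}
q_x = {x: marginal(q, x) for x in (0, 1)}

# D(p(x, y) || q(x, y)), D(p(x) || q(x)), and D(p(y|x) || q(y|x)).
D_joint = sum(v * log2(v / q[k]) for k, v in p.items() if v > 0)
D_marg = sum(v * log2(v / q_x[x]) for x, v in p_x.items() if v > 0)
D_cond = sum(v * log2((v / p_x[x]) / (q[(x, y)] / q_x[x]))
             for (x, y), v in p.items() if v > 0)

print(isclose(D_joint, D_marg + D_cond))   # chain rule for relative entropy
```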
5. (Information Inequality) Let p(x), q(x), x ∈ X , be two probability mass functions.
Then prove that D(p||q) ≥ 0, with equality if and only if p(x) = q(x) for all x.
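A brute-force numerical check of the information inequality (not a proof, only a sanity test of the statement) over randomly generated pmf pairs:

```python
import random
from math import isclose, log2

def D(p, q):
    """Relative entropy D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

random.seed(0)
for _ in range(1000):
    raw_p = [random.random() for _ in range(4)]
    raw_q = [random.random() for _ in range(4)]
    p = [v / sum(raw_p) for v in raw_p]
    q = [v / sum(raw_q) for v in raw_q]
    assert D(p, q) >= 0          # information inequality

assert isclose(D(p, p), 0)       # equality when p = q
```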
6. (Non-negativity of mutual information) For any two random variables X and Y , prove
that I(X; Y ) ≥ 0, with equality if and only if X and Y are independent. Also, prove the
following,
(i) D(p(y|x)||q(y|x)) ≥ 0, with equality if and only if p(y|x) = q(y|x) for all y and x
such that p(x) > 0.
(ii) I(X; Y |Z) ≥ 0, with equality if and only if X and Y are conditionally independent
given Z.
7. Prove that H(X) ≤ log |X |, where |X | denotes the number of elements in the range of
X, with equality if and only if X has a uniform distribution over X .
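Before proving the bound, it can be observed numerically that random pmfs never exceed log |X| and that the uniform pmf attains it; a minimal sketch with |X| = 8:

```python
import random
from math import isclose, log2

def H(pmf):
    return -sum(v * log2(v) for v in pmf if v > 0)

n = 8                                   # |X| in this illustration
random.seed(1)
for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    p = [v / sum(raw) for v in raw]
    assert H(p) <= log2(n) + 1e-12      # H(X) <= log |X|

print(isclose(H([1 / n] * n), log2(n)))  # equality for the uniform distribution
```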
8. (Conditioning reduces entropy) (Information can't hurt!) Prove that H(X|Y) ≤ H(X),
with equality if and only if X and Y are independent.
9. (Independence bound on entropy) Let X_1, X_2, \ldots, X_n be drawn according to p(x_1, x_2, \ldots, x_n). Prove that
H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i),
with equality if and only if the X_i are independent.
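The independence bound can be tried out numerically on a small joint pmf of three binary variables (the weights below are arbitrary, for illustration only):

```python
import itertools
from math import log2

# Hypothetical joint pmf on three binary variables.
outcomes = list(itertools.product((0, 1), repeat=3))
weights = [3, 1, 2, 2, 1, 4, 2, 1]
p = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

# Joint entropy H(X_1, X_2, X_3).
H_joint = -sum(v * log2(v) for v in p.values() if v > 0)

# Sum of marginal entropies H(X_1) + H(X_2) + H(X_3).
H_marginals = 0.0
for i in range(3):
    p_i = {}
    for o, v in p.items():
        p_i[o[i]] = p_i.get(o[i], 0.0) + v
    H_marginals += -sum(v * log2(v) for v in p_i.values() if v > 0)

print(H_joint <= H_marginals)   # independence bound on entropy
```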
10. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the uncertainty H(X) in bits.
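Whatever closed form is derived can be compared against a truncated numerical evaluation of the defining sum; a minimal sketch (the truncation point 60 is arbitrary, and the tail beyond it is negligible):

```python
from math import log2

# X = number of fair-coin flips until the first head, so P(X = k) = (1/2)^k for k = 1, 2, ...
# Truncate the infinite entropy sum at k = 60.
H = -sum((0.5 ** k) * log2(0.5 ** k) for k in range(1, 61))
print(H)   # numerical estimate of H(X) in bits; compare with the closed-form answer
```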
11. What is the minimum value of H(p1 , . . . , pn ) = H(p) as p ranges over the set of
n-dimensional probability vectors? Find all p’s that achieve this minimum.
12. An unbiased die with six faces is thrown twice and the following is disclosed: "The product of the faces that appeared is even." How much information is conveyed by this disclosure?
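One way to check the answer, assuming the information conveyed is quantified as the self-information -log2 P(E) of the disclosed event E, is to enumerate the 36 equally likely outcomes:

```python
import itertools
from math import log2

# All 36 equally likely outcomes of throwing a fair die twice.
outcomes = list(itertools.product(range(1, 7), repeat=2))
p_even = sum(1 for a, b in outcomes if (a * b) % 2 == 0) / len(outcomes)

print(-log2(p_even))   # self-information of the disclosure, in bits
```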
13. Let p(x, y) be given by,

            Y = 0    Y = 1
    X = 0    1/3      1/3
    X = 1     0       1/3
Find,
(a) H(X), H(Y ).
(b) H(X|Y ), H(Y |X).
(c) H(X, Y ).
(d) H(Y ) − H(Y |X).
(e) I(X; Y ).
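After working parts (a)-(e) by hand, the values can be cross-checked with a short Python sketch that reads the joint pmf from the table and applies the chain rule (Problem 2) and identity (iii) of Problem 3:

```python
from math import log2

# Joint pmf from the table in Problem 13.
p = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 0): 0.0, (1, 1): 1 / 3}

p_x = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}

def H(pmf):
    return -sum(v * log2(v) for v in pmf.values() if v > 0)

H_XY = H(p)
H_X, H_Y = H(p_x), H(p_y)
H_X_given_Y = H_XY - H_Y          # chain rule, Problem 2
H_Y_given_X = H_XY - H_X
I_XY = H_X + H_Y - H_XY           # Problem 3, identity (iii)

print(f"H(X) = {H_X:.4f}   H(Y) = {H_Y:.4f}   H(X, Y) = {H_XY:.4f}")
print(f"H(X|Y) = {H_X_given_Y:.4f}   H(Y|X) = {H_Y_given_X:.4f}   I(X;Y) = {I_XY:.4f}")
```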