Exam 2 Review Solutions
Question 1
Perceptrons can learn all Boolean functions of two inputs.
True / √ False
Explain:
Solution: False; perceptrons can only learn linearly separable functions. In particular, they cannot learn the XOR function.
Question 2
A particular perceptron is initialized with the weights w = [1, 1, 1] and the bias b = 1.
(a) The perceptron undergoes one iteration of training, with learning rate η = 1, with the
training token x = [0.1, 0.6, 0.5], y = −1. After this one iteration of training, what are w
and b?
Solution:
w = [0.9, 0.4, 0.5], b = 0
(b) The perceptron undergoes one more iteration of training, with learning rate η = 1, with
the training token x = [0.1, 0.1, 0.4], y = 1. After this second iteration of training, what
are w and b?
Solution:
w = [0.9, 0.4, 0.5], b = 0
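As a check on Questions 2-4, the following sketch (not part of the original solutions) applies the standard perceptron rule assumed in these problems: predict sign(w·x + b), and only when the prediction disagrees with y, set w ← w + ηyx and b ← b + ηy.

```python
# A minimal perceptron update (sketch; assumes the sign-threshold convention
# used in Questions 2-4: update only when the token is misclassified).
def perceptron_step(w, b, x, y, eta=1.0):
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    prediction = 1 if activation > 0 else -1
    if prediction != y:                        # mistake: move toward the label
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        b = b + eta * y
    return w, b

# Question 2(a): the token is misclassified, so the weights change.
w, b = perceptron_step([1, 1, 1], 1, [0.1, 0.6, 0.5], -1)
print(w, b)   # [0.9, 0.4, 0.5], 0 (up to floating point)

# Question 2(b): the new token is already classified correctly, so no change.
w, b = perceptron_step(w, b, [0.1, 0.1, 0.4], +1)
print(w, b)   # still [0.9, 0.4, 0.5], 0
```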
Question 3
A particular perceptron is initialized with the weights w = [1, 1, 1] and the bias b = 1.
(a) The perceptron undergoes one iteration of training, with learning rate η = 1, with the
training token x = [0.1, 0.1, 0.8], y = −1. After this one iteration of training, what are w
and b?
Solution:
w = [0.9, 0.9, 0.2], b = 0
(b) The perceptron undergoes one more iteration of training, with learning rate η = 1, with
the training token x = [0.1, 0.3, 0.9], y = 1. After this second iteration of training, what
are w and b?
Solution:
w = [0.9, 0.9, 0.2], b = 0
Question 4
A particular perceptron is initialized with the weights w = [1, 1, 1] and the bias b = 1.
(a) The perceptron undergoes one iteration of training, with learning rate η = 1, with the
training token x = [0.5, 0.9, 0.6], y = 1. After this one iteration of training, what are w
and b?
Solution:
w = [1, 1, 1], b=1
(b) The perceptron undergoes one more iteration of training, with learning rate η = 1, with
the training token x = [0.4, 0.9, 0.3], y = −1. After this second iteration of training, what
are w and b?
Solution:
w = [0.6, 0.1, 0.7], b=0
Question 5
Gradient descent is guaranteed to find a set of model parameters that has the smallest possible
loss function on the training corpus.
True / √ False
Explain:
Solution: Gradient descent finds a local optimum of the loss function, but it is not guaranteed to find a global optimum.
Question 6
Cross-entropy is
$$L = -\frac{1}{n}\sum_{i=1}^{n} \ln P(Y = y_i \mid x_i)$$
Suppose you have n = 2 training samples, x1 = [0.2, 0.6]T , y1 = 1, x2 = [−1.2, 0.3]T , y2 = 0.
Suppose P (Y = 1|x) is defined as P (Y = 1|x) = σ(wT x), where σ(·) is the logistic sigmoid
function. This computation has already been performed, therefore you already know that
P (Y = 1|x1 ) = 0.2, P (Y = 1|x2 ) = 0.9. Find the gradient of L with respect to the vector w.
Solution:
$$\nabla_w L = -\frac{1}{2}\left[(0.8)(0.2) - (0.9)(-1.2),\; (0.8)(0.6) - (0.9)(0.3)\right]^T$$
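A short sketch of the same computation, assuming the usual gradient of binary cross-entropy with a sigmoid output, ∇w L = −(1/n) Σ_i (y_i − P(Y=1|x_i)) x_i, and reusing the probabilities already given in the problem statement:

```python
# Sketch: gradient of cross-entropy with a sigmoid output,
# grad_w L = -(1/n) * sum_i (y_i - P(Y=1|x_i)) * x_i,
# reusing the probabilities P(Y=1|x_i) given in the problem statement.
xs = [[0.2, 0.6], [-1.2, 0.3]]
ys = [1, 0]
ps = [0.2, 0.9]              # P(Y = 1 | x_i), already computed
n = len(xs)

grad = [0.0, 0.0]
for x, y, p in zip(xs, ys, ps):
    for d in range(len(grad)):
        grad[d] -= (1.0 / n) * (y - p) * x[d]

print(grad)   # matches -(1/2)[(0.8)(0.2) - (0.9)(-1.2), (0.8)(0.6) - (0.9)(0.3)]
```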
Question 7
A particular neural net has scalar input xi , and scalar training target yi , where i is the training
token number. There is just one hidden node, with activation hi = ReLU(xi + b). The output
is ŷi = ReLU(whi + c), and the loss is
$$L = \frac{1}{2n}\sum_i (\hat{y}_i - y_i)^2$$
Suppose the coefficients are initialized to w = 1, b = c = 0, and then the loss is computed with
respect to the following n=4 training tokens: x1 = −2, y1 = 4, x2 = −1, y2 = 1, x3 = 1, y3 =
1, x4 = 2, y4 = 4. What is dL/db?
Solution:
$$\frac{dL}{db} = \sum_i \frac{dL}{d\hat{y}_i}\,\frac{d\hat{y}_i}{dh_i}\,\frac{dh_i}{db}$$
dh_i/db = 0 for i = 1 and i = 2, because x_i + b < 0 for those tokens. dL/dŷ_i = (1/4)(ŷ_i − y_i), which is zero for i = 3. Therefore, the only nonzero term is for i = 4, where we have
$$\frac{dL}{db} = \frac{1}{8}(2)(2 - 4)\left(\left.\frac{d\,\mathrm{ReLU}(w h_i)}{d h_i}\right|_{w=1,\,h_4=2}\right)\left(\left.\frac{d\,\mathrm{ReLU}(x_i + b)}{d b}\right|_{x_4=2,\,b=0}\right) = -\frac{1}{2}$$
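The chain-rule bookkeeping above can be checked with a small forward/backward sketch (an illustration, not part of the original solution), using w = 1 and b = c = 0:

```python
# Sketch: forward pass plus manual chain rule for dL/db, with w=1, b=c=0
# as in the question. Tokens where ReLU is inactive contribute zero.
def relu(z):
    return max(z, 0.0)

def drelu(z):                 # subgradient convention: ReLU'(z) = 1 for z > 0, else 0
    return 1.0 if z > 0 else 0.0

w, b, c = 1.0, 0.0, 0.0
data = [(-2, 4), (-1, 1), (1, 1), (2, 4)]
n = len(data)

dL_db = 0.0
for x, y in data:
    h = relu(x + b)
    yhat = relu(w * h + c)
    # dL/dyhat * dyhat/dh * dh/db for this token
    dL_db += (1.0 / n) * (yhat - y) * drelu(w * h + c) * w * drelu(x + b)

print(dL_db)   # -0.5, matching the -1/2 above
```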
Question 8
Suppose
$$h_i = \frac{\exp(e_i)}{\sqrt{\sum_j \exp(2e_j)}}$$
Find d ln(h_1)/de_2 (you may express your answer in terms of h_2).
Solution: Let's define NUM_1 = exp(e_1), NUM_2 = exp(e_2), and DEN = sqrt(Σ_j exp(2e_j)), so that h_1 = NUM_1/DEN. Then
$$
\begin{aligned}
\frac{d\ln(h_1)}{de_2} &= \frac{1}{h_1}\frac{dh_1}{de_2} && (1)\\
&= \frac{1}{h_1}\cdot\frac{-NUM_1}{DEN^2}\cdot\frac{dDEN}{de_2} && (2)\\
&= \frac{DEN}{NUM_1}\cdot\frac{-NUM_1}{DEN^2}\cdot 0.5\Big(\sum_j \exp(2e_j)\Big)^{-0.5}\, 2\exp(2e_2) && (3)\\
&= -\frac{1}{DEN}\cdot\frac{1}{DEN}\cdot 0.5\cdot 2\, NUM_2^2 && (4)\\
&= -(h_2)^2 && (5)
\end{aligned}
$$
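A quick numerical check of the identity d ln(h_1)/de_2 = −(h_2)² via a central finite difference; the test vector e below is an arbitrary choice for illustration:

```python
# Sketch: central finite-difference check of d ln(h_1)/de_2 = -(h_2)^2.
# The vector e below is an arbitrary test point chosen for illustration.
import math

def h(e, i):
    den = math.sqrt(sum(math.exp(2 * ej) for ej in e))
    return math.exp(e[i]) / den

e = [0.3, -0.5, 1.2]
eps = 1e-6

e_plus  = [e[0], e[1] + eps, e[2]]
e_minus = [e[0], e[1] - eps, e[2]]
numeric = (math.log(h(e_plus, 0)) - math.log(h(e_minus, 0))) / (2 * eps)

print(numeric, -h(e, 1) ** 2)   # the two numbers should agree to ~6 decimals
```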
Question 9
A, B, and C are binary random variables, whose dependencies are shown in the Bayes net below (edges A → B, A → C, and B → C):
Variable A has a probability P (A = 1) = 0.3. Conditional probabilities of the other two variables
are given in the table below:
a P (B = 1|A = a) P (C = 1|A = a, B = 0) P (C = 1|A = a, B = 1)
0 0.2 0.4 0.9
1 0.8 0.3 0.7
(a) What is P (A = 1, B = 0, C = 1)? Leave your answer in the form of a product of real
numbers; do not simplify.
Solution: (0.3)(0.2)(0.3)
(b) What is P (A = 1|C = 1)? Leave your answer in the form of a ratio of sums of products; do
not simplify.
Solution:
$$P(A=1 \mid C=1) = \frac{(0.3)(0.2)(0.3) + (0.3)(0.8)(0.7)}{(0.3)(0.2)(0.3) + (0.3)(0.8)(0.7) + (0.7)(0.8)(0.4) + (0.7)(0.2)(0.9)}$$
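Questions 9-11 can all be checked by enumeration. The sketch below assumes the structure implied by the CPTs (A is a parent of B; A and B are parents of C) and uses the numbers from Question 9:

```python
# Sketch: inference by enumeration for Question 9, assuming the structure
# implied by the CPTs (A is a parent of B; A and B are parents of C).
P_A1 = 0.3
P_B1_given_A = {0: 0.2, 1: 0.8}                      # P(B=1 | A=a)
P_C1_given_AB = {(0, 0): 0.4, (0, 1): 0.9,
                 (1, 0): 0.3, (1, 1): 0.7}           # P(C=1 | A=a, B=b)

def joint(a, b, c):
    pa = P_A1 if a == 1 else 1 - P_A1
    pb = P_B1_given_A[a] if b == 1 else 1 - P_B1_given_A[a]
    pc = P_C1_given_AB[(a, b)] if c == 1 else 1 - P_C1_given_AB[(a, b)]
    return pa * pb * pc

print(joint(1, 0, 1))                                # part (a): (0.3)(0.2)(0.3)

num = sum(joint(1, b, 1) for b in (0, 1))
den = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
print(num / den)                                     # part (b): P(A=1 | C=1)
```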
Question 10
A, B, and C are binary random variables, whose dependencies are shown in the Bayes net below (edges A → B, A → C, and B → C):
Variable A has a probability P (A = 1) = 0.8. Conditional probabilities of the other two variables
are given in the table below:
a P (B = 1|A = a) P (C = 1|A = a, B = 0) P (C = 1|A = a, B = 1)
0 0.8 0.3 0.1
1 0.4 0.9 0.9
(a) What is P (A = 0, B = 1, C = 0)? Leave your answer in the form of a product of real
numbers; do not simplify.
Solution: (0.2)(0.8)(0.9)
(b) What is P (A = 0|C = 0)? Leave your answer in the form of a ratio of sums of products; do
not simplify.
Solution:
$$P(A=0 \mid C=0) = \frac{(0.2)(0.2)(0.7) + (0.2)(0.8)(0.9)}{(0.2)(0.2)(0.7) + (0.2)(0.8)(0.9) + (0.8)(0.6)(0.1) + (0.8)(0.4)(0.1)}$$
Question 11
A, B, and C are binary random variables, whose dependencies are shown in the Bayes net below (edges A → B, A → C, and B → C):
Variable A has a probability P (A = 1) = 0.1. Conditional probabilities of the other two variables
are given in the table below:
a P (B = 1|A = a) P (C = 1|A = a, B = 0) P (C = 1|A = a, B = 1)
0 0.8 0.1 0.2
1 0.1 0.6 0.4
(a) What is P (A = 1, B = 1, C = 1)? Leave your answer in the form of a product of real
numbers; do not simplify.
Solution: (0.1)(0.1)(0.4)
(b) What is P (A = 1|C = 1)? Leave your answer in the form of a ratio of sums of products; do
not simplify.
Solution:
$$P(A=1 \mid C=1) = \frac{(0.1)(0.9)(0.6) + (0.1)(0.1)(0.4)}{(0.1)(0.9)(0.6) + (0.1)(0.1)(0.4) + (0.9)(0.2)(0.1) + (0.9)(0.8)(0.2)}$$
Question 12
We have a bag of three biased coins, a, b, and c, with probabilities of coming up heads of 20%,
60%, and 80%, respectively. One coin is drawn randomly from the bag (with equal likelihood
of drawing each of the three coins), and then the coin is flipped three times to generate the
outcomes X1, X2, and X3.
(a) Draw the Bayesian network corresponding to this setup and define the necessary condi-
tional probability tables (CPTs).
Solution: You need an intermediate variable, C ∈ {a, b, c}, to specify which coin is drawn; then the graph has C as the parent of X1, X2, and X3. The CPTs are P(C = a) = P(C = b) = P(C = c) = 1/3, and, for each flip i, P(Xi = heads | C = a) = 0.2, P(Xi = heads | C = b) = 0.6, P(Xi = heads | C = c) = 0.8.
(b) Calculate which coin was most likely to have been drawn from the bag if the observed flips
come out heads twice and tails once.
Solution: The prior over coins is uniform, so we compare the likelihoods of two heads and one tail: P(data | a) ∝ (0.2)²(0.8) = 0.032, P(data | b) ∝ (0.6)²(0.4) = 0.144, P(data | c) ∝ (0.8)²(0.2) = 0.128. Coin b is the most likely to have been drawn.
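A sketch of the part (b) computation, assuming a uniform prior over the three coins and that only the number of heads matters (the combinatorial factor cancels in the normalization):

```python
# Sketch for part (b): posterior over the coin given two heads and one tail,
# with a uniform prior over {a, b, c}.
priors = {"a": 1/3, "b": 1/3, "c": 1/3}
p_heads = {"a": 0.2, "b": 0.6, "c": 0.8}

unnorm = {coin: priors[coin] * p_heads[coin] ** 2 * (1 - p_heads[coin])
          for coin in priors}
total = sum(unnorm.values())
posterior = {coin: v / total for coin, v in unnorm.items()}

print(posterior)        # coin b has the largest posterior probability
```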
Question 13
Consider the following Bayes network (all variables are binary):
P(A) = 0.4, P(B) = 0.1. A and B are both parents of C, with

A, B P(C|A, B)
False,False 0.7
False,True 0.7
True,False 0.1
True,True 0.9
(a) What is P (C)? Write your answer in numerical form, but you don’t need to simplify.
Solution:
P(C) = (0.6)(0.9)(0.7) + (0.6)(0.1)(0.7) + (0.4)(0.9)(0.1) + (0.4)(0.1)(0.9)
(b) What is P (A|B = T rue, C = T rue)? Write your answer in numerical form, but you don’t
need to simplify.
Solution:
$$P(A \mid B, C) = \frac{P(A, B, C)}{P(A, B, C) + P(\lnot A, B, C)} = \frac{(0.4)(0.1)(0.9)}{(0.4)(0.1)(0.9) + (0.6)(0.1)(0.7)}$$
(c) You’ve been asked to re-estimate the parameters of the network based on the following
observations:
Observation A B C
1 True False False
2 False False True
3 True True False
4 False False False
Given the data in the table, what are the maximum likelihood estimates of the model
parameters? If there is a model parameter that cannot be estimated from these data,
mark it “UNKNOWN.”
(d) Use the table of data given in part (c), but this time, estimate the data using Laplace
smoothing, with a smoothing parameter of k = 1.
Question 14
Consider the following Bayes network (all variables are binary):
P(C) = 0.1. C is the parent of both A and B, with

C P(A|C) P(B|C)
False 0.8 0.7
True 0.4 0.7
(a) What is P (A)? Write your answer in numerical form, but you don’t need to simplify.
Solution:
P(A) = (0.9)(0.8) + (0.1)(0.4)
(b) What is P (C|A = T rue, B = T rue)? Write your answer in numerical form, but you don’t
need to simplify.
Solution:
$$P(C \mid A, B) = \frac{P(A, B, C)}{P(A, B, C) + P(A, B, \lnot C)} = \frac{(0.1)(0.4)(0.7)}{(0.1)(0.4)(0.7) + (0.9)(0.8)(0.7)}$$
(c) You’ve been asked to re-estimate the parameters of the network based on the following
observations:
Observation A B C
1 False True False
2 True True False
3 False False True
4 False False True
Given the data in the table, what are the maximum likelihood estimates of the model
parameters? If there is a model parameter that cannot be estimated from these data,
mark it “UNKNOWN.”
(d) Use the table of data given in part (c), but this time, estimate the data using Laplace
smoothing, with a smoothing parameter of k = 1.
Question 15
Consider the following Bayes network (all variables are binary):
The network is the chain A → B → C, with P(A) = 0.8 and

A P(B|A)
False 0.7
True 0.3

B P(C|B)
False 0.5
True 0.7
(a) What is P (C)? Write your answer in numerical form, but you don’t need to simplify.
Solution:
P(C) = (0.7)[(0.8)(0.3) + (0.2)(0.7)] + (0.5)[(0.8)(0.7) + (0.2)(0.3)]
(b) What is P (A|B = T rue, C = T rue)? Write your answer in numerical form, but you don’t
need to simplify.
Solution:
$$P(A \mid B, C) = \frac{P(A, B, C)}{P(A, B, C) + P(\lnot A, B, C)} = \frac{(0.8)(0.3)(0.7)}{(0.8)(0.3)(0.7) + (0.2)(0.7)(0.7)}$$
(c) You’ve been asked to re-estimate the parameters of the network based on the following
observations:
Observation A B C
1 True False False
2 False False True
3 True True False
4 False False False
Given the data in the table, what are the maximum likelihood estimates of the model
parameters? If there is a model parameter that cannot be estimated from these data,
mark it “UNKNOWN.”
Solution:
P (A) = 2/4
A P (B|A)
False 0/2
True 1/2
B P (C|B)
False 1/3
True 0/1
(d) Use the table of data given in part (c), but this time, estimate the data using Laplace
smoothing, with a smoothing parameter of k = 1.
Solution:
P(A) = 3/6
A P (B|A)
False 1/4
True 2/4
B P (C|B)
False 2/5
True 1/3
Question 16
Consider the following Bayes network (all variables are binary):
P(A) = 0.4. A is the parent of B; A and B are both parents of C. The CPTs are

A P(B|A)
False 0.1
True 0.2

A, B P(C|A, B)
False,False 0.9
False,True 0.3
True,False 0.7
True,True 0.5
(a) What is P (C)? Write your answer in numerical form, but you don’t need to simplify.
Solution:
P(C) = (0.6)(0.9)(0.9) + (0.6)(0.1)(0.3) + (0.4)(0.8)(0.7) + (0.4)(0.2)(0.5)
(b) What is P (A|B = T rue, C = T rue)? Write your answer in numerical form, but you don’t
need to simplify.
Solution:
$$P(A \mid B, C) = \frac{P(A, B, C)}{P(A, B, C) + P(\lnot A, B, C)} = \frac{(0.4)(0.2)(0.5)}{(0.4)(0.2)(0.5) + (0.6)(0.1)(0.3)}$$
(c) You’ve been asked to re-estimate the parameters of the network based on the following
observations:
Observation A B C
1 True True False
2 False True True
3 False True False
4 False False True
Given the data in the table, what are the maximum likelihood estimates of the model
parameters? If there is a model parameter that cannot be estimated from these data,
mark it “UNKNOWN.”
Solution:
P(A) = 1/4

A P(B|A)
False 2/3
True 1/1

A, B P(C|A, B)
False,False 1/1
False,True 1/2
True,False UNKNOWN
True,True 0/1
(d) Use the table of data given in part (c), but this time, estimate the data using Laplace
smoothing, with a smoothing parameter of k = 1.
Solution:
P (A) = 2/6
A P (B|A)
False 3/5
True 2/3
A, B P (C|A, B)
False,False 2/3
False,True 2/4
True,False 1/2
True,True 1/3
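The counting in parts (c) and (d) can be mechanized. The sketch below (an illustration, not part of the original solution) estimates P(B = True | A = a) from the four observations of Question 16, with Laplace smoothing as an option; the same pattern applies to the other CPT entries and to Questions 13-15.

```python
# Sketch: count-based estimation of P(B = True | A = a) from the four
# observations in Question 16(c), with optional Laplace smoothing (k = 1 for
# part (d)). Since B is binary, smoothing adds k to the count and 2k to the total.
obs = [  # (A, B, C)
    (True,  True,  False),
    (False, True,  True),
    (False, True,  False),
    (False, False, True),
]

def estimate_B_given_A(a_value, k=0):
    rows = [row for row in obs if row[0] == a_value]
    count_b = sum(1 for row in rows if row[1])
    total = len(rows)
    if total + 2 * k == 0:
        return None            # no data and no smoothing: "UNKNOWN"
    return (count_b + k) / (total + 2 * k)

print(estimate_B_given_A(False))         # 2/3   (maximum likelihood)
print(estimate_B_given_A(True))          # 1/1   (maximum likelihood)
print(estimate_B_given_A(False, k=1))    # 3/5   (Laplace, k = 1)
print(estimate_B_given_A(True,  k=1))    # 2/3   (Laplace, k = 1)
```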
Question 17
Maria likes ducks and geese. She notices that when she leaves the heat lamp on (in her back
yard), she is likely to see ducks and geese. When the heat lamp is off, she sees ducks and geese
in the summer, but not in the winter.
(a) The following Bayes net summarizes Maria’s model, where the binary variables D,G,L,
and S denote the presence of ducks, geese, heat lamp, and summer, respectively:
[Bayes net: L and S are both parents of D and of G.]
On eight randomly selected days throughout the year, Maria makes the observations shown
in the following table:
day D G L S day D G L S
1 0 1 1 0 5 1 0 0 1
2 1 0 1 0 6 1 0 1 1
3 0 0 0 0 7 0 1 1 1
4 0 0 0 0 8 0 1 0 1
Write the maximum-likelihood conditional probability tables for D, G, L and S.
Solution: We have that P (S) = 0.5, P (L) = 0.5, and
S L P (D|S, L) P (G|S, L)
0 0 0 0
0 1 0.5 0.5
1 0 0.5 0.5
1 1 0.5 0.5
(b) Maria speculates that ducks and geese don’t really care whether the lamp is lit or not,
they only care whether or not the temperature in her yard is warm. She defines a binary
random variable, W , which is 1 when her back yard is warm, and she proposes the following
revised Bayes net:
[Revised Bayes net: L and S are the parents of W; W is the parent of D and G.]
She forgot to measure the temperature in her back yard, so W is a hidden variable.
Her initial guess is that P(D|W) = 2/3, P(D|¬W) = 1/3, P(G|W) = 2/3, P(G|¬W) = 1/3, P(W|L ∧ S) = 2/3, P(W|¬(L ∧ S)) = 1/3. Find the posterior probability P(W|day) for each of the 8 days, day ∈ {1, . . . , 8}, whose observations are shown in the table of part (a) above.
Solution:
day       1    2    3    4    5    6    7    8
P(W|day)  1/3  1/3  1/9  1/9  1/3  2/3  2/3  1/3
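The posteriors in part (b) follow from Bayes' rule, with the prior on W determined by whether L ∧ S holds. A sketch (assuming D and G are conditionally independent given W, as the revised network states):

```python
# Sketch: the posterior P(W | D, G, L, S) for one day of Question 17(b),
# using Maria's initial parameter guesses.
def p_w_given_day(D, G, L, S):
    prior_w = 2/3 if (L and S) else 1/3
    def lik(w):
        pd = 2/3 if w else 1/3          # P(D=1 | W=w)
        pg = 2/3 if w else 1/3          # P(G=1 | W=w)
        return (pd if D else 1 - pd) * (pg if G else 1 - pg)
    num = prior_w * lik(True)
    den = num + (1 - prior_w) * lik(False)
    return num / den

print(p_w_given_day(0, 1, 1, 0))   # day 1: 1/3
print(p_w_given_day(0, 0, 0, 0))   # day 3: 1/9
print(p_w_given_day(1, 0, 1, 1))   # day 6: 2/3
```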
Question 18
Suppose you have a Bayes net with two binary variables, Jahangir (J) and Shahjahan (S):
J → S
This network has three trainable parameters: P (J) = a, P (S|J) = b, and P (S|¬J) = c.
Suppose you have a training dataset in which S is observed, but J is hidden. Specifically, there
are N training tokens for which S = True, and M training tokens for which S = False. Given
current estimates of a, b, and c, you want to use the EM algorithm to find improved estimates
â, b̂, and ĉ.
(a) Find the following expected counts, in terms of M , N , a, b, and c:
E[# times J True] =
E[# times J and S True] =
E[# times J True and S False] =
Solution:
$$E[\#\text{ times J True}] = \frac{abN}{ab + (1-a)c} + \frac{a(1-b)M}{a(1-b) + (1-a)(1-c)}$$
$$E[\#\text{ times J and S True}] = \frac{abN}{ab + (1-a)c}$$
$$E[\#\text{ times J True and S False}] = \frac{a(1-b)M}{a(1-b) + (1-a)(1-c)}$$
(b) Find re-estimated values â, b̂, and ĉ in terms of M , N , E[# times J True], E[# times J and S True],
and E[# times J True and S False].
Solution:
$$\hat{a} = \frac{E[\#\text{ times J True}]}{M + N}$$
$$\hat{b} = \frac{E[\#\text{ times J and S True}]}{E[\#\text{ times J True}]}$$
$$\hat{c} = \frac{E[\#\text{ times J False and S True}]}{M + N - E[\#\text{ times J True}]} = \frac{N - E[\#\text{ times J and S True}]}{M + N - E[\#\text{ times J True}]}$$
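One full E-step/M-step iteration, written out as a sketch; the starting parameter values and the counts N and M below are placeholders chosen only for illustration.

```python
# Sketch: one EM iteration for the J -> S network of Question 18.
a, b, c = 0.5, 0.6, 0.2        # current P(J), P(S|J), P(S|not J)  (placeholders)
N, M = 30, 70                  # number of tokens with S=True and S=False (placeholders)

# E-step: posterior of J for each kind of token, then expected counts.
p_j_given_s_true  = a * b / (a * b + (1 - a) * c)
p_j_given_s_false = a * (1 - b) / (a * (1 - b) + (1 - a) * (1 - c))
e_j_true       = N * p_j_given_s_true + M * p_j_given_s_false
e_j_and_s_true = N * p_j_given_s_true

# M-step: re-estimate the parameters from the expected counts.
a_hat = e_j_true / (M + N)
b_hat = e_j_and_s_true / e_j_true
c_hat = (N - e_j_and_s_true) / (M + N - e_j_true)

print(a_hat, b_hat, c_hat)
```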
Question 19
There is a lion in a cage in the dungeons under Castle Rock.
Solution:
[Bayes net over the variables Z, F, and B]
(b) (3 points) Circe pets the lion, and it bites her hand. In terms of the unknown parameters
P, Q, R, and S, what is the probability that the zookeeper is on vacation?
Solution:
$$P(Z \mid B) = \frac{P(Z, F, B) + P(Z, \lnot F, B)}{P(Z, F, B) + P(Z, \lnot F, B) + P(\lnot Z, F, B) + P(\lnot Z, \lnot F, B)} = \frac{PR}{PR + (1-P)QS + (1-P)(1-Q)R}$$
(c) (2 points) Lord Lucky, the Lord of Castle Rock, hires a troupe of circus performers to pet
the lion, once per day, in an attempt to learn the parameters P , Q, R, and S. Over the
course of seven days, he collects the following observations. Based on these observations,
estimate P , Q, R, and S.
Day Z F B
1 1 0 1
2 0 1 1
3 0 1 0
4 1 0 0
5 0 0 1
6 0 1 0
7 0 1 0
Question 20
The University of Illinois Vaccavolatology Department has four professors, named Aya, Bob,
Cho, and Dale. The building has only one key, so we take special care to protect it. Every day
Aya goes to the gym, and on the days she has the key, 60% of the time she forgets it next to
the bench press. When that happens, one of the other three professors, equally likely, always finds it
since they work out right after. Bob likes to hang out at Einstein Bagels and 50% of the time
he is there with the key, he forgets the key at the shop. Luckily Cho always shows up there
and finds the key whenever Bob forgets it. Cho has a hole in her pocket and ends up losing
the key 80% of the time somewhere on Goodwin street. However, Dale takes the same path
to campus and always finds the key. Dale has a 10% chance to lose the key somewhere in the
Vaccavolatology classroom, but then Cho picks it up. The professors lose the key at most once
per day, around noon (after losing it they become extra careful for the rest of the day), and
they always find it the same day in the early afternoon.
(a) Let Xt = the first letter of the name of the person who has the key (Xt ∈ {A, B, C, D}).
Find the Markov transition probabilities P (Xt |Xt−1 ).
Solution:

       Xt
Xt−1   A    B    C    D
A      0.4  0.2  0.2  0.2
B      0    0.5  0.5  0
C      0    0    0.2  0.8
D      0    0    0.1  0.9
(b) Sunday night Bob had the key (the initial state distribution assigns probability 1 to X0 = B
and probability 0 to all other states). The first lecture of the week is Tuesday at 4:30pm, so
one of the professors needs to open the building at that time. What is the probability for
each professor to have the key at that time? Let X0 , XM on and XT ue be random variables
corresponding to who has the key Sunday, Monday, and Tuesday evenings, respectively.
Fill in the probabilities in the table below.
Professor  P(X0)  P(XMon)  P(XTue)
A          0      0        0
B          1      0.5      0.25
C          0      0.5      0.35
D          0      0        0.4
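Part (b) is two steps of the Markov chain from part (a); a sketch of the propagation:

```python
# Sketch for part (b): push the key's distribution forward two days using the
# transition matrix from part (a).
T = {  # T[prev][nxt] = P(X_t = nxt | X_{t-1} = prev)
    "A": {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.2},
    "B": {"A": 0.0, "B": 0.5, "C": 0.5, "D": 0.0},
    "C": {"A": 0.0, "B": 0.0, "C": 0.2, "D": 0.8},
    "D": {"A": 0.0, "B": 0.0, "C": 0.1, "D": 0.9},
}

def step(dist):
    return {nxt: sum(dist[prev] * T[prev][nxt] for prev in T) for nxt in "ABCD"}

x_sun = {"A": 0.0, "B": 1.0, "C": 0.0, "D": 0.0}
x_mon = step(x_sun)
x_tue = step(x_mon)
print(x_mon)   # A: 0, B: 0.5, C: 0.5, D: 0
print(x_tue)   # A: 0, B: 0.25, C: 0.35, D: 0.4
```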
Question 21
A particular hidden Markov model (HMM) has state variable Yt , and observation variables Xt ,
where t denotes time. Suppose that this HMM has two states, Yt ∈ {0, 1}, and three possible
observations, Xt ∈ {0, 1, 2}. The initial state probability is P (Y1 = 1) = 0.3. The transition
and observation probability matrices are
{X1 , X2 } = {2, 1}
Solution:
(b) What is the probability of the most likely state sequence ending in Y2 = 0? In other words,
what is maxY1 P (Y1 , X1 = 2, Y2 = 0, X2 = 1)?
Solution:
$$\begin{aligned}
\max_{Y_1} P(Y_1, X_1 = 2, Y_2 = 0, X_2 = 1) &= \max_{Y_1} P(Y_1)\,P(X_1 = 2 \mid Y_1)\,P(Y_2 = 0 \mid Y_1)\,P(X_2 = 1 \mid Y_2 = 0)\\
&= \max\big((0.7)(0.5)(0.4)(0.1),\ (0.3)(0.3)(0.6)(0.1)\big)\\
&= (0.7)(0.5)(0.4)(0.1)
\end{aligned}$$
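A sketch of the part (b) maximization, hard-coding only the probabilities that appear in the worked solution above (the full transition and observation tables are not reproduced in this document):

```python
# Sketch for part (b): maximize over Y1 directly, using only the entries that
# appear in the worked solution.
p_y1 = {0: 0.7, 1: 0.3}                  # P(Y1 = y)
p_x1_is_2_given_y1 = {0: 0.5, 1: 0.3}    # P(X1 = 2 | Y1 = y)
p_y2_is_0_given_y1 = {0: 0.4, 1: 0.6}    # P(Y2 = 0 | Y1 = y)
p_x2_is_1_given_y2_0 = 0.1               # P(X2 = 1 | Y2 = 0)

scores = {y1: p_y1[y1] * p_x1_is_2_given_y1[y1]
              * p_y2_is_0_given_y1[y1] * p_x2_is_1_given_y2_0
          for y1 in (0, 1)}
print(scores, max(scores.values()))      # best is Y1 = 0: (0.7)(0.5)(0.4)(0.1)
```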
Question 22
A particular HMM has a binary state variable Yt ∈ {0, 1}, and a binary observation variable
Xt ∈ {0, 1}. Suppose the HMM starts at time t = 1 with initial probability P (Y1 = 1) = 0.5.
The transition probabilities and observation probabilities are given in the following table:
P (Yt+1 = 1|Yt ) P (Xt = 1|Yt )
Yt = 0 0.8 0.5
Yt = 1 0.6 0.3
What is P (Y1 = 1, X1 = 0, X2 = 1)? Express your answer as a sum of products; do not simplify.
Solution:
P (Y1 = 1, X1 = 0, X2 = 1) = P (Y1 = 1)P (X1 = 0|Y1 = 1)P (Y2 = 0|Y1 = 1)P (X2 = 1|Y2 = 0)
+ P (Y1 = 1)P (X1 = 0|Y1 = 1)P (Y2 = 1|Y1 = 1)P (X2 = 1|Y2 = 1)
= (0.5)(0.7)(0.4)(0.5) + (0.5)(0.7)(0.6)(0.3)
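The same answer, obtained by summing over the hidden state Y2 in a short sketch; Questions 23 and 24 follow the identical pattern with their own tables.

```python
# Sketch: sum over the hidden state Y2 to get P(Y1=1, X1=0, X2=1) from the
# Question 22 tables.
p_y1_1 = 0.5
p_trans = {0: 0.8, 1: 0.6}   # P(Y_{t+1} = 1 | Y_t = y)
p_obs   = {0: 0.5, 1: 0.3}   # P(X_t = 1 | Y_t = y)

def p_x(y, x):               # P(X_t = x | Y_t = y)
    return p_obs[y] if x == 1 else 1 - p_obs[y]

def p_next(y_prev, y_next):  # P(Y_{t+1} = y_next | Y_t = y_prev)
    return p_trans[y_prev] if y_next == 1 else 1 - p_trans[y_prev]

total = sum(p_y1_1 * p_x(1, 0) * p_next(1, y2) * p_x(y2, 1) for y2 in (0, 1))
print(total)   # (0.5)(0.7)(0.4)(0.5) + (0.5)(0.7)(0.6)(0.3) = 0.133
```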
Question 23
A particular HMM has a binary state variable Xt ∈ {0, 1}, and a binary observation variable
Et ∈ {0, 1}. Suppose the HMM starts at time t = 1 with initial probability P (X1 = 1) = 0.5.
The transition probabilities and observation probabilities are given in the following table:
        P(Xt+1 = 1|Xt)  P(Et = 1|Xt)
Xt = 0       0.2            0.4
Xt = 1       0.8            0.3
What is P (X1 = 0, E1 = 0, E2 = 0)? Express your answer as a sum of products; do not simplify.
Solution:
(0.5)(0.6)(0.8)(0.6) + (0.5)(0.6)(0.2)(0.7)
Question 24
A particular HMM has a binary state variable Xt ∈ {0, 1}, and a binary observation variable
Et ∈ {0, 1}. Suppose the HMM starts at time t = 1 with initial probability P (X1 = 1) = 0.2.
The transition probabilities and observation probabilities are given in the following table:
        P(Xt+1 = 1|Xt)  P(Et = 1|Xt)
Xt = 0       0.3            0.8
Xt = 1       0.3            0.9
What is P (X1 = 0, E1 = 0, E2 = 0)? Express your answer as a sum of products; do not simplify.
Solution:
(0.8)(0.2)(0.7)(0.2) + (0.8)(0.2)(0.3)(0.1)