Birla Institute of Technology & Science - Pilani, Hyderabad Campus
First Semester 2023-24
CS F320 – Foundations of Data Science
Mid Semester Examination
Type: Closed Time: 90 mins Max Marks: 60 Date: 13.10.2023
All parts of the same question should be answered together.
1. Consider samples x1, ..., xn from a Gaussian random variable with known variance σ² and unknown
mean μ. We further assume a Gaussian prior over the mean: μ follows a normal distribution N(m, s²)
with fixed mean m and fixed variance s². Thus the only unknown is μ.
a. The MAP estimate is defined as
μ' = argmax_μ log p(μ | D)
Calculate the MAP estimate, μ'.
Sol:
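A sketch of the derivation, writing \mu for the unknown mean, \sigma^2 for the known variance, and assuming the samples are independent. Terms that do not depend on \mu are absorbed into the constant:

```latex
\begin{align*}
\log p(\mu \mid D) &= \log p(D \mid \mu) + \log p(\mu) + \text{const} \\
&= -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{1}{2s^2}(\mu-m)^2 + \text{const}, \\
\frac{\partial}{\partial\mu}\log p(\mu \mid D) &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) - \frac{1}{s^2}(\mu-m) = 0 \\
\Rightarrow\quad \mu' &= \frac{s^2\sum_{i=1}^{n}x_i + \sigma^2 m}{n s^2 + \sigma^2}.
\end{align*}
```

The log posterior is a concave quadratic in \mu, so the stationary point is the maximum.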
b. Show that as the number of samples n increases, the MAP estimate converges to the maximum
likelihood estimate.
Sol:
c. Suppose n is small and fixed. What does the MAP estimator converge to if we increase the prior
variance s²?
Sol:
d. Suppose n is small and fixed. What does the MAP estimator converge to if we decrease the prior
variance s²? [18 Marks]
Sol:
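The limiting behaviour asked about in parts b to d can be checked numerically. The sketch below assumes the standard closed form for this Gaussian-Gaussian model, MAP = (s² Σxi + σ² m) / (n s² + σ²). The function names and the sample values are illustrative only and are not part of the question.

```python
def map_estimate(xs, sigma2, m, s2):
    """MAP estimate of a Gaussian mean with known variance sigma2
    and a Gaussian prior N(m, s2) on the mean (assumed closed form)."""
    n = len(xs)
    return (s2 * sum(xs) + sigma2 * m) / (n * s2 + sigma2)


def mle_estimate(xs):
    """Maximum likelihood estimate of the mean: the sample mean."""
    return sum(xs) / len(xs)


xs = [2.0, 4.0, 6.0]      # illustrative samples, sample mean = 4
sigma2, m = 1.0, 0.0      # illustrative known variance and prior mean

# (b) more data: repeating the samples keeps the sample mean at 4
many = xs * 10000
print(map_estimate(many, sigma2, m, s2=1.0))   # close to the MLE, 4

# (c) n fixed, prior variance large: the prior is nearly flat
print(map_estimate(xs, sigma2, m, s2=1e9))     # close to the MLE, 4

# (d) n fixed, prior variance small: the prior dominates
print(map_estimate(xs, sigma2, m, s2=1e-9))    # close to the prior mean, 0
```

Under these assumptions the estimate approaches the sample mean in cases (b) and (c) and the prior mean m in case (d).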
2. Suppose y1, y2, ..., yn are drawn from a univariate normal distribution. Find the maximum
likelihood estimate of the variance of the distribution. [6 Marks]
Sol:
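A sketch of the derivation, assuming the samples are independent and identically distributed and that the mean \mu is also unknown (if \mu is known, it simply replaces \hat{\mu} in the last line):

```latex
\begin{align*}
\ell(\mu,\sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2, \\
\frac{\partial \ell}{\partial \mu} = 0 &\;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i, \\
\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i-\mu)^2 = 0 &\;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{\mu})^2.
\end{align*}
```

The estimate divides by n, not n - 1, so it is the biased sample variance.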
3. a. What is the significance of having a validation data set in addition to the training and test data
sets? Support your answer with a case study. [3 Marks]
Sol:
3.b. Suppose there are two features for a regression problem and you are asked to fit a polynomial of
degree four to this problem. Find the total number of weight parameters for this problem, with a
suitable explanation. [3 Marks]
Sol:
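One way to check the count is to enumerate every monomial x1^a x2^b with a + b ≤ 4, including the constant (bias) term. The sketch assumes a full polynomial with all cross terms; the helper name monomial_exponents is illustrative.

```python
from itertools import product
from math import comb


def monomial_exponents(num_features, degree):
    """All exponent tuples whose total degree is at most `degree`."""
    return [
        exps
        for exps in product(range(degree + 1), repeat=num_features)
        if sum(exps) <= degree
    ]


terms = monomial_exponents(num_features=2, degree=4)
print(len(terms))        # one weight per monomial, bias included
print(comb(2 + 4, 4))    # closed form C(d + M, M) for d features, degree M
```

Both counts agree at 15: the terms of degree 0, 1, 2, 3 and 4 contribute 1 + 2 + 3 + 4 + 5 weights.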
3.c. Is the gradient descent algorithm guaranteed to converge to a local minimum, or is there a
possibility of divergence? Support your answer with appropriate reasoning. [3 Marks]
Sol: Convergence is not guaranteed. If the learning rate, eta, is sufficiently large, each update can
overshoot the minimum and the gradient descent iterates may diverge.
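A minimal illustration, assuming the one-dimensional objective f(w) = w² (chosen here only as an example). Each update multiplies w by (1 - 2·eta), so the iterates shrink when 0 < eta < 1 and grow when eta > 1.

```python
def gradient_descent(eta, w0=1.0, steps=50):
    """Minimise f(w) = w**2 (gradient 2*w) with a fixed learning rate eta."""
    w = w0
    for _ in range(steps):
        w = w - eta * 2.0 * w   # gradient step
    return w


print(abs(gradient_descent(eta=0.1)))   # shrinks toward the minimum at 0
print(abs(gradient_descent(eta=1.1)))   # grows: the iterates diverge
```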
4. Derive the decomposition of the expected loss for polynomial regression into noise, bias and
variance terms, making the necessary assumptions. [12 Marks]
Sol: Refer to class notes
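For orientation only, the standard form of the result is outlined below; it is not a substitute for the derivation in the class notes. The assumptions here are: squared loss, targets t = h(x) + \epsilon with E[\epsilon] = 0 and Var[\epsilon] = \sigma^2, noise independent of the training set D, and y(x; D) denoting the polynomial fitted to D.

```latex
\begin{align*}
E_{D,\epsilon}\big[(y(x;D) - t)^2\big]
&= \underbrace{\big(E_D[y(x;D)] - h(x)\big)^2}_{\text{bias}^2}
 + \underbrace{E_D\big[(y(x;D) - E_D[y(x;D)])^2\big]}_{\text{variance}}
 + \underbrace{\sigma^2}_{\text{noise}}.
\end{align*}
```

The derivation adds and subtracts h(x) and E_D[y(x;D)] inside the square; the cross terms vanish under the stated assumptions.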
5. The number of paramedical treatments for a particular sports injury is a random variable X with
probability mass function P(X = k) = (11−k)/55 for 1 ≤ k ≤ 10. What are the expected value and standard
deviation of X? An insurance policy reimburses the costs of the treatments up to a maximum of five
treatments. What are the expected value and the standard deviation of the number of reimbursed
treatments? [8 Marks]
Sol:
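The moments can be checked by direct summation over the probability mass function. The sketch below uses exact fractions; the reimbursed count is modelled as Y = min(X, 5), and the helper name mean_and_variance is illustrative.

```python
from fractions import Fraction
from math import sqrt

# P(X = k) = (11 - k) / 55 for k = 1, ..., 10
pmf = {k: Fraction(11 - k, 55) for k in range(1, 11)}


def mean_and_variance(values_with_probs):
    """Mean and variance of a discrete distribution given (value, prob) pairs."""
    pairs = list(values_with_probs)
    mean = sum(v * p for v, p in pairs)
    second_moment = sum(v * v * p for v, p in pairs)
    return mean, second_moment - mean ** 2


mean_x, var_x = mean_and_variance(pmf.items())

# Reimbursed treatments: capped at five, so Y = min(X, 5)
capped = [(min(k, 5), p) for k, p in pmf.items()]
mean_y, var_y = mean_and_variance(capped)

print(mean_x, var_x, sqrt(var_x))   # E[X] = 4, Var[X] = 6, SD about 2.449
print(mean_y, var_y, sqrt(var_y))   # E[Y] = 37/11, Var[Y] = 292/121, SD about 1.553
```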
7. Let X be a random vector with mean m and covariance matrix Σ. Let A be a real-valued matrix. Prove
that the covariance matrix of AX is A Σ A^T. [7 Marks]
Hint: Cov[Z] = E[(Z − E[Z])(Z − E[Z])^T]
E[B Z Z^T B^T] = B E[Z Z^T] B^T
Assume X is an n-component vector and A is an n×n matrix.
Sol:
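A sketch of the proof, applying the hint with B = A and Z = X - m, and using linearity of expectation for the constant matrix A:

```latex
\begin{align*}
E[AX] &= A\,E[X] = Am, \\
\operatorname{Cov}[AX] &= E\big[(AX - Am)(AX - Am)^{T}\big] \\
&= E\big[A(X-m)(X-m)^{T}A^{T}\big] \\
&= A\,E\big[(X-m)(X-m)^{T}\big]A^{T} \\
&= A\,\Sigma\,A^{T}.
\end{align*}
```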