exam_practice
• This document contains some old exam questions. You can solve these questions to help
prepare for the exam.
• Be aware that the questions here are not necessarily representative of the exam. The
exam may contain questions that are more difficult or easier than the ones here.
• Just as in the exam, you may use all results that we proved in the lectures as long as you
justify why they can be used.
1 True/False questions
There will be no True/False questions on the exam; however, you can still use these questions
to check whether you understand the material of the course. Determine whether each of the
following statements is true or false.
1.1 Let p : [0, ∞) → [0, ∞) be the density defined for all x ∈ [0, ∞) by p(x) := (1/√(2πx)) e^(−x/2).
Then a random variable Z with density p is sub-Gaussian.
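A small simulation sketch (not part of the original question, only an aid for intuition): the density p above is that of W² for W ∼ N(0, 1), so one can sample from it and inspect how fast the empirical tail P(Z > t) decays compared with a Gaussian-type rate exp(−c t²).

    # Sketch: empirical tails of Z = W^2 with W ~ N(0, 1), whose density is p above.
    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.standard_normal(10**7) ** 2   # samples with density p

    for t in [2.0, 4.0, 8.0, 16.0]:
        tail = np.mean(Z > t)
        # For a sub-Gaussian variable, log P(Z > t) should eventually decay at least like -c * t^2;
        # comparing log(tail)/t with log(tail)/t^2 hints at which decay actually occurs here.
        print(t, tail, np.log(tail) / t, np.log(tail) / t**2)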
where X⊤X is invertible. Let β̂λR denote the ridge regression estimator applied to (X, Y).
Then, for all λ > 0 it holds that Var(β̂λR) − σ²(X⊤X)⁻¹ is positive semi-definite.
Define a scaled design matrix X̃ := 2X. Let β̂1 and β̂2 denote the OLS regression estimators
for a regression of Y on X and a regression of Y on X̃, respectively. Then, it holds that
X β̂1 = X̃ β̂2.
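A quick numerical sanity check of the displayed identity (my own sketch; the random design and coefficients below are arbitrary):

    # Sketch: do the OLS fitted values change when the design matrix is scaled by 2?
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 3
    X = rng.standard_normal((n, p))
    Y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)
    X_tilde = 2 * X

    beta1, *_ = np.linalg.lstsq(X, Y, rcond=None)        # OLS of Y on X
    beta2, *_ = np.linalg.lstsq(X_tilde, Y, rcond=None)  # OLS of Y on the scaled design
    print(np.allclose(X @ beta1, X_tilde @ beta2))       # compare the two fitted-value vectors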
1.5 Let k : R × R → R be a positive definite kernel with RKHS Hk and define the positive
definite kernel ℓ : R² × R² → R for all x, y ∈ R² by ℓ(x, y) := x1 y1 + k(x2, y2), with
corresponding RKHS Hℓ. Then, it holds that
1.6 Let k1, k2 : X × X → R be two arbitrary positive definite kernels on X. Then the function
k* : X × X → R defined for all x, y ∈ X by k*(x, y) := k1(x, y) − k2(x, y) is again a positive
definite kernel.
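Claims like this one can be probed numerically by picking concrete kernels and checking whether the resulting Gram matrix stays positive semi-definite; the kernels k1(x, y) = xy and k2(x, y) = 2xy below are my own hypothetical choice, not taken from the question.

    # Sketch: eigenvalues of the Gram matrix of k* = k1 - k2 for one concrete pair of kernels.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    K1 = np.outer(x, x)        # Gram matrix of k1(x, y) = x * y
    K2 = 2 * np.outer(x, x)    # Gram matrix of k2(x, y) = 2 * x * y
    print(np.linalg.eigvalsh(K1 - K2))   # a negative eigenvalue would rule out positive definiteness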
1.7 For λ > 0, let β̂OLS and β̂λR denote the OLS and ridge regression (with penalty λ) estimators
of Y on X, respectively. Assume that the singular values of X are all equal to 1. Then, it
holds that
(1/(λ + 1)) X β̂OLS = X β̂λR.
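A small sanity-check simulation (my own sketch; a design with orthonormal columns is used so that all singular values of X equal 1):

    # Sketch: compare (1/(lambda + 1)) * X beta_OLS with X beta_ridge when all singular values are 1.
    import numpy as np

    rng = np.random.default_rng(2)
    n, p, lam = 40, 5, 0.7
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))     # orthonormal columns => singular values 1
    Y = X @ rng.standard_normal(p) + rng.standard_normal(n)

    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
    print(np.allclose(X @ beta_ols / (lam + 1), X @ beta_ridge))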
1.8 Let H be an RKHS with reproducing kernel k : X × X → R. Then it holds for all x ∈ X
and all α, β ∈ R that α + βk(x, ·) ∈ H.
where 0 < E[D2 ] < ∞ and E[Y 2 ] < ∞. Then, the linear OLS estimator from regressing Y
on D is a consistent estimate of θ0 .
1.10 Consider a data set (X1 , Y1 ), . . . , (Xn , Yn ) from the fixed-design linear regression model
Yi = Xi⊤β0 + ϵi with ϵi ∼ N(0, σ0²).
Fix λ > 0 and define for all β ∈ Rp the Lasso loss Qλ(β) := (1/(2n)) Σ_{i=1}^n (Yi − Xi⊤β)² + λ∥β∥1.
Let β̂λL be a Lasso solution. Then, the true parameter β0 always satisfies Qλ(β̂λL) ≤ Qλ(β0).
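Since β̂λL is defined as a minimizer of Qλ, statements like this one can also be checked empirically; the sketch below (my own, using sklearn's Lasso, whose objective coincides with Qλ when alpha = λ and no intercept is fit) evaluates Qλ at the Lasso solution and at the true parameter.

    # Sketch: compare Q_lambda at a Lasso solution and at the true parameter beta_0.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(3)
    n, p, lam = 100, 10, 0.1
    X = rng.standard_normal((n, p))
    beta0 = np.zeros(p); beta0[:3] = [2.0, -1.0, 0.5]
    Y = X @ beta0 + rng.standard_normal(n)

    def Q(beta):
        return np.sum((Y - X @ beta) ** 2) / (2 * n) + lam * np.sum(np.abs(beta))

    beta_lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, Y).coef_
    print(Q(beta_lasso), Q(beta0))   # the minimizer's objective value cannot exceed Q(beta_0)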
1.12 Consider a fixed-design linear regression model Y = Xβ0 + ε, with ε ∼ N(0, σ²In) and
assume X⊤X is invertible. Let β̂1 and β̂2 be two Lasso solutions for the same penalty
parameter λ > 0. Then β̂1 = β̂2.
1.13 Consider the non-linear regression model given by Y = f0(X) + ϵ, where X ∼ Unif(0, 1),
E[ϵ|X] = 0 and f0 ∈ F := {f : [0, 1] → R | f continuous}. Then, there exists an estimator
f̂ based on n i.i.d. samples such that for all f0 ∈ F the following bound holds:
E[(1/n) Σ_{i=1}^n (f0(Xi) − f̂(Xi))²] ≤ O(n^(−3/4)).
1.14 Let (D1 , X1 , Y1 ), . . . , (Dn , Xn , Yn ) be i.i.d. copies of (D, X, Y ) generated by the partially
linear model
Y = Dθ0 + g0(X) + U,   E[U | X, D] = 0,
with
D = m0(X) + V,   E[V | X] = 0.
Let θ̂nDML be the DML estimator based on the ML estimators ĝn and m̂n . Assume the two
ML estimators satisfy
lim sup_{n→∞} n^(1/3) · E[(m0(X) − m̂n(X))²]^(1/2) = 1   and   lim_{n→∞} n^(1/4) · E[(g0(X) − ĝn(X))²]^(1/2) = 1.
Then, given that the noise terms are sufficiently well-behaved, θ̂nDML is asymptotically
normal.
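One possible way to organize the rate bookkeeping, assuming (as is typical for DML-type results; the exact conditions from the lectures are not restated here) that what matters is the product of the two nuisance RMSEs being o(n^(−1/2)): the assumed rates are n^(−1/3) for m̂n and n^(−1/4) for ĝn, and

    n^(−1/3) · n^(−1/4) = n^(−7/12),   with 7/12 > 1/2,

so the product of the assumed rates decays faster than n^(−1/2).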
2 Multiple choice questions
The following questions are multiple choice. There will be no multiple choice questions on the
exam, so, whenever possible, try to solve each question without using process of elimination.
2.1 Assume that we observe i.i.d. data X1 , . . . , Xn ∼ Exp(λ) for λ ∈ (0, ∞) (i.e., the density
of Xi is given by p(x) = λe^(−λx)) and want to estimate the variance ρ := Var(Xi) = 1/λ². For
any unbiased estimator ρ̂ of ρ, what is the minimal variance that ρ̂ can achieve?
(a) Varλ(ρ̂) ≥ 4/(nλ⁶)   (b) Varλ(ρ̂) ≥ 4/(nλ⁴)   (c) Varλ(ρ̂) ≥ 1/(nλ²)   (d) Varλ(ρ̂) ≥ 1/n
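A Monte Carlo sketch of my own (the constants λ = 1.5 and n = 200 are arbitrary), assuming only the standard Cramér–Rao expression Var(ρ̂) ≥ g′(λ)² / (n I(λ)) for unbiased estimators of g(λ) = 1/λ²:

    # Sketch: Monte Carlo estimate of the Cramer-Rao bound for rho = 1/lambda^2, next to the options.
    import numpy as np

    rng = np.random.default_rng(5)
    lam, n = 1.5, 200
    X = rng.exponential(scale=1 / lam, size=10**6)

    score = 1 / lam - X                        # d/d lambda of log(lam * exp(-lam * x))
    fisher_one_obs = np.mean(score**2)         # Fisher information of a single observation
    g_prime = -2 / lam**3                      # derivative of g(lambda) = 1/lambda^2
    crb = g_prime**2 / (n * fisher_one_obs)    # Cramer-Rao bound for n i.i.d. observations

    print("Monte Carlo CRB:", crb)
    print("candidates:", 4 / (n * lam**6), 4 / (n * lam**4), 1 / (n * lam**2), 1 / n)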
2.2 Consider the fixed-design linear regression model Y = Xβ0 + ϵ with ϵ ∼ N (0, σ02 In ). For
λ1 , . . . , λp > 0, denote by diag(λ1 , . . . , λp ) ∈ Rp×p the diagonal matrix with λ1 , . . . , λp on
the diagonal and define the estimator β̂λ := (X⊤X + diag(λ1, . . . , λp))⁻¹ X⊤Y. Which of
the following objective functions does β̂λ minimize over β ∈ Rp?
(a) ∥Y − Xβ∥₂² + Σ_{j=1}^p λj (β^j)²        (c) ∥Y − Xβ∥₂² + Σ_{j=1}^p λj ∥β∥₂²
(b) ∥Y − Xβ∥₂² + Σ_{j=1}^p (λj β^j)²        (d) ∥Y − Xβ∥₂² + Σ_{j=1}^p (λj)² ∥β∥₂²
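Options like these can be explored by comparing the closed-form estimator above with the numerical minimizer of a candidate objective; the sketch below (my own, using scipy) does this for one candidate penalty, which can be swapped for the other three.

    # Sketch: closed-form estimator vs. numerical minimizer of one candidate objective.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(6)
    n, p = 80, 4
    X = rng.standard_normal((n, p))
    Y = X @ rng.standard_normal(p) + rng.standard_normal(n)
    lams = np.array([0.5, 1.0, 2.0, 4.0])

    beta_closed = np.linalg.solve(X.T @ X + np.diag(lams), X.T @ Y)

    def objective(beta):
        # candidate penalty (a); replace this line to try the other candidates
        return np.sum((Y - X @ beta) ** 2) + np.sum(lams * beta**2)

    beta_numeric = minimize(objective, np.zeros(p)).x
    print(beta_closed)
    print(beta_numeric)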
2.3 Suppose we are given nK observations from a partially linear model with one-dimensional
parameter of interest θ0. Let I1 ∪ · · · ∪ IK = {1, . . . , nK} be mutually disjoint with
|I1| = · · · = |IK| = n. Assume that training the ML estimators ĝn and m̂n on n samples each
has a computational cost of order O(n³). What is the cost, in terms of n and K, of computing
the DML estimator θ̂nDML for the partially linear model?
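A small bookkeeping sketch of my own, assuming each nuisance estimator is refit once per fold on the (K − 1)n out-of-fold observations and that a single fit on m samples costs on the order of m³ (constant factors do not affect the order):

    # Sketch: tally the training cost of K-fold cross-fitting when one fit on m samples costs ~ m^3.
    n, K = 1000, 5

    cost_per_fold = ((K - 1) * n) ** 3    # each fold trains on the remaining (K - 1) * n observations
    total_cost = 2 * K * cost_per_fold    # two nuisance estimators (g and m), refit for each of the K folds

    print(total_cost)
    print(total_cost / (K * (K - 1) ** 3 * n**3))   # constant factor in front of K * (K - 1)^3 * n^3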
2.5 Let (D1, X1, Y1), . . . , (Dn, Xn, Yn) be i.i.d. copies drawn from the partially linear model.
Assume P(V = 1) = P(V = −1) = 1/2 and U = V · W, where W ∼ Unif([−1, 1]) and W
and V are independent. Let θ̂nDML be the DML estimator and assume that all assumptions
for Theorem 4.5 are satisfied. What is the asymptotic variance of √n (θ̂nDML − θ0)?
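A Monte Carlo sketch of my own, assuming the asymptotic variance takes the familiar sandwich form E[V²U²] / (E[V²])² for the partially linear model (the exact statement of Theorem 4.5 is not reproduced in this document):

    # Sketch: Monte Carlo evaluation of E[V^2 U^2] / (E[V^2])^2 for the stated noise distribution.
    import numpy as np

    rng = np.random.default_rng(7)
    N = 10**7
    V = rng.choice([-1.0, 1.0], size=N)     # P(V = 1) = P(V = -1) = 1/2
    W = rng.uniform(-1.0, 1.0, size=N)      # W ~ Unif([-1, 1]), independent of V
    U = V * W

    print(np.mean(V**2 * U**2) / np.mean(V**2) ** 2)   # here V^2 = 1, so this is just E[W^2]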
3 Longer questions
3.1 Consider the fixed-design linear regression model Y = Xβ0 + ϵ with β0 ∈ Rp, X ∈ Rn×p,
ϵ = (ϵ1, . . . , ϵn) and ϵ1, . . . , ϵn i.i.d. ∼ Unif([−2, 2]). Fix λ > 0 and denote by β̂λL the Lasso
estimator based on the data (X, Y). Prove that there exists a constant C > 0 such that,
P-almost surely,
∥β̂λL∥1 ≤ C.
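Not a proof, but a simulation sketch (my own; sklearn's Lasso objective matches the loss Qλ from question 1.10 when alpha = λ and no intercept is fit) illustrating one possible route: comparing the Lasso solution with the candidate β = 0 gives λ∥β̂λL∥1 ≤ Qλ(β̂λL) ≤ Qλ(0) = (1/(2n))∥Y∥₂², and the right-hand side is bounded P-a.s. because the errors are bounded and the design is fixed.

    # Sketch: check empirically that ||beta_lasso||_1 <= ||Y||_2^2 / (2 * n * lambda) across repetitions.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(8)
    n, p, lam = 100, 20, 0.05
    X = rng.standard_normal((n, p))
    beta0 = rng.standard_normal(p)

    for _ in range(5):
        eps = rng.uniform(-2.0, 2.0, size=n)             # bounded errors, as in the question
        Y = X @ beta0 + eps
        beta_lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, Y).coef_
        print(np.sum(np.abs(beta_lasso)), np.sum(Y**2) / (2 * n * lam))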
3.3 Let (D1, X1, Y1), . . . , (Dn, Xn, Yn) be i.i.d. observations from the conditional mean model.
Assume that U, V, g0 and m0 are bounded and E[V²] > 0. Let ĝn and m̂n be estimators
trained on (D1, X1, Y1), . . . , (Dn, Xn, Yn) that are L2-consistent for g0 and m0, respectively.
Then, define the estimator
θ̂n := [Σ_{i=1}^n (Yi − ĝn(Xi))(Di − m̂n(Xi))] / [Σ_{i=1}^n (Di − m̂n(Xi))²].
Prove that
θ̂n → E[Cov(Y, D | X)] / E[Var(D | X)]   in probability as n → ∞.
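The conditional mean model referenced above is not reproduced in this document, so the simulation sketch below (entirely my own) uses a stand-in data-generating process in which the target ratio E[Cov(Y, D | X)] / E[Var(D | X)] equals a known coefficient θ0, and plugs in simple nearest-neighbour regressions as placeholders for the L2-consistent estimators ĝn and m̂n:

    # Sketch: behaviour of theta_hat_n under a stand-in data-generating process (my own choice).
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(9)
    theta0 = 2.0                                        # target ratio for this particular DGP

    for n in [500, 5000, 50000]:
        X = rng.uniform(0, 1, size=n)
        D = X**2 + rng.normal(scale=0.5, size=n)
        Y = theta0 * D + np.sin(2 * np.pi * X) + rng.normal(scale=0.5, size=n)

        k = int(np.sqrt(n))                             # growing neighbourhood, so the fits are consistent
        g_hat = KNeighborsRegressor(n_neighbors=k).fit(X[:, None], Y).predict(X[:, None])
        m_hat = KNeighborsRegressor(n_neighbors=k).fit(X[:, None], D).predict(X[:, None])

        theta_hat = np.sum((Y - g_hat) * (D - m_hat)) / np.sum((D - m_hat) ** 2)
        print(n, theta_hat)                             # should approach theta0 = 2.0 as n grows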