Master’s Written Examination
Option: Statistics and Probability Spring 2016
Full points may be obtained for correct answers to eight questions. Each numbered question
(which may have several parts) is worth the same number of points. All answers will be
graded, but the score for the examination will be the sum of the scores of your best eight
solutions.
Use separate answer sheets for each question. DO NOT PUT YOUR NAME ON
YOUR ANSWER SHEETS. When you have finished, insert all your answer sheets into
the envelope provided, then seal it.
Problem 1—Stat 401. Let X and Y have E(X) = μ1, E(Y) = μ2, Var(X) = σ1², Var(Y) = σ2², and correlation coefficient ρ. Show that the correlation coefficient of X and Y − (ρσ2/σ1)X is zero.
Solution to Problem 1.
Since the correlation coefficient ρ(X, Y − (ρσ2/σ1)X) = 0 if and only if Cov(X, Y − (ρσ2/σ1)X) = 0, it suffices to show

Cov(X, Y − (ρσ2/σ1)X) = E[X(Y − ρσ2X/σ1)] − E[X] E[Y − ρσ2X/σ1]
= E[XY] − (ρσ2/σ1) E[X²] − E[X]E[Y] + (ρσ2/σ1)(E[X])²
= E[XY] − E[X]E[Y] − (ρσ2/σ1)(σ1² + μ1²) + (ρσ2/σ1)μ1²
= E[XY] − E[X]E[Y] − ρσ1σ2 = Cov(X, Y) − ρσ1σ2 = 0,

where the last equality uses Cov(X, Y) = ρσ1σ2.
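The identity can also be checked numerically. The sketch below assumes NumPy is available; the moment values are arbitrary illustrative choices, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.6  # arbitrary illustrative moments

# Draw a large correlated (X, Y) sample with the stated means, variances, and rho.
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
X, Y = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000).T

# The adjusted variable Y - (rho*s2/s1) X should be uncorrelated with X.
Z = Y - (rho * s2 / s1) * X
r = np.corrcoef(X, Z)[0, 1]
print(r)  # close to 0
```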
Problem 2—Stat 401. Suppose Xn converges to X in distribution and Yn converges in
probability to some constant C. Show that Xn + Yn converges to X + C in distribution.
Solution to Problem 2.
Let C(F) := {x ∈ R : F is continuous at x}. Clearly C(F_{X+C}) = {x : F_X is continuous at x − C}, so for any a ∈ C(F_{X+C}) we have a − C ∈ C(F_X). We need to show

lim_{n→∞} F_{Xn+Yn}(a) = F_{X+C}(a).

First we prove lim sup_{n→∞} F_{Xn+Yn}(a) ≤ F_{X+C}(a). Let {εk} ↓ 0 be a sequence such that a − C + εk ∈ C(F_X) for every k ≥ 1. It is possible to select such a sequence because C(F_X)^c is at most countable. Then for any k ≥ 1,

P[Xn + Yn ≤ a] = P[Xn + Yn ≤ a, |Yn − C| ≥ εk] + P[Xn + Yn ≤ a, |Yn − C| < εk]
≤ P[|Yn − C| ≥ εk] + P[Xn ≤ a − C + εk].

Letting n → ∞, and noting that Yn converges in probability to C while Xn converges in distribution to X, we have

lim sup_{n→∞} F_{Xn+Yn}(a) ≤ lim_{n→∞} F_{Xn}(a − C + εk) = F_X(a − C + εk), for all k ≥ 1.

Since F_X is continuous at a − C, letting k → ∞ (so that εk ↓ 0) proves lim sup_{n→∞} F_{Xn+Yn}(a) ≤ F_X(a − C) = F_{X+C}(a).

To prove lim inf_{n→∞} F_{Xn+Yn}(a) ≥ F_{X+C}(a), we select {εk} ↓ 0 such that a − C − εk ∈ C(F_X) for every k. Similar to the argument above, we can show P[Xn + Yn > a] ≤ P[|Yn − C| ≥ εk] + P[Xn > a − C − εk] for all n, k. First fix k; letting n → ∞ yields

lim sup_{n→∞} (1 − F_{Xn+Yn}(a)) ≤ 1 − F_X(a − C − εk), for every k,

i.e.,

lim inf_{n→∞} F_{Xn+Yn}(a) ≥ F_X(a − C − εk), for every k.

Again since F_X is continuous at a − C, now letting k → ∞ completes the proof.
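This instance of Slutsky's theorem can be illustrated by simulation. The sketch below assumes NumPy; the sample sizes, the constant C, and the evaluation point a are arbitrary choices:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, reps, C = 100, 50_000, 2.0  # arbitrary illustrative choices

# X_n: standardized mean of n Uniform(0,1) draws, which -> N(0,1) in distribution
# by the CLT; Y_n: C plus noise of size 1/n, which -> C in probability.
U = rng.uniform(size=(reps, n))
Xn = (U.mean(axis=1) - 0.5) * np.sqrt(12 * n)
Yn = C + rng.normal(scale=1.0 / n, size=reps)

# Compare the empirical CDF of X_n + Y_n at a point a with the limit law N(C, 1).
a = 2.5
empirical = float(np.mean(Xn + Yn <= a))
limit = 0.5 * (1 + math.erf((a - C) / math.sqrt(2)))  # N(C, 1) CDF at a
print(empirical, limit)  # the two agree closely
```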
Problem 3—Stat 411. Let X1 , X2 , ..., Xn be a random sample from N (0, θ2 ), 0 < θ < ∞.
(i) Find the Fisher information I(θ).
(ii) Derive θ̂ (MLE of θ).
(iii) What is the asymptotic distribution of √n(θ̂ − θ)?
(iv) Compute the efficiency of θ̂. Hint: the χ² distribution is also a special gamma distribution. The density function of the gamma distribution with parameters (α, β) can be written as

f(x) = (1/(Γ(α)β^α)) x^{α−1} e^{−x/β},  x > 0.
Solution to Problem 3.
(i) The Fisher information I(θ) can be computed as

I(θ) = −E(∂² log f(X, θ)/∂θ²) = −E(1/θ² − 3X²/θ⁴) = 2/θ².
(ii) The joint likelihood function of X1, X2, ..., Xn can be written as

L(θ) = (2π)^{−n/2} θ^{−n} exp(−(1/(2θ²)) Σ_{i=1}^n Xi²).

Setting ∂ log L(θ)/∂θ = 0, we have

−n/θ + (Σ_{i=1}^n Xi²)/θ³ = 0.

Solving this equation for θ, we have θ̂ = √(Σ_{i=1}^n Xi²/n).
(iii) The asymptotic distribution of √n(θ̂ − θ) is normal with mean 0 and variance I^{−1}(θ) = θ²/2.
(iv) We need to compute the variance of θ̂. Notice that Z = Σ_{i=1}^n Xi²/θ² follows the χ² distribution with n degrees of freedom, which is also a gamma distribution with α = n/2 and β = 2, i.e., with density function f(z) = (1/(Γ(n/2) 2^{n/2})) z^{n/2−1} e^{−z/2}, z > 0. So we have

E(√Z) = ∫₀^∞ √z · (1/(Γ(n/2) 2^{n/2})) z^{n/2−1} e^{−z/2} dz
= ∫₀^∞ (1/(Γ(n/2) 2^{n/2})) z^{(n+1)/2−1} e^{−z/2} dz
= (Γ((n+1)/2) 2^{(n+1)/2})/(Γ(n/2) 2^{n/2}) · ∫₀^∞ (1/(Γ((n+1)/2) 2^{(n+1)/2})) z^{(n+1)/2−1} e^{−z/2} dz
= √2 · Γ((n+1)/2)/Γ(n/2).

Thus, writing θ̂ = θ√(Z/n),

Var(θ̂) = E(Σ_{i=1}^n Xi²/n) − [E(θ√Z/√n)]² = θ² − (2θ²/n) · Γ²((n+1)/2)/Γ²(n/2).

So the efficiency of θ̂ is

(nI(θ))^{−1}/Var(θ̂) = nΓ²(n/2) / {2n[nΓ²(n/2) − 2Γ²((n+1)/2)]}.
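The closed-form variance can be checked against a Monte Carlo estimate. The sketch below assumes NumPy; θ, n, and the number of replications are arbitrary illustrative choices:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 20, 200_000  # arbitrary illustrative choices

X = rng.normal(scale=theta, size=(reps, n))
theta_hat = np.sqrt((X**2).sum(axis=1) / n)  # the MLE from part (ii)

# Exact variance from part (iv): theta^2 * (1 - (2/n)(Gamma((n+1)/2)/Gamma(n/2))^2).
g = math.gamma((n + 1) / 2) / math.gamma(n / 2)
exact_var = theta**2 * (1 - (2 / n) * g**2)
print(float(theta_hat.var()), exact_var)  # the two agree closely
```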
Problem 4—Stat 411. Let X1 , X2 , . . . , Xn be a random sample from the uniform distri-
bution over the interval (θ − 1, θ + 1). The parameter θ can be any real number.
(a) Show that the order statistics Y1 = mini {Xi } and Yn = maxi {Xi } are jointly sufficient
statistics for θ.
(b) Show that Y1 and Yn are also minimally sufficient for θ.
(c) Are Y1 and Yn jointly complete statistics for −∞ < θ < ∞? Why?
Solution to Problem 4.
(a) The joint pdf of X1, ..., Xn is

∏_{i=1}^n (1/2) I_{(θ−1,θ+1)}(xi) = 2^{−n} · I_{(θ−1,+∞)}(Y1) I_{(−∞,θ+1)}(Yn).

By the factorization theorem, Y1 and Yn are jointly sufficient for θ.
(b) For two random samples x = (x1, ..., xn)′ and z = (z1, ..., zn)′, the ratio of joint pdfs

∏_{i=1}^n (1/2) I_{(θ−1,θ+1)}(xi) / ∏_{i=1}^n (1/2) I_{(θ−1,θ+1)}(zi) = [I_{(θ−1,+∞)}(mini{xi}) / I_{(θ−1,+∞)}(mini{zi})] · [I_{(−∞,θ+1)}(maxi{xi}) / I_{(−∞,θ+1)}(maxi{zi})]

does not depend on θ if and only if mini{xi} = mini{zi} and maxi{xi} = maxi{zi}. Therefore, Y1 and Yn are minimally sufficient for θ.
(c) No, Y1 and Yn are not jointly complete for θ, because

E[(Yn − Y1)/2 − (n − 1)/(n + 1)] = 0

for all θ. That is, there is a nonzero function of these statistics whose expectation is always zero.
Problem 5—Stat 411. Let X1 , . . . , Xn be an iid sample from a shifted exponential
distribution, i.e., the density function for each Xi is
fθ (x) = e−(x−θ) I(θ,∞) (x), θ ∈ (−∞, ∞).
1. Consider testing H0 : θ = θ0 versus H1 : θ = θ1 , where θ0 and θ1 are fixed, with θ1 > θ0 .
Show that the most powerful test is of the form
reject H0 if and only if X(1) > c
for some constant c, where X(1) = min{X1 , . . . , Xn } is the sample minimum.
2. For a specified α ∈ (0, 1), find the constant c so that the size, or Type I error probability, of the test in Part 1 is α.
3. Calculate the power, pn (θ1 ), of the test at the alternative θ1 . What happens to pn (θ1 )
as n → ∞?
Solution to Problem 5.
1. The likelihood function is

L(θ) = ∏_{i=1}^n fθ(Xi) = ∏_{i=1}^n e^{−(Xi−θ)} I_{(θ,∞)}(Xi) = e^{−Σ_{i=1}^n (Xi−θ)} I_{(θ,∞)}(X(1)).

Then the likelihood ratio is

L(θ0)/L(θ1) = e^{n(θ0−θ1)} · I_{(θ0,∞)}(X(1)) / I_{(θ1,∞)}(X(1)).

According to the Neyman–Pearson lemma, the most powerful test rejects H0 iff the likelihood ratio above is small which, in this case, is equivalent to X(1) being big. Therefore, the most powerful test rejects H0 iff X(1) is bigger than some constant c.
2. To find the constant c so that the size of the test is the specified level α, we must solve
the equation:
Pθ0 (X(1) > c) = α.
Since the Xi's are iid, the left-hand side above can be rewritten as

Pθ0(X(1) > c) = [Pθ0(X1 > c)]^n = e^{−n(c−θ0)}.

Setting this equal to α and solving gives

c = θ0 − n^{−1} log α.
3. Let pn(θ1) be the power function. Then a calculation similar to that for the size above gives

pn(θ1) = Pθ1(X(1) > c) = e^{−n max{c−θ1, 0}}.

Plugging in the value of c derived above gives

pn(θ1) = e^{−n max{θ0−θ1−n^{−1} log α, 0}} = min{1, α e^{n(θ1−θ0)}}.

Since θ1 > θ0, the second term in the minimum tends to ∞ as n → ∞. Therefore, the power converges to 1 as n → ∞.
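The size and power formulas can be verified by simulation. The sketch below assumes NumPy; the values of θ0, θ1, α, and n are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, theta1, alpha, n, reps = 0.0, 0.2, 0.05, 10, 100_000  # arbitrary choices

c = theta0 - np.log(alpha) / n                 # cutoff from part 2
X = theta1 + rng.exponential(size=(reps, n))   # shifted exponential under theta_1
power_mc = float(np.mean(X.min(axis=1) > c))

power_exact = min(1.0, alpha * float(np.exp(n * (theta1 - theta0))))  # part 3
print(power_mc, power_exact)  # the two agree closely
```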
Problem 6—Stat 416. Consider the following two independent random samples drawn
from continuous populations which have the same form but possibly a difference of θ in their
locations:
X 79 13 138 129 59 76 75 53
Y 96 141 133 107 102 129 110 104
(a) Using the Mann-Whitney test and significance level 0.10, test

H0: θ = 0 versus H1: θ ≠ 0.

(For a two-sided test with significance level 0.10, the rejection region for the Mann-Whitney test is U ≤ 15 or U ≥ 49.)
(b) For what kinds of distributions does the Mann-Whitney test perform better than the t test?
Solution to Problem 6.
(a) In this case, m = 8, n = 8. Note that there is a tie: X4 = Y6 = 129. The Mann-Whitney U statistic is therefore either 12 (if X4 is taken to precede Y6) or 13 (if Y6 precedes X4). In either case U ≤ 15, so we reject the null hypothesis.
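The statistic can be computed directly by counting pairs, with the single tie contributing 1/2:

```python
# The Mann-Whitney U statistic for the data of Problem 6, counting for each
# (X_i, Y_j) pair whether X_i exceeds Y_j; the tie X_4 = Y_6 contributes 1/2.
X = [79, 13, 138, 129, 59, 76, 75, 53]
Y = [96, 141, 133, 107, 102, 129, 110, 104]

U = sum(1.0 if x > y else (0.5 if x == y else 0.0) for x in X for y in Y)
print(U)  # 12.5, i.e. 12 or 13 depending on how the tie is broken
```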
(b) The Mann-Whitney test performs better than the t test for heavy-tailed distributions
including the double exponential distribution and the logistic distribution.
Problem 7—Stat 431. A client has a finite population of 6 units and has funds to survey
3 units for the purpose of estimating the mean and its related standard deviation. There are
two sampling plans to be considered.
• Sampling Plan 1: A simple random sample of size 3 without replacement, SRS (6, 3)
• Sampling Plan 2: A uniform sampling plan on the following support. Samples in the
support: {1,3,5}, {2,4,5}, {1,4,6}, {1,2,3}, {2,5,6}. Suppose upon the survey we obtain
the following data: 10, 15, and 20.
Answer the following questions:
(i) Can we compute the HT (Horvitz–Thompson) estimate of the population mean under both sampling plans? If yes, compute the estimate.
(ii) Can we estimate the standard deviation of your estimator under both sampling plans? If yes, compute the estimate.
Solution to Problem 7.
(i) To compute the HT estimator, we need the first-order inclusion probabilities. For Plan 1, they are πi = 1/2, i = 1, ..., 6. For Plan 2, they are π1 = 3/5, π2 = 3/5, π3 = 2/5, π4 = 2/5, π5 = 3/5, and π6 = 2/5. The HT estimator of the mean is given by (1/N) Σ_{i∈S} Yi/πi. Notice that we do not know which units correspond to the data values 10, 15, and 20, so we cannot compute the HT estimate of the population mean under Plan 2. However, we are still able to compute the HT estimate under Plan 1, since all the πi's are equal. The corresponding HT estimate for Plan 1 is (1/6)(10 + 15 + 20) × 2 = 15.
(ii) Plan 1 is simple random sampling. The variance of the HT estimator under simple random sampling is estimated by (1/n − 1/N)s², where s² is the sample variance. Thus the estimated standard deviation of the HT estimator is √((1/3 − 1/6) × 25) = √(25/6).
For Plan 2, we cannot estimate the standard deviation, since we do not have the required information on which units were sampled.
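The inclusion probabilities for Plan 2 and the HT estimate for Plan 1 can be computed mechanically, since the design is uniform over the five listed samples:

```python
from fractions import Fraction

# First-order inclusion probabilities for Plan 2: the design is uniform over the
# five listed samples, so pi_i = (number of samples containing unit i) / 5.
support = [{1, 3, 5}, {2, 4, 5}, {1, 4, 6}, {1, 2, 3}, {2, 5, 6}]
pi = {i: Fraction(sum(i in s for s in support), len(support)) for i in range(1, 7)}
print(pi)  # units 1, 2, 5 -> 3/5; units 3, 4, 6 -> 2/5

# Under Plan 1 (SRS(6, 3)) every pi_i = 1/2, so the HT estimate of the mean is:
N, data = 6, [10, 15, 20]
ht_plan1 = sum(y / (1 / 2) for y in data) / N
print(ht_plan1)  # 15.0
```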
Problem 8—Stat 451. Consider a triangular distribution with density function
f (x) = 1 − |x|, x ∈ [−1, 1].
1. Propose an accept–reject procedure to simulate a random variable X having the tri-
angular distribution above. (Hint: Keep it simple!)
2. What is the acceptance probability for your proposed method?
3. Is the X produced by your accept–reject method an exact or approximate sample from
the triangular distribution? Justify your answer.
Solution to Problem 8.
1. The simplest approach is to choose a Unif(−1, 1) proposal distribution, which is possible since the support is bounded. Let g(y) = (1/2) I_{[−1,1]}(y) be the corresponding density function. We have the bound f(y) ≤ M g(y) with M = 2. Then the accept–reject algorithm goes as follows:
• Sample Y ∼ g = Unif(−1, 1) and U ∼ Unif(0, 1).
• If U ≤ f(Y)/(M g(Y)) = 1 − |Y|, then set X = Y; else, go back to the previous step.
2. The acceptance probability is M^{−1} = 1/2.
3. The sample X has exactly the triangular distribution: accept–reject is an exact method, since the accepted draws have density proportional to f; no discretization or asymptotic approximation is involved.
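The algorithm above can be sketched in vectorized form (assuming NumPy; the batch size and seed are arbitrary):

```python
import numpy as np

# Vectorized accept-reject sampler for the triangular density f(x) = 1 - |x|.
def sample_triangular(size, rng):
    out = []
    while len(out) < size:
        y = rng.uniform(-1.0, 1.0, size=size)  # proposal Y ~ g = Unif(-1, 1)
        u = rng.uniform(size=size)             # U ~ Unif(0, 1)
        out.extend(y[u <= 1.0 - np.abs(y)])    # accept iff U <= f(Y) / (M g(Y))
    return np.array(out[:size])

rng = np.random.default_rng(4)
draws = sample_triangular(100_000, rng)
print(draws.mean(), draws.var())  # near 0 and 1/6, the triangular moments
```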
Problem 9—Stat 461. Consider a Markov chain X0, X1, ... on states {0, 1, 2, 3} with transition probability matrix

P =
[ 1    0    0    0   ]
[ 0.4  0.1  0.1  0.4 ]
[ 0.3  0.2  0.2  0.3 ]
[ 0    0    0    1   ].

The transition probability matrix Q corresponding to the non-absorbing states 1 and 2 is

Q =
[ 0.1  0.1 ]
[ 0.2  0.2 ].
Calculate the matrix inverse to I − Q, and use it to answer the following two questions:
(a) Suppose the chain starts from state 1. What is the mean time spent in each of states 1
and 2 prior to absorption?
(b) What is the probability of absorption into state 3 from state 1?
Solution to Problem 9.

I − Q =
[ 0.9  −0.1 ]
[ −0.2  0.8 ].

Hence we have

W = (I − Q)^{−1} = (1/(0.72 − 0.02)) [ 0.8  0.1 ; 0.2  0.9 ] = [ 8/7  1/7 ; 2/7  9/7 ].

Hence, given the chain starts from state 1, the mean time spent in state 1 is w11 = 8/7, and the mean time spent in state 2 is w12 = 1/7.
Note that

R =
[ 0.4  0.4 ]
[ 0.3  0.3 ].

Since

U = W R =
[ 0.5  0.5 ]
[ 0.5  0.5 ],

we obtain that the probability of absorption into state 3 from state 1 is u13 = 0.5.
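The same computations can be done numerically (assuming NumPy):

```python
import numpy as np

# Problem 9 in NumPy: fundamental matrix W = (I - Q)^(-1) and absorption
# probabilities U = W R for the transient states 1 and 2.
Q = np.array([[0.1, 0.1],
              [0.2, 0.2]])  # transitions among the transient states
R = np.array([[0.4, 0.4],
              [0.3, 0.3]])  # transitions into the absorbing states 0 and 3

W = np.linalg.inv(np.eye(2) - Q)
U = W @ R
print(W)  # [[8/7, 1/7], [2/7, 9/7]]
print(U)  # [[0.5, 0.5], [0.5, 0.5]]
```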
Problem 10—Stat 461. A population begins with a single individual. In each generation,
each individual in the population dies with probability 1/2 or doubles with probability 1/2.
Let Xn be the number of individuals in the population in the nth generation. It is clear that
Xn is a branching process with X0 = 1. Find the mean and variance of Xn .
Solution to Problem 10. Let ξi be i.i.d. random variables with common distribution

P{ξi = 0} = 1/2,  P{ξi = 2} = 1/2.

It is clear that ξi has mean μ = 1 and variance σ² = 1. By the definition of a branching process, we have

Xn+1 = ξ1 + ξ2 + · · · + ξXn.

Hence for the mean M(n) of Xn we have

M(n) = μM(n − 1) = μ²M(n − 2) = · · · = μ^n = 1.

Similarly, for the variance V(n) of Xn, we have

V(n) = σ²M(n − 1) + μ²V(n − 1) = 1 + V(n − 1) = · · · = n.
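The answer can be checked exactly, without simulation, by propagating the full distribution of the population size. This sketch (assuming NumPy; the choice n = 6 is arbitrary) uses the fact that, given k individuals, the number j that double is Binomial(k, 1/2), so the next generation has size 2j:

```python
import numpy as np
from math import comb

# Exact distribution of X_n by dynamic programming: with k individuals, j of
# them double and the rest die, so the next generation has size 2j with
# probability C(k, j) / 2^k.
def generation_distribution(n):
    max_pop = 2 ** n
    dist = np.zeros(max_pop + 1)
    dist[1] = 1.0  # X_0 = 1
    for _ in range(n):
        new = np.zeros(max_pop + 1)
        for k, p in enumerate(dist):
            if p > 0:
                for j in range(k + 1):
                    new[2 * j] += p * comb(k, j) * 0.5**k
        dist = new
    return dist

n = 6  # small n keeps the exact computation cheap
dist = generation_distribution(n)
k = np.arange(len(dist))
mean = float((k * dist).sum())
var = float((k**2 * dist).sum()) - mean**2
print(mean, var)  # 1 and n, matching the derivation above
```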
Problem 11—Stat 481. In an attempt to study fat absorption in doughnuts, 24 doughnuts are randomly selected for a study of four kinds of fats. The dependent variable is grams of fat absorbed, and the factor variable is the type of fat. The factor contains 4 levels (four types of fat were tested), with 6 doughnuts for each type. The researcher accidentally dropped one of the doughnuts from the second type of fat, so the second type of fat contains 5 observations instead of 6. Given SSTR = 1504.5 and SSE = 2018:
(1). What is the research objective of this study? Determine the parameters of interest first and state the null and alternative hypotheses for the parameters.
(2). What design is employed in this study? What model would you suggest to fit the data? Write down the model with the necessary assumptions.
(3). Construct an ANOVA table, and draw your conclusion accordingly. The significance level α = 0.05 is given. [F0.05(3, 19) = 3.12, F0.05(3, 20) = 3.10.]
(4). Suppose we would like to do further analysis on the data, for example to test hypotheses on all pairwise comparisons between the four treatment levels simultaneously. What method(s) would you suggest? Why?
Solution to Problem 11. (1). To investigate the fat absorption of four kinds of fats in doughnuts. Let μi denote the mean fat absorption of the i-th kind of fat, i = 1, ..., 4. Null hypothesis H0: μ1 = μ2 = μ3 = μ4 vs alternative hypothesis H1: at least one μi is different from the other means.
(2). It is a completely randomized design with a fixed effect. A one-way ANOVA model is suggested: Yij = μi + εij, with εij i.i.d. ∼ N(0, σ²), i = 1, ..., 4; j = 1, ..., ni.
(3). The ANOVA table:

Source of Variation      SS      DF     MS       F      p-value
Treatment              1504.5     3    501.5    4.722    0.013
Error                  2018      19    106.2
Total                  3522.5    22

As the p-value < 0.05, the null hypothesis is rejected; i.e., there is significant evidence that there are differences among the mean fat absorptions of the four kinds of fats in the study.
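The table entries follow directly from the given sums of squares:

```python
# Reproducing the ANOVA table of part (3) from the given sums of squares.
SSTR, SSE = 1504.5, 2018.0
df_tr, df_e = 4 - 1, 23 - 4  # 4 treatments; 23 observations (one doughnut lost)

MSTR, MSE = SSTR / df_tr, SSE / df_e
F = MSTR / MSE
print(MSTR, round(MSE, 1), round(F, 3))  # 501.5, 106.2, 4.722

# Compare with the tabled critical value F_0.05(3, 19) = 3.12:
print(F > 3.12)  # True, so H0 is rejected at the 5% level
```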
(4). For simultaneous pairwise comparisons among the four mean fat absorptions, Tukey's method is recommended, as its studentized range test for pairwise comparisons is exact. It gives more accurate (narrower) intervals for pairwise comparisons than the conservative though flexible Bonferroni method, and is also better than Scheffé's confidence region method, which covers all linear hypotheses (contrasts of the four means).
Problem 12—Stat 481. Consider a linear regression model Yi = β0 + β1 x1i + β2 x2i + εi, where the errors εi are i.i.d. N(0, σ²), i = 1, ..., n. The constant variance σ² is unknown.
(1). Express the model in matrix form with the notation below:

Y = (Y1, ..., Yn)′,   X = the n × 3 matrix with i-th row (1, x1,i, x2,i),   β = (β0, β1, β2)′.

Based on the least squares criterion (loss function), derive the normal equation and then the least squares estimates of the coefficients, β̂ = (β̂0, β̂1, β̂2)′.
(2). Calculate the variance-covariance matrix Var(β̂) of the coefficient estimators, and determine the distribution of β̂.
(3). Construct the confidence interval for the mean response μi = E(Yi) at a design point (x1i, x2i).
Solution to Problem 12. (1). The linear regression model can be written as Y = Xβ + ε, ε ∼ Nn(0, σ²In). Least squares objective function: Q(β) = (Y − Xβ)′(Y − Xβ). Taking the derivative w.r.t. β and setting it to zero gives the normal equation:

∂Q(β)/∂β = 0 ⇒ X′Xβ = X′Y ⇒ β̂ = (X′X)^{−1} X′Y.

(2). The variance-covariance matrix of the least squares estimator β̂ is

Var(β̂) = (X′X)^{−1} X′ · Var(Y) · X (X′X)^{−1} = σ² (X′X)^{−1} X′ · In · X (X′X)^{−1} = σ² (X′X)^{−1}.

In addition, we can show that E(β̂) = (X′X)^{−1} X′ · (Xβ) = β. As β̂ = (X′X)^{−1} X′Y is linear in Y, it follows a normal distribution, i.e., β̂ ∼ N3(β, σ²(X′X)^{−1}).
(3). The mean response at a design point (x1i, x2i) is μi = β0 + β1 x1i + β2 x2i = Xi′β, where Xi′ = (1, x1i, x2i). Its estimator based on the least squares estimates is μ̂i = Xi′β̂, which also follows a normal distribution:

Xi′β̂ ∼ N(Xi′β, σ² Xi′(X′X)^{−1}Xi) = N(μi, σ² Xi′(X′X)^{−1}Xi).

Its sampling distribution with σ² unknown is

(Xi′β̂ − μi) / √(MSE · Xi′(X′X)^{−1}Xi) ∼ t(n − 3),

with MSE = SSE/(n − 3). The 100(1 − α)% confidence interval for μi is

Xi′β̂ ± t_{α/2}(n − 3) · √(MSE · Xi′(X′X)^{−1}Xi).
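The formulas above can be exercised on synthetic data. In the sketch below (assuming NumPy), the data-generating coefficients, the design point x0, and the t quantile 2.012 (approximately t_{0.025}(47), read from a t table) are illustrative assumptions, not values from the problem:

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta_true, sigma = 50, np.array([1.0, 2.0, -0.5]), 0.3  # illustrative values

X = np.column_stack([np.ones(n), rng.uniform(size=n), rng.uniform(size=n)])
Y = X @ beta_true + rng.normal(scale=sigma, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y            # least squares estimate (X'X)^(-1) X'Y
resid = Y - X @ beta_hat
MSE = resid @ resid / (n - 3)

# 95% CI for the mean response at a design point x0 = (1, x1, x2):
x0 = np.array([1.0, 0.5, 0.5])
t = 2.012                               # roughly t_{0.025}(47), from a t table
se = np.sqrt(MSE * x0 @ XtX_inv @ x0)
ci = (x0 @ beta_hat - t * se, x0 @ beta_hat + t * se)
print(beta_hat, ci)
```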