
3

The Bootstrap and the Jackknife

The bootstrap and the jackknife are nonparametric methods for computing
standard errors and confidence intervals. The jackknife is less computationally
expensive, but the bootstrap has some statistical advantages.

3.1 The Jackknife


The jackknife, due to Quenouille (1949), is a simple method for approximating
the bias and variance of an estimator. Let Tn = T (X1 , . . . , Xn ) be an estimator
of some quantity θ and let bias(Tn ) = E(Tn ) − θ denote the bias. Let T(−i)
denote the statistic with the ith observation removed. The jackknife bias
estimate is defined by

$$ b_{\mathrm{jack}} = (n-1)\bigl(\overline{T}_n - T_n\bigr) \tag{3.1} $$

where $\overline{T}_n = n^{-1}\sum_{i} T_{(-i)}$. The bias-corrected estimator is $T_{\mathrm{jack}} = T_n - b_{\mathrm{jack}}$.
Why is bjack defined this way? For many statistics it can be shown that
$$ \mathrm{bias}(T_n) = \frac{a}{n} + \frac{b}{n^2} + O\!\left(\frac{1}{n^3}\right) \tag{3.2} $$

for some $a$ and $b$. For example, let $\sigma^2 = \mathbb{V}(X_i)$ and let $\widehat{\sigma}_n^2 = n^{-1}\sum_{i=1}^{n}(X_i - \overline{X})^2$. Then $\mathbb{E}(\widehat{\sigma}_n^2) = (n-1)\sigma^2/n$ so that $\mathrm{bias}(\widehat{\sigma}_n^2) = -\sigma^2/n$. Thus, (3.2) holds with $a = -\sigma^2$ and $b = 0$.

When (3.2) holds, we have

$$ \mathrm{bias}(T_{(-i)}) = \frac{a}{n-1} + \frac{b}{(n-1)^2} + O\!\left(\frac{1}{n^3}\right). \tag{3.3} $$

It follows that $\mathrm{bias}(\overline{T}_n)$ also satisfies (3.3). Hence,

$$
\begin{aligned}
\mathbb{E}(b_{\mathrm{jack}}) &= (n-1)\bigl(\mathrm{bias}(\overline{T}_n) - \mathrm{bias}(T_n)\bigr)\\
&= (n-1)\left[\left(\frac{1}{n-1} - \frac{1}{n}\right)a + \left(\frac{1}{(n-1)^2} - \frac{1}{n^2}\right)b + O\!\left(\frac{1}{n^3}\right)\right]\\
&= \frac{a}{n} + \frac{(2n-1)\,b}{n^2(n-1)} + O\!\left(\frac{1}{n^2}\right)\\
&= \mathrm{bias}(T_n) + O\!\left(\frac{1}{n^2}\right)
\end{aligned}
$$

which shows that $b_{\mathrm{jack}}$ estimates the bias up to order $O(n^{-2})$. By a similar calculation,

$$ \mathrm{bias}(T_{\mathrm{jack}}) = -\frac{b}{n(n-1)} + O\!\left(\frac{1}{n^2}\right) = O\!\left(\frac{1}{n^2}\right) $$

so the bias of $T_{\mathrm{jack}}$ is an order of magnitude smaller than that of $T_n$. $T_{\mathrm{jack}}$ can also be written as

$$ T_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n} \widetilde{T}_i $$

where

$$ \widetilde{T}_i = nT_n - (n-1)T_{(-i)} $$

are called pseudo-values.
The jackknife estimate of V(Tn ) is

$$ v_{\mathrm{jack}} = \frac{s^2}{n} \tag{3.4} $$

where

$$ s^2 = \frac{\sum_{i=1}^{n}\bigl(\widetilde{T}_i - \tfrac{1}{n}\sum_{j=1}^{n}\widetilde{T}_j\bigr)^2}{n-1} $$
is the sample variance of the pseudo-values. Under suitable conditions on T ,
it can be shown that vjack consistently estimates V(Tn ). For example, if T is
a smooth function of the sample mean, then consistency holds.
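To make this concrete, here is a short R sketch of the jackknife computed from pseudo-values; the function jackknife_se, the plug-in skewness statistic skew, and the simulated data are illustrative choices, not part of the text.

# Jackknife bias and standard error of a statistic stat, via pseudo-values.
jackknife_se <- function(x, stat) {
  n      <- length(x)
  Tn     <- stat(x)
  Tminus <- sapply(1:n, function(i) stat(x[-i]))  # T_(-i): statistic with ith point removed
  pseudo <- n * Tn - (n - 1) * Tminus             # pseudo-values
  bjack  <- (n - 1) * (mean(Tminus) - Tn)         # bias estimate (3.1)
  vjack  <- var(pseudo) / n                       # variance estimate (3.4); var() uses n - 1
  list(bias = bjack, se = sqrt(vjack))
}

# Example: the plug-in skewness of simulated data.
skew <- function(x) { m <- mean(x); mean((x - m)^3) / mean((x - m)^2)^1.5 }
jackknife_se(rexp(100), skew)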

3.5 Theorem. Let $\mu = \mathbb{E}(X_1)$ and $\sigma^2 = \mathbb{V}(X_1) < \infty$ and suppose that $T_n = g(\overline{X}_n)$ where $g$ has a continuous, nonzero derivative at $\mu$. Then $(T_n - g(\mu))/\sigma_n \rightsquigarrow N(0,1)$ where $\sigma_n^2 = n^{-1}(g'(\mu))^2\sigma^2$. The jackknife is consistent, meaning that

$$ \frac{v_{\mathrm{jack}}}{\sigma_n^2} \stackrel{\mathrm{a.s.}}{\longrightarrow} 1. \tag{3.6} $$

3.7 Theorem (Efron, 1982). If $T(F) = F^{-1}(p)$ is the $p$th quantile, then the jackknife variance estimate is inconsistent. For the median ($p = 1/2$) we have that $v_{\mathrm{jack}}/\sigma_n^2 \rightsquigarrow (\chi_2^2/2)^2$ where $\sigma_n^2$ is the asymptotic variance of the sample median.

3.8 Example. Let $T_n = \overline{X}_n$. It is easy to see that $\widetilde{T}_i = X_i$. Hence, $T_{\mathrm{jack}} = T_n$, $b_{\mathrm{jack}} = 0$ and $v_{\mathrm{jack}} = S_n^2/n$ where $S_n^2$ is the sample variance. □

There is a connection between the jackknife and the influence function.


Recall that the influence function is

$$ L_F(x) = \lim_{\epsilon\to 0}\frac{T\bigl((1-\epsilon)F + \epsilon\delta_x\bigr) - T(F)}{\epsilon}. \tag{3.9} $$

Suppose we approximate $L_F(X_i)$ by setting $F = \widehat{F}_n$ and $\epsilon = -1/(n-1)$. This yields the approximation

$$ L_F(X_i) \approx \frac{T\bigl((1-\epsilon)\widehat{F}_n + \epsilon\delta_{X_i}\bigr) - T(\widehat{F}_n)}{\epsilon} = (n-1)\bigl(T_n - T_{(-i)}\bigr) \equiv \ell_i. $$

It follows that

$$ b_{\mathrm{jack}} = -\frac{1}{n}\sum_{i=1}^{n}\ell_i, \qquad v_{\mathrm{jack}} = \frac{1}{n(n-1)}\left(\sum_{i}\ell_i^2 - n\,b_{\mathrm{jack}}^2\right). $$

In other words, the jackknife is an approximate version of the nonparametric


delta method.

3.10 Example. Consider estimating the skewness $T(F) = \int (x-\mu)^3\,dF(x)/\sigma^3$ of the nerve data. The point estimate is $T(\widehat{F}_n) = 1.76$. The jackknife estimate of the standard error is .17. An approximate 95 percent confidence interval for $T(F)$ is $1.76 \pm 2(.17) = (1.42, 2.10)$. This interval excludes 0, which suggests that the data are not Normal. We can also compute the standard error using the influence function. For this functional, we have (see Exercise 1)

$$ L_F(x) = \frac{(x-\mu)^3}{\sigma^3} - T(F)\left(1 + \frac{3\,\bigl((x-\mu)^2 - \sigma^2\bigr)}{2\sigma^2}\right). $$

Then

$$ \widehat{\mathrm{se}} = \sqrt{\frac{\widehat{\tau}^2}{n}} = \sqrt{\frac{\sum_{i=1}^{n}\widehat{L}^2(X_i)}{n^2}} = .18. $$

It is reassuring to get nearly the same answer. □
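In code, the influence-function (nonparametric delta method) standard error for the skewness can be sketched as follows; the function name is illustrative and the nerve data are not reproduced here, so this is only a template for the calculation above.

# Delta-method standard error of the skewness from the estimated influence function.
skew_se_influence <- function(x) {
  n    <- length(x)
  mu   <- mean(x)
  sig2 <- mean((x - mu)^2)                  # plug-in variance
  Tn   <- mean((x - mu)^3) / sig2^1.5       # plug-in skewness T(F_n)
  L    <- (x - mu)^3 / sig2^1.5 -
          Tn * (1 + 3 * ((x - mu)^2 - sig2) / (2 * sig2))  # estimated influence function
  sqrt(sum(L^2) / n^2)                      # se-hat = sqrt( sum L^2 / n^2 )
}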

3.2 The Bootstrap


The bootstrap is a method for estimating the variance and the distribution
of a statistic Tn = g(X1 , . . . , Xn ). We can also use the bootstrap to construct
confidence intervals.
Let VF (Tn ) denote the variance of Tn . We have added the subscript F to
emphasize that the variance is a function of $F$. If we knew $F$ we could, at least in principle, compute the variance. For example, if $T_n = n^{-1}\sum_{i=1}^{n} X_i$, then

$$ \mathbb{V}_F(T_n) = \frac{\sigma^2}{n} = \frac{\int x^2\,dF(x) - \left(\int x\,dF(x)\right)^2}{n} $$

which is clearly a function of $F$.
With the bootstrap, we estimate $\mathbb{V}_F(T_n)$ with $\mathbb{V}_{\widehat{F}_n}(T_n)$. In other words, we use a plug-in estimator of the variance. Since $\mathbb{V}_{\widehat{F}_n}(T_n)$ may be difficult to compute, we approximate it with a simulation estimate denoted by $v_{\mathrm{boot}}$.
Specifically, we do the following steps:

Bootstrap Variance Estimation

1. Draw $X_1^*, \ldots, X_n^* \sim \widehat{F}_n$.

2. Compute $T_n^* = g(X_1^*, \ldots, X_n^*)$.

3. Repeat steps 1 and 2, $B$ times, to get $T_{n,1}^*, \ldots, T_{n,B}^*$.

4. Let

$$ v_{\mathrm{boot}} = \frac{1}{B}\sum_{b=1}^{B}\left(T_{n,b}^* - \frac{1}{B}\sum_{r=1}^{B} T_{n,r}^*\right)^2. \tag{3.11} $$

By the law of large numbers, $v_{\mathrm{boot}} \stackrel{\mathrm{a.s.}}{\longrightarrow} \mathbb{V}_{\widehat{F}_n}(T_n)$ as $B \to \infty$. The estimated standard error of $T_n$ is $\widehat{\mathrm{se}}_{\mathrm{boot}} = \sqrt{v_{\mathrm{boot}}}$. The following diagram illustrates the bootstrap idea:

Real world: $F \Longrightarrow X_1, \ldots, X_n \Longrightarrow T_n = g(X_1, \ldots, X_n)$

Bootstrap world: $\widehat{F}_n \Longrightarrow X_1^*, \ldots, X_n^* \Longrightarrow T_n^* = g(X_1^*, \ldots, X_n^*)$

$$ \mathbb{V}_F(T_n) \;\underbrace{\approx}_{O(1/\sqrt{n})}\; \mathbb{V}_{\widehat{F}_n}(T_n) \;\underbrace{\approx}_{O(1/\sqrt{B})}\; v_{\mathrm{boot}}. $$
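Steps 1 through 4 translate directly into code. Here is a minimal R sketch; the function bootstrap_var and the default B are illustrative choices, not from the text.

# Bootstrap variance estimate v_boot of a statistic g, following steps 1-4 above.
bootstrap_var <- function(x, g, B = 1000) {
  n <- length(x)
  Tstar <- replicate(B, {
    xstar <- sample(x, n, replace = TRUE)  # step 1: draw X1*,...,Xn* from F_n-hat
    g(xstar)                               # step 2: T_n* = g(X1*,...,Xn*)
  })
  mean((Tstar - mean(Tstar))^2)            # step 4: formula (3.11)
}

# Estimated standard error, e.g. for the sample median:
# se_boot <- sqrt(bootstrap_var(x, median))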

How do we simulate from Fn ? Since Fn gives probability 1/n to each data
point, drawing n points at random from Fn is the same as drawing a

Bootstrap for the Median

Given data X = (X(1), ..., X(n)):

n <- length(X)
T <- median(X)
Tboot <- rep(0, B)                       # vector of length B
for (i in 1:B) {
  Xstar <- sample(X, n, replace = TRUE)  # sample of size n from X, with replacement
  Tboot[i] <- median(Xstar)
}
se <- sqrt(var(Tboot))

FIGURE 3.1. Pseudo-code for bootstrapping the median.

sample of size n with replacement from the original data. Therefore


step 1 can be replaced by:

1. Draw X1∗ , . . . , Xn∗ with replacement from X1 , . . . , Xn .

3.12 Example. Figure 3.1 shows pseudo-code for using the bootstrap to estimate the standard error of the median. □

The bootstrap can be used to approximate the cdf of a statistic $T_n$. Let $G_n(t) = \mathbb{P}(T_n \le t)$ be the cdf of $T_n$. The bootstrap approximation to $G_n$ is

$$ \widehat{G}_n^*(t) = \frac{1}{B}\sum_{b=1}^{B} I\bigl(T_{n,b}^* \le t\bigr). \tag{3.13} $$
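In R, (3.13) is simply the empirical cdf of the bootstrap replications; assuming a vector Tstar of replications as in the sketch above:

# Bootstrap cdf approximation (3.13): the empirical cdf of T*_{n,1}, ..., T*_{n,B}.
Gn.star <- ecdf(Tstar)   # Gn.star(t) equals mean(Tstar <= t)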

3.3 Parametric Bootstrap


So far, we have estimated $F$ nonparametrically. There is also a parametric bootstrap. If $F_\theta$ depends on a parameter $\theta$ and $\widehat{\theta}$ is an estimate of $\theta$, then we simply sample from $F_{\widehat{\theta}}$ instead of $\widehat{F}_n$. This is just as accurate as, but much simpler than, the delta method.

3.14 Example. When applied to the nerve data, the bootstrap, based on $B = 1000$ replications, yields a standard error for the estimated skewness of .16, which is nearly identical to the jackknife estimate. □
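As a sketch of the idea, assuming purely for illustration a Normal model fitted by its sample mean and standard deviation (the function name is not from the text):

# Parametric bootstrap standard error: estimate theta, then resample from F_theta-hat.
parametric_boot_se <- function(x, g, B = 1000) {
  n       <- length(x)
  mu.hat  <- mean(x)                      # fitted Normal model (illustrative choice)
  sig.hat <- sd(x)
  Tstar   <- replicate(B, g(rnorm(n, mu.hat, sig.hat)))  # draw from F_theta-hat, not F_n-hat
  sd(Tstar)
}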

3.4 Bootstrap Confidence Intervals


There are several ways to construct bootstrap confidence intervals. They vary
in ease of calculation and accuracy.
Normal Interval. The simplest is the Normal interval

$$ T_n \pm z_{\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}} $$

where $\widehat{\mathrm{se}}_{\mathrm{boot}}$ is the bootstrap estimate of the standard error. This interval is not accurate unless the distribution of $T_n$ is close to Normal.

Pivotal Intervals. Let $\theta = T(F)$ and $\widehat{\theta}_n = T(\widehat{F}_n)$ and define the pivot $R_n = \widehat{\theta}_n - \theta$. Let $H(r)$ denote the cdf of the pivot:

$$ H(r) = \mathbb{P}_F(R_n \le r). $$

Let $C_n = (a, b)$ where

$$ a = \widehat{\theta}_n - H^{-1}\!\left(1 - \frac{\alpha}{2}\right) \quad\text{and}\quad b = \widehat{\theta}_n - H^{-1}\!\left(\frac{\alpha}{2}\right). $$

It follows that

$$
\begin{aligned}
\mathbb{P}(a \le \theta \le b) &= \mathbb{P}\bigl(\widehat{\theta}_n - b \le R_n \le \widehat{\theta}_n - a\bigr)\\
&= H(\widehat{\theta}_n - a) - H(\widehat{\theta}_n - b)\\
&= H\!\left(H^{-1}\!\left(1 - \frac{\alpha}{2}\right)\right) - H\!\left(H^{-1}\!\left(\frac{\alpha}{2}\right)\right)\\
&= 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha.
\end{aligned}
$$

Hence, $C_n$ is an exact $1-\alpha$ confidence interval for $\theta$. Unfortunately, $a$ and $b$ depend on the unknown distribution $H$ but we can form a bootstrap estimate of $H$:

$$ \widehat{H}(r) = \frac{1}{B}\sum_{b=1}^{B} I\bigl(R_{n,b}^* \le r\bigr) $$

where $R_{n,b}^* = \widehat{\theta}_{n,b}^* - \widehat{\theta}_n$. Let $r_\beta^*$ denote the $\beta$ sample quantile of $(R_{n,1}^*, \ldots, R_{n,B}^*)$ and let $\theta_\beta^*$ denote the $\beta$ sample quantile of $(\theta_{n,1}^*, \ldots, \theta_{n,B}^*)$. Note that $r_\beta^* = \theta_\beta^* - \widehat{\theta}_n$. It follows that an approximate $1-\alpha$ confidence interval is $C_n = (\widehat{a}, \widehat{b})$ where

$$
\begin{aligned}
\widehat{a} &= \widehat{\theta}_n - \widehat{H}^{-1}\!\left(1 - \frac{\alpha}{2}\right) = \widehat{\theta}_n - r^*_{1-\alpha/2} = 2\widehat{\theta}_n - \theta^*_{1-\alpha/2}\\
\widehat{b} &= \widehat{\theta}_n - \widehat{H}^{-1}\!\left(\frac{\alpha}{2}\right) = \widehat{\theta}_n - r^*_{\alpha/2} = 2\widehat{\theta}_n - \theta^*_{\alpha/2}.
\end{aligned}
$$

In summary:

The $1-\alpha$ bootstrap pivotal confidence interval is

$$ C_n = \left(2\widehat{\theta}_n - \widehat{\theta}^*_{((1-\alpha/2)B)},\; 2\widehat{\theta}_n - \widehat{\theta}^*_{((\alpha/2)B)}\right). \tag{3.15} $$

Typically, this is a pointwise, asymptotic confidence interval.
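A short R sketch of (3.15), assuming the bootstrap replications of the estimator have already been computed (the argument names are illustrative):

# Pivotal interval (3.15) from bootstrap replications theta.star of the estimate theta.hat.
pivotal_ci <- function(theta.hat, theta.star, alpha = 0.05) {
  unname(c(2 * theta.hat - quantile(theta.star, 1 - alpha / 2),   # lower endpoint
           2 * theta.hat - quantile(theta.star, alpha / 2)))      # upper endpoint
}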

The next result follows from Theorem 3.21.

3.16 Theorem. If T (F ) is Hadamard differentiable and Cn is given in (3.15)


then PF (T (F ) ∈ Cn ) → 1 − α.

Studentized Pivotal Interval. There is a different version of the pivotal interval which has some advantages. Let

$$ Z_n = \frac{T_n - \theta}{\widehat{\mathrm{se}}_{\mathrm{boot}}} $$

and

$$ Z_{n,b}^* = \frac{T_{n,b}^* - T_n}{\widehat{\mathrm{se}}_b^*} $$

where $\widehat{\mathrm{se}}_b^*$ is an estimate of the standard error of $T_{n,b}^*$, not $T_n$. Now we reason as in the pivotal interval. The sample quantiles of the bootstrap quantities $Z_{n,1}^*, \ldots, Z_{n,B}^*$ should approximate the true quantiles of the distribution of $Z_n$. Let $z_\alpha^*$ denote the $\alpha$ sample quantile of $Z_{n,1}^*, \ldots, Z_{n,B}^*$; then $\mathbb{P}(Z_n \le z_\alpha^*) \approx \alpha$. Let

$$ C_n = \left(T_n - z^*_{1-\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}},\; T_n - z^*_{\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}}\right). $$

Then,

$$
\begin{aligned}
\mathbb{P}(\theta \in C_n) &= \mathbb{P}\left(T_n - z^*_{1-\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}} \le \theta \le T_n - z^*_{\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}}\right)\\
&= \mathbb{P}\left(z^*_{\alpha/2} \le \frac{T_n - \theta}{\widehat{\mathrm{se}}_{\mathrm{boot}}} \le z^*_{1-\alpha/2}\right)\\
&= \mathbb{P}\left(z^*_{\alpha/2} \le Z_n \le z^*_{1-\alpha/2}\right)\\
&\approx 1 - \alpha.
\end{aligned}
$$

This interval has higher accuracy than all the intervals discussed so far (see Section 3.5) but there is a catch: you need to compute $\widehat{\mathrm{se}}_b^*$ for each bootstrap sample. This may require doing a second bootstrap within each bootstrap.

The $1-\alpha$ bootstrap studentized pivotal interval is

$$ \left(T_n - z^*_{1-\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}},\; T_n - z^*_{\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}}\right) $$

where $z_\beta^*$ is the $\beta$ quantile of $Z_{n,1}^*, \ldots, Z_{n,B}^*$ and

$$ Z_{n,b}^* = \frac{T_{n,b}^* - T_n}{\widehat{\mathrm{se}}_b^*}. $$

Percentile Intervals. The bootstrap percentile interval is defined by

$$ C_n = \left(T^*_{(B\alpha/2)},\; T^*_{(B(1-\alpha/2))}\right), $$

that is, just use the $\alpha/2$ and $1-\alpha/2$ quantiles of the bootstrap sample. The justification for this interval is as follows. Suppose there exists a monotone transformation $U = m(T)$ such that $U \sim N(\phi, c^2)$ where $\phi = m(\theta)$. We do not suppose we know the transformation, only that one exists. Let $U_b^* = m(T_b^*)$. Note that $U^*_{(B\alpha/2)} = m(T^*_{(B\alpha/2)})$ since a monotone transformation preserves quantiles. Since $U \sim N(\phi, c^2)$, the $\alpha/2$ quantile of $U$ is $\phi - z_{\alpha/2}c$. Hence, $U^*_{(B\alpha/2)} = \phi - z_{\alpha/2}c \approx U - z_{\alpha/2}c$ and $U^*_{(B(1-\alpha/2))} \approx U + z_{\alpha/2}c$. Therefore,

$$
\begin{aligned}
\mathbb{P}\bigl(T^*_{(B\alpha/2)} \le \theta \le T^*_{(B(1-\alpha/2))}\bigr) &= \mathbb{P}\bigl(m(T^*_{(B\alpha/2)}) \le m(\theta) \le m(T^*_{(B(1-\alpha/2))})\bigr)\\
&= \mathbb{P}\bigl(U^*_{(B\alpha/2)} \le \phi \le U^*_{(B(1-\alpha/2))}\bigr)\\
&\approx \mathbb{P}\bigl(U - cz_{\alpha/2} \le \phi \le U + cz_{\alpha/2}\bigr)\\
&= \mathbb{P}\left(-z_{\alpha/2} \le \frac{U - \phi}{c} \le z_{\alpha/2}\right)\\
&= 1 - \alpha.
\end{aligned}
$$

Amazingly, we never need to know m. Unfortunately, an exact normalizing


transformation will rarely exist but there may exist approximate normaliz-
ing transformations. This has led to an industry of adjusted percentile
methods, the most popular being the BCa interval (bias-corrected and ac-
celerated). We will not consider these intervals here.
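A minimal R sketch of the percentile interval from the bootstrap replications (names illustrative):

# Percentile interval: the alpha/2 and 1 - alpha/2 empirical quantiles of T*_1, ..., T*_B.
percentile_ci <- function(theta.star, alpha = 0.05) {
  quantile(theta.star, c(alpha / 2, 1 - alpha / 2))
}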

3.17 Example. For estimating the skewness of the nerve data, here are the
various confidence intervals.
Method 95% Interval
Normal (1.44, 2.09)
percentile (1.42, 2.03)
pivotal (1.48, 2.11)
studentized (1.45, 2.28)

The studentized interval requires some explanation. For each bootstrap replication we compute $\widehat{\theta}^*$ and we also need the standard error $\widehat{\mathrm{se}}^*$ of $\widehat{\theta}^*$. We could do a bootstrap within the bootstrap (called a double bootstrap) but this is computationally expensive. Instead, we computed $\widehat{\mathrm{se}}^*$ using the nonparametric delta method applied to the bootstrap sample as described in Example 3.10. □
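A hedged R sketch of that computation, reusing the skew and skew_se_influence functions from the earlier sketches for the statistic and the delta-method inner standard errors (all names are illustrative):

# Studentized pivotal interval for the skewness; each bootstrap replication's
# standard error comes from the nonparametric delta method rather than a double bootstrap.
studentized_ci <- function(x, B = 1000, alpha = 0.05) {
  n  <- length(x)
  Tn <- skew(x)
  boot <- replicate(B, {
    xstar <- sample(x, n, replace = TRUE)
    Tb    <- skew(xstar)
    c(Tb, (Tb - Tn) / skew_se_influence(xstar))   # T*_b and Z*_b
  })
  se.boot <- sd(boot[1, ])                        # bootstrap standard error of Tn
  z <- quantile(boot[2, ], c(1 - alpha / 2, alpha / 2))
  unname(c(Tn - z[1] * se.boot, Tn - z[2] * se.boot))
}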

3.5 Some Theory


Under certain conditions, $\widehat{G}_n^*$ is a consistent estimate of $G_n(t) = \mathbb{P}(T_n \le t)$. To make this precise, let $\mathbb{P}_{\widehat{F}_n}(\cdot)$ denote probability statements made from $\widehat{F}_n$, treating the original data $X_1, \ldots, X_n$ as fixed. Assume that $T_n = T(\widehat{F}_n)$ is some functional of $\widehat{F}_n$. Then,

$$ \widehat{G}_n^*(t) = \mathbb{P}_{\widehat{F}_n}\bigl(T(\widehat{F}_n^*) \le t\bigr) = \mathbb{P}_{\widehat{F}_n}\Bigl(\sqrt{n}\bigl(T(\widehat{F}_n^*) - T(\widehat{F}_n)\bigr) \le u\Bigr) \tag{3.18} $$

where $u = \sqrt{n}\bigl(t - T(\widehat{F}_n)\bigr)$. Consistency of the bootstrap can now be expressed as follows.

3.19 Theorem. Suppose that $\mathbb{E}(X_1^2) < \infty$. Let $T_n = g(\overline{X}_n)$ where $g$ is continuously differentiable at $\mu = \mathbb{E}(X_1)$ and $g'(\mu) \neq 0$. Then,

$$ \sup_u \left| \mathbb{P}_{\widehat{F}_n}\Bigl(\sqrt{n}\bigl(T(\widehat{F}_n^*) - T(\widehat{F}_n)\bigr) \le u\Bigr) - \mathbb{P}_F\Bigl(\sqrt{n}\bigl(T(\widehat{F}_n) - T(F)\bigr) \le u\Bigr) \right| \stackrel{\mathrm{a.s.}}{\longrightarrow} 0. \tag{3.20} $$

3.21 Theorem. Suppose that $T(F)$ is Hadamard differentiable with respect to $d(F,G) = \sup_x |F(x) - G(x)|$ and that $0 < \int L_F^2(x)\,dF(x) < \infty$. Then,

$$ \sup_u \left| \mathbb{P}_{\widehat{F}_n}\Bigl(\sqrt{n}\bigl(T(\widehat{F}_n^*) - T(\widehat{F}_n)\bigr) \le u\Bigr) - \mathbb{P}_F\Bigl(\sqrt{n}\bigl(T(\widehat{F}_n) - T(F)\bigr) \le u\Bigr) \right| \stackrel{\mathbb{P}}{\longrightarrow} 0. \tag{3.22} $$

Look closely at Theorems 3.19 and 3.21. It is because of results like these
that the bootstrap “works.” In particular, the validity of bootstrap confidence
intervals depends on these theorems. See, for example, Theorem 3.16. There
is a tendency to treat the bootstrap as a panacea for all problems. But the
bootstrap requires regularity conditions to yield valid answers. It should not
be applied blindly.

It can also be shown that the bootstrap variance estimate is consistent


with some conditions on T . Generally, the conditions for consistency of the
bootstrap are weaker than those for the jackknife. For example, the bootstrap
estimate of the variance of the median is consistent, but the jackknife estimate
of the variance of the median is not consistent (Theorem 3.7).
Let us now compare the accuracy of the different confidence interval methods. Consider a $1-\alpha$ one-sided interval $[\widehat{\theta}_\alpha, \infty)$. We would like $\mathbb{P}(\theta \le \widehat{\theta}_\alpha) = \alpha$ but usually this holds only approximately. If $\mathbb{P}(\theta \le \widehat{\theta}_\alpha) = \alpha + O(n^{-1/2})$ then we say that the interval is first-order accurate. If $\mathbb{P}(\theta \le \widehat{\theta}_\alpha) = \alpha + O(n^{-1})$ then we say that the interval is second-order accurate. Here is the comparison:

Method Accuracy
Normal interval first-order accurate
basic pivotal interval first-order accurate
percentile interval first-order accurate
studentized pivotal interval second-order accurate
adjusted percentile interval second-order accurate

Here is an explanation of why the studentized interval is more accurate.


See Davison and Hinkley (1997), and Hall (1992a), for more details. Let $Z_n = \sqrt{n}(T_n - \theta)/\sigma$ be a standardized quantity that converges to a standard Normal. Thus $\mathbb{P}_F(Z_n \le z) \to \Phi(z)$. In fact,

$$ \mathbb{P}_F(Z_n \le z) = \Phi(z) + \frac{1}{\sqrt{n}}\,a(z)\phi(z) + O\!\left(\frac{1}{n}\right) \tag{3.23} $$

for some polynomial $a$ involving things like skewness. The bootstrap version satisfies

$$ \mathbb{P}_{\widehat{F}_n}(Z_n^* \le z) = \Phi(z) + \frac{1}{\sqrt{n}}\,\widehat{a}(z)\phi(z) + O_P\!\left(\frac{1}{n}\right) \tag{3.24} $$

where $\widehat{a}(z) - a(z) = O_P(n^{-1/2})$. Subtracting, we get

$$ \mathbb{P}_F(Z_n \le z) - \mathbb{P}_{\widehat{F}_n}(Z_n^* \le z) = O_P\!\left(\frac{1}{n}\right). \tag{3.25} $$

Now suppose we work with the nonstudentized quantity $V_n = \sqrt{n}(T_n - \theta)$. Then,

$$
\begin{aligned}
\mathbb{P}_F(V_n \le z) &= \mathbb{P}_F\!\left(\frac{V_n}{\sigma} \le \frac{z}{\sigma}\right)\\
&= \Phi\!\left(\frac{z}{\sigma}\right) + \frac{1}{\sqrt{n}}\,b\!\left(\frac{z}{\sigma}\right)\phi\!\left(\frac{z}{\sigma}\right) + O\!\left(\frac{1}{n}\right)
\end{aligned}
$$

for some polynomial $b$. For the bootstrap we have

$$
\begin{aligned}
\mathbb{P}_{\widehat{F}_n}(V_n^* \le z) &= \mathbb{P}_{\widehat{F}_n}\!\left(\frac{V_n^*}{\widehat{\sigma}} \le \frac{z}{\widehat{\sigma}}\right)\\
&= \Phi\!\left(\frac{z}{\widehat{\sigma}}\right) + \frac{1}{\sqrt{n}}\,\widehat{b}\!\left(\frac{z}{\widehat{\sigma}}\right)\phi\!\left(\frac{z}{\widehat{\sigma}}\right) + O_P\!\left(\frac{1}{n}\right)
\end{aligned}
$$

where $\widehat{\sigma} = \sigma + O_P(n^{-1/2})$. Subtracting, we get

$$ \mathbb{P}_F(V_n \le z) - \mathbb{P}_{\widehat{F}_n}(V_n^* \le z) = O_P\!\left(\frac{1}{\sqrt{n}}\right) \tag{3.26} $$

which is less accurate than (3.25).

3.6 Bibliographic Remarks


The jackknife was invented by Quenouille (1949) and Tukey (1958). The boot-
strap was invented by Efron (1979). There are several books on these top-
ics including Efron and Tibshirani (1993), Davison and Hinkley (1997), Hall
(1992a) and Shao and Tu (1995). Also, see Section 3.6 of van der Vaart and
Wellner (1996).

3.7 Appendix
The book by Shao and Tu (1995) gives an explanation of the techniques for proving consistency of the jackknife and the bootstrap. Following Section 3.1 of their text, let us look at two ways of showing that the bootstrap is consistent for the case $T_n = \overline{X}_n = n^{-1}\sum_{i=1}^{n} X_i$. Let $X_1, \ldots, X_n \sim F$ and let $T_n = \sqrt{n}(\overline{X}_n - \mu)$ where $\mu = \mathbb{E}(X_1)$. Let $H_n(t) = \mathbb{P}_F(T_n \le t)$ and let $\widehat{H}_n(t) = \mathbb{P}_{\widehat{F}_n}(T_n^* \le t)$ be the bootstrap estimate of $H_n$ where $T_n^* = \sqrt{n}(\overline{X}_n^* - \overline{X}_n)$ and $X_1^*, \ldots, X_n^* \sim \widehat{F}_n$. Our goal is to show that $\sup_x |H_n(x) - \widehat{H}_n(x)| \stackrel{\mathrm{a.s.}}{\longrightarrow} 0$.
The first method was used by Bickel and Freedman (1981) and is based on Mallows' metric. If $X$ and $Y$ are random variables with distributions $F$ and $G$, Mallows' metric is defined by $d_r(F,G) = d_r(X,Y) = \inf\,(\mathbb{E}|X - Y|^r)^{1/r}$ where the infimum is over all joint distributions with marginals $F$ and $G$. Here are some facts about $d_r$. Let $X_n \sim F_n$ and $X \sim F$. Then, $d_r(F_n, F) \to 0$ if and only if $X_n \rightsquigarrow X$ and $\int |x|^r\,dF_n(x) \to \int |x|^r\,dF(x)$. If $\mathbb{E}(|X_1|^r) < \infty$ then $d_r(\widehat{F}_n, F) \stackrel{\mathrm{a.s.}}{\longrightarrow} 0$. For any constant $a$, $d_r(aX, aY) = |a|\,d_r(X,Y)$. If $\mathbb{E}(X^2) < \infty$ and $\mathbb{E}(Y^2) < \infty$ then $d_2(X,Y)^2 = d_2\bigl(X - \mathbb{E}(X),\, Y - \mathbb{E}(Y)\bigr)^2 + |\mathbb{E}(X - Y)|^2$. If $\mathbb{E}(X_j) = \mathbb{E}(Y_j)$ and $\mathbb{E}(|X_j|^r) < \infty$, $\mathbb{E}(|Y_j|^r) < \infty$ then

$$ d_2\!\left(\sum_{j=1}^{m} X_j,\; \sum_{j=1}^{m} Y_j\right)^2 \le \sum_{j=1}^{m} d_2(X_j, Y_j)^2. $$

Using the properties of $d_r$ we have

$$
\begin{aligned}
d_2(\widehat{H}_n, H_n) &= d_2\bigl(\sqrt{n}(\overline{X}_n^* - \overline{X}_n),\; \sqrt{n}(\overline{X}_n - \mu)\bigr)\\
&= \frac{1}{\sqrt{n}}\, d_2\!\left(\sum_{i=1}^{n}(X_i^* - \overline{X}_n),\; \sum_{i=1}^{n}(X_i - \mu)\right)\\
&\le \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_2\bigl(X_i^* - \overline{X}_n,\; X_i - \mu\bigr)^2}\\
&= d_2\bigl(X_1^* - \overline{X}_n,\; X_1 - \mu\bigr)\\
&= \sqrt{d_2(X_1^*, X_1)^2 - (\mu - \mathbb{E}^* X_1^*)^2}\\
&= \sqrt{d_2(\widehat{F}_n, F)^2 - (\mu - \overline{X}_n)^2}\\
&\stackrel{\mathrm{a.s.}}{\longrightarrow} 0
\end{aligned}
$$

since $d_2(\widehat{F}_n, F) \stackrel{\mathrm{a.s.}}{\longrightarrow} 0$ and $\overline{X}_n \stackrel{\mathrm{a.s.}}{\longrightarrow} \mu$. Hence, $\sup_x |H_n(x) - \widehat{H}_n(x)| \stackrel{\mathrm{a.s.}}{\longrightarrow} 0$.
The second method, due to Singh (1981), uses the Berry–Esséen bound (1.28) which we now review. Let $X_1, \ldots, X_n$ be iid with finite mean $\mu = \mathbb{E}(X_1)$, variance $\sigma^2 = \mathbb{V}(X_1)$ and third moment $\mathbb{E}|X_1|^3 < \infty$. Let $Z_n = \sqrt{n}(\overline{X}_n - \mu)/\sigma$. Then

$$ \sup_z |\mathbb{P}(Z_n \le z) - \Phi(z)| \le \frac{33}{4}\,\frac{\mathbb{E}|X_1 - \mu|^3}{\sqrt{n}\,\sigma^3}. \tag{3.27} $$

Let $Z_n^* = \sqrt{n}(\overline{X}_n^* - \overline{X}_n)/\widehat{\sigma}$ where $\widehat{\sigma}^2 = n^{-1}\sum_{i=1}^{n}(X_i - \overline{X}_n)^2$. Replacing $F$ with $\widehat{F}_n$ and $\overline{X}_n$ with $\overline{X}_n^*$ we get

$$ \sup_z |\mathbb{P}_{\widehat{F}_n}(Z_n^* \le z) - \Phi(z)| \le \frac{33}{4}\,\frac{\sum_{i=1}^{n}|X_i - \overline{X}_n|^3}{n^{3/2}\,\widehat{\sigma}^3}. \tag{3.28} $$
Let $d(F,G) = \sup_x |F(x) - G(x)|$ and define $\Phi_a(x) = \Phi(x/a)$. Then

$$
\begin{aligned}
\sup_z |\mathbb{P}_{\widehat{F}_n}(Z_n^* \le z) - \Phi(z)| &= \sup_z \left|\mathbb{P}_{\widehat{F}_n}\bigl(\sqrt{n}(\overline{X}_n^* - \overline{X}_n) \le z\widehat{\sigma}\bigr) - \Phi\!\left(\frac{z\widehat{\sigma}}{\widehat{\sigma}}\right)\right|\\
&= \sup_t \left|\mathbb{P}_{\widehat{F}_n}\bigl(\sqrt{n}(\overline{X}_n^* - \overline{X}_n) \le t\bigr) - \Phi_{\widehat{\sigma}}(t)\right|\\
&= d(\widehat{H}_n, \Phi_{\widehat{\sigma}}).
\end{aligned}
$$

By the triangle inequality

$$ d(\widehat{H}_n, H_n) \le d(\widehat{H}_n, \Phi_{\widehat{\sigma}}) + d(\Phi_{\widehat{\sigma}}, \Phi_\sigma) + d(\Phi_\sigma, H_n). \tag{3.29} $$

The third term in (3.29) goes to 0 by the central limit theorem. For the second term, $d(\Phi_{\widehat{\sigma}}, \Phi_\sigma) \stackrel{\mathrm{a.s.}}{\longrightarrow} 0$ since $\widehat{\sigma}^2 \stackrel{\mathrm{a.s.}}{\longrightarrow} \sigma^2 = \mathbb{V}(X_1)$. The first term is bounded by the right-hand side of (3.28). Since $\mathbb{E}(X_1^2) < \infty$, this goes to 0 by the following result: if $\mathbb{E}|X_1|^\delta < \infty$ for some $0 < \delta < 1$ then $n^{-1/\delta}\sum_{i=1}^{n} |X_i| \stackrel{\mathrm{a.s.}}{\longrightarrow} 0$. In conclusion, $d(\widehat{H}_n, H_n) \stackrel{\mathrm{a.s.}}{\longrightarrow} 0$.

3.8 Exercises

1. Let $T(F) = \int (x-\mu)^3\,dF(x)/\sigma^3$ be the skewness. Find the influence function.

2. The following data were used to illustrate the bootstrap by Bradley Efron, the inventor of the bootstrap. The data are LSAT scores (for entrance to law school) and GPA.

LSAT: 576 635 558 578 666 580 555 661 651 605 653 575 545 572 594

GPA: 3.39 3.30 2.81 3.03 3.44 3.07 3.00 3.43 3.36 3.13 3.12 2.74 2.76 2.88 2.96

Each data point is of the form $X_i = (Y_i, Z_i)$ where $Y_i = \mathrm{LSAT}_i$ and $Z_i = \mathrm{GPA}_i$. Find the plug-in estimate of the correlation coefficient. Estimate the standard error using (i) the influence function, (ii) the jackknife and (iii) the bootstrap. Next compute a 95 percent studentized pivotal bootstrap confidence interval. You will need to compute the standard error of $T^*$ for every bootstrap sample.
3. Let $T_n = \overline{X}_n^2$, $\mu = \mathbb{E}(X_1)$, $\alpha_k = \int |x-\mu|^k\,dF(x)$ and $\widehat{\alpha}_k = n^{-1}\sum_{i=1}^{n}|X_i - \overline{X}_n|^k$. Show that

$$ v_{\mathrm{boot}} = \frac{4\overline{X}_n^2\,\widehat{\alpha}_2}{n} + \frac{4\overline{X}_n\,\widehat{\alpha}_3}{n^2} + \frac{\widehat{\alpha}_4}{n^3}. $$

4. Prove Theorem 3.16.



5. Repeat the calculations in Example 3.17 but use a parametric bootstrap. Assume that the data are log-Normal. That is, assume that $Y \sim N(\mu, \sigma^2)$ where $Y = \log X$. You will draw bootstrap samples $Y_1^*, \ldots, Y_n^*$ from $N(\widehat{\mu}, \widehat{\sigma}^2)$. Then set $X_i^* = e^{Y_i^*}$.

6. (Computer experiment.) Conduct a simulation to compare the four bootstrap confidence interval methods. Let $n = 50$ and let $T(F) = \int (x-\mu)^3\,dF(x)/\sigma^3$ be the skewness. Draw $Y_1, \ldots, Y_n \sim N(0,1)$ and set $X_i = e^{Y_i}$, $i = 1, \ldots, n$. Construct the four types of bootstrap 95 percent intervals for $T(F)$ from the data $X_1, \ldots, X_n$. Repeat this whole thing many times and estimate the true coverage of the four intervals.

7. Let $X_1, \ldots, X_n \sim t_3$ where $n = 25$. Let $\theta = T(F) = (q_{.75} - q_{.25})/1.34$ where $q_p$ denotes the
pth quantile. Do a simulation to compare the coverage and length of the
following confidence intervals for θ: (i) Normal interval with standard
error from the jackknife, (ii) Normal interval with standard error from
the bootstrap, (iii) bootstrap percentile interval.
Remark: The jackknife does not give a consistent estimator of the vari-
ance of a quantile.

8. Let X1 , . . . , Xn be distinct observations (no ties). Show that there are

$$ \binom{2n-1}{n} $$
distinct bootstrap samples.
Hint: Imagine putting n balls into n buckets.

9. Let $X_1, \ldots, X_n$ be distinct observations (no ties). Let $X_1^*, \ldots, X_n^*$ denote a bootstrap sample and let $\overline{X}_n^* = n^{-1}\sum_{i=1}^{n} X_i^*$. Find: $\mathbb{E}(\overline{X}_n^*\,|\,X_1, \ldots, X_n)$, $\mathbb{V}(\overline{X}_n^*\,|\,X_1, \ldots, X_n)$, $\mathbb{E}(\overline{X}_n^*)$ and $\mathbb{V}(\overline{X}_n^*)$.

10. (Computer experiment.) Let $X_1, \ldots, X_n \sim \mathrm{Normal}(\mu, 1)$. Let $\theta = e^\mu$ and let $\widehat{\theta} = e^{\overline{X}}$ be the mle. Create a data set (using $\mu = 5$) consisting of $n = 100$ observations.
(a) Use the delta method to get the se and 95 percent confidence in-
terval for θ. Use the parametric bootstrap to get the se and 95 percent
confidence interval for θ. Use the nonparametric bootstrap to get the se
and 95 percent confidence interval for θ. Compare your answers.

(b) Plot a histogram of the bootstrap replications for the parametric and nonparametric bootstraps. These are estimates of the distribution of $\widehat{\theta}$. The delta method also gives an approximation to this distribution, namely, $\mathrm{Normal}(\widehat{\theta}, \widehat{\mathrm{se}}^2)$. Compare these to the true sampling distribution of $\widehat{\theta}$. Which approximation (parametric bootstrap, bootstrap or delta method) is closer to the true distribution?

11. Let $X_1, \ldots, X_n \sim \mathrm{Uniform}(0, \theta)$. The mle is $\widehat{\theta} = X_{\max} = \max\{X_1, \ldots, X_n\}$. Generate a data set of size 50 with $\theta = 1$.

(a) Find the distribution of $\widehat{\theta}$. Compare the true distribution of $\widehat{\theta}$ to the histograms from the parametric and nonparametric bootstraps.

(b) This is a case where the nonparametric bootstrap does very poorly. In fact, we can prove that this is the case. Show that, for the parametric bootstrap, $\mathbb{P}(\widehat{\theta}^* = \widehat{\theta}) = 0$, but for the nonparametric bootstrap, $\mathbb{P}(\widehat{\theta}^* = \widehat{\theta}) \approx .632$. Hint: Show that $\mathbb{P}(\widehat{\theta}^* = \widehat{\theta}) = 1 - (1 - (1/n))^n$. Then take the limit as $n$ gets large.

12. Suppose that 50 people are given a placebo and 50 are given a new
treatment. Thirty placebo patients show improvement, while 40 treated
patients show improvement. Let τ = p2 − p1 where p2 is the probability
of improving under treatment and p1 is the probability of improving
under placebo.
(a) Find the mle of τ . Find the standard error and 90 percent confidence
interval using the delta method.
(b) Find the standard error and 90 percent confidence interval using the
bootstrap.

13. Let $X_1, \ldots, X_n \sim F$ be iid and let $X_1^*, \ldots, X_n^*$ be a bootstrap sample from $\widehat{F}_n$. Let $G$ denote the marginal distribution of $X_i^*$. Note that $G(x) = \mathbb{P}(X_i^* \le x) = \mathbb{E}\bigl[\mathbb{P}(X_i^* \le x\,|\,X_1, \ldots, X_n)\bigr] = \mathbb{E}(\widehat{F}_n(x)) = F(x)$. So it appears that $X_i^*$ and $X_i$ have the same distribution. But in Exercise 9 we showed that $\mathbb{V}(\overline{X}_n^*) \neq \mathbb{V}(\overline{X}_n)$. This appears to be a contradiction. Explain.
