The Bootstrap and the Jackknife
The bootstrap and the jackknife are nonparametric methods for computing
standard errors and confidence intervals. The jackknife is less computationally
expensive, but the bootstrap has some statistical advantages.
$$\mathrm{bias}\left(T_{(-i)}\right) = \frac{a}{n-1} + \frac{b}{(n-1)^2} + O\left(\frac{1}{n^3}\right). \tag{3.3}$$
which shows that $b_{\mathrm{jack}}$ estimates the bias up to order $O(n^{-2})$. By a similar calculation,
$$\mathrm{bias}\left(T_{\mathrm{jack}}\right) = -\frac{b}{n(n-1)} + O\left(\frac{1}{n^2}\right) = O\left(\frac{1}{n^2}\right)$$
so the bias of $T_{\mathrm{jack}}$ is an order of magnitude smaller than that of $T_n$. $T_{\mathrm{jack}}$ can also be written as
$$T_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n} \widetilde{T}_i$$
where
$$\widetilde{T}_i = n T_n - (n-1) T_{(-i)}$$
are called pseudo-values.
The jackknife estimate of $\mathbb{V}(T_n)$ is
$$v_{\mathrm{jack}} = \frac{\widetilde{s}^2}{n} \tag{3.4}$$
where
$$\widetilde{s}^2 = \frac{\sum_{i=1}^{n}\left(\widetilde{T}_i - \frac{1}{n}\sum_{i=1}^{n}\widetilde{T}_i\right)^2}{n-1}$$
is the sample variance of the pseudo-values. Under suitable conditions on T ,
it can be shown that vjack consistently estimates V(Tn ). For example, if T is
a smooth function of the sample mean, then consistency holds.
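To make the recipe concrete, here is a minimal R sketch of $v_{\mathrm{jack}}$ computed from the pseudo-values. The function names jackknife_se and skew, the exponential data, and the choice of the sample skewness as the statistic are illustrative assumptions, not from the text:

jackknife_se <- function(x, stat) {
  n <- length(x)
  Tn <- stat(x)
  T_minus_i <- sapply(1:n, function(i) stat(x[-i]))  # leave-one-out values T_(-i)
  pseudo <- n * Tn - (n - 1) * T_minus_i             # pseudo-values
  sqrt(var(pseudo) / n)                              # v_jack = s^2 / n; return sqrt(v_jack)
}

# Example: jackknife standard error of the sample skewness.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
set.seed(1)
x <- rexp(100)          # hypothetical data
jackknife_se(x, skew)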
3.5 Theorem. Let $\mu = \mathbb{E}(X_1)$ and $\sigma^2 = \mathbb{V}(X_1) < \infty$ and suppose that $T_n = g(\overline{X}_n)$ where $g$ has a continuous, nonzero derivative at $\mu$. Then $(T_n - g(\mu))/\sqrt{\mathbb{V}(T_n)} \rightsquigarrow N(0,1)$ and $v_{\mathrm{jack}}/\mathbb{V}(T_n) \xrightarrow{\mathrm{a.s.}} 1$, so the jackknife is consistent.
Bootstrap variance estimation proceeds in four steps:

1. Draw $X_1^*, \ldots, X_n^* \sim \widehat{F}_n$.
2. Compute $T_n^* = g(X_1^*, \ldots, X_n^*)$.
3. Repeat steps 1 and 2, $B$ times, to get $T_{n,1}^*, \ldots, T_{n,B}^*$.
4. Let
$$v_{\mathrm{boot}} = \frac{1}{B}\sum_{b=1}^{B}\left(T_{n,b}^* - \frac{1}{B}\sum_{r=1}^{B} T_{n,r}^*\right)^2. \tag{3.11}$$
By the law of large numbers, $v_{\mathrm{boot}} \xrightarrow{\mathrm{a.s.}} \mathbb{V}_{\widehat{F}_n}(T_n)$ as $B \to \infty$. The estimated standard error of $T_n$ is $\widehat{\mathrm{se}}_{\mathrm{boot}} = \sqrt{v_{\mathrm{boot}}}$. The following diagram illustrates the bootstrap idea:
$$\mathbb{V}_F(T_n) \;\overbrace{\approx}^{O(1/\sqrt{n})}\; \mathbb{V}_{\widehat{F}_n}(T_n) \;\overbrace{\approx}^{O(1/\sqrt{B})}\; v_{\mathrm{boot}}.$$
How do we simulate from $\widehat{F}_n$? Since $\widehat{F}_n$ gives probability $1/n$ to each data point, drawing $n$ points at random from $\widehat{F}_n$ is the same as drawing a sample of size $n$ with replacement from the original data.
T <- median(X)
Tboot <- numeric(B)
for (i in 1:B) {
  Xstar <- sample(X, size = n, replace = TRUE)  # sample of size n from X, with replacement
  Tboot[i] <- median(Xstar)
}
se <- sqrt(var(Tboot))
3.12 Example. Figure 3.1 shows pseudo-code for using the bootstrap to esti-
mate the standard error of the median.
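A self-contained version of this computation in R might look as follows; the simulated data and the choice $B = 1000$ are arbitrary:

set.seed(1)                                   # for reproducibility
X <- rexp(100)                                # hypothetical data; any sample works
B <- 1000
Tboot <- replicate(B, median(sample(X, replace = TRUE)))  # bootstrap replications
se_boot <- sqrt(var(Tboot))                   # bootstrap standard error of the median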
The bootstrap also estimates the cdf $G_n(t) = \mathbb{P}(T_n \leq t)$ by the empirical cdf of the replications:
$$\widehat{G}^*_n(t) = \frac{1}{B}\sum_{b=1}^{B} I\left(T^*_{n,b} \leq t\right). \tag{3.13}$$
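In R, (3.13) is simply the empirical cdf of the bootstrap replications; continuing the sketch above:

Gstar <- ecdf(Tboot)    # G*_n: empirical cdf of T*_{n,1}, ..., T*_{n,B}
Gstar(1)                # estimated P(T*_n <= 1); the evaluation point is arbitrary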
3.14 Example. When applied to the nerve data, the bootstrap, based on $B = 1000$ replications, yields a standard error of .16 for the estimated skewness, which is nearly identical to the jackknife estimate.
Normal Interval. The simplest is the Normal interval
$$T_n \pm z_{\alpha/2}\, \widehat{\mathrm{se}}_{\mathrm{boot}}.$$
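Continuing the sketch above, the Normal interval is a one-liner; alpha = 0.05 is an arbitrary choice:

alpha <- 0.05
Tn <- median(X)
c(Tn - qnorm(1 - alpha/2) * se_boot,          # lower limit
  Tn + qnorm(1 - alpha/2) * se_boot)          # upper limit: Tn +/- z_{alpha/2} se_boot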
Pivotal Intervals. Let $\theta = T(F)$ and $\widehat{\theta}_n = T(\widehat{F}_n)$ and define the pivot $R_n = \widehat{\theta}_n - \theta$. Let $H(r)$ denote the cdf of the pivot:
$$H(r) = \mathbb{P}_F(R_n \leq r).$$
The bootstrap estimate of $H$ is
$$\widehat{H}(r) = \frac{1}{B}\sum_{b=1}^{B} I\left(R^*_{n,b} \leq r\right)$$
where $R^*_{n,b} = \widehat{\theta}^*_{n,b} - \widehat{\theta}_n$. Let $r^*_\beta$ denote the $\beta$ sample quantile of $(R^*_{n,1}, \ldots, R^*_{n,B})$ and let $\theta^*_\beta$ denote the $\beta$ sample quantile of $(\widehat{\theta}^*_{n,1}, \ldots, \widehat{\theta}^*_{n,B})$. Note that $r^*_\beta = \theta^*_\beta - \widehat{\theta}_n$. It follows that an approximate $1-\alpha$ confidence interval is $C_n = (\widehat{a}, \widehat{b})$
where
$$\widehat{a} = \widehat{\theta}_n - \widehat{H}^{-1}\left(1 - \frac{\alpha}{2}\right) = \widehat{\theta}_n - r^*_{1-\alpha/2} = 2\widehat{\theta}_n - \theta^*_{1-\alpha/2}$$
$$\widehat{b} = \widehat{\theta}_n - \widehat{H}^{-1}\left(\frac{\alpha}{2}\right) = \widehat{\theta}_n - r^*_{\alpha/2} = 2\widehat{\theta}_n - \theta^*_{\alpha/2}.$$
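In R, with Tboot from the sketch above playing the role of $(\widehat{\theta}^*_{n,1}, \ldots, \widehat{\theta}^*_{n,B})$:

c(2 * Tn - quantile(Tboot, 1 - alpha/2),      # a-hat = 2*theta-hat - theta*_{1-alpha/2}
  2 * Tn - quantile(Tboot, alpha/2))          # b-hat = 2*theta-hat - theta*_{alpha/2}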
In summary, the $1-\alpha$ pivotal interval is $C_n = \left(2\widehat{\theta}_n - \theta^*_{1-\alpha/2},\; 2\widehat{\theta}_n - \theta^*_{\alpha/2}\right)$.

Studentized Pivotal Interval. There is a different version of the pivotal interval in which the pivot is studentized. Let
$$Z_n = \frac{T_n - \theta}{\widehat{\mathrm{se}}_{\mathrm{boot}}}$$
and
$$Z^*_{n,b} = \frac{T^*_{n,b} - T_n}{\widehat{\mathrm{se}}^*_b},$$
where $\widehat{\mathrm{se}}^*_b$ is an estimate of the standard error of $T^*_{n,b}$ computed from the $b$th bootstrap sample.
Let $z^*_\beta$ denote the $\beta$ sample quantile of $(Z^*_{n,1}, \ldots, Z^*_{n,B})$ and let $C_n = \left(T_n - z^*_{1-\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}},\; T_n - z^*_{\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}}\right)$. Then,
$$\begin{aligned}
\mathbb{P}(\theta \in C_n) &= \mathbb{P}\left(T_n - z^*_{1-\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}} \leq \theta \leq T_n - z^*_{\alpha/2}\,\widehat{\mathrm{se}}_{\mathrm{boot}}\right)\\
&= \mathbb{P}\left(z^*_{\alpha/2} \leq \frac{T_n - \theta}{\widehat{\mathrm{se}}_{\mathrm{boot}}} \leq z^*_{1-\alpha/2}\right)\\
&= \mathbb{P}\left(z^*_{\alpha/2} \leq Z_n \leq z^*_{1-\alpha/2}\right)\\
&\approx 1 - \alpha.
\end{aligned}$$
This interval has higher accuracy than all the intervals discussed so far (see Section 3.5), but there is a catch: you need to compute $\widehat{\mathrm{se}}^*_b$ for each bootstrap sample. This may require doing a second bootstrap within each bootstrap.
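A minimal sketch of the double bootstrap in R, continuing the example above; the inner replication count B2 = 25 is an arbitrary choice:

B2 <- 25                                      # inner bootstrap size (arbitrary)
Tstar <- numeric(B)
Zstar <- numeric(B)
for (b in 1:B) {
  Xstar <- sample(X, replace = TRUE)          # outer bootstrap sample
  Tstar[b] <- median(Xstar)
  inner <- replicate(B2, median(sample(Xstar, replace = TRUE)))
  Zstar[b] <- (Tstar[b] - Tn) / sqrt(var(inner))   # se*_b from the inner bootstrap
}
se_boot <- sqrt(var(Tstar))
c(Tn - quantile(Zstar, 1 - alpha/2) * se_boot,     # studentized pivotal interval
  Tn - quantile(Zstar, alpha/2) * se_boot)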
Percentile Intervals. The bootstrap percentile interval is defined by $C_n = \left(T^*_{(B\alpha/2)},\; T^*_{(B(1-\alpha/2))}\right)$; that is, just use the $\alpha/2$ and $1 - \alpha/2$ quantiles of the bootstrap sample. The justification for this interval is as follows. Suppose there exists a monotone transformation $U = m(T)$ such that $U \sim N(\phi, c^2)$ where $\phi = m(\theta)$. We do not suppose we know the transformation, only that one exists. Let $U^*_b = m(T^*_b)$.
Note that $U^*_{(B\alpha/2)} = m(T^*_{(B\alpha/2)})$ since a monotone transformation preserves quantiles. Since $U \sim N(\phi, c^2)$, the $\alpha/2$ quantile of $U$ is $\phi - z_{\alpha/2} c$. Hence, $U^*_{(B\alpha/2)} = \phi - z_{\alpha/2} c \approx U - z_{\alpha/2} c$ and $U^*_{(B(1-\alpha/2))} \approx U + z_{\alpha/2} c$. Therefore,
$$\begin{aligned}
\mathbb{P}\left(T^*_{(B\alpha/2)} \leq \theta \leq T^*_{(B(1-\alpha/2))}\right) &= \mathbb{P}\left(m(T^*_{(B\alpha/2)}) \leq m(\theta) \leq m(T^*_{(B(1-\alpha/2))})\right)\\
&= \mathbb{P}\left(U^*_{(B\alpha/2)} \leq \phi \leq U^*_{(B(1-\alpha/2))}\right)\\
&\approx \mathbb{P}\left(U - c z_{\alpha/2} \leq \phi \leq U + c z_{\alpha/2}\right)\\
&= \mathbb{P}\left(-z_{\alpha/2} \leq \frac{U - \phi}{c} \leq z_{\alpha/2}\right)\\
&= 1 - \alpha.
\end{aligned}$$
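In R, continuing the sketch above, the percentile interval is simply the pair of sample quantiles of the replications:

quantile(Tboot, c(alpha/2, 1 - alpha/2))      # percentile interval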
3.17 Example. For estimating the skewness of the nerve data, here are the
various confidence intervals.
Method 95% Interval
Normal (1.44, 2.09)
percentile (1.42, 2.03)
pivotal (1.48, 2.11)
studentized (1.45, 2.28)
3.5 Some Theory
$$\widehat{G}^*_n(t) = \mathbb{P}_{\widehat{F}_n}\left(T(\widehat{F}^*_n) \leq t\right) = \mathbb{P}_{\widehat{F}_n}\left(\sqrt{n}\left(T(\widehat{F}^*_n) - T(\widehat{F}_n)\right) \leq u\right) \tag{3.18}$$
where $u = \sqrt{n}\left(t - T(\widehat{F}_n)\right)$. Consistency of the bootstrap can now be expressed as follows.
3.19 Theorem. Suppose that $\mathbb{E}(X_1^2) < \infty$. Let $T_n = g(\overline{X}_n)$ where $g$ is continuously differentiable at $\mu = \mathbb{E}(X_1)$ with $g'(\mu) \neq 0$. Then,
$$\sup_u \left| \mathbb{P}_{\widehat{F}_n}\left(\sqrt{n}\left(T(\widehat{F}^*_n) - T(\widehat{F}_n)\right) \leq u\right) - \mathbb{P}_F\left(\sqrt{n}\left(T(\widehat{F}_n) - T(F)\right) \leq u\right) \right| \xrightarrow{\mathrm{a.s.}} 0. \tag{3.20}$$
Look closely at Theorems 3.19 and 3.21. It is because of results like these
that the bootstrap “works.” In particular, the validity of bootstrap confidence
intervals depends on these theorems. See, for example, Theorem 3.16. There
is a tendency to treat the bootstrap as a panacea for all problems. But the
bootstrap requires regularity conditions to yield valid answers. It should not
be applied blindly.
Method                         Accuracy
Normal interval                first-order accurate
basic pivotal interval         first-order accurate
percentile interval            first-order accurate
studentized pivotal interval   second-order accurate
adjusted percentile interval   second-order accurate
$$\begin{aligned}
\mathbb{P}_F(V_n^* \leq z) &= \mathbb{P}_F\left(\frac{V_n^*}{\widehat{\sigma}} \leq \frac{z}{\widehat{\sigma}}\right)\\
&= \Phi\left(\frac{z}{\widehat{\sigma}}\right) + \frac{1}{\sqrt{n}}\, b\left(\frac{z}{\widehat{\sigma}}\right) \phi\left(\frac{z}{\widehat{\sigma}}\right) + O_P\left(\frac{1}{n}\right)
\end{aligned}$$
and hence
$$\mathbb{P}_F(V_n \leq z) - \mathbb{P}_F(V_n^* \leq z) = O_P\left(\frac{1}{\sqrt{n}}\right). \tag{3.26}$$
3.7 Appendix
The book by Shao and Tu (1995) gives an explanation of the techniques for proving consistency of the jackknife and the bootstrap. Following Section 3.1 of their text, let us look at two ways of showing that the bootstrap is consistent for the case $T_n = \overline{X}_n = n^{-1}\sum_{i=1}^n X_i$. Let $X_1, \ldots, X_n \sim F$ and let $T_n = \sqrt{n}(\overline{X}_n - \mu)$ where $\mu = \mathbb{E}(X_1)$. Let $H_n(t) = \mathbb{P}_F(T_n \leq t)$ and let $\widehat{H}_n(t) = \mathbb{P}_{\widehat{F}_n}(T^*_n \leq t)$ be the bootstrap estimate of $H_n$, where $T^*_n = \sqrt{n}(\overline{X}^*_n - \overline{X}_n)$ and $X^*_1, \ldots, X^*_n \sim \widehat{F}_n$. Our goal is to show that $\sup_x |H_n(x) - \widehat{H}_n(x)| \xrightarrow{\mathrm{a.s.}} 0$.
The first method was used by Bickel and Freedman (1981) and is based on Mallows' metric. If $X$ and $Y$ are random variables with distributions $F$ and $G$, Mallows' metric is defined by $d_r(F, G) = d_r(X, Y) = \inf\left(\mathbb{E}|X - Y|^r\right)^{1/r}$, where the infimum is over all joint distributions with marginals $F$ and $G$. Here are some facts about $d_r$. Let $X_n \sim F_n$ and $X \sim F$. Then $d_r(F_n, F) \to 0$ if and only if $X_n \rightsquigarrow X$ and $\int |x|^r \, dF_n(x) \to \int |x|^r \, dF(x)$. If $\mathbb{E}(|X_1|^r) < \infty$ then $d_r(\widehat{F}_n, F) \xrightarrow{\mathrm{a.s.}} 0$. For any constant $a$, $d_r(aX, aY) = |a|\, d_r(X, Y)$. If $\mathbb{E}(X^2) < \infty$ and $\mathbb{E}(Y^2) < \infty$ then $d_2(X, Y)^2 = d_2(X - \mathbb{E}(X), Y - \mathbb{E}(Y))^2 + |\mathbb{E}(X - Y)|^2$. If $\mathbb{E}(X_j) = \mathbb{E}(Y_j)$ and $\mathbb{E}(|X_j|^r) < \infty$, $\mathbb{E}(|Y_j|^r) < \infty$ then
$$d_2\left(\sum_{j=1}^{m} X_j, \sum_{j=1}^{m} Y_j\right)^2 \leq \sum_{j=1}^{m} d_2(X_j, Y_j)^2.$$
The third term in (3.29) goes to 0 by the central limit theorem. For the second term, $d(\Phi_{\widehat{\sigma}}, \Phi_\sigma) \xrightarrow{\mathrm{a.s.}} 0$ since $\widehat{\sigma}^2 \xrightarrow{\mathrm{a.s.}} \sigma^2 = \mathbb{V}(X_1)$. The first term is bounded by the right-hand side of (3.28). Since $\mathbb{E}(X_1^2) < \infty$, this goes to 0 by the following result: if $\mathbb{E}|X_1|^\delta < \infty$ for some $0 < \delta < 1$ then $n^{-1/\delta} \sum_{i=1}^n |X_i| \xrightarrow{\mathrm{a.s.}} 0$. In conclusion, $d(\widehat{H}_n, H_n) \xrightarrow{\mathrm{a.s.}} 0$.
3.8 Exercises
1. Let $T(F) = \int (x - \mu)^3 \, dF(x)/\sigma^3$ be the skewness. Find the influence function.
7. Let $X_1, \ldots, X_n \sim t_3$ where $n = 25$. Let $\theta = T(F) = (q_{.75} - q_{.25})/1.34$ where $q_p$ denotes the $p$th quantile. Do a simulation to compare the coverage and length of the
following confidence intervals for θ: (i) Normal interval with standard
error from the jackknife, (ii) Normal interval with standard error from
the bootstrap, (iii) bootstrap percentile interval.
Remark: The jackknife does not give a consistent estimator of the vari-
ance of a quantile.
Show that there are
$$\binom{2n-1}{n}$$
distinct bootstrap samples.
Hint: Imagine putting $n$ balls into $n$ buckets.
$\widehat{\theta} = X_{\max} = \max\{X_1, \ldots, X_n\}.$
12. Suppose that 50 people are given a placebo and 50 are given a new
treatment. Thirty placebo patients show improvement, while 40 treated
patients show improvement. Let τ = p2 − p1 where p2 is the probability
of improving under treatment and p1 is the probability of improving
under placebo.
(a) Find the mle of τ . Find the standard error and 90 percent confidence
interval using the delta method.
(b) Find the standard error and 90 percent confidence interval using the
bootstrap.