The Bootstrap
11.1 Introduction
Most of this volume is devoted to parametric inference. In this chapter we depart from
the parametric framework and discuss a nonparametric technique called the bootstrap.
The bootstrap is a method for estimating the variance of an estimator and for finding
approximate confidence intervals for parameters. Although the method is nonparametric,
it can be used for inference about parameters in parametric and nonparametric models, which is why we include it in this volume.
We begin by broadening what we mean by a parameter. Consider a few examples.
In the first example, $\theta$ denotes the parameter of a parametric model. In the second and third examples, we are in a nonparametric situation; in these cases we think of a "parameter" as a function of the distribution $P$ and we write $\theta = T(P)$. The bootstrap can be used in both the parametric and nonparametric settings.
Let $P_n$ be the empirical distribution. This is the discrete distribution that puts mass $1/n$ at each data point $X_i$. Hence,
$$P_n(A) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \in A). \qquad (11.1)$$
In the nonparametric case, we will estimate the parameter $\theta = T(P)$ by $\hat\theta_n = T(P_n)$, which is called the plug-in estimator. For example, when $\theta = T(P) = \int x\, dP(x)$ is the mean, the plug-in estimator is
$$\hat\theta_n = T(P_n) = \int x\, dP_n(x) = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (11.2)$$
which is the sample mean.
A bootstrap sample is an i.i.d. sample
$$X_1^*, \ldots, X_n^* \sim P_n.$$
Bootstrap samples play an important role in what follows. Note that drawing an i.i.d. sample $X_1^*, \ldots, X_n^*$ from $P_n$ is equivalent to drawing $n$ observations, with replacement, from the original data $\{X_1, \ldots, X_n\}$. Thus, bootstrap sampling is often described as "resampling the data." This can be a bit confusing and we think it is much clearer to think of a bootstrap sample $X_1^*, \ldots, X_n^*$ as $n$ draws from the empirical distribution $P_n$.
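To make the equivalence concrete, here is a minimal sketch (assuming NumPy; the data and variable names are illustrative, not part of the text): drawing $n$ points from $P_n$ is exactly drawing $n$ points with replacement from the observed data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=25)                  # the observed data X_1, ..., X_n

# A bootstrap sample: n draws from the empirical distribution P_n,
# equivalently n draws with replacement from the original data.
x_star = rng.choice(x, size=x.size, replace=True)
```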
Now we give the bootstrap algorithms for estimating the variance of $\hat\theta_n$ and for constructing confidence intervals. The explanation of why (and when) the bootstrap gives valid estimates is deferred until Section 11.5. Let $\hat\theta_n = g(X_1, \ldots, X_n)$ denote some estimator.
Bootstrap Variance Estimation

1. Draw a bootstrap sample $X_1^*, \ldots, X_n^* \sim P_n$ and compute $\hat\theta_n^* = g(X_1^*, \ldots, X_n^*)$.

2. Repeat the previous step $B$ times, yielding estimates $\hat\theta^*_{n,1}, \ldots, \hat\theta^*_{n,B}$.

3. Compute
$$\hat s = \sqrt{\frac{1}{B}\sum_{j=1}^{B}\bigl(\hat\theta^*_{n,j} - \bar\theta\bigr)^2}$$
where $\bar\theta = \frac{1}{B}\sum_{j=1}^{B}\hat\theta^*_{n,j}$.

4. Output $\hat s$.
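The algorithm translates directly into code. The following sketch (NumPy assumed; the function name `bootstrap_se` and the choice of the sample median as the estimator are ours, purely for illustration) computes $\hat s$:

```python
import numpy as np

def bootstrap_se(x, estimator, B=10_000, rng=None):
    """Bootstrap estimate of the standard error of estimator(x) (the s-hat of step 3)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    theta_star = np.empty(B)
    for j in range(B):
        x_star = rng.choice(x, size=n, replace=True)   # step 1: draw from P_n
        theta_star[j] = estimator(x_star)              # step 2: collect B replicates
    return theta_star.std()   # step 3: sqrt of (1/B) sum_j (theta*_j - theta-bar)^2

rng = np.random.default_rng(1)
x = rng.standard_t(df=5, size=200)
print(bootstrap_se(x, np.median, B=2000))              # step 4: output s-hat
```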
The next theorem states that $\hat s^2$ approximates $\mathrm{Var}(\hat\theta_n)$. There are two sources of error in this approximation. The first is due to the fact that $n$ is finite and the second is due to the fact that $B$ is finite. However, we can make $B$ as large as we like. (In practice, it usually suffices to take $B = 10{,}000$.) So we ignore the error due to finite $B$.
Theorem 138. Under appropriate regularity conditions, $\dfrac{\hat s^2}{\mathrm{Var}(\hat\theta_n)} \stackrel{P}{\to} 1$ as $n \to \infty$.
Bootstrap Confidence Interval

1. Draw a bootstrap sample $X_1^*, \ldots, X_n^* \sim P_n$ and compute $\hat\theta_n^* = g(X_1^*, \ldots, X_n^*)$.

2. Repeat the previous step $B$ times, yielding estimates $\hat\theta^*_{n,1}, \ldots, \hat\theta^*_{n,B}$.

3. Let
$$\hat F(t) = \frac{1}{B}\sum_{j=1}^{B} I\Bigl(\sqrt{n}\bigl(\hat\theta^*_{n,j} - \hat\theta_n\bigr) \le t\Bigr).$$

4. Let
$$C_n = \left[\hat\theta_n - \frac{t_{1-\alpha/2}}{\sqrt{n}},\ \hat\theta_n - \frac{t_{\alpha/2}}{\sqrt{n}}\right]$$
where $t_{\alpha/2} = \hat F^{-1}(\alpha/2)$ and $t_{1-\alpha/2} = \hat F^{-1}(1-\alpha/2)$.

5. Output $C_n$.
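This interval is often called the pivotal (or basic) bootstrap interval. A sketch of the computation (NumPy assumed; the function name and the exponential example are our illustrative choices). Note that, as in step 4, the $1-\alpha/2$ bootstrap quantile forms the lower endpoint and the $\alpha/2$ quantile the upper endpoint:

```python
import numpy as np

def bootstrap_pivotal_ci(x, estimator, alpha=0.05, B=10_000, rng=None):
    """Pivotal bootstrap interval C_n from the algorithm above."""
    rng = np.random.default_rng(rng)
    n = len(x)
    theta_hat = estimator(x)
    roots = np.empty(B)
    for j in range(B):
        x_star = rng.choice(x, size=n, replace=True)
        roots[j] = np.sqrt(n) * (estimator(x_star) - theta_hat)  # sqrt(n)(theta*_j - theta_hat)
    t_lo, t_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])  # t_{alpha/2}, t_{1-alpha/2}
    return theta_hat - t_hi / np.sqrt(n), theta_hat - t_lo / np.sqrt(n)

rng = np.random.default_rng(2)
x = rng.exponential(size=100)
print(bootstrap_pivotal_ci(x, np.mean))
```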
Figure 11.1: 50 points drawn from the model $Y_i = 1 + 2X_i - X_i^2 + \epsilon_i$ where $X_i \sim \mathrm{Uniform}(0, 2)$ and $\epsilon_i \sim N(0, 0.2^2)$. In this case, the maximum of the polynomial occurs at $\theta = 1$. The true and estimated curves are shown in the figure. At the bottom of the plot we show the 95 percent bootstrap confidence interval based on $B = 1{,}000$.
Theorem 139. Under appropriate regularity conditions, $P(\theta \in C_n) \to 1 - \alpha$ as $n \to \infty$.
11.4 Examples
Consider estimating the partial correlation between $X$ and $Y$ given $Z$,
$$\theta = \frac{\Omega_{12}}{\sqrt{\Omega_{11}\Omega_{22}}}$$
where $\Omega = \Sigma^{-1}$ and $\Sigma$ is the covariance matrix of $W = (X, Y, Z)^T$. The partial correlation measures the linear dependence between $X$ and $Y$ after removing the effect of $Z$. For illustration, suppose we generate the data as follows: we take $Z \sim N(0,1)$, $X = 10Z + \epsilon$ and $Y = 10Z + \delta$ where $\epsilon, \delta \sim N(0,1)$. The correlation between $X$ and $Y$ is very large. But the partial correlation is 0. We generated $n = 100$ data points from this model. The sample correlation was 0.99. However, the estimated partial correlation was $-0.16$, which is much closer to 0. The 95 percent bootstrap confidence interval is $[-0.33, 0.02]$, which includes the true value, namely, 0.
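A simulation along the lines of this example might look like the following sketch (NumPy assumed; the seed, helper names, and number of bootstrap replications are ours, so the numbers will not match those quoted above exactly):

```python
import numpy as np

def partial_corr(w):
    """theta = Omega_12 / sqrt(Omega_11 Omega_22) with Omega = Sigma^{-1}; w is n x 3."""
    omega = np.linalg.inv(np.cov(w, rowvar=False))
    return omega[0, 1] / np.sqrt(omega[0, 0] * omega[1, 1])

rng = np.random.default_rng(3)
n = 100
z = rng.normal(size=n)
x = 10 * z + rng.normal(size=n)
y = 10 * z + rng.normal(size=n)
w = np.column_stack([x, y, z])

theta_hat = partial_corr(w)
roots = np.empty(2000)
for j in range(roots.size):
    idx = rng.integers(0, n, size=n)          # resample rows with replacement (draws from P_n)
    roots[j] = np.sqrt(n) * (partial_corr(w[idx]) - theta_hat)
t_lo, t_hi = np.quantile(roots, [0.025, 0.975])
ci = (theta_hat - t_hi / np.sqrt(n), theta_hat - t_lo / np.sqrt(n))
print(theta_hat, ci)
```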
11.5 Why Does the Bootstrap Work?

To explain why the bootstrap works, let us begin with a heuristic. Let
$$F_n(t) = P\bigl(\sqrt{n}(\hat\theta_n - \theta) \le t\bigr)$$
and let
$$\hat F_n(t) = P\bigl(\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le t \mid X_1, \ldots, X_n\bigr).$$
The heuristic is that $\hat F_n$, which we can compute from the data, approximates $F_n$; hence quantiles of $\hat F_n$ can be used in place of the quantiles of $F_n$, which we do not know.
Now we will give more detail in a simple, special case. Suppose that $X_1, \ldots, X_n \sim P$ where $X_i$ has mean $\mu$ and variance $\sigma^2$. Suppose we want to construct a confidence interval for $\mu$. Let $\hat\mu_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ and define
$$F_n(t) = P\bigl(\sqrt{n}(\hat\mu_n - \mu) \le t\bigr). \qquad (11.3)$$
We do not know the cdf $F_n$. But suppose, for the moment, that an oracle gave us $F_n$. For any $0 < \gamma < 1$, define $z_\gamma = F_n^{-1}(\gamma)$. Define the oracle confidence interval
$$A_n = \left[\hat\mu_n - \frac{z_{1-\alpha/2}}{\sqrt{n}},\ \hat\mu_n - \frac{z_{\alpha/2}}{\sqrt{n}}\right]. \qquad (11.4)$$
We claim that $A_n$ is a $1-\alpha$ confidence interval. To see this, note that the probability that $A_n$ traps $\mu$ is
$$P(\mu \in A_n) = P\left(\hat\mu_n - \frac{z_{1-\alpha/2}}{\sqrt{n}} \le \mu \le \hat\mu_n - \frac{z_{\alpha/2}}{\sqrt{n}}\right)
= P\bigl(z_{\alpha/2} \le \sqrt{n}(\hat\mu_n - \mu) \le z_{1-\alpha/2}\bigr)
= F_n(z_{1-\alpha/2}) - F_n(z_{\alpha/2}) = \left(1 - \frac{\alpha}{2}\right) - \frac{\alpha}{2} = 1 - \alpha.$$
Unfortunately, we do not know $F_n$ but we can estimate it. The bootstrap estimate of $F_n$ is
$$\hat F_n(t) = P\Bigl(\sqrt{n}(\hat\mu_n^* - \hat\mu_n) \le t \,\Big|\, X_1, \ldots, X_n\Bigr)$$
where $\hat\mu_n^* = \frac{1}{n}\sum_{i=1}^{n} X_i^*$ and $X_1^*, \ldots, X_n^* \sim P_n$. The data $X_1, \ldots, X_n$ are treated as fixed during the bootstrap, which is why we write $\hat F_n$ as a conditional distribution.
Note that when we do the bootstrap algorithm, we are just approximating $\hat F_n(t)$ by
$$\overline F(t) = \frac{1}{B}\sum_{j=1}^{B} I\Bigl(\sqrt{n}\bigl(\hat\mu^*_{n,j} - \hat\mu_n\bigr) \le t\Bigr).$$
But
$$\sup_t |\overline F(t) - \hat F_n(t)| \to 0$$
almost surely as $B \to \infty$, so we ignore the difference between $\overline F$ and $\hat F_n$. The bootstrap confidence interval is
$$C_n = \left[\hat\mu_n - \frac{t_{1-\alpha/2}}{\sqrt{n}},\ \hat\mu_n - \frac{t_{\alpha/2}}{\sqrt{n}}\right] \qquad (11.5)$$
where $t_\gamma = \hat F_n^{-1}(\gamma)$. This is the same as the oracle confidence interval except that we have used $t_{\alpha/2}$ and $t_{1-\alpha/2}$ in place of $z_{\alpha/2}$ and $z_{1-\alpha/2}$. To show that $t_{\alpha/2} \approx z_{\alpha/2}$ and $t_{1-\alpha/2} \approx z_{1-\alpha/2}$, we need to show that $\hat F_n(t)$ approximates $F_n(t)$.
Theorem 142 (Bootstrap Theorem). Suppose that $\mu_3 = E|X_i|^3 < \infty$. Then
$$\sup_t |\hat F_n(t) - F_n(t)| = O_P\left(\frac{1}{\sqrt{n}}\right).$$
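A quick simulation is a useful sanity check on this approximation. The following sketch (NumPy assumed; the exponential population, the seed, and the number of replications are arbitrary illustrative choices) estimates the coverage of the interval $C_n$ of (11.5) for the mean:

```python
import numpy as np

rng = np.random.default_rng(4)
n, B, alpha, n_rep = 100, 1000, 0.05, 500
mu = 1.0                                        # true mean of an Exponential(1) population
covered = 0
for _ in range(n_rep):
    x = rng.exponential(size=n)
    mu_hat = x.mean()
    # sqrt(n)(mu*_j - mu_hat) for B bootstrap samples, drawn all at once
    boot_means = rng.choice(x, size=(B, n), replace=True).mean(axis=1)
    roots = np.sqrt(n) * (boot_means - mu_hat)
    t_lo, t_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    covered += (mu_hat - t_hi / np.sqrt(n) <= mu <= mu_hat - t_lo / np.sqrt(n))
print(covered / n_rep)                          # should be close to 1 - alpha = 0.95
```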
Figure 11.2: The distribution $F_n(t) = P(\sqrt{n}(\hat\theta_n - \theta) \le t)$ is close to some limit distribution $L$. Similarly, the bootstrap distribution $\hat F_n(t) = P(\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le t \mid X_1, \ldots, X_n)$ is close to some limit distribution $\hat L$. Since $\hat L$ and $L$ are close, it follows that $F_n$ and $\hat F_n$ are close. In practice, we approximate $\hat F_n$ with its Monte Carlo version $\overline F$, which we can make as close to $\hat F_n$ as we like by taking $B$ large.
To prove this result, let us recall the Berry–Esseen Theorem from Chapter 2. For convenience, we repeat the theorem here.
Theorem 143 (Berry–Esseen Theorem). Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Let $\mu_3 = E[|X_i - \mu|^3] < \infty$. Let $\overline X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean and let $Z_n = \sqrt{n}(\overline X_n - \mu)/\sigma$. Then
$$\sup_z \bigl|P(Z_n \le z) - \Phi(z)\bigr| \le \frac{33}{4}\,\frac{\mu_3}{\sigma^3\sqrt{n}} \qquad (11.6)$$
where $\Phi$ is the standard normal cdf.
Proof of Theorem 142. Let $\Phi_\sigma$ denote the cdf of a $N(0, \sigma^2)$ random variable and let $\Phi_{\hat\sigma}$ denote the cdf of a $N(0, \hat\sigma_n^2)$ random variable, where $\hat\sigma_n^2$ is the sample variance. Write
$$\sup_t |\hat F_n(t) - F_n(t)| \le \sup_t |F_n(t) - \Phi_\sigma(t)| + \sup_t |\Phi_\sigma(t) - \Phi_{\hat\sigma}(t)| + \sup_t |\hat F_n(t) - \Phi_{\hat\sigma}(t)| = I + II + III.$$
By the Berry–Esseen theorem, $I \le \frac{33\mu_3}{4\sigma^3\sqrt{n}}$. Applying the Berry–Esseen theorem conditionally on the data,
$$III = \sup_t |\hat F_n(t) - \Phi_{\hat\sigma}(t)| \le \frac{33\,\hat\mu_3}{4\,\hat\sigma_n^3\sqrt{n}}$$
where $\hat\mu_3 = \frac{1}{n}\sum_{i=1}^{n} |X_i - \hat\mu_n|^3$ is the empirical third moment. By the strong law of large numbers, $\hat\mu_3$ converges almost surely to $\mu_3$ and $\hat\sigma_n$ converges almost surely to $\sigma$. So, almost surely, for all large $n$, $\hat\mu_3 \le 2\mu_3$ and so $III \le \frac{33 \times 2\mu_3}{4\hat\sigma_n^3\sqrt{n}} = O_P(1/\sqrt{n})$. From the fact that $\hat\sigma_n - \sigma = O_P(\sqrt{1/n})$ it may be shown that $II = \sup_t |\Phi_\sigma(t) - \Phi_{\hat\sigma}(t)| = O_P(\sqrt{1/n})$. (This may be seen by Taylor expanding $\Phi_{\hat\sigma}(t)$ around $\sigma$.) This completes the proof. $\Box$
We have shown that $\sup_t |\hat F_n(t) - F_n(t)| = O_P\bigl(\tfrac{1}{\sqrt{n}}\bigr)$. From this, it may be shown that, for each $0 < \gamma < 1$, $t_\gamma - z_\gamma = O_P\bigl(\tfrac{1}{\sqrt{n}}\bigr)$. From this, one can prove Theorem 139.
So far we have focused on the mean. Similar theorems may be proved for more general
parameters. The details are complex so we will not discuss them here. We give a little more
information in the appendix. For a thorough treatment, we refer the reader to Chapter 23
of van der Vaart (1998).
11.6 A Few Remarks About the Bootstrap
1. The bootstrap is nonparametric but it does require some assumptions. You can’t
assume it is always valid. (See the appendix.)
2. The bootstrap is an asymptotic method. Thus the coverage of the confidence interval is $1 - \alpha + r_n$ where, typically, $r_n = C/\sqrt{n}$.
3. There is a related method called the jackknife where the standard error is estimated
by leaving out one observation at a time. However, the bootstrap is valid under
weaker conditions than the jackknife. See Shao and Tu (1995).
4. Another way to construct a bootstrap confidence interval is to set $C = [a, b]$ where $a$ is the $\alpha/2$ quantile of $\hat\theta_1^*, \ldots, \hat\theta_B^*$ and $b$ is the $1 - \alpha/2$ quantile. This is called the percentile interval. This interval seems very intuitive but does not have the same theoretical support as the interval $C_n$. However, in practice, the percentile interval and $C_n$ are often quite similar. (A small sketch comparing the two appears after this list.)
5. There are many cases where the bootstrap is not formally justified. This is especially true with discrete structures like trees and graphs. Nonetheless, the bootstrap can be used in an informal way to get some intuition about the variability of the procedure. But keep in mind that the formal guarantees may not apply in these cases. For example, see Holmes (2003) for a discussion of the bootstrap applied to phylogenetic trees.
6. There is a method related to the bootstrap called subsampling. In this case, we draw
samples of size m < n without replacement. Subsampling produces valid confidence
intervals under weaker conditions than the bootstrap. See Politis, Romano and Wolf
(1999).
7. There are many modifications of the bootstrap that lead to more accurate confidence
intervals; see Efron (1996).
8. There is also a parametric bootstrap. If $\{p(x; \theta) : \theta \in \Theta\}$ is a parametric model and $\hat\theta$ is an estimator, such as the maximum likelihood estimator, we sample $X_1^*, \ldots, X_n^*$ from $p(x; \hat\theta)$ instead of sampling from $P_n$.
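Here is the small sketch promised in remark 4 (NumPy assumed; the lognormal sample and the median are our illustrative choices), comparing the percentile interval with the pivotal interval $C_n$:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.lognormal(size=200)
n, B, alpha = len(x), 5000, 0.05
theta_hat = np.median(x)

theta_star = np.array([np.median(rng.choice(x, size=n, replace=True)) for _ in range(B)])

# Percentile interval: the alpha/2 and 1 - alpha/2 quantiles of the theta*'s.
percentile = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])

# Pivotal interval C_n, for comparison.
roots = np.sqrt(n) * (theta_star - theta_hat)
t_lo, t_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
pivotal = (theta_hat - t_hi / np.sqrt(n), theta_hat - t_lo / np.sqrt(n))

print(percentile, pivotal)    # the two intervals are typically similar here
```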
11.7 The High-Dimensional Bootstrap

Now suppose that $X_1, \ldots, X_n \in \mathbb{R}^d$ where the dimension $d$ may be large. We assume that the distribution of $X_i$ is sub-Gaussian, although this is stronger than needed. This means that $E(e^{t^T X}) \le e^{c\|t\|^2}$ for some $c > 0$. Let $\mu = E[X_i] \in \mathbb{R}^d$. Here is a bootstrap algorithm for constructing a confidence set for $\mu$.
High-Dimensional Bootstrap Confidence Set

1. Draw a bootstrap sample $X_1^*, \ldots, X_n^* \sim P_n$ and compute $\hat\mu_n^* = \frac{1}{n}\sum_{i=1}^{n} X_i^*$.

2. Repeat the previous step $B$ times, yielding $\hat\mu^*_{n,1}, \ldots, \hat\mu^*_{n,B}$.

3. Let
$$\hat F_n(t) = \frac{1}{B}\sum_{j=1}^{B} I\Bigl(\sqrt{n}\,\bigl\|\hat\mu^*_{n,j} - \hat\mu_n\bigr\|_\infty \le t\Bigr).$$

4. Let
$$C_n = \left\{ a \in \mathbb{R}^d : \|a - \hat\mu_n\|_\infty \le \frac{t_\alpha}{\sqrt{n}} \right\}$$
where $t_\alpha = \hat F_n^{-1}(1-\alpha)$.

5. Output $C_n$.
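A sketch of this algorithm (NumPy assumed; the sample sizes, dimension, and helper name are ours for illustration):

```python
import numpy as np

def highdim_mean_confset(x, alpha=0.05, B=2000, rng=None):
    """Return mu_hat and the sup-norm radius t_alpha / sqrt(n) of the confidence set C_n."""
    rng = np.random.default_rng(rng)
    n, d = x.shape
    mu_hat = x.mean(axis=0)
    stats = np.empty(B)
    for j in range(B):
        x_star = x[rng.integers(0, n, size=n)]              # bootstrap sample from P_n
        stats[j] = np.sqrt(n) * np.max(np.abs(x_star.mean(axis=0) - mu_hat))
    t_alpha = np.quantile(stats, 1 - alpha)                  # t_alpha = F_n-hat^{-1}(1 - alpha)
    return mu_hat, t_alpha / np.sqrt(n)

rng = np.random.default_rng(6)
x = rng.normal(size=(200, 500))                              # n = 200, d = 500, true mu = 0
mu_hat, radius = highdim_mean_confset(x)
print(radius, np.max(np.abs(mu_hat)) <= radius)              # does C_n contain mu = 0?
```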
Theorem 144 (Chernozhukov, Chetverikov and Kato, 2014). Suppose that $d = o(e^{n^{1/8}})$. Then
$$P(\mu \in C_n) \ge 1 - \alpha - \frac{c \log d}{n^{1/8}}$$
for some $c > 0$.
Under the stated conditions, the same result applies to higher-order moments. If $\theta = g(\mu)$ for some function $g$ then we can get a confidence set for $\theta$ by applying $g$ to $C_n$. We call this the projected confidence set. That is, if we define $A_n = \{g(\mu) : \mu \in C_n\}$ then it follows that
$$P(\theta \in A_n) \ge 1 - \alpha - \frac{c \log d}{n^{1/8}}.$$
Alternatively, we can apply the bootstrap to $\sqrt{n}(g(\hat\mu) - g(\mu))$. However, we do not automatically get the same coverage guarantee that the projected set has.
Example 145. Let us consider constructing a confidence set for a high-dimensional covariance matrix. Let $X_1, \ldots, X_n \in \mathbb{R}^k$ be a random sample and let $\Sigma = \mathrm{Var}(X_i)$, which is a $k \times k$ matrix. There are $d = O(k^2)$ parameters here. Let $\hat\Sigma = \frac{1}{n}\sum_{i=1}^{n} (X_i - \overline X_n)(X_i - \overline X_n)^T$. Also, let $\beta = \mathrm{vec}(\Sigma)$ and $\hat\beta = \mathrm{vec}(\hat\Sigma)$, where vec takes a matrix and converts it into a vector by stacking the columns. We can then apply the bootstrap algorithm above to $\sqrt{n}(\hat\beta - \beta)$ to get the bootstrap quantile $t_\alpha$. Let $\ell_n = \hat\beta - t_\alpha/\sqrt{n}$ and $u_n = \hat\beta + t_\alpha/\sqrt{n}$. We can then unstack $\ell_n$ and $u_n$ into $k \times k$ matrices $L_n$ and $U_n$. It then follows that
$$P(L_n \le \Sigma \le U_n) \ge 1 - \alpha - \frac{c \log d}{n^{1/8}}$$
where $A \le B$ means that $A_{jk} \le B_{jk}$ for all $(j, k)$.
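A sketch of this vectorize-and-bootstrap construction (NumPy assumed; the dimensions, seed, and the standard normal population with $\Sigma = I$ are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 300, 10
x = rng.normal(size=(n, k))                                  # true Sigma is the identity
sigma_hat = np.cov(x, rowvar=False, bias=True)               # (1/n) sum (X_i - Xbar)(X_i - Xbar)^T
beta_hat = sigma_hat.ravel(order="F")                        # vec: stack the columns

B = 2000
stats = np.empty(B)
for j in range(B):
    xs = x[rng.integers(0, n, size=n)]
    beta_star = np.cov(xs, rowvar=False, bias=True).ravel(order="F")
    stats[j] = np.sqrt(n) * np.max(np.abs(beta_star - beta_hat))
t_alpha = np.quantile(stats, 0.95)

L_n = (beta_hat - t_alpha / np.sqrt(n)).reshape(k, k, order="F")   # unstack ell_n
U_n = (beta_hat + t_alpha / np.sqrt(n)).reshape(k, k, order="F")   # unstack u_n
print(np.all(L_n <= np.eye(k)) and np.all(np.eye(k) <= U_n))       # traps Sigma = I?
```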
11.9 The Permutation Test
In this section we discuss a nonparametric hypothesis testing method. The test is not based
on the bootstrap but we include it here because it is similar in spirit to the bootstrap. Let
$$X_1, \ldots, X_n \sim F, \qquad Y_1, \ldots, Y_m \sim G$$
be two independent samples and suppose we want to test the hypothesis
$$H_0: F = G \quad \text{versus} \quad H_1: F \ne G. \qquad (11.7)$$
The permutation test gives an exact (nonasymptotic), nonparametric method for testing this hypothesis. Let $Z = (X, Y)$ where $X = (X_1, \ldots, X_n)^T$ and $Y = (Y_1, \ldots, Y_m)^T$. Define a vector $W$ of length $N = n + m$ that indicates which group $Z_i$ is from. Thus, $W_i = 1$ if $i \le n$ and $W_i = 2$ if $i > n$. Let $T(Z, W)$ be a test statistic.
Permutation Test

1. Compute $t = T(Z, W)$.

2. Compute $T(Z, W_\pi)$ for every permutation $W_\pi$ of the labels $W$. (In practice, a large random sample of permutations is used.)

3. Compute the p-value
$$p = \frac{1}{N!}\sum_{\pi} I\bigl(T(Z, W_\pi) \ge t\bigr).$$

4. Reject $H_0$ if $p \le \alpha$.
The test is called exact since the probability of falsely rejecting the null hypothesis is less than or equal to $\alpha$. There is no large sample approximation here.
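A sketch of the test in code (NumPy assumed; the choice of test statistic, the absolute difference in sample means, the use of random permutations, and the add-one p-value adjustment are our choices for illustration):

```python
import numpy as np

def permutation_test(x, y, stat=lambda a, b: abs(a.mean() - b.mean()),
                     B=10_000, rng=None):
    """Permutation p-value for H0: F = G, using B random permutations of the labels."""
    rng = np.random.default_rng(rng)
    z = np.concatenate([x, y])
    n = len(x)
    t_obs = stat(x, y)
    count = 0
    for _ in range(B):
        perm = rng.permutation(z)                # permuting the pooled data permutes the labels
        count += stat(perm[:n], perm[n:]) >= t_obs
    return (1 + count) / (1 + B)                 # add-one version avoids a p-value of 0

rng = np.random.default_rng(8)
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.5, 1.0, size=60)
print(permutation_test(x, y))
```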
Remark: There is a bootstrap hypothesis test that is similar to the permutation test. The
advantage of the bootstrap test is that it is more general than the permutation test. The
disadvantage is that it is an approximate test, not an exact test. The bootstrap p-value based
on a statistic T = T (X) is
$$p = P_{F_0}(T^* > t) \qquad (11.10)$$
where $t = T(X)$, $T^* = T(X^*)$ and $X^*$ is drawn from the null distribution $F_0$. If the null hypothesis does not completely specify a distribution $F_0$ then we compute $p = P_{\hat F_0}(T^* > t)$ where $\hat F_0$ is an estimate of $F$ under the restriction that $F \in \mathcal{F}_0$, where $\mathcal{F}_0$ is the set of distributions
consistent with the null hypothesis. However, this is an approximate test while the permutation
test is exact.
Example 147. Gretton et al (2008) developed a two sample test based on reproducing kernel Hilbert spaces. The test statistic is
$$T = \frac{1}{n^2}\sum_{i,j=1}^{n} K(X_i, X_j) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} K(X_i, Y_j) + \frac{1}{m^2}\sum_{i,j=1}^{m} K(Y_i, Y_j)$$
where $K$ is a symmetric kernel. Suppose we take $K = K_h(x, y) = e^{-\|x - y\|^2/(2h^2)}$ to be the Gaussian kernel. Rather than choosing a bandwidth $h$ we can simply define the test statistic to be the maximum over all bandwidths:
$$T = \sup_{h > 0}\left( \frac{1}{n^2}\sum_{i,j=1}^{n} K_h(X_i, X_j) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} K_h(X_i, Y_j) + \frac{1}{m^2}\sum_{i,j=1}^{m} K_h(Y_i, Y_j) \right).$$
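A sketch of this statistic together with a permutation p-value (NumPy assumed; a finite grid of bandwidths stands in for the supremum over $h > 0$, and the sample sizes, seed, and add-one p-value adjustment are our illustrative choices):

```python
import numpy as np

def mmd(x, y, h):
    """Kernel two-sample statistic with Gaussian kernel K_h."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2 * h ** 2))
    n, m = len(x), len(y)
    return k(x, x).sum() / n**2 - 2 * k(x, y).sum() / (n * m) + k(y, y).sum() / m**2

def max_mmd(x, y, hs=(0.1, 0.5, 1.0, 2.0, 5.0)):
    return max(mmd(x, y, h) for h in hs)        # finite grid standing in for sup over h > 0

rng = np.random.default_rng(9)
x = rng.normal(size=(30, 2))
y = rng.normal(size=(30, 2)) + np.array([1.0, 0.0])   # shifted mean, so H0 is false
z = np.vstack([x, y])
t_obs = max_mmd(x, y)

B, count = 500, 0
for _ in range(B):
    idx = rng.permutation(len(z))
    count += max_mmd(z[idx[:len(x)]], z[idx[len(x):]]) >= t_obs
print((1 + count) / (1 + B))                    # permutation p-value
```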
Figure 11.3: Top left: X1 , . . . , Xn . Top right: Y1 , . . . , Ym . Bottom left: values of the test
statistic from 1,000 permutations.
It would be difficult to find a useful expression for the distribution of the test statistic T
under the null hypothesis H0 : F = G. However, we can compute the p-value easily using
the permutation test. Figure 11.3 shows an example. The top left plot shows $n = 10$ observations from $F$ and the top right plot shows $m = 10$ observations from $G$. (We took $F$ to be bivariate normal and $G$ to be a mixture of two normals.) The test statistic is 0.45 and the p-value, based on $B = 1{,}000$ permutations, is 0.006, suggesting that we should reject $H_0$. The bottom
left shows a histogram of the values of T from the 1,000 permutations. The vertical line is
the observed value of T . The p-value is the fraction of statistics greater than T .
11.10 Summary
The bootstrap provides nonparametric standard errors and confidence intervals. To draw a bootstrap sample we draw $n$ observations $X_1^*, \ldots, X_n^*$ from the empirical distribution $P_n$. This is equivalent to drawing $n$ observations with replacement from the original data $X_1, \ldots, X_n$. We then compute the estimator $\hat\theta^* = g(X_1^*, \ldots, X_n^*)$. If we repeat this whole process $B$ times, we can use the bootstrap replications $\hat\theta_1^*, \ldots, \hat\theta_B^*$ to estimate the variance of $\hat\theta_n$ and to construct confidence intervals.
11.11 Bibliographic Remarks

Further details on statistical functionals can be found in [51], [13], [52], [23] and [59]. The jackknife was invented by [47] and [58]. The bootstrap was invented by [20]. There are several books on these topics including [22], [13], [29] and [52]. Also, see Section 3.6 of [60].
Appendix
As another example, suppose that $\theta = T(P)$ is the variance of $X$. Let $\mu$ denote the mean. Then
$$\theta = E(X - \mu)^2 = \int (x - \mu)^2\, dP(x) = \int x^2\, dP(x) - \left( \int x\, dP(x) \right)^2.$$
For one more example, let $\theta$ be the $\alpha$ quantile of $X$. Here it is convenient to work with the cdf $F(x) = P(X \le x)$. Thus $\theta = T(P) = T(F) = F^{-1}(\alpha)$ where $F^{-1}(y) = \inf\{x : F(x) \ge y\}$. The empirical cdf is $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$ and $\hat\theta_n = T(F_n) = \inf\{x : F_n(x) \ge \alpha\}$. In other words, $\hat\theta_n$ is just the corresponding sample quantile.
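For instance, a minimal sketch of the quantile plug-in estimator (NumPy assumed; the function name and sample are illustrative):

```python
import numpy as np

def plugin_quantile(x, alpha):
    """theta_hat = inf{ t : F_n(t) >= alpha }, i.e., the alpha sample quantile."""
    xs = np.sort(x)
    k = int(np.ceil(alpha * len(xs)))      # smallest k with F_n(x_(k)) = k/n >= alpha
    return xs[max(k - 1, 0)]

x = np.random.default_rng(10).normal(size=1000)
print(plugin_quantile(x, 0.25))
```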
Hadamard Differentiability. The key condition needed for the bootstrap is Hadamard differentiability. Let $\mathcal{P}$ denote all distributions on the real line and let $\mathcal{D}$ denote the linear space generated by $\mathcal{P}$. Write $T((1-\epsilon)P + \epsilon Q) = T(P + \epsilon D)$ where $D = Q - P \in \mathcal{D}$. The Gâteaux derivative of $T$ at $P$ in the direction $D$ is
$$L_P(D) = \lim_{\epsilon \downarrow 0} \frac{T(P + \epsilon D) - T(P)}{\epsilon}. \qquad (11.11)$$
Thus $T(P + \epsilon D) \approx T(P) + \epsilon L_P(D) + o(\epsilon)$ and the error term $o(\epsilon)$ goes to 0 as $\epsilon \to 0$. Hadamard differentiability requires that this error term be small uniformly over compact sets. Equip $\mathcal{D}$ with a metric $d$. $T$ is Hadamard differentiable at $P$ if there exists a linear functional $L_P$ on $\mathcal{D}$ such that for any $\epsilon_n \to 0$ and $\{D, D_1, D_2, \ldots\} \subset \mathcal{D}$ such that $d(D_n, D) \to 0$ and $P + \epsilon_n D_n \in \mathcal{P}$,
$$\lim_{n \to \infty} \left( \frac{T(P + \epsilon_n D_n) - T(P)}{\epsilon_n} - L_P(D_n) \right) = 0. \qquad (11.12)$$