An
Introduction
to the
Bootstrap
Bradley Efron
Department of Statistics
Stanford University
and
Robert J. Tibshirani
Department of Preventative Medicine and Biostatistics
and Department of Statistics, University of Toronto
CHAPMAN & HALL/CRC
Boca Raton London New York Washington, D.C.
Contents
1 Introduction 1
1.1 An overview of this book 6
1.2 Information for instructors 8
1.3 Some of the notation used in the book 9
2 The accuracy of a sample mean 10
2.1 Problems 15
3 Random samples and probabilities 17
3.1 Introduction 17
3.2 Random samples 17
3.3 Probability theory 20
3.4 Problems 28
4 The empirical distribution function and the plug-in
principle 31
4.1 Introduction 31
4.2 The empirical distribution function 31
4.3 The plug-in principle 35
4.4 Problems 37
5 Standard errors and estimated standard errors 39
5.1 Introduction 39
5.2 The standard error of a mean 39
5.3 Estimating the standard error of the mean 42
5.4 Problems 43
6 The bootstrap estimate of standard error 45
6.1 Introduction 45
6.2 The bootstrap estimate of standard error 45
6.3 Example: the correlation coefficient 49
6.4 The number of bootstrap replications B 50
6.5 The parametric bootstrap 53
6.6 Bibliographic notes 56
6.7 Problems 57
7 Bootstrap standard errors: some examples 60
7.1 Introduction 60
7.2 Example 1: test score data 61
7.3 Example 2: curve fitting 70
7.4 An example of bootstrap failure 81
7.5 Bibliographic notes 81
7.6 Problems 82
8 More complicated data structures 86
8.1 Introduction 86
8.2 One-sample problems 86
8.3 The two-sample problem 88
8.4 More general data structures 90
8.5 Example: lutenizing hormone 92
8.6 The moving blocks bootstrap 99
8.7 Bibliographic notes 102
8.8 Problems 103
9 Regression models 105
9.1 Introduction 105
9.2 The linear regression model 105
9.3 Example: the hormone data 107
9.4 Application of the bootstrap 111
9.5 Bootstrapping pairs vs bootstrapping residuals 113
9.6 Example: the cell survival data 115
9.7 Least median of squares 117
9.8 Bibliographic notes 121
9.9 Problems 121
10 Estimates of bias 124
10.1 Introduction 124
10.2 The bootstrap estimate of bias 124
10.3 Example: the patch data 126
10.4 An improved estimate of bias 130
10.5 The jackknife estimate of bias 133
10.6 Bias correction 138
10.7 Bibliographic notes 139
10.8 Problems 139
11 The jackknife 141
11.1 Introduction 141
11.2 Definition of the jackknife 141
11.3 Example: test score data 143
11.4 Pseudo-values 145
11.5 Relationship between the jackknife and bootstrap 145
11.6 Failure of the jackknife 148
11.7 The delete-d jackknife 149
11.8 Bibliographic notes 149
11.9 Problems 150
12 Confidence intervals based on bootstrap "tables" 153
12.1 Introduction 153
12.2 Some background on confidence intervals 155
12.3 Relation between confidence intervals and hypothesis
tests 156
12.4 Student's t interval 158
12.5 The bootstrap-t interval 160
12.6 Transformations and the bootstrap-t 162
12.7 Bibliographic notes 166
12.8 Problems 166
13 Confidence intervals based on bootstrap
percentiles 168
13.1 Introduction 168
13.2 Standard normal intervals 168
13.3 The percentile interval 170
13.4 Is the percentile interval backwards? 174
13.5 Coverage performance 174
13.6 The transformation-respecting property 175
13.7 The range-preserving property 176
13.8 Discussion 176
13.9 Bibliographic notes 176
13.10 Problems 177
14 Better bootstrap confidence intervals 178
14.1 Introduction 178
14.2 Example: the spatial test data 179
14.3 The BCa method 184
14.4 The ABC method 188
14.5 Example: the tooth data 190
14.6 Bibliographic notes 199
14.7 Problems 199
15 Permutation tests 202
15.1 Introduction 202
15.2 The two-sample problem 202
15.3 Other test statistics 210
15.4 Relationship of hypothesis tests to confidence
intervals and the bootstrap 214
15.5 Bibliographic notes 218
15.6 Problems 218
16 Hypothesis testing with the bootstrap 220
16.1 Introduction 220
16.2 The two-sample problem 220
16.3 Relationship between the permutation test and the
bootstrap 223
16.4 The one-sample problem 224
16.5 Testing multimodality of a population 227
16.6 Discussion 232
16.7 Bibliographic notes 233
16.8 Problems 234
17 Cross-validation and other estimates of prediction
error 237
17.1 Introduction 237
17.2 Example: hormone data 238
17.3 Cross-validation 239
17.4 Cp and other estimates of prediction error 242
17.5 Example: classification trees 243
17.6 Bootstrap estimates of prediction error 247
17.6.1 Overview 247
17.6.2 Some details 249
17.7 The .632 bootstrap estimator 252
17.8 Discussion 254
17.9 Bibliographic notes 255
17.10 Problems 255
18 Adaptive estimation and calibration 258
18.1 Introduction 258
18.2 Example: smoothing parameter selection for curve
fitting 258
18.3 Example: calibration of a confidence point 263
18.4 Some general considerations 266
18.5 Bibliographic notes 268
18.6 Problems 269
19 Assessing the error in bootstrap estimates 271
19.1 Introduction 271
19.2 Standard error estimation 272
19.3 Percentile estimation 273
19.4 The jackknife-after-bootstrap 275
19.5 Derivations 280
19.6 Bibliographic notes 281
19.7 Problems 281
20 A geometrical representation for the bootstrap and
jackknife 283
20.1 Introduction 283
20.2 Bootstrap sampling 285
20.3 The jackknife as an approximation to the bootstrap 287
20.4 Other jackknife approximations 289
20.5 Estimates of bias 290
20.6 An example 293
20.7 Bibliographic notes 295
20.8 Problems 295
21 An overview of nonparametric and parametric
inference 296
21.1 Introduction 296
21.2 Distributions, densities and likelihood functions 296
21.3 Functional statistics and influence functions 298
21.4 Parametric maximum likelihood inference 302
21.5 The parametric bootstrap 306
21.6 Relation of parametric maximum likelihood, bootstrap
and jackknife approaches 307
21.6.1 Example: influence components for the mean 309
21.7 The empirical cdf as a maximum likelihood estimate 310
21.8 The sandwich estimator 310
21.8.1 Example: Mouse data 311
21.9 The delta method 313
21.9.1 Example: delta method for the mean 315
21.9.2 Example: delta method for the correlation
coefficient 315
21.10 Relationship between the delta method and
infinitesimal jackknife 315
21.11 Exponential families 316
21.12 Bibliographic notes 319
21.13 Problems 320
22 Further topics in bootstrap confidence intervals 321
22.1 Introduction 321
22.2 Correctness and accuracy 321
22.3 Confidence points based on approximate pivots 322
22.4 The BCa interval 325
22.5 The underlying basis for the BCa interval 326
22.6 The ABC approximation 328
22.7 Least favorable families 331
22.8 The ABCq method and transformations 333
22.9 Discussion 334
22.10 Bibliographic notes 335
22.11 Problems 335
23 Efficient bootstrap computations 338
23.1 Introduction 338
23.2 Post-sampling adjustments 340
23.3 Application to bootstrap bias estimation 342
23.4 Application to bootstrap variance estimation 346
23.5 Pre- and post-sampling adjustments 348
23.6 Importance sampling for tail probabilities 349
23.7 Application to bootstrap tail probabilities 352
23.8 Bibliographic notes 356
23.9 Problems 357
24 Approximate likelihoods 358
24.1 Introduction 358
24.2 Empirical likelihood 360
24.3 Approximate pivot methods 362
24.4 Bootstrap partial likelihood 364
24.5 Implied likelihood 367
24.6 Discussion 370
24.7 Bibliographic notes 371
24.8 Problems 371
25 Bootstrap bioequivalence 372
25.1 Introduction 372
25.2 A bioequivalence problem 372
25.3 Bootstrap confidence intervals 374
25.4 Bootstrap power calculations 379
25.5 A more careful power calculation 381
25.6 Fieller's intervals 384
25.7 Bibliographic notes 389
25.8 Problems 389
26 Discussion and further topics 392
26.1 Discussion 392
26.2 Some questions about the bootstrap 394
26.3 References on further topics 396
Appendix: software for bootstrap computations 398
Introduction 398
Some available software 399
S language functions 399
References 413
Author index 426
Subject index 430