A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation

BRADLEY EFRON and GAIL GONG*
KEY WORDS: Bias estimation; Variance estimation; Nonparametric standard errors; Nonparametric confidence intervals; Error rate prediction.

1. INTRODUCTION

This article is intended to cover lots of ground, but at a relaxed mathematical level that omits most proofs, regularity conditions, and technical details. The ground in question is the nonparametric estimation of statistical error. "Error" here refers mainly to the bias and standard error of an estimator, or to the error rate of a data-based prediction rule.

All of the methods we discuss share some attractive properties for the statistical practitioner: they require very little in the way of modeling, assumptions, or analysis, and can be applied in an automatic way to any situation.

Having observed X_1 = x_1, X_2 = x_2, ..., X_n = x_n, we compute the sample average x̄ = Σ_i x_i / n for use as an estimate of the expectation of F.

An interesting fact, and a crucial one for statistical applications, is that the data set provides more than the estimate x̄. It also gives an estimate for the accuracy of x̄, namely

σ̂ = [ Σ_i (x_i - x̄)² / (n(n - 1)) ]^{1/2}.   (2)

σ̂ is the estimated standard error of X̄ = x̄, the root mean squared error of estimation.

The trouble with formula (2) is that it does not, in any obvious way, extend to estimators other than X̄, for example the sample median. The jackknife and the bootstrap are two ways of making this extension.

2. THE BOOTSTRAP
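As a purely illustrative aside (not from the article), the Python sketch below makes the preceding point concrete: formula (2) covers the sample mean, but the sample median needs the resampling recipe of this section, namely drawing bootstrap samples with replacement and taking the standard deviation of the replicated statistic, the quantity written σ̂_B in (10) and (11) below. The data, the random seed, and B = 200 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_se(x, statistic, B=200, rng=rng):
    """Nonparametric bootstrap standard error: resample x with replacement
    B times, recompute the statistic, and take the standard deviation of
    the B replications (the Monte Carlo form of the estimate sigma_B)."""
    x = np.asarray(x)
    n = len(x)
    reps = np.array([statistic(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])
    return reps.std(ddof=1)

# toy data; the classical formula (2) covers the mean ...
x = rng.normal(size=15)
n = len(x)
se_mean_formula = np.sqrt(np.sum((x - x.mean()) ** 2) / (n * (n - 1)))
# ... but not the median, where the bootstrap applies unchanged
se_median_boot = bootstrap_se(x, np.median, B=200)
print(se_mean_formula, se_median_boot)
```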
obtaining independent bootstrap replications ρ̂*1, ρ̂*2, ..., ρ̂*B, and approximate σ̂_B by

σ̂_B = [ Σ_{b=1}^{B} (ρ̂*b - ρ̂*·)² / (B - 1) ]^{1/2}.   (11)

As B → ∞, (11) approaches the original definition (10). The choice of B is further discussed below, but meanwhile we won't distinguish between (10) and (11), calling both estimates σ̂_B.

Figure 2 shows B = 1000 bootstrap replications ρ̂*1, ..., ρ̂*1000 for the law school data. The abscissa is plotted in terms of ρ̂* - ρ̂ = ρ̂* - .776. Formula (11) gives σ̂_B = .127. This can be compared with the normal theory estimate of standard error for ρ̂ (Johnson and Kotz 1970, p. 229),

σ̂_NORM = (1 - ρ̂²)/(n - 3)^{1/2} = .115.

Figure 2. Histogram of B = 1000 bootstrap replications ρ̂* for the law school data. The normal theory density curve has a similar shape, but falls off more quickly at the upper tail.

One thing is obvious about the bootstrap procedure: it can be applied just as well to any statistic, simple or complicated, as to the correlation coefficient. In Table 1 the statistic is the 25 percent trimmed mean for a sample of size n = 15. The true distribution F (now defined on the line rather than on the plane) is the standard normal N(0, 1) for the left side of the table, or one-sided negative exponential for the right side. The true standard errors σ(F) are .286 and .232, respectively. In both cases σ̂_B, calculated with B = 200 bootstrap replications, is nearly unbiased for σ(F).

The jackknife estimate of standard error σ̂_J, described in Section 3, is also nearly unbiased in both cases, but has higher variability than σ̂_B, as shown by its higher coefficient of variation. The minimum possible coefficient of variation (C.V.), for a scale-invariant estimate of σ(F), assuming full knowledge of the parametric model, is shown in brackets. In the normal case, for example, .19 is the C.V. of [ Σ_i (x_i - x̄)² / 14 ]^{1/2}. The bootstrap estimate performs well by this standard, considering its totally nonparametric character and the small sample size.

Table 1. A Sampling Experiment Comparing the Bootstrap and Jackknife Estimates of Standard Error for the 25% Trimmed Mean, Sample Size n = 15

                              F Standard Normal           F Negative Exponential
                              Ave    Sd     Coeff Var     Ave    Sd     Coeff Var
Bootstrap σ̂_B (B = 200)       .287   .071   .25           .242   .078   .32
Jackknife σ̂_J                 .280   .084   .30           .224   .085   .38
True [Minimum C.V.]           .286   [.19]                .232   [.27]

Table 2 returns to the case of ρ̂, the correlation coefficient. Instead of real data we have a sampling experiment in which F is bivariate normal, true correlation ρ = .5, and the sample size is n = 14. The left side of Table 2 refers to ρ̂, while the right side refers to the statistic φ̂ = tanh^{-1} ρ̂ = .5 log[(1 + ρ̂)/(1 - ρ̂)]. For each estimator σ̂, the root mean squared error of estimation [E(σ̂ - σ)²]^{1/2} is given in the column headed √MSE.

The bootstrap was run with B = 128 and B = 512, the latter value yielding only slightly better estimates σ̂_B. Further increasing B would be pointless. It can be shown that B = ∞ would give √MSE = .063 in the ρ̂ case, only .001 less than using B = 512. As a point of comparison, the normal theory estimate for the standard error of ρ̂, σ̂_NORM = (1 - ρ̂²)/(n - 3)^{1/2}, has √MSE = .056.

Why not generate the bootstrap observations from an estimate of F which is smoother than F̂? This is done in lines 3, 4, and 5 of Table 2. Let Σ̂ = Σ_i (x_i - x̄)(x_i - x̄)'/n be the sample covariance matrix of the observed data. The normal smoothed bootstrap draws the bootstrap sample X*_1, X*_2, ..., X*_n from F̂ ⊕ N_2(0, .25Σ̂), ⊕ indicating convolution. This amounts to estimating F by an equal mixture of the n distributions N_2(x_i, .25Σ̂), that is by a normal window estimate. Smoothing makes little difference on the left side of the table, but is spectacularly effective in the φ̂ case. The latter result is suspect, since the true sampling distribution is bivariate normal, and the function φ = tanh^{-1} ρ is specifically chosen to have nearly constant standard error in the bivariate-normal family. The uniform smoothed bootstrap samples X*_1, ..., X*_n from F̂ ⊕ U(0, .25Σ̂), where U(0, .25Σ̂) is the uniform distribution on a rhombus selected so U has mean vector 0 and covariance matrix .25Σ̂. It yields moderate reductions in √MSE for both sides of the table.

The standard normal-theory estimates of line 8, Table 2, are themselves bootstrap estimates, carried out in a parametric framework. The bootstrap sample X*_1, ..., X*_n is drawn from the parametric maximum likelihood distribution

F̂_NORM: N_2(x̄, Σ̂),

rather than the nonparametric maximum likelihood distribution F̂, and with only this change the bootstrap algorithm proceeds as previously described. In practice the bootstrap process is not actually carried out. If it were, and if B → ∞, then a high-order Taylor series analysis shows that σ̂_B would equal approximately (1 - ρ̂²)/(n - 3)^{1/2}, the formula actually used to compute line 8 for the ρ̂ side of Table 2. Notice that the normal ...

Table 2. Summary Statistics for 200 Trials

                                            Standard Error Estimates for ρ̂      Standard Error Estimates for φ̂
                                            Ave    Std Dev   CV    √MSE          Ave    Std Dev   CV    √MSE
1. Bootstrap B = 128                        .206   .066      .32   .067          .301   .065      .22   .065
2. Bootstrap B = 512                        .206   .063      .31   .064          .301   .062      .21   .062
3. Normal Smoothed Bootstrap B = 128        .200   .060      .30   .063          .296   .041      .14   .041
4. Uniform Smoothed Bootstrap B = 128       .205   .061      .30   .062          .298   .058      .19   .058
5. Uniform Smoothed Bootstrap B = 512       .205   .059      .29   .060          .296   .052      .18   .052
6. Jackknife                                .223   .085      .38   .085          .314   .090      .29   .091
7. Delta Method (Infinitesimal Jackknife)   .175   .058      .33   .072          .244   .052      .21   .076
8. Normal Theory                            .217   .056      .26   .056          .302   0         0     .003
True Standard Error                         .218                                 .299
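Lines 3, 4, and 5 of Table 2 are easy to imitate in outline. The sketch below is illustrative only, not the authors' code: it runs one trial of the sampling experiment just described (bivariate normal, ρ = .5, n = 14) and computes the normal smoothed bootstrap standard error of ρ̂ by adding N_2(0, .25Σ̂) noise to each resampled pair, that is, by sampling from F̂ ⊕ N_2(0, .25Σ̂).

```python
import numpy as np

rng = np.random.default_rng(1)

def corr(x):
    return np.corrcoef(x[:, 0], x[:, 1])[0, 1]

def smoothed_bootstrap_se(x, B=128, rng=rng):
    """Normal smoothed bootstrap: resample rows of x and add
    N(0, .25 * Sigma_hat) noise, i.e. draw each bootstrap observation
    from F_hat convolved with N_2(0, .25 * Sigma_hat)."""
    n = len(x)
    sigma_hat = np.cov(x, rowvar=False, bias=True)   # divide by n, as in the text
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        noise = rng.multivariate_normal(np.zeros(2), 0.25 * sigma_hat, size=n)
        reps[b] = corr(x[idx] + noise)
    return reps.std(ddof=1)

# one trial of the Table 2 sampling experiment: bivariate normal, rho = .5, n = 14
rho, n = 0.5, 14
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal(np.zeros(2), cov, size=n)
print(smoothed_bootstrap_se(x, B=128))
```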
σ̂_J = [ ((n - 1)/n) Σ_{i=1}^{n} (θ̂_(i) - θ̂_(·))² ]^{1/2}.

Like the bootstrap, the jackknife can be applied to any statistic that is a function of n independent and identically distributed variables. It performs less well than the bootstrap in Tables 1 and 2, and in most cases investigated by the author (see Efron 1982), but requires less computation. In fact the two methods are closely related, which we shall now show.

Suppose the statistic of interest, which we will now call θ̂(x_1, x_2, ..., x_n), is of functional form: θ̂ = θ(F̂), where θ(F) is a functional assigning a real number to any distribution F on the sample space. Both examples in Section 2 are of this form. Let P = (P_1, P_2, ..., P_n)' be a probability vector having nonnegative weights summing to one, and define the reweighted empirical distribution F̂(P): mass P_i on x_i, i = 1, 2, ..., n. Corresponding to P is a resampled value of the statistic of interest, say θ̂(P) = θ(F̂(P)). The shorthand notation θ̂(P) assumes that the data points x_1, x_2, ..., x_n are fixed at their observed values.

Another way to describe the bootstrap estimate σ̂_B is as follows. Let P* indicate a vector drawn from the rescaled multinomial distribution

P* ∼ Mult_n(n, P⁰)/n,   (P⁰ = (1/n)(1, 1, ..., 1)'),   (12)

meaning the observed proportions from n random draws on n categories, with equal probability 1/n for each category. Then

σ̂_B = [var_* θ̂(P*)]^{1/2},   (13)

var_* indicating variance under (12).

The jackknife points are the resampling vectors

P_(i) = (1/(n - 1)) (1, 1, ..., 1, 0, 1, ..., 1)'   (0 in ith place),

i = 1, 2, ..., n. These are indicated by the open circles in Figure 3. In general there are n jackknife points, compared with (2n-1 choose n) bootstrap points.

The trouble with bootstrap formula (13) is that θ̂(P) is usually a complicated function of P (think of the examples in Sec. 2), and so var_* θ̂(P*) cannot be evaluated ...

Figure 3. The bootstrap and jackknife sampling points in the case n = 3. The bootstrap points (·) are shown with their probabilities.
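The jackknife needs only the n delete-one values θ̂_(i). Here is a minimal sketch of the σ̂_J formula quoted above, with the 25% trimmed mean of Table 1 as the example statistic; the data and the use of scipy.stats.trim_mean are illustrative assumptions, not part of the article.

```python
import numpy as np
from scipy import stats

def jackknife_se(x, statistic):
    """Jackknife standard error: recompute the statistic on each delete-one
    sample and combine as
    sigma_J = sqrt((n - 1)/n * sum((theta_i - theta_bar)**2))."""
    x = np.asarray(x)
    n = len(x)
    theta_i = np.array([statistic(np.delete(x, i)) for i in range(n)])
    theta_bar = theta_i.mean()
    return np.sqrt((n - 1) / n * np.sum((theta_i - theta_bar) ** 2))

# example: 25% trimmed mean, as in Table 1 (illustrative data only)
rng = np.random.default_rng(2)
x = rng.normal(size=15)
print(jackknife_se(x, lambda v: stats.trim_mean(v, 0.25)))
```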
... β = E{θ(F̂) - θ(F)}. In the notation of Section 3, Quenouille's estimate is

β̂ = (n - 1)(θ̂_(·) - θ̂).   (18)

Subtracting β̂ from θ̂ to correct the bias leads to the jackknife estimate of θ, θ̃ = nθ̂ - (n - 1)θ̂_(·); see Miller (1974), and also Schucany, Gray, and Owen (1971).

There are many ways to justify (18). Here we follow the same line of argument as in the justification of σ̂_J. The bootstrap estimate of β, which has an obvious motivation, is introduced, and then (18) is related to the bootstrap estimate by a Taylor series argument.

The bias can be thought of as a function of the unknown probability distribution F, β = β(F). The bootstrap estimate of bias is simply

β̂_B = β(F̂) = E_*{θ(F̂*) - θ(F̂)}.   (19)

Here E_* indicates expectation with respect to bootstrap sampling, and F̂* is the empirical distribution of the bootstrap sample.

In practice β̂_B must be approximated by Monte Carlo methods. The only change in the algorithm described in Section 2 is at step (iii), when instead of (or in addition to) σ̂_B we calculate

β̂_B ≈ Σ_{b=1}^{B} θ̂*b / B - θ̂(F̂).

... and have been interested in the expectation and the standard deviation σ of R.) The bootstrap algorithm proceeds as described in Section 2, with these two changes: at step (ii), we calculate the bootstrap replication R* = R(X*_1, X*_2, ..., X*_n; F̂), and at step (iii) we calculate the distributional property of interest from the empirical distribution of the bootstrap replications R*1, R*2, ..., R*B.

For example, we might be interested in the probability that the usual t statistic √n(X̄ - μ)/S exceeds 2, where μ = E{X} and S² = Σ_i (X_i - X̄)²/(n - 1). Then R* = √n(X̄* - x̄)/S*, and the bootstrap estimate is #{R*b > 2}/B. This calculation is used in Section 9 of Efron (1981c) to get confidence intervals for the mean μ in a situation where normality is suspect.

The cross-validation problem of Sections 8 and 9 involves a different type of error random variable R. It will be useful there to use a jackknife-type approximation to the bootstrap expectation of R,

E_*{R*} ≈ R⁰ + (n - 1)(R_(·) - R⁰).   (20)

Here R⁰ = R(x_1, x_2, ..., x_n; F̂), R_(·) = (1/n) Σ_i R_(i), and R_(i) = R(x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_n; F̂). The justification of (20) is the same as for the theorem of this section, being based on a quadratic approximation formula.
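Both Quenouille's estimate (18) and the bootstrap bias estimate (19) are mechanical to compute. The sketch below is illustrative only: the example statistic is the plug-in variance θ(F̂) = Σ(x_i - x̄)²/n, whose true bias is -σ²/n, and the data, seed, and B are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def plug_in_var(x):
    # theta(F_hat): the maximum likelihood variance, biased by -sigma^2/n
    return np.mean((x - np.mean(x)) ** 2)

def bootstrap_bias(x, statistic, B=1000, rng=rng):
    """Monte Carlo version of (19): average of theta(F_hat*) over
    bootstrap samples, minus theta(F_hat)."""
    n = len(x)
    reps = np.array([statistic(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])
    return reps.mean() - statistic(x)

def quenouille_bias(x, statistic):
    """Jackknife bias estimate (18): (n - 1) * (theta_bar_(.) - theta_hat)."""
    n = len(x)
    theta_i = np.array([statistic(np.delete(x, i)) for i in range(n)])
    return (n - 1) * (theta_i.mean() - statistic(x))

x = rng.normal(size=20)
print(bootstrap_bias(x, plug_in_var), quenouille_bias(x, plug_in_var))
```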
Table 4. Bootstrap Estimates of Standard Error for the Hodges-Lehmann Two-Sample Shift Estimate; m = 6, n = 9; True Distributions Both F and G Uniform [0, 1]

                        Expectation   St. Dev.   C.V.   √MSE
Separate   B = 100      .165          .030       .18    .030
           B = 200      .166          .031       .19    .031
Combined   B = 100      .145          .028       .19    .036
           B = 200      .149          .025       .17    .031
True Standard Error     .167

... in which m = 6, n = 9, and both F and G were uniform distributions on the interval [0, 1]. The table is based on 100 trials of the situation. The true standard error is σ(F, G) = .167. "Separate" refers to σ̂_B calculated exactly as described in the previous paragraph. The improvement in going from B = 100 to B = 200 is too small to show up in the table.

"Combined" refers to the following idea: suppose we believe that G is really a translate of F. Then it wastes information to estimate F and G separately. Instead we can form the combined empirical distribution

Ĥ: mass 1/(m + n) on x_1, x_2, ..., x_m, y_1 - θ̂, y_2 - θ̂, ..., y_n - θ̂.

All m + n bootstrap variates X*_1, ..., X*_m, Y*_1, ..., Y*_n are then sampled independently from Ĥ. (We could add θ̂ back to the Y*_i values, but this has no effect on the bootstrap standard error estimate, since it just adds the constant θ̂ to each bootstrap replication θ̂*.)

The combined method gives no improvement here, but it might be valuable in a many-sample problem where there are small numbers of observations in each sample, a situation that arises in stratified sampling. (See Efron 1982, Ch. 8.) The main point here is that "bootstrap" is not a well-defined verb, and that there may be more than one way to proceed in complicated situations. Next we consider regression problems, where again there is a choice of bootstrapping methods.

In a typical regression problem we observe n independent real-valued quantities Y_i = y_i,

Y_i = g_i(β) + ε_i,   i = 1, 2, ..., n.   (21)

The functions g_i(·) are of known form, usually g_i(β) = g(β; t_i), where t_i is an observed p-dimensional vector of covariates; β is a vector of unknown parameters we wish to estimate. The ε_i are an independent and identically distributed random sample from some distribution F on the real line,

ε_1, ε_2, ..., ε_n ∼ F,

where F is assumed to be centered at zero in some sense, perhaps E{ε} = 0 or Prob{ε < 0} = .5.

Having observed the data vector Y = y = (y_1, ..., y_n)', we estimate β by minimizing some measure of distance between y and the vector of predicted values η(β) = (g_1(β), ..., g_n(β)),

β̂: min_β D(y, η(β)).

The most common choice of D is D(y, η) = Σ_i (y_i - η_i)².

Having calculated β̂, we can modify the one-sample bootstrap algorithm of Section 2, and obtain an estimate of β̂'s variability:

(i) Construct F̂ putting mass 1/n at each observed residual,

F̂: mass 1/n on ε̂_i = y_i - g_i(β̂),   i = 1, 2, ..., n.

(ii) Construct a bootstrap data set

Y*_i = g_i(β̂) + ε*_i,   i = 1, 2, ..., n,

where the ε*_i are drawn independently from F̂, and calculate

β̂*: min_β D(y*, η(β)).

(iii) Do step (ii) some large number B of times, obtaining independent bootstrap replications β̂*1, β̂*2, ..., β̂*B, and estimate the covariance matrix of β̂ by

Σ̂_B = Σ_{b=1}^{B} (β̂*b - β̂*·)(β̂*b - β̂*·)' / (B - 1),   (β̂*· = (1/B) Σ_b β̂*b).

In ordinary linear regression we have g_i(β) = t_i'β and D(y, η) = Σ_i (y_i - η_i)². Section 7 of Efron (1979a) shows that in this case the algorithm above can be carried out theoretically, B = ∞, and yields

Σ̂_B = (Σ_i t_i t_i')^{-1} σ̂²,   σ̂² = Σ_i ε̂_i² / n.   (22)

This is the usual answer, except for dividing by n instead of n - p in σ̂². Of course the advantage of the bootstrap approach is that Σ̂_B can just as well be calculated if, say, g_i(β) = exp(t_i'β) and D(y, η) = Σ_i |y_i - η_i|.

There is another, simpler way to bootstrap the regression problem. We can consider each covariate-response pair x_i = (t_i, y_i) to be a single data point obtained by random sampling from a distribution F on p + 1 dimensional space. Then we apply the one-sample bootstrap of Section 2 to the data set x_1, x_2, ..., x_n.

The two bootstrap methods for the regression problem are asymptotically equivalent, but can perform quite differently in small-sample situations. The simple method, described last, takes less advantage of the special structure of the regression problem. It does not give answer (22) in the case of ordinary least squares. On the other hand the simple method gives a trustworthy estimate of β̂'s variability even if the regression model (21) is not correct. For this reason we use the simple method of bootstrapping on the error rate prediction problem of Sections 9 and 10.
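Here is a hedged sketch of the two procedures just contrasted, for ordinary least squares; the simulated data, B, and dimensions are arbitrary. The first function carries out steps (i)-(iii) with the residual distribution F̂, the second the "simple" pairs method.

```python
import numpy as np

rng = np.random.default_rng(4)

def ols(t, y):
    # beta_hat minimizing D(y, eta) = sum_i (y_i - t_i' beta)^2
    return np.linalg.lstsq(t, y, rcond=None)[0]

def residual_bootstrap_cov(t, y, B=500, rng=rng):
    """Steps (i)-(iii): keep the design fixed, resample the observed
    residuals eps_hat_i = y_i - t_i' beta_hat, rebuild y* = t beta_hat + eps*,
    and refit; return the covariance matrix of the B replications."""
    n = len(y)
    beta_hat = ols(t, y)
    resid = y - t @ beta_hat
    reps = np.array([ols(t, t @ beta_hat + rng.choice(resid, size=n, replace=True))
                     for _ in range(B)])
    return np.cov(reps, rowvar=False, ddof=1)

def pairs_bootstrap_cov(t, y, B=500, rng=rng):
    """The 'simple' method: treat each (t_i, y_i) as one data point and
    apply the one-sample bootstrap of Section 2."""
    n = len(y)
    reps = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        reps.append(ols(t[idx], y[idx]))
    return np.cov(np.array(reps), rowvar=False, ddof=1)

# illustrative data (not from the article)
n = 30
t = np.column_stack([np.ones(n), rng.normal(size=n)])
y = t @ np.array([1.0, 2.0]) + rng.normal(size=n)
print(residual_bootstrap_cov(t, y))
print(pairs_bootstrap_cov(t, y))
```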
As a final example of bootstrapping complicated data ...

Figure 4. Histogram of 1000 bootstrap replications of β̂* for the leukemia data, proportional hazards model. Courtesy of Rob Tibshirani, Stanford.

err_cv = Σ_i Q[y_i, η(t_i; x_(i))] / n,

η(t_i; x_(i)) being the prediction rule based on x_(i) = (x_1, ..., x_{i-1}, x_{i+1}, ..., x_n).
R* = R(X*, F̂) = Σ_i (P⁰_i - P*_i) Q[y_i, η(t_i; X*)],   (25)

where P⁰ = (1, 1, ..., 1)'/n as before, and η(·, X*) is the prediction rule based on the bootstrap sample.

Table 5. The First 10 Trials of a Sampling Experiment Involving Fisher's Linear Discriminant Function. The Training Set Has Size n = 14. The Expected Overoptimism is ω = .096, see Table 6

                     Error Rates                                  Estimates of Overoptimism
Trial   n_1, n_2     True err   Apparent err   Overoptimism R     Cross-validation   Jackknife   Bootstrap (B = 200)
1       9, 5         .458       .286           .172               .214               .214        .083
2       6, 8         .312       .357           -.045              .000               .066        .098
3       7, 7         .313       .357           -.044              .071               .066        .110
4       8, 6         .351       .429           -.078              .071               .066        .107
5       8, 6         .330       .357           -.027              .143               .148        .102
6       8, 6         .318       .143           .175               .214               .194        .073
7       8, 6         .310       .071           .239               .071               .066        .087
8       6, 8         .382       .286           .094               .071               .056        .097
9       7, 7         .360       .429           -.069              .071               .087        .127
10      8, 6         .335       .143           .192               .000               .010        .048

Table 6 shows the results of two simulation experiments (100 trials each) involving Fisher's linear discriminant function. The left side relates to the bivariate normal situation described in Section 8: sample size n = 14, dimension d = 2, mean vectors for the two randomly selected normal distributions (±1, 0). The right side still has n = 14, but the dimension has been raised to 5, with mean vectors (±1, 0, 0, 0, 0). Fuller descriptions appear in Chapter 7 of Efron (1982).

Seven estimates of overoptimism were considered. In the d = 2 situation, the cross-validation estimate, for example, had expectation .091, standard deviation .073, and correlation -.07 with R. This gave root mean ...
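The Table 5 computation can be imitated in a few lines. The sketch below is not the authors' code: it refits Fisher's linear discriminant on each bootstrap training set, evaluates the overoptimism replication of (25) with equal weights P⁰ on the original sample, and averages the replications to get the bootstrap estimate (the last column of Table 5). The simulated training set (n = 14, d = 2, class means at (-1, 0) and (+1, 0)) is only a stand-in for the experiment described above.

```python
import numpy as np

rng = np.random.default_rng(5)

def fisher_rule(X, y):
    """Fit Fisher's linear discriminant on a training set (labels in {0, 1})
    and return a function that classifies new rows."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = (np.cov(X0, rowvar=False, bias=True) * len(X0)
              + np.cov(X1, rowvar=False, bias=True) * len(X1)) / len(X)
    w = np.linalg.solve(pooled, m1 - m0)
    c = w @ (m0 + m1) / 2
    return lambda Z: (Z @ w > c).astype(int)

def err_rate(rule, X, y):
    return np.mean(rule(X) != y)

def bootstrap_overoptimism(X, y, B=200, rng=rng):
    """Bootstrap analogue of (25): error of the bootstrap rule on the original
    sample (weights P0) minus its apparent error on the bootstrap sample
    (weights P*), averaged over B replications."""
    n = len(y)
    reps = np.empty(B)
    for b in range(B):
        while True:                       # redraw resamples where a class has < 2 points
            idx = rng.integers(0, n, size=n)
            if 1 < np.sum(y[idx] == 1) < n - 1:
                break
        rule = fisher_rule(X[idx], y[idx])
        reps[b] = err_rate(rule, X, y) - err_rate(rule, X[idx], y[idx])
    return reps.mean()

# one training set in the spirit of Table 5: n = 14, d = 2
n, d = 14, 2
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[:, 0] += np.where(y == 1, 1.0, -1.0)
print(bootstrap_overoptimism(X, y, B=200))
```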
Table 7. The Last 11 Liver Patients. Negative Numbers Indicate Missing Values

Columns: y; predictors 1 = Constant, 2 = Age, 3 = Sex, 4 = Steroid, 5 = Antiviral, 6 = Fatigue, 7 = Malaise, 8 = Anorexia, 9 = Liver Big, 10 = Liver Firm, 11 = Spleen Palp, 12 = Spiders, 13 = Ascites, 14 = Varices, 15 = Bilirubin, 16 = Alk Phos, 17 = SGOT, 18 = Albumin, 19 = Protein, 20 = Histology; case #

y  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  #
1 1 45 1 2 2 1 1 1 2 2 2 1 1 2 1.90 -1 114 2.4 -1 -3 145
0 1 31 1 1 2 1 2 2 2 2 2 2 2 2 1.20 75 193 4.2 54 2 146
1 1 41 1 2 2 1 2 2 2 1 1 1 2 1 4.20 65 120 3.4 -1 -3 147
1 1 70 1 1 2 1 1 1 -3 -3 -3 -3 -3 -3 1.70 109 528 2.8 35 2 148
0 1 20 1 1 2 2 2 2 2 -3 2 2 2 2 .90 89 152 4.0 -1 2 149
0 1 36 1 2 2 2 2 2 2 2 2 2 2 2 .60 120 30 4.0 -1 2 150
1 1 46 1 2 2 1 1 1 2 2 2 1 1 1 7.60 -1 242 3.3 50 -3 151
0 1 44 1 2 2 1 2 2 2 1 2 2 2 2 .90 126 142 4.3 -1 2 152
0 1 61 1 1 2 1 1 2 1 1 2 1 2 2 .80 95 20 4.1 -1 2 153
0 1 53 2 1 2 1 2 2 2 2 1 1 2 1 1.50 84 19 4.1 48 -3 154
1 1 43 1 2 2 1 2 2 2 2 1 1 1 2 1.20 100 19 3.1 42 2 155
but it implies that in situations with special structure the bootstrap may be outperformed by more specialized methods. Here we have done so in two different ways. BootRand uses an estimate of F that is better than the totally nonparametric estimate F̂. BootAve makes use of the particular form of R for the overoptimism problem.

10. A COMPLICATED PREDICTION PROBLEM

We end this article with the bootstrap analysis of a genuine prediction problem, involving many of the complexities and difficulties typical of genuine problems. The bootstrap is not necessarily the best method here, as discussed in Section 9, but it is impressive to see how much information this simple idea, combined with massive computation, can extract from a situation that is hopelessly beyond traditional theoretical solutions. A fuller discussion appears in Efron and Gong (1981).

Among n = 155 acute chronic hepatitis patients, 33 were observed to die from the disease, while 122 survived. Each patient had associated a vector of 20 covariates. On the basis of this training set it was desired to produce a rule for predicting, from the covariates, whether a given patient would live or die. If an effective prediction rule were available, it would be useful in choosing among alternative treatments. For example, patients with a very low predicted probability of death could be given less rigorous treatment.

Let x_i = (t_i, y_i) represent the data for patient i, i = 1, 2, ..., 155. Here t_i is the 20-dimensional vector of covariates, and y_i equals 1 or 0 as the patient died or lived. Table 7 shows the data for the last 11 patients. Negative numbers represent missing values. Variable 1 is the constant 1, included for convenience. The meaning of the 19 other predictors, and their coding in Table 7, will not be explained here.

A prediction rule was constructed in 3 steps:

1. An α = .05 test of the importance of predictor j, H_0: β_j = 0 versus H_1: β_j ≠ 0, was run separately for j = 2, 3, ..., 20, based on the logistic model

log [π(t_i)/(1 - π(t_i))] = β_1 + β_j t_{ij},   π(t_i) = Prob{patient i dies}.

Among these 19 tests, 13 predictors indicated predictive power by rejecting H_0: j = 18, 13, 15, 12, 14, 7, 6, 19, 20, 11, 2, 5, 3. These are listed in order of achieved significance level, j = 18 attaining the smallest alpha.

2. These 13 predictors were tested in a forward multiple-logistic-regression program, which added predictors one at a time (beginning with the constant) until no further single addition achieved significance level α = .10. Five predictors besides the constant survived this step, j = 13, 20, 15, 7, 2.

3. A final forward, stepwise multiple-logistic-regression program on these five predictors, stopping this time at level α = .05, retained four predictors besides the constant, j = 13, 15, 7, 20.

At each of the three steps, only those patients having no relevant data missing were included in the hypothesis tests. At step 2 for example, a patient was included only if all 13 variables were available.

The final prediction rule was based on the estimated logistic regression

log [π̂(t_i)/(1 - π̂(t_i))] = Σ_{j = 1, 13, 15, 7, 20} β̂_j t_{ij},

where β̂ was the maximum likelihood estimate in this model. The prediction rule was

η(t; x) = 1 if Σ_j β̂_j t_j > c,  0 if Σ_j β̂_j t_j ≤ c,   (26)

c = log 33/122.

Among the 155 patients, 133 had none of the predictors 13, 15, 7, 20 missing. When the rule η(t; x) was applied to these 133 patients, it misclassified 21 of them, for an apparent error rate êrr = 21/133 = .158. We would like to estimate how overoptimistic êrr is.

To answer this question, the simple bootstrap was applied as described in Section 9. A typical bootstrap sample consisted of X*_1, X*_2, ..., X*_155, randomly drawn with replacement from the training set x_1, x_2, ..., x_155. The bootstrap sample was used to construct the bootstrap prediction rule η(·, X*), following the same three steps used in the construction of η(·, x), (26). This gives a bootstrap replication R* for the overoptimism random variable R = err - êrr, essentially as in (25), but with a modification to allow for difficulties caused by missing predictor values.
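Rule (26) and the apparent error count are simple to express in code. The sketch below is purely illustrative: the coefficient vector is a hypothetical placeholder rather than the maximum likelihood estimate from the article, and only the thresholding at c = log 33/122 and the missing-value screening are shown.

```python
import numpy as np

# Hypothetical placeholder coefficients, NOT the article's estimates;
# each row of t_rows holds the selected covariates (1, t13, t15, t7, t20).
beta_hat = np.array([-1.0, 0.5, 0.4, 0.3, 0.2])
c = np.log(33 / 122)            # threshold c = log 33/122 from rule (26)

def eta(t_rows):
    """Rule (26): predict death (1) when the linear score exceeds c."""
    return (t_rows @ beta_hat > c).astype(int)

def apparent_error(t_rows, y):
    """Apparent error rate: misclassification proportion, computed only on
    cases with none of the selected predictors missing (coded >= 0)."""
    ok = (t_rows >= 0).all(axis=1)
    return np.mean(eta(t_rows[ok]) != y[ok])

# usage: apparent_error(T_selected, y), with T_selected the five selected columns
```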
Figure 5. Histogram of 500 bootstrap replications of overoptimism for the hepatitis problem.

Figure 5 shows the histogram of B = 500 such replications. 95 percent of these fall in the range 0 ≤ R* ≤ .12. This indicates that the unobservable true overoptimism err - êrr is likely to be positive. The average value is ω̂_B = .045, the bootstrap estimate of overoptimism.

... (see Efron 1982, Ch. VII), which by definition equals [E(err - êrr - ω̂)²]^{1/2}, the √MSE of êrr + ω̂ as an estimate of err. Comparing line 1 with line 4 in Table 6, we expect êrr + ω̂_B = .203 to have √MSE at least this big for estimating err.

Figure 6 illustrates another use of the bootstrap replications. The predictors chosen by the three-step selection procedure, applied to the bootstrap training set X*, are shown for the last 25 of the 500 replications. Among all 500 replications, predictor 13 was selected 37 percent of the time, predictor 15 selected 48 percent, predictor 7 selected 35 percent, and predictor 20 selected 59 percent. No other predictor was selected more than 50 percent of the time. No theory exists for interpreting Figure 6, but the results certainly discourage confidence in the causal nature of the predictors 13, 15, 7, 20.

[Received January 1982. Revised May 1982.]

REFERENCES