The document discusses resampling methods like cross-validation and the bootstrap. It explains that the bootstrap is a statistical tool that can quantify the uncertainty associated with an estimator or statistical learning method. As an example, it shows how the bootstrap can estimate the standard error of coefficients from a linear regression. It then demonstrates applying the bootstrap to estimate the variability of the optimal asset allocation coefficient α in a basic portfolio optimization model.
Resampling Methods: The Bootstrap

Rebecca C. Steorts, Duke University

STA 325, Chapter 5 ISL

Agenda

- Re-sampling Methods
- Cross-Validation
- The Bootstrap

Re-sampling Methods

A re-sampling method involves repeatedly drawing samples from a training data set and refitting a model to obtain additional information about that model.
Example: Suppose we want to know the variability associated with a
linear regression model.

1. Draw different samples from the training data
2. Fit a linear regression to each sample
3. Examine how the fits differ

Re-sampling Methods

In this module, we focus on cross-validation (CV) and the bootstrap.

- CV can be used to estimate the test error associated with a statistical learning method, either to evaluate its performance or to select a model’s level of flexibility.
- The bootstrap is most commonly used to measure the accuracy of a parameter estimate or of a given statistical learning method.

1. Model assessment: the process of evaluating a model’s performance.
2. Model selection: the process of selecting the proper level of flexibility for a model.

The Bootstrap

The bootstrap is a widely applicable and extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method.

The Bootstrap

As a simple example, the bootstrap can be used to estimate the standard errors of the coefficients from a linear regression fit.
Of course, we can get these from packages, so this isn’t particularly useful, but it is one simple example of the bootstrap.
The power of the bootstrap lies in the fact that it can be easily applied to a wide range of statistical learning methods, including some for which a measure of variability is otherwise difficult to obtain and is not automatically output by statistical software.

Toy example: Investing

Suppose we wish to determine the best investment allocation under a simple model.
Later, we explore the use of the bootstrap to assess the variability associated with the regression coefficients in a linear model fit.

Toy example: Investing

Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns of X and Y, where X and Y are random quantities.
We will invest a fraction α of our money in X, and will invest the remaining 1 − α in Y.
Since there is variability associated with the returns on these two assets, we wish to choose α to minimize the total risk, or variance, of our investment.

Toy example: Investing

That is, we want to minimize

Var(αX + (1 − α)Y).

One can show (exercise) that the value that minimizes the risk is
given by

α = (σY² − σXY) / (σY² + σX² − 2σXY),    (1)

where σX² = Var(X), σY² = Var(Y), and σXY = Cov(X, Y).
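The exercise can be sketched as follows (a standard calculus argument, filled in here rather than taken from the slides): expand the variance, differentiate with respect to α, and set the derivative to zero.

```latex
\operatorname{Var}(\alpha X + (1-\alpha)Y)
  = \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY}
% differentiate with respect to alpha and set to zero:
2\alpha \sigma_X^2 - 2(1-\alpha)\sigma_Y^2 + (2 - 4\alpha)\sigma_{XY} = 0
\;\Longrightarrow\;
\alpha\left(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}\right) = \sigma_Y^2 - \sigma_{XY}
\;\Longrightarrow\;
\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}
```

The second derivative, 2(σX² + σY² − 2σXY), is positive whenever the denominator is, so this critical point is indeed the minimizer.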

Toy example: Investing

- In reality, σX², σY², and σXY are unknown.
- We can compute estimates of these quantities, σ̂X², σ̂Y², σ̂XY, using a data set that contains past measurements for X and Y.
- We can then estimate the value of α that minimizes the variance of our investment using

α̂ = (σ̂Y² − σ̂XY) / (σ̂Y² + σ̂X² − 2σ̂XY).    (2)
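As a quick numeric illustration of equation (2), with made-up sample moments (these values are hypothetical, not from the slides):

```python
# Plug-in estimate of alpha; the sample moments below are hypothetical values
var_x = 1.00   # estimate of Var(X)
var_y = 1.25   # estimate of Var(Y)
cov_xy = 0.50  # estimate of Cov(X, Y)

alpha = (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)
print(alpha)  # -> 0.6
```

With these numbers, just over half of the money is allocated to X.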

Toy example: Investing

- It is natural to wish to quantify the accuracy of our estimate of α.
- We can understand how this might work for simulated data, but in general we cannot apply this to real data, since we cannot generate new samples from the original population (it is unknown).

The Bootstrap

The bootstrap approach allows us to use a computer to emulate the process of obtaining new sample sets, so that we can estimate the variability of α̂ without generating additional samples.
Rather than repeatedly obtaining independent data sets from the population, we instead obtain distinct data sets by repeatedly sampling observations from the original data set.

The Bootstrap

Suppose we have a simple dataset Z with n observations.

1. Randomly select n observations from the data set in order to produce a bootstrap data set, Z∗1.
   - The sampling is performed with replacement, which means that the same observation can occur more than once in the bootstrap data set.
2. Use Z∗1 to produce a new bootstrap estimate for α, which we call α̂∗1.

The Bootstrap (continued)

This procedure is repeated B times, for some large value of B, in order to produce B different bootstrap data sets,

Z∗1, Z∗2, . . . , Z∗B,

and B corresponding α estimates,

α̂∗1, α̂∗2, . . . , α̂∗B.

3. We can compute the standard error of these bootstrap estimates using the formula

SE_B(α̂) = sqrt( (1/(B − 1)) Σ_{r=1..B} (α̂∗r − ᾱ∗)² ),  where ᾱ∗ = (1/B) Σ_{r′=1..B} α̂∗r′.

4. This serves as an estimate of the standard error of α̂ estimated from the original data set.
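The four steps above can be sketched directly in code. A minimal Python illustration on synthetic data follows (the lecture's own code uses R and the boot() function; the data and names here are made up for illustration):

```python
import math
import random

def alpha_hat(x, y):
    """Plug-in estimate of alpha (equation (2)) from paired returns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    vy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    return (vy - cxy) / (vx + vy - 2 * cxy)

def bootstrap_se(x, y, B=1000, seed=1):
    """Standard error of alpha_hat over B bootstrap resamples (steps 1-3)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(B):
        # Step 1: draw n indices with replacement from the original data
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: recompute the estimate on the bootstrap data set
        estimates.append(alpha_hat([x[i] for i in idx], [y[i] for i in idx]))
    # Step 3: sample standard deviation of the B bootstrap estimates
    mean = sum(estimates) / B
    return math.sqrt(sum((a - mean) ** 2 for a in estimates) / (B - 1))

# Synthetic correlated returns, for illustration only
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
print(alpha_hat(x, y), bootstrap_se(x, y))
```

Step 4 is the interpretation: the returned value is used as an estimate of SE(α̂) for the original sample.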
The Bootstrap

Figure 1: A graphical illustration of the bootstrap approach on a small sample containing n = 3 observations. Each bootstrap data set contains n observations, sampled with replacement from the original data set. Each bootstrap data set is used to obtain an estimate of α.
The Bootstrap in Practice

Performing a bootstrap analysis in R entails only two steps.

1. We must create a function that computes the statistic of interest.
2. We use the boot() function, which is part of the boot library,
to perform the bootstrap by repeatedly sampling observations
from the data set with replacement.

The Bootstrap on the Portfolio data set

The Portfolio data set in the ISLR package is the investment data
set that motivated the bootstrap earlier.
To illustrate the use of the bootstrap on this data, we must

1. Create a function, alpha.fn(), which takes as input the (X, Y) data as well as a vector indicating which observations should be used to estimate α.
2. The function then outputs the estimate for α based on the selected observations.

The Bootstrap on the Portfolio data set

library(ISLR)

# Estimate alpha (equation (2)) from the observations selected by `index`
alpha.fn <- function(data, index) {
  X <- data$X[index]
  Y <- data$Y[index]
  return((var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y)))
}

alpha.fn(Portfolio, 1:100)

## [1] 0.5758321

This function returns, or outputs, an estimate for α based on applying equation (2) to the observations indexed by the argument index.

The Bootstrap on the Portfolio data set

The next command uses the sample() function to randomly select 100 observations from the range 1 to 100, with replacement.
This is equivalent to constructing a new bootstrap data set and recomputing α̂ based on the new data set.

set.seed(1)
alpha.fn(Portfolio, sample(100, 100, replace = TRUE))

## [1] 0.5963833

Implementing the Bootstrap

We can implement a bootstrap analysis by performing this command many times, recording all of the corresponding estimates for α, and computing the resulting standard deviation.
The boot() function automates this approach.

Implementing the Bootstrap

Load the boot package in R.

library(boot)

Implementing the Bootstrap
# produce R=1000 bootstrap estimates
# for alpha using boot()
boot(Portfolio, alpha.fn, R=1000)

##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Portfolio, statistic = alpha.fn, R = 1000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.5758321 -7.315422e-05 0.08861826

Bootstrap Summary Results

The final output shows that using the original data, α̂ = 0.5758,
and that the bootstrap estimate for SE(α̂) is 0.0886.

Bootstrap: Other Applied Examples with R

There is an excellent lab on applying the bootstrap to linear regression. Please work through it on your own. See ISL, pages 195–197.

