The document discusses resampling methods like cross-validation and the bootstrap. It explains that the bootstrap is a statistical tool that can quantify the uncertainty associated with an estimator or statistical learning method. As an example, it shows how the bootstrap can estimate the standard error of coefficients from a linear regression. It then demonstrates applying the bootstrap to estimate the variability of the optimal asset allocation coefficient α in a basic portfolio optimization model.
Resampling Methods: The Bootstrap

Rebecca C. Steorts, Duke University

STA 325, Chapter 5 ISL

Agenda

- Re-sampling Methods
- Cross-Validation
- The Bootstrap

Re-sampling Methods

A re-sampling method involves repeatedly drawing samples from a training data set and refitting a model to obtain additional information about that model.
Example: Suppose we want to know the variability associated with a
linear regression model.

1. Draw different samples from the training data
2. Fit a linear regression to each sample
3. Examine how the fits differ

Re-sampling Methods

In this module, we focus on cross-validation (CV) and the bootstrap.

- CV can be used to estimate the test error associated with a statistical learning method, either to evaluate its performance or to select a model’s level of flexibility.
- The bootstrap is most commonly used to measure the accuracy of a parameter estimate or of a given statistical learning method.

1. Model assessment: the process of evaluating a model’s performance.
2. Model selection: the process of selecting the proper level of flexibility for a model.

The Bootstrap

The bootstrap is a widely applicable and extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method.

The Bootstrap

As a simple example, the bootstrap can be used to estimate the standard errors of the coefficients from a linear regression fit.
Of course, we can get these from packages, so this isn’t particularly useful, but it is one simple example of the bootstrap.
The power of the bootstrap lies in the fact that it can be easily applied to a wide range of statistical learning methods, including some for which a measure of variability is otherwise difficult to obtain and is not automatically output by statistical software.

Toy example: Investing

Suppose we wish to determine the best investment allocation under a simple model.
Later, we explore the use of the bootstrap to assess the variability associated with the regression coefficients in a linear model fit.

Toy example: Investing

Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns of X and Y, where X and Y are random quantities.
We will invest a fraction α of our money in X, and will invest the remaining 1 − α in Y.
Since there is variability associated with the returns on these two assets, we wish to choose α to minimize the total risk, or variance, of our investment.

Toy example: Investing

That is, we want to minimize

Var(αX + (1 − α)Y).

One can show (exercise) that the value that minimizes the risk is
given by

α = (σY² − σXY) / (σY² + σX² − 2σXY),    (1)

where σX² = Var(X), σY² = Var(Y), and σXY = Cov(X, Y).
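The exercise can be sketched as follows (a standard calculus argument, filled in here rather than taken from the slides): expand the variance, differentiate with respect to α, and set the derivative to zero.

```latex
\operatorname{Var}(\alpha X + (1-\alpha)Y)
  = \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY}
% differentiate with respect to alpha and set to zero:
2\alpha \sigma_X^2 - 2(1-\alpha)\sigma_Y^2 + (2 - 4\alpha)\sigma_{XY} = 0
\;\Longrightarrow\;
\alpha\left(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}\right) = \sigma_Y^2 - \sigma_{XY}
\;\Longrightarrow\;
\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}
```

The second derivative, 2(σX² + σY² − 2σXY), is positive whenever the denominator is, so this critical point is indeed the minimizer.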

Toy example: Investing

- In reality, σX², σY², and σXY are unknown.
- We can compute estimates of these quantities, σ̂X², σ̂Y², σ̂XY, using a data set that contains past measurements for X and Y.
- We can then estimate the value of α that minimizes the variance of our investment using

α̂ = (σ̂Y² − σ̂XY) / (σ̂Y² + σ̂X² − 2σ̂XY).    (2)
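As a quick numeric illustration of equation (2), with made-up sample moments (these values are hypothetical, not from the slides):

```python
# Plug-in estimate of alpha; the sample moments below are hypothetical values
var_x = 1.00   # estimate of Var(X)
var_y = 1.25   # estimate of Var(Y)
cov_xy = 0.50  # estimate of Cov(X, Y)

alpha = (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)
print(alpha)  # -> 0.6
```

With these numbers, just over half of the money is allocated to X.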

Toy example: Investing

- It is natural to wish to quantify the accuracy of our estimate of α.
- We can understand how this might work for simulated data, but in general we cannot apply this to real data, since we cannot generate new samples from the original population (it is unknown).

The Bootstrap

The bootstrap approach allows us to use a computer to emulate the process of obtaining new sample sets, so that we can estimate the variability of α̂ without generating additional samples.
Rather than repeatedly obtaining independent data sets from the population, we instead obtain distinct data sets by repeatedly sampling observations from the original data set.

The Bootstrap

Suppose we have a simple dataset Z with n observations.

1. Randomly select n observations from the data set in order to produce a bootstrap data set, Z∗1.
   - The sampling is performed with replacement, which means that the same observation can occur more than once in the bootstrap data set.
2. Use Z∗1 to produce a new bootstrap estimate for α, which we call α̂∗1.

The Bootstrap (continued)

This procedure is repeated B times, for some large value of B, in order to produce B different bootstrap data sets,

Z∗1, Z∗2, . . . , Z∗B,

and B corresponding α estimates,

α̂∗1, α̂∗2, . . . , α̂∗B.

3. We can compute the standard error of these bootstrap estimates using the formula

SE_B(α̂) = sqrt( (1/(B − 1)) Σ_{r=1..B} (α̂∗r − ᾱ∗)² ),  where ᾱ∗ = (1/B) Σ_{r′=1..B} α̂∗r′.

4. This serves as an estimate of the standard error of α̂ estimated from the original data set.
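The four steps above can be sketched directly in code. A minimal Python illustration on synthetic data follows (the lecture's own code uses R and the boot() function; the data and names here are made up for illustration):

```python
import math
import random

def alpha_hat(x, y):
    """Plug-in estimate of alpha (equation (2)) from paired returns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    vy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    return (vy - cxy) / (vx + vy - 2 * cxy)

def bootstrap_se(x, y, B=1000, seed=1):
    """Standard error of alpha_hat over B bootstrap resamples (steps 1-3)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(B):
        # Step 1: draw n indices with replacement from the original data
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: recompute the estimate on the bootstrap data set
        estimates.append(alpha_hat([x[i] for i in idx], [y[i] for i in idx]))
    # Step 3: sample standard deviation of the B bootstrap estimates
    mean = sum(estimates) / B
    return math.sqrt(sum((a - mean) ** 2 for a in estimates) / (B - 1))

# Synthetic correlated returns, for illustration only
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
print(alpha_hat(x, y), bootstrap_se(x, y))
```

Step 4 is the interpretation: the returned value is used as an estimate of SE(α̂) for the original sample.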
The Bootstrap

Figure 1: A graphical illustration of the bootstrap approach on a small sample containing n = 3 observations. Each bootstrap data set contains n observations, sampled with replacement from the original data set. Each bootstrap data set is used to obtain an estimate of α.
The Bootstrap in Practice

Performing a bootstrap analysis in R entails only two steps.

1. We must create a function that computes the statistic of interest.
2. We use the boot() function, which is part of the boot library,
to perform the bootstrap by repeatedly sampling observations
from the data set with replacement.

The Bootstrap on the Portfolio data set

The Portfolio data set in the ISLR package is the investment data
set that motivated the bootstrap earlier.
To illustrate the use of the bootstrap on this data, we must

1. Create a function, alpha.fn(), which takes as input the (X, Y) data as well as a vector indicating which observations should be used to estimate α.
2. The function then outputs the estimate for α based on the selected observations.

The Bootstrap on the Portfolio data set

library(ISLR)

# Estimate alpha (equation (2)) from the observations selected by `index`
alpha.fn <- function(data, index) {
  X <- data$X[index]
  Y <- data$Y[index]
  return((var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y)))
}

alpha.fn(Portfolio, 1:100)

## [1] 0.5758321

This function returns, or outputs, an estimate for α based on applying equation (2) to the observations indexed by the argument index.

The Bootstrap on the Portfolio data set

The next command uses the sample() function to randomly select 100 observations from the range 1 to 100, with replacement.
This is equivalent to constructing a new bootstrap data set and recomputing α̂ based on the new data set.

set.seed(1)
alpha.fn(Portfolio, sample(100, 100, replace = TRUE))

## [1] 0.5963833

Implementing the Bootstrap

We can implement a bootstrap analysis by performing this command many times, recording all of the corresponding estimates for α, and computing the resulting standard deviation.
The boot() function automates this approach.

Implementing the Bootstrap

Load the boot package in R.

library(boot)

Implementing the Bootstrap
# produce R=1000 bootstrap estimates
# for alpha using boot()
boot(Portfolio, alpha.fn, R=1000)

##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Portfolio, statistic = alpha.fn, R = 1000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.5758321 -7.315422e-05 0.08861826

Bootstrap Summary Results

The final output shows that using the original data, α̂ = 0.5758,
and that the bootstrap estimate for SE(α̂) is 0.0886.

Bootstrap: Other Applied Examples with R

There is an excellent lab on applying the bootstrap to linear regression. Please work through it on your own. See ISL, pages 195–197.

