1. Bootstrap and Jackknife Methods (Lec 9)
These are resampling techniques used to estimate properties of sample statistics (such as
their bias, standard error, or confidence intervals) without making strong assumptions
about the underlying distribution of the data.
Bootstrap Method:
o Concept: To mimic the process of sampling from a larger population, we
"pull ourselves up by our own bootstraps" by sampling with replacement
from our original data sample.
o Procedure:
1. Create a "bootstrap sample" by randomly drawing n observations
from the original dataset of size n, with replacement. This means
some original data points may appear multiple times, while others
may not appear at all.
2. Calculate the statistic of interest (e.g., mean, median, standard
deviation) for this bootstrap sample.
3. Repeat steps 1 and 2 a large number of times (e.g., B = 1000
times) to get B bootstrap statistics.
4. The distribution of these B statistics is an approximation of the
sampling distribution of your statistic.
o Application: The standard deviation of the B bootstrap statistics is the
Bootstrap estimate of the standard error of your original statistic.
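A minimal Python sketch of this procedure (NumPy assumed; the dataset and the statistic, here the mean, are placeholders):

```python
import numpy as np

def bootstrap_se(data, stat=np.mean, B=1000, rng=None):
    """Bootstrap estimate of the standard error of `stat` on `data`."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = len(data)
    # Steps 1-3: draw B samples of size n with replacement and compute the statistic on each
    boot_stats = np.array([stat(rng.choice(data, size=n, replace=True)) for _ in range(B)])
    # Step 4: the spread of the B bootstrap statistics approximates the sampling distribution;
    # its standard deviation is the bootstrap standard-error estimate
    return boot_stats.std(ddof=1)
```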
Jackknife Method:
o Concept: A simpler, less computationally intensive resampling method
than Bootstrap. It systematically leaves out one observation at a time.
o Procedure:
1. Create n "jackknife samples" by deleting one observation at a time
from the original dataset of size n.
2. Calculate the statistic of interest for each of the n jackknife
samples.
3. Use the collection of these n statistics to estimate the bias and the
standard error of the original statistic.
o Formulas:
Jackknife estimate of bias: bias_jack = (n − 1)(θ̄_jack − θ_obs), where θ̄_jack
is the mean of the n jackknife statistics and θ_obs is the statistic from
the original sample.
Jackknife estimate of standard error: SE_jack = √[ ((n − 1)/n) Σ_{i=1..n} (θ_i − θ̄_jack)² ],
where θ_i is the statistic computed on the i-th jackknife sample.
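These formulas translate directly into a short Python sketch (NumPy assumed; the statistic defaults to the mean):

```python
import numpy as np

def jackknife(data, stat=np.mean):
    """Jackknife estimates of the bias and standard error of `stat` on `data`."""
    data = np.asarray(data)
    n = len(data)
    theta_obs = stat(data)                       # statistic on the full sample
    # Leave-one-out statistics: theta_i computed with observation i removed
    theta_i = np.array([stat(np.delete(data, i)) for i in range(n)])
    theta_bar = theta_i.mean()                   # mean of the jackknife statistics
    bias = (n - 1) * (theta_bar - theta_obs)
    se = np.sqrt((n - 1) / n * np.sum((theta_i - theta_bar) ** 2))
    return bias, se
```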
2. Bayesian Statistics (Lec 10-11)
Bayesian statistics is a framework for inference that updates the probability for a
hypothesis based on new evidence. It's centered around Bayes' Theorem.
Bayes' Theorem: The core of Bayesian inference. It relates the prior belief to the
posterior belief after accounting for the data: P(H|D) = P(D|H) × P(H) / P(D), where
P(H) is the prior, P(D|H) is the likelihood of the data under hypothesis H, and P(H|D)
is the posterior.
Conclusion of the worked disease-screening example: there is only a ~0.17% chance you
have the disease even if you test negative.
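A conclusion like this follows directly from Bayes' Theorem. The sketch below shows the mechanics with assumed, illustrative inputs (1% prevalence, 85% sensitivity, 90% specificity; these are assumptions for illustration, not necessarily the lecture's numbers):

```python
def p_disease_given_negative(prevalence, sensitivity, specificity):
    """Posterior probability of disease given a negative test, via Bayes' Theorem."""
    p_neg_given_disease = 1 - sensitivity                 # false-negative rate
    p_neg_given_healthy = specificity                     # true-negative rate
    prior_times_likelihood = p_neg_given_disease * prevalence               # P(neg | D) P(D)
    evidence = prior_times_likelihood + p_neg_given_healthy * (1 - prevalence)  # P(neg)
    return prior_times_likelihood / evidence              # P(D | neg)

# Assumed illustrative inputs: 1% prevalence, 85% sensitivity, 90% specificity
print(p_disease_given_negative(0.01, 0.85, 0.90))         # ~0.0017, i.e. about 0.17%
```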
3. Monte Carlo Methods (Lec 11-12)
This is a broad class of computational algorithms that rely on repeated random
sampling to obtain numerical results. They are particularly useful for optimization,
numerical integration, and drawing samples from probability distributions.
Monte Carlo Integration: A method to approximate the value of a definite
integral using random numbers.
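A generic sketch of the sample-mean method in Python (NumPy assumed; the integrand f, the interval, and N are placeholders):

```python
import numpy as np

def mc_integrate(f, a, b, N=100_000, rng=None):
    """Sample-mean Monte Carlo estimate of the integral of f over [a, b]."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(a, b, size=N)          # N uniform random points in [a, b]
    return (b - a) * f(x).mean()           # (b - a) times the average of f(x_i)

print(mc_integrate(lambda x: x**2, 0.0, 1.0))   # close to 1/3
```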
4. Statistical and Systematic Uncertainties (Lec 13)
Understanding and quantifying uncertainty is crucial for interpreting experimental
results.
Types of Uncertainty:
o Systematic Uncertainty: Due to flaws in the experimental setup or
method. It causes results to be consistently wrong in one direction (bias).
It cannot be reduced by repeating measurements.
o Statistical (Random) Uncertainty: Due to random fluctuations in
measurements. It can be reduced by increasing the sample size n.
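The distinction can be illustrated with a small simulation: each measurement is the true value plus a constant offset (systematic) plus random noise (statistical). Averaging more measurements shrinks the statistical uncertainty roughly as 1/√n but leaves the offset untouched. The numbers used here (true value 10.0, offset 0.5, noise σ = 2.0) are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value, offset, sigma = 10.0, 0.5, 2.0   # assumed illustrative values

for n in (10, 1000, 100_000):
    # Each measurement = truth + systematic offset + random noise
    measurements = true_value + offset + rng.normal(0.0, sigma, size=n)
    mean = measurements.mean()
    stat_unc = measurements.std(ddof=1) / np.sqrt(n)   # statistical uncertainty of the mean
    print(f"n={n:>6}: mean = {mean:6.3f}, statistical uncertainty = {stat_unc:.3f}")
# The statistical uncertainty shrinks like 1/sqrt(n); the ~0.5 systematic bias does not.
```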
5. Advanced and Numerical Methods (Lec 14)
This topic covers various numerical techniques used to solve problems that lack
analytical solutions.
Root Finding: Algorithms for solving equations of the form f(x)=0.
o Bisection Method: A simple bracketing method that repeatedly halves an
interval and selects the subinterval in which the root must lie.
o Newton-Raphson Method: An iterative method that uses the function's
derivative to find successively better approximations to the root. It's faster
but can be less stable.
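Both methods are easy to sketch in Python; here they are applied to the assumed example f(x) = x² − 2, whose positive root is √2:

```python
def bisection(f, lo, hi, tol=1e-10):
    """Bracketing method: repeatedly halve the interval that contains a sign change."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid          # root lies in the left half
        else:
            lo = mid          # root lies in the right half
    return 0.5 * (lo + hi)

def newton_raphson(f, df, x0, tol=1e-10, max_iter=100):
    """Iterate x -> x - f(x)/f'(x); fast near the root, but needs the derivative."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

f = lambda x: x**2 - 2
print(bisection(f, 0.0, 2.0))                    # ~1.41421356 (sqrt(2))
print(newton_raphson(f, lambda x: 2 * x, 1.0))   # ~1.41421356
```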
Numerical Optimization (Minimization/Maximization):
o Purpose: To find the minimum or maximum of a function. This is essential
in statistics for finding:
Maximum Likelihood Estimate (MLE): The parameter value that
maximizes the likelihood function L(θ∣D).
Maximum A Posteriori (MAP) Estimate: The parameter value that
maximizes the posterior probability p(θ∣D).
o Algorithms:
Gradient Descent/Ascent: An iterative algorithm that takes steps
in the direction of the steepest descent (for minimization) or ascent
(for maximization) of the function's gradient.
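A minimal gradient-descent sketch, minimizing the assumed example f(x) = (x − 3)² with gradient 2(x − 3); the learning rate and starting point are arbitrary choices:

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to move toward a (local) minimum."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)   # flip the sign of the step for gradient ascent
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 (x - 3); the minimum is at x = 3
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # ~3.0
```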
1. Bootstrap Method
Definition
The Bootstrap is a computer-based method for estimating the precision of a statistical
estimate, such as its standard error or bias. The core idea is to treat the collected
sample as if it were the entire population. By repeatedly drawing new samples with
replacement from the original sample, one can create a distribution of the statistic of
interest and measure its variability.
Example: Estimating the Standard Error of the Mean
Let's say we have a small original dataset of n=4 observations: X = {2, 4, 5, 9}. We want
to estimate the standard error of the mean of this dataset.
1. Calculate the original statistic:
The mean of the original sample is x̄ = (2 + 4 + 5 + 9)/4 = 5.
2. Create Bootstrap Samples:
Create a large number of new samples (called bootstrap samples), each of size
n = 4, by sampling with replacement from the original dataset X.
o Bootstrap Sample 1: {9, 2, 4, 9} → Mean = (9+2+4+9)/4 = 6.0
o Bootstrap Sample 2: {5, 9, 2, 2} → Mean = (5+9+2+2)/4 = 4.5
o Bootstrap Sample 3: {4, 5, 5, 2} → Mean = (4+5+5+2)/4 = 4.0
o ... (Repeat this process a large number of times, e.g., 1000 times)
3. Collect the Bootstrap Statistics:
After creating 1000 bootstrap samples, you will have a list of 1000 bootstrap
means: {6.0, 4.5, 4.0, ...}.
4. Estimate the Standard Error:
The Bootstrap estimate of the standard error of the mean is the standard
deviation of this list of 1000 bootstrap means.
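The whole example can be reproduced in a few lines of Python (NumPy assumed; the result varies slightly from run to run because the resampling is random):

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed only for reproducibility
X = np.array([2, 4, 5, 9])
B = 1000

# Steps 2-3: B bootstrap samples of size n=4, drawn with replacement, and their means
boot_means = np.array([rng.choice(X, size=len(X), replace=True).mean() for _ in range(B)])

# Step 4: the bootstrap standard-error estimate is the standard deviation of these means
print(boot_means.std(ddof=1))
```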
2. Jackknife Method
Definition
The Jackknife is a method used to estimate the bias and standard error of a statistic.
Unlike the Bootstrap, it is a deterministic method that works by systematically creating
new samples by leaving out one observation at a time from the original dataset.
Example: Estimating the Standard Error of the Mean
Using the same original dataset: X = {2, 4, 5, 9}.
1. Calculate the original statistic:
The mean of the original sample is x̄ = (2 + 4 + 5 + 9)/4 = 5.
2. Create Jackknife Samples:
Create n=4 new samples (called jackknife samples) by leaving out one
observation at a time.
o Sample 1 (leave out 2): {4, 5, 9} → Mean = (4+5+9)/3 = 6.00
o Sample 2 (leave out 4): {2, 5, 9} → Mean = (2+5+9)/3 ≈ 5.33
o Sample 3 (leave out 5): {2, 4, 9} → Mean = (2+4+9)/3 = 5.00
o Sample 4 (leave out 9): {2, 4, 5} → Mean = (2+4+5)/3 ≈ 3.67
3. Collect the Jackknife Statistics:
The collection of means calculated from the jackknife samples is {6.00, 5.33,
5.00, 3.67}.
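Applying the formulas from Section 1 to these four means completes the example: θ̄_jack = (6.00 + 5.33 + 5.00 + 3.67)/4 = 5.00 and θ_obs = 5, so the jackknife bias estimate is (n − 1)(θ̄_jack − θ_obs) = 3 × (5 − 5) = 0 (the sample mean is unbiased), and the jackknife standard error is SE_jack = √[ (3/4) × (1.00 + 0.11 + 0.00 + 1.78) ] ≈ √2.17 ≈ 1.47, which agrees with the familiar s/√n for this dataset.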
3. Monte Carlo Method
Definition
A Monte Carlo method is a numerical technique that uses random numbers to compute
a result, often for problems that are difficult to solve analytically. One of the most
common applications is Monte Carlo Integration, which uses random numbers to
calculate the value of a definite integral.
Example: Monte Carlo Integration
1. Identify the Function and Interval:
The function is f(x) = x² and the interval is [a, b] = [0, 1].
2. Generate Random Numbers:
Generate a large number (N) of random numbers uniformly distributed between
a=0 and b=1. Let's say we generate N=10 random numbers:
{0.87, 0.15, 0.92, 0.43, 0.38, 0.67, 0.21, 0.55, 0.74, 0.09}
3. Evaluate the Function:
Calculate the value of f(x)=x2 for each random number:
{0.7569, 0.0225, 0.8464, 0.1849, 0.1444, 0.4489, 0.0441, 0.3025, 0.5476,
0.0081}
4. Calculate the Average:
Find the average of the function values from Step 3.
Average ≈ (0.7569 + ... + 0.0081) / 10 ≈ 0.3306
5. Estimate the Integral:
Apply the sample-mean formula for integration:
∫_a^b f(x) dx ≈ (b − a) × (average of f(x_i))
Integral ≈ (1 - 0) × 0.3306 = 0.3306
As the number of random points (N) increases, this estimate will get closer to the true
value of 1/3.
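A short Python sketch reproduces the calculation with the ten random numbers listed above, and then repeats it with a much larger N to show the estimate approaching 1/3 (NumPy assumed):

```python
import numpy as np

# Steps 2-5 with the ten random numbers listed above
x = np.array([0.87, 0.15, 0.92, 0.43, 0.38, 0.67, 0.21, 0.55, 0.74, 0.09])
print((1 - 0) * (x**2).mean())          # 0.33063, the 0.3306 estimate above

# The same estimator with N = 1,000,000 uniform points approaches the true value 1/3
rng = np.random.default_rng()
x_big = rng.uniform(0.0, 1.0, size=1_000_000)
print((x_big**2).mean())                # close to 0.3333...
```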