Questions_for_Unit_4 (2)
Bayesian Approach
• Focus: Treats parameters as random variables with prior distributions.
• Bayes’ Theorem:
p(θ|X) = p(X|θ) p(θ) / p(X)
• Posterior Distribution: Combines prior and likelihood to form the posterior p(θ|X).
• Objective: Incorporate prior beliefs with sample evidence to update knowledge.
• Posterior Distribution:
P (θ|X) ∝ P (X|θ) · P (θ)
• Properties:
– Flexible, subjective.
– Allows prior updates with new data.
– Uses posterior predictive distribution for inference.
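To make the update rule concrete, here is a minimal Python sketch (assuming a Beta(2, 2) prior and hypothetical coin-toss data of 7 successes in 10 trials) that evaluates prior × likelihood on a grid and normalizes it to obtain the posterior:

```python
# Minimal sketch: posterior ∝ likelihood × prior, evaluated on a grid.
# The Beta(2, 2) prior and the data (7 successes in 10 trials) are hypothetical.
import numpy as np
from scipy.stats import beta, binom

theta = np.linspace(0.001, 0.999, 999)      # grid over the parameter θ
prior = beta.pdf(theta, 2, 2)               # p(θ)
likelihood = binom.pmf(7, 10, theta)        # p(X|θ)
unnormalized = likelihood * prior           # p(X|θ) p(θ)
posterior = unnormalized / np.trapz(unnormalized, theta)   # divide by p(X)

print("posterior mean ≈", np.trapz(theta * posterior, theta))
```

The grid normalization plays the role of p(X); with a conjugate prior (see Conjugate Priors below) the same posterior is available in closed form as Beta(2 + 7, 2 + 3).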
Summary:
• Philosophy: Classical estimation treats parameters as fixed values, while Bayesian estimation treats parameters
as random variables with distributions.
• Prior Information: Bayesian estimation incorporates prior beliefs through prior distributions, while classical
estimation uses data alone.
• Uncertainty Quantification: Classical methods typically use point estimates and confidence intervals, while
Bayesian methods provide a full distribution (posterior) and credible intervals.
• Computation: Classical estimation is often simpler, while Bayesian methods require more complex computational
techniques such as MCMC.
Methods of Estimation
Point Estimation
• A single value estimate of a parameter.
• Methods:
– MLE: Maximizes likelihood function;
θ̂MLE = arg max_θ L(θ|X)
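As a numerical illustration of the arg max definition, the sketch below assumes hypothetical i.i.d. Normal(θ, 1) data and maximizes the log-likelihood with scipy; the optimizer recovers the sample mean, which is the closed-form MLE in this model.

```python
# Sketch of θ̂_MLE = arg max_θ L(θ|X) by numerical optimization.
# The Normal(θ, 1) model and the data values are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

X = np.array([1.2, 0.8, 1.5, 0.9, 1.1])

def neg_log_likelihood(theta):
    # Minimizing −ℓ(θ; X) is equivalent to maximizing L(θ|X).
    return -np.sum(norm.logpdf(X, loc=theta, scale=1.0))

result = minimize_scalar(neg_log_likelihood)
print("numerical MLE ≈", result.x, "| sample mean =", X.mean())
```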
Interval Estimation
• Provides a range within which the parameter lies with a certain confidence.
• Confidence Interval:
[ θ̂ − zα/2 · σθ̂ , θ̂ + zα/2 · σθ̂ ]
• Log-Likelihood:
ℓ(θ; X) = log(L(θ; X))
• Properties of Likelihood:
– Consistency: θ̂MLE → θ as n → ∞.
– Asymptotic Normality: √n (θ̂MLE − θ) ∼ N(0, I⁻¹(θ))
where I(θ) is the Fisher information.
– The log-likelihood simplifies differentiation and thus helps find the MLE.
– For large samples, the MLE is approximately normally distributed.
• EM Algorithm:
– Purpose: Estimate parameters in models with latent variables.
– Properties:
∗ Iterative improvement.
∗ Converges to a local maximum of the likelihood.
– Iterative algorithm for finding MLE when data are incomplete or have latent variables.
– Steps:
∗ E-step: Compute the expectation of the complete-data log-likelihood, Q(θ|θ(t)), under the current estimate θ(t).
∗ M-step: Maximize Q(θ|θ(t)) with respect to θ to update the parameters (a minimal sketch follows this list).
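The following is a minimal, self-contained sketch of the E-step/M-step alternation for a two-component Gaussian mixture with known unit variances; the simulated data, initial guesses, and iteration count are purely illustrative.

```python
# Sketch of the EM loop for a two-component Gaussian mixture (unit variances).
# All data and starting values are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])

mu1, mu2, pi1 = -1.0, 1.0, 0.5                 # initial guesses
for _ in range(50):
    # E-step: responsibilities γ_i1 = P(component 1 | x_i)
    w1 = pi1 * norm.pdf(x, mu1, 1.0)
    w2 = (1 - pi1) * norm.pdf(x, mu2, 1.0)
    gamma1 = w1 / (w1 + w2)
    # M-step: weighted updates of the means and the mixing weight
    mu1 = np.sum(gamma1 * x) / np.sum(gamma1)
    mu2 = np.sum((1 - gamma1) * x) / np.sum(1 - gamma1)
    pi1 = gamma1.mean()

print("means ≈", mu1, mu2, "| mixing weight ≈", pi1)
```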
Prior Distributions
Conjugate Priors
• Definition: Prior and posterior distributions are in the same family.
• Examples:
– Normal prior for normal likelihood.
– Beta prior for binomial likelihood.
• Benefit: Simplifies computation of posterior.
Informative Prior
• Reflects specific prior knowledge about the parameter.
Non-informative Prior
• Represents lack of prior information (e.g., uniform distribution).
Loss Functions
• Purpose: Quantify the cost of estimation errors.
Risk Function
• Definition: Expected value of the loss function over the parameter space.
• Formula:
R(θ, θ̂) = Eθ [L(θ, θ̂)]
Examples
Example (Classical): Suppose you have a sample of heights from a population and want to estimate the population
mean µ.
Let’s assume the sample heights are X = {170, 165, 180, 175, 160}.
Sample Mean (Point Estimate):
µ̂ = (1/n) Σ Xi = (170 + 165 + 180 + 175 + 160) / 5 = 170
Confidence Interval for Mean (assuming normal distribution with unknown variance): Compute the sample standard deviation s:
s = √( (1/(n − 1)) Σ (Xi − µ̂)² ) = √( (1/4) Σ (Xi − 170)² ) ≈ 7.91
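The numbers above can be reproduced with the short sketch below, which also applies the zα/2-based interval formula from these notes; a t quantile is shown alongside only as a comparison, since the variance is unknown and n = 5 is small.

```python
# Sketch reproducing the classical example: point estimate and 95% intervals.
import numpy as np
from scipy import stats

X = np.array([170, 165, 180, 175, 160])
mu_hat = X.mean()                      # 170.0
s = X.std(ddof=1)                      # ≈ 7.91 (sample standard deviation)
se = s / np.sqrt(len(X))

z = stats.norm.ppf(0.975)              # z_{α/2} for a 95% interval
print("z-interval:", (mu_hat - z * se, mu_hat + z * se))

t = stats.t.ppf(0.975, df=len(X) - 1)  # t quantile, often preferred for small n
print("t-interval:", (mu_hat - t * se, mu_hat + t * se))
```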
2. Methods of Estimation
Example (Maximum Likelihood Estimation): Suppose X1 , X2 , . . . , Xn are i.i.d. samples from an exponential
distribution with unknown rate λ: f (x|λ) = λe−λx .
Likelihood Function:
L(λ) = ∏ λ e^(−λ Xi) = λ^n · e^(−λ Σ Xi)
Log-Likelihood:
ℓ(λ) = n ln λ − λ Σ Xi
Maximize by taking the derivative:
dℓ/dλ = n/λ − Σ Xi = 0 ⇒ λ̂ = n / Σ Xi
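A quick check of this closed form, using a hypothetical sample, is sketched below: the numerical maximizer of ℓ(λ) agrees with λ̂ = n / Σ Xi.

```python
# Sketch checking λ̂ = n / Σ X_i against a numerical maximizer of ℓ(λ).
# The data values are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar

X = np.array([0.5, 1.2, 0.3, 2.0, 0.8])
lam_closed = len(X) / X.sum()

def neg_log_lik(lam):
    # ℓ(λ) = n ln λ − λ Σ X_i
    return -(len(X) * np.log(lam) - lam * X.sum())

lam_numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 100), method="bounded").x
print(lam_closed, lam_numeric)   # the two estimates should agree closely
```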
Example (Method of Moments): Suppose you have data from a distribution with unknown mean µ and variance σ².
For a normal distribution, the first two moments are E[X] = µ and E[(X − µ)²] = σ².
Equating sample moments to population moments: sample mean X̄ = µ and sample variance S² = σ².
Thus, the method of moments estimates are µ̂ = X̄ and σ̂² = S².
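In code, these estimates are just the sample moments; the sketch below uses hypothetical data and notes that the plain method-of-moments variance divides by n (ddof=0) rather than n − 1.

```python
# Sketch of method-of-moments estimates for a normal model (hypothetical data).
import numpy as np

X = np.array([4.8, 5.1, 6.0, 4.3, 5.6])
mu_hat = X.mean()            # first sample moment
sigma2_hat = X.var(ddof=0)   # second central sample moment (divides by n)
print(mu_hat, sigma2_hat)
```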
4. Prior Distributions
Example (Conjugate Prior with Beta-Binomial): Assume X ∼ Binomial(n, θ) with a beta prior θ ∼ Beta(α, β).
Posterior Distribution: Since the beta prior is conjugate, the posterior is also a Beta distribution:
θ|X ∼ Beta(α + X, β + n − X)
Interpretation: Posterior updates based on the observed successes X and failures n − X, blending prior beliefs with
new data.
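Because the update is closed-form, it is a one-liner in code; the prior parameters and data below are hypothetical.

```python
# Sketch of the Beta-Binomial conjugate update: θ|X ~ Beta(α + X, β + n − X).
from scipy.stats import beta

alpha, beta_prior = 2, 2     # prior θ ~ Beta(α, β), hypothetical values
n, x = 10, 7                 # X = 7 successes in n = 10 trials, hypothetical

posterior = beta(alpha + x, beta_prior + n - x)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```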
5. Loss Functions
Example (Squared Error Loss): Suppose you want to estimate the parameter θ = 5 and your estimate θ̂ = 4.
Squared Error Loss:
L(θ, θ̂) = (θ − θ̂)² = (5 − 4)² = 1
Example (Bayesian Decision with Loss Function): For estimating a parameter with squared error loss, the
Bayes estimator is the posterior mean.
Suppose the posterior distribution of θ after observing data is θ|X ∼ N (10, 2).
Posterior Mean: Since the Bayes estimator minimizes squared error loss, the best estimate of θ is 10.
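A small Monte Carlo check (treating the 2 in N(10, 2) as the variance) illustrates that, among candidate point estimates, the expected squared error under the posterior is smallest at the posterior mean.

```python
# Sketch: the posterior mean minimizes expected squared error loss.
# Posterior θ|X ~ N(10, 2), with 2 taken as the variance (an assumption).
import numpy as np

rng = np.random.default_rng(0)
theta_draws = rng.normal(loc=10, scale=np.sqrt(2), size=100_000)

for a in [8.0, 9.0, 10.0, 11.0]:          # hypothetical point estimates
    expected_loss = np.mean((theta_draws - a) ** 2)
    print(a, round(expected_loss, 3))     # minimum occurs at a = 10
```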
6. Risk Function
Example (Risk Function for a Specific Estimator): Suppose X ∼ N (µ, 1) and you use µ̂ = X as the estimator for
µ.
Squared Error Risk: Since X is an unbiased estimator, R(µ, µ̂) = E[(X − µ)²] = Var(X) = 1.
The risk, or expected loss, is constant at 1, regardless of µ.
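This constant risk can be checked by simulation, as in the sketch below; the grid of µ values is arbitrary.

```python
# Sketch: simulating R(µ, µ̂) = E[(X − µ)²] for the estimator µ̂ = X, X ~ N(µ, 1).
import numpy as np

rng = np.random.default_rng(1)
for mu in [-2.0, 0.0, 3.0]:
    X = rng.normal(loc=mu, scale=1.0, size=200_000)
    print(mu, np.mean((X - mu) ** 2))   # ≈ 1 for every µ
```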
Problem:
Suppose you have a sample of n = 5 observations from a distribution with the probability density function (PDF) given
by:
f(x; θ) = θ x^(θ−1), 0 ≤ x ≤ 1, θ > 0
where θ is the unknown parameter.
Solution:
1. Method of Moments:
The method of moments is used to estimate parameters by equating the sample moments to the population moments.
- The first population moment (mean) for the given distribution is:
E[X] = ∫₀¹ x f(x; θ) dx = ∫₀¹ x · θ x^(θ−1) dx
This simplifies to:
E[X] = θ / (θ + 1)
Now, the sample mean is:
µ̂ = (1/n) Σ xi
Equating the sample mean to the population mean:
(1/n) Σ xi = θ / (θ + 1)
Solving for θ:
θ = x̄ / (1 − x̄), where x̄ = (1/n) Σ xi
Thus, the method of moments estimate for θ is:
θ̂MM = x̄ / (1 − x̄)
2. Maximum Likelihood Estimation:
The likelihood is L(θ) = ∏ θ xi^(θ−1), so the log-likelihood is log L(θ) = n log θ + (θ − 1) Σ log xi. Differentiating with respect to θ:
d/dθ log L(θ) = n/θ + Σ log xi
Setting the derivative equal to 0:
n/θ + Σ log xi = 0
Solving for θ:
θ̂MLE = −n / Σ log xi
Thus, the MLE estimate for θ is:
θ̂MLE = −n / Σ log xi
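Both estimators are easy to compare numerically; the sketch below draws a hypothetical sample of n = 5 from f(x; θ) = θ x^(θ−1) (via the inverse CDF x = u^(1/θ)) and evaluates the two formulas.

```python
# Sketch computing the MoM and MLE estimates of θ for f(x; θ) = θ x^(θ−1).
# The true θ and the sample are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
theta_true = 2.0
x = rng.uniform(size=5) ** (1 / theta_true)   # inverse-CDF sampling, F(x) = x^θ

x_bar = x.mean()
theta_mm = x_bar / (1 - x_bar)                # method of moments: x̄ = θ/(θ+1)
theta_mle = -len(x) / np.sum(np.log(x))       # MLE: θ̂ = −n / Σ log x_i
print(theta_mm, theta_mle)
```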
3. EM Algorithm:
In this case, suppose the observed data come from a mixture of two distributions with the same form but different parameters: f(x; θ1) and f(x; θ2). The goal is to estimate the parameters θ1 and θ2.
E-step (Expectation step): Given the current estimates of θ1 and θ2 , compute the responsibilities, i.e., the
probabilities of each data point belonging to each distribution:
γi1 = π1 f(xi; θ1) / (π1 f(xi; θ1) + π2 f(xi; θ2))
γi2 = π2 f(xi; θ2) / (π1 f(xi; θ1) + π2 f(xi; θ2))
where π1 and π2 are the mixing coefficients.
M-step (Maximization step): Update the parameter estimates θ1, θ2, π1, and π2 based on the responsibilities. For this density, each θk update is the weighted form of the MLE derived above:
θ̂1 = −(Σ γi1) / (Σ γi1 log xi)
θ̂2 = −(Σ γi2) / (Σ γi2 log xi)
The mixing coefficients are updated as:
π̂1 = (1/n) Σ γi1
π̂2 = (1/n) Σ γi2
Repeat the E-step and M-step iteratively until convergence.
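A minimal runnable sketch of this EM iteration is given below, under the assumption that the M-step uses the weighted form of the MLE derived above; the simulated mixture, true parameters, starting values, and iteration count are all illustrative.

```python
# Sketch of EM for a two-component mixture of f(x; θ) = θ x^(θ−1) densities.
# Data, true parameters, and initial guesses are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
n = 500
z = rng.uniform(size=n) < 0.4                        # latent component labels
x = np.where(z, rng.uniform(size=n) ** (1 / 0.5),    # component 1: θ = 0.5
                rng.uniform(size=n) ** (1 / 5.0))    # component 2: θ = 5.0

def f(x, theta):
    return theta * x ** (theta - 1)

theta1, theta2, pi1 = 0.8, 2.0, 0.5                  # initial guesses
for _ in range(200):
    # E-step: responsibilities
    w1 = pi1 * f(x, theta1)
    w2 = (1 - pi1) * f(x, theta2)
    gamma1 = w1 / (w1 + w2)
    gamma2 = 1 - gamma1
    # M-step: weighted MLE for each θ, then update the mixing weight
    theta1 = -gamma1.sum() / np.sum(gamma1 * np.log(x))
    theta2 = -gamma2.sum() / np.sum(gamma2 * np.log(x))
    pi1 = gamma1.mean()

print(theta1, theta2, pi1)
```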