
Estimation - Classical and Bayesian Approach

Classical (Frequentist) Approach


• Focus: Considers parameter as fixed, uses data to infer its value.
• Point Estimation: Uses estimators like Maximum Likelihood Estimation (MLE) or Method of Moments (MoM).
– MLE Formula (a numerical sketch follows this list):
θ̂_MLE = arg max_θ L(θ; X)
• Interval Estimation: Constructs confidence intervals, often using normal approximations.
• Objective: Obtain point estimates for parameters without involving prior knowledge.
• Properties:
– Consistency: θ̂ → θ as n → ∞.
– Unbiasedness: E(θ̂) = θ.
– Efficiency: Minimizes variance among unbiased estimators.
– Sufficiency: The estimator is based on a sufficient statistic, so it uses all of the information about θ contained in the sample.
• Key Method: Maximum Likelihood Estimation (MLE).
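The MLE can also be found numerically by minimizing the negative log-likelihood. Below is a minimal Python sketch (not from the notes; the simulated data and starting values are assumptions) for a normal model, compared against the closed-form answers.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=170.0, scale=8.0, size=200)        # simulated data (assumed values)

def neg_log_likelihood(params, data):
    mu, log_sigma = params                             # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    n = len(data)
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum((data - mu)**2) / (2 * sigma**2)

result = minimize(neg_log_likelihood, x0=[160.0, 2.0], args=(x,), method="Nelder-Mead")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

print(mu_hat, sigma_hat)             # numerical MLE
print(x.mean(), x.std(ddof=0))       # closed-form MLE: sample mean and (biased) sample s.d.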

Bayesian Approach
• Focus: Treats parameters as random variables with prior distributions.
• Bayes’ Theorem (a grid-approximation sketch follows this list):
p(θ|X) = p(X|θ) p(θ) / p(X)
• Posterior Distribution: Combines the prior and the likelihood to form the posterior, p(θ|X) ∝ p(X|θ) · p(θ).
• Objective: Incorporate prior beliefs with sample evidence to update knowledge about θ.
• Properties:
– Flexible, subjective.
– Allows prior updates with new data.
– Uses posterior predictive distribution for inference.
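As a concrete illustration of Bayes’ theorem, here is a minimal grid-approximation sketch (the binomial data and the flat prior are assumptions, not from the notes): the posterior is the prior times the likelihood, renormalized by the evidence p(X).

import numpy as np
from scipy.stats import binom

theta_grid = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta_grid)                 # flat (non-informative) prior on the grid
prior /= prior.sum()

n, k = 20, 14                                    # assumed data: 14 successes in 20 trials
likelihood = binom.pmf(k, n, theta_grid)         # p(X | theta) at each grid point

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()    # divide by p(X), the evidence

posterior_mean = np.sum(theta_grid * posterior)
print(posterior_mean)                            # close to (k + 1) / (n + 2) for a flat prior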

Summary:
• Philosophy: Classical estimation treats parameters as fixed values, while Bayesian estimation treats parameters
as random variables with distributions.
• Prior Information: Bayesian estimation incorporates prior beliefs through prior distributions, while classical
estimation uses data alone.
• Uncertainty Quantification: Classical methods typically use point estimates and confidence intervals, while
Bayesian methods provide a full distribution (posterior) and credible intervals.
• Computation: Classical estimation is often simpler, while Bayesian methods require more complex computational
techniques such as MCMC.
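Since the notes mention MCMC only in passing, here is a minimal random-walk Metropolis sketch under assumed data and prior (a normal mean with a weak normal prior and known variance); it samples the posterior using only the unnormalized density.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=50)    # assumed data, known sigma = 2
prior_mean, prior_sd = 0.0, 10.0

def log_post(mu):
    # log prior + log likelihood (unnormalized log posterior)
    return norm.logpdf(mu, prior_mean, prior_sd) + norm.logpdf(data, mu, 2.0).sum()

samples, mu = [], 0.0
for _ in range(5000):
    proposal = mu + rng.normal(scale=0.5)         # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(mu):
        mu = proposal                             # accept
    samples.append(mu)

print(np.mean(samples[1000:]))   # posterior mean estimate after burn-in; close to the sample mean, since the prior is weak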

Methods of Estimation
Point Estimation
• A single value estimate of a parameter.
• Methods:
– MLE: Maximizes the likelihood function;
θ̂_MLE = arg max_θ L(θ|X)

– Method of Moments (MoM): Sets sample moments equal to population moments.


– Bayesian Estimation: Uses posterior mean, median, or mode as estimates.

Interval Estimation
• Provides a range within which the parameter lies with a certain confidence.
• Confidence Interval (computed numerically in the sketch after this list):
[ θ̂ − z_{α/2} · σ_θ̂ ,  θ̂ + z_{α/2} · σ_θ̂ ]

• Method of Moments (MoM):

– Sets sample moments equal to population moments to solve for parameters.


– Example: For a parameter θ, set E(X), which is a function of θ, equal to the sample mean and solve for θ.

• Maximum Likelihood Estimation (MLE):


– Maximizes the likelihood function with respect to parameters.
• Bayesian Estimation:
– Computes posterior mean, median, or mode based on the posterior distribution.
– Posterior Mean:
E(θ|X) = ∫ θ p(θ|X) dθ
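A minimal numeric sketch of the two formulas above (the numbers and the assumed posterior are illustrative only): a z-based confidence interval, and a posterior mean obtained by integrating θ · p(θ|X) on a grid.

import numpy as np
from scipy.stats import norm

# 95% confidence interval for a mean with an assumed standard error
theta_hat, se, alpha = 170.0, 3.54, 0.05
z = norm.ppf(1 - alpha / 2)
print(theta_hat - z * se, theta_hat + z * se)    # ≈ (163.1, 176.9)

# Posterior mean E(theta|X) on a grid, for an assumed N(165, 3^2) posterior
theta = np.linspace(150, 180, 3001)
dtheta = theta[1] - theta[0]
post = norm.pdf(theta, 165.0, 3.0)
post /= (post * dtheta).sum()                    # renormalize on the grid
print((theta * post * dtheta).sum())             # ≈ 165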

Likelihood and EM Algorithm


• Likelihood Function:
L(θ; X) = p(X|θ)

• Log-Likelihood:
ℓ(θ; X) = log(L(θ; X))

• Properties of Likelihood:
– Consistency: θ̂MLE → θ as n → ∞.
– Asymptotic Normality (a short simulation sketch appears at the end of this section):
√n (θ̂_MLE − θ) ∼ N(0, I(θ)⁻¹),
where I(θ) is the Fisher information.
– Taking the logarithm gives an equivalent but simpler maximization problem, which helps find the MLE.
– For large samples, the MLE is approximately normally distributed.
• EM Algorithm:
– Purpose: Estimate parameters in models with latent variables.
– Properties:
∗ Iterative improvement.
∗ Converges to a local maximum of the likelihood.

– Iterative algorithm for finding MLE when data are incomplete or have latent variables.
– Steps:
∗ E-step: Compute the expected complete-data log-likelihood Q(θ|θ^(t)) given the current parameter estimate θ^(t).
∗ M-step: Maximize Q(θ|θ^(t)) with respect to θ to obtain the updated estimate θ^(t+1).
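To see the asymptotic-normality property in action, here is a minimal simulation sketch (the exponential model, true rate, and sample size are assumptions): for an exponential rate λ, the Fisher information is I(λ) = 1/λ², so the MLE’s standard deviation should be roughly λ/√n.

import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 2.0, 400, 5000

mles = np.empty(reps)
for r in range(reps):
    x = rng.exponential(scale=1.0 / lam, size=n)   # numpy parameterizes by scale = 1/lambda
    mles[r] = 1.0 / x.mean()                       # closed-form MLE: n / sum(x)

print(mles.std())                 # empirical spread of the MLE
print(lam / np.sqrt(n))           # asymptotic prediction: lambda / sqrt(n) = 0.1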

Prior Distributions
Conjugate Priors
• Definition: Prior and posterior distributions are in the same family.
• Examples:
– Normal prior for normal likelihood.
– Beta prior for binomial likelihood.
• Benefit: Simplifies computation of posterior.

Informative Prior
• Reflects specific prior knowledge about the parameter.

• Example: Setting the prior using expert knowledge or data from earlier studies.

Non-informative Prior
• Represents lack of prior information (e.g., uniform distribution).

• Objective: Minimize influence of prior on posterior.

Loss Functions
• Purpose: Quantify the cost of estimation errors.

• Common Loss Functions:


– Squared Error Loss:
L(θ, θ̂) = (θ − θ̂)²
– Absolute Error Loss:
L(θ, θ̂) = |θ − θ̂|
– Zero-One Loss:
L(θ, θ̂) = I(θ ≠ θ̂)

Risk Function
• Definition: Expected value of the loss function, taken over the sampling distribution of the data for a fixed θ.
• Formula:
R(θ, θ̂) = Eθ [L(θ, θ̂)]

• Bayes Risk: The risk averaged over the prior; the Bayes estimator minimizes the expected posterior loss and is used for decision-making.

Examples
Example (Classical): Suppose you have a sample of heights from a population and want to estimate the population
mean µ.
Let’s assume the sample heights are X = {170, 165, 180, 175, 160}.
Sample Mean (Point Estimate):
µ̂ = (1/n) Σ_{i=1}^n X_i = (170 + 165 + 180 + 175 + 160) / 5 = 170

Confidence Interval for Mean (assuming a normal distribution with unknown variance): Compute the sample
standard deviation s:
s = √( (1/(n−1)) Σ_{i=1}^n (X_i − µ̂)² ) = √( (1/4) Σ (X_i − 170)² ) = 7.91
For a 95% confidence level, with t_{0.025,4} ≈ 2.776:
CI = [ µ̂ − t_{α/2,n−1} · s/√n ,  µ̂ + t_{α/2,n−1} · s/√n ]
   = (170 − 2.776 × 3.54, 170 + 2.776 × 3.54) = (160.17, 179.83)
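The interval above can be checked with a few lines of Python (same five heights; small differences are rounding):

import numpy as np
from scipy import stats

x = np.array([170, 165, 180, 175, 160], dtype=float)
n = len(x)
mean = x.mean()
s = x.std(ddof=1)                                  # sample standard deviation (n - 1 denominator)
t_crit = stats.t.ppf(0.975, df=n - 1)              # ≈ 2.776

half_width = t_crit * s / np.sqrt(n)
print(mean, s)                                     # 170.0 and ≈ 7.91
print(mean - half_width, mean + half_width)        # ≈ (160.2, 179.8), matching the interval above up to rounding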


Example (Bayesian): Assume a prior belief that the population mean µ is normally distributed with µ0 = 160 and
variance σ02 = 25.
The likelihood (data distribution) is also normal, X ∼ N (µ, σ 2 ), with σ 2 = 16.
Posterior Mean (using the conjugate normal prior and treating the sample mean X̄ = 170 as a single normal observation with variance σ² = 16):
µ_posterior = (σ₀² · X̄ + σ² · µ₀) / (σ₀² + σ²) = (25 · 170 + 16 · 160) / (25 + 16) = 6810 / 41 ≈ 166.10
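A minimal check of this conjugate update (same numbers as above, written in the equivalent precision-weighted form); it also reports the posterior variance, which comes for free:

prior_mean, prior_var = 160.0, 25.0
obs, obs_var = 170.0, 16.0

# Precision-weighted combination; equivalent to (var0*x + var*mu0) / (var0 + var)
post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
post_mean = post_var * (prior_mean / prior_var + obs / obs_var)

print(post_mean)   # ≈ 166.10
print(post_var)    # ≈ 9.76 (the posterior variance shrinks below both 25 and 16)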

2. Methods of Estimation
Example (Maximum Likelihood Estimation): Suppose X1 , X2 , . . . , Xn are i.i.d. samples from an exponential
distribution with unknown rate λ: f (x|λ) = λe−λx .
Likelihood Function:
L(λ) = Π_{i=1}^n λ e^{−λX_i} = λ^n e^{−λ Σ X_i}
Log-Likelihood:
ℓ(λ) = n ln λ − λ Σ X_i
Maximize by taking the derivative:
dℓ/dλ = n/λ − Σ X_i = 0  ⇒  λ̂ = n / Σ X_i
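A minimal simulation sketch (the true rate λ = 0.5 is an assumption) confirming that the exponential MLE is the reciprocal of the sample mean:

import numpy as np

rng = np.random.default_rng(3)
true_rate = 0.5
x = rng.exponential(scale=1.0 / true_rate, size=1000)   # numpy uses scale = 1/lambda

lambda_hat = len(x) / x.sum()     # equivalently 1 / x.mean()
print(lambda_hat)                 # should be close to 0.5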
Example (Method of Moments): Suppose you have data from a distribution with unknown mean µ and variance σ².
For a normal distribution, the first two moments are E[X] = µ and E[(X − µ)²] = σ².
Equating sample moments to population moments gives: sample mean X̄ = µ and sample variance S² = σ².
Thus, the method of moments estimates are µ̂ = X̄ and σ̂² = S².
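In code, these method-of-moments estimates are one-liners (simulated data with assumed µ = 10 and σ = 3):

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=3.0, size=500)     # assumed true mu = 10, sigma = 3

mu_mom = x.mean()                                  # first sample moment
var_mom = x.var(ddof=0)                            # second central sample moment
print(mu_mom, var_mom)                             # close to 10 and 9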

3. Likelihood and Expectation-Maximization (EM) Algorithm


Example (Likelihood): Suppose X1 , X2 , . . . , Xn are i.i.d. samples from a normal distribution N (µ, σ 2 ).
Likelihood Function:
L(µ, σ²) = Π_{i=1}^n (1/√(2πσ²)) e^{−(X_i − µ)² / (2σ²)}
Log-Likelihood:
ℓ(µ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (X_i − µ)²
To find the MLEs µ̂ and σ̂², differentiate ℓ with respect to µ and σ², set the derivatives to zero, and solve; this gives µ̂ = X̄ and σ̂² = (1/n) Σ (X_i − X̄)².
Example (EM Algorithm): Assume you observe data X from a mixture of two normal distributions with unknown
means µ1 , µ2 and common variance σ 2 .
E-Step: Compute the probability each observation belongs to each component, given the current parameter estimates.
M-Step: Use these probabilities to update the parameter estimates (e.g., means and variances) by maximizing the
expected log-likelihood.
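A minimal EM sketch for this two-component normal mixture with a common variance (the simulated data and starting values are assumptions, not from the notes):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 1.0, 200)])
n = len(x)

mu1, mu2, var, pi1 = -1.0, 1.0, 2.0, 0.5           # initial guesses

for _ in range(100):
    # E-step: responsibilities gamma_i1 = P(component 1 | x_i)
    p1 = pi1 * norm.pdf(x, mu1, np.sqrt(var))
    p2 = (1 - pi1) * norm.pdf(x, mu2, np.sqrt(var))
    gamma1 = p1 / (p1 + p2)
    gamma2 = 1.0 - gamma1

    # M-step: weighted updates of the means, the shared variance, and the mixing weight
    mu1 = np.sum(gamma1 * x) / np.sum(gamma1)
    mu2 = np.sum(gamma2 * x) / np.sum(gamma2)
    var = np.sum(gamma1 * (x - mu1) ** 2 + gamma2 * (x - mu2) ** 2) / n
    pi1 = gamma1.mean()

print(mu1, mu2, var, pi1)   # roughly 0, 4, 1, 0.6 for this simulated data

Note that label switching is possible: which fitted component ends up as “component 1” depends on the starting values.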

4. Prior Distributions
Example (Conjugate Prior with Beta-Binomial): Assume X ∼ Binomial(n, θ) with a beta prior θ ∼ Beta(α, β).
Posterior Distribution: Since the beta prior is conjugate, the posterior is also a Beta distribution:
θ|X ∼ Beta(α + X, β + n − X)
Interpretation: Posterior updates based on the observed successes X and failures n − X, blending prior beliefs with
new data.
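A minimal sketch of this update with assumed numbers (prior Beta(2, 2), then X = 7 successes in n = 10 trials):

from scipy.stats import beta

alpha, b = 2.0, 2.0
n, X = 10, 7

posterior = beta(alpha + X, b + n - X)          # Beta(9, 5)
print(posterior.mean())                         # (alpha + X) / (alpha + beta + n) = 9/14 ≈ 0.643
print(posterior.interval(0.95))                 # an equal-tailed 95% credible interval for theta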

5. Loss Functions
Example (Squared Error Loss): Suppose you want to estimate the parameter θ = 5 and your estimate θ̂ = 4.
Squared Error Loss:
L(θ, θ̂) = (θ − θ̂)² = (5 − 4)² = 1
Example (Bayesian Decision with Loss Function): For estimating a parameter with squared error loss, the
Bayes estimator is the posterior mean.
Suppose the posterior distribution of θ after observing data is θ|X ∼ N (10, 2).
Posterior Mean: Since the Bayes estimator minimizes squared error loss, the best estimate of θ is 10.
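A minimal numeric check (interpreting N(10, 2) as having variance 2): among candidate estimates, the expected posterior squared-error loss is smallest near the posterior mean.

import numpy as np

rng = np.random.default_rng(6)
theta_samples = rng.normal(loc=10.0, scale=np.sqrt(2.0), size=100_000)   # draws from the assumed posterior

candidates = np.linspace(8.0, 12.0, 81)
expected_loss = [np.mean((theta_samples - a) ** 2) for a in candidates]
print(candidates[np.argmin(expected_loss)])     # close to the posterior mean, 10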

6. Risk Function
Example (Risk Function for a Specific Estimator): Suppose X ∼ N (µ, 1) and you use µ̂ = X as the estimator for
µ.
Squared Error Risk: Since X is an unbiased estimator, R(µ, µ̂) = E[(X − µ)²] = Var(X) = 1.
The risk, or expected loss, is constant at 1, regardless of µ.
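A minimal simulation sketch (the values of µ are arbitrary) showing that this risk is indeed about 1 for any µ:

import numpy as np

rng = np.random.default_rng(7)
for mu in [-3.0, 0.0, 5.0]:
    x = rng.normal(loc=mu, scale=1.0, size=200_000)
    print(mu, np.mean((x - mu) ** 2))            # each close to 1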

Problem:
Suppose you have a sample of n = 5 observations from a distribution with the probability density function (PDF) given
by:

f(x; θ) = θ x^{θ−1},   0 ≤ x ≤ 1,   θ > 0,
where θ is the unknown parameter.

1. Method of Moments: Use the method of moments to estimate the parameter θ.


2. Maximum Likelihood Estimation (MLE): Find the Maximum Likelihood Estimate (MLE) for θ.
3. EM Algorithm: Suppose the observed data come from a mixture of two distributions with the same form (but dif-
ferent parameters) and the goal is to estimate the parameters of the mixture. Set up the Expectation-Maximization
(EM) algorithm for this problem.

Solution:
1. Method of Moments:
The method of moments estimates parameters by equating sample moments to population moments.
The first population moment (mean) for the given distribution is:
E[X] = ∫₀¹ x f(x; θ) dx = ∫₀¹ x · θx^{θ−1} dx = ∫₀¹ θx^θ dx
This simplifies to:
E[X] = θ / (θ + 1)
The sample mean is:
X̄ = (1/n) Σ_{i=1}^n x_i
Equating the sample mean to the population mean:
X̄ = θ / (θ + 1)
Solving for θ:
θ = X̄ / (1 − X̄)
Thus, the method of moments estimate for θ is:
θ̂_MM = X̄ / (1 − X̄)

2. Maximum Likelihood Estimation (MLE):


To find the MLE, we first write down the likelihood function for a sample of size n:
L(θ) = Π_{i=1}^n f(x_i; θ) = Π_{i=1}^n θ x_i^{θ−1}
This simplifies to:
L(θ) = θ^n Π_{i=1}^n x_i^{θ−1}
The log-likelihood function is:
log L(θ) = n log θ + (θ − 1) Σ_{i=1}^n log x_i
To maximize the log-likelihood, we take the derivative with respect to θ:
d/dθ log L(θ) = n/θ + Σ_{i=1}^n log x_i
Setting the derivative equal to 0:
n/θ + Σ_{i=1}^n log x_i = 0
Solving for θ:
θ = − n / Σ_{i=1}^n log x_i
Thus, the MLE estimate for θ is:
θ̂_MLE = − n / Σ_{i=1}^n log x_i
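A minimal sketch computing both estimates above on simulated data (the true θ = 3 is an assumption; NumPy's power distribution has exactly this density):

import numpy as np

rng = np.random.default_rng(8)
theta_true = 3.0
x = rng.power(theta_true, size=1000)      # power distribution: f(x; a) = a * x^(a-1) on [0, 1]

x_bar = x.mean()
theta_mom = x_bar / (1.0 - x_bar)         # from X_bar = theta / (theta + 1)
theta_mle = -len(x) / np.sum(np.log(x))   # from setting the score to zero

print(theta_mom, theta_mle)               # both should be close to 3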

3. EM Algorithm:
In this case, suppose the observed data comes from a mixture of two distributions with the same form but different
parameters: f (x; θ1 ) and f (x; θ2 ). The goal is to estimate the parameters θ1 and θ2 .
E-step (Expectation step): Given the current estimates of θ1 and θ2 , compute the responsibilities, i.e., the
probabilities of each data point belonging to each distribution:

γ_i1 = π_1 f(x_i; θ_1) / [ π_1 f(x_i; θ_1) + π_2 f(x_i; θ_2) ]
γ_i2 = π_2 f(x_i; θ_2) / [ π_1 f(x_i; θ_1) + π_2 f(x_i; θ_2) ]
where π1 and π2 are the mixing coefficients.
M-step (Maximization step): Update the parameter estimates θ_1, θ_2, π_1, and π_2 based on the responsibilities. Each θ_k maximizes the weighted log-likelihood Σ_i γ_ik [log θ_k + (θ_k − 1) log x_i], which gives the weighted analogue of the MLE derived above:
θ̂_1 = − ( Σ_{i=1}^n γ_i1 ) / ( Σ_{i=1}^n γ_i1 log x_i )
θ̂_2 = − ( Σ_{i=1}^n γ_i2 ) / ( Σ_{i=1}^n γ_i2 log x_i )
The mixing coefficients are updated as:
π̂_1 = (1/n) Σ_{i=1}^n γ_i1
π̂_2 = (1/n) Σ_{i=1}^n γ_i2
Repeat the E-step and M-step iteratively until convergence.
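A minimal sketch of this EM scheme on simulated data (the true parameters θ_1 = 2, θ_2 = 8 and equal mixing weights are assumptions; the starting values are rough guesses):

import numpy as np

rng = np.random.default_rng(9)
x = np.concatenate([rng.power(2.0, 500), rng.power(8.0, 500)])   # assumed true mixture
n = len(x)

def pdf(x, theta):
    # density f(x; theta) = theta * x^(theta - 1) on [0, 1]
    return theta * x ** (theta - 1.0)

theta1, theta2, pi1 = 1.0, 5.0, 0.5       # initial guesses
for _ in range(200):
    # E-step: responsibilities
    p1 = pi1 * pdf(x, theta1)
    p2 = (1 - pi1) * pdf(x, theta2)
    g1 = p1 / (p1 + p2)
    g2 = 1.0 - g1

    # M-step: weighted MLE updates and the new mixing weight
    theta1 = -g1.sum() / np.sum(g1 * np.log(x))
    theta2 = -g2.sum() / np.sum(g2 * np.log(x))
    pi1 = g1.mean()

print(theta1, theta2, pi1)   # should roughly recover 2, 8, 0.5, up to label switching

Because the two components overlap heavily, the fitted values are only approximate for this sample size, but they illustrate the E- and M-steps above directly.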
