Chapter 1 - Estimation Theory

Data science and advanced programming
MSc in Finance - UNIL

Christophe Hurlin

September 2024


Section 1

Introduction



1. Introduction

Estimation problem

Let us consider a continuous random variable Y characterized by a marginal probability density function f_Y(y; θ) for y ∈ ℝ and θ ∈ Θ. The parameter θ is unknown.

Let {Y_1, ..., Y_n} be a random sample of i.i.d. random variables that have the same distribution as Y.

We have one realisation {y_1, ..., y_n} of this sample.

How can we estimate the parameter θ?


1. Introduction

Remarks

1. The estimation problem can be extended to the case of an econometric model. In this case, we consider two variables Y and X and a conditional pdf f_{Y|X=x}(y; θ) that depends on a parameter or a vector of unknown parameters θ.

2. In this chapter, we do not derive the estimators (for the estimation methods, see the next chapters). We assume that we have an estimator θ̂ of θ, whatever the estimation method used, and we study its finite sample and large sample properties.


Notations

Mathematical notations:
I will (try to...) follow the conventions of notation of Abadir and Magnus (2002).

Y : random variable
y : realization
y : vector (bold in the original notation)
Y : matrix (bold in the original notation)

Problem: this system of notation does not distinguish between a vector (matrix) of random elements and a vector (matrix) of non-stochastic elements.

Abadir, K. M., and Magnus, J. R. (2002). Notation in econometrics: A proposal for a standard.
The Econometrics Journal, 5(1), 76-90.



Outline of the Chapter

1. Introduction

2. What is an Estimator?

3. Finite Sample Properties

4. Asymptotic Properties
   - Almost Sure Convergence
   - Convergence in Probability
   - Convergence in Mean Square
   - Convergence in Distribution
   - Asymptotic Distributions

5. Bibliography


Section 2

What is an Estimator?



2. What is an Estimator?

Objectives

1. Define the notion of a sample.

2. Define the concept of an estimator.

3. Define the concept of an estimate.

4. Define the sampling distribution of an estimator.

5. Discuss the notion of a "good" estimator.


2. What is an Estimator?

Econometrics is fundamentally based on four elements:

1. A sample of data.

2. A parametric model.

3. An estimation method.

4. Some inference methods.


2. What is an Estimator?

Question: Why use a sample?

Let us assume that we want to study a characteristic / property x of the individuals of a population.

The individuals (units) of the population are not necessarily persons: they can be firms, assets, countries, time indices, etc.

The characteristic x may be quantitative (salary, weight, total assets, GDP, etc.) or qualitative (social status, gender, etc.).

The characteristic x may be stochastic or deterministic (weight, size, etc.).


2. What is an Estimator?

In the case of a small population, such as one consisting of only two individuals, the use of inferential statistics, econometrics, and similar techniques becomes unnecessary. For example, consider a population where Adam weighs 80 kg and Eve weighs 50 kg...

Adam and Eve, Titian (1490-1576)


2. What is an Estimator?

When the population is large or infinite, sampling is the only means to study the weight.


2. What is an Estimator?

Definition (Population)
In statistics, a population refers to the entire set of individuals, items, or data points
that share a common characteristic and are of interest in a particular study or analysis.

1. In most cases, it is impossible to observe the entire statistical population, due to cost constraints, time constraints, or constraints of geographical accessibility.

2. A researcher would instead observe a statistical sample from the population in order to attempt to learn something about the population as a whole.


2. What is an Estimator?

In most cases, the sample is random:

Definition (Probability sampling)

Probability sampling is a sampling method in which every unit in the population has a chance (greater than zero) of being selected in the sample.

Consequence: a random sample is a collection of random variables, even if the characteristic x is deterministic.

Sample: {X_1, X_2, ..., X_n}


2. What is an Estimator?

Example (Random sample)

Let us consider a population of four persons and denote by x̃ the weight (assumed to be non-stochastic) of each individual, with:

    x̃_A = 80    x̃_B = 50    x̃_C = 40    x̃_D = 90

Consider a random sample of n = 2 individuals, denoted by {X_1, X_2}, where X_1 is the weight of the first individual selected in the sample.

So, we can obtain a realisation:

    {x_1, x_2} = {50, 80} or {x_1, x_2} = {90, 40} or {x_1, x_2} = {90, 90} etc.


2. What is an Estimator?

Fact
Whatever the assumption made on the characteristic X (deterministic or stochastic), the result of probability sampling is a random sample, i.e. a collection of random variables X_1, X_2, ..., X_n.

Fact
Given the probability sampling method used, we can assume that these random variables are independent and identically distributed (i.i.d.).


2. What is an Estimator?
Inference: In general, in economics and finance, only one realisation of the sample is available: this is your data set!

    {x_1, x_2, ..., x_n}

The challenge of data science is to draw conclusions about a population after observing only one realisation {x_1, ..., x_n} of a random sample (your data set).


2. What is an Estimator?

Definition (Estimator)
An estimator is a function T(Y_1, Y_2, ..., Y_n) of the sample. Any statistic is a point estimator.


2. What is an Estimator?

Example (Sample mean)

Assume that Y_1, Y_2, ..., Y_n are i.i.d. N(m, σ²) random variables. The sample mean (or average)

    Ȳ_n = (1/n) ∑_{i=1}^n Y_i

is a point estimator (or an estimator) of m.

Example (Sample variance)

Assume that Y_1, Y_2, ..., Y_n are i.i.d. N(m, σ²) random variables. The sample variance

    S_n² = (1/(n−1)) ∑_{i=1}^n (Y_i − Ȳ_n)²

is a point estimator (or an estimator) of σ².




2. What is an Estimator?

Fact
An estimator θ̂ is a random variable.

Consequence: θ̂ has a (marginal or conditional) probability distribution. This sampling distribution is characterized by a probability density function (pdf) f_θ̂(u).


2. What is an Estimator?

Definition (Sampling Distribution)

The probability distribution of an estimator (or a statistic) is called the sampling distribution.

Consequence: The sampling distribution of θ̂ is characterised by moments such as:

- The mean E(θ̂),
- The variance V(θ̂),
- More generally, the k-th central moment, defined by:

    E[(θ̂ − E(θ̂))^k] = ∫ (u − µ_θ̂)^k f_θ̂(u) du,   ∀k ∈ ℕ

  where µ_θ̂ = E(θ̂) = ∫ u f_θ̂(u) du.


2. What is an Estimator?

MATLAB Commands

We will need the following MATLAB commands and structures:

unifrnd(a,b,n,1): draws a vector of n random variables from a continuous uniform distribution over the interval [a,b].

disp: displays the content of a variable.

zeros(n,1): creates a vector of zeros of size n × 1.

mean(X) and var(X): compute the empirical mean and variance of X.

skewness(X) and kurtosis(X): compute the skewness and kurtosis of X.

figure: creates a figure window.

histogram(X): creates a histogram of the values of X.

grid('on'): displays or hides axes grid lines.


Exercise: MATLAB code

Exercise: Simulating a sampling distribution

Write a well-commented MATLAB code to simulate the sampling distribution of the empirical mean for a sample of 10 random variables drawn from a continuous uniform distribution:

- Generate a sample of size n = 10 from a uniform distribution over [60, 115], and display the generated sample.
- Compute and display the empirical mean of the sample.
- Initialize a vector of size K × 1 with K = 1000, setting all elements to zero.
- Perform K trials, each involving the generation of a new sample of size n = 10 from a uniform distribution, to simulate the sampling distribution of the empirical mean.
- Plot the histogram of the K realizations of the empirical mean.
- Compute and display the mean, variance, skewness, and kurtosis of the empirical mean based on the K realizations.


Solution: MATLAB code

%===================================================
% PURPOSE: Illustrate the sampling distribution
% Lecture: "Data Science and advanced programming", HEC Lausanne
%-------------------------
% Author: Christophe Hurlin, University of Orleans
% Version: v1. September 2024
%===================================================

clear; clc; close all;

% Parameters
n = 10;               % Sample size
number_trials = 1000; % Number of trials

% Generate one sample from a uniform distribution
X = unifrnd(60, 115, 1, n);
disp('Sample X:');
disp(X); % Display the sample X

% Compute and display the empirical mean (estimator)
xbar = mean(X);
disp('Empirical mean xbar:');
disp(xbar);

% Initialize array to store empirical means from multiple trials
Xbar = zeros(number_trials, 1);


Solution: MATLAB code

% Perform multiple trials to simulate the sampling distribution of the mean
for i = 1:number_trials
    X = unifrnd(60, 115, 1, n); % Draw a new sample
    Xbar(i) = mean(X);          % Compute the mean of the sample
end

disp('Empirical means from all trials:');
disp(Xbar'); % Display the empirical means from all trials

% Plot the empirical distribution of the estimator
figure;
histogram(Xbar);
title('Empirical Distribution of the Estimator Xbar');
xlabel('Xbar');
grid on;

% Moments of the estimator Xbar
fprintf('Mean of the estimator Xbar = %.4f\n', mean(Xbar));
fprintf('Variance of the estimator Xbar = %.4f\n', var(Xbar));
fprintf('Skewness of the estimator Xbar = %.4f\n', skewness(Xbar));
fprintf('Kurtosis of the estimator Xbar = %.4f\n', kurtosis(Xbar));




2. What is an Estimator?

Definition (Point estimate)

A (point) estimate is the realized value of an estimator (i.e. a number) that is obtained when a sample is actually taken. For an estimator θ̂, it can be denoted by θ̂(y).


2. What is an Estimator?

Example (Point estimate)

For instance, ȳ_n is an estimate of m:

    ȳ_n = (1/n) ∑_{i=1}^n y_i

If n = 3 and {y_1, y_2, y_3} = {3, −1, 2}, then ȳ_n = 1.333.

If n = 3 and {y_1, y_2, y_3} = {4, −8, 1}, then ȳ_n = −1.

etc.


2. What is an Estimator?

Question 1: What constitutes a good estimator?

The search for good estimators is a central focus in econometrics. An estimator is considered "good" if:

1. It is consistent.
2. It is efficient.
3. Additionally, it should ideally be unbiased in finite samples.

Question 2: What is the distribution of the estimator?

This should not be confused with the concept of the sampling distribution of the estimator θ̂, although the two are related.

In addition to these "good" properties, we generally consider estimators that have a normal asymptotic distribution (as discussed in this chapter).


2. What is an Estimator?

Question (cont'd): What constitutes a good estimator?

Estimators are compared on the basis of a variety of attributes:

1. Finite sample properties (or finite sample distribution) of estimators are those attributes that can be compared regardless of the sample size (Section 3).

2. Some estimation problems involve characteristics that are unknown in finite samples. In these cases, estimators are compared on the basis of their large sample, or asymptotic, properties (Section 4).


2. What is an Estimator?

Key Concepts

1. Random sample.

2. Estimator.

3. Point estimate.

4. Sampling distribution.


Section 3

Finite Sample Properties



3. Finite Sample Properties

Objectives

1. Define the concept of a finite sample distribution.

2. Explore the finite sample properties of estimators.

3. Define what constitutes an unbiased estimator.

4. Compare two unbiased estimators.

5. Introduce the FDCR, or Cramer-Rao, bound.

6. Define what constitutes a Best Linear Unbiased Estimator (BLUE).


3. Finite Sample Properties

Definition (Finite sample properties and finite sample distribution)


The finite sample properties of an estimator θb correspond to the properties of its finite
sample distribution (or exact distribution) defined for any sample size n ∈ N.

Christophe Hurlin Chapter 1 - Estimation Theory September 2024 35 / 166


3. Finite Sample Properties

Two cases:

1. In some particular cases, the finite sample distribution of the estimator is known. It corresponds to the distribution of the random variable θ̂ for any sample size n.

2. In most cases, the finite sample distribution is unknown, but we can study some specific moments (mean, variance, etc.) of this distribution (finite sample properties).


3. Finite Sample Properties

Example (Sample mean and finite sample distribution)

Assume that Y_1, Y_2, ..., Y_n are N.i.d.(m, σ²) random variables. The estimator m̂ = Ȳ_n (sample mean) also has a normal distribution:

    m̂ = (1/n) ∑_{i=1}^n Y_i ∼ N(m, σ²/n)   ∀n ∈ ℕ

Consequence: the finite sample distribution of m̂ for any n ∈ ℕ is fully characterized by m and σ² (parameters that can be estimated).

Example: if n = 3, then m̂ ∼ N(m, σ²/3); if n = 10, then m̂ ∼ N(m, σ²/10), etc.


3. Finite Sample Properties

Proof: The sum of independent normal variables has a normal distribution, with:

    E(m̂) = (1/n) ∑_{i=1}^n E(Y_i) = nm/n = m

    V(m̂) = V((1/n) ∑_{i=1}^n Y_i) = (1/n²) ∑_{i=1}^n V(Y_i) = nσ²/n² = σ²/n

since the variables Y_i are:

- independent, so cov(Y_i, Y_j) = 0,
- identically distributed, so E(Y_i) = m and V(Y_i) = σ², ∀i ∈ {1, ..., n}.
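Illustration: MATLAB code

A minimal simulation sketch of this exact result (the values m = 1, sigma = 2, n = 10 and K = 5000 are illustrative assumptions, not taken from the chapter):

clear; clc; close all;

% Minimal sketch: check by simulation that the sample mean of
% N.i.d.(m, sigma^2) variables has the exact N(m, sigma^2/n) distribution
m = 1; sigma = 2; % Illustrative parameter values (assumptions)
n = 10;           % Sample size
K = 5000;         % Number of trials

Mhat = zeros(K, 1);
for k = 1:K
    Y = normrnd(m, sigma, n, 1); % Draw a normal sample of size n
    Mhat(k) = mean(Y);           % Sample mean for this trial
end

figure;
histogram(Mhat, 'Normalization', 'pdf'); hold on; grid on;
u = linspace(m - 4*sigma/sqrt(n), m + 4*sigma/sqrt(n), 200);
plot(u, normpdf(u, m, sigma/sqrt(n)), 'r', 'LineWidth', 1.5); % N(m, sigma^2/n) pdf
xlabel('Sample mean'); ylabel('Density');
legend('Simulated sampling distribution', 'Exact N(m, \sigma^2/n) pdf');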


3. Finite Sample Properties

Remarks

1. Except in very particular cases (normally distributed samples), the exact distribution of the estimator is very difficult to calculate.

2. Sometimes, it is possible to derive the exact distribution of a transformed variable g(θ̂), where g(.) is a continuous function.


3. Finite Sample Properties

Example (Sample variance and finite sample distribution)

Assume that Y_1, Y_2, ..., Y_n are N.i.d.(m, σ²) random variables. The sample variance

    S_n² = (1/(n−1)) ∑_{i=1}^n (Y_i − Ȳ_n)²

is an estimator of σ². The transformed variable (n−1)S_n²/σ² has a Chi-squared (exact / finite sample) distribution with n−1 degrees of freedom:

    (n−1)S_n²/σ² ∼ χ²(n−1)   ∀n ∈ ℕ

Proof: see Chapter 4.
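Illustration: MATLAB code

A quick simulation check of this exact distribution (a sketch; the values sigma = 2, n = 15 and K = 5000 are illustrative assumptions):

clear; clc; close all;

% Minimal sketch: simulate (n-1)*Sn^2/sigma^2 for normal samples and
% compare its distribution with the chi-square(n-1) pdf
sigma = 2; n = 15; K = 5000; % Illustrative parameter values (assumptions)

Q = zeros(K, 1);
for k = 1:K
    Y = normrnd(0, sigma, n, 1);
    Q(k) = (n - 1) * var(Y) / sigma^2; % var(Y) is the corrected variance Sn^2
end

figure;
histogram(Q, 'Normalization', 'pdf'); hold on; grid on;
u = linspace(0, 40, 400);
plot(u, chi2pdf(u, n - 1), 'r', 'LineWidth', 1.5); % chi-square(n-1) pdf
xlabel('(n-1) S_n^2 / \sigma^2'); ylabel('Density');
legend('Simulated transformed variable', '\chi^2(n-1) pdf');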


3. Finite Sample Properties

Fact
In most cases, it is impossible to derive the exact finite sample distribution of the estimator (or of a transformed variable).

Two reasons:

1. In some cases, the exact distribution of Y_1, Y_2, ..., Y_n is known, but the function T(.) is too complicated to derive the distribution of θ̂:

    θ̂ = T(Y_1, ..., Y_n) ∼ ???   ∀n ∈ ℕ

2. In most cases, the distribution of the sample variables Y_1, Y_2, ..., Y_n is unknown...

    θ̂ = T(Y_1, ..., Y_n) ∼ ???   ∀n ∈ ℕ


3. Finite Sample Properties

Question: How can we evaluate the finite sample properties of the estimator θ̂ when its finite sample distribution is unknown?

    θ̂ ∼ ???   ∀n ∈ ℕ

Solution: We will focus on some specific moments of this (unknown) finite sample (sampling) distribution in order to study some properties of the estimator θ̂ and determine whether it is a "good" estimator or not.


3. Finite Sample Properties

Do not confuse moments and empirical moments.

Moments of a random variable X with pdf f_X(x) (constants):

- Mean: E(X) = ∫ x f_X(x) dx = µ
- Variance: V(X) = ∫ (x − µ)² f_X(x) dx
- Skewness: ∫ (x − µ)³ f_X(x) dx / σ³
- Kurtosis: ∫ (x − µ)⁴ f_X(x) dx / σ⁴

Empirical moments of a collection of random variables X_1, ..., X_n (random variables):

- Empirical mean: X̄_n = (1/n) ∑_{i=1}^n X_i
- Empirical variance: S_n² = (1/n) ∑_{i=1}^n (X_i − X̄_n)²
- Empirical skewness: Sk = (1/n) ∑_{i=1}^n (X_i − X̄_n)³ / S_n³
- Empirical kurtosis: Ku = (1/n) ∑_{i=1}^n (X_i − X̄_n)⁴ / S_n⁴


3. Finite Sample Properties

Definition (Unbiased estimator)

An estimator θ̂ of a parameter θ is unbiased if the mean of its sampling distribution is θ:

    E(θ̂) = θ

or, equivalently, if its bias is zero:

    Bias(θ̂ | θ) = E(θ̂) − θ = 0

If θ is a vector of parameters, then the estimator is unbiased if the expected value of every element of θ̂ equals the corresponding element of θ.


3. Finite Sample Properties

[Figure omitted. Source: Greene (2018), Econometrics]


3. Finite Sample Properties

Example (Bernoulli distribution)

Let Y_1, Y_2, ..., Y_n be a random sample from a Bernoulli distribution with success probability p. An unbiased estimator of p is:

    p̂ = (1/n) ∑_{i=1}^n Y_i

Proof: Since the Y_i are i.i.d. with E(Y_i) = p, we have:

    E(p̂) = (1/n) ∑_{i=1}^n E(Y_i) = np/n = p  □


3. Finite Sample Properties

Example (Uniform distribution)

Let Y_1, Y_2, ..., Y_n be a random sample from a uniform distribution U[0, θ]. An unbiased estimator of θ is:

    θ̂ = (2/n) ∑_{i=1}^n Y_i

Proof: Since the Y_i are i.i.d. with E(Y_i) = (θ + 0)/2 = θ/2, we have:

    E(θ̂) = E((2/n) ∑_{i=1}^n Y_i) = (2/n) ∑_{i=1}^n E(Y_i) = (2/n) × (nθ/2) = θ  □
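Illustration: MATLAB code

A one-line simulation check of this result (a sketch; theta = 5, n = 20 and K = 100000 are illustrative assumptions):

clear; clc; close all;

% Minimal sketch: check by simulation that theta_hat = (2/n)*sum(Y_i)
% is unbiased when Y_i ~ U[0, theta]
theta = 5; n = 20; K = 100000; % Illustrative parameter values (assumptions)

Y = unifrnd(0, theta, n, K); % K samples of size n (one per column)
theta_hat = 2 * mean(Y);     % One estimate per sample

fprintf('True theta        = %.4f\n', theta);
fprintf('Mean of theta_hat = %.4f\n', mean(theta_hat)); % Close to theta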


3. Finite Sample Properties

Example (Multiple linear regression model)

Consider the model:

    y = Xβ + µ

where y ∈ ℝⁿ, X ∈ M_{n×K} is a nonrandom matrix, β ∈ ℝ^K is a vector of parameters, E(µ) = 0_{n×1} and V(µ) = σ² I_n. The OLS estimator

    β̂ = (X⊤X)⁻¹ X⊤y

is an unbiased estimator of β.


3. Finite Sample Properties

Proof: Since y = Xβ + µ, X ∈ M_{n×K} is a nonrandom matrix and E(µ) = 0, we have:

    E(y) = Xβ

As a consequence:

    E(β̂) = (X⊤X)⁻¹ X⊤ E(y) = (X⊤X)⁻¹ X⊤X β = β

The estimator β̂ is unbiased. □
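Illustration: MATLAB code

A Monte Carlo sketch of this result: the design matrix X is drawn once and then held fixed, matching the nonrandom-regressor assumption. The values n = 100, K = 3, sigma = 1 and β = (1, 2, −0.5)⊤ are illustrative assumptions:

clear; clc; close all;

% Minimal sketch: Monte Carlo check of the unbiasedness of OLS
% with a fixed (nonrandom) design matrix X
n = 100; K = 3; sigma = 1;             % Illustrative values (assumptions)
rng(1);                                % Fix the seed for reproducibility
X = [ones(n,1) unifrnd(0, 1, n, K-1)]; % Design drawn once, then held fixed
beta = [1; 2; -0.5];                   % True parameter vector (assumption)

R = 5000;                              % Number of Monte Carlo replications
Beta_hat = zeros(K, R);
for r = 1:R
    y = X * beta + sigma * randn(n, 1);   % New error draw, same X
    Beta_hat(:, r) = (X' * X) \ (X' * y); % OLS estimate
end

disp('True beta and Monte Carlo mean of the OLS estimates:');
disp([beta mean(Beta_hat, 2)]);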


3. Finite Sample Properties

Remark:

Even if it does not strictly belong in a section devoted to the finite sample properties of estimators, we can introduce here the notion of an asymptotically unbiased estimator (which can be considered a large sample property).

Here we assume that the estimator θ̂ = θ̂_n depends on the sample size n.


3. Finite Sample Properties

Definition (Asymptotically unbiased estimator)

The sequence of estimators θ̂_n (with n ∈ ℕ) is asymptotically unbiased if:

    lim_{n→∞} E(θ̂_n) = θ


3. Finite Sample Properties

Example (Sample variance)

Assume that Y_1, Y_2, ..., Y_n are N.i.d.(m, σ²) random variables. The non-corrected sample variance defined by:

    S̃_n² = (1/n) ∑_{i=1}^n (Y_i − Ȳ_n)²

is a biased estimator of σ², but it is asymptotically unbiased.


3. Finite Sample Properties

Proof: We want to study the bias of the non-corrected empirical variance:

    S̃_n² = (1/n) ∑_{i=1}^n (Y_i − Ȳ_n)²

We know that the corrected empirical variance satisfies:

    S_n² = (1/(n−1)) ∑_{i=1}^n (Y_i − Ȳ_n)²

    (n−1)S_n²/σ² ∼ χ²(n−1)   ∀n ∈ ℕ

Since we have a relationship between S_n² and S̃_n², such that:

    S_n² = (n/(n−1)) S̃_n²

we get:

    (n−1)S_n²/σ² = n S̃_n²/σ² ∼ χ²(n−1)   ∀n ∈ ℕ


3. Finite Sample Properties

Proof (cont'd):

    n S̃_n²/σ² ∼ χ²(n−1)   ∀n ∈ ℕ

Reminder: If X ∼ χ²(v), then E(X) = v and V(X) = 2v.

By definition:

    E(n S̃_n²/σ²) = n − 1

or equivalently:

    E(S̃_n²) = ((n−1)/n) σ² ≠ σ²

So, S̃_n² = (1/n) ∑_{i=1}^n (Y_i − Ȳ_n)² is a biased estimator of σ².


3. Finite Sample Properties

Proof (cont'd): But S̃_n² = (1/n) ∑_{i=1}^n (Y_i − Ȳ_n)² is asymptotically unbiased since:

    lim_{n→∞} E(S̃_n²) = lim_{n→∞} ((n−1)/n) σ² = σ²  □

Remark: Even in a more general framework (non-normal), the sample variance with a small-sample correction, i.e.

    S_n² = (n−1)⁻¹ ∑_{i=1}^n (Y_i − Ȳ_n)²

is an unbiased estimator of σ²:

    E(S_n²) = σ²


3. Finite Sample Properties

MATLAB Commands

We will need the following MATLAB commands and structures:

normrnd(a,b,n,1): draws a vector of n random variables from a normal distribution with mean a and standard deviation b.

var(X): computes the empirical variance of X. By default, this corresponds to the corrected variance S_n².

sum(X): computes the sum of the elements in the vector/matrix X.

hold('on'): allows superimposing multiple graphs in the same figure window.


Exercise: MATLAB code

Exercise: Empirical variances

Write a well-commented MATLAB code to illustrate the bias of the non-corrected variance S̃_n². Compare the sampling distributions of the corrected empirical variance S_n² and the non-corrected variance S̃_n² for a sample of 15 random variables drawn from a normal distribution:

- Generate a sample of size n = 15 from a normal distribution N(0, σ²) with σ = 2.
- Compute and display the empirical mean of the sample.
- Initialize two vectors of size K × 1 with K = 1000, setting all elements to zero.
- Perform K trials, each involving the generation of a new sample of size n = 15 from the normal distribution.
- For each trial, compute the corrected empirical variance S_n² and the non-corrected variance S̃_n².
- Plot the histograms of the K realizations of S_n² and S̃_n² on the same figure.
- Compute and display the mean of the K realizations of S_n² and S̃_n², and compare them to the true variance σ².
- Repeat the exercise with a sample size of n = 500. Comment.


Solution: MATLAB code
clear; clc; close all;

% Parameters
n = 15;                % Sample size
number_trials = 10000; % Number of trials
sigma = 2;             % True standard deviation
corrected_var     = zeros(number_trials, 1);
non_corrected_var = zeros(number_trials, 1);

% Perform multiple trials to simulate the sampling distributions of the variances
for i = 1:number_trials
    X = normrnd(0, 2, n, 1);                      % Draw a normal sample
    corrected_var(i) = var(X);                    % Corrected empirical variance
    non_corrected_var(i) = sum((X-mean(X)).^2)/n; % Non-corrected variance
end

% Plot the empirical distributions of the estimators
figure; % Normalize to show the probability density function
histogram(non_corrected_var, 'Normalization', 'pdf');
xlabel('Variance'); ylabel('Density');
grid on;
hold('on')
histogram(corrected_var, 'Normalization', 'pdf');
legend('Non-corrected variance', 'Corrected variance')

% Means of the two estimators
fprintf('True variance = %.0f\n', sigma^2);
fprintf('Mean of the corrected variance = %.4f\n', mean(corrected_var));
fprintf('Mean of the non-corrected variance = %.4f\n', mean(non_corrected_var));


Solution: MATLAB code

n = 15
True variance = 4
Mean of the corrected variance = 4.0112
Mean of the non-corrected variance = 3.7438

Solution: MATLAB code

n = 500
True variance = 4
Mean of the corrected variance = 3.9984
Mean of the non-corrected variance = 3.9904
3. Finite Sample Properties

Unbiasedness is interesting per se, but not so much!

1. The absence of bias is not a sufficient criterion to discriminate among competing estimators.

2. There may exist many unbiased estimators for the same parameter (vector) of interest.


3. Finite Sample Properties

Example (Estimators)

Assume that Y_1, Y_2, ..., Y_n are i.i.d. with E(Y_i) = m. The statistics

    m̂_1 = (1/n) ∑_{i=1}^n Y_i

    m̂_2 = Y_1

are unbiased estimators of m.


3. Finite Sample Properties

Proof: Since the Y_i are i.i.d. with E(Y_i) = m, we have:

    E(m̂_1) = (1/n) ∑_{i=1}^n E(Y_i) = nm/n = m

    E(m̂_2) = E(Y_1) = m

Both estimators m̂_1 and m̂_2 of the parameter m are unbiased. □


3. Finite Sample Properties

How can we compare two unbiased estimators?

When two or more estimators are unbiased, the best one is the most precise, i.e., the estimator with the smallest variance.

Therefore, comparing two or more unbiased estimators is equivalent to comparing their variance-covariance matrices.


3. Finite Sample Properties

Definition

Suppose that θ̂_1 and θ̂_2 are two unbiased estimators. θ̂_1 dominates θ̂_2, i.e. θ̂_1 ⪰ θ̂_2, if and only if:

    V(θ̂_1) ≤ V(θ̂_2)

In the case where θ̂_1, θ̂_2 and θ are vectors, this inequality becomes:

    V(θ̂_2) − V(θ̂_1) is a positive semi-definite matrix




3. Finite Sample Properties

Example (Estimators)

Assume that Y_1, Y_2, ..., Y_n are i.i.d. with E(Y_i) = m and V(Y_i) = σ². The estimator m̂_1 = (1/n) ∑_{i=1}^n Y_i dominates the estimator m̂_2 = Y_1.

Proof: The two estimators m̂_1 and m̂_2 are unbiased, so they can be compared in terms of variance (precision):

    V(m̂_1) = (1/n²) ∑_{i=1}^n V(Y_i) = nσ²/n² = σ²/n

since the Y_i are i.i.d., and

    V(m̂_2) = V(Y_1) = σ²

So V(m̂_1) ≤ V(m̂_2): the estimator m̂_1 is preferred to m̂_2, i.e. m̂_1 ⪰ m̂_2. □


Exercise: MATLAB code

Exercise: Comparison of two unbiased estimators

Write a well-commented MATLAB code to compare the sampling distributions of the estimators m̂_1 and m̂_2 for a sample of size n = 20 of variables Y_i drawn from a Poisson distribution with mean m = 4. Recall that E(Y_i) = m.

- Generate a sample of size n = 20 from a Poisson distribution P(m) with m = 4.
- Initialize two vectors of size K × 1 with K = 1000, setting all elements to zero.
- Perform K trials, each involving the generation of a new sample of size n from the Poisson distribution.
- For each trial, compute the two estimators m̂_1 and m̂_2.
- Plot the histograms of the K realizations of m̂_1 and m̂_2 on the same figure.
- Compute and display the mean and variance of the K realizations of m̂_1 and m̂_2. What is the best estimator?
- Repeat the exercise with a sample size of n = 100. Provide a commentary on the results.


Solution: MATLAB code
clear; clc; close all;

% Parameters
n = 20;                    % Sample size
K = 1000;                  % Number of trials
m = 4;                     % Poisson mean
m1 = zeros(K, 1); m2 = m1; % Initialisation

% Perform multiple trials to simulate
% the sampling distributions of m1 and m2
for i = 1:K
    X = poissrnd(m, n, 1); % Draw a Poisson sample
    m1(i) = mean(X);       % First estimator (sample mean)
    m2(i) = X(1);          % Second estimator (first observation)
end

% Plot the empirical distributions of the estimators
figure;
histogram(m1, 'Normalization', 'pdf');
xlabel('Estimate'); ylabel('Density');
grid on; hold('on')
histogram(m2, 'Normalization', 'pdf');
legend('Estimator m1', 'Estimator m2')

% Moments of the two estimators
fprintf('True mean (unobserved) = %.0f\n', m)
fprintf('Mean of the estimator m1 = %.4f\n', mean(m1))
fprintf('Mean of the estimator m2 = %.4f\n', mean(m2))
fprintf('Variance of the estimator m1 = %.4f\n', var(m1))
fprintf('Variance of the estimator m2 = %.4f\n', var(m2))


Solution: MATLAB code

True mean (unobserved) = 4
Mean of the estimator m1 = 3.9773
Mean of the estimator m2 = 3.9430
Variance of the estimator m1 = 0.1950
Variance of the estimator m2 = 4.0438
3. Finite Sample Properties

Question: Is there a lower bound for the variance of unbiased estimators?


3. Finite Sample Properties

Definition (Cramer-Rao or FDCR bound)

Let X_1, ..., X_n be an i.i.d. sample with pdf f_X(θ; x). Let θ̂ be an unbiased estimator of θ, i.e., E_θ(θ̂) = θ. If f_X(θ; x) is regular, then:

    V_θ(θ̂) ≥ I_n⁻¹(θ_0) = FDCR or Cramer-Rao bound

where I_n(θ_0) denotes the Fisher information number for the sample, evaluated at the true value θ_0. If θ is a vector, then this inequality means that V_θ(θ̂) − I_n⁻¹(θ_0) is positive semi-definite.

FDCR: Frechet - Darmois - Cramer - Rao

Remark: we will define the Fisher information matrix in Chapter 2 (Maximum Likelihood Estimation).


3. Finite Sample Properties

Definition (Efficiency)

An estimator is efficient if its variance attains the FDCR (Frechet - Darmois - Cramer - Rao) or Cramer-Rao bound:

    V_θ(θ̂) = I_n⁻¹(θ_0)

where I_n(θ_0) denotes the Fisher information matrix associated with the sample, evaluated at the true value θ_0.


3. Finite Sample Properties

Finally, note that in some cases we further restrict the set of estimators to linear functions of the data.

Definition (BLUE estimator)

An estimator is the minimum variance linear unbiased estimator, or best linear unbiased estimator (BLUE), if it is a linear function of the data and has minimum variance among linear unbiased estimators.

Remark: the term "linear" means that the estimator θ̂ is a linear function of the data Y_i:

    θ̂_j = ∑_{i=1}^n ω_ij Y_i


3. Finite Sample Properties

Key Concepts

1. Finite sample distribution.

2. Finite sample properties.

3. Bias and unbiased estimator.

4. Comparison of unbiased estimators.

5. Cramer-Rao or FDCR bound.

6. Efficient estimator.

7. Linear estimator.

8. BLUE estimator.


Section 4

Asymptotic Properties



4. Asymptotic Properties

Problem:

1. Let us consider an i.i.d. sample Y_1, Y_2, ..., Y_n, where Y has a pdf f_Y(y; θ) and θ is an unknown parameter.

2. We assume that f_Y(y; θ) is also unknown (we do not know the distribution of the Y_i).

3. We consider an estimator θ̂ (also denoted θ̂_n to show that it depends on n) such that:

    θ̂ = T(Y_1, Y_2, ..., Y_n) ≡ θ̂_n

4. The finite sample distribution of θ̂_n is unknown...

    θ̂_n ∼ ???   ∀n ∈ ℕ


4. Asymptotic Properties

Question: What is the behavior of the random variable θ̂_n when the sample size n tends to infinity?

Definition (Asymptotic theory)

Asymptotic (or large sample) theory consists in the study of the distribution of the estimator when the sample size is sufficiently large.

Asymptotic theory is fundamentally based on the notion of convergence...


4. Asymptotic Properties

We are mainly concerned with four modes of convergence:

1. Almost sure convergence

2. Convergence in probability

3. Convergence in quadratic mean

4. Convergence in distribution




Sub-Section 4.1

Almost Sure Convergence



4. Asymptotic Properties
4.1 Almost sure convergence

Definition (Almost sure convergence)

Let X_n be a sequence of random variables indexed by the sample size. X_n converges almost surely (or with probability 1, or strongly) to a constant c if, for every ε > 0:

    Pr( lim_{n→∞} |X_n − c| < ε ) = 1

or equivalently if:

    Pr( lim_{n→∞} X_n = c ) = 1

It is written:

    X_n →a.s. c


4. Asymptotic Properties
4.1 Almost sure convergence

Comments:

1. When n is very large, the realizations of X_n are always equal to c. For instance, if c = 2, when we draw X_n, we obtain realizations such as 2, 2, 2, 2, etc.

2. In other words, when n tends to infinity, the random variable X_n tends to a degenerate random variable (a random variable which takes only a single value c), with a pdf equal to a probability mass function.

3. A sequence of random variables is a collection of random variables indexed in a specific order, usually by the natural numbers.

4. Here, X_n can be viewed as a function of other variables Y_1, ..., Y_n, with X_n = g(Y_1, ..., Y_n). As n varies, the probability distribution of X_n changes.




4. Asymptotic Properties
4.1 Almost sure convergence

Definition (Strong consistency)

A point estimator θ̂_n of θ is strongly consistent if:

    θ̂_n →a.s. θ

Remark: When n → ∞, the estimator tends to a degenerate random variable that takes a single value equal to θ. The crème de la crème (best of the best) of estimators...
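Illustration: MATLAB code

A simulation sketch of this idea: each trajectory of the running sample mean of U[0, 10] draws settles at E(X_i) = 5, which illustrates the almost sure convergence of the sample mean (SLLN). The values n_max = 5000 and n_paths = 5 are illustrative assumptions:

clear; clc; close all;

% Minimal sketch: sample paths of the running mean, illustrating the
% almost sure convergence of the sample mean (SLLN)
n_max = 5000;  % Maximum sample size (assumption)
n_paths = 5;   % Number of simulated trajectories (assumption)

figure; hold on; grid on;
for p = 1:n_paths
    X = unifrnd(0, 10, n_max, 1);
    plot(cumsum(X) ./ (1:n_max)'); % Running sample mean along one path
end
yline(5, 'k--'); % Population mean E(X_i) = 5
xlabel('Sample size n'); ylabel('Running sample mean');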




Sub-Section 4.2

Convergence in Probability



4. Asymptotic Properties
4.2 Convergence in probability

Definition (Convergence in probability)

Let X_n be a sequence of random variables indexed by the sample size. X_n converges in probability to a constant c if, for any ε > 0, we have:

    lim_{n→∞} Pr(|X_n − c| > ε) = 0

It is written as:

    X_n →p c   or   plim X_n = c


4. Asymptotic Properties
4.2 Convergence in probability

[Figure: illustration of X_n →p c, i.e. lim_{n→∞} Pr(|X_n − c| > ε) = 0, for a very small ε]


4. Asymptotic Properties
4.2 Convergence in probability

Comments:

1. The general idea is the same as for a.s. convergence: X_n tends to a degenerate random variable (even if it is not exactly the case) equal to c.

2. When n is very large, the realizations of X_n are very close to c. For instance, if c = 2, when we draw X_n, we obtain realizations such as 1.9999, 2.00001, 2, 2.00002, etc.

3. Convergence in probability allows more erratic behavior in the converging sequence than almost sure convergence.


4. Asymptotic Properties
4.2 Convergence in probability

Remark. The notation:

    X_n →p X

where X is a random element (scalar, vector, matrix), means that the variable X_n − X converges in probability to c = 0:

    X_n − X →p 0


4. Asymptotic Properties
4.2 Convergence in probability

Definition (Weak consistency)

A point estimator θ̂_n of θ is (weakly) consistent if:

    θ̂_n →p θ

Remark: In econometrics, in most cases, we only consider weak consistency. When we say that an estimator is "consistent", it generally refers to convergence in probability.


4. Asymptotic Properties
4.2 Convergence in probability

Lemma (Convergence in probability)

Let X_n be a sequence of random variables indexed by the sample size and c a constant. If X_n converges in probability to c as n → ∞, then:

    lim_{n→∞} E(X_n) = c

    lim_{n→∞} V(X_n) = 0


4. Asymptotic Properties
4.2 Convergence in probability

Example (Consistent estimator)

Assume that Y_1, Y_2, ..., Y_n are i.i.d. with E(Y_i) = m and V(Y_i) = σ², where σ² is known and m is unknown. The estimator m̂, defined by

    m̂ = (1/n) ∑_{i=1}^n Y_i

is a (weakly) consistent estimator of m. Compute E(m̂) and V(m̂) when n → ∞.


4. Asymptotic Properties
4.2 Convergence in probability

Proof: Since Y_1, Y_2, ..., Y_n are i.i.d. with E(Y_i) = m and V(Y_i) = σ², we have:

    E(m̂) = (1/n) ∑_{i=1}^n E(Y_i) = m

    lim_{n→∞} V(m̂) = lim_{n→∞} (1/n²) ∑_{i=1}^n V(Y_i) = lim_{n→∞} σ²/n = 0  □
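Illustration: MATLAB code

A simulation sketch of weak consistency: the probability Pr(|m̂ − m| > ε) is estimated by Monte Carlo for increasing n and shrinks toward zero. The values m = 2, sigma = 1, epsilon = 0.1 and K = 2000 are illustrative assumptions:

clear; clc; close all;

% Minimal sketch: estimate Pr(|m_hat - m| > epsilon) for increasing n;
% the probability shrinks toward zero (weak consistency of the mean)
m = 2; sigma = 1;        % Illustrative parameter values (assumptions)
epsilon = 0.1; K = 2000; % Tolerance and number of trials (assumptions)

for n = [10 100 1000 10000]
    Mhat = mean(normrnd(m, sigma, n, K)); % K sample means of size n
    p = mean(abs(Mhat - m) > epsilon);    % Monte Carlo estimate
    fprintf('n = %6d : Pr(|m_hat - m| > %.2f) = %.4f\n', n, epsilon, p);
end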


4. Asymptotic Properties
4.2 Convergence in probability

Example (Consistent estimator)

Assume that Y_1, Y_2, ..., Y_n are N.i.d.(m, σ²) random variables. The sample variance defined by:

    S_n² = (1/(n−1)) ∑_{i=1}^n (Y_i − Ȳ_n)²

is a (weakly) consistent estimator of σ². Compute E(S_n²) and V(S_n²) when n → ∞.


4. Asymptotic Properties
4.2 Convergence in probability

Proof: We know that for a normal sample:

    (n−1)S_n²/σ² ∼ χ²(n−1)

    E((n−1)S_n²/σ²) = n − 1     V((n−1)S_n²/σ²) = 2(n−1)

We get immediately:

    E(S_n²) = σ²

    lim_{n→∞} V(S_n²) = lim_{n→∞} 2σ⁴/(n−1) = 0  □


4. Asymptotic Properties
4.2 Convergence in probability

Lemma (Chain of implication)

Almost sure convergence implies convergence in probability:

    →a.s.  =⇒  →p

where the symbol "=⇒" means "implies". The converse is not true.


4. Asymptotic Properties
4.2 Convergence in Probability

Comments:

1. One of the key applications of convergence in probability and almost sure convergence is the Law of Large Numbers.

2. The Law of Large Numbers states that the sample mean converges in probability (Weak Law of Large Numbers, WLLN) or almost surely (Strong Law of Large Numbers, SLLN) to the population mean:

    X̄_n = (1/n) ∑_{i=1}^n X_i  →  E(X_i)  as n → ∞


4. Asymptotic Properties
4.2 Convergence in probability

Theorem (Weak Law of Large Numbers, Khinchine (1929))

If {X_i}_{i=1}^n is a sequence of independently and identically distributed (i.i.d.) random variables with finite mean E(X_i) = µ < ∞, then the sample mean X̄_n converges in probability to µ:

    X̄_n = (1/n) ∑_{i=1}^n X_i  →p  E(X_i) = µ


4. Asymptotic Properties
4.2 Convergence in probability

Theorem (Strong Law of Large Numbers, Kolmogorov)

If {X_i}, for i = 1, ..., n, is a sequence of independently and identically distributed (i.i.d.) random variables such that E(X_i) = µ < ∞ and E(|X_i|) < ∞, then the sample mean X̄_n converges almost surely to µ:

    X̄_n = (1/n) ∑_{i=1}^n X_i  →a.s.  E(X_i) = µ


4. Asymptotic Properties

MATLAB Commands

We will need the following MATLAB commands and structures:

subplot(m,n,p): divides the current figure into an m-by-n grid and creates axes in the position specified by p.

xlim(limits): sets the x-axis limits for the current axes or chart. Specify limits as a two-element vector of the form [xmin xmax], where xmax is greater than xmin.


Exercise: MATLAB code

Exercise: Weak Law of Large Numbers

Write a well-commented MATLAB code to illustrate the Weak Law of Large Numbers. Assume that X_i ∼ U[0, 10] with E(X_i) = 5.

- Generate a sample of size n = 10 of i.i.d. random variables X_i ∼ U[0, 10].
- Compute the sample mean x̄_n = (1/n) ∑_{i=1}^n x_i.
- Repeat this procedure K = 5,000 times to obtain 5,000 realizations of the sample mean X̄_n.
- Construct a histogram of these K realizations of X̄_n.
- Repeat the exercise with sample sizes of n = 100, n = 10,000, and n = 100,000. Provide a commentary on the results.


Solution: MATLAB code

clear; clc; close all;

% Parameters
n = [10, 100, 10000, 100000]; % Sample sizes
K = 5000;                     % Number of trials

% Preallocate matrix to store empirical means
Empirical_means = zeros(length(n), K);

% Loop over each sample size
for i = 1:length(n)
    % Draw K samples of size n(i) from a uniform distribution U[0, 10]
    X = unifrnd(0, 10, n(i), K);
    % Compute the mean of each sample and store it in Empirical_means
    Empirical_means(i, :) = mean(X);
end

% Plot the empirical distributions of the sample means
figure;
for i = 1:length(n)
    subplot(2, 2, i);
    histogram(Empirical_means(i, :), 'Normalization', 'pdf');
    xlabel('Sample mean'); ylabel('Density'); grid on;
    xlim([2 8]);
    title(sprintf('Sample Size = %d', n(i)));
end




4. Asymptotic Properties
4.2 Convergence in probability

Proof: There are many proofs of the law of large numbers. Most of them use the additional assumption of finite variance V(X_i) = σ² and Chebyshev's inequality.

Theorem (Chebyshev's inequality)

Let X be a random variable with finite expected value µ and finite non-zero variance σ². Then, for any real number k > 0:

    Pr(|X − µ| ≥ kσ) ≤ 1/k²


4. Asymptotic Properties
4.2 Convergence in probability

Proof (cont'd): Under the assumption of i.i.d.(µ, σ²) variables, we have:

    E(X̄_n) = µ     V(X̄_n) = σ²/n

Given Chebyshev's inequality, we get, for k > 0:

    Pr(|X̄_n − µ| ≥ kσ/√n) ≤ 1/k²

Let us define ε > 0 such that:

    ε = kσ/√n  ⇐⇒  k = ε√n/σ

Then we get, for any ε > 0:

    Pr(|X̄_n − µ| ≥ ε) ≤ σ²/(ε²n)


4. Asymptotic Properties
4.2 Convergence in probability

Proof (cont'd): For any ε > 0:

    Pr(|X̄_n − µ| ≥ ε) ≤ σ²/(ε²n)

So, when n → ∞, this probability is necessarily equal to 0 (the bound σ²/(ε²n) tends to 0, and a probability is non-negative):

    lim_{n→∞} Pr(|X̄_n − µ| ≥ ε) = 0   ∀ε > 0

Since Pr(|X̄_n − µ| < ε) = 1 − Pr(|X̄_n − µ| ≥ ε), we have:

    lim_{n→∞} Pr(|X̄_n − µ| < ε) = 1   ∀ε > 0

    X̄_n →p µ  (WLLN) □
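Illustration: MATLAB code

A numerical sketch of Chebyshev's inequality: the simulated tail probability is compared with the bound 1/k². The chi-square(3) distribution and K = 100000 draws are illustrative assumptions:

clear; clc; close all;

% Minimal sketch: compare Pr(|X - mu| >= k*sigma) with the bound 1/k^2
K = 100000;              % Number of draws (assumption)
X = chi2rnd(3, K, 1);    % A skewed, non-normal random variable (assumption)
mu = 3; sigma = sqrt(6); % E(X) = 3 and V(X) = 6 for a chi-square(3)

for k = [1.5 2 3]
    p = mean(abs(X - mu) >= k * sigma); % Simulated tail probability
    fprintf('k = %.1f : simulated = %.4f <= bound 1/k^2 = %.4f\n', k, p, 1/k^2);
end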


4. Asymptotic Properties
4.2 Convergence in probability

Remarks:

These two theorems consider a sequence of independently and identically distributed (i.i.d.) random variables, and as a consequence variables with the same mean E(X_i) = µ, ∀i = 1, ..., n.

There are alternative versions of the law of large numbers for independent random variables that are not identically (heterogeneously) distributed, with E(X_i) = µ_i (Greene, 2018):

1. Chebychev's Weak Law of Large Numbers.
2. Markov's Strong Law of Large Numbers.

For more details about these alternative Laws of Large Numbers, see Greene (2018).


4. Asymptotic Properties
4.2 Convergence in probability

Theorem (Slutsky's theorem)

Let X_n and Y_n be two sequences of random variables where X_n →p X and Y_n →p c, with c ≠ 0. Then:

    X_n + Y_n →p X + c

    X_n Y_n →p cX

    X_n / Y_n →p X / c

Remark: This also holds for sequences of random matrices. The last statement reads: if X_n →p X and Y_n →p Ω, then

    Y_n⁻¹ X_n →p Ω⁻¹ X

provided that Ω⁻¹ exists.
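Illustration: MATLAB code

A simulation sketch of Slutsky's theorem: √n(X̄_n − µ) converges in distribution to N(0, σ²) and S_n converges in probability to σ, so the studentized mean √n(X̄_n − µ)/S_n converges to N(0, 1). The exponential distribution with µ = 1, n = 200 and K = 5000 are illustrative assumptions:

clear; clc; close all;

% Minimal sketch: the studentized mean as an application of Slutsky's theorem
mu = 1; n = 200; K = 5000; % Illustrative parameter values (assumptions)

T = zeros(K, 1);
for k = 1:K
    X = exprnd(mu, n, 1);                     % Exponential sample: E(X)=mu, V(X)=mu^2
    T(k) = sqrt(n) * (mean(X) - mu) / std(X); % Ratio of the two sequences
end

figure;
histogram(T, 'Normalization', 'pdf'); hold on; grid on;
u = linspace(-4, 4, 200);
plot(u, normpdf(u, 0, 1), 'r', 'LineWidth', 1.5); % Standard normal pdf
xlabel('Studentized mean'); ylabel('Density');
legend('Simulated distribution', 'N(0,1) pdf');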


4. Asymptotic Properties
4.2 Convergence in probability

Example

Let us consider the multiple linear regression model:

    y_i = X_i⊤ β + µ_i

where X_i = (x_i1, ..., x_iK)⊤ is a K × 1 vector of random variables, β = (β_1, ..., β_K)⊤ is a K × 1 vector of parameters, and where the error term µ_i satisfies E(µ_i) = 0 and E(µ_i | x_ij) = 0, ∀j = 1, ..., K. Question: show that the OLS estimator defined by:

    β̂ = (∑_{i=1}^n X_i X_i⊤)⁻¹ (∑_{i=1}^n X_i y_i)

is a consistent estimator of β.


4. Asymptotic Properties
4.2 Convergence in probability

Proof: Let us rewrite the OLS estimator as:

    β̂ = (∑_{i=1}^n X_i X_i⊤)⁻¹ (∑_{i=1}^n X_i y_i)

      = (∑_{i=1}^n X_i X_i⊤)⁻¹ (∑_{i=1}^n X_i (X_i⊤ β + µ_i))

      = (∑_{i=1}^n X_i X_i⊤)⁻¹ (∑_{i=1}^n X_i X_i⊤) β + (∑_{i=1}^n X_i X_i⊤)⁻¹ (∑_{i=1}^n X_i µ_i)

      = β + (∑_{i=1}^n X_i X_i⊤)⁻¹ (∑_{i=1}^n X_i µ_i)


4. Asymptotic Properties
4.2 Convergence in probability

Proof (cont'd): By multiplying and dividing by n, we get:

    β̂ = β + ((1/n) ∑_{i=1}^n X_i X_i⊤)⁻¹ ((1/n) ∑_{i=1}^n X_i µ_i)

1. By using the (weak) law of large numbers (Khinchine's theorem), we have:

    (1/n) ∑_{i=1}^n X_i X_i⊤ →p E(X_i X_i⊤)      (1/n) ∑_{i=1}^n X_i µ_i →p E(X_i µ_i)

2. By using Slutsky's theorem:

    β̂ →p β + E⁻¹(X_i X_i⊤) E(X_i µ_i)


4. Asymptotic Properties
4.2 Convergence in probability

Reminder: If X and Y are two random variables, then:

    E(X | Y) = 0  =⇒  E(XY) = 0

The converse is not true. Indeed:

    E(X | Y) = 0  =⇒  cov(X, Y) = E(XY) − E(X)E(Y) = 0  and  E(X) = 0

so that:

    E(X | Y) = 0  =⇒  E(XY) = 0


4. Asymptotic Properties
4.2 Convergence in probability

Proof (cont'd):

    β̂ →p β + E⁻¹(X_i X_i⊤) E(X_i µ_i)

Since:

    E(µ_i | X_ij) = 0, ∀j = 1, ..., K  ⇒  E(µ_i X_i) = 0_{K×1}

we have:

    β̂ →p β

The OLS estimator β̂ is (weakly) consistent. □
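Illustration: MATLAB code

A simulation sketch of this result: the OLS estimates concentrate around the true β as n grows. The design (a constant and one standard normal regressor) and β = (1, −0.5)⊤ are illustrative assumptions:

clear; clc; close all;

% Minimal sketch: OLS with random regressors for increasing n
beta = [1; -0.5]; % True parameters (assumption)

for n = [100 1000 100000]
    X = [ones(n,1) randn(n,1)]; % Random regressors (with a constant)
    y = X * beta + randn(n, 1); % Error independent of the regressors
    beta_hat = (X' * X) \ (X' * y);
    fprintf('n = %6d : beta_hat = (%.4f, %.4f)\n', n, beta_hat(1), beta_hat(2));
end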




Sub-Section 4.3

Convergence in Mean Square



4. Asymptotic Properties
4.3 Convergence in mean square

Definition (Convergence in mean square)

Let {X_i}, for i = 1, ..., n, be a sequence of real-valued random variables such that E(|X_n|²) < ∞. X_n converges in mean square to a constant c if:

    lim_{n→∞} E(|X_n − c|²) = 0

It is written:

    X_n →m.s. c
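Illustration: MATLAB code

A simulation sketch of this definition: for the sample mean, E|X̄_n − µ|² = σ²/n, which shrinks to zero. The values µ = 0, σ = 2 and K = 2000 are illustrative assumptions:

clear; clc; close all;

% Minimal sketch: Monte Carlo estimate of E|Xbar_n - mu|^2 for increasing n
mu = 0; sigma = 2; K = 2000; % Illustrative parameter values (assumptions)

for n = [10 100 1000 10000]
    Xbar = mean(normrnd(mu, sigma, n, K)); % K sample means of size n
    mse = mean((Xbar - mu).^2);            % Estimate of E|Xbar_n - mu|^2
    fprintf('n = %5d : MSE = %.5f (theory sigma^2/n = %.5f)\n', n, mse, sigma^2/n);
end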


4. Asymptotic Properties
4.3 Convergence in mean square

Remark: It is the least useful notion of convergence... except for proofs of convergence in probability.

Lemma (Chain of implication)

Convergence in mean square implies convergence in probability:

    →m.s.  =⇒  →p

where the symbol "=⇒" means "implies". The converse is not true.




Sub-Section 4.4

Convergence in Distribution



4. Asymptotic Properties
4.4 Convergence in distribution

Definition (Convergence in distribution)

Let X_n be a sequence of random variables indexed by the sample size, with cdf F_n(.). X_n converges in distribution to a random variable X with cdf F(.) if:

    lim_{n→∞} F_n(x) = F(x)   ∀x

It is written:

    X_n →d X


4. Asymptotic Properties
4.4 Convergence in distribution

Comment: In general, we have:

    X_n →d X          (random variable → random variable)

    X_n →p c          (random variable → constant)

In the case where:

    X_n →p X          (random variable → random variable)

it means:

    X_n − X →p 0      (random variable → constant)


4. Asymptotic Properties
4.4 Convergence in distribution

Lemma (Chain of implication)

Convergence in probability implies convergence in distribution:

    →p  =⇒  →d

where the symbol "=⇒" means "implies". The converse is not true.


4. Asymptotic Properties
4.4 Convergence in distribution

Definition (Asymptotic distribution)

If X_n converges in distribution to X, where F_n(.) is the cdf of X_n, then F(.) is the cdf of the limiting or asymptotic distribution of X_n.


4. Asymptotic Properties
4.4 Convergence in distribution

Notation. Generally, we denote:

    X_n →d L

where L is an asymptotic distribution. It means that X_n converges in distribution to a random variable X that has the distribution L.

Example

    X_n →d N(0, 1)

means that X_n converges to a random variable X that is normally distributed, or that X_n has an asymptotic standard normal distribution.


4. Asymptotic Properties
4.4 Convergence in distribution

Definition (Asymptotic mean and variance)

The asymptotic mean and variance of a random variable X_n are the mean and variance of the asymptotic or limiting distribution, assuming that the limiting distribution and its moments exist. These moments are denoted by:

    E_asy(X_n)     V_asy(X_n)


4. Asymptotic Properties
4.4 Convergence in distribution

Definition (Asymptotically normally distributed estimator)

A consistent estimator θ̂ of θ is said to be asymptotically normally distributed (or asymptotically normal) if:

    √n (θ̂ − θ_0) →d N(0, Σ_0)

Equivalently, θ̂ is asymptotically normal if:

    θ̂ ∼asy N(θ_0, n⁻¹ Σ_0)

The asymptotic variance of θ̂ is then defined by:

    V_asy(θ̂) ≡ avar(θ̂) = (1/n) Σ_0


Sub-Section 4.5

Asymptotic Distributions



4. Asymptotic Properties
4.5 Asymptotic distributions

Let's go back to our estimation problem.

We consider a (strongly) consistent estimator θ̂_n of the true parameter θ_0:

    θ̂_n →a.s. θ_0  =⇒  θ̂_n →p θ_0

This estimator has a degenerate asymptotic distribution (point-mass distribution), since when n → ∞ we have:

    lim_{n→∞} f_θ̂_n(x) = f(x)

where f_θ̂_n(.) is the pdf of θ̂_n and f(x) is defined by:

    f(x) = 1 if x = θ_0, and 0 otherwise


4. Asymptotic Properties
4.5 Asymptotic distributions

Conclusion: one needs more than consistency to do inference (tests about the true value of θ, etc.).

Solution: we transform the estimator θ̂n to obtain a variable that has a non-degenerate asymptotic distribution, from which inference can be conducted.

This is the general idea of the Central Limit Theorem for a particular estimator: the sample mean...


Theorem (Lindeberg–Levy Central Limit Theorem, univariate)

Let X1, .., Xn denote a sequence of independent and identically distributed random variables with finite mean E(Xi) = µ and finite variance V(Xi) = σ². Then the sample mean X̄n = n⁻¹ ∑_{i=1}^n Xi satisfies:

√n (X̄n − µ) →d N(0, σ²)


Comments:

1 The result is quite remarkable, as it holds regardless of the form of the parent distribution (the distribution of the Xi).

2 The central limit theorem requires virtually no assumptions (other than independence and finite variances) to end up with normality: normality is inherited from sums of "small" independent disturbances with finite variance.

Proof: Rao (1973).



Exercise: MATLAB code

Exercise: Central Limit Theorem

Write a well-commented MATLAB code to illustrate the Central Limit Theorem (CLT). Consider a random variable Xi ∼ χ²(2), such that E(Xi) = 2 and V(Xi) = 4.

1 Generate a sample of size n = 10 of i.i.d. random variables Xi ∼ χ²(2).
2 Compute the sample mean x̄n = (1/n) ∑_{i=1}^n xi.
3 Compute the transformed variable √n (x̄n − 2).
4 Repeat this procedure K = 5,000 times to obtain 5,000 realizations of the transformed variable √n (x̄n − 2).
5 Construct a histogram of these K realizations.
6 Compare this histogram to the probability density function (pdf) of the N(0, 4) distribution. Use the normpdf(x, m, sigma) function for the comparison.
7 Repeat the exercise with sample sizes of n = 100, n = 10,000, and n = 100,000, and comment on the results.



Solution: MATLAB code - part 1

clear; clc; close all;

% Parameters
n = [10, 100, 10000, 100000];  % Sample sizes (matching the exercise statement)
K = 5000;                      % Number of trials

% Preallocate matrix to store transformed sample means
Z = zeros(length(n), K);

% Loop over each sample size
for i = 1:length(n)
    % Draw K samples of size n(i) from a chi-square distribution
    % with 2 degrees of freedom
    % (Note: for n = 100,000 this matrix has 5e8 elements; reduce K if memory is tight)
    X = chi2rnd(2, n(i), K);
    % Compute the transformed sample mean for each sample
    Z(i, :) = sqrt(n(i)) * (mean(X) - 2);
end

% Generate the pdf of the normal distribution N(0, 4)
x = (-10:0.1:10);              % Vector of values from -10 to 10
pdf_normal = normpdf(x, 0, 2); % Pdf values of N(0, 4) (standard deviation = 2)



Solution: MATLAB code - part 2

% Plot the empirical distribution of the transformed sample means
figure;
for i = 1:length(n)
    subplot(2, 2, i);
    histogram(Z(i, :), 'Normalization', 'pdf', 'BinWidth', 0.5);
    hold on; grid on;
    plot(x, pdf_normal, 'r', 'LineWidth', 1.5); % Overlay the pdf of N(0, 4)
    xlabel('Transformed Variable'); ylabel('Density');
    xlim([-10, 10]);
    title(sprintf('Sample Size = %d', n(i)));
    legend('Empirical Distribution', 'Normal Distribution');
end



Solution: MATLAB code - Output

[Output figure: 2 × 2 panel of histograms of the K realizations of √n (x̄n − 2), one per sample size, with the N(0, 4) pdf overlaid.]

Definition

The convergence result (CLT):

√n (X̄n − µ) →d N(0, σ²)

can be understood as:

X̄n ≈asy N(µ, σ²/n)

where the symbol ≈asy means "asymptotically distributed as". The asymptotic mean and variance of the sample mean are then defined by:

Easy(X̄n) = µ     Vasy(X̄n) = σ²/n

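A quick numerical check of these asymptotic moments (illustrative, reusing the χ²(2) setting of the exercise above, so µ = 2 and σ² = 4):

% Check: for Xi ~ chi2(2), Xbar_n is approximately N(mu, sigma^2/n)
rng(1);
n = 10000; K = 2000;
xbar = mean(chi2rnd(2, n, K));   % K realizations of the sample mean
fprintf('mean(xbar) = %.4f   (mu = 2)\n', mean(xbar));
fprintf('var(xbar)  = %.6f  (sigma^2/n = %.6f)\n', var(xbar), 4/n);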

Proof: Assume that:

√n (X̄n − µ) →d N(0, σ²)

Then, consider a large but finite sample size n (e.g., n = 100,000):

√n (X̄n − µ) ≈asy N(0, σ²)

where the symbol ≈asy means "approximately distributed as". Since n is finite, this expression can be written as:

X̄n − µ ≈asy N(0, σ²/n)

or:

X̄n ≈asy N(µ, σ²/n)



Speed of convergence: why study √n X̄n = n^{1/2} X̄n in the CLT?

1 For simplicity, let us assume that µ = E(Xi) = 0, and let us study the asymptotic behavior of n^α X̄n for any α:

V(n^α X̄n) = n^{2α} V(X̄n) = n^{2α} σ²/n = n^{2α−1} σ²

2 If we assume that α > 1/2, then 2α − 1 > 0 and the asymptotic variance of n^α X̄n is infinite:

lim_{n→∞} V(n^α X̄n) = +∞


3 If we assume that α < 1/2, then 2α − 1 < 0 and n^α X̄n has a degenerate distribution:

lim_{n→∞} V(n^α X̄n) = 0

4 As a consequence, α = 1/2 is the only choice that yields a finite and positive variance:

V(√n X̄n) = σ²

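This scaling argument can be checked by simulation. A minimal sketch (illustrative; here Xi ∼ N(0, 1), so µ = 0 and σ² = 1): the empirical variance of n^α X̄n vanishes for α = 0.25, stays near σ² for α = 0.5, and explodes for α = 0.75.

% Empirical variance of n^alpha * Xbar_n for alpha below, at, and above 1/2
rng(1);
K = 5000;                                   % Monte Carlo replications
nvals  = [10, 100, 1000, 10000];
alphas = [0.25, 0.50, 0.75];
V = zeros(length(alphas), length(nvals));
for j = 1:length(nvals)
    xbar = mean(randn(nvals(j), K));        % K sample means (mu = 0, sigma^2 = 1)
    for a = 1:length(alphas)
        V(a, j) = var(nvals(j)^alphas(a) * xbar);
    end
end
disp(V);   % rows: alpha = 0.25 (-> 0), alpha = 0.50 (-> 1), alpha = 0.75 (-> infinity)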

Summary: Let X1, .., Xn denote a sequence of independent and identically distributed random variables with finite mean E(Xi) = µ and finite variance V(Xi) = σ². Then the sample mean:

X̄n = (1/n) ∑_{i=1}^n Xi

satisfies:

WLLN: X̄n →p µ

CLT: √n (X̄n − µ) →d N(0, σ²)


The central limit theorem does not assert that the sample mean itself tends to normality: it is the transformation √n (X̄n − µ) of the sample mean that has this property.

WLLN: X̄n →p µ

CLT: √n (X̄n − µ) →d N(0, σ²)


Theorem (Lindeberg–Levy Central Limit Theorem, multivariate)

Let X1, .., Xn denote a sequence of independent and identically distributed K × 1 random vectors with finite mean E(Xi) = µ and finite K × K variance-covariance matrix V(Xi) = Σ. Then the sample mean X̄n = n⁻¹ ∑_{i=1}^n Xi satisfies:

√n (X̄n − µ) →d N(0, Σ)

where √n (X̄n − µ) and 0 are K × 1 vectors and Σ is a K × K matrix.

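A minimal MATLAB sketch of the multivariate case (illustrative; the two components, their distributions, and the names are chosen for the example): with K = 2 independent components Xi1 ∼ χ²(2) and Xi2 ∼ Exp(1), we have µ = (2, 1)⊤ and Σ = diag(4, 1), and the simulated covariance matrix of √n (X̄n − µ) is close to Σ.

% Multivariate CLT check with K = 2 (independent components for simplicity)
rng(1);
n = 5000; R = 5000;                           % sample size, Monte Carlo replications
mu = [2, 1];                                  % E(Xi): chi2(2) has mean 2, Exp(1) has mean 1
Z = zeros(R, 2);
for r = 1:R
    X = [chi2rnd(2, n, 1), exprnd(1, n, 1)];  % n draws of the 2 x 1 vector Xi
    Z(r, :) = sqrt(n) * (mean(X) - mu);       % sqrt(n)(Xbar_n - mu)
end
disp(mean(Z));   % close to (0, 0)
disp(cov(Z));    % close to Sigma = diag([4, 1])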

Remark: there exist other versions of the CLT, especially for independent but not
identically (heterogeneously) distributed variables:

1 Lindeberg–Feller Central Limit Theorem for unequal variances.

2 Liapounov Central Limit Theorem for unequal means and variances.

For more details, see Greene (2018).


Question: from the CLT (univariate or multivariate) and the asymptotic distribution of X̄n, how can we derive the asymptotic distribution of an estimator θ̂ that depends on the sample mean?

θ̂ = g(X̄n) ≈asy ???


Theorem (Continuous mapping theorem)

Let {Xn} be a sequence of real-valued random variables and g(.) a continuous function:

if Xn →a.s. X then g(Xn) →a.s. g(X)

if Xn →p X then g(Xn) →p g(X)

if Xn →d X then g(Xn) →d g(X)

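A minimal sketch of the second statement (illustrative; the function g and the parameters are chosen for the example): with Xi ∼ N(1, 1), X̄n →p 1, and for the continuous function g(x) = exp(x) the theorem gives g(X̄n) →p exp(1).

% CMT illustration: Xbar_n ->p mu implies exp(Xbar_n) ->p exp(mu)
rng(1);
K = 500;
for n = [10, 1000, 100000]
    gxbar = exp(mean(1 + randn(n, K)));            % g(Xbar_n) across K samples
    fprintf('n = %6d:  P(|exp(Xbar_n) - e| > 0.1) = %.3f\n', ...
            n, mean(abs(gxbar - exp(1)) > 0.1));   % tends to 0 as n grows
end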

Example (multiple linear regression model)

Let us consider the multiple linear regression model:

yi = Xi⊤ β + µi

where Xi = (xi1, .., xiK)⊤ is a K × 1 vector of random variables, β = (β1, .., βK)⊤ is a K × 1 vector of parameters, and where the error term µi satisfies E(µi) = 0, V(µi) = σ², and E(µi | xij) = 0 for all j = 1, .., K.

Question: show that the OLS estimator satisfies:

√n (β̂ − β0) →d N(0, σ² E⁻¹(Xi Xi⊤))


Proof:

1 Rewrite the OLS estimator as:

β̂ = (∑_{i=1}^n Xi Xi⊤)⁻¹ (∑_{i=1}^n Xi yi) = β0 + (∑_{i=1}^n Xi Xi⊤)⁻¹ (∑_{i=1}^n Xi µi)

2 Normalize the vector β̂ − β0:

√n (β̂ − β0) = ((1/n) ∑_{i=1}^n Xi Xi⊤)⁻¹ √n ((1/n) ∑_{i=1}^n Xi µi)


Reminder: if X is a vector of random variables and Y is a scalar random variable such that E(XY) = 0, then:

V(XY) = E(X E(Y² | X) X⊤)


Proof (cont'd):

3 Using the WLLN and the CMT (continuous mapping theorem):

((1/n) ∑_{i=1}^n Xi Xi⊤)⁻¹ →p E⁻¹(Xi Xi⊤)

4 Using the CLT:

√n ((1/n) ∑_{i=1}^n Xi µi − E(Xi µi)) →d N(0, V(Xi µi))

with E(µi | xik) = 0 for all k = 1, .., K =⇒ E(Xi µi) = 0, and:

V(Xi µi) = E(Xi µi µi Xi⊤) = E(E(Xi µi µi Xi⊤ | Xi)) = E(Xi V(µi | Xi) Xi⊤) = σ² E(Xi Xi⊤)


Proof (cont'd). We have:

((1/n) ∑_{i=1}^n Xi Xi⊤)⁻¹ →p E⁻¹(Xi Xi⊤)

√n ((1/n) ∑_{i=1}^n Xi µi) →d N(0, σ² E(Xi Xi⊤))


Theorem (Slutsky's theorem for convergence in distribution)

Let Xn and Yn be two sequences of random variables where Xn →d X and Yn →p c, with c ≠ 0. Then:

Xn + Yn →d X + c

Xn Yn →d cX

Xn / Yn →d X / c

If Yn and Xn are matrices/vectors, then Yn⁻¹ Xn →d c⁻¹ X, with V(c⁻¹ X) = c⁻¹ V c⁻¹⊤, where V = V(X).

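A minimal sketch of the ratio result (illustrative; the distributions and names are chosen for the example): with Xi ∼ U(0, 1), Xn = √n (X̄n − 1/2)/√(1/12) →d N(0, 1) by the CLT, and Yn = 2 + (X̄n − 1/2) →p 2, so Xn/Yn →d N(0, 1/4).

% Slutsky illustration: Xn ->d N(0,1) and Yn ->p 2, hence Xn./Yn ->d N(0, 1/4)
rng(1);
n = 2000; K = 20000;
U = rand(n, K);                               % Xi ~ U(0,1): mu = 1/2, sigma^2 = 1/12
xbar = mean(U);
Xn = sqrt(n) * (xbar - 0.5) / sqrt(1/12);     % ->d N(0,1) by the CLT
Yn = 2 + (xbar - 0.5);                        % ->p 2
fprintf('var(Xn./Yn) = %.4f   (theory: 0.25)\n', var(Xn ./ Yn));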

Proof (cont'd). By using Slutsky's theorem (for convergence in distribution), we have:

√n (β̂ − β0) = ((1/n) ∑_{i=1}^n Xi Xi⊤)⁻¹ √n ((1/n) ∑_{i=1}^n Xi µi) →d N(Π, Ω)

with:

Π = E⁻¹(Xi Xi⊤) × 0 = 0

Ω = E⁻¹(Xi Xi⊤) × σ² E(Xi Xi⊤) × E⁻¹(Xi Xi⊤) = σ² E⁻¹(Xi Xi⊤)

Finally, we have:

√n (β̂ − β0) →d N(0, σ² E⁻¹(Xi Xi⊤))  □

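A Monte Carlo check of this result (illustrative; the regressors, the parameter values, and the names are chosen for the example): with Xi = (1, xi2)⊤, xi2 ∼ N(0, 1), and i.i.d. N(0, σ²) errors, E(Xi Xi⊤) = I2, so the covariance of √n (β̂ − β0) should be close to σ² I2.

% Monte Carlo check of the OLS asymptotic distribution
rng(1);
n = 2000; R = 5000;
beta0 = [1; 0.5]; sigma = 2;
D = zeros(R, 2);
for r = 1:R
    X = [ones(n, 1), randn(n, 1)];        % Xi = (1, xi2)', so E(Xi Xi') = eye(2)
    y = X * beta0 + sigma * randn(n, 1);  % i.i.d. N(0, sigma^2) errors
    beta_hat = X \ y;                     % OLS estimator
    D(r, :) = sqrt(n) * (beta_hat - beta0)';
end
disp(cov(D));    % close to sigma^2 * inv(E(Xi Xi')) = 4 * eye(2)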

Definition (Univariate Delta method)

Let Zn be a sequence of random variables indexed by the sample size n such that:

√n (Zn − µ) →d N(0, σ²)

If g(.) is a continuous and continuously differentiable function not involving n, with ∂g(x)/∂x |µ ≠ 0, then:

√n (g(Zn) − g(µ)) →d N(0, σ² (∂g(x)/∂x |µ)²)
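A minimal sketch of the univariate delta method (illustrative; the function g(x) = x² and the parameter values are chosen for the example): with Zn = X̄n for Xi ∼ N(µ, σ²), µ = 3 and σ = 2, the result gives √n (X̄n² − µ²) →d N(0, σ² (2µ)²) = N(0, 144).

% Delta method illustration with g(x) = x^2, so dg/dx at mu is 2*mu
rng(1);
n = 10000; K = 5000; mu = 3; sigma = 2;
xbar = mean(mu + sigma * randn(n, K));    % K sample means
Z = sqrt(n) * (xbar.^2 - mu^2);           % sqrt(n)(g(Xbar_n) - g(mu))
fprintf('var(Z) = %.2f   (theory: sigma^2*(2*mu)^2 = %.2f)\n', ...
        var(Z), sigma^2 * (2*mu)^2);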

Multivariate Delta method: Let Zn be a sequence of random vectors indexed by the sample size n such that:

√n (Zn − µ) →d N(0, Σ)

If g(.) is a continuous and continuously differentiable multivariate function not involving n, with a non-zero gradient at µ, then:

√n (g(Zn) − g(µ)) →d N(0, (∂g(X)/∂X⊤ |µ) × Σ × (∂g(X)/∂X |µ))


Example (Gamma distribution)

Let X1, .., Xn denote a sequence of independent and identically distributed random variables. We assume that Xi ∼ Γ(α, β) (gamma distribution) with E(X) = αβ and V(X) = αβ², where α > 0 and β > 0. The probability density function (pdf) is defined by:

fX(x; α, β) = x^{α−1} exp(−x/β) / (Γ(α) β^α)

for x ∈ [0, +∞), where Γ(α) = ∫₀^∞ t^{α−1} exp(−t) dt denotes the Gamma function. We assume that α is known.

Question: what is the asymptotic distribution of the estimator β̂ defined by:

β̂ = (1/(αn)) ∑_{i=1}^n Xi

Solution: The estimator β̂ is defined by:

β̂ = (1/(αn)) ∑_{i=1}^n Xi

Since X1, .., Xn are i.i.d. with E(X) = αβ and V(X) = αβ², we can apply the Lindeberg–Levy CLT, and we get immediately:

√n (X̄n − αβ) →d N(0, αβ²)


Solution (cont'd): Define g(x) = x/α, so that:

g(E(X̄n)) = g(αβ) = β

β̂ = (1/α) X̄n = g(X̄n)

√n (X̄n − αβ) →d N(0, αβ²)

Since ∂g(z)/∂z = ∂(z/α)/∂z = 1/α ≠ 0, the delta method gives:

√n (g(X̄n) − g(αβ)) →d N(0, αβ² (∂g(z)/∂z |αβ)²)

and hence:

√n (β̂ − β) →d N(0, β²/α)  □

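A Monte Carlo check of this result (illustrative; the parameter values and names are chosen for the example; note that MATLAB's gamrnd(a, b) uses the same shape-scale parameterization as above, with mean ab and variance ab²):

% Monte Carlo check: sqrt(n)(beta_hat - beta) ->d N(0, beta^2/alpha)
rng(1);
a = 3; b = 2;                    % alpha = 3 (known), beta = 2
n = 10000; K = 5000;
X = gamrnd(a, b, n, K);          % n x K draws from Gamma(alpha, beta)
beta_hat = mean(X) / a;          % K realizations of beta_hat
Z = sqrt(n) * (beta_hat - b);
fprintf('var(Z) = %.4f   (theory: beta^2/alpha = %.4f)\n', var(Z), b^2/a);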


4. Asymptotic Properties

Key Concepts

1 Almost sure convergence.
2 Convergence in probability.
3 Law of large numbers: Khinchine's and Kolmogorov's theorems.
4 Weakly and strongly consistent estimators.
5 Slutsky's theorem.
6 Convergence in mean square.
7 Convergence in distribution.
8 Asymptotic distribution and asymptotic variance.
9 Lindeberg–Levy Central Limit Theorem (univariate and multivariate).
10 Continuous mapping theorem.
11 Delta method.



References

Abadir, K. M. and Magnus, J. R. (2002). Notation in econometrics: a proposal for a standard. The Econometrics Journal, 5(1):76–90.

Greene, W. H. (2018). Econometric Analysis, 8th edition. Pearson - Prentice Hall, Upper Saddle River, NJ.

Khinchine, A. (1929). Sur la loi des grands nombres. Comptes Rendus de l'Académie des Sciences.

Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edition. John Wiley & Sons, New York.



End of Chapter 1

Christophe Hurlin (University of Orléans)
