556: MATHEMATICAL STATISTICS I

CHAPTER 5: STOCHASTIC CONVERGENCE

The following definitions are stated in terms of scalar random variables, but they extend naturally to vector random variables defined on the same probability space with measure $P$. For example, some results are stated in terms of the Euclidean distance: in one dimension $|X_n - X| = \sqrt{(X_n - X)^2}$, or, for sequences of $k$-dimensional random variables $X_n = (X_{n1}, \ldots, X_{nk})^\top$,
$$\|X_n - X\| = \left( \sum_{j=1}^{k} (X_{nj} - X_j)^2 \right)^{1/2}.$$

5.1 Convergence in Distribution


Consider a sequence of random variables $X_1, X_2, \ldots$ and the corresponding sequence of cdfs $F_{X_1}, F_{X_2}, \ldots$, so that for $n = 1, 2, \ldots$, $F_{X_n}(x) = P[X_n \leq x]$. Suppose that there exists a cdf $F_X$ such that, for all $x$ at which $F_X$ is continuous,
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x).$$
Then the sequence $\{X_n\}$ converges in distribution to the random variable $X$ with cdf $F_X$, denoted
$$X_n \overset{d}{\longrightarrow} X,$$
and $F_X$ is the limiting distribution. Convergence of the corresponding sequence of mgfs (or cfs) also indicates convergence in distribution: if, for all $t$ at which $M_X(t)$ is defined, $M_{X_n}(t) \longrightarrow M_X(t)$ as $n \to \infty$, then
$$X_n \overset{d}{\longrightarrow} X.$$

Definition : DEGENERATE DISTRIBUTIONS

The sequence of random variables $\{X_n\}$ converges in distribution to the constant $c$ if its limiting distribution is degenerate at $c$, that is, $X_n \overset{d}{\longrightarrow} X$ with $P[X = c] = 1$, so that
$$F_X(x) = \begin{cases} 0 & x < c \\ 1 & x \geq c. \end{cases}$$

Interpretation: A special case of convergence in distribution occurs when the limiting distribution is discrete, with the probability mass function non-zero at a single value only: if the limiting random variable is $X$, then $P[X = c] = 1$, with zero probability elsewhere. The sequence $\{X_n\}$ converges in distribution to $c$ if and only if, for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|X_n - c| < \epsilon\,] = 1.$$
That is, convergence in distribution to a constant $c$ occurs if and only if the probability mass becomes increasingly concentrated around $c$ as $n \to \infty$.
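To see this concentration numerically, here is a minimal simulation sketch (assuming Python with NumPy; the sequence $X_n \sim N(c, 1/n)$ is an illustrative choice, not taken from the notes) estimating $P[|X_n - c| < \epsilon]$ for increasing $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
c, eps, reps = 2.0, 0.1, 100_000

# Illustrative sequence X_n ~ N(c, 1/n): the distribution piles up
# around c, so X_n converges in distribution to the constant c.
for n in [1, 10, 100, 1000]:
    x = rng.normal(loc=c, scale=1 / np.sqrt(n), size=reps)
    print(n, np.mean(np.abs(x - c) < eps))  # tends to 1 as n grows
```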

Note: Points of Discontinuity


To see why we ignore points of discontinuity of $F_X$ in the definition of convergence in distribution, consider the following example: let
$$F_\epsilon(x) = \begin{cases} 0 & x < \epsilon \\ 1 & x \geq \epsilon \end{cases}$$
be the cdf of a degenerate distribution with probability mass 1 at $x = \epsilon$. First consider a sequence $\{\epsilon_n\}$ of real values converging to $\epsilon$ from below. Then, as $\epsilon_n < \epsilon$, we have
$$F_{\epsilon_n}(x) = \begin{cases} 0 & x < \epsilon_n \\ 1 & x \geq \epsilon_n \end{cases}$$
which converges to $F_\epsilon(x)$ at every real $x$. However, if instead $\{\epsilon_n\}$ converges to $\epsilon$ from above, then $F_{\epsilon_n}(\epsilon) = 0$ for each finite $n$, as $\epsilon_n > \epsilon$, so
$$\lim_{n \to \infty} F_{\epsilon_n}(\epsilon) = 0.$$
Hence, as $n \to \infty$,
$$F_{\epsilon_n}(\epsilon) \longrightarrow 0 \neq 1 = F_\epsilon(\epsilon).$$
Thus the pointwise limiting function in this case is
$$G(x) = \begin{cases} 0 & x \leq \epsilon \\ 1 & x > \epsilon \end{cases}$$
which is not a cdf, as it is not right-continuous. However, if $\{X_n\}$ and $X$ are random variables with distributions $\{F_{\epsilon_n}\}$ and $F_\epsilon$, then $P[X_n = \epsilon_n] = 1$ converges to $P[X = \epsilon] = 1$ however we take the limit, so $F_\epsilon$ does describe the limiting distribution of the sequence $\{F_{\epsilon_n}\}$. The disagreement between $G$ and $F_\epsilon$ occurs only at $x = \epsilon$, the single point at which $F_\epsilon$ is discontinuous; by excluding points of discontinuity of the limit from the definition, we obtain the natural conclusion $X_n \overset{d}{\longrightarrow} X$ in both cases.
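A tiny numerical check of the two one-sided limits in this example (a sketch, assuming Python):

```python
eps = 1.0

def F(x, a):
    # cdf of the distribution degenerate at a
    return 0.0 if x < a else 1.0

# eps_n -> eps from below: F_{eps_n}(eps) = 1 for every n
print([F(eps, eps - 1 / n) for n in (1, 10, 100)])  # [1.0, 1.0, 1.0]
# eps_n -> eps from above: F_{eps_n}(eps) = 0 for every n
print([F(eps, eps + 1 / n) for n in (1, 10, 100)])  # [0.0, 0.0, 0.0]
```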

5.2 Convergence in Probability


Definition : CONVERGENCE IN PROBABILITY TO A CONSTANT
The sequence of random variables $\{X_n\}$ converges in probability to the constant $c$, denoted $X_n \overset{p}{\longrightarrow} c$, if, for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|X_n - c| < \epsilon\,] = 1 \quad \text{or equivalently} \quad \lim_{n \to \infty} P[\,|X_n - c| \geq \epsilon\,] = 0,$$
that is, if the limiting distribution of $\{X_n\}$ is degenerate at $c$.
Interpretation : Convergence in probability to a constant is precisely equivalent to convergence in distribution to a constant.

THEOREM (WEAK LAW OF LARGE NUMBERS)


Suppose that $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with expectation $\mu$ and finite variance $\sigma^2$. Let $Y_n$ be defined by
$$Y_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Then, for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|Y_n - \mu| < \epsilon\,] = 1,$$
that is, $Y_n \overset{p}{\longrightarrow} \mu$: the sample mean of $X_1, \ldots, X_n$ converges in probability to $\mu$.

Proof. Using the properties of expectation, it can be shown that $Y_n$ has expectation $\mu$ and variance $\sigma^2/n$, and hence, by the Chebychev inequality,
$$P[\,|Y_n - \mu| \geq \epsilon\,] \leq \frac{\sigma^2}{n\epsilon^2} \longrightarrow 0 \quad \text{as } n \to \infty$$
for all $\epsilon > 0$. Hence
$$P[\,|Y_n - \mu| < \epsilon\,] \longrightarrow 1 \quad \text{as } n \to \infty$$
and $Y_n \overset{p}{\longrightarrow} \mu$.
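As a concrete check, the following sketch (assuming Python with NumPy; Uniform(0,1) draws, so $\mu = 1/2$ and $\sigma^2 = 1/12$, are an illustrative choice) estimates $P[|Y_n - \mu| \geq \epsilon]$ by Monte Carlo and compares it with the Chebychev bound $\sigma^2/(n\epsilon^2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, var, eps, reps = 0.5, 1 / 12, 0.05, 10_000

# Uniform(0,1) draws: mu = 1/2, sigma^2 = 1/12 (illustrative choice)
for n in [10, 100, 1000]:
    ybar = rng.random((reps, n)).mean(axis=1)      # replicated sample means Y_n
    p_hat = np.mean(np.abs(ybar - mu) >= eps)      # estimate of P[|Y_n - mu| >= eps]
    print(f"n={n:5d}  P_hat={p_hat:.4f}  bound={var / (n * eps**2):.4f}")
```

The bound is loose, but both quantities tend to zero, which is all the proof requires.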

Definition : CONVERGENCE IN PROBABILITY TO A RANDOM VARIABLE
The sequence of random variables $\{X_n\}$ converges in probability to the random variable $X$, denoted $X_n \overset{p}{\longrightarrow} X$, if, for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|X_n - X| < \epsilon\,] = 1 \quad \text{or equivalently} \quad \lim_{n \to \infty} P[\,|X_n - X| \geq \epsilon\,] = 0.$$

To understand this definition, let $\epsilon > 0$ and consider the events
$$A_n(\epsilon) \equiv \{\omega : |X_n(\omega) - X(\omega)| \geq \epsilon\}.$$
Then $X_n \overset{p}{\longrightarrow} X$ if
$$\lim_{n \to \infty} P(A_n(\epsilon)) = 0,$$
that is, if for every $\delta > 0$ there exists an $n$ such that, for all $m \geq n$, $P(A_m(\epsilon)) < \delta$.
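For concreteness, a simulation sketch (assuming Python with NumPy; the sequence $X_n = X + Z_n/\sqrt{n}$, with $X$ and the $Z_n$ independent standard normals, is an illustrative assumption) estimating $P(A_n(\epsilon))$:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, reps = 0.1, 100_000
x = rng.normal(size=reps)                            # realisations of X

for n in [10, 100, 1000, 10_000]:
    xn = x + rng.normal(size=reps) / np.sqrt(n)      # X_n = X + Z_n / sqrt(n)
    print(n, np.mean(np.abs(xn - x) >= eps))         # estimate of P(A_n(eps)) -> 0
```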

5.3 Convergence Almost Surely


The sequence of random variables $\{X_n\}$ converges almost surely to the random variable $X$, denoted $X_n \overset{a.s.}{\longrightarrow} X$, if, for every $\epsilon > 0$,
$$P\left[ \lim_{n \to \infty} |X_n - X| < \epsilon \right] = 1,$$
that is, if $A \equiv \{\omega : X_n(\omega) \longrightarrow X(\omega)\}$, then $P(A) = 1$. Equivalently, $X_n \overset{a.s.}{\longrightarrow} X$ if, for every $\epsilon > 0$,
$$P\left[ \lim_{n \to \infty} |X_n - X| > \epsilon \right] = 0.$$

This can also be written
$$\lim_{n \to \infty} X_n(\omega) = X(\omega)$$
for every $\omega \in \Omega$, except possibly those lying in a set of probability zero under $P$.

Alternative characterization:

• Let $\epsilon > 0$, and define the sets $A_n(\epsilon)$ and $B_m(\epsilon)$ for $n, m \geq 0$ by
$$A_n(\epsilon) \equiv \{\omega : |X_n(\omega) - X(\omega)| > \epsilon\}, \qquad B_m(\epsilon) \equiv \bigcup_{n=m}^{\infty} A_n(\epsilon).$$
Then $X_n \overset{a.s.}{\longrightarrow} X$ if and only if $P(B_m(\epsilon)) \longrightarrow 0$ as $m \to \infty$, for every $\epsilon > 0$.
Interpretation:
– The event $A_n(\epsilon)$ corresponds to the set of $\omega$ for which $X_n(\omega)$ is more than $\epsilon$ away from $X(\omega)$.
– The event $B_m(\epsilon)$ corresponds to the set of $\omega$ for which $X_n(\omega)$ is more than $\epsilon$ away from $X(\omega)$ for at least one $n \geq m$; that is, $B_m(\epsilon)$ occurs if there exists an $n \geq m$ such that $|X_n - X| > \epsilon$.
– $X_n \overset{a.s.}{\longrightarrow} X$ if and only if $P(B_m(\epsilon)) \longrightarrow 0$.

• $X_n \overset{a.s.}{\longrightarrow} X$ if and only if
$$P[\, |X_n - X| > \epsilon \ \text{infinitely often} \,] = 0,$$
that is, $X_n \overset{a.s.}{\longrightarrow} X$ if and only if, for every $\epsilon > 0$, $|X_n(\omega) - X(\omega)| > \epsilon$ for only finitely many $n$, for all $\omega$ outside a set of probability zero.

• Note that $X_n \overset{a.s.}{\longrightarrow} X$ if and only if
$$\lim_{m \to \infty} P(B_m(\epsilon)) = \lim_{m \to \infty} P\left( \bigcup_{n=m}^{\infty} A_n(\epsilon) \right) = 0,$$
in contrast with the definition of convergence in probability, where $X_n \overset{p}{\longrightarrow} X$ if
$$\lim_{m \to \infty} P(A_m(\epsilon)) = 0.$$
Clearly
$$A_m(\epsilon) \subseteq \bigcup_{n=m}^{\infty} A_n(\epsilon) = B_m(\epsilon),$$
and hence almost sure convergence is a stronger form; a simulation sketch contrasting the two probabilities follows this list.
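The following sketch contrasts the two quantities (assuming Python with NumPy; the sequence $X_n = X + Z_n/n$ with i.i.d. standard normal $Z_n$ is an illustrative assumption, and the supremum over $n \geq m$ is necessarily truncated at a finite horizon $N$):

```python
import numpy as np

rng = np.random.default_rng(3)
eps, reps, N = 0.05, 5000, 2000
n = np.arange(1, N + 1)

# Deviations |X_n - X| = |Z_n| / n along each replicated path
dev = np.abs(rng.normal(size=(reps, N))) / n         # broadcasts over n

for m in [1, 10, 100, 1000]:
    p_A = np.mean(dev[:, m - 1] > eps)               # estimate of P(A_m(eps))
    p_B = np.mean(dev[:, m - 1:].max(axis=1) > eps)  # estimate of P(B_m(eps)), horizon N
    print(f"m={m:5d}  P(A_m)~{p_A:.4f}  P(B_m)~{p_B:.4f}")
```

For moderate $m$ the estimate of $P(B_m(\epsilon))$ visibly dominates that of $P(A_m(\epsilon))$, illustrating why almost sure convergence is the stronger requirement.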

Alternative terminology:
• $X_n \overset{a.e.}{\longrightarrow} X$: $X_n$ converges to $X$ almost everywhere;
• $X_n \overset{w.p.1}{\longrightarrow} X$: $X_n$ converges to $X$ with probability 1.

Interpretation: A random variable is a real-valued (measurable) function from the sample space $\Omega$ to $\mathbb{R}$, so the sequence of random variables $X_1, X_2, \ldots$ corresponds to a sequence of functions defined on the elements of $\Omega$. Almost sure convergence requires that the sequence of real numbers $X_n(\omega)$ converges to $X(\omega)$ (as a real sequence) as $n \to \infty$ for all $\omega \in \Omega$, except perhaps when $\omega$ lies in a set having probability zero under $P$.

THEOREM (STRONG LAW OF LARGE NUMBERS)


Suppose that $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with expectation $\mu$ and (finite) variance $\sigma^2$. Let $Y_n$ be defined by
$$Y_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Then, for all $\epsilon > 0$,
$$P\left[ \lim_{n \to \infty} |Y_n - \mu| < \epsilon \right] = 1,$$
that is, $Y_n \overset{a.s.}{\longrightarrow} \mu$: the sample mean of $X_1, \ldots, X_n$ converges almost surely to $\mu$.
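A path-wise illustration (a sketch assuming Python with NumPy; Exponential(1) variables, so $\mu = 1$, are an illustrative choice): almost sure convergence says each trajectory of running means eventually enters and stays inside $(\mu - \epsilon, \mu + \epsilon)$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, paths, eps, mu = 100_000, 5, 0.01, 1.0

x = rng.exponential(scale=1.0, size=(paths, N))      # Exponential(1): mu = 1
running_mean = np.cumsum(x, axis=1) / np.arange(1, N + 1)

# Index of the last time each path lies outside (mu - eps, mu + eps),
# within the finite horizon N; beyond it the trajectory stays within eps.
outside = np.abs(running_mean - mu) >= eps
last_exit = [int(np.nonzero(row)[0].max()) if row.any() else -1 for row in outside]
print(last_exit)
```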

5.4 Convergence In rth Mean


The sequence of random variables $\{X_n\}$ converges in $r$th mean to the random variable $X$, denoted $X_n \overset{r}{\longrightarrow} X$, if
$$\lim_{n \to \infty} E[\,|X_n - X|^r\,] = 0.$$

For example, if
$$\lim_{n \to \infty} E[(X_n - X)^2] = 0,$$
then we write $X_n \overset{r=2}{\longrightarrow} X$. In this case, we say that $\{X_n\}$ converges to $X$ in mean-square or in quadratic mean.
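For the sample mean $Y_n$ of the laws of large numbers above, $E[(Y_n - \mu)^2] = \sigma^2/n \longrightarrow 0$, so $Y_n \overset{r=2}{\longrightarrow} \mu$. A quick Monte Carlo check (a sketch assuming Python with NumPy, again with Uniform(0,1) draws so $\sigma^2 = 1/12$):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, var, reps = 0.5, 1 / 12, 10_000

for n in [10, 100, 1000]:
    ybar = rng.random((reps, n)).mean(axis=1)        # replicated sample means Y_n
    mse = np.mean((ybar - mu) ** 2)                  # estimates E[(Y_n - mu)^2]
    print(f"n={n:5d}  MSE~{mse:.6f}  sigma^2/n={var / n:.6f}")
```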

THEOREM
For $r_1 > r_2 \geq 1$,
$$X_n \overset{r=r_1}{\longrightarrow} X \implies X_n \overset{r=r_2}{\longrightarrow} X.$$

Proof. By Lyapunov's inequality,
$$E[\,|X_n - X|^{r_2}\,]^{1/r_2} \leq E[\,|X_n - X|^{r_1}\,]^{1/r_1},$$
so that
$$E[\,|X_n - X|^{r_2}\,] \leq E[\,|X_n - X|^{r_1}\,]^{r_2/r_1} \longrightarrow 0$$
as $n \to \infty$, as $r_2 < r_1$. Thus
$$E[\,|X_n - X|^{r_2}\,] \longrightarrow 0$$
and $X_n \overset{r=r_2}{\longrightarrow} X$. The converse does not hold in general.
THEOREM (RELATING THE MODES OF CONVERGENCE)
For a sequence of random variables $\{X_n\}$, the following relationships hold:
$$\left. \begin{array}{c} X_n \overset{a.s.}{\longrightarrow} X \\ \text{or} \\ X_n \overset{r}{\longrightarrow} X \end{array} \right\} \implies X_n \overset{p}{\longrightarrow} X \implies X_n \overset{d}{\longrightarrow} X,$$
so almost sure convergence and convergence in $r$th mean for some $r$ both imply convergence in probability, which in turn implies convergence in distribution to the random variable $X$.
No other relationships hold in general.

THEOREM (Partial Converses: NOT EXAMINABLE)

(i) If
$$\sum_{n=1}^{\infty} P[\,|X_n - X| > \epsilon\,] < \infty$$
for every $\epsilon > 0$, then $X_n \overset{a.s.}{\longrightarrow} X$.

(ii) If, for some positive integer $r$,
$$\sum_{n=1}^{\infty} E[\,|X_n - X|^r\,] < \infty,$$
then $X_n \overset{a.s.}{\longrightarrow} X$.
THEOREM (Slutsky's Theorem)
Suppose that
$$X_n \overset{d}{\longrightarrow} X \quad \text{and} \quad Y_n \overset{p}{\longrightarrow} c.$$
Then
(i) $X_n + Y_n \overset{d}{\longrightarrow} X + c$;
(ii) $X_n Y_n \overset{d}{\longrightarrow} cX$;
(iii) $X_n / Y_n \overset{d}{\longrightarrow} X/c$, provided $c \neq 0$.
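A simulation sketch of case (iii) (assuming Python with NumPy; the concrete choices, a standardized sample mean for $X_n$ and the ratio $Y_n = S_n/\sigma$ of sample to true standard deviation, which converges in probability to 1, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 200, 50_000
mu, sigma = 3.0, 2.0

data = rng.normal(mu, sigma, size=(reps, n))
xn = np.sqrt(n) * (data.mean(axis=1) - mu) / sigma   # X_n -> N(0,1) in distribution
yn = data.std(axis=1, ddof=1) / sigma                # Y_n -> 1 in probability
t = xn / yn                                          # Slutsky (iii): -> N(0,1)

# Tail probability P[T > 2] should be close to 1 - Phi(2) ~ 0.0228
print(np.mean(t > 2.0))
```

This is essentially the argument that lets an unknown $\sigma$ in a standardized mean be replaced by the sample standard deviation without changing the limiting distribution.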

5.5 The Central Limit Theorem
THEOREM (THE LINDEBERG-LÉVY CENTRAL LIMIT THEOREM)
Suppose $X_1, \ldots, X_n$ are i.i.d. random variables with mgf $M_X$, and with expectation $\mu$ and variance $\sigma^2$, both finite. Let the random variable $Z_n$ be defined by
$$Z_n = \frac{\displaystyle\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma},$$
where
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i,$$
and denote by $M_{Z_n}$ the mgf of $Z_n$. Then, as $n \to \infty$,
$$M_{Z_n}(t) \longrightarrow \exp\{t^2/2\}$$
irrespective of the form of $M_X$. Thus, as $n \to \infty$, $Z_n \overset{d}{\longrightarrow} Z \sim N(0, 1)$.

Proof. First, let $Y_i = (X_i - \mu)/\sigma$ for $i = 1, \ldots, n$. Then $Y_1, \ldots, Y_n$ are i.i.d. with mgf $M_Y$, say, and $E_{f_Y}[Y_i] = 0$, $\text{Var}_{f_Y}[Y_i] = 1$ for each $i$. Using a Taylor series expansion, we have that, for $t$ in a neighbourhood of zero,
$$M_Y(t) = 1 + t\,E_Y[Y] + \frac{t^2}{2!} E_Y[Y^2] + \frac{t^3}{3!} E_Y[Y^3] + \cdots = 1 + \frac{t^2}{2} + O(t^3),$$
using the $O(t^3)$ notation to capture all terms involving $t^3$ and higher powers. Re-writing $Z_n$ as
$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i,$$
and as $Y_1, \ldots, Y_n$ are independent, we have by a standard mgf result that
$$M_{Z_n}(t) = \prod_{i=1}^{n} M_Y\!\left( \frac{t}{\sqrt{n}} \right) = \left\{ 1 + \frac{t^2}{2n} + O(n^{-3/2}) \right\}^n = \left\{ 1 + \frac{t^2}{2n} + o(n^{-1}) \right\}^n,$$
so that, by the definition of the exponential function, as $n \to \infty$,
$$M_{Z_n}(t) \longrightarrow \exp\{t^2/2\} \quad \therefore \quad Z_n \overset{d}{\longrightarrow} Z \sim N(0, 1),$$
where no further assumptions on $M_X$ are required.
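A simulation sketch (assuming Python with NumPy and SciPy; Exponential(1) summands, a visibly skewed choice with $\mu = \sigma = 1$, are illustrative) comparing the cdf of $Z_n$ with the standard normal cdf $\Phi$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
mu = sigma = 1.0                                     # Exponential(1): mean 1, sd 1
reps = 20_000

for n in [5, 50, 500]:
    x = rng.exponential(scale=1.0, size=(reps, n))
    zn = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma
    for z in (-1.0, 0.0, 1.0):
        # empirical P[Z_n <= z] against the standard normal cdf Phi(z)
        print(f"n={n:4d} z={z:+.1f}  emp={np.mean(zn <= z):.4f}  Phi={norm.cdf(z):.4f}")
```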

Alternative statement: The theorem can also be stated in terms of
$$Z_n = \frac{\displaystyle\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}} = \sqrt{n}(\bar{X}_n - \mu),$$
so that
$$Z_n \overset{d}{\longrightarrow} Z \sim N(0, \sigma^2),$$
and $\sigma^2$ is termed the asymptotic variance of $Z_n$.

Notes :

(i) The theorem requires the existence of the mgf $M_X$.

(ii) The theorem holds for the i.i.d. case, but there are similar theorems for non-identically distributed and dependent random variables.

(iii) The theorem allows the construction of asymptotic normal approximations. For example, for large but finite $n$, using the properties of the Normal distribution,
$$\bar{X}_n \sim AN(\mu, \sigma^2/n), \qquad S_n = \sum_{i=1}^{n} X_i \sim AN(n\mu, n\sigma^2),$$
where $AN(\mu, \sigma^2)$ denotes an asymptotic normal distribution. The notation $\bar{X}_n \,\dot{\sim}\, N(\mu, \sigma^2/n)$ is sometimes used.

(iv) The multivariate version of this theorem can be stated as follows (a simulation sketch follows below). Suppose $X_1, \ldots, X_n$ are i.i.d. $k$-dimensional random variables with mgf $M_X$, with
$$E_{f_X}[X_i] = \mu, \qquad \text{Var}_{f_X}[X_i] = \Sigma,$$
where $\Sigma$ is a positive definite, symmetric $k \times k$ matrix defining the variance-covariance matrix of the $X_i$. Let the random variable $Z_n$ be defined by
$$Z_n = \sqrt{n}(\bar{X}_n - \mu),$$
where
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Then
$$Z_n \overset{d}{\longrightarrow} Z \sim N(0, \Sigma)$$
as $n \to \infty$.
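A sketch of the multivariate statement (assuming Python with NumPy; the bivariate choice $X_i = (U_i, U_i + V_i)$ with $U_i, V_i$ i.i.d. Exponential(1), so $\mu = (1, 2)^\top$ and $\Sigma$ has rows $(1, 1)$ and $(1, 2)$, is illustrative): the sample covariance of $Z_n = \sqrt{n}(\bar{X}_n - \mu)$ should approximate $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 500, 20_000

# Illustrative bivariate X_i = (U_i, U_i + V_i), U_i, V_i iid Exponential(1):
# mu = (1, 2), Sigma = [[1, 1], [1, 2]]
u = rng.exponential(size=(reps, n))
v = rng.exponential(size=(reps, n))
xbar = np.stack([u.mean(axis=1), (u + v).mean(axis=1)], axis=1)

zn = np.sqrt(n) * (xbar - np.array([1.0, 2.0]))      # Z_n = sqrt(n)(Xbar_n - mu)
print(np.cov(zn, rowvar=False))                      # should approximate Sigma
```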

Appendix (NOT EXAMINABLE)
Proof. Relating the modes of convergence.

(a) $X_n \overset{a.s.}{\longrightarrow} X \implies X_n \overset{p}{\longrightarrow} X$. Suppose $X_n \overset{a.s.}{\longrightarrow} X$, and let $\epsilon > 0$. Then
$$P[\,|X_n - X| < \epsilon\,] \geq P[\,|X_m - X| < \epsilon, \ \forall m \geq n\,] \tag{1}$$
as, considering the original sample space,
$$\{\omega : |X_m(\omega) - X(\omega)| < \epsilon, \ \forall m \geq n\} \subseteq \{\omega : |X_n(\omega) - X(\omega)| < \epsilon\}.$$
But, as $X_n \overset{a.s.}{\longrightarrow} X$, $P[\,|X_m - X| < \epsilon, \ \forall m \geq n\,] \longrightarrow 1$ as $n \to \infty$. So, after taking limits in equation (1), we have
$$\lim_{n \to \infty} P[\,|X_n - X| < \epsilon\,] \geq \lim_{n \to \infty} P[\,|X_m - X| < \epsilon, \ \forall m \geq n\,] = 1,$$
and so
$$\lim_{n \to \infty} P[\,|X_n - X| < \epsilon\,] = 1 \quad \therefore \quad X_n \overset{p}{\longrightarrow} X.$$

(b) $X_n \overset{r}{\longrightarrow} X \implies X_n \overset{p}{\longrightarrow} X$. Suppose $X_n \overset{r}{\longrightarrow} X$, and let $\epsilon > 0$. Then, using an argument similar to Chebychev's Lemma,
$$E[\,|X_n - X|^r\,] \geq E[\,|X_n - X|^r I_{\{|X_n - X| > \epsilon\}}\,] \geq \epsilon^r P[\,|X_n - X| > \epsilon\,].$$
Taking limits as $n \to \infty$: as $X_n \overset{r}{\longrightarrow} X$, $E[\,|X_n - X|^r\,] \longrightarrow 0$, and therefore also
$$P[\,|X_n - X| > \epsilon\,] \longrightarrow 0 \quad \therefore \quad X_n \overset{p}{\longrightarrow} X.$$
(c) $X_n \overset{p}{\longrightarrow} X \implies X_n \overset{d}{\longrightarrow} X$. Suppose $X_n \overset{p}{\longrightarrow} X$, and let $\epsilon > 0$. Denote, in the usual way,
$$F_{X_n}(x) = P[X_n \leq x] \quad \text{and} \quad F_X(x) = P[X \leq x].$$
Then, by the theorem of total probability, we have two inequalities:
$$F_{X_n}(x) = P[X_n \leq x] = P[X_n \leq x, X \leq x + \epsilon] + P[X_n \leq x, X > x + \epsilon] \leq F_X(x + \epsilon) + P[\,|X_n - X| > \epsilon\,],$$
$$F_X(x - \epsilon) = P[X \leq x - \epsilon] = P[X \leq x - \epsilon, X_n \leq x] + P[X \leq x - \epsilon, X_n > x] \leq F_{X_n}(x) + P[\,|X_n - X| > \epsilon\,],$$
as $A \subseteq B \implies P(A) \leq P(B)$ yields
$$P[\,X_n \leq x, X \leq x + \epsilon\,] \leq F_X(x + \epsilon) \quad \text{and} \quad P[\,X \leq x - \epsilon, X_n \leq x\,] \leq F_{X_n}(x).$$
Thus
$$F_X(x - \epsilon) - P[\,|X_n - X| > \epsilon\,] \leq F_{X_n}(x) \leq F_X(x + \epsilon) + P[\,|X_n - X| > \epsilon\,],$$
and taking limits as $n \to \infty$ (with care; we cannot yet write $\lim_{n \to \infty} F_{X_n}(x)$, as we do not know that this limit exists), recalling that $X_n \overset{p}{\longrightarrow} X$,
$$F_X(x - \epsilon) \leq \liminf_{n \to \infty} F_{X_n}(x) \leq \limsup_{n \to \infty} F_{X_n}(x) \leq F_X(x + \epsilon).$$
Then, if $F_X$ is continuous at $x$, $F_X(x - \epsilon) \longrightarrow F_X(x)$ and $F_X(x + \epsilon) \longrightarrow F_X(x)$ as $\epsilon \longrightarrow 0$, so
$$F_X(x) \leq \liminf_{n \to \infty} F_{X_n}(x) \leq \limsup_{n \to \infty} F_{X_n}(x) \leq F_X(x),$$
and thus $F_{X_n}(x) \longrightarrow F_X(x)$ as $n \to \infty$.

Proof. (Partial converses)

(i) Let $\epsilon > 0$. Then, for $n \geq 1$,
$$P[\,|X_m - X| > \epsilon \ \text{for some } m \geq n\,] \equiv P\left( \bigcup_{m=n}^{\infty} \{|X_m - X| > \epsilon\} \right) \leq \sum_{m=n}^{\infty} P[\,|X_m - X| > \epsilon\,]$$
as, by elementary probability theory (countable sub-additivity), $P(A \cup B) \leq P(A) + P(B)$. But, as the right-hand side is the tail sum of a convergent series (by assumption), it follows that
$$\lim_{n \to \infty} \sum_{m=n}^{\infty} P[\,|X_m - X| > \epsilon\,] = 0.$$
Hence
$$\lim_{n \to \infty} P[\,|X_m - X| > \epsilon \ \text{for some } m \geq n\,] = 0$$
and $X_n \overset{a.s.}{\longrightarrow} X$.

(ii) Identical to part (i), using the inequality from part (b) of the previous theorem: $P[\,|X_n - X| > \epsilon\,] \leq \epsilon^{-r} E[\,|X_n - X|^r\,]$, so that $\sum_n E[\,|X_n - X|^r\,] < \infty$ implies $\sum_n P[\,|X_n - X| > \epsilon\,] < \infty$, and (i) applies.
