Convergence in Probability

Robert Baumgarth
Mathematics Research Unit, FSTM, University of Luxembourg, Maison du Nombre, 6, Avenue de la Fonte, 4364 Esch-sur-Alzette, Grand-Duché de Luxembourg

[Overview diagram: 𝖫𝑝-convergence ⟹ 𝖫1-convergence ⟹ convergence in probability (stochastic convergence) ⟹ weak convergence (convergence in distribution/law); a.s. convergence ⟹ convergence in probability. The reverse implications hold only under extra hypotheses: a positive bound & dominated convergence (DOM), Vitali's theorem A.5, passing to a subsequence (A.4, 3.3), or a limit that is a.s. the same constant (3.4).]

1 Definitions
2 Implications
3 «Subsequence» implications
4 Weak convergence as convergence in distribution
5 Counterexamples
A Appendix – some measure theory


1 Definitions
Definition 1.1. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) defined on the same probability
space (Ω, 𝓐, ℙ). We say that
(i) 𝑋𝑛 ⟶^{a.s.} 𝑋: 𝑋𝑛 converges to 𝑋 almost surely (a.s.), or 𝑋𝑛 ⟶ 𝑋 as 𝑛 ↑ ∞ with probability one,

∶⟺ ℙ({𝜔 ∈ Ω ∶ lim_{𝑛→∞} 𝑋𝑛(𝜔) = 𝑋(𝜔)}) = 1.

▷ «presque sûrement» (p.s.), «fast sicher» (f.s.) or «avec probabilité 1», «mit Wahrscheinlichkeit 1».

(ii) 𝑋𝑛 ⟶^ℙ 𝑋: 𝑋𝑛 converges to 𝑋 in probability, or 𝑋𝑛 converges stochastically to 𝑋,

∶⟺ ∀𝜀 > 0 ∶ lim_{𝑛→∞} ℙ(|𝑋𝑛 − 𝑋| > 𝜀) = 0.

▷ «en probabilité», «in Wahrscheinlichkeit» or «stochastisch».
(iii) 𝑋𝑛 ⟶^{𝖫𝑝} 𝑋: 𝑋𝑛 converges to 𝑋 in 𝖫𝑝, or 𝑋𝑛 converges to 𝑋 in the 𝑝th mean (for 𝑋, 𝑋𝑛 ∈ 𝖫𝑝(ℙ)),

∶⟺ lim_{𝑛→∞} ‖𝑋𝑛 − 𝑋‖_{𝖫𝑝} = 0,

where ‖𝑋‖_{𝖫𝑝(ℙ)} ∶= (𝔼|𝑋|^𝑝)^{1/𝑝}, i.e.

∶⟺ lim_{𝑛→∞} 𝔼|𝑋𝑛 − 𝑋|^𝑝 = 0.

▷ «dans 𝖫𝑝», «in 𝖫𝑝» or «en moyenne d'ordre 𝑝», «im 𝑝-ten Mittel».
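
To see definition (ii) in action numerically, here is a minimal Monte-Carlo sketch (our own illustration, not from the notes; it assumes NumPy and takes 𝑋𝑛 to be the sample mean of 𝑛 Uniform(0, 1) variables and 𝑋 ≡ 1/2, so that 𝑋𝑛 ⟶^ℙ 𝑋 by the weak law of large numbers):

    import numpy as np

    rng = np.random.default_rng(0)
    eps = 0.05
    for n in [10, 100, 1000]:
        # 2000 Monte-Carlo copies of X_n = mean of n Uniform(0,1) samples
        X_n = rng.uniform(size=(2000, n)).mean(axis=1)
        # the estimated P(|X_n - 1/2| > eps) should tend to 0 as n grows
        print(n, np.mean(np.abs(X_n - 0.5) > eps))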

Definition 1.2. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ), not necessarily defined on the same probability space. We say that

(i) 𝑋𝑛 ⟶^{d/𝓓/𝓛} 𝑋: 𝑋𝑛 converges to 𝑋 in distribution/law

∶⟺ lim_{𝑛→∞} 𝐹𝑛(𝑥) = lim_{𝑛→∞} ℙ(𝑋𝑛 ⩽ 𝑥) = ℙ(𝑋 ⩽ 𝑥) = 𝐹(𝑥)

for all points 𝑥 ∈ ℝ at which 𝐹 is continuous.

▷ «en loi», «in Verteilung».

(ii) 𝑋𝑛 ⟶^{d/w} 𝑋: 𝑋𝑛 converges to 𝑋 weakly

∶⟺ ∀𝑓 ∈ 𝖢𝑏(ℝ) ∶ lim_{𝑛→∞} 𝔼𝑓(𝑋𝑛) = 𝔼𝑓(𝑋). (1.1)

▷ «faible», «schwach».

Remark 1.3. (1) Since the real random variables 𝑋𝑛, 𝑋 do not have to be defined on the same probability space in the definition of weak convergence, we should, strictly speaking, emphasise the dependence by writing ℙ𝑛, ℙ and 𝔼𝑛, 𝔼 respectively. It is a common convention to omit this dependence.

(2) As already indicated in the notation in Definition 1.2, (i) and (ii) are equivalent (⇝ Portmanteau theorem). We can equivalently write (1.1) as

∀𝑓 ∈ 𝖢𝑏(ℝ) ∶ lim_{𝑛→∞} ∫ 𝑓(𝑥) ℙ(𝑋𝑛 ∈ d𝑥) = ∫ 𝑓(𝑥) ℙ(𝑋 ∈ d𝑥).

In functional analysis it is common to write lim_{𝑛→∞} ⟨𝑓, ℙ_{𝑋𝑛}⟩ = ⟨𝑓, ℙ_𝑋⟩, where ⟨𝑓, 𝜇⟩ = ∫ 𝑓 d𝜇.


2 Implications
Theorem 2.1. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) and 1 ⩽ 𝑝 ⩽ ∞. Then

(a) 𝑋𝑛 ⟶^{𝖫𝑝} 𝑋 ⟹ 𝑋𝑛 ⟶^{𝖫1} 𝑋,

(b) 𝑋𝑛 ⟶^{𝖫1} 𝑋 ⟹ 𝑋𝑛 ⟶^ℙ 𝑋,

(c) 𝑋𝑛 ⟶^{a.s.} 𝑋 ⟹ 𝑋𝑛 ⟶^ℙ 𝑋.

Proof. (a) Let 1 ⩽ 𝑝 < ∞ and let 𝑞 be the conjugate exponent, 1/𝑝 + 1/𝑞 = 1. By Hölder's inequality,

𝔼|𝑋𝑛 − 𝑋| = ∫ |𝑋𝑛 − 𝑋| ⋅ 1 dℙ ⩽ (∫ |𝑋𝑛 − 𝑋|^𝑝 dℙ)^{1/𝑝} (∫ 1^𝑞 dℙ)^{1/𝑞} = (𝔼|𝑋𝑛 − 𝑋|^𝑝)^{1/𝑝} ⟶ 0 as 𝑛 ↑ ∞,

by assumption (note ∫ 1^𝑞 dℙ = 1 since ℙ is a probability measure). The case 𝑝 = ∞ follows in a completely analogous manner. In particular, we have shown that 𝖫𝑝(ℙ) ↪ 𝖫1(ℙ) is a continuous embedding.

(b) By the Chebyshev–Markov inequality, we get

∀𝜀 > 0 ∶ ℙ(|𝑋𝑛 − 𝑋| > 𝜀) ⩽ (1/𝜀) 𝔼|𝑋𝑛 − 𝑋| ⟶ 0 as 𝑛 ↑ ∞.

(c) For all 𝜀 > 0 we have

ℙ(|𝑋𝑛 − 𝑋| > 𝜀) = 𝔼𝟙_{{|𝑋𝑛−𝑋|>𝜀}} = 𝔼𝟙_{[−𝜀,𝜀]ᶜ}(𝑋𝑛 − 𝑋).

The random variables 𝑌𝑛 ∶= 𝟙_{{|𝑋𝑛−𝑋|>𝜀}} are uniformly bounded by 𝟭 ∈ 𝖫1(ℙ) and 𝑌𝑛 ⟶ 0 a.s. as 𝑛 ↑ ∞. Hence, by dominated convergence A.1, it follows that

∀𝜀 > 0 ∶ ℙ(|𝑋𝑛 − 𝑋| > 𝜀) = 𝔼𝟙_{{|𝑋𝑛−𝑋|>𝜀}} ⟶ 0 as 𝑛 ↑ ∞. ■
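
The Chebyshev–Markov bound in (b) is easy to check numerically; a small sketch (our own choice of distribution, assuming NumPy: |𝑋𝑛 − 𝑋| is replaced by an exponentially distributed variable 𝐷𝑛 with mean 1/𝑛, so 𝔼𝐷𝑛 ⟶ 0):

    import numpy as np

    rng = np.random.default_rng(1)
    eps = 0.1
    for n in [1, 10, 100]:
        # D_n plays the role of |X_n - X|: Exp-distributed with mean 1/n
        D_n = rng.exponential(scale=1.0 / n, size=100000)
        # the estimated P(D_n > eps) is always dominated by E[D_n]/eps
        print(n, np.mean(D_n > eps), "<=", D_n.mean() / eps)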

Theorem 2.2. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) such that 𝑋𝑛 ⟶^ℙ 𝑋. Then 𝑋𝑛 ⟶^d 𝑋. Moreover, it holds that 𝑓(𝑋𝑛) ⟶^{𝖫1} 𝑓(𝑋) for all 𝑓 ∈ 𝖢𝑏(ℝ).

Proof. Let 𝑓 ∈ 𝖢𝑏(ℝ) and 𝜀 > 0 be fixed. Since {|𝑋| > 𝑘} ↓ ∅ for 𝑘 ↑ ∞, by the ∅-continuity of the probability measure ℙ we have

∃𝑁 = 𝑁(𝜀) ∈ ℕ ∀𝑘 ⩾ 𝑁 ∶ ℙ(|𝑋| > 𝑘) < 𝜀. (2.1)

By assumption |𝑓| ⩽ 𝑀 for some suitable constant 𝑀, and 𝑓|_{[−(𝑁+1),𝑁+1]} is uniformly continuous, i.e.

∃𝛿 = 𝛿(𝜀) ∈ (0, 1) ∀|𝑥|, |𝑦| ⩽ 𝑁 + 1 with |𝑥 − 𝑦| ⩽ 𝛿 ∶ |𝑓(𝑥) − 𝑓(𝑦)| ⩽ 𝜀. (2.2)



Hence, splitting the domain of integration in a clever way,

|𝔼[𝑓(𝑋𝑛) − 𝑓(𝑋)]| ⩽ 𝔼|𝑓(𝑋𝑛) − 𝑓(𝑋)|
= (∫_{{|𝑋𝑛−𝑋|⩽𝛿}∩{|𝑋|⩽𝑁}} + ∫_{{|𝑋𝑛−𝑋|⩽𝛿}∩{|𝑋|>𝑁}} + ∫_{{|𝑋𝑛−𝑋|>𝛿}}) |𝑓(𝑋𝑛) − 𝑓(𝑋)| dℙ
⩽ 𝜀 ℙ(|𝑋𝑛 − 𝑋| ⩽ 𝛿, |𝑋| ⩽ 𝑁) + 2𝑀 ℙ(|𝑋| > 𝑁) + 2𝑀 ℙ(|𝑋𝑛 − 𝑋| > 𝛿),

where we used |𝑓(𝑥) − 𝑓(𝑦)| ⩽ 2‖𝑓‖∞ ⩽ 2𝑀 together with (2.2), and the fact that on the set {|𝑋𝑛 − 𝑋| ⩽ 𝛿} ∩ {|𝑋| ⩽ 𝑁} we have |𝑋𝑛| ⩽ |𝑋𝑛 − 𝑋| + |𝑋| ⩽ 𝛿 + 𝑁 ⩽ 1 + 𝑁. Finally, using (2.1), we get

𝔼|𝑓(𝑋𝑛) − 𝑓(𝑋)| ⩽ (2𝑀 + 1)𝜀 + 2𝑀 ℙ(|𝑋𝑛 − 𝑋| > 𝛿) ⟶ (2𝑀 + 1)𝜀 as 𝑛 ↑ ∞,

and (2𝑀 + 1)𝜀 ⟶ 0 as 𝜀 ↓ 0; thus 𝑓(𝑋𝑛) → 𝑓(𝑋) in 𝖫1 and, in particular, 𝔼𝑓(𝑋𝑛) → 𝔼𝑓(𝑋). ■

3 «Subsequence» implications
Recall the definition of the limit superior of sets:

𝜔 ∈ lim sup_{𝑛→∞} 𝐴𝑛 ∶= ⋂_{𝑚∈ℕ} ⋃_{𝑛⩾𝑚} 𝐴𝑛 ⟺ 𝜔 appears in infinitely many of the 𝐴𝑛.

Hence, we can justify the probabilistic terms

lim sup_{𝑛→∞} 𝐴𝑛 = {𝐴𝑛 for infinitely many 𝑛 ∈ ℕ} = {𝐴𝑛 infinitely often (i.o.)}.

Lemma 3.1 (Borel–Cantelli). Let (𝐴𝑛)𝑛∈ℕ ⊂ 𝓐. Then

∑_{𝑛∈ℕ} ℙ(𝐴𝑛) < ∞ ⟹ ℙ(𝐴𝑛 for infinitely many 𝑛 ∈ ℕ) = 0.

Proof. We have

𝜔 ∈ lim sup_{𝑛→∞} 𝐴𝑛 ⟺ 𝜔 appears in infinitely many of the 𝐴𝑛 ⟺ ∑_{𝑛∈ℕ} 𝟙_{𝐴𝑛}(𝜔) = ∞.

By Beppo Levi (monotone convergence), it follows that

𝔼(∑_{𝑛∈ℕ} 𝟙_{𝐴𝑛}) = ∑_{𝑛∈ℕ} 𝔼𝟙_{𝐴𝑛} = ∑_{𝑛∈ℕ} ℙ(𝐴𝑛) < ∞,

and we see that ∑_{𝑛∈ℕ} 𝟙_{𝐴𝑛}(𝜔) < ∞ a.s., hence ℙ(𝐴𝑛 for infinitely many 𝑛 ∈ ℕ) = 0. ■
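
A quick simulation of this dichotomy (our own illustration, assuming NumPy; the contrast with the non-summable case uses the second Borel–Cantelli lemma for independent events, which is not stated in these notes):

    import numpy as np

    rng = np.random.default_rng(2)
    N = 100000
    n = np.arange(1, N + 1)
    # P(A_n) = n**-2 is summable: a.s. only finitely many A_n occur.
    # P(A_n) = 1/n is not summable: for independent A_n, a.s. infinitely many occur.
    for label, prob in [("1/n^2", n ** -2.0), ("1/n", 1.0 / n)]:
        hits = rng.uniform(size=N) < prob
        print(label, "occurrences:", hits.sum(), "last index:", n[hits].max())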

Making the special choice 𝐴𝑛 ∶= {|𝑋𝑛 − 𝑋| > 𝜀} in 3.1, we can show that control of the rate of convergence of ℙ(|𝑋𝑛 − 𝑋| > 𝜀) → 0 already yields almost sure convergence.

Lemma 3.2. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) with 𝑋𝑛 ⟶^ℙ 𝑋, and let 𝜀𝑛 ↓ 0 be a null sequence with ∑_{𝑛∈ℕ} ℙ(|𝑋𝑛 − 𝑋| > 𝜀𝑛) < ∞. Then 𝑋𝑛 ⟶^{a.s.} 𝑋.

Proof. With 𝐴𝑛 ∶= {|𝑋𝑛 − 𝑋| > 𝜀𝑛}, the Borel–Cantelli lemma 3.1 gives

ℙ(𝐴𝑛 for infinitely many 𝑛 ∈ ℕ) = 0
⟺ ℙ(𝐴𝑛 for at most finitely many 𝑛 ∈ ℕ) = 1
⟺ ∃Ω′ ⊂ Ω, ℙ(Ω′) = 1, ∀𝜔 ∈ Ω′ ∃𝑁(𝜔) ∀𝑛 ⩾ 𝑁(𝜔) ∶ |𝑋𝑛(𝜔) − 𝑋(𝜔)| < 𝜀𝑛
⟹ ∀𝜔 ∈ Ω′ ∶ |𝑋𝑛(𝜔) − 𝑋(𝜔)| ⟶ 0 as 𝑛 ↑ ∞,

since 𝜀𝑛 ↓ 0. ■

Corollary 3.3 (ℙ-convergence ⟹ a.s. convergence of a subsequence). Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ). Then

𝑋𝑛 ⟶^ℙ 𝑋 ⟹ ∃(𝑋𝑛(𝑘))𝑘∈ℕ ∶ 𝑋𝑛(𝑘) ⟶^{a.s.} 𝑋 as 𝑘 ↑ ∞.

Proof. By assumption,

∀𝑘 ∈ ℕ ∀𝜀 > 0 ∃𝑁(𝑘, 𝜀) ∈ ℕ ∀𝑛 ⩾ 𝑁(𝑘, 𝜀) ∶ ℙ(|𝑋𝑛 − 𝑋| > 𝜀) ⩽ 2^{−𝑘}.

Choose 𝜀 = 2^{−𝑘} and 𝑛(𝑘) ∶= 𝑁(𝑘, 2^{−𝑘}), without loss of generality strictly increasing in 𝑘. Hence,

∑_{𝑘∈ℕ} ℙ(|𝑋𝑛(𝑘) − 𝑋| > 2^{−𝑘}) ⩽ ∑_{𝑘∈ℕ} 2^{−𝑘} < ∞,

and the assertion follows from Lemma 3.2. ■
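
The subsequence phenomenon can be watched on the «typewriter» sequence of Example 5.1(b) below: the full sequence converges in probability but at no point, while the subsequence of block-starting indicators converges a.s. A small sketch (our own illustration, plain Python):

    import math

    # typewriter sequence on [0,1): block j lists 1_[k/2^j,(k+1)/2^j),
    # k = 0, ..., 2^j - 1; single index m = 2^j - 1 + k
    def X(m, omega):
        j = int(math.floor(math.log2(m + 1)))   # block number
        k = m + 1 - 2 ** j                      # position inside block j
        return 1.0 if k / 2 ** j <= omega < (k + 1) / 2 ** j else 0.0

    omega = 0.7
    print([X(m, omega) for m in range(15)])            # keeps returning to 1
    # block starts n(j) = 2^j - 1 satisfy P(X = 1) = 2^-j, which is summable,
    # so Lemma 3.2 applies and the subsequence converges a.s. (here: to 0)
    print([X(2 ** j - 1, omega) for j in range(1, 11)])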

Finally, we show a relation between convergence in distribution and convergence in probability.

Lemma 3.4. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) defined on the same probability space. If 𝑋 ≡ 𝑐 is constant a.s., then

𝑋𝑛 ⟶^ℙ 𝑋 ≡ 𝑐 ⟺ 𝑋𝑛 ⟶^d 𝑋 ≡ 𝑐.

Proof. ⟹: Clear by Theorem 2.2.

⟸: We choose a cut-off function in a clever way: fix 𝜀 > 0 and 𝜒𝜀 ∈ 𝖢𝑏(ℝ) with 𝜒𝜀(0) = 0 and 𝜒𝜀 ⩾ 𝟙_{[−𝜀,𝜀]ᶜ}, e.g. 𝜒𝜀(𝑥) = min(|𝑥|/𝜀, 1). Then 𝜒𝜀(⋅ − 𝑐) ∈ 𝖢𝑏(ℝ) and we have

ℙ(|𝑋𝑛 − 𝑋| > 𝜀) ⩽ ∫ 𝜒𝜀(𝑋𝑛 − 𝑋) dℙ = ∫ 𝜒𝜀(𝑋𝑛 − 𝑐) dℙ ⟶ ∫ 𝜒𝜀(𝑋 − 𝑐) dℙ = 𝜒𝜀(0) = 0 as 𝑛 ↑ ∞,

by weak convergence. ■

4 Weak convergence as convergence in distribution

The following theorem links convergence in distribution to a.s. convergence (on a different probability space). Note that the theorem also holds in metric spaces.

Theorem 4.1 (Skorokhod's representation theorem). Let 𝑋𝑛, 𝑋 be real random variables (𝑛 ∈ ℕ), not necessarily defined on the same probability space. If 𝑋𝑛 ⟶^d 𝑋, then there are another probability space and random variables 𝑌𝑛, 𝑌 on it such that

𝑋𝑛 ∼ 𝑌𝑛, 𝑋 ∼ 𝑌 and 𝑌𝑛 ⟶^{a.s.} 𝑌.

Theorem 4.2 (CMT – continuous mapping theorem). Let 𝑋𝑛, 𝑋 ∶ Ω → ℝ^𝑑 be random variables with 𝑋𝑛 ⟶^d 𝑋 and let 𝑔 ∶ ℝ^𝑑 → ℝ^𝑟 be continuous. Then 𝑔(𝑋𝑛) ⟶^d 𝑔(𝑋) in ℝ^𝑟.

Proof. Let 𝑓 ∈ 𝖢𝑏(ℝ^𝑟). Then 𝑓 ∘ 𝑔 ∈ 𝖢𝑏(ℝ^𝑑) and a direct calculation shows

𝔼𝑓(𝑔(𝑋𝑛)) = 𝔼(𝑓 ∘ 𝑔)(𝑋𝑛) ⟶ 𝔼(𝑓 ∘ 𝑔)(𝑋) = 𝔼𝑓(𝑔(𝑋)) as 𝑛 → ∞.

Hence, 𝑔(𝑋𝑛) ⟶^d 𝑔(𝑋) in ℝ^𝑟. ■

Theorem 4.3 (Cramér). Let 𝑋𝑛, 𝑌𝑛 ∶ Ω → ℝ^𝑑 be random variables. Then

𝑋𝑛 − 𝑌𝑛 ⟶^ℙ 0 ⟹ (𝑋𝑛 ⟶^d 𝑍 ⟺ 𝑌𝑛 ⟶^d 𝑍).

Such sequences (𝑋𝑛) and (𝑌𝑛) are called stochastically equivalent.

Theorem 4.4 (Cramér–Slutsky). Let 𝑋𝑛 ∶ Ω → ℝ^𝑟 and 𝑌𝑛 ∶ Ω → ℝ^𝑑 be random variables with 𝑋𝑛 ⟶^d 𝑋 in ℝ^𝑟 and 𝑌𝑛 ⟶^ℙ 𝑐 in ℝ^𝑑 for some constant 𝑐. Then

(𝑋𝑛, 𝑌𝑛) ⟶^d (𝑋, 𝑐) in ℝ^{𝑟+𝑑}.

Moreover, by the CMT, Theorem 4.2 above, for any function 𝑔 continuous on ℝ^{𝑟+𝑑} or a.s. continuous at (𝑋, 𝑐),

𝑔(𝑋𝑛, 𝑌𝑛) ⟶^d 𝑔(𝑋, 𝑐).


Proof. Using the assumption 𝑌𝑛 ⟶^ℙ 𝑐,

(𝑋𝑛, 𝑌𝑛) − (𝑋𝑛, 𝑐) = (0, 𝑌𝑛 − 𝑐) ⟶^ℙ (0, 0),

so (𝑋𝑛, 𝑌𝑛) and (𝑋𝑛, 𝑐) are stochastically equivalent. Let 𝑓 ∈ 𝖢𝑏(ℝ^{𝑟+𝑑}). Then 𝑓(⋅, 𝑐) ∈ 𝖢𝑏(ℝ^𝑟) for fixed 𝑐 ∈ ℝ^𝑑, and hence

𝔼𝑓(𝑋𝑛, 𝑐) ⟶ 𝔼𝑓(𝑋, 𝑐) as 𝑛 → ∞,

as 𝑋𝑛 ⟶^d 𝑋 by assumption, i.e. (𝑋𝑛, 𝑐) ⟶^d (𝑋, 𝑐). The assertion now follows from Cramér's theorem 4.3. ■

We emphasise that the CMT, Cramér’s theorem and the Cramér-Slutsky theorem also hold in a
metric space setting.

Example 4.5. Let 𝑋𝑛 ∶ Ω → ℝ^𝑟 and 𝑌𝑛 ∶ Ω → ℝ^𝑑 be random variables with 𝑋𝑛 ⟶^d 𝑋 in ℝ^𝑟 and 𝑌𝑛 ⟶^ℙ 𝑐 in ℝ^𝑑 for some constant 𝑐. Applying the CMT, Theorem 4.2, to the functions 𝑔(𝑥, 𝑦) = 𝑥 + 𝑦, 𝑔(𝑥, 𝑦) = ⟨𝑥, 𝑦⟩ and 𝑔(𝑥, 𝑦) = 𝑥𝑦^{−1}, we get by Theorem 4.4: for 𝑟 = 𝑑,

(a) 𝑋𝑛 + 𝑌𝑛 ⟶^d 𝑋 + 𝑐,

(b) ⟨𝑋𝑛, 𝑌𝑛⟩ ⟶^d ⟨𝑋, 𝑐⟩,

and for 𝑑 = 1,

(c) 𝑋𝑛𝑌𝑛 ⟶^d 𝑋𝑐,

(d) 𝑋𝑛/𝑌𝑛 ⟶^d 𝑋/𝑐, provided 𝑐 ≠ 0.
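
A standard application of (c)/(d) is the studentised mean: by the CLT, √𝑛(𝑋̄𝑛 − 𝜇) ⟶^d 𝑁(0, 𝜎²), while the sample standard deviation converges to 𝜎 in probability. A minimal numerical sketch (our own illustration, assuming NumPy; the CLT itself is not proved in these notes):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1000
    # 2000 copies of sqrt(n)*(sample mean - mu)/sample std for Exp(1) data
    # (mu = sigma = 1); by Cramér-Slutsky this is approximately N(0, 1)
    samples = rng.exponential(size=(2000, n))
    T = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / samples.std(axis=1)
    print("mean ~ 0:", T.mean().round(3), " variance ~ 1:", T.var().round(3))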

Next, we characterise convergence in distribution with the help of the characteristic function

𝜑𝑋 ∶ ℝ^𝑑 → ℂ, 𝜑𝑋(𝜉) = 𝔼 e^{i⟨𝜉,𝑋⟩}.

Theorem 4.6 (Lévy). Let 𝑋𝑛, 𝑋 ∶ Ω → ℝ^𝑑 be random variables (𝑛 ∈ ℕ). Then

𝑋𝑛 ⟶^d 𝑋 ⟺ ∀𝜉 ∈ ℝ^𝑑 ∶ 𝜑_{𝑋𝑛}(𝜉) ⟶ 𝜑_𝑋(𝜉) as 𝑛 → ∞.

The convergence of the characteristic functions is locally uniform.

We immediately get a characterisation of multidimensional convergence in distribution, known as the Cramér–Wold device.

Corollary 4.7 (Cramér–Wold). Let 𝑋𝑛, 𝑋 ∶ Ω → ℝ^𝑑 be random variables (𝑛 ∈ ℕ). Then

𝑋𝑛 ⟶^d 𝑋 ⟺ ∀𝜉 ∈ ℝ^𝑑 ∶ ⟨𝜉, 𝑋𝑛⟩ ⟶^d ⟨𝜉, 𝑋⟩.
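
Lévy's theorem can also be watched numerically: the empirical characteristic function of a normalised sum of iid variables approaches the Gaussian characteristic function e^{−𝜉²/2}. A small sketch (our own illustration, assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200
    # 10000 copies of the normalised sum of n centred Uniform(-1,1) variables
    # (variance 1/3); the law tends to N(0,1) by the CLT
    S = rng.uniform(-1, 1, size=(10000, n)).sum(axis=1) / np.sqrt(n / 3)
    for xi in [0.5, 1.0, 2.0]:
        # empirical characteristic function vs. the N(0,1) limit exp(-xi^2/2)
        print(xi, np.exp(1j * xi * S).mean().real.round(3), np.exp(-xi ** 2 / 2).round(3))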

Finally, we give two results for the convergence in distribution of sums and vectors of random variables. Without further assumptions it is in general not possible to deduce that, if 𝑋𝑛 ⟶^d 𝑋 and 𝑌𝑛 ⟶^d 𝑌, then also 𝑋𝑛 + 𝑌𝑛 ⟶^d 𝑋 + 𝑌 or (𝑋𝑛, 𝑌𝑛) ⟶^d (𝑋, 𝑌), even if the random variables 𝑋𝑛, 𝑋, 𝑌𝑛, 𝑌 are defined on the same probability space. But it holds:

Lemma 4.8. Let 𝑋𝑛, 𝑌𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ), not necessarily defined on the same probability space. Then

(𝑋𝑛, 𝑌𝑛) ⟶^d (𝑋, 𝑌) ⟹ 𝑋𝑛 ⟶^d 𝑋, 𝑌𝑛 ⟶^d 𝑌 and 𝑋𝑛 + 𝑌𝑛 ⟶^d 𝑋 + 𝑌.

Proof. Choose 𝑑 = 2 and 𝜉 = (𝜏, 0), 𝜉 = (0, 𝜏), 𝜉 = (𝜏, 𝜏) in the Cramér–Wold device 4.7. ■

Theorem 4.9 (Slutsky). Let 𝑋𝑛, 𝑌𝑛 ∶ Ω → ℝ^𝑑 be sequences of random variables (𝑛 ∈ ℕ) such that 𝑋𝑛 ⟶^d 𝑋 and 𝑋𝑛 − 𝑌𝑛 ⟶^ℙ 0. Then 𝑌𝑛 ⟶^d 𝑋.

5 Counterexamples
Example 5.1. Let (Ω, 𝓐, ℙ) = ([0, 1), 𝓑[0, 1), Leb|[0,1) = d𝜔).

(a) 𝖫1-convergence ⟹̸ 𝖫𝑝-convergence (𝑝 > 1):

𝑋𝑛(𝜔) = 𝟙_{[1/𝑛,1)}(𝜔) 𝜔^{−1/𝑝}, 𝑋(𝜔) = 𝜔^{−1/𝑝}.

Then ‖𝑋𝑛 − 𝑋‖_{𝖫1} = ∫_0^{1/𝑛} 𝜔^{−1/𝑝} d𝜔 ⟶ 0, but 𝔼|𝑋𝑛 − 𝑋|^𝑝 = ∫_0^{1/𝑛} 𝜔^{−1} d𝜔 = ∞ for every 𝑛.

(b) 𝖫1-convergence ⟹̸ a.s. convergence:

𝑋𝑛,𝑘(𝜔) = 𝟙_{[𝑘/𝑛,(𝑘+1)/𝑛)}(𝜔), 𝑛 ∈ ℕ, 𝑘 = 0, 1, …, 𝑛 − 1,

ordered lexicographically in (𝑛, 𝑘) («typewriter sequence»). It is easy to see that 𝑋𝑛,𝑘 ⟶ 0 in 𝖫1 (since 𝔼|𝑋𝑛,𝑘| = 1/𝑛), but the sequence does not converge at any point 𝜔 ∈ [0, 1).

(c) ℙ-convergence ⟹̸ 𝖫1-convergence:

𝑋𝑛(𝜔) = 𝑛 𝟙_{[0,1/𝑛]}(𝜔), 𝑋(𝜔) ≡ 0.

(d) ℙ-convergence ⟹̸ a.s. convergence. Clear by part (b).

(e) w-convergence ⟹̸ ℙ-convergence (even if all random variables are defined on the same probability space). We define the so-called «Rademacher» functions 𝑅1, 𝑅2, 𝑅3, … by

𝑅𝑛 ∼ (1/2) 𝛿_1 + (1/2) 𝛿_{−1},

i.e. the 𝑅𝑛 alternate between ±1 on sets of the same length. Clearly,

𝔼𝑓(𝑅𝑛) = (1/2) 𝑓(1) + (1/2) 𝑓(−1) = 𝔼𝑓(𝑅1) for all 𝑛, i.e. 𝑅𝑛 ⟶^w 𝑅1. On the other hand, we see that the 𝑅𝑛(𝜔) cannot converge a.s., since

lim inf_{𝑛→∞} 𝑅𝑛(𝜔) = −1 < +1 = lim sup_{𝑛→∞} 𝑅𝑛(𝜔) ∀𝜔 ∈ (0, 1).

For symmetry reasons we also have that, for 𝑘 ≠ 𝑛,

𝑅𝑛 − 𝑅𝑘 = 0 on {𝑅𝑛 = 𝑅𝑘}, in 1/2 of all cases,
𝑅𝑛 − 𝑅𝑘 = +2 on {𝑅𝑛 = 1, 𝑅𝑘 = −1}, in 1/4 of all cases,
𝑅𝑛 − 𝑅𝑘 = −2 on {𝑅𝑛 = −1, 𝑅𝑘 = 1}, in 1/4 of all cases.

Hence, it follows that ℙ(|𝑅𝑛 − 𝑅𝑘| > 𝜀) = 1/2 for all 𝜀 < 2. So (𝑅𝑛)𝑛∈ℕ cannot be a ℙ-Cauchy sequence and is thus not stochastically convergent.
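
Example 5.1(c) can also be checked directly; a small sketch (our own illustration, assuming NumPy):

    import numpy as np

    # X_n = n * 1_[0,1/n] on ([0,1), Lebesgue): X_n -> 0 in probability,
    # since P(X_n > 0) = 1/n -> 0, while E|X_n - 0| = 1 for every n
    omega = np.random.default_rng(5).uniform(size=1000000)
    for n in [10, 100, 1000]:
        X_n = n * (omega <= 1.0 / n)
        print(n, "P(|X_n| > 0.1) ~", (X_n > 0.1).mean(), " E|X_n| ~", X_n.mean())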

Example 5.2. 𝑋𝑛 ⟶^d 𝑋, 𝑌𝑛 ⟶^d 𝑌 ⟹̸ (𝑋𝑛, 𝑌𝑛) ⟶^d (𝑋, 𝑌). Choose 𝑋, 𝑌 ∼ Ber(1/2) iid Bernoulli and define 𝑋𝑛 ∶= 𝑋 + 1/𝑛 and 𝑌𝑛 ∶= 1 − 𝑋𝑛. Then 1 − 𝑋 ∼ Ber(1/2) ∼ 𝑌, 𝑋𝑛 ⟶^d 𝑋 and 𝑌𝑛 ⟶^d 1 − 𝑋 ∼ 𝑌. Assuming that (𝑋𝑛, 𝑌𝑛) ⟶^d (𝑋, 𝑌), it would follow by Lemma 4.8 that

𝑋𝑛 + 𝑌𝑛 ⟶^d 𝑋 + 𝑌 ∼ ((1/2)𝛿_0 + (1/2)𝛿_1) ∗ ((1/2)𝛿_0 + (1/2)𝛿_1) = (1/4)𝛿_0 + (1/2)𝛿_1 + (1/4)𝛿_2

by 𝑋 ⫫ 𝑌. On the other hand, obviously 𝑋𝑛 + 𝑌𝑛 ≡ 1 ⟶^d 𝛿_1, a contradiction.

A Appendix – some measure theory

Let 𝐸 be a Polish space, typically 𝐸 = ℝ, ℝ^𝑛, and (𝐸, 𝓐, 𝜇) a measure space. Let 1 ⩽ 𝑝 < ∞. Recall that the spaces of 𝑝-times integrable functions are defined as

𝓛𝑝(𝐸, 𝓐, 𝜇) ∶= {𝑢 ∶ 𝐸 → ℝ ∶ 𝑢 measurable, ∫ |𝑢(𝑥)|^𝑝 𝜇(d𝑥) < ∞}, (1 ⩽ 𝑝 < ∞)

𝓛∞(𝐸, 𝓐, 𝜇) ∶= {𝑢 ∶ 𝐸 → ℝ ∶ 𝑢 measurable, ∃𝑐 ⩾ 0 ∶ 𝜇({|𝑢| ⩾ 𝑐}) = 0}. (𝑝 = ∞)

Moreover, we define 𝑢 ↦ ‖𝑢‖_{𝖫𝑝} (1 ⩽ 𝑝 ⩽ ∞) by

‖𝑢‖_{𝖫𝑝} ∶= (∫ |𝑢(𝑥)|^𝑝 𝜇(d𝑥))^{1/𝑝}, (1 ⩽ 𝑝 < ∞)

‖𝑢‖_∞ ∶= inf{𝑐 ⩾ 0 ∶ 𝜇({|𝑢| ⩾ 𝑐}) = 0}. (𝑝 = ∞)

To be precise, 𝑢 ↦ ‖𝑢‖_{𝖫𝑝} behaves only almost like a norm: 𝓛𝑝(𝜇) is a semi-normed vector space, since ‖𝑢‖_{𝖫𝑝} = 0 ⟺ 𝑢 = 0 a.e., and not necessarily 𝑢 ≡ 0. But we can make 𝓛𝑝 ⇝ 𝖫𝑝 into a normed space by a standard procedure:

• Define an equivalence relation: for 𝑢, 𝑣 ∈ 𝓛𝑝(𝜇), 𝑢 ∼ 𝑣 ∶⟺ 𝜇(𝑢 ≠ 𝑣) = 0.
• Let [𝑢] ∶= {𝑣 ∈ 𝓛𝑝(𝜇) ∶ 𝑣 ∼ 𝑢} be the equivalence class with representative 𝑢.
• Then ‖[𝑢]‖_{𝖫𝑝} ∶= inf{‖𝑣‖_{𝖫𝑝} ∶ 𝑣 ∈ [𝑢]} = ‖𝑢‖_{𝖫𝑝} (since 𝑢 = 𝑣 a.e.!).
• Define 𝖫𝑝(𝜇) ∶= 𝓛𝑝(𝜇)/∼ ≡ {[𝑢] ∶ 𝑢 ∈ 𝓛𝑝(𝜇)}.

Then 𝖫𝑝(𝜇) is a vector space with (true) norm ‖[𝑢]‖_{𝖫𝑝}.

Caution

• One normally speaks of 𝖫𝑝-functions, where we just identify every [𝑢] with a «good» representative 𝑢0 ∈ [𝑢]. This is justified since [𝑢] = [𝑢0] for every 𝑢0 ∈ [𝑢]; hence every representative is unique only up to a null set.
• Expressions like 𝑢 = 𝑣, 𝑢 ⩽ 𝑣 are understood only up to null sets, i.e. 𝑢 = 𝑣 a.e., 𝑢 ⩽ 𝑣 a.e. etc.
• 𝑢 ∈ 𝓛𝑝(𝜇) ⟺ 𝑢 measurable and |𝑢|^𝑝 ∈ 𝖫1(𝜇).

Next, we state the Dominated convergence theorem, or Theorem of Lebesgue. Its power and flexibility are among the primary advantages of Lebesgue's integration theory over Riemann's. It is heavily used in probability theory to prove convergence of expectations of random variables.

Theorem A.1 (Dominated convergence, Lebesgue). Let (𝑢𝑛)𝑛∈ℕ ⊂ 𝓛𝑝(𝜇), 1 ⩽ 𝑝 < ∞, be a sequence of real-valued measurable functions on (𝐸, 𝓐, 𝜇) with

• 𝑢𝑛(𝑥) ⟶ 𝑢(𝑥) as 𝑛 ↑ ∞ for 𝜇-almost all 𝑥,
• |𝑢𝑛(𝑥)| ⩽ 𝑤(𝑥) for 𝜇-almost all 𝑥 and some positive 𝑤 ∈ 𝓛𝑝(𝜇) (𝑛 ∈ ℕ).

Then 𝑢 ∈ 𝓛𝑝(𝜇) and it holds that

(a) ‖𝑢 − 𝑢𝑛‖_{𝖫𝑝} ⟶ 0 as 𝑛 ↑ ∞,

(b) ‖𝑢𝑛‖_{𝖫𝑝} ⟶ ‖𝑢‖_{𝖫𝑝} as 𝑛 ↑ ∞.
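
As a toy check of Theorem A.1 (our own illustration, assuming NumPy): 𝑢𝑛(𝑥) = 𝑥^𝑛 on [0, 1] is dominated by 𝑤 ≡ 1 ∈ 𝓛1(Leb) and converges to 0 a.e., so the integrals ∫ 𝑥^𝑛 d𝑥 = 1/(𝑛 + 1) must tend to 0:

    import numpy as np

    # Riemann-sum approximation of the integral of u_n(x) = x**n over [0,1]
    x = np.linspace(0.0, 1.0, 200001)
    for n in [1, 5, 25, 125]:
        print(n, (x ** n).mean())   # ~ 1/(n+1) -> 0, as Theorem A.1 predicts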

Mind that convergence in 𝖫𝑝, i.e. lim_{𝑛→∞} ‖𝑢 − 𝑢𝑛‖_{𝖫𝑝} = 0, is not the same as convergence of the 𝖫𝑝-norms, lim_{𝑛→∞} ‖𝑢𝑛‖_{𝖫𝑝} = ‖𝑢‖_{𝖫𝑝}. This is reflected in the following theorem.

Theorem A.2 (Riesz). Let (𝑢𝑛)𝑛∈ℕ ⊂ 𝓛𝑝(𝜇), 1 ⩽ 𝑝 < ∞. If 𝑢𝑛(𝑥) ⟶ 𝑢(𝑥) 𝜇-a.e. as 𝑛 ↑ ∞ and 𝑢 ∈ 𝓛𝑝(𝜇), then

lim_{𝑛→∞} ‖𝑢 − 𝑢𝑛‖_{𝖫𝑝} = 0 ⟺ lim_{𝑛→∞} ‖𝑢𝑛‖_{𝖫𝑝} = ‖𝑢‖_{𝖫𝑝}.

Proof. ⟹: Triangle inequality backwards (reverse triangle inequality).

⟸: Use |𝑢 − 𝑢𝑛|^𝑝 ⩽ 2^𝑝(|𝑢𝑛|^𝑝 + |𝑢|^𝑝) and Fatou's lemma for 2^𝑝(|𝑢𝑛|^𝑝 + |𝑢|^𝑝) − |𝑢 − 𝑢𝑛|^𝑝 ⩾ 0. ■

Theorem A.3 (Riesz–Fischer). The space 𝓛𝑝(𝜇), 1 ⩽ 𝑝 < ∞, is complete, i.e. every Cauchy sequence (𝑢𝑛)𝑛∈ℕ ⊂ 𝓛𝑝(𝜇) converges to some 𝑢 ∈ 𝓛𝑝(𝜇).

Corollary A.4. Let (𝑢𝑛)𝑛∈ℕ ⊂ 𝓛𝑝(𝜇), 1 ⩽ 𝑝 ⩽ ∞, with 𝑢𝑛 ⟶^{𝖫𝑝} 𝑢. Then there exists a subsequence (𝑢𝑛(𝑘))𝑘∈ℕ such that 𝑢𝑛(𝑘)(𝑥) ⟶ 𝑢(𝑥) as 𝑘 ↑ ∞ for almost all 𝑥.

Limits in measure on a non-𝜎-finite measure space (𝐸, 𝓐, 𝜇) need not be unique, but in probability we only work with finite measures of mass one. Vitali's theorem generalises Lebesgue's dominated convergence theorem.

Theorem A.5 (Vitali's theorem). For 1 ⩽ 𝑝 < ∞, let (𝑋𝑛)𝑛∈ℕ ⊂ 𝖫𝑝(ℙ) be a sequence of random variables with 𝑋𝑛 ⟶^ℙ 𝑋. Then the following are equivalent:

(i) 𝑋𝑛 ⟶^{𝖫𝑝} 𝑋;

(ii) (|𝑋𝑛|^𝑝)𝑛∈ℕ is uniformly integrable;

(iii) 𝔼|𝑋𝑛|^𝑝 ⟶ 𝔼|𝑋|^𝑝 as 𝑛 ↑ ∞.
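
Example 5.1(c) shows how (ii) can fail (this worked example is ours, not from the notes): for 𝑋𝑛 = 𝑛𝟙_{[0,1/𝑛]} we have 𝑋𝑛 ⟶^ℙ 0, but for every 𝐾 > 0 and all 𝑛 > 𝐾,

𝔼[|𝑋𝑛| 𝟙_{{|𝑋𝑛|>𝐾}}] = 𝑛 ⋅ ℙ([0, 1/𝑛]) = 1,

so (|𝑋𝑛|) is not uniformly integrable; accordingly 𝔼|𝑋𝑛| = 1 ↛ 0 = 𝔼|𝑋|, consistent with the failure of 𝖫1-convergence.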

Remark A.6. Vitali's theorem A.5 still holds for measure spaces (𝑋, 𝓐, 𝜇) which are not 𝜎-finite. In this case, we can no longer identify the 𝖫𝑝-limit, and the theorem reads: if 𝑋𝑗 ⟶ 𝑋 in measure (𝑋 a measurable function is enough), then the following are equivalent:

(i) (𝑋𝑛)𝑛∈ℕ converges in 𝖫𝑝.

(ii) (𝑋𝑛)𝑛∈ℕ is uniformly integrable.

(iii) (‖𝑋𝑛‖_𝑝)𝑛∈ℕ converges in ℝ.
