Convergence in Probability
Robert Baumgarth¹

¹ Mathematics Research Unit, FSTM, University of Luxembourg, Maison du Nombre, 6, Avenue de la Fonte, 4364 Esch-sur-Alzette, Grand-Duché de Luxembourg
[Diagram: implications between the modes of convergence — a.s. convergence and 𝖫𝑝-convergence imply convergence in probability (stochastic convergence), which implies weak convergence (convergence in distribution/law); converses hold along subsequences, resp. when the limit is a.s. the same constant (3.4).]
1 Definitions
2 Implications
3 «Subsequence» implications
4 Weak convergence as convergence in distribution
5 Counterexamples
A Appendix - some measure theory
1 Definitions
Definition 1.1. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) defined on the same probability
space (Ω, 𝓐, ℙ). We say that

(i) 𝑋𝑛 ⟶ 𝑋 a.s., 𝑋𝑛 converges to 𝑋 almost surely (a.s.) or with probability one, if ℙ(𝑋𝑛 ⟶ 𝑋 as 𝑛 ↑ ∞) = 1.
Definition 1.2. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) not necessarily defined on the same
probability space. We say that

(i) 𝑋𝑛 ⟶ 𝑋 in distribution/law (the arrow is also decorated d/𝓓/𝓛), if

𝔼𝑓 (𝑋𝑛 ) ⟶ 𝔼𝑓 (𝑋) for all 𝑓 ∈ 𝖢𝑏 (ℝ). (1.1)
Remark 1.3. (1) Since the real random variables 𝑋𝑛 , 𝑋 do not have to be defined on the same
probability space in the definition of weak convergence, we should really emphasise the dependence
by writing ℙ, ℙ𝑛 and 𝔼, 𝔼𝑛 respectively. It is a common convention to omit this dependence.

(2) As already indicated by the notation in Definition 1.2, (i) and (ii) are equivalent (⇝ Portmanteau
theorem).
2 Implications
Theorem 2.1. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) and 1 ⩽ 𝑝 ⩽ ∞. Then

(a) 𝑋𝑛 ⟶ 𝑋 in 𝖫𝑝 ⟹ 𝑋𝑛 ⟶ 𝑋 in 𝖫1,

(b) 𝑋𝑛 ⟶ 𝑋 in 𝖫1 ⟹ 𝑋𝑛 ⟶ 𝑋 in ℙ,

(c) 𝑋𝑛 ⟶ 𝑋 a.s. ⟹ 𝑋𝑛 ⟶ 𝑋 in ℙ.
Proof. (a) Let 1 ⩽ 𝑝 < ∞ and 𝑞 the conjugate exponent, 1/𝑝 + 1/𝑞 = 1. By Hölder’s inequality,

𝔼 |𝑋𝑛 − 𝑋 | = ∫ |𝑋𝑛 − 𝑋 | ⋅ 1 dℙ ⩽ (∫ |𝑋𝑛 − 𝑋 |^𝑝 dℙ)^{1/𝑝} (∫ 1^𝑞 dℙ)^{1/𝑞} = (𝔼 |𝑋𝑛 − 𝑋 |^𝑝 )^{1/𝑝} ⟶ 0 as 𝑛 ↑ ∞,

by assumption. The case 𝑝 = ∞ follows in a completely analogous manner. In particular, we have shown that
𝖫𝑝 (ℙ) ↪ 𝖫1 (ℙ) is a continuous embedding.
(b) By Markov’s inequality,

∀𝜀 > 0 ∶ ℙ(|𝑋𝑛 − 𝑋 | > 𝜀) ⩽ (1/𝜀) 𝔼 |𝑋𝑛 − 𝑋 | ⟶ 0 as 𝑛 ↑ ∞.

(c) The random variables 𝑌𝑛 ∶= 𝟙{|𝑋𝑛 −𝑋 |>𝜀} are uniformly bounded by 𝟭 ∈ 𝖫𝑝 (ℙ) and 𝑌𝑛 ⟶ 0 a.s. as 𝑛 ↑ ∞.
Hence, by dominated convergence A.1, it follows that

∀𝜀 > 0 ∶ ℙ(|𝑋𝑛 − 𝑋 | > 𝜀) = 𝔼𝟙{|𝑋𝑛 −𝑋 |>𝜀} ⟶ 0 as 𝑛 ↑ ∞. ■
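The Markov step in (b) is easy to check numerically. The following sketch is a toy model (the choice of 𝑋𝑛 as the mean of 𝑛 uniform variables, 𝑋 = 1/2, and the sample sizes are assumptions of this illustration): it compares the empirical value of ℙ(|𝑋𝑛 − 𝑋 | > 𝜀) with the bound 𝔼 |𝑋𝑛 − 𝑋 |/𝜀, which in fact holds sample-wise for the empirical measure as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumed for this illustration): X_n = mean of n Uniform(0,1)
# variables, X = 1/2 its L^1-limit. Markov's inequality gives
#     P(|X_n - X| > eps) <= E|X_n - X| / eps .
def markov_check(n, eps, samples=10_000):
    Xn = rng.uniform(size=(samples, n)).mean(axis=1)
    dev = np.abs(Xn - 0.5)
    prob = float((dev > eps).mean())    # empirical P(|X_n - X| > eps)
    bound = float(dev.mean() / eps)     # empirical Markov bound E|X_n - X| / eps
    return prob, bound

prob, bound = markov_check(n=100, eps=0.05)
```

The inequality between `prob` and `bound` is not a sampling artefact: for any data set, the fraction of deviations exceeding 𝜀 is at most the average deviation divided by 𝜀.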
Theorem 2.2. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) such that 𝑋𝑛 ⟶ 𝑋 in ℙ. Then 𝑋𝑛 ⟶ 𝑋 in distribution.
Moreover, it holds that 𝑓 (𝑋𝑛 ) ⟶ 𝑓 (𝑋) in 𝖫1 for all 𝑓 ∈ 𝖢𝑏 (ℝ).
Proof. Let 𝑓 ∈ 𝖢𝑏 (ℝ) with 𝑀 ∶= sup𝑥∈ℝ |𝑓 (𝑥)| and let 𝜀 > 0 be fixed. Since {|𝑋 | > 𝑘} ↓ ∅ for 𝑘 ↑ ∞, by the ∅-continuity of the
probability measure ℙ, we can choose 𝑁 = 𝑁 (𝜀) such that

ℙ(|𝑋 | > 𝑁 ) ⩽ 𝜀. (2.1)

Since 𝑓 is uniformly continuous on the compact interval [−(𝑁 + 1), 𝑁 + 1], there is a 𝛿 = 𝛿(𝜀) ∈ (0, 1] with

|𝑓 (𝑥) − 𝑓 (𝑦)| ⩽ 𝜀 for all 𝑥, 𝑦 ∈ [−(𝑁 + 1), 𝑁 + 1] with |𝑥 − 𝑦| ⩽ 𝛿. (2.2)

Splitting the domain of integration,

𝔼 |𝑓 (𝑋𝑛 ) − 𝑓 (𝑋)| ⩽ (∫_{{|𝑋𝑛 −𝑋 |⩽𝛿 }∩{|𝑋|⩽𝑁}} + ∫_{{|𝑋𝑛 −𝑋 |⩽𝛿 }∩{|𝑋|>𝑁}} + ∫_{{|𝑋𝑛 −𝑋 |>𝛿 }}) |𝑓 (𝑋𝑛 ) − 𝑓 (𝑋)| dℙ ⩽ 𝜀 + 2𝑀ℙ(|𝑋 | > 𝑁 ) + 2𝑀ℙ(|𝑋𝑛 − 𝑋 | > 𝛿),

where we used that |𝑓 | ⩽ 𝑀 together with (2.2), and since on the set {|𝑋𝑛 − 𝑋 | ⩽ 𝛿 } ∩ {|𝑋| ⩽ 𝑁},
we have |𝑋𝑛 | ⩽ |𝑋𝑛 − 𝑋 | + |𝑋| ⩽ 𝛿 + 𝑁 ⩽ 1 + 𝑁 . Finally, using (2.1), we get

𝔼 |𝑓 (𝑋𝑛 ) − 𝑓 (𝑋)| ⩽ (2𝑀 + 1)𝜀 + 2𝑀ℙ(|𝑋𝑛 − 𝑋 | > 𝛿 ) ⟶ (2𝑀 + 1)𝜀 as 𝑛 ↑ ∞, and letting 𝜀 ↓ 0 proves the claim. ■
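Theorem 2.2 can likewise be checked by simulation. In the sketch below, the model 𝑋𝑛 = 𝑋 + 𝑍𝑛 /𝑛 and the choice 𝑓 = arctan ∈ 𝖢𝑏 (ℝ) are assumptions of the illustration; we estimate 𝔼 |𝑓 (𝑋𝑛 ) − 𝑓 (𝑋)| by Monte Carlo and watch it decrease towards 0.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: X ~ N(0,1) and X_n = X + Z_n/n with Z_n ~ N(0,1),
# so X_n -> X in probability. For the bounded continuous f = arctan we
# estimate E|f(X_n) - f(X)| by Monte Carlo.
samples = 100_000
X = rng.standard_normal(samples)

def l1_error(n):
    Xn = X + rng.standard_normal(samples) / n
    return float(np.abs(np.arctan(Xn) - np.arctan(X)).mean())

errors = [l1_error(n) for n in (1, 10, 100)]   # decreasing towards 0
```

Since arctan is 1-Lipschitz, the error is bounded by 𝔼|𝑍|/𝑛, which makes the observed ~1/𝑛 decay plausible.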
3 «Subsequence» implications
Recall the definition of the limit superior of sets:

lim sup_{𝑛↑∞} 𝐴𝑛 = ⋂_{𝑛∈ℕ} ⋃_{𝑘⩾𝑛} 𝐴𝑘 = {𝜔 ∈ Ω ∶ 𝜔 ∈ 𝐴𝑛 for infinitely many 𝑛 ∈ ℕ} = {𝐴𝑛 infinitely often (i.o.)} .
Lemma 3.1 (Borel–Cantelli). Let (𝐴𝑛 )𝑛∈ℕ ⊂ 𝓐 with ∑𝑛∈ℕ ℙ(𝐴𝑛 ) < ∞. Then ℙ(lim sup𝑛 𝐴𝑛 ) = 0.

Proof. By monotone convergence, 𝔼 ∑𝑛∈ℕ 𝟙𝐴𝑛 = ∑𝑛∈ℕ ℙ(𝐴𝑛 ) < ∞, and we see that ∑𝑛∈ℕ 𝟙𝐴𝑛 (𝜔) < ∞ a.s., hence ℙ(𝐴𝑛 for infinitely many 𝑛 ∈ ℕ) = 0. ■
Making the special choice 𝐴𝑛 ∶= {|𝑋𝑛 − 𝑋 | > 𝜀} in 3.1, we can prove that, having control of a sufficiently fast
rate of convergence ℙ(|𝑋𝑛 − 𝑋 | > 𝜀) → 0, we already get convergence almost surely.

Lemma 3.2. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) with 𝑋𝑛 ⟶ 𝑋 in ℙ and let 𝜀𝑛 ↓ 0 be a null sequence
with ∑𝑛∈ℕ ℙ(|𝑋𝑛 − 𝑋 | > 𝜀𝑛 ) < ∞. Then 𝑋𝑛 ⟶ 𝑋 a.s.
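A quick Monte Carlo sanity check of the Borel–Cantelli mechanism behind Lemma 3.2 (the event probabilities ℙ(𝐴𝑛 ) = 1/𝑛² and the independence of the 𝐴𝑛 are assumptions of this sketch, not part of the text): since ∑ 1/𝑛² < ∞, a typical path lies in only finitely many of the events 𝐴𝑛 , and late events become very rare.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed sketch: independent events A_n with P(A_n) = 1/n^2 (summable).
# Borel-Cantelli: almost every path lies in only finitely many A_n.
n_max, paths = 1000, 2000
n = np.arange(1, n_max + 1)
hits = rng.random((paths, n_max)) < 1.0 / n**2  # hits[i, j] = 1 iff path i lies in A_{j+1}

total_hits = hits.sum(axis=1)                   # finite count per path (mean ~ pi^2/6)
late_hits = hits[:, 10:].sum(axis=1)            # events with n > 10, mean ~ sum_{n>10} 1/n^2
mean_late = float(late_hits.mean())
```

With a divergent series such as ℙ(𝐴𝑛 ) = 1/𝑛 the late counts would instead keep growing, which is the second Borel–Cantelli half for independent events.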
Lemma 3.3. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) with 𝑋𝑛 ⟶ 𝑋 in ℙ. Then there exists a subsequence
(𝑋𝑛(𝑘) )𝑘∈ℕ , chosen such that ℙ(|𝑋𝑛(𝑘) − 𝑋 | > 2⁻ᵏ ) ⩽ 2⁻ᵏ , with 𝑋𝑛(𝑘) ⟶ 𝑋 a.s.

Proof. By assumption,

∑_{𝑘∈ℕ} ℙ(|𝑋𝑛(𝑘) − 𝑋 | > 2⁻ᵏ ) ⩽ ∑_{𝑘∈ℕ} 2⁻ᵏ < ∞.
Lemma 3.4. Let 𝑋, 𝑋𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ) defined on the same probability space. If
𝑋 ≡ 𝑐 is constant a.s., then

𝑋𝑛 ⟶ 𝑋 ≡ 𝑐 in ℙ ⟺ 𝑋𝑛 ⟶ 𝑋 ≡ 𝑐 in distribution.

Proof. «⟹» is Theorem 2.2. «⟸» We choose a cut-off function in a clever way: fix 𝜀 > 0 and 𝜒𝜀 ∈ 𝖢𝑏 (ℝ) with 𝜒𝜀 (0) = 0 and
𝜒𝜀 ⩾ 𝟙[−𝜀,𝜀]𝑐 . Then 𝜒𝜀 (⋅ − 𝑐) ∈ 𝖢𝑏 (ℝ) and we have

ℙ(|𝑋𝑛 − 𝑋 | > 𝜀) ⩽ ∫ 𝜒𝜀 (𝑋𝑛 − 𝑋 ) dℙ = ∫ 𝜒𝜀 (𝑋𝑛 − 𝑐) dℙ ⟶ ∫ 𝜒𝜀 (𝑋 − 𝑐) dℙ = 0 as 𝑛 ↑ ∞, by weak convergence. ■
Theorem 4.1 (Skorokhod’s representation theorem). Let 𝑋𝑛 , 𝑋 be real random variables (𝑛 ∈ ℕ), not
necessarily defined on the same probability space. If 𝑋𝑛 ⟶ 𝑋 in distribution, then there is another probability space
and random variables 𝑌𝑛 , 𝑌 on this probability space, such that

𝑋𝑛 ∼ 𝑌𝑛 , 𝑋 ∼ 𝑌 and 𝑌𝑛 ⟶ 𝑌 a.s.
Theorem 4.2 (CMT – continuous mapping theorem). Let 𝑋𝑛 , 𝑋 ∶ Ω → ℝᵈ be random variables with
𝑋𝑛 ⟶ 𝑋 in distribution and 𝑔 ∶ ℝᵈ → ℝʳ continuous. Then 𝑔(𝑋𝑛 ) ⟶ 𝑔(𝑋) in distribution in ℝʳ.
Theorem 4.4 (Cramér–Slutsky). Let 𝑋𝑛 ∶ Ω → ℝʳ and 𝑌𝑛 ∶ Ω → ℝᵈ be random variables with 𝑋𝑛 ⟶ 𝑋 in
distribution and 𝑌𝑛 ⟶ 𝑐 in ℙ for a constant 𝑐 ∈ ℝᵈ. Then

(𝑋𝑛 , 𝑌𝑛 ) ⟶ (𝑋, 𝑐) in distribution in ℝʳ⁺ᵈ.

By the CMT, Theorem 4.2 above, it moreover follows for any function 𝑔 continuous on ℝʳ⁺ᵈ, or a.s. continuous in (𝑋, 𝑐), that

𝑔(𝑋𝑛 , 𝑌𝑛 ) ⟶ 𝑔(𝑋, 𝑐) in distribution as 𝑛 → ∞.
Proof. Using the assumption 𝑌𝑛 ⟶ 𝑐 in ℙ,

(𝑋𝑛 , 𝑌𝑛 ) − (𝑋𝑛 , 𝑐) = (0, 𝑌𝑛 − 𝑐) ⟶ (0, 0) in ℙ as 𝑛 → ∞ ⟹ (𝑋𝑛 , 𝑌𝑛 ) and (𝑋𝑛 , 𝑐) are stochastically equivalent.

Moreover, for every 𝑓 ∈ 𝖢𝑏 (ℝʳ⁺ᵈ),

𝔼𝑓 (𝑋𝑛 , 𝑐) ⟶ 𝔼𝑓 (𝑋, 𝑐) as 𝑛 → ∞,

as 𝑋𝑛 ⟶ 𝑋 in distribution by assumption. So the assertion follows. ■
We emphasise that the CMT, Cramér’s theorem and the Cramér-Slutsky theorem also hold in a
metric space setting.
Example 4.5. Let 𝑋𝑛 ∶ Ω → ℝʳ and 𝑌𝑛 ∶ Ω → ℝᵈ be random variables with 𝑋𝑛 ⟶ 𝑋 in distribution in ℝʳ and
𝑌𝑛 ⟶ 𝑐 in ℙ in ℝᵈ for some constant 𝑐 . By applying the CMT, Theorem 4.2, to the functions 𝑔(𝑥, 𝑦) = 𝑥 + 𝑦,
𝑔(𝑥, 𝑦) = ⟨𝑥, 𝑦⟩, 𝑔(𝑥, 𝑦) = 𝑥𝑦 and 𝑔(𝑥, 𝑦) = 𝑥𝑦⁻¹ , we get by Theorem 4.4: for 𝑟 = 𝑑,

(a) 𝑋𝑛 + 𝑌𝑛 ⟶ 𝑋 + 𝑐 in distribution,

(b) ⟨𝑋𝑛 , 𝑌𝑛 ⟩ ⟶ ⟨𝑋, 𝑐⟩ in distribution,

and for 𝑑 = 1,

(c) 𝑋𝑛 𝑌𝑛 ⟶ 𝑋𝑐 in distribution,

(d) 𝑋𝑛 /𝑌𝑛 ⟶ 𝑋 /𝑐 in distribution, provided 𝑐 ≠ 0.
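Parts (a)–(d) are the classical Slutsky arguments behind, e.g., the asymptotic normality of the 𝑡-statistic. The sketch below is an assumed toy model (Exp(1) data and the sample sizes are choices of this illustration): by the CLT, √𝑛(𝑋̄𝑛 − 1) ⟶ 𝖭(0, 1) in distribution, the sample standard deviation tends to 1 in ℙ, and (d) gives that their quotient is again asymptotically 𝖭(0, 1).

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy model: iid Exp(1) data, so mean = sd = 1.
# CLT:  sqrt(n)*(sample mean - 1)  ->  N(0,1) in distribution,
# LLN:  sample sd                  ->  1      in probability,
# (d):  their quotient T_n         ->  N(0,1) in distribution.
n, reps = 2000, 20_000
data = rng.exponential(size=(reps, n))
Xn = np.sqrt(n) * (data.mean(axis=1) - 1.0)   # CLT part
Yn = data.std(axis=1, ddof=1)                 # -> 1 in probability
Tn = Xn / Yn                                  # Slutsky quotient

mean, var = float(Tn.mean()), float(Tn.var())  # should be near 0 and 1
```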
Theorem 4.6 (Lévy). Let 𝑋𝑛 , 𝑋 ∶ Ω → ℝᵈ be random variables. Then

𝑋𝑛 ⟶ 𝑋 in distribution ⟺ ∀𝜉 ∈ ℝᵈ ∶ 𝜑𝑋𝑛 (𝜉) ⟶ 𝜑𝑋 (𝜉) as 𝑛 → ∞.
Theorem 4.7 (Cramér–Wold device). Let 𝑋𝑛 , 𝑋 ∶ Ω → ℝᵈ be random variables. Then

𝑋𝑛 ⟶ 𝑋 in distribution ⟺ ∀𝜉 ∈ ℝᵈ ∶ ⟨𝜉, 𝑋𝑛 ⟩ ⟶ ⟨𝜉, 𝑋⟩ in distribution as 𝑛 → ∞.
Finally, we give two results for the convergence in distribution of sums and vectors of random
variables. Without further assumptions it is (in general) not possible to deduce that, if 𝑋𝑛 ⟶ 𝑋
and 𝑌𝑛 ⟶ 𝑌 in distribution, then also 𝑋𝑛 + 𝑌𝑛 ⟶ 𝑋 + 𝑌 or (𝑋𝑛 , 𝑌𝑛 ) ⟶ (𝑋, 𝑌 ) in distribution, even if the random variables
𝑋𝑛 , 𝑋, 𝑌𝑛 , 𝑌 are defined on the same probability space. But it holds:
Lemma 4.8. Let 𝑋𝑛 , 𝑌𝑛 ∶ Ω → ℝ be random variables (𝑛 ∈ ℕ), not necessarily defined on the same
probability space. Then

(𝑋𝑛 , 𝑌𝑛 ) ⟶ (𝑋, 𝑌 ) in distribution ⟹ 𝑋𝑛 ⟶ 𝑋, 𝑌𝑛 ⟶ 𝑌 in distribution, and 𝑋𝑛 + 𝑌𝑛 ⟶ 𝑋 + 𝑌 in distribution.

Proof. We choose 𝑑 = 2, and 𝜉 = (𝜏, 0), 𝜉 = (0, 𝜏) and 𝜉 = (𝜏, 𝜏) in the Cramér–Wold device 4.7. ■
5 Counterexamples
Example 5.1. Let (Ω, 𝓐, ℙ) = ([0, 1), 𝓑[0, 1), Leb|[0,1) = d𝜔).
(a) 𝖫1 -convergence ⇏ 𝖫𝑝 -convergence (𝑝 > 1).

(b) 𝖫1 -convergence ⇏ a.s. convergence. It is easy to see that 𝑋𝑛 ⟶ 0 ≡ 𝑋 in 𝖫1 , but the sequence does not converge at any point 𝜔 ∈ [0, 1).

(c) ℙ-convergence ⇏ 𝖫1 -convergence.

(d) ℙ-convergence ⇏ a.s. convergence. Clear by part (b).
(e) w-convergence ⇏ ℙ-convergence (even if all random variables are defined on the same probability
space). We define the so-called «Rademacher» functions 𝑅1 , 𝑅2 , 𝑅3 , ... by

𝑅𝑛 ∼ (1/2) 𝛿1 + (1/2) 𝛿−1 ,

i.e. the 𝑅𝑛 are defined alternating on sets of the same length. Clearly,
𝔼𝑓 (𝑅𝑛 ) = (1/2)𝑓 (1) + (1/2)𝑓 (−1) = 𝔼𝑓 (𝑅1 ) for all 𝑓 ∈ 𝖢𝑏 (ℝ), i.e. 𝑅𝑛 ⟶ 𝑅1 weakly. On the other hand, we see that the
𝑅𝑛 (𝜔) cannot converge stochastically, since for 𝑛 ≠ 𝑘

𝑅𝑛 − 𝑅𝑘 = 0 on {𝑅𝑛 = 𝑅𝑘 }, in 1/2 of all cases,
𝑅𝑛 − 𝑅𝑘 = +2 on {𝑅𝑛 = 1, 𝑅𝑘 = −1}, in 1/4 of all cases,
𝑅𝑛 − 𝑅𝑘 = −2 on {𝑅𝑛 = −1, 𝑅𝑘 = 1}, in 1/4 of all cases.

Hence, it follows that ℙ(|𝑅𝑛 − 𝑅𝑘 | > 𝜀) = 1/2 for all 𝜀 < 2. So, (𝑅𝑛 )𝑛∈ℕ cannot be a ℙ-Cauchy
sequence, and thus is not stochastically convergent.
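The Rademacher computation can be verified exactly on a dyadic grid. The concrete realisation via dyadic digits used below is one standard construction, assumed here since the text only describes the 𝑅𝑛 as alternating on sets of the same length; on the midpoint grid all computed measures are exact.

```python
import numpy as np

# One standard construction (assumed here): R_n(omega) = +1 if the n-th
# dyadic digit of omega is 0, and -1 otherwise.
def R(n, omega):
    return 1 - 2 * (np.floor(2**n * omega).astype(int) % 2)

# Midpoints of the 2^12 dyadic cells of [0,1): evaluation is exact here.
omega = (np.arange(2**12) + 0.5) / 2**12

balanced = float(R(3, omega).mean())                   # 0: values +-1 on sets of length 1/2
disagree = float((R(3, omega) != R(5, omega)).mean())  # 1/2 = P(|R_n - R_k| > eps), eps < 2
```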
Example 5.2. 𝑋𝑛 ⟶ 𝑋, 𝑌𝑛 ⟶ 𝑌 in distribution ⇏ (𝑋𝑛 , 𝑌𝑛 ) ⟶ (𝑋, 𝑌 ) in distribution. Choose 𝑋, 𝑌 ∼ Ber(1/2) iid Bernoulli
and define 𝑋𝑛 ∶= 𝑋 + 1/𝑛 and 𝑌𝑛 ∶= 1 − 𝑋𝑛 . Then 1 − 𝑋 ∼ Ber(1/2) ∼ 𝑌 , 𝑋𝑛 ⟶ 𝑋 and 𝑌𝑛 ⟶ 1 − 𝑋 ∼ 𝑌 in distribution.
Assuming that (𝑋𝑛 , 𝑌𝑛 ) ⟶ (𝑋, 𝑌 ) in distribution, it would follow that 𝑋𝑛 + 𝑌𝑛 ⟶ 𝑋 + 𝑌 in distribution with

𝑋 + 𝑌 ∼ ((1/2)(𝛿0 + 𝛿1 )) ∗ ((1/2)(𝛿0 + 𝛿1 )) = (1/4)𝛿0 + (1/2)𝛿1 + (1/4)𝛿2

by 𝑋 ⫫ 𝑌 . On the other hand, obviously 𝑋𝑛 + 𝑌𝑛 ≡ 1 ⟶ 1 in distribution, a contradiction.
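Example 5.2 is easy to reproduce numerically; the sample sizes and tolerances below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

# Example 5.2: X, Y iid Ber(1/2); X_n = X + 1/n, Y_n = 1 - X_n.
samples, n = 100_000, 1000
X = rng.integers(0, 2, samples).astype(float)
Y = rng.integers(0, 2, samples).astype(float)   # independent of X

Xn = X + 1.0 / n
Yn = 1.0 - Xn

sum_is_one = bool(np.allclose(Xn + Yn, 1.0))    # X_n + Y_n is identically 1
# empirical law of X + Y (independent summands): 1/4 d_0 + 1/2 d_1 + 1/4 d_2
law = [float((X + Y == k).mean()) for k in (0.0, 1.0, 2.0)]
```

So the marginal limits are as claimed, while the sum 𝑋𝑛 + 𝑌𝑛 is degenerate at 1, not distributed as 𝑋 + 𝑌.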
𝓛𝑝 (𝜇) is only a quasi-normed vector space, since ‖𝑢‖𝖫𝑝 = 0 ⟺ 𝑢 = 0 a.e. and not necessarily 𝑢 ≡ 0. But
we can make 𝓛𝑝 ⇝ 𝖫𝑝 into a normed space by a standard procedure:
Caution
• One normally speaks of 𝖫𝑝 -functions, where we just identify every [𝑢] with a «good» representative
𝑢0 ∈ [𝑢]. This is justified since [𝑢] = [𝑢0 ] for every 𝑢0 ∈ [𝑢], and hence every representative is
unique only up to a null set.
• Expressions like 𝑢 = 𝑣, 𝑢 ⩽ 𝑣 are understood only up to null sets, i.e. 𝑢 = 𝑣 a.e., 𝑢 ⩽ 𝑣 a.e. etc.
• 𝑢 ∈ 𝓛𝑝 (𝜇) ⟺ 𝑢 measurable and |𝑢|𝑝 ∈ 𝖫1 (𝜇).
Next, we state the Dominated convergence theorem or Theorem of Lebesgue. Its power and
flexibility are among the primary advantages of Lebesgue’s integration theory over Riemann’s. It is
heavily used in probability theory to prove the convergence of expectations of random variables.
Theorem A.1 (Dominated convergence, Lebesgue). Let (𝑢𝑛 )𝑛∈ℕ ⊂ 𝓛𝑝 (𝜇), 1 ⩽ 𝑝 < ∞, be a sequence of
real-valued measurable functions on (𝐸, 𝓐, 𝜇) with

• 𝑢𝑛 (𝑥) ⟶ 𝑢(𝑥) as 𝑛 ↑ ∞ for 𝜇 -almost all 𝑥,

• |𝑢𝑛 (𝑥)| ⩽ 𝑤(𝑥) for 𝜇 -almost all 𝑥 and some positive 𝑤 ∈ 𝓛𝑝 (𝜇) (𝑛 ∈ ℕ).

Then 𝑢 ∈ 𝓛𝑝 (𝜇) and

(a) ‖𝑢 − 𝑢𝑛 ‖𝖫𝑝 ⟶ 0 as 𝑛 ↑ ∞,

(b) ‖𝑢𝑛 ‖𝖫𝑝 ⟶ ‖𝑢‖𝖫𝑝 as 𝑛 ↑ ∞.
Mind that convergence in 𝖫𝑝 , i.e. lim𝑛→∞ ‖𝑢 − 𝑢𝑛 ‖𝖫𝑝 = 0, is not the same as convergence of the
𝖫𝑝 -norms, lim𝑛→∞ ‖𝑢𝑛 ‖𝖫𝑝 = ‖𝑢‖𝖫𝑝 .
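A minimal sketch of this caveat (the concrete choice 𝑢 = 𝟙[0,1/2) and the alternating sequence 𝑢𝑛 = (−1)ⁿ𝑢 are assumptions of the illustration): the norms ‖𝑢𝑛 ‖𝖫1 all equal ‖𝑢‖𝖫1 = 1/2 and so trivially converge, yet ‖𝑢 − 𝑢𝑛 ‖𝖫1 = 1 for every odd 𝑛, so 𝑢𝑛 does not converge to 𝑢 in 𝖫1 .

```python
import numpy as np

# Assumed example: u = indicator of [0, 1/2) on [0,1), u_n = (-1)^n u.
# The L^1-norms of u_n all equal ||u|| = 1/2, but ||u - u_n|| = 1 for odd n.
grid = (np.arange(1 << 12) + 0.5) / (1 << 12)   # midpoint grid on [0,1), exact here
u = (grid < 0.5).astype(float)

def l1_norm(f):
    return float(np.mean(np.abs(f)))             # L^1([0,1)) norm on the grid

norms = [l1_norm((-1.0) ** n * u) for n in range(1, 6)]     # all equal 1/2
gaps = [l1_norm(u - (-1.0) ** n * u) for n in range(1, 6)]  # 1, 0, 1, 0, 1
```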
Theorem A.2 (Riesz). Let (𝑢𝑛 )𝑛∈ℕ ⊂ 𝓛𝑝 (𝜇), 1 ⩽ 𝑝 < ∞. If 𝑢𝑛 (𝑥) ⟶ 𝑢(𝑥) 𝜇 -a.e. as 𝑛 ↑ ∞ and 𝑢 ∈ 𝓛𝑝 (𝜇), then:

‖𝑢𝑛 − 𝑢‖𝖫𝑝 ⟶ 0 ⟺ ‖𝑢𝑛 ‖𝖫𝑝 ⟶ ‖𝑢‖𝖫𝑝 as 𝑛 ↑ ∞.
Theorem A.3 (Riesz-Fischer). The space 𝓛𝑝 (𝜇), 1 ⩽ 𝑝 < ∞, is complete, i.e. every Cauchy sequence
(𝑢𝑛 )𝑛∈ℕ ⊂ 𝓛𝑝 (𝜇) converges to some 𝑢 ∈ 𝓛𝑝 (𝜇).
Corollary A.4. Let (𝑢𝑛 )𝑛∈ℕ ⊂ 𝓛𝑝 (𝜇), 1 ⩽ 𝑝 ⩽ ∞, with 𝑢𝑛 ⟶ 𝑢 in 𝖫𝑝 . Then there exists a subsequence (𝑢𝑛(𝑘) )𝑘∈ℕ
such that 𝑢𝑛(𝑘) (𝑥) ⟶ 𝑢(𝑥) as 𝑘 ↑ ∞ for almost all 𝑥.
Limits in measure on a non-𝜎 -finite measure space (𝐸, 𝓐, 𝜇) need not be unique, but in probability
we only work with finite measures of mass one. Vitali’s theorem generalises Lebesgue’s dominated
convergence theorem.
Theorem A.5 (Vitali’s theorem). For 1 ⩽ 𝑝 < ∞, let (𝑋𝑛 )𝑛∈ℕ ⊂ 𝖫𝑝 (ℙ) be a sequence of random variables
with 𝑋𝑛 ⟶ 𝑋 in ℙ. Then the following are equivalent:

(i) 𝑋𝑛 ⟶ 𝑋 in 𝖫𝑝 ,

(ii) (|𝑋𝑛 |𝑝 )𝑛∈ℕ is uniformly integrable,

(iii) 𝔼 |𝑋𝑛 |𝑝 ⟶ 𝔼 |𝑋 |𝑝 as 𝑛 ↑ ∞.
Remark A.6. Vitali’s theorem A.5 still holds for measure spaces (𝑋, 𝓐, 𝜇) which are not 𝜎 -finite. In
this case, we can no longer identify the 𝖫𝑝 -limit and the theorem reads: if 𝑋𝑗 ⟶ 𝑋 in measure (a measurable
function is enough), then the following are equivalent: