Modes of Convergence Explained

CHAPTER 66. MODES OF CONVERGENCE - RELATIONS
Note that this holds for any arbitrarily small $c$. By the definition of convergence in probability, this means that $X_n$ converges in probability to $X$ (if you are wondering about strict and weak inequalities here and in the definition of convergence in probability, note that $|X_n - X| \geq c$ implies $|X_n - X| > \varepsilon$ for any strictly positive $\varepsilon < c$).
Let $\{X_n\}$ be a sequence of random variables¹. Let $\bar X_n$ be the sample mean of the first $n$ terms of the sequence:

$$\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Suppose that

$$\exists \mu \in \mathbb{R}: \mathrm{E}[X_n] = \mu, \quad \forall n \in \mathbb{N}$$

$$\exists \sigma^2 \in \mathbb{R}_+: \mathrm{Var}[X_n] = \sigma^2, \quad \forall n \in \mathbb{N}$$

$$\mathrm{Cov}[X_n, X_{n+k}] = 0, \quad \forall n, k \in \mathbb{N}$$

Then

$$\operatorname*{plim}_{n\to\infty} \bar X_n = \mu$$
¹ See p. 491.
² See p. 511.
³ See p. 505.
⁴ In other words, all the random variables in the sequence have the same mean $\mu$, the same variance $\sigma^2$ and zero covariance with each other. See p. 493 for a definition of a covariance stationary sequence.
CHAPTER 67. LAWS OF LARGE NUMBERS
By Chebyshev's inequality,

$$\mathrm{P}\left(\left|\bar X_n - \mathrm{E}[\bar X_n]\right| \geq k\right) \leq \frac{\mathrm{Var}[\bar X_n]}{k^2}$$

for any strictly positive real number $k$. Plugging in the values for the expected value and the variance derived above, we obtain:

$$\mathrm{P}\left(\left|\bar X_n - \mu\right| \geq k\right) \leq \frac{\sigma^2}{nk^2}$$

Since

$$\lim_{n\to\infty} \frac{\sigma^2}{nk^2} = 0$$

and

$$\mathrm{P}\left(\left|\bar X_n - \mu\right| \geq k\right) \geq 0$$

then it must also be that:

$$\lim_{n\to\infty} \mathrm{P}\left(\left|\bar X_n - \mu\right| \geq k\right) = 0$$
Note that this holds for any arbitrarily small $k$. By the very definition of convergence in probability, this means that $\bar X_n$ converges in probability to $\mu$ (if you are wondering about strict and weak inequalities here and in the definition of convergence in probability, note that $|\bar X_n - \mu| \geq k$ implies $|\bar X_n - \mu| > \varepsilon$ for any strictly positive $\varepsilon < k$).

⁵ See p. 511.
⁶ See, in particular, the Multiplication by a constant property (p. 158).
⁷ See p. 168.
⁸ See p. 242.

67.1. WEAK LAWS OF LARGE NUMBERS
Note that it is customary to state Chebyshev's Weak Law of Large Numbers as a result on the convergence in probability of the sample mean:

$$\operatorname*{plim}_{n\to\infty} \bar X_n = \mu$$

However, the conditions of the above theorem guarantee the mean square convergence⁹ of the sample mean to $\mu$:

$$\bar X_n \overset{m.s.}{\longrightarrow} \mu$$
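As a quick numerical illustration of the weak law just stated, the following is a minimal simulation sketch. It assumes IID Uniform(0, 1) draws (so $\mu = 1/2$ and $\sigma^2 = 1/12$, a choice made purely for this example), which satisfy the theorem's conditions, and compares the empirical frequency of large deviations of the sample mean with the Chebyshev bound used in the proof.

```python
import random
import statistics

def chebyshev_check(n=200, k=0.1, trials=5000, seed=0):
    """Compare the empirical frequency of |sample mean - mu| >= k with
    the Chebyshev bound sigma^2 / (n k^2) for IID Uniform(0, 1) draws."""
    rng = random.Random(seed)
    mu, var = 0.5, 1.0 / 12.0
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.random() for _ in range(n))
        if abs(xbar - mu) >= k:
            hits += 1
    empirical = hits / trials
    bound = var / (n * k * k)
    return empirical, bound

emp, bound = chebyshev_check()
print(emp, "<=", bound)  # the empirical frequency respects the bound
```

Increasing `n` drives both the bound and the empirical frequency toward zero, which is exactly the statement of the weak law.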
$$\exists \mu \in \mathbb{R}: \mathrm{E}[X_n] = \mu, \quad \forall n > 0$$

$$\forall j \geq 0, \ \exists \gamma_j \in \mathbb{R}: \mathrm{Cov}[X_n, X_{n-j}] = \gamma_j, \quad \forall n > j$$

⁹ See p. 519.
¹⁰ See p. 534.
¹¹ In other words, all the random variables in the sequence have the same mean $\mu$, the same variance $\gamma_0$, and the covariance between a term $X_n$ of the sequence and the term that is located $j$ positions before it ($X_{n-j}$) is always the same ($\gamma_j$), irrespective of how $X_n$ has been chosen.
Proof. For a full proof see e.g. Karlin and Taylor¹² (1975). We give here a proof based on the assumption that covariances are absolutely summable:

$$\sum_{j=0}^{\infty} |\gamma_j| < \infty$$

which is stronger than (67.1). The expected value of the sample mean $\bar X_n$ is

$$\mathrm{E}[\bar X_n] = \mathrm{E}\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n \mathrm{E}[X_i] = \frac{1}{n}\sum_{i=1}^n \mu = \frac{1}{n}\,n\mu = \mu$$

The variance of the sample mean is

$$\mathrm{Var}[\bar X_n] = \frac{1}{n}\left(\gamma_0 + 2\sum_{i=1}^{n-1}\frac{n-i}{n}\,\gamma_i\right)$$

Since $\frac{n-i}{n} < 1$, we obtain

$$\mathrm{Var}[\bar X_n] \leq \frac{M}{n}, \qquad M = \gamma_0 + 2\sum_{j=1}^{\infty}|\gamma_j| < \infty$$
Now we can apply Chebyshev's inequality to the sample mean $\bar X_n$:

$$\mathrm{P}\left(\left|\bar X_n - \mathrm{E}[\bar X_n]\right| \geq k\right) \leq \frac{\mathrm{Var}[\bar X_n]}{k^2}$$

for any strictly positive real number $k$. Plugging in the values for the expected value and the variance derived above, we obtain:

$$\mathrm{P}\left(\left|\bar X_n - \mu\right| \geq k\right) \leq \frac{\mathrm{Var}[\bar X_n]}{k^2} \leq \frac{M}{nk^2}$$

where $M = \gamma_0 + 2\sum_{j=1}^{\infty}|\gamma_j|$. Since

$$\lim_{n\to\infty}\frac{M}{nk^2} = 0$$

and

$$\mathrm{P}\left(\left|\bar X_n - \mu\right| \geq k\right) \geq 0$$

then it must also be that:

$$\lim_{n\to\infty} \mathrm{P}\left(\left|\bar X_n - \mu\right| \geq k\right) = 0$$

Note that this holds for any arbitrarily small $k$. By the very definition of convergence in probability, this means that $\bar X_n$ converges in probability to $\mu$ (if you are wondering about strict and weak inequalities here and in the definition of convergence in probability, note that $|\bar X_n - \mu| \geq k$ implies $|\bar X_n - \mu| > \varepsilon$ for any strictly positive $\varepsilon < k$).
Also Chebyshev's Weak Law of Large Numbers for correlated sequences is usually stated as a result on the convergence in probability of the sample mean:

$$\operatorname*{plim}_{n\to\infty}\bar X_n = \mu$$

However, the conditions of the above theorem also guarantee the mean square convergence of the sample mean to $\mu$:

$$\bar X_n \overset{m.s.}{\longrightarrow} \mu$$
Proof. In the above proof of Chebyshev's Weak Law of Large Numbers for correlated sequences, we proved that

$$\mathrm{Var}[\bar X_n] \leq \frac{M}{n}$$

where $M = \gamma_0 + 2\sum_{j=1}^{\infty}|\gamma_j| < \infty$, and that

$$\mathrm{E}[\bar X_n] = \mu$$

This implies:

$$\mathrm{E}\left[\left(\bar X_n - \mu\right)^2\right] = \mathrm{E}\left[\left(\bar X_n - \mathrm{E}[\bar X_n]\right)^2\right] = \mathrm{Var}[\bar X_n] \leq \frac{M}{n}$$

Thus, taking limits on both sides, we obtain:

$$\lim_{n\to\infty}\mathrm{E}\left[\left(\bar X_n - \mu\right)^2\right] \leq \lim_{n\to\infty}\frac{M}{n} = 0$$

But

$$\mathrm{E}\left[\left(\bar X_n - \mu\right)^2\right] \geq 0$$

so it must be:

$$\lim_{n\to\infty}\mathrm{E}\left[\left(\bar X_n - \mu\right)^2\right] = 0$$

which is precisely the definition of mean square convergence of $\bar X_n$ to $\mu$.
a Strong Law of Large Numbers applies to the sample mean $\bar X_n$ if and only if a Strong Law of Large Numbers applies to each of the components of the vector $\bar X_n$, i.e. if and only if

$$\bar X_{n,j} \overset{a.s.}{\longrightarrow} \mu_j, \qquad j = 1, \ldots, K$$
67.4. SOLVED EXERCISES

Exercise 1

Let $\{\varepsilon_n\}$ be an IID sequence. A generic term of the sequence has mean $\mu$ and variance $\sigma^2$. Let $\{X_n\}$ be a covariance stationary sequence such that a generic term of the sequence satisfies

$$X_n = \alpha X_{n-1} + \varepsilon_n$$

where $|\alpha| < 1$. Denote by $\bar X_n$ the sample mean of the sequence. Verify whether the sequence satisfies the conditions that are required by Chebyshev's Weak Law of Large Numbers. In the affirmative case, find its probability limit.
Solution

By assumption the sequence $\{X_n\}$ is covariance stationary. So all the terms of the sequence have the same expected value. Taking the expected value of both sides of the equation

$$X_n = \alpha X_{n-1} + \varepsilon_n$$

we obtain:

$$\mathrm{E}[X_n] = \mathrm{E}[\alpha X_{n-1} + \varepsilon_n] = \alpha\,\mathrm{E}[X_{n-1}] + \mathrm{E}[\varepsilon_n] = \alpha\,\mathrm{E}[X_n] + \mu$$

so that

$$\mathrm{E}[X_n] = \frac{\mu}{1-\alpha}$$

By the same token, the variance can be derived from:

$$\mathrm{Var}[X_n] = \mathrm{Var}[\alpha X_{n-1} + \varepsilon_n] \overset{A}{=} \alpha^2\,\mathrm{Var}[X_{n-1}] + \mathrm{Var}[\varepsilon_n] = \alpha^2\,\mathrm{Var}[X_n] + \sigma^2$$

where: in step A we have used the fact that $X_{n-1}$ is independent of $\varepsilon_n$ because $\{\varepsilon_n\}$ is IID. Solving for $\mathrm{Var}[X_n]$, we obtain

$$\mathrm{Var}[X_n] = \frac{\sigma^2}{1-\alpha^2}$$

Now, we need to derive $\mathrm{Cov}[X_n, X_{n+j}]$. Note that:

$$X_{n+1} = \alpha X_n + \varepsilon_{n+1}$$

$$X_{n+2} = \alpha X_{n+1} + \varepsilon_{n+2} = \alpha^2 X_n + \varepsilon_{n+2} + \alpha\varepsilon_{n+1}$$

$$X_{n+3} = \alpha X_{n+2} + \varepsilon_{n+3} = \alpha^3 X_n + \varepsilon_{n+3} + \alpha\varepsilon_{n+2} + \alpha^2\varepsilon_{n+1}$$

$$\vdots$$

$$X_{n+j} = \alpha X_{n+j-1} + \varepsilon_{n+j} = \alpha^j X_n + \sum_{s=0}^{j-1}\alpha^s\varepsilon_{n+j-s}$$

Therefore

$$\mathrm{Cov}[X_n, X_{n+j}] \overset{B}{=} \alpha^j\,\mathrm{Cov}[X_n, X_n] = \alpha^j\,\mathrm{Var}[X_n] = \frac{\alpha^j\sigma^2}{1-\alpha^2}$$

where: in step B we have used the fact that $X_n$ is independent of $\varepsilon_{n+1}, \ldots, \varepsilon_{n+j}$. Since $|\alpha| < 1$, the covariances are absolutely summable, and the conditions of Chebyshev's Weak Law of Large Numbers for correlated sequences are satisfied. Therefore, the sample mean converges in probability to the population mean:

$$\operatorname*{plim}_{n\to\infty}\bar X_n = \mathrm{E}[X_n] = \frac{\mu}{1-\alpha}$$
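The probability limit just derived can be made concrete with a short simulation. This is a minimal sketch, assuming Gaussian innovations with $\mu = 1$, $\sigma = 1$ and $\alpha = 0.5$ (all hypothetical choices), so that the population mean $\mu/(1-\alpha)$ equals 2.

```python
import random

def ar1_sample_mean(alpha=0.5, mu=1.0, sigma=1.0, n=200_000, seed=42):
    """Simulate X_t = alpha * X_{t-1} + eps_t with IID eps_t ~ N(mu, sigma^2)
    and return the sample mean, which should approach mu / (1 - alpha)."""
    rng = random.Random(seed)
    x = mu / (1.0 - alpha)  # start at the stationary mean
    total = 0.0
    for _ in range(n):
        x = alpha * x + rng.gauss(mu, sigma)
        total += x
    return total / n

xbar = ar1_sample_mean()
print(xbar)  # close to mu / (1 - alpha) = 2
```

Re-running with larger `n` shrinks the gap between the sample mean and 2, as the weak law predicts.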
Chapter 68

Central Limit Theorems
Let $\{X_n\}$ be a sequence of random variables¹. Let $\bar X_n$ be the sample mean of the first $n$ terms of the sequence:

$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$$

A Central Limit Theorem states that, under appropriate conditions,

$$\sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} \overset{d}{\longrightarrow} Z$$

where $Z$ is a standard normal random variable², $\mu$ and $\sigma$ are two constants and $\overset{d}{\longrightarrow}$ indicates convergence in distribution³.

Why is the expression $(\bar X_n - \mu)/\sigma$ multiplied by the square root of $n$? If we do not multiply it by $\sqrt{n}$, then $(\bar X_n - \mu)/\sigma$ converges to a constant, provided that the conditions⁴ of a Law of Large Numbers apply. On the contrary, multiplying it by $\sqrt{n}$, we obtain a sequence that converges to a proper random variable (i.e. a random variable that is not constant). When the conditions of a Central Limit Theorem apply, this variable has a normal distribution.

In practice, the CLT is used as follows:

1. we observe a sample consisting of $n$ observations $X_1$, $X_2$, $\ldots$, $X_n$;

2. if $n$ is large enough, then a standard normal distribution is a good approximation of the distribution of $\sqrt{n}\,(\bar X_n - \mu)/\sigma$;

3. therefore, we pretend that

$$\sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} \sim N(0, 1)$$
¹ See p. 491.
² Remember that a standard normal random variable is a normal random variable with zero mean and unit variance (p. 376).
³ See p. 527.
⁴ See p. 535.
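The three-step recipe above can be exercised in a few lines. This is a sketch, assuming Exponential(1) observations (so $\mu = \sigma = 1$, a choice made only for this illustration): for many independent samples of size $n$, we compute $\sqrt{n}\,(\bar X_n - \mu)/\sigma$ and check that the resulting values look approximately standard normal.

```python
import math
import random
import statistics

def standardized_means(n=100, trials=4000, seed=1):
    """For each trial, draw n Exponential(1) observations and return
    sqrt(n) * (xbar - mu) / sigma; by the CLT these values should look
    approximately standard normal."""
    rng = random.Random(seed)
    mu = sigma = 1.0
    out = []
    for _ in range(trials):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        out.append(math.sqrt(n) * (xbar - mu) / sigma)
    return out

zs = standardized_means()
print(statistics.fmean(zs), statistics.pstdev(zs))  # near 0 and 1
```

The exponential distribution is strongly skewed, yet the standardized sample means are already close to $N(0,1)$ at $n = 100$, which is what makes the approximation useful in practice.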
There are several Central Limit Theorems. We report some examples below.

Lindeberg-Lévy Central Limit Theorem: let $\{X_n\}$ be an IID sequence such that

$$\mathrm{E}[X_n] = \mu < \infty, \quad \forall n \in \mathbb{N}$$

$$\mathrm{Var}[X_n] = \sigma^2 < \infty, \quad \forall n \in \mathbb{N}$$

where $\sigma^2 > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar X_n$:

$$\sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} \overset{d}{\longrightarrow} Z$$

where $Z$ is a standard normal random variable and $\overset{d}{\longrightarrow}$ denotes convergence in distribution.
Proof. We will just sketch a proof. For a detailed and rigorous proof see, for example, Resnick⁶ (1999) and Williams⁷ (1991). First of all, denote by $\{Z_n\}$ the sequence whose generic term is

$$Z_n = \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma}$$

and by $\{Y_n\}$ the sequence of standardized variables $Y_n = (X_n - \mu)/\sigma$, so that

$$\mathrm{E}[Y_1] = 0, \qquad \mathrm{Var}[Y_1] = 1$$

and $Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i$. Therefore, denoting characteristic functions by $\varphi$:

$$\lim_{n\to\infty}\varphi_{Z_n}(t) = \lim_{n\to\infty}\left[\varphi_{Y_1}\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \lim_{n\to\infty}\left[1 - \frac{1}{2}\left(\frac{t}{\sqrt{n}}\right)^2 + o\!\left(\left(\frac{t}{\sqrt{n}}\right)^2\right)\right]^n$$

$$= \lim_{n\to\infty}\left[1 - \frac{1}{2}\frac{t^2}{n} + o\!\left(\frac{t^2}{n}\right)\right]^n = \exp\left(-\frac{1}{2}t^2\right) = \varphi_Z(t)$$

where

$$\varphi_Z(t) = \exp\left(-\frac{1}{2}t^2\right)$$

is the characteristic function of a standard normal random variable $Z$ (see the lecture entitled Normal distribution - p. 379). A theorem, called Lévy continuity theorem, which we do not cover in these lectures, states that if a sequence of random variables $\{Z_n\}$ is such that their characteristic functions $\varphi_{Z_n}(t)$ converge to the characteristic function $\varphi_Z(t)$ of a random variable $Z$, then the sequence $\{Z_n\}$ converges in distribution to $Z$. Therefore, in our case the sequence $\{Z_n\}$ converges in distribution to a standard normal distribution.
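The key limit in the proof, $\left[1 - t^2/(2n)\right]^n \to \exp(-t^2/2)$ (dropping the $o(\cdot)$ term), can be checked numerically with a few lines:

```python
import math

def cf_approx(t, n):
    """n-th power of the truncated expansion 1 - t^2/(2n) used in the proof."""
    return (1.0 - t * t / (2.0 * n)) ** n

t = 1.5
limit = math.exp(-t * t / 2.0)
for n in (10, 100, 10_000):
    print(n, abs(cf_approx(t, n) - limit))  # the gap shrinks as n grows
```

The gap decreases roughly like $1/n$, consistent with the $o(t^2/n)$ remainder being the only neglected term.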
So, roughly speaking, under the stated assumptions, the distribution of the sample mean $\bar X_n$ can be approximated by a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$ (provided $n$ is large enough).

Also note that the conditions for the validity of the Lindeberg-Lévy Central Limit Theorem resemble the conditions for the validity of Kolmogorov's Strong Law of Large Numbers¹⁰. The only difference is the additional requirement that

$$\mathrm{Var}[X_n] = \sigma^2 < \infty, \quad \forall n \in \mathbb{N}$$

beyond the common requirement that $\mathrm{E}[X_n] = \mu < \infty$ for all $n \in \mathbb{N}$.

¹⁰ See p. 540.
¹¹ See p. 492.
¹² See p. 494.
68.2. MULTIVARIATE GENERALIZATIONS
Central Limit Theorem for correlated sequences: let $\{X_n\}$ be a sequence of random variables such that

$$\mathrm{E}[X_n] = \mu < \infty, \quad \forall n \in \mathbb{N}$$

$$\mathrm{Var}[X_n] = \sigma^2 < \infty, \quad \forall n \in \mathbb{N}$$

$$\lim_{n\to\infty} n\,\mathrm{Var}[\bar X_n] = \sigma^2 + 2\sum_{i=2}^{\infty}\mathrm{Cov}[X_1, X_i] = V < \infty$$

where $V > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar X_n$:

$$\sqrt{n}\,\frac{\bar X_n - \mu}{\sqrt{V}} \overset{d}{\longrightarrow} Z$$

where $Z$ is a standard normal random variable and $\overset{d}{\longrightarrow}$ denotes convergence in distribution.
and the fact that ergodicity is replaced by the stronger condition of mixing. Finally, let us mention that the variance $V$ in the above proposition, which is defined as

$$V = \lim_{n\to\infty} n\,\mathrm{Var}[\bar X_n]$$

is called the long-run variance of $\bar X_n$.
Multivariate Lindeberg-Lévy Central Limit Theorem: let $\{X_n\}$ be an IID sequence of $K$-dimensional random vectors such that

$$\mathrm{E}[X_n] = \mu \in \mathbb{R}^K, \quad \forall n \in \mathbb{N}$$

$$\mathrm{Var}[X_n] = \Sigma \in \mathbb{R}^{K\times K}, \quad \forall n \in \mathbb{N}$$

where $\Sigma$ is a positive definite matrix. Let $\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the vector of sample means. Then:

$$\sqrt{n}\,\Sigma^{-1/2}\left(\bar X_n - \mu\right) \overset{d}{\longrightarrow} Z$$

where $Z$ is a standard multivariate normal random vector¹⁶ and $\overset{d}{\longrightarrow}$ denotes convergence in distribution.

Proof. For a proof see, for example, Basu¹⁷ (2004), DasGupta¹⁸ (2008) and McCabe and Tremayne¹⁹ (1993).

In a similar manner, the CLT for correlated sequences generalizes to random vectors ($V$ becomes a matrix, called the long-run covariance matrix).

¹³ Durrett, R. (2010) Probability: Theory and Examples, Cambridge University Press.
¹⁴ White, H. (2001) Asymptotic Theory for Econometricians, Academic Press.
¹⁵ See p. 541.
Exercise 1

Let $\{X_n\}$ be a sequence of independent Bernoulli random variables²⁰ with parameter $p = \frac{1}{2}$, i.e. a generic term $X_n$ of the sequence has support

$$R_{X_n} = \{0, 1\}$$

and probability mass function $p_{X_n}(0) = p_{X_n}(1) = \frac{1}{2}$. Derive an approximate distribution for the sample mean of the first 100 terms of the sequence.
Solution

The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence is

$$\mathrm{E}[X_n] = \sum_{x\in R_{X_n}} x\,p_{X_n}(x) = 1\cdot p_{X_n}(1) + 0\cdot p_{X_n}(0) = 1\cdot\frac{1}{2} + 0\cdot\frac{1}{2} = \frac{1}{2} < \infty$$

¹⁶ See p. 439.
¹⁷ Basu, A. K. (2004) Measure Theory and Probability, PHI Learning.
¹⁸ DasGupta, A. (2008) Asymptotic Theory of Statistics and Probability, Springer.
¹⁹ McCabe, B. and A. Tremayne (1993) Elements of Modern Asymptotic Theory with Statistical Applications.
formula for computing the variance21 :
X
E Xn2 = x2 pX (x) = 12 pXn (1) + 02 pXn (0)
x2RXn
1 1 1
= 1 +0 =
2 2 2
2 1
E [Xn ] =
4
2 1 1 1
Var [Xn ] = E Xn2 E [Xn ] = = <1
2 4 4
Therefore, the sequence fXn g satis…es the conditions of Lindeberg-Lévy Central
Limit Theorem (IID, …nite mean, …nite variance). The mean of the …rst 100 terms
of the sequence is:
100
1 X
X 100 = Xi
100 i=1
Using the Central Limit Theorem to approximate its distribution, we obtain:
Var [Xn ]
Xn N E [Xn ] ;
n
or
1 1
X 100 N ;
2 400
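The approximation $\bar X_{100} \approx N(1/2,\ 1/400)$ can be checked by simulating many such sample means. A small sketch:

```python
import random
import statistics

def bernoulli_means(trials=20_000, n=100, seed=7):
    """Sample means of n fair Bernoulli draws, repeated `trials` times."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n)) / n
            for _ in range(trials)]

means = bernoulli_means()
# The empirical mean and variance should sit near 1/2 and 1/400 = 0.0025.
print(statistics.fmean(means), statistics.pvariance(means))
```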
Exercise 2

Let $\{X_n\}$ be a sequence of independent Bernoulli random variables with parameter $p = \frac{1}{2}$, as in the previous exercise. Let $\{Y_n\}$ be another sequence of random variables such that

$$Y_n = X_{n+1} - \frac{1}{2}X_n, \quad \forall n$$

Suppose $\{Y_n\}$ satisfies the conditions of a Central Limit Theorem for correlated sequences. Derive an approximate distribution for the mean of the first $n$ terms of the sequence $\{Y_n\}$.

Solution

The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence $\{Y_n\}$ is

$$\mathrm{E}[Y_n] = \mathrm{E}\left[X_{n+1} - \frac{1}{2}X_n\right] = \mathrm{E}[X_{n+1}] - \frac{1}{2}\mathrm{E}[X_n] = \frac{1}{2} - \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$

The variance of a generic term of the sequence is

$$\mathrm{Var}[Y_n] = \mathrm{Var}\left[X_{n+1} - \frac{1}{2}X_n\right]$$

²¹ $\mathrm{Var}[X] = \mathrm{E}[X^2] - \mathrm{E}[X]^2$. See p. 156.
$$= \mathrm{Var}[X_{n+1}] + \frac{1}{4}\mathrm{Var}[X_n] - 2\cdot\frac{1}{2}\mathrm{Cov}[X_{n+1}, X_n] \overset{A}{=} \mathrm{Var}[X_{n+1}] + \frac{1}{4}\mathrm{Var}[X_n] = \frac{1}{4} + \frac{1}{4}\cdot\frac{1}{4} = \frac{5}{16}$$
where: in step A we have used the fact that $X_n$ and $X_{n+1}$ are independent. The covariance between two successive terms of the sequence is

$$\mathrm{Cov}[Y_{n+1}, Y_n] = \mathrm{Cov}\left[X_{n+2} - \frac{1}{2}X_{n+1},\ X_{n+1} - \frac{1}{2}X_n\right]$$

$$\overset{A}{=} \mathrm{Cov}[X_{n+2}, X_{n+1}] - \frac{1}{2}\mathrm{Cov}[X_{n+2}, X_n] - \frac{1}{2}\mathrm{Cov}[X_{n+1}, X_{n+1}] + \frac{1}{4}\mathrm{Cov}[X_{n+1}, X_n]$$

$$\overset{B}{=} -\frac{1}{2}\mathrm{Cov}[X_{n+1}, X_{n+1}] \overset{C}{=} -\frac{1}{2}\mathrm{Var}[X_{n+1}] = -\frac{1}{2}\cdot\frac{1}{4} = -\frac{1}{8}$$

where: in step A we have used the bilinearity of covariance; in step B the fact that distinct terms of an IID sequence are independent, so their covariances are zero; in step C the fact that the covariance of a random variable with itself equals its variance.
For $j \geq 2$, the covariance is

$$\mathrm{Cov}[Y_{n+j}, Y_n] = \mathrm{Cov}\left[X_{n+j+1} - \frac{1}{2}X_{n+j},\ X_{n+1} - \frac{1}{2}X_n\right]$$

$$\overset{A}{=} \mathrm{Cov}[X_{n+j+1}, X_{n+1}] - \frac{1}{2}\mathrm{Cov}[X_{n+j+1}, X_n] - \frac{1}{2}\mathrm{Cov}[X_{n+j}, X_{n+1}] + \frac{1}{4}\mathrm{Cov}[X_{n+j}, X_n] \overset{B}{=} 0$$

because all the terms involved are covariances between distinct terms of an IID sequence. The long-run variance is therefore

$$V = \mathrm{Var}[Y_n] + 2\sum_{j=1}^{\infty}\mathrm{Cov}[Y_{n+j}, Y_n] = \frac{5}{16} + 2\cdot\left(-\frac{1}{8}\right) = \frac{1}{16}$$

Using the Central Limit Theorem for correlated sequences to approximate its distribution, we obtain

$$\bar Y_n \approx N\left(\mathrm{E}[Y_n], \frac{V}{n}\right)$$

or

$$\bar Y_n \approx N\left(\frac{1}{4}, \frac{1}{16n}\right)$$
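The result $\bar Y_n \approx N(1/4,\ 1/(16n))$ can likewise be checked by simulation, with $Y_k = X_{k+1} - X_k/2$ as in the exercise. This is a sketch with $n = 400$, an arbitrary choice:

```python
import random
import statistics

def ybar_samples(n=400, trials=5000, seed=3):
    """Sample means of Y_k = X_{k+1} - X_k / 2 for fair Bernoulli X's."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        xs = [rng.random() < 0.5 for _ in range(n + 1)]
        ys = [xs[k + 1] - 0.5 * xs[k] for k in range(n)]
        out.append(statistics.fmean(ys))
    return out

n = 400
ybars = ybar_samples(n=n)
# Mean near 1/4; n * Var(Ybar_n) near the long-run variance 1/16.
print(statistics.fmean(ybars), n * statistics.pvariance(ybars))
```

Note that $n \cdot \mathrm{Var}[\bar Y_n]$ estimates the long-run variance $V = 1/16$, not the per-term variance $5/16$; the negative lag-one covariance is what pulls it down.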
Exercise 3

Let $Y$ be a binomial random variable with parameters $n = 100$ and $p = \frac{1}{2}$ (you need to read the lecture entitled Binomial distribution²³ in order to be able to solve this exercise). Using the Central Limit Theorem, show that a normal random variable $X$ with mean $\mu = 50$ and variance $\sigma^2 = 25$ can be used as an approximation of $Y$.

Solution

A binomial random variable $Y$ with parameters $n = 100$ and $p = \frac{1}{2}$ can be written as

$$Y = \sum_{i=1}^{100} X_i$$

where the $X_i$ are independent Bernoulli random variables with parameter $p = \frac{1}{2}$. In the first exercise, we have shown that the distribution of $\bar X_{100}$ can be approximated by a normal distribution:

$$\bar X_{100} \approx N\left(\frac{1}{2}, \frac{1}{400}\right)$$

Since $Y = 100\,\bar X_{100}$, it follows that

$$Y \approx N\left(100\cdot\frac{1}{2},\ 100^2\cdot\frac{1}{400}\right) = N(50, 25)$$

²³ See p. 341.
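The quality of the $N(50, 25)$ approximation can be quantified exactly, since the binomial pmf is computable. The sketch below compares $P(45 \leq Y \leq 55)$ under both distributions; the half-unit shift in the normal evaluation is a continuity correction, a standard refinement not discussed in the exercise.

```python
import math

def binom_pmf(k, n=100, p=0.5):
    """Exact Binomial(n, p) probability mass at k."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def normal_cdf(x, mu=50.0, sigma=5.0):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

exact = sum(binom_pmf(k) for k in range(45, 56))
approx = normal_cdf(55.5) - normal_cdf(44.5)
print(exact, approx)  # the two probabilities agree closely
```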
Statistika Matematika II (Mathematical Statistics II)

Problem Bank | Fifth-Semester Course
3. Let $y_1, y_2, \ldots, y_n$ be a random sample from a Poisson distribution with parameter $\lambda$. Find the maximum likelihood estimator of $\lambda$ and derive its mean and variance.

4. Let $X$ be a random variable with pdf $f(x) = 4x^3$ for $0 < x < 1$. Find the pdf of the random variables

(a) $Y = e^X$;

(b) $U = (X - \frac{1}{2})^2$.
Solution

1. Let $y_1, y_2, \ldots, y_n$ be a random sample from the pdf

$$f(y; \theta) = \begin{cases}\theta y^{\theta-1}, & 0 < y < 1,\ \theta > 0\\ 0, & \text{otherwise}\end{cases}$$

(a) The first population moment is

$$M = \int_0^1 \theta y^{\theta}\,dy = \left.\frac{\theta}{\theta+1}\,y^{\theta+1}\right|_0^1 = \frac{\theta}{\theta+1}$$

Equating it to the sample mean $\bar y$ gives the method-of-moments estimator:

$$\bar y = \frac{\hat\theta}{\hat\theta+1} \iff \hat\theta = \frac{\bar y}{1-\bar y}$$
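A quick simulation check of the estimator $\hat\theta = \bar y/(1-\bar y)$. This sketch uses inverse-CDF sampling (here $F(y) = y^\theta$ on $(0,1)$, so $Y = U^{1/\theta}$ for uniform $U$) with a hypothetical true value $\theta = 3$:

```python
import random
import statistics

def mom_theta(theta=3.0, n=50_000, seed=5):
    """Method-of-moments estimate ybar / (1 - ybar) from n simulated draws
    of the density theta * y**(theta - 1) on (0, 1)."""
    rng = random.Random(seed)
    ybar = statistics.fmean(rng.random() ** (1.0 / theta) for _ in range(n))
    return ybar / (1.0 - ybar)

print(mom_theta())  # close to the true theta = 3
```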
2. Given the decomposition

$$\hat\theta - \theta = [\hat\theta - E(\hat\theta)] + B(\hat\theta)$$

where $B(\hat\theta) = E(\hat\theta) - \theta$ is the bias, we obtain

$$E(\hat\theta^2) - 2\theta E(\hat\theta) + \theta^2 = E(\hat\theta^2 - 2\theta\hat\theta + \theta^2) = E\big((\hat\theta - \theta)^2\big),$$

which is the mean squared error of $\hat\theta$.
3. Let $y_1, y_2, \ldots, y_n$ be a random sample from a Poisson distribution with parameter $\lambda$.

(a) The likelihood function is

$$L = P\{y_1 \cap y_2 \cap \cdots \cap y_n\} = \prod_{i=1}^n \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}$$

so that

$$\ln(L) = -n\lambda + \left(\sum_{i=1}^n y_i\right)\ln(\lambda) - \sum_{i=1}^n \ln(y_i!).$$

Setting the derivative to zero,

$$\frac{\partial \ln(L)}{\partial\lambda} = 0 \implies -n + \frac{\sum_{i=1}^n y_i}{\lambda} = 0 \implies \hat\lambda = \frac{1}{n}\sum_{i=1}^n y_i = \bar y,$$

with $\frac{\partial^2\ln(L)}{\partial\lambda^2} = -\frac{\sum_i y_i}{\lambda^2} < 0$, so $\hat\lambda = \bar y$ is indeed a maximum and hence the MLE of $\lambda$.

(b)

$$E(\hat\lambda) = E(\bar y) = E\left(\frac{1}{n}\sum_{i=1}^n y_i\right) = \frac{1}{n}\sum_{i=1}^n E(y_i) = \frac{1}{n}\sum_{i=1}^n \lambda = \frac{1}{n}(n\lambda) = \lambda,$$

$$\mathrm{Var}(\hat\lambda) = \mathrm{Var}(\bar y) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n y_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(y_i) = \frac{1}{n^2}\sum_{i=1}^n \lambda = \frac{1}{n^2}(n\lambda) = \frac{\lambda}{n},$$

so $\hat\lambda$ is unbiased.
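The first-order condition can be double-checked numerically: the Poisson log-likelihood below, evaluated at $\hat\lambda = \bar y$, should dominate nearby values of $\lambda$. The sample is hypothetical.

```python
import math

def loglik(lam, ys):
    """Poisson log-likelihood: -n*lam + sum(y)*ln(lam) - sum(ln(y!))."""
    return (-len(ys) * lam + sum(ys) * math.log(lam)
            - sum(math.lgamma(y + 1) for y in ys))

ys = [2, 0, 3, 1, 4, 2, 1]        # hypothetical Poisson counts
lam_hat = sum(ys) / len(ys)        # the MLE derived above: ybar
print(all(loglik(lam_hat, ys) >= loglik(lam_hat + d, ys)
          for d in (-0.5, -0.1, 0.1, 0.5)))  # prints True
```

Since the log-likelihood is strictly concave in $\lambda$, the sample mean beats every other candidate, not just these four.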
4. (a) $W = e^X$ means

$$F_W(w) = P(e^X \leq w) = P(X \leq \ln(w)) = F_X(\ln(w)) = (\ln(w))^4,$$

$$f_W(w) = \frac{d}{dw}F_W(w) = \frac{d}{dw}(\ln(w))^4 = \frac{4(\ln(w))^3}{w}.$$

The range of $X$ is $0 < x < 1$, so the range of $W$ is $1 < w < e$. The pdf of $W$ is

$$f_W(w) = \frac{4(\ln(w))^3}{w}, \qquad 1 < w < e.$$
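As a sanity check, the derived density should integrate to 1 over $(1, e)$; a midpoint-rule approximation confirms this:

```python
import math

def f_w(w):
    """Density of W = exp(X) when f_X(x) = 4x^3 on (0, 1)."""
    return 4.0 * math.log(w) ** 3 / w

m = 100_000                         # number of midpoint-rule cells
h = (math.e - 1.0) / m
total = sum(f_w(1.0 + (i + 0.5) * h) for i in range(m)) * h
print(total)  # numerically indistinguishable from 1
```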
(b) $U = (X - 0.5)^2$, so

$$F_U(u) = P(U \leq u) = P((X-0.5)^2 \leq u) = P(-\sqrt{u} \leq X - 0.5 \leq \sqrt{u})$$

$$= P(X \leq 0.5 + \sqrt{u}) - P(X < 0.5 - \sqrt{u}) = F_X(0.5+\sqrt{u}) - F_X(0.5-\sqrt{u}),$$

$$f_U(u) = \frac{d}{du}F_U(u) = \frac{d}{du}\left(F_X(0.5+\sqrt{u}) - F_X(0.5-\sqrt{u})\right)$$

$$= f_X(0.5+\sqrt{u})\cdot\frac{1}{2\sqrt{u}} - f_X(0.5-\sqrt{u})\cdot\left(-\frac{1}{2\sqrt{u}}\right)$$

$$= \frac{1}{2\sqrt{u}}\left(f_X(0.5+\sqrt{u}) + f_X(0.5-\sqrt{u})\right)$$

$$= \frac{1}{2\sqrt{u}}\left(4(0.5+\sqrt{u})^3 + 4(0.5-\sqrt{u})^3\right)$$

$$= (0.5 + 6u)\,u^{-1/2}, \qquad 0 < u < 0.25.$$
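The final density can be verified by integrating it in closed form: an antiderivative of $(0.5 + 6u)/\sqrt{u}$ is $\sqrt{u} + 4u^{3/2}$, and evaluating it over $(0, 1/4)$ gives total probability 1:

```python
import math

def F_u(u):
    """Antiderivative of f_U(u) = (0.5 + 6u) / sqrt(u)."""
    return math.sqrt(u) + 4.0 * u ** 1.5

total = F_u(0.25) - F_u(0.0)
print(total)  # total probability over (0, 1/4); equals 1
```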
We want to find a 95% confidence interval for the difference in the average beef prices above. (Weight: 25 points)
2. We are given data on a TV station's revenue from spot advertising during sports programs and films. The advertising revenue is assumed to be normally distributed. A summary of the sample data is given in the following table.

Sports: $n_1 = 14$, $\bar X_1 = 6.8$ billion, $S_1 = 1.8$ billion
Films: $n_2 = 15$, $\bar X_2 = 5.3$ billion, $S_2 = 1.6$ billion

Furthermore, we assume that $\sigma_1^2 = \sigma_2^2$. The managers want to change the broadcast times of the two programs. Before doing so, however, they want to know whether there really is a difference in advertising revenue between the two programs. Carry out a hypothesis test using $\alpha = 5\%$. (Weight: 25 points)
3. Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size $n$ from a distribution with pdf of the form

$$f(x; \theta) = \begin{cases}\theta x^{\theta-1}, & 0 < x < 1\\ 0, & \text{otherwise}\end{cases}$$

Prove that the best critical region for testing the hypothesis $H_0: \theta = 1$ against $H_1: \theta = 2$ is $\left\{(x_1, \ldots, x_n): \prod_{i=1}^n x_i \geq c\right\}$.

4. Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size $n$ from an $N(\mu; 36)$ distribution. Prove that the uniformly most powerful critical region for testing the hypothesis $H_0: \mu = 50$ against $H_1: \mu < 50$ is given by $C = \{(x_1, x_2, \ldots, x_n): \bar x \leq c\}$.
Solution

1. Let $\bar x_1 = 38{,}750$, $n_1 = 100$, $\sigma_1^2 = 3{,}300$ and $\bar x_2 = 36{,}150$, $n_2 = 100$, $\sigma_2^2 = 2{,}700$. We obtain

$$\sigma = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 7.746$$

and $\frac{\alpha}{2} = 0.025 \implies Z_{0.025} = 1.96$. We then obtain the confidence interval

$$(\bar x_1 - \bar x_2) \pm Z_{0.025}\,\sigma = 2{,}600 \pm 1.96\cdot 7.746,$$

i.e. approximately $(2{,}584.8,\ 2{,}615.2)$.
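The interval arithmetic can be reproduced in a few lines (the numbers are those quoted in the solution):

```python
import math

x1bar, n1, var1 = 38_750, 100, 3_300
x2bar, n2, var2 = 36_150, 100, 2_700

se = math.sqrt(var1 / n1 + var2 / n2)   # standard error of the difference
z = 1.96                                 # z_{0.025} for a 95% interval
diff = x1bar - x2bar
lo, hi = diff - z * se, diff + z * se
print(round(se, 3), round(lo, 2), round(hi, 2))  # 7.746 2584.82 2615.18
```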
2. We test

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad \text{against} \qquad H_1: \sigma_1^2 \neq \sigma_2^2$$

$$\alpha = 0.05 \implies f_{(0.05;\,13,14)} = 2.51, \qquad f_{(0.95;\,13,14)} = \frac{1}{2.51} = 0.39.$$

The critical region is $F > 2.51$ or $F < 0.39$. Conclusion: reject $H_1$ and accept $H_0$ that $\sigma_1^2 = \sigma_2^2$; there is no difference in advertising revenue between the two programs if the broadcast times are changed.
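The test statistic itself is simple arithmetic. A sketch comparing $F = S_1^2/S_2^2$ against the quoted critical values:

```python
s1, s2 = 1.8, 1.6                     # sample standard deviations
f_stat = s1**2 / s2**2                # observed F statistic
f_upper, f_lower = 2.51, 1.0 / 2.51   # f(0.05; 13, 14) and f(0.95; 13, 14)
reject = f_stat > f_upper or f_stat < f_lower
print(f_stat, reject)  # F sits inside (0.39, 2.51), so H0 is not rejected
```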
3. The likelihood is

$$L(\theta; x_1, \ldots, x_n) = (\theta x_1^{\theta-1})(\theta x_2^{\theta-1})\cdots(\theta x_n^{\theta-1}) = \theta^n\left(\prod_{i=1}^n x_i\right)^{\theta-1}.$$

In particular,

$$L(1; x_1, x_2, \ldots, x_n) = 1^n\left(\prod_{i=1}^n x_i\right)^{1-1} = 1,$$

$$L(2; x_1, x_2, \ldots, x_n) = 2^n\left(\prod_{i=1}^n x_i\right)^{2-1} = 2^n\prod_{i=1}^n x_i,$$

so that, by the Neyman-Pearson lemma, the best critical region is given by

$$\frac{L(1; x_1, \ldots, x_n)}{L(2; x_1, \ldots, x_n)} = \frac{1}{2^n\prod_{i=1}^n x_i} \leq k \iff \prod_{i=1}^n x_i \geq \frac{1}{2^n k} = c.$$
4. The likelihood under $H_0$ is

$$L(50; x_1, x_2, \ldots, x_n) = \frac{1}{(6\sqrt{2\pi})^n}\exp\left(-\frac{1}{72}\sum_{i=1}^n (x_i - 50)^2\right),$$

while for a value $\mu' < 50$ under the alternative,

$$L(\mu'; x_1, x_2, \ldots, x_n) = \frac{1}{(6\sqrt{2\pi})^n}\exp\left(-\frac{1}{72}\sum_{i=1}^n (x_i - \mu')^2\right),$$

so that

$$\frac{L(50; x_1, \ldots, x_n)}{L(\mu'; x_1, \ldots, x_n)} \leq k \iff \frac{\exp\left(-\frac{1}{72}\sum_{i=1}^n (x_i-50)^2\right)}{\exp\left(-\frac{1}{72}\sum_{i=1}^n (x_i-\mu')^2\right)} \leq k$$

$$\iff \exp\left(-\frac{1}{72}\sum_{i=1}^n (x_i-50)^2 + \frac{1}{72}\sum_{i=1}^n (x_i-\mu')^2\right) \leq k$$

$$\iff \exp\left(\frac{1}{72}\left(n(2\cdot 50 - 2\mu')\frac{1}{n}\sum_{i=1}^n x_i + n\left((\mu')^2 - 50^2\right)\right)\right) \leq k$$

$$\iff \exp(c_1\bar x + c_2) \leq k \iff c_1\bar x + c_2 \leq \ln(k) \iff \bar x \leq \frac{\ln(k) - c_2}{c_1} = c,$$

where $c_1 = \frac{2n(50-\mu')}{72} > 0$ (since $\mu' < 50$) and $c_2 = \frac{n\left((\mu')^2 - 50^2\right)}{72}$. Since the resulting region $\{\bar x \leq c\}$ does not depend on the particular value $\mu' < 50$, it is uniformly most powerful.