Modes of Convergence Explained

The document discusses various modes of convergence for sequences of random variables, including almost sure convergence, convergence in probability, convergence in distribution, and mean square convergence. It presents propositions that establish the relationships and implications between these modes, with particular focus on the Laws of Large Numbers (LLN), both weak and strong. It also includes specific examples and proofs related to Chebyshev's Weak Law of Large Numbers for uncorrelated and correlated sequences.


Chapter 66

Relations between modes of convergence

In the previous lectures, we have introduced several notions of convergence of a sequence of random variables (also called modes of convergence). There are several relations between the various modes of convergence, which are discussed in the following subsections and summarized by the following diagram (an arrow denotes implication in the arrow's direction):

    Almost sure convergence       Mean square convergence
               \                       /
                v                     v
             Convergence in probability
                        |
                        v
             Convergence in distribution

66.1 Almost sure ⇒ Probability

Proposition 318 If a sequence of random variables $\{X_n\}$ converges almost surely to a random variable $X$, then $\{X_n\}$ also converges in probability to $X$.

Proof. See e.g. Resnick[1] (1999).

66.2 Probability ⇒ Distribution

Proposition 319 If a sequence of random variables $\{X_n\}$ converges in probability to a random variable $X$, then $\{X_n\}$ also converges in distribution to $X$.

Proof. See e.g. Resnick (1999).

[1] Resnick, S.I. (1999) A Probability Path, Birkhäuser.


66.3 Almost sure ⇒ Distribution

Proposition 320 If a sequence of random variables $\{X_n\}$ converges almost surely to a random variable $X$, then $\{X_n\}$ also converges in distribution to $X$.

Proof. This is obtained by putting together Propositions 318 and 319 above.

66.4 Mean square ⇒ Probability

Proposition 321 If a sequence of random variables $\{X_n\}$ converges in mean square to a random variable $X$, then $\{X_n\}$ also converges in probability to $X$.

Proof. We can apply Markov's inequality[2] to a generic term of the sequence $\{(X_n - X)^2\}$:
$$P\left((X_n - X)^2 \geq c^2\right) \leq \frac{E\left[(X_n - X)^2\right]}{c^2}$$
for any strictly positive real number $c$. Taking the square root of both sides of the inequality inside the probability, we obtain
$$P\left(|X_n - X| \geq c\right) \leq \frac{E\left[(X_n - X)^2\right]}{c^2}$$
Taking limits on both sides, we get
$$\lim_{n\to\infty} P\left(|X_n - X| \geq c\right) \leq \lim_{n\to\infty} \frac{E\left[(X_n - X)^2\right]}{c^2} = \frac{\lim_{n\to\infty} E\left[(X_n - X)^2\right]}{c^2} = 0$$
where we have used the fact that, by the very definition of convergence in mean square,
$$\lim_{n\to\infty} E\left[(X_n - X)^2\right] = 0$$
Since, by the very definition of probability, it must be that
$$P\left(|X_n - X| \geq c\right) \geq 0$$
it must also be that
$$\lim_{n\to\infty} P\left(|X_n - X| \geq c\right) = 0$$
Note that this holds for any arbitrarily small $c$. By the definition of convergence in probability, this means that $X_n$ converges in probability to $X$ (if you are wondering about strict and weak inequalities here and in the definition of convergence in probability, note that $|X_n - X| \geq c$ implies $|X_n - X| > \varepsilon$ for any strictly positive $\varepsilon < c$).

66.5 Mean square ⇒ Distribution

Proposition 322 If a sequence of random variables $\{X_n\}$ converges in mean square to a random variable $X$, then $\{X_n\}$ also converges in distribution to $X$.

Proof. This is obtained by putting together Propositions 321 and 319 above.

[2] See p. 241.
Chapter 67

Laws of Large Numbers

Let $\{X_n\}$ be a sequence of random variables[1]. Let $\bar{X}_n$ be the sample mean of the first $n$ terms of the sequence:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
A Law of Large Numbers (LLN) is a proposition stating a set of conditions that are sufficient to guarantee the convergence of the sample mean $\bar{X}_n$ to a constant, as the sample size $n$ increases. It is called a Weak Law of Large Numbers (WLLN) if the sequence $\bar{X}_n$ converges in probability[2] and a Strong Law of Large Numbers (SLLN) if the sequence $\bar{X}_n$ converges almost surely[3].
There are literally dozens of Laws of Large Numbers. We report some examples below.

67.1 Weak Laws of Large Numbers

67.1.1 Chebyshev's WLLN

Probably the best known Law of Large Numbers is Chebyshev's:

Proposition 323 (Chebyshev's WLLN) Let $\{X_n\}$ be an uncorrelated and covariance stationary sequence of random variables[4]:
$$\exists \mu \in \mathbb{R} : E[X_n] = \mu, \quad \forall n \in \mathbb{N}$$
$$\exists \sigma^2 \in \mathbb{R}_+ : \mathrm{Var}[X_n] = \sigma^2, \quad \forall n \in \mathbb{N}$$
$$\mathrm{Cov}[X_n, X_{n+k}] = 0, \quad \forall n, k \in \mathbb{N}$$
Then, a Weak Law of Large Numbers applies to the sample mean:
$$\operatorname*{plim}_{n\to\infty} \bar{X}_n = \mu$$

[1] See p. 491.
[2] See p. 511.
[3] See p. 505.
[4] In other words, all the random variables in the sequence have the same mean $\mu$, the same variance $\sigma^2$, and zero covariance with each other. See p. 493 for a definition of a covariance stationary sequence.


where plim denotes a probability limit[5].

Proof. The expected value of the sample mean $\bar{X}_n$ is
$$E[\bar{X}_n] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\sum_{i=1}^n \mu = \frac{1}{n}\, n\mu = \mu$$
The variance of the sample mean $\bar{X}_n$ is
$$\mathrm{Var}[\bar{X}_n] = \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^n X_i\right] \overset{A}{=} \frac{1}{n^2}\mathrm{Var}\left[\sum_{i=1}^n X_i\right] \overset{B}{=} \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}[X_i] = \frac{1}{n^2}\sum_{i=1}^n \sigma^2 = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$
where: in step A we have used the properties of variance[6]; in step B we have used the fact that the variance of a sum is equal to the sum of the variances when the random variables in the sum have zero covariance with each other[7]. Now we can apply Chebyshev's inequality[8] to the sample mean $\bar{X}_n$:
$$P\left(\left|\bar{X}_n - E[\bar{X}_n]\right| \geq k\right) \leq \frac{\mathrm{Var}[\bar{X}_n]}{k^2}$$
for any strictly positive real number $k$. Plugging in the values for the expected value and the variance derived above, we obtain
$$P\left(\left|\bar{X}_n - \mu\right| \geq k\right) \leq \frac{\sigma^2}{nk^2}$$
Since
$$\lim_{n\to\infty} \frac{\sigma^2}{nk^2} = 0$$
and
$$P\left(\left|\bar{X}_n - \mu\right| \geq k\right) \geq 0$$
it must also be that
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \geq k\right) = 0$$
Note that this holds for any arbitrarily small $k$. By the very definition of convergence in probability, this means that $\bar{X}_n$ converges in probability to $\mu$ (if you are wondering about strict and weak inequalities here and in the definition of convergence in probability, note that $|\bar{X}_n - \mu| \geq k$ implies $|\bar{X}_n - \mu| > \varepsilon$ for any strictly positive $\varepsilon < k$).

[5] See p. 511.
[6] See, in particular, the Multiplication by a constant property (p. 158).
[7] See p. 168.
[8] See p. 242.
Note that it is customary to state Chebyshev's Weak Law of Large Numbers as a result on the convergence in probability of the sample mean:
$$\operatorname*{plim}_{n\to\infty} \bar{X}_n = \mu$$
However, the conditions of the above theorem guarantee the mean square convergence[9] of the sample mean to $\mu$:
$$\bar{X}_n \overset{m.s.}{\to} \mu$$
Proof. In the above proof of Chebyshev's Weak Law of Large Numbers, it is proved that
$$\mathrm{Var}[\bar{X}_n] = \frac{\sigma^2}{n}$$
and that
$$E[\bar{X}_n] = \mu$$
This implies that
$$E\left[\left(\bar{X}_n - \mu\right)^2\right] = E\left[\left(\bar{X}_n - E[\bar{X}_n]\right)^2\right] = \mathrm{Var}[\bar{X}_n] = \frac{\sigma^2}{n}$$
As a consequence,
$$\lim_{n\to\infty} E\left[\left(\bar{X}_n - \mu\right)^2\right] = \lim_{n\to\infty} \frac{\sigma^2}{n} = 0$$
but this is just the definition of mean square convergence of $\bar{X}_n$ to $\mu$.
Hence, in Chebyshev's Weak Law of Large Numbers, convergence in probability is just a consequence of the fact that convergence in mean square implies convergence in probability[10].
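Chebyshev's WLLN can be checked by simulation. The sketch below is a minimal illustration, not part of the original text: IID uniform(0,1) draws are an assumed example of an uncorrelated, covariance stationary sequence, with $\mu = 1/2$ and $\sigma^2 = 1/12$, and the variance of the sample mean should be close to $\sigma^2/n$.

```python
import random
import statistics

# Assumed example: IID uniform(0,1) draws (IID implies the zero-covariance
# condition of Chebyshev's WLLN); mu = 1/2, sigma^2 = 1/12.
random.seed(42)

def sample_mean(n):
    return statistics.fmean(random.random() for _ in range(n))

mu, sigma2 = 0.5, 1 / 12

# The sample mean concentrates around mu as n grows.
devs = {n: abs(sample_mean(n) - mu) for n in (100, 10000)}

# The variance of the sample mean across many replications is close to
# sigma^2 / n, the quantity derived in the proof above.
means = [sample_mean(50) for _ in range(5000)]
var_of_mean = statistics.pvariance(means)
print(devs, var_of_mean)
```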

67.1.2 Chebyshev's WLLN for correlated sequences

Chebyshev's Weak Law of Large Numbers (see above) sets forth the requirement that the terms of the sequence $\{X_n\}$ have zero covariance with each other. By relaxing this requirement and allowing for some correlation between the terms of the sequence $\{X_n\}$, a more general version of Chebyshev's Weak Law of Large Numbers can be obtained:

Proposition 324 (Chebyshev's WLLN for correlated sequences) Let $\{X_n\}$ be a covariance stationary sequence of random variables[11]:
$$\exists \mu \in \mathbb{R} : E[X_n] = \mu, \quad \forall n > 0$$
$$\forall j \geq 0, \ \exists \gamma_j \in \mathbb{R} : \mathrm{Cov}[X_n, X_{n-j}] = \gamma_j, \quad \forall n > j$$

[9] See p. 519.
[10] See p. 534.
[11] In other words, all the random variables in the sequence have the same mean $\mu$, the same variance $\gamma_0$, and the covariance between a term $X_n$ of the sequence and the term that is located $j$ positions before it ($X_{n-j}$) is always the same ($\gamma_j$), irrespective of how $X_n$ has been chosen.

If covariances tend to be zero on average, i.e. if
$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} \gamma_i = 0 \qquad (67.1)$$
then a Weak Law of Large Numbers applies to the sample mean:
$$\operatorname*{plim}_{n\to\infty} \bar{X}_n = \mu$$

Proof. For a full proof see e.g. Karlin and Taylor[12] (1975). We give here a proof based on the assumption that covariances are absolutely summable:
$$\sum_{j=0}^{\infty} |\gamma_j| < \infty$$
which is stronger than (67.1). The expected value of the sample mean $\bar{X}_n$ is
$$E[\bar{X}_n] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\sum_{i=1}^n \mu = \frac{1}{n}\, n\mu = \mu$$
The variance of the sample mean $\bar{X}_n$ is
$$\begin{aligned}
\mathrm{Var}[\bar{X}_n] &= \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^n X_i\right] \\
&\overset{A}{=} \frac{1}{n^2}\mathrm{Var}\left[\sum_{i=1}^n X_i\right] \\
&\overset{B}{=} \frac{1}{n^2}\left\{\sum_{i=1}^n \mathrm{Var}[X_i] + 2\sum_{i=2}^n \sum_{j=1}^{i-1} \mathrm{Cov}[X_i, X_j]\right\} \\
&= \frac{1}{n^2}\left\{\sum_{i=1}^n \gamma_0 + 2\gamma_1 + 2(\gamma_1 + \gamma_2) + \cdots + 2(\gamma_1 + \gamma_2 + \cdots + \gamma_{n-1})\right\} \\
&= \frac{1}{n^2}\left\{n\gamma_0 + 2(n-1)\gamma_1 + 2(n-2)\gamma_2 + \cdots + 2\gamma_{n-1}\right\} \\
&= \frac{1}{n}\left\{\gamma_0 + 2\sum_{i=1}^{n-1} \frac{(n-i)}{n}\gamma_i\right\}
\end{aligned}$$
where: in step A we have used the properties of variance[13]; in step B we have used the formula for the variance of a sum[14]. Note that
$$\begin{aligned}
\mathrm{Var}[\bar{X}_n] &= \frac{1}{n}\left\{\gamma_0 + 2\sum_{i=1}^{n-1} \frac{(n-i)}{n}\gamma_i\right\} \\
&\leq \frac{1}{n}\left\{\gamma_0 + 2\sum_{i=1}^{n-1} \frac{(n-i)}{n}|\gamma_i|\right\} \\
&\overset{A}{\leq} \frac{1}{n}\left\{\gamma_0 + 2\sum_{i=1}^{n-1} |\gamma_i|\right\} \\
&\leq \frac{1}{n}\left\{\gamma_0 + 2\sum_{i=1}^{\infty} |\gamma_i|\right\}
\end{aligned}$$
where in step A we have used the fact that
$$\frac{(n-i)}{n} < 1$$
But the covariances are absolutely summable, so that
$$\gamma_0 + 2\sum_{i=1}^{\infty} |\gamma_i| = M$$
where $M$ is a finite constant. Therefore,
$$\mathrm{Var}[\bar{X}_n] \leq \frac{M}{n}$$
Now we can apply Chebyshev's inequality to the sample mean $\bar{X}_n$:
$$P\left(\left|\bar{X}_n - E[\bar{X}_n]\right| \geq k\right) \leq \frac{\mathrm{Var}[\bar{X}_n]}{k^2}$$
for any strictly positive real number $k$. Plugging in the values for the expected value and the variance derived above, we obtain
$$P\left(\left|\bar{X}_n - \mu\right| \geq k\right) \leq \frac{\mathrm{Var}[\bar{X}_n]}{k^2} \leq \frac{M}{nk^2}$$
Since
$$\lim_{n\to\infty} \frac{M}{nk^2} = 0$$
and
$$P\left(\left|\bar{X}_n - \mu\right| \geq k\right) \geq 0$$
it must also be that
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \geq k\right) = 0$$
Note that this holds for any arbitrarily small $k$. By the very definition of convergence in probability, this means that $\bar{X}_n$ converges in probability to $\mu$ (if you are wondering about strict and weak inequalities here and in the definition of convergence in probability, note that $|\bar{X}_n - \mu| \geq k$ implies $|\bar{X}_n - \mu| > \varepsilon$ for any strictly positive $\varepsilon < k$).

[12] Karlin, S., Taylor, H. M. (1975) A First Course in Stochastic Processes, Academic Press.
[13] See, in particular, the Multiplication by a constant property (p. 158).
[14] See p. 168.
Also Chebyshev's Weak Law of Large Numbers for correlated sequences has been stated as a result on the convergence in probability of the sample mean:
$$\operatorname*{plim}_{n\to\infty} \bar{X}_n = \mu$$
However, the conditions of the above theorem also guarantee the mean square convergence of the sample mean to $\mu$:
$$\bar{X}_n \overset{m.s.}{\to} \mu$$
Proof. In the above proof of Chebyshev's Weak Law of Large Numbers for correlated sequences, we proved that
$$\mathrm{Var}[\bar{X}_n] \leq \frac{M}{n}$$
(where $M$ is a finite constant) and that
$$E[\bar{X}_n] = \mu$$
This implies
$$E\left[\left(\bar{X}_n - \mu\right)^2\right] = E\left[\left(\bar{X}_n - E[\bar{X}_n]\right)^2\right] = \mathrm{Var}[\bar{X}_n] \leq \frac{M}{n}$$
Thus, taking limits on both sides, we obtain
$$\lim_{n\to\infty} E\left[\left(\bar{X}_n - \mu\right)^2\right] \leq \lim_{n\to\infty} \frac{M}{n} = 0$$
But
$$E\left[\left(\bar{X}_n - \mu\right)^2\right] \geq 0$$
so it must be that
$$\lim_{n\to\infty} E\left[\left(\bar{X}_n - \mu\right)^2\right] = 0$$
This is just the definition of mean square convergence of $\bar{X}_n$ to $\mu$.
Hence, also in Chebyshev's Weak Law of Large Numbers for correlated sequences, convergence in probability descends from the fact that convergence in mean square implies convergence in probability.
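Condition (67.1) can be checked numerically for a concrete covariance function. The sketch below is an assumed example, not from the text: it uses $\gamma_j = \rho^j$ with $|\rho| < 1$ (the covariance function derived in Exercise 1 below, up to a constant factor), for which the Cesàro average of the covariances goes to 0.

```python
# Illustrative check of condition (67.1) for the assumed covariance
# function gamma_j = rho**j, which is absolutely summable since |rho| < 1.
rho = 0.9

def avg_cov(n):
    """(1/n) * sum_{i=0}^{n} gamma_i for gamma_i = rho**i."""
    return sum(rho ** i for i in range(n + 1)) / n

# The average covariance shrinks toward 0 as n grows: the partial sums are
# bounded by 1 / (1 - rho) = 10, so the average is at most 10 / n.
print(avg_cov(10), avg_cov(100), avg_cov(10000))
```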

67.2 Strong Laws of Large Numbers

67.2.1 Kolmogorov's SLLN

Among the Strong Laws of Large Numbers, Kolmogorov's is probably the best known:

Proposition 325 (Kolmogorov's SLLN) Let $\{X_n\}$ be an IID sequence[15] of random variables having finite mean:
$$E[X_n] = \mu < \infty, \quad \forall n \in \mathbb{N}$$
Then, a Strong Law of Large Numbers applies to the sample mean:
$$\bar{X}_n \overset{a.s.}{\to} \mu$$
where $\overset{a.s.}{\to}$ denotes almost sure convergence[16].

[15] See p. 492.
[16] See p. 505.

Proof. See, for example, Resnick[17] (1999) and Williams[18] (1991).

67.2.2 Ergodic theorem

In Kolmogorov's Strong Law of Large Numbers, the sequence $\{X_n\}$ is required to be an IID sequence. This requirement can be weakened by requiring $\{X_n\}$ to be stationary[19] and ergodic[20].

Proposition 326 (Ergodic Theorem) Let $\{X_n\}$ be a stationary and ergodic sequence of random variables having finite mean:
$$E[X_n] = \mu < \infty, \quad \forall n \in \mathbb{N}$$
Then, a Strong Law of Large Numbers applies to the sample mean:
$$\bar{X}_n \overset{a.s.}{\to} \mu$$
Proof. See, for example, Karlin and Taylor[21] (1975) and White[22] (2001).

67.3 Laws of Large Numbers for random vectors

The Laws of Large Numbers we have just presented concern sequences of random variables. However, they can be extended in a straightforward manner to sequences of random vectors:

Proposition 327 Let $\{X_n\}$ be a sequence of $K \times 1$ random vectors, let $E[X_n] = \mu$ be their common expected value and
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$$
their sample mean. Denote the $j$-th component of $X_n$ by $X_{n,j}$ and the $j$-th component of $\bar{X}_n$ by $\bar{X}_{n,j}$. Then:

- a Weak Law of Large Numbers applies to the sample mean $\bar{X}_n$ if and only if a Weak Law of Large Numbers applies to each of the components of the vector $\bar{X}_n$, i.e. if and only if
$$\operatorname*{plim}_{n\to\infty} \bar{X}_{n,j} = \mu_j, \quad j = 1, \ldots, K$$
- a Strong Law of Large Numbers applies to the sample mean $\bar{X}_n$ if and only if a Strong Law of Large Numbers applies to each of the components of the vector $\bar{X}_n$, i.e. if and only if
$$\bar{X}_{n,j} \overset{a.s.}{\to} \mu_j, \quad j = 1, \ldots, K$$

Proof. This is a consequence of the fact that a vector converges in probability (almost surely) if and only if all of its components converge in probability (almost surely). See the lectures entitled Convergence in probability (p. 511) and Almost sure convergence (p. 505).

[17] Resnick, S.I. (1999) A Probability Path, Birkhäuser.
[18] Williams, D. (1991) Probability with Martingales, Cambridge University Press.
[19] See p. 492.
[20] See p. 494.
[21] Karlin, S., Taylor, H. M. (1975) A First Course in Stochastic Processes, Academic Press.
[22] White, H. (2001) Asymptotic Theory for Econometricians, Academic Press.

67.4 Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1
Let $\{\varepsilon_n\}$ be an IID sequence. A generic term of the sequence has mean $\mu$ and variance $\sigma^2$. Let $\{X_n\}$ be a covariance stationary sequence such that a generic term of the sequence satisfies
$$X_n = \rho X_{n-1} + \varepsilon_n$$
where $-1 < \rho < 1$. Denote by
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$$
the sample mean of the sequence. Verify whether the sequence $\{X_n\}$ satisfies the conditions required by Chebyshev's Weak Law of Large Numbers. In the affirmative case, find the probability limit of $\bar{X}_n$.

Solution
By assumption the sequence $\{X_n\}$ is covariance stationary, so all the terms of the sequence have the same expected value. Taking the expected value of both sides of the equation
$$X_n = \rho X_{n-1} + \varepsilon_n$$
we obtain
$$E[X_n] = E[\rho X_{n-1} + \varepsilon_n] = \rho E[X_{n-1}] + E[\varepsilon_n] = \rho E[X_n] + \mu$$
Solving for $E[X_n]$, we obtain
$$E[X_n] = \frac{\mu}{1-\rho}$$
By the same token, the variance can be derived from
$$\mathrm{Var}[X_n] = \mathrm{Var}[\rho X_{n-1} + \varepsilon_n] \overset{A}{=} \rho^2\,\mathrm{Var}[X_{n-1}] + \mathrm{Var}[\varepsilon_n] = \rho^2\,\mathrm{Var}[X_n] + \sigma^2$$
where in step A we have used the fact that $X_{n-1}$ is independent of $\varepsilon_n$ because $\{\varepsilon_n\}$ is IID. Solving for $\mathrm{Var}[X_n]$, we obtain
$$\mathrm{Var}[X_n] = \frac{\sigma^2}{1-\rho^2}$$
Now we need to derive $\mathrm{Cov}[X_n, X_{n+j}]$. Note that
$$X_{n+1} = \rho X_n + \varepsilon_{n+1}$$
$$X_{n+2} = \rho X_{n+1} + \varepsilon_{n+2} = \rho^2 X_n + \varepsilon_{n+2} + \rho\,\varepsilon_{n+1}$$
$$X_{n+3} = \rho X_{n+2} + \varepsilon_{n+3} = \rho^3 X_n + \varepsilon_{n+3} + \rho\,\varepsilon_{n+2} + \rho^2\varepsilon_{n+1}$$
$$\vdots$$
$$X_{n+j} = \rho X_{n+j-1} + \varepsilon_{n+j} = \rho^j X_n + \sum_{s=0}^{j-1} \rho^s\,\varepsilon_{n+j-s}$$
The covariance between two terms of the sequence is
$$\begin{aligned}
\gamma_j = \mathrm{Cov}[X_n, X_{n+j}] &= \mathrm{Cov}\left[X_n,\ \rho^j X_n + \sum_{s=0}^{j-1}\rho^s\,\varepsilon_{n+j-s}\right] \\
&\overset{A}{=} \rho^j\,\mathrm{Cov}[X_n, X_n] + \sum_{s=0}^{j-1}\rho^s\,\mathrm{Cov}[X_n, \varepsilon_{n+j-s}] \\
&\overset{B}{=} \rho^j\,\mathrm{Cov}[X_n, X_n] = \rho^j\,\mathrm{Var}[X_n] = \rho^j\,\frac{\sigma^2}{1-\rho^2}
\end{aligned}$$
where: in step A we have used the bilinearity of covariance; in step B we have used the fact that $X_n$ is independent of $\varepsilon_{n+j-s}$ because $\{\varepsilon_n\}$ is IID. The sum of the covariances is
$$\sum_{j=0}^{n} \gamma_j = \sum_{j=0}^{n} \rho^j\,\frac{\sigma^2}{1-\rho^2} = \frac{\sigma^2}{1-\rho^2}\sum_{j=0}^{n}\rho^j = \frac{\sigma^2}{1-\rho^2}\cdot\frac{1-\rho^{n+1}}{1-\rho}$$
Thus, covariances tend to be zero on average:
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n}\gamma_j = \lim_{n\to\infty}\frac{1}{n}\cdot\frac{\sigma^2}{1-\rho^2}\cdot\frac{1-\rho^{n+1}}{1-\rho} = 0$$
and the conditions of Chebyshev's Weak Law of Large Numbers are satisfied. Therefore, the sample mean converges in probability to the population mean:
$$\operatorname*{plim}_{n\to\infty}\bar{X}_n = E[X_n] = \frac{\mu}{1-\rho}$$
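The conclusion of the exercise can be checked by simulation. The sketch below is a minimal illustration; the parameter values $\rho = 0.5$, $\mu = 1$, $\sigma = 1$ and the normal innovations are assumptions chosen for the example, not part of the original text.

```python
import random

# Simulate X_n = rho * X_{n-1} + eps_n with IID normal innovations of mean
# mu and compare the sample mean with the predicted limit mu / (1 - rho).
random.seed(7)
rho, mu, sigma = 0.5, 1.0, 1.0
n = 200000

x = mu / (1 - rho)          # start at the stationary mean
total = 0.0
for _ in range(n):
    x = rho * x + random.gauss(mu, sigma)
    total += x
sample_mean = total / n
print(sample_mean)          # should be close to mu / (1 - rho) = 2
```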
Chapter 68

Central Limit Theorems

Let $\{X_n\}$ be a sequence of random variables[1]. Let $\bar{X}_n$ be the sample mean of the first $n$ terms of the sequence:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$$
A Central Limit Theorem (CLT) is a proposition giving a set of conditions that are sufficient to guarantee the convergence of the sample mean $\bar{X}_n$ to a normal distribution, as the sample size $n$ increases.
More precisely, a Central Limit Theorem is a proposition giving a set of conditions that are sufficient to guarantee that
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \overset{d}{\to} Z$$
where $Z$ is a standard normal random variable[2], $\mu$ and $\sigma$ are two constants, and $\overset{d}{\to}$ indicates convergence in distribution[3].
Why is the expression $(\bar{X}_n - \mu)/\sigma$ multiplied by the square root of $n$? If we do not multiply it by $\sqrt{n}$, then $(\bar{X}_n - \mu)/\sigma$ converges to a constant, provided that the conditions of a Law of Large Numbers apply[4]. On the contrary, multiplying it by $\sqrt{n}$, we obtain a sequence that converges to a proper random variable (i.e. a random variable that is not constant). When the conditions of a Central Limit Theorem apply, this variable has a normal distribution.
In practice, the CLT is used as follows:

1. we observe a sample consisting of $n$ observations $X_1, X_2, \ldots, X_n$;
2. if $n$ is large enough, then a standard normal distribution is a good approximation of the distribution of $\sqrt{n}(\bar{X}_n - \mu)/\sigma$;
3. therefore, we pretend that
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \sim N(0, 1)$$
where $N$ indicates the normal distribution;
4. as a consequence, the distribution of the sample mean $\bar{X}_n$ is
$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

[1] See p. 491.
[2] Remember that a standard normal random variable is a normal random variable with zero mean and unit variance (p. 376).
[3] See p. 527.
[4] See p. 535.
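The four steps above can be sketched in code. Here the standard library class `statistics.NormalDist` plays the role of $N(\mu, \sigma^2/n)$; the values $\mu = 10$, $\sigma = 2$, $n = 400$ and the interval are assumptions chosen for the illustration.

```python
import math
from statistics import NormalDist

# Steps 3-4 of the recipe: treat the sample mean as N(mu, sigma^2 / n),
# i.e. a normal with standard deviation sigma / sqrt(n) (assumed values).
mu, sigma, n = 10.0, 2.0, 400
approx = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# Approximate probability that the sample mean falls within 0.2 of mu;
# 0.2 is two standard errors here, so this equals P(|Z| <= 2).
p = approx.cdf(mu + 0.2) - approx.cdf(mu - 0.2)
print(p)   # about 0.9545
```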

There are several Central Limit Theorems. We report some examples below.

68.1 Examples of Central Limit Theorems

68.1.1 Lindeberg-Lévy CLT

The best known Central Limit Theorem is probably the Lindeberg-Lévy CLT:

Proposition 328 (Lindeberg-Lévy CLT) Let $\{X_n\}$ be an IID sequence[5] of random variables such that
$$E[X_n] = \mu < \infty, \quad \forall n \in \mathbb{N}$$
$$\mathrm{Var}[X_n] = \sigma^2 < \infty, \quad \forall n \in \mathbb{N}$$
where $\sigma^2 > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar{X}_n$:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \overset{d}{\to} Z$$
where $Z$ is a standard normal random variable and $\overset{d}{\to}$ denotes convergence in distribution.

Proof. We will just sketch a proof. For a detailed and rigorous proof see, for example, Resnick[6] (1999) and Williams[7] (1991). First of all, denote by $\{Z_n\}$ the sequence whose generic term is
$$Z_n = \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}$$
The characteristic function[8] of $Z_n$ is
$$\begin{aligned}
\varphi_{Z_n}(t) &= E[\exp(itZ_n)] = E\left[\exp\left(it\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}\right)\right] \\
&= E\left[\exp\left(it\sqrt{n}\,\frac{1}{\sigma}\left(\frac{1}{n}\sum_{i=1}^n X_i - \mu\right)\right)\right] \\
&= E\left[\exp\left(i\frac{t}{\sqrt{n}}\sum_{i=1}^n \frac{X_i - \mu}{\sigma}\right)\right] \\
&= E\left[\prod_{i=1}^n \exp\left(i\frac{t}{\sqrt{n}}\cdot\frac{X_i - \mu}{\sigma}\right)\right] \\
&\overset{A}{=} \prod_{i=1}^n E\left[\exp\left(i\frac{t}{\sqrt{n}}\cdot\frac{X_i - \mu}{\sigma}\right)\right] \\
&\overset{B}{=} \prod_{i=1}^n E\left[\exp\left(i\frac{t}{\sqrt{n}}\,Y_i\right)\right] \\
&\overset{C}{=} \prod_{i=1}^n \varphi_{Y_i}\!\left(\frac{t}{\sqrt{n}}\right) \overset{D}{=} \left[\varphi_{Y_1}\!\left(\frac{t}{\sqrt{n}}\right)\right]^n
\end{aligned}$$
where: in step A we have used the fact that the random variables $X_i$ are mutually independent[9]; in step B we have defined
$$Y_i = \frac{X_i - \mu}{\sigma}$$
in step C we have used the definition of characteristic function and we have denoted the characteristic function of $Y_i$ by $\varphi_{Y_i}(t)$; in step D we have used the fact that all the variables $Y_i$ have the same distribution and hence the same characteristic function. Now take a second order Taylor series expansion of $\varphi_{Y_1}(s)$ around the point $s = 0$:
$$\begin{aligned}
\varphi_{Y_1}(s) &= E[\exp(isY_1)] \\
&= E[\exp(isY_1)]\big|_{s=0} + \frac{d}{ds}\big(E[\exp(isY_1)]\big)\Big|_{s=0}\, s + \frac{1}{2}\frac{d^2}{ds^2}\big(E[\exp(isY_1)]\big)\Big|_{s=0}\, s^2 + o(s^2) \\
&= E[\exp(isY_1)]\big|_{s=0} + E\left[\frac{d}{ds}\exp(isY_1)\right]\Big|_{s=0}\, s + \frac{1}{2}E\left[\frac{d^2}{ds^2}\exp(isY_1)\right]\Big|_{s=0}\, s^2 + o(s^2) \\
&= E[\exp(isY_1)]\big|_{s=0} + \big(E[iY_1\exp(isY_1)]\big)\big|_{s=0}\, s - \frac{1}{2}\big(E[Y_1^2\exp(isY_1)]\big)\big|_{s=0}\, s^2 + o(s^2) \\
&= 1 + iE[Y_1]\,s - \frac{1}{2}E[Y_1^2]\,s^2 + o(s^2) \\
&\overset{A}{=} 1 - \frac{1}{2}\mathrm{Var}[Y_1]\,s^2 + o(s^2) \overset{B}{=} 1 - \frac{1}{2}s^2 + o(s^2)
\end{aligned}$$
where: $o(s^2)$ is an infinitesimal of higher order than $s^2$, i.e. a quantity that converges to 0 faster than $s^2$ does; in step A we have used the fact that $E[Y_1] = 0$; in step B we have used the fact that $\mathrm{Var}[Y_1] = 1$. Therefore,
$$\begin{aligned}
\lim_{n\to\infty}\varphi_{Z_n}(t) &= \lim_{n\to\infty}\left[\varphi_{Y_1}\!\left(\frac{t}{\sqrt{n}}\right)\right]^n \\
&= \lim_{n\to\infty}\left[1 - \frac{1}{2}\left(\frac{t}{\sqrt{n}}\right)^2 + o\!\left(\frac{t^2}{n}\right)\right]^n \\
&= \lim_{n\to\infty}\left[1 - \frac{1}{2}\,\frac{t^2}{n} + o\!\left(\frac{t^2}{n}\right)\right]^n = \exp\left(-\frac{1}{2}t^2\right) = \varphi_Z(t)
\end{aligned}$$
So we have that
$$\lim_{n\to\infty}\varphi_{Z_n}(t) = \varphi_Z(t)$$
where
$$\varphi_Z(t) = \exp\left(-\frac{1}{2}t^2\right)$$
is the characteristic function of a standard normal random variable $Z$ (see the lecture entitled Normal distribution, p. 379). A theorem, called the Lévy continuity theorem, which we do not cover in these lectures, states that if a sequence of random variables $\{Z_n\}$ is such that their characteristic functions $\varphi_{Z_n}(t)$ converge to the characteristic function $\varphi_Z(t)$ of a random variable $Z$, then the sequence $\{Z_n\}$ converges in distribution to $Z$. Therefore, in our case the sequence $\{Z_n\}$ converges in distribution to a standard normal distribution.
So, roughly speaking, under the stated assumptions, the distribution of the sample mean $\bar{X}_n$ can be approximated by a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$ (provided $n$ is large enough).
Also note that the conditions for the validity of the Lindeberg-Lévy Central Limit Theorem resemble the conditions for the validity of Kolmogorov's Strong Law of Large Numbers[10]. The only difference is the additional requirement that
$$\mathrm{Var}[X_n] = \sigma^2 < \infty, \quad \forall n \in \mathbb{N}$$

[5] See p. 492.
[6] Resnick, S.I. (1999) A Probability Path, Birkhäuser.
[7] Williams, D. (1991) Probability with Martingales, Cambridge University Press.
[8] See p. 307.
[9] In particular, see the Mutual independence via expectations property (p. 234).
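The Lindeberg-Lévy CLT can be illustrated by simulation. The sketch below is an assumed example, not part of the proof: it draws exponential(1) samples (so $\mu = \sigma = 1$), standardizes their means, and checks that about 95% of the standardized values fall within $\pm 1.96$, as the standard normal limit predicts.

```python
import random

# Assumed example distribution: exponential(1), for which mu = 1, sigma = 1.
random.seed(123)
mu, sigma, n, reps = 1.0, 1.0, 400, 5000

inside = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z = n ** 0.5 * (xbar - mu) / sigma    # sqrt(n) * (xbar - mu) / sigma
    if abs(z) <= 1.96:
        inside += 1
coverage = inside / reps
print(coverage)   # should be close to 0.95
```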

68.1.2 A CLT for correlated sequences

In the Lindeberg-Lévy CLT (see above), the sequence $\{X_n\}$ is required to be an IID sequence. The assumption of independence can be weakened as follows:

Proposition 329 (CLT for correlated sequences) Let $\{X_n\}$ be a stationary[11] and mixing[12] sequence of random variables satisfying a CLT technical condition (defined in the proof below) and such that
$$E[X_n] = \mu < \infty, \quad \forall n \in \mathbb{N}$$
$$\mathrm{Var}[X_n] = \sigma^2 < \infty, \quad \forall n \in \mathbb{N}$$
$$\lim_{n\to\infty} n\,\mathrm{Var}[\bar{X}_n] = \sigma^2 + 2\sum_{i=2}^{\infty}\mathrm{Cov}[X_1, X_i] = V < \infty$$
where $V > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar{X}_n$:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sqrt{V}} \overset{d}{\to} Z$$
where $Z$ is a standard normal random variable and $\overset{d}{\to}$ denotes convergence in distribution.

Proof. Several different technical conditions (beyond those explicitly stated in the above proposition) are imposed in the literature in order to derive Central Limit Theorems for correlated sequences. These conditions are usually very mild and differ from author to author. We do not mention these technical conditions here and just refer to them as CLT technical conditions. For a proof see, for example, Durrett[13] (2010) and White[14] (2001).
So, roughly speaking, under the stated assumptions, the distribution of the sample mean $\bar{X}_n$ can be approximated by a normal distribution with mean $\mu$ and variance $\frac{V}{n}$ (provided $n$ is large enough).
Also note that the conditions for the validity of the Central Limit Theorem for correlated sequences resemble the conditions for the validity of the ergodic theorem[15]. The main differences (beyond some technical conditions that are not explicitly stated in the above proposition) are the additional requirements that
$$\mathrm{Var}[X_n] = \sigma^2 < \infty, \quad \forall n \in \mathbb{N}$$
$$V = \lim_{n\to\infty} n\,\mathrm{Var}[\bar{X}_n] = \sigma^2 + 2\sum_{i=2}^{\infty}\mathrm{Cov}[X_1, X_i] < \infty$$
and the fact that ergodicity is replaced by the stronger condition of mixing.
Finally, let us mention that the variance $V$ in the above proposition, which is defined as
$$V = \lim_{n\to\infty} n\,\mathrm{Var}[\bar{X}_n]$$
is called the long-run variance of $\bar{X}_n$.

[10] See p. 540.
[11] See p. 492.
[12] See p. 494.

68.2 Multivariate generalizations

The results illustrated above for sequences of random variables extend in a straightforward manner to sequences of random vectors. For example, the multivariate version of the Lindeberg-Lévy CLT is:

Proposition 330 (Multivariate Lindeberg-Lévy CLT) Let $\{X_n\}$ be an IID sequence of $K \times 1$ random vectors such that
$$E[X_n] = \mu \in \mathbb{R}^K, \quad \forall n \in \mathbb{N}$$
$$\mathrm{Var}[X_n] = \Sigma \in \mathbb{R}^{K\times K}, \quad \forall n \in \mathbb{N}$$
where $\Sigma$ is a positive definite matrix. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the vector of sample means. Then:
$$\sqrt{n}\,\Sigma^{-1/2}\left(\bar{X}_n - \mu\right) \overset{d}{\to} Z$$
where $Z$ is a standard multivariate normal random vector[16] and $\overset{d}{\to}$ denotes convergence in distribution.

Proof. For a proof see, for example, Basu[17] (2004), DasGupta[18] (2008) and McCabe and Tremayne[19] (1993).
In a similar manner, the CLT for correlated sequences generalizes to random vectors ($V$ becomes a matrix, called the long-run covariance matrix).

[13] Durrett, R. (2010) Probability: Theory and Examples, Cambridge University Press.
[14] White, H. (2001) Asymptotic Theory for Econometricians, Academic Press.
[15] See p. 541.

68.3 Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1
Let $\{X_n\}$ be a sequence of independent Bernoulli random variables[20] with parameter $p = \frac{1}{2}$, i.e. a generic term $X_n$ of the sequence has support
$$R_{X_n} = \{0, 1\}$$
and probability mass function
$$p_{X_n}(x) = \begin{cases} 1/2 & \text{if } x = 1 \\ 1/2 & \text{if } x = 0 \\ 0 & \text{if } x \notin R_{X_n} \end{cases}$$
Use a Central Limit Theorem to derive an approximate distribution for the mean of the first 100 terms of the sequence.

Solution
The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence is
$$E[X_n] = \sum_{x\in R_{X_n}} x\, p_{X_n}(x) = 1\cdot p_{X_n}(1) + 0\cdot p_{X_n}(0) = 1\cdot\frac{1}{2} + 0\cdot\frac{1}{2} = \frac{1}{2} < \infty$$
The variance of a generic term of the sequence can be derived thanks to the usual formula for computing the variance[21]:
$$E[X_n^2] = \sum_{x\in R_{X_n}} x^2\, p_{X_n}(x) = 1^2\cdot p_{X_n}(1) + 0^2\cdot p_{X_n}(0) = 1\cdot\frac{1}{2} + 0\cdot\frac{1}{2} = \frac{1}{2}$$
$$E[X_n]^2 = \frac{1}{4}$$
$$\mathrm{Var}[X_n] = E[X_n^2] - E[X_n]^2 = \frac{1}{2} - \frac{1}{4} = \frac{1}{4} < \infty$$
Therefore, the sequence $\{X_n\}$ satisfies the conditions of the Lindeberg-Lévy Central Limit Theorem (IID, finite mean, finite variance). The mean of the first 100 terms of the sequence is
$$\bar{X}_{100} = \frac{1}{100}\sum_{i=1}^{100} X_i$$
Using the Central Limit Theorem to approximate its distribution, we obtain
$$\bar{X}_n \sim N\left(E[X_n], \frac{\mathrm{Var}[X_n]}{n}\right)$$
or
$$\bar{X}_{100} \sim N\left(\frac{1}{2}, \frac{1}{400}\right)$$

[16] See p. 439.
[17] Basu, A. K. (2004) Measure Theory and Probability, PHI Learning PVT.
[18] DasGupta, A. (2008) Asymptotic Theory of Statistics and Probability, Springer.
[19] McCabe, B. and A. Tremayne (1993) Elements of Modern Asymptotic Theory with Statistical Applications, Manchester University Press.
[20] See p. 335.
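Under the approximation $\bar{X}_{100} \sim N(1/2, 1/400)$, probabilities about the sample mean can be computed directly. The sketch below is an illustration (the interval $[0.45, 0.55]$ is an assumed example, not part of the exercise):

```python
from statistics import NormalDist

# Sample mean of 100 fair Bernoulli draws: approximately N(1/2, 1/400),
# i.e. a normal with standard deviation sqrt(1/400) = 1/20 = 0.05.
approx = NormalDist(mu=0.5, sigma=0.05)

# Approximate probability that the sample mean lies in [0.45, 0.55];
# the endpoints are one standard deviation from the mean, so this is
# P(|Z| <= 1).
p = approx.cdf(0.55) - approx.cdf(0.45)
print(p)   # about 0.6827
```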

Exercise 2
Let $\{X_n\}$ be a sequence of independent Bernoulli random variables with parameter $p = \frac{1}{2}$, as in the previous exercise. Let $\{Y_n\}$ be another sequence of random variables such that
$$Y_n = X_{n+1} - \frac{1}{2}X_n, \quad \forall n$$
Suppose $\{Y_n\}$ satisfies the conditions of a Central Limit Theorem for correlated sequences. Derive an approximate distribution for the mean of the first $n$ terms of the sequence $\{Y_n\}$.

Solution
The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence $\{Y_n\}$ is
$$E[Y_n] = E\left[X_{n+1} - \frac{1}{2}X_n\right] = E[X_{n+1}] - \frac{1}{2}E[X_n] = \frac{1}{2} - \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$
The variance of a generic term of the sequence is
$$\mathrm{Var}[Y_n] = \mathrm{Var}\left[X_{n+1} - \frac{1}{2}X_n\right] = \mathrm{Var}[X_{n+1}] + \frac{1}{4}\mathrm{Var}[X_n] - 2\cdot\frac{1}{2}\mathrm{Cov}[X_{n+1}, X_n] \overset{A}{=} \mathrm{Var}[X_{n+1}] + \frac{1}{4}\mathrm{Var}[X_n] = \frac{1}{4} + \frac{1}{4}\cdot\frac{1}{4} = \frac{5}{16}$$
where in step A we have used the fact that $X_n$ and $X_{n+1}$ are independent. The covariance between two successive terms of the sequence is
$$\begin{aligned}
\mathrm{Cov}[Y_{n+1}, Y_n] &= \mathrm{Cov}\left[X_{n+2} - \frac{1}{2}X_{n+1},\ X_{n+1} - \frac{1}{2}X_n\right] \\
&\overset{A}{=} \mathrm{Cov}[X_{n+2}, X_{n+1}] - \frac{1}{2}\mathrm{Cov}[X_{n+2}, X_n] - \frac{1}{2}\mathrm{Cov}[X_{n+1}, X_{n+1}] + \frac{1}{4}\mathrm{Cov}[X_{n+1}, X_n] \\
&\overset{B}{=} -\frac{1}{2}\mathrm{Cov}[X_{n+1}, X_{n+1}] \overset{C}{=} -\frac{1}{2}\mathrm{Var}[X_{n+1}] = -\frac{1}{2}\cdot\frac{1}{4} = -\frac{1}{8}
\end{aligned}$$
where: in step A we have used the bilinearity of covariance[22]; in step B we have used the fact that the terms of $\{X_n\}$ are independent; in step C we have used the fact that the covariance of a random variable with itself is equal to its variance. The covariance between two terms that are not adjacent ($Y_n$ and $Y_{n+j}$, with $j > 1$) is
$$\begin{aligned}
\mathrm{Cov}[Y_{n+j}, Y_n] &= \mathrm{Cov}\left[X_{n+j+1} - \frac{1}{2}X_{n+j},\ X_{n+1} - \frac{1}{2}X_n\right] \\
&\overset{A}{=} \mathrm{Cov}[X_{n+j+1}, X_{n+1}] - \frac{1}{2}\mathrm{Cov}[X_{n+j+1}, X_n] - \frac{1}{2}\mathrm{Cov}[X_{n+j}, X_{n+1}] + \frac{1}{4}\mathrm{Cov}[X_{n+j}, X_n] \\
&\overset{B}{=} 0
\end{aligned}$$
where: in step A we have used the bilinearity of covariance; in step B we have used the fact that the terms of $\{X_n\}$ are independent. The long-run variance is
$$V = \mathrm{Var}[Y_1] + 2\sum_{j=2}^{\infty}\mathrm{Cov}[Y_j, Y_1] = \mathrm{Var}[Y_1] + 2\,\mathrm{Cov}[Y_2, Y_1] = \frac{5}{16} - \frac{2}{8} = \frac{1}{16}$$
The mean of the first $n$ terms of the sequence $\{Y_n\}$ is
$$\bar{Y}_n = \frac{1}{n}\sum_{i=1}^n Y_i$$
Using the Central Limit Theorem for correlated sequences to approximate its distribution, we obtain
$$\bar{Y}_n \sim N\left(E[Y_n], \frac{V}{n}\right)$$
or
$$\bar{Y}_n \sim N\left(\frac{1}{4}, \frac{1}{16n}\right)$$

[21] $\mathrm{Var}[X] = E[X^2] - E[X]^2$. See p. 156.
[22] See p. 166.
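The moments derived above can be verified by simulation. A minimal sketch (the seed and simulation size are assumed choices for the illustration):

```python
import random

# Simulate Y_n = X_{n+1} - X_n / 2 for fair Bernoulli X_n and check the
# derived moments: E[Y] = 1/4, Var[Y] = 5/16, Cov[Y_{n+1}, Y_n] = -1/8.
random.seed(99)
m = 200000
x = [random.randint(0, 1) for _ in range(m + 2)]
y = [x[i + 1] - 0.5 * x[i] for i in range(m + 1)]

mean_y = sum(y) / len(y)
var_y = sum((v - mean_y) ** 2 for v in y) / len(y)
cov_y = sum((y[i + 1] - mean_y) * (y[i] - mean_y) for i in range(m)) / m
print(mean_y, var_y, cov_y)
```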

Exercise 3
Let $Y$ be a binomial random variable with parameters $n = 100$ and $p = \frac{1}{2}$ (you need to read the lecture entitled Binomial distribution[23] in order to be able to solve this exercise). Using the Central Limit Theorem, show that a normal random variable $X$ with mean $\mu = 50$ and variance $\sigma^2 = 25$ can be used as an approximation of $Y$.

Solution
A binomial random variable $Y$ with parameters $n = 100$ and $p = \frac{1}{2}$ can be written as
$$Y = \sum_{i=1}^{100} X_i$$
where $X_1, \ldots, X_{100}$ are mutually independent Bernoulli random variables with parameter $p = \frac{1}{2}$. Thus:
$$Y = 100\left(\frac{1}{100}\sum_{i=1}^{100} X_i\right) = 100\,\bar{X}_{100}$$
In the first exercise, we have shown that the distribution of $\bar{X}_{100}$ can be approximated by a normal distribution:
$$\bar{X}_{100} \sim N\left(\frac{1}{2}, \frac{1}{400}\right)$$
Therefore, the distribution of $Y$ can be approximated by
$$Y \sim N\left(100\cdot\frac{1}{2},\ 100^2\cdot\frac{1}{400}\right)$$
Thus, $Y$ can be approximated by a normal distribution with mean $\mu = 50$ and variance $\sigma^2 = 25$.

[23] See p. 341.
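The quality of the approximation can be checked against the exact binomial probabilities using only the standard library. The threshold 55 and the continuity correction below are illustrative choices, not part of the exercise:

```python
from math import comb
from statistics import NormalDist

# Exact P(Y <= 55) for Y ~ Binomial(100, 1/2) versus the normal
# approximation N(50, 25) with a continuity correction (evaluate at 55.5).
exact = sum(comb(100, k) for k in range(56)) / 2 ** 100
approx = NormalDist(mu=50, sigma=5).cdf(55.5)
print(exact, approx)   # the two values agree to about two decimal places
```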
Statistika Matematika II (Mathematical Statistics II)

Bank Soal (Problem Bank) | Semester V Course

5.1 UTS (Midterm Exam) in Mathematical Statistics, 2022


1. Let $y_1, y_2, \ldots, y_n$ be a random sample from the pdf $f(y, \theta) = \theta y^{\theta-1}$ with $0 < y < 1$ and $\theta > 0$.

(a) Find the estimator of $\theta$ by the method of moments (MME).

(b) Show whether the estimator in (a) is biased or not.

2. Prove that if $\hat{\theta} - \theta = [\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta] = [\hat{\theta} - E(\hat{\theta})] + B(\hat{\theta})$, then $E[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta}) + (B(\hat{\theta}))^2$.

3. Let $y_1, y_2, \ldots, y_n$ be a random sample from the Poisson distribution with parameter $\lambda$.

(a) Find the estimator of $\lambda$ by maximum likelihood (MLE).

(b) Compute the expectation and variance of the estimator of $\lambda$.

4. Let $X$ be a random variable with pdf $f(x) = 4x^3$ for $0 < x < 1$. Find the pdf of the random variable

(a) $Y = e^X$;

(b) $U = (X - \frac{1}{2})^2$.

5. Given $X \sim \mathrm{BIN}(n, p)$. Find the pmf of the random variable $Y = n - X$.

©Keilmuan | 2024 HIMATIKA “REAL” FMIPA ULM



Solutions
1. Let $y_1, y_2, \ldots, y_n$ be a random sample from the pdf
$$f(y, \theta) = \begin{cases} \theta y^{\theta-1}, & 0 < y < 1,\ \theta > 0 \\ 0, & \text{otherwise} \end{cases}$$

(a) The first population moment is
$$M = \int_0^1 y\,\theta y^{\theta-1}\,dy = \int_0^1 \theta y^{\theta}\,dy = \frac{\theta}{\theta+1}\,y^{\theta+1}\Big|_0^1 = \frac{\theta}{\theta+1}.$$
Equating it to the sample mean $\bar{y}$ and solving:
$$\bar{y} = \frac{\hat{\theta}}{\hat{\theta}+1} \iff \hat{\theta} = \frac{\bar{y}}{1-\bar{y}}.$$

(b) Note that $\hat{\theta} = g(\bar{y})$ with $g(y) = y/(1-y)$, which is strictly convex on $(0,1)$ since $g''(y) = 2/(1-y)^3 > 0$. By Jensen's inequality (with $\bar{y}$ non-degenerate),
$$E(\hat{\theta}) = E[g(\bar{y})] > g(E[\bar{y}]) = g\!\left(\frac{\theta}{\theta+1}\right) = \theta,$$
so $E(\hat{\theta}) \neq \theta$: the estimator is biased (upward).

2. Given
$$\hat{\theta} - \theta = [\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta] = [\hat{\theta} - E(\hat{\theta})] + B(\hat{\theta}),$$
we obtain
$$\begin{aligned}
\mathrm{Var}(\hat{\theta}) + (B(\hat{\theta}))^2 &= E(\hat{\theta}^2) - E(\hat{\theta})^2 + (E(\hat{\theta}) - \theta)^2 \\
&= E(\hat{\theta}^2) - 2\theta E(\hat{\theta}) + \theta^2 \\
&= E(\hat{\theta}^2 - 2\theta\hat{\theta} + \theta^2) \\
&= E((\hat{\theta} - \theta)^2).
\end{aligned}$$

3. Let $y_1, y_2, \ldots, y_n$ be a random sample from the Poisson distribution with parameter $\lambda$.

(a) The Poisson distribution has pmf
$$P(Y = y) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{y}}{y!}, & y = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$


Fungsi pendugaan

L = P {y1 ∩ y2 ∩ · · · ∩ yn }

= P {y1 } ∩ P {y2 } ∩ · · · ∩ P {yn }


e−λ λy1 e−λ λy2 e−λ λyn
= · · ··· ·
y1 ! y2 ! yn !
Pn
−nλ y
e λ i=1 i
= Qn
i=1 yi !
n
!−1
Pn Y
−nλ y
=e λ i=1 i
yi ! ,
i=1

sehingga !
n
X n
X
ln(L) = −nλ + yi ln(λ) − ln(yi !).
i=1 i=1

Maximizing ln L with respect to λ:

∂ln L/∂λ = 0

−n + (Σᵢ₌₁ⁿ yᵢ)/λ = 0

λ̂ = (1/n) Σᵢ₌₁ⁿ yᵢ = ȳ,

with ∂²ln L/∂λ² = −(Σᵢ₌₁ⁿ yᵢ)/λ² < 0, so λ̂ = ȳ is the MLE of λ.
(b)

E(λ̂) = E(ȳ) = E((1/n) Σᵢ₌₁ⁿ yᵢ) = (1/n) Σᵢ₌₁ⁿ E(yᵢ) = (1/n)(nλ) = λ,

Var(λ̂) = Var(ȳ) = Var((1/n) Σᵢ₌₁ⁿ yᵢ) = (1/n²) Σᵢ₌₁ⁿ Var(yᵢ) = (1/n²)(nλ) = λ/n.
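A simulation sketch of part (b): averaging many realizations of λ̂ = ȳ should land near λ, with spread near λ/n. The Poisson sampler below (Knuth's multiplication method) is an assumption of this illustration, not part of the solution:

```python
import math
import random

# Simulation check: the Poisson MLE lambda_hat = ybar has mean ~ lambda and
# variance ~ lambda/n.
random.seed(2)

def poisson(lam):
    # Knuth's multiplication method; adequate for small lam
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam, n, reps = 4.0, 25, 4000
hats = [sum(poisson(lam) for _ in range(n)) / n for _ in range(reps)]
mean_hat = sum(hats) / reps
var_hat = sum((h - mean_hat) ** 2 for h in hats) / reps
print(mean_hat, var_hat)  # near 4.0 and 4/25 = 0.16
```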

4. The CDF corresponding to the pdf f(x) = 4x³ is F(x) = x⁴, 0 < x < 1.

(a) For Y = e^X,

F_Y(y) = P(e^X ≤ y) = P(X ≤ ln y) = F_X(ln y) = (ln y)⁴,

f_Y(y) = d/dy F_Y(y) = d/dy (ln y)⁴ = 4(ln y)³ / y.


Since the range of X is 0 < x < 1, the range of Y is 1 < y < e. Hence the pdf of Y is f_Y(y) = 4(ln y)³ / y for 1 < y < e.
(b) For U = (X − 0.5)²,

F_U(u) = P(U ≤ u)

= P((X − 0.5)² ≤ u)

= P(0.5 − √u ≤ X ≤ 0.5 + √u)

= F_X(0.5 + √u) − F_X(0.5 − √u),

f_U(u) = d/du F_U(u)

= f_X(0.5 + √u) · (1/(2√u)) − f_X(0.5 − √u) · (−1/(2√u))

= (1/(2√u)) (f_X(0.5 + √u) + f_X(0.5 − √u))

= (1/(2√u)) (4(0.5 + √u)³ + 4(0.5 − √u)³)

= (1 + 12u)/(2√u)

= (0.5 + 6u) u^(−1/2), for 0 < u < 0.25.
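Both transformed densities can be sanity-checked by comparing the derived CDFs with empirical frequencies from simulation (a sketch; the checkpoints y₀ and u₀ are arbitrary points inside the supports):

```python
import math
import random

# Compare the derived CDFs F_Y(y) = (ln y)^4 and
# F_U(u) = F_X(0.5 + sqrt(u)) - F_X(0.5 - sqrt(u)) with empirical frequencies.
# Sampling X: since F_X(x) = x^4, the inverse transform is X = V**0.25.
random.seed(3)
xs = [random.random() ** 0.25 for _ in range(200_000)]
y0, u0 = 1.5, 0.1  # arbitrary checkpoints in (1, e) and (0, 0.25)
emp_Y = sum(math.exp(x) <= y0 for x in xs) / len(xs)
emp_U = sum((x - 0.5) ** 2 <= u0 for x in xs) / len(xs)
F_Y = math.log(y0) ** 4
F_U = (0.5 + math.sqrt(u0)) ** 4 - (0.5 - math.sqrt(u0)) ** 4
print(abs(emp_Y - F_Y), abs(emp_U - F_U))  # both near 0
```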


5. The pdf of X ∼ BIN(n, p) is

P(X = x) = C(n, x) p^x q^(n−x), x = 0, 1, ..., n, where q = 1 − p.

Substituting Y = n − X ⟺ x = n − y gives the pdf of Y:

P(Y = y) = C(n, n−y) p^(n−y) q^(n−(n−y)) = C(n, y) q^y p^(n−y),

so Y ∼ BIN(n, q) = BIN(n, 1 − p).
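The symmetry C(n, n−y) = C(n, y) underlying this step, and the fact that the resulting pmf sums to one, are easy to verify exhaustively for a small case (a sketch; n = 7 and p = 0.3 are arbitrary):

```python
import math

# Check that P(Y = y) for Y = n - X, X ~ BIN(n, p), equals the BIN(n, q) pmf
# with q = 1 - p, and that the pmf sums to one.
n, p = 7, 0.3
q = 1 - p
for y in range(n + 1):
    lhs = math.comb(n, n - y) * p ** (n - y) * q ** (n - (n - y))
    rhs = math.comb(n, y) * q ** y * p ** (n - y)
    assert abs(lhs - rhs) < 1e-12
total = sum(math.comb(n, y) * q ** y * p ** (n - y) for y in range(n + 1))
print(total)  # 1.0 up to rounding
```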

Solutions: Gusti Muhammad Rosyadi. Problems: Yustisia Wibi Naufal.


5.2 UAS (Final Exam), Mathematical Statistics, 2022


1. The following data give beef prices (in rupiah) during the monetary crisis in two regions.

                      Region I    Region II
   Mean               38,750      36,150
   Sample variance    3,300       2,700
   Sample size        100         100

Find a 95% confidence interval for the difference in the mean beef prices above. (Weight: 25 points)

2. Data are given on a TV station's revenue from ad spots during sports broadcasts and movies. The revenue data are assumed to be normally distributed, with sample summaries in the table below.

   Sports                 Movies
   n₁ = 14                n₂ = 15
   X̄₁ = 6.8 billion       X̄₂ = 5.3 billion
   S₁ = 1.8 billion       S₂ = 1.6 billion

Further, assume that σ₁² = σ₂². Management wants to change the time slots of these two programs, but first wants to know whether ad revenue really differs between them. Carry out a hypothesis test using α = 5%. (Weight: 25 points)

3. Let X₁, X₂, X₃, ..., Xₙ be a random sample of size n from the distribution with pdf

f(x; θ) = θx^(θ−1) for 0 < x < 1, and 0 otherwise.

Prove that the best critical region for testing H₀: θ = 1 against H₁: θ = 2 is

C = {(x₁, x₂, ..., xₙ); c ≤ Πᵢ₌₁ⁿ xᵢ}.

4. Let X₁, X₂, X₃, ..., Xₙ be a random sample of size n from the distribution N(µ, 36). Prove that the uniformly most powerful critical region for testing H₀: µ = 50 against H₁: µ < 50 is C = {(x₁, x₂, ..., xₙ); x̄ ≤ c}.


Solutions
1. Let x̄₁ = 38,750, n₁ = 100, σ₁² = 3,300 and x̄₂ = 36,150, n₂ = 100, σ₂² = 2,700. Then

σ = √(σ₁²/n₁ + σ₂²/n₂) = √60 ≈ 7.746

and α/2 = 0.025 ⟹ Z₀.₀₂₅ = 1.96. The confidence interval is

(x̄₁ − x̄₂) − Z₀.₀₂₅ σ < µ₁ − µ₂ < (x̄₁ − x̄₂) + Z₀.₀₂₅ σ

2584.82 < µ₁ − µ₂ < 2615.18,

so the 95% confidence interval for the difference in mean beef prices is 2584.82 < µ₁ − µ₂ < 2615.18.
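The arithmetic can be reproduced directly (a sketch; z₀.₀₂₅ = 1.96 is taken from the standard normal table, as in the solution):

```python
import math

# Reproduce the 95% CI for mu1 - mu2 using the large-sample z interval.
x1, v1, n1 = 38750, 3300, 100
x2, v2, n2 = 36150, 2700, 100
se = math.sqrt(v1 / n1 + v2 / n2)   # sqrt(60) ~ 7.746
lo = (x1 - x2) - 1.96 * se
hi = (x1 - x2) + 1.96 * se
print(round(lo, 2), round(hi, 2))  # 2584.82 2615.18
```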

2. Given n₁ = 14, x̄₁ = 6.8, s₁ = 1.8 and n₂ = 15, x̄₂ = 5.3, s₂ = 1.6. First test the equal-variance assumption:

H₀: σ₁² = σ₂²

H₁: σ₁² ≠ σ₂²

Significance level and critical values:

α = 0.05 ⟹ f₍₀.₀₅; ₁₃,₁₄₎ = 2.51, f₍₀.₉₅; ₁₃,₁₄₎ = 1/2.51 = 0.39.

Critical region: reject H₀ if f < 0.39 or f > 2.51.

Test statistic:

f = s₁²/s₂² = (1.8)²/(1.6)² = 1.266 (falls outside the critical region).

Conclusion: do not reject H₀, so the data are consistent with σ₁² = σ₂². Note that this F test only supports the equal-variance assumption; the question of whether mean ad revenue differs between the two programs is then answered with a pooled two-sample t test on the means.
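To answer the question about mean revenue, a pooled two-sample t test follows the F test. A sketch (the critical value t₀.₀₂₅,₂₇ ≈ 2.052 is a t-table value assumed here, not given in the original):

```python
import math

# Pooled two-sample t test of H0: mu1 = mu2 against H1: mu1 != mu2,
# valid under the equal-variance assumption supported by the F test.
n1, x1, s1 = 14, 6.8, 1.8
n2, x2, s2 = 15, 5.3, 1.6
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled var
t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 3))  # ~2.375
```

Since the statistic exceeds the tabled critical value (about 2.052 with 27 degrees of freedom), the means would be judged different at the 5% level under these numbers.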

3. The probability density function of X is

f(x; θ) = θx^(θ−1) for 0 < x < 1, and 0 otherwise.


The joint pdf of X₁, X₂, ..., Xₙ is

L(θ; x₁, x₂, ..., xₙ) = f(x₁; θ) · f(x₂; θ) ··· f(xₙ; θ)

= (θx₁^(θ−1))(θx₂^(θ−1)) ··· (θxₙ^(θ−1))

= θⁿ (Πᵢ₌₁ⁿ xᵢ)^(θ−1).

The joint pdf of X₁, X₂, ..., Xₙ under H₀ is

L(1; x₁, x₂, ..., xₙ) = 1ⁿ (Πᵢ₌₁ⁿ xᵢ)^(1−1) = 1.

The joint pdf of X₁, X₂, ..., Xₙ under H₁ is

L(2; x₁, x₂, ..., xₙ) = 2ⁿ (Πᵢ₌₁ⁿ xᵢ)^(2−1) = 2ⁿ Πᵢ₌₁ⁿ xᵢ.

By the Neyman–Pearson lemma,

L(1; x₁, x₂, ..., xₙ) / L(2; x₁, x₂, ..., xₙ) ≤ k,

so that

1/(2ⁿ Πᵢ₌₁ⁿ xᵢ) ≤ k ⟺ Πᵢ₌₁ⁿ xᵢ ≥ 1/(2ⁿ k) = c.

Hence the best critical region is

C = {(x₁, x₂, ..., xₙ); c ≤ Πᵢ₌₁ⁿ xᵢ}.
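A quick numerical check (a sketch) that the inequality L(1)/L(2) ≤ k really is equivalent to Πxᵢ ≥ 1/(2ⁿk) = c, for arbitrary samples and an arbitrary k:

```python
import math
import random

# Verify the equivalence  L(1)/L(2) <= k  <=>  prod(x_i) >= 1/(2^n * k)
# on many random samples from (0, 1).
random.seed(4)
n, k = 6, 0.5
c = 1 / (2 ** n * k)
checks = []
for _ in range(1000):
    xs = [random.random() for _ in range(n)]
    prod = math.prod(xs)
    ratio = 1.0 / (2 ** n * prod)   # L(1)/L(2) for this sample
    checks.append((ratio <= k) == (prod >= c))
ok = all(checks)
print(ok)  # True
```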

4. The probability density function of X is

f(x; µ) = (1/(6√(2π))) exp(−(x − µ)²/72),

with H₀: µ = 50 and H₁: µ = µ′ for some µ′ < 50. The joint pdf of X₁, X₂, ..., Xₙ has the form

L(µ; x₁, x₂, ..., xₙ) = f(x₁; µ) · f(x₂; µ) ··· f(xₙ; µ)

= (1/(6√(2π)))ⁿ exp(−(1/72) ((x₁ − µ)² + (x₂ − µ)² + ··· + (xₙ − µ)²))

= (1/(6√(2π)))ⁿ exp(−(1/72) Σᵢ₌₁ⁿ (xᵢ − µ)²).

The joint pdf under H₀ has the form

L(50; x₁, x₂, ..., xₙ) = (1/(6√(2π)))ⁿ exp(−(1/72) Σᵢ₌₁ⁿ (xᵢ − 50)²).


The joint pdf under H₁ has the form

L(µ′; x₁, x₂, ..., xₙ) = (1/(6√(2π)))ⁿ exp(−(1/72) Σᵢ₌₁ⁿ (xᵢ − µ′)²).

By the Neyman–Pearson lemma,

L(50; x₁, x₂, ..., xₙ) / L(µ′; x₁, x₂, ..., xₙ) ≤ k

so that

exp(−(1/72) Σᵢ₌₁ⁿ (xᵢ − 50)² + (1/72) Σᵢ₌₁ⁿ (xᵢ − µ′)²) ≤ k

⟺ exp((1/72) (n(100 − 2µ′) x̄ + n((µ′)² − 50²))) ≤ k

⟺ exp(c₁ x̄ + c₂) ≤ k, with c₁ = n(100 − 2µ′)/72 > 0 (since µ′ < 50) and c₂ = n((µ′)² − 2500)/72,

⟺ c₁ x̄ + c₂ ≤ ln k

⟺ x̄ ≤ (ln k − c₂)/c₁ = c.

Since the form of the region does not depend on the particular value of µ′ < 50, the test is uniformly most powerful, with critical region

C = {(x₁, x₂, ..., xₙ); x̄ ≤ c}.
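A numerical sketch of the key step: for any fixed µ′ < 50, the log of L(50)/L(µ′) is an increasing function of x̄, so thresholding the likelihood ratio is the same as thresholding x̄. The sample generator (random.gauss(49, 6)) is an arbitrary illustration:

```python
import random

# Check: for a fixed mu' < 50, log(L(50)/L(mu')) is increasing in xbar,
# matching the derivation above for N(mu, 36) data.
def log_ratio(xs, mu1):
    # log of L(50)/L(mu1): (sum (x - mu1)^2 - sum (x - 50)^2) / 72
    return (sum((x - mu1) ** 2 for x in xs) - sum((x - 50) ** 2 for x in xs)) / 72

random.seed(5)
mu1 = 47.0  # an arbitrary alternative below 50
samples = [[random.gauss(49, 6) for _ in range(10)] for _ in range(200)]
pairs = sorted((sum(xs) / len(xs), log_ratio(xs, mu1)) for xs in samples)
increasing = all(a[1] <= b[1] for a, b in zip(pairs, pairs[1:]))
print(increasing)  # True: sorting by xbar also sorts the log-ratio
```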

Solutions: Gusti Muhammad Rosyadi. Problems: Yustisia Wibi Naufal.
