Stat2602 Probability and Statistics II Fall 2014-2015

Chapter II Sampling Distributions and Large Sample Theories

§ 2.1 Sampling Distributions

In mathematical statistics, the formal definition of a random sample is given as follows.

Definition

Let $X_1, X_2, \dots, X_n$ be independent random variables that have the same marginal distribution with pdf $f(x)$. Then $X_1, X_2, \dots, X_n$ are said to be independently and identically distributed (iid) with common pdf $f(x)$. We call this set of random variables $\{X_1, X_2, \dots, X_n\}$ a random sample from $f(x)$. For a random sample, all the random variables have common mean and variance, i.e.

$$E(X_i) = \mu, \qquad \operatorname{Var}(X_i) = \sigma^2.$$

Based on a random sample, we usually compute summary statistics such as the sample mean and sample variance. The probabilistic behaviours of these summary statistics are called sampling distributions.

Sample Mean: $\bar{X} = \dfrac{1}{n} \sum_{i=1}^{n} X_i$

$$E(\bar{X}) = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

The standard deviation of the sampling distribution of $\bar{X}$ is called the standard error of the sample mean and can be computed by

$$se(\bar{X}) = \frac{\sigma}{\sqrt{n}}.$$

Note that the standard error gets smaller as the sample size increases. This reflects the fact that the mean from a large sample is more likely to be close to $\mu$ than the mean from a small sample. The standard error may be interpreted as a typical distance between the sample mean and the population mean.

In practice the population standard deviation $\sigma$ is unknown. It may be estimated by the sample standard deviation $S$, and the standard error of the sample mean may be computed by

$$se(\bar{X}) = \frac{S}{\sqrt{n}}.$$

The importance of the standard error in statistical inference procedures will be evident in later chapters. At this point, however, it can be noted that the magnitude of $se(\bar{X})$ is helpful in determining the precision to which the mean and some measures of variability may be reported.

Sample Variance: $S^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

$$E(S^2) = \sigma^2$$

Moment generating function of the sample mean

Let $X_1, X_2, \dots, X_n$ be independently and identically distributed with common moment generating function $M_X(t)$. Then the moment generating function of the sample mean can be evaluated by

$$M_{\bar{X}}(t) = \left[ M_X\!\left( \frac{t}{n} \right) \right]^n.$$

Example 2.1

Let $X_1, X_2, \dots, X_n \overset{iid}{\sim} \chi^2_r$ be a random sample from the Chi-square distribution. Then the moment generating function of the sample mean is given by

$$M_{\bar{X}}(t) = \left[ M_X\!\left( \frac{t}{n} \right) \right]^n = \left( \frac{1}{1 - 2t/n} \right)^{nr/2} = \left( \frac{n/2}{n/2 - t} \right)^{nr/2} \quad \text{for } t < \frac{n}{2},$$

which is the mgf of $\operatorname{Gamma}\!\left( \frac{nr}{2}, \frac{n}{2} \right)$. Therefore $\bar{X} \sim \operatorname{Gamma}\!\left( \frac{nr}{2}, \frac{n}{2} \right)$.

§ 2.1.2 Sampling Distributions from a Normal Random Sample

In practical statistical analysis, a common assumption is that the population closely resembles a normal distribution. The data analysis is then usually based on the sampling distributions of the sample mean $\bar{X}$ and sample variance $S^2$ obtained from a random sample $X_1, X_2, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, which are summarized below:

1. Sampling distribution of the sample mean

$$\bar{X} \sim N\!\left( \mu, \frac{\sigma^2}{n} \right)$$

2. Sampling distribution of the sample variance

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

3. The sample mean and sample variance are statistically independent.

4. Standardizations of the sample mean

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
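
These four facts can be illustrated by simulation. The following sketch (the parameters and seed are illustrative, not from the notes) draws many normal samples and checks the stated distributions of $\bar{X}$, $(n-1)S^2/\sigma^2$ and $T$, as well as the zero correlation implied by the independence of $\bar{X}$ and $S^2$.

```python
# Simulation sketch of facts 1-4 for a normal random sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = 3.2, 0.6, 20
x = rng.normal(mu, sigma, size=(200_000, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

# Facts 1 and 2: compare with the stated sampling distributions.
print(stats.kstest(xbar, stats.norm(mu, sigma / np.sqrt(n)).cdf))
print(stats.kstest((n - 1) * s2 / sigma**2, stats.chi2(n - 1).cdf))

# Fact 3: independence implies zero correlation between Xbar and S^2.
print(np.corrcoef(xbar, s2)[0, 1])        # near 0

# Fact 4: T should follow the t-distribution with n-1 df.
t = (xbar - mu) / np.sqrt(s2 / n)
print(stats.kstest(t, stats.t(n - 1).cdf))
```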

Example 2.2

A company manufactures semiconductor components with thicknesses (in 0.001 inch) following the normal distribution with $\mu = 3.2$ and $\sigma = 0.60$. To keep a check on the process, the QC department periodically takes random samples of size $n = 20$, and the process is regarded as "out of control" whenever the sample mean $\bar{X}$ is found to be smaller than 2.85 or larger than 3.55, or the sample standard deviation $S$ is larger than a threshold value of 0.828. What is the chance that a random sample will falsely signal that the process is "out of control", even when the mean $\mu$ and standard deviation $\sigma$ remain unchanged?

Random sample: $X_1, X_2, \dots, X_{20} \overset{iid}{\sim} N(3.2, 0.60^2)$


P  X  2.85 or X  3.55  P  X    0.35


 X  0.35 
 P   

  n 0. 60 20 
 P  Z  2.61 ( Z ~ N 0, 1 )
 21   2.61
 0.009 (from standard normal table)

 n  1S 2 19  0.8282 
P S  0.828  P   
  2
0 . 60 2

 P W  36.18 ( W ~ 192 )
 0.01 (from Chi-square table)

P alarm " out of control"  P  X  3.2  0.35 or S  0.828


 1  P  X  3.2  0.35 and S  0.828
 1  P  X  3.2  0.35P S  0.828
( X and S 2 are independent)
 1  1  0.009 1  0.01
 0.0189

§ 2.1.3 Snedecor's F-Distribution

The Snedecor's F-distribution (or, in short, the F-distribution) is a continuous distribution that was created by George W. Snedecor and named in honour of Sir Ronald A. Fisher, who is often regarded as the "father of statistics". It is frequently used in statistical analysis methods that compare variances, e.g. in the analysis of variance (ANOVA) for analysing experimental data.

Similar to the Student's t-distribution, the F-distribution is motivated by the problem of seeking the distribution of the following ratio:

$$X = \frac{W_1 / r_1}{W_2 / r_2}$$

where $W_1 \sim \chi^2_{r_1}$ and $W_2 \sim \chi^2_{r_2}$ are two independent Chi-square random variables.

The derivation of the pdf of $X$ is similar to that for the Student's t-distribution and can be found in the supplementary notes.


Definition

Let $W_1$ and $W_2$ be two random variables independently distributed as Chi-square with degrees of freedom $r_1$ and $r_2$ respectively. The ratio $X = \dfrac{W_1 / r_1}{W_2 / r_2}$ is said to have an F-distribution with numerator degrees of freedom $r_1$ and denominator degrees of freedom $r_2$. It is denoted as $X \sim F(r_1, r_2)$.

r r 
 1 2  r1

r r 1 2

2   r1  2 1  r  2
f x   
2 r 1

Probability density function   x 1  1 x  , x0


 r1   r2   r2   r 
   2

2  2

Moment generating function Not defined

 r2
 for r  2
Mean and variance    r2  2
 undefined
 otherwise

 2r22 r1  r2  2 
 for r  4
 2   r1 r2  2 r2  4
2

 undefined
 otherwise

Unlike the normal and t-distributions, the F-distribution is asymmetric and skewed to the right.


[Figure: pdf of the $F(r_1, r_2)$ distribution, a right-skewed density over $(0, \infty)$ with upper-tail area $\alpha$ above the critical value $F_{r_1, r_2, \alpha}$.]

The notation $F_{r_1, r_2, \alpha}$ represents the $100(1-\alpha)$-th percentile of the $F(r_1, r_2)$ distribution.

The F-distribution table

The F-distribution table gives the critical value under an F-distribution curve corresponding to a given tail area. Each table entry of three numbers represents an F-distribution with specific degrees of freedom. The three numbers are the critical values corresponding to the commonly used tail areas 0.05, 0.01 and 0.001. For example, for $r_1 = 3$, $r_2 = 5$, we have $F_{3,5,0.05} = 5.41$, $F_{3,5,0.01} = 12.06$, $F_{3,5,0.001} = 33.20$.


Theorem

Let $X_1, X_2, \dots, X_m$ be i.i.d. random variables distributed as $N(\mu_x, \sigma_x^2)$ and $Y_1, Y_2, \dots, Y_n$ be i.i.d. random variables distributed as $N(\mu_y, \sigma_y^2)$. Assume that the two random samples are independent. Denote by $S_x^2$ and $S_y^2$ the sample variances from the X-sample and the Y-sample respectively. The sampling distribution of

$$F = \frac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2}$$

is the F-distribution with degrees of freedom $m-1$ and $n-1$, i.e. $F \sim F(m-1, n-1)$.

Proof

First of all, from the two samples, we have

$$W_1 = \frac{(m-1)S_x^2}{\sigma_x^2} \sim \chi^2_{m-1} \quad \text{independent of} \quad W_2 = \frac{(n-1)S_y^2}{\sigma_y^2} \sim \chi^2_{n-1}.$$

Then from the definition of the F-distribution,

$$F = \frac{W_1 / (m-1)}{W_2 / (n-1)} = \frac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2} \sim F(m-1, n-1).$$

Example 2.3

Suppose there are two normal populations with unknown variances $\sigma_x^2$ and $\sigma_y^2$. Johnnie claims that the two populations have the same variance, i.e. $\sigma_x^2 = \sigma_y^2$. To make inference about his claim, he draws random samples of sizes $m = n = 8$ independently from each of the two populations. If his claim is correct, what is the chance that the ratio of the sample variances, $S_x^2 / S_y^2$, would exceed 7?

If Johnnie's claim is correct, $\sigma_x^2 / \sigma_y^2 = 1$ and we have

$$F = \frac{S_x^2}{S_y^2} \sim F(m-1, n-1).$$


From the F-distribution table, with degrees of freedom $m-1 = 7$ and $n-1 = 7$, $F_{7,7,0.01} = 6.99$. Therefore

$$P\!\left( \frac{S_x^2}{S_y^2} > 7 \right) < 0.01.$$

So if the sample variances of his samples are calculated as $S_x^2 = 16.26$ and $S_y^2 = 2.32$, the ratio is equal to $16.26 / 2.32 = 7.009 > 7$. This would happen with less than 1% chance if his claim were correct. Hence, based on the observed data, we may draw the conclusion that his claim is not likely to be correct, i.e. it is more reasonable to believe that the two populations have different variances.

§ 2.1.4 Order Statistics

Besides the sample mean and sample variance, some other commonly used sample statistics, such as the maximum, minimum, median, quantiles, etc., are evaluated from the sorted values of the sample. The items of the random sample arranged from the smallest to the largest are called the order statistics. Order statistics are important in nonparametric inference, which was developed to deal with violations of the normal population assumption, or with an unknown population distribution.

Definition

Suppose that $X_1, X_2, \dots, X_n$ is a random sample of size n from a distribution. We let the random variables

$$X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$$

denote the order statistics of this sample, i.e. $X_{(1)}$ is the minimum, $X_{(n)}$ is the maximum, and $X_{(k)}$ is the k-th smallest of $X_1, X_2, \dots, X_n$.

Note: Don't confuse $X_{(k)}$ with $X_k$. The notation $X_{(k)}$ represents a random variable transformed from $X_1, X_2, \dots, X_n$ and usually has a different distribution from the marginal distribution of $X_k$, as illustrated by the following example.


Example 2.4

Suppose that $X_1, X_2 \overset{iid}{\sim} U(0,1)$. Then the order statistic $X_{(2)}$ is given by

$$X_{(2)} = \begin{cases} X_2 & \text{if } X_1 \le X_2, \\ X_1 & \text{if } X_1 > X_2. \end{cases}$$

The cdf of $X_{(2)}$ is given by

$$\begin{aligned}
G(y) &= P(X_{(2)} \le y) \\
&= P(X_2 \le y \text{ and } X_1 \le X_2) + P(X_1 \le y \text{ and } X_1 > X_2) \\
&= P(X_1 \le X_2 \le y) + P(X_2 < X_1 \le y) \\
&= \int_0^y \!\! \int_0^{x_2} 1 \, dx_1 \, dx_2 + \int_0^y \!\! \int_0^{x_1} 1 \, dx_2 \, dx_1 \\
&= \int_0^y x_2 \, dx_2 + \int_0^y x_1 \, dx_1 \\
&= y^2 \quad \text{for } 0 \le y \le 1.
\end{aligned}$$

Hence the pdf of $X_{(2)}$ is obtained as

$$g(y) = G'(y) = 2y, \quad 0 \le y \le 1.$$

Therefore $X_{(2)}$ does not follow the uniform distribution.

Remark

In practice, we may observe ties when we sort the sample, especially when the measured variable is discrete. The presence of ties would complicate the distribution theory of order statistics. Therefore, in the following discussion of order statistics, we will assume that the random sample arises from a continuous distribution, so that the probability of observing ties is zero. In this case, general formulae for the probability density functions of the order statistics can be easily derived.


Theorem

Suppose that $X_1, X_2, \dots, X_n$ is a random sample of size n from a continuous distribution with cumulative distribution function $F(x)$ and probability density function $f(x)$, for $a < x < b$. Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be the order statistics of the sample. Then for $k = 1, 2, \dots, n$, the probability density function of the k-th order statistic $X_{(k)}$ is given by

$$g_k(y) = \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y), \quad a < y < b.$$

In particular, the probability density function of $X_{(n)}$ (the sample maximum) is

$$g_n(y) = n [F(y)]^{n-1} f(y), \quad a < y < b;$$

and the probability density function of $X_{(1)}$ (the sample minimum) is

$$g_1(y) = n [1 - F(y)]^{n-1} f(y), \quad a < y < b.$$

An informal justification of the formulae is given below. Those who are interested in the formal proof may refer to the supplementary notes.

Suppose that the sample space $(a, b)$ is divided into three intervals: one from $a$ to $y$, a second from $y$ to $y+h$ (where $h$ is a very small positive number), and the third from $y+h$ to $b$.

[Diagram: $k-1$ of the sample fall in $(a, y)$; $X_{(k)}$ falls in $(y, y+h)$; $n-k$ of the sample fall in $(y+h, b)$.]

Each of the sample items $X_1, X_2, \dots, X_n$ independently falls into the three intervals with probabilities $F(y)$, $F(y+h) - F(y)$, and $1 - F(y+h)$ respectively. For the value of $X_{(k)}$ to fall into the second interval $(y, y+h)$, there should be $k-1$ of $X_1, X_2, \dots, X_n$ falling into the first interval, 1 falling into the second interval, and $n-k$ falling into the third interval.


According to the multinomial distribution, the corresponding probability is

$$P(y < X_{(k)} \le y + h) \approx \frac{n!}{(k-1)!\,1!\,(n-k)!} [F(y)]^{k-1} [F(y+h) - F(y)] [1 - F(y+h)]^{n-k}.$$

Dividing both sides by h and letting $h \to 0$, we have

$$\begin{aligned}
g_k(y) &= \lim_{h \to 0} \frac{P(y < X_{(k)} \le y + h)}{h} \\
&= \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} \lim_{h \to 0} \left\{ \frac{F(y+h) - F(y)}{h} \,[1 - F(y+h)]^{n-k} \right\} \\
&= \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y).
\end{aligned}$$

Example 2.5

Consider a random sample from the exponential distribution:

$$X_1, X_2, \dots, X_n \overset{iid}{\sim} \operatorname{Exponential}(\lambda)$$

The pdf of the sample maximum $X_{(n)}$ is

$$g_n(y) = n [F(y)]^{n-1} f(y) = n (1 - e^{-\lambda y})^{n-1} \lambda e^{-\lambda y} = n\lambda e^{-\lambda y} (1 - e^{-\lambda y})^{n-1}, \quad y > 0.$$

The pdf of the sample minimum $X_{(1)}$ is

$$g_1(y) = n [1 - F(y)]^{n-1} f(y) = n (e^{-\lambda y})^{n-1} \lambda e^{-\lambda y} = n\lambda e^{-(n\lambda) y}, \quad y > 0.$$

Hence $X_{(1)} \sim \operatorname{Exponential}(n\lambda)$, i.e. the sample minimum of an exponential random sample is also distributed as exponential, with the parameter magnified n times.
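
A quick simulation sketch of this result (the rate $\lambda$, the sample size and the seed are illustrative choices). Note that NumPy and SciPy use the scale parameterization, so Exponential($\lambda$) corresponds to scale $1/\lambda$.

```python
# The minimum of n iid Exponential(lam) draws should be Exponential(n*lam).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, n = 0.5, 10
mins = rng.exponential(scale=1 / lam, size=(100_000, n)).min(axis=1)

print(stats.kstest(mins, stats.expon(scale=1 / (n * lam)).cdf))
print(mins.mean(), 1 / (n * lam))   # both near 0.2
```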


Example 2.6

Consider a random sample from the uniform distribution on $(0,1)$:

$$X_1, X_2, \dots, X_n \overset{iid}{\sim} U(0,1)$$

The pdf of the k-th order statistic is given by

$$\begin{aligned}
g_k(y) &= \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y) \\
&= \frac{n!}{(k-1)!\,(n-k)!}\, y^{k-1} (1-y)^{n-k}, \quad 0 < y < 1 \\
&= \frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)}\, y^{k-1} (1-y)^{n-k}, \quad 0 < y < 1
\end{aligned}$$

which is the pdf of the beta distribution with parameters $\alpha = k$ and $\beta = n - k + 1$. Therefore

$$X_{(k)} \sim \operatorname{Beta}(k,\, n-k+1).$$

Remarks

1. Unlike the random variables $X_1, X_2, \dots, X_n$ in the random sample, the order statistics $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ are usually dependent.

2. Using arguments similar to the informal justification above, the joint pdf of $X_{(j)}$ and $X_{(k)}$ ($j < k$) can be easily obtained as

$$g_{j,k}(x, y) = \frac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!} [F(x)]^{j-1} [F(y) - F(x)]^{k-j-1} [1 - F(y)]^{n-k} f(x) f(y)$$

where $x < y$.

3. The joint pdf of $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ is

$$g_{1,2,\dots,n}(y_1, y_2, \dots, y_n) = \begin{cases} n!\, f(y_1) f(y_2) \cdots f(y_n) & \text{if } y_1 < y_2 < \dots < y_n, \\ 0 & \text{otherwise.} \end{cases}$$


§ 2.2 Large Sample Theory

The probabilistic behaviour of the sample mean when the sample size n is large (say, tends to infinity) is called the limiting distribution of the sample mean. The law of large numbers (LLN) and the central limit theorem (CLT) are two of the most important theorems in statistics concerning the limiting distribution of the sample mean. These two theorems establish the "nice" properties of the sample mean and justify its advantages.

Before proceeding, we need to define what 'convergence' means in the context of random variables.

§ 2.2.1 Modes of Convergence

Let $X_1, X_2, \dots$ be a sequence of random variables (not necessarily independent) and let $X$ be another random variable. Let $F_{X_n}(x)$ be the cumulative distribution function of $X_n$, and $F_X(x)$ be the cumulative distribution function of $X$.

Converges in Distribution / Converges in Law / Weak Convergence

$X_n$ is said to converge in distribution to $X$ if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

for all points x at which $F_X(x)$ is continuous. It is denoted as $X_n \overset{L}{\to} X$.

Example 2.7

Suppose that $U_1, U_2, \dots \overset{iid}{\sim} U(0,1)$. Define $X_n$ as the maximum of $U_1, U_2, \dots, U_n$. Then the cumulative distribution function of $X_n$ is given by $F_{X_n}(x) = 0$ for $x < 0$, $F_{X_n}(x) = 1$ for $x > 1$, and

$$F_{X_n}(x) = P(X_n \le x) = P(U_1 \le x, U_2 \le x, \dots, U_n \le x) = P(U_1 \le x) P(U_2 \le x) \cdots P(U_n \le x) = x^n \quad \text{for } 0 \le x \le 1.$$


Therefore

$$\lim_{n \to \infty} F_{X_n}(x) = \begin{cases} 0 & \text{if } x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}$$

On the other hand, consider a random variable $X$ which is degenerate at 1, i.e. $P(X = 1) = 1$. The cumulative distribution function of $X$ is

$$F_X(x) = P(X \le x) = \begin{cases} 0 & \text{if } x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}$$

Hence $\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$ and therefore $X_n \overset{L}{\to} X$. We may also write $X_n \overset{L}{\to} 1$, as $X$ is degenerate at 1.

Now consider another random variable defined as $Y_n = n(1 - X_n)$. The cumulative distribution function of $Y_n$ is

$$F_{Y_n}(y) = P(Y_n \le y) = P(n(1 - X_n) \le y) = P\!\left( X_n \ge 1 - \frac{y}{n} \right) = 1 - F_{X_n}\!\left( 1 - \frac{y}{n} \right)$$
$$= \begin{cases} 0 & \text{if } y < 0, \\ 1 - \left( 1 - \dfrac{y}{n} \right)^n & \text{if } 0 \le y < n, \\ 1 & \text{if } y \ge n. \end{cases}$$

Therefore

$$\lim_{n \to \infty} F_{Y_n}(y) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - e^{-y} & \text{if } 0 \le y < \infty, \end{cases}$$

which is the cumulative distribution function of $\operatorname{Exponential}(1)$. Hence $Y_n = n(1 - X_n)$ converges in distribution to an exponential random variable with parameter $\lambda = 1$, i.e.

$$Y_n = n(1 - \max(U_1, U_2, \dots, U_n)) \overset{L}{\to} \operatorname{Exponential}(1).$$
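
This limit can be observed numerically: as n grows, the Kolmogorov-Smirnov distance between simulated values of $Y_n$ and the Exponential(1) cdf shrinks. A minimal sketch (the sizes and seed are illustrative):

```python
# n*(1 - max(U_1, ..., U_n)) approaches Exponential(1) in distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
for n in (5, 50, 500):
    y = n * (1 - rng.uniform(size=(100_000, n)).max(axis=1))
    print(n, stats.kstest(y, stats.expon.cdf).statistic)  # shrinks with n
```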


Converges in Probability

$X_n$ is said to converge in probability to $X$ if for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0.$$

It is denoted as $X_n \overset{P}{\to} X$.

Example 2.8

Consider the $X_n$ defined in Example 2.7. Obviously,

$$P(|X_n - 1| > \varepsilon) = 0 \quad \text{if } \varepsilon \ge 1.$$

For any $0 < \varepsilon < 1$,

$$P(|X_n - 1| > \varepsilon) = P(1 - X_n > \varepsilon) = P(X_n < 1 - \varepsilon) = F_{X_n}(1 - \varepsilon) = (1 - \varepsilon)^n.$$

Therefore for any $\varepsilon > 0$, $\lim_{n\to\infty} P(|X_n - 1| > \varepsilon) = 0$ and hence $X_n \overset{P}{\to} 1$.

Remarks

1. If $X_n$ converges in distribution to $X$, it only means that the "behaviour" of $X_n$ gets closer and closer to the "behaviour" of $X$. It does not guarantee that the observed value of $X_n$ will often be close to the observed value of $X$.

On the other hand, if $X_n$ converges in probability to $X$, it means that it becomes more and more likely that the observed value of $X_n$ is arbitrarily close to the observed value of $X$.

For example, if $X_1, X_2, \dots \overset{iid}{\sim} N(0,1)$, then $X_n \overset{L}{\to} X_1$ because they have the same distribution, but $X_n$ does not converge in probability to $X_1$, as $X_n - X_1 \sim N(0, 2)$.

2. To check convergence in distribution, nothing needs to be known about the joint distribution of $X_n$ and $X$, whereas this joint distribution must be defined to check convergence in probability.


Converges Almost Surely / Converges Almost Everywhere / Strong Convergence

$X_n$ is said to converge almost surely to $X$ if

$$P\!\left( \lim_{n \to \infty} X_n = X \right) = 1.$$

It is denoted as $X_n \overset{a.s.}{\to} X$.

To understand almost sure convergence, recall the basic definition of a random variable. A random variable is a real-valued function defined on the sample space $\Omega$, and $X_n \overset{a.s.}{\to} X$ means

$$\lim_{n \to \infty} X_n(\omega) = X(\omega)$$

for all $\omega \in E$ such that $P(E) = 1$, i.e. the convergence of the sequence of numbers $X_n(\omega)$ to $X(\omega)$ holds for almost all $\omega$.

Example 2.9

Suppose that $U \sim U(0,1)$ and define

$$X_n = \begin{cases} 1/n & \text{if } U < 1/2, \\ 1/2 & \text{if } U = 1/2, \\ 1 - 1/n & \text{if } U > 1/2. \end{cases}$$

Then for a particular value of U,

$$\lim_{n \to \infty} X_n = \begin{cases} 0 & \text{if } U < 1/2, \\ 1/2 & \text{if } U = 1/2, \\ 1 & \text{if } U > 1/2. \end{cases}$$

Consider the Bernoulli random variable defined using the same U:

$$X = \begin{cases} 0 & \text{if } U \le 1/2, \\ 1 & \text{if } U > 1/2. \end{cases}$$

Obviously $\lim_{n\to\infty} X_n = X$ if and only if $U \ne 1/2$, so

$$P\!\left( \lim_{n \to \infty} X_n = X \right) = P(U \ne 1/2) = 1$$

and hence $X_n \overset{a.s.}{\to} X$.


Remarks

1. The relationships among the above three modes of convergence are as follows:

$$X_n \overset{a.s.}{\to} X \;\Rightarrow\; X_n \overset{P}{\to} X \;\Rightarrow\; X_n \overset{L}{\to} X.$$

Note that the converses may not be true.

2. Although the definitions of convergence in probability and almost sure convergence look similar, they are different statements. To understand the difference, consider the analogous scenario of an archer who improves his skill every time he shoots. At the beginning he may miss the bullseye 1 time in 10. Later he may miss the bullseye 1 time in 100. After more practice, he misses only 1 time in 1000, and so on. We can say that his shooting converges in probability to the bullseye. However, it does not converge almost surely, because he will still miss infinitely often no matter how many times he practices. In other words, there is always imperfection in his shooting, even though it becomes increasingly less frequent.

Example 2.10

Consider a point $\omega$ uniformly drawn from $[0,1]$. Define

$$X(\omega) = \omega,$$
$$X_1(\omega) = \omega + I_{[0,1]}(\omega),$$
$$X_2(\omega) = \omega + I_{[0,1/2]}(\omega), \qquad X_3(\omega) = \omega + I_{[1/2,1]}(\omega),$$
$$X_4(\omega) = \omega + I_{[0,1/3]}(\omega), \qquad X_5(\omega) = \omega + I_{[1/3,2/3]}(\omega), \qquad X_6(\omega) = \omega + I_{[2/3,1]}(\omega),$$
$$\dots$$

Obviously, the deviation of $X_n(\omega)$ from $X(\omega)$ is an indicator variable $I_{A_n}(\omega)$, where the length of the interval $A_n$ converges to zero as $n \to \infty$. For any $\varepsilon > 0$, the probability $P(|X_n - X| > \varepsilon)$ is equal to the probability that $\omega$ falls into $A_n$, which tends to 0. Hence $X_n \overset{P}{\to} X$.

However, for every $\omega$, the value $X_n(\omega)$ alternates between the values $\omega$ and $\omega + 1$ infinitely often. There is no value of $\omega \in [0,1]$ for which $X_n(\omega)$ converges to $X(\omega)$, i.e. $X_n$ does not converge to $X$ almost surely.


§ 2.2.2 Law of Large Numbers (LLN)

Theorem

Let $X_1, X_2, \dots$ be a sequence of independently and identically distributed random variables with finite mean $E(X_i) = \mu$ and variance $\operatorname{Var}(X_i) = \sigma^2$. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ be the sample mean of the random sample. Then the weak law of large numbers (WLLN) states that $\bar{X}_n \overset{P}{\to} \mu$, i.e. for an arbitrary number $\varepsilon > 0$, we have

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0$$

or alternatively,

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \le \varepsilon) = 1.$$

Proof

$$E(\bar{X}_n) = \mu, \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$

By Chebyshev's inequality,

$$P(|\bar{X}_n - \mu| > \varepsilon) = P(|\bar{X}_n - E(\bar{X}_n)| > \varepsilon) \le \frac{\operatorname{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n \varepsilon^2}.$$

Taking the limit on both sides,

$$0 \le \lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) \le \lim_{n \to \infty} \frac{\sigma^2}{n \varepsilon^2} = 0.$$

Therefore $\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0$.
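
A short simulation sketch of the weak law (the Exponential(1) population, with $\mu = 1$, is an illustrative choice): the running sample mean settles near the population mean as n grows.

```python
# Running means of iid Exponential(1) draws converge to mu = 1.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 1_000_000):
    print(n, running_mean[n - 1])   # approaches 1
```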


Remarks

1. The weak law of large numbers states that the sample mean from a large sample has a very high probability of being arbitrarily close to the population mean, thereby promising a stable performance of the sample mean.

2. A more general version of the weak law of large numbers states that if $E(|X_i|) < \infty$, then $\bar{X}_n \overset{P}{\to} \mu$. Note that it does not require a finite population variance.

3. The strong law of large numbers (SLLN) states that $\bar{X}_n \overset{a.s.}{\to} \mu$ if and only if $E(|X_i|) < \infty$. In other words, with probability 1 we have $\lim_{n\to\infty} \bar{X}_n = \mu$. The proof of the strong law is omitted here and can be found in the classic book "A Course in Probability Theory" by Kai-Lai Chung.

4. The weak law states that for sufficiently large n, the sample mean $\bar{X}_n$ is likely to be near $\mu$, while still allowing $|\bar{X}_n - \mu| > \varepsilon$ to happen an infinite number of times, though at very infrequent intervals. The strong law states that this almost surely won't happen.


§ 2.2.3 Central Limit Theorem (CLT)

Since $\bar{X}_n - \mu$ converges to zero, it may be difficult to study the probabilistic behaviour of the deviation $\bar{X}_n - \mu$ when n is large. To manifest the convergence of $\bar{X}_n - \mu$, we consider the limiting distribution of $\dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$.

Central Limit Theorem

Let $\bar{X}_n$ be the sample mean of a sequence of independent and identically distributed random variables $X_1, X_2, \dots, X_n$ from a distribution with finite mean $E(X_i) = \mu$ and variance $\operatorname{Var}(X_i) = \sigma^2$. Then the central limit theorem states that

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \overset{L}{\to} N(0,1),$$

i.e.

$$\lim_{n \to \infty} P\!\left( \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le x \right) = \Phi(x) \quad \text{for all } -\infty < x < \infty.$$

Proof

Let $Y_i = \dfrac{X_i - \mu}{\sigma}$ for $i = 1, 2, \dots$ Then $E(Y_i) = 0$ and $\operatorname{Var}(Y_i) = 1$. Assume that the moment generating function of $Y_i$ exists and is denoted by $M_Y(t)$. Consider the Taylor expansion of $M_Y(t)$:

$$M_Y(t) = M_Y(0) + M'_Y(0)\,t + \frac{1}{2} M''_Y(\xi)\,t^2 = 1 + \frac{1}{2} M''_Y(\xi)\,t^2 \quad \text{for some } 0 < \xi < t.$$

Let $Z_n = \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \dfrac{1}{\sqrt{n}} \sum_{i=1}^n Y_i$. The moment generating function of $Z_n$ is

$$M_{Z_n}(t) = \left[ M_Y\!\left( \frac{t}{\sqrt{n}} \right) \right]^n = \left[ 1 + \frac{t^2}{2n} M''_Y(\xi) \right]^n \quad \text{for some } 0 < \xi < \frac{t}{\sqrt{n}}.$$


When $n \to \infty$, $\xi \to 0$ and hence $M''_Y(\xi) \to M''_Y(0) = E(Y^2) = 1$. Therefore

$$\lim_{n \to \infty} M_{Z_n}(t) = \lim_{n \to \infty} \left[ 1 + \frac{t^2}{2n} M''_Y(\xi) \right]^n = e^{t^2/2},$$

which is the moment generating function of $N(0,1)$.

Remarks

1. The key to the above proof is the following lemma, which we state without proof.

Lemma

Let $Z_1, Z_2, \dots$ be a sequence of random variables having moment generating functions $M_{Z_n}(t)$, $n = 1, 2, \dots$, and let $Z$ be a random variable having moment generating function $M_Z(t)$. If $\lim_{n\to\infty} M_{Z_n}(t) = M_Z(t)$ for all t, then $Z_n \overset{L}{\to} Z$.

2. A more general proof of the CLT uses the so-called characteristic function (which always exists) and does not require the existence of the moment generating function.

3. The CLT can be extended to independent but non-identically distributed random variables by the Lindeberg-Feller Theorem. See the book "A Course in Probability Theory" for details.

4. The CLT is one of the most startling theorems in statistics. It forms the basis of other important theorems and provides us with useful approximations for large-sample statistical analysis.

5. Note that $\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ is in fact the standard score of $\bar{X}_n$. The CLT suggests that we can approximate the sampling distribution of $\bar{X}_n$ by $N\!\left( \mu, \dfrac{\sigma^2}{n} \right)$. This is called the normal approximation.
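
Remark 5 can be illustrated by simulation. The sketch below (an illustrative setup, not from the notes) standardizes sample means from a skewed population, Exponential(1) with $\mu = \sigma = 1$, and shows the distance to $N(0,1)$ shrinking as n grows.

```python
# CLT demonstration: standardized means of a skewed population.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, sigma = 1.0, 1.0                # Exponential(1) has mu = sigma = 1
for n in (2, 10, 100):
    xbar = rng.exponential(size=(100_000, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))
    print(n, stats.kstest(z, stats.norm.cdf).statistic)  # shrinks with n
```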


Example 2.11

A random sample of size $n = 81$ is taken from a population with mean $\mu = 128$ and standard deviation $\sigma = 6.3$. Suppose that we are interested in the probability that $\bar{X}$ falls between 126.6 and 129.4.

Since we don't know the distribution of the population, there is no way to determine the exact probability. Using Chebyshev's inequality, we can obtain a lower bound:

$$P(126.6 < \bar{X} < 129.4) = P(|\bar{X} - 128| < 1.4) = P\!\left( |\bar{X} - \mu| < \frac{1.4}{6.3/\sqrt{81}} \sqrt{\operatorname{Var}(\bar{X})} \right)$$
$$= P\!\left( |\bar{X} - \mu| < 2\sqrt{\operatorname{Var}(\bar{X})} \right) \ge 1 - \frac{1}{2^2} = 0.75$$

To have a more precise assessment of the probability, we can apply the normal approximation, as the sample size is large:

$$P(126.6 < \bar{X} < 129.4) = P\!\left( \frac{126.6 - 128}{6.3/\sqrt{81}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{129.4 - 128}{6.3/\sqrt{81}} \right)$$
$$= P\!\left( -2 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2 \right) \approx \Phi(2) - \Phi(-2) = 0.9544$$

Example 2.12 (Normal approximation to the binomial)

If $Y_i \overset{iid}{\sim} b(1, p)$, then $E(Y_i) = p$ and $\operatorname{Var}(Y_i) = p(1-p)$.

Let $X = \sum_{i=1}^n Y_i$; then $X \sim b(n, p)$, $E(X) = np$, $\operatorname{Var}(X) = np(1-p)$.

The sample mean of the Y's can be written as $\bar{Y}_n = X/n$.


By the CLT,

$$\frac{\sqrt{n}(\bar{Y}_n - p)}{\sqrt{p(1-p)}} = \frac{X - np}{\sqrt{np(1-p)}} = \frac{X - E(X)}{\sqrt{\operatorname{Var}(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$$

The approximation will be quite good when $n \ge 30$, $np \ge 5$ and $n(1-p) \ge 5$.

Example 2.13

$X \sim b(30, 0.25)$

$$P(6 \le X \le 9) = \sum_{x=6}^{9} \binom{30}{x} (0.25)^x (0.75)^{30-x} = 0.6008$$

Using the normal approximation,

$$np = 30(0.25) = 7.5, \qquad np(1-p) = 30(0.25)(0.75) = 5.625$$

$$P(6 \le X \le 9) \approx P\!\left( \frac{6 - 7.5}{\sqrt{5.625}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{9 - 7.5}{\sqrt{5.625}} \right) = \Phi(0.632) - \Phi(-0.632) = 0.4726$$

It would be more accurate to use

$$P(6 \le X \le 9) \approx P\!\left( \frac{5.5 - 7.5}{\sqrt{5.625}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{9.5 - 7.5}{\sqrt{5.625}} \right) = \Phi(0.843) - \Phi(-0.843) = 0.6006$$

The 0.5 added to or subtracted from the bounds in the probability statement is called the continuity correction. In general, when a continuous distribution is used to approximate a discrete distribution, it is better to use

$$P(X \le c + 0.5) \text{ instead of } P(X \le c); \qquad P(X \ge c - 0.5) \text{ instead of } P(X \ge c);$$
$$P(X < c - 0.5) \text{ instead of } P(X < c); \qquad P(X > c + 0.5) \text{ instead of } P(X > c)$$

where c is an integer.
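
Example 2.13 can be recomputed directly. The following sketch compares the exact binomial probability with the normal approximations with and without continuity correction.

```python
# Exact binomial probability versus normal approximations.
import numpy as np
from scipy import stats

n, p = 30, 0.25
exact = stats.binom.cdf(9, n, p) - stats.binom.cdf(5, n, p)  # P(6 <= X <= 9)

mu, sd = n * p, np.sqrt(n * p * (1 - p))
plain = stats.norm.cdf((9 - mu) / sd) - stats.norm.cdf((6 - mu) / sd)
corrected = stats.norm.cdf((9.5 - mu) / sd) - stats.norm.cdf((5.5 - mu) / sd)
print(exact, plain, corrected)      # ~0.6008, ~0.4726, ~0.6006
```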

Example 2.14 (Normal approximation to the Poisson)

If $Y_i \overset{iid}{\sim} \operatorname{Poisson}(\lambda)$, then $E(Y_i) = \lambda$, $\operatorname{Var}(Y_i) = \lambda$.

Let $X = \sum_{i=1}^n Y_i$; then $X \sim \operatorname{Poisson}(n\lambda)$, $E(X) = n\lambda$, $\operatorname{Var}(X) = n\lambda$.

The sample mean of the Y's can be written as $\bar{Y}_n = X/n$.

By the CLT,

$$\frac{\sqrt{n}(\bar{Y}_n - \lambda)}{\sqrt{\lambda}} = \frac{X - n\lambda}{\sqrt{n\lambda}} = \frac{X - E(X)}{\sqrt{\operatorname{Var}(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$$

Therefore for $X \sim \operatorname{Poisson}(\lambda)$,

$$\frac{X - \lambda}{\sqrt{\lambda}} \overset{L}{\to} N(0,1) \quad \text{as } \lambda \to \infty.$$

The approximation will be quite adequate when $\lambda > 10$.

Example 2.15

$X \sim \operatorname{Poisson}(10)$

$$P(11 < X \le 21) = \sum_{x=12}^{21} \frac{e^{-10}\, 10^x}{x!} = 0.3025$$

Using the normal approximation,

$$P(11 < X \le 21) \approx P\!\left( \frac{11 - 10}{\sqrt{10}} < \frac{X - \lambda}{\sqrt{\lambda}} \le \frac{21 - 10}{\sqrt{10}} \right) = \Phi(3.48) - \Phi(0.32) = 0.3745$$

If we apply the continuity correction,

$$P(11 < X \le 21) \approx P\!\left( \frac{11.5 - 10}{\sqrt{10}} < \frac{X - \lambda}{\sqrt{\lambda}} \le \frac{21.5 - 10}{\sqrt{10}} \right) = \Phi(3.64) - \Phi(0.47) = 0.3175,$$

which is more accurate.



Example 2.16 (Normal approximation to the Gamma / Chi-square)

If $Y_i \overset{iid}{\sim} \operatorname{Exponential}(\lambda)$, then $E(Y_i) = \dfrac{1}{\lambda}$, $\operatorname{Var}(Y_i) = \dfrac{1}{\lambda^2}$.

Let $X = \sum_{i=1}^n Y_i$; then $X \sim \operatorname{Gamma}(n, \lambda)$, $E(X) = \dfrac{n}{\lambda}$, $\operatorname{Var}(X) = \dfrac{n}{\lambda^2}$.

The sample mean of the Y's can be written as $\bar{Y}_n = X/n$.

By the CLT,

$$\frac{\sqrt{n}(\bar{Y}_n - 1/\lambda)}{\sqrt{1/\lambda^2}} = \frac{X - n/\lambda}{\sqrt{n/\lambda^2}} = \frac{X - E(X)}{\sqrt{\operatorname{Var}(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$$

Therefore for $X \sim \operatorname{Gamma}(\alpha, \lambda)$,

$$\frac{X - \alpha/\lambda}{\sqrt{\alpha/\lambda^2}} \overset{L}{\to} N(0,1) \quad \text{as } \alpha \to \infty.$$

In particular, for $X \sim \chi^2_r$,

$$\frac{X - r}{\sqrt{2r}} \overset{L}{\to} N(0,1) \quad \text{as } r \to \infty.$$

Example 2.17

$X \sim \chi^2_{80}$

From the Chi-square distribution table, $P(64.28 < X < 101.9) = 0.95 - 0.1 = 0.85$.

Using the normal approximation,

$$P(64.28 < X < 101.9) \approx P\!\left( \frac{64.28 - 80}{\sqrt{160}} < \frac{X - r}{\sqrt{2r}} < \frac{101.9 - 80}{\sqrt{160}} \right) = \Phi(1.7313) - \Phi(-1.2428) = 0.9583 - 0.1070 = 0.8513$$


Remarks

Note that there is no single magic sample size that guarantees that sampling distributions will be approximately normal. If a population is fairly symmetric, small sample sizes usually suffice. For strongly skewed populations (e.g., Chi-square), n may need to be quite large for the sampling distribution to be approximately normal. For many distributions that arise in practice, a relatively small sample size such as 30 is sufficiently large for the normal approximation to hold. Do know, however, that there are exceptions.

The following diagram shows the relationships among some common families of distributions; the limiting distributions are derived based on the central limit theorem.

[Diagram: relationships among common families of distributions and their limiting distributions.]


§ 2.2.4 Sampling Distribution of the Sample Proportion

Suppose we draw a sample of size n from a very large population. Assume that a proportion p of the objects in the population have a certain characteristic (e.g. support the president, carry a certain virus, have an annual salary of more than $400,000, etc.). We may be interested in making inference about the population proportion p.

Since the population is large, we can regard the n drawn sample units as n independent Bernoulli trials, each with success probability p (success means the drawn object has the characteristic). Let X be the number of objects in the sample having the characteristic; then $X \sim b(n, p)$.

The inference about the population proportion p (success probability) is often based on the sample proportion (sample success rate):

$$\text{Sample proportion:} \quad \hat{p} = \frac{\text{no. of successes in the sample}}{\text{sample size}} = \frac{X}{n}$$

$$E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p, \qquad \operatorname{Var}(\hat{p}) = \frac{\operatorname{Var}(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$

In the case when p is unknown, we can estimate it by $\hat{p}$ and the standard error will be calculated as

$$se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

As can be seen from Example 2.12, the sample proportion $\hat{p} = X/n$ can be regarded as the sample mean of a random sample from the Bernoulli distribution.

From the law of large numbers, we have $\hat{p} \overset{P}{\to} p$.

From the CLT, we have

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{L}{\to} N(0,1).$$


The sampling distribution of the sample proportion is therefore approximated as:

$$\hat{p} \sim N\!\left( p, \frac{p(1-p)}{n} \right) \quad \text{(approximately)}$$

Example 2.18

With the rising costs of a college education, most students depend on their parents or family for monetary support during their college years. The results of a freshman survey last year indicate that 86% of freshmen in the survey received financial aid from parents or family. Suppose that we were to survey the current freshman class by selecting a random sample of $n = 400$ freshmen.

If the percentage of freshmen receiving financial aid from parents or family remains unchanged this year, i.e. $p = 0.86$, then the sample proportion $\hat{p}$ will be approximately normally distributed:

$$\hat{p} \sim N\!\left( 0.86, \frac{0.86 \times 0.14}{400} \right) \quad \text{(approximately)}$$

The probability that the sample proportion will be greater than 90% can be calculated as

$$P(\hat{p} > 0.9) = 1 - \Phi\!\left( \frac{0.9 - 0.86}{\sqrt{0.86 \times 0.14 / 400}} \right) = 1 - \Phi(2.306) \approx 0.01.$$

It would be very unlikely to obtain a sample with $\hat{p} > 0.9$. Therefore, if we observed the data and found that more than 90% of the freshmen in the sample were receiving financial aid from their families, then we may conclude that the assumption of an unchanged percentage ($p = 0.86$) is incorrect, i.e. the population proportion this year should have increased.
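
As a sketch, the normal approximation used here can be compared with the exact binomial tail probability, since $P(\hat{p} > 0.9) = P(X > 360)$ for $X \sim b(400, 0.86)$.

```python
# Normal approximation versus exact binomial tail for Example 2.18.
import numpy as np
from scipy import stats

n, p = 400, 0.86
se = np.sqrt(p * (1 - p) / n)
approx = stats.norm.sf((0.9 - p) / se)       # ~0.0106
exact = stats.binom.sf(360, n, p)            # P(X > 360) = P(p_hat > 0.9)
print(approx, exact)
```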


§ 2.2.5 Further Results About Convergence

The following theorem gives some results which are often useful in deriving asymptotic distributions.

Slutzky's Theorem

If $X_n \overset{L}{\to} X$ and $Y_n \overset{P}{\to} b$, and $g(x, y)$ is continuous at $(x, b)$ for all x in the range of X, then

$$g(X_n, Y_n) \overset{L}{\to} g(X, b).$$

The proof is beyond the scope of this course. Some useful results from this theorem are given below.

Corollaries

1. If $X_n \overset{P}{\to} a$ and $g(x)$ is continuous at $x = a$, then $g(X_n) \overset{P}{\to} g(a)$.

2. If $X_n \overset{L}{\to} X$ and $g(x)$ is continuous for all x in the range of X, then $g(X_n) \overset{L}{\to} g(X)$.

3. If $X_n \overset{P}{\to} a$ and $Y_n \overset{P}{\to} b$, and $g(x, y)$ is continuous at $(x, y) = (a, b)$, then $g(X_n, Y_n) \overset{P}{\to} g(a, b)$.

Example 2.19

Suppose $X_1, X_2, \dots, X_n$ is a random sample drawn from a population with finite second moment. Then by the law of large numbers, we have

$$\bar{X} \overset{P}{\to} \mu, \qquad \frac{1}{n} \sum_{i=1}^n X_i^2 \overset{P}{\to} E(X^2),$$

and from Corollary 3,

$$\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n} \sum_{i=1}^n X_i^2 - \bar{X}^2 \overset{P}{\to} E(X^2) - \mu^2 = \sigma^2.$$

Since $S^2 = \dfrac{n}{n-1} \cdot \dfrac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2$ and $\dfrac{n}{n-1} \overset{P}{\to} 1$, we have

$$S^2 \overset{P}{\to} \sigma^2,$$

i.e. the sample variance converges in probability to the population variance.

Moreover, by the CLT, we have $\dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \overset{L}{\to} N(0,1)$. Consider

$$\frac{\sqrt{n}(\bar{X} - \mu)}{S} = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \cdot \frac{\sigma}{S} \quad \text{and} \quad \frac{\sigma}{S} \overset{P}{\to} 1.$$

Therefore, by Slutzky's theorem, we have the following useful result for making inference about the population mean with large samples, in cases when $\sigma$ is unknown:

$$\frac{\sqrt{n}(\bar{X} - \mu)}{S} \overset{L}{\to} N(0,1).$$

Example 2.20

From Section 2.2.4, the sampling distribution of the sample proportion can be approximated as

$$\hat{p} \sim N\!\left( p, \frac{p(1-p)}{n} \right) \quad \text{(approximately)}.$$

This result, however, may not be practically useful as the population proportion p is usually unknown.

Since

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{L}{\to} N(0,1) \quad \text{and} \quad \hat{p} \overset{P}{\to} p \;\Rightarrow\; \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \Big/ \sqrt{\frac{p(1-p)}{n}} \overset{P}{\to} 1,$$

the following asymptotic distribution of the sample proportion results from Slutzky's theorem:

$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \overset{L}{\to} N(0,1),$$

which is more useful for making inference about p from $\hat{p}$.
