Stat2602 Probability and Statistics II Fall 2014-2015

Chapter II Sampling Distributions and Large Sample Theories

§ 2.1 Sampling Distributions

In mathematical statistics, the formal definition of a random sample is given as follows.

Definition

Let $X_1, X_2, \dots, X_n$ be independent random variables that have the same marginal distribution with pdf $f(x)$. Then $X_1, X_2, \dots, X_n$ are said to be independently and identically distributed (iid) with common pdf $f(x)$. We call this set of random variables $\{X_1, X_2, \dots, X_n\}$ a random sample from $f(x)$. For a random sample, all the random variables have common mean and variance, i.e.

$$E(X_i) = \mu, \qquad \operatorname{Var}(X_i) = \sigma^2.$$

Based on a random sample, we usually compute summary statistics such as the sample mean and sample variance. The probabilistic behaviours of these summary statistics are called sampling distributions.

Sample Mean: $\bar{X} = \dfrac{1}{n} \sum_{i=1}^{n} X_i$

$$E(\bar{X}) = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

The standard deviation of the sampling distribution of $\bar{X}$ is called the standard error of the sample mean and can be computed by

$$se(\bar{X}) = \frac{\sigma}{\sqrt{n}}.$$

Note that the standard error gets smaller as the sample size increases. This reflects the fact that the mean from a large sample is more likely to be close to $\mu$ than the mean from a small sample. The standard error may be interpreted as a typical distance between the sample mean and the population mean.

In practice the population standard deviation $\sigma$ is unknown. It may be estimated by the sample standard deviation $S$, and the standard error of the sample mean may be computed by

$$se(\bar{X}) = \frac{S}{\sqrt{n}}.$$

The importance of the standard error in statistical inference procedures will be evident in later chapters. At this point, however, it can be noted that the magnitude of $se(\bar{X})$ is helpful in determining the precision to which the mean and some measures of variability may be reported.

Sample Variance: $S^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

$$E(S^2) = \sigma^2$$

Moment generating function of the sample mean

Let $X_1, X_2, \dots, X_n$ be independently and identically distributed with common moment generating function $M_X(t)$. Then the moment generating function of the sample mean can be evaluated by

$$M_{\bar{X}}(t) = \left[ M_X\!\left( \frac{t}{n} \right) \right]^n.$$

Example 2.1

Let $X_1, X_2, \dots, X_n \overset{iid}{\sim} \chi^2_r$ be a random sample from the Chi-square distribution. Then the moment generating function of the sample mean is given by

$$M_{\bar{X}}(t) = \left[ M_X\!\left( \frac{t}{n} \right) \right]^n = \left( \frac{1}{1 - 2t/n} \right)^{nr/2} = \left( \frac{n/2}{n/2 - t} \right)^{nr/2} \quad \text{for } t < \frac{n}{2},$$

which is the mgf of $\operatorname{Gamma}\!\left( \frac{nr}{2}, \frac{n}{2} \right)$. Therefore $\bar{X} \sim \operatorname{Gamma}\!\left( \frac{nr}{2}, \frac{n}{2} \right)$.

§ 2.1.2 Sampling Distributions from a Normal Random Sample

In practical statistical analysis, a common assumption is that the population closely resembles a normal distribution. The data analysis is then usually based on the sampling distributions of the sample mean $\bar{X}$ and sample variance $S^2$ obtained from a random sample $X_1, X_2, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, which are summarized below:

1. Sampling distribution of the sample mean

$$\bar{X} \sim N\!\left( \mu, \frac{\sigma^2}{n} \right)$$

2. Sampling distribution of the sample variance

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

3. The sample mean and sample variance are statistically independent.

4. Standardizations of the sample mean

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
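
These four facts can be illustrated by simulation. The following sketch (the parameters and seed are illustrative, not from the notes) draws many normal samples and checks the stated distributions of $\bar{X}$, $(n-1)S^2/\sigma^2$ and $T$, as well as the zero correlation implied by the independence of $\bar{X}$ and $S^2$.

```python
# Simulation sketch of facts 1-4 for a normal random sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = 3.2, 0.6, 20
x = rng.normal(mu, sigma, size=(200_000, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

# Facts 1 and 2: compare with the stated sampling distributions.
print(stats.kstest(xbar, stats.norm(mu, sigma / np.sqrt(n)).cdf))
print(stats.kstest((n - 1) * s2 / sigma**2, stats.chi2(n - 1).cdf))

# Fact 3: independence implies zero correlation between Xbar and S^2.
print(np.corrcoef(xbar, s2)[0, 1])        # near 0

# Fact 4: T should follow the t-distribution with n-1 df.
t = (xbar - mu) / np.sqrt(s2 / n)
print(stats.kstest(t, stats.t(n - 1).cdf))
```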

Example 2.2

A company manufactures semiconductor components with thicknesses (in 0.001 inch) following the normal distribution with $\mu = 3.2$ and $\sigma = 0.60$. To keep a check on the process, the QC department periodically takes random samples of size $n = 20$, and the process is regarded as "out of control" whenever the sample mean $\bar{X}$ is found to be smaller than 2.85 or larger than 3.55, or the sample standard deviation $S$ is larger than a threshold value of 0.828. What is the chance that a random sample will falsely signal that the process is "out of control", even when the mean $\mu$ and standard deviation $\sigma$ remain unchanged?

Random sample: $X_1, X_2, \dots, X_{20} \overset{iid}{\sim} N(3.2, 0.60^2)$


P  X  2.85 or X  3.55  P  X    0.35


 X  0.35 
 P   

  n 0. 60 20 
 P  Z  2.61 ( Z ~ N 0, 1 )
 21   2.61
 0.009 (from standard normal table)

 n  1S 2 19  0.8282 
P S  0.828  P   
  2
0 . 60 2

 P W  36.18 ( W ~ 192 )
 0.01 (from Chi-square table)

P alarm " out of control"  P  X  3.2  0.35 or S  0.828


 1  P  X  3.2  0.35 and S  0.828
 1  P  X  3.2  0.35P S  0.828
( X and S 2 are independent)
 1  1  0.009 1  0.01
 0.0189

§ 2.1.3 Snedecor's F-Distribution

The Snedecor's F-distribution (or, in short, the F-distribution) is a continuous distribution that was created by George W. Snedecor and named in honour of Sir Ronald A. Fisher, who is often regarded as the "father of statistics". It is frequently used in statistical analysis methods that compare variances, e.g. in the analysis of variance (ANOVA) for analysing experimental data.

Similar to the Student's t-distribution, the F-distribution is motivated by the problem of seeking the distribution of the following ratio:

$$X = \frac{W_1 / r_1}{W_2 / r_2}$$

where $W_1 \sim \chi^2_{r_1}$ and $W_2 \sim \chi^2_{r_2}$ are two independent Chi-square random variables.

The derivation of the pdf of $X$ is similar to that for the Student's t-distribution and can be found in the supplementary notes.


Definition

Let $W_1$ and $W_2$ be two random variables independently distributed as Chi-square with degrees of freedom $r_1$ and $r_2$ respectively. The ratio $X = \dfrac{W_1 / r_1}{W_2 / r_2}$ is said to have an F-distribution with numerator degrees of freedom $r_1$ and denominator degrees of freedom $r_2$. It is denoted as $X \sim F(r_1, r_2)$.

r r 
 1 2  r1

r r 1 2

2   r1  2 1  r  2
f x   
2 r 1

Probability density function   x 1  1 x  , x0


 r1   r2   r2   r 
   2

2  2

Moment generating function Not defined

 r2
 for r  2
Mean and variance    r2  2
 undefined
 otherwise

 2r22 r1  r2  2 
 for r  4
 2   r1 r2  2 r2  4
2

 undefined
 otherwise

Unlike the normal and t-distributions, the F-distribution is asymmetric and skewed to the right.


[Figure: pdf of the $F(r_1, r_2)$ distribution, a right-skewed density over $(0, \infty)$ with upper-tail area $\alpha$ above the critical value $F_{r_1, r_2, \alpha}$.]

The notation $F_{r_1, r_2, \alpha}$ represents the $100(1-\alpha)$-th percentile of the $F(r_1, r_2)$ distribution.

The F-distribution table

The F-distribution table gives the critical value under an F-distribution curve corresponding to a given tail area. Each table entry of three numbers represents an F-distribution with specific degrees of freedom. The three numbers are the critical values corresponding to the commonly used tail areas 0.05, 0.01 and 0.001. For example, for $r_1 = 3$, $r_2 = 5$, we have $F_{3,5,0.05} = 5.41$, $F_{3,5,0.01} = 12.06$, $F_{3,5,0.001} = 33.20$.


Theorem

Let $X_1, X_2, \dots, X_m$ be i.i.d. random variables distributed as $N(\mu_x, \sigma_x^2)$ and $Y_1, Y_2, \dots, Y_n$ be i.i.d. random variables distributed as $N(\mu_y, \sigma_y^2)$. Assume that the two random samples are independent. Denote by $S_x^2$ and $S_y^2$ the sample variances from the X-sample and the Y-sample respectively. The sampling distribution of

$$F = \frac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2}$$

is the F-distribution with degrees of freedom $m-1$ and $n-1$, i.e. $F \sim F(m-1, n-1)$.

Proof

First of all, from the two samples, we have

$$W_1 = \frac{(m-1)S_x^2}{\sigma_x^2} \sim \chi^2_{m-1} \quad \text{independent of} \quad W_2 = \frac{(n-1)S_y^2}{\sigma_y^2} \sim \chi^2_{n-1}.$$

Then from the definition of the F-distribution,

$$F = \frac{W_1 / (m-1)}{W_2 / (n-1)} = \frac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2} \sim F(m-1, n-1).$$

Example 2.3

Suppose there are two normal populations with unknown variances $\sigma_x^2$ and $\sigma_y^2$. Johnnie claims that the two populations have the same variance, i.e. $\sigma_x^2 = \sigma_y^2$. To make inference about his claim, he draws random samples of sizes $m = n = 8$ independently from each of the two populations. If his claim is correct, what is the chance that the ratio of the sample variances, $S_x^2 / S_y^2$, would exceed 7?

If Johnnie's claim is correct, $\sigma_x^2 / \sigma_y^2 = 1$ and we have

$$F = \frac{S_x^2}{S_y^2} \sim F(m-1, n-1).$$


From the F-distribution table, with degrees of freedom $m-1 = 7$ and $n-1 = 7$, $F_{7,7,0.01} = 6.99$. Therefore

$$P\!\left( \frac{S_x^2}{S_y^2} > 7 \right) < 0.01.$$

So if the sample variances of his samples are calculated as $S_x^2 = 16.26$ and $S_y^2 = 2.32$, the ratio is equal to $16.26 / 2.32 = 7.009 > 7$. This would happen with less than 1% chance if his claim were correct. Hence, based on the observed data, we may draw the conclusion that his claim is not likely to be correct, i.e. it is more reasonable to believe that the two populations have different variances.

§ 2.1.4 Order Statistics

Besides the sample mean and sample variance, some other commonly used sample statistics, such as the maximum, minimum, median, quantiles, etc., are evaluated from the sorted values of the sample. The items of the random sample arranged from the smallest to the largest are called the order statistics. Order statistics are important in nonparametric inference, which was developed to deal with violations of the normal population assumption, or with an unknown population distribution.

Definition

Suppose that $X_1, X_2, \dots, X_n$ is a random sample of size n from a distribution. We let the random variables

$$X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$$

denote the order statistics of this sample, i.e. $X_{(1)}$ is the minimum, $X_{(n)}$ is the maximum, and $X_{(k)}$ is the k-th smallest of $X_1, X_2, \dots, X_n$.

Note: Don't confuse $X_{(k)}$ with $X_k$. The notation $X_{(k)}$ represents a random variable transformed from $X_1, X_2, \dots, X_n$ and usually has a different distribution from the marginal distribution of $X_k$, as illustrated by the following example.


Example 2.4

Suppose that $X_1, X_2 \overset{iid}{\sim} U(0,1)$. Then the order statistic $X_{(2)}$ is given by

$$X_{(2)} = \begin{cases} X_2 & \text{if } X_1 \le X_2, \\ X_1 & \text{if } X_1 > X_2. \end{cases}$$

The cdf of $X_{(2)}$ is given by

$$\begin{aligned}
G(y) &= P(X_{(2)} \le y) \\
&= P(X_2 \le y \text{ and } X_1 \le X_2) + P(X_1 \le y \text{ and } X_1 > X_2) \\
&= P(X_1 \le X_2 \le y) + P(X_2 < X_1 \le y) \\
&= \int_0^y \!\! \int_0^{x_2} 1 \, dx_1 \, dx_2 + \int_0^y \!\! \int_0^{x_1} 1 \, dx_2 \, dx_1 \\
&= \int_0^y x_2 \, dx_2 + \int_0^y x_1 \, dx_1 \\
&= y^2 \quad \text{for } 0 \le y \le 1.
\end{aligned}$$

Hence the pdf of $X_{(2)}$ is obtained as

$$g(y) = G'(y) = 2y, \quad 0 \le y \le 1.$$

Therefore $X_{(2)}$ does not follow the uniform distribution.

Remark

In practice, we may observe ties when we sort the sample, especially when the measured variable is discrete. The presence of ties would complicate the distribution theory of order statistics. Therefore, in the following discussion of order statistics, we will assume that the random sample arises from a continuous distribution, so that the probability of observing ties is zero. In this case, general formulae for the probability density functions of the order statistics can be easily derived.


Theorem

Suppose that $X_1, X_2, \dots, X_n$ is a random sample of size n from a continuous distribution with cumulative distribution function $F(x)$ and probability density function $f(x)$, for $a < x < b$. Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be the order statistics of the sample. Then for $k = 1, 2, \dots, n$, the probability density function of the k-th order statistic $X_{(k)}$ is given by

$$g_k(y) = \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y), \quad a < y < b.$$

In particular, the probability density function of $X_{(n)}$ (the sample maximum) is

$$g_n(y) = n [F(y)]^{n-1} f(y), \quad a < y < b;$$

and the probability density function of $X_{(1)}$ (the sample minimum) is

$$g_1(y) = n [1 - F(y)]^{n-1} f(y), \quad a < y < b.$$

An informal justification of the formulae is given below. Those who are interested in the formal proof may refer to the supplementary notes.

Suppose that the sample space $(a, b)$ is divided into three intervals: one from $a$ to $y$, a second from $y$ to $y+h$ (where $h$ is a very small positive number), and the third from $y+h$ to $b$.

[Diagram: $k-1$ of the sample fall in $(a, y)$; $X_{(k)}$ falls in $(y, y+h)$; $n-k$ of the sample fall in $(y+h, b)$.]

Each of the sample items $X_1, X_2, \dots, X_n$ independently falls into the three intervals with probabilities $F(y)$, $F(y+h) - F(y)$, and $1 - F(y+h)$ respectively. For the value of $X_{(k)}$ to fall into the second interval $(y, y+h)$, there should be $k-1$ of $X_1, X_2, \dots, X_n$ falling into the first interval, 1 falling into the second interval, and $n-k$ falling into the third interval.


According to the multinomial distribution, the corresponding probability is

$$P(y < X_{(k)} \le y + h) \approx \frac{n!}{(k-1)!\,1!\,(n-k)!} [F(y)]^{k-1} [F(y+h) - F(y)] [1 - F(y+h)]^{n-k}.$$

Dividing both sides by h and letting $h \to 0$, we have

$$\begin{aligned}
g_k(y) &= \lim_{h \to 0} \frac{P(y < X_{(k)} \le y + h)}{h} \\
&= \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} \lim_{h \to 0} \left\{ \frac{F(y+h) - F(y)}{h} \,[1 - F(y+h)]^{n-k} \right\} \\
&= \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y).
\end{aligned}$$

Example 2.5

Consider a random sample from the exponential distribution:

$$X_1, X_2, \dots, X_n \overset{iid}{\sim} \operatorname{Exponential}(\lambda)$$

The pdf of the sample maximum $X_{(n)}$ is

$$g_n(y) = n [F(y)]^{n-1} f(y) = n (1 - e^{-\lambda y})^{n-1} \lambda e^{-\lambda y} = n\lambda e^{-\lambda y} (1 - e^{-\lambda y})^{n-1}, \quad y > 0.$$

The pdf of the sample minimum $X_{(1)}$ is

$$g_1(y) = n [1 - F(y)]^{n-1} f(y) = n (e^{-\lambda y})^{n-1} \lambda e^{-\lambda y} = n\lambda e^{-(n\lambda) y}, \quad y > 0.$$

Hence $X_{(1)} \sim \operatorname{Exponential}(n\lambda)$, i.e. the sample minimum of an exponential random sample is also distributed as exponential, with the parameter magnified n times.
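
A quick simulation sketch of this result (the rate $\lambda$, the sample size and the seed are illustrative choices). Note that NumPy and SciPy use the scale parameterization, so Exponential($\lambda$) corresponds to scale $1/\lambda$.

```python
# The minimum of n iid Exponential(lam) draws should be Exponential(n*lam).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, n = 0.5, 10
mins = rng.exponential(scale=1 / lam, size=(100_000, n)).min(axis=1)

print(stats.kstest(mins, stats.expon(scale=1 / (n * lam)).cdf))
print(mins.mean(), 1 / (n * lam))   # both near 0.2
```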


Example 2.6

Consider a random sample from the uniform distribution on $(0,1)$:

$$X_1, X_2, \dots, X_n \overset{iid}{\sim} U(0,1)$$

The pdf of the k-th order statistic is given by

$$\begin{aligned}
g_k(y) &= \frac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y) \\
&= \frac{n!}{(k-1)!\,(n-k)!}\, y^{k-1} (1-y)^{n-k}, \quad 0 < y < 1 \\
&= \frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)}\, y^{k-1} (1-y)^{n-k}, \quad 0 < y < 1
\end{aligned}$$

which is the pdf of the beta distribution with parameters $\alpha = k$ and $\beta = n - k + 1$. Therefore

$$X_{(k)} \sim \operatorname{Beta}(k,\, n-k+1).$$

Remarks

1. Unlike the random variables $X_1, X_2, \dots, X_n$ in the random sample, the order statistics $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ are usually dependent.

2. Using arguments similar to the informal justification above, the joint pdf of $X_{(j)}$ and $X_{(k)}$ ($j < k$) can be easily obtained as

$$g_{j,k}(x, y) = \frac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!} [F(x)]^{j-1} [F(y) - F(x)]^{k-j-1} [1 - F(y)]^{n-k} f(x) f(y)$$

where $x < y$.

3. The joint pdf of $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ is

$$g_{1,2,\dots,n}(y_1, y_2, \dots, y_n) = \begin{cases} n!\, f(y_1) f(y_2) \cdots f(y_n) & \text{if } y_1 < y_2 < \dots < y_n, \\ 0 & \text{otherwise.} \end{cases}$$


§ 2.2 Large Sample Theory

The probabilistic behaviour of the sample mean when the sample size n is large (say, tends to infinity) is called the limiting distribution of the sample mean. The law of large numbers (LLN) and the central limit theorem (CLT) are two of the most important theorems in statistics concerning the limiting distribution of the sample mean. These two theorems establish the "nice" properties of the sample mean and justify its advantages.

Before proceeding, we need to define what 'convergence' means in the context of random variables.

§ 2.2.1 Modes of Convergence

Let $X_1, X_2, \dots$ be a sequence of random variables (not necessarily independent) and let $X$ be another random variable. Let $F_{X_n}(x)$ be the cumulative distribution function of $X_n$, and $F_X(x)$ be the cumulative distribution function of $X$.

Converges in Distribution / Converges in Law / Weak Convergence

$X_n$ is said to converge in distribution to $X$ if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

for all points x at which $F_X(x)$ is continuous. It is denoted as $X_n \overset{L}{\to} X$.

Example 2.7

Suppose that $U_1, U_2, \dots \overset{iid}{\sim} U(0,1)$. Define $X_n$ as the maximum of $U_1, U_2, \dots, U_n$. Then the cumulative distribution function of $X_n$ is given by $F_{X_n}(x) = 0$ for $x < 0$, $F_{X_n}(x) = 1$ for $x > 1$, and

$$F_{X_n}(x) = P(X_n \le x) = P(U_1 \le x, U_2 \le x, \dots, U_n \le x) = P(U_1 \le x) P(U_2 \le x) \cdots P(U_n \le x) = x^n \quad \text{for } 0 \le x \le 1.$$


Therefore

$$\lim_{n \to \infty} F_{X_n}(x) = \begin{cases} 0 & \text{if } x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}$$

On the other hand, consider a random variable $X$ which is degenerate at 1, i.e. $P(X = 1) = 1$. The cumulative distribution function of $X$ is

$$F_X(x) = P(X \le x) = \begin{cases} 0 & \text{if } x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}$$

Hence $\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$ and therefore $X_n \overset{L}{\to} X$. We may also write $X_n \overset{L}{\to} 1$, as $X$ is degenerate at 1.

Now consider another random variable defined as $Y_n = n(1 - X_n)$. The cumulative distribution function of $Y_n$ is

$$F_{Y_n}(y) = P(Y_n \le y) = P(n(1 - X_n) \le y) = P\!\left( X_n \ge 1 - \frac{y}{n} \right) = 1 - F_{X_n}\!\left( 1 - \frac{y}{n} \right)$$
$$= \begin{cases} 0 & \text{if } y < 0, \\ 1 - \left( 1 - \dfrac{y}{n} \right)^n & \text{if } 0 \le y < n, \\ 1 & \text{if } y \ge n. \end{cases}$$

Therefore

$$\lim_{n \to \infty} F_{Y_n}(y) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - e^{-y} & \text{if } 0 \le y < \infty, \end{cases}$$

which is the cumulative distribution function of $\operatorname{Exponential}(1)$. Hence $Y_n = n(1 - X_n)$ converges in distribution to an exponential random variable with parameter $\lambda = 1$, i.e.

$$Y_n = n(1 - \max(U_1, U_2, \dots, U_n)) \overset{L}{\to} \operatorname{Exponential}(1).$$
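
This limit can be observed numerically: as n grows, the Kolmogorov-Smirnov distance between simulated values of $Y_n$ and the Exponential(1) cdf shrinks. A minimal sketch (the sizes and seed are illustrative):

```python
# n*(1 - max(U_1, ..., U_n)) approaches Exponential(1) in distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
for n in (5, 50, 500):
    y = n * (1 - rng.uniform(size=(100_000, n)).max(axis=1))
    print(n, stats.kstest(y, stats.expon.cdf).statistic)  # shrinks with n
```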


Converges in Probability

$X_n$ is said to converge in probability to $X$ if for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0.$$

It is denoted as $X_n \overset{P}{\to} X$.

Example 2.8

Consider the $X_n$ defined in Example 2.7. Obviously,

$$P(|X_n - 1| > \varepsilon) = 0 \quad \text{if } \varepsilon \ge 1.$$

For any $0 < \varepsilon < 1$,

$$P(|X_n - 1| > \varepsilon) = P(1 - X_n > \varepsilon) = P(X_n < 1 - \varepsilon) = F_{X_n}(1 - \varepsilon) = (1 - \varepsilon)^n.$$

Therefore for any $\varepsilon > 0$, $\lim_{n\to\infty} P(|X_n - 1| > \varepsilon) = 0$ and hence $X_n \overset{P}{\to} 1$.

Remarks

1. If $X_n$ converges in distribution to $X$, it only means that the "behaviour" of $X_n$ gets closer and closer to the "behaviour" of $X$. It does not guarantee that the observed value of $X_n$ will often be close to the observed value of $X$.

On the other hand, if $X_n$ converges in probability to $X$, it means that it becomes more and more likely that the observed value of $X_n$ is arbitrarily close to the observed value of $X$.

For example, if $X_1, X_2, \dots \overset{iid}{\sim} N(0,1)$, then $X_n \overset{L}{\to} X_1$ because they have the same distribution, but $X_n$ does not converge in probability to $X_1$, as $X_n - X_1 \sim N(0, 2)$.

2. To check convergence in distribution, nothing needs to be known about the joint distribution of $X_n$ and $X$, whereas this joint distribution must be defined to check convergence in probability.


Converges Almost Surely / Converges Almost Everywhere / Strong Convergence

$X_n$ is said to converge almost surely to $X$ if

$$P\!\left( \lim_{n \to \infty} X_n = X \right) = 1.$$

It is denoted as $X_n \overset{a.s.}{\to} X$.

To understand almost sure convergence, recall the basic definition of a random variable. A random variable is a real-valued function defined on the sample space $\Omega$, and $X_n \overset{a.s.}{\to} X$ means

$$\lim_{n \to \infty} X_n(\omega) = X(\omega)$$

for all $\omega \in E$ such that $P(E) = 1$, i.e. the convergence of the sequence of numbers $X_n(\omega)$ to $X(\omega)$ holds for almost all $\omega$.

Example 2.9

Suppose that $U \sim U(0,1)$ and define

$$X_n = \begin{cases} 1/n & \text{if } U < 1/2, \\ 1/2 & \text{if } U = 1/2, \\ 1 - 1/n & \text{if } U > 1/2. \end{cases}$$

Then for a particular value of U,

$$\lim_{n \to \infty} X_n = \begin{cases} 0 & \text{if } U < 1/2, \\ 1/2 & \text{if } U = 1/2, \\ 1 & \text{if } U > 1/2. \end{cases}$$

Consider the Bernoulli random variable defined using the same U:

$$X = \begin{cases} 0 & \text{if } U \le 1/2, \\ 1 & \text{if } U > 1/2. \end{cases}$$

Obviously $\lim_{n\to\infty} X_n = X$ if and only if $U \ne 1/2$, so

$$P\!\left( \lim_{n \to \infty} X_n = X \right) = P(U \ne 1/2) = 1$$

and hence $X_n \overset{a.s.}{\to} X$.


Remarks

1. The relationships among the above three modes of convergence are as follows:

$$X_n \overset{a.s.}{\to} X \;\Rightarrow\; X_n \overset{P}{\to} X \;\Rightarrow\; X_n \overset{L}{\to} X.$$

Note that the converses may not be true.

2. Although the definitions of convergence in probability and almost sure convergence look similar, they are different statements. To understand the difference, consider the analogous scenario of an archer who improves his skill every time he shoots. At the beginning he may miss the bullseye 1 time in 10. Later he may miss the bullseye 1 time in 100. After more practice, he misses only 1 time in 1000, and so on. We can say that his shooting converges in probability to the bullseye. However, it does not converge almost surely, because he will still miss infinitely often no matter how many times he practices. In other words, there is always imperfection in his shooting, even though it becomes increasingly less frequent.

Example 2.10

Consider a point $\omega$ uniformly drawn from $[0,1]$. Define

$$X(\omega) = \omega,$$
$$X_1(\omega) = \omega + I_{[0,1]}(\omega),$$
$$X_2(\omega) = \omega + I_{[0,1/2]}(\omega), \qquad X_3(\omega) = \omega + I_{[1/2,1]}(\omega),$$
$$X_4(\omega) = \omega + I_{[0,1/3]}(\omega), \qquad X_5(\omega) = \omega + I_{[1/3,2/3]}(\omega), \qquad X_6(\omega) = \omega + I_{[2/3,1]}(\omega),$$
$$\dots$$

Obviously, the deviation of $X_n(\omega)$ from $X(\omega)$ is an indicator variable $I_{A_n}(\omega)$, where the length of the interval $A_n$ converges to zero as $n \to \infty$. For any $\varepsilon > 0$, the probability $P(|X_n - X| > \varepsilon)$ is equal to the probability that $\omega$ falls into $A_n$, which tends to 0. Hence $X_n \overset{P}{\to} X$.

However, for every $\omega$, the value $X_n(\omega)$ alternates between the values $\omega$ and $\omega + 1$ infinitely often. There is no value of $\omega \in [0,1]$ for which $X_n(\omega)$ converges to $X(\omega)$, i.e. $X_n$ does not converge to $X$ almost surely.


§ 2.2.2 Law of Large Numbers (LLN)

Theorem

Let $X_1, X_2, \dots$ be a sequence of independently and identically distributed random variables with finite mean $E(X_i) = \mu$ and variance $\operatorname{Var}(X_i) = \sigma^2$. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ be the sample mean of the random sample. Then the weak law of large numbers (WLLN) states that $\bar{X}_n \overset{P}{\to} \mu$, i.e. for an arbitrary number $\varepsilon > 0$, we have

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0$$

or alternatively,

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \le \varepsilon) = 1.$$

Proof

$$E(\bar{X}_n) = \mu, \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$

By Chebyshev's inequality,

$$P(|\bar{X}_n - \mu| > \varepsilon) = P(|\bar{X}_n - E(\bar{X}_n)| > \varepsilon) \le \frac{\operatorname{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n \varepsilon^2}.$$

Taking the limit on both sides,

$$0 \le \lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) \le \lim_{n \to \infty} \frac{\sigma^2}{n \varepsilon^2} = 0.$$

Therefore $\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0$.
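
A short simulation sketch of the weak law (the Exponential(1) population, with $\mu = 1$, is an illustrative choice): the running sample mean settles near the population mean as n grows.

```python
# Running means of iid Exponential(1) draws converge to mu = 1.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 1_000_000):
    print(n, running_mean[n - 1])   # approaches 1
```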


Remarks

1. The weak law of large numbers states that the sample mean from a large sample has a very high probability of being arbitrarily close to the population mean, thereby promising a stable performance of the sample mean.

2. A more general version of the weak law of large numbers states that if $E(|X_i|) < \infty$, then $\bar{X}_n \overset{P}{\to} \mu$. Note that it does not require a finite population variance.

3. The strong law of large numbers (SLLN) states that $\bar{X}_n \overset{a.s.}{\to} \mu$ if and only if $E(|X_i|) < \infty$. In other words, with probability 1 we have $\lim_{n\to\infty} \bar{X}_n = \mu$. The proof of the strong law is omitted here and can be found in the classic book "A Course in Probability Theory" by Kai-Lai Chung.

4. The weak law states that for sufficiently large n, the sample mean $\bar{X}_n$ is likely to be near $\mu$, while still allowing $|\bar{X}_n - \mu| > \varepsilon$ to happen an infinite number of times, though at very infrequent intervals. The strong law states that this almost surely won't happen.


§ 2.2.3 Central Limit Theorem (CLT)

Since $\bar{X}_n - \mu$ converges to zero, it may be difficult to study the probabilistic behaviour of the deviation $\bar{X}_n - \mu$ when n is large. To manifest the convergence of $\bar{X}_n - \mu$, we consider the limiting distribution of $\dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$.

Central Limit Theorem

Let $\bar{X}_n$ be the sample mean of a sequence of independent and identically distributed random variables $X_1, X_2, \dots, X_n$ from a distribution with finite mean $E(X_i) = \mu$ and variance $\operatorname{Var}(X_i) = \sigma^2$. Then the central limit theorem states that

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \overset{L}{\to} N(0,1),$$

i.e.

$$\lim_{n \to \infty} P\!\left( \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le x \right) = \Phi(x) \quad \text{for all } -\infty < x < \infty.$$

Proof

Let $Y_i = \dfrac{X_i - \mu}{\sigma}$ for $i = 1, 2, \dots$ Then $E(Y_i) = 0$ and $\operatorname{Var}(Y_i) = 1$. Assume that the moment generating function of $Y_i$ exists and is denoted by $M_Y(t)$. Consider the Taylor expansion of $M_Y(t)$:

$$M_Y(t) = M_Y(0) + M'_Y(0)\,t + \frac{1}{2} M''_Y(\xi)\,t^2 = 1 + \frac{1}{2} M''_Y(\xi)\,t^2 \quad \text{for some } 0 < \xi < t.$$

Let $Z_n = \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \dfrac{1}{\sqrt{n}} \sum_{i=1}^n Y_i$. The moment generating function of $Z_n$ is

$$M_{Z_n}(t) = \left[ M_Y\!\left( \frac{t}{\sqrt{n}} \right) \right]^n = \left[ 1 + \frac{t^2}{2n} M''_Y(\xi) \right]^n \quad \text{for some } 0 < \xi < \frac{t}{\sqrt{n}}.$$


When $n \to \infty$, $\xi \to 0$ and hence $M''_Y(\xi) \to M''_Y(0) = E(Y^2) = 1$. Therefore

$$\lim_{n \to \infty} M_{Z_n}(t) = \lim_{n \to \infty} \left[ 1 + \frac{t^2}{2n} M''_Y(\xi) \right]^n = e^{t^2/2},$$

which is the moment generating function of $N(0,1)$.

Remarks

1. The key to the above proof is the following lemma, which we state without proof.

Lemma

Let $Z_1, Z_2, \dots$ be a sequence of random variables having moment generating functions $M_{Z_n}(t)$, $n = 1, 2, \dots$, and let $Z$ be a random variable having moment generating function $M_Z(t)$. If $\lim_{n\to\infty} M_{Z_n}(t) = M_Z(t)$ for all t, then $Z_n \overset{L}{\to} Z$.

2. A more general proof of the CLT uses the so-called characteristic function (which always exists) and does not require the existence of the moment generating function.

3. The CLT can be extended to independent but non-identically distributed random variables by the Lindeberg-Feller Theorem. See the book "A Course in Probability Theory" for details.

4. The CLT is one of the most startling theorems in statistics. It forms the basis of other important theorems and provides us with useful approximations for large-sample statistical analysis.

5. Note that $\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ is in fact the standard score of $\bar{X}_n$. The CLT suggests that we can approximate the sampling distribution of $\bar{X}_n$ by $N\!\left( \mu, \dfrac{\sigma^2}{n} \right)$. This is called the normal approximation.
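
Remark 5 can be illustrated by simulation. The sketch below (an illustrative setup, not from the notes) standardizes sample means from a skewed population, Exponential(1) with $\mu = \sigma = 1$, and shows the distance to $N(0,1)$ shrinking as n grows.

```python
# CLT demonstration: standardized means of a skewed population.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, sigma = 1.0, 1.0                # Exponential(1) has mu = sigma = 1
for n in (2, 10, 100):
    xbar = rng.exponential(size=(100_000, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))
    print(n, stats.kstest(z, stats.norm.cdf).statistic)  # shrinks with n
```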


Example 2.11

A random sample of size $n = 81$ is taken from a population with mean $\mu = 128$ and standard deviation $\sigma = 6.3$. Suppose that we are interested in the probability that $\bar{X}$ falls between 126.6 and 129.4.

Since we don't know the distribution of the population, there is no way to determine the exact probability. Using Chebyshev's inequality, we can obtain a lower bound:

$$P(126.6 < \bar{X} < 129.4) = P(|\bar{X} - 128| < 1.4) = P\!\left( |\bar{X} - \mu| < \frac{1.4}{6.3/\sqrt{81}} \sqrt{\operatorname{Var}(\bar{X})} \right)$$
$$= P\!\left( |\bar{X} - \mu| < 2\sqrt{\operatorname{Var}(\bar{X})} \right) \ge 1 - \frac{1}{2^2} = 0.75$$

To have a more precise assessment of the probability, we can apply the normal approximation, as the sample size is large:

$$P(126.6 < \bar{X} < 129.4) = P\!\left( \frac{126.6 - 128}{6.3/\sqrt{81}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{129.4 - 128}{6.3/\sqrt{81}} \right)$$
$$= P\!\left( -2 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2 \right) \approx \Phi(2) - \Phi(-2) = 0.9544$$

Example 2.12 (Normal approximation to the binomial)

If $Y_i \overset{iid}{\sim} b(1, p)$, then $E(Y_i) = p$ and $\operatorname{Var}(Y_i) = p(1-p)$.

Let $X = \sum_{i=1}^n Y_i$; then $X \sim b(n, p)$, $E(X) = np$, $\operatorname{Var}(X) = np(1-p)$.

The sample mean of the Y's can be written as $\bar{Y}_n = X/n$.


By the CLT,

$$\frac{\sqrt{n}(\bar{Y}_n - p)}{\sqrt{p(1-p)}} = \frac{X - np}{\sqrt{np(1-p)}} = \frac{X - E(X)}{\sqrt{\operatorname{Var}(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$$

The approximation will be quite good when $n \ge 30$, $np \ge 5$ and $n(1-p) \ge 5$.

Example 2.13

$X \sim b(30, 0.25)$

$$P(6 \le X \le 9) = \sum_{x=6}^{9} \binom{30}{x} (0.25)^x (0.75)^{30-x} = 0.6008$$

Using the normal approximation,

$$np = 30(0.25) = 7.5, \qquad np(1-p) = 30(0.25)(0.75) = 5.625$$

$$P(6 \le X \le 9) \approx P\!\left( \frac{6 - 7.5}{\sqrt{5.625}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{9 - 7.5}{\sqrt{5.625}} \right) = \Phi(0.632) - \Phi(-0.632) = 0.4726$$

It would be more accurate to use

$$P(6 \le X \le 9) \approx P\!\left( \frac{5.5 - 7.5}{\sqrt{5.625}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{9.5 - 7.5}{\sqrt{5.625}} \right) = \Phi(0.843) - \Phi(-0.843) = 0.6006$$

The 0.5 added to or subtracted from the bounds in the probability statement is called the continuity correction. In general, when a continuous distribution is used to approximate a discrete distribution, it is better to use

$$P(X \le c + 0.5) \text{ instead of } P(X \le c); \qquad P(X \ge c - 0.5) \text{ instead of } P(X \ge c);$$
$$P(X < c - 0.5) \text{ instead of } P(X < c); \qquad P(X > c + 0.5) \text{ instead of } P(X > c)$$

where c is an integer.
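
Example 2.13 can be recomputed directly. The following sketch compares the exact binomial probability with the normal approximations with and without continuity correction.

```python
# Exact binomial probability versus normal approximations.
import numpy as np
from scipy import stats

n, p = 30, 0.25
exact = stats.binom.cdf(9, n, p) - stats.binom.cdf(5, n, p)  # P(6 <= X <= 9)

mu, sd = n * p, np.sqrt(n * p * (1 - p))
plain = stats.norm.cdf((9 - mu) / sd) - stats.norm.cdf((6 - mu) / sd)
corrected = stats.norm.cdf((9.5 - mu) / sd) - stats.norm.cdf((5.5 - mu) / sd)
print(exact, plain, corrected)      # ~0.6008, ~0.4726, ~0.6006
```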

Example 2.14 (Normal approximation to the Poisson)

If $Y_i \overset{iid}{\sim} \operatorname{Poisson}(\lambda)$, then $E(Y_i) = \lambda$, $\operatorname{Var}(Y_i) = \lambda$.

Let $X = \sum_{i=1}^n Y_i$; then $X \sim \operatorname{Poisson}(n\lambda)$, $E(X) = n\lambda$, $\operatorname{Var}(X) = n\lambda$.

The sample mean of the Y's can be written as $\bar{Y}_n = X/n$.

By the CLT,

$$\frac{\sqrt{n}(\bar{Y}_n - \lambda)}{\sqrt{\lambda}} = \frac{X - n\lambda}{\sqrt{n\lambda}} = \frac{X - E(X)}{\sqrt{\operatorname{Var}(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$$

Therefore for $X \sim \operatorname{Poisson}(\lambda)$,

$$\frac{X - \lambda}{\sqrt{\lambda}} \overset{L}{\to} N(0,1) \quad \text{as } \lambda \to \infty.$$

The approximation will be quite adequate when $\lambda > 10$.

Example 2.15

$X \sim \operatorname{Poisson}(10)$

$$P(11 < X \le 21) = \sum_{x=12}^{21} \frac{e^{-10}\, 10^x}{x!} = 0.3025$$

Using the normal approximation,

$$P(11 < X \le 21) \approx P\!\left( \frac{11 - 10}{\sqrt{10}} < \frac{X - \lambda}{\sqrt{\lambda}} \le \frac{21 - 10}{\sqrt{10}} \right) = \Phi(3.48) - \Phi(0.32) = 0.3745$$

If we apply the continuity correction,

$$P(11 < X \le 21) \approx P\!\left( \frac{11.5 - 10}{\sqrt{10}} < \frac{X - \lambda}{\sqrt{\lambda}} \le \frac{21.5 - 10}{\sqrt{10}} \right) = \Phi(3.64) - \Phi(0.47) = 0.3175,$$

which is more accurate.



Example 2.16 (Normal approximation to the Gamma / Chi-square)

If $Y_i \overset{iid}{\sim} \operatorname{Exponential}(\lambda)$, then $E(Y_i) = \dfrac{1}{\lambda}$, $\operatorname{Var}(Y_i) = \dfrac{1}{\lambda^2}$.

Let $X = \sum_{i=1}^n Y_i$; then $X \sim \operatorname{Gamma}(n, \lambda)$, $E(X) = \dfrac{n}{\lambda}$, $\operatorname{Var}(X) = \dfrac{n}{\lambda^2}$.

The sample mean of the Y's can be written as $\bar{Y}_n = X/n$.

By the CLT,

$$\frac{\sqrt{n}(\bar{Y}_n - 1/\lambda)}{\sqrt{1/\lambda^2}} = \frac{X - n/\lambda}{\sqrt{n/\lambda^2}} = \frac{X - E(X)}{\sqrt{\operatorname{Var}(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$$

Therefore for $X \sim \operatorname{Gamma}(\alpha, \lambda)$,

$$\frac{X - \alpha/\lambda}{\sqrt{\alpha/\lambda^2}} \overset{L}{\to} N(0,1) \quad \text{as } \alpha \to \infty.$$

In particular, for $X \sim \chi^2_r$,

$$\frac{X - r}{\sqrt{2r}} \overset{L}{\to} N(0,1) \quad \text{as } r \to \infty.$$

Example 2.17

$X \sim \chi^2_{80}$

From the Chi-square distribution table, $P(64.28 < X < 101.9) = 0.95 - 0.1 = 0.85$.

Using the normal approximation,

$$P(64.28 < X < 101.9) \approx P\!\left( \frac{64.28 - 80}{\sqrt{160}} < \frac{X - r}{\sqrt{2r}} < \frac{101.9 - 80}{\sqrt{160}} \right) = \Phi(1.7313) - \Phi(-1.2428) = 0.9583 - 0.1070 = 0.8513$$


Remarks

Note that there is no single magic sample size that guarantees that sampling distributions will be approximately normal. If a population is fairly symmetric, small sample sizes usually suffice. For strongly skewed populations (e.g., Chi-square), n may need to be quite large for the sampling distribution to be approximately normal. For many distributions that arise in practice, a relatively small sample size such as 30 is sufficiently large for the normal approximation to hold. Do know, however, that there are exceptions.

The following diagram shows the relationships among some common families of distributions; the limiting distributions are derived based on the central limit theorem.

[Diagram: relationships among common families of distributions and their limiting distributions.]


§ 2.2.4 Sampling Distribution of the Sample Proportion

Suppose we draw a sample of size n from a very large population. Assume that a proportion p of the objects in the population have a certain characteristic (e.g. support the president, carry a certain virus, have an annual salary of more than $400,000, etc.). We may be interested in making inference about the population proportion p.

Since the population is large, we can regard the n drawn sample units as n independent Bernoulli trials, each with success probability p (success means the drawn object has the characteristic). Let X be the number of objects in the sample having the characteristic; then $X \sim b(n, p)$.

The inference about the population proportion p (success probability) is often based on the sample proportion (sample success rate):

$$\text{Sample proportion:} \quad \hat{p} = \frac{\text{no. of successes in the sample}}{\text{sample size}} = \frac{X}{n}$$

$$E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p, \qquad \operatorname{Var}(\hat{p}) = \frac{\operatorname{Var}(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$

In the case when p is unknown, we can estimate it by $\hat{p}$ and the standard error will be calculated as

$$se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

As can be seen from Example 2.12, the sample proportion $\hat{p} = X/n$ can be regarded as the sample mean of a random sample from the Bernoulli distribution.

From the law of large numbers, we have $\hat{p} \overset{P}{\to} p$.

From the CLT, we have

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{L}{\to} N(0,1).$$


The sampling distribution of the sample proportion is therefore approximated as:

$$\hat{p} \sim N\!\left( p, \frac{p(1-p)}{n} \right) \quad \text{(approximately)}$$

Example 2.18

With the rising costs of a college education, most students depend on their parents or family for monetary support during their college years. The results of a freshman survey last year indicate that 86% of freshmen in the survey received financial aid from parents or family. Suppose that we were to survey the current freshman class by selecting a random sample of $n = 400$ freshmen.

If the percentage of freshmen receiving financial aid from parents or family remains unchanged this year, i.e. $p = 0.86$, then the sample proportion $\hat{p}$ will be approximately normally distributed:

$$\hat{p} \sim N\!\left( 0.86, \frac{0.86 \times 0.14}{400} \right) \quad \text{(approximately)}$$

The probability that the sample proportion will be greater than 90% can be calculated as

$$P(\hat{p} > 0.9) = 1 - \Phi\!\left( \frac{0.9 - 0.86}{\sqrt{0.86 \times 0.14 / 400}} \right) = 1 - \Phi(2.306) \approx 0.01.$$

It would be very unlikely to obtain a sample with $\hat{p} > 0.9$. Therefore, if we observed the data and found that more than 90% of the freshmen in the sample were receiving financial aid from their families, then we may conclude that the assumption of an unchanged percentage ($p = 0.86$) is incorrect, i.e. the population proportion this year should have increased.
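
As a sketch, the normal approximation used here can be compared with the exact binomial tail probability, since $P(\hat{p} > 0.9) = P(X > 360)$ for $X \sim b(400, 0.86)$.

```python
# Normal approximation versus exact binomial tail for Example 2.18.
import numpy as np
from scipy import stats

n, p = 400, 0.86
se = np.sqrt(p * (1 - p) / n)
approx = stats.norm.sf((0.9 - p) / se)       # ~0.0106
exact = stats.binom.sf(360, n, p)            # P(X > 360) = P(p_hat > 0.9)
print(approx, exact)
```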


§ 2.2.5 Further Results About Convergence

The following theorem gives some results which are often useful in deriving asymptotic distributions.

Slutzky's Theorem

If $X_n \overset{L}{\to} X$ and $Y_n \overset{P}{\to} b$, and $g(x, y)$ is continuous at $(x, b)$ for all x in the range of X, then

$$g(X_n, Y_n) \overset{L}{\to} g(X, b).$$

The proof is beyond the scope of this course. Some useful results from this theorem are given below.

Corollaries

1. If $X_n \overset{P}{\to} a$ and $g(x)$ is continuous at $x = a$, then $g(X_n) \overset{P}{\to} g(a)$.

2. If $X_n \overset{L}{\to} X$ and $g(x)$ is continuous for all x in the range of X, then $g(X_n) \overset{L}{\to} g(X)$.

3. If $X_n \overset{P}{\to} a$ and $Y_n \overset{P}{\to} b$, and $g(x, y)$ is continuous at $(x, y) = (a, b)$, then $g(X_n, Y_n) \overset{P}{\to} g(a, b)$.

Example 2.19

Suppose $X_1, X_2, \dots, X_n$ is a random sample drawn from a population with finite second moment. Then by the law of large numbers, we have

$$\bar{X} \overset{P}{\to} \mu, \qquad \frac{1}{n} \sum_{i=1}^n X_i^2 \overset{P}{\to} E(X^2),$$

and from Corollary 3,

$$\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n} \sum_{i=1}^n X_i^2 - \bar{X}^2 \overset{P}{\to} E(X^2) - \mu^2 = \sigma^2.$$

Since $S^2 = \dfrac{n}{n-1} \cdot \dfrac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2$ and $\dfrac{n}{n-1} \overset{P}{\to} 1$, we have

$$S^2 \overset{P}{\to} \sigma^2,$$

i.e. the sample variance converges in probability to the population variance.

Moreover, by the CLT, we have $\dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \overset{L}{\to} N(0,1)$. Consider

$$\frac{\sqrt{n}(\bar{X} - \mu)}{S} = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \cdot \frac{\sigma}{S} \quad \text{and} \quad \frac{\sigma}{S} \overset{P}{\to} 1.$$

Therefore, by Slutzky's theorem, we have the following useful result for making inference about the population mean with large samples, in cases when $\sigma$ is unknown:

$$\frac{\sqrt{n}(\bar{X} - \mu)}{S} \overset{L}{\to} N(0,1).$$

Example 2.20

From Section 2.2.4, the sampling distribution of the sample proportion can be approximated as

$$\hat{p} \sim N\!\left( p, \frac{p(1-p)}{n} \right) \quad \text{(approximately)}.$$

This result, however, may not be practically useful as the population proportion p is usually unknown.

Since

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{L}{\to} N(0,1) \quad \text{and} \quad \hat{p} \overset{P}{\to} p \;\Rightarrow\; \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \Big/ \sqrt{\frac{p(1-p)}{n}} \overset{P}{\to} 1,$$

the following asymptotic distribution of the sample proportion results from Slutzky's theorem:

$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \overset{L}{\to} N(0,1),$$

which is more useful for making inference about p from $\hat{p}$.
