Stat2602 Chapter2
Definition
Let $X_1, X_2, \ldots, X_n$ be independent random variables that have the same marginal distribution with pdf $f(x)$. Then $X_1, X_2, \ldots, X_n$ are said to be independently and identically distributed (iid) with common pdf $f(x)$. We call this set of random variables $X_1, X_2, \ldots, X_n$ a random sample from $f(x)$. For a random sample, all the random variables have a common mean and variance, i.e.

$E(X_i) = \mu, \qquad Var(X_i) = \sigma^2.$
Based on a random sample, we usually compute some summary statistics such as the sample mean and sample variance. The probabilistic behaviour of these summary statistics is described by their sampling distributions.
Sample mean: $\bar{X} = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} X_i$

$E(\bar{X}) = \mu, \qquad Var(\bar{X}) = \dfrac{\sigma^2}{n}, \qquad se(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}.$
Note that the standard error gets smaller as the sample size increases. This reflects the fact that the mean from a large sample is more likely to be close to $\mu$ than the mean from a small sample. The standard error may be interpreted as a typical distance between the sample mean and the population mean.
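The $1/\sqrt{n}$ decay of the standard error can be checked by simulation. A minimal sketch, assuming an $N(0,1)$ population; the sample sizes and replication count are arbitrary choices:

```python
import math
import random

random.seed(0)

def empirical_se(n, reps=2000):
    """Standard deviation of the sample mean over many replicated samples."""
    means = [sum(random.gauss(0, 1) for _ in range(n)) / n for _ in range(reps)]
    grand = sum(means) / reps
    return math.sqrt(sum((m - grand) ** 2 for m in means) / (reps - 1))

# With sigma = 1, theory gives se = 1/sqrt(n): 0.5, 0.25, 0.125
for n in (4, 16, 64):
    print(n, round(empirical_se(n), 3))
```

Quadrupling the sample size should roughly halve the printed standard error.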
Stat2602 Probability and Statistics II Fall 2014-2015
In practice $\sigma$ is unknown, and the standard error is estimated by

$se(\bar{X}) = \dfrac{S}{\sqrt{n}}.$
Sample variance: $S^2 = \dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n} (X_i - \bar{X})^2$, with $E(S^2) = \sigma^2$.

The moment generating function of the sample mean can be obtained from that of the $X_i$'s as

$M_{\bar{X}}(t) = \left[ M_X\!\left(\dfrac{t}{n}\right) \right]^n.$
Example 2.1
Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} \chi^2_r$ be a random sample from the Chi-square distribution. Then the moment generating function of the sample mean is given by

$M_{\bar{X}}(t) = \left[ M_X\!\left(\dfrac{t}{n}\right) \right]^n = \left[ \left(1 - \dfrac{2t}{n}\right)^{-r/2} \right]^n = \left(1 - \dfrac{2t}{n}\right)^{-nr/2} \qquad \text{for } t < \dfrac{n}{2},$

which is the mgf of $Gamma\!\left(\dfrac{nr}{2}, \dfrac{n}{2}\right)$. Therefore $\bar{X} \sim Gamma\!\left(\dfrac{nr}{2}, \dfrac{n}{2}\right)$.
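The $Gamma(nr/2,\, n/2)$ conclusion can be sanity-checked by simulation: that distribution has mean $r$ and variance $2r/n$. A sketch with the arbitrary choices $n = 5$, $r = 3$:

```python
import random

random.seed(1)
n, r = 5, 3

def chi_square(df):
    """Simulate a chi-square variate as a sum of df squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

sample_means = [sum(chi_square(r) for _ in range(n)) / n for _ in range(20000)]
emp_mean = sum(sample_means) / len(sample_means)
emp_var = sum((m - emp_mean) ** 2 for m in sample_means) / (len(sample_means) - 1)
# Gamma(nr/2, n/2): mean (nr/2)/(n/2) = r = 3, variance (nr/2)/(n/2)^2 = 2r/n = 1.2
print(round(emp_mean, 2), round(emp_var, 2))
```

The empirical mean and variance of the simulated sample means should be close to 3 and 1.2 respectively.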
For a random sample from the normal distribution $N(\mu, \sigma^2)$:

$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right), \qquad \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$
Example 2.2

Let $S^2$ be the sample variance of a random sample of size $n = 20$ from a normal population with $\sigma = 0.60$. Then

$P(S > 0.828) = P\!\left( \dfrac{(n-1)S^2}{\sigma^2} > \dfrac{19 \times 0.828^2}{0.60^2} \right) = P(W > 36.18) \qquad (W \sim \chi^2_{19})$

$\approx 0.01$ (from the Chi-square table).
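The table lookup $P(W > 36.18) \approx 0.01$ for $W \sim \chi^2_{19}$ can be verified by Monte Carlo, sampling each chi-square variate as a sum of squared standard normals (the sample size below is an arbitrary choice):

```python
import random

random.seed(2)
N = 100_000
exceed = sum(
    1 for _ in range(N)
    if sum(random.gauss(0, 1) ** 2 for _ in range(19)) > 36.18
)
print(exceed / N)  # should be close to the tabulated 0.01
```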
Let $W_1 \sim \chi^2_{r_1}$ and $W_2 \sim \chi^2_{r_2}$ be independent. The random variable

$X = \dfrac{W_1 / r_1}{W_2 / r_2}$

follows the F-distribution with degrees of freedom $r_1$ and $r_2$, denoted $F(r_1, r_2)$. The derivation of the pdf of $X$ is similar to that for the Student's t-distribution and can be found in the supplementary notes.
Definition

The pdf of the $F(r_1, r_2)$ distribution is

$f(x) = \dfrac{\Gamma\!\left(\frac{r_1 + r_2}{2}\right)}{\Gamma\!\left(\frac{r_1}{2}\right)\Gamma\!\left(\frac{r_2}{2}\right)} \left(\dfrac{r_1}{r_2}\right)^{r_1/2} \dfrac{x^{r_1/2 - 1}}{\left(1 + \frac{r_1 x}{r_2}\right)^{(r_1 + r_2)/2}}, \qquad x > 0.$

Mean and variance:

$\mu = \begin{cases} \dfrac{r_2}{r_2 - 2} & \text{for } r_2 > 2 \\ \text{undefined} & \text{otherwise} \end{cases} \qquad \sigma^2 = \begin{cases} \dfrac{2 r_2^2 (r_1 + r_2 - 2)}{r_1 (r_2 - 2)^2 (r_2 - 4)} & \text{for } r_2 > 4 \\ \text{undefined} & \text{otherwise} \end{cases}$
[Figure: pdf curve of the F(r1, r2) distribution]
The notation $F_{r_1, r_2, \alpha}$ represents the $100(1-\alpha)$-th percentile of the $F(r_1, r_2)$ distribution.
The F-distribution table gives the critical value under an F-distribution curve according to a given tail area. Each table entry of three numbers corresponds to an F-distribution with specific degrees of freedom. The three numbers are the critical values corresponding to the commonly used tail areas 0.05, 0.01 and 0.001. For example, for $r_1 = 3$, $r_2 = 5$, we have $F_{3,5,0.05} = 5.41$, $F_{3,5,0.01} = 12.06$, $F_{3,5,0.001} = 33.20$.
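The tabulated value $F_{3,5,0.05} = 5.41$ can be checked by simulating the ratio of scaled chi-squares that defines the F-distribution; a sketch:

```python
import random

random.seed(3)

def chi_square(df):
    """Simulate a chi-square variate as a sum of df squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

N = 100_000
# Fraction of simulated F(3,5) ratios above the tabulated 0.05 critical value
tail = sum(1 for _ in range(N) if (chi_square(3) / 3) / (chi_square(5) / 5) > 5.41) / N
print(round(tail, 3))  # should be close to 0.05
```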
Theorem

Let $S_x^2$ and $S_y^2$ be the sample variances of independent random samples of sizes $m$ and $n$ from normal populations with variances $\sigma_x^2$ and $\sigma_y^2$ respectively. Then

$F = \dfrac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2} \sim F(m-1, n-1).$

Proof

$W_1 = \dfrac{(m-1)S_x^2}{\sigma_x^2} \sim \chi^2_{m-1}$ independent of $W_2 = \dfrac{(n-1)S_y^2}{\sigma_y^2} \sim \chi^2_{n-1}$.

$F = \dfrac{W_1 / (m-1)}{W_2 / (n-1)} = \dfrac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2} \sim F(m-1, n-1).$
Example 2.3

Suppose there are two normal populations with unknown variances $\sigma_x^2$ and $\sigma_y^2$. Johnnie claims that the two populations should have the same variance, i.e. $\sigma_x^2 = \sigma_y^2$. To make inference about his claim, he has drawn random samples of sizes $m = n = 8$ independently from each of these two populations. If his claim is correct, what is the chance that the ratio of the sample variances, $S_x^2 / S_y^2$, would exceed 7?

If $\sigma_x^2 = \sigma_y^2$, then by the theorem above,

$F = \dfrac{S_x^2}{S_y^2} \sim F(m-1, n-1) = F(7, 7).$

From the F-distribution table, $F_{7,7,0.01} = 6.99 \approx 7$, so the chance is approximately 0.01.
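Johnnie's setting is easy to simulate directly; standard normal populations are used below, which loses no generality since only the variance ratio matters under his claim. A sketch:

```python
import random

random.seed(4)

def svar(values):
    """Sample variance with divisor n - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

N = 50_000
m = n = 8
exceed = 0
for _ in range(N):
    x = [random.gauss(0, 1) for _ in range(m)]
    y = [random.gauss(0, 1) for _ in range(n)]
    if svar(x) / svar(y) > 7:
        exceed += 1
print(exceed / N)  # near the F(7,7) tail area above 7, about 0.01
```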
Besides the sample mean and sample variance, some other commonly used sample statistics, such as the maximum, minimum, median, and quantiles, are evaluated from the sorted values of the sample. The items of the random sample arranged from the smallest to the largest are called the order statistics. Order statistics are important in nonparametric inference, which was developed to deal with violation of the normality assumption, or an unknown population distribution.
Definition

Let $X_1, X_2, \ldots, X_n$ be a random sample. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the order statistics of this sample, i.e. $X_{(1)}$ is the minimum, $X_{(n)}$ is the maximum, and $X_{(k)}$ is the k-th smallest of $X_1, X_2, \ldots, X_n$.
Example 2.4

Let $X_1, X_2 \overset{iid}{\sim} U(0,1)$ and consider the sample maximum

$X_{(2)} = \begin{cases} X_2 & \text{if } X_1 \le X_2, \\ X_1 & \text{if } X_1 > X_2. \end{cases}$

$G(y) = P(X_{(2)} \le y)$
$= P(X_2 \le y \text{ and } X_1 \le X_2) + P(X_1 \le y \text{ and } X_1 > X_2)$
$= \displaystyle\int_0^y \int_0^{x_2} 1 \, dx_1 \, dx_2 + \int_0^y \int_0^{x_1} 1 \, dx_2 \, dx_1$
$= \displaystyle\int_0^y x_2 \, dx_2 + \int_0^y x_1 \, dx_1$
$= y^2 \qquad \text{for } 0 < y < 1.$

$g(y) = G'(y) = 2y, \qquad 0 < y < 1.$
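The cdf $G(y) = y^2$ is easy to confirm empirically; a sketch with the arbitrary check point $y = 0.6$:

```python
import random

random.seed(5)
N = 100_000
y = 0.6
# Empirical P(max(X1, X2) <= y) for two independent U(0,1) draws
freq = sum(1 for _ in range(N) if max(random.random(), random.random()) <= y) / N
print(round(freq, 3))  # theory: y**2 = 0.36
```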
Remark

In practice, we may observe ties when we sort the sample, especially when the measured variable is discrete. The presence of ties would complicate the distribution theory of order statistics. Therefore, in the following discussion about order statistics, we will assume that the random sample arises from a continuous distribution, so that the probability of observing ties is zero. In such a case, general formulae for the probability density functions of the order statistics can be easily derived.
Theorem

Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous distribution with pdf $f$ and cdf $F$ supported on $(a, b)$. The probability density function of the k-th order statistic $X_{(k)}$ is

$g_k(y) = \dfrac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y), \qquad a < y < b.$

In particular, the probability density function of $X_{(n)}$ (sample maximum) is

$g_n(y) = n [F(y)]^{n-1} f(y), \qquad a < y < b;$

and that of $X_{(1)}$ (sample minimum) is

$g_1(y) = n [1 - F(y)]^{n-1} f(y), \qquad a < y < b.$

An informal justification of the formulae is given below. Those who are interested in the formal proof may refer to the supplementary notes.

Suppose that the sample space $(a, b)$ is divided into three intervals, one from $a$ to $y$, a second from $y$ to $y + h$ (where $h$ is a very small positive number), and the third from $y + h$ to $b$, as shown below.

[Diagram: the interval (a, b) divided at y and y + h]

Each of the sample items $X_1, X_2, \ldots, X_n$ independently falls into the three intervals with probabilities $F(y)$, $F(y+h) - F(y)$, and $1 - F(y+h)$ respectively. For the value of $X_{(k)}$ to fall into the second interval $(y, y+h)$, there should be $k - 1$ of $X_1, X_2, \ldots, X_n$ falling into the first interval, 1 falling into the second interval, and $n - k$ falling into the third interval.
$g_k(y) = \lim_{h \to 0} \dfrac{P(y < X_{(k)} \le y + h)}{h}$

$= \dfrac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} \lim_{h \to 0} \left\{ \dfrac{F(y+h) - F(y)}{h} [1 - F(y+h)]^{n-k} \right\}$

$= \dfrac{n!}{(k-1)!\,(n-k)!} [F(y)]^{k-1} [1 - F(y)]^{n-k} f(y).$
Example 2.5

Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} Exponential(\lambda)$, so that $f(y) = \lambda e^{-\lambda y}$ and $1 - F(y) = e^{-\lambda y}$ for $y > 0$. The pdf of the sample minimum is

$g_1(y) = n [1 - F(y)]^{n-1} f(y) = n \left(e^{-\lambda y}\right)^{n-1} \lambda e^{-\lambda y} = n\lambda e^{-n\lambda y}, \qquad y > 0.$

Hence $X_{(1)} \sim Exponential(n\lambda)$, i.e. the sample minimum of an exponential random sample is also distributed as exponential, with the parameter magnified n times.
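The $Exponential(n\lambda)$ result implies $E[X_{(1)}] = 1/(n\lambda)$; a simulation sketch with the arbitrary choices $\lambda = 2$, $n = 5$:

```python
import random

random.seed(6)
lam, n, N = 2.0, 5, 50_000
# Sample minima of n iid Exponential(lam) draws, replicated N times
minima = [min(random.expovariate(lam) for _ in range(n)) for _ in range(N)]
print(round(sum(minima) / N, 3))  # theory: 1/(n*lam) = 0.1
```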
Example 2.6

Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} U(0,1)$. With $F(y) = y$ and $f(y) = 1$ on $(0,1)$, the theorem gives

$g_k(y) = \dfrac{n!}{(k-1)!\,(n-k)!} y^{k-1} (1-y)^{n-k}, \qquad 0 < y < 1,$

i.e. $X_{(k)} \sim Beta(k, n-k+1)$.

Remarks

1. Unlike the random variables $X_1, X_2, \ldots, X_n$ in the random sample, the order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are usually dependent.

2. The joint pdf of two order statistics $X_{(j)}$ and $X_{(k)}$ ($j < k$) is

$g_{j,k}(x, y) = \dfrac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!} [F(x)]^{j-1} [F(y) - F(x)]^{k-j-1} [1 - F(y)]^{n-k} f(x) f(y)$

where $x < y$.

3. The joint pdf of all n order statistics is

$g_{1,2,\ldots,n}(y_1, y_2, \ldots, y_n) = \begin{cases} n! \, f(y_1) f(y_2) \cdots f(y_n) & \text{if } y_1 < y_2 < \cdots < y_n, \\ 0 & \text{otherwise.} \end{cases}$
The probabilistic behaviour of the sample mean when the sample size n is large (say, tends to infinity) is called the limiting distribution of the sample mean. The law of large numbers (LLN) and the central limit theorem (CLT) are two of the most important theorems in statistics concerning the limiting distribution of the sample mean. These two theorems establish the "nice" properties of the sample mean and justify its advantages.

Converges in distribution

A sequence of random variables $X_n$ is said to converge in distribution (in law) to a random variable $X$, denoted $X_n \overset{L}{\to} X$, if

$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$

at every point x where $F_X$ is continuous.
Example 2.7

Suppose that $U_1, U_2, \ldots \overset{iid}{\sim} U(0,1)$. Define $X_n$ as the maximum of $U_1, U_2, \ldots, U_n$. Then the cumulative distribution function of $X_n$ is given by $F_{X_n}(x) = 0$ for $x < 0$; $F_{X_n}(x) = 1$ for $x \ge 1$; and for $0 \le x < 1$,

$F_{X_n}(x) = P(X_n \le x) = P(U_1 \le x, U_2 \le x, \ldots, U_n \le x) = P(U_1 \le x) P(U_2 \le x) \cdots P(U_n \le x) = x^n.$
Therefore

$\lim_{n \to \infty} F_{X_n}(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$

which is the cdf of a random variable X degenerate at 1:

$F_X(x) = P(X \le x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x \ge 1. \end{cases}$

Hence $X_n \overset{L}{\to} 1$, as X is degenerate at 1.

Now consider $Y_n = n(1 - X_n)$. For $0 \le y \le n$,

$F_{Y_n}(y) = P(Y_n \le y) = P\!\left( X_n \ge 1 - \dfrac{y}{n} \right) = 1 - F_{X_n}\!\left(1 - \dfrac{y}{n}\right),$

so that

$F_{Y_n}(y) = \begin{cases} 0 & \text{if } y < 0 \\ 1 - \left(1 - \dfrac{y}{n}\right)^n & \text{if } 0 \le y < n \\ 1 & \text{if } y \ge n. \end{cases}$

Therefore

$\lim_{n \to \infty} F_{Y_n}(y) = \begin{cases} 0 & \text{if } y < 0 \\ 1 - e^{-y} & \text{if } y \ge 0 \end{cases}$

i.e. $Y_n = n\left(1 - \max(U_1, U_2, \ldots, U_n)\right) \overset{L}{\to} Exponential(1).$
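The limit can be illustrated numerically: for moderately large n, simulated values of $n(1 - \max U_i)$ already behave like Exponential(1) draws. A sketch comparing the empirical $P(Y_n \le 1)$ with $1 - e^{-1}$, using the arbitrary choice $n = 100$:

```python
import math
import random

random.seed(7)
n, N = 100, 50_000
# Simulated values of Y_n = n * (1 - max of n uniforms)
ys = [n * (1 - max(random.random() for _ in range(n))) for _ in range(N)]
p_hat = sum(1 for y in ys if y <= 1.0) / N
print(round(p_hat, 3), round(1 - math.exp(-1), 3))  # both near 0.632
```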
Converges in probability

A sequence of random variables $X_n$ is said to converge in probability to X if, for every $\varepsilon > 0$,

$\lim_{n \to \infty} P(|X_n - X| \ge \varepsilon) = 0.$

It is denoted as $X_n \overset{P}{\to} X$.
Example 2.8

Let $X_n = \max(U_1, \ldots, U_n)$ as in Example 2.7. Then $X_n \overset{P}{\to} 1$. Indeed, $P(|X_n - 1| \ge \varepsilon) = 0$ if $\varepsilon > 1$, and for any $0 < \varepsilon \le 1$,

$P(|X_n - 1| \ge \varepsilon) = P(1 - X_n \ge \varepsilon) = P(X_n \le 1 - \varepsilon) = F_{X_n}(1 - \varepsilon) = (1 - \varepsilon)^n \underset{n \to \infty}{\longrightarrow} 0.$
Remarks

Convergence in distribution does not imply convergence in probability. For example, if $X_1, X_2, \ldots \overset{iid}{\sim} N(0,1)$, then $X_n \overset{L}{\to} X_1$ because they have the same distribution, but $X_n$ does not converge in probability to $X_1$ since $X_n - X_1 \sim N(0, 2)$ for every $n \ge 2$.
Converges almost surely

A sequence of random variables $X_n$ is said to converge almost surely to X if

$P\!\left( \lim_{n \to \infty} X_n = X \right) = 1.$

It is denoted as $X_n \overset{a.s.}{\to} X$. In other words,

$\lim_{n \to \infty} X_n(\omega) = X(\omega)$

for all $\omega \in E$ such that $P(E) = 1$, i.e. the convergence of the sequence of numbers $X_n(\omega)$ to $X(\omega)$ holds for almost all $\omega$.
Example 2.9

Suppose that $U \sim U(0,1)$ and define

$X_n = \begin{cases} 1/n & \text{if } U < 1/2, \\ 1/2 & \text{if } U = 1/2, \\ 1 - 1/n & \text{if } U > 1/2. \end{cases}$

Then for a particular value of U,

$\lim_{n \to \infty} X_n = \begin{cases} 0 & \text{if } U < 1/2, \\ 1/2 & \text{if } U = 1/2, \\ 1 & \text{if } U > 1/2. \end{cases}$

Define

$X = \begin{cases} 0 & \text{if } U \le 1/2, \\ 1 & \text{if } U > 1/2. \end{cases}$

Then $P\!\left( \lim_{n \to \infty} X_n = X \right) = P(U \ne 1/2) = 1$, so $X_n \overset{a.s.}{\to} X$.
Remarks

1. The relationships among the above three modes of convergence are as follows:

$X_n \overset{a.s.}{\to} X \;\Rightarrow\; X_n \overset{P}{\to} X \;\Rightarrow\; X_n \overset{L}{\to} X.$
Example 2.10

Convergence in probability does not imply almost sure convergence. Let $U \sim U(0,1)$, $X = 0$, and define the indicator variables

$X_1 = I_{[0,1]}(U),$
$X_2 = I_{[0,1/2]}(U), \quad X_3 = I_{[1/2,1]}(U),$
$X_4 = I_{[0,1/3]}(U), \quad X_5 = I_{[1/3,2/3]}(U), \quad X_6 = I_{[2/3,1]}(U),$
$\ldots$

Since $P(|X_n - X| \ge \varepsilon)$ equals the length of the corresponding interval, which shrinks to 0, we have $X_n \overset{P}{\to} X$. However, for every $\omega$, the value $X_n(\omega)$ alternates between the values 0 and 1 infinitely often. There is no value of $\omega \in [0,1]$ for which $X_n(\omega)$ converges to $X(\omega)$, i.e. $X_n$ does not converge to X almost surely.
Theorem (Weak Law of Large Numbers)

Let $X_1, X_2, \ldots, X_n$ be iid with mean $\mu$ and finite variance $\sigma^2$. Then for every $\varepsilon > 0$,

$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0$

or alternatively,

$\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \varepsilon) = 1.$

Proof

$E(\bar{X}_n) = \mu, \qquad Var(\bar{X}_n) = \dfrac{\sigma^2}{n}.$

By Chebyshev's inequality,

$P(|\bar{X}_n - \mu| \ge \varepsilon) = P(|\bar{X}_n - E(\bar{X}_n)| \ge \varepsilon) \le \dfrac{Var(\bar{X}_n)}{\varepsilon^2} = \dfrac{\sigma^2}{n \varepsilon^2}.$

$0 \le \lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) \le \lim_{n \to \infty} \dfrac{\sigma^2}{n \varepsilon^2} = 0.$

Therefore $\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0$.
Remarks

1. The weak law of large numbers states that the sample mean from a large sample has a very high probability of being arbitrarily close to the population mean, thereby promising a stable performance of the sample mean.

2. A more general version of the weak law of large numbers states that if $E|X_i| < \infty$ with $E(X_i) = \mu$, then $\bar{X}_n \overset{P}{\to} \mu$. Note that it does not require a finite population variance.

3. The strong law of large numbers (SLLN) states that $\bar{X}_n \overset{a.s.}{\to} \mu$ if and only if $E|X_i| < \infty$ (with $\mu = E(X_i)$). In other words, with probability 1 we have $\lim_{n \to \infty} \bar{X}_n = \mu$. The proof of the strong law is omitted here and can be found in the classic book "A Course in Probability Theory" by Kai-Lai Chung.

4. The weak law states that for sufficiently large n, the sample mean $\bar{X}_n$ is likely to be near $\mu$, but it still allows the event $|\bar{X}_n - \mu| \ge \varepsilon$ to happen an infinite number of times, though at very infrequent intervals. The strong law, on the other hand, states that this almost surely will not happen.
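The weak law can be made concrete by estimating $P(|\bar{X}_n - \mu| \ge \varepsilon)$ for growing n; the Exponential(1) population ($\mu = 1$) and $\varepsilon = 0.1$ below are arbitrary choices:

```python
import random

random.seed(8)

def prob_far(n, eps=0.1, reps=2000):
    """Estimate P(|sample mean - 1| >= eps) for Exponential(1) data (mu = 1)."""
    far = sum(
        1 for _ in range(reps)
        if abs(sum(random.expovariate(1.0) for _ in range(n)) / n - 1.0) >= eps
    )
    return far / reps

print([prob_far(n) for n in (10, 100, 1000)])  # shrinks toward 0 as n grows
```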
Theorem (Central Limit Theorem)

Let $X_1, X_2, \ldots, X_n$ be iid with mean $\mu$ and finite variance $\sigma^2$. Then

$\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \overset{L}{\to} N(0,1),$

i.e. $\lim_{n \to \infty} P\!\left( \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le x \right) = \Phi(x)$ for all x.

Proof

Let $Y_i = \dfrac{X_i - \mu}{\sigma}$ for $i = 1, 2, \ldots$ Then $E(Y_i) = 0$ and $Var(Y_i) = 1$. Assume that the moment generating function of $Y_i$ exists and is denoted by $M_Y(t)$. Consider the Taylor expansion of $M_Y(t)$:

$M_Y(t) = M_Y(0) + M'_Y(0)\, t + \dfrac{1}{2} M''_Y(\xi)\, t^2 = 1 + \dfrac{1}{2} M''_Y(\xi)\, t^2 \qquad \text{for some } 0 < \xi < t.$

Let $Z_n = \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \dfrac{1}{\sqrt{n}} \displaystyle\sum_{i=1}^{n} Y_i$. The moment generating function of $Z_n$ is

$M_{Z_n}(t) = \left[ M_Y\!\left( \dfrac{t}{\sqrt{n}} \right) \right]^n = \left[ 1 + \dfrac{t^2}{2n} M''_Y(\xi_n) \right]^n \qquad \text{for some } 0 < \xi_n < \dfrac{t}{\sqrt{n}}.$
Since $\xi_n \to 0$ and $M''_Y(\xi_n) \to M''_Y(0) = E(Y_i^2) = 1$,

$\lim_{n \to \infty} M_{Z_n}(t) = \lim_{n \to \infty} \left[ 1 + \dfrac{t^2 M''_Y(\xi_n)}{2n} \right]^n = e^{t^2/2},$

which is the mgf of $N(0,1)$. Hence $Z_n \overset{L}{\to} N(0,1)$.
Remarks

1. The key to the above proof is the following lemma, which we state without proof.

Lemma

If $\lim_{n \to \infty} a_n = a$, then $\lim_{n \to \infty} \left( 1 + \dfrac{a_n}{n} \right)^n = e^a$.

2. A more general proof of the CLT uses the so-called characteristic function (which always exists) and does not require the existence of the moment generating function.

4. The CLT is one of the most startling theorems in statistics. It forms the basis of other important theorems and provides us with some useful approximations for large-sample statistical analysis.

5. Note that $\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ is in fact the standard score of $\bar{X}_n$. The CLT suggests that we can approximate the sampling distribution of $\bar{X}_n$ by $N\!\left( \mu, \dfrac{\sigma^2}{n} \right)$. This is called the normal approximation.
Example 2.11

A random sample of size $n = 81$ is taken from a population with mean $\mu = 128$ and standard deviation $\sigma = 6.3$. Suppose that we are interested in the probability of observing $\bar{X}$ to fall between 126.6 and 129.4. By the normal approximation, $\bar{X} \approx N(128, 6.3^2/81)$ with $se(\bar{X}) = 6.3/9 = 0.7$, so

$P(126.6 < \bar{X} < 129.4) \approx \Phi\!\left( \dfrac{129.4 - 128}{0.7} \right) - \Phi\!\left( \dfrac{126.6 - 128}{0.7} \right) = \Phi(2) - \Phi(-2) \approx 0.9545.$
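The normal-approximation arithmetic for this example can be reproduced with the error function from the standard library:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 128, 6.3, 81
se = sigma / math.sqrt(n)  # 6.3 / 9 = 0.7
p = phi((129.4 - mu) / se) - phi((126.6 - mu) / se)
print(round(p, 4))  # z-scores are +/-2, so about 0.9545
```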
Example 2.12

If $Y_i \overset{iid}{\sim} b(1, p)$, then $E(Y_i) = p$, $Var(Y_i) = p(1-p)$.

Let $X = \displaystyle\sum_{i=1}^{n} Y_i$; then $X \sim b(n, p)$, $E(X) = np$, $Var(X) = np(1-p)$.

By the CLT,

$\dfrac{\sqrt{n}(\bar{Y}_n - p)}{\sqrt{p(1-p)}} = \dfrac{X - np}{\sqrt{np(1-p)}} = \dfrac{X - E(X)}{\sqrt{Var(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$
Example 2.13

Let $X \sim b(30, 0.25)$. The exact probability is

$P(6 \le X \le 9) = \displaystyle\sum_{x=6}^{9} \binom{30}{x} (0.25)^x (0.75)^{30-x} = 0.6008.$

Using the normal approximation (with $np = 7.5$ and $np(1-p) = 5.625$),

$P(6 \le X \le 9) \approx P\!\left( \dfrac{6 - 0.5 - 7.5}{\sqrt{5.625}} \le \dfrac{X - np}{\sqrt{np(1-p)}} \le \dfrac{9 + 0.5 - 7.5}{\sqrt{5.625}} \right) = \Phi(0.84) - \Phi(-0.84) \approx 0.60.$

The 0.5 added to or subtracted from the bounds in the probability statement is called the continuity correction. In general, when a continuous distribution is used to approximate a discrete distribution, it is better to use

$P(X \le c + 0.5)$ instead of $P(X \le c)$;
$P(X \ge c - 0.5)$ instead of $P(X \ge c)$;
$P(X \le c - 0.5)$ instead of $P(X < c)$;
$P(X \ge c + 0.5)$ instead of $P(X > c)$

where c is an integer.
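Both the exact binomial sum and the continuity-corrected approximation are easy to reproduce:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 30, 0.25
# Exact P(6 <= X <= 9) for X ~ b(30, 0.25)
exact = sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(6, 10))
mu, sd = n * p, math.sqrt(n * p * (1 - p))  # 7.5 and sqrt(5.625)
# Continuity-corrected bounds: 5.5 and 9.5
approx = phi((9.5 - mu) / sd) - phi((5.5 - mu) / sd)
print(round(exact, 4), round(approx, 4))  # about 0.6008 and 0.6009
```

The corrected approximation agrees with the exact value to three decimal places here.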
Example 2.14

Let $Y_i \overset{iid}{\sim} Poisson(\lambda)$, so $E(Y_i) = Var(Y_i) = \lambda$.

Let $X = \displaystyle\sum_{i=1}^{n} Y_i$; then $X \sim Poisson(n\lambda)$, $E(X) = n\lambda$, $Var(X) = n\lambda$.

By the CLT,

$\dfrac{\sqrt{n}(\bar{Y}_n - \lambda)}{\sqrt{\lambda}} = \dfrac{X - n\lambda}{\sqrt{n\lambda}} = \dfrac{X - E(X)}{\sqrt{Var(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$

Therefore for $X \sim Poisson(\lambda)$,

$\dfrac{X - \lambda}{\sqrt{\lambda}} \overset{L}{\to} N(0,1) \quad \text{as } \lambda \to \infty.$
Example 2.15

Let $X \sim Poisson(10)$. The exact probability is

$P(11 < X \le 21) = \displaystyle\sum_{x=12}^{21} \dfrac{e^{-10}\, 10^x}{x!} = 0.3025.$

Without the continuity correction,

$P(11 < X \le 21) \approx P\!\left( \dfrac{11 - 10}{\sqrt{10}} < \dfrac{X - 10}{\sqrt{10}} \le \dfrac{21 - 10}{\sqrt{10}} \right) = \Phi(3.48) - \Phi(0.32) = 0.3745.$

With the continuity correction,

$P(11 < X \le 21) = P(11.5 \le X \le 21.5) \approx \Phi\!\left( \dfrac{21.5 - 10}{\sqrt{10}} \right) - \Phi\!\left( \dfrac{11.5 - 10}{\sqrt{10}} \right) = \Phi(3.64) - \Phi(0.47) = 0.3175,$

which is closer to the exact value.
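The three numbers in this example (exact sum over $x = 12, \ldots, 21$, the uncorrected approximation, and the corrected one) can be reproduced as follows:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

lam = 10
# Exact P(11 < X <= 21) = P(12 <= X <= 21) for X ~ Poisson(10)
exact = sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(12, 22))
sd = math.sqrt(lam)
no_corr = phi((21 - lam) / sd) - phi((11 - lam) / sd)
corrected = phi((21.5 - lam) / sd) - phi((11.5 - lam) / sd)
print(round(exact, 4), round(no_corr, 4), round(corrected, 4))
```

The corrected value lands much closer to the exact probability than the uncorrected one.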
Example 2.16

Let $Y_i \overset{iid}{\sim} Exponential(\lambda)$, so $E(Y_i) = 1/\lambda$, $Var(Y_i) = 1/\lambda^2$, and let $X = \displaystyle\sum_{i=1}^{n} Y_i \sim Gamma(n, \lambda)$. By the CLT,

$\dfrac{\sqrt{n}(\bar{Y}_n - 1/\lambda)}{\sqrt{1/\lambda^2}} = \dfrac{X - n/\lambda}{\sqrt{n/\lambda^2}} = \dfrac{X - E(X)}{\sqrt{Var(X)}} \overset{L}{\to} N(0,1) \quad \text{as } n \to \infty.$

Therefore for $X \sim Gamma(\alpha, \lambda)$,

$\dfrac{X - \alpha/\lambda}{\sqrt{\alpha/\lambda^2}} \overset{L}{\to} N(0,1) \quad \text{as } \alpha \to \infty.$

In particular, for $X \sim \chi^2_r$,

$\dfrac{X - r}{\sqrt{2r}} \overset{L}{\to} N(0,1) \quad \text{as } r \to \infty.$
Example 2.17

Let $X \sim \chi^2_{80}$, so $E(X) = 80$ and $Var(X) = 160$. By the normal approximation,

$P(64.28 < X < 101.9) \approx P\!\left( \dfrac{64.28 - 80}{\sqrt{160}} < \dfrac{X - r}{\sqrt{2r}} < \dfrac{101.9 - 80}{\sqrt{160}} \right) = \Phi(1.73) - \Phi(-1.24) \approx 0.851.$
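The same error-function recipe reproduces this chi-square normal approximation:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

r = 80
sd = math.sqrt(2 * r)  # sqrt(160)
approx = phi((101.9 - r) / sd) - phi((64.28 - r) / sd)
print(round(approx, 3))  # about 0.851
```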
Remarks

Note that there is no single magic sample size that guarantees that sampling distributions will be approximately normal. If a population is fairly symmetric, small sample sizes usually suffice. For strongly skewed populations (e.g., Chi-square), n may need to be quite large for the sampling distribution to be approximately normal. For many distributions that arise in practice, a relatively small sample size such as 30 is sufficiently large for the normal approximation to hold. Do know, however, that there are exceptions.
The following diagram shows the relationship among some common families of distributions. The limiting distributions are derived based on the central limit theorem.

[Diagram: relationships among common families of distributions and their limiting distributions]
Suppose we draw a sample of size n from a very large population. Assume that a proportion p of the objects in the population have a certain characteristic (e.g. supporting the president, carrying a certain virus, having an annual salary of more than $400,000, etc.). We may be interested in making inference about the population proportion p.

Since the population is large, we can regard the n drawn sample units as n independent Bernoulli trials, each with success probability p (success means the drawn object has the characteristic). Let X be the number of objects in the sample having the characteristic; then $X \sim b(n, p)$.
The sample proportion is $\hat{p} = \dfrac{X}{n}$, with

$E(\hat{p}) = \dfrac{E(X)}{n} = \dfrac{np}{n} = p,$

$Var(\hat{p}) = \dfrac{Var(X)}{n^2} = \dfrac{np(1-p)}{n^2} = \dfrac{p(1-p)}{n}, \qquad se(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}.$

In the case when p is unknown, we can estimate it by $\hat{p}$ and the standard error will be calculated as

$se(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}.$
As can be seen from Example 2.12, the sample proportion $\hat{p} = \dfrac{X}{n}$ can be regarded as the sample mean of a random sample from the Bernoulli distribution. From the CLT, we have

$\dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{L}{\to} N(0,1).$
Sampling distribution of $\hat{p}$ (approximate): $\hat{p} \sim N\!\left( p, \dfrac{p(1-p)}{n} \right).$
Example 2.18

With the rising costs of a college education, most students depend on their parents or family for monetary support during their college years. The results of a freshman survey last year indicate that 86% of freshmen in the survey received financial aid from parents or family. Suppose that we were to survey the current freshman class by selecting a random sample of $n = 400$ freshmen. Then approximately

$\hat{p} \sim N\!\left( 0.86, \dfrac{0.86 \times 0.14}{400} \right).$

The probability that the sample proportion will be greater than 90% can be calculated as

$P(\hat{p} > 0.9) \approx 1 - \Phi\!\left( \dfrac{0.9 - 0.86}{\sqrt{0.86 \times 0.14 / 400}} \right) = 1 - \Phi(2.306) \approx 0.01.$
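The z-score and tail probability for this example can be reproduced with the same error-function helper:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.86, 400
se = math.sqrt(p * (1 - p) / n)
z = (0.9 - p) / se
tail = 1 - phi(z)
print(round(z, 3), round(tail, 4))  # about 2.306 and 0.0106
```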
The following theorem gives some results which are often useful in deriving asymptotic distributions.

Slutzky's Theorem

If $X_n \overset{L}{\to} X$ and $Y_n \overset{P}{\to} b$, and $g(x, y)$ is continuous at $(x, b)$ for all x in the range of X, then

$g(X_n, Y_n) \overset{L}{\to} g(X, b).$

The proof is beyond the scope of this course. Some useful results from this theorem are given below.

Corollaries

1. If $X_n \overset{P}{\to} a$ and $g(x)$ is continuous at $x = a$, then $g(X_n) \overset{P}{\to} g(a)$.

2. If $X_n \overset{L}{\to} X$ and $g(x)$ is continuous for all x in the range of X, then $g(X_n) \overset{L}{\to} g(X)$.

3. If $X_n \overset{P}{\to} a$ and $Y_n \overset{P}{\to} b$, and $g(x, y)$ is continuous at $(x, y) = (a, b)$, then $g(X_n, Y_n) \overset{P}{\to} g(a, b)$.
Example 2.19

Let $X_1, X_2, \ldots, X_n$ be iid with mean $\mu$ and variance $\sigma^2$. By the weak law of large numbers,

$\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} X_i^2 \overset{P}{\to} E(X^2) = \sigma^2 + \mu^2 \qquad \text{and} \qquad \bar{X} \overset{P}{\to} \mu,$

and from corollary 3,

$\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} (X_i - \bar{X})^2 = \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} X_i^2 - \bar{X}^2 \overset{P}{\to} E(X^2) - \mu^2 = \sigma^2.$
Since $S^2 = \dfrac{n}{n-1} \cdot \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} (X_i - \bar{X})^2$ and $\dfrac{n}{n-1} \to 1$, we have

$S^2 \overset{P}{\to} \sigma^2,$

and hence, by corollary 1, $S \overset{P}{\to} \sigma$, so that $\dfrac{\sigma}{S} \overset{P}{\to} 1$.

Moreover, by the CLT, we have $\dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \overset{L}{\to} N(0,1)$. Consider

$\dfrac{\sqrt{n}(\bar{X} - \mu)}{S} = \dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \cdot \dfrac{\sigma}{S} \qquad \text{with} \qquad \dfrac{\sigma}{S} \overset{P}{\to} 1.$

Therefore, by Slutzky's theorem, we have the following useful result for making inference about the population mean with large samples, in cases when $\sigma$ is unknown:

$\dfrac{\sqrt{n}(\bar{X} - \mu)}{S} \overset{L}{\to} N(0,1).$
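This large-sample result is what justifies standardizing by S in place of $\sigma$ even for non-normal data. A simulation sketch, assuming an Exponential(1) population ($\mu = 1$) with the arbitrary choice $n = 200$:

```python
import math
import random

random.seed(9)
n, N = 200, 10_000
cover = 0
for _ in range(N):
    x = [random.expovariate(1.0) for _ in range(n)]
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
    # Studentized mean: approximately N(0,1) by the Slutzky argument
    if abs(math.sqrt(n) * (xbar - 1.0) / s) <= 1.96:
        cover += 1
print(cover / N)  # close to P(|Z| <= 1.96) = 0.95
```

Despite the skewness of the exponential population, the studentized mean falls in $(-1.96, 1.96)$ close to 95% of the time.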
Example 2.20

From section 2.2.4, the sampling distribution of the sample proportion can be approximated as

$\hat{p} \sim N\!\left( p, \dfrac{p(1-p)}{n} \right).$

This result, however, may not be practically useful as the population proportion p is usually unknown.

Since $\dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{L}{\to} N(0,1)$, and $\hat{p} \overset{P}{\to} p$ implies $\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \Big/ \sqrt{\dfrac{p(1-p)}{n}} \overset{P}{\to} 1$,

the following asymptotic distribution of the sample proportion results from Slutzky's theorem:

$\dfrac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \overset{L}{\to} N(0,1).$