Chapter 5 - Sample Statistics

SAMPLE STATISTICS

Sampling can consist of


• Gathering random data from a large population, for example,
− measuring the height of randomly selected adults
− measuring the starting salary of random CS graduates

• Recording the results of experiments , for example,


− measuring the breaking strength of randomly selected bolts
− measuring the lifetime of randomly selected light bulbs

• We shall generally assume the population is infinite (or large) .


• We shall also generally assume the observations are independent .
• The outcome of any experiment does not affect other experiments.

246
DEFINITIONS :

• A random sample from a population consists of

independent , identically distributed random variables,

X1 , X2 , · · · , Xn .

• The values of the Xi are called the outcomes of the experiment.

• A statistic is a function of X1 , X2 , · · · , Xn .

• Thus a statistic itself is a random variable .

247
EXAMPLES :

The most important statistics are

• The sample mean

  X̄ ≡ (1/n) (X1 + X2 + · · · + Xn ) .

• The sample variance

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)² .

  ( to be discussed in detail · · · )

• The sample standard deviation S = √S² .
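
NOTE : As a computational companion, a minimal Python sketch of these three statistics ( standard library only ; the function names are ours, not part of the notes ) :

  import math

  def sample_mean(xs):
      # X-bar = (1/n) * (X1 + ... + Xn)
      return sum(xs) / len(xs)

  def sample_variance(xs):
      # S^2 = (1/n) * sum of squared deviations from X-bar
      # ( divisor n , matching the definition above, not n-1 )
      xbar = sample_mean(xs)
      return sum((x - xbar) ** 2 for x in xs) / len(xs)

  def sample_std(xs):
      # S = sqrt(S^2)
      return math.sqrt(sample_variance(xs))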

248
For a random sample

X1 , X2 , · · · , Xn ,

one can think of many other statistics such as :

• The order statistic , in which the observations are ordered in size .

• The sample median , which is


− the midvalue of the order statistic (if n is odd),

− the average of the two middle values (if n is even).

• The sample range : the difference between the largest and the
smallest observation.

249
EXAMPLE : For the 8 observations

−0.737 , 0.511 , −0.083 , 0.066 , −0.562 , −0.906 , 0.358 , 0.359 ,

from the first row of the Table given earlier, we have

Sample mean :

  X̄ = (1/8) ( − 0.737 + 0.511 − 0.083 + 0.066 − 0.562 − 0.906 + 0.358 + 0.359 ) = − 0.124 .

Sample variance :

  S² = (1/8) { (−0.737 − X̄)² + (0.511 − X̄)² + (−0.083 − X̄)²
             + (0.066 − X̄)² + (−0.562 − X̄)² + (−0.906 − X̄)²
             + (0.358 − X̄)² + (0.359 − X̄)² } = 0.26 .

Sample standard deviation : S = √0.26 = 0.51 .

250
EXAMPLE : ( continued · · · )

For the 8 observations

−0.737 , 0.511 , −0.083 , 0.066 , −0.562 , −0.906 , 0.358 , 0.359 ,

we also have

The order statistic :

−0.906 , −0.737 , −0.562 , −0.083 , 0.066 , 0.358 , 0.359 , 0.511 .

The sample median : (−0.083 + 0.066)/2 = − 0.0085 .

The sample range : 0.511 − (−0.906) = 1.417 .
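
NOTE : A short, self-contained Python check of this example ( statistics.median handles the even-n averaging ) :

  import math, statistics

  xs = [-0.737, 0.511, -0.083, 0.066, -0.562, -0.906, 0.358, 0.359]
  n = len(xs)

  xbar = sum(xs) / n                              # sample mean
  s2 = sum((x - xbar) ** 2 for x in xs) / n       # sample variance (divisor n)

  print(round(xbar, 3))                  # -0.124
  print(round(s2, 2))                    # 0.26
  print(round(math.sqrt(s2), 2))         # 0.51
  print(sorted(xs))                      # the order statistic
  print(statistics.median(xs))           # (-0.083 + 0.066)/2 = -0.0085
  print(round(max(xs) - min(xs), 3))     # sample range : 1.417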

251
The Sample Mean
Suppose the population mean and standard deviation are µ and σ .

As before, the sample mean

  X̄ ≡ (1/n) (X1 + X2 + · · · + Xn ) ,

is also a random variable , with expected value

  µX̄ ≡ E[X̄] = E[ (1/n) (X1 + X2 + · · · + Xn ) ] = µ ,

and variance

  σX̄² ≡ V ar(X̄) = σ²/n .

Standard deviation of X̄ : σX̄ = σ/√n .

NOTE : The sample mean approximates the population mean µ .
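
NOTE : A small simulation sketch of these facts ( NumPy ; the values µ = 10 , σ = 2 , n = 25 are arbitrary illustrations, not from the notes ) :

  import numpy as np

  rng = np.random.default_rng(0)
  mu, sigma, n = 10.0, 2.0, 25

  # 10000 independent samples of size n ; each row gives one sample mean
  xbars = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

  print(xbars.mean())   # close to mu = 10
  print(xbars.std())    # close to sigma / sqrt(n) = 0.4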

252
How well does the sample mean approximate the population mean ?

From the Corollary to the CLT we know that

  ( X̄ − µ ) / ( σ/√n )

is approximately standard normal when n is large.

Thus, for given n and z (z > 0) , we can, for example, estimate

  P ( | ( X̄ − µ ) / ( σ/√n ) | ≤ z ) ≈ 1 − 2 Φ(−z) .

( A problem is that we often don't know the value of σ · · · )

253
It follows that

  P ( | ( X̄ − µ ) / ( σ/√n ) | ≤ z ) = P ( | X̄ − µ | ≤ σz/√n )

                                      = P ( µ ∈ [ X̄ − σz/√n , X̄ + σz/√n ] )

                                      ≈ 1 − 2 Φ(−z) ,

which gives us a confidence interval estimate of µ .

254
We found : P ( µ ∈ [ X̄ − σz/√n , X̄ + σz/√n ] ) ≈ 1 − 2 Φ(−z) .

EXAMPLE : We take samples from a given population :

• The population mean µ is unknown .
• The population standard deviation is σ = 3 .
• The sample size is n = 25 .
• The sample mean is X̄ = 4.5 .

Taking z = 2 , we have

  P ( µ ∈ [ 4.5 − (3 · 2)/√25 , 4.5 + (3 · 2)/√25 ] ) = P ( µ ∈ [ 3.3 , 5.7 ] )

                                                      ≈ 1 − 2 Φ(−2) ≈ 95 % .

We call [ 3.3 , 5.7 ] the 95 % confidence interval estimate of µ .
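
NOTE : The same computation as a SciPy sketch ( norm.cdf plays the role of the standard normal Table ) :

  from scipy.stats import norm

  sigma, n, xbar, z = 3.0, 25, 4.5, 2.0

  half_width = sigma * z / n ** 0.5
  print(xbar - half_width, xbar + half_width)   # 3.3 , 5.7
  print(1 - 2 * norm.cdf(-z))                   # approx 0.9545 , i.e. about 95 %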

255
EXERCISE :
As in the preceding example, µ is unknown, σ = 3 , X̄ = 4.5 .
Use the formula

  P ( µ ∈ [ X̄ − σz/√n , X̄ + σz/√n ] ) ≈ 1 − 2 Φ(−z) ,

to determine
• The 50 % confidence interval estimate of µ when n = 25 .
• The 50 % confidence interval estimate of µ when n = 100 .
• The 95 % confidence interval estimate of µ when n = 100 .

NOTE : In the Standard Normal Table , check that


• The 50 % confidence interval corresponds to z = 0.68 ≈ 0.7 .
• The 95 % confidence interval corresponds to z = 1.96 ≈ 2.0 .

256
The Sample Variance

We defined the sample variance as

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)² = ∑_{k=1}^n [ (Xk − X̄)² · (1/n) ] .

Earlier, for discrete random variables X , we defined the variance as

  σ² ≡ E[ (X − µ)² ] ≡ ∑_k [ (Xk − µ)² · p(Xk ) ] .

• These two formulas look deceptively similar !

• In fact, they are quite different !
• The 1st sum (for S²) is only over the sampled X-values.
• The 2nd sum (for σ²) is over all X-values.
• The 1st sum (for S²) has constant weights 1/n .
• The 2nd sum (for σ²) uses the probabilities as weights .

257
We have just argued that the sample variance

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)² ,

and the population variance (for discrete random variables)

  σ² ≡ E[ (X − µ)² ] ≡ ∑_k [ (Xk − µ)² · p(Xk ) ] ,

are quite different.

Nevertheless, we will show that for large n their values are close !

Thus for large n we have the approximation

  S² ≈ σ² .

258
FACT 1 : We (obviously) have that

  X̄ = (1/n) ∑_{k=1}^n Xk   implies   ∑_{k=1}^n Xk = n X̄ .

FACT 2 : From

  σ² ≡ V ar(X) ≡ E[ (X − µ)² ] = E[X²] − µ² ,

we (obviously) have

  E[X²] = σ² + µ² .

FACT 3 : Recall that for independent, identically distributed Xk ,
where each Xk has mean µ and variance σ² , we have

  µX̄ ≡ E[X̄] = µ ,   σX̄² ≡ E[ (X̄ − µ)² ] = σ²/n .

259
FACT 4 : ( Useful for computing S² efficiently ) :

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)² = [ (1/n) ∑_{k=1}^n Xk² ] − X̄² .

PROOF :

  S² = (1/n) ∑_{k=1}^n (Xk − X̄)²

     = (1/n) ∑_{k=1}^n ( Xk² − 2 Xk X̄ + X̄² )

     = (1/n) [ ∑_{k=1}^n Xk² − 2 X̄ ∑_{k=1}^n Xk + n X̄² ]   ( now use Fact 1 )

     = (1/n) [ ∑_{k=1}^n Xk² − 2n X̄² + n X̄² ] = [ (1/n) ∑_{k=1}^n Xk² ] − X̄² .   QED !
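
NOTE : A quick numerical sanity check of Fact 4 on arbitrary data ( the numbers are just for illustration ) :

  xs = [2.0, 5.0, 1.0, 7.0, 4.0]
  n = len(xs)
  xbar = sum(xs) / n

  lhs = sum((x - xbar) ** 2 for x in xs) / n       # the definition of S^2
  rhs = sum(x * x for x in xs) / n - xbar ** 2     # the Fact 4 form
  print(lhs, rhs)                                  # both 4.56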
n k=1 n k=1

260
THEOREM : The sample variance

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)²

has expected value

  E[S²] = ( 1 − 1/n ) · σ² .

PROOF :

  E[S²] = E[ (1/n) ∑_{k=1}^n (Xk − X̄)² ]

        = E[ (1/n) ∑_{k=1}^n Xk² − X̄² ]   ( using Fact 4 )

        = (1/n) ∑_{k=1}^n E[Xk²] − E[X̄²]

        = σ² + µ² − ( σX̄² + µX̄² )   ( using Fact 2 , n + 1 times ! )

        = σ² + µ² − ( σ²/n + µ² ) = ( 1 − 1/n ) σ² .   ( Fact 3 )   QED !

REMARK : Thus lim_{n→∞} E[S²] = σ² .
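
NOTE : A Monte Carlo sketch of the Theorem ( illustrative choices : standard normal population with σ = 1 , n = 5 , 100000 repetitions , so E[S²] should be about 1 − 1/5 = 0.8 ) :

  import numpy as np

  rng = np.random.default_rng(1)
  n, reps = 5, 100_000

  samples = rng.standard_normal((reps, n))
  s2 = samples.var(axis=1, ddof=0)      # divisor n , as in the definition above

  print(s2.mean())                      # close to (1 - 1/n) sigma^2 = 0.8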

261
Most authors instead define the sample variance as

  Ŝ² ≡ ( 1/(n−1) ) ∑_{k=1}^n (Xk − X̄)² .

In this case the Theorem becomes :

THEOREM : The sample variance

  Ŝ² ≡ ( 1/(n−1) ) ∑_{k=1}^n (Xk − X̄)²

has expected value

  E[Ŝ²] = σ² .

EXERCISE : Check this !

262
EXAMPLE : The random sample of 120 values of a uniform
random variable on [−1, 1] in an earlier Table has

  X̄ = (1/120) ∑_{k=1}^{120} Xk = 0.030 ,

  S² = (1/120) ∑_{k=1}^{120} (Xk − X̄)² = 0.335 ,

  S = √S² = 0.579 ,

while

  µ = 0 ,

  σ² = ∫_{−1}^{1} (x − µ)² · (1/2) dx = 1/3 ,

  σ = √σ² = 1/√3 = 0.577 .
• What do you say ?

263
EXAMPLE :

• Generate 50 uniform random numbers in [−1, 1] .

• Compute their average.

• Do the above 500 times.

• Call the results X̄k , k = 1, 2, · · · , 500 .

• Thus each X̄k is the average of 50 random numbers.

• Compute the sample statistics X̄ and S of these 500 values.

• Can you predict the values of X̄ and S ?
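
NOTE : In code, the whole experiment might look like this ( NumPy sketch ; the seed is arbitrary ) :

  import numpy as np

  rng = np.random.default_rng(2)

  # 500 rows, each with 50 uniform random numbers on [-1, 1]
  xbars = rng.uniform(-1.0, 1.0, size=(500, 50)).mean(axis=1)

  print(xbars.mean())          # the sample mean of the 500 averages : close to 0
  print(xbars.std(ddof=0))     # close to sqrt( (1/3) / 50 ) = 0.0816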

264
EXAMPLE : ( continued · · · )

Results :

  X̄ = (1/500) ∑_{k=1}^{500} X̄k = − 0.00136 ,

  S² = (1/500) ∑_{k=1}^{500} (X̄k − X̄)² = 0.00664 ,

  S = √S² = 0.08152 .

EXERCISE :
• What is the value of E[X̄] ?
• Compare X̄ to E[X̄] .
• What is the value of V ar(X̄) ?
• Compare S 2 to V ar(X̄) .

265
Estimating the variance of a normal distribution

We have shown that

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)² ≈ σ² .

How good is this approximation for normal random variables Xk ?

To answer this we need :

FACT 5 :

  ∑_{k=1}^n (Xk − µ)² − ∑_{k=1}^n (Xk − X̄)² = n (X̄ − µ)² .

PROOF :

  LHS = ∑_{k=1}^n { Xk² − 2 Xk µ + µ² − Xk² + 2 Xk X̄ − X̄² }

      = −2n X̄ µ + n µ² + 2n X̄² − n X̄²

      = n X̄² − 2n X̄ µ + n µ² = RHS .   QED !

266
Rewrite Fact 5 ,

  ∑_{k=1}^n (Xk − µ)² − ∑_{k=1}^n (Xk − X̄)² = n (X̄ − µ)² ,

as

  ∑_{k=1}^n ( (Xk − µ)/σ )² − (n/σ²) · (1/n) ∑_{k=1}^n (Xk − X̄)² = ( (X̄ − µ)/(σ/√n) )² ,

and then as

  ∑_{k=1}^n Zk² − (n/σ²) S² = Z² ,

where

  S² is the sample variance ,

and

  Z and Zk are standard normal because the Xk are normal .

Finally, we can write the above as

  (n/σ²) S² = χ²_n − χ²_1 .   ( Why ? )
σ

267
We have found that

  (n/σ²) S² = χ²_n − χ²_1 .

THEOREM : For samples from a normal distribution :

  (n/σ²) S² has the χ²_{n−1} distribution !

PROOF : Omitted (and not as obvious as it might appear !) .

REMARK : If we use the alternate definition

  Ŝ² ≡ ( 1/(n−1) ) ∑_{k=1}^n (Xk − X̄)² ,

then the Theorem becomes

  ( (n−1)/σ² ) Ŝ² has the χ²_{n−1} distribution .
σ

268
For normal random variables : ( (n−1)/σ² ) Ŝ² has the χ²_{n−1} distribution .

EXAMPLE : For a large shipment of light bulbs we know that :

• The lifetime of the bulbs has a normal distribution .

• The standard deviation is claimed to be σ = 100 hours.
  ( The mean lifetime µ is not given. )

Suppose we test the lifetime of 16 bulbs. What is the probability

that the sample standard deviation Ŝ satisfies Ŝ ≥ 129 hours ?

SOLUTION :

  P (Ŝ ≥ 129) = P (Ŝ² ≥ 129²) = P ( ( (n−1)/σ² ) Ŝ² ≥ (15/100²) · 129² )

              = P ( χ²_{15} ≥ 24.96 ) ≈ 5 %   ( from the χ² Table ) .

QUESTION : If Ŝ = 129 then would you believe that σ = 100 ?
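
NOTE : The Table lookup can be reproduced with SciPy's chi-square survival function ( a sketch, not part of the original notes ) :

  from scipy.stats import chi2

  n, sigma, s_hat = 16, 100.0, 129.0

  stat = (n - 1) * s_hat ** 2 / sigma ** 2    # (n-1) S-hat^2 / sigma^2 = 24.96
  print(chi2.sf(stat, df=n - 1))              # P( chi2_15 >= 24.96 ) , approx 0.05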

269
[ Figure : The Chi-Square density functions f (x) for n = 5, 6, · · · , 15 .
(For large n they look like normal density functions .) ]

270
EXERCISE :
In the preceding example, also compute
P ( χ²_{15} ≥ 24.96 )
using the standard normal approximation .

EXERCISE :
Consider the same shipment of light bulbs :

• The lifetime of the bulbs has a normal distribution .

• The mean lifetime is not given.

• The standard deviation is claimed to be σ = 100 hours.

Suppose we test the lifetime of only 6 bulbs .

• For what value of s is P (Ŝ ≤ s) = 5 % ?

271
EXAMPLE : For the data below from a normal population :

• Estimate the population standard deviation.

• Determine a 95 percent confidence interval for σ.

  -0.047   0.126  -0.037   0.148
   0.198   0.073  -0.025  -0.070
  -0.197  -0.026  -0.062  -0.004
  -0.164   0.265  -0.274   0.188

SOLUTION : We find ( with n = 16 ) that

  X̄ = (1/n) ∑_{i=1}^n Xi = 0.00575 ,

and

  Ŝ² = ( 1/(n−1) ) ∑_{k=1}^n (Xk − X̄)² = 0.02278 .

272
SOLUTION : We have n = 16 , X̄ = 0.00575 , Ŝ² = 0.02278 .

• Estimate the population standard deviation :

  ANSWER : σ ≈ Ŝ = √0.02278 = 0.15095 .

• Compute a 95 percent confidence interval for σ :

  ANSWER : From the Chi-Square Table :

  P ( χ²_{15} ≤ 6.26 ) = 0.025 ,   P ( χ²_{15} > 27.49 ) = 0.025 .

  (n − 1) Ŝ² / σ² = 6.26   ⇒  σ² = (n − 1) Ŝ² / 6.26 = 15 · 0.02278 / 6.26 = 0.05458 ,

  (n − 1) Ŝ² / σ² = 27.49  ⇒  σ² = (n − 1) Ŝ² / 27.49 = 15 · 0.02278 / 27.49 = 0.01243 .

Thus the 95 % confidence interval for σ is

  [ √0.01243 , √0.05458 ] = [ 0.111 , 0.234 ] .
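
NOTE : The same interval via SciPy's chi-square quantiles ( sketch ; chi2.ppf inverts the CDF ) :

  from scipy.stats import chi2

  n, s2_hat = 16, 0.02278

  upper = (n - 1) * s2_hat / chi2.ppf(0.025, df=n - 1)   # divide by the 2.5 % quantile
  lower = (n - 1) * s2_hat / chi2.ppf(0.975, df=n - 1)   # divide by the 97.5 % quantile

  print(lower ** 0.5, upper ** 0.5)    # approx [ 0.111 , 0.234 ]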

273
Samples from Finite Populations

Samples from a finite population can be taken

(1) with replacement

(2) without replacement

• In Case 1 the sample


X1 , X2 , · · · , Xn ,
may contain the same outcome more than once.

• In Case 2 the outcomes are distinct .

• Case 2 arises, e.g., when the experiment destroys the sample.

274
EXAMPLE :

Suppose a bag contains three balls, numbered 1, 2, and 3.

A sample of two balls is drawn at random from the bag.

Recall that ( here with n = 2 ) :

  X̄ ≡ (1/n) (X1 + X2 + · · · + Xn ) ,

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)² .

For both sampling with and without replacement , compute

  E[X̄] and E[S²] .

275
• With replacement : The possible samples are

  (1, 1) , (1, 2) , (1, 3) , (2, 1) , (2, 2) , (2, 3) , (3, 1) , (3, 2) , (3, 3) ,

each with equal probability 1/9 .

The sample means X̄ are

  1 , 3/2 , 2 , 3/2 , 2 , 5/2 , 2 , 5/2 , 3 ,

with

  E[X̄] = (1/9) ( 1 + 3/2 + 2 + 3/2 + 2 + 5/2 + 2 + 5/2 + 3 ) = 2 .

The sample variances S² are

  0 , 1/4 , 1 , 1/4 , 0 , 1/4 , 1 , 1/4 , 0 ,   ( Check ! )

with

  E[S²] = (1/9) ( 0 + 1/4 + 1 + 1/4 + 0 + 1/4 + 1 + 1/4 + 0 ) = 1/3 .
9 4 4 4 4 3

276
• Without replacement : The possible samples are

  (1, 2) , (1, 3) , (2, 1) , (2, 3) , (3, 1) , (3, 2) ,

each with equal probability 1/6 .

The sample means X̄ are

  3/2 , 2 , 3/2 , 5/2 , 2 , 5/2 ,

with expected value

  E[X̄] = (1/6) ( 3/2 + 2 + 3/2 + 5/2 + 2 + 5/2 ) = 2 .

The sample variances S² are

  1/4 , 1 , 1/4 , 1/4 , 1 , 1/4 ,   ( Check ! )

with expected value

  E[S²] = (1/6) ( 1/4 + 1 + 1/4 + 1/4 + 1 + 1/4 ) = 1/2 .
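
NOTE : Both computations can be brute-forced by enumerating the sample space ( a sketch using itertools ; statistics.pvariance is the divisor-n variance used above ) :

  from itertools import product, permutations
  from statistics import mean, pvariance

  balls = [1, 2, 3]

  for name, samples in [("with replacement", list(product(balls, repeat=2))),
                        ("without replacement", list(permutations(balls, 2)))]:
      e_xbar = mean(mean(s) for s in samples)
      e_s2 = mean(pvariance(s) for s in samples)
      print(name, e_xbar, e_s2)    # (2, 1/3) and (2, 1/2)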
6 4 4 4 4 2

277
EXAMPLE : ( continued · · · )

A bag contains three balls, numbered 1, 2, and 3.

A sample of two balls is drawn at random from the bag.

We have computed E[X̄] and E[S²] :

• With replacement : E[X̄] = 2 , E[S²] = 1/3 ,

• Without replacement : E[X̄] = 2 , E[S²] = 1/2 .

We also know the population mean and variance :

  µ = 1 · (1/3) + 2 · (1/3) + 3 · (1/3) = 2 ,

  σ² = (1 − 2)² · (1/3) + (2 − 2)² · (1/3) + (3 − 2)² · (1/3) = 2/3 .
3 3 3 3

278
EXAMPLE : ( continued · · · )

We have computed :

• Population statistics : µ = 2 , σ² = 2/3 ,

• Sampling with replacement : E[X̄] = 2 , E[S²] = 1/3 ,

• Sampling without replacement : E[X̄] = 2 , E[S²] = 1/2 .

According to the earlier Theorem

  E[S²] = ( 1 − 1/n ) σ² .

In this example the sample size is n = 2 , thus

  E[S²] = ( 1 − 1/2 ) σ² = 1/3 .

NOTE : The formula for E[S²] is wrong for sampling without replacement !

279
QUESTION :

Why is the formula for E[S²] wrong for sampling without replacement ?

ANSWER : Without replacement the outcomes Xk of a sample

  X1 , X2 , · · · , Xn ,

are not independent !

In our example , where n = 2 , and where the possible samples are

  (1, 2) , (1, 3) , (2, 1) , (2, 3) , (3, 1) , (3, 2) ,

we have, e.g.,

  P (X2 = 1 | X1 = 1) = 0 ,   P (X2 = 1 | X1 = 2) = 1/2 .

Thus X1 and X2 are not independent . ( Why not ? )

280
NOTE :

Let N be the population size and n the sample size .

Suppose N is very large compared to n .

For example, n = 2 , and the population is

{ 1 , 2 , 3 , ··· , N } .

Then we still have

P (X2 = 1 | X1 = 1) = 0 ,

but for k ≠ 1 we have

  P (X2 = k | X1 = 1) = 1/(N − 1) .

One could say that X1 and X2 are "almost independent " . ( Why ? )

281
The Sample Correlation Coefficient

Recall the covariance of random variables X and Y :


σX,Y ≡ Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] = E[XY ] − E[X] E[Y ] .

It is often better to use a scaled version, the correlation coefficient

  ρX,Y ≡ σX,Y / ( σX σY ) ,

where σX and σY are the standard deviations of X and Y .

We have

• | σX,Y | ≤ σX σY ,   ( the Cauchy-Schwarz inequality )
• Thus | ρX,Y | ≤ 1 .   ( Why ? )
• If X and Y are independent then ρX,Y = 0 .   ( Why ? )

282
Similarly, the sample correlation coefficient of a data set

  { (Xi , Yi ) } , i = 1, · · · , N ,

is defined as

  RX,Y ≡ ∑_{i=1}^N (Xi − X̄)(Yi − Ȳ ) / [ √( ∑_{i=1}^N (Xi − X̄)² ) · √( ∑_{i=1}^N (Yi − Ȳ )² ) ] ,

for which we have another version of the Cauchy-Schwarz inequality :

  | RX,Y | ≤ 1 .

Like the covariance, RX,Y measures "concordance " of X and Y :

• If Xi > X̄ when Yi > Ȳ and Xi < X̄ when Yi < Ȳ then


RX,Y > 0 .

• If Xi > X̄ when Yi < Ȳ and Xi < X̄ when Yi > Ȳ then


RX,Y < 0 .

283
The sample correlation coefficient

  RX,Y ≡ ∑_{i=1}^N (Xi − X̄)(Yi − Ȳ ) / [ √( ∑_{i=1}^N (Xi − X̄)² ) · √( ∑_{i=1}^N (Yi − Ȳ )² ) ]

can also be used to test for linearity of the data.

In fact,
• If | RX,Y | = 1 then X and Y are related linearly .

Specifically,
• If RX,Y = 1 then Yi = cXi + d, for constants c, d, with c > 0 .
• If RX,Y = −1 then Yi = cXi + d, for constants c, d, with c < 0 .

Also,
• If | RX,Y | ≈ 1 then X and Y are almost linear .
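
NOTE : A direct sketch of the formula ( NumPy ; np.corrcoef would give the same number ) :

  import numpy as np

  def sample_corr(x, y):
      # R = sum (Xi - Xbar)(Yi - Ybar) / sqrt( sum (Xi - Xbar)^2 * sum (Yi - Ybar)^2 )
      x, y = np.asarray(x, float), np.asarray(y, float)
      dx, dy = x - x.mean(), y - y.mean()
      return (dx * dy).sum() / np.sqrt((dx * dx).sum() * (dy * dy).sum())

  # a perfectly linear data set with negative slope : R = -1
  print(sample_corr([1, 2, 3, 4], [9, 7, 5, 3]))   # -1.0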

284
EXAMPLE :

• Consider the average daily high temperature in Montreal in March.

• The Table shows these averages, taken over a number of years :

 1  -1.52     8  -0.52    15   2.08    22   3.39    29   6.95
 2  -1.55     9  -0.67    16   1.22    23   3.69    30   6.83
 3  -1.72    10   0.01    17   1.73    24   4.45    31   6.93
 4  -0.94    11   0.96    18   1.93    25   4.74
 5  -0.51    12   0.49    19   3.10    26   5.01
 6  -0.29    13   1.26    20   3.05    27   4.66
 7   0.02    14   1.99    21   3.32    28   6.45

Average daily high temperature in Montreal in March : 1943-2014 .


( Source : [Link] )

These data have sample correlation coefficient RX,Y = 0.98 .

285
A scatter diagram showing the average daily high temperature.
The sample correlation coefficient is RX,Y = 0.98

286
EXERCISE :
• The Table below shows class attendance and course grade/100.

• The attendance was sampled in 18 sessions.

11 47 13 43 15 70 17 72 18 96 14 61 5 25 17 74
16 85 13 82 16 67 17 91 16 71 16 50 14 77 12 68
8 62 13 71 12 56 15 81 16 69 18 93 18 77 17 48
14 82 17 66 16 91 17 67 7 43 15 86 18 85 17 84
11 43 17 66 18 57 18 74 13 73 15 74 18 73 17 71
14 69 15 85 17 79 18 84 17 70 15 55 14 75 15 61
16 61 4 46 18 70 0 29 17 82 18 82 16 82 14 68
9 84 15 91 15 77 16 75

Class attendance - Course grade

• Draw a scatter diagram showing the data.

• Determine the sample correlation coefficient .

• Any conclusions ?

287
Maximum Likelihood Estimators

EXAMPLE :

Suppose a random variable has a normal distribution with mean 0 .

Thus the density function is

  f (x) = ( 1 / (√(2π) σ) ) e^( − x² / (2σ²) ) .

• Suppose we don’t know σ (the population standard deviation).

• How can we estimate σ from observed data ?

• ( We want a formula for estimating σ . )

• Don’t we already have such a formula ?

288
EXAMPLE : ( continued · · · )

We know we can estimate σ² by the sample variance

  S² ≡ (1/n) ∑_{k=1}^n (Xk − X̄)² .

In fact, we have proved that

  E[S²] = ( 1 − 1/n ) σ² .

• Thus, we can call S² an estimator of σ² .

• The ”maximum likelihood procedure” derives such estimators.

289
The maximum likelihood procedure is the following :
Let
X1 , X2 , · · · , Xn ,
be
independent, identically distributed ,
each having
density function f (x ; σ) ,
with unknown parameter σ .

By independence , the joint density function is

  f (x1 , x2 , · · · , xn ; σ) = f (x1 ; σ) f (x2 ; σ) · · · f (xn ; σ) .

DEFINITION : The maximum likelihood estimate σ̂ is

the value of σ that maximizes f (x1 , x2 , · · · , xn ; σ) .

NOTE : σ̂ will be a function of x1 , x2 , · · · , xn .

290
EXAMPLE : For our normal distribution with mean 0 we have

  f (x1 , x2 , · · · , xn ; σ) = e^( − (1/(2σ²)) ∑_{k=1}^n xk² ) / ( √(2π) σ )ⁿ .   ( Why ? )

To find the maximum (with respect to σ ) we set

  (d/dσ) f (x1 , x2 , · · · , xn ; σ) = 0 ,   ( by Calculus ! )

or, equivalently , we set

  (d/dσ) log ( e^( − (1/(2σ²)) ∑_{k=1}^n xk² ) / σⁿ ) = 0 .   ( Why equivalent ? )

Taking the (natural) logarithm gives

  (d/dσ) ( − (1/(2σ²)) ∑_{k=1}^n xk² − n log σ ) = 0 .

291
EXAMPLE : ( continued · · · )

We had

  (d/dσ) ( − (1/(2σ²)) ∑_{k=1}^n xk² − n log σ ) = 0 .

Taking the derivative gives

  ( ∑_{k=1}^n xk² ) / σ³ − n/σ = 0 ,

from which

  σ̂² = (1/n) ∑_{k=1}^n xk² .

Thus we have derived the maximum likelihood estimate

  σ̂ = ( (1/n) ∑_{k=1}^n Xk² )^(1/2) .   ( Surprise ? )

292
EXERCISE :

Suppose a random variable has the general normal density function

  f (x ; µ, σ) = ( 1 / (√(2π) σ) ) e^( − (x − µ)² / (2σ²) ) ,
with unknown mean µ and unknown standard deviation σ .

Derive maximum likelihood estimators for both µ and σ as follows :

For the joint density function

f (x1 , x2 , · · · , xn ; µ, σ) = f (x1 ; µ, σ) f (x2 ; µ, σ) · · · f (xn ; µ, σ) ,

• Take the log of f (x1 , x2 , · · · , xn ; µ, σ) .


• Set the partial derivative w.r.t. µ equal to zero.
• Set the partial derivative w.r.t. σ equal to zero.
• Solve these two equations for µ̂ and σ̂ .

293
EXERCISE : ( continued · · · )

The maximum likelihood estimators turn out to be

  µ̂ = (1/n) ∑_{k=1}^n Xk ,

  σ̂ = ( (1/n) ∑_{k=1}^n (Xk − X̄)² )^(1/2) ,
that is,

µ̂ = X̄ , ( the sample mean ) ,

σ̂ = S ( the sample standard deviation ) .

294
NOTE :

• Earlier we defined the sample variance as

  S² = (1/n) ∑_{k=1}^n (Xk − X̄)² .

• Then we proved that, in general,

  E[S²] = ( 1 − 1/n ) σ² ≈ σ² .

• In the preceding exercise we derived the estimator for σ !

• ( But we did so specifically for the general normal distribution. )

295
EXERCISE :

A random variable has the standard exponential distribution


with density function

  f (x ; λ) = λ e^(−λx)  for x > 0 ,  and  0  for x ≤ 0 .

• Suppose we don’t know λ .

• Derive the maximum likelihood estimator of λ .

• ( Can you guess what the formula will be ? )

296
EXAMPLE : Consider the special exponential density function

  f (x ; λ) = λ² x e^(−λx)  for x > 0 ,  and  0  for x ≤ 0 .

[ Figure : Density and distribution functions f (x) and F (x) for λ = 1 , 2 , 3 , 4 . ]

297
EXAMPLE : ( continued · · · )

For the maximum likelihood estimator of λ , we have

  f (x ; λ) = λ² x e^(−λx) ,  for x > 0 ,

so, assuming independence, the joint density function is

  f (x1 , x2 , · · · , xn ; λ) = λ²ⁿ x1 x2 · · · xn e^( −λ (x1 + x2 + · · · + xn ) ) .

To find the maximum (with respect to λ ) we set

  (d/dλ) log ( λ²ⁿ x1 x2 · · · xn e^( −λ (x1 + x2 + · · · + xn ) ) ) = 0 .

Taking the logarithm gives

  (d/dλ) ( 2n log λ + ∑_{k=1}^n log xk − λ ∑_{k=1}^n xk ) = 0 .

298
EXAMPLE : ( continued · · · )

We had

  (d/dλ) ( 2n log λ + ∑_{k=1}^n log xk − λ ∑_{k=1}^n xk ) = 0 .

Differentiating gives

  2n/λ − ∑_{k=1}^n xk = 0 ,

from which

  λ̂ = 2n / ∑_{k=1}^n xk .

Thus we have derived the maximum likelihood estimate

  λ̂ = 2n / ∑_{k=1}^n Xk = 2 / X̄ .

NOTE : This result suggests that perhaps E[X] = 2/λ . ( Why ? )
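
NOTE : A simulation sketch supporting this : the density λ² x e^(−λx) is the Gamma density with shape 2 and scale 1/λ , so we can sample it with NumPy and compare λ̂ = 2/X̄ to the true λ ( the values below are arbitrary illustrations ) :

  import numpy as np

  rng = np.random.default_rng(3)
  lam, n = 1.5, 100_000

  # lambda^2 * x * exp(-lambda x) is Gamma(shape=2, scale=1/lambda)
  xs = rng.gamma(shape=2.0, scale=1.0 / lam, size=n)

  print(2.0 / xs.mean())    # the MLE lambda-hat , close to 1.5
  print(xs.mean())          # close to E[X] = 2/lambda = 1.333...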

299
EXERCISE :

For the special exponential density function in the preceding example,

  f (x ; λ) = λ² x e^(−λx)  for x > 0 ,  and  0  for x ≤ 0 ,

• Verify that

  ∫_0^∞ f (x ; λ) dx = 1 .

• Also compute

  E[X] = ∫_0^∞ x f (x ; λ) dx .

• Is it indeed true that E[X] = 2/λ ?

300
NOTE :
• Maximum likelihood estimates also work in the discrete case .
• In such case we maximize the probability mass function .

EXAMPLE :
Find the maximum likelihood estimator of p in the Bernoulli trial

  P (X = 1) = p ,
  P (X = 0) = 1 − p .

SOLUTION : We can write

  P (x ; p) ≡ P (X = x) = pˣ (1 − p)^(1−x) ,   ( x = 0, 1 )   (!)

so, assuming independence , the joint probability mass function is

  P (x1 , x2 , · · · , xn ; p) = p^(x1) (1 − p)^(1−x1) · p^(x2) (1 − p)^(1−x2) · · · p^(xn) (1 − p)^(1−xn)

                              = p^( ∑_{k=1}^n xk ) · (1 − p)^( n − ∑_{k=1}^n xk ) .

301
EXAMPLE : ( continued · · · )

We found

  P (x1 , x2 , · · · , xn ; p) = p^( ∑_{k=1}^n xk ) · (1 − p)^( n − ∑_{k=1}^n xk ) .

To find the maximum (with respect to p ) we set

  (d/dp) log ( p^( ∑_{k=1}^n xk ) · (1 − p)^( n − ∑_{k=1}^n xk ) ) = 0 .

Taking the logarithm gives

  (d/dp) ( log p · ∑_{k=1}^n xk + ( n − ∑_{k=1}^n xk ) · log(1 − p) ) = 0 .

Differentiating gives

  (1/p) ∑_{k=1}^n xk − n/(1 − p) + ( 1/(1 − p) ) ∑_{k=1}^n xk = 0 .

302
EXAMPLE : ( continued · · · )

We found

  (1/p) ∑_{k=1}^n xk − n/(1 − p) + ( 1/(1 − p) ) ∑_{k=1}^n xk = 0 ,

from which

  ( 1/p + 1/(1 − p) ) ∑_{k=1}^n xk = n/(1 − p) .

Multiplying by 1 − p gives

  ( (1 − p)/p + 1 ) ∑_{k=1}^n xk = (1/p) ∑_{k=1}^n xk = n ,

from which we obtain the maximum likelihood estimator

  p̂ = ( ∑_{k=1}^n Xk ) / n ≡ X̄ .   ( Surprise ? )
n

303
EXERCISE :

Consider the Binomial probability mass function

  P (x ; p) ≡ P (X = x) = ( N choose x ) · pˣ · (1 − p)^(N−x) ,

where x is an integer, (0 ≤ x ≤ N ) .

• What is the joint probability mass function P (x1 , x2 , · · · , xn ; p) ?

• ( Be sure to distinguish between N and n ! )

• Determine the maximum likelihood estimator p̂ of p .

• ( Can you guess what p̂ will be ? )

304
Hypothesis Testing

• Often we want to decide whether a hypothesis is True or False.

• To do so we gather data, i.e., a sample .

• A typical hypothesis is that a random variable has a given mean .

• Based on the data we want to accept or reject the hypothesis.

• To illustrate concepts we consider an example in detail.

305
EXAMPLE :

We consider ordering a large shipment of 50 watt light bulbs.

The manufacturer claims that :


• The lifetime of the bulbs has a normal distribution .
• The mean lifetime is µ = 1000 hours.
• The standard deviation is σ = 100 hours.

We want to test the hypothesis that µ = 1000 .

We assume that :
• The lifetime of the bulbs has indeed a normal distribution .
• The standard deviation is indeed σ = 100 hours.
• We test the lifetime of a sample of 25 bulbs .

306
[ Figure : Left : density function of X , also indicating µ ± σX ( µX = 1000 , σX = 100 ) .
Right : density function of X̄ (n = 25) , also indicating µX̄ ± σX̄ ( µX̄ = 1000 , σX̄ = 20 ) . ]

307
EXAMPLE : ( continued · · · )

We test a sample of 25 light bulbs :

• We find the sample average lifetime is X̄ = 960 hours.

• Do we accept the hypothesis that µ = 1000 hours ?

Using the standard normal Table we have the one-sided probability

  P (X̄ ≤ 960) = Φ( (960 − 1000) / (100/√25) ) = Φ(−2.0) = 2.28 % ,

(assuming that the average lifetime is indeed 1000 hours).

• Would you accept the hypothesis that µ = 1000 ?

• Would you accept (and pay for !) the shipment ?
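
NOTE : The same number from a SciPy sketch ( norm.cdf plays the role of Φ ) :

  from scipy.stats import norm

  mu, sigma, n, xbar = 1000.0, 100.0, 25, 960.0

  z = (xbar - mu) / (sigma / n ** 0.5)
  print(z)              # -2.0
  print(norm.cdf(z))    # approx 0.0228 , i.e. 2.28 %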

308
EXAMPLE : ( continued · · · )

We test a sample of 25 light bulbs :

• Suppose instead the sample average lifetime is X̄ = 1040 hours.

• Do we accept that µ = 1000 hours ?

Using the standard normal Table we have the one-sided probability

  P (X̄ ≥ 1040) = 1 − Φ( (1040 − 1000) / (100/√25) ) = 1 − Φ(2) = Φ(−2) = 2.28 % ,

(assuming again that the average lifetime is indeed 1000 hours).

• Would you accept the hypothesis that the mean is 1000 hours ?

• Would you accept the shipment ? (!)

309
EXAMPLE : ( continued · · · )

Suppose that we accept the hypothesis that µ = 1000 if

960 ≤ X̄ ≤ 1040 .

Thus, if indeed µ = 1000 , we accept the hypothesis with probability

  P ( | X̄ − 1000 | ≤ 40 ) = 1 − 2 Φ( (960 − 1000) / (100/√25) ) = 1 − 2 Φ(−2) ≈ 95 % ,

and we reject the hypothesis with probability

  P ( | X̄ − 1000 | ≥ 40 ) = 100 % − 95 % = 5 % .

310
[ Figure : Density function of X̄ (n = 25) , with µ = µX̄ = 1000 , σX̄ = 20 ,
showing P (960 ≤ X̄ ≤ 1040) ≈ 95 % . ]

311
EXAMPLE : ( continued · · · )

What is the probability of acceptance of the hypothesis if µ is different from 1000 ?

• If the actual mean is µ = 980 , then acceptance has probability

  P (960 ≤ X̄ ≤ 1040) = Φ( (1040 − 980) / (100/√25) ) − Φ( (960 − 980) / (100/√25) )

                      = Φ(3) − Φ(−1) = 1 − Φ(−3) − Φ(−1)

                      = (1 − 0.0013) − 0.1587 = 84 % .

• If the actual mean is µ = 1040 , then acceptance has probability

  P (960 ≤ X̄ ≤ 1040) = Φ( (1040 − 1040) / (100/√25) ) − Φ( (960 − 1040) / (100/√25) )

                      = Φ(0) − Φ(−4) ≈ 50 % .

312
[ Figure : Density functions of X̄ (n = 25 , σX̄ = 20) for µ = µX̄ = 980 , 1000 , 1040 ,
with P (accept) = 84 % , 95 % , 50 % respectively. ]

QUESTION 1 : How does P (accept) change when we “slide” the


density function of X̄ along the X̄-axis , i.e., when µ changes ?

QUESTION 2 : What is the effect of increasing the sample size n ?

313
EXAMPLE :

Now suppose there are two lots of light bulbs :

• Lot 1 : Light bulbs with mean life time µ1 = 1000 hours,

• Lot 2 : Light bulbs with mean life time µ2 = 1100 hours.

We want to decide which lot our sample of 25 bulbs is from.

Consider the decision criterion x̂ , where 1000 ≤ x̂ ≤ 1100 :

• If X̄ ≤ x̂ then the sample is from Lot 1 .

• If X̄ > x̂ then the sample is from Lot 2 .

314
There are two hypotheses :

• H1 : The sample is from Lot 1 (µ1 = 1000) .

• H2 : The sample is from Lot 2 (µ2 = 1100) .

We can make two types of errors :


• Type 1 error : Accept H2 when H1 is True ,

• Type 2 error : Accept H1 when H2 is True ,

which happen when, for given decision criterion x̂ ,

• Type 1 error : If X̄ > x̂ and the sample is from Lot 1 .

• Type 2 error : If X̄ ≤ x̂ and the sample is from Lot 2 .

315
The density functions of X̄ (n = 25) , also indicating x̂ .
blue : (µ1 , σ1 ) = (1000, 100) , red : (µ2 , σ2 ) = (1100, 200) .

Type 1 error : area under the blue curve, to the right of x̂ .


Type 2 error : area under the red curve, to the left of x̂ .

QUESTION : What is the effect of moving x̂ on these errors ?

316
RECALL :

• Type 1 error : If X̄ > x̂ and the sample is from Lot 1 .

• Type 2 error : If X̄ ≤ x̂ and the sample is from Lot 2 .

These errors occur with probability


• Type 1 error : P (X̄ ≥ x̂ | µ = µ1 ≡ 1000) .

• Type 2 error : P (X̄ ≤ x̂ | µ = µ2 ≡ 1100) .

We should have, for the (rather bad) choice x̂ = 1000 ,


• Type 1 error : P (X̄ ≥ 1000 | µ = µ1 ≡ 1000) = 0.5 .

and for the (equally bad) choice x̂ = 1100 ,


• Type 2 error : P (X̄ ≤ 1100 | µ = µ2 ≡ 1100) = 0.5 .

317
[ Figure : Probability of Type 1 error vs. x̂ (left) and of Type 2 error vs. x̂ (right) ,
for (µ1 , σ1 ) = (1000, 100) , (µ2 , σ2 ) = (1100, 100) .

Sample sizes : 2 (red) , 8 (blue) , 32 (black) . ]

318
The probability of Type 1 and Type 2 errors versus x̂ .
Left : (µ1 , σ 1 ) = (1000, 100), (µ2 , σ 2 ) = (1100, 100).
Right : (µ1 , σ 1 ) = (1000, 100), (µ2 , σ 2 ) = (1100, 200).
Colors indicate sample size : 2 (red), 8 (blue), 32 (black) .
Curves of a given color intersect at the minimax x̂-value.

319
The probability of Type 1 and Type 2 errors versus x̂ .

Left : (µ1 , σ 1 ) = (1000, 100), (µ2 , σ 2 ) = (1100, 300).


Right : (µ1 , σ 1 ) = (1000, 100), (µ2 , σ 2 ) = (1100, 400).

Colors indicate sample size : 2 (red), 8 (blue), 32 (black) .


Curves of a given color intersect at the minimax x̂-value.

320
NOTE :

• There is an optimal value x̂∗ of x̂ .

• At x̂∗ the value of

max { P (Type 1 Error) , P (Type 2 Error) }

is minimized .

• We call x̂∗ the minimax value.

• The value of x̂∗ depends on σ1 and σ2 .

• The value of x̂∗ is independent of the sample size.

• (We will prove this!)

321
[ Figure : Left : the population density functions f (x) .
Right : the density functions fX̄ (x) of X̄ (n = 25) .

(µ1 , σ1 ) = (1000 , 100) (blue) , (µ2 , σ2 ) = (1100 , 200) (red) . ]

322
The density functions of X̄ (n = 25) , with minimax value of x̂ .

(µ1 , σ1 ) = (1000, 100) (blue) , (µ2 , σ2 ) = (1100, 200) (red) .

323
The minimax value x̂* of x̂ is easily computed : At x̂* we have

  P ( Type 1 Error ) = P ( Type 2 Error )

⇐⇒

  P (X̄ ≥ x̂* | µ = µ1 ) = P (X̄ ≤ x̂* | µ = µ2 )

⇐⇒

  Φ( (µ1 − x̂*) / (σ1 /√n) ) = Φ( (x̂* − µ2 ) / (σ2 /√n) )

⇐⇒

  (µ1 − x̂*) / (σ1 /√n) = (x̂* − µ2 ) / (σ2 /√n) ,   ( by monotonicity of Φ )

from which

  x̂* = ( µ1 · σ2 + µ2 · σ1 ) / ( σ1 + σ2 ) .   ( Check ! )

With µ1 = 1000 , σ1 = 100 , µ2 = 1100 , σ2 = 200 , we have

  x̂* = ( 1000 · 200 + 1100 · 100 ) / ( 100 + 200 ) = 1033 .
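
NOTE : A sketch that computes x̂* and the two (equal) error probabilities for a given sample size ( n = 25 here, matching the figures ) :

  from scipy.stats import norm

  mu1, sigma1 = 1000.0, 100.0
  mu2, sigma2 = 1100.0, 200.0
  n = 25

  x_star = (mu1 * sigma2 + mu2 * sigma1) / (sigma1 + sigma2)
  print(x_star)    # 1033.33...

  p1 = norm.sf(x_star, loc=mu1, scale=sigma1 / n ** 0.5)    # P( Xbar >= x* | mu1 )
  p2 = norm.cdf(x_star, loc=mu2, scale=sigma2 / n ** 0.5)   # P( Xbar <= x* | mu2 )
  print(p1, p2)    # equal at the minimax point ( approx 0.0478 each )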

324
Thus we have proved the following :

FACT : Suppose Lot 1 and Lot 2 are normally distributed ,


with mean and standard deviation

(µ1 , σ1 ) and (µ2 , σ2 ) , where (µ1 < µ2 ) ,

and sample size n .


Then the value of the decision criterion x̂ that minimizes

  max { P (Type 1 Error) , P (Type 2 Error) } ,

i.e., the value of x̂ that minimizes

  max { P (X̄ ≥ x̂ | µ = µ1 , σ = σ1 ) , P (X̄ ≤ x̂ | µ = µ2 , σ = σ2 ) } ,

is given by

  x̂* = ( σ1 µ2 + σ2 µ1 ) / ( σ1 + σ2 ) .

325
EXERCISE :

Determine the optimal decision criterion x̂∗ that minimizes

max { P (Type 1 Error) , P (Type 2 Error) } ,


when

(µ1 , σ1 ) = (1000, 200) , (µ2 , σ2 ) = (1100, 300) .

For this x̂∗ find the probability of a Type 1 and a Type 2 Error ,

when

n=1 , n = 25 , n = 100 .

326
EXAMPLE ( Known standard deviation ) :

Given : A sample of size 9 from a normal population with σ = 0.2


has sample mean
X̄ = 4.88 ,

Claim : The population mean is


µ = 5.00 , ( the ”null hypothesis ” H0 )

We see that | X̄ − µ | = | 4.88 − 5.00 | = 0.12 .

We reject H0 if P ( | X̄ − µ | ≥ 0.12 ) is rather small , say, if

P ( | X̄ − µ | ≥ 0.12 ) < 10 % ( ”level of significance ” 10 % )

Do we accept H0 ?

327
SOLUTION ( Known standard deviation ) :

Given : n = 9 , σ = 0.2 , X̄ = 4.88 , µ = 5.0 , | X̄ − µ | = 0.12 .

Since

  Z ≡ (X̄ − µ) / (σ/√n)   is standard normal ,

the "p-value " ( from the standard normal Table ) is

  P ( | X̄ − µ | ≥ 0.12 ) = P ( | Z | ≥ 0.12 / (0.2/√9) )

                         = P ( | Z | ≥ 1.8 ) = 2 Φ(−1.8) ≈ 7.18 % .

Thus we reject the hypothesis that µ = 5.00 at significance level 10 % .

NOTE : We would accept H0 if the level of significance were 5 % .


( We are “more tolerant” when the level of significance is smaller. )
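
NOTE : A one-line check of this two-sided p-value ( SciPy sketch ) :

  from scipy.stats import norm

  n, sigma, xbar, mu = 9, 0.2, 4.88, 5.0

  z = abs(xbar - mu) / (sigma / n ** 0.5)
  print(z)                   # 1.8
  print(2 * norm.cdf(-z))    # approx 0.0719 , i.e. 7.18 %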

328
EXAMPLE ( Unknown standard deviation, large sample ) :

Given : A sample of size n = 64 from a normal population has

sample mean X̄ = 4.847 ,


and
sample standard deviation Ŝ = 0.234 .

Test the hypothesis that µ ≤ 4.8 ,

and reject it if P (X̄ ≥ 4.847) is small, say if

P ( X̄ ≥ 4.847 ) < 5 % .

NOTE : Since the sample size n = 64 is large, we can assume that

  σ ≈ Ŝ = 0.234 .

329
SOLUTION ( Unknown standard deviation, large sample ) :

Given X̄ = 4.847 , µ = 4.8 , n = 64 , and σ ≈ Ŝ = 0.234 .

Using the standard normal approximation we have that

  X̄ ≥ 4.847

if and only if

  Z ≡ (X̄ − µ) / (σ/√n) ≥ (4.847 − 4.8) / (0.234/8) ≈ 1.6 .

From the standard normal Table we have the p-value

  P (Z ≥ 1.6) = 1 − Φ(1.6) = Φ(−1.6) = 5.48 % .

CONCLUSION :
We (barely) accept H0 at level of significance 5 % .
( We would reject H0 at level of significance 10 % . )

330
EXAMPLE ( Unknown standard deviation, small sample ) :

A sample of size n = 16 from a normal population has

sample mean X̄ = 4.88 ,


and
sample standard deviation Ŝ = 0.234 .

Test the null hypothesis


H0 : µ ≤ 4.8 ,
and reject it if
P (X̄ ≥ 4.88) < 5 % .

NOTE :
If n ≤ 30 then the approximation σ ≈ Ŝ is not so accurate.
In this case it is better to use the "Student t-distribution " Tn−1 .

331
The t-distribution Table

 n     α = 0.1    α = 0.05    α = 0.01    α = 0.005
5 -1.476 -2.015 -3.365 -4.032
6 -1.440 -1.943 -3.143 -3.707
7 -1.415 -1.895 -2.998 -3.499
8 -1.397 -1.860 -2.896 -3.355
9 -1.383 -1.833 -2.821 -3.250
10 -1.372 -1.812 -2.764 -3.169
11 -1.363 -1.796 -2.718 -3.106
12 -1.356 -1.782 -2.681 -3.055
13 -1.350 -1.771 -2.650 -3.012
14 -1.345 -1.761 -2.624 -2.977
15 -1.341 -1.753 -2.602 -2.947

This Table shows tα,n values such that P (Tn ≤ tα,n ) = α .


( For example, P (T10 ≤ −2.764) = 1% )
SOLUTION ( Unknown standard deviation, small sample ) :

With n = 16 we have X̄ ≥ 4.88 if and only if

  Tn−1 = T15 ≡ (X̄ − µ) / (Ŝ/√n) ≥ (4.88 − 4.8) / (0.234/4) ≈ 1.37 .

The t-distribution Table shows that

  P ( T15 ≥ 1.341 ) = P ( T15 ≤ −1.341 ) = 10 % ,

  P ( T15 ≥ 1.753 ) = P ( T15 ≤ −1.753 ) = 5 % .

Thus we reject H0 at level of significance 10 % ,

but we accept H0 at level of significance 5 % .

( We are “more tolerant” when the level of significance is smaller. )
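
NOTE : For comparison, the exact tail probability from SciPy's t distribution ( sketch ) :

  from scipy.stats import t

  n, xbar, mu, s_hat = 16, 4.88, 4.8, 0.234

  t_stat = (xbar - mu) / (s_hat / n ** 0.5)
  print(t_stat)                    # approx 1.37
  print(t.sf(t_stat, df=n - 1))    # P( T_15 >= 1.37 ) , between 5 % and 10 %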

332
EXAMPLE ( Testing a hypothesis on the standard deviation ) :
A sample of 16 items from a normal population has sample
standard deviation Ŝ = 2.58 .
Do you believe the population standard deviation satisfies σ ≤ 2.0 ?

SOLUTION : We know that

  ( (n−1)/σ² ) Ŝ² has the χ²_{n−1} distribution .

For our data :

  Ŝ ≥ 2.58   if and only if   ( (n−1)/σ² ) Ŝ² ≥ (15/4) · 2.58² = 24.96 ,

and from the χ² Table

  P ( χ²_{15} ≥ 25.0 ) ≈ 5.0 % .

Thus we (barely) accept the hypothesis at significance level 5 % .


( We would reject the hypothesis at significance level 10 % . )

333
EXERCISE :

A sample of 16 items from a normal population has sample


standard deviation

Ŝ = 0.83 .

Do you believe the hypothesis that σ satisfies

σ ≤ 1.2 ?

( Probably Yes ! )

334
