UNIT 5 RANDOM VARIABLES


Structure
5.1 Introduction
Objectives

5.2 Random Variable


5.3 Discrete Random Variable and Probability Mass Function
5.4 Continuous Random Variable and Probability Density Function
5.5 Distribution Function
5.6 Summary
5.7 Solutions/Answers

5.1 INTRODUCTION
In the previous units, we have studied the assignment and computation of
probabilities of events in detail. In those units, we were interested in the
occurrence of outcomes. In the present unit, we will be interested in the
numbers associated with such outcomes of random experiments. Such an
interest leads to the study of the concept of a random variable.
In this unit, we introduce the concept of a random variable, and discrete and
continuous random variables, in Sec. 5.2, and their probability functions in
Secs. 5.3 and 5.4.
Objectives
A study of this unit would enable you to:
• define a random variable, discrete and continuous random variables;
• specify the probability mass function, i.e. the probability distribution of a
discrete random variable;
• specify the probability density function, i.e. the probability function of a
continuous random variable; and
• define the distribution function.

5.2 RANDOM VARIABLE


The performance of random experiments and the computation of
probabilities of events (subsets of the sample space) have been studied in
detail in the first four units of this course. In many experiments, we may be
interested in a numerical characteristic associated with the outcomes of a
random experiment. Like the outcome itself, the value of such a numerical
characteristic cannot be predicted in advance.
For example, suppose a die is tossed twice and we are interested in the number
of times an odd number appears. Let X be the number of appearances of an
odd number. If a die is thrown twice, an odd number may appear 0 times
(i.e. we may have an even number both times), once (i.e. an odd number in one
throw and an even number in the other), or twice (i.e. an odd number both
times). Here, X can take the values 0, 1, 2 and is a variable quantity behaving
randomly, hence we may call it a 'random variable'. Also notice that its values
are real and are defined on the sample space
{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
i.e.
X = 0, if the outcome is (2, 2), (2, 4), (2, 6), (4, 2), (4, 4), (4, 6), (6, 2), (6, 4), (6, 6);
X = 1, if the outcome is (1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (3, 6),
       (4, 1), (4, 3), (4, 5), (5, 2), (5, 4), (5, 6), (6, 1), (6, 3), (6, 5);
X = 2, if the outcome is (1, 1), (1, 3), (1, 5), (3, 1), (3, 3), (3, 5), (5, 1), (5, 3), (5, 5).

So, P[X = 0] = 9/36 = 1/4, P[X = 1] = 18/36 = 1/2, P[X = 2] = 9/36 = 1/4,
and P[X = 0] + P[X = 1] + P[X = 2] = 1/4 + 1/2 + 1/4 = 1.
Observe that, a probability can be assigned to the event that X assumes a
particular value. It can also be observed that the sum of the probabilities
corresponding to different values of X is one.
So, a random variable can be defined as below:
Definition: A random variable is a real-valued function whose domain is a set
of possible outcomes of a random experiment and range is a sub-set of the set
of real numbers and has the following properties:
i) Each particular value of the random variable can be assigned some
probability.
ii) Summing the probabilities associated with all the different values of the
random variable gives the value 1 (unity).
Remark 1: We shall denote random variables by capital letters like X, Y, Z,
etc. and write r.v. for random variable.
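As a quick illustration (this sketch is ours, not part of the printed text), the die
example above can be checked with a few lines of Python, assuming a standard
Python installation: it enumerates the 36 equally likely outcomes and tallies the
probability mass of X.

```python
# Enumerate the 36 outcomes of two die throws and count how many give
# X = 0, 1, 2 odd faces; dividing by 36 yields the probabilities of X.
from fractions import Fraction
from itertools import product

counts = {0: 0, 1: 0, 2: 0}
for a, b in product(range(1, 7), repeat=2):
    x = (a % 2) + (b % 2)          # number of odd faces in this outcome
    counts[x] += 1

pmf = {x: Fraction(c, 36) for x, c in counts.items()}
print(pmf)                # probabilities 1/4, 1/2, 1/4 for X = 0, 1, 2
print(sum(pmf.values()))  # 1, as the probabilities must sum to unity
```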

5.3 DISCRETE RANDOM VARIABLE AND
PROBABILITY MASS FUNCTION
Discrete Random Variable
A random variable is said to be discrete if it has either a finite or a countable
number of values. A countable number of values means values which can be
arranged in a sequence, i.e. values which are in one-to-one correspondence
with the set of natural numbers; in other words, from a few successive known
terms we can identify a rule and hence write the subsequent terms. For
example, suppose X is a random variable taking the values 2, 5, 8, 11, …;
then we can write the fifth, sixth, … values, because the values are in
one-to-one correspondence with the set of natural numbers and have the
general term 3n − 1, i.e. on taking n = 1, 2, 3, 4, 5, … we have
2, 5, 8, 11, 14, …. So, X in this example is a discrete random variable. The
number of students present each day in a class during an academic session is
another example of a discrete random variable, as the number cannot take a
fractional value.
Probability Mass Function
Let X be a r.v. which takes the values x1, x2, ... and let P  X  x i  = p(xi). This
function p(xi), i =1,2, … defined for the values x1, x2, … assumed by X is
called probability mass function of X satisfying p(xi)  0 and  p  x i   1 .
i

The set {(x1, p(x1)), (x2, p(x2)), ...} specifies the probability distribution of the
discrete r.v. X. The probability distribution of X can also be exhibited in the
following manner:

X      x1      x2      x3     …
p(x)   p(x1)   p(x2)   p(x3)  …

Now, let us take up some examples concerning probability mass function:


Example 1: State, giving reasons, which of the following are not probability
distributions:

(i)
X      0     1
p(x)   1/2   3/4

(ii)
X      0     1      2
p(x)   3/4   −1/2   3/4

(iii)
X      0     1     2
p(x)   1/4   1/2   1/4

(iv)
X      0     1     2     3
p(x)   1/8   3/8   1/4   1/8

Solution:
(i) Here p( x i )  0, i = 1, 2; but
2
1 3 5
px 
i 1
i = p( x1 ) + p( x 2 ) = p(0) + p(1) =    1.
2 4 4
2
So, the given distribution is not a probability distribution as  p  x  is
i 1
i

greater than 1.
1
(ii) It is not probability distribution as p(x2) = p(1) =  i.e. negative
2
(iii) Here, p(x i )  0 , i = 1, 2, 3
3
1 1 1
and  p  x   p  x   p  x   p  x   p  0   p 1  p  2   4  2  4  1 .
i 1
i 1 2 3

 The given distribution is probability distribution.


(iv) Here, p(xi)  0, i = 1, 2, 3, 4; but
4

 p  x  = p( x ) + p( x
i 1
i 1 2 ) + p( x3 ) + p( x 4 )

1 3 1 1 7
= p(0) + p(1) + p(2) + p(3) =      1.
8 8 4 8 8
The given distribution is not probability distribution.
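The checks of Example 1 are easy to mechanise; the following is a minimal
sketch of ours (not from the text) that tests both conditions of a probability
mass function using exact fractions.

```python
# Test the two p.m.f. conditions: p(x) >= 0 for every x, and sum p(x) = 1.
from fractions import Fraction as F

def is_pmf(probs):
    return all(p >= 0 for p in probs) and sum(probs) == 1

print(is_pmf([F(1, 2), F(3, 4)]))                    # (i)   False: sum is 5/4
print(is_pmf([F(3, 4), F(-1, 2), F(3, 4)]))          # (ii)  False: a negative value
print(is_pmf([F(1, 4), F(1, 2), F(1, 4)]))           # (iii) True
print(is_pmf([F(1, 8), F(3, 8), F(1, 4), F(1, 8)]))  # (iv)  False: sum is 7/8
```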
Example 2: For the following probability distribution of a discrete r.v. X, find
i) the constant c,
ii) P[X ≤ 3] and
iii) P[1 < X < 4].

X      0   1   2   3    4    5
p(x)   0   c   c   2c   3c   c
Solution:

i) As the given distribution is a probability distribution,
Σ_i p(x_i) = 1
⇒ 0 + c + c + 2c + 3c + c = 1 ⇒ 8c = 1 ⇒ c = 1/8
ii) P[X ≤ 3] = P[X = 3] + P[X = 2] + P[X = 1] + P[X = 0]
= 2c + c + c + 0 = 4c = 4 × (1/8) = 1/2.
iii) P[1 < X < 4] = P[X = 2] + P[X = 3] = c + 2c = 3c = 3 × (1/8) = 3/8.
Example 3: Find the probability distribution of the number of heads when
three fair coins are tossed simultaneously.
Solution: Let X be the number of heads in the toss of three fair coins.
As the random variable, “the number of heads” in a toss of three coins may be
0 or 1 or 2 or 3 associated with the sample space
{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT},
∴ X can take the values 0, 1, 2, 3, with
P[X = 0] = P[TTT] = 1/8
P[X = 1] = P[{HTT, THT, TTH}] = 3/8
P[X = 2] = P[{HHT, HTH, THH}] = 3/8
P[X = 3] = P[HHH] = 1/8.
The probability distribution of X, i.e. of the number of heads when three coins
are tossed simultaneously, is

X      0     1     2     3
p(x)   1/8   3/8   3/8   1/8

which is the required probability distribution.


Example 4: A r.v. X assumes the values −2, −1, 0, 1, 2 such that
P[X = −2] = P[X = −1] = P[X = 1] = P[X = 2],
P[X < 0] = P[X = 0] = P[X > 0].
Obtain the probability mass function of X.

Solution: As P[X < 0] = P[X = 0] = P[X > 0],
⇒ P[X = −1] + P[X = −2] = P[X = 0] = P[X = 1] + P[X = 2]
⇒ p + p = P[X = 0] = p + p
[letting P[X = 1] = P[X = 2] = P[X = −1] = P[X = −2] = p]
⇒ P[X = 0] = 2p.
Now, as P[X < 0] + P[X = 0] + P[X > 0] = 1,
⇒ P[X = −1] + P[X = −2] + P[X = 0] + P[X = 1] + P[X = 2] = 1
⇒ p + p + 2p + p + p = 1
⇒ 6p = 1 ⇒ p = 1/6
⇒ P[X = 0] = 2p = 2/6, and
P[X = −1] = P[X = −2] = P[X = 1] = P[X = 2] = p = 1/6.
Hence, the probability distribution of X is given by

X      −2    −1    0     1     2
p(x)   1/6   1/6   2/6   1/6   1/6

Now, here are some exercises for you.


E1) 2 bad articles are mixed with 5 good ones. Find the probability
distribution of the number of bad articles, if 2 articles are drawn at
random.
E2) Given the probability distribution:

X      0      1      2     3
p(x)   1/10   3/10   1/2   1/10

Let Y = X² + 2X. Find the probability distribution of Y.


E3) An urn contains 3 white and 4 red balls. 3 balls are drawn one by one
with replacement. Find the probability distribution of the number of red
balls.

Let us define and explain a continuous random variable and its probability
function in the next section.

5.4 CONTINUOUS RANDOM VARIABLE AND


PROBABILITY DENSITY FUNCTION
In Sec. 5.3 of this unit, we defined a discrete random variable as a random
variable having a countable number of values, i.e. whose values can be
arranged in a sequence. But if a random variable is such that its values cannot
be arranged in a sequence, it is called a continuous random variable. The
temperature of a city at various points of time during a day is an example of a
continuous random variable, as the temperature takes uncountably many
values, i.e. it can take fractional values also. So, a random variable is said to be
continuous if it can take all possible real (i.e. integer as well as fractional)
values between two certain limits. For example, let us denote by X the variable
"difference between the rainfall (in cm) of one city and that of another city on
every rainy day in a rainy season"; then X is a continuous random variable, as
it can take any real value between two certain limits. Notice that for a
continuous random variable the chance of occurrence of any particular value
of the variable is vanishingly small, so instead of specifying the probability
that the variable takes a particular value, we specify the probability of its lying
within an interval. For example, the chance that an athlete will finish a race in,
say, exactly 10 seconds is practically zero, as it is very rare to finish a race in
an exactly fixed time. Here, the probability is specified for an interval: we may
be interested in the probability of the athlete finishing the race in an interval of,
say, 10 to 12 seconds.
So, a continuous random variable is represented in a different way, by what is
known as a probability density function, unlike a discrete random variable,
which is represented by a probability mass function.
Probability Density Function
Let f(x) be a continuous function of x. Suppose the shaded region ABCD
shown in Fig. 5.1 represents the area bounded by y = f(x), the x-axis and the
ordinates at the points x and x + δx, where δx is the length of the interval
(x, x + δx).

[Fig. 5.1: the region ABCD under the curve y = f(x) between the ordinates at x and x + δx]

Now, if δx is very small, the arc AB is nearly a straight line and hence the
shaded region is nearly a rectangle whose area is AD × DC, i.e. f(x)δx
[∵ AD = the value of y at x, i.e. f(x), and DC = the length δx of the interval
(x, x + δx)].
Also, this area = probability that X lies in the interval (x, x + δx)
= P[x ≤ X ≤ x + δx].
Hence,
P[x ≤ X ≤ x + δx] = f(x)δx
⇒ P[x ≤ X ≤ x + δx]/δx = f(x), where δx is very small
⇒ lim_{δx→0} P[x ≤ X ≤ x + δx]/δx = f(x).
f(x), so defined, is called the probability density function.
The probability density function has the same properties as the probability
mass function. So, f(x) ≥ 0, and the sum of the probabilities of all possible
values that the random variable can take has to be 1. But here, as X is a
continuous random variable, the summation is carried out through
'integration', and hence
∫_R f(x) dx = 1,
where the integral is taken over the entire range R of values of X.
Remark 2
i) Summation and integration play the same role here; the difference is that
the former is used in the case of discrete, i.e. countable, values while the
latter is used in the continuous case.
ii) An essential property of a continuous random variable is that there is zero
probability that it takes any specified numerical value; the probability that
it takes a value in a specified interval, however, is non-zero and is
calculable as a definite integral of the probability density function of the
random variable. Hence the probability that a continuous r.v. X will lie
between two values a and b is given by
P[a < X < b] = ∫_a^b f(x) dx.

Example 5: A continuous random variable X has the probability density
function:
f(x) = Ax³, 0 ≤ x ≤ 1.
Determine
i) A
ii) P[0.2 < X < 0.5]
iii) P[X > 3/4 given X > 1/2]
Solution:
(i) As f(x) is a probability density function,
∫_R f(x) dx = 1
⇒ ∫_0^1 Ax³ dx = 1 ⇒ A[x⁴/4]_0^1 = 1 ⇒ A(1/4 − 0) = 1 ⇒ A = 4.
(ii) P[0.2 < X < 0.5] = ∫_{0.2}^{0.5} f(x) dx = ∫_{0.2}^{0.5} 4x³ dx = 4[x⁴/4]_{0.2}^{0.5}
= (0.5)⁴ − (0.2)⁴ = 0.0625 − 0.0016 = 0.0609.
(iii) P[X > 3/4 given X > 1/2] = P[(X > 3/4) ∩ (X > 1/2)] / P[X > 1/2]
[∵ P(A | B) = P(A ∩ B)/P(B)]
= P[X > 3/4] / P[X > 1/2]
[∵ the common portion of X > 3/4 and X > 1/2 is X > 3/4]
Now, P[X > 3/4] = ∫_{3/4}^{1} f(x) dx
[the lower limit is 3/4 and the upper limit, given in the problem, is 1]
= ∫_{3/4}^{1} 4x³ dx = [x⁴]_{3/4}^{1} = 1 − 81/256 = 175/256, and
P[X > 1/2] = ∫_{1/2}^{1} f(x) dx = [x⁴]_{1/2}^{1} = 1 − 1/16 = 15/16.
∴ the required probability = P[X > 3/4] / P[X > 1/2]
= (175/256) × (16/15) = 35/48.
Example 6: The p.d.f. of the different weights of a "1 litre pure ghee pack" of
a company is given by:
f(x) = 200(x − 1), for 1 ≤ x ≤ 1.1
     = 0, otherwise
Examine whether the given p.d.f. is a valid one. If yes, find the probability that
the weight of any pack will lie between 1.01 and 1.02.
Solution: For 1 ≤ x ≤ 1.1, we have f(x) ≥ 0, and
∫_1^{1.1} f(x) dx = ∫_1^{1.1} 200(x − 1) dx = 200[x²/2 − x]_1^{1.1}
= 200[((1.1)²/2 − 1.1) − (1/2 − 1)]
= 200[(1.21 − 2.2)/2 − (1 − 2)/2] = 200[(−0.99 + 1)/2] = 200 × 0.005 = 1.
∴ f(x) is a p.d.f.
Now, P[1.01 < X < 1.02] = ∫_{1.01}^{1.02} 200(x − 1) dx = 200[x²/2 − x]_{1.01}^{1.02}
= 200[((1.02)²/2 − 1.02) − ((1.01)²/2 − 1.01)]
= 200[(1.0404/2 − 1.02) − (1.0201/2 − 1.01)]
= 200[(0.5202 − 1.02) − (0.51005 − 1.01)]
= 200 × 0.00015 = 0.03.

Now, you can try the following exercise.


E4) The life (in hours) X of a certain type of light bulb may be supposed to
be a continuous random variable with p.d.f.:
f(x) = A/x³, 1500 ≤ x ≤ 2500
     = 0, elsewhere.
Determine the constant A and compute the probability that
1600 ≤ X ≤ 2000.

5.5 DISTRIBUTION FUNCTION


A function F defined for all values of a random variable X by
F(x) = P[X ≤ x] is called the distribution function. It is also known as the
cumulative distribution function (c.d.f.) of X, since it is the cumulative
probability of X up to and including the value x. As X can take any real value,
the domain of the distribution function is the set of real numbers, and as F(x)
is a probability value, the range of the distribution function is [0, 1].
Remark 3: Here, X denotes the random variable and x represents a particular
value of the random variable. F(x) may also be written as F_X(x), which
means that it is the distribution function of the random variable X.

Discrete Distribution Function

The distribution function of a discrete random variable is said to be a discrete
distribution function or cumulative distribution function (c.d.f.). Let X be a
discrete random variable taking the values x1, x2, x3, … with respective
probabilities p1, p2, p3, …
Then F(x_i) = P[X ≤ x_i] = P[X = x1] + P[X = x2] + … + P[X = x_i]
= p1 + p2 + ... + p_i.
The distribution function of X, in this case, is given as in the following table:

X     F(x)
x1    p1
x2    p1 + p2
x3    p1 + p2 + p3
x4    p1 + p2 + p3 + p4
.     .
.     .

The value of F(x) corresponding to the last value of the random variable X is
always 1, as it is the sum of all the probabilities. F(x) remains 1 beyond this
last value of X as well, since, being a probability, it can never exceed one.
For example, let X be a random variable having the following probability
distribution:

X      0     1     2
p(x)   1/4   1/2   1/4

Notice that p(x) is zero for other values of X. Then the distribution function
of X is given by

X    F(x) = P[X ≤ x]
0    1/4
1    1/4 + 1/2 = 3/4
2    1/4 + 1/2 + 1/4 = 1

Here, for the last value, i.e. for X = 2, we have F(x) = 1.
Also, if we take a value beyond 2, say 4, then we get
F(4) = P[X ≤ 4]
= P[X = 4] + P[X = 3] + P[X ≤ 2]
= 0 + 0 + 1 = 1.
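In the discrete case the c.d.f. is just a running total of the p.m.f.; here is a
minimal sketch of ours (an illustration, not from the text) using the distribution
above.

```python
# Build F(x) = P[X <= x] from a p.m.f. by accumulating probabilities.
from fractions import Fraction as F
from itertools import accumulate

xs = [0, 1, 2]
ps = [F(1, 4), F(1, 2), F(1, 4)]

cdf = dict(zip(xs, accumulate(ps)))   # partial sums p1, p1+p2, ...
print(cdf)                            # F(0)=1/4, F(1)=3/4, F(2)=1
```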
Example 7: A random variable X has the following probability function:

X      0   1      2     3     4      5       6      7
p(x)   0   1/10   1/5   1/5   3/10   1/100   1/50   17/100

Determine the distribution function of X.
Solution: Here,
F(0) = P[X ≤ 0] = P[X = 0] = 0,
F(1) = P[X ≤ 1] = P[X = 0] + P[X = 1] = 0 + 1/10 = 1/10,
F(2) = P[X ≤ 2] = P[X = 0] + P[X = 1] + P[X = 2] = 0 + 1/10 + 1/5 = 3/10,
and so on. Thus, the distribution function F(x) of X is given in the following
table:

X    F(x) = P[X ≤ x]
0    0
1    1/10
2    3/10
3    3/10 + 1/5 = 1/2
4    1/2 + 3/10 = 4/5
5    4/5 + 1/100 = 81/100
6    81/100 + 1/50 = 83/100
7    83/100 + 17/100 = 1

Continuous Distribution Function

The distribution function of a continuous random variable is called the
continuous distribution function or cumulative distribution function (c.d.f.).
Let X be a continuous random variable having the probability density function
f(x), as defined in the last section of this unit. Then the distribution function
F(x) is given by
F(x) = P[X ≤ x] = ∫_{−∞}^{x} f(x) dx.
Also, in the last section, we defined the p.d.f. f(x) as
f(x) = lim_{δx→0} P[x ≤ X ≤ x + δx]/δx
⇒ f(x) = lim_{δx→0} (P[X ≤ x + δx] − P[X ≤ x])/δx
⇒ f(x) = lim_{δx→0} (F(x + δx) − F(x))/δx
⇒ f(x) = derivative of F(x) with respect to x [by definition of the derivative]
⇒ f(x) = F′(x)
⇒ f(x) = (d/dx) F(x)
⇒ dF(x) = f(x) dx
Here, dF(x) is known as the probability differential.
So, F(x) = ∫_{−∞}^{x} f(x) dx and F′(x) = f(x).

Example 8: The diameter X of a cable is assumed to be a continuous random
variable with p.d.f.:
f(x) = 6x(1 − x), 0 ≤ x ≤ 1
     = 0, elsewhere.
Obtain the c.d.f. of X.
Solution: For 0 ≤ x ≤ 1, the c.d.f. of X is given by
F(x) = P[X ≤ x] = ∫_0^x f(x) dx = ∫_0^x 6x(1 − x) dx
= 6 ∫_0^x (x − x²) dx = 6[x²/2 − x³/3]_0^x = 3x² − 2x³

∴ The c.d.f. of X is given by
F(x) = 0,           x < 0
     = 3x² − 2x³,   0 ≤ x ≤ 1
     = 1,           x > 1.
Remark 4: In the above example, F(x) is taken as 0 for x < 0 since f(x) = 0
for x < 0; and F(x) is taken as 1 for x > 1 since F(1) = 1 and therefore, for
x > 1 also, F(x) remains 1.
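Example 8 can also be verified symbolically; this is our sketch, assuming
SymPy (any computer algebra system would do): integrating the p.d.f. from 0
to x reproduces the c.d.f. found above.

```python
# Symbolic check of Example 8: F(x) is the integral of f(t) from 0 to x.
import sympy as sp

x, t = sp.symbols('x t', nonnegative=True)
pdf = 6 * t * (1 - t)                  # f(t) = 6t(1 - t) on [0, 1]

cdf = sp.integrate(pdf, (t, 0, x))     # F(x) for 0 <= x <= 1
print(sp.expand(cdf))                  # -2*x**3 + 3*x**2
print(cdf.subs(x, 1))                  # 1, as F(1) must be
```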
Now, you can try the following exercises.

E5) A random variable X has the following probability distribution:

X      0   1    2    3    4    5     6     7     8
p(x)   k   3k   5k   7k   9k   11k   13k   15k   17k

i) Determine the value of k.
ii) Find the distribution function of X.

E6) Let X be a continuous random variable with p.d.f. given by:
f(x) = x/2,         0 ≤ x < 1
     = 1/2,         1 ≤ x < 2
     = (3 − x)/2,   2 ≤ x < 3
     = 0,           elsewhere.
Determine F(x), the c.d.f. of X.

5.6 SUMMARY
Following main points have been covered in this unit of the course:
1) A random variable is a function whose domain is a set of possible
outcomes and whose range is a sub-set of the set of reals, with the
following properties:
i) Each particular value of the random variable can be assigned some
probability.
ii) The sum of all the probabilities associated with all the different values
of the random variable is unity.
2) A random variable is said to be a discrete random variable if it has
either a finite number of values or a countable number of values, i.e. if its
values can be arranged in a sequence.
3) If a random variable is such that its values cannot be arranged in a
sequence, it is called a continuous random variable. So, a random
variable is said to be continuous if it can take all the possible real
(i.e. integer as well as fractional) values between two certain limits.
4) Let X be a discrete r.v. which takes on the values x1, x2, ... and let
P[X = x_i] = p(x_i). The function p(x_i) is called the probability mass
function of X, satisfying p(x_i) ≥ 0 and Σ_i p(x_i) = 1. The set
{(x1, p(x1)), (x2, p(x2)), ...} specifies the probability distribution of
the discrete r.v. X.
5) Let X be a continuous random variable and f(x) be a continuous
function of x. Suppose (x, x + δx) is an interval of length δx. Then f(x),
defined by lim_{δx→0} P[x ≤ X ≤ x + δx]/δx = f(x), is called the
probability density function of X.
The probability density function has the same properties as the
probability mass function, i.e. f(x) ≥ 0 and ∫_R f(x) dx = 1, where the
integral is taken over the entire range R of values of X.
6) A function F defined for all values of a random variable X by
F(x) = P[X ≤ x] is called the distribution function. It is also known as
the cumulative distribution function (c.d.f.) of X. The domain of the
distribution function is the set of real numbers and its range is [0, 1].
The distribution function of a discrete random variable X is said to be a
discrete distribution function and is given by
{(x1, F(x1)), (x2, F(x2)), ...}. The distribution function of a continuous
random variable X having probability density function f(x) is said to
be a continuous distribution function and is given by
F(x) = P[X ≤ x] = ∫_{−∞}^{x} f(x) dx.
The derivative of F(x) with respect to x is f(x), i.e. F′(x) = f(x).

5.7 SOLUTIONS/ANSWERS
E1) Let X be the number of bad articles drawn.
∴ X can take the values 0, 1, 2 with
P[X = 0] = P[no bad article]
= P[drawing 2 articles from the 5 good articles and none from the 2 bad articles]
= (⁵C₂ × ²C₀)/⁷C₂ = (5 × 4)/(7 × 6) = 10/21,
P[X = 1] = P[one bad article and 1 good article]
= (²C₁ × ⁵C₁)/⁷C₂ = (2 × 5 × 2)/(7 × 6) = 10/21, and
P[X = 2] = P[two bad articles and no good article]
= (²C₂ × ⁵C₀)/⁷C₂ = (1 × 1 × 2)/(7 × 6) = 1/21.
∴ The probability distribution of the number of bad articles is:

X      0       1       2
p(x)   10/21   10/21   1/21

E2) As Y = X² + 2X,
for X = 0, Y = 0 + 0 = 0;
for X = 1, Y = 1² + 2(1) = 3;
for X = 2, Y = 2² + 2(2) = 8; and
for X = 3, Y = 3² + 2(3) = 15.
Thus, the values of Y are 0, 3, 8, 15, corresponding to the values
0, 1, 2, 3 of X, and hence
P[Y = 0] = P[X = 0] = 1/10, P[Y = 3] = P[X = 1] = 3/10,
P[Y = 8] = P[X = 2] = 1/2 and P[Y = 15] = P[X = 3] = 1/10.
∴ The probability distribution of Y is

Y      0      3      8     15
p(y)   1/10   3/10   1/2   1/10

E3) Let X be the number of red balls drawn.
∴ X can take the values 0, 1, 2, 3.
Let Wi be the event that the i-th draw gives a white ball and Ri the event
that the i-th draw gives a red ball.
∴ P[X = 0] = P[no red ball] = P[W1 ∩ W2 ∩ W3] = P(W1) P(W2) P(W3)
[∵ balls are drawn with replacement and hence the draws are independent]
= (3/7) × (3/7) × (3/7) = 27/343,
P[X = 1] = P[one red and two white]
= P[(R1 ∩ W2 ∩ W3) or (W1 ∩ R2 ∩ W3) or (W1 ∩ W2 ∩ R3)]
= P(R1)P(W2)P(W3) + P(W1)P(R2)P(W3) + P(W1)P(W2)P(R3)
= 3 × (4/7)(3/7)(3/7) = 108/343,
P[X = 2] = P[two red and one white]
= P[(R1 ∩ R2 ∩ W3) or (R1 ∩ W2 ∩ R3) or (W1 ∩ R2 ∩ R3)]
= 3 × (4/7)(4/7)(3/7) = 144/343,
P[X = 3] = P[three red balls]
= P[R1 ∩ R2 ∩ R3] = P(R1) P(R2) P(R3) = (4/7)(4/7)(4/7) = 64/343.
∴ The probability distribution of the number of red balls is

X      0        1         2         3
p(x)   27/343   108/343   144/343   64/343

E4) As f(x) is a p.d.f.,
∫_{1500}^{2500} (A/x³) dx = 1 ⇒ A ∫_{1500}^{2500} x⁻³ dx = 1 ⇒ A[−1/(2x²)]_{1500}^{2500} = 1
⇒ (A/2)[1/(1500)² − 1/(2500)²] = 1 ⇒ (A/20000)[1/225 − 1/625] = 1
⇒ (A/20000) × (25 − 9)/5625 = 1 ⇒ 16A = 5625 × 20000
⇒ A = (5625 × 20000)/16 = 5625 × 1250 = 7031250.
Now, P[1600 ≤ X ≤ 2000] = ∫_{1600}^{2000} f(x) dx = A ∫_{1600}^{2000} (1/x³) dx
= (A/2)[1/(1600)² − 1/(2000)²] = (A/20000)[1/256 − 1/400]
= (A/20000) × (25 − 16)/6400 = (9 × 7031250)/(20000 × 6400) = 2025/4096.
E5) i) As the given distribution is a probability distribution,
the sum of all the probabilities = 1
⇒ k + 3k + 5k + 7k + 9k + 11k + 13k + 15k + 17k = 1
⇒ 81k = 1 ⇒ k = 1/81
ii) The distribution function of X is given in the following table:

X    F(x) = P[X ≤ x]
0    k = 1/81
1    k + 3k = 4k = 4/81
2    4k + 5k = 9k = 9/81
3    9k + 7k = 16k = 16/81
4    16k + 9k = 25k = 25/81
5    25k + 11k = 36k = 36/81
6    36k + 13k = 49k = 49/81
7    49k + 15k = 64k = 64/81
8    64k + 17k = 81k = 1

E6) For x < 0,
F(x) = P[X ≤ x] = ∫_{−∞}^{x} f(x) dx = ∫_{−∞}^{x} 0 dx = 0 [∵ f(x) = 0 for x < 0].
For 0 ≤ x < 1,
F(x) = P[X ≤ x] = ∫_{−∞}^{0} f(x) dx + ∫_0^x f(x) dx [∵ 0 ≤ x < 1]
= 0 + ∫_0^x (x/2) dx [∵ f(x) = x/2 for 0 ≤ x < 1]
= (1/2)[x²/2]_0^x = x²/4.
For 1 ≤ x < 2,
F(x) = P[X ≤ x] = ∫_{−∞}^{0} f(x) dx + ∫_0^1 f(x) dx + ∫_1^x f(x) dx
= 0 + ∫_0^1 (x/2) dx + ∫_1^x (1/2) dx = (1/4)[x²]_0^1 + (1/2)[x]_1^x
= 1/4 + (x − 1)/2 = (2x − 1)/4.
For 2 ≤ x < 3,
F(x) = ∫_{−∞}^{0} f(x) dx + ∫_0^1 (x/2) dx + ∫_1^2 (1/2) dx + ∫_2^x ((3 − x)/2) dx
= [x²/4]_0^1 + (1/2)[x]_1^2 + (1/2)[3x − x²/2]_2^x
= 1/4 + 1/2 + (1/2)[(3x − x²/2) − (6 − 2)]
= 3/4 + (1/2)(3x − x²/2 − 4)
= −x²/4 + 3x/2 − 5/4.
For 3 ≤ x < ∞,
F(x) = ∫_{−∞}^{0} f(x) dx + ∫_0^1 (x/2) dx + ∫_1^2 (1/2) dx + ∫_2^3 ((3 − x)/2) dx + ∫_3^x 0 dx
= [x²/4]_0^1 + [x/2]_1^2 + (1/2)[3x − x²/2]_2^3 + 0
= 1/4 + 1/2 + (1/2)[(9 − 9/2) − (6 − 2)]
= 1/4 + 1/2 + 1/4 = 1.
Hence, the distribution function is given by:
F(x) = 0,                     −∞ < x < 0
     = x²/4,                  0 ≤ x < 1
     = (2x − 1)/4,            1 ≤ x < 2
     = −x²/4 + 3x/2 − 5/4,    2 ≤ x < 3
     = 1,                     3 ≤ x < ∞

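The piecewise c.d.f. of E6 can be cross-checked symbolically; this is our
sketch, assuming SymPy handles the piecewise definite integrals (it does for
finite bounds, and f vanishes below 0, so we may start the integration at 0).

```python
# Check two values of the c.d.f. of E6 by integrating the piecewise density.
import sympy as sp

t = sp.symbols('t')
f = sp.Piecewise((t/2, (t >= 0) & (t < 1)),
                 (sp.Rational(1, 2), (t >= 1) & (t < 2)),
                 ((3 - t)/2, (t >= 2) & (t < 3)),
                 (0, True))

# F(3/2) should equal (2x - 1)/4 at x = 3/2, i.e. 1/2
print(sp.integrate(f, (t, 0, sp.Rational(3, 2))))   # 1/2
print(sp.integrate(f, (t, 0, 5)))                   # 1 (total mass)
```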
UNIT 6 BIVARIATE DISCRETE RANDOM VARIABLES
Structure
6.1 Introduction
Objectives

6.2 Bivariate Discrete Random Variables


6.3 Joint, Marginal and Conditional Probability Mass Functions
6.4 Joint and Marginal Distribution Functions for Discrete Random Variables
6.5 Summary
6.6 Solutions/Answers

6.1 INTRODUCTION
In Unit 5, you have studied one-dimensional random variables and their
probability mass functions, density functions and distribution functions. There
may also be situations where we have to study two-dimensional random
variables in connection with a random experiment. For example, we may be
interested in recording the number of boys and girls born in a hospital on a
particular day. Here, ‘the number of boys’ and ‘the number of girls’ are
random variables taking the values 0, 1, 2, … and both these random variables
are discrete also.
In this unit, we concentrate on the two-dimensional discrete random variables
defining them in Sec. 6.2. The joint, marginal and conditional probability mass
functions of two-dimensional random variable are described in Sec. 6.3. The
distribution function and the marginal distribution function are discussed in
Sec. 6.4.
Objectives
A study of this unit would enable you to:
• define a two-dimensional discrete random variable;
• specify the joint probability mass function of two discrete random
variables;
• obtain the marginal and conditional distributions for a two-dimensional
discrete random variable;
• define the two-dimensional distribution function;
• define the marginal distribution functions; and
• solve various practical problems on bivariate discrete random variables.

6.2 BIVARIATE DISCRETE RANDOM VARIABLES
In Unit 5, the concept of a one-dimensional random variable has been studied
in detail. Proceeding in analogy with the one-dimensional case, the concept of
two-dimensional discrete random variables is discussed in the present unit.
A situation where a two-dimensional discrete random variable needs to be
studied has already been given in Sec. 6.1 of this unit. To describe such
situations mathematically, the study of two random variables is introduced.
Definition: Let X and Y be two discrete random variables defined on the
sample space S of a random experiment. Then the function (X, Y) defined on
the same sample space is called a two-dimensional discrete random variable.
In other words, (X, Y) is a two-dimensional random variable if the possible
values of (X, Y) are finite or countably infinite. Here, each value of (X, Y) is
represented as a point (x, y) in the xy-plane.
As an illustration, let us consider the following example:
Let three balls b1, b2, b3 be placed randomly in three cells. The possible
outcomes of placing the three balls in three cells are shown in Table 6.1.
Table 6.1 : Possible Outcomes of Placing the Three Balls in Three Cells
Arrangement Number   Cell 1   Cell 2   Cell 3
1 b1 b2 b3
2 b1 b3 b2
3 b2 b1 b3
4 b2 b3 b1
5 b3 b1 b2
6 b3 b2 b1
7 b1,b2 b3 -
8 b1,b2 - b3
9 - b1,b2 b3
10 b1,b3 b2 -
11 b1,b3 - b2
12 - b1,b3 b2
13 b2,b3 b1 -
14 b2,b3 - b1
15 - b2,b3 b1
16 b1 b2,b3 -
17 b1 - b2,b3
18 - b1 b2,b3
19 b2 b3,b1 -
20 b2 - b3,b1
21 - b2 b3,b1
22 b3 b1,b2 -
23 b3 - b1,b2
24 - b3 b1,b2
25 b1,b2,b3 - -
26 - b1,b2,b3 -
27 - - b1,b2,b3

Now, let X denote the number of balls in Cell 1 and Y the number of cells
occupied. Notice that X and Y are discrete random variables, where X takes
on the values 0, 1, 2, 3 (∵ the number of balls in Cell 1 may be 0, 1, 2 or 3)
and Y takes on the values 1, 2, 3 (∵ the number of occupied cells may be 1, 2
or 3). The possible values of the two-dimensional random variable (X, Y),
therefore, are all ordered pairs of the values x and y of X and Y, respectively,
i.e. (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3).
Now, to each possible value (x_i, y_j) of (X, Y), we can associate a number
p(x_i, y_j) representing P[X = x_i, Y = y_j], as discussed in the following
section of this unit.

6.3 JOINT, MARGINAL AND CONDITIONAL


PROBABILITY MASS FUNCTIONS
Let us again consider the example discussed in Sec. 6.2. In this example, we
have obtained all possible values of (X, Y), where X is the number of balls in
Cell 1 and Y is the number of occupied cells. Now, let us associate numbers
p(x_i, y_j) representing P[X = x_i, Y = y_j] as follows:
p(0, 1) = P[X = 0, Y = 1] = P[no ball in Cell 1 and 1 cell occupied]
= P[Arrangement numbers 26, 27] = 2/27
p(0, 2) = P[X = 0, Y = 2] = P[no ball in Cell 1 and 2 cells occupied]
= P[Arrangement numbers 9, 12, 15, 18, 21, 24] = 6/27
p(0, 3) = P[X = 0, Y = 3] = P[no ball in Cell 1 and 3 cells occupied]
= P[Impossible event] = 0
p(1, 1) = P[X = 1, Y = 1] = P[one ball in Cell 1 and 1 cell occupied]
= P[Impossible event] = 0
p(1, 2) = P[X = 1, Y = 2] = P[one ball in Cell 1 and 2 cells occupied]
= P[Arrangement numbers 16, 17, 19, 20, 22, 23] = 6/27
p(1, 3) = P[X = 1, Y = 3] = P[one ball in Cell 1 and 3 cells occupied]
= P[Arrangement numbers 1 to 6] = 6/27
p(2, 1) = P[X = 2, Y = 1] = P[two balls in Cell 1 and 1 cell occupied]
= P[Impossible event] = 0
p(2, 2) = P[X = 2, Y = 2] = P[two balls in Cell 1 and 2 cells occupied]
= P[Arrangement numbers 7, 8, 10, 11, 13, 14] = 6/27
p(2, 3) = P[X = 2, Y = 3] = P[two balls in Cell 1 and 3 cells occupied]
= P[Impossible event] = 0
p(3, 1) = P[X = 3, Y = 1] = P[three balls in Cell 1 and 1 cell occupied]
= P[Arrangement number 25] = 1/27
p(3, 2) = P[X = 3, Y = 2] = P[three balls in Cell 1 and 2 cells occupied]
= P[Impossible event] = 0
p(3, 3) = P[X = 3, Y = 3] = P[three balls in Cell 1 and 3 cells occupied]
= P[Impossible event] = 0

The values of (X, Y), together with the numbers associated as above,
constitute what is known as the joint probability distribution of (X, Y), which
can also be written in tabular form as shown below:

X \ Y   1      2       3      Total
0       2/27   6/27    0      8/27
1       0      6/27    6/27   12/27
2       0      6/27    0      6/27
3       1/27   0       0      1/27
Total   3/27   18/27   6/27   1

We are now in a position to define the joint, marginal and conditional
probability mass functions.

Joint Probability Mass Function
Let (X, Y) be a two-dimensional discrete random variable. With each possible
outcome (x_i, y_j), we associate a number p(x_i, y_j) representing
P[X = x_i, Y = y_j], or P[X = x_i ∩ Y = y_j], satisfying the following
conditions:
(i) p(x_i, y_j) ≥ 0
(ii) Σ_i Σ_j p(x_i, y_j) = 1
The function p defined for all (x_i, y_j) is, in analogy with the one-dimensional
case, called the joint probability mass function of X and Y. It is usually
represented in the form of a table, as shown in the example discussed above.
Marginal Probability Function
Let (X, Y) be a discrete two-dimensional random variable which takes up
finite or countably infinite values (x_i, y_j). For each such two-dimensional
random variable (X, Y), we may be interested in the probability distribution
of X or of Y, individually.
Let us again consider the example of the random placement of three balls in
three cells, wherein X and Y are the discrete random variables representing
"the number of balls in Cell 1" and "the number of occupied cells",
respectively. Consider the table above showing the joint distribution of (X, Y),
and take up its row totals and column totals. The row totals represent the
probability distribution of X and the column totals represent the probability
distribution of Y. That is,
P[X = 0] = 2/27 + 6/27 + 0 = 8/27
P[X = 1] = 0 + 6/27 + 6/27 = 12/27
P[X = 2] = 0 + 6/27 + 0 = 6/27
P[X = 3] = 1/27 + 0 + 0 = 1/27, and
P[Y = 1] = 2/27 + 0 + 0 + 1/27 = 3/27
P[Y = 2] = 6/27 + 6/27 + 6/27 + 0 = 18/27
P[Y = 3] = 0 + 6/27 + 0 + 0 = 6/27
These distributions of X and Y, individually, are called the marginal
probability distributions of X and Y, respectively.
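The passage from a joint table to its marginals is purely mechanical; the
following is our minimal sketch (an illustration, not from the text) for the
balls-in-cells example, using exact fractions.

```python
# Compute the marginal distributions from the joint p.m.f. of the
# balls-in-cells example by summing over the other variable.
from fractions import Fraction as F
from collections import defaultdict

joint = {(0, 1): F(2, 27), (0, 2): F(6, 27), (0, 3): F(0),
         (1, 1): F(0),     (1, 2): F(6, 27), (1, 3): F(6, 27),
         (2, 1): F(0),     (2, 2): F(6, 27), (2, 3): F(0),
         (3, 1): F(1, 27), (3, 2): F(0),     (3, 3): F(0)}

px, py = defaultdict(F), defaultdict(F)
for (x, y), p in joint.items():
    px[x] += p   # marginal of X: sum over y (row totals)
    py[y] += p   # marginal of Y: sum over x (column totals)

print(dict(px))              # 8/27, 12/27, 6/27, 1/27 (reduced by Fraction)
print(sum(joint.values()))   # 1, as a joint p.m.f. must sum to unity
```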
So, if (X, Y) is a discrete two-dimensional random variable which takes up the
values (x_i, y_j), then the probability distribution of X is determined as follows:
p(x_i) = P[X = x_i]
= P[(X = x_i ∩ Y = y1) or (X = x_i ∩ Y = y2) or ...]
= P[X = x_i ∩ Y = y1] + P[X = x_i ∩ Y = y2] + P[X = x_i ∩ Y = y3] + ...
= Σ_j P[X = x_i ∩ Y = y_j]
= Σ_j p(x_i, y_j)
[∵ p(x_i, y_j), the joint probability mass function, is P[X = x_i ∩ Y = y_j]]
which is known as the marginal probability mass function of X. Similarly, the
probability distribution of Y is
p(y_j) = P[Y = y_j]
= P[X = x1 ∩ Y = y_j] + P[X = x2 ∩ Y = y_j] + ...
= Σ_i P[X = x_i ∩ Y = y_j]
= Σ_i p(x_i, y_j)
and is known as the marginal probability mass function of Y.

Conditional Probability Mass Function
Let (X, Y) be a discrete two-dimensional random variable. Then the
conditional probability mass function of X, given Y = y, is defined as
p(x | y) = P[X = x | Y = y]
= P[X = x ∩ Y = y] / P[Y = y], provided P[Y = y] ≠ 0
[∵ P(A | B) = P(A ∩ B)/P(B), P(B) ≠ 0]
Similarly, the conditional probability mass function of Y, given X = x, is
defined as
p(y | x) = P[Y = y | X = x] = P[Y = y ∩ X = x] / P[X = x]
Let us again consider the example already discussed in this section.
Suppose we are interested in finding the conditional probability mass function
of X given Y = 2. Then the conditional probabilities are found separately for
each value of X given Y = 2. That is, we proceed as follows:

P[X = 0 | Y = 2] = P[X = 0 ∩ Y = 2]/P[Y = 2] = (6/27)/(18/27) = 1/3
P[X = 1 | Y = 2] = P[X = 1 ∩ Y = 2]/P[Y = 2] = (6/27)/(18/27) = 1/3
P[X = 2 | Y = 2] = P[X = 2 ∩ Y = 2]/P[Y = 2] = (6/27)/(18/27) = 1/3
P[X = 3 | Y = 2] = P[X = 3 ∩ Y = 2]/P[Y = 2] = 0/(18/27) = 0
[Note that the values of the numerator and denominator in the above
expressions have already been obtained while discussing the joint and
marginal probability mass functions in this section of the unit.]
Independence of Random Variables
Two discrete random variables X and Y are said to be independent if and only
if
P[X = x_i ∩ Y = y_j] = P[X = x_i] P[Y = y_j]
[∵ two events A and B are independent if and only if P(A ∩ B) = P(A) P(B)]
Example 1: The following table represents the joint probability distribution of
the discrete random variable (X, Y):

X \ Y   1     2
1       0.1   0.2
2       0.1   0.3
3       0.2   0.1
Find :
i) The marginal distributions.
ii) The conditional distribution of X given Y = 1.
iii) P[(X + Y) < 4].
Solution:
i) To find the marginal distributions, we have to find the marginal totals, i.e.
row totals and column totals, as shown in the following table:

X \ Y   1     2     p(x) (Totals)
1       0.1   0.2   0.3
2       0.1   0.3   0.4
3       0.2   0.1   0.3
p(y)    0.4   0.6   1

Thus, the marginal probability distribution of X is

X      1     2     3
p(x)   0.3   0.4   0.3

and the marginal probability distribution of Y is

Y      1     2
p(y)   0.4   0.6

ii) As P[X = 1 | Y = 1] = P[X = 1, Y = 1]/P[Y = 1] = 0.1/0.4 = 1/4,
P[X = 2 | Y = 1] = P[X = 2, Y = 1]/P[Y = 1] = 0.1/0.4 = 1/4, and
P[X = 3 | Y = 1] = P[X = 3, Y = 1]/P[Y = 1] = 0.2/0.4 = 1/2,
∴ the conditional distribution of X given Y = 1 is

X                  1     2     3
P[X = x | Y = 1]   1/4   1/4   1/2

iii) The values of (X, Y) which satisfy X + Y < 4 are (1, 1), (1, 2) and (2, 1)
only.
∴ P[(X + Y) < 4] = P[X = 1, Y = 1] + P[X = 1, Y = 2] + P[X = 2, Y = 1]
= 0.1 + 0.2 + 0.1 = 0.4

Example 2: Two discrete random variables X and Y have
P[X = 0, Y = 0] = 2/9, P[X = 0, Y = 1] = 1/9, P[X = 1, Y = 0] = 1/9, and
P[X = 1, Y = 1] = 5/9. Examine whether X and Y are independent.
Solution: Writing the given distribution in tabular form as follows:

Y 0 1 px
X
0 2/ 9 1/ 9 3/ 9
1 1/ 9 5/ 9 6/9
p  y 3/ 9 6/9 1

3 6
 P  X  0   , P  X  1  ,
9 9
3 6
P  Y  0  , P  Y  1 
9 9
3 3 1
Now P  X  0 P  Y  0   
9 9 9
2
But P  X  0, Y  0 
9
 P  X  0, Y  0  P  X  0 P  Y  0

Hence X and Y are not independent

[Note: If P  X  x, Y  y   P  X  x  P  Y  y  for each possible value of X


and Y, only then X and Y are independent.]
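The independence criterion of Example 2 checks every cell of the joint table;
here is our small sketch (an illustration, not from the text) of that check with
exact fractions.

```python
# X and Y are independent iff p(x, y) = p(x) * p(y) for every cell.
from fractions import Fraction as F

joint = {(0, 0): F(2, 9), (0, 1): F(1, 9),
         (1, 0): F(1, 9), (1, 1): F(5, 9)}

px = {x: sum(p for (i, _), p in joint.items() if i == x) for x in (0, 1)}
py = {y: sum(p for (_, j), p in joint.items() if j == y) for y in (0, 1)}

independent = all(joint[(x, y)] == px[x] * py[y]
                  for x in (0, 1) for y in (0, 1))
print(independent)   # False: p(0,0) = 2/9 but p(0)p(0) = 1/9
```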

Here are two exercises for you.

E1) The joint probability distribution of a pair of random variables is given
by the following table:

X \ Y   1      2     3
1       1/12   0     1/18
2       1/6    1/9   1/4
3       0      1/5   2/15

i) Evaluate the marginal distribution of X.
ii) Evaluate the conditional distribution of Y given X = 2.
iii) Obtain P[X + Y < 5].
E2) For the following joint probability distribution of (X, Y):

X \ Y   1      2      3
1       1/20   1/10   1/10
2       1/20   1/10   1/10
3       1/10   1/10   1/20
4       1/10   1/10   1/20

i) find the probability that Y = 2 given that X = 4,
ii) find the probability that Y = 2, and
iii) examine if the two events X = 4 and Y = 2 are independent.

6.4 JOINT AND MARGINAL DISTRIBUTION FUNCTIONS FOR
DISCRETE RANDOM VARIABLES
Two-Dimensional Joint Distribution Function
In analogy with the distribution function F(x) = P[X ≤ x] of a one-dimensional
random variable X, discussed in Unit 5 of this course, the distribution function
of the two-dimensional random variable (X, Y) for all real x and y is defined
as
F(x, y) = P[X ≤ x, Y ≤ y]

Marginal Distribution Functions
Let (X, Y) be a two-dimensional discrete random variable having F(x, y) as its
distribution function. The marginal distribution function of X is defined as
F(x) = P[X ≤ x]
= P[X ≤ x, Y = y1] + P[X ≤ x, Y = y2] + ...
= Σ_j P[X ≤ x, Y = y_j]
Similarly, the marginal distribution function of Y is defined as
F(y) = P[Y ≤ y]
= P[X = x1, Y ≤ y] + P[X = x2, Y ≤ y] + ...
= Σ_i P[X = x_i, Y ≤ y]

Example 3: Considering the probability distribution given in Example 1, find

i) F(2, 2), F(3, 2)
ii) F_X(3)
iii) F_Y(1)
Solution:
i) F(2, 2) = P[X ≤ 2, Y ≤ 2]
= P[X = 2, Y ≤ 2] + P[X = 1, Y ≤ 2]
= P[X = 2, Y = 2] + P[X = 2, Y = 1] + P[X = 1, Y = 2] + P[X = 1, Y = 1]
= 0.3 + 0.1 + 0.2 + 0.1 = 0.7
F(3, 2) = P[X ≤ 3, Y ≤ 2]
= P[X ≤ 2, Y ≤ 2] + P[X = 3, Y ≤ 2]
= 0.7 + P[X = 3, Y ≤ 2]
[the first term on the R.H.S. has been obtained in part (i) of this example]
= 0.7 + P[X = 3, Y = 2] + P[X = 3, Y = 1] = 0.7 + 0.1 + 0.2 = 1
ii) F_X(3) = P[X ≤ 3]
= P[X ≤ 3, Y = 1] + P[X ≤ 3, Y = 2]
= P[X = 3, Y = 1] + P[X = 2, Y = 1] + P[X = 1, Y = 1]
+ P[X = 3, Y = 2] + P[X = 2, Y = 2] + P[X = 1, Y = 2]
= 0.2 + 0.1 + 0.1 + 0.1 + 0.3 + 0.2 = 1
iii) F_Y(1) = P[Y ≤ 1]
= P[X = 1, Y = 1] + P[X = 2, Y = 1] + P[X = 3, Y = 1]
= 0.1 + 0.1 + 0.2 = 0.4


Example 4: Find the joint and marginal distribution functions for the joint
probability distribution given in Example 2.
Solution: For the joint distribution function, we have to find
F(x, y) = P[X ≤ x, Y ≤ y] for each x and y, i.e. we are to find F(0, 0), F(0, 1),
F(1, 0), F(1, 1).
F(0, 0) = P[X ≤ 0, Y ≤ 0] = P[X = 0, Y = 0] = 2/9
F(0, 1) = P[X ≤ 0, Y ≤ 1] = P[X = 0, Y = 0] + P[X = 0, Y = 1]
= 2/9 + 1/9 = 3/9
F(1, 0) = P[X ≤ 1, Y ≤ 0] = P[X = 1, Y = 0] + P[X = 0, Y = 0]
= 1/9 + 2/9 = 3/9
F(1, 1) = P[X ≤ 1, Y ≤ 1] = P[X = 1, Y = 1] + P[X = 1, Y = 0]
+ P[X = 0, Y = 1] + P[X = 0, Y = 0]
= 5/9 + 1/9 + 1/9 + 2/9 = 1
The above distribution function F(x, y) can be shown in tabular form as
follows:

        Y ≤ 0   Y ≤ 1
X ≤ 0   2/9     3/9
X ≤ 1   3/9     1

The marginal distribution function of X is obtained by finding
F(x) = P[X ≤ x] for each x, i.e. we have to obtain F_X(0), F_X(1).
F_X(0) = P[X ≤ 0] = P[X = 0]
= P[X = 0, Y = 0] + P[X = 0, Y = 1]
= 2/9 + 1/9 = 3/9
F_X(1) = P[X ≤ 1] = P[X = 1, Y = 0] + P[X = 1, Y = 1]
+ P[X = 0, Y = 0] + P[X = 0, Y = 1]
= 1/9 + 5/9 + 2/9 + 1/9 = 1
∴ the marginal distribution function of X is given as

X    F(x)
0    3/9
1    1

Similarly, the marginal distribution function of Y can be obtained.
[Do it yourself.]
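The joint distribution function of Example 4 is a sum of joint probabilities
over a "lower-left rectangle" of cells; here is our small sketch (an illustration,
not from the text) of that computation.

```python
# Build F(x, y) = P[X <= x, Y <= y] from the joint p.m.f. of Example 4.
from fractions import Fraction as F

joint = {(0, 0): F(2, 9), (0, 1): F(1, 9),
         (1, 0): F(1, 9), (1, 1): F(5, 9)}

def cdf(x, y):
    # Sum every cell (i, j) with i <= x and j <= y.
    return sum(p for (i, j), p in joint.items() if i <= x and j <= y)

for x in (0, 1):
    for y in (0, 1):
        print(f"F({x},{y}) = {cdf(x, y)}")   # 2/9, 1/3 (= 3/9), 1/3, 1
```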
Here is an exercise for you.
E3) Obtain the joint and marginal distribution functions for the joint
probability distribution given in E 1).
Now, before ending this unit, let us summarize what we have covered in it.

6.5 SUMMARY
In this unit we have covered the following main points:
1) If X and Y are two discrete random variables defined on the sample space
S of a random experiment, then the function (X, Y) defined on the same
sample space is called a two-dimensional discrete random variable. In
other words, (X, Y) is a two-dimensional random variable if the
possible values of (X, Y) are finite or countably infinite.
2) A number p(x_i, y_j) associated with each possible outcome (x_i, y_j) of a
two-dimensional discrete random variable (X, Y) is called the joint
probability mass function of X and Y if it satisfies the following
conditions:
(i) p(x_i, y_j) ≥ 0
(ii) Σ_i Σ_j p(x_i, y_j) = 1
3) If (X, Y) is a discrete two-dimensional random variable which takes up
the values (x_i, y_j), then the probability distribution of X, given by
p(x_i) = Σ_j p(x_i, y_j), is known as the marginal probability mass
function of X, and the probability distribution of Y, given by
p(y_j) = Σ_i p(x_i, y_j), is known as the marginal probability mass
function of Y.
4) The conditional probability mass function of X given Y = y, in the
case of a two-dimensional discrete random variable (X, Y), is defined as
p(x | y) = P[X = x | Y = y]
= P[X = x ∩ Y = y] / P[Y = y]; and
the conditional probability mass function of Y, given X = x, is
defined as
p(y | x) = P[Y = y | X = x]
= P[Y = y ∩ X = x] / P[X = x]
5) Two discrete random variables X and Y are said to be independent if
and only if
P[X = x_i ∩ Y = y_j] = P[X = x_i] P[Y = y_j]

6.6 SOLUTIONS/ANSWERS
E1) Let us compute the marginal totals. The complete table with marginal
totals is given as

X \ Y   1      2       3        p(x)
1       1/12   0       1/18     5/36
2       1/6    1/9     1/4      19/36
3       0      1/5     2/15     1/3
p(y)    1/4    14/45   79/180   1

Therefore,
i) The marginal distribution of X is

X      1      2       3
p(x)   5/36   19/36   1/3

ii) P[Y = 1 | X = 2] = P[Y = 1, X = 2]/P[X = 2] = (1/6) × (36/19) = 6/19
P[Y = 2 | X = 2] = P[Y = 2, X = 2]/P[X = 2] = (1/9) × (36/19) = 4/19
P[Y = 3 | X = 2] = P[Y = 3, X = 2]/P[X = 2] = (1/4) × (36/19) = 9/19
∴ The conditional distribution of Y given X = 2 is

Y                  1      2      3
P[Y = y | X = 2]   6/19   4/19   9/19

iii) P[X + Y < 5]
= P[X = 1, Y = 1] + P[X = 1, Y = 2] + P[X = 1, Y = 3]
+ P[X = 2, Y = 1] + P[X = 2, Y = 2] + P[X = 3, Y = 1]
= 1/12 + 0 + 1/18 + 1/6 + 1/9 + 0 = 15/36.
E2) First compute the marginal totals; then you will be able to find
i) P[X = 4] = 1/4, and hence
P[Y = 2 | X = 4] = P[Y = 2, X = 4]/P[X = 4] = (1/10)/(1/4) = 2/5
ii) P[Y = 2] = 2/5
iii) P[X = 4, Y = 2] = 1/10, P[X = 4] = 1/4, P[Y = 2] = 2/5
P[X = 4] P[Y = 2] = (1/4) × (2/5) = 1/10 = P[X = 4, Y = 2]
∴ X = 4 and Y = 2 are independent.
E3) To obtain the joint distribution function F(x, y) = P[X ≤ x, Y ≤ y], we
have to obtain F(x, y) for each value of X and Y, i.e. we have to obtain
F(1, 1), F(1, 2), F(1, 3), F(2, 1), F(2, 2), F(2, 3), F(3, 1), F(3, 2), F(3, 3).
Then, the distribution function in tabular form is

        Y ≤ 1   Y ≤ 2     Y ≤ 3
X ≤ 1   1/12    1/12      5/36
X ≤ 2   1/4     13/36     2/3
X ≤ 3   1/4     101/180   1

Marginal distribution function of X is given as

X    F(x)
1    5/36
2    2/3
3    1

Marginal distribution function of Y is

Y    F(y)
1    1/4
2    101/180
3    1

UNIT 7 BIVARIATE CONTINUOUS RANDOM VARIABLES
Structure
7.1 Introduction
Objectives
7.2 Bivariate Continuous Random Variables
7.3 Joint and Marginal Distribution and Density Functions
7.4 Conditional Distribution and Density Functions
7.5 Stochastic Independence of Two Continuous Random Variables
7.6 Problems on Two-Dimensional Continuous Random Variables
7.7 Summary
7.8 Solutions/Answers

7.1 INTRODUCTION
In Unit 6, we have defined the bivariate discrete random variable (X, Y), where
X and Y both are discrete random variables. It may also happen that one of the
random variables is discrete and the other is continuous. However, in most
applications we deal only with the cases where either both random variables are
discrete or both are continuous. The cases where both random variables are
discrete have already been discussed in Unit 6. Here, in this unit, we are going
to discuss the cases where both random variables are continuous.
In Unit 6, you have studied the joint, marginal and conditional probability
functions and distribution functions in context of bivariate discrete random
variables. Similar functions, but in context of bivariate continuous random
variables, are discussed in this unit.
Bivariate continuous random variable is defined in Sec. 7.2. Joint and marginal
density functions are described in Sec. 7.3. Sec. 7.4 deals with the conditional
distribution and density functions. Independence of two continuous random
variables is dealt with in Sec. 7.5. Some practical problems on two-dimensional
continuous random variables are taken up in Sec. 7.6.
Objectives
A study of this unit would enable you to:
• define a two-dimensional continuous random variable;
• specify the joint and marginal probability density functions of two
continuous random variables;
• obtain the conditional density and distribution functions for a
two-dimensional continuous random variable;
• check the independence of two continuous random variables; and
• solve various practical problems on bivariate continuous random
variables.

7.2 BIVARIATE CONTINUOUS RANDOM VARIABLES
Definition: If X and Y are continuous random variables defined on the sample
space S of a random experiment, then (X, Y), defined on the same sample
space S, is called a bivariate continuous random variable if (X, Y) assigns a
point in the xy-plane defined on the sample space S. Notice that it (unlike a
discrete random variable) assumes values in some non-countable set. Some
examples of bivariate continuous random variables are:
1. A gun is aimed at a certain point (say the origin of the coordinate system).
Because of random factors, suppose the actual hit point is any point
(X, Y) in a circle of radius unity about the origin.

[Fig. 7.1: Actual hit point when a gun is aimed at a certain point; the unit circle centred at the origin]

Then (X, Y) assumes all the values in the circle {(x, y) : x² + y² ≤ 1}, i.e.
(X, Y) assumes all values corresponding to each and every point in the
circular region shown in Fig. 7.1. Here, (X, Y) is a bivariate continuous
random variable.

2. (X, Y) assuming all values in the rectangle {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}
is a bivariate continuous random variable.
Here, (X, Y) assumes all values corresponding to each and every point in
the rectangular region shown in Fig. 7.2.

[Fig. 7.2: (X, Y) assuming all values in the rectangle {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}]
3. In a statistical survey, let X denote the daily number of hours a child
watches television and Y the number of hours he/she spends on studies.
Here, (X, Y) is a two-dimensional continuous random variable.

7.3 JOINT AND MARGINAL DISTRIBUTION AND DENSITY
FUNCTIONS
Two-Dimensional Continuous Distribution Function
The distribution function of a two-dimensional continuous random variable
(X, Y) is a real-valued function defined as
F(x, y) = P[X ≤ x, Y ≤ y] for all real x and y.
Notice that the above function is in analogy with the one-dimensional
continuous random variable case studied in Unit 5 of the course.
Remark 1: F(x, y) can also be written as F_{X,Y}(x, y).

Joint Probability Density Function
Let (X, Y) be a continuous random variable assuming all values in some
region R of the xy-plane. Then a function f(x, y) such that
F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(x, y) dy dx
is defined to be a joint probability density function.
As in the one-dimensional case, a joint probability density function has the
following properties:
i) f(x, y) ≥ 0
ii) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dy dx = 1

Remark 2:
As in the one-dimensional case, f(x, y) does not itself represent the probability
of anything. However, for positive δx and δy sufficiently small, f(x, y) δx δy is
approximately equal to
P[x ≤ X ≤ x + δx, y ≤ Y ≤ y + δy].
In the one-dimensional case, you have studied that for positive δx sufficiently
small, f(x)δx is approximately equal to P[x ≤ X ≤ x + δx]. So, the
two-dimensional case is in analogy with the one-dimensional case.
Remark 3:
In analogy with the one-dimensional case [see Sec. 5.4 of Unit 5 of this
course], f(x, y) can be written as
f(x, y) = lim_{δx→0, δy→0} P[x ≤ X ≤ x + δx, y ≤ Y ≤ y + δy] / (δx δy)
and is equal to
∂²F(x, y)/∂x∂y, i.e. the second-order partial derivative with respect to x and y.
[See Sec. 5.5 of Unit 5, where f(x) = (d/dx) F(x).]
Note: ∂²F(x, y)/∂x∂y means: first differentiate F(x, y) partially w.r.t. y and
then the resulting function w.r.t. x. When we differentiate a function partially
w.r.t. one variable, the other variable is treated as constant.
For example, let F(x, y) = xy³ + x²y.
If we differentiate it partially w.r.t. y, we have
∂F(x, y)/∂y = x(3y²) + x²·1 [∵ here, x is treated as constant]
If we now differentiate this resulting expression w.r.t. x, we have
∂²F(x, y)/∂x∂y = 3y² + 2x [∵ here, y is treated as constant]

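The partial-differentiation Note above is easy to verify by machine; this is our
quick sketch, assuming SymPy.

```python
# Differentiate F(x, y) = x*y**3 + x**2*y first w.r.t. y, then w.r.t. x.
import sympy as sp

x, y = sp.symbols('x y')
F = x*y**3 + x**2*y

Fy = sp.diff(F, y)     # 3*x*y**2 + x**2 (x treated as constant)
Fxy = sp.diff(Fy, x)   # 3*y**2 + 2*x    (y treated as constant)
print(Fy)
print(Fxy)
```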
Marginal Continuous Distribution Function
Let (X, Y) be a two-dimensional continuous random variable having f(x, y) as
its joint probability density function. The marginal distribution function of
the continuous random variable X is defined as
F(x) = P[X ≤ x]
= P[X ≤ x, Y < ∞] [∵ for X ≤ x, Y can take any real value]
= ∫_{−∞}^{x} [∫_{−∞}^{∞} f(x, y) dy] dx,
and the marginal distribution function of the continuous random variable Y is
defined as
F(y) = P[Y ≤ y]
= P[Y ≤ y, X < ∞] [∵ for Y ≤ y, X can take any real value]
= ∫_{−∞}^{y} [∫_{−∞}^{∞} f(x, y) dx] dy

Marginal Probability Density Functions
Let (X, Y) be a two-dimensional continuous random variable having
F(x, y) and f(x, y) as its distribution function and joint probability density
function, respectively. Let F(x) and F(y) be the marginal distribution
functions of X and Y, respectively. Then the marginal probability density
function of X is given as

f(x) = ∫_{−∞}^{∞} f(x, y) dy,
or it may also be obtained as f(x) = (d/dx) F(x);
and the marginal probability density function of Y is given as
f(y) = ∫_{−∞}^{∞} f(x, y) dx,
or f(y) = (d/dy) F(y).

7.4 CONDITIONAL DISTRIBUTION AND


DENSITY FUNCTIONS
Conditional Probability Density Function
Let (X, Y) be a two-dimensional continuous random variable having the joint
probability density function f(x, y). The conditional probability density
function of Y given X = x is defined as
f(y | x) = f(x, y)/f(x), where f(x) > 0 is the marginal density of X.
Similarly, the conditional probability density function of X given Y = y is
defined to be
f(x | y) = f(x, y)/f(y), where f(y) > 0 is the marginal density of Y.
As f(y | x) and f(x | y), though conditional, are probability density
functions, they possess the properties of a probability density function.
Properties of f(y | x) are:
i) f(y | x) is clearly ≥ 0
ii) ∫_{−∞}^{∞} f(y | x) dy = ∫_{−∞}^{∞} [f(x, y)/f(x)] dy
= (1/f(x)) ∫_{−∞}^{∞} f(x, y) dy
= f(x)/f(x) [∵ ∫_{−∞}^{∞} f(x, y) dy is the marginal probability density function of X]
= 1

Similarly, f(x | y) satisfies
i) f(x | y) ≥ 0, and
ii) ∫_{−∞}^{∞} f(x | y) dx = 1
ii)  f  x  y  dx  1


Conditional Continuous Distribution Function


For a two-dimensional continuous random variable (X, Y), the conditional
distribution function of Y given X = x is defined as
F(y | x) = P[Y ≤ y | X = x]
= ∫_{−∞}^{y} f(y | x) dy, for all x such that f(x) > 0;
and the conditional distribution function of X given Y = y is defined as
F(x | y) = P[X ≤ x | Y = y]
= ∫_{−∞}^{x} f(x | y) dx, for all y such that f(y) > 0.


7.5 STOCHASTIC INDEPENDENCE OF TWO


CONTINUOUS RANDOM VARIABLES
You have already studied in Unit 3 of this course that independence of events
is closely related to conditional probability, i.e. if events A and B are
independent, then P[A | B] = P[A], i.e. the conditional probability of A is
equal to the unconditional probability of A. Likewise, independence of random
variables is closely related to conditional distributions of random variables,
i.e. two random variables X and Y with joint probability density function
f(x, y) and marginal probability density functions f(x) and f(y), respectively,
are said to be stochastically independent if and only if
i) f(y | x) = f(y)
ii) f(x | y) = f(x).
Now, as defined in Sec. 7.4, we have
f(y | x) = f(x, y)/f(x)
⇒ f(x, y) = f(x) f(y | x) [on cross-multiplying]
So, if X and Y are independent, then
f(x, y) = f(x) f(y) [∵ f(y | x) = f(y)]
Remark 4: Random variables, if independent, are actually stochastically
independent, but often the word "stochastically" is omitted.
Definition: Two random variables are said to be (stochastically) independent if Bivariate Continuous
and only if their joint probability density function is the product of their Random Variables
marginal density functions.
Let us now take up some problems on the topics covered so far in this unit.

7.6 PROBLEMS ON TWO-DIMENSIONAL


CONTINUOUS RANDOM VARIABLES

Example 1: Let X and Y be two random variables. Then for

k  2x  y  , 0  x  1, 0  y  2
f  x, y   
 0, elsewhere

to be a joint density function, what must be the value of k ?


Solution: As f  x, y  is the joint probability density function,
 
   f  x, y  dy dx  1
 

1 2
   f  x, y  dy dx  1
0 0
[0  x  1, 0  y  2 ]

1 2
   k  2x  y  dy dx  1
0 0

1 2
 
 k     2x  y  dy  dx  1
0 0 
1 2
 y2 
 k   2xy   dx  1
0 
2 0

[Firstly the integral has been done w.r.t. y treating x as constant.]

1  2

 k   2x  2  
 2 
 0  dx  1
0 
2 

1
 k   4x  2  dx  1
0

1
 4x 2 
 k  2x   1
 2 0
4  1
 k   2  0   1  4k  1  k =
2  4

47
Random Variables and
Example 2: Let the joint density function of a two-dimensional random
Expectation variable (X, Y) be:
x  y for 0  x  1 and 0  y  1
f  x, y   
 0, otherwise

Find the conditional density function of Y given X.


f  x, y 
Solution: The conditional density function of Y given X is f  y  x   ,
f x

where f  x, y  is the joint density function, which is given; and f  x  is the


marginal density function which, by definition, is given by

f x   f  x, y  dy


1
  f  x, y  dy [ 0  y < 1]
0

1
=   x  y  dy
0

1
 y2 
  xy   .
 2 0
2
 1  1
  x 1   0   x  , 0  x  1.
 2  2

 the conditional density function of Y given X is


f  x, y  xy
f y  x   , for 0  x < 1 and 0  y < 1.
f x x
1
2
Example 3: Two-dimensional random variable (X, Y) have the joint density
8xy, 0  x  y  1
f  x, y   
 0 , otherwise
1 1
i) Find P[X <  Y < ].
2 4
ii) Find the marginal and conditional distributions.
iii)Are X and Y independent?
Solution:
1 1 1 1 1 1

 1 1 2 4 2 4 2
 y2  4
i) P  X   Y   f  x, y dy dx  8xy dy dx  0  2  dx
8x
 2 4  0 0 0 0
0

48
1 1 1 Bivariate Continuous
2  1  2
x 1  x2  2 Random Variables
  8x  dx  0 4 dx   
0 16  2   4  2 0

1 1  1
  .
4  8  32

ii) Marginal density function of X is


1
f  x    f  x, y  dy [0  x  y  1]
x

1 1
 y2 
  8xy dy  8x  
x  2 x

 1 x2 

 8x     4x 1  x 2 for 0  x  1 
2 2 
Marginal density function of Y is
y

f  y    f  x, y  dx [0  x  y ]
0

  8xy dx
0

y
 x2  8y 3
 8y     4y 3 for 0  y  1
 2 0 2

Conditional density function of X given Y(0 < Y < 1) is


f  x, y 
f  x  y 
f  y

8xy 2x
  , 0x y
4y 3 y 2
Conditional density function of Y given X(0 < X < 1) is
f  x, y 
f y  x 
f x

8xy 2y
  , x < y <1

4x 1  x 2   1 x2 
iii) f  x, y   8xy,

 
But f  x  f  y   4x 1  x 2 4y3

 16x 1  x  y
2 3

49
Random Variables and  f  x, y   f  x  f  y 
Expectation
Hence, X and Y are not independent random variables.
Now, you can try some exercises.
E1) Let X and Y be two random variables. Then for
kxy for 0  x  4 and 1  y  5
f  x, y   
 0, otherwise
to be a joint density function, what must be the value of k?
E2) If the joint p.d.f. of a two-dimensional random variable (X, Y) is given by
2 for 0  x  1 and 0  y  x
f  x, y   
0, otherwise,
Then,
i) Find the marginal density functions of X and Y.
ii) Find the conditional density functions.
iii) Check for independence of X and Y.
E3) If (X, Y) be two-dimensional random variable having joint density
function.

1
  6  x  y ; 0  x  2 , 2  y  4
f  x, y    8
 0, elsewhere

Find (i) P  X  1, Y  3 (ii) P  X  1  Y  3

Now before ending this unit, let’s summarize what we have covered in it.

7.7 SUMMARY
In this unit, we have covered the following main points:
1) If X and Y are continuous random variables defined on the sample space S of
a random experiment, then (X, Y) defined on the same sample space S is
called bivariate continuous random variable if (X, Y) assigns a point in
xy -plane defined on the sample space S.
2) The distribution function of a two-dimensional continuous random variable
(X, Y) is a real-valued function and is defined as
F  x, y   P  X  x, Y  y  for all real x and y.

3) A function f  x, y  is called joint probability density function if it is such


that
x y

F  x, y     f  x, y  dydx
 

50
and satisfies Bivariate Continuous
Random Variables
i) f  x, y   0
 
ii)   f  x, y  dydx  1.
 
4) The marginal distribution function of the continuous random variable X is
defined as
x
 
F  x   P X  x      f  x, y  dy dx,
and that of continuous random variable Y is defined as
y
 
F  y   P  Y  y    f  x, y  dx dy .
5) The marginal probability density function of X is given as

d
f x   f  x, y  dy  dx  F(x)  ,


and that of Y is given as



d
f y   f  x, y  dx = dx
 F  y  .


6) The conditional probability density function of Y given X = x is defined


as
f  x, y 
f y  x  ,
f x

and that of X given Y = y is defined as


f  x, y 
f x  y  .
f  y

7) The conditional distribution function of Y given X = x is defined as


y

F y  x   f  y  x  dy , for all x such that f  x   0 ;




and that of X given Y = y is defined as


x
F  x  y   f  x  y  dx , for all y such that f(y) > 0.


8) Two random variables are said to be (stochastically) independent if and


only if their joint probability density function is the product of their
marginal density functions.

51
Random Variables and 7.8 SOLUTIONS/ANSWERS
Expectation
E1) As f  x, y  is the joint probability density function,
 
   f  x, y dy dx  1
 

4 5 4 5
 
 0 1 kxy dy dx  1  k 0  1 xy dy  dx  1
4 5 4
 y2 
 k   x  dx  1  k  12x dx  1
0 
2 1 0

4
 x2 
 12k    1  96 k = 1
 2 0
1
k=
96
E2) i) Marginal density function of Y is given by
 1

f y   f  x, y  dx   2dx
 y

[As x is involved in both the given ranges, i.e. 0 < x < 1 and 0 < y < x;
therefore, here we will combine both these intervals and hence have
0 < y < x < 1.  x takes the values from y to 1]
1
  2x y  2  2y

= 2 – 2y
= 2(1– y), 0 < y < 1
Marginal density function of X is given by

f x   f  x, y  dy


x
  2dy [ 0 < y < x < 1]
0

x
 2  y 0
 2x, 0  x  1.
ii) Conditional density function of Y given X(0 < X < 1) is
f  x, y  2 1
f y  x    ; 0  y x
f x 2x x

Conditional density function of X and given Y(0 < Y < 1) is


f  x, y  2 1
f  x  y    , y x 1
f  y 2 1  y  1  y
52
iii) f  x, y   2, Bivariate Continuous
Random Variables

f  x  f  y   2  2x 1  y 

As f  x, y   f  x  f  y  ,

 X and Y are not independent.

1 3
E3) (i) P  X  1, Y  3    f  x, y  dy dx
 

1 3
1
   6  x  y dy dx
0 2
8
3
1
1  y2 
    6y  xy    dx
8
0  
2  2
1
1  9 
   6  3   x  3    12  2x  2dx
8 0  2 
1
1  9 
   18  3x    10  2x  dx
8 0  2 
1 1
1 7  1 7 x2  1 7 1  3
    x  dx   x       
802  8 2 2 0 8  2 2  8

P  X  1, Y  3
ii) P  X  1  Y  3 
P  Y  3
2 3
1
where P  Y  3      6  x  y  dy dx
0 2
8
2 3
1  y2 
  6y  xy   dx .
80 2 2
2
1  9 
   18  3x    12  2x  2 dx
8 0  2 
2
1  9 
   18  3x    10  2x   dx
8 0  2 
2
1 7 
   x  dx
8 0  2 
2
1 7 x2 
  x 
8 2 2 0

53
Random Variables and 1 4 
 7   0
Expectation 8 2 
5

8
3 / 8  value of numerator is 
 P  X  1 Y  3 
5 / 8 already calculated in part(i) 

3

5

54
UNIT 8 MATHEMATICAL EXPECTATION Mathematical Expectation

Structure
8.1 Introduction
Objectives

8.2 Expectation of a Random Variable


8.3 Properties of Expectation of One-dimensional Random Variable
8.4 Moments and Other Measures in Terms of Expectations
8.5 Addition and Multiplication Theorems of Expectation
8.6 Summary
8.7 Solutions/Answers

8.1 INTRODUCTION
In Units 1 to 4 of this course, you have studied probabilities of different events
in various situations. Concept of univariable random variable has been
introduced in Unit 5 whereas that of bivariate random variable in Units 6 and
7. Before studying the present unit, we advice you to go through the above
units.
You have studied the methods of finding mean, variance and other measures in
context of frequency distributions in MST-002 (Descriptive Statistics). Here, in
this unit we will discuss mean, variance and other measures in context of
probability distributions of random variables. Mean or Average value of a
random variable taken over all its possible values is called the expected value
or the expectation of the random variable. In the present unit, we discuss the
expectations of random variables and their properties.
In Secs. 8.2, 8.3 and 8.4, we deal with expectation and its properties. Addition
and multiplication laws of expectation have been discussed in Sec. 8.5.
Objectives
After studying this unit, you would be able to:
 find the expected values of random variables;
 establish the properties of expectation;
 obtain various measures for probability distributions; and
 apply laws of addition and multiplication of expectation at appropriate
situations.

8.2 EXPECTATION OF A RANDOM VARIABLE


In Unit 1 of MST-002, you have studied that the mean for a frequency
distribution of a variable X is defined as

55
Random Variables and n
Expectation
f x
i 1
i i
Mean = n
.
 fi
i 1

If the frequency distribution of the variable X is given as


x: x1 x2 x 3 ...x n
f: f1 f2 f 3 ...f n

The above formula of finding mean may be written as


n

f x
i 1
i i
f1x1  f 2 x 2  ...  f n x n
Mean = n
= n

 fi
i 1
f i 1
i

x1f1 x 2 f2 xn fn
= n
 n
 ...  n

 fi
i 1
 fi
i 1
f
i 1
i

     
 f   f   f 
= x1  n i   x2  n 2   ...  x n  n n 
 f   f   f 
 i   i   i 
 i 1   i 1   i1 
f1 f2 fn
Notice that n
, n
,..., n
are, in fact, the relative frequencies or the
f f
i 1
i
i 1
i f
i 1
i

proportion of individuals corresponding to the values x1 , x 2 , …, x n


respectively of variable X and hence can be replaced by probabilities. [See
Unit 2 of this course]
Let us now define a similar measure for the probability distribution of a
random variable X which assumes the values say x1 , x 2 ,..., x n with their
associated probabilities p1 , p2 , ..., p n . This measure is known as expected value
of X and in the similar way is given as
n
x1  p1   x 2  p 2   ...  x n  p n    x i pi with only difference is that the role of
i 1
relative frequencies has now been taken over by the probabilities. The expected
value of X is written as E(X).
The above aspect can be viewed in the following way also:

56
n
Mathematical Expectation
x f
i 1
i i
Mean of a frequency distribution of X is n
, similarly mean of a
 fi
i 1
n

x p
i 1
i i
probability distribution of r.v. X is n
.
p
i 1
i

n
Now, as we know that p
i 1
i  1 for a probability distribution, therefore
n
the mean of the probability distribution becomes  x i pi .
i 1

n
 Expected value of a random variable X is E  X    x i pi .
i 1

The above formula for finding the expected value of a random variable X
is used only if X is a discrete random variable which takes the values
x1 , x 2 , ..., x n with probability mass function
p  x i   P  X  x i  , i  1, 2,..., n.

But, if X is a continuous random variable having the probability density


function f  x  , then in place of summation we will use integration and in
this case, the expected value of X is defined as

E X   xf  x  dx ,


The expectation, as defined above, agrees with the logical/theoretical


argument also as is illustrated in the following example.
Suppose, a fair coin is tossed twice, then answer to the question, “How
many heads do we expect theoretically/logically in two tosses?” is
obviously 1 as the coin is unbiased and hence we will undoubtedly expect
one head in two tosses. Expectation actually means “what we get on an
average”? Now, let us obtain the expected value of the above question
using the formula.
Let X be the number of heads in two tosses of the coin and we are to
obtain E(X), i.e. expected number of heads. As X is the number of heads
in two tosses of the coin, therefore X can take the values 0, 1, 2 and its
probability distribution is given as
X: 0 1 2
1 1 1. [Refer Unit 5 of MST-003]
px :
4 2 4
3
 E  X    x i pi
i 1

= x1p1  x 2 p 2  x 3p 3

57
Random Variables and
1 1 1 1 1
Expectation =  0     1    (2)   = 0    1
4 2 4 2 2
So, we get the same answer, i.e. 1 using the formula also.
So, expectation of a random variable is nothing but the average (mean)
taken over all the possible values of the random variable or it is the value
which we get on an average when a random experiment is performed
repeatedly.
Remark 1: Sometimes summations and integrals as considered in the
above definitions may not be convergent and hence expectations in such
cases do not exist. But we will deal only those summations (series) and
integrals which are convergent as the topic regarding checking the
convergence of series or integrals is out of the scope of this course. You
need not to bother as to whether the series or integral is convergent or
not, i.e. as to whether the expectation exists or not as we are dealing with
only those expectations which exist.
Example 1: If it rains, a rain coat dealer can earn Rs 500 per day. If it is a dry
day, he can lose Rs 100 per day. What is his expectation, if the probability of
rain is 0.4?
Solution: Let X be the amount earned on a day by the dealer. Therefore, X can
take the values Rs 500,  Rs 100 ( loss of Rs 100 is equivalent to negative of
the earning of Rs100).
 Probability distribution of X is given as
Rainy Day Dry day
X  in Rs. : 500 100
px : 0.4 0.6

Hence, the expectation of the amount earned by him is


2
E  X    x i pi = x1p1  x 2 p 2
i 1

=  500  0.4    100  0.6  = 200  60 = 140

Thus, his expectation is Rs 140, i.e. on an overage he earns Rs 140 per day.

Example 2: A player tosses two unbiased coins. He wins Rs 5 if 2 heads


appear, Rs 2 if one head appears and Rs1 if no head appears. Find the
expected value of the amount won by him.
Solution: In tossing two unbiased coins, the sample space, is 
S = HH, HT, TH, TT.

1 2 1
 P  2 heads   , P  one head   , P  no head   .
4 4 4
Let X be the amount in rupees won by him
 X can take the values 5, 2 and 1 with

58
1 Mathematical Expectation
P  X  5  P  2heads   ,
4
2
P  X  2  P 1Head   , and
4
1
P  X  1  P  no Head   .
4
 Probability distribution of X is
X: 5 2 1
1 2 1
px
4 4 4
Expected value of X is given as
3
E  X    x i pi = x1p1  x 2 p 2  x 3p 3
i 1

 1   2   1  5 4 1 10
= 5    2    1  =     2.5.
4 4 4 4 4 4 4
Thus, the expected value of amount won by him is Rs 2.5.
Example 3: Find the expectation of the number on an unbiased die when thrown.
Solution: Let X be a random variable representing the number on a die when thrown.
X can take the values 1, 2, 3, 4, 5, 6 with
1
P  X  1  P  X  2   P  X  3  P  X  4  P  X  5  P  X  6   .
6
Thus, the probability distribution of X is given by
X: 1 2 3 4 5 6
1 1 1 1 1 1
px :
6 6 6 6 6 6
Hence, the expectation of number on the die when thrown is
6
1 1 1 1 1 1 21 7
E  X    x i pi  1  2   3   4   5   6  = 
i 1 6 6 6 6 6 6 6 2
Example 4: Two cards are drawn successively with replacement from a
well shuffled pack of 52 cards. Find the expected value for the number of
aces.
Solution: Let A1, A2 be the events of getting ace in first and second draws,
respectively. Let X be the number of aces drawn. Thus, X can take the
values 0, 1, 2 with

P  X  0  P  no ace   P  A1  A 2 

cards are drawn with replacement 


 P  A1  P  A 2  and hence the events are independent 
 

59
Random Variables and 48 48 12 12 144
Expectation =    = ,
52 52 13 13 169
P  X  1   one Ace and one other card 

 P  A1  A 2    A1  A 2  

 By Addition theorem of probability 


 P  A1  A 2   P  A1  A 2  for mutually exclusive events 
 
 By multiplication theorem of 
 P  A1  P  A 2   P  A1  P  A 2   
 probability for independent events 
4 48 48 4 1 12 12 1 24
         , and
52 52 52 52 13 13 13 13 169
P  X  2   P  both aces   P  A1  A 2 

4 4 1
 P  A1  P  A 2  =   .
52 52 169

Hence, the probability distribution of random variable X is


X: 0 1 2
144 24 1
px :
169 169 169
 The expected value of X is given by
3
144 24 1 26 2
E  X    x i pi  0   1  2  
i 1 169 169 169 169 13
Example 5: For a continuous distribution whose probability density
function is given by:
3x
f x   2  x  , 0  x  2, find the expected value of X.
4
Solution: Expected value of a continuous random variable X is given by
 2 2
3x 3
E X   xf  x  dx =  x  2  x  dx   x 2  2  x  dx
 0
4 40
2 3 4
3
2
3  x3 x 4  3   2  2 
40
2 3
 
=  2x  x dx =  2   =  2
4  3 4  0 4  3

4
 0


3 16 16  3 16
    1
4  3 4  4 12
Now, you can try the following exercises.
E1) You toss a fair coin. If the outcome is head, you win Rs 100; if the
outcome is tail, you win nothing. What is the expected amount won
by you?
60
E2) A fair coin is tossed until a tail appears. What is the expectation of Mathematical Expectation
number of tosses?
E3) The distribution of a continuous random variable X is defined by
 x3 , 0  x 1
 3
f  x    2  x  , 1  x  2
 0 , elsewhere

Obtain the expected value of X.

Let us now discuss some properties of expectation in the next section.

8.3 PROPERTIES OF EXPECTATION OF ONE-


DIMENSIONAL RANDOM VARIABLE
Properties of mathematical expectation of a random variable X are:
1. E(k) = k, where k is a constant
2. E(kX) = kE(X), k being a constant.
3. E(aX + b) = aE(X) + b, where a and b are constants
Proof:
Discrete case:
Let X be a discrete r.v. which takes the values x1 , x 2 , x 3 , ... with
respective probabilities p1 , p2 , p 3 , ...

1. E  k    k p i [By definition of the expectation]


i

= k  pi
i

sum of probabilities of all the 


 k 1  k  possible value of r.v. is 1 
 
2. E  kX     kx i  pi [By def.]
i

= k  x i pi
i

 k E X

3. E  a X  b     ax i  b  p i [By def.]
i

=   ax p
i
i i  bp i    ax i p i   bpi  a  x i pi  b  pi
i i i i

 aE  X   b 1  aE  X   b

61
Random Variables and Continuous Case:
Expectation
Let X be continuous random variable having f(x) as its probability density
function. Thus,

1. E  k    kf  x  dx [By def.]



 k  f  x  dx


integral of the p.d.f. over 


 k 1  k  the entire range is 1 
 

2. E  kX     kx  f  x  dx [By def.]



 k  xf  x  dx  kE  X 


  
3. E  aX  b     ax  b f  x  dx    ax  f  x  dx   b f  x  dx
  

 
 a  x f  x  dx  b  f  x  dx  aE  X   b 1 = aE  X   b
 

Example 6: Given the following probability distribution:


X 2 1 0 1 2

px 0.15 0.30 0 0.30 0.25

Find i) E(X)
ii) E(2X + 3)
iii) E(X2)
iv) E(4X – 5)
Solution
5
i) E  X    x i pi = x1p1  x 2 p 2  x 3 p3  x 4 p 4  x 5p 5
i 1

  2  0.15    1 0.30    0  0   1 0.30    2  0.25 


 0.3  0.3  0  0.3  0.5  0.2
ii) E(2X + 3) = 2 E  X   3 [Using property 3 of this section]
= 2(0.2) + 3 [Using solution (i) of the question]
= 0.4 + 3 = 3.4
5
iii) E  X 2    x i2 pi [By def.]
i 1
62
= x12 p1  x 22 p 2  x 32 p3  x 42 p 4  x 52 p5 Mathematical Expectation
2 2 2 2 2
  2   0.15   1  0.30    0   0   1  0.30    2   0.25 

=  4  0.15   1 0.30    0   1 0.30    4  0.25 


 0.6  0.3  0  0.3  1  2.2
iv) E (4X  5)  E  4X   5  

= 4E  X    5  [Using property 3 of this section]


= 4(0.2)  5
 0.8  5   4.2

Here is an exercise for you.


E4) If X is a random variable with mean ‘’ and standard deviation ‘’,
X
then what is the expectation of Z= ?

[Note: Here Z so defined is called standard random variate.]

Let us now express the moments and other measures for a random variable
in terms of expectations in the following section.

8.4 MOMENTS AND OTHER MEASURES IN


TERMS OF EXPECTATIONS
Moments
The moments for frequency distribution have already been studied by you
in Unit 3 of MST-002. Here, we deal with moments for probability
distributions. The rth order moment about any point ‘A’ (say) of variable X
already defined in Unit 3 of MST-002 is given by:
n
r
f x i i  A
r'  i 1
n

f i 1
i

So, the rth order moment about any point ‘A’ of a random variable X
having probability mass function P  X  x i   p  x i   pi is defined as
n
r
p x i i  A
r'  i 1
n

p i 1
i

[Replacing frequencies by probabilities as discussed in Sec. 8.2 of this unit.]


n
r  n 
  pi  xi  A   p i  1
i 1  i1 
63
Random Variables and The above formula is valid if X is a discrete random variable. But, if X is a
Expectation
continuous random variable having probability density function f(x), then

r
rth order moment about A is defined as  r '    x  A  f  x  dx.

th
So, r order moment about any point ‘A’ of a random variable X is defined as
  pi  x i  A r , if X is a discrete r.v.
 i
r'   
   x  A  r f  x  dx, if X is a continous r.v
 
= E(X  A)r
Similarly, rth order moment about mean () i.e. rth order central moment is
defined as
 p i  x i    r , if X is a discrete r.v.
 i
r   
   x    r f  x  dx, if X is a continous r.v
 
r r
= E  X     E  X  E  X  

Variance
Variance of a random variable X is second order central moment and is
defined as
2 2
 2 = V  X   E  X     E  X  E  X  
Also, we know that
2
V  X    2 '  1 ' 

where 1 ',  2 ' be the moments about origin.


2
 
 We have V  X   E X 2   E  X  

1 '  E  X  01  E  X  , and  2 '  E  X  0 2  E X 2   


 
Theorem 8.1: If X is a random variable, then V  aX  b   a 2 V  X  ,
where a and b are constants.
2
Proof: V  aX  b   E  aX  b   E  aX  b   [By def. of variance]
2
= E aX  b   aE  X   b   [Using property 3 of Sec. 8.3]
2
= E aX  b  aE  X   b 
2
= E a X  E  X 
2
 E a 2  X  E  X   
 

64
Mathematical Expectation
2
 a 2 E  X  E  X   [Using property 2 of section 8.3]

 a 2V  X  [By definition of Variance]

Cor. (i) V  aX   a 2 V  X 

(ii) V(b) = 0
(iii) V(X + b) = V(X)

Proof: (i) This result is obtained on putting b = 0 in the above theorem.


(ii) This result is obtained on putting a = 0 in the above theorem.
(iii) This result is obtained on putting a = 1 in the above theorem.
Covariance
For a bivariate frequency distribution, you have already studied in Unit 6 of
MST-002 that covariance between two variables X and Y is defined as

f xi i  x  y i  y 
Cov  X, Y   i

f
i
i

 For a bivariate probability distribution, Cov (X, Y) is defined as


 pi j  x i  x  yi  y  , if  X, Y  is two-dimensional discrete r.v.
 i
Cov(X, Y) =   
    x  x  y  y  f  x, y  dydx, if  X, Y  is two dimensional continuous r.v.
  

where p i j = P  X  x i , Y  y j 

= E  X  X  Y  Y  [By definition of expectation]

E(X) =Mean of X i.e. X,


 E  X  E  X    Y  E  Y     
 E(Y) = Mean of Y i.e.Y 
On simplifying,
Cov(X, Y) = E(XY) – E(X) E(Y).
Now, if X and Y are independent random variables then, by multiplication
theorem,
E(XY) = E(X)E(Y) and hence in this case Cov(X, Y) = 0.
Remark 2:
i) If X and Y are independent random variables, then
V(X + Y) = V(X) +V(Y).

65
Random Variables and 2
Expectation Proof: V  X  Y   E  X  Y   E  X  Y  

2
 E  X  Y  E  X   E  Y  

2
 E X  E  X   Y  E  Y 

2 2
 E X  E  X   Y  E  Y   2 X  E  X  Y  E  Y 
 
2 2
 E  X  E  X    E  Y  E  Y    2E  X  E  X    Y  E  Y   

 V  X   V  Y   2Cov  X, Y 

 V X  V Y  0 [ X and Y are independent]

 V X  V Y

ii) If X and Y are independent random variables, then


V(X – Y) = V(X) + V(Y).
Proof: This can be proved in the similar manner as done is Remark 2(i)
above.
iii) If X and Y are independent random variables, then

V  aX  bY   a 2 V  X   b 2 V  Y  .

Proof: Prove this result yourself proceeding in the similar fashion as in


proof of Remark 2(i).

Mean Deviation about Mean


Mean deviation about mean in context of frequency distribution is
n

f
i 1
i  xi  x 
n
, and
f i 1
i

therefore, mean deviation about mean in context of probability distribution is


n

p i  x  mean  n
i 1
n
  p i  x  mean 
p
i 1
i
i 1

 by definition of expectation, we have


M.D. about mean = EX  Mean
= EX – E(X)

66
 pi  x  Mean  for discrete r.v Mathematical Expectation


   x  Mean  f  x  dx for continuous r.v
 
Note: Other measures as defined for frequency distributions in MST-002
can be defined for probability distributions also and hence can be
expressed in terms of the expectations in the manner as the moments;
variance and covariance have been defined in this section of the Unit.
Example 7: Considering the probability distribution given in Example 6, obtain
i) V(X)
ii) V(2X + 3).
Solution:
2
 
(i) V  X   E X 2   E  X  

2 The values have already been obtained 


 2.2   0.2  in the solution of Example 6 
 
 2.2  0.04 = 2.16
2
(ii) V  2X  3   2  V  X  [Using the result of Theorem 8.1]

= 4V(X) = 4(2.16) = 8.64

Example 8: If X and Y are independent random variables with variances 2


and 3 respectively, find the variance of 3X + 4Y.

Solution: V(3X + 4Y) = (3)2 V(X) + (4)2 V(Y) [By Remark 3 of Section 8.4]

= 9(2) + 16( 3) = 18 + 48 = 66

Here are two exercises for you:

E5) If X is a random variable with mean  and standard deviation , then


X
find the variance of standard random variable Z = .

E6) Suppose that X is a random variable for which E(X) = 10 and


V(X) = 25. Find the positive values of a and b such that Y = aX – b
has expectation 0 and variance 1.

8.5 ADDITION AND MULTIPLICATION


THEOREMS OF EXPECTATION
Now, we are going to deal with the properties of expectation in case of
two-dimensional random variable. Two important properties, i.e. addition
and multiplication laws of expectation are discussed in the present section.

67
Random Variables and
Expectation
Addition Theorem of Expectation
Theorem 8.2: If X and Y are random variables, then E  X  Y   E  X   E  Y 

Proof:
Discrete case:
Let (X, Y) be a discrete two-dimensional random variable which takes up
the values (xi, yj) with the joint probability mass function
pij = P  X  x i  Y  y j  .

Then, the probability distribution of X is given by


pi  p  x i   P  X  x i 

 event X = xi can happen with 


 P  X  x i  Y  y1   P  X  x i  Y  y 2   ...  
Y=y1 orY=y2 orY=y3 or... 
= pi1  pi2  pi3  ...

  p ij
j

Similarly, the probability distribution of Y is given by

p j  p  y j   P  Y  y j  = p ij
i

 E  X    x i pi , E  Y    y jp j  and E  X  Y     x i  y j p ij
i j i j

Now E  X  Y     x i  y j p ij
i j

  x i pij   y jpij
i j i j

= x p   y p
i
i
j
ij
j
j
i
ij

[ in the first term of the right hand side, xi is free from j and hence can
be taken outside the summation over j; and in second term of the right
hand side, yj is free from i and hence can be taken outside the summation
over i.]

 E  X  Y    x i pi   y j p j  = E  X   E  Y 
i j

Continuous Case:
Let (X, Y) be a bivariate continuous random variable with probability
density function f  x, y  . Let f  x  and f  y  be the marginal
probability density functions of random variables X and Y respectively.

68
 
Mathematical Expectation
 E X   x f  x  dx, E  Y    y f  y dy,
 

 
and E  X  Y      x  y  f  x, y  dy dx .
 

 
Now, E  X  Y      x  y  f  x, y  dy dx
 

   
   x f  x, y  dy dx    y f  x, y  dy dx
   


  
 
 x
   f  x, y  dy  dx   y   f  x, y  dx  dy
    
[ in the first term of R.H.S., x is free from the integral w.r.t. y and
hence can be taken outside this integral. Similarly, in the second term of
R.H.S, y is free from the integral w.r.t. x and hence can be taken outside
this integral.]
 
 Refer to the definition of marginal density 
  x f  x  dx   y f  y  dy function given in Unit 7 of this course 
   
 E X  E Y

Remark 3: The result can be similarly extended for more than two random
variables.
Multiplication Theorem of Expectation
Theorem 8.3: If X and Y are independent random variables, then
E(XY) = E(X) E(Y)
Proof:
Discrete Case:
Let (X, Y) be a two-dimensional discrete random variable which takes up the
values  x i , y j  with the joint probability mass function
pij  P  X  x i  Y  y j  . Let pi and p j' be the marginal probability mass
functions of X and Y respectively.
 E  X    x i pi , E  Y    y jp j' , and
i j

E  XY     x i y j  p ij
i j

But as X and Y are independent,


 p ij  P  X  x i  Y  y j 

69
Random Variables and
Expectation  if events A and B are independent,
= P  X  x i  P  Y  y j   
 then P  A  B   P  A  P  B  

 p i p j'

Hence, E(XY) =   x y  p p
i j
i j i j
'

=  x i y jpi p j'
i j


   x i p i y j p j' 
i j

 x i pi is free from j and hence can be 


  x i pi  y j p j'  taken outside the summation over j 
i j  
=E(X) E(Y)
Continuous Case:
Let (X, Y) be a bivariate continuous random variable with probability
density function f(x, y). Let f(x) and f(y) be the marginal probability
density function of random variables X and Y respectively.
 
 E X   x f  x  dx, E  Y    y f  y  dy ,
 

 
and E  XY     xy f  x, y  dy dx .
 

 
Now E  XY     xy f  x, y  dy dx
 

 
 X and Y are independent, f(x,y)=f(x)f(y)
   xy f  x  f  y  dy dx
 
 (see Unit 7 of this course)



 
     x f  x    yf  y    dy dx
   

   
   x f  x  dx   y f  y  dy 
    

 E X E Y

Remark 4: The result can be similarly extended for more than two
random variables.
Example 8: Two unbiased dice are thrown. Find the expected value of
the sum of number of points on them.
Solution: Let X be the number obtained on the first die and Y be the
number obtained on the second die, then
70
7 7 Mathematical Expectation
E X  and E  Y   [See Example 3 given in Section 8.2]
2 2
 The required expected value = E(X + Y)
 Using addition theorem 
= E(X) + E(Y)  
 of expectation 
7 7
=  =7
2 2
Remark 5: This example can also be done considering one random
variable only as follows:
Let X be the random variable denoting “the sum of numbers of points on
the dice”, then the probability distribution in this case is
X: 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 5 4 3 2 1
p(x) :
36 36 36 36 36 36 36 36 36 36 36
1 2 1
and hence E(X) = 2   3   ...  12  =7
36 36 36
Example 9: Two cards are drawn one by one with replacement from 8
cards numbered from 1 to 8. Find the expectation of the product of the
numbers on the drawn cards.
Solution: Let X be the number on the first card and Y be the number on
the second card. Then probability distribution of X is
X 1 2 3 4 5 6 7 8
px 1 1 1 1 1 1 1 1
8 8 8 8 8 8 8 8

and the probability distribution of Y is


Y 1 2 3 4 5 6 7 8

p  y 1 1 1 1 1 1 1 1
8 8 8 8 8 8 8 8

1 1 1
 E  X   E  Y   1  2   ...  8 
8 8 8
1 1 9
 1  2  3  4  5  6  7  8    36  
8 8 2
Thus, the required expected value is
E  XY   E  X  E  Y  [Using multiplication theorem of expectation]

9 9 81
   .
2 2 4

71
Random Variables and
Expectation
Expectation of Linear Combination of Random Variables
Theorem 8.4: Let X1 , X 2 , ..., X n be any n random variables and if
a1 , a 2 , ..., a n are any n constants, then

E  a1X1  a 2 X 2  ...  a n X n   a1E  X1   a 2 E  X 2   ...  a n E  X n 

[Note : Here a1X1  a 2 X 2  ...  a n X n is a linear combination of X1, X2, ... , Xn]

Proof: Using the addition theorem of expectation, we have


E  a1X1  a 2 X 2  ....  a n X n   E  a1X1   E  a 2 X 2   ...  E  a n X n 

= a1E  X1   a 2 E  X 2   ...  a n E  X n  .

[Using second property of Section 8.3 of the unit]


Now, you can try the following exercises.
E7) Two cards are drawn one by one with replacement from ten cards
numbered 1 to 10. Find the expectation of the sum of points on two
cards.
E8) Find the expectation of the product of number of points on two dice.

Now before ending this unit, let’s summarize what we have covered in it.

8.6 SUMMARY
The following main points have been covered in this unit:
1) Expected value of a random variable X is defined as
n
E  X    x i pi , if X is a discrete random variable
i 1


  xf  x  dx , if X is a continuous random variable.


2) Important properties of expectation are:


i) E(k) = k, where k is a constant.
ii) E(kX) = kE(X), k being a constant.
iii) E(aX + b) = aE(X) + b, where a and b are constants
iv) Addition theorem of Expectation is stated as:
If X and Y are random variables, then E  X  Y   E  X   E  Y  .

v) Multiplication theorem of Expectation is stated as:


If X and Y are independent random variables, then
E(XY) = E(X)E(Y).

72
vi) If X1 , X 2 , ..., X n be any n random variables and if a1 , a 2 , ..., a n are any n Mathematical Expectation

constants, then
E  a1X1  a 2 X 2  ...  a n X n   a1E  X1   a 2 E  X 2   ...  a n E  X n  .

3) Moments and other measures in terms of expectation are given as:


i) r th order moment about any point is given as

  pi  x i  A  r , if X is a discrete r.v.
 i
'
r   
   x  A r f  x  dx, if X is a continous r.v
 

= E(X  A)r
ii) Variance of a random variable X is given as
2 2
V  X   E  X     E  X  E  X  

pi  xi  x  yi  y , if  X, Y is discrete r.v.


 i
iii) Cov(X,Y) =   
    x  x  y  y f  x, y dydx, if  X, Y is continuous r.v.
 

 E  X  E  X    Y  E  Y   

= E(XY) – E(X) E(Y).


iv) M.D. about mean = EX – E(X)
 pi  x  Mean  for discrete r.v


   x  Mean  f  x  dx for continuous r.v
 
If you want to see what our solutions to the exercises in the unit are, we
have given them in the following section.

8.7 SOLUTIONS/ANSWERS
E1) Let X be the amount (in rupees) won by you.
1
 X can take the values 100, 0 with P[X = 100] = P[Head] = , and
2
1
P[X = 0] = P[Tail] = .
2
 probability distribution of X is
X: 100 0
1 1
px
2 2

73
Random Variables and and hence the expected amount won by you is
Expectation
1 1
E  X   100   0  = 50.
2 2
E2) Let X be the number of tosses till tail turns up.
 X can take values 1, 2, 3, 4… with
1
P[X = 1] = P[Tail in the first toss] =
2
2
1 1 1
P[X = 2] = P[Head in the first and tail in the second toss] =    ,
2 2 2
3
1 1 1 1
P[X = 3] = P[HHT] =      , and so on.
2 2 2 2

 Probability distribution of X is
X: 1 2 3 4 5...
2 3 4 5
1 1 1 1 1
px         ...
2 2 2 2 2
and hence
2 3 4
1 1 1 1
E  X   1  2     3     4     ... … (1)
2 2 2 2
1
Multiplying both sides by , we get
2
2 3 4 5
1 1 1 1 1
E  X      2     3     4     ...
2 2 2 2 2
2 3 4
1 1 1 1
 E  X      2     3     ... … (2)
2 2 2 2
[Shifting the position one step towards right so that we get the
terms having same power at the same positions as that in (1)]
Now, subtracting (2) from (1), we have
2 3 4
1 1 1 1 1
E  X   E  X             ...
2 2 2 2 2
2 3 4
1 1 1 1 1
 E X          
2 2 2 2 2
2 3
1 1 1
 E  X   1         ...
2 2 2

74
(Which is an infinite G.P. with first term a = 1 and common ratio Mathematical Expectation
1
r= )
2
1 a
 [ S  (see Unit 3of course MST - 001)]
1 1 r
1
2
1
=  2.
1
2

E3) E  X    x f  x  dx


0 1 2 
  x f  x  dx   x f  x  dx   x f  x  dx   x f  x  dx
 0 1 2

0 1 2 
3
 x  0  dx   x  x  dx   x  2  x 
3
 dx   x  0  dx
 0 1 2

1 2
 0   x 4 dx   x 8  x 3  6x  2  x   dx  0
0 1

1 2


  x 4dx   8x  x 4  12x 2  6x 3 dx 
0 1

1 2
 x5   x 2 x5 x3 x4 
    8   12  6 
 5 0  2 5 3 4 1

1   8  2   2  12  2  6  2    8 1 1 12 1 6 1 


2 5 3 4 2 5 3 4

          
5   2 5 3 4   2 5 3 4 
 

1  32   1 3 
   16   32  24   4   4   
5  5   5 2 

1  8 13  1 3 1
      .
5  5 10  5 10 2
E4) As X is a random variable with mean ,
 E(X) =  ... (1)

 expectation is nothing but simply the average taken over all 


 the possible values of random variable as defined in Sec. 8.2 
 

75
Random Variables and
 X
Expectation Now, E  Z  E  
  
1 
 E   X   
 
1
 E X   [Using Property 2 of Sec. 8.3]

1
  E  X     [Using Property 3 of Sec. 8.3]

1
    [Using (1)]

=0
Note: Mean of standard random variable is zero.
X
E5) Variance of standard random variable Z = is given as

 X X 
V(Z) = V   =V  
     
1   
 V  X    
   
2
1  Using the result of the Theorem 8.1 
   V X of Sec. 8.5 of this unit 
  
1
= VX
2
1  it is given that the standard deviation 

 2  
2 = 1  2 
of X is and hence its variance is  
Note: The mean of standard random variate is ‘0’ [See (E4)] and its
variance is 1.
E6) Given that E  Y   0  E(a X  b) = 0  a E(X) – b = 0
 a(10) – b = 0
 10 a – b = 0 ... (1)
Also as V(Y) = 1,
hence V(aX  b) = 1
1
 a2V(X) = 1  a2(25) = 1  a2 =
25
1
 a= [ a is positive]
5
 From (1), we have
1
10    b  0  2 – b = 0  b = 2
5
76
1 Mathematical Expectation
Hence, a = , b = 2.
5
E7) Let X be the number on the first card and Y be the number on the
second card. Then probability distribution of X is:
X 1 2 3 4 5 6 7 8 9 10

px 1 1 1 1 1 1 1 1 1 1
10 10 10 10 10 10 10 10 10 10

and the probability distribution of Y is


X 1 2 3 4 5 6 7 8 9 10

px 1 1 1 1 1 1 1 1 1 1
10 10 10 10 10 10 10 10 10 10

1 1 1
 E(X) = E(Y) = 1  2   ...  10 
10 10 10
1 1
 1  2  3  4  5  6  7  8  9  10 =  55  5.5
10 10
and hence the required expected value is
E  X  Y   E  X   E  Y  = 5.5 + 5.5 = 11

E8) Let X be the number obtained on the first die and Y be the number
obtained on the second die.
7
Then E(X) = E(Y) = . [See Example 3 given in Section 8.2]
2
Hence, the required expected value is
E  XY   E  X  E  Y  [Using multiplication theorem of expectation]

7 7 49
=  = .
2 2 4

77
Binomial Distribution
UNIT 9 BINOMIAL DISTRIBUTION
Structure
9.1 Introduction
Objectives

9.2 Bernoulli Distribution and its Properties


9.3 Binomial Probability Function
9.4 Moments of Binomial Distribution
9.5 Fitting of Binomial Distribution
9.6 Summary
9.7 Solutions/Answers

9.1 INTRODUCTION
In Unit 5 of the Course, you have studied random variables, their probability
functions and distribution functions. In Unit 8 of the Course, you have come to
know as to how the expectations and moments of random variables are
obtained. In those units, the definitions and properties of general discrete and
probability distributions have been discussed.
The present block is devoted to the study of some special discrete distributions
and in this list, Bernoulli and Binomial distributions are also included which
are being discussed in the present unit of the course.
Sec. 9.2 of this unit defines Bernoulli distribution and its properties. Binomial
distribution and its applications are covered in Secs. 9.3 and 9.4 of the unit.
Objectives
Study of the present unit will enable you to:
 define the Bernoulli distribution and to establish its properties;
 define the binomial distribution and establish its properties;
 identify the situations where these distributions are applied;
 know as to how binomial distribution is fitted to the given data; and
 solve various practical problems related to these distributions.

9.2 BERNOULLI DISTRIBUTION AND ITS


PROPERTIES

There are experiments where the outcomes can be divided into two categories
with reference to presence or absence of a particular attribute or characteristic.
A convenient method of representing the two is to designate either of them as
success and the other as failure. For example, head coming up in the toss of a
fair coin may be treated as a success and tail as failure, or vice-versa.
Accordingly, probabilities can be assigned to the success and failure.
5
Discrete Probability
Distributions
Suppose a piece of a product is tested which may be defective (failure) or non-
defective (a success). Let p the probability that it found non-defective and
q = 1 – p be the probability that it is defective. Let X be a random variable
such that it takes value 1 when success occurs and 0 if failure occurs.
Therefore,
P  X  1  p, and
P  X  0  q  1  p .
The above experiment is a Bernoulli trial, the r.v. X defined in the above
experiment is a Bernoulli variate and the probability distribution of X as
specified above is called the Bernoulli distribution in honour of J. Bernoulli
(1654-1705).

Definition

A discrete random variable X is said to follow Bernoulli distribution with


parameter p if its probability mass function is given by
p x 1  p 1 x ; x  0,1
P X  x   
 0 ; elsewhere
11
i.e. P  X  1  p1 1  p  p [putting x = 1]
1 0
and P  X  0  p 0 1  p   1  p [putting x = 0]
The Bernoulli probability distribution, in tabular form, is given as

X 0 1
px 1p p

Remark 1: The Bernoulli distribution is useful whenever a random experiment


has only two possible outcomes, which may be labelled as success and failure.

Moments of Bernoulli Distribution

The rth moment about origin of a Bernoulli variate X is given as


'r  E  X r 
1
  xrp  x  [See Unit 8 of this course]
x 0
r r
  0  p  0   1 p 1
  0 1  p   1 p
=p
 1'  p, '2  p, 3'  p,  '4  p .
Hence,
Mean = 1'  p,
2
 
Variance  2  '2  1'    p  p 2  p 1  p  ,
3
Third order central moment  3   3'  3 2' 1'  2 1'    
3
 p  3pp  2  p 

6
Binomial Distribution
 p  3p2  2p3
 
 p 2p 2  3p  1  p  2p  1 p  1
 p 1  p 1  2p 
2 4
Fourth order central moment ( 4 )  '4  4 3' 1'  6 2' 1'    
 3 1'
2 4
 p  4p.p  6p  p   3  p 
 p  4p 2  6p 3  3p 4
 p 1  4p  6p 2  3p3 
= p 1  p  1  3p  3p 2 

[Note: For relations of central moments in terms of moments about origin, see
Unit 3 of MST-002.]

Example 1: Let X be a random variable having Bernoulli distribution with


paramete p = 0.4. Find its mean and variance.
Solution:
Mean = p = 0.4,
Variance = p(1  p) = (0.4) (1  0.4) = (0.4) (0.6) = 0.24

Single trial is taken into consideration in Bernoulli distribution. But, if trials


are performed repeatedly a finite number of times and we are interested in the
distribution of the sum of independent Bernoulli trials with the same
probability of success in each trial, then we need to study binomial distribution
which has been discussed in the next section.

9.3 BINOMIAL PROBABILITY FUNCTION


Here, is this section, we will discuss binomial distribution which was
discovered by J. Bernoulli (1654-1705) and was first published eight years
after his death i.e. in 1713 and is also known as “Bernoulli distribution for n
trials”. Binomial distribution is applicable for a random experiment comprising
a finite number (n) of independent Bernoulli trials having the constant
probability of success for each trial.
Before defining binomial distribution, let us consider the following example:
Suppose a man fires 3 times independently to hit a target. Let p be the
probability of hitting the target (success) for each trial and q   1  p  be the
probability of his failure.
Let S denote the success and F the failure. Let X be the number of successes in
3 trials,
P[X = 0] = Probability that target is not hit at all in any trial
= P [Failure in each of the three trials]
 P F  F  F
= P  F  .P  F  .P  F  [ trials are independent]
 q.q.q
 q3

7
Discrete Probability
Distributions
This can be written as

P  X  0   3 C 0 p 0 q 3 0
n
[ 3 C0  1, p0  1, q 30  q3 . Recall n C x  (see Unit 4 of MST-001)]
x nx
P[X = 1] = Probability of hitting the target once
= [(Success in the first trial and failure in the second and third trial)
or (success in the second trial and failure in the first and third
trials) or (success in the third trial and failure in the first two
trials)]
 P  S  F  F  or  F  S  F  or  F  F  S 
 P  S  F  F   P  F  S  F   P  F  F  S
= P  S .P  F  .P  F   P  F  .P S  .P  F   P  F  .P  F  .P  S
[ trials are independent]
 p.q.q  q.p.q  q.q.p
 pq 2  pq 2  pq 2
 3pq 2
This can also be written as
P  X  1  3C1p1q 31 [ 3
C1  3, p1  p, q 31  q 2 ]

P[X = 2] = Probability of hitting the target twice


= P[(Success in each of the first two trials and failure in the third
trial) or (Success in first and third trial and failure in the second
trial) or (Success in the last two trials and failure in the first
trial)]
 P  S  S  F    S  F  S   F  S  S 
= P  S  S  F  P  S  F  S   P  F  S  S 

 P  S .P  S  .P  F   P  S .P  F  .P S   P  F  .P S  .P S
 p.p.q  p.q.p  q.p.p
= 3 p2q
This can also be written as

P  X  2   3 C 2 p 2 q 3 2 [ 3 C 2  3, q 3 2  q ]

P[X = 3] = Probability of hitting the target thrice


= [Success in each of the three trials]
= P S  S  S
= P  S .P  S .P  S
= p.p.p
= p3

8
This can also be written as Binomial Distribution

P  X  3  3C3 p3q 33 [ 3C3  1, q 33  1 ]

From the above four enrectangled results, we can write

P  X  r   3C r p r q 3 r ; r  0, 1, 2, 3 .
which is the probability of r successes in 3 trials. 3 Cr , here, is the number of
ways in which r successes can happen in 3 trials.

The result can be generalized for n trials in the similar fashion and is given as
P  X  r   n C r p r q n  r ; r  0, 1, 2,..., n.

This distribution is called the binomial probability distribution. The reason


behind giving the name binomial probability distribution for this probability
distribution is that the probabilities for x = 0, 1, 2, …, n are the respective
probabilities n C0 p 0q n 0 , n C1p1q n 1 , ..., n C n p n q n n which are the successive
terms of the binomial expansion (q + p)n.

[ (q +p) n = n C0q n p0  n C1q n 1p1  ...  n Cn q 0 p n ]

Binomial Expansion:
‘Bi’ means ‘Two’. ‘Binomial expansion’ means ‘Expansion of expression
having two terms, e.g.
(X  Y) 2  X 2  2XY  Y 2  2C 0 X 2 Y 0  2 C1X 21Y1  2C 2 X 2  2 Y 2 ,
3
X  Y  X3  3X 2 Y  3XY 2  Y 3
 3C0 X3Y 0  3C1X31Y1  3C 2 X32 Y 2  3C3 X33Y 3
So, in general,
n
 X  Y   n C0 Xn Y0  n C1 X n 1Y1  n C2 X n 2 Y 2  ...  n Cn X n n Y n

The above discussion leads to the following definition.


Definition:
A discrete random variable X is said to follow binomial distribution with
parameters n and p if it assumes only a finite number of non-negative integer
values and its probability mass function is given by
 n C p x q n  x ; x  0, 1, 2, ..., n
P X  x    x
 0; elsewhere
where, n is the number of independent trials,
x is the number of successes in n trials,
p is the probability of success in each trial, and
q = 1 – p is the probability of failure in each trial.

9
Discrete Probability
Distributions Remark 2:
i) The binomial distribution is the probability distribution of sum of n
independent Bernoulli variates.
ii) If X is binomially distributed r.v. with parameters n and p, then we may
write it as X ~ B(n, p).

iii) If X and Y are two binomially distributed independent random variables


with parameters (n1, p) and (n2, p) respectively then their sum also follows a
binomial distribution with parameters n1  n 2 and p. But, if the probability
of success is not same for the two random variables then this property does
not hold.

Example 2: An unbiased coin is tossed six times. Find the probability of


obtaining
(i) exactly 3 heads
(ii) less than 3 heads
(iii) more than 3 heads
(iv) at most 3 heads
(v) at least 3 heads
(vi) more than 6 heads

Solution: Let p be the probability of getting head (success) in a toss of the


coin and n be the number of trials.
1 1 1
 n = 6, p = and hence q = 1 – p = 1 –  .
2 2 2

Let X be the number of successes in n trials,


 by binomial distribution, we have
P  X  x   n C x p x q n  x ; x  0, 1, 2, ..., n
x 6 x
1 1
 6Cx     ; x  0, 1, 2, ..., 6
2 2
6
1
 6 C x   ; x  0, 1, 2, ...,6 .
2
1 6
= . C x ; x  0, 1, 2, ..., 6.
64
Therefore,
(i) P[exactly 3 heads] = P [X = 3]
1 1  6 5 4  5
  6 C3     
64 64  3  2  16
n
[ Recall n C x  (see Unit 4 of MST- 001)]
x nx
(ii) P[less than 3 heads] = P[X < 3]
 P  X  2 or X  1 or X  0
 P  X  2  P  X  1  P  X  0
1 6 1 1
 . C 2  . 6C1  . 6C 0
10 64 64 64
1 6 1 65 Binomial Distribution

 C 2  6 C1  6 C0  =   6  1
64 64  2 
22 11
=  .
64 32
(iii) P[more than 3 heads] = P[X > 3]
 in 6 trials one can 
= P[X = 4 or X = 5 or X = 6]  
 have at most 6 heads 

 P  X  4  P  X  5  P  X  6 

1 6 1 1
= . C 4  . 6 C5  . 6 C6
64 64 64
1 6
  C 4  6C5  6C6 
64
1 65  22 11
=   6  1   .
64  2  64 32
(iv) P[at most 3 heads] = P [3 or less than 3 heads]
= P  X  3  P  X  2  P  X  1  P  X  0

1 6 1 1 1
= . C3  . 6 C2  . 6C1  . 6C0
64 64 64 64
1 6 6 6 6
  C3  C 2  C1  C0 
64 
1 42 21
  20  15  6  1   .
64 64 32
(v) P[at least 3 heads] = P[3 or more heads]
= P  X  3  P  X  4   P  X  5  P  X  6 
or
= 1   P  X  0  P  X  1  P  X  2 
 sum of probabilities of all possible 
 values of a random variable is 1 
 

 11   Already obtained in 
= 1    part (ii) of this example 
 32   

21
 .
32
(vi) P [more than 6 heads] = P [7 or more heads]
  in six tosses, it 
= P [an impossible event] is impossible to get 
 more than six heads 
=0

11
Discrete Probability
Distributions
Example 3: The chances of catching cold by workers working in an ice
factory during winter are 25%. What is the probability that out of 5 workers 4
or more will catch cold?

Solution: Let catching cold be the success and p be the probability of success
for each worker.
 Here, n = 5, p = 0.25, q = 0.75 and by binomial distribution
P  X  x   n C x p x q n  x ; x  0, 1, 2, ..., n
x 5 x
 5C x  0.25   0.75  ; 0, 1, 2, ...,5
Therefore, the required probability = P[X  4]
= p  X  4 or X  5
 P  X  4   P  X  5
4 1 5 0
 5C4  0.25  0.75  5C5  0.25   0.75 
  5  0.002930   1 0.000977 
 0.014650  0.000977
= 0.015627

Example 4: Let X and Y be two independent random variables such that


X ~ B(4, 0.7) and Y ~ B(3, 0.7). Find P[X + Y  1].

Solution: We know that if X and Y are independent random variables each


following binomial distribution with parameters (n1, p) and (n2, p), then
X + Y ~ B( n1  n 2 , p).
Therefore, here X + Y follows binomial distribution with parameters 4 + 3 and
0.7, i.e. 7 and 0.7. So, here, n = 7 and p = 0.7.

Thus, the required probability = [X + Y  1]


= P[X + Y=1] + P[X + Y= 0]
1 6 0 7
 7 C1  0.7   0.3  7 C0  0.7   0.3
 7(0.7)(0.000729)  1(1)(0.0002187)
 0.0035721  0.0002187
 0.0037908

Now, we are sure that you can try the following exercises:

1
E1) The probability of a man hitting a target is . He fires 5 times. What is the
4
probability of his hitting the target at least twice?

E2) A policeman fires 6 bullets on a dacoit. The probability that the dacoit will
be killed by a bullet is 0.6. What is the probability that the dacoit is still
alive?

12
Binomial Distribution
9.4 MOMENTS OF BINOMIAL DISTRIBUTION

The r th order moment about origin of a binomial variate X is given as


n

 
'r  E X r   x r .P  X  x 
x0
n
 1'  E  X    x .P  X  x 
x 0
n
=  x. n C x p x q n  x  P  X  x   n C x p x q n  x ; x  0, 1, 2, ..., n 
x 0
n
 first term with x = 0 will be zero 
  x. n C x p x q n  x  
x 1 and hence we may start from x  1 
n
n
  x. . n 1C x 1 p x q n  x
x 1 x

 n n n n 1 n n 1 
 C x  x n  x  x x  1  n  1   x  1  x C x 1 ,
 
 see Unit 4 of MST  001 
 
n
  n. n 1C x 1 p x 1.p1.q 
n 1   x 1
[n  x = (n  1)  (x  1)]
x 1
n
 np  C x 1 p x 1.q  n 1  x 1
n 1

x 1

 n 1 C 0 p 0 q  n 10  n 1C1 p1 q  n 11  n 1C 2 p 2q  n 1 2  ...


= np  
  n 1C n 1.p n 1 q  n 1  n 1 

Sum of probabilities of all possible values of a 


 np   
 binomial variate with parameters n  1 and p 

 sum of probabilities of all possible 


 np 1  values of a random variable is 1 
 
= np.

 Mean = First order moment about origin


= 1'
= np.

Mean = np

n n
 
'2  E X 2   x 2 .P  X  x   x 2 . n C x p x q n  x
x 0 x 0
2
Here, we will write x as x  x  1  x [ x  x  1  x  x 2  x  x  x 2 ]
This is done because in the following expression, we get x  x  1 in the
denominator:
13
Discrete Probability
Distributions  n n n  n  1 n  2 
 C x   
 x  n  x x  x  1 x  2   n  2    x  2  
 
 n  n  1 n 2 
 . C x 2
 x  x  1 
 
n
 '2    x  x  1  x  n C x p x q n  x
x 0
n n
  x  x  1 n C x p x q n  x   x. n C x p x q n  x
x 0 x 0
n
 
  x  x  1 n C x p x q  n 2  x  2   1'
 
 x 2 
 n
n  n  1 n  2 
  x  x  1. C x  2 p x q n  x   1'
 x 2 x  x  1 
 n 
  n  n  1 n 2 C x  2 p x 2 .p 2q  n 2  x 2   1'
 x 2 
n
 
  n  n  1 p 2  n  2 C x  2 p 2 q  n 2  x  2   1'
 x 2 
Sum of probabilities of all possible values of a 
 n  n  1 p 2   '
  1
 binomial variate with parameters n  2 and p 

= n(n – 1) p2 (1) + np 1'  np 


= n2 p2 – np2 + np
2
Variance (2) =  2  1   [See Unit 3 of MST-002]
= n2p2–n p2 + np – (np)2
= n2p2 – np2 + np – n2p2
= np – np2
= np(1 – p)
= npq

 Variance = npq
n
'3   x 3 .P  X  x 
x 0

Here, we will write x 3 as x  x  1 x  2   3x  x  1  x

Let x 3  x  x  1 x  2   Bx  x  1  Cx
Comparing coefficients of x 2 , we have
0 =– 3+B B3
Comparing coeffs of x, we have
0=2–B+CC=B–2=3–2C=1

14
n Binomial Distribution
 '3    x  x  1 x  2   3x  x  1  x  .n C x p x q n  x
x 0

n n n
  x  x  1 x  2  n C x p x q n  x  3 x  x  1 n C x p x q n  x   x. n C x p x q n  x
x 0 x 0 x0

n
n n  1 n  2 n 3
  x  x  1 x  2  . . C x 3p x q n  x  3  n  n  1 p 2    np 
x 0 x x 1 x  2
[The expression within brackets in the second term is the first term of
R.H.S. in the derivation of '2 and the expression in the third term is 1' as
already obtained.]
 n n n  n  1 n  2  n  3 
 C x   
 x n  x x  x  1 x  2  x  3  n  3   x  3 
 
 n  n  1 n  2  n 3 
 . C x 3
 x  x  1 x  2  
n
  n  n  1 n  2  .n 3 C x 3p 3p x 3q  n 3 x 3  3n  n  1p 2  np
x 3
n
 n  n  1 n  2  p3  n 3 C x 3 p x 3q 
n 3    x  3
 3n  n  1 p 2  np
x 3

 n  n  1 n  2  p3 1  3n  n  1 p 2  np

 Third order central moment is given by


3
 
3  3'  3 2' 1'  2 1' [See Unit 4 of MST-002]
= npq  q  p  [On simplification]

3  npq  q  p 

n
'4   x 4 P  X  x 
x 0

Writing
x 4  x  x  1 x  2  x  3   6x  x  1 x  2   7x  x  1  x

and proceeding in the similar fashion as for 1' ,  '2 , 3' , we have

'4  n  n  1 n  2  n  3 p 4  6n  n  1 n  2  p3  7n  n  1 p 2  np


and hence
2 4
    
 4   4'  4 3' 1'  6 2' 1' '
1

 4  npq 1  3  n  2  pq  [On simplification]

Now, recall the measures of skewness and kurtosis which you have studied in
Unit 4 of MST-002

15
Discrete Probability
Distributions
These measures are given as follows:
2 2
 2  npq  q  p   q  p 
1  33   3
 ,
2  npq  npq

 4 npq 1  3  n  2  pq  1  6pq


2  2
 2
 3 ,
2  npq  npq
qp 1  2p
1  1   , and
npq npq
1  6pq
 2  2  3 
npq

Remark 3:
(i) As 0 < q < 1
q<1
 npq < np [Multiplying both sides by np > 0]
 Variance < Mean
Hence, for binomial distribution
Mean > Variance

(ii) As variance of X  B(n, p) is npq,


 its standard deviation is npq.
1
Example 4: For a binomial distribution with p  and n = 10, find mean and
4
variance.
1 1 3
Solution: As p  ,  q = 1   .
4 4 4
1 5
Mean = np = 10  ,
4 2
1 3 15
Variance = npq = 10   = .
4 4 8
Example 5: The mean and standard deviation of binomial distribution are 4
2
and respectively. Find P[X  1].
3

Solution: Let X  B(n, p), then


Mean = np = 4
2
 2  2
and variance = npq =   [ S.D. = and variance is square of S.D.]
 3 3
Dividing second equation by the first equation, we have
4
npq 3

np 4
1
 q=
3
1 2
 p  1 q  1 
3 3

16
2 Binomial Distribution
Putting p = in the equation of mean, we have
3
2
n   4  n = 6
3
 by binomial distribution,
P[X = x] = n C x p x q n  x
x 6 x
6  2 1
 C x     ; x = 0, 1, 2, …, 6.
 3 3
Thus, the required probability
P  X  1  P  X  1  P  X  2   P X  3  ...  P  X  6
= 1  P  X  0
0 6 0
6 2 1 1 728
= 1  C0      1  11  .
 3 3 729 729
Example 6: If X  B (n, p). Find p if n = 6 and 9P[X = 4] = P[X = 2].

Solution: As X  B(n, p) and n = 6,


6 x
 P  X  x   6C x p x 1  p  ; x  0, 1, 2, ..., 6.
Now, 9P  X  4   P  X  2
6 4 4
 9  6C 4  p 4 1  p   6C 2  p 2 1  p 
65 4 2 65 2 4
 9  p  1  p   p 1  p 
2 2
2
 9p  1  p 
2

 9p 2  1  p 2  2p
 8p 2  2p  1  0
 8p 2  4p  2p  1  0
 4p  2p  1  1  2p  1  0
 (2p  1)(4p  1)  0
  2p  1  0 or  4p  1  0
1 1
p or
2 4
1
But p =  rejected [ probability can never be negative]
2
1
Hence, p =
4

Now, you can try the following exercises:

E3) Comment on the following:

The mean of a binomial distribution is 3 and variance is 4.

E4) Find the binomial distribution when sum of mean and variance of 5 trails
is 4.8.

17
Discrete Probability
Distributions
E5) The mean of a binomial distribution is 30 and standard deviation is 5.
Find the values of

i) n, p and q,

ii) Moment coefficient of skewness, and

iii) Kurtosis.

9.5 FITTING OF BINOMIAL DISTRIBUTION


To fit a binomial distribution, we need the observed data which is obtained
from repeated trials of a given experiment. On the basis of the observed data,
we find the theoretical (or expected) frequencies corresponding to each value
of the binomial variable. Process of finding the probabilities corresponding to
each value of the binomial variable becomes easy if we use the recurrence
relation for the probabilities of Binomial distribution. So, in this section, we
will first establish the recurrence relation for probabilities and then define the
binomial frequency distribution followed by process of fitting a binomial
distribution.

Recurrence Relation for the Probabilities of Binomial Distribution

You have studied that binomial probability function is

p  x   P X  x   nCx p xq n x … (1)

If we replace x by x + 1, we have

p  x  1  n C x 1 p x 1q n  x 1 … (2)

Dividing (2) by (1), we have

p  x  1 n
C x 1p x 1q n  x 1

px n
Cx p xq n x

 n n 
n x nx p  C x 1  x  1 n  x  1 and 
   
x 1 n  x 1 n q  n n 
 Cx  x n  x 
 

x n  x  n  x 1 p nx p
  = 
 x  1 x n  x  1 q x  1 q
nx p
 p  x  1  px ... (3)
x 1 q

Putting x  0 , 1, 2, 3,…in this equation, we get p(1) in terms of p(0), p(2) in


terms of p(1), p(3) in terms of p(2), and so on. Thus, if p(0) is known, we can
find p(1) then p(2), p(3) and so on.
18
So, eqn. (3) is the recurrence relation for finding the probabilities of binomial Binomial Distribution
distribution. The initial probability i.e. p(0) is obtained from the following
formula:
p 0  qn

[ p  x   n C x p x q n  x putting x = 0, we have p(0) = n C0 p 0q n  q n ]

Binomial Frequency Distribution


We have studied that in a random experiment with n trials and having p as the
probability of success in each trial,
P  X  x   n C x p x q n  x ; x  0, 1, 2, ..., n
where x is the number of successes. Now, if such a random experiment of n
trials is repeated say N times, then the expected (or theoretical) frequency of
getting x successes is given by
f(x) = N.P  X  x   N. n C x p x q n  x ; x  0, 1, 2, ..., n
i.e. probability is multiplied by N to get the corresponding expected frequency.

Process of Fitting a Binomial Distribution

Suppose we are given the observed frequency distribution. We first find the
mean from the given frequency distribution and equate it to np. From this, we
can find the value of p. After having obtained the value of p, we obtain
p  0   q n , where q = 1 – p.
nx
Then the recurrence relation i.e. p  x  1  p  x  is applied to find the
x 1
values of p(1), p(2),…. After that, the expected (theoretical) frequencies f(0),
(1), f(2), … are obtained on multiplying each of the corresponding
probabilities i.e. p(0), p(1), p(2), … by N.
In this way, the binomial distribution is fitted to the given data. Thus, fitting of
a binomial distribution involves comparing the observed frequencies with the
expected frequencies to see how best the observed results fit with the
theoretical (expected) results.
Example 7: Four coins were tossed and number of heads noted. The
experiment is repeated 200 times.
The number of tosses showing 0, 1, 2, 3 and 4 heads were found distributed as
under. Fit a binomial distribution to these observed results assuming that the
nature of the coins is not known.

Number of heads: 0 1 2 3 4

Number of tosses 15 35 90 40 20

19
Discrete Probability
Distributions
Solution: Here n = 4, N = 200.
First, we obtain the mean of the given frequency distribution as follows:

Number of head X Number of tosses f fX


0 15 0
1 35 35
2 90 180
3 40 120
4 20 80
Total 200 415

 Mean = f x [See Unit 1of MST-002]


f
415

200
 2.075
As mean for binomial distribution is np,
 np = 2.075
2.075
p
4
 0.5188
 q  1 p
1  0.5188
 0.4812
 p(0) = qn
= (0.4812)4
= 0.0536
Now, using the recurrence relation
nx p
p  x  1  . p  x  ; x  0, 1, 2, 3, 4;
x 1 q
we obtain the probabilities for different values of the random variable X i.e.
40
p(1) is obtained on multiplying p(0) with , p(2) is obtained on
0 1
4 1
multiplying p(1) with , and so on; i.e. the values as shown in col. 3 of the
11
following table are obtained on multiplying the preceding values of col. 2 and
col 3, except the first value which has been obtained using p(0) = qn as above.

20
Binomial Distribution

Number n  x p 4  x  0.5188  px Expected


of .    or
x  1 q x  1  0.4812 
Heads theoretical
(X) 4x frequency
 1.07814 
x 1 f x
(2) (3) (4)
(1)
0 40 p(0) = 0.0536 10.72 11
1.07814   4.31256
0 1

1 4 1 p(1) = 4.31256  0.0536 46.23 46


1.07814   1.61721 = 0.23115
11
2 42 p(2) =1.61721  0.23115 74.76 75
1.07814   .071876 = 0.37382
2 1

3 43 p(3) = 0.71876  0.37382 53.73 54


1.07814   0.26954 = 0.26869
3 1

4 44 p(4) = 0.26954  .26869 14.48 14


1.07814   0 = 0.0724
4 1

Remark 3: In the above example, if the nature of the coins had been known
e.g. if it had been given that “the coins are unbiased” then we would have
taken
1
p= and then the observed data would not have been used to find p. Such a
2
situation can be seen in the problem E6).
Here are two exercises for you:

E6) Seven coins are tossed and number of heads noted. The experiment is
repeated 128 times and the following distribution is obtained:
Number of heads 0 1 2 3 4 5 6 7
Frequencies 7 6 19 35 30 23 7 1

Fit a binomial distribution assuming the coin is unbiased.

21
Discrete Probability
Distributions
E7) Out of 800 families with 4 children each, how many families would you
expect to have 3 boys and 1 girl, assuming equal probability of boys and
girls?

Now before ending this unit, let’s summarize what we have covered in it.

9.6 SUMMARY
The following main points have been covered in this unit:
1) A discrete random variable X is said to follow Bernoulli distribution with
parameter p if its probability mass function is given by
p x 1  p 1 x ; x  0,1
P X  x   
 0; elsewhere
Its mean and variance are p and p(1  p), respectively. Third and fourth
central moments of this distribution are p 1  p 1  2p  and
p(1  p) (1  3p  3p 2 ) respectively.
2) A discrete random variable X is said to follow binomial distribution if it
assumes only a finite number of non-negative integer values and its
probability mass function is given by
 n C p x q n  x ; x  0, 1, 2, ..., n
P X  x    x
 0; elsewhere
where, n is the number of independent trials,
x is the number of successes in n trial,
p is the probability of success in each trial, and
q = 1 – p is the probability of failure in each trial.
3) The constants of Binomial distribution are:
Mean= np, Variance= npq,
3  npq  q  p  ,  4  npq 1  3  n  2  pq 
2

1 
q  p , 2  3 
1  6pq
,
npq npq
1  2p 1  6pq
1  , and 2 
npq npq
4) For a binomial distribution, Mean > Variance.
5) Recurrence relation for the probabilities of binomial distribution is
nx p
p  x  1  . .p  x  , x = 0, 1, 2, …, n  1
x 1 q
6) The expected frequencies of the binomial distribution are given by
f(x) = N.P  X  x   N. n C x p x q n  x ; x  0, 1, 2, ..., n

22
Binomial Distribution
9.7 SOLUTIONS/ANSWERS
E1) Let p be the probability of hitting the target (success) in a trial.
1 1 3
 n = 5, p = , q  1   ,
4 4 4
and hence by binomial distribution, we have
x 5 x
1 3
P  X  x   n Cx p x q n x  5 Cx     ; x  0,1, 2,3, 4, 5.
4 4

 Required probability = P  X  2

 P  X  2   P  X  3  P  X  4   P X  5

 1   P  X  0  P  X  1 

 5  1 0  3 50 5  1 1  3 51 
 1   C0      C1     
 4 4  4   4  

 243 405  376 47


=1      
1024 1024  1024 128
E2) Let p be the probability that the dacoit will be killed (success) by a bullet.
 n = 6, p = 0.6, q = 1 – p = 1 – 0.6 = 0.4, and hence by binomial
distribution, we have
P  X  x   n C x p x q n  x ; x  0, 1, 2, ..., n
x 6 x
 6Cx  0.6   0.4  ; x  0, 1, 2, ..., 6 .

 The required probability = P[The dacoit is still alive]


= P[No bullet kills the dacoit]
= P[Number of successes is zero]
0 6
= P[X = 0] = 6 C0  0.6   0.4 

= 0.0041
E3) Mean = np = 3 ... (1)
Variance = npq = 4 ... (2)
 Dividing (2) by (1), we have
4
q  1 and hence not possible
3
[ q, being probability, cannot be greater than 1]

E4) Let X  B(n, p), then


n = 5 and

23
Discrete Probability
Distributions
np + npq = 4.8 [ given that Mean + Variance = 4.8]
 5p + 5pq = 4.8
 5[p + p (1–p)] = 4.8
 5[p + p – p2] = 4.8
 5p2 – 10p + 4.8 = 0
 25p2 – 50p + 24 = 0 [Multiplying by 5]
2
 25p – 30 p – 20 p + 24 = 0
 5p(5p – 6) – 4 (5p – 6) = 0
 (5p  6) (5p  4) = 0
6 4
p= ,
5 5
6
The first value p = is rejected [ probability can never exceed 1]
5
4 1
 p = and hence q = 1 – p = .
5 5
Thus, the binomial distribution is
P  X  x   n Cx px q n x
x 5 x
5  4 1
 Cx     ; x  0, 1, 2, 3, 4, 5.
 5 5
The binomial distribution in tabular form is given as
X p(x)
0 0
4 1
5
1
5
C0     =
 5   5  3125
1 1
 4 1 20
4
5
C1     
 5 5 3125
2 2
 4 1
3
160
5
C2     =
 5   5  3125
3 3
 4 1
2
640
5
C3     =
 5   5  3125
4 4 1
 4   1  1280
5
C4     
 5   5  3125
5 5 0
 4   1  1024
5
C5     
 5   5  3125

E5) Given that Mean = 30 and S.D. = 5


Thus, np = 30, npq  5

24  np = 30, npq = 25
npq 25 5 5 5 1 1 Binomial Distribution
i)    q  , p  1  q  1   , n    30  n  180
np 30 6 6 6 6 6
1 5
ii)  2  npq  180    25
6 6
 5 1  50
3  npq  q  p   25    
6 6 3
32 4
 1  3

 2 225
 Moment coefficient of skewness is given by
2
1  1 
15
1 5
1 6 
1  6pq 6 6 = 3 1
iii) 2  3   3
npq 25 150
1
  2  2  3  0
150
So, the curve of the binomial distribution is leptokurtic.
1
E6) As the coin is unbiased,  p = .
2
1 1
Here, n = 7, N = 128, p = , q  1  p  .
2 2
7
1
n 1
 p(0) = q =    .
 2  128
Expected frequencies are, therefore, obtained as follows:
Number px Expected or
of 1 theoretical
heads nx p 7x 2 Frequency
(X) .  .
x  1 q x 1 1 f  x   N.p  x 
2
7x  128.p  x 

x 1
0 70 1 1
=7
0 1 128
1 7 1 1 7 7
3 7 
11 128 128
2 72 5 7 21 21
 3 
2 1 3 128 128
3 7 3 5 21 35 35
1  
3 1 3 128 128

25
Discrete Probability
Distributions 4 74 3 35 35 35
 1 
4 1 5 128 128
5 7 5 1 3 35 21 21
  
5 1 3 5 128 128
6 76 1 1 21 7 7
  
6 1 7 3 128 128
7 77 1 7 1 1
0  
7 1 7 128 128

1
E7) Here, probability (p) to have a boy is and the probability (q) to have
2
1
a girl is , n = 4, N = 800.
2
Let X be the number of boys in a family.
 by binomial distribution, the probability of having 3 boys in a family
of 4 children
= P[X = 3] [ P  X  x   n C x p x q n  x ]
3 4 3 4
4 1 1 1
 C3     = 4 
2 2 2
Hence, the expected number of families having 3 boys and 1 girl
1
= N.p  3  = 128   = 32
4

26
Poisson Distribution
UNIT 10 POISSON DISTRIBUTION
Structure
10.1 Introduction
Objectives

10.2 Poisson Distribution


10.3 Moments of Poisson Distribution
10.4 Fitting of Poisson Distribution
10.5 Summary
10.6 Solutions/Answers

10.1 INTRODUCTION
In Unit 9, you have studied binomial distribution which is applied in the cases
where the probability of success and that of failure do not differ much from
each other and the number of trials in a random experiment is finite. However,
there may be practical situations where the probability of success is very small,
that is, there may be situations where the event occurs rarely and the number of
trials may not be known. For instance, the number of accidents occurring at a
particular spot on a road everyday is a rare event. For such rare events, we
cannot apply the binomial distribution. To these situations, we apply Poisson
distribution. The concept of Poisson distribution was developed by a French
mathematician, Simeon Denis Poisson (1781-1840) in the year 1837.
In this unit, we define and explain Poisson distribution in Sec. 10.2. Moments
of Poisson distribution are described in Sec. 10.3 and the process of fitting a
Poisson distribution is explained in Sec. 10.4.
Objectives
After studing this unit, you would be able to:
 know the situations where Poisson distribution is applied;
 define and explain Poisson distribution;
 know the conditions under which binomial distribution tends to Poisson
distribution;
 compute the mean, variance and other central moments of Poisson
distribution;
 obtain recurrence relation for finding probabilities of this distribution;
and
 know as to how a Poisson distribution is fitted to the observed data.

27
Discrete Probability
Distributions 10.2 POISSON DISTRIBUTION
In case of binomial distributions, as discussed in the last unit, we deal with
events whose occurrences and non-occurrences are almost equally important.
However, there may be events which do not occur as outcomes of a definite
number of trials of an experiment but occur rarely at random points of time and
for such events our interest lies only in the number of occurrences and not in
its non-occurrences. Examples of such events are:
i) Our interest may lie in how many printing mistakes are there on each page
of a book but we are not interested in counting the number of words
without any printing mistake.
ii) In production where control of quality is the major concern, it often
requires counting the number of defects (and not the non-defects) per item.
iii) One may intend to know the number of accidents during a particular time
interval.
Under such situations, binomial distribution cannot be applied as the value of n
is not definite and the probability of occurrence is very small. Other such
situations can be thought of yourself. Poisson distribution discovered by S.D.
Poisson (1781-1840) in 1837 can be applied to study these situations.
Poisson distribution is a limiting case of binomial distribution under the
following conditions:
i) n, the number of trials is indefinitely large, i.e. n  .
ii) p, the constant probability of success for each trial is very small, i.e. p  0.
iii) np is a finite quantity say ‘’.

Definition: A random variable X is said to follow Poisson distribution if it


assumes indefinite number of non-negative integer values and its probability
mass function is given by:
 e . x
 ; x  0, 1, 2, 3, ... and   0.
px  PX  x   x
0; elsewhere

where e = base of natural logarithm, whose value is approximately equal to
2.7183 corrected to four decimal places. Value of e  can be written from the
table given in the Appendix at the end of this unit, or, can be seen from any
book of log tables.

Remark 1
i) If X follows Poisson distribution with parameter  then we shall use the
notation X  P().
ii) If X and Y are two independent Poisson variates with parameters 1 and 2
repectively, then X + Y is also a Poisson variate with parameter 1+2. This
is known as additive property of Poisson distribution.

28
Poisson Distribution
10.3 MOMENTS OF POISSON DISTRIBUTION
r th order moment about origin of Poisson variate is
 
e  x
 
'r  E X r   x r p  x    x r
x
x 0 x 0


e  x 
e  x 
e  . x
1'   x  x 
x 0 x x 1 x x  1 x 1 x  1

 1  2  3 
= e      ...
 0 1 2 

 1  2 
  e 1    ...
 1 2 

    2 3 
 e e  e  1     ...  see Unit 2 of MST-001  
 1 2 3 
=
 Mean = 

e   x
2   x 2
x 0 x

e . x
   x  x  1  x  [As done in Unit 9 of this Course]
x 0 x

 e   x e   x 
   x  x  1 x 
x 0  x x 

e   x 
e  x
  x  x  1  x
x 2 x  x  1 x  2 x 0 x

x 
e   x
 e   x
x2 x  2 x 0 x

  2 3  4 
 e      ...  1'
 0 1 2 

  2 
 e  2 1    ...  1'
 1 2 
 e  2e  1'

 2  
2
 Variance of X is given as V(X) =  2 = '2  1'  
2
= 2     

=
29
Discrete Probability 3
Distributions '3   x 3p  x 
x 0

Writing x 3 as x  x  1 x  2   3x  x  1  x, we have

See Unit 9 of this course where 


 ' 
 the expression of 3 is obtained 

e  x
   x  x  1 x  2   3x  x  1  x 
x 0 x

e   x 
e  x  e  x
  x  x  1 x  2   3 x  x  1 x
x 3 x x2 x x 1 x

x
 e  x  x  1 x  2 
x  x  1 x  2  x  3
 
 3 2    
x 3


x
 e   3 2  
x 3 x 3

 3  4  5
 
e     ...   3 2  
 0 1 2 

  2 
 e  3  1    ...   3 2  
 1 2 
 e  3 e  3 2  

 3  3 2  
Third order central moment is
3
3  3'  3 2' 1'  2 1'  
= [On simplification]

e  x
'4   x 4 .
x 3 x

Now writing x 4  x  x  1 x  2  x  3  6x  x  1 x  2   7x  x  1  x,


and proceeding in the similar fashion as done in case of  '3 , we have

'4   4  63  7 2  

 Fourth order central moment is


2 4
 
 4   4'  43' 1'  6'2 1'  
 3 1'

 3 2   [On simplification]

30
Therefore, measures of skewness and kurtosis are given by Poisson Distribution

32  2 1 1
1    , 1  1  ; and
32 3  

 4 3 2   1 1
2  2
 2
 3 ,  2  2  3  .
2   

Now as 1 is positive, therefore the Poisson distribution is always positively


skewed distribution. Also as  2 > (  > 0), the curve of the distribution is
Leptokurtic.
Remark 2
i) Mean and variance of Poisson distribution are always equal. In fact this is
the only discrete distribution for which Mean = Variance = the third
central moment.
ii) Moments of the Poisson distribution can be deduced from those of the
binomial distribution also as explained below:
For a binomial distribution,
Mean = np
Variance = npq
3  npq  q  p 

 4  npq 1  3pq  n  2    npq 1  3npq  6pq 

Now as the Poisson distribution is a limiting form of binomial distribution


under the conditions:
(i) n  , (ii) p  0 i.e. q  1, and (iii) np =  (a finite quantity);
 Mean, Variance and other moments of the Poisson distribution are given as:
Mean = Limiting value of np = 
Variance = Limiting value of npq
= Limiting value of (np) (q)
= () (1) = 
3 = Limiting value of npq (q – p)
= Limiting value of (npq) (q – p)
= () (1 – 0)
=
4 = Limiting value of npq [1 + 3npq – 6pq]
= Limiting value of (npq) [1 + 3(npq) –6 (p)(q)]
= () [1 + 3() – 6 (0)(1)]
= [1 + 3] = 32 + 
Now let’s give some examples of Poisson distribution.
31
Discrete Probability
Distributions
Example 1: It is known that the number of heavy trucks arriving at a railway
station follows the Poisson distribution. If the average number of truck arrivals
during a specified period of an hour is 2, find the probabilities that during a
given hour
a) no heavy truck arrive,
b) at least two trucks will arrive.
Solution: Here, the average number of truck arrivals is 2
i.e. mean = 2
 =2
Let X be the number of trucks arrive during a given hour,
 by Poisson distribution, we have
x
e  x e  2 
2

P X  x    ; x  0, 1, 2, ...
x x
Thus, the desired probabilities are:
(a) P[arrival of no heavy truck] = P[X = 0]
e 2 20
=
0

 e2

See the table given 


= 0.1353 in the Appendix at 
 
 the end of this unit 

(b) P[arrival of at least two trucks] = P  X  2 

 P  X  2   P  X  3  ...

 1   P  X  1  P  X  0

 sum of all the 


 probabilities is 1 
 

 e2 20 e2 21 
 1   
 0 1 

 20 21 
 1  e2    = 1  e 2 1  2 
 0 1
 1   0.1353 3 = 1  0.4059  0.5941

Note: In most of the cases for Poisson distribution, if we are to compute the
probabilities of the type P  X  a  or P  X  a  , we write them as
P  X  a   1  P  X  a  and

32
P  X  a   1  P  X  a  , because n may not be definite and hence we cannot Poisson Distribution

go up to the last value and hence the probability is written in terms of its
complementary probability.
Example 2: If the probability that an individual suffers a bad reaction from an
injection of a given serum is 0.001, determine the probability that out of 500
individuals
i) exactly 3,
ii) more than 2
individuals suffer from bad reaction
Solution: Let X be the Poisson variate, “Number of individuals suffering from
bad reaction”. Then,
n = 1500, p = 0.001,
  = np = (1500) (0.001) = 1.5
 By Poisson distribution,
e  x
P X  x   , x  0, 1, 2, ...
x
x
e1.5 . 1.5 
 ; x  0, 1, 2, ...
x
Thus,
i) The desired probability = P[X = 3]
3
e1.5 . 1.5 

3

 0.22313.375
 = 0.1255
6

 e 0.5  0.6065, e 1  0.3679,so 


 1.5 
e  e  e   0.3679)  0.6065   0.2231 
1 0.5

 
See the table given in the Appendix 
 at the end of this unit 

ii) The desired probability  P  X  2

 1  P  X  2

= 1   P  X  2  P  X  1  P  X  0 

 e1.5 . 1.5 2 e1.5 . 1.5 1 e 1.5 . 1.5 0 


 1    
 2 1 0 

33
Discrete Probability
Distributions  2.25 
 1  e1.5   1.5  1  1   3.625  e1.5
 2 
 1   3.625  0.2231 = 1 – 0.8087 = 0.1913

Example 3: If the mean of a Poisson distribution is 1.44, find the values of


variance and the central moments of order 3 and 4.
Solution: Here, mean = 1.44
  = 1.44
Hence, Variance =  = 1.44
3 =  = 1.44
4 = 32 +  = 3 (1.44)2 + 1.44 = 7.66.
Example 4: If a Poisson variate X is such that P[X = 1] = 2P[X = 2], find the
mean and variance of the distribution.
Solution: Let  be the mean of the distribution, hence by Poisson distribution,
e  x
P X  x   ; x  0, 1, 2, ...
x

Now, P  X  1  2P  X  2 

e  .1 e . 2
 2
1 2
  = 2  2   = 0  (  1) = 0   = 0, 1
But  = 0 is rejected
[ if  = 0 then either n = 0 or p = 0 which implies that Poisson distribution
does not exist in this case.]
=1
Hence mean =  = 1, and
Variance =  = 1.
Example 5: If X and Y be two independent Poisson variates having means 1
and 2 respectively, find P[X + Y < 2].
Solution: As X ~ P(1), Y ~ P(2), therefore,
X + Y follows Poisson distribution with mean = 1 + 2 = 3.
Let X + Y = W. Hence, probability function of W is
e3 .3w
PW  w  ; w  0, 1, 2, ... .
w
Thus, the required probability= P[X + Y < 2]
= P[W < 2]
= P[W = 0] + P[W = 1]
34
e 3 .30 e3 .31 Poisson Distribution
= 
0 1

= (0.0498)(1 + 3) [From Table, e 3 = 0.0498]


= 0.1992.

You may now try these exercises.


E1) Assume that the chance of an individual coal miner being killed in a mine
1
accident during a year is . Use the Poisson distribution to calculate
1400
the probability that in a mine employing 350 miners, there will be at least
one fatal accident in a year. (use e 0.25  0.78 )
E2) The mean and standard deviation of a Poisson distribution are 6 and 2
respectively. Test the validity of this statement.
E3) For a Poisson distribution, it is given that P[X = 1] = P[X = 2], find the
value of mean of distribution. Hence find P[X = 0] and P[X = 4].

We now explain as to how the Poisson distribution is fitted to the observed


data.

10.4 FITTING OF POISSON DISTRIBUTION

To fit a Poisson distribution to the observed data, we find the theoretical (or
expected) frequencies corresponding to each value of the Poisson variate.
Process of finding the probabilities corresponding to each value of the Poisson
variate becomes easy if we use the recurrence relation for the probabilities of
Poisson distribution. So, in this section, we will first establish the recurrence
relation for probabilities and then define the Poisson frequency distribution
followed by the process of fitting a Poisson distribution.
Recurrence Formula for the Probabilities of Poisson Distribution
For a Poisson distribution with parameter , we have
e   x
px  … (1)
x
Changing x to x + 1, we have
e   x 1
p  x  1  … (2)
x 1
Dividing (2) by (1), we have

e 
 x 1 
p  x  1 x 1 
 
px  e   x  x 1
x

35
Discrete Probability

Distributions  p  x  1  px … (3)
x 1
This is the recurrence relation for probabilities of Poisson distribution. After
obtaining the value of p(0) using Poisson probability function i.e.
e  0
p 0   e , we can obtain p(1), p(2), p(3),…, on putting
 0
x = 0, 1, 2, …. successively in (3).

Poisson Frequency Distribution


If an experiment, satisfying the requirements of Poisson distribution, is
repeated N times, then the expected frequency of getting x successes is given
by
e   x
f  x   N.P  X  x   N. ; x  0, 1, 2,...
x
Example 5: A manufacturer, who produces medicine bottles, finds that 0.1%
of the bottles are defective. The bottles are packed in boxes containing 500
bottles. A drug manufacturer buys 100 boxes from the producer of bottles.
Using Poisson distribution, find how many boxes will contain at least two
defective bottles.
Solution: Let X be the Poisson variate, “the number of defective bottles in a
box”. Here, number of bottles in a box (n) = 500, therefore, the probability (p)
of a bottle being defective is
0.1
p  0.1%   0.001
100
Number of boxes (N) = 100
  np  500  .001  0.5
Using Poisson distribution, we have
e  x
P X  x   ; x  0, 1, 2, ...
x
x
e0.5  0.5 
 ; x  0, 1, 2, ...
x
 Probability that a box contain at least two defective bottles
 P  X  2

 1  P  X  2

= 1–  P  X  0   P  X  1

 e0.5  0.5 0 e 0.5  0.5 1 


 1     1  e0.5 1  0.5
 0 1 

= 1 – (0.6065) (1.5) = 1 – 0.90975 = 0.09025.


36
Hence, the expected number of boxes containing at least two defective bottles Poisson Distribution

= N.P[X  2]
= (100) (0.09025)
= 9.025

Process of Fitting a Poison Distribution


For fitting a Poisson distribution to the observed data, you are to proceed as
described in the following steps.

 First we obtain mean of the given distribution i.e.  fx , being mean,


f
take this as the value of .
 Next we obtain p(0) = e  [Use table given in Appendix at the end of this
unit.]

 The recurrence relation p  x  1  p  x  is then used to compute the
x 1
values of p(1), p(2), p(3), …
 The probabilities obtained in the preceding two steps are then multiplied
with N to get expected/theoretical frequencies i.e.
f  x   N.P  X  x  ; x  0, 1, 2, ...

Example 6: The following data give frequencies of aircraft accidents


experienced by 2480 pilots during a certain period:

Number of Accidents 0 1 2 3 4 5
Frequencies 1970 422 71 13 3 1

Fit a Poisson distribution and calculate the theoretical frequencies.


Solution: Let X be the number of accidents of the pilots. Let us first obtain the
mean number of accidents as follows:
Number of Frequency ( f ) fX
Accidents
(X)
0 1970 0
1 422 422
2 71 142
3 13 39
4 3 12
5 1 5
Total 2480 620
37
Discrete Probability
Distributions
 Mean =  =  fx  620
 f 2480
  = 0.25

 by Poisson distribution,

p(0) = e  = e 0.25

See table given in the Appendix 


= 0.7788 at the end of this unit 
 

Now, using the recurrence relation for probabilities of Poisson distribution i.e.

p  x  1  p  x  and then multiplying each probability with N, we get the
x 1
expected frequencies as shown in the following table

Number of  0.25 p  x   P X  x  Expected/


Accidents  Theoretical
x 1 x 1
(X) frequency
f(x) = 2480p(x)
(1) (2) (3) (4)
0 0.25 p(0) = 0.7788 1931. 4 1931
 0.25
0 1
1 0.25 p(1) = 0.25  0.7788 482. 9 483
 0.125
11 = 0.1947
2 0.25 p(2) = 0.125  0.1947 60.3 60
 0.0833
2 1 = 0.0243

3 0.25 p(3)= 0.0833  0.0243 4.96 5


 0.0625
3 1 = 0.0020

4 0.25 p(4)= 0.0625  0.0020 0.248 0


 0.05
4 1 = 0.0001
5 0.25 p(5)= 0.05  0.0001 0
 0.0417
5 1 = 0.000005

You can now try the following exercises


E4) In a certain factory turning out fountain pens, there is a small chance,
1
, for any pen to be defective. The pens are supplied in packets of
500
10. Calculate the approximate number of packets containing (i) one
defective (ii) two defective pens in a consignment of 20000 packets.

38
E5) A typist commits the following mistakes per page in typing 100 pages. Poisson Distribution
Fit a Poisson distribution and calculate the theoretical frequencies.
Mistakes per 0 1 2 3 4 5
page(X)
Frequency 42 33 14 6 4 1
(f)

We now conclude this unit by giving a summary of what we have covered in it.

10.5 SUMMARY
The following main points have been covered in this unit:
1. A random variable X is said to follow Poisson distribution if it
assumes indefinite number of non-negative integer values and its
probability mass function is given by:

 e  x
 ; x  0, 1, 2, 3,... and   0.
px  PX  x   x
0; elsewhere

2. For Poisson distribution, Mean = Variance = 3 =  ,  4  3 2  
1 1 1 1
3. 1  , 1  , 2  3  ,  2  for this distribution .
   
4. Recurrence relation for probabilities of Poisson distribution is

p  x  1  .p  x  , x  0, 1, 2, 3,...
x 1
5. Expected frequencies for a Poisson distribution are given by
e  x
f  x   N.P  X  x   N. ; x  0, 1, 2, ...
x
If you want to see what our solutions/answers to the exercises in the unit are,
we have given them in the following section.

10.6 SOLUTIONS/ANSWERS

E1) Let X be the Poisson variable “Number of fatal accidents in a year”.


1
Here n = 350, p 
1400
 1 
  = np =  350     0.25 .
 1400 
By Poisson distribution,

39
Discrete Probability
Distributions e . x
P X  x   , x  0, 1, 2, ...
x
x
e0.25  0.25 
 , x  0, 1, 2, ...
x
Therefore, P [at least one fatal accident]
 P  X  1 = 1 – P[X < 1] = 1 – P[X = 0]
0
e0.25  0.25 
 1 = 1 – e 0.25 = 1 – 0.78 = 0.22
0

E2) As mean = 6, therefore,  = 6.


As standard deviation is 2, therefore, variance = 4   = 4.
We get two different values of , which is impossible. Hence, the
statement is invalid.

E3) Let  be the mean of the distribution,


 by Poisson distribution, we have
e  x
P X  x   ; x  0, 1, 2, 3, ...
x
Given that P[X = 1] = P[X = 2],
e  1 e   2
 
1 2

2
   2 – 2 = 0  ( – 2) = 0
2
  = 0, 2.
 = 0 is rejected,
 =2
Hence, Mean = 2.
e   0
Now, P[X = 0] =  e  e2 = 0.1353,
0
[See table given in the Appendix at the end of this unit.]
2 4
e  4 e  2  e 2 16  2
and P  X  4       0.1353 
4 4 24 3
= 2(0.0451)
= 0.0902.

40
1 Poisson Distribution
E4) Here p  , n  10, N  20000,
500
1
  = np = 10   0.02
500
By Poisson frequency distribution
f  x   N.P  X  x 

e   x
=  20000  ; x  0, 1, 2,...
x
Now,
i) The number of packets containing one defective
= f(1)
1
e 0.02 .  0.02 
=  20000 
1

See the table given 


= (20000) (0.9802) (0.02) in the Appendix 
 
= 392.08 392; and
ii) The number of packets containing two defectives
2
e0.02  0.02 
= f(2) = 20000
2

 0.9802  0.0004 
=  20000  = 3.9208 4
2
E5) The mean of the given distribution is computed as follows
X f fX
0 42 0
1 33 33
2 14 28
3 6 18
4 4 16
5 1 5
Total 100 100

 Mean  =  fx  100  1
 f 100
 p  0   e  e 1 = 0.3679.

41
Discrete Probability
Distributions
Now, we obtain p(1), p(2), p(3), p(4), p(5) using the recurrence relation for
probabilities of Poisson distribution i.e.

p  x  1  p  x  ; x  0, 1, 2, 3, 4 and then obtain the expected frequencies
x 1
as shown in the following table:

X  1 px Expected/Theoretical
 frequency
x 1 x 1
f  x   N.P  X  x 
 100.P  X  x 

0 1 p  0   0.3679 36.79 37
1
0 1
1 1 p 1  1  0.3679  0.3679 36.79 37
 0.5
11
2 1 p  2   0.5  0.3679  0.184 18.4 18
 0.3333
2 1
3 1 p(3)  0.3333  0.184  0.0613 6.13 6
 0.25
31
4 1 p(4)=0.25  0.0613 = 0.0153 1.53 2
 0.2
4 1
5 1 p(5)=0.2  0.0153 = 0.0031 0.3 0
 0.1667
5 1

42
Appendix Poisson Distribution

Value of e (For Computing Poisson Probabilities)

(0 <  < I)

 0 1 2 3 4 5 6 7 8 9

0.0 1.0000 0.9900 0.9802 0.9704 0.9608 0.9512 0.9418 0.9324 0.9231 0.9139
0.1 0.9048 0.8958 0.8860 0.8781 0.8694 0.8607 0.8521 0.8437 0.8353 0.8270
0.2 0.7187 0.8106 0.8025 0.7945 0.7866 0.7788 0.7711 0.7634 0.7558 0.7483
0.3 0.7408 0.7334 0.7261 0.7189 0.7118 0.7047 0.6970 0.6907 0.6839 0.6771
0.4 06703 0.6636 0.6570 0.6505 0.6440 0.6376 0.6313 0.6250 0.6188 0.6125
0.5 0.6065 0.6005 0.5945 0.5886 0.5827 0.5770 0.5712 0.5655 0.5599 0.5543
0.6 0.5448 0.5434 0.5379 0.5326 0.5278 0.5220 0.5160 0.5113 0.5066 0.5016
0.7 0.4966 0.4916 0.4868 0.4810 0.4771 0.4724 0.4670 0.4630 0.4584 0.4538
0.8 0.4493 0.4449 0.4404 0.4360 0.4317 0.4274 0.4232 0.4190 0.4148 0.4107
0.9 0.4066 0.4026 0.3985 0.3946 0.3906 0.3867 0.3829 0.3791 0.3753 0.3716

(=1, 2, 3, ...,10)

 1 2 3 4 5 6 7 8 9 10

e 0.3679 0.1353 0.0498 0.0183 0.0070 0.0028 0.0009 0.0004 0.0001 0.00004

Note: To obtain values of e for other values of , use the laws of exponents i.e.

e  a  b  e a .e  b e. g. e 2.25  e 2 .e0.25   0.1353  0.7788  = 0.1054.

43
Discrete Uniform and
UNIT 11 DISCRETE UNIFORM AND Hypergeometric
Distributions
HYPERGEOMETRIC
DISTRIBUTIONS
Structure
11.1 Introduction
Objectives

11.2 Discrete Uniform Distribution


11.3 Hypergeometric Distribution
11.4 Summary
11.5 Solution/Answers

11.1 INTRODUCTION
In the previous two units, we have discussed binomial distribution and its
limiting form i.e. Poisson distribution. Continuing the study of discrete
distributions, in the present unit, two more discrete distributions – Discrete
uniform and Hypergeometric distributions are discussed.
Discrete uniform distribution is applicable to those experiments where the
different values of random variable are equally likely. If the population is finite
and the sampling is done without replacement i.e. if the events are random but
not independent, then we use Hypergemetric distribution.
In this unit, discrete uniform distribution and hypergeometric distribution are
discussed in Secs. 11.2 and 11.3, respectively. We shall be discussing their
properties and applications also in these sections.
Objectives
After studing this unit, you should be able to:
 define the discrete uniform and hypergeometric distributions;
 compute their means and variances;
 compute probabilities of events associated with these distributions; and
 know the situations where these distributions are applicable.

11.2 DISCRETE UNIFORM DISTRIBUTION


Discrete uniform distribution can be conceived in practice if under the given
experimental conditions, the different values of the random variable are
equally likely. For example, the number on an unbiased die when thrown may
be 1 or 2 or 3 or 4 or 5 or 6. These values of random variable, “the number on
an unbiased die when thrown” are equally likely and for such an experiment,
the discrete uniform distribution is appropriate.

45
Discrete Probability Definition: A random variable X is said to have a discrete uniform
Distributions
(rectangular) distribution if it takes any positive integer value from 1 to n,
and its probability mass function is given by
1
 for x  1, 2, ..., n
P X  x    n
0, otherwise.
where n is called the parameter of the distribution.
For example, the random variable X, “the number on the unbiased die
when thrown”, takes on the positive integer values from 1 to 6 follows
discrete uniform distribution having the probability mass function.

1
 , for x  1, 2, 3, 4, 5, 6.
P X  x    6
 0 , otherwise.

Mean and Variance of the Distribution


n n
1 1 n
Mean = E(X) =  x p  x    x.    x
x 1 x 1  n  n x 1
1
 1  2  3  ...  n 
n
 n  n  1 
1 n  n  1 sum of first n natural numbers  
 . 2 
n 2 
(see Unit 3of Course MST  001) 
n 1
 .
2
2
Variance = E( X 2 )  [E(X)] 2  
[  2  '2  1' ]

where
n 1
E X  [Obtained above]
2
n
 
E X 2   x 2 .p(x)
x 1

n
1
and E  X 2    x 2 .
x 1 n
1
 [12  22  32  ...  n 2 ]
n
sum of squares of first n 
 
1  n  n  1 2n  1   n  n  1 2n  1 
    natural numbers 
n 6  6 
(see Unit 3of Course MST  001) 
 

46

 n  1 2n  1 Discrete Uniform and
Hypergeometric
6 Distributions

 Variance =
 n  1 2n  1   n  1  2
 
6  2 
 n  1  2
   2n  1  3  n  1 
12
n 1  n  1 n  1  n 2  1
  4n  2  3n  3   
12 12 12
Example 1: Find the mean and variance of a number on an unbiased die when
thrown.
Solution: Let X be the number on an unbiased die when thrown,
 X can take the values 1, 2, 3, 4, 5, 6 with
1
P  X  x   ; x  1, 2, 3, 4, 5, 6.
6
Hence, by uniform distribution, we have
n 1 6 1 7
Mean =   , and
2 2 2
2
n 2  1  6   1 35
Variance =   .
12 12 12
Uniform Frequency Distribution
If an experiment, satisfying the requirements of discrete uniform distribution,
is repeated N times, then expected frequency of a value of random variable is
given by
f  x   N.P  X  x  ; x  1, 2, ..., n

1
 N. ; x  1, 2, 3,..., n.
n
Example 2: If an unbiased die is thrown 120 times, find the expected
frequency of appearing 1, 2, 3, 4, 5, 6 on the die.
Solution: Let X be the uniform discrete random variable, “the number on the
unbiased die when thrown”.
1
 P  X  x   ; x  1, 2, ..., 6
6
Hence, the expected frequencies of the value of random variable are given as
computed in the following table:

47
Discrete Probability
Distributions
X P X  x  Expected/Theoretical frequencies
f  x   N.P[X  x]  120.P[X  x]

1 1 1
120   20
6 6
2 1 1
120   20
6 6
3 1 1
120   20
6 6
4 1 1
120   20
6 6
5 1 1
120   20
6 6
6 1 1
120   20
6 6

Now, you can try the following exercise:


E1) Obtain the mean, variance of the discrete uniform distribution for the
random variable, “the number on a ticket drawn randomly from an urn
containing 10 tickets numbered from 1 to 10”. Also obtain the expected
frequencies if the experiment is repeated 150 times.

11.3 HYPERGEOMETRIC DISTRIBUTION


In the last section of this unit, we have studied discrete uniform probability
distribution wherein the probability distribution is obtained for the possible
outcomes in a single trial like drawing a ticket from an urn containing 10
tickets as mentioned in exercise E1). But, if there are more than one but finite
trials with only two possible outcomes in each trial, we apply some other
distribution. One such distribution which is applicable in such a situation is
binomial distribution which you have studied in Unit 9. The binomial
distribution deals with finite and independent trials, each of which has exactly
two possible outcomes (Success or Failure) with constant probability of
success in each trial. For example, if we again consider the example of
drawing ticket randomly from an urn containing 10 tickets bearing numbers
from 1 to 10. Then, the probability that the drawn ticket bears an odd number
5 1
is  . If we replace the ticket back, then the probability of drawing a ticket
10 2
5 1
bearing an odd number is again  . So, if we draw ticket again and again
10 2
with replacement, trials become independent and probability of getting an odd
number is same in each trial. Suppose, it is asked that what is the probability of
getting 2 tickets bearing odd number in 3 draws then we apply binomial
distribution as follows:
48
Let X be the number of times on odd number appears in 3 draws, then by Discrete Uniform and
binomial distribution, Hypergeometric
Distributions
2 3 2
1 1  1  1  3
P  X  2  C2    
3
  3      .
2 2  4  2  8
But, if in the example discussed above, we do not replace the ticket after any
draw the probability of getting an odd number gets changed in each trial and
the trials remain no more independent and hence in this case binomial
distribution is not applicable. Suppose, in this case also, we are interested in
finding the probability of getting ticket bearing odd number twice in 3 draws,
then it is computed as follows:
Let Ai be the event that ith ticket drawn bears odd number and A i be the event
that i th ticket drawn does not bear odd number.
 Probability of getting ticket bearing odd number twice in 3 draws

 P  A1  A 2  A 3   P  A1  A 2  A 3   P  A1  A 2  A3 

[As done in Unit 3 of this Course]

 P  A1  P  A 2  A1  P  A3  A1  A 2   P  A1  P  A 2  A1  P  A 3  A1  A 2 

 P  A1  P  A 2  A1  P  A 3  A1  A 2 

[Multiplication theorem for dependent events (See Unit 3 of this Course)]


5 4 5 5 5 4 5 5 4
= . .  . .  . .
10 9 8 10 9 8 10 9 8
5 5 4
 3
10  9  8
This result can be written in the following form also:
5  4  5  3 2
= [Multiplying and Dividing by 2]
2  10  9  8
5
5 4 1 1 C  5C1
=  5 = 5 C 2  5C1  10  102
2 10  9  8 C3 C3
3 2
In the above result, 5 C 2 is representing the number of ways of selecting 2 out
of 5 tickets bearing odd number, 5 C1 is representing the number of ways of
selecting 1 out of 5 tickets bearing even number i.e. not bearing odd number,
and 10 C3 is representing the number of ways of selecting 3 out of total 10
tickets.
Let us consider another similar example of a bag containing 20 balls out of
which 5 are white and 15 are black. Suppose 10 balls are drawn at random one
by one without replacement, then as discussed in the above example, the
probability that in these 10 draws, there are 2 white and 8 black balls is

49
Discrete Probability 5
Distributions C 2  15C8
20
.
C10
Note: The result remains exactly same whether the items are drawn one by one
without replacement or drawn at once.
Let us now generalize the above argument for N balls, of which M are white
and N  M are black. Of these, n balls are chosen at random without
replacement. Let X be a random variable that denote the number of white balls
drawn. Then, the probability of X  x white balls among the n balls drawn is
given by
M
C x . N  M Cn  x
P X  x   N
Cn

[For x  0, 1, 2,..., n  n  M  or x  0, 1, 2,..., M  n  M  ]

The above probability function of discrete random variable X is called the


Hypergeometric distribution.
Remark 1: We have a hypergeometric distribution under the following
conditions:
i) There are finite number of dependent trials
ii) A single trial results in one of the two possible outcomes-Success or
Failure
iii) Probability of success and hence that of failure is not same in each trial
i.e. sampling is done without replacement
Remark 2: If number (n) of balls drawn is greater than the number (M) of
white balls in the bag, then if n ≤ M, the number  x  of white balls drawn
cannot be greater than n and if n > M, then number of white balls drawn
cannot be greater than M. So, x can take the values upto n  if n  M  and
M(if n > M) i.e. x can take the value upto n or M, whichever is less, i.e.
x  min {n, M}.
The discussion leads to the following definition
Definition: A random variable X is said to follow the hypergeometric
distribution with parameters N, M and n if it assumes only non-negative
integer values and its probability mass function is given by
 M C x . N M C n  x
 for x  0, 1, 2, ..., min{n, M}
P X  x    N
Cn
 0, otherwise

where n, M, N are positive integers such that n ≤ N, M ≤ N.
Mean and Variance
n
Mean = E  X    x.p  X  x 
x0

n M
C x . N M C n  x
=  x.
x 1
N
Cn
50
n
M M 1 C x 1. N M C n  x Discrete Uniform and
=  x.
x 1 x
. N
Cn
Hypergeometric
Distributions

n
M
= N
Cn

x 1
M 1
C x 1. N  M Cn  x 
M M 1
 N
 C0 . N  M Cn 1  M 1C1. N M C n  2  ...  M 1Cn 1. N  M C 0 
Cn

M
 N
Cn
 N 1
Cn 1 
[This result is obtained using properties of binomial coefficients and involves
lot of calculations and hence its derivation may be skipped. It may be noticed
that in this result the left upper suffix and also the right lower suffix is the sum
of the corresponding suffices of the binomial coefficients involved in each
product term. However, the result used in the above expression is enrectangled
below for the interesting learners.]
We know that
mn m n
1  x   1  x  . 1  x  [By the method of indices]

Expanding using binomial theorem as explained in Unit 9 of this course, we


have
mn
C0 . x m  n  m  n C1. x m  n 1  m  n C 2 .x m  n  2  ...  mn
Cm  n

  m
C0 x m  m C1 x m 1  m C 2 x m 2  ...  m Cm 

. n C0 x n  n C1 x n 1  n C2 x n  2  ...  n C n 
Comparing coefficients of X m  n  r , we have
mn
Cr   m
C0 .n Cr  m C1 . n C r 1  ...  mC r . n C 0 
M n Nn N 1
= .
N N  n n 1

M.n n  1 N  1 nM
 .  .
N. N  1 n  1 N

 
E X 2  E  X  X  1  X 

 E  X  X  1   E  X 

 n M
C x . N  M Cn  x   nM 
  x  x  1 . N  
 x 0 Cn   N 
n
 M M  1 M  2 C x  2 . N M C n  x   nM 
   x  x  1 . . . N  
x 0  x x 1 Cn   N 

51
Discrete Probability
M  M  1  n   nM 
Distributions  N 
Cn  x 0
 M2

C x  2 . N M C n  x    
  N 
M  M  1 N  2  nM 
 N
Cn
 C n 2   
 N 

[The result in the first term has been obtained using a property of
binomial coefficients as done above for finding E(X).]
M  M  1 N  n n N2 nM
= . 
N n 2 Nn N

M(M  1)n(n  1) nM
 
N(N  1) N
Thus,
2
2 M  M  1 n  n  1 nM  nM 
V X  E X  2
  E  X   
N  N  1
 
N  N 

NM  N  M  N  n 
 [On simplification]
N 2  N  1

Example 2: A jury of 5 members is drawn at random from a voters’ list of


100 persons, out of which 60 are non-graduates and 40 are graduates. What is
the probability that the jury will consist of 3 graduates?
Solution: The computation of the actual probability is hypergeometric, which
is shown as follows:
60
C . 40C3
P [2 non-graduates and 3 graduates]  1002
C5
60  59  40  39  38  5  4  3  2

2  6 100  99  98  97  96
 0.2323
Example 3: Let us suppose that in a lake there are N fish. A catch of 500 fish
(all at the same time) is made and these fish are returned alive into the lake
after making each with a red spot. After two days, assuming that during this
time these ‘marked’ fish have been distributed themselves ‘at random’ in the
lake and there is no change in the total number of fish, a fresh catch of 400 fish
(again, all at once) is made. What is the probability that of these 400 fish, 100
will be having red spots.
Solution: The computation of the probability is hypergeometric and is shown
as follows: As marked fish in the lake are 500 and other are N  500,
500
C100 . N500C300
 P[100 marked fish and 300 others] = N
.
C400
We cannot numerically evaluate this if N is not given. Though N can be
estimated using method of Maximum likelihood estimation which you will
read in Unit 2 of MST-004 We are not going to estimate it. You may try it as
an exercise after reading Unit 2 of MST-004.
Here, let us take an assumed value of N say 5000.
52
Then, Discrete Uniform and
500 4500 Hypergeometric
C100 . C300
P  X  100  5000
Distributions
C400
You will agree that the exact computation of this probability is complicated.
Such problem is normally there with the use of hypergeometric distribution,
especially, if N and M are large. However, if n is small compared to N i.e. if n
n
is such that  0.05 , say then there is not much difference between sampling
N
with and without replacement and hence in such cases, the probability obtained
by binomial distribution comes out to be approximately equal to that obtained
using hypergeometric distribution.
You may now try the following exercise.
E2) A lot of 25 units contains 10 defective units. An engineer inspects 2
randomly selected units from the lot. He/She accepts the lot if both the
units are found in good condition, otherwise all the remaining units are
inspected. Find the probability that the lot is accepted without further
inspection.

We now conclude this unit by giving a summary of what we have covered in it.

11.4 SUMMARY
The following main points have been covered in this unit:
1) A random variable X is said to have a discrete uniform (rectangular)
distribution if it takes any positive integer value from 1 to n, and its
probability mass function is given by
1
 for x  1, 2, ..., n
P X  x    n
0, otherwise.

where n is called the parameter of the distribution.


n 1 n2 1
2) For discrete uniform distribution, mean = and variance = .
2 12
3) A random variable X is said to follow the hypergeometic distribution
with parameters N, M and n if it assumes only non-negative integer values
and its probability mass function is given by
 M Cx . N MCn x
 for x  0, 1, 2, ..., min{n, M}
P X  x    N
Cn
 0, otherwise

where n, M, N are positive integers such that n ≤ N, M ≤ N.
nM
4) For hypergeometric distribution, mean  and
N
NM  N  M  N  n 
variance  .
N 2  N  1

53
Discrete Probability
Distributions 11.5 SOLUTIONS/ANSWERS
E1) Let X be the number on the ticket drawn randomly from an urn containing
tickets numbered from 1 to 10.
 X is a discrete uniform random variable having the values
1
1, 2, 3, 4, …, 10 with probability of each of these values equal to .
10
Thus, the expected frequencies for the values of X are obtained as in the
following table:

X P X  x  Expected/Theoretical frequency
f  x   N.P  X  x 
 150.P  X  x 
1 1 1
150   15
10 10
2 1 1
150   15
10 10
3 1 1
150   15
10 10
4 1 1
150   15
10 10
5 1 1
150   15
10 10
6 1 1
150   15
10 10
7 1 1
150   15
10 10
8 1 1
150   15
10 10
9 1 1
150   15
10 10
10 1 1
150   15
10 10

E2) Here N = 25, M = 10 and n = 2.


 none of the 2 randomly selected 
The desired probability = P  
 units is found defective 

C 0 . 2510 C2 1 . C2
10 15
15  14
 25
 25  = 0.35.
C2 C2 25  24

54
Geometric and Negative
UNIT 12 GEOMETRIC AND NEGATIVE Binomial Distributions
BINOMIAL DISTRIBUTIONS
Structure
12.1 Introduction
Objectives

12.2 Geometric Distribution


12.3 Negative Binomial Distribution
12.4 Summary
12.5 Solutions/Answers

12.1 INTRODUCTION
In Units 9 and 11, we have studied the discrete distributions – Bernoulli,
Binomial, Discrete Uniform and Hypergeometric. In each of these
distributions, the random variable takes finite number of values. There may
also be situations where the discrete random variable assumes countably
infinite values. Poisson distribution, wherein discrete random variable takes an
indefinite number of values with very low probability of occurrence of event,
has already been discussed in Unit 10. Dealing with some more situations
where discrete random variable assumes countably infinite values, we, in the
present unit, discuss geometric and negative binomial distributions. It is
pertinent to mention here that negative binomial distribution is a generalization
of geometric distribution. Some instances where these distributions can be
applied are “deaths of insects”, “number of insect bites”.
Like binomial distribution, geometric and negative binomial distributions also
have independent trials with constant probability of success in each trial. But,
in binomial distribution, the number of trials (n) is fixed whereas in geometric
distribution, trials are performed till first success and in negative binomial
distribution trials are performed till a certain number of successes.
Secs. 12.2 and 12.3 of this unit discuss geometric and negative binomial
distribution, respectively along with their properties.
Objectives
After studing this unit, you would be able to:
 define the geometric and negative binomial distributions;
 calculate the mean and variance of these distributions;
 compute probabilities of events associated with these distributions;
 identify the situations where these distributions can be applied; and
 know about distinguishing features of these distributions like
memoryless property of geometric distribution.

55
Discrete Probability
Distributions 12.2 GEOMETRIC DISTRIBUTION
Let us consider Bernoulli trials i.e. independent trials having the constant
probability ‘p’ of success in each trial. Each trial has two possible outcomes –
success or failure. Now, suppose the trial is performed repeatedly till we get
the success. Let X be the number of failures preceding the first success.
Example of such a situation is “tossing a coin until head turns up”. X defined
above may take the values 0, 1, 2, …. Letting q be the probability of failure in
each trial, we have
P  X  0   P[Zero failure preceding the first success]

= P(S)
= p,
P  X  1 = P[One failure preceding the first success]

= P[F  S]
= P(F) P(S) [ trials are independent]
= qp
P[X = 2] = P[Two failures preceding the first success]
= P[F  F  S]
= P(F) P(F) P(S)
= qqp
= q2 p
and so on.
Therefore, in general, probability of x failures preceding the first success is
P[X = x] = q x p; x  0, 1, 2, 3, ...
Notice that for x  0, 1, 2, 3,... the respective probabilities p, qp, q2p, q3p,…
are the terms of geometric progression series with common ratio q. That is
why, the above probability distribution is known as geometric distribution [see
Unit 3 of MST-001].
Hence, the above discussion leads to the following definition:
Definition: A random variable X is said to follow geometric distribution if it
assumes non-negative integer values and its probability mass function is given
by
q x p for x  0, 1, 2, ...
P X  x   
 0, otherwise
Notice that

x 2 3
 q p  p  q p  q p  q p  ...
x0

= p[1 + q + q2 + q3+ …]

56
 1  p Geometric and Negative
= p   1 Binomial Distributions
1 q  p
a 1
[ sum of infinite terms of G.P.   (see Unit 3 of MST-001)]
1 r 1 q
Now, let us take up some examples of this distribution.
Example 1: An unbiased die is cast until a 6 appears. What is the probability that it must be cast more than five times?
Solution: Let p be the probability of a success, i.e. of getting 6 in a throw of the die.
∴ p = 1/6 and q = 1 − p = 5/6.
Let X be the number of failures preceding the first success.
∴ by geometric distribution,
P[X = x] = q^x p = (5/6)^x (1/6) for x = 0, 1, 2, 3, ...
Thus, the desired probability = P[the die is to be cast more than five times]
= P[the number of throws is at least 6]
= P[the number of failures preceding the first success is at least 5]
= P[X ≥ 5]
= P[X = 5] + P[X = 6] + P[X = 7] + ...
= (5/6)^5 (1/6) + (5/6)^6 (1/6) + (5/6)^7 (1/6) + ...
= (5/6)^5 (1/6) [1 + (5/6) + (5/6)^2 + ...]
= (5/6)^5 (1/6) · 1/(1 − 5/6)
= (5/6)^5.
Let us now discuss some properties of geometric distribution.
Mean and Variance
Mean of the geometric distribution is given as

Mean = E(X) = Σ_{x=0}^{∞} x q^x p = p Σ_{x=1}^{∞} x q^(x−1) · q

     = pq Σ_{x=1}^{∞} x q^(x−1) = pq Σ_{x=1}^{∞} d/dq (q^x)

[∵ d/dq (q^x) = x q^(x−1) is the derivative of q^x w.r.t. q with x kept constant, using d/dx (x^m) = m x^(m−1) where m is constant (see Unit 6 of MST-001)]

     = pq d/dq [Σ_{x=1}^{∞} q^x]   [∵ the sum of the derivatives is the derivative of the sum]

     = pq d/dq [q + q^2 + q^3 + ...]

     = pq d/dq [q/(1 − q)]

     = pq · [(1 − q) · 1 − q · (−1)]/(1 − q)^2   [applying the quotient rule of differentiation]

     = pq · [(1 − q + q)/p^2]

     = pq/p^2 = q/p.                                    ... (1)
Variance of the geometric distribution is
V(X) = E(X^2) − [E(X)]^2,
where

E(X^2) = Σ_{x=0}^{∞} x^2 p(x)

       = Σ_{x=0}^{∞} [x(x − 1) + x] p(x)   [∵ x^2 = x(x − 1) + x; this has already been discussed in Unit 9]

       = Σ_{x=0}^{∞} x(x − 1) p(x) + Σ_{x=0}^{∞} x p(x)

       = Σ_{x=2}^{∞} x(x − 1) q^x p + q/p   [using (1) in the second term]

       = pq^2 Σ_{x=2}^{∞} x(x − 1) q^(x−2) + q/p   [∵ q^x = q^(x−2) · q^2]

       = pq^2 Σ_{x=2}^{∞} d^2/dq^2 (q^x) + q/p

[∵ d/dq (q^x) = x q^(x−1) and d^2/dq^2 (q^x) = x(x − 1) q^(x−2), treating x as constant]

       = pq^2 d^2/dq^2 [Σ_{x=2}^{∞} q^x] + q/p

       = pq^2 d^2/dq^2 [q^2 + q^3 + q^4 + ...] + q/p

       = pq^2 d^2/dq^2 [q^2/(1 − q)] + q/p

Now, d/dq [q^2/(1 − q)] = [(1 − q) · 2q − q^2 · (−1)]/(1 − q)^2 = (2q − q^2)/(1 − q)^2   [quotient rule],

and d^2/dq^2 [q^2/(1 − q)] = [(1 − q)^2 (2 − 2q) − (2q − q^2) · 2(1 − q)(−1)]/(1 − q)^4

                           = [(1 − q)(2 − 2q) + 2(2q − q^2)]/(1 − q)^3

                           = [2 − 4q + 2q^2 + 4q − 2q^2]/p^3 = 2/p^3.   [∵ p = 1 − q]

∴ E(X^2) = pq^2 · 2/p^3 + q/p = 2q^2/p^2 + q/p

∴ V(X) = E(X^2) − [E(X)]^2

       = 2q^2/p^2 + q/p − (q/p)^2

       = q^2/p^2 + q/p

       = (q/p)(q/p + 1) = (q/p) · (q + p)/p   [∵ p + q = 1]

       = q/p^2.
Remark 1: Variance = q/p^2 = (q/p) · (1/p) = Mean/p
∴ Variance > Mean   [∵ p < 1 ⇒ Mean/p > Mean]
Hence, unlike binomial distribution, the variance of the geometric distribution is greater than its mean.
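Both formulas can be verified by simulation. The sketch below is our own illustration (the value p = 0.25 and the sample size are assumptions); it draws a large sample of "failures before the first success" and compares the sample mean and variance with q/p and q/p^2.

    import random

    p = 0.25
    q = 1 - p
    print(q / p, q / p ** 2)          # theoretical mean 3.0 and variance 12.0

    def draw():
        # count failures until the first success
        x = 0
        while random.random() > p:
            x += 1
        return x

    sample = [draw() for _ in range(200_000)]
    m = sum(sample) / len(sample)
    v = sum((s - m) ** 2 for s in sample) / len(sample)
    print(m, v)                       # close to 3.0 and 12.0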
Example 2: Comment on the following:
The mean and variance of geometric distribution are 4 and 3 respectively.
Solution: Let p be the parameter (probability of success in each trial) of the given geometric distribution. Then,
Mean = q/p = 4 and Variance = q/p^2 = 3.
Dividing the variance by the mean gives
1/p = 3/4
⇒ p = 4/3, which is impossible, since a probability can never exceed unity.
Hence, the given statement is wrong.
Now, you can try the following exercises.
E1) The probability of hitting a target in any attempt is 0.6. What is the probability that it would be hit on the fifth attempt?

E2) Determine the geometric distribution for which the mean is 3 and variance
is 4.

Lack of Memory Property
Now, let us discuss the distinguishing property of the geometric distribution, i.e. the 'lack of memory' property or 'forgetfulness' property. For example, in a random experiment following the geometric law, if it is given that the first three trials (say) were failures, the probability of having to wait a further five trials for the first success is the same as if counting had started afresh; the wait already suffered does not affect the remaining waiting time. The geometric distribution is the only discrete distribution which has this forgetfulness (memoryless) property. However, there is one continuous distribution which also has the memoryless property, namely the exponential distribution, which we will study in Unit 15 of MST-003; the exponential distribution is likewise the only continuous distribution having this property. It is pertinent to mention here that, in several respects, the geometric distribution is the discrete analogue of the exponential distribution.
Let us now give a mathematical/statistical discussion of the 'memoryless property' of geometric distribution.
Suppose trials are performed at times 1, 2, 3, 4, …, and let X, the number of failures preceding the first success, follow the geometric distribution with probability p of success in each trial.
Thus, P[X ≥ j] = P[X = j] + P[X = j + 1] + ...
             = q^j p + q^(j+1) p + q^(j+2) p + ...
             = q^j p [1 + q + q^2 + ...]
             = q^j p · 1/(1 − q) = q^j p · (1/p) = q^j.
Now, let us consider the event [X ≥ j + k].
P[X ≥ j + k | X ≥ j] means the conditional probability of waiting for at least j + k unsuccessful trials given that we waited for at least j unsuccessful attempts, and is given by
P[X ≥ j + k | X ≥ j] = P[(X ≥ j + k) ∩ (X ≥ j)]/P[X ≥ j]
                    = P[X ≥ j + k]/P[X ≥ j]   [∵ X ≥ j + k implies X ≥ j]
                    = q^(j+k)/q^j = q^k
                    = P[X ≥ k]   [∵ P[X ≥ j] = q^j, as already obtained in this section]
So, P[X ≥ j + k | X ≥ j] = P[X ≥ k].
The above result reveals that the conditional probability that at least the first j + k trials are unsuccessful, given that at least the first j trials were unsuccessful, is the same as the unconditional probability that at least the first k trials are unsuccessful. So, the probability of getting the first success remains the same if we start counting k unsuccessful trials from anywhere, provided all the trials preceding that point are unsuccessful, i.e. the future does not depend on the past; it depends only on the present. The geometric distribution thus forgets the preceding trials, and hence this property is given the name "forgetfulness property" or "memoryless property" or "lack of memory" property.
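Numerically, the memoryless property says that P[X ≥ j + k]/P[X ≥ j] must equal P[X ≥ k]; a two-line check of ours (the values p = 0.2, j = 3, k = 5 are assumptions chosen for illustration):

    p, q = 0.2, 0.8

    def tail(j):
        # P[X >= j] = q^j, as derived above
        return q ** j

    j, k = 3, 5
    print(tail(j + k) / tail(j), tail(k))   # both print 0.32768 = 0.8**5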

12.3 NEGATIVE BINOMIAL DISTRIBUTION


Negative binomial distribution is a generalisation of geometric distribution.
Like geometric distribution, variance of this distribution is also greater than its
mean. There are many instances including ‘deaths of insects’ and ‘number of
insect bites’ where negative binomial distribution is employed.
Negative binomial distribution is a generalisation of geometric distribution in
the sense that geometric distribution is the distribution of ‘number of failures
preceding the first success’ whereas the negative binomial distribution is the
distribution of ‘number of failures preceding the r th success’.
Let X be the random variable which denotes the number of failures preceding the r-th success. Let p be the probability of a success and suppose there are x failures preceding the r-th success; the number of trials is then x + r.
Now, the (x + r)-th trial is a success, but the remaining (r − 1) successes can happen in any r − 1 trials out of the first (x + r − 1) trials. Thus, the happening of the first (r − 1) successes in (x + r − 1) trials follows binomial distribution with 'p' as the probability of success in each trial, and so its probability is

C(x+r−1, r−1) p^(r−1) q^((x+r−1)−(r−1)) = C(x+r−1, r−1) p^(r−1) q^x, where q = 1 − p

[∵ by binomial distribution, the probability of x successes in n trials with p as the probability of success is C(n, x) p^x q^(n−x); here C(n, x) denotes nCx.]

Therefore,
P[x failures preceding the r-th success]
= P[{first (r − 1) successes in (x + r − 1) trials} ∩ {success in the (x + r)-th trial}]
= P[first (r − 1) successes in (x + r − 1) trials] · P[success in the (x + r)-th trial]
= C(x+r−1, r−1) p^(r−1) q^x · p
= C(x+r−1, r−1) p^r q^x
The above discussion leads to the following definition:
Definition: A random variable X is said to follow a negative binomial
distribution with parameters r (a positive integer) and p (0 < p < 1) if its
probability mass function is given by:
P[X = x] = C(x+r−1, r−1) p^r q^x   for x = 0, 1, 2, 3, ...
         = 0                       otherwise
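Incidentally, SciPy's nbinom distribution uses exactly this parameterisation (number of failures before the r-th success), so the pmf can be cross-checked as in the sketch below; SciPy and the values r = 3, p = 0.4 are our own choices, not something the unit prescribes.

    from math import comb
    from scipy.stats import nbinom

    r, p = 3, 0.4
    q = 1 - p
    for x in range(5):
        manual = comb(x + r - 1, r - 1) * p ** r * q ** x
        print(x, manual, nbinom.pmf(x, r, p))   # the two values agree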
Now, as we know that C(n, r) = C(n, n − r)   [see 'Combination' in Unit 4 of MST-001],
C(x+r−1, r−1) can be written as
C(x+r−1, r−1) = C(x+r−1, (x+r−1) − (r−1)) = C(x+r−1, x)
             = (x + r − 1)!/[x! (r − 1)!]
             = [(r + x − 1)(r + x − 2) ... (r + 1) r]/x!
             = (−1)^x [(−r − x + 1)(−r − x + 2) ... (−r − 1)(−r)]/x!
[∵ the numerator is a product of x terms from r + 0 up to r + (x − 1), and we have taken a factor (−1) common from each of these x terms in the product]
             = (−1)^x [(−r)(−r − 1)(−r − 2) ... (−r − x + 1)]/x!   [writing the terms of the numerator in reverse order]
             = (−1)^x C(−r, x).
Note: The symbol C(n, x) stands for nCx if n is a positive integer and is then equal to n(n − 1)(n − 2) ... (n − x + 1)/x!. We may also use the symbol C(n, x) if n is any real number; in this case, though it no longer stands for nCx, it is still equal to n(n − 1)(n − 2) ... (n − x + 1)/x!.

Hence, the probability distribution of the negative binomial distribution can be expressed in the following form:
P[X = x] = (−1)^x C(−r, x) p^r q^x
        = C(−r, x) (−q)^x p^r   for x = 0, 1, 2, 3, ...
        = C(−r, x) (−q)^x (1)^(−r−x) p^r   for x = 0, 1, 2, ...
Here, the expression C(−r, x) (−q)^x (1)^(−r−x) is similar to the general term C(n, x) p^x q^(n−x) of the binomial distribution:
C(−r, x) (−q)^x (1)^(−r−x) is the general term of the expansion [1 + (−q)]^(−r) = (1 − q)^(−r).
[You have already studied in Unit 9 of this course that C(n, x) p^x q^(n−x) is the general term of (q + p)^n.]
Hence, P[X = x] = C(−r, x) (−q)^x (1)^(−r−x) · p^r is the general term of the expansion of (1 − q)^(−r) p^r, i.e. P[X = 0], P[X = 1], P[X = 2], … are the successive terms of this expansion, and hence the sum of these probabilities
= (1 − q)^(−r) p^r
= p^(−r) p^r   [∵ 1 − q = p]
= 1,
which must be so, this being a probability distribution.
Also, as the probabilities of the negative binomial distribution for X = 0, 1, 2, ... are the successive terms of
(1 − q)^(−r) p^r = (1 − q)^(−r) (1/p)^(−r) = [(1 − q)/p]^(−r) = [1/p − q/p]^(−r),
which is a binomial expansion with negative index (−r), it is for this reason that the probability distribution given above is called the negative binomial distribution.
Mean and Variance
Mean and variance of the negative binomial distribution can be obtained on
observing the form of this distribution and comparing it with the binomial
distribution as follows:

The probabilities of binomial distribution for X = 0, 1, 2, … are the successive terms of the binomial expansion of (q + p)^n, and the mean and variance obtained for that distribution are
Mean = np = (n)(p), i.e. the product of the index and the second term in (q + p), and
Variance = npq = (n)(p)(q), i.e. the product of the index, the second term in (q + p) and the first term in (q + p).
Similarly, the probabilities of negative binomial distribution for X = 0, 1, 2, ... are the successive terms of the expansion of [1/p + (−q/p)]^(−r), and thus its mean and variance are:
Mean = (index) × [second term in 1/p + (−q/p)] = (−r)(−q/p) = rq/p, and
Variance = (index) × [second term in 1/p + (−q/p)] × [first term in 1/p + (−q/p)]
        = (−r)(−q/p)(1/p)
        = rq/p^2.
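A quick check of these two formulas against SciPy (a sketch of ours, with r = 3 and p = 0.4 assumed for illustration):

    from scipy.stats import nbinom

    r, p = 3, 0.4
    q = 1 - p
    print(r * q / p, r * q / p ** 2)              # 4.5 and 11.25
    print(nbinom.mean(r, p), nbinom.var(r, p))    # SciPy returns the same values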
Remark 2
i) If we take r = 1, we have P[X = x] = pq^x for x = 0, 1, 2, ..., which is the geometric probability distribution. Hence, geometric distribution is a particular case of negative binomial distribution, and the latter may be regarded as the generalisation of the former.
ii) Putting r = 1 in the formulas of mean and variance of negative binomial distribution, we have
Mean = (1)q/p = q/p, and
Variance = (1)q/p^2 = q/p^2,
which are the mean and variance of geometric distribution.

Example 3: Find the probability that the third head turns up on the 5th toss of an unbiased coin.
Solution: It is a negative binomial situation with p = 1/2, r = 3, x + r = 5 ⇒ x = 2.
∴ by negative binomial distribution, we have
P[X = 2] = C(x+r−1, r−1) p^r q^x
        = C(2+3−1, 3−1) (1/2)^3 (1/2)^2
        = C(4, 2) (1/2)^5 = [(4 × 3)/2] × (1/32) = 6/32 = 3/16.
Example 4: Find the probability that the third child in a family is the family's second daughter, assuming that male and female births are equally probable.
Solution: It is a negative binomial situation with
p = 1/2   [∵ male and female births are equally probable]
r = 2, x + r = 3 ⇒ x = 1.
∴ by negative binomial distribution,
P[X = 1] = C(x+r−1, r−1) p^r q^x = C(1+2−1, 2−1) (1/2)^2 (1/2)^1 = C(2, 1) × (1/8) = 2/8 = 1/4.
Example 5: A proof-reader catches a misprint in a document with probability
0.8. Find the expected number of misprints in the document in which the
proof-reader stops after catching the 20th misprint.
Solution: Let X be the number of misprints not caught by the proof-reader and
r be the number of misprints caught by him/her. It is a negative binomial
situation where we are to obtain the expected (mean) number of misprints in
the document i.e. E(X + r). We will first obtain mean number of misprints
which could not be caught by the proof-reader i.e. E(X).
Here, p = 0.8 and hence q = 0.2, r = 20.
Now, by negative binomial distribution,
E(X) = rq/p = (20 × 0.2)/0.8 = 5.
Therefore, E(X + r) = E(X) + r = 5 + 20 = 25.
Hence, the expected number of misprints in the document till the proof-reader catches the 20th misprint is 25.
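The three examples above can be reproduced in a few lines with SciPy's nbinom (our illustration; the unit itself works directly from the formula):

    from scipy.stats import nbinom

    print(nbinom.pmf(2, 3, 0.5))         # Example 3: 0.1875 = 3/16
    print(nbinom.pmf(1, 2, 0.5))         # Example 4: 0.25 = 1/4
    print(nbinom.mean(20, 0.8) + 20)     # Example 5: 5 + 20 = 25.0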
Now, we are sure that you will be able to solve the following exercises:
E3) Find the probability that the fourth 'five' is obtained on the tenth throw of an unbiased die.
E4) An item is produced by a machine in large numbers. The machine is
known to produce 10 per cent defectives. A quality control engineer is
testing the item randomly. What is the probability that at least 3 items
are examined in order to get 2 defectives?
E5) Find the expected number of children in a family which stops
producing children after having the second daughter. Assume, the male
and female births are equally probable.

We now conclude this unit by giving a summary of what we have covered in it.
12.4 SUMMARY
The following main points have been covered in this unit:
1) A random variable X is said to follow geometric distribution if it assumes non-negative integer values and its probability mass function is given by
P[X = x] = q^x p   for x = 0, 1, 2, ...
         = 0       otherwise
2) For geometric distribution, mean = q/p and variance = q/p^2.
3) A random variable X is said to follow a negative binomial distribution with parameters r (a positive integer) and p (0 < p < 1) if its probability mass function is given by:
P[X = x] = C(x+r−1, r−1) p^r q^x   for x = 0, 1, 2, 3, ...
         = 0                       otherwise
4) For negative binomial distribution, mean = rq/p and variance = rq/p^2.
5) For both these distributions, variance > mean.

12.5 SOLUTIONS/ANSWERS
E1) Let p be the probability of success, i.e. of hitting the target in an attempt.
∴ p = 0.6, q = 1 − p = 0.4.
Let X be the number of unsuccessful attempts preceding the first successful attempt.
∴ by geometric distribution,
P[X = x] = q^x p = (0.4)^x (0.6) for x = 0, 1, 2, ...
Thus, the desired probability = P[hitting the target on the fifth attempt]
= P[the number of unsuccessful attempts before the first success is 4]
= P[X = 4]
= (0.4)^4 (0.6) = (0.0256)(0.6) = 0.01536.
E2) Let p be the probability of success in an attempt, and q = 1 − p.
Now, mean = q/p = 3 and variance = q/p^2 = 4.
Dividing the variance by the mean gives 1/p = 4/3
⇒ p = 3/4 and hence q = 1/4.
Now, let X be the number of failures preceding the first success.
∴ P[X = x] = q^x p = (1/4)^x (3/4) for x = 0, 1, 2, ...
This is the desired probability distribution.
E3) It is a negative binomial situation with
r = 4, x + r = 10 ⇒ x = 6, p = 1/6 and hence q = 5/6.
∴ P[X = 6] = C(x+r−1, r−1) p^r q^x
          = C(6+4−1, 4−1) (1/6)^4 (5/6)^6
          = C(9, 3) × (625 × 25)/(36)^5
          = [(9 × 8 × 7)/6] × (625 × 25)/(36)^5 ≈ 0.0217.
E4) It is a negative binomial situation with r = 2, x + r ≥ 3 ⇒ x ≥ 1, p = 0.1 and hence q = 0.9.
Now, the required probability = P[X + r ≥ 3]
= P[X ≥ 1]
= 1 − P[X = 0]
= 1 − C(0+2−1, 2−1) (0.1)^2 (0.9)^0
= 1 − (1)(0.01)(1) = 0.99.


E5) It is a negative binomial situation with p = 1/2, q = 1/2, r = 2.
Let X be the number of boys in the family.
∴ E(X) = rq/p = [2 × (1/2)]/(1/2) = 2
∴ E(X + r) = E(X) + r = 2 + 2 = 4
∴ the required expected value is 4.

UNIT 13 NORMAL DISTRIBUTION
Structure
13.1 Introduction
Objectives

13.2 Normal Distribution


13.3 Chief Characteristics of Normal Distribution
13.4 Moments of Normal Distribution
13.5 Mode and Median of Normal Distribution
13.6 Mean Deviation about Mean
13.7 Some Problems Based on Properties of Normal Distribution
13.8 Summary
13.9 Solutions/Answers

13.1 INTRODUCTION
In Units 9 to 12, we have studied standard discrete distributions. From this unit
onwards, we are going to discuss standard continuous univariate distributions.
This unit and the next unit deal with normal distribution. Normal distribution
has wide spread applications. It is being used in almost all data-based research
in the field of agriculture, trade, business, industry and the society. For
instance, normal distribution is a good approximation to the distribution of
heights of randomly selected large number of students studying at the same
level in a university.
The normal distribution has a unique position in probability theory, and it can
be used as approximation to most of the other distributions. Discrete
distributions occurring in practice including binomial, Poisson,
hypergeometric, etc. already studied in the previous block (Block 3) can also
be approximated by normal distribution. You will notice in the subsequent
courses that theory of estimation of population parameters and testing of
hypotheses on the basis of sample statistics have also been developed using
the concept of normal distribution as most of the sampling distributions tend to
normality for large samples. Therefore, study of normal distribution is very
important.
Due to various properties and applications of the normal distribution, we have
covered it in two units – Units 13 and 14. In the present unit, normal
distribution is introduced and explained in Sec. 13.2. Chief characteristics of
normal distribution are discussed in Sec. 13.3. Secs. 13.4, 13.5 and 13.6
describe the moments, mode, median and mean deviation about mean of the
distribution.
Objectives
After studying this unit, you would be able to:
 introduce and explain the normal distribution;

 know the conditions under which binomial and Poisson distributions
tend to normal distribution;
 state various characteristics of the normal distribution;
 compute the moments, mode, median and mean deviation about mean
of the distribution; and
 solve various practical problems based on the above properties of
normal distribution.

13.2 NORMAL DISTRIBUTION


The concept of normal distribution was initially discovered by the French-born mathematician Abraham De Moivre (1667-1754), working in England, in 1733. De Moivre obtained
this continuous distribution as a limiting case of binomial distribution. His
work was further refined by Pierre S. Laplace (1749-1827) in 1774. But the
contribution of Laplace remained unnoticed for long till it was given concrete
shape by Karl Gauss (1777-1855) who first made reference to it in 1809 as the
distribution of errors in Astronomy. That is why the normal distribution is
sometimes called Gaussian distribution. Though, normal distribution can be
used as approximation to most of the other distributions, here we are going to
discuss (without proof) its approximation to (i) binomial distribution and (ii)
Poisson distribution.
Normal Distribution as a Limiting Case of Binomial Distribution
Normal distribution is a limiting case of binomial distribution under the
following conditions:
i) n, the number of trials, is indefinitely large, i.e. n → ∞;
ii) neither p (the probability of success) nor q (the probability of failure) is too
close to zero.
Under these conditions, the binomial distribution can be closely approximated by a normal distribution with standardized variable given by Z = (X − np)/√(npq). The approximation becomes better with increasing n. In practice, the approximation is very good if both np and nq are greater than 5.
For binomial distribution, you have already studied [see Unit 9 of this course]
that
μ2 = npq, μ3 = npq(q − p), μ4 = npq[1 + 3(n − 2)pq],

β1 = μ3^2/μ2^3 = [npq(q − p)]^2/(npq)^3 = (q − p)^2/npq,

β2 = μ4/μ2^2 = 3 + (1 − 6pq)/npq,

γ1 = √β1 = (q − p)/√(npq) = (1 − 2p)/√(npq), and

γ2 = β2 − 3 = (1 − 6pq)/npq.
From the above results, it may be noticed that if n → ∞, then the moment coefficient of skewness (γ1) → 0 and the moment coefficient of kurtosis β2 → 3, i.e. γ2 → 0. Hence, as n → ∞, the distribution becomes symmetrical and the curve of the distribution becomes mesokurtic, which is the main feature of normal distribution.
Normal Distribution as a Limiting Case of Poisson Distribution
You have already studied in Unit 10 of this course that Poisson distribution is a limiting case of binomial distribution under the following conditions:
i) n, the number of trials, is indefinitely large, i.e. n → ∞;
ii) p, the constant probability of success for each trial, is very small, i.e. p → 0;
iii) np is a finite quantity, say λ.
As we have discussed above, there is a relation between the binomial and normal distributions. It can, in fact, be shown that the Poisson distribution approaches a normal distribution with standardized variable given by Z = (X − λ)/√λ as λ increases indefinitely.
For Poisson distribution, you have already studied in Unit 10 of the course that
32  2 1 1
1  3
 3   1  1  ; and
2   

 4 3 2 1 1
2  2
 2    3    2  2  3  .
2   
Like binomial distribution, here in case of Poisson distribution also it may be
noticed from the above results that the moment coefficient of skewness
( 1 )  0 and the moment coefficient of kurtosis i.e. 2  3 or  2  0 as
λ . Hence, as λ  , the distribution becomes symmetrical and the curve
of the distribution becomes mesokurtic, which is the main feature of normal
distribution.
Under the conditions discussed above, a random variable following a binomial distribution or a Poisson distribution approaches a normal distribution, which is defined as follows:
Definition: A continuous random variable X is said to follow normal distribution with parameters μ (−∞ < μ < ∞) and σ^2 (> 0) if it takes on any real value and its probability density function is given by

f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},  −∞ < x < ∞;

which may also be written as

f(x) = [1/(σ√(2π))] exp{−(1/2)[(x−μ)/σ]^2},  −∞ < x < ∞.

Remark
i) The probability function represented by f(x) may also be written as f(x; μ, σ^2).
ii) If a random variable X follows normal distribution with mean μ and variance σ^2, then we may write "X is distributed as N(μ, σ^2)", which is expressed as X ~ N(μ, σ^2).
iii) No continuous probability function, and hence not the normal distribution either, can be used to obtain the probability of occurrence of a particular value of the random variable. This is because such a probability is vanishingly small, so instead of specifying the probability of the random variable taking a particular value, we specify the probability of its lying within an interval. For a detailed discussion of this concept, Sec. 5.4 of Unit 5 may be referred to.
iv) If X ~ N(μ, σ^2), then Z = (X − μ)/σ is the standard normal variate, having mean '0' and variance '1'. The values of the mean and variance of the standard normal variate are obtained as under, using the properties of expectation and variance (see Unit 8 of this course):

Mean of Z i.e. E(Z) = E[(X − μ)/σ] = (1/σ) E(X − μ)
                   = (1/σ)[E(X) − μ]
                   = (1/σ)(μ − μ) = 0   [∵ E(X) = mean of X = μ]

Variance of Z i.e. V(Z) = V[(X − μ)/σ]
                       = (1/σ^2) V(X − μ) = (1/σ^2) V(X)
                       = σ^2/σ^2   [∵ variance of X is σ^2]
                       = 1.
v) The probability density function of the standard normal variate Z = (X − μ)/σ is given by
φ(z) = [1/√(2π)] e^{−(1/2)z^2},  −∞ < z < ∞.
This result can be obtained on replacing f(x) by φ(z), x by z, μ by 0 and σ by 1 in the probability density function of the normal variate X, i.e. in
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},  −∞ < x < ∞.
vi) The graph of the normal probability function f(x) with respect to x is the famous 'bell-shaped' curve. The top of the bell is directly above the mean μ. For large values of σ, the curve tends to flatten out, and for small values of σ, it has a sharp peak, as shown in Fig. 13.1.

[Fig. 13.1]
Normal distribution has various properties and large number of applications. It


can be used as approximation to most of the other distributions and hence is
most important probability distribution in statistical analysis. Theory of
estimation of population parameters and testing of hypotheses on the basis of
sample statistics (to be discussed in the next course MST-004) have also been
developed using the concept of normal distribution as most of the sampling
distributions tend to normality for large samples. Normal distribution has
become widely and uncritically accepted on the basis of much practical work.
As a result, it holds a central position in Statistics.
Let us now take some examples of writing the probability function of normal
distribution when mean and variance are specified, and vice-versa:
Example 1: (i) If X ~ N(40, 25), write down the p.d.f. of X.
(ii) If X ~ N(−36, 20), write down the p.d.f. of X.
(iii) If X ~ N(0, 2), write down the p.d.f. of X.
Solution: (i) Here we are given X ~ N(40, 25).
∴ in usual notations, we have
μ = 40, σ^2 = 25 ⇒ σ = √25 = 5   [∵ σ > 0 always]
Now, the p.d.f. of the random variable X is given by
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2}
     = [1/(5√(2π))] e^{−(1/2)[(x−40)/5]^2},  −∞ < x < ∞
(ii) Here we are given X ~ N(−36, 20).
∴ in usual notations, we have
μ = −36, σ^2 = 20 ⇒ σ = √20
Now, the p.d.f. of the random variable X is given by
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2}
     = [1/(√20 · √(2π))] e^{−(1/2)[(x−(−36))/√20]^2}
     = [1/√(40π)] e^{−(x+36)^2/40}
     = [1/(2√(10π))] e^{−(x+36)^2/40},  −∞ < x < ∞
(iii) Here we are given X ~ N(0, 2).
∴ in usual notations, we have
μ = 0, σ^2 = 2 ⇒ σ = √2
Now, the p.d.f. of the random variable X is given by
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} = [1/(√2 · √(2π))] e^{−(1/2)[x/√2]^2}
     = [1/(2√π)] e^{−x^2/4},  −∞ < x < ∞
Example 2: Below, in each case, is given the p.d.f. of a normally distributed random variable. Obtain the parameters (mean and variance) of the variable.
(i) f(x) = [1/(6√(2π))] e^{−(1/2)[(x−46)/6]^2},  −∞ < x < ∞
(ii) f(x) = [1/(4√(2π))] e^{−(1/32)(x−60)^2},  −∞ < x < ∞
Solution: (i) f(x) = [1/(6√(2π))] e^{−(1/2)[(x−46)/6]^2},  −∞ < x < ∞
Comparing it with
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},
we have
μ = 46, σ = 6
∴ Mean = μ = 46, Variance = σ^2 = 36
(ii) f(x) = [1/(4√(2π))] e^{−(1/32)(x−60)^2},  −∞ < x < ∞
          = [1/(4√(2π))] e^{−(1/2)[(x−60)/4]^2}
Comparing it with
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},
we get
μ = 60, σ = 4
∴ Mean = μ = 60, Variance = σ^2 = 16
Here are some exercises for you.
E 1) Write down the p.d.f. of r.v. X in each of the following cases:
(i) X ~ N(1/2, 4/9)
(ii) X ~ N(−40, 16)
E 2) Below, in each case, is given the p.d.f. of a normally distributed random variable. Obtain the parameters (mean and variance) of the variable.
(i) f(x) = [1/(2√(2π))] e^{−x^2/8},  −∞ < x < ∞
(ii) f(x) = [1/(2√π)] e^{−(1/4)(x+2)^2},  −∞ < x < ∞

Now, we are going to state some important properties of normal distribution in the next section.

13.3 CHIEF CHARACTERISTICS OF NORMAL DISTRIBUTION
The normal probability distribution with mean μ and variance σ^2 has the following properties:
i) The curve of the normal distribution is bell-shaped, as shown in Fig. 13.1 given in Remark (vi) of Sec. 13.2.
ii) The curve of the distribution is completely symmetrical about x = μ, i.e. if we fold the curve at x = μ, the two parts of the curve are mirror images of each other.
iii) For normal distribution, Mean = Median = Mode.
iv) f(x), being a probability density, can never be negative and hence no portion of the curve lies below the x-axis.
v) Though the x-axis comes closer and closer to the normal curve as the magnitude of x goes towards −∞ or +∞, it never touches it.
vi) The normal curve has only one mode.
vii) The central moments of normal distribution are
μ1 = 0, μ2 = σ^2, μ3 = 0, μ4 = 3σ^4, and
β1 = μ3^2/μ2^3 = 0, β2 = μ4/μ2^2 = 3,
i.e. the distribution is symmetrical and the curve is always mesokurtic.
Note: Not only μ1 and μ3, but all the odd order central moments are zero for a normal distribution.
viii) For the normal curve,
Q3 − Median = Median − Q1,
i.e. the first and third quartiles of normal distribution are equidistant from the median.
ix) Quartile Deviation (Q.D.) = (Q3 − Q1)/2 is approximately equal to 2/3 of the standard deviation.
x) Mean deviation is approximately equal to 4/5 of the standard deviation.
xi) Q.D. : M.D. : S.D. = (2/3)σ : (4/5)σ : σ = 10 : 12 : 15.
xii) The points of inflexion of the curve are at x = μ ± σ, where f(x) = [1/(σ√(2π))] e^{−1/2}.
xiii) If X1, X2, ..., Xn are independent normal variables with means μ1, μ2, ..., μn and variances σ1^2, σ2^2, ..., σn^2 respectively, then the linear combination a1X1 + a2X2 + ... + anXn of X1, X2, ..., Xn is also a normal variable with mean a1μ1 + a2μ2 + ... + anμn and variance a1^2 σ1^2 + a2^2 σ2^2 + ... + an^2 σn^2.
xiv) In particular, the sum or difference of two independent normal variates is also a normal variate. If X and Y are two independent normal variates with means μ1, μ2 and variances σ1^2, σ2^2, then
X + Y ~ N(μ1 + μ2, σ1^2 + σ2^2) and X − Y ~ N(μ1 − μ2, σ1^2 + σ2^2).
Also, if X1, X2, ..., Xn are independent variates each distributed as N(μ, σ^2), then their mean X̄ ~ N(μ, σ^2/n).
xv) Area property:
P(μ − σ < X < μ + σ) = ∫_{μ−σ}^{μ+σ} f(x) dx = 0.6827,
or P(−1 < Z < 1) = ∫_{−1}^{1} φ(z) dz = 0.6827;
P(μ − 2σ < X < μ + 2σ) = ∫_{μ−2σ}^{μ+2σ} f(x) dx = 0.9544,
or P(−2 < Z < 2) = ∫_{−2}^{2} φ(z) dz = 0.9544; and
P(μ − 3σ < X < μ + 3σ) = ∫_{μ−3σ}^{μ+3σ} f(x) dx = 0.9973,
or P(−3 < Z < 3) = ∫_{−3}^{3} φ(z) dz = 0.9973.
This property and its applications will be discussed in detail in Unit 14.
Let us now establish some of these properties.

13.4 MOMENTS OF NORMAL DISTRIBUTION

Before finding the moments, we first define the gamma function [see Unit 16 of the course for a detailed discussion], which is used for computing the even order central moments.
Gamma Function

If n > 0, the integral ∫_0^∞ x^(n−1) e^(−x) dx is called a gamma function and is denoted by Γ(n),

e.g. ∫_0^∞ x^2 e^(−x) dx = ∫_0^∞ x^(3−1) e^(−x) dx = Γ(3),

and ∫_0^∞ x^(−1/2) e^(−x) dx = ∫_0^∞ x^(1/2 − 1) e^(−x) dx = Γ(1/2).

Some properties of the gamma function are:
i) If n > 1, Γ(n) = (n − 1) Γ(n − 1)
ii) If n is a positive integer, then Γ(n) = (n − 1)!
iii) Γ(1/2) = √π.
Now, the first four central moments of normal distribution are obtained as
follows:
First Order Central Moment
As the first order central moment (μ1) of any distribution about its mean is always zero [see Unit 3 of MST-002], therefore the first order central moment (μ1) of normal distribution = 0.
Second Order Central Moment

μ2 = ∫_{−∞}^{∞} (x − μ)^2 f(x) dx   [see Unit 8 of MST-003]

   = ∫_{−∞}^{∞} (x − μ)^2 · [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx

Put (x − μ)/σ = z ⇒ x − μ = σz.
Differentiating, dx = σ dz.
Also, when x → −∞, z → −∞, and when x → ∞, z → ∞.

∴ μ2 = ∫_{−∞}^{∞} σ^2 z^2 · [1/(σ√(2π))] e^{−(1/2)z^2} σ dz

     = [σ^2/√(2π)] ∫_{−∞}^{∞} z^2 e^{−(1/2)z^2} dz

     = [2σ^2/√(2π)] ∫_0^∞ z^2 e^{−(1/2)z^2} dz

[∵ on changing z to −z, the integrand z^2 e^{−(1/2)z^2} does not change, i.e. it is an even function of z (see Unit 2 of MST-001); now the following property of the definite integral can be used: ∫_{−∞}^{∞} f(z) dz = 2 ∫_0^∞ f(z) dz if f(z) is an even function of z]

Now, put (1/2)z^2 = t ⇒ z = √(2t) ⇒ dz = (1/2)(2t)^(−1/2) · 2 dt = dt/√(2t)

∴ μ2 = [2σ^2/√(2π)] ∫_0^∞ (2t) e^{−t} · [1/√(2t)] dt = [2σ^2/√π] ∫_0^∞ t^(1/2) e^{−t} dt

     = [2σ^2/√π] ∫_0^∞ t^(3/2 − 1) e^{−t} dt

     = [2σ^2/√π] Γ(3/2)   [by definition of gamma function]

     = [2σ^2/√π] · (1/2) Γ(1/2)   [by Property (i) of gamma function]

     = [2σ^2/√π] · (1/2) √π   [by Property (iii) of gamma function]

     = σ^2.
Third Order Central Moment

μ3 = ∫_{−∞}^{∞} (x − μ)^3 f(x) dx

   = ∫_{−∞}^{∞} (x − μ)^3 · [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx

Put (x − μ)/σ = z ⇒ x − μ = σz ⇒ dx = σ dz, and hence

μ3 = ∫_{−∞}^{∞} σ^3 z^3 · [1/(σ√(2π))] e^{−(1/2)z^2} σ dz

   = [σ^3/√(2π)] ∫_{−∞}^{∞} z^3 e^{−(1/2)z^2} dz

Now, as the integrand z^3 e^{−(1/2)z^2} changes to −z^3 e^{−(1/2)z^2} on changing z to −z, i.e. z^3 e^{−(1/2)z^2} is an odd function of z, therefore, using the following property of the definite integral:

∫_{−a}^{a} f(z) dz = 0 if f(z) is an odd function of z,

we have

μ3 = [σ^3/√(2π)] · 0 = 0.

Fourth Order Central Moment
μ4 = ∫_{−∞}^{∞} (x − μ)^4 f(x) dx = ∫_{−∞}^{∞} (x − μ)^4 · [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx

Putting (x − μ)/σ = z ⇒ dx = σ dz,

μ4 = ∫_{−∞}^{∞} σ^4 z^4 · [1/(σ√(2π))] e^{−(1/2)z^2} σ dz

   = [σ^4/√(2π)] ∫_{−∞}^{∞} z^4 e^{−(1/2)z^2} dz = [2σ^4/√(2π)] ∫_0^∞ z^4 e^{−(1/2)z^2} dz

[∵ the integrand z^4 e^{−(1/2)z^2} does not change on changing z to −z and hence is an even function of z; the same property is used as in the case of μ2]

Put (1/2)z^2 = t ⇒ z^2 = 2t ⇒ 2z dz = 2 dt ⇒ dz = dt/z = dt/√(2t)

∴ μ4 = [2σ^4/√(2π)] ∫_0^∞ (2t)^2 e^{−t} · [1/√(2t)] dt

     = [4σ^4/√π] ∫_0^∞ t^(3/2) e^{−t} dt

     = [4σ^4/√π] ∫_0^∞ t^(5/2 − 1) e^{−t} dt

     = [4σ^4/√π] Γ(5/2)   [by definition of gamma function]

     = [4σ^4/√π] · (3/2) Γ(3/2)   [by Property (i) of gamma function]

     = [4σ^4/√π] · (3/2)(1/2) Γ(1/2)   [by Property (i) of gamma function]

     = [4σ^4/√π] · (3/4) √π   [using Γ(1/2) = √π, Property (iii) of gamma function]

     = 3σ^4.
Thus, the first four central moments of normal distribution are
μ1 = 0, μ2 = σ^2, μ3 = 0, μ4 = 3σ^4.
∴ β1 = μ3^2/μ2^3 = 0, β2 = μ4/μ2^2 = 3σ^4/σ^4 = 3.
Therefore, the moment coefficient of skewness (√β1) = 0,
∴ the distribution is symmetrical.
The moment coefficient of kurtosis is β2 = 3, i.e. γ2 = 0.
∴ The curve of the normal distribution is mesokurtic.
Now, let us obtain the mode and median for normal distribution in the next section.

13.5 MODE AND MEDIAN OF NORMAL DISTRIBUTION
Mode
Let X ~ N(μ, σ^2); then the p.d.f. of X is given by
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},  −∞ < x < ∞    ... (1)
Taking logarithm on both sides of (1), we get
log f(x) = log[1/(σ√(2π))] − (1/2)[(x−μ)/σ]^2 log e   [∵ log mn = log m + log n and log m^n = n log m]
        = log[1/(σ√(2π))] − (1/(2σ^2))(x − μ)^2   [as log e = 1]
Differentiating w.r.t. x,
[1/f(x)] f′(x) = 0 − (1/(2σ^2)) · 2(x − μ) = −(x − μ)/σ^2
⇒ f′(x) = −[(x − μ)/σ^2] f(x)    ... (2)
For a maximum or minimum,
f′(x) = 0
⇒ −[(x − μ)/σ^2] f(x) = 0
⇒ x − μ = 0   [as f(x) ≠ 0]
⇒ x = μ
Now differentiating (2) w.r.t. x, we have
f″(x) = −[(x − μ)/σ^2] f′(x) − (1/σ^2) f(x)
At x = μ: f″(μ) = 0 − f(μ)/σ^2 = −f(μ)/σ^2 < 0
∴ x = μ is the point where the function has a maximum value.
∴ Mode of X is μ.
Median
Let M denote the median of the normally distributed random variable X.
We know that the median divides the distribution into two equal parts:
∫_{−∞}^{M} f(x) dx = ∫_{M}^{∞} f(x) dx = 1/2
⇒ ∫_{−∞}^{M} f(x) dx = 1/2
⇒ ∫_{−∞}^{μ} [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx + ∫_{μ}^{M} f(x) dx = 1/2
In the first integral, put (x − μ)/σ = z; therefore dx = σ dz.
Also, when x = μ, z = 0, and when x → −∞, z → −∞.
Thus, we have
∫_{−∞}^{0} [1/√(2π)] e^{−(1/2)z^2} dz + ∫_{μ}^{M} f(x) dx = 1/2
⇒ 1/2 + ∫_{μ}^{M} f(x) dx = 1/2
[∵ Z is the s.n.v. with p.d.f. φ(z) = (1/√(2π)) e^{−(1/2)z^2}, so ∫_{−∞}^{∞} φ(z) dz = 1 and, by symmetry, ∫_{−∞}^{0} φ(z) dz = 1/2]
⇒ ∫_{μ}^{M} f(x) dx = 0
⇒ M = μ   [as f(x) > 0]
∴ Median of X = μ.
From the above two results, we see that
Mean = Median = Mode = μ.
13.6 MEAN DEVIATION ABOUT THE MEAN

Mean deviation about mean for normal distribution is

M.D. = ∫_{−∞}^{∞} |x − Mean| f(x) dx   [see Section 8.4 of Unit 8]

     = ∫_{−∞}^{∞} |x − μ| · [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx

Put (x − μ)/σ = z ⇒ x − μ = σz ⇒ dx = σ dz.

∴ M.D. about mean = ∫_{−∞}^{∞} |σz| · [1/(σ√(2π))] e^{−(1/2)z^2} σ dz

                 = [σ/√(2π)] ∫_{−∞}^{∞} |z| e^{−(1/2)z^2} dz

Now, |z| e^{−(1/2)z^2} (the integrand) is an even function of z, as it does not change on changing z to −z. ∴ by the property
"∫_{−a}^{a} f(x) dx = 2 ∫_0^a f(x) dx if f(x) is an even function of x", we have

M.D. about mean = [2σ/√(2π)] ∫_0^∞ |z| e^{−(1/2)z^2} dz

Now, as the range of z is from 0 to ∞, i.e. z takes non-negative values, |z| = z, and hence

M.D. about mean = [2σ/√(2π)] ∫_0^∞ z e^{−(1/2)z^2} dz

Put (1/2)z^2 = t ⇒ z^2 = 2t ⇒ 2z dz = 2 dt ⇒ z dz = dt.

∴ M.D. about mean = [2σ/√(2π)] ∫_0^∞ e^{−t} dt = [2σ/√(2π)] [−e^{−t}]_0^∞ = [2σ/√(2π)] [0 − (−1)] = √(2/π) σ

In practice, instead of √(2/π), its approximate value is mostly used, and that is 4/5:

√(2/π) = √(2 × 7/22) = √(7/11) = √0.6364 ≈ 0.7977 ≈ 0.8, or 4/5 (approx.)
Let us now take up some problems based on properties of normal distribution in the next section.

13.7 SOME PROBLEMS BASED ON PROPERTIES OF NORMAL DISTRIBUTION
Example 3: If X1 and X2 are two independent variates each distributed as
N(0, 1), then write the distribution of (i) X1 + X2. (ii) X1 – X2.
Solution: We know that, if X1 and X2 are two independent normal variates such that
X1 ~ N(μ1, σ1^2) and X2 ~ N(μ2, σ2^2),
then
X1 + X2 ~ N(μ1 + μ2, σ1^2 + σ2^2), and
X1 − X2 ~ N(μ1 − μ2, σ1^2 + σ2^2)   [see Properties (xiii) and (xiv), Sec. 13.3].
Here, X1 ~ N(0, 1), X2 ~ N(0, 1).
∴ i) X1 + X2 ~ N(0 + 0, 1 + 1), i.e. X1 + X2 ~ N(0, 2), and
ii) X1 − X2 ~ N(0 − 0, 1 + 1), i.e. X1 − X2 ~ N(0, 2).
Example 4: If X ~ N(30, 25), find the mean deviation about mean.
Solution: Here μ = 30, σ^2 = 25 ⇒ σ = 5.
∴ Mean deviation about mean = √(2/π) σ = √(2/π) × 5 = 5√(2/π) ≈ 4.
Example 5: If X ~ N(0, 1), what are its first four central moments?
Solution: Here μ = 0, σ^2 = 1 ⇒ σ = 1.
∴ the first four central moments are:
μ1 = 0, μ2 = σ^2 = 1, μ3 = 0, μ4 = 3σ^4 = 3.
Example 6: If X1, X2 are independent variates such that X1 ~ N(40, 25), X2 ~ N(60, 36), then find the mean and variance of (i) X = 2X1 + 3X2, (ii) Y = 3X1 − 2X2.
Solution: Here X1 ~ N(40, 25), X2 ~ N(60, 36).
∴ Mean of X1 = E(X1) = 40, Variance of X1 = Var(X1) = 25,
Mean of X2 = E(X2) = 60, Variance of X2 = Var(X2) = 36.
Now,
(i) Mean of X = E(X) = E(2X1 + 3X2) = E(2X1) + E(3X2)
= 2E(X1) + 3E(X2) = 2 × 40 + 3 × 60 = 80 + 180 = 260
Var(X) = Var(2X1 + 3X2)
= Var(2X1) + Var(3X2)   [∵ X1 and X2 are independent]
= 4 Var(X1) + 9 Var(X2)
= 4 × 25 + 9 × 36 = 100 + 324 = 424
(ii) Mean of Y = E(Y) = E(3X1 − 2X2)
= E(3X1) + E(−2X2)
= 3E(X1) + (−2)E(X2)
= 3 × 40 − 2 × 60 = 120 − 120 = 0
Var(Y) = Var(3X1 − 2X2)
= Var(3X1) + Var(−2X2)
= (3)^2 Var(X1) + (−2)^2 Var(X2)
= 9 × 25 + 4 × 36 = 225 + 144 = 369
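A simulation check of Example 6(i) is straightforward; the sketch below is ours (NumPy assumed available): drawing large samples from N(40, 25) and N(60, 36) and forming 2X1 + 3X2 gives a sample mean near 260 and a sample variance near 424.

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(40, 5, size=500_000)   # N(40, 25): standard deviation 5
    x2 = rng.normal(60, 6, size=500_000)   # N(60, 36): standard deviation 6
    x = 2 * x1 + 3 * x2
    print(x.mean(), x.var())               # close to 260 and 424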
You can now try some exercises based on the properties of normal distribution
which you have studied in the present unit.
E3) If X1 and X2 are two independent normal variates with means 30, 40
and variances 25, 35 respectively. Find the mean and variance of
i) X1 + X2
ii) X1 – X2
E4) If X ~ N(50, 225), find its quartile deviation.
E5) If X1 and X2 are independent variates each distributed as N(50, 64), what is the distribution of (X1 + X2)/2?
E6) For a normal distribution, the first moment about 5 is 30 and the fourth
moment about 35 is 768. Find the mean and standard deviation of the
distribution.

13.8 SUMMARY
The following main points have been covered in this unit:
1) A continuous random variable X is said to follow normal distribution with parameters μ (−∞ < μ < ∞) and σ^2 (> 0) if it takes on any real value and its probability density function is given by
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},  −∞ < x < ∞
2) If X ~ N(μ, σ^2), then Z = (X − μ)/σ is the standard normal variate.
3) The curve of the normal distribution is bell-shaped and is completely symmetrical about x = μ.
4) For normal distribution, Mean = Median = Mode.
5) Q3 − Median = Median − Q1.
6) Quartile Deviation (Q.D.) = (Q3 − Q1)/2 is approximately equal to 2/3 of the standard deviation.
7) Mean deviation is approximately equal to 4/5 of the standard deviation.
8) Central moments of normal distribution are μ1 = 0, μ2 = σ^2, μ3 = 0, μ4 = 3σ^4.
9) The moment coefficient of skewness is zero and the curve is always mesokurtic.
10) A sum of independent normal variables is also a normal variable.
13.9 SOLUTIONS/ANSWERS
1 4
E 1) (i) Here we are given X ~ N  , 
2 9
 in usual notations, we have
1 4 2
  , 2  
2 9 3
Now, p.d.f. of r.v. X is given by
2
1  x  
1  
2  

f(x) = e ,   x  
 2
2
1  x 1/2 
1  
2  2/3 

= e
2

3

22
2
9  2x 1  Normal Distribution
3  
2 4 

= e ,   x  
2 2π
(ii) Here we are given X ~ N(40, 16)
 in usual notations, we have
   40, 2  16 4
Now, p.d.f. of r.v. X is given by
2
1  x  
1  
2  

f(x) = e ,   x 
σ 2π
2
1  x  (  40) 
1  
2 4


= e
4 2π
2
1  x + 40 
1  
2 4 

= e ,   x  
4 2π
E 2) (i) f(x) = [1/(2√(2π))] e^{−x^2/8},  −∞ < x < ∞
           = [1/(2√(2π))] e^{−(1/2)(x/2)^2}
           = [1/(2√(2π))] e^{−(1/2)[(x − 0)/2]^2}    ... (1)
Comparing (1) with
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},  −∞ < x < ∞,
we get
μ = 0, σ = 2
∴ Mean = μ = 0 and Variance = σ^2 = (2)^2 = 4
(ii) f(x) = [1/(2√π)] e^{−(1/4)(x + 2)^2},  −∞ < x < ∞
          = [1/(√2 · √(2π))] e^{−(1/(2×2))(x + 2)^2}
          = [1/(√2 · √(2π))] e^{−(1/2)[(x + 2)/√2]^2}    ... (1)
Comparing (1) with
f(x) = [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2},  −∞ < x < ∞,
we get
μ = −2, σ = √2
∴ Mean = μ = −2 and Variance = σ^2 = (√2)^2 = 2
E3) i) X1 + X2 ~ N(μ1 + μ2, σ1^2 + σ2^2)
⇒ X1 + X2 ~ N(30 + 40, 25 + 35)
⇒ X1 + X2 ~ N(70, 60)
ii) X1 − X2 ~ N(30 − 40, 25 + 35)
⇒ X1 − X2 ~ N(−10, 60)
E4) As σ^2 = 225,
∴ σ = 15,
and hence Q.D. = (2/3)σ = (2/3) × 15 = 10.
E5) We know that if X1, X2, ..., Xn are independent variates each distributed as N(μ, σ^2), then X̄ ~ N(μ, σ^2/n).
Here X1 and X2 are independent variates each distributed as N(50, 64),
∴ their mean X̄ = (X1 + X2)/2 ~ N(50, 64/2),
i.e. X̄ ~ N(50, 32).
E6) We know that μ1′ = x̄ − A   [see Unit 3 of MST-002],
where μ1′ is the first moment about A.
∴ 30 = x̄ − 5 ⇒ x̄ = 35 ⇒ Mean = 35.
Given that the fourth moment about 35 is 768. But the mean is 35, and hence the fourth moment about the mean = 768.
∴ μ4 = 768
⇒ 3σ^4 = 768   [∵ μ4 = 3σ^4]
⇒ σ^4 = 768/3 = 256 = (4)^4 ⇒ σ = 4.

UNIT 14 AREA PROPERTY OF NORMAL DISTRIBUTION
Structure
14.1 Introduction
Objectives

14.2 Area Property of Normal Distribution


14.3 Fitting of Normal Curve using Area Property
14.4 Summary
14.5 Solutions/Answers

14.1 INTRODUCTION
In Unit 13, you have studied normal distribution and its chief characteristics.
Some characteristics including moments, mode, median, mean deviation about
mean have been established too in Unit 13. The area property of normal
distribution has just been touched upon in the preceding unit. The area property is a very important property and has a lot of applications, and hence it needs to be studied in detail. Hence, in this unit, this property with its diversified applications has
been discussed in detail. Fitting of normal distribution to the observed data and
computation of expected frequencies have also been discussed in one of the
sections i.e. Sec. 14.3 of this unit.
Objectives
After studying this unit, you would be able to:
 describe the importance of area property of normal distribution;
 explain use of the area property to solve many practical life problems; and
 fit a normal distribution to the observed data and compute the expected
frequencies using area property.

14.2 AREA PROPERTY OF NORMAL DISTRIBUTION
Let X be a normal variate having mean μ and variance σ^2. Suppose we are interested in finding P[μ < X < x1] [see Fig. 14.1].

Now, P[μ < X < x1] = ∫_{μ}^{x1} f(x) dx = ∫_{μ}^{x1} [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx

[Fig. 14.1: P[μ < X < x1]]

Put (x − μ)/σ = z ⇒ x − μ = σz ⇒ dx = σ dz.
Also, when X = μ, Z = 0, and when X = x1, Z = (x1 − μ)/σ = z1 (say).

∴ P[μ < X < x1] = P[0 < Z < z1] = ∫_0^{z1} [1/√(2π)] e^{−(1/2)z^2} dz = ∫_0^{z1} φ(z) dz,

where φ(z) = [1/√(2π)] e^{−(1/2)z^2} is the probability density function of the standard normal variate, and the definite integral ∫_0^{z1} φ(z) dz represents the area under the standard normal curve between the ordinates at Z = 0 and Z = z1 (Fig. 14.2).

[Fig. 14.2: P[0 < Z < z1]]

You need not evaluate the integral to find the area: a table is available giving such areas for different values of z1.
Here, we have transformed the integral from ∫_{μ}^{x1} [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx to ∫_0^{z1} [1/√(2π)] e^{−(1/2)z^2} dz,
i.e. we have transformed the normal variate X into the standard normal variate (S.N.V.) Z = (X − μ)/σ.
This is because the computation of ∫_{μ}^{x1} [1/(σ√(2π))] e^{−(1/2)[(x−μ)/σ]^2} dx would require the construction of separate tables for different values of μ and σ, as the normal variate X may have any values of mean and standard deviation; hence different tables would be required for different μ and σ. So, infinitely many tables would have to be constructed, which is impossible. But the beauty of the standard normal variate is that its mean is always '0' and its standard deviation is always '1', as shown in Unit 13. So, whatever the values of the mean and standard deviation of a normal variate, the mean and standard deviation on transforming it to the standard normal variate are always '0' and '1' respectively, and hence only one table is required.
In particular,

P(μ − σ < X < μ + σ) = ∫_{μ−σ}^{μ+σ} f(x) dx   [see Fig. 14.3]

= P(−1 < Z < 1) = ∫_{−1}^{1} φ(z) dz   [∵ for Z = (X − μ)/σ, Z = −1 when X = μ − σ and Z = 1 when X = μ + σ]

= 2 ∫_0^1 φ(z) dz   [by symmetry]

= 2 × 0.34135   [from the table given in the Appendix at the end of the unit]

= 0.6827

[Fig. 14.3: Area within the Range μ ± σ]

Similarly,

P(μ − 2σ < X < μ + 2σ) = ∫_{μ−2σ}^{μ+2σ} f(x) dx   [see Fig. 14.4]

= P(−2 < Z < 2) = ∫_{−2}^{2} φ(z) dz = 2 ∫_0^2 φ(z) dz   [∵ Z = −2 when X = μ − 2σ and Z = 2 when X = μ + 2σ]

= 2 × 0.4772   [from the table given in the Appendix at the end of the unit]

= 0.9544

[Fig. 14.4: Area within the Range μ ± 2σ]

and

P(μ − 3σ < X < μ + 3σ) = P(−3 < Z < 3) = 2 P[0 < Z < 3]   [see Fig. 14.5]

= 2 × 0.49865 = 0.9973

∴ P[X lies within the range μ ± 3σ] = 0.9973
∴ P[X lies outside the range μ ± 3σ] = 1 − 0.9973 = 0.0027,

which is very small; hence we usually expect a normal variate to lie within the range μ − 3σ to μ + 3σ, though theoretically it ranges from −∞ to ∞.

[Fig. 14.5: Area within the Range μ ± 3σ]
From the above discussion, we conclude that while solving numerical problems,
we need to transform the given normal variate into standard normal variate
because tables for the area under every normal curve, being infinitely many,
cannot be made available whereas the standard normal curve is one and hence
table for area under this curve can be made available and this is given in the
Appendix at the end of this unit.
Example 1: If X ~ N(45, 16) and Z is the standard normal variate (S.N.V.), i.e. Z = (X − μ)/σ, then find the Z scores corresponding to the following values of X:
(i) X = 45 (ii) X = 53 (iii) X = 41 (iv) X = 47
Solution: We are given X ~ N(45, 16).
∴ in usual notations, we have
μ = 45, σ^2 = 16 ⇒ σ = √16 = 4   [∵ σ > 0 always]
Now Z = (X − μ)/σ = (X − 45)/4.
(i) When X = 45, Z = (45 − 45)/4 = 0/4 = 0
(ii) When X = 53, Z = (53 − 45)/4 = 8/4 = 2
(iii) When X = 41, Z = (41 − 45)/4 = −4/4 = −1
(iv) When X = 47, Z = (47 − 45)/4 = 2/4 = 0.5
Example 2: If the r.v. X is normally distributed with mean 80 and standard
deviation 5, then find
(i) P[X > 95], (ii) P[X < 72], (iii) P[60.5 < X < 90], (iv) P[85 < X < 97], and (v) P[64 < X < 76].
Solution: Here we are given that X is normally distributed with mean 80 and standard deviation (S.D.) 5,
i.e. Mean = μ = 80 and Variance = σ^2 = (S.D.)^2 = 25.
If Z is the S.N.V., then Z = (X − μ)/σ = (X − 80)/5.
Now,
(i) For X = 95, Z = (95 − 80)/5 = 15/5 = 3
∴ P[X > 95] = P[Z > 3]   [see Fig. 14.6]
= 0.5 − P[0 < Z < 3]
= 0.5 − 0.4987   [using table of areas under normal curve]
= 0.0013

[Fig. 14.6: Area to the Right of X = 95]

(ii) For X = 72, Z = (72 − 80)/5 = −8/5 = −1.6
∴ P[X < 72] = P[Z < −1.6]   [see Fig. 14.7]
= P[Z > 1.6]   [∵ normal curve is symmetrical about the line Z = 0]
= 0.5 − P[0 < Z < 1.6]
= 0.5 − 0.4452   [using table of areas under normal curve]
= 0.0548

[Fig. 14.7: Area to the Left of X = 72]

(iii) For X = 60.5, Z = (60.5 − 80)/5 = −19.5/5 = −3.9
For X = 90, Z = (90 − 80)/5 = 10/5 = 2
∴ P[60.5 < X < 90] = P[−3.9 < Z < 2]   [see Fig. 14.8]
= P[−3.9 < Z < 0] + P[0 < Z < 2]
= P[0 < Z < 3.9] + P[0 < Z < 2]   [∵ normal curve is symmetrical about the line Z = 0]
= 0.5000 + 0.4772   [using table of areas under normal curve]
= 0.9772

[Fig. 14.8: Area between X = 60.5 and X = 90]

(iv) For X = 85, Z = (85 − 80)/5 = 5/5 = 1
For X = 97, Z = (97 − 80)/5 = 17/5 = 3.4
∴ P[85 < X < 97] = P[1 < Z < 3.4]   [see Fig. 14.9]
= P[0 < Z < 3.4] − P[0 < Z < 1]
= 0.4997 − 0.3413   [using table of areas under normal curve]
= 0.1584

[Fig. 14.9: Area between X = 85 and X = 97]

(v) For X = 64, Z = (64 − 80)/5 = −16/5 = −3.2
For X = 76, Z = (76 − 80)/5 = −4/5 = −0.8
∴ P[64 < X < 76] = P[−3.2 < Z < −0.8]   [see Fig. 14.10]
= P[0.8 < Z < 3.2]   [∵ normal curve is symmetrical about the line Z = 0]
= P[0 < Z < 3.2] − P[0 < Z < 0.8]
= 0.4993 − 0.2881   [using table of areas under normal curve]
= 0.2112

[Fig. 14.10: Area between X = 64 and X = 76]
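All five probabilities can also be reproduced without the printed table by using a normal c.d.f. routine; a sketch of ours with SciPy (the values agree with those above up to table rounding):

    from scipy.stats import norm

    X = norm(loc=80, scale=5)
    print(1 - X.cdf(95))              # (i)   P[X > 95]        ~ 0.0013
    print(X.cdf(72))                  # (ii)  P[X < 72]        ~ 0.0548
    print(X.cdf(90) - X.cdf(60.5))    # (iii) P[60.5 < X < 90] ~ 0.9772
    print(X.cdf(97) - X.cdf(85))      # (iv)  P[85 < X < 97]   ~ 0.1584
    print(X.cdf(76) - X.cdf(64))      # (v)   P[64 < X < 76]   ~ 0.2112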

Example 3: In a university, the mean weight of 1000 male students is 60 kg and the standard deviation is 16 kg.
(a) Find the number of male students having their weights
i) less than 55 kg
ii) more than 70 kg
iii) between 45 kg and 65 kg
(b) What is the lowest weight of the 100 heaviest male students?
(Assume that the weights are normally distributed.)
Solution: Let X be the normal variate "the weights of the male students of the university". Here, we are given μ = 60 kg and σ = 16 kg; therefore X ~ N(60, 256).
We know that if X ~ N(μ, σ^2), then the standard normal variate is given by Z = (X − μ)/σ.
Hence, for the given information, Z = (X − 60)/16.
(a) i) For X = 55, Z = (55 − 60)/16 = −0.3125 ≈ −0.31.
Therefore,
P[X < 55] = P[Z < −0.31] = P[Z > 0.31]   [see Fig. 14.11]
= 0.5 − P[0 < Z < 0.31]   [∵ the area on each side of Z = 0 is 0.5]
= 0.5 − 0.1217   [using table of areas under normal curve]
= 0.3783

[Fig. 14.11: Area Representing Students having Less than 55 kg Weight]

∴ Number of male students having weight less than 55 kg = N × P[X < 55]
= 1000 × 0.3783 = 378
ii) For X = 70, Z = (70 − 60)/16 = 0.625 ≈ 0.63
∴ P[X > 70] = P[Z > 0.63]   [see Fig. 14.12]
= 0.5 − P[0 < Z < 0.63]   [∵ the area on each side of Z = 0 is 0.5]
= 0.5 − 0.2357   [using table of areas under normal curve]
= 0.2643

[Fig. 14.12: Area Representing Students having More than 70 kg Weight]

∴ Number of male students having weight more than 70 kg = N × P[X > 70]
= 1000 × 0.2643 = 264
iii) For X = 45, Z = (45 − 60)/16 = −0.9375 ≈ −0.94
For X = 65, Z = (65 − 60)/16 = 0.3125 ≈ 0.31
P[45 < X < 65] = P[−0.94 < Z < 0.31]   [see Fig. 14.13]
= P[−0.94 < Z < 0] + P[0 < Z < 0.31]
= P[0 < Z < 0.94] + P[0 < Z < 0.31]
= 0.3264 + 0.1217 = 0.4481

[Fig. 14.13: Area Representing Students having Weight between 45 kg and 65 kg]

∴ Number of male students having weight between 45 kg and 65 kg
= 1000 × P[45 < X < 65]
= 1000 × 0.4481 = 448
(b) Let x1 be the lowest weight amongst the 100 heaviest students.
Now, for X = x1, Z = (x1 − 60)/16 = z1 (say).
P[X ≥ x1] = 100/1000 = 0.1   [see Fig. 14.14]
⇒ P[Z ≥ z1] = 0.1
⇒ P[0 < Z < z1] = 0.5 − 0.1 = 0.4
⇒ z1 = 1.28   [from table]
⇒ x1 = 60 + 16 × 1.28 = 60 + 20.48 = 80.48.
Therefore, the lowest weight of the 100 heaviest male students is 80.48 kg.

[Fig. 14.14: Area Representing the 100 Heaviest Male Students]

Example 4: In a normal distribution, 10% of the items are over 125 and 35% are under 60. Find the mean and standard deviation of the distribution.
Solution:

[Fig. 14.15: Area Representing the Items under 60 and over 125]

Let X ~ N(μ, σ^2), where μ and σ^2 are unknown and are to be obtained.
Here we are given
P[X > 125] = 0.1 and P[X < 60] = 0.35.   [see Fig. 14.15]
We know that if X ~ N(μ, σ^2), then Z = (X − μ)/σ.
For X = 60, Z = (60 − μ)/σ = −z1 (say)    ... (1)
[the −ve sign is taken because P[X < 60] = 0.35 < 0.5, so 60 lies below the mean and the corresponding value of Z is negative]
For X = 125, Z = (125 − μ)/σ = z2 (say)   ... (2)
Now P[X < 60] = P[Z < −z1] = 0.35
⇒ P[Z > z1] = 0.35   [by symmetry of normal curve]
⇒ 0.5 − P[0 ≤ Z ≤ z1] = 0.35
⇒ P[0 ≤ Z ≤ z1] = 0.15
⇒ z1 = 0.39   [from the table of areas under normal curve]
and P[X > 125] = P[Z > z2] = 0.10
⇒ 0.5 − P[0 ≤ Z ≤ z2] = 0.10
⇒ P[0 ≤ Z ≤ z2] = 0.40
⇒ z2 = 1.28   [from the table]
Putting the values of z1 and z2 in Equations (1) and (2), we get
(60 − μ)/σ = −0.39    ... (3)
(125 − μ)/σ = 1.28    ... (4)
(4) − (3) gives
(125 − μ)/σ − (60 − μ)/σ = 1.28 + 0.39
⇒ 65/σ = 1.67 ⇒ σ = 65/1.67 = 38.92
From Eq. (4), μ = 125 − 1.28σ = 125 − 1.28 × 38.92 = 75.18
Hence μ = mean = 75.18; σ = S.D. = 38.92.
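The same two-equation calculation can be done with an inverse normal c.d.f. instead of reading the table backwards; below is a sketch of ours using SciPy's norm.ppf (the small difference from 75.18/38.92 comes only from the table's two-decimal rounding of z1 and z2):

    from scipy.stats import norm

    # P[X < 60] = 0.35 and P[X > 125] = 0.10 give two linear equations in mu, sigma
    z1 = norm.ppf(0.35)          # about -0.385
    z2 = norm.ppf(0.90)          # about  1.282
    sigma = (125 - 60) / (z2 - z1)
    mu = 60 - z1 * sigma
    print(mu, sigma)             # about 75.0 and 39.0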
Example 5: Find the quartile deviation of the normal distribution having mean μ and variance σ^2.
Solution: Let X ~ N(μ, σ^2), and let Q1 and Q3 be the first and third quartiles. Now, as Q1, Q2 and Q3 divide the distribution into four equal parts, the areas under the normal curve to the left of Q1, between Q1 and Q2 (the median), between Q2 and Q3, and to the right of Q3 are all equal to 25 percent of the total area. This is shown in Fig. 14.16.

[Fig. 14.16: Area to the Left of X = Q1 and to the Right of X = Q3]

i.e. here we have
P[X < Q1] = 0.25, P[Q1 < X < μ] = 0.25, P[μ < X < Q3] = 0.25 and P[X > Q3] = 0.25.   [see Fig. 14.16]
Now, when X = Q1, Z = (Q1 − μ)/σ = −z1 (say).
This value of Z corresponds to Q1, which lies to the left of the mean; the mean of Z is zero, and a value to its left is negative. Thus, a negative value of Z has been taken here.
∴ Q1 − μ = −σz1 ⇒ Q1 = μ − σz1,
and when X = Q3, Z = (Q3 − μ)/σ = z1.
Due to the symmetry of the normal curve, the values of Z corresponding to Q1 and Q3 are equal in magnitude, because Q1 and Q3 are equidistant from the mean.
∴ Q3 − μ = σz1 ⇒ Q3 = μ + σz1.
Now, as P[μ < X < Q3] = 0.25, therefore
P[0 < Z < z1] = 0.25
⇒ z1 = 0.67   [from normal tables]
Now, Q.D. = (Q3 − Q1)/2 = [(μ + σz1) − (μ − σz1)]/2 = σz1 = σ(0.67), i.e. approximately (2/3)σ.
Now, we are sure that you can try the following exercises:

E1) If X ~ N(150, 9) and Z is a S.N.V., i.e. Z = (X − μ)/σ, then find the Z scores corresponding to the following values of X:
(i) X = 165 (ii) X = 120
E2) Suppose X ~ N(25, 4); then find
(i) P[X < 22], (ii) P[X > 23], (iii) P[|X − 24| < 3], and (iv) P[|X − 21| > 2]
E3) Suppose X ~ N(30, 16); then find λ in each case:
(i) P[X > λ] = 0.2492
(ii) P[X < λ] = 0.0496
E4) Let the random variable X denote the chest measurements (in cm) of
2000 boys, where X ~ N(85, 36).
a) Find the number of boys having chest measurements
i) less than or equal to 87 cm,
ii) between 86 cm and 90 cm,
iii) more than 80 cm.

b) What is the lowest value of the chest measurement among the 100
boys having the largest chest measurements?
E5) In a particular branch of a bank, it is noted that the duration/waiting time
of the customers for being served by the teller is normally distributed
with mean 5.5 minutes and standard deviation 0.6 minutes. Find the
probability that a customer has to wait
a) between 4.2 and 4.5 minutes, (b) for less than 5.2 minutes, and (c)
more than 6.8 minutes
E6) Suppose that temperature of a particular city in the month of March is
normally distributed with mean 24  C and standard deviation 6  C . Find
the probability that temperature of the city on a day of the month of
March is
(a) less than 20  C (b) more than 26  C (c) between 23  C and 27  C

14.3 FITTING OF NORMAL CURVE USING AREA


PROPERTY
To fit a normal curve to the observed data we first find the mean and variance
from the given data. Mean and variance so obtained are  and  respectively.
Substituting these values of  and 2 in the probability function
2
1  x  
1   
f x  e 2  
, we get the normal curve fitted to the given data.
 2
Now, the expected frequencies can be computed using either of the following
two methods:
1. Area method
2. Method of ordinates
But, here we only deal with the area method. Process of finding the expected
frequencies by area method is described in the following steps:
(i) Write the lower limits of each of the given class intervals.

37
Continuous Probability
Distributions X 
(ii) Find the standard normal variate Z  corresponding to each lower

limit. Suppose the values of the standard normal variate are obtained as z1,
z2, z3, …
(iii) Find P[Z  z1], P[Z  z2], P[Z  z3],…i.e. the areas under the normal curve
to the left of ordinate at each value of Z obtained in step (ii). Using table
given in the Appendix at the end of the unit Z = zi may be to the right or left
of Z = 0.
If Z = zi is to the right of Z = 0 as shown in the following figure:

Fig. 14.17: Area to the Left of Z = z i , when zi is to the Right of Z = 0

Then, P[Z  zi] is obtained as


P[Z  zi] = 0.5 + P[0  Z  zi]
But, if Z = zi is to the left of Z = 0 (this is the case when zi is negative) as
shown in the following figure:

Fig. 14.18: Area to the Left of Z = z i , when z i is to the left of Z = 0

Then
P[Z  zi] = 0.5 – P[zi  Z  0]
= 0.5 – P[0  Z  – zi] [Due to symmetry]
e.g. zi =  2 (say),
Then P[Z  – 2] = 0.5 – P[–2  Z  0]
= 0.5 – P[0  Z  – (–2)]
= 0.5 – P[0  Z  2]
(iv) Obtain the areas for the successive class intervals on subtracting the area
corresponding to every lower limit from the area corresponding to the
succeeding lower limit.
38
e.g. suppose 10, 20, 30 are three successive lower limits. Area Property of
Normal Distribution
Then areas corresponding to these limits are
P[X  10], P[X  20], P[X  30] respectively.
Now the difference P[X  30] – P[X  20] gives the area corresponding to
the interval 20-30.
(v) Finally, multiply the differences obtained in step (iv) i.e. areas
corresponding to the intervals by N (the sum of the observed frequencies),
we get the expected frequencies.

Above procedure is explained through the following example.


Example 6: Fit a normal curve by area method to the following data and find the
expected frequencies.

X f
0-10 3
10-20 5
20-30 8
30-40 3
40-50 1

Solution: First we are to find the mean and variance of the given frequency
distribution. This you can obtain yourself as you did in Unit 2 of MST-002 and
at many other stages. So, this is left an exercise for you.
You will get the mean and variance as
 = 22 and 2 = 111 respectively
  = 10.54
Hence, the equation of the normal curve is
2
1  x  
1   
f x  e 2  

 2
2
1  x  22 
1   
= e 2  10.54  ,   x  
10.54  2

39
Continuous Probability
Distributions
Expected frequencies are computed as follows:

Class Lower Standard Area under Difference Expected


Interval Limit Normal normal curve between frequencies
Variate to the left of successive =20  col.V
X  Z areas
Z
 P X  x 
X  22  P  Z  z

10.54
Below 0  – P[Z < – ] 0.0183 – 0 0.366 0
= 0.0183
=0
0-10 0 –2.09 P[Z  –2.09] 0.1271 – 0.0183 2.176 2
= 0.1088
= 0.0183
10-20 10 –1.14 P[z  – 1.14] 0.4241 – 0.1271 5.94 6
= 0.2970
= 0.1271
20-30 20 –0.19 P[Z  –0.19] 0.7764 – 0.4241 7.05 7
= 0.3523
= 0.4241
30-40 30 0.76 P[Z  0.76] 0.9564 – 0.7764 3.6 4
= 0.1800
= 0.9564
40-50 40 1.71 P[Z  1.71] 0.9961 – 0.9564 0.79 1
= 0.0397
= 0.9564
50 and 50 2.66 P[Z  2.66] _ _
above
= 0.9961

The areas under the normal curve shown in the fourth column of the above tables
are obtained as follows:
 there is no value 
P[Z < –] = 0  to the left of   
 
P[Z  – 2.09] = 0.5 – P[–2.09  Z  0] [See Fig. 14.19]
= 0.5 – P[0  Z  2.09] [Due to symmetry]
 From table given at 
= 0.5 – 0.4817  the end of the unit 
 
= 0.0183

40
Area Property of
Normal Distribution

Fig. 14.19: Area to the Left of Z = –2.09


Similarly,
P[Z  – 1.14] = 0.5 – 0.3729 = 0.1271
P[Z  – 0.19] = 0.5 – 0.0759 = 0.4241
Now, P[Z  0.76] = 0.5 + P[0  Z  0.76] [See Fig. 14.20]
= 0.5 + 0.2764
= 0.7764

Fig. 14.20: Area to the Left of Z = 0.76

Similarly
P[Z  1.71] = 0.5 + 0.4564 = 0.9564
P[Z  2.66] = 0.5 = 0.4961 = 0.9961
You can now try the following exercises:
E7) Fit a normal curve to the following distribution and find the expected
frequencies by area method.

X 60– 65 65-70 70-75 75-80 80-85


5 8 12 8 7

E8) The following table gives the frequencies of occurrence of a variate X


between certain limits. The distribution is normal. Find the mean and S.D.
of X.

X Less than 40 40-50 50 and more


f 30 33 37

41
Continuous Probability
Distributions

14.4 SUMMARY
The main points covered in this unit are:
1) Area property and its various applications has been discussed in detail.
2) Quartile deviation has also been obtained using the area property in an
example.
3) Fitting of normal distribution using area property and computation of
expected frequencies using area method have been explained.

14.5 SOLUTIONS/ANSWERS
E1) We are given X ~ N(150, 9)
 in usual notations, we have
  150,  2  9 3

X   X  150
Now, Z  
 3
165  150 15
(i) When X = 165, Z=  5
3 3
120  150 30
(ii) When X = 120, Z=   10
3 3
E2) Here X ~ N(25, 4)
 in usual notations, we have
Mean =   25, var iance   2  4    2
X   X  25
If Z is the S.N.V then Z  
 2
22  25 3
i) X = 22, Z   1.5
2 2
P[X  22]  P[Z  1.5] [See Fig. 14.21]
 due to symmetry of 
 P[Z  1.5]  normal curve 
 
 0.5  P[0  Z  1.5]

 Using table area 


= 0.5  0.4332  under normal curve 
 
= 0.0668

42
Area Property of
Normal Distribution

Fig. 14.21: Area to the Left of X = 22

23  25 2
ii) X = 23, Z    1
2 2
P[X  23]  P[Z  1] [See Fig.14.22]

 due tosymmetry of 
= P[Z  1]  normal curve 
 
= 0.5  P[0  Z  1]

 Using table area 


= 0.5 + 0.3413  under normal curve 
 
= 0.8413

Fig. 14.22: Area to the Right of X = 23

 x  a  b 
iii) P[| X  24 | 3]  P[ 3  X  24  3]  
  b  x  a  b 
= P[  3  24  X  3  24)
= P[21<X  27]
21  25 4
X = 21, Z    2
2 2
27  25 2
X = 27, Z   1
2 2
 P[| X  24 | 3]  P[21  X  27] See Fig.14.23

 P[  2  Z  1]
43
Continuous Probability
Distributions
= P[–2< Z< 0] + P[0 < Z < 1]
= P[0  Z  2]  P[0  Z  1]
= 0.4772 – 0.3413 = 0.1359

Fig. 14.23: Area between X = 21 and X = 27

iv) P[|X  21| 2]  P[X  21  2 or   X  21  2]

 x  a  b    x  a   b 
 
 x  a  b or   x  a   b 
= P[X  23or  X  2  21]

 y   a 
= P[X  23or X  19]  y  a 
 
19  25 6
For X=19, Z    3
2 2
23  25 2
For X=23, Z    1
2 2
 P[| X  21| 2]  P[X  23or X  19] See Fig14.24 
= P[Z  1or Z  3]

 By addition theorem for 


= P[Z  1]  P[Z  3]  mutually exclusive events 
 
 1  P[ 3  Z  1]
 1  P[1  Z  3]
= 1– [P[0  Z  3]  P[0  Z  1]]

= 1– [0.4987 – 0.3413]  From table 


= 1– 0.1574 = 0.8426.

44
Area Property of
Normal Distribution

Fig. 14.24: Area between X = 19 and X = 23

E3) Here X ~ N(30, 16)


 in usual notations, we have
Mean =   30, variance  2  16    4
X   X  30
If Z is S.N.V then Z  
 4
  30
i) X  , Z   z1 (say) ...(1)
4
Now P[X   ]  0.2492 See Fig.14.25
 P[Z  z1 ]  0.2492  0.5  P 0  Z  z1   0.2492

 P  0  Z  z1   0.2508

 z1  0.67 [From the table]

Putting z1  0.67 in (1), we get

  30
= 0.67
4
  30  2.68
  30  2.68= 32.68

Fig. 14.25: z1 Corresponding to 24.92 % Area to its Right

  30
ii) For X  , Z    z 2 (say) … (2)
4
Now P[X   ]  0.0496

 P[Z   z 2 ]  0.0496 [See Fig.14.26]

 P[Z  z 2 ]  0.0496 [Due to symmetry]


45
Continuous Probability
Distributions

Fig. 14.26: z2 Corresponding to 4.96 % Area to its Right

 0.5  P[0  Z  z 2 ]  0.0496

 P[0  Z  z 2 ]  0.5  0.0496  0.4504

 z 2  1.65 [From the table]

Putting  z 2  1.65 in (2), we get

  30
 1.65
4
   30  1.65  4
   30  6.60
   30  6.6 = 23.4
E4) We are given X ~ N(85, 36), N = 2000
i.e.   85cm, 2  36cm, N  2000
x
If X ~ N(µ, σ2) and Z  then we know that Z ~ N(0, 1)

87  85 2
a) i) For X = 87, Z   0.33
6 6
Now P[X < 87] = P [Z < 0.33] [See Fig. 14.27]
= 0.5 + P [0 < Z < 0.33]

Fig. 14.27: Area to the Left of X = 87 or Z = 0.33

= 0.5 + P [0 < Z < 0.33]


 From the table of areas 
= 0.5 + 0.1293  under normal curve 
 
= 0.6293
46
Therefore, number of boys having chests measurement  87 Area Property of
Normal Distribution
 N.P[X  87]
= 2000  0.6293 = 1259
86  85 1
ii) For X = 86, Z   0.17
6 6
90  85 5
For X = 90, Z   0.83
6 6

Fig. 14.28: Area between X= 86 and X= 90

 P 86  X  90  P 0.17  Z  0.83 See Fig.14.28


= P  0  Z  0.83  P  0  Z  0.17 
 From the table of areas 
= 0.2967 – 0.0675  under normal curve 
 
= 0.2292
 number of boys having chests measurement between 86 cm
and 90 cm
= N. P [86  x  90 ]
= 2000  0.2292 = 458
80  85 5
iii) For X = 80, Z   0.83
6 6
P [X > 80] = P [Z > – 0.83] [See Fig. 14.29]
= P [Z < 0.83]
= 0.5 + P [0 < Z < 0.83]
 From the table of areas 
= 0.5 + 0.2967  under normal curve 
 
= 0.7967
 number of boys having chest measurement more than 80 cm
= N.P[X > 80]
= 2000  0.7967
= 1593

47
Continuous Probability
Distributions

Fig. 14.29: Area to the Right of X = 80 or Z =  0.83

b) Let x1 be the lowest chest measurement amongst 100 boys having the
largest chest measurements.
x  85
Now, for X x1 , Z  1  z1 (say) .
6
100
P[X  x1 ]   0.05
2000
 P  Z  z1   0.05 See Fig.14.30 

Fig. 14.30: Area Representing the 100 Boys having Largest Chest Measurements

 P[0  Z  z1 ]  0.5  0.05  0.45

 z1 =1.64 [From Table]

 x1 = 85 + 6  1.64  85  9.84 = 94.84.

Therefore, the lowest value of the chest measurement among the 100
boys having the largest chest measurement is 94.84 cm.
E5) We are given
  5.5 minutes,  = 0.6 minutes
X
If X ~ N(,  2 ) and Z  then we know that Z ~ N (0, 1)

4.2  5.5 1.3 13
a) For X = 4.2, Z    2.17
0.6 0.6 6
4.5  5.5 1.0 10 5
For X = 4.5, Z     1.67
0.6 0.6 6 3

48
Area Property of
Normal Distribution

Fig. 14.31: Area Representing Probability of Waiting Time between 4.2 and 4.5 Minutes

P[4.2  x  4.5]  P  2.17  Z  1.67  See Fig.14.31


= P [1.67 < Z < 2.17]
= P [0 < Z < 2.17] – P [0 < Z < 1.67]
= 0.4850 – 0.4525
= 0.0325
Therefore, probability that customer has to wait between 4.2 min and
4.5 min = 0.0325
5.2  5.5 0.3 3 1
b) For X  5.2, Z      0.5
0.6 0.6 6 2

Fig. 14.32: Area Representing Probability of Waiting Time Less than 5.2 Minutes

P[X < 5.2] = P [Z < –0.5] [See Fig 14.32]


= P [Z > 0.5]
= 0.5 – P [0 < Z < 0.5]
 From the table of areas 
= 0.5 – 0.1915  under normal curve 
 
= 0.3085
Therefore, probability that customer has to work for less than 5.2 min
= 0.3085
6.8  5.5 1.3 13
c) For X  6.8, Z    2.17
0.6 0.6 6

49
Continuous Probability
Distributions

Fig. 14.33: Area Representing Probability of Waiting Time Greater than 6.8 Minutes

P[X > 6.8] = P [Z > 2.17] [See Fig 14.33]


= 0.5 – P [0 < Z < 2.17]
= 0.5 – 0.4850 = 0.0150
Therefore, probability that customer has to wait for more than
6.8 min = 0.0150
E6) Let the random variable X denotes the temperature of the city in the
month of March. Then we are given
X ~ N(, 2 ), where   24  C,   6  C

X 
We know that if X ~ N(, 2 ), and Z  then Z ~ N (0, 1)

20  24 4 2
a) For X = 20, Z    0.67
6 6 3


Fig. 14.34: Area Representing Probability of Temperature Less than 20 C

P[X < 20] = P[Z < –0.67] [See Fig. 14.34]


= P [Z > 0.67]
= 0.5 – P[0 < Z < 0.67]
= 0.5 – 0.2486 = 0.2514
Therefore, probability that temperature of the city is less than 20  C is
0.2514
26  24 2 1
b) For X= 26, Z    0.33
6 6 3

50
Area Property of
Normal Distribution


Fig. 14.35: Area Representing Probability of Temperature Greater than 26 C
Since, P[X > 26] = P [Z > 0.33] [See Fig. 14.35]
= 0.5 – P[0 < Z < 0.33]
 From the table of areas 
= 0.5 – 0.1293  
 under normal curve 
= 0.3707
Therefore, probability that temperature of the city is more than
26  C is 0.3707
23  24 1
c) For X= 23, Z   0.17
6 6
27  24 3 1
For X= 27, Z     0.5
6 6 2

Fig. 14.36: Area Representing Probability of Temperature


 
between 23 C and 27 C

P[23 < X < 27] = P[–0.17 < Z < 0.5] [See Fig. 14.36]
= P [–0.17 < Z < 0] + P [0 < Z < 0.5]
= P[0 < Z < 0.17] + P[0 < Z < 0.5]
 From the table of areas 
= 0.0675 + 0.1915  under Normal Curve 
 
= 0.2590
Therefore, probability that temperature of the city is between
23  C and 27  C is 0.2590

51
Continuous Probability
Distributions E7) Mean ()  73, variance( 2 )  39.75
and hence S.D. ( )  6.3
 The equation of the normal curve fitted to the given data is
2
1  x  73 
1   
f(x)= e 2 6.3 
,   x  
(6.3) 2
Using area method,
The expected frequencies are obtained as follows:
Class Lower X Area under Difference Expected
interval limit Z= normal curve between frequency

X X  73 to the left of successive areas 40  col. V
 z
6.3
Below   0 0.0197 – 0 0.8 1
60
= 0.0197

60 – 65 60 –2.06 0.5 – 0.4803 0.1020 – 0.0197 3.3 3


= 0.0197
= 0.0823

65 – 70 65 –1.27 0.5 – 0.3980 0.3156 – 0.1020 8.5 9


= 0.1020
= 0.2136

70 – 75 70 –0.48 0.5 – 0.1844 0.6255 – 0.3156 12.4 12


= 0.3156
= 0.3099

75 – 80 75 + 0.32 0.5 + 0.1255 0.8655 – 0.6255 9.6 10


= 0.6255
= 0.2400

80 – 85 80 1.11 0.5 + 0.3655 0.9713 – 0.8655 4.2 4


= 0.8655
= 1.1058

85 and 85 1.90 0.5 + 0.4713


above = 0.9713

30
E8) P[X  40]   0.3,
100
33
P[40  X  50]   0.33, and
100

52
37 Area Property of
P[X  50]   0.37, Normal Distribution
100
Now, Let X ~ N(,  2 ),
 Standard normal variate is
X 
Z=

 It is taken as  ve as area to 
40  
When X = 40, Z   z1 , (say)  the left of this value is 30% 

as probabilityis 0.3 

 It is taken as +ve as 
50   area to the right of this 
When X = 50, Z =  z2 (say)  

 valueis given as37% 

Now,
P[X  40]  P[Z  z1 ]  0.3

 0.5  P[  z1  Z  0]  0.3 [See Fig14.37]

 0.5  P[0  Z  z1 ]  0.3 [Due to symmetry]

 P[0  Z  z1 ]  0.2
From table at the end of this unit,

Fig. 14.37: -z1 Corresponding to the 30% Area to its Left

The value of Z corresponding to probability/area is

 As values of Z are 0.52 and 0.53 


z1  0.525 corresponding to the proabability 
 
0.1985 

P[X  50]  0.37

 P[Z  z 2 ]  0.37 See Fig.14.38


 0.5  P[0  Z  z 2 ]  0.37

 P[0  Z  z 2 ]  0.13

z 2 =0.33(approx.) [From the table]

53
Continuous Probability
Distributions

Fig. 14.38: z2 Corresponding to the 37% Area to its Right

40   50  
  0.525 and  0.33
 
 40    0.525 and 50    0.33
Solving these equations for  and , we have
  11.7 and   46.14

54
APPENDIX Area Property of
Normal Distribution
AREAS UNDER NORMAL CURVE
The standard normal probability curve is given by
1  1 
(z)= exp   z 2  ,   z  
2  2 
The following table gives probability corresponding to the shaded area as shown
in the following figure i.e. P[0  Z  z] for different values of z

TABLE OF AREAS

z 0 1 2 3 4 5 6 7 8 9

0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0759

0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879

0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549
0.7 .2580 .2611 .2642 .2673 .2703 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2005 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389

1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3655 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3820
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 4279 .4292 .4306 .4319

55
Continuous Probability
Distributions

1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767

2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4678 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936

2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4959 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .1960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4879 .4980 .4981

2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986

3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990

3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4493 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998

3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000

56
Continuous Uniform and
UNIT 15 CONTINUOUS UNIFORM AND Exponential Distributions
EXPONENTIAL DISTRIBUTIONS
Structure
15.1 Introduction
Objectives

15.2 Continuous Uniform Distribution


15.3 Exponential Distribution
15.4 Summary
15.5 Solutions/Answers

15.1 INTRODUCTION
In Units 13 and 14, you have studied normal distribution with its various
properties and applications. Continuing our study on continuous distributions,
we, in this unit, discuss continuous uniform and exponential distributions. It
may be seen that discrete uniform and geometric distributions studied in Unit
11 and Unit 12 are the discrete analogs of continuous uniform and exponential
distributions. Like geometric distribution, exponential distribution also has the
memoryless property. You have also studied that geometric distribution is the
only discrete distribution which has the memoryless property. This feature is
also there in exponential distribution and it is the only continuous distribution
having the memoryless property.
The present unit discusses continuous uniform distribution in Sec. 15.2 and
exponential distribution in Sec. 15.3.
Objectives
After studing the unit, you would be able to:
 define continuous uniform and exponential distributions;
 state the properties of these distributions;
 explain the memoryless property of exponential distribution; and
 solve various problems on the situations related to these distributions.

15.2 CONTINUOUS UNIFORM DISTRIBUTION


The uniform (or rectangular) distribution is a very simple distribution. It
provides a useful model for a few random phenomena like having random
number from the interval [0, 1], then one is thinking of the value of a
uniformly distributed random variable over the interval [0, 1].
Definition: A random variable X is said to follow a continuous uniform
(rectangular) distribution over an interval (a, b) if its probability density
function is given by

 1
 for a  x  b
f x  b  a
 0, otherwise

57
Continuous Probability The distribution is called uniform distribution since it assumes a constant
Distributions
(uniform) value for all x in (a, b). If we draw the graph of y = f(x) over x-axis
and between the ordinates x = a and x = b (say), it describes a rectangle as
shown in Fig. 15.1

1
ba

X
a b
Fig. 15.1: Graph of uniform function

A uniform variate X on the interval (a, b) is written as X ~ U[a, b]


Cumulative Distribution Function
The cumulative distribution function of the uniform random variate over the
interval (a, b) is given by:
x
For x  a , F  x   P  X  x    0dx  0


For a < x < b,


x x
1 1 x a
F  x   P  X  x    f  x  dx   dx   x ax  .
a a
ba ba ba

For x  b
x
F  x   P X  x   f  x dx


a b 
  f  x dx   f  x dx   f  x dx
 a b

a b 
1
   0 dx  
 a
ba
dx    0 dx
b

1 b ba
=0+  x a  0 =  1.
ba ba
So,
 0 for x  a
x  a

Fx    for a  x  b
b  a
 1 for x  b

58
On plotting its graph, we have Continuous Uniform and
Exponential Distributions
Fx 

a b
Fig. 15.2: Graph of distribution function

Mean and Variance of Uniform Distribution


Mean = 1st order moment about origin 1'  
b b
1
=  x.f  x  dx =  x.
a a
ba
dx

b
1  x2  1  b2 a 2 
=      
b  a  2 a b  a  2 2 

=
 b  a  b  a  
ab
2b  a  2

Second order moment about origin   '2 


b
2
=  x f  x  dx
a

b b
1 1  x3  1  b3 a 3 
=  x2. dx =   =   
a
ba b  a  3 a ba  3 3 

b3  a 3
=
3 b  a 

 b  a   b2  ab  a 2 
=  x 3  y 3   x  y   x 2  xy  y 2  
3 b  a   

a 2  ab  b 2

3
2
2 a 2  ab  b 2  a  b 
2
 Variance of X = E(X ) – [E(X)] =  
3  2 
2


 
4 a 2  ab  b 2  3  a  b 
12
59
Continuous Probability
Distributions 4a 2  4ab  4b 2  3a 2  3b 2  6ab

12
2
b2  a 2  2ab  b  a 
  .
12 12
2

So, Mean =
ab
and Variance =
b  a  .
2 12
Let us now take up some examples on continuous uniform distribution.
Example 1: If X is uniformly distributed with mean 2 and variance 12, find
P[X < 3].
Solution: Let X  U [a, b]
 probability density function of X is
1
f x  , a  x  b.
ba
Now as Mean = 2
ab
 2
2
 a+b=4 … (1)
Variance = 12
2


b  a   12
12
2
  b  a   144

 b – a =  12

b  a  12, being negative is 


 b – a = 12 ... (2)  rejected as b should be greater than a 
 
 b  a should be positive 

Adding (1) and (2), we have


2b = 16
 b = 8 and hence a = 4 – 8 = – 4
1 1 1
 f x    for – 4 < x < 8.
b  a 8   4  12
3 3
1 1 1 3
Thus, the desired probability  P  X  3  4 12 dx  12 41dx = 12  x 4
1 7
 3   4   = .
12 12
Example 2: Calculate the coefficient of variation for the rectangular
distribution in (0, 12).
60
Solution: Here a = 0, b = 12. Continuous Uniform and
Exponential Distributions
a  b 0  12
 Mean =   6,
2 2
2 2

Variance =
b  a  
12  0  
144
 12.
12 12 12
 S.D. = 12
Thus, the coefficient of variation
S.D.
= 100 [Also see Unit 2 of MST-002]
Mean

12
 100 = 57.74%
6
Example 3: Metro trains are scheduled every 5 minutes at a certain station. A
person comes to the station at a random time. Let the random variable X count
the number of minutes he/she has to wait for the next train. Assume X has a
uniform distribution over the interval (0, 5). Find the probability that he/she
has to wait at least 3 minutes for the train.
Solution: As X follows uniform distribution over the interval (0, 5),
 probability density function of X is
1 1 1
f x    , 0 x5
ba 50 5
Thus, the desired probability
5 5 5
1 1
P  X  3   f  x  dx   dx   1 dx
3 3
5 53

1 5 1 2
  x 3   5  3   0.4
5 5 5

Now, you can try the following exercises.


E1) Suppose that X is uniformly distributed over (–a, a). Determine ‘a’ so that
1
i) P  X  4 
3
3
ii) P  X  1 
4
iii) P  X  2   P  X  2 

E2) A random variable X has a uniform distribution over (–2, 2). Find k for
1
which P[X > k] = .
2

61
Continuous Probability Now, let us discuss exponential distribution in the next section.
Distributions

15.3 EXPONENTIAL DISTRIBUTION


The exponential distribution finds applications in the situations related to
lifetime of an equipment or service time at the counter in a queue. So, the
exponential distribution serves as a good model whenever there is a waiting
time involved for a specific event to occur e.g. waiting time for a failure to
occur in a machine. The exponential distribution is defined as follows:
Definition: A random variable X is said to follow exponential distribution
with parameter  > 0, if it takes any non-negative real value and its probability
density function is given by
e x for x  0
f x  
0, elsewhere

Its cumulative distribution function (c.d.f.) is thus given by


x x
F  x   P  X  x    f  x  dx   e x dx
0 0

x
 ex  x x


 x
  1 e  0   e  e
0
 
 0

 
  e x  1  1  e x .

1  e x for x  0
So, F  x    .
0, elsewhere

Mean and Variance of Exponential Distribution


 
Mean = E  X    x f  x  dx   x  ex dx
0 0


   x ex dx
0


 ex 

ex 
    x   1 dx  [Integrating by parts]
    0 0  

In case of integration of product of two different types of functions, we do


integration by parts i.e. the following formula is applied:

  First function Second fuction  dx


= (First function as it is) (Integral of second)
   Differentiation of first  Integral of second  dx

62
 Continuous Uniform and
 1  ex  
 Mean    0  0     Exponential Distributions
     0 

 1   1  1
    2  0  1     2   .
     
 
Now, E  X 2    x 2f  x  dx   x 2  e x  dx
0 0


   x 2e x dx
0

 ex   ex 
=   x 2   2x  dx  [Integrating by parts]
 0 0

  


 2 
   0  0    x e x dx 
 0 
 
2 2

 0 0

  x e x dx =  x ex dx 
2
 E X

21
 [E(X) is mean and has already been obtained]

2

2
2
2 1 2 1 1
Thus, Variance = E(X2) – [E(X)] 2 = 2
   2  2  2
    
1 1
So, Mean = and Variance = 2 .
 
1 1 Mean
Remark 1: Variance = 2
   Mean =  Variance
 . 
So,
Value of  Implies
<1 Mean < Variance
=1 Mean = Variance
>1 Mean > Variance

Hence, for exponential distribution,


Mean > or = or < Variance according to whether  > or = or < 1.

63
Continuous Probability Memoryless Property of Exponential Distribution
Distributions
Now, let us discuss a very important property of exponential distribution and
that is the memoryless (or forgetfulness) property. Like geometric distribution
in the family of discrete distributions, exponential distribution is the only
distribution in the family of continuous distributions which has memoryless
property. The memorless property of exponential distribution is stated as:
If X has an exponential distribution, then for every constant a  0, one has
P[X  x + a  X  a] = P  X  x  for all x i.e. the conditional probability of
waiting up to the time ' x  a ' given that it exceeds ‘a’ is same as the
probability of waiting up to the time ‘ x ’. To make you understand the above
concept clearly let us take the following example: Suppose you purchase a TV
set, assuming that its life time follows exponential distribution, for which the
expected life time has been told to you 10 years (say). Now, if you use this TV
set for say 4 years and then you ask a TV mechanic, without informing him/her
that you had purchased it 4 years ago, regarding its expected life time. He/she,
if finds the TV set as good as new, will say that its expected life time is 10
years.
So, here, in the above example, 4 years period has been forgotten, in a way,
and for this example:
P[life time up to 10 years]
= P[life time up to 14 years | life time exceeds 4 years]
i.e. P[X  10] = P [X  14 X  4]
or P[X  10] = P[X  10 + 4 X  4]
Here a = 4 and x = 10.
Let us now prove the memoryless property of exponential distribution.
 X  x  a    X  a  
Proof: P  X  x  a  X  a   [By conditional probability]
P X  a 

where
P  X  x  a    X  a    P  a  X  x  a 
xa xa
x
  f  x  dx   e dx
a a

xa
 ex   e   x a  e a 
    
   a    
  x  a 
  e  ea    e x .ea  ea 

= e a 1  ex  , and

  
x  e x   a  a
P[X  a] =  f  x  dx =  e dx       0  e   e
a a    a

64
e a 1  e x  Continuous Uniform and
 P X  x  a  X  a   1  e x Exponential Distributions
ea
x
Also, P[ X  x ]   e x dx
0

 1  e x [On simplification]


Thus,
P[X  x + a  X  a] = P[X  x].
Hence proved
Example 4: Show that for the exponential distribution:
f  x   Ae  x , 0  x   , mean and variance are equal.

Solution: As f  x  is probability function,



  f  x  dx  1
0

 
x  ex 
  Ae dx  1  A   1
0  (1)  0
 –A [0 –1] = 1  A = 1
 f  x   e x

Now, comparing it with the exponential distribution


f  x   e x , we have

=1
1 1
Hence, mean =   1,
 1
1 1
and variance =   1.
2 1
So, the mean and variance are equal for the given exponential distribution.
Example 5: Telephone calls arrive at a switchboard following an exponential
distribution with parameter  = 12 per hour. If we are at the switchboard, what
is the probability that the waiting time for a call is
i) at least 15 minutes
ii) not more than 10 minutes.
Solution: Let X be the waiting time (in hours) for a call.
 f  x   ex , x  0

 F  x   P  X  x   1  e x [c.d.f. of exponential distribution]

= 1  e12x … (1) [  = 12]


65
Continuous Probability Now,
Distributions
1
i) P[waiting time is at least 15 minutes] = P[waiting time is at least hours]
4
 1  1
= P X    1  P X  
 4  4
1
 12 
= 1  1  e 4  [Using (1) above]
 
= e 3
See table given at the 
= 0.0498 end of Unit10 
 
ii) P[waiting time not more than 10 minutes]
1
= P[waiting time not more than hrs]
6
1
 1 12
= P X    1  e 6
 6

= 1– e 2 = 1– (0.1353) = 0.8647

Now, we are sure that you can try the following exercises.
E3) What are the mean and variance of the exponential distribution given
by:
f  x   3e3x , x  0

E4) Obtain the value of k > 0 for which the function given by
f  x   2e kx , x  0

follows an exponential distribution.


1
E5) Suppose that accidents occur in a factory at a rate of   per
20
working day. Suppose in the factory six days (from Monday to
Saturday) are working. Suppose we begin observing the occurrence of
accidents at the starting of work on Monday. Let X be the number of
days until the first accident occurs. Find the probability that
i) first week is accident free
ii) first accident occurs any time from starting of working day on
Tuesday in second week till end of working day on Wednesday in
the same week.

66
We now conclude this unit by giving a summary of what we have covered in it. Continuous Uniform and
Exponential Distributions
15.4 SUMMARY
Following main points have been covered in this unit.
1) A random variable X is said to follow a continuous uniform (rectangular)
distribution over an interval (a, b) if its probability density function is given
by

 1
 for a  x  b
f x  b  a
 0, otherwise

ab
2) For continuous uniform distribution, Mean  and
2
2

variance 
b  a .
12
3) A random variable X is said to follow exponential distribution with
parameter  > 0, if it takes any non-negative real value and its probability
density function is given by
ex for x  0
f x  
 0 , elsewhere
1 1
4) For exponential distribution, Mean = and Variance = 2 .
 
5) Mean > or = or < Variance according to whether  > or = or < 1.
6) Exponential distribution is the only continuous distribution which has
the memoryless property given by:
P[X  x + a  X  a] = P[X  x].

15.5 SOLUTIONS/ANSWERS
E1) As X  U[  a, a],
 probability density function of X is
1 1 1
f x    , a  x  a .
a  ( a) a  a 2a
1
i) Given that P[X > 4] =
3
a
1 1
  2a dx  3
4

1 1
  x a4 
2a 3

67
Continuous Probability a4 1
Distributions  
2a 3
 3a – 12 = 2a
 a = 12.
3
ii) P  X  1 
4
1
1 3
 dx 
a
2a 4

1 3
  x 1 a 
2a 4
1 3
 1  a  
2a 4
3
 1+ a = a
2
 2  2a  3a
 a2
iii) P  X  2   P  X  2 

 X  2   X  2 
 X  2 or  X  2 
 
  2  X  2 and 
 P  2  X  2  P  X  2 or X  2   
 X  2  X  2 
 X  2or  X  2 
 
 X  2or X   2 

 By Addition law of 
 P  2  X  2  P  X  2   P  X  2  probability for mutually 

 exclusive events 
2 2 a
1 1 1
  dx   dx   dx
2
2a a
2a 2
2a

1 1 1

2a
 4    2  a    a  2 
2a 2a
 4  (2  a)  (a  2)
 4  4  2a
 2a = 8
 a=4
E2) As X ~ U [  2, 2],
1
 f x  ,  2  x  2.
68 4
1 Continuous Uniform and
Now P X  k  Exponential Distributions
2
2
1 1
  4dx  2
k

2k 1
 
4 2
2–k=2
 k = 0.
E3) Comparing it with the exponential distribution given by
f  x   e x , x  0

We have  = 3
1 1 1 1
 Mean =  and Variance = 2 
 3  9
E4) As the given function is exponential distribution i.e. a p.d.f.,

  f  x  dx  1
0

k=2 [On simplification]


Alternatively, you may compare the given function with exponential
distribution
f  x   ex ,

we have
 = 2 and  = k
k=2
1
 x
E5) Here P  X  x   F  x   1  e x = 1 – e 20

i) P[First week is accident free] = P[Accident occurs after six days]


= P[X > 6] = 1 – P[X  5]
1

= 1  1  e5/ 20   e 4
 e 0.25  0.7788.

ii) P[First accident occurs on second week from starting of working day
on Tuesday till end of working day on Wednesday]
=P[First accident occurs after 7 working days
and before the end of 9 working days]
= P[7 < X  9]
= P[X  9] – P[X  7]

69
Continuous Probability 9 7
Distributions      
  1  e 20   1  e 20 
   
9 7
 
20 20
 e e
7 9
 
20 20
e e
 e0.35  e 0.45
= 0.7047 – 0.6376 [See the table give at the end of Unit 10]
= 0.0671.

70
Gamma and Beta
UNIT 16 GAMMA AND BETA Distributions
DISTRIBUTIONS
Structure
16.1 Introduction
Objectives

16.2 Beta and Gamma Functions


16.3 Gamma Distribution
16.4 Beta Distribution of First Kind
16.5 Beta Distribution of Second Kind
16.6 Summary
16.7 Solutions/Answers

16.1 INTRODUCTION
In Unit 15, you have studied continuous uniform and exponential
distributions. Here, we will discuss gamma and beta distributions. Gamma
distribution reduces to exponential distribution and beta distribution reduces
to uniform distribution for special cases. Gamma distribution is a
generalization of exponential distribution in the same sense as the negative
binomial distribution is a generalization of geometric distribution. In a sense,
the geometric distribution and negative binomial distribution are the discrete
analogs of the exponential and gamma distributions, respectively. The present
unit discusses the gamma and beta distributions which are defined with the
help of special functions known as gamma and beta functions, respectively.
So, before defining these distributions, we first define gamma and beta
functions in Sec. 16.2 of this unit. Then gamma distribution and beta
distribution of first kind followed by beta distribution of second kind are
discussed in Secs. 16.3 to 16.5.
Objectives
After studing this unit, you would be able to:
 define beta and gamma functions;
 define gamma and beta distributions;
 discuss various properties of these distributions;
 identify the situations where these distributions can be employed; and
 solve various practical problems related to these distributions.

16.2 BETA AND GAMMA FUNCTIONS


In this section, some special functions i.e. beta and gamma functions are
defined with their properties and the relation between them. These will be
helpful in defining beta and gamma distributions to be defined in the
subsequent sections.

71
Continuous Probability
Distributions Beta Function
1
m 1 n 1
Definition: If m > 0, n > 0, the integral  x 1  x 
0
dx is called a beta

function and is denoted by β(m, n) e.g.


1 1 3
2 1 31 3 
i)  x 1  x  dx   x 2 1  x  dx    ,3 
0 0 2 
1
2 1  3 
or  x 1  x  dx     1, 2  1    , 3 
0 2  2 
1 1 1
   1 1  2 2
ii) x 3
1  x  3 dx      1,   1    , 
0  3 3  3 3

Properties of Beta Function


1. Beta function is symmetric function i.e. β(m, n) = β(n, m)
2. There are some other forms also of Beta function. One of these forms,
which will be helpful in defining beta distribution of second kind, is

x m 1
  m, n    mn
dx
0 1  x 
  p,q  1   p  1, q 
3. (i) 
q p
(ii) β(p, q) = β(p + q, q)  β(p, q + 1)

On the basis of the above discussion, you can try the following exercise.
E1) Express the following as a beta function:
1 1 1

i) x 3
1  x  2 dx
0

1
2 5
ii)  x 1  x 
0
dx


x2
iii)  1  x  5
dx
0

1
 
2
x
iv)  1  x  2
dx
0

72
Gamma Function Gamma and Beta
Distributions
Though we have defined Gamma function in Unit 13, yet we are again
defining it with more properties, examples and exercises to make you clearly
understand this special function.

n 1  x
Definition: If n > 0, the integral x e dx is called a gamma function and is
0

denoted by  n 

e.g.

2 x
(i) x e dx   2  1   3
0


x 1  3
(ii)  xe dx    1    
0 2  2
Some Important Results on Gamma Function

1. If n > 1,  n    n  1  n  1
2. If n is a positive integer, n   n  1 !

1
3.    
2
Relationship between Beta and Gamma Functions

If m > 0, n > 0, then   m, n  


 m n 
m  n
You can now try the following exercise.
E2) Evaluate:
 5
x
(i)  e x 2 dx
0


10
(ii)  1  x 
0
dx

 1

x
(iii)  x 2 e dx
0

16.3 GAMMA DISTRIBUTION


Gamma distribution is a generalisation of exponential distribution. Both the
distributions are good models for waiting times. For exponential distribution,
the length of time interval between successive happenings is considered i.e.
the time is considered till one happening occurs whereas for gamma
distribution, the length of time between 0 and the instant when rth happening
73
Continuous Probability
Distributions
occurs is considered. So, if r = 1, then the situation becomes the exponential
situation. Let us now define gamma distribution:
Definition: A random variable X is said to follow gamma distribution with
parameters r > 0 and  > 0 if its probability density function is given by
  r e x x r 1
 , x0
f (x)   r 

0, elsewhere

Remark 1:
(i) It can be verified that

 f  x  dx  1
0

Verification:
 
 r ex x r 1
  x dx  
0 0 r
dx

r 1

ex  x 
 dx
0 r
Putting  x = y   dx  dy
Also, when x  0, y  0 and when x  , y  

1 y
 e y r 1dx
r 0

1
 r [Using gamma function defined in Sec. 16.2]
r
=1
(ii) If X is a gamma variate with two parameters r > 0 and  > 0, it is expressed
as X  γ(, r).
(iii) If we put r = 1, we have

e x x 0
f (x)  ,x  0
1
 e x , x  0
which is probability density function of exponential distribution.

Hence, exponential distribution is a particular case of gamma


distribution.

(iv) If we put  = 1, we have

e x .x r 1
f (x)  , x  0, r  0
r
74
It is known as gamma distribution with single parameter r. This form of the Gamma and Beta
gamma distribution is also widely used. If X follows gamma distribution with Distributions

single parameter r > 0, it is expressed as X   (r).


Mean and Variance of Gamma Distribution
If X has a gamma distribution with parameters r > 0 and  > 0, then its
r r
Mean = , Variance = 2 .
 
If X has a gamma distribution with single parameter r > 0, then its
Mean = Variance = r.
Additive Property of Gamma Distribution
1. If X1, X2, …, Xk are independent gamma variates with parameters
 , r1  ,  , r2  ,...,  , rk  respectively, then X1 + X2 +…+ Xk is also a
gamma variate with parameter (, r1 + r2 +…+rk).
2. If X1, X2,...,Xk are independent gamma variates with single parameters r1,
r2,…, rk respectively, then X1 + X2 + …, + Xk is also a gamma variate with
parameter r1 + r2 + … + rk.
Example 1: Suppose that on an average 1 customer per minute arrive at a
shop. What is the probability that the shopkeeper will wait more than 5
minutes before
(i) both of the first two customers arrive, and
(ii) the first customer arrive?
Assume that waiting times follows gamma distribution.
Solution:
i) Let X denotes the waiting time in minutes until the second customer
arrives, then X has gamma distribution with r = 2 (as the waiting time is to
be considered up to 2nd customer)
 = 1 customer per minute.
 
 r ex x r 1
 P  X  5   f (x)dx   dx
5 5 r
2

1 e x x 21 
e  x x1 
 dx   dx   x1e  x dx
5  2 5
1 5

  e  x   e  x 
 x    1 dx  [Integrating by parts]
  1 5 5 1 

 
 ex 
  0  5e5    e x dx  5e5   
5  1  5
 5e 5   0  e5 

= 6 e 5
75
Continuous Probability
Distributions
= 6  0.0070 [See the table given at the end of Unit 10]
= 0.042
ii) In this case r = 1,  = 1 and hence

 r ex .x r 1
P  X  5   dx
5 r
  
(1)1 e  x x 0  e x 
 dx   e x dx     0  e5  0.0070
5 (1) 5  1  5
Alternatively,
As r = 1, so it is a case of exponential distribution for which
f  x   ex , x  0

  
 e x 
 P  X  5    e x x
dx   1e dx     0  e 5  0.0070
5 5  1  5
Here is an exercise for you.
E3) Telephone calls arrive at a switchboard at an average rate of 2 per minute.
Let X denotes the waiting time in minutes until the 4th call arrives and
follows gamma distribution. Write the probability density function of X.
Also find its mean and variance.

Let us now discuss the beta distributions in the next two sections:

16.4 BETA DISTRIBUTION OF FIRST KIND


You have studied in Sec. 16.3 that beta function is related to gamma function
in the following manner:

  m, n  
 m n 
m  n
Now, we are in a position to define beta distribution which is defined with the
help of beta function. There are two kinds of beta distribution  beta
distribution of first kind and beta distribution of second kind. Beta distribution
of second kind is defined in next section of the unit whereas beta distribution
of first kind is defined as follows:
Definition: A random variable X is said to follow beta distribution of first
kind with parameters m > 0 and n > 0, if its probability density function is
given by

 1 m 1 n 1
  m, n x 1  x  , 0  x  1
f (x)    
0, otherwise

The random variable X is known as beta variate of first kind and can be
expressed as X  1(m, n)
76
Remark 5: If m = 1 and n = 1, then the beta distribution reduces to Gamma and Beta
Distributions
1 11
f x  x11 1  x  , 0  x  1
 1,1
0
x 0 1  x 
 , 0  x 1
 1,1

1
 ,0  x 1
 1,1

11 0 0
But  1,1  
2 1

Therefore, f (x) 
11  1
1
 f (x)  1, 0  x  1
1
 ,0  x 1
1 0
which is uniform distribution on (0, 1).
1
[p.d.f. of uniform distribution on (a, b) is f (x)  , a  x  b]
ba
So, continuous uniform distribution is a particular case of beta
distribution.
Mean and variance of Beta Distribution of First Kind
Mean and Variance of this distribution are given as
m
Mean =
mn
mn
Variance = 2
 m  n   m  n  1
Example 4: Determine the constant C such that the function
6
f (x)  Cx 3 1  x  , 0  x  1 is a beta distribution of first kind. Also, find its
mean and variance.
Solution: As f  x  is a beta distribution of first kind.
1
 f x  1
0

1
3 6
  Cx 1  x 
0
dx  1

1
6
 C  x 3 1  x  dx  1
0

77
Continuous Probability
Distributions  C   3  1, 6  1  1 [By definition of Beta distribution of first kind]

1
 C
  4, 7 

47  m n 
    m, n   
47   m  n  
11 10
 
4 7 3 6

10  9  8  7  6
  840
3 2  6
6
Thus, f  x   840x 3 1  x 
7 1
= 840x 4 1 1  x 
7 1
x 41 1  x 

  4, 7 

1
[  840 just obtained above in this example]
  4, 7 

 m = 4, n = 7
m 4 4
 Mean =   ,
m  n 4  7 11
mn
and Variance = 2
 m  n   m  n  1
47
 2
 4  7   4  7  1
28 7 7
  
12112 121 3 363
Now, you can try the following exercises.
E4) Using beta function, prove that
1
2 3
 60x 1  x 
0
dx  1

E5) Determine the constant k such that the function


1 1

f (x)  kx 2 1  x  2 ,0  x  1, is a beta distribution of first kind. Also
find its mean and variance.

78
Gamma and Beta
16.5 BETA DISTRIBUTION OF SECOND KIND Distributions
Let us now define beta distribution of second kind.
Definition: A random variable X is said to follow beta distribution of second
kind with parameters m > 0, n > 0 if its probability density function is given
by
 x m 1
 mn
, 0x
f  x      m, n 1  x 

0, elsewhere

x m 1
Remark 6: It can be verified that    m, n 1  x  mn
dx  1
0

Verification:
 
x m 1 1 x m 1
   m, n 1  x  dx  dx
0
m n
  m, n  0 1  x  m  n

  x m-1 
  mn
dx is another form 
 0 1+x  
1 of beta function. 
   m, n 
  m, n   
(see Sec. 16.2 of this Unit) 
 
 
=1

Remark 7: If X is a beta variate of second kind with parameters m > 0, n > 0,


then it is expressed as X  2(m, n)
Mean and Variance of beta Distribution of second kind
m
Mean = , n  1;
n 1
m  m  n  1
Variance = 2
,n  2
 n  1  n  2 
Example 5: Determine the constant k such that the function
kx 3
f x  7
, 0  x  ,
1  x 
is the p.d.f of beta distribution of second kind. Also find its mean and
variance.
Solution: As f  x  is a beta distribution of second kind,

  f  x  dx  1
0

79
Continuous Probability 
Distributions kx 3
  1  x  dx  1
0
7


x 4 1
 k 4 3
dx  1
0 1  x 
 k  4, 3  1

1 7 6 6 5 4
 k     60
  4,3 4 3 3 2 2

Here m = 4, n = 3
m 4 4
 Mean =   2
n 1 3  1 2
m  m  n  1 4(4  3  1) 46
Variance = 2
 2
 6
 n  1  n  2  (3  1)  3  2  4 1

Now, you can try the following exercises.


E6) Using beta function, prove that

x3 64
 13
dx 
15015
0 1  x  2

E7) Obtain mean and variance for the beta distribution whose density is given
by
60x 2
f x  7
,0  x  
1  x 

16.6 SUMMARY
The following main points have been covered in this unit:
1) A random variable X is said to follow gamma distribution with
parameters r > 0 and  > 0 if its probability density function is given by
  r ex x r 1
 , x0
f x   r

0, elsewhere

2) Gamma distribution of random variable X with single parameter r > 0 is


e  x x r 1
defined as f  x   , x  0, r  0
r
r
3) For gamma distribution with two parameters λ and r, Mean = and

r
Variance = .
2
80
4) A random variable X is said to follow beta distribution of first kind with Gamma and Beta
parameters m > 0 and n > 0, if its probability density function is given by Distributions

 1 m 1 n 1
  m, n x 1  x  , 0  x  1
f x    
0, otherwise

m mn
Its mean and variance are and 2
, respectively.
mn  m  n   m  n  1
5) A random variable X is said to follow beta distribution of second kind
with parameters m > 0, n > 0 if its probability density function is given by:
 x m 1
 mn
,0  x  
f  x      m, n 1  x 

0, elsewhere

m m  m  n  1
Its Mean and Variance are , n  1; and 2
,n  2
n 1  n  1  n  2 
respectively.
6) Exponential distribution is a particular case of gamma distribution and
continuous uniform distribution is a particular case of beta distribution.

16.7 SOLUTIONS/ANSWERS
1 1 1
1 1  2 3
1  x  2 dx  B  

3
E1) (i) x  1,  1  B  , 
0  3 2  3 2
1 1
2 11 5 6 1
(ii)  x 1  x  dx   x 1  x 
0 0
dx

is not a beta function, since m =  1 < 0, but m and n both should be


positive.
 
x2 x 31
(iii)  5
dx   3 2
dx = β(3, 2)
0 1  x  0 1  x 
[ m = 3, n = 2(see Property 2 of Beta function Sec. 16.2)]
1 1
   1
2
x x2 1 3
(iv)  1  x  2
dx   1 3
dx    , 
0 0 1  x 

2 2 2 2


5 
E2)  e x .x 5/ 2 dx    1 
0 2 

7
  
2

81
Continuous Probability
Distributions  5  5  5   3   3   5  3   1   1 
                 
 2  2  2   2   2   2  2   2   2 
 Result 1on gamma 
function (See Sec. 16.2) 
 
 5  3  1 
       Result 3 on gamma function 
 2  2  2 

 15 
  
8

1 10
(ii)  x 1  x 
0
dx = β(1 + 1, 10 + 1)

= β(2,11)

2 11 see relation between 


=  beta and gamma function 
13  

=
1!10 ! [Result 2 on gamma function]
12 !

=
10  ! =
1

1
12 1110 ! 12 11 132
 1

x  1  1
(iii) 0 2 e dx    2  1   2   
x

E3) Here  = 2, r = 4.
 r ex .x r 1
 f (x)  ,x  0
r 
24.e2x .x 3
 ,x  0
4
16e2x .x 3
= ,x  0
3

8
= x 3e 2x , x  0
3
r 4
Mean =   2,
 2
r 4
Variance = 2
 2 1
 2
1 1
3 4 1
E4)  60x 2 1  x  dx  60  x 31 1  x  dx  60   3, 4 
0 0

82
34 2 3 60  2  3  2 Gamma and Beta
= 60 = 60   1 Distributions
7 6 6  5  4  3 2
1 1 1

E5)  kx 2
1  x  2 dx  1
0

 1 1 
 k    1,  1  1
 2 2 

1 2 1 2 2
 k    
1 3   
 ,  1 3 1
 
1
 
2 2 2 2 2 2
Now, as the given p.d.f. of beta distribution of first kind is
2  12 1
f (x)  x 1  x  2 , 0  x  1

1 3
1 1
x2 1  x  2
 , 0  x 1
1 3
 , 
2 2
1 3
m  , n 
2 2
1
m 1
and hence mean =  2 
mn 1  3 4
2 2
mn
Variance = 2
 m  n   m  n  1
 1  3  3
   3 1
2
   2 4
 2
 2
 
 1 3   1 3   2   3 4  4  3 16
      1
2 2 2 2 
 
x3 x 4 1
E6)  1  x  13/ 2
dx   5
dx
4
0 0 1  x  2

5 5
4 3
 5  2  2
=   4,  
 2  5 13
4  2
 2

5
6
2 6  32 64
=  
13 11 9 7 5 5 13 11 9  7  5 15015
. . . .
2 2 2 2 2 2
83
Continuous Probability
Distributions 60x 2
E7) f  x   7
,0  x  
1  x 
60x 31
 3 4
,0  x  
1  x 
x 31  34 23 1
 3 4
, 0x    3, 4     
  3, 4 1  x   6 6 60 

m =3, n = 4
m 3
Hence, mean =  1
n 1 4  1
m  m  n  1 3  3  4  1 3 6
Variance = 2
= 2
 = 1.
 n  1  n  2   4  1  4  2  9 2

84
UNIT 9 CONCEPTS OF TESTING OF
HYPOTHESIS
Structure
9.1 Introduction
Objectives
9.2 Hypothesis
Simple and Composite Hypotheses
Null and Alternative Hypotheses
9.3 Critical Region
9.4 Type-I and Type-II Errors
9.5 Level of Significance
9.6 One-Tailed and Two-Tailed Tests
9.7 General Procedure of Testing a Hypothesis
9.8 Concept of p-Value
9.9 Relation between Confidence Interval and Testing of Hypothesis
9.10 Summary
9.11 Solutions /Answers

9.1 INTRODUCTION
In previous block of this course, we have discussed one part of statistical
inference, that is, estimation and we have learnt how we estimate the unknown
population parameter(s) by using point estimation and interval estimation. In
this block, we will focus on the second part of statistical inference which is
known as testing of hypothesis.
In our day-to-day life, we see different commercials advertisements in
television, newspapers, magazines, etc. such as
(i) The refrigerator of certain brand saves up to 20% electric bill,
(ii) The motorcycle of certain brand gives 60 km/liter mileage,
(iii) A detergent of certain brand produces the cleanest wash,
(iv) Ninety nine out of hundred dentists recommend brand A toothpaste for
their patients to save the teeth against cavity, etc.
Now, the question may arise in our mind “can such types of claims be verified
statistically?” Fortunately, in many cases the answer is “yes”.
The technique of testing such type of claims or statements or assumptions is
known as testing of hypothesis. The truth or falsity of a claim or statement is
never known unless we examine the entire population. But practically it is not
possible in mostly situations so we take a random sample from the population
under study and use the information contained in this sample to take the
decision whether a claim is true or false.
This unit is divided into 11 sections. Section 9.1 is introductory in nature. In
Section 9.2, we defined the hypothesis. The concept and role of critical region
in testing of hypothesis is described in Section 9.3. In Section 9.4, we explored
the types of errors in testing of hypothesis whereas level of significance is
explored in Section 9.5. In Section 9.6, we explored the types of tests in testing
5
Testing of Hypothesis of hypothesis. The general procedure of testing a hypothesis is discussed in
Section 9.7. In Section 9.8, the concept of p-value in decision making about the
null hypothesis is discussed whereas the relation between confidence interval
and testing of hypothesis is discussed in Section 9.9. Unit ends by providing
summary of what we have discussed in this unit in Section 9.10 and solution of
exercises in Section 9.11.
Objectives
After reading this unit, you should be able to:
 define a hypothesis;
 formulate the null and alternative hypotheses;
 explain what we mean by type-I and type-II errors;
 explore the concept of critical region and level of significance;
 define one-tailed and two-tailed tests;
 describe the general procedure of testing a hypothesis;
 concept of p-value; and
 test a hypothesis by using confidence interval.
Before coming to the procedure of testing of hypothesis, we will discuss the
basis terms used in this procedure one by one in subsequent sections.

9.2 HYPOTHESIS
As we have discussed in previous section that in our day-to-day life, we see
different commercials advertisements in television, newspapers, magazines,
etc. and if someone may be interested to test such type of claims or statement
then we come across the problem of testing of hypothesis. For example,
(i) a customer of motorcycle wants to test whether the claim of motorcycle
of certain brand gives the average mileage 60 km/liter is true or false,
(ii) the businessman of banana wants to test whether the average weight of a
banana of Kerala is more than 200 gm,
(iii) a doctor wants to test whether new medicine is really more effective for
controlling blood pressure than old medicine,
(iv) an economist wants to test whether the variability in incomes differ in
two populations,
(v) a psychologist wants to test whether the proportion of literates between
two groups of people is same, etc.
In all the cases discussed above, the decision maker is interested in making
inference about the population parameter(s). However, he/she is not interested
in estimating the value of parameter(s) but he/she is interested in testing a
claim or statement or assumption about the value of population parameter(s).
Such claim or statement is postulated in terms of hypothesis.
In statistics, a hypothesis is a statement or a claim or an assumption about the
value of a population parameter (e.g., mean, median, variance, proportion,
etc.).
Similarly, in case of two or more populations a hypothesis is comparative
statement or a claim or an assumption about the values of population
parameters. (e.g., means of two populations are equal, variance of one
population is greater than other, etc.). The plural of hypothesis is hypotheses.
6
In hypothesis testing problems first of all we should being identifying the claim Concepts of Testing of
or statement or assumption or hypothesis to be tested and write it in the words. Hypothesis
Once the claim has been identified then we write it in symbolical form if
possible. As in the above examples,
(i) Customer of motorcycle may write the claim or postulate the hypothesis
“the motorcycle of certain brand gives the average mileage 60 km/liter.”
Here, we are concerning the average mileage of the motorcycle so let µ
represents the average mileage then our hypothesis becomes µ = 60 km /
liter.
(ii) Similarly, the businessman of banana may write the statement or
postulate the hypothesis “the average weight of a banana of Kerala is
greater than 200 gm.” So our hypothesis becomes µ > 200 gm.
(iii) Doctor may write the claim or postulate the hypothesis “ the new
medicine is really more effective for controlling blood pressure than old
medicine.” Here, we are concerning the average effect of the medicines
so let µ1 and µ2 represent the average effect of new and old medicines
respectively on controlling blood pressure then our hypothesis becomes
µ1 > µ2.
(iv) Economist may write the statement or postulate the hypothesis “the variability in incomes differs in two populations.” Here, we are concerned with the variability in income, so let σ1² and σ2² represent the variability in incomes in the two populations respectively; then our hypothesis becomes σ1² ≠ σ2².
(v) Psychologist may write the statement or postulate the hypothesis “the
proportion of literates between two groups of people is same.” Here, we are concerned with the proportion of literates, so let P1 and P2 represent the proportions of literates of the two groups of people respectively; then our hypothesis becomes P1 = P2 or P1 − P2 = 0.
The hypothesis is classified according to its nature and usage as we will discuss
in subsequent subsections.
9.2.1 Simple and Composite Hypotheses
In a general sense, if a hypothesis specifies only one value or exact value of the population parameter then it is known as a simple hypothesis, and if a hypothesis specifies not just one value but a range of values that the population parameter may assume then it is called a composite hypothesis.
[A hypothesis which completely specifies the parameter(s) of a theoretical population (probability distribution) is called a simple hypothesis, otherwise it is called a composite hypothesis.]
As in the above examples, the hypothesis postulated in (i), µ = 60 km/liter, is a simple hypothesis because it gives a single value of the parameter (µ = 60), whereas the hypothesis postulated in (ii), µ > 200 gm, is a composite hypothesis because it does not specify the exact average value of the weight of a banana. It may be 260, 350, 400 gm or any other.
Similarly, (iii) µ1 > µ2 or µ1 − µ2 > 0 and (iv) σ1² ≠ σ2² or σ1² − σ2² ≠ 0 are not simple hypotheses because they specify more than one value, as µ1 − µ2 = 4, µ1 − µ2 = 7, σ1² − σ2² = 2, σ1² − σ2² = 5, etc., and (v) P1 = P2 or P1 − P2 = 0 is a simple hypothesis because it gives a single value of the parameter, as P1 − P2 = 0.
9.2.2 Null and Alternative Hypotheses
As we have discussed earlier, in hypothesis testing problems first of all we identify the claim or statement to be tested and write it in symbolical form. After that we write the complement or opposite of the claim or statement
in symbolical form. In our example of motorcycle, the claim is µ = 60 km/liter
then its complement is µ ≠ 60 km/liter. In (ii) the claim is µ > 200 gm then its
complement is µ ≤ 200 gm. If the claim is µ < 200 gm then its complement is
µ ≥ 200 gm. The claim and its complement are formed in such a way that they
cover all possibility of the value of population parameter.
Once the claim and its complement have been established, we decide which of these two is the null hypothesis and which is the alternative hypothesis. The thumb rule is that the statement containing equality is the null hypothesis. That is, the hypothesis which contains the symbols =, ≤ or ≥ is taken as the null hypothesis and the hypothesis which does not contain equality, i.e. contains ≠, < or >, is taken as the alternative hypothesis. The null hypothesis is denoted by H0 and the alternative hypothesis is denoted by H1 or HA.
[We state the null and alternative hypotheses in such a way that they cover all possibilities of the value of the population parameter.]
In our example of motorcycle, the claim is µ = 60 km/liter and its complement
is µ ≠ 60 km/liter. Since claim µ = 60 km/liter contains equality sign so we take
it as a null hypothesis and complement µ ≠ 60 km/liter as an alternative
hypothesis, that is,
H0 : µ = 60 km/liter and H1: µ ≠ 60 km/liter
In our second example of banana, the claim is µ > 200 gm and its complement
is µ ≤ 200 gm. Since complement µ ≤ 200 gm contains equality sign so we take
complement as a null hypothesis and claim µ > 200 gm as an alternative
hypothesis, that is,
H0 : µ ≤ 200 gm and H1: µ > 200 gm
Formally these hypotheses are defined as
The hypothesis which we wish to test is called as the null hypothesis.
According to Prof. R.A. Fisher,
“A null hypothesis is a hypothesis which is tested for possible rejection under
the assumption that it is true.”
The hypothesis which complements to the null hypothesis is called alternative
hypothesis.
Note 1: Some authors use equality sign in null hypothesis instead of ≥ and ≤
signs.
The alternative hypothesis has two types:
(i) Two-sided (tailed) alternative hypothesis
(ii) One-sided (tailed) alternative hypothesis
If the alternative hypothesis gives the alternate of null hypothesis in both
directions (less than and greater than) of the value of parameter specified in
null hypothesis then it is known as two-sided alternative hypothesis and if it
gives an alternate in one direction (less than or greater than) only, then it is
known as one-sided alternative hypothesis. For example, if our alternative
hypothesis is H1 : θ ≠ 60 then it is a two-sided alternative hypothesis because it means that the value of parameter θ is greater than or less than 60. Similarly, if H1 : θ > 60 then it is a right-sided alternative hypothesis because it means that the value of parameter θ is greater than 60, and if H1 : θ < 60 then it is a left-sided alternative hypothesis because it means that the value of parameter θ is less than 60.
In testing procedure, we assume that the null hypothesis is true until there is
sufficient evidence to prove that it is false. Generally, the hypothesis is tested
8
with the help of a sample so evidence in testing of hypothesis comes from a Concepts of Testing of
sample. If there is enough sample evidence to suggest that the null hypothesis Hypothesis
is false then we reject the null hypothesis and support the alternative
hypothesis. If the sample fails to provide us sufficient evidence against the null hypothesis, we do not say that the null hypothesis is true, because here we take the decision on the basis of a random sample which is a small part of the population. To say that the null hypothesis is true we must study all observations of the population under study. For example, if someone wants to test that every person of India has two hands, then to prove that this is true we must check all the persons of India, whereas to prove that it is false we require only one person who has one hand or no hand. So we can only say that there is not enough evidence against the null hypothesis.
Note 2: When we assume that null hypothesis is true then we are actually
assuming that the population parameter is equal to the value in the claim. In our
example of motorcycle, we assume that µ = 60 km/liter whether the null
hypothesis is µ = 60 km/liter or µ ≤ 60 km/liter or µ ≥ 60 km/liter.
Now, you can try the following exercises.
E1) A company manufactures car tyres. Company claims that the average life
of its tyres is 50000 miles. To test the claim of the company, formulate
the null and alternative hypotheses.
E2) Write the null and alternative hypotheses in case (iii), (iv) and (v) of our
example given in Section 9.2.
E3) A businessman of orange formulates different hypotheses about the
average weight of the orange which are given below:
(i) H0 : µ = 100 (ii) H1 : µ > 100 (iii) H0 : µ ≤ 100 (iv) H1 : µ ≠ 100
(v) H1 : µ > 150 (vi) H0 : µ = 130 (vii) H1 : µ ≠ 0
Categorize the above cases into simple and composite hypotheses.
After describing the hypothesis and its types, our next point in the testing of hypothesis is the critical region, which will be described in the next section.
9.3 CRITICAL REGION
As we have discussed in Section 9.1, generally the null hypothesis is tested by the sample data. Suppose X1, X2,…, Xn is a random sample drawn from a population having unknown population parameter θ. The collection of all possible values of X1, X2,…, Xn is a set called the sample space (S) and a particular value of X1, X2,…, Xn represents a point in that space.
[A statistic is a function of sample observations (not including the parameter). Basically, a test statistic is a statistic which is used in decision making about the null hypothesis.]
In order to test a hypothesis, the entire sample space is partitioned into two disjoint sub-spaces, say, ω and ω̄ = S − ω. If the calculated value of the test statistic lies in ω, then we reject the null hypothesis, and if it lies in ω̄, then we do not reject the null hypothesis. The region ω is called a “rejection region or critical region” and the region ω̄ is called a “non-rejection region”.
Therefore, we can say that
“A region of the sample space such that, if the calculated value of the test statistic lies in it, we reject the null hypothesis, is called the critical region or rejection region.”
This can be better understood with the help of an example. Suppose 100 students appear in a total of 10 papers, two each in English, Physics, Chemistry, Mathematics and Computer Science of a Programme. Suppose the scores in these papers are denoted by X1, X2, …, X10 and maximum marks = 100 for each paper. For obtaining the distinction award in this Programme, a student needs to have a total score equal to or more than 750, which is a rule.
Suppose, we select one student randomly out of 100 students and we want to
test that the selected student is a distinction award holder. So we can take the
null and alternative hypotheses as
H0 : Selected student is a distinction award holder
H1 : Selected student is not a distinction award holder
For taking the decision about the selected student, we define a statistic
T10 = X1 + X2 + … + X10
as the sum of the scores in all the 10 papers of the student. The range of T10 is 0 ≤ T10 ≤ 1000. Now, we divide the whole space (0-1000) into two regions as the no-distinction awarded region (less than 750) and the distinction awarded region (greater than or equal to 750) as shown in Fig. 9.1. Here, 750 is the critical value which separates the no-distinction and distinction awarded regions.
Fig. 9.1: Non-rejection and critical regions for distinction award
On the basis of the scores in all the papers of the selected student, we calculate the value of the statistic T10. And the calculated value may fall in the distinction awarded region or not, depending upon the observed value of the test statistic.
For making a decision to reject or not reject H0, we use the test statistic T10 (sum of the scores of the 10 papers). If the calculated value of the test statistic T10 lies in the no-distinction awarded region (critical region), that is, T10 < 750, then we reject H0, and if the calculated value of the test statistic T10 lies in the distinction awarded region (non-rejection region), that is, T10 ≥ 750, then we do not reject H0. It is a
basic structure of the procedure of testing of hypothesis which needs two
regions like:
(i) Region of rejection of null hypothesis H0
(ii) Region of non-rejection of null hypothesis H0
The point of discussion in this test procedure is “how to fix the cut-off value 750”? What is the justification for this value? The distinction award region may be like T10 ≥ 800 or T10 ≥ 850 or T10 ≥ 900. So, there must be a scientific justification for the cut-off value 750. In a statistical test procedure it is obtained by using the probability distribution of the test statistic.
The region of rejection is called the critical region. It has a pre-fixed area, generally denoted by α, corresponding to a cut-off value in the probability distribution of the test statistic.
The rejection (critical) region lies in one tail or two tails of the probability curve of the sampling distribution of the test statistic; this depends upon the alternative hypothesis. Therefore, three cases arise:
Case I: If the alternative hypothesis is right-sided, such as H1 : θ > θ0 or H1 : θ1 > θ2, then the entire critical or rejection region of size α lies on the right tail of the probability curve of the sampling distribution of the test statistic, as shown in Fig. 9.2.
[The critical value is a value or values that separate the region of rejection from the non-rejection region.]
Fig. 9.2
Case II: If the alternative hypothesis is left-sided, such as H1 : θ < θ0 or H1 : θ1 < θ2, then the entire critical or rejection region of size α lies on the left tail of the probability curve of the sampling distribution of the test statistic, as shown in Fig. 9.3.
Fig. 9.3
Case III: If the alternative hypothesis is two-sided, such as H1 : θ ≠ θ0 or H1 : θ1 ≠ θ2, then critical or rejection regions of size α/2 lie on both tails of the probability curve of the sampling distribution of the test statistic, as shown in Fig. 9.4.
Fig. 9.4
Now, you can try the following exercise.
E4) If H0 : θ = 60 and H1 : θ ≠ 60, does the critical region lie in one tail or in two tails?
9.4 TYPE-I AND TYPE-II ERRORS
In Section 9.3, we have discussed a rule that if the value of test statistic falls in
rejection (critical) region then we reject the null hypothesis and if it falls in the
non-rejection region then we do not reject the null hypothesis. A test statistic is
calculated on the basis of observed sample observations. But a sample is a
small part of the population about which decision is to be taken. A random
sample may or may not be a good representative of the population.
A faulty sample misleads the inference (or conclusion) relating to the null hypothesis. For example, an engineer infers that a packet of screws is sub-standard when actually it is not. It is an error caused due to a poor or inappropriate (faulty) sample. Similarly, a packet of screws may be inferred as good when actually it is sub-standard. So we can commit two kinds of errors while testing a hypothesis, which are summarised in Table 9.1 given below:
Table 9.1: Type of Errors
Decision H0 True H1 True
Reject H0 Type-I Error Correct Decision
Do not reject H0 Correct Decision Type-II Error
Let us take a situation where a patient suffering from high fever reaches a doctor. And suppose the doctor formulates the null and alternative hypotheses as
H0 : The patient is a malaria patient
H1 : The patient is not a malaria patient
Then the following cases arise:
Case I: Suppose that the hypothesis H0 is really true, that is, the patient is actually a malaria patient, and after observation, pathological and clinical examination, the doctor rejects H0, that is, he / she declares him / her a non-malaria patient. It is not a correct decision and he / she commits an error in decision known as type-I error.
Case II: Suppose that the hypothesis H0 is actually false, that is, the patient is actually a non-malaria patient, and after observation, the doctor rejects H0, that is, he / she declares him / her a non-malaria patient. It is a correct decision.
Case III: Suppose that the hypothesis H0 is really true, that is, the patient is actually a malaria patient, and after observation, the doctor does not reject H0, that is, he / she declares him / her a malaria patient. It is a correct decision.
Case IV: Suppose that the hypothesis H0 is actually false, that is, the patient is actually a non-malaria patient, and after observation, the doctor does not reject H0, that is, he / she declares him / her a malaria patient. It is not a correct decision and he / she commits an error in decision known as type-II error.
Thus, we formally define type-I and type-II errors as below:
Type-I Error:
The decision relating to rejection of null hypothesis H0 when it is true is called type-I error. The probability of committing the type-I error is called the size of the test, denoted by α, and is given by
α = P[Reject H0 when H0 is true] = P[Reject H0 / H0 is true]
We reject the null hypothesis if the random sample / test statistic falls in the rejection region, therefore,
α = P[X ∈ ω / H0]
where X = (X1, X2,…, Xn) is a random sample and ω is the rejection region, and
1 − α = 1 − P[Reject H0 / H0 is true] = P[Do not reject H0 / H0 is true] = P[Correct decision]
The quantity (1 − α) is the probability of a correct decision and it correlates to the concept of the 100(1 − α)% confidence interval used in estimation.
Type-II Error:
The decision relating to non-rejection of null hypothesis H0 when it is false (i.e. H1 is true) is called type-II error. The probability of committing type-II error is generally denoted by β and is given by
β = P[Do not reject H0 when H0 is false]
  = P[Do not reject H0 when H1 is true]
  = P[Do not reject H0 / H1 is true]
  = P[X ∈ ω̄ / H1], where ω̄ is the non-rejection region,
and
1 − β = 1 − P[Do not reject H0 / H1 is true] = P[Reject H0 / H1 is true] = P[Correct decision]
The quantity (1 − β) is the probability of a correct decision and is also known as the “power of the test”. Since it indicates the ability or power of the test to recognize correctly that the null hypothesis is false, we wish for a test that yields a large power.
We say that a statistical test is ideal if it minimizes the probability of both types of errors and maximizes the probability of a correct decision. But for a fixed sample size, α and β are so interrelated that a decrement in one results in an increment in the other. So minimizing both probabilities of type-I and type-II errors simultaneously for a fixed sample size is not possible without increasing the sample size. Also, both types of errors will be at zero level (i.e. no error in decision) if the size of the sample is equal to the population size. But this involves huge cost if the population size is large, and it is not possible in all situations, such as testing of blood.
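This trade-off can be seen numerically. The sketch below is our own illustration (not part of the text); it uses the uniform-density test that appears later in Example 2, with rejection region X > c, and shows α falling while β rises as the cut-off c moves:

# Trade-off between alpha and beta for a single observation X with
# density f(x, theta) = 1/theta on (0, theta); H0: theta = 1, H1: theta = 2.
# Rejection region: X > c.
for c in (0.2, 0.4, 0.6, 0.8):
    alpha = 1 - c    # P[X > c | theta = 1], X uniform on (0, 1)
    beta = c / 2     # P[X <= c | theta = 2], X uniform on (0, 2)
    print(f"c = {c}: alpha = {alpha:.2f}, beta = {beta:.2f}")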
Depending on the problem in hand, we have to choose the type of error which has to be minimized. For this, we have to look at a situation: suppose there is a decision making problem and there is a rule that if we make a type-I error we lose 10 rupees, and if we make a type-II error we lose 1000 rupees. In this case, we try to eliminate the type-II error, since it is more expensive.
In another situation, suppose the Delhi police arrest a person whom they suspect is a murderer. Now, the police have to test the hypothesis:
H0: Arrested person is innocent (not murderer)
H1: Arrested person is a murderer
The type-I error is
α = P[Reject H0 when it is true]
That is, the suspected person, who is actually innocent, will be sent to jail when we reject H0, even though H0 is true.
The type-II error is
β = P[Do not reject H0 when H1 is true]
That is, the arrested person is truly a murderer but is released by the police. Now, we see that in this case the type-I error is more serious than the type-II error, because a murderer may be arrested / punished later on, but sending an innocent person to jail is more serious.
Consider another situation: suppose we want to test the null hypothesis H0 : p = 0.5 against H1 : p ≠ 0.5 on the basis of tossing a coin once, where p is the probability of getting a head in a single toss (trial), and we reject the null hypothesis if a head appears and do not reject it otherwise. The type-I error, that is, the probability of rejecting H0 when it is true, can be calculated easily (as shown in Example 1), but the computation of the type-II error is not possible because there are infinitely many alternatives for p, such as p = 0.6, p = 0.1, etc.
Generally, strong control on α is necessary. It should be kept as low as possible. In a test procedure, we prefix it at a very low level like α = 0.05 (5%) or 0.01 (1%).
Now, it is time to do some examples relating to α and β.
Example 1: It is desired to test a hypothesis H0 : p = p0 = 1/2 against the alternative hypothesis H1 : p = p1 = 1/4 on the basis of tossing a coin once, where p is the probability of “getting a head” in a single toss (trial), agreeing to reject H0 if a head appears and accept H0 otherwise. Find the values of α and β.
Solution: In such type of problems, first of all we search for the critical region. Here, we have critical region ω = {head}.
Therefore, the probability of type-I error can be obtained as
α = P[Reject H0 when H0 is true]
  = P[X ∈ ω / H0] = P[Head appears / H0]
  = P[Head appears when p = 1/2]   [H0 is true, so we take the value of parameter p given in H0]
  = 1/2
Also,
β = P[Do not reject H0 when H1 is true]
  = P[X ∈ ω̄ / H1] = P[Tail appears / H1]
  = P[Tail appears when p = 1/4]   [H1 is true, so we take the value of parameter p given in H1]
  = 1 − P[Head appears when p = 1/4] = 1 − 1/4 = 3/4
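The two probabilities just obtained can be checked in a couple of lines of Python (a sketch of our own, not part of the text):

# Example 1 check: the critical region is {head}.
p0, p1 = 1/2, 1/4    # P[head] under H0 and under H1
alpha = p0           # P[head | H0] = P[reject H0 | H0 true]
beta = 1 - p1        # P[tail | H1] = P[do not reject H0 | H1 true]
print(alpha, beta)   # 0.5 0.75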
Example 2: For testing H0 : θ = 1 against H1 : θ = 2, the pdf of the variable is given by
f(x, θ) = 1/θ ; 0 ≤ x ≤ θ
        = 0 ; elsewhere
Obtain the type-I and type-II errors when the critical region is X ≥ 0.4. Also obtain the power of the test.
14
Solution: Here, we have the critical (rejection) and non-rejection regions as
ω = {X : X ≥ 0.4} and ω̄ = {X : X < 0.4}
We have to test the null hypothesis
H0 : θ = 1 against H1 : θ = 2
The size of the type-I error is given by
α = P[X ∈ ω / H0] = P[X ≥ 0.4 / θ = 1]
  = ∫ f(x, θ) dx taken from x = 0.4 to x = θ … (1)
Now, by using f(x, θ) = 1/θ; 0 ≤ x ≤ θ, we get from equation (1), with θ = 1,
α = ∫ 1 dx from x = 0.4 to x = 1 = [x] from 0.4 to 1 = 1 − 0.4 = 0.6
Similarly, the size of the type-II error is given by
β = P[X ∈ ω̄ / H1] = P[X < 0.4 / θ = 2]
  = ∫ (1/2) dx from x = 0 to x = 0.4 = (1/2)[x] from 0 to 0.4 = (1/2)(0.4 − 0) = 0.2
The power of the test = 1 − β = 1 − 0.2 = 0.8.
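The integrals in this solution are simple enough to verify numerically; the following sketch (our own illustration, assuming scipy is available) integrates the density over the rejection and non-rejection regions:

# Numerical check of Example 2 using scipy's quadrature routine.
from scipy import integrate

def f(x, theta):
    # density f(x, theta) = 1/theta for 0 <= x <= theta, 0 elsewhere
    return 1.0 / theta if 0 <= x <= theta else 0.0

alpha, _ = integrate.quad(f, 0.4, 1, args=(1,))  # P[X >= 0.4 | theta = 1]
beta, _ = integrate.quad(f, 0, 0.4, args=(2,))   # P[X < 0.4 | theta = 2]
print(alpha, beta, 1 - beta)                     # 0.6 0.2 0.8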
Now, you can try the following exercise.
E5) An urn contains either 4 white and 2 black balls or 2 white and 4 black
balls. Two balls are to be drawn from the urn. If less than two white
balls are obtained, it will be decided that this urn contains 2 white and 4
black balls. Calculate the values of α and β.
9.5 LEVEL OF SIGNIFICANCE
So far in this unit, we have discussed the hypothesis, types of hypothesis,
critical region and types of errors. In this section, we shall discuss a very useful concept, “level of significance”, which plays an important role in decision making while testing a hypothesis.
The probability of type-I error is known as level of significance of a test. It is
also called the size of the test or size of critical region, denoted by α. Generally,
it is pre-fixed at the 5% or 1% level (α = 0.05 or 0.01). As we have discussed in Section 9.3, if the calculated value of the test statistic lies in the rejection (critical)
region, then we reject the null hypothesis and if it lies in non-rejection region,
then we do not reject the null hypothesis. Also we note that when H0 is rejected then automatically the alternative hypothesis H1 is accepted. Now, one point of our discussion is how to decide the critical value(s) or cut-off value(s) for a known test statistic.
If the distribution of the test statistic can be expressed as one of the well-known distributions like Z, χ², t, F, etc. then our problem is solved and, using the probability distribution of the test statistic, we can find the cut-off value(s) that provide us a critical area equal to 5% (or 1%).
Another viewpoint about the level of significance relates to the trueness of the conclusion. If H0 is not rejected at level, say, α = 0.05 (5% level), then a person will be confident that the “concluding statement about H0” is true with 95% assurance. But even then it may be false with 5% chance. There is no cent-percent assurance about the trueness of the statement made for H0.
As an example, if each of 100 scientists draws a random sample and uses the same test statistic to test the same hypothesis H0, conducting the same experiment, then 95 of them will reach the same conclusion about H0. But still 5 of them may differ (i.e. go against the earlier conclusion).
A similar argument can be made for, say, α = 0.01 (= 1%). It is like this: when H0 is rejected at α = 0.01 by a scientist, then out of 100 similar researchers who work at the same time on the same problem, but with different random samples, 99 of them would reach the same conclusion; however, one may differ.
Now, you can try the following exercise.
E6) If probability of type-I error is 0.05 then what is the level of significance?
9.6 ONE-TAILED AND TWO-TAILED TESTS
We have seen in Section 9.3 that the rejection (critical) region lies in one tail or two tails of the probability curve of the sampling distribution of the test statistic; this depends upon the form of the alternative hypothesis. Similarly, the test for testing the null hypothesis also depends on the alternative hypothesis.
A test of the null hypothesis is said to be a two-tailed test if the alternative hypothesis is two-tailed, whereas if the alternative hypothesis is one-tailed then the test of the null hypothesis is said to be a one-tailed test.
For example, if our null and alternative hypotheses are
H0 : θ = θ0 and H1 : θ ≠ θ0
then the test for testing the null hypothesis is a two-tailed test because the alternative hypothesis is two-tailed; that means the parameter θ can take a value greater than θ0 or less than θ0.
If the null and alternative hypotheses are
H0 : θ ≤ θ0 and H1 : θ > θ0
then the test for testing the null hypothesis is a right-tailed test because the alternative hypothesis is right-tailed.
Similarly, if the null and alternative hypotheses are
H0 : θ ≥ θ0 and H1 : θ < θ0
then the test for testing the null hypothesis is a left-tailed test because the alternative hypothesis is left-tailed.
The above discussion can be summarised in Table 9.2.
Table 9.2: Null and Alternative Hypotheses and Corresponding One-tailed and Two-tailed Tests
Null Hypothesis    Alternative Hypothesis    Type of Critical Region / Test
H0 : θ = θ0        H1 : θ ≠ θ0               Two-tailed test having critical regions under both tails.
H0 : θ ≤ θ0        H1 : θ > θ0               Right-tailed test having critical region under right tail only.
H0 : θ ≥ θ0        H1 : θ < θ0               Left-tailed test having critical region under left tail only.
Let us do one example based on types of tests.
Example 3: A company has replaced its original technology of producing
electric bulbs by CFL technology. The company manager wants to compare the
average life of bulbs manufactured by original technology and new technology
CFL. Write appropriate null and alternative hypotheses. Also state whether the corresponding tests are one-tailed or two-tailed.
Solution: Suppose the average lives of original and CFL technology bulbs are
denoted by 1 and 2 respectively.
If company manager is interested just to know whether any significant
difference exists in average-life time of two types of bulbs then null and
alternative hypotheses will be:
H0 : µ1 = µ2 [average lives of two types of bulbs are same]
H1 : µ1  µ2 [average lives of two types of bulbs are different]
Since alternative hypothesis is two-tailed therefore corresponding test will be
two-tailed.
If the company manager is interested to know whether the average life of CFL bulbs is greater than that of original technology bulbs, then our null and alternative hypotheses will be
H0 : µ1 ≥ µ2 and
H1 : µ1 < µ2   [average life of CFL technology bulbs is greater than average life of original technology bulbs]
Since the alternative hypothesis is left-tailed, the corresponding test will be a left-tailed test.
Now, you can try the following exercises.
E7) If we have null and alternative hypotheses as
H0 : θ = θ0 and H1 : θ ≠ θ0
then the corresponding test will be
(i) left-tailed test (ii) right-tailed test (iii) two-tailed test
Write the correct option.
E8) The test whether one-tailed or two-tailed depends on
(i) Null hypothesis (H0) (ii) Alternative hypothesis (H1)
(iii) Neither H0 nor H1 (iv) Both H0 and H1
9.7 GENERAL PROCEDURE OF TESTING A HYPOTHESIS
Testing of hypothesis is a statistical tool in great demand in many disciplines and professions. It is a step-by-step procedure, as you will see in the next three units through a large number of examples. The aim of this section is just to give you a flavour of that sequence, which involves the following steps:
Step I: First of all, we have to set up the null hypothesis H0 and alternative hypothesis H1. Suppose we want to test the hypothetical / claimed / assumed value θ0 of parameter θ. So we can take the null and alternative hypotheses as
H0 : θ = θ0 and H1 : θ ≠ θ0   [for two-tailed test]
or
H0 : θ ≤ θ0 and H1 : θ > θ0, or H0 : θ ≥ θ0 and H1 : θ < θ0   [for one-tailed test]
In case of comparing the same parameter of two populations of interest, say, θ1 and θ2, our null and alternative hypotheses would be
H0 : θ1 = θ2 and H1 : θ1 ≠ θ2   [for two-tailed test]
or
H0 : θ1 ≤ θ2 and H1 : θ1 > θ2, or H0 : θ1 ≥ θ2 and H1 : θ1 < θ2   [for one-tailed test]
Step II: After setting the null and alternative hypotheses, we establish a criterion for rejection or non-rejection of the null hypothesis, that is, decide the level of significance (α) at which we want to test our hypothesis. Generally, it is taken as 5% or 1% (α = 0.05 or 0.01).
Step III: The third step is to choose an appropriate test statistic under H0 for testing the null hypothesis, as given below:
Test statistic = (Statistic − Value of the parameter under H0) / (Standard error of the statistic)
After that, specify the sampling distribution of the test statistic, preferably in a standard form like Z (standard normal), χ², t, F or any other well known in the literature.
Step IV: Calculate the value of the test statistic described in Step III on the
basis of observed sample observations.
Step V: Obtain the critical (or cut-off) value(s) in the sampling distribution of the test statistic and construct the rejection (critical) region of size α. Generally, critical values for various levels of significance are put in the form of a table for various standard sampling distributions of the test statistic, such as the Z-table, χ²-table, t-table, etc.
Step VI: After that, compare the calculated value of the test statistic obtained in Step IV with the critical value(s) obtained in Step V and locate the position of the calculated test statistic, that is, whether it lies in the rejection region or the non-rejection region.
Step VII: In testing of hypothesis, ultimately we have to reach a conclusion. It is done as explained below:
(i) If the calculated value of the test statistic lies in the rejection region at α level of significance then we reject the null hypothesis. It means that the sample data provide us sufficient evidence against the null hypothesis and there is a significant difference between the hypothesized value and the observed value of the parameter.
(ii) If the calculated value of the test statistic lies in the non-rejection region at α level of significance then we do not reject the null hypothesis. It means that the sample data fail to provide us sufficient evidence against the null hypothesis and the difference between the hypothesized value and the observed value of the parameter is due to fluctuation of the sample.
Note 3: Nowadays the decision about the null hypothesis is taken with the help of the p-value. The concept of the p-value is very important, because computer packages and statistical software such as SPSS, SAS, STATA, MINITAB, EXCEL, etc. all provide the p-value. So, Section 9.8 is devoted to explaining the concept of the p-value.
Now, with the help of an example we explain the above procedure.
Example 4: Suppose it is found that the average weight of a potato was 50 gm and the standard deviation was 5.1 gm nearly 5 years ago. We want to test that, due to advancement in agricultural technology, the average weight of a potato has increased. To test this, a random sample of 50 potatoes is taken and the sample mean (X̄) is calculated as 52 gm. Describe the procedure to carry out this test.
Solution: Here, we are given that
Specified value of population mean = µ0 = 50 gm,
Population standard deviation = σ = 5.1 gm,
Sample size = n = 50,
Sample mean = X̄ = 52 gm
To carry out the above test, we have to follow the steps given below:
Step I: First of all, we set up the null and alternative hypotheses. Here, we want to test that the average weight of potato has increased. So our claim is “average weight of potato has increased”, i.e. µ > 50, and its
complement is µ ≤ 50. Since complement contains equality sign so
we can take the complement as the null hypothesis and claim as the
alternative hypothesis, that is,
H0 : µ ≤ 50 gm and H1: µ > 50 gm [Here, θ = µ]
Since the alternative hypothesis is right-tailed, so our test is right-
tailed.
Step II: After setting the null and alternative hypotheses, we fix level of
significance α. Suppose, α = 0.01 (= 1 % level).
Step III: Define a test statistic to test the null hypothesis as
Test statistic = (Statistic − Value of the parameter under H0) / (Standard error of the statistic)
T = (X̄ − 50) / (σ/√n)
Since the sample size is large (n = 50 > 30), by the central limit theorem the sampling distribution of the test statistic approximately follows the standard normal distribution (as explained in Unit 1 of this course), i.e. T ~ N(0, 1).
Step IV: Calculate the value of the test statistic on the basis of the sample observations as
T = (52 − 50) / (5.1/√50) = 2 / 0.72 = 2.78
Step V: Now, we find the critical value. The critical value or cut-off value for
standard normal distribution is given in Table I (Z-table) in the
Appendix at the end of Block 1 of this course. So from this table, the
critical value for the right-tailed test at α = 0.01 is zα = 2.33.
Step VI: Now, to take the decision about the null hypothesis, we compare the calculated value of the test statistic with the critical value.
Since the calculated value of the test statistic (= 2.78) is greater than the critical value (= 2.33), the calculated value of the test statistic lies in the rejection region at 1% level of significance, as shown in Fig. 9.5. So we reject the null hypothesis and support the alternative hypothesis. Since the alternative hypothesis is our claim, we support the claim.
Thus, we conclude that the sample does not provide us sufficient evidence against the claim, so we may assume that the average weight of potato has increased.
Fig. 9.5
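The whole computation of this example takes a few lines of Python; the sketch below (our own illustration, assuming scipy is available) carries out Steps IV to VI:

# Right-tailed Z-test of Example 4.
import math
from scipy.stats import norm

n, x_bar, sigma, mu_0, alpha = 50, 52, 5.1, 50, 0.01

z = (x_bar - mu_0) / (sigma / math.sqrt(n))  # Step IV: z = 2.78 (approx.)
z_alpha = norm.ppf(1 - alpha)                # Step V: critical value 2.33

print("Reject H0" if z >= z_alpha else "Do not reject H0")  # Step VI

Now, you can try the following exercise.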
E9) What is the first step in testing of hypothesis?
9.8 CONCEPT OF p-VALUE
In Note 3 of Section 9.7, we promised that the p-value would be discussed in Section 9.8. So, it is time to keep our promise. Nowadays, use of the p-value is becoming more and more popular because of the following two reasons:
 most of the statistical software provides the p-value rather than the critical value;
 the p-value provides more information compared to the critical value as far as rejection or non-rejection of H0 is concerned.
The first point listed above needs no explanation. But the second point lies at the heart of the p-value and needs to be explained more clearly. Moving in this direction, we note that in scientific applications one is not only interested simply in rejecting or not rejecting the null hypothesis, but is also interested in assessing how strong the evidence in the data against H0 is. For example, as we have seen in Example 4, based on the general procedure of testing a hypothesis, we tested the null hypothesis
H0 : µ ≤ 50 gm against H1 : µ > 50 gm
To test the null hypothesis, we calculated the value of the test statistic as 2.78 and the critical value (zα) at α = 0.01 was zα = 2.33. Since the calculated value of the test statistic (= 2.78) is greater than the critical (tabulated) value (= 2.33), we reject the null hypothesis at 1% level of significance.
Now, if we reject the null hypothesis at this level (1%), surely we have to reject it at higher levels, because at α = 0.05, zα = 1.645 and at α = 0.10, zα = 1.28, and the calculated value of the test statistic is much higher than 1.645 and 1.28. Therefore, the question arises: “could the null hypothesis also be rejected at values of α smaller than 0.01?” The answer is “yes”, and we can compute the smallest level of significance (α) at which a null hypothesis can be rejected.
This smallest level of significance (α) is known as “p-value”.
The p-value is the smallest value of the level of significance (α) at which a null hypothesis can be rejected using the obtained value of the test statistic, and can be defined as:
The p-value is the probability of obtaining a test statistic equal to or more extreme (in the direction of supporting H1) than the actual value obtained when the null hypothesis is true.
The p-value also depends on the type of the test. If the test is one-tailed then the p-value is defined as:
For right-tailed test:
p-value = P[Test Statistic (T) ≥ observed value of the test statistic]
For left-tailed test:
p-value = P[Test Statistic (T) ≤ observed value of the test statistic]
If the test is two-tailed then the p-value is defined as:
For two-tailed test:
p-value = 2P[|T| ≥ |observed value of the test statistic|]
Procedure of taking the decision about the null hypothesis on the basis of
p-value:
To take the decision about the null hypothesis based on the p-value, the p-value is compared with the level of significance (α); if the p-value is less than or equal to α then we reject the null hypothesis, and if the p-value is greater than α we do not reject the null hypothesis.
The p-value for various tests can be obtained with the help of the tables given in the Appendix of Block 1 of this course. But unless we are dealing with the standard normal distribution, the exact p-value cannot be obtained from the tables mentioned above. However, if we test our hypothesis with the help of computer packages or software such as SPSS, SAS, MINITAB, STATA, EXCEL, etc. then these packages present the p-value as part of the output for each hypothesis testing procedure. Therefore, in this block we will describe the procedure of taking the decision about the null hypothesis on the basis of the critical value as well as the p-value concept.
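As a sketch of how these rules translate into computation (our own illustration, assuming the test statistic follows the standard normal distribution and scipy is available):

# p-values for a Z statistic; z = 2.78 is the value from Example 4.
from scipy.stats import norm

z = 2.78
p_right = 1 - norm.cdf(z)           # right-tailed: P[Z >= z]
p_left = norm.cdf(z)                # left-tailed:  P[Z <= z]
p_two = 2 * (1 - norm.cdf(abs(z)))  # two-tailed:   2P[|Z| >= |z|]

print(round(p_right, 4))  # about 0.0027 <= alpha = 0.01, so H0 is rejected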
9.9 RELATION BETWEEN CONFIDENCE INTERVAL AND TESTING OF HYPOTHESIS
In Units 7 and 8 of this course, we have learned about confidence intervals
which were used to estimate the unknown population parameters with certain
confidence. In Section 9.7, we have discussed the general procedure of testing
a hypothesis which has been used in making decision about the specified/
assumed/ hypothetical values of population parameters. Both confidence
interval and hypothesis testing have been used for different purposes but have
been based on the same set of concepts. Therefore, there is an extremely close
relationship between confidence interval and hypothesis testing.
In a confidence interval, if we construct a (1 − α)100% confidence interval for an unknown parameter, then this interval contains all probable values of the parameter being estimated, and relatively improbable values are not contained in this interval.
So this concept can also be used in hypothesis testing. For this, we construct an appropriate (1 − α)100% confidence interval for the parameter specified by the null hypothesis, as we have discussed in Units 7 and 8 of this course; if the value of the parameter specified by the null hypothesis lies in this confidence interval then we do not reject the null hypothesis, and if this specified value does not lie in this confidence interval then we reject the null hypothesis. Therefore, three cases may arise:
Case I: If we want to test the null hypothesis H0 : θ = θ0 against the alternative hypothesis H1 : θ ≠ θ0 at 5% or 1% level of significance, then we construct a two-sided (1 − α)100% (= 95% or 99%) confidence interval for the parameter θ. And we have 95% or 99% (as the case may be) confidence that this interval will include the parameter value θ0. If the value of the parameter specified by the null hypothesis, i.e. θ0, lies in this confidence interval then we do not reject the null hypothesis, otherwise we reject the null hypothesis.
Case II: If we want to test the null hypothesis H0 : θ ≤ θ0 against the alternative
hypothesis H1: θ > θ0 then we construct the lower one-sided
confidence bound for parameter θ. If the value of the parameter
specified by the null hypothesis i.e. θ0 is greater than or equal to this
lower bound then we do not reject the null hypothesis otherwise we
reject the null hypothesis.
Case III: If we want to test the null hypothesis H0: θ ≥ θ0 against the
alternative hypothesis H1: θ < θ0 then we construct the upper one-
sided confidence bound for parameter θ. If the value of the parameter
specified by the null hypothesis i.e. θ0 is less than or equal to this
upper bound then we do not reject the null hypothesis otherwise we
reject the null hypothesis.
For example, referring back to Example 4 of this unit, here we want to test the null hypothesis
H0 : µ ≤ 50 gm against H1 : µ > 50 gm
This was tested with the help of the test statistic
T = (X̄ − µ) / (σ/√n)   [Here, θ = µ]
and we rejected the null hypothesis at α = 0.01.
This problem could also have been solved by obtaining a confidence interval estimate of the population mean, which is described in Section 7.4 of Unit 7.
Here, we are given that
n = 50, X̄ = 52 and σ = 5.1
Since the alternative hypothesis is right-tailed, we construct a lower one-sided confidence bound for the population mean.
Since the population variance is known, the lower one-sided (1 − α)100% confidence bound for the population mean when the population variance is known is given by
X̄ − zα σ/√n
Since we test our null hypothesis at α = 0.01, we construct the 99% lower confidence bound, and for α = 0.01 we have zα = z0.01 = 2.33.
Thus, the lower one-sided 99% confidence bound for the average weight of potatoes is
52 − 2.33 × (5.1/√50) = 52 − 1.68 = 50.32
Since the value of the parameter specified by the null hypothesis, i.e. µ = 50, is less than the lower bound for the average weight of a potato, we reject the null hypothesis.
Thus, we can use three approaches (critical value, p-value and confidence interval) for taking the decision about the null hypothesis.
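A small sketch of this third approach in Python (our own illustration, with the numbers of Example 4):

# Lower one-sided 99% confidence bound for the population mean.
import math
from scipy.stats import norm

n, x_bar, sigma, mu_0, alpha = 50, 52, 5.1, 50, 0.01

lower = x_bar - norm.ppf(1 - alpha) * sigma / math.sqrt(n)  # about 50.32

# Reject H0: mu <= 50 when the value specified by H0 lies below the bound
print(lower, "Reject H0" if mu_0 < lower else "Do not reject H0")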
With this we end this unit. Let us summarise what we have discussed in this
unit.
9.10 SUMMARY
In this unit, we have covered the following points:
1. Statistical hypothesis, null hypothesis, alternative hypothesis, simple &
composite hypotheses.
2. Type-I and Type-II errors.
3. Critical region.
4. One-tailed and two-tailed tests.
5. General procedure of testing a hypothesis.
6. Level of significance.
7. Concept of p-value.
8. Relation between confidence interval and testing of hypothesis.
9.11 SOLUTIONS / ANSWERS
E1) Here, we wish to test the claim of the company that the average life of
its car tyres is 50000 miles so
Claim: µ = 50000 miles and complement: µ ≠ 50000 miles
Since claim contains equality sign so we take claim as the null
hypothesis and complement as the alternative hypothesis i.e.
H0 :  = 50000 miles
H1 :   50000 miles
E2) (iii) Here, doctor wants to test whether new medicine is really more
effective for controlling blood pressure than old medicine so
Claim: µ1 > µ2 and complement: µ1 ≤ µ2
Since complement contains equality sign so we take complement as the
null hypothesis and claim as the alternative hypothesis i.e.
H0 : µ1 ≤ µ2
H1 : µ1 > µ2
(iv) Here, economist wants to test whether the variability in incomes
differ in two populations so
Claim: σ1² ≠ σ2² and complement: σ1² = σ2²
Since the complement contains the equality sign, we take the complement as the null hypothesis and the claim as the alternative hypothesis, i.e.
H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²
(v) Here, the psychologist wants to test whether the proportion of literates between two groups of people is the same, so
Claim: P1 = P2 and complement: P1 ≠ P2
Since claim contains equality sign so we take claim as the null
hypothesis and complement as the alternative hypothesis i.e.
H0 : P1 = P2
H1 : P1 ≠ P2
E3) Here, (i) and (vi) represent simple hypotheses because these hypotheses tell us the exact values of the parameter average weight of orange µ, as µ = 100 and µ = 130.
The rest, (ii), (iii), (iv), (v) and (vii), represent composite hypotheses because these hypotheses do not tell us the exact value of the parameter µ.
E4) Since the alternative hypothesis H1 : θ ≠ 60 is two-tailed, the critical region lies in two tails.
E5) Let A and B denote the number of white balls and black balls in the urn respectively. Further, let X be the number of white balls drawn among the two balls from the urn. Then we can take the null and alternative hypotheses as
H0 : A = 4 & B = 2 and H1 : A = 2 & B = 4
The critical region is given by
ω = {X : X < 2} = {X : X = 0, 1}
Thus,
α = P[Reject H0 when H0 is true]
  = P[X ∈ ω / H0] = P[X = 0 / H0] + P[X = 1 / H0]
  = (⁴C₀ × ²C₂)/⁶C₂ + (⁴C₁ × ²C₁)/⁶C₂ = (1 × 1)/15 + (4 × 2)/15 = 1/15 + 8/15
α = 9/15 = 3/5
Similarly,
β = P[Do not reject H0 when H1 is true]
  = P[X ∈ ω̄ / H1] = P[X = 2 / H1] = (²C₂ × ⁴C₀)/⁶C₂ = (1 × 1)/15 = 1/15
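These probabilities can be verified by doing the hypergeometric counting directly in Python (our own sketch; math.comb needs Python 3.8 or later):

# E5 check: drawing 2 balls without replacement from an urn of 6.
from math import comb

def p_x(x, white, black):
    # P[X = x white balls among the 2 drawn]
    return comb(white, x) * comb(black, 2 - x) / comb(white + black, 2)

alpha = p_x(0, 4, 2) + p_x(1, 4, 2)  # under H0: 4 white, 2 black -> 9/15
beta = p_x(2, 2, 4)                  # under H1: 2 white, 4 black -> 1/15
print(alpha, beta)                   # 0.6 0.0666...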
E6) Since level of significance is the probability of type-I error so in this
case level of significance is 0.05 or 5%.
E7) Here, the alternative hypothesis is two-tailed therefore, the test will be
two-tailed test.
E8) Whether the test of testing a hypothesis is one-tailed or two-tailed
depends on the alternative hypothesis. So correct option is (ii).
E9) First step in testing of hypothesis is to setup null and alternative
hypotheses.
UNIT 10 LARGE SAMPLE TESTS
Structure
10.1 Introduction
Objectives
10.2 Procedure of Testing of Hypothesis for Large Samples
10.3 Testing of Hypothesis for Population Mean Using Z-Test
10.4 Testing of Hypothesis for Difference of Two Population Means Using
Z-Test
10.5 Testing of Hypothesis for Population Proportion Using Z-Test
10.6 Testing of Hypothesis for Difference of Two Population Proportions
Using Z-Test
10.7 Testing of Hypothesis for Population Variance Using Z-Test
10.8 Testing of Hypothesis for Two Population Variances Using Z-Test
10.9 Summary
10.10 Solutions /Answers
10.1 INTRODUCTION
In previous unit, we have defined basic terms used in testing of hypothesis.
After providing you necessary material required for any test, we can move
towards discussing particular tests one by one. But before doing that let us tell
you the strategy we are adopting here.
First we categorise the tests under two heads:
 Large sample tests
 Small sample tests
After that, their unit-wise distribution is done. In this unit, we will discuss large sample tests, whereas in Units 11 and 12 we will discuss small sample tests. The tests which are described in these units are known as “parametric tests”.
Sometimes in our studies in the fields of economics, psychology, medicine, etc. we take a sample of objects / units / participants / patients, etc. such as 70, 500, 1000, 10,000, etc. This situation comes under the category of large samples.
As a thumb rule, a sample of size n is treated as a large sample only if it contains more than 30 units (or observations, n > 30). And we know that for a large sample (n > 30), one statistical fact is that almost all sampling distributions of the statistic(s) are closely approximated by the normal distribution. Therefore, the test statistic, which is a function of sample observations based on n > 30, can be assumed to follow the normal distribution approximately (or exactly).
But the story does not end here. There are some other issues which need to be taken care of. Some of these issues have been highlighted by making different cases in each test, as you will see when you go through Sections 10.3 to 10.8 of this unit.
This unit is divided into ten sections. Section 10.1 is introductory in nature.
General procedure of testing of hypothesis for large samples is described in
Section 10.2. In Section 10.3, testing of hypothesis for population mean is
discussed whereas in Section 10.4, testing of hypothesis for difference of two
population means with examples is described. Similarly, in Sections 10.5 and
10.6, testing of hypothesis for population proportion and difference of two
population proportions are explained respectively. Testing of hypothesis for
population variance and two population variances are described in Sections
10.7 and 10.8 respectively. Unit ends by providing summary of what we have
discussed in this unit in Section 10.9 and solution of exercises in Section 10.10.
Objectives
After studying this unit, you should be able to:
 judge for a given situation whether we should go for a large sample test or not;
 apply the Z-test for testing the hypothesis about the population mean and the difference of two population means;
 apply the Z-test for testing the hypothesis about the population proportion and the difference of two population proportions; and
 apply the Z-test for testing the hypothesis about the population variance and two population variances.
10.2 PROCEDURE OF TESTING OF HYPOTHESIS FOR LARGE SAMPLES
As we have described in the previous section, for large sample sizes (n > 30) one statistical fact is that almost all sampling distributions of the statistic(s) are closely approximated by the normal distribution. Therefore, when the sample size is large, one can apply normal distribution based test procedures to test the hypothesis.
In previous unit, we have given the procedure of testing of hypothesis in
general. Let us now discuss the procedure of testing a hypothesis for large
samples in particular.
Suppose X1, X2, …, Xn is a random sample of size n (> 30) selected from a population having unknown parameter θ, and we want to test the hypothesis about the hypothetical / claimed / assumed value θ0 of parameter θ. For this, a
test procedure is required. We discuss it step by step as follows:
Step I: First of all, we have to set up the null hypothesis H0 and alternative
hypothesis H1. Here, we want to test the hypothetical / claimed /
assumed value θ0 of parameter θ. So we can take the null and
alternative hypotheses as
H0 : θ = θ0 and H1 : θ ≠ θ0   [for two-tailed test]
or
H0 : θ ≤ θ0 and H1 : θ > θ0, or H0 : θ ≥ θ0 and H1 : θ < θ0   [for one-tailed test]
In case of comparing the same parameter of two populations of interest, say, θ1 and θ2, our null and alternative hypotheses would be
H0 : θ1 = θ2 and H1 : θ1 ≠ θ2   [for two-tailed test]
or
H0 : θ1 ≤ θ2 and H1 : θ1 > θ2, or H0 : θ1 ≥ θ2 and H1 : θ1 < θ2   [for one-tailed test]
Step II: After setting the null and alternative hypotheses, we have to choose the level of significance. Generally, it is taken as 5% or 1% (α = 0.05 or 0.01), and accordingly the rejection and non-rejection regions will be decided.
Step III: The third step is to determine an appropriate test statistic, say, Z in the case of large samples. Suppose Tn is the sample statistic (such as sample mean, sample proportion, sample variance, etc.) for the parameter θ. Then for testing the null hypothesis, the test statistic is given by
Z = (Tn − E(Tn)) / SE(Tn) = (Tn − E(Tn)) / √Var(Tn)
[We know that the SE of a statistic is the SD of the sampling distribution of that statistic, so SE(Tn) = SD(Tn) = √Var(Tn).]
where E(Tn) is the expectation (or mean) of Tn and Var(Tn) is the variance of Tn.
Step IV: As already mentioned, for large samples a statistical fact is that almost all sampling distributions of the statistic(s) are closely approximated by the normal distribution, whether the parent population is normal or non-normal. So the test statistic Z will be assumed to be approximately normally distributed with mean 0 and variance 1, i.e.
Z = (Tn − E(Tn)) / √Var(Tn) ~ N(0, 1)
By putting the values of Tn, E(Tn) and Var(Tn) in the above formula, we calculate the value of the test statistic Z. Let z be the calculated value of the test statistic Z.
Step V: After that, we obtain the critical (cut-off or tabulated) value(s) in the sampling distribution of the test statistic Z corresponding to the α assumed in Step II. These critical values are given in Table-I (Z-table) in the Appendix of Block 1 of this course, corresponding to different levels of significance (α). For convenience, some useful critical values at α = 0.01, 0.05 for the Z-test are given in Table 10.1 in this section. After that, we construct the rejection (critical) region of size α in the probability curve of the sampling distribution of the test statistic Z.
Step VI: Take the decision about the null hypothesis based on the calculated and critical values of the test statistic obtained in Step IV and Step V. Since the critical value depends upon the nature of the test, i.e. whether it is a one-tailed or a two-tailed test, the following cases arise:
In case of one-tailed test:
Case I: When H 0 :   0 and H 1 :   0 (right-tailed test)

In this case, the rejection (critical) region falls under the right tail of
the probability curve of the sampling distribution of test statistic Z.
Fig. 10.1
Suppose z is the critical value at  level of significance so entire
region greater than or equal to z is the rejection region and less than
z is the non-rejection region as shown in Fig. 10.1.
If z (calculated value) ≥ zα (tabulated value), which means the calculated value of the test statistic Z lies in the rejection region, then we reject the null hypothesis H0 at α level of significance. Therefore, we conclude that the sample data provide us sufficient evidence against the null hypothesis and there is a significant difference between the hypothesized or specified value and the observed value of the parameter.
If z < zα, which means the calculated value of the test statistic Z lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance. Therefore, we conclude that the sample data fail to provide us sufficient evidence against the null hypothesis and the difference between the hypothesized value and the observed value of the parameter is due to fluctuation of the sample, so the population parameter θ may be θ0.
Case II: When H0 : θ ≥ θ0 and H1 : θ < θ0 (left-tailed test)
In this case, the rejection (critical) region falls under the left tail of the probability curve of the sampling distribution of the test statistic Z. Suppose −zα is the critical value at α level of significance; then the entire region less than or equal to −zα is the rejection region and the region greater than −zα is the non-rejection region, as shown in Fig. 10.2.
Fig. 10.2
If z ≤ −zα, which means the calculated value of the test statistic Z lies in the rejection region, then we reject the null hypothesis H0 at α level of significance.
If z > −zα, which means the calculated value of the test statistic Z lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance.
In case of two-tailed test: When H0 : θ = θ0 and H1 : θ ≠ θ0
In this case, the rejection region falls under both tails of the probability curve of the sampling distribution of the test statistic Z. Half the area (α), i.e. α/2, will lie under the left tail and the other half under the right tail. Suppose −zα/2 and zα/2 are the two critical values at the left tail and right tail respectively. Therefore, the entire region less than or equal to −zα/2 and the entire region greater than or equal to zα/2 are the rejection regions, and the region between −zα/2 and zα/2 is the non-rejection region, as shown in Fig. 10.3.
Fig. 10.3
If z ≥ zα/2 or z ≤ −zα/2, which means the calculated value of the test statistic Z lies in the rejection region, then we reject the null hypothesis H0 at α level of significance.
If −zα/2 < z < zα/2, which means the calculated value of the test statistic Z lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance.
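The three cases above can be collected into one small decision routine; the following Python sketch (our own illustration, assuming scipy is available) is one way to do it:

# Generic Z-test decision for right-, left- and two-tailed alternatives.
from scipy.stats import norm

def z_decision(z, alpha=0.05, tail="two"):
    # returns True when H0 is rejected
    if tail == "right":                       # H1: theta > theta_0
        return z >= norm.ppf(1 - alpha)
    if tail == "left":                        # H1: theta < theta_0
        return z <= norm.ppf(alpha)
    return abs(z) >= norm.ppf(1 - alpha / 2)  # H1: theta != theta_0

print(z_decision(2.78, alpha=0.01, tail="right"))  # True -> reject H0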
Table 10.1 shows some commonly used critical (cut-off or tabulated) values for one-tailed and two-tailed tests at different levels of significance (α) for the Z-test.
Table 10.1: Critical Values for Z-test
Level of Significance (α)    Two-Tailed Test      One-Tailed Test (Right-Tailed)    One-Tailed Test (Left-Tailed)
α = 0.05 (= 5%)              ± zα/2 = ± 1.96      zα = 1.645                        − zα = − 1.645
α = 0.01 (= 1%)              ± zα/2 = ± 2.58      zα = 2.33                         − zα = − 2.33
Note 1: As we have discussed in Step IV of this procedure, when the sample size is large the test statistic follows the normal distribution whether the parent population is normal or non-normal, so we do not require any assumption about the form of the parent population for large sample sizes. But when the sample size is small (n < 30), then for applying a parametric test we must require the assumption that the population is normal, as we shall see in Units 11 and 12. If this assumption is not fulfilled then we apply the non-parametric tests, which will be discussed in Block 4 of this course.
Decision making procedure about the null hypothesis using the concept of
p-value:
To take the decision about the null hypothesis on the basis of the p-value, the p-value is compared with the given level of significance (α); if the p-value is less than or equal to α then we reject the null hypothesis, and if the p-value is greater than α we do not reject the null hypothesis.
Since the test statistic Z follows approximately the normal distribution with mean 0 and variance unity, i.e. the standard normal distribution, and we also know that the standard normal distribution is symmetrical about the Z = 0 line, if z represents the calculated value of Z then the p-value can be calculated as follows:
For one-tailed test:
For H1 : θ > θ0 (right-tailed test): p-value = P[Z ≥ z]
For H1 : θ < θ0 (left-tailed test): p-value = P[Z ≤ z]
For two-tailed test:
For H1 : θ ≠ θ0: p-value = 2P[Z ≥ |z|]
These p-values for the Z-test can be obtained with the help of Table-I
(Z-table) given in the Appendix at the end of Block 1 of this course (which
gives the probability P[0 ≤ Z ≤ z] for different values of z), as discussed
in Unit 14 of MST-003.
For example, if the test is right-tailed and the calculated value of the test
statistic Z is 1.23 then

p-value = P[Z ≥ z] = P[Z ≥ 1.23] = 0.5 − P[0 ≤ Z ≤ 1.23]

= 0.5 − 0.3907   [From the Z-table given in the Appendix of Block 1 of this course]

= 0.1093
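For readers who prefer software to the Z-table, the same p-values can be computed directly. Below is a minimal Python sketch (the function z_p_value is a name introduced here for illustration; the snippet assumes SciPy is available):

    from scipy.stats import norm

    def z_p_value(z, tail):
        """p-value of a Z-test; tail is 'right', 'left' or 'two'."""
        if tail == "right":
            return norm.sf(z)           # P[Z >= z]
        if tail == "left":
            return norm.cdf(z)          # P[Z <= z]
        return 2 * norm.sf(abs(z))      # 2 P[Z >= |z|]

    print(round(z_p_value(1.23, "right"), 4))   # 0.1093, matching the Z-table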
Now, you can try the following exercises.
E1) If an investigator observes that the calculated value of the test statistic
lies in the non-rejection region then he/she will
(i) reject the null hypothesis
(ii) accept the null hypothesis
(iii) not reject the null hypothesis
Write the correct option.
E2) If we have null and alternative hypotheses as
H0: θ = θ0 and H1: θ ≠ θ0
then the rejection (critical) region lies in
(i) left tail
(ii) right tail
(iii) both tails
Write the correct option.
E3) If test is two-tailed and calculated value of test statistic Z is 2.42 then
calculate the p-value for the Z-test.

10.3 TESTING OF HYPOTHESIS FOR POPULATION MEAN USING Z-TEST
In the previous section, we have discussed the general procedure for the
Z-test. Now we discuss the Z-test for testing a hypothesis or claim about the
population mean when the sample size is large. Let the population under study
have mean µ and variance σ², where µ is unknown and σ² may be known or
unknown. We will consider both cases under this heading. For testing a
hypothesis about the population mean we draw a random sample X1, X2, …, Xn of
size n ≥ 30 from this population. As we know, for drawing inference about the
population mean we generally use the sample mean, and for the test statistic
we require the mean and standard error of the sampling distribution of the
statistic (mean). Here, we are considering a large sample, so we know by the
central limit theorem that the sample mean is asymptotically normally
distributed with mean µ and variance σ²/n whether the parent population is
normal or non-normal. That is, if X̄ is the sample mean of the random sample
then

E(X̄) = µ,  Var(X̄) = σ²/n                                          … (1)

But we know that standard error = √Variance

∴ SE(X̄) = √Var(X̄) = σ/√n                                           … (2)
Now, follow the same procedure as we have discussed in the previous section;
that is, first of all we have to set up the null and alternative hypotheses.
Since here we want to test the hypothesis about the population mean, we can
take the null and alternative hypotheses as

H0 : µ = µ0 and H1 : µ ≠ µ0                  [for two-tailed test]

or

H0 : µ ≥ µ0 and H1 : µ < µ0
H0 : µ ≤ µ0 and H1 : µ > µ0                  [for one-tailed test]

[Here, θ = µ and θ0 = µ0 if we compare it with the general procedure.]
For testing the null hypothesis, the test statistic Z is given by

Z = [X̄ − E(X̄)] / SE(X̄)

or

Z = (X̄ − µ0) / (σ/√n)      [Using equations (1) and (2); under H0 we assume that µ = µ0.]

[When we assume that the null hypothesis is true, we are actually assuming
that the population parameter is equal to the value in the null hypothesis.
For example, we assume that µ = 60 whether the null hypothesis is µ = 60 or
µ ≤ 60 or µ ≥ 60.]
The sampling distribution of the test statistic depends upon whether σ² is
known or unknown. Therefore, two cases arise:

Case I: When σ² is known

In this case, the test statistic follows the normal distribution with mean 0
and variance unity when the sample size is large, whether the population
under study is normal or non-normal. If the sample size is small then the
test statistic Z follows the normal distribution only when the population
under study is normal. Thus,

Z = (X̄ − µ0) / (σ/√n) ~ N(0, 1)
Case II: When σ² is unknown

In this case, we estimate σ² by the value of the sample variance S², where

S² = (1/(n − 1)) Σ (Xi − X̄)²,  the sum running over i = 1, 2, …, n.

Then the test statistic (X̄ − µ0)/(S/√n) follows the t-distribution with
(n − 1) df whether the sample size is large or small, provided the population
under study is normal, as we have discussed in Unit 2 of this course. But
when the population under study is not normal and the sample size is large
then this test statistic approximately follows the normal distribution with
mean 0 and variance unity, that is,

Z = (X̄ − µ0) / (S/√n) ~ N(0, 1)
After that, we calculate the value of the test statistic as the case may be
(σ² known or unknown) and compare it with the critical value given in
Table 10.1 at the prefixed level of significance α. Take the decision about
the null hypothesis as described in the previous section.
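The test just described is easy to code. Below is a minimal Python sketch (the helper z_test_mean is a name introduced here for illustration and assumes SciPy is available); pass the known population SD σ as sd, or the sample SD S when σ is unknown and n is large:

    import math
    from scipy.stats import norm

    def z_test_mean(xbar, mu0, sd, n, tail="two"):
        """Return (z, p_value) for H0: mu = mu0 against the chosen tail."""
        z = (xbar - mu0) / (sd / math.sqrt(n))
        if tail == "right":
            return z, norm.sf(z)
        if tail == "left":
            return z, norm.cdf(z)
        return z, 2 * norm.sf(abs(z))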
From the above discussion of testing of hypothesis about the population mean,
we note the following points:

(i) When σ² is known then we apply the Z-test whether the population under
study is normal or non-normal, provided the sample is large. But when the
sample size is small then we apply the Z-test only when the population under
study is normal.

(ii) When σ² is unknown then we apply the t-test only when the population
under study is normal, whether the sample size is large or small. But when
the assumption of normality is not fulfilled and the sample size is large
then we can apply the Z-test.

(iii) When the sample is small, σ² is known or unknown and the form of the
population is not known, then we apply a non-parametric test, as will be
discussed in Block 4 of this course.
Following examples will help you to understand the procedure more clearly.
Example 1: A light bulb company claims that the 100-watt light bulb it sells
has an average life of 1200 hours with a standard deviation of 100 hours. For
testing the claim, 50 new bulbs were selected randomly and allowed to burn
out. The average lifetime of these bulbs was found to be 1180 hours. Is the
company’s claim true at 5% level of significance?
Solution: Here, we are given that
Specified value of population mean = 0 = 1200 hours,
Population standard deviation = σ = 100 hours,
Sample size = n = 50
Sample mean = X = 1180 hours.
In this example, the population parameter being tested is population mean i.e.
average life of a bulb (µ) and we want to test the company’s claim that average
life of a bulb is 1200 hours. So our claim is  = 1200 and its complement is
 ≠ 1200. Since claim contains the equality sign so we can take the claim as the
null hypothesis and complement as the alternative hypothesis. So
H0 : µ = µ0 = 1200   [average life of a bulb is 1200 hours]
H1 : µ ≠ 1200        [average life of a bulb is not 1200 hours]
Also the alternative hypothesis is two-tailed so the test is two-tailed test.
Here, we want to test the hypothesis regarding mean when population SD
(variance) is known and sample size n = 50(> 30) is large. So we will go for
Z-test.
Thus, for testing the null hypothesis the test statistic is given by

Z = (X̄ − µ0) / (σ/√n)

= (1180 − 1200) / (100/√50) = −20/14.14 = −1.41
The critical (tabulated) values for two-tailed test at 5% level of
significance are ± zα/2 = ± z0.025 = ± 1.96.

Since the calculated value of the test statistic Z (= −1.41) is greater than
the critical value (= −1.96) and less than the critical value (= 1.96), that
means it lies in the non-rejection region as shown in Fig. 10.4, so we do not
reject the null hypothesis. Since the null hypothesis is the claim, we
support the claim at 5% level of significance.
Decision according to p-value:

The test is two-tailed, therefore,

p-value = 2P[Z ≥ |z|] = 2P[Z ≥ 1.41] = 2[0.5 − P(0 ≤ Z ≤ 1.41)] = 2(0.5 − 0.4207) = 0.1586

Since the p-value (= 0.1586) is greater than α (= 0.05) we do not reject the
null hypothesis at 5% level of significance.
Decision according to confidence interval:

Here, the test is two-tailed, therefore we construct a two-sided confidence
interval for the population mean.

Since the population standard deviation is known, we can use the (1 − α)100%
confidence interval for the population mean when the population variance is
known, which is given by

[X̄ − zα/2 σ/√n , X̄ + zα/2 σ/√n]

Here, α = 0.05, so we have zα/2 = z0.025 = 1.96.

Thus, the 95% confidence interval for the average life of a bulb is given by

[1180 − 1.96 × 100/√50 , 1180 + 1.96 × 100/√50]

or [1180 − 27.71, 1180 + 27.71]

or [1152.29, 1207.71]

Since the 95% confidence interval for the average life of a bulb contains the
value of the parameter specified by the null hypothesis, that is,
µ = µ0 = 1200, we do not reject the null hypothesis.
Thus, we conclude that the sample does not provide us sufficient evidence
against the claim, so we may assume that the company’s claim that the average
life of a bulb is 1200 hours is true.
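As a quick software check, Example 1's data can be run through the hypothetical z_test_mean helper sketched before the worked examples:

    z, p = z_test_mean(xbar=1180, mu0=1200, sd=100, n=50, tail="two")
    print(round(z, 2), round(p, 3))   # -1.41 and about 0.157
    # (The hand computation, with z rounded to 1.41, gave 0.1586.)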
Note 2: Here, we note that the decisions about the null hypothesis based on
the three approaches (critical value or classical, p-value and confidence
interval) are the same. The learners are advised to make the decision about
the claim or statement by using only one of the three approaches in the
examination. Here, we used all these approaches only to give you an idea how
they can be used in a given problem. Those learners who opt for the
biostatistics specialisation will see and realise the importance of the
confidence interval approach in Unit 16 of MSTE-004.
Example 2: A manufacturer of ball point pens claims that a certain pen
manufactured by him has a mean writing-life of at least 460 A-4 size pages. A
purchasing agent selects a sample of 100 pens and puts them on test. The mean
writing-life of the sample was found to be 453 A-4 size pages with standard
deviation 25 A-4 size pages. Should the purchasing agent reject the
manufacturer’s claim at 1% level of significance?
Solution: Here, we are given that
Specified value of population mean = 0 = 460,

Sample size = n = 100,
Sample mean = X = 453,
Sample standard deviation = S = 25
Here, we want to test the manufacturer’s claim that the mean writing-life (µ)
of a pen is at least 460 A-4 size pages. So our claim is µ ≥ 460 and its
complement is µ < 460. Since the claim contains the equality sign, we can
take the claim as the null hypothesis and the complement as the alternative
hypothesis. So

H0 : µ ≥ µ0 = 460 and H1 : µ < 460

Also, the alternative hypothesis is left-tailed, so the test is left-tailed test.

Here, we want to test the hypothesis regarding the population mean when the
population SD is unknown. So we should use the t-test if the writing-life of
a pen is known to follow the normal distribution. But that is not the case
here. Since the sample size n = 100 (> 30) is large, we go for the Z-test.
The test statistic of the Z-test is given by
Z = (X̄ − µ0) / (S/√n)

= (453 − 460) / (25/√100) = −7/2.5 = −2.8
From Table 10.1, the critical (tabulated) value for left-tailed Z-test at 1%
level of significance is −zα = −2.33.

Since the calculated value of the test statistic Z (= −2.8) is less than the
critical value (= −2.33), that means the calculated value of the test
statistic Z lies in the rejection region as shown in Fig. 10.5, so we reject
the null hypothesis. Since the null hypothesis is the claim, we reject the
manufacturer’s claim at 1% level of significance.
Decision according to p-value:

The test is left-tailed, therefore,

p-value = P[Z ≤ z] = P[Z ≤ −2.8] = P[Z ≥ 2.8]   [Z is symmetrical about the Z = 0 line]

= 0.5 − P[0 ≤ Z ≤ 2.8] = 0.5 − 0.4974 = 0.0026

Since the p-value (= 0.0026) is less than α (= 0.01) we reject the null
hypothesis at 1% level of significance.
Therefore, we conclude that the sample provides us sufficient evidence
against the claim, so the purchasing agent rejects the manufacturer’s claim
at 1% level of significance.
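Again as a software check, reusing the hypothetical z_test_mean helper sketched earlier in this section:

    z, p = z_test_mean(xbar=453, mu0=460, sd=25, n=100, tail="left")
    print(round(z, 1), round(p, 4))   # -2.8 and 0.0026, as obtained by hand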
Now, you can try the following exercises.
E4) A sample of 900 bolts has a mean length 3.4 cm. Can the sample be
regarded as taken from a large population of bolts with mean length 3.25 cm
and standard deviation 2.61 cm at 5% level of significance?
E5) A big company uses thousands of CFL lights every year. The brand that
the company has been using in the past has an average life of 1200 hours. A
new brand is offered to the company at a price lower than they are paying
for the old brand. Consequently, a sample of 100 CFL lights of the new brand
is tested, which yields an average life of 1220 hours with standard
deviation 90 hours. Should the company accept the new brand at 5% level
of significance?

10.4 TESTING OF HYPOTHESIS FOR DIFFERENCE OF TWO POPULATION MEANS USING Z-TEST
In the previous section, we have learnt about testing of hypothesis about the
population mean. But there are many situations where we want to test a
hypothesis about the difference of two population means or about two
population means. For example, two companies manufacture the same type of
bulbs and one may be interested to test whether one is better than the other;
an investigator may want to test the equality of the average incomes of the
people living in two cities; etc. Therefore, we require an appropriate test
for testing the hypothesis about the difference of two population means.
Let there be two populations, say, population-I and population-II under
study. Also let µ1, µ2 and σ1², σ2² denote the means and variances of
population-I and population-II respectively, where both µ1 and µ2 are unknown
but σ1² and σ2² may be known or unknown. We will consider all possible cases
here. For testing the hypothesis about the difference of two population
means, we draw a random sample of large size n1 from population-I and a
random sample of large size n2 from population-II. Let X̄ and Ȳ be the means
of the samples selected from population-I and population-II respectively.
These two populations may or may not be normal but, according to the central
limit theorem, the sampling distribution of the difference of two large
sample means is asymptotically normal with mean (µ1 − µ2) and variance
(σ1²/n1 + σ2²/n2), as described in Unit 2 of this course.

Thus,

E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = µ1 − µ2                                  … (3)

and

Var(X̄ − Ȳ) = Var(X̄) + Var(Ȳ) = σ1²/n1 + σ2²/n2

But we know that standard error = √Variance

∴ SE(X̄ − Ȳ) = √Var(X̄ − Ȳ) = √(σ1²/n1 + σ2²/n2)                     … (4)
Now, follow the same procedure as we have discussed in Section 10.2; that is,
first of all we have to set up the null and alternative hypotheses. Here, we
want to test the hypothesis about the difference of two population means, so
we can take the null hypothesis as

H0 : µ1 = µ2 (no difference in means)

or H0 : µ1 − µ2 = 0 (difference in two means is 0)

[Here, θ1 = µ1 and θ2 = µ2 if we compare it with the general procedure.]

and the alternative hypothesis as

H1 : µ1 ≠ µ2                                 [for two-tailed test]

or

H0 : µ1 ≥ µ2 and H1 : µ1 < µ2
H0 : µ1 ≤ µ2 and H1 : µ1 > µ2                [for one-tailed test]
For testing the null hypothesis, the test statistic Z is given by

Z = [(X̄ − Ȳ) − E(X̄ − Ȳ)] / SE(X̄ − Ȳ)

or

Z = [(X̄ − Ȳ) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)     [using equations (3) and (4)]

Since under the null hypothesis we assume that µ1 = µ2, we have

Z = (X̄ − Ȳ) / √(σ1²/n1 + σ2²/n2)

Now, the sampling distribution of the test statistic depends upon whether σ1²
and σ2² are known or unknown. Therefore, four cases arise:
Case I: When σ1² & σ2² are known and σ1² = σ2² = σ²

In this case, the test statistic follows the normal distribution with mean 0
and variance unity when the sample sizes are large, whether the populations
under study are normal or non-normal. But when the sample sizes are small
then the test statistic Z follows the normal distribution only when the
populations under study are normal, that is,

Z = (X̄ − Ȳ) / (σ √(1/n1 + 1/n2)) ~ N(0, 1)
Case II: When σ1² & σ2² are known and σ1² ≠ σ2²

In this case, the test statistic also follows the normal distribution as
described in Case I, that is,

Z = (X̄ − Ȳ) / √(σ1²/n1 + σ2²/n2) ~ N(0, 1)
Case III: When σ1² & σ2² are unknown and σ1² = σ2² = σ²

In this case, σ² is estimated by the value of the pooled sample variance Sp²,
where

Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)

and

S1² = (1/(n1 − 1)) Σ (Xi − X̄)²  and  S2² = (1/(n2 − 1)) Σ (Yi − Ȳ)²,

the sums running over the n1 and n2 sample observations respectively. The
test statistic then follows the t-distribution with (n1 + n2 − 2) degrees of
freedom whether the sample sizes are large or small, provided the populations
under study follow the normal distribution, as described in Unit 2 of this
course. But when the populations under study are not normal and the sample
sizes n1 and n2 are large (> 30) then, by the central limit theorem, the test
statistic is approximately normally distributed with mean 0 and variance
unity, that is,

Z = (X̄ − Ȳ) / (Sp √(1/n1 + 1/n2)) ~ N(0, 1)
Case IV: When σ1² & σ2² are unknown and σ1² ≠ σ2²

In this case, σ1² & σ2² are estimated by the values of the sample variances
S1² & S2² respectively, and the exact distribution of the test statistic is
difficult to derive. But when the sample sizes n1 and n2 are large (> 30)
then, by the central limit theorem, the test statistic is approximately
normally distributed with mean 0 and variance unity, that is,

Z = (X̄ − Ȳ) / √(S1²/n1 + S2²/n2) ~ N(0, 1)
After that, we calculate the value of the test statistic and compare it with
the critical value given in Table 10.1 at the prefixed level of significance
α. Take the decision about the null hypothesis as described in Section 10.2.
From the above discussion of testing of hypothesis about the difference of
two population means, we note the following points:

(i) When σ1² & σ2² are known then we apply the Z-test whether the populations
under study are normal or non-normal, provided the samples are large. But
when the sample sizes are small then we apply the Z-test only when the
populations under study are normal.

(ii) When σ1² & σ2² are unknown then we apply the t-test only when the
populations under study are normal, whether the sample sizes are large or
small. But when the assumption of normality is not fulfilled and the sample
sizes are large then we can apply the Z-test.

(iii) When the samples are small, σ1² & σ2² are known or unknown and the form
of the populations is not known, then we apply a non-parametric test, as will
be discussed in Block 4 of this course.
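Case IV is the one most often needed in practice with large samples. A minimal Python sketch of it (the helper z_test_two_means is a name introduced here for illustration; SciPy is assumed available):

    import math
    from scipy.stats import norm

    def z_test_two_means(xbar, ybar, s1, s2, n1, n2, tail="two"):
        """Return (z, p_value) for H0: mu1 = mu2 with large samples."""
        z = (xbar - ybar) / math.sqrt(s1**2 / n1 + s2**2 / n2)
        if tail == "right":
            return z, norm.sf(z)
        if tail == "left":
            return z, norm.cdf(z)
        return z, 2 * norm.sf(abs(z))

    # Example 4 below: z ≈ 2.23, p ≈ 0.026, matching the hand computation
    print(z_test_two_means(80.4, 74.3, 12.8, 20.5, 50, 100))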
Let us do some examples based on the above test.
Example 3: In two samples of women from Punjab and Tamilnadu, the mean
heights of 1000 and 2000 women are 67.6 and 68.0 inches respectively. If the
population standard deviations of Punjab and Tamilnadu are the same and equal
to 5.5 inches then can the mean heights of Punjab and Tamilnadu women be
regarded as same at 1% level of significance?

Solution: We are given

n1 = 1000, n2 = 2000, X̄ = 67.6, Ȳ = 68.0 and σ1 = σ2 = σ = 5.5
Here, we wish to test that the mean heights of Punjab and Tamilnadu women are
the same. If µ1 and µ2 denote the mean heights of Punjab and Tamilnadu women
respectively, then our claim is µ1 = µ2 and its complement is µ1 ≠ µ2. Since
the claim contains the equality sign, we can take the claim as the null
hypothesis and the complement as the alternative hypothesis. Thus,

H0 : µ1 = µ2 and H1 : µ1 ≠ µ2
Since the alternative hypothesis is two-tailed so the test is two-tailed test.
Here, we want to test the hypothesis regarding two population means. The
standard deviations of both populations are known and sample sizes are large,
so we should go for Z-test.
So, for testing the null hypothesis, the test statistic Z is given by

Z = (X̄ − Ȳ) / √(σ1²/n1 + σ2²/n2)

= (67.6 − 68.0) / √((5.5)²/1000 + (5.5)²/2000)

= −0.4 / (5.5 √(1/1000 + 1/2000))

= −0.4 / (5.5 × 0.0387) = −1.88
The critical (tabulated) values for two-tailed test at 1% level of
significance are ± zα/2 = ± z0.005 = ± 2.58.

Since the calculated value of Z (= −1.88) is greater than the critical value
(= −2.58) and less than the critical value (= 2.58), that means it lies in
the non-rejection region as shown in Fig. 10.6, so we do not reject the null
hypothesis, i.e. we fail to reject the claim.
Decision according to p-value:

The test is two-tailed, therefore,

p-value = 2P[Z ≥ |z|] = 2P[Z ≥ 1.88] = 2[0.5 − P(0 ≤ Z ≤ 1.88)] = 2(0.5 − 0.4699) = 0.0602

Since the p-value (= 0.0602) is greater than α (= 0.01) we do not reject the
null hypothesis at 1% level of significance.
Thus, we conclude that the samples do not provide us sufficient evidence
against the claim so we may assume that the average height of women of
Punjab and Tamilnadu is same.
Example 4: A university conducts both face-to-face and distance-mode classes
for a particular course, intended to be identical. A sample of 50 students of
face-to-face mode yields examination results with mean and SD respectively as:

X̄ = 80.4, S1 = 12.8

and another sample of 100 distance-mode students yields mean and SD of their
examination results in the same course respectively as:

Ȳ = 74.3, S2 = 20.5

Are both educational methods statistically equal at 5% level?
Solution: Here, we are given that

n1 = 50, X̄ = 80.4, S1 = 12.8;
n2 = 100, Ȳ = 74.3, S2 = 20.5

We wish to test that both educational methods are statistically equal. If µ1
and µ2 denote the average marks of face-to-face and distance-mode students
respectively, then our claim is µ1 = µ2 and its complement is µ1 ≠ µ2. Since
the claim contains the equality sign, we can take the claim as the null
hypothesis and the complement as the alternative hypothesis. Thus,

H0 : µ1 = µ2 and H1 : µ1 ≠ µ2

Since the alternative hypothesis is two-tailed, the test is two-tailed test.

We want to test the null hypothesis regarding two population means when the
standard deviations of both populations are unknown. So we should go for the
t-test if the populations are known to be normal. But that is not the case
here. Since the sample sizes are large (n1 and n2 > 30) we go for the Z-test.
For testing the null hypothesis, the test statistic Z is given by

Z = (X̄ − Ȳ) / √(S1²/n1 + S2²/n2)

= (80.4 − 74.3) / √((12.8)²/50 + (20.5)²/100)

= 6.1 / √(3.28 + 4.20) = 2.23
The critical (tabulated) values for two-tailed test at 5% level of
significance are ± zα/2 = ± z0.025 = ± 1.96.

Since the calculated value of Z (= 2.23) is greater than the critical value
(= 1.96), that means it lies in the rejection region as shown in Fig. 10.7,
so we reject the null hypothesis, i.e. we reject the claim at 5% level of
significance.
Decision according to p-value:

The test is two-tailed, therefore,

p-value = 2P[Z ≥ |z|] = 2P[Z ≥ 2.23] = 2[0.5 − P(0 ≤ Z ≤ 2.23)] = 2(0.5 − 0.4871) = 0.0258

Since the p-value (= 0.0258) is less than α (= 0.05) we reject the null
hypothesis at 5% level of significance.
Thus, we conclude that samples provide us sufficient evidence against the
claim so both methods of education, i.e. face-to-face and distance-mode, are
not statistically equal.
Now, you can try the following exercises.
E6) Two brands of electric bulbs are quoted at the same price. A buyer tested
a random sample of 200 bulbs of each brand and found the following
information:

                 Mean Life (hrs.)   SD (hrs.)
    Brand A           1300             41
    Brand B           1280             46

Is there a significant difference in the mean duration of the lives of the
two brands of electric bulbs at 1% level of significance?
E7) Two research laboratories have identically produced drugs that provide
relief to BP patients. The first drug was tested on a group of 50 BP
patients and produced an average 8.3 hours of relief with a standard
deviation of 1.2 hours. The second drug was tested on 100 patients,
producing an average of 8.0 hours of relief with a standard deviation of
1.5 hours. Does the first drug provide a significant longer period of relief
at a significant level of 5%?

10.5 TESTING OF HYPOTHESIS FOR POPULATION PROPORTION USING Z-TEST
In Section 10.3, we discussed the procedure of testing of hypothesis for the
population mean when the sample size is large. But in many real world
situations, in business and other areas, the collected data are in the form
of counts or are classified into two categories or groups according to an
attribute or characteristic. For example, the people living in a colony may
be classified into two groups (male and female) with respect to the
characteristic sex; the patients in a hospital may be classified into two
groups as cancer and non-cancer patients; a lot of articles may be classified
as defective and non-defective; etc. Here, the collected data are available
as dichotomous or binary outcomes, which is a special case of the nominal
scale, and the data are categorised into two mutually exclusive and
exhaustive classes, generally known as success and failure outcomes. For
example, the characteristic sex can be measured as success if male and
failure if female, or vice versa. So in such situations, the proportion is a
suitable measure to apply.

In such situations, we require a test for testing a hypothesis about the
population proportion.
For this purpose, let X1, X2, ..., Xn be a random sample of size n taken from
a population with population proportion P. Also let X denote the number of
observations or elements possessing a certain attribute (number of successes)
out of the n observations of the sample; then the sample proportion p can be
defined as

p = X / n

As we have seen in Section 2.4 of Unit 2 of this course, the mean and
variance of the sampling distribution of the sample proportion are

E(p) = P and Var(p) = PQ/n

where Q = 1 − P.
Now, two cases arise:

Case I: When the sample size is not sufficiently large, i.e. either of the
conditions np > 5 or nq > 5 does not hold, then we use the exact binomial
test. But the exact binomial test is beyond the scope of this course.
Case II: When the sample size is sufficiently large, such that np > 5 and
nq > 5, then by the central limit theorem the sampling distribution of the
sample proportion p is approximately normal with mean and variance

E(p) = P and Var(p) = PQ/n                                         … (5)

But we know that standard error = √Variance

∴ SE(p) = √(PQ/n)                                                  … (6)
Now, follow the same procedure as we have discussed in Section 10.2; first of
all we set up the null and alternative hypotheses. Since here we want to test
the hypothesis about a specified value P0 of the population proportion, we
can take the null and alternative hypotheses as

H0 : P = P0 and H1 : P ≠ P0                  [for two-tailed test]

or

H0 : P ≥ P0 and H1 : P < P0
H0 : P ≤ P0 and H1 : P > P0                  [for one-tailed test]

[Here, θ = P and θ0 = P0 if we compare it with the general procedure.]
For testing the null hypothesis, the test statistic Z is given by

Z = [p − E(p)] / SE(p)

Z = (p − P0) / √(P0Q0/n) ~ N(0, 1) under H0     [using equations (5) and (6)]
After that, we calculate the value of test statistic and compare it with the
critical value(s) given in Table 10.1 at prefixed level of significance α. Take
the decision about the null hypothesis as described in Section 10.2.
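A minimal Python sketch of this one-proportion Z-test (the helper z_test_prop is a name introduced here for illustration; SciPy is assumed available):

    import math
    from scipy.stats import norm

    def z_test_prop(x, n, p0, tail="two"):
        """Return (z, p_value) for H0: P = p0; needs np > 5 and nq > 5."""
        p_hat = x / n
        z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
        if tail == "right":
            return z, norm.sf(z)
        if tail == "left":
            return z, norm.cdf(z)
        return z, 2 * norm.sf(abs(z))

    # Example 5 below: 35 defectives out of 100 against P0 = 0.25, right-tailed
    print(z_test_prop(35, 100, 0.25, tail="right"))   # z ≈ 2.31, p ≈ 0.010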
Let us do some examples of testing of hypothesis about population proportion.
Example 5: A machine produces a large number of items out of which 25%
are found to be defective. To check this, company manager takes a random
sample of 100 items and found 35 items defective. Is there an evidence of more
deterioration of quality at 5% level of significance?
Solution: The company manager wants to check whether his machine produces 25%
defective items. Here, the attribute under study is defectiveness, and we
define our success and failure as getting a defective or a non-defective item
respectively.

Let P = population proportion of defective items = 0.25 (= P0)

p = observed proportion of defective items in the sample = 35/100 = 0.35

Here, we want to test that the machine produces more defective items, that
is, the proportion of defective items (P) is greater than 0.25. So our claim
is P > 0.25
and its complement is P ≤ 0.25. Since the complement contains the equality sign so
we can take the complement as the null hypothesis and the claim as the
alternative hypothesis. So
H0 : P ≤ P0 = 0.25 and H1 : P > 0.25

Since the alternative hypothesis is right-tailed, the test is right-tailed test.

Before proceeding further, first we have to check whether the condition of
normality meets or not:

np = 100 × 0.35 = 35 > 5
nq = 100 × (1 − 0.35) = 100 × 0.65 = 65 > 5

We see that the condition of normality meets, so we can go for the Z-test.

So, for testing the null hypothesis, the test statistic Z is given by
Z = (p − P0) / √(P0Q0/n)

= (0.35 − 0.25) / √(0.25 × 0.75 / 100) = 0.10/0.0433 = 2.31
Since the test is right-tailed, the critical value at 5% level of
significance is zα = z0.05 = 1.645.

Since the calculated value of the test statistic Z (= 2.31) is greater than
the critical value (= 1.645), that means it lies in the rejection region as
shown in Fig. 10.8. So we reject the null hypothesis and support the
alternative hypothesis, i.e. we support the claim at 5% level of significance.
Decision according to p-value:

The test is right-tailed, therefore,

p-value = P[Z ≥ z] = P[Z ≥ 2.31] = 0.5 − P[0 ≤ Z ≤ 2.31] = 0.5 − 0.4896 = 0.0104
Since the p-value (= 0.0104) is less than α (= 0.05) we reject the null
hypothesis at 5% level of significance.

Thus, we conclude that the sample provides us sufficient evidence in support
of the claim, so we may assume that deterioration in quality exists at 5%
level of significance.
Example 6: A die is thrown 9000 times and a throw of 2 or 5 is observed 3100
times. Can we regard the die as unbiased at 5% level of significance?

Solution: Let getting a 2 or 5 be our success, and getting a number other
than 2 or 5 be a failure. Then, in the usual notations, we have

n = 9000, X = number of successes = 3100, p = 3100/9000 = 0.3444
Here, we want to test that the die is unbiased, and we know that if the die
is unbiased then the proportion or probability of getting a 2 or 5 is

P = Probability of getting a 2 or 5
= Probability of getting 2 + Probability of getting 5
= 1/6 + 1/6 = 1/3 = 0.3333
So our claim is P = 0.3333 and its complement is P ≠ 0.3333. Since the claim
contains the equality sign, we can take the claim as the null hypothesis and
the complement as the alternative hypothesis. Thus,

H0 : P = P0 = 0.3333 and H1 : P ≠ 0.3333

Since the alternative hypothesis is two-tailed, the test is two-tailed test.

Before proceeding further, first we have to check whether the condition of
normality meets or not:

np = 9000 × 0.3444 = 3099.6 > 5
nq = 9000 × (1 − 0.3444) = 9000 × 0.6556 = 5900.4 > 5

We see that the condition of normality meets, so we can go for the Z-test.
So, for testing the null hypothesis, the test statistic Z is given by

Z = (p − P0) / √(P0Q0/n)

= (0.3444 − 0.3333) / √(0.3333 × 0.6667 / 9000) = 0.0111/0.005 = 2.22
Since test is two-tailed so the critical values at 5% level of significance are
± zα/2 = ± z0.025 = ±1.96.
Since calculated value of Z (= 2.22) is greater than the critical value (= 1.96),
that means it lies in rejection region, so we reject the null hypothesis i.e. we
reject our claim.
Decision according to p-value:

Since the test is two-tailed,

p-value = 2P[Z ≥ |z|] = 2P[Z ≥ 2.22] = 2[0.5 − P(0 ≤ Z ≤ 2.22)] = 2(0.5 − 0.4868) = 0.0264

Since the p-value (= 0.0264) is less than α (= 0.05), we reject the null
hypothesis at 5% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the
claim so die cannot be considered as unbiased.
Now, you can try the following exercises.
E8) In a sample of 100 MSc Economics first year students of a University, it
was seen that 54 students came from a Science background and the rest from
other backgrounds. Can we assume that 50% of the students are from a Science
background in MSc Economics first year students in the University at 1% level
of significance?
E9) Out of 200 patients who are given a particular injection 180 survived.
Test the hypothesis that the survival rate is more than 80% at 5% level of
significance?

10.6 TESTING OF HYPOTHESIS FOR DIFFERENCE OF TWO POPULATION PROPORTIONS USING Z-TEST
In Section 10.5, we discussed testing of hypothesis about the population
proportion. In some cases, we are interested to test a hypothesis about the
difference of two population proportions of an attribute in two different
populations or groups. For example, one may wish to test whether the
proportions of alcohol drinkers in two cities are the same; one may wish to
test whether the proportion of literates in one group of people is greater
than the proportion of literates in another group; etc. Therefore, we require
a test for testing the hypothesis about the difference of two population
proportions.
Let there be two populations, say, population-I and population-II under
study. Let us draw a random sample of size n1 from population-I with
population proportion P1 and a random sample of size n2 from population-II
with population proportion P2. If X1 and X2 are the numbers of observations /
individuals / items / units possessing the given attribute in the samples of
sizes n1 and n2 respectively, then the sample proportions can be defined as

p1 = X1/n1 and p2 = X2/n2
As we have seen in Section 2.5 of Unit 2 of this course, the mean and
variance of the sampling distribution of the difference of sample proportions
are

E(p1 − p2) = P1 − P2

and

Var(p1 − p2) = P1Q1/n1 + P2Q2/n2

where Q1 = 1 − P1 and Q2 = 1 − P2.
Now, two cases arise:

Case I: When the sample sizes are not sufficiently large, i.e. any of the
conditions n1p1 > 5, n1q1 > 5, n2p2 > 5 or n2q2 > 5 does not hold, then we
use the exact binomial test. But the exact binomial test is beyond the scope
of this course.
Case II: When the sample sizes are sufficiently large, such that n1p1 > 5,
n1q1 > 5, n2p2 > 5 and n2q2 > 5, then by the central limit theorem the
sampling distributions of the sample proportions p1 and p2 are approximately
normal as

p1 ~ N(P1, P1Q1/n1) and p2 ~ N(P2, P2Q2/n2)

Also, by the property of the normal distribution described in Unit 13 of
MST-003, the sampling distribution of the difference of sample proportions
follows the normal distribution with mean

E(p1 − p2) = E(p1) − E(p2) = P1 − P2                               … (7)

and variance

Var(p1 − p2) = Var(p1) + Var(p2) = P1Q1/n1 + P2Q2/n2

That is,

p1 − p2 ~ N(P1 − P2, P1Q1/n1 + P2Q2/n2)

Thus, the standard error is given by

SE(p1 − p2) = √Var(p1 − p2) = √(P1Q1/n1 + P2Q2/n2)                 … (8)

Now, follow the same procedure as we have discussed in Section 10.2; first of
all we have to set up the null and alternative hypotheses. Here, we want to
test the hypothesis about the difference of two population proportions, so we
can take the null hypothesis as

H0 : P1 = P2 (no difference in proportions)

or H0 : P1 − P2 = 0 (difference in two proportions is 0)

[Here, θ1 = P1 and θ2 = P2 if we compare it with the general procedure.]

and the alternative hypothesis may be

H1 : P1 ≠ P2                                 [for two-tailed test]

or

H0 : P1 ≥ P2 and H1 : P1 < P2
H0 : P1 ≤ P2 and H1 : P1 > P2                [for one-tailed test]

For testing the null hypothesis, the test statistic Z is given by

Z = [(p1 − p2) − E(p1 − p2)] / SE(p1 − p2)

or

Z = [(p1 − p2) − (P1 − P2)] / √(P1Q1/n1 + P2Q2/n2)    [using equations (7) and (8)]
Since under the null hypothesis we assume that P1 = P2 = P, we have

Z = (p1 − p2) / √(PQ (1/n1 + 1/n2))

where Q = 1 − P.

Generally, P is unknown; then it is estimated by the value of the pooled
proportion P̂, where

P̂ = (n1p1 + n2p2)/(n1 + n2) = (X1 + X2)/(n1 + n2) and Q̂ = 1 − P̂
After that, we calculate the value of test statistic and compare it with the
critical value(s) given in Table 10.1 at prefixed level of significance α. Take
the decision about the null hypothesis as described in Section 10.2.
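A minimal Python sketch of this two-proportion Z-test with the pooled estimate (the helper z_test_two_props is a name introduced here for illustration; SciPy is assumed available):

    import math
    from scipy.stats import norm

    def z_test_two_props(x1, n1, x2, n2, tail="two"):
        """Return (z, p_value) for H0: P1 = P2 using the pooled P-hat."""
        p1, p2 = x1 / n1, x2 / n2
        p_pool = (x1 + x2) / (n1 + n2)            # pooled proportion
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se
        if tail == "right":
            return z, norm.sf(z)
        if tail == "left":
            return z, norm.cdf(z)
        return z, 2 * norm.sf(abs(z))

    # Example 7 below: 60/100 in town A versus 40/80 in town B, two-tailed
    print(z_test_two_props(60, 100, 40, 80))      # z ≈ 1.34, p ≈ 0.18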
Now, it is time to do some examples of testing of hypothesis about the
difference of two population proportions.
Example 7: In a random sample of 100 persons from town A, 60 are found to
be high consumers of wheat. In another sample of 80 persons from town B, 40
are found to be high consumers of wheat. Do these data reveal a significant
difference between the proportions of high wheat consumers in town A and
town B ( at α = 0.05 )?
Solution: Here, attribute under study is high consuming of wheat. And we
define our success and failure as getting a person of high consumer of wheat
and not high consumer of wheat respectively.
We are given that
n1 = total number of persons in the sample of town A = 100
n2 = total number of persons in the sample of town B = 80
X1 = number of persons of high consumer of wheat in town A = 60
X2 = number of persons of high consumer of wheat in town B = 40
The sample proportion of high wheat consumers in town A is

p1 = X1/n1 = 60/100 = 0.60

and the sample proportion of high wheat consumers in town B is

p2 = X2/n2 = 40/80 = 0.50
Here, we want to test that the proportion of high consumers of wheat in two
towns, say, P1 and P2, is not same. So our claim is P1 ≠ P2 and its complement
is P1 = P2. Since the complement contains the equality sign, so we can take the
complement as the null hypothesis and the claim as the alternative hypothesis.
Thus,
H0 : P1 = P2 = P and H1 : P1 ≠ P2

Since the alternative hypothesis is two-tailed, the test is two-tailed test.

Before proceeding further, first we have to check whether the condition of
normality meets or not:

n1p1 = 100 × 0.60 = 60 > 5, n1q1 = 100 × 0.40 = 40 > 5
n2p2 = 80 × 0.50 = 40 > 5, n2q2 = 80 × 0.50 = 40 > 5

We see that the condition of normality meets, so we can go for the Z-test.
The estimate of the combined proportion (P) of high wheat consumers in the
two towns is given by

P̂ = (n1p1 + n2p2)/(n1 + n2) = (X1 + X2)/(n1 + n2) = (60 + 40)/(100 + 80) = 5/9

Q̂ = 1 − P̂ = 1 − 5/9 = 4/9
For testing the null hypothesis, the test statistic Z is given by

Z = (p1 − p2) / √(P̂Q̂ (1/n1 + 1/n2))

= (0.60 − 0.50) / √((5/9)(4/9)(1/100 + 1/80)) = 0.10/0.0745 = 1.34
The critical values for two-tailed test at 5% level of significance are ± zα/2
= ± z0.025 = ±1.96.
Since calculated value of Z (=1.34) is less than the critical value (= 1.96) and
greater than critical value (= −1.96), that means calculated value of Z lies in
non-rejection region, so we do not reject the null hypothesis and reject the
alternative hypothesis i.e. we reject the claim.
Decision according to p-value:

Since the test is two-tailed,

p-value = 2P[Z ≥ |z|] = 2P[Z ≥ 1.34] = 2[0.5 − P(0 ≤ Z ≤ 1.34)] = 2(0.5 − 0.4099) = 0.1802

Since the p-value (= 0.1802) is greater than α (= 0.05) we do not reject the
null hypothesis at 5% level of significance.
Thus, we conclude that the samples do not provide us sufficient evidence in
support of the claim, so we may assume that the proportion of high consumers
of wheat in the two towns A and B is the same.
Example 8: A machine produced 60 defective articles in a batch of 400. After
overhauling, it produced 30 defectives in a batch of 300. Has the machine
improved due to overhauling? (Take α = 0.01.)
Solution: Here, the machine produces articles and the attribute under study
is defectiveness. We define our success and failure as getting a defective or
a non-defective article respectively. Therefore, we are given that

X1 = number of defective articles produced by the machine before overhauling = 60
X2 = number of defective articles produced by the machine after overhauling = 30

and n1 = 400, n2 = 300.

Let p1 = observed proportion of defective articles in the sample before the
overhauling

= X1/n1 = 60/400 = 0.15

and p2 = observed proportion of defective articles in the sample after the
overhauling

= X2/n2 = 30/300 = 0.10
Here, we want to test that the machine has improved due to overhauling, that
means the proportion of defective articles is less after overhauling. If P1
and P2 denote the proportions of defectives before and after overhauling the
machine, then our claim is P1 > P2 and its complement is P1 ≤ P2. Since the
complement contains the equality sign, we can take the complement as the null
hypothesis and the claim as the alternative hypothesis. Thus,

H0 : P1 ≤ P2 and H1 : P1 > P2

Since the alternative hypothesis is right-tailed, the test is right-tailed test.
Since P is unknown, the pooled estimate of the proportion is given by

P̂ = (X1 + X2)/(n1 + n2) = (60 + 30)/(400 + 300) = 90/700 = 9/70 and Q̂ = 1 − P̂ = 61/70.

Before proceeding further, first we have to check whether the condition of
normality meets or not:

n1p1 = 400 × 0.15 = 60 > 5, n1q1 = 400 × 0.85 = 340 > 5
n2p2 = 300 × 0.10 = 30 > 5, n2q2 = 300 × 0.90 = 270 > 5

We see that the condition of normality meets, so we can go for the Z-test.
For testing the null hypothesis, the test statistic is given by

Z = (p1 − p2) / √(P̂Q̂ (1/n1 + 1/n2))

= (0.15 − 0.10) / √((9/70)(61/70)(1/400 + 1/300)) = 0.05/0.0256 = 1.95
The critical value for right-tailed test at 1% level of significance is
zα = z0.01 = 2.33.
Since calculated value of Z (= 1.95) is less than the critical value (= 2.33) that
means calculated value of Z lies in non-rejection region, so we do not reject the
null hypothesis and reject the alternative hypothesis i.e. we reject the claim at
1% level of significance.
Decision according to p-value:

Since the test is right-tailed,

p-value = P[Z ≥ z] = P[Z ≥ 1.95] = 0.5 − P[0 ≤ Z ≤ 1.95] = 0.5 − 0.4744 = 0.0256

Since the p-value (= 0.0256) is greater than α (= 0.01) we do not reject the
null hypothesis at 1% level of significance.
Thus, we conclude that the samples do not provide us sufficient evidence in
support of the claim, so we cannot conclude that the machine has improved
after overhauling.
Now, you can try the following exercises.
E10) The proportions of literates between groups of people of two districts A
and B are tested. Out of the 100 persons selected at random from each
district, 50 from district A and 40 from district B are found to be literate.
Test whether the proportion of literate persons in the two districts A and B
is the same at 1% level of significance.

E11) In a large population, 30% of a random sample of 1200 persons had blue
eyes, and 20% of a random sample of 900 persons from another population had
blue eyes. Test whether the proportion of blue-eyed persons is the same in
the two populations at 5% level of significance.

10.7 TESTING OF HYPOTHESIS FOR POPULATION VARIANCE USING Z-TEST
In Section 10.3, we discussed testing of hypothesis for the population mean,
but when analysing quantitative data it is often important to draw
conclusions about the variability of a characteristic under study as well as
its average. For example, if a company manufactures electric bulbs, the
manager of the company would probably be interested in determining the
average life of the bulbs and also in determining whether or not the
variability in the life of bulbs is within acceptable limits; the product
controller of a milk company may be interested to know whether the variance
of the amount of fat in the whole milk processed by the company is no more
than a specified level; etc. So we require a test for this purpose.

The procedure of testing a hypothesis for the population variance or standard
deviation is similar to that for the population mean.
For testing a hypothesis about the population variance, we draw a random
sample X1, X2, ..., Xn of size n ≥ 30 from the population with mean µ and
variance σ², where µ may be known or unknown.

We know by the central limit theorem that the sample variance is
asymptotically normally distributed with mean σ² and variance 2σ⁴/n, whether
the parent population is normal or non-normal. That is, if S² is the sample
variance of the random sample then

E(S²) = σ² and Var(S²) = 2σ⁴/n                                     … (9)

But we know that standard error = √Variance

∴ SE(S²) = √Var(S²) = σ² √(2/n)                                    … (10)
The general procedure of this test is explained below.
As we have done so far in all tests, the first step in hypothesis testing
problems is to set up the null and alternative hypotheses. Here, we want to
test the hypothesis about a specified value σ0² of the population variance
σ², so we can take our null and alternative hypotheses as

H0 : σ² = σ0² and H1 : σ² ≠ σ0²              [for two-tailed test]

or

H0 : σ² ≥ σ0² and H1 : σ² < σ0²
H0 : σ² ≤ σ0² and H1 : σ² > σ0²              [for one-tailed test]
For testing the null hypothesis, the test statistic Z is given by

Z = [S² − E(S²)] / SE(S²) ~ N(0, 1)

Z = (S² − σ0²) / (σ0² √(2/n))     [Using equations (9) and (10) and under H0 : σ² = σ0²]
After that, we calculate the value of test statistic and compare it with the
critical value given in Table 10.1 at prefixed level of significance α. Take the
decision about the null hypothesis as described in Section 10.2.
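A minimal Python sketch of this large-sample variance test (the helper z_test_var is a name introduced here for illustration; SciPy is assumed available):

    import math
    from scipy.stats import norm

    def z_test_var(s2, n, sigma0_sq, tail="two"):
        """Return (z, p_value) for H0: sigma^2 = sigma0^2, large n."""
        z = (s2 - sigma0_sq) / (sigma0_sq * math.sqrt(2 / n))
        if tail == "right":
            return z, norm.sf(z)
        if tail == "left":
            return z, norm.cdf(z)
        return z, 2 * norm.sf(abs(z))

    # Example 9 below: S^2 = 9.0, n = 65, sigma0^2 = 10.5, two-tailed
    print(z_test_var(9.0, 65, 10.5))   # z ≈ -0.81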
Note 3: When the population under study is normal, then for testing a
hypothesis about the population variance or population standard deviation we
use the chi-square test, which will be discussed in Unit 12 of this course.
Whereas when the distribution of the population under study is not known and
the sample size is large, we apply the Z-test as discussed above.
Now, it is time to do an example based on the above test.
Example 9: A random sample of 65 screws is taken from a big box of screws and
their lengths (in mm) are measured, which gives sample variance 9.0. Test
that the two-year-old population variance 10.5 is still maintained at
present, at 5% level of significance.

Solution: We are given that

n = 65, S² = 9.0, σ0² = 10.5
Here, we want to test that the two-year-old screw length population variance
(σ²) is still maintained at 10.5. So our claim is σ² = 10.5 and its
complement is σ² ≠ 10.5. Since the claim contains the equality sign, we can
take the claim as the null hypothesis and the complement as the alternative
hypothesis. Thus,

H0 : σ² = σ0² = 10.5 and H1 : σ² ≠ 10.5
Since the alternative hypothesis is two-tailed so the test is two-tailed test.
Here, the distribution of population under study is not known and sample size
is large (n > 30) so we can go for Z-test.
For testing the null hypothesis, the test statistic Z is given by

Z = (S² − σ0²) / (σ0² √(2/n)) ~ N(0, 1)

= (9.0 − 10.5) / (10.5 √(2/65)) = −1.5 / (10.5 × 0.175) = −1.5/1.84 = −0.81
The critical values for two-tailed test at 5% level of significance are
± zα/2 = ± z0.025 = ± 1.96.

Since the calculated value of Z (= −0.81) is less than the critical value
(= 1.96) and greater than the critical value (= −1.96), that means it lies in
the non-rejection region, so we do not reject the null hypothesis, i.e. we
support the claim.

Thus, we conclude that the sample fails to provide us sufficient evidence
against the claim, so we may assume that the two-year-old screw length
population variance is still maintained at 10.5 mm².
Now, you can try the following exercise.
E12) A random sample of 120 bulbs taken from a lot gives the standard
deviation of the life of electric bulbs as 7 hours. Test whether the standard
deviation of the life of bulbs of the lot is 6 hours at 5% level of
significance.

10.8 TESTING OF HYPOTHESIS FOR TWO POPULATION VARIANCES USING Z-TEST
In the previous section, we discussed testing of hypothesis about the
population variance. But there are many situations where we want to test the
hypothesis about the equality of two population variances or standard
deviations. For example, an economist may want to test whether the
variability in incomes differs in two populations; a quality controller may
want to test whether the quality of the product is changing over time; etc.
Let there be two populations, say, population-I and population-II under
study. Also let µ1, µ2 and σ1², σ2² denote the means and variances of
population-I and population-II respectively, where both σ1² and σ2² are
unknown but µ1 and µ2 may be known or unknown. For testing the hypothesis
about the equality of two population variances or standard deviations, we
draw a random sample of large size n1 from population-I and a random sample
of large size n2 from population-II. Let S1² and S2² be the sample variances
of the samples selected from population-I and population-II respectively.

These two populations may or may not be normal but, according to the central
limit theorem, the sampling distribution of the difference of two large
sample variances is asymptotically normal with mean (σ1² − σ2²) and variance
(2σ1⁴/n1 + 2σ2⁴/n2).
Thus,

E(S1² − S2²) = E(S1²) − E(S2²) = σ1² − σ2²                         … (11)

and

Var(S1² − S2²) = Var(S1²) + Var(S2²) = 2σ1⁴/n1 + 2σ2⁴/n2

But we know that standard error = √Variance

∴ SE(S1² − S2²) = √Var(S1² − S2²) = √(2σ1⁴/n1 + 2σ2⁴/n2)           … (12)
Now, follow the same procedure as we have discussed in Section 10.2; that is,
first of all we have to set up the null and alternative hypotheses. Here, we
want to test the hypothesis about the two population variances, so we can
take our null and alternative hypotheses as

H0 : σ1² = σ2² = σ² and H1 : σ1² ≠ σ2²       [for two-tailed test]

or

H0 : σ1² ≥ σ2² and H1 : σ1² < σ2²
H0 : σ1² ≤ σ2² and H1 : σ1² > σ2²            [for one-tailed test]
For testing the null hypothesis, the test statistic Z is given by

Z = [(S1² − S2²) − E(S1² − S2²)] / SE(S1² − S2²) ~ N(0, 1)

or

Z = [(S1² − S2²) − (σ1² − σ2²)] / √(2σ1⁴/n1 + 2σ2⁴/n2)    [using equations (11) and (12)]

Since under the null hypothesis we assume that σ1² = σ2² = σ², we have

Z = (S1² − S2²) / (σ² √(2/n1 + 2/n2)) ~ N(0, 1)
Generally, the population variances σ1² and σ2² are unknown, so we estimate
them by their corresponding sample variances S1² and S2² as

σ̂1² = S1² and σ̂2² = S2²

Thus, the test statistic Z is given by

Z = (S1² − S2²) / √(2S1⁴/n1 + 2S2⁴/n2) ~ N(0, 1)

After that, we calculate the value of the test statistic as the case may be
and compare it with the critical value given in Table 10.1 at the prefixed
level of significance α. Take the decision about the null hypothesis as
described in Section 10.2.
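A minimal Python sketch of this large-sample test for two variances (the helper z_test_two_vars is a name introduced here for illustration; SciPy is assumed available):

    import math
    from scipy.stats import norm

    def z_test_two_vars(s1_sq, n1, s2_sq, n2, tail="two"):
        """Return (z, p_value) for H0: sigma1^2 = sigma2^2, large samples."""
        se = math.sqrt(2 * s1_sq**2 / n1 + 2 * s2_sq**2 / n2)
        z = (s1_sq - s2_sq) / se
        if tail == "right":
            return z, norm.sf(z)
        if tail == "left":
            return z, norm.cdf(z)
        return z, 2 * norm.sf(abs(z))

    # Example 10 below: S1^2 = 60, n1 = 120, S2^2 = 70, n2 = 160, two-tailed
    print(z_test_two_vars(60, 120, 70, 160))   # z ≈ -0.91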
Note 4: When the populations under study are normal, then for testing the
hypothesis about the equality of population variances we use the F-test,
which will be discussed in Unit 12 of this course. Whereas when the form of
the populations under study is not known and the sample sizes are large, we
apply the Z-test as discussed above.
Now, it is time to do an example based on the above test.
Example 10: A comparative study of variation in weights (in pounds) of
Army-soldiers and Navy-sailors was made. The sample variance of the weight of
120 soldiers was 60 pound² and the sample variance of the weight of 160
sailors was 70 pound². Test whether the soldiers and sailors have equal
variation in their weights. Use 5% level of significance.

Solution: Given that

n1 = 120, S1² = 60, n2 = 160, S2² = 70
We want to test that the Army-soldiers and Navy-sailors have equal variation
in their weights. If σ1² and σ2² denote the variances in the weights of
Army-soldiers and Navy-sailors, then our claim is σ1² = σ2² and its
complement is σ1² ≠ σ2². Since the claim contains the equality sign, we can
take the claim as the null hypothesis and the complement as the alternative
hypothesis. Thus,

H0 : σ1² = σ2²   [Army-soldiers and Navy-sailors have equal variation in their weights]

and the alternative hypothesis as

H1 : σ1² ≠ σ2²   [Army-soldiers and Navy-sailors have different variation in their weights]
Here, the distributions of the populations under study are not known and the
sample sizes are large (n1 = 120 > 30, n2 = 160 > 30), so we can go for the
Z-test.
Since the population variances are unknown, for testing the null hypothesis
the test statistic Z is given by

Z = (S1² − S2²) / √(2S1⁴/n1 + 2S2⁴/n2)

= (60 − 70) / √(2(60)²/120 + 2(70)²/160)

= −10 / √(60.0 + 61.25) = −10/11.01 = −0.91
The critical values for two-tailed test at 5% level of significance are
± zα/2 = ± z0.025 = ± 1.96.

Since the calculated value of Z (= −0.91) is less than the critical value
(= 1.96) and greater than the critical value (= −1.96), that means it lies in
the non-rejection region, so we do not reject the null hypothesis, i.e. we
support the claim.
Thus, we conclude that samples fail to provide us sufficient evidence against
the claim so we may assume that the Army-soldiers and Navy-sailors have
equal variation in their weights.
Now, you can try the following exercise.
E13) Two sources of raw materials of bulbs are under consideration by a bulb
manufacturing company. Both sources seem to have similar characteristics but
the company is not sure about their respective uniformity. A sample of 52
lots from source A yields variance 25 and a sample of 40 lots from source B
yields variance 12. Test whether the variance of source A differs
significantly from the variance of source B at α = 0.05.
We now end this unit by giving a summary of what we have covered in it.

10.9 SUMMARY
In this unit we have covered the following points:
1. How to judge a given situation whether we should go for large sample test
or not.
2. Applying the Z-test for testing the hypothesis about the population mean
and difference of two population means.
3. Applying the Z-test for testing the hypothesis about the population
proportion and difference of two population proportions.
4. Applying the Z-test for testing the hypothesis about the population variance
and two population variances.

10.10 SOLUTIONS / ANSWERS
E1) We have a rule that if the calculated value of the test statistic lies in
the rejection region then we reject the null hypothesis, and if it lies in
the non-rejection region then we do not reject the null hypothesis.
Therefore, in our case, we do not reject the null hypothesis, so (iii) is the
correct option. Remember: on the basis of one sample we never accept the null
hypothesis.
E2) Since the test is two-tailed, the rejection region will lie under both
tails, so (iii) is the correct option.
E3) Since the test is two-tailed, therefore,

p-value = 2P[Z ≥ |z|] = 2P[Z ≥ 2.42] = 2[0.5 − P(0 ≤ Z ≤ 2.42)] = 2(0.5 − 0.4922) = 0.0156
E4) We are given that

n = 900, X̄ = 3.4 cm, µ0 = 3.25 cm and σ = 2.61 cm

Here, we wish to test that the sample comes from a large population of bolts
with mean (µ) 3.25 cm. So our claim is µ = 3.25 cm and its complement is
µ ≠ 3.25 cm. Since the claim contains the equality sign, we can take the
claim as the null hypothesis and the complement as the alternative
hypothesis. Thus,

H0 : µ = µ0 = 3.25 and H1 : µ ≠ 3.25

Since the alternative hypothesis is two-tailed, the test is two-tailed test.
Here, we want to test the hypothesis regarding the population mean when the
population SD is known and the sample size is large (n > 30), so we can go
for the Z-test. The test statistic is given by

Z = (X̄ − µ0) / (σ/√n)

= (3.40 − 3.25) / (2.61/√900) = 0.15/0.087 = 1.72
The critical (tabulated) values for two-tailed test at 5% level of
significance are ± zα/2 = ± z0.025 = ± 1.96.
Since calculated value of test statistic Z (= 1.72) is less than the critical
value (= 1.96) and greater than critical value (= −1.96), that means it
lies in non-rejection region, so we do not reject the null hypothesis i.e.
we support the claim at 5% level of significance.
Thus, we conclude that sample does not provide us sufficient evidence
against the claim so we may assume that the sample comes from the
population of bolts with mean 3.25cm.
E5) Here, we are given that

µ0 = 1200, n = 100, X̄ = 1220, S = 90
Here, the company may accept the new CFL light when the average life of the
CFL light is greater than 1200 hours. So the company wants to test that the
new brand CFL light has an average life greater than 1200 hours. So our claim
is µ > 1200 and its complement is µ ≤ 1200. Since the complement contains the
equality sign, we can take the complement as the null hypothesis and the
claim as the alternative hypothesis. Thus,

H0 : µ ≤ µ0 = 1200 and H1 : µ > 1200

Since the alternative hypothesis is right-tailed, the test is right-tailed test.
Here, we want to test the hypothesis regarding the population mean when the
population SD is unknown, so we should use the t-test if the distribution of
the life of bulbs is known to be normal. But that is not the case here. Since
the sample size is large (n > 30) we can go for the Z-test instead of the
t-test. Therefore, the test statistic is given by

Z = (X̄ − µ0) / (S/√n)

= (1220 − 1200) / (90/√100) = 20/9 = 2.22
The critical values for right-tailed test at 5% level of significance is
zα = z0.05 = 1.645.
Since calculated value of test statistic Z (= 2.22) is greater than critical
value (= 1.645), that means it lies in rejection region so we reject the
null hypothesis and support the alternative hypothesis i.e. we support
our claim at 5% level of significance.
Thus, we conclude that the sample provides us sufficient evidence in support
of the claim, so the company may accept the new brand of CFL lights.
E6) Given that

n1 = 200, X̄ = 1300, S1 = 41;
n2 = 200, Ȳ = 1280, S2 = 46
Here, we want to test that there is a significant difference in the mean
duration of the lives of the two brands of electric bulbs. If µ1 and µ2
denote the mean lives of the two brands of electric bulbs respectively, then
our claim is µ1 ≠ µ2 and its complement is µ1 = µ2. Since the complement
contains the equality sign, we can take the complement as the null hypothesis
and the claim as the alternative hypothesis. Thus,

H0 : µ1 = µ2 and H1 : µ1 ≠ µ2

Since the alternative hypothesis is two-tailed, the test is two-tailed test.
We want to test the null hypothesis regarding equality of two population means. The standard deviations of both populations are unknown so we should go for t-test if both populations were known to be normal. But that is not the case. Since sample sizes are large (n₁ and n₂ > 30), we go for Z-test.
So for testing the null hypothesis, the test statistic Z is given by
XY
Z
S12 S22

n1 n 2
1300  1280 20 20
    4.59
 41
2
 46
2
8.41  10.58 4.36

200 200
The critical (tabulated) values for two-tailed test at 1% level of
significance are ± zα/2 = ± z0.005 = ± 2.58.
Since calculated value of test statistic Z (= 4.59) is greater than the critical value (= 2.58), that means it lies in rejection region, so we reject the null hypothesis and support the alternative hypothesis i.e. support the claim at 1% level of significance.
Thus, we conclude that samples do not provide us sufficient evidence against the claim so there is a significant difference in the mean lifetimes of the two brands of electric bulbs.
E7) Given that
n1  50, X  8.3, S1  1.2;

n 2  100 , Y  8.0, S2  1.5


Here, we want to test that the first drug provides a significantly longer period of relief than the other. If µ₁ and µ₂ denote the mean relief times due to the first and second drugs respectively then our claim is µ₁ > µ₂ and its complement is µ₁ ≤ µ₂. Since the complement contains the equality sign so we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H 0 : 1   2 and H1 : 1   2
Since the alternative hypothesis is right-tailed so the test is right-tailed
test.
We want to test the null hypothesis regarding equality of two population means. The standard deviations of both populations are unknown, so we should go for t-test if both populations were known to be normal. But that is not the case. Since sample sizes are large (n₁ and n₂ > 30), we go for Z-test.
So for testing the null hypothesis, the test statistic Z is given by
XY
Z
S12 S22

n1 n 2

8.3  8.0 0.3


 2 2

 1.2   1.5  0.0288  0.0255

50 100
0.3
  1.32
0.2265
The critical (tabulated) value for right-tailed test at 5% level of
significance is zα = z0.05 = 1.645.
Since calculated value of test statistic Z (= 1.32) is less than the critical
value (=1.645), that means it lies in non-rejection region, so we do not
reject the null hypothesis and reject the alternative hypothesis i.e. we
reject the claim at 5% level of significance.
Thus, we conclude that samples provide us sufficient evidence against the claim so the first drug does not provide a significantly longer period of relief than the other.
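The difference-of-means statistic used in E6 and E7 translates just as directly into code. The sketch below (again only an illustration, with our own variable names) reproduces the right-tailed test of E7:

from math import sqrt
from statistics import NormalDist

# Data of E7: large-sample Z-test for difference of two means
n1, x_bar, s1 = 50, 8.3, 1.2     # first drug
n2, y_bar, s2 = 100, 8.0, 1.5    # second drug

z = (x_bar - y_bar) / sqrt(s1**2 / n1 + s2**2 / n2)
p_value = 1 - NormalDist().cdf(z)    # right-tailed p-value

print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
# Z = 1.32 < 1.645, so H0 is not rejected at the 5% level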
E8) Here, students are classified as Science background and other. We define success and failure as getting a student of Science background and other respectively. We are given that
n = Total number of students in the sample = 100
X = Number of students from Science background = 54
p = Sample proportion of students from Science background = 54/100 = 0.54
We want to test whether 50% of the students in MSc. Economics are from Science background. If P denotes the proportion of first year Science background students in the University then our claim is P = P₀ = 0.5 and its complement is P ≠ 0.5. Since the claim contains the equality sign so we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H₀ : P = P₀ = 0.5 (= 50%)
H₁ : P ≠ 0.5 [proportion of Science background students differs from 50%]
Since the alternative hypothesis is two-tailed so the test is two-tailed
test.
Before proceeding further, first we have to check whether the condition
of normality meets or not.
 np  100  0.54  54  5
nq  100   1  0.54   100  0.46  46  5

We see that condition of normality meets, so we can go for Z-test.


For testing the null hypothesis, the test statistic Z is given by
p  P0 0.54  0.50
Z   Q 0  1  P0 
P0Q 0 0.5  0.5 /100
n
0.04
  0.80
0.05
The critical (tabulated) values for two-tailed test at 1% level of
significance are ± zα/2 = ± z0.005 = ± 2.58.
Since calculated value of test statistic Z (= 0.80) is less than the critical
value (= 2.58) and greater than critical value (= −2.58), that means it
lies in non-rejection region, so we do not reject the null hypothesis i.e.
we support the claim at 1% level of significance.
Thus, we conclude that sample fails to provide us sufficient evidence
against the claim so we may assume that 50% of the first year students
in MSc. Economics in the University are from Science background.
E9) We define success and failure as a patient surviving and not surviving respectively. Here, we are given that
n = 200
X = Number of survived patients who are given a particular injection
= 180
p = Sample proportion of survived patients who are given a particular injection = X/n = 180/200 = 0.9
P₀ = 80% = 80/100 = 0.80 ⟹ Q₀ = 1 − P₀ = 1 − 0.80 = 0.20
Here, we want to test that the survival rate of the patients is more than
80%. If P denotes the proportion of survival patients then our claim is P
> 0.80 and its complement is P ≤ 0.80. Since complement contains the
equality sign so we can take the complement as the null hypothesis and
the claim as the alternative hypothesis. Thus,
H₀ : P ≤ P₀ = 0.80 and H₁ : P > 0.80

Since the alternative hypothesis is right-tailed so the test is right-tailed test.
Before proceeding further, first we have to check whether the condition
of normality meets or not.
 np  200  0.9  180  5

nq  200   1  0.9   200  0.1  20  5


We see that condition of normality meets, so we can go for Z-test.
For testing the null hypothesis, the test statistic Z is given by
p  P0
Z
P0 Q 0
n
0.9  0.8 0.1
   3.53
 0.8  0.2  0.0283
200
The critical (tabulated) value for right-tailed test at 5% level of
significance is zα = z0.05 = 1.645.
Since calculated value of test statistic Z (= 3.53) is greater than the
critical value (=1.645), that means it lies in rejection region, so we
reject the null hypothesis and support the alternative hypothesis i.e.
support our claim at 5% level of significance.
Thus, we conclude that sample fails to provide us sufficient evidence
against the claim so we may assume that the survival rate is greater than
80% in the population using that injection.
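The single-proportion calculation of E9 can be checked the same way; the illustrative sketch below (variable names ours) also verifies the normality condition before applying the Z-test:

from math import sqrt
from statistics import NormalDist

# Data of E9: large-sample Z-test for a single proportion
n, x, p0 = 200, 180, 0.80
p_hat, q0 = x / n, 1 - p0

# Normality condition: np > 5 and nq > 5
assert n * p_hat > 5 and n * (1 - p_hat) > 5

z = (p_hat - p0) / sqrt(p0 * q0 / n)
p_value = 1 - NormalDist().cdf(z)    # right-tailed p-value

print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
# Z ≈ 3.54 (3.53 above, due to rounding) > 1.645, so H0 is rejected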
E10) Let X1 and X2 stand for number of literates in districts A and B
respectively. Therefore, we are given that
n₁ = 100, X₁ = 50 ⟹ p₁ = X₁/n₁ = 50/100 = 0.50
n₂ = 100, X₂ = 40 ⟹ p₂ = X₂/n₂ = 40/100 = 0.40
Here, we want to test whether the proportion of literate persons in the two districts A and B is the same. If P₁ and P₂ denote the proportions of literate persons in districts A and B respectively then our claim is P₁ = P₂ and its complement is P₁ ≠ P₂. Since the claim contains the equality sign so we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H₀ : P₁ = P₂ = P and H₁ : P₁ ≠ P₂
Since the alternative hypothesis is two-tailed so the test is two-tailed test.
The estimate of the combined proportion (P) of literates in districts A
and B is given by
P̂ = (n₁p₁ + n₂p₂)/(n₁ + n₂) = (X₁ + X₂)/(n₁ + n₂) = (50 + 40)/(100 + 100) = 0.45
Q̂ = 1 − P̂ = 1 − 0.45 = 0.55
Before proceeding further, first we have to check whether the condition
of normality meets or not.
 n1p1  100  0.50  50  5, n1q1  100  0.50  50  5
n 2 p2  100  0.40  40  5, n 2q 2  100  0.60  60  5
We see that condition of normality meets, so we can go for Z-test.
For testing the null hypothesis, the test statistic Z is given by
p1  p 2
Z
ˆ  1  1 
P̂Q
 n1 n 2 
0.50  0.40 0.10
   1.42
1 1  0.0704
0.45  0.55   
 100 100 
The critical (tabulated) values for two-tailed test at 1% level of
significance are ± zα/2 = ± z0.005 = ± 2.58.
Since calculated value of test statistic Z (= 1.42) is less than the critical
value (= 2.58) and greater than critical value (= −2.58), that means it
lies in non-rejection region, so we do not reject the null hypothesis i.e.
we support our claim at 1% level of significance.
Thus, we conclude that samples fail to provide us sufficient evidence
against the claim so we may assume that the proportion of literates in
districts A and B is equal.
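For the pooled two-proportion statistic of E10, a similar illustrative sketch (variable names ours) reproduces the computation:

from math import sqrt
from statistics import NormalDist

# Data of E10: Z-test for equality of two proportions
n1, x1 = 100, 50
n2, x2 = 100, 40
p1, p2 = x1 / n1, x2 / n2

p_pool = (x1 + x2) / (n1 + n2)    # combined (pooled) proportion
q_pool = 1 - p_pool

z = (p1 - p2) / sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value

print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
# |Z| = 1.42 < 2.58, so H0 is not rejected at the 1% level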
E11) Here, we are given that
n1  1200, p1  30%  0.30

n 2  900, p2  20%  0.20


Here, we want to test that the proportion of blue-eyed persons in both the populations is the same. If P₁ and P₂ denote the proportions of blue-eyed persons in the two populations respectively then our claim is P₁ = P₂ and its complement is P₁ ≠ P₂. Since the claim contains the equality sign so we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H₀ : P₁ = P₂ = P and H₁ : P₁ ≠ P₂
Since the alternative hypothesis is two-tailed so the test is two-tailed test.
The estimate of the combined proportion (P) of blue-eyed persons in the two populations is given by
P̂ = (n₁p₁ + n₂p₂)/(n₁ + n₂) = (1200 × 0.30 + 900 × 0.20)/(1200 + 900) = 0.257
Q̂ = 1 − P̂ = 1 − 0.257 = 0.743
Before proceeding further, first we have to check whether the condition of normality meets or not.
 n1p1  1200  0.30  360  5, n1q1  1200  0.70  840  5 Large Sample Tests

n 2p2  900  0.20  180  5, n 2q 2  900  0.80  720  5


We see that condition of normality meets, so we can go for Z-test.
For testing the null hypothesis, the test statistic Z is given by
p1  p 2
Z
ˆ  1  1 
P̂Q
 n1 n 2 
0.30  0.20 0.10
   5.26
 1 1  0.019
0.257  0.743   
 1200 900 
The critical (tabulated) values for two-tailed test at 5% level of significance are ± zα/2 = ± z0.025 = ± 1.96.
Since calculated value of test statistic Z (= 5.26) is greater than the critical value (= 1.96), that means it lies in rejection region, so we reject the null hypothesis i.e. we reject our claim at 5% level of significance.
Thus, we conclude that samples provide us sufficient evidence against the claim so the proportion of blue-eyed persons in the two populations is not the same.
E12) Here, we are given that
n  120, S  7, 0  6
Here, we want to test that standard deviation (σ) of the life of bulbs of
the lot is 6 hours. So our claim is σ = 6 and its complement is σ ≠ 6.
Since the claim contains the equality sign so we can take the claim as
the null hypothesis and complement as the alternative hypothesis. Thus,
H₀ : σ = σ₀ = 6 and H₁ : σ ≠ 6
Since the alternative hypothesis is two-tailed so the test is two-tailed
test.
Here, the distribution of population under study is not known and
sample size is large (n > 30) so we can go for Z-test.
For testing the null hypothesis, the test statistic Z is given by
Z = (S² − σ₀²)/(σ₀²√(2/n)) ~ N(0, 1)
  = ((7)² − (6)²)/((6)² × √(2/120)) = 13/(36 × 0.129) = 13/4.64 = 2.80
The critical values for two-tailed test at 5% level of significance are
± zα/2 = ± z0.025 = ±1.96.
Since calculated value of Z (= 2.8) is greater than critical values
(= ±1.96), that means it lies in rejection region, so we reject the null
hypothesis i.e. we reject our claim at 5% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the claim so the standard deviation of the life of bulbs of the lot is not 6.0 hours.
E13) Here, we are given that
n1  52, S12  25
n2  40, S22  12
Here, we want to test whether the variance of source A differs significantly from the variance of source B. If σ₁² and σ₂² denote the variances in the raw materials of sources A and B respectively then our claim is σ₁² ≠ σ₂² and its complement is σ₁² = σ₂². Since the complement contains the equality sign so we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : 12  22 and H1 : 12  22
Since the alternative hypothesis is two-tailed so the test is two-tailed
test.
Here, the distributions of populations under study are not known and sample sizes are large (n₁ = 52 > 30, n₂ = 40 > 30) so we can go for Z-test.
Since population variances are unknown so for testing the null
hypothesis, the test statistic Z is given by
Z = (S₁² − S₂²)/√(2S₁⁴/n₁ + 2S₂⁴/n₂)
  = (25 − 12)/√(2 × (25)²/52 + 2 × (12)²/40) = 13/√(24.04 + 7.20) = 13/5.5 = 2.36
The critical values for two-tailed test at 5% level of significance are
± zα/2 = ± z0.025 = ±1.96.
Since calculated value of Z (= 2.36) is greater than the critical value (= 1.96), that means it lies in rejection region, so we reject the null hypothesis and support the alternative hypothesis i.e. we support our claim at 5% level of significance.
Thus, we conclude that samples fail to provide us sufficient evidence against the claim so the variance of source A differs significantly from the variance of source B.
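The two-variance statistic of E13 can also be checked in code. In the illustrative sketch below (variable names ours), carrying full precision in the denominator gives Z ≈ 2.33 rather than the 2.36 obtained above with a rounded square root, but the decision is unchanged:

from math import sqrt
from statistics import NormalDist

# Data of E13: large-sample Z-test for equality of two variances
n1, s1_sq = 52, 25
n2, s2_sq = 40, 12

se = sqrt(2 * s1_sq**2 / n1 + 2 * s2_sq**2 / n2)   # SE of S1^2 - S2^2
z = (s1_sq - s2_sq) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed p-value

print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
# |Z| ≈ 2.33 > 1.96, so H0 is rejected at the 5% level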
UNIT 11 SMALL SAMPLE TESTS
Structure
11.1 Introduction
Objectives
11.2 General Procedure of t-Test for Testing a Hypothesis
11.3 Testing of hypothesis for Population Mean Using t-Test
11.4 Testing of Hypothesis for Difference of Two Population Means Using
t-Test
11.5 Paired t-Test
11.6 Testing of Hypothesis for Population Correlation Coefficient Using
t-Test
11.7 Summary
11.8 Solutions /Answers

11.1 INTRODUCTION
In previous unit, we have discussed the testing of hypothesis for large samples
in details. Recall that throughout the unit, we were making an assumption that
“if sample size is sufficiently large then test statistic follows approximately
standard normal distribution”. Also recall two points highlighted in this course,
i.e.
 Cost of our study increases as sample size increases.
 Sometimes the nature of the units in the population under study is such that they are destroyed under investigation.
If there are limited resources in terms of money then the first point listed above forces us not to go for a large sample size when the items/units under study are very costly, such as airplanes, computers, etc. The second point listed above warns us not to go for a large sample if population units are destroyed under investigation.
So, we need an alternative technique which is used to test the hypothesis based
on small sample(s). Small sample tests do this job for us. But in return they
demand one basic assumption that population under study should be normal as
you will see when you go through the unit. t, χ² and F-tests are some
commonly used small sample tests.
In this unit, we will discuss t-test in detail, which is based on the t-distribution
described in Unit 3 of this course. The χ² and F-tests will be discussed in the next
unit which are based on χ2 and F-distributions described in Unit 3 and Unit 4 of
this course respectively.
This unit is divided into eight sections. Section 11.1 describes the need of
small sample tests. The general procedure of t-test for testing a hypothesis is
described in Section 11.2. In Section 11.3, we discuss testing of hypothesis for
population mean using t-test. Testing of hypothesis for difference of two
population means when samples are independent is described in Section 11.4
whereas in Section 11.5, the paired t-test for difference of two population
means when samples are dependent (paired) is discussed. In Section 11.6
testing of hypothesis for population correlation coefficient is explained. Unit
ends by providing a summary of what we have discussed in this unit in Section 11.7 and solutions of exercises in Section 11.8.
Before moving further, a humble suggestion to you: please revise what you have learned in the previous two units. The concepts discussed there will help you
a lot to better understand the concepts discussed in this unit.
Objectives
After studying this unit, you should be able to:
 realize the importance of small sample tests;
 know the procedure of t-test for testing a hypothesis;
 describe testing of hypothesis for population mean for using t-test;
 explain the testing of hypothesis for difference of two population means
when samples are independent using t-test;
 describe the procedure for paired t-test for testing of hypothesis for
difference of two population means when samples are dependent or paired;
and
 explain the testing of hypothesis for population correlation coefficient using
t-test.

11.2 GENERAL PROCEDURE OF t-TEST FOR TESTING A HYPOTHESIS
The general procedure of t-test for testing a hypothesis is similar to that of the Z-test already explained in Unit 10. Let us give you similar details here.
For this purpose, let X1, X2,…, Xn be a random sample of small size n (< 30)
selected from a normal population (recall the demand of small sample tests
pointed out in previous Section 11.1) having parameter of interest, say, θ, which is actually unknown but whose hypothetical value, say, θ₀, estimated from some previous study or some other way, is to be tested. t-test involves the following
steps for testing this hypothetical value:
Step I: First of all, we setup null and alternative hypotheses. Here, we want
to test the hypothetical value θ0 of parameter θ so we can take the null
and alternative hypotheses as
H₀ : θ = θ₀ and H₁ : θ ≠ θ₀ [for two-tailed test]
or
H₀ : θ = θ₀ and H₁ : θ > θ₀
or H₀ : θ = θ₀ and H₁ : θ < θ₀ [for one-tailed test]

In case of comparing the same parameter of two populations of interest, say, θ₁ and θ₂, our null and alternative hypotheses would be
H₀ : θ₁ = θ₂ and H₁ : θ₁ ≠ θ₂ [for two-tailed test]
or
H₀ : θ₁ = θ₂ and H₁ : θ₁ > θ₂
or H₀ : θ₁ = θ₂ and H₁ : θ₁ < θ₂ [for one-tailed test]

Step II: After setting the null and alternative hypotheses our next step is to decide a criterion for rejection or non-rejection of null hypothesis i.e. decide the level of significance α, at which we want to test our null hypothesis. We generally take α = 5% or 1%.
Step III: The third step is to determine an appropriate test statistic, say, t for
testing the null hypothesis. Suppose Tn is the sample statistic (may be
sample mean, sample correlation coefficient, etc. depending upon )
for the parameter  then test-statistic t is given by
Tn  E(Tn )
t
SE  Tn 

Step IV: As we know, t-test is based on t-distribution and t-distribution is described with the help of its degrees of freedom, therefore, test statistic t follows t-distribution with specified degrees of freedom as the case may be.
By putting the values of Tₙ, E(Tₙ) and SE(Tₙ) in the above formula, we
calculate the value of test statistic t. Let tcal be the calculated value of
test statistic t after putting these values.
Step V: After that, we obtain the critical (cut-off or tabulated) value(s) in the
sampling distribution of the test statistic t corresponding to α assumed
in Step II. The critical values for t-test are given in Table-II (t-table)
of the Appendix at the end of Block 1 of this course corresponding to
different level of significance (α). After that, we construct rejection
(critical) region of size α in the probability curve of the sampling
distribution of test statistic t.
Step VI: Take the decision about the null hypothesis based on calculated and
critical value(s) of test statistic obtained in Step IV and Step V
respectively. Since the critical value depends upon the nature of the test, that is, whether it is a one-tailed or a two-tailed test, the following cases arise:
In case of one-tailed test:
Case I: When H₀ : θ = θ₀ and H₁ : θ > θ₀ (right-tailed test)

In this case, the rejection (critical) region falls under the right tail of
the probability curve of the sampling distribution of test statistic t.
Suppose t(ν),α is the critical value at α level of significance then the entire region greater than or equal to t(ν),α is the rejection region and the region less than t(ν),α is the non-rejection region as shown in Fig. 11.1.
If tcal ≥ t(ν,), that means calculated value of test statistic t lies in the Fig. 11.1
rejection (critical) region, then we reject the null hypothesis H0 at 
level of significance. Therefore, we conclude that sample data
provides us sufficient evidence against the null hypothesis and there is
a significant difference between hypothesized value and observed
value of the parameter.
If tcal < t(ν,), that means calculated value of test statistic t lies in non-
rejection region, then we do not reject the null hypothesis H0 at 
level of significance. Therefore, we conclude that the sample data
fails to provide us sufficient evidence against the null hypothesis and
the difference between hypothesized value and observed value of the
parameter is due to fluctuation of the sample.
Case II: When H₀ : θ = θ₀ and H₁ : θ < θ₀ (left-tailed test)
In this case, the rejection (critical) region falls under the left tail of the
probability curve of the sampling distribution of test statistic t.
Suppose −t(ν),α is the critical value at α level of significance then the entire region less than or equal to −t(ν),α is the rejection region and the region greater than −t(ν),α is the non-rejection region as shown in Fig. 11.2.
If tcal ≤ − t(ν),, that means calculated value of test statistic t lies in the
rejection (critical) region, then we reject the null hypothesis H0 at 
level of significance.
If tcal >- t(ν),, that means calculated value of test statistic t lies in the
Fig. 11.2 non-rejection region, then we do not reject the null hypothesis H0 at 
level of significance.
In case of two-tailed test:
That is, when H₀ : θ = θ₀ and H₁ : θ ≠ θ₀
In this case, the rejection region falls under both tails of the
probability curve of sampling distribution of the test statistic t. Half
the area (α), i.e. α/2, will lie under the left tail and the other half under the right tail. Suppose −t(ν),α/2 and t(ν),α/2 are the two critical values at the left tail and the right tail respectively. Therefore, the entire region less than or equal to −t(ν),α/2 or greater than or equal to t(ν),α/2 is the rejection region and the region between −t(ν),α/2 and t(ν),α/2 is the non-rejection region as shown in Fig. 11.3.
If tcal ≥ t(ν),/2, or tcal ≤ -t(ν),/2, that means calculated value of test
statistic t lies in the rejection(critical) region, then we reject the null
hypothesis H0 at  level of significance.
And if -t(ν),/2 < tcal < t(ν),/2, that means calculated value of test
statistic t lies in the non-rejection region, then we do not reject the
null hypothesis H0 at  level of significance.
Procedure of taking the decision about the null hypothesis on the basis of
p-value:
To take the decision about the null hypothesis on the basis of p-value, the p-
value is compared with given level of significance (α). And if p-value is less
than or equal to α then we reject the null hypothesis and if p-value is greater
than α then we do not reject the null hypothesis at α level of significance.
Since the distribution of test statistic t follows t-distribution with ν df and we
also know that t-distribution is symmetrical about t = 0 line therefore, if tcal
represents calculated value of test statistic t then p-value can be defined as:
For one-tailed test:
For H1: θ > θ0 (right-tailed test)
p-value = P[t ≥ tcal]
For H1: θ < θ0 (left-tailed test)
p-value = P[t ≤ tcal]
For two-tailed test: For H₁ : θ ≠ θ₀
p-value = 2P[t ≥ |tcal|]
These p-values for t-test can be obtained with the help of Table-II (t-table)
given in the Appendix at the end of Block 1 of this course. But this table gives
the t-values corresponding to the standard values of α such as 0.10, 0.05,
0.025, 0.01 and 0.005 only, therefore, the exact p-values cannot be obtained with the help of this table and we can only approximate the p-value for this test.
For example, if test is right-tailed and calculated (observed) value of test
statistic t is 2.94 with 9 df then p-value is obtained as:
Since calculated value of test statistic t is based on the 9 df therefore, we use
row for 9 df in the t-table and move across this row to find the values in which
calculated t-value falls. Since calculated t-value falls between 2.821 and 3.250,
which are corresponding to the values of one-tailed area α = 0.01 and 0.005
respectively, therefore, p-value will lie between 0.005 and 0.01, that is,
0.005 < p-value < 0.01
If in the above example, the test is two-tailed then the two values 0.01 and
0.005 would be doubled for p-value, that is,
0.005  2  0.01  p-value  0.02  2  0.01
Note 1: With the help of computer packages and softwares such as SPSS, SAS,
MINITAB, EXCEL, etc. we can find the exact p-values for t-test.
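For instance, in Python the exact p-values for the example above (tcal = 2.94 with 9 df) can be computed with the scipy library; the sketch below is only an illustration of this note and assumes scipy is installed:

from scipy import stats

t_cal, df = 2.94, 9

p_right = stats.t.sf(t_cal, df)           # P[t >= t_cal], right-tailed
p_two = 2 * stats.t.sf(abs(t_cal), df)    # 2P[t >= |t_cal|], two-tailed

print(f"right-tailed p-value = {p_right:.4f}")   # lies between 0.005 and 0.01
print(f"two-tailed p-value = {p_two:.4f}")       # lies between 0.01 and 0.02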
Now, you can try the following exercise.

E1) If test is two-tailed and calculated value of test statistic t is 2.42 with 15
df then find the p-value for t-test.

11.3 TESTING OF HYPOTHESIS FOR POPULATION MEAN USING t-TEST
In Section 10.3 of the previous unit, we have discussed Z-test for testing the
hypothesis about population mean when population variance σ2 is known and
unknown.
Recall from these, we have already pointed out that one basic difference between Z-test and t-test is that Z-test is used when population SD is known, whether sample size is large or small, and t-test is used when population SD is unknown, whether sample size is small or large. In case of large sample size, Z-test serves as an approximation to t-test, as we did in the previous unit. But in practice, the standard deviation of the population is not known and the sample size is small, so in this situation we use t-test provided the population under study is normal.
Assumptions
Virtually every test has some assumptions which must be met prior to the
application of the test. This t-test needs following assumptions to work:
(i) The characteristic under study follows normal distribution. In other
words, populations from which random sample is drawn should be
normal with respect to the characteristic of interest.
(ii) Sample observations are random and independent.
(iii) Population variance σ2 is unknown.
For describing this test, let X1, X2,…, Xn be a random sample of small size
n (< 30) selected from a normal population with mean µ and unknown
variance σ2.
Now, follow the same procedure as we have discussed in previous section,
that is, first of all we setup the null and alternative hypotheses. Here, we
want to test the claim about the specified value 0 of population mean µ so
we can take the null and alternative hypotheses as
H₀ : µ = µ₀ and H₁ : µ ≠ µ₀ [for two-tailed test]
or
H₀ : µ = µ₀ and H₁ : µ > µ₀
or H₀ : µ = µ₀ and H₁ : µ < µ₀ [for one-tailed test]
[Here, θ = µ and θ₀ = µ₀ if we compare it with the general procedure.]
For testing the null hypothesis, the test statistic t is given by
X  0
t ~ t  n 1 under H 0
S/ n

1 n 1 n 2
where, X  
n i1
X i is the sample mean and S 2
 
n  1 i1
 X i  X  is the

sample variance.
For computational simplicity, we may use the following formulae for X̄ and S²:
X̄ = a + (1/n) Σd and S² = 1/(n−1) [Σd² − (Σd)²/n]
where d = (X − a), 'a' being the assumed arbitrary value.
Here, the test statistic t follows t-distribution with (n − 1) degrees of freedom as
we discussed in Unit 3 of this course.
After substituting values of X̄, S and n, we get calculated value of test
statistic t. Then we look for critical (or cut-off or tabulated) value(s) of test
statistic t from the t-table. On comparing calculated value and critical value(s),
we take the decision about the null hypothesis as discussed in previous section.
Let us do some examples of testing of hypothesis about population mean using
t-test.
Example 1: A manufacturer claims that a special type of projector bulb has an
average life 160 hours. To check this claim an investigator takes a sample of 20
such bulbs, puts on the test, and obtains an average life 167 hours with standard
deviation 16 hours. Assuming that the life time of such bulbs follows normal
distribution, does the investigator accept the manufacturer’s claim at 5% level
of significance?
Solution: Here, we are given that
0  160, n  20, X  167 and S  16
Here, we want to test the manufacturer claims that a special type of projector
bulb has an average life (µ) 160 hours. So claim is µ = 160 and its complement
is µ ≠ 160. Since the claim contains the equality sign so we can take the claim
as the null hypothesis and complement as the alternative hypothesis. Thus,
H₀ : µ = µ₀ = 160 and H₁ : µ ≠ 160
Since the alternative hypothesis is two-tailed so the test is two-tailed test.
Here, we want to test the hypothesis regarding population mean when
population SD is unknown. Also sample size is small, n = 20 (n < 30), and
population under study is normal, so we can go for t-test for testing the
hypothesis about population mean.
For testing the null hypothesis, the test statistic t is given by
X  µ0
t
S/ n
167  160 7
   1.96
16 / 20 3.58
The critical value of the test statistic t for various df and different level of
significance α are given in Table-II of the Appendix at the end of Block 1 of this course.
The critical (tabulated) values of test statistic for two-tailed test corresponding
(n−1) = 19 df at 5% level of significance are ± t(n−1),α/2 = ± t(19),0.025 = ± 2.093.

Since calculated value of test statistic t (= 1.96) is greater than the critical value (= −2.093) and less than the critical value (= 2.093), that means calculated value of test statistic lies in non-rejection region as shown in Fig. 11.4. So we do not reject the null hypothesis i.e. we support the manufacturer's claim at 5% level of significance.
Decision according to p-value:
Since calculated value of test statistic t is based on 19 df therefore, we use row
for 19 df in the t-table and move across this row to find the values in which
calculated t-value falls. Since calculated t-value falls between 1.729 and 2.093
corresponding to one-tailed area α = 0.05 and 0.025 respectively therefore p-
value lies between 0.025 and 0.05, that is,
0.025 < p-value < 0.05
Since test is two-tailed so
2  0.025  0.05  p-value  0.10  0.05  2
Since p-value is greater than α (= 0.05) so we do not reject the null
hypothesis at 5% level of significance.
Thus, we conclude that sample fails to provide us sufficient evidence against
the null hypothesis so we may assume that the manufacturer's claim is true so
the investigator may accept the manufacturer’s claim at 5% level of
significance.
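The computation of Example 1 can be verified with a short Python sketch (an illustration only, assuming scipy is available; variable names are ours):

from math import sqrt
from scipy import stats

# Summary statistics of Example 1: one-sample t-test
n, x_bar, s, mu0 = 20, 167, 16, 160
df = n - 1

t = (x_bar - mu0) / (s / sqrt(n))         # test statistic
t_crit = stats.t.ppf(1 - 0.05 / 2, df)    # two-tailed critical value at 5%
p_value = 2 * stats.t.sf(abs(t), df)      # exact two-tailed p-value

print(f"t = {t:.2f}, critical value = ±{t_crit:.3f}, p-value = {p_value:.4f}")
# |t| = 1.96 < 2.093, so H0 is not rejected at the 5% level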
Example 2: The mean share price of companies of Pharma sector is Rs.70. The
share prices of all companies were changed time to time. After a month, a
sample of 10 Pharma companies was taken and their share prices were noted as
below:
70, 76, 75, 69, 70, 72, 68, 65, 75, 72
Assuming that the distribution of share prices follows normal distribution, test
whether mean share price is still the same at 1% level of significance?
Solution: Here, we wish to test that the mean share price (µ) of companies of
Pharma sector is still Rs.70 besides all changes. So our claim is µ = 70 and its
complement is µ ≠ 70. Since the claim contains the equality sign so we can
take the claim as the null hypothesis and complement as the alternative
hypothesis. Thus,
H₀ : µ = µ₀ = 70 [mean share price of companies is still Rs. 70]
H₁ : µ ≠ µ₀ = 70 [mean share price of companies is not still Rs. 70]
Since the alternative hypothesis is two-tailed so the test is two-tailed test.
Here, we want to test the hypothesis regarding population mean when
population SD is unknown. Also sample size is small n = 10(n < 30) and
population under study is normal, so we can go for t-test for testing the
hypothesis about population mean.
For testing the null hypothesis, the test statistic t is given by
X  µ0
t ~ t ( n 1) … (1)
S/ n
Calculation for X̄ and S:
S. No.   Sample value (X)   Deviation d = (X − a), a = 70   d²
1 70 0 0
2 76 6 36
3 75 5 25
4 69 -1 1
5 70 0 0
6 72 2 4
7 68 -2 4
8 65 -5 25
9 75 5 25
10 72 2 4
Total 12 124

The assumed value of a is 70.


From the above calculation, we have
X̄ = a + (1/n) Σd = 70 + (1/10) × 12 = 71.2
S² = 1/(n−1) [Σd² − (Σd)²/n]
  = 1/(10 − 1) [124 − (12)²/10] = (1/9) [124 − 14.4] = 12.18
S = √12.18 = 3.49


Putting the values in equation (1), we have
t = (71.2 − 70)/(3.49/√10) = 1.2/1.10 = 1.09
The critical (tabulated) values of test statistic for two-tailed test corresponding to (n−1) = 9 df at 1% level of significance are ± t(n−1),α/2 = ± t(9),0.005 = ± 3.250.
Since calculated value of test statistic t (= 1.09) is less than the critical value
(= 3.250) and greater than the critical value (= ‒3.250), that means calculated
value of t lies in non-rejection region as shown in Fig. 11.5. So we do not
reject the null hypothesis i.e. we support the claim at 1% level of significance.
Decision according to p-value:
Since calculated value of test statistic t is based on 9 df therefore, we use row
for 9 df in the t-table and move across this row to find the values in which
calculated t-value falls. Since all values in this row are greater than calculated
t-value 1.09 and the smallest value is 1.383 corresponding to one-tailed area
α = 0.10 therefore p-value is greater than 0.10, that is,
p-value > 0.10
Since test is two-tailed so
p-value  2  0.10  0.20
Since p-value (> 0.20) is greater than α (= 0.01) so we do not reject the null
hypothesis at 1% level of significance.
Thus, we conclude that the sample fails to provide us sufficient evidence
against the claim so we may assume that the mean share price is still Rs. 70.
Now, you can try the following exercises.
E2) A tyre manufacturer claims that the average life of a particular category
of his tyre is 18000 km when used under normal driving conditions. A
random sample of 16 tyres was tested. The mean and SD of life of the
tyres in the sample were 20000 km and 6000 km respectively.
Assuming that the life of the tyres is normally distributed, test the claim
of the manufacturer at 1% level of significance using appropriate test.
E3) It is known that the average weight of cadets of a centre follows normal
distribution. Weights of 10 randomly selected cadets from the same
centre are as given below:
48, 50, 62, 75, 80, 60, 70, 56, 52, 77
Can we say that average weight of all cadets of the centre from which
the above sample was taken is equal to 60 kg at 5% level of
significance?

11.4 TESTING OF HYPOTHESIS FOR DIFFERENCE OF TWO POPULATION MEANS USING t-TEST
In Section 10.4 of the previous unit, we have discussed Z-test for testing the
hypothesis about difference of two population means under different possibility
of population variances 12 and 22 . Recall from there, we have pointed out that
one basic difference between Z-test and t-test is that, Z-test is used when
standard deviations of both populations are known and t-test is used when
standard deviations of both populations are unknown. But in practice standard
deviations of both populations are not known, so in real life problems t-test is
more suitable compared to Z-test.
Assumptions
This test works under following assumptions:
(i) The characteristic under study follows normal distribution in both the
populations. In other words, both populations from which random
samples are drawn should be normal with respect to the characteristic of
interest.
(ii) Samples and their observations both are independent to each other.
(iii) Population variances σ₁² and σ₂² are both unknown but equal.
For describing this test, let there be two normal populations N(µ₁, σ₁²) and N(µ₂, σ₂²) under study. We draw two independent random samples, say, X₁, X₂, ..., Xn₁ and Y₁, Y₂, ..., Yn₂ of sizes n₁ and n₂ from the normal populations N(µ₁, σ₁²) and N(µ₂, σ₂²) respectively. Let X̄ and Ȳ be the means of the first and second sample respectively. Further, suppose the variances of both the populations are unknown but equal, i.e. σ₁² = σ₂² = σ² (say). In this case, σ² is estimated by the value of the pooled sample variance Sp² where
Sp² = 1/(n₁ + n₂ − 2) [(n₁ − 1)S₁² + (n₂ − 1)S₂²]
and
S₁² = 1/(n₁ − 1) Σ (Xᵢ − X̄)²,  S₂² = 1/(n₂ − 1) Σ (Yᵢ − Ȳ)²
This can also be written as
Sp² = 1/(n₁ + n₂ − 2) [Σ (Xᵢ − X̄)² + Σ (Yᵢ − Ȳ)²]
For computational simplicity, use the following formulae for X̄, Ȳ and Sp²:
X̄ = a + (1/n₁) Σd₁,  Ȳ = b + (1/n₂) Σd₂ and
Sp² = 1/(n₁ + n₂ − 2) [(Σd₁² − (Σd₁)²/n₁) + (Σd₂² − (Σd₂)²/n₂)]
where d₁ = (X − a) and d₂ = (Y − b), 'a' and 'b' being the assumed arbitrary values.
Now, follow the same procedure as we have discussed in Section 11.2, that is, first of all we have to setup null and alternative hypotheses. Here, we want to test the hypothesis about the difference of two population means so we can take the null hypothesis as
H₀ : µ₁ = µ₂ (no difference in means)
or H₀ : µ₁ − µ₂ = 0 (difference in two means is 0)
[Here, θ₁ = µ₁ and θ₂ = µ₂ if we compare it with the general procedure.]
and the alternative hypothesis as
H₁ : µ₁ ≠ µ₂ [for two-tailed test]
or
H₀ : µ₁ = µ₂ and H₁ : µ₁ > µ₂
or H₀ : µ₁ = µ₂ and H₁ : µ₁ < µ₂ [for one-tailed test]
For testing the null hypothesis, the test statistic t is given by
t = (X̄ − Ȳ)/(Sp √(1/n₁ + 1/n₂)) ~ t(n₁+n₂−2) under H₀
After substituting values of X̄, Ȳ, Sp, n₁ and n₂, we get calculated value of test statistic t. Then we look for critical (or cut-off or tabulated) value(s) of test statistic t from the t-table. On comparing calculated value and critical value(s), we take the decision about the null hypothesis as discussed in Section 11.2.
Let us do some examples to become more user friendly with the test explained
above.
Example 3: In a random sample of 10 pigs fed by diet A, the gain in weights
(in pounds) in a certain period were
12, 8, 14, 16, 13, 12, 8, 14, 10, 9
In another random sample of 10 pigs fed by diet B, the gain in weights (in
pounds) in the same period were
14, 13, 12, 15, 16, 14, 18, 17, 21, 15
Assuming that gain in the weights due to both foods follows normal
distributions with equal variances, test whether diets A and B differ
significantly regarding their effect on increase in weight at 5% level of
significance.
Solution: Here, we wish to test that diets A and B differ significantly regarding their effect on increase in weight of pigs. If µ₁ and µ₂ denote the average gains in weight due to diet A and diet B respectively then our claim is µ₁ ≠ µ₂ and its complement is µ₁ = µ₂. Since the complement contains the equality sign so we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H₀ : µ₁ = µ₂ and H₁ : µ₁ ≠ µ₂

Since the alternative hypothesis is two-tailed so the test is two-tailed test.


Since it is given that the increase in weight due to both foods follows normal distributions with equal and unknown population variances, and the other assumptions of t-test for testing a hypothesis about difference of two population means also hold, we can go for this test.
For testing the null hypothesis, the test statistic t is given by
XY
t ~ t (n 1n 22) under H0 … (2)
1 1
Sp 
n1 n 2

Calculation for X̄, Ȳ and Sp:
Diet A                              Diet B
X   d₁ = (X − a), a = 12   d₁²   Y   d₂ = (Y − b), b = 16   d₂²
12 0 0 14 −2 4
8 −4 16 13 −3 9
14 2 4 12 −4 16
16 4 16 15 −1 1
13 1 1 16 0 0
12 0 0 14 −2 4
8 −4 16 18 2 4
14 2 4 17 1 1
10 −2 4 21 5 25
9 −3 9 15 −1 1
Total −4 70 −5 65

Here, a =12, b = 16 are assumed values.


From above calculations, we have
1  4 
Xa
n1  d1  12 
10
 11.6,

1  5 
Y  b   d 2  16   15.5
n2 10
2 2
1     d1      d2   
n1  n 2  2    1     d2 
2 2 2
S 
p  d  
n1 n2
    


1 
70 
 42   65   52 
   
10  10  2  10   10 

1
  68.4  62.5   7.27
18
 Sp  7.27  2.70

Putting the values in equation (2), we have


t = (11.6 − 15.5)/(2.70 × √(1/10 + 1/10)) = −3.90/(2.70 × 0.45) = −3.90/1.215 = −3.21
The critical values of test statistic t for two-tailed test corresponding
(n1 + n2 -2) = 18 df at 5% level of significance are
± t(n₁+n₂−2),α/2 = ± t(18),0.025 = ± 2.101.
Since calculated value of test statistic t (= −3.21) is less than the critical value (= −2.101), that means calculated value of test statistic t lies in rejection region, so we reject the null hypothesis and support the alternative hypothesis i.e. support our claim at 5% level of significance.
Thus, we conclude that samples do not provide us sufficient evidence against
the claim so diets A and B differ significantly in terms of gain in weights of
pigs.
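When raw observations are available, as in Example 3, the pooled-variance t-test can also be run directly with scipy; the sketch below is an illustrative cross-check (scipy assumed available), where equal_var=True selects the pooled-variance form described above. Carrying full precision gives t ≈ −3.23 rather than the −3.21 obtained with rounded intermediate values:

from scipy import stats

# Raw data of Example 3: gains in weight under diets A and B
diet_a = [12, 8, 14, 16, 13, 12, 8, 14, 10, 9]
diet_b = [14, 13, 12, 15, 16, 14, 18, 17, 21, 15]

# equal_var=True gives the pooled-variance two-sample t-test
t, p_value = stats.ttest_ind(diet_a, diet_b, equal_var=True)

print(f"t = {t:.2f}, p-value = {p_value:.4f}")
# t ≈ -3.23, |t| > 2.101, so H0 is rejected at the 5% level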
Example 4: The means of two random samples of sizes 10 and 8 drawn from
two normal populations are 210.40 and 208.92 respectively. The sum of
squares of the deviations from their means is 26.94 and 24.50 respectively.
Assuming that the populations are normal with equal variances, can the samples be considered to have been drawn from normal populations having equal means?
Solution: In usual notations, we are given that
n1  10, n 2  8, X  210.40, Y  208.92,
2 2
 X  X   26.94,  Y  Y   24.50

Therefore,
Sp² = 1/(n₁ + n₂ − 2) [Σ(X − X̄)² + Σ(Y − Ȳ)²]
  = 1/(10 + 8 − 2) [26.94 + 24.50] = (1/16) × 51.44 = 3.215
Sp = √3.215 = 1.79
We wish to test that both the samples are drawn from normal populations
having the same means. If µ₁ and µ₂ denote the means of the two normal populations respectively then our claim is µ₁ = µ₂ and its complement is µ₁ ≠ µ₂. Since the claim contains the equality sign so we can take the claim as the null hypothesis and complement as the alternative hypothesis. Thus,
H₀ : µ₁ = µ₂ [means of both populations are equal]
H₁ : µ₁ ≠ µ₂ [means of both populations are not equal]
Since the alternative hypothesis is two-tailed so the test is two-tailed test.
Since it is given that two populations are normal with equal and unknown
variances and other assumptions of t-test for testing a hypothesis about
difference of two population means also meet. So we can go for this test.
For testing the null hypothesis, the test statistic t is given by
XY
t ~ t (n 1 n 2 2) under H0
1 1
Sp 
n1 n 2
210.40  208.92 1.48 1.48
    1.76
1 1 1.79  0.47 0.84
1.79 
10 8
The critical values of test statistic t for two-tailed test corresponding
(n1 + n2 -2) = 16 df at 5% level of significance are
± t(n₁+n₂−2),α/2 = ± t(16),0.025 = ± 2.12.
Since calculated value of test statistic t (= 1.76) is less than the critical value
(= 2.12) and greater than the critical value (= ‒2.12), that means calculated
value of test statistic t lies in non-rejection region so we do not reject the null
hypothesis i.e. we support the claim.
Thus, we conclude that samples fail to provide us sufficient evidence against
the claim so we may assume that both samples are taken from normal
populations having equal means.
Now, you can try the following exercises.
E4) Two different types of drugs A and B were tried on some patients for
increasing their weights. Six persons were given drug A and other 7
persons were given drug B. The gain in weights (in ponds) is given
below:

Drug A 5 8 7 10 9 6 −

Drug B 9 10 15 12 14 8 12

Assuming that increment in the weights due to both drugs follows


normal distributions with equal variances, do the both drugs differ
significantly with regard to their mean weights increment at 5% level of
significance?
E5) To test the effect of fertilizer on wheat production, 26 plots of land with
equal areas were chosen. Half of these plots were treated with fertilizer
and the other half were untreated. Other conditions were the same. The
mean yield of wheat on the untreated plots was 4.6 quintals with a
standard deviation of 0.5 quintals, while the mean yield of the treated
plots was 5.0 quintals with standard deviations of 0.3 quintals.
Assuming that yields of wheat with and without fertilizer follow normal
distributions with equal variances, can we conclude that there is
significant improvement in wheat production due to effect of fertilizer
at 1% level of significance?

11.5 PAIRED t-TEST


In the previous section, we have discussed t-test for equality of two population
means in case of independent samples. However, there are so many situations
where two samples are not independent and observations are recorded on the
same individuals or items. Generally, such types of observations are recorded
to assess the effectiveness of a particular training, diet, treatment, medicine,
etc. In such situations, the observations are recorded “before and after” the
insertion of training, treatment, etc. as the case may be. For example, if we
wish to test a new diet on, say, 15 individuals then the weight of the individuals
recorded before diet and after the diet will form two different samples in which
observations will be paired as per each individual. Similarly, in the test of
blood-sugar in human body, fasting sugar level before meal and sugar level
after meal, both are recorded for a patient as paired observations, etc. The
parametric test designed for this type of situation is known as paired t-test.
Now, come to the working principle of this test. This test first of all converts
the two populations into a single population by taking the difference of paired
observations.
Now, instead of two populations, we are left with one population, the
population of differences. And the problem of testing equality of two
population means reduces to testing the hypothesis that the mean of the population of
differences is equal to zero.
Assumptions

This test works under following assumptions:


(i) The population of differences follows normal distribution.
(ii) Samples are not independent.
(iii) Size of both the samples is equal.
(iv) Population variances are unknown but not necessarily equal.
Let (X1, Y1), (X2, Y2), …,(Xn, Yn) be a paired random sample of size n and the
difference between paired observations Xi & Yi be denoted by Di, that is,
Dᵢ = Xᵢ − Yᵢ for all i = 1, 2, ..., n
Hence, we can assume that D1, D2, …, Dn be a random sample from normal
population of differences with mean µD and unknown variance σD². This is
same as the case of testing of hypothesis for population mean when population
variance is unknown which is described in Section 11.3 of this unit.
Here, we want to test that there is an effect of a diet, training, treatment,
medicine, etc. So we can take the null hypothesis as
H₀ : µ₁ = µ₂ or H₀ : µD = µ₁ − µ₂ = 0
and the alternative hypothesis
H1 : 1   2 or H1 :  D  0 for two-tailed test 
H0: 1  2 and H1 : 1  2 
or  for one-tailed test 
H0: 1  2 and H1 : 1  2 
For testing the null hypothesis, the test statistic t is given by
t = D̄/(SD/√n) ~ t(n−1) under H₀
where D̄ = (1/n) Σ Dᵢ and SD² = 1/(n−1) Σ (Dᵢ − D̄)² = 1/(n−1) [ΣDᵢ² − (ΣDᵢ)²/n].
After substituting values of D̄, SD and n, we get calculated value of test
statistic t. Then we look for critical (or cut-off or tabulated) value(s) of test
statistic t from the t-table. On comparing calculated value and critical value(s),
we take the decision about the null hypothesis as discussed in Section 11.2.
Let us do some examples to become more user friendly with paired t-test.
Example 5: A group of 12 children was tested to find out how many digits
they would repeat from memory after hearing them once. They were given
practice session for this test. Next week they were retested. The results
obtained were as follows:
Child Number 1 2 3 4 5 6 7 8 9 10 11 12
Recall Before 6 4 5 7 6 4 3 7 8 4 6 5
Recall After 6 6 4 7 6 5 5 9 9 7 8 7

77
Assuming that the memories of the children before and after the practice session follow normal distributions, does the memory practice session improve the performance of children?
Solution: Here, we want to test that the memory practice session improves the performance of children. If µ₁ and µ₂ denote the mean digit repetitions before and after the practice session then our claim is µ₁ < µ₂ and its complement is µ₁ ≥ µ₂. Since the complement contains the equality sign so we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H₀ : µ₁ ≥ µ₂ and H₁ : µ₁ < µ₂

Since the alternative hypothesis is left-tailed so the test is left-tailed test.


It is a situation of before and after. Also, it is given that the memories of the
children before and after the practice session follow normal distributions. So,
population of differences will also be normal. Also all the assumptions of
paired t-test meet so we can go for paired t-test.
For testing the null hypothesis, the test statistic t is given by
t = D̄/(SD/√n) ~ t(n−1) under H₀ … (3)

where D̄ and SD are the mean and standard deviation of the differences.
Calculation for D̄ and SD:
Child Number   Digit recall: Before (X)   After (Y)   D = (X − Y)   D²
1 6 6 0 0
2 4 6 −2 4
3 5 4 1 1
4 7 7 0 0
5 6 6 0 0
6 4 5 −1 1
7 3 5 −2 4
8 7 9 −2 4
9 8 9 −1 1
10 4 7 −3 9
11 6 8 −2 4
12 5 7 −2 4
Total   ΣD = −14   ΣD² = 32

From above calculations, we have


D̄ = (1/n) ΣD = (1/12) × (−14) = −1.17
SD² = 1/(n−1) [ΣD² − (ΣD)²/n]
  = (1/11) [32 − (−14)²/12] = (1/11) × 15.67 = 1.42
SD = √1.42 = 1.19
Substituting these values in equation (3), we have
t = −1.17/(1.19/√12) = −1.17/0.34 = −3.44
The critical value of test statistic t for left-tailed test corresponding to (n−1) = 11 df at 5% level of significance is −t(n−1),α = −t(11),0.05 = −1.796.
Since calculated value of test statistic t (= −3.44) is less than the critical value
(=−1.796), that means calculated value of t lies in rejection region, so we reject
the null hypothesis and support the alternative hypothesis i.e. support the claim
at 5% level of significance.
Thus, we conclude that samples fail to provide us sufficient evidence against
the claim so we may assume that memory practice session improves the
performance of children.
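Example 5 can be cross-checked with scipy's paired t-test; the sketch below is illustrative (it assumes scipy 1.6 or later for the alternative argument). Full precision gives t ≈ −3.39 rather than the −3.44 obtained above with rounded intermediate values, but the decision is the same:

from scipy import stats

# Raw data of Example 5: digits recalled before and after practice
before = [6, 4, 5, 7, 6, 4, 3, 7, 8, 4, 6, 5]
after = [6, 6, 4, 7, 6, 5, 5, 9, 9, 7, 8, 7]

# alternative='less' tests H1: mu_D < 0, i.e. mu1 < mu2
t, p_value = stats.ttest_rel(before, after, alternative='less')

print(f"t = {t:.2f}, p-value = {p_value:.4f}")
# t ≈ -3.39 < -1.796, so H0 is rejected at the 5% level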
Example 6: Ten students were given a test in Statistics and after one month’s
coaching they were again given a test of the similar nature and the increase in
their marks in the second test over the first are shown below:
Roll No. 1 2 3 4 5 6 7 8 9 10
Increase in Marks 6 −2 8 −4 10 2 5 −4 6 0

Assuming that increment in marks follows normal distribution. Do the data


indicate that students have gained knowledge from the coaching at 1% level of
significance?
Solution: Here, we want to test that students have gained knowledge from the
coaching. If µD denotes the average increment in the marks due to one month's coaching then our claim is µD < 0; but here we are given the increments Dᵢ = (Yᵢ − Xᵢ) instead of Dᵢ = (Xᵢ − Yᵢ), so we take our claim as µD > 0 and its complement as µD ≤ 0. Since the complement contains the equality sign so we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H₀ : µD ≤ 0 and H₁ : µD > 0

Since the alternative hypothesis is right-tailed so the test is right-tailed test.


It is given that increment in marks after one month coaching follows normal
distribution and population variance is unknown. Also participants are same in
both situations before and after the coaching. And all the assumption of paired
t-test meet so we can go for paired t-test.
For testing H0, the test statistic is given by
t = D̄/(SD/√n) ~ t(n−1) under H₀ … (4)
Calculation for D̄ and SD:
Roll No.   1   2   3   4   5   6   7   8   9   10   Total
D          6   −2   8   −4   10   2   5   −4   6   0   ΣD = 27
D²         36   4   64   16   100   4   25   16   36   0   ΣD² = 301

From above calculations, we have


D̄ = (1/n) ΣD = (1/10) × 27 = 2.7
SD² = 1/(n−1) [ΣD² − (ΣD)²/n]
  = 1/(10 − 1) [301 − (27)²/10] = (1/9) × 228.1 = 25.34
SD = √25.34 = 5.03
Substituting these values in equation (4), we have
t = 2.7/(5.03/√10) = 2.7/1.59 = 1.70
The critical value of test statistic t for right-tailed test corresponding to (n−1) = 9 df at 1% level of significance is t(n−1),α = t(9),0.01 = 2.821.
Since calculated value of test statistic t (= 1.70) is less than the critical value
(= 2.821), that means calculated value of test statistic t lies in non-rejection
region, so we do not reject null hypothesis and reject the alternative hypothesis
i.e. we reject our claim at 1% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the claim so the students have not gained knowledge from the coaching.
Now, you can try the following exercises.
E6) To verify whether the programme “Post Graduate Diploma in Applied
Statistics (PGDAST)” improved performance of the graduate students
in Statistics, a similar test was given to10 participants both before and
after the programme. The original marks out of 100 (before course)
recorded in an alphabetical order of the participants are 42, 46, 50, 36,
44, 60, 62, 43, 70 and 53. After the course the marks in the same order
are 45, 46, 60, 42, 60, 72, 63, 43, 80 and 65. Assuming that marks of
the students before and after the course follow normal distribution. Test
whether the programme PGDAST has improved the performance of the
graduate students in Statistics at 5% level of significance?
E7) A drug is given to 8 patients and the increments in their blood pressure
are recorded to be 4, 0, 7, −2, 0, −3, 2, 0. Assume that increment in their
blood pressure follows normal distribution. Is it reasonable to believe
that the drug has no effect on the change of blood pressure at 5% level
of significance?

11.6 TESTING OF HYPOTHESIS FOR POPULATION CORRELATION COEFFICIENT USING t-TEST
In Unit 6 of MST-002, we have discussed the concept of correlation, where
we studied that if two variables are related in such a way that change in the
value of one variable affects the value of another variable then the variables are
said to be correlated or there is a correlation between these two variables.
Correlation can be positive, which means the variables move together in the
same direction, or negative, which means they move in opposite directions.
And correlation coefficient is used to measure the intensity or degree of linear
relationship between two variables. The value of correlation coefficient varies
between −1 and +1, with −1 representing a perfect negative correlation, 0
representing no correlation, and +1 representing a perfect positive correlation.
Sometimes, the sample data indicate a non-zero correlation even though in the population the variables are uncorrelated (ρ = 0).
For example, prices of tomato in Delhi (X) and in London (Y) are not correlated in the population (ρ = 0). But paired sample data of 20 days of prices of tomato at both places may show a correlation coefficient r ≠ 0. In general, r ≠ 0 in the sample data does not ensure that ρ ≠ 0 holds in the population.
In this section, we will see how to test the hypothesis that population
correlation coefficient is zero.
Assumptions
This test works under following assumptions:
(i) The characteristic under study follows normal distribution in both the
populations. In other words, both populations from which random
samples are drawn should be normal with respect to the characteristic of
interest.
(ii) Sample observations are random.
Let us consider a random sample (X1, Y1), (X2, Y2), …, (Xn, Yn) of size n taken
from a bivariate normal population. Let ρ and r be the correlation coefficients
of population and sample data respectively.
Here, we wish to test the hypothesis about population correlation coefficient
(ρ), that is, linear correlation between two variables X and Y in the population, so we can take the null hypothesis as
H₀ : ρ = 0 and H₁ : ρ ≠ 0 [for two-tailed test]
or
H₀ : ρ = 0 and H₁ : ρ > 0
or H₀ : ρ = 0 and H₁ : ρ < 0 [for one-tailed test]
[Here, θ = ρ and θ₀ = 0 if we compare it with the general procedure given in Section 11.2.]
For testing the null hypothesis, the test statistic t is given by
r n2
t ~ t  n 2 
1 r2
which follows t-distribution with n – 2 degrees of freedom.
After substituting values of r and n, we find out calculated value of test statistic
t. Then we look for critical (or cut-off or tabulated) value(s) of test statistic t
from the t-table. On comparing calculated value and critical value(s), we take
the decision about the null hypothesis as discussed in Section 11.2.
Let us do some examples of testing of hypothesis that population correlation
coefficient is zero.
Example 7: A random sample of 18 pairs of observations from a normal
population gave a correlation coefficient of 0.7. Test whether the population
correlation coefficient is zero at 5% level of significance.
Solution: Given that
n = 18, r = 0.7
Here, we wish to test that population correlation coefficient (ρ) is zero so our claim is ρ = 0 and its complement is ρ ≠ 0. Since the claim contains the equality sign so we can take the claim as the null hypothesis and complement as the alternative hypothesis. Thus,
H₀ : ρ = 0 and H₁ : ρ ≠ 0

Since the alternative hypothesis is two-tailed so the test is two-tailed test.


Here, we want to test the hypothesis regarding population correlation
coefficient is zero and the populations under study follow normal distributions,
so we can go for t-test.
For testing the null hypothesis, the test statistic t is given by
t = r√(n − 2) / √(1 − r²)
  = 0.7√(18 − 2) / √(1 − (0.7)²) = (0.7 × 4)/√0.51 = 2.8/0.71 = 3.94
The critical values of the test statistic t for the two-tailed test corresponding to (n − 2) = 16 df at 5% level of significance are ±t(n−2), α/2 = ±t(16), 0.025 = ±2.120.
Since the calculated value of the test statistic t (= 3.94) is greater than the critical value (= 2.120), the calculated value of the test statistic t lies in the rejection region, so we reject the null hypothesis, i.e. we reject the claim at 5% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the claim, so there exists a relationship between the two variables.
Example 8: A random sample of 15 married couples was taken from a
population consisting of married couples between the ages of 30 and 40. The
correlation coefficient between the IQs of husbands and wives was found to be
0.68. Assuming that the IQs of husbands and wives follow normal distributions, test whether the IQs of husbands and wives in the population are positively correlated at 1% level of significance.
Solution: Given that
n = 15, r = 0.68
Here, we wish to test that the IQs of husbands and wives in the population are positively correlated. If ρ denotes the correlation coefficient between the IQs of husbands and wives in the population, then the claim is ρ > 0 and its complement is ρ ≤ 0. Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : ρ ≤ 0 and H1 : ρ > 0
Since the alternative hypothesis is right-tailed, the test is a right-tailed test.
Here, we want to test the hypothesis that the population correlation coefficient is zero, and the populations under study follow normal distributions, so we can go for the t-test.
For testing the null hypothesis, the test statistic t is given by
t = r√(n − 2) / √(1 − r²)
  = 0.68√(15 − 2) / √(1 − (0.68)²) = (0.68 × 3.61)/0.73 = 3.36

The critical value of the test statistic t for the right-tailed test corresponding to (n − 2) = 13 df at 1% level of significance is t(n−2), α = t(13), 0.01 = 2.650.

Since the calculated value of the test statistic t (= 3.36) is greater than the critical value (= 2.650), the calculated value of the test statistic t lies in the rejection region, so we reject the null hypothesis and support the alternative hypothesis, i.e. we support our claim at 1% level of significance.
Thus, we conclude that the sample fails to provide us sufficient evidence against the claim, so we may assume that the correlation between the IQs of husbands and wives in the population is positive.
In the same way, you can try the following exercise.
E8) Twenty families were selected randomly from a colony to determine whether a correlation exists between family income and the amount of money spent per family member on food each month. The sample correlation coefficient was computed as r = 0.40. Assuming that the family income and the amount of money spent per family member on food each month follow normal distributions, test that there is a positive linear relationship between the family income and the amount of money spent per family member on food each month in the colony at 1% level of significance.
We now end this unit by giving a summary of what we have covered in it.

11.7 SUMMARY
In this unit, we have discussed the following points:
1. Need of small sample tests.
2. Procedure of testing a hypothesis for t-test.
3. Testing of hypothesis for population mean using t-test.
4. Testing of hypothesis for difference of two population means when samples
are independent using t-test.
5. The procedure of paired t-test for testing of hypothesis for difference of two
population means when samples are dependent or paired.
6. Testing of hypothesis for population correlation coefficient using t-test.

11.8 SOLUTIONS / ANSWERS


E1) Since the calculated value of the test statistic t is based on 15 df, we use the row for 15 df in the t-table and move across this row to find the values between which the calculated t-value lies. Since the calculated t-value falls between 2.131 and 2.602, which correspond to the one-tailed areas α = 0.025 and 0.01 respectively, the p-value lies between 0.01 and 0.025, that is, 0.01 ≤ p-value ≤ 0.025.
Since the test is two-tailed, these values are doubled, so
0.02 = 2 × 0.01 ≤ p-value ≤ 2 × 0.025 = 0.05
E2) Here, we are given that
n = 16, µ0 = 18000, X̄ = 20000, S = 6000
Here, we want to test the manufacturer's claim that the average life (µ) of the tyres is 18000 km. So the claim is µ = 18000 and its complement is µ ≠ 18000. Since the claim contains the equality sign, we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H0 : µ = µ0 = 18000 [average life of tyres is 18000 km]
H1 : µ ≠ 18000 [average life of tyres is not 18000 km]
Here, population SD is unknown and population under study is given to
be normal. So we can go for t-test.
For testing the null hypothesis, the test statistic t is given by
t = (X̄ − µ0)/(S/√n) = (20000 − 18000)/(6000/√16) = 2000/1500 = 1.33
The critical values of the test statistic t for the two-tailed test corresponding to (n − 1) = 15 df at 1% level of significance are ±t(15), 0.005 = ±2.947.
Since the calculated value of the test statistic t (= 1.33) is less than the critical value (= 2.947) and greater than the critical value (= −2.947), the calculated value of the test statistic lies in the non-rejection region, so we do not reject the null hypothesis, i.e. we support the manufacturer's claim at 1% level of significance.
Thus, we conclude that the sample fails to provide sufficient evidence against the claim, so we may assume that the manufacturer's claim is true.
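A quick computational cross-check of E2, as a sketch in Python (SciPy assumed):

from math import sqrt
from scipy import stats

n, mu0, xbar, s = 16, 18000, 20000, 6000
t = (xbar - mu0) / (s / sqrt(n))              # = 1.33
critical = stats.t.ppf(1 - 0.01 / 2, n - 1)   # = 2.947, two-tailed at 1%
print(t, critical, abs(t) >= critical)        # False: t lies in the non-rejection region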
E3) Here, we want to test that the average weight (µ) of all cadets of the centre is 60 kg. So our claim is µ = 60 and its complement is µ ≠ 60. Since the claim contains the equality sign, we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H0 : µ = µ0 = 60 [average weight of all cadets is 60 kg]
H1 : µ ≠ 60 [average weight of all cadets is not 60 kg]
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.
Here, population SD is unknown and population under study is given to
be normal. So we can go for t-test.
For testing the null hypothesis, the test statistic t is given by
t = (X̄ − µ0)/(S/√n) ~ t(n−1) under H0 … (5)
Before moving further, we first have to calculate the values of X̄ and S.
Calculation for X̄ and S:

Sample value (X)    (X − X̄)    (X − X̄)²
48                  −15        225
50                  −13        169
62                  −1         1
75                  12         144
80                  17         289
60                  −3         9
70                  7          49
56                  −7         49
52                  −11        121
77                  14         196
ΣX = 630                       Σ(X − X̄)² = 1252

From the above calculation, we have
X̄ = (1/n) ΣX = (1/10) × 630 = 63
S² = (1/(n − 1)) Σ(X − X̄)² = (1/(10 − 1)) × 1252 = 139.11
⇒ S = √139.11 = 11.79
Putting the values in equation (5), we have
t = (63 − 60)/(11.79/√10) = 3/3.73 = 0.80
The critical values of the test statistic t for the two-tailed test corresponding to (n − 1) = 9 df at 5% level of significance are ±t(9), 0.025 = ±2.262.
Since the calculated value of the test statistic t (= 0.80) is less than the critical value (= 2.262) and greater than the critical value (= −2.262), the calculated value of the test statistic t lies in the non-rejection region, so we do not reject H0, i.e. we support the claim at 5% level of significance.
Thus, we conclude that the sample fails to provide sufficient evidence against the claim, so we may assume that the average weight of all the cadets of the given centre is 60 kg.
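The same computation for E3 from the raw weights, as a sketch in Python (SciPy assumed):

import statistics
from math import sqrt
from scipy import stats

weights = [48, 50, 62, 75, 80, 60, 70, 56, 52, 77]
n = len(weights)
xbar = statistics.mean(weights)        # 63
s = statistics.stdev(weights)          # sample SD, 11.79
t = (xbar - 60) / (s / sqrt(n))        # 0.80
print(t, stats.t.ppf(0.975, n - 1))    # critical value 2.262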
E4) Here, we want to test that there is no difference between drugs A and B with regard to their mean weight increment. If µ1 and µ2 denote the mean weight increments due to drug A and drug B respectively, then our claim is µ1 = µ2 and its complement is µ1 ≠ µ2. Since the claim contains the equality sign, we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H0 : µ1 = µ2 [effect of both drugs is the same]
H1 : µ1 ≠ µ2 [effect of both drugs is not the same]
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.
Since it is given that the increments in weight due to both drugs follow normal distributions with equal and unknown variances, and the other assumptions of the t-test for testing a hypothesis about the difference of two population means are also met, we can go for this test.
For testing the null hypothesis, the test statistic t is given by
t = (X̄ − Ȳ) / (Sp √(1/n1 + 1/n2)) ~ t(n1+n2−2) under H0 … (6)

Assume, a = 8, b = 12 and use short-cut method to find X, Y and Sp .

Calculation for X, Y and S p :


Drug A Drug B

X d1 = (X-a) d1 2 Y d2 = (Y-b) d2 2
a=8 b = 12
5 ‒3 9 9 ‒3 9
8 0 0 10 ‒2 4
7 ‒1 1 15 3 9
10 2 4 12 0 0
9 1 1 14 2 4
6 ‒2 4 8 ‒4 16
12 0 0

d 1  3 d 2
1  19 d 2  4  d22  42
From the above calculation, we have
X̄ = a + (1/n1) Σd1 = 8 + (1/6)(−3) = 7.5,
Ȳ = b + (1/n2) Σd2 = 12 + (1/7)(−4) = 11.43
Sp² = (1/(n1 + n2 − 2)) [(Σd1² − (Σd1)²/n1) + (Σd2² − (Σd2)²/n2)]
    = (1/(6 + 7 − 2)) [(19 − (−3)²/6) + (42 − (−4)²/7)]
    = (1/11) [17.5 + 39.71] = 5.20
⇒ Sp = √5.20 = 2.28
Putting these values in equation (6), we have
t = (7.5 − 11.43)/(2.28 √(1/6 + 1/7)) = −3.93/(2.28 × 0.56) = −3.93/1.28 = −3.07
The critical values of the test statistic t for the two-tailed test corresponding to (n1 + n2 − 2) = 11 df at 5% level of significance are ±t(11), 0.025 = ±2.201.
Since the calculated value of the test statistic t (= −3.07) is less than the critical value (= −2.201), the calculated value of the test statistic t lies in the rejection region, so we reject the null hypothesis, i.e. we reject the claim at 5% level of significance.

Thus, we conclude that the samples provide us sufficient evidence against the claim, so drugs A and B differ significantly; one of them is better than the other.
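E4 can also be checked directly from the raw increments; a sketch in Python (SciPy assumed). SciPy's ttest_ind, with its default equal-variance assumption, performs the same pooled t-test:

import statistics
from math import sqrt
from scipy import stats

a = [5, 8, 7, 10, 9, 6]                   # weight increments under drug A
b = [9, 10, 15, 12, 14, 8, 12]            # weight increments under drug B
n1, n2 = len(a), len(b)
sp = sqrt(((n1 - 1) * statistics.variance(a)
           + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2))
t = (statistics.mean(a) - statistics.mean(b)) / (sp * sqrt(1 / n1 + 1 / n2))
print(t)                                  # about -3.1 (text: -3.07, by rounding)
print(stats.ttest_ind(a, b))              # same pooled t-test, with exact p-value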
E5) Here, we are given that
n1 = 13, X̄ = 4.6, S1 = 0.5,
n2 = 13, Ȳ = 5.0, S2 = 0.3
Therefore, the pooled variance Sp² can be calculated as
Sp² = (1/(n1 + n2 − 2)) [(n1 − 1)S1² + (n2 − 1)S2²]
    = (1/(13 + 13 − 2)) [12 × (0.5)² + 12 × (0.3)²]
    = (1/24) [3.00 + 1.08] = 0.17
⇒ Sp = √0.17 = 0.41

We want to test that there is significant improvement in wheat production due to the fertilizer. If µ1 and µ2 denote the mean wheat productions without and with the fertilizer respectively, then our claim is µ1 < µ2 and its complement is µ1 ≥ µ2. Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : µ1 ≥ µ2 and H1 : µ1 < µ2
Since the alternative hypothesis is left-tailed, the test is a left-tailed test.
Since it is given that the yields of wheat with and without the fertilizer follow normal distributions with equal and unknown variances, and the other assumptions of the t-test for testing a hypothesis about the difference of two population means are also met, we can go for this test.
For testing the null hypothesis, the test statistic t is given by
t = (X̄ − Ȳ) / (Sp √(1/n1 + 1/n2)) ~ t(n1+n2−2) under H0
  = (4.6 − 5.0)/(0.41 √(1/13 + 1/13)) = −0.4/(0.41 × 0.39) = −0.4/0.16 = −2.5
The critical value of the test statistic t for the left-tailed test corresponding to (n1 + n2 − 2) = 24 df at 1% level of significance is −t(24), 0.01 = −2.492.
Since the calculated value of the test statistic t (= −2.5) is less than the critical value (= −2.492), the calculated value of the test statistic t lies in the rejection region, so we reject the null hypothesis and support the alternative hypothesis, i.e. we support our claim at 1% level of significance.

Thus, we conclude that the samples fail to provide us sufficient evidence against the claim, so we may assume that there is significant improvement in wheat production due to the fertilizer.
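A sketch of E5's arithmetic in Python (SciPy assumed):

from math import sqrt
from scipy import stats

n1, xbar, s1 = 13, 4.6, 0.5
n2, ybar, s2 = 13, 5.0, 0.3
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))   # 0.41
t = (xbar - ybar) / (sp * sqrt(1 / n1 + 1 / n2))                   # about -2.5
print(t, stats.t.ppf(0.01, n1 + n2 - 2))                           # critical -2.492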
E6) Here, we want to test whether the programme PGDAST has improved the performance of the graduate students in Statistics. If µ1 and µ2 denote the average marks before and after the programme, then our claim is µ1 < µ2 and its complement is µ1 ≥ µ2. Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : µ1 ≥ µ2 and H1 : µ1 < µ2
Since the alternative hypothesis is left-tailed, the test is a left-tailed test.
This is a before-and-after situation. Also, the marks of the students before and after the programme PGDAST follow normal distributions, so the population of differences will also be normal. Since all the assumptions of the paired t-test are met, we can go for the paired t-test.
For testing the null hypothesis, the test statistic t is given by
t = D̄/(SD/√n) ~ t(n−1) under H0 … (7)
Calculation for D̄ and SD:

Participant    Marks Before (X)    Marks After (Y)    D = X − Y    D²
1              42                  45                 −3           9
2              46                  46                 0            0
3              50                  60                 −10          100
4              36                  42                 −6           36
5              44                  60                 −16          256
6              60                  72                 −12          144
7              62                  63                 −1           1
8              43                  43                 0            0
9              70                  80                 −10          100
10             53                  65                 −12          144
                                                      ΣD = −70     ΣD² = 790
From the above calculation, we have
D̄ = (1/n) ΣD = (1/10) × (−70) = −7
SD² = (1/(n − 1)) [ΣD² − (ΣD)²/n] = (1/9) [790 − (−70)²/10] = (1/9) × 300 = 33.33
⇒ SD = √33.33 = 5.77

Putting the values in equation (7), we have
t = −7.0/(5.77/√10) = −3.83
The critical value of the test statistic t for the left-tailed test corresponding to (n − 1) = 9 df at 5% level of significance is −t(9), 0.05 = −1.833.
Since the calculated value of the test statistic t (= −3.83) is less than the critical (tabulated) value (= −1.833), the calculated value of the test statistic t lies in the rejection region, so we reject the null hypothesis and support the alternative hypothesis, i.e. we support our claim at 5% level of significance.
Thus, we conclude that the samples fail to provide us sufficient evidence against the claim, so we may assume that the participants show significant improvement after the programme “Post Graduate Diploma in Applied Statistics (PGDAST)”.
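E6 is exactly the paired t-test that SciPy's ttest_rel implements; a sketch, assuming a recent SciPy where the alternative argument is available:

from scipy import stats

before = [42, 46, 50, 36, 44, 60, 62, 43, 70, 53]
after = [45, 46, 60, 42, 60, 72, 63, 43, 80, 65]
# Left-tailed paired t-test of H1: mean(before - after) < 0
result = stats.ttest_rel(before, after, alternative="less")
print(result.statistic, result.pvalue)    # t = -3.83, p-value well below 0.05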
E7) Here, we want to test that the drug has no effect on the change in blood pressure. If µD denotes the average change in blood pressure before and after the drug, then our claim is µD = 0 and its complement is µD ≠ 0. Since the claim contains the equality sign, we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H0 : µD = µ1 − µ2 = 0 [the drug has no effect]
H1 : µD ≠ 0 [the drug has an effect]


It is given that the change in blood pressure follows a normal distribution. Also, the patients are the same in both situations, before and after the drug, and all the assumptions of the paired t-test are met. So we can go for the paired t-test.
For testing H0, the test statistic is given by
t = D̄/(SD/√n) ~ t(n−1) under H0 … (8)
Calculation for D̄ and SD:

Patient Number    1    2    3    4    5    6    7    8    Total
D                 4    0    7    −2   0    −3   2    0    ΣD = 8
(D − D̄)           3    −1   6    −3   −1   −4   1    −1
(D − D̄)²          9    1    36   9    1    16   1    1    Σ(D − D̄)² = 74

From the above calculation, we have
D̄ = (1/n) ΣD = (1/8) × 8 = 1
SD² = (1/(n − 1)) Σ(D − D̄)² = (1/7) × 74 = 10.57
⇒ SD = √10.57 = 3.25
Putting the values in the test statistic, we have
t = 1/(3.25/√8) = 1/1.15 = 0.87
The critical values of the test statistic t for the two-tailed test corresponding to (n − 1) = 7 df at 5% level of significance are ±t(7), 0.025 = ±2.365.
Since the calculated value of the test statistic t (= 0.87) is less than the critical value (= 2.365) and greater than the critical value (= −2.365), the calculated value of the test statistic t lies in the non-rejection region, so we do not reject the null hypothesis, i.e. we support the claim at 5% level of significance.
Thus, we conclude that the sample fails to provide us sufficient evidence against the claim, so we may assume that the drug has no effect on the change in blood pressure of patients.
E8) We are given that
n = 20, r = 0.40
and we wish to test that there is a positive linear relationship between the family income and the amount of money spent per family member on food each month in the colony. If ρ denotes the correlation coefficient between the family income and the amount of money spent per family member, then the claim is ρ > 0 and its complement is ρ ≤ 0. Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : ρ ≤ 0 and H1 : ρ > 0
Since the alternative hypothesis is right-tailed, the test is a right-tailed test.
For testing the null hypothesis, the test statistic t is given by
t = r√(n − 2) / √(1 − r²)
  = 0.40√(20 − 2) / √(1 − (0.40)²) = (0.40 × 4.24)/0.92 = 1.84

The critical value of the test statistic t for the right-tailed test corresponding to (n − 2) = 18 df at 1% level of significance is t(n−2), α = t(18), 0.01 = 2.552.

Since the calculated value of the test statistic t (= 1.84) is less than the critical value (= 2.552), the calculated value of the test statistic t lies in the non-rejection region, so we do not reject the null hypothesis and reject the alternative hypothesis, i.e. we reject our claim at 1% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the claim, so there is no positive linear correlation between the family income and the amount of money spent per family member on food each month in the colony.
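A computational cross-check of E8, as a sketch in Python (SciPy assumed):

from math import sqrt
from scipy import stats

r, n = 0.40, 20
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)    # about 1.85 (text: 1.84, by rounding)
critical = stats.t.ppf(1 - 0.01, n - 2)   # right-tailed critical value, 2.552
print(t, critical, t >= critical)         # False: H0 is not rejected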

UNIT 12 CHI-SQUARE AND F-TESTS
Structure
12.1 Introduction
Objectives
12.2 Testing of Hypothesis for Population Variance Using χ2-Test
12.3 Testing of Hypothesis for Two Population Variances Using F-Test
12.4 Summary
12.5 Solutions / Answers

12.1 INTRODUCTION
Recall from the previous unit that when we test the hypothesis about the difference of means of two populations, the t-test needs the assumption of equality of the variances of the two populations under study. Apart from this, there are situations where we want to test a hypothesis about the variances of two populations. For example, an economist may want to test whether the variability in incomes differs in two populations. In such situations, we use the F-test when the populations under study follow normal distributions.

Similarly, there are many other situations where we need to test a hypothesis about a hypothetical or specified value of the variance of the population under study. For example, the manager of an electric bulb company would probably be interested in whether or not the variability in the life of the bulbs is within acceptable limits, and the product controller of a milk company may be interested in whether the variance of the amount of fat in the whole milk processed by the company is no more than a specified level. In such situations, we use the χ²-test when the population under study follows the normal distribution.
This unit is divided into five sections. Section 12.1 describes the need for the χ² and F-tests. The χ²-test for testing the hypothesis about the population variance is discussed in Section 12.2, and the F-test for equality of variances of two populations is discussed in Section 12.3. The unit ends by providing a summary of what we have discussed in Section 12.4 and solutions of the exercises in Section 12.5.
Objectives
After studying this unit, you should be able to:

 describe the testing of hypothesis for population variance; and

 explain the testing of hypothesis for two population variances.

12.2 TESTING OF HYPOTHESIS FOR POPULATION VARIANCE USING χ²-TEST
In Section 11.3 of the previous unit, we discussed testing of hypothesis for the population mean when the characteristic under study follows the normal distribution. But when analysing quantitative data, it is often important to draw conclusions about the variability as well as the average of the characteristic under study. For example, if a company manufactures electric bulbs, then the manager of the company would probably be interested in determining the average life of the bulbs and also in determining whether or not the variability in the life of the bulbs is within acceptable limits; the product controller of a milk company may be interested in whether the variance of the amount of fat in the whole milk processed by the company is no more than a specified level; etc.
The procedure of testing a hypothesis for population variance or standard
deviation is similar to the testing of population mean. The basic difference is
that here we use chi-square test instead of t-test because here the sampling
distribution of test statistic follows the chi-square distribution.
Assumptions
This test works under the following assumptions:
(i) The characteristic under study follows normal distribution. In other
words, populations from which random sample is drawn should be
normal with respect to the characteristic of interest.
(ii) Sample observations are random and independent.
Let X1, X2, ..., Xn be a random sample of size n drawn from a normal population with mean µ and variance σ², where µ and σ² are unknown.
The general procedure of this test is explained below in detail:
As we have been doing so far in all tests, the first step in hypothesis testing problems is to set up the null and alternative hypotheses. Here, we want to test the claim about a hypothesized or specified value σ0² of the population variance σ², so we can take our null and alternative hypotheses as
H0 : σ² = σ0² and H1 : σ² ≠ σ0² [for two-tailed test]
or
H0 : σ² ≤ σ0² and H1 : σ² > σ0²
H0 : σ² ≥ σ0² and H1 : σ² < σ0² [for one-tailed test]
For testing the null hypothesis, the test statistic χ² is given by
χ² = Σ(Xi − X̄)²/σ0² = (n − 1)S²/σ0² ~ χ²(n−1) under H0 ... (1)
where, S² = (1/(n − 1)) Σ(X − X̄)²
Here, the test statistic χ² follows the chi-square distribution with (n − 1) degrees of freedom, as we have discussed in Unit 3 of this course.
After substituting the values of n, S² and σ0², we get the calculated value of the test statistic. Let χ²cal be the calculated value of the test statistic χ².
Obtain the critical value(s) or cut-off value(s) in the sampling distribution of the test statistic χ² and construct the rejection (critical) region of size α. The critical values of the test statistic χ² for various df and different levels of significance α are given in Table III of the Appendix at the end of Block 1 of this course.
After doing all the calculations discussed above, we have to take the decision about rejection or non-rejection of the null hypothesis. The procedure of taking the decision about the null hypothesis is explained below:
For one-tailed test:
Case I: When H0 : σ² ≤ σ0² and H1 : σ² > σ0² (right-tailed test)
In this case, the rejection (critical) region falls under the right tail of the probability curve of the sampling distribution of the test statistic χ². Suppose χ²(ν), α is the critical value at α level of significance, where ν = n − 1, so the entire region greater than or equal to χ²(ν), α is the rejection region and less than χ²(ν), α is the non-rejection region, as shown in Fig. 12.1.
If χ²cal ≥ χ²(ν), α, that means the calculated value of the test statistic lies in the rejection (critical) region, then we reject the null hypothesis H0 at α level of significance. Therefore, we conclude that the sample data provide us sufficient evidence against the null hypothesis and there is a significant difference between the hypothesized value and the observed value of the population variance σ².
If χ²cal < χ²(ν), α, that means the calculated value of the test statistic lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance. Therefore, we conclude that the sample data fail to provide us sufficient evidence against the null hypothesis and the difference between the hypothesized value and the observed value of the population variance σ² is due to fluctuation of sampling.
Case II: When H0 : σ² ≥ σ0² and H1 : σ² < σ0² (left-tailed test)
In this case, the rejection (critical) region falls under the left tail of the probability curve of the sampling distribution of the test statistic χ². Suppose χ²(ν), (1−α) is the critical value at α level of significance; then the entire region less than or equal to χ²(ν), (1−α) is the rejection (critical) region and greater than χ²(ν), (1−α) is the non-rejection region, as shown in Fig. 12.2.
If χ²cal ≤ χ²(ν), (1−α), that means the calculated value of the test statistic lies in the rejection (critical) region, then we reject the null hypothesis H0 at α level of significance.
If χ²cal > χ²(ν), (1−α), that means the calculated value of the test statistic lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance.
For two-tailed test:
When H0 : σ² = σ0² and H1 : σ² ≠ σ0²
In this case, the rejection region falls under both tails of the probability curve of the sampling distribution of the test statistic χ², and half the area (α), i.e. α/2, of the rejection (critical) region lies in the left tail and the other half in the right tail. Suppose χ²(ν), (1−α/2) and χ²(ν), α/2 are the two critical values at the left tail and right tail respectively at the pre-fixed α level of significance. Therefore, the entire region less than or equal to χ²(ν), (1−α/2) and greater than or equal to χ²(ν), α/2 is the rejection (critical) region, and between χ²(ν), (1−α/2) and χ²(ν), α/2 is the non-rejection region, as shown in Fig. 12.3.
If χ²cal ≥ χ²(ν), α/2 or χ²cal ≤ χ²(ν), (1−α/2), that means the calculated value of the test statistic χ² lies in the rejection (critical) region, then we reject the null hypothesis H0 at α level of significance.
If χ²(ν), (1−α/2) < χ²cal < χ²(ν), α/2, that means the calculated value of the test statistic χ² lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance.
Procedure of taking the decision about the null hypothesis on the basis of p-value:
You have done this in so many tests that it has become routine for you. You know that to take the decision about the null hypothesis on the basis of the p-value, the p-value is compared with the level of significance (α): if the p-value is less than or equal to α, we reject the null hypothesis, and if the p-value is greater than α, we do not reject the null hypothesis.
For the χ²-test, the p-value is defined as:
For one-tailed test:
For H1 : σ² > σ0² (right-tailed test)
p-value = P[χ² ≥ χ²cal]
For H1 : σ² < σ0² (left-tailed test)
p-value = P[χ² ≤ χ²cal]
For two-tailed test: H1 : σ² ≠ σ0²
the p-value is approximated as
p-value = 2P[χ² ≥ χ²cal]
The p-value for the χ²-test can be obtained with the help of Table III (χ²-table) given in the Appendix at the end of Block 1 of this course. Similar to the t-test, this table gives the χ² values corresponding to standard values of α such as 0.995, 0.99, 0.10, 0.05, 0.025, 0.01, etc. only. Therefore, the exact p-value cannot be obtained with the help of this table, and we can only approximate the p-value for this test.
For example, if the test is right-tailed and the calculated (observed) value of the test statistic χ² is 25.10 with 12 df, then the p-value is calculated as:
Since the test statistic is based on 12 df, we use the row for 12 df in the χ²-table and move across this row to find the values between which the calculated χ²-value falls. Since the calculated χ²-value falls between 23.34 and 26.22, corresponding to one-tailed areas α = 0.025 and 0.01 respectively, the p-value lies between 0.01 and 0.025, that is,
0.01 ≤ p-value ≤ 0.025
If in the above example the test were two-tailed, then the two values 0.01 and 0.025 would be doubled for the p-value, that is,
2 × 0.01 = 0.02 ≤ p-value ≤ 0.05 = 2 × 0.025
Note 1: With the help of computer packages and software such as SPSS, SAS, MINITAB, EXCEL, etc., we can find the exact p-value for the χ²-test.
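For instance, a one-line sketch in Python (SciPy assumed) for the right-tailed example above:

from scipy import stats

p_value = stats.chi2.sf(25.10, 12)   # right-tailed area beyond 25.10 with 12 df
print(p_value)                        # about 0.015, inside the bracket (0.01, 0.025)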
Let us do some examples to become more familiar with the test explained above.
Example 1: The variance of a certain dimension article produced by a machine is 7.2 over a long period. A random sample of 20 articles gave a variance of 8. Is it justifiable to conclude that the variability has increased at 5% level of significance, assuming that the measurement of the dimension article is normally distributed?
Solution: Here, we are given that
Sample size = n = 20
Sample variance = S² = 8
Specified value of population variance under test = σ0² = 7.2
Here, we want to test that the variability of the dimension article produced by the machine has increased. Since variability is measured in terms of the variance (σ²), our claim is σ² > 7.2 and its complement is σ² ≤ 7.2. Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : σ² ≤ σ0² = 7.2
H1 : σ² > 7.2 [variability of dimension article has increased]
Since the alternative hypothesis is right-tailed, the test is a right-tailed test.
Here, we want to test the hypothesis about the population variance, and the sample size is small, n = 20 (< 30). Also, we are given that the measurement of the dimension article follows the normal distribution, so we can go for the χ²-test for population variance.
So, the test statistic is given by
χ² = (n − 1)S²/σ0² ~ χ²(n−1) under H0
   = (19 × 8)/7.2 = 21.11
The critical (tabulated) value of the test statistic χ² for the right-tailed test corresponding to (n − 1) = 19 df at 5% level of significance is χ²(n−1), α = χ²(19), 0.05 = 30.14.
Since the calculated value of the test statistic (= 21.11) is less than the critical (tabulated) value (= 30.14), the calculated value of the test statistic lies in the non-rejection region, as shown in Fig. 12.4, so we do not reject the null hypothesis and reject the alternative hypothesis, i.e. we reject our claim at 5% level of significance.
Decision according to p-value:
Since the test statistic is based on 19 df, we use the row for 19 df in the χ²-table and move across this row to find the values between which the calculated χ²-value falls. Since the calculated χ²-value falls between 11.65 and 27.20, corresponding to one-tailed areas α = 0.90 and 0.10 respectively, the p-value lies between 0.10 and 0.90, that is,
0.10 ≤ p-value ≤ 0.90
Since the p-value is greater than α (= 0.05), we do not reject the null hypothesis at 5% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the claim, so we may assume that the variability of the dimension article produced by the machine has not increased.
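Example 1 can be verified with a short sketch in Python (SciPy assumed):

from scipy import stats

n, s2, sigma0_sq, alpha = 20, 8.0, 7.2, 0.05
chi2 = (n - 1) * s2 / sigma0_sq              # 21.11
critical = stats.chi2.ppf(1 - alpha, n - 1)  # 30.14 for the right-tailed test
p_value = stats.chi2.sf(chi2, n - 1)         # roughly 0.3, consistent with the bracket
print(chi2, critical, p_value)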
Example 2: The 12 measurements of the same object on an instrument are given below:
1.6, 1.5, 1.3, 1.5, 1.7, 1.6, 1.5, 1.4, 1.6, 1.3, 1.5, 1.5
If the measurement of the instrument follows the normal distribution, then carry out the test at 5% level of significance that the variance in the measurement of the instrument is less than 0.016.
Solution: Here, we are given
Sample size = n = 12
Specified value of population variance under test = σ0² = 0.016
Here, we want to test that the variance (σ²) in the measurements of the instrument is less than 0.016. So our claim is σ² < 0.016 and its complement is σ² ≥ 0.016. Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : σ² ≥ σ0² = 0.016 and H1 : σ² < 0.016
Since the alternative hypothesis is left-tailed, the test is a left-tailed test.
Here, we want to test the hypothesis about the population variance, and the sample size is small, n = 12 (< 30). Also, we are given that the measurement of the instrument follows the normal distribution, so we can go for the χ²-test for population variance.
For testing the null hypothesis, the test statistic χ² is given by
χ² = Σ(Xi − X̄)²/σ0² ~ χ²(n−1) under H0 ... (2)
Calculation for Σ(Xi − X̄)²:

X      (X − X̄)    (X − X̄)²
1.6    0.1        0.01
1.5    0          0
1.3    −0.2       0.04
1.5    0          0
1.7    0.2        0.04
1.6    0.1        0.01
1.5    0          0
1.4    −0.1       0.01
1.6    0.1        0.01
1.3    −0.2       0.04
1.5    0          0
1.5    0          0
ΣX = 18    Σ(X − X̄) = 0    Σ(X − X̄)² = 0.16

From the above calculation, we have
X̄ = (1/n) ΣX = (1/12) × 18 = 1.5
Putting the values of Σ(X − X̄)² and σ0² in equation (2), we have
χ² = Σ(Xi − X̄)²/σ0² = 0.16/0.016 = 10
The critical value of the test statistic χ² for the left-tailed test corresponding to (n − 1) = 11 df at 5% level of significance is χ²(n−1), (1−α) = χ²(11), 0.95 = 4.57.

Since the calculated value of the test statistic (= 10) is greater than the critical value (= 4.57), the calculated value of the test statistic lies in the non-rejection region, as shown in Fig. 12.5, so we do not reject the null hypothesis and reject the alternative hypothesis, i.e. we reject the claim at 5% level of significance.
Thus, we conclude that the sample provides sufficient evidence against the claim, so the variance in the measurement of the instrument is not less than 0.016.
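Example 2 from the raw measurements, as a sketch in Python (SciPy assumed):

import statistics
from scipy import stats

x = [1.6, 1.5, 1.3, 1.5, 1.7, 1.6, 1.5, 1.4, 1.6, 1.3, 1.5, 1.5]
n = len(x)
ss = sum((xi - statistics.mean(x)) ** 2 for xi in x)   # 0.16
chi2 = ss / 0.016                                      # 10.0
critical = stats.chi2.ppf(0.05, n - 1)                 # left-tailed cut-off, 4.57
print(chi2, critical, chi2 <= critical)                # False: H0 is not rejected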
In the same way, you can try the following exercises.

E1) An ambulance agency claims that the standard deviation of the lengths of serving times is less than 15 minutes. An investigator suspects that this claim is wrong and takes a random sample of 20 serving times, which has a standard deviation of 17 minutes. Assume that the serving time of the ambulance follows the normal distribution. Test at α = 0.01: is there enough evidence to reject the agency's claim?
E2) A cigarette manufacturer claims that the variance of nicotine content of
its cigarettes is 0.62. Nicotine content is measured in milligrams and is
normally distributed. A sample of 25 cigarettes has a variance of 0.65.
Test the manufacturer’s claim at 5% level of significance.

12.3 TESTING OF HYPOTHESIS FOR TWO POPULATION VARIANCES USING F-TEST
In Section 12.1, we have already mentioned that before applying the t-test for the difference of two population means, one of the requirements is to check the equality of the variances of the two populations. This assumption can be checked with the help of the F-test for two population variances. This F-test is also important in a number of contexts. For example, an economist may want to test whether the variability in incomes differs in two populations, a quality controller may want to test whether the quality of the product is changing over time, etc.
[Note: Some authors refer to this test as the “homoscedastic test”, because two populations with equal variances are called homoscedastic. This word is derived from two words: homo means “the same” and scedastic means “variability”.]
Assumptions
The assumptions of the F-test for testing the variances of two populations are:
1. The populations from which the samples are drawn must be normally distributed.
2. The samples must be independent of each other.
Now, we come to the general procedure of this test.
Let X1, X2, ..., Xn1 be a random sample of size n1 from a normal population with mean µ1 and variance σ1². Similarly, let Y1, Y2, ..., Yn2 be a random sample of size n2 from another normal population with mean µ2 and variance σ2². Here, we want to test the hypothesis about the two population variances, so we can take our null and alternative hypotheses as
H0 : σ1² = σ2² = σ² and H1 : σ1² ≠ σ2² [for two-tailed test]
or
H0 : σ1² ≤ σ2² and H1 : σ1² > σ2²
H0 : σ1² ≥ σ2² and H1 : σ1² < σ2² [for one-tailed test]
For testing the null hypothesis, the test statistic F is given by
F = S1²/S2² ~ F(n1−1, n2−1) under H0 … (3)
where, S1² = (1/(n1 − 1)) Σ(X − X̄)² and S2² = (1/(n2 − 1)) Σ(Y − Ȳ)².
For computational simplicity, we can also write
S1² = (1/(n1 − 1)) [ΣX² − (ΣX)²/n1] and S2² = (1/(n2 − 1)) [ΣY² − (ΣY)²/n2]
The test statistic F follows the F-distribution with ν1 = (n1 − 1) and ν2 = (n2 − 1) degrees of freedom, as discussed in Unit 4 of this course.
After substituting the values of S1² and S2², we get the calculated value of the test statistic. Let Fcal be the calculated value of the test statistic F.
Obtain the critical value(s) or cut-off value(s) in the sampling distribution of the test statistic F and construct the rejection (critical) region of size α. The critical values of the test statistic F for various df and different levels of significance α are given in Table IV (F-table) of the Appendix at the end of Block 1 of this course.
After doing all the calculations discussed above, we have to take the decision about rejection or non-rejection of the null hypothesis. This is explained below:
In case of one-tailed test:
Case I: When H0 : σ1² ≤ σ2² and H1 : σ1² > σ2² (right-tailed test)
In this case, the rejection (critical) region falls at the right side of the probability curve of the sampling distribution of the test statistic F.
Suppose F(ν1, ν2), α is the critical value of the test statistic F with (ν1 = n1 − 1, ν2 = n2 − 1) df at α level of significance, so the entire region greater than or equal to F(ν1, ν2), α is the rejection (critical) region and less than F(ν1, ν2), α is the non-rejection region, as shown in Fig. 12.6.
If Fcal ≥ F(ν1, ν2), α, that means the calculated value of the test statistic lies in the rejection (critical) region, then we reject the null hypothesis H0 at α level of significance. Therefore, we conclude that the sample data provide us sufficient evidence against the null hypothesis and there is a significant difference between the population variances.
If Fcal < F(ν1, ν2), α, that means the calculated value of the test statistic lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance. Therefore, we conclude that the sample data fail to provide us sufficient evidence against the null hypothesis and the difference between the population variances is due to fluctuation of sampling.
Case II: When H0 : σ1² ≥ σ2² and H1 : σ1² < σ2² (left-tailed test)
In this case, the rejection (critical) region falls at the left side of the probability curve of the sampling distribution of the test statistic F.
Suppose F(ν1, ν2), (1−α) is the critical value at α level of significance; then the entire region less than or equal to F(ν1, ν2), (1−α) is the rejection (critical) region and greater than F(ν1, ν2), (1−α) is the non-rejection region, as shown in Fig. 12.7.
If Fcal ≤ F(ν1, ν2), (1−α), that means the calculated value of the test statistic lies in the rejection (critical) region, then we reject the null hypothesis H0 at α level of significance.
If Fcal > F(ν1, ν2), (1−α), that means the calculated value of the test statistic lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance.
Note 2: The F-table mentioned above gives only the right-tailed critical values for different degrees of freedom at different levels of significance. The left-tailed critical value of the F-test can always be obtained by the formula given below (as described in Unit 4 of this course):
F(ν1, ν2), (1−α) = 1 / F(ν2, ν1), α
In case of two-tailed test:
When H0 : σ1² = σ2² and H1 : σ1² ≠ σ2²
In this case, the rejection (critical) region falls at both sides of the probability curve of the sampling distribution of the test statistic F, and half the area (α), i.e. α/2, of the rejection (critical) region lies in the left tail and the other half in the right tail.
Suppose F(ν1, ν2), (1−α/2) and F(ν1, ν2), α/2 are the two critical values at the left tail and right tail respectively at the pre-fixed α level of significance. Therefore, the entire region less than or equal to F(ν1, ν2), (1−α/2) and greater than or equal to F(ν1, ν2), α/2 is the rejection (critical) region, and between F(ν1, ν2), (1−α/2) and F(ν1, ν2), α/2 is the non-rejection region, as shown in Fig. 12.8.
If Fcal ≥ F(ν1, ν2), α/2 or Fcal ≤ F(ν1, ν2), (1−α/2), that means the calculated value of the test statistic lies in the rejection (critical) region, then we reject the null hypothesis H0 at α level of significance.
If F(ν1, ν2), (1−α/2) < Fcal < F(ν1, ν2), α/2, that means the calculated value of the test statistic F lies in the non-rejection region, then we do not reject the null hypothesis H0 at α level of significance.
Procedure of taking the decision about the null hypothesis on the basis of p-value:
To take the decision about the null hypothesis on the basis of the p-value, the p-value is compared with the level of significance (α): if the p-value is less than or equal to α, we reject the null hypothesis, and if the p-value is greater than α, we do not reject the null hypothesis.
For the F-test, the p-value can be defined as:
For one-tailed test:
For H1 : σ1² > σ2² (right-tailed test)
p-value = P[F ≥ Fcal]
For H1 : σ1² < σ2² (left-tailed test)
p-value = P[F ≤ Fcal]
For two-tailed test: H1 : σ1² ≠ σ2²
the p-value is approximated as
p-value = 2P[F ≥ Fcal]
The p-value for the F-test can be obtained with the help of Table IV (F-table) given in the Appendix at the end of Block 1 of this course. Similar to the t-test or χ²-test, this table gives F values corresponding to standard values of α such as 0.10, 0.05, 0.025 and 0.01 only. Therefore, with the help of the F-table we cannot find the exact p-value, but we can approximate the p-value for this test.
For example, if the test is right-tailed and the calculated (observed) value of the test statistic F is 2.65 with 24 degrees of freedom for the numerator and 14 degrees of freedom for the denominator, then we can find the p-value as:
Since the test statistic is based on (24, 14) df, we move across the tabulated F values corresponding to (24, 14) df at α = 0.10, 0.05, 0.025 and 0.01 and find the values between which the calculated F-value falls. The tabulated F values with (24, 14) df are

α =             0.10    0.05    0.025    0.01
F(24, 14), α    1.94    2.35    2.79     3.43

Since the calculated F-value (= 2.65) falls between 2.35 and 2.79, corresponding to one-tailed areas α = 0.05 and 0.025 respectively, the p-value lies between 0.025 and 0.05, that is,
0.025 ≤ p-value ≤ 0.05
If in the above example the test were two-tailed, then the two values 0.025 and 0.05 would be doubled for the p-value, that is,
0.05 ≤ p-value ≤ 0.10
Note 3: With the help of computer packages and software such as SPSS, SAS,
MINITAB, EXCEL, etc. we can find the exact p-value for F-test.
Let us do some examples based on this test.

Example 3: The following data relate to the number of items produced in a shift by two workers A and B for some days:
A 26 37 40 35 30 30 40 26 30 35 45
B 19 22 24 27 24 18 20 19 25

Assuming that the parent populations are normal, can it be inferred that B is a more stable (or consistent) worker compared to A?
Solution: Here, we want to test that worker B is more stable than worker A. As we know, the stability of data is related to the variance of the data: a smaller value of the variance implies that the data are more stable. Therefore, to compare the stability of the two workers, it is enough to compare their variances. If σ1² and σ2² denote the variances of worker A and worker B respectively, then our claim is σ1² > σ2² and its complement is σ1² ≤ σ2². Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : σ1² ≤ σ2²
H1 : σ1² > σ2² [worker B is more stable than worker A]
Since the alternative hypothesis is right-tailed, the test is a right-tailed test.
Here, we want to test the hypothesis about two population variances, and the sample sizes n1 = 11 (< 30) and n2 = 9 (< 30) are small. Also, the populations under study are normal and both samples are independent, so we can go for the F-test for two population variances.
For testing the null hypothesis, the test statistic is given by
F = S1²/S2² ~ F(n1−1, n2−1) … (4)
where, S1² = (1/(n1 − 1)) Σ(Xi − X̄)² and S2² = (1/(n2 − 1)) Σ(Yi − Ȳ)²

Calculation for S1² and S2²:

Items Produced by A       (X − X̄)    (X − X̄)²    Items Produced by B       (Y − Ȳ)    (Y − Ȳ)²
(Variable X, X̄ = 34)                             (Variable Y, Ȳ = 22)
26                        −8         64          19                        −3         9
37                        3          9           22                        0          0
40                        6          36          24                        2          4
35                        1          1           27                        5          25
30                        −4         16          24                        2          4
30                        −4         16          18                        −4         16
40                        6          36          20                        −2         4
26                        −8         64          19                        −3         9
30                        −4         16          25                        3          9
35                        1          1
45                        11         121
Total = 374               0          380         198                       0          80

Therefore, we have
X̄ = (1/n1) ΣX = (1/11) × 374 = 34
and
Ȳ = (1/n2) ΣY = (1/9) × 198 = 22
Thus,
S1² = (1/(n1 − 1)) Σ(X − X̄)² = (1/10) × 380 = 38
S2² = (1/(n2 − 1)) Σ(Y − Ȳ)² = (1/8) × 80 = 10
Putting the values of S1² and S2² in equation (4), we have
F = 38/10 = 3.8
The critical (tabulated) value of the test statistic F for the right-tailed test corresponding to (n1 − 1, n2 − 1) = (10, 8) df at 1% level of significance is F(n1−1, n2−1), α = F(10, 8), 0.01 = 5.81.

Since the calculated value of the test statistic (= 3.8) is less than the critical value (= 5.81), the calculated value of the test statistic lies in the non-rejection region, as shown in Fig. 12.9, so we do not reject the null hypothesis and reject the alternative hypothesis, i.e. we reject the claim at 1% level of significance.
Thus, we conclude that the samples provide us sufficient evidence against the claim, so worker B is not a more stable (or consistent) worker compared to A.
Example 4: Two random samples drawn from two normal populations gave
the following results:

Sample       Size    Mean    Sum of Squares of Deviations from the Mean
Sample I     9       59      26
Sample II    11      60      32

Test whether both samples are from the same normal population.
Solution: Since we have to test whether both samples are from the same normal population, we will test two hypotheses separately:
(i) the two population means are equal, i.e. H0 : µ1 = µ2
(ii) the two population variances are equal, i.e. H0 : σ1² = σ2²
Since the sample sizes are small and the populations under study are normal, the two means will be tested using the t-test, whereas the two variances will be tested using the F-test. But the t-test is based on the prior assumption that both population variances are the same; therefore, we first apply the F-test and later the t-test (when the F-test accepts the equality hypothesis).
Given that
n1 = 9, X̄ = 59, Σ(X − X̄)² = 26
n2 = 11, Ȳ = 60, Σ(Y − Ȳ)² = 32
Therefore,
S1² = (1/(n1 − 1)) Σ(X − X̄)² = (1/(9 − 1)) × 26 = 3.25
S2² = (1/(n2 − 1)) Σ(Y − Ȳ)² = (1/(11 − 1)) × 32 = 3.20

First, we want to test that the variances of both normal populations are equal, so our claim is σ1² = σ2² and its complement is σ1² ≠ σ2². Thus, we can take the null and alternative hypotheses as
H0 : σ1² = σ2² and H1 : σ1² ≠ σ2²
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.
For testing this, the test statistic F is given by
F = S1²/S2² ~ F(n1−1, n2−1)
  = 3.25/3.20 = 1.02
The critical (tabulated) values of the test statistic F for the two-tailed test corresponding to (n1 − 1, n2 − 1) = (8, 10) df at 5% level of significance are F(n1−1, n2−1), α/2 = F(8, 10), 0.025 = 3.85 and F(n1−1, n2−1), (1−α/2) = 1/F(n2−1, n1−1), α/2 = 1/F(10, 8), 0.025 = 1/4.30 = 0.23.
Since the calculated value of the test statistic (= 1.02) is less than the critical value (= 3.85) and greater than the critical value (= 0.23), the calculated value of the test statistic F lies in the non-rejection region, so we do not reject the null hypothesis, i.e. we support the claim.
Thus, we conclude that both samples may be taken from normal populations having equal variances.
Now, we test that the means of the two normal populations are equal, so our claim is µ1 = µ2 and its complement is µ1 ≠ µ2. Thus, we can take the null and alternative hypotheses as
H0 : µ1 = µ2 and H1 : µ1 ≠ µ2
The test statistic is given by
t = (X̄ − Ȳ) / (Sp √(1/n1 + 1/n2)) … (5)
where, Sp² = (1/(n1 + n2 − 2)) [Σ(X − X̄)² + Σ(Y − Ȳ)²]
           = (1/(9 + 11 − 2)) [26 + 32] = (1/18) × 58 = 3.22
⇒ Sp = √3.22 = 1.79

Putting the values of X̄, Ȳ, Sp, n1 and n2 in equation (5), we have
t = (59 − 60)/(1.79 √(1/9 + 1/11)) = −1/(1.79 × 0.45) = −1/0.81 = −1.23
The critical values of the test statistic t for (n1 + n2 − 2) = 18 df at 5% level of significance for the two-tailed test are ±t(18), 0.025 = ±2.101.
Since the calculated value of the test statistic t (= −1.23) is less than the critical value (= 2.101) and greater than the critical value (= −2.101), the calculated value of the test statistic t lies in the non-rejection region, so we do not reject the null hypothesis, i.e. we support the claim.
Thus, we conclude that both samples may be taken from normal populations having equal means.
Hence, overall we conclude that both samples may come from the same normal population.
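Both stages of Example 4, from the summary figures, as a sketch in Python (SciPy assumed):

from math import sqrt
from scipy import stats

n1, xbar, ssx = 9, 59, 26
n2, ybar, ssy = 11, 60, 32
s1_sq, s2_sq = ssx / (n1 - 1), ssy / (n2 - 1)

# Stage 1: F-test for equality of variances (two-tailed at 5%)
f = s1_sq / s2_sq                                 # about 1.02
f_low = stats.f.ppf(0.025, n1 - 1, n2 - 1)
f_high = stats.f.ppf(0.975, n1 - 1, n2 - 1)
print(f, f_low, f_high)                           # f lies between the two cut-offs

# Stage 2: pooled t-test for equality of means (two-tailed at 5%)
sp = sqrt((ssx + ssy) / (n1 + n2 - 2))            # pooled SD, about 1.79
t = (xbar - ybar) / (sp * sqrt(1 / n1 + 1 / n2))  # about -1.24 (text: -1.23)
print(t, stats.t.ppf(0.975, n1 + n2 - 2))         # critical 2.101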
Now, you can try the following exercises.
E3) Two sources of raw materials are under consideration by a bulb manufacturing company. Both sources seem to have similar characteristics, but the company is not sure about their respective uniformity. A sample of 12 lots from source A yields a variance of 125 and a sample of 10 lots from source B yields a variance of 112. Is it likely that the variance of source A differs significantly from the variance of source B at significance level α = 0.05?
E4) A laptop computer maker uses battery packs of two brands, A and B.
While both brands have the same average battery life between charges
(LBC), the computer maker seems to receive more complaints about
shorter LBC than expected for battery packs of brand A. The computer
maker suspects that this could be caused by higher variance in LBC for
brand A. To check that, ten new battery packs from each brand are
selected, installed on the same models of laptops, and the laptops are
allowed to run until the battery packs are completely discharged. The
following are the observed LBCs in hours:
Brand A 3.2 3.7 3.1 3.3 2.5 2.2 3.2 3.1 3.2 4.3

Brand B 3.4 3.6 3.0 3.2 3.2 3.2 3.0 3.1 3.2 3.2

Assuming that the LBCs of both brands follows normal distribution,


test the LBCs of brand A have a larger variance that those of brand B
at 5% level of significance.
We now end this unit by giving a summary of what we have covered in it.

12.4 SUMMARY
In this unit, we have discussed the following points:
1. Testing of hypothesis for population variance using χ2-test.
2. Testing of hypothesis for two population variances using F-test.
12.5 SOLUTIONS / ANSWERS
E1) Here, we are given that
σ0 = 15, n = 20, S = 17
Here, we want to test the agency's claim that the standard deviation (σ) of the lengths of serving times is less than 15 minutes. So our claim is σ < 15 and its complement is σ ≥ 15. Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : σ ≥ σ0 = 15
H1 : σ < 15 [SD of the lengths of serving times is less than 15 minutes]
Since the alternative hypothesis is left-tailed, the test is a left-tailed test.
Here, we want to test the hypothesis about the population standard deviation, and the sample size is small, n = 20 (< 30). Also, we are given that the serving time of the ambulance follows the normal distribution, so we can go for the χ²-test for population variance.
The test statistic is given by
χ² = (n − 1)S²/σ0² ~ χ²(n−1)
   = 19 × (17.0)²/(15.0)² = 24.40
The critical value of the test statistic χ² for the left-tailed test corresponding to (n − 1) = 19 df at 1% level of significance is χ²(n−1), (1−α) = χ²(19), 0.99 = 7.63.
Since the calculated value of the test statistic (= 24.40) is greater than the critical value (= 7.63), the calculated value of the test statistic lies in the non-rejection region, so we do not reject the null hypothesis and reject the alternative hypothesis, i.e. we reject our claim at 1% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the claim, so the agency's claim that the standard deviation (σ) of the lengths of serving times is less than 15 minutes is not true.
E2) Here, we are given that
σ0² = 0.62, n = 25, S² = 0.65
Here, we want to test the cigarette manufacturer's claim that the variance (σ²) of the nicotine content of its cigarettes is 0.62. So the claim is σ² = 0.62 and its complement is σ² ≠ 0.62. Since the claim contains the equality sign, we can take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
H0 : σ² = σ0² = 0.62 and H1 : σ² ≠ 0.62
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.

Here, we want to test the hypothesis about the population variance, and the sample size is small, n = 25 (< 30). Also, we are given that the nicotine content of the cigarettes follows the normal distribution, so we can go for the χ²-test for population variance.
The test statistic is given by
χ² = (n − 1)S²/σ0² ~ χ²(n−1)
   = (24 × 0.65)/0.62 = 25.16
The critical (tabulated) values of the test statistic χ² for the two-tailed test corresponding to (n − 1) = 24 df at 5% level of significance are χ²(n−1), α/2 = χ²(24), 0.025 = 39.36 and χ²(n−1), (1−α/2) = χ²(24), 0.975 = 12.40.
Since the calculated value of the test statistic (= 25.16) is less than the critical value (= 39.36) and greater than the critical value (= 12.40), the calculated value of the test statistic lies in the non-rejection region, so we do not reject the null hypothesis, i.e. we support the claim at 5% level of significance.
Thus, we conclude that the sample fails to provide sufficient evidence against the claim, so we may assume that the manufacturer's claim that the variance of the nicotine content of the cigarettes is 0.62 is true.
E3) Here, we are given that
n1 = 12, S1² = 125, n2 = 10, S2² = 112
Here, we want to test that the variance of source A differs significantly from the variance of source B. If σ1² and σ2² denote the variances in the raw materials of sources A and B respectively, then our claim is σ1² ≠ σ2² and its complement is σ1² = σ2². Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : σ1² = σ2² and H1 : σ1² ≠ σ2²
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.
Here, we want to test the hypothesis about two population variances, and the sample sizes n1 = 12 (< 30) and n2 = 10 (< 30) are small. Also, the populations under study are normal and both samples are independent, so we can go for the F-test for two population variances.
For testing this, the test statistic is given by
F = S1²/S2² ~ F(n1−1, n2−1)
  = 125/112 = 1.11
The critical (tabulated) values of the test statistic F for the two-tailed test corresponding to (n1 − 1, n2 − 1) = (11, 9) df at 5% level of significance are
F(n1−1, n2−1), α/2 = F(11, 9), 0.025 = 3.91 and
F(n1−1, n2−1), (1−α/2) = 1/F(n2−1, n1−1), α/2 = 1/F(9, 11), 0.025 = 1/3.59 = 0.28.
Since the calculated value of the test statistic (= 1.11) is less than the critical value (= 3.91) and greater than the critical value (= 0.28), the calculated value of the test statistic lies in the non-rejection region, so we do not reject the null hypothesis and reject the alternative hypothesis, i.e. we reject the claim at 5% level of significance.
Thus, we conclude that the samples provide us sufficient evidence against the claim, so we may assume that the variances of sources A and B do not differ.
E4) Here, we want to test that the LBCs of brand A have a larger variance than those of brand B. If σ1² and σ2² denote the variances in the LBCs of brands A and B respectively, then our claim is σ1² > σ2² and its complement is σ1² ≤ σ2². Since the complement contains the equality sign, we can take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
H0 : σ1² ≤ σ2² and H1 : σ1² > σ2²
Since the alternative hypothesis is right-tailed, the test is a right-tailed test.
Here, we want to test the hypothesis about two population variances, and the sample sizes n1 = 10 (< 30) and n2 = 10 (< 30) are small. Also, the populations under study are normal and both samples are independent, so we can go for the F-test for two population variances.
For testing the null hypothesis, the test statistic is given by
F = S1²/S2² ~ F(n1−1, n2−1) … (6)
where, S1² = (1/(n1 − 1)) [ΣX² − (ΣX)²/n1] and
S2² = (1/(n2 − 1)) [ΣY² − (ΣY)²/n2].
Calculation for S1² and S2²:

LBCs of Brand A (X)    X²       LBCs of Brand B (Y)    Y²
3.7                    13.69    3.6                    12.96
3.2                    10.24    3.2                    10.24
3.3                    10.89    3.2                    10.24
3.1                    9.61     3.0                    9.00
2.5                    6.25     3.0                    9.00
2.2                    4.84     3.2                    10.24
3.1                    9.61     3.2                    10.24
3.2                    10.24    3.1                    9.61
4.3                    18.49    3.2                    10.24
3.2                    10.24    3.1                    9.61
Total = 31.8           104.10   31.8                   101.38
From the calculation, we have
X̄ = (1/n1) ΣX = (1/10) × 31.8 = 3.18,
Ȳ = (1/n2) ΣY = (1/10) × 31.8 = 3.18
Thus,
S1² = (1/(n1 − 1)) [ΣX² − (ΣX)²/n1] = (1/9) [104.10 − (31.8)²/10] = (1/9) × 2.98 = 0.33
S2² = (1/(n2 − 1)) [ΣY² − (ΣY)²/n2] = (1/9) [101.38 − (31.8)²/10] = (1/9) × 0.26 = 0.03
Putting the values of S1² and S2² in equation (6), we have
F = 0.33/0.03 = 11
The critical (tabulated) value of the test statistic F for the right-tailed test corresponding to (n1 − 1, n2 − 1) = (9, 9) df at 1% level of significance is F(n1−1, n2−1), α = F(9, 9), 0.01 = 5.35.

Since the calculated value of the test statistic (= 11) is greater than the critical value (= 5.35), the calculated value of the test statistic lies in the rejection region, so we reject the null hypothesis and support the alternative hypothesis, i.e. we support the claim at 1% level of significance.
Thus, we conclude that the samples fail to provide us sufficient evidence against the claim, so we may assume that the variance in the LBCs of brand A is greater than the variance in the LBCs of brand B.
