
Probabilityandstaticspdf 2024 08 21 14 15 59

This document covers the fundamentals of basic probability, including concepts such as random experiments, sample space, mutually exclusive events, and various definitions of probability. It explains mathematical, statistical, and axiomatic definitions of probability, along with standard results and examples to illustrate these concepts. Additionally, it provides practical examples and solutions to demonstrate the application of probability in different scenarios.

Uploaded by

Bhanu

UNIT-I

Basic Probability

Random Experiment

If an experiment is repeated any number of times under the same conditions, it does not give
a unique result but may result in any one of several possible outcomes.

Thus an experiment whose outcome cannot be predicted is called a random experiment or trial,
and its outcomes are known as events or cases.

Sample space

The set of all possible outcomes of a random experiment is called a sample space and is
denoted by S.

Favourable Events

The number of outcomes favourable to an event in an experiment is the number of outcomes
which entail the happening of the event.

Example

In tossing two coins, the cases favourable to the event of getting at least one head are HT, TH and HH.

Mutually Exclusive events

Two events A and B are said to be mutually exclusive if they cannot occur simultaneously.

Note

If A and B are mutually exclusive events then they are disjoint, that is, $A \cap B = \varnothing$, where $\varnothing$ is
the null set.

Example

1. When we toss a coin, either head or tail can be up, but both cannot be up at the same time.

2. When we throw a die, the outcomes 1, 2, 3, ..., 6 are mutually exclusive events.

Hint: Mutual exclusiveness applies to a single trial only.

If a coin is tossed twice, the head appearing in the first trial does not prevent a tail from appearing in
the next trial.

Equally likely events

The events are said to be equally likely if none of them is expected to occur in preference to
the others, i.e., each one of them has an equal chance of happening.

Example

In tossing of a coin, getting a head and tail are equally likely events.

Exhaustive Events

Outcomes are said to be exhaustive when they include all possible outcomes.

Example

In drawing two cards from a pack of 52 cards, the exhaustive number of cases is ${}^{52}C_2 = 1326$.

In the case of throwing two dice, the exhaustive number of cases is $36 \;(= 6^2)$.

Independent Event

Two or more events are considered to be independent if the occurrence or non-occurrence of
one event does not affect the occurrence or non-occurrence of the others.

Example

In successive tossing of a coin, the event of getting a head or tail in the first toss does not affect the
event of getting a head or tail in the second toss.

Dependent Events

The events are said to be dependent if the occurrence or non-occurrence of one event in any
trial affects the occurrence of other events in other trials.

Mathematical, classical, or a priori definition of probability

If a trial results in n exhaustive, mutually exclusive and equally likely cases and m of them are
favourable to the happening of the event A, then the probability of happening of A is given by

$$P(A) = p = \frac{\text{number of favourable cases}}{\text{total number of exhaustive cases}} = \frac{n(A)}{n(S)} = \frac{m}{n}$$

For example

In throwing a die, the possible cases are $S = \{1, 2, 3, 4, 5, 6\}$, so n(S) = 6.

If A is the event of getting the number 5, then n(A) = 1.

The probability of getting 5 is $P(A) = \frac{n(A)}{n(S)} = \frac{1}{6}$.

Note :

i) The probability p of the happening of an event is also known as the probability of success.

ii) The probability q = 1 - p of the non-happening of the event is known as the probability of failure.

iii) If P(A)=1, then A is called a certain event.

iv) If P(A)=0, A is called an impossible event.

v) If the exhaustive number of cases in a trial is infinite, then this definition of classical probability
breaks down.

vi) If the events are not equally likely then this definition of mathematical probability breaks down.
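
The classical definition can be checked by direct counting when the sample space is small. The sketch below assumes a finite list of equally likely outcomes; the helper name `classical_probability` is purely illustrative and does not come from the notes.

```python
from fractions import Fraction

def classical_probability(sample_space, is_favourable):
    """Return n(A)/n(S) for a finite sample space of equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if is_favourable(o)]
    return Fraction(len(favourable), len(outcomes))

# One throw of a fair die: S = {1, 2, 3, 4, 5, 6}
die = range(1, 7)
p_five = classical_probability(die, lambda face: face == 5)  # probability of success p
q_five = 1 - p_five                                          # probability of failure q = 1 - p
p_seven = classical_probability(die, lambda face: face == 7) # an impossible event
print(p_five, q_five, p_seven)
```

Exact fractions are used so that results such as 1/6 are not blurred by floating-point rounding. As notes v) and vi) say, this counting approach stops working when the cases are infinite or not equally likely.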

Statistical, a posteriori, or empirical definition of probability

If a trial is repeated n times under essentially homogeneous and identical conditions, and an
event A occurs m times out of the n trials, then as n becomes indefinitely large the probability p of the
happening of A is given by

$$P(A) = p = \lim_{n \to \infty} \frac{m}{n}$$
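
The limiting relative frequency can be illustrated by simulation. This is only a sketch: it assumes a fair coin modelled by Python's pseudo-random generator, and the names `relative_frequency` and `head` are made up for the illustration.

```python
import random

def relative_frequency(trial, n, seed=0):
    """Estimate P(A) as m/n, where m counts how often trial(rng) returns True."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    m = sum(1 for _ in range(n) if trial(rng))
    return m / n

def head(rng):
    """One toss of a fair coin; True means the toss shows a head."""
    return rng.random() < 0.5

# The ratio m/n settles near 1/2 as the number of trials n grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(head, n))
```

A finite simulation can only suggest the limit; it does not prove it, which is one reason the axiomatic definition is used for formal work.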

Axiomatic Definition of Probability

Let S be the sample space and A be an event associated with a random experiment. Then the
probability of the event A, denoted by P(A), is a real number satisfying the following axioms

(i) $0 \le P(A) \le 1$

(ii) $P(S) = 1$, $P(\varnothing) = 0$

(iii) Addition theorem

If $A_1, A_2, \ldots, A_n$ are mutually exclusive events then

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$$

i.e., $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$


Standard Results on Probability

(i) The probability of an impossible event is zero, i.e., $P(\varnothing) = 0$

(ii) $P(\bar{A}) = 1 - P(A)$

(iii) If $B \subset A$ then $P(B) \le P(A)$

(iv) If A and B are two events which are not disjoint then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

This is known as the additive law of probability.

(v) If A, B, C are any three events then

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$$

(vi) Two events A and B are independent if $P(A \cap B) = P(A)P(B)$

This is known as the multiplication law of probability.

(vii) If A and B are independent events then $\bar{A}$ and $\bar{B}$ are also independent.
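
These results can be spot-checked by enumeration. The sketch below assumes two fair dice as the sample space; the three events A, B and C are arbitrary choices made for the illustration and are not taken from the notes.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # 36 equally likely outcomes of two dice

def P(event):
    """Classical probability of an event given as a subset of S."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] % 2 == 0}     # first die shows an even number
B = {s for s in S if s[0] + s[1] == 7}  # the sum is 7
C = {s for s in S if s[1] > 4}          # second die shows 5 or 6

# Result (iv): additive law for two events
lhs2 = P(A | B)
rhs2 = P(A) + P(B) - P(A & B)

# Result (v): additive law for three events
lhs3 = P(A | B | C)
rhs3 = (P(A) + P(B) + P(C)
        - P(A & B) - P(B & C) - P(C & A)
        + P(A & B & C))

# Results (vi) and (vii): A and B happen to be independent, and so are their complements
independent = P(A & B) == P(A) * P(B)
complements_independent = P((S - A) & (S - B)) == P(S - A) * P(S - B)
print(lhs2, rhs2, lhs3, rhs3, independent, complements_independent)
```

Python's set operators `|`, `&` and `-` play the roles of union, intersection and complement relative to S.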

Examples

1. In tossing a coin, find the probability of getting a head.

Solution

$$P(\text{getting a head}) = \frac{\text{number of favourable events}}{\text{number of exhaustive events}} = \frac{1}{2}$$

2. From a bag containing 10 black and 12 white balls, a ball is drawn at random. What is the
probability that it is black?

Solution

Let A be the event of selecting a black ball.

Total number of balls = 10 + 12 = 22

Total number of possible (exhaustive) ways of choosing one ball = ${}^{22}C_1 = 22$ ways

n(S) = 22

Out of 10 black balls, the number of ways of choosing one black ball = ${}^{10}C_1 = 10$

n(A) = 10

Probability of getting a black ball = $\frac{n(A)}{n(S)} = \frac{10}{22} = 0.4545$.

3. If at least one child in a family of three children is a boy, what is the probability that all three are
boys?

Solution

The sample space is

S= {BBB,BBG,BGB,GBB,GGB,GBG,BGG} where B represents boy and G represents a girl.

n(S)=7

$$P(\text{all three are boys}) = \frac{n(A)}{n(S)} = \frac{1}{7}$$

4. A problem is given to 3 students A, B, C whose chances of solving it are 1/2, 1/3, 1/4 respectively.
What is the probability that

i) the problem is solved?

ii) exactly one of them solves the problem?

Solution

Let A, B, C be the events that A, B, C respectively solve the problem.

P(A) = 1/2, P(B) = 1/3, P(C) = 1/4

$P(\bar{A}) = 1/2$, $P(\bar{B}) = 2/3$, $P(\bar{C}) = 3/4$.

i) $P(\text{the problem is not solved}) = P(\bar{A} \cap \bar{B} \cap \bar{C})$

$= P(\bar{A})\,P(\bar{B})\,P(\bar{C})$

$= \frac{1}{2} \times \frac{2}{3} \times \frac{3}{4} = \frac{1}{4}$

Therefore, P(the problem is solved) = 1 - 1/4 = 3/4.

ii) $P(\text{exactly one of them solves the problem}) = P(A \cap \bar{B} \cap \bar{C}) + P(\bar{A} \cap B \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap C)$

$= P(A)P(\bar{B})P(\bar{C}) + P(\bar{A})P(B)P(\bar{C}) + P(\bar{A})P(\bar{B})P(C)$

$= (1/2)(2/3)(3/4) + (1/2)(1/3)(3/4) + (1/2)(2/3)(1/4)$

$= \frac{1}{4} + \frac{1}{8} + \frac{1}{12} = \frac{11}{24}$.
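
The two answers of Example 4 can be reproduced by listing every solved/not-solved pattern for the three students. The sketch assumes, as the solution does, that the students work independently; the helper `prob_of_pattern` is a name invented for the illustration.

```python
from fractions import Fraction
from itertools import product

# Chance that each student solves the problem (taken from Example 4)
p = {"A": Fraction(1, 2), "B": Fraction(1, 3), "C": Fraction(1, 4)}

def prob_of_pattern(pattern):
    """Probability of one solved/not-solved pattern, assuming independence."""
    result = Fraction(1)
    for name, solved in zip(p, pattern):
        result *= p[name] if solved else 1 - p[name]
    return result

patterns = list(product([True, False], repeat=3))  # 8 patterns in total

p_solved = sum(prob_of_pattern(pt) for pt in patterns if any(pt))
p_exactly_one = sum(prob_of_pattern(pt) for pt in patterns if sum(pt) == 1)
print(p_solved, p_exactly_one)
```

Summing over patterns avoids having to spot the complement trick, at the cost of looking at all 2^3 cases.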

5. A is known to hit the target in 2 out of 5 shots. B is known to hit the target in 3 out of 4 shots. Find
the probability of the target being hit when both try.

Solution

$$P(A) = \frac{2}{5}, \quad P(B) = \frac{3}{4}$$

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

As A and B are independent, $P(A \cap B) = P(A)P(B)$.

$$P(A \cup B) = \frac{2}{5} + \frac{3}{4} - \left(\frac{2}{5}\right)\left(\frac{3}{4}\right) = \frac{17}{20}$$

6. Events A and B are such that $P(A \cup B) = 3/4$, $P(A \cap B) = 1/4$, $P(\bar{A}) = 2/3$. Find P(B).

Solution

$P(A) = 1 - P(\bar{A}) = 1 - 2/3 = 1/3$.

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$P(B) = P(A \cup B) + P(A \cap B) - P(A)$
= (3/4) + (1/4) - (1/3)

= 2/3.

7. If A and B are events with P(A) = 3/8, P(B) = 1/2 and $P(A \cap B) = 1/4$, find $P(A^c \cap B^c)$.

Solution

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$= \frac{3}{8} + \frac{1}{2} - \frac{1}{4} = \frac{5}{8}$

$P(A^c \cap B^c) = P\left((A \cup B)^c\right) = 1 - P(A \cup B)$

= 1 - 5/8

= 3/8.

8. A lot consists of 10 good articles, 4 with minor defects and 2 with major defects. Two articles are
chosen from the lot at random(without replacement). Find the probability that

a) both are good

b) both have major defects

c) at least 1 is good

d) at most 1 is good

e) exactly 1 is good

f) neither has major defects

g) neither is good

Although the articles may be drawn one after the other, we can consider that both articles are drawn
simultaneously, as they are drawn without replacement.

Solution

a) $P(\text{both are good}) = \frac{\text{no. of ways of drawing 2 good articles}}{\text{total number of ways of drawing 2 articles}} = \frac{{}^{10}C_2}{{}^{16}C_2} = \frac{45}{120} = \frac{3}{8}$

b) $P(\text{both have major defects}) = \frac{\text{no. of ways of drawing 2 articles with major defects}}{\text{total number of ways of drawing 2 articles}} = \frac{{}^{2}C_2}{{}^{16}C_2} = \frac{1}{120}$

c) P(at least 1 is good) = P(exactly 1 is good or both are good)

= P(exactly 1 is good and 1 is bad, or both are good)

$= \frac{{}^{10}C_1 \times {}^{6}C_1 + {}^{10}C_2}{{}^{16}C_2} = \frac{105}{120} = \frac{7}{8}$

d) P(at most 1 is good) = P(none is good, or 1 is good and 1 is bad)

$= \frac{{}^{10}C_0 \times {}^{6}C_2 + {}^{10}C_1 \times {}^{6}C_1}{{}^{16}C_2} = \frac{75}{120} = \frac{5}{8}$

e) P(exactly 1 is good) = P(1 is good and 1 is bad)

$= \frac{{}^{10}C_1 \times {}^{6}C_1}{{}^{16}C_2} = \frac{60}{120} = \frac{1}{2}$

f) P(neither has major defects) = P(both are articles without major defects)

$= \frac{{}^{14}C_2}{{}^{16}C_2} = \frac{91}{120}$

g) P(neither is good) = P(both are defective)

$= \frac{{}^{6}C_2}{{}^{16}C_2} = \frac{15}{120} = \frac{1}{8}$
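
The seven answers of Example 8 are all ratios of combination counts, so they can be verified with `math.comb` from the Python standard library. The dictionary keys below are labels chosen for this sketch only.

```python
from fractions import Fraction
from math import comb

GOOD, MINOR, MAJOR = 10, 4, 2  # composition of the lot in Example 8
TOTAL = GOOD + MINOR + MAJOR   # 16 articles
BAD = MINOR + MAJOR            # 6 defective articles
ways = comb(TOTAL, 2)          # 16C2 = 120 ways of drawing 2 articles

results = {
    "a) both good": Fraction(comb(GOOD, 2), ways),
    "b) both major": Fraction(comb(MAJOR, 2), ways),
    "c) at least 1 good": Fraction(GOOD * BAD + comb(GOOD, 2), ways),
    "d) at most 1 good": Fraction(comb(BAD, 2) + GOOD * BAD, ways),
    "e) exactly 1 good": Fraction(GOOD * BAD, ways),
    "f) neither major": Fraction(comb(TOTAL - MAJOR, 2), ways),
    "g) neither good": Fraction(comb(BAD, 2), ways),
}

for label, value in results.items():
    print(label, value)
```

Parts c) and g) are complementary events, so their probabilities add up to 1, which gives a quick consistency check on the arithmetic.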

9. A box contains 4 white, 5 red and 6 black balls. Two balls are drawn at random. What is the
probability that both are black?

Solution

Let A be the event of drawing two black balls.

Out of 15 balls, 2 balls can be selected in ${}^{15}C_2$ ways.

$\therefore n(S) = {}^{15}C_2 = 105$

Out of 6 black balls, 2 balls can be selected in ${}^{6}C_2$ ways.

$\therefore n(A) = {}^{6}C_2 = 15$

$$P(A) = \frac{n(A)}{n(S)} = \frac{15}{105} = 0.1429$$

Hence the probability of drawing two black balls is 0.1429.

10. From a well shuffled deck of 52 cards , 4 cards are selected at random. Find the probability that
the selected cards are

i) 3 spades and 1 heart

ii) 2 kings, 1 ace and 1 queen

iii) all are diamonds

iv) There is one card of each suit

v) all the four are hearts and one of them is a jack

Solution

13c 13c
3 1
P(getting 3 spades and 1 heart)   0.0137
52 c
i) 4

4c  4c  4c
2 1 1
P(getting 2 kings, 1 ace and 1 queen)   0.0003
ii) 52 c 4

13c
4
P(getting all diamonds)   0.0026
52 c
iii) 4

13c1 13c1 13c1 13c1


P(getting one card from each suit)   0.1055
52 c
iv) 4

1c  12 c3
1
P (getting 4 hearts out of which one is a jack)   0.0008
v) 52 c 4
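
The decimal values in Example 10 come from dividing small combination counts by ${}^{52}C_4 = 270725$. The sketch below recomputes them with `math.comb`; the dictionary keys are labels invented for the illustration.

```python
from fractions import Fraction
from math import comb

total = comb(52, 4)  # number of ways of selecting 4 cards from 52

cards = {
    "i": Fraction(comb(13, 3) * comb(13, 1), total),             # 3 spades and 1 heart
    "ii": Fraction(comb(4, 2) * comb(4, 1) * comb(4, 1), total), # 2 kings, 1 ace, 1 queen
    "iii": Fraction(comb(13, 4), total),                         # all diamonds
    "iv": Fraction(comb(13, 1) ** 4, total),                     # one card of each suit
    "v": Fraction(comb(1, 1) * comb(12, 3), total),              # 4 hearts, one of them the jack
}

for part, value in cards.items():
    print(part, value, round(float(value), 4))
```

Keeping the exact fractions alongside the rounded decimals makes it clear which digits are the result of rounding.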

Exercises

1) Find the chance of throwing

i) four

ii) an even number with an ordinary six-faced die.

2) Find the probability of drawing an ace or a spade or both from a deck of cards.

3) What is the probability of obtaining 2 heads in two throws of a single coin?

Conditional probability

The probability of the event A, given that the event B has already occurred, is called the conditional
probability of A given B and is defined as

$$P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) \ne 0$$

Similarly, the probability of the event B, given that the event A has already occurred, is given by

$$P(B/A) = \frac{P(A \cap B)}{P(A)}, \quad \text{provided } P(A) \ne 0$$

Note

i) If the events A and B are independent then

$$P(A/B) = \frac{P(A)\,P(B)}{P(B)} = P(A)$$

$$P(B/A) = \frac{P(A)\,P(B)}{P(A)} = P(B)$$

ii) If A and B are mutually exclusive events then

$P(B/A) = 0$ and $P(A/B) = 0$, since $P(A \cap B) = 0$

Multiplication law of probability

If A and B are dependent events then

$$P(A \cap B) = P(A)\,P(B/A) = P(B)\,P(A/B)$$

EXAMPLES

Total Probability Theorem

If $B_1, B_2, \ldots, B_n$ are a mutually exclusive and exhaustive set of events of a sample space S and A is any
event associated with the events $B_1, B_2, \ldots, B_n$, then

$$P(A) = P(B_1)P(A/B_1) + P(B_2)P(A/B_2) + \cdots + P(B_n)P(A/B_n)$$

i.e., $P(A) = \sum_{i=1}^{n} P(B_i)\,P(A/B_i)$.

1) If the probability that a communication system will have high fidelity is 0.81 and the probability that
it will have high fidelity and selectivity is 0.18, what is the probability that a system with high fidelity
will also have selectivity?

Solution

Let A be the event that the system has selectivity and
B be the event that the system has high fidelity.

$P(B) = 0.81$, $P(A \cap B) = 0.18$

$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{0.18}{0.81} = \frac{2}{9}$$

2) A box contains 4 bad and 6 good tubes. Two are drawn out from the box at a time. One of them
is tested and found to be good. What is the probability that the other one is also good?

Solution

Let A = one of the tubes drawn is good and B = the other tube is good.

P(A) = 6/10

$$P(A \cap B) = P(\text{both tubes drawn are good}) = \frac{{}^{6}C_2}{{}^{10}C_2} = \frac{15}{45} = \frac{1}{3}$$

Knowing that one tube is good, the conditional probability that the other tube is also good is required,
i.e., P(B/A) is required.

By definition,

$$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/3}{6/10} = \frac{5}{9}$$

3) Two defective tubes get mixed up with 2 good ones. The tubes are tested, one by one until both
defectives are found. What is the probability that the last defective tube is obtained on

a) the second test

b) the third test and

c) the fourth test

Solution

Let D represent defective and N represent non-defective tube.

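a) P(second D in the II test) = P(D in the I test and D in the II test)

$= P(D_1 \cap D_2)$, say

$= P(D_1)\,P(D_2/D_1)$

$= \frac{2}{4} \times \frac{1}{3} = \frac{1}{6}$

b) P(second D in the III test) $= P(D_1 \cap N_2 \cap D_3 \text{ or } N_1 \cap D_2 \cap D_3)$

$= P(D_1 \cap N_2 \cap D_3) + P(N_1 \cap D_2 \cap D_3)$

$= \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} + \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} = \frac{1}{3}$

c) P(second D in the IV test) $= P(D_1 \cap N_2 \cap N_3 \cap D_4 \text{ or } N_1 \cap D_2 \cap N_3 \cap D_4 \text{ or } N_1 \cap N_2 \cap D_3 \cap D_4)$

$= P(D_1 \cap N_2 \cap N_3 \cap D_4) + P(N_1 \cap D_2 \cap N_3 \cap D_4) + P(N_1 \cap N_2 \cap D_3 \cap D_4)$

$= \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} \times 1 + \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} \times 1 + \frac{2}{4} \times \frac{1}{3} \times 1 \times 1$

$= \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$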
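
The three answers for the tube-testing problem can be cross-checked by listing all 24 orders in which the four tubes could be tested. The sketch assumes every order is equally likely; the labels `D1`, `D2`, `N1`, `N2` and the helper name are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

tubes = ["D1", "D2", "N1", "N2"]    # two defective and two non-defective tubes
orders = list(permutations(tubes))  # 24 equally likely testing orders

def last_defective_position(order):
    """Test number (1-based) at which the second defective tube turns up."""
    positions = [i for i, tube in enumerate(order, start=1) if tube.startswith("D")]
    return max(positions)

distribution = {
    k: Fraction(sum(1 for o in orders if last_defective_position(o) == k), len(orders))
    for k in (2, 3, 4)
}
print(distribution)
```

The three probabilities add up to 1 because the second defective tube must appear on the second, third or fourth test.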

4) In a shooting test, the probability of hitting the target is 1/2 for A, 2/3 for B and 3/4 for C. If all of
them fire at the target, find the probability that

a) none of them hits the target

b) atleast one of them hits the target.

Solution

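$P(\bar{A}) = 1 - P(A) = 1 - 1/2 = 1/2$

$$P(\bar{A}) = \frac{1}{2}, \quad P(\bar{B}) = \frac{1}{3}, \quad P(\bar{C}) = \frac{1}{4}$$

a) $P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\bar{A})\,P(\bar{B})\,P(\bar{C})$

$= \frac{1}{2} \times \frac{1}{3} \times \frac{1}{4} = \frac{1}{24}$

b) P(at least one hits the target) = 1 - P(none hits the target)

$= 1 - \frac{1}{24} = \frac{23}{24}$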

5) A box contains tags marked 1, 2, ..., n. Two tags are chosen at random without replacement. Find the
probability that the numbers on the tags will be consecutive integers.

Solution

The pairs of consecutive integers are (1, 2), (2, 3), ..., (n-1, n), so there are (n-1) favourable pairs.

Number of ways of choosing any one pair from the (n-1) pairs = ${}^{(n-1)}C_1 = n - 1$

Total number of ways of choosing 2 tags from the n tags = ${}^{n}C_2$

The required probability = $\frac{n-1}{{}^{n}C_2} = \frac{2}{n}$

6) Among the workers in a factory only 30% receive a bonus. Among those receiving the bonus only
20% are skilled. What is the probability of a randomly selected worker who is skilled and receiving
bonus?

Solution

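Let A be the event that the worker receives a bonus and B be the event that the worker is skilled.

P(A) = 30/100 = 0.3

$P(B/A) = \frac{20}{100} = 0.2$

$P(A \cap B) = P(A)\,P(B/A) = (0.3)(0.2) = 0.06$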

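7) A and B alternately throw a pair of dice. A wins if he throws 6 before B throws 7, and B wins if he
throws 7 before A throws 6. If A begins, show that his chance of winning is 30/61.

Solution

Let A be the event of A throwing 6 and B be the event of B throwing 7.

$$P(A) = \frac{5}{36}, \quad P(B) = \frac{1}{6}, \quad P(\bar{A}) = \frac{31}{36}, \quad P(\bar{B}) = \frac{5}{6}$$

$P(A \text{ wins}) = P(A \text{ or } \bar{A}\bar{B}A \text{ or } \bar{A}\bar{B}\bar{A}\bar{B}A \text{ or } \ldots)$

$= P(A) + P(\bar{A}\bar{B}A) + P(\bar{A}\bar{B}\bar{A}\bar{B}A) + \cdots$

$= \frac{5}{36} + \left(\frac{31}{36}\right)\left(\frac{5}{6}\right)\left(\frac{5}{36}\right) + \cdots = \frac{5/36}{1 - (31/36)(5/6)} = \frac{30}{61}$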
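
The series in Example 7 is geometric: A can only win on his own turn, after both players have missed in every earlier round. The sketch below sums it exactly and also in closed form; the variable names are chosen for the illustration.

```python
from fractions import Fraction

p_a = Fraction(5, 36)  # A throws a total of 6 with two dice
p_b = Fraction(6, 36)  # B throws a total of 7 with two dice
r = (1 - p_a) * (1 - p_b)  # both players miss in one full round

# Closed form of the geometric series p_a + p_a*r + p_a*r**2 + ...
closed_form = p_a / (1 - r)

def partial_sum(rounds):
    """Probability that A wins within the given number of rounds."""
    return sum(p_a * r**k for k in range(rounds))

print(closed_form, float(partial_sum(100)))
```

The partial sums approach the closed form quickly because the common ratio r = 155/216 is well below 1.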

8) In a coin tossing experiment, if the coin shows head, 1 dice is thrown and the result is recorded. But
if the coin shows tail, 2 dice are thrown and their sum is recorded. What is the probability that the
recorded number will be 2?

Solution

When a single die is thrown,

P(getting 2) = 1/6

When 2 dice are thrown, the sum will be 2 only if each die shows 1.

$\therefore$ P(getting 2 as sum with 2 dice) = P(getting 1 on the first die) P(getting 1 on the second die)

$= \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$

By the theorem of total probability,

$$P(A) = \sum_{i=1}^{n} P(B_i)\,P(A/B_i)$$

P(recorded number will be 2) = P(H) P(getting 2 with 1 die) + P(T) P(getting 2 as a sum with 2 dice)

= (1/2)(1/6) + (1/2)(1/36)

$= \frac{6 + 1}{72} = \frac{7}{72}$

9) If at least 1 child in a family with 2 children is a boy, what is the probability that both children are
boys?

Solution

S = {BB, BG, GB, GG}

Let A be the event that at least 1 child is a boy and B be the event that both children are boys.

P(A) = P(at least 1 child is a boy) = P(exactly 1 boy) + P(2 boys)

$= \frac{2}{4} + \frac{1}{4} = \frac{3}{4}$

Therefore the probability that both children are boys, given that at least 1 child is a boy, is

$$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/4}{3/4} = \frac{1}{3}$$

10) Two fair dice are thrown independently. Three events A, B and C are defined as follows.

i) A: odd face with the first die

ii) B: odd face with the second die

iii) C: the sum of the numbers on the 2 dice is odd.

Are the events A, B and C mutually independent?

Solution

$P(A) = \frac{3}{6} = 1/2$, $P(B) = 1/2$, $P(C) = 1/2$

$P(A \cap B) = P(A)\,P(B) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$

$P(A \cap B) = P(B \cap C) = P(A \cap C) = 1/4$

$P(A \cap B \cap C) = 0$, since C cannot happen when A and B occur.

Therefore $P(A \cap B \cap C) \ne P(A)\,P(B)\,P(C)$

Therefore the events are pairwise independent, but not mutually independent.
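
The distinction between pairwise and mutual independence in this example can be confirmed by enumerating the 36 outcomes of the two dice. The helper names `P` and `both` are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def P(event):
    """Probability of an event given as a predicate on an outcome (first, second)."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def both(*events):
    """Intersection of several events."""
    return lambda s: all(e(s) for e in events)

A = lambda s: s[0] % 2 == 1           # odd face with the first die
B = lambda s: s[1] % 2 == 1           # odd face with the second die
C = lambda s: (s[0] + s[1]) % 2 == 1  # sum of the two faces is odd

pairwise = all(P(both(X, Y)) == P(X) * P(Y) for X, Y in [(A, B), (B, C), (A, C)])
mutual = pairwise and P(both(A, B, C)) == P(A) * P(B) * P(C)
print(pairwise, mutual)
```

Mutual independence needs the product rule for every sub-collection of the events, which is why the triple intersection has to be checked in addition to the three pairs.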

11) A bolt is manufactured by 3 machines A, B and C. A turns out twice as many items as B, and
machines B and C produce equal numbers of items. 2% of the bolts produced by A and B are defective
and 4% of the bolts produced by C are defective. All bolts are put into 1 stock pile and 1 is chosen from
this pile. What is the probability that it is defective?

Solution

Let A, B and C be the events that the item has been produced by machine A, B and C respectively.

Let D be the event that the item is defective.

P(A) = 1/2, P(B) = P(C) = 1/4

P(D/A) = P(an item is defective, given that A has produced it) = 2/100

P(D/B) = P(an item is defective, given that B has produced it) = 2/100

P(D/C) = P(an item is defective, given that C has produced it) = 4/100

By the theorem on total probability,

$P(D) = P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)$

$P(D) = (1/2)(2/100) + (1/4)(2/100) + (1/4)(4/100)$

= 1/40
12) A bag contains 5 red and 3 green balls and a second bag contains 4 red and 5 green balls. One of
the bags is selected at random and a draw of 2 balls is made from it. What is the probability that one
of them is red and the other is green?

Solution

Let A1 and A2 denote the events of selecting the first bag and the second bag respectively. Then
P(A1) = P(A2) = 1/2

Let B be the event of selecting one red and one green ball.

$$P(B/A_1) = \frac{{}^{5}C_1 \times {}^{3}C_1}{{}^{8}C_2} = \frac{15}{28}$$

$$P(B/A_2) = \frac{{}^{4}C_1 \times {}^{5}C_1}{{}^{9}C_2} = \frac{20}{36} = \frac{5}{9}$$

$\therefore$ The required probability = P(A1)P(B/A1) + P(A2)P(B/A2)

$= (1/2)(15/28) + (1/2)(5/9) = \frac{15}{56} + \frac{5}{18}$

= 275/504

13) An urn contains 10 white and 3 black balls. Another urn contains 3 white and 5 black balls. Two
balls are drawn at random from the first urn and placed in the second urn and then 1 ball is taken at
random from the latter. What is the probability that it is a white ball?

Solution

The two balls transferred may be both white or both black or 1 white and 1 black.

Let B1 be the event of drawing 2 white balls from the first urn and B2 be the event of drawing 2 black
balls from it and B3 be the event of drawing 1 white and 1 black ball from it.

Let A be the event of drawing a white ball from the second urn after transfer.

$P(B_1) = 15/26$, $P(B_2) = 1/26$, $P(B_3) = 10/26$

$P(A/B_1)$ = P(drawing a white ball / 2 white balls have been transferred)

= 5/10, since the second urn then contains 5 white and 5 black balls.

Similarly $P(A/B_2) = 3/10$ and $P(A/B_3) = 4/10$

Therefore, $P(A) = P(B_1)P(A/B_1) + P(B_2)P(A/B_2) + P(B_3)P(A/B_3)$

$= \left(\frac{15}{26} \times \frac{5}{10}\right) + \left(\frac{1}{26} \times \frac{3}{10}\right) + \left(\frac{10}{26} \times \frac{4}{10}\right) = \frac{75 + 3 + 40}{260} = \frac{118}{260}$

= 59/130

Bayes' Theorem

If $B_1, B_2, \ldots, B_n$ are a set of exhaustive and mutually exclusive events of a sample space S and A is any
event associated with $B_1, B_2, \ldots, B_n$ such that $A \subset \bigcup_{i=1}^{n} B_i$, then

$$P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A/B_j)}$$

Examples:

1) The contents of urns I, II and III are as follows:

i) 1 white, 2 black and 3 red balls

ii) 2 white, 1 black and 1 red balls and

iii) 4 white, 5 black and 3 red balls

One urn is chosen at random and two balls are drawn. They happen to be white and red. What is the
probability that they come from urn I, urn II and urn III?

Solution

There are 3 urns. The probability of choosing one urn is 1/3.

Let B1 be the event of choosing urn I, B2 be the event of choosing urn II and B3 be the event of
choosing urn III.

$P(B_1) = 1/3$, $P(B_2) = 1/3$, $P(B_3) = 1/3$.

Let A be the event that the two balls chosen are one white and one red. If urn I is chosen, then

$$P(A/B_1) = \frac{{}^{1}C_1 \times {}^{3}C_1}{{}^{6}C_2} = \frac{1}{5}$$

If urn II is chosen, then

$$P(A/B_2) = \frac{{}^{2}C_1 \times {}^{1}C_1}{{}^{4}C_2} = \frac{2}{6} = \frac{1}{3}$$

If urn III is chosen, then

$$P(A/B_3) = \frac{{}^{4}C_1 \times {}^{3}C_1}{{}^{12}C_2} = \frac{2}{11}$$

$\therefore P(A) = P(B_1)P(A/B_1) + P(B_2)P(A/B_2) + P(B_3)P(A/B_3)$

$= \frac{1}{3} \times \frac{1}{5} + \frac{1}{3} \times \frac{1}{3} + \frac{1}{3} \times \frac{2}{11} = \frac{118}{495}$

Assume that A has happened,

i.e., 1 white and 1 red ball are chosen.

The probability that they come from urn I is

$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(A)} = \frac{\frac{1}{3} \times \frac{1}{5}}{118/495} = 0.2797$$

The probability that they come from urn II is

$$P(B_2/A) = \frac{P(B_2)\,P(A/B_2)}{P(A)} = \frac{\frac{1}{3} \times \frac{1}{3}}{118/495} = 0.4661$$

The probability that they come from urn III is

$$P(B_3/A) = \frac{P(B_3)\,P(A/B_3)}{P(A)} = \frac{\frac{1}{3} \times \frac{2}{11}}{118/495} = 0.2542$$
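
The urn calculation follows a fixed recipe: multiply each prior by its likelihood, add the products to obtain P(A), and divide. The sketch below packages that recipe in a small function; `posterior` is a hypothetical helper name, and the urn contents are taken from the example.

```python
from fractions import Fraction
from math import comb

def posterior(priors, likelihoods):
    """Return ([P(B_i/A)], P(A)) from the priors P(B_i) and likelihoods P(A/B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A), by the total probability theorem
    return [j / total for j in joint], total

# (white, black, red) counts for urns I, II and III
urns = [(1, 2, 3), (2, 1, 1), (4, 5, 3)]
priors = [Fraction(1, 3)] * 3

# P(one white and one red | urn) = (white * red) / C(total, 2)
likelihoods = [Fraction(w * r, comb(w + b + r, 2)) for w, b, r in urns]

post, p_a = posterior(priors, likelihoods)
print(p_a, post, [round(float(x), 4) for x in post])
```

Because the priors are equal here, the posterior probabilities are simply proportional to the likelihoods 1/5, 1/3 and 2/11.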

2) A bag contains 5 balls and it is not known how many of them are white. Two balls are drawn at
random from the bag and they are noted to be white. What is the chance that all the balls in the bag
are white?

Solution

Since 2 white balls have been drawn out, the bag must have contained 2,3,4 or 5 white balls.

Let B1 be the event of the bag containing 2 white balls, B2 be the event of the bag containing 3 white
balls, B3 be the event of the bag containing 4 white balls and B4 be the event of the bag containing 5
white balls.

Let A be the event of drawing 2 white balls.

$P(A/B_1) = 1/10$, $P(A/B_2) = 3/10$, $P(A/B_3) = 3/5$, $P(A/B_4) = 1$

Since the number of white balls is not known, the four possibilities are taken to be equally likely:

P(B1) = P(B2) = P(B3) = P(B4) = 1/4

By Bayes theorem

$$P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A/B_j)}$$

P(all the balls in the bag are white) = P(B4/A)

$$P(B_4/A) = \frac{P(B_4)\,P(A/B_4)}{\sum_{j=1}^{4} P(B_j)\,P(A/B_j)}$$

$$= \frac{(1/4)(1)}{(1/4)(1/10) + (1/4)(3/10) + (1/4)(3/5) + (1/4)(1)}$$

$$= \frac{10/40}{\frac{1}{40} + \frac{3}{40} + \frac{6}{40} + \frac{10}{40}} = \frac{10/40}{20/40} = \frac{1}{2}$$

3) In a bolt factory, machines A, B and C produce 25%, 35% and 40% of the total output respectively.
Of their outputs 5,4 and 2% respectively are defective bolts. If a bolt is chosen at random from the
combined output, what is the probability that it is defective? If a bolt chosen at random is found to be
defective, what is the probability that it was produced by B?

Solution

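Let E1, E2 and E3 be the events that the bolt was produced by machines A, B and C respectively.

P(E1) = 0.25, P(E2) = 0.35, P(E3) = 0.40

Let X be the event of drawing a defective bolt.

P(X/E1) = 0.05

P(X/E2) = 0.04

P(X/E3) = 0.02

By the total probability theorem,

$P(X) = (0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02) = 0.0125 + 0.014 + 0.008 = 0.0345$

By Bayes theorem

$$P(E_2/X) = \frac{P(E_2)\,P(X/E_2)}{P(E_1)P(X/E_1) + P(E_2)P(X/E_2) + P(E_3)P(X/E_3)} = \frac{0.014}{0.0345} = 0.406$$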

4) A given lot of IC chips contains 2% defective chips. Each chip is tested before delivery. The tester itself
is not totally reliable. The probability that the tester says the chip is good when it is really good is 0.95, and the
probability that the tester says the chip is defective when it is actually defective is 0.94. If a tested device is
indicated to be defective, what is the probability that it is actually defective?

Solution

Let E be the event that a chip is actually defective and D be the event that the tester says it is defective.

$P(E) = 0.02$

$P(\bar{E}) = 1 - P(E) = 1 - 0.02 = 0.98$

Given that the probability that the tester says the chip is good when it is really good is 0.95,

$P(\bar{D}/\bar{E}) = 0.95$

$P(D/\bar{E}) = 1 - 0.95 = 0.05$

$P(D/E) = 0.94$

$P(\bar{D}/E) = 1 - 0.94 = 0.06$

$$P(E/D) = \frac{P(E)\,P(D/E)}{P(E)\,P(D/E) + P(\bar{E})\,P(D/\bar{E})}$$

$$= \frac{(0.02)(0.94)}{(0.02)(0.94) + (0.98)(0.05)} = \frac{0.0188}{0.0678} = 0.2773$$

5) A certain firm has plant A, B and C producing IC chips. Plant A produces twice the output from B
and B produces twice the output from C. The probability of a non defective product produced by A,B
and C are respectively 0.85, 0.75, and 0.95. A customer receives a defective product, find the
probability that it came from plant B.

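Solution

Let A, B and C be the events that the product came from plants A, B and C respectively, and let E be the
event that the product is non-defective. The outputs of A, B and C are in the ratio 4 : 2 : 1, so

P(A) = 4/7, P(B) = 2/7, P(C) = 1/7

$P(E/A) = 0.85$, $P(E/B) = 0.75$, $P(E/C) = 0.95$

$P(\bar{E}/A) = 0.15$, $P(\bar{E}/B) = 0.25$, $P(\bar{E}/C) = 0.05$

The probability that a defective product received by the customer came from plant B is

$$P(B/\bar{E}) = \frac{P(B)\,P(\bar{E}/B)}{P(A)\,P(\bar{E}/A) + P(B)\,P(\bar{E}/B) + P(C)\,P(\bar{E}/C)}$$

$$= \frac{(2/7)(0.25)}{(4/7)(0.15) + (2/7)(0.25) + (1/7)(0.05)} = \frac{0.5}{1.15} = 0.4348$$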

6) There are 3 true coins and 1 false coin with head on both sides. A coin is chosen at random and
tossed 4 times. If head occurs all the 4 times, what is the probability that the false coin has been
chosen and used?
Solution
P(T)=P( the coin is a true coin)=3/4
P(F)=P(the coin is a false coin)=1/4
Let A be the event of getting all heads in 4 tosses.
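$P(A/T) = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{16}$
P(A/F) = 1
By Bayes theorem

$$P(F/A) = \frac{P(F)\,P(A/F)}{P(F)\,P(A/F) + P(T)\,P(A/T)}$$

$$= \frac{\frac{1}{4}(1)}{\frac{1}{4}(1) + \left(\frac{3}{4}\right)\left(\frac{1}{16}\right)} = \frac{16}{19}$$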

Random Variable:

A real variable X whose value is determined by the outcome of a random experiment is called a
random variable.

Example

A random experiment consists of two tosses of a coin. Consider the random variable X which is the
number of heads (0, 1 or 2).

Outcome       HH   HT   TH   TT
Value of X     2    1    1    0

If the function values are ordered pairs of real numbers, the function is said to be a two-dimensional
random variable.

Discrete Random Variable

A random variable which can assume only a countable number of real values is called a discrete
random variable

Example

Number of telephone calls per unit time, marks obtained in a test, number of printing mistakes in each
page of a book.

Probability Mass Function

Suppose X is a one-dimensional discrete random variable taking at most a countably infinite number
of values $x_1, x_2, \ldots$. With each possible outcome $x_i$ we associate a number $p_i = P(X = x_i) = p(x_i)$, called
the probability of $x_i$.

The function $p(x_i)$, i = 1, 2, ..., satisfying the conditions

i. $p(x_i) \ge 0$ for all i

ii. $\sum_{i=1}^{\infty} p(x_i) = 1$

is called the probability mass function or probability function of the random variable X.

The collection of pairs $\{x_i, p_i\}$, i = 1, 2, 3, ..., is called the probability distribution of the random variable
X.

Discrete Distribution Function

The distribution function of the random variable X with probability mass function $p(x_i)$, i = 1, 2, ..., n is
defined as $F(x) = P(X \le x) = \sum_{i:\, x_i \le x} p(x_i)$.

Note:

i) $p(x_i) = P(X = x_i) = F(x_i) - F(x_{i-1})$, where F is the distribution function of the random variable X.

ii) Mean of the random variable X = $E(X) = \sum_{x} x\,p(x)$

iii) Variance of the random variable X = $V(X) = \sum_{x} x^2 p(x) - \left[\sum_{x} x\,p(x)\right]^2$

$V(X) = E(X^2) - [E(X)]^2$

Continuous Random Variable

A random variable X is said to be continuous if it can take all possible values between certain limits.

Probability density function

The probability density function $f_X(x)$ of a continuous random variable X is defined by

$P(x \le X \le x + dx) = f_X(x)\,dx$, where (x, x+dx) is an infinitesimally small interval, and satisfies
the following conditions.

i. $f_X(x)$ is integrable over the range $-\infty < x < \infty$

ii. $f(x) \ge 0$ for all x, $-\infty < x < \infty$

iii. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Properties of Probability density function

The probability density function $f_X(x)$ of a continuous random variable X satisfies the
following properties

i. $f(x) \ge 0$

ii. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

iii. $P(x_1 < X < x_2) = \int_{x_1}^{x_2} f_X(x)\,dx$

iv. $P(x_1 \le X \le x_2) = P(x_1 < X < x_2)$

v. $P(X = a) = \int_{a}^{a} f(x)\,dx = 0$

Note: In the case of a continuous random variable, the probability at a point is always zero,
i.e., P(X = a) = 0 for all possible values of a.

Cumulative Distribution Function

The cumulative distribution function F(x) of a continuous random variable X with PDF f(x) is given by

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$$

Note:

i. $P(a \le X \le b) = F(b) - F(a)$
ii. $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$; $F(\infty) = \lim_{x \to \infty} F(x) = 1$
iii. The relation between the CDF and PDF is $f(x) = \frac{d}{dx}F(x)$
iv. If X is a continuous random variable with PDF f(x) then

$$\text{Mean} = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$

$$\text{Var}(X) = E(X^2) - [E(X)]^2$$

Examples:

1) A random variable X has the following probability distribution

X      0    1    2    3    4    5     6     7     8
P(X)   a    3a   5a   7a   9a   11a   13a   15a   17a

Find

i) a

ii) $P(X \le 2)$

iii) the distribution function of X and

iv) the mean of X

Solution

i) If p(x) is the probability mass function, then $\sum p(x) = 1$

i.e., a + 3a + 5a + 7a + 9a + 11a + 13a + 15a + 17a = 1

81a = 1

a = 1/81

ii) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$

= a + 3a + 5a

= 9a

= 9(1/81) = 1/9.

iii) The distribution function of X is

F(x) = 0, x < 0
= $P(X \le 0) = a = \frac{1}{81}$, $0 \le x < 1$
= $P(X \le 1) = 4a = \frac{4}{81}$, $1 \le x < 2$
= $P(X \le 2) = 9a = \frac{9}{81}$, $2 \le x < 3$
= $P(X \le 3) = 16a = \frac{16}{81}$, $3 \le x < 4$
= $P(X \le 4) = 25a = \frac{25}{81}$, $4 \le x < 5$
= $P(X \le 5) = 36a = \frac{36}{81}$, $5 \le x < 6$
= $P(X \le 6) = 49a = \frac{49}{81}$, $6 \le x < 7$
= $P(X \le 7) = 64a = \frac{64}{81}$, $7 \le x < 8$
= $P(X \le 8) = 81a = 1$, $x \ge 8$

iv) The mean of X,

i.e., $E(X) = \sum x\,p(x)$
= 0 + 3a + 10a + 21a + 36a + 55a + 78a + 105a + 136a

= 444a

= 444(1/81)

= 5.48

The variance, if required, follows from $\text{Var}(X) = \sum x^2 p(x) - \left[\sum x\,p(x)\right]^2$.
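
The constant a, the distribution function and the moments of this distribution can be computed mechanically. The sketch below uses exact fractions; the names `weights`, `pmf` and `cdf` are choices made for the illustration, and the variance is computed from the formula stated above rather than taken from the notes.

```python
from fractions import Fraction
from itertools import accumulate

# P(X = x) = (2x + 1) a for x = 0, 1, ..., 8, i.e. a, 3a, 5a, ..., 17a
weights = [2 * x + 1 for x in range(9)]
a = Fraction(1, sum(weights))  # the probabilities must add up to 1

pmf = {x: w * a for x, w in enumerate(weights)}
cdf = dict(zip(pmf, accumulate(pmf.values())))  # F(x) at x = 0, 1, ..., 8

mean = sum(x * p for x, p in pmf.items())
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2

print(a, cdf[2], mean, float(mean), variance, float(variance))
```

The running totals 1, 4, 9, ..., 81 of the weights are perfect squares, which is why the distribution function takes the values (x+1)^2/81.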


2. If the random variable X takes the values 1, 2, 3 and 4 such that
$2P(X = 1) = 3P(X = 2) = P(X = 3) = 5P(X = 4)$, find the probability distribution function and
cumulative distribution function of X.

Solution

Let $P(X = 3) = k$. Then

$2P(X = 1) = k \Rightarrow P(X = 1) = \frac{k}{2}$

$3P(X = 2) = k \Rightarrow P(X = 2) = \frac{k}{3}$

$5P(X = 4) = k \Rightarrow P(X = 4) = \frac{k}{5}$

Since $\sum_{i=1}^{4} p(x_i) = 1$,

$\frac{k}{2} + \frac{k}{3} + k + \frac{k}{5} = 1$, i.e., $\frac{61k}{30} = 1$

$k = \frac{30}{61}$

The probability distribution function

$P(X = 1) = \frac{k}{2} = \frac{30/61}{2} = \frac{15}{61} = 0.2459$

$P(X = 2) = \frac{k}{3} = \frac{30/61}{3} = \frac{30}{183} = 0.1639$

$P(X = 3) = k = \frac{30}{61} = 0.4918$

$P(X = 4) = \frac{k}{5} = \frac{30/61}{5} = \frac{6}{61} = 0.0984$

To find the cumulative distribution function

The CDF $F(x) = P(X \le x)$ is calculated as follows

F(x) = 0, x < 1
= 0.2459, $1 \le x < 2$
= 0.2459 + 0.1639 = 0.4098, $2 \le x < 3$
= 0.4098 + 0.4918 = 0.9016, $3 \le x < 4$
= 0.9016 + 0.0984 = 1, $x \ge 4$

3) The probability distribution function of X is

X = i       1       2       3       4
P(X = i)    15/61   10/61   30/61   6/61

Find the cumulative distribution function of X.

Solution

The CDF $F(x) = P(X \le x)$ is defined as follows

F(x) = 0, x < 1
= $\frac{15}{61}$, $1 \le x < 2$
= $\frac{25}{61}$, $2 \le x < 3$
= $\frac{55}{61}$, $3 \le x < 4$
= 1, $x \ge 4$

4) A continuous random variable has the probability density function

$$f(x) = \begin{cases} k(x - 1)^3, & 1 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$$

Find

i) the value of k

ii) the distribution function F(x)

Solution

i) Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, we have

$$\int_{1}^{3} k(x - 1)^3\,dx = 1$$

$$k\left[\frac{(x - 1)^4}{4}\right]_{1}^{3} = 1$$

$(k/4)(2^4 - 0) = 1$

$k = 1/4$

$\therefore f(x) = \frac{1}{4}(x - 1)^3$, $1 \le x \le 3$

ii) $F(x) = \int_{-\infty}^{x} f(t)\,dt$

$= \int_{-\infty}^{1} f(t)\,dt + \int_{1}^{x} f(t)\,dt$

$= \int_{-\infty}^{1} 0\,dt + \int_{1}^{x} \frac{1}{4}(t - 1)^3\,dt$

$= \frac{1}{16}(x - 1)^4$

$$\therefore F(x) = \begin{cases} 0, & x < 1 \\ \frac{1}{16}(x - 1)^4, & 1 \le x \le 3 \\ 1, & x > 3 \end{cases}$$

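5) A random variable X has the PDF f(x) given by

$$f(x) = \begin{cases} c\,x\,e^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases}$$

Find the value of c and the CDF of X.

Solution

If f(x) is a PDF, then

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

$$\int_{0}^{\infty} c\,x\,e^{-x}\,dx = 1$$

$$c\left[-x e^{-x} - e^{-x}\right]_{0}^{\infty} = 1$$

$c(1) = 1$

c = 1

$\therefore f(x) = x e^{-x}$, $x > 0$

The CDF of X is $F(x) = P(X \le x) = \int_{0}^{x} t e^{-t}\,dt$

$\therefore F(x) = \left[-t e^{-t} - e^{-t}\right]_{0}^{x}$

$= -x e^{-x} - e^{-x} + 1$

$= 1 - e^{-x}(1 + x)$

$$\therefore F(x) = \begin{cases} 1 - e^{-x}(1 + x), & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$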

6) A continuous random variable X follows the probability law $f(x) = ax^2$, $0 \le x \le 1$. Determine a and
find the probability that X lies between 1/4 and 1/2.

Solution

If f(x) is a PDF then $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Since f(x) = 0 outside [0, 1],

$\int_0^1 a x^2\,dx = 1$

$a\left[\frac{x^3}{3}\right]_0^1 = 1$

$a\left(\frac{1}{3} - 0\right) = 1$, i.e., a = 3

Therefore $f(x) = 3x^2$.

$P\left(\frac14 < X < \frac12\right) = \int_{1/4}^{1/2} 3x^2\,dx = \left[x^3\right]_{1/4}^{1/2} = \left(\frac12\right)^3 - \left(\frac14\right)^3 = \frac18 - \frac{1}{64} = \frac{7}{64}$
7) If the PDF of a random variable X is f(x) = x/2 in $0 \le x \le 2$, find $P(X > 1.5 \mid X > 1)$.

Solution

$P(X > 1.5 \mid X > 1) = \frac{P(X > 1.5 \cap X > 1)}{P(X > 1)} = \frac{P(X > 1.5)}{P(X > 1)}$

$= \frac{\int_{1.5}^{2} \frac{x}{2}\,dx}{\int_{1}^{2} \frac{x}{2}\,dx} = \frac{\frac12\left[\frac{x^2}{2}\right]_{1.5}^{2}}{\frac12\left[\frac{x^2}{2}\right]_{1}^{2}}$

$= \frac{4 - (1.5)^2}{4 - 1} = \frac{4 - 2.25}{3} = \frac{1.75}{3} = 0.5833$
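The conditional probability above can be checked with exact rational arithmetic. The sketch below is only an illustration: the helper names `cdf` and `conditional_tail` are assumptions introduced here, and the closed form F(x) = x²/4 is obtained by integrating f(x) = x/2 on [0, 2].

```python
from fractions import Fraction


def cdf(x):
    """CDF of the density f(x) = x/2 on [0, 2], i.e. F(x) = x^2 / 4 there."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 2:
        return Fraction(1)
    return x * x / 4


def conditional_tail(a, b):
    """P(X > a | X > b) for a >= b, using P(X > a) / P(X > b)."""
    return (1 - cdf(a)) / (1 - cdf(b))


result = conditional_tail(Fraction(3, 2), 1)
print(result, float(result))
```

The exact value is 7/12, which agrees with the decimal 0.5833 obtained above.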

8) If X is a continuous random variable with PDF

$$f(x) = \begin{cases} x, & 0 < x \le 1 \\ \frac{3}{2}(x-1)^2, & 1 < x \le 2 \\ 0, & \text{otherwise} \end{cases}$$

find the cumulative distribution function F(x) of X and use it to find $P\left(\frac32 < X < \frac52\right)$.

Solution

By definition $F(x) = P(X \le x)$.

For $0 < x \le 1$, $F(x) = \int_0^x t\,dt = \frac{x^2}{2}$.

For $1 < x \le 2$,

$F(x) = \int_0^1 t\,dt + \int_1^x \frac32 (t-1)^2\,dt$

$= \left[\frac{t^2}{2}\right]_0^1 + \frac32\left[\frac{(t-1)^3}{3}\right]_1^x$

$= \frac12 + \frac{(x-1)^3}{2}$

$= \frac12 + \frac{x^3 - 3x^2 + 3x - 1}{2}$

$= \frac{x^3 - 3x^2 + 3x}{2}$

F(x) = 1 when x > 2.

The cumulative distribution function is

$$F(x) = \begin{cases} 0, & x \le 0 \\ \frac{x^2}{2}, & 0 < x \le 1 \\ \frac{x^3 - 3x^2 + 3x}{2}, & 1 < x \le 2 \\ 1, & x > 2 \end{cases}$$

To find $P\left(\frac32 < X < \frac52\right)$:

$P\left(\frac32 < X < \frac52\right) = F(5/2) - F(3/2)$

$= 1 - \frac{(3/2)^3 - 3(3/2)^2 + 3(3/2)}{2}$

$= 1 - \frac{(27/8) - (27/4) + (9/2)}{2}$

$= 1 - \frac12\left(\frac{27 - 54 + 36}{8}\right)$

= 1-(9/16)=7/16.
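The closed-form CDF can be cross-checked against a direct numerical integration of the density. The following is a rough sketch under stated assumptions: the density on (1, 2] is taken as (3/2)(x − 1)², as used in the derivation, and the names `pdf`, `cdf_closed` and `integrate` are hypothetical helpers, with a simple midpoint rule standing in for any quadrature method.

```python
def pdf(x):
    """Piecewise density: x on (0, 1], 1.5 * (x - 1)^2 on (1, 2], else 0."""
    if 0 < x <= 1:
        return x
    if 1 < x <= 2:
        return 1.5 * (x - 1) ** 2
    return 0.0


def cdf_closed(x):
    """Closed-form CDF obtained by integrating the density piece by piece."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return (x ** 3 - 3 * x ** 2 + 3 * x) / 2
    return 1.0


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


prob_closed = cdf_closed(2.5) - cdf_closed(1.5)
prob_numeric = integrate(pdf, 1.5, 2.5)
print(prob_closed, prob_numeric)
```

Both routes should give a value close to 7/16 = 0.4375.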

9) If the density function of a continuous random variable X is

$$f(x) = \begin{cases} ax, & 0 \le x \le 1 \\ a, & 1 \le x \le 2 \\ 3a - ax, & 2 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$$

i) Find the value of a

ii) the cumulative distribution function of X and

iii) $P(X \le 1.5)$

Solution

Since f(x) is a PDF of X, we have $\int_{-\infty}^{\infty} f(x)\,dx = 1$:

$\int_0^1 ax\,dx + \int_1^2 a\,dx + \int_2^3 (3a - ax)\,dx = 1$

$a\left[\frac{x^2}{2}\right]_0^1 + \left[ax\right]_1^2 + \left[3ax - a\frac{x^2}{2}\right]_2^3 = 1$

$\frac{a}{2}(1 - 0) + a(2 - 1) + \left(9a - \frac{9a}{2} - 6a + 2a\right) = 1$

$\frac{a}{2} + a + \frac{a}{2} = 1$

$\frac{4a}{2} = 1$

a=1/2.

The cumulative distribution function of the random variable X is $F(x) = P(X \le x)$.

For x < 0, F(x) = 0.

For $0 \le x \le 1$:

$F(x) = \int_0^x at\,dt = a\left[\frac{t^2}{2}\right]_0^x = \frac{a}{2}x^2 = \frac{x^2}{4}$

For $1 \le x \le 2$:

$F(x) = \int_0^1 at\,dt + \int_1^x a\,dt = \frac{a}{2} + ax - a = a\left(x - \frac12\right) = \frac{x}{2} - \frac14$

For $2 \le x \le 3$:

$F(x) = \int_0^1 at\,dt + \int_1^2 a\,dt + \int_2^x (3a - at)\,dt$

$= \frac{a}{2} + a + \left(3ax - a\frac{x^2}{2} - 6a + 2a\right)$

$= -\frac{5a}{2} + 3ax - a\frac{x^2}{2}$

$= -\frac54 + \frac{3x}{2} - \frac{x^2}{4}$

F(x) = 1 when x > 3.

Therefore the CDF of X is

$$F(x) = \begin{cases} 0, & x < 0 \\ \frac{x^2}{4}, & 0 \le x \le 1 \\ \frac{x}{2} - \frac14, & 1 \le x \le 2 \\ -\frac54 + \frac{3x}{2} - \frac{x^2}{4}, & 2 \le x \le 3 \\ 1, & x > 3 \end{cases}$$

iii) $P(X \le 1.5) = F(1.5) = \frac{1.5}{2} - \frac14 = \frac12$

7) The CDF of a random variable X is given by

$$F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x < \frac12 \\ 1 - \frac{3}{25}(3 - x)^2, & \frac12 \le x < 3 \\ 1, & x \ge 3 \end{cases}$$

Find the PDF of X and evaluate $P(|X| \le 1)$ and $P\left(\frac13 \le X < 4\right)$ using both the PDF and the CDF.

Solution

We know that $f(x) = \frac{d}{dx}F(x) = F'(x)$.

The PDF of X is

$$f(x) = \begin{cases} 0, & x < 0 \\ 2x, & 0 \le x < \frac12 \\ \frac{3}{25}\cdot 2(3 - x) = \frac{6}{25}(3 - x), & \frac12 \le x < 3 \\ 0, & x \ge 3 \end{cases}$$

Case i: Using the PDF

$P(|X| \le 1) = P(-1 \le X \le 1) = \int_{-1}^{1} f(x)\,dx = \int_0^{1/2} 2x\,dx + \int_{1/2}^{1} \frac{6}{25}(3 - x)\,dx = \frac14 + \frac{27}{100} = \frac{13}{25}$

$P\left(\frac13 \le X < 4\right) = \int_{1/3}^{4} f(x)\,dx = \int_{1/3}^{1/2} 2x\,dx + \int_{1/2}^{3} \frac{6}{25}(3 - x)\,dx = \left(\frac14 - \frac19\right) + \frac34 = \frac89$

Case ii: Using the CDF

$P(|X| \le 1) = P(-1 \le X \le 1) = F(1) - F(-1)$

$= 1 - \frac{3}{25}(2^2) - 0 = 1 - \frac{12}{25}$

$P(|X| \le 1) = 13/25$.

$P(1/3 \le X < 4) = F(4) - F(1/3) = 1 - \frac{1}{3^2} = \frac89$

8) If the random variable X has the PDF

$$f(x) = \begin{cases} \frac14, & |x| < 2 \\ 0, & \text{otherwise} \end{cases}$$

Find

i. $P(X < 1)$
ii. $P(|X| > 1)$
iii. $P(2X + 3 > 5)$

Solution

i. $P(X < 1) = \int_{-\infty}^{1} f(x)\,dx = \int_{-2}^{1} \frac14\,dx = \frac14\left[x\right]_{-2}^{1} = \frac{1 - (-2)}{4} = \frac34$

ii. $P(|X| > 1) = 1 - P(|X| \le 1) = 1 - P(-1 \le X \le 1)$

$= 1 - \int_{-1}^{1} \frac14\,dx = 1 - \frac14\left[x\right]_{-1}^{1} = 1 - \frac12 = \frac12$

iii. To find $P(2X + 3 > 5)$:

$P(2X + 3 > 5) = P(2X > 5 - 3) = P(2X > 2) = P(X > 1)$

$= 1 - P(X \le 1) = 1 - \frac34 = \frac14$

MOMENTS

If X is a random variable which is discrete or continuous, the moments about the origin, denoted by
$\mu_r'$, are defined as

$\mu_r' = E(X^r)$ for r = 1,2,3,…

The moments about the mean, or central moments, denoted by $\mu_r$, are defined as

$\mu_r = E\left[(X - \bar{X})^r\right]$, for r = 1,2,3,…

If X is a discrete random variable which can assume any one of the values x1, x2,…,xn with respective
probabilities p(x1),p(x2),…,p(xn), then

$\mu_r' = E(X^r) = \sum_{i=1}^{n} x_i^r\,p(x_i)$

and $\mu_r = E\left[(X - \bar{X})^r\right] = \sum_{i=1}^{n} (x_i - \bar{x})^r\,p(x_i)$.

If X is a continuous random variable with PDF f(x) then

$\mu_r' = \int_{-\infty}^{\infty} x^r f(x)\,dx$, r = 1,2,3,…

and $\mu_r = \int_{-\infty}^{\infty} (x - \bar{x})^r f(x)\,dx$, r = 1,2,3,…

Relation between moments about origin and moments about mean $\bar{X}$

By definition, $\mu_r = E\left[(X - \bar{X})^r\right]$. Expanding by the binomial theorem,

$\mu_r = E\left[X^r - {}^{r}C_1 X^{r-1}\bar{X} + {}^{r}C_2 X^{r-2}\bar{X}^2 - {}^{r}C_3 X^{r-3}\bar{X}^3 + \dots + (-1)^r \bar{X}^r\right]$

$= E(X^r) - r\,E(X^{r-1})\bar{X} + {}^{r}C_2 E(X^{r-2})\bar{X}^2 - {}^{r}C_3 E(X^{r-3})\bar{X}^3 + \dots + (-1)^r \bar{X}^r$

Since $E(X) = \bar{X} = \mu_1'$ we have

$\mu_r = \mu_r' - r\,\mu_{r-1}'\mu_1' + {}^{r}C_2\,\mu_{r-2}'(\mu_1')^2 - {}^{r}C_3\,\mu_{r-3}'(\mu_1')^3 + \dots + (-1)^r (\mu_1')^r$

Results

1. The first moment about the mean is always zero, since $\mu_1 = \mu_1' - \mu_0'\mu_1' = 0$ (with $\mu_0' = 1$).
2. The first moment about the origin is the mean.
3. The second moment about the mean is

$\mu_2 = \mu_2' - 2\mu_1'\mu_1' + (\mu_1')^2$

$\mu_2 = \mu_2' - (\mu_1')^2$

$Var(X) = E(X^2) - [E(X)]^2$

i.e., the second moment about the mean is the variance.

4. The third moment about the mean is

$\mu_3 = E\left[(X - \bar{X})^3\right] = \mu_3' - 3\mu_2'\mu_1' + {}^{3}C_2\,\mu_1'(\mu_1')^2 - {}^{3}C_3(\mu_1')^3$

$= \mu_3' - 3\mu_2'\mu_1' + 3(\mu_1')^3 - (\mu_1')^3$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$

Similarly,

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$ and so on.

Relation between moments about any point A and moments about mean $\bar{X}$.

Here the moments about A are $\mu_r' = E\left[(X - A)^r\right]$.

Putting r = 1,

$\mu_1' = E(X - A) = \bar{X} - A$, so Mean $\bar{X} = \mu_1' + A$

Putting r = 2,

$\mu_2 = \mu_2' - (\mu_1')^2$

Similarly we get,

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$ etc.

Properties of Moments

1. If X is a random variable then

$E(aX + b) = aE(X) + b$

Note: $E(X + Y) = E(X) + E(Y)$

2. If X is a random variable then $Var(aX + b) = a^2\,Var(X)$

3. If X and Y are independent then $Var(aX \pm bY) = a^2\,Var(X) + b^2\,Var(Y)$

4. If X and Y are independent then $E(XY) = E(X)E(Y)$

5. If X and Y are two random variables such that $Y \le X$ then $E(Y) \le E(X)$

Covariance (X,Y)

The covariance of two random variables is denoted by
$\sigma_{xy} = Cov(X, Y)$, which is defined as

Cov(X,Y)=E(XY)-E(X)E(Y)

Note:

i. If X and Y are independent random variables then Cov(X,Y)=0
ii. If X and Y are any two random variables then $Var(aX + bY) = a^2\,Var(X) + b^2\,Var(Y) + 2ab\,Cov(X, Y)$
iii. $Cov(aX + b, cY + d) = ac\,Cov(X, Y)$

Various measures of Central tendency, dispersion, skewness and kurtosis for continuous
probability distribution

Let f(x) be the P.D.F of a random variable X where X is defined from a to b. Then

Arithmetic Mean: $\bar{x} = \int_a^b x f(x)\,dx$

Harmonic Mean H: $\frac{1}{H} = \int_a^b \frac{1}{x} f(x)\,dx$

Geometric Mean G: $\log G = \int_a^b \log x\, f(x)\,dx$

Median: Median is the point which divides the entire distribution into two equal parts. In the case of a
continuous distribution, the median is the point which divides the total area into two equal parts. Thus if M is the
median then

$\int_a^M f(x)\,dx = \int_M^b f(x)\,dx = \frac12$

Mean deviation: Mean deviation about the mean $\mu_1'$ is given by

$M.D. = \int_a^b |x - \text{mean}|\, f(x)\,dx$

Mean deviation about an average A is given by

M.D. about A $= \int_a^b |x - A|\, f(x)\,dx$

Mode: Mode is the value of X for which f(x) is maximum,

i.e., the mode is given by $f'(x) = 0$ and $f''(x) < 0$ for $x \in [a, b]$.

Coefficients of skewness and kurtosis are defined as

$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$; $\beta_2 = \frac{\mu_4}{\mu_2^2}$

MOMENT GENERATING FUNCTION

Definition

The moment generating function of a random variable X, denoted by $M_X(t)$, is defined as

$M_X(t) = E(e^{tX})$

Expanding the exponential,

$M_X(t) = E\left[1 + \frac{tX}{1!} + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots\right]$

$= E(1) + \frac{t}{1!}E(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \dots + \frac{t^r}{r!}E(X^r) + \dots$

$= 1 + \frac{t}{1!}\mu_1' + \frac{t^2}{2!}\mu_2' + \frac{t^3}{3!}\mu_3' + \dots + \frac{t^r}{r!}\mu_r' + \dots$

$= \sum_{r=0}^{\infty} \frac{t^r}{r!}\mu_r'$

which gives the MGF in terms of moments.

The coefficient of $\frac{t^r}{r!}$ in the expansion of $M_X(t)$ is $\mu_r' = E(X^r)$, the r-th moment about the
origin, where r = 1,2,…

The moment generating function of X about any point X = a is defined as

$M_X(t)\ (\text{about } X = a) = E\left(e^{t(X - a)}\right) = E\left[1 + \frac{t(X - a)}{1!} + \frac{t^2 (X - a)^2}{2!} + \frac{t^3 (X - a)^3}{3!} + \dots\right]$

$= 1 + \frac{t}{1!}\mu_1' + \frac{t^2}{2!}\mu_2' + \frac{t^3}{3!}\mu_3' + \dots$, where here $\mu_r' = E\left[(X - a)^r\right]$.

Since $M_X(t)$ generates moments, it is called the moment generating function.

If X is a discrete random variable with PMF p(x) then

$M_X(t) = E(e^{tX}) = \sum_x e^{tx} p(x)$

If X is a continuous random variable with PDF f(x) then

$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$

Moments using Moment Generating Function

Differentiating the series expansion of $M_X(t)$ with respect to t and then putting t = 0 gives

$\mu_1' = \left[\frac{d}{dt} M_X(t)\right]_{t=0}$

$\mu_2' = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0}$

In general,

$\mu_r' = \left[\frac{d^r}{dt^r} M_X(t)\right]_{t=0}$, r=1,2,3,…

Note:

1. The moment generating function $M_X(t)$ is used to calculate the higher moments.

2. $M_{aX}(t) = M_X(at)$, a being a constant.

3. The moment generating function of the sum of a given number of independent random variables is
equal to the product of their respective moment generating functions,

i.e., $M_{X_1 + X_2 + \dots + X_n}(t) = M_{X_1}(t)\,M_{X_2}(t) \cdots M_{X_n}(t)$.

4. Mean $= \mu_1'$

5. Variance $= \mu_2' - (\mu_1')^2$

Exercises

1) Find the M.G.F. for the following function given by

X 0 1 2 3 4 5 6

P(X) 1/49 3/49 5/49 7/49 9/49 11/49 13/49

Solution

$M_X(t) = \sum_{x=0}^{6} e^{tx} p(x)$

$= e^{0}(1/49) + e^{t}(3/49) + e^{2t}(5/49) + e^{3t}(7/49) + e^{4t}(9/49) + e^{5t}(11/49) + e^{6t}(13/49)$

$= \frac{1}{49}\left(1 + 3e^{t} + 5e^{2t} + 7e^{3t} + 9e^{4t} + 11e^{5t} + 13e^{6t}\right)$
2) A random variable X has the probability function

$f(x) = \frac{1}{2^x}$, x = 1,2,3,…

Find its i) M.G.F. ii) Mean

Solution

X is a discrete random variable, so

$M.G.F. = \sum_{x=1}^{\infty} e^{tx} p(x) = \sum_{x=1}^{\infty} e^{tx} \frac{1}{2^x} = \sum_{x=1}^{\infty} \left(\frac{e^t}{2}\right)^x$

$= \frac{e^t}{2} + \left(\frac{e^t}{2}\right)^2 + \left(\frac{e^t}{2}\right)^3 + \dots$

$= \frac{e^t}{2}\left[1 + \frac{e^t}{2} + \left(\frac{e^t}{2}\right)^2 + \dots\right]$

$= \frac{e^t}{2}\left(1 - \frac{e^t}{2}\right)^{-1} = \frac{e^t}{2 - e^t}$ (valid for $e^t < 2$)

Therefore $M_X(t) = \frac{e^t}{2 - e^t}$

Mean $= \mu_1' = \left[\frac{d}{dt}\left(\frac{e^t}{2 - e^t}\right)\right]_{t=0} = \left[\frac{(2 - e^t)e^t - e^t(-e^t)}{(2 - e^t)^2}\right]_{t=0}$

$= \left[\frac{2e^t - e^{2t} + e^{2t}}{(2 - e^t)^2}\right]_{t=0} = \left[\frac{2e^t}{(2 - e^t)^2}\right]_{t=0} = \frac{2}{1} = 2$.

Variance $= \mu_2' - (\mu_1')^2$, where

$\mu_2' = \left[\frac{d^2}{dt^2}\left(\frac{e^t}{2 - e^t}\right)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{2e^t}{(2 - e^t)^2}\right)\right]_{t=0}$

$= \left[\frac{2e^t(2 - e^t)^2 + 2e^t \cdot 2(2 - e^t)e^t}{(2 - e^t)^4}\right]_{t=0} = \left[\frac{2e^t(2 + e^t)}{(2 - e^t)^3}\right]_{t=0} = 6$

Therefore variance $= \mu_2' - (\mu_1')^2 = 6 - 4 = 2$.
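The mean and variance of this distribution can also be checked by summing the series directly, without differentiating the MGF. The sketch below is illustrative: the names `pmf`, `mgf_series` and `mgf_closed` are assumptions made for this example, and the infinite sums are truncated at 200 terms, which is far more than enough because the terms decay like 2⁻ˣ.

```python
import math


def pmf(x):
    """P(X = x) = 1 / 2^x for x = 1, 2, 3, ..."""
    return 0.5 ** x


def mgf_series(t, terms=200):
    """Truncated series for E[e^{tX}]; converges when e^t < 2."""
    return sum(math.exp(t * x) * pmf(x) for x in range(1, terms + 1))


def mgf_closed(t):
    """Closed form e^t / (2 - e^t) derived in the solution."""
    return math.exp(t) / (2 - math.exp(t))


support = range(1, 201)
mean = sum(x * pmf(x) for x in support)
second_moment = sum(x * x * pmf(x) for x in support)
variance = second_moment - mean ** 2

print(mean, second_moment, variance)
print(mgf_series(0.1), mgf_closed(0.1))
```

The direct sums give a mean of 2, a second raw moment of 6 and a variance of 2, and the truncated series matches the closed-form MGF.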

3) Find the M.G.F for the distribution

$$f(x) = \begin{cases} 2/3, & x = 1 \\ 1/3, & x = 2 \\ 0, & \text{otherwise} \end{cases}$$

Also find i) Mean, ii) Variance

Solution

$M.G.F. = E(e^{tX}) = \sum e^{tx} p(x)$

$M_X(t) = \frac{2}{3}e^{t} + \frac{1}{3}e^{2t}$

$\mu_1' = \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{2e^t}{3} + \frac{2e^{2t}}{3}\right]_{t=0} = \frac23 + \frac23 = \frac43$

Variance $= \mu_2' - (\mu_1')^2$

$\mu_2' = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{2e^t}{3} + \frac{2e^{2t}}{3}\right)\right]_{t=0} = \left[\frac{2e^t}{3} + \frac{4e^{2t}}{3}\right]_{t=0} = \frac23 + \frac43 = \frac63 = 2$

Variance $= \mu_2' - (\mu_1')^2 = \frac63 - \left(\frac43\right)^2 = \frac63 - \frac{16}{9} = \frac{18 - 16}{9} = \frac29$.

4) The moment generating function of a random variable X is given by

$M_X(t) = \frac{e^t}{3} + \frac{4e^{3t}}{15} + \frac{2e^{4t}}{15} + \frac{4e^{5t}}{15}$

Find the probability function of X.

Solution

For a discrete random variable, $M_X(t) = E(e^{tX}) = \sum e^{tx} p(x)$, so the coefficient of $e^{tx}$ is p(x).
Hence the probability function is given by

x      1      3      4      5

P(x)   1/3    4/15   2/15   4/15

5) Find the M.G.F. of the random variable whose moments are $\mu_r' = (r + 1)!\,3^r$ and hence find its
mean.

Solution

The M.G.F. is $M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\mu_r'$

$= \sum_{r=0}^{\infty} \frac{t^r}{r!}(r + 1)!\,3^r$

$= \sum_{r=0}^{\infty} \frac{(3t)^r}{r!}(r + 1)!$

$= \sum_{r=0}^{\infty} (r + 1)(3t)^r$

$M_X(t) = 1 + 2(3t) + 3(3t)^2 + \dots = (1 - 3t)^{-2}$

$M_X(t) = \frac{1}{(1 - 3t)^2}$

Mean $= \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{(-2)(-3)}{(1 - 3t)^3}\right]_{t=0} = \left[\frac{6}{(1 - 3t)^3}\right]_{t=0}$

Mean = 6

(Or) Mean $= \mu_1' =$ coefficient of $\frac{t}{1!}$ in the expansion $= 6$.

6) Find the moment generating function of the distribution

$$f(x) = \begin{cases} k e^{-kx}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

Hence find i) Mean and ii) Variance, iii) $\mu_3'$, iv) $\mu_4'$.

Solution

$M.G.F. = M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$

$= \int_0^{\infty} e^{tx}\,k e^{-kx}\,dx = k\int_0^{\infty} e^{(t - k)x}\,dx$

$= k\left[\frac{e^{(t - k)x}}{t - k}\right]_0^{\infty} = \frac{k}{t - k}(0 - 1)$ for $t < k$

Therefore $M_X(t) = \frac{k}{k - t}$

Mean $= \mu_1' = \left[\frac{d}{dt}\left(\frac{k}{k - t}\right)\right]_{t=0} = \left[\frac{k}{(k - t)^2}\right]_{t=0} = \frac{k}{k^2} = \frac{1}{k}$.

Next, to find the variance $= \mu_2' - (\mu_1')^2$:

$\mu_2' = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{k}{(k - t)^2}\right)\right]_{t=0} = \left[\frac{2k}{(k - t)^3}\right]_{t=0} = \frac{2k}{k^3} = \frac{2}{k^2}$

Variance $= \mu_2' - (\mu_1')^2 = \frac{2}{k^2} - \frac{1}{k^2} = \frac{1}{k^2}$

$\mu_3' = \left[\frac{d^3}{dt^3} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{2k}{(k - t)^3}\right)\right]_{t=0} = \left[\frac{6k}{(k - t)^4}\right]_{t=0} = \frac{6k}{k^4} = \frac{6}{k^3}$

$\mu_4' = \left[\frac{d^4}{dt^4} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{6k}{(k - t)^4}\right)\right]_{t=0} = \left[\frac{24k}{(k - t)^5}\right]_{t=0} = \frac{24k}{k^5} = \frac{24}{k^4}$.

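7) If the random variable X has the M.G.F.

$M_X(t) = \frac{3}{3 - t}$

its moments can be read off from the series expansion:

$M_X(t) = \frac{3}{3(1 - t/3)} = \left(1 - \frac{t}{3}\right)^{-1}$

$= 1 + \frac{t}{3} + \left(\frac{t}{3}\right)^2 + \left(\frac{t}{3}\right)^3 + \dots$

$= 1 + \frac{1}{3}\cdot\frac{t}{1!} + \frac{2}{9}\cdot\frac{t^2}{2!} + \frac{6}{27}\cdot\frac{t^3}{3!} + \dots$

$\mu_r' =$ coefficient of $\frac{t^r}{r!}$

$\mu_1' =$ coefficient of $\frac{t}{1!} = 1/3$

$\mu_2' =$ coefficient of $\frac{t^2}{2!} = 2/9$

Variance $= \mu_2' - (\mu_1')^2 = \frac29 - \left(\frac13\right)^2 = \frac19$

Standard deviation $= \sqrt{\text{Variance}} = \frac13$.

8) Find the m.g.f. of the distribution defined by $dF = \frac12 e^{-|x|}\,dx$, $-\infty < x < \infty$, and hence find the
variance.

Solution

$dF = \frac12 e^{-|x|}\,dx$, so $\frac{dF}{dx} = \frac12 e^{-|x|}$.

Now $\frac{dF}{dx} = f(x)$, i.e., $f(x) = \frac12 e^{-|x|}$.

$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} e^{tx}\,\frac12 e^{-|x|}\,dx$

$= \frac12\int_{-\infty}^{0} e^{tx} e^{x}\,dx + \frac12\int_{0}^{\infty} e^{tx} e^{-x}\,dx$

$= \frac12\int_{-\infty}^{0} e^{(t + 1)x}\,dx + \frac12\int_{0}^{\infty} e^{-(1 - t)x}\,dx$

$= \frac12\left[\frac{e^{(t + 1)x}}{t + 1}\right]_{-\infty}^{0} + \frac12\left[\frac{e^{-(1 - t)x}}{-(1 - t)}\right]_{0}^{\infty}$

$= \frac12\left[\frac{1}{1 + t} + \frac{1}{1 - t}\right] = \frac{2}{2(1 - t^2)} = \frac{1}{1 - t^2}$, for $|t| < 1$

$= (1 - t^2)^{-1} = 1 + t^2 + (t^2)^2 + (t^2)^3 + \dots = 1 + 2\cdot\frac{t^2}{2!} + 24\cdot\frac{t^4}{4!} + \dots$

$\mu_1' =$ coefficient of $t/1! = 0$

$\mu_2' =$ coefficient of $t^2/2! = 2$

Variance $= \mu_2' - (\mu_1')^2 = 2 - 0 = 2$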
UNIT-II

PROBABILITY DISTRIBUTIONS

Introduction

While constructing probabilistic models for observable phenomena, certain probability
distributions arise more frequently than others. We treat such distributions, which play important roles
in many engineering applications, as special probability distributions.

DISCRETE DISTRIBUTIONS

Bernoulli Trials and Bernoulli Distributions

Let A be an event associated with a trial of a random experiment such that P(A) remains the
same for all repetitions of that random experiment. Then the repetitions are called Bernoulli trials.

A random variable X which takes only two values, either 1 (success) or 0 (failure), with
probability p and q respectively, i.e., P(X=1)=p, P(X=0)=q, p+q=1, is called a Bernoulli variate and is
said to have a Bernoulli distribution.

Moments of Bernoulli Distribution

$\mu_r' = E(X^r) = 1^r \cdot p + 0^r \cdot q = p$

$\mu_1' = E(X) = p$, $\mu_2' = E(X^2) = p$

Mean=p

$Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p) = pq$

Definition.

A random variable X is said to follow the binomial distribution, denoted by B(n,p), if it assumes only
non-negative values and its probability mass function is given by

$$p(x) = P(X = x) = \begin{cases} {}^{n}C_x\, p^x q^{n - x}, & x = 0,1,2,\dots,n \\ 0, & \text{otherwise} \end{cases}$$

where n and p are parameters and q = 1 - p.

Binomial Frequency Distribution

Suppose that n trials constitute an experiment. If this experiment is repeated N times, the
frequency function of the binomial distribution is given by

$N p(x) = N\,{}^{n}C_x\, p^x q^{n - x}$, x = 0,1,2,...,n

Properties of Binomial Frequency Distribution

1. Each trial results in one of two mutually disjoint outcomes, termed success and failure.

2. The trials must be independent of each other.

3. All trials have the same constant probability of success.

4. The number of trials n is finite.

Mean of Binomial Distributions

Mean $= E(X) = \sum_x x\,p(x)$

$= \sum_{x=0}^{n} x\,{}^{n}C_x\, p^x q^{n - x}$

$= \sum_{x=0}^{n} x\,\frac{n!}{x!\,(n - x)!}\, p^x q^{n - x}$

$= \sum_{x=1}^{n} \frac{n(n - 1)!}{(x - 1)!\,(n - x)!}\, p\,p^{x - 1} q^{n - x}$

$= np\sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!\,(n - x)!}\, p^{x - 1} q^{n - x}$

$= np\sum_{x=1}^{n} {}^{n-1}C_{x-1}\, p^{x - 1} q^{(n - 1) - (x - 1)}$

$= np\,(q + p)^{n - 1}$

Mean=np

Variance of Binomial Distribution

$Var(X) = E(X^2) - [E(X)]^2$

The probability mass function of the binomial distribution is

$P(X = x) = p(x) = {}^{n}C_x\, p^x q^{n - x}$, x = 0,1,2,...,n

$E(X^2) = \sum_{x=0}^{n} x^2 p(x) = \sum_{x=0}^{n} x^2\,{}^{n}C_x\, p^x q^{n - x}$

$= \sum_{x=0}^{n} \left[x(x - 1) + x\right]\frac{n!}{x!\,(n - x)!}\, p^x q^{n - x}$

$= \sum_{x=0}^{n} x(x - 1)\frac{n!}{x!\,(n - x)!}\, p^x q^{n - x} + \sum_{x=0}^{n} x\,\frac{n!}{x!\,(n - x)!}\, p^x q^{n - x}$

$= \sum_{x=2}^{n} \frac{n(n - 1)(n - 2)!}{(x - 2)!\,(n - x)!}\, p^2 p^{x - 2} q^{n - x} + E(X)$

$= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!\,(n - x)!}\, p^{x - 2} q^{n - x} + np$

$= n(n - 1)p^2 (q + p)^{n - 2} + np$

$E(X^2) = n(n - 1)p^2 + np$

But $Var(X) = E(X^2) - [E(X)]^2$

$= n(n - 1)p^2 + np - n^2p^2$

$= p^2(n^2 - n - n^2) + np$

$= np(1 - p)$

$= npq$

Note

Similarly we can prove that

i. $\mu_3 = npq(q - p)$

ii. $\mu_4 = npq(1 - 6pq + 3npq)$

iii. The mean of the binomial distribution is always greater than its variance, since $npq < np$ when $q < 1$.

Moment Generating Function (M.G.F)

The probability mass function of a binomial distribution is

$P(X = x) = {}^{n}C_x\, p^x q^{n - x}$, x = 0,1,2,...,n

where n is the number of independent trials and x is the number of successes.

By definition of the moment generating function

$M_X(t) = E(e^{tX})$

$= \sum_{x=0}^{n} e^{tx}\,{}^{n}C_x\, p^x q^{n - x}$

$= \sum_{x=0}^{n} {}^{n}C_x\,(pe^t)^x q^{n - x}$

$= q^n + {}^{n}C_1 (pe^t)^1 q^{n - 1} + {}^{n}C_2 (pe^t)^2 q^{n - 2} + \dots + (pe^t)^n$

$= (q + pe^t)^n$

Finding mean using moment generating function:

$E(X) = \mu_1' = \left[\frac{d}{dt} M_X(t)\right]_{t=0}$

$= \left[\frac{d}{dt}(q + pe^t)^n\right]_{t=0}$

$= \left[n(q + pe^t)^{n - 1} pe^t\right]_{t=0}$

$= np(p + q)^{n - 1} = np$

Finding variance using moment generating function

$E(X^2) = \mu_2' = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0}$

$= \left[\frac{d^2}{dt^2}(q + pe^t)^n\right]_{t=0}$

$= \left[npe^t(q + pe^t)^{n - 1} + n(n - 1)(q + pe^t)^{n - 2}(pe^t)^2\right]_{t=0}$

$= np(q + p)^{n - 1} + n(n - 1)(q + p)^{n - 2}p^2$

$= np + n(n - 1)p^2$

$= np + n^2p^2 - np^2$

$= np(1 - p) + n^2p^2$

$E(X^2) = npq + n^2p^2$

$Var(X) = E(X^2) - [E(X)]^2$

$= npq + n^2p^2 - n^2p^2$

$= npq$

Moment Generating Function of Binomial Distribution about mean

By definition,

$M_{X - np}(t) = E\left[e^{t(X - np)}\right]$

$= E\left[e^{tX - tnp}\right]$

$= E\left[e^{tX} e^{-tnp}\right]$

$= e^{-npt} E\left[e^{tX}\right]$

$M_{X - np}(t) = e^{-npt}(q + pe^t)^n$

Additive or Reproductive property of binomial distribution

If X1 and X2 are two independent binomial variates with parameters (n1,p) and (n2,p)
respectively then X1+X2 is a binomial variate with parameter (n1+n2,p)

Proof

Since X1 and X2 are independent, the MGF of the random variable X1+X2 is

$M_{X_1 + X_2}(t) = M_{X_1}(t)\,M_{X_2}(t)$

$= (q + pe^t)^{n_1}(q + pe^t)^{n_2}$

$= (q + pe^t)^{n_1 + n_2}$

This is the MGF of a binomial variate, which shows that X1+X2 is also a binomial variate with parameters n1+n2 and p.

Note

If X1 and X2 are two independent binomial variates with parameters (n1,p1) and (n2,p2) then
X1+X2 is not a binomial variate.

Recurrence formula for central moment

$\mu_k = E\left[X - E(X)\right]^k = \sum_{x=0}^{n} (x - np)^k\,{}^{n}C_x\, p^x q^{n - x}$ -------------------------------(i)

Differentiating with respect to p (with q = 1 - p) we get

$\frac{d\mu_k}{dp} = \sum_{x=0}^{n} {}^{n}C_x\left\{-nk(x - np)^{k - 1} p^x q^{n - x} + (x - np)^k\left[x p^{x - 1} q^{n - x} + p^x (n - x) q^{n - x - 1}(-1)\right]\right\}$

$= -nk\,\mu_{k - 1} + \sum_{x=0}^{n} {}^{n}C_x\,(x - np)^k p^{x - 1} q^{n - x - 1}\left[xq - (n - x)p\right]$

Since p + q = 1, $xq - (n - x)p = x - np$, so

$= -nk\,\mu_{k - 1} + \frac{1}{pq}\sum_{x=0}^{n} {}^{n}C_x\, p^x q^{n - x}(x - np)^k (x - np)$

$= -nk\,\mu_{k - 1} + \frac{1}{pq}\sum_{x=0}^{n} {}^{n}C_x\, p^x q^{n - x}(x - np)^{k + 1}$

$\frac{d\mu_k}{dp} = -nk\,\mu_{k - 1} + \frac{1}{pq}\mu_{k + 1}$

Therefore, $\mu_{k + 1} = pq\left[\frac{d\mu_k}{dp} + nk\,\mu_{k - 1}\right]$ -------------------------------(ii)

Using the recurrence relation (ii) we can compute moments of higher order, provided the
moments of lower order are known.

Putting k=1 in (ii), and using $\mu_0 = 1$, $\mu_1 = 0$,

$\mu_2 = pq\left[\frac{d\mu_1}{dp} + n\mu_0\right] = npq$

Therefore, $\mu_2 = npq$

Putting k=2 in (ii)

$\mu_3 = pq\left[\frac{d\mu_2}{dp} + 2n\mu_1\right]$

$= pq\left[\frac{d}{dp}(npq) + 2n(0)\right]$

$= pq\left[\frac{d}{dp}\left(np(1 - p)\right)\right]$ since q = 1 - p

$= pq\left[\frac{d}{dp}\left(np - np^2\right)\right]$

$= pq\left[n - 2np\right]$

$= npq - 2np^2q$

$\mu_3 = npq(1 - 2p)$

Putting k=3 in (ii)

$\mu_4 = pq\left[\frac{d\mu_3}{dp} + 3n\mu_2\right]$

$= pq\left[\frac{d}{dp}\left(npq(1 - 2p)\right) + 3n(npq)\right]$

$= pq\left[\frac{d}{dp}\left((np - np^2)(1 - 2p)\right) + 3n^2pq\right]$

$= pq\left[\frac{d}{dp}\left(np - 2np^2 - np^2 + 2np^3\right) + 3n^2pq\right]$

$= pq\left[n - 4np - 2np + 6np^2 + 3n^2pq\right]$

$= pq\left[n - 6np + 6np^2 + 3n^2pq\right]$

$= npq\left[1 - 6p + 6p^2 + 3npq\right]$

$= npq\left[1 - 6p(1 - p) + 3npq\right]$

$= npq\left[1 - 6pq + 3npq\right]$

$= npq\left[1 + 3pq(n - 2)\right]$

Note

μ2 is the variance

μ3 is a measure of skewness and

μ4 is a measure of kurtosis

We sometimes denote the measures of skewness and kurtosis by β1 and β2:

$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \frac{\mu_4}{\mu_2^2}$
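The closed forms for the binomial central moments can be verified by brute force from the probability mass function. The snippet below is a minimal sketch using exact fractions; the helper names `binomial_pmf` and `central_moment` and the sample parameters n = 10, p = 1/4 are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """List of P(X = x) for x = 0..n, with X ~ Binomial(n, p)."""
    q = 1 - p
    return [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]


def central_moment(pmf, k):
    """k-th moment about the mean for a pmf on the values 0..len(pmf)-1."""
    mean = sum(x * w for x, w in enumerate(pmf))
    return sum((x - mean) ** k * w for x, w in enumerate(pmf))


n, p = 10, Fraction(1, 4)
q = 1 - p
pmf = binomial_pmf(n, p)

mu2, mu3, mu4 = (central_moment(pmf, k) for k in (2, 3, 4))
beta1 = mu3 ** 2 / mu2 ** 3   # measure of skewness
beta2 = mu4 / mu2 ** 2        # measure of kurtosis

print(mu2, mu3, mu4, beta1, beta2)
```

Because the arithmetic is exact, the computed values can be compared directly with npq, npq(q − p) and npq[1 + 3pq(n − 2)].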

Examples

1. The mean and variance of a binomial distribution are 4 and 4/3 respectively. Find P(X≥1) if n=6.

Solution

Mean of binomial distribution = np = 4

Variance of binomial distribution = npq = 4/3

$\frac{npq}{np} = \frac{4/3}{4}$, so $q = \frac13$

Now p = 1-q = 1-1/3 = 2/3

Given n=6

$P(X = x) = {}^{n}C_x\, p^x q^{n - x}$

$P(X \ge 1) = 1 - P(X < 1)$

$= 1 - P(X = 0)$

$= 1 - {}^{6}C_0\, p^0 q^{6 - 0}$

$= 1 - q^6$

$= 1 - \left(\frac13\right)^6$

$= 1 - \frac{1}{729}$

$= \frac{728}{729}$

2. The mean and variance of binomial distributions are 4 and 3 respectively. Find P(X=0), P(X=1) and
P(X≥2).

Solution

Mean of binomial distribution = np = 4

Variance of binomial distribution = npq = 3

$\frac{npq}{np} = \frac34$, so $q = \frac34$

Now p = 1-q = 1-3/4 = 1/4

Since Mean = np = 4,

n(1/4) = 4

n = 16

$P(X = x) = {}^{n}C_x\, p^x q^{n - x}$

$P(X = 0) = {}^{16}C_0\, p^0 q^{16} = \left(\frac34\right)^{16} \approx 0.0100$

$P(X = 1) = {}^{16}C_1\, p^1 q^{15} = 16\left(\frac14\right)\left(\frac34\right)^{15} \approx 0.0535$

$P(X \ge 2) = 1 - P(X < 2)$

$= 1 - [P(X = 0) + P(X = 1)]$

$= 1 - [0.0100 + 0.0535] = 1 - 0.0635$

$= 0.9365$

3. If the mean is 3 and the variance is 4 of a random variable X, check whether X follows a binomial
distribution.

Solution

No. For a binomial distribution the mean must be greater than the variance.

If mean = np = 3 and variance = npq = 4, then

q = npq/np = 4/3 = 1.33

so q > 1. But q is a probability and cannot exceed 1.

Therefore X does not follow a binomial distribution.

3. A binomial variate X satisfies the relation 9P(X=4) = P(X=2) when n=6. Find the parameter p of
the binomial distribution.

Solution

The probability function for a binomial distribution is

$P(X = x) = {}^{n}C_x\, p^x q^{n - x}$

$P(X = 4) = {}^{6}C_4\, p^4 q^{6 - 4} = {}^{6}C_4\, p^4 q^2$

$P(X = 2) = {}^{6}C_2\, p^2 q^4$

Given 9P(X=4) = P(X=2),

$9 \cdot {}^{6}C_4\, p^4 q^2 = {}^{6}C_2\, p^2 q^4$

$135\,p^2 = 15\,q^2$

$9p^2 = q^2$

$9p^2 - q^2 = 0$

$9p^2 - (1 - p)^2 = 0$

$9p^2 - (1 + p^2 - 2p) = 0$

$9p^2 - 1 - p^2 + 2p = 0$

$8p^2 + 2p - 1 = 0$

$p = \frac{-2 \pm \sqrt{4 + 32}}{16} = \frac{-2 \pm 6}{16} = \frac{4}{16}, -\frac{8}{16}$

$p = \frac14, -\frac12$

Since p cannot be negative, p=1/4.

4. Out of 800 families with 4 children each, how many families would be expected to have

(i) 2 boys and 2 girls

(ii) at least 1 boy

(iii) at most 2 girls and

(iv) children of both sexes.

Assume equal probabilities for boys and girls.

Solution

Considering each child as a trial, n=4. Taking the birth of a boy as a success, p = 1/2 and q = 1/2.

Let X denote the number of successes (boys).

(i) P[2 boys and 2 girls] = P(X=2)

$P(X = x) = {}^{n}C_x\, p^x q^{n - x}$

$P(X = 2) = {}^{4}C_2\left(\frac12\right)^2\left(\frac12\right)^{4 - 2} = 6\left(\frac12\right)^4 = \frac38$

Therefore number of families having 2 boys and 2 girls = N[P(X=2)]

= 800(3/8) = 100 * 3

= 300

(ii) P[at least 1 boy] = P[X≥1]

= P[X=1] + P[X=2] + P[X=3] + P[X=4]

= 1 - P[X=0]

$P(X = 0) = {}^{4}C_0\left(\frac12\right)^0\left(\frac12\right)^{4 - 0} = \left(\frac12\right)^4$

$1 - P(X = 0) = 1 - \left(\frac12\right)^4 = \frac{15}{16}$

Therefore number of families having at least 1 boy = N[1 - P(X=0)]

= 800 (15/16) = 750

(iii) P(at most 2 girls) = P(exactly 0 girls, 1 girl or 2 girls)

= P(X=4) + P(X=3) + P(X=2)

= 1 - [P(X=0) + P(X=1)]

$= 1 - \left[{}^{4}C_0\left(\frac12\right)^0\left(\frac12\right)^{4} + {}^{4}C_1\left(\frac12\right)^1\left(\frac12\right)^{3}\right]$

$= 1 - \left[\left(\frac12\right)^4 + 4\left(\frac12\right)^4\right] = 1 - \left(\frac{1}{16} + \frac{4}{16}\right) = 1 - \frac{5}{16} = \frac{11}{16}$

Therefore number of families having at most 2 girls = N[P(X≥2)]

= 800 (11/16) = 550

(iv) P[children of both sexes] = 1 - P[children of the same sex]

= 1 - [P(all are boys) + P(all are girls)]

= 1 - [P(X=4) + P(X=0)]

$= 1 - \left[{}^{4}C_4\left(\frac12\right)^4 + {}^{4}C_0\left(\frac12\right)^0\left(\frac12\right)^4\right] = 1 - \left[\left(\frac12\right)^4 + \left(\frac12\right)^4\right]$

= 1 - 2/16 = 7/8

Therefore number of families having children of both sexes = 800 * 7/8

= 700
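The four expected counts can be reproduced in a few lines. This is a small sketch, assuming X (the number of boys) follows Binomial(4, 1/2) as in the solution; the function name `pmf` and the dictionary `expected` are hypothetical names introduced for the example.

```python
from fractions import Fraction
from math import comb


def pmf(x, n=4, p=Fraction(1, 2)):
    """P(X = x) for the number of boys X among n children."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


families = 800
expected = {
    "2 boys and 2 girls": families * pmf(2),
    "at least 1 boy": families * (1 - pmf(0)),
    "at most 2 girls": families * sum(pmf(x) for x in (2, 3, 4)),
    "children of both sexes": families * (1 - pmf(0) - pmf(4)),
}

for label, count in expected.items():
    print(label, count)
```

Using fractions keeps every count exact, so the results can be compared directly with 300, 750, 550 and 700.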

5. An irregular 6 faced die is such that the probability that it gives 3 even numbers in 5 throws is twice
the probability that it gives 2 even numbers in 5 throws. How many sets of exactly 5 trials can be
expected to give no even number out of 2500 sets.

Solution

Let the probability of getting an even number with the unfair die be p.

Let X denote the number of even numbers obtained in 5 trials (throws).

Given: P(X=3) = 2 * P(X=2)

${}^{5}C_3\, p^3 q^2 = 2 \cdot {}^{5}C_2\, p^2 q^3$

p = 2q

p = 2(1-p)

3p = 2

p = 2/3

q = 1-p = 1/3

Now P[getting no even number] = P[X=0]

$= {}^{5}C_0\, p^0 q^5 = \left(\frac13\right)^5 = \frac{1}{243}$

Therefore number of sets having no success (even number) out of N sets = N[P(X=0)]

= 2500 * 1/243

= 10 nearly

6. A communication system consists of n components, each of which will independently function with
probability p. The total system will be able to operate effectively if at least one half of its components
function. For what values of p is a 5 component system more likely to operate effectively than a 3
component system?

Solution

Since p is a constant and the n components function independently, the number of components X that
function follows a binomial distribution

$p(x) = {}^{n}C_x\, p^x q^{n - x}$, x = 0,1,2,...,n

P[5 component system functioning effectively] = P[X=3 or 4 or 5]

= P[X=3] + P[X=4] + P[X=5]

$= {}^{5}C_3\, p^3 q^{2} + {}^{5}C_4\, p^4 q + {}^{5}C_5\, p^5$

$= 10p^3q^2 + 5p^4q + p^5$

P[3 component system functioning effectively] = P(X≥2)

= P(X=2) + P(X=3)

$= {}^{3}C_2\, p^2 q + {}^{3}C_3\, p^3$

$= 3p^2q + p^3$

The 5 component system is at least as likely to function as the 3 component system when

$10p^3q^2 + 5p^4q + p^5 \ge 3p^2q + p^3$

$p^2\left(10pq^2 + 5p^2q + p^3 - 3q - p\right) \ge 0$

$p^2\left(10p(1 - p)^2 + 5p^2(1 - p) + p^3 - 3(1 - p) - p\right) \ge 0$

$p^2\left(10p - 20p^2 + 10p^3 + 5p^2 - 5p^3 + p^3 - 3 + 2p\right) \ge 0$

$p^2\left(6p^3 - 15p^2 + 12p - 3\right) \ge 0$

$3p^2\left(2p^3 - 5p^2 + 4p - 1\right) \ge 0$

$3p^2\left[(2p^2 - 3p + 1)(p - 1)\right] \ge 0$

$3p^2\left[(2p - 1)(p - 1)(p - 1)\right] \ge 0$

$3p^2(2p - 1)(p - 1)^2 \ge 0$

Since $3p^2(p - 1)^2 \ge 0$, we need $2p - 1 \ge 0$.

That is, p ≥ 1/2. The two systems are equally likely to operate at p = 1/2 (and at p = 1), so the 5 component system is strictly more likely to operate for 1/2 < p < 1.
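The sign of the difference between the two reliabilities can be checked numerically against its factorised form. The code below is a sketch under the assumptions of the example (independent components, each working with probability p); `prob_at_least`, `advantage` and `advantage_closed` are hypothetical helper names.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))


def advantage(p):
    """P(5-component system works) minus P(3-component system works)."""
    return prob_at_least(3, 5, p) - prob_at_least(2, 3, p)


def advantage_closed(p):
    """Factorised form of the same difference: 3 p^2 (p - 1)^2 (2p - 1)."""
    return 3 * p ** 2 * (p - 1) ** 2 * (2 * p - 1)


for p in (0.3, 0.5, 0.7):
    print(p, round(advantage(p), 6), round(advantage_closed(p), 6))
```

The difference is negative below p = 1/2, zero at p = 1/2 and positive above it, in line with the algebra.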

7. At least one half of an airplane's engines are required to function in order for it to operate. If each
engine independently functions with probability p, for what values of p is a 4 engine plane to be
preferred for operations to a 2 engine plane?

Solution

Let X be the number of engines that function.

For a 2 engine plane,

P[airplane operates] = P(X≥1)

= P(X=1) + P(X=2)

$= {}^{2}C_1\, p^1q^1 + p^2$

$= 2p(1 - p) + p^2 = 2p - 2p^2 + p^2$

$= 2p - p^2 = p(2 - p)$

For a 4 engine plane,

P[airplane operates] = P(X≥2)

= 1 - P(X<2)

= 1 - [P(X=0) + P(X=1)]

$= 1 - \left[q^4 + {}^{4}C_1\, p^1q^3\right]$

$= 1 - q^2\left[q^2 + 4pq\right]$

$= 1 - (1 - p)^2\left[(1 - p)^2 + 4p(1 - p)\right]$

$= 1 - (1 - p)^2\left[1 + p^2 - 2p + 4p - 4p^2\right]$

$= 1 - (1 - p)^2\left[1 + 2p - 3p^2\right]$

$= 1 - (1 + p^2 - 2p)\left[1 + 2p - 3p^2\right]$

$= 1 - \left(1 + p^2 - 2p + 2p + 2p^3 - 4p^2 - 3p^2 - 3p^4 + 6p^3\right)$

$= 3p^4 - 8p^3 + 6p^2$

$= p^2(3p^2 - 8p + 6)$

A 4 engine plane is preferred to a 2 engine plane when

$p^2(3p^2 - 8p + 6) \ge p(2 - p)$

$3p^4 - 8p^3 + 6p^2 \ge 2p - p^2$

$3p^4 - 8p^3 + 7p^2 - 2p \ge 0$

$p(3p^3 - 8p^2 + 7p - 2) \ge 0$

$3p^3 - 8p^2 + 7p - 2 \ge 0$

$(p - 1)(3p^2 - 5p + 2) \ge 0$

Now, 3p2 -5p +2= 3p2-3p-2p+2

= 3p(p-1)-2(p-1)

= (3p-2) (p-1)

$(p - 1)(3p - 2)(p - 1) \ge 0$

$(p - 1)^2(3p - 2) \ge 0$

Since $(p - 1)^2 \ge 0$ always, this requires $3p - 2 \ge 0$, that is, p ≥ 2/3.

Hence the 4 engine plane is to be preferred when p ≥ 2/3; at p = 2/3 (and p = 1) the two planes are equally reliable.
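The break-even value of p can also be located numerically by bisection on the difference between the two reliabilities. The sketch below assumes independent engines and an even number of engines; `prob_operates`, `gain` and the bracketing interval [0.5, 0.9] are choices made for this illustration rather than part of the original solution.

```python
from math import comb


def prob_operates(engines, p):
    """P(at least half of the engines function) for independent engines."""
    need = (engines + 1) // 2  # 1 of 2 engines, 2 of 4 engines
    return sum(comb(engines, x) * p ** x * (1 - p) ** (engines - x)
               for x in range(need, engines + 1))


def gain(p):
    """Reliability of the 4 engine plane minus that of the 2 engine plane."""
    return prob_operates(4, p) - prob_operates(2, p)


lo, hi = 0.5, 0.9  # gain(lo) < 0 < gain(hi), so a sign change lies between
for _ in range(60):
    mid = (lo + hi) / 2
    if gain(mid) < 0:
        lo = mid
    else:
        hi = mid

break_even = (lo + hi) / 2
print(break_even)
```

The bisection converges to p = 2/3, below which the 2 engine plane is more reliable and above which the 4 engine plane is.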

8. A factory has 10 machines which may need adjustment from time to time during the day. Three of
these machines are old, each having a probability of 1/11 of needing adjustment during the day and 7
are new, having the corresponding probability of 1/21. Assuming that no machines needs adjustments
twice on the same day, find the probabilities that on a particular day,

(i) just 2 old and no new machine need adjustment and

(ii) just 2 machines that need adjustment are of the same type

Solution

Let X1 be the random variable which denotes the number of old machines that need adjustment and X2 be the
random variable which denotes the number of new machines that need adjustment.

Let p1 = Probability that an old machine needs adjustment

p1 = 1/11, q1 = 1-p1 = 10/11

p2 = Probability that a new machine needs adjustment

p2 = 1/21, q2 = 20/21

There are 3 old machines, i.e., n=3

$P(X_1 = x) = {}^{n}C_x\, p_1^x q_1^{n - x} = {}^{3}C_x\left(\frac{1}{11}\right)^x\left(\frac{10}{11}\right)^{3 - x}$, x = 0,1,2,3

There are 7 new machines, i.e., n=7

$P(X_2 = x) = {}^{n}C_x\, p_2^x q_2^{n - x} = {}^{7}C_x\left(\frac{1}{21}\right)^x\left(\frac{20}{21}\right)^{7 - x}$, x = 0,1,2,...,7

The random variables X1 and X2 are independent.

i) The probability that just 2 old machines and no new machines need adjustment is given by

P(X1=2 ∩ X2=0) = P(X1=2)P(X2=0)

$= {}^{3}C_2\left(\frac{1}{11}\right)^2\left(\frac{10}{11}\right) \cdot {}^{7}C_0\left(\frac{1}{21}\right)^0\left(\frac{20}{21}\right)^7$

=0.016

ii) The event that just 2 machines need adjustment and they are of the same type can happen in the following two
mutually exclusive ways:

a) 2 old and no new machine (or)

b) 2 new and no old machine

Therefore the required probability is

=P(X1=2 ∩ X2=0) + P(X1=0 ∩ X2=2)

$= 0.016 + \left(\frac{10}{11}\right)^3 \cdot {}^{7}C_2\left(\frac{1}{21}\right)^2\left(\frac{20}{21}\right)^5$

=0.016+0.028

=0.044
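The two probabilities can be recomputed directly from the binomial probability mass functions of the old and new machines. The following is a minimal sketch; the names `pmf`, `two_old_only`, `two_new_only` and `same_type` are introduced here purely for illustration.

```python
from math import comb


def pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


p_old, p_new = 1 / 11, 1 / 21  # per-machine adjustment probabilities
n_old, n_new = 3, 7

# Old and new machines are independent, so joint probabilities multiply.
two_old_only = pmf(2, n_old, p_old) * pmf(0, n_new, p_new)
two_new_only = pmf(0, n_old, p_old) * pmf(2, n_new, p_new)
same_type = two_old_only + two_new_only

print(round(two_old_only, 3), round(two_new_only, 3), round(same_type, 3))
```

Rounded to three decimals, the values agree with 0.016, 0.028 and 0.044.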

9.The probability of a man hitting a target is 1/4.

i) If he fires 7 times, what is the probability of his hitting the target at least twice?

ii) How many times must he fire so that the probability of his hitting the target at least once is greater
than 2/3?

Solution

Let X be a random variable which denotes the number of hits.

Given: p=1/4 and q=3/4

$P(X = x) = p(x) = {}^{n}C_x\, p^x q^{n - x}$, x = 0,1,2,...,n

i) Given n=7

$P(X = x) = {}^{7}C_x\left(\frac14\right)^x\left(\frac34\right)^{7 - x}$, x = 0,1,2,...,7

P(hitting at least twice, i.e., X≥2) = 1 - [P(X=0) + P(X=1)]

$= 1 - \left[{}^{7}C_0\left(\frac14\right)^0\left(\frac34\right)^7 + {}^{7}C_1\left(\frac14\right)^1\left(\frac34\right)^6\right]$

$= 1 - \left(\frac34\right)^6\left[\frac34 + \frac74\right]$

$= 1 - \left(\frac34\right)^6\left[\frac{10}{4}\right]$

$\approx 0.555$

ii) P(hitting at least once) = P(X≥1)

= 1 - P(X<1)

= 1 - P(X=0)

$= 1 - (3/4)^n > 2/3$ (Given)

$1 - 2/3 > (3/4)^n$

$1/3 > (3/4)^n$

By trial, $(3/4)^3 \approx 0.4219 > 1/3$ while $(3/4)^4 \approx 0.3164 < 1/3$, so the condition is first satisfied when n=4. Therefore he must fire 4 times to hit at least once with
probability more than 2/3.
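Both parts of this example reduce to short computations, and the "by trial" step in part (ii) is just a search for the smallest n. The sketch below uses exact fractions; the variable names `part_i` and `shots` are assumptions made for this example.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 4)  # probability of hitting the target on one shot
q = 1 - p

# Part (i): P(X >= 2) in 7 shots = 1 - P(X = 0) - P(X = 1).
part_i = 1 - sum(comb(7, x) * p ** x * q ** (7 - x) for x in (0, 1))

# Part (ii): smallest n with P(at least one hit) = 1 - q^n > 2/3.
shots = 1
while 1 - q ** shots <= Fraction(2, 3):
    shots += 1

print(part_i, float(part_i), shots)
```

Part (i) evaluates to 4547/8192, approximately 0.555, and the loop stops at n = 4.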

10. A set of 6 similar coins are tossed 640 times with the following results:

Number of heads   0    1    2    3    4    5    6

Frequency         7    64   140  210  132  75   12

Calculate the binomial frequencies on the assumption that the coins are symmetrical.

Solution

Let X denote the number of heads and let X follow binomial distribution with parameters (n,p)

Here n=6

To find p, we compute the mean of the given frequency distribution and equate it to np (the mean of the
binomial distribution).

x     0    1    2    3    4    5    6   Total
f     7   64  140  210  132   75   12     640
fx    0   64  280  630  528  375   72    1949

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1949}{640}$$

6p= 1949/640

p= 0.5075 q=0.4925

The distribution of X is given by

$$N\,P(X = x) = 640\binom{6}{x}(0.5075)^{x}(0.4925)^{6-x}$$

The expected frequencies are calculated as follows

x      p(x)                       N p(x)      Expected frequency
0      (0.4925)^6                 9.13308     9
1      6(0.5075)(0.4925)^5        56.467      56
2      15(0.5075)^2(0.4925)^4     145.4683    146
3      20(0.5075)^3(0.4925)^3     199.865     200
4      15(0.5075)^4(0.4925)^2     154.4642    154
5      6(0.5075)^5(0.4925)        63.6675     64
6      (0.5075)^6                 10.947      11
Total                                         640
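
The fitting procedure above (estimate p from the sample mean, then scale the binomial probabilities by N) can be sketched in a few lines. The variable names are assumptions of this illustration.

```python
from math import comb

heads = [0, 1, 2, 3, 4, 5, 6]
freq = [7, 64, 140, 210, 132, 75, 12]

N = sum(freq)                                       # total number of tosses of the set
mean = sum(x * f for x, f in zip(heads, freq)) / N  # sample mean number of heads
n = 6                                               # coins per toss
p = mean / n                                        # estimate of p from mean = n p

# Expected (theoretical) frequency for each number of heads
expected = [N * comb(n, x) * p**x * (1 - p)**(n - x) for x in heads]

for x, e in zip(heads, expected):
    print(x, round(e, 2))
```

Because p is not rounded here, the printed values may differ from the table in the last decimal places.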

11.Two dice are thrown 120 times. Find the average number of times in which the number on the first
die exceeds the number on the second die.

Solution

The number on the first die exceeds that on the second die, in the following combinations:

(2,1);

(3,1) ,(3,2);

(4,1),(4,2),(4,3);

(5,1),(5,2),(5,3),(5,4);

(6,1),(6,2),(6,3),(6,4),(6,5)

where the numbers in the parentheses represent the numbers on the first and second dice respectively.

P(success) = P(number on the first die exceeds the number on the second die)

=15/36 =5/12

This probability remains the same in all the throws, which are independent.

If X is the number of successes, then X follows a binomial distribution with parameters n=120 and
p=5/12.

E(X)=np=120*5/12= 50

12. It is known that diskettes produced by a certain company are defective with probability 0.01,
independently of each other. The company markets diskettes in packages of 10 and offers a money-back
guarantee that at most 1 of the 10 diskettes is defective. What proportion of packages is returned? If
someone buys 3 packages, what is the probability that he will return exactly one of them?

Solution

Given p = 0.01

q = 1-p =0.99

n = 10

Let X be the random variable denoting the number of defective diskettes in a package.

Then

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}$$

The company must replace a package only when it has more than 1 defective diskette.

P[ at most 1 diskette is defective] = P[ X≤1]

= P(X=0) + P(X=1)

$$= \binom{10}{0}(0.01)^{0}(0.99)^{10} + \binom{10}{1}(0.01)^{1}(0.99)^{9}$$

= 0.90438 + 0.09135

= 0.99573

P[ a package will have to be replaced ]= P[ X>1]

= 1 - P[X≤1]

= 1 - 0.99573 = 0.00427

Therefore, about 0.43% of the packages will have to be replaced.

If someone buys 3 packages, the number returned follows a binomial distribution with n=3 and
success probability 0.00427, so

P[ exactly one package is returned ] = 3 * 0.00427 * (0.99573)^2

= 0.0127
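
A short numerical check of this problem, reading the guarantee as "at most 1 defective diskette per package" and the purchase as 3 packages. The helper binom_pmf is an assumption of this sketch.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# A package is kept when it has 0 or 1 defective diskettes out of 10
p_ok = binom_pmf(0, 10, 0.01) + binom_pmf(1, 10, 0.01)
p_return = 1 - p_ok

# Exactly one of 3 independent packages is returned
p_exactly_one_of_three = binom_pmf(1, 3, p_return)

print(round(p_return, 5), round(p_exactly_one_of_three, 4))
```

The second step reuses the binomial formula with the return probability of a package playing the role of p.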

13. Assuming that half of the population is vegetarian and that 100 investigators each take 10
individuals to see whether they are vegetarians, how many would you expect to report that 3 people or
less were vegetarians?

Solution

n=10, p=1/2, q=1/2

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x} = \binom{10}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{10-x} = \binom{10}{x}\left(\frac{1}{2}\right)^{10}$$

$$P(X \le 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3)$$

$$= \left(\frac{1}{2}\right)^{10}\left[\binom{10}{0} + \binom{10}{1} + \binom{10}{2} + \binom{10}{3}\right]$$

$$= \left(\frac{1}{2}\right)^{10}[1 + 10 + 45 + 120] = \frac{176}{1024} = 0.1719$$

Among 100 investigators, the number of investigators who report that 3 or fewer were vegetarians

=100 * 0.1719

=17 investigators (approximately)

14. A factory produces 10 articles daily. It may be assumed that there is a constant probability p= 0.1
of producing a defective article. Before the articles are stored, they are inspected and the defective
ones are set aside. Suppose that there is a constant probability r = 0.1 that a defective article is
misclassified. If X denotes the number of articles classified as defective at the end of a production day,
find a) P(X=3) and b) P(X>3)

Solution

Let X be the random variable denoting the number of articles classified as defective.

P[ an article is classified as defective ] = P( an article produced is defective) * P( it is correctly
classified as defective)

= 0.1 *0.9

p = 0.09

q = 1 - p = 0.91

n= 10

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x} = \binom{10}{x}(0.09)^{x}(0.91)^{10-x}$$

a) $$P(X = 3) = \binom{10}{3}(0.09)^{3}(0.91)^{7} = 0.0452$$

b) $$P(X > 3) = 1 - P(X \le 3) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3)]$$

$$= 1 - \left[\binom{10}{0}(0.09)^{0}(0.91)^{10} + \binom{10}{1}(0.09)^{1}(0.91)^{9} + \binom{10}{2}(0.09)^{2}(0.91)^{8} + \binom{10}{3}(0.09)^{3}(0.91)^{7}\right]$$

=0.0088

15. An irregular 6-faced die is thrown and the expectation that in 10 throws it will give 5 even
numbers is twice the expectation that it will give 4 even numbers. How many times in 10,000 sets of
10 throws would you expect it to give no even number?

Solution

Let the random variable X denote the number of even numbers.

$$P[\text{getting } x \text{ even numbers}] = P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

Given n=10

$$P(X = x) = \binom{10}{x} p^{x} q^{10-x}, \quad x = 0, 1, 2, \ldots, 10$$

P[ getting 5 even numbers ] = 2 * P[ getting 4 even numbers ]

$$\binom{10}{5} p^{5} q^{5} = 2\binom{10}{4} p^{4} q^{6}$$

$$\frac{6}{5}\, p^{5} q^{5} = 2\, p^{4} q^{6}$$

$$\frac{6}{5}\, p = 2q$$

$$3p = 5q$$

$$3p = 5(1 - p)$$

$$3p = 5 - 5p$$

$$8p = 5$$

$$p = \frac{5}{8}, \qquad q = 1 - p = 1 - \frac{5}{8} = \frac{3}{8}$$

Therefore the expected number of times that, in 10,000 sets of 10 throws each, we get no even number

$$= 10{,}000 \times P[X = 0] = 10{,}000 \times \binom{10}{0}\left(\frac{5}{8}\right)^{0}\left(\frac{3}{8}\right)^{10}$$

= 0.5499 approximately

POISSON DISTRIBUTION

Definition

If X is a discrete random variable that assumes only non-negative values such that its probability
mass function is given by

$$P(X = x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, 3, \ldots \text{ where } \lambda > 0 \\ 0, & \text{otherwise} \end{cases}$$

then X is said to follow a Poisson distribution with parameter λ.

Poisson Distribution is a Limiting case of Binomial Distribution

Suppose that in a binomial distribution,

1. The number of trials n is indefinitely large, i.e., n → ∞.

2. The probability of success p for each trial is very small, i.e., p → 0.

3. np = λ is finite, so that p = λ/n and q = 1 - p = 1 - λ/n, where λ is a positive constant.

Then the binomial distribution tends to the Poisson distribution with parameter λ.

Proof

Let X be a binomially distributed random variable. Then the probability mass function of the binomial
distribution is

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

$$= \frac{n!}{(n-x)!\,x!}\, p^{x}(1-p)^{n-x}$$

$$= \frac{n(n-1)(n-2)\cdots(n-(x-1))}{x!}\, p^{x}(1-p)^{n-x}$$

We know that the mean of the binomial distribution is np. Let np = λ, so that p = λ/n and
q = 1 - p = 1 - λ/n. Then

$$P(X = x) = \frac{n(n-1)(n-2)\cdots(n-(x-1))}{x!}\left(\frac{\lambda}{n}\right)^{x}\left(1-\frac{\lambda}{n}\right)^{n-x}$$

$$= \frac{n^{x}\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{x-1}{n}\right)}{x!}\cdot\frac{\lambda^{x}}{n^{x}}\left(1-\frac{\lambda}{n}\right)^{n-x}$$

$$= \frac{\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{x-1}{n}\right)}{x!}\,\lambda^{x}\left(1-\frac{\lambda}{n}\right)^{n-x}$$

Taking the limit as n → ∞ in the above equation, each factor (1 - j/n) tends to 1 and we get

$$P(X = x) = \frac{\lambda^{x}}{x!}\lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^{n-x} = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

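Mean of the Poisson distribution

$$\text{Mean} = E(X) = \sum_{x=0}^{\infty} x\,P(X = x)$$

$$= \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$= e^{-\lambda}\sum_{x=1}^{\infty} \frac{x\,\lambda\,\lambda^{x-1}}{x!}$$

$$= \lambda e^{-\lambda}\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}$$

$$\text{Mean} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$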


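Variance of Poisson distribution

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2}$$

$$E(X^{2}) = \sum_{x=0}^{\infty} x^{2} p(x) = \sum_{x=0}^{\infty} x^{2}\,\frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$= \sum_{x=0}^{\infty} \left(x^{2} - x + x\right)\frac{e^{-\lambda}\lambda^{x}}{x!} = \sum_{x=0}^{\infty} \left(x(x-1) + x\right)\frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$= \sum_{x=0}^{\infty} x(x-1)\,\frac{e^{-\lambda}\lambda^{x}}{x!} + \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$= \sum_{x=2}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{(x-2)!} + \sum_{x=1}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{(x-1)!}$$

$$= \lambda^{2} e^{-\lambda}\sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} + \lambda e^{-\lambda}\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}$$

$$= \lambda^{2} e^{-\lambda} e^{\lambda} + \lambda e^{-\lambda} e^{\lambda}$$

$$E(X^{2}) = \lambda^{2} + \lambda$$

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2} = \lambda^{2} + \lambda - \lambda^{2} = \lambda$$

Therefore the variance of the Poisson distribution is λ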

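Moment Generating function of Poisson distribution

Find the moment generating function of the Poisson distribution and hence find the mean and variance

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots$$

The moment generating function of the Poisson distribution is

$$M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} p(x) = \sum_{x=0}^{\infty} e^{tx}\,\frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$= e^{-\lambda}\sum_{x=0}^{\infty} \frac{(\lambda e^{t})^{x}}{x!} = e^{-\lambda}\, e^{\lambda e^{t}}$$

$$= e^{\lambda(e^{t} - 1)}$$

To find mean and variance using MGF

$$\text{Mean} = E(X) = \mu_1' = \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(e^{-\lambda} e^{\lambda e^{t}}\right)\right]_{t=0}$$

$$= \left[e^{-\lambda}\, e^{\lambda e^{t}}\,\lambda e^{t}\right]_{t=0} = e^{-\lambda} e^{\lambda}\lambda = \lambda$$

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2}$$

$$E(X^{2}) = \mu_2' = \left[\frac{d^{2}}{dt^{2}} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\lambda e^{-\lambda}\, e^{\lambda e^{t}} e^{t}\right)\right]_{t=0}$$

$$= \lambda e^{-\lambda}\left[e^{\lambda e^{t}} e^{t} + e^{t}\, e^{\lambda e^{t}}\,\lambda e^{t}\right]_{t=0}$$

$$= \lambda e^{-\lambda}\left(e^{\lambda} + \lambda e^{\lambda}\right) = \lambda e^{-\lambda} e^{\lambda}(1 + \lambda)$$

$$= \lambda(1 + \lambda) = \lambda + \lambda^{2}$$

$$E(X^{2}) = \lambda + \lambda^{2}$$

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2} = \lambda + \lambda^{2} - \lambda^{2} = \lambda$$

For Poisson distribution

Mean=λ

Variance=λ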
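
The equality of mean and variance can be confirmed by summing the series directly. This is a minimal sketch; the value λ = 2.5 and the truncation point of 100 terms are assumptions chosen for illustration (the neglected tail is negligible for this λ).

```python
from math import exp

def poisson_pmf_list(lam, upto=100):
    """Return [P(X=0), ..., P(X=upto)] using p(x) = p(x-1) * lam / x."""
    pmf = [exp(-lam)]
    for x in range(1, upto + 1):
        pmf.append(pmf[-1] * lam / x)
    return pmf

lam = 2.5  # illustrative parameter
pmf = poisson_pmf_list(lam)

mean = sum(x * p for x, p in enumerate(pmf))              # E(X)
second_moment = sum(x * x * p for x, p in enumerate(pmf)) # E(X^2)
variance = second_moment - mean**2

print(mean, variance)
```

The recurrence p(x) = p(x-1)·λ/x avoids computing large factorials explicitly.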

1.If X is a Poisson variate such that P(X=1)=3/10 and P(X=2)=1/5. Find P(X=0) and P(X=3)

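Solution

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$P(X = 1) = \frac{e^{-\lambda}\lambda^{1}}{1!} = \frac{3}{10} \qquad (1)$$

$$P(X = 2) = \frac{e^{-\lambda}\lambda^{2}}{2!} = \frac{1}{5} \qquad (2)$$

Dividing (2) by (1),

$$\frac{\lambda}{2} = \frac{1/5}{3/10} = \frac{2}{3} \;\Rightarrow\; \lambda = \frac{4}{3}$$

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-4/3}\left(\frac{4}{3}\right)^{x}}{x!}$$

$$P(X = 0) = \frac{e^{-4/3}\left(\frac{4}{3}\right)^{0}}{0!} = e^{-4/3} = 0.2636$$

$$P(X = 3) = \frac{e^{-4/3}\left(\frac{4}{3}\right)^{3}}{3!} = 0.1041$$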

2. In a certain factory producing razor blades, there is a small chance 1/500 for any blade to be
defective. The blades are supplied in packets of 10.Use Poisson distribution to calculate the
approximate number of packets containing

(i) no defective blade

(ii) at least 1 defective blade and

(iii) at most 1 defective blade in a consignment of 10,000 packets.

Solution

Given p=1/500 and n=10

Let X be the number of defectives in a packet

λ=np=10/500=1/50=0.02

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-0.02}(0.02)^{x}}{x!}$$

i) No defective blade : P(X=0)

$$= \frac{e^{-0.02}(0.02)^{0}}{0!} = 0.9802$$

Therefore the number of packets containing no defective blade = 10000 * 0.9802

= 9802

ii) At least 1 defective= P(X≥1)

= 1- P(X<1)

= 1-P(X=0)

= 1 - 0.9802 = 0.0198

Therefore the number of packets containing at least one defective blade = 10000 * 0.0198

= 198

iii) At most 1 defective = P(X≤1)

=P(X=0) + P(X=1)

$$= \frac{e^{-0.02}}{0!} + \frac{e^{-0.02}(0.02)}{1!}$$

= 0.9802 + 0.0196

= 0.9998

Therefore the number of packets containing at most 1 defective blade = 10000 * 0.9998

= 9998
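
The three packet counts can be recomputed directly from the Poisson formula with λ = 0.02. A minimal sketch; the helper name poisson_pmf is an assumption of this example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson probability P(X = x)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 10 * (1 / 500)   # lambda = n p for a packet of 10 blades
packets = 10_000

none_defective = round(packets * poisson_pmf(0, lam))
at_least_one = round(packets * (1 - poisson_pmf(0, lam)))
at_most_one = round(packets * (poisson_pmf(0, lam) + poisson_pmf(1, lam)))

print(none_defective, at_least_one, at_most_one)
```

For "at most 1 defective" the two terms being added are P(X=0) and P(X=1), so the total is necessarily at least as large as the count for "no defective".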

3. An insurance company has discovered that only about 0.1% of the population is involved in a
certain type of accident each year. If its 10000 policy holders were randomly selected from the
population, what is the probability that not more than 5 of its clients are involved in such an accident
next year?

Solution

Given p= 0.1% = 0.1/100 = 0.001

n= 10000

Mean λ = np = 10000 * 0.001 = 10

Let X be the random variable denoting the number of clients involved in the accident

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-10}(10)^{x}}{x!}$$

$$P(X \le 5) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)$$

$$= \frac{e^{-10}(10)^{0}}{0!} + \frac{e^{-10}(10)^{1}}{1!} + \frac{e^{-10}(10)^{2}}{2!} + \frac{e^{-10}(10)^{3}}{3!} + \frac{e^{-10}(10)^{4}}{4!} + \frac{e^{-10}(10)^{5}}{5!}$$

$$= e^{-10}\left[1 + \frac{10}{1} + \frac{100}{2} + \frac{1000}{6} + \frac{10000}{24} + \frac{100000}{120}\right]$$

= 0.0671

4. In a given city, 4% of all licensed drivers will be involved in at least 1 road accident in any given
year. Determine the probability that among 150 licensed drivers randomly chosen in this city

i) only 5 will be involved in at least 1 accident in any given year and

ii) at most 3 will be involved in at least 1 accident in any given year.

Solution

$$\lambda = np = 150 \times \frac{4}{100} = 6$$

i) $$P(X = 5) = \frac{e^{-6}\,6^{5}}{5!} = 0.1606$$

ii) $$P(X \le 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3)$$

$$= e^{-6} + \frac{e^{-6}\,6}{1!} + \frac{e^{-6}\,6^{2}}{2!} + \frac{e^{-6}\,6^{3}}{3!} = 0.1512$$

5.In an industrial complex, the average number of fatal accidents per month is one-half. The number
of accidents per month is adequately described by a Poisson distribution. What is the probability that
6 months will pass without a fatal accident.

Solution

The average number of fatal accidents per month is λ = 1/2.

During 6 months, the average number of fatal accidents would be 1/2+1/2+1/2+1/2+1/2+1/2=3, by the
additive property of the Poisson distribution. Hence λ=3.

The probability that 6 months will pass without a fatal accident = P(X=0)

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

$$P(X = 0) = \frac{e^{-3}\,3^{0}}{0!} = 0.0498$$

6. A radioactive source emits on the average 2.5 particles per second. Find the probability that 3 or
more particles will be emitted in an interval of 4 seconds.

Solution

λ = 2.5/sec

In an interval of 4 seconds the average number of particles emitted = 2.5+2.5+2.5+2.5 = 10

$$P(\text{3 or more particles emitted}) = 1 - P[X < 3]$$

$$= 1 - \left\{P[X=0] + P[X=1] + P[X=2]\right\}$$

$$= 1 - \left[\frac{e^{-10}(10)^{0}}{0!} + \frac{e^{-10}(10)^{1}}{1!} + \frac{e^{-10}(10)^{2}}{2!}\right]$$

$$= 1 - e^{-10}\left(1 + 10 + \frac{100}{2}\right)$$

= 0.9972

7. Messages arrive at a switch board in a Poisson manner at an average rate of six per hour. Find the
probability for each of the following events

(i) Exactly two messages arrive within one hour

(ii) No message arrives within one hour

(iii) at least three messages arrive within one hour

Solution

Mean λ = 6 per hour

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-6}\,6^{x}}{x!}$$

(i) $$P(X = 2) = \frac{e^{-6}\,6^{2}}{2!} = 0.0446$$

(ii) $$P(X = 0) = \frac{e^{-6}\,6^{0}}{0!} = 0.0025$$

(iii) $$P(X \ge 3) = 1 - P(X < 3) = 1 - [P(X=0) + P(X=1) + P(X=2)]$$

$$= 1 - e^{-6}(1 + 6 + 18) = 0.9380$$

8. A car hire firm has 2 cars which it hires out day by day. The number of demands for a car on each
day follows a Poisson distribution with mean 1.5. Calculate the proportion of days on which

i) neither car is used

ii) some demand is not fulfilled

Solution

Let X be the random variable representing the number of demands for cars:

$$P(x \text{ demands in a day}) = P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

Given: λ = 1.5

$$\text{Now } P(X = x) = \frac{e^{-1.5}(1.5)^{x}}{x!}$$

i) The proportion of days on which neither car is used

$$P(X = 0) = \frac{e^{-1.5}(1.5)^{0}}{0!} = e^{-1.5} = 0.2231$$

ii) The proportion of days on which some demand is refused

A demand is refused when X is more than 2.

$$P(X > 2) = 1 - P(X \le 2)$$

$$= 1 - [P(X=0) + P(X=1) + P(X=2)]$$

$$= 1 - \left[\frac{e^{-1.5}(1.5)^{0}}{0!} + \frac{e^{-1.5}(1.5)^{1}}{1!} + \frac{e^{-1.5}(1.5)^{2}}{2!}\right]$$

= 0.1912

9. The proofs of a 500-page book contain 500 misprints. Find the probability that there are at least 4
misprints on a randomly chosen page.

Solution

Total number of mistakes= 500

Total number of pages= 500

The average number of mistakes per page is 1, so λ =1

Let X be the random variable denoting the number of mistakes on a page.

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1}\,1^{x}}{x!}$$

$$P(\text{at least 4 mistakes}) = P(X \ge 4)$$

$$= 1 - P(X < 4)$$

$$= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3)]$$

$$= 1 - \left[\frac{e^{-1}}{0!} + \frac{e^{-1}}{1!} + \frac{e^{-1}}{2!} + \frac{e^{-1}}{3!}\right]$$

$$= 1 - e^{-1}\left[1 + 1 + \frac{1}{2} + \frac{1}{6}\right]$$

= 0.0190

10. It has been established that the number of defective stereos produced daily at a certain plant is
Poisson distributed with mean 4. Over a 2 day span, what is the probability that the number of
defective stereos does not exceed 3?

Solution

Let X1 be the number of defective stereos produced on the first day and X2 be the number of defective
stereos produced on the second day.

Then X = X1+X2 is the number of defective stereos produced over the 2 days.

By the additive property (taking the two daily counts as independent), X1+X2 follows a Poisson
distribution with parameter 4+4=8. Thus

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-8}\,8^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

$$P[\text{number of defectives does not exceed 3}] = P(X_1 + X_2 \le 3)$$

$$= P(X_1 + X_2 = 0) + P(X_1 + X_2 = 1) + P(X_1 + X_2 = 2) + P(X_1 + X_2 = 3)$$

$$= \frac{e^{-8}\,8^{0}}{0!} + \frac{e^{-8}\,8^{1}}{1!} + \frac{e^{-8}\,8^{2}}{2!} + \frac{e^{-8}\,8^{3}}{3!}$$

= 0.0424

11. If the number of telephone calls coming into a telephone exchange between 9 A.M and 10 A.M
and between 10 A.M and 11 A.M are independent and follows Poisson distribution with parameters 2
and 6 respectively. What is the probability that more than 5 calls come between 9 A.M and 11 A.M

Solution

Let X be a random variable which denotes the number of telephone calls between 9 am and 10 am
with parameter 2 and Y be a random variable which denotes the number of telephone calls between 10
am and 11 am with parameter 6

By the additive property, Z = X+Y is a Poisson random variable with mean 2+6 = 8.

Hence

$$P(Z > 5) = 1 - P(Z \le 5)$$

$$= 1 - [P(Z=0) + P(Z=1) + P(Z=2) + P(Z=3) + P(Z=4) + P(Z=5)]$$

$$= 1 - \sum_{z=0}^{5} \frac{e^{-\lambda}\lambda^{z}}{z!}$$

$$= 1 - e^{-8}\left[1 + \frac{8}{1!} + \frac{8^{2}}{2!} + \frac{8^{3}}{3!} + \frac{8^{4}}{4!} + \frac{8^{5}}{5!}\right]$$

$$= 1 - e^{-8}[1 + 8 + 32 + 85.3333 + 170.6667 + 273.0667]$$

= 0.8088

12. Fit a poisson distribution to the following data and calculate the theoretical frequencies

Deaths 0 1 2 3 4

Frequency 122 60 15 2 1

Solution

x       f     fx    Theoretical frequency
0      122     0    121
1       60    60     61
2       15    30     15
3        2     6      3
4        1     4      0
Total  200   100    200

$$\text{Mean } \bar{x} = \frac{\sum fx}{N} = \frac{100}{200} = 0.5$$

The theoretical distribution is given by

$$N \times P[X = x] = 200 \times \frac{e^{-\lambda}\lambda^{x}}{x!}$$

Hence the theoretical frequencies are given by

$$f(x) = 200 \times \frac{e^{-0.5}(0.5)^{x}}{x!} \qquad (1)$$

Putting x=0,1,2,3,4 in (1) we get

$$f(0) = 200 \times \frac{e^{-0.5}(0.5)^{0}}{0!} = 121$$

$$f(1) = 200 \times \frac{e^{-0.5}(0.5)^{1}}{1!} = 61$$

$$f(2) = 200 \times \frac{e^{-0.5}(0.5)^{2}}{2!} = 15$$

$$f(3) = 200 \times \frac{e^{-0.5}(0.5)^{3}}{3!} = 3$$

$$f(4) = 200 \times \frac{e^{-0.5}(0.5)^{4}}{4!} = 0$$
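
The same fit can be reproduced programmatically: estimate λ by the sample mean, then scale the Poisson probabilities by N. The variable names below are assumptions of this sketch.

```python
from math import exp, factorial

deaths = [0, 1, 2, 3, 4]
freq = [122, 60, 15, 2, 1]

N = sum(freq)
lam = sum(x * f for x, f in zip(deaths, freq)) / N  # sample mean estimates lambda

# Theoretical frequency N * e^{-lam} lam^x / x! for each x
theoretical = [N * exp(-lam) * lam**x / factorial(x) for x in deaths]
rounded = [round(v) for v in theoretical]

print(lam, rounded)
```

Rounding each frequency to the nearest unit gives the last column of the table.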

EXERCISES:

1. The number of typing mistakes that a typist makes on a given page has a Poisson distribution with a
mean of 3 mistakes. What is the probability that she makes

i) Exactly 7 mistakes

ii) Fewer than 4 mistakes

iii) No mistakes on a given page

[Answer: 0.0216; 0.6472; 0.0498]

2. The number of black flies on a broad bean leaf follows a Poisson distribution with mean 2. A plant
inspector, however, records the number of flies on a leaf only if at least 1 fly is present. What is the
probability that he records 1 or 2 flies on a randomly chosen leaf? What is the expected number of
flies recorded per leaf?

3. Letters were received in an office on each of 100 days. Assuming the following data to form a
random sample from a Poisson distribution, find the expected frequencies, correct to the nearest unit.

No. of letters    0   1   2   3   4   5   6   7   8   9  10
Frequency         1   4  15  22  21  20   8   6   2   0   1

NEGATIVE BINOMIAL DISTRIBUTION

The negative binomial distribution is a reversal of the binomial distribution.

The negative binomial random variable represents the number of failures before the rth success. Here
the number of successes is fixed.

The negative binomial variable arises in situations where

i) The experiment consists of a series of independent and identical Bernoulli trials, each with
probability p of success,

ii) The trials are observed until exactly r successes are obtained, where r is fixed in advance.

iii) The random variable X is the number of failures before the rth success.

Definition

A random variable X is said to follow the negative binomial distribution if its probability mass
function is given by

$$P(X = x) = \binom{x+r-1}{r-1} p^{r} q^{x}, \quad x = 0, 1, 2, \ldots$$

$$= \binom{-r}{x} p^{r} (-q)^{x}, \quad x = 0, 1, 2, \ldots$$

Note

In the negative binomial distribution the parameters are (r, p).

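Moments of Negative Binomial distribution

$$\text{W.k.t., } \mu_r' = \sum_{x=0}^{\infty} x^{r} p(x)$$

$$\mu_1' = E(X) = \sum_{x=0}^{\infty} x\,p(x) = \sum_{x=0}^{\infty} x\binom{x+r-1}{r-1} p^{r} q^{x}$$

$$= 0 + r p^{r} q + \frac{2(r+1)r}{2!}\, p^{r} q^{2} + \cdots$$

$$= r p^{r} q\left[1 + \frac{r+1}{1!}\, q + \frac{(r+1)(r+2)}{2!}\, q^{2} + \cdots\right]$$

$$= r p^{r} q\,(1-q)^{-(r+1)} = \frac{r p^{r} q}{p^{r+1}} = \frac{rq}{p}$$

$$\mu_1' = \frac{rq}{p}, \quad \text{i.e., Mean} = \frac{rq}{p}$$

$$\mu_2' = E(X^{2}) = \sum_{x=0}^{\infty} x^{2} p(x) = \sum_{x=0}^{\infty} x^{2}\binom{x+r-1}{r-1} p^{r} q^{x}$$

$$= \sum_{x=0}^{\infty} \left[x + x(x-1)\right]\binom{x+r-1}{r-1} p^{r} q^{x}$$

$$= \sum_{x=0}^{\infty} x\binom{x+r-1}{r-1} p^{r} q^{x} + \sum_{x=0}^{\infty} x(x-1)\binom{x+r-1}{r-1} p^{r} q^{x}$$

$$= \frac{rq}{p} + \sum_{x=0}^{\infty} x(x-1)\binom{x+r-1}{r-1} p^{r} q^{x}$$

$$= \frac{rq}{p} + \left[0 + 0 + 2(1)\frac{(r+1)r}{2!}\, p^{r} q^{2} + 3(2)\frac{(r+2)(r+1)r}{3!}\, p^{r} q^{3} + 4(3)\frac{(r+3)(r+2)(r+1)r}{4!}\, p^{r} q^{4} + \cdots\right]$$

$$= \frac{rq}{p} + (r+1)r\,p^{r} q^{2}\left[1 + (r+2)q + \frac{(r+2)(r+3)}{2!}\, q^{2} + \cdots\right]$$

$$= \frac{rq}{p} + (r+1)r\,p^{r} q^{2}\,\frac{1}{(1-q)^{r+2}}$$

$$= \frac{rq}{p} + (r+1)r\,p^{r} q^{2}\,\frac{1}{p^{r+2}}$$

$$= \frac{rq}{p} + (r+1)r\,\frac{q^{2}}{p^{2}}$$

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2}$$

$$= \frac{rq}{p} + (r+1)r\,\frac{q^{2}}{p^{2}} - \left(\frac{rq}{p}\right)^{2}$$

$$= \frac{rq}{p} + \frac{r^{2}q^{2}}{p^{2}} + \frac{rq^{2}}{p^{2}} - \frac{r^{2}q^{2}}{p^{2}}$$

$$= \frac{rq}{p} + \frac{rq^{2}}{p^{2}}$$

$$= \frac{rq}{p^{2}}(p + q)$$

$$= \frac{rq}{p^{2}} \quad (\text{since } p + q = 1)$$

$$\text{Therefore, Variance} = \frac{rq}{p^{2}}$$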
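
The formulas Mean = rq/p and Variance = rq/p² can be checked by summing the probability mass function numerically. The parameter values r = 3, p = 0.4 and the truncation at 500 terms are assumptions chosen only for illustration.

```python
from math import comb

def nb_pmf(x, r, p):
    """P(X = x): x failures before the r-th success."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p)**x

r, p = 3, 0.4   # illustrative parameters
q = 1 - p
xs = range(500)  # the tail beyond 500 failures is negligible here

mean = sum(x * nb_pmf(x, r, p) for x in xs)
second_moment = sum(x * x * nb_pmf(x, r, p) for x in xs)
variance = second_moment - mean**2

print(mean, r * q / p)
print(variance, r * q / p**2)
```

Each printed pair shows the numerical sum next to the closed-form value derived above.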

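Moment Generating Function of negative binomial distribution

By definition of the M.G.F.,

$$M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} p(x)$$

$$= \sum_{x=0}^{\infty} e^{tx}\binom{x+r-1}{r-1} p^{r} q^{x}$$

$$= p^{r}\sum_{x=0}^{\infty} \binom{x+r-1}{r-1}\left(q e^{t}\right)^{x}$$

$$= p^{r}\left[1 + r q e^{t} + \frac{(r+1)r}{2!}\left(q e^{t}\right)^{2} + \cdots\right]$$

$$= p^{r}\left(1 - q e^{t}\right)^{-r} \quad \left(\text{since } (1-x)^{-n} = 1 + \binom{n}{1}x + \binom{n+1}{2}x^{2} + \binom{n+2}{3}x^{3} + \cdots\right)$$

$$M_X(t) = \frac{p^{r}}{(1 - q e^{t})^{r}} = \left(\frac{p}{1 - q e^{t}}\right)^{r}$$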

Examples

1. Find the probability that a person tossing 3 coins will get either all heads or all tails for the second
time on the fifth toss.

Solution

The sample space is

S={HHH,HHT,HTH,THH,HTT,THT,TTH,TTT}

P[getting all heads or all tails]=P[getting all heads]+P[getting all tails]

$$= \frac{1}{8} + \frac{1}{8} = \frac{2}{8} = \frac{1}{4}$$

$$p = \frac{1}{4}, \qquad q = 1 - p = \frac{3}{4}$$

Let X be the number of failures before the second success.

$$P(X = x) = \binom{x+r-1}{r-1} p^{r} q^{x}$$

r =2, p=1/4, q=3/4

x+r=5, so x = 5 - 2 = 3

$$P(X = 3) = \binom{3+2-1}{2-1}\left(\frac{1}{4}\right)^{2}\left(\frac{3}{4}\right)^{3}$$

$$= \binom{4}{1}\left(\frac{1}{4}\right)^{2}\left(\frac{3}{4}\right)^{3} = 0.1055$$

2. In a company, 5% of the components produced are defective. What is the probability that at least 5
components are to be examined in order to get 3 defective ones?

Solution

p = 5% = 0.05

q=0.95

Here r = 3 and X is the number of non-defective components examined before the third defective one.

$$\text{Required probability} = P(X + r \ge 5)$$

$$= P(X + 3 \ge 5)$$

$$= P(X \ge 2)$$

$$= 1 - P(X < 2)$$

$$= 1 - [P(X = 0) + P(X = 1)]$$

$$= 1 - \left\{\binom{0+3-1}{3-1}(0.05)^{3}(0.95)^{0} + \binom{1+3-1}{3-1}(0.05)^{3}(0.95)\right\}$$

$$= 1 - \left\{\binom{2}{2}(0.05)^{3} + \binom{3}{2}(0.05)^{3}(0.95)\right\}$$

= 1 - 0.00048 = 0.99952
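
This example can be verified directly from the negative binomial probability mass function. A minimal sketch; the helper name nb_pmf is an assumption of this illustration.

```python
from math import comb

def nb_pmf(x, r, p):
    """P(X = x): x failures (non-defectives) before the r-th success (defective)."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p)**x

r, p = 3, 0.05

# At least 5 components examined  <=>  at least 2 non-defectives before the 3rd defective
p_at_least_5 = 1 - nb_pmf(0, r, p) - nb_pmf(1, r, p)

print(round(p_at_least_5, 5))
```

The complement contains only the two cases in which the third defective appears on the third or fourth component examined.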

3. The probability that a child exposed to a certain contagious disease will catch it is 0.4. What is the
probability that the tenth child exposed to the disease will be the third to catch it?

Solution

p=0.4, q=0.6, r=3, and the total number of children exposed is x+r=10

$$P[x + r = 10] = P[x = 10 - r = 10 - 3 = 7]$$

$$= P[X = 7]$$

We know that

$$P(X = x) = \binom{x+r-1}{r-1} p^{r} q^{x}$$

$$P[X = 7] = \binom{7+3-1}{3-1}(0.4)^{3}(0.6)^{7}$$

$$= \binom{9}{2}(0.4)^{3}(0.6)^{7} = 0.0645$$

4. A fair die is tossed in successive independent trials until the second 6 is observed. Find the
probability of observing exactly 10 non-6's before the second 6 is tossed.

Solution

Success is getting a 6 when the die is tossed.

Probability of getting 6= 1/6

p=1/6; q=1-p=1-1/6=5/6

Let X be the number of failures.

$$p = \frac{1}{6}, \quad q = \frac{5}{6}, \quad x = 10, \quad r = 2$$

$$P[X = 10] = \binom{10+2-1}{2-1}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{10}$$

$$= \binom{11}{1}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{10}$$

= 11(0.0045) = 0.04935

5. If the probability of a male child is 1/2, find the probability that in a family the 6th child is the
third female child.

Solution

Here we have to find the probability that the sixth child is the third female child.

We have to apply the negative binomial distribution.

$$P(X = x) = \binom{x+r-1}{r-1} p^{r} q^{x}$$

p=Probability of female child =1/2

q=1-p=1/2

r=3

x+r=6

x=6-3=3

$$P[X = 3] = \binom{3+3-1}{3-1}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{3}$$

$$= \binom{5}{2}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{3} = \binom{5}{2}\left(\frac{1}{2}\right)^{6} = 0.15625$$

ERLANG DISTRIBUTION OR GENERAL GAMMA DISTRIBUTION

Definition

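A continuous random variable X is said to follow an Erlang distribution or general Gamma
distribution with parameters λ > 0 and k > 0 if its p.d.f. is given by

$$f(x) = \begin{cases} \dfrac{\lambda^{k} x^{k-1} e^{-\lambda x}}{\Gamma(k)}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

Note

1. When λ = 1, the Erlang distribution is called the Gamma distribution or simple Gamma distribution
with parameter k, whose density function is

$$f(x) = \frac{1}{\Gamma(k)}\, x^{k-1} e^{-x}, \quad x \ge 0,\ k > 0$$

2. When k =1, the Erlang distribution reduces to the exponential distribution.

Moments of Gamma Distribution

$$\mu_r' = E(X^{r}) = \int_{0}^{\infty} x^{r} f(x)\,dx = \int_{0}^{\infty} x^{r}\,\frac{\lambda^{k} x^{k-1} e^{-\lambda x}}{\Gamma(k)}\,dx$$

$$= \frac{\lambda^{k}}{\Gamma(k)}\int_{0}^{\infty} x^{r+k-1} e^{-\lambda x}\,dx = \frac{\lambda^{k}}{\Gamma(k)}\cdot\frac{\Gamma(k+r)}{\lambda^{k+r}}$$

$$\mu_r' = \frac{\Gamma(k+r)}{\lambda^{r}\,\Gamma(k)}$$

Putting r=1

$$\mu_1' = \frac{\Gamma(k+1)}{\lambda\,\Gamma(k)} = \frac{k\,\Gamma(k)}{\lambda\,\Gamma(k)} = \frac{k}{\lambda}$$

$$\text{Mean} = E(X) = \frac{k}{\lambda}$$

$$\mu_2' = \frac{\Gamma(k+2)}{\lambda^{2}\,\Gamma(k)} = \frac{(k+1)\,\Gamma(k+1)}{\lambda^{2}\,\Gamma(k)} = \frac{(k+1)k\,\Gamma(k)}{\lambda^{2}\,\Gamma(k)} = \frac{k(k+1)}{\lambda^{2}}$$

$$\mathrm{Var}(X) = \mu_2' - (\mu_1')^{2} = \frac{k(k+1)}{\lambda^{2}} - \left(\frac{k}{\lambda}\right)^{2} = \frac{k^{2} + k - k^{2}}{\lambda^{2}} = \frac{k}{\lambda^{2}}$$

$$\mathrm{Var}(X) = \frac{k}{\lambda^{2}}$$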
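
The moment formula can be compared with a direct numerical integration of x^r f(x). The sketch below uses a simple midpoint rule; the parameter values k = 3, λ = 0.5, the integration limit and the step count are assumptions chosen for illustration.

```python
from math import exp, gamma

def erlang_pdf(x, k, lam):
    """Erlang (general Gamma) density with shape k and rate lam."""
    return lam**k * x**(k - 1) * exp(-lam * x) / gamma(k)

def raw_moment_formula(r, k, lam):
    """r-th raw moment Gamma(k + r) / (lam^r Gamma(k))."""
    return gamma(k + r) / (lam**r * gamma(k))

def raw_moment_numeric(r, k, lam, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of the integral of x^r f(x) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**r * erlang_pdf(x, k, lam)
    return total * h

k, lam = 3, 0.5  # illustrative parameters
mean = raw_moment_formula(1, k, lam)
variance = raw_moment_formula(2, k, lam) - mean**2

print(mean, raw_moment_numeric(1, k, lam))
print(variance, k / lam**2)
```

The integration limit of 200 is far enough into the tail for these parameters that the truncated part is negligible.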

GAMMA DISTRIBUTION

Definition

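A continuous random variable X is said to follow the Gamma distribution with parameter λ
if its probability density function is given by

$$f(x) = \begin{cases} \dfrac{e^{-x} x^{\lambda-1}}{\Gamma(\lambda)}, & \lambda > 0,\ 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$$

Moment Generating function of Gamma distribution

$$M_X(t) = E(e^{tX}) = \int_{0}^{\infty} e^{tx} f(x)\,dx$$

$$= \int_{0}^{\infty} e^{tx}\,\frac{e^{-x} x^{\lambda-1}}{\Gamma(\lambda)}\,dx$$

$$= \frac{1}{\Gamma(\lambda)}\int_{0}^{\infty} e^{(t-1)x} x^{\lambda-1}\,dx$$

$$= \frac{1}{\Gamma(\lambda)}\cdot\frac{\Gamma(\lambda)}{(1-t)^{\lambda}} = \frac{1}{(1-t)^{\lambda}}, \quad t < 1$$

$$M_X(t) = (1-t)^{-\lambda}$$

Mean and Variance of Gamma Distribution

$$\mu_1' = E(X) = \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}(1-t)^{-\lambda}\right]_{t=0}$$

$$= \left[-\lambda(1-t)^{-\lambda-1}(-1)\right]_{t=0} = \lambda$$

Mean = λ.

$$\mu_2' = E(X^{2}) = \left[\frac{d^{2}}{dt^{2}} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\,\lambda(1-t)^{-(\lambda+1)}\right]_{t=0}$$

$$= \left[\lambda(\lambda+1)(1-t)^{-(\lambda+2)}\right]_{t=0}$$

$$\mu_2' = \lambda(\lambda+1)$$

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2} = \lambda(\lambda+1) - \lambda^{2} = \lambda$$

Var(X) = λ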

Examples

1. The daily consumption of milk in excess of 20,000 litres is approximately distributed as a Gamma
variable with parameters k=2 and λ = 1/10,000. If the city has a daily stock of 30,000 litres on a given
day, find the probability that the stock is insufficient.

Solution

Let X be the random variable for daily consumption of milk in the city.

Let Y=X-20,000. Then Y is a Gamma variable with

$$f(y) = \frac{\lambda^{k} y^{k-1} e^{-\lambda y}}{\Gamma(k)}, \quad y \ge 0$$

When k =2 and λ = 1/10,000,

$$f(y) = \frac{\left(\frac{1}{10{,}000}\right)^{2} y\, e^{-y/10{,}000}}{\Gamma(2)} = \frac{y\, e^{-y/10{,}000}}{(10{,}000)^{2}}$$

$$P(X \ge 30{,}000) = P(Y \ge 10{,}000)$$

$$= \int_{10{,}000}^{\infty} \frac{y}{(10{,}000)^{2}}\, e^{-y/10{,}000}\,dy$$

Put t = y/10,000, so that dt = dy/10,000.

Therefore

$$P(X \ge 30{,}000) = \int_{1}^{\infty} t e^{-t}\,dt$$

$$= \left[-t e^{-t} - e^{-t}\right]_{1}^{\infty}$$

$$= e^{-1} + e^{-1} = 2e^{-1} = 0.7358$$

2. The consumer demand for electricity in a certain locality per month is known to follow a general
Gamma distribution. If the average demand is a kilowatts and the most likely demand is b
kilowatts (b<a), what is the variance of the demand?

Solution

Let X be the random variable denoting consumer demand for electricity, with p.d.f.

$$f(x) = \frac{\lambda^{k}}{\Gamma(k)}\, x^{k-1} e^{-\lambda x}, \quad x \ge 0$$

The most likely demand is the mode of X, i.e., the value of x for which f(x) is maximum.

$$f'(x) = \frac{\lambda^{k}}{\Gamma(k)}\left[(k-1)x^{k-2} e^{-\lambda x} - \lambda x^{k-1} e^{-\lambda x}\right] = \frac{\lambda^{k}}{\Gamma(k)}\, x^{k-2} e^{-\lambda x}\left[(k-1) - \lambda x\right]$$

f'(x) = 0 when x = 0 or x = (k-1)/λ.

$$f''(x) = \frac{\lambda^{k}}{\Gamma(k)}\left\{-\lambda x^{k-2} e^{-\lambda x} + \left[(k-1) - \lambda x\right]\frac{d}{dx}\left(x^{k-2} e^{-\lambda x}\right)\right\} < 0 \quad \text{when } x = \frac{k-1}{\lambda}$$

Therefore f(x) is maximum at x = (k-1)/λ,

$$\text{i.e., } \frac{k-1}{\lambda} = b$$

$$\text{Now } E(X) = \frac{k}{\lambda} = a \quad [\text{since average demand} = a]$$

$$a - b = \frac{1}{\lambda}$$

$$\text{Therefore } \mathrm{Var}(X) = \frac{k}{\lambda^{2}} = \frac{k}{\lambda}\left(\frac{k}{\lambda} - \frac{k-1}{\lambda}\right) = a(a - b)$$


3. In a city, the daily consumption of electric power in million kilowatt-hours is a random variable
with a general Gamma distribution or Erlang distribution with parameters λ = 1/2 and k=3. If the power
plant of this city has a daily capacity of 12 million kilowatt-hours, what is the probability that this
power supply will be inadequate on any given day?

Solution

Let X be the random variable that denotes the daily consumption of power. Then

$$f(x) = \frac{\left(\frac{1}{2}\right)^{3} x^{3-1} e^{-x/2}}{\Gamma(3)} = \frac{x^{2} e^{-x/2}}{16}, \quad x \ge 0$$

The power supply will be inadequate if the consumption goes beyond 12 million kilowatt-hours.

$$P(X > 12) = \int_{12}^{\infty} f(x)\,dx = \frac{1}{16}\int_{12}^{\infty} x^{2} e^{-x/2}\,dx$$

$$= \frac{1}{16}\left[\frac{x^{2} e^{-x/2}}{-1/2} - \frac{2x\, e^{-x/2}}{(1/2)^{2}} + \frac{2 e^{-x/2}}{-(1/2)^{3}}\right]_{12}^{\infty}$$

$$= \frac{1}{16}\left[-2x^{2} e^{-x/2} - 8x\, e^{-x/2} - 16 e^{-x/2}\right]_{12}^{\infty}$$

$$= \frac{1}{16}\left[2(12)^{2} e^{-6} + 8(12) e^{-6} + 16 e^{-6}\right]$$

$$= \frac{1}{16}\, e^{-6}\left[288 + 96 + 16\right] = 25 e^{-6} = 0.0620$$
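
For integer k, the Erlang tail probability has the standard closed form P(X > x) = e^{-λx} Σ_{j=0}^{k-1} (λx)^j / j!, which avoids repeated integration by parts. The sketch below applies it to this example and to the milk example; the function name erlang_sf is an assumption of this illustration.

```python
from math import exp, factorial

def erlang_sf(x, k, lam):
    """P(X > x) for an Erlang variable with integer shape k and rate lam."""
    return exp(-lam * x) * sum((lam * x)**j / factorial(j) for j in range(k))

# Power example: k = 3, lam = 1/2, capacity 12
p_power = erlang_sf(12, 3, 0.5)

# Milk example: k = 2, lam = 1/10,000, excess consumption above 10,000 litres
p_milk = erlang_sf(10_000, 2, 1 / 10_000)

print(round(p_power, 4), round(p_milk, 4))
```

With k = 3 and λx = 6 the sum is 1 + 6 + 18 = 25, which reproduces the factor 25e^{-6} obtained by integration.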

4. The daily consumption of bread in a hostel in excess of 2000 loaves is approximately Gamma
distributed with parameters k =2 and λ = 1/1000. The hostel has a daily stock of 3000 loaves. What is
the probability that the stock is insufficient on a day?

Solution

Let X be the number of loaves consumed daily.

Let Y = X - 2000.

Then Y follows a Gamma distribution with k =2 and λ = 1/1000.

$$f(y) = \frac{\lambda^{k} y^{k-1} e^{-\lambda y}}{\Gamma(k)}, \quad y \ge 0$$

$$= \frac{\left(\frac{1}{1000}\right)^{2} y^{2-1} e^{-y/1000}}{\Gamma(2)}$$

$$= \left(\frac{1}{1000}\right)^{2} y\, e^{-y/1000}, \quad y > 0$$

$$P[\text{Stock is insufficient}] = P(X > 3000)$$

$$= P(Y + 2000 > 3000)$$

$$= P(Y > 1000)$$

$$= \int_{1000}^{\infty} \frac{1}{(1000)^{2}}\, y\, e^{-y/1000}\,dy$$

$$= 2e^{-1} = 0.7358$$

EXPONENTIAL DISTRIBUTION

Definition

A continuous random variable X defined on (0, ∞) is said to follow an exponential
distribution if its probability density function is

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0,\ \lambda > 0$$

Note

In the exponential distribution, λ is the parameter

Cumulative distribution function

$$F(x) = \begin{cases} \displaystyle\int_{0}^{x} \lambda e^{-\lambda u}\,du = 1 - e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

Moments, mean and variance

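The rth moment is given by

$$\mu_r' = E[X^{r}] = \int_{0}^{\infty} x^{r}\,\lambda e^{-\lambda x}\,dx$$

$$= \lambda\int_{0}^{\infty} e^{-\lambda x} x^{(r+1)-1}\,dx$$

$$= \lambda\,\frac{\Gamma(r+1)}{\lambda^{r+1}} = \frac{r!}{\lambda^{r}}$$

$$\mu_r' = \frac{r!}{\lambda^{r}}$$

$$\text{Now, } \mu_1' = \frac{1!}{\lambda^{1}} = \frac{1}{\lambda}$$

$$\mu_2' = E(X^{2}) = \frac{2!}{\lambda^{2}} = \frac{2}{\lambda^{2}}$$

$$\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2} = \frac{2}{\lambda^{2}} - \left(\frac{1}{\lambda}\right)^{2} = \frac{2}{\lambda^{2}} - \frac{1}{\lambda^{2}} = \frac{1}{\lambda^{2}}$$

$$\mathrm{Var}(X) = \frac{1}{\lambda^{2}}$$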

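Moment Generating Function, Mean and Variance

$$M_X(t) = E(e^{tX}) = \int_{0}^{\infty} e^{tx} f(x)\,dx$$

$$= \int_{0}^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx$$

$$= \lambda\int_{0}^{\infty} e^{-(\lambda - t)x}\,dx$$

$$= \lambda\left[\frac{e^{-(\lambda - t)x}}{-(\lambda - t)}\right]_{0}^{\infty} = \frac{\lambda}{\lambda - t}, \quad t < \lambda$$

$$\frac{d}{dt} M_X(t) = \frac{\lambda}{(\lambda - t)^{2}}$$

$$\mu_1' = \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \frac{\lambda}{\lambda^{2}} = \frac{1}{\lambda}$$

$$\frac{d^{2}}{dt^{2}} M_X(t) = \frac{2\lambda}{(\lambda - t)^{3}}$$

$$\mu_2' = \left[\frac{d^{2}}{dt^{2}} M_X(t)\right]_{t=0} = \frac{2\lambda}{\lambda^{3}} = \frac{2}{\lambda^{2}}$$

$$\text{Mean} = \mu_1' = \frac{1}{\lambda}$$

$$\text{Variance} = \mu_2' - (\mu_1')^{2} = \frac{2}{\lambda^{2}} - \frac{1}{\lambda^{2}} = \frac{1}{\lambda^{2}}$$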

Examples

1. Suppose that during the rainy season on a tropical island, the length of a shower has an exponential
distribution with an average of 2 minutes. Find the probability that a shower will last for more than
three minutes. If a shower has already lasted for 2 minutes, what is the probability that it will last for
at least one more minute?

Solution

Let X be the random variable representing the length of the shower in minutes.

Given that X follows an exponential distribution.

The average length = 2.

Therefore the parameter λ = 1/2 (since mean = 1/λ)

Hence the p.d.f. is

$$f(x) = \begin{cases} \lambda e^{-\lambda x} = \dfrac{1}{2}\, e^{-x/2}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

(i) To find the probability of the shower lasting more than 3 minutes.

$$P(X > 3) = \int_{3}^{\infty} \frac{1}{2}\, e^{-x/2}\,dx$$

$$= \frac{1}{2}\left[\frac{e^{-x/2}}{-1/2}\right]_{3}^{\infty}$$

$$= e^{-3/2} = 0.2231$$

ii) P[The shower will last at least one more minute given that it has lasted 2 minutes]

$$= P[X \ge 3 \mid X > 2] = P[X > 1] \quad (\text{by the memoryless property})$$

$$= \int_{1}^{\infty} \frac{1}{2}\, e^{-x/2}\,dx = \frac{1}{2}\left[\frac{e^{-x/2}}{-1/2}\right]_{1}^{\infty}$$

$$= e^{-1/2} = 0.6065$$
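
The memoryless property used in part (ii) can be checked with the survival function P(X > x) = e^{-λx}. A minimal sketch; the function name survival is an assumption of this illustration.

```python
from math import exp

def survival(x, lam):
    """P(X > x) for an exponential variable with rate lam."""
    return exp(-lam * x)

lam = 1 / 2  # mean shower length of 2 minutes gives lam = 1/2

p_more_than_3 = survival(3, lam)

# Conditional probability P(X > 3 | X > 2) = P(X > 3) / P(X > 2)
p_conditional = survival(3, lam) / survival(2, lam)

print(round(p_more_than_3, 4), round(p_conditional, 4), round(survival(1, lam), 4))
```

The conditional probability coincides with P(X > 1), which is the content of the memoryless property.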

2. The daily consumption of milk in a city in excess of 20,000 litres is approximately exponentially
distributed. The average excess in consumption of milk is 3000 litres. The city has a daily stock of
35,000 litres. What is the probability that, of two days selected at random, the stock is insufficient on
both days?

Solution

Let X be the random variable denoting the daily consumption of milk in excess of 20,000 litres.

X is exponentially distributed with mean 3000 litres.

Therefore λ = 1/3000

$$f(x) = \frac{1}{3000}\, e^{-x/3000}, \quad x > 0$$

Let Y denote the daily consumption. Then X=Y-20000

P[the stock is insufficient on any day]=P[the consumption exceeds 35,000 litres]

i.e., P(Y>35,000)=P(X+20000>35,000)

=P(X>15,000)

$$= \int_{15{,}000}^{\infty} \lambda e^{-\lambda x}\,dx$$

$$= \int_{15{,}000}^{\infty} \frac{1}{3000}\, e^{-x/3000}\,dx$$

$$= \frac{1}{3000}\left[\frac{e^{-x/3000}}{-1/3000}\right]_{15{,}000}^{\infty}$$

$$= e^{-5}$$

$$\text{Therefore } P[\text{stock is insufficient on both days}] = e^{-5}\cdot e^{-5} = e^{-10}$$

3. The mileage which car owners get with a certain type of radial tyre is a random variable having an
exponential distribution with mean 40,000 km. Find the probability that one of these tyres will last

i) at least 20,000 km

ii) at most 30,000 km

Solution

Given mean = 40,000 km

$$\lambda = \frac{1}{\text{mean}} = \frac{1}{40{,}000}$$

$$f(x) = \lambda e^{-\lambda x} = \frac{1}{40{,}000}\, e^{-x/40{,}000}, \quad x > 0$$

i) $$P(X \ge 20{,}000) = \int_{20{,}000}^{\infty} f(x)\,dx = \int_{20{,}000}^{\infty} \frac{1}{40{,}000}\, e^{-x/40{,}000}\,dx$$

$$= \frac{1}{40{,}000}\left[\frac{e^{-x/40{,}000}}{-1/40{,}000}\right]_{20{,}000}^{\infty}$$

$$= \frac{1}{40{,}000}\left[-40{,}000\left(0 - e^{-1/2}\right)\right] = e^{-1/2} = 0.6065$$

ii) $$P(X \le 30{,}000) = \int_{0}^{30{,}000} \frac{1}{40{,}000}\, e^{-x/40{,}000}\,dx$$

$$= \frac{1}{40{,}000}\left[\frac{e^{-x/40{,}000}}{-1/40{,}000}\right]_{0}^{30{,}000}$$

$$= \frac{1}{40{,}000}\left[-40{,}000\left(e^{-3/4} - 1\right)\right] = 1 - e^{-3/4}$$

=0.5276.

4. The time (in hours) required to repair a machine is exponentially distributed with parameter
λ = 1/2. What is the probability that the repair time exceeds 2 hours? What is the conditional
probability that the repair takes at least 10 hours given that its duration exceeds 9 hours?

Solution

Let X be the random variable denoting the time to repair the machine.

Given X is exponentially distributed with λ = 1/2.

$$\text{Therefore } f(x) = \lambda e^{-\lambda x} = \frac{1}{2}\, e^{-x/2}, \quad x > 0$$

To find P[X > 2]

$$P[X > 2] = \int_{2}^{\infty} f(x)\,dx = \int_{2}^{\infty} \frac{1}{2}\, e^{-x/2}\,dx$$

$$= \frac{1}{2}\left[\frac{e^{-x/2}}{-1/2}\right]_{2}^{\infty}$$

$$= -\left[0 - e^{-2/2}\right] = e^{-1} = 0.3679$$

To find P[X ≥ 10 | X > 9]

$$P[X \ge 10 \mid X > 9] = P[X \ge 1] \quad (\text{by the memoryless property})$$

$$= \int_{1}^{\infty} \frac{1}{2}\, e^{-x/2}\,dx = e^{-1/2} = 0.6065$$

NORMAL DISTRIBUTION

Definition

A normal distribution is a continuous distribution given by

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where X is a continuous normal variate distributed with density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

with mean μ and standard deviation σ.

Deviation of the distribution

The curve may be written with the mean taken at the origin. If, however, another point is taken as the
origin such that the excess of the mean over the arbitrary origin is m, then

\[ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^{2}} \]

is the standard form of the normal curve with the mean at (m, 0).

Area under the normal curve is unity.

Characteristics of the Normal Distribution

The diagram of a normal distribution is given below. It is called normal curve.

Properties of the Normal Distribution

1. The normal distribution is a symmetrical distribution and the graph of the normal distribution is bell
shaped.

2. The curve has a single peak point (i.e.,) the distribution is unimodal

3. The mean of the normal distribution lies at the centre of normal curve.

4.Because of the symmetry of the normal curve, the median and mode are also at the centre of the
normal curve. Hence in a normal distribution the mean, median and mode coincide.

5. The tails of the normal distribution extend indefinitely and never touch the horizontal axis. That is,
the normal curve approaches the horizontal axis asymptotically on either side.

6. The normal distribution is a two parameter probability distribution. The parameters mean and
standard deviation (μ,σ) completely determine the distribution.

7. Area property:

In a normal distribution about 68% of the observations lie within mean ± S.D., i.e., (μ ± σ).
About 95% of the observations lie within mean ± 2 S.D., i.e., (μ ± 2σ). About 99.7% of the
observations lie within mean ± 3 S.D., i.e., (μ ± 3σ).
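
The area property can be checked without a table, since the area within k standard deviations of the mean equals erf(k/√2) for every normal distribution. The sketch below uses only Python's standard `math` module; the helper name `area_within` is an illustrative choice.

```python
import math

def area_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal variate X."""
    return math.erf(k / math.sqrt(2.0))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, a in areas.items():
    print(f"within {k} S.D.: {a:.4f}")
```

The result does not depend on μ or σ, which is why a single table of the standard normal variate is enough.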

Standard Normal Probability Distribution

If X is a normally distributed random variable, and μ and σ are respectively its mean and standard
deviation, then

\[ Z = \frac{X - \mu}{\sigma} \]

is called the standard normal random variable. Note that X = μ corresponds to Z = 0.

Normal table

A special table, called the table of areas under the normal curve, is available to determine the
probability that the random variable lies in a given range of values. Using the table, we can
determine the probability that X takes a value less than x, i.e., P(X < x), and also, for a given
probability p, the value x such that P(X < x) = p.

Additive property of Normal Distribution:

If X1, X2, …, Xn are independent normal variates with parameters (m1, σ1), (m2, σ2), …, (mn, σn)
respectively, then X1 + X2 + … + Xn is also a normal variate with parameters (m, σ),

where $m = m_1 + m_2 + \dots + m_n$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2$.

Examples

1) X is a normal variate with mean 30 and standard deviation 5. Find the probability that

i) 26 ≤ X ≤ 40; ii) X ≥ 45; iii) |X - 30| > 5.

Solution

Given μ = 30, σ = 5, and

\[ z = \frac{X - \mu}{\sigma} \]

i) When X = 26, $z = \frac{26 - 30}{5} = -0.8$; when X = 40, $z = \frac{40 - 30}{5} = 2$.

P(26 ≤ X ≤ 40) = P(-0.8 ≤ z ≤ 2)

= P(-0.8 ≤ z ≤ 0) + P(0 ≤ z ≤ 2)

= P(0 ≤ z ≤ 0.8) + P(0 ≤ z ≤ 2)

= 0.2881 + 0.4772

= 0.7653.

ii) When X = 45, $z = \frac{45 - 30}{5} = 3$.

P(X ≥ 45) = P(z ≥ 3)

= P(0 ≤ z < ∞) - P(0 ≤ z ≤ 3)

= 0.5 - 0.4987 = 0.0013.

iii) To find P(|X - 30| > 5), first find

P(|X - 30| ≤ 5) = P(25 ≤ X ≤ 35)

When X = 25, $z = \frac{25 - 30}{5} = -1$; when X = 35, $z = \frac{35 - 30}{5} = 1$.

P(|X - 30| ≤ 5) = P(25 ≤ X ≤ 35) = P(-1 ≤ z ≤ 1) = P(-1 ≤ z ≤ 0) + P(0 ≤ z ≤ 1)

= 2P(0 ≤ z ≤ 1)

= 2(0.3413)

= 0.6826.

∴ P(|X - 30| > 5) = 1 - P(|X - 30| ≤ 5)

= 1 - 0.6826

= 0.3174
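
The table look-ups in this example can be reproduced with the error function. The sketch below assumes nothing beyond Python's standard library; `normal_cdf` is an illustrative helper name, and its results should agree with the table-based values to about three decimal places.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 30.0, 5.0
p_between = normal_cdf(40, mu, sigma) - normal_cdf(26, mu, sigma)          # P(26 <= X <= 40)
p_upper = 1.0 - normal_cdf(45, mu, sigma)                                  # P(X >= 45)
p_outside = 1.0 - (normal_cdf(35, mu, sigma) - normal_cdf(25, mu, sigma))  # P(|X - 30| > 5)
print(round(p_between, 4), round(p_upper, 4), round(p_outside, 4))
```

Small differences in the fourth decimal place are expected, because the printed table rounds each area before the areas are added or subtracted.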

2. A normal distribution has mean μ = 20 and standard deviation σ = 10. Find P(15 ≤ X ≤ 40).

Solution

Given μ = 20 and σ = 10.

We know that $z = \frac{X - \mu}{\sigma}$.

When X = 15, $z = \frac{15 - 20}{10} = -0.5$, and

when X = 40, $z = \frac{40 - 20}{10} = 2$.

P(15 ≤ X ≤ 40) = P(-0.5 ≤ z ≤ 2) = P(0 ≤ z ≤ 2) + P(0 ≤ z ≤ 0.5)

= 0.4772 + 0.1915

= 0.6687.

3. The average seasonal rainfall in a place is 16 inches with a standard deviation of 4 inches. What is
the probability that in a year the rainfall in that place will be between 20 and 24 inches?

Solution

Given μ = 16, σ = 4, and $z = \frac{X - \mu}{\sigma}$.

When X = 20, $z = \frac{20 - 16}{4} = 1$.

When X = 24, $z = \frac{24 - 16}{4} = 2$.

∴ P(20 ≤ X ≤ 24) = P(1 ≤ z ≤ 2)

= P(0 ≤ z ≤ 2) - P(0 ≤ z ≤ 1)

= 0.4772 - 0.3413

= 0.1359.
Note

E(aX + bY) = aE(X) + bE(Y)

Var(aX + bY) = a²V(X) + b²V(Y), when X and Y are independent

Var(a) = 0

E(a) = a

4. X is a normal variate with mean 1 and variance 4. Y is another normal variate independent of X
with mean 2 and variance 3. What is the distribution of X+2Y?

Solution

Since X and Y are independent normal variates, X + 2Y is also a normal variate by the additive
property, and

Mean of X + 2Y = E(X + 2Y) = E(X) + 2E(Y) (since E(aX + bY) = aE(X) + bE(Y))

= 1 + 2(2) = 5

Variance of X + 2Y = V(X + 2Y) = 1²V(X) + 2²V(Y) (since Var(aX + bY) = a²V(X) + b²V(Y))

= 1²(4) + 2²(3) = 16.

∴ X + 2Y follows a normal distribution with mean 5 and variance 16.

5. The savings bank accounts of customers showed an average balance of Rs.150 and a standard
deviation of Rs.50. Assuming that the account balances are normally distributed:

1. What percentage of accounts is over Rs.200? P(X > 200)

2. What percentage of accounts is between Rs.120 and Rs.170? P(120 < X < 170)

3. What percentage of accounts is less than Rs.75? P(X < 75)

Solution

1) To find P(X > 200)

We know that $z = \frac{X - \mu}{\sigma}$.

When X = 200, $z = \frac{200 - 150}{50} = 1$.

The area from z = 0 to infinity is 0.5. From this, subtract the area from z = 0 to z = 1 (this value is
obtained from the table).

P(X > 200) = P(z > 1) = 0.5 - P(0 < z < 1)

= 0.5 - 0.3413

= 0.1587.

∴ The percentage of accounts over Rs.200 is 15.87%.

2. To find P(120 < X < 170)

When X = 120, $z = \frac{120 - 150}{50} = -0.6$.

When X = 170, $z = \frac{170 - 150}{50} = 0.4$.

∴ P(120 < X < 170) = P(-0.6 < z < 0.4)

= P(0 < z < 0.6) + P(0 < z < 0.4)

= 0.2257 + 0.1554 = 0.3811

Therefore, the percentage of accounts between Rs.120 and Rs.170 is 0.3811(100) = 38.11%.

3. To find P(X < 75)

When X = 75, $z = \frac{75 - 150}{50} = -1.5$.

∴ P(X < 75) = P(z < -1.5)

= 0.5 - P(0 < z < 1.5)

= 0.5 - 0.4332 = 0.0668.

Therefore, the percentage of accounts with less than Rs.75 is 6.68%.

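6. The mean yield for a one-acre plot is 662 kilos with standard deviation 32 kilos. Assuming normal
distribution, how many one-acre plots in a patch of 1000 plots would you expect to have yield
(i) over 700 kilos, (ii) below 650 kilos?

Solution

Given μ = 662, σ = 32

\[ z = \frac{X - \mu}{\sigma} = \frac{X - 662}{32} \]

When X = 700, $z = \frac{700 - 662}{32} = 1.19$.

When X = 650, $z = \frac{650 - 662}{32} = -0.375 \approx -0.38$.

i) P(X > 700) = P(z > 1.19) = 0.5 - P(0 < z < 1.19) = 0.5 - 0.3830 = 0.1170

Therefore, the number of plots with yield over 700 kilos = 1000(0.1170) = 117.

ii) P(X < 650) = P(z < -0.38) = 0.5 - P(0 < z < 0.38) = 0.5 - 0.1480 = 0.352

Therefore, the number of plots with yield below 650 kilos = 1000(0.352) = 352.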

Exercises

An electrical firm manufactures light bulbs that have a life, before burnout, that is normally
distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find

i) the probability that a bulb burns more than 834 hours.

ii) the probability that bulb burns between 778 and 834 hours.

7. In a distribution exactly normal 7% of the items are under 35 and 89% are under 63. What are the
mean and standard deviation of the distribution?

Solution

We know that $z = \frac{X - \mu}{\sigma}$.

Let z = z1 when X = 35 and z = z2 when X = 63.

P(z < z1) = 7% = 0.07, so

P(z1 < z < 0) = 0.5 - 0.07 = 0.43

From tables, z1 = -1.48

\[ \text{i.e., } \frac{35 - \mu}{\sigma} = -1.48 \]

35 - μ = -1.48σ    (1)

P(z < z2) = 89% = 0.89, so

P(0 < z < z2) = 0.89 - 0.5 = 0.39

From tables, z2 = 1.2 (approximately)

\[ \text{i.e., } \frac{63 - \mu}{\sigma} = 1.2 \]

63 - μ = 1.2σ    (2)

(2) - (1) ⇒ 28 = 2.68σ

\[ \sigma = \frac{28}{2.68} = 10.45 \]

(2) ⇒ 63 - μ = (1.2)(10.45)

63 - μ = 12.54

μ = 50.46

∴ Mean = 50.5 and standard deviation = 10.5 (approximately).

8. In a normal distribution, 31% of the items are under 45 and 8% are over 64. Find the mean and
variance of the distribution.

Solution

We know that $z = \frac{X - \mu}{\sigma}$.

Let z = z1 when X = 45 and z = z2 when X = 64.

P(z < z1) = 31% = 0.31, so

P(z1 < z < 0) = 0.5 - 0.31 = 0.19

From tables, z1 = -0.49

\[ \text{i.e., } \frac{45 - \mu}{\sigma} = -0.49 \]

45 - μ = -0.49σ    (1)

P(z > z2) = 8% = 0.08, so P(0 < z < z2) = 0.5 - 0.08 = 0.42

From tables, z2 = 1.40

\[ \text{i.e., } \frac{64 - \mu}{\sigma} = 1.40 \]

64 - μ = 1.40σ    (2)

(1) - (2) ⇒ -19 = -1.89σ

\[ \sigma = \frac{-19}{-1.89} \approx 10 \]

From (1), μ = 45 + (0.49)(10)

μ = 45 + 4.9 = 49.9

μ ≈ 50

Therefore mean = 50, standard deviation = 10 and variance = σ² = 100.
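
Examples 7 and 8 both solve two equations of the form x = μ + zσ. The sketch below does the same with exact inverse-normal values from Python's `statistics.NormalDist` (available from Python 3.8) instead of a printed table, so its answers differ slightly from the hand-rounded ones. The function name `fit_normal` is an illustrative choice.

```python
from statistics import NormalDist

def fit_normal(x1, p1, x2, p2):
    """Solve P(X < x1) = p1 and P(X < x2) = p2 for (mu, sigma)."""
    std = NormalDist()                      # standard normal: mean 0, S.D. 1
    z1, z2 = std.inv_cdf(p1), std.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)           # subtract the two equations x = mu + z*sigma
    mu = x1 - z1 * sigma
    return mu, sigma

mu7, sigma7 = fit_normal(35, 0.07, 63, 0.89)   # example 7
mu8, sigma8 = fit_normal(45, 0.31, 64, 0.92)   # example 8: 8% over 64 means 92% under 64
print(round(mu7, 2), round(sigma7, 2))
print(round(mu8, 2), round(sigma8, 2))
```

The only modelling step is converting "8% are over 64" into the cumulative probability 0.92 before calling the function.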

9. Suppose the heights of men of a certain country are normally distributed with average 68 inches
and standard deviation 2.5. Find the percentage of men who are

i) between a=66 and b= 71 inches in height.

ii) approximately 6 feet tall

Solution

Given μ = 68, σ = 2.5

We know that $z = \frac{X - \mu}{\sigma}$.

i) When X = 66, $z = \frac{66 - 68}{2.5} = -0.8$.

When X = 71, $z = \frac{71 - 68}{2.5} = 1.20$.

P(66 ≤ X ≤ 71) = P(-0.8 ≤ z ≤ 1.20)

= P(-0.8 ≤ z ≤ 0) + P(0 ≤ z ≤ 1.20)

= 0.2881 + 0.3849 = 0.6730

Approximately 67.3% of men are between 66 and 71 inches in height.

ii) Assume heights are rounded off to the nearest inch. A man who is approximately 6 feet (72 inches)
tall then has a height between a = 71.5 and b = 72.5 inches.

\[ P(71.5 \le X \le 72.5) = P\left(\frac{71.5 - 68}{2.5} \le \frac{X - 68}{2.5} \le \frac{72.5 - 68}{2.5}\right) \]

= P(1.4 ≤ z ≤ 1.8)

= P(0 ≤ z ≤ 1.8) - P(0 ≤ z ≤ 1.4) = 0.4641 - 0.4192 = 0.0449

Approximately 4.49% of men are about 6 feet tall.
UNIT-III

BASIC STATISTICS

Introduction

The word statistics is used to indicate the numerical data in any field of enquiry, and the term
statistical method denotes the technique of studying and analyzing the data.

Variables

Any character which can vary in magnitude or quality is called a variate.

Thus the height, weight, age, intelligence, number of children per family, the number of
students in a class etc are variates.

Variates are of two types

i) Discrete, ii) continuous

i) Discrete: Discrete variates are the values obtained by counting

Example: The number of children in a family and the number of students in a class.

They will be in whole number only.

ii) Continuous Variate: Continuous variates correspond to variables which are measured
theoretically to any degree of accuracy.

Example: Measurement of height, weight, temperature etc.

Frequency tables:

Suppose we know the marks obtained by 140 candidates in an examination in a certain subject. Let
the marks be given by the following table

86 35 69 12 55 53 41 10 35 58 71 45 50 30
59 56 37 29 29 51 47 46 82 36 52 59 32 54
16 65 42 27 53 39 40 62 54 53 38 69 66 50
74 26 44 21 53 32 41 54 32 81 58 45 48 30
57 37 43 77 34 21 61 54 46 33 62 52 75 89
66 75 60 47 85 37 49 61 93 50 51 8 45 49
20 70 47 41 49 16 60 63 39 58 23 40 44 68
52 28 52 51 31 83 36 80 70 43 52 35 18 60
27 60 31 44 78 48 55 38 25 59 22 72 61 48
41 98 67 42 42 33 11 64 72 46 37 76 65 43
(Table 1)

Frequency table of marks of 140 candidates

Class limits   Tally marks                Frequency
0-9            I                          1
10-19          IIII I                     6
20-29          IIII IIII II               12
30-39          IIII IIII IIII IIII III    23
40-49                                     30
50-59                                     29
60-69                                     19
70-79          IIII IIII I                11
80-89                                     7
90-99                                     2
Total                                     140
(Table 2)
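
The class frequencies in Table 2 can be reproduced from the raw marks of Table 1 by counting how many marks fall in each class of width 10. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

marks = [
    86, 35, 69, 12, 55, 53, 41, 10, 35, 58, 71, 45, 50, 30,
    59, 56, 37, 29, 29, 51, 47, 46, 82, 36, 52, 59, 32, 54,
    16, 65, 42, 27, 53, 39, 40, 62, 54, 53, 38, 69, 66, 50,
    74, 26, 44, 21, 53, 32, 41, 54, 32, 81, 58, 45, 48, 30,
    57, 37, 43, 77, 34, 21, 61, 54, 46, 33, 62, 52, 75, 89,
    66, 75, 60, 47, 85, 37, 49, 61, 93, 50, 51, 8, 45, 49,
    20, 70, 47, 41, 49, 16, 60, 63, 39, 58, 23, 40, 44, 68,
    52, 28, 52, 51, 31, 83, 36, 80, 70, 43, 52, 35, 18, 60,
    27, 60, 31, 44, 78, 48, 55, 38, 25, 59, 22, 72, 61, 48,
    41, 98, 67, 42, 42, 33, 11, 64, 72, 46, 37, 76, 65, 43,
]

width = 10
counts = Counter(m // width for m in marks)       # class index 0 means 0-9, 1 means 10-19, ...
frequencies = [counts.get(k, 0) for k in range(10)]
cumulative = list(accumulate(frequencies))        # running totals f1, f1+f2, ...
relative = [f / len(marks) for f in frequencies]  # frequency / total frequency

for k, (f, cf, rf) in enumerate(zip(frequencies, cumulative, relative)):
    print(f"{k * width}-{k * width + 9}: f={f}, c.f={cf}, relative={rf:.3f}")
```

Integer division by the class width works here because the class limits are 0-9, 10-19 and so on; other class limits would need an explicit comparison against the boundaries.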

We have arranged the raw data into classes of appropriate size, showing the corresponding frequency
of variates in each class.

When any set of data is systematically arranged in this way, it is called a frequency
distribution.

In the above table, the pairs of numbers written in the column of classes are called the lower and
upper class limits. Sometimes they are called open class limits or class boundaries.

The difference between the highest mark and the lowest mark is called the range.

The range is divided into classes of appropriate size. The size of a class is called the class interval.
Usually the classes are of equal size.

The mid-value of a class is called the class mark, mid-value or central value.

The width of a class interval is therefore the common difference between consecutive class
marks. It is also the difference between the lower (or upper) limits of two successive classes.

In the above table, the class interval is 10 and the successive class marks are 4.5, 14.5, etc.

Cumulative Frequency:

If f1, f2, f3, … are the frequencies of the successive classes, then f1, f1+f2, f1+f2+f3, etc are
the cumulative frequencies.

Thus the cumulative frequencies for the previous table are 1, 7, 19, 42, …

The cumulative frequency 42 gives the number of students obtaining 39 or lower marks.

Relative Frequency:

In some problems we may require the relative frequency instead of the actual or absolute
frequency.

The relative frequency of any class is the ratio of the frequency of that class to the total frequency.
Thus in the previous table the relative frequencies of the various classes are 1/140, 6/140, 12/140,
23/140, etc.

Some times the relative frequency is given as a percentage.

Graphic Representation of a frequency distribution

We have seen that numerical data relating to an event can be presented in the form of a table.
As a visual aid to grasp the data in the table, certain diagrams and graphs are used.

The graph representing a frequency distribution is known as a frequency graph. We shall now
consider some methods of representing statistical data graphically.

a) The frequency Polygon

Plot the points whose x-co-ordinates are the middle values of the classes and y coordinates are the
frequencies in these classes. The figure obtained by joining the successive points is known as a
frequency polygon.

b) The Histogram

A second and more important graphical representation of a frequency distribution is the
histogram.

The histogram consists of a set of rectangles erected over the true class intervals, their areas being
proportional to the frequencies of the respective classes.

Thus the base of a rectangle is the class-interval in width, the centre of the rectangle is the
mid-value and its area represents the class frequency.

Note:

1. In a histogram frequencies are represented by areas, whereas in a frequency polygon frequencies are
represented by lengths.

2. In a histogram the width of a rectangle is the same as that of the class. Since the classes are of equal
width, the height of the rectangle will be proportional to the class frequencies.

Frequency table of the marks of table 2 is given below:

Class limits Mid-value Frequency


0-9 4.5 1
10-19 14.5 6
20-29 24.5 12
30-39 34.5 23
40-49 44.5 30
50-59 54.5 29
60-69 64.5 19
70-79 74.5 11
80-89 84.5 7
90-99 94.5 2
140

Measures of Central Tendency

Averages

An average is a value which is typical or representative of a set of data. It represents the whole series
and conveys a fairly adequate idea of the whole group. Since such typical values tend to lie centrally
within a set of data arranged according to magnitude, averages are also called measures of central
tendency or measures of location. An average may or may not be one of the variates given in the data.

There are three forms of averages in common use.

The Arithmetic mean

The Median and

The Mode

Averages which are rarely used:

Geometric Mean

Harmonic Mean

1) The Arithmetic Mean or the Mean

i) Individual Observations: The arithmetic mean (A.M.) or the mean of a set of n numbers x1, x2, …, xn is
denoted by $\bar{x}$ and is defined as

\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{r=1}^{n} x_r \]

ii)

a) Discrete Series: In the case of a frequency distribution, let x1, x2, …, xn be a set of n numbers
having frequencies f1, f2, …, fn respectively. Then

\[ \text{Mean} = \frac{\sum f_r x_r}{\sum f_r} \]

b) Continuous Series: In the case of a frequency distribution, let x1, x2, …, xn be the mid-values
of the class intervals having frequencies f1, f2, …, fn respectively. Then

\[ \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]
Shortcut method for calculating the A.M

\[ \bar{x} = A + c\,\frac{\sum f_r d_r}{N}, \qquad d_r = \frac{x_r - A}{c}, \qquad N = \sum f_r \]

A - the midpoint of the class interval with the highest frequency

c - width of the class interval
Examples:

1) The following table gives the monthly income of 10 employees in an office

1780 1760 1690 1750 1840 1920 1100 1810 1050 1950
Calculate the arithmetic mean of incomes

Solution

Let the income be denoted by the symbol x.

\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]

\[ = \frac{1780 + 1760 + 1690 + 1750 + 1840 + 1920 + 1100 + 1810 + 1050 + 1950}{10} = \frac{16650}{10} = 1665 \]
2) From the following data of the marks obtained by 60 students of a class, calculate the arithmetic
mean

Marks No. of students


20 8
30 12
40 20
50 10
60 6
70 4
Solution:

\[ \text{Mean} = \frac{\sum f_r x_r}{\sum f_r} \]

Marks (xr)   No. of students (fr)   frxr
20           8                      160
30           12                     360
40           20                     800
50           10                     500
60           6                      360
70           4                      280
Total        60                     2460

\[ \text{Mean} = \frac{\sum f_r x_r}{\sum f_r} = \frac{2460}{60} = 41 \]
3) From the following data compute arithmetic mean

Marks                  0-10   10-20   20-30   30-40   40-50   50-60
No. of students (fr)   5      10      25      30      20      10

Solution:

Class    Mid value (xr)   No. of students (fr)   frxr
0-10     5                5                      25
10-20    15               10                     150
20-30    25               25                     625
30-40    35               30                     1050
40-50    45               20                     900
50-60    55               10                     550
Total                     100                    3300

\[ \text{Mean} = \frac{\sum f_r x_r}{\sum f_r} = \frac{3300}{100} = 33 \]
Remark: If there is a gap between consecutive class intervals, halve the gap; subtract this half from
each lower limit and add it to each upper limit.
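
The direct formula and the shortcut formula for the mean of a frequency distribution can be compared on the data of example 3. This is a minimal sketch; the function names are illustrative, and the origin A = 35 is taken as the mid value of the class with the highest frequency, as the shortcut method suggests.

```python
def grouped_mean(midpoints, freqs):
    """Direct method: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

def grouped_mean_shortcut(midpoints, freqs, origin, width):
    """Shortcut method: A + c * sum(f*d) / N with d = (x - A) / c."""
    n = sum(freqs)
    total_fd = sum(f * (x - origin) / width for x, f in zip(midpoints, freqs))
    return origin + width * total_fd / n

midpoints = [5, 15, 25, 35, 45, 55]     # mid values of 0-10, ..., 50-60
freqs = [5, 10, 25, 30, 20, 10]

mean_direct = grouped_mean(midpoints, freqs)
mean_shortcut = grouped_mean_shortcut(midpoints, freqs, origin=35, width=10)
print(mean_direct, mean_shortcut)
```

Both methods give the same mean for any choice of origin; the shortcut only keeps the intermediate numbers small for hand calculation.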

4) Calculate the A.M by the direct method

Weight in lb   Frequency
80-90          1
90-100         11
100-110        25
110-120        37
120-130        62
130-140        31
140-150        22
150-160        15
160-170        9
170-180        5
180-190        4
190-200        3
200-210        2
210-220        1

Solution

Class interval   Mid value (xr)   Frequency (fr)   frxr
80-90            85               1                85
90-100           95               11               1045
100-110          105              25               2625
110-120          115              37               4255
120-130          125              62               7750
130-140          135              31               4185
140-150          145              22               3190
150-160          155              15               2325
160-170          165              9                1485
170-180          175              5                875
180-190          185              4                740
190-200          195              3                585
200-210          205              2                410
210-220          215              1                215
Total                             228              29770

\[ \bar{x} = \frac{\sum f_r x_r}{\sum f_r} = \frac{29770}{228} = 130.5702 \]
5 ) Calculate the A.M. of frequency distribution of weights of 228 adults given in previous problem
by taking an arbitrary origin

Solution

125 is the mid point of the class having the highest frequency. So we take the arbitrary origin A= 125.

Width of the class interval c=10

Class interval   Mid value (xr)   Frequency (fr)   dr = (xr - A)/c, A = 125   frdr
80-90            85               1                -4                         -4
90-100           95               11               -3                         -33
100-110          105              25               -2                         -50
110-120          115              37               -1                         -37
120-130          125              62               0                          0
130-140          135              31               1                          31
140-150          145              22               2                          44
150-160          155              15               3                          45
160-170          165              9                4                          36
170-180          175              5                5                          25
180-190          185              4                6                          24
190-200          195              3                7                          21
200-210          205              2                8                          16
210-220          215              1                9                          9
Total                             228                                         127

\[ \bar{x} = A + c\,\frac{\sum f_r d_r}{N} = 125 + 10\left(\frac{127}{228}\right) = 130.570 \]

Median

The Median by definition refers to the middle value in a distribution.

Calculation of Median

i) Individual Observations:

Median is the size of the $\frac{N+1}{2}$ th item.

Discrete Series:

i) Arrange the data in ascending or descending order of magnitude.

ii) Find the cumulative frequencies.

iii) Median is the size of the $\frac{N+1}{2}$ th item.

iv) Look at the cumulative frequency column and find the total which is either equal to $\frac{N+1}{2}$ or next
higher to it, and determine the value of the variable corresponding to it. That gives the value of the
median.

Continuous Series:

\[ \text{Median} = l + \frac{N/2 - c.f}{f}\cdot c \]

l - lower limit of the median class

c.f - cumulative frequency of the class preceding the median class

f - frequency of the median class

c - width of the class interval

1) Obtain the value of Median from the following data

391 384 591 407 672 522 777 753 2488 1490

Solution:

Arrange in ascending order:

384, 391, 407, 522, 591, 672, 753, 777, 1490, 2488

Median = size of the (N+1)/2 th item = size of the (10+1)/2 th item

= size of the 5.5th item = average of the 5th and 6th items = (591 + 672)/2 = 631.5

2) From the following data of the wages of 8 workers, find the median

Wages (Rs.): 1100, 1150, 1080, 1020, 1120, 1200, 1160, 1400

Solution:

Arrange the data in ascending order:

Rank    1      2      3      4      5      6      7      8
Wages   1020   1080   1100   1120   1150   1160   1200   1400

Median = size of the (N+1)/2 th item = size of the (8+1)/2 = 4.5th item = (1120 + 1150)/2 = 1135

3) From the following data find the value of Median

Income (Rs.)     1000   1500   800   2000   2500   1800
No. of persons   24     26     16    20     6      30

Solution:

Arrange the incomes in ascending order and find the cumulative frequencies:

Income   No. of persons (f)   c.f
800      16                   16
1000     24                   40
1500     26                   66    * (N+1)/2 = (122+1)/2 = 61.5
1800     30                   96
2000     20                   116
2500     6                    122

Here (N+1)/2 = 61.5.

In the cumulative frequency column, take the value which is equal to (N+1)/2 or next higher to it.
The c.f which is next higher to 61.5 is 66, and the corresponding value of the variate is 1500.

∴ Median = 1500.

4) Calculate the median for the following distribution

Marks    No. of students
45-50    10
40-45    15
35-40    26
30-35    30
25-30    42
20-25    31
15-20    24
10-15    15
5-10     7

Solution

Marks    f     c.f
5-10     7     7
10-15    15    22
15-20    24    46
20-25    31    77     (c.f before the median class)
25-30    42    119    (median class, f = 42; N/2 = 200/2 = 100)
30-35    30    149
35-40    26    175
40-45    15    190
45-50    10    200
Total    200

\[ \text{Median} = l + \frac{N/2 - c.f}{f}\cdot c \]

Median class: 25-30

l - lower limit of the median class = 25

c - width of the class interval = 5

c.f - cumulative frequency before the median class = 77

f - frequency of the median class = 42

\[ \text{Median} = 25 + \frac{100 - 77}{42}\cdot 5 = 27.7381 \]

From the following data compute the median

Marks      No. of students
410-419    14
420-429    20
430-439    42
440-449    54
450-459    45
460-469    18
470-479    7

There is a gap between consecutive class intervals, so divide the difference by 2 and adjust the class
limits: subtract from each lower limit and add to each upper limit.

The difference is 1, and 1/2 = 0.5. So subtract 0.5 from each lower limit and add 0.5 to each upper limit.

The frequency distribution table then becomes

Marks          No. of students (f)   c.f
409.5-419.5    14                    14
419.5-429.5    20                    34
429.5-439.5    42                    76     (c.f before the median class)
439.5-449.5    54                    130    (median class, f = 54)
449.5-459.5    45                    175
459.5-469.5    18                    193
469.5-479.5    7                     200

N/2 = 200/2 = 100

Median class: 439.5-449.5

l = 439.5; f = 54; c.f = 76; c = 10

\[ \text{Median} = l + \frac{N/2 - c.f}{f}\cdot c = 439.5 + \frac{100 - 76}{54}\cdot 10 = 443.94 \]
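
The median formula for a continuous series can be sketched as a small function and applied to the two grouped examples above. The function name `grouped_median` is an illustrative choice; it assumes equal class widths and class boundaries that are already adjusted for gaps.

```python
from itertools import accumulate

def grouped_median(lower_bounds, freqs, width):
    """Median = l + (N/2 - c.f) / f * c for a continuous frequency distribution."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))
    # The median class is the first class whose cumulative frequency reaches N/2.
    k = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)
    cf_before = cumulative[k - 1] if k > 0 else 0
    return lower_bounds[k] + (n / 2 - cf_before) / freqs[k] * width

median_a = grouped_median([5, 10, 15, 20, 25, 30, 35, 40, 45],
                          [7, 15, 24, 31, 42, 30, 26, 15, 10], 5)
median_b = grouped_median([409.5, 419.5, 429.5, 439.5, 449.5, 459.5, 469.5],
                          [14, 20, 42, 54, 45, 18, 7], 10)
print(round(median_a, 4), round(median_b, 2))
```

The classes must be listed in ascending order, which is why the first example is entered from 5-10 upward even though the question lists it from 45-50 downward.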
Weighted arithmetic Mean:

If x1, x2, …, xn are n observations and w1, w2, …, wn are their respective weights, the weighted
arithmetic mean is defined to be

\[ \text{Weighted A.M.} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}. \]

Geometric Mean:

Individual observations: The geometric mean of n values x1, x2, …, xn is the nth root of their product,

\[ \text{i.e., } G.M. = (x_1 x_2 \cdots x_n)^{1/n} \]

(or)

\[ G.M. = \text{Antilog}\left(\frac{\sum \log x}{N}\right) \]

Discrete Series:

\[ \log G.M. = \frac{1}{N}\left(f_1 \log x_1 + f_2 \log x_2 + \dots + f_n \log x_n\right), \qquad N = \sum f \]

i.e.,

\[ G.M. = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right) \]

Continuous Series:

\[ G.M. = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right), \quad \text{where } x \text{ is the middle value of the class} \]

Harmonic Mean:

The harmonic mean of a set of quantities is defined to be the reciprocal of the arithmetic mean of the
reciprocals of the quantities. Hence if x1, x2, …, xn are n observations,

\[ H.M. = \frac{1}{\frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}\right)} = \frac{n}{\sum \frac{1}{x}} \]

In a frequency distribution,

\[ H.M. = \frac{N}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}} = \frac{N}{\sum \frac{f}{x}}, \qquad N = \sum f \]

Examples:

1) Calculate Geometric mean from the following data

125 1462 38 7 0.22 0.08 12.75 0.5


Solution:

\[ G.M. = (x_1 x_2 \cdots x_n)^{1/n} = (5454210.3)^{1/8} = 6.9517 \]

Aliter:

\[ G.M. = \text{Antilog}\left(\frac{\sum \log x}{N}\right) \]

x        log x
125      2.0969
1462     3.1650
38       1.5798
7        0.8451
0.22     -0.6576
0.08     -1.0969
12.75    1.1055
0.5      -0.3010
Total    6.7368

\[ G.M. = \text{Antilog}\left(\frac{6.7368}{8}\right) = \text{Antilog}(0.8421) = 6.952 \]
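
The logarithmic method can be verified numerically: taking the antilog of the mean of the logarithms gives the same value as the nth root of the product. This is a minimal sketch with an illustrative function name, using base-10 logarithms as in the notes.

```python
import math

def geometric_mean(values):
    """Antilog of the mean of the base-10 logarithms (all values must be positive)."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

data = [125, 1462, 38, 7, 0.22, 0.08, 12.75, 0.5]
gm = geometric_mean(data)
gm_direct = math.prod(data) ** (1 / len(data))   # nth root of the product
print(round(gm, 4), round(gm_direct, 4))
```

Working with logarithms avoids forming a very large or very small product, which matters more for long series than for these eight values.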

2) Find the geometric mean of the following data:

x   1   2   3   4   5
f   2   4   3   2   1

Solution:

\[ G.M. = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right) \]

x       f     log x     f.log x
1       2     0         0
2       4     0.3010    1.204
3       3     0.4771    1.431
4       2     0.6020    1.204
5       1     0.6990    0.6989
Total   12              4.5379

\[ G.M. = \text{Antilog}\left(\frac{4.5379}{12}\right) = \text{Antilog}(0.3781) = 2.388 \]

3) Find the Geometric mean for the data given below:

Marks Frequency
4-8 6
8-12 10
12-16 18
16-20 30
20-24 115
24-28 12
28-32 10
32-36 6
36-40 2

Solution

Marks    Mid value (xr)   log xr    f      f.log xr
4-8      6                0.778     6      4.668
8-12     10               1         10     10
12-16    14               1.146     18     20.628
16-20    18               1.255     30     37.65
20-24    22               1.3424    115    154.376
24-28    26               1.4149    12     16.9788
28-32    30               1.4771    10     14.771
32-36    34               1.5314    6      9.1884
36-40    38               1.579     2      3.158
Total                               209    271.4182

\[ G.M. = \text{Antilog}\left(\frac{\sum f \log x_r}{\sum f}\right) = \text{Antilog}\left(\frac{271.4182}{209}\right) = \text{Antilog}(1.2987) = 19.89 \]

Harmonic Mean

1) Calculate the harmonic mean from the following data

3834 382 63 8 0.4 0.03 0.009 0.005

Solution

\[ H.M. = \frac{n}{\sum (1/x)} \]

x        1/x
3834     0.0003
382      0.003
63       0.016
8        0.125
0.4      2.5
0.03     33.333
0.009    111.111
0.005    200
Total    347.088

\[ H.M. = \frac{n}{\sum (1/x)} = \frac{8}{347.088} = 0.023 \]

2) From the following data compute the value of harmonic mean

Marks 10 20 25 40 50
No. of students 20 30 50 15 5
Solution

\[ H.M. = \frac{N}{\sum (f/x)}, \qquad N = \sum f \]

x       f      1/x      f(1/x)
10      20     0.1      2
20      30     0.05     1.5
25      50     0.04     2
40      15     0.025    0.375
50      5      0.02     0.1
Total   120             5.975

\[ H.M. = \frac{N}{\sum (f/x)} = \frac{120}{5.975} = 20.08 \]

3) From the following data compute the value of harmonic mean

Class       0-10   10-20   20-30   30-40   40-50
Frequency   15     10      7       5       3

\[ H.M. = \frac{N}{\sum f(1/x)} \]

Class    Mid value (xr)   f     1/xr     f(1/xr)
0-10     5                15    0.2      3
10-20    15               10    0.067    0.67
20-30    25               7     0.04     0.28
30-40    35               5     0.029    0.145
40-50    45               3     0.022    0.066
Total                     40             4.161

H.M. = 40/4.161 = 9.613
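
The harmonic mean formula N / Σ(f/x) is short enough to check directly. The sketch below applies it to the marks data of example 2 (marks 10, 20, 25, 40, 50 with frequencies 20, 30, 50, 15, 5); the function name `harmonic_mean` and its optional `freqs` argument are illustrative choices.

```python
def harmonic_mean(values, freqs=None):
    """N / sum(f / x); with no frequencies every value counts once."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

hm_marks = harmonic_mean([10, 20, 25, 40, 50], [20, 30, 50, 15, 5])
print(round(hm_marks, 2))
```

For a continuous series the same function can be called with the class mid values in place of the individual values.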

Measures of Dispersion

Defn: Dispersion is defined as the lack of uniformity in the sizes of the items. It is the extent to which
the values differ in size, and hence it is the degree of variability.

The various measures of dispersion are

i) The Range

ii) The Quartile Deviation

iii) The mean deviation

iv) The standard deviation

Range: It is the difference between the greatest and least values observed.

The Quartile Deviation

Quartiles are those values which divide the total frequency into four equal parts, where the values
are arranged in ascending or descending order of magnitude.

The first Quartile ( or the lower quartile) Q1 is that value of the variate which is such that
one quarter of the observations lies below Q1.

The third quartile ( or the upper quartile ) Q3 is that value of the variate which is such that
three quarters of the observations lie below Q3.

The middle or second quartile Q2 is the median.

Individual Observations:

Q1 is the size of the $\frac{N+1}{4}$ th item

Q3 is the size of the $\frac{3(N+1)}{4}$ th item

Discrete and continuous series:

Q1 is the size of the $\frac{N}{4}$ th item

Q3 is the size of the $\frac{3N}{4}$ th item

Continuous Series:

First Quartile:

\[ Q_1 = l_1 + \frac{\frac{1}{4}N - c.f_1}{f_1}\cdot c \]

l1 - lower limit of the first quartile class

c - width of the class interval

c.f1 - cumulative frequency up to the lower limit of the first quartile class

f1 - frequency of the first quartile class

Similarly,

Third Quartile:

\[ Q_3 = l_3 + \frac{\frac{3}{4}N - c.f_3}{f_3}\cdot c \]

\[ \text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} \]

Remark:

Q1 is that value of x for which cum f = N/4

Q2 is that value of x for which cum f = N/2

Q3 is that value of x for which cum f = 3N/4

Example:

1) Calculate the quartile deviation from the following marks of 13 students in a class

25, 35, 9, 28, 52, 41, 38, 96, 85, 72, 10, 40, 60

Soln: Let us arrange the given set of numbers in ascending order:

9, 10, 25, 28, 35, 38, 40, 41, 52, 60, 72, 85, 96

Here N = 13.

Q1 is the size corresponding to the rank (N+1)/4, i.e., 3½, which is midway between the 3rd and 4th items:

Q1 = 25 + 1/2(28 - 25) = 53/2 = 26.5

Q3 is the size corresponding to the rank 3(N+1)/4, i.e., 10½:

Q3 = 60 + 1/2(72 - 60) = 60 + 6 = 66

\[ Q.D. = \frac{Q_3 - Q_1}{2} = \frac{66 - 53/2}{2} = \frac{79}{4} = 19.75 \]
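
The rank rule used here, (N+1)/4 and 3(N+1)/4 with interpolation between neighbouring items, can be sketched as follows. The helper name `item_at_rank` is illustrative; note that library quartile functions often use other conventions and may return slightly different values.

```python
def item_at_rank(sorted_values, rank):
    """Size of the item at a (possibly fractional) 1-based rank, by linear interpolation."""
    lower = int(rank)                 # whole part of the rank
    frac = rank - lower               # fractional part of the rank
    value = sorted_values[lower - 1]
    if frac > 0:
        value += frac * (sorted_values[lower] - sorted_values[lower - 1])
    return value

marks = sorted([25, 35, 9, 28, 52, 41, 38, 96, 85, 72, 10, 40, 60])
n = len(marks)
q1 = item_at_rank(marks, (n + 1) / 4)        # rank 3.5
q3 = item_at_rank(marks, 3 * (n + 1) / 4)    # rank 10.5
qd = (q3 - q1) / 2
print(q1, q3, qd)
```

Sorting inside the program removes the risk of a slip when the marks are arranged by hand.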

2) The following data relate to the frequency distribution of weights of 1000 Males. Calculate the
quartile deviation

Wt. in lb frequency
80.4-94.4 13
94.4-108.4 107
108.4-122.4 340
122.4-136.4 334
136.4-150.4 136
150.4-164.4 48
164.4-178.4 14
178.4-192.4 7
192.4-206.4 -
206.4-220.4 1

Solution:

To find Q1 and Q3

We prepare the following cumulative frequency table

Wt. in lb      Frequency   Cumulative frequency
80.4-94.4      13          13
94.4-108.4     107         120     (c.f1)
108.4-122.4    340         460     (first quartile class, f1 = 340; c.f3 = 460)
122.4-136.4    334         794     (third quartile class, f3 = 334)
136.4-150.4    136         930
150.4-164.4    48          978
164.4-178.4    14          992
178.4-192.4    7           999
192.4-206.4    -           999
206.4-220.4    1           1000

Here N/4 = 1000/4 = 250 and 3N/4 = 3000/4 = 750.

The first quartile class is 108.4-122.4

\[ Q_1 = l_1 + \frac{\frac{1}{4}N - c.f_1}{f_1}\cdot c = 108.4 + \frac{\frac{1}{4}(1000) - 120}{340}\cdot 14 = 113.753 \]

The third quartile class is 122.4-136.4

\[ Q_3 = l_3 + \frac{\frac{3}{4}N - c.f_3}{f_3}\cdot c = 122.4 + \frac{\frac{3}{4}(1000) - 460}{334}\cdot 14 = 134.556 \]

\[ \text{Hence Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{134.556 - 113.753}{2} = 10.40 \]

The mode

The value of the variate which occurs most frequently is called mode.

Individual Observations:

Find the mode of the following

4, 7, 3, 4, 8, 4

Solution: Since 4 appears the maximum number of times,

mode = 4

Discrete series:

Find the value of mode from the following data:

Marks (x)   No. of students
20          8
30          12
40          20
50          10
60          6
70          4

Solution: Since 40 is having highest frequency

Mode = 40

Continuous Series:

\[ \text{Mode} = l + \frac{c\, f_2}{f_1 + f_2} \]

l - lower limit of the modal class; modal class - the class with the highest frequency

c - width of the class interval

f1 - frequency of the class before the modal class; f2 - frequency of the class after the modal class

1) Find the mode in the case of heights of trees in a grade whose frequency distribution is given in the
following table

Heights          Cumulative frequency
Under 7 feet     26
Under 14 feet    57
Under 21 feet    92
Under 28 feet    134
Under 35 feet    216
Under 42 feet    287
Under 49 feet    341
Under 56 feet    360

Solution

Converting the cumulative frequencies into class frequencies, the given problem becomes

Heights   Frequency
0-7       26
7-14      57 - 26 = 31
14-21     92 - 57 = 35
21-28     134 - 92 = 42     (f1)
28-35     216 - 134 = 82    (modal class)
35-42     287 - 216 = 71    (f2)
42-49     341 - 287 = 54
49-56     360 - 341 = 19

Since 82 is the largest frequency, the modal class is 28-35.

\[ \text{Mode} = l + \frac{c\, f_2}{f_1 + f_2} \]

l - lower limit of the modal class = 28

c - width of the class interval = 7

f1 - frequency before the modal class = 42

f2 - frequency after the modal class = 71

\[ \text{Mode} = 28 + \frac{7(71)}{42 + 71} = 32.398 \]
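
The two steps of this example, turning "under" totals into class frequencies and then applying Mode = l + c·f2/(f1 + f2), can be sketched as follows. The function name `grouped_mode` is illustrative, and the sketch implements only the formula used in these notes; other textbooks use a different interpolation formula for the mode.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode = l + c*f2 / (f1 + f2), the formula used in these notes."""
    k = max(range(len(freqs)), key=freqs.__getitem__)       # index of the modal class
    f1 = freqs[k - 1] if k > 0 else 0                       # frequency of the preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0          # frequency of the following class
    return lower_bounds[k] + width * f2 / (f1 + f2)

cumulative = [26, 57, 92, 134, 216, 287, 341, 360]          # "under 7", ..., "under 56" feet
freqs = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
mode = grouped_mode([0, 7, 14, 21, 28, 35, 42, 49], freqs, 7)
print(freqs, round(mode, 3))
```

The sketch assumes the modal class has at least one neighbouring class with a non-zero frequency; otherwise the denominator f1 + f2 would be zero.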

Empirical relation between mean, median and mode

Mean – Mode = 3(Mean-Median)

1) The following table gives the height of 1000 adult males (measured to the nearest quarter inch):

Height (in inches) Frequency


58-59.75 2
60-61.75 28
62-63.75 125
64-65.75 270
66-67.75 303
68-69.75 197
70-71.75 65
72-73.75 10

Calculate the mean, median and mode. Verify whether the empirical relation between them is
satisfied.

Solution:

Since there is a gap between the classes, we adjust the class limits.

Difference = 0.25

Divide the difference by 2, i.e., 0.25/2 = 0.125

Subtract 0.125 from each lower limit and add 0.125 to each upper limit.

So the given problem becomes

Height (in inches)   Frequency
57.875-59.875        2
59.875-61.875        28
61.875-63.875        125
63.875-65.875        270    (f1)
65.875-67.875        303    (modal class)
67.875-69.875        197    (f2)
69.875-71.875        65
71.875-73.875        10

Calculation of mode

l = 65.875; c = 2; f1 = frequency before the modal class = 270; f2 = frequency after the modal class = 197

\[ \text{Mode} = l + \frac{c\, f_2}{f_1 + f_2} = 65.875 + \frac{2(197)}{270 + 197} = 66.719 \]

Calculation of mean

Height (in inches)   Mid value (xr)   Frequency (fr)   frxr
57.875-59.875        58.875           2                117.75
59.875-61.875        60.875           28               1704.5
61.875-63.875        62.875           125              7859.375
63.875-65.875        64.875           270              17516.25
65.875-67.875        66.875           303              20263.125
67.875-69.875        68.875           197              13568.375
69.875-71.875        70.875           65               4606.875
71.875-73.875        72.875           10               728.75
Total                                 1000             66365

\[ \text{Mean} = \frac{\sum f_r x_r}{\sum f_r} = \frac{66365}{1000} = 66.365 \]

Calculation of median

\[ \text{Median} = l + \frac{N/2 - c.f}{f}\cdot c \]

N/2 = 1000/2 = 500

Height (in inches)   Frequency (fr)   Cumulative frequency
57.875-59.875        2                2
59.875-61.875        28               30
61.875-63.875        125              155
63.875-65.875        270              425     (c.f before the median class)
65.875-67.875        303              728     (median class, f = 303)
67.875-69.875        197              925
69.875-71.875        65               990
71.875-73.875        10               1000
Total                1000

\[ \text{Median} = l + \frac{N/2 - c.f}{f}\cdot c = 65.875 + \frac{500 - 425}{303}\cdot 2 = 66.37 \]

Mean = 66.365; median = 66.37; mode = 66.719

The empirical relation between mean, median and mode is

Mean - Mode = 3(Mean - Median)

i.e., Mode = 3 Median - 2 Mean = 3(66.37) - 2(66.365) = 199.11 - 132.73 = 66.38, which is close to the
calculated mode 66.719.

Hence the empirical relation between mean, median and mode is approximately satisfied.

The Mean deviation

The arithmetic mean of the absolute values of the deviations is called the mean deviation or the
average deviation.

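Individual observations:

\[ \text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n} \]

In symbols, for a frequency distribution,

Discrete series:

\[ \text{Mean deviation} = \frac{\sum f\,|x - \bar{x}|}{\sum f} \]

Sometimes the average deviation is also taken from the median.

\[ \text{Mean deviation about median} = \frac{\sum f\,|x - \text{Median}|}{\sum f} \]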
The standard deviation

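The square root of the arithmetic mean of the squares of the deviations is called the standard deviation
or the root mean square deviation.

Individual observations:

\[ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \]

Discrete series:

\[ \sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{\sum f}} \]

Continuous series (with deviations $d = \frac{x - A}{c}$ measured from an origin A in units of the class width c):

i) Root mean square deviation about A:

\[ s = c\,\sqrt{\frac{\sum f d^2}{\sum f}} \]

Here the final value must be multiplied by the width of the class interval c to get the value in
absolute units.

ii) If the origin A is taken as the arithmetic mean $\bar{x}$, this becomes the standard deviation:

\[ \sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{\sum f}} \]

iii) For any origin A,

\[ \sigma = c\,\sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}, \quad \text{where } d = \frac{x - A}{c} \]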

Relation between the standard deviation and the root mean square deviation

The root mean square deviation is least when the deviations are measured from the arithmetic mean;
that is, the standard deviation is the least possible root mean square deviation.

The square of the standard deviation is also known as the variance of the distribution.

Empirical relation between measures of dispersion

We have the following approximate relations between the different measures of dispersion:

\[ \text{Mean Deviation} = \frac{4}{5}\ \text{Standard Deviation} \]

\[ \text{Quartile Deviation} = \frac{2}{3}\ \text{Standard Deviation} \]
Examples:

1) Find the mean deviation and standard deviation of the heights (in inches) of 16 students given
below:

67, 65, 59, 61, 67, 69, 72, 67, 62, 64, 63, 66, 68, 69, 67, 60

Solution

Let us arrange the values in ascending order:

59, 60, 61, 62, 63, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 72

$\bar{x} = \frac{\sum x}{n} = \frac{1046}{16} = 65.375$

x       x - x̄      |x - x̄|    (x - x̄)²
59      -6.375     6.375      40.641
60      -5.375     5.375      28.891
61      -4.375     4.375      19.141
62      -3.375     3.375      11.391
63      -2.375     2.375      5.641
64      -1.375     1.375      1.891
65      -0.375     0.375      0.141
66      0.625      0.625      0.391
67      1.625      1.625      2.641
67      1.625      1.625      2.641
67      1.625      1.625      2.641
67      1.625      1.625      2.641
68      2.625      2.625      6.891
69      3.625      3.625      13.141
69      3.625      3.625      13.141
72      6.625      6.625      43.891
Total              47.25      195.756

$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n} = \frac{47.25}{16} = 2.953$

$\text{Standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{195.756}{16}} = \sqrt{12.23475} = 3.498$
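The two measures in this example can be checked with a short Python sketch. The function names `mean_deviation` and `standard_deviation` are illustrative choices, not from the text; both divide by n, as the formulas above do.

```python
import math

def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def standard_deviation(values):
    """Root of the mean squared deviation from the mean (divisor n)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

heights = [67, 65, 59, 61, 67, 69, 72, 67, 62, 64, 63, 66, 68, 69, 67, 60]
print(round(mean_deviation(heights), 3))      # 2.953
print(round(standard_deviation(heights), 3))  # 3.498
```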

2) Find the mean deviation and standard deviation for the following distribution

Marks             10   20   25   40   50
No. of students   20   30   50   15    5

Solution:

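$\text{Mean deviation} = \frac{\sum f |x - \bar{x}|}{\sum f}$

To find $\bar{x}$: for a frequency distribution the mean is weighted by the frequencies,

$\bar{x} = \frac{\sum f x}{\sum f} = \frac{2900}{120} = 24.167$

x      f     fx     x - x̄      |x - x̄|   (x - x̄)²   f|x - x̄|   f(x - x̄)²
10     20    200    -14.167    14.167    200.694    283.33     4013.89
20     30    600    -4.167     4.167     17.361     125.00     520.83
25     50    1250   0.833      0.833     0.694      41.67      34.72
40     15    600    15.833     15.833    250.694    237.50     3760.42
50     5     250    25.833     25.833    667.361    129.17     3336.81
Total  120   2900                                   816.67     11666.67

$\text{Mean deviation} = \frac{\sum f |x - \bar{x}|}{\sum f} = \frac{816.67}{120} = 6.806$

$\text{Standard deviation} = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{\frac{11666.67}{120}} = \sqrt{97.222} = 9.860$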

3) Compute mean deviation about mean, mean deviation about median and standard deviation for the
following distribution

class f
3.0-4.9 5
5.0-6.9 8
7.0-8.9 30
9.0-10.9 82
11.0-12.9 45
13.0-14.9 24
15.0-16.9 6
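Here there is a gap between consecutive classes, so we have to adjust the class limits.

Difference between the lower limit of a class and the upper limit of the preceding class = 5.0 - 4.9 = 0.1

Difference/2 = 0.05

Subtract 0.05 from each lower limit and add 0.05 to each upper limit.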
So the given problem becomes

class f
2.95-4.95 5
4.95-6.95 8
6.95-8.95 30
8.95-10.95 82
10.95-12.95 45
12.95-14.95 24
14.95-16.95 6
To find the mean deviation

Class         Mid value x   f     fx       x - x̄   |x - x̄|   (x - x̄)²   f|x - x̄|   f(x - x̄)²
2.95-4.95     3.95          5     19.75    -6.5    6.5       42.25      32.5       211.25
4.95-6.95     5.95          8     47.6     -4.5    4.5       20.25      36         162
6.95-8.95     7.95          30    238.5    -2.5    2.5       6.25       75         187.5
8.95-10.95    9.95          82    815.9    -0.5    0.5       0.25       41         20.5
10.95-12.95   11.95         45    537.75   1.5     1.5       2.25       67.5       101.25
12.95-14.95   13.95         24    334.8    3.5     3.5       12.25      84         294
14.95-16.95   15.95         6     95.7     5.5     5.5       30.25      33         181.5
Total                       200   2090                                  369        1158

$\text{Mean} = \frac{\sum f x}{\sum f} = \frac{2090}{200} = 10.45$

$\text{Mean deviation about mean} = \frac{\sum f |x - \bar{x}|}{\sum f} = \frac{369}{200} = 1.845$

$\text{Mean deviation about median} = \frac{\sum f |x - \text{Median}|}{\sum f}$, where

$\text{Median} = l + \frac{N/2 - c.f.}{f} \times c$

Calculate the median from this formula and then the mean deviation about it.

$\text{Standard deviation } \sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{\frac{1158}{200}} = \sqrt{5.79} = 2.406$

Here the deviations $x - \bar{x}$ are already in absolute units, so no further multiplication by the class width is needed.
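The grouped-data measures of this example, including the median step left to the reader, can be sketched in Python as below. This is a minimal sketch under the assumption that every class is represented by its mid value; the helper name `grouped_median` is illustrative. Deviations are taken directly in absolute units, so the class width enters only through the median formula.

```python
import math

# (lower boundary, upper boundary, frequency) after adjusting the class limits
classes = [(2.95, 4.95, 5), (4.95, 6.95, 8), (6.95, 8.95, 30), (8.95, 10.95, 82),
           (10.95, 12.95, 45), (12.95, 14.95, 24), (14.95, 16.95, 6)]

mids = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
N = sum(freqs)

mean = sum(f * x for x, f in zip(mids, freqs)) / N

def grouped_median(classes):
    """Median = l + ((N/2 - c.f.) / f) * c for the class that contains N/2."""
    total = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cum + f >= total / 2:
            return lo + (total / 2 - cum) / f * (hi - lo)
        cum += f

median = grouped_median(classes)
md_mean = sum(f * abs(x - mean) for x, f in zip(mids, freqs)) / N
md_median = sum(f * abs(x - median) for x, f in zip(mids, freqs)) / N
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / N)

print(round(mean, 2), round(median, 2))
print(round(md_mean, 3), round(md_median, 3), round(sd, 3))
```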
Coefficient of variation:

It is equal to the ratio of the standard deviation of a distribution to its A.M. and is often expressed as a percentage.

$\text{Coefficient of variation} = \frac{\text{S.D.}}{\text{A.M.}} \times 100$

Remark:

For comparing the variability of two series, we calculate the coefficient of variation for each series.

The series having the greater coefficient of variation is said to be more variable or less consistent than the other, and the series having the lesser coefficient of variation is said to be more consistent or less variable than the other.
Examples:

1) The following table gives 10 measurements of the same quantity under the same conditions by each of two observers A and B.

A 8.116 8.125 8.125 8.129 8.130 8.137 8.137 8.141 8.136 8.146
B 8.112 8.118 8.124 8.130 8.136 8.137 8.138 8.139 8.137 8.141

Calculate the mean and the standard deviation of each observer's measurements. Which observer do you think is probably the more reliable, and why?

Solution

For the measurements of observer A,

$\bar{x}_1 = \frac{81.322}{10} = 8.132$ nearly.

Calculation of standard deviation:

x               (x - x̄)²
8.116           0.000256
8.125           0.000049
8.125           0.000049
8.129           0.000009
8.130           0.000004
8.137           0.000025
8.137           0.000025
8.141           0.000081
8.136           0.000016
8.146           0.000196
Total: 81.322   0.000710

$\sigma_1 = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{0.00071}{10}} = \sqrt{0.000071} = 0.008426$

Hence the coefficient of variation for A $= \frac{\sigma_1}{\bar{x}_1} \times 100 = \frac{0.008426}{8.132} \times 100 = 0.1036$

For the measurements of observer B,

$\bar{x}_2 = \frac{81.312}{10} = 8.1312 = 8.131$ nearly.

x               (x - x̄)²
8.112           0.000361
8.118           0.000169
8.124           0.000049
8.130           0.000001
8.136           0.000025
8.137           0.000036
8.138           0.000049
8.139           0.000064
8.137           0.000036
8.141           0.000100
Total: 81.312   0.000890

$\sigma_2^2 = \frac{0.00089}{10} = 0.000089$

$\sigma_2 = 0.009434$

Hence the coefficient of variation for B $= \frac{\sigma_2}{\bar{x}_2} \times 100 = \frac{0.009434}{8.131} \times 100 = 0.1160$

The coefficient of variation in the case of A is smaller than that of B.

Hence A is more reliable than B.

2) The scores of two batsmen A and B in a series of matches are as follows:

A 37 43 28 62 59 20 83 48 52 47
B 35 52 77 38 26 58 63 31 40 46

Which of the two batsmen do you consider the more consistent, and which the more efficient?

Solution

Let x denote the scores of A and y the scores of B.

x            (x - x̄)²    y            (y - ȳ)²
37           118.81      35           134.56
43           24.01       52           29.16
28           396.01      77           924.16
62           198.81      38           73.96
59           123.21      26           424.36
20           778.41      58           129.96
83           1232.01     63           268.96
48           0.01        31           243.36
52           16.81       40           43.56
47           0.81        46           0.36
Total: 479   2888.90     Total: 466   2272.40

$\bar{x} = \frac{\sum x}{n} = 47.9$; $\bar{y} = \frac{\sum y}{n} = 46.6$

$\sigma_1^2 = \frac{2888.9}{10} = 288.89$

$\sigma_1 = 16.996 = 17$ nearly

$\sigma_2^2 = \frac{2272.4}{10} = 227.24$

$\sigma_2 = 15.07$

The coefficient of variation for A $= \frac{\sigma_1}{\bar{x}} \times 100 = \frac{17}{47.9} \times 100 = 35.49\%$

The coefficient of variation for B $= \frac{\sigma_2}{\bar{y}} \times 100 = \frac{15.07}{46.6} \times 100 = 32.34\%$

Since the A.M. of A is greater than the A.M. of B, we conclude that A is more efficient than B.

Since the coefficient of variation of B is less than the coefficient of variation of A, we conclude that B is more consistent than A.

Thus even though A is the better player, he is less consistent.
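The comparison of the two batsmen can be reproduced with the following Python sketch. The function name `coefficient_of_variation` is an illustrative choice; it uses the divisor n for the standard deviation, as in the worked solution.

```python
import math

def coefficient_of_variation(values):
    """Return (mean, S.D. with divisor n, C.V. in percent)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sd, sd / mean * 100

scores_a = [37, 43, 28, 62, 59, 20, 83, 48, 52, 47]
scores_b = [35, 52, 77, 38, 26, 58, 63, 31, 40, 46]

for name, scores in (("A", scores_a), ("B", scores_b)):
    mean, sd, cv = coefficient_of_variation(scores)
    print(f"{name}: mean={mean:.1f}, S.D.={sd:.2f}, C.V.={cv:.2f}%")
```

The higher mean identifies the more efficient player and the lower C.V. the more consistent one.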

Exercises:

1) The scores of two golfers A and B in 12 rounds are given below. Who is the better player and who is the more consistent player?

A 74 75 78 72 78 77 79 81 79 76 72 71
B 87 84 80 88 89 85 86 82 82 79 86 80

To find the S.D. of the combination of two groups

Let $n_1$, $\bar{x}_1$ and $\sigma_1$ be the size, the A.M. and the S.D. of a first set of variables, and let those for the second set be denoted by $n_2$, $\bar{x}_2$ and $\sigma_2$ respectively. Let $\bar{x}$ be the A.M. and $\sigma$ the S.D. of the combined set of $n_1 + n_2$ variables.

If $n_1 + n_2 = N$ then $N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 D_1^2 + n_2 D_2^2$

where $D_1 = \bar{x}_1 - \bar{x}$, $D_2 = \bar{x}_2 - \bar{x}$
Examples:

1) The numbers examined, the mean weight and standard deviation in each group of the examinations
by three medical examiners are given below. Find the mean weight and standard deviation of the
entire data when grouped together.

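Medical Examiner   No. Examined   Mean weight (kg)   S.D. (kg)
A                  50             56                 3
B                  60             60                 4
C                  90             58                 5

Solution

The mean weight of the entire data is

$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{50(56) + 60(60) + 90(58)}{200} = \frac{11620}{200} = 58.1$ kg

If $\sigma$ is the standard deviation of the entire data then

$N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_3\sigma_3^2 + n_1 D_1^2 + n_2 D_2^2 + n_3 D_3^2$

where $D_1 = \bar{x}_1 - \bar{x}$, $D_2 = \bar{x}_2 - \bar{x}$, $D_3 = \bar{x}_3 - \bar{x}$

and $N = n_1 + n_2 + n_3$

$D_1 = 56 - 58.1 = -2.1$

$D_2 = 60 - 58.1 = 1.9$

$D_3 = 58 - 58.1 = -0.1$

$N = n_1 + n_2 + n_3 = 200$

$N\sigma^2 = 50(3)^2 + 60(4)^2 + 90(5)^2 + 50(-2.1)^2 + 60(1.9)^2 + 90(-0.1)^2$

$N\sigma^2 = 4098$

i.e., $200\sigma^2 = 4098$

$\sigma^2 = 20.49$

$\sigma = 4.527$ kg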

Exercises:

1) Find the mean and S.D. of the following two samples put together

Sample No. Size Mean S.D


1 50 158 5.1
2 60 164 4.6
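The combined-group formula used in the medical examiners example above can be sketched in Python for any number of groups. The function name `combine_groups` and its tuple layout are assumptions made for illustration.

```python
import math

def combine_groups(groups):
    """groups: list of (size n_i, mean x_i, S.D. sigma_i).
    Returns (N, combined mean, combined S.D.) using
    N*sigma^2 = sum(n_i*sigma_i^2) + sum(n_i*D_i^2), D_i = x_i - combined mean."""
    N = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / N
    total = sum(n * s ** 2 + n * (m - mean) ** 2 for n, m, s in groups)
    return N, mean, math.sqrt(total / N)

examiners = [(50, 56, 3), (60, 60, 4), (90, 58, 5)]
N, mean, sd = combine_groups(examiners)
print(N, round(mean, 1), round(sd, 3))  # 200 58.1 4.527
```

The same call with two tuples handles the two-sample exercise.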

UNIT IV

Linear Correlation

Correlation coefficient

Correlation:

The existence of the changes in one variable in sympathy with the changes in the other is called
correlation.

Thus, whenever two variables are related in such a way that a change in one is followed directly or
inversely by a change in the other they are said to be correlated.

The Scatter Diagram

In the case of raw correlated data, we can represent them graphically. Let (x1, y1), (x2, y2), …, (xn, yn) be pairs of corresponding observations.

For example, x1, x2, …, xn may be the ages of husbands and y1, y2, …, yn may be the ages of wives. Plot the points (x1, y1), (x2, y2) etc. on a graph paper. The figure, which is simply a collection of dots, is called the dot diagram or the scatter diagram. From this scatter diagram we can guess roughly how the variables x and y are correlated.

If all the points in the scatter diagram seem to lie near a line as in figure 1(a), there is correlation between the variables and the correlation is called linear.

If all the points seem to cluster round some curve as in figure 1(b), the correlation is called nonlinear.

If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable then the correlation is said to be linear.

If the amount of change in one variable does not bear a constant ratio to the amount of
change in the other variable then the correlation is called nonlinear or curvilinear.

Coefficient of Correlation

Let $\bar{x}$ and $\bar{y}$ be the respective arithmetic means of x1, x2, …, xn and y1, y2, …, yn. There is said to be positive correlation between x and y if, for any assigned value of $x > \bar{x}$, the corresponding values of y tend to be $> \bar{y}$, and if for any assigned $x < \bar{x}$ the corresponding values of y tend to be $< \bar{y}$.

The correlation is said to be negative if for $x > \bar{x}$, y tends to be $< \bar{y}$, and if for $x < \bar{x}$, y tends to be $> \bar{y}$.

The quantity $P = \frac{\sum (x - \bar{x})(y - \bar{y})}{N}$ is said to be the covariance between x and y.

$r = \frac{P}{\sigma_x \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{N \sigma_x \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

r is called Pearson’s product moment correlation coefficient or correlation co-efficient.

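Computation of r - Direct Method

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{N \sigma_x \sigma_y}$

$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$; $\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{N}}$

Given a set of values of x and y, we can calculate $\sigma_x$ and $\sigma_y$ by using these formulas.

If $X = x - \bar{x}$ and $Y = y - \bar{y}$ then $\sigma_x = \sqrt{\frac{\sum X^2}{N}}$, $\sigma_y = \sqrt{\frac{\sum Y^2}{N}}$ and

$r = \frac{\sum XY}{N \sigma_x \sigma_y} = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$

Remark: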
Remark:

The correlation coefficient always lies between -1 and 1.

when r = -1 , it means that there is perfect negative correlation between the variables.

When r = +1, it means that there is perfect positive correlation between the variables.

When r = 0, it means that there is no linear relationship between the two variables.

Examples:

1) Calculate the Karl Pearson’s coefficient of correlation from the following data

Roll No. of students    1    2    3    4    5
Marks in Accountancy   48   35   17   23   47
Marks in Statistics    45   20   40   25   45

Solution

Let the marks in Accountancy be denoted by X and the marks in Statistics be denoted by Y.

$\bar{X} = \frac{\sum X}{n} = \frac{170}{5} = 34$; $\bar{Y} = \frac{\sum Y}{n} = \frac{175}{5} = 35$

Roll No.   X     X - X̄   (X - X̄)²   Y     Y - Ȳ   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
1          48    14      196        45    10      100        140
2          35    1       1          20    -15     225        -15
3          17    -17     289        40    5       25         -85
4          23    -11     121        25    -10     100        110
5          47    13      169        45    10      100        130
Total      170           776        175           550        280

$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{280}{\sqrt{776 \times 550}} = \frac{280}{653.30} = 0.429$
2) Calculate coefficient of correlation from the following data

X 100 200 300 400 500 600 700
Y 30 50 60 80 100 110 130

Solution

To simplify the calculation, let every value of X be divided by 100 and every value of Y be divided by 10, and denote these series by X' and Y'. Here $\bar{X}' = 28/7 = 4$ and $\bar{Y}' = 56/7 = 8$.

X       X' = X/100   X' - X̄'   (X' - X̄')²   Y     Y' = Y/10   Y' - Ȳ'   (Y' - Ȳ')²   (X' - X̄')(Y' - Ȳ')
100     1            -3        9            30    3           -5        25           15
200     2            -2        4            50    5           -3        9            6
300     3            -1        1            60    6           -2        4            2
400     4            0         0            80    8           0         0            0
500     5            1         1            100   10          2         4            2
600     6            2         4            110   11          3         9            6
700     7            3         9            130   13          5         25           15
Total   28                     28                 56                    76           46

$r = \frac{\sum (X' - \bar{X}')(Y' - \bar{Y}')}{\sqrt{\sum (X' - \bar{X}')^2 \sum (Y' - \bar{Y}')^2}} = \frac{46}{\sqrt{(28)(76)}} = \frac{46}{46.130} = 0.997$

3) Find the correlation coefficient between the heights of father and heights of son given below:

Height of father (in inches)   65   66   67   67   69   71   72   70   65
Height of son (in inches)      67   68   69   68   70   70   69   70   70

Solution

Let x and y be the heights of father and son respectively.

$\bar{x} = \frac{\sum x}{n} = \frac{612}{9} = 68$; $\bar{y} = \frac{\sum y}{n} = \frac{621}{9} = 69$

x            y     x - x̄   y - ȳ   (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
65           67    -3      -2      9          4          6
66           68    -2      -1      4          1          2
67           69    -1      0       1          0          0
67           68    -1      -1      1          1          1
69           70    1       1       1          1          1
71           70    3       1       9          1          3
72           69    4       0       16         0          0
70           70    2       1       4          1          2
65           70    -3      1       9          1          -3
Total: 612   621                   54         10         12

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{12}{\sqrt{(54)(10)}} = \frac{12}{23.238} = 0.5164$
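Computation of r - Shortcut Method:

Let $d_1 = x - A$ and $d_2 = y - B$ be the deviations from assumed origins A and B. Then

$p = \frac{\sum d_1 d_2}{N} - \frac{\sum d_1}{N} \cdot \frac{\sum d_2}{N}$

$\sigma_x^2 = \frac{\sum d_1^2}{N} - \left(\frac{\sum d_1}{N}\right)^2$

$\sigma_y^2 = \frac{\sum d_2^2}{N} - \left(\frac{\sum d_2}{N}\right)^2$

$r = \frac{p}{\sigma_x \sigma_y}$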

4) Find the co-efficient of correlation between industrial production and export using the following
data and comment on the result

Production (in crore tons)   55   56   58   59   60   60   62
Exports (in crore tons)      35   38   38   39   44   43   44

Solution:

Let x represent the product and y represent the export

Take the origin at 59 for x and 39 for y. We prepare the following table.

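x       y     d1 = x - 59   d2 = y - 39   d1²   d2²   d1d2
55      35    -4            -4            16    16    16
56      38    -3            -1            9     1     3
58      38    -1            -1            1     1     1
59      39    0             0             0     0     0
60      44    1             5             1     25    5
60      43    1             4             1     16    4
62      44    3             5             9     25    15
Total         -3            8             37    84    44

$p = \frac{\sum d_1 d_2}{N} - \frac{\sum d_1}{N} \cdot \frac{\sum d_2}{N} = \frac{44}{7} - \left(\frac{-3}{7}\right)\left(\frac{8}{7}\right) = \frac{308 + 24}{49} = \frac{332}{49}$

$\sigma_x^2 = \frac{\sum d_1^2}{N} - \left(\frac{\sum d_1}{N}\right)^2 = \frac{37}{7} - \left(\frac{-3}{7}\right)^2 = \frac{259 - 9}{49} = \frac{250}{49}$

$\sigma_y^2 = \frac{\sum d_2^2}{N} - \left(\frac{\sum d_2}{N}\right)^2 = \frac{84}{7} - \left(\frac{8}{7}\right)^2 = \frac{588 - 64}{49} = \frac{524}{49}$

$r = \frac{p}{\sigma_x \sigma_y} = \frac{332/49}{\sqrt{250/49}\,\sqrt{524/49}} = \frac{332}{\sqrt{250 \times 524}} = \frac{332}{361.9392} = 0.9173$

Since r is close to +1, there is a high degree of positive correlation between production and exports.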
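The direct and shortcut computations of r can be compared with a small Python sketch. The function names are illustrative assumptions; `a` and `b` are the assumed origins A and B of the shortcut method. Both functions should return the same value for any choice of origins.

```python
import math

def pearson_direct(xs, ys):
    """r = sum(XY) / sqrt(sum(X^2) * sum(Y^2)), X = x - mean(x), Y = y - mean(y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_shortcut(xs, ys, a, b):
    """Same r from deviations d1 = x - a, d2 = y - b about assumed origins a, b."""
    n = len(xs)
    d1 = [x - a for x in xs]
    d2 = [y - b for y in ys]
    p = sum(u * v for u, v in zip(d1, d2)) / n - (sum(d1) / n) * (sum(d2) / n)
    var_x = sum(u * u for u in d1) / n - (sum(d1) / n) ** 2
    var_y = sum(v * v for v in d2) / n - (sum(d2) / n) ** 2
    return p / math.sqrt(var_x * var_y)

production = [55, 56, 58, 59, 60, 60, 62]
exports = [35, 38, 38, 39, 44, 43, 44]
print(round(pearson_shortcut(production, exports, 59, 39), 4))  # 0.9173
print(round(pearson_direct(production, exports), 4))            # 0.9173
```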

Exercises:

1) Calculate Pearson's coefficient of correlation from the following data, taking 44 and 26 respectively as the origins of x and y (that is, use the deviations $d_1 = x - 44$ and $d_2 = y - 26$ in the shortcut method).

X 43 44 46 40 44 42 45 42 38 40 42 57
Y 29 31 19 18 19 27 27 29 41 30 26 10

$r = \frac{N\sum d_1 d_2 - \sum d_1 \sum d_2}{\sqrt{N\sum d_1^2 - (\sum d_1)^2}\,\sqrt{N\sum d_2^2 - (\sum d_2)^2}}$

(r = -0.733)

2)From the following table calculate the coefficient of correlation by Karl Pearson’s method

X 6 2 10 4 8
Y 9 11 ? 8 7
The arithmetic means of x and y are 6 and 8 respectively.

Solution:

From the mean value of y we can find the missing value:

(9 + 11 + ? + 8 + 7)/5 = 8 (given)

Missing value = 40 - 35 = 5

(r = -0.919)

3) The following table gives the indices of industrial production and the number of registered unemployed (in hundred thousand). Calculate the value of the correlation coefficient.

Year                  1991   1992   1993   1994   1995   1996   1997   1998
Index of production   100    102    104    107    105    112    103    99
Number unemployed     15     12     13     11     12     12     19     26

Regression Lines

Let (x1, y1), (x2, y2), …, (xn, yn) be n observations of the two variables. If we plot these n points on a graph paper, it may happen that these points tend to cluster along some well defined lines. These are called regression lines.

Regression Equations

Regression equations, also known as estimating equations, are algebraic expressions of the regression lines. There are two regression equations.

The regression equation of X on Y is used to describe the variations in the values of X for given changes in Y, and the regression equation of Y on X is used to describe the variations in the values of Y for given changes in X.

Regression equation of Y on X

$Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$

Regression equation of X on Y

$X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$

Remark:

1. The regression line of y on x is used to find the probable value or expected value of y for a given value of x.

2. The regression line of x on y is used to find the probable or expected value of x for a given value of y.

3. Both the regression lines pass through $(\bar{x}, \bar{y})$.

4. The quantities $r \frac{\sigma_y}{\sigma_x}$ and $r \frac{\sigma_x}{\sigma_y}$ are called the regression coefficients.

5. If the regression coefficients are both positive, then r is positive. If the regression coefficients are both negative then r is negative.

6. $r = \frac{p}{\sigma_x \sigma_y}$; $r^2 = \frac{p^2}{\sigma_x^2 \sigma_y^2} = \frac{p}{\sigma_x^2} \cdot \frac{p}{\sigma_y^2}$

Hence r is the G.M. between the two regression coefficients.

7. Since $r = \frac{p}{\sigma_x \sigma_y}$, we have $r \frac{\sigma_y}{\sigma_x} = \frac{p}{\sigma_x^2}$ and $r \frac{\sigma_x}{\sigma_y} = \frac{p}{\sigma_y^2}$.

Hence the regression lines have the equations

$y - \bar{y} = r \frac{\sigma_y}{\sigma_x}(x - \bar{x}) = \frac{p}{\sigma_x^2}(x - \bar{x})$ and $x - \bar{x} = r \frac{\sigma_x}{\sigma_y}(y - \bar{y}) = \frac{p}{\sigma_y^2}(y - \bar{y})$

Note:

1) $b_{xy} = r \frac{\sigma_x}{\sigma_y}$ is the regression coefficient of X on Y.

2) $b_{yx} = r \frac{\sigma_y}{\sigma_x}$ is the regression coefficient of Y on X.

3) $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, with the sign of the regression coefficients.

Examples:

1) The following data relate to the scores obtained by 9 salesmen of a company in an intelligence test
and their weekly sales in thousand rupees.

Test 50 60 50 60 80 50 80 40 70
Scores
Weekly 30 60 40 50 60 30 70 50 60
Sales

a) Obtain the regression equation of sales on intelligence test scores of the salesmen.

b) If the intelligence test score of a salesman is 64, what would be his expected weekly sales?

2) The following regression equations were obtained from a correlation table:

y = 0.516x + 33.73

x = 0.512y + 32.52

Find the value of

i) the correlation coefficient

ii) the mean of x

iii) the mean of y

Solution

The equations of the regression lines are of the form

$y - \bar{y} = \frac{p}{\sigma_x^2}(x - \bar{x})$ and $x - \bar{x} = \frac{p}{\sigma_y^2}(y - \bar{y})$

so the regression coefficients are $\frac{p}{\sigma_x^2} = 0.516$ and $\frac{p}{\sigma_y^2} = 0.512$. They are both positive.

Hence the correlation is direct.

Since the regression coefficients are given, r is the G.M. of the two regression coefficients:

Correlation coefficient $r = \frac{p}{\sigma_x \sigma_y} = \sqrt{\frac{p}{\sigma_x^2} \cdot \frac{p}{\sigma_y^2}} = \sqrt{0.516 \times 0.512} = 0.514$

$(\bar{x}, \bar{y})$ is the point of intersection of the regression lines

y = 0.516x + 33.73

x = 0.512y + 32.52

Write these as

-0.516x + y = 33.73 ........(1)

x - 0.512y = 32.52 ........(2)

Multiplying (2) by 0.516:

0.516x - 0.264y = 16.78 ........(3)

Adding (1) and (3):

0.736y = 50.51

y = 68.63

Substituting in (1),

x = 67.64

Hence the means are $(\bar{x}, \bar{y}) = (67.64, 68.63)$.
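The two steps of this solution (r as the geometric mean of the regression coefficients, and the means as the intersection of the two lines) can be sketched in Python. The function name and argument layout are assumptions; the inputs are the coefficients 0.516 and 0.512 and the constants 33.73 and 32.52 used in the worked solution. The unrounded results agree with the hand calculation up to rounding.

```python
import math

def from_regression_lines(b_yx, c_y, b_xy, c_x):
    """Lines y = b_yx*x + c_y and x = b_xy*y + c_x.
    Returns (r, mean of x, mean of y)."""
    # r is the geometric mean of the coefficients, carrying their common sign
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    # Substitute x = b_xy*y + c_x into y = b_yx*x + c_y and solve for y
    y_bar = (b_yx * c_x + c_y) / (1 - b_yx * b_xy)
    x_bar = b_xy * y_bar + c_x
    return r, x_bar, y_bar

r, x_bar, y_bar = from_regression_lines(0.516, 33.73, 0.512, 32.52)
print(round(r, 3), round(x_bar, 2), round(y_bar, 2))
```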

3) From the following data, find the most likely value of y when x = 24

        y       x
Mean    958.8   18.1
S.D.    36.4    2.0

Also r = 0.58

Solution

The equation of the regression line of y on x is

$y - \bar{y} = r \frac{\sigma_y}{\sigma_x}(x - \bar{x})$

$y - 958.8 = (0.58)\frac{36.4}{2}(x - 18.1)$

y - 958.8 = 10.556x - 191.064

y = 10.556x + 767.736

Putting x = 24,

y = 253.344 + 767.736 = 1021.08
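The prediction from the regression line of y on x can be written as a small Python function. The name `predict_y` and its keyword arguments are illustrative; the inputs are the summary values given in this example.

```python
def predict_y(x, mean_x, mean_y, sd_x, sd_y, r):
    """Regression line of y on x: y - mean_y = r * (sd_y / sd_x) * (x - mean_x)."""
    slope = r * sd_y / sd_x
    return mean_y + slope * (x - mean_x)

y_hat = predict_y(24, mean_x=18.1, mean_y=958.8, sd_x=2.0, sd_y=36.4, r=0.58)
print(round(y_hat, 2))  # 1021.08
```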

Exercises:

4) Find the equation of regression lines for the data given below:

X 25 28 35 32 36 36 29 38 34 22
Y 43 46 49 41 36 32 31 30 33 39

5) Find the regression line of y on x

X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15

Also obtain an estimate of y which should correspond on an average to x =6.2

Rank Correlation:

$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ where d is the difference between the ranks of X and Y for each pair of observations.
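The rank correlation formula can be illustrated with a short Python sketch. The data below are hypothetical (not from the text) and contain no tied values; the simple `ranks` helper assumes that, and tied data would need averaged ranks, which this sketch does not handle.

```python
def ranks(values):
    """Rank 1 for the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), d = difference of ranks."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical marks of 5 students in two subjects (illustrative only)
maths = [78, 65, 90, 54, 71]
physics = [70, 60, 85, 66, 58]
print(spearman_rho(maths, physics))  # 0.6
```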

Exercise:

Obtain the rank correlation between the variables X and Y from the following pairs of observed
values

X 50 55 65 50 55 60 50 65 70 75
Y 110 110 115 125 140 115 130 120 115 160
UNIT-IV

Fitting a straight line by the method of least squares

Let $(x_i, y_i)$, i = 1, 2, …, n be the n sets of observations.

Let $y = a_0 + a_1 x$ be the best fit to the data.

The normal equations are

$na_0 + a_1\sum x = \sum y$

$a_0\sum x + a_1\sum x^2 = \sum xy$

The solutions are

$a_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2}$

$a_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
Examples:

1) Fit a straight line for the table of values given below:

Independent variable x   1   2   4   5    6    8    9
Dependent variable y     2   5   7   10   12   15   19

Solution

Let the straight line be

$y = a_1 x + a_0$

The normal equations are

$na_0 + a_1\sum x = \sum y$

$a_0\sum x + a_1\sum x^2 = \sum xy$

x           y     x²    xy
1           2     1     2
2           5     4     10
4           7     16    28
5           10    25    50
6           12    36    72
8           15    64    120
9           19    81    171
Total: 35   70    227   453

$a_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} = \frac{(70)(227) - (35)(453)}{7(227) - (35)^2} = \frac{35}{364} = 0.0962$

$a_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{7(453) - (35)(70)}{7(227) - (35)^2} = \frac{721}{364} = 1.9808$

Hence the fitted line is y = 1.9808x + 0.0962
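The closed-form solutions of the normal equations translate directly into Python. The function name `fit_line` is an illustrative choice; it returns the intercept first and the slope second.

```python
def fit_line(xs, ys):
    """Least-squares line y = a0 + a1*x from the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    a0 = (sy * sxx - sx * sxy) / denom
    a1 = (n * sxy - sx * sy) / denom
    return a0, a1

xs = [1, 2, 4, 5, 6, 8, 9]
ys = [2, 5, 7, 10, 12, 15, 19]
a0, a1 = fit_line(xs, ys)
print(round(a0, 4), round(a1, 4))  # 0.0962 1.9808
```

Swapping the two lists gives the line of x on y described in the remark that follows.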

Remark:

If for the same table of values we assume y to be the independent variable and x to be the dependent variable, we consider an equation of the type

$x = a_1 y + a_0$

In this case

$a_0 = \frac{\sum x \sum y^2 - \sum y \sum xy}{n\sum y^2 - (\sum y)^2}$

$a_1 = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$

Polynomial Regression

Assume that n pairs of coordinates $(x_i, y_i)$ are given which are to be approximated by a quadratic. Let the quadratic curve be represented by

$y = a_2x^2 + a_1x + a_0$

The normal equations are

$na_0 + a_1\sum x + a_2\sum x^2 = \sum y$
$a_0\sum x + a_1\sum x^2 + a_2\sum x^3 = \sum xy$
$a_0\sum x^2 + a_1\sum x^3 + a_2\sum x^4 = \sum x^2 y$

Examples:

1) Fit the quadratic curve to the following data:

X -4 -3 -2 -1 0 1 2 3 4 5
Y 21 12 4 1 2 7 15 30 45 67
Solution

Let $y = a_2x^2 + a_1x + a_0$

The normal equations are

$na_0 + a_1\sum x + a_2\sum x^2 = \sum y$
$a_0\sum x + a_1\sum x^2 + a_2\sum x^3 = \sum xy$
$a_0\sum x^2 + a_1\sum x^3 + a_2\sum x^4 = \sum x^2 y$

x          y     x²    x³     x⁴     xy     x²y
-4         21    16    -64    256    -84    336
-3         12    9     -27    81     -36    108
-2         4     4     -8     16     -8     16
-1         1     1     -1     1      -1     1
0          2     0     0      0      0      0
1          7     1     1      1      7      7
2          15    4     8      16     30     60
3          30    9     27     81     90     270
4          45    16    64     256    180    720
5          67    25    125    625    335    1675
Total: 5   204   85    125    1333   513    3193

With n = 10, the normal equations are

$10a_0 + 5a_1 + 85a_2 = 204$
$5a_0 + 85a_1 + 125a_2 = 513$
$85a_0 + 125a_1 + 1333a_2 = 3193$

Solving these, we get

a2 = 1.98; a1 = 3.00; a0 = 2.03
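The three normal equations can be assembled and solved numerically. The sketch below uses only the standard library; `solve_linear` is a small Gauss-Jordan helper written for this illustration (an assumption, not a method named in the text), and `fit_quadratic` builds the sums that appear in the normal equations.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_quadratic(xs, ys):
    """Return (a0, a1, a2) of y = a0 + a1*x + a2*x^2 from the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]               # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve_linear(a, t)

xs = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
ys = [21, 12, 4, 1, 2, 7, 15, 30, 45, 67]
a0, a1, a2 = fit_quadratic(xs, ys)
print(round(a0, 2), round(a1, 2), round(a2, 2))
```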

Fitting Exponential and Trigonometric equations:

Let $y = ae^{bx}$ be the curve to be fitted.

Taking logarithms (base 10) on both sides

$\log_{10} y = \log_{10} a + bx \log_{10} e$

Y = A + Bx where $Y = \log_{10} y$, $A = \log_{10} a$, $B = b \log_{10} e$

The normal equations are

$\sum Y = nA + B\sum x$
$\sum xY = A\sum x + B\sum x^2$

Examples:

1) Fit a curve $y = ae^{bx}$ to the following data:

x   0   5     8   12     20
y   3   1.5   1   0.55   0.18

Solution

x           y      Y = log10 y   x²    xY
0           3      0.4771        0     0
5           1.5    0.1761        25    0.8805
8           1      0             64    0
12          0.55   -0.2596       144   -3.1152
20          0.18   -0.7447       400   -14.894
Total: 45          -0.3511       633   -17.1287

The normal equations are

5A + 45B = -0.3511
45A + 633B = -17.1287

Solving these, A = 0.4815; B = -0.0613

$\log_{10} a = 0.4815$

a = 10^(0.4815) = antilog(0.4815) = 3.0304

B = -0.0613

$b = B/\log_{10} e = -0.0613/0.4343 = -0.1411$

$y = 3.0304\, e^{-0.1411x}$
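The log-linear fit above can be sketched in Python as follows. The function name `fit_exponential` is illustrative. Because the sketch keeps full precision in the logarithms, its constants differ slightly in the third or fourth decimal from the hand calculation with four-figure logs.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by a least-squares line through (x, log10 y)."""
    n = len(xs)
    Y = [math.log10(y) for y in ys]
    sx, sY = sum(xs), sum(Y)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * v for x, v in zip(xs, Y))
    B = (n * sxY - sx * sY) / (n * sxx - sx ** 2)   # slope of Y on x
    A = (sY - B * sx) / n                           # intercept, equals log10(a)
    return 10 ** A, B / math.log10(math.e)          # a = 10^A, b = B / log10(e)

xs = [0, 5, 8, 12, 20]
ys = [3, 1.5, 1, 0.55, 0.18]
a, b = fit_exponential(xs, ys)
print(round(a, 3), round(b, 4))
```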

Examples:

2) Fit a curve $y = ab^x$ to the following data:

x 1 2 3 4 5 6
y 151 100 61 50 20 8

Solution:

$y = ab^x$

Taking $\log_{10}$ on both sides

$\log_{10} y = \log_{10} a + x \log_{10} b$

Y = A + Bx where $Y = \log_{10} y$, $A = \log_{10} a$, $B = \log_{10} b$

The normal equations are

$\sum Y = nA + B\sum x$
$\sum xY = A\sum x + B\sum x^2$

x           y     Y = log10 y   xY        x²
1           151   2.1790        2.1790    1
2           100   2             4         4
3           61    1.7853        5.3559    9
4           50    1.6990        6.796     16
5           20    1.3010        6.505     25
6           8     0.9031        5.4186    36
Total: 21         9.8674        30.2545   91

Substituting the totals,

9.8674 = 6A + 21B
30.2545 = 21A + 91B

Solving these

A = 2.5010, B = -0.2447

$\log_{10} a = 2.5010$

a = 10^2.5010 = 316.9567

b = 10^B = 0.5692

$y = 316.9567\,(0.5692)^x$

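3) Fit a curve $y = ax^b$ to the following data:

X 1 2 3 4 5 6
Y 1200 900 600 200 110 50

Solution:

$y = ax^b$

$\log_{10} y = \log_{10} a + b \log_{10} x$

Y = A + bX where $Y = \log_{10} y$, $A = \log_{10} a$, $X = \log_{10} x$

A = 3.3086

b = -1.7494

$y = 2035\, x^{-1.7494}$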
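The intermediate steps of the power-curve fit are not shown above; a minimal Python sketch of them is given here. It fits a straight line through the points (log10 x, log10 y); the function name `fit_power` is an assumption, and full-precision logarithms give constants that match the stated answer only up to rounding.

```python
import math

def fit_power(xs, ys):
    """Fit y = a*x^b by a least-squares line through (log10 x, log10 y)."""
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    n = len(xs)
    sX, sY = sum(X), sum(Y)
    sXX = sum(u * u for u in X)
    sXY = sum(u * v for u, v in zip(X, Y))
    b = (n * sXY - sX * sY) / (n * sXX - sX ** 2)   # slope is the exponent b
    A = (sY - b * sX) / n                           # intercept is log10(a)
    return 10 ** A, b

xs = [1, 2, 3, 4, 5, 6]
ys = [1200, 900, 600, 200, 110, 50]
a, b = fit_power(xs, ys)
print(round(a), round(b, 3))
```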

LARGE SAMPLES

Definition

The group of individuals under study is called population or universe.

The population may be finite or infinite

Definition

A finite subset of statistical individuals in a population is called sample.

Definition

The number of individuals in a sample is called sample size.

Example

In a shop we assess the quality of rice or any other commodity by taking a handful of it from the bag and then decide whether to purchase it or not.

Parameters and statistics

The statistical constants of the population, namely the mean $\mu$ and the variance $\sigma^2$, are usually referred to as parameters.

Statistical measures computed from the sample observations alone, for example the mean and the variance of the sample, are usually referred to as statistics.

Sampling Distribution

If we draw a sample of size n from a given finite population of size N then the total number of possible samples is

$^N C_n = \frac{N!}{n!(N - n)!} = k$

For each of these k samples we can compute some statistic, say t = t(x1, x2, …, xn), in particular the mean, the variance etc.

The set of values of the statistic so obtained, one for each sample, constitutes the sampling distribution of the statistic.

Standard error:

The standard deviation of sampling distribution of a statistic is known as its standard error and it is
denoted by (S.E.)
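For a small finite population the sampling distribution of the mean and its standard error can be enumerated directly. The population below is hypothetical (not from the text), and samples are drawn without replacement, which is an assumption of this sketch.

```python
import math
from itertools import combinations

# Hypothetical finite population of N = 5 values (illustrative only)
population = [2, 4, 6, 8, 10]
n = 2

samples = list(combinations(population, n))   # all k = NCn possible samples
k = len(samples)

# Sampling distribution of the mean: one value of the statistic per sample
means = [sum(s) / n for s in samples]

# Standard error = standard deviation of the sampling distribution
grand_mean = sum(means) / k
standard_error = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / k)
print(k, grand_mean, round(standard_error, 4))
```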

Test of Significance:

A very important aspect of the sampling theory is the study of tests of significance, which enable us to decide on the basis of the sample results if

i) The deviation between the observed sample statistic and the hypothetical parameter value is
significant.

ii) The deviation between two sample statistics is significant.

Null hypothesis:

For applying the test of significance we first set up a hypothesis, a definite statement about the population parameter. Such a hypothesis is usually a hypothesis of no difference and it is denoted by H0.

Alternative hypothesis:

Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis, usually denoted by H1.

For example

If we want to test the null hypothesis that the population has a specified mean $\mu_0$ (say), i.e., H0: $\mu = \mu_0$,

then the alternative hypothesis would be

i) H1: $\mu \neq \mu_0$ (i.e., either $\mu > \mu_0$ or $\mu < \mu_0$)

ii) H1: $\mu > \mu_0$

iii) H1: $\mu < \mu_0$

The alternative hypothesis (i) is known as a two tailed alternative, the alternative in (ii) is known as right tailed and (iii) is known as left tailed.

The setting of alternative hypothesis is very important to decide whether we have to use a
single tailed (right or left) or two tailed test.

Errors in Sampling

The main objective in sampling theory is to draw valid inferences about the population parameters on the basis of the sample results. In practice we decide to accept or reject the lot after examining a sample from it. We have two types of errors.

Type I error: Reject H0 when it is true.

Type II error: Accept H0 when it is wrong.

Critical region

A region corresponding to a statistic t in the sample space S which leads to the rejection of H0 is called the critical region or rejection region. The region which leads to the acceptance of H0 is called the acceptance region.

Level of Significance

The probability $\alpha$ that a random value of the statistic t belongs to the critical region is known as the level of significance. In other words, the level of significance is the size of the Type I error. The levels of significance usually employed in testing of hypothesis are 5% and 1%.

One tailed and Two tailed test:

A test of any statistical hypothesis where the alternative hypothesis is one tailed (right tailed or left tailed) is called a one tailed test.

A test of a statistical hypothesis where the alternative hypothesis is two tailed is called a two tailed test.

Procedure for testing of hypothesis:

i) Set up the null hypothesis.

ii) Choose the appropriate level of significance (either 5% or 1%). This is to be decided before the sample is drawn.

iii) Compute the test statistic $z = \frac{t - E(t)}{S.E.(t)}$ under the null hypothesis.

iv) Compare the computed value of z in step (iii) with the significant value at the given level of significance.

If $|z| < 1.96$, H0 may be accepted at 5% level of significance.

If $|z| > 1.96$, H0 may be rejected at 5% level of significance.

If $|z| < 2.58$, H0 may be accepted at 1% level of significance.

If $|z| > 2.58$, H0 may be rejected at 1% level of significance.

For a single tailed test (right tail or left tail) we compare the computed value of z with 1.645 (at 5% level) and 2.33 (at 1% level) and accept or reject H0 accordingly.

Remark:

       Two tailed   One tailed
5%     1.96         1.645
1%     2.58         2.33

Calculated value < table value: accept the null hypothesis

Calculated value > table value: reject the null hypothesis
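The table values above are quantiles of the standard normal distribution and can be recovered with Python's standard library. The helper name `critical_z` is an illustrative choice; the exact quantiles are 2.576 and 2.326 at the 1% level, which the table rounds to 2.58 and 2.33.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Critical value of the standard normal for significance level alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.05, 0.01):
    two = critical_z(alpha)
    one = critical_z(alpha, two_tailed=False)
    print(f"{alpha:.0%}: two tailed {two:.3f}, one tailed {one:.3f}")
```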

Large Samples:

Definition

If the size of the sample n > 30 then the sample is called a large sample.

Test of Significance of large samples:

There are 4 important tests of significance for large samples.

1. Test of significance for single proportion

2. Test of significance for difference of proportions

3. Test of significance for single mean

4. Test of significance for difference of means

1. Test of significance for single Proportion

Suppose a large sample of size n is taken from a normal population. To test the significance of the difference between the sample proportion p and the population proportion P, we use the test statistic

$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}}$ where n is the sample size and Q = 1 - P.

Note: Limits for the population proportion P are given by $p \pm 3\sqrt{\frac{pq}{n}}$ where q = 1 - p.

Examples:

1) A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specifications. An examination of a sample of 200 pieces of equipment revealed that 18 were faulty. Test his claim at 5% level of significance.

Solution

Given sample size n = 200

Number of pieces conforming to specifications = 200 - 18 = 182

p = proportion of pieces conforming to specifications $= \frac{182}{200} = 0.91$

P = population proportion $= \frac{95}{100} = 0.95$, so Q = 1 - P = 0.05

Null hypothesis H0: The proportion of pieces conforming to specifications is 95%, i.e., P = 0.95

Alternative hypothesis H1: P < 0.95 (one tailed alternative)

Test statistic $z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.91 - 0.95}{\sqrt{\frac{0.95 \times 0.05}{200}}} = -2.60$

Since the alternative hypothesis is one tailed, the tabulated value of z at 5% level of significance is 1.645.

Since the calculated value $|z| = 2.6$ is > 1.645, we reject the null hypothesis H0 at 5% level of significance.

Hence the manufacturer's claim is rejected.
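The test statistic for a single proportion, and the limits $p \pm 3\sqrt{pq/n}$ from the note above, can be sketched in Python. The function names and the multiplier argument `k` are assumptions made for illustration; the data are those of the manufacturer example.

```python
import math

def proportion_z(successes, n, P):
    """z = (p - P) / sqrt(P*Q/n) for a sample proportion p = successes/n."""
    p = successes / n
    return (p - P) / math.sqrt(P * (1 - P) / n)

def proportion_limits(successes, n, k=3.0):
    """Limits p +/- k*sqrt(p*q/n) for the population proportion."""
    p = successes / n
    half_width = k * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

z = proportion_z(182, 200, 0.95)
print(round(z, 2))                     # -2.6
print(z < -1.645)                      # True: reject H0 in the left tailed test at 5%
print(proportion_limits(182, 200))
```

Passing k=2.33 instead of the default gives 98% confidence limits.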

2) In a sample of 1000 people in Karnataka 540 are rice eaters and the rest are wheat eaters. Can we
assume that both rice and wheat are equally popular in this state at 1% level of significance?

Solution

Given n = 1000

p = sample proportion of rice eaters $= \frac{540}{1000} = 0.54$

P = population proportion of rice eaters $= \frac{1}{2} = 0.5$

Q = 1 - P = 0.5

Null hypothesis H0: Both rice and wheat are equally popular in the state, i.e., P = 1/2

Alternative hypothesis H1: $P \neq 0.5$ (two tailed alternative)

Test statistic

$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.54 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{1000}}} = 2.53$

The tabulated value of z at 1% level of significance is 2.58 for a two tailed test.

Since the calculated value of z < the tabulated value of z, we accept H0.

i.e., rice and wheat are equally popular in that state.

3) In a sample of 400 parts manufactured by a factory, the number of defective parts was found to be
30. The company, however claimed that only 5% of their product is defective. Is the claim tenable?

Solution

Given n = 400

No. of defectives in the sample = 30

p = proportion of defectives in the sample $= \frac{30}{400} = 0.075$

P = the population proportion $= \frac{5}{100} = 0.05$

$\therefore Q = 1 - P = 1 - 0.05 = 0.95$

Null hypothesis H0: The company's claim P = 0.05 is acceptable.

Alternative hypothesis H1: P > 0.05 (one tailed alternative)

Test statistic

$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.075 - 0.050}{\sqrt{\frac{0.05 \times 0.95}{400}}} = 2.29$

Since the alternative hypothesis is a one tailed alternative, we apply a one tailed test.

The tabulated value of z at 5% level of significance for a one tailed test is 1.645.

Since the calculated value of z > the tabulated value of z, we reject the null hypothesis.

i.e., the company's claim that only 5% of their product is defective is not acceptable.

4) A die was thrown 9000 times and of these 3220 yield a 3 or 4. Is this consistent with the hypothesis
that the die was unbiased?

Solution

Given n = 9000

p = proportion of successes (getting 3 or 4) in 9000 throws $= \frac{3220}{9000} = 0.3578$

P = population proportion of success

= P(getting a 3 or 4)

= P(getting 3) + P(getting 4)

$= \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$

P = 0.3333

$\therefore Q = 1 - P = 0.6667$

Null hypothesis H0: The die is unbiased, i.e., $P = \frac{1}{3}$

Alternative hypothesis H1: $P \neq \frac{1}{3}$ (two tailed alternative)

Test statistic

$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.3578 - 0.3333}{\sqrt{\frac{(0.3333)(0.6667)}{9000}}} = 4.92$

Since the alternative hypothesis is a two tailed alternative, we apply a two tailed test.

The tabulated value of z for a two tailed test at 5% level of significance is 1.96.

Since the calculated value of z > the tabulated value of z, the null hypothesis is rejected.

i.e., the die is biased.

5) A random sample of 500 pineapples was taken from a large consignment and 65 were found to be bad. Find the limits for the percentage of bad pineapples in the consignment.

Solution

Given n = 500

p = proportion of bad pineapples in the sample $= \frac{65}{500} = 0.13$

q = 1 - p = 0.87

We know that the limits for the population proportion P are given by

$p \pm 3\sqrt{\frac{pq}{n}} = 0.13 \pm 3\sqrt{\frac{0.13(0.87)}{500}} = 0.13 \pm 0.045 = (0.085, 0.175)$

$\therefore$ The percentage of bad pineapples in the consignment lies between 8.5% and 17.5%.

6) A random sample of 500 apples was taken from a large consignment and 60 were found to be bad.
Obtain the 98% confidence limits for the percentage number of bad apples in the consignment.

Solution:

Given n = 500

p = proportion of bad apples in the sample $= \frac{60}{500} = 0.12$

q = 0.88

We know that the 98% confidence limits for the population proportion are

$p \pm 2.33\sqrt{\frac{pq}{n}} = 0.12 \pm 2.33\sqrt{\frac{0.12 \times 0.88}{500}} = 0.12 \pm 2.33(0.01453) = (0.08615, 0.15385)$

$\therefore$ The 98% confidence limits for the percentage of bad apples in the consignment are (8.61%, 15.39%).

Difference of Proportions

Suppose 2 large samples of sizes n1 and n2 are taken respectively from 2 different populations.

To test the significance of the difference between the sample proportions p1 and p2, we use the test statistic

$z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ where $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and q = 1 - p

Examples:

1) Random samples of 400 men and 600 women were asked whether they would like to have a
flyover near their residence. 200 men and 325 women were in favour of the proposal. Test the
hypothesis that proportions of men and women in favour of the proposal are same at 5% level of
significance?

Solution

Given sample sizes

n1 = 400, n2 = 600

Proportion of men in favour, p1 = $\frac{200}{400} = 0.5$

Proportion of women in favour, p2 = $\frac{325}{600} = 0.541$

Null hypothesis H0: Assume that there is no significant difference between the opinions of men and
women as far as the proposal of the flyover is concerned,

i.e., H0: P1 = P2

Alternative hypothesis H1: $P_1 \neq P_2$ (two-tailed alternative)

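Test statistic

\[ z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]

where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$.

\[ p = \frac{400 \times \frac{200}{400} + 600 \times \frac{325}{600}}{400 + 600} = \frac{525}{1000} = 0.525 \]

q = 1 - p = 1 - 0.525 = 0.475

\[ z = \frac{0.5 - 0.541}{\sqrt{0.525(0.475)\left(\dfrac{1}{400} + \dfrac{1}{600}\right)}} = \frac{-0.041}{0.032} = -1.28 \]

\[ |z| = 1.28 \]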

The tabulated value of z for two tailed test at 5% level of significance is 1.96.

Since calculated value of z < tabulated value , we accept the null hypothesis at 5% level of
significance.

i.e., there is no difference of opinion between men and women as far as the proposal of the flyover is
concerned.
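The pooled test for the difference of two proportions can be sketched in Python as follows. The function name `two_proportion_z` is an assumption made for this illustration; it takes the counts of successes rather than the rounded proportions, so the result differs slightly from the hand value -1.28.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference of two sample proportions using the pooled p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)      # pooled proportion (n1*p1 + n2*p2) / (n1 + n2)
    q = 1.0 - p
    se = math.sqrt(p * q * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

# Flyover example: 200 of 400 men and 325 of 600 women in favour
z_flyover = two_proportion_z(200, 400, 325, 600)

# Two-tailed test at the 5% level
accept_h0 = abs(z_flyover) < 1.96
print(round(z_flyover, 2), accept_h0)
```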

2) Before an increase in excise duty on tea, 800 persons out of a sample of 1000 persons were found
to be tea drinkers. After an increase on duty, 800 people were tea drinkers in a sample of 1200 people.
Using standard error of proportion, state whether there is a significant decrease in the consumption of
tea after the increase in excise duty?

Solution

Given n1=1000, n2=1200

p1 = $\frac{800}{1000} = 0.8$

p2 = $\frac{800}{1200} = 0.667$

Null hypothesis H0: Assume that there is no significant difference between the consumption of tea
before and after the increase in excise duty.

i.e., H0: P1 = P2

Alternative hypothesis H1: $P_1 > P_2$ (one-tailed alternative)

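Test statistic

\[ z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]

where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$.

\[ p = \frac{1000 \times \frac{800}{1000} + 1200 \times \frac{800}{1200}}{1000 + 1200} = \frac{1600}{2200} = 0.727 \]

q = 1 - p = 1 - 0.727 = 0.273

\[ z = \frac{0.8 - 0.667}{\sqrt{0.727(0.273)\left(\dfrac{1}{1000} + \dfrac{1}{1200}\right)}} = \frac{0.133}{0.019} = 7 \]

\[ |z| = 7 \]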

The tabulated value of z for one tailed test at 5% level of significance is 1.645.

Since calculated value of z > tabulated value , we reject the null hypothesis at 5% level of
significance.

That is, there is a significant decrease in the consumption of tea after the increase in excise duty.

Note:

If we want to test the significance of the difference between p1 and p, where

\[ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \]

we use

\[ z = \frac{p_1 - p}{\sqrt{\dfrac{n_2\, pq}{n_1 (n_1 + n_2)}}} \]

3) In a random sample of 400 students of the university teaching department, it was found that 300
students failed in the examination. In another sample of 500 students of the affiliated colleges, the
number of failures in the same examination was found to be 300. Find out whether the proportion of
failures in the university teaching departments is significantly greater than the proportion of failures in
the university teaching departments and the affiliated colleges taken together.

Solution

Given n1=400, n2=500

p1 = $\frac{300}{400} = 0.75$

p2 = $\frac{300}{500} = 0.6$

\[ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{400(0.75) + 500(0.6)}{400 + 500} = 0.667 \]

q = 0.333

Null hypothesis: H0: Assume that there is no significant difference between p1 and P (i.e., p1=P)

Alternative Hypothesis: p1>P (one-tailed alternative)

Test statistic

\[ z = \frac{p_1 - p}{\sqrt{\dfrac{n_2\, pq}{n_1 (n_1 + n_2)}}} = \frac{0.75 - 0.667}{\sqrt{\dfrac{500 \times 0.667 \times 0.333}{400(400 + 500)}}} = 4.74 \]

The table value of z for one tailed test at 5% level of significance is 1.645.

Since calculated value of z > table value , we reject the null hypothesis.

Therefore the proportion of failures in the university teaching departments is significantly greater than
the proportion of failures in the university teaching departments and affiliated colleges taken together.

Note:

i) Suppose the population proportions P1 and P2 are given and $P_1 \neq P_2$. If we want to test the
hypothesis that the difference P1 - P2 in the population proportions is likely to be hidden in simple
samples of sizes n1 and n2 from the two populations respectively, then

\[ z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} \]

ii) If the sample proportions are not known, then we use

\[ z = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} \]

4) A cigarette manufacturing firm claims that its brand A line of cigarettes outsells its brand B by
8% (that is, P1 - P2 = 8% = 0.08). It is found that 42 out of a sample of 200 smokers
prefer brand A and 18 out of another sample of 100 smokers prefer brand B. Test whether the 8%
difference is a valid claim.

Solution

Given n1=200, n2=100

p1 = $\frac{42}{200} = 0.21$

p2 = $\frac{18}{100} = 0.18$

P1 - P2 = 8% = $\frac{8}{100} = 0.08$

Null Hypothesis: Assume that 8% difference in the sale of two brands of cigarettes is valid claim.

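i.e., H0: P1 - P2 = 0.08

Alternative hypothesis H1: $P_1 - P_2 \neq 0.08$ (two-tailed alternative)

\[ z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]

where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{42 + 18}{300} = 0.2$ and q = 1 - p = 0.8.

\[ z = \frac{(0.21 - 0.18) - 0.08}{\sqrt{0.2(0.8)\left(\dfrac{1}{200} + \dfrac{1}{100}\right)}} = -1.02 \]

\[ |z| = 1.02 \]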

Since the alternative is two tailed alternative, we apply two tailed test.

The table value of z at 5% level of significance for two tailed test is 1.96

Since the calculated value of z(=1.02) < table value(=1.96), we accept the null hypothesis.

Hence, 8% difference in the sale of two brands of cigarettes is valid claim.

5) In two large populations, there are 30% and 25% respectively of fair haired people. Is this
difference likely to be hidden in samples of 1200 and 900 respectively from the two populations.

Solution

Given n1=1200, n2=900

P1 = $\frac{30}{100} = 0.3$

P2 = $\frac{25}{100} = 0.25$

Q1= 1- P1 = 0.7

Q2= 1-P2 = 0.75

Null hypothesis: Assume that sample proportions are equal

i.e., H0:p1=p2

i.e., the difference in population proportion is likely to be hidden in sampling

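Alternative hypothesis H1: $p_1 \neq p_2$ (two-tailed alternative)

\[ z = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} = \frac{0.3 - 0.25}{\sqrt{\dfrac{0.3 \times 0.7}{1200} + \dfrac{0.25 \times 0.75}{900}}} = 2.55 \]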

The table value of z at 5% level of significance for two tailed test is 1.96

Since calculated value of z > table value, we reject the null hypothesis.

i.e., the sample proportions are not equal, so the difference in the population proportions is unlikely to be hidden in sampling.

Exercises:

5) A machine produced 20 defective articles in a batch of 400. After overhauling, it produced 10
defectives in a batch of 300. Has the machine improved? (Null hypothesis: p1 = p2; alternative hypothesis: p1 > p2)

Test of Significance for single mean

Suppose we want to test whether the given sample of size n has been drawn from a population
with mean $\mu$. We set up the null hypothesis that there is no difference between $\bar{x}$ and $\mu$, where $\bar{x}$
is the sample mean.

The test statistic is

\[ z = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

where s is the sample standard deviation.

If $\sigma$ is given,

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Note:

The values $\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$ are called the 95% confidence limits for the mean of the population
corresponding to the given sample.

Similarly, $\bar{x} \pm 2.58\,\dfrac{\sigma}{\sqrt{n}}$ are called the 99% confidence limits.

Examples:

1) A sample of 900 members has a mean of 3.4cms and S.D. 2.61cms. Is the sample drawn from a
large population of mean 3.25cm and S.D. 2.61cms? If the population is normal and its mean is
unknown, find the 95% fiducial limits of the true mean.

Solution

Given n = 900; $\mu = 3.25$; $\sigma = 2.61$

$\bar{x} = 3.4$; s = 2.61

Null hypothesis H0: Assume that the sample has been drawn from the population with mean
$\mu = 3.25$.

Alternative hypothesis H1: $\mu \neq 3.25$ (two-tailed alternative)

The test statistic is

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3.4 - 3.25}{2.61 / \sqrt{900}} = 1.724 \]

The table value of z for two tailed test at 5% level of significance is 1.96

Since z  1.724 <1.96, we accept the null hypothesis.

That is, the sample has been drawn from the population with mean $\mu = 3.25$.

95% confidence limits are

\[ \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 3.4 \pm 1.96 \times \frac{2.61}{\sqrt{900}} \]

= 3.5705 and 3.2295
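The test statistic and the confidence limits of this example can be checked with a short Python sketch. The helper names `z_single_mean` and `mean_limits` are our own and are not part of the original notes.

```python
import math

def z_single_mean(xbar, mu, sd, n):
    """z statistic for a single mean: (xbar - mu) / (sd / sqrt(n))."""
    return (xbar - mu) / (sd / math.sqrt(n))

def mean_limits(xbar, sd, n, z=1.96):
    """Confidence limits xbar -/+ z * sd / sqrt(n); z = 1.96 gives the 95% limits."""
    half_width = z * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 1: n = 900, sample mean 3.4, population mean 3.25, S.D. 2.61
z = z_single_mean(3.4, 3.25, 2.61, 900)
lo, hi = mean_limits(3.4, 2.61, 900)

print(round(z, 3), round(lo, 4), round(hi, 4))
```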

2) An insurance agent has claimed that the average age of policy holders who issue through him is
less than the average for all agents which is 30.5 years. A random sample of 100 policy holders who
had issued through him gave the following age distribution

Age              16-20   21-25   26-30   31-35   36-40
No. of persons     12      22      20      30      16

Calculate the A.M. & S.D. of this distribution and use these values to test his claim at 5%
level of significance.

Solution

Calculating the mean and S.D. for the above data, we get

$\bar{x} = 28.8$, S.D. s = 6.35

Null hypothesis H0: The sample is drawn from a population with mean $\mu = 30.5$

Alternative hypothesis H1: $\mu < 30.5$ (the claim in the question is "less than", so the alternative is one-tailed)

\[ z = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{28.8 - 30.5}{6.35 / \sqrt{100}} = -2.677 \]

Since the alternative hypothesis is one tailed alternative we apply one tailed test.

The table value of z at 5% level of significance for one tailed test is 1.645

Since the calculated value |z| = 2.68 > table value 1.645, we reject the null hypothesis.

Therefore $\bar{x}$ and $\mu$ differ significantly,

i.e., the sample is not drawn from a population with mean $\mu = 30.5$, which supports the agent's claim.
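The mean and S.D. quoted in the solution are obtained from the grouped data by taking class mid-points. The sketch below shows one way to do this; the mid-points 18, 23, 28, 33, 38 are our reading of the classes 16-20, ..., 36-40, and the S.D. uses divisor n, which matches the value 6.35 in the text.

```python
import math

# Class mid-points of 16-20, 21-25, 26-30, 31-35, 36-40 and their frequencies
midpoints = [18, 23, 28, 33, 38]
freqs = [12, 22, 20, 30, 16]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
sd = math.sqrt(variance)

# One-tailed test of H0: mu = 30.5 against H1: mu < 30.5
z = (mean - 30.5) / (sd / math.sqrt(n))
reject = abs(z) > 1.645
print(round(mean, 1), round(sd, 2), round(z, 3), reject)
```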

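IV: Test of significance for difference of means

Let $\bar{x}_1$ be the mean of a sample of size n1 from a population with mean $\mu_1$ and variance $\sigma_1^2$.

Let $\bar{x}_2$ be the mean of a sample of size n2 from a population with mean $\mu_2$ and variance $\sigma_2^2$.

To test whether there is any significant difference between $\bar{x}_1$ and $\bar{x}_2$, we use

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]

Note: i) If the samples have been drawn from the same population, then

\[ \sigma_1^2 = \sigma_2^2 = \sigma^2 \]

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} \]

ii) If $\sigma$ is not known, then it is estimated from the sample standard deviations s1 and s2:

\[ \sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} \]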

Examples:

1) The means of 2 large samples of 1000 and 2000 members are 67.5 inches and 68.0 inches
respectively. Can the samples be regarded as drawn from the same population of S.D. 2.5 inches?

Solution

Given n1 = 1000; n2 = 2000

$\bar{x}_1 = 67.5$; $\bar{x}_2 = 68$

Population S.D. $\sigma = 2.5$ inches

Null hypothesis H0: $\mu_1 = \mu_2$

Alternative hypothesis H1: $\mu_1 \neq \mu_2$ (two-tailed alternative)

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} = \frac{67.5 - 68}{\sqrt{\dfrac{2.5^2}{1000} + \dfrac{2.5^2}{2000}}} = \frac{-0.5}{0.0968} = -5.16 \]

Since the alternative hypothesis is two tailed alternative, we apply two-tailed test.

The table value of z for two tailed test at 5% level of significance is 1.96.

Since |z| = 5.16 > 1.96, we reject the null hypothesis at 5% level of significance,

i.e., the samples are not drawn from the same population of S.D. 2.5 inches.

2) The mean yield of wheat from a district was 210 pounds with S.D. 10 pounds per acre from a
sample of 100 plots. In another district the mean yield was 220 pounds with S.D. 12 pounds from a
sample of 150 plots. Assuming that the S.D. of yield in the entire state was 11 pounds, test whether
there is any significant difference between the mean yield of crops in the two districts.

Solution

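$\bar{x}_1 = 210$; n1 = 100

$\bar{x}_2 = 220$; n2 = 150

$\sigma = 11$

Null hypothesis H0: $\mu_1 = \mu_2$

Alternative hypothesis H1: $\mu_1 \neq \mu_2$

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} = \frac{210 - 220}{\sqrt{\dfrac{11^2}{100} + \dfrac{11^2}{150}}} = -7.041 \]

Since |z| = 7.041 > 1.96 at 5% level of significance, we reject the null hypothesis.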

That is there is a significant difference between the mean yield of crops in the two districts.

Note: If the two samples are drawn from two populations with unknown standard deviations, then

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]

3) In a survey of buying habits, 400 women shoppers are chosen at random in super market A located
in a certain section of the city. Their average weekly food expenditure is Rs.250 with a S.D. of Rs.40.
For 400 women shoppers chosen at random in super market B in another section of the city, the
average weekly food expenditure is Rs.220 with a S.D. of Rs.55. Test at 1% level of significance
whether the average weekly food expenditures of the two populations of shoppers are equal.

Solution

n1 = 400; $\bar{x}_1 = 250$, s1 = 40

n2 = 400; $\bar{x}_2 = 220$, s2 = 55

Null hypothesis H0: $\mu_1 = \mu_2$

Alternative hypothesis H1: $\mu_1 \neq \mu_2$

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{250 - 220}{\sqrt{\dfrac{40^2}{400} + \dfrac{55^2}{400}}} = 8.82 \]

The table value of z for two tailed test at 1% level of significance is 2.58.

Since |z| = 8.82 > 2.58, we reject the null hypothesis.

That is, the average weekly food expenditures of the two populations of shoppers are not equal.
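The three worked examples above all use the same statistic, with either a common population S.D. or the two sample S.D.s in the denominator. A minimal Python sketch, with a function name `two_mean_z` chosen only for this illustration, is:

```python
import math

def two_mean_z(xbar1, xbar2, sd1, sd2, n1, n2):
    """z statistic for the difference of two means from large samples."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (xbar1 - xbar2) / se

# Example 1: common population S.D. 2.5
z_height = two_mean_z(67.5, 68.0, 2.5, 2.5, 1000, 2000)

# Example 2: common population S.D. 11
z_wheat = two_mean_z(210, 220, 11, 11, 100, 150)

# Example 3: unknown population S.D.s, sample values 40 and 55 used instead
z_shop = two_mean_z(250, 220, 40, 55, 400, 400)

print(round(z_height, 2), round(z_wheat, 3), round(z_shop, 2))
```

Passing the same value for `sd1` and `sd2` gives the "same population" form of the formula, so one function covers both notes.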

4) The means of two samples of 1000 and 2000 items are 67.5 and 68.0 respectively. Can the
samples be regarded at 5 % level of significance, as drawn from the same population with standard
deviation 2.5?

Solution

n1 = 1000, n2= 2000

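$\bar{x}_1 = 67.5$; $\bar{x}_2 = 68$

$\sigma = 2.5$

Null hypothesis H0: $\mu_1 = \mu_2$

Alternative hypothesis H1: $\mu_1 \neq \mu_2$

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} = -5.163 \]

Since |z| = 5.163 > 1.96, we reject the null hypothesis.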

Therefore The samples are not drawn from the same population with standard deviation 2.5

Exercises:

5) A sample of 100 electric light bulbs produced by manufacturer A showed a mean life time of 1190
hrs and a S.D of 90 hrs. A sample of 75 bulbs produced by manufacturer B showed a mean life time
of 1230 hrs with a standard deviation of 120 hrs. Is there a difference between the mean life time on
the two brands at significance level of i) 5%, ii) 1%? (Ans: |z| = 2.421) (In this problem the samples are
drawn from different populations.)

6) There are two brands of car tyres A and B in the market. A sample of 100 tyres of brand A has an
average life of 37500 kms with a S.D. of 2500 kms. Another sample of 75 tyres of brand B has an
average life of 3900 kms with a S.D. of 3000 km. Can we conclude that brand B is better than
brand A? (In this problem the samples are drawn from the same population.)

In the alternative hypothesis, the mean of brand B is $\bar{x}_2$ and the mean of brand A is $\bar{x}_1$.

We are asked whether brand B is better than brand A, so we write $\bar{x}_2 > \bar{x}_1$,

i.e., $\bar{x}_1 < \bar{x}_2$

UNIT-V

Definition: A sample of size less than or equal to 30 is said to be a small sample.

Testing the significance of the difference of sample mean

t-test

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} \]

where $\bar{x} = \dfrac{\sum x}{n}$; $s^2 = \dfrac{\sum (x - \bar{x})^2}{n}$

The number of degrees of freedom of a statistic, generally denoted by $\nu$, is defined as the number n of
independent observations in the sample (i.e., the sample size) minus the number k of population
parameters which must be estimated from the sample observations.

\[ \nu = n - k \]

For the t-test above, degrees of freedom = n - 1

Assumptions:

1) The parent populations from which the samples are drawn are normally distributed.

2) The two samples are random and independent of each other.

3) $\sigma_1^2 = \sigma_2^2 = \sigma^2$, i.e., the population variances are equal.

Examples:

1) A sample of 26 bulbs gives a mean life of 990 hours with a standard deviation of 20 hours. The
manufacturer claims that the mean life of bulbs is 1000 hours. Is the sample not up to the standard?

Solution

$\bar{x} = 990$, $\mu = 1000$, s = 20, n = 26

Null hypothesis: The sample is up to the standard, i.e., $\mu = 1000$.

Alternative hypothesis: $\mu < 1000$

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} = \frac{990 - 1000}{20 / \sqrt{25}} = -2.5 \]

\[ |t| = 2.5 \]

Degrees of freedom = n - 1 = 26 - 1 = 25

The tabulated value of t at 5% level of significance for 25 degrees of freedom for a one-tailed test is 1.708.

Since the calculated value of |t| is greater than the tabulated value of t, we reject the null hypothesis H0.

Therefore the sample is not up to the standard.

2) Tests made on the breaking strength of hard drawn copper wire gave the following results in
kilograms

588, 582, 580, 578, 582, 580, 582, 606, 594

It was stated that the population mean of the breaking strength is 593kgs. Examine the validity of this
statement given that the 5% values of Student's t for 8, 9 and 10 degrees of freedom are
respectively 2.31, 2.26, 2.23.

Solution

Null hypothesis: H0:   593

Alternative hypothesis H1:   593

Calculation of mean $\bar{x}$ and standard deviation s

x        x - x̄      (x - x̄)²
588      2.222       4.937
582     -3.778      14.273
580     -5.778      33.385
578     -7.778      60.497
582     -3.778      14.273
580     -5.778      33.385
582     -3.778      14.273
606     20.222     408.929
594      8.222      67.601
Total              651.553

\[ \bar{x} = \frac{\sum x}{n} = 585.778 \]

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{651.553}{9} = 72.395 \]

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = 8.509 \]

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} = \frac{585.778 - 593}{8.509 / 2.8284} = \frac{-7.222}{3.0084} = -2.4006 \]

\[ |t| = 2.4006 \]

Degree of freedom = 9-1=8

The table value of t for 8 degree of freedom at 5% level of significance is 2.31.

Since |t| = 2.4006 > 2.31, we reject the null hypothesis

Therefore the population mean of the breaking strength is not 593 kgs
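The table of deviations above can be reproduced directly from the nine observations. The following Python sketch follows the convention of these notes (s computed with divisor n, and t with n - 1 under the square root); the variable names are our own.

```python
import math

# Breaking strengths in kilograms, as listed in the table of the solution
strengths = [588, 582, 580, 578, 582, 580, 582, 606, 594]
mu = 593  # population mean stated in the problem

n = len(strengths)
mean = sum(strengths) / n
# Sample S.D. with divisor n, as in the notes
s = math.sqrt(sum((x - mean) ** 2 for x in strengths) / n)
t = (mean - mu) / (s / math.sqrt(n - 1))

# 5% table value for n - 1 = 8 degrees of freedom, quoted in the problem
reject = abs(t) > 2.31
print(round(mean, 3), round(s, 3), round(t, 3), reject)
```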

Testing the significance of the difference between two sample means

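\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]

where $s^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$

Degrees of freedom = n1 + n2 - 2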

Examples:

1) The I.Q’s of 16 students from an area of the city showed a mean of 107 with S.D. of 10 while the
I.Q. of 14 students from another area of the city showed a mean of 112 and S.D. of 8. Is there a
significant difference between the I.Q.’s of 2 groups at 5% level of significance?

Solution

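n1 = 16, n2 = 14

$\bar{x}_1 = 107$, $\bar{x}_2 = 112$

s1 = 10, s2 = 8

Null hypothesis H0: $\mu_1 = \mu_2$

There is no significant difference between the sample means.

Alternative hypothesis H1: $\mu_1 \neq \mu_2$

Test statistic

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{where} \quad s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \]

\[ s^2 = \frac{16(10)^2 + 14(8)^2}{16 + 14 - 2} = \frac{2496}{28} = 89.14, \quad s = 9.44 \]

\[ t = \frac{107 - 112}{9.44\sqrt{\dfrac{1}{16} + \dfrac{1}{14}}} = -1.447 \]

\[ |t| = 1.447 \]

Degrees of freedom = n1 + n2 - 2 = 16 + 14 - 2 = 28

The table value of t for 28 degrees of freedom at 5% level of significance is 2.05

The calculated value of |t| is less than the table value of t, 2.05.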

Hence the null hypothesis is accepted.

There is no significant difference between two I.Q.’s.
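The I.Q. example works from summary statistics only, so the pooled t statistic can be sketched as a small function. The name `pooled_t` is hypothetical; the pooled variance uses n1 s1² + n2 s2² in the numerator because the notes define the sample variances with divisor n.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """t statistic and degrees of freedom for two small samples with a pooled S.D."""
    df = n1 + n2 - 2
    s = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / df)
    t = (xbar1 - xbar2) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, df

# I.Q. example: 16 students (mean 107, S.D. 10) and 14 students (mean 112, S.D. 8)
t_iq, df_iq = pooled_t(107, 10, 16, 112, 8, 14)

# Two-tailed 5% table value for 28 degrees of freedom, as quoted in the text
accept_h0 = abs(t_iq) < 2.05
print(round(t_iq, 3), df_iq, accept_h0)
```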

2) A group of 10 boys fed on diet A and another group of 8 boys fed on diet B recorded the following
increase in weights (in kgs):

Diet A:  5  6  8  1  12  4  3  9  6  10
Diet B:  2  3  6  8  10  1  2  8

Does it show the superiority of diet A over that of diet B?

Solution

Let the increase in weights (in kgs) due to diets A and B be denoted by the variables x and y
respectively.

Null hypothesis: H0: 1   2

Alternative hypothesis H1: 1   2

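Computation of the sample means and standard deviations

x1     x1 - x̄1    (x1 - x̄1)²     x2     x2 - x̄2    (x2 - x̄2)²
5       -1.4        1.96          2       -3          9
6       -0.4        0.16          3       -2          4
8        1.6        2.56          6        1          1
1       -5.4       29.16          8        3          9
12       5.6       31.36         10        5         25
4       -2.4        5.76          1       -4         16
3       -3.4       11.56          2       -3          9
9        2.6        6.76          8        3          9
6       -0.4        0.16
10       3.6       12.96
Total: 64         102.4          40                   82

\[ \bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{64}{10} = 6.4 \]

\[ \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{40}{8} = 5 \]

\[ s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1} = \frac{102.4}{10} = 10.24 \]

\[ s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2} = \frac{82}{8} = 10.25 \]

\[ s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{102.4 + 82}{16} = 11.525 \]

s = 3.395

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{6.4 - 5}{3.395\sqrt{\dfrac{1}{10} + \dfrac{1}{8}}} = 0.8694 \]

Degrees of freedom = n1 + n2 - 2 = 16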

The table value of t for 16 degree of freedom is 1.75

Since the calculated value of t is less than the tabulated value, it is not significant.

Hence the null hypothesis is accepted.

We conclude that the diets A and B do not differ significantly in respect of increase in weights.

Testing the significance of difference of sample variances

F – test

Let n1 be the number of observations in sample I from the first population with variance $\sigma_1^2$ and
n2 be the number of observations in sample II from the second population with variance $\sigma_2^2$.

We set up the null hypothesis H0: $\sigma_1^2 = \sigma_2^2 = \sigma^2$

\[ F = \frac{S_1^2}{S_2^2} \]

where

\[ S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} \]

Degrees of freedom = (n1 - 1, n2 - 1)

Note:

In numerical problems, we take the greater of the variances $S_1^2$ or $S_2^2$ in the numerator.

Assumptions on F-test

1) The populations for each sample must be normally distributed

2) The samples must be random and independent.

3) The ratio of $\sigma_1^2$ to $\sigma_2^2$ should be equal to 1 or greater than 1.

Examples:

1) Two samples of 6 and 7 items have the following values of the variables

Sample I 39 41 43 41 45 39
Sample II 40 42 40 44 39 38 40

Do the sample variances vary significantly?

Solution

Null hypothesis H0: $\sigma_1^2 = \sigma_2^2$

The sample variances do not differ significantly

Alternative hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$

\[ F = \frac{S_1^2}{S_2^2} \]

x1     x1 - x̄1    (x1 - x̄1)²     x2     x2 - x̄2    (x2 - x̄2)²
39      -2.3        5.29         40      -0.4        0.16
41      -0.3        0.09         42       1.6        2.56
43       1.7        2.89         40      -0.4        0.16
41      -0.3        0.09         44       3.6       12.96
45       3.7       13.69         39      -1.4        1.96
39      -2.3        5.29         38      -2.4        5.76
                                 40      -0.4        0.16
Total              27.34                            23.72

\[ \bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{248}{6} = 41.3 \]

\[ \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{283}{7} = 40.4 \]

\[ S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = \frac{27.34}{5} = 5.468 \]

\[ S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} = \frac{23.72}{6} = 3.9533 \]

\[ F = \frac{S_1^2}{S_2^2} = \frac{5.468}{3.9533} = 1.383 \]

Degree of freedom = (n1-1, n2-1)=(5,6)

The table value of F for (5,6) degree of freedom at 5%level of significance is 4.39

Since calculated value is less than the table value, we accept the null hypothesis.

Hence the variances do not differ significantly.
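The F ratio of this example can be verified with a short sketch. The helper `sample_variance` is our own name; it uses the divisor n - 1, as in the definition of S² above, and the larger variance is placed in the numerator as the note requires.

```python
def sample_variance(values):
    """Variance with divisor n - 1, as used in the F-test."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

sample_1 = [39, 41, 43, 41, 45, 39]
sample_2 = [40, 42, 40, 44, 39, 38, 40]

v1 = sample_variance(sample_1)
v2 = sample_variance(sample_2)

# Larger variance goes in the numerator
f_ratio = max(v1, v2) / min(v1, v2)

# 5% table value of F for (5, 6) degrees of freedom, as quoted in the text
accept_h0 = f_ratio < 4.39
print(round(v1, 3), round(v2, 3), round(f_ratio, 3), accept_h0)
```

The variances computed here differ slightly from 5.468 and 3.9533 because the hand calculation rounds the means to one decimal place.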

2) Two independent samples of sizes 7 and 6 have the following values of the variables

Sample I 28 30 32 33 31 29 34
Sample II 29 30 30 24 27 28 -
Examine whether the samples have been drawn from normal population having the same variance?

Solution:

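Here $S_1^2 = \frac{28}{6} = 4.67$ and $S_2^2 = \frac{26}{5} = 5.2$. Since $S_2^2 > S_1^2$, we take the larger variance in the numerator:

\[ F = \frac{S_2^2}{S_1^2} = 1.113 \]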

The table value of F for the degree of freedom (5,6) at 5%level of significance is 4.39

Since the calculated value of F is less than the table value, we accept the null hypothesis.

Hence, the samples have been drawn from normal population having the same variance.

3) Two random samples gave the following results.

Sample    Size    Sample mean    Sum of squares of deviations from the mean, Σ(x - x̄)²
I          10         15              90
II         12         14             108

Test whether the samples could have come from the same normal population

Soln:

Null hypothesis: H0: $\sigma_1^2 = \sigma_2^2$

Alternative hypothesis: H1: $\sigma_1^2 \neq \sigma_2^2$

\[ S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = \frac{90}{9} = 10 \]

\[ S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} = \frac{108}{11} = 9.8 \]

\[ F = \frac{S_1^2}{S_2^2} = \frac{10}{9.8} = 1.02 \]

Degree of freedom is (9,11)

The table value of F for the degree of freedom (9,11) is 2.90

Since calculated value is less than the table value, we accept the null hypothesis.

Hence the samples could have come from the same normal population.

Chi-Square Test-

Type-I: Goodness of fit

1) The following table gives the number of accidents that take place in an industry during various
days of the week. Test if accidents are uniformly distributed over the week

Day                 Mon   Tue   Wed   Thur   Fri   Sat
No. of accidents     14    18    12    11     15    14

Degree of freedom     5       6       7
Value of χ²         11.07   12.59   14.07

Soln: Null hypothesis: The accidents are uniformly distributed over the week

Alternative hypothesis: The accidents are not uniformly distributed over the week

Total no. of accidents=84

We have to test the hypothesis that the accidents are uniformly distributed over the 6 days of the
week.

On the basis of this hypothesis we should expect 84/6=14 accidents each day.

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

O      E      (O-E)²    (O-E)²/E
14     14       0        0
18     14      16        16/14 = 1.143
12     14       4        4/14 = 0.286
11     14       9        9/14 = 0.643
15     14       1        1/14 = 0.071
14     14       0        0
Total                    2.14

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 2.14 \]

Degree of freedom=n-1=6-1=5

The table value of chi-square for 5 degrees of freedom at 5% level of significance is 11.07.

Since the calculated value of $\chi^2$ is less than the table value, we accept the null hypothesis.

Hence the accidents are uniformly distributed over the week.
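The goodness-of-fit calculation above is easy to reproduce. The sketch below assumes nothing beyond the observed counts in the problem; the function name `chi_square` is our own.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

accidents = [14, 18, 12, 11, 15, 14]            # Mon to Sat
k = len(accidents)
expected = [sum(accidents) / k] * k             # uniform hypothesis: 84 / 6 = 14 per day

chi2 = chi_square(accidents, expected)
df = k - 1

# 5% table value for 5 degrees of freedom, as given in the problem
accept_h0 = chi2 < 11.07
print(round(chi2, 2), df, accept_h0)
```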

2) In a cross breeding experiment with plants of certain species, 240 offspring were classified into 4
classes with respect to the structure of their leaves as follows:

Class:        I     II    III    IV    Total
Frequency:   21    127     40    52     240

According to the theory of heredity, the probability of the four classes should be in the ratio 1:9:3:3.
Are these data consistent with theory.

Soln:

Null hypothesis: The given data are consistent with the theory that the frequencies in the four classes
should be in the ratio 1:9:3:3.

On the basis of this hypothesis , the expected frequencies are

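\[ \frac{1}{16} \times 240, \quad \frac{9}{16} \times 240, \quad \frac{3}{16} \times 240, \quad \frac{3}{16} \times 240 \]

i.e., 15, 135, 45, 45

Computation of $\chi^2$:

O       E      (O-E)²    (O-E)²/E
21      15      36       2.4
127    135      64       0.4742
40      45      25       0.5556
52      45      49       1.089
Total                    4.5188

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 4.5188 \]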

Degrees of freedom = n - 1 = 4 - 1 = 3

The table value of $\chi^2$ for 3 degrees of freedom at 5% level of significance is 7.81

Since the calculated value of $\chi^2$ is less than the table value, we accept the null hypothesis.

Hence the given data are consistent with the theory that the frequencies in the four classes should be
in the ratio 1:9:3:3.

3) Records taken of the number of male and female births in 800 families having four children are as
follows:

No. of boys          0     1     2     3     4
No. of girls         4     3     2     1     0
No. of families     32   178   290   236    64

Test whether the data are consistent with the hypothesis that the binomial law holds and that the
chance of male birth is equal to that of a female birth, namely p = q = 1/2.

Soln:

Null hypothesis: The data is consistent with the binomial law of equal probability for male and female
births.

i.e., p=q=1/2

Calculation of theoretical frequencies:

P(r) = probability of r male births in a family of n births

\[ = {}^{n}C_{r}\, p^r q^{n-r} = {}^{n}C_{r} \left(\frac{1}{2}\right)^{r} \left(\frac{1}{2}\right)^{n-r} \]

\[ p(r) = {}^{n}C_{r} \left(\frac{1}{2}\right)^{n} \]

Frequency of r male births:

\[ f(r) = N\, p(r) = 800 \times {}^{4}C_{r} \left(\frac{1}{2}\right)^{4} = 50 \times {}^{4}C_{r} \]

Putting r = 0, 1, 2, 3, 4 we get

f(0) = 50 × 4C0 = 50

f(1) = 50 × 4C1 = 200

f(2) = 50 × 4C2 = 300

f(3) = 50 × 4C3 = 200

f(4) = 50 × 4C4 = 50

Calculation of $\chi^2$:

O       E      (O-E)²    (O-E)²/E
32      50      324      6.48
178    200      484      2.42
290    300      100      0.3333
236    200     1296      6.48
64      50      196      3.92
Total                   19.63

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 19.63 \]

Degrees of freedom = 5 - 1 = 4

The table value of $\chi^2$ for 4 degrees of freedom at 5% level of significance is 9.49

Since the calculated value of $\chi^2$ is greater than the table value, we reject the null hypothesis.

Hence the data are not consistent with the hypothesis that the binomial law holds and that the chance
of male birth is equal to that of a female birth.
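The theoretical frequencies and the value of chi-square in this example can be sketched as follows. The code assumes Python 3.8 or later for `math.comb`; the variable names are our own.

```python
import math

families = [32, 178, 290, 236, 64]   # observed families with 0, 1, 2, 3, 4 boys
n_children = 4
p = 0.5                              # hypothesised chance of a male birth

total = sum(families)                # N = 800
# Binomial frequencies f(r) = N * nCr * p^r * q^(n - r)
expected = [
    total * math.comb(n_children, r) * p ** r * (1 - p) ** (n_children - r)
    for r in range(n_children + 1)
]

chi2 = sum((o - e) ** 2 / e for o, e in zip(families, expected))

# 5% table value for 4 degrees of freedom, as quoted in the text
reject = chi2 > 9.49
print(expected, round(chi2, 2), reject)
```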

4) In the accounting department of a bank, 100 accounts are selected at random and examined for
errors. The following result has been obtained.

No. of errors:      0    1    2    3    4    5    6
No. of accounts:   35   40   19    2    0    2    2

Does the information verify that the errors are distributed according to the Poisson distribution law?

Soln:

\[ p(r) = \frac{e^{-m} m^{r}}{r!}, \quad \text{where m is the mean} \]

To get the mean, use $m = \dfrac{\sum f r}{\sum f}$.

If the random variable X denotes the number of errors, the given distribution is as follows:

x        f       fx
0       35        0
1       40       40
2       19       38
3        2        6
4        0        0
5        2       10
6        2       12
Total  100      106

\[ m = \frac{\sum f x}{\sum f} = \frac{106}{100} = 1.06 \]

To fit a Poisson distribution to the data, we take the parameter of the Poisson distribution equal to the
mean,

i.e., m = 1.06

The frequency of x errors is given by the Poisson law as

\[ f(x) = N\, p(x) = N\, \frac{e^{-m} m^{x}}{x!} \]

\[ f(0) = 100 \times \frac{e^{-1.06} (1.06)^{0}}{0!} = 34.65 \]

\[ f(1) = 100 \times \frac{e^{-1.06} (1.06)^{1}}{1!} = 36.73 \]

f(2) = 19.47; f(3) = 6.88; f(4) = 1.82; f(5) = 0.39; f(6) = 0.06

Calculation of $\chi^2$:

O       E        (O-E)²     (O-E)²/E
35     34.65      0.1225     0.0035
40     36.73     10.6929     0.2911
19     19.47      0.2209     0.0113
6       9.15      9.9225     1.0844
Total                        1.3903

(Any class whose frequency is less than 5 is pooled with adjacent classes. Here the last four classes, with
observed frequencies 2, 0, 2, 2 and expected frequencies 6.88, 1.82, 0.39, 0.06, are combined into one class
with O = 6 and E = 9.15.)

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 1.3903 \]

Degrees of freedom: since the mean was estimated from the data, one further degree of freedom is lost, giving

n - 1 - 1

In addition, 3 cells are lost by pooling. So degrees of freedom = 7 - 1 - 1 - 3 = 2

The table value of $\chi^2$ for 2 degrees of freedom at 5% level of significance is 5.99.

Since the calculated value of $\chi^2$ is less than the table value, we accept the null hypothesis.

Hence the information verifies that the errors are distributed according to the Poisson distribution.
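The Poisson fit, including the pooling of the last four classes, can be sketched in Python as below. The pooling rule (merge classes 3 to 6) is taken from the worked solution; the variable names are our own, and the result differs slightly from 1.3903 because the code does not round the expected frequencies.

```python
import math

accounts = [35, 40, 19, 2, 0, 2, 2]      # observed accounts with 0..6 errors
total = sum(accounts)

# Mean number of errors, used as the Poisson parameter m
m = sum(x * f for x, f in enumerate(accounts)) / total

# Expected frequencies N * e^(-m) * m^x / x!
expected = [total * math.exp(-m) * m ** x / math.factorial(x)
            for x in range(len(accounts))]

# Pool classes 3, 4, 5, 6 as in the worked solution
obs_pooled = accounts[:3] + [sum(accounts[3:])]
exp_pooled = expected[:3] + [sum(expected[3:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(accounts) - 1 - 1 - 3           # one for the total, one for m, three pooled cells

accept_h0 = chi2 < 5.99                  # 5% table value for 2 degrees of freedom
print(round(m, 2), [round(e, 2) for e in expected], round(chi2, 3), df, accept_h0)
```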

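Chi-square as a test of independence

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

\[ \text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]

The degrees of freedom = (m-1)(n-1), where m is the number of rows and n is the number of columns

Remark: For a 2 x 2 contingency table

a    b
c    d

\[ \chi^2 = \frac{N (ad - bc)^2}{(a + b)(a + c)(b + d)(c + d)}, \quad \text{where } N = a + b + c + d \]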

Yates correction:

If any cell frequency is less than 5, we apply Yates' correction.

Add 0.5 to that cell and adjust the remaining cell frequencies so that the row and column totals are unchanged.

Examples:

1) The following table gives the no. of good and bad parts produced by each of the three shifts in a
factory.

               Good parts   Bad parts   Total
Day shift         960          40        1000
Eve shift         940          50         990
Night shift       950          45         995
Total            2850         135        2985

Test whether or not the production of bad parts is independent of the shift on which they were
produced?

Chi-square value for 2 degree of freedom is 5.991

Soln:

Null hypothesis: The production of bad parts is independent of the shift

Alternative hypo: The production of bad parts is not independent of the shift.

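Now we calculate the expected frequencies:

\[ \text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]

\[ E(960) = \frac{1000 \times 2850}{2985} = 954.77 \]

\[ E(40) = \frac{1000 \times 135}{2985} = 45.23 \]

\[ E(940) = \frac{990 \times 2850}{2985} = 945.23 \]

\[ E(50) = \frac{990 \times 135}{2985} = 44.77 \]

\[ E(950) = \frac{995 \times 2850}{2985} = 950 \]

E(45) = 45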

Calculation of $\chi^2$

Observed frequency O   Expected frequency E   (O-E)²    (O-E)²/E
960                    954.77                 27.35     0.0286
40                      45.23                 27.35     0.6047
940                    945.23                 27.35     0.0289
50                      44.77                 27.35     0.6109
950                    950                     0        0
45                      45                     0        0
Total                                                   1.2731

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 1.2731 \]

The degrees of freedom = (m-1)(n-1), where m is the number of rows and n is the number of columns

= (3-1)(2-1) = 2

The table value of chi-square for 2 degree of freedom is 5.991

Since calculated value is less than table value, we accept the null hypothesis.

Hence the production of bad parts is independent of the shift on which they were produced.
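The expected frequencies and the chi-square value for a contingency table can be computed generically from the row and column totals. The sketch below uses function names of our own choosing and applies them to the shift data; the result differs from 1.2731 in the third decimal place because the hand calculation rounds the expected frequencies.

```python
def expected_table(table):
    """Expected frequencies: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom (m - 1)(n - 1)."""
    exp = expected_table(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: day, evening, night shift; columns: good parts, bad parts
shifts = [[960, 40], [940, 50], [950, 45]]
chi2_shift, df_shift = chi_square_independence(shifts)

accept_h0 = chi2_shift < 5.991           # table value for 2 degrees of freedom
print(round(chi2_shift, 3), df_shift, accept_h0)
```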

2) Can vaccination be regarded as a preventive measure against small pox, as evidenced by the
following data?

Of 1482 persons exposed to small pox in a locality, 368 in all were affected. Of these 1482
persons, 343 were vaccinated and of these only 35 were attacked. Given chi-square at 5% level of
significance for 1 degree of freedom is 3.841.

Soln:

                         Vaccinated          Not vaccinated         Total
Attacked by small pox        35              (368-35) = 333          368
Not attacked             (343-35) = 308      (1114-308) = 806        (1482-368) = 1114
Total                       343              333+806 = 1139          1482

Null hypothesis: vaccination and attack by small pox are independent.

Calculation of expected frequencies:

\[ \text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]

\[ E(35) = \frac{368 \times 343}{1482} = 85.17 \]

\[ E(333) = \frac{368 \times 1139}{1482} = 282.83 \]

\[ E(308) = \frac{1114 \times 343}{1482} = 257.83 \]

\[ E(806) = \frac{1114 \times 1139}{1482} = 856.17 \]

Calculation of $\chi^2$

O       E         (O-E)²      (O-E)²/E
35      85.17     2517.03     29.55
333    282.83     2517.03      8.90
308    257.83     2517.03      9.76
806    856.17     2517.03      2.94
Total:                        51.15

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 51.15 \]

The degree of freedom for chi-square test = (m-1)(n-1)= (2-1)(2-1)=1

The table value of chi-square test for 1 degree of freedom is 3.841.

Since the calculated value 51.15 > table value, we reject the null hypothesis.

We conclude that vaccination and attack by small pox are not independent, i.e., vaccination can be regarded as a preventive measure against small pox.


3) In an experiment on the immunization of goats from anthrax, the following results were
obtained. Derive your inference on the efficiency of the vaccine.

                  Died   Survived   Total
Inoculated          2       10        12
Not inoculated      6        6        12
Total               8       16        24

Solution:

Since the cell (1,1) has the frequency 2, which is less than 5, we apply Yates' correction.

We add 0.5 to that cell and adjust the remaining cell frequencies so that the row and column totals stay the same.

After applying Yates' correction

                  Died   Survived   Total
Inoculated         2.5      9.5       12
Not inoculated     5.5      6.5       12
Total               8       16        24

(Adding 0.5 to the cell 2 gives 2.5; to keep the totals unchanged, 10 becomes 9.5, 6 died becomes 5.5 and 6 survived becomes 6.5.)

\[ \text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]

\[ E(2.5) = \frac{12 \times 8}{24} = 4 \]

\[ E(9.5) = \frac{12 \times 16}{24} = 8 \]

\[ E(5.5) = \frac{12 \times 8}{24} = 4 \]

\[ E(6.5) = \frac{12 \times 16}{24} = 8 \]

Calculation of $\chi^2$

O       E     (O-E)²    (O-E)²/E
2.5     4      2.25     0.5625
9.5     8      2.25     0.2813
5.5     4      2.25     0.5625
6.5     8      2.25     0.2813
Total                   1.69

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 1.69 \approx 1.7 \]

The degree of freedom for chi-square test = (m-1)(n-1)= (2-1)(2-1)=1

The table value of chi-square test for 1 degree of freedom is 3.841.

Since the calculated value 1.7 < table value, we accept the null hypothesis.

We conclude that inoculation is not effective against the disease.
