Probabilityandstaticspdf 2024 08 21 14 15 59
Basic Probability
Random Experiment
If an experiment is repeated any number of times under the same conditions, it does not give a
unique result but may result in any one of several possible outcomes.
Thus an experiment whose outcome cannot be predicted with certainty is called a random experiment or trial,
and the outcomes are known as events or cases.
Sample space
The set of all possible outcomes of a random experiment is called a sample space and is
denoted by S.
Favourable Events
Example
In tossing two coins the cases favourable to the event of getting a head are HT, TH, HH
Two events A and B are said to be mutually exclusive if they cannot occur simultaneously.
Note
If A and B represent mutually exclusive events then they are disjoint, that is, $A \cap B = \emptyset$, where $\emptyset$ is
the null set.
Example
1. When we toss a coin, either head or tail can be up, but both cannot be up at the same time.
2. When we throw a die, the outcomes 1, 2, 3, ..., 6 are mutually exclusive events.
Hint: Mutually exclusive events apply to a single trial only.
If a coin is tossed twice, the head appearing in the first trial will not affect the appearance of a
tail in the next trial.
The events are said to be equally likely if none of them is expected to occur in preference to
the others, i.e., each one of them has an equal chance of happening.
Example
In tossing of a coin, getting a head and tail are equally likely events.
Exhaustive Events
Outcomes are said to be exhaustive when they include all possible outcomes.
Example
In drawing two cards from a pack of 52 cards, the exhaustive number of cases is ${}^{52}C_2$.
In the case of throwing two dice the exhaustive number of cases is 36 ($= 6^2$).
Independent Event
Example
In successive tossing of a coin, the event of getting a head or tail in the first toss does not affect the
event of getting a head or tail in the second toss.
Dependent Events
The events are said to be dependent if the occurrence or non-occurrence of one event in any
trial affects the occurrence of other events in other trials.
If a trial results in n exhaustive, mutually exclusive and equally likely cases and m of them are
favourable to the happening of the event A, then the probability of happening of A is given by
$P(A) = \frac{m}{n}$
For example
Note :
i) The probability p of the happening of an event is also known as the probability of success.
ii) The probability q = 1 - p of the non-happening of the event is known as the probability of failure.
v) If the exhaustive number of cases in a trial is infinite, then this definition of classical probability
breaks down.
vi) If the events are not equally likely then this definition of mathematical probability breaks down.
If a trial is repeated n times under essentially homogeneous and identical conditions, and an
event A occurs m times out of the n trials, then as n becomes indefinitely large the probability p of the
happening of A is given by
$P(A) = p = \lim_{n \to \infty} \frac{m}{n}$
Let S be the sample space and A be an event associated with a random experiment. Then the
probability of the event A, denoted by P(A) is a real number satisfying the following axioms
(i) $0 \le P(A) \le 1$
(ii) $P(S) = 1$, $P(\emptyset) = 0$
(iii) $P(\bar{A}) = 1 - P(A)$
(iv) If A and B are two events which are not disjoint then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Examples
Solution
2. From a bag containing 10 black and 12 white balls, a ball is drawn at random. What is the
probability that it is black?
Solution
n(S) = 22
Out of 10 black balls, the number of ways of choosing one black ball $= {}^{10}C_1 = 10$
n(A) = 10
Probability of getting a black ball $= \frac{n(A)}{n(S)} = \frac{10}{22} = 0.4545$.
3. If at least one child in a family of three children is a boy, what is the probability that all three are
boys?
Solution
n(S) = 7
P(all three are boys) $= \frac{n(A)}{n(S)} = \frac{1}{7}$
4. A problem is given to 3 students A, B, C whose chances of solving it are 1/2, 1/3, 1/4 respectively.
what is the probability that
Solution
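P(A) = 1/2, P(B) = 1/3, P(C) = 1/4
$P(\bar{A}) = 1/2$, $P(\bar{B}) = 2/3$, $P(\bar{C}) = 3/4$.
$P(\bar{A}) P(\bar{B}) P(\bar{C}) = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{4}$
$P(A) P(\bar{B}) P(\bar{C}) + P(\bar{A}) P(B) P(\bar{C}) + P(\bar{A}) P(\bar{B}) P(C) = \frac{1}{4} + \frac{1}{8} + \frac{1}{12} = \frac{11}{24}$.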
5. A is known to hit the target in 2 out of 5 shots. B is known to hit the target in 3 out of 4 shots. Find
the probability of the target being hit when both try.
Solution
$P(A) = \frac{2}{5}$, $P(B) = \frac{3}{4}$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
$P(A \cup B) = \frac{2}{5} + \frac{3}{4} - \frac{2}{5} \cdot \frac{3}{4} = \frac{17}{20}$
6. Events A and B are such that $P(A \cup B) = 3/4$, $P(A \cap B) = 1/4$, $P(\bar{A}) = 2/3$. Find P(B).
Solution
$P(A) = 1 - P(\bar{A}) = 1 - 2/3 = 1/3$.
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(B) = P(A \cup B) + P(A \cap B) - P(A)$
= (3/4) + (1/4) - (1/3)
= 2/3.
Solution
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{8} + \frac{1}{2} - \frac{1}{4} = \frac{5}{8}$
$P(A^c \cap B^c) = P[(A \cup B)^c] = 1 - P(A \cup B)$
= 1 - 5/8
= 3/8.
8. A lot consists of 10 good articles, 4 with minor defects and 2 with major defects. Two articles are
chosen from the lot at random(without replacement). Find the probability that
c) at least 1 is good
d) at most 1 is good
e) exactly 1 is good
g) neither is good
Although the articles may be drawn one after the other, we can consider that both articles are drawn
simultaneously, as they are drawn without replacement.
Solution
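a) P(both are good) $= \frac{{}^{10}C_2}{{}^{16}C_2} = \frac{45}{120} = \frac{3}{8}$
b) P(both have major defects) $= \frac{{}^{2}C_2}{{}^{16}C_2} = \frac{1}{120}$
c) P(at least 1 is good) $= \frac{{}^{10}C_1 \, {}^{6}C_1 + {}^{10}C_2}{{}^{16}C_2} = \frac{105}{120} = \frac{7}{8}$
d) P(at most 1 is good) $= \frac{{}^{10}C_0 \, {}^{6}C_2 + {}^{10}C_1 \, {}^{6}C_1}{{}^{16}C_2} = \frac{75}{120} = \frac{5}{8}$
e) P(exactly 1 is good) $= \frac{{}^{10}C_1 \, {}^{6}C_1}{{}^{16}C_2} = \frac{60}{120} = \frac{1}{2}$
f) P(neither has major defects) = P(both are non-major-defective articles) $= \frac{{}^{14}C_2}{{}^{16}C_2} = \frac{91}{120}$
g) P(neither is good) $= \frac{{}^{6}C_2}{{}^{16}C_2} = \frac{15}{120} = \frac{1}{8}$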
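The counts in this lot-sampling problem (10 good articles, 4 with minor defects, 2 with major defects, two drawn without replacement) can be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the function name `lot_probabilities` and its parameter names are illustrative assumptions. It works in exact fractions so the results can be compared directly with the hand calculation.

```python
from fractions import Fraction
from math import comb


def lot_probabilities(good=10, minor=4, major=2):
    """Exact probabilities when 2 articles are drawn without replacement.

    Hypothetical helper for illustration only; names are not from the notes.
    """
    not_good = minor + major
    total_ways = comb(good + not_good, 2)          # all unordered pairs
    one_good = comb(good, 1) * comb(not_good, 1)   # exactly one good article
    return {
        "both_good": Fraction(comb(good, 2), total_ways),
        "both_major": Fraction(comb(major, 2), total_ways),
        "at_least_one_good": Fraction(one_good + comb(good, 2), total_ways),
        "at_most_one_good": Fraction(comb(not_good, 2) + one_good, total_ways),
        "exactly_one_good": Fraction(one_good, total_ways),
        "neither_major": Fraction(comb(good + minor, 2), total_ways),
        "neither_good": Fraction(comb(not_good, 2), total_ways),
    }


if __name__ == "__main__":
    for name, value in lot_probabilities().items():
        print(f"{name}: {value}")
```

Treating the two draws as one simultaneous selection, as the solution does, is what allows each case to be counted with a single combination.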
9. A box contains 4 white, 5 red and 6 black balls. Two balls are drawn at random. What is the
probability that both are black?
Solution
$n(S) = {}^{15}C_2 = 105$
Out of 6 black balls, 2 balls can be selected in ${}^{6}C_2$ ways.
$n(A) = {}^{6}C_2 = 15$
$P(A) = \frac{n(A)}{n(S)} = \frac{15}{105} = 0.1429$
10. From a well shuffled deck of 52 cards , 4 cards are selected at random. Find the probability that
the selected cards are
Solution
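i) P(getting 3 spades and 1 heart) $= \frac{{}^{13}C_3 \, {}^{13}C_1}{{}^{52}C_4} = 0.0137$
ii) P(getting 2 kings, 1 ace and 1 queen) $= \frac{{}^{4}C_2 \, {}^{4}C_1 \, {}^{4}C_1}{{}^{52}C_4} \approx 0.00035$
iii) P(getting all diamonds) $= \frac{{}^{13}C_4}{{}^{52}C_4} = 0.0026$
v) P(getting 4 hearts out of which one is a jack) $= \frac{{}^{1}C_1 \, {}^{12}C_3}{{}^{52}C_4} = 0.0008$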
Exercises
i) four
2) Find the probability of drawing an ace or a spade or both from a deck of cards
Conditional probability
The probability of the event A, given that the event B has already occurred, is called the conditional
probability of A given B and is defined as
$P(A/B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) \neq 0$.
Similarly, the probability of the event B, given that the event A has already occurred, is given by
$P(B/A) = \frac{P(A \cap B)}{P(A)}$, provided $P(A) \neq 0$.
Note
If A and B are independent events, then
$P(A/B) = \frac{P(A) P(B)}{P(B)} = P(A)$
$P(B/A) = \frac{P(A) P(B)}{P(A)} = P(B)$
In general, $P(A \cap B) = P(A) P(B/A) = P(B) P(A/B)$.
EXAMPLES
If B1, B2, …, Bn are a mutually exclusive and exhaustive set of events of a sample space S and A is any
event associated with the events B1, B2, …, Bn, then
$P(A) = P(B_1) P(A/B_1) + P(B_2) P(A/B_2) + \dots + P(B_n) P(A/B_n)$,
i.e., $P(A) = \sum_{i=1}^{n} P(B_i) P(A/B_i)$.
1) If the probability that a communication system will have high fidelity is 0.81 and the probability that
it will have high fidelity and selectivity is 0.18, what is the probability that a system with high fidelity
will also have selectivity?
Solution
Let A be the event that the system has selectivity and B the event that it has high fidelity.
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{0.18}{0.81} = \frac{2}{9}$
2)A box contains 4 bad and 6 good tubes. Two are drawn out from the box at a time. One of them
tested and found to be good. What is the probability that the other one also is good?
Solution
Let A = one of the tubes drawn is good and B= the other tube is good
P(A) = 6/10
$P(A \cap B)$ = P(both tubes drawn are good) $= \frac{{}^{6}C_2}{{}^{10}C_2} = \frac{1}{3}$
Knowing that one tube is good, the conditional probability that the other tube is also good is required,
i.e., P(B/A) is required.
By definition,
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/3}{6/10} = \frac{5}{9}$
3) Two defective tubes get mixed up with 2 good ones. The tubes are tested, one by one until both
defectives are found. What is the probability that the last defective tube is obtained on
Solution
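Let $D_i$ and $N_i$ denote that the i-th tube tested is defective or non-defective respectively.
Both defectives found in the first two tests: $P(D_1 \cap D_2)$, say
$= P(D_1) P(D_2/D_1) = \frac{2}{4} \cdot \frac{1}{3} = \frac{1}{6}$
Last defective found on the third test:
$P(D_1 N_2 D_3) + P(N_1 D_2 D_3) = \frac{2}{4} \cdot \frac{2}{3} \cdot \frac{1}{2} + \frac{2}{4} \cdot \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}$
Last defective found on the fourth test:
$P(D_1 N_2 N_3 D_4) + P(N_1 D_2 N_3 D_4) + P(N_1 N_2 D_3 D_4) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$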
4) In a shooting test, the probability of hitting the target is 1/2 for A, 2/3 for B and 3/4 for C. If all of
them fire at the target, find the probability that
Solution
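$P(\bar{A}) = 1 - P(A) = 1 - 1/2 = 1/2$
$P(\bar{A}) = \frac{1}{2}$, $P(\bar{B}) = \frac{1}{3}$, $P(\bar{C}) = \frac{1}{4}$
$P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\bar{A}) P(\bar{B}) P(\bar{C}) = \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{4} = \frac{1}{24}$
P(at least one hits the target) = 1 - P(none hits the target) $= 1 - \frac{1}{24} = \frac{23}{24}$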
5) A box contains tags marked 1,2,…,n two tags are chosen at random without replacement. Find the
probability that the numbers on the tags will be consecutive integers.
Solution
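The pairs of consecutive integers are (1,2), (2,3), …, (n-1, n), so there are n-1 such pairs.
Number of ways of choosing any one pair from the (n-1) pairs $= {}^{n-1}C_1 = n - 1$
Total number of ways of choosing 2 tags from the n tags $= {}^{n}C_2$
The required probability $= \frac{n-1}{{}^{n}C_2} = \frac{2}{n}$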
6) Among the workers in a factory only 30% receive a bonus. Among those receiving the bonus only
20% are skilled. What is the probability of a randomly selected worker who is skilled and receiving
bonus?
Solution
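Let A be the event that a worker receives a bonus and B the event that a worker is skilled.
P(A) = 30/100 = 0.3
$P(B/A) = \frac{20}{100} = 0.2$
$P(A \cap B) = P(A) P(B/A) = (0.3)(0.2) = 0.06$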
7) A and B alternately throw a pair of dice. A wins if he throws 6 before B throws7 and B wins if he
throws 7 before A throws 6. If A begins, show that his chance of winning is 30/61.
Solution
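Let A be the event of throwing 6 and B the event of throwing 7 with a pair of dice.
$P(A) = \frac{5}{36}$, $P(B) = \frac{1}{6}$
A wins on his first throw, or on his second throw after both A and B have failed once, and so on:
P(A wins) $= \frac{5}{36} + \left(1 - \frac{5}{36}\right)\left(1 - \frac{1}{6}\right)\frac{5}{36} + \dots = \frac{5/36}{1 - (31/36)(5/6)} = \frac{30}{61}$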
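The alternating dice game is a geometric series: the first player wins on a given round with probability p, and the game returns to its starting state whenever both players fail. A minimal Python sketch of that reasoning follows; it is not part of the original notes, and the names `first_player_win`, `P_SIX` and `P_SEVEN` are illustrative assumptions.

```python
from fractions import Fraction


def first_player_win(p_first, p_second):
    """Chance that the first player succeeds before the second.

    Players throw alternately, the first player starting. Summing the
    geometric series p + r*p + r^2*p + ... with r = P(both fail once)
    gives p / (1 - r).
    """
    both_fail = (1 - p_first) * (1 - p_second)
    return p_first / (1 - both_fail)


P_SIX = Fraction(5, 36)    # 5 of the 36 outcomes of two dice total 6
P_SEVEN = Fraction(6, 36)  # 6 of the 36 outcomes total 7

if __name__ == "__main__":
    print(first_player_win(P_SIX, P_SEVEN))
```

With the two probabilities above the function returns 30/61, the value the problem asks to be shown.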
8) In a coin tossing experiment, if the coin shows head, 1 dice is thrown and the result is recorded. But
if the coin shows tail, 2 dice are thrown and their sum is recorded. What is the probability that the
recorded number will be 2?
Solution
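When 1 die is thrown, P(getting 2) = 1/6.
When 2 dice are thrown, the sum will be 2 only if each die shows 1.
P(getting 2 as sum with 2 dice) = P(getting 1 on first die) P(getting 1 on second die)
$= \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$
By the theorem of total probability, $P(A) = \sum_{i=1}^{n} P(B_i) P(A/B_i)$, so
P(recorded number will be 2) = P(H) P(getting 2 with 1 die) + P(T) P(getting 2 as a sum with 2 dice)
$= \frac{1}{2} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{36} = \frac{6}{72} + \frac{1}{72} = \frac{7}{72}$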
9) If atleast 1 child in a family with 2 children is a boy, what is the probability that both children are
boys?
Solution
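S = {BB, BG, GB, GG}
Let A be the event that at least 1 child is a boy and B the event that both children are boys.
$P(A) = \frac{2}{4} + \frac{1}{4} = \frac{3}{4}$, $P(A \cap B) = \frac{1}{4}$
Therefore the probability that both children are boys given that at least 1 child is a boy
$= P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/4}{3/4} = \frac{1}{3}$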
10) Two fair dice are thrown independently. Three events A, B and C are defined as follows.
Solution
$P(A) = \frac{3}{6} = \frac{1}{2}$, $P(B) = \frac{1}{2}$, $P(C) = \frac{1}{2}$
$P(A \cap B) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(A) P(B)$
$P(A \cap B) = P(B \cap C) = P(A \cap C) = 1/4$
However, $P(A \cap B \cap C) \neq P(A) P(B) P(C)$.
Therefore the events are pairwise independent, but not mutually independent.
11) A bolt is manufactured by 3 machines A, B and C. A turns out twice as many items as B, and
machines B and C produce equal numbers of items. 2% of the bolts produced by A and B are defective
and 4% of the bolts produced by C are defective. All bolts are put into one stockpile and one is chosen from
this pile. What is the probability that it is defective?
Solution
Let A, B and C be the events that the item has been produced by machine A, B and C respectively, and let D be the event that it is defective.
Since the outputs are in the ratio 2 : 1 : 1, P(A) = 1/2, P(B) = 1/4, P(C) = 1/4.
$P(D) = P(A) P(D/A) + P(B) P(D/B) + P(C) P(D/C) = (1/2)(0.02) + (1/4)(0.02) + (1/4)(0.04) = 0.025$
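$P(B/A_1) = 15/28$ (the denominator being ${}^{8}C_2 = 28$)
$P(B/A_2) = \frac{{}^{4}C_1 \, {}^{5}C_1}{{}^{9}C_2} = \frac{20}{36} = \frac{5}{9}$
$P(B) = (1/2)(15/28) + (1/2)(5/9) = \frac{15}{56} + \frac{5}{18}$
= 275/504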
13) An urn contains 10 white and 3 black balls. Another urn contains 3 white and 5 black balls. Two
balls are drawn at random from the first urn and placed in the second urn and then 1 ball is taken at
random from the latter. What is the probability that it is a white ball?
Solution
The two balls transferred may be both white or both black or 1 white and 1 black.
Let B1 be the event of drawing 2 white balls from the first urn and B2 be the event of drawing 2 black
balls from it and B3 be the event of drawing 1 white and 1 black ball from it.
Let A be the event of drawing a white ball from the second urn after transfer.
Similarly $P(A/B_2) = 3/10$ and $P(A/B_3) = 4/10$.
Therefore, $P(A) = P(B_1) P(A/B_1) + P(B_2) P(A/B_2) + P(B_3) P(A/B_3)$
$= \frac{15}{26} \cdot \frac{5}{10} + \frac{1}{26} \cdot \frac{3}{10} + \frac{10}{26} \cdot \frac{4}{10} = \frac{75 + 3 + 40}{260} = \frac{118}{260}$
= 59/130
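The two-urn transfer problem is a direct application of the theorem of total probability: condition on how many white balls are moved, then weight the chance of drawing white from the second urn. The sketch below is a hedged illustration in Python, not part of the original notes; the function name `white_after_transfer` and its parameters are assumptions chosen for this example.

```python
from fractions import Fraction
from math import comb


def white_after_transfer(w1=10, b1=3, w2=3, b2=5, moved=2):
    """P(white ball from urn 2) after moving `moved` balls from urn 1.

    Sums P(k white balls moved) * P(white | k white balls moved)
    over k = 0..moved (theorem of total probability).
    """
    total = Fraction(0)
    for k in range(moved + 1):
        # Hypergeometric chance that exactly k of the moved balls are white.
        p_move = Fraction(comb(w1, k) * comb(b1, moved - k),
                          comb(w1 + b1, moved))
        # Urn 2 then holds w2 + k white balls out of w2 + b2 + moved.
        p_white = Fraction(w2 + k, w2 + b2 + moved)
        total += p_move * p_white
    return total


if __name__ == "__main__":
    print(white_after_transfer())
```

For the urns in this problem the sum has three terms, matching the events B1, B2 and B3 of the solution.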
Bayes' Theorem
If B1, B2, …, Bn are a set of exhaustive and mutually exclusive events of a sample space S and A is any
event associated with B1, B2, …, Bn such that $A \subset \bigcup_{i=1}^{n} B_i$, then
$P(B_i/A) = \frac{P(B_i) P(A/B_i)}{\sum_{i=1}^{n} P(B_i) P(A/B_i)}$
Examples:
i) 1 white, 2 black and 3 red balls
One urn is chosen at random and two balls are drawn. They happen to be white and red. What is the
probability that they come from urn I, urn II and III?
Solution
Let B1 be the event of choosing urn I, B2 be the event of choosing urn II and B3 be the event of
choosing urn III.
P( B1 ) 1 / 3, P(B 2 ) 1 / 3, P( B3 ) 1 / 3 .
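Let A be the event that the two balls drawn are one white and one red. If urn I is chosen then
$P(A/B_1) = \frac{{}^{1}C_1 \, {}^{3}C_1}{{}^{6}C_2} = \frac{1}{5}$.
$P(A/B_2) = \frac{{}^{2}C_1 \, {}^{1}C_1}{{}^{4}C_2} = \frac{2}{6} = \frac{1}{3}$
$P(A/B_3) = \frac{{}^{4}C_1 \, {}^{3}C_1}{{}^{12}C_2} = \frac{2}{11}$.
$P(A) = P(B_1) P(A/B_1) + P(B_2) P(A/B_2) + P(B_3) P(A/B_3)$
$= \frac{1}{3} \cdot \frac{1}{5} + \frac{1}{3} \cdot \frac{1}{3} + \frac{1}{3} \cdot \frac{2}{11} = \frac{118}{495}$
$P(B_1/A) = \frac{P(B_1) P(A/B_1)}{P(A)} = \frac{\frac{1}{3} \cdot \frac{1}{5}}{118/495} = 0.2797$.
$P(B_2/A) = \frac{P(B_2) P(A/B_2)}{P(A)} = \frac{\frac{1}{3} \cdot \frac{1}{3}}{118/495} = 0.4661$.
$P(B_3/A) = \frac{P(B_3) P(A/B_3)}{P(A)} = \frac{\frac{1}{3} \cdot \frac{2}{11}}{118/495} = 0.2542$.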
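Bayes' theorem for a finite partition amounts to normalising the products P(Bi)P(A/Bi). The following Python sketch illustrates this with the three-urn figures used above; it is an illustration rather than part of the notes, and the function name `posterior` is an assumption.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(B_i / A) for each i, given P(B_i) and P(A / B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)          # P(A), by the theorem of total probability
    return [j / evidence for j in joint]


# Figures from the three-urn example: equal priors, likelihoods 1/5, 1/3, 2/11.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 5), Fraction(1, 3), Fraction(2, 11)]

if __name__ == "__main__":
    for i, value in enumerate(posterior(priors, likelihoods), start=1):
        print(f"P(B{i}/A) = {value} = {float(value):.4f}")
```

The exact posteriors are 33/118, 55/118 and 30/118, which round to the decimal values quoted in the solution.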
2) A bag contains 5 balls and it is not known how many of them are white. Two balls are drawn at
random from the bag and they are noted to be white. What is the chance that all the balls in the bag
are white?
Solution
Since 2 white balls have been drawn out, the bag must have contained 2,3,4 or 5 white balls.
Let B1 be the event of the bag containing 2 white balls, B2 be the event of the bag containing 3 white
balls, B3 be the event of the bag containing 4 white balls and B4 be the event of the bag containing 5
white balls.
P(B1)=P(B2)=P(B3)=P(B4)=1/4
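Let A be the event that the two balls drawn are white. By Bayes' theorem
$P(B_i/A) = \frac{P(B_i) P(A/B_i)}{\sum_{i=1}^{n} P(B_i) P(A/B_i)}$
$P(B_4/A) = \frac{P(B_4) P(A/B_4)}{\sum_{i=1}^{4} P(B_i) P(A/B_i)}$
$= \frac{(1/4)(1)}{(1/4)(1/10) + (1/4)(3/10) + (1/4)(3/5) + (1/4)(1)}$
$= \frac{1/4}{\frac{1}{40} + \frac{3}{40} + \frac{3}{20} + \frac{1}{4}} = \frac{10/40}{20/40} = \frac{1}{2}$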
3) In a bolt factory, machines A, B and C produce 25%, 35% and 40% of the total output respectively.
Of their outputs 5,4 and 2% respectively are defective bolts. If a bolt is chosen at random from the
combined output, what is the probability that it is defective? If a bolt chosen at random is found to be
defective, what is the probability that it was produced by B?
Solution
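Let E1, E2 and E3 be the events that the bolt was produced by machines A, B and C respectively, and let X be the event that it is defective.
P(E1) = 0.25, P(E2) = 0.35, P(E3) = 0.40
P(X/E1) = 0.05
P(X/E2) = 0.04
P(X/E3) = 0.02
$P(X) = P(E_1) P(X/E_1) + P(E_2) P(X/E_2) + P(E_3) P(X/E_3) = 0.0125 + 0.014 + 0.008 = 0.0345$
By Bayes' theorem
$P(E_2/X) = \frac{P(E_2) P(X/E_2)}{P(E_1) P(X/E_1) + P(E_2) P(X/E_2) + P(E_3) P(X/E_3)} = \frac{0.014}{0.0345} = 0.406$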
4) A given lot of IC chips contains 2% defective chips. Each is tested before delivery. The tester itself
is not totally reliable. Probability of tester says the chip is good when it is really good is 0.95 and the
probability of tester says chip is defective when it is actually defective is 0.94. If a tested device is
indicated to be defective, what is the probability that it is actually defective?
Solution
Let E be the event that the chip is actually defective and D the event that the tester says it is defective.
$P(E) = 0.02$
$P(\bar{E}) = 1 - P(E) = 1 - 0.02 = 0.98$
Given that the probability that the tester says the chip is good when it is really good is 0.95,
$P(\bar{D}/\bar{E}) = 0.95$
$P(D/\bar{E}) = 1 - 0.95 = 0.05$
$P(D/E) = 0.94$
$P(\bar{D}/E) = 1 - 0.94 = 0.06$
$P(E/D) = \frac{P(E) P(D/E)}{P(\bar{E}) P(D/\bar{E}) + P(E) P(D/E)}$
$= \frac{(0.02)(0.94)}{(0.98)(0.05) + (0.02)(0.94)} = \frac{0.0188}{0.0678} = 0.2773$
5) A certain firm has plant A, B and C producing IC chips. Plant A produces twice the output from B
and B produces twice the output from C. The probability of a non defective product produced by A,B
and C are respectively 0.85, 0.75, and 0.95. A customer receives a defective product, find the
probability that it came from plant B.
Solution:
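The outputs of A, B and C are in the ratio 4 : 2 : 1, so P(A) = 4/7, P(B) = 2/7, P(C) = 1/7.
Let E be the event that the product is non-defective.
$P(E/A) = 0.85$; $P(E/B) = 0.75$; $P(E/C) = 0.95$
$P(\bar{E}/A) = 0.15$; $P(\bar{E}/B) = 0.25$; $P(\bar{E}/C) = 0.05$
The probability that a defective product received by the customer came from plant B is
$P(B/\bar{E}) = \frac{P(B) P(\bar{E}/B)}{P(A) P(\bar{E}/A) + P(B) P(\bar{E}/B) + P(C) P(\bar{E}/C)}$
$= \frac{(2/7)(0.25)}{(4/7)(0.15) + (2/7)(0.25) + (1/7)(0.05)} = \frac{0.5}{1.15} = 0.4348$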
6) There are 3 true coins and 1 false coin with head on both sides. A coin is chosen at random and
tossed 4 times. If head occurs all the 4 times, what is the probability that the false coin has been
chosen and used?
Solution
P(T)=P( the coin is a true coin)=3/4
P(F)=P(the coin is a false coin)=1/4
Let A be the event of getting all heads in 4 tosses.
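$P(A/T) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{16}$
P(A/F) = 1
By Bayes' theorem
$P(F/A) = \frac{P(F) P(A/F)}{P(F) P(A/F) + P(T) P(A/T)} = \frac{\frac{1}{4}(1)}{\frac{1}{4}(1) + \left(\frac{3}{4}\right)\left(\frac{1}{16}\right)} = \frac{16}{19}$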
Random Variable:
A real variable X whose value is determined by the outcome of a random experiment is called a
random variable.
Example
A random experiment consists of two tosses of a coin. Consider the random variable which is the
number of heads (0, 1 or 2).
Outcome:     HH  HT  TH  TT
Value of X:  2   1   1   0
If the function values are ordered pairs of real numbers, the function is said to be a two-
dimensional random variable.
A random variable which can assume only a countable number of real values is called a discrete
random variable
Example
Number of telephone calls per unit time, marks obtained in a test, number of printing mistakes in each
page of a book.
Suppose X is a one-dimensional discrete random variable taking at most a countably infinite number
of values x1, x2, …. With each possible outcome xi we associate a number pi = P(X = xi) = p(xi), called
the probability of xi. If the numbers p(xi) satisfy
i. $p(x_i) \ge 0$ for all i
ii. $\sum_{i=1}^{\infty} p(x_i) = 1$
then p is called the probability mass function or probability function of the random variable X.
The collection of pairs {xi, pi}, i = 1, 2, 3, … is called the probability distribution of the random variable
X.
The distribution function of the random variable X with probability mass function p(xi), i = 1, 2, …, n is
defined as $F(x) = P(X \le x) = \sum_{i:\, x_i \le x} p(x_i)$.
Note:
ii) Mean of the random variable X $= E(X) = \sum_x x\,p(x)$
iii) Variance of the random variable X $= V(X) = \sum_x x^2 p(x) - \left[\sum_x x\,p(x)\right]^2$
$V(X) = E(X^2) - [E(X)]^2$
A random variable X is said to be continuous if it can take all possible values between certain limits.
The probability density function $f_X(x)$ of a continuous random variable X satisfies the
following properties
i. $f(x) \ge 0$
ii. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
iii. $P(x_1 < X < x_2) = \int_{x_1}^{x_2} f_X(x)\,dx$
iv. $P(x_1 \le X \le x_2) = P(x_1 < X < x_2)$
v. $P(X = a) = \int_{a}^{a} f(x)\,dx = 0$
[Note: In case of a continuous random variable, the probability at a point is always zero.]
The cumulative distribution F(x) of a continuous random variable X with PDF f(x) is given by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$, $-\infty < x < \infty$
Note:
i. $P(a \le X \le b) = F(b) - F(a)$
ii. $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$; $F(\infty) = \lim_{x \to \infty} F(x) = 1$
iii. The relation between the CDF and PDF is $f(x) = \frac{d}{dx} F(x)$
iv. If X is a continuous random variable with PDF f(x) then
Mean $= E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
Examples:
X:     0   1    2    3    4    5     6     7     8
P(X):  a   3a   5a   7a   9a   11a   13a   15a   17a
Find
i) a
ii) $P(X \le 2)$
Solution
i) Since the probabilities sum to 1, a + 3a + 5a + … + 17a = 81a = 1
a = 1/81
ii) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
= a + 3a + 5a
= 9a
= 9(1/81) = 1/9.
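The distribution function is
$F(x) = 0$, $x < 0$
$= P(X \le 0) = a = \frac{1}{81}$, $0 \le x < 1$
$= P(X \le 1) = 4a = \frac{4}{81}$, $1 \le x < 2$
$= P(X \le 2) = 9a = \frac{9}{81}$, $2 \le x < 3$
$= P(X \le 3) = 16a = \frac{16}{81}$, $3 \le x < 4$
$= P(X \le 4) = 25a = \frac{25}{81}$, $4 \le x < 5$
$= P(X \le 5) = 36a = \frac{36}{81}$, $5 \le x < 6$
$= P(X \le 6) = 49a = \frac{49}{81}$, $6 \le x < 7$
$= P(X \le 7) = 64a = \frac{64}{81}$, $7 \le x < 8$
$= P(X \le 8) = 81a = 1$, $x \ge 8$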
i.e., $E(X) = \sum_x x\,p(x)$
= 0 + 3a + 10a + 21a + 36a + 55a + 78a + 105a + 136a
= 444a
= 444(1/81)
= 5.48
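The table-based calculations for this distribution (normalising constant, distribution function, mean) are easy to reproduce programmatically. The Python sketch below is illustrative only and not part of the notes; the helper names `pmf_table`, `mean`, `variance` and `cdf` are assumptions. It also evaluates the variance using V(X) = E(X^2) - [E(X)]^2, which the worked solution does not compute.

```python
from fractions import Fraction


def pmf_table():
    """p(x) = (2x + 1) a for x = 0..8, with a = 1/81 from the solution."""
    a = Fraction(1, 81)
    return {x: (2 * x + 1) * a for x in range(9)}


def mean(pmf):
    """E(X) = sum of x p(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V(X) = E(X^2) - [E(X)]^2."""
    m = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - m * m


def cdf(pmf, x):
    """F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)


if __name__ == "__main__":
    table = pmf_table()
    print("total probability:", sum(table.values()))
    print("P(X <= 2):", cdf(table, 2))
    print("mean:", mean(table), "=", round(float(mean(table)), 2))
    print("variance:", variance(table))
```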
Solution
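Let $P(X = 3) = k$. Then
$2P(X = 1) = k \Rightarrow P(X = 1) = \frac{k}{2}$
$3P(X = 2) = k \Rightarrow P(X = 2) = \frac{k}{3}$
$5P(X = 4) = k \Rightarrow P(X = 4) = \frac{k}{5}$
Since $\sum_{i=1}^{4} p(x_i) = 1$,
$\frac{k}{2} + \frac{k}{3} + k + \frac{k}{5} = 1$
$\frac{61}{30} k = 1$, so $k = \frac{30}{61}$
$P(X = 1) = \frac{k}{2} = \frac{15}{61} = 0.2459$
$P(X = 2) = \frac{k}{3} = \frac{30}{183} = 0.1639$
$P(X = 3) = k = \frac{30}{61} = 0.4918$
$P(X = 4) = \frac{k}{5} = \frac{6}{61} = 0.0984$
$F(x) = 0$, $x < 1$
$= 0.2459$, $1 \le x < 2$
$= 0.2459 + 0.1639 = 0.4098$, $2 \le x < 3$
$= 0.4098 + 0.4918 = 0.9016$, $3 \le x < 4$
$= 0.9016 + 0.0984 = 1$, $x \ge 4$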
X=i 1 2 3 4
Solution
$F(x) = 0$, $x < 1$
$= \frac{15}{61}$, $1 \le x < 2$
$= \frac{25}{61}$, $2 \le x < 3$
$= \frac{55}{61}$, $3 \le x < 4$
$= 1$, $x \ge 4$
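$f(x) = k(x - 1)^3$, $1 \le x \le 3$
$= 0$, otherwise
Find
i) the value of k
Solution
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, we have
$\int_{1}^{3} k(x - 1)^3\,dx = 1$
$k\left[\frac{(x - 1)^4}{4}\right]_{1}^{3} = 1$
$(k/4)(2^4 - 0) = 1$
$k = 1/4$
$f(x) = \frac{1}{4}(x - 1)^3$, $1 \le x \le 3$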
ii) $F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{1} 0\,dt + \int_{1}^{x} \frac{1}{4}(t - 1)^3\,dt = \frac{1}{16}(x - 1)^4$
$F(x) = 0$, $x < 1$
$= \frac{1}{16}(x - 1)^4$, $1 \le x \le 3$
$= 1$, $x > 3$
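$f(x) = c\,x e^{-x}$, $x > 0$
$= 0$, $x \le 0$
Solution
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
$\int_{0}^{\infty} c\,x e^{-x}\,dx = 1$
$c\left[-x e^{-x} - e^{-x}\right]_{0}^{\infty} = 1$
c(1) = 1
c = 1
$f(x) = x e^{-x}$, $x > 0$
The CDF of X is $F(x) = P(X \le x) = \int_{0}^{x} t e^{-t}\,dt$
$F(x) = \left[-t e^{-t} - e^{-t}\right]_{0}^{x}$
$= -x e^{-x} - e^{-x} + 1$
$= 1 - e^{-x}(1 + x)$
$F(x) = 1 - e^{-x}(1 + x)$, $x > 0$
$= 0$, otherwise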
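6) A continuous random variable X follows the probability law $f(x) = ax^2$, $0 \le x \le 1$. Determine a and
find the probability that X lies between 1/4 and 1/2.
Solution
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
$\int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx = 1$
$\int_{0}^{1} ax^2\,dx = 1$
$a\left[\frac{x^3}{3}\right]_{0}^{1} = 1$
$a\left(\frac{1}{3} - 0\right) = 1$, i.e., a = 3
$f(x) = 3x^2$
$P\left(\frac{1}{4} < X < \frac{1}{2}\right) = \int_{1/4}^{1/2} f(x)\,dx = \int_{1/4}^{1/2} 3x^2\,dx$
$= \left[x^3\right]_{1/4}^{1/2} = \left(\frac{1}{2}\right)^3 - \left(\frac{1}{4}\right)^3$
$= \frac{1}{8} - \frac{1}{64} = \frac{7}{64}$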
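7) If the PDF of a random variable X is f(x) = x/2 in $0 \le x \le 2$, find $P(X > 1.5 / X > 1)$.
Solution
$P(X > 1.5 / X > 1) = \frac{P(X > 1.5 \cap X > 1)}{P(X > 1)} = \frac{P(X > 1.5)}{P(X > 1)}$
$= \frac{\int_{1.5}^{2} f(x)\,dx}{\int_{1}^{2} f(x)\,dx} = \frac{\int_{1.5}^{2} \frac{x}{2}\,dx}{\int_{1}^{2} \frac{x}{2}\,dx}$
$= \frac{\frac{1}{2}\left[\frac{x^2}{2}\right]_{1.5}^{2}}{\frac{1}{2}\left[\frac{x^2}{2}\right]_{1}^{2}} = \frac{4 - (1.5)^2}{4 - 1}$
$= \frac{4 - 2.25}{3} = \frac{1.75}{3} = 0.5833$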
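The ratio of integrals for the PDF f(x) = x/2 on [0, 2] can be checked numerically. The sketch below is an illustration in plain Python, not part of the notes; `integrate`, `pdf` and `conditional_tail` are hypothetical helper names, and the midpoint rule is simply one convenient choice of quadrature (it happens to be exact, up to rounding, for a linear integrand).

```python
def integrate(f, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def pdf(x):
    """f(x) = x/2 on [0, 2], zero elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0


def conditional_tail(density, inner, outer, upper):
    """P(X > inner / X > outer) for inner >= outer, with support ending at upper."""
    return integrate(density, inner, upper) / integrate(density, outer, upper)


if __name__ == "__main__":
    print(round(conditional_tail(pdf, 1.5, 1.0, 2.0), 4))
```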
$f(x) = x$, $0 < x < 1$
$= \frac{3}{2}(x - 1)^2$, $1 \le x < 2$
$= 0$, otherwise
Find the cumulative distribution function F(x) of X and use it to find $P\left(\frac{3}{2} < X < \frac{5}{2}\right)$.
Solution
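By definition $F(x) = P(X \le x)$.
For $0 \le x < 1$, $F(x) = \int_{0}^{x} t\,dt = \frac{x^2}{2}$.
For $1 \le x < 2$,
$F(x) = \int_{0}^{1} t\,dt + \int_{1}^{x} \frac{3}{2}(t - 1)^2\,dt$
$= \left[\frac{t^2}{2}\right]_{0}^{1} + \frac{3}{2}\left[\frac{(t - 1)^3}{3}\right]_{1}^{x}$
$= \frac{1}{2} + \frac{(x - 1)^3}{2}$
$= \frac{1}{2} + \frac{x^3 - 3x^2 + 3x - 1}{2}$
$= \frac{x^3 - 3x^2 + 3x}{2}$, $1 \le x < 2$
$F(x) = 1$ when $x \ge 2$
Thus
$F(x) = 0$, $x < 0$
$= \frac{x^2}{2}$, $0 \le x < 1$
$= \frac{x^3 - 3x^2 + 3x}{2}$, $1 \le x < 2$
$= 1$ when $x \ge 2$
To find $P\left(\frac{3}{2} < X < \frac{5}{2}\right)$:
$P\left(\frac{3}{2} < X < \frac{5}{2}\right) = F(5/2) - F(3/2)$
$= 1 - \frac{(3/2)^3 - 3(3/2)^2 + 3(3/2)}{2}$
$= 1 - \frac{(27/8) - (27/4) + (9/2)}{2}$
$= 1 - \frac{1}{2} \cdot \frac{27 - 54 + 36}{8}$
= 1 - (9/16) = 7/16.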
$f(x) = ax$, $0 \le x \le 1$
$= a$, $1 \le x \le 2$
$= 3a - ax$, $2 \le x \le 3$
$= 0$, otherwise
iii) $P(X \le 1.5)$
Solution
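Since f(x) is a PDF of X, we have $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_{0}^{1} ax\,dx + \int_{1}^{2} a\,dx + \int_{2}^{3} (3a - ax)\,dx = 1$
$a\left[\frac{x^2}{2}\right]_{0}^{1} + a\left[x\right]_{1}^{2} + \left[3ax - a\frac{x^2}{2}\right]_{2}^{3} = 1$
$\frac{a}{2}(1 - 0) + a(2 - 1) + \left(9a - \frac{9a}{2} - 6a + 2a\right) = 1$
$\frac{a}{2} + a + \frac{a}{2} = 1$
$\frac{4a}{2} = 1$
a = 1/2.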
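$F(x) = 0$, $x < 0$
For $0 \le x < 1$:
$F(x) = \int_{0}^{x} at\,dt = a\left[\frac{t^2}{2}\right]_{0}^{x} = \frac{a}{2}x^2 = \frac{x^2}{4}$
For $1 \le x < 2$:
$F(x) = \int_{0}^{1} at\,dt + \int_{1}^{x} a\,dt = \frac{a}{2} + ax - a = ax - \frac{a}{2} = a\left(x - \frac{1}{2}\right)$
$= \frac{x}{2} - \frac{1}{4}$
For $2 \le x < 3$:
$F(x) = \int_{0}^{1} at\,dt + \int_{1}^{2} a\,dt + \int_{2}^{x} (3a - at)\,dt$
$= \frac{a}{2}(1 - 0) + (2a - a) + \left(3ax - a\frac{x^2}{2} - 6a + 2a\right)$
$= \frac{a}{2} + a + 3ax - a\frac{x^2}{2} - 4a$
$= -\frac{5a}{2} + 3ax - a\frac{x^2}{2}$
$= -\frac{5}{4} + \frac{3x}{2} - \frac{x^2}{4}$
$F(x) = 1$ when $x \ge 3$
Thus
$F(x) = 0$, $x < 0$
$= \frac{1}{4}x^2$, $0 \le x < 1$
$= \frac{x}{2} - \frac{1}{4}$, $1 \le x < 2$
$= -\frac{5}{4} + \frac{3x}{2} - \frac{x^2}{4}$, $2 \le x < 3$
$= 1$ when $x \ge 3$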
$P(X \le 1.5) = F(1.5) = \frac{1.5}{2} - \frac{1}{4} = \frac{1}{2}$
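7) A random variable X has the distribution function
$F(x) = 0$, $x < 0$
$= x^2$, $0 \le x < \frac{1}{2}$
$= 1 - \frac{3}{25}(3 - x)^2$, $\frac{1}{2} \le x < 3$
$= 1$ when $x \ge 3$
Find the PDF of X and evaluate $P(|X| \le 1)$ and $P\left(\frac{1}{3} \le X < 4\right)$ using both the PDF and the CDF.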
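Solution
W.K.T. $f(x) = \frac{d}{dx} F(x) = F'(x)$
The PDF of X is
$f(x) = 0$, $x < 0$
$= 2x$, $0 \le x < 1/2$
$= \frac{3}{25} \cdot 2(3 - x) = \frac{6}{25}(3 - x)$, $\frac{1}{2} \le x < 3$
$= 0$, $x \ge 3$
Using the PDF:
$P(|X| \le 1) = P(-1 \le X \le 1) = \int_{-1}^{1} f(x)\,dx = \int_{0}^{1/2} 2x\,dx + \int_{1/2}^{1} \frac{6}{25}(3 - x)\,dx = \frac{1}{4} + \frac{27}{100} = \frac{13}{25}$
$P\left(\frac{1}{3} \le X < 4\right) = \int_{1/3}^{4} f(x)\,dx = \int_{1/3}^{1/2} 2x\,dx + \int_{1/2}^{3} \frac{6}{25}(3 - x)\,dx = \frac{5}{36} + \frac{3}{4} = \frac{8}{9}$
Using the CDF:
$P(|X| \le 1) = F(1) - F(-1) = 1 - \frac{3}{25}(2^2) - 0 = 1 - \frac{12}{25}$
$P(|X| \le 1)$ = 13/25.
$P(1/3 \le X < 4) = F(4) - F(1/3) = 1 - \frac{1}{3^2} = \frac{8}{9}$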
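$f(x) = 1/4$, $|x| < 2$
$= 0$, otherwise
Find
i. $P(X < 1)$
ii. $P(|X| > 1)$
iii. $P(2X + 3 > 5)$
Solution
i. $P(X < 1) = \int_{-\infty}^{1} f(x)\,dx = \int_{-2}^{1} \frac{1}{4}\,dx = \frac{1}{4}\left[x\right]_{-2}^{1} = \frac{1}{4}\left(1 - (-2)\right) = \frac{3}{4}$.
ii. $P(|X| > 1) = 1 - P(|X| \le 1)$
$= 1 - P(-1 \le X \le 1)$
$= 1 - \int_{-1}^{1} f(x)\,dx = 1 - \frac{1}{4}\left[x\right]_{-1}^{1} = 1 - \frac{1}{2} = \frac{1}{2}$
iii. To find $P(2X + 3 > 5)$:
$P(2X + 3 > 5) = P(2X > 5 - 3)$
$= P(2X > 2)$
$= P(X > 1)$
$= 1 - P(X \le 1) = 1 - \frac{3}{4} = 1/4$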
MOMENTS
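If X is a random variable which is discrete or continuous, the r-th moment about the origin, denoted by
$\mu'_r$, is defined as
$\mu'_r = E(X^r)$ for r = 1, 2, 3, …
and the r-th moment about the mean, denoted by $\mu_r$, is defined as
$\mu_r = E[(X - \bar{X})^r]$, for r = 1, 2, 3, …
If X is a discrete random variable which can assume any one of the values x1, x2, …, xn with respective
probabilities p(x1), p(x2), …, p(xn), then
$\mu'_r = E(X^r) = \sum_{x} x^r p(x)$
and $\mu_r = E[(X - \bar{X})^r] = \sum_{x} (x - \bar{x})^r p(x)$.
If X is a continuous random variable with PDF f(x), then
$\mu'_r = \int_{-\infty}^{\infty} x^r f(x)\,dx$, r = 1, 2, 3, …
and $\mu_r = \int_{-\infty}^{\infty} (x - \bar{x})^r f(x)\,dx$, r = 1, 2, 3, …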
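By definition,
$\mu_r = E[(X - \bar{X})^r]$
$= E\left[X^r - {}^{r}C_1 X^{r-1}\bar{X} + {}^{r}C_2 X^{r-2}\bar{X}^2 - {}^{r}C_3 X^{r-3}\bar{X}^3 + \dots + (-1)^r \bar{X}^r\right]$
$= E(X^r) - r E(X^{r-1})\bar{X} + {}^{r}C_2 E(X^{r-2})\bar{X}^2 - {}^{r}C_3 E(X^{r-3})\bar{X}^3 + \dots + (-1)^r \bar{X}^r$
Since $E(X) = \bar{X} = \mu'_1$ we have
$\mu_r = \mu'_r - {}^{r}C_1 \mu'_{r-1}\mu'_1 + {}^{r}C_2 \mu'_{r-2}(\mu'_1)^2 - {}^{r}C_3 \mu'_{r-3}(\mu'_1)^3 + \dots + (-1)^r (\mu'_1)^r$
Results
1. The first moment about the mean is always zero, since $\mu_1 = \mu'_1 - \mu'_0 \mu'_1 = 0$.
2. The first moment about the origin is the mean.
3. The second moment about the mean is
$\mu_2 = \mu'_2 - 2\mu'_1\mu'_1 + (\mu'_1)^2 = \mu'_2 - (\mu'_1)^2$
$Var(X) = E(X^2) - [E(X)]^2$
Similarly,
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$ and so on.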
Relation between moments about any point A and moments about the mean $\bar{X}$.
W.K.T. $\mu'_r = E[(X - A)^r]$
Putting r = 1,
$\mu'_1 = E(X - A) = \bar{X} - A \Rightarrow$ Mean $= \bar{X} = \mu'_1 + A$
Putting r = 2,
$\mu_2 = \mu'_2 - (\mu'_1)^2$
Similarly we get,
Properties of Moments
1. $E(aX + b) = aE(X) + b$
Note: $E(X + Y) = E(X) + E(Y)$
2. If X is a random variable then $Var(aX + b) = a^2 Var(X)$
3. If X and Y are independent then $Var(aX \pm bY) = a^2 Var(X) + b^2 Var(Y)$
Covariance (X,Y)
Cov(X,Y)=E(XY)-E(X)E(Y)
Note:
Various measures of Central tendency, dispersion, skewness and kurtosis for continuous
probability distribution
Let f(x) be the P.D.F of a random variable X where X is defined from a to b. Then
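Arithmetic Mean $= \bar{x} = \int_{a}^{b} x f(x)\,dx$
Harmonic Mean H: $\frac{1}{H} = \int_{a}^{b} \frac{1}{x} f(x)\,dx$
Geometric Mean G: $\log G = \int_{a}^{b} \log x \, f(x)\,dx$
Median: Median is the point which divides the entire distribution in two equal parts. In case of a
continuous distribution median is the point which divides the total area into two equal parts. Thus if M is the
median then
$\int_{a}^{M} f(x)\,dx = \int_{M}^{b} f(x)\,dx = \frac{1}{2}$
Mean deviation: Mean deviation about the mean $\mu'_1$ is given by
$M.D. = \int_{a}^{b} |x - \text{mean}|\, f(x)\,dx$
M.D. about A $= \int_{a}^{b} |x - A|\, f(x)\,dx$
Skewness and kurtosis: $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$; $\beta_2 = \frac{\mu_4}{\mu_2^2}$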
Definition
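$M_X(t) = E(e^{tX})$
$M_X(t) = E\left[1 + \frac{tX}{1!} + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots\right]$
$= E(1) + \frac{t}{1!}E(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \dots + \frac{t^r}{r!}E(X^r) + \dots$
$= 1 + \frac{t}{1!}\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \dots + \frac{t^r}{r!}\mu'_r + \dots$
$= \sum_{r=0}^{\infty} \frac{t^r}{r!}\mu'_r$
The coefficient of $\frac{t^r}{r!}$ in $M_X(t)$ is $\mu'_r = E(X^r)$, where r = 1, 2, …, the r-th moment about the
origin.
The M.G.F. about a point a is
$M_{X=a}(t) = E(e^{t(X - a)}) = E\left[1 + \frac{t(X - a)}{1!} + \frac{t^2(X - a)^2}{2!} + \frac{t^3(X - a)^3}{3!} + \dots\right]$
$= 1 + \frac{t}{1!}\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \dots$, where $\mu'_r = E[(X - a)^r]$.
For a discrete random variable, $M_X(t) = E(e^{tX}) = \sum_{x} e^{tx} p(x)$
For a continuous random variable, $M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$
$\mu'_1 = \left[\frac{d}{dt} M_X(t)\right]_{t=0}$
$\mu'_2 = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0}$
In general,
$\mu'_r = \left[\frac{d^r}{dt^r} M_X(t)\right]_{t=0}$, r = 1, 2, 3, …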
Note:
3. The moment generating function of the sum of a given number of independent random variables is
equal to the product of their respective moment generating function
i.e.,
$M_{X_1 + X_2 + \dots + X_n}(t) = M_{X_1}(t) \cdot M_{X_2}(t) \cdots M_{X_n}(t)$.
4. Mean $= \mu'_1$
5. Variance $= \mu'_2 - (\mu'_1)^2$
Exercises
X 0 1 2 3 4 5 6
Solution
$M_X(t) = \sum_{x=0}^{6} e^{tx} p(x)$
$= e^{0}(1/49) + e^{t}(3/49) + e^{2t}(5/49) + e^{3t}(7/49) + e^{4t}(9/49) + e^{5t}(11/49) + e^{6t}(13/49)$
$= \frac{1}{49}\left(1 + 3e^{t} + 5e^{2t} + 7e^{3t} + 9e^{4t} + 11e^{5t} + 13e^{6t}\right)$
2) A random variable X has the probability function
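$f(x) = \frac{1}{2^x}$, x = 1, 2, 3, …
Solution
M.G.F. $= \sum_{x=1}^{\infty} e^{tx} p(x) = \sum_{x=1}^{\infty} e^{tx} f(x)$
$= \sum_{x=1}^{\infty} e^{tx} \frac{1}{2^x} = \sum_{x=1}^{\infty} \left(\frac{e^t}{2}\right)^x$
$= \frac{e^t}{2} + \left(\frac{e^t}{2}\right)^2 + \left(\frac{e^t}{2}\right)^3 + \dots$
$= \frac{e^t}{2}\left[1 + \frac{e^t}{2} + \left(\frac{e^t}{2}\right)^2 + \dots\right]$
$= \frac{e^t}{2}\left(1 - \frac{e^t}{2}\right)^{-1} = \frac{e^t}{2} \cdot \frac{2}{2 - e^t}$
$M_X(t) = \frac{e^t}{2 - e^t}$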
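Mean $= \mu'_1 = \left[\frac{d}{dt}\left(\frac{e^t}{2 - e^t}\right)\right]_{t=0}$
$= \left[\frac{(2 - e^t)e^t - e^t(-e^t)}{(2 - e^t)^2}\right]_{t=0}$
$= \left[\frac{2e^t - e^{2t} + e^{2t}}{(2 - e^t)^2}\right]_{t=0} = \left[\frac{2e^t}{(2 - e^t)^2}\right]_{t=0} = 2$.
Variance $= \mu'_2 - (\mu'_1)^2$
$\mu'_2 = \left[\frac{d^2}{dt^2}\left(\frac{e^t}{2 - e^t}\right)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{2e^t}{(2 - e^t)^2}\right)\right]_{t=0}$
$= \left[\frac{2e^t(2 - e^t) + 2(2)(e^t)(e^t)}{(2 - e^t)^3}\right]_{t=0}$
$= \frac{2 + 4}{1} = 6$.
Hence variance $= \mu'_2 - (\mu'_1)^2$
= 6 - 4 = 2.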
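A quick numerical check of the moments of the distribution p(x) = 1/2^x, x = 1, 2, 3, … is possible by summing the defining series directly instead of differentiating the M.G.F. The Python sketch below is illustrative and not part of the notes; the function names and the truncation at 200 terms are assumptions (the neglected tail is far below floating-point precision). The truncated sums give a mean of 2, a second moment of 6 and therefore a variance of 2, and the series for the M.G.F. agrees with the closed form e^t/(2 - e^t) for values of t with e^t < 2.

```python
from math import exp


def half_power_moments(terms=200):
    """Mean and variance of p(x) = 1/2**x, x = 1, 2, ..., by direct summation."""
    mean = sum(x * 0.5 ** x for x in range(1, terms + 1))
    second = sum(x * x * 0.5 ** x for x in range(1, terms + 1))
    return mean, second - mean ** 2


def mgf_series(t, terms=200):
    """Truncated series sum of e^{tx} / 2^x; converges only when e^t < 2."""
    return sum((exp(t) / 2) ** x for x in range(1, terms + 1))


def mgf_closed_form(t):
    """Closed form e^t / (2 - e^t) derived in the solution."""
    return exp(t) / (2 - exp(t))


if __name__ == "__main__":
    print(half_power_moments())
    print(mgf_series(0.1), mgf_closed_form(0.1))
```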
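$f(x) = 2/3$ at x = 1
$= 1/3$ at x = 2
$= 0$, otherwise
Solution
M.G.F. $= E(e^{tX}) = \sum e^{tx} p(x)$
$M_X(t) = e^{t}(2/3) + e^{2t}(1/3) + 0$
$\mu'_1 = \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{2e^t}{3} + \frac{e^{2t}}{3}\right)\right]_{t=0}$
$= \left[\frac{2e^t}{3} + \frac{2e^{2t}}{3}\right]_{t=0} = \frac{2}{3} + \frac{2}{3} = \frac{4}{3}$
Variance $= \mu'_2 - (\mu'_1)^2$
$\mu'_2 = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{2e^t}{3} + \frac{2e^{2t}}{3}\right)\right]_{t=0}$
$= \left[\frac{2e^t}{3} + \frac{4e^{2t}}{3}\right]_{t=0} = \frac{2}{3} + \frac{4}{3} = \frac{6}{3}$
Variance $= \mu'_2 - (\mu'_1)^2 = 6/3 - (4/3)^2$
$= \frac{6}{3} - \frac{16}{9} = \frac{18 - 16}{9} = 2/9$.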
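$M_X(t) = \frac{e^t}{3} + \frac{4e^{3t}}{15} + \frac{2e^{4t}}{15} + \frac{4e^{5t}}{15}$
Solution
For a discrete random variable, $M_X(t) = E(e^{tX}) = \sum e^{tx} p(x)$, so p(x) is the coefficient of $e^{tx}$.
Hence the probability function is given by
x:     1     3      4      5
p(x):  1/3   4/15   2/15   4/15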
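5) Find the M.G.F. of the random variable whose moments are $\mu'_r = (r + 1)!\,3^r$ and hence find its
mean.
Solution
The M.G.F. is $M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\mu'_r$
$= \sum_{r=0}^{\infty} \frac{t^r}{r!}(r + 1)!\,3^r$
$= \sum_{r=0}^{\infty} \frac{(3t)^r}{r!}(r + 1)!$
$= \sum_{r=0}^{\infty} (3t)^r (r + 1)$
$= (1 - 3t)^{-2}$
$M_X(t) = \frac{1}{(1 - 3t)^2}$
Mean $= \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{(-2)(-3)}{(1 - 3t)^3}\right]_{t=0} = \left[\frac{6}{(1 - 3t)^3}\right]_{t=0}$.
Mean = 6
(Or) Mean $= \mu'_1 =$ coefficient of $\frac{t}{1!}$ = 6.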
$f(x) = k e^{-kx}$, $x > 0$
$= 0$, otherwise
Hence find i) Mean, ii) Variance, iii) $\mu'_3$, iv) $\mu'_4$.
Solution
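Given
$f(x) = k e^{-kx}$, $x > 0$
$= 0$, otherwise
M.G.F. $M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$
$= \int_{0}^{\infty} e^{tx} k e^{-kx}\,dx = k \int_{0}^{\infty} e^{(t - k)x}\,dx$
$= k\left[\frac{e^{(t - k)x}}{t - k}\right]_{0}^{\infty}$
$= \frac{k}{t - k}(0 - 1)$, for $t < k$
$M_X(t) = \frac{k}{k - t}$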
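Mean $= \mu'_1 = \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{k}{k - t}\right)\right]_{t=0}$
$= \left[\frac{-k(-1)}{(k - t)^2}\right]_{t=0} = \frac{k}{k^2} = \frac{1}{k}$.
Variance $= \mu'_2 - (\mu'_1)^2$
$\mu'_2 = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{k}{(k - t)^2}\right)\right]_{t=0}$
$= \left[\frac{-k \cdot 2(-1)}{(k - t)^3}\right]_{t=0} = \left[\frac{2k}{(k - t)^3}\right]_{t=0} = \frac{2k}{k^3} = \frac{2}{k^2}$
Variance $= \mu'_2 - (\mu'_1)^2 = \frac{2}{k^2} - \left(\frac{1}{k}\right)^2 = \frac{1}{k^2}$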
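$\mu'_3 = \left[\frac{d^3}{dt^3} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{2k}{(k - t)^3}\right)\right]_{t=0}$
$= \left[\frac{-2k(3)(-1)}{(k - t)^4}\right]_{t=0} = \frac{6k}{k^4} = \frac{6}{k^3}$
$\mu'_4 = \left[\frac{d^4}{dt^4} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\frac{6k}{(k - t)^4}\right)\right]_{t=0}$
$= \left[\frac{-6k(4)(-1)}{(k - t)^5}\right]_{t=0} = \left[\frac{24k}{(k - t)^5}\right]_{t=0}$
$= \frac{24k}{k^5} = \frac{24}{k^4}$.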
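$M_X(t) = \frac{3}{3 - t} = \frac{3}{3(1 - t/3)}$
$= \left(1 - \frac{t}{3}\right)^{-1}$
$= 1 + \frac{t}{3} + (t/3)^2 + (t/3)^3 + \dots$
$= 1 + \frac{t}{1!}(1/3) + \frac{t^2}{2!}(2/9) + \frac{t^3}{3!}(6/27) + \dots$
$\mu'_r$ = coefficient of $\frac{t^r}{r!}$
$\mu'_1$ = coefficient of $\frac{t}{1!}$ = 1/3
$\mu'_2$ = coefficient of $\frac{t^2}{2!}$ = 2/9
Variance $= \mu'_2 - (\mu'_1)^2 = \frac{2}{9} - \left(\frac{1}{3}\right)^2 = \frac{1}{9}$
Standard deviation $= \sqrt{\text{Variance}} = \frac{1}{3}$.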
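8) Find the m.g.f. of the distribution defined by $dF = \frac{1}{2} e^{-|x|}\,dx$, $-\infty < x < \infty$, and hence find the
variance.
Solution
$dF = \frac{1}{2} e^{-|x|}\,dx$
$\frac{dF}{dx} = \frac{1}{2} e^{-|x|}$.
Now $\frac{dF}{dx} = f(x)$, i.e., $f(x) = \frac{1}{2} e^{-|x|}$.
$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$
$= \int_{-\infty}^{0} e^{tx} \frac{1}{2} e^{x}\,dx + \int_{0}^{\infty} e^{tx} \frac{1}{2} e^{-x}\,dx$
$= \frac{1}{2}\int_{-\infty}^{0} e^{(t + 1)x}\,dx + \frac{1}{2}\int_{0}^{\infty} e^{-(1 - t)x}\,dx$
$= \frac{1}{2}\left[\frac{e^{(t + 1)x}}{t + 1}\right]_{-\infty}^{0} + \frac{1}{2}\left[\frac{e^{-(1 - t)x}}{-(1 - t)}\right]_{0}^{\infty}$
$= \frac{1}{2} \cdot \frac{1}{1 + t}(1 - 0) - \frac{1}{2} \cdot \frac{1}{1 - t}(0 - 1)$, for $|t| < 1$
$= \frac{2}{2(1 - t^2)} = \frac{1}{1 - t^2} = (1 - t^2)^{-1}$
$= 1 + t^2 + (t^2)^2 + (t^2)^3 + \dots$
$\mu'_1$ = coefficient of $t/1!$ = 0
$\mu'_2$ = coefficient of $t^2/2!$ = 2
Variance $= \mu'_2 - (\mu'_1)^2 = 2 - 0 = 2$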
UNIT-II
PROBABILITY DISTRIBUTIONS
Introduction
DISCRETE DISTRIBUTIONS
Let A be an event (trial) associated with a random experiment such that P(A) remains the
same for the repetitions of that random experiment; then the trials are called Bernoulli trials.
A random variable X which takes only two values, either 1 (success) or 0 (failure), with
probability p and q respectively, i.e., P(X=1)=p, P(X=0)=q, p+q=1, is called a Bernoulli variate and is
said to have a Bernoulli distribution.
$\mu'_r = E(X^r) = 1^r \cdot p + 0^r \cdot q = p$
$\mu'_1 = E(X) = p$, $\mu'_2 = E(X^2) = p$
Mean = p
$Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p) = pq$
Definition.
A random variable X is said to follow binomial distribution denoted by B(n,p) if it assumes only non-
negative values and its probability mass function is given by
$p(x) = P(X = x) = {}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, …, n
= 0, otherwise
Suppose that n trials constitute an experiment. If this experiment is repeated N times, the
frequency function of the binomial distribution is given by
$N p(x) = N\,{}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, ..., n
1. Each trial results in two mutually disjoint outcomes, termed success and failure.
Mean $= E(X) = \sum_{x} x\,p(x)$
$= \sum_{x=0}^{n} x\,{}^{n}C_x\, p^x q^{n-x}$
$= \sum_{x=0}^{n} x \frac{n!}{x!(n - x)!} p^x q^{n-x}$
$= \sum_{x=1}^{n} \frac{n(n - 1)!}{(x - 1)!(n - x)!}\, p\, p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!(n - x)!}\, p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} {}^{(n-1)}C_{x-1}\, p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} {}^{(n-1)}C_{x-1}\, p^{x-1} q^{(n-1)-(x-1)}$
$= np(q + p)^{n-1}$
Mean = np
$Var(X) = E(X^2) - [E(X)]^2$
$P(X = x) = p(x) = {}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, ..., n
$E(X^2) = \sum_{x=0}^{n} x^2 p(x)$
$= \sum_{x=0}^{n} x^2\, {}^{n}C_x\, p^x q^{n-x}$
$= \sum_{x=0}^{n} \left[x(x - 1) + x\right] \frac{n!}{x!(n - x)!}\, p^x q^{n-x}$
$= \sum_{x=0}^{n} x(x - 1) \frac{n!}{x!(n - x)!}\, p^x q^{n-x} + \sum_{x=0}^{n} x \frac{n!}{x!(n - x)!}\, p^x q^{n-x}$
$= \sum_{x=2}^{n} \frac{n(n - 1)(n - 2)!}{(x - 2)!(n - x)!}\, p^2 p^{x-2} q^{n-x} + E(X)$
$= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!(n - x)!}\, p^{x-2} q^{n-x} + np$
$= n(n - 1)p^2 (q + p)^{n-2} + np$
$E(X^2) = n(n - 1)p^2 + np$
$Var(X) = n(n - 1)p^2 + np - n^2p^2$
$= p^2(n^2 - n) - n^2p^2 + np$
$= np(1 - p)$
$= npq$
Note
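i. $\mu_3 = npq(q - p)$
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, ..., n
$M_X(t) = E(e^{tX})$
$= \sum_{x=0}^{n} e^{tx}\, {}^{n}C_x\, p^x q^{n-x}$
$= \sum_{x=0}^{n} {}^{n}C_x\, (pe^t)^x q^{n-x}$
$= q^n + {}^{n}C_1 (pe^t)^1 q^{n-1} + {}^{n}C_2 (pe^t)^2 q^{n-2} + \dots + (pe^t)^n$
$= (q + pe^t)^n$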
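$E(X) = \mu'_1 = \left[\frac{d}{dt} M_X(t)\right]_{t=0}$
$= \left[\frac{d}{dt}(q + pe^t)^n\right]_{t=0}$
$= \left[n(q + pe^t)^{n-1} pe^t\right]_{t=0}$
$= np(p + q)^{n-1} = np$
$E(X^2) = \mu'_2 = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0}$
$= \left[\frac{d^2}{dt^2}(q + pe^t)^n\right]_{t=0}$
$= \left[npe^t(q + pe^t)^{n-1} + n(n - 1)(q + pe^t)^{n-2}(pe^t)^2\right]_{t=0}$
$= np(q + p)^{n-1} + n(n - 1)(q + p)^{n-2}p^2$
$= np + n(n - 1)p^2$
$= np + n^2p^2 - np^2$
$= np(1 - p) + n^2p^2$
$E(X^2) = npq + n^2p^2$
$Var(X) = E(X^2) - [E(X)]^2$
$= npq + n^2p^2 - n^2p^2$
$= npq$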
By definition,
$M_{X - np}(t) = E[e^{t(X - np)}]$
$= E[e^{tX - tnp}]$
$= E[e^{tX} e^{-tnp}]$
$= e^{-npt} E[e^{tX}]$
$= e^{-npt} M_X(t) = e^{-npt}(q + pe^t)^n$
If X1 and X2 are two independent binomial variates with parameters (n1, p) and (n2, p)
respectively, then X1 + X2 is a binomial variate with parameters (n1 + n2, p).
Proof
$M_{X_1 + X_2}(t) = M_{X_1}(t) M_{X_2}(t)$
$= (q + pe^t)^{n_1} (q + pe^t)^{n_2}$
$= (q + pe^t)^{n_1 + n_2}$
This shows that X1 + X2 is also a binomial variate with parameters n1 + n2 and p.
Note
If X1 and X2 are two independent binomial variates with parameters (n1,p1) and (n2,p2) then
X1+X2 is not a binomial variate.
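\[ \mu_k = E[X - E(X)]^k = \sum_{x=0}^{n} (x-np)^k\, {}^{n}C_{x}\, p^x q^{n-x} \qquad \text{(i)} \]
Differentiating (i) with respect to p (with q = 1 - p),
\[
\begin{aligned}
\frac{d\mu_k}{dp} &= \sum_{x=0}^{n} {}^{n}C_{x} \left\{ -nk(x-np)^{k-1} p^x q^{n-x} + (x-np)^k \left[ x p^{x-1} q^{n-x} + p^x (n-x) q^{n-x-1}(-1) \right] \right\} \\
&= -nk\,\mu_{k-1} + \sum_{x=0}^{n} {}^{n}C_{x}\, (x-np)^k p^{x-1} q^{n-x-1} [xq - (n-x)p] \\
&= -nk\,\mu_{k-1} + \frac{1}{pq} \sum_{x=0}^{n} {}^{n}C_{x}\, p^x q^{n-x} (x-np)^k (x-np) \qquad [\text{since } p+q=1] \\
&= -nk\,\mu_{k-1} + \frac{1}{pq} \sum_{x=0}^{n} {}^{n}C_{x}\, p^x q^{n-x} (x-np)^{k+1} \\
&= -nk\,\mu_{k-1} + \frac{\mu_{k+1}}{pq}
\end{aligned}
\]
Therefore,
\[ \mu_{k+1} = pq\left[\frac{d\mu_k}{dp} + nk\,\mu_{k-1}\right] \qquad \text{(ii)} \]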
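Using recurrence relation (ii) we can compute moments of higher order, provided the
moments of lower order are known.
\[ \mu_2 = pq\left[\frac{d\mu_1}{dp} + n\mu_0\right] = npq \qquad [\text{since } \mu_0 = 1,\ \mu_1 = 0] \]
Therefore, \( \mu_2 = npq \).
\[
\begin{aligned}
\mu_3 &= pq\left[\frac{d\mu_2}{dp} + 2n\mu_1\right] \\
&= pq\left[\frac{d}{dp}(npq) + 2n(0)\right] \\
&= pq\,\frac{d}{dp}[np(1-p)] \qquad [\text{since } q = 1-p] \\
&= pq\,\frac{d}{dp}[np - np^2] \\
&= pq\,(n - 2np) \\
&= npq - 2np^2q \\
\mu_3 &= npq(1-2p)
\end{aligned}
\]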
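\[
\begin{aligned}
\mu_4 &= pq\left[\frac{d\mu_3}{dp} + 3n\mu_2\right] \\
&= pq\left[\frac{d}{dp}\{npq(1-2p)\} + 3n(npq)\right] \\
&= pq\left[\frac{d}{dp}\{np(1-p)(1-2p)\} + 3n^2pq\right] \\
&= pq\left[\frac{d}{dp}\{(np - np^2)(1-2p)\} + 3n^2pq\right] \\
&= pq\left[\frac{d}{dp}\{np - 2np^2 - np^2 + 2np^3\} + 3n^2pq\right] \\
&= pq\left[n - 4np - 2np + 6np^2 + 3n^2pq\right] \\
&= pq\left[n - 6np + 6np^2 + 3n^2pq\right] \\
&= npq\left[1 - 6p + 6p^2 + 3npq\right] \\
&= npq\left[1 - 6p + 6p(1-q) + 3npq\right] \\
&= npq\left[1 - 6pq + 3npq\right] \\
\mu_4 &= npq\left[1 + 3pq(n-2)\right]
\end{aligned}
\]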
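The closed forms obtained from the recurrence relation can be compared with central moments computed directly from the definition in (i). This is a small sketch under illustrative assumptions: the function name `central_moment` and the test values n = 10, p = 0.3 are not from the notes.

```python
from math import comb


def central_moment(n, p, k):
    """k-th central moment of a binomial(n, p) variate, summed from the pmf."""
    q = 1.0 - p
    mu = n * p
    return sum((x - mu) ** k * comb(n, x) * p**x * q ** (n - x)
               for x in range(n + 1))


n, p = 10, 0.3
q = 1.0 - p

# Closed forms derived above from the recurrence relation (ii).
mu2_formula = n * p * q
mu3_formula = n * p * q * (1 - 2 * p)
mu4_formula = n * p * q * (1 + 3 * p * q * (n - 2))

for k, formula in ((2, mu2_formula), (3, mu3_formula), (4, mu4_formula)):
    print(k, central_moment(n, p, k), formula)
```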
Note
μ2 is the variance
μ4 is a measure of kurtosis
We sometimes denote the measures of skewness and kurtosis by β1 and β2:
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \]
Examples
1. The mean and variance of a binomial distribution are 4 and 4/3 respectively. Find P(X≥1) if n=6.
Solution
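\[ \frac{npq}{np} = \frac{4/3}{4} \;\Rightarrow\; q = \frac{1}{3}, \qquad p = 1 - q = \frac{2}{3} \]
Given n = 6,
\[ P(X=x) = {}^{n}C_{x}\, p^x q^{n-x} \]
\[
\begin{aligned}
P(X \ge 1) &= 1 - P(X < 1) \\
&= 1 - P(X = 0) \\
&= 1 - {}^{6}C_{0}\, p^0 q^{6-0} \\
&= 1 - q^6 \\
&= 1 - \left(\tfrac{1}{3}\right)^6 \\
&= 1 - \tfrac{1}{729} = \tfrac{728}{729}
\end{aligned}
\]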
2. The mean and variance of binomial distributions are 4 and 3 respectively. Find P(X=0), P(X=1) and
P(X≥2).
Solution
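\[ \frac{npq}{np} = \frac{3}{4} \;\Rightarrow\; q = \frac{3}{4}, \qquad p = 1 - q = \frac{1}{4} \]
Since Mean = np = 4,
n(1/4) = 4
n = 16
\[ P(X=x) = {}^{n}C_{x}\, p^x q^{n-x} \]
\[ P(X=0) = {}^{16}C_{0}\, p^0 q^{16} = \left(\tfrac{3}{4}\right)^{16} = 0.01 \]
\[ P(X=1) = {}^{16}C_{1}\, p^1 q^{15} = 16\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right)^{15} = 0.053 \]
\[ P(X \ge 2) = 1 - P(X < 2) = 1 - [P(X=0) + P(X=1)] = 0.937 \]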
3. If the mean is 3 and variance is 4 of a random variable X, check whether X follows binomial
distribution,
Solution
No. For a binomial distribution the variance npq equals the mean np multiplied by q < 1, so the mean must be
greater than the variance. Here the variance (4) exceeds the mean (3), so X cannot follow a binomial distribution.
3. A binomial variate X satisfies the relation 9P(X=4) = P(X=2) when n=6. Find the parameter p of
the binomial distribution.
Solution
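\[ P(X=x) = {}^{n}C_{x}\, p^x q^{n-x} \]
\[ P(X=4) = {}^{6}C_{4}\, p^4 q^{2}, \qquad P(X=2) = {}^{6}C_{2}\, p^2 q^{4} \]
\[
\begin{aligned}
9 \cdot {}^{6}C_{4}\, p^4 q^2 &= {}^{6}C_{2}\, p^2 q^4 \\
135\, p^2 &= 15\, q^2 \\
9p^2 &= q^2 \\
9p^2 - (1-p)^2 &= 0 \\
9p^2 - (1 + p^2 - 2p) &= 0 \\
8p^2 + 2p - 1 &= 0
\end{aligned}
\]
\[ p = \frac{-2 \pm \sqrt{4 + 32}}{16} = \frac{-2 \pm 6}{16} = \frac{4}{16},\ \frac{-8}{16} \;\Rightarrow\; p = \frac{1}{4},\ -\frac{1}{2} \]
Since p cannot be negative, p = 1/4.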
4. Out of 800 families with 4 children each, how many families would be expected to have
Solution
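Considering each child as a trial, n = 4. Assuming that the birth of a boy is a success, p = 1/2 and q = 1/2.
\[ P(X=x) = {}^{n}C_{x}\, p^x q^{n-x} \]
\[ P(X=2) = {}^{4}C_{2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^{4-2} = 6\left(\tfrac{1}{2}\right)^4 = \tfrac{3}{8} \]
Expected number of families = 800(3/8) = 300
\[ P(X \ge 1) = 1 - P(X=0) = 1 - {}^{4}C_{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^{4} = 1 - \left(\tfrac{1}{2}\right)^4 = \tfrac{15}{16} \]
\[
\begin{aligned}
P(X \ge 2) &= 1 - [P(X=0) + P(X=1)] \\
&= 1 - \left[{}^{4}C_{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^{4} + {}^{4}C_{1}\left(\tfrac{1}{2}\right)^1\left(\tfrac{1}{2}\right)^{3}\right] \\
&= 1 - \left[\left(\tfrac{1}{2}\right)^4 + 4\left(\tfrac{1}{2}\right)^4\right] = 1 - \left(\tfrac{1}{16} + \tfrac{4}{16}\right) = 1 - \tfrac{5}{16} = \tfrac{11}{16}
\end{aligned}
\]
\[
\begin{aligned}
P(1 \le X \le 3) &= 1 - [P(X=4) + P(X=0)] \\
&= 1 - \left[{}^{4}C_{4}\left(\tfrac{1}{2}\right)^4 + {}^{4}C_{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^4\right] = 1 - \left[\left(\tfrac{1}{2}\right)^4 + \left(\tfrac{1}{2}\right)^4\right] \\
&= 1 - \tfrac{2}{16} = \tfrac{7}{8}
\end{aligned}
\]
Expected number of families = 800(7/8) = 700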
5. An irregular 6 faced die is such that the probability that it gives 3 even numbers in 5 throws is twice
the probability that it gives 2 even numbers in 5 throws. How many sets of exactly 5 trials can be
expected to give no even number out of 2500 sets.
Solution
Let the probability of getting an even number with the unfair die be p.
\[ {}^{5}C_{3}\, p^3 q^2 = 2 \cdot {}^{5}C_{2}\, p^2 q^3 \]
p = 2q
p = 2(1-p)
3p = 2
p = 2/3
q = 1-p = 1/3
\[ P(X=0) = {}^{5}C_{0}\, p^0 q^5 = \left(\tfrac{1}{3}\right)^5 = \tfrac{1}{243} \]
Therefore the number of sets having no success (even number) out of N sets = N [ P(X=0) ]
= 2500 * 1/243
= 10 nearly
6. A communication system consists of n components, each of which will independently function with
probability p. The total system will be able to operate effectively if at least one half of its components
function. For what values of p is a 5 component system more likely to operate effectively than a 3
component system?
Solution
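Since p is a constant and the n components function independently, the number of components X that
function follows a binomial distribution
\[ p(x) = {}^{n}C_{x}\, p^x q^{n-x},\quad x = 0,1,2,\ldots,n \]
P(5 component system operates effectively) = P(X=3) + P(X=4) + P(X=5)
\[ = {}^{5}C_{3}\, p^3 q^{2} + {}^{5}C_{4}\, p^4 q + {}^{5}C_{5}\, p^5 = 10p^3q^2 + 5p^4q + p^5 \]
P(3 component system operates effectively) = P(X=2) + P(X=3)
\[ = {}^{3}C_{2}\, p^2 q + {}^{3}C_{3}\, p^3 = 3p^2q + p^3 \]
The 5 component system is more likely to operate effectively than the 3 component system if
\[
\begin{aligned}
10p^3q^2 + 5p^4q + p^5 &> 3p^2q + p^3 \\
p^2(10pq^2 + 5p^2q + p^3 - 3q - p) &> 0 \\
p^2(10p(1 + p^2 - 2p) + 5p^2 - 5p^3 + p^3 - 3 + 3p - p) &> 0 \\
p^2(10p + 10p^3 - 20p^2 + 5p^2 - 5p^3 + p^3 - 3 + 2p) &> 0 \\
p^2(6p^3 - 15p^2 + 12p - 3) &> 0 \\
3p^2(2p^3 - 5p^2 + 4p - 1) &> 0 \\
6p^2\left(p - \tfrac{1}{2}\right)(p-1)^2 &> 0
\end{aligned}
\]
Since \( 6p^2(p-1)^2 \ge 0 \), we need p - 1/2 > 0.
That is, p > 1/2. (At p = 1/2 the two systems are equally likely to operate effectively.)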
7. At least one half of an airplane's engines are required to function in order for it to operate. If each
engine independently functions with probability p, for what values of p is a 4 engine plane to be
preferred for operations to a 2 engine plane?
Solution
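P(2 engine plane operates) = P(X=1) + P(X=2)
\[ = {}^{2}C_{1}\, p q + p^2 = 2p(1-p) + p^2 = 2p - 2p^2 + p^2 = 2p - p^2 = p(2-p) \]
P(4 engine plane operates) = P(X ≥ 2)
= 1 - P(X<2)
= 1 - [P(X=0) + P(X=1)]
\[
\begin{aligned}
&= 1 - [q^4 + {}^{4}C_{1}\, p q^3] \\
&= 1 - q^2[q^2 + 4pq] \\
&= 1 - (1-p)^2[(1-p)^2 + 4p(1-p)] \\
&= 1 - (1-p)^2[1 + p^2 - 2p + 4p - 4p^2] \\
&= 1 - (1-p)^2[1 + 2p - 3p^2] \\
&= 1 - (1 + p^2 - 2p)(1 + 2p - 3p^2) \\
&= 1 - (1 - 6p^2 + 8p^3 - 3p^4) \\
&= 3p^4 - 8p^3 + 6p^2 \\
&= p^2(3p^2 - 8p + 6)
\end{aligned}
\]
The 4 engine plane is preferred if
\[
\begin{aligned}
p^2(3p^2 - 8p + 6) &> p(2-p) \\
3p^4 - 8p^3 + 6p^2 &> 2p - p^2 \\
3p^4 - 8p^3 + 7p^2 - 2p &> 0 \\
p(3p^3 - 8p^2 + 7p - 2) &> 0 \\
3p^3 - 8p^2 + 7p - 2 &> 0 \\
(p-1)(3p^2 - 5p + 2) &> 0
\end{aligned}
\]
Now 3p^2 - 5p + 2 = 3p(p-1) - 2(p-1) = (3p-2)(p-1), so the condition becomes
\[ (p-1)(3p-2)(p-1) > 0, \quad \text{i.e.} \quad (p-1)^2(3p-2) > 0 \]
Since (p-1)^2 > 0 for p ≠ 1, we need 3p - 2 > 0, that is, p > 2/3.
Hence the 4 engine plane is preferred when 2/3 < p < 1.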
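The thresholds p = 1/2 (components) and p = 2/3 (engines) found in the two examples above can be checked by evaluating the binomial tail probabilities on either side of each threshold. This is a rough sketch; the function names `prob_at_least` and `system_works` are illustrative and not part of the notes.

```python
from math import comb


def prob_at_least(n, p, k):
    """P(X >= k) for a binomial(n, p) variate."""
    q = 1.0 - p
    return sum(comb(n, x) * p**x * q ** (n - x) for x in range(k, n + 1))


def system_works(n, p):
    """Probability that at least one half of n independent units function."""
    k = (n + 1) // 2  # smallest integer k with k >= n/2
    return prob_at_least(n, p, k)


# 5 components versus 3 components: the threshold is p = 1/2.
print(system_works(5, 0.6) > system_works(3, 0.6))   # True
print(system_works(5, 0.4) > system_works(3, 0.4))   # False

# 4 engines versus 2 engines: the threshold is p = 2/3.
print(system_works(4, 0.7) > system_works(2, 0.7))   # True
print(system_works(4, 0.6) > system_works(2, 0.6))   # False
```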
8. A factory has 10 machines which may need adjustment from time to time during the day. Three of
these machines are old, each having a probability of 1/11 of needing adjustment during the day and 7
are new, having the corresponding probability of 1/21. Assuming that no machine needs adjustment
twice on the same day, find the probabilities that on a particular day,
(i) just 2 old machines and no new machines need adjustment
(ii) just 2 machines that need adjustment are of the same type
Solution
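Let X1 be a random variable which denotes the number of old machines that need adjustment and X2 be the
random variable which denotes the number of new machines that need adjustment.
Let p1 = probability that an old machine needs adjustment = 1/11 and p2 = probability that a new machine needs adjustment = 1/21.
\[ P(X_1 = x) = {}^{3}C_{x} \left(\tfrac{1}{11}\right)^x \left(\tfrac{10}{11}\right)^{3-x},\quad x = 0,1,2,3 \]
\[ P(X_2 = x) = {}^{7}C_{x} \left(\tfrac{1}{21}\right)^x \left(\tfrac{20}{21}\right)^{7-x},\quad x = 0,1,2,\ldots,7 \]
i) The probability that just 2 old machines and no new machines need adjustment is given by
\[ P(X_1 = 2)\,P(X_2 = 0) = {}^{3}C_{2}\left(\tfrac{1}{11}\right)^2\left(\tfrac{10}{11}\right) \cdot {}^{7}C_{0}\left(\tfrac{1}{21}\right)^0\left(\tfrac{20}{21}\right)^7 = 0.016 \]
ii) Just 2 machines needing adjustment, both of the same type, can happen in the following two
mutually exclusive ways: 2 old and no new machines, or no old and 2 new machines.
\[ P = 0.016 + \left(\tfrac{10}{11}\right)^3 \cdot {}^{7}C_{2}\left(\tfrac{1}{21}\right)^2\left(\tfrac{20}{21}\right)^5 = 0.016 + 0.028 = 0.044 \]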
i) If he fires 7 times, what is the probability of his hitting the target at least twice?
ii) How many times must he fire so that the probability of his hitting the target at least once is greater
than 2/3?
Solution
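\[ P(X=x) = p(x) = {}^{n}C_{x}\, p^x q^{n-x},\quad x = 0,1,2,\ldots,n \]
i) Given n = 7,
\[ P(X=x) = {}^{7}C_{x}\left(\tfrac{1}{4}\right)^x\left(\tfrac{3}{4}\right)^{7-x},\quad x = 0,1,2,\ldots,7 \]
\[
\begin{aligned}
P(X \ge 2) &= 1 - [P(X=0) + P(X=1)] \\
&= 1 - \left[{}^{7}C_{0}\left(\tfrac{1}{4}\right)^0\left(\tfrac{3}{4}\right)^7 + {}^{7}C_{1}\left(\tfrac{1}{4}\right)^1\left(\tfrac{3}{4}\right)^6\right] \\
&= 1 - \left(\tfrac{3}{4}\right)^6\left[\tfrac{3}{4} + \tfrac{7}{4}\right] \\
&= 1 - \left(\tfrac{3}{4}\right)^6\left[\tfrac{10}{4}\right] \\
&= 0.555
\end{aligned}
\]
ii) P(X ≥ 1) = 1 - P(X<1)
= 1 - P(X=0)
= 1 - (3/4)^n > 2/3 (Given)
1 - 2/3 > (3/4)^n
1/3 > (3/4)^n
By trial, n = 4 is the smallest value satisfying this condition, since (3/4)^3 = 0.4219 and (3/4)^4 = 0.3164.
Therefore he must fire at least 4 times to hit the target at least once with probability more than 2/3.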
10. A set of 6 similar coins are tossed 640 times with the following results:
Number 0 1 2 3 4 5 6
of heads
Calculate the binomial frequencies on the assumption that the coins are symmetrical.
Solution
Let X denote the number of heads and let X follow binomial distribution with parameters (n,p)
Here n=6
To find p, we compute the mean of the given frequency distribution and equate it to np (the mean of the
binomial distribution).
x: 0 1 2 3 4 5 6
\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{1949}{640} \]
6p = 1949/640
p = 0.5075, q = 0.4925
\[ N\,P(X=x) = 640 \cdot {}^{6}C_{x}\, (0.5075)^x (0.4925)^{6-x} \]
x    N P(X = x)                              Expected frequency
0    640 (0.4925)^6 = 9.133                  9
1    640 × 6(0.5075)(0.4925)^5 = 56.467      56
5    640 × 6(0.5075)^5(0.4925) = 63.668      64
6    640 (0.5075)^6 = 10.934                 11
Total                                        640
11.Two dice are thrown 120 times. Find the average number of times in which the number on the first
die exceeds the number on the second die.
Solution
The number on the first die exceeds that on the second die, in the following combinations:
(2,1);
(3,1) ,(3,2);
(4,1),(4,2),(4,3);
(5,1),(5,2),(5,3),(5,4);
(6,1),(6,2),(6,3),(6,4),(6,5)
where the numbers in the parentheses represent the numbers on the first and second dice respectively.
P(success) = P(number on the first die exceeds the number on the second die)
= 15/36 = 5/12
This probability remains the same in all the throws, which are independent.
If X is the number of successes, then X follows a binomial distribution with parameters n = 120 and p = 5/12.
E(X) = np = 120 * 5/12 = 50
12. It is known that diskettes produced by a certain company are defective with probability 0.01,
independently of each other. The company markets the diskettes in packages of 10 and offers a money-back
guarantee that at most 1 of the 10 diskettes is defective. What proportion of packages is returned? If
someone buys 3 packages, what is the probability that he will return exactly one of them?
Solution
Given p = 0.01
q = 1-p = 0.99
n = 10
Then
\[ P(X=x) = {}^{10}C_{x}\, (0.01)^x (0.99)^{10-x} \]
The company must replace a package only when it has more than 1 defective diskette.
P(X ≤ 1) = P(X=0) + P(X=1)
= 0.9044 + 0.0914
= 0.9958
P(X > 1) = 1 - P(X ≤ 1)
= 1 - 0.9958 = 0.0042
Therefore, about 0.4% of the packages sold will have to be replaced.
If someone buys 3 packages, the number of packages returned is binomial with n = 3 and success probability 0.0042, so
\[ P(\text{exactly one package is returned}) = {}^{3}C_{1}\, (0.0042)(0.9958)^2 \approx 0.0125 \]
13. Assuming that half of the population is vegetarian and that 100 investigators each take 10
individuals to see whether they are vegetarians, how many would you expect to report that 3 people or
less were vegetarians?
Solution
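\[ P(X=x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{10}C_{x}\left(\tfrac{1}{2}\right)^x\left(\tfrac{1}{2}\right)^{10-x} = {}^{10}C_{x}\left(\tfrac{1}{2}\right)^{10} \]
\[
\begin{aligned}
P(X \le 3) &= P(X=0) + P(X=1) + P(X=2) + P(X=3) \\
&= {}^{10}C_{0}\left(\tfrac{1}{2}\right)^{10} + {}^{10}C_{1}\left(\tfrac{1}{2}\right)^{10} + {}^{10}C_{2}\left(\tfrac{1}{2}\right)^{10} + {}^{10}C_{3}\left(\tfrac{1}{2}\right)^{10} \\
&= \left(\tfrac{1}{2}\right)^{10}[1 + 10 + 45 + 120] \\
&= \left(\tfrac{1}{2}\right)^{10}[176] = \tfrac{176}{1024} = 0.1718
\end{aligned}
\]
Among 100 investigators, the number of investigators who report that 3 or fewer were vegetarians
= 100 * 0.1718
= 17 investigators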
14. A factory produces 10 articles daily. It may be assumed that there is a constant probability p= 0.1
of producing a defective article. Before the articles are stored, they are inspected and the defective
ones are set aside. Suppose that there is a constant probability r = 0.1 that a defective article is
misclassified. If X denotes the number of articles classified as defective at the end of a production day,
find a) P(X=3) and b) P(X>3)
Solution
Let X be the random variable denoting the number of articles which are classified as defective.
P(an article is defective and is classified as defective) = 0.1 * 0.9
p = 0.09
q = 1 - p = 0.91
n = 10
\[ P(X=x) = {}^{n}C_{x}\, p^x q^{n-x} = {}^{10}C_{x}\, (0.09)^x (0.91)^{10-x} \]
\[ P(X=3) = {}^{10}C_{3}\, (0.09)^3 (0.91)^7 = 0.0452 \]
\[ P(X>3) = 1 - P(X \le 3) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3)] = 0.0089 \]
15. An irregular 6 faced die is thrown and the expectation that in 10 throws it will give 5 even
numbers is twice the expectation that it will give 4 even numbers. How many times in 10,000 sets of
10 throws would you expect it to give no even number?
Solution
Let the random variable X denote the number of even numbers.
Given n = 10,
\[ P(X=x) = {}^{10}C_{x}\, p^x q^{10-x},\quad x = 0,1,2,\ldots,10 \]
\[
\begin{aligned}
{}^{10}C_{5}\, p^5 q^5 &= 2 \cdot {}^{10}C_{4}\, p^4 q^6 \\
\tfrac{6}{5}\, p^5 q^5 &= 2 p^4 q^6 \\
\tfrac{6}{5}\, p &= 2q \\
3p &= 5q \\
3p &= 5(1-p) \\
3p &= 5 - 5p \\
8p &= 5
\end{aligned}
\]
\[ p = \tfrac{5}{8}, \qquad q = 1 - p = 1 - \tfrac{5}{8} = \tfrac{3}{8} \]
Therefore the required number of times that in 10,000 sets of 10 throws each we get no even number
\[ = 10{,}000 \cdot P[X=0] = 10{,}000 \cdot {}^{10}C_{0}\left(\tfrac{5}{8}\right)^0\left(\tfrac{3}{8}\right)^{10} \]
= 0.5499 approximately
POISSON DISTRIBUTION
Definition
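If X is a discrete random variable that assumes only non-negative values such that its probability
mass function is given by
\[ P(X=x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!}, & x = 0,1,2,3,\ldots \text{ where } \lambda > 0 \\[4pt] 0, & \text{otherwise} \end{cases} \]
then X is said to follow a Poisson distribution with parameter λ.
\[ np = \lambda, \qquad p = \frac{\lambda}{n}, \qquad q = 1 - p = 1 - \frac{\lambda}{n} \]
3. np = λ is finite as n → ∞, where λ is a positive constant.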
Proof
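Let X be a binomially distributed random variable. Then the probability mass function of the binomial
distribution is
\[ P(X=x) = {}^{n}C_{x}\, p^x q^{n-x} = \frac{n!}{(n-x)!\,x!}\, p^x (1-p)^{n-x},\quad x = 0,1,2,\ldots,n \]
Let np = λ, so that p = λ/n and q = 1 - p = 1 - λ/n. Then
\[
\begin{aligned}
P(X=x) &= \frac{(n-(x-1))(n-(x-2))\cdots(n-1)\,n}{x!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x} \\
&= \frac{n \cdot n\left(1 - \frac{1}{n}\right) n\left(1 - \frac{2}{n}\right) \cdots n\left(1 - \frac{x-1}{n}\right)}{x!} \cdot \frac{\lambda^x}{n^x} \left(1 - \frac{\lambda}{n}\right)^{n-x} \\
&= \frac{n^x \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{x-1}{n}\right)}{x!} \cdot \frac{\lambda^x}{n^x} \left(1 - \frac{\lambda}{n}\right)^{n-x} \\
P(X=x) &= \frac{\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{x-1}{n}\right)}{x!}\, \lambda^x \left(1 - \frac{\lambda}{n}\right)^{n-x}
\end{aligned}
\]
Taking the limit as n → ∞,
\[ P(X=x) = \frac{\lambda^x}{x!} \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{e^{-\lambda}\lambda^x}{x!} \]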
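The limiting argument above can be illustrated numerically by holding np = λ fixed and letting n grow. This is only a sketch: the function names and the choice λ = 2, x = 3 are illustrative assumptions, not values from the notes.

```python
from math import comb, exp, factorial


def binomial_pmf_at(n, p, x):
    """P(X = x) for a binomial(n, p) variate."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def poisson_pmf_at(lam, x):
    """P(X = x) for a Poisson variate with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)


lam, x = 2.0, 3
target = poisson_pmf_at(lam, x)

# Absolute error of the binomial probability with p = lam / n, for growing n.
errors = [abs(binomial_pmf_at(n, lam / n, x) - target)
          for n in (10, 100, 1000, 10000)]
for n, err in zip((10, 100, 1000, 10000), errors):
    print(n, err)
```

The printed errors shrink as n increases, which is the behaviour the proof predicts.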
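\[
\begin{aligned}
\text{Mean} = E(X) &= \sum_{x=0}^{\infty} x\,P(X=x) \\
&= \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= e^{-\lambda} \sum_{x=1}^{\infty} \frac{x\,\lambda\,\lambda^{x-1}}{x!} \\
&= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} \\
\text{Mean} &= \lambda e^{-\lambda} e^{\lambda} = \lambda
\end{aligned}
\]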
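\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 \]
\[
\begin{aligned}
E(X^2) &= \sum_{x=0}^{\infty} x^2 p(x) = \sum_{x=0}^{\infty} x^2\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \sum_{x=0}^{\infty} (x^2 - x + x)\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \sum_{x=0}^{\infty} (x(x-1) + x)\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \sum_{x=0}^{\infty} x(x-1)\,\frac{e^{-\lambda}\lambda^x}{x!} + \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \sum_{x=2}^{\infty} \frac{e^{-\lambda}\lambda^x}{(x-2)!} + \sum_{x=1}^{\infty} \frac{e^{-\lambda}\lambda^x}{(x-1)!} \\
&= \lambda^2 e^{-\lambda} \sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} + \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} \\
&= \lambda^2 e^{-\lambda} e^{\lambda} + \lambda e^{-\lambda} e^{\lambda} \\
E(X^2) &= \lambda^2 + \lambda
\end{aligned}
\]
\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda \]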
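Find the moment generating function of the Poisson distribution and hence find the mean and variance.
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!},\quad x = 0,1,2,3,\ldots \]
The moment generating function of the Poisson distribution is
\[
\begin{aligned}
M_X(t) &= E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} p(x) \\
&= \sum_{x=0}^{\infty} e^{tx}\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} \\
&= e^{-\lambda} e^{\lambda e^t} \\
&= e^{\lambda(e^t - 1)}
\end{aligned}
\]
\[
\begin{aligned}
\text{Mean} = E(X) = \mu_1' &= \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(e^{-\lambda} e^{\lambda e^t}\right)\right]_{t=0} \\
&= \left[e^{-\lambda} e^{\lambda e^t} \lambda e^t\right]_{t=0} \\
&= e^{-\lambda} e^{\lambda} \lambda = \lambda
\end{aligned}
\]
\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 \]
\[
\begin{aligned}
E(X^2) = \mu_2' &= \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\left(\lambda e^{-\lambda} e^{\lambda e^t} e^t\right)\right]_{t=0} \\
&= \lambda e^{-\lambda}\left[e^{\lambda e^t} \lambda e^t e^t + e^{\lambda e^t} e^t\right]_{t=0} \\
&= \lambda e^{-\lambda}(\lambda e^{\lambda} + e^{\lambda}) \\
&= \lambda e^{-\lambda} e^{\lambda}(1 + \lambda) \\
&= \lambda(1 + \lambda) \\
E(X^2) &= \lambda + \lambda^2
\end{aligned}
\]
\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda + \lambda^2 - \lambda^2 = \lambda \]
Mean=λ
Variance=λ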
1.If X is a Poisson variate such that P(X=1)=3/10 and P(X=2)=1/5. Find P(X=0) and P(X=3)
Solution
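\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} \]
\[ P(X=1) = \frac{e^{-\lambda}\lambda^1}{1!} = \frac{3}{10} \qquad (1) \]
\[ P(X=2) = \frac{e^{-\lambda}\lambda^2}{2!} = \frac{1}{5} \qquad (2) \]
\[ \frac{(2)}{(1)}: \quad \frac{\lambda}{2} = \frac{1/5}{3/10} = \frac{2}{3} \;\Rightarrow\; \lambda = \frac{4}{3} \]
\[ P(X=x) = \frac{e^{-4/3}\left(\tfrac{4}{3}\right)^x}{x!} \]
\[ P(X=0) = \frac{e^{-4/3}\left(\tfrac{4}{3}\right)^0}{0!} = e^{-4/3} = 0.2636 \]
\[ P(X=3) = \frac{e^{-4/3}\left(\tfrac{4}{3}\right)^3}{3!} = 0.1041 \]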
2. In a certain factory producing razor blades, there is a small chance 1/500 for any blade to be
defective. The blades are supplied in packets of 10.Use Poisson distribution to calculate the
approximate number of packets containing
Solution
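λ = np = 10/500 = 1/50 = 0.02
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-0.02}(0.02)^x}{x!} \]
\[ P(X=0) = \frac{e^{-0.02}(0.02)^0}{0!} = 0.9802 \]
Therefore the number of packets containing no defective blade = 10000 * 0.9802
= 9802
P(X ≥ 1) = 1 - P(X<1)
= 1 - P(X=0)
= 1 - 0.9802 = 0.0198
Therefore the number of packets containing at least one defective blade = 10000 * 0.0198
= 198
P(X ≤ 1) = P(X=0) + P(X=1)
= 0.9802 + 0.0196 = 0.9998
Therefore the number of packets containing at most 1 defective blade = 10000 * 0.9998
= 9998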
3. An insurance company has discovered that only about 0.1% of the population is involved in a
certain type of accident each year. If its 10000 policy holders were randomly selected from the
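population, what is the probability that not more than 5 of its clients are involved in such an accident
next year?
Solution
n = 10000, p = 0.1% = 0.001, λ = np = 10
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-10}(10)^x}{x!} \]
\[
\begin{aligned}
P(X \le 5) &= P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) \\
&= \frac{e^{-10}10^0}{0!} + \frac{e^{-10}10^1}{1!} + \frac{e^{-10}10^2}{2!} + \frac{e^{-10}10^3}{3!} + \frac{e^{-10}10^4}{4!} + \frac{e^{-10}10^5}{5!} \\
&= 0.0671
\end{aligned}
\]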
4. In a given city 4% of all licenced drivers will be involved in at least 1 road accident in any given
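year. Determine the probability that among 150 licenced drivers randomly chosen in this city
Solution
\[ \lambda = np = 150 \times \frac{4}{100} = 6 \]
\[ \text{i)} \quad P(X=5) = \frac{e^{-6}6^5}{5!} = 0.1606 \]
\[
\begin{aligned}
\text{ii)} \quad P(X \le 3) &= P(X=0) + P(X=1) + P(X=2) + P(X=3) \\
&= e^{-6} + \frac{e^{-6}6}{1!} + \frac{e^{-6}6^2}{2!} + \frac{e^{-6}6^3}{3!} = 0.1512
\end{aligned}
\]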
5.In an industrial complex, the average number of fatal accidents per month is one-half. The number
of accidents per month is adequately described by a Poisson distribution. What is the probability that
6 months will pass without a fatal accident.
Solution
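The average number of fatal accidents in 6 months is λ = 6 × 1/2 = 3.
The probability that 6 months will pass without a fatal accident = P(X=0)
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} \]
\[ P(X=0) = \frac{e^{-3}3^0}{0!} = 0.0498 \]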
6. A radioactive source emits on the average 2.5 particles per second. Find the probability that 3 or
more particles will be emitted in an interval of 4 seconds.
Solution
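λ = 2.5 per second, so for an interval of 4 seconds λ = 2.5 × 4 = 10.
\[
\begin{aligned}
P(X \ge 3) &= 1 - \{P[X=0] + P[X=1] + P[X=2]\} \\
&= 1 - e^{-10}\left(1 + 10 + \frac{100}{2}\right) \\
&= 0.9972
\end{aligned}
\]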
7. Messages arrive at a switch board in a Poisson manner at an average rate of six per hour. Find the
probability for each of the following events
Solution
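\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-6}6^x}{x!} \]
\[ P(X=2) = \frac{e^{-6}6^2}{2!} = 0.0446 \]
\[ P(X=0) = \frac{e^{-6}6^0}{0!} = 0.0025 \]
\[ P(X \ge 3) = 1 - P(X<3) = 1 - [P(X=0) + P(X=1) + P(X=2)] = 1 - e^{-6}(1 + 6 + 18) = 0.9380 \]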
8. A car hire firm has 2 cars which it hires out day by day. The number of demands for a car on each
day follows a Poisson distribution with mean 1.5. Calculate the proportion of days on which
Solution
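\[ P(x \text{ demands in a day}) = P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} \]
Given: λ = 1.5
\[ \text{Now } P(X=x) = \frac{e^{-1.5}(1.5)^x}{x!} \]
\[ P(X=0) = \frac{e^{-1.5}(1.5)^0}{0!} = e^{-1.5} = 0.2231 \]
\[
\begin{aligned}
P(X>2) &= 1 - P(X \le 2) \\
&= 1 - [P(X=0) + P(X=1) + P(X=2)] \\
&= 0.1912
\end{aligned}
\]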
9. The proofs of a 500 page book contains 500 misprints. Find the probability that there are at least 4
misprints in a randomly chosen page.
Solution
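The average number of misprints per page is λ = 500/500 = 1.
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-1}1^x}{x!} \]
\[
\begin{aligned}
P(\text{at least 4 misprints}) &= P(X \ge 4) \\
&= 1 - P(X<4) \\
&= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3)] \\
&= 1 - \left[\frac{e^{-1}}{0!} + \frac{e^{-1}}{1!} + \frac{e^{-1}}{2!} + \frac{e^{-1}}{3!}\right] \\
&= 1 - e^{-1}\left[1 + 1 + \frac{1}{2} + \frac{1}{6}\right] \\
&= 0.0190
\end{aligned}
\]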
10. It has been established that the number of defective stereos produced daily at a certain plant is
Poisson distributed with mean 4. Over a 2 day span, what is the probability that the number of
defective stereos does not exceed 3?
Solution
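Let X1 be the number of defective stereos produced on the first day and X2 be the number of defective
stereos produced on the second day. Then X = X1 + X2 is a Poisson variate with mean 4 + 4 = 8.
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-8}8^x}{x!},\quad x = 0,1,2,\ldots \]
\[ P(X \le 3) = \frac{e^{-8}8^0}{0!} + \frac{e^{-8}8^1}{1!} + \frac{e^{-8}8^2}{2!} + \frac{e^{-8}8^3}{3!} = 0.0424 \]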
11. If the number of telephone calls coming into a telephone exchange between 9 A.M and 10 A.M
and between 10 A.M and 11 A.M are independent and follows Poisson distribution with parameters 2
and 6 respectively. What is the probability that more than 5 calls come between 9 A.M and 11 A.M
Solution
Let X be a random variable which denotes the number of telephone calls between 9 am and 10 am
with parameter 2 and Y be a random variable which denotes the number of telephone calls between 10
am and 11 am with parameter 6
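Hence Z = X + Y is a Poisson variate with parameter 2 + 6 = 8.
\[
\begin{aligned}
P(Z>5) &= 1 - P(Z \le 5) \\
&= 1 - \{P(Z=0) + P(Z=1) + P(Z=2) + P(Z=3) + P(Z=4) + P(Z=5)\} \\
&= 1 - \sum_{z=0}^{5} \frac{e^{-\lambda}\lambda^z}{z!} \\
&= 1 - e^{-8}\left[1 + \frac{8}{1!} + \frac{8^2}{2!} + \frac{8^3}{3!} + \frac{8^4}{4!} + \frac{8^5}{5!}\right] \\
&= 1 - e^{-8}[1 + 8 + 32 + 85.3333 + 170.6667 + 273.0667] \\
&= 0.80876
\end{aligned}
\]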
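Several of the Poisson examples above reduce to a cumulative sum of the form P(X ≤ k). A short helper makes such sums easy to re-check; this is only a sketch, and the names `poisson_pmf` and `poisson_cdf` are illustrative, not taken from the notes.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variate with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(k, lam):
    """P(X <= k), obtained by summing the pmf from 0 to k."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))


# More than 5 calls between 9 A.M. and 11 A.M. (parameter 2 + 6 = 8).
print(1 - poisson_cdf(5, 8))

# Not more than 5 clients involved in an accident (parameter 10).
print(poisson_cdf(5, 10))

# At most 3 defective stereos over two days (parameter 8).
print(poisson_cdf(3, 8))
```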
12. Fit a poisson distribution to the following data and calculate the theoretical frequencies
Deaths 0 1 2 3 4
Frequency 122 60 15 2 1
Solution
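x    f      fx    Theoretical frequencies
0    122    0     121
1    60     60    61
2    15     30    15
3    2      6     3
4    1      4     0
\[ \text{Mean} = \bar{x} = \frac{\sum fx}{N} = \frac{100}{200} = 0.5 \]
\[ \text{Theoretical frequency} = N\,P[X=x] = 200\,\frac{e^{-\lambda}\lambda^x}{x!} \]
\[ f(x) = 200\,\frac{e^{-0.5}(0.5)^x}{x!} \qquad (1) \]
Putting x = 0,1,2,3,4 in (1) we get
\[
\begin{aligned}
f(0) &= 200\,\frac{e^{-0.5}(0.5)^0}{0!} = 121 \\
f(1) &= 200\,\frac{e^{-0.5}(0.5)^1}{1!} = 61 \\
f(2) &= 200\,\frac{e^{-0.5}(0.5)^2}{2!} = 15 \\
f(3) &= 200\,\frac{e^{-0.5}(0.5)^3}{3!} = 3 \\
f(4) &= 200\,\frac{e^{-0.5}(0.5)^4}{4!} = 0
\end{aligned}
\]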
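The fitting procedure used above (estimate λ by the sample mean, then scale the Poisson probabilities by N) can be written out as a short script. This is a sketch using the data of the example; the variable names are illustrative.

```python
from math import exp, factorial

# Observed data from the example: number of deaths and their frequencies.
deaths = [0, 1, 2, 3, 4]
observed = [122, 60, 15, 2, 1]

N = sum(observed)                                       # total frequency
lam = sum(x * f for x, f in zip(deaths, observed)) / N  # sample mean

# Theoretical frequencies N * e^{-lam} * lam^x / x!, rounded to the nearest unit.
expected = [N * exp(-lam) * lam**x / factorial(x) for x in deaths]
rounded = [round(e) for e in expected]

print(lam)      # 0.5
print(rounded)  # [121, 61, 15, 3, 0]
```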
EXERCISES:
1. The number of typing mistakes that a typist makes on a given page has a Poisson distribution with a
mean of 3 mistakes. What is the probability that she makes
i) Exactly 7 mistakes
2. The number of black flies on a broad bean leaf follows a Poisson distribution with mean 2. A plant
inspector however records the number of flies on a leaf only if at least 1 fly is present. What is the
probability that he records 1 or 2 flies on a randomly chosen leaf? What is the expected number of
flies recorded per leaf?
3. Letters were received in an office on each of 100 days. Assuming the following data to form a
random sample from a Poisson distribution, find the expected frequencies, correct to the nearest unit.
No. of letters   0   1   2    3    4    5    6   7   8   9   10
Frequencies      1   4   15   22   21   20   8   6   2   0   1
The negative Binomial random variable represents the number of failures before the rth success. Here
the number of success is fixed.
i) The experiment consists of a series of independent and identical Bernoulli trials, each with
probability p of success.
ii) The trials are observed until exactly r successes are obtained, where r is fixed by the experimenter.
iii) The random variable X is the number of failures before the rth success.
Definition
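A random variable X is said to follow the negative binomial distribution if its probability mass
function is given by
\[ P(X=x) = {}^{x+r-1}C_{r-1}\, p^r q^x,\quad x = 0,1,2,\ldots \]
Note
W.k.t.,
\[ \mu_r' = \sum_{x=0}^{\infty} x^r p(x) \]
\[
\begin{aligned}
\mu_1' = E(X) &= \sum_{x=0}^{\infty} x\,p(x) = \sum_{x=0}^{\infty} x\; {}^{x+r-1}C_{r-1}\, p^r q^x \\
&= 0 + r p^r q + \frac{2(r+1)r}{2!}\, p^r q^2 + \cdots \\
&= r p^r q\left[1 + \frac{(r+1)}{1!}\, q + \frac{(r+1)(r+2)}{2!}\, q^2 + \cdots\right] \\
&= r p^r q\,(1-q)^{-(r+1)} \\
&= \frac{r p^r q}{p^{r+1}} \\
&= \frac{rq}{p}
\end{aligned}
\]
i.e., Mean = rq/p.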
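\[
\begin{aligned}
\mu_2' = E(X^2) &= \sum_{x=0}^{\infty} x^2 p(x) = \sum_{x=0}^{\infty} x^2\; {}^{x+r-1}C_{r-1}\, p^r q^x \\
&= \sum_{x=0}^{\infty} [x(x-1) + x]\; {}^{x+r-1}C_{r-1}\, p^r q^x \\
&= \sum_{x=0}^{\infty} x(x-1)\; {}^{x+r-1}C_{r-1}\, p^r q^x + \frac{rq}{p} \\
&= 0 + 0 + 2(1)\frac{(r+1)r}{2!}\, p^r q^2 + 3(2)\frac{(r+2)(r+1)r}{3!}\, p^r q^3 + 4(3)\frac{(r+3)(r+2)(r+1)r}{4!}\, p^r q^4 + \cdots + \frac{rq}{p} \\
&= (r+1)r\,p^r q^2\left[1 + (r+2)q + \frac{(r+2)(r+3)}{2!}\, q^2 + \cdots\right] + \frac{rq}{p} \\
&= (r+1)r\,p^r q^2\,\frac{1}{(1-q)^{r+2}} + \frac{rq}{p} \\
&= (r+1)r\,p^r q^2\,\frac{1}{p^{r+2}} + \frac{rq}{p} \\
E(X^2) &= (r+1)r\,\frac{q^2}{p^2} + \frac{rq}{p}
\end{aligned}
\]
\[
\begin{aligned}
\text{Variance} &= E[X^2] - [E(X)]^2 \\
&= (r+1)r\,\frac{q^2}{p^2} + \frac{rq}{p} - \left(\frac{rq}{p}\right)^2 \\
&= \frac{rq}{p} + \frac{r^2q^2}{p^2} + \frac{rq^2}{p^2} - \frac{r^2q^2}{p^2} \\
&= \frac{rq}{p} + \frac{rq^2}{p^2} \\
&= \frac{rq}{p^2}(p + q) \\
&= \frac{rq}{p^2} \qquad (\text{since } p+q=1)
\end{aligned}
\]
Therefore, Variance = rq/p^2.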
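By definition of M.G.F.,
\[
\begin{aligned}
M_X(t) = E(e^{tX}) &= \sum_{x=0}^{\infty} e^{tx} p(x) \\
&= \sum_{x=0}^{\infty} e^{tx}\; {}^{x+r-1}C_{r-1}\, p^r q^x \\
&= p^r \sum_{x=0}^{\infty} {}^{x+r-1}C_{r-1}\, (qe^t)^x \\
&= p^r\left[1 + r\,qe^t + \frac{(r+1)r}{2!}(qe^t)^2 + \cdots\right] \\
&= p^r (1 - qe^t)^{-r} \qquad \left[\text{since } (1-x)^{-n} = 1 + nx + \tfrac{n(n+1)}{2!}x^2 + \cdots\right] \\
M_X(t) &= \frac{p^r}{(1 - qe^t)^r} = \left(\frac{p}{1 - qe^t}\right)^r
\end{aligned}
\]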
Examples
1.Find the probability that a person tossing 3 coins will get either all heads or all tails for the second
time in a fifth toss.
Solution
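S={HHH,HHT,HTH,THH,HTT,THT,TTH,TTT}
\[ P(\text{all heads or all tails}) = \frac{1}{8} + \frac{1}{8} = \frac{2}{8} = \frac{1}{4} \]
\[ p = \frac{1}{4}, \qquad q = 1 - p = \frac{3}{4} \]
\[ P(X=x) = {}^{x+r-1}C_{r-1}\, p^r q^x \]
Here r = 2 and x + r = 5, so x = 5 - 2 = 3.
\[ P(X=3) = {}^{3+2-1}C_{2-1}\,(1/4)^2 (3/4)^3 = {}^{4}C_{1}\,(1/4)^2 (3/4)^3 = 0.1055 \]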
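2. In a company 5% of the components produced are defective. What is the probability that at least 5
components have to be examined in order to get 3 defective ones?
Solution
p = 5% = 0.05
q = 0.95, r = 3
\[
\begin{aligned}
\text{Required probability} &= P[X + r \ge 5] \\
&= P[X + 3 \ge 5] \\
&= P[X \ge 2] \\
&= 1 - P[X < 2] \\
&= 1 - [P(X=0) + P(X=1)] \\
&= 1 - 0.00048 = 0.99952
\end{aligned}
\]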
3. The probability that a child exposed to a certain contagious disease will catch it is 0.4. What will be the
probability that the tenth child exposed to the disease will be the third to catch it?
Solution
p = 0.4, q = 0.6, r = 3
\[ P[X + r = 10] = P[X = 10 - r = 10 - 3 = 7] = P[X=7] \]
We know that
\[ P(X=x) = {}^{x+r-1}C_{r-1}\, p^r q^x \]
\[ P[X=7] = {}^{7+3-1}C_{3-1}\,(0.4)^3 (0.6)^7 = {}^{9}C_{2}\,(0.4)^3 (0.6)^7 = 0.0645 \]
4. A fair die is tossed on successive independent trials until the second 6 is observed. Find the
probability of observing exactly 10 non-6's before the second 6 is tossed.
Solution
p = 1/6; q = 1 - p = 1 - 1/6 = 5/6
\[ p = \frac{1}{6}, \quad q = \frac{5}{6}, \quad x = 10, \quad r = 2 \]
\[ P(X=10) = {}^{11}C_{1}\,(1/6)^2 (5/6)^{10} = 11(0.0045) = 0.04935 \]
5. If the probability of a male child is 1/2, then find the probability that in a family the 6th child is the
third female child.
Solution
Here we have to find the probability that the sixth child is the third female child.
\[ P(X=x) = {}^{x+r-1}C_{r-1}\, p^r q^x \]
p = 1/2, q = 1 - p = 1/2
r = 3
x + r = 6
x = 6 - 3 = 3
\[ P[X=3] = {}^{3+3-1}C_{3-1}\,(1/2)^3 (1/2)^3 = {}^{5}C_{2}\,(1/2)^3 (1/2)^3 = {}^{5}C_{2}\,(1/2)^6 = 0.15625 \]
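The negative binomial examples above, and the formulas Mean = rq/p and Variance = rq/p^2, can be re-checked with a few lines of code. This is a sketch only: the function name `neg_binomial_pmf` is illustrative, and the infinite sums for the moments are truncated at 500 terms, which is an assumption that is adequate for the parameters used here.

```python
from math import comb


def neg_binomial_pmf(x, r, p):
    """P(X = x): probability of x failures before the r-th success."""
    q = 1.0 - p
    return comb(x + r - 1, r - 1) * p**r * q**x


# Worked examples from the text.
print(neg_binomial_pmf(3, 2, 0.25))   # coins: second success on the fifth toss
print(neg_binomial_pmf(7, 3, 0.4))    # tenth child is the third to catch it
print(neg_binomial_pmf(3, 3, 0.5))    # sixth child is the third female child

# Mean and variance by (truncated) summation for r = 3, p = 0.4.
r, p = 3, 0.4
q = 1.0 - p
terms = range(500)  # truncation point chosen for illustration
mean = sum(x * neg_binomial_pmf(x, r, p) for x in terms)
second = sum(x * x * neg_binomial_pmf(x, r, p) for x in terms)
variance = second - mean**2
print(mean, r * q / p)
print(variance, r * q / p**2)
```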
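Definition
A continuous random variable X is said to follow the Erlang (general Gamma) distribution with parameters λ > 0 and k > 0
if its probability density function is given by
\[ f(x) = \begin{cases} \dfrac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, & x \ge 0 \\[4pt] 0, & \text{otherwise} \end{cases} \]
Note
1. When λ = 1, the Erlang distribution is called the Gamma distribution or simple Gamma distribution
with
\[ f(x) = \frac{1}{\Gamma(k)}\, x^{k-1} e^{-x},\quad x \ge 0,\ k > 0 \]
\[
\begin{aligned}
\mu_r' = E(X^r) &= \int_{0}^{\infty} x^r f(x)\,dx \\
&= \int_{0}^{\infty} x^r\, \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}\,dx \\
&= \frac{\lambda^k}{\Gamma(k)} \int_{0}^{\infty} x^{r+k-1} e^{-\lambda x}\,dx \\
&= \frac{\lambda^k}{\Gamma(k)} \cdot \frac{\Gamma(k+r)}{\lambda^{k+r}} \\
\mu_r' &= \frac{\Gamma(k+r)}{\lambda^r\,\Gamma(k)}
\end{aligned}
\]
Putting r = 1,
\[ \mu_1' = \frac{\Gamma(k+1)}{\lambda\,\Gamma(k)} = \frac{k\,\Gamma(k)}{\lambda\,\Gamma(k)} = \frac{k}{\lambda} \]
Mean = E(X) = k/λ
\[ \mu_2' = \frac{\Gamma(k+2)}{\lambda^2\,\Gamma(k)} = \frac{(k+1)\,k\,\Gamma(k)}{\lambda^2\,\Gamma(k)} = \frac{k(k+1)}{\lambda^2} \]
\[
\begin{aligned}
\mathrm{Var}(X) &= \mu_2' - (\mu_1')^2 \\
&= \frac{k(k+1)}{\lambda^2} - \left(\frac{k}{\lambda}\right)^2 \\
&= \frac{k^2 + k - k^2}{\lambda^2} \\
&= \frac{k}{\lambda^2}
\end{aligned}
\]
Var(X) = k/λ^2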
GAMMA DISTRIBUTION
Definition
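A continuous random variable X is said to follow the Gamma distribution with parameter λ
if its probability density function is given by
\[ f(x) = \begin{cases} \dfrac{e^{-x} x^{\lambda-1}}{\Gamma(\lambda)}, & \lambda > 0,\ 0 < x < \infty \\[4pt] 0, & \text{otherwise} \end{cases} \]
\[
\begin{aligned}
M_X(t) = E(e^{tX}) &= \int_{0}^{\infty} e^{tx} f(x)\,dx \\
&= \int_{0}^{\infty} e^{tx}\, \frac{e^{-x} x^{\lambda-1}}{\Gamma(\lambda)}\,dx \\
&= \frac{1}{\Gamma(\lambda)} \int_{0}^{\infty} e^{-(1-t)x} x^{\lambda-1}\,dx \\
&= \frac{1}{\Gamma(\lambda)} \cdot \frac{\Gamma(\lambda)}{(1-t)^{\lambda}} \\
M_X(t) &= (1-t)^{-\lambda}, \quad t < 1
\end{aligned}
\]
\[
\begin{aligned}
\mu_1' = E(X) &= \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}(1-t)^{-\lambda}\right]_{t=0} \\
&= \left[-\lambda(1-t)^{-\lambda-1}(-1)\right]_{t=0} = \lambda
\end{aligned}
\]
Mean = λ.
\[
\begin{aligned}
\mu_2' = E(X^2) &= \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0} = \left[\frac{d}{dt}\,\lambda(1-t)^{-(\lambda+1)}\right]_{t=0} \\
&= \left[\lambda(\lambda+1)(1-t)^{-(\lambda+2)}\right]_{t=0} = \lambda(\lambda+1)
\end{aligned}
\]
\[ \mathrm{Var}(X) = \mu_2' - (\mu_1')^2 = \lambda(\lambda+1) - \lambda^2 = \lambda \]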
Examples
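1. The daily consumption of milk in excess of 20,000 litres is approximately distributed as a Gamma
variable with parameters k = 2, λ = 1/10,000. If the city has a daily stock of 30,000 litres on a given day,
find the probability that the stock is insufficient.
Solution
Let X be the random variable for the daily consumption of milk in the city, and let Y = X - 20,000.
\[ f(y) = \frac{\lambda^k y^{k-1} e^{-\lambda y}}{\Gamma(k)},\quad y \ge 0 \]
When k = 2, λ = 1/10,000,
\[ f(y) = \frac{\left(\frac{1}{10{,}000}\right)^2 y\, e^{-y/10{,}000}}{\Gamma(2)} = \frac{y\, e^{-y/10{,}000}}{(10{,}000)^2} \]
\[ P(X > 30{,}000) = P(Y > 10{,}000) = \int_{10{,}000}^{\infty} \frac{y}{(10{,}000)^2}\, e^{-y/10{,}000}\,dy \]
Put t = y/10,000, so that dt = dy/10,000.
Therefore
\[
\begin{aligned}
P(X > 30{,}000) &= \int_{1}^{\infty} t e^{-t}\,dt \\
&= \left[-t e^{-t} - e^{-t}\right]_{1}^{\infty} \\
&= 1 \cdot e^{-1} + e^{-1} = 2e^{-1} = 0.7358
\end{aligned}
\]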
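2. The consumer demand for electricity in a certain locality per month is known to follow a general
Gamma distribution. If the average demand is a kilowatts and the most likely demand is b
kilowatts (b < a), what is the variance of the demand?
Solution
\[ f(x) = \frac{\lambda^k}{\Gamma(k)}\, x^{k-1} e^{-\lambda x},\quad x > 0 \]
The most likely demand is the mode of X, i.e., the value of x for which f(x) is maximum.
\[
\begin{aligned}
f'(x) &= \frac{\lambda^k}{\Gamma(k)}\left[(k-1)x^{k-2} e^{-\lambda x} - \lambda x^{k-1} e^{-\lambda x}\right] \\
&= \frac{\lambda^k}{\Gamma(k)}\, x^{k-2} e^{-\lambda x}\left[(k-1) - \lambda x\right]
\end{aligned}
\]
f'(x) = 0 when x = 0 or x = (k-1)/λ, and f''(x) < 0 when x = (k-1)/λ.
Therefore f(x) is maximum at x = (k-1)/λ, i.e.,
\[ b = \frac{k-1}{\lambda} \]
Now
\[ E(X) = \frac{k}{\lambda} = a \qquad [\text{since average demand} = a] \]
\[ a - b = \frac{k}{\lambda} - \frac{k-1}{\lambda} = \frac{1}{\lambda} \]
Therefore
\[ \mathrm{Var}(X) = \frac{k}{\lambda^2} = \frac{k}{\lambda} \cdot \frac{1}{\lambda} = a(a-b) \]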
3. In a city, the daily consumption of electric power in million kilowatt-hours is a random variable
with general Gamma distribution or Erlang distribution with parameters λ = 1/2 and k = 3. If the power
plant of this city has a daily capacity of 12 million kilowatt-hours, what is the probability that this
power supply will be inadequate on any given day?
Solution
Let X be the random variable that denotes the daily consumption of power. Then
\[ f(x) = \frac{\left(\frac{1}{2}\right)^3 x^{3-1} e^{-x/2}}{\Gamma(3)} = \frac{1}{16}\, x^2 e^{-x/2},\quad x > 0 \]
The power supply will be inadequate if the consumption goes beyond 12 million kilowatt-hours.
\[
\begin{aligned}
P(X > 12) &= \int_{12}^{\infty} f(x)\,dx = \frac{1}{16} \int_{12}^{\infty} x^2 e^{-x/2}\,dx \\
&= \frac{1}{16}\left[\frac{x^2 e^{-x/2}}{-1/2} - \frac{2x\, e^{-x/2}}{(-1/2)^2} + \frac{2 e^{-x/2}}{(-1/2)^3}\right]_{12}^{\infty} \\
&= \frac{1}{16}\left[-2x^2 e^{-x/2} - 8x\, e^{-x/2} - 16 e^{-x/2}\right]_{12}^{\infty} \\
&= \frac{1}{16}\left[2(12)^2 e^{-6} + 8(12) e^{-6} + 16 e^{-6}\right] \\
&= \frac{1}{16}\, e^{-6}\,(288 + 96 + 16) = 25 e^{-6} = 0.0620
\end{aligned}
\]
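4. The daily consumption of bread in a hostel in excess of 2000 loaves is approximately Gamma
distributed with parameters k = 2 and λ = 1/1000. The hostel has a daily stock of 3000 loaves. What is
the probability that the stock is insufficient on a day?
Solution
Let X be the daily consumption of bread and let Y = X - 2000.
Then Y follows the Gamma distribution with k = 2 and λ = 1/1000.
\[ f(y) = \frac{\lambda^k y^{k-1} e^{-\lambda y}}{\Gamma(k)} = \frac{\left(\frac{1}{1000}\right)^2 y^{2-1} e^{-y/1000}}{\Gamma(2)} = \frac{1}{(1000)^2}\, y\, e^{-y/1000},\quad y > 0 \]
\[
\begin{aligned}
P(X > 3000) &= P(Y + 2000 > 3000) \\
&= P(Y > 1000) \\
&= \frac{1}{(1000)^2} \int_{1000}^{\infty} y\, e^{-y/1000}\,dy \\
&= 2e^{-1} = 0.7358
\end{aligned}
\]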
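The tail probabilities in the Gamma examples above all have an integer shape parameter k. In that case repeated integration by parts gives the closed form P(X > x) = e^{-λx} Σ_{j=0}^{k-1} (λx)^j / j!, a standard identity that is not derived in these notes and is used here as an assumption. The sketch below evaluates it; the function name `erlang_tail` is illustrative.

```python
from math import exp, factorial


def erlang_tail(x, k, lam):
    """P(X > x) for an Erlang(k, lam) variate with integer shape k.

    Uses the closed form exp(-lam*x) * sum_{j<k} (lam*x)^j / j!,
    which follows from integrating the density by parts k - 1 times.
    """
    z = lam * x
    return exp(-z) * sum(z**j / factorial(j) for j in range(k))


# Milk example: k = 2, lam = 1/10000, excess consumption above 10000 litres.
print(erlang_tail(10_000, 2, 1 / 10_000))   # 2/e, about 0.7358

# Power example: k = 3, lam = 1/2, consumption above 12 million kWh.
print(erlang_tail(12, 3, 0.5))              # 25 * exp(-6), about 0.0620

# Bread example: k = 2, lam = 1/1000, excess consumption above 1000 loaves.
print(erlang_tail(1_000, 2, 1 / 1_000))     # 2/e again
```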
EXPONENTIAL DISTRIBUTION
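Definition
A continuous random variable X is said to follow an exponential distribution with parameter λ > 0 if its probability density function is
\[ f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
Note
The distribution function is
\[ F(x) = \begin{cases} \displaystyle\int_{0}^{x} \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
\[
\begin{aligned}
\mu_r' = E[X^r] &= \int_{0}^{\infty} x^r \lambda e^{-\lambda x}\,dx \\
&= \lambda \int_{0}^{\infty} e^{-\lambda x} x^{(r+1)-1}\,dx \\
&= \lambda\, \frac{\Gamma(r+1)}{\lambda^{r+1}} = \frac{r!}{\lambda^r} \\
\mu_r' &= \frac{r!}{\lambda^r}
\end{aligned}
\]
Now,
\[ \mu_1' = E(X) = \frac{1!}{\lambda} = \frac{1}{\lambda}, \qquad \mu_2' = E(X^2) = \frac{2!}{\lambda^2} = \frac{2}{\lambda^2} \]
\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} \]
\[ \mathrm{Var}(X) = \frac{1}{\lambda^2} \]
\[
\begin{aligned}
M_X(t) = E(e^{tX}) &= \int_{0}^{\infty} e^{tx} f(x)\,dx \\
&= \int_{0}^{\infty} e^{tx} \lambda e^{-\lambda x}\,dx \\
&= \lambda \int_{0}^{\infty} e^{-(\lambda - t)x}\,dx \\
&= \lambda \left[\frac{e^{-(\lambda - t)x}}{-(\lambda - t)}\right]_{0}^{\infty} \\
&= \frac{\lambda}{\lambda - t}, \quad t < \lambda
\end{aligned}
\]
\[ \frac{d}{dt} M_X(t) = \frac{\lambda}{(\lambda - t)^2}, \qquad \mu_1' = \left[\frac{d}{dt} M_X(t)\right]_{t=0} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda} \]
\[ \frac{d^2}{dt^2} M_X(t) = \frac{2\lambda}{(\lambda - t)^3}, \qquad \mu_2' = \left[\frac{d^2}{dt^2} M_X(t)\right]_{t=0} = \frac{2\lambda}{\lambda^3} = \frac{2}{\lambda^2} \]
\[ \text{Mean} = \mu_1' = \frac{1}{\lambda} \]
\[ \text{Variance} = \mu_2' - (\mu_1')^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} \]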
Examples
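1. Suppose that during a rainy season on a tropical island, the length of a shower has an exponential
distribution with average 2 minutes. Find the probability that the shower will last for more than
three minutes. If the shower has already lasted for 2 minutes, what is the probability that it will last for
at least one more minute?
Solution
Since the mean is 1/λ = 2, λ = 1/2 and
\[ f(x) = \lambda e^{-\lambda x} = \begin{cases} \tfrac{1}{2} e^{-x/2}, & x > 0 \\ 0, & \text{otherwise} \end{cases} \]
(i) To find the probability of the shower lasting more than 3 minutes:
\[ P(X > 3) = \int_{3}^{\infty} \tfrac{1}{2} e^{-x/2}\,dx = \tfrac{1}{2}\left[\frac{e^{-x/2}}{-1/2}\right]_{3}^{\infty} = e^{-3/2} = 0.2231 \]
ii) P[The shower will last at least one more minute given that it has lasted 2 minutes]
= P[X ≥ 3 / X > 2]
= P[X > 1] (by the memoryless property)
\[ = \int_{1}^{\infty} \tfrac{1}{2} e^{-x/2}\,dx = \tfrac{1}{2}\left[\frac{e^{-x/2}}{-1/2}\right]_{1}^{\infty} = e^{-1/2} = 0.6065 \]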
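2. The daily consumption of milk in a city in excess of 20,000 litres is approximately exponentially
distributed. The average excess in consumption of milk is 3000 litres. The city has a daily stock of
35,000 litres. What is the probability that of two days selected at random, the stock is insufficient for
both the days?
Solution
Let X be the random variable of daily consumption of milk in excess of 20,000 litres, and let Y = X + 20,000 be the daily consumption.
Since the mean is 1/λ = 3000, λ = 1/3000 and
\[ f(x) = \frac{1}{3000}\, e^{-x/3000},\quad x > 0 \]
The stock is insufficient on a day if Y > 35,000,
i.e., P(Y>35,000) = P(X+20,000>35,000)
= P(X>15,000)
\[
\begin{aligned}
&= \int_{15{,}000}^{\infty} \lambda e^{-\lambda x}\,dx \\
&= \frac{1}{3000} \int_{15{,}000}^{\infty} e^{-x/3000}\,dx \\
&= \frac{1}{3000}\left[\frac{e^{-x/3000}}{-1/3000}\right]_{15{,}000}^{\infty} \\
&= e^{-5}
\end{aligned}
\]
Therefore P[stock is insufficient on both days] = e^{-5} · e^{-5} = e^{-10}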
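3. The mileage which car owners get with a certain type of radial tyre is a random variable having an
exponential distribution with mean 40,000 km. Find the probability that one of these tyres will last
i) at least 20,000 km
ii) at most 30,000 km
Solution
\[ \lambda = \frac{1}{\text{mean}} = \frac{1}{40{,}000} \]
\[ f(x) = \lambda e^{-\lambda x} = \frac{1}{40{,}000}\, e^{-x/40{,}000},\quad x > 0 \]
\[
\begin{aligned}
\text{i)} \quad P(X \ge 20{,}000) &= \int_{20{,}000}^{\infty} f(x)\,dx = \int_{20{,}000}^{\infty} \frac{1}{40{,}000}\, e^{-x/40{,}000}\,dx \\
&= \frac{1}{40{,}000}\left[\frac{e^{-x/40{,}000}}{-1/40{,}000}\right]_{20{,}000}^{\infty} \\
&= \frac{1}{40{,}000} \cdot 40{,}000\,(0 + e^{-1/2}) = e^{-1/2} \\
&= 0.6065
\end{aligned}
\]
\[
\begin{aligned}
\text{ii)} \quad P(X \le 30{,}000) &= \int_{0}^{30{,}000} \frac{1}{40{,}000}\, e^{-x/40{,}000}\,dx \\
&= \frac{1}{40{,}000}\left[\frac{e^{-x/40{,}000}}{-1/40{,}000}\right]_{0}^{30{,}000} \\
&= \frac{1}{40{,}000} \cdot 40{,}000\,(-e^{-3/4} + 1) = 1 - e^{-3/4} \\
&= 0.5276
\end{aligned}
\]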
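4. The time (in hours) required to repair a machine is exponentially distributed with parameter
λ = 1/2. What is the probability that the repair time exceeds 2 hours? What is the conditional
probability that the repair takes at least 10 hours given that its duration exceeds 9 hours?
Solution
Given X is exponentially distributed with λ = 1/2.
\[ \text{Therefore } f(x) = \lambda e^{-\lambda x} = \tfrac{1}{2} e^{-x/2},\quad x > 0 \]
To find P[X > 2]:
\[ P[X > 2] = \int_{2}^{\infty} f(x)\,dx = \int_{2}^{\infty} \tfrac{1}{2} e^{-x/2}\,dx = \tfrac{1}{2}\left[\frac{e^{-x/2}}{-1/2}\right]_{2}^{\infty} = e^{-2/2} = e^{-1} = 0.3679 \]
To find P[X ≥ 10 / X > 9]: by the memoryless property this equals P[X > 1],
\[ P[X > 1] = \int_{1}^{\infty} \tfrac{1}{2} e^{-x/2}\,dx = e^{-1/2} = 0.6065 \]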
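The exponential examples above use the tail formula P(X > x) = e^{-λx} and the memoryless property P(X > s + t | X > s) = P(X > t). A short sketch can confirm both numerically; the function names `exp_tail` and `conditional_tail` are illustrative and not from the notes.

```python
from math import exp


def exp_tail(x, lam):
    """P(X > x) for an exponential variate with parameter lam."""
    return exp(-lam * x)


def conditional_tail(s, t, lam):
    """P(X > s + t | X > s), computed from the definition of conditional probability."""
    return exp_tail(s + t, lam) / exp_tail(s, lam)


# Shower example: lam = 1/2.
print(exp_tail(3, 0.5))             # about 0.2231
print(conditional_tail(2, 1, 0.5))  # equals exp_tail(1, 0.5), about 0.6065

# Repair-time example: lam = 1/2.
print(exp_tail(2, 0.5))             # about 0.3679
print(conditional_tail(9, 1, 0.5))  # again about 0.6065

# Tyre example: lam = 1/40000.
print(exp_tail(20_000, 1 / 40_000))      # about 0.6065
print(1 - exp_tail(30_000, 1 / 40_000))  # about 0.5276
```

The conditional probability does not depend on s, which is exactly what the memoryless property states.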
NORMAL DISTRIBUTION
Definition
A normal distribution is a continuous distribution given by
\[ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
where X is a continuous normal variate distributed with density function
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
with mean μ and standard deviation σ.
If, instead of the mean, another point is taken as the origin such that
the excess of the mean over the arbitrary origin is m, then
\[ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2} \]
is the standard form of the normal curve with origin at (m,0).
1. The normal distribution is a symmetrical distribution and the graph of the normal distribution is bell
shaped.
2. The curve has a single peak point (i.e.,) the distribution is unimodal
3. The mean of the normal distribution lies at the centre of normal curve.
4.Because of the symmetry of the normal curve, the median and mode are also at the centre of the
normal curve. Hence in a normal distribution the mean, median and mode coincide.
5. The tails of the normal distribution extend indefinitely and never touch the horizontal axis. That is,
the normal curve approaches the horizontal axis asymptotically on either side.
6. The normal distribution is a two parameter probability distribution. The parameters mean and
standard deviation (μ,σ) completely determine the distribution.
7. Area property:
In a normal distribution about 68% of the observations will lie between mean ± S.D., i.e., (μ ± σ).
About 95% of the observations lie between mean ± 2 S.D. (i.e., μ ± 2σ). About 99.7% of the
observations will lie between mean ± 3 S.D., i.e., (μ ± 3σ).
If X is a normally distributed random variable, and μ and σ are respectively its mean and standard
deviation, then
\[ Z = \frac{X - \mu}{\sigma} \]
is called the standard normal random variable. The point X = μ corresponds to Z = 0.
Normal table
A special table, called the table of areas under the normal curve, is available to determine the probability
that the random variable lies in a given range of values. Using the table, we can
determine the probability that X takes a value less than x, P(X<x), and also, for a given probability,
determine the value x such that P(X < x) equals that probability.
If X1, X2, …, Xn are independent normal variates with parameters (m1,σ1), (m2,σ2),
…, (mn,σn) respectively, then X1+X2+…+Xn is also a normal variate with parameters (m, σ),
where m = m1 + m2 + … + mn and σ^2 = σ1^2 + σ2^2 + … + σn^2.
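1) X is a normal variate with mean 30 and standard deviation 5. Find the probability that
i) 26 ≤ X ≤ 40; ii) X ≥ 45; iii) |X - 30| > 5.
Solution
Given μ = 30; σ = 5
\[ z = \frac{X - \mu}{\sigma} \]
i) When X = 26, z = (26 - 30)/5 = -0.8;
when X = 40, z = (40 - 30)/5 = 2.
\[
\begin{aligned}
P(26 \le X \le 40) &= P(-0.8 \le z \le 2) \\
&= P(-0.8 \le z \le 0) + P(0 \le z \le 2) \\
&= 0.2881 + 0.4772 \\
&= 0.7653
\end{aligned}
\]
ii) When X = 45, z = (45 - 30)/5 = 3.
\[
\begin{aligned}
P(X \ge 45) &= P(z \ge 3) \\
&= P(0 \le z < \infty) - P(0 \le z \le 3) \\
&= 0.5 - P(0 \le z \le 3) \\
&= 0.5 - 0.4987 = 0.0013
\end{aligned}
\]
iii) To find P(|X - 30| > 5), first find
\[ P(|X - 30| \le 5) = P(25 \le X \le 35) \]
When X = 25, z = (25 - 30)/5 = -1;
when X = 35, z = (35 - 30)/5 = 1.
\[ P(25 \le X \le 35) = P(-1 \le z \le 1) = 2P(0 \le z \le 1) = 2(0.3413) = 0.6826 \]
\[ P(|X - 30| > 5) = 1 - P(|X - 30| \le 5) = 1 - 0.6826 = 0.3174 \]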
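The table look-ups in the example above can be cross-checked with the normal distribution class in Python's standard library. This is a sketch under the example's parameters (mean 30, standard deviation 5); small differences in the fourth decimal place relative to the printed table values are expected because the table is rounded.

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=5)

# i) P(26 <= X <= 40)
p_between = X.cdf(40) - X.cdf(26)

# ii) P(X >= 45)
p_upper = 1 - X.cdf(45)

# iii) P(|X - 30| > 5) = 1 - P(25 <= X <= 35)
p_outside = 1 - (X.cdf(35) - X.cdf(25))

print(p_between, p_upper, p_outside)
```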
Solution
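Given μ = 20 and σ = 10.
\[ \text{We know that } z = \frac{X - \mu}{\sigma}. \]
When X = 15, z = (15 - 20)/10 = -0.5, and
when X = 40, z = (40 - 20)/10 = 2.
\[ P(15 \le X \le 40) = P(-0.5 \le z \le 2) = 0.4772 + 0.1915 = 0.6687 \]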
3. The average seasonal rainfall in a place is 16 inches with a standard deviation of 4 inches. What is
the probability that in a year the rainfall in that place will be between 20 and 24 inches?
Solution
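\[ z = \frac{X - \mu}{\sigma}, \qquad \mu = 16,\ \sigma = 4 \]
When X = 20, z = (20 - 16)/4 = 1;
when X = 24, z = (24 - 16)/4 = 2.
\[
\begin{aligned}
P(20 \le X \le 24) &= P(1 \le z \le 2) \\
&= P(0 \le z \le 2) - P(0 \le z \le 1) \\
&= 0.4772 - 0.3413 \\
&= 0.1359
\end{aligned}
\]
Note
E(aX+bY) = aE(X) + bE(Y)
Var(aX+bY) = a^2 Var(X) + b^2 Var(Y), for independent X and Y
Var(a) = 0
E(a) = a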
4. X is a normal variate with mean 1 and variance 4. Y is another normal variate independent of X
with mean 2 and variance 3. What is the distribution of X+2Y?
Solution
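Since X and Y are independent normal variates, X+2Y will also be a normal variate by the additive
property, with
E(X+2Y) = E(X) + 2E(Y) = 1 + 2(2) = 5
Var(X+2Y) = 1^2 Var(X) + 2^2 Var(Y) = 1^2(4) + 2^2(3) = 16.
Hence X+2Y is a normal variate with mean 5 and variance 16.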
5. The saving bank account of a customer showed an average balance of Rs.150 and a standard
deviation of Rs.50. Assuming that the account balances are normally distributed.
Solution
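1) To find P(X > 200)
\[ \text{We know that } z = \frac{X - \mu}{\sigma}, \text{ so when } X = 200,\ z = \frac{200 - 150}{50} = 1 \]
P(X > 200) = P(z > 1)
= 0.5 - 0.3413
= 0.1587.
2) When X = 170, z = (170 - 150)/50 = 0.4
= 0.2257 + 0.1554 = 0.3811
3) To find P(X < 75)
When X = 75, z = (75 - 150)/50 = -1.5
P(X < 75) = P(z < -1.5)
= 0.5 - 0.4332 = 0.0668.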
6. The mean yield for a one-acre plot is 662 kilos with standard deviation 32 kilos. Assuming normal distribution, how many one-acre plots in a patch of 1000 plots would you expect to have a yield (i) over 700 kilos, (ii) below 650 kilos?
Solution
Given $\mu = 662$, $\sigma = 32$ and $z = \dfrac{X - \mu}{\sigma} = \dfrac{X - 662}{32}$.
When X = 700, $z = \dfrac{700 - 662}{32} = 1.19$.
When X = 650, $z = \dfrac{650 - 662}{32} = -0.375 \approx -0.38$.
Exercises
An electrical firm manufactures light bulbs that have a life, before burnout, that is normally
distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find
ii) the probability that bulb burns between 778 and 834 hours.
7. In a distribution exactly normal 7% of the items are under 35 and 89% are under 63. What are the
mean and standard deviation of the distribution?
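Solution
We know that $z = \dfrac{X - \mu}{\sigma}$.
Since 7% of the items are under 35, $P(z < z_1) = 0.07$, so
$P(z_1 \le z \le 0) = 43\% = 0.43$, which gives $z_1 = -1.48$,
i.e., $\dfrac{35 - \mu}{\sigma} = -1.48$, or
$35 - \mu = -1.48\sigma$   (1)
Since 89% of the items are under 63, $P(z < z_2) = 0.89$, so
$P(0 \le z \le z_2) = 39\% = 0.39$, which gives $z_2 = 1.2$ approximately,
i.e., $\dfrac{63 - \mu}{\sigma} = 1.2$, or
$63 - \mu = 1.2\sigma$   (2)
(2) - (1): $28 = 2.68\sigma$
$\sigma = \dfrac{28}{2.68} = 10.45$
From (2): $\mu = 63 - (1.2)(10.45)$
= 63 - 12.54
= 50.46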
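The two-equation method used here can be sketched in code. This is an illustrative sketch only: the function name `mean_sd_from_percentiles` is an assumption, and it relies on `statistics.NormalDist.inv_cdf` from the Python standard library to obtain z-values instead of reading them from the table. Because it uses unrounded z-values, its answers can differ from the table-based working in the first decimal place.

```python
from statistics import NormalDist

def mean_sd_from_percentiles(x1, p1, x2, p2):
    """Solve (x1 - mu)/sigma = z1 and (x2 - mu)/sigma = z2 for mu and sigma,
    where z1, z2 are the standard normal values with P(z < z1) = p1, P(z < z2) = p2."""
    std = NormalDist()                 # standard normal: mean 0, s.d. 1
    z1, z2 = std.inv_cdf(p1), std.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)      # subtracting the two equations
    mu = x1 - z1 * sigma               # back-substitution
    return mu, sigma

# 7% of the items are under 35 and 89% are under 63
mu, sigma = mean_sd_from_percentiles(35, 0.07, 63, 0.89)
print(round(mu, 2), round(sigma, 2))
```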
8. In a normal distribution, 31% of the items are under 45 and 8% are over 64. Find the mean and
variance of the distribution.
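Solution
We know that $z = \dfrac{X - \mu}{\sigma}$.
Since 31% of the items are under 45,
$P(z_1 \le z \le 0) = 0.5 - 0.31 = 0.19$, which gives $z_1 = -0.49$,
i.e., $\dfrac{45 - \mu}{\sigma} = -0.49$, or
$45 - \mu = -0.49\sigma$   (1)
Since 8% of the items are over 64,
$P(z > z_2) = 0.08$ or $P(0 \le z \le z_2) = 0.42$, which gives $z_2 = 1.40$,
i.e., $\dfrac{64 - \mu}{\sigma} = 1.40$, or
$64 - \mu = 1.40\sigma$   (2)
(1) - (2): $-19 = -1.89\sigma$
$\sigma = \dfrac{19}{1.89} \approx 10$
From (1): $\mu = 45 + 0.49\sigma = 45 + 4.9 = 49.9 \approx 50$
Hence the mean is approximately 50 and the variance is $\sigma^2 \approx 100$.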
9. Suppose the heights of men of a certain country are normally distributed with average 68 inches
and standard deviation 2.5. Find the percentage of men who are
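Solution
We know that $z = \dfrac{X - \mu}{\sigma}$, with $\mu = 68$ and $\sigma = 2.5$.
When X = 66, $z = \dfrac{66 - 68}{2.5} = -0.8$.
When X = 71, $z = \dfrac{71 - 68}{2.5} = 1.20$.
$P(66 \le X \le 71) = P(-0.8 \le z \le 0) + P(0 \le z \le 1.20)$
= 0.2881 + 0.3849 = 0.6730
$P(71.5 \le X \le 72.5) = P\left(\dfrac{71.5 - 68}{2.5} \le \dfrac{X - 68}{2.5} \le \dfrac{72.5 - 68}{2.5}\right)$
$= P(1.4 \le z \le 1.8)$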
UNIT-III
BASIC STATISTICS
Introduction
The word statistics is used to indicate the numerical data in any field of enquiry, and the term statistical method denotes the technique of studying and analyzing the data.
Variables
Thus the height, weight, age, intelligence, number of children per family, the number of
students in a class etc are variates.
Example: The number of children in a family and the number of students in a class.
ii) Continuous Variate: Continuous variates correspond to variables which are measured
theoretically to any degree of accuracy.
Frequency tables:
Suppose we know the marks obtained by 140 candidates in an examination in a certain subject. Let
the marks be given by the following table
86 35 69 12 55 53 41 10 35 58 71 45 50 30
59 56 37 29 29 51 47 46 82 36 52 59 32 54
16 65 42 27 53 39 40 62 54 53 38 69 66 50
74 26 44 21 53 32 41 54 32 81 58 45 48 30
57 37 43 77 34 21 61 54 46 33 62 52 75 89
66 75 60 47 85 37 49 61 93 50 51 8 45 49
20 70 47 41 49 16 60 63 39 58 23 40 44 68
52 28 52 51 31 83 36 80 70 43 52 35 18 60
27 60 31 44 78 48 55 38 25 59 22 72 61 48
41 98 67 42 42 33 11 64 72 46 37 76 65 43
(Table 1)
Frequency table of marks of 140 candidates
We have arranged the raw data into classes of appropriate size, showing the corresponding frequency
of variates in each class.
When any set of data is systematically arranged in this way, it is called a frequency distribution.
In the above, the pairs of numbers written in the column of classes are called the lower and upper class limits. These are sometimes called open class limits or class boundaries.
The difference between the highest mark and the lowest mark is called the range.
The range is divided into classes of appropriate size. The size of the class is called the class-interval. Usually the classes are of equal size.
The mid value of such a class is called the class-mark, mid-value or the central value.
The width of a class interval is therefore the common difference between the consecutive class marks. It is also the difference between the lower (or upper) limits of two successive classes.
In the above, the class interval is 10 and the successive class marks are 4.5, 14.5, etc.
Cumulative Frequency:
If f1, f2, f3, … are the frequencies of the successive classes, then f1, f1+f2, f1+f2+f3, etc are
the cumulative frequencies.
Thus the cumulative frequencies for the previous table are 1, 7, 19, 42, …
The cumulative frequency 42 gives the number of students obtaining 39 or lower marks.
Relative Frequency:
In some problems we may require the relative frequency instead of the actual or absolute
frequency.
The relative frequency of any class is the ratio of the frequency of that class to the total frequency.
Thus in the previous table the relative frequencies of the various classes are 1/140, 6/140, 12/140, 23/140, etc.
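The class frequencies, cumulative frequencies and relative frequencies quoted above can be reproduced from the raw marks of Table 1. The sketch below is illustrative: it assumes classes 0-9, 10-19, …, 90-99 (consistent with the class marks 4.5, 14.5, … and class interval 10 mentioned in the text), and the variable names are not from the notes.

```python
from itertools import accumulate

# Marks of the 140 candidates, copied row by row from Table 1
marks = [
    86, 35, 69, 12, 55, 53, 41, 10, 35, 58, 71, 45, 50, 30,
    59, 56, 37, 29, 29, 51, 47, 46, 82, 36, 52, 59, 32, 54,
    16, 65, 42, 27, 53, 39, 40, 62, 54, 53, 38, 69, 66, 50,
    74, 26, 44, 21, 53, 32, 41, 54, 32, 81, 58, 45, 48, 30,
    57, 37, 43, 77, 34, 21, 61, 54, 46, 33, 62, 52, 75, 89,
    66, 75, 60, 47, 85, 37, 49, 61, 93, 50, 51, 8, 45, 49,
    20, 70, 47, 41, 49, 16, 60, 63, 39, 58, 23, 40, 44, 68,
    52, 28, 52, 51, 31, 83, 36, 80, 70, 43, 52, 35, 18, 60,
    27, 60, 31, 44, 78, 48, 55, 38, 25, 59, 22, 72, 61, 48,
    41, 98, 67, 42, 42, 33, 11, 64, 72, 46, 37, 76, 65, 43,
]

width = 10                                   # assumed class interval
n_classes = max(marks) // width + 1          # classes 0-9, 10-19, ..., 90-99
freq = [0] * n_classes
for m in marks:
    freq[m // width] += 1                    # tally each mark into its class

cum_freq = list(accumulate(freq))            # f1, f1+f2, f1+f2+f3, ...
rel_freq = [f / len(marks) for f in freq]    # class frequency / total frequency

for k, (f, cf, rf) in enumerate(zip(freq, cum_freq, rel_freq)):
    print(f"{k * width:2d}-{k * width + width - 1:2d}  {f:3d}  {cf:3d}  {rf:.4f}")
```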
We have seen that numerical data relating to an event can be presented in the form of a table.
As a visual aid to grasp the data in the table, certain diagrams and graphs are used.
The graph representing a frequency distribution is known as a frequency graph. We shall now
consider some methods of representing statistical data graphically.
Plot the points whose x-co-ordinates are the middle values of the classes and y coordinates are the
frequencies in these classes. The figure obtained by joining the successive points is known as a
frequency polygon.
b) The Histogram
The histogram consists of a set of rectangles erected over the true class intervals, their areas being
proportional to the frequencies of the respective classes.
Thus the base of a rectangle is the class-interval in width, the centre of the rectangle is the
mid-value and its area represents the class frequency.
Note:
1. In histogram frequencies are represented by areas, where as in a frequency polygon frequencies are
represented by lengths.
2. In a histogram the width of a rectangle is the same as that of the class. Since the classes are of equal
width, the height of the rectangle will be proportional to the class frequencies.
Measures of Central Tendency
Averages
An average is a value which is typical or representative of a set of data. It represents the whole series and conveys a fairly adequate idea of the whole group. Since such typical values tend to lie centrally within a set of data arranged according to magnitude, averages are also called measures of central tendency or measures of location. An average may or may not be one of the variates given in the data.
The Mode
Geometric Mean
Harmonic Mean
i) Individual Observations: The arithmetic mean (A.M.) or the mean of a set of n numbers x1, x2, …, xn is denoted by $\bar{x}$ and is defined as
$\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{1}{n}\sum_{r=1}^{n} x_r$
ii) Discrete Series: In the case of a frequency distribution, let us suppose that a set of n numbers x1, x2, …, xn have frequencies f1, f2, …, fn respectively. Then
Mean $= \dfrac{\sum f_r x_r}{\sum f_r}$
iii) Continuous Series: In the case of a frequency distribution, let us suppose that x1, x2, …, xn are the mid values of the class intervals having frequencies f1, f2, …, fn respectively. Then
$\bar{x} = \dfrac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \dfrac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$
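Shortcut method for calculating the A.M.
$\bar{x} = A + c\,\dfrac{\sum f_r d_r}{N}$, where $d_r = \dfrac{x_r - A}{c}$ and $N = \sum f_r$;
A is an arbitrary origin and c is the width of the class interval.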
Examples:
1780 1760 1690 1750 1840 1920 1100 1810 1050 1950
Calculate the arithmetic mean of incomes
Solution
$\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n}$
$= \dfrac{1780 + 1760 + 1690 + 1750 + 1840 + 1920 + 1100 + 1810 + 1050 + 1950}{10} = \dfrac{16650}{10} = 1665$
2) From the following data of the marks obtained by 60 students of a class, calculate the arithmetic
mean
Mean $= \dfrac{\sum f_r x_r}{\sum f_r}$
Marks (x_r)   No. of students (f_r)   f_r x_r
20            8                       160
30            12                      360
40            20                      800
50            10                      500
60            6                       360
70            4                       280
Total         60                      2460
Mean $= \dfrac{\sum f_r x_r}{\sum f_r} = \dfrac{2460}{60} = 41$
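The discrete-series mean can be checked with a few lines of code. This is a minimal sketch; the function name `frequency_mean` is illustrative and not from the notes.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total

marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]
mean = frequency_mean(marks, students)
print(mean)
```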
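Solution:
Mean $= \dfrac{\sum f_r x_r}{\sum f_r} = \dfrac{3300}{100} = 33$
Remark: If there is a gap between consecutive class intervals, take half of the difference between the upper limit of one class and the lower limit of the next class; subtract it from each lower limit and add it to each upper limit.
Solution
$\bar{x} = \dfrac{\sum f_r x_r}{\sum f_r} = \dfrac{29770}{228} = 130.570$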
5 ) Calculate the A.M. of frequency distribution of weights of 228 adults given in previous problem
by taking an arbitrary origin
Solution
125 is the mid point of the class having the highest frequency. So we take the arbitrary origin A = 125.
$\bar{x} = A + c\,\dfrac{\sum f_r d_r}{N} = 125 + \dfrac{127}{228} \times 10 = 130.570$
Median
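Calculation of Median
i) Individual Observations:
Median is the size of the $\dfrac{N+1}{2}$th item.
ii) Discrete Series:
Median is the size of the $\dfrac{N+1}{2}$th item. Look at the cumulative frequency column and find the total which is either equal to $\dfrac{N+1}{2}$ or next higher to that, and determine the value of the variable corresponding to it. That gives the value of the median.
iii) Continuous Series:
Median $= l + \dfrac{N/2 - c.f.}{f} \times c$
where l is the lower limit of the median class, c.f. is the cumulative frequency before the median class, f is the frequency of the median class and c is the width of the class interval.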
391 384 591 407 672 522 777 753 2488 1490
Solution:
Arrange in ascending order: 384, 391, 407, 522, 591, 672, 753, 777, 1490, 2488
Median = size of the $\dfrac{N+1}{2}$th item = size of the (10+1)/2 th item
= size of the 5.5th item = average of the sizes of the 5th and 6th items = (591 + 672)/2 = 631.5
1 2 3 4 5 6 7 8
Solution:
The median is the value of the variate whose cumulative frequency is equal to (N+1)/2 or next higher.
Median = 1500.
Solution
Marks    f     c.f.
5-10     7     7
10-15    15    22
15-20    24    46
20-25    31    77    (c.f. before the median class)
25-30    42    119   (median class; f = 42 is the frequency of the median class)
30-35    30    149
35-40    26    175
40-45    15    190
45-50    10    200
Total    200
N/2 = 200/2 = 100
Median $= l + \dfrac{N/2 - c.f.}{f} \times c = 25 + \dfrac{100 - 77}{42} \times 5 = 27.7381$
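The grouped-data median formula can be sketched as a short routine that locates the median class from the running cumulative frequency. The function name `grouped_median` and its argument layout are assumptions made for illustration.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous series: l + (N/2 - c.f.) / f * c."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    cum = 0                                   # c.f. before the current class
    for l, f in zip(lower_limits, freqs):
        if cum + f >= n / 2:                  # first class whose c.f. reaches N/2
            return l + (n / 2 - cum) / f * width
        cum += f

lowers = [5, 10, 15, 20, 25, 30, 35, 40, 45]  # lower limits of the classes
freqs = [7, 15, 24, 31, 42, 30, 26, 15, 10]
median = grouped_median(lowers, 5, freqs)
print(round(median, 4))
```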
From the following data compute Median
There is a gap between the class intervals, so divide the difference by 2 (½ = 0.5) and adjust the limits of the class intervals.
Class          f     c.f.
429.5-439.5    42    76    (c.f. before the median class)
439.5-449.5    54    130   (median class)
449.5-459.5    45    175
459.5-469.5    18    193
469.5-479.5    7     200
N/2 = 200/2 = 100
Median $= l + \dfrac{N/2 - c.f.}{f} \times c = 439.5 + \dfrac{100 - 76}{54} \times 10 = 443.94$
Weighted arithmetic Mean:
If x1, x2, …, xn are n observations and w1, w2, …, wn are their respective weights, the weighted arithmetic mean is defined to be
Weighted A.M. $= \dfrac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$.
Geometric Mean:
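Individual observations: The geometric mean of n sizes x1, x2, …, xn is the nth root of their product,
i.e., G.M. $= (x_1 x_2 \dots x_n)^{1/n}$
(or)
G.M. $= \mathrm{Antilog}\left(\dfrac{\sum \log x}{N}\right)$
Discrete Series:
$\log \mathrm{G.M.} = \dfrac{1}{N}\left(f_1 \log x_1 + f_2 \log x_2 + \dots + f_n \log x_n\right)$, where $N = \sum f$, i.e.,
G.M. $= \mathrm{Antilog}\left(\dfrac{\sum f \log x}{\sum f}\right)$
Continuous Series:
G.M. $= \mathrm{Antilog}\left(\dfrac{\sum f \log x}{\sum f}\right)$, where x is the middle value of the class.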
Harmonic Mean:
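The harmonic mean of a set of quantities is defined to be the reciprocal of the arithmetic mean of the reciprocals of the quantities. Hence if x1, x2, …, xn are n observations,
H.M. $= \dfrac{1}{\dfrac{1}{n}\left(\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dots + \dfrac{1}{x_n}\right)} = \dfrac{n}{\sum \dfrac{1}{x}}$
In a frequency distribution,
H.M. $= \dfrac{N}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \dots + \dfrac{f_n}{x_n}} = \dfrac{N}{\sum \dfrac{f}{x}}$, where $N = \sum f$.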
Examples:
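Aliter:
G.M. $= \mathrm{Antilog}\left(\dfrac{\sum \log x}{N}\right)$
x        log x
125      2.0969
1462     3.1650
38       1.5798
7        0.8451
0.22     -0.657
0.08     -1.096
12.75    1.1055
0.5      -0.3010
Total    6.7367
G.M. $= \mathrm{Antilog}\left(\dfrac{6.7367}{8}\right) = \mathrm{Antilog}(0.8421)$
= 6.952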
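The logarithmic form of the geometric mean translates directly into code. The sketch below is illustrative (the function name `geometric_mean` is not from the notes) and uses base-10 logarithms to mirror the Antilog notation used here; it works with full-precision logarithms rather than four-figure table values.

```python
import math

def geometric_mean(values):
    """G.M. = Antilog( sum(log10 x) / N ) for positive observations."""
    logs = [math.log10(x) for x in values]
    return 10 ** (sum(logs) / len(logs))

data = [125, 1462, 38, 7, 0.22, 0.08, 12.75, 0.5]
gm = geometric_mean(data)
print(round(gm, 3))
```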
x 1 2 3 4 5
f 2 4 3 2 1
Find the Geometric mean
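Solution:
G.M. $= \mathrm{Antilog}\left(\dfrac{\sum f \log x}{\sum f}\right)$
x        f     log x     f log x
1        2     0         0
2        4     0.3010    1.204
3        3     0.4771    1.431
4        2     0.6020    1.204
5        1     0.6990    0.6989
Total    12              4.5379
G.M. $= \mathrm{Antilog}\left(\dfrac{\sum f \log x}{\sum f}\right) = \mathrm{Antilog}\left(\dfrac{4.5379}{12}\right) = \mathrm{Antilog}(0.3781) = 2.388$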
Marks Frequency
4-8 6
8-12 10
12-16 18
16-20 30
20-24 115
24-28 12
28-32 10
32-36 6
36-40 2
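Solution
G.M. $= \mathrm{Antilog}\left(\dfrac{\sum f \log x_r}{\sum f}\right)$, where x_r is the mid value of each class.
G.M. $= \mathrm{Antilog}\left(\dfrac{263.4182}{209}\right) = 18.212$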
Harmonic Mean
Solution
H.M. $= \dfrac{n}{\sum (1/x)}$
x        1/x
3834     0.0003
382      0.0026
63       0.0159
8        0.125
0.4      2.5
0.03     33.333
0.009    111.111
0.005    200
Total    347.088
H.M. $= \dfrac{n}{\sum (1/x)} = \dfrac{8}{347.088} = 0.023$
Marks 10 20 25 40 50
No. of students 20 30 50 15 5
Solution
H.M. $= \dfrac{N}{\sum (f/x)}$, where $N = \sum f$
x        f      1/x      f(1/x)
10       20     0.1      2
20       30     0.05     1.5
25       50     0.04     2
40       15     0.025    0.375
50       5      0.02     0.1
Total    120             5.975
H.M. $= \dfrac{N}{\sum (f/x)} = \dfrac{120}{5.975} = 20.08$
Class    x_r    f     1/x_r    f(1/x_r)
0-10     5      15    0.2      3
10-20    15     10    0.067    0.67
20-30    25     7     0.04     0.28
30-40    35     5     0.029    0.145
40-50    45     3     0.022    0.066
Total           40             4.161
H.M. = 40/4.161 = 9.613
Measures of Dispersion
Defn: Dispersion is defined as the lack of uniformity in the sizes of the items. It is the extent to which the values differ in size, and hence it is the degree of variability.
i) The Range
Range: It is the difference between the greatest and least values observed.
Quartiles are those values which divide the frequency into four equal parts, where the values
are arranged in ascending or descending order of magnitude.
The first Quartile ( or the lower quartile) Q1 is that value of the variate which is such that
one quarter of the observations lies below Q1.
The third quartile ( or the upper quartile ) Q3 is that value of the variate which is such that
three quarters of the observations lie below Q3.
Individual Observations:
Q1 is the size of the $\dfrac{N+1}{4}$th item
Q3 is the size of the $\dfrac{3(N+1)}{4}$th item
Q1 is the size of the $\dfrac{N}{4}$th item
Q3 is the size of the $\dfrac{3N}{4}$th item
Continuous Series:
First Quartile: $Q_1 = l_1 + \dfrac{\frac{1}{4}N - c.f_1}{f_1} \times c$
similarly
Third Quartile: $Q_3 = l_3 + \dfrac{\frac{3}{4}N - c.f_3}{f_3} \times c$
The Quartile Deviation $= \dfrac{Q_3 - Q_1}{2}$
Remark:
Example:
1) Calculate the quartile deviation from the following marks of 13 students in a class
25, 35, 9, 28, 52, 41, 38, 96, 85, 72, 10, 40, 60
Solution
Arranged in ascending order: 9, 10, 25, 28, 35, 38, 40, 41, 52, 60, 72, 85, 96
Here N = 13.
Q1 is the size corresponding to the rank $\dfrac{1}{4}(N+1)$, i.e., 3½.
Q1 = average of the 3rd and 4th items $= \dfrac{25 + 28}{2} = \dfrac{53}{2} = 26.5$
Q3 is the size corresponding to the rank $\dfrac{3}{4}(N+1)$, i.e., 10½.
Q3 = average of the 10th and 11th items $= \dfrac{60 + 72}{2} = 66$
Q.D. $= \dfrac{Q_3 - Q_1}{2} = \dfrac{66 - (53/2)}{2} = \dfrac{79}{4} = 19.75$
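The rank-based rule for individual observations (Q1 at rank (N+1)/4, Q3 at rank 3(N+1)/4) can be sketched as follows. The helper `item_at_rank` is an illustrative name; it assumes a fractional rank is resolved by linear interpolation between neighbouring items, which reduces to the simple average when the rank ends in ½.

```python
def item_at_rank(sorted_values, rank):
    """Value at a 1-based, possibly fractional rank (linear interpolation)."""
    lower = int(rank)                    # whole part of the rank
    frac = rank - lower                  # fractional part of the rank
    value = sorted_values[lower - 1]
    if frac > 0:
        value += frac * (sorted_values[lower] - value)
    return value

marks = sorted([25, 35, 9, 28, 52, 41, 38, 96, 85, 72, 10, 40, 60])
n = len(marks)
q1 = item_at_rank(marks, (n + 1) / 4)        # rank 3.5
q3 = item_at_rank(marks, 3 * (n + 1) / 4)    # rank 10.5
qd = (q3 - q1) / 2                           # quartile deviation
print(q1, q3, qd)
```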
2) The following data relate to the frequency distribution of weights of 1000 Males. Calculate the
quartile deviation
Wt. in lb frequency
80.4-94.4 13
94.4-108.4 107
108.4-122.4 340
122.4-136.4 334
136.4-150.4 136
150.4-164.4 48
164.4-178.4 14
178.4-192.4 7
192.4-206.4 -
206.4-220.4 1
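Solution:
To find Q1 and Q3
Wt. in lb       frequency    c.f.
80.4-94.4       13           13
94.4-108.4      107          120
108.4-122.4     340          460
122.4-136.4     334          794
136.4-150.4     136          930
150.4-164.4     48           978
164.4-178.4     14           992
178.4-192.4     7            999
192.4-206.4     -            999
206.4-220.4     1            1000
$Q_1 = l_1 + \dfrac{\frac{1}{4}N - c.f_1}{f_1} \times c = 108.4 + \dfrac{\frac{1}{4}(1000) - 120}{340} \times 14 = 113.753$
$Q_3 = l_3 + \dfrac{\frac{3}{4}N - c.f_3}{f_3} \times c = 122.4 + \dfrac{\frac{3}{4}(1000) - 460}{334} \times 14 = 134.55$
Hence Quartile Deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{134.55 - 113.753}{2} = 10.3985$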
The mode
The value of the variate which occurs most frequently is called mode.
Individual Observations:
4, 7, 3, 4, 8, 4
mode = 4
Discrete series:
40 20
50 10
60 6
70 4
Mode = 40
Continuous Series:
Mode $= l + \dfrac{c\,f_2}{f_1 + f_2}$
l - lower limit of the modal class; modal class - class with the highest frequency; c - width of the class interval
f1 - frequency of the class before the modal class; f2 - frequency of the class after the modal class
1) Find the mode in the case of heights of trees in a grade whose frequency distribution is given in the
following table
Heights Frequency
Under 7 feet 26
Under 14 feet 57
Under 21 feet 92
Under 28 feet 134
Under 35 feet 216
Under 42 feet 287
Under 49 feet 341
Under 56 feet 360
Solution
Heights    Frequency
0-7        26
7-14       57 - 26 = 31
14-21      92 - 57 = 35
21-28      134 - 92 = 42     (f1)
28-35      216 - 134 = 82    (modal class)
35-42      287 - 216 = 71    (f2)
42-49      341 - 287 = 54
49-56      360 - 341 = 19
Mode $= l + \dfrac{c\,f_2}{f_1 + f_2} = 28 + \dfrac{7(71)}{42 + 71} = 32.398$
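The two steps of this example, converting the "under" (cumulative) table into class frequencies and then applying the mode formula, can be sketched as below. Variable names are illustrative, and the sketch assumes the modal class is neither the first nor the last class.

```python
cumulative = [26, 57, 92, 134, 216, 287, 341, 360]   # "under 7", "under 14", ..., "under 56"
width = 7                                            # class width c

# Class frequencies are successive differences of the cumulative frequencies
freqs = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

k = freqs.index(max(freqs))            # index of the modal class
lower = k * width                      # lower limit l of the modal class
f1, f2 = freqs[k - 1], freqs[k + 1]    # frequencies before and after the modal class
mode = lower + width * f2 / (f1 + f2)  # Mode = l + c*f2 / (f1 + f2)
print(round(mode, 3))
```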
1) The following table gives the height of 1000 adult males (measured to the nearest quarter inch):
Calculate the mean, median and mode. Verify whether the Emperical relation between them is
satisfied.
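Solution:
The heights are measured to the nearest quarter inch, so the difference between successive class limits is 0.25; half of this, 0.125, is used to adjust the limits (for example 63.75 + 0.125 = 63.875).
Class              Frequency f_r    Cumulative frequency
57.875-59.875      2                2
59.875-61.875      28               30
61.875-63.875      125              155
63.875-65.875      270 (f1)         425 (c.f.)
65.875-67.875      303 (f)          728
67.875-69.875      197 (f2)         925
69.875-71.875      65               990
71.875-73.875      10               1000
Total              1000
Mode $= l + \dfrac{c\,f_2}{f_1 + f_2} = 66.724$
Calculation of mean
Mean $= \dfrac{\sum f_r x_r}{\sum f_r} = \dfrac{66365}{1000} = 66.365$
Median $= l + \dfrac{N/2 - c.f.}{f} \times c$
The empirical relation to be verified is
Mean - Mode = 3(Mean - Median)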
The arithmetic mean of the absolute values of the deviations is called the mean deviation or the average deviation.
Individual observations:
Mean deviation $= \dfrac{\sum |x - \bar{x}|}{n}$
Discrete series:
Mean deviation $= \dfrac{\sum f\,|x - \bar{x}|}{\sum f}$
Sometimes the average deviation is also taken from the median.
The square root of the arithmetic mean of the squares of the deviations is called the standard deviation or the root mean square deviation.
Individual observations:
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
Discrete series:
$\sigma = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{\sum f}}$
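Continuous series:
i) Standard deviation $\sigma = c\sqrt{\dfrac{\sum f (x - A)^2}{\sum f}}$, with the deviations measured in units of the class interval.
Here the final value must be multiplied by the width c of the class interval to get the value of $\sigma$ in absolute units.
ii) If the origin A is taken as the arithmetic mean $\bar{x}$, the standard deviation is given by
$\sigma = c\sqrt{\dfrac{\sum f (x - \bar{x})^2}{\sum f}}$
iii) $\sigma = c\sqrt{\dfrac{\sum f d^2}{\sum f} - \left(\dfrac{\sum f d}{\sum f}\right)^2}$, where $d = \dfrac{x - A}{c}$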
Relation between the standard deviation and the root mean square deviation
The root mean square deviation is least when the deviations are measured from the arithmetic mean; that is, the standard deviation is the least possible root mean square deviation.
The square of the standard deviation is also known as the variance of the distribution.
We have the following approximate relations between the different measures of dispersion:
Mean Deviation $= \dfrac{4}{5}$ Standard Deviation
Quartile Deviation $= \dfrac{2}{3}$ Standard Deviation
Examples:
1) Find the mean deviation and standard deviation of the heights (in inches) of 16 students given
below:
67, 65, 59, 61, 67, 69, 72, 67, 62, 64, 63, 66, 68, 69, 67, 60
Solution
Arranged in ascending order: 59, 60, 61, 62, 63, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 72
Mean $\bar{x} = \dfrac{\sum x}{n} = \dfrac{1046}{16} = 65.375$
x        x - x̄ = x - 65.375    |x - x̄|    (x - x̄)²
59       -6.375                 6.375      40.641
60       -5.375                 5.375      28.891
61       -4.375                 4.375      19.141
62       -3.375                 3.375      11.391
63       -2.375                 2.375      5.641
64       -1.375                 1.375      1.891
65       -0.375                 0.375      0.141
66       0.625                  0.625      0.391
67       1.625                  1.625      2.641
67       1.625                  1.625      2.641
67       1.625                  1.625      2.641
67       1.625                  1.625      2.641
68       2.625                  2.625      6.891
69       3.625                  3.625      13.141
69       3.625                  3.625      13.141
72       6.625                  6.625      43.891
Total                           47.25      195.756
Mean deviation $= \dfrac{\sum |x - \bar{x}|}{n} = \dfrac{47.25}{16} = 2.953$
Standard deviation $= \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{195.756}{16}} = \sqrt{12.23475} = 3.498$
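The mean deviation and standard deviation of individual observations can be computed directly from their definitions. This is a minimal sketch with illustrative variable names; note that, following the definitions in these notes, the squared deviations are divided by n rather than n - 1.

```python
import math

heights = [67, 65, 59, 61, 67, 69, 72, 67, 62, 64, 63, 66, 68, 69, 67, 60]
n = len(heights)
mean = sum(heights) / n

# Mean deviation: arithmetic mean of the absolute deviations from the mean
mean_dev = sum(abs(x - mean) for x in heights) / n

# Standard deviation: root of the mean squared deviation (divides by n, not n - 1)
std_dev = math.sqrt(sum((x - mean) ** 2 for x in heights) / n)

print(mean, round(mean_dev, 3), round(std_dev, 3))
```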
2) Find the mean deviation and standard deviation for the following distribution
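Marks              10    20    25    40    50
No. of students    20    30    50    15    5
Solution:
Mean deviation $= \dfrac{\sum f\,|x - \bar{x}|}{\sum f}$
To find $\bar{x}$:
$\bar{x} = \dfrac{\sum f x}{\sum f} = \dfrac{2900}{120} = 24.17$
Mean deviation $= \dfrac{\sum f\,|x - \bar{x}|}{\sum f} = \dfrac{816.67}{120} = 6.81$
Standard deviation $= \sqrt{\dfrac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{\dfrac{11666.67}{120}} = \sqrt{97.22} = 9.86$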
3) Compute mean deviation about mean, mean deviation about median and standard deviation for the
following distribution
class f
3.0-4.9 5
5.0-6.9 8
7.0-8.9 30
9.0-10.9 82
11.0-12.9 45
13.0-14.9 24
15.0-16.9 6
Here there is a gap between the classes, so we have to adjust the class limits.
Difference = 0.1
Difference/2 = 0.05
So the given problem becomes
class          f
2.95-4.95      5
4.95-6.95      8
6.95-8.95      30
8.95-10.95     82
10.95-12.95    45
12.95-14.95    24
14.95-16.95    6
To find the mean deviation, first calculate the mean:
Mean $= \dfrac{\sum f x}{\sum f} = 10.45$
Calculate the median.
Mean deviation about the median $= \dfrac{\sum f\,|x - \text{Median}|}{\sum f}$
Standard deviation $\sigma = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{5.79} = 2.406$
Co-efficient of variation:
It is equal to the ratio of the standard deviation of a distribution to its A.M. and is often expressed as a percentage.
Coefficient of variation $= \dfrac{\text{S.D.}}{\text{A.M.}} \times 100$
Remark:
For comparing the variability of two series, we calculate the co-efficient of variations for each
series.
The series having the greater co-efficient of variation is said to be more variable or less consistent than the other, and the series having the lesser co-efficient of variation is said to be more consistent or less variable than the other.
Examples: The following table gives 10 measurements of the same quantity under the same
conditions by each observer A and B
A 8.116 8.125 8.125 8.129 8.130 8.137 8.137 8.141 8.136 8.146
B 8.112 8.118 8.124 8.130 8.136 8.137 8.138 8.139 8.137 8.141
Calculate the mean value and the standard deviation of each observer's measurements. Which observer do you think is probably the more reliable, and why?
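Solution
For observer A:
mean $\bar{x}_1 = \dfrac{81.322}{10} = 8.132$ nearly.
x        (x - x̄1)²
8.116    0.000256
8.125    0.000049
8.125    0.000049
8.129    0.000009
8.130    0.000004
8.137    0.000025
8.137    0.000025
8.141    0.000081
8.136    0.000016
8.146    0.000196
Total    0.00071
S.D. $\sigma_1 = \sqrt{\dfrac{\sum (x - \bar{x}_1)^2}{n}} = \sqrt{\dfrac{0.00071}{10}} = \sqrt{0.000071} = 0.008426$
Hence the co-efficient of variation for A $= \dfrac{\sigma_1}{\bar{x}_1} \times 100 = \dfrac{0.008426}{8.132} \times 100 = 0.1036$
For observer B:
mean $\bar{x}_2 = \dfrac{81.312}{10} = 8.1312$ = 8.131 nearly
x        (x - x̄2)²
8.112    0.000361
8.118    0.000169
8.124    0.000049
8.130    0.000001
8.136    0.000025
8.137    0.000036
8.138    0.000049
8.139    0.000064
8.137    0.000036
8.141    0.0001
Total: 81.312    0.00089
$\sigma_2^2 = \dfrac{0.00089}{10} = 0.000089$
$\sigma_2 = 0.009434$
Hence the co-efficient of variation for B $= \dfrac{\sigma_2}{\bar{x}_2} \times 100 = \dfrac{0.009434}{8.131} \times 100 = 0.1160$
Since the co-efficient of variation for A is less than that for B, the measurements of observer A are the more consistent and hence probably the more reliable.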
A 37 43 28 62 59 20 83 48 52 47
B 35 52 77 38 26 58 63 31 40 46
Which of the two batsmen do you consider the more consistent and the more efficient?
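x        (x - x̄)²     y        (y - ȳ)²
37       118.81       35       134.56
43       24.01        52       29.16
28       396.01       77       924.16
62       198.81       38       73.96
59       123.21       26       424.36
20       778.41       58       129.96
83       1232.01      63       268.96
48       0.01         31       243.36
52       16.81        40       43.56
47       0.81         46       0.36
Total    2888.9                2272.4
$\bar{x} = \dfrac{\sum x}{n} = 47.9$; $\bar{y} = \dfrac{\sum y}{n} = 46.6$
$\sigma_1^2 = \dfrac{2888.9}{10} = 288.89$
$\sigma_1 = 16.996$ = 17 nearly
$\sigma_2^2 = \dfrac{2272.4}{10} = 227.24$
$\sigma_2 = 15.07$
The coefficient of variation for A $= \dfrac{\sigma_1}{\bar{x}} \times 100 = \dfrac{17}{47.9} \times 100 = 35.49\%$
The coefficient of variation for B $= \dfrac{\sigma_2}{\bar{y}} \times 100 = \dfrac{15.07}{46.6} \times 100 = 32.34\%$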
Since the A.M. of A is greater than the A.M of B, we conclude that A is more efficient than B.
Since the coefficient of variation B is less than the co efficient of variation of A, we conclude that B is
more consistent than A.
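The comparison of the two batsmen can be sketched in code. The function name `coefficient_of_variation` is illustrative, and the standard deviation divides by n as in these notes; small differences from the worked figures arise only because the notes round the standard deviation of A to 17 before dividing.

```python
import math

def coefficient_of_variation(values):
    """Return (mean, standard deviation, coefficient of variation in percent)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sd, 100 * sd / mean

a = [37, 43, 28, 62, 59, 20, 83, 48, 52, 47]   # scores of batsman A
b = [35, 52, 77, 38, 26, 58, 63, 31, 40, 46]   # scores of batsman B
mean_a, sd_a, cv_a = coefficient_of_variation(a)
mean_b, sd_b, cv_b = coefficient_of_variation(b)

more_efficient = "A" if mean_a > mean_b else "B"   # higher average score
more_consistent = "A" if cv_a < cv_b else "B"      # lower coefficient of variation
print(round(cv_a, 2), round(cv_b, 2), more_efficient, more_consistent)
```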
Exercises:
1) The score of two golfers A and B in 12 rounds are given below. Who is the better player and who is
the more consistent player
A 74 75 78 72 78 77 79 81 79 76 72 71
B 87 84 80 88 89 85 86 82 82 79 86 80
Let $n_1$, $\bar{x}_1$ and $\sigma_1$ be the frequency, the A.M. and the S.D. of a first set of variables, and let those for the second set be denoted by $n_2$, $\bar{x}_2$ and $\sigma_2$ respectively. Let $\bar{x}$ be the A.M. of the combined set of $n_1 + n_2$ variables.
If $n_1 + n_2 = N$, then $N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 D_1^2 + n_2 D_2^2$
where $D_1 = \bar{x}_1 - \bar{x}$, $D_2 = \bar{x}_2 - \bar{x}$
Examples:
1) The numbers examined, the mean weight and standard deviation in each group of the examinations
by three medical examiners are given below. Find the mean weight and standard deviation of the
entire data when grouped together.
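$N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_3\sigma_3^2 + n_1 D_1^2 + n_2 D_2^2 + n_3 D_3^2$
where $D_1 = \bar{x}_1 - \bar{x}$, $D_2 = \bar{x}_2 - \bar{x}$, $D_3 = \bar{x}_3 - \bar{x}$
and $N = n_1 + n_2 + n_3$
D1 = 56 - 58.1 = -2.1
D3 = 58 - 58.1 = -0.1
N = n1 + n2 + n3 = 200
$N\sigma^2 = 4098$
i.e., $200\sigma^2 = 4098$
$\sigma^2 = 20.49$
$\sigma = 4.527$ kg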
Exercises:
1) Find the mean and S.D. of the following two samples put together
UNIT IV
Linear Correlation
Correlation coefficient
Correlation:
The existence of the changes in one variable in sympathy with the changes in the other is called
correlation.
Thus, whenever two variables are related in such a way that a change in one is followed directly or
inversely by a change in the other they are said to be correlated.
In the case of raw correlated data, we can represent them graphically. Let (x1, y1), (x2,y2),…,
(xn,yn) be pairs of corresponding observations .
For example, x1, x2, …, xn may be the ages of husbands and y1, y2, …, yn may be the ages of wives. Plot the points (x1, y1), (x2, y2), etc. on a graph paper. The figure, which is simply a collection of dots, is called the dot diagram or the scatter diagram. From this scatter diagram we can guess roughly how the variables x and y are correlated.
If all the points in the scatter diagram seem to lie near a line as in figure 1(a), there is correlation between the variables and the correlation is called linear.
If all the points seem to cluster round some curve as in figure 1(b), the correlation is called non-linear.
If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear.
If the amount of change in one variable does not bear a constant ratio to the amount of
change in the other variable then the correlation is called nonlinear or curvilinear.
Coefficient of Correlation
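Let $\bar{x}$ and $\bar{y}$ be the respective arithmetic means of x1, x2, …, xn and y1, y2, …, yn. There is said to be positive correlation between x and y if, for any assigned value of $x > \bar{x}$, the corresponding values of y tend to be $> \bar{y}$, and if for any assigned x less than $\bar{x}$ the corresponding y values tend to be $< \bar{y}$.
The correlation is said to be negative if for $x > \bar{x}$, y tends to be $< \bar{y}$, and if for $x < \bar{x}$, y tends to be $> \bar{y}$.
The quantity $P = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{N}$ is said to be the covariance between x and y.
The coefficient of correlation is
$r = \dfrac{P}{\sigma_x \sigma_y} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{N \sigma_x \sigma_y} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
where $\sigma_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{N}}$; $\sigma_y = \sqrt{\dfrac{\sum (y - \bar{y})^2}{N}}$
Given a set of values of x and y, we can calculate $\sigma_x$ and $\sigma_y$ by using these formulae.
If $x - \bar{x} = X$ and $y - \bar{y} = Y$, then $\sigma_x = \sqrt{\dfrac{\sum X^2}{N}}$, $\sigma_y = \sqrt{\dfrac{\sum Y^2}{N}}$ and
$r = \dfrac{\sum XY}{N \sigma_x \sigma_y} = \dfrac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$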
Remark:
when r = -1 , it means that there is perfect negative correlation between the variables.
When r = +1, it means that there is perfect positive correlation between the variables.
Examples:
1) Calculate the Karl Pearson’s coefficient of correlation from the following data
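Solution
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{170}{5} = 34$; $\bar{y} = \dfrac{\sum y}{n} = \dfrac{175}{5} = 35$
$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \dfrac{285}{\sqrt{(743)(550)}} = \dfrac{285}{639.257} = 0.446$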
2) Calculate coefficient of correlation from the following data
To simplify calculation let every value of X be divided by 100 and every value of Y be divided by 10
and denote these series by X’ and Y’
$r = \dfrac{\sum (X' - \bar{X'})(Y' - \bar{Y'})}{\sqrt{\sum (X' - \bar{X'})^2 \sum (Y' - \bar{Y'})^2}} = \dfrac{46}{\sqrt{(28)(76)}} = \dfrac{46}{46.130} = 0.997$
3) Find the correlation coefficient between the heights of father and heights of son given below:
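Height of father (in inches)    65  66  67  67  69  71  72  70  65
Height of son (in inches)       67  68  69  68  70  70  69  70  70
Solution
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{612}{9} = 68$; $\bar{y} = \dfrac{\sum y}{n} = \dfrac{621}{9} = 69$
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \dfrac{12}{\sqrt{(54)(10)}} = \dfrac{12}{23.238} = 0.5164$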
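The product-moment formula used in this example can be sketched as a short function. The name `pearson_r` is illustrative; the code computes the three sums of deviations from the means and combines them as in the formula for r.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient: sum(XY) / sqrt(sum(X^2) * sum(Y^2)),
    where X and Y are deviations from the respective means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

father = [65, 66, 67, 67, 69, 71, 72, 70, 65]
son = [67, 68, 69, 68, 70, 70, 69, 70, 70]
r = pearson_r(father, son)
print(round(r, 4))
```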
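Computation of r - Shortcut Method:
Let $d_1 = x - A$ and $d_2 = y - B$, where A and B are arbitrary origins for x and y. Then
$p = \dfrac{\sum d_1 d_2}{N} - \dfrac{\sum d_1}{N} \cdot \dfrac{\sum d_2}{N}$
$\sigma_x^2 = \dfrac{\sum d_1^2}{N} - \left(\dfrac{\sum d_1}{N}\right)^2$
$\sigma_y^2 = \dfrac{\sum d_2^2}{N} - \left(\dfrac{\sum d_2}{N}\right)^2$
$r = \dfrac{p}{\sigma_x \sigma_y}$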
4) Find the co-efficient of correlation between industrial production and export using the following
data and comment on the result
Production (in crore tons)    55  56  58  59  60  60  62
Exports (in crore tons)       35  38  38  39  44  43  44
Solution:
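Take the origin at 59 for x and 39 for y, so that $d_1 = x - 59$ and $d_2 = y - 39$. We prepare the following table.
x        y     d1    d2    d1d2    d1²    d2²
55       35    -4    -4    16      16     16
56       38    -3    -1    3       9      1
58       38    -1    -1    1       1      1
59       39    0     0     0       0      0
60       44    1     5     5       1      25
60       43    1     4     4       1      16
62       44    3     5     15      9      25
Total          -3    8     44      37     84
$p = \dfrac{\sum d_1 d_2}{N} - \dfrac{\sum d_1}{N} \cdot \dfrac{\sum d_2}{N} = \dfrac{44}{7} - \left(\dfrac{-3}{7}\right)\left(\dfrac{8}{7}\right) = \dfrac{308 + 24}{49} = \dfrac{332}{49}$
$\sigma_x^2 = \dfrac{\sum d_1^2}{N} - \left(\dfrac{\sum d_1}{N}\right)^2 = \dfrac{37}{7} - \left(\dfrac{-3}{7}\right)^2 = \dfrac{259 - 9}{49} = \dfrac{250}{49}$
$\sigma_y^2 = \dfrac{\sum d_2^2}{N} - \left(\dfrac{\sum d_2}{N}\right)^2 = \dfrac{84}{7} - \left(\dfrac{8}{7}\right)^2 = \dfrac{588 - 64}{49} = \dfrac{524}{49}$
$r = \dfrac{p}{\sigma_x \sigma_y} = \dfrac{332/49}{\sqrt{250/49}\,\sqrt{524/49}} = \dfrac{332}{\sqrt{250 \times 524}} = \dfrac{332}{361.9392} = 0.9173$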
Exercises:
1)Calculate the pearson’s coefficient of correlation form the following data using 44 and 26
respectively as the origin of x and y (instead of x bar use 44 and instead of y bar use 26 in the
formula)
X 43 44 46 40 44 42 45 42 38 40 42 57
Y 29 31 19 18 19 27 27 29 41 30 26 10
$r = \dfrac{\sum (x - A)(y - B)}{\sqrt{\sum (x - A)^2 \sum (y - B)^2}} = \dfrac{\sum (x - 44)(y - 26)}{\sqrt{\sum (x - 44)^2 \sum (y - 26)^2}}$   (r = -0.733)
2)From the following table calculate the coefficient of correlation by Karl Pearson’s method
X 6 2 10 4 8
Y 9 11 ? 8 7
Arithmetic mean of x and y are 6 and 8 respectively
Solution:
(r=-.919)
3) The following table gives indices of industrial production of registered unemployed (in hundred
Regression Lines
Let (x1, y1), (x2, y2), …, (xn, yn) be n observations of the two variables. If we plot these n points on a graph paper, it may happen that these points tend to cluster themselves along some well defined lines. These are called regression lines.
Regression Equations
Regression equations also known as estimating equations are algebraic expressions of the regression
lines, there are two regression equations.
The regression equation of X on Y is used to describe the variations in the values of X for given
changes in Y and the regression equations of Y on X is used to describe the variation in the values of
y for given changes in X.
Regression equation of Y on X
$Y - \bar{Y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(X - \bar{X})$
Regression equation of X on Y
$X - \bar{X} = r\,\dfrac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$
Remark:
1. Regression line of y on x is used to find the probable value or expected value of y for a given
value of x.
2. The regression line of x on y is used to find the probable or expected value of x for a given
value of y.
5. If the regression coefficients are both positive, then r is positive. If the regression coefficients
are both negative then r is negative.
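6. $r = \dfrac{p}{\sigma_x \sigma_y}$; $r^2 = \dfrac{p^2}{\sigma_x^2 \sigma_y^2} = \dfrac{p}{\sigma_x^2} \cdot \dfrac{p}{\sigma_y^2}$
7. Since $r = \dfrac{p}{\sigma_x \sigma_y}$, we have $r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{p}{\sigma_x^2}$ and $r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{p}{\sigma_y^2}$, so the regression lines
$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}(x - \bar{x})$ and $x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}(y - \bar{y})$
can also be written as $y - \bar{y} = \dfrac{p}{\sigma_x^2}(x - \bar{x})$ and $x - \bar{x} = \dfrac{p}{\sigma_y^2}(y - \bar{y})$.
Note:
1) $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ is the regression co-efficient of X on Y.
2) $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ is the regression coefficient of Y on X.
3) $r = \pm\sqrt{b_{xy}\,b_{yx}}$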
1) The following data relate to the scores obtained by 9 salesmen of a company in an intelligence test
and their weekly sales in thousand rupees.
Test 50 60 50 60 80 50 80 40 70
Scores
Weekly 30 60 40 50 60 30 70 50 60
Sales
a) Obtain the regression equation of sales on intelligence test scores of the salesmen.
b) If the intelligence test score of a salesman is 64, what would be his expected weekly sales?
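A possible way to work this example numerically is sketched below. The function name `regression_y_on_x` is an assumption for illustration; it computes the regression coefficient of y on x as the ratio of the sum of products of deviations to the sum of squared x-deviations (equivalently p/σx²), and the line passes through the point of means.

```python
def regression_y_on_x(xs, ys):
    """Return (slope, intercept) of the regression line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx                  # regression coefficient of y on x
    intercept = my - slope * mx        # line passes through (mean x, mean y)
    return slope, intercept

scores = [50, 60, 50, 60, 80, 50, 80, 40, 70]   # intelligence test scores
sales = [30, 60, 40, 50, 60, 30, 70, 50, 60]    # weekly sales (thousand rupees)
slope, intercept = regression_y_on_x(scores, sales)
expected_sales = slope * 64 + intercept         # estimate for a test score of 64
print(slope, intercept, expected_sales)
```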
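y = 0.516x + 33.73
x = 0.512y + 32.52
Solution
Comparing these with the regression lines
$y - \bar{y} = \dfrac{p}{\sigma_x^2}(x - \bar{x})$ and $x - \bar{x} = \dfrac{p}{\sigma_y^2}(y - \bar{y})$,
the regression coefficients are $\dfrac{p}{\sigma_x^2} = 0.516$ and $\dfrac{p}{\sigma_y^2} = 0.512$, and they are both positive.
Correlation coefficient $r = \dfrac{p}{\sigma_x \sigma_y} = \sqrt{\dfrac{p}{\sigma_x^2} \cdot \dfrac{p}{\sigma_y^2}}$
(since the regression coefficients are given, r is the square root of their product, i.e. the G.M. of both the regression coefficients)
$r = \sqrt{(0.516)(0.512)} = 0.514$
$(\bar{x}, \bar{y})$ is the point of intersection of the regression lines.
y = 0.516x + 33.73
x = 0.512y + 32.52
-0.516x + y = 33.73 ----- (1)
0.516x - 0.264y = 16.78 ----- (2)   (the second equation multiplied by 0.516)
Adding (1) and (2):
0.736y = 50.51
y = 68.63
Substituting in (1):
x = 67.64
Hence $\bar{x} = 67.64$ and $\bar{y} = 68.63$.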
3) From the following data, find the most likely value of y when x = 24
         y        x
Mean     958.8    18.1
S.D.     36.4     2.0
Also r = 0.58
Solution
$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
$y - 958.8 = (0.58)\,\dfrac{36.4}{2}\,(x - 18.1)$
y - 958.8 = 10.556x - 191.064
y = 10.556x + 767.736
Putting x = 24,
y = 253.344 + 767.736 = 1021.08
Exercises:
4) Find the equation of regression lines for the data given below:
X 25 28 35 32 36 36 29 38 34 22
Y 43 46 49 41 36 32 31 30 33 39
X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15
Rank Correlation:
$\rho = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where d = X - Y is the difference between the ranks of corresponding observations.
Exercise:
Obtain the rank correlation between the variables X and Y from the following pairs of observed
values
X 50 55 65 50 55 60 50 65 70 75
Y 110 110 115 125 140 115 130 120 115 160
UNIT-IV
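Let (xi, yi), i = 1, 2, …, n be the n sets of observations, to be fitted by the straight line $y = a_1 x + a_0$. The normal equations are
$n a_0 + a_1 \sum x = \sum y$
$a_0 \sum x + a_1 \sum x^2 = \sum xy$
Solving these,
$a_0 = \dfrac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - \left(\sum x\right)^2}$
$a_1 = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$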
Examples:
Indept. 1 2 4 5 6 8 9
Variable
x
Dependent 2 5 7 10 12 15 19
Variable
y
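Solution
Let $y = a_1 x + a_0$. The normal equations are
$n a_0 + a_1 \sum x = \sum y$
$a_0 \sum x + a_1 \sum x^2 = \sum xy$
x           y     x²     xy
1           2     1      2
2           5     4      10
4           7     16     28
5           10    25     50
6           12    36     72
8           15    64     120
9           19    81     171
Total: 35   70    227    453
$a_0 = \dfrac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - \left(\sum x\right)^2} = \dfrac{(70)(227) - (35)(453)}{7(227) - (35)^2} = \dfrac{35}{364} = 0.0962$
$a_1 = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} = \dfrac{7(453) - (35)(70)}{364} = \dfrac{721}{364} = 1.9808$
y = 1.9808x + 0.0962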
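The closed-form solution of the two normal equations can be sketched as a small function. The name `fit_line` is illustrative; it returns the intercept a0 and slope a1 computed from the four sums in the table.

```python
def fit_line(xs, ys):
    """Least-squares line y = a1*x + a0 from the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2          # common denominator n*sum(x^2) - (sum x)^2
    a0 = (sy * sxx - sx * sxy) / denom
    a1 = (n * sxy - sx * sy) / denom
    return a0, a1

x = [1, 2, 4, 5, 6, 8, 9]
y = [2, 5, 7, 10, 12, 15, 19]
a0, a1 = fit_line(x, y)
print(round(a0, 4), round(a1, 4))
```

A quick sanity check on any fitted line of this kind is that it passes through the point of means, here (5, 10).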
Remark:
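If for the same table of values we assume y to be the independent variable and x to be the dependent variable, we consider an equation of the type
$x = a_1 y + a_0$
In this case
$a_0 = \dfrac{\sum x \sum y^2 - \sum y \sum xy}{n \sum y^2 - \left(\sum y\right)^2}$
$a_1 = \dfrac{n \sum xy - \sum x \sum y}{n \sum y^2 - \left(\sum y\right)^2}$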
Polynomial Regression
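Assume that n pairs of coordinates (xi, yi) are given which are to be approximated by a quadratic. Let the quadratic curve be represented by
$y = a_2 x^2 + a_1 x + a_0$
The normal equations are
$n a_0 + a_1 \sum x + a_2 \sum x^2 = \sum y$
$a_0 \sum x + a_1 \sum x^2 + a_2 \sum x^3 = \sum xy$
$a_0 \sum x^2 + a_1 \sum x^3 + a_2 \sum x^4 = \sum x^2 y$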
Examples:
X -4 -3 -2 -1 0 1 2 3 4 5
Y 21 12 4 1 2 7 15 30 45 67
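Solution
Let $y = a_2 x^2 + a_1 x + a_0$. The normal equations are
$n a_0 + a_1 \sum x + a_2 \sum x^2 = \sum y$
$a_0 \sum x + a_1 \sum x^2 + a_2 \sum x^3 = \sum xy$
$a_0 \sum x^2 + a_1 \sum x^3 + a_2 \sum x^4 = \sum x^2 y$
x        y      x²     x³      x⁴      xy      x²y
-4       21     16     -64     256     -84     336
-3       12     9      -27     81      -36     108
-2       4      4      -8      16      -8      16
-1       1      1      -1      1       -1      1
0        2      0      0       0       0       0
1        7      1      1       1       7       7
2        15     4      8       16      30      60
3        30     9      27      81      90      270
4        45     16     64      256     180     720
5        67     25     125     625     335     1675
Total 5  204    85     125     1333    513     3193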
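The three normal equations for the quadratic can be assembled from the column sums and solved as a small linear system. The sketch below is one possible approach, not necessarily the method intended in the notes: the helper `solve_linear` is an illustrative Gauss-Jordan routine written with the standard library only.

```python
def solve_linear(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]             # bring the largest pivot up
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

x = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [21, 12, 4, 1, 2, 7, 15, 30, 45, 67]

s = [sum(xi ** k for xi in x) for k in range(5)]                    # s[k] = sum of x^k, s[0] = n
t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]   # t[k] = sum of x^k * y

# Normal equations for y = a2*x^2 + a1*x + a0, unknowns ordered (a0, a1, a2)
normal = [[s[0], s[1], s[2]],
          [s[1], s[2], s[3]],
          [s[2], s[3], s[4]]]
a0, a1, a2 = solve_linear(normal, t)
print(round(a0, 4), round(a1, 4), round(a2, 4))
```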
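Let $y = ae^{bx}$ be the curve to be fitted.
Taking log (to base 10) on both sides, $\log_{10} y = \log_{10} a + bx \log_{10} e$, i.e. $Y = A + Bx$, where $Y = \log_{10} y$, $A = \log_{10} a$ and $B = b \log_{10} e$.
The normal equations are
$\sum Y = nA + B \sum x$
$\sum xY = A \sum x + B \sum x^2$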
Examples:
1) Fit a curve $y = ae^{bx}$ to the following data:
x    0    5      8    12      20
y    3    1.5    1    0.55    0.18
x        y       Y = log10 y    x²     xY
0        3       0.4771         0      0
5        1.5     0.1761         25     0.8805
8        1       0              64     0
12       0.55    -0.2596        144    -3.1152
20       0.18    -0.7447        400    -14.894
Total            -0.3511        633    -17.1287
Solving the normal equations, A = 0.4815; B = -0.0613
a = Antilog(0.4815) = 3.0304
$b = B / \log_{10} e$
= -0.0613/0.4343
= -0.1411
$y = 3.0304\,e^{-0.1411x}$
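The log-linear fit can be sketched in code as follows. This is an illustrative variant: the function name `fit_exponential` is an assumption, and it takes natural logarithms instead of base-10 logarithms, so the fitted slope is b itself and no division by log10 e is needed. Small differences from the hand computation are expected because the tables above are rounded to four figures.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y) = ln(a) + b * x."""
    n = len(xs)
    logs = [math.log(v) for v in ys]          # natural logs of y
    sx, sl = sum(xs), sum(logs)
    sxx = sum(v * v for v in xs)
    sxl = sum(v * l for v, l in zip(xs, logs))
    denom = n * sxx - sx ** 2
    b = (n * sxl - sx * sl) / denom           # slope of the log-linear fit
    ln_a = (sl * sxx - sx * sxl) / denom      # intercept of the log-linear fit
    return math.exp(ln_a), b

x = [0, 5, 8, 12, 20]
y = [3, 1.5, 1, 0.55, 0.18]
a, b = fit_exponential(x, y)
print(round(a, 3), round(b, 4))
```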
Examples:
1) Fit a curve $y = ab^x$ to the following data:
x    1      2      3     4     5     6
y    151    100    61    50    20    8
Solution:
$y = ab^x$
Taking log10 on both sides, $\log_{10} y = \log_{10} a + x \log_{10} b$, i.e. $Y = A + Bx$, where $Y = \log_{10} y$, $A = \log_{10} a$ and $B = \log_{10} b$.
The normal equations are
$\sum Y = nA + B \sum x$
$\sum xY = A \sum x + B \sum x^2$
x           y      Y = log y    xY         x²
1           151    2.1790       2.1790     1
2           100    2            4          4
3           61     1.7853       5.3559     9
4           50     1.6990       6.796      16
5           20     1.3010       6.505      25
6           8      0.9031       5.4186     36
Total: 21          9.8674       30.2545    91
9.8674 = 6A + 21B
30.2545 = 21A + 91B
Solving these,
A = 2.5010, B = -0.2447
$\log_{10} a = 2.5010$
$a = 10^{2.5010} = 316.9567$
$b = 10^{B} = 0.5692$
$y = 316.9567\,(0.5692)^x$
X 1 2 3 4 5 6
Y 1200 900 600 200 110 50
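Solution:
$y = ax^b$
$\log_{10} y = \log_{10} a + b \log_{10} x$
Y = A + BX, where $Y = \log_{10} y$, $A = \log_{10} a$, $B = b$ and $X = \log_{10} x$
A = 3.3086
b = -1.7494
$y = 2035\,x^{-1.7494}$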
LARGE SAMPLES
Definition
Definition
Definition
Example
In a shop we assess the quality of rice or any other commodity by taking a handful of it from the bag and then decide whether to purchase it or not.
The statistical constants of the population, namely the mean $\mu$ and the variance $\sigma^2$, are usually referred to as parameters.
Sampling Distribution
If we draw a sample of size n from a given finite population of size N, then the total number of possible samples is
${}^{N}C_{n} = \dfrac{N!}{n!\,(N - n)!} = k$
For each of these k samples we can compute some statistic, say t = t(x1, x2, …, xn), in particular the mean, variance, etc.
The set of values of the statistic so obtained, one for each sample constitutes the sampling distribution
of the statistic.
Standard error:
The standard deviation of sampling distribution of a statistic is known as its standard error and it is
denoted by (S.E.)
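The ideas of a sampling distribution and its standard error can be illustrated on a tiny example. The population below is hypothetical (it does not come from the notes); the sketch enumerates every possible sample of size n drawn without replacement, computes the sample mean for each, and takes the standard deviation of those means as the standard error.

```python
import math
from itertools import combinations

population = [2, 4, 6, 8, 10]      # hypothetical population, N = 5
n = 2                              # sample size

samples = list(combinations(population, n))   # all possible samples without replacement
k = len(samples)                              # k = N! / (n! (N - n)!)

sample_means = [sum(s) / n for s in samples]  # the statistic t = mean, one per sample
mean_of_means = sum(sample_means) / k

# Standard error: standard deviation of the sampling distribution of the mean
standard_error = math.sqrt(sum((m - mean_of_means) ** 2 for m in sample_means) / k)
print(k, mean_of_means, round(standard_error, 4))
```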
Test of Significance:
A very important aspect of the sampling theory is the study of tests of significance, which enable us to decide on the basis of the sample results if
i) The deviation between the observed sample statistic and the hypothetical parameter value is
significant.
Null hypothesis:
For applying the test of significance we first set up a hypothesis, that is, a definite statement about the
population parameter. Such a hypothesis is usually a hypothesis of no difference and is denoted by
H0.
Alternative hypothesis:
Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis,
usually denoted by H1.
For example
If we want to test the null hypothesis that the population has a specified mean $\mu_0$ (say), i.e., $H_0: \mu = \mu_0$, then the alternative hypothesis could be
i) $H_1: \mu \neq \mu_0$
ii) $H_1: \mu > \mu_0$
iii) $H_1: \mu < \mu_0$
The alternative hypothesis in (i) is known as a two tailed alternative, the alternative in (ii) is
known as right tailed and the alternative in (iii) is known as left tailed.
The setting of the alternative hypothesis is very important in deciding whether we have to use a
single tailed (right or left) or a two tailed test.
Errors in Sampling
The main objective in a sampling theory is to draw valid inferences about the population parameters
on the basis of the sample results. In practice we decide to accept or reject the lot after examining a
sample from it. We have two types of errors.
Type I error: Reject H0 when it is true.
Type II error: Accept H0 when it is false.
Critical region
A region corresponding to a statistic t in the sample space S which leads to the rejection of H0 is called
the critical region or rejection region. The regions which lead to the acceptance of H0 give us a region
called the acceptance region.
Level of Significance
The probability that a random value of the statistic t belongs to the critical region is known as the
level of significance. In other words, the level of significance is the size of the Type I error. The levels of
significance usually employed in testing of hypotheses are 5% and 1%.
A test of any statistical hypothesis where the alternative hypothesis is one tailed (right tailed
or left tailed) is called a one tailed test.
A test of a statistical hypothesis where the alternative hypothesis is two tailed is called a two
tailed test.
ii) Choose the appropriate level of significance (either 5% or 1%). This is to be decided before the sample
is drawn.
iii) Compute the test statistic $z = \frac{t - E(t)}{S.E.(t)}$ under the null hypothesis.
iv) Compare the computed value of z in step (iii) with the significant value at the given level of
significance.
For a single tailed test (right tail or left tail) we compare the computed value of |z| with 1.645
(at 5% level) and 2.33 (at 1% level) and accept or reject H0 accordingly.
Remark:
Level of significance    Two tailed test    One tailed test
5%                       1.96               1.645
1%                       2.58               2.33
Large Samples:
Definition
If the size of the sample n>30 then that sample is called large sample.
Suppose a large sample of size n is taken from a normal population. To test the significant
difference between the sample proportion p and the population proportion P, we use the test statistic
$z = \frac{p - P}{\sqrt{PQ/n}}$, where n is the sample size and $Q = 1 - P$.
Note: Limits for the population proportion P are given by $p \pm 3\sqrt{pq/n}$, where $q = 1 - p$.
Examples:
1) A manufacturer claimed that atleast 95% of the equipment which he supplied to a factory
conformed to specifications. An examination of a sample of 200 pieces of equipment revealed that 18
were faulty. Test his claim at 5% level of significance?
Solution
Number of pieces conforming to specifications = 200 - 18 = 182
p = proportion of pieces conforming to specifications = 182/200 = 0.91
P = population proportion = 95/100 = 0.95, so Q = 1 - P = 0.05
Test statistic
$z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.91 - 0.95}{\sqrt{(0.95)(0.05)/200}} = -2.59$
Since the alternative hypothesis is one tailed, the tabulated value of z at 5% level of significance is 1.645.
Since the calculated value |z| = 2.59 is greater than 1.645, we reject the null hypothesis H0 at 5% level of
significance.
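The same computation can be checked with a few lines of code. This is a minimal sketch of the large-sample test statistic for a single proportion; the helper name `proportion_z` is hypothetical and chosen only for illustration.

```python
import math

def proportion_z(successes, n, P):
    """z = (p - P) / sqrt(P*Q/n) for a sample proportion p = successes/n."""
    p = successes / n
    Q = 1 - P
    return (p - P) / math.sqrt(P * Q / n)

# Example 1: 182 of 200 pieces conform, claimed proportion 0.95
z = proportion_z(182, 200, 0.95)
print(round(z, 3))
```

The sign of z only indicates the direction of the deviation; the comparison with the table value uses |z|.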
2) In a sample of 1000 people in Karnataka 540 are rice eaters and the rest are wheat eaters. Can we
assume that both rice and wheat are equally popular in this state at 1% level of significance?
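Solution
Given n = 1000
p = sample proportion of rice eaters = 540/1000 = 0.54
P = population proportion of rice eaters = 1/2 = 0.5
Q = 1 - P = 0.5
Null hypothesis: H0: Both rice and wheat are equally popular in the state, i.e., P = 1/2
Alternative hypothesis: H1: P ≠ 0.5 (two tailed alternative)
Test statistic
$z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.54 - 0.5}{\sqrt{(0.5)(0.5)/1000}} = 2.53$
The tabulated value of z at 1% level of significance is 2.58 for two tailed test.
Since the calculated value of z is less than the tabulated value, we accept the null hypothesis at 1% level of significance,
i.e., rice and wheat are equally popular in that state.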
3) In a sample of 400 parts manufactured by a factory, the number of defective parts was found to be
30. The company, however claimed that only 5% of their product is defective. Is the claim tenable?
Solution
Given n = 400
p = proportion of defectives in the sample = 30/400 = 0.075
P = the population proportion = 5/100 = 0.05
Q = 1 - P = 1 - 0.05 = 0.95
Test statistic
$z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.075 - 0.050}{\sqrt{(0.05)(0.95)/400}} = 2.29$
Since the alternative hypothesis is a one tailed alternative, we apply a one tailed test.
Since the calculated value of z > tabulated value of z, we reject the null hypothesis,
i.e., the company's claim that only 5% of their product is defective is not acceptable.
4) A die was thrown 9000 times and of these 3220 yield a 3 or 4. Is this consistent with the hypothesis
that the die was unbiased?
Solution
Given n = 9000
p = sample proportion of throws yielding a 3 or 4 = 3220/9000 = 0.3578
P = P(getting a 3 or 4)
= P(getting 3) + P(getting 4)
= 1/6 + 1/6 = 2/6 = 1/3
P = 0.3333
Q = 1 - P = 0.6667
Null hypothesis H0: The die is unbiased, i.e., P = 1/3
Alternative hypothesis H1: P ≠ 1/3 (two tailed alternative)
Test statistic
$z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.3578 - 0.3333}{\sqrt{(0.3333)(0.6667)/9000}} = 4.94$
Since the alternative hypothesis is a two tailed alternative, we apply a two tailed test.
The tabulated value of z for two tailed test at 5% level of significance is 1.96.
Since calculated value of z > tabulated value of z, the null hypothesis is rejected.
5) A random sample of 500 pineapples was takes from a large consignment and 65 were found to be
bad. Find the percentage of bad pineapples in the consignment.
Solution
Given n = 500
p = proportion of bad pineapples in the sample = 65/500 = 0.13
q = 1 - p = 0.87
We know that the limits for the population proportion P are given by
$p \pm 3\sqrt{pq/n} = 0.13 \pm 3\sqrt{(0.13)(0.87)/500} = 0.13 \pm 0.045$
The percentage of bad pineapples in the consignment lies between 8.5% and 17.5%.
6) A random sample of 500 apples was taken from a large consignment and 60 were found to be bad.
Obtain the 98% confidence limits for the percentage number of bad apples in the consignment.
Solution:
Given n = 500
p = proportion of bad apples in the sample = 60/500 = 0.12
q = 1 - p = 0.88
The 98% confidence limits are
$p \pm 2.33\sqrt{pq/n} = 0.12 \pm 2.33\sqrt{(0.12)(0.88)/500}$
$= 0.12 \pm 2.33(0.01453)$
$= (0.08615, 0.15385)$
98% confidence limits for percentage of bad apples in the consignment are (8.61%, 15.38%).
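The limits $p \pm k\sqrt{pq/n}$ used in the last two examples can be sketched as a small function. The name `proportion_limits` and the parameter `k` (the multiplier, 3 or 2.33 above) are assumptions made for illustration.

```python
import math

def proportion_limits(successes, n, k=3.0):
    """Return (lower, upper) limits p -/+ k*sqrt(p*q/n) for the population proportion."""
    p = successes / n
    q = 1 - p
    half_width = k * math.sqrt(p * q / n)
    return p - half_width, p + half_width

# Example 6: 60 bad apples out of 500, multiplier 2.33 for 98% limits
low, high = proportion_limits(60, 500, k=2.33)
print(round(low, 4), round(high, 4))
```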
Difference of Proportions
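Suppose two large samples of sizes n1 and n2 are taken respectively from two different populations. The test statistic is
$z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$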
Examples:
1) Random samples of 400 men and 600 women were asked whether they would like to have a
flyover near their residence. 200 men and 325 women were in favour of the proposal. Test the
hypothesis that proportions of men and women in favour of the proposal are same at 5% level of
significance?
Solution
n1 = 400, n2 = 600
Proportion of men in favour = p1 = 200/400 = 0.5
Proportion of women in favour = p2 = 325/600 = 0.541
Null hypothesis H0: Assume that there is no significant difference between the opinions of men and
women as far as the proposal of a flyover is concerned,
i.e., H0: P1 = P2
Test statistic
$z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$
$p = \frac{400(200/400) + 600(325/600)}{400 + 600} = \frac{525}{1000} = 0.525$
q = 1 - p = 1 - 0.525 = 0.475
$z = \frac{0.5 - 0.541}{\sqrt{0.525(0.475)\left(\frac{1}{400} + \frac{1}{600}\right)}} = \frac{-0.041}{0.032} = -1.28$
|z| = 1.28
The tabulated value of z for two tailed test at 5% level of significance is 1.96.
Since calculated value of |z| < tabulated value, we accept the null hypothesis at 5% level of
significance,
i.e., there is no difference of opinion between men and women as far as the proposal of a flyover is
concerned.
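A minimal sketch of the pooled two-proportion statistic used above follows. The function name `two_proportion_z` is hypothetical; it takes the raw counts rather than rounded proportions, so the result differs slightly in the second decimal from the hand computation, which rounds 325/600 to 0.541.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z = (p1 - p2) / sqrt(p*q*(1/n1 + 1/n2)) with pooled p = (x1 + x2)/(n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)   # same as (n1*p1 + n2*p2)/(n1 + n2)
    q = 1 - p
    return (p1 - p2) / math.sqrt(p * q * (1 / n1 + 1 / n2))

# Flyover example: 200 of 400 men, 325 of 600 women in favour
print(round(two_proportion_z(200, 400, 325, 600), 2))
```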
2) Before an increase in excise duty on tea, 800 persons out of a sample of 1000 persons were found
to be tea drinkers. After an increase on duty, 800 people were tea drinkers in a sample of 1200 people.
Using standard error of proportion, state whether there is a significant decrease in the consumption of
tea after the increase in excise duty?
Solution
p1 = 800/1000 = 0.8
p2 = 800/1200 = 0.667
Null hypothesis H0: Assume that there is no significant difference between the consumption of tea
before and after the increase in excise duty,
i.e., H0: P1 = P2
Test statistic
$z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$
$p = \frac{1000(800/1000) + 1200(800/1200)}{1000 + 1200} = \frac{1600}{2200} = 0.727$
q = 1 - p = 1 - 0.727 = 0.273
$z = \frac{0.8 - 0.667}{\sqrt{0.727(0.273)\left(\frac{1}{1000} + \frac{1}{1200}\right)}} = \frac{0.133}{0.019} = 7$
z = 7
The tabulated value of z for one tailed test at 5% level of significance is 1.645.
Since calculated value of z > tabulated value, we reject the null hypothesis at 5% level of
significance.
That is, there is a significant decrease in the consumption of tea after the increase in excise duty.
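Note:
To test the difference between $p_1$ and the pooled proportion $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$, we use
$z = \frac{p_1 - p}{\sqrt{\frac{n_2\,pq}{n_1(n_1 + n_2)}}}$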
3) In a random sample of 400 students of the university teaching department, it was found that 300
students failed in the examination. In another sample of 500 students of the affiliated colleges the
number of failures in the same examination was found to be 300. Find out whether the proportion of
failures in the university teaching departments significantly greater than the proportion of failures in
the university teaching departments and the affiliated colleges taken together.
Solution
p1 = 300/400 = 0.75
p2 = 300/500 = 0.6
$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{400(0.75) + 500(0.6)}{400 + 500} = 0.667$
q = 0.333
Null hypothesis: H0: Assume that there is no significant difference between p1 and p (i.e., p1 = p)
Test statistic
$z = \frac{p_1 - p}{\sqrt{\frac{n_2\,pq}{n_1(n_1 + n_2)}}} = \frac{0.75 - 0.667}{\sqrt{\frac{500(0.667)(0.333)}{400(400 + 500)}}} = 4.74$
The table value of z for one tailed test at 5% level of significance is 1.645.
Since calculated value of z > table value, we reject the null hypothesis.
Therefore the proportion of failures in the university teaching departments is significantly greater than the proportion of failures
in university departments and affiliated colleges taken together.
Note:
i) Suppose the population proportions P1 and P2 are given and P1 ≠ P2. If we want to test the
hypothesis that the difference P1 - P2 in the population proportions is likely to be hidden in simple
samples of sizes n1 and n2 from the two populations respectively, then
$z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}}$
and, taking $p_1 = p_2$ under the null hypothesis,
$|z| = \frac{|P_1 - P_2|}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}}$
4) A cigarette manufacturing firm claims that its brand A line of cigarettes outsells its brand B by
8%. (The meaning of this one is P1-P2=8%=0.08)If it is found that 42 out of a sample of 200 smokers
prefer brand A and 18 out of another sample of 100 smokers prefer brand B.Test whether the 8%
difference is a valid claim.
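Solution
p1 = 42/200 = 0.21
p2 = 18/100 = 0.18
P1 - P2 = 8% = 8/100 = 0.08
Null hypothesis: Assume that the 8% difference in the sale of the two brands of cigarettes is a valid claim,
i.e., H0: P1 - P2 = 0.08
$z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{60}{300} = 0.2$ and $q = 1 - p = 0.8$
$z = \frac{(0.21 - 0.18) - 0.08}{\sqrt{(0.2)(0.8)\left(\frac{1}{200} + \frac{1}{100}\right)}} = -1.02$
|z| = 1.02
Since the alternative is a two tailed alternative, we apply a two tailed test.
The table value of z at 5% level of significance for two tailed test is 1.96.
Since the calculated value of |z| (= 1.02) < table value (= 1.96), we accept the null hypothesis.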
5) In two large populations, there are 30% and 25% respectively of fair haired people. Is this
difference likely to be hidden in samples of 1200 and 900 respectively from the two populations.
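Solution
P1 = 30/100 = 0.3
P2 = 25/100 = 0.25
Q1 = 1 - P1 = 0.7, Q2 = 1 - P2 = 0.75
Null hypothesis: the difference is hidden in the samples, i.e., H0: p1 = p2
$z = \frac{P_1 - P_2}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}} = \frac{0.3 - 0.25}{\sqrt{\frac{(0.3)(0.7)}{1200} + \frac{(0.25)(0.75)}{900}}} = 2.55$
The table value of z at 5% level of significance for two tailed test is 1.96.
Since calculated value of z > table value, we reject the null hypothesis, so the difference is unlikely to be hidden in the samples.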
Exercises:
5) A machine produced 20 defective articles in a batch of 400. After over hauling it produced 10
defective in a batch of 300. Has the machine improved? (Null hypo: p1=p2; Alter. hypo:p1 <p2 )
Suppose we want to test whether the given sample of size n has been drawn from a population
with mean $\mu$. We set up the null hypothesis that there is no difference between $\bar{x}$ and $\mu$, where $\bar{x}$
is the sample mean.
The test statistic is $z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, where s is the sample standard deviation.
If $\sigma$ is given,
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Note:
The values $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$ are called 95% confidence limits for the mean of the population
corresponding to the given sample.
Similarly, $\bar{x} \pm 2.58\,\frac{\sigma}{\sqrt{n}}$ are called 99% confidence limits.
Examples:
1) A sample of 900 members has a mean of 3.4cms and S.D. 2.61cms. Is the sample drawn from a
large population of mean 3.25cm and S.D. 2.61cms? If the population is normal and its mean is
unknown. Find the 95% fiducial limits of true mean?
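Solution
$\bar{x} = 3.4$; $s = 2.61$; n = 900
Null hypothesis: H0: Assume that the sample has been drawn from the population with mean
$\mu = 3.25$.
The test statistic
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.4 - 3.25}{2.61/\sqrt{900}} = 1.724$
The table value of z for two tailed test at 5% level of significance is 1.96.
Since the calculated value of z is less than the table value, we accept the null hypothesis.
That is, the sample has been drawn from the population with mean $\mu = 3.25$.
95% confidence limits are
$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 3.4 \pm 1.96\,\frac{2.61}{\sqrt{900}} = 3.4 \pm 0.1705$, i.e., 3.2295 and 3.5705.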
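The test for a single mean and its confidence limits can be sketched as follows. The names `mean_z` and `mean_limits` are illustrative assumptions, and the multiplier 1.96 is the 5% two tailed value quoted in the notes.

```python
import math

def mean_z(xbar, mu, sd, n):
    """z = (xbar - mu) / (sd / sqrt(n)) for a large sample."""
    return (xbar - mu) / (sd / math.sqrt(n))

def mean_limits(xbar, sd, n, k=1.96):
    """Confidence limits xbar -/+ k * sd / sqrt(n)."""
    half_width = k * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 1: n = 900, sample mean 3.4, S.D. 2.61, hypothesised mean 3.25
print(round(mean_z(3.4, 3.25, 2.61, 900), 3))
print(mean_limits(3.4, 2.61, 900))
```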
2) An insurance agent has claimed that the average age of policy holders who issue through him is
less than the average for all agents which is 30.5 years. A random sample of 100 policy holders who
had issued through him gave the following age distribution
Calculate the A.M. & S.D. of this distribution and use these values to test his claim at 5%
level of significance.
Solution
From the distribution, $\bar{x} = 28.8$ and $s = 6.35$, with n = 100.
Null hypothesis H0: The sample is drawn from a population with mean $\mu = 30.5$
Alternative hypothesis: H1: $\mu < 30.5$ (one tailed, since the claim is "less than")
$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{28.8 - 30.5}{6.35/\sqrt{100}} = -2.677$
Since the alternative hypothesis is a one tailed alternative, we apply a one tailed test.
The table value of z at 5% level of significance for one tailed test is 1.645.
Since calculated value |z| = 2.68 > table value 1.645, we reject the null hypothesis.
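Let $\bar{x}_1$ be the mean of a sample of size $n_1$ from a population with mean $\mu_1$ and variance $\sigma_1^2$.
Let $\bar{x}_2$ be the mean of a sample of size $n_2$ from a population with mean $\mu_2$ and variance $\sigma_2^2$.
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
Note: If the samples have been drawn from the same population, then $\sigma_1^2 = \sigma_2^2 = \sigma^2$ and
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}}$
If $\sigma$ is not known, it is estimated by
$\sigma^2 = \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2}$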
Examples:
1) The means of 2 large samples 1000 and 2000 members are 67.5 inches and 68.0 inches
respectively. Can the samples be regarded as drawn from the population of S.D. 2.5 inches?
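Solution
$\bar{x}_1 = 67.5$; $\bar{x}_2 = 68$; n1 = 1000; n2 = 2000; $\sigma = 2.5$
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}} = \frac{67.5 - 68}{\sqrt{\frac{2.5^2}{1000} + \frac{2.5^2}{2000}}} = \frac{-0.5}{0.0968} = -5.16$
Since the alternative hypothesis is a two tailed alternative, we apply a two tailed test.
The table value of z for two tailed test at 5% level of significance is 1.96.
|z| = 5.16 > 1.96, so we reject the null hypothesis,
i.e., the samples are not drawn from the same population of S.D. 2.5 inches.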
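The difference-of-means statistic covers both the common-$\sigma$ case and the case with separate standard deviations. A minimal sketch, with the hypothetical name `two_mean_z`:

```python
import math

def two_mean_z(x1, x2, sd1, sd2, n1, n2):
    """z = (x1 - x2) / sqrt(sd1^2/n1 + sd2^2/n2); pass sd1 == sd2 for a common S.D."""
    return (x1 - x2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Heights example: means 67.5 and 68.0, common S.D. 2.5, sizes 1000 and 2000
print(round(two_mean_z(67.5, 68.0, 2.5, 2.5, 1000, 2000), 2))
```

When the population standard deviations are unknown, the sample values s1 and s2 are passed in their place, which is the large-sample approximation these notes use.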
2) The mean yield of wheat from a district was 210 pounds with S.D. 10 pounds per acre from a
sample of 100 plots. In another district the mean yield was 220 pounds with S.D. 12 pounds from a
sample of 150 plots. Assuming that the S.D. of yield in the entire state was 11 pounds, test whether
there is any significant difference between the mean yield of crops in the two districts.
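Solution
$\bar{x}_1 = 210$; n1 = 100
$\bar{x}_2 = 220$; n2 = 150
$\sigma = 11$
Null hypothesis H0: $\bar{x}_1 = \bar{x}_2$
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}} = \frac{210 - 220}{\sqrt{\frac{11^2}{100} + \frac{11^2}{150}}} = -7.041$
|z| = 7.041, which is greater than the 5% two tailed table value 1.96, so we reject the null hypothesis.
That is, there is a significant difference between the mean yield of crops in the two districts.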
Note: If the two samples are drawn from two populations with unknown standard deviations, then
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
3) In a survey of buying habits, 400 women shoppers are chosen at random in super market A located
in a certain section of the city. Their average weekly food expenditure is Rs.250 with a S.D. of Rs.40.
For 400 women shoppers chosen at random in super market B in another section of the city, the
average weekly food expenditure is Rs.220 with a S.D. of Rs.55. Test at 1% level of significance
whether the average weekly food expenditures of the two populations of shoppers are equal.
Solution
n1 = 400; $\bar{x}_1 = 250$; s1 = 40
n2 = 400; $\bar{x}_2 = 220$; s2 = 55
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{250 - 220}{\sqrt{\frac{40^2}{400} + \frac{55^2}{400}}}$
z = 8.82
The table value of z for two tailed test at 1% level of significance is 2.58.
Since the calculated value of z is greater than the table value, we reject the null hypothesis.
That is, the average weekly food expenditures of the two populations of shoppers are not equal.
4) The means of two samples of 1000 and 2000 items are 67.5 and 68.0 respectively. Can the
samples be regarded at 5 % level of significance, as drawn from the same population with standard
deviation 2.5?
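Solution
$\bar{x}_1 = 67.5$; $\bar{x}_2 = 68$
$\sigma = 2.5$
Null hypothesis H0: $\bar{x}_1 = \bar{x}_2$
$|z| = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}} = 5.163$
Therefore the samples are not drawn from the same population with standard deviation 2.5.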
Exercises:
5) A sample of 100 electric light bulbs produced by manufacturer A showed a mean life time of 1190
hrs and a S.D of 90 hrs. A sample of 75 bulbs produced by manufacturer B showed a mean life time
of 1230 hrs with a standard deviation of 120 hrs. Is there a difference between the mean life time on
the two brands at significance level of i) 5% , ii) 1% (Ans: z= 2.421) (In this problem Samples are
draw from different population)
6) There are two brands of car tyres A and B in the market. A sample of 100 tyres of brand A has an
average life of 37500 kms with a S.D. of 2500 kms. Another sample of 75 tyres of brand B has an
average life of 3900 kms with a S.D. of 3000 km. Can we conclude that brand B is better than
A? (In this problem samples are drawn from same population)
Alternative hypothesis: $\bar{x}_1 < \bar{x}_2$
UNIT-V
t-test
$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}}$, where $\bar{x} = \frac{\sum x}{n}$; $s^2 = \frac{\sum (x - \bar{x})^2}{n}$
The number of degrees of freedom of a statistic, generally denoted by $\nu$, is defined as the number n of
independent observations in the sample (i.e., the sample size) minus the number k of population
parameters which must be estimated from sample observations.
$\nu = n - k$
Assumptions:
1) The parent populations from which the samples are drawn are normally distributed.
3) $\sigma_1^2 = \sigma_2^2$, i.e., the population variances are equal.
Examples:
A sample of 26 bulbs gives a mean life of 990 hours with a standard deviation of 20 hours. The
manufacturer claims that the mean life of bulbs is 1000 hours. Is the sample not upto the standard
Solution
$\bar{x} = 990$, $\mu = 1000$, s = 20, n = 26
$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{990 - 1000}{20/\sqrt{25}} = -2.5$
|t| = 2.5
Degrees of freedom = n - 1 = 26 - 1 = 25
The tabulated value of t at 5% level of significance for 25 degrees of freedom for a one tailed test is 1.7.
Since the calculated value of |t| is greater than the tabulated value of t, we reject the null hypothesis H0.
2) Tests made on the breaking strength of hard drawn copper wire gave the following results in
kilograms
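It was stated that the population mean of the breaking strength is 593 kg. Examine the validity of this
statement, given that the 5% values of Student's t for 8, 9 and 10 degrees of freedom are
respectively 2.31, 2.26, 2.23.
Solution
x        x - x̄      (x - x̄)^2
588      2.222       4.937
582     -3.778      14.273
580     -5.778      33.385
578     -7.778      60.497
582     -3.778      14.273
580     -5.778      33.385
582     -3.778      14.273
606     20.222     408.929
594      8.222      67.601
Total              651.553
$\bar{x} = \frac{\sum x}{n} = \frac{5272}{9} = 585.778$
$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{651.553}{9} = 72.395$
$s = 8.509$
$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{585.778 - 593}{8.509/\sqrt{8}} = -2.4006$, so |t| = 2.4006
Since |t| = 2.4006 is greater than the table value 2.31 for 8 degrees of freedom, we reject the null hypothesis.
Therefore the population mean of the breaking strength is not 593 kg.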
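The table and the t statistic above can be reproduced in code. This sketch follows the convention of these notes (variance with divisor n, and $\sqrt{n-1}$ in the statistic), which is algebraically the same as the more common form with divisor n - 1 and $\sqrt{n}$. The function name `one_sample_t` is an assumption for illustration.

```python
import math

def one_sample_t(xs, mu):
    """t = (mean - mu) / (s / sqrt(n - 1)), with s^2 = sum((x - mean)^2) / n."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n   # divisor n, as in the notes
    return (mean - mu) / (math.sqrt(s2) / math.sqrt(n - 1))

strengths = [588, 582, 580, 578, 582, 580, 582, 606, 594]
print(round(one_sample_t(strengths, 593), 3))
```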
$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Examples:
1) The I.Q’s of 16 students from an area of the city showed a mean of 107 with S.D. of 10 while the
I.Q. of 14 students from another area of the city showed a mean of 112 and S.D. of 8. Is there a
significant difference between the I.Q.’s of 2 groups at 5% level of significance?
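Solution
n1 = 16, n2 = 14
$\bar{x}_1 = 107$, $\bar{x}_2 = 112$
s1 = 10, s2 = 8
Test statistic
$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
$s^2 = \frac{16(10^2) + 14(8^2)}{16 + 14 - 2} = 89.14$, so s = 9.44
t = -1.447
|t| = 1.447
The tabulated value of t for 28 degrees of freedom at 5% level of significance (two tailed) is 2.05.
Since the calculated value of |t| is less than the tabulated value, we accept the null hypothesis: there is no significant difference between the I.Q.'s of the two groups.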
2) A group of 10 boys fed on diet A and another group of 8 boys fed on diet B recorded the following
increase in weights
Diet A  5 6 8 1 12 4 3 9 6 10 kgs
Diet B  2 3 6 8 10 1 2 8 kgs
Does it show the superiority of diet A over that of diet B?
Solution
Let the increase in weights (in kgs) due to diets A and B be denoted by the variables x1 and x2
respectively.
x1     x1 - x̄1    (x1 - x̄1)^2     x2     x2 - x̄2    (x2 - x̄2)^2
5      -1.4         1.96            2      -3          9
6      -0.4         0.16            3      -2          4
8       1.6         2.56            6       1          1
1      -5.4        29.16            8       3          9
12      5.6        31.36           10       5         25
4      -2.4         5.76            1      -4         16
3      -3.4        11.56            2      -3          9
9       2.6         6.76            8       3          9
6      -0.4         0.16
10      3.6        12.96
Total: 64         102.4            40                 82
$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{64}{10} = 6.4$
$\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{40}{8} = 5$
$s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1} = \frac{102.4}{10} = 10.24$
$s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2} = \frac{82}{8} = 10.25$
$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{102.4 + 82}{16} = 11.525$
s = 3.395
$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{6.4 - 5}{3.395\sqrt{\frac{1}{10} + \frac{1}{8}}} = 0.8694$
Since the calculated value of t is less than the tabulated value, it is not significant.
We conclude that the diets A and B do not differ significantly in respect of increase in weights.
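The pooled two-sample t statistic can be sketched directly from the raw data. The function name `pooled_t` is hypothetical; the pooled variance uses the sum of squared deviations divided by $n_1 + n_2 - 2$, which equals $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2 - 2)$ in the notation above.

```python
import math

def pooled_t(xs, ys):
    """t = (mean_x - mean_y) / (s * sqrt(1/n1 + 1/n2)) with pooled s."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    # Sum of squared deviations of both samples about their own means
    ss = sum((x - m1) ** 2 for x in xs) + sum((y - m2) ** 2 for y in ys)
    s = math.sqrt(ss / (n1 + n2 - 2))
    return (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))

diet_a = [5, 6, 8, 1, 12, 4, 3, 9, 6, 10]
diet_b = [2, 3, 6, 8, 10, 1, 2, 8]
print(round(pooled_t(diet_a, diet_b), 4))
```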
F – test
Let n1 be the number of observations in a sample I from the first population with variance $\sigma_1^2$ and
n2 be the number of observations in a sample II from the second population with variance $\sigma_2^2$.
We set up the null hypothesis H0: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
$F = \frac{S_1^2}{S_2^2}$
where
$S_1^2 = \frac{\sum (X - \bar{X}_1)^2}{n_1 - 1}$
$S_2^2 = \frac{\sum (X - \bar{X}_2)^2}{n_2 - 1}$
Note:
In numerical problems, we take the greater of the variances $S_1^2$ or $S_2^2$ in the numerator.
Assumptions on F-test
3) The ratio of $\sigma_1^2$ to $\sigma_2^2$ should be equal to 1 or greater than 1.
Examples:
1) Two samples of 6 and 7 items have the following values of the variables
Sample I 39 41 43 41 45 39
Sample II 40 42 40 44 39 38 40
Solution
Null hypothesis H0: $\sigma_1^2 = \sigma_2^2$
Alternative hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$
$F = \frac{S_1^2}{S_2^2}$
x1     x1 - x̄1    (x1 - x̄1)^2     x2     x2 - x̄2    (x2 - x̄2)^2
39     -2.3         5.29           40     -0.4         0.16
41     -0.3         0.09           42      1.6         2.56
43      1.7         2.89           40     -0.4         0.16
41     -0.3         0.09           44      3.6        12.96
45      3.7        13.69           39     -1.4         1.96
39     -2.3         5.29           38     -2.4         5.76
                                   40     -0.4         0.16
Total              27.34                              23.72
$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{248}{6} = 41.3$
$\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{283}{7} = 40.4$
$S_1^2 = \frac{\sum (X - \bar{X}_1)^2}{n_1 - 1} = \frac{27.34}{5} = 5.468$
$S_2^2 = \frac{\sum (X - \bar{X}_2)^2}{n_2 - 1} = \frac{23.72}{6} = 3.9533$
$F = \frac{S_1^2}{S_2^2} = 1.383$
The table value of F for (5,6) degrees of freedom at 5% level of significance is 4.39.
Since calculated value is less than the table value, we accept the null hypothesis.
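The variance ratio can be computed with the standard library, whose `statistics.variance` uses the divisor n - 1 as in the definition of $S^2$ above. The wrapper name `f_ratio` is an assumption for illustration; it places the larger variance in the numerator, as the note requires, and returns the matching degrees of freedom.

```python
import statistics

def f_ratio(sample_a, sample_b):
    """Return (F, (df_numerator, df_denominator)) with the larger variance on top."""
    va = statistics.variance(sample_a)   # divisor n - 1
    vb = statistics.variance(sample_b)
    if va >= vb:
        return va / vb, (len(sample_a) - 1, len(sample_b) - 1)
    return vb / va, (len(sample_b) - 1, len(sample_a) - 1)

F, dof = f_ratio([39, 41, 43, 41, 45, 39], [40, 42, 40, 44, 39, 38, 40])
print(round(F, 3), dof)
```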
2) Two independent samples of sizes 7 and 6 have the following values of the variables
Sample I 28 30 32 33 31 29 34
Sample II 29 30 30 24 27 28 -
Examine whether the samples have been drawn from normal population having the same variance?
Solution:
F=S2^2/S1^2
F=1.113
The table value of F for the degree of freedom (5,6) at 5%level of significance is 4.39
Since the calculated value of F is less than the table value, we accept the null hypothesis.
Hence, the samples have been drawn from normal population having the same variance.
3) Two random samples gave the following results.
Test whether the samples could have come from the same normal population
Soln:
Null hypothesis:
Alternative hypothesis:
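$S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = \frac{90}{9} = 10$
$S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} = \frac{108}{11} = 9.8$
$F = \frac{S_1^2}{S_2^2} = \frac{10}{9.8} = 1.02$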
Since calculated value is less than the table value, we accept the null hypothesis.
Hence the samples could have come from the same normal population.
Chi-Square Test-
1) The following table gives the number of accidents that take place in an industry during various
days of the week. Test if accidents are uniformly distributed over the week
Soln: Null hypothesis: The accidents are uniformly distributed over the week
Alternative hypothesis: The accidents are not uniformly distributed over the week
We have to test the hypothesis that the accidents are uniformly distributed over the 6 days of the
week.
On the basis of this hypothesis we should expect 84/6 = 14 accidents each day.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
O      E      (O-E)^2    (O-E)^2/E
14     14      0         0
18     14     16         16/14 = 1.142
12     14      4         4/14 = 0.285
11     14      9         9/14 = 0.642
15     14      1         1/14 = 0.071
14     14      0         0
Total                    2.14
$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.14$
Degrees of freedom = n - 1 = 6 - 1 = 5
The table value of chi-square for 5 degrees of freedom at 5% level of significance is 11.07.
Since the calculated value of $\chi^2$ is less than the table value, we accept the null hypothesis.
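The goodness-of-fit statistic is a single sum over the classes. A minimal sketch, with the hypothetical helper name `chi_square`:

```python
def chi_square(observed, expected):
    """Return sum((O - E)^2 / E) over paired observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Accidents example: 84 accidents over 6 days, 14 expected each day
observed = [14, 18, 12, 11, 15, 14]
expected = [sum(observed) / len(observed)] * len(observed)
print(round(chi_square(observed, expected), 2))
```

The same function applies to any set of expected frequencies, for instance those obtained from a theoretical ratio or from a fitted distribution.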
2) In a cross breeding experiment with plants of certain species, 240 offspring were classified into 4
classes with respect to the structure of their leaves as follows:
According to the theory of heredity, the probability of the four classes should be in the ratio 1:9:3:3.
Are these data consistent with theory.
Soln:
Null hypothesis: The given data are consistent with the theory that the frequencies in the four classes
should be in the ratio 1:9:3:3.
The expected frequencies are
$\frac{1}{16} \times 240,\ \frac{9}{16} \times 240,\ \frac{3}{16} \times 240,\ \frac{3}{16} \times 240$
i.e., 15, 135, 45, 45
Computation of $\chi^2$:
O       E       (O-E)^2    (O-E)^2/E
21      15      36         2.4
127     135     64         0.4742
40      45      25         0.5556
52      45      49         1.089
Total                      4.5188
$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.5188$
Degrees of freedom = n - 1 = 4 - 1 = 3
The table value of $\chi^2$ for 3 degrees of freedom at 5% level of significance is 7.81.
Since the calculated value of $\chi^2$ is less than the table value, we accept the null hypothesis.
Hence the given data are consistent with the theory that the frequencies in the four classes should be
in the ratio 1:9:3:3.
3) Records taken of the number of male and female births in 800 families having four children are as
follows:
No. of boys 0 1 2 3 4
No. of girls 4 3 2 1 0
No. of families 32 178 290 236 64
Test whether the data are consistent with the hypothesis that the binomial law holds and that the
chance of male birth is equal to that of a female birth, namely p= q =1/2
Soln:
Null hypothesis: The data are consistent with the binomial law of equal probability for male and female
births,
i.e., p = q = 1/2
$p(r) = {}^{n}C_r\, p^r q^{n-r} = {}^{4}C_r \left(\frac{1}{2}\right)^r \left(\frac{1}{2}\right)^{4-r} = {}^{4}C_r \left(\frac{1}{2}\right)^4$
Frequency of r male births: $f(r) = N\,p(r) = 800 \times {}^{4}C_r \left(\frac{1}{2}\right)^4 = 50 \times {}^{4}C_r$
f(0) = 50 × 4C0 = 50
f(1) = 50 × 4C1 = 200
f(2) = 50 × 4C2 = 300
f(3) = 50 × 4C3 = 200
f(4) = 50 × 4C4 = 50
Calculation of $\chi^2$:
O       E       (O-E)^2    (O-E)^2/E
32      50       324       6.48
178     200      484       2.42
290     300      100       0.3333
236     200     1296       6.48
64      50       196       3.92
Total                      19.63
$\chi^2 = \sum \frac{(O - E)^2}{E} = 19.63$
Degrees of freedom = 5 - 1 = 4
The table value of $\chi^2$ for 4 degrees of freedom at 5% level of significance is 9.49.
Since the calculated value of $\chi^2$ is greater than the table value, we reject the null hypothesis.
Hence the data are not consistent with the hypothesis that the binomial law holds with the chance
of a male birth equal to that of a female birth.
4) In the accounting department of a bank, 100 accounts are selected at random and examined for
errors. The following result has been obtained.
No. of errors: 0 1 2 3 4 5 6
No. of accounts: 35 40 19 2 0 2 2
Does the information verify that the errors are distributed according to the Poisson distribution law?
Soln:
$p(r) = \frac{e^{-m} m^r}{r!}$, where m is the mean
To get the mean, use $m = \frac{\sum fx}{\sum f}$
If the random variable X denotes the number of errors, the given distribution is as follows:
x      f      fx
0      35      0
1      40     40
2      19     38
3       2      6
4       0      0
5       2     10
6       2     12
Total  100   106
Mean $m = \frac{\sum fx}{\sum f} = \frac{106}{100} = 1.06$
To fit a Poisson distribution to the data we take the parameter of the Poisson distribution equal to the
mean,
i.e., m = 1.06
$f(x) = N\,p(x) = N\,\frac{e^{-m} m^x}{x!}$
$f(0) = 100 \times \frac{e^{-1.06}(1.06)^0}{0!} = 34.65$
$f(1) = 100 \times \frac{e^{-1.06}(1.06)^1}{1!} = 36.73$
Calculation of $\chi^2$:
O                 E                                 (O-E)^2                    (O-E)^2/E
35                34.65                             0.1225                     0.0035
40                36.73                             10.6929                    0.2911
19                19.47                             0.2209                     0.0113
2+0+2+2 = 6       6.88+1.82+0.39+0.06 = 9.15        (6 - 9.15)^2 = 9.9225      1.0844
Total                                                                          1.3903
(When an observed frequency is less than 5, we pool it with the adjacent
classes.)
$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.3903$
$\nu = n - 1 - 1 = 4 - 2 = 2$
The table value of $\chi^2$ for 2 degrees of freedom at 5% level of significance is 5.99.
Since the calculated value of $\chi^2$ is less than the table value, we accept the null hypothesis.
Hence the information verifies that the errors are distributed according to the Poisson distribution.
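The fitting step (mean from the frequency table, then expected frequencies $N e^{-m} m^x / x!$) can be sketched as below. The function names `poisson_mean` and `poisson_expected` are assumptions for illustration, and the pooling of small classes is left to the reader as in the worked table.

```python
import math

def poisson_mean(freqs):
    """Mean m = sum(f*x) / sum(f), where freqs[x] is the frequency of value x."""
    return sum(x * f for x, f in enumerate(freqs)) / sum(freqs)

def poisson_expected(m, total, max_x):
    """Expected frequencies total * e^(-m) * m^x / x! for x = 0..max_x."""
    return [total * math.exp(-m) * m ** x / math.factorial(x)
            for x in range(max_x + 1)]

accounts = [35, 40, 19, 2, 0, 2, 2]          # frequency of 0..6 errors
m = poisson_mean(accounts)
expected = poisson_expected(m, sum(accounts), 6)
print(m, [round(e, 2) for e in expected])
```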
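$\chi^2 = \sum \frac{(O - E)^2}{E}$
Expected frequency = (Row total × Column total) / Grand total
For a 2 × 2 contingency table
a   b
c   d
$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$, where N = a + b + c + d
Yates correction:
When a cell frequency is less than 5, add 0.5 to that cell and adjust the remaining cell frequencies.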
Examples:
1) The following table gives the no. of good and bad parts produced by each of the three shifts in a
factory.
Test whether or not the production of bad parts is independent of the shift on which they were
produced?
Soln:
Alternative hypothesis: The production of bad parts is not independent of the shift.
Expected frequency = (Row total × Column total) / Grand total
E(960) = (1000 × 2850)/2985 = 954.77
E(40) = (1000 × 135)/2985 = 45.23
E(940) = (990 × 2850)/2985 = 945.23
E(50) = (990 × 135)/2985 = 44.77
E(950) = (995 × 2850)/2985 = 950
E(45) = 45
Calculation of $\chi^2$
O       E         (O-E)^2    (O-E)^2/E
960     954.77    27.35      0.0286
40      45.23     27.35      0.6047
940     945.23    27.35      0.0289
50      44.77     27.35      0.6109
950     950        0         0
45      45         0         0
$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.2731$
$\nu = (3-1)(2-1) = 2$
Since the calculated value is less than the table value (5.99 at 5% level for 2 degrees of freedom), we accept the null hypothesis.
Hence the production of bad parts is independent of the shift on which they were produced.
Of 1482 persons exposed to small pox in a locality, 368 in all were affected. Of these 1482
persons, 343 were vaccinated and of these only 35 were attacked. Given chi-square at 5% level of
significance for 1 degree of freedom is 3.841.
Soln:
Expected frequency = (Row total × Column total) / Grand total
E(35) = (368 × 343)/1482 = 85.17
E(333) = (368 × 1139)/1482 = 282.83
E(308) = (1114 × 343)/1482 = 257.83
E(806) = (1114 × 1139)/1482 = 856.17
Calculation of $\chi^2$
O       E         (O-E)^2     (O-E)^2/E
35      85.17     2517.03     29.55
333     282.83    2517.03      8.90
308     257.83    2517.03      9.76
806     856.17    2517.03      2.94
Total:                        51.15
$\chi^2 = \sum \frac{(O - E)^2}{E} = 51.15$
Since the calculated value 51.15 > table value 3.841, we reject the null hypothesis.
We conclude that vaccination and attack by small pox are not independent.
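For a 2 × 2 table the closed-form expression $N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))$ gives the same value as the cell-by-cell sum. A minimal sketch, with the hypothetical name `chi_square_2x2` and without Yates correction:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] using the closed-form formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Small pox example: rows = attacked / not attacked, columns = vaccinated / not vaccinated
print(round(chi_square_2x2(35, 333, 308, 806), 2))
```

The result agrees with the tabulated total up to the rounding of the expected frequencies.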
Yates correction:
Add 0.5 to that cell and adjust the remaining cell frequencies.
3) In an experiment on the immunization of goats from anthrax. The following results were
obtained. Derive your inference on the efficiency of the vaccine.
Solution:
Since the cell (1,1) has the frequency 2, which is less than 5, we apply Yates correction.
We add 0.5 to that cell and adjust the remaining cell frequencies so that the row and column totals are unchanged.
Expected frequency = (Row total × Column total) / Grand total
E(2.5) = (12 × 8)/24 = 4
E(9.5) = (12 × 16)/24 = 8
E(5.5) = (12 × 8)/24 = 4
E(6.5) = (12 × 16)/24 = 8
Calculation of $\chi^2$
O      E     (O-E)^2    (O-E)^2/E
2.5    4     2.25       0.56
9.5    8     2.25       0.28
5.5    4     2.25       0.56
6.5    8     2.25       0.28
Total                   1.7
$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.7$
Since the calculated value 1.7 < table value (3.841 for 1 degree of freedom at 5% level), we accept the null hypothesis.