
G.B Smith Stats

Maths and statistics

Uploaded by

stellchi073
Copyright
© © All Rights Reserved

Intro.

to probability and statistics

1
THE CONCEPT OF PROBABILITY

In our everyday activities and conversations we make use of the words chance and likelihood. These words form the basis of probability, and they have the same meaning in this context.

1.1 Probability

This is the likelihood or the chance of an event occurring.

Event: An event is an outcome, or a collection of outcomes, of an experiment.

Sample Space: This is the set of all possible outcomes of an experiment; it is denoted by the letter S or Ω. For instance, in tossing a coin, the sample space is {H, T}.

In tossing a die once, the sample space is {1, 2, 3, 4, 5, 6}.

In taking an examination, the sample space for the score of a student is

S = {x: 0% ≤ x ≤ 100%}.

As can be seen from the foregoing, there are two types of sample space: discrete and continuous. The sample space of a student's examination score is a continuous sample space, while the rest are discrete sample spaces.

Sample Point: A sample point is a single outcome of an experiment. For instance, in tossing a die, 5 and 6 are sample points.

1.2 Axioms or Properties of Probability

Let S be the sample space of a given experiment; then the following are the axioms or laws of probability.

1. 0 ≤ P(A) ≤ 1,
   i.e. the probability of an event A lies between 0 and 1. The probability of an event can neither be greater than 1 nor less than 0.
2. P(S) = 1; for a discrete sample space, the probabilities of all the sample points sum to 1.
   This is also stated as "the probability of the certain event is one", i.e. P(Certain) = 1.
3. P(∅) = P(impossible) = 0,
   i.e. an event that contains no outcome of S has probability zero.
4. P(A) + P(A^c) = 1.
   Usually we write P(A) = p and P(A^c) = q, hence p + q = 1 or q = 1 − p.
   ⇒ P(A) = 1 − P(A^c),
   i.e. the probability of an event occurring is equal to one minus the probability of the event not occurring. These two events are said to be complementary.
5. If $A_1, A_2, A_3, \dots, A_n$ are pairwise mutually exclusive events, then
   $P(A_1 \cup A_2 \cup \dots \cup A_n) = P\left(\bigcup_{i=1}^{n} A_i\right) = P(A_1) + P(A_2) + \dots + P(A_n) = \sum_{i=1}^{n} P(A_i)$

1.3 Types of Events

1.3.1 Mutually Exclusive Events

Two or more events are said to be mutually exclusive if the occurrence of one excludes the occurrence of the others, i.e. they cannot occur simultaneously. For instance, there is no way a lecturer can be the Head of two departments in UNIPORT at the same time.

In particular, if A and B are mutually exclusive events, then they are disjoint, i.e. A ∩ B = ∅
Page 1 of 59
DETERMINATION and FOCUS are part of the keys to SUCCESS. Build INTEREST in what ever you do and it will be hard for failure to be your
attendant. Just know that “YOU CAN, ONLY IF YOU THINK YOU CAN” For enquiries,call GEORGE WHYTE: 08173887711 or 08165335988

⇒ P(A ∩ B) = P(∅) = 0.

1.3.2 Independent Events

Two or more events are said to be independent if the occurrence of one does not affect the probability of occurrence of the others, i.e. they can occur together or occur separately.

If A and B are independent events, then P(A ∩ B) = P(A)·P(B).

1.3.3 Equally Likely Events

Two or more events are said to be equally likely or equi-probable if they have an equal chance of occurring, i.e. given the events $A_1, A_2, A_3, \dots, A_n$, if they are equally likely (and together make up the whole sample space without overlapping), then $P(A_i) = 1/n$; $i = 1, 2, 3, \dots, n$.

1.4 Addition Law of Probability

The addition law of probability states that if A and B are two events, then the probability of A or B occurring is equal to the probability of A occurring plus the probability of B occurring minus the probability of A ∩ B occurring.

Mathematically, this is written as

P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

However, if the events are mutually exclusive, then P(A ∩ B) = 0
⇒ P(A ∪ B) = P(A) + P(B)

If the events are independent, P(A ∩ B) = P(A)·P(B)
⇒ P(A ∪ B) = P(A) + P(B) − P(A)·P(B).

Example 1.1 (Question 47, 2014/2015; Section A)

Let A and B be two events associated with an experiment such that P(A) = 0.4 and P(A ∪ B) = 0.7. Find P(B) if the events are

i. Mutually exclusive
ii. Independent

Solution

Given that P(A) = 0.4 and P(A ∪ B) = 0.7, let P(B) = x.

i. For mutually exclusive events, P(A ∪ B) = P(A) + P(B),
   i.e. 0.7 = 0.4 + x;
   x = 0.7 − 0.4 = 0.3
ii. For independent events, P(A ∪ B) = P(A) + P(B) − P(A)·P(B),
   i.e. 0.7 = 0.4 + x − 0.4x
   0.7 = 0.4 + 0.6x
   0.7 − 0.4 = 0.6x
   ⇒ x = 0.3/0.6 = 0.5

Example 1.2: (Question 13, 2011/2012 Session)

Let A and B be two events defined on a sample space S. If P(A) = 0.3 and P(B) = 0.6, obtain P(A ∪ B) given that A and B are independent events.

Solution

Since the events are independent,
P(A ∪ B) = P(A) + P(B) − P(A)·P(B),
i.e. P(A ∪ B) = 0.3 + 0.6 − 0.3 × 0.6
= 0.9 − 0.18 = 0.72.

Example 1.3

Let A and B be two events with P(A ∪ B) = 7/8, P(A ∩ B) = 1/4 and P(A^c) = 5/9. Find

i. P(A) ii. P(B) iii. P(A ∩ B^c).

Solution

i. To obtain P(A), we use the axiom P(A) + P(A^c) = 1
   ⇒ P(A) = 1 − P(A^c) = 1 − 5/9 = 4/9

ii. From the given parameters, to obtain P(B), we use the addition law
   P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

   i.e. 7/8 = 4/9 + P(B) − 1/4
   ⇒ P(B) = 7/8 − 4/9 + 1/4 = 49/72

iii. Before we answer this question, let us take note of the following relations:
   a. P(A ∩ B^c) = P(A) − P(A ∩ B)
   b. P(A^c ∩ B) = P(B) − P(A ∩ B)
   c. P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) [from De Morgan's law]
   d. P(A ∪ B^c) = P((A^c ∩ B)^c) = 1 − P(A^c ∩ B)

   Now, on to the question:
   P(A ∩ B^c) = P(A) − P(A ∩ B) = 4/9 − 1/4 = 7/36.

Example 1.4: (Question 21-22, 2014/2015; Section A)

Given that P(A) = 0.48, P(B) = 0.37 and P(A ∩ B) = 0.13, find

i. P(A ∪ B) and P(A^c ∩ B^c).
ii. P(A ∪ B^c)

Solution

i. Using the addition law, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

   Therefore, P(A ∪ B) = 0.48 + 0.37 − 0.13 = 0.72 and

   P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − 0.72 = 0.28

ii. Method 1: P(A ∪ B^c) = P(A) + P(B^c) − P(A ∩ B^c),

   where P(B^c) = 1 − P(B) = 1 − 0.37 = 0.63 and

   P(A ∩ B^c) = P(A) − P(A ∩ B) = 0.48 − 0.13 = 0.35

   ⇒ P(A ∪ B^c) = 0.48 + 0.63 − 0.35 = 0.76

   Method 2: P(A ∪ B^c) = P((A^c ∩ B)^c) = 1 − P(A^c ∩ B)
   = 1 − {P(B) − P(A ∩ B)} = 1 − {0.37 − 0.13} = 0.76.

Example 1.5: (Question 21-23, 2011/2012 Session)

If A, B and C are mutually exclusive events with P(A) = 0.2, P(B) = 0.3 and P(C) = 0.4, determine the following probabilities:

i. P(A ∩ B)
ii. P[(A ∪ B) ∩ C]
iii. P(A′ ∩ B′ ∩ C′)

Solution

i. Since the events are mutually exclusive, P(A ∩ B) = 0.
ii. From the distributive law of sets, (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
   ⇒ P[(A ∪ B) ∩ C] = P[(A ∩ C) ∪ (B ∩ C)]
   = P(A ∩ C) + P(B ∩ C)
   = 0 + 0 = 0
iii. P(A′ ∩ B′ ∩ C′) = P((A ∪ B ∪ C)′) = 1 − P(A ∪ B ∪ C)
   = 1 − {P(A) + P(B) + P(C)}
   = 1 − {0.2 + 0.3 + 0.4} = 0.1

Example 1.6: (Question 8, 2013/2014 Section)

Let X and Y be events such that P(X) = a, P(Y) = b and P(X ∩ Y) = c. Express P(X^c ∪ Y^c) in terms of a, b and c.

Solution

From De Morgan's law, P(X^c ∪ Y^c) = P((X ∩ Y)^c) = 1 − P(X ∩ Y) = 1 − c.

1.5 The Relative Frequency Definition of Probability

The relative frequency definition of probability, also termed the empirical definition of probability, defines the probability of an event as a function of its frequency of occurrence. It states that the probability of an event A is proportional to the frequency of A, i.e. P(A) ∝ n(A)

∴ P(A) = k·n(A), where k is a constant of proportionality equal to the reciprocal of n(S) for a sample space S.

In all, P(A) = n(A)/n(S).

Example 1.7

In a house of 5 boys and 7 girls, the probability of picking a girl at random is 7/(5 + 7) = 7/12 and that of picking a boy is 5/(5 + 7) = 5/12.

Example 1.8: (Question 30-32, 2014/2015; Section A)

A die is tossed 50 times. The table gives the six numbers and their frequency of occurrence.

Number      1   2   3   4   5   6
Frequency   7   9   8   7   9   10

Use the information above to answer the following:

i. Find the probability that a 4 appears.
ii. Find the probability that an odd number appears.
iii. Find the probability that a prime number appears.

Solution

i. P(a 4 appearing) = (frequency of 4)/(total frequency) = 7/50
ii. P(an odd number appearing) = P(1, 3 or 5)
   = 7/50 + 8/50 + 9/50 = 24/50 = 12/25
iii. P(a prime number appearing) = P(2, 3 or 5)
   = 9/50 + 8/50 + 9/50 = 26/50 = 13/25

Always remember that "or" implies addition (for mutually exclusive events) and "and" implies multiplication (for independent events).

Example 1.9: (Question 26, 2014/2015; Section A)

If a class has 10 boys and 4 girls, what is the ratio of the probability of picking a boy to the probability of picking a girl?

Solution

Pr(Boys) = 10/14 = 0.71 and Pr(Girls) = 4/14 = 0.29, hence the ratio of boys to girls is 0.71 : 0.29 (that is, 5 : 2).

Example 1.10

A marble is drawn at random from a box containing 10 red, 30 white, 20 blue and 15 orange marbles. Find the probability that it is

a. Orange or Red
b. Neither Red nor Blue
c. Not Blue
d. White
e. Red, White or Blue.

Solution

From the given statement, n(W) = 30, n(R) = 10, n(B) = 20 and n(O) = 15. The total number of marbles in the box is

n(S) = 30 + 10 + 20 + 15 = 75


a. P(Orange or Red) = P(Orange) + P(Red)
   = 15/75 + 10/75 = 25/75 = 1/3
b. P(Neither Red nor Blue) = 1 − P(Red or Blue)
   = 1 − {10/75 + 20/75}
   = 1 − 30/75 = 45/75 = 3/5
c. P(Not Blue) = 1 − P(Blue)
   = 1 − 20/75 = 55/75 = 11/15
d. P(White) = 30/75 = 2/5
e. P(Red, White or Blue) = P(Red) + P(White) + P(Blue)
   = 10/75 + 30/75 + 20/75
   = 60/75 = 4/5

Example 1.11: (Question 5, 2013/2014 Section)

An unbiased coin is tossed and a fair die is thrown. What is the probability of obtaining a tail and an even number?

Solution

Since the coin is unbiased, the probability of a head is the same as the probability of a tail, and they are both equal to 1/2. Also, for the die, the sample space is {1, 2, 3, 4, 5, 6}. The probability of an even number is 3/6 = 1/2, hence the probability of a tail and an even number is 1/2 · 1/2 = 1/4.

Example 1.12:

A lot consists of ten good articles, four articles with minor defects and two with major defects. One article is chosen at random. Find the probability that

i. It has no defects
ii. It has no major defects
iii. It is either good or has a major defect.

Solution

From the statement,

Number of minor defects = 4

Number of major defects = 2

Number of good articles = 10, hence the total number of articles is 16.

i. P(No defects) = P(Good article) = 10/16 = 5/8
ii. P(No major defects) = 1 − P(Major defects)
   = 1 − 2/16 = 7/8
iii. P(either good or major defects) = P(Good) + P(Major defects)
   = 10/16 + 2/16 = 12/16 = 3/4

1.6 Probability Tree

A probability tree is a tree or structure that shows the possible outcomes of an experiment and their respective probabilities of occurrence. For instance, the probability tree for the outcomes in a single toss of a fair coin is as shown below.

[Probability tree: two branches, H with probability 1/2 and T with probability 1/2]

If an experiment results in three possible outcomes A, B and C with the respective probabilities 1/6, 1/3 and 1/2, then the probability tree for this experiment is as displayed below.


[Probability tree: three branches, A with probability 1/6, B with probability 1/3 and C with probability 1/2]

The above probability trees are for experiments that are performed once. When the experiment is performed more than once, the probability tree is extended, as shown below.

In tossing a fair coin twice, the probability tree is as shown below.

[Probability tree for two tosses: the first toss branches into H and T, and each of these branches again into H and T for the second toss]

Example 1.13: (Question 2, 2008/2009 Session)

Three children are born into a family. Assuming that the probability of having a boy is the same as the probability of having a girl, construct a tree diagram for this phenomenon. Find the probability that at least one of the children is a girl.

Solution

Let the events of the birth of a boy and of a girl be B and G respectively; hence the probability tree is as shown below:

[Probability tree for three births: each birth branches into B and G, giving eight paths]

From the above, the sample space is S = {BGB, BGG, BBB, BBG, GBB, GBG, GGB, GGG}.

The probability of having at least one girl is 7/8. This is so since only BBB contains no girl; the remaining seven outcomes have at least one girl.

1.7 Conditional Probability

Given two events A and B, the conditional probability of the event A, given that B has occurred, is symbolized as

P(A/B) = P(A ∩ B)/P(B); P(B) > 0

and the conditional probability of the event B, given that A has occurred, is

P(B/A) = P(A ∩ B)/P(A); P(A) > 0

From the above,

P(A ∩ B) = P(A)·P(B/A); this is termed the multiplication law of probability.

If the events are independent, then

P(A/B) = P(A ∩ B)/P(B) = P(A)·P(B)/P(B) = P(A); similarly, P(B/A) = P(B).

Example 1.14

If P(A ∩ B) = 0.13, P(A) = 0.2 and P(B) = 0.32, find P(A/B) and P(B/A).

Solution


Using P(A/B) = P(A ∩ B)/P(B) and P(B/A) = P(A ∩ B)/P(A),

⇒ P(A/B) = 0.13/0.32 = 0.40625 and P(B/A) = 0.13/0.2 = 0.65

Example 1.15

In a certain department, 25% of the students failed mathematics, 15% failed chemistry and 10% failed both mathematics and chemistry. A student is selected at random.

i. If he failed chemistry, what is the probability that he failed mathematics?
ii. If he failed mathematics, what is the probability that he failed chemistry?
iii. What is the probability that he failed mathematics or chemistry?

Solution

From the above, we have that

P(a student failing mathematics) = P(M) = 25% = 0.25

P(a student failing chemistry) = P(C) = 15% = 0.15 and

P(a student failing both mathematics and chemistry) = P(M ∩ C) = 10% = 0.10

i. P(a student failing mathematics, given that he failed chemistry)
   = P(M/C) = P(M ∩ C)/P(C) = 0.10/0.15 = 2/3
ii. P(a student failing chemistry, given that he failed mathematics)
   = P(C/M) = P(M ∩ C)/P(M) = 0.10/0.25 = 2/5
iii. P(a student failing mathematics or chemistry)
   = P(M ∪ C)
   = P(M) + P(C) − P(M ∩ C)
   = 0.25 + 0.15 − 0.10 = 0.30

1.8 Bayes' Theorem

Bayes' theorem is a generalization of conditional probability. It states the following: suppose $A_1, A_2, A_3, \dots, A_n$ are mutually exclusive events that are collectively exhaustive, and let B be an event that occurs alongside at least one of these mutually exclusive events. Then the probability of a given event $A_k$, given that B has occurred, is

$P(A_k/B) = \frac{P(A_k \cap B)}{P(B)} = \frac{P(A_k) \cdot P(B/A_k)}{P(B)}$

where P(B) is called the Total Probability and is given as

$P(B) = \sum_{i=1}^{n} P(A_i) \cdot P(B/A_i)$

$\Rightarrow P(A_k/B) = \frac{P(A_k) \cdot P(B/A_k)}{\sum_{i=1}^{n} P(A_i) \cdot P(B/A_i)}$

Example 1.16

Box I contains 3 red and 2 blue marbles, while Box II contains 2 red and 8 blue marbles. A fair coin is tossed; if the coin turns up head, a marble is chosen from Box I, and if it turns up tail, a marble is chosen from Box II. Find the probability that Box I was chosen, given that a red marble was selected.

Solution

From the definition given, P(Choosing Box I) = P(I) = 1/2

P(Choosing Box II) = P(II) = 1/2

P(Red was selected/Box I) = P(R/I) = 3/5

P(Red was selected/Box II) = P(R/II) = 2/10 = 1/5

Now, $P(I/R) = \frac{P(I \cap R)}{P(R)} = \frac{P(I) \cdot P(R/I)}{P(I) \cdot P(R/I) + P(II) \cdot P(R/II)}$

$= \frac{\frac{1}{2} \times \frac{3}{5}}{\frac{1}{2} \times \frac{3}{5} + \frac{1}{2} \times \frac{1}{5}} = \frac{\frac{3}{10}}{\frac{3}{10} + \frac{1}{10}} = \frac{3}{4}$.

Example 1.17

We are given three urns as follows: Urn A contains 3 red and 5 white balls; Urn B contains 2 red and 1 white ball; Urn C contains 2 red and 3 white balls. An urn is selected at random and a ball is drawn from the urn. If the ball is red, what is the probability that it came from Urn A? If it is white, what is the probability that it came from Urn C?

Solution

From the statement given, we are required to obtain P(A/R) and P(C/W).

We have that $P(A/R) = \frac{P(A \cap R)}{P(R)} = \frac{P(A) \cdot P(R/A)}{P(R)}$

and $P(C/W) = \frac{P(C \cap W)}{P(W)} = \frac{P(C) \cdot P(W/C)}{P(W)}$

where P(A) = P(B) = P(C) = 1/3.

Furthermore, we have the following distribution.

             Urn A          Urn B          Urn C
n(Red)       3              2              2
n(White)     5              1              3
Total        8              3              5
P(R/urn)     P(R/A) = 3/8   P(R/B) = 2/3   P(R/C) = 2/5
P(W/urn)     P(W/A) = 5/8   P(W/B) = 1/3   P(W/C) = 3/5

The total probabilities are

P(R) = P(A)·P(R/A) + P(B)·P(R/B) + P(C)·P(R/C)

$P(R) = \frac{1}{3} \times \frac{3}{8} + \frac{1}{3} \times \frac{2}{3} + \frac{1}{3} \times \frac{2}{5} = \frac{1}{3}\left(\frac{3}{8} + \frac{2}{3} + \frac{2}{5}\right) = \frac{1}{3}\left(\frac{173}{120}\right) = \frac{173}{360}$ and

P(W) = P(A)·P(W/A) + P(B)·P(W/B) + P(C)·P(W/C)

$P(W) = \frac{1}{3} \times \frac{5}{8} + \frac{1}{3} \times \frac{1}{3} + \frac{1}{3} \times \frac{3}{5} = \frac{1}{3}\left(\frac{5}{8} + \frac{1}{3} + \frac{3}{5}\right) = \frac{1}{3}\left(\frac{187}{120}\right) = \frac{187}{360}$

$\Rightarrow P(A/R) = \frac{\frac{1}{3} \times \frac{3}{8}}{\frac{173}{360}} = \frac{45}{173}$ and $P(C/W) = \frac{\frac{1}{3} \times \frac{3}{5}}{\frac{187}{360}} = \frac{72}{187}$.

Example 1.18: (Question 19, 2005/2006 Session)

An assembler of electric fans uses motors from two sources, Company A and Company B. 90% of supplies come from Company A and 5% of these are known to be defective. Of the remaining 10% from Company B, 3% are known to be defective. An assembled fan is found to have a defective motor. What is the probability that the motor was supplied by Company B?

Solution

From the given statement,

P(A) = 90% = 0.9, P(B) = 10% = 0.1.

The probability of a defective motor from Company A is P(D/A) = 5% = 0.05 and from Company B is P(D/B) = 3% = 0.03.

The probability that the motor was supplied by Company B, given that it is defective, is

$P(B/D) = \frac{P(B) \cdot P(D/B)}{P(D)}$, where P(D) is the total probability,

i.e. P(D) = P(A)·P(D/A) + P(B)·P(D/B)

P(D) = 0.9(0.05) + 0.1(0.03) = 0.048, hence the conditional probability is

$P(B/D) = \frac{0.1(0.03)}{0.048} = 0.0625$
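
The Bayes' theorem calculations in Examples 1.16 to 1.18 all follow the same pattern: multiply each prior by its conditional probability, add the products to get the total probability, then divide. The short Python sketch below reproduces those three answers with exact fractions. It is only an illustration and not part of the original notes; the function name `posterior` and the dictionary layout are assumptions made for this sketch.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(A_k / B) for every k using Bayes' theorem.

    priors[k] is P(A_k); likelihoods[k] is P(B / A_k).
    (Hypothetical helper written for this illustration.)
    """
    # Numerators P(A_k) * P(B / A_k)
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    # Total probability P(B) = sum of the numerators
    total = sum(joint.values())
    return {k: joint[k] / total for k in joint}


third = Fraction(1, 3)

# Example 1.16: box chosen by a fair coin, B = "a red marble was selected"
box = posterior({"I": Fraction(1, 2), "II": Fraction(1, 2)},
                {"I": Fraction(3, 5), "II": Fraction(2, 10)})

# Example 1.17: urn chosen at random, B = "red" and B = "white"
urn_red = posterior({"A": third, "B": third, "C": third},
                    {"A": Fraction(3, 8), "B": Fraction(2, 3), "C": Fraction(2, 5)})
urn_white = posterior({"A": third, "B": third, "C": third},
                      {"A": Fraction(5, 8), "B": Fraction(1, 3), "C": Fraction(3, 5)})

# Example 1.18: supplier of the motor, B = "the motor is defective"
motor = posterior({"A": Fraction(9, 10), "B": Fraction(1, 10)},
                  {"A": Fraction(5, 100), "B": Fraction(3, 100)})

print(box["I"], urn_red["A"], urn_white["C"], motor["B"])
# prints: 3/4 45/173 72/187 1/16
```

Note that 1/16 = 0.0625, which agrees with the answer to Example 1.18. Exact fractions are used so that the results can be compared directly with the hand calculations.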

2
Random Variables And Probability Distributions

2.1 Random Variables

A random variable is a real-valued function whose domain is a sample space. Random variables are denoted by capital letters, while their realizations (i.e. the values that can be assumed by these random variables) are denoted by the corresponding small letters.

For instance, in tossing a die, let X be the number that appears; then X is a random variable and X = {1, 2, 3, 4, 5, 6}.

In a single toss of a coin, a head (H) or a tail (T) appears. Let X be a r.v.; then X can be defined as X = 1 if a head appears and X = 0 if a tail appears.

⇒ X = {0, 1}

If a coin is tossed twice, or two coins are tossed once, and X represents the number of tails that appear, then the sample space is S = {HH, HT, TH, TT}.

S    HH   HT   TH   TT
X    0    1    1    2

From the table above, the random variable X can assume three possible values: 0, 1 and 2.

Let X be the volume of petrol carried by a tanker of 50,000 litres capacity; then

X = {x: 0 ≤ x ≤ 50,000 litres}.

It can be seen from the above that there are two types of random variables: discrete and continuous.

A discrete random variable can only assume countable values, while a continuous random variable can assume any value in an interval, that is, as many values as there are points between two numbers on the real number line.

For instance, the number of deaths in a given year, the number of students in a class, the number of subjects offered by a given student and the number of defective items produced by a given machine in a factory are all discrete variables, while weight, height, age and volume are all continuous variables.

2.2 Probability Distribution

A probability distribution is the set of pairs (x, P(x)), where x is a realization of the random variable X and P(x) is the probability of x occurring.


Just as we have two types of random variables, so also we have two classes of probability distributions.

2.2.1 Discrete Probability Distribution

Let X be a discrete random variable. If P(x) is a function that satisfies the following properties, then it is called the probability mass function (p.m.f) of X:

1. $P(x) \ge 0 \;\; \forall x$
2. $\sum_{\forall x} P(x) = 1$
3. $P(X \in E) = \sum_{x \in E} P(x)$

2.2.2 Continuous Probability Distribution

Let X be a continuous random variable with probability density function (p.d.f) f(x); then the following are true.

1. $f(x) \ge 0 \;\; \forall x$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
4. $P(X \ge a) = \int_{a}^{\infty} f(x)\,dx$
5. $P(X \le a) = \int_{-\infty}^{a} f(x)\,dx$

Note that in the case of a continuous distribution, < and ≤ are the same, and > and ≥ are equally the same; hence

P(X > a) = P(X ≥ a) and so on. Furthermore, P(X = a) = 0.

We shall illustrate the application of these axioms with the following examples.

Example 2.1: Question 17, 2014/2015; Section A

Determine the constant p if the p.d.f of the r.v X is $P(X = x) = p(x + 2)^x$; X = 1, 2, 3.

Solution

It is worthy of note that the discrete distribution and the continuous distribution can both be called p.d.f's. In this situation, to know whether it is discrete or continuous, we use the sample space of the random variable.

In the question above, we have a discrete random variable; hence, to obtain the value of the constant, we use the property

$\sum_{x=1}^{3} P(x) = 1$

i.e. $\sum_{x=1}^{3} p(x + 2)^x = 1$

$p(1 + 2)^1 + p(2 + 2)^2 + p(3 + 2)^3 = 1$

3p + 16p + 125p = 1

144p = 1

∴ p = 1/144

Example 2.2: Question 33-35, 2014/2015; Section A

Given that $P(X = x) = k\left(\frac{1}{3}\right)^x$; X = 1, 2, 3, … is the p.d.f of a r.v X,

i. Determine the constant k
ii. Compute P(X is odd)
iii. Compute P(X is divisible by 3)

Solution

The values of the r.v are countable values, hence the distribution is discrete.

i. Using $\sum_{x=1}^{\infty} P(x) = 1$

$\Rightarrow \sum_{x=1}^{\infty} k\left(\frac{1}{3}\right)^x = k \sum_{x=1}^{\infty} \left(\frac{1}{3}\right)^x = 1$

$k\left(\frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \dots\right) = 1$

The series $\left(\frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \dots\right)$ is geometric with a first term of 1/3 and a common ratio of 1/3.

The sum to infinity of the geometric series is $S_\infty = \frac{a}{1 - r}$

$\Rightarrow \sum_{x=1}^{\infty} \left(\frac{1}{3}\right)^x = \frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \dots = \frac{1/3}{1 - 1/3} = \frac{1/3}{2/3} = \frac{1}{2}$

$k \sum_{x=1}^{\infty} \left(\frac{1}{3}\right)^x = k\left(\frac{1}{2}\right) = 1$

k = 2

ii. P(X is odd) = P(X = 1) + P(X = 3) + P(X = 5) + ⋯

$= k\left(\frac{1}{3}\right)^1 + k\left(\frac{1}{3}\right)^3 + k\left(\frac{1}{3}\right)^5 + k\left(\frac{1}{3}\right)^7 + \dots = k\left(\frac{1}{3} + \frac{1}{3^3} + \frac{1}{3^5} + \frac{1}{3^7} + \dots\right)$

The series above is geometric with a first term of 1/3 and a common ratio of $\frac{1}{3^3} \div \frac{1}{3} = \frac{1}{3^2} = \frac{1}{9}$.

Using the formula for the sum to infinity of a geometric series, $S_\infty = \frac{a}{1 - r}$,

$\Rightarrow P(X \text{ is odd}) = k\left(\frac{1/3}{1 - 1/9}\right) = 2\left(\frac{1/3}{8/9}\right) = 2\left(\frac{1}{3} \times \frac{9}{8}\right) = \frac{3}{4} = 0.75$

iii. P(X is divisible by 3) = P(X = 3) + P(X = 6) + P(X = 9) + ⋯

$= k\left(\frac{1}{3^3} + \frac{1}{3^6} + \frac{1}{3^9} + \frac{1}{3^{12}} + \dots\right)$

The first term is $\frac{1}{3^3}$ and the common ratio is $\frac{1}{3^6} \div \frac{1}{3^3} = \frac{1}{3^3} = \frac{1}{27}$

$\Rightarrow P(X \text{ is divisible by } 3) = k\left(\frac{1/27}{1 - 1/27}\right) = 2\left(\frac{1}{26}\right) = \frac{1}{13}$

Example 2.3: (Question 15, 2014/2015; Section A)

The length of time (in minutes) that a man speaks on a public phone booth is found to be a random variable with p.d.f

$f(x) = \begin{cases} k e^{-x/5}; & x > 0 \\ 0 & \text{otherwise} \end{cases}$

Find the value of k.

Solution

The interval given for the p.d.f is X > 0, hence this is a continuous case.

We use the formula $\int_{-\infty}^{\infty} f(x)\,dx = 1$; the only difference is that the limits are 0 and ∞,

i.e. $\int_{0}^{\infty} f(x)\,dx = 1$

$\int_{0}^{\infty} k e^{-x/5}\,dx = 1$

$k \int_{0}^{\infty} e^{-x/5}\,dx = k\left[\frac{e^{-x/5}}{-1/5}\right]_0^\infty = k\left[-5e^{-x/5}\right]_0^\infty = k\left(-5e^{-\infty} + 5e^{0}\right) = 1$

But $e^{-\infty} = 0$ ⇒ k(5 × 1) = 1; k = 1/5

Example 2.4: (Question 3, 2013/2014 session)

Let X be a continuous random variable with p.d.f $f(x) = kx^2$; 0 ≤ x ≤ 2, where k is a constant. Find the value of k.

Solution


This is a continuous distribution with interval 0 ≤ x ≤ 2.

$\int_{0}^{2} f(x)\,dx = 1 \Rightarrow \int_{0}^{2} kx^2\,dx = 1$

$k\left[\frac{x^3}{3}\right]_0^2 = 1$

$k\left(\frac{2^3}{3} - \frac{0}{3}\right) = 1$

$\frac{8k}{3} = 1 \Rightarrow k = \frac{3}{8}$

Example 2.5: A coin is tossed three times. If X is a random variable that represents the number of heads that appear, construct a table showing the probability distribution of X.

Solution

In a single toss, the sample space is {H, T} ⇒ in three tosses the sample space can be obtained using the Cartesian product,

i.e. {H, T} × {H, T} × {H, T}

= {HH, HT, TH, TT} × {H, T}

= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

The possible numbers of heads are 0, 1, 2 and 3, hence X = {0, 1, 2, 3}.

P(X = 0) = P(No head) = 1/8

P(X = 1) = P(One head) = 3/8

P(X = 2) = P(Two heads) = 3/8

P(X = 3) = P(Three heads) = 1/8

The distribution table is as shown below.

X       0     1     2     3
P(x)    1/8   3/8   3/8   1/8

Example 2.6: A random variable X has density function

$f(x) = \begin{cases} c e^{-3x}; & x > 0 \\ 0; & x \le 0 \end{cases}$

Find (a) c (b) P(1 < X < 2) (c) P(X ≥ 3) (d) P(X < 1).

Solution

a. Using $\int_{0}^{\infty} f(x)\,dx = 1$

$\int_{0}^{\infty} c e^{-3x}\,dx = 1$

$c\left[\frac{e^{-3x}}{-3}\right]_0^\infty = 1$

$c\left[\frac{e^{-\infty}}{-3} - \frac{e^{0}}{-3}\right] = \frac{c}{3} = 1$

c = 3

b. $P(1 < X < 2) = \int_{1}^{2} c e^{-3x}\,dx = \int_{1}^{2} 3e^{-3x}\,dx$

$= \left[\frac{3e^{-3x}}{-3}\right]_1^2 = -e^{-6} - (-e^{-3}) = -e^{-6} + e^{-3} = 0.04731$

c. $P(X \ge 3) = \int_{3}^{\infty} c e^{-3x}\,dx = \left[\frac{3e^{-3x}}{-3}\right]_3^\infty = -e^{-\infty} - (-e^{-9}) = e^{-9} = 0.0001234$

d. $P(X < 1) = \int_{0}^{1} c e^{-3x}\,dx = \left[\frac{3e^{-3x}}{-3}\right]_0^1 = -e^{-3} - (-e^{0}) = 1 - e^{-3} = 0.9502$

Example 2.7: Let X be a random variable having density function

$f(x) = \begin{cases} cx; & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

Find


(a) the value of c (b) P(1/2 < X < 3/2) (c) P(X > 1) (d) P(X = 3)

Solution

(a) Using $\int_{0}^{2} f(x)\,dx = 1$

$\Rightarrow \int_{0}^{2} cx\,dx = 1$

$\left[\frac{cx^2}{2}\right]_0^2 = 1$

$\frac{c(2)^2 - c(0)}{2} = 1$

2c = 1 ∴ c = 1/2

(b) $P(1/2 < X < 3/2) = \int_{1/2}^{3/2} cx\,dx = \left[\frac{cx^2}{2}\right]_{1/2}^{3/2} = \left[\frac{x^2}{4}\right]_{1/2}^{3/2} = \frac{(3/2)^2 - (1/2)^2}{4} = \frac{9/4 - 1/4}{4} = \frac{1}{2}$

(c) $P(X > 1) = \int_{1}^{2} cx\,dx = \left[\frac{cx^2}{2}\right]_1^2 = \left[\frac{x^2}{4}\right]_1^2 = \frac{2^2 - 1^2}{4} = \frac{3}{4}$

(d) Since the r.v is continuous, the probability that the r.v is equal to a particular value is zero,
i.e. P(X = 3) = 0

2.3 Distribution Function

Given the random variable X, the probability that X is less than or equal to a particular value x is called the Cumulative Distribution Function (C.D.F) at that point, and it is denoted as F(x).

In other words, F(x) = P(X ≤ x).

For a continuous random variable, F(x) = P(X ≤ x) = P(X < x).

The cumulative distribution function of a random variable, also called the Distribution Function of a random variable X, satisfies the following properties:

1. F(x) is non-decreasing, i.e. if x ≤ y, then F(x) ≤ F(y)
2. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
3. F(x) is continuous from the right, i.e. $\lim_{h \to 0^+} F(x + h) = F(x)$

2.3.1 Distribution Function of a Discrete Random Variable

Let X be a discrete random variable with p.m.f P(x) for $X = x_1, x_2, x_3, x_4, \dots, x_n$; then the distribution function of X is $F(x) = P(X \le x) = \sum_{x_i \le x} P(x_i)$.

The distribution function of the random variable X can be expressed in the form of a table or a piecewise function, as shown below.

$F(x) = \begin{cases} 0; & -\infty < x < x_1 \\ P(x_1); & x_1 \le x < x_2 \\ P(x_1) + P(x_2); & x_2 \le x < x_3 \\ P(x_1) + P(x_2) + P(x_3); & x_3 \le x < x_4 \\ \vdots & \vdots \\ 1; & x_n \le x < \infty \end{cases}$

Example 2.8: The probability function of a r.v X is as shown below. Obtain the distribution function of X.

X       1     2     3
P(x)    1/2   1/3   1/6

Solution

The distribution function of the probability distribution above is as shown below.

$F(x) = \begin{cases} 0; & -\infty < x < 1 \\ \frac{1}{2}; & 1 \le x < 2 \\ \frac{1}{2} + \frac{1}{3} = \frac{5}{6}; & 2 \le x < 3 \\ \frac{1}{2} + \frac{1}{3} + \frac{1}{6} = 1; & 3 \le x < \infty \end{cases}$
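
The step-by-step accumulation used in Example 2.8 can be sketched in a few lines of Python. This is only an illustrative check, not part of the original notes; the function names `cdf_table` and `cdf_at` are assumptions made for this sketch, and exact fractions are used so the values match the hand calculation.

```python
from fractions import Fraction
from itertools import accumulate


def cdf_table(pmf):
    """Return [(x, F(x))] at each jump point of a discrete distribution.

    pmf maps each value x to P(X = x). (Hypothetical helper for this sketch.)
    """
    xs = sorted(pmf)
    # Running totals P(x1), P(x1) + P(x2), ...
    cumulative = list(accumulate(pmf[x] for x in xs))
    if cumulative[-1] != 1:
        raise ValueError("the probabilities must sum to 1")
    return list(zip(xs, cumulative))


def cdf_at(table, x):
    """Evaluate the step function F(x) = P(X <= x) from a table of jumps."""
    value = Fraction(0)  # F(x) = 0 to the left of the first jump
    for point, total in table:
        if point <= x:
            value = total
    return value


# p.m.f. of Example 2.8
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
table = cdf_table(pmf)

print(table[1][1])         # F(2), prints 5/6
print(cdf_at(table, 2.5))  # still 5/6, because F is flat between jumps
print(cdf_at(table, 0))    # prints 0
```

The function `cdf_at` makes the right-continuity property visible: F(x) keeps the value reached at the last jump point at or below x.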
The cumulative distribution function of a
random variable also called the Distribution


Example 2.9: The p.m.f of a r.v X is given as $P(X = x) = k(x + 1)^3$; X = 0, 1, 2, 3, 4.

Obtain the value of k and the C.D.F of X.

Solution

From the given information, the distribution is a discrete distribution.

To obtain the constant k, we use the property

$\sum_{x=0}^{4} P(x) = 1$

$\Rightarrow \sum_{x=0}^{4} k(x + 1)^3 = 1$

$k(0 + 1)^3 + k(1 + 1)^3 + k(2 + 1)^3 + k(3 + 1)^3 + k(4 + 1)^3 = 1$

k + 8k + 27k + 64k + 125k = 1

225k = 1 ⇒ k = 1/225.

The p.m.f of the r.v X is $P(x) = \frac{1}{225}(x + 1)^3$.

X       0       1       2        3        4
P(x)    1/225   8/225   27/225   64/225   125/225

The distribution function is as shown below.

$F(x) = \begin{cases} 0; & -\infty < x < 0 \\ \frac{1}{225}; & 0 \le x < 1 \\ \frac{1}{225} + \frac{8}{225} = \frac{9}{225}; & 1 \le x < 2 \\ \frac{1}{225} + \frac{8}{225} + \frac{27}{225} = \frac{36}{225}; & 2 \le x < 3 \\ \frac{1}{225} + \frac{8}{225} + \frac{27}{225} + \frac{64}{225} = \frac{100}{225}; & 3 \le x < 4 \\ 1; & 4 \le x < \infty \end{cases}$

2.3.2 Distribution Function of Continuous Random Variables

Let f(x) be the p.d.f of a continuous r.v X; then the distribution function of X is

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.

Note: The p.d.f is the derivative of the distribution function,

i.e. $f(x) = \frac{dF(x)}{dx}$.

Example 2.10: Given that

$f(x) = \begin{cases} c(1 - x^2); & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

is the p.d.f of X, determine the value of c and the distribution function of X.

Solution

To obtain the constant, we use the property $\int_{0}^{1} f(x)\,dx = 1$,

i.e. $\int_{0}^{1} c(1 - x^2)\,dx = 1$

$c\left[x - \frac{x^3}{3}\right]_0^1 = 1$

$c\left[\left(1 - \frac{1}{3}\right) - \left(0 - \frac{0}{3}\right)\right] = 1$

$c\left(\frac{2}{3}\right) = 1 \Rightarrow c = \frac{3}{2}$

The distribution function of X is $F(x) = P(X \le x) = \int_{0}^{x} c(1 - t^2)\,dt$

$F(x) = c\left[t - \frac{t^3}{3}\right]_0^x = c\left(x - \frac{x^3}{3}\right) = \frac{3}{2}\left[\frac{3x - x^3}{3}\right] = \frac{3x - x^3}{2}$

$\therefore F(x) = \begin{cases} 0; & x < 0 \\ \frac{3x - x^3}{2}; & 0 \le x \le 1 \\ 1; & x > 1 \end{cases}$

Example 2.10: Given that the p.d.f of a r.v X is

$f(x) = \begin{cases} 6e^{-6x}; & x > 0 \\ 0 & \text{otherwise} \end{cases}$

obtain the distribution function of X.

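Solution

Given $f(x) = 6e^{-6x}$; $x > 0$

$\Rightarrow$ the distribution function of $X$ is

$$F(x) = P(X \le x) = \int_0^x 6e^{-6t}\,dt = \left[\frac{6e^{-6t}}{-6}\right]_0^x = -e^{-6x} + e^{0} = 1 - e^{-6x}, \quad x > 0,$$

and $F(x) = 0$ for $x \le 0$.

Example 2.11: If

$$F(x) = \begin{cases} 1 - 3e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

is the distribution function of a r.v. $X$, obtain the density function of $X$.

Solution

To obtain the density function, all we need to do is to differentiate the distribution function with respect to $x$.

We have it that

$$f(x) = \frac{d[F(x)]}{dx} = \frac{d[1 - 3e^{-2x}]}{dx} = 6e^{-2x}$$

$$\Rightarrow f(x) = \begin{cases} 6e^{-2x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

(As stated, $F(0) = -2$, so this $F$ is not a valid distribution function; the example illustrates only the differentiation step.)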
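The two recipes above (accumulate the p.m.f. for a discrete r.v., integrate the p.d.f. for a continuous one) can be checked numerically. The following is a minimal sketch, assuming plain Python; the helper names `discrete_cdf` and `integrate` are illustrative and do not come from the text. It reproduces $k = 1/225$ from Example 2.9 and compares a midpoint-rule integral of $\tfrac{3}{2}(1 - t^2)$ with the closed form $(3x - x^3)/2$ from Example 2.10.

```python
from fractions import Fraction

def discrete_cdf(pmf):
    """Return {x: F(x)} by accumulating the p.m.f. in increasing order of x."""
    total = Fraction(0)
    cdf = {}
    for x in sorted(pmf):
        total += pmf[x]
        cdf[x] = total
    return cdf

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Example 2.9: P(X = x) = k (x + 1)^3 for x = 0, ..., 4
weights = {x: (x + 1) ** 3 for x in range(5)}
k = Fraction(1, sum(weights.values()))      # normalising constant
pmf = {x: k * w for x, w in weights.items()}
cdf = discrete_cdf(pmf)

# Example 2.10: f(t) = (3/2)(1 - t^2) on [0, 1], evaluated at x = 0.5
x = 0.5
numeric = integrate(lambda t: 1.5 * (1 - t * t), 0.0, x)
closed = (3 * x - x ** 3) / 2

print(k, cdf)
print(numeric, closed)
```

Exact fractions are used for the discrete case so that the cumulative sums match the hand computation term by term; the continuous case is only approximated.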

3
Mathematical Expectation and Variance

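3.1 Expectation

Given the r.v. $X$, the expected value or the expectation of $X$, denoted as $E(X)$, is defined as

$$E(X) = \begin{cases} \sum_{\forall x} x\,P(x) & \text{if discrete with p.m.f. } P(x) \\ \int_{-\infty}^{\infty} x\,f(x)\,dx & \text{if continuous with p.d.f. } f(x) \end{cases}$$

In a similar fashion,

$$E(X^2) = \begin{cases} \sum_{\forall x} x^2 P(x) \\ \int_{-\infty}^{\infty} x^2 f(x)\,dx \end{cases} \qquad E(X^3) = \begin{cases} \sum_{\forall x} x^3 P(x) \\ \int_{-\infty}^{\infty} x^3 f(x)\,dx \end{cases}$$

and so on, up to

$$E(X^n) = \begin{cases} \sum_{\forall x} x^n P(x) \\ \int_{-\infty}^{\infty} x^n f(x)\,dx \end{cases}$$

In general,

$$E(g(X)) = \begin{cases} \sum_{\forall x} g(x)\,P(x) \\ \int_{-\infty}^{\infty} g(x)\,f(x)\,dx \end{cases}$$

The last expression is called the Law Of The Unconscious Statistician (LOTUS).

Remark: $E(X)$ is called the mean or the first moment about the origin.

$E(X^2)$ is called the second moment about the origin.

$E(X^r)$ is called the rth moment about the origin.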
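For a discrete r.v., the LOTUS formula above is a single weighted sum. The following is a minimal sketch, assuming a p.m.f. stored as a Python dictionary; the function name `expect` and the fair-die p.m.f. used to exercise it are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def expect(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(x) over all x (discrete LOTUS)."""
    return sum(g(x) * p for x, p in pmf.items())

# Illustrative p.m.f.: a fair die, P(x) = 1/6 for x = 1, ..., 6
die = {x: Fraction(1, 6) for x in range(1, 7)}

first_moment = expect(die)                       # E(X)
second_moment = expect(die, lambda x: x ** 2)    # E(X^2)
third_moment = expect(die, lambda x: x ** 3)     # E(X^3)

print(first_moment, second_moment, third_moment)
```

Passing a different `g` gives any other moment about the origin without changing the function.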

The second moment about the mean is called the variance of the r.v., i.e. the variance is $E(X - \mu)^2$.

Laws of Expectation

1. $E(k) = k$, where $k$ is a constant.
2. $E(X \pm Y) = E(X) \pm E(Y)$.
3. $E(aX) = aE(X)$, where $a$ is a constant.
4. $E(XY) = E(X)\,E(Y)$ if $X$ and $Y$ are independent variables; similarly, if $x_1, x_2, x_3, \ldots, x_n$ are independent variables, then $E(x_1 x_2 x_3 \cdots x_n) = E(x_1)E(x_2)E(x_3)\cdots E(x_n) = \prod_{i=1}^{n} E(x_i)$.

3.2 Variance

The variance of a random variable, denoted as $Var(X)$, is the second moment about the mean of the r.v., i.e. $Var(X) = E(X - \mu)^2 = E(X^2) - [E(X)]^2$.

Laws of variance

1. $Var(X) \ge 0$.
2. $Var(k) = 0$, where $k$ is a constant.
3. $Var(kX) = k^2\,Var(X)$.
4. $Var(X \pm Y) = Var(X) + Var(Y)$ if $X$ and $Y$ are independent.

Example 3.1: Find the expectation, the variance and the standard deviation of the r.v. $X$ having the following distribution.

(a)
X      2     3     11
P(x)   1/3   1/2   1/6

(b)
X      -5    -4    1     2
P(x)   1/4   1/8   1/2   1/8

Solution

From the given question, it's obvious that we are considering discrete random variables.

(a)
X      2     3     11
P(x)   1/3   1/2   1/6

To obtain the expectation, we use the formula $E(x) = \sum x\,p(x)$

$$\Rightarrow E(x) = 2p(2) + 3p(3) + 11p(11) = 2(\tfrac13) + 3(\tfrac12) + 11(\tfrac16) = \frac{4 + 9 + 11}{6} = 4$$

$var(x) = E(x^2) - [E(x)]^2$, where

$$E(x^2) = \sum x^2 p(x) = 2^2 p(2) + 3^2 p(3) + 11^2 p(11) = \tfrac43 + \tfrac92 + \tfrac{121}{6} = 26$$

$\Rightarrow var(x) = 26 - 4^2 = 26 - 16 = 10$ and the standard deviation is $\sqrt{var(x)} = \sqrt{10}$.

(b)
X      -5    -4    1     2
P(x)   1/4   1/8   1/2   1/8

$E(x) = \sum x\,p(x)$


$$\Rightarrow E(x) = -5p(-5) + (-4)p(-4) + 1p(1) + 2p(2) = -\tfrac54 - \tfrac48 + \tfrac12 + \tfrac28 = \frac{-10 - 4 + 4 + 2}{8} = -1$$

$E(x) = -1$

$var(x) = E(x^2) - [E(x)]^2$, where $E(x^2) = \sum x^2 p(x)$:

$$E(x^2) = (-5)^2 p(-5) + (-4)^2 p(-4) + (1)^2 p(1) + (2)^2 p(2) = \tfrac{25}{4} + \tfrac{16}{8} + \tfrac12 + \tfrac48 = \frac{50 + 16 + 4 + 4}{8} = \tfrac{74}{8} = \tfrac{37}{4}$$

$\therefore var(x) = \tfrac{37}{4} - (-1)^2 = \tfrac{37}{4} - 1 = \tfrac{33}{4}$, hence the standard deviation is $\sqrt{33/4} = \tfrac{\sqrt{33}}{2}$.

Example 3.2: Let the p.d.f. of a random variable be given as $f(x) = 6x(1 - x)$; $0 \le x \le 1$.

Find (i) $E(x)$ (ii) $E(x^2)$ (iii) $E(x - 1)^2$ (iv) $var(x)$.

Solution

The r.v. given is a continuous r.v., as can be seen from the given interval.

(i) $$E(x) = \int_0^1 x f(x)\,dx = \int_0^1 x\,6x(1 - x)\,dx = \int_0^1 (6x^2 - 6x^3)\,dx = \left[\frac{6x^3}{3} - \frac{6x^4}{4}\right]_0^1 = 2 - 1.5 = 0.5$$

(ii) $$E(x^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 (6x^3 - 6x^4)\,dx = \left[\frac{6x^4}{4} - \frac{6x^5}{5}\right]_0^1 = \frac64 - \frac65 = \frac{30 - 24}{20} = \tfrac{3}{10}$$

(iii) To evaluate $E(x - 1)^2$, we can use two different methods.

Method 1: $E(x - 1)^2 = E(x^2 - 2x + 1) = E(x^2) - 2E(x) + 1 = \tfrac{3}{10} - 2(\tfrac12) + 1 = \tfrac{3}{10}$

Method 2: integrate directly, $E(x - 1)^2 = \int_0^1 (x - 1)^2\,6x(1 - x)\,dx = 6\int_0^1 (x - 3x^2 + 3x^3 - x^4)\,dx = 6\left(\tfrac12 - 1 + \tfrac34 - \tfrac15\right) = \tfrac{3}{10}$

(iv) $var(x) = E(x^2) - [E(x)]^2 = \tfrac{3}{10} - (\tfrac12)^2 = \tfrac{1}{20}$

Example 3.3: (Question 28, 2014/2015; Section A)

Given that $E(x + 4) = 10$ and $E(x + 4)^2 = 116$, determine $E(x)$ and $var(x)$.

Solution

We have it that $E(x + 4) = 10$. From the properties of expectation, this can also be written in the form $E(x) + E(4) = 10$,

i.e. $E(x) + 4 = 10$; $E(x) = 10 - 4 = 6$.

Also, $Var(x) = E(x^2) - (E(x))^2$.

In this, we know $E(x)$ but not $E(x^2)$. In order to obtain this, let's use the equation $E(x + 4)^2 = 116$, i.e.

$E(x^2 + 8x + 16) = 116$

$E(x^2) + 8E(x) + 16 = 116$

$E(x^2) + 8(6) = 116 - 16 = 100$

$E(x^2) = 100 - 48 = 52$

$\Rightarrow var(x) = 52 - 6^2 = 16$.

Example 3.4: (Question 8, 2014/2015; Section A)

Given $f(x) = \tfrac14$; $-1 \le x \le 1$, find $E(x)$ and $var(x)$.

(a) 0 and 6   (b) 0 and 1/6   (c) 6 and 1/6   (d) 6 and 1/4

Solution

This is a continuous distribution, hence

$$E(x) = \int_{-1}^{1} x f(x)\,dx = \tfrac14 \int_{-1}^{1} x\,dx = \tfrac14\left[\frac{x^2}{2}\right]_{-1}^{1} = \tfrac14\left(\tfrac12 - \tfrac12\right) = 0$$

$\Rightarrow E(x) = 0$.

$var(x) = E(x^2) - [E(x)]^2$, where

$$E(x^2) = \int_{-1}^{1} x^2 f(x)\,dx = \tfrac14 \int_{-1}^{1} x^2\,dx = \tfrac14\left[\frac{x^3}{3}\right]_{-1}^{1} = \tfrac14\left[\tfrac13 + \tfrac13\right] = \tfrac14 \cdot \tfrac23 = \tfrac16$$

$\Rightarrow var(x) = \tfrac16 - 0 = \tfrac16$, which is option (b). (The density is taken as given; note that it integrates to $\tfrac12$ rather than 1 over $[-1, 1]$.)

Example 3.5: (Question 35, 2013/2014 Session)

Two fair coins are tossed. If $Y$ represents the number of tails, what is the expected value of $Y$?

Solution

For a fair coin, the sample space is {H, T}, hence the sample space for two coins is {H, T} × {H, T} = {HH, HT, TH, TT}.

From this, we have it that $Y \in \{0, 1, 2\}$ with $p(0) = \tfrac14$, $p(1) = \tfrac24$ and $p(2) = \tfrac14$

$$\Rightarrow E(y) = \sum_{y=0}^{2} y\,P(y) = 0p(0) + 1p(1) + 2p(2) = 0 + 1(\tfrac24) + 2(\tfrac14) = 1$$

Example 3.6: (Question 14, 2013/2014 Session)

Let $X$ be a r.v. assuming the possible values $X = \{2, 4, 6, 8\}$ and suppose that the p.m.f. is $p(2) = p(4) = 0.25$ and $p(6) = p(8) = 0.05$. Find the expected value of $X$.

Solution


This is a discrete distribution, hence we use the formula $E(x) = \sum_{\forall x} x\,p(x)$

$= 2p(2) + 4p(4) + 6p(6) + 8p(8)$

$= 2(0.25) + 4(0.25) + 6(0.05) + 8(0.05) = 2.2$

(The probabilities are used as given, although they sum to 0.6 rather than 1.)

Example 3.7: Let $X$ be a r.v. for which $E(x) = 10$ and $var(x) = 25$. For what possible values of $k$ and $d$ does $Y = kX + d$ have mean of 0 and variance of 1? For what values does it have mean of 2 and variance of 6?

Solution

Given that $E(x) = 10$, $var(x) = 25$ and $Y = kX + d$,

$E(y) = E(kx + d) = kE(x) + d$

$\Rightarrow E(y) = 10k + d \quad \ldots (1)$

Also $var(y) = var(kx + d) = var(kx) + var(d) = k^2\,var(x)$

$\Rightarrow var(y) = 25k^2 \quad \ldots (2)$

For $E(y) = 0$ and $var(y) = 1$, substituting these values into (1) and (2):

(1): $0 = 10k + d$

(2): $1 = 25k^2 \Rightarrow k^2 = \tfrac{1}{25}$; $k = \pm\tfrac15$

When $k = \tfrac15$: $0 = 10(\tfrac15) + d = 2 + d \Rightarrow d = -2$

When $k = -\tfrac15$: $0 = 10(-\tfrac15) + d = -2 + d \Rightarrow d = 2$

For $E(y) = 2$ and $var(y) = 6$, substituting these into the equations:

(1): $2 = 10k + d$

(2): $6 = 25k^2 \Rightarrow k^2 = \tfrac{6}{25}$ and $k = \pm\tfrac{\sqrt6}{5}$

Substituting these values into $2 = 10k + d$:

When $k = \tfrac{\sqrt6}{5}$: $2 = 10(\tfrac{\sqrt6}{5}) + d = 2\sqrt6 + d \Rightarrow d = 2 - 2\sqrt6 = 2(1 - \sqrt6)$

When $k = -\tfrac{\sqrt6}{5}$, the same steps give $d = 2 + 2\sqrt6 = 2(1 + \sqrt6)$.

Example 3.8: A discrete r.v. can take all possible integer values from 1 to $k$, each with probability $\tfrac1k$. Show that its mean is $\tfrac{k+1}{2}$ and its variance is $\tfrac{k^2 - 1}{12}$.

Solution

From the given statement, the r.v. can take on countable values from the set of natural numbers, i.e. $X = 1, 2, 3, 4, \ldots, k$. The probabilities are all the same, hence the outcomes are said to be equally likely; i.e. $P(1) = P(2) = \cdots = P(k) = \tfrac1k$.

Now, the mean of the r.v. is

$$E(X) = \sum_{x=1}^{k} x\,p(x) = \sum_{x=1}^{k} x\left(\tfrac1k\right) = \tfrac1k \sum_{x=1}^{k} x = \tfrac1k(1 + 2 + 3 + 4 + \cdots + k).$$

The expression in the bracket is the sum of the first $k$ natural numbers. This is given by the expression $\sum_{x=1}^{k} x = \tfrac{k(k+1)}{2}$. The mean is thus $E(X) = \tfrac1k \cdot \tfrac{k(k+1)}{2} = \tfrac{k+1}{2}$.

Now, let's compute the variance.

The variance is


$Var(X) = E(X^2) - (E(X))^2$, where

$$E(X^2) = \sum_{x=1}^{k} x^2 P(x) = \sum_{x=1}^{k} x^2\left(\tfrac1k\right) = \tfrac1k \sum_{x=1}^{k} x^2 = \tfrac1k(1^2 + 2^2 + 3^2 + \cdots + k^2).$$

The sum in the bracket is the sum of the squares of the first $k$ natural numbers, and this is given as $\sum_{x=1}^{k} x^2 = \tfrac{k(k+1)(2k+1)}{6}$, hence we have

$$E(X^2) = \tfrac1k \cdot \frac{k(k+1)(2k+1)}{6} = \frac{(k+1)(2k+1)}{6}.$$

The variance is thus

$$Var(X) = \frac{(k+1)(2k+1)}{6} - \left(\frac{k+1}{2}\right)^2 = (k+1)\left(\frac{2k+1}{6} - \frac{k+1}{4}\right) = (k+1)\left(\frac{4(2k+1) - 6(k+1)}{24}\right)$$

$$= (k+1)\left(\frac{8k + 4 - 6k - 6}{24}\right) = (k+1)\left(\frac{2k - 2}{24}\right) = \frac{(k+1)(k-1)}{12} = \frac{k^2 - 1}{12} \quad \text{(Shown)}$$

Example 3.9: (Question 2, 2013/2014 Session)

A fair die is tossed once and $X$ is the number that turns up; then $E(x)$ is what?

Solution

In tossing a die, the sample space is {1, 2, 3, 4, 5, 6}.

Since the die is fair, the events are said to be equally likely, i.e. $p(1) = p(2) = p(3) = \cdots = p(6) = \tfrac16$.

$$E(x) = \sum_{x=1}^{6} x\,p(x) = 1(\tfrac16) + 2(\tfrac16) + 3(\tfrac16) + 4(\tfrac16) + 5(\tfrac16) + 6(\tfrac16) = \tfrac{21}{6} = 3.5$$

Example 3.10: (Question 6 and 7, 2013/2014 Session)

A random variable $X$ can take only the values 1 and 2, with $p(1) = 0.8$ and $p(2) = 0.2$. Find $E(x)$ and $var(x)$.

Solution

Given $p(1) = 0.8$ and $p(2) = 0.2$, using $E(x) = \sum_{x=1}^{2} x\,p(x)$

$\Rightarrow E(x) = 1p(1) + 2p(2) = 1(0.8) + 2(0.2) = 1.2$

$var(x) = E(x^2) - [E(x)]^2$, where $E(x^2) = \sum_{x=1}^{2} x^2 p(x) = (1)^2 p(1) + 2^2 p(2) = 1(0.8) + 4(0.2) = 1.6$

$\Rightarrow var(x) = 1.6 - (1.2)^2 = 1.6 - 1.44 = 0.16$

Example 3.11: (Question 44, 2014/2015; Section A)

If the p.d.f. of a r.v. $X$ is given by $f(x) = \dfrac{1}{2\sqrt{x}}$, $0 \le x \le 1$, find $E(x)$ and $var(x)$.

Solution


Using the formula $E(x) = \int_0^1 x f(x)\,dx$

$$\Rightarrow E(x) = \int_0^1 x\left(\frac{1}{2\sqrt{x}}\right)dx = \int_0^1 \frac{\sqrt{x}}{2}\,dx = \tfrac12 \int_0^1 x^{1/2}\,dx = \tfrac12\left[\frac{x^{3/2}}{3/2}\right]_0^1 = \left[\frac{x^{3/2}}{3}\right]_0^1 = \tfrac13$$

$var(x) = E(x^2) - [E(x)]^2$, where

$$E(x^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 \frac{x^2}{2\sqrt{x}}\,dx = \tfrac12 \int_0^1 x^{3/2}\,dx = \tfrac12\left[\frac{x^{5/2}}{5/2}\right]_0^1 = \left[\frac{x^{5/2}}{5}\right]_0^1 = \tfrac15$$

$\Rightarrow var(x) = \tfrac15 - (\tfrac13)^2 = \tfrac15 - \tfrac19 = \tfrac{4}{45}$

Example 3.12: (Question 11 and 12, 2013/2014 Session)

If $f(x) = \dfrac{3x^2}{125}$; $0 < x < 5$ is the p.d.f. of a r.v. $X$, find $E(x)$ and $E(5x^2 - 4x + 3)$.

Solution

Using $E(x) = \int_0^5 x f(x)\,dx$

$$\Rightarrow E(x) = \int_0^5 x\left(\frac{3x^2}{125}\right)dx = \tfrac{3}{125}\int_0^5 x^3\,dx = \tfrac{3}{125}\left[\frac{x^4}{4}\right]_0^5 = \tfrac{3}{125}\left(\tfrac{625}{4}\right) = \tfrac{15}{4}.$$

$E(5x^2 - 4x + 3) = E(5x^2) - E(4x) + E(3) = 5E(x^2) - 4E(x) + 3 = 5E(x^2) - 4(\tfrac{15}{4}) + 3 = 5E(x^2) - 15 + 3 = 5E(x^2) - 12$,

but

$$E(x^2) = \int_0^5 x^2\left(\frac{3x^2}{125}\right)dx = \tfrac{3}{125}\int_0^5 x^4\,dx = \tfrac{3}{125}\left[\frac{x^5}{5}\right]_0^5 = \tfrac{3}{125}\left(\tfrac{3125}{5}\right) = 15$$

$\Rightarrow E(5x^2 - 4x + 3) = 5(15) - 12 = 63$.

Example 3.13: (Question 33, 2013/2014 Session)

Suppose that $X$ is a random variable with p.d.f.

$$f(x) = \begin{cases} 1 + x, & -1 \le x \le 0 \\ 1 - x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

Find $var(x)$.

Solution

$var(x) = E(x^2) - [E(x)]^2$, where

$$E(x) = \int x f(x)\,dx = \int_{-1}^{0} x(1 + x)\,dx + \int_0^1 x(1 - x)\,dx = \left[\frac{x^2}{2} + \frac{x^3}{3}\right]_{-1}^{0} + \left[\frac{x^2}{2} - \frac{x^3}{3}\right]_0^1$$

$$= \left[0 - \left(\tfrac12 - \tfrac13\right)\right] + \left[\left(\tfrac12 - \tfrac13\right) - 0\right] = -\tfrac12 + \tfrac13 + \tfrac12 - \tfrac13 = 0$$

$$E(x^2) = \int x^2 f(x)\,dx = \int_{-1}^{0} x^2(1 + x)\,dx + \int_0^1 x^2(1 - x)\,dx = \left[\frac{x^3}{3} + \frac{x^4}{4}\right]_{-1}^{0} + \left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1$$

$$= \left[0 - \left(-\tfrac13 + \tfrac14\right)\right] + \left[\left(\tfrac13 - \tfrac14\right) - 0\right] = \tfrac13 - \tfrac14 + \tfrac13 - \tfrac14 = \tfrac16$$

$\therefore var(x) = \tfrac16 - 0^2 = \tfrac16$
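The integrals in Examples 3.12 and 3.13 can be cross-checked numerically. The following is a rough sketch, assuming a simple midpoint rule is accurate enough for these smooth densities; the names `integrate`, `expectation`, `triangular_pdf` and `quadratic_pdf` are illustrative and not taken from the text.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def expectation(g, pdf, a, b):
    """E[g(X)] for a continuous r.v. whose p.d.f. is zero outside [a, b]."""
    return integrate(lambda x: g(x) * pdf(x), a, b)

def triangular_pdf(x):
    # p.d.f. of Example 3.13: 1 + x on [-1, 0] and 1 - x on [0, 1]
    return 1 + x if x <= 0 else 1 - x

def quadratic_pdf(x):
    # p.d.f. of Example 3.12: 3x^2 / 125 on (0, 5)
    return 3 * x ** 2 / 125

# Example 3.13: mean, second moment and variance
mean = expectation(lambda x: x, triangular_pdf, -1, 1)
second_moment = expectation(lambda x: x * x, triangular_pdf, -1, 1)
variance = second_moment - mean ** 2

# Example 3.12: E(5x^2 - 4x + 3)
value = expectation(lambda x: 5 * x ** 2 - 4 * x + 3, quadratic_pdf, 0, 5)

print(mean, variance, value)
```

The results should agree with the hand-computed values (mean 0, variance 1/6, and 63) up to the discretisation error of the rule.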

4
Theoretical Distributions

As discussed in the last chapter, there are two classes of probability distributions, the discrete and the continuous distributions.

In this chapter, we shall consider three types of discrete distribution and one continuous distribution, as stated below.

1. Bernoulli distribution
2. Binomial distribution
3. Poisson distribution and
4. Normal distribution.

The first three are discrete distributions while the fourth is a continuous distribution.
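The binomial and Poisson distributions listed above have closed-form p.m.f.s, stated later in this chapter, and most of the worked examples reduce to evaluating them. The following is a small sketch, assuming only Python's standard `math` module; the function names are illustrative, and the parameter values ($n = 12$, $p = 0.25$ and $\lambda = 4$) are borrowed from the chapter's examples.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * q^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson r.v. with rate lam."""
    return lam ** x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x) for a Poisson r.v. with rate lam."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))

# Binomial with n = 12, p = 0.25
exactly_four = binom_pmf(4, 12, 0.25)
at_most_three = binom_cdf(3, 12, 0.25)
at_least_four = 1 - at_most_three

# Poisson with rate 4 per minute
no_call = poisson_pmf(0, 4)
at_least_two = 1 - poisson_cdf(1, 4)

print(exactly_four, at_most_three, at_least_four)
print(no_call, at_least_two)
```

"At least" probabilities are computed through the complement, mirroring the $1 - P(X < a)$ step used in the hand solutions.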

4.2 Bernoulli Distribution

Let $X$ be a discrete random variable that is associated with only two possible outcomes; then the variable is called a Bernoulli r.v. The experiment or trial that results in only two possible values is called a Bernoulli trial or experiment. For instance, in a single toss of a coin, the outcome is either a Head (H) or a Tail (T), hence the experiment is Bernoulli. In writing an examination, a candidate would either fail or pass. In tossing a fair die, we know that there are six possible outcomes; however, this experiment could be made Bernoulli by defining the outcomes in the following ways:

1. Either an even number would occur or an odd number.
2. Either a 3 would occur or not, and so on.

Usually, we define the random variable $X$ as

$$X = \begin{cases} 1 & \text{if the trial is successful} \\ 0 & \text{if the trial is unsuccessful} \end{cases}$$

Furthermore, $P(\text{success}) = P(X = 1) = p$ and $P(\text{failure}) = P(X = 0) = q$, where $p + q = 1 \Rightarrow q = 1 - p$.

The p.m.f. of the Bernoulli r.v. $X$ is $P(x) = p^x q^{1-x}$; $x = 0$ or $1$.

The mean and the variance of the Bernoulli distribution are $E(x) = p$ and $var(x) = pq$ respectively.

Binomial Distribution

The binomial distribution is used when a Bernoulli trial is performed one or more times. It is used when a series of independent experiments results in two possible outcomes each, and we are interested in the number of successes from these trials.

Let $X$ be a binomial r.v.; then the p.m.f. of $X$ is

$$P(x) = \binom{n}{x} p^x q^{n-x}; \quad x = 0, 1, 2, \ldots, n$$

where $n$ is the sample size, $p$ is the probability of success and $\binom{n}{x} = \dfrac{n!}{(n-x)!\,x!}$.

The mean and the variance of the binomial distribution are $E(x) = np$ and $var(x) = npq$.

If $X$ follows the binomial distribution with parameters $n$ and $p$, then we symbolize this as $X \sim B(n, p)$.

Properties of the binomial distribution

1. The experiment consists of $n$ independent trials.
2. Each trial results in two possible outcomes, a success and a failure.
3. The probability of success in a single trial is $p$ and that of failure is $q$.

Example 4.1: (Question 1, 2011/2012 Session)

If $X \sim B(n, p)$ and it is given that $n = 4$ and $p = 0.1$, obtain $var(x)$.

Solution

Using the formula $var(x) = npq$, where $q = 1 - p = 1 - 0.1 = 0.9$

$\Rightarrow var(x) = 4(0.1)(0.9) = 0.36$

Example 4.2: In a family of three children, what is the probability that there will be exactly 2 boys, assuming that the sexes are equally likely to occur in each birth?

Solution

In a single birth, the child will either be a boy or a girl, and since the sexes are equally likely, $P(\text{boy}) = P(\text{girl}) = \tfrac12 = p = q$. The number of births is $n = 3$

$\Rightarrow X \sim B(3, \tfrac12)$.

Using the binomial distribution $P(X = x) = {}^{n}C_x\,p^x q^{n-x}$

$\Rightarrow P(X = 2) = {}^{3}C_2 (0.5)^2 (0.5)^{3-2} = 3(0.5)^2(0.5)^1 = 0.375$

Example 4.3: (Question 11-14, 2014/2015; Section A)

From past records, it is found that in a sample of 12 TV tubes, 3 are defective. If a sample of 12 tubes is drawn at random, what is the probability that

i. Exactly 4 are defective?
ii. None is defective?
iii. At most 3 are defective?
iv. At least 4 are defective?

Solution

From the given problem, the probability that a TV tube is defective is $\tfrac{3}{12} = \tfrac14 = 0.25$

$\Rightarrow p = 0.25$ and $q = 1 - 0.25 = 0.75$.

Using the binomial distribution $P(X = x) = {}^{n}C_x\,p^x q^{n-x}$ with $n = 12$:

i. $P(\text{exactly 4 are defective}) = P(X = 4) = {}^{12}C_4 (0.25)^4 (0.75)^{8} = 495(0.00039) = 0.1936$

ii. $P(\text{none is defective}) = P(X = 0) = {}^{12}C_0 (0.25)^0 (0.75)^{12} = 0.0317$

iii. $P(\text{at most 3 are defective}) = P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= {}^{12}C_0 (0.25)^0 (0.75)^{12} + {}^{12}C_1 (0.25)^1 (0.75)^{11} + {}^{12}C_2 (0.25)^2 (0.75)^{10} + {}^{12}C_3 (0.25)^3 (0.75)^{9}$
$= 0.0317 + 0.1267 + 0.2323 + 0.2581 = 0.6488$

iv. $P(\text{at least 4 are defective}) = P(X \ge 4) = 1 - P(X < 4)$, where $P(X < 4) = P(X \le 3) = 0.6488$ as computed in (iii)
$\Rightarrow P(X \ge 4) = 1 - 0.6488 = 0.3512$

Example 4.4: Injection of a certain dose of digitalis per unit of body weight into a large number of frogs causes the death of 35% of them. What is the expected number of deaths when this dose is injected into each of a group of ten frogs? What is the probability that the number of deaths will be

i. Exactly three?
ii. At most five?
iii. At least six?

Compute the variance.

Solution

From the question, the percentage of deaths is 35%; this is the probability that a given frog given this dose would die, i.e. $P(\text{death}) = 35\% = 0.35 = p \Rightarrow q = 1 - 0.35 = 0.65$.

The sample size is $n = 10$, hence the expected number of deaths is $np = 10 \times 0.35 = 3.5$ (about 4 frogs).

Using the formula $P(X = x) = {}^{n}C_x\,p^x q^{n-x}$

i. $P(\text{exactly 3 deaths}) = P(X = 3) = {}^{10}C_3 (0.35)^3 (0.65)^{7} = 0.25222$


ii. $P(\text{at most 5 deaths}) = P(X \le 5) = P(X = 0) + P(X = 1) + \cdots + P(X = 5)$
$= {}^{10}C_0 (0.35)^0 (0.65)^{10} + {}^{10}C_1 (0.35)^1 (0.65)^{9} + {}^{10}C_2 (0.35)^2 (0.65)^{8} + {}^{10}C_3 (0.35)^3 (0.65)^{7} + {}^{10}C_4 (0.35)^4 (0.65)^{6} + {}^{10}C_5 (0.35)^5 (0.65)^{5}$
$= 0.01346 + 0.07249 + 0.17565 + 0.25222 + 0.23767 + 0.15357 = 0.90506$

iii. $P(\text{at least 6 deaths}) = P(X \ge 6) = 1 - P(X < 6)$, where $P(X < 6) = P(X = 0) + P(X = 1) + \cdots + P(X = 5) = 0.90506$
$\Rightarrow P(X \ge 6) = 1 - 0.90506 = 0.09494$.

The variance of the distribution is $npq = 10 \times 0.35 \times 0.65 = 2.275$

Example 4.5: 12% of items produced by a machine are defective. What is the probability that out of a random sample of 20 items produced by the machine, 5 are defective?

Solution

Given $P(\text{item is defective}) = 12\% = 0.12 = p \Rightarrow q = 1 - 0.12 = 0.88$.

The sample size is $n = 20$.

Using the binomial distribution, $P(X = 5) = {}^{20}C_5 (0.12)^5 (0.88)^{15} = 15504 \times (0.12)^5 \times (0.88)^{15} \approx 0.0567$

Example 4.6: A fair coin is tossed 30 times. What is the probability of

(a) i. 3 heads appearing?
ii. No head appearing?
iii. At most 4 heads appearing?
iv. At least five heads appearing?

(b) Obtain the mean and the variance of the distribution.

Solution

Since the coin is a fair one, $P(\text{Head}) = P(\text{Tail}) = \tfrac12 = 0.5 = p = q$.

(a) Using the binomial distribution $P(X = x) = {}^{n}C_x\,p^x q^{n-x}$

i. $P(X = 3) = {}^{30}C_3 (0.5)^3 (0.5)^{27} = 0.00000378$

ii. $P(\text{no head}) = P(X = 0) = {}^{30}C_0 (0.5)^0 (0.5)^{30} = 0.00000000093$

iii. $P(\text{at most 4 heads}) = P(X \le 4) = P(X = 0) + P(X = 1) + \cdots + P(X = 4)$
$= {}^{30}C_0 (0.5)^{30} + {}^{30}C_1 (0.5)^{30} + {}^{30}C_2 (0.5)^{30} + {}^{30}C_3 (0.5)^{30} + {}^{30}C_4 (0.5)^{30} = 0.00002974$

iv. $P(\text{at least 5 heads}) = P(X \ge 5) = 1 - P(X < 5)$, where $P(X < 5) = P(X = 0) + P(X = 1) + \cdots + P(X = 4) = 0.00002974$
$\Rightarrow P(X \ge 5) = 1 - 0.00002974 = 0.99997026$

(b) The mean is $np = 30 \times 0.5 = 15$ and the variance is $npq = 30(0.5)(0.5) = 7.5$

Example 4.7: (Question 20, 2011/2012 Session)


Consider $X \sim B(10, \tfrac12)$; find $P(3 \le X \le 5)$.

Solution

This is a binomial distribution with $n = 10$ and $p = \tfrac12$

$\Rightarrow P(3 \le X \le 5) = P(X = 3) + P(X = 4) + P(X = 5)$

$= {}^{10}C_3 (0.5)^3 (0.5)^7 + {}^{10}C_4 (0.5)^4 (0.5)^6 + {}^{10}C_5 (0.5)^5 (0.5)^5 = 0.56836$

Example 4.8: (Question 8-10, 2011/2012 Session)

The probability that your call to a service line is answered in less than 30 secs. is 0.75.

Assuming that your calls are independent,

(a) If you call 10 times, what is the probability that exactly 9 of your calls are answered in less than 30 secs.?
(b) If you call 20 times, what is the probability that at least 16 calls are answered in less than 30 secs.?
(c) If you call 20 times, what is the mean number of calls that are answered in less than 30 secs.?

Solution

(a) In this, we shall use the binomial distribution, in which $p = 0.75 \Rightarrow q = 1 - 0.75 = 0.25$.

The sample size in this is $n = 10$

$\therefore P(X = 9) = {}^{10}C_9 (0.75)^9 (0.25)^1 = 0.1877$

(b) $n = 20$.

$P(X \ge 16) = P(X = 16) + P(X = 17) + \cdots + P(X = 20)$
$= {}^{20}C_{16} (0.75)^{16} (0.25)^4 + {}^{20}C_{17} (0.75)^{17} (0.25)^3 + {}^{20}C_{18} (0.75)^{18} (0.25)^2 + {}^{20}C_{19} (0.75)^{19} (0.25)^1 + {}^{20}C_{20} (0.75)^{20} (0.25)^0$
$= 0.4148$

(c) $n = 20$, hence the mean is $np = 20(0.75) = 15$

4.3 Poisson Distribution

A Poisson experiment is one in which events occur over time intervals. A Poisson random variable is a r.v. that models the number of events that occur within a given time interval. If the rate of occurrence of the event is $\lambda$, then the p.m.f. of the r.v. is

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}; \quad x = 0, 1, 2, \ldots$$

Usually, we write $X \sim p(\lambda)$, where $\lambda$ is called the parameter of the distribution.

The mean and the variance of the Poisson distribution are both equal to $\lambda$.

Examples of the Poisson distribution are the number of deaths per year, the number of misprints in every five pages of a book, the number of phone calls in every 2 minutes, etc.

Properties of the Poisson distribution

1. The numbers of events in non-overlapping time intervals are independent.
2. The probability of an event occurring in any sub-interval, or the mean rate of occurrence, remains constant through the entire time under consideration.
3. An interval can be divided into sub-intervals; this is to enable the probability of an event occurring in any one sub-interval to tend to zero.

Example 4.9: Suppose that the number of claims for missing baggage averages 6 per day.

Find the probability that on a given day, there will be

(i) No claim
(ii) Exactly 6 claims
(iii) At least 2 claims

Solution

Let $X$ be the number of claims per day; then $X \sim p(\lambda = 6)$.

Using the distribution $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$; $x = 0, 1, 2, \ldots$

(i) $P(\text{no claim}) = P(X = 0) = \dfrac{6^0 e^{-6}}{0!} = 0.00248$

(ii) $P(\text{exactly 6 claims}) = P(X = 6) = \dfrac{6^6 e^{-6}}{6!} = 0.1606$

(iii) $P(\text{at least 2 claims}) = P(X \ge 2) = 1 - P(X < 2) = 1 - \{P(X = 0) + P(X = 1)\}$
$= 1 - \left\{\dfrac{6^0 e^{-6}}{0!} + \dfrac{6^1 e^{-6}}{1!}\right\} = 1 - 7e^{-6} = 1 - 0.01735 = 0.98265$

Example 4.10: (Question 40-43, 2014/2015; Section A)

Given that the expected number of phone calls is 4 per minute, what is the probability that there will be

a. No call in one minute?
b. At least two calls in one minute?
c. One call in 45 seconds?
d. No call in an interval of 45 seconds?

Solution

From the given, the rate is $\lambda = 4$ per minute.

Using $P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$:

a. $P(\text{no call}) = P(X = 0) = \dfrac{4^0 e^{-4}}{0!} = 0.01832$

b. $P(\text{at least 2 calls}) = P(X \ge 2) = 1 - P(X < 2) = 1 - \{P(X = 0) + P(X = 1)\}$
$= 1 - \left\{\dfrac{4^0 e^{-4}}{0!} + \dfrac{4^1 e^{-4}}{1!}\right\} = 1 - 5e^{-4} = 1 - 0.0915782 = 0.908422$

c. $P(\text{one call in 45 secs})$: Since the given rate is per minute and not per 45 seconds, we need to convert the rate to per 45 secs.
The given rate is $\lambda = 4$ calls per minute, i.e. in every 1 minute we have 4 calls
$\Rightarrow$ 4 calls per 60 seconds.
Let there be $k$ calls per 45 secs
$\Rightarrow 4 \times 45 = 60 \times k$, so $k = \dfrac{4 \times 45}{60} = 3$ calls per 45 secs.
Using $\lambda = 3$ calls per 45 secs,
$P(X = 1) = \dfrac{3^1 e^{-3}}{1!} = 0.14936$

d. $P(\text{no call in an interval of 45 secs}) = P(X = 0) = \dfrac{3^0 e^{-3}}{0!} = 0.04979$

Example 4.11: (Question 3, 2008/2009 Session)

Customers arrive at a booking office at an average rate of 3 per 10 minutes. In a particular 10-minute duration, what is the probability that

a. Exactly 4 customers arrive?
b. Not more than three customers arrive?

Solution

Given that the mean rate of occurrence is $\lambda = 3$

Using the p.m.f. $P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$

a. $P(X = 4) = \dfrac{3^4 e^{-3}}{4!} = 0.16803$

b. $P(\text{not more than 3 customers}) = P(X \le 3) = P(X = 0) + P(X = 1) + \cdots + P(X = 3)$

$= \dfrac{3^0 e^{-3}}{0!} + \dfrac{3^1 e^{-3}}{1!} + \dfrac{3^2 e^{-3}}{2!} + \dfrac{3^3 e^{-3}}{3!} = e^{-3}(1 + 3 + 4.5 + 4.5) = e^{-3}(13) = 0.647232$

Example 4.12: Suppose that particles are emitted from a radioactive source and that the number of particles emitted during a one-hour period has a Poisson distribution with parameter $\lambda$.

If $P(X = 2) = \tfrac23 P(X = 1)$, evaluate $P(X > 2)$ and $P(X = 3)$.

Solution

Given that $P(X = 2) = \tfrac23 P(X = 1)$

Using $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$

$$\Rightarrow \frac{\lambda^2 e^{-\lambda}}{2!} = \tfrac23\left(\frac{\lambda^1 e^{-\lambda}}{1!}\right) \Rightarrow 3\lambda^2 e^{-\lambda} = 4\lambda e^{-\lambda} \Rightarrow 3\lambda^2 = 4\lambda \Rightarrow \lambda = \tfrac43$$

With $\lambda = \tfrac43$ (so that $e^{-\lambda} \approx 0.2636$):

$P(X = 3) = \dfrac{(4/3)^3 e^{-4/3}}{3!} \approx 0.1041$

$P(X > 2) = 1 - \{P(X = 0) + P(X = 1) + P(X = 2)\} = 1 - e^{-4/3}\left(1 + \tfrac43 + \tfrac89\right) \approx 1 - 0.8494 = 0.1506$

Example 4.13: (Question 2b, 2007/2008 Session)

Suppose there is an average of 2 suicides per 50,000 animal population. In a colony of 100,000, find the probability that in a given year, there are

i. No deaths?
ii. At least 2 deaths?

Solution

From the given, there are 2 suicides in every 50,000 animal population $\Rightarrow$ there are 4 suicides in a 100,000 animal population.

Let $\lambda = 4$.

i. $P(\text{no deaths}) = P(X = 0) = \dfrac{4^0 e^{-4}}{0!} = 0.01832$

ii. $P(\text{at least two deaths}) = P(X \ge 2) = 1 - P(X < 2) = 1 - 5e^{-4} = 0.9084218$

4.4 Poisson Distribution As A Limiting Case Of The Binomial Distribution

The Poisson distribution can be obtained as a limiting case of the binomial distribution under the following conditions:

1. $n$, the number of trials, is very large while $p$, the constant probability of success for each trial, is very small ($n > 50$, $p \approx 0$).
2. The average rate of occurrence $\lambda = np$ is finite and at most equal to five.

Example 4.14: Assuming that on average, 1% of the output in a factory making a certain part of an article is defective and that 200 units are in a package, find the probability that

1. At most 2 defectives
2. At least two defectives

may be found in a package.

Solution

From the given statement, the probability of an article being defective in a given production is $1\% \Rightarrow p = 0.01$; the number of items in each package is $n = 200$, hence the mean rate of defective production is $\lambda = np = 200 \times 0.01 = 2 < 5$.

The mean rate of occurrence is less than five, so the Poisson distribution can be used to model this question,

i.e. $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$

1. $P(\text{at most 2 defectives}) = P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$


$= \dfrac{2^0 e^{-2}}{0!} + \dfrac{2^1 e^{-2}}{1!} + \dfrac{2^2 e^{-2}}{2!} = e^{-2}(1 + 2 + 2) = 5e^{-2} = 0.67668$.

2. $P(\text{at least two defectives}) = P(X \ge 2) = 1 - P(X < 2) = 1 - \left\{\dfrac{2^0 e^{-2}}{0!} + \dfrac{2^1 e^{-2}}{1!}\right\} = 1 - 3e^{-2} = 1 - 0.4060 = 0.5940$

Example 4.15: If 3% of electric bulbs manufactured by a company are defective, find the probability that in a sample of 100 bulbs, exactly five bulbs are defective.

Solution

In this, the probability that an item produced is defective is $p = 3\% = 0.03$ and the sample size is $n = 100 \Rightarrow \lambda = np = 100(0.03) = 3 < 5$.

Using the Poisson distribution, $P(X = 5) = \dfrac{3^5 e^{-3}}{5!} = 0.10082$

4.5 Normal Distribution

The normal distribution is the most common and the most widely used continuous distribution.

Let $X$ be a continuous Normal random variable with mean $\mu$ and variance $\sigma^2$, usually written in the form $X \sim N(\mu, \sigma^2)$; then the p.d.f. of $X$ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x - \mu}{\sigma}\right)^2}; \quad -\infty < x < \infty.$$

The expression for the Normal distribution is tedious when it comes to application; for this reason a table of values is in existence. This table, however, can be used only for Standard Normal random variables.

A standard normal random variable is a normal variable with mean of 0 and variance of 1, i.e. if $Z$ is a standard Normal random variable, then $Z \sim N(0, 1)$.

Other Normal random variables, however, can be standardized. To standardize the Normal random variable $X$, we use the formula $Z = \dfrac{X - \mu}{\sigma}$. This is called the standard normal variate or the standard score.

It's worthy to note that the standard normal table is a table of c.d.f.s, hence it is defined as $P(Z \le z) = \Phi(z)$.

$P(a \le Z \le b) = \Phi(b) - \Phi(a)$ and $P(Z > a) = 1 - P(Z \le a) = 1 - \Phi(a)$.

These concepts are illustrated in the examples that follow.

Example 4.16: The mean breaking strength of a material $X$ is 20 with variance 25. The material is defective if $X \le 18$. Find the probability of this event assuming the normal distribution.

Solution

Given that $X \sim N(20, 25)$, the probability of the event is $P(X \le 18)$.

All we need to do is to standardize the event and use the standard normal table.

Using the standard variate $Z = \dfrac{X - \mu}{\sigma}$, where $\mu = 20$ and $\sigma^2 = 25 \Rightarrow \sigma = 5$:

$P(X \le 18) = P\left(Z \le \dfrac{18 - 20}{5}\right) = P(Z \le -0.4) = \Phi(-0.4)$; from the standard normal table, we have it that $P(X \le 18) = \Phi(-0.4) = 0.3446$.

Example 4.17: Suppose that the weights of 800 animals are normally distributed with mean 65 kg and variance of 25. Find the number of animals with weights

i. Between 65 kg and 70 kg
ii. At least 72 kg

Solution

From the given, let $X$ be the weight of the animal; then to obtain the probability of these events, we standardize the events and read from the standard normal table.

$\mu = 65$ and $\sigma^2 = 25 \Rightarrow \sigma = 5$.

i. $P(65 \le X \le 70) = P\left(\frac{65-65}{5} \le Z \le \frac{70-65}{5}\right) = P(0 \le Z \le 1) = \Phi(1) - \Phi(0) = 0.8413 - 0.5000 = 0.3413$
⇒ the number of animals weighing between 65 kg and 70 kg is $800 \times 0.3413 \approx 273$ animals.

ii. $P(X \ge 72) = P\left(Z \ge \frac{72-65}{5}\right) = P(Z \ge 1.4) = 1 - P(Z < 1.4) = 1 - \Phi(1.4) = 1 - 0.9192 = 0.0808$
⇒ the number of animals weighing at least 72 kg is $0.0808 \times 800 = 64.64 \approx 65$ animals.

Example 4.18: (Question 11, 2011/2012 Session)

If $X \sim N(10, 4)$, find $P(X > 13)$.

Solution

In this, $\mu = 10$ and $\sigma^2 = 4 \Rightarrow \sigma = 2$.

$P(X > 13) = P\left(Z > \frac{13 - 10}{2}\right) = P(Z > 1.5) = 1 - P(Z \le 1.5) = 1 - \Phi(1.5) = 1 - 0.9332 = 0.0668.$

Example 4.19: (Question 4b, 2007/2008 Session)

A random variable $X$ has the normal distribution with mean 8 and a standard deviation of 5. Calculate the following:

i. $P(0 \le X \le 10)$
ii. $P(X \ge 10)$
iii. $P(X \le 16)$
iv. $P(X \ge 4)$

Solution

Given that $\mu = 8$ and $\sigma = 5$.

i. $P(0 \le X \le 10) = P\left(\frac{0-8}{5} \le Z \le \frac{10-8}{5}\right) = P(-1.6 \le Z \le 0.4) = \Phi(0.4) - \Phi(-1.6) = 0.6554 - 0.0548 = 0.6006$
ii. $P(X \ge 10) = P(Z \ge 0.4) = 1 - P(Z < 0.4) = 1 - \Phi(0.4) = 1 - 0.6554 = 0.3446$
iii. $P(X \le 16) = P\left(Z \le \frac{16-8}{5}\right) = \Phi(1.6) = 0.9452$
iv. $P(X \ge 4) = P(Z \ge -0.8) = 1 - P(Z < -0.8) = 1 - \Phi(-0.8) = 1 - 0.2119 = 0.7881$
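The table look-ups in Examples 4.17 to 4.19 can be cross-checked numerically. The sketch below is only an illustration: it assumes Python's standard `statistics.NormalDist` in place of the printed table, and the helper name `normal_prob` is hypothetical, not part of the text.

```python
from statistics import NormalDist

def normal_prob(mu, sigma, lower=None, upper=None):
    """P(lower <= X <= upper) for X ~ N(mu, sigma^2), using Z = (X - mu) / sigma.

    A missing bound is treated as -infinity (lower) or +infinity (upper).
    """
    z = NormalDist()  # standard normal: mean 0, variance 1
    lo = 0.0 if lower is None else z.cdf((lower - mu) / sigma)
    hi = 1.0 if upper is None else z.cdf((upper - mu) / sigma)
    return hi - lo

# Example 4.17: 800 animals, mean 65 kg, variance 25 (sigma = 5)
p_between = normal_prob(65, 5, lower=65, upper=70)
p_at_least = normal_prob(65, 5, lower=72)
print(round(p_between, 4), round(800 * p_between))    # 0.3413 273
print(round(p_at_least, 4), round(800 * p_at_least))  # 0.0808 65

# Example 4.18: X ~ N(10, 4), so sigma = 2
print(round(normal_prob(10, 2, lower=13), 4))         # 0.0668
```

Because the function standardizes exactly as the worked solutions do, any small difference from the text comes only from rounding in the printed table.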

5
Regression Analysis and Correlation Coefficients

5.1 Regression Analysis

This is the analysis of the functional relationship that exists between two or more variables. It deals with the development of mathematical models that fit the connection between these variables.

In regression analysis, we assume a model that we suspect would represent the relation between the variables under consideration. While doing this, one has to identify two basic types of variables, namely the dependent or response variable and the independent or explanatory variable.

There are different classes of regression analysis. If there are three or more variables involved, where one is the dependent variable and the others are independent variables, then it is termed multiple regression.

If we have just two variables, one dependent and one independent, then it is termed simple regression. In this text, we shall focus on simple linear regression.

The general form of the simple linear regression is

$y = \beta_0 + \beta_1 x + \varepsilon$, where $y$ is dependent on $x$.

$\beta_0$ and $\beta_1$ are the regression coefficients and $\varepsilon$ is called the random error or disturbance term.

From the above expression, the estimated regression line is

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. In this, $\hat{\beta}_0$ and $\hat{\beta}_1$ are the parameter estimates.

The parameter estimates can be obtained using a statistical method known as the Ordinary Least Squares (OLS) method.

The estimates are

$\hat{\beta}_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.

The random error is given by the expression $\varepsilon = y - \hat{y}$.

It is worth noting that the regression line of $y$ on $x$ is also referred to as the line of best fit, the least squares line, and so on.

Example 5.1: (Question 2-5, Section A; 2014/2015)

In an experiment, various doses of a poison were given to groups of 25 mice and the following results were observed.

Doses X (mg)        4    6    8    10    12
No. of deaths Y     1    3    6     8    16

i. Obtain the means ($\bar{x}$ and $\bar{y}$).
ii. Find $\beta_0$ and $\beta_1$.
iii. Hence, what is the equation of best fit?
iv. If a dose of 16 mg of poison were given to a group of 25 mice, what is the number of deaths?

Solution

i. To obtain the means of the two variables, we use the formulae $\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$.
From the given table, $\sum x = 40$, $\sum y = 34$ and $n = 5$, hence the means are $\bar{x} = \frac{40}{5} = 8$ and $\bar{y} = \frac{34}{5} = 6.8$.

ii. To obtain the estimates of $\beta_0$ and $\beta_1$, we use the formulae
$\hat{\beta}_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
A table needs to be created for these formulae. The table is as shown below:

x      y      xy     x²
4      1       4     16
6      3      18     36
8      6      48     64
10     8      80    100
12    16     192    144

From the table, $n = 5$, $\sum x = 40$, $\sum y = 34$, $\sum xy = 342$ and $\sum x^2 = 360$, hence

$\hat{\beta}_1 = \frac{5(342) - 40(34)}{5(360) - 40^2} = 1.75$ and $\hat{\beta}_0 = \frac{34}{5} - 1.75\left(\frac{40}{5}\right) = -7.2$

iii. The equation of best fit is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

i.e. $\hat{y} = -7.2 + 1.75x$

iv. From the line obtained in iii, when $x = 16$ we have $\hat{y} = -7.2 + 1.75(16) = 20.8$.

Example 5.2: (Question 37 & 38, Section A; 2014/2015)

Given the following data

X    3    4    2    -    6
Y    8    -    5   14   17

i. Find the missing values, if $\bar{x} = 4$ and $\bar{y} = 10$.
ii. Evaluate $\beta_0$ and $\beta_1$.

Solution

i. From the table given above, let the missing values for $x$ and $y$ be $a$ and $b$ respectively.
For $x$, we have $\frac{3+4+2+a+6}{5} = 4$, i.e. $\frac{15 + a}{5} = 4$, so $15 + a = 20 \Rightarrow a = 5$.
For $y$, we have $\frac{8+b+5+14+17}{5} = 10$, i.e. $\frac{44 + b}{5} = 10$, so $b + 44 = 50 \Rightarrow b = 6$.

ii. Just as in the previous example, to obtain the estimates of $\beta_0$ and $\beta_1$ we use the formulae
$\hat{\beta}_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
The table for the required values is as shown below.

x     y     xy     x²
3     8     24      9
4     6     24     16
2     5     10      4
5    14     70     25
6    17    102     36

$\sum x = 20$, $\sum y = 50$, $\sum xy = 230$, $\sum x^2 = 90$ and $n = 5$, hence we have

$\hat{\beta}_1 = \frac{5(230) - 20(50)}{5(90) - 20^2} = 3$ and $\hat{\beta}_0 = \frac{50}{5} - 3\left(\frac{20}{5}\right) = 10 - 12 = -2$

Example 5.3

The table below shows the heights $X$ and $Y$ of a sample of 12 fathers and their oldest sons respectively.

Heights of fathers    Heights of sons
165                   173
160                   168
170                   173
183                   165
173                   175
158                   168
178                   173
168                   165
173                   180
170                   170
175                   173
180                   178


Find the least squares regression line of $y$ on $x$.

Solution

Please, do this one!!

5.2 Correlation Coefficients

The correlation coefficient is a measure of the linear relationship between two variables.

It is used to determine if an increase in one variable would result in an increase or a decrease in another.

If an increase in one results in an increase in the other, then we say that the variables are positively correlated.

If an increase in one results in a decrease in the other or vice versa, then the variables are said to be negatively correlated.

The correlation coefficient between two variables, denoted as $r$, lies between $-1$ and $1$, i.e. $-1 \le r \le 1$.

The following are different degrees of correlation:

1. If $r = 1$, then the variables are said to have a perfect positive correlation, i.e. both will increase or decrease at the same rate.
2. If $r = -1$, then the variables are said to have a perfect negative correlation, i.e. as one increases, the other decreases at the same rate and vice versa.
3. If $r = 0$, then the variables are said not to be correlated, or to have a zero correlation.
4. If $r > 0.5$, then the variables are said to have a strong positive correlation.
5. If $0 < r < 0.5$, then the variables are said to have a weak positive correlation.
6. If $-0.5 < r < 0$, then the variables are said to have a weak negative correlation.
7. If $r < -0.5$, then the variables are said to have a strong negative correlation.
8. If $r \approx 0.5$, then they have a moderate positive correlation, and it is a moderate negative correlation if $r \approx -0.5$.

There are different methods that are used to determine the correlation coefficient between two variables. In this text, we shall be considering two methods; these are

1. The Pearson's Product Moment Correlation Coefficient and
2. The Spearman's Rank Correlation Coefficient.

5.2.1 Pearson's Product Moment Correlation Coefficient

Given the two variables $X$ and $Y$, the PPMCC is

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$

Example 5.4

The data below shows the scores of students in mathematics (X) and economics (Y).

Maths (x)     70   49   58   12   67   23   12
Econs. (y)    46   78   12   78   78   12   67

Obtain the PPMCC and interpret the result.

Solution

Using the formula $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$

The table for the required values is as shown below:

x      y      xy      x²      y²
70     46     3220    4900    2116
49     78     3822    2401    6084
58     12      696    3364     144
12     78      936     144    6084
67     78     5226    4489    6084

23     12      276     529     144
12     67      804     144    4489

$n = 7$, $\sum x = 291$, $\sum y = 371$, $\sum xy = 14980$, $\sum x^2 = 15971$ and $\sum y^2 = 25145$.

Substituting these into the formula, we have

$r = \frac{7(14980) - 291(371)}{\sqrt{(7(15971) - 291^2)(7(25145) - 371^2)}} = -\frac{3101}{32257.54771} = -0.09613$

Since $-0.5 < r < 0$, there is a very weak negative correlation between a student's score in mathematics and economics.

5.2.2 The Spearman's Rank Correlation Coefficient

Unlike the PPMCC, the rank correlation coefficient can also be used to determine the correlation coefficient for ordinal variables.

Given the variables $X$ and $Y$, the Rank Correlation Coefficient is

$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$

where $n$ is the number of pairs,

$d = R_x - R_y$, and

$R_x$ and $R_y$ are the ranks of $x$ and $y$ respectively in increasing or decreasing order of magnitude.

To obtain the ranks of a set of values, we identify the smallest value and give it a rank of 1, the next smallest value is given a rank of 2, and so on.

However, if there exists a tie, then the rank of the values having the tie is the same; this is as illustrated in the examples below.

Example 5.5: (Question 10, 2013/2014)

Calculate the Spearman's rank correlation of the set of data below, hence interpret the result.

X    2    4    7    8
Y    3    7   12    7

Solution

The formula for the rank correlation is $r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$.

Below is the completed table for the required parameters.

Note that the rank is taken in increasing order of magnitude, and there is a tie in the rank of $y$. The 7's in the column of $y$ share the same rank of 2, i.e. they occupy the 2nd and the 3rd positions. This is the reason 12 occupies the 4th position.

x     y     Rx    Ry    d     d²
2     3     1     1     0     0
4     7     2     2     0     0
7    12     3     4    -1     1
8     7     4     2     2     4

From the table above, $\sum d_i^2 = 5$ and $n = 4$, hence

$r = 1 - \frac{6(5)}{4(4^2 - 1)} = 1 - \frac{1}{2} = 0.5$

The variables have a moderate positive correlation.

Example 5.6


Obtain the Spearman's rank correlation coefficient for the data given below.

X    2    4    7    1    6    8    2
Y    4    7    1    6    7    5    6

Solution

Below is the table for the required parameters:

x     y     Rx    Ry    d     d²
2     4     2     2     0     0
4     7     4     6    -2     4
7     1     6     1     5    25
1     6     1     4    -3     9
6     7     5     6    -1     1
8     5     7     3     4    16
2     6     2     4    -2     4

From the table, we have $n = 7$ and $\sum d^2 = 59$.

The correlation coefficient is

$r = 1 - \frac{6(59)}{7(7^2 - 1)} = 1 - 1.0536 = -0.0536.$

5.3 Coefficient of Determination

The coefficient of determination for a set of pairs of data, also called the goodness of fit, is the measure that determines the amount of variation in the dependent variable that can be explained by the independent variable. It is the square of the correlation coefficient between the two variables, i.e. the coefficient of determination is $R^2$ and it lies between 0 and 1,

i.e. $0 \le R^2 \le 1$ or $0\% \le R^2 \le 100\%$.

Example 5.7

Determine the PPMCC and the coefficient of determination of the data below, hence interpret the results.

x    y
3    6
6   12
2    8
4    9
8    7
6   10
2    5

Solution

Using the formula $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$.

The completed table is as shown below.

x     y     xy     x²     y²
3     6     18      9     36
6    12     72     36    144
2     8     16      4     64
4     9     36     16     81
8     7     56     64     49
6    10     60     36    100


2     5     10      4     25

$n = 7$, $\sum x = 31$, $\sum y = 57$, $\sum xy = 268$, $\sum x^2 = 169$ and $\sum y^2 = 499$.

$r = \frac{7(268) - 31(57)}{\sqrt{(7(169) - 31^2)(7(499) - 57^2)}} = 0.47.$

This means that the correlation between $x$ and $y$ is weak and positive.

The coefficient of determination is $r^2 = 0.47^2 = 0.2209 = 22.1\%$,

i.e. only 22.1% of the variation in $y$ can be explained by the explanatory variable $x$.

Note: In trying to determine the coefficient of determination, if the method is not specified, use the method of PPMCC.
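The regression and PPMCC formulas of this chapter are all built from the same sums ($n$, $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$, $\sum y^2$), so they can be cross-checked with one short script. This is a minimal sketch in plain Python; the function names `column_sums`, `least_squares_line` and `pearson_r` are hypothetical and chosen only for illustration.

```python
from math import sqrt

def column_sums(xs, ys):
    """Return n, sum x, sum y, sum xy, sum x^2, sum y^2 (the table totals)."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * y for x, y in zip(xs, ys)),
            sum(x * x for x in xs),
            sum(y * y for y in ys))

def least_squares_line(xs, ys):
    """Estimates (b0, b1) of the line y = b0 + b1 * x by ordinary least squares."""
    n, sx, sy, sxy, sxx, _ = column_sums(xs, ys)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n  # b0 = y-bar - b1 * x-bar
    return b0, b1

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n, sx, sy, sxy, sxx, syy = column_sums(xs, ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Example 5.1: doses (mg) and number of deaths
b0, b1 = least_squares_line([4, 6, 8, 10, 12], [1, 3, 6, 8, 16])
print(round(b0, 2), round(b1, 2))   # -7.2 1.75
print(round(b0 + b1 * 16, 1))       # predicted deaths at x = 16: 20.8

# Example 5.7: correlation coefficient, rounded to two decimals as in the text
r = pearson_r([3, 6, 2, 4, 8, 6, 2], [6, 12, 8, 9, 7, 10, 5])
print(round(r, 2))                  # 0.47
```

The same functions can be pointed at the father and son heights of Example 5.3 to check a hand calculation.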

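The rank correlation of Examples 5.5 and 5.6 can be sketched the same way. The code below follows the tie convention used in those examples, where tied values share the lowest position they occupy and the following rank is skipped. This is an assumption carried over from the worked tables; many references instead give tied values the average of their positions, which would change the result. The names `lowest_position_ranks` and `spearman_rank` are hypothetical.

```python
def lowest_position_ranks(values):
    """Rank in increasing order; tied values share the lowest position they occupy."""
    ordered = sorted(values)
    # list.index returns the first position of a value, so ties get the same rank
    return [ordered.index(v) + 1 for v in values]

def spearman_rank(xs, ys):
    """r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d = Rx - Ry."""
    n = len(xs)
    rx = lowest_position_ranks(xs)
    ry = lowest_position_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Example 5.5
print(lowest_position_ranks([3, 7, 12, 7]))           # [1, 2, 4, 2]
print(spearman_rank([2, 4, 7, 8], [3, 7, 12, 7]))     # 0.5

# Example 5.6
print(round(spearman_rank([2, 4, 7, 1, 6, 8, 2], [4, 7, 1, 6, 7, 5, 6]), 4))  # -0.0536
```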
6
Construction of Confidence Interval

6.1 Introduction

Statistics can be broadly classified into Descriptive and Inferential [Inductive] statistics.

Descriptive statistics describes the parameters of populations and samples; this includes the mean, variance, median, kurtosis, skewness, proportion, etc. In inferential statistics, our main concern is a population that may not be reachable; here we study results obtained from samples and use these findings to infer on the population parameter(s) (mean, proportion, variance, etc.), hence the name "Inferential Statistics."

There are two parts in inferential statistics; these are (i) Parameter Estimation and (ii) Test of Hypothesis.

Parameter: A parameter is a population measure or characteristic, e.g. mean, variance, proportion, etc.

Estimate: Most times, the population may not be known or easily obtained due to factors like time, finance, etc. In such a situation, results would be obtained from samples. The results obtained from samples are called estimates of the population parameter, or sample statistics.

The following are distinctive symbols used for parameters and corresponding estimates.

              Population parameter    Sample statistic or estimate
Mean          $\mu$                   $\bar{x}$


Variance      $\sigma^2$              $S^2$
Proportion    $P$                     $p$

More generally, a parameter estimate may be denoted by placing a "cap or caret" above the population parameter,

i.e. $\hat{\mu}$ is the estimate of $\mu$,

$\hat{\sigma}^2$ is the estimate of $\sigma^2$,

$\hat{P}$ is the estimate of $P$, and so on.

There are two types of estimates, namely: point estimate and interval estimate.

Point Estimate: A point estimate is a single value for a parameter estimate, e.g. $\bar{x} = 34.8$ kg and $\hat{\sigma}^2 = 0.5$. A point estimate alone is not a good choice in parameter estimation. It is better to create an interval for the population parameter, because the true value of the population parameter may differ slightly from that given by the point estimate. Interval estimates accommodate this slight deviation.

Interval Estimate: An interval estimate is an interval within which a researcher can say, with some degree of confidence, that a given population parameter lies. For instance, a researcher can construct a confidence interval for the population mean age of students in a class and assert that he has 95% confidence that the population mean lies in the interval [19 yrs, 25 yrs]; this can also be written as $19\text{ yrs} \le \mu \le 25\text{ yrs}$.

Level of Significance ($\alpha$): This is the maximum percentage of error that a researcher or an analyst is willing to make. Usually, $\alpha = 1\%$ for medical research and $5\%$ for other research purposes.

Confidence Level ($1 - \alpha$): This is the percentage of confidence that a researcher has.

Generally, confidence level + level of significance $= 100\% = 1$.

6.2 Construction of confidence interval

To construct a confidence interval, one needs to obtain the relevant sample statistic, the level of significance and a critical value. In this text, we shall consider the confidence interval for a single population mean and for the difference in population means.

6.2.1 Confidence Interval For Single Population Mean

To construct a confidence interval for a single population mean using a sample of size $n$, we shall consider different cases.

Case 1: When the sample size is large and the population variance is known, i.e. $n \ge 30$ and $\sigma^2$ is known.

The confidence interval for this is $C.I = \bar{x} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$.

Case 2: When the sample size is small and the population variance is unknown, i.e. $n < 30$ and $\sigma^2$ is unknown.

The $C.I = \bar{x} \pm t_{\alpha/2,(n-1)} \cdot \frac{S}{\sqrt{n}}$, where $n - 1$ is the degree of freedom (d.f).

Case 3: When the sample size is large and the population variance is unknown, i.e. $n \ge 30$ and $\sigma^2$ is unknown. In this, use the sample variance in place of the population variance.

$\Rightarrow C.I = \bar{x} \pm Z_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$

In general, whenever the population variance is unknown, use the sample variance; for a large sample size we use the $Z$ distribution, and for a small sample size the $t$ distribution (with $n - 1$ d.f).

Example 6.1: Measurements of the diameter of a random sample of 200 ball bearings made by a certain machine during one week showed a

mean of 1.07 millimetres and a variance of 0.25. Find a 99% confidence interval for the mean diameter of the ball bearings.

Solution

From the given, the sample size is $n = 200 \ge 30$.

The sample mean and variance are $\bar{x} = 1.07$ and $S^2 = 0.25 \Rightarrow S = 0.5$.

This falls in the 3rd category. Using $C.I = \bar{x} \pm Z_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$, where $\alpha = 1 - 99\% = 1 - 0.99 = 0.01$

$\Rightarrow \alpha/2 = \frac{0.01}{2} = 0.005.$

$Z_{0.005} = 2.576$

$C.I = 1.07 \pm 2.576\left(\frac{0.5}{\sqrt{200}}\right) = 1.07 \pm 0.0911$

$\Rightarrow$ Lower Limit (LL) $= 1.07 - 0.0911 = 0.9789$ and

Upper Limit (UL) $= 1.07 + 0.0911 = 1.1611$

Therefore, the confidence interval is $\mu \in [0.9789, 1.1611]$, which can also be written in the form

$0.9789 \le \mu \le 1.1611.$

Example 6.2: A zoologist is interested in the mean length of cuckoo eggs laid in a nest belonging to the meadow pipit. A sample of 17 eggs yielded the following data: $\bar{x} = 22.3$ mm and $S = 2.059$ mm. Calculate a 90% confidence interval for the population mean.

Solution

The sample size is small ($n = 17 < 30$), hence we are to use the $t$ distribution.

Using the formula $C.I = \bar{x} \pm t_{\alpha/2,(n-1)} \cdot \frac{S}{\sqrt{n}}$,

where $\alpha = 1 - 90\% = 1 - 0.90 = 0.10$

$\Rightarrow \alpha/2 = \frac{0.10}{2} = 0.05$ and $d.f = 17 - 1 = 16$

From the $t$ table, $t_{0.05}(16) = 1.746$

$\therefore C.I = 22.3 \pm 1.746\left(\frac{2.059}{\sqrt{17}}\right) = 22.3 \pm 0.8719$

$\Rightarrow$ Lower Limit (LL) $= 22.3 - 0.8719 = 21.4281$ and

Upper Limit (UL) $= 22.3 + 0.8719 = 23.1719$

Therefore, the confidence interval is $\mu \in [21.4281\text{ mm}, 23.1719\text{ mm}]$.

Example 6.3: A chemical plant produces a synthetic building material. At random, a sample is drawn to check on the hardness of the material. The results of the sample are 59.3, 71.1, 53.6, 58.7, 51.2, 53.6, 54.6, 50.3, 58.9, 43.3 and 56.2. Find a 95% C.I for the mean hardness of the material and interpret your result.

Solution

From the given, we need the sample mean and standard deviation.

Using $\bar{x} = \frac{\sum x}{n} = \frac{59.3 + 71.1 + 53.6 + 58.7 + \cdots + 56.2}{11} = 55.53$ and

$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(59.3 - 55.53)^2 + (71.1 - 55.53)^2 + \cdots + (56.2 - 55.53)^2}{11 - 1} = 48.2482 \Rightarrow S = 6.946$

The sample size is $n = 11 < 30$, hence we use the $t$ distribution.


The required confidence interval is $C.I = \bar{x} \pm t_{\alpha/2,(n-1)} \cdot \frac{S}{\sqrt{n}}$, where $\alpha = 1 - 95\% = 1 - 0.95 = 0.05$

$\Rightarrow \alpha/2 = \frac{0.05}{2} = 0.025$ and $d.f = 11 - 1 = 10$. $t_{0.025}(10) = 2.228$.

$C.I = 55.53 \pm 2.228\left(\frac{6.946}{\sqrt{11}}\right) = 55.53 \pm 4.666$

Therefore, we are 95% confident that the mean hardness of the material lies between 50.864 and 60.196.

6.2.2 Confidence Interval for Difference in Population Means ($\mu_1$, $\mu_2$)

Let $(\bar{x}_1, \bar{x}_2)$ and $(\sigma_1^2, \sigma_2^2)$ be the sample means and the population variances of two populations. A confidence interval can be constructed for the difference in population means. Just like that for a single mean, we have different cases for the difference in population means.

Case 1: The sample sizes are large and the population variances are known, i.e. $n_1, n_2 \ge 30$ and $(\sigma_1^2, \sigma_2^2)$ are known.

The confidence interval for the difference in population means $\mu_1 - \mu_2$ is $(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Case 2: The sample sizes are small and the population variances are unknown, i.e. $n_1, n_2 < 30$ and $(\sigma_1^2, \sigma_2^2)$ are unknown.

In this, we use $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,(n_1+n_2-2)} \cdot \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$, where $n_1 + n_2 - 2 = d.f$

Case 3: The sample sizes are small and the population variances are unknown, i.e. $n_1, n_2 < 30$ and $(\sigma_1^2, \sigma_2^2)$ are unknown but estimated to be equal ($\sigma_1^2 = \sigma_2^2$).

In this, we use $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,(n_1+n_2-2)} \cdot S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $S_p^2$ is called the pooled variance and it is such that $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

Example 6.4: Assume the weight losses (in gms) of 12 rabbits fed on diet A were 12.8, 14.8, 13.5, 16.0, 18.1, 15.1, 11.4, 15.2, 10.2, 14.6, 15.5, 17.1 while the losses in 10 rabbits fed on diet B were 9.3, 11.5, 14.6, 10.7, 13.2, 12.6, 16.7, 10.6, 13.1, 12.7. If these are independent observations drawn from two normal populations, construct a 90% C.I for the difference in population mean weight loss and interpret your results.

Solution

We need to obtain the sample mean and variance of both populations.

For diet A:

$\bar{x}_A = \frac{\sum x_A}{n_A} = \frac{12.8 + 14.8 + 13.5 + 16.0 + \cdots + 17.1}{12} = 14.5$ and

$S_A^2 = \frac{\sum (x_A - \bar{x}_A)^2}{n_A - 1} = \frac{(12.8 - 14.5)^2 + (14.8 - 14.5)^2 + (13.5 - 14.5)^2 + \cdots + (17.1 - 14.5)^2}{12 - 1}$

$S_A^2 = 4.701$

For diet B:

$\bar{x}_B = \frac{\sum x_B}{n_B} = \frac{9.3 + 11.5 + 14.6 + 10.7 + \cdots + 12.7}{10} = 12.5$ and

$S_B^2 = \frac{\sum (x_B - \bar{x}_B)^2}{n_B - 1} = \frac{(9.3 - 12.5)^2 + (11.5 - 12.5)^2 + (14.6 - 12.5)^2 + \cdots + (12.7 - 12.5)^2}{10 - 1}$

$S_B^2 = 4.56$

Using $(\bar{x}_A - \bar{x}_B) \pm t_{\alpha/2,(n_A+n_B-2)} \cdot \sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}$, where $\alpha = 1 - 90\% = 0.10$


$\Rightarrow \alpha/2 = \frac{0.10}{2} = 0.05$, $d.f = 12 + 10 - 2 = 20$ and $t_{0.05,20} = 1.725$

$C.I = (14.5 - 12.5) \pm 1.725\sqrt{\frac{4.701}{12} + \frac{4.56}{10}} = 2 \pm 1.5883$

$LL = 0.4117$ and $UL = 3.5883$

$\therefore \mu_A - \mu_B \in [0.4117, 3.5883]$. We are 90% confident that the difference in the mean weight loss of the rabbits between the two diets lies between 0.4117 and 3.5883.

Example 6.5: Two similar groups of patients, A and B, consist of 50 and 100 individuals respectively. The first was given a new type of sleeping pill and the second was given a conventional type. For patients in group A, the mean number of hours of sleep was 7.82 with a standard deviation of 0.24 hrs. For patients in group B, the mean number of hours was 6.75 with a standard deviation of 0.30 hrs. Find 99% confidence limits for the difference in the mean number of hours of sleep induced by the two sleeping pills.

Solution

The parameters given are as shown below

Group A                     Group B
$n_A = 50$                  $n_B = 100$
$\bar{x}_A = 7.82$ hrs      $\bar{x}_B = 6.75$ hrs
$S_A = 0.24$ hrs            $S_B = 0.30$ hrs

The sample sizes are large, hence we are to use the $Z$ distribution.

Using $(\bar{x}_A - \bar{x}_B) \pm Z_{\alpha/2} \cdot \sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}$, where $\alpha = 1 - 99\% = 0.01$

$\Rightarrow \alpha/2 = \frac{0.01}{2} = 0.005$ and $Z_{0.005} = 2.576$.

$C.I = (7.82 - 6.75) \pm 2.576\sqrt{\frac{0.24^2}{50} + \frac{0.30^2}{100}} = 1.07 \pm 0.11669.$

The confidence interval is $[0.95331, 1.18669]$.
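The single-mean intervals of Section 6.2.1 all have the form "estimate ± critical value × standard error", so one small function covers the three cases. The sketch below is illustrative only: `mean_interval` is a hypothetical name, and the critical values are passed in as read from the $Z$ or $t$ table, since the Python standard library has no $t$ table.

```python
from math import sqrt

def mean_interval(xbar, s, n, critical):
    """Return (lower, upper) for xbar ± critical * s / sqrt(n).

    `s` is the population or sample standard deviation, and `critical` is
    Z_{alpha/2} or t_{alpha/2, n-1} taken from the appropriate table.
    """
    margin = critical * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 6.1 as solved (n = 200, S = 0.5, 99% level, Z_{0.005} = 2.576)
low, high = mean_interval(1.07, 0.5, 200, 2.576)
print(round(low, 4), round(high, 4))    # 0.9789 1.1611

# Example 6.2 (n = 17, S = 2.059, 90% level, t_{0.05}(16) = 1.746)
low, high = mean_interval(22.3, 2.059, 17, 1.746)
print(round(low, 4), round(high, 4))    # 21.4281 23.1719
```

Keeping the critical value as an argument mirrors the hand method: the only decision that changes between the cases is which table the value is read from.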

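For the difference in two means (Section 6.2.2), the unpooled and pooled forms can be sketched as below. The function names are hypothetical, the inputs are the summary statistics quoted in Examples 6.4 and 6.5, and the critical values are again taken from the tables rather than computed.

```python
from math import sqrt

def diff_interval(x1, x2, var1, var2, n1, n2, critical):
    """(x1 - x2) ± critical * sqrt(var1/n1 + var2/n2)  (Cases 1 and 2)."""
    margin = critical * sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

def pooled_variance(var1, var2, n1, n2):
    """S_p^2 = ((n1 - 1) S1^2 + (n2 - 1) S2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def diff_interval_pooled(x1, x2, var1, var2, n1, n2, critical):
    """(x1 - x2) ± critical * S_p * sqrt(1/n1 + 1/n2)  (Case 3)."""
    sp = sqrt(pooled_variance(var1, var2, n1, n2))
    margin = critical * sp * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Example 6.4 with the quoted statistics (t_{0.05, 20} = 1.725)
low, high = diff_interval(14.5, 12.5, 4.701, 4.56, 12, 10, 1.725)
print(round(low, 4), round(high, 4))    # 0.4117 3.5883

# Example 6.5 (Z_{0.005} = 2.576); variances are the squared standard deviations
low, high = diff_interval(7.82, 6.75, 0.24 ** 2, 0.30 ** 2, 50, 100, 2.576)
print(round(low, 5), round(high, 5))    # 0.95331 1.18669
```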
7
Test of Hypothesis

It is not uncommon for one to make statements about a given phenomenon based on personal belief or experience. In most cases, one person's experience may differ significantly from some other person's, hence the different beliefs and different statements about the phenomenon.

In statistics, it is neither sufficient nor good practice for conclusions to be made about situations (populations) based on personal experience; before such generalizations can be

statistically accepted, they must be subjected to statistical analysis.

7.1 Hypothesis

A hypothesis is a statement about some population parameter(s) that needs to be subjected to a statistical test. There are generally two types of hypothesis, namely the null and the alternative hypothesis.

Null hypothesis ($H_0$ or $H_n$): This is the hypothesis that is tested for possible rejection, under the assumption that it is true.

Alternative hypothesis ($H_1$ or $H_a$): Any hypothesis that contradicts the null hypothesis is called the alternative hypothesis. The alternative hypothesis complements the null hypothesis. The null and alternative hypotheses for a given statistical analysis always come in pairs.

e.g. $H_0: \mu = \mu_0$

vs

i. $H_1: \mu \ne \mu_0$

ii. $H_1: \mu > \mu_0$ or

iii. $H_1: \mu < \mu_0$

Only one of the alternative hypotheses is used for the given analysis.

In the set of alternative hypotheses stated above, i. is called a two-tailed alternative while ii. and iii. are called one-tailed alternative hypotheses.

ii. is one-tailed to the right and iii. is one-tailed to the left.

After the analysis, based on the given evidence, the null hypothesis or the alternative hypothesis would be accepted (or rejected). However, wrong decisions are likely to be made; these decisions are called statistical errors.

7.2 Types of Errors

Generally, there are two types of statistical errors involved in hypothesis testing.

Type I Error: When a true null hypothesis is rejected, a Type I error has been committed.

Type II Error: A Type II error is committed when a false null hypothesis is accepted.

The probability of committing a Type I error is called the level of significance of the test and is denoted by $\alpha$.

From conditional probability, $P(\text{rejecting } H_0 \mid H_0 \text{ is true}) + P(\text{accepting } H_0 \mid H_0 \text{ is true}) = 1$

$\Rightarrow P(\text{accepting } H_0 \mid H_0 \text{ is true}) = 1 - P(\text{rejecting } H_0 \mid H_0 \text{ is true})$

$P(\text{accepting } H_0 \mid H_0 \text{ is true}) = 1 - \alpha$, and this is called the level of confidence of the test.

Also, the probability of committing a Type II error is denoted by $\beta$.

From conditional probability, $P(\text{rejecting } H_0 \mid H_0 \text{ is false}) + P(\text{accepting } H_0 \mid H_0 \text{ is false}) = 1$

$\Rightarrow P(\text{rejecting } H_0 \mid H_0 \text{ is false}) = 1 - P(\text{accepting } H_0 \mid H_0 \text{ is false})$

$P(\text{rejecting } H_0 \mid H_0 \text{ is false}) = 1 - \beta$, and this is called the power of the test.

Test statistic: Test statistics are formulae or methods that are used in testing hypotheses. In this chapter, we shall be considering the $Z$ and the Student's $t$ statistics.

The $Z$ statistic is used for tests of significance with large sample sizes, while the $t$ statistic is used for small sample sizes.

Critical value: This is the value of the test statistic that separates the rejection region from the acceptance region of the statistic.
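Critical values for the $Z$ statistic are normally read from the standard normal table, but they can also be reproduced numerically. The sketch below assumes Python's standard `statistics.NormalDist`; the helper name `z_critical` is hypothetical. It uses $\alpha/2$ in the tail for a two-tailed test and $\alpha$ for a one-tailed test.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical value of Z: the tail area is alpha/2 (two-tailed) or alpha (one-tailed)."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.05), 2))                     # 1.96  (two-tailed, 5%)
print(round(z_critical(0.05, two_tailed=False), 3))   # 1.645 (one-tailed, 5%)
print(round(z_critical(0.01), 3))                     # 2.576 (two-tailed, 1%)
```

Critical values of the $t$ statistic depend on the degrees of freedom and are not available in the standard library, so they are still taken from the $t$ table.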

Rejection region: This is the set of values of the test statistic that leads to the rejection of the null hypothesis; it is also called the critical region.

Acceptance region: This is the set of values of the test statistic that leads to the acceptance of the null hypothesis.

Decision Rule: This is a rule that guides the researcher as to whether to reject or accept the null hypothesis.

Generally, in a test of significance, we reject the null hypothesis if

|calculated statistic| > tabulated statistic. In particular, for

i. a two-tailed test, the decision rule is "reject $H_0$ if $|Z_{cal.}| > Z_{\alpha/2}$"
ii. a one-tailed test, we reject $H_0$ if
$Z_{cal.} > Z_{\alpha}$ (one-tailed to the right)
$Z_{cal.} < -Z_{\alpha}$ (one-tailed to the left)

Note: in a two-tailed test we use $\alpha/2$, and $\alpha$ for one-tailed tests.

The same principles are applied for the $t$ test.

7.3 Test About a Single Population Mean

We shall consider the test under the following cases.

Case 1: When the sample size is large ($n \ge 30$) and the population variance ($\sigma^2$) is known.

Since the sample size is large, the $Z$ statistic is used,

i.e. $Z = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} \sim N(0,1)$

Case 2: When the sample size is small ($n < 30$) and the population variance ($\sigma^2$) is unknown.

Since the sample size is small, the $t$ statistic is used,

i.e. $t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{S} \sim t_{n-1}$

Case 3: When the sample size is large ($n \ge 30$) and the population variance ($\sigma^2$) is unknown.

Since the sample size is large, the $Z$ statistic is used,

i.e. $Z = \frac{(\bar{x} - \mu_0)\sqrt{n}}{S} \sim N(0,1)$

In all these cases, the hypotheses are

$H_0: \mu = \mu_0$

vs

i. $H_1: \mu \ne \mu_0$

ii. $H_1: \mu > \mu_0$ or

iii. $H_1: \mu < \mu_0$

Example 7.1: (Question 18-19, Section A; 2014/2015)

Assume that the mean age of a certain population is 17 yrs. A random sample of 25 people drawn from this population shows a mean of 18.1 yrs and a sample variance of 16.

a. What is the calculated test statistic?
b. Is the population mean greater than what is specified, based on the sample observation, at $\alpha = 5\%$?

Solution

From the given statement, $\mu_0 = 17$ yrs, $n = 25$, $\bar{x} = 18.1$ and $S = \sqrt{16} = 4$.

The hypotheses are

$H_0: \mu = 17$

$H_1: \mu > 17$

Since the sample size is small, we use the $t$ statistic.

$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{S} \sim t_{n-1}$


$t = \frac{(18.1 - 17)\sqrt{25}}{4} = 1.375$

This is a one-tailed test to the right.

$\alpha = 5\% = 0.05$ and $d.f = n - 1 = 25 - 1 = 24$

The critical value is $t_{0.05}(24) = 1.711$

Decision rule

Reject $H_0$ if $t_{cal.} > t_{\alpha}$.

Now, $t_{cal.} = 1.375 \not> 1.711$, $\therefore$ we accept the null hypothesis and conclude that the mean age is equal to that specified.

Example 7.2: (Question 45-46, Section A; 2014/2015)

A certain type of cable is supposed to have a breaking strength of at least 1500 kg. A sample of 50 cables suggests a mean breaking strength of 1490 kg with a standard deviation of 31 kg.

a. What is the calculated test statistic?
b. Do we accept the hypothesis at $\alpha = 5\%$?

Solution

a. Given $n = 50$, $\bar{x} = 1490$, $\mu_0 = 1500$ and $S = 31$.
Using $Z = \frac{(\bar{x} - \mu_0)\sqrt{n}}{S}$,
i.e. $Z = \frac{(1490 - 1500)\sqrt{50}}{31} = -2.2810$

b. This is a two-tailed test, hence the hypotheses are
$H_0: \mu = 1500$
$H_1: \mu \ne 1500$
Using the 5% level of significance, $Z_{crit.} = 1.96$.
Since $|-2.2810| = 2.2810 > 1.96$, we reject the null hypothesis.

Example 7.3: (Question 6, Section A; 2008/2009)

In an attempt to investigate the mean birth weight (in kg) of babies in a particular locality, a random sample of 70 babies was taken, with a sample mean of 2.89 and a sample standard deviation of 4.0. Can we conclude at the 5% level of significance that the mean birth weight of babies is 3.2 kg?

Solution

Given $\bar{x} = 2.89$, $S = 4$, $n = 70 \ge 30$ and $\mu_0 = 3.2$

Hypotheses

$H_0: \mu = 3.2$ kg
$H_1: \mu \ne 3.2$ kg

Using the test statistic $Z = \frac{(\bar{x} - \mu_0)\sqrt{n}}{S}$

$\Rightarrow Z = \frac{(2.89 - 3.2)\sqrt{70}}{4} = -0.6484$

This is a two-tailed test, $Z_{\alpha/2} = Z_{0.025} = 1.96$

Since $|-0.6484| = 0.6484 \not> 1.96$, we accept the null hypothesis and conclude that the mean weight is 3.2 kg.

Example 7.4: (Question 2(b), 2005/2006 Session)

An electrical engineer proposes that the time to repair a particular electronic instrument exceeds 48 hrs. 20 such instruments requiring repair were randomly chosen, with repair times as follows:

39   50   51   48   54
48   54   53   49   50
46   43   49   50   49
60   54   52   53   52

Using the 5% level of significance, do the data present sufficient evidence to believe his proposal?


Solution

From the given data, we need to obtain the sample mean and the sample variance, and hence the standard deviation.

The sample mean is $\bar{x} = \frac{\sum x}{n} = \frac{39 + 50 + 51 + 48 + \dots + 53 + 52}{20} = \frac{1004}{20} = 50.2$ and

$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(39 - 50.2)^2 + (50 - 50.2)^2 + (51 - 50.2)^2 + (48 - 50.2)^2 + \dots + (52 - 50.2)^2}{20 - 1} = \frac{371.2}{19} = 19.54$

$\Rightarrow S = 4.420$

Hypotheses

$H_0: \mu = 48$ hrs
$H_1: \mu > 48$ hrs

Since $n = 20 < 30$, we use the $t$ statistic

$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{S} = \frac{(50.2 - 48)\sqrt{20}}{4.420} = 2.226$

$d.f = 20 - 1 = 19$ and $\alpha = 5\% = 0.05$

This is a one-tailed test, hence $t_{\alpha} = t_{0.05}(19) = 1.729$.

Decision rule

Reject the null hypothesis if $t_{cal} > t_{tab}$.

Now, $t_{cal} = 2.226 > t_{tab} = 1.729$, hence we reject the null hypothesis and conclude that the mean repair time of the instrument exceeds 48 hrs, i.e. the data present sufficient evidence to support the engineer's proposal.

7.4 Hypothesis about Equality of two Population Means

Here we wish to test the hypothesis that two population means are the same. If $(\mu_1, \mu_2)$ and $(\sigma_1^2, \sigma_2^2)$ are the mean and variance vectors of the two populations, then the hypotheses are

$H_0: \mu_1 = \mu_2$

$H_1: \mu_1 \neq \mu_2$ (two-tailed), or $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$ (one-tailed)

Test Of Significance For Independent Samples

Just like the case of a single mean, we shall consider different cases.

Case 1: When the sample sizes are large and the population variances are known.

Since the sample sizes are large, we make use of the $Z$ statistic,

i.e. $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$

Case 2: When the sample sizes are small and the population variances are unknown.

Since the sample sizes are small, we make use of the $t$ statistic,

i.e. $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t_{(n_1 + n_2 - 2)}$

Case 3: When the sample sizes are small and the population variances are unknown but assumed to be equal.

Since the sample sizes are small, we make use of the $t$ statistic, but with the pooled sample variance,

i.e. $t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{(n_1 + n_2 - 2)}$, where $S_p^2$ is the pooled variance, calculated as $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

Example 7.5: (Question 48-50, Section A, 2014/2015)


Ten soldiers visited their shooting range for two weeks running. For the first week, their scores were 67, 24, 57, 55, 63, 54, 56, 68, 33 and 43. For the second week, their scores were 70, 38, 58, 58, 56, 67, 68, 77, 42 and 38. Is there any significant difference between their performances in the two weeks? (Assume the population variances are equal.) Take the level of significance as $\alpha = 5\%$.

Solution

We need to obtain the mean and the variance for each of the two weeks.

For the first week, we have the following:

$\bar{x}_1 = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{67 + 24 + 57 + 55 + \dots + 43}{10} = 52$ and

$S_1^2 = \frac{\sum_{i=1}^{10} (x_i - \bar{x}_1)^2}{n - 1} = \frac{(67 - 52)^2 + (24 - 52)^2 + (57 - 52)^2 + \dots + (43 - 52)^2}{9} = 209.11$

For week two, the mean and variance are

$\bar{x}_2 = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{70 + 38 + 58 + 58 + \dots + 38}{10} = 57.2$ and

$S_2^2 = \frac{\sum_{i=1}^{10} (x_i - \bar{x}_2)^2}{n - 1} = \frac{(70 - 57.2)^2 + (38 - 57.2)^2 + (58 - 57.2)^2 + \dots + (38 - 57.2)^2}{9} = 193.3$

The hypotheses are

$H_0$: There's no significant difference between their performances.

$H_1$: There's a significant difference between their performances.

Since the sample sizes are small and the population variances are unknown but assumed to be equal, the test statistic to be used is

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{(n_1 + n_2 - 2)}$

where the pooled variance is

$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1) \times 209.11 + (10 - 1) \times 193.3}{10 + 10 - 2} = 201.2$

$\Rightarrow S_p = 14.185$

$t = \frac{52 - 57.2}{14.185\sqrt{\frac{1}{10} + \frac{1}{10}}} = -0.8197$

This is a two-tailed test, hence the critical value is read at $\alpha/2$ with $10 + 10 - 2 = 18$ degrees of freedom, which gives $t_{0.025}(18) = 2.101$.

Since $|-0.8197| = 0.8197 < 2.101$, we accept the null hypothesis and conclude that there is no significant difference in their performance.

Example 7.6:

Suppose we are interested in the mean number of hours worked by blue-collar workers and white-collar workers. A random sample of 100 blue-collar workers has a mean of 42.4 hrs/week with a known population standard deviation of 3 hrs/week. A random sample of 140 white-collar workers has a mean of 39.8 hrs/week with a known population standard deviation of 5.4 hrs/week. Test the hypothesis of no significant difference between the two groups at $\alpha = 5\%$.

Solution


White-collar workers          Blue-collar workers
$\bar{x}_1 = 39.8$ hrs/week   $\bar{x}_2 = 42.4$ hrs/week
$n_1 = 140$                   $n_2 = 100$
$\sigma_1 = 5.4$ hrs          $\sigma_2 = 3$ hrs

Since the sample sizes are large, we use the $Z$ statistic,

i.e. $Z = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$

$Z = \frac{42.4 - 39.8}{\sqrt{\frac{5.4^2}{140} + \frac{3^2}{100}}} = 4.761$

$Z_{\alpha/2} = Z_{0.025} = 1.96$

Since $Z_{cal} = 4.761 > Z_{tab} = 1.96$, we reject the null hypothesis and conclude that there's a significant difference between the mean hours worked by white-collar and blue-collar workers.

Example 7.7:

The manufacturer of sinus relief tablet B claimed that their tablets work faster than tablet A. A sample of 120 doses of A showed that it took 84.6 mins on average to take effect; the population variance is known to be 35.4. Meanwhile, a sample of 80 doses of B showed a mean relief time of 88.3 mins with a variance of 49. Is their claim valid at $\alpha = 5\%$?

Solution

Tablet A                      Tablet B
$\bar{x}_A = 84.6$ mins       $\bar{x}_B = 88.3$ mins
$n_A = 120$                   $n_B = 80$
$\sigma_A^2 = 35.4$           $\sigma_B^2 = 49$

Hypotheses

$H_0: \mu_A = \mu_B$

$H_1: \mu_A > \mu_B$ (tablet B works faster, i.e. takes less time on average)

This is a one-tailed test to the right.

Using $Z = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}}$,

i.e. $Z = \frac{84.6 - 88.3}{\sqrt{\frac{35.4}{120} + \frac{49}{80}}} = -3.884$

$Z_{\alpha} = Z_{0.05} = 1.645$

Since $Z = -3.884 \ngtr 1.645$, we accept the null hypothesis and conclude that the data do not support the claim that tablet B works faster than tablet A. (In the samples, tablet B in fact took longer on average.)

7.5 Paired Sample $t$-test (Dependent samples)

We consider two populations which can be said to be dependent; here the observations of one sample depend on those of the other. Usually, dependent samples come in the form of "before" and "after" measurements.

Let $(x_i, y_i)$ be the $i$th pair from the sample; then the test statistic is

$t = \frac{\bar{d}\sqrt{n}}{S_d} \sim t_{(n-1)}$, where $\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$ and $S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}$ for $d_i = x_i - y_i$

Example 7.8: Eight men suffering from obesity were offered a diet capable of reducing body weight. Their weights obtained before and after the diet are as follows:

Before   78.6   76.6   74.5   80.8   81.6   86.2   76.2   78.4
After    76.5   75.2   74.0   79.8   80.3   85.7   74.7   76.8


Do the data provide sufficient evidence to indicate that the diet is capable of reducing body weights at $\alpha = 1\%$?

Solution

If the diet reduces body weight, then the mean weight before the diet must be greater than that after the diet, that is $\mu_B > \mu_A$ (B = before, A = after). With $d_i$ = before $-$ after, this is a one-tailed test to the right, hence we have the following hypotheses.

$H_0: \mu_B = \mu_A$

$H_1: \mu_B > \mu_A$

The test statistic is

$t = \frac{\bar{d}\sqrt{n}}{S_d} \sim t_{(n-1)}$, where $\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$ and $S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}$ for $d_i = x_i - y_i$

The modified table is as shown below:

Before   78.6   76.6   74.5   80.8   81.6   86.2   76.2   78.4
After    76.5   75.2   74.0   79.8   80.3   85.7   74.7   76.8
d        2.1    1.4    0.5    1.0    1.3    0.5    1.5    1.6

From the above table, $\sum_{i=1}^{n} d_i = 9.9$, hence $\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} = \frac{9.9}{8} = 1.2375$ and the variance is

$S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1} = \frac{(2.1 - 1.24)^2 + (1.4 - 1.24)^2 + (0.5 - 1.24)^2 + \dots + (1.6 - 1.24)^2}{8 - 1}$

$S_d^2 = 0.303 \Rightarrow S_d = \sqrt{0.303} = 0.5505$

The calculated statistic is $t = \frac{1.24\sqrt{8}}{0.5505} = 6.37$

Since it is a one-tailed test, the tabulated statistic is $t_{0.01}$ at 7 degrees of freedom, and this value is 2.998. Since $6.37 > 2.998$, we reject the null hypothesis and conclude that the diet is capable of reducing body weight.

Example 7.9

The memory capacity of 10 students was tested before and after training. State whether the training was effective or not from the following scores.

Before   12   14   11   8   7   10   3    0   5   6
After    15   16   10   7   5   12   10   2   3   8

Solution

Hypotheses

$H_0$: The training is not effective ($\mu_A = \mu_B$)

$H_1$: The training is effective ($\mu_B < \mu_A$)

With $d_i$ = before $-$ after, this is a one-tailed test to the left.

The test statistic is $t = \frac{\bar{d}\sqrt{n}}{S_d} \sim t_{(n-1)}$

Before   12   14   11   8   7   10   3    0    5   6
After    15   16   10   7   5   12   10   2    3   8
$d_i$    -3   -2   1    1   2   -2   -7   -2   2   -2

$\bar{d} = -\frac{12}{10} = -1.2$

$S_d^2 = \frac{(-3 + 1.2)^2 + (-2 + 1.2)^2 + \dots + (-2 + 1.2)^2}{10 - 1} = 7.73 \Rightarrow S_d = 2.781$

$t = \frac{-1.2\sqrt{10}}{2.781} = -1.3645$

Taking $\alpha = 5\%$, $t_{crit} = t_{0.05}(9) = 1.833$

Since $|-1.3645| = 1.3645 \ngtr 1.833$, we accept the null hypothesis and conclude that the training has no significant effect on the students.
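The test statistics used in this chapter can be checked with a few lines of code. The following is only a minimal sketch in Python (standard library only); the function names `sample_mean_var`, `one_sample_stat`, `pooled_t` and `paired_t` are illustrative choices, not part of these notes. Critical values still have to be read from the $Z$ and $t$ tables.

```python
import math

def sample_mean_var(xs):
    """Return the sample mean and the sample variance (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var

def one_sample_stat(x_bar, mu0, s, n):
    """Single-mean statistic (x_bar - mu0) * sqrt(n) / s (Z or t, depending on n)."""
    return (x_bar - mu0) * math.sqrt(n) / s

def pooled_t(xs, ys):
    """Two independent samples, equal variances assumed (Case 3)."""
    n1, n2 = len(xs), len(ys)
    m1, v1 = sample_mean_var(xs)
    m2, v2 = sample_mean_var(ys)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def paired_t(xs, ys):
    """Paired (dependent) samples: t = d_bar * sqrt(n) / S_d with d = x - y."""
    d = [x - y for x, y in zip(xs, ys)]
    d_bar, var_d = sample_mean_var(d)
    return d_bar * math.sqrt(len(d)) / math.sqrt(var_d)

# Data taken from Examples 7.2, 7.5 and 7.9 of this chapter.
z_cable = one_sample_stat(1490, 1500, 31, 50)
week1 = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
week2 = [70, 38, 58, 58, 56, 67, 68, 77, 42, 38]
t_soldiers = pooled_t(week1, week2)
before = [12, 14, 11, 8, 7, 10, 3, 0, 5, 6]
after = [15, 16, 10, 7, 5, 12, 10, 2, 3, 8]
t_memory = paired_t(before, after)

print(round(z_cable, 4), round(t_soldiers, 4), round(t_memory, 4))
```

Each function mirrors one formula from the chapter, so a hand calculation can be compared against it step by step.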

8
Chi-Square Test

The chi-square test is used in testing observations that are frequencies of categorical data, displayed on a table known as a contingency table.

Contingency Table

This is a table that displays the observed frequencies from a given research or set of data.

An $m \times n$ contingency table is as shown below.

         1      2      3      ...   n      Total
1        o_11   o_12   o_13   ...   o_1n   R_1
2        o_21   o_22   o_23   ...   o_2n   R_2
3        o_31   o_32   o_33   ...   o_3n   R_3
.        .      .      .      ...   .      .
.        .      .      .      ...   .      .
m        o_m1   o_m2   o_m3   ...   o_mn   R_m
Total    C_1    C_2    C_3    ...   C_n    N

Where $o_{ij}$ is the observed frequency; $i = 1, 2, 3, \dots, m$ and $j = 1, 2, 3, \dots, n$

$R_i$ is the total of the $i$th row

$C_j$ is the total of the $j$th column

$N$ is the grand total, i.e. $N = \sum_{i=1}^{m} R_i = \sum_{j=1}^{n} C_j$

The chi-square test statistic is $\chi^2 = \sum_{j=1}^{n} \sum_{i=1}^{m} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2_{(m-1)(n-1)}$

where $e_{ij}$ is the expected or theoretical frequency of the observation in the $(i,j)$th cell; its value is obtained as $e_{ij} = \frac{R_i \times C_j}{N}$, and $(m - 1)(n - 1)$ is the degree of freedom.

8.1 Application of the chi-square statistic

The $\chi^2$ statistic can be used for the following tests:

1. To test for independence of attributes
2. To test if two or more samples are from the same population
3. To test for goodness of fit

8.2 Test of Independence

In using the $\chi^2$ statistic to test for independence of two attributes or factors, we work with the following hypotheses:

$H_0$: The variables or attributes are independent

$H_1$: The variables or attributes are not independent

Decision Rule

Reject $H_0$ if $\chi^2_{cal} > \chi^2_{tab}$.

Example 8.1

A psychologist classified 1725 school children according to intelligence and family income level. This is shown in the table below.

                       Intelligence
Income                 Dull   Intelligent   Average
Well clothed           181    322           133
Averagely clothed      141    457           153
Poorly clothed         127    163           48

Test for independence at $\alpha = 1\%$.

Solution

The hypotheses are

$H_0$: Intelligence is independent of family income.

$H_1$: Intelligence is not independent of family income.

Adding the row and column totals, the complete table is as shown below.

                       Intelligence
Income                 Dull   Intelligent   Average   Total
Well clothed           181    322           133       636
Averagely clothed      141    457           153       751
Poorly clothed         127    163           48        338
Total                  449    942           334       1725

From the table above, we need to obtain the expected frequencies using the formula

$e_{ij} = \frac{R_i \times C_j}{N}$

$\Rightarrow e_{11} = \frac{R_1 \times C_1}{N} = \frac{636 \times 449}{1725} = 165.54$

$e_{12} = \frac{R_1 \times C_2}{N} = \frac{636 \times 942}{1725} = 347.31$

$e_{13} = \frac{R_1 \times C_3}{N} = \frac{636 \times 334}{1725} = 123.14$

$e_{21} = \frac{R_2 \times C_1}{N} = \frac{751 \times 449}{1725} = 195.48$

$e_{22} = \frac{R_2 \times C_2}{N} = \frac{751 \times 942}{1725} = 410.11$

$e_{23} = \frac{R_2 \times C_3}{N} = \frac{751 \times 334}{1725} = 145.41$

$e_{31} = \frac{R_3 \times C_1}{N} = \frac{338 \times 449}{1725} = 87.98$

$e_{32} = \frac{R_3 \times C_2}{N} = \frac{338 \times 942}{1725} = 184.58$

$e_{33} = \frac{R_3 \times C_3}{N} = \frac{338 \times 334}{1725} = 65.44$

The chi-square statistic is

$\chi^2 = \frac{(181 - 165.54)^2}{165.54} + \frac{(322 - 347.31)^2}{347.31} + \frac{(133 - 123.14)^2}{123.14} + \dots + \frac{(48 - 65.44)^2}{65.44} = 49.4966$

The degree of freedom is $(3 - 1)(3 - 1) = 4$ and the level of significance is $1\% = 0.01$.

The critical value is $\chi^2_4(0.01) = 13.28$.

Since the calculated value is greater than the tabulated value, we reject the null hypothesis and conclude that the students' intelligence level is not independent of family income.
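The expected frequencies and the $\chi^2$ statistic for a contingency table can be computed mechanically. The sketch below is a minimal Python illustration (standard library only); the function name `chi_square_independence` is an assumption made for this example, and the critical value must still be read from the chi-square table.

```python
def chi_square_independence(observed):
    """Expected frequencies e_ij = R_i * C_j / N and the chi-square statistic
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Observed frequencies from Example 8.1 (rows: income level, columns: intelligence).
table = [
    [181, 322, 133],
    [141, 457, 153],
    [127, 163, 48],
]
expected, chi2, df = chi_square_independence(table)
print(round(chi2, 2), df)
```

The same function applies unchanged to a $2 \times 2$ table, since only the row and column totals enter the expected frequencies.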


Example 8.2 (Question 25-27; 2013/2014)

Given the contingency table below, of data on disease status and residence:

Disease status   Residence
                 Urban   Rural
Disease          63      49
No disease       15      33

a. Calculate the expected frequency for each of the cells.
b. Obtain the calculated $\chi^2$ statistic.
c. At $\alpha = 0.05$, test the hypothesis of independence between the disease status and the residence.

Solution

a. To calculate the expected frequencies, we need to complete the table.

Disease status   Residence
                 Urban   Rural   Total
Disease          63      49      112
No disease       15      33      48
Total            78      82      160

   Just as was calculated in the previous example, the expected frequencies are

   $e_{11} = \frac{112 \times 78}{160} = 54.6$

   $e_{12} = \frac{112 \times 82}{160} = 57.4$

   $e_{21} = \frac{48 \times 78}{160} = 23.4$

   $e_{22} = \frac{48 \times 82}{160} = 24.6$

b. The calculated $\chi^2$ is

   $\chi^2 = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} = 8.41$

c. The hypotheses are

   $H_0$: The disease status is independent of the residence.

   $H_1$: The disease status is not independent of the residence.

   The degree of freedom is $(2 - 1)(2 - 1) = 1$ and the level of significance is $\alpha = 0.05$. From the chi-square table, the value at 0.05 with 1 degree of freedom is 3.841.

   Since the calculated value is greater than the tabulated value, we reject the null hypothesis and conclude that disease status is not independent of residence.

8.3 Test of Homogeneity

The chi-square test of homogeneity is used to determine if two or more samples are from the same population. For instance, a researcher may be interested in the opinion of students from different faculties in the University of Port Harcourt about evening classes. Here we may conclude, based on evidence, that they are all saying the same thing (same population) or that their opinions are different, on the average (different populations).

The only difference between this and the test of independence is the hypotheses; the computations are the same.

The hypotheses are:

$H_0$: The samples are from the same population.

$H_1$: The samples are from different populations.

8.4 Goodness of fit test

In the chi-square goodness of fit test, we use the chi-square statistic to determine if a given set of data fits a given distribution.


The hypotheses for this test are

$H_0$: The data fits the distribution.

$H_1$: The data does not fit the distribution.

The test statistic is $\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$, where $e_i = N P(x_i)$.

The degree of freedom is $n - 1$, where $n$ is the number of classifications.

Example 8.3

Two distributions (Poisson and exponential) have been suggested as possible fits for the data below.

No. of errors   0    1    2     3     4    5    6    7    8   9
Frequency       18   53   103   107   82   46   18   10   2   1

Which is the more likely distribution that fits the data?

Solution
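No worked solution is given for this example. As a starting point only, the sketch below shows how the expected frequencies $e_i = N P(x_i)$ and the $\chi^2$ statistic could be computed for the Poisson candidate. It rests on assumptions that are not in these notes: the Poisson mean is estimated by the sample mean, the last class is treated as "9 or more" so that the probabilities sum to one, and the function name `poisson_gof` is hypothetical. The exponential candidate would be handled the same way with its own class probabilities.

```python
import math

def poisson_gof(values, observed):
    """Chi-square goodness-of-fit statistic against a Poisson distribution
    whose mean is estimated from the data (an assumption of this sketch)."""
    n_total = sum(observed)
    lam = sum(x * o for x, o in zip(values, observed)) / n_total
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in values]
    # Treat the last class as "this value or more" so the probabilities sum to 1.
    probs[-1] = 1.0 - sum(probs[:-1])
    expected = [n_total * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return lam, expected, chi2

errors = list(range(10))
frequency = [18, 53, 103, 107, 82, 46, 18, 10, 2, 1]
lam, expected, chi2 = poisson_gof(errors, frequency)
print(round(lam, 4), [round(e, 1) for e in expected], round(chi2, 2))
```

Two cautions apply when comparing the result with a table value: classes with very small expected frequencies are commonly pooled with their neighbours first, and estimating the mean from the data is usually taken to cost one further degree of freedom beyond the $n - 1$ stated above.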

9
Analysis of Variance (ANOVA)

9.1 Introduction

In Chapter 7, we considered the t-test for comparison of two population means for small samples. ANOVA is a design used when our aim is to compare the means of three or more populations. If the t-test were used for this, the test would need to be repeated for every pair of means, which would be time consuming and cumbersome.

ANOVA is based on the F-test statistic or Variance Ratio Statistic.

Assumptions of ANOVA

i. The observations are independent.
ii. The populations from which the observations are taken are Normally distributed.
iii. Various treatment and environmental effects are additive in nature.

In the study of ANOVA, the following terms are very important.

Factor: A factor is an attribute or a characteristic whose effect is being considered, for instance Detergent, Time, Teaching methods, etc.

Level: A level is the amount, method or class of a factor being used on a set of data. Levels of a factor are also referred to as treatments; for instance, if the factor being considered is Detergent, then the following are possible treatments or levels of the factor: OMO, KLIN, SUNLIGHT DETERGENT, etc.


Block: A block is a group of items that are homogeneous, or whose characteristics are not so different, compared to other items.

There are different classifications of ANOVA; in this text, we shall consider one-way and two-way ANOVA.

9.2 ONE-WAY (FACTOR) ANOVA

A one-way ANOVA is one in which only one factor is being considered. For example, a botanist may be interested in the effect of different types of fertilizer on his/her crops. Here the types of fertilizer become the levels (treatments) of the factor (Fertilizer).

The table of observations for the one-way ANOVA is as shown below.

Treatments   Replications
             1      2      3      ...   n      Total
1            x_11   x_12   x_13   ...   x_1n   T_1
2            x_21   x_22   x_23   ...   x_2n   T_2
3            x_31   x_32   x_33   ...   x_3n   T_3
.            .      .      .      ...   .      .
.            .      .      .      ...   .      .
k            x_k1   x_k2   x_k3   ...   x_kn   T_k

The grand total is $T = \sum_{i=1}^{k} T_i$

The mean effect due to the $i$th treatment is $\bar{x}_i = \frac{\sum_{j=1}^{n} x_{ij}}{n}$; $i = 1, 2, \dots, k$

The above table is for $k$ treatments and $n$ replications.

The general model of the one-way ANOVA is

$X_{ij} = \mu + \alpha_i + e_{ij}$; $i = 1, 2, 3, \dots, k$ and $j = 1, 2, 3, \dots, n$

Where

$X_{ij}$ is the observed effect on the $j$th replication due to the $i$th treatment.

$\mu$ is the grand mean or the overall mean.

$\alpha_i$ is the mean effect of the $i$th treatment.

$e_{ij}$ is the random error on the $j$th replication due to the $i$th treatment, and $e_{ij} \sim N(0, \sigma_e^2)$

$\Rightarrow$ the estimated model is $E(X_{ij}) = \hat{\mu} + \hat{\alpha}_i$; $i = 1, 2, 3, \dots, k$ and $j = 1, 2, 3, \dots, n$

9.2.1 Statistical Analysis of the model

To analyze the one-way ANOVA, we consider the different sources of variation and the relationship between them.

For a one-way ANOVA, there are three different sources of variation, namely:

variation due to treatment (variation between treatments), variation due to error (variation within treatments) and the total variation.

Mathematically, the total sum of squares equals the sum of the between-treatment and within-treatment sums of squares, i.e.

$SS_T = SS_t + SS_e$

$\Rightarrow SS_e = SS_T - SS_t$

where

$SS_T = \sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 - C.F$

$SS_t = \frac{1}{n} \sum_{i=1}^{k} T_i^2 - C.F$


$C.F$ is the correction factor and it is equal to $\frac{T^2}{nk} = \frac{T^2}{N}$

9.2.2 ANOVA table

The ANOVA table for the one-way ANOVA is as shown below.

S.V                   d.f      SS     MS                     F-ratio
Treatment (Between)   k − 1    SS_t   MS_t = SS_t/(k − 1)    F = MS_t/MS_e
Error (Within)        nk − k   SS_e   MS_e = SS_e/(nk − k)
Total                 nk − 1   SS_T

Hypotheses

The hypotheses for the one-way ANOVA are

$H_0$: There's no significant difference due to treatment effect,
i.e. $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

$H_1$: There's a significant difference due to treatment effect,
i.e. $\mu_i \neq \mu$ for at least one $i = 1, 2, 3, \dots, k$.

Decision Rule

The decision rule is: reject $H_0$ if $F_{cal} > F_{tab}$,

where $F_{tab} = F_{critical} = F_{\alpha, k-1, nk-k}$.

Example 9.1: Question 18-23, 2013/2014 session

A teacher wants to test three different teaching methods. To do this, three randomly selected groups of four students each were chosen and each group was taught using one of the teaching methods. All the groups were later subjected to the same test. The table below shows the test scores of the students.

Teaching methods   Test Score
Method 1           3   7   4   6
Method 2           4   2   3   3
Method 3           6   4   5   5

Using the above information, answer the following questions.

i. Determine the mean score for the different methods, $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ respectively.
ii. Determine the grand mean.
iii. Determine the total variation.
iv. Determine the variation between methods.
v. Determine the variation within methods.
vi. Determine at $\alpha = 5\%$ whether there is a significant difference amongst the teaching methods.

Solution

The first thing to do is to complete the table by getting the row sums.

Teaching methods   Test Score      Total
Method 1           3   7   4   6   20
Method 2           4   2   3   3   12
Method 3           6   4   5   5   20
                                   T = 52

From the above table, we have a one-factor (teaching method) ANOVA. There are three treatments (Method 1, Method 2, Method 3) and four replications.

i. $\bar{x}_1 = 20/4 = 5$, $\bar{x}_2 = 12/4 = 3$ and $\bar{x}_3 = 20/4 = 5$

ii. Grand mean $(\mu) = \bar{x} = \frac{20 + 12 + 20}{3 \times 4} = 4.33$

iii. Total variation is the same as the total sum of squares, i.e.

   $SS_T = \sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 - C.F$

   where $C.F = T^2/nk = 52^2/12 = 225.33$

   $\sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 = 3^2 + 7^2 + 4^2 + \dots + 5^2 = 250$

   $\Rightarrow SS_T = 250 - 225.33 = 24.67$

iv. Variation between methods is the same as $SS_t = \frac{1}{n} \sum_{i=1}^{k} T_i^2 - C.F$

   $SS_t = \frac{1}{4}(20^2 + 12^2 + 20^2) - 225.33 = 236 - 225.33 = 10.67$

v. Variation within methods is the same as $SS_e = SS_T - SS_t = 24.67 - 10.67 = 14$

vi. To test for a significant difference at $\alpha = 0.05$, we state the hypotheses.

   $H_0$: There's no significant difference amongst the methods.

   $H_1$: There's a significant difference amongst the methods.

   $F_{cal} = \frac{MS_t}{MS_e}$

   where $MS_t = \frac{SS_t}{k - 1} = \frac{10.67}{3 - 1} = 5.335$

   $MS_e = \frac{SS_e}{nk - k} = \frac{14}{12 - 3} = 1.56$

   $\Rightarrow F_{cal} = \frac{5.335}{1.56} = 3.43$

   $F_{tab} = F_{0.05,2,9} = 4.26$

   This value is obtained from the $F$ table with d.f (numerator) $= 2$ and d.f (denominator) $= 9$.

   Decision Rule

   Reject $H_0$ if $F_{cal} > F_{tab}$.

   Now, $F_{cal} = 3.43 \ngtr F_{tab} = 4.26$.

   We accept $H_0$ and conclude that there's no significant difference amongst the teaching methods.

Example 9.2

An experiment is performed to determine the yields of five different varieties of wheat, A, B, C, D and E. Four plots of land are assigned to each variety, and the yields (in bushels per acre) are as shown in the table below. Assuming the plots to be of similar fertility and that the varieties are assigned at random to plots, determine if there is a significant difference in the yields at the 1% level of significance.

A   20   12   15   19
B   17   14   12   15
C   23   16   18   14
D   15   17   20   12
E   21   14   17   18

Solution

The classification is one-way (wheat).

There are five treatments (A-E) and four replications.

$n = 4$ and $k = 5$

Hypotheses

$H_0$: There's no significant difference amongst the different varieties of wheat.

$H_1$: There's a significant difference amongst the different varieties of wheat.

The complete table is as shown below.

     Replication          Total
A    20   12   15   19    66
B    17   14   12   15    58
C    23   16   18   14    71
D    15   17   20   12    64
E    21   14   17   18    70

Analysis

$SS_T = SS_t + SS_e \Rightarrow SS_e = SS_T - SS_t$

where

$SS_T = \sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 - C.F$

$SS_t = \frac{1}{n} \sum_{i=1}^{k} T_i^2 - C.F$

$C.F = T^2/nk = \frac{(58 + 71 + 66 + 64 + 70)^2}{20} = \frac{329^2}{20} = 5412.05$

and

$\sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 = 20^2 + 12^2 + 15^2 + \dots + 18^2 = 5597$

$\Rightarrow SS_T = 5597 - 5412.05 = 184.95$

$SS_t = \frac{1}{4}(66^2 + 58^2 + 71^2 + 64^2 + 70^2) - 5412.05 = 27.2$

$SS_e = SS_T - SS_t = 184.95 - 27.2 = 157.75$

ANOVA table

S.V                   d.f         SS       MS                         F-ratio
Treatment (Between)   5 − 1 = 4   27.2     MS_t = 27.2/4 = 6.8        F = 6.8/10.52 = 0.6466
Error (Within)        15          157.75   MS_e = 157.75/15 = 10.52
Total                 19          184.95

$F_{tab} = F_{\alpha, k-1, nk-k} = F_{0.01,4,15} = 4.89$

Decision Rule

Reject $H_0$ if $F_{cal} > F_{tab}$.

Now, $F_{cal} = 0.6466 \ngtr F_{tab} = 4.89$

$\therefore$ we accept the null hypothesis and conclude that there's no significant difference in yields.

9.3 Two Way (Factors) ANOVA Classification

Suppose that the observations are classified into $k$ categories according to one attribute (characteristic) and into $n$ categories according to another attribute (characteristic); then the classification is a two-way classification.

In this classification, the primary factor is the treatment and the secondary factor is the block. In other words, a two-factor classification is such that the total observations are classified into $k$ treatments and $n$ blocks, where the blocks are homogeneous units.


For example, the performance of students in a class may depend on the method of teaching and the tutor in charge of the class.

The table for two-way ANOVA is as shown below.

Treatments   Blocks
             1      2      3      ...   n      X_i.
1            x_11   x_12   x_13   ...   x_1n   X_1.
2            x_21   x_22   x_23   ...   x_2n   X_2.
3            x_31   x_32   x_33   ...   x_3n   X_3.
.            .      .      .      ...   .      .
.            .      .      .      ...   .      .
k            x_k1   x_k2   x_k3   ...   x_kn   X_k.
X_.j         X_.1   X_.2   X_.3   ...   X_.n   X_..

Where $X_{i.}$ is the total observed effect due to the $i$th treatment.

$X_{.j}$ is the total observed effect due to the $j$th block.

Note: the treatment is not always on the rows of the table. It all depends on the definition of the problem.

9.3.1 Model for two way ANOVA without interaction

The model for a two-way classification without interaction is

$X_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$

Where $X_{ij}$ is the observed effect of the $i$th treatment on the $j$th block.

$\alpha_i$ is the mean effect of the $i$th treatment.

$\beta_j$ is the mean effect of the $j$th block.

$e_{ij}$ is the random error on the $j$th block due to the $i$th treatment.

Hypotheses

There are two sets of hypotheses for the two-way ANOVA without interaction: one for the treatments and one for the blocks.

$H_{0t}$: There's no significant difference amongst the treatments.

$H_{0b}$: There's no significant difference amongst the blocks.

$H_{1t}$: There's a significant difference amongst the treatments.

$H_{1b}$: There's a significant difference amongst the blocks.

Analysis

Here we have four sources of variation, namely: treatment, block, error and total. The relationship amongst these variations is as shown below.

$SS_T = SS_t + SS_b + SS_e \Rightarrow SS_e = SS_T - SS_t - SS_b$

Where $SS_T = \sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 - C.F$, $SS_t = \frac{1}{n} \sum_{i=1}^{k} X_{i.}^2 - C.F$, $SS_b = \frac{1}{k} \sum_{j=1}^{n} X_{.j}^2 - C.F$

and $C.F = \frac{X_{..}^2}{nk} = \frac{X_{..}^2}{N}$

9.3.2 ANOVA Table

S.V         d.f              SS     MS                               F-ratio
Treatment   k − 1            SS_t   MS_t = SS_t/(k − 1)              F_t = MS_t/MS_e

Block       n − 1            SS_b   MS_b = SS_b/(n − 1)              F_b = MS_b/MS_e
Error       (k − 1)(n − 1)   SS_e   MS_e = SS_e/((k − 1)(n − 1))
Total       nk − 1           SS_T

The critical values for these variations are $F_{\alpha, k-1, (n-1)(k-1)}$ for the treatments and $F_{\alpha, n-1, (n-1)(k-1)}$ for the blocks.

Decision Rule

The decision rule remains the same: reject the null hypothesis if the calculated value is greater than the tabulated value.

Example 9.3

To study the performance of three detergents at three different water temperatures, the following 'whiteness' readings were obtained with specially designed equipment.

                    Detergent
Water temperature   Detergent A   Detergent B   Detergent C   Total
Cold water          57            55            67            179
Warm water          49            52            68            169
Hot water           54            46            58            158
Total               160           153           193           506

Perform a two-way analysis of variance, using the 5% level of significance.

Solution

In the statement above, we are faced with two factors (temperature and detergent). One can easily say that the treatments are the detergents while the blocks are the water temperatures. With this, we have 3 treatments and 3 blocks.

Hypotheses

$H_{0t}$: There's no significant difference in treatment effect.

$H_{0b}$: There's no significant difference in block effect.

$H_{1t}$: There's a significant difference in treatment effect.

$H_{1b}$: There's a significant difference in block effect.

Analysis

Using $SS_T = SS_t + SS_b + SS_e \Rightarrow SS_e = SS_T - SS_t - SS_b$

$SS_T = \sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 - C.F$, $SS_t = \frac{1}{n} \sum_{i=1}^{k} X_{i.}^2 - C.F$, $SS_b = \frac{1}{k} \sum_{j=1}^{n} X_{.j}^2 - C.F$

$C.F = \frac{X_{..}^2}{nk} = \frac{506^2}{9} = 28448.44$


$\sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 = 57^2 + 55^2 + 67^2 + \dots + 58^2 = 28888$

$SS_T = 28888 - 28448.44 = 439.56$

$SS_t = \frac{1}{3}(160^2 + 153^2 + 193^2) - 28448.44 = 28752.67 - 28448.44 = 304.23$

$SS_b = \frac{1}{3}(179^2 + 169^2 + 158^2) - 28448.44 = 28522 - 28448.44 = 73.56$

$\Rightarrow SS_e = SS_T - SS_t - SS_b = 439.56 - 304.23 - 73.56 = 61.77$

ANOVA table

S.V         d.f   SS       MS                           F-ratio
Treatment   2     304.23   MS_t = 304.23/2 = 152.11     F_t = 152.11/15.44 = 9.85
Block       2     73.56    MS_b = 73.56/2 = 36.78       F_b = 36.78/15.44 = 2.38
Error       4     61.77    MS_e = 61.77/4 = 15.44
Total       8     439.56

$F_{0.05,2,4} = 6.94$.

The critical values for the detergent and the water temperature are the same.

Decision Rule

Since $F_{cal}(Detergent) = 9.85 > F_{tab} = 6.94$, we reject the null hypothesis and conclude that there's a significant difference in treatment (detergent) effect.

Also, since $F_{cal}(Water\ temperature) = 2.38 \ngtr F_{tab} = 6.94$, we accept the null hypothesis and conclude that there's no significant difference in the block (water temperature) effect.

Example 9.4

Three varieties of coal were analyzed by four chemists and the ash content in the varieties was found as given in the table below.

            Chemists
Varieties   1   2   3   4
A           8   5   5   7
B           7   8   4   4
C           3   6   5   4

Analyse the data for significant differences at $\alpha = 1\%$.

Solution

The hypotheses are

$H_{0c}$: There's no significant difference in the chemist effects.

$H_{0v}$: There's no significant difference amongst the varieties.

$H_{1c}$: There's a significant difference in the chemist effects.

$H_{1v}$: There's a significant difference in the varieties effects.

The complete table is shown below.


            Chemists
Varieties   1    2    3    4    Total
A           8    5    5    7    25
B           7    8    4    4    23
C           3    6    5    4    18
Total       18   19   14   15   66

This is a two-way classification.

$\Rightarrow SS_T = SS_c + SS_v + SS_e$

$SS_e = SS_T - SS_c - SS_v$

Where $SS_T = \sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 - C.F$

$SS_{c(chemists)} = \frac{1}{3} \sum_{j=1}^{4} X_{.j}^2 - C.F$ (each chemist total is a sum of 3 readings)

$SS_{v(varieties)} = \frac{1}{4} \sum_{i=1}^{3} X_{i.}^2 - C.F$ (each variety total is a sum of 4 readings)

$C.F = \frac{X_{..}^2}{nk} = \frac{66^2}{12} = 363$

$\sum_{j=1}^{n} \sum_{i=1}^{k} X_{ij}^2 = 8^2 + 5^2 + 5^2 + \dots + 4^2 = 394$

$SS_T = 394 - 363 = 31$

$SS_c = \frac{1}{3}(18^2 + 19^2 + 14^2 + 15^2) - 363 = 5.67$

$SS_v = \frac{1}{4}(25^2 + 23^2 + 18^2) - 363 = 6.5$

$SS_e = 31 - 6.5 - 5.67 = 18.83$

The ANOVA table for the analysis is as shown below.

S.V         d.f   SS      MS     F-ratio
Chemist     3     5.67    1.89   F_c = 0.6019
Varieties   2     6.5     3.25   F_v = 1.035
Error       6     18.83   3.14
Total       11    31

Decision Rule

$F_{0.01,3,6}(Chemist) = 9.78$

$F_{0.01,2,6}(Varieties) = 10.92$

Obviously, $F_c = 0.6019 \ngtr F_{0.01,3,6} = 9.78$ and $F_v = 1.035 \ngtr F_{0.01,2,6} = 10.92$.

Therefore, we accept the null hypotheses and conclude that there's no significant difference amongst the varieties and amongst the chemists.