Herman Karanja Mwangi Final Submission PDF
COURSE OBJECTIVE
This course equips the students with the necessary skills in collection, organization,
presentation and analysis of data.
COURSE OUTLINE
TOPIC AND CONTENT REMARKS
WEEK 7 Measures of dispersion
The range
The mean absolute deviation
Population variance and standard deviation
Sample variance and standard deviation
Measures of Skewness
Measures of Kurtosis
COURSE EVALUATION
1. CAT & Assignment 30%
2. Final Examination 70%
Total 100%
METHOD OF INSTRUCTION
1. Lectures
2. Class discussions and group discussions
3. Individual assignment
4. Presentations
SUGGESTED READING TEXTS
1. King'ori, G. K. (2004). Fundamentals of Applied Statistics. Jomo Kenyatta Foundation.
2. Lucey, T. Quantitative Techniques.
3. Ott, R. L. and Longnecker, M. Statistical Methods and Data Analysis.
4. Kothari, C. R. (2009). Research Methodology: Methods & Techniques, 2nd Edition. New Delhi.
5. Saunders, M., Lewis, P. and Thornhill (2009). Research Methods for Business Students, 5th Edition. London.
6. Saleemi, N. A. (2006). Quantitative Techniques. Saleemi Publications Ltd, Nairobi.
7. Saleemi, N. A. (2006). Business Mathematics and Statistics Simplified. Saleemi Publications Ltd, Nairobi.
1.0 INTRODUCTION
1.2 Functions of Statistics
i. Marketing
ii. Production
iii. Finance
iv. Banking
v. Investment
vi. Purchase
vii. Accounting
viii. Planning and Control
ix. Quality control
x. Credit
xi. Personnel
xii. Research and development
1.4 Limitations of Statistics
1. Discuss the limitations in detail with reference to group work 1.0 given
earlier. Each group is to give the relevant limitations in its area of discussion.
2. Is a national census important? Discuss this in terms of budgeting and policy
formulation in the country.
2.0 COLLECTION OF DATA
In order to collect data for a particular investigation, it is important to take note of the
object and scope of inquiry, the nature and type of inquiry, the statistical unit and the degree of
accuracy.
Every investigation has its unique objective to be achieved and scope of coverage.
Before undertaking any investigation, determine its objective and extent of coverage.
This helps to save money, save time and ensure important data is not neglected.
Scope entails questions such as where, when and from whom statistical data will be collected.
Primary inquiry - Data is collected for the first time by the investigators.
Secondary inquiry - Investigators use data that was collected by someone else for
some other purpose. Such data is available in research papers, magazines, journals,
etc.
Open inquiry - An inquiry whose results are not kept secret and are shown to the public.
Confidential inquiry - An inquiry whose results are kept secret and are not shown to the public.
2.1.8 Initial or Repetitive
These are the units used for collection of data. There are two types of statistical units.
These are;
2.3.1 Observation Method
Investigator observes the objects and records the desired information without asking
any questions. This method is most suitable when validity of data collected by other
methods is questionable.
2.3.1.1 Advantages
i) Data collected are objective and generally more accurate.
ii) It is easier to note the effects of environmental influences on specific
outcomes.
iii) Does not rely on the willingness and ability of respondents to report
accurately.
iv) It's easier to observe certain groups of individuals (e.g. very young children
and extremely busy executives) from whom it may otherwise be difficult to
obtain information.
v) No biasing effect from interviewers or the phrasing of their questions.
2.3.1.2 Disadvantages
i) Inability to observe such things as attitudes, motivation and plans. These
factors can be observed only as they are reflected in actions.
ii) It's necessary for the observer to be physically present (unless a camera or
other mechanical system can capture the event of interest), often for long periods
of time.
iii) This method is not only slow but also tedious and expensive.
iv) The presence of the observer may influence the behaviour of the subject.
iv) Suitable for intensive investigation
v) High response rate
2.3.2.2 Disadvantages
i) Higher costs
ii) Lack of anonymity
iii) Reluctance to discuss sensitive topics
iv) Not suitable for extensive inquiry
v) Any bias by the investigator can damage the whole inquiry
2.3.3.2 Disadvantages
i. Reluctance to discuss sensitive topics on telephone
ii. Respondents can terminate interview before it is finished (hang-up)
iii. Difficult to collect supplementary information about the respondent
2.3.4 Questionnaire
A standard list of questions (questionnaire) relating to the problem is prepared.
It's a pre-formulated written set of questions to which respondents record their
answers.
The questionnaires are delivered and returned either;
i. Electronically – email or online
ii. Post/mail
iii. Drop and pick questionnaires – delivered by hand to each respondent and
collected later. Usually the questionnaires are completed by respondents
2.3.4.2 Advantages
i. Low cost compared to other methods
ii. Reduced biasing error- Respondents are not influenced by interviewer
characteristics or techniques
iii. Greater anonymity- This is especially important when sensitive issues are
involved
iv. Greater accessibility (wide geographic contact at minimal cost)
v. Respondents have adequate time to think about their answers and/or
contact other sources.
vi. Suitable for extensive inquiries
2.3.4.3 Disadvantages
2.4.1 Advantages
i. Are cheap
ii. Can be obtained quickly
iii. Can provide information which may not be available to a typical
researcher
iv. May be objective because it will not be biased.
2.4.2 Disadvantages
i. Information / data may be outdated
ii. Variation in definition of terms
iii. Units of measurement may be different
iv. Lack of information to verify the data's accuracy.
2.5 Sampling
Sampling is the process of selecting a number of individuals for a study in such a way
that the individuals selected represent the large group from which they were selected.
The individuals selected form the sample.
The large group from which they were selected is the population (universe), e.g. the total
number of students in a college.
The purpose of sampling is to secure a representative group which will enable the
researcher to gain information about the population.
There are two methods that can be used to collect statistical data. These are
i) Census method
ii) Sampling method
iii. Not applicable where a census is required
2.5.3.1 Population
Population refers to the entire group of people, objects, events or things of interest
that the researcher wishes to investigate.
2.5.3.2 Sampling frame
Is a list of elements from which a sample is actually drawn
Note: Perfect sampling frames are however rare
2.5.3.3 Sampling unit
Is the basic unit containing the elements of the population to be sampled. e.g. female,
male, purchasing agents etc
2.5.3.4 Sample size
Is the number of elements that constitute the sample
v. Utilize data for further statistical analysis
3.2.3.1 Merits of tabulation
i. Facilitate easy understanding of data
ii. Aid comparison of different classes
iii. Easy location of required data
iv. Avoidance of unnecessary details
Example
In January 2011, a firm employed 90 staff of whom 79 were men. During the year 17
staff left and 13 of these were men. The total recruitment during the year was 13, of
whom 3 were women. In 2012, wastage declined by 3 amongst men compared with
2011 and no women left; 6 more men but 2 fewer women were recruited than in the
previous year. The total number employed on 1st January 2013 amounted to 93.
Required;
Arrange the above information in concise tabular form showing all relevant totals and
sub-totals
                    2011                      2012
Employees     Men   Women   Total      Men   Women   Total
Jan 1          79      11      90       76      10      86
Recruited      10       3      13       16       1      17
Left          (13)     (4)    (17)     (10)      -     (10)
Total          76      10      86       82      11      93
Exercise
The following report was prepared by an examination officer on the performance of
County X in a national examination. Out of 3500 male candidates below 20 years of
age, 500 passed and 3000 failed. Of the 1100 male candidates 20 years old and over
200 passed and 900 failed. As regards the female candidates, out of 500 below 20
years of age, 100 passed and 400 failed. Of the 340 females 20 years old and over, 80
passed and 260 failed.
Required;
Present the above information in tabular form
3.3 Variables
A variable is a measurable quantity which varies from one value to another,
e.g. price, production, temperature, etc.
iii. Condition series
Example
Product Sales in Ksh. ’000s
Sugar 650
Rice 420
Cosmetics 350
Crockery 280
Tin food 160
3.5 Frequency distribution
Frequency distribution is the grouping of statistical data according to size or
magnitude
It consists of class intervals and their corresponding frequencies
Its features include:
i. Number of classes – minimum of 6-8 and maximum of 20-25
ii. Class interval – span of a class (upper limit – lower limit)
iii. Class limits and boundaries
iv. Class mid-point = (UCL + LCL)/2
v. Class frequency – number of values/items in each class
There are two methods of classifying the data according to class-intervals:
i. Exclusive method – upper limit of one class is the lower limit of
the next and is not included in that former class.
ii. Inclusive method – upper limit of one class is included in that
class itself and not repeated in the next class
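The difference between the two methods can be sketched in a few lines of Python. This is only an illustration: the function names and the small data set are hypothetical, and the inclusive version assumes whole-number data.

```python
def group_exclusive(values, start, width, n_classes):
    """Exclusive method: a value equal to an upper limit is counted in the NEXT class."""
    counts = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        counts[f"{lower}-{upper}"] = sum(lower <= v < upper for v in values)
    return counts


def group_inclusive(values, start, width, n_classes):
    """Inclusive method: both class limits belong to the class (integer data assumed)."""
    counts = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1
        counts[f"{lower}-{upper}"] = sum(lower <= v <= upper for v in values)
    return counts


marks = [0, 4, 5, 9, 10, 10, 14, 15]  # hypothetical data
exclusive = group_exclusive(marks, start=0, width=5, n_classes=4)
inclusive = group_inclusive(marks, start=0, width=5, n_classes=4)
print(exclusive)
print(inclusive)
```

Both groupings give the same frequencies here; only the way the class limits are written differs (0-5, 5-10, ... against 0-4, 5-9, ...).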
Exercise
You are provided with the following data
2, 4, 3, 1, 23, 5, 7, 9, 21, 19, 11, 13, 17, 14, 20, 10, 12, 16, 14, 7, 6, 19, 22, 11, 23,
18, 22, 13, 24, 2, 5, 3,24, 4, 3, 2
Required;
Group the data using a class interval of 5 in;
i. Inclusive form
ii. Exclusive form
Revision Questions
In groups of three students discuss the following questions
4.0 MEASUREMENT
What is to be measured?
i) Objects-Things of ordinary experience -Some things that are not concrete
ii) Properties -characteristics of objects
4.2.1.2 Advantage
Simplicity in situations where data is classified into groups
4.2.1.3 Disadvantages
Has no arithmetical origin,
Information on varying degree of attitude, skill, satisfaction, etc, would be
wasted.
4.2.2.1 Advantage
Simple
4.2.2.2 Disadvantages
Waste of the measure of degree
Limitation of statistical analysis to rank order correlation,
The value around which the data is scattered is known as a measure of
central tendency or an average. An average removes all the unnecessary details of the
data and gives a concise picture of the large data set under investigation.
5.1 Characteristics of measures of central tendency
A good average should have the following characteristics;
X̄ = (x1 + x2 + … + xn) / n = Σx / n
Where:
X̄ = Mean or arithmetic mean
n = Number of items
x1, x2, x3, …, xn = Values of items
Σ (Greek letter called sigma) = sum of all items
Solution:-
Total marks = 75 + 55 + 48 + 72 + 60 = 310
Number of subjects = 5
X̄ = Σx / n = 310 / 5 = 62
Example 2
Monthly sales of Company XYZ for the last 6 months were: shs.37,000; 48,000;
73,000; 35,000;53,000
Required:
Find the monthly average sales.
Solution
In the short-cut method, a specific value from the given set of values is taken as the
assumed (provisional) mean, say A.
The differences between the values of the various items of the series and this provisional
mean, known as deviations, are then derived.
X̄ = A + Σd / n
Where:-
A = Provisional mean
d = Deviations from A
Σd = The sum of deviations from A
This method is applied when the data set is large and complicated.
Example
Compute the mean wage or arithmetic mean using:-
a. Direct method
b. Short cut method
Solution
Earnings (x)    d = x - a (a = 1500)
Sh.             Sh.
1000            -500
1200            -300
1300            -200
1100            -400
1090            -410
1010            -490
1500               0
1900             400
1700             200
2000             500
Total 13800    -1200
Arithmetic mean:
Direct method: X̄ = Σx / n = Sh. 13,800 / 10 = Sh. 1,380
Short cut method: X̄ = a + Σd / n = 1500 + (-1200 / 10) = Sh. 1,380
One may find that the short-cut method takes more time as compared to direct
method. However, this is true only for ungrouped data.
In case of grouped data, short-cut method saves time.
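As a quick check, the two methods can be compared in Python using the earnings from the table above. The variable names are illustrative only; the assumed mean of 1500 is the one used in the table.

```python
earnings = [1000, 1200, 1300, 1100, 1090, 1010, 1500, 1900, 1700, 2000]
n = len(earnings)

# Direct method: sum of the values divided by the number of items
direct_mean = sum(earnings) / n

# Short-cut method: assumed mean plus the average deviation from it
A = 1500
deviations = [x - A for x in earnings]
shortcut_mean = A + sum(deviations) / n

print(direct_mean, shortcut_mean)
```

Both methods give the same mean; the short-cut method simply works with smaller numbers.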
Direct method
X̄ = Σfx / Σf
Where:
f = Frequencies
∑f=Total of frequencies
Short cut method
X̄ = a + Σfd / Σf
Example
Calculate the arithmetic mean from the following data using direct method and short
cut method:-
Values 5 10 15 20 25 30 35 40 45 50
Frequency: 20 43 75 67 72 45 39 9 8 6
Solution
Direct method
Values (x) Frequency (f) fx
5 20 100
10 43 430
15 75 1125
20 67 1340
25 72 1800
30 45 1350
35 39 1365
40 9 360
45 8 360
50 6 300
Total 384 8530
X̄ = a + Σfd / Σf
= 25 + (-1070 / 384)
= 25 - 2.8
= 22.2
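The same discrete series can be checked in Python. This is a sketch with illustrative variable names; the assumed mean a = 25 is taken from the working above.

```python
values = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
freqs = [20, 43, 75, 67, 72, 45, 39, 9, 8, 6]

total_f = sum(freqs)                                  # Σf
sum_fx = sum(f * x for x, f in zip(values, freqs))    # Σfx

# Direct method
direct_mean = sum_fx / total_f

# Short-cut method with assumed mean a
a = 25
sum_fd = sum(f * (x - a) for x, f in zip(values, freqs))  # Σfd
shortcut_mean = a + sum_fd / total_f

print(round(direct_mean, 1), round(shortcut_mean, 1))
```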
Continuous series
The method of calculating the arithmetic mean from a continuous series is exactly the
same as that of discrete series with the exception that in a continuous series, we first
take the mid points of the various class intervals which are written against each class
interval.
These mid-point values are multiplied by the corresponding frequencies.
The provisional mean is also taken from these mid-point values.
Example
Solution:
Direct method:
Marks      Mid-point (x)    f     fx
0-20       10               5     50
20-40      30               7     210
40-60      50               13    650
60-80      70               8     560
80-100     90               7     630
Total                  Σf = 40    Σfx = 2100
X̄ = Σfx / Σf = 2100 / 40 = 52.5
Short Cut method:
Mean = A + Σfd / Σf
n = Number of items
f = frequency
Example
Compute the geometric mean of the following data
130, 135, 140, 145, 146, 148, 149, 150, 157
Solution
x      Log x
130    2.1139
135    2.1303
140    2.1461
145    2.1614
146    2.1644
148    2.1703
149    2.1732
150    2.1761
157    2.1959
ΣLog x = 19.4316
G.M. = AntiLog of (ΣLog x / n)
= AntiLog of (19.4316 / 9)
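The log method can be reproduced in Python as a check. This is a minimal sketch using base-10 logarithms, as in the table; the variable names are illustrative.

```python
import math

data = [130, 135, 140, 145, 146, 148, 149, 150, 157]

# Sum of the base-10 logarithms of the values
log_sum = sum(math.log10(x) for x in data)

# Geometric mean = antilog of the mean of the logs
geometric_mean = 10 ** (log_sum / len(data))

print(round(log_sum, 4), round(geometric_mean, 2))
```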
Example
The following data was collected from a sample of 100 students in a certain
University;
No. of students: 4 10 14 53 7 12
Example
The table below shows the scores obtained by 48 students in an examination
No. of students: 3 7 8 18 12
This is the reciprocal of the arithmetic mean of the reciprocals of the values of
items in a given series.
For ungrouped data;
Harmonic Mean = n / Σ(1/x)
Solution;
X 1/X
1 1.000
2 0.500
4 0.250
5 0.200
8 0.125
10 0.100
10 0.100
∑1/X = 2.275
Harmonic Mean = n / Σ(1/x)
= 7 / 2.275
= 3.077
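The working above can be verified with a short Python sketch (variable names are illustrative):

```python
values = [1, 2, 4, 5, 8, 10, 10]

# Sum of the reciprocals, Σ(1/x)
reciprocal_sum = sum(1 / x for x in values)

# Harmonic mean = n / Σ(1/x)
harmonic_mean = len(values) / reciprocal_sum

print(round(reciprocal_sum, 3), round(harmonic_mean, 3))
```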
Example
The following data was collected from 100 students in a tertiary institution;
No. of students: 4 10 14 53 7 12
Example
The table below shows the scores obtained by 48 students in an examination
No. of students: 3 7 8 18 12
5.1.4 Median
Median is the value of the middle item of a series when these items are arranged
in either ascending or descending order.
Median item = ((n + 1) / 2)th item, where n = number of items.
Solution
First arrange the data in ascending or descending order:-
S.No. values
1 15
2 17
3 19
4 20
5 21
Example
The marks of six students in a class are 80,70,75,85,60 and 80. Find the median
Solution
First arrange the data in ascending order:-
S.No Marks
1 60
2 70
3 75
4 80
5 80
6 85
Thus;
Example
The following data relates to sizes of shoes sold at a store during a given week. Find
the median size:
Size of shoes: 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0
No. of pairs:  1   2   4   5   15  30  60  95  82  75  44  25   15   4
Solution:
Size of shoes (x) No. of pairs (f) Cumulative frequency (cf)
4.5 1 1
5.0 2 3
5.5 4 7
6.0 5 12
6.5 15 27
7.0 30 57
7.5 60 117
8.0 95 212
8.5 82 294
9.0 75 369
9.5 44 413
10.0 25 438
10.5 15 453
11.0 4 457
Median = size of the (n/2)th item
= 457/2 = 228.5th item
The 228.5th item lies within cumulative frequency 294, so: median = 8.5
5.1.4.3 Computation of Median in a continuous series
In order to calculate the median of the continuous frequency distribution, there is one
difficulty, that is, the value of the median lies in a class interval so this value is
calculated by the method of interpolation.
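Median = L + (i / f) × (n/2 - c)
where:-
L = lower limit (boundary) of the median class
i = class interval of the median class
f = frequency of the median class
n = total frequency
c = cumulative frequency of the class preceding the median class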
Example
Solution
Marks (x) Students (f) Cumulative frequency
(cf)
0-10 2 2
10-20 18 20
20-30 30 50
30-40 45 95
40-50 35 130
50-60 20 150
60-70 6 156
70-80 3 159
Median = (n/2)th item
= 159/2 = 79.5th item
This lies within cumulative frequency 95, so the median group is the 30-40 marks group.
Median = L + (i / f) × (n/2 - c)
Example
Solution:
In this question the data is assumed to be continuous, the class boundaries are
reshaped and the distribution takes the form:
Class f Cf
boundaries
49.5-59.5 7 7
59.5-69.5 81 88
69.5-79.5 192 280
79.5-89.5 312 592
89.5-99.5 218 810
99.5-109.5 82 892
109.5-119.5 18 910
Median = (n/2)th item
= 910/2 = 455th item
∴ The median lies in the group 79.5 – 89.5
Median = L + (i / f) × (n/2 - c)
= 79.5 + (10 / 312) × (455 - 280)
= 79.5 + 1750 / 312
= 79.5 + 5.61 = 85.11
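The interpolation can be sketched in Python as follows. The function name `grouped_median` is hypothetical; it assumes equal class widths and takes the lower class boundaries and frequencies from the table above.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of a continuous series by interpolation: L + (i / f) * (n/2 - c)."""
    n = sum(freqs)
    target = n / 2          # position of the median item
    cumulative = 0          # c: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("empty distribution")


lower_boundaries = [49.5, 59.5, 69.5, 79.5, 89.5, 99.5, 109.5]
freqs = [7, 81, 192, 312, 218, 82, 18]

median = grouped_median(lower_boundaries, 10, freqs)
print(round(median, 2))
```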
5.1.4.4 Obtaining median graphically
Uses a cumulative frequency curve.
Example
Solution:
The above data can be written as under:-
Class f c.f
0-10 5 5
10-20 10 15
20-30 15 30
30-40 8 38
40-50 7 45
Mark cumulative frequencies (c.f.) on the graph paper. C.f of each group is marked
against upper limit of the respective group.
Median item = 45/2 = 22.5th item
Median = 26 approximately
skewed towards the right. They are equal when the distribution is
symmetrical.
5.1.5 Mode
Mode is the value of the item which occurs most frequently in a series.
Example 17
The marks of 10 students in a test are as under:-
Student 1 2 3 4 5 6 7 8 9 10
Marks 65 43 57 63 39 57 60 48 57 55
Find out mode:
Solution:
Marks  39  43  48  55  57  60  63  65
f      1   1   1   1   3   1   1   1
Mode = 57 marks
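For ungrouped data the mode can be found by simply counting occurrences. A minimal Python sketch using the marks of the 10 students above:

```python
from collections import Counter

marks = [65, 43, 57, 63, 39, 57, 60, 48, 57, 55]

# Count how often each mark occurs and pick the most frequent one
counts = Counter(marks)
mode, frequency = counts.most_common(1)[0]

print(mode, frequency)
```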
Solution:
Marks No of Students (f)
0-10 2
10-20 7
20-30 21
30-40 25
40-50 30
50-60 35
60-70 28
70-80 12
Mode = Lm + [d1 / (d1 + d2)] × i
Modal class = 50 – 60
d1 = 35 - 30 = 5, d2 = 35 - 28 = 7 and i = 60 - 50 = 10
Mode = 50 + [5 / (5 + 7)] × 10
= 50 + 0.417 × 10
= 50 + 4.167
= 54.167 Marks
Example
Use graphical method to determine the mode of the following data
Wages (Sh) No. of workers
0-10 15
10-20 17
20-30 19
30-40 25
40-50 16
50-60 15
60-70 13
70-80 10
80-90 5
90-100 2
Graphical Solution
Plot the histogram using class boundaries against frequencies;
[Histogram: frequencies (0 to 25) plotted against class boundaries (0 to 100); the mode is read off the horizontal axis at about 35.]
Mode = Sh 35 (Approximately)
Example
Use graphical method to determine the mode of the following data
Marks No of students
0-5 7
5-10 8
10-15 15
15-20 16
20-25 19
25-30 13
30-35 12
35-40 10
40-45 5
45-50 2
Solution by Formula method
Modal class = 20 – 25
Mode = Lm + [d1 / (d1 + d2)] × i
Lm = 20, d1 = 19 - 16 = 3, d2 = 19 - 13 = 6
and i = 25 – 20 = 5
Mode = 20 + [3 / (3 + 6)] × 5 = 21.67 marks
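The formula method can be sketched in Python. The function name `grouped_mode` is hypothetical; it assumes equal class widths and a single modal class. With the marks data above it evaluates to about 21.67, which rounds to the value of 22 read off the histogram.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a grouped series: Lm + d1 / (d1 + d2) * i."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - f_prev      # excess over the class before
    d2 = freqs[m] - f_next      # excess over the class after
    return lower_limits[m] + d1 / (d1 + d2) * width


lower_limits = list(range(0, 50, 5))            # 0-5, 5-10, ..., 45-50
freqs = [7, 8, 15, 16, 19, 13, 12, 10, 5, 2]

mode = grouped_mode(lower_limits, 5, freqs)
print(round(mode, 2))
```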
Graphical Solution
Use a histogram as follows
[Histogram: frequency (0 to 20) plotted against class boundaries (0 to 50); the mode is read off the horizontal axis at about 22.]
REVISION EXERCISES
EXERCISE ONE
The managers of an import agency are investigating the length of time that customers
take to pay their invoices, the normal terms for which are 30 days net. They have
checked the payment record of 100 customers chosen at random and have compiled
the following table:
Required:
a) Calculate the arithmetic mean.
b) Calculate the standard deviation
c) Construct a histogram and insert the modal value.
EXERCISE TWO
The price of the ordinary 25p shares of Manco PLC quoted on the stock exchange, at
the close of the business on successive Fridays is tabulated below
Required
a) Group the above data into eight classes.
b) Calculate cumulative frequency, the median value, quartile values and the
semi-interquartile range.
c) Compare and contrast the values that you have obtained for:
i) The median and mean
ii) The semi-interquartile range and the standard deviation
6.0 MEASURES OF DISPERSION
6.1 Introduction
The various measures of central tendency give us one single value that represents
the entire data. But an average alone cannot adequately describe a set of observations,
unless all the observations are alike. It's therefore necessary to describe the variation
or dispersion of the observations.
The first two (Range and Quartile deviation or inter quartile range) are positional
measures because they depend on the values at a particular position in the
distribution. Mean deviation or average deviation and Standard deviation are
calculated by employing all measures in calculation. The last one is a graphical
method.
6.5.1.1 Range
The range is defined as the difference between the smallest and the largest value of a
series.
For grouped data, the range is equal to the difference between the upper class
boundary of the highest class and the lower class boundary of the lowest class.
Range = L - S
Where:
L = the largest value of distribution
S = smallest value of distribution
Example:
The following data represents sales of news papers during a week by a vendor:
Solution:
Range = L - S, where L = 4800 and S = 600
= 4800 - 600 = 4200
Example:
Calculate the range and coefficient of range from the following data:
Coefficient of range = (L - S) / (L + S)
= (60.5 - 9.5) / (60.5 + 9.5)
= 51 / 70
= 0.72857
≈ 0.73
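The same arithmetic in Python, using the largest and smallest class boundaries from the working above (variable names are illustrative):

```python
largest = 60.5    # L: upper class boundary of the highest class
smallest = 9.5    # S: lower class boundary of the lowest class

value_range = largest - smallest
coefficient_of_range = (largest - smallest) / (largest + smallest)

print(value_range, round(coefficient_of_range, 2))
```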
M.D = Σ|X - X̄| / n
Where:
M.D = mean deviation
X = value of item
X̄ = mean
|X - X̄| = Absolute value of the deviation of X from the mean
Σ|X - X̄| = Summation of all absolute values of deviations of X from the mean
n = number of all items
b) For grouped data
M.D = Σf|X - X̄| / Σf
Where:
f = frequency of each class
Example:
Calculate the mean deviation and coefficient M.D from the following data;
X 9 14 16 23 30 41
F 2 4 5 3 2 2
Solution
x     f    |x - X̄|    f|x - X̄|    xf
9     2    11.28      22.56       18
14    4    6.28       25.12       56
16    5    4.28       21.40       80
23    3    2.72       8.16        69
30    2    9.72       19.44       60
41    2    20.72      41.44       82
Σf = 18    Σf|x - X̄| = 138.12    Σxf = 365
X̄ = Σxf / Σf = 365 / 18 = 20.28
M.D = Σf|x - X̄| / Σf = 138.12 / 18 = 7.67
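The mean deviation for this frequency distribution can be checked in Python. This is a sketch with illustrative variable names; because it uses the unrounded mean, the result is about 7.67, and the coefficient of M.D is taken as M.D divided by the mean.

```python
x = [9, 14, 16, 23, 30, 41]
f = [2, 4, 5, 3, 2, 2]

n = sum(f)                                             # Σf
mean = sum(fi * xi for xi, fi in zip(x, f)) / n        # Σxf / Σf

# Mean deviation: frequency-weighted average of absolute deviations from the mean
mean_deviation = sum(fi * abs(xi - mean) for xi, fi in zip(x, f)) / n
coefficient_md = mean_deviation / mean

print(round(mean, 2), round(mean_deviation, 2), round(coefficient_md, 3))
```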
Example:
Calculate the mean deviation and its coefficient from the following data:
Solution
Class    Mid-point (x)    f    xf     f|x - X̄|
15-19    17               4    68     10.0
20-24    22               7    154    17.5
25-29    27               4    108    30.0
Σf = 20, Σxf = 390
X̄ = Σxf / Σf = 390 / 20 = 19.5
Variance is the arithmetic mean of the squared deviations from the mean. Standard
deviation is the square root of the variance. If all the numbers in the sample are very
close to each other, the standard deviation is close to zero. If the numbers are well
dispersed, the standard deviation will tend to be large. A small standard deviation
means a high degree of uniformity of the observations as well as homogeneity of the
series, and vice versa.
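Ungrouped Data
a) Population Variance
σ² = Σ(X - X̄)² / n
or
σ² = ΣX² / n - (ΣX / n)²
Population standard deviation
σ = √[Σ(X - X̄)² / n]
or
σ = √[ΣX² / n - (ΣX / n)²]
Where:
X = value of item
X̄ = mean
n = number of items
b) Sample Variance
s² = Σ(X - X̄)² / (n - 1)
Sample standard deviation
s = √[Σ(X - X̄)² / (n - 1)]
GROUPED DATA
Population: σ = √[Σf(X - X̄)² / n]        Sample: s = √[Σf(X - X̄)² / (n - 1)]
where n = Σf
Computational form of the standard deviation
Ungrouped data
σ = √[Σx² / n - (Σx / n)²]
Grouped data
σ = √[Σfx² / n - (Σfx / n)²]
Coefficient of variation
CV = (standard deviation / mean) × 100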
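The formulas above can be tried out with a short Python sketch. The data set is hypothetical and chosen only so that the numbers come out neatly; the function name `variance` is illustrative.

```python
import math


def variance(data, sample=False):
    """Population variance (divide by n) or sample variance (divide by n - 1)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical observations
n = len(data)
mean = sum(data) / n

population_variance = variance(data)
population_sd = math.sqrt(population_variance)
sample_variance = variance(data, sample=True)

# Computational form: Σx²/n - (Σx/n)² gives the same population variance
computational_variance = sum(x * x for x in data) / n - (sum(data) / n) ** 2

# Coefficient of variation, in percent
cv = population_sd / mean * 100

print(population_variance, population_sd, round(sample_variance, 3), cv)
```

Dividing by n - 1 instead of n makes the sample variance slightly larger than the population variance computed from the same figures.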
REVISION EXERCISES
EXERCISE ONE
ii) Compute the standard deviation and coefficient of variation from the following
data:
Marks 0-10 10-20 20-30 30-40 40-50
No. of students 7 6 15 12 10
EXERCISE TWO
a) What is dispersion and what is the formula for the standard deviation?
b) What is the measure of relative dispersion?
EXERCISE THREE
The managers of an import agency are investigating the length of time that customers
take to pay their invoices, the normal terms for which are 30 days net. They have
checked the payment record of 100 customers chosen at random and have compiled
the following table:
Required:
Calculate the arithmetic mean.
Calculate the variance and standard deviation
Construct a histogram and insert the modal value.
Estimate the probability that an unpaid invoice chosen at random will be
between 30 and 39 days old.
EXERCISE FOUR
Define the coefficient of variation.
The following table gives profits (in ten thousands of shillings) of two supermarkets
for the year 2012.
Required:
i) Compute the coefficient of variation for each supermarket.
ii) Indicate for which supermarket the variability of profits is relatively greater.
7.0 SKEWNESS
This is the tendency of a given frequency curve to lean towards one side. In a
positively skewed distribution the curve leans towards the left and the long tail extends to the right.
[Figure: frequency curves marking the positions of the mean, median and mode for a normal distribution and for a skewed distribution with a long tail.]
NB: This frequency curve for the age distribution is characteristic of the age
distribution in developed countries
i) The mode is usually bigger than the mean and median
ii) The median usually occurs in between the mean and mode
iii) The no. of observations above the mean are usually more than those
below the mean (see the shaded region)
7.3 MEASURES OF SKEWNESS
These are numerical values which assist in evaluating the degree of deviation of a
frequency distribution from the normal distribution.
2. Coefficient of skewness = (mean - mode) / standard deviation
NB: These 2 coefficients above are also known as Pearsonian measures of skewness.
Example
The following information was obtained from an NGO which was giving small loans
to some small scale business enterprises in 2010. The loans are in
thousands of Kshs.
Loans      Units (f)   Midpoint (x)   d = x - a   u = d/c   fu      fu²     UCB     cf
46 – 50    32          48             -15         -3        -96     288     50.5    32
51 – 55    62          53             -10         -2        -124    248     55.5    94
56 – 60    97          58             -5          -1        -97     97      60.5    191
61 – 65    120         63 (A)         0           0         0       0       65.5    311
66 – 70    92          68             5           +1        92      92      70.5    403
71 – 75    83          73             10          +2        166     332     75.5    486
76 – 80    52          78             15          +3        156     468     80.5    538
81 – 85    40          83             20          +4        160     640     85.5    578
86 – 90    21          88             25          +5        105     525     90.5    599
91 – 95    11          93             30          +6        66      396     95.5    610
Total      610                                              428     3086
Required
Using the Pearsonian measure of skewness, calculate the coefficients of skewness
and hence comment briefly on the nature of the distribution of the loans.
Arithmetic mean = Assumed mean + (c × Σfu) / Σf
= 63 + (428 × 5) / 610
= 66.51
The standard deviation = c × √[Σfu² / Σf - (Σfu / Σf)²]
= 5 × √[3086 / 610 - (428 / 610)²]
= 10.68
The position of the median: m = (n + 1) / 2
= (610 + 1) / 2 = 305.5
Median = 60.5 + [(305.5 - 191) / 120] × 5
= 60.5 + (114.5 / 120) × 5
Median = 65.27
Therefore the Pearsonian coefficient = 3 × (mean - median) / standard deviation
= 3 × (66.51 - 65.27) / 10.68
= 0.348
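The whole calculation can be re-run in Python as a check. This is a sketch with illustrative variable names; the median class (61 – 65, lower boundary 60.5, preceding cumulative frequency 191) is hard-coded from the working above, and because no intermediate rounding is done the coefficient comes out at about 0.347 rather than exactly 0.348.

```python
import math

freqs = [32, 62, 97, 120, 92, 83, 52, 40, 21, 11]
midpoints = [48, 53, 58, 63, 68, 73, 78, 83, 88, 93]
A, c = 63, 5                       # assumed mean and class width

u = [(x - A) // c for x in midpoints]                    # coded deviations
n = sum(freqs)
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui * ui for f, ui in zip(freqs, u))

mean = A + c * sum_fu / n
sd = c * math.sqrt(sum_fu2 / n - (sum_fu / n) ** 2)

# Median by interpolation within the class 61 - 65
median_position = (n + 1) / 2
median = 60.5 + (median_position - 191) / 120 * c

pearson_skewness = 3 * (mean - median) / sd
print(round(mean, 2), round(sd, 2), round(median, 2), round(pearson_skewness, 3))
```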
Comment
The coefficient of skewness obtained suggests that the frequency distribution of the
loans given was positively skewed
This is because the coefficient itself is positive. But the skewness is not very high
implying the degree of deviation of the frequency distribution from the normal
distribution is small
Example 2
Using the above data calculate the quartile coefficient of skewness
Quartile coefficient of skewness = (Q3 + Q1 - 2Q2) / (Q3 - Q1)
The position of Q1 lies on (610 + 1) / 4 = 152.75
∴ actual value Q1 = 55.5 + [(152.75 - 94) / 97] × 5 = 58.53
The position of Q3 lies on 3 × (610 + 1) / 4 = 458.25
∴ actual value Q3 = 70.5 + [(458.25 - 403) / 83] × 5 = 73.83
Q2 position: i.e. 2 × (610 + 1) / 4 = 305.5
Actual Q2 value = 60.5 + [(305.5 - 191) / 120] × 5 = 65.27
Quartile coefficient of skewness = (73.83 + 58.53 - 2 × 65.27) / (73.83 - 58.53)
= 1.82 / 15.30 = 0.12
Conclusion
Same as above when the Pearsonian coefficient was used
8.0 KURTOSIS
Generally there are 3 types of kurtosis namely;-
i. Leptokurtic
ii. Mesokurtic
iii. Platykurtic
8.1 Leptokurtic
A frequency distribution which is leptokurtic generally has a higher peak than that of
the normal distribution. The coefficient of kurtosis, when determined, will be found to
be more than 3. Thus frequency distributions with a value of more than 3 are
leptokurtic.
8.2 Mesokurtic
Some frequency distributions when plotted may produce a curve similar to that of the
normal distribution. Such frequency distributions are referred to as mesokurtic. The
degree of kurtosis is usually equal to 3.
8.3 Platykurtic
When the frequency curve constructed produces a peak which is lower than that of a
normal distribution, such a curve is said to be platykurtic. The coefficient of
such a curve is usually less than 3.
= 549.9
∴ percentile measure of kurtosis
K (Kappa) = ½ × (Q3 - Q1) / (P90 - P10)
= ½ × (73.83 - 58.53) / (81.99 - 52.85)
= 0.26
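Note that the benchmark of 3 applies to the moment coefficient of kurtosis; for the
percentile measure, a normal distribution gives K of about 0.263. Since 0.26 is very close
to 0.263, the distribution of loans is approximately mesokurtic.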
Kurtosis is also measured by moment statistics, which utilize the exact value of each
observation.
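i. The first moment: M1 = ΣX / n = Mean
ii. The second moment about the mean: M2 = Σ(X - M1)² / n
iii. The third moment about the mean: M3 = Σ(X - M1)³ / n
iv. The fourth moment about the mean: M4 = Σ(X - M1)⁴ / n
Moment coefficient of Kurtosis = M4 / S⁴, where S is the standard deviation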
Example
Find the moment coefficient of the following distribution
x f
12 1
14 4
16 6
18 10
20 7
22 2
M = Σfx / Σf = 528 / 30 = 17.6
σ² = Σf(x - M)² / Σf = 179.20 / 30 = 5.973
σ⁴ = 35.677
M4 = Σf(x - M)⁴ / Σf = 2,676.74 / 30 = 89.22
Moment coefficient of Kurtosis = M4 / σ⁴ = 89.22 / 35.677 = 2.5
Note Coefficient of kurtosis can also be found using the method of assumed mean.
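The example above can be reproduced in Python. This is a sketch with illustrative variable names, using moments taken about the mean.

```python
x = [12, 14, 16, 18, 20, 22]
f = [1, 4, 6, 10, 7, 2]

n = sum(f)
mean = sum(fi * xi for xi, fi in zip(x, f)) / n

# Second and fourth moments about the mean
m2 = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / n   # variance
m4 = sum(fi * (xi - mean) ** 4 for xi, fi in zip(x, f)) / n

moment_kurtosis = m4 / m2 ** 2
print(round(mean, 1), round(m2, 3), round(m4, 2), round(moment_kurtosis, 2))
```

A value below 3 indicates a distribution flatter than the normal distribution (platykurtic).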
9.0 INDEX NUMBERS
An index number is an attempt to summarize a whole mass of data into one figure.
The single figure shows how one year differs from another year.
It is a statistical device used to measure the change in the level of prices, wages,
output and other variables at given times, relative to their level at an earlier time,
which is taken as the base for comparison purposes.
A simple price index = (Pn / Po) × 100 (an unweighted price index)
A simple quantity index = (Qn / Qo) × 100 (an unweighted quantity index)
Where pn is the price of a commodity in the current year (the year for which the price
index is to be calculated)
and po is the price of the same commodity in the base year (the year for
comparison purposes).
Value index = (Σpn qn / Σpo qo) × 100
Laspeyres price index = [Σ(pn / po) wo / Σwo] × 100
Where w0 are the proportions of the total expenditure in the base period. This formula is
frequently used to calculate the retail price index.
Example;
Year 1985 1986 1987 1988 1989 1990 1991 1992
Price index 100 104 108 109 112 120 125 140
When changing the base year, it is advisable to update the weights used in the base
year.
A chain based index is one where the index is calculated every year using the
previous year as the base year. This type of index measures rate of change from year
to year.
This method is suitable where weights are changing rapidly and items are constantly
being brought into the index and unwanted items taken out. It can be a price or
quantity index
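As an illustration, a chain based index can be derived from the 1985-based price index series given in the example earlier in this section. The sketch below uses illustrative variable names and rounds each link to one decimal place.

```python
years = [1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992]
fixed_base_index = [100, 104, 108, 109, 112, 120, 125, 140]   # 1985 = 100

# Chain based index: each year's index with the previous year as base (= 100)
chain_index = [
    round(current / previous * 100, 1)
    for previous, current in zip(fixed_base_index, fixed_base_index[1:])
]

for year, value in zip(years[1:], chain_index):
    print(year, value)
```

Each figure shows the percentage level of prices relative to the year before, so 104.0 for 1986 means a 4% rise over 1985.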
9.7 Deflation
Indexes may be used to deflate time series so that comparisons between periods may
be made in real terms
It is a process of reducing a value measured in current period prices to its equivalent
in the base period prices. The deflated value is what would have been necessary to
purchase the same amount of goods as the present value can purchase in the current
period
Deflation Factor = (Σpn qn / Σp0 qn) × 100
iii. Textile
iv. Construction
v. Gas, electricity, water, etc.
It excludes agriculture, fishing, trade, transport, finance and other such industries.
Each industry order is given a weighting. The weighting is based on average
monthly production in each industry in a fixed base year. It gives each item its
relative importance amongst all other items and thus gives a better estimate of the
index for comparison purposes.
Example
The share prices of ordinary shares of four companies on 1st January 1990 and 1st
January 1991 were as follows.
Using an unweighted geometric index, calculate the index of share prices at 1.1.1991
if 1.1.1990 is the base date, index 100
Solution
Index = [(12/10) × (15/12) × (25/20) × (6/5)]^(1/4)
= (27000 / 12000)^(1/4)
= 2.25^(1/4)
= 1.225
Percentage increase = 22.5%; index = 122.5
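The unweighted geometric index can be checked in Python. The share prices below are read from the price relatives in the solution (1990 prices 10, 12, 20, 5 and 1991 prices 12, 15, 25, 6), since the original table of prices is not reproduced here; the variable names are illustrative. `math.prod` requires Python 3.8 or later.

```python
import math

prices_1990 = [10, 12, 20, 5]     # base date, index 100
prices_1991 = [12, 15, 25, 6]

# Price relatives for each share
relatives = [p1 / p0 for p0, p1 in zip(prices_1990, prices_1991)]

# Geometric mean of the relatives, expressed as an index
geometric_index = 100 * math.prod(relatives) ** (1 / len(relatives))

print(round(geometric_index, 1))
```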
9.8.3 Inflation
The inflation rate for a given period can be calculated using the following formula;
Inflation = (Current retail price index / Retail price index in the base year) × 100
9.9.1 Factor Reversal Test
This test indicates that when the price index is multiplied by a quantity index (i.e.
the factors are reversed), it should result in the value index.
9.9.2 The time reversal test
If we reverse the time subscripts of a price or quantity index, the result should be
the reciprocal of the original index.
EXERCISE ONE
Cable PLC manufactures an item of domestic equipment which requires a number of
components which have varied as various modifications of the model have been used.
The following table shows the number of components required together with the
price over the last three years of production.
Required:
a) Establish the base weighted price indices for 2011 and 2012 based on
2010 for the item of equipment.
b) Establish the current weighted price indices for 2011 and 2012 based on
2010 for the item of equipment.
c) Using the results of (a) and (b) as illustrations, compare and contrast
Laspeyres and Paasche price index numbers.
EXERCISE TWO
A company manufacturing a product known as TOILTEX uses five components in its
assembly.
The quantities and prices of the components used to produce a unit of TOILTEX in
2010, 2011 and 2012 are tabulated as follows:
Required:
i) Calculate Laspeyres type price index numbers for the cost of one unit of
TOILTEX for 2011 and 2012 based on 2010.
ii) Calculate Paasche type price index numbers for the cost of one unit of TOILTEX
for 2011 and 2012 based on 2010.
iii) Compare and contrast the Laspeyres and Paasche price-index numbers you
have obtained in (i) and (ii)
EXERCISE THREE
Required:
Explain the usefulness of an index of Industrial Production and an index of retail
prices to both sides in a series of pay negotiations.
10.0 DECISION TREES
Symbols used
Decision point
i) Are points where a choice exists between alternatives
ii) Represented with a square
iii) At a decision point, the decision maker has a choice on which course of action to
undertake
Outcome point
i) Are points where events depend on probabilities
ii) Represented with a circle/ node
End point
Are the final outcomes
Represented with a triangle
Illustration
[Figure: a generic decision tree. Decision point D1 offers Action A1 (outcomes X1, X2, X3) and Action A2 (outcomes Y1, Y2). Later decision points D2 (Actions B1, B2) and D3 (Actions C1, C2, C3) follow outcome branches.]
10.2 Steps in drawing a decision tree
Example
Two projects are being considered and the project data has been estimated as follows;
                       PROJECT A                  PROJECT B
                       RETURNS     PROBABILITY    RETURNS     PROBABILITY
                       Kshs                       Kshs
OPTIMISTIC OUTCOME     6,000       0.2            6,500       0.1
MOST LIKELY OUTCOME    3,500       0.5            4,000       0.6
PESSIMISTIC OUTCOME    (2,500)     0.3            (1,000)     0.3
Required;
(i) Construct a decision tree for this problem
(ii) Calculate the expected monetary values (EMVs) of the two projects
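The EMV of a project is the sum of each return multiplied by its probability. The sketch below is a minimal illustration, assuming the bracketed returns in the table are losses and that the second column of figures belongs to Project B; the function and variable names are hypothetical.

```python
def expected_monetary_value(outcomes):
    """Return the sum of return x probability over (return, probability) pairs."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)


# Figures from the example; bracketed returns are entered as negative (assumption).
project_a = [(6000, 0.2), (3500, 0.5), (-2500, 0.3)]
project_b = [(6500, 0.1), (4000, 0.6), (-1000, 0.3)]

emv_a = expected_monetary_value(project_a)
emv_b = expected_monetary_value(project_b)
print(f"EMV A = {emv_a:.0f}, EMV B = {emv_b:.0f}")
```

Under these assumptions the second project has the higher EMV, so it would be the branch kept on the decision tree.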
Example
A company is considering whether to launch a new product. The success of the idea
depends on the ability of a competitor to bring out a competing product (estimated at
60%) and the relationship of the competitor's price to the firm's price.
The table below shows the profits for each price range that could be set by the company
related to the possible competing prices.
The company must set its price first because its product will be on the market earlier, so
that the competitor will be able to react to the price. Estimates of the probability of a
competitor's price are shown below:
Required:
i. Draw a decision tree and analyze the problem
ii. Recommend what the company should do
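[Figure: decision tree for the product launch; values as they survive from the diagram.
D1: market the product (EV = 61.92) or don't market the product (0).
No competition (probability 0.4), decision D2: MP = 70, HP = 90, so EV = 90.
Competition (probability 0.6), decision D3, EV = 43.2:
  LP: EV = 32.55 (LP* 0.8, 30; MP* 0.15, 42; HP* 0.05, 45)
  MP: EV = 43.2 (LP* 0.2, 34; MP* 0.7, 45; HP* 0.1, 49)
  HP: EV = 42.8
Market product: 0.4 × 90 + 0.6 × 43.2 = 61.92]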
Example
A company is planning on drilling for oil. It can either drill immediately or carry out
some preliminary tests. Alternatively, the company could also sell the rights to the site to
another company. It has created the following decision tree of the problem;
Example:
Montana Electronics is a company producing Delux television sets. It is contemplating
launching a new model, the Super view. There are several possibilities that could be
opted for;
i) Continue producing Delux which has profits declining at 10% per annum on a
compounding basis. Last year its profit was KShs. 60,000/=.
ii) Launch Super view without any prior market research. If sales are high annual profit
is put at KShs. 90,000/= with a probability which from past data is put at 0.7. Low
sales have 0.3 probability and estimated profit of KShs. 30,000/=.
iii) Launch Super view with prior market research costing KShs. 30,000/=. The market
research will indicate whether future sales are likely to be 'good' or 'bad'. If the
research indicates 'good' then the management will spend KShs. 35,000/= more on
capital equipment and this will increase annual profits to KShs. 100,000/= if sales are
actually high. If however sales are actually low, annual profits will drop to KShs.
25,000/=. Should market research indicate 'good' and management not spend more
on promotion, the profit levels will be as for the 2nd scenario above.
iv) If the research indicates 'bad' then the management will scale down their expectations
to give annual profit of KShs. 50,000/= when sales are actually low, but because of
capacity constraints, if sales are high profit will be KShs. 70,000/=. Past history of the
market research company indicated the following results.
Actual sales
High Low
Predicted Good 0.8* 0.1
sales level Bad 0.2 0.9
*When actual sales were high the market research company had predicted good sales
level 80% of the time.
Required:
Use a time horizon of 6 years to indicate to the management of the company which
option they should adopt (ignore the time value of money).
Solution
(a) First draw the decision tree diagram
[Figure: decision tree for Montana Electronics.
Option 1, Delux: 60,000 (declining).
Option 2, Super View (node A): High 0.7, 90,000; Low 0.3, 30,000.
Option 3, Market research:
  Good, extra 35,000 spent (node B): P(H|G) = 0.95, 100,000; P(L|G) = 0.05, 25,000.
  Good, no extra spending (node C): P(H|G) = 0.95, 90,000; P(L|G) = 0.05, 30,000.
  Bad (node D): P(H|B) = 0.34, 70,000; P(L|B) = 0.66, 50,000.]
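Computations; note how the probability figures are arrived at.
The decision tree requires the following probabilities for the sales outcome:
P(H|G), P(L|G), P(H|B) and P(L|B).
Given:
P(G|H) = 0.8, P(B|H) = 0.2
P(G|L) = 0.1, P(B|L) = 0.9
P(H) = 0.7, P(L) = 0.3
Total probability of each prediction:
P(G) = P(G|H)P(H) + P(G|L)P(L) = 0.56 + 0.03 = 0.59
P(B) = P(B|H)P(H) + P(B|L)P(L) = 0.14 + 0.27 = 0.41
Applying Bayes' theorem:
$P(H|G) = \dfrac{P(G|H)\,P(H)}{P(G)} = \dfrac{0.56}{0.59} = 0.95$
$P(L|G) = \dfrac{P(G|L)\,P(L)}{P(G)} = \dfrac{0.03}{0.59} = 0.05$
$P(H|B) = \dfrac{P(B|H)\,P(H)}{P(B)} = \dfrac{0.14}{0.41} = 0.34$
$P(L|B) = \dfrac{P(B|L)\,P(L)}{P(B)} = \dfrac{0.27}{0.41} = 0.66$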
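The reversal of the conditional probabilities can be checked with a few lines of Python. This is a minimal sketch using the given figures; the helper name `posterior` and the dictionary layout are illustrative choices, not part of the original notes.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem.

    prior:      dict mapping each state to P(state)
    likelihood: dict mapping each state to P(signal | state)
    Returns (dict of P(state | signal), P(signal)).
    """
    joint = {state: likelihood[state] * prior[state] for state in prior}
    evidence = sum(joint.values())  # total probability of the signal
    return {state: joint[state] / evidence for state in joint}, evidence


prior = {"High": 0.7, "Low": 0.3}
p_good_given = {"High": 0.8, "Low": 0.1}  # P(G|H), P(G|L)
p_bad_given = {"High": 0.2, "Low": 0.9}   # P(B|H), P(B|L)

post_good, p_good = posterior(prior, p_good_given)
post_bad, p_bad = posterior(prior, p_bad_given)

for label, post in (("Good", post_good), ("Bad", post_bad)):
    for state, value in post.items():
        print(f"P({state} | {label}) = {value:.2f}")
```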
Evaluating financial outcome:
Option 1:
Last year Shs. 60,000 profits
Year                          Shs.
1 = 60,000 × 0.9^1 =      54,000.0
2 = 60,000 × 0.9^2 =      48,600.0
3 = 60,000 × 0.9^3 =      43,740.0
4 = 60,000 × 0.9^4 =      39,366.0
5 = 60,000 × 0.9^5 =      35,429.4
6 = 60,000 × 0.9^6 =      31,886.5
Total                    253,021.9
Option 2
Expected value of Super View
Node (A): 0.7(90,000 × 6) + 0.3(30,000 × 6)
= 378,000 + 54,000 = KShs. 432,000/=
Option 3
Expected value of market research
Therefore we chose option 2 since it has the highest EMV.
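The detailed working for Option 3 does not survive in these notes, so the sketch below recomputes all three options as a cross-check. It assumes the research fee (30,000) and the extra capital spending (35,000) are one-off costs, that annual profits are simply multiplied by the 6-year horizon, and it uses unrounded posterior probabilities; with the rounded figures 0.95, 0.05, 0.34 and 0.66 the Option 3 value shifts slightly but the ranking is unchanged. All names are illustrative.

```python
YEARS = 6

# Option 1: Delux profit declines 10% a year from last year's 60,000.
option_1 = sum(60_000 * 0.9 ** year for year in range(1, YEARS + 1))

# Option 2: launch Super View without research.
option_2 = 0.7 * 90_000 * YEARS + 0.3 * 30_000 * YEARS

# Option 3: research first; posteriors follow from Bayes' theorem.
p_good = 0.8 * 0.7 + 0.1 * 0.3      # P(G) = 0.59
p_bad = 1 - p_good                  # P(B) = 0.41
p_high_good = 0.8 * 0.7 / p_good    # P(H|G)
p_high_bad = 0.2 * 0.7 / p_bad      # P(H|B)


def branch(p_high, profit_high, profit_low, extra_cost=0):
    """Expected 6-year profit of a chance node, less any one-off cost."""
    annual = p_high * profit_high + (1 - p_high) * profit_low
    return annual * YEARS - extra_cost


good_extra = branch(p_high_good, 100_000, 25_000, extra_cost=35_000)
good_plain = branch(p_high_good, 90_000, 30_000)
bad = branch(p_high_bad, 70_000, 50_000)

# At the decision point after a 'good' report, take the better action.
option_3 = p_good * max(good_extra, good_plain) + p_bad * bad - 30_000

for name, value in (("Option 1", option_1), ("Option 2", option_2), ("Option 3", option_3)):
    print(f"{name}: KShs. {value:,.0f}")
```

On these assumptions Option 3 comes out a little below Option 2, which agrees with the conclusion stated above.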
11.0 PROBABILITY
P(H) = 1/2
P(T) = 1/2
Example
In a production run of 500 items, 15 items are found to be defective. If one item is
drawn at random, find the probability that it is defective
11.1.2 Relative Frequency Theory
This is also referred to as the Empirical Theory.
Repeated experiments are conducted and the result of each trial is recorded.
Trial          1 2 3 4 5 6 7 8 9
Observations   H H T H T T H H H
P(H) = 6/9 = 2/3
P(T) = 3/9 = 1/3
If n = number of trials, m = number of heads observed and q = number of tails
observed, then
P(H) = m/n = 6/9 = 2/3
P(T) = q/n = 3/9 = 1/3
Thus, as the number of trials increases, P(H) is taken as the limiting value of this fraction:
$P(H) = \lim_{n \to \infty} \dfrac{m}{n}$
The fraction m/n is called the relative frequency of the event in n trials.
Example
1000 tosses of a coin result in 519 heads and 481 tails.
Solution:
P(H) = 519/1000 = 0.519
P(T) = 481/1000 = 0.481, or P(T) = 1 – 0.519 = 0.481
11.1.3 Subjective Probability
11.1.4.1Probability Range
0 ≤ P (E) ≤ 1
Solution:
i. P(W) = 6/(6 + 24) = 1/5
ii. P(not W) = 24/(6 + 24) = 4/5
Or
P(not W) = 1 – 1/5 = 4/5
Thus if A represent the probability of an event happening, the probability that the event
will not happen is given as follows;
P (Event not happening) = 1 - P (Event happening)
=1-A
11.2 Probability Events
11.2.1 Mutually exclusive events
These are events which cannot happen at the same time, i.e. either one or the other
occurs. For example, when a coin is tossed either a head (H) or a tail (T) will occur but
not both; if a head occurs it excludes the possibility of a tail occurring.
11.2.2 Independent events
These are events where the occurrence of one does not affect the occurrence of the
other. For example, if a coin is tossed twice, the occurrence of a head in the first toss
does not affect the occurrence of a head in the second toss.
Hence P(H in the first toss) = ½ and P(H in the second toss) = ½
ii. Multiplication rule
iii. Conditional probability rule
Solution;
Let A and B represent the events that the residents read the morning and
evening papers respectively.
The probability that a resident reads either the morning or evening or both
the papers is given by;
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
If two events A and B are dependent in such a way that B occurs only after A has
occurred, then the probability of both occurring is given as follows:
P(A and B) = P(B|A) × P(A)
Example 1
What is the probability of getting a 3 and a 6 when a die is rolled twice?
P(3) = 1/6 and P(6) = 1/6
P(3 and 6) = 1/6 × 1/6 = 1/36
1/36 is the probability of getting a 3 followed by a 6. If the order is not important,
then either 3 followed by 6 or 6 followed by 3 is acceptable.
Hence:
P(3 and 6) = P(3 followed by 6) + P(6 followed by 3)
= (1/6 × 1/6) + (1/6 × 1/6)
= 1/36 + 1/36 = 1/18
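The two answers can be confirmed by listing all 36 equally likely ordered results of two rolls. This is a small illustrative sketch; the helper `probability` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered results of rolling a die twice.
rolls = list(product(range(1, 7), repeat=2))


def probability(event):
    """Fraction of the sample space for which event(roll) is true."""
    favourable = sum(1 for roll in rolls if event(roll))
    return Fraction(favourable, len(rolls))


p_three_then_six = probability(lambda roll: roll == (3, 6))
p_three_and_six_any_order = probability(lambda roll: sorted(roll) == [3, 6])

print(p_three_then_six, p_three_and_six_any_order)
```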
Example
1. The probability that one is called for an interview is 1/6. If one is called for the
interview, the probability of being successful is 3/10. Find the probability that
one is successful in the interview.
Solution:
Let S = the event that one is successful in the interview
C = the event that one is called for the interview
P(S|C) = P(S ∩ C) / P(C), so
P(S ∩ C) = P(C) × P(S|C) = 1/6 × 3/10 = 1/20
Using a tree diagram
[Tree diagram: called (1/6) then successful (3/10), giving 1/6 × 3/10 = 1/20, or unsuccessful (7/10); not called (5/6).]
Example : A math teacher gave her class two tests. 25% of the class passed both tests and
42% of the class passed the first test. What percent of those who passed the
first test also passed the second test?
Example : A jar contains black and white marbles. Two marbles are chosen without
replacement. The probability of selecting a black marble and then a white
marble is 0.34, and the probability of selecting a black marble on the first draw
is 0.47. What is the probability of selecting a white marble on the second draw,
given that the first marble drawn was black?
Example : The probability that it is Friday and that a student is absent is 0.03.
Since there are 5 school days in a week, the probability that it is Friday
is 0.2. What is the probability that a student is absent given that today is
Friday?
Solution:
$P(\text{Absent}\mid\text{Friday}) = \dfrac{P(\text{Friday and Absent})}{P(\text{Friday})} = \dfrac{0.03}{0.2} = 0.15$
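All three conditional probability examples above use the same rule, P(B|A) = P(A and B) / P(A). The sketch below applies it to each example; the function name is illustrative and the first two results are computed here rather than taken from the notes.

```python
def conditional(p_a_and_b, p_a):
    """Return P(B | A) = P(A and B) / P(A)."""
    if not 0 < p_a <= 1:
        raise ValueError("P(A) must be in (0, 1]")
    return p_a_and_b / p_a


# Passed the second test, given the first test was passed.
second_given_first = conditional(0.25, 0.42)
# White marble on the second draw, given black on the first draw.
white_given_black = conditional(0.34, 0.47)
# Absent, given that today is Friday.
absent_given_friday = conditional(0.03, 0.2)

print(f"{second_given_first:.3f} {white_given_black:.3f} {absent_given_friday:.2f}")
```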
Exercise
There are 100 students in a first year college intake. 36 are male and are studying
accounting, 9 are male and not studying accounting, 42 are female and studying
accounting, 13 are female and are not studying accounting.
Required:
i. Probability that a student is a male
ii. Probability that a student is a male and studying accounting
iii. Probability that a student is female and not studying accounting
iv. Probability that a student is studying accounting given that she is a female
v. Probability that a student is not studying accounting given that he is a male
This is concerned with the method of estimating the probabilities of the causes of an
observed event.
The process involves working backwards from effect to cause
Bayes' theorem is used in the analysis of decisions using decision trees where
information is given in the form of conditional probabilities and the reverse of these
probabilities must be found.
This theorem is also referred to as Bayes' rule and is given as follows:
$P(A|B) = \dfrac{P(B|A)\,P(A)}{P(B)}$
Example
A company has three production sections A, B and C which contribute 40%, 35% and
25% respectively, to a total output. The following percentages of faulty units have been
observed;
A 2% 0.02
B 3% 0.03
C 4% 0.04
There is a final check before output is dispatched. Calculate the probability that a unit
found faulty at this check has come from section A.
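Solution:
[Tree diagram: each section branches into faulty (F) and not faulty (F').]
Section A (0.40): F 0.02, F' 0.98; P(A and F) = 0.40 × 0.02 = 0.0080
Section B (0.35): F 0.03, F' 0.97; P(B and F) = 0.35 × 0.03 = 0.0105
Section C (0.25): F 0.04, F' 0.96; P(C and F) = 0.25 × 0.04 = 0.0100
P(F) = 0.0080 + 0.0105 + 0.0100 = 0.0285
$P(A|F) = \dfrac{P(A \text{ and } F)}{P(F)} = \dfrac{0.008}{0.0285} = 0.2807$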
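The same calculation can be laid out so that the probability for every section is obtained at once. This is a minimal sketch using the figures from the example; the dictionary names are illustrative.

```python
# Share of total output and fault rate for each production section.
share = {"A": 0.40, "B": 0.35, "C": 0.25}
fault_rate = {"A": 0.02, "B": 0.03, "C": 0.04}

# Joint probabilities P(section and faulty).
joint = {section: share[section] * fault_rate[section] for section in share}

# Total probability that a unit is faulty.
p_faulty = sum(joint.values())

# Bayes' theorem: P(section | faulty).
posterior = {section: joint[section] / p_faulty for section in joint}

for section, value in posterior.items():
    print(f"P({section} | faulty) = {value:.4f}")
```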
Example
The student body in a statistics class is 60 % males. The registration records show that
30% of the males attended private high schools and 62% of ladies attended public high
schools. A student involved in a case is known to have attended a public school. What is
the probability that the student is a male?
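Solution:
[Tree diagram: Male (0.60) branches into Private (0.30) and Public (0.70); Female (0.40) branches into Private (0.38) and Public (0.62).]
P(M and Pub) = 0.60 × 0.70 = 0.42
P(F and Pub) = 0.40 × 0.62 = 0.248
P(Pub) = 0.42 + 0.248 = 0.668
$P(M|\text{Pub}) = \dfrac{P(M \text{ and Pub})}{P(\text{Pub})} = \dfrac{0.42}{0.668} = 0.6287$
For comparison, P(Pub|M) = P(M and Pub)/P(M) = 0.42/0.60 = 0.70 is the proportion of
males who attended public schools, which is not the probability asked for.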
The expected value of an event is the product of its probability and the outcome or value
of the event over a series of trials.
It is used by management to make decisions, especially where there are many competing
alternatives or options.
Solution:
Monthly profit in Probability
Kshs (x) (p) px
10,000 0.70 7,000
20,000 0.30 6,000
∑px = 13,000
Example
Two projects are being considered and the project data has been estimated as follows;
PROJECT A PROJECT B
Kshs PROBABILITY Kshs PROBABILITY
OPTIMISTIC OUTCOME 6,000 0.2 6,500 0.1
MOST LIKELY OUTCOME 3,500 0.5 4,000 0.6
PESSIMISTIC OUTCOME 2,500 0.3 1,000 0.3
Required;
The expected value of each project
Solution;
PROJECT A PROJECT B
Kshs P EV Kshs P EV
OPTIMISTIC OUTCOME 6000 0.2 1200 6500 0.1 650
MOST LIKELY OUTCOME 3500 0.5 1750 4000 0.6 2400
PESSIMISTIC OUTCOME 2500 0.3 750 1000 0.3 300
Project EV 3700 3350
On the basis of expected value, project A would be preferred because it has a higher
value
Exercise
1. A company's sales for a new product are subject to uncertainty. It has determined a
range of possible outcomes over the first two years.
Year 1
Sales Kshs m %
High 40 60
Low 20 40
Year 2 : (i) If year 1 sales are high
Sales Kshs m %
High 80 90
Low 30 10
(ii) If year 1 sales are low
Sales Kshs m %
High 30 20
Low 10 80
Required:
Calculate the expected value for each year
2. Three groups of children contain 3 girls and 1 boy, 2 girls and 2 boys, 1 girl and 3
boys respectively. One child is selected at random from each group. Show that the
probability that the three children selected consist of 1 girl and 2 boys is 13/32.
3. A candidate is selected for interview for a management trainee post at three companies.
For the first company there are 12 candidates, for the second company there are 15
candidates and for the third company there are 10 candidates. What is the probability
of his getting a post in at least one of the companies?
12.0 PROBABILITY DISTRIBUTIONS
The binomial probability distribution is usually characterized by the fact that the binomial
events have to fulfill the following properties
i. Each event has 2 possible outcomes only known as success or failure
ii. The probability of each outcome is independent of the previous outcomes
iii. The sample size is generally fixed
iv. The probabilities of success and failure tend to approach 0.5 if the sample size
increases (in the event when an unbiased coin is thrown a number of times)
v. The probabilities are given by the following equation
$P(r) = {}^{n}C_{r}\, p^{r} (1-p)^{n-r} = \dfrac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$
Where p = Probability of success
r = no. of successes
n = sample size
q = 1 – p = Probability of failure
Example 1
A medical survey was conducted in order to establish the proportion of the population
which was infected with cancer. The results indicated that 40% of the population were
suffering from the disease.
A sample of 6 people was later taken and examined for the disease. Find the probability
that the following outcomes were observed
a) Only one person had the disease
b) Exactly two people had the disease
c) At most two people had the disease
d) At least two people had the disease
e) Three or four people had the disease
Solution
P(a person having cancer) = 40% = 0.4 = p
P(a person not having cancer) = 60% = 0.6 = 1 – p = q
a) P(only one person having cancer)
$= {}^{6}C_{1} (0.4)^{1} (0.6)^{5} = \dfrac{6!}{5!\,1!} (0.4)^{1} (0.6)^{5} = 0.1866$
Note that in the formula ${}^{n}C_{r}\, p^{r} q^{n-r}$: n = sample size = 6
p = 0.4
r = 1 = only one person having cancer
b) P(2 people had the disease)
$= {}^{6}C_{2} (0.4)^{2} (0.6)^{4} = \dfrac{6!}{4!\,2!} (0.4)^{2} (0.6)^{4} = \dfrac{6 \times 5 \times 4!}{4! \times 2 \times 1} (0.4)^{2} (0.6)^{4}$
$= 15 \times (0.4)^{2} (0.6)^{4} = 0.311$
c) P(at most 2) = P(0) or P(1) or P(2) = P(0) + P(1) + P(2)
So we calculate the probability of each and add them up.
P(0) = P(nobody having cancer)
$= {}^{6}C_{0} (0.4)^{0} (0.6)^{6} = \dfrac{6!}{0!\,6!} (0.4)^{0} (0.6)^{6} = (0.6)^{6} = 0.0467$
The probabilities P(1) and P(2) have been worked out in parts (a) and (b).
Therefore P(at most 2) = 0.0467 + 0.1866 + 0.311 = 0.5443
d) P(at least 2)
= P(2) + P(3) + P(4) + P(5) + P(6)
= 1 – [P(0) + P(1)]. This is a shorter way of working out the solution since
P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1
= 1 – (0.0467 + 0.1866)
= 0.7667
e) P(3 or 4 people had the disease)
= P(3) + P(4)
$= {}^{6}C_{3} (0.4)^{3} (0.6)^{3} + {}^{6}C_{4} (0.4)^{4} (0.6)^{2}$
$= \dfrac{6!}{3!\,3!} (0.4)^{3} (0.6)^{3} + \dfrac{6!}{2!\,4!} (0.4)^{4} (0.6)^{2}$
$= 20 (0.4)^{3} (0.6)^{3} + 15 (0.4)^{4} (0.6)^{2}$
= (20 × 0.013824) + (15 × 0.009216)
= 0.27648 + 0.13824
= 0.41472
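The five answers above can be reproduced from the binomial formula directly. The sketch below is a minimal check with n = 6 and p = 0.4; it relies on `math.comb`, which is available in Python 3.8 and later, and the variable names are illustrative.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(r successes in n trials), each with success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 6, 0.4
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

exactly_one = pmf[1]
exactly_two = pmf[2]
at_most_two = sum(pmf[:3])           # P(0) + P(1) + P(2)
at_least_two = 1 - pmf[0] - pmf[1]   # complement of P(0) + P(1)
three_or_four = pmf[3] + pmf[4]

print(f"{exactly_one:.4f} {exactly_two:.4f} {at_most_two:.4f} "
      f"{at_least_two:.4f} {three_or_four:.5f}")
```

Changing `n` and `p` to 5 and 2/3 gives the probabilities for the insurance example that follows.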
Example
An insurance company takes a keen interest in the age at which a person is insured.
Consequently a survey conducted on prospective clients indicated that for clients having
the same age the probability that they will be alive in 30 years' time is 2/3. This
probability was established using the actuarial tables. If a sample of 5 people was insured
now, find the probability of having the following possible outcomes in 30 years
a) All are alive
b) At least 3 are alive
c) At most one is alive
d) None is alive
e) At least 1 is alive
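Sample size n = 5
P(alive) = p = 2/3, whereas P(not alive) = q = 1/3
a) P(all alive) = P(r = 5)
$= {}^{5}C_{5} \left(\tfrac{2}{3}\right)^{5} \left(\tfrac{1}{3}\right)^{0} = \dfrac{5!}{5!\,0!} \left(\tfrac{2}{3}\right)^{5} = \dfrac{32}{243}$
b) P(at least 3 alive) = P(r ≥ 3) = P(3) or P(4) or P(5) = P(3) + P(4) + P(5)
$P(3) = {}^{5}C_{3} \left(\tfrac{2}{3}\right)^{3} \left(\tfrac{1}{3}\right)^{2} = \dfrac{5!}{3!\,2!} \left(\tfrac{2}{3}\right)^{3} \left(\tfrac{1}{3}\right)^{2} = 10 \times \dfrac{8}{243} = \dfrac{80}{243}$
$P(4) = {}^{5}C_{4} \left(\tfrac{2}{3}\right)^{4} \left(\tfrac{1}{3}\right)^{1} = \dfrac{5!}{4!\,1!} \left(\tfrac{2}{3}\right)^{4} \left(\tfrac{1}{3}\right) = 5 \times \dfrac{16}{243} = \dfrac{80}{243}$
$P(5) = \dfrac{32}{243}$
$P(r \ge 3) = \dfrac{80}{243} + \dfrac{80}{243} + \dfrac{32}{243} = \dfrac{192}{243}$
c) P(at most one alive) = P(r ≤ 1) = P(0) + P(1)
$= \dfrac{5!}{0!\,5!} \left(\tfrac{1}{3}\right)^{5} + \dfrac{5!}{1!\,4!} \left(\tfrac{2}{3}\right)^{1} \left(\tfrac{1}{3}\right)^{4} = \dfrac{1}{243} + \dfrac{10}{243} = \dfrac{11}{243}$
d) P(none alive) = P(0) = 1/243
e) P(at least 1 alive) = P(r ≥ 1) = 1 – P(none alive)
$= 1 - \dfrac{1}{243} = \dfrac{242}{243}$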
12.1.2 POISSON PROBABILITY DISTRIBUTION
This is a set of probabilities obtained for discrete events which are described as
being rare. Such events are similar to those of the binomial distribution but have very
low probabilities and a large sample size.
Examples of such events in business are as follows:
i. Telephone congestion at midnight
ii. Traffic jams on certain roads at 9 o'clock at night
iii. Sales boom
iv. Attaining an age of 100 years (centenarian)
Poisson probabilities are frequently applied in business situations in order to determine
the numerical probabilities of such events occurring.
The formula used to determine such probabilities is as follows
$P(x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$
Where x = No. of successes
λ = mean no. of successes in the sample (λ = np)
e = 2.718
Example
A manufacturer assures his customers that the probability of having defective item is
0.005. A sample of 1000 items was inspected. Find the probabilities of having the
following possible outcomes
i. Only one is defective
ii. At most 2 defective
iii. More than 3 defective
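$P(x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$
λ = np = 1000 × 0.005 = 5
i. P(only one is defective) = P(1) = P(x = 1)
$= \dfrac{2.718^{-5} \times 5^{1}}{1!}$ (note that $2.718^{-5} = \dfrac{1}{2.718^{5}}$)
$= \dfrac{5}{2.718^{5}} = \dfrac{5}{148.33} = 0.0337$
ii. P(at most 2 defective) = P(x ≤ 2)
= P(0) + P(1) + P(2)
$P(0) = \dfrac{e^{-5} \times 5^{0}}{0!} = 2.718^{-5} = \dfrac{1}{148.336} = 0.00674$
P(1) = 0.0337
$P(2) = \dfrac{2.718^{-5} \times 5^{2}}{2!} = \dfrac{25}{2 \times 148.336} = 0.08427$
P(x ≤ 2) = 0.00674 + 0.0337 + 0.08427 = 0.12471
iii. P(more than 3 defective) = P(x > 3)
= 1 – [P(0) + P(1) + P(2) + P(3)]
$P(3) = \dfrac{2.718^{-5} \times 5^{3}}{3!} = \dfrac{125}{6 \times 148.336} = 0.14045$
P(x > 3) = 1 – (0.12471 + 0.14045) = 0.7348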
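The Poisson probabilities for this example can be checked with the standard library. This is a minimal sketch with λ = 5; it uses the exact value of e rather than 2.718, so the last decimal place may differ slightly from hand working. Names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(x events) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)


lam = 1000 * 0.005  # lambda = np = 5

only_one = poisson_pmf(1, lam)
at_most_two = sum(poisson_pmf(x, lam) for x in range(3))          # x = 0, 1, 2
more_than_three = 1 - sum(poisson_pmf(x, lam) for x in range(4))  # 1 - P(x <= 3)

print(f"{only_one:.4f} {at_most_two:.4f} {more_than_three:.4f}")
```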
Example
A firm is manufacturing 45,000 units of nuts. The probability of having a defective nut is
0.15
Calculate the following
i. The expected no. of defective nuts
ii. The variance and standard deviation of the defective nuts in a daily consignment
of 45,000
Solution
Sample size n = 45,000
P(defective) = 0.15 = p
P(non defective) = 0.85 = q
i. ∴ the expected no of defective nuts
= 45,000 × 0.15 = 6,750
ii. The variance = npq
= 45000 × 0.85 × 0.15
= 5737.50
The standard deviation = $\sqrt{npq}$
= $\sqrt{5737.50}$
= 75.75
Example
The probability of a rare disease striking a given population is 0.003. A sample of 10000
was examined. Find the expected no. suffering from the disease and hence determine the
variance and the standard deviation for the above problem
Solution
Sample size n = 10000
P(a person suffering from the disease) = 0.003 = p
∴ expected number of people suffering from the disease
Mean = λ = np = 10000 × 0.003 = 30
Variance = np = λ = 30
Standard deviation = $\sqrt{np} = \sqrt{\lambda} = \sqrt{30}$ = 5.477
In a continuous distribution, the variable can take any value within a specified range, e.g.
2.21 or 1.64 compared to the specific values taken by a discrete variable e.g 1 or 3. The
probability is represented by the area under the probability density curve between the
given values.
12.2 CONTINUOUS PROBABILITY DISTRIBUTIONS
The uniform distribution, the normal probability distribution and the exponential
distribution are examples of a continuous distribution
[Figure: normal probability distribution curve with a central line of symmetry; horizontal axis: Age (Yrs)]
ii. The line of symmetry divides the curve into two equal halves
iii. The two ends of the normal distribution curve continuously approach the horizontal
axis but they never cross it
iv. The values of the mean, mode and median are all equal
NB: The above distribution curve is referred to as normal probability distribution curve
because if a frequency distribution curve is plotted from measurements of a given sample
drawn from a normal population then a graph similar to the normal curve must be
obtained.
vi) It should be noted that about 68% of any normal population lies within one standard deviation of the mean, ±1σ
vii) About 95% lies within two standard deviations, ±2σ
viii) About 99.7% lies within three standard deviations, ±3σ
12.2.1.2STANDARDIZATION OF VARIABLES
Before we use the normal distribution curve to determine probabilities of continuous
variables, we need to standardize the original units of measurement by using the
following formula.
$Z = \dfrac{x - \mu}{\sigma}$
Where x = Value to be standardized
Z = Standardized value of x
µ = population mean
σ = Standard deviation
Example
A sample of students had a mean age of 35 years with a standard deviation of 5 years. A
student was randomly picked from a group of 200 students. Find the probability that the
age of the student turned out to be as follows
i. Lying between 35 and 40
ii. Lying between 30 and 40
iii. Lying between 25 and 30
iv. Lying beyond 45 yrs
v. Lying beyond 30 yrs
vi. Lying below 25 years
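Solution
(i) The standardized value for 35 years is
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{35 - 35}{5} = 0$
and the standardized value for 40 years is Z = (40 – 35)/5 = +1.
Hence P(35 < x < 40) is the area between Z = 0 and Z = 1, which is 0.3413 (these values
are read from the normal tables; see appendix).
(ii) The standardized value for 30 years is
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{30 - 35}{5} = -1$
By symmetry the area between Z = –1 and Z = 0 is also 0.3413, so
P(30 < x < 40) = 0.3413 + 0.3413 = 0.6826.
(iv) P(beyond 45 years) = P(x > 45) is determined as follows:
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{45 - 35}{5} = \dfrac{10}{5} = +2$
The area between Z = 0 and Z = 2 is 0.4772, so P(x > 45) = 0.5 – 0.4772 = 0.0228.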
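All six parts of this example can be checked without tables by using the error function, since the standard normal cumulative probability is Φ(z) = ½[1 + erf(z/√2)]. The sketch below assumes ages are normally distributed with mean 35 and standard deviation 5, as in the example; the function and variable names are illustrative.

```python
from math import erf, sqrt

MEAN, SD = 35, 5


def normal_cdf(x, mean=MEAN, sd=SD):
    """P(X <= x) for a normal distribution, via the error function."""
    z = (x - mean) / sd  # standardize
    return 0.5 * (1 + erf(z / sqrt(2)))


between_35_40 = normal_cdf(40) - normal_cdf(35)
between_30_40 = normal_cdf(40) - normal_cdf(30)
between_25_30 = normal_cdf(30) - normal_cdf(25)
beyond_45 = 1 - normal_cdf(45)
beyond_30 = 1 - normal_cdf(30)
below_25 = normal_cdf(25)

for label, value in [("35-40", between_35_40), ("30-40", between_30_40),
                     ("25-30", between_25_30), (">45", beyond_45),
                     (">30", beyond_30), ("<25", below_25)]:
    print(f"P({label}) = {value:.4f}")
```

Small differences from table-based answers in the fourth decimal place come from the rounding of the printed table values.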
Examples
1. The length of time until an electronic device fails
2. The time required to wait for the first emission of a particle from a radio
active source
3. The length of time between successive accidents in a large factory
Assume that a probability density function f(x) is valid between the values a and b, then
(i) $\int_a^b f(x)\,dx = 1$, i.e. the area under the curve is equal to 1
(ii) The mean of the distribution $E(x) = \int_a^b x f(x)\,dx$
(iii) The variance of the distribution = $E(x^2) - [E(x)]^2$
Where $E(x^2) = \int_a^b x^2 f(x)\,dx$
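Solution
i) $\int_0^1 f(x)\,dx = 1$
$\int_0^1 kx\,dx = k\left[\tfrac{1}{2}x^2\right]_0^1 = \tfrac{k}{2}(1 - 0) = 1$
k = 2
ii) Mean $= E(x) = \int_0^1 x f(x)\,dx = \int_0^1 2x^2\,dx = \left[\tfrac{2}{3}x^3\right]_0^1 = \tfrac{2}{3}$
iii) Variance $= E(x^2) - [E(x)]^2 = \int_a^b x^2 f(x)\,dx - (\text{Mean})^2$
$= \int_0^1 x^2 (2x)\,dx - \left(\tfrac{2}{3}\right)^2 = \left[\tfrac{2}{4}x^4\right]_0^1 - \tfrac{4}{9} = \tfrac{1}{2} - \tfrac{4}{9}$
Variance $= \tfrac{1}{18}$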
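The results of this solution (k = 2, mean 2/3, variance 1/18) can be verified numerically. The sketch below assumes the density is f(x) = 2x on 0 ≤ x ≤ 1, as the working implies, and uses a simple midpoint rule in place of exact integration; the `integrate` helper is illustrative, not a library function.

```python
def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))


def pdf(x):
    """Assumed density f(x) = 2x on [0, 1]."""
    return 2 * x


area = integrate(pdf, 0, 1)                             # should be 1
mean = integrate(lambda x: x * pdf(x), 0, 1)            # E(x)
second_moment = integrate(lambda x: x * x * pdf(x), 0, 1)  # E(x^2)
variance = second_moment - mean ** 2

print(f"area={area:.4f} mean={mean:.4f} variance={variance:.4f}")
```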
Example
The mean life of an electrical component is 100 hours and its life has an exponential
distribution.
Find
a. The probability that it will last less than 60 hours
b. The probability that it will last more than 90 hours
Solution
A continuous random variable X has an exponential distribution if, for some constant
k > 0, it has the probability density function
$f(x) = \begin{cases} k e^{-kx} & \text{for } x \ge 0 \\ 0 & \text{elsewhere} \end{cases}$
The function f(x) is positive for all x ≥ 0 and the area under the curve is
$\int_0^\infty f(x)\,dx = \int_0^\infty k e^{-kx}\,dx = 1$
The mean of an exponential distribution with parameter k is 1/k and its variance is 1/k².
Example
The mean of an exponential distribution is 100, find;
a) P(x<60)
b) P(x>90)
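Solution
Mean = 1/k = 100, thus k = 1/100.
a) $P(x < 60) = \int_0^{60} \tfrac{1}{100} e^{-x/100}\,dx = \left[-e^{-x/100}\right]_0^{60} = 1 - e^{-0.6} = 0.45$
b) $P(x > 90) = 1 - P(x \le 90) = 1 - \int_0^{90} \tfrac{1}{100} e^{-x/100}\,dx = 1 - \left[-e^{-x/100}\right]_0^{90} = 1 - (1 - e^{-0.9}) = e^{-0.9} = 0.41$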
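Because the exponential integral has the closed form P(X ≤ x) = 1 – e^(–kx), both answers can be checked in a couple of lines. This is a minimal sketch with mean 100 hours; the function name is illustrative.

```python
from math import exp


def exponential_cdf(x, mean):
    """P(X <= x) for an exponential distribution with the given mean (k = 1/mean)."""
    if x < 0:
        return 0.0
    k = 1 / mean
    return 1 - exp(-k * x)


less_than_60 = exponential_cdf(60, mean=100)      # 1 - e^(-0.6)
more_than_90 = 1 - exponential_cdf(90, mean=100)  # e^(-0.9)

print(f"P(x < 60) = {less_than_60:.2f}, P(x > 90) = {more_than_90:.2f}")
```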
12.2.3.1Properties of t distribution
i) The t distribution ranges from – ∞ to ∞, just as the normal distribution does
ii) The t distribution like the standard normal distribution is bell shaped and
symmetrical around mean zero
iii) The shapes of the t distribution changes as the number of degrees of freedom
changes
iv) The t distribution has a lower peak and heavier tails than the normal distribution
v) The t distribution has a greater dispersion than the standard normal distribution.
As n gets larger the t distribution approaches the normal distribution; when n =
30 the difference is very small
Relation between the t distribution and the standard normal distribution is shown in the
following diagram
[Figure: t distribution and standard normal curves, horizontal axis from -4 to 4]
Note that the t distribution has different shapes depending on the size of the sample.
When the sample is quite small the height of the t distribution is shorter than the normal
distribution and the tails are wider.
Assumptions of t distribution
1. The sample observations are random
2. Samples are drawn from normal distribution
3. The size of sample is thirty or less n ≤ 30
Application of t distribution
- Estimation of population mean from small samples
- Test of hypothesis about the population mean
- Test of hypothesis about the difference between two means
Chi square was first used by Karl Pearson in 1900. It is denoted by the Greek letter χ². It
contains only one parameter, called the number of degrees of freedom (d.f.), where the
term degrees of freedom represents the number of independent random variables that
express the chi square.
[Figure: chi-square density curves, P(x) against χ² from 0 to 10, for ν = 1, 2, 3, 4 and 5 degrees of freedom]
Where normal population means are unknown
n1 – sample size of independent random sample 1
n2 – sample size of independent random sample 2
s1² – sample variance of sample 1
s2² – sample variance of sample 2
σ1² – population variance of population 1
σ2² – population variance of population 2
$s_1^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}$ as the unbiased estimator of σ1²
$s_2^2 = \dfrac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$ as the unbiased estimator of σ2²
The F distribution has n1–1 and n2–1 degrees of freedom. It depends on the
degrees of freedom ν1 for the numerator and ν2 for the denominator. It has parameters
ν1 and ν2 such that different values of ν1 and ν2 give different distributions.
Properties of the F distribution
1. The shape of the F distribution depends upon the number of degrees of
freedom
2. The mean and variance of the F distribution are
$\text{Mean} = \dfrac{\nu_2}{\nu_2 - 2}$ for ν2 > 2
$\text{Variance} = \dfrac{2\nu_2^{2}(\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^{2} (\nu_2 - 4)}$ for ν2 > 4
Assumptions
a) All sample observations are randomly selected and independent
b) The total variance of the various sources of variance should be additive.
c) The ratio of S12 to S22 should be equal to or greater than 1
d) The population for each sample must be normally distributed with identical mean
and variance
e) F value can never be negative
REVISION QUESTIONS AND TABLES
SET 1
QUESTION ONE
QUESTION TWO
The table below gives the monthly rent paid by 160 employees in Company X
10,000 – 15,999 20
16,000 – 21,999 34
22,000 – 27,999 45
28,000 – 33,999 23
34,000 – 39,999 22
40,000 – 45,999 16
QUESTION THREE
The table below shows the distribution of the weight of students in a class of 100.
30 – 39 5
40 – 49 10
50 – 59 35
60 – 69 28
70 – 79 13
80– 89 9
Construct:
(a) An Ogive
(b) Estimate the median
(c) Estimate the 7th Decile
(d) Estimate the 90th Percentile
QUESTION FOUR
Box A contains 4 defective and 6 non-defective items. Box B contains 3 defective and 7
non-defective items. A box is picked at random and then an item is drawn from it.
i) Draw a tree diagram to represent the information.
Find the probability:
ii) Of drawing a defective item.
iii) Of drawing a non-defective item.
iv) That box A was picked given that a defective item has been drawn.
v) That box B was picked given that a non-defective item was drawn.
QUESTION FIVE
b) The prices in Kshs per Kg and consumption in tonnes of some retail products in a
certain region between 2011 and 2012 were as shown in the table below;
Product 2011 2012
A 36 99 39 94
B 79 11 89 9
C 44 15 40 18
D 4 1100 5 1200
QUESTION SIX
22 73 52 65 34 46 25 36 43 61 72 23 50 46 64 72 68 55 61 75 67 27
58 66 70 49 60 77 64 21 53 35 60 57 66 73 75 78 79 37 48 45 61 63
71 67 76 68 59
a) Organize the data in a frequency distribution table with 20 – 29 as the first class.
SET 2
QUESTION ONE
High response rate enhances both the validity and reliability of the research findings.
Discuss how you would enhance response rate in a survey.
QUESTION TWO
Differentiate between the following;
i) Stratified sampling and cluster sampling
ii) Systematic sampling and snowball sampling
QUESTION THREE
The following set of data represents a frequency distribution of the number of foreign
Required;
a) The Arithmetic Mean
b) The Median
c) The Mode
d) The Variance and standard deviation
e) The coefficient of skewness
SET 3
QUESTION ONE
QUESTION TWO
a) The frequency distribution below shows the mass of some flowers produced in a
farm off Limuru road in the month of October 2012.
Frequency (f) 7 14 22 13 6
QUESTION THREE
24 72 81 96 34 83 48 38 46 25
28 36 79 86 27 62 73 55 44 24
25 54 35 75 14 64 55 67 63 66
36 72 49 52 54 48 49 52 42 46
a) Construct a frequency distribution table with the class interval 10-20, being
the first class.
b) Represent the above data on a histogram
c) A frequency polygon
d) Briefly explain the distinction between relative frequency and frequency
distribution .
QUESTION FOUR
The table below shows the distribution of the weight of students in a class of 100.
Compute:
(e) Mean absolute deviation
(f) Median
(g) Mode
(h) Standard deviation
QUESTION FIVE
QUESTION SIX
b) A group of 50 students was asked which of three daily newspapers they read,
Nation Standard and Star. The results showed that 25 read Nation, 16 read
Standard and 14 read Star, 5 read both Nation and Standard, 4 read Standard and
Star while 6 read Nation and Star and 2 read all the three.
i) Illustrate these data on Venn diagram.
ii) Find the probability that a person selected at random from this group reads;
i) At least 1 of the newspapers.
ii) None of the newspapers.
iii) Only 1 of the newspapers.
iv) Only the Nation.
QUESTION SEVEN
b) The prices in Kshs per Kg and consumption in tonnes of some retail products in a
certain region between 2010 and 2011 were as shown in the table below;
A 2 40 5 80
B 4 20 8 50
C 4 10 4 25
D 5 20 10 60
E 8 75 12 90
MATHEMATICAL TABLES
Student's t-distribution
Z DISTRIBUTION TABLES
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998