
Herman Karanja Mwangi Final Submission PDF

This document outlines a course on business statistics. It provides the course title, code, credit hours, and lecturer. The course objective is to equip students with skills in collecting, organizing, presenting, and analyzing data. The course outline lists 13 topics to be covered across 13 weeks, including measures of central tendency, probability, and decision trees. Students will be evaluated based on CATs, assignments, and a final exam. Instruction methods include lectures, discussions, assignments, and presentations. Suggested reading texts are also provided.


COURSE TITLE: BUSINESS STATISTICS 1

COURSE CODE: BMA 210


3 CREDIT HOURS
Course lecturer: Herman K Mwangi

COURSE OBJECTIVE
This course equips the students with the necessary skills in collection, organization,
presentation and analysis of data.

COURSE OUTLINE
TOPIC AND CONTENT REMARKS

WEEK 1 Introduction and general overview


Definition of statistics and divisions of statistics
Importance, functions of statistics and nature of
statistical data
Statistics as tool of management
Limitations of statistics
WEEK 2 Collection of data
Sources of data
Methods of collecting primary data
WEEK 3 Classification and Tabulation
Classification of data
Tabulation of data
Discrete and continuous variables
Statistical series
Frequency distribution
WEEK 4 Presentation of data and levels of measurement
Presentation of data
Nominal scale
Ordinal scale
Interval scale
Ratio scale
WEEK 5 Measures of Central tendency
Arithmetic Mean
The Median
The Mode
Geometric mean
Harmonic mean
WEEK 6 CAT 1

WEEK 7 Measures of dispersion
The range
The mean absolute deviation
Population variance and standard deviation
Sample variance and standard deviation
Measures of Skewness
Measures of Kurtosis

WEEK 8 Index Numbers


Introduction.
Uses and problems in construction of index
numbers
WEEK 9 Index Numbers
Methods of constructing of index numbers
WEEK 10 CAT 2
WEEK 11 Probability
Approaches to probability: Classical approach,
relative frequency approach, guesstimate
approach, axioms of probability
Probability range, mutually exclusive events,
independent events, collectively exhaustive
events and complementary events
WEEK 12 Probability
Probability rules: addition law,
multiplication law and conditional
probability rule
Bayes' theorem
WEEK 13 Decision trees
Decision point, outcome point and end point
Construction of a decision tree

COURSE EVALUATION
1. CAT & Assignment 30%
2. Final Examination 70%

Total 100%

METHOD OF INSTRUCTION
1. Lectures
2. Class discussions and group discussions
3. Individual assignment
4. Presentations
SUGGESTED READING TEXTS

1. King'ori, G. K. (2004). Fundamentals of Applied Statistics. Jomo Kenyatta
Foundation.
2. Lucey, T. Quantitative Techniques.
3. Ott, R. L. & Longnecker, M. Statistical Methods and Data Analysis.
4. Kothari, C. R. (2009). Research Methodology: Methods & Techniques, 2nd
Edition. New Delhi.
5. Saunders, M., Lewis, P. and Thornhill (2009). Research Methods for Business
Students, 5th Edition. London.
6. Saleemi, N. A. (2006). Quantitative Techniques. Saleemi Publications Ltd,
Nairobi.
7. Saleemi, N. A. (2006). Business Mathematics and Statistics Simplified.
Saleemi Publications Ltd, Nairobi.

1.0 INTRODUCTION

1.1 Definition of Statistics

The word "statistics" has two meanings: one in the singular and one in the plural.


a) Statistics in singular: Scientific study of the principles and methods applied in
collection, organization, presentation, analysis and interpretation of numerical
data. (in any field of inquiry). Statistics as a field of study is concerned with
the following activities:
i) Collection, organization, presentation, and analysis of data
ii) Making inferences about a body of data when only a part of the
data is observed, and
iii) Interpretation and communication of the results of the first two
activities
b) Statistics in plural: Systematically collected and aggregated numerical data.
Statistical data have the following main characteristics
i) aggregates of facts e.g. total sales in a month
ii) multiplicity of causes e.g. total demand is a function of several
factors
iii) numerically expressed
iv) enumerated or estimated according to reasonable standards of
accuracy e.g. 90% accuracy
v) collected in a systematic manner
vi) collected for a predetermined purpose e.g. to determine economic
growth rate
vii) comparable i.e. placed in relation to each other e.g. chronological
(time wise) and geographical comparisons

1.2 Functions of Statistics

i. Definiteness: presentation of facts in a precise and definite form for
understanding, e.g. the economy grew by 6% in 2005
ii. Condensation: simplification of a mass of data into a few significant
figures, e.g. per capita income
iii. Comparison: facilitates comparison of figures
iv. Formulation and testing hypothesis
v. Prediction - forecasting
vi. Formulation of policies

1.3 Statistics in Business and Management


Group work 1.0
(NB: Students are grouped into small discussion groups of three to five people.
Each group discusses one of the application areas below using the role-play
approach.)

i. Marketing
ii. Production
iii. Finance
iv. Banking
v. Investment
vi. Purchase
vii. Accounting
viii. Planning and Control
ix. Quality control
x. Credit
xi. Personnel
xii. Research and development
1.4 Limitations of Statistics

1. Statistics does not deal with isolated measurement


2. Statistics deals only with quantitative characteristics
3. Statistical results are true only on average
4. Statistics is only a means and not an end.
5. Statistics can be misused

1. Discuss the limitations in detail with reference to Group work 1.0 given
earlier. Each group should give the limitations relevant to its area of discussion.
2. Is a national census important? Discuss this in terms of budgeting and policy
formulation in the country.

2.0 COLLECTION OF DATA

2.1 Statistical inquiries

In order to collect data for a particular investigation, it is important to take note of the
object and scope of the inquiry, the nature and type of inquiry, the statistical unit and the
degree of accuracy.

2.1.1 Object and Scope of Inquiry

Every investigation has its unique objective to be achieved and scope of coverage.
Before undertaking any investigation, determine its objective and extent of coverage.
This helps to save money and time and ensures that important data is not neglected.
Scope covers questions such as where, when and from whom the statistical data will be collected.

2.1.2 Nature and Types of Inquiry

Based on its nature, a statistical inquiry can be classified in any of the following ways:

2.1.3 Primary or Secondary

Primary inquiry: Data is collected for the first time by the investigators.
Secondary inquiry: Investigators use data that was collected by someone else for
some other purpose. Such data is available in research papers, magazines, journals,
etc.

2.1.4 Census or Sample

Census inquiry- Make a complete enumeration of all individuals of the universe


Sample inquiry-Make use of units selected to represent the universe.

2.1.5 Open or Confidential

Open inquiry: An inquiry whose results are not kept secret, i.e. they are shown to the public.
Confidential inquiry: An inquiry whose results are kept secret, i.e. they are not shown to the public.

2.1.6 Direct or Indirect

Direct inquiry: Data is measured directly, e.g. age, wages, etc.
Indirect inquiry: Data is measured indirectly, e.g. intelligence, which is measured through a test.

2.1.7 Regular or Ad hoc

Regular inquiry: Data is collected on a regular basis, i.e. periodically, e.g. a population census
taken every 10 years.
Ad hoc inquiry: Data is collected only as the need arises, e.g. in response to an increasing crime rate.

2.1.8 Initial or Repetitive

Initial inquiry- Being conducted for the first time


Repetitive inquiry- Is a repetition of previous inquiries with slight modifications.

2.1.9 Official, Semi-official or Non-official

Official inquiry: Conducted by the government, e.g. the National Manpower Survey by the
Ministry of Labour. This kind of inquiry is conducted under a given regulation or legislation.
Semi-official inquiry: Conducted by bodies that are supported by the government.
Non-official inquiry: Conducted by private organizations.

2.1.10 Statistical Units

These are the units used for collection of data. There are two types of statistical units.
These are;

2.1.10.1 Physical units: These are objective and used in day-to-day life, e.g.
kilograms, litres, metres, etc.

2.1.10.2 Arbitrary units: These are adopted by investigators for their own
use, e.g. wages, literate workers, etc. These units are subjective and are
therefore defined by the investigator, e.g. skilled workers.

2.1.10.3 Characteristic of statistical units


These are;
a) Suit the purpose of investigation
b) Should be stable
c) Should be homogenous
d) Should be defined correctly and clearly

2.1.11 Degree of Accuracy


Before undertaking any investigation, determine and state the standard of accuracy to
be achieved
Degree of accuracy depends on the nature and type of inquiry

2.2 Sources of Statistical Data


There are two main sources of data. These are;
i) Primary data
ii) Secondary data

2.3 Methods of Collecting Primary Data


The methods used to collect primary data include observation, personal interviews,
telephone interviews and questionnaires.

2.3.1 Observation Method
Investigator observes the objects and records the desired information without asking
any questions. This method is most suitable when validity of data collected by other
methods is questionable.

2.3.1.1 Advantages
i) Data collected are objective and generally more accurate.
ii) It is easier to note the effects of environmental influences on specific
outcomes.
iii) Does not rely on the willingness and ability of respondents to report
accurately.
iv) It's easier to observe certain groups of individuals (e.g. very young children
and extremely busy executives) from whom it may otherwise be difficult to
obtain information.
v) No biasing effect from interviewers or the phrasing of their questions.

2.3.1.2 Disadvantages
i) Inability to observe such things as attitudes, motivation and plans. These
factors can be observed only in so far as they are reflected in actions.
ii) The observer must be physically present (unless a camera or
other mechanical system can capture the event of interest), often for long periods
of time.
iii) This method is slow, tedious and expensive.
iv) The presence of the observer may influence the behaviour of the subject.

2.3.2 Personal Interview


Researcher asks respondents questions (face to face)
The questions can either be direct or indirect
The interview may be very structured or unstructured
Structured interview
i. The same questions are presented in the same manner and order to each subject.
ii. The choice of alternative answers is restricted to a predetermined list.
iii. The same introductory and concluding remarks are used.
iv. They are more scientific in nature.
v. They introduce controls that permit the formulation of scientific generalizations.
vi. Highly restrictive.

Unstructured interview
i. If preplanned questions are asked, they are altered to suit the situation and subjects.
ii. Subjects are encouraged to express their thoughts freely.
iii. The introductory and concluding remarks vary.
iv. They are less scientific in nature.
v. Only a few questions are asked to direct their answers.
vi. In some instances, the information is obtained in such a casual manner that the respondents are not aware they are being interviewed.
vii. They have few restrictions.

2.3.2.1 Advantages
i) Greater flexibility
ii) Greater control of the interview situation
iii) Opportunity to use probes

iv) Suitable for intensive investigation
v) High response rate
2.3.2.2 Disadvantages
i) Higher costs
ii) Lack of anonymity
iii) Reluctance to discuss sensitive topics
iv) Not suitable for extensive inquiry
v) Any bias by the investigator can damage the whole inquiry

2.3.3 Telephone Interview


Investigator asks respondents questions over a telephone
Most frequently used when information must be collected quickly and inexpensively,
and the amount of information required is limited
2.3.3.1 Advantages
i. Moderate costs
ii. Speed: can reach many respondents in a short time
iii. High response rate

2.3.3.2 Disadvantages
i. Reluctance to discuss sensitive topics on telephone
ii. Respondents can terminate interview before it is finished (hang-up)
iii. Difficult to collect supplementary information about the respondent

2.3.4 Questionnaire
A standard list of questions (questionnaire) relating to the problem is prepared.
It is a pre-formulated written set of questions to which respondents record their
answers.
The questionnaires are delivered and returned either:
i. Electronically: by email or online
ii. By post/mail
iii. As drop-and-pick questionnaires: delivered by hand to each respondent and
collected later. Usually the questionnaires are completed by the respondents themselves.

2.3.4.1 Features of a Good Questionnaire

The questions to be used are


i. Short and clear
ii. Few in numbers
iii. Seek definite answers
iv. Non-confidential
v. Logical sequence
vi. Relevant to the problem

2.3.4.2 Advantages
i. Low cost compared to other methods

ii. Reduced biasing error- Respondents are not influenced by interviewer
characteristics or techniques
iii. Greater anonymity- This is especially important when sensitive issues are
involved
iv. Greater accessibility (wide geographic contact at minimal cost)
v. Respondents have adequate time to think about their answers and/or
contact other sources.
vi. Suitable for extensive inquiries

2.3.4.3 Disadvantages

i. Require simple, easily understood questions and instructions.


ii. No opportunity for probing. They do not offer researchers the opportunity to
probe for additional information or to clarify answers.
iii. No control over who fills out the questionnaires
iv. Low response rate
v. Sequence bias (the respondent changes their mind after seeing later or earlier questions).

2.3.5 Focus Group Discussion

i. A focus group is an organized discussion session. A panel of people meets for
a short duration to exchange ideas, feelings, and experiences on a specific
topic.
ii. A trained facilitator, using group dynamics principles, guides participants
through the meeting.
iii. Increasingly used in social and business research, focus group meetings
enable a researcher to gain much information in a relatively short period of
time. Focus groups have been a mainstay in private sector marketing research
for the past three decades.
iv. The focus group allows for group interaction and greater insight into why
certain opinions are held.
v. Focus groups can improve the planning and design of new programs, provide
means of evaluating existing programs, and produce insights for developing
marketing strategies.

2.4 Secondary Data


Secondary data refers to data that have been previously collected for some
project other than the one at hand. These data are usually historical and already
assembled, and do not require access to respondents. They are usually obtained from
already existing reports.

2.4.1 Advantages
i. Are cheap
ii. Can be obtained quickly

iii. Can provide information which may not be available to a typical
researcher
iv. May be objective because it will not be biased.

2.4.2 Disadvantages
i. Information/data may be outdated
ii. Variation in definition of terms
iii. Units of measurement may be different
iv. Lack of information to verify the data's accuracy

2.5 Sampling

Sampling is the process of selecting a number of individuals for a study in such a way
that the individuals selected represent the larger group from which they were selected.
The individuals selected form the sample.
The larger group from which they were selected is the population (universe), e.g. all
the students in a college.
The purpose of sampling is to secure a representative group which will enable the
researcher to gain information about the population.
There are two methods that can be used to collect statistical data. These are:
i) Census method
ii) Sampling method

2.5.1 Census Method


In census method, all the units of the population or universe are examined.
2.5.1.1 Advantages of census method
i. Information from all units
ii. Greater accuracy
2.5.1.2 Disadvantages of census method
i. Expensive
ii. Time consuming

2.5.2 Sampling Method


A few units from the population or universe are selected for the purpose of data
collection
The results obtained on the basis of sampling are generalized to the population
2.5.2.1 Advantages of sampling method
i. Reduces cost of study
ii. Reduces time of study
iii. Allows greater supervision
iv. Necessary where observations, measurements or tests are destructive
v. Scarcity: sometimes only a small sample is available
2.5.2.2 Disadvantages of sampling
i. Unreliable if the sample is not representative
ii. Requires skills and experience

iii. Not applicable where a census is required

2.5.3 Sampling Process


i. Define the population (i.e. the group of interest in the study)
ii. Specify the sampling frame
iii. Specify the sampling units (individual, household, etc.)
iv. Determine the sample size
v. Select the sample using a sampling technique

2.5.3.1 Population
Population refers to the entire group of people, objects, events or things of interest
that the researcher wishes to investigate.
2.5.3.2 Sampling frame
This is a list of elements from which the sample is actually drawn.
Note: Perfect sampling frames are, however, rare.
2.5.3.3 Sampling unit
This is the basic unit containing the elements of the population to be sampled, e.g. females,
males, purchasing agents, etc.
2.5.3.4 Sample size
This is the number of elements that constitute the sample.

2.5.4 Methods of Sampling


There are two main methods:
i. Random or probability sampling method
ii. Non-random or non-probability sampling method

3.0 ORGANISATION OF DATA


Organization of data refers to the classification and tabulation of data

3.1 Importance of organization of data


i) Makes data comprehensible
ii) Provides for fast reading of data
iii) Aids statistical analysis

3.2 Classification of data


This is arrangement of related data/facts into different groups/classes with respect to
some characteristics (basis of classification)

3.2.1 Objectives of classification of data


The objectives of classification of data include;
i. Eliminate unnecessary details
ii. Highlight points of similarity and dissimilarity
iii. Comparison and inferences
iv. Highlight important aspects of the data

v. Utilize data for further statistical analysis

3.2.2 Types of classification of data


There are four types of data classification. These include;
i. Geographical or spatial classification – with respect to place/area e.g.
district
ii. Chronological or temporal classification- with respect to time
iii. Qualitative- on the basis of some attribute or quality such as sex, literacy etc
iv. Quantitative – on the basis of measurable characteristics such as age
Classification can also be divided into;
i. Simple classification- data divided into two classes only eg male & female,
literates & illiterates etc
ii. Manifold classification- data divided into classes and sub-classes eg. Male &
female then male literates & female literates

3.2.3 Tabulation of data


This is the systematic arrangement of statistical data in columns and rows
It simplifies the presentation and facilitates comparisons
Tables can either be;
i. Simple table-has data with only one characteristic
ii. Complex table-data with several characteristics

3.2.3.1 Merits of tabulation
i. Facilitate easy understanding of data
ii. Aid comparison of different classes
iii. Easy location of required data
iv. Avoidance of unnecessary details

3.2.3.2 Parts of a table


i. Table number
ii. Title of the table
iii. Caption – column headings
iv. Stub – row headings
v. Body of the table – contains numerical information
vi. Headnotes – explanatory statement related to all or most of table contents e.g.
units of measurement such as “in millions”
vii. Footnotes- clarify anything in the table or give source in case of secondary
data

Example
In January 2011, a firm employed 90 staff, of whom 79 were men. During the year 17
staff left, and 13 of these were men. The total recruitment during the year was 13, of
whom 3 were women. In 2012, wastage declined by 3 amongst men compared with
2011 and no women left; 6 more men but 2 fewer women were recruited than in the
previous year. The total number employed on 1st January 2013 amounted to 93.
Required;

Arrange the above information in concise tabular form showing all relevant totals and
sub-totals
                         2011                      2012
Employees          Men   Women   Total       Men   Women   Total
At 1 January        79      11      90        76      10      86
Recruited           10       3      13        16       1      17
Left              (13)     (4)    (17)      (10)       -    (10)
Total at year end   76      10      86        82      11      93
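
The arithmetic behind the table can be checked with a short Python sketch. The dictionary layout and the function name closing_staff are illustrative assumptions and are not part of the course text; the figures are those of the example above.

```python
# Opening staff on 1 January 2011, taken from the example above.
opening_2011 = {"men": 79, "women": 11}

# Recruitment and wastage for each year, as derived in the table.
movements = {
    2011: {"recruited": {"men": 10, "women": 3}, "left": {"men": 13, "women": 4}},
    2012: {"recruited": {"men": 16, "women": 1}, "left": {"men": 10, "women": 0}},
}

def closing_staff(opening, recruited, left):
    """Closing staff = opening + recruited - left, for each group."""
    return {g: opening[g] + recruited[g] - left[g] for g in opening}

staff = opening_2011
year_end = {}
for year in sorted(movements):
    staff = closing_staff(staff, movements[year]["recruited"], movements[year]["left"])
    year_end[year] = staff
    print(year, staff, "total:", sum(staff.values()))
```

The closing figures of one year become the opening figures of the next, which is why the 2011 "Total" row reappears as the first row of the 2012 columns.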

Exercise
The following report was prepared by an examination officer on the performance of
County X in a national examination. Out of 3500 male candidates below 20 years of
age, 500 passed and 3000 failed. Of the 1100 male candidates 20 years old and over
200 passed and 900 failed. As regards the female candidates, out of 500 below 20
years of age, 100 passed and 400 failed. Of the 340 females 20 years old and over, 80
passed and 260 failed.
Required;
Present the above information in tabular form

3.3 Variables
A variable is a measurable quantity which varies from one value to another,
e.g. price, production, temperature, etc.

3.3.1 Types of variables


i. Discrete variable
This is a variable that takes a countable number of distinct values,
e.g. number of children in a
family, number of books, number of classmates, etc.
ii. Continuous variable
This is a variable that can theoretically assume any value in a given interval,
e.g. 0 < x < 1.
The values of a continuous variable cannot be listed, e.g. exact time taken in a 100 m race,
heights of children, temperatures on a given day, etc.

Discrete variables are counted but continuous variables are measured.

3.4 Statistical series


What is a series?
A series is the arrangement of statistical data in a systematic manner

3.4.1 Types of series


i. Spatial series
ii. Time series

iii. Condition series

3.4.1.1 Spatial Series


This is data that is arranged on the basis of geographical location
Example of spatial series
Town Sales in Ksh. ’000s
Nairobi 520
Mombasa 470
Nakuru 350
Eldoret 340

3.4.1.2 Time Series


This refers to data that is arranged on the basis of time
Example of time series
Year Sales in Ksh. ’000s
1991 1,250
1992 1,430
1993 1,140
1994 1,350

3.4.1.3 Condition Series


This is data that is arranged on the basis of a given condition

Example
Product Sales in Ksh. ’000s
Sugar 650
Rice 420
Cosmetics 350
Crockery 280
Tin food 160

3.5 Frequency distribution
Frequency distribution is the grouping of statistical data according to size or
magnitude
It consists of class intervals and their corresponding frequencies
Its features include:
i. Number of classes: a minimum of 6 to 8 and a maximum of 20 to 25
ii. Class interval: the span of a class (upper limit – lower limit)
iii. Class limits and boundaries
iv. Class mid-point = (UCL + LCL)/2, where UCL and LCL are the upper and lower class limits
v. Class frequency: the number of values/items in each class
There are two methods of classifying the data according to class-intervals:
i. Exclusive method – upper limit of one class is the lower limit of
the next and is not included in that former class.
ii. Inclusive method – upper limit of one class is included in that
class itself and not repeated in the next class
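
The two methods can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function names, the marks and the class width of 10 are all hypothetical, and the inclusive version assumes whole-number data.

```python
def exclusive_classes(values, start, width, n_classes):
    """Exclusive method: classes lower-upper where a value equal to the
    upper limit is counted in the next class (lower <= v < upper)."""
    table = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        table[(lower, upper)] = sum(1 for v in values if lower <= v < upper)
    return table

def inclusive_classes(values, start, width, n_classes):
    """Inclusive method for whole numbers: classes such as 0-9, 10-19,
    where the upper limit belongs to the class itself."""
    table = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1
        table[(lower, upper)] = sum(1 for v in values if lower <= v <= upper)
    return table

def mid_point(lower, upper):
    """Class mid-point = (UCL + LCL) / 2."""
    return (upper + lower) / 2

marks = [3, 10, 12, 19, 20, 25, 29, 30, 38]  # hypothetical marks
exclusive = exclusive_classes(marks, start=0, width=10, n_classes=4)
inclusive = inclusive_classes(marks, start=0, width=10, n_classes=4)
print(exclusive)  # classes 0-10, 10-20, 20-30, 30-40
print(inclusive)  # classes 0-9, 10-19, 20-29, 30-39
```

In the exclusive table a mark of 20 falls in the class 20-30, not 10-20; in the inclusive table the same mark falls in 20-29.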

3.5.1 Types of Frequency Distribution

There are two types;


i. Discrete Series
ii. Continuous Series

3.5.1.1 Discrete Series


Various units are capable of exact measurement
Each unit is separate and complete
Definite breaks are visible
Marks 11 12 13 14 15 16 17
No. of Students 5 7 11 26 18 13 9

3.5.1.2 Continuous Series


Various units are not capable of exact measurement
The units are only approximations
Units are arranged in groups or classes
Marks obtained No. of Students
0-20 5
20-40 15
40-60 25
60-80 13
80-100 4

Exercise
You are provided with the following data
2, 4, 3, 1, 23, 5, 7, 9, 21, 19, 11, 13, 17, 14, 20, 10, 12, 16, 14, 7, 6, 19, 22, 11, 23,
18, 22, 13, 24, 2, 5, 3,24, 4, 3, 2
Required;
Group the data using a class interval of 5 in;
i. Inclusive form
ii. Exclusive form

Revision Questions
In groups of three students discuss the following questions

1. Discuss any five main sources of secondary data.


2. Discuss any four functions of statistics and their application areas.
3. Explain the four features of a questionnaire.

4.0 MEASUREMENT

Measurement is the process of assigning numbers to objects or observations, e.g. weight,
height, length, motivation, job satisfaction, etc.

What is to be measured?
i) Objects: things of ordinary experience, as well as some things that are not concrete
ii) Properties: characteristics of objects

4.1 Measurement scales


A scale can be defined as a series of items arranged according to a value for the
purpose of quantification.
The scales are grouped according to mathematical properties

4.2 Levels of measurement


There are four levels of measurement. These include;
i. Nominal scale
ii. Ordinal scale
iii. Interval scale
iv. Ratio scale

4.2.1 Nominal scale


Simply assigns numbers or symbols to events or objects for the purpose of identification
eg positive attitude = 0
negative attitude = 1
skilled = 1
unskilled = 0
satisfied =1
dissatisfied =0
Only allows the counting of number symbols and no other arithmetical operation.
The mode is the only measure of central tendency used.
It has no measure of dispersion.

4.2.1.2 Advantage
Simplicity in situations where data is classified into groups

4.2.1.3 Disadvantages
Has no arithmetical origin.
Information on varying degrees of attitude, skill, satisfaction, etc. would be
wasted.

4.2.2 Ordinal scale


Uses ranking, without attempting to fit the interval according to a rule.
It has no absolute values except ranking from highest to lowest.
The median and mode are the only measures of central tendency used.
Percentiles are the only measure of dispersion used.
Examples: employee performance ranking; operational performance
ranking of organizations; ranking of elements of operational performance on the basis of
importance, etc.

4.2.2.1 Advantage
Simple
4.2.2.2 Disadvantages
Waste of the measure of degree
Limitation of statistical analysis to rank order correlation

4.2.3 Interval scale


Provides numerical measure in which the scales are adjusted on the basis of
some rule for making the units equal.
Mean, median and mode are the measures of central tendency
Standard deviation and percentile are the measures of dispersion.
Statistical computations include product moment correlation, t-test, and F-
test calculations.
Units are only valid when the rule established for equality is accepted
Zero is arbitrary.
Example: temperature measured in °C or °F
4.2.3.1 Advantages
Acceptance of the concept of equality
More powerful statistical computations are possible.
4.2.3.2 Disadvantages
The arbitrariness of zero,
Limits some mathematical operations on the data
Inability to measure complete absence of a trait/characteristic.

4.2.4 Ratio scale


This is the most precise scale of measurement.
It has an absolute zero
All mathematical and statistical operations, including those not possible on the other scales, can be performed on the
data.
Example: temperature measured in kelvin, weight
4.2.4.1 Advantage
High level of precision and flexibility in statistical analysis.
4.2.4.2 Disadvantage
Has some limitations arising from the nature of some variables.

5.0 MEASURES OF CENTRAL TENDENCY

The value around which the data is scattered is known as a measure of
central tendency, or an average. An average removes the unnecessary details of the
data and gives a concise picture of the large body of data under investigation.

5.1 Characteristics of measures of central tendency
A good average should have the following characteristics;

1. It should be rigidly defined


2. It should be based on all values
3. It should be easily understood and calculated.
4. It should be least affected by the fluctuations of sampling
5. It should be capable of further algebraic or statistical treatment
6. It should be least affected by extreme values.

5.1 Types of Averages


The following are the commonly used types of averages:-
i. Arithmetic mean or simple average
ii. Median
iii. Mode
iv. Geometric mean
v. Harmonic mean

5.1.1 Arithmetic Mean


It is also known as the mean.
The arithmetic mean is obtained by summing the values of all the items of a series and
dividing this sum by the number of items. It is expressed as follows:

\[ \bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n} \]

Where:
\bar{X} = mean or arithmetic mean
n = number of items
x_1, x_2, x_3, ..., x_n = values of the items
Σ (Greek letter sigma) = sum of all items

5.1.1.1Computation of arithmetic Mean


Arithmetic mean may be computed through
i. direct method or
ii. Short-cut method-provisional mean method/assumed mean
method/working mean method

5.1.1.1.1 Direct Method


Example 1
Kibet got the following marks in 5 subjects: 75, 55, 48, 72, and 60. Find his average
mark.

Solution:
Total marks = 75 + 55 + 48 + 72 + 60 = 310
Number of subjects = 5

\[ \bar{X} = \frac{\sum x}{n} = \frac{310}{5} = 62 \]
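
The direct method translates almost word for word into code. The sketch below uses Python with Kibet's marks from Example 1; the function name arithmetic_mean is an illustrative choice.

```python
def arithmetic_mean(values):
    """Direct method: sum of the items divided by the number of items."""
    return sum(values) / len(values)

marks = [75, 55, 48, 72, 60]  # Kibet's marks in 5 subjects
mean_mark = arithmetic_mean(marks)
print(mean_mark)  # 310 / 5 = 62.0
```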

Example 2
Monthly sales of Company XYZ for the last 6 months were: Sh. 37,000; 48,000; 84,000;
73,000; 35,000; 53,000
Required:
Find the monthly average sales.

Solution

Total sales = 37,000 + 48,000 + 84,000 + 73,000 + 35,000 + 53,000 = 330,000
Number of months = 6

\[ \bar{X} = \frac{\sum x}{n} = \frac{330000}{6} = 55000 \]

The monthly average sales are Sh. 55,000.

5.1.1.2 Assumed Mean Method


Assumed mean method is a short-cut method:

In the short-cut method, a specific value (say A), usually one of the given
values, is taken as the provisional or assumed mean.
The differences between the values of the items of the series and this provisional
mean, known as the deviations, are then derived.

\[ \bar{X} = A + \frac{\sum d}{n} \]
Where:
A = provisional (assumed) mean
d = x - A = deviation of each value from A
Σd = the sum of the deviations from A

This method is applied when the data set is large and the values are complicated to add directly.

Example

The monthly earnings of 10 employees are


Sh. 1,000; 1,200; 1,300; 1,100; 1,090; 1,010; 1,500; 1,900; 1,700; 2,000

Compute the mean wage or arithmetic mean using:-
a. Direct method
b. Short cut method

Solution

Earnings (x)    d = x - a (a = 1500)
Sh.             Sh.
1000            -500
1200            -300
1300            -200
1100            -400
1090            -410
1010            -490
1500               0
1900             400
1700             200
2000             500
Total 13800    -1200

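Arithmetic mean:
Direct method:
\[ \bar{X} = \frac{\sum x}{n} = \frac{13800}{10} = 1380 \]
The mean wage is Sh. 1,380.

Short cut method:
\[ \bar{X} = A + \frac{\sum d}{n} = 1500 + \frac{-1200}{10} = 1500 - 120 = 1380 \]
The mean wage is again Sh. 1,380.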

One may find that the short-cut method takes more time as compared to direct
method. However, this is true only for ungrouped data.
In case of grouped data, short-cut method saves time.
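
As a check on the worked example above, the two methods can be compared in a few lines of Python. The function names are illustrative; the earnings and the assumed mean of 1500 are those used in the example.

```python
earnings = [1000, 1200, 1300, 1100, 1090, 1010, 1500, 1900, 1700, 2000]

def direct_mean(values):
    """Direct method: sum of x divided by n."""
    return sum(values) / len(values)

def assumed_mean(values, a):
    """Short-cut method: a + (sum of deviations from a) / n."""
    deviations = [x - a for x in values]
    return a + sum(deviations) / len(values)

direct = direct_mean(earnings)             # 13800 / 10
shortcut = assumed_mean(earnings, a=1500)  # 1500 + (-1200) / 10
print(direct, shortcut)
```

Any other assumed mean, for example a=1000, gives the same result, because the deviations adjust to compensate for the choice of a.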

If ungrouped (discrete) data is organized into a frequency distribution, the mean is
calculated as follows:

Direct method

\[ \bar{X} = \frac{\sum fx}{\sum f} \]

Where:
f = frequencies
Σf = total of the frequencies

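Short cut method

\[ \bar{X} = a + \frac{\sum fd}{\sum f} \]
where d = x - a is the deviation of each value from the assumed mean a.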
Example

Calculate the arithmetic mean from the following data using direct method and short
cut method:-

Values 5 10 15 20 25 30 35 40 45 50
Frequency: 20 43 75 67 72 45 39 9 8 6

Solution

Direct method
Values (x) Frequency (f) fx
5 20 100
10 43 430
15 75 1125
20 67 1340
25 72 1800
30 45 1350
35 39 1365
40 9 360
45 8 360
50 6 300
Total 384 8530

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{8530}{384} = 22.2$$

Short cut method

Values (x) Frequency (f) d = x - 25 (A = 25) fd
5 20 -20 -400
10 43 -15 -645
15 75 -10 -750
20 67 -5 -335
25 72 0 0
30 45 5 225
35 39 10 390
40 9 15 135
45 8 20 160
50 6 25 150
total 384 -1070

$$\bar{X} = A + \frac{\sum fd}{\sum f} = 25 + \frac{-1070}{384} = 25 - 2.8 = 22.2$$
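
The frequency-weighted calculation above can be sketched as follows. The helper names are assumptions made for illustration; the data are the values and frequencies from the example.

```python
values = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
freqs = [20, 43, 75, 67, 72, 45, 39, 9, 8, 6]

def freq_mean(xs, fs):
    """Direct method: sum of f*x divided by the sum of f."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def freq_mean_shortcut(xs, fs, a):
    """Short-cut method: A plus sum of f*d over sum of f, with d = x - A."""
    return a + sum(f * (x - a) for x, f in zip(xs, fs)) / sum(fs)

print(round(freq_mean(values, freqs), 1))               # 22.2
print(round(freq_mean_shortcut(values, freqs, 25), 1))  # 22.2
```

The same functions apply to a continuous series if the class mid-points are passed as `xs`.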

Continuous series
The method of calculating the arithmetic mean from a continuous series is exactly the
same as that of discrete series with the exception that in a continuous series, we first
take the mid points of the various class intervals which are written against each class
interval.
These mid-point values are multiplied by the corresponding frequencies.
The provisional mean is also taken from these mid-point values.

Example

Marks     Number of students
0-20      5
20-40     7
40-60     13
60-80     8
80-100    7
Total     40

Calculate the arithmetic mean

Solution:
Direct method:
Marks     Mid-point (x)   f     fx
0-20      10              5     50
20-40     30              7     210
40-60     50              13    650
60-80     70              8     560
80-100    90              7     630
                          ∑f = 40   ∑fx = 2100

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{2100}{40} = 52.5$$


Short-cut method:

Marks     Mid-point (x)   f     d = x - A (A = 50)   fd
0-20      10              5     -40                  -200
20-40     30              7     -20                  -140
40-60     50              13    0                    0
60-80     70              8     20                   160
80-100    90              7     40                   280
Total                     40                         +100

$$\text{Mean} = A + \frac{\sum fd}{\sum f} = 50 + \frac{100}{40} = 50 + 2.50 = 52.50$$

5.1.1.3 Advantages of Arithmetic Mean


i. It can be easily understood.
ii. It takes into account all the items of the series
iii. It is capable of algebraic treatment
iv. It is used for comparison
v. It is rigidly defined: it always has a single, definite value
vi. It is used very frequently.
5.1.1.4 Disadvantages of arithmetic mean
i. It is affected by extreme values to a greater extent
ii. It may be a figure which does not exist in a series
iii. It cannot be calculated if any item of the series is unknown
iv. It cannot be used in case of qualitative data.

5.1.2 Geometric Mean


The geometric mean is the nth root of the product of the n items of a series.
For ungrouped data:

$$\text{G.M.} = \text{antilog}\left(\frac{\sum \log x}{n}\right)$$

For grouped data:

$$\text{G.M.} = \text{antilog}\left(\frac{\sum f \log x}{\sum f}\right)$$

Where: log x = logarithm (base 10) of each value x
n = number of items
f = frequency
Example
Compute the geometric mean of the following data
130, 135, 140, 145, 146, 148, 149, 150, 157

Solution
X      Log x
130    2.1139
135    2.1303
140    2.1461
145    2.1614
146    2.1644
148    2.1703
149    2.1732
150    2.1761
157    2.1959
       ∑Log x = 19.4316

$$\text{G.M.} = \text{antilog}\left(\frac{\sum \log x}{n}\right) = \text{antilog}\left(\frac{19.4316}{9}\right) = \text{antilog}(2.159) = 144.2$$
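
The log-and-antilog procedure can be reproduced in a few lines. This is a sketch only; `geometric_mean` and its optional `freqs` argument are illustrative names, with `freqs` covering the grouped-data form of the formula.

```python
import math

data = [130, 135, 140, 145, 146, 148, 149, 150, 157]

def geometric_mean(values, freqs=None):
    """Antilog of the (frequency-weighted) mean of the base-10 logarithms."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: every f = 1
    total_log = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (total_log / sum(freqs))

print(round(geometric_mean(data), 1))  # 144.2
```

For the grouped exercises that follow, the class values (or mid-points) would be passed as `values` and the number of students as `freqs`.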

Example
The following data was collected from a sample of 100 students in a certain
university:

Weight (kg)        115.5   125.5   135.5   145.5   155.5   165.5
No. of students    4       10      14      53      7       12

Required: Compute the geometric mean

Example
The table below shows the scores obtained by 48 students in an examination

Marks              0-10    10-20   20-30   30-40   40-50
No. of students    3       7       8       18      12

Required: Compute the geometric mean

5.1.2.1 Advantages of Geometric Mean


i. It is rigidly defined
ii. It takes into account all the items of the series
iii. It is capable of algebraic treatment
iv. It is little affected by fluctuations of sampling
v. It is better than the arithmetic mean for comparison purposes

5.1.2.2 Disadvantages of Geometric mean


i. It is not easy to compute
ii. It is not easy to understand
iii. It does not give equal weight to every item

5.1.3 Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the
values of the items in a given series.

For ungrouped data:

$$\text{Harmonic Mean} = \frac{n}{\sum \frac{1}{x}}$$

For grouped data (where n = ∑f):

$$\text{Harmonic Mean} = \frac{n}{\sum \frac{f}{x}}$$
Example

Compute the harmonic mean of the following data


1, 2, 4, 5, 8, 10, 10

Solution;

X      1/X
1      1.000
2      0.500
4      0.250
5      0.200
8      0.125
10     0.100
10     0.100
       ∑1/X = 2.275

$$\text{Harmonic Mean} = \frac{n}{\sum \frac{1}{x}} = \frac{7}{2.275} = 3.077$$
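
As a quick check of the arithmetic, the harmonic mean of the seven values can be computed directly. The function name is illustrative.

```python
data = [1, 2, 4, 5, 8, 10, 10]

def harmonic_mean(values):
    """Number of items divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)

print(round(harmonic_mean(data), 3))  # 3.077
```

Note that the numerator is n = 7, the number of items, not a total of the values.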

Example

The following data was collected from 100 students in a tertiary institution:

Weight (kg)        115.5   125.5   135.5   145.5   155.5   165.5
No. of students    4       10      14      53      7       12

Required: Compute the harmonic mean

Example
The table below shows the scores obtained by 48 students in an examination

Marks              0-10    10-20   20-30   30-40   40-50
No. of students    3       7       8       18      12

Required: Compute the harmonic mean

5.1.3.1 Advantages of Harmonic Mean


i. It takes into account all the items of the series
ii. It is capable of algebraic treatment
iii. It is not affected by fluctuations of sampling

5.1.3.2 Disadvantages of Harmonic Mean


i. It is not easy to compute
ii. It is not easy to understand
iii. It is not a good representative of a series

5.1.4 Median
Median is the value of the middle item of a series when the items are arranged
in either ascending or descending order.

$$\text{Median item} = \left(\frac{n+1}{2}\right)\text{th item, where } n = \text{number of items}$$

5.1.4.1 Computation of median from discrete series:


Example
In a factory, there are 5 workers whose ages are 20, 15, 19, 21, 17 years. Find the
median age.

Solution
First arrange the data in ascending or descending order:-

S.No. values
1 15
2 17
3 19
4 20
5 21

$$\text{Median item} = \frac{n+1}{2} = \frac{5+1}{2} = 3\text{rd item}$$

Hence, the median = 19 years

Example
The marks of six students in a class are 80,70,75,85,60 and 80. Find the median

Solution
First arrange the data in ascending order:-
S.No Marks
1 60
2 70
3 75
4 80
5 80
6 85

$$\text{Median item} = \frac{n+1}{2} = \frac{6+1}{2} = 3.5\text{th item}$$

Thus:

Median = ½ (3rd item + 4th item)

So, median = ½ (75 + 80) = 77.5 marks

5.1.4.2 Computation of Median in large discrete series

The data is first organized into a frequency distribution table.


A cumulative frequency column is formulated and then the value of the middle item
is located.

Example
The following data relate to the sizes of shoes sold at a store during a given week.
Find the median size:

Size of shoes:  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10.0  10.5  11.0
No. of pairs:   1    2    4    5    15   30   60   95   82   75   44   25    15    4

Solution:
Size of shoes (x) No. of pairs (f) Cumulative frequency (cf)
4.5 1 1
5.0 2 3
5.5 4 7
6.0 5 12
6.5 15 27
7.0 30 57
7.5 60 117
8.0 95 212
8.5 82 294
9.0 75 369
9.5 44 413
10.0 25 438
10.5 15 453
11.0 4 457
$$\text{Median item} = \frac{n+1}{2} = \frac{457+1}{2} = 229\text{th item}$$

The 229th item falls within the cumulative frequency of 294 (items 213 to 294),
so the median size = 8.5
5.1.4.3 Computation of Median in a continuous series
In a continuous frequency distribution the median lies inside a class interval, so
its value cannot be read off directly. It is estimated by interpolation within the
median class:

$$\text{Median} = L + \frac{i}{f}\left(\frac{n}{2} - c\right)$$

where:

L = lower class boundary of the median group
i = class interval (width) of the median group
f = frequency of the median group
n = ∑f
c = cumulative frequency of the group preceding the median group

Example

Find the median from the following table:

Marks 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80


Students 2 18 30 45 35 20 6 3

Solution
Marks (x) Students (f) Cumulative frequency
(cf)
0-10 2 2
10-20 18 20
20-30 30 50
30-40 45 95
40-50 35 130
50-60 20 150
60-70 6 156
70-80 3 159

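$$\text{Median item} = \frac{n}{2} = \frac{159}{2} = 79.5\text{th item}$$

This lies within the cumulative frequency of 95, so the median group is the 30-40
marks group.

$$\text{Median} = L + \frac{i}{f}\left(\frac{n}{2} - c\right) = 30 + \frac{10}{45}(79.5 - 50) = 30 + \frac{10 \times 29.5}{45} = 30 + 6.56 = 36.56$$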
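
The interpolation formula can be sketched in code as below. This is an illustrative helper, not a library routine: `grouped_median` assumes the classes are given as (lower boundary, upper boundary) pairs in ascending order and uses the n/2 position, as in the worked example.

```python
def grouped_median(boundaries, freqs):
    """Median of a continuous series: L + (i / f) * (n/2 - c)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    target = n / 2
    cum = 0                                # c: cumulative frequency so far
    for (lower, upper), f in zip(boundaries, freqs):
        if cum + f >= target:              # this is the median group
            return lower + (upper - lower) / f * (target - cum)
        cum += f

classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
students = [2, 18, 30, 45, 35, 20, 6, 3]
print(round(grouped_median(classes, students), 2))  # 36.56
```

When the classes are stated with gaps (for example 50-59, 60-69), the class boundaries (49.5-59.5, 59.5-69.5) should be passed rather than the stated limits.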

Example

Calculate the median from the following frequency distribution:-


Grade 50-59 60-69 70-79 80-89 90-99 100-109 110-119
frequency 7 81 192 312 218 82 18

Solution:

In this question the data is assumed to be continuous, so the class limits are
converted into class boundaries and the distribution takes the form:
Class boundaries   f    Cf
49.5-59.5 7 7
59.5-69.5 81 88
69.5-79.5 192 280
79.5-89.5 312 592
89.5-99.5 218 810
99.5-109.5 82 892
109.5-119.5 18 910

$$\text{Median item} = \frac{n}{2} = \frac{910}{2} = 455\text{th item}$$

∴ The median lies in the group 79.5 - 89.5

$$\text{Median} = L + \frac{i}{f}\left(\frac{n}{2} - c\right) = 79.5 + \frac{10}{312}(455 - 280) = 79.5 + \frac{1750}{312} = 79.5 + 5.61 = 85.11$$

5.1.4.4 Obtaining median graphically
The median can be read off a cumulative frequency curve (ogive).

Example

Find out the value of median graphically.

Class intervals frequency


0-10 5
10-20 10
20-30 15
30-40 8
40-50 7

Solution:
The above data can be written as under:-
Class f c.f
0-10 5 5
10-20 10 15
20-30 15 30
30-40 8 38
40-50 7 45

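Plot the cumulative frequencies (c.f.) on graph paper. The c.f. of each group is
plotted against the upper limit of that group.

$$\text{Median item} = \frac{45}{2} = 22.5\text{th item}$$

Reading the curve at the 22.5th item gives a median of about 25, which agrees with
interpolation: 20 + (10/15)(22.5 - 15) = 25.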

5.1.4.5 Advantages of median


1. It is easy to calculate
2. It is simple and is understood easily
3. It is less affected by the values of extreme items
4. It can be calculated by inspection in some cases
5. It is especially useful in the study of those phenomena which are
of qualitative nature

5.1.4.6 Disadvantages of median


1. It is not a suitable representative of a series in most of the cases
2. It is not suitable for further algebraic treatment
3. It is not used frequently like arithmetic mean
4. It cannot be determined exactly in the case of continuous series.

5.1.4.7 Properties (characteristics) of the median


1. It is a positional average and is influenced by the position of the
items in the series and not by the size of items.
2. The sum of the absolute values of deviations is least when the
deviations are measured from median instead of any other values
3. Median is greater than the mean when the distribution is skewed
towards the left and is less than the mean when the distribution is
skewed towards the right. They are equal when the distribution is
symmetrical.

5.1.5 Mode
Mode is the value of the item which occurs most frequently in a series.

5.1.5.1 Calculation of mode from discrete series

From discrete series, mode is obtained through observation.


The value of that item which occurs most times is the mode.

Example 17
The marks of 10 students in a test are as under:-
Student 1 2 3 4 5 6 7 8 9 10
Marks 65 43 57 63 39 57 60 48 57 55
Find out mode:

Solution:
Marks 39 43 48 55 57 60 63 65
F      1  1  1  1  3  1  1  1

Mode = 57 marks

5.1.5.2 Computing Mode of Continuous Data (grouped data)


Two methods are used:
i. Formula method. This gives a calculated estimate of the mode:

$$\text{Mode} = L_m + \frac{d_1}{d_1 + d_2} \times i$$

Where: Lm = lower class boundary of the modal class
d1 = difference between the frequency of the modal class and the
frequency of the class preceding it
d2 = difference between the frequency of the modal class and the
frequency of the class following it
i = class interval (size) of the modal class

ii. Graphical method. This gives an approximate value of the mode and makes use
of the histogram.
Example
Use the formula method to compute the mode of the following data
Marks:            0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of students:  2     7      21     25     30     35     28     12

Solution:
Marks No of Students (f)
0-10 2
10-20 7
20-30 21
30-40 25
40-50 30
50-60 35
60-70 28
70-80 12

$$\text{Mode} = L_m + \frac{d_1}{d_1 + d_2} \times i$$

Modal class = 50 - 60

Lm = 50, d1 = 35 - 30 = 5, d2 = 35 - 28 = 7 and i = 60 - 50 = 10

$$\text{Mode} = 50 + \frac{5}{5 + 7} \times 10 = 50 + 0.417 \times 10 = 50 + 4.17 = 54.17 \text{ marks}$$
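
The formula method can be sketched as follows. `grouped_mode` is an illustrative helper that assumes a single modal class and classes given as (lower boundary, upper boundary) pairs; a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(boundaries, freqs):
    """Mode = Lm + d1 / (d1 + d2) * i for the class with the highest frequency."""
    m = freqs.index(max(freqs))                        # index of the modal class
    before = freqs[m - 1] if m > 0 else 0              # frequency of preceding class
    after = freqs[m + 1] if m < len(freqs) - 1 else 0  # frequency of following class
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)

classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
students = [2, 7, 21, 25, 30, 35, 28, 12]
print(round(grouped_mode(classes, students), 2))  # 54.17
```

Here the class width is i = 60 - 50 = 10, so the mode is 50 + (5/12) × 10, about 54.17 marks.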

5.1.5.3 Graphical Calculation of Mode


This involves the use of a histogram

Example
Use graphical method to determine the mode of the following data
Wages (Sh) No. of workers
0-10 15
10-20 17
20-30 19
30-40 25
40-50 16
50-60 15
60-70 13
70-80 10
80-90 5
90-100 2

Graphical Solution
Plot the histogram using class boundaries against frequencies:

[Figure: Histogram of workers' wages. Horizontal axis: class boundaries (0 to 100); vertical axis: frequencies (0 to 25). The mode is read off the horizontal axis at about 35.]

Mode = Sh 35 (approximately)

Solution by Formula method

Modal class = 30 - 40

$$\text{Mode} = L_m + \frac{d_1}{d_1 + d_2} \times i$$

Lm = 30, d1 = 25 - 19 = 6, d2 = 25 - 16 = 9 and i = 40 - 30 = 10

$$\text{Mode} = 30 + \frac{6}{6 + 9} \times 10 = \text{Sh } 34$$

Example
Use graphical method to determine the mode of the following data
Marks No of students
0-5 7
5-10 8
10-15 15
15-20 16
20-25 19
25-30 13
30-35 12
35-40 10
40-45 5
45-50 2
Solution by Formula method
Modal class = 20 - 25

$$\text{Mode} = L_m + \frac{d_1}{d_1 + d_2} \times i$$

Lm = 20, d1 = 19 - 16 = 3, d2 = 19 - 13 = 6 and i = 25 - 20 = 5

$$\text{Mode} = 20 + \frac{3}{3 + 6} \times 5 = 21.67 \text{ marks}$$

Graphical Solution
Use a histogram as follows:

[Figure: Histogram of marks. Horizontal axis: class boundaries (0 to 50); vertical axis: frequency (0 to 20). The mode is read off the horizontal axis at about 22.]

Mode = 22 Marks (Approx)

REVISION EXERCISES
EXERCISE ONE
The managers of an import agency are investigating the length of time that customers
take to pay their invoices, the normal terms for which are 30 days net. They have
checked the payment record of 100 customers chosen at random and have compiled
the following table:

Payment in Number of customers


5 to 9 days 4
10 to 14 days 10
15 to 19 days 17
20 to 24 days 20
25 to 29 days 22
30 to 34 days 16
35 to 39 days 8
40 to 44 days 3

Required:
a) Calculate the arithmetic mean.
b) Calculate the standard deviation
c) Construct a histogram and insert the modal value.

EXERCISE TWO
The price of the ordinary 25p shares of Manco PLC quoted on the stock exchange, at
the close of the business on successive Fridays is tabulated below

126 120 122 105 129 119 131 138


125 127 113 112 130 122 134 136
128 126 117 114 120 123 127 140
124 127 114 111 116 131 128 137
127 122 106 121 116 135 142 130

Required
a) Group the above data into eight classes.
b) Calculate the cumulative frequencies, the median value, the quartile values and
the semi-interquartile range.
c) Compare and contrast the values that you have obtained for:
i) The median and mean
ii) The semi-interquartile range and the standard deviation

6.0 MEASURES OF DISPERSION

6.1 Introduction
The various measures of central tendency give us one single value that represents
the entire data. But an average alone cannot adequately describe a set of observations
unless all the observations are alike. It is therefore necessary to describe the
variation or dispersion of the observations.

6.2 Definition of dispersion


Dispersion is the degree to which numerical data tend to spread about an
average value. It is the extent of scatter of the items around a measure of central
tendency.

A measure of dispersion indicates the extent to which individual observations differ,
on average, from the mean or from any other measure of central tendency. Measures
of dispersion are also called measures of variation or measures of spread.
When a measure of dispersion is expressed in the units of the variable, it is called
an absolute measure of dispersion. If it is expressed as a coefficient, ratio or
percentage, it is called a relative measure of dispersion.

6.3 Significance of measure of Dispersion


Measures of dispersion are needed for four basic functions:
1. To determine the reliability of average.
2. To serve as basis for the control of variability.
3. To compare two or more series with regard to their variability.
4. To facilitate the use of other statistical measures.

6.4 Properties of good measure of dispersion


1. Simple to understand.
2. Easy to compute.
3. Rigidly defined.
4. Based on each and every item of the distribution.
5. Applicable to further algebraic calculation.
6. Sampling stability.
7. Not be unduly affected by extreme items.

6.5 Measures of dispersion


The main methods of measuring dispersion are:
a) Range
b) Quartile deviation or inter quartile range
c) Mean deviation or average deviation
d) Standard deviation

The first two (range and quartile deviation or inter quartile range) are positional
measures because they depend on the values at particular positions in the
distribution. Mean deviation (average deviation) and standard deviation are
calculated using all the values in the distribution.

6.5.1.1 Range

The range is defined as the difference between the smallest and the largest value of a
series.

For grouped data, the range is equal to the difference between the upper class
boundary of the highest class and the lower class boundary of the lowest class.

Range (absolute value) = L - S

$$\text{Coefficient of range} = \frac{L - S}{L + S}$$

Where:
L = the largest value of the distribution
S = the smallest value of the distribution

Example:
The following data represent the sales of newspapers during a week by a vendor:

Day            Monday  Wednesday  Tuesday  Thursday  Friday  Saturday  Sunday
Sales (Kshs)   1200    600        2000     1500      1800    3600      4800

Find the range and coefficient of the range:

Solution:
Range = L – S where L = 4800 and S = 600

Range = 4800 – 600 = 4200


$$\text{Coefficient of range} = \frac{L - S}{L + S} = \frac{4200}{4800 + 600} = \frac{4200}{5400} = \frac{7}{9} = 0.7777 \approx 0.78$$

Example:
Calculate the range and coefficient of range from the following data:

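Marks              10-20   20-30   30-40   40-50   50-60
No. of students    8       10      12      8       4

Solution:
The classes are continuous, so the class boundaries coincide with the class limits:
L = 60 (upper boundary of the highest class) and S = 10 (lower boundary of the
lowest class).

Range = L - S = 60 - 10 = 50

$$\text{Coefficient of range} = \frac{L - S}{L + S} = \frac{60 - 10}{60 + 10} = \frac{50}{70} = 0.714 \approx 0.71$$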

6.5.1.2 Uses of range


1) Quality control
2) Fluctuations in the prices
a. Study of variations in the prices of stocks and shares and commodities
that are sensitive to price changes from one period to another.
3) Weather forecast.
a. Minimum and maximum temperatures
4) Common use
a. Range of wages and salaries

6.5.2 Mean Deviation

It is also known as average deviation.

It is the average amount by which the values in the data set vary from the
mean. It is an absolute measure of dispersion (mean absolute deviation), so
all deviations are taken as positive; any negative signs are ignored.

a) For ungrouped data

$$\text{M.D.} = \frac{\sum |X - \bar{X}|}{n}$$

Where:
M.D. = mean deviation
X = value of an item
X̄ = mean
|X - X̄| = absolute value of the deviation of X from the mean
∑|X - X̄| = sum of all absolute deviations of X from the mean
n = number of items

b) For grouped data

$$\text{M.D.} = \frac{\sum f\,|X - \bar{X}|}{\sum f}$$

Where:
f = frequency of each class

Example:
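Calculate the mean deviation and the coefficient of mean deviation from the following data:

X 9 14 16 23 30 41
F 2 4 5 3 2 2

Solution

x       f     |d| = |x - 20.28|   f|d|      xf
9       2     11.28               22.56     18
14      4     6.28                25.12     56
16      5     4.28                21.40     80
23      3     2.72                8.16      69
30      2     9.72                19.44     60
41      2     20.72               41.44     82
Total   18                        138.12    365

$$\bar{X} = \frac{\sum xf}{\sum f} = \frac{365}{18} = 20.28$$

$$\text{M.D.} = \frac{\sum f\,|X - \bar{X}|}{\sum f} = \frac{138.12}{18} = 7.67$$

$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\bar{X}} = \frac{7.67}{20.28} = 0.38$$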
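
The grouped mean deviation can be checked with a short script. This is a sketch under the assumption that the coefficient of mean deviation is taken as M.D. divided by the mean; the function name is illustrative.

```python
xs = [9, 14, 16, 23, 30, 41]
fs = [2, 4, 5, 3, 2, 2]

def mean_deviation(values, freqs):
    """Return (mean, mean deviation, coefficient of mean deviation)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # Absolute deviations from the mean, weighted by frequency.
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
    return mean, md, md / mean

mean, md, coeff = mean_deviation(xs, fs)
print(round(mean, 2), round(md, 2), round(coeff, 2))  # 20.28 7.67 0.38
```

Using the unrounded mean changes the total of f|d| only in the second decimal place, so the mean deviation still rounds to 7.67.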

Example:
Calculate the mean deviation and its coefficient from the following data:

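Age (yrs)        10-14   15-19   20-24   25-29
No. of people    5       4       7       4

Solution

Class    Mid-point (x)   Frequency (f)   xf     |d| = |x - 19.5|   f|d|
10-14    12              5               60     7.5                37.5
15-19    17              4               68     2.5                10.0
20-24    22              7               154    2.5                17.5
25-29    27              4               108    7.5                30.0
Total                    20              390                       95.0

$$\bar{X} = \frac{\sum xf}{\sum f} = \frac{390}{20} = 19.5$$

$$\text{M.D.} = \frac{\sum f\,|X - \bar{X}|}{\sum f} = \frac{95}{20} = 4.75$$

$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\bar{X}} = \frac{4.75}{19.5} = 0.24$$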

6.5.3 Variance and Standard Deviation

Variance is the arithmetic mean of the squared deviations from the mean. Standard
deviation is the square root of the variance. If all the numbers in the sample are very
close to each other, the standard deviation is close to zero. If the numbers are well
dispersed, the standard deviation will tend to be large. A small standard deviation
means a high degree of uniformity of the observations as well as homogeneity of the
series, and vice versa.

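Ungrouped Data

a) Population Variance

$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{n} \quad \text{or} \quad \sigma^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$$

Population Standard deviation

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$

Where:
X = values of individual items
X̄ = arithmetic mean
n = total number of items
σ² = variance
σ = standard deviation

b) Sample Variance (used when there are fewer than 30 observations)

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Sample Standard deviation

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

GROUPED DATA

Population Variance

$$\sigma^2 = \frac{\sum f (X - \bar{X})^2}{n}$$

Sample Variance (for a sample size < 30)

$$s^2 = \frac{\sum f (X - \bar{X})^2}{n - 1}$$

Population Standard deviation

$$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{n}}$$

Sample Standard deviation

$$s = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}}$$

Here n = ∑f.

Short-cut method

Ungrouped data

$$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$

Grouped data

$$\sigma = \sqrt{\frac{\sum f x^2}{n} - \left(\frac{\sum f x}{n}\right)^2}$$

Coefficient of standard deviation

$$= \frac{\sigma}{\text{mean}}$$

Coefficient of variation

$$CV = \frac{\sigma}{\text{mean}} \times 100$$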
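
The definitional and short-cut formulas can be compared numerically. The sketch below reuses Kibet's five marks from the arithmetic mean section purely as convenient data; the function names are illustrative.

```python
import math

marks = [75, 55, 48, 72, 60]

def variance(values, sample=False):
    """Mean of squared deviations; divides by n - 1 when sample=True."""
    n = len(values)
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)
    return squared / (n - 1 if sample else n)

def variance_shortcut(values):
    """Short-cut form: mean of the squares minus the square of the mean."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(variance(values)) / mean * 100

print(round(variance(marks), 1))                   # 103.6
print(round(variance_shortcut(marks), 1))          # 103.6
print(round(variance(marks, sample=True), 1))      # 129.5
print(round(coefficient_of_variation(marks), 2))   # 16.42
```

The definitional and short-cut forms agree; the sample variance is larger because it divides by n - 1 = 4 rather than n = 5.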

REVISION EXERCISES

EXERCISE ONE

i) Calculate the standard deviation from the following data set:


192 288 236 184 260 354 291 530 and 242

ii) Compute the standard deviation and coefficient of variation from the following
data:
Marks 0-10 10-20 20-30 30-40 40-50
No. of students 7 6 15 12 10

EXERCISE TWO
a) What is dispersion and what is the formula for the standard deviation?
b) What is the measure of relative dispersion?

EXERCISE THREE
The managers of an import agency are investigating the length of time that customers
take to pay their invoices, the normal terms for which are 30 days net. They have
checked the payment record of 100 customers chosen at random and have compiled
the following table:

Payment in Number of customers


5 to 9 days 4
10 to 14 days 10
15 to 19 days 17
20 to 24 days 20
25 to 29 days 22
30 to 34 days 16
35 to 39 days 8
40 to 44 days 3

Required:
Calculate the arithmetic mean.
Calculate the variance and standard deviation
Construct a histogram and insert the modal value.
Estimate the probability that an unpaid invoice chosen at random will be
between 30 and 39 days old.

EXERCISE FOUR
Define the coefficient of variation.

The following table gives profits (in ten thousands of shillings) of two supermarkets
for the year 2012.

Month Supermarket A Supermarket B


January 65 28
February 48 33
March 15 20
April 28 23
May 41 69
June 59 45
July 41 53
August 10 15
September 24 35
October 56 57
November 92 99
December 120 136

Required:
i) Compute the coefficient of variation for each supermarket.
ii) Indicate for which supermarket the variability of profits is relatively greater.

7.0 SKEWNESS

Skewness is a concept commonly used in statistical decision making. It refers to
the degree to which a given frequency curve deviates from the symmetrical shape of
the normal distribution.
There are 2 types of skewness, namely:
i. Positive skewness
ii. Negative skewness

7.1 Positive Skewness

This is the tendency of a given frequency curve to lean towards the left. In a
positively skewed distribution, the long tail extends to the right.

In this distribution one should note the following:

i. The mean is usually bigger than the mode and the median
ii. The median usually lies between the mode and the mean
iii. There are more observations below the mean than above the mean
This positively skewed frequency distribution is characteristic of the age
distributions in developing countries.

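[Figure: three frequency curves. A positively skewed frequency curve (long tail to the right, with the mode, median and mean marked), the normal distribution, and a negatively skewed frequency curve (long tail to the left, with the mean, median and mode marked).]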

7.2 Negative Skewness


This is an asymmetrical curve in which the long tail extends to the left.

i) The mode is usually bigger than the mean and the median
ii) The median usually lies between the mean and the mode
iii) The number of observations above the mean is usually greater than the
number below the mean (see the shaded region)

NB: This frequency curve is characteristic of the age distribution in developed
countries.

7.3 MEASURES OF SKEWNESS
These are numerical values which assist in evaluating the degree of deviation of a
frequency distribution from the normal distribution.

The following are the commonly used measures of skewness.

1. Coefficient of skewness

$$= \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

2. Coefficient of skewness

$$= \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$

NB: These 2 coefficients are also known as Pearsonian measures of skewness.

3. Quartile coefficient of skewness

$$= \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$

Where Q1 = 1st quartile
Q2 = 2nd quartile
Q3 = 3rd quartile

NB: The Pearsonian coefficient of skewness usually lies between -3 and +3.
A value close to -3 indicates that the frequency distribution is negatively skewed
and that the amount of skewness is very high. Similarly, a value close to +3
indicates that the distribution is positively skewed and that its deviation from the
normal distribution is very high. A value of 0 indicates a symmetrical distribution.

Example
The following information was obtained from an NGO which was giving small loans
to some small scale business enterprises in 2010. The loans are in thousands of
Kshs. In the table, A = 63 is the assumed mean, c = 5 is the class width and UCB is
the upper class boundary.

Loans     Units (f)  Midpoint (x)  d = x - A  u = d/c  fu     fu²    UCB    cf
46 - 50   32         48            -15        -3       -96    288    50.5   32
51 - 55   62         53            -10        -2       -124   248    55.5   94
56 - 60   97         58            -5         -1       -97    97     60.5   191
61 - 65   120        63 (A)        0          0        0      0      65.5   311
66 - 70   92         68            5          +1       92     92     70.5   403
71 - 75   83         73            10         +2       166    332    75.5   486
76 - 80   52         78            15         +3       156    468    80.5   538
81 - 85   40         83            20         +4       160    640    85.5   578
86 - 90   21         88            25         +5       105    525    90.5   599
91 - 95   11         93            30         +6       66     396    95.5   610
Total     610                                          428    3086

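Required
Using the Pearsonian measure of skewness, calculate the coefficient of skewness
and hence comment briefly on the nature of the distribution of the loans.

$$\text{Arithmetic mean} = A + c\,\frac{\sum fu}{\sum f} = 63 + \frac{428 \times 5}{610} = 66.51$$

$$\text{Standard deviation} = c \times \sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2} = 5 \times \sqrt{\frac{3086}{610} - \left(\frac{428}{610}\right)^2} = 10.68$$

The position of the median is

$$\frac{n + 1}{2} = \frac{610 + 1}{2} = 305.5$$

This lies in the 61 - 65 class (lower class boundary 60.5, frequency 120,
preceding cumulative frequency 191), so

$$\text{Median} = 60.5 + \frac{305.5 - 191}{120} \times 5 = 60.5 + \frac{114.5}{120} \times 5 = 65.27$$

Therefore the Pearsonian coefficient of skewness

$$= \frac{3(66.51 - 65.27)}{10.68} = 0.348$$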
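
The coded-deviation working can be replayed in code. This is a sketch based on the loans table; the variable names are illustrative, and the median uses the (n + 1)/2 position as in the worked solution.

```python
import math

# Loans table: lower class boundaries, class width c = 5, frequencies.
lower_bounds = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5, 80.5, 85.5, 90.5]
freqs = [32, 62, 97, 120, 92, 83, 52, 40, 21, 11]
c = 5      # class width
A = 63     # assumed mean (mid-point of the 61-65 class)

n = sum(freqs)
mids = [lb + c / 2 for lb in lower_bounds]       # 48, 53, ..., 93
u = [(x - A) / c for x in mids]                  # coded deviations
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freqs, u))

mean = A + c * sum_fu / n
sd = c * math.sqrt(sum_fu2 / n - (sum_fu / n) ** 2)

# Median by interpolation at the (n + 1)/2-th item.
position = (n + 1) / 2
cum = 0
for lb, f in zip(lower_bounds, freqs):
    if cum + f >= position:
        median = lb + (position - cum) / f * c
        break
    cum += f

skewness = 3 * (mean - median) / sd
# Approximately 66.51, 10.68, 65.27 and 0.35 respectively.
print(round(mean, 2), round(sd, 2), round(median, 2), round(skewness, 2))
```

Working with unrounded intermediate values gives a coefficient of about 0.347, which matches the hand calculation of 0.348 to two decimal places.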

Comment
The coefficient of skewness obtained suggests that the frequency distribution of the
loans given was positively skewed, because the coefficient itself is positive. The
skewness is not very high, however, implying that the degree of deviation of the
frequency distribution from the normal distribution is small.

Example 2
Using the above data, calculate the quartile coefficient of skewness.

$$\text{Quartile coefficient of skewness} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$

The position of Q1 is

$$\frac{610 + 1}{4} = 152.75$$

$$\therefore \text{ actual value of } Q_1 = 55.5 + \frac{152.75 - 94}{97} \times 5 = 58.53$$

The position of Q3 is

$$\frac{3(610 + 1)}{4} = 458.25$$

$$\therefore \text{ actual value of } Q_3 = 70.5 + \frac{458.25 - 403}{83} \times 5 = 73.83$$

The position of Q2 is

$$\frac{2(610 + 1)}{4} = 305.5$$

$$\text{Actual value of } Q_2 = 60.5 + \frac{305.5 - 191}{120} \times 5 = 65.27$$

The required coefficient of skewness

$$= \frac{73.83 + 58.53 - 2(65.27)}{73.83 - 58.53} = \frac{1.82}{15.30} = 0.119$$
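
The quartile positions and interpolated values can be checked with the sketch below. It assumes the standard form of the quartile (Bowley) coefficient, in which the divisor is Q3 - Q1; `grouped_quantile` is an illustrative helper, not a library function.

```python
# Loans table: lower class boundaries, class width and frequencies.
lower_bounds = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5, 80.5, 85.5, 90.5]
freqs = [32, 62, 97, 120, 92, 83, 52, 40, 21, 11]
c = 5

def grouped_quantile(position):
    """Value of the item at the given position, by linear interpolation."""
    cum = 0
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= position:
            return lb + (position - cum) / f * c
        cum += f
    raise ValueError("position lies beyond the last item")

n = sum(freqs)
q1 = grouped_quantile((n + 1) / 4)
q2 = grouped_quantile(2 * (n + 1) / 4)
q3 = grouped_quantile(3 * (n + 1) / 4)
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(bowley, 2))
```

A small positive result agrees with the Pearsonian coefficient: the distribution is slightly positively skewed.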

Conclusion
Same as above when the Pearsonian coefficient was used

8 KURTOSIS

Kurtosis refers to the degree of peakedness of a given frequency distribution. The
degree is normally measured with reference to the normal distribution.
The concept of kurtosis is useful in decision making: if a frequency distribution
has a markedly higher or lower peak than the normal distribution, statistical
inferences that assume normality should be made with caution.

Generally there are 3 types of kurtosis namely;-
i. Leptokurtic
ii. Mesokurtic
iii. Platykurtic

8.1 Leptokurtic

A frequency distribution which is leptokurtic generally has a higher peak than that
of the normal distribution. The moment coefficient of kurtosis, when determined,
will be found to be more than 3. Thus frequency distributions with a value of more
than 3 are leptokurtic.

8.2 Mesokurtic
Some frequency distributions when plotted may produce a curve similar to that of the
normal distribution. Such frequency distributions are referred to as mesokurtic. The
degree of kurtosis is usually equal to 3

8.3 Platykurtic

When the frequency curve constructed produces a peak which is lower than that of
a normal distribution, the curve is said to be platykurtic. The moment coefficient
of kurtosis of such a curve is usually less than 3.

It is necessary to calculate a numerical measure of kurtosis. A commonly used
measure is the percentile coefficient of kurtosis, which is determined using the
following equation:

$$\text{Percentile measure of kurtosis, } K \text{ (kappa)} = \frac{\tfrac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}}$$

Example
Example
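Refer to the table above for loans to small business firms/units.
Required
Calculate the percentile coefficient of kurtosis.

$$\text{Position of } P_{90} = \frac{90}{100}(n + 1) = 0.9\,(610 + 1) = 0.9\,(611) = 549.9$$

The actual loan for a firm in this position:

$$P_{90} = 80.5 + \frac{549.9 - 538}{40} \times 5 = 81.99$$

$$\text{Position of } P_{10} = \frac{10}{100}(n + 1) = 0.1\,(611) = 61.1$$

The actual loan value given to the firm in this position:

$$P_{10} = 50.5 + \frac{61.1 - 32}{62} \times 5 = 52.85$$

∴ percentile measure of kurtosis

$$K \text{ (kappa)} = \frac{\tfrac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}} = \frac{\tfrac{1}{2}(73.83 - 58.53)}{81.99 - 52.85} = 0.26$$

For a normal distribution the percentile coefficient of kurtosis is about 0.263 (the
benchmark of 3 applies to the moment coefficient, not to this measure). Since 0.26
is very close to 0.263, the distribution of loans is approximately mesokurtic, with a
peak marginally flatter than that of the normal curve.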
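Kurtosis is also measured by moment statistics, which utilize the exact value of each
observation.

1. Moments about the origin (written here with a prime):

$$M_1' = \frac{\sum X}{n} = \text{mean}, \quad M_2' = \frac{\sum X^2}{n}, \quad M_3' = \frac{\sum X^3}{n}, \quad M_4' = \frac{\sum X^4}{n}$$

2. M2, the second moment about the mean (the variance):

$$M_2 = M_2' - (M_1')^2$$

3. M3, the third moment about the mean (a measure of absolute skewness):

$$M_3 = M_3' - 3M_2'M_1' + 2(M_1')^3$$

4. M4, the fourth moment about the mean (a measure of absolute kurtosis):

$$M_4 = M_4' - 4M_3'M_1' + 6M_2'(M_1')^2 - 3(M_1')^4$$

An alternative formula:

$$M_4 = \frac{\sum (x - m)^4 f}{\sum f} \quad \text{where } m \text{ is the mean}$$

$$\text{Moment coefficient of kurtosis} = \frac{M_4}{\sigma^4} \quad \text{where } \sigma \text{ is the standard deviation}$$
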
Example
Find the moment coefficient of kurtosis of the following distribution:
x    f
12   1
14   4
16   6
18   10
20   7
22   2

Solution

x       f     xf     (x-m)   (x-m)²   (x-m)²f   (x-m)⁴f
12      1     12     -5.6    31.36    31.36     983.45
14      4     56     -3.6    12.96    51.84     671.85
16      6     96     -1.6    2.56     15.36     39.32
18      10    180    0.4     0.16     1.60      0.256
20      7     140    2.4     5.76     40.32     232.24
22      2     44     4.4     19.36    38.72     749.62
Total   30    528                     179.20    2,676.74

$$m = \frac{528}{30} = 17.6$$

$$\sigma^2 = \frac{179.20}{30} = 5.973, \qquad \sigma^4 = 35.677$$

$$M_4 = \frac{\sum (x - m)^4 f}{\sum f} = \frac{2{,}676.74}{30} = 89.22$$

$$\text{Moment coefficient of kurtosis} = \frac{89.22}{35.677} = 2.5$$
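
The moment calculation can be verified with a short sketch. `central_moment` is an illustrative helper that computes the k-th moment about the mean for a frequency distribution.

```python
xs = [12, 14, 16, 18, 20, 22]
fs = [1, 4, 6, 10, 7, 2]

def central_moment(values, freqs, k):
    """k-th moment about the mean: sum of f * (x - m)^k divided by sum of f."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - m) ** k for x, f in zip(values, freqs)) / n

m2 = central_moment(xs, fs, 2)     # variance
m4 = central_moment(xs, fs, 4)     # fourth moment about the mean
kurtosis = m4 / m2 ** 2            # M4 / sigma^4
print(round(m2, 3), round(m4, 2), round(kurtosis, 1))  # 5.973 89.22 2.5
```

A value below 3 indicates that this distribution is slightly platykurtic on the moment measure.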

Note Coefficient of kurtosis can also be found using the method of assumed mean.

9.0 INDEX NUMBERS
An index number is an attempt to summarize a whole mass of data in one figure.
The single figure shows how one year differs from another year.
It is a statistical device used to measure the change in the level of prices, wages,
output and other variables at given times, relative to their level at an earlier time
which is taken as the base for comparison purposes.

$$\text{A simple price index} = \frac{P_n}{P_o} \times 100 \quad \text{(an unweighted price index)}$$

$$\text{A simple quantity index} = \frac{Q_n}{Q_o} \times 100 \quad \text{(an unweighted quantity index)}$$

Where Pn is the price of a commodity in the current year (the year for which the
price index is to be calculated) and Po is the price of the same commodity in the
base year (the year used for comparison purposes).

Qn and Qo are defined in the same way for quantities.

9.1 Laspeyre’s and Paasche’s Price and Quantity Indices


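Laspeyre's and Paasche's price and quantity indices are set out below.

Aggregate Price Index Numbers and Quantity Index Numbers

LASPEYRE'S INDEX

$$\text{Price index} = \frac{\sum p_n q_o}{\sum p_o q_o} \times 100 \qquad \text{Quantity index} = \frac{\sum q_n p_o}{\sum q_o p_o} \times 100$$

PAASCHE'S INDEX

$$\text{Price index} = \frac{\sum p_n q_n}{\sum p_o q_n} \times 100 \qquad \text{Quantity index} = \frac{\sum q_n p_n}{\sum q_o p_n} \times 100$$

$$\text{Value index} = \frac{\sum p_n q_n}{\sum p_o q_o} \times 100$$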

9.2 Modified Form of the Laspeyre’s Price Index Number

$$\text{Laspeyre's price index} = \frac{\sum \left(\frac{p_n}{p_o}\right) w_o}{\sum w_o} \times 100$$

Where wo are the proportions of total expenditure in the base period. This formula
is frequently used to calculate the retail price index.

9.3 Changing the Base of the Index


For comparison purposes if two series have different base years, it is difficult to
compare them directly. In such cases, it is necessary to change the base year of one of
the series (or both) so that both have the same base.
It is also necessary to keep the index relevant to current conditions hence the need to
change the base from time to time.

Example;
Year 1985 1986 1987 1988 1989 1990 1991 1992
Price index 100 104 108 109 112 120 125 140

Suppose we wish to change the base year to 1989.

We recalculate each index by expressing it as a percentage of the 1989 index (112):

Year                   Previous index   Recalculated index
1985                   100              100/112 × 100 = 89.3
1986                   104              104/112 × 100 = 92.9
1987                   108              108/112 × 100 = 96.4
1988                   109              109/112 × 100 = 97.3
1989 (new base year)   112              112/112 × 100 = 100
1990                   120              120/112 × 100 = 107.1
1991                   125              125/112 × 100 = 111.6
1992                   140              140/112 × 100 = 125.0
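
The rebasing step is a single division per year, as the sketch below shows. The function name `rebase` is illustrative.

```python
index = {1985: 100, 1986: 104, 1987: 108, 1988: 109,
         1989: 112, 1990: 120, 1991: 125, 1992: 140}

def rebase(series, new_base_year):
    """Express every index as a percentage of the index in the new base year."""
    base = series[new_base_year]
    return {year: round(value / base * 100, 1) for year, value in series.items()}

rebased = rebase(index, 1989)
print(rebased[1985], rebased[1989], rebased[1992])  # 89.3 100.0 125.0
```

Rebasing changes the level of every index but not the ratios between years.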

When changing the base year, it is advisable to update the weights used in the base
year.

9.4 Chain Based Index Numbers

A chain based index is one where the index is calculated every year using the
previous year as the base year. This type of index measures rate of change from year
to year.

59
This method is suitable where weights are changing rapidly and items are constantly
being brought into the index and unwanted items taken out. It can be a price or
quantity index

Year   Previous index   Chain-based index           Recalculated fixed-base index
1985   100              100 (1985 base year)        100
1986   104              104/100 × 100 = 104         104/100 × 100 = 104
1987   108              108/104 × 100 = 103.8       108/100 × 100 = 108
1988   109              109/108 × 100 = 100.9       109/100 × 100 = 109
1989   112              112/109 × 100 = 102.8       112/100 × 100 = 112
1990   120              120/112 × 100 = 107.1       120/100 × 100 = 120
1991   125              125/120 × 100 = 104.2       125/100 × 100 = 125
1992   140              140/125 × 100 = 112         140/100 × 100 = 140
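
As a rough check on the chain-based column, the sketch below divides each year's index by the previous year's index. The helper name `chain_index` is hypothetical and chosen only for this illustration.

```python
def chain_index(values):
    """Each value as a percentage of the one before it (the first year has no predecessor)."""
    return [current / previous * 100 for previous, current in zip(values, values[1:])]


# Fixed-base series for 1985 to 1992 from the example.
fixed_base = [100, 104, 108, 109, 112, 120, 125, 140]

chain = chain_index(fixed_base)
for year, value in zip(range(1986, 1993), chain):
    print(year, round(value, 1))
```

Note that the 1992 figure uses the 1991 index (125) as its base.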

9.5 The Fisher’s index


Fisher's index acts as a compromise between Laspeyre's index and Paasche's
index. It is calculated as the geometric mean of the two indexes, i.e.
Fisher's index = √(Laspeyre's index × Paasche's index).
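
A minimal sketch of the geometric-mean relationship follows. It assumes the usual aggregate forms of the two indexes (base-period quantities for Laspeyre, current-period quantities for Paasche); the prices and quantities are made-up illustrative figures, not data from these notes.

```python
from math import sqrt


def laspeyres(p0, pn, q0):
    """Base-weighted price index: sum(pn*q0) / sum(p0*q0) x 100."""
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100


def paasche(p0, pn, qn):
    """Current-weighted price index: sum(pn*qn) / sum(p0*qn) x 100."""
    return sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn)) * 100


def fisher(p0, pn, q0, qn):
    """Geometric mean of the Laspeyre and Paasche indexes."""
    return sqrt(laspeyres(p0, pn, q0) * paasche(p0, pn, qn))


# Hypothetical two-item basket (illustrative numbers only).
p0, q0 = [2, 5], [10, 4]   # base-year prices and quantities
pn, qn = [3, 6], [8, 5]    # current-year prices and quantities

l_index = laspeyres(p0, pn, q0)
p_index = paasche(p0, pn, qn)
f_index = fisher(p0, pn, q0, qn)
print(round(l_index, 2), round(p_index, 2), round(f_index, 2))
```

Because it is a geometric mean, the Fisher value always lies between the other two.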

9.6 Retail price index


It is a weighted average of price relatives based upon the spending of an average
household in the base year. The items consumed are divided into groups such as food,
housing, transport, alcoholic drinks, footwear, fuel, light, water, household goods
and services. Each item included in the index is given a weighting, and a price
relative to the base year is calculated. A modified form of Laspeyre's price index
formula is used, as a weighted arithmetic mean of price relatives,
i.e. Retail Price index $= \dfrac{\sum \left(\dfrac{p_n}{p_o}\right) W_0}{\sum W_0} \times 100$
The index is used by the Government as a guide in determining minimum wages,
pension rates and unemployment benefits (e.g. in the UK). Trade unions use it as a
basis for their wage claims.

9.7 Deflation
Indexes may be used to deflate a time series so that comparisons between periods may
be made in real terms.
Deflation is the process of reducing a value measured in current-period prices to its
equivalent in base-period prices. The deflated value is what would have been
necessary in the base period to purchase the same amount of goods as the present
value can purchase in the current period.
Deflation Factor $= \dfrac{\sum p_n q_n}{\sum p_0 q_n} \times 100$

Deflation of a time series


Year   Average monthly earnings (shs)   Retail index   Real earnings (shs)
1      5,000                            100            5,000 × 100/100 = 5,000.0
2      5,500                            120            5,500 × 100/120 = 4,583.3
3      6,000                            140            6,000 × 100/140 = 4,285.7
4      6,500                            170            6,500 × 100/170 = 3,823.5
5      7,200                            200            7,200 × 100/200 = 3,600.0
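
The deflation of the earnings series can be reproduced with the short sketch below. The function name `deflate` is an assumption made for this illustration; the figures are those of the table.

```python
def deflate(nominal_value, price_index, base=100):
    """Convert a current-period value into base-period (real) terms."""
    return nominal_value * base / price_index


earnings = [5000, 5500, 6000, 6500, 7200]      # average monthly earnings (shs)
retail_index = [100, 120, 140, 170, 200]       # retail index, year 1 = 100

real_earnings = [deflate(e, i) for e, i in zip(earnings, retail_index)]
for year, value in enumerate(real_earnings, start=1):
    print(year, round(value, 1))
```

Although money earnings rise every year, the real earnings fall, because the retail index rises faster than earnings.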

9.8 The technique of index number construction


When preparing index numbers it is important to define:
i. The exact purpose of the index
ii. How the items are to be selected
iii. The choice of the weights
iv. The choice of the base
v. The type of average to be used
The base year should be as close to the normal trend as possible. The best
methods should be used for collection of data. The items should be selected in
such a way that they are a fair representation of all the relevant items. Due
consideration should be given to the weighting of all items selected.

9.8.1The index of industrial production

It is a quantity index compiled by the government. It measures changes in the volume
of production in major industries. The index is a good indication of the state of
the national economy.
It covers the following major industries in the UK:
i. Mining and quarrying
ii. Manufacturing such as food, drinks and tobacco, chemicals, metal
manufacture, engineering etc.
iii. Textiles
iv. Construction
v. Gas, electricity, water etc.
It excludes agriculture, fishing, trade, transport, finance and other such industries.
Each industry order is given a weighting. The weighting is based on average
monthly production in each industry in a fixed base year. It gives each item its
relative importance amongst all other items and thus gives a better estimate of the
index for comparison purposes.

9.8.2 The Geometric Index (Industrial Share index)


This index is an index of 30 selected top industrial companies. It is calculated by
taking an unweighted geometric mean of the price relatives of the selected shares.

Example
The share prices of ordinary shares of four companies on 1st January 1990 and 1st
January 1991 were as follows.

Share       Price on 1.1.1990   Price on 1.1.1991
Company A   Shs 10              Shs 12
Company B   Shs 12              Shs 15
Company C   Shs 20              Shs 25
Company D   Shs 5               Shs 6

Using an unweighted geometric index, calculate the index of share prices at 1.1.1991
if 1.1.1990 is the base date, index 100

Solution
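$\left(\dfrac{12}{10} \times \dfrac{15}{12} \times \dfrac{25}{20} \times \dfrac{6}{5}\right)^{1/4} = \left(\dfrac{27000}{12000}\right)^{1/4} = (2.25)^{1/4} = 1.225$
Percentage increase = 22.5%, so index = 122.5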
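
The same unweighted geometric index can be computed as a quick check. This is a sketch only; the function name `geometric_index` is an assumption made for this example.

```python
from math import prod


def geometric_index(base_prices, current_prices):
    """Unweighted geometric mean of the price relatives, scaled so the base date = 100."""
    relatives = [current / base for base, current in zip(base_prices, current_prices)]
    return prod(relatives) ** (1 / len(relatives)) * 100


prices_1990 = [10, 12, 20, 5]   # Companies A to D on 1.1.1990
prices_1991 = [12, 15, 25, 6]   # Companies A to D on 1.1.1991

share_index = geometric_index(prices_1990, prices_1991)
print(round(share_index, 1))
```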

9.8.3 Inflation
The inflation rate for a given period can be calculated using the following formula;
Inflation = (Current retail price index / Retail price index in the base year) × 100

9.8.4 Marshall-Edgeworth Index

Marshall-Edgeworth index $= \dfrac{\sum p_n (q_o + q_n)}{\sum p_o (q_o + q_n)} \times 100$

9.9 Tests For An Ideal Index Number

9.9.1 Factor Reversal Test
This test requires that when the price index is multiplied by the corresponding
quantity index (i.e. the factors are reversed), the result should be the value index.
9.9.2 The Time Reversal Test
If we reverse the time subscripts of a price or quantity index, the result should be
the reciprocal of the original index.

EXERCISE ONE
Cable PLC manufactures an item of domestic equipment which requires a number of
components which have varied as various modifications of the model have been used.
The following table shows the number of components required together with the
price over the last three years of production.

COMPONENT   2010                2011                2012
            Prices   Quantity   Prices   Quantity   Prices   Quantity
A           3.63     3          4.00     2          4.49     2
B           2.11     4          3.10     5          3.26     6
C           10.03    1          10.36    1          12.05    1
D           4.01     7          5.23     6          5.21     5

Required:
a) Establish the base weighted price indices for 2011 and 2012 based on
2010 for the item of equipment.
b) Establish the current weighted price indices for 2011 and 2012 based on
2010 for the item of equipment.
c) Using the results of (a) and (b) as illustrations, compare and contrast
Laspeyre's and Paasche's price index numbers.

EXERCISE TWO
A company manufacturing a product known as TOILTEX uses five components in its
assembly.
The quantities and prices of the components used to produce a unit of TOILTEX in
2010, 2011 and 2012 are tabulated as follows:

COMPONENT   2010                2011                2012
            Quantity   Prices   Quantity   Prices   Quantity   Prices
A           10         3.12     12         3.17     14         3.20
B           6          11.49    7          11.58    5          11.67
C           5          1.40     8          1.35     9          1.31
D           9          2.15     9          2.14     10         2.63
E           50         0.32     53         0.32     57         0.32

Required:
i) Calculate Laspeyre's type price index numbers for the cost of one unit of
TOILTEX for 2011 and 2012 based on 2010.

ii) Calculate Paasche's type price index numbers for the cost of one unit of TOILTEX
for 2011 and 2012 based on 2010.

iii) Compare and contrast the Laspeyre's and Paasche's price index numbers you
have obtained in (i) and (ii).

EXERCISE THREE

A number of employers manufacturing plastic components used in plumbing have
formed themselves into an association for the purpose of negotiating with the trade
union for this industrial sector.

The negotiations cover pay and contributions in this sector.

Required:
Explain the usefulness of an index of Industrial Production and an index of retail
prices to both sides in a series of pay negotiations.

10.0 DECISION TREES

10.1 Introduction to Decision trees

i) A decision tree is a pictorial method of showing a sequence of inter-related
decisions and outcomes, used to help in making a choice between several courses of
action.
ii) It shows all possible courses of action and every possible outcome of each course
of action.
iii) Decision trees should be used where a problem involves a series of decisions
being made and several outcomes arise during the decision-making process.
iv) All possible choices are shown as branches, while the outcomes are shown as
subsidiary branches.
v) Decision trees force the decision maker to consider the logical sequence of events.
A complex problem is broken down into smaller, easier-to-handle sections.
vi) They provide a highly effective structure within which you can lay out options
and investigate the possible outcomes of choosing those options.
vii) They also help you to form a balanced picture of the risks and rewards associated
with each possible course of action.

Symbols used

Decision point
i) Are points where a choice exists between alternatives
ii) Represented with a square

iii) At a decision point, the decision maker has a choice on which course of action to
undertake

Outcome point
i) Are points where events depend on probabilities
ii) Represented with a circle/ node

iii) The branches from a circle are always subject to probabilities

End point
Are the final outcomes
Represented with a triangle

Illustration

[Figure: illustrative decision tree with decision points D1, D2 and D3; actions A1,
A2, B1, B2, C1, C2 and C3; and outcomes X1, X2, X3, Y1 and Y2.]

10.2 Steps in drawing a decision tree

This is a three-step method that involves;

Step 1: Draw the tree from left to right showing the appropriate decisions and
events/outcomes, i.e. the forward pass. Label the tree appropriately.
Step 2: Evaluate the tree from right to left, carrying out these two actions:
i. Calculate the expected value (EV) at each outcome point
ii. Choose the best option at each decision point
This is referred to as the backward pass.
Step 3: Recommend a course of action to management

10.3 Decision Trees and Sequential Decisions

Example
Two projects are being considered and the project data has been estimated as follows;

                       PROJECT A                      PROJECT B
                       RETURNS Kshs   PROBABILITY     RETURNS Kshs   PROBABILITY
OPTIMISTIC OUTCOME     6,000          0.2             6,500          0.1
MOST LIKELY OUTCOME    3,500          0.5             4,000          0.6
PESSIMISTIC OUTCOME    (2,500)        0.3             (1,000)        0.3

Required;
(i) Construct a decision tree for this problem
(ii) Calculate the expected monetary values (EMVs) of the two projects
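
Part (ii) can be sketched as follows. Two assumptions are made here and should be checked against the original table: the bracketed pessimistic returns are treated as losses (negative values), and the second set of columns is taken to belong to Project B. The helper name `emv` is hypothetical.

```python
def emv(outcomes):
    """Expected monetary value: sum of return x probability over all outcomes."""
    return sum(value * probability for value, probability in outcomes)


# (return in Kshs, probability); bracketed figures are assumed to be losses.
project_a = [(6000, 0.2), (3500, 0.5), (-2500, 0.3)]
project_b = [(6500, 0.1), (4000, 0.6), (-1000, 0.3)]

emv_a = emv(project_a)
emv_b = emv(project_b)
print(round(emv_a), round(emv_b))
```

Under these assumptions the project with the larger EMV would be preferred at the decision point.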

Example
A company is considering whether to launch a new product. The success of the idea
depends on the ability of a competitor to bring out a competing product (estimated at
60%) and the relationship of the competitor's price to the firm's price.

The table below shows the profits for each price level that could be set by the
company, related to the possible competing prices.

Profits in Ksh 000,000 if competitor's price is;

If company's price is;   Low   Medium   High   Profit if no competitor
Low                      30    42       45     50
Medium                   34    45       49     70
High                     10    30       53     90

The company must set its price first because its product will be on the market
earlier, so the competitor will be able to react to the price. Estimates of the
probability of a competitor's price are shown below;

Competitor's price expected to be;

If company's price is;   Low    Medium   High
Low                      0.8    0.15     0.05
Medium                   0.2    0.7      0.1
High                     0.05   0.35     0.6

Required:
i. Draw a decision tree and analyze the problem
ii. Recommend what the company should do

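LP*, MP* and HP* = COMPETITOR'S PRICES
LP, MP and HP = COMPANY'S PRICES

[Decision tree, summarised from the diagram; profits in Ksh 000,000]
D1: Market product (EV = 61.92) or Don't market product (0)
  No competition (0.4) -> D2: LP 50, MP 70, HP 90; best is HP, EV = 90
  Competition (0.6) -> D3: best is MP, EV = 43.2
    LP: LP* 0.8 -> 30, MP* 0.15 -> 42, HP* 0.05 -> 45; EV = 32.55
    MP: LP* 0.2 -> 34, MP* 0.7 -> 45, HP* 0.1 -> 49; EV = 43.2
    HP: LP* 0.05 -> 10, MP* 0.35 -> 30, HP* 0.6 -> 53; EV = 42.8
EV of marketing the product = 0.4 × 90 + 0.6 × 43.2 = 61.92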
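
The backward pass for this tree can be sketched in Python. The variable names are assumptions made for this illustration; the profits and probabilities are those given in the two tables of the example.

```python
# Profits (Ksh 000,000) if the competitor prices Low, Medium, High, for each company price.
profits = {"LP": [30, 42, 45], "MP": [34, 45, 49], "HP": [10, 30, 53]}
# Probabilities of the competitor pricing Low, Medium, High, for each company price.
probabilities = {"LP": [0.8, 0.15, 0.05], "MP": [0.2, 0.7, 0.1], "HP": [0.05, 0.35, 0.6]}
# Profit for each company price if no competitor appears.
no_competitor = {"LP": 50, "MP": 70, "HP": 90}
p_competition = 0.6

# Outcome points: expected value of each company price when a competitor exists.
ev_with_competitor = {
    price: sum(x * p for x, p in zip(profits[price], probabilities[price]))
    for price in profits
}

# Decision points: pick the best price on each side of the tree.
best_with = max(ev_with_competitor, key=ev_with_competitor.get)
best_without = max(no_competitor, key=no_competitor.get)

# Root: expected value of marketing the product versus 0 for not marketing.
ev_market = ((1 - p_competition) * no_competitor[best_without]
             + p_competition * ev_with_competitor[best_with])
decision = "market" if ev_market > 0 else "do not market"
print(best_with, best_without, round(ev_market, 2), decision)
```

The sketch follows the structure of the diagram, in which the price decision is evaluated separately for the competition and no-competition branches.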
Example
A company is planning on drilling for oil. It can either drill immediately or carry out
some preliminary tests. Alternatively, the company could also sell the rights to the site to
another company. It has created the following decision tree of the problem;

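[Decision tree, summarised from the diagram; costs and payoffs in brackets]
Tests (10): EV = 53.5; net of the cost of the tests = 43.5
  Indicate Oil (0.7): best value = 70
    Drill (50): Find Oil (0.8) -> (150), Find No Oil (0.2); EV = 120, less drilling cost = 70
    Sell Rights (65)
  Indicate No Oil (0.3): best value = 15
    Drill (50): Find Oil (0.2) -> (150), Find No Oil (0.8); EV = 30
    Sell Rights (15)
Sell Rights (40)
Drill Now (50): Find Oil (0.55) -> (150), Find No Oil (0.45); EV = 82.5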
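
A sketch of the backward pass for the drilling tree is given below. It assumes that the bracketed 50 and 10 are the costs of drilling and testing, that finding no oil pays nothing, and that costs are subtracted from the expected payoff (consistent with 120 less 50 giving 70 in the diagram). These readings of the diagram are assumptions, as are all the variable names.

```python
def expected(branches):
    """Expected value of an outcome point given (probability, value) pairs."""
    return sum(probability * value for probability, value in branches)


DRILL_COST, TEST_COST, OIL_PAYOFF = 50, 10, 150

# After the tests indicate oil: drill or sell the rights for 65.
drill_after_good = expected([(0.8, OIL_PAYOFF), (0.2, 0)]) - DRILL_COST
good_branch = max(drill_after_good, 65)

# After the tests indicate no oil: drill or sell the rights for 15.
drill_after_bad = expected([(0.2, OIL_PAYOFF), (0.8, 0)]) - DRILL_COST
bad_branch = max(drill_after_bad, 15)

options = {
    "tests": expected([(0.7, good_branch), (0.3, bad_branch)]) - TEST_COST,
    "sell rights now": 40,
    "drill now": expected([(0.55, OIL_PAYOFF), (0.45, 0)]) - DRILL_COST,
}
best_option = max(options, key=options.get)
print({name: round(value, 1) for name, value in options.items()}, best_option)
```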

10.4 Bayes Theory and Decision Trees

This section applies Bayes' Theorem to solve typical decision problems. The topic is
examined frequently, so it is important to understand it clearly.

Example:
Montana Electronics is a company producing Delux television sets. It is contemplating
launching a new model, the Super view. There are several possibilities that could be
opted for;
i) Continue producing Delux, whose profits are declining at 10% per annum on a
compounding basis. Last year its profit was KShs. 60,000/=.
ii) Launch Super view without any prior market research. If sales are high, annual
profit is put at KShs. 90,000/=, with a probability which from past data is put at
0.7. Low sales have a probability of 0.3 and an estimated profit of KShs. 30,000/=.
iii) Launch Super view with prior market research costing KShs. 30,000/=. The market
research will indicate whether future sales are likely to be 'good' or 'bad'. If the
research indicates 'good' then the management will spend KShs. 35,000/= more on
capital equipment and this will increase annual profits to KShs. 100,000/= if sales
are actually high. If however sales are actually low, annual profits will drop to
KShs. 25,000/=. Should market research indicate 'good' and management not spend more
on promotion, the profit levels will be as for the 2nd scenario above.
iv) If the research indicates 'bad' then the management will scale down their
expectations to give an annual profit of KShs. 50,000/= when sales are actually low,
but because of capacity constraints, if sales are high profit will be KShs. 70,000/=.
Past history of the market research company indicated the following results.

                        Actual sales
                        High    Low
Predicted      Good     0.8*    0.1
sales level    Bad      0.2     0.9

*When actual sales were high the market research company had predicted a good sales
level 80% of the time.

Required:
Use a time horizon of 6 years to indicate to the management of the company which
option they should adopt (ignore the time value of money).

Solution
(a) First draw the decision tree diagram

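[Decision tree, summarised from the diagram]
Option 1, Delux: 60,000 (declining)
Option 2, Super View -> outcome node A: High (0.7) 90,000; Low (0.3) 30,000
Option 3, Market Research -> outcome node E:
  Good -> decision node 1:
    Extra 35,000 -> outcome node B: P(H|G) = 0.95, 100,000; P(L|G) = 0.05, 25,000
    No extra -> outcome node C: P(H|G) = 0.95, 90,000; P(L|G) = 0.05, 30,000
  Bad -> outcome node D: P(H|B) = 0.34, 70,000; P(L|B) = 0.66, 50,000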

Computations; note how the probability figures are arrived at.
The decision tree dictates that the following probabilities need to be calculated.

For market research: P(G), P(B)
For sales outcome: P(H|G), P(L|G), P(H|B), P(L|B)

Given:
P(G|H) = 0.8
P(B|H) = 0.2
P(G|L) = 0.1
P(B|L) = 0.9
P(H) = 0.7
P(L) = 0.3

        High 0.7                                      Low 0.3
Good    P(G&H) = P(H) × P(G|H) = 0.7 × 0.8 = 0.56     P(G&L) = P(L) × P(G|L) = 0.3 × 0.1 = 0.03
Bad     P(B&H) = P(H) × P(B|H) = 0.7 × 0.2 = 0.14     P(B&L) = P(L) × P(B|L) = 0.3 × 0.9 = 0.27

P(G) = P(G and H) + P(G and L)
= 0.56 + 0.03 = 0.59

P(B) = P(B and H) + P(B and L)
= 0.14 + 0.27 = 0.41
Note that P(G) + P(B) = 0.59 + 0.41 = 1.00

From Bayes' rule;

P(H|G) = P(G|H) × P(H) / P(G) = 0.56 / 0.59 = 0.95
P(L|G) = P(G|L) × P(L) / P(G) = 0.03 / 0.59 = 0.05
P(H|B) = P(B|H) × P(H) / P(B) = 0.14 / 0.41 = 0.34
P(L|B) = P(B|L) × P(L) / P(B) = 0.27 / 0.41 = 0.66
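
The revision of probabilities above can be checked with a short sketch. The dictionary layout and the function name `posterior` are assumptions made for this illustration; the inputs are the given prior and conditional probabilities.

```python
prior = {"H": 0.7, "L": 0.3}                      # P(actual sales level)
likelihood = {"G": {"H": 0.8, "L": 0.1},          # P(prediction | actual sales level)
              "B": {"H": 0.2, "L": 0.9}}


def posterior(prediction):
    """Return P(prediction) and P(actual | prediction) using Bayes' rule."""
    joint = {actual: likelihood[prediction][actual] * prior[actual] for actual in prior}
    total = sum(joint.values())
    return total, {actual: value / total for actual, value in joint.items()}


p_good, given_good = posterior("G")
p_bad, given_bad = posterior("B")
print(round(p_good, 2), {k: round(v, 2) for k, v in given_good.items()})
print(round(p_bad, 2), {k: round(v, 2) for k, v in given_bad.items()})
```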

Evaluating financial outcome:

Option 1:
Last year's profit: Shs. 60,000

Year                      Shs.
1 = 60,000 × 0.9^1 =      54,000.0
2 = 60,000 × 0.9^2 =      48,600.0
3 = 60,000 × 0.9^3 =      43,740.0
4 = 60,000 × 0.9^4 =      39,366.0
5 = 60,000 × 0.9^5 =      35,429.4
6 = 60,000 × 0.9^6 =      31,886.5
Total (nearest shilling)  253,022
Option 2
Expected value of Super View
Node (A): 0.7(90,000 × 6) + 0.3(30,000 × 6)
= 378,000 + 54,000 = KShs. 432,000/=

Note that the figures are multiplied by 6 to account for the 6 years.

Option 3
Expected value of market research

Node (B): 0.95(100,000 × 6) + 0.05(25,000 × 6)
= 570,000 + 7,500 = KShs. 577,500/=
Deduct KShs. 35,000/= for extensions
= KShs. 542,500/=

Node (C): 0.95(90,000 × 6) + 0.05(30,000 × 6)
= 513,000 + 9,000 = KShs. 522,000/=

Node 1: Compare B and C
B is higher, thus = KShs. 542,500/=

Node (D): 0.34(70,000 × 6) + 0.66(50,000 × 6)
= 142,800 + 198,000 = KShs. 340,800/=

Node 2: KShs. 340,800/= or 0 – no launch

Node (E): 0.59 × 542,500 + 0.41 × 340,800
= 320,075 + 139,728 = KShs. 459,803/=
Less market research expenditure
KShs. 459,803/= – KShs. 30,000/= = KShs. 429,803/=

Node 2: Final decision summary

Option 1 EMV = KShs. 253,022/=
Option 2 EMV = KShs. 432,000/=
Option 3 EMV = KShs. 429,803/=

Therefore we choose option 2 since it has the highest EMV.
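
The three EMVs can be reproduced with the sketch below. It uses the rounded probabilities from the worked solution (0.95, 0.05, 0.34, 0.66, 0.59, 0.41) so that the figures match; using unrounded probabilities would change the Option 3 figure slightly. The variable names are assumptions made for this illustration.

```python
YEARS = 6

# Option 1: Delux profit declines by 10% a year from last year's 60,000.
option_1 = sum(60_000 * 0.9 ** year for year in range(1, YEARS + 1))

# Option 2: launch Super View without research (outcome node A).
option_2 = 0.7 * 90_000 * YEARS + 0.3 * 30_000 * YEARS

# Option 3: market research first (nodes B, C, D and E).
node_b = 0.95 * 100_000 * YEARS + 0.05 * 25_000 * YEARS - 35_000
node_c = 0.95 * 90_000 * YEARS + 0.05 * 30_000 * YEARS
node_d = 0.34 * 70_000 * YEARS + 0.66 * 50_000 * YEARS
node_e = 0.59 * max(node_b, node_c) + 0.41 * node_d
option_3 = node_e - 30_000          # less the cost of the research

emvs = {"Option 1": option_1, "Option 2": option_2, "Option 3": option_3}
best = max(emvs, key=emvs.get)
print({name: round(value) for name, value in emvs.items()}, best)
```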

10.5 Advantages and Disadvantages of decision trees

10.5.1 Advantages of decision trees

1. They clearly bring out implicit assumptions and calculations for all to see,
question and revise
2. They are easy to understand

10.5.2 Disadvantages of decision trees

1. They assume that the utility of money is linear with money
2. They are complicated by the introduction of more variables and decision alternatives
3. They are complicated by the presence of interdependent alternatives and dependent
variables

11.0 PROBABILITY

Probability is concerned with the quantification of uncertainty. Uncertainty is also
referred to as likelihood, chance or risk. Probability is represented with the letter
P; therefore the probability of an event E happening is denoted P(E) and the
probability of event E not happening is denoted P(Ē).

11.1 Approaches to Probability

There are four main approaches. These are;


i. Classical or a priori approach
ii. Relative frequency or empirical approach
iii. Subjective approach
iv. Axiomatic approach

11.1.1 Classical Approach


Assumes that all possible outcomes are equally likely, such as the outcomes of
tossing a coin or throwing dice.

P(E) = Number of favorable outcomes / Total number of possible outcomes

Thus, if A = number of favorable events and B = number of unfavorable events,

then P(Event E happening) = A / (A + B), and

P(Event E not happening) = B / (A + B)

Tossing a coin has two outcomes (H or T):

P(H) = 1/2
P(T) = 1/2

This approach is also called the a priori, logical or rational theory of probability.

Example
In a production run of 500 items, 15 items are found to be defective. If one item is
drawn at random, find the probability that it is defective

11.1.2 Relative Frequency Theory
• This is also referred to as the Empirical Theory
• Repetitive experiments are conducted and the results of each trial recorded

Trial          1  2  3  4  5  6  7  8  9
Observations   H  H  T  H  T  T  H  H  H

P(H) = 6/9 = 2/3
P(T) = 3/9 = 1/3

If n = number of trials, m = number of heads observed and q = number of tails
observed, then

P(H) = m/n = 6/9 = 2/3
P(T) = q/n = 3/9 = 1/3

The fraction m/n is called the relative frequency of the event in n trials. The
probability of the event is taken as the limit of this fraction as the number of
trials becomes large:

P(H) = Lim (m/n) as n → ∞
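
The relative frequencies for the nine recorded trials can be computed as below. This is a minimal sketch; the function name `relative_frequency` is an assumption made for this example.

```python
from fractions import Fraction


def relative_frequency(observations, event):
    """Number of times the event was observed divided by the number of trials."""
    return Fraction(observations.count(event), len(observations))


trials = list("HHTHTTHHH")   # the nine observations recorded above

freq_heads = relative_frequency(trials, "H")
freq_tails = relative_frequency(trials, "T")
print(freq_heads, freq_tails)
```

`Fraction` is used so that the results appear as exact fractions rather than decimals.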

Example
1000 tosses of a coin result in 519 heads and 481 tails.

Solution;

P(H) = 519/1000 = 0.519

Lim P(H) = 0.5 as n → ∞

P(T) = 481/1000 = 0.481 or P(T) = 1 – 0.519 = 0.481

Lim P(T) = 0.5 as n → ∞

11.1.3 Subjective Probability

Decision makers quantify the possibility of an event occurring using numbers
between 0 and 1, or 0 and 100%, based on the degree of rational belief.
Different decision makers assign different values of weight and thus the approach
is subjective and personalistic.
It is commonly used in situations that are unusual to business, e.g. political unrest.

11.1.4 Axioms of Probability


There are three axioms. These are;
1. Axiom of positiveness
The probability of an event A, denoted P(A), lies between 0 and 1
i.e. 0 ≤ P(A) ≤ 1
2. Axiom of certainty
The probability of the sample space is equal to one, i.e. P(S) = 1
3. Axiom of sum
The probability of the union of any mutually exclusive events is equal to
the sum of the probabilities of the individual events
i.e. P(A ∪ B ∪ C ∪ D ∪ ...) = P(A) + P(B) + P(C) + P(D) + ...

11.1.4.1 Probability Range

The probability of an event E can only take values ranging from 0 to 1

0 ≤ P(E) ≤ 1

Probability of an impossible event = 0 (an event that won't happen)
E.g. P(flying to the moon unaided) = 0
Probability of a certain event = 1 (an event that must happen)
E.g. P(dying) = 1

An event refers to an occurrence, i.e. a possible outcome, e.g. spinning a head from
a coin, drawing a card from a shuffled pack of cards, or rolling a 5 from a fair die.
Generally, the probability of an event E is defined as follows;

P(E) = Number of favorable outcomes / Total number of possible outcomes

Example;
1. Find the probability of drawing an Ace from a shuffled pack of cards
P(Ace) = 4/52 = 1/13
2. If in a lottery there are 6 prizes and 24 blanks, find the probability of
i. Winning
ii. Not winning

Solution:
i. P(W) = 6/(6+24) = 1/5

ii. P(not W) = 24/(6+24) = 4/5

Or

P(not W) = 1 – 1/5 = 4/5

Thus if A represents the probability of an event happening, the probability that the
event will not happen is given as follows;
P(Event not happening) = 1 – P(Event happening)
= 1 – A
11.2 Probability Events
11.2.1 Mutually exclusive events
These are events which cannot happen at the same time, i.e. either one or the other
occurs. E.g. when a coin is tossed either a head (H) or a tail (T) will occur but not
both; if a head occurs it excludes the possibility of a tail occurring.

Hence; P(H) = ½ or P(T) = ½

11.2.2 Independent events
These are events that can happen together, where the occurrence of one event does not
affect the occurrence of the other. E.g. if a coin is tossed twice, the occurrence of
a head in the first toss does not affect the occurrence of a head in the second toss.
Hence; P(H in the first toss) = ½ and P(H in the second toss) = ½

11.2.3 Collectively exhaustive events


Refers to the inclusion of all possible outcomes, e.g. in tossing a coin all the
possible events are the occurrence of a head or the occurrence of a tail.
The sum of the probabilities of all the events is unity.

11.2.4 Complementary events

If A represents the event of the favorable cases, then Ā (not A) represents the event
of the unfavorable cases.
A and Ā are complementary events.
Hence; P(A) + P(Ā) = 1, i.e. they are mutually exclusive and collectively exhaustive.

11.3 Basic Rules of Probability


The basic rules of probability include;
i. Addition rule
ii. Multiplication rule
iii. Conditional probability rule

11.3.1 Addition Rule (OR)

P(A or B) = P(A) + P(B)
This rule is used to calculate the probability of two or more mutually exclusive
events. If the events are not mutually exclusive, the probability of both occurring
is subtracted: P(A or B) = P(A) + P(B) – P(A and B).
Example
1. What is the probability of getting a 3 or a 6 when a die is rolled once?
P(3) = 1/6 and P(6) = 1/6
P(3 or 6) = 1/6 + 1/6 = 2/6 = 1/3

2. City residents were surveyed to determine readership of newspapers. 50% of
the residents were found to read the morning paper, 60% read the evening
paper and 20% read both newspapers. Find the probability that a resident
selected reads either the morning or evening or both the papers.

Solution;

Let A and B represent the events that the residents read the morning and
evening papers respectively.

Then; P(A) = 0.50, P(B) = 0.60 and P(A ∩ B) = 0.20

The probability that a resident reads either the morning or evening or both
the papers is given by;

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

= 0.50 + 0.60 – 0.20 = 0.90

11.3.2 Multiplication Rule

If two events A and B are independent then;

P(A and B) = P(A) × P(B)
This rule is used to calculate the probability of two or more independent events

If two events A and B are dependent in such a way that B occurs only after A has
occurred then the probability of both occurring is given as follows;
P(A and B) = P(B/A) × P(A)

Example 1
What is the probability of getting a 3 and a 6 when a die is rolled twice?
P(3) = 1/6 and P(6) = 1/6

P(3 and 6) = 1/6 × 1/6 = 1/36
1/36 is the probability of getting a 3 followed by a 6. If the order is not important
then 3 followed by 6 or 6 followed by 3 is acceptable.
Hence;
P(3 and 6) = P(3 followed by 6) or P(6 followed by 3)

= 1/6 × 1/6 + 1/6 × 1/6

= 1/36 + 1/36 = 1/18
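
The two answers can be confirmed by listing every equally likely result of rolling a die twice. This is a sketch only; the helper name `probability` is an assumption made for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered results of rolling a fair die twice.
rolls = list(product(range(1, 7), repeat=2))


def probability(event):
    """Favourable outcomes divided by the total number of outcomes."""
    favourable = sum(1 for roll in rolls if event(roll))
    return Fraction(favourable, len(rolls))


three_then_six = probability(lambda roll: roll == (3, 6))
three_and_six_any_order = probability(lambda roll: sorted(roll) == [3, 6])
print(three_then_six, three_and_six_any_order)
```

Counting outcomes directly gives the same results as the multiplication rule followed by the addition rule.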

11.3.3 Conditional Probability

Conditional probability is used when calculating probabilities when only partial
information concerning the results is available, or when recalculating probabilities
in the light of additional information.
It involves calculating the probability of a later event based on the results of an
earlier event, i.e. the probability of an event A depends on the occurrence of an
earlier event B; this is written P(A/B) (probability of A given B has occurred).

P(A/B) = P(A and B) / P(B), provided P(B) ≠ 0

NOTE: The probability of event A given B is equal to the probability of event A and
event B divided by the probability of B.

Example
1. The probability that one is called for an interview is 1/6. If one is called for
the interview the probability of being successful is 3/10. Find the probability that
one is successful in the interview.

Solution;
Let S = the event that one is successful in the interview
C = the event that one is called for the interview

P(S/C) = P(S ∩ C) / P(C), so P(S ∩ C) = P(C) × P(S/C) = 1/6 × 3/10 = 1/20

Using a tree diagram
Called (1/6) -> Successful (3/10): 1/6 × 3/10 = 1/20
Called (1/6) -> Not successful (7/10)
Not called (5/6)

Example: A math teacher gave her class two tests. 25% of the class passed both tests
and 42% of the class passed the first test. What percent of those who passed the
first test also passed the second test?

Solution: P(Second|First) = P(First and Second) / P(First) = 0.25 / 0.42 = 0.60

Example: A jar contains black and white marbles. Two marbles are chosen without
replacement. The probability of selecting a black marble and then a white marble is
0.34, and the probability of selecting a black marble on the first draw is 0.47.
What is the probability of selecting a white marble on the second draw, given that
the first marble drawn was black?

Solution: P(White|Black) = P(Black and White) / P(Black) = 0.34 / 0.47 = 0.72

Example: The probability that it is Friday and that a student is absent is 0.03.
Since there are 5 school days in a week, the probability that it is Friday is 0.2.
What is the probability that a student is absent given that today is Friday?

Solution: P(Absent|Friday) = P(Friday and Absent) / P(Friday) = 0.03 / 0.2 = 0.15

Example: At Kennedy Middle School, the probability that a student takes Technology
and Spanish is 0.087. The probability that a student takes Technology is 0.68. What
is the probability that a student takes Spanish given that the student is taking
Technology?

Solution: P(Spanish|Technology) = P(Technology and Spanish) / P(Technology) = 0.087 / 0.68 = 0.13
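
The four solutions above all apply the same division, which can be sketched as a small function. The name `conditional` is an assumption made for this illustration.

```python
def conditional(p_a_and_b, p_b):
    """P(A given B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_a_and_b / p_b


second_given_first = conditional(0.25, 0.42)     # two tests
white_given_black = conditional(0.34, 0.47)      # marbles
absent_given_friday = conditional(0.03, 0.2)     # absence on Friday
spanish_given_tech = conditional(0.087, 0.68)    # Technology and Spanish

for value in (second_given_first, white_given_black,
              absent_given_friday, spanish_given_tech):
    print(round(value, 2))
```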

Exercise

There are 100 students in a first year college intake. 36 are male and are studying
accounting, 9 are male and not studying accounting, 42 are female and studying
accounting, 13 are female and are not studying accounting.
Required:
i. Probability that a student is male
ii. Probability that a student is male and studying accounting
iii. Probability that a student is female and not studying accounting
iv. Probability that a student is studying accounting given that she is female
v. Probability that a student is not studying accounting given that he is male

11.4.0 Bayes’ Theorem

This is concerned with the method of estimating the probabilities of the causes of an
observed event.
The process involves working backwards from effect to cause.
Bayes' theorem is used in the analysis of decisions using decision trees where
information is given in the form of conditional probabilities and the reverse of
these probabilities must be found.
This theorem is also referred to as Bayes' rule and is given as follows;

$P(A/B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{P(B/A) \times P(A)}{P(B)}$

Example
A company has three production sections A, B and C which contribute 40%, 35% and
25% respectively to total output. The following percentages of faulty units have been
observed;
A 2% (0.02)
B 3% (0.03)
C 4% (0.04)

There is a final check before output is dispatched. Calculate the probability that a
unit found faulty at this check has come from section A.
Solution:
[Tree diagram, summarised; F = faulty, F' = not faulty]
A (0.40): F (0.02) -> P(A and F); F' (0.98) -> P(A and F')
B (0.35): F (0.03) -> P(B and F); F' (0.97) -> P(B and F')
C (0.25): F (0.04) -> P(C and F); F' (0.96) -> P(C and F')

P(A/F) = P(A and F) / P(F)

P(A and F) = 0.40 × 0.02 = 0.008

P(F) = 0.40 × 0.02 + 0.35 × 0.03 + 0.25 × 0.04

= 0.008 + 0.0105 + 0.01

= 0.0285

P(A/F) = 0.008 / 0.0285 = 0.2807
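
The same working can be sketched in Python, which also gives the corresponding probabilities for sections B and C. The dictionary names are assumptions made for this illustration.

```python
share_of_output = {"A": 0.40, "B": 0.35, "C": 0.25}   # P(section)
fault_rate = {"A": 0.02, "B": 0.03, "C": 0.04}        # P(faulty | section)

# Joint probabilities P(section and faulty), then the total probability of a fault.
joint = {section: share_of_output[section] * fault_rate[section]
         for section in share_of_output}
p_faulty = sum(joint.values())

# Bayes' rule: P(section | faulty) = P(section and faulty) / P(faulty).
p_section_given_faulty = {section: value / p_faulty for section, value in joint.items()}
print(round(p_faulty, 4),
      {section: round(value, 4) for section, value in p_section_given_faulty.items()})
```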

Example

The student body in a statistics class is 60% males. The registration records show
that 30% of the males attended private high schools and 62% of ladies attended public
high schools. A student involved in a case is known to have attended a public school.
What is the probability that the student is a male?

Solution:
[Tree diagram, summarised]
M (0.60): Pub (0.70) -> P(M and Pub); Priv (0.30) -> P(M and Priv)
F (0.40): Pub (0.62) -> P(F and Pub); Priv (0.38) -> P(F and Priv)

Since the student is known to have attended a public school, the required
probability is P(M/Pub):

P(M/Pub) = P(M and Pub) / P(Pub)

P(M and Pub) = 0.60 × 0.70 = 0.42
P(Pub) = 0.60 × 0.70 + 0.40 × 0.62
= 0.42 + 0.248 = 0.668

P(M/Pub) = 0.42 / 0.668 = 0.6287

11.5.1 Expected Value

The expected value of an event is the sum, over its possible outcomes, of each
outcome's value multiplied by its probability; it is the long-run average over a
series of trials.
It is used by management to make decisions, especially where there are many competing
alternatives or options.

Calculation of Expected Value

EV = ∑px   where x = future outcome
           p = probability of the outcome occurring
Example
A company expects the following monthly profits;
Monthly profit (Kshs)   Probability
10,000                  0.70
20,000                  0.30
Required:
Calculate the expected value of monthly profit

Solution:
Monthly profit in Kshs (x)   Probability (p)   px
10,000                       0.70              7,000
20,000                       0.30              6,000
                                               ∑px = 13,000
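
The calculation EV = ∑px can be sketched as below. The function name `expected_value` and the check that the probabilities sum to 1 are additions made for this illustration.

```python
def expected_value(distribution):
    """Sum of outcome x probability over (outcome, probability) pairs."""
    total_probability = sum(p for _, p in distribution)
    if abs(total_probability - 1) > 1e-9:
        raise ValueError("probabilities should sum to 1")
    return sum(x * p for x, p in distribution)


monthly_profit = [(10_000, 0.70), (20_000, 0.30)]   # (profit in Kshs, probability)

ev_profit = expected_value(monthly_profit)
print(round(ev_profit))
```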
Example
Two projects are being considered and the project data has been estimated as follows;

                       PROJECT A              PROJECT B
                       Kshs     PROBABILITY   Kshs     PROBABILITY
OPTIMISTIC OUTCOME     6,000    0.2           6,500    0.1
MOST LIKELY OUTCOME    3,500    0.5           4,000    0.6
PESSIMISTIC OUTCOME    2,500    0.3           1,000    0.3

Required;
The expected value of each project

Solution;
                       PROJECT A              PROJECT B
                       Kshs    P     EV       Kshs    P     EV
OPTIMISTIC OUTCOME     6000    0.2   1200     6500    0.1   650
MOST LIKELY OUTCOME    3500    0.5   1750     4000    0.6   2400
PESSIMISTIC OUTCOME    2500    0.3   750      1000    0.3   300
Project EV                           3700                   3350

The expected value of project A = Kshs 3,700
The expected value of project B = Kshs 3,350

On the basis of expected value, project A would be preferred because it has the
higher value.

11.5.2 Advantages of expected value and disadvantages of expected value

Advantages of expected value


i. Takes risk into account by considering the probability of each possible outcome
and using this information to calculate the expected value.
ii. The information is reduced to a single number, resulting in easier decisions
iii. Calculations are relatively simple

Disadvantages of expected value


i. Probabilities used are usually very subjective
ii. EV is merely a weighted average and therefore has little meaning for a one-off
project
iii. EV may not correspond to any of the actual possible outcomes

Exercise
1. A company's sales for a new product are subject to uncertainty. It has determined a
range of possible outcomes over the first two years.

Year 1
Sales   Kshs m   Probability (%)
High    40       60
Low     20       40

Year 2: (i) If year 1 sales are high
Sales   Kshs m   Probability (%)
High    80       90
Low     30       10
(ii) If year 1 sales are low
Sales   Kshs m   Probability (%)
High    30       20
Low     10       80

Required:
Calculate the expected value for each year

2. Three groups of children contain 3 girls and 1 boy, 2 girls and 2 boys, 1 girl and 3
boys respectively. One child is selected at random from each group. Find the
probability that the three children selected consist of 1 girl and 2 boys.

3. A candidate is selected for interview for a management trainee position at three
companies. For the first company there are 12 candidates, for the second company
there are 15 candidates and for the third company there are 10 candidates. What is
the probability of his getting selected by at least one of the companies?

12.0 PROBABILITY DISTRIBUTIONS

12.1 DISCRETE PROBABILITY DISTRIBUTIONS

12.1.1 BINOMIAL PROBABILITY DISTRIBUTION

Binomial probability distribution is a set of probabilities for discrete events.
Discrete events are those whose results or outcomes can be counted. Binomial
probabilities are commonly encountered in business situations, e.g. in quality
control activities binomial probabilities are frequently used, especially when
determining the probability of having a certain number of defective items in a given
consignment.

The binomial probability distribution is characterized by the fact that binomial
events have to fulfill the following properties:
i. Each event has only 2 possible outcomes, known as success and failure
ii. The probability of each outcome is independent of the previous outcomes
iii. The sample size is fixed
iv. The probabilities of success and failure tend to approach 0.5 as the sample size
increases (as when an unbiased coin is thrown a number of times)
v. The probabilities are given by the following equation

$P(r) = {}^{n}C_{r}\, p^{r} (1-p)^{n-r} = \dfrac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$

Where p = probability of success
r = number of successes
n = sample size
q = 1 – p = probability of failure

Example 1
A medical survey was conducted in order to establish the proportion of the population
which was infected with cancer. The results indicated that 40% of the population were
suffering from the disease.
A sample of 6 people was later taken and examined for the disease. Find the probability
that the following outcomes were observed
a) Only one person had the disease
b) Exactly two people had the disease
c) At most two people had the disease
d) At least two people had the disease
e) Three or four people had the disease

Solution
P(a person having cancer) = 40% = 0.4 = p
P(a person not having cancer) = 60% = 0.6 = 1 – p = q
a) P(only one person having cancer)
$$P(1) = {}^{6}C_{1}(0.4)^{1}(0.6)^{5} = \frac{6!}{5!\,1!}(0.4)^{1}(0.6)^{5} = 6 \times 0.4 \times 0.07776 = 0.1866$$
Note that in the formula ${}^{n}C_{r}\,p^{r}q^{n-r}$:
n = sample size = 6
p = 0.4
r = 1 = only one person having cancer
b) P(2 people had the disease)
$$P(2) = {}^{6}C_{2}(0.4)^{2}(0.6)^{4} = \frac{6!}{4!\,2!}(0.4)^{2}(0.6)^{4} = \frac{6 \times 5 \times 4!}{4! \times 2 \times 1}(0.4)^{2}(0.6)^{4}$$
$$= 15 \times (0.4)^{2}(0.6)^{4} = 15 \times 0.16 \times 0.1296 = 0.311$$
c) P(at most 2) = P(0) or P(1) or P(2) = P(0) + P(1) + P(2)
So we calculate the probability of each and add them up.
P(0) = P(nobody having cancer)
$$P(0) = {}^{6}C_{0}(0.4)^{0}(0.6)^{6} = \frac{6!}{0!\,6!}(0.4)^{0}(0.6)^{6} = (0.6)^{6} = 0.0467$$
The probabilities P(1) and P(2) have been worked out in parts (a) and (b)
Therefore P(at most 2) = 0.0467 + 0.1866 + 0.311 = 0.5443
d) P(at least 2)
= P(2) + P(3) + P(4) + P(5) + P(6)
= 1 – [P(0) + P(1)] This is a shorter way of working out the solution since
[P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1]
= 1 – (0.0467 + 0.1866)
= 0.7667

e) P(3 or 4 people had the disease)
= P(3) + P(4)
$$= {}^{6}C_{3}(0.4)^{3}(0.6)^{3} + {}^{6}C_{4}(0.4)^{4}(0.6)^{2}$$
$$= \frac{6!}{3!\,3!}(0.4)^{3}(0.6)^{3} + \frac{6!}{2!\,4!}(0.4)^{4}(0.6)^{2}$$
$$= \frac{6 \times 5 \times 4 \times 3!}{3 \times 2 \times 1 \times 3!}(0.4)^{3}(0.6)^{3} + \frac{6 \times 5 \times 4!}{2 \times 1 \times 4!}(0.4)^{4}(0.6)^{2}$$
$$= 20(0.4)^{3}(0.6)^{3} + 15(0.4)^{4}(0.6)^{2}$$
= (20 × 0.013824) + (15 × 0.009216)
= 0.27648 + 0.13824
= 0.41472
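
The five answers in Example 1 can be cross-checked with a few lines of Python. This is only a sketch using the standard library; the helper name `binomial_pmf` is an illustrative choice and not part of the course notes.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(exactly r successes in n independent trials with success probability p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 6, 0.4  # sample of 6 people, 40% of the population infected

p_one = binomial_pmf(1, n, p)                                       # part (a)
p_two = binomial_pmf(2, n, p)                                       # part (b)
p_at_most_two = sum(binomial_pmf(r, n, p) for r in range(3))        # part (c)
p_at_least_two = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)  # part (d)
p_three_or_four = binomial_pmf(3, n, p) + binomial_pmf(4, n, p)     # part (e)

for label, value in [("a", p_one), ("b", p_two), ("c", p_at_most_two),
                     ("d", p_at_least_two), ("e", p_three_or_four)]:
    print(f"({label}) {value:.4f}")
```

The printed values agree with the hand calculation to the three or four decimal places used in the solution.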

Example 2
An insurance company takes a keen interest in the age at which a person is insured.
Consequently, a survey conducted on prospective clients indicated that for clients of
the same age the probability that they will be alive in 30 years' time is 2/3. This
probability was established using actuarial tables. If a sample of 5 people was insured
now, find the probability of having the following possible outcomes in 30 years
a) All are alive
b) At least 3 are alive
c) At most one is alive
d) None is alive
e) At least 1 is alive
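Sample size n = 5
P(alive) = p = 2/3, whereas P(not alive) = q = 1/3
a) P(all alive) = P(r = 5)
$$P(5) = {}^{5}C_{5}\left(\tfrac{2}{3}\right)^{5}\left(\tfrac{1}{3}\right)^{0} = \frac{5!}{5!\,0!}\left(\tfrac{2}{3}\right)^{5} = \frac{32}{243}$$
b) P(at least 3 alive) = P(r ≥ 3)
= P(3) or P(4) or P(5) = P(3) + P(4) + P(5)
$$P(3) = {}^{5}C_{3}\left(\tfrac{2}{3}\right)^{3}\left(\tfrac{1}{3}\right)^{2} = \frac{5!}{3!\,2!}\left(\tfrac{2}{3}\right)^{3}\left(\tfrac{1}{3}\right)^{2} = 10\left(\tfrac{2}{3}\right)^{3}\left(\tfrac{1}{3}\right)^{2} = \frac{80}{243}$$
$$P(4) = {}^{5}C_{4}\left(\tfrac{2}{3}\right)^{4}\left(\tfrac{1}{3}\right)^{1} = \frac{5!}{4!\,1!}\left(\tfrac{2}{3}\right)^{4}\left(\tfrac{1}{3}\right) = \frac{5 \times 4!}{4! \times 1}\left(\tfrac{2}{3}\right)^{4}\left(\tfrac{1}{3}\right) = \frac{80}{243}$$
$$P(5) = \frac{32}{243}$$
$$P(r \ge 3) = \frac{80}{243} + \frac{80}{243} + \frac{32}{243} = \frac{192}{243}$$
c) P(at most 1 is alive) = P(r ≤ 1) = P(0) + P(1)
$$P(0) = {}^{5}C_{0}\left(\tfrac{2}{3}\right)^{0}\left(\tfrac{1}{3}\right)^{5} = \frac{5!}{0!\,5!}\left(\tfrac{1}{3}\right)^{5} = \frac{1}{243}$$
$$P(1) = {}^{5}C_{1}\left(\tfrac{2}{3}\right)^{1}\left(\tfrac{1}{3}\right)^{4} = \frac{5!}{1!\,4!}\left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right)^{4} = \frac{10}{243}$$
$$P(r \le 1) = \frac{1}{243} + \frac{10}{243} = \frac{11}{243}$$
d) P(none is alive) = P(r = 0)
$$P(0) = {}^{5}C_{0}\left(\tfrac{2}{3}\right)^{0}\left(\tfrac{1}{3}\right)^{5} = \frac{1}{243}$$
e) P(at least 1 alive) = P(r ≥ 1)
= 1 – P(none alive)
$$= 1 - \frac{1}{243} = \frac{242}{243}$$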
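
Because this example works in fractions, the answers can be checked exactly rather than in decimals. The sketch below assumes Python's standard `fractions` module; the function name `binomial_pmf_exact` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf_exact(r, n, p):
    """Exact binomial probability when p is supplied as a Fraction."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 5, Fraction(2, 3)  # 5 insured people, each alive in 30 years with probability 2/3

all_alive = binomial_pmf_exact(5, n, p)
at_least_three = sum(binomial_pmf_exact(r, n, p) for r in (3, 4, 5))
at_most_one = binomial_pmf_exact(0, n, p) + binomial_pmf_exact(1, n, p)
none_alive = binomial_pmf_exact(0, n, p)
at_least_one = 1 - none_alive

print(all_alive, at_least_three, at_most_one, none_alive, at_least_one)
```

Note that `Fraction` reduces results to lowest terms, so 192/243 is displayed as 64/81; the two are the same value.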

12.1.2 POISSON PROBABILITY DISTRIBUTION

This is a set of probabilities obtained for discrete events which are described as
being rare. Such events are similar to those of the binomial distribution but have very low probabilities
and a large sample size.
Examples of such events in business are as follows:
i. Telephone congestion at midnight
ii. Traffic jams on certain roads at 9 o'clock at night
iii. Sales boom
iv. Attaining an age of 100 years (centenarian)
Poisson probabilities are frequently applied in business situations in order to determine
the numerical probabilities of such events occurring.
The formula used to determine such probabilities is as follows
$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
Where x = number of successes
λ = mean number of successes in the sample (λ = np)
e = 2.718

Example
A manufacturer assures his customers that the probability of an item being defective is
0.005. A sample of 1000 items was inspected. Find the probabilities of having the
following possible outcomes
i. Only one is defective
ii. At most 2 defective
iii. More than 3 defective
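$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
λ = np = 1000 × 0.005 = 5
i. P(only one is defective) = P(1) = P(x = 1)
$$P(1) = \frac{2.718^{-5} \times 5^{1}}{1!} = \frac{5}{2.718^{5}} = \frac{5}{148.336} = 0.0337$$
Note that $2.718^{-5} = \frac{1}{2.718^{5}}$

ii. P(at most 2 defective) = P(x ≤ 2)
= P(0) + P(1) + P(2)
$$P(0) = \frac{e^{-5} \times 5^{0}}{0!} = 2.718^{-5} = \frac{1}{148.336} = 0.00674$$
P(1) = 0.0337
$$P(2) = \frac{2.718^{-5} \times 5^{2}}{2!} = \frac{25}{2 \times 148.336} = 0.08427$$
P(x ≤ 2) = 0.00674 + 0.0337 + 0.08427

= 0.12471

iii. P(more than 3 defective) = P(x > 3)

= 1 – [P(0) + P(1) + P(2) + P(3)]
$$P(3) = \frac{2.718^{-5} \times 5^{3}}{3!} = \frac{125}{6 \times 148.336} = 0.14045$$
P(x > 3) = 1 – (0.12471 + 0.14045) = 1 – 0.26516 = 0.7348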
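
The Poisson example can be checked numerically in the same way. The following is a minimal sketch with the standard library; `poisson_pmf` is an illustrative name, and the code uses the full-precision value of e rather than the rounded 2.718 used in the notes.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(exactly x occurrences) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1000 * 0.005  # lambda = np = 5

p_one = poisson_pmf(1, lam)                                      # part i
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))       # part ii
p_more_than_three = 1 - sum(poisson_pmf(x, lam) for x in range(4))  # part iii

print(f"P(1)     = {p_one:.4f}")
print(f"P(x <= 2) = {p_at_most_two:.4f}")
print(f"P(x > 3)  = {p_more_than_three:.4f}")
```

Small differences in the fourth decimal place compared with a hand calculation come from rounding e to 2.718.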

12.1.3 BINOMIAL MATHEMATICAL PROPERTIES

1. The mean or expected value = n × p = np
Where; n = Sample Size
p = Probability of success
2. The variance = npq
Where; q = probability of failure = 1 - p
3. The standard deviation = $\sqrt{npq}$

Example
A firm is manufacturing 45,000 units of nuts. The probability of having a defective nut is
0.15
Calculate the following
i. The expected no. of defective nuts
ii. The variance and standard deviation of the defective nuts in a daily consignment
of 45,000

Solution
Sample size n = 45,000
P(defective) = 0.15 = p
P(non defective) = 0.85 = q
i. ∴ the expected no. of defective nuts
= 45,000 × 0.15 = 6,750
ii. The variance = npq
= 45000 × 0.85 × 0.15
= 5737.50
The standard deviation = $\sqrt{npq}$
= $\sqrt{5737.50}$
= 75.75
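
As a quick check of the three binomial properties, the calculation can be sketched in Python. The function name `binomial_summary` is an assumption made for this illustration.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    q = 1 - p                  # probability of failure
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)

mean, variance, std_dev = binomial_summary(45_000, 0.15)
print(f"mean = {mean:.0f}, variance = {variance:.2f}, sd = {std_dev:.2f}")
```

The standard deviation is about 75.746, which rounds to 75.75 at two decimal places.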

12.1.4 POISSON MATHEMATICAL PROPERTIES

1. The mean or expected value = np = λ
Where; n = Sample Size
p = Probability of success
2. The variance = np = λ
3. Standard deviation = $\sqrt{np} = \sqrt{\lambda}$

Example
The probability of a rare disease striking a given population is 0.003. A sample of 10000
was examined. Find the expected no. suffering from the disease and hence determine the
variance and the standard deviation for the above problem

Solution
Sample size n = 10000
P(a person suffering from the disease) = 0.003 = p
∴ expected number of people suffering from the disease
Mean = np = λ = 10000 × 0.003
= 30
Variance = np = λ = 30
Standard deviation = $\sqrt{np} = \sqrt{\lambda}$
= $\sqrt{30}$
= 5.477
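
The Poisson properties can be sketched the same way. The comparison with the binomial variance npq is an addition for illustration, showing how close the two are when p is very small; `poisson_summary` is a hypothetical helper name.

```python
from math import sqrt

def poisson_summary(n, p):
    """Return (mean, variance, standard deviation) of a Poisson distribution with lambda = n*p."""
    lam = n * p
    return lam, lam, sqrt(lam)

n, p = 10_000, 0.003
mean, variance, std_dev = poisson_summary(n, p)
binomial_variance = n * p * (1 - p)  # npq, for comparison only

print(f"mean = {mean:.0f}, variance = {variance:.0f}, sd = {std_dev:.3f}")
print(f"binomial variance npq = {binomial_variance:.2f}")
```

With p = 0.003 the binomial variance is 29.91, very close to the Poisson variance of 30, which is consistent with treating rare events as Poisson.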

12.2 CONTINUOUS PROBABILITY DISTRIBUTIONS

In a continuous distribution, the variable can take any value within a specified range, e.g.
2.21 or 1.64, compared to the specific values taken by a discrete variable, e.g. 1 or 3. The
probability is represented by the area under the probability density curve between the
given values.

The uniform distribution, the normal probability distribution and the exponential
distribution are examples of continuous distributions

12.2.1 THE NORMAL DISTRIBUTION


The normal distribution is a probability distribution which is used to determine
probabilities of continuous variables
Examples of continuous variables are
o Distances
o Times
o Weights
o Heights
o Capacity, etc.
Continuous variables are those which can be measured using the appropriate
units of measurement.
The following are the properties of the normal distribution
i. The total area under the curve is 1, which is equivalent to the maximum value of
probability

[Figure: normal probability distribution curve, showing the line of symmetry and the two tail ends; horizontal axis: Age (Yrs)]

ii. The line of symmetry divides the curve into two equal halves
iii. The two ends of the normal distribution curve continuously approach the horizontal
axis but never touch it
iv. The values of the mean, mode and median are all equal
v. About 68% of a normally distributed population lies within one standard deviation of the mean, ±1σ
vi. About 95% lies within two standard deviations, ±2σ
vii. About 99.7% lies within three standard deviations, ±3σ

Where σ = standard deviation

NB: The above distribution curve is referred to as the normal probability distribution curve
because if a frequency distribution curve is plotted from measurements of a given sample
drawn from a normal population then a graph similar to the normal curve is
obtained.

[Figure: standard normal curve; horizontal axis: Z, centred at 0]
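
The percentages quoted for one, two and three standard deviations can be reproduced from the error function in Python's standard library. This is a sketch only; `prob_within` is an illustrative name.

```python
from math import erf, sqrt

def prob_within(k):
    """P(-k < Z < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The values come out at roughly 0.6827, 0.9545 and 0.9973, which is where the rounded 68%, 95% and 99.7% figures come from.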
12.2.1.2 STANDARDIZATION OF VARIABLES

Before we use the normal distribution curve to determine probabilities of continuous
variables, we need to standardize the original units of measurement, by using the
following formula.
$$Z = \frac{x - \mu}{\sigma}$$
Where x = Value to be standardized
Z = Standardized value of x
µ = population mean
σ = Standard deviation

Example
A sample of students had a mean age of 35 years with a standard deviation of 5 years. A
student was randomly picked from a group of 200 students. Find the probability that the
age of the student turned out to be as follows
i. Lying between 35 and 40
ii. Lying between 30 and 40
iii. Lying between 25 and 30
iv. Lying beyond 45 yrs
v. Lying beyond 30 yrs
vi. Lying below 25 years

Solution
(i). The standardized value for 35 years
$$Z = \frac{x - \mu}{\sigma} = \frac{35 - 35}{5} = 0$$
The standardized value for 40 years
$$Z = \frac{x - \mu}{\sigma} = \frac{40 - 35}{5} = 1$$
Hence, the area between Z = 0 and Z = 1 is 0.3413 (these values are read from the
normal tables; see appendix)

From the standard normal curve tables:
When z = 0, p = 0
And when z = 1, p = 0.3413
The required area is the area between z = 0 and z = 1
= 0.3413 – 0 = 0.3413
Hence, the probability of the age lying between 35 and 40 yrs is 0.3413
(ii). 30 and 40 years
$$Z = \frac{x - \mu}{\sigma} = \frac{30 - 35}{5} = \frac{-5}{5} = -1$$
$$Z = \frac{x - \mu}{\sigma} = \frac{40 - 35}{5} = 1$$
∴ the area between Z = -1 and Z = 1 is
= 0.3413 (lying on the negative side of zero) + 0.3413 (lying on the positive side of
zero)
P = 0.6826
∴ the probability of the age lying between 30 and 40 yrs is 0.6826
(iii). 25 and 30 years
$$Z = \frac{x - \mu}{\sigma} = \frac{25 - 35}{5} = \frac{-10}{5} = -2$$
$$Z = \frac{x - \mu}{\sigma} = \frac{30 - 35}{5} = -1$$
∴ the required area lies between Z = -2 and Z = -1
Probability area corresponding to Z = -2
= 0.4772 (the z value to check from the tables is 2)
Probability area corresponding to Z = -1
= 0.3413 (the z value for this case is 1)
∴ the probability that the age lies between 25 and 30 yrs
= 0.4772 – 0.3413 (the area under the curve between the two values)
P = 0.1359
(iv). P(beyond 45 years) is determined as follows: P(x > 45)
$$Z = \frac{x - \mu}{\sigma} = \frac{45 - 35}{5} = \frac{10}{5} = +2$$
Probability corresponding to Z = 2 is 0.4772 = probability of lying between 35 and 45

∴ P(Age > 45 yrs) = 0.5000 – 0.4772
= 0.0228
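
The table look-ups above can be cross-checked, and parts (v) and (vi) of the example evaluated, with a short Python sketch. It assumes the ages are normally distributed with mean 35 and standard deviation 5, as the solution does; the helper `normal_cdf` is an illustrative name built on the standard library's `erf`.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma          # standardize, as in Z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 35, 5

between_35_40 = normal_cdf(40, mu, sigma) - normal_cdf(35, mu, sigma)  # (i)
between_30_40 = normal_cdf(40, mu, sigma) - normal_cdf(30, mu, sigma)  # (ii)
between_25_30 = normal_cdf(30, mu, sigma) - normal_cdf(25, mu, sigma)  # (iii)
beyond_45 = 1 - normal_cdf(45, mu, sigma)                              # (iv)
beyond_30 = 1 - normal_cdf(30, mu, sigma)                              # (v)
below_25 = normal_cdf(25, mu, sigma)                                   # (vi)

for label, value in [("i", between_35_40), ("ii", between_30_40), ("iii", between_25_30),
                     ("iv", beyond_45), ("v", beyond_30), ("vi", below_25)]:
    print(f"({label}) {value:.4f}")
```

Part (v) corresponds to 0.5000 + 0.3413 = 0.8413 from the tables, and part (vi) to 0.5000 – 0.4772 = 0.0228.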

12.2.2.1 THE EXPONENTIAL DISTRIBUTION

The exponential distribution is of particular importance because of the wide-ranging
nature of the practical situations in which it is used.

Examples
1. The length of time until an electronic device fails
2. The time required to wait for the first emission of a particle from a radioactive
source
3. The length of time between successive accidents in a large factory
Assume that a probability density function f(x) is valid between the values a and b, then

(i) $\int_{a}^{b} f(x)\,dx = 1$, i.e. the area under the curve is equal to 1
(ii) The mean of the distribution $E(x) = \int_{a}^{b} x f(x)\,dx$
(iii) The variance of the distribution = $E(x^{2}) - [E(x)]^{2}$
Where $E(x^{2}) = \int_{a}^{b} x^{2} f(x)\,dx$

Example of continuous probability distribution function


The distribution of a random variable x has a probability density function f(x) given by
f(x) = kx for 0 ≤ x ≤1
f(x) = 0 elsewhere
Where k is constant
Required:
i. Show that the value of k is 2
ii. Find the mean of f(x)
iii. Find the variance of f(x)

Solution

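i) The total area under f(x) must equal 1:
$$\int_{0}^{1} f(x)\,dx = 1$$
$$\int_{0}^{1} kx\,dx = k\left[\tfrac{1}{2}x^{2}\right]_{0}^{1} = \tfrac{k}{2}(1 - 0) = 1$$
∴ k = 2

ii) Mean
$$E(x) = \int_{0}^{1} x f(x)\,dx = \int_{0}^{1} 2x^{2}\,dx = 2\left[\tfrac{1}{3}x^{3}\right]_{0}^{1} = \tfrac{2}{3}(1 - 0) = \frac{2}{3}$$

iii) Variance
$$\text{Variance} = E(x^{2}) - [E(x)]^{2} = \int_{a}^{b} x^{2} f(x)\,dx - (\text{Mean})^{2}$$
$$= \int_{0}^{1} x^{2}(2x)\,dx - \left(\tfrac{2}{3}\right)^{2} = 2\left[\tfrac{1}{4}x^{4}\right]_{0}^{1} - \frac{4}{9} = \frac{1}{2} - \frac{4}{9}$$
$$\text{Variance} = \frac{1}{18}$$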
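
Since every integrand in this example is a single power of x, the three results can be verified exactly. The sketch below is an illustration only; `integrate_monomial` is a hypothetical helper that applies the power rule with exact fractions.

```python
from fractions import Fraction

def integrate_monomial(coeff, power, a=0, b=1):
    """Exact integral of coeff * x**power over [a, b] using the power rule."""
    new_power = power + 1
    return Fraction(coeff) * (Fraction(b)**new_power - Fraction(a)**new_power) / new_power

# Area under x on [0, 1] is 1/2, so k * (1/2) = 1 gives k = 2.
k = 1 / integrate_monomial(1, 1)

mean = integrate_monomial(k, 2)        # E(x)   = integral of x * kx
e_x2 = integrate_monomial(k, 3)        # E(x^2) = integral of x^2 * kx
variance = e_x2 - mean**2

print(k, mean, variance)
```

The output is 2, 2/3 and 1/18, matching parts (i) to (iii).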

12.2.2 EXPONENTIAL DISTRIBUTION

Example
The mean life of an electrical component is 100 hours and its life has an exponential
distribution.
Find
a. The probability that it will last less than 60 hours
b. The probability that it will last more than 90 hours

Solution
A continuous random variable X has an exponential distribution if, for some constant k
> 0, it has the probability density function
$$f(x) = \begin{cases} k e^{-kx} & \text{for } x \ge 0 \\ 0 & \text{elsewhere} \end{cases}$$
The function f(x) is positive for all x ≥ 0 and the area under the curve is
$$\int_{0}^{\infty} f(x)\,dx = \int_{0}^{\infty} k e^{-kx}\,dx = 1$$
The mean of an exponential distribution with parameter k is $\frac{1}{k}$ and its variance is $\frac{1}{k^{2}}$

Example
The mean of an exponential distribution is 100, find;
a) P(x<60)
b) P(x>90)

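Solution
Mean = 100, thus $k = \frac{1}{100}$
$$\text{a) } P(x < 60) = \int_{0}^{60} \tfrac{1}{100} e^{-x/100}\,dx = \left[-e^{-x/100}\right]_{0}^{60} = 1 - e^{-0.6} = 0.45$$
$$\text{b) } P(x > 90) = 1 - P(x \le 90) = 1 - \int_{0}^{90} \tfrac{1}{100} e^{-x/100}\,dx = 1 - \left[-e^{-x/100}\right]_{0}^{90} = 1 - \left(1 - e^{-0.9}\right) = e^{-0.9} = 0.41$$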
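
The two exponential probabilities follow directly from the cumulative form 1 – e^(–kx), which the sketch below evaluates. The function name `exponential_cdf` is an assumption for illustration.

```python
from math import exp

def exponential_cdf(x, mean):
    """P(X <= x) for an exponential distribution with the given mean (k = 1 / mean)."""
    k = 1 / mean
    return 1 - exp(-k * x)

mean_life = 100  # hours

p_less_than_60 = exponential_cdf(60, mean_life)
p_more_than_90 = 1 - exponential_cdf(90, mean_life)

print(f"P(x < 60) = {p_less_than_60:.4f}")
print(f"P(x > 90) = {p_more_than_90:.4f}")
```

The results, about 0.4512 and 0.4066, round to the 0.45 and 0.41 obtained by integration.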

12.2.3 THE STUDENT'S T DISTRIBUTION

The Student's t distribution was presented by W. S. Gosset in 1908 under the pen name
'Student'. The t distribution is of great importance in so-called small sample tests and
is widely used in statistical inference.
The t distribution has a single parameter, known as the number of degrees of freedom. It
is denoted by the Greek letter ν (read as nu). It can be interpreted as the number of
useful items of information generated by a sample of a given size. The degrees of freedom
are the sample size less one (ν = n – 1)

12.2.3.1 Properties of t distribution
i) The t distribution ranges from – ∞ to ∞, just as the normal distribution does
ii) The t distribution, like the standard normal distribution, is bell shaped and
symmetrical around mean zero
iii) The shape of the t distribution changes as the number of degrees of freedom
changes
iv) The t distribution has a lower peak and thicker tails than the normal distribution
v) The t distribution has a greater dispersion than the standard normal distribution.
As n gets larger the t distribution approaches the normal distribution; when n =
30 the difference is very small
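
Properties (iv) and (v) can be illustrated by comparing curve heights at the centre and in the tail. The sketch below uses the standard density formula for the t distribution, which is not given in these notes and is included here as an assumption; the function names are illustrative.

```python
from math import gamma, sqrt, pi, exp

def t_pdf(t, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return exp(-z * z / 2) / sqrt(2 * pi)

for n in (5, 15, 30):
    df = n - 1  # degrees of freedom = sample size less one
    print(f"n = {n:2d}: height at 0 = {t_pdf(0, df):.4f}, height at 3 = {t_pdf(3, df):.4f}")
print(f"normal: height at 0 = {normal_pdf(0):.4f}, height at 3 = {normal_pdf(3):.4f}")
```

For small n the t curve is lower at the centre and higher in the tail than the normal curve, and the gap narrows as n grows.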

The relation between the t distribution and the standard normal distribution is shown in the
following diagram

[Figure: standard normal distribution curve compared with t distribution curves for n = 15 and n = 5; horizontal axis from -4 to 4]

Note that the t distribution has different shapes depending on the size of the sample.
When the sample is quite small the peak of the t distribution is lower than that of the normal
distribution and the tails are wider.

Assumptions of t distribution
1. The sample observations are random
2. Samples are drawn from normal distribution
3. The size of sample is thirty or less n ≤ 30

Application of t distribution
- Estimation of population mean from small samples
- Test of hypothesis about the population mean
- Test of hypothesis about the difference between two means

12.2.4 CHI SQUARE DISTRIBUTION

Chi square was first used by Karl Pearson in 1900. It is denoted by the Greek letter χ² (chi squared). It
contains only one parameter, called the number of degrees of freedom (d.f.), where the
term degrees of freedom represents the number of independent random variables that
express the chi square

Properties of Chi Square Distribution

1. Its critical values vary with the degrees of freedom. For every increase in the
number of degrees of freedom there is a new χ² distribution.
2. It possesses the additive property: when χ₁² and χ₂² are independent
and have chi square distributions with n1 and n2 degrees of freedom, χ₁² + χ₂²
will also be distributed as a chi square distribution with n1 + n2 degrees of
freedom
3. Where the degrees of freedom are 30 or less the distribution of χ² is skewed.
But for degrees of freedom greater than 30, the values of χ²
are approximately normally distributed
4. The χ² function has only one parameter, the number of degrees of
freedom.

[Figure: χ² probability curves P(x) for ν = 1, 2, 3, 4 and 5 degrees of freedom; horizontal axis: χ² from 0 to 10]

5. The χ² distribution is a continuous probability distribution which has the value
zero at its lower limit and extends to infinity in the positive direction.
A negative value of χ² is not possible because the differences between the
observed and expected frequencies are always squared
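
The additive property and the non-negativity of χ² can be illustrated with a small simulation. This sketch assumes the usual construction of a chi square value as a sum of squared independent standard normal draws, and uses the known fact that the mean of a chi square distribution equals its degrees of freedom; the seed, sample sizes and function name are arbitrary choices.

```python
import random

def chi_square_draw(df, rng):
    """One chi square value: the sum of df squared standard normal draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

rng = random.Random(42)          # fixed seed so the run is repeatable
n1, n2, trials = 3, 5, 20_000

# Sum of two independent chi squares (n1 and n2 d.f.) versus one with n1 + n2 d.f.
sums = [chi_square_draw(n1, rng) + chi_square_draw(n2, rng) for _ in range(trials)]
direct = [chi_square_draw(n1 + n2, rng) for _ in range(trials)]

mean_of_sums = sum(sums) / trials
mean_direct = sum(direct) / trials

print(f"mean of summed values: {mean_of_sums:.2f}")
print(f"mean of direct values: {mean_direct:.2f}")
print(f"smallest value seen:   {min(sums):.4f}")
```

Both averages should land close to n1 + n2 = 8, and no simulated value is negative.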

12.2.5 F DISTRIBUTION OR VARIANCE RATIO DISTRIBUTION

It was developed by R. A. Fisher in 1924 and is usually defined in terms of the ratio of the
variances of two normally distributed populations.
It is used to test the hypothesis that two normally distributed populations have
equal variances.
The F distribution ratio of the variances between two normally distributed populations may be
expressed as
$$F = \frac{s_{1}^{2} / \sigma_{1}^{2}}{s_{2}^{2} / \sigma_{2}^{2}}$$
With ν1 = n1 – 1 and ν2 = n2 – 1 degrees of freedom

Where the normal population means are unknown and
n1 – sample size of independent random sample 1
n2 – sample size of independent random sample 2
s1² – sample variance of sample 1
s2² – sample variance of sample 2
σ1² – population variance of population 1
σ2² – population variance of population 2

s1² and s2² are given by
$$s_{1}^{2} = \frac{\sum (x_{1} - \bar{x}_{1})^{2}}{n_{1} - 1} \quad \text{as the unbiased estimator of } \sigma_{1}^{2}$$
$$s_{2}^{2} = \frac{\sum (x_{2} - \bar{x}_{2})^{2}}{n_{2} - 1} \quad \text{as the unbiased estimator of } \sigma_{2}^{2}$$

If σ1² = σ2² then the statistic
$$F = \frac{s_{1}^{2}}{s_{2}^{2}} = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$
follows the F distribution with n1 – 1 and n2 – 1 degrees of freedom. The F distribution depends on the
degrees of freedom ν1 for the numerator and ν2 for the denominator. It has parameters
ν1 and ν2 such that different values of ν1 and ν2 give different distributions.

Properties of F Distribution
1. The shape of the F distribution depends upon the number of degrees of
freedom
2. The mean and variance of the F distribution are
$$\text{Mean} = \frac{\nu_{2}}{\nu_{2} - 2} \quad \text{for } \nu_{2} > 2$$
$$\text{Variance} = \frac{2\nu_{2}^{2}(\nu_{1} + \nu_{2} - 2)}{\nu_{1}(\nu_{2} - 2)^{2}(\nu_{2} - 4)} \quad \text{for } \nu_{2} > 4$$
3. The F distribution is positively skewed and its skewness decreases with
increases in ν1 and ν2
4. The value of F must be positive or zero since variances are squares and can
never assume negative values
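
The mean and variance formulas, and the variance ratio itself, can be sketched in Python as follows. The two samples are made-up numbers used purely for illustration, and the function names are hypothetical.

```python
from statistics import variance  # sample variance with divisor n - 1

def f_mean(v2):
    """Mean of the F distribution; defined only for v2 > 2."""
    if v2 <= 2:
        raise ValueError("mean is defined only for v2 > 2")
    return v2 / (v2 - 2)

def f_variance(v1, v2):
    """Variance of the F distribution; defined only for v2 > 4."""
    if v2 <= 4:
        raise ValueError("variance is defined only for v2 > 4")
    return 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2)**2 * (v2 - 4))

def variance_ratio(sample_a, sample_b):
    """F statistic with the larger sample variance in the numerator."""
    s_a, s_b = variance(sample_a), variance(sample_b)
    return max(s_a, s_b) / min(s_a, s_b)

# Hypothetical samples, not taken from the notes
sample_1 = [20, 22, 19, 24, 25]
sample_2 = [21, 20, 22, 21, 23]

f_stat = variance_ratio(sample_1, sample_2)
print(f"F = {f_stat:.2f}")
print(f"mean = {f_mean(10):.3f}, variance = {f_variance(5, 10):.3f}")
```

Placing the larger variance on top guarantees a ratio of at least 1, in line with the assumption that the ratio of the two sample variances should be equal to or greater than 1.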

Assumptions
a) All sample observations are randomly selected and independent
b) The total variance of the various sources of variance should be additive.
c) The ratio of s1² to s2² should be equal to or greater than 1
d) The population for each sample must be normally distributed with identical mean
and variance
e) The F value can never be negative

REVISION QUESTIONS AND TABLES
SET 1

QUESTION ONE

a) Discuss any five uses of statistics to a business organization

b) Discuss any two instruments of data collection.

c) Write short notes on the terms Sample and Census

d) Distinguish between Dispersion and Skewness.

QUESTION TWO

The table below gives the monthly rent paid by 160 employees in Company X

Rent (KES)          No. of employees
10,000 – 15,999     20
16,000 – 21,999     34
22,000 – 27,999     45
28,000 – 33,999     23
34,000 – 39,999     22
40,000 – 45,999     16

i) Find the lower and upper quartiles

ii) Determine the inter quartile range and mean deviation of monthly rent.

iii) Determine the mean, variance and standard deviation

QUESTION THREE

The table below shows the distribution of the weight of students in a class of 100.

Weight (Kg) No. of students

30 – 39 5
40 – 49 10
50 – 59 35
60 – 69 28
70 – 79 13
80– 89 9

Required:
(a) Construct an ogive
(b) Estimate the median
(c) Estimate the 7th decile
(d) Estimate the 90th percentile

QUESTION FOUR

Box A contains 4 defective and 6 non-defective items. Box B contains 3 defective and 7
non-defective items. A box is picked at random and then an item is drawn from it.
Required:
i) Draw a tree diagram to represent the information.
Find the probability:
ii) Of drawing a defective item.
iii) Of drawing a non-defective item.
iv) That box A was picked given that a defective item has been drawn.
v) That box B was picked given that a non-defective item was drawn.

QUESTION FIVE

a) State and explain the characteristics of a normal curve.

b) The prices in Kshs per Kg and consumption in tonnes of some retail products in a
certain region in 2011 and 2012 were as shown in the table below;

Product   2011 Price   2011 Quantity   2012 Price   2012 Quantity
A         36           99              39           94
B         79           11              89           9
C         44           15              40           18
D         4            1100            5            1200

Using 2011 as the base year, calculate the following

i) Paasche price index

ii) Laspeyres price index

iii) Fishers ideal index

QUESTION SIX

The ages of patients in a hospital were recorded as follows :

22 73 52 65 34 46 25 36 43 61 72 23 50 46 64 72 68 55 61 75 67 27
58 66 70 49 60 77 64 21 53 35 60 57 66 73 75 78 79 37 48 45 61 63
71 67 76 68 59

a) Organize the data in a frequency distribution table with 20 – 29 as the first class.

b) Using Karl Pearson's Coefficient of Skewness determine and comment on the
skewness of the distribution.

SET 2

QUESTION ONE
High response rate enhances both the validity and reliability of the research findings.
Discuss how you would enhance response rate in a survey.

QUESTION TWO
Differentiate between the following;
i) Stratified sampling and cluster sampling
ii) Systematic sampling and snowball sampling

QUESTION THREE
The following set of data represents a frequency distribution of the number of foreign
exchange transactions conducted by a Bank over 250 working days

Number of transactions   0 – 4   5 – 9   10 – 14   15 – 19   20 – 24   25 – 29   30 – 34
Frequency (f)            5       55      150       18        12        7         3

Required;
a) The Arithmetic Mean
b) The Median
c) The Mode
d) The Variance and standard deviation
e) The coefficient of skewness

SET 3
QUESTION ONE

a) Differentiate between the following terms:
i) Mutually exclusive and collectively exhaustive events
ii) Sample and Population.
iii) Dispersion and Kurtosis
b) Discuss any five limitations of statistics in business.

QUESTION TWO

a) The frequency distribution below shows the mass of some flowers produced in a
farm off Limuru road in the month of October 2012.

Kg (X)          30-39   40-49   50-59   60-69   70-79
Frequency (f)   7       14      22      13      6

i) For the above distribution, compute the Quartile coefficient of skewness and
comment on the results.
ii) Determine the inter quartile range

b) With the help of well labeled diagrams, differentiate between symmetrical,
positively and negatively skewed distributions.

QUESTION THREE

The marks for 40 Students in a statistics exam were recorded as follows:

24 72 81 96 34 83 48 38 46 25
28 36 79 86 27 62 73 55 44 24
25 54 35 75 14 64 55 67 63 66
36 72 49 52 54 48 49 52 42 46

a) Construct a frequency distribution table with the class interval 10-20 being
the first class.
b) Represent the above data on a histogram
c) Represent the above data on a frequency polygon
d) Briefly explain the distinction between relative frequency and frequency
distribution.

QUESTION FOUR

The table below shows the distribution of the weight of students in a class of 100.

Weight (Kg)       40-44   45-49   50-54   55-59   60-64   65-69   70-74   75-79
No. of students   6       8       24      21      15      10      9       7

Compute:
(a) Mean absolute deviation
(b) Median
(c) Mode
(d) Standard deviation

QUESTION FIVE

a) What is a tree diagram?

b) A manufacturer makes ball pens. The manufacturer employs an inspector to
check the quality of his product. The inspector tested a random sample of the
pens from a large batch and calculated the probability of any pen being
defective as 0.25. Peter buys two of the pens made by the manufacturer.
i) Draw a tree diagram to illustrate the information
ii) Calculate the probability that both pens are defective.
iii) Calculate the probability that exactly one of the pens is defective.

QUESTION SIX

a) Discuss the properties of a good measure of variation

b) A group of 50 students was asked which of three daily newspapers they read:
Nation, Standard and Star. The results showed that 25 read Nation, 16 read
Standard and 14 read Star; 5 read both Nation and Standard, 4 read Standard and
Star, 6 read Nation and Star, and 2 read all three.
i) Illustrate these data on a Venn diagram.
ii) Find the probability that a person selected at random from this group reads;
a) At least 1 of the newspapers.
b) None of the newspapers.
c) Only 1 of the newspapers.
d) Only the Nation.

QUESTION SEVEN

a) Explain uses of indices.

b) The prices in Kshs per Kg and consumption in tonnes of some retail products in a
certain region in 2010 and 2011 were as shown in the table below;

Product   2010 Price   2010 Quantity   2011 Price   2011 Quantity
A         2            40              5            80
B         4            20              8            50
C         4            10              4            25
D         5            20              10           60
E         8            75              12           90

Using 2010 as the base year, calculate the following

i) Paasche price index

ii) Laspeyres price index

iii) Fishers ideal index

MATHEMATICAL TABLES

Student's t-distribution

The table gives the values of t(α; ν), where
Pr(T > t(α; ν)) = α, with ν degrees of freedom

ν \ α 0.1 0.05 0.025 0.01 0.005 0.001 0.0005

1 3.078 6.314 12.706 31.821 63.657 318.310 636.620


2 1.886 2.920 4.303 6.965 9.925 22.326 31.598
3 1.638 2.353 3.182 4.541 5.841 10.213 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.893 6.869

6 1.440 1.943 2.447 3.143 3.707 5.208 5.959


7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587

11 1.363 1.796 2.201 2.718 3.106 4.025 4.437


12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073

16 1.337 1.746 2.120 2.583 2.921 3.686 4.015


17 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.552 3.850

21 1.323 1.721 2.080 2.518 2.831 3.527 3.819


22 1.321 1.717 2.074 2.508 2.819 3.505 3.792
23 1.319 1.714 2.069 2.500 2.807 3.485 3.767
24 1.318 1.711 2.064 2.492 2.797 3.467 3.745
25 1.316 1.708 2.060 2.485 2.787 3.450 3.725

26 1.315 1.706 2.056 2.479 2.779 3.435 3.707


27 1.314 1.703 2.052 2.473 2.771 3.421 3.690
28 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 1.311 1.699 2.045 2.462 2.756 3.396 3.659
30 1.310 1.697 2.042 2.457 2.750 3.385 3.646

40 1.303 1.684 2.021 2.423 2.704 3.307 3.551


60 1.296 1.671 2.000 2.390 2.660 3.232 3.460
120 1.289 1.658 1.980 2.358 2.617 3.160 3.373
∞ 1.282 1.645 1.960 2.326 2.576 3.090 3.291

Z DISTRIBUTION TABLES

Table 1 AREAS UNDER THE STANDARD NORMAL CURVE

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
