SMS 202
Course Editor:
March, 2014
BUSINESS STATISTICS CONTENTS
Introduction
What You Will Learn In This Course
BUSINESS STATISTICS (BHM202)
Course Aims
Course Objectives
Working Through This Course
Course Materials
Study Units
Set Textbooks
Assignment File
Presentation Schedule
Assessment
Tutor-Marked Assignments (TMAs)
Final Examination And Grading
Course Marking Scheme
Course Overview
How To Get The Most From This Course
Tutors And Tutorials
Summary.
INTRODUCTION:
The course consists of eighteen units that cover the basic concepts and
principles of statistics and the decision-making process, forms of data, methods
of data estimation, summarizing data, graphical presentation of data, index
numbers and measures of dispersion, coefficient of correlation and
regression analysis, some elements of hypothesis tests and time series
analysis, and distributions of both discrete and continuous random variables.
of the course.
This Course Guide tells you what the course is about, what course
materials you will be using and how you can work your way through these
materials. It suggests some general guidelines for the amount of time you
are likely to spend on each unit of the course in order to complete it
successfully. It also gives you some guidance on your tutor-marked
assignments. Detailed information on tutor-marked assignments is found
in the separate file.
There are likely going to be regular tutorial classes that are linked to the
course. It is advised that you should attend these sessions. Details of the
time and locations of tutorials will be communicated to you by National
Open University of Nigeria (NOUN).
Course Aims
Course Objectives
To achieve the aims set above, the course sets overall objectives; in
addition, each unit also has specific objectives. The unit objectives are
included at the beginning of each unit; you should read them before you
start working through the unit. You may want to refer to them during
your study of the unit to check on your progress. You should always look
at the unit objectives after completing a unit. In this way you can be sure
you have done what was required of you by the unit.
To complete this course, you are required to read the study units, read set
books and other materials on the course.
Course Materials
(1) Course Guide
(2) Study Units
(3) Textbooks
(4) Presentation Schedule.
Study Units
The course is in four modules and eighteen Study
Units as follows:
Module 1: Role and Concepts of Statistics
Unit 1: Role of Statistics (Application of Statistics)
Unit 2 Measurement of Variables
The first four units concentrate on the roles and concepts of statistics. This
constitutes Module 1. The next four units, Module 2, concentrate on index
numbers and research in management. Module 3 deals with correlation and
regression analysis. Module 4 teaches the principles underlying the applications
of some important probability distributions, and Module 5 teaches the principles
underlying the applications of some important tests of hypothesis and theory.
Each unit consists of one week's direction for study, reading material, other
resources and summaries of key issues and ideas. The units direct you to work
on exercises related to the required readings.
Each unit contains a number of self-tests. In general, these self-tests question
you on the material you have just covered or require you to apply it in
some way, and thereby help you to assess your progress and to reinforce
your understanding of the material. Together with tutor-marked assignments,
these exercises will assist you in achieving the stated learning objectives of the
individual units and of the course.
Set Textbooks
It is advisable that you have some of the following books:
ONWE J.O. NOUN TEXT BOOK, ENT 321: Quantitative Methods for Business Decisions
Assessment
There are two types of assessment in this course. First are the tutor-marked
assignments (TMAs); second, there is a computer-based examination.
In tackling the assignments, you are expected to apply the information, knowledge
and techniques gathered during the course. The assignments must be submitted
to your tutor for formal assessment in accordance with the deadlines
stated in the Presentation Schedule and the Assignment File on your NOUN
portal. The work you submit to your tutor for assessment will count for 30% of
your total course mark.
At the end of the course, you will need to sit for a final computer-based
examination of two hours' duration at a designated centre. This examination
will count for 70% of your total course mark.
Tutor-Marked Assignments (TMAs)
There are four tutor-marked assignments in this course. You will submit
all the assignments. You are encouraged to work all the questions
thoroughly. Each assignment counts 12.5% toward your total course mark.
Assignment questions for the units in this course are contained in the
Assignment File. You will be able to complete your assignments from
the information and materials contained in your set books, reading
and study units. However it is desirable in all degree level education to
demonstrate that you have read and researched more widely than the
When you have completed each assignment, send it, together with a TMA
form, to your tutor. Make sure that each assignment reaches your tutor on
or before the deadline given in the Presentation File. If for any reason, you
cannot complete your work on time, contact your tutor before the
assignment is due to discuss the possibility of an extension. Extensions
will not be granted after the due date unless there are exceptional
circumstances.
The final examination will be of three hours' duration and have a value of
70% of the total course grade. The examination will consist of questions
which reflect the types of self-tests, practice exercises and tutor-marked
problems you have previously encountered. All areas of the course will be
assessed.
Use the time between finishing the last unit and sitting the examination to
revise the entire course. You might find it useful to review your self-tests,
tutor-marked assignments and comments on them before the examination.
The final examination covers information from all parts of the course.
SMS STATISTICS
CONTENTS
Module 1: Role and Concepts of Statistics
Unit 1: Role of Statistics (Application of Statistics)
Unit 2: Measurement of Variables
Unit 3: Measurement of Dispersion, Skewness and Kurtosis
Unit 4: Decision Analysis and Administration
The four units that constitute this module are statistically linked. By the end of this module,
you should be able to list, differentiate and link these common statistical functions, as
well as identify and use them to solve related statistical problems. The units to be studied
are:
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Definition of Statistics
3.2 Role of Statistics
3.3 Basic Concept in Statistics
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 Introduction
You will realize that the activities of man and those of the various organizations,
that will often be referred to as firms, continue to increase. This brings an
increase in the need for man and the firms to make decisions on all these
activities. The need for the quality and the quantity of the information
required to make the decisions increases also. The management of any firm
requires scientific methods to collect and analyze the mass of information it
collects to make decisions on a number of issues. Such issues include the sales
over a period of time, the production cost and the expected net profit. In this
regard, statistics plays an important role as a management tool for making
decisions.
2.0 Objectives
From all these definitions, you will realize that statistics is concerned with
numerical data. Examples of such numerical data are the heights and weights of
pupils in a primary school when evaluating the nutritional well-being of the
pupils, and the accident fatalities on a particular road for a period of time.
You should also know that, besides numerical data, there are non-numerical
data such as the taste of brands of biscuits, the greenness of some
vegetables and the texture of some joints of a wholesale cut of meat.
Non-numerical data cannot be subjected to statistical analysis unless they are
transformed to numerical data. To transform the greenness of vegetables to
numerical data, a five-point scale for measuring the colour can be developed,
with 1 indicating very dull and 5 indicating very green.
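The transformation of greenness to a five-point scale can be sketched in a few lines of Python. This is only an illustration: the text fixes the end points (1 = very dull, 5 = very green), while the three intermediate labels and the sample observations below are assumptions made for the example.

```python
# Five-point colour scale. Only the end points (1 and 5) come from the text;
# the labels for 2, 3 and 4 are hypothetical.
GREENNESS_SCALE = {
    "very dull": 1,
    "dull": 2,
    "moderate": 3,
    "green": 4,
    "very green": 5,
}

# Hypothetical non-numerical observations recorded for four vegetables.
observations = ["very green", "dull", "moderate", "very green"]

# Coding: replace each label with its number so that it can be analysed.
codes = [GREENNESS_SCALE[label] for label in observations]
print(codes)
```

Once the observations are coded in this way, they can be counted, ranked and summarized like any other numerical data.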
You will realize that statistics is useful in all spheres of human life. A woman
with a given amount of money, going to the market to purchase foodstuff for
the family, takes decision on the types of food items to purchase, the quantity
and the quality of the items to maximize the satisfaction she will derive from the
purchase. For all these decisions, the woman makes use of statistics.
Government uses statistics during a census. The various forms sent by the
government to individuals and firms on annual income, tax returns, prices, costs,
output and wage rates generate a lot of statistical data for the use of the
government.
Business uses statistics to monitor the various changes in the national economy
for the various budget decisions. Business makes use of statistics in production,
marketing, administration and in personnel management.
Statistics is also used extensively to control and analyze stock levels such as
minimum, maximum and reorder levels. It is used by business in market
research to determine the acceptability of a product that will be demanded at
various prices by a given population in a geographical area. Management also
uses statistics to make forecasts about the sales and labour cost of a firm.
Management uses statistics to establish a mathematical relationship between two or
more variables for the purpose of predicting a variable in terms of others. For
the conduct and analysis of biological, physical, medical and social research,
we use statistics extensively.
Let us quickly define some of the basic concepts you will continue to come
across in this course.
numerical form or that cannot be counted. Examples of this are colours
of fruits, taste of some brands of a biscuit.
• Discrete Variable: This is the variable that can only assume whole
numbers. Examples of these are the number of Local Government
Council Areas of the States in Nigeria, number of female students in
the various programmes in the National Open University. A discrete
variable has "interruptions" between the values it can assume. For
instance, between 1 and 2 there are infinitely many values, such as
1.1, 1.11, 1.111, 1.1111 and so on, which a discrete variable cannot assume.
These gaps are called interruptions.
• Sample: This is the part of the population that is selected for a study. In
studying the income distribution of students in the National Open
University, the incomes of 1000 students selected for the study, from
the population of all the students in the Open University will constitute
the sample of the study.
Exercise 1.1
4.0 Conclusion
In this unit you have learned a number of important issues that relate to the
meaning and roles of statistics. The various definitions and examples of concepts
given in this unit will assist tremendously in the studying of the units to follow.
5.0 Summary
What you have learned in this unit concerns the meaning and roles of statistics,
and the various concepts that are important to the study of statistics.
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Definition of Variable
3.2 Measurement of Variables
3.3 Variance of Binomial Distribution
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
A variable can be used under the following conditions:
Information that is not numeric in nature is called a qualitative variable. For instance, information
on colour of the skin, colour of the eye or hair, level of education, si status, and other
qualitative categories such as building types are qualitative variables. Qualitative variables can be assigned
numerical values. This assignment of numerical values to information is called coding. Also,
these qualitative data can be arranged in order of importance and values assigned to them in that order.
This is called ranking.
2.0 OBJECTIVES
The aim of this unit is to enable students to understand the meaning of a variable and the instances
when it is applicable.
Also, rain is an object, but the amount of rain is a variable. Other variables include height, sex,
weight, colour of the skin, hair colour, genotype, blood group, maci religious affiliation, level
of education attained, place of residence of a person, strength of Dangote cement, tensile
strength, number of bags of cement in the store, number of bags of cement used in the site per day,
expenditure, income of household per r degree of satisfaction, level of intelligence, etc.
Therefore, any characteristic of an object that can vary in time and space is called a variable.
Statistical raw data are generated or provided by these variables. That is, the values attached to the
variables constitute statistical data. A single value of a variable is called an observation, an item, a
score or a case.
Quantitative variables can be classified into two major types, viz. discrete and continuous variables.
3.1.2. Discrete variables are variables whose values are whole numbers or integers. They have no
fractional part; they are countable or finite. Examples of discrete variables include housing
units, number of students in a class, number of goals scored in a football match, number of cars sold, etc.
3.1.3. Continuous variables are variables that assume any value within an interval or range and have
the property of infinite divisibility. They can assume fractional values. Examples are weight,
height, cost, scores, income, breaking strength, etc.
There are four measurement scales available as instruments for measuring variables. These scales easily
identify variables. The scales are nominal, ordinal, interval and ratio.
Nominal scale
This scale groups the objects into distinct categories to facilitate referencing. A label is attached
to each distinct category. Examples of nominal scale variables include sex, religious or
party affiliation, genotype, blood group, place of residence, etc. Also, we can label the various
categories of the nominal variables with numbers (or codes). When this is done, the number or code is
a mere label or identification mark, which does not permit any arithmetic operation. For instance,
marital status may be categorized as married, separated, divorced and never married. If we assign 1 to
married, 2 to separated, 3 to divorced and 4 to never married, these numbers are mere codes. The
numbers do not indicate order of importance of the various categories, and the sum of 1 and 3
cannot produce category 4. This is the lowest scale of measurement.
Ordinal scale
This scale ranks or orders the mutually exclusive categories of the variables according to the
importance attached to each category. This scale has all the properties of the nominal scale
plus the additional property of ordering or ranking the categories. Examples are: a teacher
rating his students according to their performance as A, B, C, D, E, and F or 1st, 2nd, 3rd; income
groups of individuals classified as high, medium, and low; and classification of a city according to
high, medium and low density of population concentration. The numbers assigned to each
variable category only help to order or rank the observations in ascending or descending order.
Many statistical operations that are based on ranking or rank ordering are permissible under
this scale. Examples of such statistical techniques are Spearman’s rank correlation
coefficient, the Wilcoxon rank-sum test, the signed rank test, etc. This scale is higher than the nominal
scale.
Interval scale
This scale has the combined properties of the nominal and ordinal scales plus the additional
property of measuring the distance or interval between two measurements. This scale gives
information on how much one category is more or less than the other. Examples are age in
years, income, pressure, and temperature. This scale has no absolute zero. That is, the
selected zero point in this scale is arbitrary. That a student scored zero percent in an examination
does not mean that he does not know anything in that course. Interval variables are
quantitative and may be discrete or continuous. As such, the arithmetic operations of addition
and subtraction are permitted. Many statistical procedures are permissible on this scale: the
mean, standard deviation, product moment correlation coefficient and other statistical
inferences are possible on this scale.
Ratio scale
This scale has all the properties of the nominal, ordinal and interval scales including the
additional property of having an absolute zero point. This is the highest level of
measurement. Examples are measurement of height, weight, volume, price of an item, votes
scored in an election, etc. Many statistical procedures are available for ratio scale data.
Note that the scale of measurement of variables determines the type of statistical tool to be
employed.
4.0 CONCLUSION
In probability theory and statistics, the binomial distribution is the discrete probability
distribution of the number of successes in a sequence of n independent yes/no experiments,
each of which yields success with probability p. Such a success/failure experiment is also
called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a
Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of
statistical significance. The binomial distribution is frequently used to model the number of
successes in a sample of size n drawn with replacement from a population of size N. If the
sampling is carried out without replacement, the draws are not independent and so the
resulting distribution is a hypergeometric distribution, not a binomial one. However, for N
much larger than n, the binomial distribution is a good approximation, and widely used.
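To see how close the binomial approximation is when N is much larger than n, you can compare the two distributions directly. The sketch below uses only the Python standard library. The sample size (n = 10), success probability (p = 0.3) and population size (N = 10,000) are illustrative assumptions, not values from this unit.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, N, K, n):
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustrative (assumed) values: a sample of 10 from a population of 10,000
# in which 30% of the items count as a "success".
n, p = 10, 0.3
N = 10_000
K = int(N * p)

for k in range(4):
    print(k, round(binomial_pmf(k, n, p), 4), round(hypergeometric_pmf(k, N, K, n), 4))
```

The two columns printed for each k should agree closely, which is the sense in which the binomial distribution approximates sampling without replacement from a large population.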
5.0 SUMMARY
In this unit you have learned the meaning of variables and the
measurement of variables. In summary, a variable is measured on one of
the following scales:
1. The Ratio Scale.
2. Nominal Scale.
3. Ordinal Scale.
4. Interval Scale.
The scale on which a variable is measured determines the statistical tools that can
meaningfully be applied to it.
6.0 TUTOR-MARKED ASSIGNMENT
1. What is a variable? Distinguish between quantitative and qualitative variables, discrete and
continuous variables.
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Measurement of Dispersion
3.2 Measure of Skewness
3.3 Kurtosis
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References / Further Reading
1.0 INTRODUCTION
The second most important characteristic which describes a set of data is the amount of
variation, scatter, or spread in the data. In this chapter, we discuss in detail the various
measures of dispersion and skewness. The purpose of these measures is to amplify the
imperfect summary of any statistical distribution usually provided by the three measures of
averages commonly used: the mean, the median, and the mode. These averages are
inherently unsatisfactory because no single measure of average can tell you everything about
a distribution, and the wider the dispersion of a given data around the average, the less
satisfactory the average becomes. In order to improve your understanding of population
averages, you need to know how wide the dispersion is around the average, and whether it is
symmetrical (un-skewed) or asymmetrical (skewed).
The first set of measures to be discussed here are measures of dispersion, and the second set
measures of skewness.
Fig. 1: Normal Curve
2.0 OBJECTIVE
The main aim of this unit is to ensure that students properly understand the measurement of
dispersion and skewness, appreciate its applicability in day-to-day business and scientific life,
and are able to use it as appropriate in practical statistical studies.
(d) Variance
Variation or dispersion measures the degree of uniformity of the observations in
a given set of data. The greater the variation, the less uniform the observations in a
given set of data.
The Range
The Range (R) of a given set of ungrouped data can be determined from an ordered array as
the difference between the highest observation and the lowest observation in a distribution.
If
Xh = Highest observation
XL = Lowest observation
Then, R = Xh – XL
For example, with a highest observation of 18 and a lowest observation of 2,
R = 18 – 2 = 16.
Unlike the range, the quartile deviation is not affected by extreme values or items. Quartiles are the
boundaries separating the items in a given distribution or set of data into quarters.
There are, therefore, three quartiles: the lower quartile (at the 25 percent mark); the median
(at the 50 percent mark); and, the upper quartile (at the 75 percent mark). To compute the
quartiles of ungrouped data, you simply use:
0.25n for the lower quartile
0.5n for the median quartile
0.75n for the upper quartile
where n is the number of items.
Example
Table 3.1
Group       Frequency (f)
21 – 30     7
31 – 40     11
41 – 50     14
51 – 60     8
61 – 70     5
Total       45
Table 3.1 indicates that there are 45 items or observations ( ie. total number of employees or
sum of the frequencies, f).
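Lower quartile: The lower quartile item is the 0.25 × 45 = 11.25th item. Since, according to table 3.1,
there are 7 items in the first group (i.e., the group of 21 – 30), the quartile item is the
(11.25 – 7) = 4.25th item out of 11 in the second group. Thus,
\[ Q_1 = 30 + \frac{4.25}{11} \times 10 = 30 + 3.86 = 33.86 \]
= 34 units approximately.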
In a similar process, the value of the median and upper quartiles can be determined, thus:
Value of Median quartile: The 22.5th item in the distribution is in the 41 – 50 group and is
the (22.5 – 18) = 4.5th item out of 14 in the group (note that the figure 18 is the cumulative
frequency of the first and second groups, and the figure 10 appearing in the calculations is the
class interval of the distribution). The value of the median quartile (Q2) is therefore:
\[ Q_2 = 40 + \frac{4.5}{14} \times 10 = 40 + 3.21 = 43.21 \]
= 43 units approximately.
Value of the Upper quartile (Q3): The 33.75th item in the distribution is in the fourth group, the
group of (51 – 60). Since there are 32 items in the first three groups (the cumulative
frequency), the upper quartile item is the (33.75 – 32) = 1.75th item out of 8 in the fourth group.
The value of the upper quartile is therefore:
\[ Q_3 = 50 + \frac{1.75}{8} \times 10 = 50 + 2.19 = 52.19 \]
= 52 units approximately.
The quartile deviation referred to as the semi-interquartile range is defined as one-half the
difference between the upper quartile and the lower quartile. Thus,
\[ \text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{52 - 34}{2} = 9 \text{ units} \]
The distribution in table 3.1 can then be described as having a median value of 43 units and a
quartile deviation around the median value of 9 units.
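The quartile calculations for table 3.1 can be checked with a short Python sketch. It follows the same convention as the worked example, measuring each group from the upper limit of the group before it (30 for the 31 – 40 group, and so on); the function name and the list of starting values are assumptions made for the illustration.

```python
def grouped_quantile(bases, freqs, width, fraction):
    """Interpolate the value of the (fraction * n)th item in grouped data."""
    n = sum(freqs)
    position = fraction * n      # e.g. 0.25 * 45 = 11.25
    cumulative = 0               # number of items in all earlier groups
    for base, f in zip(bases, freqs):
        if cumulative + f >= position:
            # Share of the way into this group, scaled by the class interval.
            return base + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

# Table 3.1: groups 21-30, 31-40, 41-50, 51-60, 61-70.
bases = [20, 30, 40, 50, 60]     # starting value used for each group
freqs = [7, 11, 14, 8, 5]

q1 = grouped_quantile(bases, freqs, 10, 0.25)
q2 = grouped_quantile(bases, freqs, 10, 0.50)
q3 = grouped_quantile(bases, freqs, 10, 0.75)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 2), round(q2, 2), round(q3, 2), round(quartile_deviation, 2))
```

Because the sketch keeps the unrounded quartiles, its quartile deviation comes out slightly above the 9 units obtained from the rounded values 52 and 34.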
\[ MD = \frac{\sum |X - \bar{X}|}{n} \]
where $\sum |X - \bar{X}|$ = sum of the absolute values of deviations from the arithmetic mean
n = number of observations
\[ \bar{X} = \frac{\sum X}{n} = \frac{67}{7} = 9.57 \]
By tabulation,
X      $(X - \bar{X})$    $|X - \bar{X}|$
2      -7.57              7.57
5      -4.57              4.57
8      -1.57              1.57
9      -0.57              0.57
12      2.43              2.43
13      3.43              3.43
18      8.43              8.43
                          $\sum |X - \bar{X}|$ = 28.57
Thus,
\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{28.57}{7} = 4.08 \]
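The mean deviation can also be computed directly, which is a useful check on the hand tabulation. The sketch below uses the seven observations of this example; the variable names are assumptions made for the illustration.

```python
# The seven observations used in this example.
data = [2, 5, 8, 9, 12, 13, 18]

n = len(data)
mean = sum(data) / n                              # 67 / 7

# Absolute deviation of each observation from the mean.
abs_deviations = [abs(x - mean) for x in data]

mean_deviation = sum(abs_deviations) / n
print(round(mean, 2), round(mean_deviation, 2))
```

Note that the deviation for X = 5 is 5 – 9.57 = –4.57, so the absolute deviations sum to about 28.57 and the mean deviation is about 4.08.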
The Variance
The Variance for a given set of an ungrouped data can be defined by:
\[ \text{Variance} = S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1} \]
where X represents the numerical values of the given set of an ungrouped data.
X = 2,5,8,9,12,13,18
and by tabulation:
X        $X^2$
2        4
5        25
8        64
9        81
12       144
13       169
18       324
$\sum X$ = 67     $\sum X^2$ = 811
\[ S^2 = \frac{811 - \frac{(67)^2}{7}}{7-1} = \frac{811 - 641.29}{6} = \frac{169.71}{6} = 28.285 \]
Simply stated, the standard deviation is the most useful measure of variation. It can be
defined as the square root of the variance for a given set of data.
Thus,
\[ S = \sqrt{S^2} \]
Or,
\[ S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}} \]
For the data above,
\[ S = \sqrt{S^2} = \sqrt{28.285} = 5.318 \]
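As a check on the ungrouped calculation, the same variance and standard deviation can be obtained in Python, both from the computational formula and from the standard library's `statistics` module. The variable names are assumptions made for the illustration.

```python
import math
import statistics

# The seven observations used in this example.
data = [2, 5, 8, 9, 12, 13, 18]

n = len(data)
sum_x = sum(data)                      # 67
sum_x2 = sum(x * x for x in data)      # 811

# Computational formula with n - 1 in the denominator (sample variance).
variance = (sum_x2 - sum_x**2 / n) / (n - 1)
std_dev = math.sqrt(variance)

# statistics.variance also divides by n - 1, so the two results should agree.
assert math.isclose(variance, statistics.variance(data))
print(round(variance, 3), round(std_dev, 3))
```

Any small difference from the hand-worked figures comes from rounding intermediate results to two decimal places.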
The computation of variance and standard deviation for a grouped data is illustrated with the
following example.
The Variance and Standard Deviation for a grouped data are defined by the following
formulations:
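\[ \text{Variance} = S^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1} \]
\[ \text{Standard Deviation} = S = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1}} \]
where x is the class midpoint, f is the class frequency and n = ∑f.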
Example.
The following data presents the profit ranges of 100 firms in a given industry.
Profit range    Number of firms (f)
10-15           8
16-21           18
22-27           20
28-33           12
34-39           15
40-45           17
46-51           10
Total           ∑f = n = 100
We are required to compute the variance and standard deviation of profits within the industry.
Solutions.
By definition,
\[ \text{Variance} = S^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1} \]
\[ \text{Standard Deviation} = S = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1}} \]
SUMMARY:
n = ∑f = 100
∑fx = 3044
∑fx2 = 104791
It follows that:
\[ \text{Variance} = S^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1} = \frac{104791 - 92659.36}{100-1} = \frac{12131.64}{99} = 122.54 \]
\[ \text{Standard Deviation} = S = \sqrt{122.54} = 11.07 \]
Thus, the required variance and standard deviation are 122.54 and 11.07 respectively.
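The grouped calculation can be reproduced in Python by taking x as the midpoint of each profit range. This is a sketch under that assumption, which is the one that reproduces the totals ∑fx = 3044 and ∑fx2 = 104791; the variable names are chosen for the illustration.

```python
import math

# (lower limit, upper limit, number of firms) for each profit range.
classes = [
    (10, 15, 8), (16, 21, 18), (22, 27, 20), (28, 33, 12),
    (34, 39, 15), (40, 45, 17), (46, 51, 10),
]

n = sum(f for _, _, f in classes)                                   # total frequency
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)            # sum of f * midpoint
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)    # sum of f * midpoint^2

variance = (sum_fx2 - sum_fx**2 / n) / (n - 1)
std_dev = math.sqrt(variance)
print(n, sum_fx, sum_fx2, round(variance, 2), round(std_dev, 2))
```

Working from the class midpoints in this way treats every firm in a range as if its profit were at the centre of that range, so the result is an estimate of the variance of the underlying profits.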
The coefficient of variation measures the standard deviation relative to the mean and is
computed by:
\[ CV = \frac{S}{\bar{X}} \times 100\% \]
The coefficient of variation is also useful in the comparison of two or more sets of data which
are measured in the same units but differ to such an extent that a direct comparison of the
respective standard deviations is not very helpful. As an example, suppose a potential
investor is considering the purchase of shares in one of two companies, A or B, which are
listed on the Nigerian Stock Exchange (NSE). If neither company offered dividends to its
shareholders and if both companies were rated equally high in terms of potential growth, the
potential investor might want to consider the volatility of the two stocks to aid in the
investment decision.
Now, suppose each share of stock in Company A has averaged N50 over the past months
with a standard deviation of N10. In addition, suppose that in this same time period, the price
per share for Company B’s stock averaged N12 with a standard deviation of N4. Observe
that in terms of actual standard deviations, the price of Company A’s shares seems to be more
volatile than that of Company B. However, since the average prices per share for the two
stocks are so different, it would be more appropriate for the potential investor to consider the
variability in price relative to the average price in order to examine the volatility/stability of
two stocks.
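\[ CV_A = \frac{S_A}{\bar{X}_A} \times 100\% = \frac{\text{N}10}{\text{N}50} \times 100\% = 20\% \]
\[ CV_B = \frac{S_B}{\bar{X}_B} \times 100\% = \frac{\text{N}4}{\text{N}12} \times 100\% = 33.3\% \]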
It follows that relative to the average, the share price of company B’s stock is much more
variable/unstable than that of Company A.
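The comparison of the two stocks can be sketched in Python as follows. The function name is an assumption made for the illustration; the figures are the average prices and standard deviations given for Company A and Company B.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(10, 50)   # Company A: S = N10, mean = N50
cv_b = coefficient_of_variation(4, 12)    # Company B: S = N4, mean = N12

print(round(cv_a, 1), round(cv_b, 1))
more_volatile = "B" if cv_b > cv_a else "A"
print("More volatile relative to its average price:", more_volatile)
```

Because the units (naira) cancel in the ratio, the two percentages can be compared directly even though the average prices are very different.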
The measures of skewness are generally called Pearson’s first coefficient of skewness and
Pearson’s second coefficient of skewness. Measures of skewness are used in determining
the degree of asymmetry of a distribution; a distribution which is not symmetrical is said to
be skewed.
The Pearson’s No. 1 Coefficient of skewness: The formula used in calculating Pearson’s
No. 1 coefficient is:
\[ Sk = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} \]
Notice that the mean, the mode, and the standard deviation are all expressed in the units of
the original data. When the difference between the mean and the mode is computed as a
fraction of the standard deviation (or average spread of the data around the
mean), the original units cancel out in the fraction. The result will be a coefficient of
skewness, a number which tells you the extent of the skewness in the distribution.
Example: Consider a set of data on monthly sales of a company’s product, the mean of
which was found to be N240,000; the mode found to be N135,000; and the standard
deviation found to be N85,000. The Pearson’s No. 1 Coefficient of skewness would be
calculated as follows:
\[ Sk = \frac{240{,}000 - 135{,}000}{85{,}000} = \frac{105{,}000}{85{,}000} = 1.24 \]
3.3 KURTOSIS
Example: Calculate the first four moments about the mean for the weight
distribution of the students in National Open University of Nigeria given below:
Solution:
Thus m1 = 0
M4 = 179.91
4.0 CONCLUSION
Generally, a complete absence of skewness would have a coefficient of skewness equal to
zero. In our example, since the mean was larger than the mode, we obtained a positive
coefficient of skewness to the extent of 124% of the standard deviation.
The Pearson’s No. 2 Coefficient of Skewness: This type of the Pearson’s coefficient of
skewness came as a result of the fact that a precise calculation of mode is difficult in many
distributions. Hence, Pearson’s No. 2 coefficient of skewness uses the difference between
the mean and the median of the distribution instead of the difference between the mean and
the mode. In this calculation, you have the formula:
\[ Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} \]
This formula should give you a more accurate measure of skewness than that of the
Pearson’s No. 1 formula.
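Both Pearson coefficients are simple ratios, so they are easy to sketch in Python. The function names are assumptions made for the illustration. The first call uses the monthly sales figures from the example in this unit; the median used in the second call (N200,000) is a hypothetical value, included only to show how the No. 2 formula is applied.

```python
def pearson_skewness_1(mean, mode, std_dev):
    """Pearson's No. 1 coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev

def pearson_skewness_2(mean, median, std_dev):
    """Pearson's No. 2 coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

# Monthly sales example: mean N240,000, mode N135,000, standard deviation N85,000.
sk1 = pearson_skewness_1(240_000, 135_000, 85_000)

# Hypothetical median of N200,000, assumed purely for illustration.
sk2 = pearson_skewness_2(240_000, 200_000, 85_000)

print(round(sk1, 2), round(sk2, 2))
```

A positive result from either function indicates that the mean lies above the mode or median, that is, a positively skewed distribution; a result of zero indicates no skewness.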
5.0 SUMMARY
You can now see that dispersion and skewness can be described
by the two parameters, the mean and the standard deviation. As always, the mean is the center of the
distribution and the standard deviation is the measure of the variation around the mean.
10 – 15 10
16 – 21 36
22 – 27 28
28 – 33 10
34 – 39 6
(a) The mean, modal, and median sales for the sales reps.
sales distribution
2. A distribution of data about the sales reps’ salaries per month is found to have an
arithmetic mean of N60,000, with a standard deviation of N15,000, and a coefficient of
skewness of 0.92. Explain what these terms mean in describing the distribution of the sales
reps’ salaries.
3. A certain set of data about the weight of female typists in the 25 – 32 age group gives a
mean weight of 51 kg, a standard deviation of 7.3 kg, and a median weight of 49.6 kg.
Compute and explain the coefficient of skewness
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Administrative and Decision Analysis
3.2 Certainty and Uncertainty in Decision
3.3 Expected Monetary Value Decisions
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References / Further Reading
1.0 INTRODUCTION
DECISION ANALYSIS
Decision analysis is the modern approach to decision making both in economics and in
business. It can be defined as the logical and quantitative analysis of all the factors
influencing a decision. The analysis forces decision makers to assume some active roles in
the decision-making process. By so doing, they rely more on rules that are consistent with
their logic and personal behaviour than on the mechanical use of a set of formulas and
tabulated probabilities.
2.0 Objective
The primary aim of decision analysis is to increase the likelihood of good outcomes by
making good and effective decisions. A good decision must be consistent with the
information and preferences of the decision maker. It follows that decision analysis provides
decision-making framework based on available information on the business environment, be
it a sample information, judgmental information, or a combination of both.
The art of problem solving and decision making is based on common sense. Its purpose is to ensure that
better quality decisions are made when approaching problem solutions.
There are two (2) main ways of approaching problems and obtaining solutions.
1. Analytical Thinking
2. Creative Thinking.
The above two (2) approaches may further be subdivided into various methods:
Critical Examinations
Brain Storming or Group Creativity
Analogies
Morphological Approach or Attribute Listing
Heuristic Approach
Critical Examinations: This is the logical approach. It answers questions such as What, Who,
Where, How, When and Why. The results of the ‘why’ investigation are supposed to indicate
possible alternatives or choices from which an acceptable solution may be derived.
Brain Storming: This method is based on the idea that two heads are better than one. Brain Storming
involves conference techniques by which a group of people attempts to find a solution to
specified problems by amassing all the ideas spontaneously contributed by its members. It is a
free-thinking meeting.
Orientation
Consideration
Speculation (opinion)
Recommendation
Analogies: This is the comparison of one thing with another that has similar features.
Types of Analogies
Morphological Approach or Attribute Listing: This method looks for the attributes
or qualities of the product, i.e. a comparison with the best one.
Most decision-making situations involve the choice of one among several alternative
actions. The alternative actions and their corresponding payoffs are usually known to the
decision maker in advance. A prospective investor choosing one investment from several
alternative investment opportunities, a store owner determining how many of a certain type of
commodity to stock, and a company executive making capital-budgeting decisions are some
examples of a business decision maker selecting from a multitude of
alternatives. The decision maker, however, does not know which alternative
will be best in each case, unless he/she also knows with certainty the values of the economic
variables that affect profit. These economic variables are referred to, in decision analysis, as
states of nature, as they represent different events that may occur, over which the decision
maker has no control.
The states of nature in decision problems are generally denoted by si (i = 1, 2, 3, …, k), where
k is the number of different states of nature in a given business and economic environment.
It is assumed here that the states of nature are mutually exclusive, so that no two states can be
in effect at the same time, and collectively exhaustive, so that all possible states are included
within the decision analysis.
When the state of nature, si, whether known or unknown, has no influence on the outcomes of
given alternatives, we say that the decision maker is operating under certainty. Otherwise,
he/she is operating under uncertainty.
Decision making under certainty appears to be simpler than that under uncertainty. Under
certainty, the decision maker simply appraises the outcome of each alternative and selects the
one that best meets his/her objective. If the number of alternatives is very high however,
even in the absence of uncertainty, the best alternative may be difficult to identify. Consider,
for example, the problem of a delivery agent who must make 100 deliveries to different
residences scattered over Lagos metropolis. There may literally be thousands of different
alternative routes the agent could choose. However, if the agent had only 3 stops to make,
he/she could easily find the least-cost route.
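The contrast between 3 stops and 100 stops can be illustrated with a brute-force search. The sketch below is only an illustration: the depot "D", the stops "A", "B", "C" and all cost figures are hypothetical, and the round trip back to the depot is an assumption. With n stops there are n! possible visiting orders, so checking every order is practical only for very small n.

```python
from itertools import permutations
from math import factorial

# Hypothetical travel costs between a depot "D" and three stops (symmetric).
COST = {
    ("D", "A"): 4, ("D", "B"): 6, ("D", "C"): 5,
    ("A", "B"): 3, ("A", "C"): 7, ("B", "C"): 2,
}

def leg_cost(x, y):
    """Cost of travelling between two points, in either direction."""
    return COST[(x, y)] if (x, y) in COST else COST[(y, x)]

def route_cost(order):
    """Total cost of leaving the depot, visiting the stops in order, and returning."""
    path = ("D",) + tuple(order) + ("D",)
    return sum(leg_cost(a, b) for a, b in zip(path, path[1:]))

def least_cost_route(stops):
    """Try every ordering of the stops and keep the cheapest one."""
    return min(permutations(stops), key=route_cost)

best = least_cost_route(["A", "B", "C"])
print("Best order:", best, "cost:", route_cost(best))

# Number of possible visiting orders grows as n!.
print("Orders for 3 stops:", factorial(3))
print("Digits in the number of orders for 100 stops:", len(str(factorial(100))))
```

For 100 deliveries the number of possible orders is far beyond what can be enumerated, which is why a large number of alternatives makes even a decision under certainty hard to optimise.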
Decision making under uncertainty is always complicated. It is the probability theory and
mathematical expectations that offer tools for establishing logical procedures for selecting the
best decision alternatives. Though statistics provides the structure for reaching the decision,
the decision maker has to inject his/her intuition and knowledge of the problem into the
decision-making framework to arrive at the decision that is both theoretically justifiable and
intuitively appealing. A good theoretical framework and commonsense approach are both
essential ingredients for decision making under uncertainty.
Observe that actions a1 to a3 do not involve uncertainty as the outcomes associated with them
do not depend on uncertain market conditions. Observe also that action a2 dominates actions
a1 and a3. In addition, action a1 is clearly inferior to the risk-free positive growth investment
alternatives a2 and a3 as it provides for no growth of the principal amount.
Action a4 is associated with an uncertain outcome that, depending on the state of the
economy, may produce either a negative return or a positive return. Thus there exists no
apparent dominance relationship between action a4 and action a2, the best among the actions
involving no uncertainty.
Suppose the investor believes that if the market is down in the next year, an investment in the
mutual fund would lose 10 percent returns; if the market stays the same, the investment
would stay the same; and if the market is up, the investment would gain 20 percent returns.
The investor has thus defined the states of nature for his/her investment decision-making
problem as follows:
A study of the market combined with economic expectations for the coming year may lead
the investor to attach subjective probabilities of 0.25, 0.25, and 0.50, respectively, to the
states of nature, s1, s2, and s3. The major question is then, how can the investor use the
foregoing information regarding investments A, B, and C, and the expected market behaviour,
as an aid in selecting the investment that best satisfies his/her objectives? This
question will be considered in the sections that follow.
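As a first illustration, the investor's beliefs about the mutual fund can be combined into a single expected return. The sketch below uses the returns (-10, 0 and +20 percent) and the subjective probabilities (0.25, 0.25, 0.50) stated above. It assumes that s1, s2 and s3 denote a falling, an unchanged and a rising market, in that order; the variable names are illustrative only.

```python
# Percentage return of the mutual fund under each state of nature
# (assumed order: s1 = market down, s2 = market unchanged, s3 = market up).
returns = {"s1": -10.0, "s2": 0.0, "s3": 20.0}

# Subjective prior probabilities attached to the states by the investor.
probabilities = {"s1": 0.25, "s2": 0.25, "s3": 0.50}

# The states are mutually exclusive and collectively exhaustive,
# so their probabilities must sum to 1.
assert abs(sum(probabilities.values()) - 1.0) < 1e-9

# Expected return = sum over states of (return x probability).
expected_return = sum(returns[s] * probabilities[s] for s in returns)
print(f"Expected return: {expected_return:.1f} percent")  # Expected return: 7.5 percent
```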
In problems involving choices from many alternatives, one must identify all the actions that
may be taken and all the states of nature whose occurrence may influence decisions. The
action to take none of the listed alternatives whose outcome is known with certainty may also
be included in the list of actions. Associated with each action is a list of payoffs. If an action
does not involve risk, the payoff will be the same no matter which state of nature occurs.
The payoffs associated with each possible outcome in a decision problem should be listed in
a payoff table, defined as a listing, in tabular form, of the value payoffs associated with all
possible actions under every state of nature in a decision problem.
The payoff table is usually displayed in grid form, with the states of nature indicated in the
columns and the actions in the rows. If the actions are labeled a1, a2, …, an, and the states of
nature labeled s1, s2, …, sk, a payoff table for a decision problem appears as in Table 3.1
below. Note that a payoff is entered in each of the nk cells of the payoff table, one for the
payoff associated with each action under every possible state of nature.
Table 3.1: The Payoff Table
STATE OF NATURE
ACTION s1 s2 s3 … sk
a1
a2
a3
an
Example
Solution
Let the three potential locations be sites A, B, and C. To determine a payoff measure to
associate with each of the company’s objectives under each alternative, the managing director
subjectively assigns a rating on a 0-to-10 scale to measure the degree to which each
location satisfies the company’s objectives. For each objective, a 0 rating indicates complete
dissatisfaction, while a 10 rating indicates complete satisfaction. The results are presented
in Table 3.2 below:
Table 3.2: Ratings for three alternative plant sites for a Manufacturing Company
                              ALTERNATIVE
COMPANY OBJECTIVE        Site A   Site B   Site C
Transportation Costs        6        4       10
Taxation Costs              6        9        5
Workforce Pool              7        6        4
To combine the components of payoff, the managing director asks himself, what are the
relative measures of importance of the three company objectives I have considered as
components of payoff? Suppose the managing director decides that minimising
transportation costs is most important and twice as important as either the minimization of
local taxation or the size of workforce available. He/she thus assigns a weight of 2 to the
transportation costs and weights of 1 each to taxation costs and workforce. This will give
rise to the following payoff measures:
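A minimal sketch of this weighting step is given here. It takes the ratings from Table 3.2 and the weights of 2, 1 and 1 described above, and it assumes that the three rating columns refer to sites A, B and C in that order. The dictionary names are illustrative.

```python
# Ratings from Table 3.2 (column order A, B, C is assumed).
ratings = {
    "transportation": {"A": 6, "B": 4, "C": 10},
    "taxation":       {"A": 6, "B": 9, "C": 5},
    "workforce":      {"A": 7, "B": 6, "C": 4},
}

# Transportation costs are weighted twice as heavily as the other objectives.
weights = {"transportation": 2, "taxation": 1, "workforce": 1}

# Payoff of a site = weighted sum of its ratings over all objectives.
payoff = {
    site: sum(weights[obj] * ratings[obj][site] for obj in weights)
    for site in ("A", "B", "C")
}

best_site = max(payoff, key=payoff.get)
print(payoff)     # {'A': 25, 'B': 23, 'C': 29}
print(best_site)  # C
```

Under these assumptions the payoff measures are 25 for site A, 23 for site B and 29 for site C, so site C has the highest weighted rating.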
A decision-making procedure, which employs both the payoff table and prior probabilities
associated with the states of nature to arrive at a decision is referred to as the Expected
Monetary Value decision procedure. Note that by prior probability we mean probabilities
representing the chances of occurrence of the identifiable states of nature in a decision
problem prior to gathering any sample information. The expected monetary value decision
refers to the selection of available action based on either the expected opportunity loss or the
expected profit of the action.
Decision makers are generally interested in the optimal monetary value decisions. The
optimal expected monetary value decision involves the selection of the action associated with
the minimum expected opportunity loss or the action associated with the maximum expected
profit, depending on the objective of the decision maker.
The expected opportunity loss of action ai is
$$E(L_i) = \sum_{j=1}^{k} L_{ij}\,P(s_j)$$
where Lij is the opportunity loss for selecting action ai given that the state of nature, sj, occurs
and P(sj) is the prior probability assigned to the state of nature, sj.
Example
By recording the daily demand for a perishable commodity over a period of time, a retailer
was able to construct the following probability distribution for the daily demand levels:
sj P(sj)
1 0.5
2 0.3
3 0.2
4 or more 0.0
The opportunity loss table for this demand-inventory situation is as follows:
                           STATE OF NATURE (units demanded)
ACTION (units stocked)     s1(1)   s2(2)   s3(3)
a1(1)                        0       3       6
a2(2)                        2       0       3
a3(3)                        4       2       0
We are required to find the inventory level that minimises the expected opportunity loss.
Solution
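Given the prior probabilities in the first table, the expected opportunity losses are computed as
follows:
E(L1) = 0(0.5) + 3(0.3) + 6(0.2) = 2.1
E(L2) = 2(0.5) + 0(0.3) + 3(0.2) = 1.6
E(L3) = 4(0.5) + 2(0.3) + 0(0.2) = 2.6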
It follows that in order to minimize the expected opportunity loss, the retailer should stock 2
units of the perishable commodity. This is the optimal decision.
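The same decision rule can be sketched in a few lines of code. The example below uses the demand probabilities and the opportunity loss table from this example; the function and variable names are illustrative and not part of the course text.

```python
# Prior probabilities of a daily demand of 1, 2 and 3 units.
prior = [0.5, 0.3, 0.2]

# Opportunity loss table: keys are actions (units stocked),
# list positions are states of nature (1, 2 or 3 units demanded).
loss = {
    1: [0, 3, 6],
    2: [2, 0, 3],
    3: [4, 2, 0],
}

def expected_opportunity_loss(row, probs):
    """Sum of (loss under each state) x (prior probability of that state)."""
    return sum(l * p for l, p in zip(row, probs))

eol = {stock: expected_opportunity_loss(row, prior) for stock, row in loss.items()}

# The optimal action is the one with the smallest expected opportunity loss.
best_stock = min(eol, key=eol.get)

for stock, value in eol.items():
    print(f"Stock {stock} unit(s): expected opportunity loss = {value:.1f}")
print("Optimal stock level:", best_stock)
```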
4.0 CONCLUSION
In conclusion Decision Analysis is a limiting case of Administration problems, it can be
applied in cases when the number is very large tending towards infinity and the probability of
success is very low.
5.0 SUMMARY
In this unit, students must have learnt the rudiments and applications of Administrative and
Decision Analysis. Students must also have learnt how to solve problems using Decision
Analysis.
Critical Examinations
Brain Storming or Group Creativity
Analogies
Morphological Approach or Attribute Listing
Heuristic Approach
3. The following table shows a set of utility values that have been
assessed for the associated Naira-valued outcomes by a decision
maker. If the decision maker wishes to maximise his/her expected utility,
how should he/she act on each of the following investment problems?
Naira-Valued Outcome    Utility
-N10,000                0
-N5,000                 0.45
-N1,000                 0.50
N0.00                   0.55
N5,000                  0.70
N10,000                 0.80
N25,000                 1.0
JUDE, MICAN & EDITH N. Statistical & Quantitative Methods for Construction
& Business Managers
CONTENTS
1.0 Introduction
2.0 Objectives
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References / Further Reading
1.0 INTRODUCTION
Index numbers are indicators which reflect the relative changes in the level of a certain
phenomenon in any given period (or over a specified period of time), called the current period,
with respect to its value in some fixed period, called the base period, selected for comparison.
The phenomenon or variable under consideration may be price, volume of trade, factory
production, agricultural production, imports or exports, shares, sales, national income, wage
structure, bank deposits, foreign exchange reserves, cost of living of people of a particular
community etc.
2.0 OBJECTIVE
The main objective of this unit is to provide students with a good understanding of index
numbers and their applications in statistics and business management.
2. Selection of base period – The base period is the previous period with which
comparison of some later period is made. The index of the base period is taken to be
100. The following points should be borne in mind while selecting a base period:
(a) The base period should be a normal period devoid of natural disaster, economic boom,
depression, political instability, famine etc.
(b) The base period should not be too distant from the given period. This is because
circumstances such as tastes, customs, habits and fashion keep changing.
(c) One must determine whether to use the fixed-base or the chain-base method.
(1) Selection of commodities – Commodities to be selected must be relevant to the study;
their number must be neither too large nor too small, and they must be of the same quality in different
periods.
(2) Data for the index number – Data to be used must be reliable.
(3) Type of average to be used – i.e. arithmetic, geometric, harmonic etc.
(4) Choice of formula – There are different types of formulas and the choice is mostly
dependent on available data.
(5) System of weighting – Different weights should be assigned to different commodities
according to their relative importance in the group.
Based on this method, the quantity index is given by the formula:
Exercise: From the following data calculate Index Number by Simple Aggregate method.
Commodity A B C D
Price 2012 85 82 95 73
(2) Weighted Aggregate Method – In this method, appropriate weights are assigned to
various commodities to reflect their relative importance in the group. The weights can
be production figures, consumption figures or distribution figures.
(ii) Fisher’s Price Index – Irving Fisher advocated the geometric cross (geometric mean) of Laspeyres’
and Paasche’s price index numbers, which is given as:
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$
Fisher’s Index is termed an ideal index since it satisfies the time reversal and factor reversal
tests for the consistency of index numbers.
Example 1: Consider the table below which gives the details of price and consumption of
four commodities for 2010 and 2012. Using an appropriate formula calculate an index
number for 2012 prices with 2010 as base year.
Commodities Price per unit 2010 Price per unit 2012 Consumption value
(N) (N) 2010 (N)
Solution: In the above problem, we are given the base year (2010) consumption values (p0q0),
while the current year quantities (q1) are not given. The appropriate formula for the index number here
is therefore Laspeyres’ Price Index.
3620 4350
Therefore, Laspeyres’ Price Index for 2012 with respect to (w.r.t.) base 2010 is given by:
Example 2: From the following data calculate the price index for 2012 with 2007 as the base
year by (i) Laspeyres’ method (ii) Paasche’s method (iii) Fisher’s method and
(iv) Dorbish-Bowley price index method.
Commodities    2007 Price   2007 Quantity   2012 Price   2012 Quantity
Gaari              20             8             40             6
Rice               50            10             60             5
Fish               40            15             50            15
Palm-oil           20            20             20            25
Solution:
                    2007                2012
Commodities    Price   Quantity    Price   Quantity    p0q0    p0q1    p1q0    p1q1
               (p0)     (q0)       (p1)     (q1)
Gaari           20        8         40        6         160     120     320     240
Rice            50       10         60        5         500     250     600     300
Fish            40       15         50       15         600     600     750     750
Palm-oil        20       20         20       25         400     500     400     500
Total                                                  1660    1470    2070    1790
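(i) Laspeyres’ Price Index
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{2070}{1660} \times 100$$
= 1.24699 X 100
= 124.7
(ii) Paasche’s Price Index
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{1790}{1470} \times 100$$
= 1.21769 X 100
= 121.77
(iii) Fisher’s Price Index
$$P_{01}^{F} = \sqrt{P_{01}^{L} \times P_{01}^{P}} = \sqrt{124.7 \times 121.77}$$
= 123.23
(iv) Dorbish-Bowley Price Index
$$P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100$$
= ½ [1.24699 + 1.21769] X 100
= 1.23234 X 100
= 123.23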
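The four indices of Example 2 can be checked with a short script. The sketch below uses the prices and quantities given in the example, with 2007 as the base year; the variable names are illustrative. Because it works with unrounded ratios, the last decimal place may differ slightly from a hand calculation that rounds the intermediate steps.

```python
from math import sqrt

# (p0, q0, p1, q1) per commodity: 2007 is the base year, 2012 the current year.
data = {
    "Gaari":    (20, 8, 40, 6),
    "Rice":     (50, 10, 60, 5),
    "Fish":     (40, 15, 50, 15),
    "Palm-oil": (20, 20, 20, 25),
}

# Column totals of the working table.
sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data.values())
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data.values())
sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data.values())
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data.values())

laspeyres = sum_p1q0 / sum_p0q0 * 100        # base-year quantities as weights
paasche = sum_p1q1 / sum_p0q1 * 100          # current-year quantities as weights
fisher = sqrt(laspeyres * paasche)           # geometric mean of the two
dorbish_bowley = (laspeyres + paasche) / 2   # arithmetic mean of the two

for name, value in [("Laspeyres", laspeyres), ("Paasche", paasche),
                    ("Fisher", fisher), ("Dorbish-Bowley", dorbish_bowley)]:
    print(f"{name}: {value:.2f}")
```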
4.0 CONCLUSION
In conclusion, the uses of index numbers are enormous. Their use and importance go beyond the
fields of statistics and economics; they are also applicable in policy formulation, governance and so
on. The methods which can be used to compute them are also diverse, as different variants have
been proposed by statisticians and economists alike.
5.0 SUMMARY
In this unit, we have been able to introduce students to the concept of index numbers, their uses
and methods of calculation. Students are now expected to be proficient in the calculation, use
and interpretation of index numbers. This is useful in the study and interpretation of inflation,
cost of living, trends of economic variables among others.
W 15 150 18 216
X 21 252 30 240
Y 30 240 36 288
Z 12 60 15 90
CONTENTS
1.0 Introduction
2.0 Objectives
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
Statistics deals with the theories and methods of collection, presentation, analysis, and
interpretation of numerical data.
2.0 OBJECTIVES
In general, the objective here is to make you, the student, appreciate the purpose of
statistical tests and data in determining whether some hypotheses are extremely
unlikely given the observed data.
Advantages – They are always relevant to the subject under study because they
are collected primarily for the purpose.
- They are more accurate and reliable
- Provide opportunity for the researcher to interact with study population.
- Information on other relevant issues can be obtained
Disadvantages – Always costly to collect
- Inadequate cooperation from the study population
- Wastes a lot of time and energy
(b) Secondary Data: These are data which have been collected by someone else or
some organization either in published or unpublished forms.
Advantages: - It is easier to get
- It is less expensive
Disadvantages: - May not completely meet the need of the research at
hand because it was not collected primarily for that purpose
- There is always a problem of missing periods
(2) Classification based on form of the data: Sometimes, data are classified based on the
form of the data at hand and may be classified as:
(a) Cross-sectional data – These are data collected for a cross-section of subjects
(population under study) at a point in time. For example, data collected on a cross-section of
households on demand for recharge cards for the month of August 2013.
(b) Time-series data – These are data collected on a particular variable or set of
variables over time, e.g. a set of Nigeria’s Gross Domestic Product (GDP) values from
1970 to 2012.
(c) Panel data – These combine the features of cross-sectional and time-series data. They
are a type of data collected from the same subjects over time. For example, a set of data
collected on monthly recharge card expenditure from about 100 households in Lagos
from January to December 2013 will form a panel data set.
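The relationship between the three forms of data can be sketched with a small data structure. In the example below the household identifiers, months and expenditure figures are all hypothetical. Fixing the month in a panel gives a cross-section, and fixing the household gives a time series.

```python
# Hypothetical monthly recharge-card expenditure (in Naira) for three households.
# Each observation is indexed by (household, month), which makes this a panel.
panel = {
    ("H1", "2013-01"): 1500, ("H1", "2013-02"): 1200,
    ("H2", "2013-01"): 800,  ("H2", "2013-02"): 950,
    ("H3", "2013-01"): 2000, ("H3", "2013-02"): 2100,
}

def cross_section(data, month):
    """All households observed at one point in time."""
    return {h: v for (h, m), v in data.items() if m == month}

def time_series(data, household):
    """One household observed over time."""
    return {m: v for (h, m), v in data.items() if h == household}

print(cross_section(panel, "2013-01"))  # {'H1': 1500, 'H2': 800, 'H3': 2000}
print(time_series(panel, "H2"))         # {'2013-01': 800, '2013-02': 950}
```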
Note that social and economic data of national importance are collected routinely as a
by-product of governmental activities, e.g. information on trade, wages, prices, education, health,
crime, aids and grants etc.
Sources of Data
1. Source of Primary data:
(i) Census
(ii) Surveys
2. Sources of Secondary data:
(i) Publications of the Federal Bureau of Statistics
(ii) Publications of the Central Bank of Nigeria
(iii) Publications of the National Population Commission
(iv) Nigeria Customs Service
(v) Nigeria Immigration Service
(vi) Nigerian Ports Authority
(vii) Federal and State Ministries, Departments and
Agencies
Some of the publications referred to above are:
(i) Annual Digest of statistics (by NBS)
(ii) Annual Abstract of statistics (by NBS)
(iii) Economic and Financial Review (by CBN)
(iv) Population of Nigeria (by NPC)
4.0 CONCLUSION
This unit has shown you a further aim of statistical data and testing: to quantify the evidence against
a particular hypothesis being true. You can think of it as testing to guide research.
We believe a certain statement may be true and want to work out whether it is worth
investing time in investigating it. Therefore, we look at the opposite of this statement. If the opposite is
quite likely, then further study would not seem to make sense. However, if it is extremely
unlikely, then further study would make sense.
5.0 SUMMARY
This unit has acquainted you with the transformation of the processed data into statistics and
steps in the statistical cycle. The transformation involves analysis and interpretation of data to
identify important characteristics of a population and provide insights into the topic being
investigated.
6.0 TUTOR-MARKED ASSIGNMENT
1. Distinguish between primary and secondary data.
2. What are the advantages of primary data?
3. List four sources of secondary data you know.
4. Distinguish between cross-sectional and panel data.
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
Researchers collect data in order to test hypotheses and to provide empirical support for
explanations and predictions. Once investigators have constructed their measuring instrument
in order to collect sufficient data pertinent to the research problem, the subsequent
explanations and predictions must be capable of being generalised to be of scientific value.
Generalizations are important not only for testing hypotheses but also for descriptive
purposes. Typically, generalizations are not based on data collected from all the observations,
all the respondents, or all the events that are defined by the research problem, as this is not always
possible or, where possible, is too expensive to undertake. Instead, researchers use a
relatively small number of cases (a sample) as the basis for making inferences about all the
cases (a population).
2.0 OBJECTIVES
The objective here is to make you aware that sampling is a very valuable tool
for collecting data for planning and decision making.
students). A careful selection of relatively small number of students across faculties,
departments and levels will possibly give a representation of the entire student population.
The entire set of relevant units of analysis, or data is called the population. When the data
serving as the basis for generalizations is comprised of a subset of the population, that subset
is called a sample. A particular value of the population, such as the mean income or the level
of formal education, is called a parameter; its counterpart in the sample is termed the
statistic. The major objective of sampling theory is to provide accurate estimates of unknown
values of the parameters from sample statistics that can be easily calculated. To accurately
estimate unknown parameters from known statistics, researchers have to effectively deal with
three major problems:
Population
Methodologically, a population is the “aggregate of all cases that conform to some designated
set of specifications”. For example, a population may be composed of all the residents in a
specific neighbourhood, legislators, houses, records, and so on. The specific nature of the
population depends on the research problem. If you are investigating consumer behaviour in a
particular city, you might define the population as all the households in that city. Therefore,
one of the first problems facing a researcher who wishes to estimate a population value from
a sample value is how to determine the population involved.
study population. A sampling unit is not necessarily an individual. It can be an event, a
university, a city or a nation.
Sampling Frame
Once researchers have defined the population, they draw a sample that adequately represents
that population. The actual procedure involves selecting a sample from a sampling frame
comprising a complete listing of sampling units. Ideally, the sampling frame should include
all the sampling units in the population. In practice, a physical list rarely exists; researchers
usually compile a substitute list and they should ensure that there is a high degree of
correspondence between a sampling frame and the sampling population. The accuracy of a
sample depends, first and foremost, on the sampling frame. Indeed, every aspect of the
sample design – the population covered, the stages of sampling, and the actual selection
process – is influenced by the sampling frame. Prior to selecting a sample, the researcher has
to evaluate the sampling frame for potential problems.
Sample Design
The essential requirement of any sample is that it be as representative as possible of the
population from which it is drawn. A sample is considered to be representative if the analyses
made using the researcher’s sampling units produce results similar to those that would be
obtained had the researcher analysed the entire population.
population. When a researcher is using a probability sample design, it is possible for him or
her to estimate the population’s parameters on the basis of the sample statistics calculated.
Quota samples: The chief aim of a quota sample is to select a sample that is as similar as
possible to the sampling population. For example, if it is known that the population has equal
numbers of males and females, the researcher selects equal numbers of males and females
for the sample. In quota sampling, interviewers are assigned quota groups characterised by
specific variables such as gender, age, place of residence, and ethnicity.
probability that you will get a head or a tail is equal and known (50 percent), and each
subsequent outcome is independent of the previous outcomes.
Random selection procedures ensure that every sampling unit of the population has an equal
and known probability of being included in the sample; this probability is n/N, where n stands
for the size of the sample and N for the size of the population. For example, if we are
interested in selecting 60 households from a population of 300 households using simple
random sampling, the probability of a particular household being selected is 60/300 = 1/5.
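The household example can be sketched with Python's standard `random` module. The household identifiers 1 to 300 and the seed value are assumptions made only so that the draw is reproducible; any complete sampling frame could take their place.

```python
import random
from fractions import Fraction

N = 300  # population size (households)
n = 60   # sample size

# Hypothetical sampling frame: households numbered 1..300.
households = list(range(1, N + 1))

rng = random.Random(2013)           # fixed seed so the draw can be reproduced
sample = rng.sample(households, n)  # simple random sample without replacement

# Every household has the same known chance of inclusion, n/N.
inclusion_probability = Fraction(n, N)
print(inclusion_probability)  # 1/5
print(sorted(sample)[:10])    # first few selected household numbers
```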
Systematic Sampling: This consists of selecting every kth sampling unit of the population after
the first sampling unit is selected at random from the total of sampling units. Thus if you wish
to select a sample of 100 persons from a total population of 10,000, you would take every
hundredth individual (k = N/n = 10,000/100 = 100). Suppose that the fourteenth person were
selected; the sample would then consist of individuals numbered 14, 114, 214, 314, 414, and
so on. Systematic sampling is more convenient than simple random sampling. Systematic
samples are also more amenable for use with very large populations or when large samples
are to be selected.
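The selection rule for a systematic sample can be written as a short function. This is a sketch under the assumption that units are numbered from 1 to N and that the random start is drawn from the first k units; the function name and its parameters are illustrative.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Return the unit numbers (1-based) of a systematic sample.

    The sampling interval is k = N // n. The first unit is drawn at random
    from 1..k unless a start is supplied, and every kth unit follows.
    """
    k = N // n
    if start is None:
        start = rng.randint(1, k)
    return [start + i * k for i in range(n)]

# The example from the text: N = 10,000, n = 100, fourteenth person selected first.
sample = systematic_sample(10_000, 100, start=14)
print(sample[:5])   # [14, 114, 214, 314, 414]
print(len(sample))  # 100
```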
Stratified Sampling: Researchers use this method primarily to ensure that different groups of a
population are adequately represented in the sample. This is to increase their level of
accuracy when estimating parameters. Furthermore, all other things being equal, stratified
sampling considerably reduces the cost of execution. The underlying idea in stratified
sampling is to use available information on the population “to divide it into groups such that
the elements within each group are more alike than are the elements in the population as a
whole”. That is, you create a set of homogeneous samples based on the variables you are
interested in studying. If a series of homogeneous groups can be sampled in such a way that, when
the samples are combined, they constitute a sample of a more heterogeneous population, you
will increase the accuracy of your parameter estimates.
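One common way to carry out this idea is proportional allocation, in which each stratum contributes to the sample in proportion to its share of the population. The text above does not prescribe an allocation rule, so proportional allocation is an assumption here, and the faculties and student identifiers are hypothetical. Note that rounding the stratum sizes may make the total differ slightly from n when the shares are not whole numbers.

```python
import random

def proportional_stratified_sample(strata, n, seed=0):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the whole population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        size = round(n * len(units) / total)
        sample[name] = rng.sample(units, size)
    return sample

# Hypothetical population of 1,000 students grouped by faculty.
strata = {
    "Arts":    [f"A{i}" for i in range(500)],
    "Science": [f"S{i}" for i in range(300)],
    "Law":     [f"L{i}" for i in range(200)],
}

sample = proportional_stratified_sample(strata, n=100)
print({name: len(units) for name, units in sample.items()})
# {'Arts': 50, 'Science': 30, 'Law': 20}
```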
Cluster sampling: It is frequently used in large-scale studies because it is the least expensive
sample design. Cluster sampling involves first selecting large groupings, called clusters, and
then selecting the sampling units from the clusters. The clusters are selected by a simple
random sample or a stratified sample. Depending on the research problem, researchers can
include all the sampling units in these clusters in the sample or make a selection within the
clusters using simple or stratified sampling procedures.
Sample size
A sample is any subset of sampling units from a population. A subset is any combination of
sampling units that does not include the entire set of sampling units that has been defined as
the population. A sample may include only one sampling unit, all but one sampling unit, or any number in between.
There are several misconceptions about the necessary size of a sample. One is that the sample
size must be a certain proportion (often set as 5 percent) of the population; another is that the
sample should total about 2000; still another is that any increase in the sample size will
increase the precision of the sample results. These are faulty notions because they do not
derive from the sampling theory. To estimate the adequate size of the sample properly,
researchers need to determine what level of accuracy is expected of their estimates; that is,
how large a standard error is acceptable.
Standard error
Some people call it the error margin or sampling error. The concept of standard error is central
to sampling theory and to determining the size of a sample. It is one of the statistical
measures that indicate how closely the sample results reflect the true value of a parameter.
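As an illustration of how the standard error links precision to sample size, the sketch below uses the usual estimate for the standard error of a sample mean, s/√n, where s is the sample standard deviation. This formula is not stated in the text above, so it is an assumption here, and it ignores any finite-population correction. The data values and function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import stdev

def standard_error_of_mean(values):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def required_sample_size(sd, acceptable_se):
    """Smallest n for which sd / sqrt(n) does not exceed the acceptable standard error."""
    return ceil((sd / acceptable_se) ** 2)

# Hypothetical sample of eight observations.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(observations), 4))

# If the standard deviation is about 10 and a standard error of 0.5 is acceptable:
print(required_sample_size(sd=10, acceptable_se=0.5))  # 400
```

The second function shows why the acceptable standard error, rather than a fixed percentage of the population, drives the choice of sample size: halving the acceptable standard error quadruples the required n.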
Respondents have time to think about their answers and /or consult other sources.
Questionnaires provide wide access to geographically dispersed samples at low cost
Disadvantages
Questionnaires require simple, easily understood questions and instructions
Mail questionnaires do not offer researchers the opportunity to probe for additional
information or to clarify answers.
Researchers cannot control who fills out the questionnaire.
Response rates are low
Personal interview
The personal interview is a face-to-face, interpersonal role situation in which an interviewer
asks respondents questions designed to elicit answers pertinent to the research hypotheses.
The questions, their wording, and their sequence define the structure of the interview.
Flexibility: The interview allows great flexibility in the questioning process, and the
greater the flexibility, the less structured the interview. Some interviews allow the
interviewer to determine the wording of the questions, to clarify terms that are
unclear, to control the order in which the questions are presented, and to probe for
additional information and details.
Control of the interview situation: An interviewer can ensure that the respondents
answer the questions in the appropriate sequence or that they answer certain questions
before they are asked subsequent questions.
High response rate: The personal interview results in a higher response rate than the
mail questionnaire.
Fuller information: An interviewer can collect supplementary information about
respondents. This may include background information, personal characteristics and
their environment that can aid the researcher in interpreting the results.
Telephone interview
It is also called telephone survey, and can be characterised as a semi-personal method
of collecting information. In comparison, the telephone is convenient, and it produces
a very significant cost saving.
Moderate cost
Speed: Telephone interviews can reach a large number of respondents in a short
time. Interviewers can code data directly into computers, which can later
compile the data.
High response rate: Telephone interviews provide access to people who
might be unlikely to reply to a mail questionnaire or refuse a personal
interview.
Quality: High quality data can be collected when interviewers are centrally
located and supervisors can ensure that questions are being asked correctly
and answers are recorded properly.
4.0 CONCLUSION
This unit has relayed to you that a well-chosen sample can usually provide reliable
information about the whole of the population to any desired degree of accuracy. In some
instances sampling is an alternative to a complete census, and may be preferable mainly
because of its cheapness and convenience.
5.0 SUMMARY
You should now be able to discern that a sample is a subset of a population selected to meet
specific objectives. You should also be familiar with the sampling techniques and with the
guiding principle in selecting a sample: it must, as far as possible, have the essential
characteristics of the target population.
1. Explain three non-probability sampling methods.
2. What are the advantages of the telephone interview?
3. Are there any disadvantages in the personal interview method of data collection?
Unit 4: ESTIMATION THEORY
CONTENTS
1.0 Introduction
2.0 Objective
3.0 Main Content
3.1 Methods of Point Estimation
3.2 Method of Maximum likelihood
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 Introduction
Point estimation: when a single numerical value of a statistic is used as an estimate of the
exact population value, we have a point estimate. An estimate is the value of the sample
statistic which is taken as an approximation of the parameter value. An estimator refers to the
formula or statistic which has been chosen to provide an estimate of the population value.
The sample mean, mode, median, variance, etc. are examples of point estimates. In any population
distribution with mean $\mu$ and variance $\sigma^2$, the corresponding estimators are the sample mean and
sample variance, given as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Note that the estimators are functions of the random sample which do not depend on the
parameters.
2.0 OBJECTIVES
The main objective of this unit is to enable students to understand the theory behind and the
application of estimation in statistics. Students are expected at the end of this unit to be able
to apply estimation theory to solving day-to-day business and economic problems.
The following are methods of obtaining point estimators of the population parameter:
Method of maximum likelihood
Method of moments
3.2 Method of Maximum Likelihood: Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from a
population with pdf $f(x, \theta)$. The likelihood function is the function of the sample values
$x_1, x_2, \ldots, x_n$ which expresses the joint probability of occurrence of the sample values. That is, the
likelihood of the random sample is the product of the respective probability distributions:
$$L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i, \theta)$$
The Maximum Likelihood Estimator (MLE) of $\theta$ based on a random sample $x_1, x_2, \ldots, x_n$ is the
value of $\theta$ which maximizes the likelihood function $L(\theta; x_1, x_2, \ldots, x_n)$.
Since any positively valued function attains a maximum at the same point as its logarithm,
we usually obtain the m.l.e. by maximizing the natural logarithm of the likelihood.
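To make the idea of maximizing the log-likelihood concrete, the following is a minimal Python sketch. It assumes a Poisson population and a small hypothetical sample (neither comes from the course text), and uses a crude grid search rather than calculus; the function and variable names are illustrative only.

```python
import math

def poisson_log_likelihood(lam, sample):
    """Natural log of the likelihood L(lam; x_1..x_n) for a Poisson pmf."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in sample)

# Hypothetical sample, assumed for illustration only
sample = [2, 4, 3, 5, 1, 3]

# Candidate parameter values 0.001, 0.002, ..., 10.000
grid = [k / 1000 for k in range(1, 10001)]

# The grid point with the largest log-likelihood approximates the m.l.e.
mle = max(grid, key=lambda lam: poisson_log_likelihood(lam, sample))
sample_mean = sum(sample) / len(sample)

print("approximate m.l.e.:", mle)
print("sample mean       :", sample_mean)
```

For the Poisson distribution the maximizer of the log-likelihood coincides with the sample mean, so the two printed values should agree to within the grid spacing.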
Given
Example: Given the data 5, 8, 3, 4, 6, 1, obtain: (i) the first and second non-central moments,
(ii) the second central moment, (iii) the fourth moment about zero, (iv) the second moment about 5.
Generally, the rth moment of a random variable X about the mean, or the rth central moment,
is given as:
$$\mu_r = E\left[(X - \mu)^r\right]$$
which, for a set of n observations, is computed as $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$.
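The moments asked for in the example can be checked with a short Python sketch. It assumes that moments of a data set are averages with divisor n, and the helper name `moment_about` is chosen here for illustration.

```python
def moment_about(data, r, point=0.0):
    """r-th moment of the data about `point`, using divisor n."""
    return sum((x - point) ** r for x in data) / len(data)

data = [5, 8, 3, 4, 6, 1]

mean = moment_about(data, 1)              # first moment about zero
m2_raw = moment_about(data, 2)            # second moment about zero
m2_central = moment_about(data, 2, mean)  # second moment about the mean
m4_raw = moment_about(data, 4)            # fourth moment about zero
m2_about_5 = moment_about(data, 2, 5)     # second moment about 5

print(mean, round(m2_raw, 4), round(m2_central, 4),
      round(m4_raw, 4), round(m2_about_5, 4))
```

A useful check is the identity second central moment = second raw moment minus the square of the mean, which the values above satisfy.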
4.0 CONCLUSION
This unit has shown that a well-chosen estimator can usually provide reliable
information about a population parameter to any desired degree of accuracy. In some
instances estimation from a sample is an alternative to a complete census, and may be
preferable mainly because of its cheapness and convenience.
5.0 SUMMARY
You should now be able to distinguish between an estimate and an estimator, and be familiar
with the method of maximum likelihood and the method of moments as techniques for obtaining
point estimators. The guiding principle in selecting an estimator is that its value must, as
far as possible, be close to the target population parameter.
6.0 TUTOR-MARKED ASSIGNMENT
Given the data 3, 8, 5, 1, 6, 4, obtain: (i) the first and second non-central moments; (ii) the
second central moment.
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Perfect Positive Correlation
3.2 Perfect Negative Correlation
3.3 Strong Positive Correlation
3.4 Strong Negative Correlation
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
Correlation can be defined as the branch of statistics that deals with the
mutual dependence or inter-relationship of two or more variables. If the
values of two variables are such that when one changes, the other changes
too, then the variables are said to be correlated.
The degree of relationship existing between two variables is called
simple correlation, while the degree of relationship connecting three
or more variables is called multiple correlation.
2.0 OBJECTIVES
The main objective of this unit is to enable students to understand the theory behind and the
application of correlation in statistics. Students are expected at the end of this unit to be able
to apply correlation theory to solving day-to-day business and economic problems.
r = 1
3.2 Perfect Negative Correlation: This indicates that all the points
lie on a straight line and none deviates from the
line. The line slopes downward.
r = -1
3.5 Weak positive correlation: In this case the points are widely
scattered, so that they depart far from the line and the
association is weak. The relationship is positive
but r is not close to unity.
3.7 No Correlation: The scatter points are distributed at random and do not form any
regular pattern that could be represented by a straight line. There is no
association between the variables.
4. Conclusion
The relationships among business variables can simply be identified using correlation
coefficients. Two variables can either be positively or negatively correlated. This correlation
can be linear or nonlinear depending on variable characteristics.
5. Summary
For a precise quantitative measurement of the degree of correlation between two variables,
say X and Y, we use a parameter referred to as the correlation coefficient. The sample
estimate of this parameter is referred to as r.
6. Tutor-Marked Assignment
1. Explain with the use of diagram different types of correlation
2. Differentiate between strong positive correlation and negative
correlation.
1.0 Introduction
2.0 Objective
3.0 Main Content
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
The coefficient of correlation is defined as the ratio of the covariance between the
related variables to the square root of the product of their individual variances.
2.0 OBJECTIVE
At the end of this unit, you should be able to:
describe the computation of linear correlation coefficients
apply the concept of correlations in business decisions.
3.0 MAIN CONTENT
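The coefficient of correlation r is given by
$$r = \frac{S_{xy}}{S_x S_y}$$
or, in terms of deviations from the means,
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}}$$
Where,
$X = x - \bar{x}$ and
$Y = y - \bar{y}$ respectively.
Substituting for X and Y in the above equation,
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2}}$$
From the numerator above,
$$\sum (x - \bar{x})(y - \bar{y}) = \sum (xy - x\bar{y} - \bar{x}y + \bar{x}\bar{y})$$
$$= \sum xy - \frac{\sum y \sum x}{n} - \frac{\sum x \sum y}{n} + n \cdot \frac{\sum x}{n} \cdot \frac{\sum y}{n}$$
$$= \sum xy - \frac{\sum x \sum y}{n}$$
$$= \frac{n\sum xy - \sum x \sum y}{n} \qquad \text{(i)}$$
From the denominator,
$$\sum (x - \bar{x})(x - \bar{x}) = \sum (x^2 - x\bar{x} - \bar{x}x + \bar{x}\bar{x})$$
$$= \sum x^2 - \frac{(\sum x)^2}{n} - \frac{(\sum x)^2}{n} + \frac{(\sum x)^2}{n}$$
$$= \sum x^2 - \frac{(\sum x)^2}{n} = \frac{n\sum x^2 - (\sum x)^2}{n} \qquad \text{(ii)}$$
Similarly,
$$\sum (y - \bar{y})(y - \bar{y}) = \sum (y^2 - y\bar{y} - \bar{y}y + \bar{y}\bar{y})$$
$$= \sum y^2 - \frac{(\sum y)^2}{n} = \frac{n\sum y^2 - (\sum y)^2}{n} \qquad \text{(iii)}$$
Substituting (i), (ii) and (iii) into the expression for r, the factors of n cancel, giving
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Or
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}}$$
Remarks:
The value of r can be interpreted in three ways as regards the relationship
between x and y.
i. When r = +1, there is a perfect (positive) linear relationship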
Data
n = 6; ∑x = 30; ∑y = 180; ∑xy = 1000; ∑x² = 250; ∑y² = 5642
Yrs    X   Y    xy   X²   Y²
1990   3   25   75   9    625
1989   2   20   40   4    400
$$r_1 = \frac{600}{\sqrt{(300)(1452)}} = \frac{600}{\sqrt{435600}} = \frac{600}{660} = 0.9091$$
Remarks: The two variables are highly positively correlated.
Illustration: LASU Campus Stores has been selling the "Believe It or Not:
Wonders of Statistics" study guide for 12 semesters and would like to estimate
the relationship between sales and the number of sections of elementary statistics
taught in each semester. The data below have been collected.
Sales (units)     33 38 24 61 52 45 65 82 29 63 50 79
No. of sections   3  7  6  6  10 12 12 13 12 13 14 15
Sales (x)   Sections (y)   xy    x²     y²
38          7              266   1444   49
24          6              144   576    36
61          6              366   3721   36
Data
n = 12
∑x = 621
∑y = 123
∑xy = 6833
∑x² = 36059
∑y² = 1421
(∑x)² = 385641
(∑y)² = 15129
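The sums for the campus-store illustration, and the resulting coefficient, can be reproduced with a minimal Python sketch of the computational formula r = (n∑xy − ∑x∑y) / √([n∑x² − (∑x)²][n∑y² − (∑y)²]). The second sales figure is taken as 38, the value used in the working table and consistent with ∑x = 621; the function name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation from raw sums (computational formula)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

sales = [33, 38, 24, 61, 52, 45, 65, 82, 29, 63, 50, 79]     # x
sections = [3, 7, 6, 6, 10, 12, 12, 13, 12, 13, 14, 15]      # y

r = pearson_r(sales, sections)
print("sum x =", sum(sales), " sum y =", sum(sections))
print("r =", round(r, 4), " r squared =", round(r * r, 4))
```

Squaring r gives the coefficient of determination, which is read as the share of the variation in one variable associated with variation in the other.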
2003 180 72
2004 135 60
2005 156 66
2006 165 70
2007 178 74
2008 160 65
2009 132 62
2010 145 67
Data
n = 10
∑x = 669
∑y = 1559
∑xy = 104,887
∑x² = 44943
∑y² = 245443
r= 5899
6227.27 = 0.9037
= 0.9037
But r² = 0.8169 ≈ 0.82
The coefficient of determination shows the variation in the dependent variable
(y) as a result of corresponding variation in the explanatory variable (x).
This shows that there is a strong positive correlation (about 0.90) between beer
consumption and road accidents, thus RA = F (BO). The coefficient of determination
means that about 82% of the variation in road accidents is associated with variation
in beer consumption.
4.0 Conclusion
The relationships among business variables can simply be identified using correlation
coefficients. Two variables can either be positively or negatively correlated. This correlation
can be linear or nonlinear depending on variable characteristics.
5.0 Summary
For a precise quantitative measurement of the degree of correlation between two variables,
say X and Y, we use a parameter referred to as the correlation coefficient. The sample
estimate of this parameter is referred to as r.
Qty Supply       10 20 50 40 50 60 80 90 90 120
Unit Price (N)   2  4  6  8  10 12 14 16 18 20
Contents
1.0 Introduction
2.0 Objectives
3.0 Main content
3.1 Analysis of Rank Correlation
4.0 Summary and Conclusion
5.0 Tutor-Marked Assignment
6.0 Further Reading
7.0 References
1.0 INTRODUCTION
It is sometimes very difficult to quantify data, or to work with a set of data that has big values.
Rank correlation is used to determine the extent to which the variables are
correlated. This idea is employed in Spearman's rank correlation
coefficient, which is computed by using this formula:
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
2.0 OBJECTIVE
At the end of this unit, you should be able to:
explain the computation of rank correlation coefficients
apply the concept of correlations in business decisions.
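Where,
n = number of observations
d = difference between the ranks of each pair of observations
r = rank correlation
Note: In cases where there are ties in the ranking of variables x and y, another
representation is applicable:
$$r_1 = 1 - \frac{6\left(\sum d^2 + \frac{1}{12}\sum (t^3 - t)\right)}{n(n + 1)(n - 1)}$$
where t is the number of observations sharing a tied rank.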
Illustration: The following data refer to students' scores and the general level
of their intelligence in 9 selected courses. Use Spearman's rank correlation
technique to determine the strength of the relationship between the students'
scores and their intelligence.
Sales y (units)   16 14 15 13 31 16 10 17 20
Intelligent x     38 41 48 22 64 64 26 53 30
Y    X    rx    ry    d = (rx - ry)   d²
16   38   6     4.5   1.5             2.25
14   41   5     7     -2              4
15   48   4     6     -2              4
13   22   9     8     1               1
31   64   1.5   1     0.5             0.25
16   64   1.5   4.5   -3              9
10   26   8     9     -1              1
17   53   3     3     0               0
20   30   7     2     5               25
Data
n = 9
∑d² = 46.5
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(46.5)}{9(9^2 - 1)} = 1 - \frac{279}{720} = 0.6125$$
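The ranking and the computation of ∑d² in this illustration can be sketched in Python as follows. The sketch assumes rank 1 is given to the largest value and that tied values share the average of the ranks they occupy, which matches the ranks shown in the working table; it applies the basic formula without any tie correction, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # first rank position of v
        count = ordered.count(v)          # number of tied observations
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def spearman(x, y):
    """Return (sum of squared rank differences, rank correlation)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

y = [16, 14, 15, 13, 31, 16, 10, 17, 20]
x = [38, 41, 48, 22, 64, 64, 26, 53, 30]

sum_d2, r_s = spearman(x, y)
print("sum of d squared =", sum_d2)
print("rank correlation =", round(r_s, 4))
```

The same function can be applied to the other rank-correlation exercises in this unit, bearing in mind that a different tie-handling convention would change ∑d².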
Illustration: A market researcher asked two (2) smokers to express their preference
for 12 different brands of cigarettes. Their replies are shown in the following
table.
Brand of cigarette   A  B  C  D  E  F  G  H  I  J  K  L
Smoker Z (y)         9  10 4  1  8  11 3  2  5  7  12 6
Smoker W (x)         7  8  3  2  10 12 1  6  5  4  11 9
Y X rx Ry d d2
9 7 6 4 2 4
10 8 5 3 2 4
4 3 10 9 1 1
1 2 11 12 -1 1
8 10 3 5 -2 4
11 12 1 2 -1 1
3 1 12 10 2 4
2 6 7 11 -4 16
5 5 8 8 0 0
7 4 9 6 3 9
12 11 2 1 1 1
6 9 4 7 -3 9
Data
n = 12
∑d² = 54
$$r_1 = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(54)}{12(12^2 - 1)} = 1 - \frac{324}{1716}$$
= 1 - 0.1888
r = 0.81
Illustration: Assume that 10 men assigned to a particular job or task were given
two aptitude tests. After they had been on the job for some period of time, the
production manager was asked to rank the employees from 1st to 10th in regard
to their value to the company. You, as the production manager, should use
Spearman's technique to determine the relationship between the 2 tests.
Workers A B C D E F G H I J
Test 1 96 98 79 78 84 84 76 79 62 44
Test 2 78 72 60 72 64 84 72 56 78 40
Y X rx ry d d2
44 40 10 10 0 0
Data
n = 10
∑d² = 110.25
$$r_1 = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 110.25}{10(10^2 - 1)} = 1 - \frac{661.5}{990}$$
= 1 - 0.6682
r = 0.3318
X 56 57 55 58 51 56 56 58 57 57 57 57
Y 52 40 37 43 57 45 47 51 68 49 43 48
a. Rank the data
b. Compute Spearman’s coefficient of rank correlation
X Y rx Ry d d2
56 52 9 3 6 36
57 40 5 11 6 36
55 37 11 12 1 1
58 43 1.5 9.5 8 64
51 57 12 2 10 100
56 45 9 8 1 1
56 47 9 7 2 4
57 68 5 1 4 16
57 49 5 5 0 0
57 48 5 6 1 1
Data
n = 12
∑d² = 285.5
$$r_1 = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 285.5}{12(12^2 - 1)} = 1 - \frac{1713}{1716}$$
= 1 - 0.9983
r = 0.0017
Comment: The value of r1 shows that x and y are practically uncorrelated, i.e. the
two rankings are not in agreement.
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where d is the difference between paired ranks and n is the number of pairs.
NB: for large samples, where n is 10 or more, the Student's "t" distribution can
be used for the test statistic, and the degrees of freedom are given as (n - 2):
$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}}$$
1 1 4
2 2 3
3 3 2
4 4 6
5 5 1
6 6 5
7 7 8
8 8 12
9 9 11
10 10 9
11 11 7
12 12 10
1 1 4 -3 9
2 2 3 -1 1
3 3 2 1 1
4 4 6 -2 4
5 5 1 4 16
6 6 5 1 1
7 7 8 -1 1
8 8 12 -4 16
9 9 11 -2 4
10 10 9 1 1
11 11 7 4 16
12 12 10 2 4
∑d² = 74
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(74)}{12(12^2 - 1)} = 1 - \frac{444}{1716} = 0.741$$
Decision: The value 0.741 indicates a fairly strong positive association between
the ranks of mechanical ability and social compatibility.
α = 0.05
d.f. = n - 2 = 12 - 2 = 10
$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}} = 0.741\sqrt{\frac{10}{1 - (0.741)^2}}$$
= 3.4895 = 3.49
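The two steps above (the rank correlation from ∑d², then the t statistic with n − 2 degrees of freedom) can be sketched in Python. The values n = 12 and ∑d² = 74 are those of the worked example; the critical value 2.228 is the usual two-tailed 5% value of t for 10 d.f. from standard tables and is supplied here as an assumption.

```python
import math

n = 12          # number of ranked pairs
sum_d2 = 74     # sum of squared rank differences

# Spearman's rank correlation coefficient
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Test statistic, referred to Student's t with n - 2 degrees of freedom
t = r_s * math.sqrt((n - 2) / (1 - r_s ** 2))
df = n - 2

t_critical = 2.228  # assumed tabulated value, two-tailed, alpha = 0.05, 10 d.f.
significant = abs(t) > t_critical

print(round(r_s, 3), round(t, 2), df, significant)
```

Carrying r_s at full precision instead of rounding it to 0.741 changes t only in the third decimal place.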
Dallae Paul 94 95
At the 0.05 significance level, can we conclude that the IQ scores have
increased over 20 years? Compute the coefficient of rank correlation.
(REGRESSION ANALYSIS)
Contents
1.0 Introduction
2.0 Objectives
3.0 Main content
4.0 Summary and Conclusion
5.0 Tutor-Marked Assignment
6.0 Further Reading
7.0 References
1.0 INTRODUCTION
Regression analysis can be defined as the study of the relationship between two or more
variables. This relationship has to do with the changes that result from a
change in one of the related variables.
2.0 OBJECTIVE
The main objective of this unit is to enable students to understand the theory behind and the
application of regression analysis in statistics. Students are expected at the end of this unit to
be able to apply regression analysis to solving day-to-day business and economic problems.
3.0 MAIN CONTENT
Simple regression involves only two variables, and the relationship between them tends
towards a fixed direction.
Multiple regression involves more than two variables in the regression model or
equation.
A regression line of any form can be fitted to bivariate data by any of the
following methods.
1. Freehand method
In this method, the regression line is fitted by eye to the scatter diagram.
Philosophy marks    38 51 19 53 39 38 66
Mathematics marks   50 32 36 54 52 56 80
By scatter diagram:
[Scatter diagram of the philosophy and mathematics marks]
Limitation:
By representation,
Y = a+bx; b = the coefficient of x and x = independent variable.
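The normal equations for the line y = a + bx are
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
Solving by determinants, the determinant of the coefficients is
$$\Delta = \begin{vmatrix} n & \sum x \\ \sum x & \sum x^2 \end{vmatrix}$$
To obtain
$$\Delta = n\sum x^2 - \sum x \cdot \sum x = n\sum x^2 - (\sum x)^2$$
Similarly,
$$\Delta_1 = \begin{vmatrix} \sum y & \sum x \\ \sum xy & \sum x^2 \end{vmatrix} = \sum y\sum x^2 - \sum x\sum xy, \qquad \Delta_2 = \begin{vmatrix} n & \sum y \\ \sum x & \sum xy \end{vmatrix} = n\sum xy - \sum x\sum y$$
so that
$$a = \frac{\Delta_1}{\Delta} = \frac{\sum y\sum x^2 - \sum x\sum xy}{n\sum x^2 - (\sum x)^2}, \qquad b = \frac{\Delta_2}{\Delta} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}$$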
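As a numerical illustration of the least-squares formulas a = (∑y∑x² − ∑x∑xy) / (n∑x² − (∑x)²) and b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²), the following Python sketch fits a line to the philosophy and mathematics marks given earlier in this unit. Treating philosophy marks as x and mathematics marks as y is an assumption made here for illustration, as is the function name `fit_line`.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for y = a + b*x from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    delta = n * sxx - sx ** 2            # common denominator
    b = (n * sxy - sx * sy) / delta      # slope
    a = (sy * sxx - sx * sxy) / delta    # intercept
    return a, b

philosophy = [38, 51, 19, 53, 39, 38, 66]    # assumed to be x
mathematics = [50, 32, 36, 54, 52, 56, 80]   # assumed to be y

a, b = fit_line(philosophy, mathematics)
print("a =", round(a, 2), " b =", round(b, 4))

# Residuals of a least-squares line with an intercept sum to zero
residual_sum = sum(yi - (a + b * xi) for xi, yi in zip(philosophy, mathematics))
print("sum of residuals =", round(residual_sum, 6))
```

The intercept obtained this way equals the mean of y minus b times the mean of x, which is the form used for a later in this unit.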
Non-Linear Model
On many occasions, the simple linear model, and even the multiple linear
model, will not be satisfactory. A plot or scatter diagram of the dependent
variable may suggest that the relationship is not linear. We then consider a non-linear
model, which involves:
Types of Curve
$y = a + \frac{b}{x}$ or $y = \frac{1}{a + bx}$
To linearise the first form, take $X = 1/x$; then we have
$y = a + bX$
For the second form, take reciprocals of both sides:
$1/y = a + bx$
Taking $Y = 1/y$,
therefore, $Y = a + bx$
3. Power curve or model: This power model has the form $y = ax^b$,
otherwise known as a logarithmic function. The general representation
can be given as:
$y = ax^b$
To linearise, take $\log_{10}$ of both sides:
$\log_{10} y = \log_{10} a + b\log_{10} x$
since a and b are constants.
Let $\log_{10} y = Y$, $\log_{10} a = A$ and $\log_{10} x = X$.
Therefore, $Y = A + bX$
Illustration: Draw the scatter diagram and fit an exponential curve to the
following data
X (years)   Y (sales)
0           100
1           150
2           225
3           337.5
4           506.25
X   Y     log y   x log y   x²
0   100   2.000   0         0
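Data
n = 5
∑x = 10
∑log y = 11.7610
∑x log y = 25.2831
∑x² = 30
The exponential curve is $y = ab^x$. Taking logarithms, $\log y = \log a + x\log b$, i.e.
$Y = A + Bx$, where $Y = \log y$, $A = \log a$ and $B = \log b$.
To find A and B,
firstly, to find B:
$$B = \frac{n\sum xY - \sum x\sum Y}{n\sum x^2 - (\sum x)^2} = \frac{5(25.2831) - 10(11.7610)}{5(30) - 100} = \frac{8.8055}{50} = 0.17611$$
To obtain A:
$$A = \bar{Y} - B\bar{x} = \frac{\sum Y}{n} - B\frac{\sum x}{n} = \frac{11.7610}{5} - 0.17611\left(\frac{10}{5}\right) = 2.3522 - 0.3522 = 2.0000$$
Taking antilogarithms, a = antilog(2.0000) = 100 and b = antilog(0.17611) = 1.5
Therefore, $y = ab^x$ becomes
$y = 100(1.5)^x$
Check: for x = 1, y = 100(1.5) = 150
for x = 2, y = 100(1.5)² = 225
for x = 3, y = 100(1.5)³ = 337.5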
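The exponential fit can be checked with a short Python sketch that regresses log₁₀ y on x and then takes antilogarithms. It assumes the model y = a·bˣ and that the fifth observation (506.25) belongs to x = 4; the variable names are illustrative.

```python
import math

x = [0, 1, 2, 3, 4]                       # x = 4 assumed for the last sales figure
y = [100, 150, 225, 337.5, 506.25]

Y = [math.log10(v) for v in y]            # linearise: log y = log a + x log b
n = len(x)

sum_x, sum_Y = sum(x), sum(Y)
sum_xY = sum(xi * Yi for xi, Yi in zip(x, Y))
sum_xx = sum(xi * xi for xi in x)

B = (n * sum_xY - sum_x * sum_Y) / (n * sum_xx - sum_x ** 2)   # log b
A = sum_Y / n - B * sum_x / n                                  # log a

a, b = 10 ** A, 10 ** B                   # antilogarithms
print("a =", round(a, 4), " b =", round(b, 4))

fitted = [a * b ** xi for xi in x]
print([round(v, 2) for v in fitted])
```

Because each sales figure in this data set is exactly 1.5 times the previous one, the fitted curve passes through every observation.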
4.0 Conclusion
The relationships among business variables can simply be identified using correlation
coefficients. Two variables can either be positively or negatively correlated. This correlation
can be linear or nonlinear depending on variable characteristics.
5.0 Summary
For a precise quantitative measurement of the degree of correlation between two variables,
say X and Y, we use a parameter referred to as the correlation coefficient. The sample
estimate of this parameter is referred to as r.
Philosophy marks    3 5 9 5 9 8 6
Mathematics marks   5 2 3 4 5 6 8
2.
3. Illustration: Draw the scatter diagram and fit an exponential curve to the
following data
1.0 INTRODUCTION
e.g. $y = a + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_nx_n$
2.0 OBJECTIVE
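The above expression can be solved by the normal equations of the three
variables.
$y = a + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_nx_n$
For two explanatory variables, $y = a + b_1x_1 + b_2x_2$, the normal equations are:
$$\sum y = na + b_1\sum x_1 + b_2\sum x_2 \qquad \text{(i)}$$
$$\sum x_1y = a\sum x_1 + b_1\sum x_1^2 + b_2\sum x_1x_2 \qquad \text{(ii)}$$
$$\sum x_2y = a\sum x_2 + b_1\sum x_1x_2 + b_2\sum x_2^2 \qquad \text{(iii)}$$
The coefficient of multiple determination is
$$r^2 = \frac{a\sum y + b_1\sum x_1y + b_2\sum x_2y - (\sum y)^2/n}{\sum y^2 - (\sum y)^2/n}$$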
Student               1  2  3  4  5  6  7  8  9  10
No of lecturer (x1)   9  6  12 14 11 6  19 16 3  9
Exams scores (y)      56 45 80 73 71 55 95 86 34 66
$y = a + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_nx_n$
Data:
∑y = 661
∑y2 = 46889
∑x1 = 105
∑x12 = 1321
∑x2 = 1054
∑x22 = 111,806
∑x1y = 7744
∑x2y = 69730
∑x1x2 = 10,974
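With the sums listed above, the three normal equations for y = a + b1x1 + b2x2 form a 3×3 linear system in a, b1 and b2. The following Python sketch solves it by Gaussian elimination; the helper `solve_linear_system` and the dictionary keys are names chosen here for illustration, and the sketch simply takes the tabulated sums as given.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution

n = 10
S = {"y": 661, "x1": 105, "x2": 1054, "x1x1": 1321, "x2x2": 111806,
     "x1x2": 10974, "x1y": 7744, "x2y": 69730}

coefficients = [
    [n,       S["x1"],   S["x2"]],      # equation (i)
    [S["x1"], S["x1x1"], S["x1x2"]],    # equation (ii)
    [S["x2"], S["x1x2"], S["x2x2"]],    # equation (iii)
]
right_hand_side = [S["y"], S["x1y"], S["x2y"]]

a, b1, b2 = solve_linear_system(coefficients, right_hand_side)
print("a =", round(a, 2), " b1 =", round(b1, 4), " b2 =", round(b2, 4))
```

Substituting the three values back into equations (i) to (iii) should reproduce the right-hand sides, which is a convenient check on hand calculations.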
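To find b1:
$$b_1 = \frac{n\sum x_1y - \sum x_1\sum y}{n\sum x_1^2 - (\sum x_1)^2} = \frac{10(7744) - (105)(661)}{10(1321) - (105)^2} = \frac{8035}{2185}$$
Therefore: b1 = 3.68
To find a:
$$a = \frac{\sum y}{n} - b_1\frac{\sum x_1}{n} = \frac{661}{10} - 3.68\left(\frac{105}{10}\right) = 66.1 - 38.64 = 27.46$$
But $y = a + b_1x_1$
y = 27.46 + 3.68x1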
4.0 Conclusion
The relationships among business variables can simply be identified using correlation
coefficients. Two variables can either be positively or negatively correlated. This correlation
can be linear or nonlinear depending on variable characteristics.
5.0 Summary
For a precise quantitative measurement of the degree of correlation between two variables,
say X and Y, we use a parameter referred to as the correlation coefficient. The sample
estimate of this parameter is referred to as r.
A partial correlation coefficient measures the relationship between any two variables,
keeping other variables constant.
The limitations of linear correlations as a technique for the study of economic relations are as
follows
The formula for correlation coefficient applies only to linear relationships between variables.
That correlation coefficient as a measure of co-variability of variables does not imply any
functional relationship between the variables concerned.
Students              1  2  3  4  5  6  7  8  9  10
Hours studied (x1)    9  6  12 14 11 6  19 16 3  9
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Application of Hypothesis and t-distribution
3.2 Test for single mean
3.3 Assumptions for Student’s test
3.4 t-Test for difference of means
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References / Further Reading
1.0 INTRODUCTION
A hypothesis can be defined as a conjectural statement, a postulate, or a proposition about an
assumed relationship between two or more variables.
The terms hypothesis testing and testing a hypothesis are used interchangeably. Hypothesis
testing starts with a statement about a population parameter such as the mean. In an attempt to
reach a decision, statisticians often make an assumption or proposition about the population involved.
Such an assumption, which is subject to testing and may or may not be true, is called a
statistical hypothesis.
If the population variance is unknown, then for large samples its estimate provided by the
sample variance S² is used and the normal test is applied. For small samples an unbiased estimate
of the population variance σ² is given by:
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
It is quite conventional to replace σ² by S² (for small samples) and then apply the normal test
even for small samples. W.S. Gosset, who wrote under the pen name of Student, obtained the
sampling distribution of the statistic for small samples and showed that it is far from
normality. This discovery started a new field, viz. 'Exact Sample Tests', in the history of
statistical inference.
Note: If x1, x2, ..., xn is a random sample of size n from a normal population with mean μ
and variance σ², then the Student's t statistic is defined as:
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
which follows Student's t-distribution with (n - 1) degrees of freedom.
2.0 OBJECTIVES
The objective of this unit is to introduce students to t-distribution and emphasize its
application in statistics.
(i) The given normal population has a specified value of the population mean, say μ0.
(ii) The sample mean differs significantly from the specified value of the population mean.
(iii) A given random sample x1, x2, ..., xn of size n has been drawn from a normal
population with specified mean μ0.
Basically, all the three problems are the same. We set up the corresponding null
hypothesis thus:
$$H_0: \mu = \mu_0, \qquad t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$
Where $\bar{x} = \frac{1}{n}\sum x_i$ and $S^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$
We compute the test statistic using the formula above under Ho and compare it with the
tabulated value of t for (n - 1) d.f. at the given level of significance. If the absolute value of the
calculated t is greater than the tabulated t, we say it is significant and the null hypothesis is
rejected. But if the absolute value of the calculated t is less than the tabulated t, Ho may be
accepted at the level of significance adopted.
Example: Ten cartons are taken at random from an automatic filling machine. The mean net
weight of the 10 cartons is 11.8kg and the standard deviation is 0.15kg. Does the sample mean
differ significantly from the intended weight of 12kg at α = 0.05?
Hint: You are given that for d.f. = 9, t0.05 = 2.26
Null hypothesis, Ho: μ = 12 kg (i.e. the sample mean of 11.8 kg does not differ
significantly from the population mean μ = 12 kg)
The tabulated value of t for 9 d.f. at the 5% level of significance is 2.26. Since the absolute
value of the calculated t is much greater than the tabulated t, it is highly significant. Hence,
the null hypothesis is rejected at the 5% level of significance and we conclude that the sample
mean differs significantly from the intended weight.
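The calculation behind this decision can be sketched in Python. The sketch assumes that the quoted standard deviation of 0.15 kg is the unbiased sample estimate S (divisor n − 1), so that t = (x̄ − μ0) / (S/√n); if 0.15 kg were instead computed with divisor n, the denominator would be 0.15/√(n − 1) and t would be −4.0. The decision is the same under either convention.

```python
import math

n = 10            # number of cartons
x_bar = 11.8      # sample mean (kg)
s = 0.15          # sample standard deviation (kg), assumed to use divisor n - 1
mu_0 = 12.0       # hypothesised mean (kg)
t_tabulated = 2.26  # given in the hint for 9 d.f. at the 5% level

t = (x_bar - mu_0) / (s / math.sqrt(n))
reject_h0 = abs(t) > t_tabulated

print("t =", round(t, 3), " d.f. =", n - 1, " reject Ho:", reject_h0)
```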
Let x1, x2, ..., xn1 and y1, y2, ..., yn2 be two independent random samples from the
given normal populations.
Ho: μx = μy, i.e. the two samples have been drawn from normal populations with the same
means. Under the hypothesis that σ1² = σ2² = σ², i.e. the population variances are equal but
unknown, the test statistic under Ho is:
$$t = \frac{\bar{x} - \bar{y}}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Where
$$\bar{x} = \frac{1}{n_1}\sum x_i, \qquad \bar{y} = \frac{1}{n_2}\sum y_j$$
And
$$S^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{n_1 + n_2 - 2}$$
This is an unbiased estimate of the common population variance σ² based on both the
samples. By comparing the computed value of t with the tabulated value of t for n1 + n2 - 2
d.f. at the desired level of significance, usually 5% or 1%, we reject or accept the null hypothesis.
Example: The nicotine content in milligram of two samples of tobacco were found to be as
follows:
Sample A: 24 27 26 21 25
Sample B: 27 30 28 31 22 36
Can it be said that the two samples come from the same normal population having the same
mean?
Solution Hints: Applying the above formula and calculating the variance as appropriate, the
calculated t-value is -1.92. The tabulated value for 9 d.f. at the 5% level of significance for a
two-tailed test is 2.262. Since the absolute value of the calculated t is less than the tabulated t,
it is not significant and the null hypothesis is accepted.
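The value t = −1.92 quoted in the hints can be reproduced with the following Python sketch of the pooled-variance t statistic, t = (x̄ − ȳ) / (S√(1/n1 + 1/n2)) with S² = [∑(x − x̄)² + ∑(y − ȳ)²] / (n1 + n2 − 2). The function name `pooled_t` is an illustrative choice.

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate, and its d.f."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    s2 = ss / (n1 + n2 - 2)                       # pooled variance
    t = (mean_x - mean_y) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

sample_a = [24, 27, 26, 21, 25]
sample_b = [27, 30, 28, 31, 22, 36]

t, df = pooled_t(sample_a, sample_b)
t_tabulated = 2.262                               # quoted for 9 d.f., two-tailed, 5%
print("t =", round(t, 2), " d.f. =", df, " significant:", abs(t) > t_tabulated)
```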
4.0 CONCLUSION
The t-test has very wide applications. It can be applied in the test of a single mean, in the
comparison of two different means and in the test of significance of other parameter
estimates.
5.0 SUMMARY
Here, you would have learnt how to apply the t-test in solving statistical problems, such as
testing whether a mean has a certain value and testing the significance of the difference
between two means, among others.
2. Prices of shares of a company on different days in a month were found to be: 66, 65,
69, 70, 69, 71, 70, 63, 64 and 68. Discuss whether the mean price of the
shares in the month is 65.
3. Two salesmen A and B are working in a certain district. From a sample survey conducted
by the Head Office the following results were obtained. State whether there is any
significant difference in the average sales between the two salesmen.
                                A     B
No. of sales                    20    18
Average sales (in '000N)        170   205
Standard deviation (in '000N)   20    25
UNIT 2:
F-TEST
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Applications of the F-distribution
3.2 For testing equality of population variances
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
In the F-test, if X is a χ²-variate with n1 degrees of freedom and Y is an independent χ²-variate
with n2 degrees of freedom, then the F-statistic is defined as:
$$F = \frac{X/n_1}{Y/n_2}$$
i.e. the F-statistic is the ratio of two independent chi-square variates divided by their respective
degrees of freedom. The statistic follows G.W. Snedecor's F-distribution with (n1, n2) degrees of
freedom, with probability density function given by:
$$p(F) = k \cdot \frac{F^{(n_1/2) - 1}}{\left(1 + \frac{n_1}{n_2}F\right)^{(n_1 + n_2)/2}}, \qquad 0 \le F < \infty$$
Where k is a constant which is so determined that the total area under the probability curve is
1, i.e.
$$\int_0^\infty p(F)\,dF = 1$$
Note: The sampling distribution of the F-statistic does not involve any population parameters
and depends only on the degrees of freedom n1 and n2. The graph of the function p(F) varies
with the degrees of freedom n1 and n2.
Critical values of F-distribution: The available F-tables in most standard statistical tables
give the critical values of F for the right-tailed test, i.e. the critical region is determined by the
right tail areas. Thus, the significant value Fα(v1, v2) at level of significance α and (v1, v2) d.f.
is determined by the equation:
$$P\left[F > F_\alpha(v_1, v_2)\right] = \alpha$$
2.0 OBJECTIVE
The main objective of this section is to introduce students to the F-distribution and
its theory and application to day-to-day business and economic problems.
3.2 For testing equality of population variances: Here, we set up the null hypothesis Ho:
σ1² = σ2² = σ², i.e. the population variances are the same. In other words, Ho is that the two
independent estimates of the common population variance do not differ significantly.
Under Ho, the test statistic is
$$F = \frac{S_1^2}{S_2^2}$$
Where $S_1^2$ and $S_2^2$ are unbiased estimates of the common population variance σ² and are given
by:
$$S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2$$
and it follows Snedecor's F-distribution with v1 = n1 - 1, v2 = n2 - 1 d.f.; i.e. F ~ F(v1, v2)
Since the F-test is based on the ratio of two variances, it is also known as the variance ratio test.
Assumptions for the F-test for equality of variances
1. The samples are simple random samples
2. The samples are independent of each other
3. The parent populations from which the samples are drawn are normal
N.B. (1) Since most available tables of the significant values of F are for the right-tail test,
i.e. against the alternative H1: σ1² > σ2², in numerical problems we will take the greater of the
variances $S_1^2$ or $S_2^2$ as the numerator and adjust the degrees of freedom accordingly. Thus,
in F ~ F(v1, v2), v1 refers to the degrees of freedom of the larger variance, which must be taken
as the numerator while computing F.
If Ho is true, i.e. σ1² = σ2² = σ², the value of F should be around 1; otherwise, it should be
greater than 1. If the value of F is far greater than 1, Ho should be rejected. Finally, if we
take the larger of $S_1^2$ or $S_2^2$ as the numerator, all the tests based on the F-statistic become
right-tailed tests.
- All one-tailed tests for Ho at level of significance "α" will be right-tailed tests only,
with area "α" in the right tail.
- For two-tailed tests, the critical values are located in the right tail of the F-distribution
with area (α/2) in the right tail.
Example 1: The time taken (in minutes) by drivers to drive from Town A to Town B driving
two different types of cars X and Y is given below
Car Type X: 20 16 26 27 23 22
Car Type Y: 27 33 42 35 32 34 38
Do the data show that the variances of time distribution from population from which the
samples are drawn do not differ significantly?
Solution:
X    d = x - 22   d²    Y    d = y - 35   d²
20   -2           4     27   -8           64
16   -6           36    33   -2           4
26   4            16    42   7            49
27   5            25    35   0            0
23   1            1     32   -3           9
22   0            0     34   -1           1
                        38   3            9
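The variance ratio for this example can be computed with a short Python sketch. It uses the unbiased sample variances (divisor n − 1) from the standard library and, following the convention stated above, places the larger variance in the numerator; the variable names are illustrative.

```python
from statistics import variance

car_x = [20, 16, 26, 27, 23, 22]
car_y = [27, 33, 42, 35, 32, 34, 38]

s2_x = variance(car_x)    # unbiased estimate, divisor n1 - 1
s2_y = variance(car_y)    # unbiased estimate, divisor n2 - 1

# Larger variance goes in the numerator, with its degrees of freedom first
if s2_y >= s2_x:
    F = s2_y / s2_x
    df = (len(car_y) - 1, len(car_x) - 1)
else:
    F = s2_x / s2_y
    df = (len(car_x) - 1, len(car_y) - 1)

print("S1^2 =", round(s2_x, 4), " S2^2 =", round(s2_y, 4))
print("F =", round(F, 2), " d.f. =", df)
```

The computed F is then compared with the tabulated value of F for the stated degrees of freedom at the chosen level of significance.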
4.0 CONCLUSION
In conclusion, F-test can be used to test the equality of several population variances, several
population means, and overall significance of a regression model.
5.0 SUMMARY
Students have learnt the theories and application of the F-test
Sample   Size   Sample Mean   Sum of squares of deviation from the mean
1        10     12            120
2        12     15            314
UNIT 3:
CHI-SQUARE TEST
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Application of Chi-Square Distribution
3.2 Chi-squared test of goodness of fit
3.3 Steps for computing χ2 and drawing conclusions
3.4 Chi-Square test for independence of attributes
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/ Further Reading
1.0 INTRODUCTION
The square of a standard normal variable is called a Chi-square variate with 1 degree of
freedom, abbreviated as d.f. Thus if X is a random variable following a normal distribution with
mean μ and standard deviation σ, then (X - μ)/σ is a standard normal variate.
If X1, X2, X3, ..., Xv are v independent random variables following normal
distributions with means μ1, μ2, μ3, ..., μv and standard deviations σ1, σ2, σ3, ..., σv
respectively, then the variate
$$\chi^2 = \left(\frac{X_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{X_2 - \mu_2}{\sigma_2}\right)^2 + \cdots + \left(\frac{X_v - \mu_v}{\sigma_v}\right)^2 = \sum_{i=1}^{v}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$$
which is the sum of the squares of v independent standard normal variates, follows the Chi-square
distribution with v d.f.
2.0 OBJECTIVE
The main objective of this unit is to enable students to understand the theory behind and the
application of chi-square statistics. Students are expected at the end of this unit to be able to
apply chi-square analysis to solving day-to-day business and economic problems.
Definition of χ²
A measure of the discrepancy existing between the observed and expected frequencies is
supplied by the statistic χ² given by
$$\chi^2 = \sum_{i=1}^{n}\frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ and $E_i$ are the observed and expected frequencies respectively.
observation (experiment) and theory may be attributed to chance (fluctuations of sampling) or
if it is really due to the inadequacy of the theory to fit the observed data.
Under the null hypothesis that there is no significant difference between the observed
(experimental) and the theoretical (hypothetical) values, i.e. that there is good compatibility
between theory and experiment, the statistic
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
follows the χ2-distribution with v = n-1 d.f., where O1, O2, ..., On are the observed
frequencies and E1, E2, ..., En are the corresponding expected or theoretical
frequencies obtained under some theory or hypothesis.
(v) Under the null hypothesis that the theory fits the data well, the statistic follows the
χ2-distribution with v = n-1 d.f.
(vi) Look up the tabulated (critical) value of χ2 for (n-1) d.f. at the chosen level of
significance, usually 5% or 1%, in any Chi-square distribution table.
If the calculated value of χ2 obtained in step (iv) is less than the corresponding
tabulated value obtained in step (vi), then it is said to be non-significant at the
required level of significance. This implies that the discrepancy between the observed
values (experiment) and the expected values (theory) may be attributed to chance,
i.e. fluctuations of sampling. In other words, the data do not provide any evidence
against the null hypothesis [given in step (v)], which may therefore be accepted at
the required level of significance, and we may conclude that there is good
correspondence (fit) between theory and experiment.
(vii) On the other hand, if calculated value of χ2 is greater than the tabulated value, it is
said to be significant. In other words, discrepancy between observed and expected
frequencies cannot be attributed to chance and we reject the null hypothesis. Thus,
we conclude that the experiment does not support the theory.
Example 1: A pair of dice is rolled 500 times and the sums obtained are recorded (the observed
frequencies are shown alongside the expected frequencies further below).
It should be noted that the probabilities of the sums, if the dice are fair, are determined
from the distribution of x as in the table below:
Sum (x)   2     3     4     5     6     7     8     9     10    11    12
P(x)      1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
To obtain the expected frequencies, P(x) is multiplied by the total number of trials (500):
Sum (x)   Observed (O)   P(x)   Expected (E)
2         15             1/36   13.9
3         35             2/36   27.8
4         49             3/36   41.7
5         58             4/36   55.6
6         65             5/36   69.5
7         76             6/36   83.4
8         72             5/36   69.5
9         60             4/36   55.6
10        35             3/36   41.7
11        29             2/36   27.8
12        6              1/36   13.9
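To calculate the overall Chi-squared value, recall that $\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$, i.e. we add the
individual χ2 values; for the sum of 2, for example, (15 - 13.9)²/13.9 = 0.09.
Therefore, χ2 = 0.09 + 1.86 + 1.28 + 0.10 + 0.29 + 0.66 + 0.09 + 0.35 + 1.08 + 0.05 + 4.49
χ2 = 10.34
There are 11 classes, so the degrees of freedom are v = 11 - 1 = 10. At the 5% level of
significance the tabulated χ2 for 10 d.f. is 18.3.
Therefore, table value = 18.3
Decision: Since the calculated value (10.34) is less than the table (critical) value (18.3),
the null hypothesis that the dice are fair is accepted.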
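The arithmetic of Example 1 can be checked with a short script. The following is only a sketch: the variable names are illustrative, the expected frequencies are computed without the rounding used in the table above (so the statistic differs slightly in the second decimal place), and the critical value 18.3 is simply copied from the text rather than computed.

```python
from fractions import Fraction

# Observed frequencies of the sums 2..12 in 500 rolls (Example 1).
observed = [15, 35, 49, 58, 65, 76, 72, 60, 35, 29, 6]

# P(x) for the sum of two fair dice: 1/36, 2/36, ..., 6/36, ..., 1/36.
probabilities = [Fraction(6 - abs(7 - s), 36) for s in range(2, 13)]

# Expected frequency = P(x) multiplied by the total number of trials.
n_trials = sum(observed)
expected = [float(p * n_trials) for p in probabilities]

# Chi-square statistic: sum of (O - E)^2 / E over all classes.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

CRITICAL_VALUE = 18.3  # table value quoted in the text (10 d.f., 5% level)
reject_null = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f} with {degrees_of_freedom} d.f.")
print("Reject H0" if reject_null else "Do not reject H0")
```

Because the calculated statistic stays well below the table value, the script reaches the same decision as the hand calculation.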
Exercise: The following figures show the distribution of digits in numbers chosen at random
from a telephone directory:
Digit      0      1      2    3    4      5    6      7    8    9    Total
Frequency  1,026  1,107  997  966  1,075  933  1,107  972  964  853  10,000
Test whether the digits may be taken to occur equally frequently in the directory. The table
value of χ2 for 9 d.f. at 5% level of significance is 16.92.
Hint: Set up the null hypothesis that the digits 0, 1, 2, 3, ..........9 in the numbers in the
telephone directory are uniformly distributed, i.e all digits occur equally frequently in the
directory. Then, under the null hypothesis, the expected frequency for each of the digits 0, 1,
2, 3,.............9 is 10,000/10 = 1,000
Where (Ai) is the frequency of the ith attribute Ai, i.e. the number of persons possessing the
attribute Ai, i = 1, 2, ..., r; (Bj) is the number of persons possessing the attribute Bj,
j = 1, 2, ..., s; and (AiBj) is the number of persons possessing both the attributes Ai and Bj
(i = 1, 2, ..., r; j = 1, 2, ..., s).
Under the hypothesis that the two attributes A and B are independent, the expected frequency
for (AiBj) is given by
$(A_iB_j)_o = N \cdot \frac{(A_i)}{N} \cdot \frac{(B_j)}{N} = \frac{(A_i)(B_j)}{N}$
Thus, under the null hypothesis of independence of attributes, the expected frequency for
each cell of the above table can be obtained using this last equation. The rule in the last
equation can be stated in words as follows:
“Under the hypothesis of independence of attributes, the expected frequency for any cell
can be obtained by multiplying the row total and the column total in which the frequency
occurs and dividing the product by the total frequency N”.
Here, we have a set of r x s observed frequencies (AiBj) and the corresponding expected
frequencies (AiBj)o. Applying the χ2-test of goodness of fit, the statistic is
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{[(A_iB_j) - (A_iB_j)_o]^2}{(A_iB_j)_o}$
Comparing this calculated value of χ2 with the tabulated value for (r-1) x (s-1) d.f. at a
chosen level of significance, we reject or retain the null hypothesis of independence of
attributes at that level of significance.
Note: For contingency table data, the null hypothesis is always set up that the attributes
under consideration are independent. It is only under this hypothesis that the formula
(AiBj)o = (Ai)(Bj)/N can be used to compute the expected frequencies.
Example: A movie producer is bringing out a new movie. In order to map out her
advertising, she wants to determine whether the movie will appeal most to a particular age
group or whether it will appeal equally to all age groups. The producer takes a random
sample from persons attending a pre-reviewing show of the new movie and obtained the
result in the table below. Use Chi-square (χ2) test to arrive at the conclusion (α=0.05).
                       Age-groups (in years)
Persons               Under 20   20-39   40-59   60 & over   Total
Liked the movie       320        80      110     200         710
Disliked the movie    50         15      70      60          195
Indifferent           30         5       20      40          95
Total                 400        100     200     300         1,000
Solution:
It should be noted that the two attributes being considered here are the age groups of the
people and their level of likeness of the new movie. Our concern here is to determine whether
the two attributes are independent or not.
Null hypothesis (Ho): Liking of the movie is independent of age group (i.e. the
movie appeals in the same way to the different age groups)
Alternative hypothesis (Ha): Liking of the movie depends on age group (i.e. the
movie appeals differently across age groups)
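As earlier explained, to calculate the expected value in the cell of row 1 column 1, we divide
the product of the row 1 total and the column 1 total by the grand total (N), i.e.
Eij = (Ai)(Bj)/N
Therefore, E11 = (710 x 400)/1,000 = 284
E12 = (710 x 100)/1,000 = 71
E13 = (710 x 200)/1,000 = 142
E14 = (710 x 300)/1,000 = 213
E21 = (195 x 400)/1,000 = 78
E22 = (195 x 100)/1,000 = 19.5
E23 = (195 x 200)/1,000 = 39
E24 = (195 x 300)/1,000 = 58.5
E31 = (95 x 400)/1,000 = 38
E32 = (95 x 100)/1,000 = 9.5
E33 = (95 x 200)/1,000 = 19
E34 = (95 x 300)/1,000 = 28.5
Table of expected values
Persons               Under 20   20-39   40-59   60 & over   Total
Liked the movie       284        71      142     213         710
Disliked the movie    78         19.5    39      58.5        195
Indifferent           38         9.5     19      28.5        95
Total                 400        100     200     300         1,000
χ2 calculated = $\sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
= 4.56+1.14+7.21+0.79+10.05+1.04+24.64+0.04+1.68+2.13+0.05+4.64 = 57.97
Recall that the d.f. is (number of rows minus one) x (number of columns minus one), i.e.
(3 - 1) x (4 - 1) = 6. The table (critical) value of χ2 for 6 d.f. at α = 0.05 is 12.59.
Decision: Since the calculated χ2 value (57.97) is greater than the table (critical) value, we
reject the null hypothesis and accept the alternative.
Conclusion: It can be concluded that the movie appealed differently to different age groups
(i.e. liking of the movie is dependent on age).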
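The movie example can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function name `chi_square_independence` is illustrative, and the critical value 12.59 (6 d.f., 5% level) is taken from a standard Chi-square table rather than computed. Expected frequencies follow the rule row total x column total / N.

```python
# Observed frequencies: rows are opinions, columns are age groups
# (under 20, 20-39, 40-59, 60 and over).
observed = [
    [320, 80, 110, 200],  # liked the movie
    [50, 15, 70, 60],     # disliked the movie
    [30, 5, 20, 40],      # indifferent
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected table) for an r x s table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency of each cell under independence.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


chi_square, dof, expected = chi_square_independence(observed)

CRITICAL_VALUE = 12.59  # assumed: standard table value for 6 d.f. at the 5% level
print(f"chi-square = {chi_square:.2f}, d.f. = {dof}")
print("Reject H0" if chi_square > CRITICAL_VALUE else "Do not reject H0")
```

The unrounded statistic differs from the hand calculation only in the second decimal place, and the decision is unchanged.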
4.0 CONCLUSION
In conclusion, chi-squared analysis has very wide applications, which include the test of
independence of attributes, the test of goodness of fit, the test of equality of population
proportions and the test of whether a population has a specified variance, among others. This
powerful statistical tool is useful in business and economic decision making.
5.0 SUMMARY
In this unit, we have examined the concept of chi-square and its scope. We also looked at its
methodology and applications. It has been emphasized that it is not just an ordinary statistical
exercise but a practical tool for solving day-to-day business and economic problems.
Private School 6 14 17 9 46
Public School 30 32 17 3 86
Total 36 46 34 12 128
Ho: The distribution of test scores is the same for private and public high school students at
α=0.05
2. A manufacturing company has just introduced a new product into the market. In order to
assess consumers’ acceptability of the product and make efforts towards improving its
quality, a survey was carried out among the three major ethnic groups in Nigeria and the
following results were obtained:
Ethnic groups
Persons Igbo Yoruba Hausa Ijaw Total
Using the above information, does the acceptability of the product depend on the ethnic
group of the respondents? (Take α=1%)
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London,
UK: Macmillan.
UNIT 4:
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Assumption for ANOVA test
3.2 The one-way classification
3.3 Bernoulli Distribution
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
In day-to-day business management and in the sciences, instances may arise where we need to
compare means. If there are only two means, e.g. the average recharge card expenditure of
male and female students in a faculty of a university, the typical t-test for the difference of
two means handles the problem. In real life, however, we are often confronted with situations
where more than two means must be compared at the same time. The t-test for the difference
of two means cannot handle this type of problem directly; the obvious workaround is to
compare two means at a time using the t-test treated earlier. This process is very time
consuming, since as few as 4 sample means would require 4C2 = 6 different tests to compare
the 6 possible pairs of sample means. Therefore, there must be a procedure that can compare
all means simultaneously. One such procedure is the analysis of variance (ANOVA). For
instance, we may be interested in the mean telephone recharge expenditures of various groups
of students in the university, such as students in the faculties of Science, Arts, Social
Sciences, Medicine and Engineering. We may be interested in testing whether the average
monthly expenditures of students in the five faculties are equal, i.e. whether they are drawn
from the same normal population. The answer to this problem is provided by the technique of
analysis of variance. It should be noted that the basic purpose of the analysis of variance is
to test the homogeneity of several means.
The term Analysis of Variance was introduced by Prof. R.A. Fisher in the 1920s to deal with
problems in the analysis of agronomical data. Variation is inherent in nature. The total
variation in any set of numerical data is due to a number of causes which may be classified
as:
(i) assignable causes, and
(ii) chance causes.
The variation due to assignable causes can be detected and measured, whereas the variation
due to chance is beyond human control and cannot be traced separately.
2.0 OBJECTIVE
The main objective of this unit is to teach students the theory and application of Analysis of
Variance (ANOVA). It is hoped that after taking this unit students will be able to apply
ANOVA in solving business and economic problems, especially as they concern the multiple
comparison of means.
ANOVA as a tool has different dimensions and complexities. ANOVA can be (a) one-way
classification or (b) two-way classification. However, only the one-way ANOVA is dealt
with in this course material.
Note
(ii) The origin of the ANOVA technique lies in agricultural experiments and as such
its language is loaded with such terms as treatments, blocks, plots etc. However,
ANOVA technique is so versatile that it finds applications in almost all types of
design of experiments in various diverse fields such as industry, education,
psychology, business, economics etc.
(iii) It should be clearly understood that ANOVA technique is not designed to test
equality of several population variances. Rather, its objective is to test the equality
of several population means or the homogeneity of several independent sample
means.
(iv) In addition to testing the homogeneity of several sample means, the ANOVA
technique is now frequently applied in testing the linearity of the fitted regression
line or the significance of the correlation ratio.
$n = n_1 + n_2 + \cdots + n_k = \sum_{i=1}^{k} n_i$
The total variation in the observations Xij can be split into the following two components:
(i) The variation between the classes or the variation due to different bases of
classification (commonly known as treatments in pure sciences, medicine and
agriculture). This type of variation is due to assignable causes which can be
detected and controlled by human endeavour.
(ii) The variation within the classes, i.e. the inherent variation of the random variable
within the observations of a class. This type of variation is due to chance causes
which are beyond the control of man.
The main objective of the analysis of variance technique is to examine if there is significant
difference between the class means in view of the inherent variability within the separate
classes.
Steps for testing hypothesis for more than two means (ANOVA): Here, we adopt the
rejection region method and the steps are as follows:
Step 2: Compute the means and standard deviations for each of the k classes; the class
means are given by the formula:
$\bar{X}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}, \quad i = 1, 2, \ldots, k$
Also, compute the mean of all the data observations in the k classes by the formula:
$\bar{X} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}$
The Between Classes Sum of Squares is
$BSS = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$
Step 5: Obtain the Within Classes Sum of Squares (WSS) by the formula:
$WSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$
The test statistic is
$F = \frac{BSS/(k-1)}{WSS/(n-k)}$
which follows the F-distribution with (v1 = k-1, v2 = n-k) d.f. (This implies that the
degrees of freedom are two in number. The first is the number of classes (treatments) less
one, while the second is the number of observations less the number of classes.)
Step 8: Find the critical value of the test statistic F for these degrees of freedom and at
the desired level of significance in any standard statistical table.
If the computed value of the test statistic F is greater than the critical (tabulated) value,
reject Ho; otherwise Ho may be regarded as true.
Example 1: To test the hypothesis that the average number of days a patient is kept in the
three local hospitals A, B and C is the same, a random check on the number of days that
seven patients stayed in each hospital reveals the following:
Hospital A: 8 5 9 2 7 8 2
Hospital B: 4 3 8 7 7 1 5
Hospital C: 1 4 9 8 7 2 3
Test the hypothesis at 5 percent level of significance.
Solution: Let X1j, X2j, X3j denote the number of days the jth patient stays in the hospitals A, B
and C respectively
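X1j     X2j     X3j     (X1j - X̄1)²   (X2j - X̄2)²   (X3j - X̄3)²
8       4       1       4.5796        1             14.8996
5       3       4       0.7396        4             0.7396
9       8       9       9.8596        9             17.1396
2       7       8       14.8996       4             9.8596
7       7       7       1.2996        4             4.5796
8       1       2       4.5796        16            8.1796
2       5       3       14.8996       0             3.4596
Total   41      35      34
Sum of squared deviations:  = 50.8572     = 38          = 58.8572
X̄1 = 41/7 = 5.86; X̄2 = 35/7 = 5; X̄3 = 34/7 = 4.86; grand mean X̄ = 110/21 = 5.24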
Within Sample Sum of Square: To find the variation within the sample, we compute the
sum of the square of the deviations of the observations in each sample from the mean values
of the respective samples (see the table above)
Sum of Squares within Samples = 50.8572 + 38 + 58.8572 = 147.71
To obtain the variation between samples, we compute the sum of the squares of the
deviations of the various sample means from the overall (grand) mean.
(X̄1 - X̄)² = (5.86 - 5.24)² = 0.3844;
(X̄2 - X̄)² = (5 - 5.24)² = 0.0576;
(X̄3 - X̄)² = (4.86 - 5.24)² = 0.1444;
Sum of Squares between Samples = 7 (0.3844 + 0.0576 + 0.1444) = 4.10
The total variation in the sample data is obtained by calculating the sum of the squares
of the deviations of each sample observation from the grand mean, for all the samples:
Total Sum of Squares = $\sum_{i} \sum_{j} (X_{ij} - \bar{X})^2$ = 151.81
Note: Sum of Squares Within Samples + S.S Between Samples = 147.71 + 4.10 =151.81
Ordinarily, there is no need to find the sum of squares within the samples (i.e, the error sum
of squares), the calculations of which are quite tedious and time consuming. In practice, we
find the total sum of squares and between samples sum of squares which are relatively simple
to calculate. Finally within samples sum of squares is obtained by subtracting Between
Samples Sum of Squares from the Total Sum of Squares:
Degrees of freedom for Between Classes Sum of Squares = k-1 = 3 - 1 = 2
Degrees of freedom for Within Classes (or Error) Sum of Squares = n-k = 21 - 3 = 18
ANOVA TABLE
Sources of          d.f. (2)   Sum of Squares   Mean Sum of Squares   Variance Ratio (F)
variation (1)                  (S.S) (3)        (4) = (3)/(2)
Between samples     2          4.10             2.05                  2.05/8.21 = 0.25
Within samples      18         147.71           8.21
Total               20         151.81
Critical Value: The tabulated (critical) value of F for (v1=2, v2=18) d.f. at 5% level of
significance is 3.55
Since the calculated F = 0.25 is less than the critical value 3.55, it is not significant. Hence
we fail to reject Ho.
However, in cases like this when MSS between classes is less than the MSS within classes,
we need not calculate F and we may conclude that the means X̄1, X̄2 and X̄3 do not differ
significantly. Hence, Ho may be regarded as true.
Conclusion: Ho : μ1 = μ2 = μ3, may be regarded as true and we may conclude that there is no
significant difference in the average stay at each of the three hospitals.
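The hospital example can be verified directly from the definitions of the between and within sums of squares. The sketch below uses illustrative names (`one_way_anova`, `stays`) and copies the critical value 3.55 from the text; it works with unrounded means, so the sums of squares may differ from the hand calculation in the second decimal place.

```python
# Number of days stayed by seven patients in each hospital (Example 1).
stays = {
    "A": [8, 5, 9, 2, 7, 8, 2],
    "B": [4, 3, 8, 7, 7, 1, 5],
    "C": [1, 4, 9, 8, 7, 2, 3],
}


def one_way_anova(groups):
    """Return (BSS, WSS, F, (v1, v2)) for a list of samples."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between classes: squared deviations of class means from the grand mean.
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within classes: squared deviations of observations from their class mean.
    wss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_ratio = (bss / (k - 1)) / (wss / (n - k))
    return bss, wss, f_ratio, (k - 1, n - k)


bss, wss, f_ratio, dof = one_way_anova(list(stays.values()))

CRITICAL_F = 3.55  # table value quoted in the text for (2, 18) d.f. at 5%
print(f"BSS = {bss:.2f}, WSS = {wss:.2f}, F = {f_ratio:.2f}, d.f. = {dof}")
print("Reject H0" if f_ratio > CRITICAL_F else "Do not reject H0")
```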
Critical Difference: If the classes (called treatments in pure sciences) show significant effect
then we would be interested to find out which pair(s) of treatment differ significantly. Instead
of calculating Student’s t for different pairs of classes (treatments) means, we calculate the
Least Significant Difference (LSD) at the given level of significance. This LSD is also known
as Critical Difference (CD).
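The LSD between any two class (treatment) means, say X̄i and X̄j, at level of significance α
is given by:
LSD (X̄i - X̄j) = [The critical value of t at level of significance α and error d.f.] x
[S.E (X̄i - X̄j)]
Note: S.E means Standard Error. Therefore, S.E (X̄i - X̄j) above means the standard error of
the difference between the two class means, so that
LSD = t n-k (α/2) x $\sqrt{MSSE \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
where MSSE is the Within Classes (Error) Mean Sum of Squares.
If the difference between any two class (treatment) means is greater than the LSD, the
difference is said to be significant.
The sums of squares can also be obtained by the following alternative method:
Step 1: Compute the grand total: G = $\sum_{i} \sum_{j} X_{ij}$
Step 2: Compute Correction Factor (CF) = G²/n, where n = n1 + n2 + ... + nk is the total
number of observations.
Step 3: Compute the Raw Sum of Squares (RSS) = $\sum_{i} \sum_{j} X_{ij}^2$, the sum of the squares of
all the observations.
Step 4: Total S.S = RSS - CF
Step 5: Compute $\sum_{i} T_i^2 / n_i$, where Ti is the total of the observations in the ith class.
Step 6: Between Classes S.S = $\sum_{i} T_i^2 / n_i$ - CF
Step 7: Within Classes or Error Sum of Squares = Total S.S - Between Classes S.S
The calculations here are much simpler and shorter than in the first method.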
Application: Let us now apply this alternative method to solve the same problem treated
earlier.
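G = 41 + 35 + 34 = 110 and CF = G²/n = 110²/21 = 576.19
RSS = 291 + 213 + 224 = 728, so TSS = RSS - CF = 728 - 576.19 = 151.81
But $\sum_{i} T_i^2 / n_i$ = (41² + 35² + 34²)/7 = 4062/7 = 580.29
Therefore, BCSS = 580.29 - 576.19 = 4.10
Therefore, Within Classes (hospitals) Sum of Squares or Error S.S = TSS - BCSS
= 151.81 - 4.10 = 147.71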
Having arrived at the same Sums of Squares figures, computations can proceed as done
earlier.
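The shortcut computation (grand total, correction factor, raw sum of squares) can be sketched as follows. The function name `anova_shortcut` and the dictionary keys are illustrative choices, not notation fixed by the text; the data are the hospital stays from Example 1.

```python
def anova_shortcut(groups):
    """Sums of squares via the grand total G and the correction factor CF."""
    totals = [sum(g) for g in groups]             # class totals T_i
    n = sum(len(g) for g in groups)               # total number of observations
    grand_total = sum(totals)                     # G
    correction_factor = grand_total ** 2 / n      # CF = G^2 / n
    raw_ss = sum(x * x for g in groups for x in g)  # RSS
    total_ss = raw_ss - correction_factor           # TSS = RSS - CF
    between_ss = (
        sum(t * t / len(g) for t, g in zip(totals, groups)) - correction_factor
    )                                               # BCSS
    within_ss = total_ss - between_ss               # WSS = TSS - BCSS
    return {
        "G": grand_total,
        "CF": correction_factor,
        "RSS": raw_ss,
        "TSS": total_ss,
        "BCSS": between_ss,
        "WSS": within_ss,
    }


hospital_stays = [
    [8, 5, 9, 2, 7, 8, 2],  # Hospital A
    [4, 3, 8, 7, 7, 1, 5],  # Hospital B
    [1, 4, 9, 8, 7, 2, 3],  # Hospital C
]
result = anova_shortcut(hospital_stays)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

The three sums of squares agree with those obtained from the deviations about the means, which is the point of the alternative method.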
Example 2: The table below gives the retail prices of a commodity in some shops selected at
random in four cities: Lagos, Calabar, Kano and Abuja. Carry out the Analysis of Variance
(ANOVA) to test the significance of the differences between the mean prices of the
commodity in the four cities.
If a significant difference is established, calculate the Least Significant Difference (LSD)
and use it to compare all the possible pairs of means (α=0.05).
Solution:
Using the alternative method of obtaining the sums of squares:
City       Price per unit of the commodity     Total   Mean
           in different shops
Lagos      9     7     10    8                 34      8.5
Calabar    5     4     5     6                 20      5
Kano       10    8     9     9                 36      9
Abuja      7     8     9     8                 32      8
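G = 34 + 20 + 36 + 32
= 122
CF = G²/n = (122)²/16
= 930.25
RSS = (9² + 7² + 10² + 8²) + (5² + 4² + 5² + 6²) + (10² + 8² + 9² + 9²) +
(7² + 8² + 9² + 8²)
RSS = 980
TSS = RSS - CF
= 980 - 930.25
TSS = 49.75
BCSS = (34² + 20² + 36² + 32²)/4 - CF
= 969 - 930.25
BCSS = 38.75
WSS = TSS - BCSS
= 49.75 - 38.75
WSS = 11
Between Classes Mean Sum of Squares = BCSS/(k-1), where k is the number of classes
= 38.75/(4 - 1) = 38.75/3
= 12.92
Within Classes (Error) Mean Sum of Squares = WSS/(n-k) = 11/(16 - 4)
= 0.92
F calculated = 12.92/0.92
F calculated = 14.04
Decision: Since the computed F is greater than the table value F(v1, v2, α) = F(3, 12, 0.05)
= 3.49, the null hypothesis is rejected and the alternative is accepted.
LSD = t n-k (α/2) x $\sqrt{MSSE \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
= 2.18 x $\sqrt{0.92 \left(\frac{1}{4} + \frac{1}{4}\right)}$
= 2.18 X 0.678
LSD = 1.48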
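With the LSD in hand, each pair of city means can be compared against it. The sketch below is illustrative only: the names `lsd` and `significant_pairs` are not from the text, and the t value 2.18 (12 error d.f., 5% two-tailed) is copied from the worked solution rather than computed.

```python
from itertools import combinations
from math import sqrt

# Retail prices per unit in four shops in each city (Example 2).
prices = {
    "Lagos": [9, 7, 10, 8],
    "Calabar": [5, 4, 5, 6],
    "Kano": [10, 8, 9, 9],
    "Abuja": [7, 8, 9, 8],
}

n = sum(len(v) for v in prices.values())  # total observations
k = len(prices)                           # number of classes
means = {city: sum(v) / len(v) for city, v in prices.items()}

# Within-classes (error) sum of squares and its mean square.
wss = sum((x - means[city]) ** 2 for city, v in prices.items() for x in v)
mse = wss / (n - k)

T_CRITICAL = 2.18  # t value used in the text for n - k = 12 d.f.


def lsd(n_i, n_j):
    """Least Significant Difference between two class means."""
    return T_CRITICAL * sqrt(mse * (1 / n_i + 1 / n_j))


# A pair differs significantly when its mean difference exceeds the LSD.
significant_pairs = [
    (a, b)
    for a, b in combinations(prices, 2)
    if abs(means[a] - means[b]) > lsd(len(prices[a]), len(prices[b]))
]

for a, b in combinations(prices, 2):
    diff = abs(means[a] - means[b])
    verdict = "significant" if (a, b) in significant_pairs else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.2f} ({verdict})")
```

On these data only the comparisons involving Calabar exceed the LSD, which suggests that the mean price in Calabar differs from those in the other three cities.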
4.0 CONCLUSION
This unit has expounded the theory and application of Analysis of Variance in statistics, with
special emphasis on its application in the comparison of more than two means.
5.0 SUMMARY
In summary, ANOVA is very useful in the multiple comparison of means, among other
important uses in both the social and applied sciences.
Brand 2: 9, 8, 11, 8, 10
Brand 4: 8, 9, 13, 9
Test the hypothesis that the average life for each brand of tyres is the same. Take α = 0.01
7.0 REFERENCES / FURTHER READINGS
OKOJIE, DANIEL E. NOUN TEXT BOOK, Eco 203: Statistics for Economists
Gupta, S.C. (2011). Fundamentals of Statistics (6th Rev. & Enlarged ed.). Mumbai, India:
Himalaya Publishing House.
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Steps in Forecasting
3.2 Types of Forecasts
3.3 Methods of Forecasting
3.4 Least Squares or Trend Lines
3.5 Least Square Method
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References / Further Reading
1.0 Introduction
Assumptions in Forecasting
Forecasts are based on past performances. In other words, future values are predicted from
past values. This assumes that the future will be basically the same as the past and present,
implying that the relationships underlying the phenomenon of interest are stable over time.
Forecasting can be performed at different levels, depending on the use to which it will be put.
Simple guessing, based on previous figures, is occasionally adequate. However, where there
is a large investment at stake, structured forecasting is essential.
2.0 Objective
Any forecasts made, however technical or structured, should be treated with caution, since
the analysis is based on past data and there could be unknown factors present in the future.
However it is often reasonable to assume that patterns that have been identified in the
analysis of past data will be broadly continued, at least into the short-term future.
We outline the basic steps in forecasting as follows:
Step 1. Gather past data: daily, weekly, monthly, yearly.
Step 2. Adjust or clean up the raw data against inflationary factors. Index
numbers can be used in deflating inflationary factors.
Step 4. When the future data (which is being forecast) becomes available, compare
forecasts with actual values. By so doing, one will be able to establish the error
due to forecasting.
1. Short-term forecasts. These are forecasts concerning the near future. They are
characterized by few uncertainties and are therefore more accurate than distant-future
forecasts.
2. Long-term forecasts. These concern the distant future. They are characterized by more
uncertainties than short-term forecasts.
3. Extrapolation. These are forecasts based solely on past and present values of the variable
to be forecast. In this case, future values are extrapolated from past and present values.
I. Moving Averages
II. Trend lines or least squares.
Moving Averages
Moving averages can be used to generate the general picture (or trend) behind a set of data or
time series. The general pattern generated can be used to forecast future values.
Note that a time series is a name given to numerical data described over a uniform set of time
points. Time series occur naturally in all spheres of business activity.
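Sales (N)            Jan   Feb   March   April   May   June
Past (actual)        50    55    70      50      45    90
The 3-month moving averages, each placed against the middle month, are (rounded to the
nearest whole number):
(50 + 55 + 70)/3 = 58 (Feb)
(55 + 70 + 50)/3 = 58 (March)
(70 + 50 + 45)/3 = 55 (April)
(50 + 45 + 90)/3 = 62 (May)
Forecast sales (N)   Jan   Feb   March   April   May   June
Future (forecast)    -     58    58      55      62    -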
[Figure: actual and forecast (moving average) sales (N) plotted by month]
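The 3-month moving averages of the sales series can be generated with a short function. This is a minimal sketch; the function name `moving_average` and the choice to round to whole numbers (as in the table) are illustrative.

```python
def moving_average(values, window=3):
    """Return the simple moving averages of `values` for the given window."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


months = ["Jan", "Feb", "March", "April", "May", "June"]
sales = [50, 55, 70, 50, 45, 90]

averages = moving_average(sales, window=3)

# Each 3-month average is placed against the middle month of its window,
# so the first and last months have no moving average.
centred = dict(zip(months[1:-1], (round(a) for a in averages)))
print(centred)
```

A longer window smooths the series more strongly but leaves more months at each end without a value.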
The idea behind the use of trend lines in forecasting is based on the assumption that the
general picture underlying a given set of data can be reasonably approximated by a straight
line. Such a straight line can be extended backwards or forward for predicting past or future
values.
Example
Suppose the line AB in the following figure reasonably approximates a set of data for
1995 - 2000
[Figure 3.4: trend line AB of profit against year (1995 to 2000), extended backwards by the
dotted line AC and forwards by the dotted line BD]
Figure 3.4 indicates that we can forecast profit backwards for years below 1995, using the
dotted line AC. Similarly, profits can be forecast for years beyond 2000, using the dotted line
BD.
The basic task in using a trend line for forecasting is to determine a line similar to line AB in
figure 3.4; forecasting backwards or forwards is then a straightforward activity. The most
effective way of determining such a line is the Least-Squares method.
The least-squares method provides a sound mathematical basis for choosing the best trend
line out of all possible trend lines for a given time series. This method provides an
equation (with its numerical coefficients) so that the value corresponding to any given year
(or period) can be determined by substituting the given year (or period) into the equation.
The trend line has the form
Y = â + bt
where t denotes the time period. The Least-Squares method is then used to determine the
numerical values of the parameters â and b.
We assume:
t= 1 in 1990
t= 2 in 1991
t= 3 in 1992
t= 4 in 1993
t= 5 in 1994
t= 6 in 1995
t= 7 in 1996
t= 8 in 1997
t= 9 in 1998
t y
1 50
2 80
3 90
4 49
5 75
6 58
7 82
8 73
9 95
$\hat{a} = \bar{y} - b\bar{t}$
$b = \frac{n\sum ty - \sum t \sum y}{n\sum t^2 - (\sum t)^2}$
t      y      ty      t²
1      50     50      1
2      80     160     4
3      90     270     9
4      49     196     16
5      75     375     25
6      58     348     36
7      82     574     49
8      73     584     64
9      95     855     81
Total  45     652     3412    285
(Σt = 45, Σy = 652, Σty = 3412, Σt² = 285)
ȳ = Σy/n = 652/9 = 72.44
t̄ = Σt/n = 45/9 = 5
It follows that:
b = (9 x 3412 - 45 x 652)/(9 x 285 - 45²) = 1368/540
= 2.53
â = ȳ - b t̄
= 72.44 - 2.53 (5)
= 72.44 - 12.65
= 59.79
Y = 59.79 + 2.53 t.
This equation can be used at any time to forecast the value for any given year, provided the
numerical value of the year is appropriately identified.
For example, let us forecast the value of output, Y, for the year 2003.
Following the systematic process, the year 2003 is associated with the numerical value t =
14, so that for t = 14,
Y = 59.79 + 2.53 (14)
= 59.79 + 35.42
= 95.21
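The least-squares coefficients and the 2003 forecast can be checked numerically. The following is a sketch with illustrative names (`fit_trend_line`, `forecast_2003`); it keeps full precision throughout, so the forecast comes out at about 95.24 rather than the 95.21 obtained above from the rounded coefficients.

```python
# Time index (t = 1 for 1990, ..., t = 9 for 1998) and the observed values y.
t = list(range(1, 10))
y = [50, 80, 90, 49, 75, 58, 82, 73, 95]


def fit_trend_line(t, y):
    """Return (a, b) of the least-squares line Y = a + b*t."""
    n = len(t)
    sum_t, sum_y = sum(t), sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti * ti for ti in t)
    # Slope: (n*sum(ty) - sum(t)*sum(y)) / (n*sum(t^2) - (sum(t))^2)
    b = (n * sum_ty - sum_t * sum_y) / (n * sum_t2 - sum_t ** 2)
    # Intercept: mean of y minus slope times mean of t.
    a = sum_y / n - b * sum_t / n
    return a, b


a, b = fit_trend_line(t, y)
forecast_2003 = a + b * 14  # the year 2003 corresponds to t = 14

print(f"Y = {a:.2f} + {b:.2f} t")
print(f"Forecast for 2003: {forecast_2003:.2f}")
```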
4.0 CONCLUSION
This unit has expounded the theory and application of Forecasting and Time Series Analysis in
statistics, with special emphasis on moving averages and least-squares trend lines.
5.0 SUMMARY
In summary, time series analysis is very useful for forecasting future values from past data,
among other important uses in both management and applied sciences.
8 , 11, 10, 21, 4, 9, 12, 10, 23, 5, 10, 13, 11, 26, 6
which set of moving averages is the correct one to use for obtaining
a trend for the series.
2. The data given in Table below represent the annual gross revenue (in N’ millions)
obtained by a Telephone company over the periods 1997 – 2006:
ONWE J.O. NOUN TEXT BOOK, ENT 735: Quantitative Methods for Banking and
Finance
OKOJIE, DANIEL E. NOUN TEXT BOOK, Eco 203: Statistics for Economists