Economics Sem 1 Lecture Notes Introduction to Statistics (1)
Introduction to Statistics
1. INTRODUCTION
1.1 Definition of Statistics
In a singular sense, statistics refers to statistical methods or a subject of statistics. This is the
focus of this course!
Statistics is a branch of applied mathematics concerned with the collection and interpretation of
quantitative data and the use of probability theory to estimate population parameters. It is a science
that helps us make better decisions in business and economics as well as in other fields. Statistics
also teaches us how to summarize, analyze, and draw meaningful inferences/conclusions from data
that in turn lead to better decisions. In short, statistics is the science of conducting studies to
collect, organize, summarize, analyze, and draw conclusions from data.
There are two categories of statistics: descriptive statistics and inferential statistics.
Descriptive statistics deals with data collection, summarization, analysis and interpretation.
Descriptive statistics can only be used to describe the sample data. That is, the results
cannot be generalized to the population (no conclusion or inference about population
characteristics). In other words, we use descriptive statistics simply to describe what is
going on in the sample data we have at hand. It aggregates the sample data but does not
generalize to the population. Frequency distributions, graphs (bar charts, histograms,
pie charts, line graphs, etc.), measures of central tendency (e.g. mean, median and
mode) and measures of variation (standard deviation, range, CV) are examples of
descriptive statistics.
Example: suppose that a sample of marks of 6 students were 45, 60, 72, 80, 85 and 93. “The
average score of the six students is 72.5” or “The range of the six students is 48” are
descriptive statistics.
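The two descriptive statistics quoted in this example can be checked with a minimal Python sketch. The variable names are illustrative only and are not part of the notes.

```python
# Marks of the six students from the example above.
marks = [45, 60, 72, 80, 85, 93]

# Mean: sum of the observations divided by their number.
mean_mark = sum(marks) / len(marks)

# Range: largest observation minus smallest observation.
mark_range = max(marks) - min(marks)

print(mean_mark, mark_range)
```

Both values describe only these six students; nothing is inferred about any larger group.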
⚫ Descriptive Statistics
✓ Collect
✓ Organize
✓ Summarize
✓ Display
✓ Analyze
✓ Interpret
• Descriptive statistics is focused on summarizing the data collected from a sample. The
technique produces measures of central tendency and dispersion which represent how the
values of the variables are concentrated and dispersed.
• Inferential statistics generalizes the statistics obtained from a sample to the general population
to which the sample belongs. The measures of the population are termed as parameters.
• Descriptive statistics make only summarization of the properties of the sample from which
data were acquired, but in inferential statistics, the measure from the sample is used to infer
properties of the population.
• In inferential statistics, the parameters are estimated from a sample rather than from the whole
population; therefore, some uncertainty always remains about their real values.
In the field of economics it is almost impossible to think of a problem which does not require an
extensive use of statistical methods and statistical data. For instance,
• At macro level: government uses statistics to formulate sound policies (monetary, fiscal,
etc.), to depict trends in GDP using graphs, and for planning purposes, which requires
assessing the country's human and material resources
• At sectoral level: describing growth in the agricultural and industrial sectors, depicting trends
in the consumption of fertilizers and improved seeds over the last 15 years
• At micro level: statistics is required to describe, compare and correlate production,
distribution and consumption, and to analyze trends and fluctuations in prices, sales and
advertising using time series data
• At subject level: econometrics applies statistical methods to the empirical study of
economic theories and relationships
o Social science, e.g. psychology (human characteristics).
o Medical science, e.g. the effectiveness of a new drug.
o Pharmaceuticals, e.g. the percentage composition of a certain drug.
Limitations:
• Statistics is not suitable for studying qualitative phenomena like beauty, good, bad, honesty,
courage, intelligence, etc.
• It does not study individual items.
• Its laws are not as exact as laws in the natural sciences.
• Its results are true only on average.
• Its interpretation requires a high degree of skill.
• It may not necessarily bring out the cause and effect relationships between the variables.
• Its methods can give biased results.
• It is open to misuse, such as using data with no source, defective data, unrepresentative
data, inadequate data, unfair comparisons, and mistakes in arithmetic.
2. SAMPLING THEORY
a) Finite vs. infinite population: if it is possible to count all individuals in the population,
then it is a finite population. However, if the elements cannot all be counted, then it
is an infinite population.
b) Real vs. hypothetical population: a real population exists in reality but a hypothetical
population does not exist (it is simply imagined).
c) Target vs. studied population: the target population is the entire group a researcher is
interested in, or the group about which the researcher wishes to draw conclusions.
However, not all members of the target population are willing to be investigated. The population
out of which the sample is selected is called the studied population (the population that is
willing to be investigated).
Population parameters: the descriptive characteristics (or summary measures) obtained from
the entire population are termed population parameters. Examples are the population mean and
population variance.
Sample statistics are the summary measures obtained from the sample. Examples are the sample mean
and sample variance.
Sampling implies the selection of a few items from a given population.
◼ Sampling with replacement: an item, after being picked and included in the
sample, is put back into the population so that it can be picked again.
Sample size: the sample size (denoted by n) is the number of elements included in the sample for
investigation.
Sampling frame is the source material or device from which a sample is drawn. It is a list of all
those within a population who can be sampled, and may include individuals, households or
institutions.
Sampling unit is one of the units into which an aggregate is divided for the purpose of sampling,
each unit being regarded as individual and indivisible when the selection is made.
Sample design is the process of selecting sample elements from the sampling frame.
Sampling error is incurred when the statistical characteristics of a population are estimated from
a subset, or sample, of that population. Since the sample does not include all members of the
population, statistics computed on the sample, such as the sample mean, generally differ from
the corresponding population parameters, such as the population mean. In other words, the
discrepancy between a sample statistic and its population parameter is called sampling error.
It arises because we work with a sample rather than the whole population, and inappropriate
sampling techniques can make it larger.
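The idea of sampling error can be illustrated with a small Python sketch. Everything here is assumed for illustration: the population (the values 1 to 100), the sample size of 10 and the fixed random seed are not taken from the notes.

```python
import random
import statistics

random.seed(1)  # assumed seed, only to make the sketch repeatable

# Hypothetical population: the values 1, 2, ..., 100.
population = list(range(1, 101))
population_mean = statistics.mean(population)  # the population parameter

# Draw a sample of n = 10 without replacement and compute the sample statistic.
sample = random.sample(population, 10)
sample_mean = statistics.mean(sample)

# Sampling error: sample statistic minus population parameter.
sampling_error = sample_mean - population_mean
print(population_mean, sample_mean, sampling_error)
```

Re-running with a different seed gives a different sample mean, and hence a different sampling error, even though the sampling technique itself is sound.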
Non-sampling error is a catch-all term for the deviations of estimates from their true values that
are not a function of the sample chosen, including coverage errors (non-representative
sample), non-response errors (selected respondents do not answer), response errors
(respondents give false information), data entry errors, misunderstanding of the questions
being asked, etc.
(1) Costs/economy
(2) Timeliness
(3) Large size of many populations
(4) Inaccessibility of the entire population
(5) Destructive nature of many tests
(6) Reliability or accuracy.
(1) Cost/Economy
The unit cost of collecting data in a census is significantly less than in sampling.
However, due to the larger number of items in the population, the total cost involved in a
census is significantly higher than in sampling. Suppose it takes Birr 200 per unit to
make a census of 10,000,000 individuals but the unit cost of sampling 5,000 individuals is Birr
1,000. Thus, the total cost of the census is 10,000,000 x 200 = Birr 2,000,000,000 but that of the
sample is 5,000 x 1,000 = Birr 5,000,000.
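The cost comparison is simple arithmetic, and a short Python sketch makes it easy to re-check. The figures are those of the example; the variable names are illustrative.

```python
# Census: low unit cost, very many units (figures from the example, in Birr).
census_units, census_unit_cost = 10_000_000, 200
# Sample: higher unit cost, far fewer units.
sample_units, sample_unit_cost = 5_000, 1_000

census_total = census_units * census_unit_cost
sample_total = sample_units * sample_unit_cost

print(census_total, sample_total, census_total // sample_total)
```

The last printed value is how many times more the census costs than the sample in this example.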
(2) Timeliness
Due to the larger size of the population, the total time involved in a census is significantly higher
than that of sampling (i.e., the sample may provide us with the necessary information quickly).
(3) Large size of many populations
Sometimes the populations about which inferences must be made are so large that
it is impossible to cover all the items in the population. Thus, the solution is to take a sample from
such a population.
(4) Inaccessibility of the entire population
In some cases the entire population may not be accessible due to disease, death, conflict, mental
illness, imprisonment, etc. In that case sampling is necessary.
(5) Destructive nature of many tests:
Because many tests destroy the item being tested, the researcher is compelled to collect information
only from part of the population. For example: blood test for a patient, life hours of a tube light,
strength of wires, etc.
(6) Accuracy:
Non-sampling error in the case of a census is often higher than the non-sampling error committed in
the case of a sample survey (as less qualified investigators are involved in a census and its
supervision, monitoring and quality control mechanisms may be poor). The
higher the degree of non-sampling error, the less reliable your result may be.
A. Probability Sampling: in this case each observation in the population has a known, nonzero
chance of being selected to become part of the sample. Sampling techniques such as
simple random sampling, stratified sampling, cluster sampling and systematic
sampling are probability sampling.
B. Non-probability Sampling: in this case there is no way of estimating the probability that each
individual will be included in the sample. Quota sampling and judgmental sampling
are examples of non-probability sampling.
A. Probability Sampling
1. Simple Random Sampling is a method of probability sampling in which every unit in the
population has an equal nonzero chance of being selected (or being part of the sample). In other
words, each element of the population has an equal and independent chance of being included
in the sample. The probability is given by n/N, where n is the sample size and N is the
population size.
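As a minimal sketch of simple random sampling in Python, the standard library function `random.sample` draws without replacement so that every unit is equally likely to be chosen. The sizes N = 100 and n = 20 are borrowed from the lecturer example later in these notes; the seed and the numbering of the frame are assumptions.

```python
import random

random.seed(42)  # assumed seed, only to make the sketch repeatable

N, n = 100, 20                 # population size and sample size
frame = list(range(1, N + 1))  # sampling frame: units numbered 1..N (assumed numbering)

# Draw n distinct units; each unit has the same chance of inclusion.
sample = random.sample(frame, n)

# Probability that any given unit ends up in the sample.
inclusion_probability = n / N
print(sorted(sample), inclusion_probability)
```

Here the computer's pseudo-random generator plays the role that a lottery or a random number table plays when the selection is done by hand.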
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
How to Use Random Number Tables
a) Assign a unique number to each population element in the sampling frame. Start with
serial number 1, or 01, or 001, etc. depending on the number of digits required.
b) Choose a random starting position by closing your eyes (blind fold selection) and placing
your finger on a number in the table.
c) Select serial numbers across rows or down columns or diagonally from the starting point.
d) Discard numbers that are not assigned to any population element and ignore numbers that
have already been selected.
e) Repeat the selection process until the required number of sample elements is selected.
Example
A lecturer wants to randomly select 20 students from a class of 100 students. Here is how
he can do it using a random number table.
Step 1: Assign all the 100 members of the population a unique number. You may identify
each element by assigning a two-digit number. Assign 01 to the first name on the list, and
00 to the last name. If this is done, then the task of selecting the sample will be easier as
you would be able to use a 2-digit random number table.
…. …
Chala ………………… 08 Tsion……………… 61
… …
Genet ………………… 18 Tura …………….. 87
… …
Step 2: Select any starting point in the Random Number Table and find the first number that
corresponds to a number on the list of your population. In the example above, # 08 has
been chosen as the starting point and the first student chosen is Chala.
Starting point: move right to the end of the row, then down to the next row; move left to
the end, then down to the next row, and so on.
10 09 73 25 33 76
37 54 20 48 05 64
08 42 26 89 53 19
90 01 90 25 29 09
12 80 79 99 70 80
66 06 57 47 17 34
31 06 01 08 05 45
Step 3: Move to the next number, 42 and select the person (Temesgen) corresponding to that number
into the sample
Step 4: Continue to the next number that qualifies and select that person into the sample.
# 26 -- Lelise, followed by #89, #53 and #19
Step 5: After you have selected the student # 19, go to the next line and choose #90. Continue in the
same manner until the full sample is selected. If you encounter a number selected earlier (e.g.,
90, 06 in this example) simply skip over it and choose the next number.
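The selection steps can be sketched in Python using the excerpt of the random number table shown above. This is only an illustration under stated assumptions: the function name `select_from_table` is hypothetical, and every row is read from left to right, as Step 5 suggests by going from #19 to #90 (the alternating direction described beside the table would give a different order).

```python
# Excerpt of the two-digit random number table from the example.
TABLE = [
    ["10", "09", "73", "25", "33", "76"],
    ["37", "54", "20", "48", "05", "64"],
    ["08", "42", "26", "89", "53", "19"],
    ["90", "01", "90", "25", "29", "09"],
    ["12", "80", "79", "99", "70", "80"],
    ["66", "06", "57", "47", "17", "34"],
    ["31", "06", "01", "08", "05", "45"],
]

def select_from_table(table, start_row, start_col, sample_size, population_size=100):
    """Read numbers row by row from the start position, skipping repeats."""
    selected = []
    # Flatten the rows from the starting row onward, then drop cells before the start column.
    cells = [cell for row in table[start_row:] for cell in row][start_col:]
    for cell in cells:
        number = int(cell) or population_size  # "00" stands for the last element (100)
        if number > population_size or number in selected:
            continue  # discard unassigned numbers and numbers already selected
        selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# Start at #08 (third row, first column) and select 20 students.
sample = select_from_table(TABLE, start_row=2, start_col=0, sample_size=20)
print(sample)
```

The second 90 and the second 80 are skipped as repeats, so the twentieth student is reached at #47 in the sixth row of the excerpt.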
Stratified samples can be:
◼ Proportionate: involving the selection of sample elements from each stratum, such that the
ratio of sample elements to the total number of population elements (n/N) is constant/equal for
all strata.
◼ Disproportionate: the sample is disproportionate when the above mentioned ratio is
unequal.
Example: To select a proportionate stratified sample of 20 households from Addis Ababa
that belong to three income groups: low (50), middle (30) and high (20)
(N=50+30+20=100).
◼ Sub-divide the households into three homogeneous sub-groups or strata by income
group: low, middle and high.
◼ Calculate the overall sampling fraction, f, in the following manner: f = n/N = 20/100 = 0.2,
where n = sample size and N = population size. Then n1 = 0.2*50 = 10, n2 = 0.2*30 = 6 and
n3 = 0.2*20 = 4. Thus, n = n1 + n2 + n3 = 10 + 6 + 4 = 20.
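The proportionate allocation can be reproduced with a few lines of Python. The stratum sizes and the sample size come from the example; the dictionary layout and names are illustrative.

```python
# Stratum sizes from the example: households by income group.
strata = {"low": 50, "middle": 30, "high": 20}
n = 20                    # desired total sample size
N = sum(strata.values())  # population size

f = n / N  # overall sampling fraction

# Apply the same fraction to every stratum (rounded to whole households).
allocation = {name: round(f * size) for name, size in strata.items()}
print(f, allocation, sum(allocation.values()))
```

Because the same fraction f is applied to every stratum, each income group is represented in the sample in the same proportion as in the population.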
3. Systematic Sampling:
In systematic sampling only one random number is needed throughout the entire sampling process.
To use systematic sampling, a researcher needs:
[i] a sampling frame of the population;
The first element (number), which is between 1 and K, is determined using simple random
sampling and then the next items are selected using the skip interval K (K = N/n; in the
example below, K = 100/20 = 5). For instance, the jth unit is selected first and then the
(j + K)th, (j + 2K)th, etc., until the required sample size is obtained.
Example: if a lecturer wants to randomly select 20 students from a class of 100 students using
systematic sampling, she can take the first element between 1 and 5 using simple random sampling
and then select every 5th element starting from the initial number.
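A minimal Python sketch of this systematic selection follows. The sizes are those of the example; the seed and the numbering of students from 1 to 100 are assumptions, and the skip interval is taken as K = N/n.

```python
import random

random.seed(7)  # assumed seed, only to make the sketch repeatable

N, n = 100, 20  # class size and sample size from the example
K = N // n      # skip interval: 100 / 20 = 5

# The only random number needed: a starting point j between 1 and K.
j = random.randint(1, K)

# Select the j-th student, then every K-th student after that.
sample = [j + i * K for i in range(n)]
print(j, sample)
```

Once j is fixed the whole sample is determined, which is why only one random number is needed throughout the process.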
4. Cluster Sampling
Step 1: Determine the geographic area to be surveyed, and identify its subdivisions. Each
subdivision cluster should be highly similar to all others. For example, choose ten housing
blocks within 2 kilometers of the proposed site [say, Model Town] for your new retail
outlet; assign each a number.
Step 2: Decide on the use of one-stage or two-stage cluster sampling. Assume that you decide to
use two-stage cluster sampling.
Step 3: Using random numbers, select the housing blocks to be sampled. Here, you select 4
blocks randomly, say numbers #102, #104, #106, and #108.
Step 4: Using some probability method of sample selection, select the households in each of the
chosen housing block to be included in the sample. Identify a random starting point (say,
apartment no. 103).
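The two-stage procedure in Steps 1 to 4 can be sketched in Python as follows. Apart from the ten blocks and the four selected blocks, everything is hypothetical: the block numbers 101 to 110, the 30 apartments per block, the 5 households drawn per block and the seed are assumptions made only for illustration.

```python
import random

random.seed(3)  # assumed seed, only to make the sketch repeatable

# Step 1: ten housing blocks (clusters), each with its list of apartments.
# Block numbers and block sizes are hypothetical.
blocks = {block: list(range(1, 31)) for block in range(101, 111)}

# Step 3 (first stage): randomly select 4 of the 10 blocks.
chosen_blocks = random.sample(sorted(blocks), 4)

# Step 4 (second stage): within each chosen block, randomly select households.
households_per_block = 5  # assumed second-stage sample size
sample = {
    block: sorted(random.sample(blocks[block], households_per_block))
    for block in chosen_blocks
}
print(sample)
```

In one-stage cluster sampling the second stage would be dropped and every household in the four chosen blocks would be surveyed.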
Stratified Sampling vs Cluster Sampling
In multi-stage sampling: several levels of nested clusters are involved where sample units are
clusters at each stage except the final stage. It is known as 'multistage' because there are multiple
stages, or steps, to creating the sample. The first stage in multistage sampling is the same as cluster
sampling. Thus, it is a complex form of cluster sampling. It often includes both stratified and
cluster sampling techniques. E.g. cluster Ethiopia into regions, zones, Woredas, kebeles and finally
take households from sampled kebeles using SRS or stratified sampling.
Multi-phase sampling: is designed to make use of the information collected in one phase to
develop the sampling design for the next phase. For instance, in double phase sampling, the first
phase may consider the relationship between income and expenditure and, using the information
obtained in the first phase, the surveyed households are divided into groups based on income
levels (strata). Or it is sometimes convenient and economical to collect certain items of
information from all the units of a sample and other, usually more detailed, items of information
from a sub-sample of the units constituting the original sample. This may be termed two-phase
sampling. E.g. if the collection of information concerning a variate, y, is relatively expensive,
and there exists some other variate, x, correlated with it, which is relatively cheap to
investigate, it may be profitable to carry out sampling in two phases.
Convenience sampling
• Drawn at the convenience of the researcher. Common in exploratory research. Does
not support conclusions about the population.
Judgmental sampling
• Sampling based on some judgment, gut-feelings or experience of the researcher.
Common in commercial marketing research projects. If inference drawing is not
necessary, these samples are quite useful.
Quota sampling
• In this method, the decision maker requires the sample to contain a certain number
of items with a given characteristic. It is something like judgmental sampling.
Because quota sampling is non-probability sampling, individuals are selected to fill
each quota by judgment or convenience rather than by SRS.
Snowball sampling
• Used in studies involving respondents who are rare to find. To start with, the
researcher compiles a short list of sample units from various sources. Each of these
respondents is contacted to provide names of other probable respondents.
3. DATA COLLECTION AND PRESENTATION
Definition of Data: data refer to the numerical description of quantitative aspect of things. In other
words, data are the facts and figures that are collected, summarized, analyzed, and interpreted. The
singular form of data is datum.
Qualitative data are data that can be placed into distinct categories, according to some
characteristics or attributes. Qualitative data could be nominal or ordinal. Examples of qualitative
data are gender (Male and Female), income categories (low, middle, high), education level
(diploma, degree, masters, PhD), etc.
Quantitative data are numerical, can be ordered or ranked, and support meaningful
arithmetic operations. Examples of quantitative data are age, height, weight, income,
expenditure, profit, price, household size, inflation rate, interest rate, etc. Quantitative data can be
discrete or continuous. Discrete data assume values that can be counted; within a given interval
they assume only certain fixed values. Examples are household size, number of customers, number of
patients, etc. Continuous data can assume an infinite number of values between any two specific
values. They often include fractions and decimals. Examples are income, profit, price, inflation
rate, interest rate, etc.
Data can also be categorized based on type of measurement scale:
• Ordinal data: classify data into categories that can be ranked/ordered; however, precise
differences between the ranks do not exist. That is, allow us to rank/order the items we
measure in terms of which has less and which has more of the quality represented by the
variable, but still they do not allow us to say "how much more." Examples: quality of
instructors (superior, average, poor); grades (A, B, C, D, F); quality (excellent, very good,
good, poor); etc.
• Interval data: allow us not only to rank or order the items that are measured, but also to
quantify and compare the sizes of differences between them. In other words, the interval
level of measurement ranks data, and precise differences between units of measure do
exist; however, there is no meaningful zero (i.e., zero is arbitrarily located on the
interval scale). For example, year of birth and temperature as measured in degrees Fahrenheit
or Celsius are measured on an interval scale. For temperature, 0°F does not mean no
heat at all.
• Ratio data: possess all the properties of interval measurement, and there exists a true zero.
In addition, true ratios exist when the same variable is measured on two different members
of the population. Examples of ratio scales are those used to measure height, weight, area,
and number of phone calls received. Ratio scales have differences between units (1 cm, 1
kg, etc.) and a true zero. In addition, the ratio scale contains a true ratio between values.
For example, if one person can lift 100kg and another can lift 50 kg, then the ratio between
them is 2 to 1. Put another way, the first person can lift twice as much as the second person.
Note that most statistical data analysis procedures do not distinguish between the interval
and ratio properties of the measurement scales.
Summary
Measurement      Distinguished      Ranked/    True differences   True ratio and     Example
scale            by                 ordered    exist              true zero exist
Nominal scale    Their names only   No         No                 No                 Gender
Ordinal scale    Their names only   Yes        No                 No                 Grades
Interval scale   Their names only   Yes        Yes                No                 Year of birth
Ratio scale      Their names only   Yes        Yes                Yes                Age
Primary data are directly obtained from the respondents or from the collecting organization like
CSA (i.e., it is unprocessed or raw data)
Secondary data are processed data that are obtained from secondary sources including various
organizations, books, reports, journals, etc.
A number of methods can be used to gather data from the so called 'respondents'. Some of the
methods of data collection are:
i) Direct personal interview: as the name indicates, the respondents are asked directly.
Advantage: in-depth responses to questions can be obtained from the person being interviewed.
Disadvantage: it is costly to train enumerators and collect data.
ii) Mail questionnaire: data obtained via mail survey
Advantages: cover a wider geographic area than telephone surveys or personal interviews since
mailed questionnaire surveys are less expensive to conduct. Also, respondents can remain
anonymous if they desire. Disadvantages:
• Lack of mail address
• Misunderstanding of questions
• Low response rates
• Difficulty in reading and writing (literacy related problems)
iii) Telephone interviews
Telephone surveys have an advantage over personal interview surveys in that they are less costly. Also, people may be
more frank in their opinions since there is no face-to-face contact. A major drawback to the
telephone survey is that some people in the population will not have phones or will not answer
when the calls are made; hence, not all people have a chance of being surveyed.
iv) E-mail interviews: questionnaires are sent to respondents by e-mail and they give their responses.
Advantage: costs are nearly zero.
Other methods of data collection: direct personal observation, Focus Group Discussions (FGDs)
and Key Informant Interviews (KIIs), i.e. asking knowledgeable person(s). However, these are mainly
qualitative methods and are less suited to statistical analysis.
Once data are generated from a representative sample using appropriate data collection tools or
obtained from secondary sources, they must be presented in a meaningful manner. Presenting data
in tabular form has some advantages: it is simple, saves time and space, is easy to handle,
analyze and interpret, and facilitates comparison.
3.3.1. Tabular method of data presentation for ungrouped data: frequency distribution
Nominal data: Here the construction is straightforward: count the occurrences in each category
and find the totals. Example: The marital status of 60 adults classified as single, married, divorced
and widowed is given below:
Marital status Single Married Divorced Widowed Total
Frequency 25 20 8 7 60
Ordinal data: The construction is identical to the nominal case. However, the categories should
be put in an ordered manner. Example: Satisfaction on teaching method in a class size of 40.
Satisfaction   Very satisfied   Satisfied   Dissatisfied   Very dissatisfied   Total
Frequency      11               24          3              2                   40
Ratio data: suppose the raw data, which are in original form before statistical techniques are
applied, for the heights of 33 students are given as follows (in cm):
154 160 154 160 152 162 164 170 166
168 154 160 162 158 156 164 152
168 160 150 164 154 158 160 160
160 152 156 168 152 154 164 154
It is not easy to answer questions about the data directly from the above raw data. Thus, they must be
presented in such a way that it is possible to make meaningful analysis and interpretation from
such a dataset.
Definitions:
known as frequency distribution. That is, frequency distribution associates each value of
Xi to its corresponding frequency (fi).
iv. The relative frequency is the ratio of the frequency of a case to the total number of
observations (sample size = n). That is, $rf = \dfrac{f}{\sum_i f_i}$ or $rf = \dfrac{f}{n}$, where f is the number of
times a given element repeats itself (absolute frequency) and n is the number of observations.
Examples:
v. Percentage frequency-when the relative frequency is expressed in terms of percentage.
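The frequency, relative frequency and percentage frequency of each distinct value can be tabulated with a short Python sketch. The data are the 33 student heights listed above; the use of `collections.Counter` and the variable names are illustrative choices, not part of the notes.

```python
from collections import Counter

# Heights (in cm) of the 33 students, as listed in the raw data above.
heights = [
    154, 160, 154, 160, 152, 162, 164, 170, 166,
    168, 154, 160, 162, 158, 156, 164, 152,
    168, 160, 150, 164, 154, 158, 160, 160,
    160, 152, 156, 168, 152, 154, 164, 154,
]

n = len(heights)         # total number of observations
freq = Counter(heights)  # absolute frequency f of each distinct value

# One row per distinct value: value, f, relative frequency f/n, percentage frequency.
for value in sorted(freq):
    f = freq[value]
    rf = f / n
    print(value, f, round(rf, 3), round(100 * rf, 1))
```

The relative frequencies add up to 1 and the percentage frequencies to 100, which is a quick check that no observation has been missed.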
Relative Frequency Distribution
3.3.2. Tabular method of data presentation for grouped data: frequency distribution
Note, however, that when the number of possible values of our variable is very large the discrete
frequency distribution will no longer be a condensed presentation. Then the data have to be handled
as continuous and distributed into classes.
38-44 37.5-44.5 5
45-51 44.5-51.5 9
52-58 51.5-58.5 6
59-65 58.5-65.5 1
Cost ($)   Relative Frequency   Percent Frequency
50-59      .04                  4
60-69      .26                  26
Insights Gained from the Percent Frequency Distribution
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third) of the parts costs are in the $70-
79 class.
• 10% of the parts costs are $100 or more.
Some concepts and terminologies
• Class Frequency (or simply frequency): refers to the number of items that belong to a class
or number of observations in a particular class.
• Class limits (C.L.): the lowest and highest values that can be included in a class such that
there is gap between successive classes are called class limits. The lower class limit (L.C.L.)
of a class is a value such that no lower value can fall into that class, whereas the upper class
limit (U.C.L.) of a class is a value such that no higher value can fall into that class.
• Class Boundary (C.B) or Real class limits: class boundaries are the lowest and the highest
values in each class when there is no gap between successive classes. To work with the
distribution of a variable as if it was continuous, we make use of these real class limits (also
known as class boundaries).
Finding class boundaries
Let d =LCL of a class minus UCL of preceding/previous class. Add half of this difference to all
upper class limits to get the upper class boundary (UCB), and subtract it from all lower class limits
to get the lower class boundary (LCB). That is,
$UCB_i = UCL_i + \tfrac{1}{2}d$ and $LCB_i = LCL_i - \tfrac{1}{2}d$.
For instance, d = 31 - 30 = 1, so $\tfrac{1}{2}d = 0.5$. For the first class,
UCB = UCL + 0.5 = 30 + 0.5 = 30.5, and LCB = LCL - 0.5 = 24 - 0.5 = 23.5; continuing in such a way,
we get the class boundaries in the second column of the example in the above table.
The class width for a class in a frequency distribution is found by subtracting the lower (or upper)
class limit of one class from the lower (or upper) class limit of the next class. For example, the
class width of the above example is seven (7). That is, 31-24=7 or 37-30=7.
The class mark (CM) is the mid-point of the class interval, i.e. the value which lies midway between
the lower and upper limits of the class. It is obtained as: $CM = \dfrac{LCL + UCL}{2}$.
Note that: In further analysis of the data (measures of central tendencies, measures of variations,
etc.), a CM is used to represent all the items in that class.
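Class boundaries, class width and class marks can be computed together, as in the following Python sketch. The class limits 24-30, 31-37, ..., 59-65 are an assumption pieced together from the limits quoted in the text above (24, 30, 31, 37) and the table rows 38-44 to 59-65; the variable names are illustrative.

```python
# Class limits (LCL, UCL), assumed from the values quoted in the text and table above.
class_limits = [(24, 30), (31, 37), (38, 44), (45, 51), (52, 58), (59, 65)]

# d = LCL of a class minus UCL of the preceding class.
d = class_limits[1][0] - class_limits[0][1]

# Class boundaries: subtract d/2 from each lower limit, add d/2 to each upper limit.
boundaries = [(lcl - d / 2, ucl + d / 2) for lcl, ucl in class_limits]

# Class width: difference between the lower limits of two successive classes.
width = class_limits[1][0] - class_limits[0][0]

# Class mark: mid-point of the lower and upper class limits.
class_marks = [(lcl + ucl) / 2 for lcl, ucl in class_limits]

print(d, width)
print(boundaries)
print(class_marks)
```

Each upper boundary coincides with the lower boundary of the next class, so the classes leave no gaps once boundaries are used.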
1. Decide the number of classes (k): Select the number of classes desired, usually between 5 and
20, or use Sturges' rule k = 1 + 3.322 log n, where k is the number of classes desired, n is the total
number of observations and log is the base-10 logarithm. For example, if n = 10, k = 4.32 ≈ 4; if
n = 100, k = 7.644 ≈ 8; if n = 1000, k = 10.97 ≈ 11.
2. Compute the Range(R) = Maximum value- Minimum value
3. Determine the Class Width (w): If the number of classes is known and if it is decided to use a
uniform class width, we use $w = \dfrac{\text{Range}}{k}$, rounded up to the nearest integer, where Range is
the difference between the highest and the smallest value of the data.
Note: As far as possible, a class width of 5 or a multiple of 5 is convenient and facilitates
computations.
4. Determine the Class Limits: Pick a suitable starting point less than or equal to the minimum
value. The starting point is called the lower limit of the first class. Continue to add the class width
to this lower limit to get the rest of the lower limits.
5. Determine the Upper class limits: To find the upper limit of the first class, subtract 1 (one)
from the lower limit of the second class. Then continue to add the class width to this upper limit
to find the rest of the upper limits.
6. Determine the frequency of each class
Frequency of each class can be determined simply by counting the number of observations
belonging to each class.
7. Sum up the frequency of each class to check whether it is equal to the total number of data
collected from the field or not.
Example: Construct a continuous frequency distribution for the following raw data on marks (out
of 100) obtained by 50 students in Statistics.
57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54, 35, 48, 55, 70, 47, 39, 58, 52, 36,
65, 75, 26, 20, 55, 60, 83, 61, 45, 63, 49, 42, 35, 18, 51, 45, 42, 65, 39, 59, 45, 41, 30, 40.
Solution:
i. Since n = 50, using Sturges' rule, the number of classes is:
k = 1 + 3.322 log 50 = 6.64 ≈ 7. Thus, the number of classes is 7.
ii. Range = highest value – lowest value = 83 – 16 = 67.
iii. Class width = $w = \dfrac{\text{Range}}{k} = \dfrac{67}{7} = 9.57 \approx 10$.
iv. Since the smallest value is 16, the LCL1 can be 15 and the UCL1 should be 24; and the frequency
distribution would look like:
Marks     Frequency
15 - 24   3
25 - 34   4
35 - 44   10
45 - 54   15
55 - 64   12
65 - 74   4
75 - 84   2
Total     50
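The construction steps can be reproduced with a short Python sketch applied to the 50 marks of this example. It is a sketch under stated assumptions: k is rounded to the nearest integer, the class width is rounded up, and the starting lower limit of 15 is the choice made in the solution above; the variable names are illustrative.

```python
import math

# Marks (out of 100) of the 50 students, as listed in the example.
marks = [
    57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54, 35, 48, 55,
    70, 47, 39, 58, 52, 36, 65, 75, 26, 20, 55, 60, 83, 61, 45, 63, 49, 42, 35, 18,
    51, 45, 42, 65, 39, 59, 45, 41, 30, 40,
]

n = len(marks)
k = round(1 + 3.322 * math.log10(n))  # Sturges' rule, rounded to the nearest integer
data_range = max(marks) - min(marks)  # Range = maximum - minimum
w = math.ceil(data_range / k)         # class width, rounded up

first_lcl = 15  # starting point chosen in the solution (just below the minimum, 16)

# Class limits: add the width repeatedly; each UCL is one less than the next LCL.
classes = [(first_lcl + i * w, first_lcl + (i + 1) * w - 1) for i in range(k)]

# Frequency of each class: count the marks falling between its limits.
freqs = [sum(lcl <= m <= ucl for m in marks) for lcl, ucl in classes]

for (lcl, ucl), f in zip(classes, freqs):
    print(f"{lcl} - {ucl}: {f}")
print("Total:", sum(freqs))
```

The final total serves as the check in step 7: it should equal the number of observations collected.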
Note: 1. For the class boundaries, see the nature of the data.
• If there is no decimal point, then d = 1.
• If there is one digit after the decimal, then d = 0.1.
• If there are two digits after the decimal, then d = 0.01.
The cumulative frequency of a class tells us how often the values fall below or above that class.
Or as the name indicates it cumulates frequencies starting at the lowest or the highest class
boundary. There are two types of cumulative frequency distributions: the “less than” and the
“more than” cumulative frequency distributions.
i) The “less than” cumulative frequency distribution is obtained by adding the frequency of
all the preceding (previous or earlier) classes including the class against which it is written
or including the frequency of that class. In other words, it is obtained by adding
successively the frequencies of all the previous classes including the class under
consideration. The cumulation runs from the lowest class to the highest.
ii) The “more than” cumulative frequency distribution is obtained by adding the frequency
of the succeeding (later) classes including the frequency of that class. In other words, it is
obtained by finding the cumulate total of frequencies starting from the highest to the lowest
class. Example: consider the distribution of marks of 50 students:
Interpretation of less than cumulative frequency: for instance, 23 out of 50 students have marks
less than 49.5.
Interpretation of more than cumulative frequency: for instance, 27 out of 50 students have marks
49.5 or more.
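Both kinds of cumulative frequency can be computed with a short Python sketch. The class frequencies used here (2, 4, 7, 10, 16, 8, 3 for the classes 10-19 to 70-79) are an assumption: they were obtained by differencing the cumulative totals plotted on the less than ogive later in these notes, and they reproduce the two figures interpreted above.

```python
from itertools import accumulate

# Assumed classes 10-19, 20-29, ..., 70-79 with their class boundaries.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
upper_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [2, 4, 7, 10, 16, 8, 3]  # assumed class frequencies (see lead-in)

total = sum(freqs)

# "Less than" cumulative frequency: running total from the lowest class upward,
# read against the upper class boundaries.
less_than = list(accumulate(freqs))

# "More than" cumulative frequency: total of this class and all later classes,
# read against the lower class boundaries.
more_than = [total - c + f for c, f in zip(less_than, freqs)]

for ub, lt in zip(upper_boundaries, less_than):
    print(f"less than {ub}: {lt}")
for lb, mt in zip(lower_boundaries, more_than):
    print(f"{lb} or more: {mt}")
```

At the boundary 49.5 the two cumulative frequencies are 23 and 27, which together account for all 50 students.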
Relative frequencies for less than and more than cumulative frequencies
I. The histogram
The histogram is the set of adjoining vertical bars whose width is the class interval and whose
vertical heights represent the frequency of that class. On the x-axis, the class boundaries or limits
are presented and on the y-axis, we present the frequency of each class. That is, the histogram is a
graph consisting of a series of adjacent rectangles whose bases are equal to the class width of the
corresponding classes and whose heights are proportional to the corresponding class frequencies
with no gap between rectangles. In short, the histogram is a graph that displays the data by using
contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the
frequencies of the classes.
To construct a histogram, the class boundaries, the class marks or the class limits are plotted on
the horizontal axis and the class frequencies are plotted on the vertical axis. The absolute (actual)
frequencies may also be indicated on the top of each rectangle.
Consider the distribution for costs of customers on certain item:
A frequency polygon is a line graph where class frequencies are plotted against the class marks (or
mid points) and the successive points are connected by straight lines. As in the histogram, the
class marks are plotted along the X-axis and frequencies along the Y-axis. In other words, the
frequency polygon is a graph that displays the data by using lines that connect points plotted for
the frequencies at the midpoints of the classes. The frequencies are represented by the heights of
the points. Two classes with zero frequencies at both ends must be added to tie down the graph to
the X-axis. Example:
| Marks | Frequency |
|---|---|
| 15 – 24 | 3 |
| 25 – 34 | 4 |
| 35 – 44 | 10 |
| 45 – 54 | 15 |
| 55 – 64 | 12 |
| 65 – 74 | 4 |
| 75 – 84 | 2 |
| Total | 50 |

[Figure: Frequency Polygon. X-axis: marks (10 to 90); Y-axis: frequency (0 to 16).]
The ogive curve is a graph that represents the cumulative frequencies for the classes in a frequency
distribution. It is also known as the cumulative frequency curve, which is a graphical
representation of a cumulative frequency distribution. There are two types of Ogive curves: the
“less than Ogive” and the “or more Ogive”.
i. The “less than” Ogive – the less than cumulative frequencies are plotted against upper class
boundaries of their respective classes and they are joined by either straight lines or smooth
curves.
ii. The “more than” Ogive- in this case, the “ or more than” cumulative frequencies are plotted
against the lower class boundaries of their respective classes, and the connections may be by
straight lines or smooth curves.
Example :
[Figure: Less than cumulative frequency (ogive) curve. X-axis: upper class boundary (0 to 90); Y-axis: cumulative frequency (0 to 60). Plotted points: (9.5, 0), (19.5, 2), (29.5, 6), (39.5, 13), (49.5, 23), (59.5, 39), (69.5, 47), (79.5, 50).]
[Figure: More than cumulative frequency (ogive) curve. X-axis: lower class boundary (0 to 90); Y-axis: cumulative frequency (0 to 60). Plotted points: (9.5, 50), (19.5, 48), (29.5, 44), (39.5, 37), (49.5, 27), (59.5, 11), (69.5, 3), (79.5, 0).]
Note that the Ogives can help us in estimating the number of observations falling either above,
below or in between values.
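The two cumulative distributions can be rebuilt from class frequencies with a running sum. The following is a minimal sketch; the class frequencies are an assumption, recovered by differencing the cumulative values plotted on the ogives, and the variable names are illustrative only.

```python
from itertools import accumulate

# Class boundaries, and class frequencies recovered (as an assumption) by
# differencing the cumulative values plotted on the ogives.
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [2, 4, 7, 10, 16, 8, 3]   # one frequency per class between boundaries

n = sum(freqs)

# "Less than" counts: observations below each boundary (0 below the first).
less_than = [0] + list(accumulate(freqs))

# "Or more" counts: observations at or above each boundary.
more_than = [n - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:5.1f}   less than: {lt:2d}   or more: {mt:2d}")
```

At the boundary 49.5 this reproduces the interpretation given earlier: 23 students below and 27 at or above.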
This type of graph displays the items on the X-axis and their values on the Y-axis. It is
used to present discrete frequency distributions.
Example: consider the Ethiopian GDP at constant market prices for the last 40 years (1971-2011).
[Figure: Ethiopian GDP at constant price. Y-axis: GDP, from 0 to 180,000.]
Scatter plots show the relationship between two variables. For instance, income and expenditure
of households can be presented using a scatter plot.
2.4 Diagrammatical Presentation of Data
Diagrammatical presentation of data is usually used to present categorical data. The two most
commonly used charts for qualitative data are the bar diagram (bar chart) and the pie diagram
(pie chart).
a) Bar chart
A bar graph is a graphical device for depicting qualitative data. On the horizontal axis we specify
the labels that are used for each of the classes. A frequency, relative frequency, or percent
frequency scale can be used for the vertical axis. The bars are separated to emphasize the fact that
each class is a separate category. It is one of the simplest and most frequently used graphical
methods of data presentation. It uses a series of equally spaced bars of uniform width. The bases
of the bars are the categories on the horizontal axis, while the height of each bar represents the
absolute frequency of the particular category. Bar charts come in simple, multiple, component and
percentage component types.
[Figure: Bar graph of the performance ratings. X-axis: rating; Y-axis: frequency (0 to 10). Bar heights: Poor 2, Below average 3, Average 5, Above average 9, Excellent 1.]
The bar graph for the performance of students can be subdivided into components (say, by sex),
giving a multiple bar chart or a sub-divided (component) bar chart.

| Rating | Total | Male | Female |
|---|---|---|---|
| Poor | 2 | 1 | 1 |
| Below average | 3 | 2 | 1 |
| Average | 5 | 3 | 2 |
| Above average | 9 | 4 | 5 |
| Excellent | 1 | 0 | 1 |
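[Figure: Multiple bar chart of the ratings by sex. X-axis: rating; Y-axis: frequency (0 to 5). Male: Poor 1, Below average 2, Average 3, Above average 4, Excellent 0. Female: Poor 1, Below average 1, Average 2, Above average 5, Excellent 1.]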
b) Pie-chart (diagram)
Pie charts are widely used in practice to show percentage breakdowns of categorical data.
A pie chart represents a set of data by dividing a circle into sectors proportional to the number
of items in the categories. To construct a pie chart, we find the relative frequencies, compute
the central angles $\theta_i = \frac{f_i}{n} \times 360^\circ$, and finally draw a circle partitioned
according to the central angles.
Example:
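| Rating | Frequency | Relative frequency | % | Size of central angle (degrees) |
|---|---|---|---|---|
| Poor | 2 | 0.10 | 10 | 36 |
| Below average | 3 | 0.15 | 15 | 54 |
| Average | 5 | 0.25 | 25 | 90 |
| Above average | 9 | 0.45 | 45 | 162 |
| Excellent | 1 | 0.05 | 5 | 18 |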
[Figure: Pie chart of the ratings. Sector shares: Poor 10%, Below average 15%, Average 25%, Above average 45%, Excellent 5%.]
Pie-charts are useful to make comparisons as long as the number of components is not large
(usually, five or six components are used).
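The central angles in the pie-chart example can be checked with a few lines of Python. This is only an illustrative sketch; the dictionary `ratings` and the other names are assumptions introduced here, not part of the notes.

```python
# Frequencies of the five performance ratings from the pie-chart example.
ratings = {
    "Poor": 2,
    "Below average": 3,
    "Average": 5,
    "Above average": 9,
    "Excellent": 1,
}

n = sum(ratings.values())

# Central angle of each sector: (f_i / n) * 360 degrees.
# Multiplying before dividing keeps the arithmetic exact for these values.
angles = {label: f * 360 / n for label, f in ratings.items()}

for label, angle in angles.items():
    print(f"{label:14s} {angle:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the table.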
4. Measures of Central Tendency
4.1. Definition and objectives of the chapter
The most important aspect of studying the distribution of a sample of measurements is the position
of the central value, that is, a representative value about which the measurements are distributed.
This figure is known as the average of the group. If the numbers of the
group are arranged in order of magnitude, the averages tend to fall around the central position in
the group, so averages are called measures of central tendency. In short, any measure intended to
represent the center of a data set is called a measure of location or central tendency. Measures of
central tendency are statistical figures which indicate the average or the center of a given
distribution (data). They enable us to compare two or more series pertaining to the same time period
(e.g. students' results in a semester) or the same series through time (e.g. one's GPA
across semesters). The three most common measures of central tendency are the mean, the median, and
the mode.
The Summation Notation (∑)
Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$, where $n$
(the last subscript) denotes the number of observations in the data and $x_i$ is the $i$th
observation. Then the sum is:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
For instance, a data set consisting of six measurements 21, 13, 54, 46, 32 and 37 is represented by
$x_1, x_2, x_3, x_4, x_5$ and $x_6$, where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$.
Their sum becomes $\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203$.
Similarly, $x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$.
The arithmetic mean (A.M.) is the best known and most commonly used average. Sometimes it is
referred to simply as “the mean”. The A.M. can be defined for ungrouped data (with a single and/or
more frequencies).
a. A.M. for ungrouped data: when fi (the frequency) is 1 for each observation
Suppose that $x_1, x_2, \ldots, x_n$ are n observed values in a sample of size n from a population of
size N, n < N. Then the sample mean ($\bar{x}$, read as "x-bar") and the population mean ($\mu$, read
as “mu”) are defined as:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(for sample data)}$$
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \quad \text{(for population data)}$$
Recall that $\bar{x}$ is a statistic used to estimate $\mu$ (a parameter).
Example 1: Find the mean of the scores (out of 35) of a population of 12 students in a test:
20, 23, 26, 17, 30, 10, 8, 34, 25, 28, 29 and 14.
Solution: Since N = 12, and $x_1 = 20, x_2 = 23, \ldots, x_{12} = 14$, we have
$$\mu = \frac{\sum_{i=1}^{12} x_i}{12} = \frac{264}{12} = 22.$$
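As a quick check of Example 1, the population mean can be computed directly from the twelve scores. This is a minimal sketch; the variable names are illustrative.

```python
# Scores (out of 35) of the population of 12 students in Example 1.
scores = [20, 23, 26, 17, 30, 10, 8, 34, 25, 28, 29, 14]

N = len(scores)
total = sum(scores)   # sum of all x_i
mu = total / N        # population mean

print(total, mu)
```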
b. A.M. for ungrouped frequency distribution
If $X_1$ occurs $f_1$ times, $X_2$ occurs $f_2$ times, …, $X_n$ occurs $f_n$ times, then the A.M. will be
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}.$$
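| $f_i$ | 5 | 9 | 12 | 17 | 14 | 10 | 6 | 73 |
|---|---|---|---|---|---|---|---|---|
| $f_i x_i$ | 5 | 18 | 36 | 68 | 70 | 60 | 42 | 299 |

Therefore,
$$\bar{X} = \frac{\sum_{i=1}^{7} f_i X_i}{\sum_{i=1}^{7} f_i} = \frac{299}{73} \approx 4.10$$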
In the case of grouped continuous data, the mid-points (class marks) of the various classes are taken
as representatives of their respective classes to estimate the mean, on the assumption that the
values in a class are concentrated around its center. Therefore, for grouped continuous data with
m classes, where the ith class has frequency $f_i$ and class mark $X_i$, the mean is given by:
$$\bar{X} = \frac{\sum_{i=1}^{m} f_i X_i}{\sum_{i=1}^{m} f_i}$$
where $X_i$ = class mark (midpoint) of the ith class, and $f_i$ = frequency of the ith class.
Freq. 4 8 12 6 3 4 3
Solution: To calculate the mean, we have to construct the following table along with the class
marks for each class.
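| Class mark ($x_i$) or midpoint | 3 | 8 | 13 | 18 | 23 | 28 | 33 | Total |
|---|---|---|---|---|---|---|---|---|
| Frequency ($f_i$) | 4 | 8 | 12 | 6 | 3 | 4 | 3 | 40 |
| $f_i x_i$ | 12 | 64 | 156 | 108 | 69 | 112 | 99 | 620 |

Thus,
$$\bar{X} = \frac{\sum_{i=1}^{7} f_i X_i}{\sum_{i=1}^{7} f_i} = \frac{620}{40} = 15.5.$$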
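The grouped-data mean above can be reproduced with a short function. This is a sketch only; `grouped_mean` is a hypothetical helper name, and it takes the class marks and frequencies from the table.

```python
def grouped_mean(class_marks, freqs):
    """Mean of grouped data: sum of f_i * x_i divided by the sum of f_i."""
    weighted_total = sum(f * x for x, f in zip(class_marks, freqs))
    return weighted_total / sum(freqs)

class_marks = [3, 8, 13, 18, 23, 28, 33]   # midpoints of the classes
freqs = [4, 8, 12, 6, 3, 4, 3]             # class frequencies

print(grouped_mean(class_marks, freqs))
```

The same function covers an ungrouped frequency distribution if the distinct values are passed in place of the class marks.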
Finding missing value in the frequency distribution and correcting the wrong mean
Missing frequency (given the mean)
Exercises:
1. If the mean of the following data is 32, find the missing freq., f
| Marks of students | 0 – 9 | 10 – 19 | 20 – 29 | 30 – 39 | 40 – 49 | 50 – 59 |
|---|---|---|---|---|---|---|
| Number of students | 5 | 15 | 20 | f | 20 | 10 |
Answer: f=30
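One way to confirm the answer is to solve the mean equation for the unknown frequency. With S the sum of $f_i x_i$ over the known classes and N the sum of the known frequencies, mean = (S + f·x)/(N + f) gives f = (mean·N − S)/(x − mean). The sketch below assumes the class marks are the midpoints of the class limits; `missing_frequency` is a hypothetical helper name.

```python
def missing_frequency(target_mean, marks, freqs, missing_index):
    """Solve target_mean = (S + f*x) / (N + f) for the unknown frequency f."""
    known = [(x, f) for i, (x, f) in enumerate(zip(marks, freqs))
             if i != missing_index]
    S = sum(x * f for x, f in known)   # sum of f_i * x_i over the known classes
    N = sum(f for _, f in known)       # total of the known frequencies
    x_missing = marks[missing_index]
    return (target_mean * N - S) / (x_missing - target_mean)

# Class marks: midpoints of the class limits 0-9, 10-19, ..., 50-59 (assumed).
marks = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5]
freqs = [5, 15, 20, None, 20, 10]      # None marks the unknown frequency f

f = missing_frequency(32, marks, freqs, missing_index=3)
print(f)
```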
2.

| | Class A | Class B |
|---|---|---|
| n | 20 | 30 |
| $\bar{X}$ | 70 | 80 |
3. (i) A student made 65, 78, and 85 on three tests. What must she make on the fourth test
in order to have an average of 80?
$$80 = \frac{65 + 78 + 85 + X}{4}$$
$$320 = 228 + X \;\Rightarrow\; X = 320 - 228 = 92$$
(ii) A student had an average of 75 on four tests. If he makes 95 on the fifth test, what is
his average for the five tests?
$$\bar{X} = \frac{4 \times 75 + 95}{5} = 79$$
Wrong values
If a wrong figure has been used in calculating the mean, we can correct the mean if we know the
correct figure that should have been used. Let $X_w$ denote the wrong figure used in calculating the
mean and $X_c$ be the correct figure that should have been used. If $\bar{X}_w$ is the wrong mean
calculated using $X_w$, then the correct mean, $\bar{X}_c$, is given by
$$\bar{X}_c = \frac{n\bar{X}_w + X_c - X_w}{n}$$
Examples:
1. An average weekly wage of 25 workers of XYZ Company is Birr 378.4. It was later
discovered that one figure was entered wrongly as Birr 160 instead of the correct value of Birr
200. Find the correct mean.
2. Suppose the mean of 200 observations was 50. Later on it was discovered that two
observations were wrongly read as 92 and 8 instead of the correct values of 192 and 88.
Find the correct mean.
Solution
1. $$\bar{X}_c = \frac{n\bar{X}_w + X_c - X_w}{n} = \frac{25(378.4) + 200 - 160}{25} = 380$$
2. $$\bar{X}_c = \frac{n\bar{X}_w + X_{c1} + X_{c2} - X_{w1} - X_{w2}}{n} = \frac{200(50) + 192 + 88 - 92 - 8}{200} = 50.9$$
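Both corrections follow the same pattern: rebuild the wrong total, swap the wrong figures for the correct ones, and divide by n again. A minimal sketch, with `corrected_mean` as a hypothetical helper name:

```python
def corrected_mean(n, wrong_mean, wrong_values, correct_values):
    """Correct a mean that was computed with some wrongly entered figures."""
    wrong_total = n * wrong_mean
    correct_total = wrong_total - sum(wrong_values) + sum(correct_values)
    return correct_total / n

# Example 1: 25 workers, Birr 160 entered instead of Birr 200.
print(round(corrected_mean(25, 378.4, [160], [200]), 2))

# Example 2: 200 observations, 92 and 8 read instead of 192 and 88.
print(round(corrected_mean(200, 50, [92, 8], [192, 88]), 2))
```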
Properties of A.M
1. The algebraic sum of deviations from the mean is always zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$,
because $\sum_{i=1}^{n} x_i = n\bar{x}$.
2. The sum of squares of deviations from the mean is minimum. That is, $\sum_{i=1}^{n} (x_i - A)^2$ is
minimum when $A = \bar{x}$.
3. Let $y = a \pm bx$ be a linear function of $x$; then $\bar{y}$ is the same linear function of
$\bar{x}$, i.e. $\bar{y} = a \pm b\bar{x}$. That is, if the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$,
then the mean of $a \pm bx_1, a \pm bx_2, \ldots, a \pm bx_n$ is $a \pm b\bar{x}$.
| Merits of the A.M. | Demerits of the A.M. |
|---|---|
| Rigidly defined by a mathematical formula | Cannot be calculated if all the values are not given |
| Is based on all the observations | Cannot be used when there is an open-ended class |
b) Weighted Mean
While calculating the simple arithmetic mean, we gave equal importance to all values. But
there are cases where the relative importance is not the same for all items. When this is the case, it
is necessary to assign weights (i.e., relative importance) to the items and then calculate a weighted mean.
The weighted mean $\bar{x}_w$ of a set of values $x_1, x_2, \ldots, x_n$ whose relative importances
(weights) are $w_1, w_2, \ldots, w_n$ is given by
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Examples:
1. A typical example of the weighted mean is computation of your GPA. Suppose that a student
was registered for five courses with 4, 4, 3, 2 and 3 credit hours and she obtained grades B, A,
C, D and A, respectively. Find her GPA.
Solution:
Here, the numerical equivalences of the grades are the values and the corresponding credit hours are
their respective weights. So, we have
| Values ($x_i$) | 3 | 4 | 2 | 1 | 4 | Total |
|---|---|---|---|---|---|---|
| Weights ($w_i$) | 4 | 4 | 3 | 2 | 3 | 16 |
| $w_i x_i$ | 12 | 16 | 6 | 2 | 12 | 48 |

and GPA = $\frac{48}{16} = 3.0$.
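The GPA computation is a direct application of the weighted mean. The sketch below assumes the grade-point scale implied by the table (A = 4, B = 3, C = 2, D = 1); `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Grade-point scale implied by the table (an assumption of this sketch).
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}

grades = ["B", "A", "C", "D", "A"]
credit_hours = [4, 4, 3, 2, 3]

values = [grade_points[g] for g in grades]
gpa = weighted_mean(values, credit_hours)
print(gpa)
```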
2. In a certain English class, quizzes make up 15% of the final average, major tests make up 35%,
papers make up 20%, and the final exam makes up 30%. If a student has an average of 90 on
quizzes, 80 on major tests, 75 on papers, and 85 on the final exam, what is his final average?
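Solution: Here the component averages are the values and the percentages are their weights, so
$$\bar{x}_w = \frac{0.15(90) + 0.35(80) + 0.20(75) + 0.30(85)}{0.15 + 0.35 + 0.20 + 0.30} = \frac{82}{1} = 82.$$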
The combined mean for a combined set of data may be obtained from the separate means if their
sample or population sizes are given. Suppose that the combined set was formed by combining two
sets of data with means $\bar{x}_1$ and $\bar{x}_2$ and with $n_1$ and $n_2$ observations, respectively.
Then, the combined mean or grand mean is given by:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}.$$
Examples:
1. If the mean weight of 50 women working in a factory is 48 kg, the mean weight of the men is
58 kg, and the total number of workers in the factory is 125, find the mean
weight of all workers in the factory.
2. Suppose the mean mark in statistics of the 100 students of a class is 72. The mean mark of the
boys is 75. Find the mean mark of the girls in the class if the number of boys is 70.
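Solution: Let $n_1 = 70$ be the number of boys, so that the number of girls is $n_2 = 100 - 70 = 30$.
$$\bar{X}_c = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$$
$$72 = \frac{70(75) + 30\bar{X}_2}{70 + 30}$$
$$72(100) = 5250 + 30\bar{X}_2 \;\Rightarrow\; \bar{X}_2 = 65$$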
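Both combined-mean examples can be checked numerically. In this sketch `combined_mean` is a hypothetical helper; the factory figure is computed here from the stated data rather than taken from the notes.

```python
def combined_mean(sizes, means):
    """Grand mean of several groups: sum of n_i * mean_i over the sum of n_i."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Factory example: 50 women (mean 48 kg) and 125 - 50 = 75 men (mean 58 kg).
factory_mean = combined_mean([50, 125 - 50], [48, 58])
print(factory_mean)

# Class example: solve 72 = (70 * 75 + 30 * girls_mean) / 100 for girls_mean.
girls_mean = (72 * 100 - 70 * 75) / (100 - 70)
print(girls_mean)
```

Feeding the solved value back into `combined_mean([70, 30], [75, girls_mean])` returns the class mean of 72.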
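In general, if there are k different groups of data with $n_1, n_2, \ldots, n_k$ observations and
means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, respectively, the combined mean is
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}.$$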
The geometric mean (G.M.) of n positive numbers $x_1, x_2, \ldots, x_n$ is the nth root of their
product. It is used in averaging ratios, rates of change (growth rates), economic indices (like the
consumer price index), compound interest, discounting, etc.
$$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}.$$
4. An economy has grown for the last five consecutive years as 5%, 6%, 4.5%, 3% and
7.5%.What was the average growth rate during these years?
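A sketch of how this question can be worked with the geometric mean. It assumes the usual convention of averaging the growth factors (1 + rate) rather than the percentage rates themselves; that convention is an assumption here, since the notes do not show the solution.

```python
import math

# Annual growth rates of the economy over the five years.
rates = [0.05, 0.06, 0.045, 0.03, 0.075]

# Convert each rate to a growth factor (1 + rate); this is the assumed convention.
factors = [1 + r for r in rates]

# Geometric mean of the factors: nth root of their product.
gm_factor = math.prod(factors) ** (1 / len(factors))

avg_growth = gm_factor - 1
print(f"average growth rate: {avg_growth:.2%}")
```

The result is slightly below the arithmetic mean of the five rates (5.2%), as expected for a geometric mean.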
For grouped data, it is given by $\log G.M. = \frac{1}{n} \sum_{i} f_i \log x_i$, where $n = \sum_i f_i$.
The harmonic mean (H.M.) of n numbers $x_1, x_2, \ldots, x_n$ is defined as n divided by the sum of
the reciprocals of the n numbers. It is appropriate for situations when the average of rates is desired
(e.g., it gives the average speed of a trip over a route divided into segments of equal distance, each
covered at a constant speed). For example, if one travels half-way to a destination at 20 mi/hr and
then goes 60 mi/hr for the second half of the distance, the average speed is 30 mi/hr.
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \quad \text{(for ungrouped data)}$$
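Example: 1. The H.M. of 6 and 4 is $\frac{2}{\frac{1}{6} + \frac{1}{4}} = \frac{24}{5}$.
2. The H.M. of the first six natural numbers is
$\frac{6}{1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}} = \frac{360}{147}$.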
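$$H.M. = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_m}{x_m}} = \frac{n}{\sum_{i} \frac{f_i}{x_i}} \quad \text{(for grouped data, where } n = \sum_i f_i \text{)}$$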
Example
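| CB | $f_i$ | $x_i$ | $x_i f_i$ | $\log x_i$ | $\frac{1}{n} f_i \log x_i$ | $f_i / x_i$ |
|---|---|---|---|---|---|---|
| 2-4 | 20 | 3 | 60 | 0.5 | 0.095 | 6.7 |
| 4-6 | 40 | 5 | 200 | 0.7 | 0.28 | 8.0 |
| 6-8 | 30 | 7 | 210 | 0.8 | 0.254 | 4.3 |
| 8-10 | 10 | 9 | 90 | 1.0 | 0.095 | 1.1 |
| Sum | 100 | | 560 | | 0.724 | 20.1 |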
$$\bar{X} = \frac{560}{100} = 5.6 \quad \text{(A.M.)}$$
$$G.M. = \text{antilog}\left(\frac{1}{n} \sum_{i} f_i \log x_i\right) = \text{antilog}\left(\frac{72.4}{100}\right) = 10^{0.724} \approx 5.30$$
$$H.M. = \frac{n}{\sum_{i} \frac{f_i}{x_i}} = \frac{100}{20.1} = 4.98$$
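The three averages for this grouped example can be computed side by side. This is a minimal sketch using unrounded logarithms, so the last digits may differ slightly from the hand calculation with rounded table entries.

```python
import math

class_marks = [3, 5, 7, 9]     # midpoints of the classes 2-4, 4-6, 6-8, 8-10
freqs = [20, 40, 30, 10]
n = sum(freqs)

# Arithmetic mean: sum of f_i * x_i over n.
am = sum(f * x for x, f in zip(class_marks, freqs)) / n

# Geometric mean: antilog of (1/n) * sum of f_i * log10(x_i).
gm = 10 ** (sum(f * math.log10(x) for x, f in zip(class_marks, freqs)) / n)

# Harmonic mean: n over the sum of f_i / x_i.
hm = n / sum(f / x for x, f in zip(class_marks, freqs))

print(round(am, 2), round(gm, 2), round(hm, 2))
```

For positive data that are not all equal, the ordering H.M. < G.M. < A.M. always holds, which gives a quick sanity check on such calculations.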
The Median
It has been pointed out that whenever there is a frequency distribution with open-ended intervals, the
arithmetic mean cannot be calculated. Also, the mean is greatly affected by extremely large or small
values. Hence, in such cases, the mean cannot be a good representative. Instead, other measures are
used to describe the data. In this section, we will discuss the most popular measure of position, the
median, and other related measures known as quantiles. Positional measures are so called because they
are determined by their positions in the ordered data.
The median is, as its name indicates, the middlemost value when the data are arranged in ascending or
descending order of magnitude; it divides the data into two equal parts. It is the value which
exceeds, and is exceeded by, an equal number (i.e., half) of the observations. That is, the median is
found by first arranging the data in increasing or decreasing order of magnitude. We can consider the
following three cases in finding the median:
Case 1: Ungrouped data (with frequency=1 for all xi): also known as individual series. In such
cases, the median is the value of the middle term when the data are arranged in order of magnitude.
When the number of observations is odd, there will always be a single value in the middle of the
arrayed data. When n is even, however, there will be two middle observations, and the median is the
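mean of these two values. Let $x_1, x_2, \ldots, x_n$ be n ordered observations. Then, the median value is
a) the $\left(\frac{n+1}{2}\right)^{\text{th}}$ value, if n is odd, and
b) the mean of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2} + 1\right)^{\text{th}}$ values, if n is even.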
Example 1: Find the median for the following data: 5, 2, 7, 1, 9, 10, 12 (n = 7).
Solution: The arrayed data are 1, 2, 5, 7, 9, 10, 12. Since n = 7 is odd, the median is the
$\left(\frac{7+1}{2}\right)^{\text{th}}$ = 4th item in the array, i.e. median = 7.
Example 2: Find the median value of the population figures (in thousands) of 10 cities: 2000,
1180, 1785, 1500, 560, 782, 1200, 385, 1123, 222.
Solution: The arrayed data are 222, 385, 560, 782, 1123, 1180, 1200, 1500, 1785, 2000. Since
n = 10 is even, the median is the mean of the 5th and 6th values; i.e.,
Median = $\frac{1123 + 1180}{2} = 1151.5$ thousands.
Case 2: Ungrouped data (with frequency=fi for xi): In this case also, the median is obtained by
the same formula. Only one more step, finding the less than cumulative frequencies, is added,
because cumulative frequency distribution is itself an arrangement of values in an order:
• Find the less than cumulative frequencies.
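• Look at the cumulative frequencies and find the one which is either equal to or next higher than
$\frac{n+1}{2}$, the position of the middle observation, when n is odd; when n is even, take the
average of the two middle values. The value of $x_i$ corresponding to that cumulative frequency is
the median.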
| $x_i$ | $f_i$ | Cum. freq. (less than) |
|---|---|---|
| 3 | 4 | 4 |
| 5 | 4 | 8 |
| 6 | 7 | 15 |
| 8 | 9 | 24 |
| 10 | 5 | 29 |
$\left(\frac{n+1}{2}\right)^{\text{th}}$
Example 4: Suppose that the frequencies of the above example are 4, 6, 5, 9, 6, respectively. Find
the median.
Solution: The cumulative frequencies are:
| $x_i$ | $f_i$ | Cum. freq. (less than) |
|---|---|---|
| 3 | 4 | 4 |
| 5 | 6 | 10 |
| 6 | 5 | 15 |
| 8 | 9 | 24 |
| 10 | 6 | 30 |

Here, n = 30, so the median is the 15.5th observation, that is, the average of the 15th and 16th
observations: (6 + 8)/2 = 7.
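The cumulative-frequency lookup can be sketched as follows. The helper names `value_at` and `median_from_table` are hypothetical; the lookup returns the value whose less-than cumulative frequency first reaches the requested position.

```python
from bisect import bisect_left
from itertools import accumulate

def value_at(xs, freqs, position):
    """Value whose less-than cumulative frequency first reaches `position`."""
    cum = list(accumulate(freqs))
    return xs[bisect_left(cum, position)]

def median_from_table(xs, freqs):
    """Median of an ungrouped frequency distribution (xs must be sorted)."""
    n = sum(freqs)
    if n % 2 == 1:
        return value_at(xs, freqs, (n + 1) // 2)
    lower = value_at(xs, freqs, n // 2)
    upper = value_at(xs, freqs, n // 2 + 1)
    return (lower + upper) / 2

xs = [3, 5, 6, 8, 10]
print(median_from_table(xs, [4, 6, 5, 9, 6]))   # Example 4, n = 30
```

With the frequencies 4, 4, 7, 9, 5 (n = 29) the same function looks up the 15th observation.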
CF = the cumulative frequency corresponding to the class preceding the median class, that is, the
sum of the frequencies of all classes lower than the median class. The median class is the class
which contains the $\left(\frac{n}{2}\right)^{\text{th}}$ observation, whether n is odd or even, since
the items have already lost their individual identity once they are grouped into continuous classes.
Formula, we get:
$$\tilde{x} = 59.5 + \frac{10\,(35.5 - 23)}{27} = 64.13.$$
100 – 109 22
110 – 119 19
> 119 7
$$\tilde{x} = 99.5 + \frac{(32.5 - 17)\,10}{22} = 99.5 + 7.05 = 106.55.$$
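The two worked results above follow the interpolation pattern L + (w / f)(n/2 − CF). The sketch below encodes that pattern; `grouped_median` is a hypothetical helper, and the totals n = 71 and n = 65 are inferred from the values n/2 = 35.5 and n/2 = 32.5 used in the calculations.

```python
def grouped_median(L, w, f, n, CF):
    """Interpolated median for grouped continuous data.

    L  : lower class boundary of the median class
    w  : class width
    f  : frequency of the median class
    n  : total number of observations
    CF : cumulative frequency of the class preceding the median class
    """
    return L + (w / f) * (n / 2 - CF)

print(round(grouped_median(59.5, 10, 27, 71, 23), 2))
print(round(grouped_median(99.5, 10, 22, 65, 17), 2))
```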
| Merits of the Median | Demerits of the Median |
|---|---|
| Easy to calculate and understand | Is based only on the middle item |
| Rigidly defined by a mathematical formula | Is relatively less stable than the mean |
| Can be computed when there is an open-ended class | Does not consider all the items |
| Not affected by extreme values | Not suitable for further mathematical treatment |
(or upper) quartiles, respectively. The first quartile ($Q_1$) is the value for which one-fourth of the
data lie below it and the remaining three-fourths lie above it. The second quartile ($Q_2$) is the value
for which half of the data lie below it and half above it. Since this is the property of the median,
$Q_2$ is equal to the median. The third quartile ($Q_3$) is the value for which three-fourths of the
arranged items lie below it, or one-fourth lie above it.
Like the median, we can identify three cases in locating the quartiles:
Case 1: Quartiles in ungrouped data. In this case, we proceed as follows: Let $x_1, x_2, \ldots, x_n$ be
n ordered observations. The ith quartile ($Q_i$) is the value of the item in the
$\left[\frac{i}{4}(n + 1)\right]^{\text{th}}$ position, for i = 1, 2, 3.
That is, after arranging the data in ascending order, $Q_1$, $Q_2$ and $Q_3$ are, respectively, the
$\left[\frac{1}{4}(n + 1)\right]^{\text{th}}$, $\left[\frac{2}{4}(n + 1)\right]^{\text{th}}$ and
$\left[\frac{3}{4}(n + 1)\right]^{\text{th}}$ values.
Example 1: Find Q1 and Q3 for 1080, 1100, 1120, 1150, 1160, 1200, 1400.
Solution: The data is already arranged: 1080, 1100, 1120, 1150, 1160, 1200, and 1400.
Then, $Q_1 = \left[\frac{1}{4}(n + 1)\right]^{\text{th}} = \left[\frac{1}{4}(7 + 1)\right]^{\text{th}}$ = 2nd value = 1100; and
$Q_3 = \left[\frac{3}{4}(n + 1)\right]^{\text{th}} = \left[\frac{3}{4}(7 + 1)\right]^{\text{th}}$ = 6th value = 1200.
Note that (when n = 7) the median is the 4th value, 1150; $Q_1$ is the median of the values below the
median and $Q_3$ is the median of the values above the median.
To find $Q_1$, compute $\frac{1}{4}(n + 1)$ and find the minimum cumulative frequency greater than or
equal to this value. Then, the value corresponding to this cumulative frequency is the value of $Q_1$.
To find $Q_3$, compute $\frac{3}{4}(n + 1)$ and locate it in a similar fashion.
Example 2: Find the median and the two quartiles from the following table.
xi 5 7 9 11 13 15 17 19
fi 1 2 7 9 11 8 5 4
Solution: We first find the less than cum. freq. and proceed as discussed above.
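| $x_i$ | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 |
|---|---|---|---|---|---|---|---|---|
| $f_i$ | 1 | 2 | 7 | 9 | 11 | 8 | 5 | 4 |
| Cum. freq. (less than) | 1 | 3 | 10 | 19 | 30 | 38 | 43 | 49 |

We see that since n = 49,
$Q_1 = \left[\frac{1}{4}(49 + 1)\right]^{\text{th}}$ = 12.5th value = 11; $Q_2 = \left[\frac{2}{4}(49 + 1)\right]^{\text{th}}$ = 25th value = 13; and
$Q_3 = \left[\frac{3}{4}(49 + 1)\right]^{\text{th}}$ = 37.5th value = 15.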
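The lookup rule used in this example (smallest cumulative frequency greater than or equal to the position) can be sketched in a few lines. The function name `quantile_from_table` and its `parts` argument are assumptions of this sketch; `parts=4` gives quartiles.

```python
from bisect import bisect_left
from itertools import accumulate

def quantile_from_table(xs, freqs, k, parts=4):
    """k-th quantile of an ungrouped frequency distribution (xs sorted).

    Uses the position k * (n + 1) / parts and returns the value whose
    less-than cumulative frequency is the smallest one >= that position.
    """
    cum = list(accumulate(freqs))
    n = cum[-1]
    position = k * (n + 1) / parts
    return xs[bisect_left(cum, position)]

xs = [5, 7, 9, 11, 13, 15, 17, 19]
freqs = [1, 2, 7, 9, 11, 8, 5, 4]

for k in (1, 2, 3):
    print(f"Q{k} =", quantile_from_table(xs, freqs, k))
```

Raw ordered data can be handled by the same function by giving every value a frequency of 1.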
Case 3: Quartiles in grouped continuous data.
For continuous grouped data, like the median, the exact quartiles cannot be obtained and thus the
following interpolation formula gives approximate values:
$$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{i\,n}{4} - CF\right)$$
where i = 1, 2, 3, and $L$, $w$, $f_{Q_i}$ and $CF$ are defined in the same way as for the median. The
class in question is the one including the $\left(\frac{i}{4}n\right)^{\text{th}}$ value. That is, the
class with the minimum cumulative frequency greater than or equal to $\frac{i}{4}n$ is the class of
the ith quartile.
Example 3: Calculate the first and the third quartiles for the following data.
| Grade | 40 – 49 | 50 – 59 | 60 – 69 | 70 – 79 | 80 – 89 | 90 – 99 |
|---|---|---|---|---|---|---|
| Frequency | 5 | 18 | 27 | 15 | 6 | 3 |
Solution: The first step is to construct the cumulative frequency distribution.
| Class boundary | Frequency | Cum. freq. (less than) |
|---|---|---|
| 39.5 - 49.5 | 5 | 5 |
| 49.5 - 59.5 | 18 | 23 |
| 59.5 - 69.5 | 27 | 50 |
| 69.5 - 79.5 | 15 | 65 |
| 79.5 - 89.5 | 6 | 71 |
| 89.5 - 99.5 | 3 | 74 |
To find $Q_1$: find its class, determine $L$, $w$, $f_{Q_1}$ and $CF$, and apply the formula.
Since $\frac{n}{4} = 18.5$, the minimum cumulative frequency greater than or equal to 18.5 is 23. So, the
class corresponding to this cumulative frequency is the second class. For this class of $Q_1$,
L = 49.5, w = 10, $f_{Q_1}$ = 18, and CF = 5.
Now applying Formula 2.17,
$$Q_1 = 49.5 + \frac{(18.5 - 5)\,10}{18} = 49.5 + 7.5 = 57.$$
To calculate $Q_3$, following similar procedures, $\frac{3n}{4} = 55.5$, so it is in the 4th class, and
L = 69.5, w = 10, $f_{Q_3}$ = 15, and CF = 50. Hence
$$Q_3 = 69.5 + \frac{(55.5 - 50)\,10}{15} \approx 73.17.$$
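The class search and the interpolation can be combined in one routine. This is a sketch under the assumption of equal class widths; `grouped_quantile` is a hypothetical name, and `parts=4` selects quartiles.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, width, freqs, k, parts=4):
    """Interpolated k-th quantile for grouped continuous data (equal widths assumed)."""
    n = sum(freqs)
    target = k * n / parts                    # position (k / parts) * n
    cum = list(accumulate(freqs))
    # Quantile class: first class whose cumulative frequency reaches the target.
    idx = next(i for i, c in enumerate(cum) if c >= target)
    CF = cum[idx - 1] if idx > 0 else 0       # cumulative frequency before that class
    return lower_boundaries[idx] + (width / freqs[idx]) * (target - CF)

lower_boundaries = [39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [5, 18, 27, 15, 6, 3]

print(round(grouped_quantile(lower_boundaries, 10, freqs, 1), 2))
print(round(grouped_quantile(lower_boundaries, 10, freqs, 3), 2))
```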
2. Deciles:
The deciles divide a given set of data into ten equal parts. There are nine deciles, usually denoted by
$D_1, D_2, \ldots, D_9$; generally, $D_k$ is used to denote the kth decile. The first decile ($D_1$) is
the value for which (1/10)th of the arranged data lie below it, the second decile ($D_2$) is the value
for which (2/10)th of the items lie below it, and so on.
When $x_1, x_2, \ldots, x_n$ are n ordered observations, the kth decile ($D_k$) is the value
corresponding to the $\left[\frac{k}{10}(n + 1)\right]^{\text{th}}$ position, k = 1, 2, …, 9. That is,
after arranging the data in increasing order,
$D_1 = \left[\frac{1}{10}(n + 1)\right]^{\text{th}}$ value, $D_2 = \left[\frac{2}{10}(n + 1)\right]^{\text{th}}$ value, …, $D_9 = \left[\frac{9}{10}(n + 1)\right]^{\text{th}}$ value.
Example 1: Find D 4 and D 6 for the following set of numbers (n = 19).
2, 9, 7, 6, 1, 5, 8, 10, 6, 11, 6, 14, 7, 1, 2, 7, 8, 9,7.
Solution: After arranging the numbers in ascending order of magnitude, we get: 1, 2, 2, 5, 6, 6, 6,
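Example 2: For the frequency distribution below (the same data as in the quartile example above),
calculate $D_3$ and $D_8$.

| $x_i$ | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 |
|---|---|---|---|---|---|---|---|---|
| $f_i$ | 1 | 2 | 7 | 9 | 11 | 8 | 5 | 4 |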
Solution: The cumulative frequency for the problem given will be our reference.
To find $D_3$: $\frac{3}{10}(n + 1) = 15$, the minimum cumulative frequency greater than or equal to 15
is 19, and the corresponding value is 11 = $D_3$.
To find $D_8$: $\frac{8}{10}(n + 1) = 40$, the minimum cumulative frequency greater than or equal to 40
is 43, and the corresponding value is 17 = $D_8$.
Example 3: Find D 4 and D 7 for the F.D of marks of 74 students given below.
Solution: The less than cumulative frequency with the class boundaries is:

| Class boundaries | Frequency | Cum. freq. (less than) |
|---|---|---|
| 39.5 – 44.5 | 2 | 2 |
| 44.5 – 49.5 | 3 | 5 |
| 49.5 – 54.5 | 8 | 13 |
| 54.5 – 59.5 | 10 | 23 |
| 59.5 – 64.5 | 15 | 38 |
| 64.5 – 69.5 | 12 | 50 |
| 69.5 – 74.5 | 14 | 64 |
| 74.5 | 10 | 74 |

To find $D_4$: n = 74 and $\frac{4(74)}{10} = 29.6 \le 38$, which shows that $D_4$ is found in the 5th
class, where L = 59.5, w = 5,
Percentiles
Generally, $p_m$ is used to denote the mth percentile.
minimum cumulative frequency greater than or equal to this value. Then, the value associated with
this cumulative frequency is $p_m$. Exercise: find $p_{40}$ and $p_{70}$ for the data in Example 2 above.
Percentiles for grouped continuous data
THE MODE
The mode of a set of data is defined as the value with the highest frequency, provided that it
occurs more than once.
Mode for ungrouped data
The mode or the modal value of a raw data is simply obtained by locating the observation with the
maximum frequency (if there exists such a value).
Example 1: Find the mode(s) for each of the following sets of data:
a) 3, 5, 5, 4, 6, 5, 4, 7, 8, 7.
b) 3, 8, 7, 8, 4, 7, 8, 7, 4, 5.
c) 3, 4, 6, 8, 7, 2, 1, 9, 5, 12.
Solution: In order to detect the mode(s), it is better if we collect values of the same magnitude
together.
a) 3, 4, 4, 5, 5, 5, 6, 7, 7, 8. The modal value is 5.
b) 3, 4, 4, 5, 7, 7, 7, 8, 8, 8. The modes are 7 and 8.
c) 1, 2, 3, 4, 5, 6, 7, 8, 9, 12. The mode is undefined.
From these examples, we note that a given set of data may have a single mode (called uni-modal),
it may have two modes (called bimodal data), or it may have no mode at all (it does not exist or the
mode is undefined). Generally, if a set of data has two or more modes, then it is said to be
multimodal (ill-defined).
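The three cases (one mode, several modes, no mode) can be captured by counting occurrences. A minimal sketch; the function name `modes` is illustrative, and an empty list is used here to signal that the mode is undefined.

```python
from collections import Counter

def modes(values):
    """All values with the highest frequency, provided that frequency exceeds 1."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 5, 5, 4, 6, 5, 4, 7, 8, 7]))    # set a)
print(modes([3, 8, 7, 8, 4, 7, 8, 7, 4, 5]))    # set b)
print(modes([3, 4, 6, 8, 7, 2, 1, 9, 5, 12]))   # set c)
```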
Mode for grouped discrete data
In the case of discrete grouped data, the mode is determined just by inspection, i.e., by looking for
the value(s) having the highest frequency.
Mode for grouped continuous data
In such cases, one can only determine the modal class easily: the class with the highest frequency.
After locating this class, the mode is interpolated using:
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w = L + \frac{f_{mod} - f_1}{(f_{mod} - f_1) + (f_{mod} - f_2)}\, w$$
where $L$ = the lower class boundary of the modal class;
$\Delta_1 = f_{mod} - f_1$,
$\Delta_2 = f_{mod} - f_2$,
$w$ = the common class width;
$f_1$ = frequency of the class preceding the modal class;
$f_2$ = frequency of the class succeeding the modal class; and
$f_{mod}$ = frequency of the modal class.
Example 2: Calculate the modal age for the age distribution of 228 housewives.
| Age (in years) | Number of women |
|---|---|
| 15 – 19 | 6 |
| 20 – 24 | 19 |
| 25 – 29 | 50 |
| 30 – 34 | 57 |
| 35 – 39 | 48 |
| 40 – 44 | 27 |
| 45 – 49 | 21 |
| Total | 228 |

Solution: By inspection, the mode lies in the fourth class, where L = 29.5, $f_{mod}$ = 57, $f_1$ = 50,
$f_2$ = 48, w = 5, and $\Delta_1 = 57 - 50 = 7$, $\Delta_2 = 57 - 48 = 9$.
Therefore, $\hat{x} = 29.5 + \frac{7}{7 + 9} \times 5 = 29.5 + 2.2 = 31.7$ years.
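The modal-age calculation can be checked with a direct transcription of the interpolation formula. `grouped_mode` is a hypothetical helper name used only for this sketch.

```python
def grouped_mode(L, w, f_mod, f1, f2):
    """Interpolated mode for grouped continuous data.

    L     : lower class boundary of the modal class
    w     : common class width
    f_mod : frequency of the modal class
    f1    : frequency of the class preceding the modal class
    f2    : frequency of the class succeeding the modal class
    """
    d1 = f_mod - f1
    d2 = f_mod - f2
    return L + d1 / (d1 + d2) * w

modal_age = grouped_mode(L=29.5, w=5, f_mod=57, f1=50, f2=48)
print(round(modal_age, 1))
```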
| Merits of the Mode | Demerits of the Mode |
|---|---|
| Easy to calculate and understand | Not rigidly defined by a mathematical formula |
| Can be computed when there is an open-ended class | Does not consider all the items |
| Not affected by extreme values | Not suitable for further mathematical treatment |
| Can be used for qualitative data | Is an unstable average |
Relationship between the Mean, Median and Mode
In a perfectly symmetrical distribution, they all coincide: $\bar{X}$ = median = mode.
If a distribution is moderately asymmetrical (moderately skewed), then the mean, median and mode are
related by the empirical formula: Mode = 3 median − 2 mean.
Example: If the mode and mean of a moderately asymmetrical distribution are 60 and 66, respectively,
then what is the median value of this distribution?
Solution: Mode = 3 median − 2 mean, so Median = $\frac{1}{3}(\text{mode} + 2\,\text{mean}) = \frac{1}{3}(60 + 132) = 64$.
A distribution having a tail on one side is called a skewed distribution. Thus,
• if a distribution is skewed to the right, then mean > median > mode, and
• if a distribution is skewed to the left, then mean < median < mode.
When to Use the Different Averages?
The mean is appropriate if the data are quantitative and there are no extreme (abnormal) observations.
For data having extreme values (or for qualitative data measured on an ordinal scale), it is better
to use the median as the measure of central tendency. The median is a widely used measure of central
tendency in psychology, education and other social sciences. On the other hand, the mode is the best
measure of central tendency for qualitative data with a nominal scale of measurement. It can also be
used as a quick measure of central tendency for both qualitative and quantitative data.
5. Measures of Variation
5.1.Introduction
Definition: Measures of dispersion (variation) are statistical figures which indicate the spread or
scatter of a given distribution about the center. A higher figure indicates greater inequality
between the observations. Some of the common measures of dispersion are: positional measures
(range, quartile deviation); mathematical measures (mean deviation, standard deviation); and
measures of relative variation (coefficient of variation and standard score). The positional measures
and mathematical measures are known as absolute measures of variation.
R = L − S = 65 − 3 = 62.
Example 2:

| $x_i$ | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 |
|---|---|---|---|---|---|---|---|---|
| $f_i$ | 1 | 2 | 7 | 9 | 11 | 8 | 5 | 4 |

For grouped data, Range = upper class boundary of the highest class − lower class boundary of the
lowest class, or alternatively the difference between the largest class mark and the smallest class
mark.
Frequency 2 3 8 10 15 12 14
The coefficient of range is a relative measure of dispersion, which is unit free, and is calculated
by the formula
$$\text{coef. } R = \frac{L - S}{L + S}.$$
For the above examples,
1. coef. R = $\frac{L - S}{L + S} = \frac{65 - 3}{65 + 3} = \frac{62}{68} = 0.912$.
2. coef. R = $\frac{L - S}{L + S} = \frac{19 - 5}{19 + 5} = \frac{14}{24} = 0.583$.
3. coef. R = $\frac{L - S}{L + S} = \frac{74 - 40}{74 + 40} = \frac{34}{114} = 0.298$.
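The three coefficients can be verified with a small helper. This is an illustrative sketch; `range_and_coefficient` is a hypothetical name that takes the largest value L and the smallest value S.

```python
def range_and_coefficient(largest, smallest):
    """Return the range L - S and the coefficient of range (L - S) / (L + S)."""
    r = largest - smallest
    return r, r / (largest + smallest)

for L, S in [(65, 3), (19, 5), (74, 40)]:
    r, coef = range_and_coefficient(L, S)
    print(f"L = {L}, S = {S}: range = {r}, coefficient = {coef:.3f}")
```

Because the coefficient is a ratio, it has no units and can be compared across data sets measured on different scales.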
Demerits: It considers only the two extreme values, and it cannot be determined for open-ended
distributions.
The difference between the third quartile and the first quartile is called the inter quartile range
(I.R.). Usually, the I.R. is reduced to the form of semi- inter quartile range (S.I.R.), which is called
quartile deviation and it describes the variability of the middle 50% of the data.
I.R. = $Q_3 - Q_1$, Q.D. = $\frac{Q_3 - Q_1}{2}$ and coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$.
Example: For a certain continuous frequency distribution, the first and the last class limits are 0-
9, and 60-69, and the lower and the upper quartiles are 26.32, and 39.25. Find a) the range; b) the
I.R. and Q.D. c) coeff. Q.D.
c) coef. Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{39.25 - 26.32}{39.25 + 26.32} = \frac{12.93}{65.57} = 0.197$.
Merits: easy to understand and calculate, not affected by extreme values, can be determined in
case of open-ended distributions.
The mean deviation (M.D.) is the arithmetic mean of the absolute deviations of the values from
the mean, median or mode.
If the deviations are taken from the mean then it is called M.D. around the mean, if the deviations
are taken from the median, it is called M.D. around the median, or if the deviations are taken from
the mode, it is called M.D. around the mode.
$$\text{M.D. about the mean} = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}, \quad \text{M.D. about the median} = \frac{\sum_{i=1}^{n} \lvert x_i - \text{median} \rvert}{n},$$
$$\text{M.D. about the mode} = \frac{\sum_{i=1}^{n} \lvert x_i - \text{mode} \rvert}{n}$$
The coefficient of M.D. = $\frac{M.D._{\bar{x}}}{\bar{x}}$ or $\frac{M.D._{\tilde{x}}}{\text{median}}$ or $\frac{M.D._{\hat{x}}}{\text{mode}}$.
While dealing with population values, it is adjusted accordingly.
Example : Find the mean deviations around the mean and around the median of the following
marks of 10 students (out of 50): 21, 23, 25, 28, 30, 32, 38, 39, 46, 48
Solution: The mean = 33 and median = 31. Then, we get
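| $x_i$ | 21 | 23 | 25 | 28 | 30 | 32 | 38 | 39 | 46 | 48 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\lvert x_i - 33 \rvert$ | 12 | 10 | 8 | 5 | 3 | 1 | 5 | 6 | 13 | 15 | 78 |
| $\lvert x_i - 31 \rvert$ | 10 | 8 | 6 | 3 | 1 | 1 | 7 | 8 | 15 | 17 | 76 |

Therefore, the M.D. around the mean = 78/10 = 7.8, and the M.D. around the median = 76/10 = 7.6. That
means, the marks of the students deviate, on the average, by 7.8 from the mean 33 (or by 7.6
from the median 31).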
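The two mean deviations in this example can be reproduced as follows. The sketch uses `statistics.mean` and `statistics.median` from the Python standard library; `mean_deviation` is an illustrative helper name.

```python
import statistics

def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations of the values from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

marks = [21, 23, 25, 28, 30, 32, 38, 39, 46, 48]

mean = statistics.mean(marks)       # 33
median = statistics.median(marks)   # 31.0

print(mean_deviation(marks, mean))
print(mean_deviation(marks, median))
```

The deviation about the median comes out smaller, which is a general property: the sum of absolute deviations is smallest when taken about the median.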
For grouped data, the mean deviations around the mean, median, and mode are obtained,
respectively, as follows:
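$$\frac{\sum_{i=1}^{m} f_i \lvert x_i - \bar{x} \rvert}{n}, \quad \frac{\sum_{i=1}^{m} f_i \lvert x_i - \text{median} \rvert}{n}, \quad \text{and} \quad \frac{\sum_{i=1}^{m} f_i \lvert x_i - \text{mode} \rvert}{n}$$
where m = number of classes, and $x_i$ = class mark of the ith class.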
Example 2: Calculate the two mean deviations for the data given below
Class Interval (C.I) 1-5 6-10 11-15 16-20
Frequency 4 1 2 3
Solution: Mean = $\frac{3(4) + 8(1) + 13(2) + 18(3)}{10} = 10$, and Median = $5.5 + \frac{(5 - 4)\,5}{1} = 10.5$.
C.I      x_i   f_i   f_i|x_i − 10|   f_i|x_i − 10.5|
1-5        3     4        28              30
6-10       8     1         2               2.5
11-15     13     2         6               5
16-20     18     3        24              22.5
Total           10        60              60
Therefore, the M.D. around the mean and the M.D. around the median are both 60/10 = 6.
Merits: easy to understand and calculate, less affected by extreme values, considers every item.
Demerits: cannot be determined in case of open-ended distributions.
Note! M.D. is used in studying economic problems such as distribution of income and wealth in a
society.
5.3.2. Variance and Standard Deviation
a) Population variance and S.D. (ungrouped data)
Definitions: Variance is the mean of the squared deviations about the A.M., and Standard
Deviation is the positive square root of the variance. S.D. is the most commonly used
measure of dispersion and is also extremely useful in judging the representativeness of
the mean.
Suppose that x1, x2, ..., xN are the values of the observations in a population of size N with mean
μ. Then, the population variance and standard deviation are defined by:
$$\text{Variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad (\#)$$
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Example 1: Suppose that the ages of all patients in the recovery room of a certain Hospital are:
26,30,38,40,36,20,45, and 37 years. Find the population variance and standard deviation
Solution: Given N = 8, x1 = 26, x2 = 30, …, x8 = 37.
The population mean is $\mu = \frac{26 + 30 + \cdots + 37}{8} = \frac{272}{8} = 34$.
Then, we construct the following table for the deviations and squared deviations from μ (the last
column is for totals).
Age (x_i)       26   30   38   40   36    20    45   37   272
x_i − μ         -8   -4    4    6    2   -14    11    3     0
(x_i − μ)^2     64   16   16   36    4   196   121    9   462
Thus, using the above formulae, we have $\sigma^2 = 462/8 = 57.75$ and $\sigma = \sqrt{57.75} = 7.599$.
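OR $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$
The derivation of this computational formula is obtained from the above formula (#):
$$\sum_{i=1}^{N} (x_i - \mu)^2 = \sum_{i=1}^{N} x_i^2 - 2\mu\sum_{i=1}^{N} x_i + \sum_{i=1}^{N} \mu^2 = \sum_{i=1}^{N} x_i^2 - 2\mu\sum_{i=1}^{N} x_i + N\mu^2$$
$$= \sum_{i=1}^{N} x_i^2 - 2N\mu^2 + N\mu^2 = \sum_{i=1}^{N} x_i^2 - N\mu^2.$$
Hence, dividing by N, the resulting formula.
Now, to solve Example 1 above using this formula, we have $\sum_{i=1}^{8} x_i^2 = 9710$ and
$\mu = \frac{1}{8}\sum_{i=1}^{8} x_i = \frac{272}{8} = 34$, so that $\sigma^2 = \frac{1}{8}(9710) - (34)^2 = 57.75$.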
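As a quick numerical check, the sketch below computes the population variance of the ages in Example 1 by both the definitional formula (#) and the computational formula. The variable names are illustrative only.

```python
from math import sqrt

# Ages of all patients in the recovery room (Example 1)
ages = [26, 30, 38, 40, 36, 20, 45, 37]
N = len(ages)
mu = sum(ages) / N  # population mean

# Definitional formula: mean of squared deviations from mu
var_definitional = sum((x - mu) ** 2 for x in ages) / N

# Computational formula: mean of squares minus the square of the mean
var_computational = sum(x * x for x in ages) / N - mu ** 2

sigma = sqrt(var_definitional)

print(f"mu = {mu}, variance = {var_definitional}, S.D. = {sigma:.3f}")
```

Both routes give the same variance; the computational form avoids building the table of deviations.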
b) Population variance and S.D for grouped data
In grouped discrete or continuous data where x1, x2, …, xm have their corresponding frequencies
f1, f2, …, fm, the population variance is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{m} f_i (x_i - \mu)^2$$
for N items and m classes.
Example 2.: Find the variance and S.D for the population values:
xi 2 3 5 6 8
fi 3 4 4 5 4
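Here, $N = \sum_{i=1}^{5} f_i = 20$ and $\mu = \frac{1}{N}\sum f_i x_i = \frac{100}{20} = 5$; then, using the definitional formula, the variance is
$\sigma^2 = \frac{1}{20}\sum_{i=1}^{5} f_i (x_i - 5)^2 = \frac{84}{20} = 4.2$, and $\sigma = \sqrt{4.2} = 2.05$.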
estimate the corresponding parameters. So far, the parameters σ² and σ have been discussed.
Normally, one can say that if we replace N by n and μ by x̄, the resulting formula would become
$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
and this would be used to estimate σ². But, theoretically, it can be shown that this
estimator is biased (it tends to underestimate σ²), so the sample variance is instead defined as
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Example 3: A sample of 5 students was taken from a class and their weights were found to be 48,
51, 52, 51, and 53kg. Find the variance and standard deviation.
Solution: The mean is $\bar{x} = 255/5 = 51$, and we prepare the following table:
Weights (x_i)   x_i − x̄   (x_i − x̄)^2
48                -3           9
51                 0           0
52                 1           1
51                 0           0
53                 2           4
Total              0          14
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{5} (x_i - 51)^2}{5-1} = \frac{14}{4} = 3.5,$$
and $s = \sqrt{3.5} = 1.87$ is the sample standard deviation.
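Python's standard `statistics` module distinguishes the two divisors: `variance` and `stdev` divide by n − 1 (sample formulas), while `pvariance` divides by n (population formula). The sketch below applies them to the weights in Example 3.

```python
from statistics import mean, pvariance, stdev, variance

# Weights (kg) of the 5 sampled students in Example 3
weights = [48, 51, 52, 51, 53]

x_bar = mean(weights)
s2 = variance(weights)               # sample variance, divisor n - 1
s = stdev(weights)                   # sample standard deviation
divisor_n_version = pvariance(weights)  # same sum of squares, divisor n

print(f"mean = {x_bar}, s^2 = {s2}, s = {s:.2f}")
print(f"with divisor n instead: {divisor_n_version}")
```

The divisor-n value (14/5) is smaller than the sample variance (14/4), which illustrates why the n − 1 divisor is used when estimating σ² from a sample.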
d) Sample variance and S.D for grouped data
If the values xi have frequencies fi (i=1,2,…,m), then the sample variance is given by:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2$$
The above definition for sample variance also holds for grouped continuous distribution where
xi=class mark of the ith class
Example 4: Find the sample variance and standard deviation for the distribution:
C.I.    1-5   6-10   11-15   16-20
Freq.    4     1      2       3
Solution: In a continuous F.D., xi is the class mark representing the ith class.
C.I      x_i   f_i   f_i x_i   f_i x_i^2
1-5        3     4      12        36
6-10       8     1       8        64
11-15     13     2      26       338
16-20     18     3      54       972
Total           10     100      1410
where $n = \sum f_i = 10$, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{100}{10} = 10$, and $\sum f_i x_i^2 = 1410$, so that
$$s^2 = \frac{1}{9}\left(1410 - 10(10)^2\right) = \frac{410}{9} = 45.56,$$
and $s = \sqrt{45.56} = 6.75$.
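The grouped calculation can be sketched in Python as follows, using the class marks and frequencies of Example 4. The variable names are illustrative; the code computes the sample variance both from the definition and from the shortcut form used in the solution.

```python
from math import sqrt

# Class marks and frequencies for the classes 1-5, 6-10, 11-15, 16-20
class_marks = [3, 8, 13, 18]
freqs = [4, 1, 2, 3]

n = sum(freqs)
x_bar = sum(f * x for f, x in zip(freqs, class_marks)) / n
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))

# Shortcut form: (sum f x^2 - n * x_bar^2) / (n - 1)
s2_shortcut = (sum_fx2 - n * x_bar ** 2) / (n - 1)

# Definitional form: sum f (x - x_bar)^2 / (n - 1)
s2_definition = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, class_marks)) / (n - 1)

s = sqrt(s2_shortcut)
print(f"n = {n}, mean = {x_bar}, s^2 = {s2_shortcut:.2f}, s = {s:.2f}")
```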
Properties of S.D.
1. Adding the same number to, or subtracting it from, each observation doesn’t change the S.D.
2. Multiplying or dividing each observation by the same nonzero constant multiplies or divides the
S.D. by the absolute value of that constant.
Merits of the S.D.
- rigidly defined by a mathematical formula
- is based on all the observations
- is useful for further mathematical treatments
Demerits of the S.D.
- gives more weight to extreme values
- cannot be calculated if all the values are not given
- cannot be used when there is an open-ended class
- takes time to compute
5.4.Measures of relative Variation
5.4.1. Coefficient of Variation (C.V.)
It is a relative measure of dispersion and is always expressed as a percentage. It is the best measure for
comparing the variability of two series having different means and standard deviations.
$$C.V. = \frac{s}{\bar{x}} \times 100\% \quad \text{or} \quad C.V. = \frac{\sigma}{\mu} \times 100\%.$$
Example: Suppose typist A types out 30 pages per day on average with a S.D. of 6 and typist B
types out 45 pages per day on average with a S.D. of 10. Which typist has shown greater
consistency in her/his output?
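Solution: C.V.(Typist A) = $\frac{s}{\bar{x}} \times 100\% = \frac{6}{30} \times 100\% = 20\%$.
C.V.(Typist B) = $\frac{s}{\bar{x}} \times 100\% = \frac{10}{45} \times 100\% = 22.2\%$.
Typist A shows greater consistency, since the C.V. of Typist A is smaller.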
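The comparison of the two typists can be sketched in Python. The function name `coefficient_of_variation` is a hypothetical helper introduced here for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Return the C.V. as a percentage: (S.D. / mean) * 100."""
    return 100 * sd / mean


cv_a = coefficient_of_variation(6, 30)   # Typist A: mean 30 pages, S.D. 6
cv_b = coefficient_of_variation(10, 45)  # Typist B: mean 45 pages, S.D. 10

# The series with the smaller C.V. is the more consistent one
more_consistent = "A" if cv_a < cv_b else "B"
print(f"C.V.(A) = {cv_a:.1f}%, C.V.(B) = {cv_b:.1f}%, more consistent: Typist {more_consistent}")
```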
Exercise! Let n = 10, $\bar{x} = 12$ and $\sum x_i^2 = 1530$ for a certain data set. Find the C.V.
Thus, one can argue that the student’s score in Statistics is $\frac{66 - 51}{12} = 1.25$ standard deviations above
the average, while his score in Mathematics is only $\frac{80 - 72}{16} = 0.50$ standard deviation above the
average for the class. Here, the grades have been converted into standard scores. Whereas the
original scores cannot be meaningfully compared, the standard scores expressed in terms of
standard deviations can be compared. Thus, relative to the rest of the class, the student scored
much higher in Statistics than in Mathematics.
In general, we define the standard score as:
$$z = \frac{x - \bar{x}}{s} \quad \text{or} \quad Z = \frac{x - \mu}{\sigma}$$
The Z- score tells us how many standard deviations a value lies above (if positive) or below (if
negative) the mean of the set of data to which it belongs.
Example: If a set of measurements has the mean 48 with a S.D. of 12, convert each of the
following into standard units: a) 54; b) 72; c) 78.
Solution: a) For x = 54, $Z = \frac{54 - 48}{12} = 0.5$; that is, 54 is 0.5 S.D.’s above the mean.
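A minimal Python sketch of the conversion to standard units, applied to all three values in the example (the function name `z_score` is an illustrative choice):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


mean, sd = 48, 12
z_values = {x: z_score(x, mean, sd) for x in (54, 72, 78)}

for x, z in z_values.items():
    side = "above" if z > 0 else "below"
    print(f"x = {x}: z = {z:+.2f} ({abs(z):.2f} S.D.'s {side} the mean)")
```

All three values exceed the mean of 48, so all three z-scores are positive.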
A bell-shaped distribution in which the mean = median = mode is known as
a symmetric distribution (the normal distribution is the standard example).
A distribution with an asymmetric tail extending out to the right is referred to as “positively skewed”
or “skewed to the right”. In this case, the mean > the median > the mode.
A distribution with an asymmetric tail extending out to the left is referred to as “negatively skewed”
or “skewed to the left.” In this case, the mean < the median < the mode of the distribution.
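A coefficient of skewness based on the mean and the median is $sk = \frac{3(\bar{X} - \text{median})}{s}$. If sk > 0, the distribution is right skewed. If sk < 0, it is left skewed, and if sk
= 0, it is symmetric (as the normal distribution is).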
Skewness has also been defined with respect to the third moment about the mean:
$\gamma_1 = \frac{\sum (X - \mu)^3}{n\sigma^3}$, which is simply the expected value of the distribution of cubed z scores.
Skewness measured in this way is sometimes referred to as “Fisher’s skewness.” When the
deviations from the mean are greater in one direction than in the other direction, this statistic will
deviate from zero in the direction of the larger deviations
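Both skewness measures can be sketched in Python. The function names are hypothetical, and the sample sd (divisor n − 1) is assumed for the mean-median coefficient while the population sd (divisor n) is assumed for the third-moment measure; the data are the ten marks used earlier in the mean-deviation example.

```python
from statistics import mean, median, pstdev, stdev


def mean_median_skewness(data):
    """sk = 3 * (mean - median) / s, with s the sample standard deviation."""
    return 3 * (mean(data) - median(data)) / stdev(data)


def moment_skewness(data):
    """Third moment about the mean divided by sigma cubed (mean of cubed z scores)."""
    mu = mean(data)
    sigma = pstdev(data)
    return sum((x - mu) ** 3 for x in data) / (len(data) * sigma ** 3)


marks = [21, 23, 25, 28, 30, 32, 38, 39, 46, 48]
symmetric = [1, 2, 3, 4, 5]

print(mean_median_skewness(marks), moment_skewness(marks))          # both positive
print(mean_median_skewness(symmetric), moment_skewness(symmetric))  # both zero
```

For the marks, the mean (33) exceeds the median (31), so both measures come out positive, indicating a tail to the right.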
Great skewness may motivate the researcher to investigate outliers. When making decisions about
which measure of location to report (means being drawn in the direction of the skew) and which
inferential statistic to employ (one which assumes normality or one which does not), one should
take into consideration the estimated skewness of the population. Normal distributions have zero
skewness. Of course, a distribution can be perfectly symmetric but far from normal.
Transformations commonly employed to reduce (positive) skewness include square root, log, and
reciprocal transformations.
6.3.Kurtosis
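It measures the degree of peakedness or flatness of a distribution at the top. There are three types
of kurtosis:
a. Mesokurtic: it is neither too peaked nor too flat at the top, as in the normal distribution.
b. Platykurtic: it is flat at the top (i.e., a more flat-topped distribution) when compared to
the normal distribution.
c. Leptokurtic: it is peaked at the top (i.e., a less flat-topped distribution) when compared to
the normal distribution.
Kurtosis is measured with respect to the fourth moment about the mean: $\beta_2 = \frac{\sum (X - \mu)^4}{n\sigma^4}$.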
7. Simple Correlation and Regression Analysis
7.1.Simple Correlation Analysis
Definition
Simple correlation coefficient (r) measures the degree of linear relationship (or association)
between two variables only. Simple correlation coefficient (r) varies from negative one to positive
one (i.e., $-1 \le r \le 1$).
r= -1 means there is perfect negative linear association between two variables
r= 1 means there is perfect positive linear association between two variables
r= 0 means there is no linear association between two variables
Note that the existence of correlation does not necessarily imply causation. However, causation
necessarily leads to correlation. In general, correlation may be due to one of the following facts:
a. One variable being the cause of the other variable. E.g. money supply causes inflation
b. Both variables being the result of a common cause. E.g. yield of maize and sorghum
may be positively correlated due to the fact that both are related to rainfall
c. Chance factor
Limitations of simple correlation coefficient
1. it considers only two variables and if the 3rd, 4th, etc variables exist, simple correlation
coefficient is not appropriate
2.
3. it does not capture non-linear relationship between two variables
7.2.Measures of Simple Correlation
1. Scatter Diagrams
[Figure: two scatter diagrams on axes running from −x_j to +x_j and from −y_i to +y_i, one showing positive correlation and the other showing negative correlation]
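2. The simple correlation coefficient (r):
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \cdot \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
or, in deviation form with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$,
$$r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2 \cdot \sum_{i=1}^{n} y_i^2}}$$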
3. The Rank Correlation Coefficient (ρ): it is used when our variables are qualitative in their
nature.
$$\rho = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$
where n = number of pairs of ranks and D_i = difference between the two ranks of the ith pair.
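The two coefficients can be sketched in Python as below. The data are hypothetical values chosen only for illustration, the function names are not from these notes, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def simple_correlation(xs, ys):
    """r = sum(x*y) / sqrt(sum(x^2) * sum(y^2)), with x, y in deviation form."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(values):
    """Rank 1 for the smallest value; assumes all values are distinct (no ties)."""
    order = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [order[v] for v in values]


def rank_correlation(xs, ys):
    """rho = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), D = difference between paired ranks."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical paired observations (illustration only)
X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 4, 6]

r = simple_correlation(X, Y)
rho = rank_correlation(X, Y)
print(f"r = {r:.2f}, rho = {rho:.2f}")
```

With tied observations, the ties would need average ranks, which this sketch deliberately does not handle.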
Economics is often concerned with the relationship between economic variables. Regression
analysis is basically concerned with examining these relationships.
Regression analysis is one of the most commonly used tools in econometric work.
Regression Analysis is concerned with the study of the dependency of one variable, the dependent
or explained variable, on one or more other variables called the explanatory variable or the
independent variables.
Conventionally, the explained variable is designated by Y and the explanatory variables by X_i's.
Y = f(X1, X2, ..., XK)
If K = 1, that is, there is only one X-variable, we have simple regression.
If K > 1, i.e. there is more than one X-variable, we have the multiple regression case.
For example, if Y = consumption expenditure of a family is affected by X1, X2, and X3, where
X1 = family income
X2 = financial assets of the family
X3 = family size
we have the case of multiple regression.
Once the variables have been identified and the data are collected, the next step in regression
analysis is to estimate the coefficients of the equation.
Thus it is often said that the bread and butter of regression analysis is the estimation of the
parameters of the econometric model.
In other words, the main purpose of regression analysis is to take a purely theoretical equation
like:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
and use a set of data to create an estimated equation like:
The numerical values of the parameters (coefficients) of a regression equation can be estimated
using different methods.
The most widely used method of obtaining the parameter estimates from a sample is called
ordinary least squares (OLS).
Sample regression function (SRF): $Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i$, but
$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$, so that
$Y_i = \hat{Y}_i + e_i$
$e_i = Y_i - \hat{Y}_i$
Thus, the residual is the difference between the actual and the estimated/predicted values of Y.
Now, given data on both X and Y, our objective is to find the SRF that best “fits” the PRF or the
values of estimators, which are as close as possible to population parameters. However, what is
the possible criterion to choose the SRF that best “fits” the actual PRF?
If the criterion is that the sum of the residuals is zero (i.e., $\sum_{i=1}^{n} e_i = 0$), we will have the method of moments. However, this
criterion is not good as it gives equal weight to all kinds of residuals (large, medium and small).
According to the least squares criterion, we find the SRF which minimizes the sum of the squared residuals (i.e.,
$\min \sum_{i=1}^{n} e_i^2$). Alternatively, we choose $\hat{\alpha}$ and $\hat{\beta}$ such that the sum of squares of errors is
minimized. This criterion is very important for two basic reasons:
a. it gives more weight to larger residuals and less weight to smaller residuals
b. the estimators obtained through OLS method of estimation have some desirable
statistical properties
Since different samples from a given population result in different estimators, which of these
estimators minimize the sum of squared residuals? (Recall how to find extreme values of
functions- the necessary and sufficient conditions).
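Given $Y_i = \alpha + \beta X_i + \varepsilon_i$, we want to substitute $\hat{\alpha}$ and $\hat{\beta}$ for α and β, and minimise
$$S = \sum_{i=1}^{n} \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right)^2 = \sum_{i=1}^{n} e_i^2$$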
The intuitive idea behind the procedure of least squares is given by looking at the following figure.
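[Figure: data points scattered around the sample regression line, with Y on the vertical axis and X on the horizontal axis; each residual e_i is the vertical distance between an observed Y_i and the line at X_i]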
The regression line passes through points in such a way that it is ‘as close as possible’ to the points
of the data. Closeness could mean different things. The minimisation procedure of the OLS implies
that we minimise the sum of squares of the vertical distances of the points from the line.
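$$\frac{\partial S}{\partial \hat{\alpha}} = 0 \;\Rightarrow\; 2\sum_{i=1}^{n} \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right)(-1) = 0 \;\Rightarrow\; \sum_{i=1}^{n} e_i = 0$$
$$\sum_{i=1}^{n} Y_i = n\hat{\alpha} + \hat{\beta}\sum_{i=1}^{n} X_i \qquad \text{(EQ 1)}$$
$$\bar{Y} = \hat{\alpha} + \hat{\beta}\bar{X} \;\Rightarrow\; \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$
$$\frac{\partial S}{\partial \hat{\beta}} = 0 \;\Rightarrow\; 2\sum_{i=1}^{n} \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right)(-X_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} X_i e_i = 0$$
$$\sum_{i=1}^{n} X_i Y_i = \hat{\alpha}\sum_{i=1}^{n} X_i + \hat{\beta}\sum_{i=1}^{n} X_i^2 \qquad \text{(EQ 2)}$$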
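Equations 1 and 2 are known as the normal equations. To solve for $\hat{\beta}$, we substitute for $\hat{\alpha}$ in
equation 2 from equation 1 and we get
$$\sum_{i=1}^{n} Y_i X_i = \left(\bar{Y} - \hat{\beta}\bar{X}\right)\sum_{i=1}^{n} X_i + \hat{\beta}\sum_{i=1}^{n} X_i^2$$
$$\sum_{i=1}^{n} Y_i X_i = \bar{Y}\sum_{i=1}^{n} X_i - \hat{\beta}\bar{X}\sum_{i=1}^{n} X_i + \hat{\beta}\sum_{i=1}^{n} X_i^2$$
$$\sum_{i=1}^{n} Y_i X_i - \bar{Y}\sum_{i=1}^{n} X_i = \hat{\beta}\left(\sum_{i=1}^{n} X_i^2 - \bar{X}\sum_{i=1}^{n} X_i\right)$$
$$\sum_{i=1}^{n} Y_i X_i - \frac{1}{n}\sum_{i=1}^{n} Y_i \sum_{i=1}^{n} X_i = \hat{\beta}\left(\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\sum_{i=1}^{n} X_i \sum_{i=1}^{n} X_i\right)$$
$$\frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n} = \hat{\beta}\,\frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n}$$
$$\hat{\beta} = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}$$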
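The numerator of the last equation can be simplified as follows, using $\sum Y_i = n\bar{Y}$ and $\sum X_i = n\bar{X}$:
$$n\sum_{i=1}^{n} Y_i X_i - \sum_{i=1}^{n} Y_i \sum_{i=1}^{n} X_i = n\sum_{i=1}^{n} Y_i X_i - (n\bar{Y})(n\bar{X})$$
$$= n\left(\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X}\right)$$
$$= n\sum_{i=1}^{n} \left(Y_i X_i - \bar{Y}\bar{X}\right)$$
$$= n\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})$$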
Similarly, the denominator of our last equation can be rewritten as:
$$n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 = n\sum_{i=1}^{n} X_i^2 - (n\bar{X})^2$$
$$= n\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
$$= n\sum_{i=1}^{n} (X_i - \bar{X})^2$$
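Therefore, we have:
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},$$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ (the deviation form). Equivalently,
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) / (n-1)}{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1)} = \frac{Cov(X, Y)}{Var(X)}$$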
The estimators obtained thus are known as the Least Square Estimators.
(1) The estimators are expressed solely in terms of the observable quantities (sample)
(2) They are point estimators: they provide a single (point) value of the relevant
population parameter
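The least squares estimators can be sketched in Python directly from the deviation-form formula for the slope and from the intercept formula (intercept = mean of Y minus slope times mean of X). The function name `ols_fit` and the data are hypothetical and serve only as an illustration; the sketch also checks the two normal-equation conditions on the residuals.

```python
def ols_fit(xs, ys):
    """Return (alpha_hat, beta_hat) for the simple regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared deviations of X
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical sample (illustration only)
X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 4, 6]

alpha_hat, beta_hat = ols_fit(X, Y)
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(X, Y)]

print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
print(f"sum of residuals     = {sum(residuals):.6f}")
print(f"sum of X * residuals = {sum(x * e for x, e in zip(X, residuals)):.6f}")
```

Both printed sums are zero up to rounding, as the normal equations require.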
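The regression line thus obtained has the following properties (writing b0 and b1 for the estimated intercept and slope):
$$\hat{y}_i = b_0 + b_1 x_i = \left(\bar{y} - b_1\bar{x}\right) + b_1 x_i = \bar{y} + b_1 (x_i - \bar{x})$$
Summing both sides of this last equation over the sample values and dividing through by the
sample size N gives $\bar{\hat{y}} = \bar{y}$, since $\sum (x_i - \bar{x}) = 0$.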
8. Introduction to Probability Theory
8.1. Basic Concepts
Typically, the starting point of an investigation is an experiment. This may be as simple as
rolling a pair of dice or as complicated as conducting a large-scale survey of households or
firms.
2. Rolling a die
An experiment is said to be a Random experiment if the following three conditions are satisfied:
b. Outcomes: The results one obtains from an experiment are known as outcomes. One may observe
certain regularities in the outcomes of an experiment; however, it is not possible
to predict them with certainty. Example: the outcomes are head or tail in tossing a coin only once.
c. Sample space: The set of all possible outcomes (totality) of an experiment is referred to as the
sample space. It is usually denoted by the letter S.
Examples: 1. In tossing a coin once, the sample space is S = {H, T}. Tossing the coin twice, the
sample space is S = {HH, HT, TH, TT}. Tossing the coin three times, the sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. In general, tossing a coin n times, the
number of elements in the sample space is 2^n.
2. In rolling a die, the sample space is S = {1, 2, 3, 4, 5, 6}. In rolling two dice at the same
• Discrete Sample Space: If the sample space contains a finite, or an infinite but countable
number of elements, with a one to one correspondence to positive integers, it is called a
discrete space.
• Continuous Sample space: If the sample space has infinite and uncountable number of
elements (i.e., if it has as many elements as there are real numbers) it is said to be
continuous.
d. Sample point or elementary event: Each distinct individual element of the sample space (a
singleton) is known as a sample point (or elementary event).
Example: If a coin is tossed twice, the sample space consists of four sample points HH, HT, TH,
and TT; S = {HH, HT, TH, TT}, where H = heads, and T = tails.
e. Events: Any subset of the sample space (which is also a set of sample points) that represents
several outcomes of an experiment is known as an event.
Example: 1. In tossing two coins at the same time, let event A be “at least one head”. Then
A = {HH, HT, TH}.
2.when a pair of dice is rolled, “total score of 7” is an event represented by the sample points in A
as follows: A = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}, where the first number is the outcome of
the first die and the second number is the outcome of the second die.
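Sample spaces and events like these can be enumerated with `itertools.product` from the Python standard library. The sketch below lists the sample space for n coin tosses and for a pair of dice, then picks out the events described above as subsets; the function and variable names are illustrative.

```python
from itertools import product


def coin_sample_space(n):
    """All sequences of H/T of length n, e.g. ['HH', 'HT', 'TH', 'TT'] for n = 2."""
    return ["".join(seq) for seq in product("HT", repeat=n)]


two_tosses = coin_sample_space(2)
at_least_one_head = [outcome for outcome in two_tosses if "H" in outcome]

# A pair of dice: ordered pairs (first die, second die)
dice_space = list(product(range(1, 7), repeat=2))
total_of_seven = [pair for pair in dice_space if sum(pair) == 7]

print(two_tosses, at_least_one_head)
print(len(coin_sample_space(3)), len(dice_space))
print(total_of_seven)
```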
f. Null (impossible) event: the null or impossible event is represented by ∅. It refers to the event
that never occurs at all.
g. Mutually Exclusive Events: Events A and B are said to be mutually exclusive if the occurrence
of A precludes the occurrence of B and vice-versa. They cannot occur together. That is,
A ∩ B = ∅.
h. Equally Likely Events: the outcomes of a random experiment are said to be equally likely if and only if
none of them can be expected to occur in preference to another.
i. Collectively Exhaustive: two events A and B are said to be collectively exhaustive if they contain
all possible outcomes of the sample space.
j. Independent Events: Events A and B are said to be independent if the occurrence of A has no
effect on the occurrence of B and vice-versa.
k. Complement of an Event: If event A is a subset of sample space S, then its complement is A′,
which contains the elements in S but not in A. Thus, the complement is S − A. In other words, for each set
A we can define a unique set A′ such that
A ∩ A′ = ∅ and A ∪ A′ = S.
a. Classical probability concept. This concept applies when all possible outcomes are equally
likely, as is presumably the case in most games of chance.
Example 1: Toss a fair coin twice; then the sample space is S = {HH, HT, TH, TT}; N = 4.
Let the event A be “observing at least one head”; then A = {HH, HT, TH}; n = 3, and
$P(A) = \frac{3}{4}$
Example 2: When a pair of dice is rolled, event A represents “total score of 7”. N = 36 and n = 6, so
$P(A) = \frac{6}{36} = \frac{1}{6}$
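Under the classical concept, a probability is just a ratio of counts, so both examples can be reproduced with exact fractions. In the sketch below, `classical_probability` is a hypothetical helper name, and equally likely outcomes are assumed throughout.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, sample_space):
    """P(A) = n / N for equally likely outcomes (n = |A|, N = |S|)."""
    return Fraction(len(event), len(sample_space))


# Example 1: a fair coin tossed twice, A = at least one head
coin_space = ["".join(seq) for seq in product("HT", repeat=2)]
event_heads = [o for o in coin_space if "H" in o]
p_heads = classical_probability(event_heads, coin_space)

# Example 2: a pair of dice, A = total score of 7
dice_space = list(product(range(1, 7), repeat=2))
event_seven = [pair for pair in dice_space if sum(pair) == 7]
p_seven = classical_probability(event_seven, dice_space)

print(p_heads, p_seven)
```

`Fraction` reduces 6/36 to 1/6 automatically, matching the hand calculation.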
NB: The probability of an event, according to this definition, is the proportion of the time that events of the
same kind will occur in the long run.
Despite the improvement over the classical approach, there are some objections: what is meant by the limit
as N goes to infinity? How can we generate an infinite sequence of trials? What happens to phenomena
where repeated trials are not possible?
c. Subjective Definition: According to this approach, probability refers to the “degree of belief”
of individuals in assessing the uncertainty of a particular situation. Since there is no past experience,
it depends on educated guesses. The two approaches lead to alternative methods of statistical
inference. The frequentist will conduct the discussion around what happens “in the long-run” or
“on average”, and attempt to develop “objective” procedures. On the other hand, a subjectivist will
be concerned with the question of revising prior beliefs in the light of available information in the
form of the observed data.
Axiom 1: For any event A in S, the probability of A is a nonnegative real number. That is, P(A)
≥ 0.
Axiom 2: P(S) = 1.
Axiom 3: If A1, A2, ... is a finite or infinite sequence of mutually exclusive events in S, then
$$P\left(\bigcup_{i=1} A_i\right) = \sum_{i=1} P(A_i)$$
Theorem 1.1: If A is an event in a discrete sample space S, then P(A) equals the sum of the
probabilities of the individual outcomes comprising A.
Theorem 1.2: If an experiment can result in any one of N different equally likely outcomes,
and n of these outcomes together constitute event A, then the probability of event A is given by
$P(A) = \frac{n}{N}$.
Theorem 1.4: If A and B are events in a sample space S and A ⊆ B, then P(A) ≤ P(B)
(monotonicity of the probability function).
Theorem 1.5: If A and A′ are complementary events in a sample space S, then P(A′) = 1 − P(A).
Counting Procedures
Theorem 1: If an operation can be performed in n1 different ways and if for each of these the
second operation can be performed in n2 ways, then the two operations when associated
together can be performed in n1n2 ways.
Example: How many sample points are in the sample space when a pair of dice is thrown once?
n1 = 6, n2 = 6 ⇒ n1n2 = 36
Theorem 2: If an operation can be performed in n1 ways and if for each of these the second
operation can be performed in n2 ways, and for each of the first two the third operation
can be performed in n3 ways, etc…, then the sequence of k operations can be performed
in n1n2…nk ways.
Example: A college freshman must take an economics course, a management course, and a
statistics course. If she may select any one of 3 economics courses, any of 4 management courses
and any of 2 statistics courses, in how many ways can she arrange her program?
n1 = 3, n2 = 4, n3 = 2 ⇒ n1n2n3 = 24
The number of permutations is the product of n consecutive positive integers. Since this
number arises in many problems in mathematics, it is given a special name. If n is a positive integer,
we define n! = n(n − 1)(n − 2) ⋯ 2 · 1.
E.g. 6! = 6 x 5 x 4 x 3 x 2 x 1 = 720
2! = 2 x 1 = 2
Theorem: The number of ordered arrangements or permutations of n distinct objects taken all at
a time (without repetition) is given by n! and denoted by $_nP_n = n!$.
Theorem: The number of permutations of n distinct objects taken r at a time (without repetition)
is given by $_nP_r = \frac{n!}{(n-r)!}$.
Combinations: these are unordered arrangements (i.e., there is no concern about the order of the
arrangements).
The number of ways of choosing r objects from a total of n distinct objects, called the number
of combinations of n objects taken r at a time, is given by (order is not important in this case):
$$_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
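The counting rules in this section map onto functions in Python's standard `math` module: `math.factorial`, `math.perm` and `math.comb` (the latter two are available from Python 3.8). The values of n and r in the permutation and combination lines are arbitrary illustrative choices.

```python
from math import comb, factorial, perm, prod

# Multiplication rule: 3 economics, 4 management and 2 statistics courses
course_choices = [3, 4, 2]
programs = prod(course_choices)

# Factorial: permutations of n distinct objects taken all at a time
six_factorial = factorial(6)

# Permutations of n objects taken r at a time: n! / (n - r)!
n, r = 5, 2
ordered = perm(n, r)

# Combinations of n objects taken r at a time: n! / (r! (n - r)!)
unordered = comb(n, r)

print(programs, six_factorial, ordered, unordered)
```

Each unordered selection of r objects corresponds to r! ordered arrangements, which is why `perm(n, r)` equals `comb(n, r) * factorial(r)`.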