8-MC 107-Elementary Stat and Probability-Prelims
Course Description: The course equips the students with the basic statistical
tools to understand various phenomena. The topics on mean, variance, sampling
and estimation eventually allow the students to be able to perform hypothesis
testing on real-life problems from different fields. The course includes applications
and data analysis with computations carried out using SPSS.
Course Requirements:
▪ Assessment Tasks - 60%
▪ Major Exams - 40%
_________
Periodic Grade 100%
Introduction
Statistics means different things to different people. Staff of PAGASA report
daily weather statistics, sportscasters give halftime statistics, and market
reporters give exact prices of prime commodities and an analysis of the actual price
increases. Mathematicians describe statistics as a major area in mathematics, and
researchers discuss the appropriate statistics for analyzing the results of a particular
investigation (Amid, 2005).
Government agencies gather data through surveys such as a census. A census
is an official, usually periodic, counting of population as to the number of persons
staying in the household, their ages, gender, civil status, monthly income, the type of
house and whether they own or are renting the house. The data collected are organized
and are given meanings relative to the purposes for which the census is made. Various
government agencies use these data for planning of allocation of funds such as
housing, need for school buildings and books, opportunities for employment, medical
services and many others (Cruz, 2002).
Statistics is a branch of mathematics that can be used for many other purposes:
▪ It can give a precise description of data. This function of statistics enables
researchers to make accurate statements or judgments about averages, variability and
relationships. Example: describing the academic performance of a group of pupils
according to the computed mean, standard deviation, and correlation with another
factor.
▪ It can predict the behavior of individuals. To measure the success of a pupil,
teacher or worker, the researcher may have to compute measures like the mean, standard
scores, percentiles and stanines. Example: grades of students can be predicted through
a scholastic aptitude test, work performance through an aptitude test related to the
particular type of work, and a teacher's performance through a teacher aptitude test
or psychological test.
▪ It can be used to test a hypothesis. Relationships between variables can be
determined through a test of inference such as in correlation. Other statistical
measures that can be applied for inferential purposes are the t-test, chi-square test,
F-test and others.
It is wise to remember that the choice of statistics to use in testing a hypothesis
depends upon the nature of the data. This includes the scale of measurement used
(nominal, ordinal, interval or ratio), whether the data are normally distributed or
not, and other considerations depending on the purpose (Punzalan & Uriarte, 1989).
Learning Outcomes
Lesson 1. Definition of Statistics
The word statistics has different meanings. In the more common usage, it refers to numerical
facts or recorded data. The number that represents the income of a family, the number of
cars sold at a dealership during past months, the number of employees of a company, the
number of students enrolled in a class, the starting salary of a typical college graduate, the
number of patients visiting a clinic or the number of hospital-acquired infections per month
are examples of statistics in this sense. Statistics is also a way of collecting and displaying
information (Amid, 2003).
population (INSTAT UPLB, n.d.). It is a portion of the population selected for study
(Amid, 2003).
6. Representative sample is a sample that represents the characteristics of the
population as closely as possible (Amid, 2003).
7. Random sample is a sample drawn in such a way that each element of a population
has equal chances of being selected (Amid, 2003).
8. Datum refers to the value of the variable associated with one element of a
population or sample. This value may be a number, a word, or a symbol
(https://quizlet.com/, 2020).
9. Data (plural) is the set of values collected for the variable from each of the elements
belonging to the sample (https://quizlet.com/, 2020). These are numbers or
measurements that are collected as a result from observation, interview,
questionnaire, experimentation, test and so forth (Amid, 2003).
10. Parameter is a numerical value summarizing all the data of an entire population
(https://www2.southeastern.edu/, 2020).
11. Statistic is a numerical value summarizing the sample data
(https://www2.southeastern.edu/, 2020).
Lesson 4. Types of Variables
NOTE: Arithmetic operations, such as addition and averaging, are meaningful for data
resulting from a quantitative variable (Melek, 2020).
Discrete variable. A discrete variable is a variable that can assume a countable number
of values, using integers, that is, whole numbers (e.g., 0, 1, 2).
Example:
• Age
• Weight
• Height
• Time
NOTE: Statisticians often treat discrete variables as continuous variables.
Summation notation is used to denote the sum of values. The uppercase Greek letter
Σ (pronounced sigma) is used to denote the sum of all values. Using this notation, the
foregoing sum can be written as follows:
\[ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 \]
The notation $\sum x$ in this expression represents the sum of all the values of x.
Example 1: Suppose the ages of four managers are 35, 47, 28 and 60 years.
Find:
a) $\sum x$   b) $\sum (x - 6)$   c) $(\sum x)^2$   d) $\sum x^2$
Solution: Let $x_1$, $x_2$, $x_3$ and $x_4$ be the ages (in years) of the first, second, third and fourth
manager, respectively. Then,
$x_1 = 35$, $x_2 = 47$, $x_3 = 28$ and $x_4 = 60$
a.)
\[ \sum x = x_1 + x_2 + x_3 + x_4 = 35 + 47 + 28 + 60 = 170 \]
b.)
\[ \sum (x - 6) = (x_1 - 6) + (x_2 - 6) + (x_3 - 6) + (x_4 - 6) = 29 + 41 + 22 + 54 = 146 \]
c.) Note that $(\sum x)^2$ is the square of the sum of all x values.
\[ \sum x = 170 \text{ (solved in letter a.)} \]
\[ \left(\sum x\right)^2 = 170^2 = 28900 \]
d.)
\[ \sum x^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 35^2 + 47^2 + 28^2 + 60^2 \]
\[ \sum x^2 = 7818 \]
m    12    15    20    30
f     5     9    10    16
a.)
\[ \sum m = m_1 + m_2 + m_3 + m_4 = 12 + 15 + 20 + 30 = 77 \]
b.)
\[ \sum f^2 = f_1^2 + f_2^2 + f_3^2 + f_4^2 \]
\[ \sum f^2 = 25 + 81 + 100 + 256 \]
\[ \sum f^2 = 462 \]
c.)
\[ \sum mf = m_1 f_1 + m_2 f_2 + m_3 f_3 + m_4 f_4 \]
\[ \sum mf = 60 + 135 + 200 + 480 = 875 \]
d.)
\[ \sum m^2 f = m_1^2 f_1 + m_2^2 f_2 + m_3^2 f_3 + m_4^2 f_4 \]
\[ \sum m^2 f = 720 + 2025 + 4000 + 14400 \]
\[ \sum m^2 f = 21145 \]
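The sums in the two examples above can also be checked by computer. The following is a minimal Python sketch, offered only as an illustration; the variable names are chosen here and are not part of the module.

```python
# Ages of the four managers from Example 1.
ages = [35, 47, 28, 60]

sum_x = sum(ages)                           # sum of x
sum_x_minus_6 = sum(x - 6 for x in ages)    # sum of (x - 6)
square_of_sum = sum(ages) ** 2              # (sum of x) squared
sum_of_squares = sum(x ** 2 for x in ages)  # sum of x squared

# The m and f values from the second example.
m = [12, 15, 20, 30]
f = [5, 9, 10, 16]

sum_m = sum(m)
sum_f_squared = sum(fi ** 2 for fi in f)
sum_mf = sum(mi * fi for mi, fi in zip(m, f))
sum_m_squared_f = sum(mi ** 2 * fi for mi, fi in zip(m, f))

print(sum_x, sum_x_minus_6, square_of_sum, sum_of_squares)
print(sum_m, sum_f_squared, sum_mf, sum_m_squared_f)
```

Note that the square of the sum and the sum of the squares are computed differently and give different results, which is the point of items c and d in Example 1.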
Assessment Task 1-1
Describe the nature of statistics. With your knowledge of statistics in this sense, how
do you apply it in a real-life situation? Illustrate your answer in 5 sentences.
Assessment Task 1-2
Answers:
1. 6.
2. 7.
3. 8.
4. 9.
5. 10.
Assessment Task 1-3
Compute:
a.) $\sum x$
b.) $\sum (y - 1)$
c.) $\sum xy$
d.) $(\sum x)^2$
e.) $\sum y^2$
Assessment Task 1-4
A. Qualitative variable
B. Quantitative – discrete variable
C. Quantitative – continuous variable
Summary
• Statistics is the science that deals with the collection, organization, analysis,
interpretation and presentation of information that can be applied numerically.
• Population is a collection, or set, of individuals, objects or events whose
properties are to be analyzed; the complete collection of individuals or objects
that are of interest to the data collector and researcher.
• Sample is any subset of a population which simply means that it could be the
individuals, objects or measurements selected by the sample collector from the
population.
• Primary and secondary data are the two different types of data.
• The types of variables are qualitative and quantitative variables.
• There are two classifications of quantitative variables namely: discrete and
continuous.
• The four scales of measurement of a variable are nominal, ordinal, interval and
ratio.
• Summation notation is used to denote the sum of values. The uppercase Greek
letter Σ (pronounced sigma) is used to denote the sum of all values.
References
Identifying Parameters and Statistics. (n.d.).
https://www2.southeastern.edu/Academics/Faculty/dgurney/Math241/StatTopics/ParamStat.htm
MODULE 2
COLLECTION AND PRESENTATION OF DATA
Introduction
Every day we come across a lot of information in the form of facts, numerical figures,
tables, graphs, etc. These are provided by newspapers, television, magazines, and other
means of communication (aven.amritalearning.com, 2020). Data collection, organization and
presentation are as important as statistical data analysis and interpretation of results
in any research undertaking. The researcher must see to it that the data collected are
suitable and sufficient to achieve the objectives of the research. Methods of data
presentation vary in their efficiency depending on the type of information the researcher
has.
Learning Outcomes
Lesson 1. Methods of Data Collection (INSTAT UPLB, n.d.)
There are three methods available to a researcher for collecting data, namely
objective, subjective and the use of existing records. One or a combination of these methods
can be used depending on the availability of resources and data requirements. If a study
needs direct collection of data from the units of the study, the researcher has to apply the
objective or subjective method. If the data or part of the data needed by the researcher have
already been collected by another researcher or institution, then existing records can be
used.
Objective Method. In using the objective method, data are collected by measuring or
observing the characteristics of interest directly on the entities under study. This method
requires counting or measuring instruments to ensure correct and up-to-date information.
Data collection by observation using the five senses is also considered an objective method.
Use of existing records. It is the most convenient method since the researcher
makes use of data that are already available. In using this method, the researcher should
remember to properly acknowledge the source of data.
Data collected can be classified into two types namely, primary and secondary.
Primary data are those collected directly from the source or obtained through objective or
subjective methods. Secondary data are those which have been acquired through the use of
existing records.
Primary data. Primary data is the kind of data that is collected directly from the data
source without going through any existing sources. It is mostly collected specially for a
research project and may be shared publicly to be used for other research. Primary data is
often reliable, authentic, and objective in as much as it was collected with the purpose of
addressing a particular research problem. It is noteworthy that primary data is not commonly
collected because of the high cost of implementation. A common example of primary data is
the data collected by organizations during market research, product research, and
competitive analysis. This data is collected directly from its original source which in most
cases are the existing and potential customers. Most of the people who collect primary data
Secondary data. Secondary data is the data that has been collected in the past by
someone else but made available for others to use. They are usually once primary data but
become secondary when used by a third party. Secondary data are usually easily accessible
to researchers and individuals because they are mostly shared publicly. This, however,
means that the data are usually general and not tailored specifically to meet the researcher's
needs as primary data does. For example, when conducting a research thesis, researchers
need to consult past works done in this field and add findings to the literature review. Some
other things like definitions and theorems are secondary data that are added to the thesis to
be properly referenced and cited accordingly. Some common sources of secondary data
include trade publications, government statistics, journals, etc. In most cases, these sources
cannot be trusted as authentic.
Example 1: In the Statistics class of 40 students, 3 obtained the perfect score of 50. Sixteen
students got a score of 40 and above, while only 3 got 19 and below. Generally, the
students performed well in the test with 23 or 70% getting a passing score of 38 and above
(Pegollo, 2012).
Example 2: A census of the Philippines placed the population of the
country, as of May 2000, at 76.5 million. This figure is higher by 7.9 million persons than the
1995 census figure of 68.6 million and higher by 15.8 million persons than the 1990 census
count of 60.7 million persons.
Example:
1. Bar chart – the data are presented by drawing bars whose lengths are proportional to
the values they represent; the bars may be drawn horizontally or vertically depending
on the number of categories or groupings of the variable being depicted.
Source: https://www.mathsisfun.com/data/bar-graphs.html
2. Pie chart – a circle is drawn to represent the whole quantity and it is then divided
into segments each of which is proportional to the size of the components.
5. Histogram – a graphical representation of the frequency distribution of a
continuous quantitative variable; a bar is used to depict the counts of each class
or group
Lesson 3. The Frequency Distribution Table
One type of tabular presentation that is very useful in summarizing and organizing
data is the Frequency Distribution Table (FDT). It contains non-overlapping categories or
classes of a variable and the respective frequencies or counts of the observations falling in
each category or class. The frequency distribution is an arrangement of the data which
shows the frequency of different values or groups of values of a variable. It can be
constructed directly from the raw data. The raw data can be scores in ungrouped form
(Amid, 2005).
Example: A class of 30 students was given a Mathematics examination. The grades are
given below. Construct FDT with seven (7) classes.
The following steps are suggested in the construction of a frequency distribution table
[Punzalan & Uriarte (1989), Cruz (2002), Febre Jr. (1987), INSTAT UPLB (n.d.), Amid
(2005)].
2. Compute for the range by getting the difference between the highest and the lowest
scores. It is given by the formula
R = HS – LS = 92 – 65 = 27
3. Determine the number of groups or classes (k) to use. The maximum number of
classes is 15-20 no matter how many observations there are. The ideal number of
class interval is somewhere between 5 and 15.
k = 7 classes (given in the problem)
4. Compute for the class size (class interval size or interval size) denoted by i. Divide
the range by the number of classes to get i. Round it off to the nearest whole
number.
i = R ÷ k = 27 ÷ 7 = 3.857... ~ 4
5. Write the class intervals starting from the lowest score. Take note that we have 7
classes (k) and the lowest score is 65. The first entry in the class interval is 65-68
with an interval size of 4 and the next all have i=4.
LL = lower limit
= 65, 69, 73, 77, 81, 85, 89
UL = upper limit
= 68, 72, 76, 80, 84, 88, 92
6. Count the frequency (f) for each class interval and find the total (N or n). The
frequency is the number of scores that fall within the limits of each class interval.
7. Compute for the lower true class boundary (LTCB) and the upper true class
boundary (UTCB).
LTCB = LL – 0.5 UTCB = UL + 0.5
8. Compute for the midpoint or class mark of each class interval (M or $x_m$). Do not
round off the midpoint.
\[ M = \frac{LL + UL}{2} \]
\[ M = \frac{65 + 68}{2} = \frac{133}{2} \]
\[ M = 66.5 \]
9. Find the cumulative frequency less than (cf<). It should start from the frequency of
the lowest class interval. The lowest class interval is 65-68, so start adding from 4.
The last cumulative value should be equal to N.
10. Find the cumulative frequency greater than (cf>). It should start from the frequency of
the highest class interval. The highest class interval is 89-92, so start adding from 4.
The last cumulative value should be equal to N.
11. Compute for the relative frequency (RF). Round off to the nearest hundredths.
Possible RF total = 99.99, 100 or 100.01
\[ RF = \frac{f}{N} \times 100\% \]
RF = 4 ÷ 30 × 100% = 13.33
RF = 1 ÷ 30 × 100% = 3.33
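The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `frequency_distribution` is chosen here, the scores are whole numbers, and the 30 scores in the list are made up for the sketch (they share the lowest score of 65 and highest score of 92 with the example, but they are not the class data).

```python
import math

def frequency_distribution(scores, k):
    """Build a frequency distribution table with k classes for whole-number scores."""
    n = len(scores)
    lowest, highest = min(scores), max(scores)
    value_range = highest - lowest                    # Step 2: R = HS - LS
    size = max(1, math.floor(value_range / k + 0.5))  # Step 4: i = R / k, nearest whole number
    if lowest + k * size - 1 < highest:
        raise ValueError("k classes of this size do not reach the highest score")
    rows = []
    cf_less = 0
    for j in range(k):
        ll = lowest + j * size                        # Step 5: lower and upper class limits
        ul = ll + size - 1
        f = sum(1 for s in scores if ll <= s <= ul)   # Step 6: frequency of the class
        cf_less += f                                  # Step 9: running total from the lowest class
        rows.append({
            "ll": ll, "ul": ul, "f": f,
            "ltcb": ll - 0.5, "utcb": ul + 0.5,       # Step 7: true class boundaries
            "midpoint": (ll + ul) / 2,                # Step 8: class mark
            "cf_less": cf_less,
            "rf": round(f / n * 100, 2),              # Step 11: relative frequency in percent
        })
    cf_greater = 0
    for row in reversed(rows):                        # Step 10: running total from the highest class
        cf_greater += row["f"]
        row["cf_greater"] = cf_greater
    return rows

# Hypothetical scores, invented for this sketch only.
scores = [65, 66, 67, 68, 70, 71, 72, 73, 74, 75,
          76, 76, 77, 78, 79, 80, 80, 81, 82, 83,
          84, 85, 86, 87, 88, 89, 90, 91, 92, 92]

table = frequency_distribution(scores, 7)
for row in table:
    print(row)
```

With these scores the class limits come out as 65-68, 69-72 and so on up to 89-92, matching the limits listed in step 5, while the frequencies depend on the made-up data.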
Assessment Task 2-1
3. Which do you think is the best method of presenting data? Explain your
answer.
Assessment Task 2-2
Below are Statistics exam scores of 40 BEEd3 students. Follow the steps in
constructing FDT to complete the table. The table should have 6 classes.
Summary
References
MODULE 3
MEASURES OF CENTRAL TENDENCY
Introduction
Descriptive statistics is the term given to the analysis of data that helps
describe, show or summarize data in a meaningful way such that patterns might
emerge from the data (statistics.laerd.com, 2018). It is simply a way to describe data
and does not allow conclusions to be made beyond the data analyzed or conclusions to
be reached regarding any hypotheses made. It is used to present quantitative
descriptions in a manageable form. It helps simplify large amounts of data in a
sensible way (Trochim, 2020). It can be useful to provide basic information about
variables in a data set and highlight potential relationships between variables (Child
Care & Early Education Research Connections, 2019).
A measure of central tendency is a single value that attempts to describe a
set of data by identifying the central position within that set of data
(https://statistics.laerd.com/, 2018). Measures of central tendency are sometimes called
measures of central location (Australian Bureau of Statistics, 2020). A measure of
central position or central tendency is a single figure which is representative of the
general level of magnitudes or values of the items in a set of data. This figure is used to
represent all the numbers in the set of data. When arranged according to magnitude, it
tends to lie centrally within the set (Pagoso & Montaña, 1985). They are also classed as
summary statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as the
median and the mode. The mean, median and mode are all valid measures of central
tendency, but under different conditions, some measures of central tendency become
more appropriate to use than others (statistics.laerd.com, 2018).
Learning Outcomes
Lesson 1. Mean
MEAN. The mean (or average) is the most popular and well-known measure of
central tendency. It can be used with both discrete and continuous data, although its use is
most often with continuous data (https://statistics.laerd.com/, 2018). It is generally described
Mean for Ungrouped Data. This is computed by adding all the values and dividing
the sum by the number of values (Febre, 1987). The formula for the mean is given by:
\[ \bar{x} = \frac{\sum x}{n} \]
Example: Suppose a teacher chooses ten students whose scores in a 30-item test are as
follows: 15, 25, 18, 15, 20, 25, 18, 18, 20, 25. Calculate the mean and interpret
the result.
Solution:
\[ \bar{x} = \frac{\sum x}{n} \]
\[ \bar{x} = \frac{15+25+18+15+20+25+18+18+20+25}{10} \]
\[ \bar{x} = \frac{199}{10} \]
\[ \bar{x} = 19.9 \]
The value of the mean indicates that the group has obtained an average
score of 19.9 or has correctly answered about 20 items out of 30. The
group has answered about 67% (20/30) of the items, a relatively good
result.
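As a quick check of the computation above, the mean of the ten scores can be obtained in Python. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import statistics

# The ten scores from the example (30-item test).
scores = [15, 25, 18, 15, 20, 25, 18, 18, 20, 25]

total = sum(scores)                 # sum of x
n = len(scores)                     # number of values
mean = total / n                    # mean = (sum of x) / n

# The standard library gives the same result.
library_mean = statistics.mean(scores)

print(total, n, mean, library_mean)
```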
If a score occurs more than once, the scores can be listed together with their frequencies,
that is, the number of times each score appears. Instead of adding individual scores, take
the product of each score and the frequency with which it appears, then add the products
(Cruz, 2002). Use the formula:
\[ \bar{x} = \frac{\sum fx}{n} \]
Supplemental Activity: Solve for the mean from the given data: 23, 18, 27, 30, 43, 78.
Answer: $\bar{x} = 36.5$
Example: Suppose a teacher chooses ten students whose scores in a 30-item test are as
follows: 15, 25, 18, 15, 20, 25, 18, 18, 20, 25. Summarize the scores into a table
as follows:
Scores Frequency
15 2
18 3
20 2
25 3
𝑛 = 10
Solution:
\[ \bar{x} = \frac{\sum fx}{n} \]
\[ \bar{x} = \frac{15(2)+18(3)+20(2)+25(3)}{10} \]
\[ \bar{x} = \frac{199}{10} \]
\[ \bar{x} = 19.9 \]
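The same result can be reproduced from the frequency table. The sketch below is illustrative only: it uses `collections.Counter` to tally the raw scores into the table shown above and then applies the formula with the products of score and frequency.

```python
from collections import Counter

scores = [15, 25, 18, 15, 20, 25, 18, 18, 20, 25]

# Tally each distinct score: {15: 2, 18: 3, 20: 2, 25: 3}
frequency = Counter(scores)

n = sum(frequency.values())                          # n = 10
sum_fx = sum(x * f for x, f in frequency.items())    # sum of f times x
mean = sum_fx / n

for x in sorted(frequency):
    print(x, frequency[x])
print(n, sum_fx, mean)
```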
Weighted Mean. The weighted mean considers the proper weights assigned to the
x = mean (x bar )
Example: Here are the grades obtained by a student in the different criteria for grading. The
weight for each criterion is given.
Applying the formula:
\[ \bar{x}_w = \frac{\sum wx}{\sum w} \]
\[ \bar{x}_w = \frac{832}{10} \]
\[ \bar{x}_w = 83.2 \]
This means that, in consideration of all the criteria with their respective weights, the
grade of the student is generally 83.
Supplemental Activity: Solve for the weighted mean of the following data:
x w
8 2
9 3
12 5
14 7
15 8
16 4
18 1
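One way to check a hand computation of the weighted mean is a short Python function. This is a sketch, not a prescribed method; the function name `weighted_mean` is chosen here, and it is applied to the x and w columns of the supplemental activity.

```python
def weighted_mean(values, weights):
    """Return (sum of w times x) divided by (sum of w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    sum_wx = sum(w * x for x, w in zip(values, weights))
    sum_w = sum(weights)
    return sum_wx / sum_w

# x and w columns from the supplemental activity.
x = [8, 9, 12, 14, 15, 16, 18]
w = [2, 3, 5, 7, 8, 4, 1]

print(weighted_mean(x, w))
```

Values with larger weights pull the result toward themselves, which is why the weighted mean generally differs from the plain mean of the x column.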
Mean for Grouped Data. Data which are arranged in a frequency distribution are
called grouped data. When the number of items is too large, it is best to compute for the
measures of central tendency and variability using the frequency distribution (Pagoso &
Montaña, 1985). To compute for the mean of grouped data, determine first the midpoint of
each class interval (Amid, 2005). The formula is given by:
\[ \bar{x} = \frac{\sum fM}{n} \quad \text{or} \quad \bar{x} = \frac{\sum f x_m}{n} \]
where:
x = mean
Example: The following is the distribution of the length of service in years of 50 employees
of United Laboratories Inc. Determine the mean using the given formula.
Solution:
Step 1: Compute for the midpoint of each class interval. Use the formula:
\[ M = \frac{LL + UL}{2} \]
Step 2: Multiply the frequency by the midpoint and get the total ($\sum fM$).
\[ \bar{x} = \frac{\sum fM}{n} \]
\[ \bar{x} = \frac{810}{50} \]
\[ \bar{x} = 16.2 \]
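The two steps can be sketched in Python. The class intervals and frequencies below are the length-of-service distribution of the 50 employees as tabulated in the Mode lesson of this module; the function name `grouped_mean` and the tuple layout are assumptions made for this illustration.

```python
# Each class is (lower limit, upper limit, frequency).
classes = [
    (1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
    (21, 25, 6), (26, 30, 4), (31, 35, 3),
]

def grouped_mean(classes):
    """Mean of grouped data: (sum of f times M) divided by n."""
    n = sum(f for _, _, f in classes)
    # Step 1: midpoint M = (LL + UL) / 2; Step 2: multiply by f and total.
    sum_fm = sum(f * (ll + ul) / 2 for ll, ul, f in classes)
    return sum_fm / n

n = sum(f for _, _, f in classes)
sum_fm = sum(f * (ll + ul) / 2 for ll, ul, f in classes)
print(n, sum_fm, grouped_mean(classes))
```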
Supplemental Activity: Solve for the mean from the data below:
CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2
Answer: $\bar{x} = 20.9$
Lesson 2. Median
MEDIAN. The median is defined as the score-point which divides a ranked
distribution into two equal parts; it is the value below which lies 50% of the data. It is
denoted by 𝑴𝒅𝒏 (Febre, 1987).
Median for Ungrouped Data. Computation of the median for ungrouped data requires
the values to be arranged in the order of magnitude, either in ascending or descending order
(Febre, 1987).
• For data involving an odd number of scores (n = odd), the median is simply
the middle value. For example: if n = 9, the median is the fifth score from
either the lowest or the highest.
• If n = even, there would be two middle values. The median in this case is the
average of these two middle values. For example: if n = 20, the median is the
average of the 10th and 11th scores.
Solution:
1. 6, 8, 15, 18, 23, 24, 42
Since n = 7, the middle score is the 4th score which is 18, therefore the 𝑀𝑑𝑛 = 18.
𝑀𝑑𝑛 = 105
Supplemental Activity: Solve for the median from the given data: 23, 18, 27, 30, 43, 78.
Answer: 𝑀𝑑𝑛 = 28.5
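The rule for odd and even n can be written as a short Python function. This is a minimal sketch with a function name chosen here; Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Median of ungrouped data: middle value, or average of the two middle values."""
    ranked = sorted(values)          # arrange in order of magnitude
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # n is odd: the single middle value
        return ranked[middle]
    # n is even: average of the two middle values
    return (ranked[middle - 1] + ranked[middle]) / 2

print(median([6, 8, 15, 18, 23, 24, 42]))    # n = 7, the 4th score
print(median([23, 18, 27, 30, 43, 78]))      # n = 6, average of the 3rd and 4th scores
```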
Median of Grouped Data (Febre, 1987). The median of grouped data could be
determined by the following formula:
\[ Mdn = LTCB + \left( \frac{\frac{n}{2} - cf_b}{f_{Mdn}} \right) i \]
The median class is the class interval which contains the $\frac{n}{2}$th value. It is the
first class whose value in the cumulative frequency less than (cf<) distribution is equal to
or greater than $\frac{n}{2}$.
Example: Find the median of the frequency distribution of length of service in years of 50
employees of United Laboratories Inc.
16 – 20 13
21 – 25 6
26 – 30 4
31 – 35 3
Solution:
Step 2: To obtain the median class, solve for the $\frac{n}{2}$th value, that is,
\[ \frac{n}{2}\text{th value} = \frac{50}{2}\text{th value} = 25\text{th value} \]
The 25th value is located at the class interval 16 – 20.
Step 3: Since the 25th value has been located, the terms in the formula can now be
enumerated. The median class is 16 – 20.
$\frac{n}{2} = 25$
$LTCB = 15.5$
$cf_b = 24$
$f_{Mdn} = 13$
$i = 5$
\[ Mdn = 15.5 + \left( \frac{25 - 24}{13} \right) 5 \quad \text{(subtract, divide, multiply, add)} \]
\[ Mdn = 15.88 \quad \text{(round off to the nearest hundredths place)} \]
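The grouped-median computation can be sketched in Python as follows. The function name `grouped_median` and the tuple layout are assumptions for illustration; the classes are the length-of-service distribution of the 50 employees, and integer class limits are assumed so that the class size is UL − LL + 1.

```python
# Each class is (lower limit, upper limit, frequency).
classes = [
    (1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
    (21, 25, 6), (26, 30, 4), (31, 35, 3),
]

def grouped_median(classes):
    """Mdn = LTCB + ((n/2 - cf_b) / f_Mdn) * i for grouped data."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_below = 0                          # cumulative frequency below the current class
    for ll, ul, f in classes:
        if cf_below + f >= half:          # first class whose cf< reaches n/2
            ltcb = ll - 0.5               # lower true class boundary
            size = ul - ll + 1            # class interval size i
            return ltcb + (half - cf_below) / f * size
        cf_below += f
    raise ValueError("no classes given")

mdn = grouped_median(classes)
print(mdn)   # about 15.88 when rounded to the nearest hundredths
```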
Supplemental Activity: Solve for the median from the data below:
CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2
Lesson 3. Mode
MODE. The mode is the simplest measure of central tendency. It may be easily
identified by looking at an ungrouped set of scores and locating the score or item which
occurs most frequently. A distribution with only one mode is said to be unimodal while a
distribution with two or more modes is described as multimodal. A distribution which has
two modes is labeled as bimodal; with three modes, as trimodal; and so on (Pagoso &
Montaña, 1985). The mode is denoted by 𝑀𝑜 (Amid, 2005).
Mode for Ungrouped Data. This is done simply through inspection (Febre, 1987).
Look for the item value which occurs the most number of times. That value is the mode.
Solution:
1. 𝑀𝑜 = 8 (since it appears 3 times, more often than any other value)
2. 𝑀𝑜 = 13 and 17 (bimodal)
Supplemental Activity: Solve for the mode from the given data: 23, 18, 27, 30, 43, 78.
Answer: 𝑀𝑜 = 𝑛𝑜𝑛𝑒
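Finding the mode by inspection amounts to counting how often each value occurs. The sketch below uses `collections.Counter`; the function name `modes` is chosen here, and returning an empty list when every value occurs only once is a convention assumed for this illustration (it matches the answer "none" above).

```python
from collections import Counter

def modes(values):
    """Return the value or values that occur most often, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                 # every value appears once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([23, 18, 27, 30, 43, 78]))                     # no value repeats
print(modes([15, 25, 18, 15, 20, 25, 18, 18, 20, 25]))     # scores from the Mean lesson
```

For the ten test scores used in the Mean lesson, both 18 and 25 occur three times, so that set is bimodal.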
Mode for Grouped Data. The mode in a frequency distribution is within the class
interval with the highest frequency. The class interval with the highest frequency is known as
the modal class (Pagoso & Montaña, 1985). The formula is given by:
\[ Mo = LTCB + \left( \frac{d_1}{d_1 + d_2} \right) i \]
where:
$LTCB$ = lower true class boundary of the modal class
$d_1$ = difference between the frequency in the modal class and the
frequency of the preceding class interval
$d_2$ = difference between the frequency in the modal class and the
frequency in the succeeding class interval
$i$ = class interval size
Example: Find the mode of the frequency distribution of length of service in years of 50
employees of United Laboratories Inc.
Solution:
1. The highest frequency is 13 which belongs to the class interval 16 – 20, therefore it is
the modal class.
Length of Service (CI), LL-UL    Number of Employees (f)
1 – 5                            5
6 – 10                           7
11 – 15                          12
16 – 20                          13   (modal class)
21 – 25                          6
26 – 30                          4
31 – 35                          3
                                 𝑛 = 50
2. The preceding class interval is the interval that is lower in value than the modal class
(11 – 15) with a frequency of 12.
The succeeding class interval is the interval with a greater value than the modal
class (21 – 25) with a frequency of 6.
6 – 10     7
11 – 15    12   (preceding class interval)
16 – 20    13   (modal class)
21 – 25    6    (succeeding class interval)
26 – 30    4
31 – 35    3
$d_1 = 13 - 12 = 1$
$d_2 = 13 - 6 = 7$
3. Compute for Mode.
Modal class = 16 – 20
$LTCB$ = LL – 0.5 = 16 – 0.5 = 15.5
$d_1 = 1$
$d_2 = 7$
$i = 5$
\[ Mo = LTCB + \left( \frac{d_1}{d_1 + d_2} \right) i \]
\[ Mo = 15.5 + \left( \frac{1}{1 + 7} \right) 5 \]
\[ Mo = 16.125 \approx 16.13 \]
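The three steps can be sketched in Python as follows. The function name `grouped_mode` is chosen here for illustration. Two assumptions are made that the text does not state: when the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and when several classes tie for the highest frequency, the first one is used.

```python
# Each class is (lower limit, upper limit, frequency).
classes = [
    (1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
    (21, 25, 6), (26, 30, 4), (31, 35, 3),
]

def grouped_mode(classes):
    """Mo = LTCB + (d1 / (d1 + d2)) * i for grouped data."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                             # position of the modal class
    ll, ul, f_modal = classes[m]
    f_before = freqs[m - 1] if m > 0 else 0                 # assumed 0 if there is no preceding class
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0     # assumed 0 if there is no succeeding class
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    if d1 + d2 == 0:
        raise ValueError("the formula is undefined when d1 + d2 is zero")
    ltcb = ll - 0.5                                         # lower true class boundary
    size = ul - ll + 1                                      # class interval size i
    return ltcb + d1 / (d1 + d2) * size

mo = grouped_mode(classes)
print(mo)   # 16.125, which rounds to 16.13 as in the text
```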
Supplemental Activity: Solve for the mode from the data below:
CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2
Answer: 𝑀𝑜 = 15.75
Assessment Task 3-1
Assessment Task 3-2
No. of No. of
Employees Companies
14 – 20 1
21 – 27 3
28 – 34 6
35 – 41 11
42 – 48 8
49 – 55 7
56 – 62 4
Assessment Task 3-3
No. of No. of
Employees Companies
14 – 20 1
21 – 27 3
28 – 34 6
35 – 41 11
42 – 48 8
49 – 55 7
56 – 62 4
No. of No. of
Employees Companies
14 – 20 1
21 – 27 3
28 – 34 6
35 – 41 11
42 – 48 8
49 – 55 7
56 – 62 4
5. Compute for the mode of each of the following sets of data.
Summary
References
+measures+of+central+tendency#:~:text=A%20measure%20of%20central%20te
ndency,or%20centre%20of%20its%20distribution.
• Cruz, C. U. (2002). Statistics. Marikina City. Instructional Coverage System
Publishing Inc.
• Child Care & Early Education Research Connections. (2019). Descriptive
Statistics.
https://www.researchconnections.org/childcare/datamethods/descriptivestats.jsp
• Febre Jr., F. A. (1987). Introductory Statistics. Quezon City. Phoenix Publishing
House, Inc.
• Frequency Distribution. (n.d.).
https://www.emathzone.com/tutorials/basic-statistics/frequency-distribution.html
• Pagoso, C. M., & Montaña, R. A. (1985). Introductory Statistics. Quezon City.
Rex Printing Company, Inc.
• Trochim, W. M. K. (2020, March 10). Descriptive Statistics.
https://conjointly.com/kb/descriptive-statistics/
• Measures of Central Tendency. (2018).
https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php
MODULE 4
MEASURES OF LOCATION
Introduction
A measure of position is a method by which the position that a particular data value
has within a given data set can be identified (http://www.milefoot.com/, n.d.). It is also called
a measure of location, quantiles or fractiles. Fractiles are measures of location or position
which include not only central location but also any position based on the number of equal
divisions in a given distribution. The most commonly used fractiles are the quartiles, deciles,
and percentiles (Yap, 2014). A fractile is the cut off point for a certain fraction of a sample. If
your distribution is known, then the fractile is just the cut-off point where the distribution
reaches a certain probability (Glen, 2017). Fractile computations are related to computing
the median in the sense that both quantities form points of divisions of a distribution,
depending on the number of parts the distribution is to be partitioned (Cruz, 2002).
Learning Outcomes
Lesson 1. Percentile
The 𝑘th percentile, 𝑃𝑘 , can be defined as a value in a data set such that about k% of
the measurements are smaller than the value of 𝑃𝑘 and about (100 – k)% of the
measurements are greater than the value of 𝑃𝑘 (Amid, 2005). The value of the 𝑘th percentile
(𝑃𝑘) is given by:
1. the $\frac{kn}{100}$th value for ungrouped data; and
2. the following formula for grouped data:
\[ P_k = LTCB + \left( \frac{\frac{kn}{100} - cf_b}{f_{P_k}} \right) i \]
where: P = Percentile
k = the value from 1 to 99
LTCB = lower true class boundary of the 𝑃𝑘 class
n = total number of frequencies
𝑐𝑓𝑏 = cumulative frequency below the 𝑃𝑘 class
𝑓𝑃𝑘 = frequency of the 𝑃𝑘 class
𝑖 = class interval size
Example: Ungrouped Data
The following data give the price earnings ratio of 12 companies.
16 38 20 20 18 34 7 58 31 19 22 18
Find the value of the 62nd percentile.
Solution:
Step 1: Rank the given data in increasing order.
7 16 18 18 19 20 20 22 31 34 38 58
Supplemental Activity: Use the given in the example to solve for 𝑷𝟐𝟓 and 𝑷𝟗𝟎 .
Answers: 𝑃25 = 18, 𝑃90 = 38
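The ungrouped-data rule can be sketched in Python. The function name `percentile` is chosen here, and one assumption is made explicit: when kn/100 is not a whole number, the position is rounded up to the next whole number. That convention is inferred from the quartile examples in this module (for instance, the 2.25th value is taken as the 3rd value) and it reproduces the answers given above; other textbooks and software use different conventions.

```python
def percentile(values, k):
    """k-th percentile of ungrouped data as the (k*n/100)-th ranked value, rounded up."""
    ranked = sorted(values)                 # Step 1: rank in increasing order
    n = len(ranked)
    position = -(-k * n // 100)             # ceiling of k*n/100 using integer arithmetic
    position = max(position, 1)             # guard for very small k
    return ranked[position - 1]             # positions are counted from 1

# Price earnings ratios of the 12 companies.
ratios = [16, 38, 20, 20, 18, 34, 7, 58, 31, 19, 22, 18]

print(percentile(ratios, 25))   # 25(12)/100 = 3rd value
print(percentile(ratios, 90))   # 90(12)/100 = 10.8, taken as the 11th value
```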
a.
Length of Service (Class interval), LL-UL    Number of Employees (frequency) f
1 – 5                                        5
6 – 10                                       7
11 – 15                                      12
16 – 20                                      13
21 – 25                                      6
26 – 30                                      4
31 – 35                                      3
Solution:
a. $P_{45}$
Step 2: Compute for the $\frac{kn}{100}$th value and locate the value on the cf< column to find the
$P_{45}$ class. The value of k in the given is 45 and n = 50.
\[ \frac{45n}{100}\text{th value} = \frac{45(50)}{100}\text{th value} = 22.5\text{th value} \]
21 – 25 6 43
26 – 30 4 47
31 – 35 3 50
𝑛 = 50
Step 3: Compute $P_{45}$ with the formula
\[ P_{45} = LTCB + \left( \frac{\frac{45n}{100} - cf_b}{f_{P_{45}}} \right) i \]
\[ P_{45} = 10.5 + \left( \frac{22.5 - 12}{12} \right) 5 \]
\[ P_{45} = 14.88 \]
Supplemental Activity: Use the given in the example to solve for 𝑷𝟔𝟎 and 𝑷𝟗𝟓 .
Answers: 𝑃60 = 17.81, 𝑃95 = 31.33
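The grouped-data formula can be sketched in Python as follows. The function name `grouped_percentile` and the tuple layout are assumptions for illustration, and integer class limits are assumed so that the class size is UL − LL + 1.

```python
# Each class is (lower limit, upper limit, frequency).
classes = [
    (1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
    (21, 25, 6), (26, 30, 4), (31, 35, 3),
]

def grouped_percentile(classes, k):
    """P_k = LTCB + ((k*n/100 - cf_b) / f) * i for grouped data."""
    n = sum(f for _, _, f in classes)
    target = k * n / 100                  # the (kn/100)-th value
    cf_below = 0                          # cumulative frequency below the current class
    for ll, ul, f in classes:
        if cf_below + f >= target:        # first class whose cf< reaches the target
            ltcb = ll - 0.5               # lower true class boundary
            size = ul - ll + 1            # class interval size i
            return ltcb + (target - cf_below) / f * size
        cf_below += f
    raise ValueError("k must be between 1 and 99")

p45 = grouped_percentile(classes, 45)
p60 = grouped_percentile(classes, 60)
p95 = grouped_percentile(classes, 95)
print(p45, p60, p95)
```

With k = 50 the same function gives the grouped median, since the 50th percentile and the median use the same formula with kn/100 = n/2.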
Lesson 2. Quartile and Interquartile Range
Quartiles are three summary measures that divide a ranked data set into four equal parts.
These three measures are the first quartile (𝑄1), the second quartile (𝑄2), and the third
quartile (𝑄3). The second quartile is the
same as the median of a data set. The first quartile is the value of the middle term among
the observations less than the median, and the third quartile is the value of the middle term
among the observations that are greater than the median. The data should be ranked in
increasing order before the quartiles are determined (Amid, 2005).
The difference between the first and the third quartile is called the Interquartile range (IQR).
That is, 𝐼𝑄𝑅 = 𝑄3 − 𝑄1 (Amid, 2005). The interquartile range is a measure of where the
“middle fifty” is in a data set. It is a measure of where the bulk of the values lie. That’s why
it’s preferred over many other measures of spread when reporting things like school
performance or SAT scores (Glen, 2020). This measure is generally more desirable than
the range when the distribution described is markedly truncated or skewed, or when the
median is the only measure of central tendency that is available (Febre, 1987).
The semi-interquartile range (or quartile deviation) is a measure of spread or dispersion. It is
computed as one half the difference between the 75th percentile [often called (Q3)] and the
𝑄3 −𝑄1 𝐼𝑄𝑅
25th percentile (Q1). Thus, 𝑄𝐷 = or 𝑄𝐷 = (Lane, n.d.).
2 2
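The $k$th quartile for ungrouped data is the $\frac{kn}{4}$th value of the ranked data, where k = 1, 2, or 3 and n is the number of observations. In the examples that follow, a fractional position is rounded up to the next whole position.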
Solution:
1. Rank the given data in increasing order. Then calculate the three quartiles as follows:
7 16 18 18 19 20 20 22 31 34 38 58
a. $Q_1$ = ?, $\frac{kn}{4}$th value = ?, n = 12, k = 1
$\frac{kn}{4}$th value $= \frac{1(12)}{4}$th value = 3rd value; the third value is 18; $Q_1 = 18$
b. $Q_2$ = ?, $\frac{kn}{4}$th value = ?, n = 12, k = 2
$\frac{kn}{4}$th value $= \frac{2(12)}{4}$th value = 6th value; the sixth value is 20; $Q_2 = 20$
c. $Q_3$ = ?, $\frac{kn}{4}$th value = ?, n = 12, k = 3
$\frac{kn}{4}$th value $= \frac{3(12)}{4}$th value = 9th value; the ninth value is 31; $Q_3 = 31$
d. $IQR = Q_3 - Q_1$
$IQR = 31 - 18$
$IQR = 13$
2. Rank the given data in increasing order. Then calculate the three quartiles as follows:
24 28 33 35 37 39 47 51 59
a. $Q_1$ = ?, $\frac{kn}{4}$th value = ?, n = 9, k = 1
$\frac{kn}{4}$th value $= \frac{1(9)}{4}$th value = 2.25th value ≈ 3rd value; the third value is 33; therefore, $Q_1 = 33$
b. $Q_2$ = ?, $\frac{kn}{4}$th value = ?, n = 9, k = 2
$\frac{kn}{4}$th value $= \frac{2(9)}{4}$th value = 4.5th value ≈ 5th value; the fifth value is 37; therefore, $Q_2 = 37$
c. $Q_3$ = ?, $\frac{kn}{4}$th value = ?, n = 9, k = 3
$\frac{kn}{4}$th value $= \frac{3(9)}{4}$th value = 6.75th value ≈ 7th value; the seventh value is 47; therefore, $Q_3 = 47$
d. $IQR = Q_3 - Q_1$
$IQR = 47 - 33$
$IQR = 14$
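As a brief worked illustration that is not part of the original examples, the quartile deviation defined earlier in this lesson can be obtained from the two interquartile ranges just computed (the subscripts 1 and 2 simply label the first and second data sets):

```latex
\begin{align*}
QD_1 &= \frac{Q_3 - Q_1}{2} = \frac{31 - 18}{2} = \frac{13}{2} = 6.5 \\
QD_2 &= \frac{Q_3 - Q_1}{2} = \frac{47 - 33}{2} = \frac{14}{2} = 7
\end{align*}
```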
Supplemental Activity
The following data give the average family income (in thousand pesos) in 1985 for each of the 13 regions in the
Philippines: 58, 31, 27, 38, 29, 20, 24, 21, 18, 23, 27, 28, 24. Find:
1. the values of the three quartiles; and
2. the interquartile range.
The value of the $k$th quartile for grouped data is given by:
$$Q_k = LTCB + \left(\frac{\frac{kn}{4} - cf_b}{f_{Q_k}}\right) i$$
where: Q = Quartile
k = the value from 1 to 3
LTCB = lower true class boundary of the $Q_k$ class
n = total number of frequencies
$cf_b$ = cumulative frequency below the $Q_k$ class
$f_{Q_k}$ = frequency of the $Q_k$ class
i = class interval size
Below is the frequency distribution of length of service in years of 50 employees of United
Laboratories Inc. Find the value of the three quartiles and the interquartile range.
Solution:
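a. $Q_1$ = ?
Solve for the $\frac{kn}{4}$th value, k = 1, n = 50:
$$\frac{kn}{4}\text{th value} = \frac{1(50)}{4}\text{th value} = 12.5\text{th value}$$
Solve for $Q_1$: k = 1, n = 50, LTCB = 10.5, $cf_b$ = 12, $f_{Q_1}$ = 12, i = 5
$$Q_1 = LTCB + \left(\frac{\frac{1(n)}{4} - cf_b}{f_{Q_1}}\right) i$$
$$Q_1 = 10.5 + \left(\frac{12.5 - 12}{12}\right) 5$$
$$Q_1 = 10.71$$
b. $Q_2$ = ?
Solve for the $\frac{kn}{4}$th value, k = 2, n = 50:
$$\frac{kn}{4}\text{th value} = \frac{2(50)}{4}\text{th value} = 25\text{th value}$$
Solve for $Q_2$: k = 2, n = 50, LTCB = 15.5, $cf_b$ = 24, $f_{Q_2}$ = 13, i = 5
$$Q_2 = LTCB + \left(\frac{\frac{2(n)}{4} - cf_b}{f_{Q_2}}\right) i$$
$$Q_2 = 15.5 + \left(\frac{25 - 24}{13}\right) 5$$
$$Q_2 = 15.88$$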
c. $Q_3$ = ?
Solve for the $\frac{kn}{4}$th value, k = 3, n = 50:
$$\frac{kn}{4}\text{th value} = \frac{3(50)}{4}\text{th value} = 37.5\text{th value}$$
d. $IQR$ = ?
$IQR = Q_3 - Q_1 = 20.92 - 10.71$
$IQR = 10.21$
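The substitution step for the third quartile is not shown in the solution above. A sketch of it, assuming the same frequency distribution used for the first and second quartiles (so that the 37.5th value falls in the class 21 – 25, with LTCB = 20.5, a cumulative frequency below of 37, a class frequency of 6, and i = 5), is:

```latex
\begin{align*}
Q_3 &= LTCB + \left(\frac{\frac{3n}{4} - cf_b}{f_{Q_3}}\right) i \\
    &= 20.5 + \left(\frac{37.5 - 37}{6}\right) 5 \\
    &\approx 20.92
\end{align*}
```

This agrees with the value of 20.92 used in the interquartile range.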
Supplemental Activity
The following is a distribution for the number of employees in 80 companies belonging to certain industry.
Find:
1. the values of the three quartiles; and
2. the interquartile range.
Lesson 3. Decile
Deciles are similar to quartiles. But while quartiles sort data into four quarters, deciles sort data into ten equal parts, marked by the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th and 100th percentiles. Deciles and decile ranks are used more often in real life than in the classroom. Deciles are commonly used for college admissions and high school rankings (Glen, 2014). They are also used in other fields such as finance and economics (Educba, 2020).
A decile rank assigns a number to a decile:
Decile rank    Percentile
1              10th
2              20th
3              30th
4              40th
5              50th
6              60th
7              70th
8              80th
9              90th
The higher your place in the decile rankings, the higher your overall ranking. For example, if
you were in the 99th percentile for a particular test, that would put you in the decile ranking
of 10. A person who scored very low (say, the 5th percentile) would find themselves in a
decile rank of 1 (Glen, 2014).
Like the quartile and the percentile, the decile is a method that divides data into smaller parts that are easier to measure, analyze, and understand (Educba, 2020).
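The decile ranking described above can be sketched in a few lines of Python. The function name `decile_rank` is hypothetical, and the sketch assumes that a percentile exactly on a boundary (for example the 10th) belongs to the lower rank; the source does not state how boundaries are treated.

```python
import math

def decile_rank(percentile):
    """Map a percentile in (0, 100] to a decile rank from 1 to 10."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in the interval (0, 100]")
    # Assumption: boundary percentiles (10, 20, ...) fall in the lower rank.
    return math.ceil(percentile / 10)

print(decile_rank(99))   # a very high score
print(decile_rank(5))    # a very low score
```

The two calls correspond to the cases in the text: the 99th percentile lands in decile rank 10 and the 5th percentile in decile rank 1.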
1. $\frac{kn}{10}$th value for ungrouped data; and
2. $D_k = LTCB + \left(\frac{\frac{kn}{10} - cf_b}{f_{D_k}}\right) i$ for grouped data
where: D = Decile
k = the value from 1 to 9
LTCB = lower true class boundary of the $D_k$ class
n = total number of frequencies
$cf_b$ = cumulative frequency below the $D_k$ class
$f_{D_k}$ = frequency of the $D_k$ class
i = class interval size
Solution:
1. Rank the given data in increasing order.
7 16 18 18 19 20 20 22 31 34 38 58
a. $D_2$ = ?, $\frac{kn}{10}$th value = ?, n = 12, k = 2
$\frac{kn}{10}$th value $= \frac{2(12)}{10}$th value = 2.4th value ≈ 3rd value; the third value is 18; therefore, $D_2 = 18$
b. $D_7$ = ?, $\frac{kn}{10}$th value = ?, n = 12, k = 7
$\frac{kn}{10}$th value $= \frac{7(12)}{10}$th value = 8.4th value ≈ 9th value; the ninth value is 31; therefore, $D_7 = 31$
Supplemental Activity
The following data give the average family income (in thousand pesos) in 1985 for each of the 13 regions in the
Philippines: 58, 31, 27, 38, 29, 20, 24, 21, 18, 23, 27, 28, 24. Find:
1. $D_4$; and
2. $D_9$.
Answers: 𝐷4 = 24, 𝐷9 = 38
Solution:
a. $D_2$ = ?
Solve for the $\frac{kn}{10}$th value, k = 2, n = 50:
$$\frac{kn}{10}\text{th value} = \frac{2(50)}{10}\text{th value} = 10\text{th value}$$
Length of Service (CI, LL–UL)    No. of Employees (f)    cf<
1 – 5                            5                       5     ($cf_b$)
6 – 10                           7  ($f_{D_2}$)          12    ($D_2$ class; contains the 10th value)
11 – 15                          12                      24
16 – 20                          13                      37
21 – 25                          6                       43
26 – 30                          4                       47
31 – 35                          3                       50
                                 n = 50
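The substitution step for the second decile is not shown in the solution. A sketch of it, assuming the values marked in the table (the class 6 – 10 contains the 10th value, so LTCB = 5.5, the cumulative frequency below is 5, the class frequency is 7, and i = 5), is:

```latex
\begin{align*}
D_2 &= LTCB + \left(\frac{\frac{2n}{10} - cf_b}{f_{D_2}}\right) i \\
    &= 5.5 + \left(\frac{10 - 5}{7}\right) 5 \\
    &\approx 9.07
\end{align*}
```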
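b. $D_7$ = ?
Solve for the $\frac{kn}{10}$th value, k = 7, n = 50:
$$\frac{kn}{10}\text{th value} = \frac{7(50)}{10}\text{th value} = 35\text{th value}$$
Length of Service (CI, LL–UL)    No. of Employees (f)    cf<
1 – 5                            5                       5
6 – 10                           7                       12
11 – 15                          12                      24    ($cf_b$)
16 – 20                          13  ($f_{D_7}$)         37    ($D_7$ class; contains the 35th value)
21 – 25                          6                       43
26 – 30                          4                       47
31 – 35                          3                       50
                                 n = 50
Solve for $D_7$: k = 7, n = 50, LTCB = 15.5, $cf_b$ = 24, $f_{D_7}$ = 13, i = 5
$$D_7 = LTCB + \left(\frac{\frac{7(n)}{10} - cf_b}{f_{D_7}}\right) i$$
$$D_7 = 15.5 + \left(\frac{35 - 24}{13}\right) 5$$
$$D_7 = 19.73$$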
Supplemental Activity: Use the data in the example to solve for $D_4$ and $D_9$.
Answers: $D_4 = 13.83$, $D_9 = 28.00$
Assessment Task 4-1
The following are the average number of minutes required to do an assembly job by
the 15 workers of a manufacturing plant:
77, 85, 63, 54, 62, 78, 80, 48, 63, 79, 69, 55, 63, 78,71.
Find:
a. 𝑃77
b. 𝑄3
c. 𝐷6
Assessment Task 4-2
The following table gives the frequency distribution of the number of orders received
each day during the past 50 days at the office of a mail-order company.
Find:
a. 𝑃77
b. 𝑄3
c. 𝐷6
Summary
References
• Febre Jr., F. A. (1987) Introductory Statistics. Quezon City. Phoenix Publishing
House, Inc.
• Glen, S. (2017, July 31). Fractile: Simple Definition. StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/fractile-simple-definition/
• Glen, S. (2020). What is an Interquartile Range? https://www.statisticshowto.com/probability-and-statistics/interquartile-range/
• Glen, S. (2014, February 21). What is a Decile? https://www.statisticshowto.com/decile/
• Lane, D. (n.d.). Semi-Interquartile Range.
http://davidmlane.com/hyperstat/A48607.html
• Measures of Position. (n.d.). http://www.milefoot.com/math/stat/desc-
positions.htm#:~:text=A%20measure%20of%20position%20is,to%20defining%2
0such%20a%20measure.
• Yap, N. (2014, October 12). Fractiles.
https://www.slideshare.net/nemalynyap/fractiles#:~:text=Fractiles%20are%20me
asures%20of%20location,Q2%2C%20Q3%2C%20and%20Q4.
MODULE 5
MEASURES OF DISPERSION, SKEWNESS AND
KURTOSIS
Introduction
Learning Outcomes
Lesson 1. Variance and Standard Deviation
Variance is the average of the squared deviations (differences) from the mean (Investopedia, 2020). It is a measurement of the spread between numbers in a data set. That is, it measures how far each number in the set is from the mean and therefore from every other number in the set. Variance is calculated by taking the differences between each number in the data set and the mean, squaring the differences to make them positive, and finally dividing the sum of the squares by the number of values in the data set (Hayes, 2019). For sample data, the sum of squares is divided by n − 1 instead of n, as in the formulas that follow. The sample variance is denoted by $s^2$ and the population variance by $\sigma^2$ (Brown, 2019).
The standard deviation is a measure of how spread out numbers are (Math is fun, 2017). It is a statistic that measures the dispersion of a data set relative to its mean and is calculated as the square root of the variance. If the data points are farther from the mean, there is a higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation (Hargrave, 2020). The sample standard deviation is denoted by $s$ and the population standard deviation by $\sigma$ (Brown, 2019).
The formulas for variance and standard deviation are given below (Febre, 1987):
                      Ungrouped data                                      Grouped data
Variance              $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$          $s^2 = \frac{\sum f(x_m - \bar{x})^2}{n - 1}$
Standard Deviation    $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$     $s = \sqrt{\frac{\sum f(x_m - \bar{x})^2}{n - 1}}$
where: $s^2$ = variance
$s$ = standard deviation
$\sum$ = summation
$x$ = raw score
$\bar{x}$ = mean
$n$ = total number of items/frequency
$f$ = frequency
$x_m$ = midpoint/class mark
$x$     $\bar{x}$    $x - \bar{x}$
8       15.2         −7.2
9       15.2         −6.2
10      15.2         −5.2
12      15.2         −3.2
17      15.2         1.8
18      15.2         2.8
18      15.2         2.8
19      15.2         3.8
20      15.2         4.8
21      15.2         5.8
$\sum x = 152$
$n = 10$
Step 3: Get the square of $x - \bar{x}$.
$x$     $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
8       15.2         −7.2             51.84
9       15.2         −6.2             38.44
10      15.2         −5.2             27.04
12      15.2         −3.2             10.24
17      15.2         1.8              3.24
18      15.2         2.8              7.84
18      15.2         2.8              7.84
19      15.2         3.8              14.44
20      15.2         4.8              23.04
21      15.2         5.8              33.64
$\sum x = 152$
$n = 10$
Step 5: Compute the variance and standard deviation.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{217.6}{10 - 1} = \frac{217.6}{9} = 24.18$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{217.6}{10 - 1}} = \sqrt{\frac{217.6}{9}} = \sqrt{24.18} = 4.92$$
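The steps of this example can be checked with a short Python sketch. The function name `sample_variance` is an assumption made for illustration; Python's standard library offers `statistics.variance` and `statistics.stdev`, which also divide by n − 1.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n                                    # compute the mean
    squared_deviations = [(x - mean) ** 2 for x in data]    # deviations, then squares
    return sum(squared_deviations) / (n - 1)                # sum and divide by n - 1

data = [8, 9, 10, 12, 17, 18, 18, 19, 20, 21]
variance = sample_variance(data)
std_dev = math.sqrt(variance)
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

Rounded to two decimals, the results match the values of 24.18 and 4.92 obtained by hand.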
Supplemental Activity: Solve for the variance and standard deviation from the given data: 23, 18, 27,
30, 43, 78.
81 – 89 8
90 – 98 3
Solution:
Step 1: Compute the mean.
CI         $f$    $x_m$    $f x_m$    $\bar{x}$
18 – 26    2      22       44         63.7
27 – 35    3      31       93         63.7
36 – 44    5      40       200        63.7
45 – 53    6      49       294        63.7
54 – 62    10     58       580        63.7
63 – 71    11     67       737        63.7
72 – 80    12     76       912        63.7
81 – 89    8      85       680        63.7
90 – 98    3      94       282        63.7
$n = 60$, $\sum f x_m = 3822$
$$\bar{x} = \frac{3822}{60} = 63.7$$
Step 2: Subtract the mean from each midpoint and get the absolute value $|x_m - \bar{x}|$.
CI         $f$    $x_m$    $f x_m$    $\bar{x}$    $|x_m - \bar{x}|$
18 – 26    2      22       44         63.7         41.7
27 – 35    3      31       93         63.7         32.7
36 – 44    5      40       200        63.7         23.7
45 – 53    6      49       294        63.7         14.7
54 – 62    10     58       580        63.7         5.7
63 – 71    11     67       737        63.7         3.3
72 – 80    12     76       912        63.7         12.3
81 – 89    8      85       680        63.7         21.3
90 – 98    3      94       282        63.7         30.3
$n = 60$, $\sum f x_m = 3822$
Step 4: Multiply the 6th and 7th columns and then get the sum $\sum f(x_m - \bar{x})^2$.
CI         $f$    $x_m$    $f x_m$    $\bar{x}$    $|x_m - \bar{x}|$    $f|x_m - \bar{x}|$    $f(x_m - \bar{x})^2$
18 – 26    2      22       44         63.7         41.7                 83.4                  3477.78
27 – 35    3      31       93         63.7         32.7                 98.1                  3207.87
36 – 44    5      40       200        63.7         23.7                 118.5                 2808.45
45 – 53    6      49       294        63.7         14.7                 88.2                  1296.54
54 – 62    10     58       580        63.7         5.7                  57                    324.9
63 – 71    11     67       737        63.7         3.3                  36.3                  119.79
72 – 80    12     76       912        63.7         12.3                 147.6                 1815.48
81 – 89    8      85       680        63.7         21.3                 170.4                 3629.52
90 – 98    3      94       282        63.7         30.3                 90.9                  2754.27
$n = 60$, $\sum f x_m = 3822$, $\sum f(x_m - \bar{x})^2 = 19434.6$
$$s^2 = \frac{19434.6}{60 - 1} = \frac{19434.6}{59} = 329.4$$
$$s = \sqrt{\frac{19434.6}{60 - 1}} = \sqrt{\frac{19434.6}{59}} = \sqrt{329.4} = 18.15$$
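The grouped-data computation can be sketched in Python as well. The function name `grouped_variance` and the tuple layout `(lower limit, upper limit, frequency)` are assumptions made for this illustration, and the class midpoint is taken as the average of the two class limits.

```python
import math

def grouped_variance(classes):
    """classes: list of (lower_limit, upper_limit, frequency).

    Returns the mean and the sample variance of the grouped data.
    """
    n = sum(f for _, _, f in classes)
    midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]    # class marks x_m
    mean = sum(f * xm for (_, _, f), xm in zip(classes, midpoints)) / n
    sum_sq = sum(f * (xm - mean) ** 2
                 for (_, _, f), xm in zip(classes, midpoints))
    return mean, sum_sq / (n - 1)

scores = [(18, 26, 2), (27, 35, 3), (36, 44, 5), (45, 53, 6), (54, 62, 10),
          (63, 71, 11), (72, 80, 12), (81, 89, 8), (90, 98, 3)]
mean, variance = grouped_variance(scores)
std_dev = math.sqrt(variance)
print(f"mean = {mean:.1f}, variance = {variance:.1f}, sd = {std_dev:.2f}")
```

Because the deviations are squared, the sign of each deviation does not matter, which is why the hand computation can work with absolute values.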
Supplemental Activity: Solve for the variance and standard deviation from the data below:
CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2
Lesson 2. Coefficient of Variation (Febre, 1987)
The coefficient of variation (CV) is a measure of relative dispersion which expresses the standard deviation as a percentage of the mean.
$CV = \frac{s}{\bar{x}} \times 100\%$ for sample data
$CV = \frac{\sigma}{\mu} \times 100\%$ for population data
Example 1: Calculate the coefficient of variation of each of the following samples and interpret the result.
Sample A: 24, 28, 32, 35, 37, 43, 48, 59, 62, 64
Sample B: 212, 218, 220, 223, 234, 238, 245, 258
Solution:
Sample A: Compute the mean and standard deviation first, then calculate the coefficient of variation.
$x$     $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
24      43.2         −19.2            368.64
28      43.2         −15.2            231.04
32      43.2         −11.2            125.44
35      43.2         −8.2             67.24
37      43.2         −6.2             38.44
43      43.2         −0.2             0.04
48      43.2         4.8              23.04
59      43.2         15.8             249.64
62      43.2         18.8             353.44
64      43.2         20.8             432.64
$\sum x = 432$, $\sum (x - \bar{x})^2 = 1889.6$, $n = 10$
$$\bar{x} = \frac{\sum x}{n} = \frac{432}{10} = 43.2$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1889.6}{10 - 1}} = 14.49$$
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{14.49}{43.2} \times 100\% = 33.54\%$$
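Sample B: Compute the mean and standard deviation first, then calculate the coefficient of variation.
$x$     $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
212     231          −19              361
218     231          −13              169
220     231          −11              121
223     231          −8               64
234     231          3                9
238     231          7                49
245     231          14               196
258     231          27               729
$\sum x = 1848$, $\sum (x - \bar{x})^2 = 1698$, $n = 8$
$$\bar{x} = \frac{\sum x}{n} = \frac{1848}{8} = 231$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1698}{8 - 1}} = 15.57$$
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{15.57}{231} \times 100\% = 6.74\%$$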
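A short Python sketch can confirm both coefficients. The function name `coefficient_of_variation` is an assumption made for illustration; `statistics.stdev` from the standard library is used because it computes the sample standard deviation with the n − 1 divisor, as in this module.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

sample_a = [24, 28, 32, 35, 37, 43, 48, 59, 62, 64]
sample_b = [212, 218, 220, 223, 234, 238, 245, 258]

cv_a = coefficient_of_variation(sample_a)
cv_b = coefficient_of_variation(sample_b)
print(f"CV of Sample A = {cv_a:.2f}%")
print(f"CV of Sample B = {cv_b:.2f}%")
```

Although Sample B has the larger standard deviation, its coefficient of variation is much smaller, so Sample A is the more variable of the two relative to its mean.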
𝐶𝐼 𝑓
18 – 26 2
27 – 35 3
36 – 44 5
45 – 53 6
54 – 62 10
63 – 71 11
72 – 80 12
81 – 89 8
90 – 98 3
Solution:
Compute the mean and standard deviation first, then calculate the coefficient of variation.
CI         $f$    $x_m$    $f x_m$    $|x_m - \bar{x}|$    $f|x_m - \bar{x}|$    $f(x_m - \bar{x})^2$
18 – 26    2      22       44         41.7                 83.4                  3477.78
27 – 35    3      31       93         32.7                 98.1                  3207.87
36 – 44    5      40       200        23.7                 118.5                 2808.45
45 – 53    6      49       294        14.7                 88.2                  1296.54
54 – 62    10     58       580        5.7                  57                    324.9
63 – 71    11     67       737        3.3                  36.3                  119.79
72 – 80    12     76       912        12.3                 147.6                 1815.48
81 – 89    8      85       680        21.3                 170.4                 3629.52
90 – 98    3      94       282        30.3                 90.9                  2754.27
$n = 60$, $\sum f x_m = 3822$, $\sum f(x_m - \bar{x})^2 = 19434.6$
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{18.15}{63.7} \times 100\% = 28.49\%$$
With the use of the standard deviation it is possible to obtain a measure of skewness which indicates both the direction and the magnitude of skewness of a frequency distribution. It is called the Pearsonian coefficient of skewness (Sk), given by the formula (Amid, 2005):
$$Sk = \frac{3(\bar{x} - Mdn)}{s}$$
The algebraic sign of the value of Sk indicates the direction of skewness, while the magnitude of the value of Sk indicates the extent to which the curve is skewed. The value of Sk is positive if the mean is greater than the median, negative if the mean is less than the median, and zero if they are equal. The curve is bell-shaped and symmetrical when Sk = 0. As a rule, the closer the coefficient of skewness is to zero, the less skewed the distribution will be, and the farther it is from zero, the more skewed the distribution will be (Amid, 2005).
If:
Example 1: Find the measure of skewness given the following set of data: 24, 28, 32, 35, 37,
43, 48, 59, 62, 64. Interpret the result.
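Solution: Solve for the mean, median, and standard deviation, then for the skewness.
$x$     $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
24      43.2         −19.2            368.64
28      43.2         −15.2            231.04
32      43.2         −11.2            125.44
35      43.2         −8.2             67.24
37      43.2         −6.2             38.44
43      43.2         −0.2             0.04
48      43.2         4.8              23.04
59      43.2         15.8             249.64
62      43.2         18.8             353.44
64      43.2         20.8             432.64
$\sum x = 432$, $\sum (x - \bar{x})^2 = 1889.6$, $n = 10$
$$\bar{x} = \frac{\sum x}{n} = \frac{432}{10} = 43.2$$
$$Mdn = \frac{37 + 43}{2} = 40$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1889.6}{10 - 1}} = 14.49$$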
Example 2: Find the measure of skewness given the following set of data: 212, 218, 220,
223, 234, 238, 245, 258. Interpret the result.
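Solution: Solve for the mean, median, and standard deviation, then for the skewness.
$x$     $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
212     231          −19              361
218     231          −13              169
220     231          −11              121
223     231          −8               64
234     231          3                9
238     231          7                49
245     231          14               196
258     231          27               729
$\sum x = 1848$, $\sum (x - \bar{x})^2 = 1698$, $n = 8$
$$\bar{x} = \frac{\sum x}{n} = \frac{1848}{8} = 231$$
$$Mdn = \frac{223 + 234}{2} = 228.5$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1698}{8 - 1}} = 15.57$$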
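The final substitution into the skewness formula is not shown for Examples 1 and 2. A minimal Python sketch, assuming the Pearsonian formula Sk = 3(mean − median)/s given in this lesson and a hypothetical function name `pearson_skewness`, is:

```python
import statistics

def pearson_skewness(data):
    """Pearsonian coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)          # sample standard deviation (n - 1 divisor)
    return 3 * (mean - median) / s

example_1 = [24, 28, 32, 35, 37, 43, 48, 59, 62, 64]
example_2 = [212, 218, 220, 223, 234, 238, 245, 258]

sk_1 = pearson_skewness(example_1)
sk_2 = pearson_skewness(example_2)
print(f"Example 1: Sk = {sk_1:.2f}")
print(f"Example 2: Sk = {sk_2:.2f}")
```

In both data sets the mean is greater than the median, so both coefficients are positive and the distributions are skewed to the right; the first is farther from zero and therefore the more skewed of the two.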
𝐶𝐼 𝑓
18 – 26 2
27 – 35 3
36 – 44 5
45 – 53 6
54 – 62 10
63 – 71 11
72 – 80 12
81 – 89 8
90 – 98 3
$$Mdn = LTCB + \left(\frac{\frac{n}{2} - cf_b}{f_{Mdn}}\right) i = 62.5 + \left(\frac{30 - 26}{11}\right) 9 = 65.77$$
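The skewness of this grouped distribution can then be sketched by substituting into the Pearsonian formula, assuming the mean of 63.7 and the standard deviation of 18.15 obtained for the same distribution in Lesson 1:

```latex
\begin{align*}
Sk &= \frac{3(\bar{x} - Mdn)}{s} \\
   &= \frac{3(63.7 - 65.77)}{18.15} \\
   &= \frac{-6.21}{18.15} \approx -0.34
\end{align*}
```

Since the mean is less than the median, the coefficient is negative and the distribution is skewed to the left.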
Curves of distributions having the same coefficient of skewness may still differ significantly. Symmetrical curves may vary in shape, and this may be because the curves do not have the same peakedness, a property of curves which can be described by computing the value called the measure of kurtosis (Febre, 1987).
The kurtosis of a set of data is obtained by dividing the fourth moment about the mean by the square of the variance (Febre, 1987).
$K = \frac{\sum (x - \bar{x})^4}{n s^4}$ for ungrouped data (Febre, 1987)
$K = \frac{\sum f(x_m - \bar{x})^4}{n s^4}$ for grouped data (Febre, 1987)
If K = 3, mesokurtic;
K > 3, leptokurtic;
K < 3, platykurtic.
Example 1: Find the measure of kurtosis given the following set of data: 24, 28, 32, 35, 37,
43, 48, 59, 62, 64. Interpret the result.
Solution: Solve for the mean and standard deviation, then for the kurtosis. Note that $(x - \bar{x})^4 = ((x - \bar{x})^2)^2$.
$x$     $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$    $(x - \bar{x})^4$
24      43.2         −19.2            368.64               135895.45
28      43.2         −15.2            231.04               53379.48
32      43.2         −11.2            125.44               15735.19
35      43.2         −8.2             67.24                4521.22
37      43.2         −6.2             38.44                1477.63
43      43.2         −0.2             0.04                 0
48      43.2         4.8              23.04                530.84
59      43.2         15.8             249.64               62320.13
62      43.2         18.8             353.44               124919.83
64      43.2         20.8             432.64               187177.37
$\sum x = 432$, $\sum (x - \bar{x})^2 = 1889.6$, $\sum (x - \bar{x})^4 = 585957.14$, $n = 10$
$$K = \frac{\sum (x - \bar{x})^4}{n s^4} = \frac{585957.14}{(10)(14.49)^4} = 1.33$$
Since K < 3, the distribution is PLATYKURTIC.
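The kurtosis formula for ungrouped data can be sketched in Python as follows. The function name `kurtosis` is an assumption made for illustration, and the sketch follows the formula of this lesson, with the sample variance (n − 1 divisor) in the denominator; statistical packages often use other definitions, so their output may differ.

```python
def kurtosis(data):
    """K = sum((x - mean)^4) / (n * s^4), with s^2 the sample variance."""
    n = len(data)
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance s^2
    fourth_powers = sum((x - mean) ** 4 for x in data)  # sum of (x - mean)^4
    return fourth_powers / (n * s2 ** 2)                # s^4 = (s^2)^2

data = [24, 28, 32, 35, 37, 43, 48, 59, 62, 64]
k = kurtosis(data)
print(f"K = {k:.2f}")
print("platykurtic" if k < 3 else "leptokurtic" if k > 3 else "mesokurtic")
```

The value is well below 3, which agrees with the platykurtic interpretation of the example.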
𝐶𝐼 𝑓
18 – 26 2
27 – 35 3
36 – 44 5
45 – 53 6
54 – 62 10
63 – 71 11
72 – 80 12
81 – 89 8
90 – 98 3
Solution: Solve for the mean and standard deviation, then for the kurtosis.
CI    $f$    $x_m$    $f x_m$    $\bar{x}$    $|x_m - \bar{x}|$    $(x_m - \bar{x})^2$    $f(x_m - \bar{x})^2$    $f(x_m - \bar{x})^4$
Assessment Task 5-1
18 19 23 7 20 21 24 26 18 22
Find:
1. Variance
2. Standard deviation
3. Coefficient of variation
4. Skewness (interpret the result)
5. Kurtosis (interpret the result)
Assessment Task 5-2
CI f
21-30 8
31-40 11
41-50 15
51-60 18
61-70 20
71-80 12
81-90 9
91-100 7
Find:
1. Variance
2. Standard deviation
3. Coefficient of variation
4. Skewness (interpret the result)
5. Kurtosis (interpret the result)
Summary
• Variance is the average of the squared deviations (differences) from the mean.
• The standard deviation is calculated as the square root of variance by
determining each data point's deviation relative to the mean.
• The coefficient of variation (CV) is one type of measure of relative dispersion
which expresses the standard deviation as a percentage of the mean.
• Skewness is the degree of distortion from the symmetrical bell curve or the normal
distribution. It measures the lack of symmetry in a data distribution.
• Kurtosis (K) is the measure of peakedness or flatness of a data distribution
relative to a normal distribution.
References
andard%20deviation%20is%20a,square%20root%20of%20the%20variance.&tex
t=If%20the%20data%20points%20are,the%20higher%20the%20standard%20de
viation.
• Hayes, A. (2019, September 2). Variance.
https://www.investopedia.com/terms/v/variance.asp
• Faculty of the Institute of Statistics. (n.d.). Workbook in Statistics 1. UP Los
Banos, College Laguna 4031
• Finance Train. (2020). Interpretation of Skewness, Kurtosis, Coskewness,
Cokurtosis. https://financetrain.com/interpretation-of-skewness-kurtosis-
coskewness-cokurtosis/
• Investopedia. (2020, July 14). Standard Deviation vs. Variance: What’s the
Difference? https://www.investopedia.com/ask/answers/021215/what-difference-
between-standard-deviation-and-variance.asp
• Measures of Dispersion. (n.d.). https://www.toppr.com/guides/business-
mathematics-and-statistics/measures-of-central-tendency-and-
dispersion/measure-of-
dispersion/#:~:text=As%20the%20name%20suggests%2C%20the,the%20distrib
ution%20of%20the%20observations.
• Standard Deviation and Variance. (2017).
https://www.mathsisfun.com/data/standard-
deviation.html#:~:text=The%20Standard%20Deviation%20is%20a%20measure
%20of%20how%20spread%20out%20numbers%20are.&text=The%20formula%
20is%20easy%3A%20it,square%20root%20of%20the%20Variance.
• Statistics Solutions. (2020). Dispersion.
https://www.statisticssolutions.com/dispersion/
• Stat Trek. (2020). How to Measure Variability. https://stattrek.com/descriptive-
statistics/variability.aspx#:~:text=Statisticians%20use%20summary%20measure
s%20to,%2C%20variance%2C%20and%20standard%20deviation.