8-MC 107-Elementary Stat and Probability-Prelims

Uploaded by

Leron Jayvee
Copyright: © All Rights Reserved

Elementary Statistics and Probability

Rose Nannette J. San Juan


Table of Contents

Module 1: Introduction to Statistics

Introduction
Learning Outcomes
Lesson 1. Definition of Statistics
Lesson 2. Basic Terms in Statistics
Lesson 3. Areas of Statistics
Lesson 4. Types of Variables
Lesson 5. Summation Notation
Assessment Task
Summary
References

Module 2: Collection and Presentation of Data

Introduction
Learning Outcomes
Lesson 1. Methods of Collecting Data
Lesson 2. Methods of Presenting Data
Lesson 3. The Frequency Distribution Table
Assessment Task
Summary
References

Module 3: Measures of Central Tendency

Introduction
Learning Outcomes
Lesson 1. Mean
Lesson 2. Median
Lesson 3. Mode
Assessment Task
Summary
References

Module 4: Measures of Location

Introduction
Learning Outcomes
Lesson 1. Percentile
Lesson 2. Quartile and Interquartile Range
Lesson 3. Decile
Assessment Task
Summary
References

Module 5: Measures of Variation, Skewness and Kurtosis

Introduction
Learning Outcomes
Lesson 1. Variance and Standard Deviation
Lesson 2. Coefficient of Variation
Lesson 3. Measure of Skewness
Lesson 4. Measure of Kurtosis
Assessment Task
Summary
References

Course Code: MC107

Course Description: The course equips the students with the basic statistical
tools to understand various phenomena. The topics on mean, variance, sampling
and estimation eventually allow the students to be able to perform hypothesis
testing on real-life problems from different fields. The course includes applications
and data analysis with computations carried out using SPSS.

Course Intended Learning Outcomes (CILO):


At the end of the course, students should be able to:
1. Define terms used in statistics;
2. Identify various methods of data collection and presentation;
3. Categorize appropriate numerical descriptive measures for
ungrouped and grouped data;
4. Solve for the normal probability distribution;
5. Formulate hypotheses; and
6. Perform hypothesis testing.

Course Requirements:
▪ Assessment Tasks - 60%
▪ Major Exams - 40%
_________
Periodic Grade 100%

MIDTERM GRADE = 60% (Activity) + 40% (Midterm exam)

FINAL GRADE = 30% (Midterm Grade) + 70% [60% (Activity) + 40% (Final exam)]
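
The two grading formulas above can be checked with a short computation. The sketch below is only an illustration: the function names and the sample scores (85, 80, 90, 75) are hypothetical and do not come from the course guide.

```python
def periodic_grade(activity: float, exam: float) -> float:
    """Periodic grade: 60% assessment tasks (activity) + 40% major exam."""
    return 0.60 * activity + 0.40 * exam


def final_grade(midterm_grade: float, activity: float, final_exam: float) -> float:
    """Final grade: 30% midterm grade + 70% of the final-period grade."""
    return 0.30 * midterm_grade + 0.70 * periodic_grade(activity, final_exam)


# Hypothetical scores, for illustration only
midterm = periodic_grade(activity=85, exam=80)
final = final_grade(midterm, activity=90, final_exam=75)

print(round(midterm, 2))  # 83.0
print(round(final, 2))    # 83.7
```

With these assumed scores, the midterm grade is 0.60(85) + 0.40(80) = 83 and the final grade is 0.30(83) + 0.70(84) = 83.7.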


MODULE 1
INTRODUCTION TO STATISTICS

Introduction

Statistics as a discipline is as old as human society. In ancient times, it was used to
provide information pertaining to taxes, soldiers, agricultural crops and even athletic
endeavors. It also developed as a science partly due to man’s propensity for gambling, an
inclination that led to the early development of probability theory. Gamblers of the time
asked mathematicians to devise optimum techniques for various games of chance so that
they could win more often (Punzalan & Uriarte, 1989).

The study of Statistics means different things to different people. Staff from
PAGASA report daily weather statistics, sportscasters give halftime statistics, and market
reporters give the prices of prime commodities and analyze actual price increases.
Mathematicians describe statistics as a major area of mathematics, and researchers
discuss the appropriate statistics for analyzing the results of a particular
investigation (Amid, 2005).

Decisions, whether personal, business-related or of some other kind, are made every
day, usually under conditions of uncertainty. Many times, situations or problems
faced in the real world have no precise or definite solution. Statistical methods help to
make scientific and intelligent decisions in such situations. Decisions made by using
statistical methods are called educated guesses. Decisions made without using
statistical (or scientific) methods are pure guesses and may prove to be unreliable (My
Digital Backpack, 2020). To avoid unreliability, follow the step-by-step Statistical
Method: 1) carefully define the situation and determine what you want to know; 2)
gather data using a sampling method; 3) analyze and accurately summarize the
data; and 4) derive and communicate meaningful conclusions (Elcamino, 2020).

Government agencies gather data through a survey called a census. A census
is an official, usually periodic, count of the population that records the number of persons
staying in each household, their ages, gender, civil status and monthly income, the type of
house, and whether they own or rent it. The data collected are organized and interpreted
according to the purposes for which the census is made. Various government agencies
use these data in planning the allocation of funds for housing, school buildings and
books, employment opportunities, medical services and many others (Cruz, 2002).

Statistics is a branch of mathematics that can be used for many other purposes.

It can give a precise description of data. This function of statistics enables
researchers to make accurate statements or judgments about averages, variability and
relationships. Example: describing the academic performance of a group of pupils
according to the computed mean, standard deviation, and correlation with another
factor.

It can predict the behavior of individuals. To measure the success of a pupil,
teacher or worker, the researcher may have to compute measures such as the mean,
standard scores, percentiles and stanines. Example: grades of students can be predicted
through a scholastic aptitude test, work performance through an aptitude test related to
the particular type of work, and a teacher’s performance through a teacher aptitude test
or psychological test.

It can be used to test a hypothesis. Relationships between variables can be
determined through a test of inference, such as a test of correlation. Other statistical
tests that can be applied for inferential purposes are the t-test, chi-square test, F-test
and others. The choice of statistic to use in testing a hypothesis depends upon the nature
of the data: the scale of measurement used (nominal, ordinal, interval or ratio), whether
the data are normally distributed, and other considerations depending on the purpose
(Punzalan & Uriarte, 1989).

Learning Outcomes

At the end of this module, students should be able to:

1. Define terms used in statistics;
2. Distinguish qualitative from quantitative variables and discrete from continuous
variables; and
3. Evaluate expressions written in summation notation.

Lesson 1. Definition of Statistics

The word statistics has different meanings. In its more common usage, it refers to numerical
facts or recorded data. The income of a family, the number of cars sold at a dealership
during past months, the number of employees of a company, the number of students
enrolled in a class, the starting salary of a typical college graduate, the number of patients
visiting a clinic and the number of hospital-acquired infections per month are examples of
statistics in this sense. Statistics is also a way of collecting and displaying
information (Amid, 2003).

As a field or discipline of study, statistics is a group of methods that are used to
collect, organize, present, analyze, and interpret data to make decisions (Amid, 2003). In
general usage, statistics is the science that deals with the collection, organization, analysis,
interpretation and presentation of information that can be applied numerically (Febre Jr.,
1987).

Lesson 2. Basic Terms in Statistics


1. Variable. It refers to a characteristic or property whereby the members of the group
or set vary or differ from one another. It is a characteristic under study that assumes
different values for different elements. In contrast to a variable, the value of a
constant is fixed (Amid, 2003).
2. Population. It is a collection, or set, of individuals, objects or events whose
properties are to be analyzed; the complete collection of individuals or objects that
are of interest to the data collector and researcher (Rafeedali, n.d.).
3. Target Population is the group from which representative information is desired and to
which inferences will be made. It is the population being studied (Amid, 2003).
Example: all 6 to 10-year-old children in Barangay Mayumi
4. Sampling Population is the population from which a sample will actually be taken.
Example: only 6 to 10-year-old children who are in school, disregarding those who are
out of school
5. Sample is any subset of a population, that is, the individuals, objects or
measurements selected by the sample collector from the population (INSTAT UPLB,
n.d.). It is a portion of the population selected for study (Amid, 2003).
6. Representative sample is a sample that represents the characteristics of the
population as closely as possible (Amid, 2003).
7. Random sample is a sample drawn in such a way that each element of a population
has an equal chance of being selected (Amid, 2003).
8. Datum refers to the value of the variable associated with one element of a
population or sample. This value may be a number, a word, or a symbol
(https://quizlet.com/, 2020).
9. Data (plural) is the set of values collected for the variable from each of the elements
belonging to the sample (https://quizlet.com/, 2020). These are numbers or
measurements that are collected as a result of observation, interview,
questionnaire, experimentation, test and so forth (Amid, 2003).
10. Parameter is a numerical value summarizing all the data of an entire population
(https://www2.southeastern.edu/, 2020).
11. Statistic is a numerical value summarizing the sample data
(https://www2.southeastern.edu/, 2020).

NOTE: Parameters are fixed in value, whereas statistics vary in value.
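
The note above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population of 1,000 scores is randomly generated and purely hypothetical, and the sample size of 30 is an arbitrary choice. The population mean (a parameter) is computed once and stays fixed, while the mean of each random sample (a statistic) changes from sample to sample.

```python
import random
import statistics

random.seed(107)  # fixed seed so that the run is repeatable

# Hypothetical population: 1,000 exam scores between 60 and 100
population = [random.randint(60, 100) for _ in range(1000)]

# Parameter: summarizes the entire population, so it has a single fixed value
parameter = statistics.mean(population)

# Statistics: each one summarizes a different random sample of 30 elements
sample_means = [
    statistics.mean(random.sample(population, 30)) for _ in range(5)
]

print("Parameter (population mean):", parameter)
print("Statistics (sample means):", sample_means)
```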

Lesson 3. Areas of Statistics (INSTAT UPLB, n.d.)

Descriptive Statistics. It involves methods of organizing, summarizing and presenting
data by using tables, graphs and summary measures.

Inferential Statistics. It is concerned with making generalizations about the characteristics
of a larger set when only a part of it (called a sample) is examined. It consists of
methods that use sample results to help make predictions.

Lesson 4. Types of Variables

Qualitative, or Attribute, or Categorical Variable. It is a variable that describes or
categorizes an element of a population (Quizlet, 2020). This is classified according to the
scale of measurement used in data collection. These are the nominal, ordinal, interval and
ratio scales. Example: Four patients were surveyed for gender, educational attainment, and
level of satisfaction.

Quantitative or Numerical Variable. It is a variable that quantifies an element of a
population (Quizlet, 2020). This variable can either be discrete or continuous. Example:
the total cost of textbooks purchased by each student for this semester’s class (P238.87,
P94.57, P139.24). The average cost is P157.56.

NOTE: Arithmetic operations, such as addition and averaging, are meaningful for data
resulting from a quantitative variable (Melek, 2020).
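
As a quick check of the textbook example, the average can be recomputed from the three costs given in the text. This is only a sketch; the variable names are illustrative.

```python
# Textbook costs from the example, in pesos
costs = [238.87, 94.57, 139.24]

# Averaging is meaningful here because cost is a quantitative variable
average = sum(costs) / len(costs)

print(round(average, 2))  # 157.56
```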

CLASSIFICATION OF QUANTITATIVE VARIABLES (Quizlet, 2020)

Discrete variable. It is a variable that can assume a countable number of values, using
integers, that is, whole numbers (e.g., 0, 1, 2).
Example:

• Number of children per household
• Length of hospital stay
• Hospital bed capacity
• No. of students per block

Continuous variable. It is a variable that can assume an uncountable number of values.
Example:

• Age
• Weight
• Height
• Time

NOTE: Statisticians often treat discrete variables as continuous variables.

CLASSIFICATIONS OF QUALITATIVE VARIABLES (Radford, 2020)

Nominal variable. It is a variable that categorizes, describes or names an element of a
population. Example: Hair colors (brown, black, gray, blonde)
Ordinal variable. It is a variable that arranges data in an ordered position, or ranking.
Example: Level of satisfaction (very satisfied, satisfied, dissatisfied, very dissatisfied)
Interval variable. It is a variable for which the exact distance between two categories
can be determined, but the zero point is arbitrary. Example: Temperature; 0°C does not
mean the absence of any temperature at all; it is simply a reference point for purposes of
measurement.
Ratio variable. It is a variable whose scale of measurement is similar to the interval
scale, but the zero point is fixed. Example: Weight, which is measured in kilograms.
A zero weight always means the absence of any weight.

Lesson 5. Summation Notation (Amid, 2003)

Summation notation is used to denote the sum of values. The uppercase Greek letter
Σ (pronounced sigma) is used to denote the sum of all values. Using this notation, the
sum of five values x1, x2, x3, x4 and x5 can be written as follows:

$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$

The notation on the left side of this expression represents the sum of all the values of x.
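
The index i in the notation simply walks through the values one at a time. A minimal sketch of that idea follows; the five values are hypothetical, chosen only for illustration.

```python
# Hypothetical values x1, x2, x3, x4, x5
x = [3, 8, 5, 2, 7]

total = 0
for i in range(1, 6):   # i = 1, 2, 3, 4, 5, as in the limits of the summation
    total += x[i - 1]   # Python lists start at index 0, so x_i is x[i - 1]

print(total)  # 25
```

The built-in `sum(x)` gives the same result; the explicit loop is shown only to mirror the lower and upper limits of the summation.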

Example 1: Suppose the ages of four managers are 35, 47, 28 and 60 years.
Find:
a) $\sum x$   b) $\sum (x - 6)$   c) $\sum x^2$   d) $(\sum x)^2$

Solution: Let x1, x2, x3 and x4 be the ages (in years) of the first, second, third and fourth
manager, respectively. Then,
x1 = 35, x2 = 47, x3 = 28 and x4 = 60

a.) $\sum x = x_1 + x_2 + x_3 + x_4 = 35 + 47 + 28 + 60 = 170$

b.)
$\sum (x - 6) = (x_1 - 6) + (x_2 - 6) + (x_3 - 6) + (x_4 - 6)$
$\sum (x - 6) = (35 - 6) + (47 - 6) + (28 - 6) + (60 - 6)$
$\sum (x - 6) = 29 + 41 + 22 + 54$
$\sum (x - 6) = 146$

c.)
$\sum x^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2$
$\sum x^2 = (35)^2 + (47)^2 + (28)^2 + (60)^2$
$\sum x^2 = 1225 + 2209 + 784 + 3600$
$\sum x^2 = 7818$

d.) Note that $(\sum x)^2$ is the square of the sum of all x values.
$\sum x = 170$ (solved in letter a.)

Thus, $(\sum x)^2 = (170)^2 = 28\,900$
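
The four sums in Example 1 can be verified with a few lines of code. This is a sketch for checking the arithmetic only; the variable names are illustrative. It also shows why the sum of squares and the square of the sum must not be confused.

```python
ages = [35, 47, 28, 60]

sum_x = sum(ages)                          # sum of x
sum_x_minus_6 = sum(x - 6 for x in ages)   # sum of (x - 6)
sum_of_squares = sum(x ** 2 for x in ages) # square each value, then add
square_of_sum = sum(ages) ** 2             # add the values, then square

print(sum_x)           # 170
print(sum_x_minus_6)   # 146
print(sum_of_squares)  # 7818
print(square_of_sum)   # 28900
```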

Example 2: The following table lists four pairs of m and f values:

m   12   15   20   30
f    5    9   10   16

Compute the following:

a) $\sum m$   b) $\sum f^2$   c) $\sum mf$   d) $\sum m^2 f$   e) $\sum (m - 5)^2 f$

Solution: m1 = 12, m2 = 15, m3 = 20, m4 = 30
f1 = 5, f2 = 9, f3 = 10, f4 = 16

a.) $\sum m = m_1 + m_2 + m_3 + m_4 = 12 + 15 + 20 + 30 = 77$

b.)
$\sum f^2 = f_1^2 + f_2^2 + f_3^2 + f_4^2$
$\sum f^2 = (5)^2 + (9)^2 + (10)^2 + (16)^2$
$\sum f^2 = 25 + 81 + 100 + 256$
$\sum f^2 = 462$

c.)
$\sum mf = m_1 f_1 + m_2 f_2 + m_3 f_3 + m_4 f_4$
$\sum mf = (12)(5) + (15)(9) + (20)(10) + (30)(16)$
$\sum mf = 60 + 135 + 200 + 480$
$\sum mf = 875$

d.)
$\sum m^2 f = m_1^2 f_1 + m_2^2 f_2 + m_3^2 f_3 + m_4^2 f_4$
$\sum m^2 f = (12)^2 (5) + (15)^2 (9) + (20)^2 (10) + (30)^2 (16)$
$\sum m^2 f = 720 + 2025 + 4000 + 14400$
$\sum m^2 f = 21145$
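
The sums in Example 2 can be checked the same way by pairing each m with its f. This is a sketch with illustrative variable names. Part (e) is not worked out in the solution above; the last line applies the same pattern to it, giving 245 + 900 + 2250 + 10000 = 13395.

```python
m = [12, 15, 20, 30]
f = [5, 9, 10, 16]

sum_m = sum(m)                                             # a) 77
sum_f_sq = sum(fi ** 2 for fi in f)                        # b) 462
sum_mf = sum(mi * fi for mi, fi in zip(m, f))              # c) 875
sum_m_sq_f = sum(mi ** 2 * fi for mi, fi in zip(m, f))     # d) 21145
sum_dev_sq_f = sum((mi - 5) ** 2 * fi for mi, fi in zip(m, f))  # e) 13395

print(sum_m, sum_f_sq, sum_mf, sum_m_sq_f, sum_dev_sq_f)
```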

Assessment Task 1-1

Describe the nature of statistics. With your knowledge of statistics in this sense, how
do you apply it in a real-life situation? Illustrate your answer in 5 sentences.

Assessment Task 1-2

Identify the word/s described by the following statements.

1. It is a variable that arranges data in an ordered position, or ranking.
2. This variable can either be discrete or continuous.
3. It is a group from which representative information is desired and to which
inferences will be made.
4. It is the complete collection of individuals or objects that are of interest to the
data collector and researcher.
5. It is the area of statistics which involves data collection, presentation and
description of sample data.
6. _____ is the kind of data that is collected directly from the data source without
going through any existing sources.
7. It is the science that deals with the collection, organization, analysis,
interpretation and presentation of information that can be applied numerically.
8. It is a characteristic of interest about each element of a population or sample.
9. It is a variable that describes or categorizes an element of a population.
10. A variable that can assume an uncountable number of values is called _____.

Answers:

1. 6.

2. 7.

3. 8.

4. 9.

5. 10.

Assessment Task 1-3

The following table lists five pairs of x and y values.

x    4   18   25    9   20
y   12    5   14    7    8

Compute:
a.) $\sum x$

b.) $\sum (y - 1)$

c.) $\sum xy$

d.) $(\sum x)^2$

e.) $\sum y^2$

Assessment Task 1-4

Indicate whether each of the following variables is quantitative or qualitative. Classify
quantitative variables as discrete or continuous. Choose the letter of the correct
answer from the choices below.

A. Qualitative variable
B. Quantitative – discrete variable
C. Quantitative – continuous variable

1. Brand of shampoo 1. ______
2. ID number 2. ______
3. Marital status 3. ______
4. Number of years in service 4. ______
5. Rank of teachers 5. ______
6. Height of a building 6. ______
7. Student number 7. ______
8. Number of provinces in the Philippines 8. ______
9. IQ level 9. ______
10. Name of car companies 10. ______

Summary

• Statistics is the science that deals with the collection, organization, analysis,
interpretation and presentation of information that can be applied numerically.
• Population is a collection, or set, of individuals, objects or events whose
properties are to be analyzed; the complete collection of individuals or objects
that are of interest to the data collector and researcher.
• Sample is any subset of a population which simply means that it could be the
individuals, objects or measurements selected by the sample collector from the
population.
• Primary and secondary data are the two different types of data.
• The types of variables are qualitative and quantitative variables.
• There are two classifications of quantitative variables namely: discrete and
continuous.
• The four classifications of a qualitative variable are nominal, ordinal, interval and
ratio variable.
• Summation notation is used to denote the sum of values. The uppercase Greek
letter Σ (pronounced sigma) is used to denote the sum of all values.

References

Amid, D. M. (2005). Fundamentals of Statistics. Quezon City. Lorimar Publishing
Company Inc.

Cruz, C. U. (2002). Statistics. Marikina City. Instructional Coverage System
Publishing Inc.

Elcamino. (n.d.). Process of Statistics.
https://www.elcamino.edu/faculty/klaureano/Documents/Math%20150/Section1.1.Lecture[1].pdf

Febre Jr., F. A. (1987). Introduction to Statistics. Quezon City. Phoenix Publishing
House, Inc.

Identifying Parameters and Statistics. (n.d.).
https://www2.southeastern.edu/Academics/Faculty/dgurney/Math241/StatTopics/ParamStat.htm#:~:text=Parameters%20are%20numbers%20that%20summarize,subset%20of%20the%20entire%20population.&text=For%20each%20study%2C%20identify%20both,the%20statistic%20in%20the%20study.

Melek, R. I. (n.d.). Statistics.
http://courses.minia.edu.eg/Attach/%D8%A5%D8%AD%D8%B5%D8%A7%D8%A1.pdf

My Digital Backpack. (n.d.). What is Statistical Mathematics?
https://mydigitalbackpack.net/advanced-level-statistical-mathematics/#:~:text=Statistical%20methods%20help%20us%20make,may%20prove%20to%20be%20unreliable.

Punzalan, T. G., & Uriarte, G. G. (1989). Statistics: A Simplified Approach. Manila.
Rex Book Store.

Rafeedali, E. (n.d.). Population and Sample.
https://tophat.com/marketplace/social-science/education/course-notes/oer-research-population-and-sample-dr-rafeedalie/1196/#:~:text=In%20research%20terminology%20the%20Population,the%20interest%20of%20a%20researcher.&text=The%20process%20of%20conducting%20a,population%20is%20called%20a%20census.

Sampling and Data. (n.d.).
https://emunix.emich.edu/~kchu/MATH_170/Fall09_170/Evening%20session/unit1_print.pdf
Retrieved on August 2013

MODULE 2
COLLECTION AND PRESENTATION OF DATA

Introduction

Every day we come across a lot of information in the form of facts, numerical figures,
tables, graphs, etc. These are provided by newspapers, television, magazines, and other
means of communication (aven.amritalearning.com, 2020). Data collection, organization and
presentation are activities as important as statistical data analysis and interpretation of
results in any research undertaking. The researcher must see to it that the data collected
are suitable and sufficient to achieve the objectives of the research. Methods of data
presentation vary in their efficiency depending on the type of information the researcher
has.

Learning Outcomes

At the end of this module, students should be able to:

1. Identify the different methods of collecting and presenting data;

2. Identify the different elements of a frequency distribution;

3. Construct a frequency distribution table; and

4. Use statistical graphs and charts in presenting data.

Lesson 1. Methods of Data Collection (INSTAT UPLB, n.d.)

There are three methods available to a researcher for collecting data, namely
objective, subjective and the use of existing records. One or a combination of these methods
can be used depending on the availability of resources and data requirements. If a study
needs direct collection of data from the units of the study, the researcher has to apply the
objective or subjective method. If the data or part of the data needed by the researcher have
already been collected by another researcher or institution, then existing records can be
used.

Objective Method. In using the objective method, data are collected by measuring or
observing the characteristics of interest directly on the entities under study. This method
requires counting or measuring instruments to ensure correct and up-to-date information.
Data collection by observation using the five senses is also considered an objective method.

Subjective Method. Information is collected through interviews, not necessarily
requiring the presence of entities under study for actual measurement. Information can be
obtained over the phone, through face-to-face interviews or through mailed questionnaires.

Use of existing records. It is the most convenient method since the researcher
makes use of data that are already available. In using this method, the researcher should
remember to properly acknowledge the source of data.

Data collected can be classified into two types namely, primary and secondary.
Primary data are those collected directly from the source or obtained through objective or
subjective methods. Secondary data are those which have been acquired through the use of
existing records.

Primary data. Primary data is the kind of data that is collected directly from the data
source without going through any existing sources. It is mostly collected specifically for a
research project and may be shared publicly to be used for other research. Primary data is
often reliable, authentic, and objective inasmuch as it was collected with the purpose of
addressing a particular research problem. It is noteworthy that primary data is not commonly
collected because of the high cost of implementation. A common example of primary data is
the data collected by organizations during market research, product research, and
competitive analysis. This data is collected directly from its original source, which in most
cases is the organization's existing and potential customers. Most of those who collect
primary data are government-authorized agencies, investigators, research-based private
institutions, etc.

Secondary data. Secondary data is data that has been collected in the past by
someone else but made available for others to use. It usually starts out as primary data and
becomes secondary when used by a third party. Secondary data are usually easily accessible
to researchers and individuals because they are mostly shared publicly. This, however,
means that the data are usually general and not tailored specifically to meet the researcher's
needs as primary data are. For example, when writing a research thesis, researchers
need to consult past works done in the field and add their findings to the literature review.
Other material such as definitions and theorems is also secondary data that is added to the
thesis and must be properly referenced and cited. Some common sources of secondary data
include trade publications, government statistics, journals, etc. The authenticity of such
sources cannot always be guaranteed, so they should be evaluated before use.

Lesson 2. Methods of Data Presentation (INSTAT UPLB, n.d.)


There are three methods of presenting data, namely narrative or textual
presentation, tabular presentation and graphical presentation.

Narrative or Textual Presentation. In a narrative or textual presentation, the data is
presented in the form of words, sentences and paragraphs. It is story-fashioned and is used
with small data sets and limited summaries.

Example 1: In the Statistics class of 40 students, 3 obtained the perfect score of 50. Sixteen
students got a score of 40 and above, while only 3 got 19 and below. Generally, the
students performed well in the test with 23 or 70% getting a passing score of 38 and above
(Pegollo, 2012).
Example 2: The Philippine census placed the population of the country, as of May 2000,
at 76.5 million. This figure is higher by 7.9 million persons than the 1995 census figure of
68.6 million and higher by 15.8 million persons than the 1990 census count of 60.7 million
persons.

Tabular Presentation. Tabular presentation is a systematic and logical arrangement
of data in the form of rows and columns with respect to the characteristics of the data. It
points out trends and comparisons. It also shows the interrelationships among different
variables. The essential parts of a statistical table are the table number, title, column
headings, row headings or stubs, body, footnotes and source of data.

Example:

Figure 2.1. Tabular Presentation
Source: https://byjus.com/commerce/tabular-presentation-of-data/

Graphical Presentation. Graphical presentation is a way of analyzing numerical
data visually. It exhibits the relation between data, ideas, information and concepts in a
diagram. It is easy to understand and is one of the most important learning strategies. The
choice of graph depends on the type of information in a particular domain. Accuracy,
neatness and an anticipation of what the final display will look like are the important
concerns when using graphical methods. There are different types of graphical presentation.
Some of them are as follows.

1. Bar chart – the data are presented by drawing bars whose heights are proportional
to the values they represent; the bars may be drawn horizontally or vertically
depending on the number of categories or groupings of the variable being depicted.

Source: https://www.mathsisfun.com/data/bar-graphs.html

Figure 2.2. Bar chart
Source: https://www.excel-easy.com/examples/bar-chart.html

2. Pie chart – a circle is drawn to represent the whole quantity and is then divided
into segments, each of which is proportional to the size of the component it represents.

Figure 2.3. Pie chart
Source: https://www.excel-easy.com/examples/bar-chart.html

3. Line graph – used to plot continuing data and to show the relationship between
two variables; the successive points are connected with a straight line

Figure 2.4. Line Graph
Source: https://study.com/academy/lesson/what-is-a-line-graph-definition-examples.html

4. Stem-and-leaf display – a technique used to classify either discrete or continuous
variables; each number in the data is broken down into a stem and a leaf.

Figure 2.5. Stem and leaf display
Source: https://www.softschools.com/math/topics/stem_and_leaf_plot/

5. Histogram – a graphical representation of the frequency distribution of a
continuous quantitative variable; a bar is used to depict the counts of each class
or group

Figure 2.6. Histogram

6. Frequency polygon – constructed by plotting a point at the midpoint of each class
interval; successive points are connected with straight lines; at the ends, the points are
connected to the midpoints of the preceding and succeeding intervals of zero frequency

Figure 2.7. Frequency Polygon
Source: https://www.onlinemath4all.com/how-to-draw-frequency-polygon.html
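
Of the displays listed above, the stem-and-leaf display (item 4) is simple enough to build as plain text. The sketch below is one possible way to do it and rests on assumptions: the scores are hypothetical two-digit values, the tens digit is taken as the stem and the ones digit as the leaf, and the function name is illustrative.

```python
from collections import defaultdict

# Hypothetical two-digit exam scores, for illustration only
scores = [65, 67, 71, 72, 72, 78, 80, 83, 85, 85, 89, 92]


def stem_and_leaf(values):
    """Group values by tens digit (stem); the ones digits are the leaves."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    return dict(groups)


plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 6 | 5 7
# 7 | 1 2 2 8
# 8 | 0 3 5 5 9
# 9 | 2
```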

Lesson 3. The Frequency Distribution Table

One type of tabular presentation that is very useful in summarizing and organizing
data is the Frequency Distribution Table (FDT). It contains non-overlapping categories or
classes of a variable and the respective frequencies or counts of the observations falling in
each category or class. The frequency distribution is an arrangement of the data which
shows the frequency of different values or groups of values of a variable. It can be
constructed directly from the raw data, which may consist of ungrouped scores (Amid, 2005).

Example: A class of 30 students was given a Mathematics examination. The grades are
given below. Construct FDT with seven (7) classes.

The following steps are suggested in the construction of a frequency distribution table
[Punzalan & Uriarte (1989), Cruz (2002), Febre Jr. (1987), INSTAT UPLB (n.d.), Amid
(2005)].

1. Arrange the data (ungrouped) in ascending or descending order.

2. Compute the range by getting the difference between the highest score (HS) and the
lowest score (LS). It is given by the formula
R = HS – LS = 92 – 65 = 27

3. Determine the number of groups or classes (k) to use. The ideal number of class
intervals is somewhere between 5 and 15, and the number of classes should not
exceed 15 to 20 no matter how many observations there are.
k = 7 classes (given in the problem)

4. Compute the class size (class interval size or interval size), denoted by i. Divide
the range by the number of classes to get i, then round it off to the nearest whole
number.
i = R ÷ k = 27 ÷ 7 = 3.857... ≈ 4

5. Write the class intervals starting from the lowest score. Take note that we have 7
classes (k) and the lowest score is 65. The first class interval is 65-68, with an
interval size of 4, and each succeeding class interval also has i = 4.

LL = lower limit
= 65, 69, 73, 77, 81, 85, 89

UL = upper limit
= 68, 72, 76, 80, 84, 88, 92

6. Count the frequency (f) for each class interval and find the total (N or n). The
frequency is the number of scores that fall within each class interval.

7. Compute the lower true class boundary (LTCB) and the upper true class
boundary (UTCB).
LTCB = LL – 0.5     UTCB = UL + 0.5

8. Compute the midpoint or class mark of each class interval (M or x_m). Do not
round off the midpoint.

$M = \frac{LL + UL}{2}$

M = (65 + 68) ÷ 2
= (133) ÷ 2
M = 66.5

9. Find the cumulative frequency less than (cf<). Start from the frequency of the
lowest class interval and add the frequency of each succeeding class interval. The
lowest class interval is 65-68, so start adding from 4. The last cumulative frequency
should be equal to N.

10. Find the cumulative frequency greater than (cf>). Start from the frequency of
the highest class interval and add the frequency of each preceding class interval. The
highest class interval is 89-92, so start adding from 4. The last cumulative frequency
should be equal to N.

11. Compute for the relative frequency (RF). Round off to the nearest hundredth.
Because of rounding, the RF total may come out as 99.99, 100 or 100.01.

$$RF = \frac{f}{N} \times 100\%$$

RF = 4 ÷ 30 × 100% = 13.33%

RF = 1 ÷ 30 × 100% = 3.33%

Do the same for the remaining frequencies.
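
The steps above can be sketched in Python. This is only an illustration and not part of the original module: the function name `build_fdt` and the ten sample scores are hypothetical (chosen so that the lowest score is 65 and the highest is 92, as in the example), and whole-number scores are assumed.

```python
import math

def build_fdt(scores, k):
    """Build a frequency distribution table (one dict per class interval)."""
    lowest, highest = min(scores), max(scores)
    value_range = highest - lowest                    # Step 2: R = HS - LS
    size = max(1, math.floor(value_range / k + 0.5))  # Step 4: i = R / k, rounded
    n = len(scores)
    rows, running = [], 0
    for j in range(k):
        ll = lowest + j * size                        # Step 5: lower limit
        ul = ll + size - 1                            #         upper limit
        f = sum(1 for s in scores if ll <= s <= ul)   # Step 6: frequency
        running += f
        rows.append({
            "LL": ll, "UL": ul, "f": f,
            "LTCB": ll - 0.5, "UTCB": ul + 0.5,       # Step 7: true class boundaries
            "M": (ll + ul) / 2,                       # Step 8: midpoint
            "cf<": running,                           # Step 9
            "cf>": n - running + f,                   # Step 10
            "RF": round(f / n * 100, 2),              # Step 11: percent
        })
    return rows

# Hypothetical scores, for illustration only.
sample_scores = [65, 66, 70, 74, 78, 79, 82, 86, 90, 92]
for row in build_fdt(sample_scores, k=7):
    print(row)
```

With R = 27 and k = 7 the sketch reproduces the class limits 65-68 up to 89-92. If rounding i down ever leaves the highest score outside the last class, the interval size or the number of classes would have to be adjusted by hand.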

Complete FDT table:

Assessment Task 2-1

1. Differentiate objective from subjective method of data collection.

2. Give 5 examples each for primary and secondary data.

3. Which do you think is the best method of presenting data? Explain your
answer.

Assessment Task 2-2

Below are Statistics exam scores of 40 BEEd3 students. Follow the steps in
constructing FDT to complete the table. The table should have 6 classes.

𝑅 = ____________________ 𝑘 = ________ 𝑖 = ____________________

Summary

• The collection, organization and presentation of data are as important as
statistical data analysis and the interpretation of results in any
research undertaking.
• There are three methods available to a researcher for collecting data, namely
objective, subjective and the use of existing records. One or a combination of
these methods can be used depending on the availability of resources and data
requirements. If a study needs data collected directly from the units of the
study, the researcher has to apply the objective or subjective method. If the data or
part of the data needed by the researcher have already been collected by
another researcher or institution, then existing records can be used.
• Data collected can be classified into two types namely, primary and secondary.
Primary data are those collected directly from the source or obtained through
objective or subjective methods. Secondary data are those which have been
acquired through the use of existing records.
• There are three methods of presenting data, namely narrative or textual
presentation, tabular presentation and graphical presentation.
• One type of tabular presentation that is very useful in summarizing and
organizing data is the Frequency Distribution Table (FDT).
• The frequency distribution is an arrangement of the data which shows the
frequency of different values or groups of values of a variable. It can be constructed directly
from the raw (ungrouped) data, such as a set of scores.

References

Amid, D. M. (2005). Fundamentals of Statistics. Quezon City. Lorimar Publishing Company Inc.
Collection and Presentation of Data. (2013). aven.amritalearning.com/index.php?sub=102&brch=305&sim=1597&cnt=3889
Cruz, C. U. (2002). Statistics. Marikina City. Instructional Coverage System Publishing Inc.
Febre Jr., F. A. (1987). Introduction to Statistics. Quezon City. Phoenix Publishing House, Inc.
Graphical Presentation of Data. (2020). https://byjus.com/maths/graphical-representation/
Pegollo, Chie. (2012, January 9). Presentation of Data. https://www.slideshare.net/mschie/presentation-of-data
Primary vs Secondary Data. (2010). https://methods.sagepub.com/reference/encyc-of-research-design/n333.xml
Tabular Presentation of Data. (n.d.). https://byjus.com/commerce/tabular-presentation-of-data/#:~:text=Concept%20of%20Tabulation,is%20compact%20and%20self%2Dexplanatory.
Textual Presentation of Data. (2019, March 2). http://researcharticles.com/index.php/textual-presentation-of-data/
Workbook in Statistics 1. Authored by the Faculty of the Institute of Statistics, UP Los Banos, College Laguna 4031

MODULE 3
MEASURES OF CENTRAL TENDENCY

Introduction

Descriptive statistics is the term given to the analysis of data that helps
describe, show or summarize data in a meaningful way such that patterns might
emerge from the data (statistics.laerd.com, 2018). It is simply a way to describe data;
it does not allow conclusions to be made beyond the data analyzed, or
conclusions to be reached regarding any hypotheses made. It is used to present quantitative
descriptions in a manageable form. It helps simplify large amounts of data in a
sensible way (Trochim, 2020). It can be useful to provide basic information about
variables in a data set and highlight potential relationships between variables (Child
Care & Early Education Research Connections, 2019).
A measure of central tendency is a single value that attempts to describe a
set of data by identifying the central position within that set of data
(statistics.laerd.com, 2018). Measures of central tendency are sometimes called
measures of central location (Australian Bureau of Statistics, 2020). A measure of
central position or central tendency is a single figure which is representative of the
general level of magnitudes or values of the items in a set of data. This figure is used to
represent all the numbers in the set of data. When arranged according to magnitude, it
tends to lie centrally within the set (Pagoso & Montaña, 1985). They are also classed as
summary statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as the
median and the mode. The mean, median and mode are all valid measures of central
tendency, but under different conditions, some measures of central tendency become
more appropriate to use than others (statistics.laerd.com, 2018).

Learning Outcomes

At the end of this module, students should be able to:

1. Calculate the mean;
2. Locate the median class in the frequency distribution;
3. Compute the median;
4. Locate the modal class in the frequency distribution; and
5. Compute the mode.

Lesson 1. Mean

MEAN. The mean (or average) is the most popular and well-known measure of
central tendency. It can be used with both discrete and continuous data, although it is
most often used with continuous data (statistics.laerd.com, 2018). It is generally described
as “the center of gravity of a distribution” and is the most convenient measure. It is denoted by $\bar{x}$
(Febre, 1987).

Mean for Ungrouped Data. This is computed by adding all the values and dividing
the sum by the number of values (Febre, 1987). The formula for the mean is given by:

$$\bar{x} = \frac{\sum x}{n}$$

where: $\bar{x}$ = mean (x bar)
       $n$ = total number of items in the sample
       $x$ = the observed value/value of each item
       $\Sigma$ = summation notation (means the sum of)

Example: Suppose a teacher chooses ten students whose scores in a 30-item test are as
follows: 15, 25, 18, 15, 20, 25, 18, 18, 20, 25. Calculate the mean and interpret
the result.
Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{15+25+18+15+20+25+18+18+20+25}{10} = \frac{199}{10} = 19.9$$

The value of the mean indicates that the group has obtained an average
score of 19.9, or has correctly answered about 20 items out of 30. The
group has correctly answered about 67% (20/30) of the items, a relatively good
result.

If some scores occur more than once, each distinct score can be listed together with its
frequency (the number of times it appears). Instead of adding individual scores, take the product of each score and
the frequency with which it appears, then add the products (Cruz, 2002). Use the formula:

$$\bar{x} = \frac{\sum fx}{n}$$

where: $\bar{x}$ = mean (x bar)
       $n$ = total number of items in the sample
       $x$ = the observed value/value of each item
       $f$ = frequency
       $\Sigma$ = summation notation (means the sum of)

Supplemental Activity: Solve for the mean from the given data: 23, 18, 27, 30, 43, 78.
Answer: $\bar{x}$ = 36.5

Example: Suppose a teacher chooses ten students whose scores in a 30-item test are as
follows: 15, 25, 18, 15, 20, 25, 18, 18, 20, 25. Summarize the scores into a table
as follows:
Scores (x)    Frequency (f)
15            2
18            3
20            2
25            3
              n = 10

Solution:

$$\bar{x} = \frac{\sum fx}{n} = \frac{15(2)+18(3)+20(2)+25(3)}{10} = \frac{199}{10} = 19.9$$
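
As a quick check, the two ways of computing the mean can be compared in Python. This is a small sketch added for illustration; the variable names are arbitrary and only the ten scores come from the example.

```python
from collections import Counter

scores = [15, 25, 18, 15, 20, 25, 18, 18, 20, 25]

# Mean from the raw scores: sum of x divided by n.
mean_raw = sum(scores) / len(scores)

# Mean from the frequency table: sum of f*x divided by n.
freq = Counter(scores)                 # maps each score to its frequency
n = sum(freq.values())
mean_freq = sum(f * x for x, f in freq.items()) / n

print(mean_raw, mean_freq)
```

Both computations give the same mean, since multiplying a score by its frequency is just a shortcut for adding that score repeatedly.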

Weighted Mean. The weighted mean considers the proper weights assigned to the
observed values according to their relative importance. It is denoted by the symbol $\bar{x}_w$
(Amid, 2005). In formula:

$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

where: $w$ = weight of each item
       $x$ = value of each item
       $\bar{x}_w$ = weighted mean
       $\Sigma$ = summation notation (means the sum of)

Example: Here are the grades obtained by a student in the different criteria for grading. The
weight for each criterion is given.

Criteria               Grades (x)   Weight (w)   xw
Long tests             80           3.0          240
Quizzes                85           2.0          170
Departmental tests     82           2.5          205
Class Participation    88           1.5          132
Homework & Projects    85           1.0          85
Total                               Σw = 10      Σwx = 832

Applying the formula:

$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{832}{10} = 83.2$$

This means that, in consideration of all the criteria with their respective weights, the
overall grade of the student is 83.2, or about 83.
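
The weighted mean above can be verified with a few lines of Python. This is an illustrative sketch only; the list names are arbitrary and the numbers are taken from the table in the example.

```python
# Grades (x) and weights (w), in the same order as the criteria in the table.
grades = [80, 85, 82, 88, 85]
weights = [3.0, 2.0, 2.5, 1.5, 1.0]

sum_wx = sum(w * x for w, x in zip(weights, grades))   # sum of w*x
sum_w = sum(weights)                                   # sum of w
weighted_mean = sum_wx / sum_w

print(sum_w, sum_wx, weighted_mean)
```

Note that the divisor is the sum of the weights, not the number of criteria.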

Supplemental Activity: Solve for the weighted mean of the following data:

x w
8 2
9 3
12 5
14 7
15 8
16 4
18 1

Answer: $\bar{x}_w$ = 13.43

Mean for Grouped Data. Data which are arranged in a frequency distribution are
called grouped data. When the number of items is too large, it is best to compute for the
measures of central tendency and variability using the frequency distribution (Pagoso &
Montaña, 1985). To compute for the mean of grouped data, determine first the midpoint of
each class interval (Amid, 2005). The formula is given by:

$$\bar{x} = \frac{\sum fM}{n} \quad \text{or} \quad \bar{x} = \frac{\sum f x_m}{n}$$

where: $\bar{x}$ = mean
       $M$ or $x_m$ = class midpoint/class mark
       $f$ = the corresponding frequencies
       $n$ = total number of items/total frequencies

Example: The following is the distribution of the length of service in years of 50 employees
of United Laboratories Inc. Determine the mean using the given formula.

Length of Service (class interval), LL-UL   Number of Employees (frequency), f
1 – 5                                       5
6 – 10                                      7
11 – 15                                     12
16 – 20                                     13
21 – 25                                     6
26 – 30                                     4
31 – 35                                     3

Solution:

Step 1: Compute for the midpoint of each class interval. Use the formula:

$$M = \frac{LL + UL}{2}$$

Length of Service (CI), LL-UL   Number of Employees, f   Midpoint, M
1 – 5                           5                        3
6 – 10                          7                        8
11 – 15                         12                       13
16 – 20                         13                       18
21 – 25                         6                        23
26 – 30                         4                        28
31 – 35                         3                        33

Step 2: Multiply the frequency by the midpoint and get the total (Σ𝑓𝑀).

Length of Service (CI), LL-UL   Number of Employees, f   Midpoint, M   fM
1 – 5                           5                        3             15
6 – 10                          7                        8             56
11 – 15                         12                       13            156
16 – 20                         13                       18            234
21 – 25                         6                        23            138
26 – 30                         4                        28            112
31 – 35                         3                        33            99
                                n = 50                                 ΣfM = 810

Step 3: Compute for the mean.

$$\bar{x} = \frac{\sum fM}{n} = \frac{810}{50} = 16.2$$
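
The three steps can be condensed into a short Python sketch. It is added here for illustration; the function name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are assumptions of this sketch, not notation from the module.

```python
# Each class interval as (lower limit, upper limit, frequency).
service = [(1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
           (21, 25, 6), (26, 30, 4), (31, 35, 3)]

def grouped_mean(classes):
    """Mean of grouped data: sum of f*M divided by n."""
    n = sum(f for _, _, f in classes)
    sum_fm = sum(f * (ll + ul) / 2 for ll, ul, f in classes)  # M = (LL + UL) / 2
    return sum_fm / n

print(grouped_mean(service))
```

The same function can be applied to any frequency distribution whose class limits and frequencies are known.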

Supplemental Activity: Solve for the mean from the data below:

CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2

Answer: $\bar{x}$ = 20.9

Lesson 2. Median
MEDIAN. The median is defined as the score-point which divides a ranked
distribution into two equal parts; it is the value below which lies 50% of the data. It is
denoted by 𝑴𝒅𝒏 (Febre, 1987).

Median for Ungrouped Data. Computation of the median for ungrouped data requires
the values to be arranged in the order of magnitude, either in ascending or descending order
(Febre, 1987).
• For data involving an odd number of scores (n = odd), the median is simply
the middle value. For example: if n = 9, the median is the fifth score from
either the lowest or the highest.
• If n = even, there would be two middle values. The median in this case is the
average of these two middle values. For example: if n = 20, the median is the
average of the 10th and 11th scores.

Example: Find the median of the following data:


1. 6, 8, 15, 18, 23, 24, 42
2. 121, 108, 120, 98, 132, 100, 92, 140, 102, 98

Solution:
1. 6, 8, 15, 18, 23, 24, 42
Since n = 7, the middle score is the 4th score which is 18, therefore the 𝑀𝑑𝑛 = 18.

2. 121, 108, 120, 98, 132, 100, 92, 140, 102, 98


Arrange the values in ascending or descending order:
92, 98, 98, 100, 102, 108, 120, 121, 132, 140
Since n = 10, there are two middle values (the fifth and the sixth scores) which are
102 and 108 respectively.

$$Mdn = \frac{102 + 108}{2} = 105$$
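
The odd and even cases can be written as one small Python function. This sketch is added for illustration; the name `median_ungrouped` is arbitrary.

```python
def median_ungrouped(values):
    """Median of raw data: the middle value, or the average of the two middle values."""
    ordered = sorted(values)            # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # n is odd: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n is even: average the two

print(median_ungrouped([6, 8, 15, 18, 23, 24, 42]))
print(median_ungrouped([121, 108, 120, 98, 132, 100, 92, 140, 102, 98]))
```

Python's standard library offers `statistics.median`, which follows the same rule.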

Supplemental Activity: Solve for the median from the given data: 23, 18, 27, 30, 43, 78.
Answer: 𝑀𝑑𝑛 = 28.5

Median of Grouped Data (Febre, 1987). The median of grouped data can be
determined by the following formula:

$$Mdn = LTCB + \left( \frac{\frac{n}{2} - cf_b}{f_{Mdn}} \right) i$$

where: $Mdn$ = median
       $LTCB$ = lower true class boundary of the median class
       $n$ = total number of frequencies
       $f_{Mdn}$ = frequency of the median class
       $cf_b$ = cumulative frequency (cf) below the median class
       $i$ = class interval size

The median class is the class interval which contains the $\frac{n}{2}$th value. It is the first
class interval whose cumulative frequency less than (cf<) is equal to or greater than $\frac{n}{2}$.

Example: Find the median of the frequency distribution of length of service in years of 50
employees of United Laboratories Inc.

Length of Service (class interval), LL-UL   Number of Employees (frequency), f
1 – 5                                       5
6 – 10                                      7
11 – 15                                     12
16 – 20                                     13
21 – 25                                     6
26 – 30                                     4
31 – 35                                     3

Solution:

Step 1: Compute for the cumulative frequency less than (cf<).

Length of Service (CI), LL-UL   Number of Employees, f   cf<
1 – 5                           5                        5
6 – 10                          7                        12
11 – 15                         12                       24
16 – 20                         13                       37
21 – 25                         6                        43
26 – 30                         4                        47
31 – 35                         3                        50
                                n = 50

Step 2: To obtain the median class, solve for the $\frac{n}{2}$th value, that is, $\frac{n}{2} = \frac{50}{2}$ = 25th value. The
25th value is located in the class interval 16 – 20.

Length of Service (CI), LL-UL   Number of Employees, f   cf<   Value          Remarks
1 – 5                           5                        5     1st – 5th
6 – 10                          7                        12    6th – 12th
11 – 15                         12                       24    13th – 24th
16 – 20                         13                       37    25th – 37th    median class
21 – 25                         6                        43    38th – 43rd
26 – 30                         4                        47    44th – 47th
31 – 35                         3                        50    48th – 50th
                                n = 50

Step 3: Since the 25th value has been located, the terms in the formula can now be
enumerated. The median class is 16 – 20.

Length of Service (CI), LL-UL   Number of Employees, f   cf<   Value          Remarks
1 – 5                           5                        5     1st – 5th
6 – 10                          7                        12    6th – 12th
11 – 15                         12                       24    13th – 24th    $cf_b$ = 24
16 – 20                         13                       37    25th – 37th    median class; $f_{Mdn}$ = 13
21 – 25                         6                        43    38th – 43rd
26 – 30                         4                        47    44th – 47th
31 – 35                         3                        50    48th – 50th
                                n = 50

Median class (CI) = LL – UL = 16 – 20

LTCB = LL – 0.5 = 16 – 0.5 = 15.5

$\frac{n}{2}$ = 25

$cf_b$ = 24 (the cf< of the class interval just below the median class)

$f_{Mdn}$ = 13

$i$ = 5

Step 4: Compute the median.


$$Mdn = LTCB + \left( \frac{\frac{n}{2} - cf_b}{f_{Mdn}} \right) i$$

$$Mdn = 15.5 + \left( \frac{25 - 24}{13} \right) 5$$

(subtract, divide, multiply, add)

𝑀𝑑𝑛 = 𝟏𝟓.𝟖𝟖 (round off to the nearest hundredth)
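
The four steps can be sketched as a single Python function. This is an illustration under stated assumptions: the name `grouped_median` and the tuple layout (lower limit, upper limit, frequency) are hypothetical, class limits are whole numbers (so the true boundary is LL – 0.5), and the classes are listed from lowest to highest.

```python
service = [(1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
           (21, 25, 6), (26, 30, 4), (31, 35, 3)]

def grouped_median(classes):
    """Median of grouped data: LTCB + ((n/2 - cf_b) / f_Mdn) * i."""
    n = sum(f for _, _, f in classes)
    target = n / 2                          # position of the median
    cf_below = 0                            # cf< of the classes already passed
    for ll, ul, f in classes:
        if cf_below + f >= target:          # first class whose cf< reaches n/2
            ltcb = ll - 0.5                 # lower true class boundary
            size = ul - ll + 1              # class interval size i
            return ltcb + (target - cf_below) / f * size
        cf_below += f

print(round(grouped_median(service), 2))
```

The loop plays the role of the cf< column: it accumulates frequencies until the class containing the n/2th value is reached.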

Supplemental Activity: Solve for the median from the data below:

CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2

Answer: 𝑀𝑑𝑛 = 18.94

Lesson 3. Mode

MODE. The mode is the simplest measure of central tendency. It may be easily
identified by looking at an ungrouped set of scores and locating the score or item which
occurs most frequently. A distribution with only one mode is said to be unimodal while a
distribution with two or more modes is described as multimodal. A distribution which has
two modes is labeled as bimodal; with three modes, as trimodal; and so on (Pagoso &
Montaña, 1985). The mode is denoted by 𝑀𝑜 (Amid, 2005).

Mode for Ungrouped Data. This is done simply through inspection (Febre, 1987).
Look for the item value which occurs the most number of times. That value is the mode.

Example: Find the mode of the following values.


1. 4, 5, 8, 8, 8, 9, 12, 12, 19, 20
2. 15, 13, 13, 14, 17, 17, 18, 12

Solution:
1. 𝑀𝑜 = 8 (since it appears three times, more often than any other value)
2. 𝑀𝑜 = 13 and 17 (bimodal)
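
Finding the mode by inspection amounts to counting how often each value occurs. The Python sketch below is added for illustration; the function name `modes` is arbitrary, and returning an empty list when no value repeats is a convention assumed here.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (empty list if nothing repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                       # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([4, 5, 8, 8, 8, 9, 12, 12, 19, 20]))
print(modes([15, 13, 13, 14, 17, 17, 18, 12]))
```

A result with one value is unimodal, with two values bimodal, and so on.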

Supplemental Activity: Solve for the mode from the given data: 23, 18, 27, 30, 43, 78.
Answer: 𝑀𝑜 = none (no value occurs more than once)

Mode for Grouped Data. The mode in a frequency distribution is within the class
interval with the highest frequency. The class interval with the highest frequency is known as
the modal class (Pagoso & Montaña, 1985). The formula is given by:
$$Mo = LTCB + \left( \frac{d_1}{d_1 + d_2} \right) i$$

where:
       $LTCB$ = lower true class boundary of the modal class
       $d_1$ = difference between the frequency of the modal class and the
              frequency of the preceding class interval
       $d_2$ = difference between the frequency of the modal class and the
              frequency of the succeeding class interval
       $i$ = class interval size

Example: Find the mode of the frequency distribution of length of service in years of 50
employees of United Laboratories Inc.

Length of Service (class interval), LL-UL   Number of Employees (frequency), f
1 – 5                                       5
6 – 10                                      7
11 – 15                                     12
16 – 20                                     13
21 – 25                                     6
26 – 30                                     4
31 – 35                                     3

Solution:
1. The highest frequency is 13 which belongs to the class interval 16 – 20, therefore it is
the modal class.
Length of Service (CI), LL-UL   Number of Employees, f   Remarks
1 – 5                           5
6 – 10                          7
11 – 15                         12
16 – 20                         13                       modal class
21 – 25                         6
26 – 30                         4
31 – 35                         3
                                n = 50

2. The preceding class interval is the interval that is lower in value than the modal class
(11 – 15), with a frequency of 12.
The succeeding class interval is the interval that is greater in value than the modal
class (21 – 25), with a frequency of 6.

Length of Service (CI), LL-UL   Number of Employees, f   Remarks
1 – 5                           5
6 – 10                          7
11 – 15                         12                       preceding class interval; d1 = 13 – 12 = 1
16 – 20                         13                       modal class
21 – 25                         6                        succeeding class interval; d2 = 13 – 6 = 7
26 – 30                         4
31 – 35                         3

3. Compute for the mode.
Modal class = 16 – 20
$LTCB$ = LL – 0.5 = 16 – 0.5 = 15.5
$d_1$ = 1
$d_2$ = 7
$i$ = 5

$$Mo = LTCB + \left( \frac{d_1}{d_1 + d_2} \right) i$$

$$Mo = 15.5 + \left( \frac{1}{1 + 7} \right) 5 = 15.5 + 0.625 = 16.125$$

𝑀𝑜 = 16.13
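
The computation of the grouped mode can be sketched in Python as follows. The function name `grouped_mode` and the tuple layout (lower limit, upper limit, frequency) are assumptions of this sketch. It also assumes whole-number class limits, takes the first class with the highest frequency as the modal class, and treats a missing neighboring class as having a frequency of 0.

```python
service = [(1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
           (21, 25, 6), (26, 30, 4), (31, 35, 3)]

def grouped_mode(classes):
    """Mode of grouped data: LTCB + (d1 / (d1 + d2)) * i."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                      # position of the modal class
    ll, ul, f_modal = classes[m]
    f_prev = freqs[m - 1] if m > 0 else 0            # preceding class frequency
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0   # succeeding class frequency
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    return (ll - 0.5) + d1 / (d1 + d2) * (ul - ll + 1)

print(grouped_mode(service))
```

For the length-of-service data the function returns 16.125, which is reported as 16.13 when rounded off to the nearest hundredth.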

Supplemental Activity: Solve for the mode from the data below:

CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2

Answer: 𝑀𝑜 = 15.75

Assessment Task 3-1

Compute for the mean of each of the following sets of data.

1. The grades of a student on 8 examinations were:
85, 90, 88, 75, 69, 92, 78, 84.

2. Grades of a student for the first quarter.


Subjects Grades No. of Units
Science 88 5
English 81 3
Social Studies 79 3
Math 82 3
PE 85 1

Assessment Task 3-2

Compute for the mean of each of the following sets of data.

3. The following is a distribution for the number of employees in 40 companies
belonging to a certain industry.

No. of No. of
Employees Companies
14 – 20 1
21 – 27 3
28 – 34 6
35 – 41 11
42 – 48 8
49 – 55 7
56 – 62 4

Assessment Task 3-3

Compute for the mode of each of the following sets of data.

1. 16, 17, 18, 19, 20, 21, 22, 22, 22

2. 19, 15, 21, 20, 14, 17

3. The following is a distribution for the number of employees in 40 companies
belonging to a certain industry.

No. of No. of
Employees Companies
14 – 20 1
21 – 27 3
28 – 34 6
35 – 41 11
42 – 48 8
49 – 55 7
56 – 62 4

4. The following is a distribution for the number of employees in 40 companies
belonging to a certain industry.

No. of No. of
Employees Companies
14 – 20 1
21 – 27 3
28 – 34 6
35 – 41 11
42 – 48 8
49 – 55 7
56 – 62 4


Compute for the median of each of the following sets of data.

6. The following are the scores obtained by 12 applicants in the entrance
examination for first year college at Laguna University.
75, 60, 85, 80, 95, 75, 98, 90, 90, 85, 90, 87

7. In a selection of 15 lots of 150 electronic components, the following numbers
of defective electronic components were found:
2, 5, 7, 4, 6, 10, 11, 4, 9, 5, 12, 8, 13, 4, 15

Summary

• A measure of central tendency is a single value that attempts to describe a set of
data by identifying the central position within that set of data. Measures of central
tendency are sometimes called measures of central location.
• The mean (or average) is the most popular and well-known measure of central
tendency. It can be used with both discrete and continuous data, although it
is most often used with continuous data. It is generally described as “the center of
gravity of a distribution” and is the most convenient measure. It is denoted by $\bar{x}$.


• The median is defined as the score-point which divides a ranked distribution into
two equal parts; it is the value below which lies 50% of the data. It is denoted by
𝑀𝑑𝑛.
• Computation of the median for ungrouped data requires the values to be
arranged in the order of magnitude, either in ascending or descending order.
• The mode is the simplest measure of central tendency. It may be easily identified
by looking at an ungrouped set of scores and locating the score or item which
occurs most frequently.
• The mode in a frequency distribution is within the class interval with the highest
frequency. The class interval with the highest frequency is known as the modal
class.

References

• Amid, D. M. (2005). Fundamentals of Statistics. Quezon City. Lorimar Publishing Company Inc.
• Australian Bureau of Statistics. (2020). Statistical Language - Measures of Central Tendency. https://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+measures+of+central+tendency#:~:text=A%20measure%20of%20central%20tendency,or%20centre%20of%20its%20distribution.
• Cruz, C. U. (2002). Statistics. Marikina City. Instructional Coverage System Publishing Inc.
• Child Care & Early Education Research Connections. (2019). Descriptive Statistics. https://www.researchconnections.org/childcare/datamethods/descriptivestats.jsp
• Febre Jr., F. A. (1987). Introductory Statistics. Quezon City. Phoenix Publishing House, Inc.
• Frequency Distribution. (n.d.). https://www.emathzone.com/tutorials/basic-statistics/frequency-distribution.html#:~:text=Data%20presented%20in%20the%20form,distribution%20is%20called%20grouped%20data.&text=The%20numerical%20raw%20data%20arranged,%2C%2010%2C%2016%2C%2019.
• Pagoso, C. M., & Montaña, R. A. (1985). Introductory Statistics. Quezon City. Rex Printing Company, Inc.
• Trochim, W. M. K. (2020, March 10). Descriptive Statistics. https://conjointly.com/kb/descriptive-statistics/
• Measures of Central Tendency. (2018). https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php

MODULE 4
MEASURES OF LOCATION

Introduction

A measure of position is a method by which the position that a particular data value
has within a given data set can be identified (http://www.milefoot.com/, n.d.). It is also called
a measure of location, quantiles or fractiles. Fractiles are measures of location or position
which include not only central location but also any position based on the number of equal
divisions in a given distribution. The most commonly used fractiles are the quartiles, deciles,
and percentiles (Yap, 2014). A fractile is the cut off point for a certain fraction of a sample. If
your distribution is known, then the fractile is just the cut-off point where the distribution
reaches a certain probability (Glen, 2017). Fractile computations are related to computing
the median in the sense that both quantities form points of divisions of a distribution,
depending on the number of parts the distribution is to be partitioned (Cruz, 2002).

Learning Outcomes

At the end of this module, students should be able to:

1. Locate the percentile, decile and quartile classes; and
2. Compute percentile, decile and quartile.

Lesson 1. Percentile

PERCENTILE. Percentiles are the ninety-nine score points which divide a
distribution into 100 equal parts (Febre, 1987). To form 100 parts, there are 99 points of
division, from 𝑃1 to 𝑃99. Each percentile score identifies the part of the distribution below it.
For example, 𝑃1 is a score that surpasses 1% of the whole group; 𝑃99 surpasses 99% of that
group. In effect, a score at 𝑃1 belongs to the lowest 1% and a score at 𝑃99 belongs to the highest 1%.
The calculation of the percentile scores is similar to the calculation of the median. The
median is equivalent to 𝑃50 (Cruz, 2002).
According to Cruz (2002), percentiles can identify the placement of a score within the
distribution of scores. They are commonly used in reporting the results of government examinations
such as the National Elementary Assessment Test (NEAT) and the National Secondary
Assessment Test (NSAT).

The 𝑘th percentile, 𝑃𝑘, can be defined as a value in a data set such that about k% of
the measurements are smaller than the value of 𝑃𝑘 and about (100 – k)% of the
measurements are greater than the value of 𝑃𝑘 (Amid, 2005). The value of the 𝑘th percentile
(𝑃𝑘) is given by:

1. the $\frac{kn}{100}$th value for ungrouped data; and

2. for grouped data,

$$P_k = LTCB + \left( \frac{\frac{kn}{100} - cf_b}{f_{P_k}} \right) i$$

where: $P$ = percentile
       $k$ = the value from 1 to 99
       $LTCB$ = lower true class boundary of the $P_k$ class
       $n$ = total number of frequencies
       $cf_b$ = cumulative frequency below the $P_k$ class
       $f_{P_k}$ = frequency of the $P_k$ class
       $i$ = class interval size

Example: Ungrouped Data
The following data give the price earnings ratio of 12 companies.
16 38 20 20 18 34 7 58 31 19 22 18
Find the value of the 62nd percentile.

Solution:
Step 1: Rank the given data in increasing order.
7 16 18 18 19 20 20 22 31 34 38 58

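Step 2: Compute for the position of the 62nd percentile, with 𝑘 = 62 and 𝑛 = 12.

$$\frac{kn}{100} = \frac{62(12)}{100} = 7.44$$

Since 7.44 is not a whole number, round up to the 8th term.

Step 3: The 8th term is 22. Therefore, 𝑷𝟔𝟐 = 22.

7 16 18 18 19 20 20 [22] 31 34 38 58 (the bracketed value is the 8th term)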

Supplemental Activity: Use the given in the example to solve for 𝑷𝟐𝟓 and 𝑷𝟗𝟎 .
Answers: 𝑃25 = 18, 𝑃90 = 38
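
The round-up rule used in this example can be sketched in Python. This is an illustration only: the function name `percentile_ungrouped` is hypothetical, and it follows the convention of this module (take the kn/100th position, rounding up when it is not a whole number). Statistical software often interpolates between neighboring values instead and may report slightly different percentiles.

```python
import math

ratios = [16, 38, 20, 20, 18, 34, 7, 58, 31, 19, 22, 18]

def percentile_ungrouped(values, k):
    """kth percentile of raw data using the kn/100th position, rounded up."""
    ordered = sorted(values)                 # rank the data in increasing order
    position = k * len(ordered) / 100        # kn / 100
    rank = max(1, math.ceil(position))       # round up to a whole position
    return ordered[rank - 1]                 # ranks start at 1, list indexes at 0

print(percentile_ungrouped(ratios, 62))
```

Calling the function with k = 25 and k = 90 on the same data gives the values asked for in the supplemental activity.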

Example: Grouped Data

Below is the frequency distribution of length of service in years of 50 employees of United
Laboratories Inc. Find 𝑃45.

Length of Service (class interval), LL-UL   Number of Employees (frequency), f
1 – 5                                       5
6 – 10                                      7
11 – 15                                     12
16 – 20                                     13
21 – 25                                     6
26 – 30                                     4
31 – 35                                     3

Solution:
a. 𝑃45

Step 1: Compute for cf<.


Length of Service (CI), LL-UL   Number of Employees, f   cf<
1 – 5                           5                        5
6 – 10                          7                        12
11 – 15                         12                       24
16 – 20                         13                       37
21 – 25                         6                        43
26 – 30                         4                        47
31 – 35                         3                        50
                                n = 50

Step 2: Compute for the $\frac{kn}{100}$th value and locate it in the cf< column to find the
𝑃45 class. The value of k in the given is 45 and n = 50.

$$\frac{45n}{100} = \frac{45(50)}{100} = 22.5\text{th value}$$

Length of Service (CI), LL-UL   No. of Employees, f   cf<   Remarks
1 – 5                           5                     5
6 – 10                          7                     12
11 – 15                         12                    24    𝑃45 class (contains the 22.5th value)
16 – 20                         13                    37
21 – 25                         6                     43
26 – 30                         4                     47
31 – 35                         3                     50
                                n = 50

Step 3: Compute 𝑃45 with the formula

$$P_{45} = LTCB + \left( \frac{\frac{45n}{100} - cf_b}{f_{P_{45}}} \right) i$$

Length of Service (CI), LL-UL   No. of Employees, f   cf<   Remarks
1 – 5                           5                     5
6 – 10                          7                     12    $cf_b$ = 12
11 – 15                         12                    24    𝑃45 class; $f_{P_{45}}$ = 12
16 – 20                         13                    37
21 – 25                         6                     43
26 – 30                         4                     47
31 – 35                         3                     50
                                n = 50

$$P_{45} = 10.5 + \left( \frac{22.5 - 12}{12} \right) 5$$

𝑃45 = 𝟏𝟒.𝟖𝟖

Supplemental Activity: Use the given in the example to solve for 𝑷𝟔𝟎 and 𝑷𝟗𝟓 .
Answers: 𝑃60 = 17.81, 𝑃95 = 31.33
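
The grouped-data percentile formula can be sketched in Python as follows. The function name `grouped_percentile` and the tuple layout (lower limit, upper limit, frequency) are assumptions of this sketch; whole-number class limits are assumed, so the lower true class boundary is LL – 0.5.

```python
service = [(1, 5, 5), (6, 10, 7), (11, 15, 12), (16, 20, 13),
           (21, 25, 6), (26, 30, 4), (31, 35, 3)]

def grouped_percentile(classes, k):
    """kth percentile of grouped data: LTCB + ((kn/100 - cf_b) / f) * i."""
    n = sum(f for _, _, f in classes)
    target = k * n / 100                     # position of the kth percentile
    cf_below = 0                             # cumulative frequency below the class
    for ll, ul, f in classes:
        if cf_below + f >= target:           # first class whose cf< reaches the target
            return (ll - 0.5) + (target - cf_below) / f * (ul - ll + 1)
        cf_below += f

for k in (45, 60, 95):
    print(k, grouped_percentile(service, k))
```

For k = 45 the function returns 14.875, which is reported as 14.88 to the nearest hundredth. With k = 50 the same function gives the median of the distribution, since the median is equivalent to the 50th percentile.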

Lesson 2. Quartile and Interquartile Range

Quartiles are three summary measures that divide a ranked data set into four equal parts.
These three measures are the
first quartile (𝑄1), the second quartile (𝑄2), and the third quartile (𝑄3). The second quartile is the
same as the median of a data set. The first quartile is the value of the middle term among
the observations less than the median, and the third quartile is the value of the middle term
among the observations that are greater than the median. The data should be ranked in
increasing order before the quartiles are determined (Amid, 2005).

The difference between the third and the first quartile is called the interquartile range (IQR).
That is, 𝐼𝑄𝑅 = 𝑄3 − 𝑄1 (Amid, 2005). The interquartile range is a measure of where the
“middle fifty” percent of a data set is. It is a measure of where the bulk of the values lie. That’s why
it’s preferred over many other measures of spread when reporting things like school
performance or SAT scores (Glen, 2020). This measure is generally more desirable than
the range when the distribution described is markedly truncated or skewed, or when the
median is the only measure of central tendency that is available (Febre, 1987).
The semi-interquartile range (or quartile deviation) is a measure of spread or dispersion. It is
computed as one half the difference between the 75th percentile (often called Q3) and the
25th percentile (Q1). Thus, $QD = \frac{Q_3 - Q_1}{2}$ or $QD = \frac{IQR}{2}$ (Lane, n.d.).

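The value of the 𝑘th quartile for ungrouped data is the $\frac{kn}{4}$th value, where k = 1, 2 or 3.
The three quartiles divide the ranked data into four parts of 25% each:

25% | 𝑄1 | 25% | 𝑄2 | 25% | 𝑄3 | 25%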

Example: Ungrouped Data


Find the value of the three quartiles and the interquartile range.
1. The following data give the price earnings ratio of 12 companies.
16 38 20 20 18 34 7 58 31 19 22 18
2. The following are the ages of nine employees of an insurance company.
47 28 39 51 33 37 59 24 33

Solution:
1. Rank the given data in increasing order. Then calculate the three quartiles as follows:
7 16 18 18 19 20 20 22 31 34 38 58

a. 𝑸𝟏: k = 1, n = 12

$\frac{kn}{4} = \frac{1(12)}{4}$ = 3rd value; the third value is 18; 𝑸𝟏 = 𝟏𝟖

b. 𝑸𝟐: k = 2, n = 12

$\frac{kn}{4} = \frac{2(12)}{4}$ = 6th value; the sixth value is 20; 𝑸𝟐 = 𝟐𝟎

c. 𝑸𝟑: k = 3, n = 12

$\frac{kn}{4} = \frac{3(12)}{4}$ = 9th value; the ninth value is 31; 𝑸𝟑 = 𝟑𝟏

d. 𝐼𝑄𝑅 = 𝑄3 − 𝑄1
𝐼𝑄𝑅 = 31 − 18
𝑰𝑸𝑹 = 𝟏𝟑

2. Rank the given data in increasing order. Then calculate the three quartiles as follows:
24 28 33 33 37 39 47 51 59

a. 𝑸𝟏: k = 1, n = 9

$\frac{kn}{4} = \frac{1(9)}{4}$ = 2.25th value ≈ 3rd value; the third value is 33;
therefore, 𝑸𝟏 = 𝟑𝟑

b. 𝑸𝟐: k = 2, n = 9

$\frac{kn}{4} = \frac{2(9)}{4}$ = 4.5th value ≈ 5th value; the fifth value is 37;
therefore, 𝑸𝟐 = 𝟑𝟕

c. 𝑸𝟑: k = 3, n = 9

$\frac{kn}{4} = \frac{3(9)}{4}$ = 6.75th value ≈ 7th value; the seventh value is 47;
therefore, 𝑸𝟑 = 𝟒𝟕

d. 𝐼𝑄𝑅 = 𝑄3 − 𝑄1
𝐼𝑄𝑅 = 47 − 33
𝑰𝑸𝑹 = 𝟏𝟒

Supplemental Activity
The following data give the average family income (in thousand pesos) in 1985 for each of the 13 regions in the
Philippines: 58, 31, 27, 38, 29, 20, 24, 21, 18, 23, 27, 28, 24. Find:
1. the value of the three quartiles; and
2. the interquartile range.

Answers: 𝑄1 = 23, 𝑄2 = 27, 𝑄3 = 29, 𝐼𝑄𝑅 = 6
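
The quartile positions and the interquartile range for ungrouped data can be sketched in Python. The function names below are hypothetical, and the sketch follows the convention used in this lesson (take the kn/4th position, rounding up when it is not a whole number); other textbooks and software may use a different rule and obtain slightly different quartiles.

```python
import math

ratios = [16, 38, 20, 20, 18, 34, 7, 58, 31, 19, 22, 18]            # 12 companies
incomes = [58, 31, 27, 38, 29, 20, 24, 21, 18, 23, 27, 28, 24]      # 13 regions

def quartile_ungrouped(values, k):
    """kth quartile (k = 1, 2, 3) using the kn/4th position, rounded up."""
    ordered = sorted(values)
    rank = max(1, math.ceil(k * len(ordered) / 4))
    return ordered[rank - 1]

def iqr_ungrouped(values):
    """Interquartile range: Q3 - Q1."""
    return quartile_ungrouped(values, 3) - quartile_ungrouped(values, 1)

for data in (ratios, incomes):
    print([quartile_ungrouped(data, k) for k in (1, 2, 3)], iqr_ungrouped(data))
```

Half of the value returned by `iqr_ungrouped` is the semi-interquartile range (quartile deviation).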

The value of the 𝑘th quartile for grouped data is given by:

$$Q_k = LTCB + \left( \frac{\frac{kn}{4} - cf_b}{f_{Q_k}} \right) i$$

where: $Q$ = quartile
       $k$ = the value from 1 to 3
       $LTCB$ = lower true class boundary of the $Q_k$ class
       $n$ = total number of frequencies
       $cf_b$ = cumulative frequency below the $Q_k$ class
       $f_{Q_k}$ = frequency of the $Q_k$ class
       $i$ = class interval size

Example: Grouped Data

Below is the frequency distribution of length of service in years of 50 employees of United
Laboratories Inc. Find the value of the three quartiles and the interquartile range.

Length of Service (class interval), LL-UL   Number of Employees (frequency), f
1 – 5                                       5
6 – 10                                      7
11 – 15                                     12
16 – 20                                     13
21 – 25                                     6
26 – 30                                     4
31 – 35                                     3

Solution:

a. 𝑄1 = ?

Solve for the $\frac{kn}{4}$th value, k = 1, n = 50:

$$\frac{kn}{4} = \frac{1(50)}{4} = 12.5\text{th value}$$

Length of Service (CI), LL-UL   No. of Employees, f   cf<   Remarks
1 – 5                           5                     5
6 – 10                          7                     12    $cf_b$ = 12
11 – 15                         12                    24    𝑄1 class (12.5th value); $f_{Q_1}$ = 12
16 – 20                         13                    37
21 – 25                         6                     43
26 – 30                         4                     47
31 – 35                         3                     50
                                n = 50

Solve for 𝑄1: k = 1, n = 50, LTCB = 10.5, $cf_b$ = 12, $f_{Q_1}$ = 12, $i$ = 5

$$Q_1 = LTCB + \left( \frac{\frac{1(n)}{4} - cf_b}{f_{Q_1}} \right) i$$

$$Q_1 = 10.5 + \left( \frac{12.5 - 12}{12} \right) 5$$

𝑸𝟏 = 𝟏𝟎.𝟕𝟏

b. 𝑄2 = ?

Solve for the $\frac{kn}{4}$th value, k = 2, n = 50:

$$\frac{kn}{4} = \frac{2(50)}{4} = 25\text{th value}$$

Length of Service (CI), LL-UL   No. of Employees, f   cf<   Remarks
1 – 5                           5                     5
6 – 10                          7                     12
11 – 15                         12                    24    $cf_b$ = 24
16 – 20                         13                    37    𝑄2 class (25th value); $f_{Q_2}$ = 13
21 – 25                         6                     43
26 – 30                         4                     47
31 – 35                         3                     50
                                n = 50

Solve for 𝑄2: k = 2, n = 50, LTCB = 15.5, $cf_b$ = 24, $f_{Q_2}$ = 13, $i$ = 5

$$Q_2 = LTCB + \left( \frac{\frac{2(n)}{4} - cf_b}{f_{Q_2}} \right) i$$

$$Q_2 = 15.5 + \left( \frac{25 - 24}{13} \right) 5$$

𝑸𝟐 = 𝟏𝟓.𝟖𝟖

c. 𝑄3 = ?

Solve for the $\frac{kn}{4}$th value, k = 3, n = 50:

$$\frac{kn}{4} = \frac{3(50)}{4} = 37.5\text{th value}$$

Length of Service (CI), LL-UL   No. of Employees, f   cf<   Remarks
1 – 5                           5                     5
6 – 10                          7                     12
11 – 15                         12                    24
16 – 20                         13                    37    $cf_b$ = 37
21 – 25                         6                     43    𝑄3 class (37.5th value); $f_{Q_3}$ = 6
26 – 30                         4                     47
31 – 35                         3                     50
                                n = 50

Solve for 𝑄3: k = 3, n = 50, LTCB = 20.5, $cf_b$ = 37, $f_{Q_3}$ = 6, $i$ = 5

$$Q_3 = LTCB + \left( \frac{\frac{3(n)}{4} - cf_b}{f_{Q_3}} \right) i$$

$$Q_3 = 20.5 + \left( \frac{37.5 - 37}{6} \right) 5$$

𝑸𝟑 = 𝟐𝟎.𝟗𝟐

d. 𝐼𝑄𝑅 = ?
𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 20.92 − 10.71
𝑰𝑸𝑹 = 𝟏𝟎.𝟐𝟏

Supplemental Activity
The following is a distribution for the number of employees in 80 companies belonging to a certain industry.

Number of Employees Number of Companies


60 – 69 2
70 – 79 6
80 – 89 10
90 – 99 22
100 – 109 19
110 – 119 11
120 – 129 7
130 – 139 3

Find:
1. the value of the three quartiles; and
2. the interquartile range.

Answers: 𝑄1 = 90.41, 𝑄2 = 99.5, 𝑄3 = 110.41, 𝐼𝑄𝑅 = 20

Lesson 3. Decile

Deciles are similar to quartiles. But while quartiles sort data into four quarters, deciles sort
data into ten equal parts: The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th and 100th
percentiles. Deciles and decile ranks are used more often in real life than in the
classroom. Deciles are also commonly used for college admissions and high school
rankings (Glen, 2014). They are also used in other fields, such as finance and economics
(Educba, 2020).

A decile rank assigns a number to a decile:

Decile Rank Percentile

1 10th

2 20th

3 30th

4 40th

5 50th

6 60th

7 70th

8 80th

9 90th

The higher your place in the decile rankings, the higher your overall ranking. For example, if
you were in the 99th percentile for a particular test, that would put you in the decile ranking
of 10. A person who scored very low (say, the 5th percentile) would find themselves in a
decile rank of 1 (Glen, 2014).
Like quartiles and percentiles, deciles divide data into smaller parts that are easier to
measure, analyze and understand (Educba, 2020).

The value of the $k$th decile ($D_k$) is given by:

1. the $\frac{kn}{10}$th value, for ungrouped data; and
2. $D_k = LTCB + \left(\frac{\frac{kn}{10} - cf_b}{f_{D_k}}\right) i$, for grouped data

where: D = Decile
k = the value from 1 to 9
LTCB = lower true class boundary of the 𝐷𝑘 class
n = total number of frequencies
𝑐𝑓𝑏 = cumulative frequency below the 𝐷𝑘 class
𝑓𝐷𝑘 = frequency of the 𝐷𝑘 class
𝑖 = class interval size

Example: Ungrouped Data


1. The following data give the price earnings ratio of 12 companies.
16 38 20 20 18 34 7 58 31 19 22 18
Find 𝐷2 and 𝐷7 .

Solution:
1. Rank the given data in increasing order.
7 16 18 18 19 20 20 22 31 34 38 58

a. $D_2 = ?$ Find the $\frac{kn}{10}$th value, with n = 12 and k = 2.

$\frac{kn}{10}$th value $= \frac{2(12)}{10}$th value $=$ 2.4th value ≈ 3rd value (rounding up); the third value is 18;
therefore, $\mathbf{D_2 = 18}$

b. $D_7 = ?$ Find the $\frac{kn}{10}$th value, with n = 12 and k = 7.

$\frac{kn}{10}$th value $= \frac{7(12)}{10}$th value $=$ 8.4th value ≈ 9th value (rounding up); the ninth value is 31;
therefore, $\mathbf{D_7 = 31}$
Supplemental Activity
The following data give the average family income (in thousand pesos) in 1985 for each of the 13 regions in the
Philippines: 58, 31, 27, 38, 29, 20, 24, 21, 18, 23, 27, 28, 24. Find:
1. $D_4$; and
2. $D_9$.

Answers: 𝐷4 = 24, 𝐷9 = 38
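The ungrouped procedure (rank the data, compute kn/10, round up to the next whole position, read off the value) can be sketched in Python as follows. The function name `decile_ungrouped` is hypothetical, and rounding the position up is an assumption taken from the worked example, where the 8.4th value is read as the 9th value.

```python
import math

# Illustrative sketch: k-th decile of ungrouped data.
# Assumption: a fractional position kn/10 is rounded UP to the next
# whole position, as in the worked example (8.4th -> 9th value).

def decile_ungrouped(data, k):
    ordered = sorted(data)                        # rank in increasing order
    position = math.ceil(k * len(ordered) / 10)   # the (kn/10)th value, rounded up
    return ordered[position - 1]                  # positions are 1-based

price_earnings = [16, 38, 20, 20, 18, 34, 7, 58, 31, 19, 22, 18]
print(decile_ungrouped(price_earnings, 7))   # 31

family_income = [58, 31, 27, 38, 29, 20, 24, 21, 18, 23, 27, 28, 24]
print(decile_ungrouped(family_income, 4))    # 24
print(decile_ungrouped(family_income, 9))    # 38
```

Other textbooks and software interpolate between neighbouring values instead of rounding up, so their deciles may differ slightly from the ones obtained here.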

Example: Grouped Data

Below is the frequency distribution of length of service in years of 50 employees of United
Laboratories Inc. Find $D_2$ and $D_7$.

Length of Service (Class interval), LL–UL    Number of Employees (frequency), f
1 – 5                                        5
6 – 10                                       7
11 – 15                                      12
16 – 20                                      13
21 – 25                                      6
26 – 30                                      4
31 – 35                                      3

Solution:

a. $D_2 = ?$
Solve for the $\frac{kn}{10}$th value, with k = 2 and n = 50:
$\frac{kn}{10}$th value $= \frac{2(50)}{10}$th value $=$ 10th value

Length of Service (CI), LL–UL    No. of Employees (f)    cf<
1 – 5                            5                       5   ($cf_b$)
6 – 10                           7   ($f_{D_2}$)         12  ($D_2$ class; contains the 10th value)
11 – 15                          12                      24
16 – 20                          13                      37
21 – 25                          6                       43
26 – 30                          4                       47
31 – 35                          3                       50
                                 n = 50

Solve for $D_2$: k = 2, n = 50, LTCB = 5.5, $cf_b$ = 5, $f_{D_2}$ = 7, i = 5
$D_2 = LTCB + \left(\frac{\frac{2(n)}{10} - cf_b}{f_{D_2}}\right) i$
$D_2 = 5.5 + \left(\frac{10 - 5}{7}\right) 5$
$\mathbf{D_2 = 9.07}$

b. $D_7 = ?$
Solve for the $\frac{kn}{10}$th value, with k = 7 and n = 50:
$\frac{kn}{10}$th value $= \frac{7(50)}{10}$th value $=$ 35th value

Length of Service (CI), LL–UL    No. of Employees (f)    cf<
1 – 5                            5                       5
6 – 10                           7                       12
11 – 15                          12                      24  ($cf_b$)
16 – 20                          13  ($f_{D_7}$)         37  ($D_7$ class; contains the 35th value)
21 – 25                          6                       43
26 – 30                          4                       47
31 – 35                          3                       50
                                 n = 50

Solve for $D_7$: k = 7, n = 50, LTCB = 15.5, $cf_b$ = 24, $f_{D_7}$ = 13, i = 5
$D_7 = LTCB + \left(\frac{\frac{7(n)}{10} - cf_b}{f_{D_7}}\right) i$
$D_7 = 15.5 + \left(\frac{35 - 24}{13}\right) 5$
$\mathbf{D_7 = 19.73}$
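A minimal Python sketch of the grouped-data decile formula is given below as a cross-check of the two results above. The names `grouped_decile`, `intervals` and `freqs` are hypothetical, and the code assumes integer class limits (LTCB = lower limit − 0.5).

```python
from itertools import accumulate

# Illustrative sketch: D_k = LTCB + ((k*n/10 - cf_b) / f_Dk) * i for grouped data.
# Assumption: integer class limits, so LTCB = lower limit - 0.5.

intervals = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
freqs = [5, 7, 12, 13, 6, 4, 3]

def grouped_decile(intervals, freqs, k):
    n = sum(freqs)
    cum = list(accumulate(freqs))        # the "cf<" column
    target = k * n / 10                  # position of the (kn/10)th value
    idx = next(i for i, cf in enumerate(cum) if cf >= target)  # decile class
    lower, upper = intervals[idx]
    cf_below = cum[idx - 1] if idx > 0 else 0
    width = upper - lower + 1            # class interval size i
    return (lower - 0.5) + (target - cf_below) / freqs[idx] * width

print(round(grouped_decile(intervals, freqs, 2), 2))  # 9.07
print(round(grouped_decile(intervals, freqs, 7), 2))  # 19.73
```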

Supplemental Activity: Use the given in the example to solve for 𝑫𝟒 and 𝑫𝟗 .
Answers: $D_4 = 13.83$, $D_9 = 28.00$

Assessment Task 4-1

The following are the average number of minutes required to do an assembly job by
the 15 workers of a manufacturing plant:

77, 85, 63, 54, 62, 78, 80, 48, 63, 79, 69, 55, 63, 78, 71.

Find:

a. 𝑃77

b. 𝑄3

c. 𝐷6

Assessment Task 4-2

The following table gives the frequency distribution of the number of orders received
each day during the past 50 days at the office of a mail-order company.

Number of orders frequency


10-12 4
13-15 8
16-18 18
19-21 10
22-24 4
25-27 6

Find:

a. 𝑃77

b. 𝑄3

c. 𝐷6

Summary

• A measure of position is a method by which the position that a particular data
value has within a given data set can be identified. It is also called a measure of
location, quantiles or fractiles.
• Percentiles are values that divide a distribution into 100 equal parts. To form 100
parts, there are 99 points of divisions, from 𝑃1 to 𝑃99 . Each percentile score
identifies the parts of the distribution below it.
• Quartiles are the summary measures that divide a ranked data set into four equal
parts. Three measures will divide any data set into four equal parts.
• The difference between the first and the third quartile is called the Interquartile
range (IQR).
• One-half this distance is called the semi-interquartile range or the quartile
deviation (QD).
• Deciles are similar to quartiles. But while quartiles sort data into four quarters,
deciles sort data into ten equal parts: The 10th, 20th, 30th, 40th, 50th, 60th, 70th,
80th, 90th and 100th percentiles.

References

• Amid, D. M. (2005). Fundamentals of Statistics. Quezon City: Lorimar Publishing Company Inc.
• Cruz, C. U. (2002). Statistics. Marikina City: Instructional Coverage System Publishing Inc.
• Decile Formula. (2020). https://www.educba.com/decile-formula/
• Febre Jr., F. A. (1987). Introductory Statistics. Quezon City: Phoenix Publishing House, Inc.
• Glen, S. (2017, July 31). "Fractile: Simple Definition". From StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/fractile-simple-definition/
• Glen, S. (2020). What is an Interquartile Range? https://www.statisticshowto.com/probability-and-statistics/interquartile-range/
• Glen, S. (2014, February 21). "What is a Decile?". https://www.statisticshowto.com/decile/
• Lane, D. (n.d.). Semi-Interquartile Range. http://davidmlane.com/hyperstat/A48607.html
• Measures of Position. (n.d.). http://www.milefoot.com/math/stat/desc-positions.htm#:~:text=A%20measure%20of%20position%20is,to%20defining%20such%20a%20measure.
• Yap, N. (2014, October 12). Fractiles. https://www.slideshare.net/nemalynyap/fractiles#:~:text=Fractiles%20are%20measures%20of%20location,Q2%2C%20Q3%2C%20and%20Q4.

MODULE 5
MEASURES OF DISPERSION, SKEWNESS AND
KURTOSIS

Introduction

In statistics, a measure of central tendency gives a single value that represents the
whole data set; however, central tendency alone cannot describe the
observations fully. The measure of dispersion helps to study the variability of the
items. In a statistical sense, dispersion has two meanings: first, it measures the
variation of the items among themselves, and second, it measures the variation
around the average (Statistics Solutions, 2020). Statisticians use summary
measures to describe the amount of variability or spread in a set of data. The most common
measures of variability are the range, the interquartile range (IQR), variance, and standard
deviation (Stat Trek, 2020). The measure of dispersion shows the scattering of the data. It
tells the variation of the data from one another and gives a clear idea about the distribution of the
data. The measure of dispersion shows the homogeneity or the heterogeneity of the distribution
of the observations (Toppr, n.d.). Dispersion is the state of getting dispersed or spread.
Statistical dispersion means the extent to which numerical data are likely to vary about an
average value. In other words, dispersion helps to understand the distribution of the data
(Byjus, 2020).

Learning Outcomes

At the end of this module, students should be able to:

5. Define variance, standard deviation, coefficient of variation, skewness and kurtosis;

6. Compute for variance, standard deviation, coefficient of variation, skewness and
kurtosis; and

7. Interpret skewness and symmetry.

Lesson 1. Variance and Standard Deviation

Variance is the average of the squared deviations (differences) from the mean
(Investopedia, 2020). It is a measurement of the spread between numbers in a data set.
That is, it measures how far each number in the set is from the mean and therefore from
every other number in the set. Variance is calculated by taking the differences between each
number in the data set and the mean, then squaring the differences to make them positive,
and finally dividing the sum of the squares by the number of values in the data set (Hayes,
2019); for a sample, the divisor is $n - 1$, as in the formulas below. The sample variance is
denoted by $s^2$ and the population variance by $\sigma^2$ (Brown, 2019).
The Standard Deviation is a measure of how spread out numbers are (Math is fun,
2017). It is a statistic that measures the dispersion of a dataset relative to its mean and is
calculated as the square root of the variance. The standard deviation is calculated as the
square root of variance by determining each data point's deviation relative to the mean. If
the data points are further from the mean, there is a higher deviation within the data set;
thus, the more spread out the data, the higher the standard deviation (Hargrave, 2020). The
sample standard deviation is denoted by $s$ and the population standard deviation by $\sigma$
(Brown, 2019).
The formulas for variance and standard deviation are given below (Febre, 1987):

Measure of Dispersion    Ungrouped Data                                         Grouped Data
Variance                 $s^2 = \frac{\Sigma(x - \bar{x})^2}{n - 1}$            $s^2 = \frac{\Sigma f(x_m - \bar{x})^2}{n - 1}$
Standard Deviation       $s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}}$       $s = \sqrt{\frac{\Sigma f(x_m - \bar{x})^2}{n - 1}}$

where: $s^2$ = variance
$s$ = standard deviation
$\Sigma$ = summation
$x$ = raw score
$\bar{x}$ = mean
$n$ = total number of items/frequency
$f$ = frequency
$x_m$ = midpoint/class mark

Example: Ungrouped Data


Calculate the variance and standard deviation of the following data:
8, 9, 10, 12, 17, 18, 18, 19, 20, 21.
Solution:
Step 1: Compute for the mean.
$x$     $\bar{x}$
8       15.2
9
10
12
17
18
18
19
20
21
$\Sigma x = 152$
$n = 10$
$\bar{x} = \frac{152}{10} = 15.2$

Step 2: Subtract the mean from x.

$x$     $\bar{x}$     $x - \bar{x}$
8       15.2          −7.2
9                     −6.2
10                    −5.2
12                    −3.2
17                    1.8
18                    2.8
18                    2.8
19                    3.8
20                    4.8
21                    5.8
$\Sigma x = 152$
$n = 10$
Step 3: Get the square of $x - \bar{x}$.

$x$     $\bar{x}$     $x - \bar{x}$     $(x - \bar{x})^2$
8       15.2          −7.2              51.84
9                     −6.2              38.44
10                    −5.2              27.04
12                    −3.2              10.24
17                    1.8               3.24
18                    2.8               7.84
18                    2.8               7.84
19                    3.8               14.44
20                    4.8               23.04
21                    5.8               33.64
$\Sigma x = 152$
$n = 10$

Step 4: Get the sum of $(x - \bar{x})^2$ → $\Sigma(x - \bar{x})^2$

$x$     $\bar{x}$     $x - \bar{x}$     $(x - \bar{x})^2$
8       15.2          −7.2              51.84
9                     −6.2              38.44
10                    −5.2              27.04
12                    −3.2              10.24
17                    1.8               3.24
18                    2.8               7.84
18                    2.8               7.84
19                    3.8               14.44
20                    4.8               23.04
21                    5.8               33.64
$\Sigma x = 152$
$\Sigma(x - \bar{x})^2 = 217.6$
$n = 10$

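Step 5: Compute for variance and standard deviation

$s^2 = \frac{\Sigma(x - \bar{x})^2}{n - 1} = \frac{217.6}{10 - 1} = \frac{217.6}{9}$
$\mathbf{s^2 = 24.18}$

$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{217.6}{10 - 1}} = \sqrt{\frac{217.6}{9}} = \sqrt{24.18}$
$\mathbf{s = 4.92}$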
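The five steps above can be mirrored in a few lines of Python. This is only an illustrative sketch; the variable names are arbitrary, and the last line cross-checks the hand computation against `statistics.variance` from the Python standard library, which also uses the n − 1 divisor.

```python
import math
import statistics

data = [8, 9, 10, 12, 17, 18, 18, 19, 20, 21]

n = len(data)
mean = sum(data) / n                              # Step 1: mean
deviations = [x - mean for x in data]             # Step 2: x - mean
squared = [d ** 2 for d in deviations]            # Step 3: (x - mean)^2
sum_squared = sum(squared)                        # Step 4: sum of squares
variance = sum_squared / (n - 1)                  # Step 5: sample variance
std_dev = math.sqrt(variance)                     #         sample standard deviation

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 15.2 24.18 4.92

# Cross-check with the standard library (sample variance, divisor n - 1).
assert math.isclose(variance, statistics.variance(data))
```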

Supplemental Activity: Solve for the variance and standard deviation from the given data: 23, 18, 27,
30, 43, 78.

Answer: 𝑠 2 = 484.3, 𝑠 = 22.01

Example: Grouped Data


Determine the variance and standard deviation for the following data.
CI         f
18 – 26    2
27 – 35    3
36 – 44    5
45 – 53    6
54 – 62    10
63 – 71    11
72 – 80    12
81 – 89    8
90 – 98    3
Solution:
Step 1: Compute for the mean.
CI         $f$     $x_m$     $f x_m$     $\bar{x}$
18 – 26    2       22        44          63.7
27 – 35    3       31        93
36 – 44    5       40        200
45 – 53    6       49        294
54 – 62    10      58        580
63 – 71    11      67        737
72 – 80    12      76        912
81 – 89    8       85        680
90 – 98    3       94        282
$n = 60$, $\Sigma f x_m = 3822$

$\bar{x} = \frac{3822}{60} = 63.7$
Step 2: Subtract the mean from each midpoint and get its absolute value
→ $|x_m - \bar{x}|$

CI         $f$     $x_m$     $f x_m$     $\bar{x}$     $|x_m - \bar{x}|$
18 – 26    2       22        44          63.7          41.7
27 – 35    3       31        93                        32.7
36 – 44    5       40        200                       23.7
45 – 53    6       49        294                       14.7
54 – 62    10      58        580                       5.7
63 – 71    11      67        737                       3.3
72 – 80    12      76        912                       12.3
81 – 89    8       85        680                       21.3
90 – 98    3       94        282                       30.3
$n = 60$, $\Sigma f x_m = 3822$

Step 3: Multiply the result in Step 2 by the frequency → $f|x_m - \bar{x}|$.

CI         $f$     $x_m$     $f x_m$     $\bar{x}$     $|x_m - \bar{x}|$     $f|x_m - \bar{x}|$
18 – 26    2       22        44          63.7          41.7                  83.4
27 – 35    3       31        93                        32.7                  98.1
36 – 44    5       40        200                       23.7                  118.5
45 – 53    6       49        294                       14.7                  88.2
54 – 62    10      58        580                       5.7                   57
63 – 71    11      67        737                       3.3                   36.3
72 – 80    12      76        912                       12.3                  147.6
81 – 89    8       85        680                       21.3                  170.4
90 – 98    3       94        282                       30.3                  90.9
$n = 60$, $\Sigma f x_m = 3822$

Step 4: Multiply the 6th and 7th columns and then get the sum
→ $\Sigma f(x_m - \bar{x})^2$.
CI         $f$     $x_m$     $f x_m$     $\bar{x}$     $|x_m - \bar{x}|$     $f|x_m - \bar{x}|$     $f(x_m - \bar{x})^2$
18 – 26    2       22        44          63.7          41.7                  83.4                   3477.78
27 – 35    3       31        93                        32.7                  98.1                   3207.87
36 – 44    5       40        200                       23.7                  118.5                  2808.45
45 – 53    6       49        294                       14.7                  88.2                   1296.54
54 – 62    10      58        580                       5.7                   57                     324.9
63 – 71    11      67        737                       3.3                   36.3                   119.79
72 – 80    12      76        912                       12.3                  147.6                  1815.48
81 – 89    8       85        680                       21.3                  170.4                  3629.52
90 – 98    3       94        282                       30.3                  90.9                   2754.27
$n = 60$, $\Sigma f x_m = 3822$, $\Sigma f(x_m - \bar{x})^2 = 19434.6$

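Step 5: Compute for the variance and standard deviation.

$s^2 = \frac{\Sigma f(x_m - \bar{x})^2}{n - 1} = \frac{19434.6}{60 - 1} = \frac{19434.6}{59}$
$\mathbf{s^2 = 329.4}$

$s = \sqrt{\frac{\Sigma f(x_m - \bar{x})^2}{n - 1}} = \sqrt{\frac{19434.6}{60 - 1}} = \sqrt{\frac{19434.6}{59}} = \sqrt{329.4}$
$\mathbf{s = 18.15}$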
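The grouped-data steps can be sketched in Python in the same way. The sketch is illustrative only: the list name `classes` is arbitrary, and each class mark is taken as the average of the class limits, as in the table above.

```python
import math

# Illustrative sketch: variance and standard deviation of grouped data.
# Each entry is (lower_limit, upper_limit, frequency).
classes = [(18, 26, 2), (27, 35, 3), (36, 44, 5), (45, 53, 6), (54, 62, 10),
           (63, 71, 11), (72, 80, 12), (81, 89, 8), (90, 98, 3)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]         # class marks x_m
freqs = [f for _, _, f in classes]

mean = sum(f * xm for f, xm in zip(freqs, midpoints)) / n    # sum(f*x_m) / n
sum_sq = sum(f * (xm - mean) ** 2 for f, xm in zip(freqs, midpoints))
variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

print(round(mean, 2), round(sum_sq, 1), round(variance, 2), round(std_dev, 2))
# 63.7 19434.6 329.4 18.15
```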

Supplemental Activity: Solve for the variance and standard deviation from the data below:

CI f
0–9 4
10 – 19 9
20 – 29 6
30 – 39 4
40 – 49 2

Answer: 𝑠 2 = 140.67, 𝑠 = 11.86

Lesson 2. Coefficient of Variation (Febre, 1987)

The coefficient of variation (CV) is one type of measure of relative dispersion which
expresses the standard deviation as a percentage of the mean.

The formulas for the coefficient of variation are:

$cv = \frac{s}{\bar{x}} \times 100\%$ for sample data

$CV = \frac{\sigma}{\mu} \times 100\%$ for population data

Example 1: Calculate the coefficient of variation of each of the following samples and
interpret the result.

Sample A: 24, 28, 32, 35, 37, 43, 48, 59, 62, 64

Sample B: 212, 218, 220, 223, 234, 238, 245, 258

Solution:

Sample A: Compute for the mean and standard deviation first to calculate for the
coefficient of variation.

$x$     $\bar{x}$     $x - \bar{x}$     $(x - \bar{x})^2$
24      43.2          −19.2             368.64
28                    −15.2             231.04
32                    −11.2             125.44
35                    −8.2              67.24
37                    −6.2              38.44
43                    −0.2              0.04
48                    4.8               23.04
59                    15.8              249.64
62                    18.8              353.44
64                    20.8              432.64
$\Sigma x = 432$
$\Sigma(x - \bar{x})^2 = 1889.6$
$n = 10$

$\bar{x} = \frac{\Sigma x}{n} = \frac{432}{10} = 43.2$

$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1889.6}{10 - 1}} = 14.49$

$cv = \frac{s}{\bar{x}} \times 100\% = \frac{14.49}{43.2} \times 100\% = 33.54\%$

Sample B: Compute for the mean and standard deviation first to calculate for the coefficient
of variation.

$x$     $\bar{x}$     $x - \bar{x}$     $(x - \bar{x})^2$
212     231           −19               361
218                   −13               169
220                   −11               121
223                   −8                64
234                   3                 9
238                   7                 49
245                   14                196
258                   27                729
$\Sigma x = 1848$
$\Sigma(x - \bar{x})^2 = 1698$
$n = 8$

$\bar{x} = \frac{\Sigma x}{n} = \frac{1848}{8} = 231$

$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1698}{8 - 1}} = 15.57$

$cv = \frac{s}{\bar{x}} \times 100\% = \frac{15.57}{231} \times 100\% = 6.74\%$

The standard deviation of A is 33.54% of the mean, while the standard deviation of B is
only 6.74% of the mean. This means that, relative to the mean, the values in A are more
scattered or more varied than those in B.
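As a quick check of the two coefficients of variation, the sketch below uses `statistics.mean` and `statistics.stdev` from the Python standard library (the latter is the sample standard deviation, with divisor n − 1). The helper name `coefficient_of_variation` is hypothetical.

```python
import statistics

# Illustrative sketch: cv = (s / mean) * 100% for sample data.
def coefficient_of_variation(sample):
    return statistics.stdev(sample) / statistics.mean(sample) * 100

sample_a = [24, 28, 32, 35, 37, 43, 48, 59, 62, 64]
sample_b = [212, 218, 220, 223, 234, 238, 245, 258]

cv_a = coefficient_of_variation(sample_a)
cv_b = coefficient_of_variation(sample_b)
print(round(cv_a, 2), round(cv_b, 2))  # 33.54 6.74
```

Because the CV is unit-free, it allows the spread of the two samples to be compared even though their means are very different.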

Example 2: Calculate the coefficient of variation for the following distribution.

𝐶𝐼 𝑓
18 – 26 2
27 – 35 3
36 – 44 5
45 – 53 6
54 – 62 10
63 – 71 11
72 – 80 12
81 – 89 8
90 – 98 3
Solution:

Compute for the mean and standard deviation first to calculate for the coefficient of variation.

CI         $f$     $x_m$     $f x_m$     $\bar{x}$     $|x_m - \bar{x}|$     $f|x_m - \bar{x}|$     $f(x_m - \bar{x})^2$
18 – 26    2       22        44          63.7          41.7                  83.4                   3477.78
27 – 35    3       31        93                        32.7                  98.1                   3207.87
36 – 44    5       40        200                       23.7                  118.5                  2808.45
45 – 53    6       49        294                       14.7                  88.2                   1296.54
54 – 62    10      58        580                       5.7                   57                     324.9
63 – 71    11      67        737                       3.3                   36.3                   119.79
72 – 80    12      76        912                       12.3                  147.6                  1815.48
81 – 89    8       85        680                       21.3                  170.4                  3629.52
90 – 98    3       94        282                       30.3                  90.9                   2754.27
$n = 60$, $\Sigma f x_m = 3822$, $\Sigma f(x_m - \bar{x})^2 = 19434.6$

$\bar{x} = \frac{\Sigma f x_m}{n} = \frac{3822}{60} = 63.7$

$s = \sqrt{\frac{\Sigma f(x_m - \bar{x})^2}{n - 1}} = \sqrt{\frac{19434.6}{60 - 1}} = 18.15$

$cv = \frac{s}{\bar{x}} \times 100\% = \frac{18.15}{63.7} \times 100\% = 28.49\%$

Lesson 3. Measure of Skewness


A frequency curve not symmetrical about the mean is said to be skewed. It is said to
be positively skewed if it tails off to the right, and negatively skewed if it tails off to the left.
The relationship between the mean and the median is related to the direction of skewness. If
the mean is greater than the median, the curve is positively skewed but if the mean is less
than the median, the curve is negatively skewed (Amid, 2005). Skewness is the degree of
distortion from the symmetrical bell curve or the normal distribution. It measures the lack of
symmetry in data distribution (Dugar, 2018).


Figure 5.1. Types of Skewness (Dugar, 2018)

Figure 5.2. Positively & Negatively Skewed Distribution (Finance Train, 2020)

With the use of the standard deviation, it is possible to obtain a measure of skewness which
indicates both the direction and magnitude of skewness of a frequency distribution. It is called the
Pearsonian coefficient of skewness (Sk), given by the formula (Amid, 2005):

$Sk = \frac{3(\bar{x} - Mdn)}{s}$

The algebraic sign of the value of Sk indicates the direction of skewness while
magnitude of the value of Sk indicates the extent to which the curve is skewed. The value of
Sk is positive if the mean is greater than the median, negative if the mean is less than the
median, and zero if they are equal. The curve is bell-shaped and symmetrical when Sk=0.
As a rule the closer the coefficient of skewness is to zero, the less skewed the distribution
will be and the farther it is from zero, the more skewed the distribution will be (Amid, 2005).

If:

Sk > 0, positively skewed distribution

Sk < 0, negatively skewed distribution

Sk = 0, normal distribution (symmetrical)

Example 1: Find the measure of skewness given the following set of data: 24, 28, 32, 35, 37,
43, 48, 59, 62, 64. Interpret the result.

Solution: Solve for the mean, median, standard deviation and skewness.

$x$     $\bar{x}$     $x - \bar{x}$     $(x - \bar{x})^2$
24      43.2          −19.2             368.64
28                    −15.2             231.04
32                    −11.2             125.44
35                    −8.2              67.24
37                    −6.2              38.44
43                    −0.2              0.04
48                    4.8               23.04
59                    15.8              249.64
62                    18.8              353.44
64                    20.8              432.64
$\Sigma x = 432$
$\Sigma(x - \bar{x})^2 = 1889.6$
$n = 10$

$\bar{x} = \frac{\Sigma x}{n} = \frac{432}{10} = 43.2$

$Mdn = \frac{37 + 43}{2} = 40$

$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1889.6}{10 - 1}} = 14.49$

$Sk = \frac{3(\bar{x} - Mdn)}{s} = \frac{3(43.2 - 40)}{14.49} = 0.66$; positively skewed distribution
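The Pearsonian coefficient for ungrouped data can be sketched in Python with the standard `statistics` module. The function names `pearson_skewness` and `direction` are hypothetical; `statistics.stdev` is the sample standard deviation (divisor n − 1), matching the formula for s used in this module.

```python
import statistics

# Illustrative sketch: Sk = 3 * (mean - median) / s for ungrouped data.
def pearson_skewness(sample):
    mean = statistics.mean(sample)
    median = statistics.median(sample)   # average of the two middle values if n is even
    s = statistics.stdev(sample)         # sample standard deviation
    return 3 * (mean - median) / s

def direction(sk):
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

data = [24, 28, 32, 35, 37, 43, 48, 59, 62, 64]
sk = pearson_skewness(data)
print(round(sk, 2), direction(sk))  # 0.66 positively skewed
```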

Example 2: Find the measure of skewness given the following set of data: 212, 218, 220,
223, 234, 238, 245, 258. Interpret the result.

Solution: Solve for the mean, median, standard deviation and skewness.

$x$     $\bar{x}$     $x - \bar{x}$     $(x - \bar{x})^2$
212     231           −19               361
218                   −13               169
220                   −11               121
223                   −8                64
234                   3                 9
238                   7                 49
245                   14                196
258                   27                729
$\Sigma x = 1848$
$\Sigma(x - \bar{x})^2 = 1698$
$n = 8$

$\bar{x} = \frac{\Sigma x}{n} = \frac{1848}{8} = 231$

$Mdn = \frac{223 + 234}{2} = 228.5$

$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1698}{8 - 1}} = 15.57$

$Sk = \frac{3(\bar{x} - Mdn)}{s} = \frac{3(231 - 228.5)}{15.57} = 0.48$; positively skewed distribution

Example 3: Compute for skewness and interpret the result.

𝐶𝐼 𝑓
18 – 26 2
27 – 35 3
36 – 44 5
45 – 53 6
54 – 62 10
63 – 71 11
72 – 80 12
81 – 89 8
90 – 98 3

Solution: Compute for the mean, median and standard deviation.

CI         $f$     $x_m$     $f x_m$     $\bar{x}$     $|x_m - \bar{x}|$     $f|x_m - \bar{x}|$     $f(x_m - \bar{x})^2$     cf<
18 – 26    2       22        44          63.7          41.7                  83.4                   3477.78                  2
27 – 35    3       31        93                        32.7                  98.1                   3207.87                  5
36 – 44    5       40        200                       23.7                  118.5                  2808.45                  10
45 – 53    6       49        294                       14.7                  88.2                   1296.54                  16
54 – 62    10      58        580                       5.7                   57                     324.9                    26
63 – 71    11      67        737                       3.3                   36.3                   119.79                   37
72 – 80    12      76        912                       12.3                  147.6                  1815.48                  49
81 – 89    8       85        680                       21.3                  170.4                  3629.52                  57
90 – 98    3       94        282                       30.3                  90.9                   2754.27                  60
$n = 60$, $\Sigma f x_m = 3822$, $\Sigma f(x_m - \bar{x})^2 = 19434.6$

$\bar{x} = \frac{\Sigma f x_m}{n} = \frac{3822}{60} = 63.7$

$s = \sqrt{\frac{\Sigma f(x_m - \bar{x})^2}{n - 1}} = \sqrt{\frac{19434.6}{60 - 1}} = 18.15$

$Mdn = LTCB + \left(\frac{\frac{n}{2} - cf_b}{f_{Mdn}}\right) i = 62.5 + \left(\frac{30 - 26}{11}\right) 9 = 65.77$

$Sk = \frac{3(\bar{x} - Mdn)}{s} = \frac{3(63.7 - 65.77)}{18.15} = -0.34$; negatively skewed distribution
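For grouped data, the same coefficient needs the grouped mean, the grouped standard deviation and the interpolated median. The sketch below strings these together; it is illustrative only, uses arbitrary variable names, and assumes integer class limits (LTCB = lower limit − 0.5).

```python
import math

# Illustrative sketch: Pearsonian skewness for grouped data.
# Each entry is (lower_limit, upper_limit, frequency).
classes = [(18, 26, 2), (27, 35, 3), (36, 44, 5), (45, 53, 6), (54, 62, 10),
           (63, 71, 11), (72, 80, 12), (81, 89, 8), (90, 98, 3)]

n = sum(f for _, _, f in classes)
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
sum_sq = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)
s = math.sqrt(sum_sq / (n - 1))

# Median: Mdn = LTCB + ((n/2 - cf_b) / f_Mdn) * i
cf_below = 0
for lo, hi, f in classes:
    if cf_below + f >= n / 2:            # median class found
        median = (lo - 0.5) + (n / 2 - cf_below) / f * (hi - lo + 1)
        break
    cf_below += f

sk = 3 * (mean - median) / s
print(round(mean, 2), round(median, 2), round(s, 2), round(sk, 2))
# 63.7 65.77 18.15 -0.34
```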

Lesson 4. Measure of Kurtosis

Curves of distributions having the same coefficient of skewness may still differ
significantly. Symmetrical curves may vary in shape and this may be because curves do not
have the same peakedness, a property of curves which can be described by computing for
the value called measure of kurtosis (Febre, 1987).

Kurtosis (K) is the measure of peakedness or flatness of a data distribution relative to a
normal distribution (Faculty of the Institute of Statistics, n.d.). There are three types of
symmetrical curves namely: mesokurtic curve, leptokurtic curve and platykurtic curve.
Mesokurtic curve shows a normal or ideal curve, leptokurtic curve shows a more peaked
curve and a platykurtic curve shows a flat-topped curve (Amid, 2005).

Figure 5.3. Types of Kurtosis (Finance Train, 2020)

The kurtosis of a set of data is obtained by simply dividing the fourth moment about
the mean by the square of the variance (Febre, 1987).

$K = \frac{\Sigma(x - \bar{x})^4}{n s^4}$ for ungrouped data (Febre, 1987)

$K = \frac{\Sigma f(x_m - \bar{x})^4}{n s^4}$ for grouped data (Febre, 1987)

If K = 3, mesokurtic;
K > 3, leptokurtic; and
K < 3, platykurtic (Febre, 1987).

Example 1: Find the measure of kurtosis given the following set of data: 24, 28, 32, 35, 37,
43, 48, 59, 62, 64. Interpret the result.

Solution: Solve for the mean and standard deviation to solve for kurtosis.

$(x - \bar{x})^4 = ((x - \bar{x})^2)^2$

$x$     $\bar{x}$     $x - \bar{x}$     $(x - \bar{x})^2$     $(x - \bar{x})^4$
24      43.2          −19.2             368.64                135895.45
28                    −15.2             231.04                53379.48
32                    −11.2             125.44                15735.19
35                    −8.2              67.24                 4521.22
37                    −6.2              38.44                 1477.63
43                    −0.2              0.04                  0.00
48                    4.8               23.04                 530.84
59                    15.8              249.64                62320.13
62                    18.8              353.44                124919.83
64                    20.8              432.64                187177.37
$\Sigma x = 432$
$\Sigma(x - \bar{x})^2 = 1889.6$, $\Sigma(x - \bar{x})^4 = 585957.14$
$n = 10$

$\bar{x} = \frac{\Sigma x}{n} = \frac{432}{10} = 43.2$

$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1889.6}{10 - 1}} = 14.49$

$K = \frac{\Sigma(x - \bar{x})^4}{n s^4} = \frac{585957.14}{(10)(14.49)^4} = 1.33$

Since the computed K < 3, then the curve is PLATYKURTIC.
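A short Python sketch of the ungrouped kurtosis formula is given below. The function names `kurtosis` and `classify` are hypothetical. The code keeps the unrounded sample variance, so its value of K may differ in the last decimal place from a hand computation that rounds s first; the classification is unaffected.

```python
import math

# Illustrative sketch: K = sum((x - mean)^4) / (n * s^4) for ungrouped data,
# where s^2 is the sample variance (divisor n - 1).
def kurtosis(sample):
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
    fourth = sum((x - mean) ** 4 for x in sample)         # sum of fourth powers
    return fourth / (n * s2 ** 2)                         # s^4 = (s^2)^2

def classify(k):
    if math.isclose(k, 3):
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

data = [24, 28, 32, 35, 37, 43, 48, 59, 62, 64]
k = kurtosis(data)
print(round(k, 2), classify(k))
```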

Example 2: Compute for kurtosis and interpret the result.

𝐶𝐼 𝑓
18 – 26 2
27 – 35 3
36 – 44 5
45 – 53 6
54 – 62 10
63 – 71 11
72 – 80 12
81 – 89 8
90 – 98 3

Solution: Solve for the mean and standard deviation to solve for kurtosis.

CI         $f$     $x_m$     $f x_m$     $\bar{x}$     $|x_m - \bar{x}|$     $(x_m - \bar{x})^2$     $f(x_m - \bar{x})^2$     $f(x_m - \bar{x})^4$
18 – 26    2       22        44          63.7          41.7                  1738.89                 3477.78                  6047476.86
27 – 35    3       31        93                        32.7                  1069.29                 3207.87                  3430143.31
36 – 44    5       40        200                       23.7                  561.69                  2808.45                  1577478.28
45 – 53    6       49        294                       14.7                  216.09                  1296.54                  280169.33
54 – 62    10      58        580                       5.7                   32.49                   324.9                    10556
63 – 71    11      67        737                       3.3                   10.89                   119.79                   1304.51
72 – 80    12      76        912                       12.3                  151.29                  1815.48                  274663.97
81 – 89    8       85        680                       21.3                  453.69                  3629.52                  1646676.93
90 – 98    3       94        282                       30.3                  918.09                  2754.27                  2528667.74
$n = 60$, $\Sigma f x_m = 3822$, $\Sigma f(x_m - \bar{x})^2 = 19434.6$, $\Sigma f(x_m - \bar{x})^4 = 15797136.93$

$\bar{x} = \frac{\Sigma f x_m}{n} = \frac{3822}{60} = 63.7$

$s = \sqrt{\frac{\Sigma f(x_m - \bar{x})^2}{n - 1}} = \sqrt{\frac{19434.6}{60 - 1}} = 18.15$

$K = \frac{\Sigma f(x_m - \bar{x})^4}{n s^4} = \frac{15797136.93}{(60)(18.15)^4} = 2.43$; PLATYKURTIC
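The grouped-data kurtosis can be cross-checked with the following sketch, which weights the second and fourth powers of the class-mark deviations by the class frequencies. Variable names are arbitrary, and the code is an illustration rather than part of the module.

```python
# Illustrative sketch: K = sum(f * (x_m - mean)^4) / (n * s^4) for grouped data.
# Each entry is (lower_limit, upper_limit, frequency).
classes = [(18, 26, 2), (27, 35, 3), (36, 44, 5), (45, 53, 6), (54, 62, 10),
           (63, 71, 11), (72, 80, 12), (81, 89, 8), (90, 98, 3)]

n = sum(f for _, _, f in classes)
marks = [((lo + hi) / 2, f) for lo, hi, f in classes]     # (class mark, frequency)

mean = sum(f * xm for xm, f in marks) / n
sum_sq = sum(f * (xm - mean) ** 2 for xm, f in marks)     # sum of f(x_m - mean)^2
sum_fourth = sum(f * (xm - mean) ** 4 for xm, f in marks) # sum of f(x_m - mean)^4

s2 = sum_sq / (n - 1)                                     # sample variance
k = sum_fourth / (n * s2 ** 2)                            # s^4 = (s^2)^2

print(round(k, 2))                                        # 2.43
print("platykurtic" if k < 3 else "leptokurtic or mesokurtic")  # platykurtic
```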

Assessment Task 5-1

Given the scores of ten girls in a 30-item quiz:

18 19 23 7 20 21 24 26 18 22


Find:

1. Variance
2. Standard deviation
3. Coefficient of variation
4. Skewness (interpret the result)
5. Kurtosis (interpret the result)

Assessment Task 5-2

The following is a frequency distribution of an admission test.

CI f
21-30 8
31-40 11
41-50 15
51-60 18
61-70 20
71-80 12
81-90 9
91-100 7

Find:

1. Variance
2. Standard deviation
3. Coefficient of variation
4. Skewness (interpret the result)
5. Kurtosis (interpret the result)

Summary

• Variance is the average of the squared deviations (differences) from the mean.
• The standard deviation is calculated as the square root of variance by
determining each data point's deviation relative to the mean.
• The coefficient of variation (CV) is one type of measure of relative dispersion
which expresses the standard deviation as a percentage of the mean.
• Skewness is the degree of distortion from the symmetrical bell curve or the normal
distribution. It measures the lack of symmetry in data distribution.
• Kurtosis (K) is the measure of peakedness or flatness of a data distribution
relative to a normal distribution.

References

• Amid, D. M. (2005). Fundamentals of Statistics. Lorimar Publishing Company Inc. Quezon City.
• Brown, S. (2019). Statistics Symbol Sheet. https://brownmath.com/swt/symbol.htm#:~:text=For%20variance%2C%20apply%20a%20squared%20symbol%20(s%C2%B2%20or%20%CF%83%C2%B2).&text=%CE%BC%20and%20%CF%83%20can%20take,mean%20or%20standard%20deviation%20of.
• Byjus. (2020). Dispersion and Measures of Dispersion. https://byjus.com/maths/dispersion/
• Dugar, D. (2018, August 24). Skew and Kurtosis: 2 Important Statistics terms you need to know in Data Science. https://codeburst.io/2-important-statistics-terms-you-need-to-know-in-data-science-skewness-and-kurtosis-388fef94eeaa
• Febre Jr., F. A. (1987). Introduction to Statistics. Phoenix Publishing House, Inc.
• Hargrave, M. (2020, September 19). Standard Deviation Definition. https://www.investopedia.com/terms/s/standarddeviation.asp#:~:text=The%20standard%20deviation%20is%20a,square%20root%20of%20the%20variance.&text=If%20the%20data%20points%20are,the%20higher%20the%20standard%20deviation.
• Hayes, A. (2019, September 2). Variance. https://www.investopedia.com/terms/v/variance.asp
• Faculty of the Institute of Statistics. (n.d.). Workbook in Statistics 1. UP Los Banos, College Laguna 4031
• Finance Train. (2020). Interpretation of Skewness, Kurtosis, Coskewness, Cokurtosis. https://financetrain.com/interpretation-of-skewness-kurtosis-coskewness-cokurtosis/
• Investopedia. (2020, July 14). Standard Deviation vs. Variance: What’s the Difference? https://www.investopedia.com/ask/answers/021215/what-difference-between-standard-deviation-and-variance.asp
• Measures of Dispersion. (n.d.). https://www.toppr.com/guides/business-mathematics-and-statistics/measures-of-central-tendency-and-dispersion/measure-of-dispersion/#:~:text=As%20the%20name%20suggests%2C%20the,the%20distribution%20of%20the%20observations.
• Standard Deviation and Variance. (2017). https://www.mathsisfun.com/data/standard-deviation.html#:~:text=The%20Standard%20Deviation%20is%20a%20measure%20of%20how%20spread%20out%20numbers%20are.&text=The%20formula%20is%20easy%3A%20it,square%20root%20of%20the%20Variance.
• Statistics Solutions. (2020). Dispersion. https://www.statisticssolutions.com/dispersion/
• Stat Trek. (2020). How to Measure Variability. https://stattrek.com/descriptive-statistics/variability.aspx#:~:text=Statisticians%20use%20summary%20measures%20to,%2C%20variance%2C%20and%20standard%20deviation.
