MODULE IN STATISTICS Measures Tendency Variability
3
MEASURES OF CENTRAL TENDENCY

OUTLINE
• INTRODUCTION
• MEAN
• MEDIAN
• MODE

OBJECTIVES
After completing this chapter, you should be able to:
1. Define the mean, median, and mode.
2. Understand the purposes of measures of central tendency.
3. Calculate the mean, median, and mode of the given set of data.
INTRODUCTION
Now that we have visualized our data to understand its shape, we can begin with numerical analyses! The descriptive statistics presented in this chapter begin to describe the distribution of our data objectively and mathematically. This is our first step into statistical analysis!
Statistical data can be described in a variety of ways, depending on two factors: the nature of the data and the reason for which the data was obtained. When describing data numerically or verbally, make sure the description is neither too short nor too long. Measures of central tendency let us compare two or more distributions from the same time period, or the same distribution over time. For example, an average can be used to compare coffee consumption in two separate territories over the same time period or in a single territory over two years, such as 2021 and 2022.
Finding the center of a distribution may be more challenging than first thought. Imagine this situation: You are in a
class with just four other students, and the five of you took a 5-point pop quiz. Today your teacher is walking
around the room, handing back the quizzes. She stops at your desk and hands you your paper. Written in bold black
ink on the front is “3/5.” How do you react? Are you happy with your score of 3 or disappointed? How do you
decide? You might calculate your percentage correct, realize it is 60%, and be appalled. But it is more likely that
when deciding how to react to your performance, you will want additional information. What additional
information would you like? If you are like most students, you will immediately ask your neighbors, “How much
did you get?” and then ask the teacher, “How did the class do?” In other words, the additional information you want
is how your quiz score compares to other students’ scores. You therefore understand the importance of comparing
your score to the class distribution of scores. Should your score of 3 turn out to be among the higher scores, then
you’ll be pleased after all. On the other hand, if 3 is among the lower scores in the class, you won’t be quite so
happy.
In statistics, a measure of central tendency is a descriptive summary of a data set. It reflects the center of the data distribution through a single value. Although it presents a summary of the dataset, it does not provide information on individual values in the dataset. In general, several statistical measures can be used to determine a dataset's central tendency.
The measure of central tendency is selected based on the properties of the data.
If you have a symmetrical distribution of continuous data, all three measures of central tendency are appropriate. Most of the time, however, the analyst uses the mean because it involves all the values in the distribution or dataset.
If you have a skewed distribution, the best measure of central tendency is the median.
If you have ordinal data, the median and the mode are the best choices for measuring central tendency.
If you have categorical data, the mode is the best choice for finding the central tendency.
In this chapter, we'll look at the three most important measures of central tendency: mean, median, and mode.
THE MEAN
The arithmetic mean is the most often used measure of central tendency. It is also known simply as the mean or the computed average. It is defined as the sum of the values of a group of items divided by the number of such items. The mean of a sample of scores on a variable x is symbolized by x̄ (x-bar), and the mean of the population is represented by μ (mu). Researchers are frequently obliged to estimate μ from x̄, since they cannot measure every item in the population.
Fig. 2.1
Graphical display of the relationship of a sample mean from the population mean of a data set
CHARACTERISTICS OF MEAN
When using sample data to make inferences about a population, the mean is the more dependable or stable measure to use. It is the point at which the values on both sides are balanced. The mean is sensitive to extreme values, whether they are high or low. When the distribution is extremely skewed, it loses its representative character. The mean also cannot be determined when the distribution contains open-ended intervals and no additional information is available, which makes it an improper average to use in that case.
USES OF MEAN
The mean is the most widely used, most easily understood, and most generally recognized average. When the distribution is symmetrical, it is the optimal measure to use, and it is a valuable measure for inferential statistics. It can also be used to calculate the average of a set of values after each one has been weighted. The result is known as the weighted mean (or weighted average).
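The sample mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\text{sum of the data}}{\text{number of pieces of data}}$$

In simpler form, the formula may be presented as

$$\bar{x} = \frac{\sum x}{n}$$

where x̄ = sample mean
x = value of each item
n = number of items in the sample
Σ = "the sum of"

For a population, the mean is

$$\mu = \frac{\sum x}{N}$$

where N is the number of items in the population.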
Example 1:
Let us consider the scores of Triz Glee in her statistics class. (The scores have been arranged in descending order.)
76
76
62
51
45
27
12
6
2
357 Total of all scores
Since Σx = 357 and n = 9 in the case of Triz Glee's scores, her mean score is

$$\bar{x} = \frac{\sum x}{n} = \frac{357}{9} = 39.67$$
Example 2:
A random sample of six cashiers in a grocery store shows the following balances at the end of the day:
₱16,640.39; ₱26,915.59; ₱6,827.08; ₱101,791.17; ₱61,811.75; and ₱20,244.12. Compute the mean balance.
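The two examples above can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` and the variable names are chosen here for illustration, and the balances are entered without the peso sign.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Example 1: Triz Glee's nine scores
scores = [76, 76, 62, 51, 45, 27, 12, 6, 2]

# Example 2: end-of-day balances of the six cashiers (in pesos)
balances = [16640.39, 26915.59, 6827.08, 101791.17, 61811.75, 20244.12]

print(round(mean(scores), 2))    # 39.67
print(round(mean(balances), 2))  # 39038.35
```

Under this sketch, the mean balance in Example 2 comes out to about ₱39,038.35, that is, ₱234,230.10 divided by 6.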
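The weighted mean is computed as

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

or, in simpler form,

$$\bar{x} = \frac{\sum fx}{\sum f}$$

where x = value of each item and f = weight (or frequency) of each item.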
Example 1:
The final grades of Levi Rosell at the end of the semester are the following:
Subjects Grades (x) Units (f)
Example 2:
Consider the daily earnings of the employees of a small accounting firm: 210, 210, 850, 360, 310, 310, 210, 210, 960, 210. Find the mean daily salary.
Solution 2:
Sum of salaries = 210 + 210 + 210 + 210 + 210 + 310 + 310 + 360 + 850 + 960
The sum of the salaries can be written using the frequency of each salary:
Sum of salaries = 210(5) + 310(2) + 360(1) + 850(1) + 960(1)

Salary (x)    Frequency (f)    fx
210           5                1,050
310           2                620
360           1                360
850           1                850
960           1                960
Total         10               3,840

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{3{,}840}{10} = 384$$

x̄ = ₱384
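The same weighted-mean computation can be sketched in Python. The function name `weighted_mean` is an assumption made for this illustration. The salaries and frequencies are those of the accounting firm example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (weight * value) divided by the sum of the weights."""
    total_weight = sum(weights)
    weighted_sum = sum(x * f for x, f in zip(values, weights))
    return weighted_sum / total_weight

salaries = [210, 310, 360, 850, 960]   # distinct daily salaries (x)
frequencies = [5, 2, 1, 1, 1]          # number of employees at each salary (f)

print(weighted_mean(salaries, frequencies))  # 384.0
```

Averaging the ten raw salaries directly gives the same value, which is why the frequency form is simply a shortcut for repeated values.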
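A. Long Method

$$\bar{x} = \frac{\sum_{i} f_i x_i}{n} \quad \text{(for a sample)}$$

$$\mu = \frac{\sum_{i} f_i x_i}{N} \quad \text{(for a population)}$$

or, in simpler form,

$$\bar{x} = \frac{\sum fx}{n}, \qquad \mu = \frac{\sum fX}{N}$$

where f = frequency of each class and x = class mark (midpoint) of each class.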
Example:
Compute for the mean height of 50 men using the long method.
Height (inches)    f     Class mark (x)    fx
61-63              2     62                124
64-66              5     65                325
67-69              12    68                816
70-72              15    71                1,065
73-75              8     74                592
76-78              5     77                385
79-81              3     80                240
Total              50                      3,547

$$\bar{x} = \frac{\sum fx}{n} = \frac{3{,}547}{50} = 70.94$$

x̄ = 70.94 inches
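As a rough check of the long method, the sketch below recomputes the mean height from the class limits and frequencies. The tuple layout `(lower limit, upper limit, frequency)` and the name `grouped_mean` are choices made for this illustration only.

```python
# Each class is (lower limit, upper limit, frequency), taken from the height table.
classes = [
    (61, 63, 2), (64, 66, 5), (67, 69, 12), (70, 72, 15),
    (73, 75, 8), (76, 78, 5), (79, 81, 3),
]

def grouped_mean(classes):
    """Long method: sum of f * class mark, divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / n

print(grouped_mean(classes))  # 70.94
```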
B. Coded Formula
This formula requires coding and is called the coded formula for the mean.
The procedure is as follows:
1. Take the class mark of one of the class intervals as an assumed mean. Denote this by x_o. This x_o is set as the zero point (origin).
2. The class marks of the classes following the class containing the origin are coded +1, +2, ... The class marks prior to the class containing the origin are coded -1, -2, ... Equivalently, the class marks may be expressed by the codes U_i = (x_i - x_o)/C, where C is the size of the class interval.
3. Multiply the coded values (U_i) by the corresponding frequencies (f_i) and find the sum.
4. Divide the sum (ΣfU) by the total number of frequencies (n) and multiply the result by the size of the class interval (C).
5. The result is then added to the assumed mean (x_o).

$$\bar{x} = x_o + \left( \frac{\sum f_i U_i}{n} \right) C$$
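Example:
Compute for the mean height of 50 men using the coded formula.

Height (inches)    f     x     U     fU
61-63              2     62    -3    -6
64-66              5     65    -2    -10
67-69              12    68    -1    -12
70-72              15    71    0     0
73-75              8     74    1     8
76-78              5     77    2     10
79-81              3     80    3     9
Total              50                -1

With x_o = 71 and C = 3:

$$\bar{x} = x_o + \left( \frac{\sum fU}{n} \right) C = 71 + \left( \frac{-1}{50} \right) 3 = 71 - 0.06$$

x̄ = 70.94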
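The coded formula can be sketched in Python as follows. The function name `coded_mean` and its parameter names are assumptions for this illustration. The codes are computed as U = (x - x_o) / C, which yields the -3, ..., +3 values of the example when x_o = 71 and C = 3.

```python
def coded_mean(class_marks, freqs, assumed_mean, class_size):
    """Coded formula: x_o + (sum of f*U / n) * C, with U = (x - x_o) / C."""
    n = sum(freqs)
    codes = [(x - assumed_mean) / class_size for x in class_marks]
    sum_fu = sum(f * u for f, u in zip(freqs, codes))
    return assumed_mean + (sum_fu / n) * class_size

class_marks = [62, 65, 68, 71, 74, 77, 80]
freqs = [2, 5, 12, 15, 8, 5, 3]

print(round(coded_mean(class_marks, freqs, assumed_mean=71, class_size=3), 2))  # 70.94
print(round(coded_mean(class_marks, freqs, assumed_mean=68, class_size=3), 2))  # 70.94
```

The second call suggests that the choice of assumed mean does not affect the result. It only changes how large the intermediate numbers are.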
THE MEDIAN
The median (symbol Md) of a set of data is a measure of central tendency that occupies the middle position in an array of numbers. It is the number that separates the bottom half of the data from the top half, meaning half of the data items are below the median and half are above it. When n, the number of items, is odd, the median is simply the middle value of the ordered list. When n is even, the median is the average of the two middle values of the ordered list.
COMPUTATION OF THE MEDIAN FOR UNGROUPED DATA
The median is computed as follows:
1. Arrange the items in an array.
2. Identify the middle value.
Example 1:
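The amounts of money a peanut vendor earned on five randomly selected days are:
₱86, ₱109, ₱141, ₱74, ₱123
Arranging them, we have 74, 86, 109, 123, 141. There are 5 (odd) items, so the median is the middle value: Md = ₱109.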
Example 2:
Let us consider the average grades of 10 students: 83, 74, 63, 77, 81, 100, 60, 73, 86, 91. Arranging them, we have 60, 63, 73, 74, 77, 81, 83, 86, 91, 100. Here, there are 10 (even) items, so the median is the average of the two middle values, 77 and 81.

$$Md = \frac{77 + 81}{2} = 79$$
Example 3:
In the earlier example of Michelle's grades in statistics (the nine scores 76, 76, 62, 51, 45, 27, 12, 6, 2 from Example 1 of the mean), the middle value in the array, and thus the median grade, is 45.
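The two-step procedure (arrange the items, then identify the middle value) can be sketched as a small Python function. The name `median` is chosen here for illustration. Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)          # step 1: arrange the items in an array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

print(median([86, 109, 141, 74, 123]))                     # 109
print(median([83, 74, 63, 77, 81, 100, 60, 73, 86, 91]))   # 79.0
print(median([76, 76, 62, 51, 45, 27, 12, 6, 2]))          # 45
```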
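COMPUTATION OF THE MEDIAN FOR GROUPED DATA

$$Md = LB + \left( \frac{\frac{N}{2} - \text{Cfp}}{\text{fmd}} \right) C$$

where LB = lower boundary of the median class
Cfp = cumulative frequency of the class preceding the median class
fmd = frequency of the median class
C = size of the class interval
N = number of items

In the computation of the median, the position of the desired item is first determined by N/2. Referring to the "less than" cumulative frequency distribution, the frequencies are added cumulatively until the group containing the middle value is located. The class interval where the median is located is called the median class. In other words, the median class is the class interval where the (N/2)th item is found.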
Example:

Height (inches)    f     <cf
61-63              2     2
64-66              5     7
67-69              12    19
70-72              15    34
73-75              8     42
76-78              5     47
79-81              3     50

Since N/2 = 50/2 = 25, the median class is 70-72.

$$Md = LB + \left( \frac{\frac{N}{2} - \text{Cfp}}{\text{fmd}} \right) C = 69.5 + \left( \frac{25 - 19}{15} \right) 3 = 69.5 + 1.2$$

Md = 70.70 inches
50% of the scores in the distribution are smaller than 70.70 inches.
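A minimal Python sketch of the grouped-median computation is given below. It assumes that each lower class boundary is the lower class limit minus 0.5 (consistent with LB = 69.5 for the class 70-72). The function name `grouped_median` is hypothetical.

```python
def grouped_median(classes, class_size):
    """classes: list of (lower boundary, frequency) pairs in ascending order."""
    n = sum(f for _, f in classes)
    half = n / 2
    cum_before = 0                     # cumulative frequency before the current class (Cfp)
    for lower_boundary, f in classes:
        if cum_before + f >= half:     # this is the median class
            return lower_boundary + (half - cum_before) / f * class_size
        cum_before += f

# Lower boundaries assumed to be the lower class limits minus 0.5.
heights = [(60.5, 2), (63.5, 5), (66.5, 12), (69.5, 15),
           (72.5, 8), (75.5, 5), (78.5, 3)]

print(round(grouped_median(heights, class_size=3), 2))  # 70.7
```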
THE MODE
By definition, the mode (symbol Mo) is the most frequently occurring value in a series. A series could have multiple modes or none at all. In the case of Michelle's grades, the mode is 76, an average that Michelle likes but her professor does not. For grouped data, the modal class is the one with the highest frequency. A unimodal distribution is one that has only one mode. The distribution is said to be bimodal when two classes share the highest frequency. A histogram of such a bimodal distribution is given in Figure 2.2.
CHARACTERISTICS OF THE MODE
It is the simplest yet unreliable method of determining central tendency. Extreme values in a distribution have no effect on it. It is unnecessary to arrange the items before determining the mode. In some data sets, the mode may not exist, while in others, there may be multiple modes.
Fig. 2.2
Histogram of Bimodal Distribution
Examples:
b. 6, 6, 6, 9, 9, 9, 12, 12, 12, 12, 12, 12, 15, 15, 15, 15, 15, 15, 21, 21, 35, 35
Mo = 12,15
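Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library. The function name `modes` is chosen for illustration, and treating a data set in which every value occurs equally often as having no mode is a convention assumed here.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (empty list if no value stands out)."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if every distinct value occurs equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

data_b = [6, 6, 6, 9, 9, 9, 12, 12, 12, 12, 12, 12,
          15, 15, 15, 15, 15, 15, 21, 21, 35, 35]

print(modes(data_b))                               # [12, 15]
print(modes([76, 76, 62, 51, 45, 27, 12, 6, 2]))   # [76]
print(modes([1, 2, 3]))                            # []
```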
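For grouped data, the mode is computed as

$$Mo = LB + \left( \frac{d_1}{d_1 + d_2} \right) C$$

where LB = lower boundary of the modal class
d1 = difference between the frequency of the modal class and the frequency of the class before it
d2 = difference between the frequency of the modal class and the frequency of the class after it
C = size of the class interval

For the distribution of the heights of 50 men, the modal class is 70-72 (f = 15):
d1 = 15 - 12 = 3      LB = 69.5
d2 = 15 - 8 = 7       C = 3

$$Mo = LB + \left( \frac{d_1}{d_1 + d_2} \right) C = 69.5 + \left( \frac{3}{3 + 7} \right) 3 = 69.5 + 0.9$$

Mo = 70.4 inches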
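The grouped-mode formula can be sketched in Python as below. The function name `grouped_mode` is hypothetical. The sketch assumes a single modal class, and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_boundaries, freqs, class_size):
    """Mo = LB + d1 / (d1 + d2) * C for the class with the highest frequency."""
    i = freqs.index(max(freqs))                         # position of the (first) modal class
    before = freqs[i - 1] if i > 0 else 0               # frequency of the class before it
    after = freqs[i + 1] if i < len(freqs) - 1 else 0   # frequency of the class after it
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lower_boundaries[i] + d1 / (d1 + d2) * class_size

lower_boundaries = [60.5, 63.5, 66.5, 69.5, 72.5, 75.5, 78.5]
freqs = [2, 5, 12, 15, 8, 5, 3]

print(round(grouped_mode(lower_boundaries, freqs, class_size=3), 2))  # 70.4
```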
EXERCISE
1. Determine the mean, median, and mode of the given set of data.
b) 14, 19, 24, 27, 14, 23, 32, 19, 41, 46, 35, 29, 38, 19, 40
Class Limits f
5.5-6.0 10
5.1-5.4 20
4.7-5.0 30
4.3-4.6 24
3.9-4.2 16
3. A sample of 15 families gave the following data on the number of children per family.
0, 3, 2, 6, 2, 4, 7, 0, 3, 3, 5, 4, 1, 0, 7
CHAPTER 3
MEASURES OF VARIABILITY

OUTLINE
• INTRODUCTION
• RANGE
• VARIANCE
• STANDARD DEVIATION

OBJECTIVES
After completing this chapter, you should be able to:
1. Define the range, variance, and standard deviation.
2. Understand the purposes of measures of variability.
3. Compute the range, variance, and standard deviation of the given set of data.
INTRODUCTION
In many research situations, the average score in a distribution is essential, but another set of statistics that describe how variable (or how spread out) the scores tend to be is also important. Do the scores differ significantly, or do they tend to be fairly similar in value? Sometimes the central issue in a study question is score variability. Because variability is a quantitative concept, none of this applies to distributions of qualitative data.
While measures of central tendency offer information on the shared characteristics of measured attributes, measures of variability quantify the degree to which they differ. If not all of the data values are the same, there is variability. Just as objective descriptions of events should account for both centripetal and centrifugal pressures, consenting and opposing viewpoints, and shared and conflicting views, measures of central tendency should be supplemented with measures of variability.
Furthermore, measures of variability, or the extent of individual differences around the central tendency, are used to describe a set of test scores. These measures indicate how spread out the scores in a distribution are. This is an important aspect of the distribution. Let's say you scored 75 out of 100 on a statistics test, and you know the class average was 55. You now know you did better than the average, which is a positive thing, but you have no idea how much better than the rest of the class you did. If the majority of students scored at or near the mean, your score may be relatively high compared to everyone else's. On the other hand, if the scores were widely spread out, your score might be only slightly above average relative to the others.
If the scores are widely scattered around the mean, the distribution has high variability; if the scores are mostly near the mean, the distribution has low variability. Consider two distributions with similar means (58 and about 55): one with high variability, which includes scores such as 4, 15, 27, 36, 72, 75, 76, 78, 98, 99, and another with low variability, which contains scores such as 48, 50, 52, 54, 55, 56, 56, 56, 58, 59, 62.
This diversity is frequently what we are interested in in the social sciences. Why are some people more intelligent than others? Why do some people commit crimes? Why do certain businesses make more money than others? We frequently look for factors that can help us understand variation.
When data is described by a measure of central tendency (mean, median, or mode), a single number is used to summarize all of the scores. Measures of variability are widely used to supplement and complement reports of central tendency. These measures communicate how scores are scattered in a given distribution, and they are as follows:
1. The Range is the difference between the highest and lowest scores in a distribution.
2. The Variance is the average of the squared deviations from the mean.
3. The Standard Deviation is the square root of the variance, that is, the square root of the sum of the squared deviations about the mean divided by the number of scores.
In contrast to measures of central tendency, measures of variability describe the spread of the distribution. Whereas the measures of central tendency are specific points in the distribution, variability measures are based on the distances between points within the distribution. The variability is revealed by the spread of these data points. Range, variance, and standard deviation are used to quantify variation or variability. In this chapter, each of them is discussed in more detail.
THE RANGE
The range is the simplest of all the measures of variability to determine. It is frequently employed as a preliminary indicator of variability. However, because it only considers the scores at the two extremes, the highest and the lowest, it is of limited use. It is a rather crude measure of variability.
Example 1:
12, 25, 27, 29, 36, 38, 40, 43, 50, 54, 62
Range = 62 – 12 = 50
A range of 50 tells us very little about how the values are dispersed.
Are the values all clustered to one end, with the low value (12) or the high value (62) being an outlier?
Or are the values more evenly dispersed among the range?
Example 2:
In the set of numbers 2, 3, 4, 5, 7, 9 the range would be 9 – 2 = 7.
Example 3:
The range of the numbers 5, 7, 9, 13, 22, 35, 42, 44, 56, 60 is 55.
Example 4:
In the set 5, 30, 31, 33, 34, 37, 39, 41, 60 the range is 55.
By inspection, we can see that the distributions in Examples 3 and 4 each have a range of 55, but that's about all they have in common. The distribution in Example 3 is fairly evenly spread, while the distribution in Example 4 tends to be bunched in the middle.
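The limitation of the range can be illustrated with a short Python sketch using the two data sets whose range is 55. The helper name `value_range` is chosen for illustration. The population standard deviation from the standard `statistics` module, a measure taken up later in this chapter, is used only to show that the spreads differ.

```python
import statistics

def value_range(values):
    """Range: highest score minus lowest score."""
    return max(values) - min(values)

evenly_spread = [5, 7, 9, 13, 22, 35, 42, 44, 56, 60]   # the ten numbers with range 55
bunched = [5, 30, 31, 33, 34, 37, 39, 41, 60]           # the nine numbers with range 55

print(value_range(evenly_spread), value_range(bunched))  # 55 55

# Same range, but the typical distance from the mean differs noticeably.
print(statistics.pstdev(evenly_spread) > statistics.pstdev(bunched))  # True
```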
Squaring each deviation is done in finding standard deviation and variance. Variance is the average
squared differences of scores from the mean score of a distribution. Standard deviation is the square root of the
variance. They are used when the mean is the preferred measure of central tendency.
The standard deviation (σ or s) and variance (σ² or s²) are more complete measures of variability which consider every score in a distribution. The most widely used indicator of variability is the standard deviation which, in a nutshell, is based on the deviation of each score from the mean. The other measures of dispersion we have discussed are based on considerably less information. However, because variance relies on the squared differences of scores from the mean, a single outlier has greater impact on the size of the variance than does a single score near the mean. Some statisticians view this property as a shortcoming of variance as a measure of variation, especially when there is reason to doubt the reliability of some of the extreme scores. For example, a researcher might believe that a person who reports watching television for an average of 24 hours per day may have misunderstood the question. Just one such extreme score might result in an appreciably larger standard deviation, especially if the sample is small. Fortunately, since all scores are used in the calculation of variance, the many non-extreme scores (those closer to the mean) will tend to offset the misleading impact of any extreme scores.
1. Both consider the precise difference between each score and the mean. Consequently, these measures are based on a maximum amount of information.
2. If any single score is changed, the standard deviation changes. If the score is moved away from the mean, the standard deviation increases. Moving the score toward the mean decreases the standard deviation.
3. If a score is added that is far from the mean, the standard deviation increases. If the added score is close to the mean, the standard deviation decreases.
Example 1:
Consider a population consisting of the following values: 2, 4, 4, 4, 5, 5, 7, 9.
There are eight data points in total, with a mean of 5:

$$\mu = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5$$

To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result. The squared deviations are 9, 1, 1, 1, 0, 0, 4, and 16. Then take the square root of their average:

$$\sigma = \sqrt{\frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8}} = \sqrt{\frac{32}{8}} = 2$$

Therefore, the above has a population standard deviation of 2, where the variance is:

$$\sigma^2 = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4$$

The above assumes a population. If the 8 values are obtained by random sampling from some parent population, then computing the sample standard deviation would use a denominator of 7 instead of 8.

$$s^2 = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8 - 1} = \frac{32}{7} = 4.57$$

$$s = \sqrt{4.57} = 2.14$$
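The difference between the population and sample versions is only the denominator, which the following Python sketch makes explicit. The function name `variance` and its `sample` flag are assumptions for this illustration. The standard library offers `statistics.pvariance` and `statistics.variance` for the same purpose.

```python
import math

def variance(values, sample=False):
    """Average squared deviation from the mean; divide by n - 1 for a sample."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return sum_sq_dev / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(data))                                     # 4.0  (population variance)
print(math.sqrt(variance(data)))                          # 2.0  (population standard deviation)
print(round(variance(data, sample=True), 2))              # 4.57 (sample variance)
print(round(math.sqrt(variance(data, sample=True)), 2))   # 2.14 (sample standard deviation)
```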
A. For Population:

$$\text{Variance} = \sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

where:
σ² = variance of a population
σ = standard deviation of a population
X = population values
μ = population mean
N = total number of values in the population

B. For Sample:

$$\text{Variance} = s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$$

$$\text{Standard Deviation} = s = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$$

where:
s² = variance of a sample
s = standard deviation of a sample
X = sample values
x̄ = sample mean
n = total number of values in the sample
Since we rarely encounter population data in most of the research works done, we use estimates taken from
samples, called statistics. In this case, we will consider the data to be used in our next example to be from samples.
Example 2:
Given the data 50, 98, 82, 23, 46, 40, 63, 52, 92, 54, find the variance and standard deviation.
x̄ = 600/10 = 60

X      (X - x̄)    (X - x̄)²
50     -10        100
98     38         1444
82     22         484
23     -37        1369
46     -14        196
40     -20        400
63     3          9
52     -8         64
92     32         1024
54     -6         36
n = 10            Σ(X - x̄)² = 5126

$$s^2 = \frac{\sum (X - \bar{x})^2}{n - 1} = \frac{5126}{10 - 1} = 569.56$$

$$s = \sqrt{569.56} = 23.87$$
Interpretation: On the average, the values vary from the mean of 60 by 23.87.
Example 3:
Here are 3 distributions that have the same mean, but different amount of variability. Find their variance
and standard deviation.
A: 10 10 10 10 10      x̄ = 10,   X - x̄:  0  0  0  0  0
B: 8 9 10 11 12        x̄ = 10,   X - x̄: -2 -1  0  1  2
C: 2 6 10 14 18        x̄ = 10,   X - x̄: -8 -4  0  4  8
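One way to work through Example 3 is the Python sketch below, which treats A, B, and C as samples (denominator n - 1), in line with the convention stated before Example 2. The function name `sample_variance` is chosen for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

distributions = {
    "A": [10, 10, 10, 10, 10],
    "B": [8, 9, 10, 11, 12],
    "C": [2, 6, 10, 14, 18],
}

for name, data in distributions.items():
    var = sample_variance(data)
    print(name, var, round(math.sqrt(var), 2))
# A 0.0 0.0
# B 2.5 1.58
# C 40.0 6.32
```

All three distributions share the mean of 10, yet the variance grows from 0 to 2.5 to 40 as the scores move farther from the mean.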
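The variance and standard deviation of a sample can also be computed directly from the raw scores:

$$s^2 = \frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n (n - 1)} \quad \text{(for sample variance)}$$

$$s = \sqrt{\frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n (n - 1)}} \quad \text{(for sample standard deviation)}$$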
X      X²
50     2500
98     9604
82     6724
23     529
46     2116
40     1600
63     3969
52     2704
92     8464
54     2916
ΣX = 600      ΣX² = 41126
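$$s^2 = \frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n (n - 1)} = \frac{10 (41126) - (600)^2}{10 (9)} = \frac{411260 - 360000}{90}$$

s² = 569.56

$$s = \sqrt{\frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n (n - 1)}} = \sqrt{569.56}$$

s = 23.87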
We get the same answers using these formulas. You may choose the method which you find easier to use.
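The agreement between the two formulas can be checked with a short Python sketch on the ten values of Example 2. The function names are chosen for illustration.

```python
import math

def variance_definitional(values):
    """s^2 = sum of (X - mean)^2 divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def variance_computational(values):
    """s^2 = [n * sum(x^2) - (sum x)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

data = [50, 98, 82, 23, 46, 40, 63, 52, 92, 54]

print(round(variance_definitional(data), 2))               # 569.56
print(round(variance_computational(data), 2))              # 569.56
print(round(math.sqrt(variance_computational(data)), 2))   # 23.87
```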
A. For Population:

$$\text{Variance} = \sigma^2 = \frac{\sum f (X_m - \mu)^2}{N}$$

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum f (X_m - \mu)^2}{N}}$$

where:
σ² = population variance
σ = population standard deviation
f = frequency of each class
X_m = midpoint of each class
μ = population mean
N = total number of observations in a population

B. For Sample:

$$\text{Variance} = s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1}$$

$$\text{Standard Deviation} = s = \sqrt{\frac{\sum f (X_m - \bar{x})^2}{n - 1}}$$

where:
s² = sample variance
s = sample standard deviation
f = frequency of each class
X_m = midpoint of each class
x̄ = sample mean
n = total number of values in the sample
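Example 1:
Because each value X occurs f times, the sample mean is the weighted mean of the values:

$$\bar{x} = \frac{\sum fX}{n} = \frac{650}{10} = 65$$

X      f     fX     (X - x̄)    (X - x̄)²    f(X - x̄)²
72     1     72     7          49          49
70     1     70     5          25          25
67     1     67     2          4           4
66     2     132    1          1           2
64     1     64     -1         1           1
62     1     62     -3         9           9
61     3     183    -4         16          48
n = 10       ΣfX = 650                     Σf(X - x̄)² = 138

$$s^2 = \frac{\sum f (X - \bar{x})^2}{n - 1} = \frac{138}{9} = 15.33$$

$$s = \sqrt{\frac{\sum f (X - \bar{x})^2}{n - 1}} = \sqrt{15.33} = 3.92$$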
Example 2:
Because each value X occurs f times, the sample mean is the weighted mean of the values:

$$\bar{x} = \frac{\sum fX}{n} = \frac{144}{24} = 6$$

X      f     fX     (X - x̄)    (X - x̄)²    f(X - x̄)²
8      4     32     2          4           16
7      5     35     1          1           5
6      7     42     0          0           0
5      4     20     -1         1           4
4      3     12     -2         4           12
3      1     3      -3         9           9
n = 24       ΣfX = 144                     Σf(X - x̄)² = 46

$$s^2 = \frac{\sum f (X - \bar{x})^2}{n - 1} = \frac{46}{23} = 2.00$$

$$s = \sqrt{\frac{\sum f (X - \bar{x})^2}{n - 1}} = \sqrt{2.00} = 1.41$$
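As a further illustration of the sample formulas for grouped data, the sketch below applies them to the height distribution of 50 men used in the chapter on central tendency. The function name `grouped_sample_variance` is hypothetical. The mean is computed as Σf·X_m / n, so each class mark is weighted by its frequency.

```python
import math

def grouped_sample_variance(class_marks, freqs):
    """s^2 = sum of f * (X_m - mean)^2 divided by (n - 1), with mean = sum(f * X_m) / n."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(class_marks, freqs)) / n
    sum_f_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, freqs))
    return sum_f_sq_dev / (n - 1)

class_marks = [62, 65, 68, 71, 74, 77, 80]   # midpoints of the height classes
freqs = [2, 5, 12, 15, 8, 5, 3]              # n = 50

s2 = grouped_sample_variance(class_marks, freqs)
print(round(s2, 2))             # 19.28
print(round(math.sqrt(s2), 2))  # 4.39
```

Under these assumptions the heights have a sample standard deviation of about 4.39 inches around the mean of 70.94 inches.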
EXERCISE
1. The following scores were obtained by a class of boys and girls in a 20-item test in Statistics.
Find the range, variance, and standard deviation.
2. The following are the scores of two sections of students in the same level who took the removal test in
Mathematics.
Section X Section Y
40 67
72 75
59 41
65 42
47 55
30 72
62 38