
MODULE IN STATISTICS Measures Tendency Variability

This document provides an overview of measures of central tendency including the mean, median, and mode. It defines the mean as the sum of all values divided by the total number of items. The median is the middle value when data is arranged in order. The mode is the value that occurs most frequently. Examples are given to demonstrate calculating the mean for both grouped and ungrouped data, as well as calculating a weighted mean. The purpose of measures of central tendency is to find a single value that represents the center of a data set.

CHAPTER 3
MEASURES OF CENTRAL TENDENCY

OUTLINE
• INTRODUCTION
• MEAN
• MEDIAN
• MODE

OBJECTIVES
After completing this chapter, you should be able to:
1. Define the mean, median, and mode.
2. Understand the purposes of measures of central tendency.
3. Calculate the mean, median, and mode of the given set of data.

INTRODUCTION

Now that we have visualized our data to understand its shape, we can begin with numerical analyses! The
descriptive statistics presented in this chapter begin to describe the distribution of our data objectively and
mathematically. This is our first step into statistical analysis!

Statistical data can be described in a variety of ways, based on two factors: the nature of the data and the reason for
which the data was obtained. When quantitatively or verbally explaining data, make sure the description is neither
too short nor too long. We can compare two or more distributions from the same time period or within the same
distribution over time using measures of central tendency. For example, an average can be used to compare
coffee consumption in two separate territories over the same time period or in a single territory over two years, such
as 2021 and 2022.

Finding the center of a distribution may be more challenging than first thought. Imagine this situation: You are in a
class with just four other students, and the five of you took a 5-point pop quiz. Today your teacher is walking
around the room, handing back the quizzes. She stops at your desk and hands you your paper. Written in bold black
ink on the front is “3/5.” How do you react? Are you happy with your score of 3 or disappointed? How do you
decide? You might calculate your percentage correct, realize it is 60%, and be appalled. But it is more likely that
when deciding how to react to your performance, you will want additional information. What additional
information would you like? If you are like most students, you will immediately ask your neighbors, “How much
did you get?” and then ask the teacher, “How did the class do?” In other words, the additional information you want
is how your quiz score compares to other students’ scores. You therefore understand the importance of comparing
your score to the class distribution of scores. Should your score of 3 turn out to be among the higher scores, then
you’ll be pleased after all. On the other hand, if 3 is among the lower scores in the class, you won’t be quite so
happy.

In statistics, a measure of central tendency is a descriptive summary of a data set. It reflects the center of the data
distribution through a single value. Because it summarizes the dataset as a whole, it does not
provide information on individual values in the dataset. In general, several statistical measures can be used to
determine a dataset's central tendency.

The measure of central tendency is selected based on the properties of the data.
If you have a symmetrical distribution of continuous data, all three measures of central tendency are suitable. But
most of the time, the analyst uses the mean because it involves all the values in the distribution or dataset.
If you have a skewed distribution, the best measure of central tendency is the median.
If you have ordinal data, the median and the mode are the best choices for measuring the central tendency.
If you have categorical data, the mode is the best choice for finding the central tendency.

In this chapter, we'll look at the three most important measures of central tendency: mean, median, and mode.

THE MEAN

The arithmetic mean is the most often used measure of central tendency. It is also known as the mean or
computed average. It is defined as the sum of the values of a group of items divided by the number of such
items. The mean of a sample of scores on a variable x is symbolized by x̄ (x-bar) and the mean of the population is
represented by μ (mu). Researchers are frequently obliged to estimate μ from x̄, since they cannot measure
every item in the population.

Fig. 2.1
Relationship between the sample mean and the population mean of a data set: when the data set includes the entire
population, the mean is μ; when the data set is only a sample of the population, the mean is x̄.

CHARACTERISTICS OF MEAN
When using sample data to make inferences about a population, the mean is the more dependable or stable
measure to use. It is the point at which the values on both sides are balanced. The mean is sensitive
to extreme values, whether they are high or low. When the distribution is extremely skewed, it loses its representative
character. The mean also cannot be determined when the distribution contains open-ended intervals, unless
additional information is available, which makes it an improper average to use in that case.

USES OF MEAN
The mean is the most widely used, most easily understood, and most generally recognized average. When
the distribution is symmetrical, it is the optimal measure to use, and it is a valuable measure for inferential
statistics. It can also be used to calculate the average value of a set of values after each one has been weighted. This is
known as the weighted average.

COMPUTATION OF THE MEAN FOR UNGROUPED DATA


For ungrouped data, the mean is computed by simply adding all the values and dividing the sum by the
total number of items. For the sample mean, the formula is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\text{sum of the data}}{\text{number of pieces of data}}$$

In simpler form, the formula may be presented as

$$\bar{x} = \frac{\sum x}{n}$$

where x̄ = sample mean
x = value of each item
n = number of items in the sample
Σ = “the sum of”

and for the population mean, it is

$$\mu = \frac{\sum x}{N}$$

where μ = arithmetic mean of a population
N = number of items in the population

Example 1:

Let us consider the scores of Triz Glee in her statistics class. (The scores have been arranged in
descending order.)
76
76
62
51
45
27
12
6
2
357 Total of all scores

Since in the case of Triz Glee's scores Σx = 357 and n = 9, Triz Glee's mean score is

$$\bar{x} = \frac{\sum x}{n} = \frac{357}{9} = 39.67$$
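
The computation above can be sketched in Python. This is a minimal illustration rather than part of the module; the function name `arithmetic_mean` is an assumption chosen for the example.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Triz Glee's nine scores from Example 1
scores = [76, 76, 62, 51, 45, 27, 12, 6, 2]
print(sum(scores))                         # 357
print(round(arithmetic_mean(scores), 2))   # 39.67
```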

Example 2:

A random sample of six cashiers in a grocery store shows the following balances at the end of the day:
₱16,640.39; ₱26,915.59; ₱6,827.08; ₱101,791.17; ₱61,811.75; and ₱20,244.12. Compute the mean balance.

$$\bar{x} = \frac{16{,}640.39 + 26{,}915.59 + 6{,}827.08 + 101{,}791.17 + 61{,}811.75 + 20{,}244.12}{6} = \frac{234{,}230.10}{6} = 39{,}038.35$$

The mean balance is ₱39,038.35.

WEIGHTED ARITHMETIC MEAN


The weighted arithmetic mean of a set of values x1, x2, x3, …, xn is the sum of the values multiplied by
their corresponding weights, divided by the sum of the weights. The formula is:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where fi = the weight or frequency of each item
xi = each of the item values

In simplified form, the formula is

$$\bar{x} = \frac{\sum fx}{\sum f}$$

Example 1:

The final grades of Levi Rosell at the end of the semester are the following:
Subjects              Grades (x)   Units (f)
Bus. Math 10A         1.75         3
Nat Sci. 101          1.50         3
English 101           2.00         3
Accounting 1 and 2    2.25         6
Economics 101         2.50         3
Finance 101           1.50         3

Then the mean grade of Levi Rosell is

$$\bar{x} = \frac{3(1.75) + 3(1.50) + 3(2.00) + 6(2.25) + 3(2.50) + 3(1.50)}{21} = \frac{41.25}{21} = 1.96$$
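
As a rough check of the weighted mean, the grades and units can be passed to a small Python function. The name `weighted_mean` is hypothetical and used only for this sketch.

```python
def weighted_mean(values, weights):
    """Return sum(f * x) / sum(f) for values x with weights f."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(f * x for x, f in zip(values, weights)) / total_weight


# Levi Rosell's grades (x) and the units (f) of each subject
grades = [1.75, 1.50, 2.00, 2.25, 2.50, 1.50]
units = [3, 3, 3, 6, 3, 3]
print(round(weighted_mean(grades, units), 2))  # 1.96
```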

Example 2:

Consider the daily earnings of the employees of a small accounting firm: 210, 210, 850, 360, 310, 310,
210, 210, 960, 210. Find the mean daily salary.

Sum of salaries = 210 + 210 + 210 + 210 + 210 + 310 + 310 + 360 + 850 + 960

The sum of the salaries can be written with each salary multiplied by its frequency:

Sum of salaries = 210(5) + 310(2) + 360(1) + 850(1) + 960(1) = 3,840

$$\bar{x} = \frac{\text{sum of salaries}}{\text{total number of salaries}} = \frac{3{,}840}{10} = 384$$

The average daily salary is ₱384.

Solution 2:

This can be done most efficiently if we present the data in a column.

Salary (x)   Frequency (f)   Salary × Frequency (fx)
210          5               1,050
310          2               620
360          1               360
850          1               850
960          1               960

             Σf = 10         Σfx = ₱3,840 (sum of salaries)

$$\bar{x} = \frac{\text{sum of salaries}}{\text{number of salaries}} = \frac{\sum fx}{\sum f} = \frac{3{,}840}{10} = 384$$

x̄ = ₱384

COMPUTING THE MEAN FOR GROUPED DATA


Data which are arranged in a frequency distribution are called grouped data. Observations belonging to
each class interval are represented by the class mark of the interval. There are two methods we can use to compute
the mean from grouped data, the long method and the coded method.
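A. Long Method
The formulas are:

$$\bar{x} = \frac{\sum_{i} f_i x_i}{n} \quad \text{(for a sample)}$$

$$\mu = \frac{\sum_{i} f_i X_i}{N} \quad \text{(for a population)}$$

In simplified form, the formulas are:

$$\bar{x} = \frac{\sum fx}{n}, \qquad \mu = \frac{\sum fX}{N}$$
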
Example:

Compute for the mean height of 50 men using the long method.

Height (inches)   Frequency (fi)   Class Mark (Xi)   fiXi
61-63             2                62                124
64-66             5                65                325
67-69             12               68                816
70-72             15               71                1,065
73-75             8                74                592
76-78             5                77                385
79-81             3                80                240

Total             Σfi = 50                           ΣfiXi = 3,547

$$\bar{x} = \frac{\sum f_i X_i}{n} = \frac{3{,}547}{50} = 70.94$$

Thus, the mean height of 50 men would be 70.94 inches.
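
The long method can be sketched in Python as follows. The sketch assumes each class is given by its lower and upper class limits and takes the class mark as their midpoint; the function name `grouped_mean` is illustrative only.

```python
def grouped_mean(class_limits, frequencies):
    """Long method: sum(f * class mark) divided by the total frequency."""
    class_marks = [(low + high) / 2 for low, high in class_limits]
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the total frequency must be positive")
    return sum(f * x for f, x in zip(frequencies, class_marks)) / n


# Height distribution of 50 men (inches)
classes = [(61, 63), (64, 66), (67, 69), (70, 72), (73, 75), (76, 78), (79, 81)]
freqs = [2, 5, 12, 15, 8, 5, 3]
print(round(grouped_mean(classes, freqs), 2))  # 70.94
```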

B. Coded Formula
This formula requires coding and is called the coded formula for the mean.
The procedure is as follows:
1. Take the class mark of one of the class intervals as an assumed mean. Denote this by xo. This xo is
assigned the code zero (the origin).
2. The class marks of the classes following the class containing the origin are coded +1, +2, ... The class
marks prior to the class containing the origin are coded -1, -2, ... Equivalently, the class marks may be
expressed by the codes Ui = (xi - xo)/C, where C is the size of the class interval.
3. Multiply the coded values (Ui) by the corresponding frequencies (fi) and find the sum.
4. Divide the sum (ΣfU) by the total number of frequencies (n) and multiply the result by the size of
the class interval (C).
5. The result is then added to the assumed mean (xo).

The formula employed is:

$$\bar{x} = x_o + \left( \frac{\sum_{i} f_i U_i}{n} \right) C$$

where xo = assumed mean
f = frequency
n = total frequency
C = class size
Ui = 0, ±1, ±2, ±3, ...

To obtain a population mean, we simply replace x̄ by μ and n by N.


Example:

Compute for the mean height of 50 men using the coded method.

Height (inches)   Frequency (fi)   Class Mark (Xi)   Coded Deviation (Ui)   fiUi
61-63             2                62                -3                     -6
64-66             5                65                -2                     -10
67-69             12               68                -1                     -12
70-72             15               71                0                      0
73-75             8                74                1                      8
76-78             5                77                2                      10
79-81             3                80                3                      9

C = 3             Σfi = 50                                                  ΣfiUi = -1

Applying the formula with the assumed mean xo = 71,

$$\bar{x} = x_o + \left( \frac{\sum fU}{n} \right) C = 71 + \left( \frac{-1}{50} \right)(3) = 71 - 0.06 = 70.94$$
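
The five steps of the coded method can be sketched in Python. The sketch assumes equal class sizes and computes each code as (class mark minus assumed mean) divided by the class size; the function name `coded_mean` is hypothetical.

```python
def coded_mean(class_marks, frequencies, assumed_mean, class_size):
    """Coded method: x_o + (sum(f * U) / n) * C, with U = (x - x_o) / C."""
    codes = [(x - assumed_mean) / class_size for x in class_marks]
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the total frequency must be positive")
    sum_fu = sum(f * u for f, u in zip(frequencies, codes))
    return assumed_mean + (sum_fu / n) * class_size


marks = [62, 65, 68, 71, 74, 77, 80]
freqs = [2, 5, 12, 15, 8, 5, 3]
print(round(coded_mean(marks, freqs, assumed_mean=71, class_size=3), 2))  # 70.94
```

Any class mark may serve as the assumed mean; choosing 62 or 80 instead of 71 changes the codes but not the result.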

THE MEDIAN

The median (symbol Md) of a set of data is a measure of central tendency that occupies the middle
value in an array of numbers. It is the number that separates the bottom half of the data from the top half,
meaning half of the data items are below the median and half are above it. The median is just the middle value
when there are an odd number of items. The median is the average of the two middle data values in the ordered
list if n is even.

CHARACTERISTICS OF THE MEDIAN


The median is another commonly used average that is simple to comprehend and calculate. It can only
be found if the items are arranged in ascending or descending order. It is the point at which the frequency distribution is split
in half. When a distribution is badly skewed, the median is the best choice because it is unaffected by
exceptionally high or low values. It can be determined even in an open-ended distribution.

USES OF THE MEDIAN


The median is used whenever an average of position is required.  It is used when there are open-ended
intervals involved.  The median is widely employed as an average in measuring general abilities, such as in
intelligence tests, because it divides the distribution in half.

COMPUTATION OF THE MEDIAN FOR UNGROUPED DATA
The median is computed as follows:
1. Arrange the items in an array.
2. Identify the middle value.

Example 1:

The amounts of money a peanut vendor earned on five randomly selected days are:
₱86, 109, 141, 74, 123

Making an array, we have


₱74, 86, 109, 123, 141

Since there are 5 (odd) items,


Median = Md = ₱109

Example 2:

Let us consider the average grades of 10 students: 83, 74, 63, 77, 81, 100, 60, 73, 86, 91. Arranging them,
we have 60, 63, 73, 74, 77, 81, 83, 86, 91, 100. Here, there are 10 (even) items.

$$Md = \frac{77 + 81}{2} = 79$$

Example 3:

In the example of Triz Glee's scores in statistics, the middle value in the array, and thus the median score, is
45.

76, 76, 62, 51, 45, 27, 12, 6, 2

Median = Md = 45
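
The two-step procedure (arrange the items, then identify the middle value) can be sketched in Python. The function name `median` is chosen for this illustration; it handles both the odd and the even case described above.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([86, 109, 141, 74, 123]))                     # 109
print(median([83, 74, 63, 77, 81, 100, 60, 73, 86, 91]))   # 79.0
print(median([76, 76, 62, 51, 45, 27, 12, 6, 2]))          # 45
```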

COMPUTATION OF THE MEDIAN FOR GROUPED DATA


The median of a grouped frequency distribution is essentially the x-coordinate of the point of intersection
of the “less than” and “greater than” ogives of the distribution.
The formula for the computation of the median is

$$Md = LB + \left( \frac{\frac{N}{2} - Cf_p}{f_{md}} \right) C$$

where LB = lower boundary of the median class (the class interval where N/2 is found)
Cfp = cumulative frequency for the class interval preceding the
median class when the scores are arranged from lowest to highest
C = size of the median class
fmd = frequency of the median class

In the computation of the median, the position of the desired item is first determined by N/2 (N = number of
items). Referring to the “less than” cumulative frequency distribution, the cumulative addition is continued
until the group containing the middle value is located. The class interval where the median is located is called the
median class, that is, the class interval where N/2 is found.

Example:

Compute for the median of the following grouped frequency distribution.


Height (inches)   f     “Less than” cumulative frequency (f<)
61-63             2     2
64-66             5     7
67-69             12    19
70-72             15    34
73-75             8     42
76-78             5     47
79-81             3     50

N/2 = 25, so the median class is 70-72.

$$Md = LB + \left( \frac{\frac{N}{2} - Cf_p}{f_{md}} \right) C = 69.5 + \left( \frac{25 - 19}{15} \right)(3) = 69.5 + 1.2 = 70.70 \text{ inches}$$

Thus, 50% of the heights in the distribution are smaller than 70.70 inches.
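
A minimal Python sketch of the grouped median follows. It assumes the classes are listed from lowest to highest, that the lower class boundaries are supplied directly (for example 69.5 for the class 70-72), and that all classes share one size; the function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, class_size):
    """Md = LB + ((N/2 - Cfp) / fmd) * C, scanning the "less than" cumulative frequencies."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the total frequency must be positive")
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= half:
            # This is the median class: the N/2-th item falls inside it.
            return lower + (half - cumulative) / f * class_size
        cumulative += f
    raise ValueError("median class not found")


boundaries = [60.5, 63.5, 66.5, 69.5, 72.5, 75.5, 78.5]
freqs = [2, 5, 12, 15, 8, 5, 3]
print(round(grouped_median(boundaries, freqs, class_size=3), 2))  # 70.7
```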

THE MODE

By definition, the mode (symbol Mo) is the most often occurring value in a series. A series could
have multiple modes or none at all. In the case of Triz Glee's scores, the mode is 76, an average that
Triz Glee likes but her professor does not. The modal class for grouped data is the one with the highest
frequency. A unimodal distribution is one that has only one mode. The distribution is said to be bimodal
when there are two classes with the highest frequency. A histogram of such a bimodal distribution
is given in Figure 2.2.

Fig. 2.2
Histogram of Bimodal Distribution

CHARACTERISTICS OF THE MODE
It is the simplest yet least reliable method of determining central tendency. Extreme values in a distribution
have no effect on it. It is unnecessary to arrange the items before determining the mode. In some data sets, the
mode may not exist, while in others, there may be multiple modes.

USES OF THE MODE


It is utilized when a quick average estimate is required. It aids in the detection of a trend. Because it is the
most commonly occurring value, if you are a shoe or garment manufacturer and want to know (and make) the size
that will fit the most people, you would look for the modal size. Naturally, the shoe or clothing manufacturer will
produce more shoes or outfits in the most popular size than in other sizes. As a result, the mode gives information
to businessmen and manufacturers to aid in business planning and decision-making.

COMPUTATION OF THE MODE FOR UNGROUPED DATA


For ungrouped data, the most frequently occurring score is the mode.

Examples:

Find the mode of the following values:

a. 3, 4, 7, 7, 7, 8, 11, 11, 14, 18, 19


Mo = 7

b. 6, 6, 6, 9, 9, 9, 12, 12, 12, 12, 12, 12, 15, 15, 15, 15, 15, 15, 21, 21, 35, 35
Mo = 12,15
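
Finding the mode of ungrouped data amounts to counting occurrences, which can be sketched in Python with `collections.Counter`. The function name `modes` is illustrative, and treating a data set in which no value repeats as having no mode is a convention assumed here.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in ascending order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if no value repeats, the data set has no mode.
    if highest == 1 and len(counts) > 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


a = [3, 4, 7, 7, 7, 8, 11, 11, 14, 18, 19]
b = [6, 6, 6, 9, 9, 9, 12, 12, 12, 12, 12, 12,
     15, 15, 15, 15, 15, 15, 21, 21, 35, 35]
print(modes(a))  # [7]
print(modes(b))  # [12, 15]
```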

COMPUTATION OF THE MODE FOR GROUPED DATA


For grouped distributions, the class with the greatest frequency is called the modal class. The formula is:

$$Mo = LB + \left( \frac{d_1}{d_1 + d_2} \right) C$$

where LB = lower boundary of the modal class
d1 = difference between the frequency of the modal class and the frequency of the class
interval lower than the modal class
d2 = difference between the frequency of the modal class and the frequency of the class
interval higher than the modal class
C = size of the modal class

In the following table, the modal class is 70-72 inches.


Height Frequency
61-63 2
64-66 5
67-69 12
70-72 15
73-75 8
76-78 5
79-81 3
n = 50

d1 = 15 - 12 = 3        LB = 69.5
d2 = 15 - 8 = 7         C = 3

$$Mo = LB + \left( \frac{d_1}{d_1 + d_2} \right) C = 69.5 + \left( \frac{3}{3 + 7} \right)(3) = 69.5 + 0.9 = 70.4 \text{ inches}$$
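
The grouped mode formula can be sketched in Python as below. The sketch assumes classes listed from lowest to highest with their lower boundaries given directly, picks the first class with the highest frequency as the modal class, and treats a missing neighbouring class as having frequency zero; these choices and the name `grouped_mode` are assumptions of the example.

```python
def grouped_mode(lower_boundaries, frequencies, class_size):
    """Mo = LB + (d1 / (d1 + d2)) * C for the class with the highest frequency."""
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    # Assumption: a missing neighbouring class counts as frequency 0.
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - before
    d2 = frequencies[i] - after
    if d1 + d2 == 0:
        raise ValueError("the modal class cannot be distinguished from its neighbours")
    return lower_boundaries[i] + d1 / (d1 + d2) * class_size


boundaries = [60.5, 63.5, 66.5, 69.5, 72.5, 75.5, 78.5]
freqs = [2, 5, 12, 15, 8, 5, 3]
print(round(grouped_mode(boundaries, freqs, class_size=3), 2))  # 70.4
```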

EXERCISE

1. Determine the mean, median, and mode of the given set of data.

a) 5, 7, 11, 17, 10, 7, 13, 15, 15, 3, 8, 4, 12

b) 14, 19, 24, 27, 14, 23, 32, 19, 41, 46, 35, 29, 38, 19, 40

2. Find the mean, median and mode of the distribution below.

Class Limits f

5.5-6.0 10
5.1-5.4 20
4.7-5.0 30
4.3-4.6 24
3.9-4.2 16

3. A sample of 15 families gave the following data on the number of children per family.

0, 3, 2, 6, 2, 4, 7, 0, 3, 3, 5, 4, 1, 0, 7

Find the mean, median, and mode.

CHAPTER 3
MEASURES OF VARIABILITY

OUTLINE
• INTRODUCTION
• RANGE
• VARIANCE
• STANDARD DEVIATION

OBJECTIVES
After completing this chapter, you should be able to:
1. Define the range, variance, and standard deviation.
2. Understand the purposes of measures of variability.
3. Compute the range, variance, and standard deviation of the given set of data.

INTRODUCTION

In many research situations, the average score in a distribution is essential, but another set of statistics that
describe how variable (or how distributed) the scores tend to be is also important. Do the scores differ significantly,
or do they tend to be fairly comparable or similar in value? Sometimes the central issue in a study question is score
variability. Because variability is a quantitative term, none of this applies to qualitative data distributions.

While measures of central tendency offer information on the shared characteristics of measured
attributes, measures of variability quantify the degree to which they differ. If not all of the data values are
the same, there is variability. Just as an objective description of events should account
for both centripetal and centrifugal pressures, consenting and opposing viewpoints, and shared and conflicting views,
measures of central tendency should be supplemented with measures of variability.
Furthermore, measures of variability, or the extent of individual differences around the central tendency, are
used to describe a set of test scores. These measures indicate how spread out the scores in a
distribution are. This is an important aspect of the distribution. Let's say you scored 75 out of 100 on a statistics
test, and you know the class average was 55. You now know you did better than the average, which is a positive
thing, but you have no idea how much better than the rest of the class you did. If the majority of students scored at or
near the mean, your score is relatively high compared to everyone else's. On the other hand, if the scores
were widely spread out, your score might be only moderately above the others.
If the scores are widely scattered around the mean, the distribution has high variability; if the scores are
mostly close to the mean, the distribution has low variability. Consider two distributions with similar means (58 and about 55):
one with high variability, which includes scores such as 4, 15, 27, 36, 72, 75, 76, 78, 98, 99, and another with low
variability, which contains scores such as 48, 50, 52, 54, 55, 56, 56, 56, 58, 59, 62.
This diversity is frequently what we are interested in in the social sciences. Why are some people more
intelligent than others? Why do some people commit crimes? Why do certain businesses make more money than others?
We frequently look for factors that can help us understand variation.
When data is defined by a measure of central tendency (mean, median, or mode), a single number is used
to summarize all of the scores. Measures of variability are widely used to supplement and complement reports of
central tendency. These measures are known to communicate how scores are scattered about the mean in a given
distribution, and they are as follows:

1. The Range is the difference between the highest and lowest scores in a distribution.
2. The Variance is defined as the average of the squared deviations about the mean.
3. The Standard Deviation is the square root of the variance, that is, the square root of the sum of the
squared deviations about the mean divided by the number of scores.

In contrast to the measures of central tendency, measures of variability describe the spread of the distribution.
Whereas the measures of central tendency are specific points in the distribution, variability measures reflect the distances
between points within the distribution. The variability is revealed by the spread of these data points. Range,
variance, and standard deviation are used to quantify variation or variability. They are discussed in more detail
in this chapter.

THE RANGE

The range is the simplest of all the measurements of variability to determine. It is frequently employed as a
preliminary indicator of variance. However, because it only considers the scores at the two extremes, the greatest
and lowest, it is of limited use. It is rather a crude measure of variability.

Range = Highest score - Lowest score

Example 1:
12, 25, 27, 29, 36, 38, 40, 43, 50, 54, 62

Range = 62 – 12 = 50

A range of 50 tells us very little about how the values are dispersed.
Are the values all clustered to one end, with the low value (12) or the high value (62) being an outlier?
Or are the values more evenly dispersed among the range?

Example 2:
In the set of numbers 2, 3, 4, 5, 7, 9 the range would be 9 – 2 = 7.

Example 3:
The range of the numbers 5, 7, 9, 13, 22, 35, 42, 44, 56, 60 is 55.

Example 4:
In the set 5, 30, 31, 33, 34, 37, 39, 41, 60 the range is 55.

By inspection, we can see that the distributions in Examples 3 and 4 both have a range of 55, but that's about all
they have in common. The distribution in Example 3 is fairly evenly spread, while the one in Example 4 tends to be bunched in the middle.
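
The point that two very different distributions can share the same range can be checked with a short Python sketch. The function name `value_range` and the variable names are illustrative only.

```python
def value_range(values):
    """Range = highest score - lowest score."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


evenly_spread = [5, 7, 9, 13, 22, 35, 42, 44, 56, 60]
bunched_in_middle = [5, 30, 31, 33, 34, 37, 39, 41, 60]
print(value_range(evenly_spread))      # 55
print(value_range(bunched_in_middle))  # 55
```

Only the two extreme scores enter the computation, so the range alone cannot tell these two sets apart.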

OTHER CHARACTERISTICS OF THE RANGE

1. It provides a rapid approximation of data variability, although it is not particularly complex.


2. It is limited: if the extreme scores are not typical of the sample yet are included in the scores, the
range will not be typical either.
3. When the mode is the preferred measure of central tendency (i.e. when you have nominal level data),
this method is utilized.
4. It is the most basic measure of variability.
5. It is not very informative because only the most extreme scores are used.
6. It is heavily influenced by extreme scores in your data distribution; only one of these extreme scores can
drastically alter the range, making it unreliable as a measure of variability.

THE VARIANCE AND STANDARD DEVIATION

Squaring each deviation is done in finding standard deviation and variance. Variance is the average
squared differences of scores from the mean score of a distribution. Standard deviation is the square root of the
variance. They are used when the mean is the preferred measure of central tendency.
The standard deviation (σ or s) and variance (σ2 or s2) are more complete measures of variability which
consider every score in a distribution. The most widely used indicator of variability is the standard deviation
which, in a nutshell, is based on the deviation of each score from the mean. The other measures of dispersion we
have discussed are based on considerably less information. However, because variance relies on the squared
differences of scores from the mean, a single outlier has greater impact on the size of the variance than does a
single score near the mean. Some statisticians view this property as a shortcoming of variance as a measure of
variation, especially when there is reason to doubt the reliability of some of the extreme scores. For example, a
researcher might believe that a person who reports watching television for an average of 24 hours per day may
have misunderstood the question. Just one such extreme score might result in an appreciably larger standard
deviation, especially if the sample is small. Fortunately, since all scores are used in the calculation of variance, the
many non-extreme scores (those closer to the mean) will tend to offset the misleading impact of any extreme
scores.

CHARACTERISTICS OF THE VARIANCE AND THE STANDARD DEVIATION


The variance and standard deviation are the most commonly used measures of dispersion or variability in
the social sciences because:

1. Both consider the precise difference between each score and the mean. Consequently, these measures
are based on a maximum amount of information.
2. If any single score is changed, the standard deviation changes. If the
score is moved away from the mean the standard deviation increases. Moving the score toward the mean decreases
the standard deviation.
3. If a score is added that is far from the mean the standard deviation increases. If the added score is close
to the mean, the standard deviation decreases.

COMPUTATION OF THE VARIANCE AND THE STANDARD DEVIATION OF UNGROUPED DATA

Example 1:
Consider a population consisting of the following values: 2, 4, 4, 4, 5, 5, 7, 9.
There are eight data points in total, with a mean of 5:

$$\mu = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5$$

To calculate the population standard deviation, first compute the difference of each data point from the
mean, and square the result:

(2 - 5)² = (-3)² = 9        (5 - 5)² = 0² = 0
(4 - 5)² = (-1)² = 1        (5 - 5)² = 0² = 0
(4 - 5)² = (-1)² = 1        (7 - 5)² = 2² = 4
(4 - 5)² = (-1)² = 1        (9 - 5)² = 4² = 16

Next, divide the sum of these values by the number of values and take the square root to give the standard
deviation:

$$\sigma = \sqrt{\frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8}} = \sqrt{\frac{32}{8}} = 2$$

Therefore, the above has a population standard deviation of 2, where the variance is:

$$\sigma^2 = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4$$

The above assumes a population. If the 8 values are obtained by random sampling from some parent
population, then computing the sample standard deviation would use a denominator of 7 instead of 8.

$$s^2 = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8 - 1} = \frac{32}{7} = 4.57$$

$$s = \sqrt{4.57} = 2.14$$
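
The difference between the population and sample computations can be sketched in Python. The function names and the `sample` flag are assumptions of this illustration; the flag only switches the denominator between N and n - 1.

```python
import math


def variance(values, sample=False):
    """Average squared deviation from the mean; divides by n - 1 when sample=True."""
    n = len(values)
    denominator = n - 1 if sample else n
    if denominator <= 0:
        raise ValueError("not enough values to compute the variance")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / denominator


def standard_deviation(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample=sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), standard_deviation(data))          # 4.0 2.0
print(round(variance(data, sample=True), 2))             # 4.57
print(round(standard_deviation(data, sample=True), 2))   # 2.14
```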

Let us summarize the formulas we used:

A. For Population:

$$\text{Variance} = \sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

where:
σ² = variance of a population
σ = standard deviation of a population
X = population values
μ = population mean
N = total number of values in the population

B. For Sample:

$$\text{Variance} = s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$$

$$\text{Standard Deviation} = s = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$$

where:
s² = variance of a sample
s = standard deviation of a sample
X = sample values
x̄ = sample mean
n = total number of values in the sample

Since we rarely encounter population data in most of the research works done, we use estimates taken from
samples, called statistics. In this case, we will consider the data to be used in our next example to be from samples.

Example 2:
Given the data 50, 98, 82, 23, 46, 40, 63, 52, 92, 54, find the variance and standard deviation.

x̄ = 600/10 = 60

X      X − x̄    (X − x̄)²
50     -10      100
98     38       1444
82     22       484
23     -37      1369
46     -14      196
40     -20      400
63     3        9
52     -8       64
92     32       1024
54     -6       36

n = 10          Σ(X − x̄)² = 5126

$$s^2 = \frac{\sum (X - \bar{x})^2}{n - 1} = \frac{5126}{10 - 1} = 569.56$$

$$s = \sqrt{569.56} = 23.87$$

Interpretation: On the average, the values vary from the mean of 60 by 23.87.

Example 3:
Here are 3 distributions that have the same mean, but different amount of variability. Find their variance
and standard deviation.

A: 10 10 10 10 10     x̄ = 10     X − x̄: 0 0 0 0 0
B: 8 9 10 11 12       x̄ = 10     X − x̄: -2 -1 0 1 2
C: 2 6 10 14 18       x̄ = 10     X − x̄: -8 -4 0 4 8

A: s² = 0       s = 0
B: s² = 2.5     s = 1.58
C: s² = 40      s = 6.32

THE SHORT-CUT FORMULA


Using the data in Example 2, let us find the variance and the standard deviation using the
short-cut formula:

$$s^2 = \frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n(n - 1)} \quad \text{(for sample variance)}$$

$$s = \sqrt{\frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n(n - 1)}} \quad \text{(for sample standard deviation)}$$

X      X²
50     2500
98     9604
82     6724
23     529
46     2116
40     1600
63     3969
52     2704
92     8464
54     2916

ΣX = 600     ΣX² = 41126

$$s^2 = \frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n(n - 1)} = \frac{10(41126) - (600)^2}{10(9)} = \frac{411260 - 360000}{90} = \frac{51260}{90} = 569.56$$

$$s = \sqrt{\frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n(n - 1)}} = \sqrt{569.56} = 23.87$$

We get the same answers using these formulas. You may choose the method which you find easier to use.
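
The agreement between the two methods can be verified with a short Python sketch. Both function names are hypothetical; the first follows the definitional formula and the second the short-cut formula.

```python
import math


def sample_variance_definition(values):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """s^2 = (n * sum(x^2) - (sum(x))^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return (n * sum_x_squared - sum_x ** 2) / (n * (n - 1))


data = [50, 98, 82, 23, 46, 40, 63, 52, 92, 54]
print(round(sample_variance_definition(data), 2))           # 569.56
print(round(sample_variance_shortcut(data), 2))             # 569.56
print(round(math.sqrt(sample_variance_shortcut(data)), 2))  # 23.87
```

The short-cut form needs only the running totals Σx and Σx², so the mean does not have to be computed first.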

COMPUTATION OF THE VARIANCE AND THE STANDARD DEVIATION OF GROUPED DATA

The formulas used in finding the variance and standard deviation of grouped data are the following:

A. For Population:

$$\text{Variance} = \sigma^2 = \frac{\sum f (X_m - \mu)^2}{N}$$

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum f (X_m - \mu)^2}{N}}$$

where:
σ² = population variance
σ = population standard deviation
f = frequency of each class
Xm = midpoint of each class
μ = population mean
N = total number of observations in a population

B. For Sample:

$$\text{Variance} = s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1}$$

$$\text{Standard Deviation} = s = \sqrt{\frac{\sum f (X_m - \bar{x})^2}{n - 1}}$$

where:
s² = sample variance
s = sample standard deviation
f = frequency of each class
Xm = midpoint of each class
x̄ = sample mean
n = total number of values in the sample

Example 1:

X      f     fX     X − x̄    (X − x̄)²    f(X − x̄)²
72     1     72     7        49          49
70     1     70     5        25          25
67     1     67     2        4           4
66     2     132    1        1           2
64     1     64     -1       1           1
62     1     62     -3       9           9
61     3     183    -4       16          48

       n = 10   ΣfX = 650                Σf(X − x̄)² = 138

$$\bar{x} = \frac{\sum fX}{n} = \frac{650}{10} = 65$$

$$s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1} = \frac{138}{9} = 15.33$$

$$s = \sqrt{\frac{\sum f (X_m - \bar{x})^2}{n - 1}} = \sqrt{15.33} = 3.92$$

Example 2:

X     f     fX     X − x̄    (X − x̄)²    f(X − x̄)²
8     4     32     2        4           16
7     5     35     1        1           5
6     7     42     0        0           0
5     4     20     -1       1           4
4     3     12     -2       4           12
3     1     3      -3       9           9

      n = 24   ΣfX = 144                Σf(X − x̄)² = 46

$$\bar{x} = \frac{\sum fX}{n} = \frac{144}{24} = 6$$

$$s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1} = \frac{46}{23} = 2.00$$

$$s = \sqrt{\frac{\sum f (X_m - \bar{x})^2}{n - 1}} = \sqrt{2.00} = 1.41$$
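
The sample formulas for grouped data can be sketched in Python. As an illustration, the sketch is applied to the height distribution of 50 men from the chapter on central tendency, using the class marks as midpoints. The function name `grouped_sample_variance` is hypothetical, and the mean is taken as the frequency-weighted mean Σf·Xm / n.

```python
import math


def grouped_sample_variance(midpoints, frequencies):
    """s^2 = sum(f * (Xm - mean)^2) / (n - 1), with mean = sum(f * Xm) / n."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    weighted_squares = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
    return weighted_squares / (n - 1)


midpoints = [62, 65, 68, 71, 74, 77, 80]
freqs = [2, 5, 12, 15, 8, 5, 3]
s_squared = grouped_sample_variance(midpoints, freqs)
print(round(s_squared, 2))             # 19.28
print(round(math.sqrt(s_squared), 2))  # 4.39
```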

EXERCISE

1. The following scores were obtained by a class of boys and girls in a 20-item test in Statistics.

Boys: 6, 7, 8, 9, 10, 11, 11, 16


Girls: 5, 6, 6, 7, 8, 10, 10, 12, 13, 15, 18, 19

Find the range, variance, and standard deviation.

2. The following are the scores of two sections of students in the same level who took the removal test in
Mathematics.

Section X Section Y
40 67
72 75
59 41
65 42
47 55
30 72
62 38

a. Find the mean, range, variance, and standard deviation of each section.


b. Answer the following:
i. Which section performed better in the test?
ii. Which section has the more uniform set of scores?
iii. Which section shows more variability in scores?
