Chapter Four
Measures of Variation
Chapter Goals
After completing this chapter, you will be able to:
• Compute and interpret the absolute and relative measures of
variation for a set of data.
• Find the range, variance, standard deviation, coefficient of
variation, and standard score and know what these values mean.
Introduction
Dispersion refers to the lack of uniformity in the sizes or
qualities of the items in a dataset.
A measure of central tendency shows only the middle or
the average of a dataset; on its own, it says nothing about
variability.
▪ Example: Consider the following datasets:
A: 1, 2, 5, 6, 6
B: −40, 0, 5, 20, 35
Mean(A) = Mean(B) = 4
▪ However, the data in A seem more consistent (less
variable) than the data in B.
▪ If observations are close to the center, we say that
dispersion is small.
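The comparison of A and B above can be checked with a short Python sketch. It uses the range (largest value minus smallest value) as a rough first indicator of spread; the variable names are illustrative only.

```python
from statistics import mean

# Datasets A and B from the example above
A = [1, 2, 5, 6, 6]
B = [-40, 0, 5, 20, 35]

mean_a, mean_b = mean(A), mean(B)

# Largest minus smallest value: a rough first look at spread
range_a = max(A) - min(A)
range_b = max(B) - min(B)

print(mean_a, mean_b)    # 4 4
print(range_a, range_b)  # 5 75
```

The means coincide, yet the values of B are spread over a far wider interval than those of A.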
4.1 Objectives of Measuring Variation
• To assess the reliability of the average being used: if
the dispersion in the values of a dataset is large, the
average may be unrepresentative of the dataset.
• To compare two or more sets of data with regard to
their variability.
• To identify the causes of variability with a view to
controlling it.
• To serve as a basis for further statistical analysis.
4.2 Absolute and Relative Measures
Measures of dispersion are either absolute or relative.
• Absolute measures of variation are expressed in the same
units of measurement as the original data.
• They are recommended for comparing variation between
distributions measured in the same units.
• A relative measure of variation is the ratio of an
absolute measure of variation to a measure of central tendency.
• Relative measures are used to compare the variation of
datasets measured in different units.
Absolute and Relative Measures . . .
Absolute Variation       Relative Variation
Range                    Coefficient of range
Inter-quartile range     Coefficient of quartile deviation
Quartile deviation       Coefficient of mean deviation
Mean deviation           Coefficient of variation
Variance and sd          Standard scores
4.3 Types of Measures of Variation
4.3.1 Range
Range: the difference between the largest and the smallest
values.
Range = UCB (upper class boundary) of the last class – LCB (lower class boundary) of the first class
(for a grouped frequency distribution)
• Example: Find the range of the following distributions.
1) 23, 42, 20, 30, 35, 21, 45, 33, 23, 23, 20, 42, 29, 20
Range: 45 – 20 = 25
2) Class:      2.5 – 10.5   10.5 – 18.5   18.5 – 26.5   26.5 – 34.5
   Frequency:  4            7             6             15
Range: 34.5 – 2.5 = 32
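Both range calculations above can be reproduced with a minimal Python sketch. Representing each class as a (lower boundary, upper boundary) pair is an assumption made here for illustration.

```python
# 1) Ungrouped data: largest value minus smallest value
data = [23, 42, 20, 30, 35, 21, 45, 33, 23, 23, 20, 42, 29, 20]
raw_range = max(data) - min(data)

# 2) Grouped data: each class as (lower boundary, upper boundary), in order
classes = [(2.5, 10.5), (10.5, 18.5), (18.5, 26.5), (26.5, 34.5)]
# UCB of the last class minus LCB of the first class
grouped_range = classes[-1][1] - classes[0][0]

print(raw_range)      # 25
print(grouped_range)  # 32.0
```

Note that the frequencies play no role in the grouped range; only the outermost class boundaries matter.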
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two dot plots over the values 7 to 12 with different
distributions; in both, Range = 12 - 7 = 5]
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
• Cannot be calculated for open–end distributions.
4.3.2 Variance, Standard Deviation and Coefficient of Variation
• The sum of squares is obtained by subtracting the mean from
each observation, squaring each of these deviations, and
adding the results.
• Variance is the average of these squared deviations.
If $x_1, x_2, \dots, x_N$ are N observations, then the population
variance, denoted $\sigma^2$, is given by:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
where
$\mu$ = population mean
N = population size
$x_i$ = ith value of the variable x
Sample Variance
• Average (approximately) of the squared deviations of values
from the mean.
– Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where
$\bar{x}$ = arithmetic mean of the sample
n = sample size
$x_i$ = ith value of the variable x
• The sum of squares in this case is divided by n - 1 in order
to get an unbiased estimator of the population variance.
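The two divisors (N for a population, n - 1 for a sample) can be compared in a small Python sketch. The function names and the dataset below are illustrative assumptions, not part of the text; the standard library functions `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
from statistics import pvariance, variance

def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(population_variance(values))  # 4.0   (32 / 8)
print(sample_variance(values))      # 32 / 7, about 4.571

# Cross-check against the standard library
print(pvariance(values), variance(values))
```

For the same data, the sample variance is always slightly larger than the population variance because of the smaller divisor.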
Standard Deviation
• Standard deviation is the positive square root of variance.
• To put the variance on the same scale as the original data, we
prefer to work with the standard deviation.
• Standard deviation shows variation about the mean.
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
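Example: Sample Standard Deviation
Sample data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, mean = $\bar{x}$ = 16
\[ s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} \]
\[ = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} \]
\[ = \sqrt{\frac{130}{7}} \approx 4.3095 \]
This is a measure of the “average” scatter around the mean.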
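The arithmetic in this example can be verified step by step with a minimal Python sketch. The squared deviations are 36, 16, 4, 1, 1, 4, 4 and 64, which sum to 130; `statistics.stdev` is used only as an independent check.

```python
from math import sqrt
from statistics import stdev

sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
x_bar = sum(sample) / n                          # sample mean
sum_sq = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations
s = sqrt(sum_sq / (n - 1))                       # sample standard deviation

print(x_bar, sum_sq, round(s, 4))  # 16.0 130.0 4.3095
print(round(stdev(sample), 4))     # 4.3095
```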
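Alternative Formulae
1. \[ s = \sqrt{\frac{1}{n - 1}\left[\sum x_i^2 - n\bar{x}^2\right]} \]
2. \[ s = \sqrt{\frac{1}{n - 1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]} \]
For data from a frequency distribution, the standard deviation is
given by:
\[ s = \sqrt{\frac{1}{n - 1}\sum f(x - \bar{x})^2}, \quad \text{where } \bar{x} = \frac{\sum fx}{\sum f} \]
or, equivalently,
\[ s = \sqrt{\frac{1}{n(n - 1)}\left[n\sum fx^2 - \left(\sum fx\right)^2\right]} \]
where x is the class mark of the ith class, f is its frequency, and $n = \sum f$.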
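As a quick sanity check, the shortcut formulae can be compared numerically with the definitional formula. The sketch below is one possible arrangement; the function names and the small test datasets are hypothetical.

```python
from math import isclose, sqrt

def sd_definition(xs):
    """Sample sd from the definition: squared deviations over n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

def sd_shortcut_1(xs):
    """Formula 1: [sum(x^2) - n * mean^2] / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sqrt((sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1))

def sd_shortcut_2(xs):
    """Formula 2: [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    n = len(xs)
    return sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))

def sd_grouped(marks, freqs):
    """Grouped shortcut: [n * sum(f x^2) - (sum f x)^2] / [n (n - 1)]."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))
    return sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

xs = [4, 7, 9, 10, 15]                    # hypothetical raw data
marks, freqs = [5, 15, 25], [2, 3, 1]     # hypothetical grouped data
expanded = [5, 5, 15, 15, 15, 25]         # the same grouped data written out

print(isclose(sd_definition(xs), sd_shortcut_1(xs)))               # True
print(isclose(sd_definition(xs), sd_shortcut_2(xs)))               # True
print(isclose(sd_definition(expanded), sd_grouped(marks, freqs)))  # True
```

The shortcut forms avoid computing each deviation separately, which is convenient for hand calculation, although they can lose precision in floating-point arithmetic when the mean is large relative to the spread.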
Examples:
1) Find the sample mean, variance and standard deviation for A and B.
A: 10 60 50 30 40 20
B: 40 30 45 35 40 20
Example . . .
          A     B    A - X̄_A   (A - X̄_A)²   B - X̄_B   (B - X̄_B)²
         10    40      -25        625           5          25
         60    30       25        625          -5          25
         50    45       15        225          10         100
         30    35       -5         25           0           0
         40    40        5         25           5          25
         20    20      -15        225         -15         225
Total   210   210        0       1750           0         400
Mean     35    35        0                      0
Var(A) = 1750/5 = 350        Var(B) = 400/5 = 80
Sd(A) = 18.71                Sd(B) = 8.94
The variances differ even though the means are equal.
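The results for A and B can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. This is a sketch for checking the table, not a required method.

```python
from statistics import mean, stdev, variance

A = [10, 60, 50, 30, 40, 20]
B = [40, 30, 45, 35, 40, 20]

# Collect (mean, sample variance, sample sd rounded to 2 places) per dataset
summary = {
    name: (mean(data), variance(data), round(stdev(data), 2))
    for name, data in (("A", A), ("B", B))
}

for name, (m, var, sd) in summary.items():
    print(name, "mean:", m, "variance:", var, "sd:", sd)
```

Both datasets have mean 35, but A has variance 350 (sd about 18.71) while B has variance 80 (sd about 8.94).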
2) Find the sample variance of the following frequency distribution.
Class:      0 – 10   10 – 20   20 – 30   30 – 40   40 – 50
Frequency:  7        6         15        12        10
Example . . .
Class       f    Xm    fXm    (Xm - X̄)   (Xm - X̄)²   f(Xm - X̄)²
0 – 10      7     5     35     -22.4       501.76       3512.32
10 – 20     6    15     90     -12.4       153.76        922.56
20 – 30    15    25    375      -2.4         5.76         86.40
30 – 40    12    35    420       7.6        57.76        693.12
40 – 50    10    45    450      17.6       309.76       3097.60
Total      50         1370                               8312.00
Mean = 1370/50 = 27.4
Var(X) = 8312/49 = 169.63
Since the variance is just the square of the standard deviation,
these quantities contain essentially the same information on
different scales.
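The grouped calculation can be retraced in a few lines of Python, using the class marks (midpoints) as representative values. The variable names are illustrative.

```python
from math import sqrt

class_marks = [5, 15, 25, 35, 45]   # midpoints of 0-10, 10-20, ..., 40-50
freqs = [7, 6, 15, 12, 10]

n = sum(freqs)                                              # 50
mean = sum(f * x for x, f in zip(class_marks, freqs)) / n   # 1370 / 50
# Frequency-weighted sum of squared deviations from the mean
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, freqs))
var = sum_sq / (n - 1)
sd = sqrt(var)

print(n, mean)        # 50 27.4
print(round(var, 2))  # 169.63
print(round(sd, 2))   # 13.02
```

The standard deviation, about 13.02, is in the same units as the data, whereas the variance is in squared units.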
Comparing Standard Deviations
Which of the following datasets is the most variable?
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
All three datasets have the same mean, but Data C has the largest
standard deviation and is therefore the most variable.
Algebraic Properties of Variance and Standard Deviation
• Sd(X) ≥ 0
• If a constant K is added to or subtracted from each observation,
the variance and sd remain the same.
• If each observation is multiplied by K, the new variance and
sd will be K² × (previous variance) and |K| × (previous sd),
respectively.
• If each observation is divided by K (K ≠ 0), the new variance and
sd will be (previous variance)/K² and (previous sd)/|K|,
respectively.
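These properties can be checked numerically. The sketch below reuses the eight sample values 10, 12, ..., 24 from the standard deviation example and a hypothetical constant K = -3, chosen negative to show why the absolute value |K| is needed for the sd.

```python
from math import isclose
from statistics import stdev, variance

x = [10, 12, 14, 15, 17, 18, 18, 24]
K = -3  # hypothetical constant

shifted = [v + K for v in x]   # add K to each observation
scaled = [v * K for v in x]    # multiply each observation by K
divided = [v / K for v in x]   # divide each observation by K

# Adding a constant leaves variance and sd unchanged
shift_ok = (isclose(variance(shifted), variance(x))
            and isclose(stdev(shifted), stdev(x)))
# Multiplying by K scales the variance by K^2 and the sd by |K|
scale_ok = (isclose(variance(scaled), K ** 2 * variance(x))
            and isclose(stdev(scaled), abs(K) * stdev(x)))
# Dividing by K divides the variance by K^2 and the sd by |K|
divide_ok = (isclose(variance(divided), variance(x) / K ** 2)
             and isclose(stdev(divided), stdev(x) / abs(K)))

print(shift_ok, scale_ok, divide_ok)  # True True True
```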
Coefficient of Variation (CV)
• Measures relative variation
• Always expressed as a percentage (%)
• Useful to compare the amount of variation among
groups with different means.
• CV is a unitless quantity.
• Can be used to compare two or more sets of data
measured in different units
\[ CV = \frac{s}{\bar{x}} \times 100\% \]
Comparing Coefficient of Variation
• Example: Which stock is more variable relative to price?
• Stock A:
– Average price last year = Birr 50
– Standard deviation = Birr 5
\[ CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{\text{Birr } 5}{\text{Birr } 50} \times 100\% = 10\% \]
• Stock B:
– Average price last year = Birr 100
– Standard deviation = Birr 5
\[ CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{\text{Birr } 5}{\text{Birr } 100} \times 100\% = 5\% \]
Both stocks have the same standard deviation, but stock B is less
variable relative to its price.
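The stock comparison can be expressed as a small Python helper. The function name `coefficient_of_variation` is a hypothetical choice for this sketch.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage: (sd / mean) * 100."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(cv_a, cv_b)  # 10.0 5.0
```

Because the units (Birr) cancel in the ratio, the two percentages can be compared directly: stock A varies twice as much as stock B relative to its average price.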
4.4 Standard Score (Z – score)
• Shows how far an individual value lies above or below the
mean, in units of standard deviation.
– “How far above or below” = (data value – mean)
– “In units of standard deviation” = divided by the standard deviation
• Standardized data value:
\[ Z = \frac{\text{value of the variable} - \text{mean}}{\text{standard deviation}} \]
– A negative z means the data value falls below the mean.
– A zero z means the data value is the same as the mean.
– A positive z means the data value is above the mean.
Example
• A student obtained 80 on a civics exam that had a mean of 70
and a standard deviation of 10. The same student obtained 60
on a calculus exam, which had a mean of 51 and a variance of
64. On which exam did the student perform better relative to
other students? Why?
                      Civics                 Calculus
Mean                  70                     51
Standard deviation    10                     √64 = 8
Score                 80                     60
Z                     (80 – 70)/10 = 1.00    (60 – 51)/8 = 1.125
• Since Z_Cal = 1.125 is greater than Z_Civ = 1.00, the student
performed better on the calculus exam relative to other students.
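The two standard scores can be computed with a minimal Python sketch. The helper name `z_score` is hypothetical; note that the calculus exam is described by its variance, so the square root is taken first.

```python
from math import sqrt

def z_score(value, mean, sd):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / sd

z_civics = z_score(80, mean=70, sd=10)
z_calculus = z_score(60, mean=51, sd=sqrt(64))  # variance 64, so sd = 8

print(z_civics, z_calculus)  # 1.0 1.125
```

The raw civics score is higher, but the calculus score sits further above its class mean in standard deviation units.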