Chapter 4 Data Management Lesson 3 Measures of Dispersion
Measures of dispersion describe how a set of values spreads out or fluctuates. The measures of dispersion are the range, the variance, the standard deviation, the coefficient of variation, the coefficient of skewness and the boxplot.
1. Range is the simplest measure of dispersion. It is the difference between the highest and the lowest score. Because it ignores how the values lying between the highest and the lowest scores vary, it is not considered a reliable measure of variability and spread.
For ungrouped data: The range of a set of data is the absolute difference between the highest and the lowest value in the set. The range is denoted by R.
R = |HV − LV|
where
R - Range
HV – Highest value
LV - Lowest value
Example 1. The items listed below represent the scores of seven BS Applied Statistics students in the final examination. Compute the range: 89, 75, 90, 85, 78, 87 and 80.
Example 2. Suppose the BS Applied Math program has 10 students whose heights (in cm) are as follows: 170, 165, 155, 160, 150, 149, 152, 161, 163, 175. Find the range of the heights of the BSAM students.
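Examples 1 and 2 leave the computation to the reader; the formula R = |HV − LV| is simple enough to check in a few lines. The following is a minimal Python sketch; the helper name `data_range` is hypothetical and not part of the lesson.

```python
def data_range(values):
    """Range of ungrouped data, R = |HV - LV|."""
    highest = max(values)   # HV, the highest value
    lowest = min(values)    # LV, the lowest value
    # max - min is never negative, so abs() only mirrors the |HV - LV| notation
    return abs(highest - lowest)


example1_scores = [89, 75, 90, 85, 78, 87, 80]
example2_heights_cm = [170, 165, 155, 160, 150, 149, 152, 161, 163, 175]

print(data_range(example1_scores))      # 90 - 75
print(data_range(example2_heights_cm))  # 175 - 149
```

This gives a range of 15 points for Example 1 and 26 cm for Example 2.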
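For grouped data: The range is the difference between the upper limit of the highest class and the lower limit of the lowest class.
RG = |ULHC − LLLC|
where:
RG – Range of grouped data
ULHC – Upper Limit of the Highest Class
LLLC – Lower Limit of the Lowest Class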
Example 3. The table below represents the scores of 64 students in a long quiz.
2. Variance, the mean of the squared deviations from the mean, is a measure of dispersion that takes into account the spread of all the items in a series about the point of central tendency.
For ungrouped data: Given the set of values X1, X2, X3, ..., XN, the deviation of the ith observation from the mean is Xi − µ. The population variance, σ², is
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + (X_3 - \mu)^2 + \cdots + (X_N - \mu)^2}{N}$$
The computational formula of the variance is
$$\sigma^2 = \frac{\sum X_i^2}{N} - \mu^2 = \frac{X_1^2 + X_2^2 + X_3^2 + \cdots + X_N^2}{N} - \mu^2$$
Example 4. The following data present the scores of 7 BS Applied Statistics students in a quiz:
X1 = 4, X2 = 7, X3 = 8, X4 = 2, X5 = 2, X6 = 9, X7 = 3.
$$\mu = \frac{4 + 7 + 8 + 2 + 2 + 9 + 3}{7} = \frac{35}{7} = 5$$
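$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{7} = \frac{(X_1 - 5)^2 + (X_2 - 5)^2 + (X_3 - 5)^2 + \cdots + (X_7 - 5)^2}{7}$$
$$\sigma^2 = \frac{(4-5)^2 + (7-5)^2 + (8-5)^2 + (2-5)^2 + (2-5)^2 + (9-5)^2 + (3-5)^2}{7} = \frac{52}{7} = 7.42857$$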
Using the computational formula
$$\sigma^2 = \frac{\sum X_i^2}{N} - \mu^2 = \frac{4^2 + 7^2 + 8^2 + 2^2 + 2^2 + 9^2 + 3^2}{7} - 5^2 = \frac{227}{7} - 25 \approx 7.43$$
Note: The definitional and the computational formulas give the same population variance, but the computational formula is faster and easier to apply.
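To see that the two formulas agree, here is a minimal Python sketch that applies both to the Example 4 scores and cross-checks them against `statistics.pvariance` from Python's standard library. The function names are illustrative, not from the lesson.

```python
import statistics


def pvariance_definitional(xs):
    """sigma^2 = sum((X_i - mu)^2) / N"""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def pvariance_computational(xs):
    """sigma^2 = sum(X_i^2) / N - mu^2"""
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2


scores = [4, 7, 8, 2, 2, 9, 3]  # Example 4
print(round(pvariance_definitional(scores), 5))   # 52/7
print(round(pvariance_computational(scores), 5))  # 227/7 - 25
print(round(statistics.pvariance(scores), 5))     # standard-library check
```

All three lines print 7.42857. On paper the computational formula saves work, but in floating point it subtracts two nearly equal quantities when the mean is large relative to the spread, so the definitional form is usually the numerically safer choice in code.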
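For a sample of n values X1, X2, ..., Xn with mean X̄, the sample variance, s², is
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1}$$
and its computational formula is
$$s^2 = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n-1)}$$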
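Example 5. Compute the sample variance of the following sample of 10 scores: 4, 7, 8, 2, 2, 8, 9, 2, 5, 7.
$$\bar{X} = \frac{\sum X_i}{n} = \frac{4 + 7 + 8 + 2 + 2 + 8 + 9 + 2 + 5 + 7}{10} = \frac{54}{10} = 5.4$$
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{(4-5.4)^2 + (7-5.4)^2 + (8-5.4)^2 + \cdots + (7-5.4)^2}{9} = \frac{68.4}{9} = 7.6$$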
Using the computational formula, with
$$\sum X_i^2 = 4^2 + 7^2 + 8^2 + \cdots + 7^2 = 360,$$
$$s^2 = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n-1)} = \frac{10(360) - (54)^2}{10(10-1)} = \frac{684}{90} = 7.6$$
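The sample version can be checked the same way. A minimal sketch using the ten scores of Example 5; note the divisor n − 1, which is also what the standard library's `statistics.variance` uses. The function names are illustrative.

```python
import statistics


def svariance_definitional(xs):
    """s^2 = sum((X_i - mean)^2) / (n - 1)"""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def svariance_computational(xs):
    """s^2 = (n * sum(X_i^2) - (sum(X_i))^2) / (n(n - 1))"""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


sample = [4, 7, 8, 2, 2, 8, 9, 2, 5, 7]  # Example 5: sum = 54, sum of squares = 360
print(round(svariance_definitional(sample), 4))
print(round(svariance_computational(sample), 4))  # (3600 - 2916) / 90
print(round(statistics.variance(sample), 4))
```

Each line prints 7.6, matching the worked example.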
For grouped data: The variance of grouped data can be obtained using the formulas
$$\sigma_G^2 = \frac{\sum f_i X_i^2}{N} - \mu_G^2 = \frac{f_1X_1^2 + f_2X_2^2 + \cdots + f_kX_k^2}{N} - \mu_G^2$$
$$s^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n-1)}$$
where
fi - the frequency of the ith class
Xi - the class mark of the ith class
µG – the mean from the grouped data
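Example 6: The table below represents the scores of 64 students in a long quiz.
$$\sigma_G^2 = \frac{\sum f_i X_i^2}{N} - \mu_G^2 = \frac{29311}{64} - \left(\frac{1273}{64}\right)^2 = 62.347412 \approx 62.35$$
$$s^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n-1)} = \frac{64(29311) - (1273)^2}{64(64-1)} = 63.3370536 \approx 63.34$$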
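For grouped data only the totals Σf_iX_i and Σf_iX_i² are needed. Because the Example 6 frequency table is not reproduced here, this hypothetical sketch works from the totals quoted in the example (1273 and 29311, with N = 64); the helper `grouped_totals` only shows how those totals would be formed from frequencies and class marks.

```python
def grouped_totals(frequencies, class_marks):
    """Return (sum f_i X_i, sum f_i X_i^2, N) from a frequency table."""
    sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, class_marks))
    return sum_fx, sum_fx2, sum(frequencies)


def grouped_variances(sum_fx, sum_fx2, n):
    """Return (population variance sigma_G^2, sample variance s^2) for grouped data."""
    mu_g = sum_fx / n                                          # grouped mean
    pop_var = sum_fx2 / n - mu_g ** 2                          # sum f X^2 / N - mu_G^2
    sample_var = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))   # computational form
    return pop_var, sample_var


pop_var, sample_var = grouped_variances(1273, 29311, 64)  # Example 6 totals
print(round(pop_var, 2), round(sample_var, 2))
```

This prints 62.35 63.34, agreeing with the example.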
Standard deviation is based on the deviations of all the scores in a series and is always computed from the mean. The standard deviation is defined as the positive square root of the variance. It is denoted by σ for the population standard deviation and by s for the sample standard deviation:
$$\sigma = \sqrt{\sigma^2}, \qquad \sigma_G = \sqrt{\sigma_G^2}, \qquad s = \sqrt{s^2}, \qquad s_G = \sqrt{s_G^2}$$
Example 7. Using the data in Example 4, compute the population standard deviation.
From Example 4, the population variance was 7.43; hence the population standard deviation is σ = √7.43 = 2.7258.
Example 8. Using the data in Example 5, compute the sample standard deviation.
From Example 5, the sample variance was 7.6; hence the sample standard deviation is s = √7.6 ≈ 2.7568.
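Example 9. Using the data in Example 6, compute the population and sample standard deviations.
From Example 6, the population variance was 62.347412 ≈ 62.35 and the sample variance was 63.3370536 ≈ 63.34; hence the population standard deviation is σG = √62.347412 ≈ 7.896 and the sample standard deviation is sG = √63.3370536 ≈ 7.958.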
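Because the standard deviation is just the square root of the variance, Examples 7 to 9 reduce to one `math.sqrt` call each. This minimal sketch also computes σ and s directly from the raw scores with `statistics.pstdev` and `statistics.stdev`, which exposes the small difference caused by rounding the variance to 7.43 before taking the root.

```python
import math
import statistics

example4 = [4, 7, 8, 2, 2, 9, 3]                 # population data (Example 4)
example5 = [4, 7, 8, 2, 2, 8, 9, 2, 5, 7]        # sample data (Example 5)

print(round(math.sqrt(7.43), 4))                 # Example 7, from the rounded variance
print(round(statistics.pstdev(example4), 4))     # Example 7, from raw data: sqrt(52/7)
print(round(statistics.stdev(example5), 4))      # Example 8: sqrt(7.6)
print(round(math.sqrt(62.347412), 3),            # Example 9: sigma_G
      round(math.sqrt(63.3370536), 3))           # Example 9: s_G
```

Rounding the variance first gives 2.7258, while the unrounded value gives about 2.7255; both agree to two decimal places.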
The standard deviation has the same properties as the variance except property 5. The standard deviation is expressed in the same unit of measure as the raw data.
Coefficient of variation, also known as relative dispersion, is the ratio of the standard deviation to the mean and is usually expressed in percent; i.e.,
$$CV = \frac{s}{\bar{X}} \times 100 \qquad \text{or} \qquad CV = \frac{\sigma}{\mu} \times 100$$
The coefficient of variation is a unitless measure of dispersion; hence it can be used to compare the variability of two or more groups of data measured in the same or in different units.
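As an illustration, the sketch below computes the coefficient of variation for the Example 4 scores (treated as a population) and the Example 5 scores (treated as a sample), using standard deviations from Python's `statistics` module. The helper name `coefficient_of_variation` is hypothetical.

```python
import statistics


def coefficient_of_variation(sd, mean):
    """CV in percent: (standard deviation / mean) * 100."""
    return sd / mean * 100


pop_scores = [4, 7, 8, 2, 2, 9, 3]               # Example 4 (population)
sample_scores = [4, 7, 8, 2, 2, 8, 9, 2, 5, 7]   # Example 5 (sample)

cv_pop = coefficient_of_variation(statistics.pstdev(pop_scores),
                                  statistics.mean(pop_scores))
cv_sample = coefficient_of_variation(statistics.stdev(sample_scores),
                                     statistics.mean(sample_scores))
print(f"{cv_pop:.2f}% {cv_sample:.2f}%")
```

This prints about 54.51% and 51.05%, so relative to its mean the Example 4 group is slightly more dispersed, even though its raw standard deviation is smaller.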
Skewness measures how asymmetric a distribution of data is about the mean. Positive skewness indicates a distribution with an asymmetric tail extending toward the right, while negative skewness indicates a distribution with an asymmetric tail extending toward the left.
The distribution of data is said to be symmetric about the mean if its graph can be folded along a vertical axis through the mean so that the two sides coincide. Analytically, if the coefficient of skewness is zero, then the distribution of the data is symmetric about the mean.
Using the measures of central tendency, tell whether the following data are symmetric, skewed to the left, or skewed to the right: 4, 7, 8, 2, 8, 8, 9, 2, 5, 7.
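One common rule of thumb compares the three measures of central tendency: mean < median < mode suggests a distribution skewed to the left, mean > median > mode one skewed to the right, and equal values a symmetric one. The following minimal sketch applies that rule to the data above; the rule's exact form and the name `classify_shape` are assumptions layered on the lesson, not part of it.

```python
import statistics


def classify_shape(xs):
    """Rule-of-thumb shape check from the mean, median and mode."""
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    mode = statistics.mode(xs)      # most frequent value (first one found on ties)
    if mean == median == mode:
        shape = "symmetric"
    elif mean < median < mode:
        shape = "skewed to the left"
    elif mean > median > mode:
        shape = "skewed to the right"
    else:
        shape = "not decided by this rule"
    return mean, median, mode, shape


print(classify_shape([4, 7, 8, 2, 8, 8, 9, 2, 5, 7]))
```

Here the mean is 6, the median 7 and the mode 8, so the rule points to a distribution skewed to the left.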
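The formula for the Pearsonian coefficient of skewness, denoted by SK, is
$$SK = \frac{3(\mu - Md)}{\sigma}$$
where µ is the mean, Md is the median and σ is the standard deviation (X̄ and s are used for sample data).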
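Example 10: The following data represent the scores of 7 BS Applied Statistics students in a quiz: X1 = 4, X2 = 7, X3 = 8, X4 = 2, X5 = 2, X6 = 9, X7 = 3. Here µ = 5, the median of the ordered scores 2, 2, 3, 4, 7, 8, 9 is Md = 4, and σ = 2.73.
$$SK = \frac{3(5 - 4)}{2.73} = 1.0989 \approx 1.10$$
Hence, the distribution is positively skewed.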
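The Pearsonian coefficient can be computed directly from the raw scores. A minimal sketch, assuming the population standard deviation as in Example 10 (`pearson_skewness` is a hypothetical helper name):

```python
import statistics


def pearson_skewness(xs):
    """SK = 3(mean - median) / sigma, using the population standard deviation."""
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    sigma = statistics.pstdev(xs)
    return 3 * (mean - median) / sigma


scores = [4, 7, 8, 2, 2, 9, 3]  # Example 10
print(round(pearson_skewness(scores), 4))
```

With the unrounded σ ≈ 2.7255 this gives about 1.1007; the example's 1.0989 uses σ rounded to 2.73, and both round to 1.10.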
Example 12: Using the data from the frequency distribution table in Example 6, compute the coefficient of skewness.
$$SK = \frac{3(19.89 - 20.06)}{7.897} = \frac{-0.51}{7.897} \approx -0.06$$
Coefficient of Kurtosis
Kurtosis measures the flatness or peakedness of the distribution of a given data set. It also measures the degree of departure from the normal distribution.
A distribution that is more peaked than the normal distribution is called a leptokurtic distribution. A distribution that is flatter than the normal distribution is called a platykurtic distribution. Between these two types is a distribution that is more "normal" in shape, referred to as a mesokurtic distribution.
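$$K = \frac{\sum (X_i - \mu)^4 / N}{(\sigma^2)^2} \qquad \text{(population)}$$
$$K = \frac{\sum (X_i - \bar{X})^4 / n}{(s^2)^2} \qquad \text{(sample)}$$
$$K_G = \frac{\sum f_i (X_i - \mu_G)^4 / N}{(\sigma_G^2)^2} \qquad \text{(grouped data)}$$
With this definition, a normal distribution has K = 3; values above 3 indicate a leptokurtic distribution and values below 3 a platykurtic one.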
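Example 13: The following data represent the scores of 7 BS Applied Statistics students in a quiz:
X1 = 4, X2 = 7, X3 = 8, X4 = 2, X5 = 2, X6 = 9, X7 = 3.
Solution: μ = 5, σ = 2.73, and Σ(Xi − μ)⁴ = (−1)⁴ + 2⁴ + 3⁴ + (−3)⁴ + (−3)⁴ + 4⁴ + (−2)⁴ = 532.
$$K = \frac{532/7}{(2.73^2)^2} = 1.3683 \approx 1.37$$
For a sample of n = 10 scores with s = 2.69:
$$K = \frac{719.017/10}{(2.69^2)^2} = 1.373188 \approx 1.37$$
For grouped data with N = 64 and σG = 7.898:
$$K_G = \frac{618352.20/64}{(7.898^2)^2} = 2.483061 \approx 2.48$$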
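The population formula can be checked on the Example 13 scores with a short function. This is a minimal sketch of the moment-ratio kurtosis defined above (not the "excess" kurtosis, which subtracts 3); the name `kurtosis_population` is hypothetical.

```python
import statistics


def kurtosis_population(xs):
    """K = [sum((X_i - mu)^4) / N] / (sigma^2)^2; a normal distribution gives 3."""
    n = len(xs)
    mu = statistics.mean(xs)
    fourth_moment = sum((x - mu) ** 4 for x in xs) / n   # 532 / 7 for Example 13
    variance = statistics.pvariance(xs, mu)              # sigma^2 = 52 / 7
    return fourth_moment / variance ** 2


scores = [4, 7, 8, 2, 2, 9, 3]  # Example 13
print(round(kurtosis_population(scores), 4))
```

Using the exact variance 52/7 gives about 1.3772; rounding σ to 2.73 first gives about 1.368. Either way K is well below 3, so by the rule above these scores form a platykurtic distribution.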
Measures of position identify the rank or position occupied by a value within an ordered array of collected data.
1. Percentiles are values that divide a set of observations into 100 equal parts. These values, denoted by P1, P2, P3, ..., P99, mean that 1% of the data fall below P1, 2% fall below P2, ..., and 99% fall below P99. The position occupied by each score in an array of data is based on hundredths when the scores are arranged from highest to lowest or vice versa.
To locate the desired percentile, the formula (P/100)n gives the number of observations below the percentile; counting from 1 to (P/100)n in the data arranged in ascending order then gives the percentile.
2. Deciles are values that divide a set of observations into 10 equal parts. These values, denoted by D1, D2, D3, ..., D9, indicate that 10% of the data fall below D1, 20% fall below D2, ..., and 90% fall below D9. The position occupied by each score in an array of data is based on tenths when the scores are arranged from highest to lowest or vice versa.
To locate the desired decile, the formula (D/10)n gives the number of observations below the decile; counting from 1 to (D/10)n in the data arranged in ascending order then gives the decile.
3. Quartiles are values that divide a set of observations into 4 equal parts.
The 1st quartile, Q1, also called the lower quartile, is equivalent to P25. To determine the 1st quartile, the formula n/4 gives the number of observations below the quartile; then, counting from 1 to n/4 in the data arranged in ascending order gives the quartile.
The 2nd quartile, Q2, is the middlemost score or the median and is equivalent to the 50th percentile. To determine the 2nd quartile, the formula 2n/4 = n/2 gives the number of observations below the quartile; then, counting from 1 to n/2 in the data arranged in ascending order gives the quartile.
Example: The scores of ten students in a 20-point math quiz are as follows: 6, 12, 18, 8, 9, 10, 9, 15, 17, 15. Find the values of Q1, Q2, D1, D5, P10, P25 and P50, and interpret them.
Scores Position
6 1
8 2
9 3
9 4
10 5
12 6
15 7
15 8
17 9
18 10
N=10
Q1 = n/4 = 10/4 = 2.5 ≈ 3. This implies that Q1 is located at the 3rd position, which is 9. Thus, Q1 = 9. This means that 25% of the students got scores equal to or below 9, or 75% of the students got scores equal to or above 9.
Q2 = 2n/4 = 2(10)/4 = 5. This implies that Q2 is located at the 5th position, which is 10. Thus, Q2 = 10. This means that 50% of the students got scores equal to or below 10, and 50% got scores equal to or above 10.
D1 = Dn/10 = 1(10)/10 = 1. This implies that D1 is located at the 1st position, which is 6. Thus, D1 = 6. This means that 10% of the students got scores equal to or below 6, or 90% of the students got scores equal to or above 6.
D5 = Dn/10 = 5(10)/10 = 5. This implies that D5 is located at the 5th position, which is 10. Thus, D5 = 10. This means that 50% of the students got scores equal to or below 10, and 50% got scores equal to or above 10.
P10 = Pn/100 = 10(10)/100 = 1. This implies that P10 is located at the 1st position, which is 6. Thus, P10 = 6. This means that 10% of the students got scores equal to or below 6, or 90% of the students got scores equal to or above 6.
P25 = Pn/100 = 25(10)/100 = 2.5 ≈ 3. This implies that P25 is located at the 3rd position, which is 9. Thus, P25 = 9. This means that 25% of the students got scores equal to or below 9, or 75% of the students got scores equal to or above 9.
P50 = Pn/100 = 50(10)/100 = 5. This implies that P50 is located at the 5th position, which is 10. Thus, P50 = 10. This means that 50% of the students got scores equal to or below 10, and 50% got scores equal to or above 10.
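The counting rule used in this example can be written as one short function. This is a sketch under one assumption: a fractional position is rounded up to the next whole position, which reproduces the "2.5 ≈ 3" steps above. Other conventions (for instance the interpolating methods behind Python's `statistics.quantiles`) can give different values for small data sets. The name `position_value` is hypothetical.

```python
def position_value(data, k, parts):
    """k-th quartile/decile/percentile (parts = 4, 10 or 100) by the counting rule.

    Assumption: the position k*n/parts is rounded up to a whole number.
    """
    ordered = sorted(data)               # arrange the data in ascending order
    n = len(ordered)
    pos = -(-k * n // parts)             # ceiling of k*n/parts with integer arithmetic
    pos = max(pos, 1)                    # the position is at least the 1st value
    return ordered[pos - 1]              # positions in the text are 1-based


scores = [6, 12, 18, 8, 9, 10, 9, 15, 17, 15]
results = {
    "Q1": position_value(scores, 1, 4),
    "Q2": position_value(scores, 2, 4),
    "D1": position_value(scores, 1, 10),
    "D5": position_value(scores, 5, 10),
    "P10": position_value(scores, 10, 100),
    "P25": position_value(scores, 25, 100),
    "P50": position_value(scores, 50, 100),
}
print(results)
```

The printed values match the worked example: Q1 = P25 = 9, Q2 = D5 = P50 = 10 and D1 = P10 = 6.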
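For grouped data: The formulas for quartiles, deciles and percentiles are derived from the formula for the median. In each formula, the letter in the numerator (Q, D or P) stands for the number of the desired quartile, decile or percentile.
$$Q = L_{QC} + C\left[\frac{Q(N/4) - F_b}{F_{QC}}\right]$$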
where
LQC - Lower CB of the quartile class
C - Class size
Fb - <CF before the quartile class
N - Total number of observations
FQC - frequency of the quartile class
Rommel H. Sarreal, RME, MEng-EE(Ongoing)_InstructorI_CvSU-GTC
$$D = L_{DC} + C\left[\frac{D(N/10) - F_b}{F_{DC}}\right]$$
where
LDC - Lower CB of the decile class
C - Class size
Fb - <CF before the decile class
N - Total number of observations
FDC - frequency of the decile class
$$P = L_{PC} + C\left[\frac{P(N/100) - F_b}{F_{PC}}\right]$$
where
LPC - Lower CB of the percentile class
C - Class size
Fb - <CF before the percentile class
N - Total number of observations
FPC - frequency of the percentile class
Example: The table below represents the scores of 64 students in a long quiz.
Class Interval Frequency Class Boundary <CF
5-9 7 4.5-9.5 7
10-14 10 9.5-14.5 17
15-19 13 14.5-19.5 30
20-24 18 19.5-24.5 48
25-29 8 24.5-29.5 56
30-34 5 29.5-34.5 61
35-39 4 34.5-39.5 64
Total N=64
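Since 5(64/10) = 32 and the <CF first reaches 32 in the class 20-24 (class boundaries 19.5-24.5, <CF before it = 30, frequency = 18), that class is the decile class:
$$D = L_{DC} + C\left[\frac{D(N/10) - F_b}{F_{DC}}\right]$$
$$D_5 = 19.5 + 5\left[\frac{5(64/10) - 30}{18}\right] = 19.5 + 5\left(\frac{2}{18}\right) = 20.06 \approx 20$$
This means that 50% of the students got scores equal to or below 20, and 50% got scores equal to or above 20.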
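Since 75(64/100) = 48, the percentile class is again 20-24 (<CF before it = 30, frequency = 18):
$$P = L_{PC} + C\left[\frac{P(N/100) - F_b}{F_{PC}}\right]$$
$$P_{75} = 19.5 + 5\left[\frac{75(64/100) - 30}{18}\right] = 19.5 + 5\left(\frac{18}{18}\right) = 24.5 \approx 25$$
This means that 75% of the students got scores equal to or below 25, or 25% of the students got scores equal to or above 25.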