
3-Measures of Dispersion

The document discusses measures of dispersion, which indicate how data observations are spread around the mean. Key measures include range, interquartile range, variance, and standard deviation, each with specific calculations and implications for data analysis. The importance of these measures lies in their ability to characterize frequency distributions and facilitate comparisons between different datasets.

Uploaded by

eurokhan0

Statistics & Queuing Theory
Course No: MAT0541202

Topic 3: Measures of Dispersion

Tariq Bin Amir


Measures of Dispersion
 The dispersion of a distribution reveals how the
observations are spread out or scattered on each side
of the center.
 The measure of dispersion shows how the data is
spread or scattered around the mean.
 To measure the dispersion, scatter, or variation of a
distribution is as important as to locate the central
tendency.
 If the dispersion is small, it indicates high uniformity of the
observations in the distribution.
 Absence of dispersion in the data indicates perfect
uniformity. This situation arises when all observations in
the distribution are identical.
Measures of Dispersion
Purpose of Measuring Dispersion
 A measure of dispersion appears to serve two purposes.
 First, it is one of the most important quantities used to
characterize a frequency distribution.
 Second, it affords a basis of comparison between two
or more frequency distributions.
 The study of dispersion bears its importance from the
fact that various distributions may have exactly the same
averages, but substantial differences in their variability.
Range
 Range is simply the difference between the largest and
smallest values in a set of data
 Useful for: daily temperature fluctuations or share price
movement
 The formula is:
Range = largest observation - smallest observation
 Example 1: Find out the range of the given distribution:
1, 3, 5, 9, 11
The range is 11 – 1 = 10.
 Example 2: For data plotted on a number line from 0 to 14, with
smallest value 1 and largest value 13:
Range = 13 - 1 = 12
Range
Why the Range can be Misleading
 Ignores the way in which data are distributed

[Figure: two dot plots over the values 7 to 12 with differently distributed observations; in both, Range = 12 - 7 = 5]

 Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
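
The sensitivity of the range to a single outlier can be checked with a minimal Python sketch. The two lists are the data sets shown above; the helper name `data_range` is only an illustrative choice.

```python
# Data from the slide: the two lists differ only in their last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```

Changing one observation out of 25 moves the range from 4 to 119, even though the other 24 values are untouched.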
Interquartile Range (IQR)
 Quartiles split the ranked data into 4 segments with an
equal number of values per segment

[Figure: ranked data divided by Q1, Q2, and Q3 into four segments, each holding 25% of the values]

 The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
 Q2 is the same as the median (50% of the observations
are smaller and 50% are larger)
 Only 25% of the observations are greater than the third
quartile, Q3
Interquartile Range (IQR)
 Interquartile range measures the range of the middle
50% of the values only
 Is defined as the difference between the upper and
lower quartiles
Interquartile range = upper quartile - lower quartile
= Q3 - Q1
 The IQR is a measure of variability that is not influenced
by outliers or extreme values
 Measures like Q1, Q3, and IQR that are not influenced
by outliers are called resistant measures
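
A short Python sketch can illustrate what "resistant" means here, reusing the two outlier data sets from the Range slides. It assumes the standard library's `statistics.quantiles` with its default "exclusive" method; other quartile conventions exist and may give slightly different Q1 and Q3 values.

```python
import statistics

# Same two data sets as in the range example: only the last value differs.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

def iqr(values):
    """Interquartile range = Q3 - Q1 (quartiles via statistics.quantiles)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

for name, data in [("without outlier", without_outlier),
                   ("with outlier", with_outlier)]:
    print(name, "range =", max(data) - min(data), "IQR =", iqr(data))
```

Under this quartile convention the IQR is the same for both data sets, while the range changes from 4 to 119.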
Interquartile Range (IQR)
 Example:

X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(each of the four segments holds 25% of the observations)

Interquartile range = Q3 - Q1
= 57 – 30 = 27
Quartile Deviation
Mean Deviation
Mean Deviation

$$MD_{\bar{x}} = \frac{\sum_{i=1}^{k} f_i \,\lvert x_i - \bar{x} \rvert}{n}$$
where
k = number of classes
x_i = midpoint of the i-th class
f_i = frequency of the i-th class
n = total number of observations
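
As a rough illustration of the grouped-data formula, the sketch below computes the mean deviation about the mean in Python. The class midpoints and frequencies are hypothetical values chosen only for this example, and the function name `mean_deviation` is likewise an assumption.

```python
def mean_deviation(midpoints, freqs):
    """Mean deviation about the mean for grouped data: sum(f * |x - mean|) / n."""
    n = sum(freqs)  # total number of observations
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n

# Hypothetical frequency table (not from the slides).
midpoints = [5, 15, 25, 35]
freqs = [2, 3, 4, 1]

print(mean_deviation(midpoints, freqs))  # 8.0
```

Here the mean is 190 / 10 = 19, the weighted absolute deviations sum to 80, and dividing by n = 10 gives 8.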
Mean Deviation
Variance
 Variance is a measure of how data points differ from the
mean
 Example:
Data Set 1: 3, 5, 7, 10, 10
Data Set 2: 7, 7, 7, 7, 7
What are the mean and median of the above data sets?
Data Set 1: mean = 7, median = 7
Data Set 2: mean = 7, median = 7
But we know that the two data sets are not identical! The
variance shows how they are different.
We want a way to represent this difference between the two
data sets numerically.
Variance
 Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
where $\bar{X}$ = arithmetic mean, n = sample size, $X_i$ = ith value of the variable X

 Sample variance with frequency table:
$$s^2 = \frac{\sum (x - \bar{x})^2 f}{n-1}$$
where $\bar{x}$ = arithmetic mean, n = sample size, x = value of the variable X, f = frequency of that value
Variance
 Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where μ = population mean, N = population size, $X_i$ = ith value of the variable X
Variance
Calculate the Variance for Ungrouped Data
1. Find the Mean.
2. Calculate the difference between each score and the
mean.
3. Square the difference between each score and the
mean.
4. Add up all the squares of the difference between each
score and the mean.
5. Divide the obtained sum by n – 1.
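
The five steps can be sketched in Python as follows, applied to Data Set 1 and Data Set 2 from the earlier Variance slide. The function name `sample_variance` is illustrative; the result is cross-checked against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import statistics

def sample_variance(values):
    """Sample variance following the five steps for ungrouped data."""
    n = len(values)
    mean = sum(values) / n                       # step 1: find the mean
    deviations = [x - mean for x in values]      # step 2: differences from the mean
    squared = [d ** 2 for d in deviations]       # step 3: square each difference
    total = sum(squared)                         # step 4: add up the squares
    return total / (n - 1)                       # step 5: divide by n - 1

data_set_1 = [3, 5, 7, 10, 10]
data_set_2 = [7, 7, 7, 7, 7]

print(sample_variance(data_set_1))  # 9.5
print(sample_variance(data_set_2))  # 0.0
print(statistics.variance(data_set_1))  # same value from the standard library
```

Both data sets have mean 7 and median 7, but the variances (9.5 versus 0) capture the difference in spread.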
Variance
Calculate the Variance for Grouped Data
1. Calculate the mean.
2. Get the deviations by finding the difference of each
midpoint from the mean.
3. Square the deviations, multiply each by its class frequency, and sum the results.
4. Substitute in the formula.
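
A minimal Python sketch of these grouped-data steps is given below. It assumes each squared deviation is weighted by its class frequency, as in the frequency-table formula, and uses a hypothetical table of midpoints and frequencies that does not come from the slides.

```python
def grouped_variance(midpoints, freqs):
    """Sample variance for grouped data: sum(f * (x - mean)^2) / (n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n       # step 1
    deviations = [x - mean for x in midpoints]                    # step 2
    weighted_squares = sum(f * d ** 2                             # step 3
                           for d, f in zip(deviations, freqs))
    return weighted_squares / (n - 1)                             # step 4

# Hypothetical frequency table (not from the slides).
midpoints = [5, 15, 25, 35]
freqs = [2, 3, 4, 1]

print(round(grouped_variance(midpoints, freqs), 2))  # 93.33
```

For this table the mean is 19, the weighted squared deviations sum to 840, and 840 / 9 is about 93.33.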
Standard Deviation
 Measures the variation of observations from the mean
 The most common measure of dispersion
 Takes into account every observation
 Measures the ‘average deviation’ of observations from the
mean
 Works with squares of residuals rather than absolute values,
which makes it easier to use in further calculations
 Is the square root of the variance
 Has the same units as the original data
Standard Deviation
Standard deviation of a sample, s
 In practice, most populations are very large, so it is
more common to calculate the sample standard
deviation.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$
 Where: n is the number of observations in the
sample

Standard deviation of a population, σ
 Every observation in the population is used.
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
 Where: μ is the population mean and N is the
population size
Standard Deviation
Characteristics of the Standard Deviation
 The standard deviation is affected by the value of
every observation.
 The process of squaring the deviations before adding
avoids the algebraic fallacy of disregarding the signs.
 It has a definite mathematical meaning and is
perfectly adapted to algebraic treatment.
 It is, in general, less affected by fluctuations of
sampling than the other measures of dispersion.
 The standard deviation is the unit customarily used in
defining areas under the normal curve of error. It has,
thus, great practical utility in sampling and statistical
inference.
Standard Deviation
Steps for Calculating Standard Deviation
1. Calculate the difference between each value and the
mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get
the sample standard deviation.
Standard Deviation
Standard deviations for frequency distributions
 If the data are in a frequency distribution:
No. Units (x)    Frequency (f)
1                85
2                192
3                123
Total            400

 Calculate the standard deviation using:
$$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}}$$
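
As an illustration, the frequency-table formula can be applied to the table on this slide with a few lines of Python. The function name `frequency_stdev` is an assumption made for this sketch, and the divisor (total frequency minus 1) follows the sample formula used in this deck.

```python
import math

# Frequency table from the slide: number of units x and its frequency f.
units = [1, 2, 3]
freqs = [85, 192, 123]

def frequency_stdev(values, frequencies):
    """Sample standard deviation from a frequency table (divisor: sum(f) - 1)."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    weighted_squares = sum(f * (x - mean) ** 2
                           for x, f in zip(values, frequencies))
    return math.sqrt(weighted_squares / (n - 1))

s = frequency_stdev(units, freqs)
print(f"n = {sum(freqs)}, s = {s:.3f}")
```

The same value would be obtained by expanding the table into 400 individual observations and applying the ungrouped formula; the frequency form simply avoids the repetition.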
Standard Deviation
 Example: Find the Standard Deviation of Ungrouped Data

 Here,
$$\bar{x} = \frac{\sum x_i}{n} = \frac{50}{10} = 5$$
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{20}{9} \approx 2.22$$
$$s = \sqrt{2.22} \approx 1.49$$
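Standard Deviation
 Example: Find the Standard Deviation of Grouped Data

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{60}{10} = 6$$
$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} = \frac{40}{9} \approx 4.44$$
$$s = \sqrt{4.44} \approx 2.11$$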
Moments
 A moment is a quantitative measure of the shape of a
set of points.
 The first moment is called the mean which describes
the center of the distribution.
 The second moment is the variance which describes
the spread of the observations around the center.
 Other moments describe other aspects of a
distribution such as how the distribution is skewed
from its mean or peaked.
 A moment designates the power to which deviations
are raised before averaging them.
Skewness
 A distribution in which the values equidistant from the
mean have equal frequencies is called a symmetric
distribution.
 Any departure from symmetry is called skewness.
 In a perfectly symmetric distribution,
Mean = Median = Mode and the two tails of the
distribution are equal in length from the mean.
 If the right tail is longer than the left tail, the distribution
is said to have positive skewness. In this case,
Mean > Median > Mode
 If the left tail is longer than the right tail, the distribution
is said to have negative skewness. In this case,
Mean < Median < Mode
Skewness
 When the distribution is symmetric, the value of
skewness should be zero.
 Karl Pearson defined the coefficient of skewness as:
$$Sk = \frac{\text{Mean} - \text{Mode}}{SD}$$
 Since the mode does not exist in some cases, we use the
empirical relation
$$\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$$
 Substituting this for the mode, we can write
$$Sk = \frac{3\,(\text{Mean} - \text{Median})}{SD}$$

(Sk ranges between -3 and +3)
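
The median-based coefficient can be sketched in Python as below. The slide does not say whether SD is the sample or population standard deviation; this sketch assumes the sample version (`statistics.stdev`), and both the function name and the small data sets are hypothetical.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient using the median: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return 3 * (mean - median) / sd

symmetric = [1, 2, 3, 4, 5]        # hypothetical, mean = median
right_skewed = [1, 2, 2, 3, 10]    # hypothetical, long right tail
left_skewed = [-x for x in right_skewed]

print(pearson_skewness(symmetric))     # zero for symmetric data
print(pearson_skewness(right_skewed))  # positive
print(pearson_skewness(left_skewed))   # negative
```

The sign of the coefficient follows the direction of the longer tail, matching the Mean > Median and Mean < Median cases described above.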


Kurtosis
 Kurtosis is a measure of the "peakedness" of the
probability distribution of a real-valued random
variable. It is the standardized fourth central moment of
a distribution.
 When the peak of a curve is relatively high, the curve is
called leptokurtic.
 When the curve is flat-topped, it is called platykurtic.
 Since the normal curve is neither very peaked nor very
flat-topped, it is taken as the basis for comparison.
 The normal curve is called mesokurtic.
Kurtosis
$$\text{Kurt} = \beta_2 = \frac{\mu_4}{\mu_2^2}, \text{ for population data}$$
$$\text{Kurt} = b_2 = \frac{m_4}{m_2^2}, \text{ for sample data}$$
 For a normal distribution, kurtosis is equal to 3.
 When it is greater than 3, the curve is more sharply
peaked and has heavier tails than the normal curve
and is said to be leptokurtic.
 When it is less than 3, the curve has a flatter top and
relatively lighter tails than the normal curve and is said
to be platykurtic.
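
A minimal Python sketch of the sample formula b2 = m4 / m2^2 follows. It assumes the central moments are computed with divisor n, that is m_r = (1/n) Σ (x - mean)^r, which the slides do not state explicitly; the function names and the two small data sets are hypothetical.

```python
def central_moment(values, r):
    """r-th central moment with divisor n (an assumption for this sketch)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

def kurtosis(values):
    """b2 = m4 / m2**2; equals 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def classify(b2):
    """Label a kurtosis value relative to the normal-curve benchmark of 3."""
    if b2 > 3:
        return "leptokurtic"
    if b2 < 3:
        return "platykurtic"
    return "mesokurtic"

flat = [-1, 1]                   # hypothetical: all mass at two points
peaked = [0] * 8 + [-5, 5]       # hypothetical: tall centre, two far values

print(kurtosis(flat), classify(kurtosis(flat)))      # 1.0 platykurtic
print(kurtosis(peaked), classify(kurtosis(peaked)))  # 5.0 leptokurtic
```

Because the second moment m2 is the variance computed with divisor n, this also shows how the moments described earlier feed directly into the kurtosis measure.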
Conclusion
 The more the data are spread out, the greater the
range, variance, and standard deviation.

 The less the data are spread out, the smaller the
range, variance, and standard deviation.

 If the values are all the same (no variation), all these
measures will be zero.

 None of these measures are ever negative.
