Measures of Dispersion

Uploaded by

cleathey2
Measures of Dispersion

1
Definition
Measures of dispersion are descriptive
statistics that describe how similar the
scores in a set are to each other
The more similar the scores are to each other, the
lower the measure of dispersion will be
The less similar the scores are to each other, the
higher the measure of dispersion will be
In general, the more spread out a distribution is,
the larger the measure of dispersion will be
2
Measures of Dispersion
Which of the distributions of scores has the larger dispersion?
[Figure: two bar charts, one above the other; x-axis 1 to 10, y-axis 0 to 125]
The upper distribution has more dispersion because the scores are
more spread out
That is, they are less similar to each other
3
Measures of Dispersion
There are three main measures of
dispersion:
The range
The semi-interquartile range (SIR)
Variance / standard deviation

4
The Range
The range is defined as the difference between
the largest score in the set of data and the
smallest score in the set of data, XL - XS
What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
The largest score (XL) is 9; the smallest score
(XS) is 1; the range is XL - XS = 9 - 1 = 8
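The range calculation above can be sketched in a few lines of Python. This is a minimal illustration; the variable names `x_l`, `x_s`, and `data_range` are assumptions chosen to mirror XL and XS, not part of the original slides.

```python
# Scores from the range example above
scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]

x_l = max(scores)  # largest score, XL
x_s = min(scores)  # smallest score, XS
data_range = x_l - x_s

print(data_range)  # 8
```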

5
When To Use the Range
The range is used when
you have ordinal data or
you are presenting your results to people with little or
no knowledge of statistics
The range is rarely used in scientific work as it is
fairly insensitive
It depends on only two scores in the set of data, XL and
XS
Two very different sets of data can have the same
range:
1 1 1 1 9 vs 1 3 5 7 9
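A short Python sketch may help show why the range is insensitive. It compares the two data sets above; the helper `value_range` is a hypothetical name, and the population standard deviation (`statistics.pstdev`) is used here only as an assumed second measure for contrast.

```python
import statistics

def value_range(scores):
    """Range = largest score minus smallest score."""
    return max(scores) - min(scores)

a = [1, 1, 1, 1, 9]
b = [1, 3, 5, 7, 9]

# Both data sets have the same range...
range_a = value_range(a)
range_b = value_range(b)

# ...but their standard deviations differ, because the
# scores between the extremes are distributed differently.
sd_a = statistics.pstdev(a)
sd_b = statistics.pstdev(b)

print(range_a, range_b)
print(sd_a, sd_b)
```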
6
The Semi-Interquartile Range
The semi-interquartile range (or SIR) is
defined as the difference between the third
and first quartiles, divided by two
The first quartile is the 25th percentile
The third quartile is the 75th percentile
SIR = (Q3 - Q1) / 2

7
8
SIR Example
What is the SIR for the data below?
Data: 2, 4, 6, 8, 10, 12, 14, 20, 30, 60
5 (between 4 and 6) = 25th %tile
25 (between 20 and 30) = 75th %tile
25 % of the scores are below 5
5 is the first quartile
25 % of the scores are above 25
25 is the third quartile
SIR = (Q3 - Q1) / 2 = (25 - 5) / 2 = 10
9
Example 1 – Range and interquartile range of a data set

Find the quartiles of this data set: 6, 47, 49, 15, 43, 41, 7, 39,
43, 41, 36.

10
6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36.
Rank   Value
1      6
2      7
3      15
4      36
5      39
6      41
7      41
8      43
9      43
10     47
11     49
11
Then you need to find the rank of the median to split the data set
in two. As we have seen in the section on the median, if the number
of data points is odd, the rank of the median will be
(n + 1) ÷ 2 = (11 + 1) ÷ 2 = 6
The rank of the median is 6, which means there are five points on
each side.
12
Then you need to split the lower half of the data in two again to
find the lower quartile. The lower quartile will be the point of rank
(5 + 1) ÷ 2 = 3. The result is Q1 = 15. The second half must also be
split in two to find the value of the upper quartile. The rank of
the upper quartile will be 6 + 3 = 9. So Q3 = 43.
Once you have the quartiles, you can easily measure the spread.
The interquartile range will be Q3 - Q1, which gives 28 (43 - 15).
The semi-interquartile range is 14 (28 ÷ 2) and the range is 43 (49 - 6).
13
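The steps of Example 1 can be sketched in Python as follows. This is a minimal sketch of the "median of each half" method described above, in which the median itself is left out of both halves when the number of points is odd. The function names `median_of_sorted` and `quartiles` are hypothetical, and other quartile conventions (for example, interpolation-based ones) can give different values for the same data.

```python
def median_of_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the median itself (rank half + 1)
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(lower), median_of_sorted(upper)

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]

q1, q3 = quartiles(data)
iqr = q3 - q1                       # interquartile range
sir = iqr / 2                       # semi-interquartile range
data_range = max(data) - min(data)  # range

print(q1, q3, iqr, sir, data_range)  # 15 43 28 14.0 43
```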
When To Use the SIR
The SIR is often used with skewed data as it
is insensitive to the extreme scores

23
Variance
Variance is defined as the average of the squared
deviations:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
24
25
Formula for Standard Deviation

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

√ = square root
Σ = sum (sigma)
X = score for each point in data
X̄ = mean of scores for the variable
n = sample size (number of observations or cases)
Variance

Variance is the average squared
deviation from the mean of a set of
data. It is used to find the
standard deviation.
Variance Formula
The variance formula includes the
Sigma notation, Σ, which represents
the sum of all the items to the right of
Sigma.

Population: $$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$
Sample: $$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

The mean is represented by μ (population) or X̄ (sample), and n
is the number of items.
Variance

1. Find the mean of the data.
Hint: the mean is the average, so add up
the values and divide by the number of
items.
2. Subtract the mean from each value. The
result is called the deviation from the
mean.
3. Square each deviation from the mean.
4. Find the sum of the squares.
5. Divide the total by the number of items.
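The five steps above can be sketched in Python. The scores used here are borrowed from the computational formula example later in these slides; the variable names are illustrative assumptions. Because step 5 divides by the number of items, this gives the population variance.

```python
values = [9, 8, 6, 5, 8, 6]

# Step 1: find the mean
mean = sum(values) / len(values)

# Step 2: subtract the mean from each value (the deviations)
deviations = [x - mean for x in values]

# Step 3: square each deviation
squared_deviations = [d ** 2 for d in deviations]

# Step 4: find the sum of the squares
sum_of_squares = sum(squared_deviations)

# Step 5: divide the total by the number of items
variance = sum_of_squares / len(values)

print(mean, sum_of_squares, variance)  # 7.0 12.0 2.0
```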
Standard Deviation
Standard Deviation shows the
variation in data. If the data is close
together, the standard deviation will
be small. If the data is spread out,
the standard deviation will be large.

Standard Deviation is often denoted
by the lowercase Greek letter sigma, σ.
Problem #1: There are 39 plants in the
garden. A few plants were selected
randomly and their heights in cm were
recorded as follows: 51, 38, 79, 46, 57.
Calculate the standard deviation of
their heights.
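The slides do not include a worked solution for Problem #1, so the following Python sketch is one way to check an answer. It assumes the sample formula (dividing by n - 1) applies, since the five plants are a random sample of the 39 in the garden; `statistics.variance` and `statistics.stdev` use that divisor.

```python
import statistics

heights_cm = [51, 38, 79, 46, 57]

# Sample variance and sample standard deviation (divide by n - 1)
sample_variance = statistics.variance(heights_cm)
sample_sd = statistics.stdev(heights_cm)

print(round(sample_variance, 1), round(sample_sd, 2))  # 240.7 15.51
```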
Problem 2: In a class of 50, 4
students were selected at random
and their total marks in the final
assessments are recorded, which
are: 812, 836, 982, and 769. Find
the variance and standard
deviation of their marks.
Solution:
n = 4
Sample mean (X̄) = (812 + 836 + 982 + 769) / 4 = 849.75

Sample variance (s²)
= [(812 - 849.75)² + (836 - 849.75)² + (982 - 849.75)² + (769 - 849.75)²] / 3
= 8541.58

Using the SD formula,

SD = √8541.58 = 92.4

Answer: Variance is 8541.58 and standard deviation for this
data is 92.4
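The arithmetic in this solution can be checked with a short Python sketch. It relies on the standard library's `statistics.variance` and `statistics.stdev`, which divide by n - 1 as the solution does; the variable names are illustrative.

```python
import statistics

marks = [812, 836, 982, 769]

mean_marks = statistics.mean(marks)    # sample mean
var_marks = statistics.variance(marks) # sample variance (n - 1)
sd_marks = statistics.stdev(marks)     # sample standard deviation

print(mean_marks, round(var_marks, 2), round(sd_marks, 1))  # 849.75 8541.58 92.4
```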
What Does the Variance Formula
Mean?
First, it says to subtract the mean from each
of the scores
This difference is called a deviate or a deviation
score
The deviate tells us how far a given score is
from the typical, or average, score
Thus, the deviate is a measure of dispersion for
a given score

75
What Does the Variance Formula
Mean?
Why can’t we simply take the average of the
deviates? That is, why isn’t variance defined
as:

$$\sigma^2 = \frac{\sum (X - \mu)}{N}$$

This is not the formula for variance!
76
What Does the Variance Formula
Mean?
One of the definitions of the mean was that it
always made the sum of the scores minus the
mean equal to 0
Thus, the average of the deviates must be 0
since the sum of the deviates must equal 0
To avoid this problem, statisticians square the
deviate scores before averaging them
Squaring makes every deviate score non-negative, so
positive and negative deviates no longer cancel out
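A quick Python sketch illustrates the point. It uses the scores 1 3 5 7 9 from the range discussion earlier in the slides; the variable names are assumptions made for illustration.

```python
scores = [1, 3, 5, 7, 9]
mean = sum(scores) / len(scores)

# Deviates: each score minus the mean
deviates = [x - mean for x in scores]

# The plain average of the deviates is always 0,
# because positive and negative deviates cancel.
mean_deviate = sum(deviates) / len(scores)

# Squaring first removes the signs, so the average is informative.
mean_squared_deviate = sum(d ** 2 for d in deviates) / len(scores)

print(deviates)              # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(mean_deviate)          # 0.0
print(mean_squared_deviate)  # 8.0
```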
77
What Does the Variance Formula
Mean?
Variance is the mean of the squared
deviation scores
The larger the variance is, the more the
scores deviate, on average, away from the
mean
The smaller the variance is, the less the
scores deviate, on average, from the mean

78
Standard Deviation
When the deviate scores are squared in variance,
their unit of measure is squared as well
E.g. If people’s weights are measured in pounds,
then the variance of the weights would be expressed
in pounds² (or squared pounds)
Since squared units of measure are often
awkward to deal with, the square root of variance
is often used instead
The standard deviation is the square root of variance
79
Standard Deviation

Standard deviation = √variance

Variance = (standard deviation)²

80
Computational Formula
When calculating variance, it is often easier to use
a computational formula which is algebraically
equivalent to the definitional formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

σ² is the population variance, X is a score, μ is the
population mean, and N is the number of scores
81
Computational Formula Example
X        X²        X - μ     (X - μ)²
9        81        2         4
8        64        1         1
6        36        -1        1
5        25        -2        4
8        64        1         1
6        36        -1        1
Σ = 42   Σ = 306   Σ = 0     Σ = 12
82
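Computational Formula Example

Computational formula:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{306 - \frac{42^2}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2$$

Definitional formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2$$

83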
Variance of a Sample
Because the sample mean is not a perfect estimate
of the population mean, the formula for the
variance of a sample is slightly different from the
formula for the variance of a population:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$

s² is the sample variance, X is a score, X̄ is the
sample mean, and N is the number of scores
84
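The difference between the two divisors can be seen with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by N - 1. The scores are reused from the computational formula example; treating them as a sample here is an assumption made only for illustration.

```python
import statistics

scores = [9, 8, 6, 5, 8, 6]

# Population variance: sum of squared deviations (12) divided by N = 6
population_variance = statistics.pvariance(scores)

# Sample variance: the same sum divided by N - 1 = 5
sample_variance = statistics.variance(scores)

print(population_variance, sample_variance)
```

The sample variance is always the larger of the two, and the gap shrinks as N grows.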
Measure of Skew
Skew is a measure of symmetry in the
distribution of scores
[Figure: three distribution curves labeled Normal (skew = 0),
Positive Skew, and Negative Skew]

85
Measure of Skew
The following formula can be used to
determine skew:

$$s^3 = \frac{\frac{\sum (X - \bar{X})^3}{N}}{\left( \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \right)^3}$$
86
Measure of Skew
If s³ < 0, then the distribution has a negative
skew
If s³ > 0, then the distribution has a positive
skew
If s³ = 0, then the distribution is symmetrical
The more different s³ is from 0, the greater
the skew in the distribution
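The sign rules above can be illustrated with a small Python sketch. It assumes skew is computed as the average cubed deviation divided by the cube of the population standard deviation (a common standardized form); the function name `skew` and the test data sets are illustrative choices, with the right-tailed set taken from the SIR example earlier in the slides.

```python
def skew(scores):
    """Average cubed deviation divided by the cubed population SD."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in scores) / n  # average cubed deviation
    return m3 / m2 ** 1.5

symmetric = [1, 3, 5, 7, 9]
right_tailed = [2, 4, 6, 8, 10, 12, 14, 20, 30, 60]
left_tailed = [-x for x in right_tailed]  # mirror image of the above

print(skew(symmetric))     # 0.0 (symmetrical)
print(skew(right_tailed))  # positive
print(skew(left_tailed))   # negative
```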
87
Kurtosis
(Not Related to Halitosis)
Kurtosis measures whether the scores are spread
out more or less than they would be in a normal
(Gaussian) distribution
[Figure: three distribution curves labeled Mesokurtic (s⁴ = 3),
Leptokurtic (s⁴ > 3), and Platykurtic (s⁴ < 3)]

88
Kurtosis
When a distribution is normal, its
kurtosis equals 3 and it is said to be mesokurtic
When the distribution is less spread out than
normal, its kurtosis is greater than 3 and it is
said to be leptokurtic
When the distribution is more spread out than
normal, its kurtosis is less than 3 and it is said to
be platykurtic

89
Measure of Kurtosis
The measure of kurtosis is given by:

$$s^4 = \frac{\sum \left( \frac{X - \bar{X}}{\sqrt{\frac{\sum (X - \bar{X})^2}{N}}} \right)^4}{N}$$
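As a rough illustration, the kurtosis measure can be sketched in Python as the average of the fourth powers of the standardized scores, using the population standard deviation. The function name `kurtosis` is hypothetical, and the two small data sets are reused from the range discussion earlier in the slides.

```python
import math

def kurtosis(scores):
    """Average fourth power of the standardized deviations."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return sum(((x - mean) / sd) ** 4 for x in scores) / n

evenly_spread = [1, 3, 5, 7, 9]  # flat, evenly spaced scores
one_extreme = [1, 1, 1, 1, 9]    # tight cluster plus one extreme score

k_flat = kurtosis(evenly_spread)
k_peaked = kurtosis(one_extreme)

print(k_flat)    # about 1.7, less than 3 (platykurtic)
print(k_peaked)  # about 3.25, greater than 3 (leptokurtic)
```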

90
s², s³, & s⁴
Collectively, the variance (s²), skew (s³),
and kurtosis (s⁴) describe the shape of the
distribution

91
