Descriptive Statistics
▪ The central tendency is the extent to which all the
data values group around a typical or central value.
▪ The variation is the amount of dispersion or
scattering of values
▪ The shape is the pattern of the distribution of values
from the lowest value to the highest value.
Chap 3-1
Measures of Central Tendency:
The Mean
◼ The arithmetic mean (often just called the “mean”)
is the most common measure of central tendency
◼ For a sample of size n:
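$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where $\bar{X}$ (pronounced "x-bar") is the sample mean, $n$ is the sample size, and $X_i$ is the ith observed value.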
Measures of Central Tendency:
The Mean
(continued)
◼ The most common measure of central tendency
◼ Mean = sum of values divided by the number of values
◼ Affected by extreme values (outliers)
Data 11, 12, 13, 14, 15: Mean = (11 + 12 + 13 + 14 + 15) / 5 = 65 / 5 = 13
Data 11, 12, 13, 14, 20: Mean = (11 + 12 + 13 + 14 + 20) / 5 = 70 / 5 = 14
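The effect of a single extreme value on the mean can be checked directly. The following is a minimal Python sketch using the two five-value data sets from the example above; the function name `arithmetic_mean` is illustrative and does not come from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


without_outlier = [11, 12, 13, 14, 15]
with_outlier = [11, 12, 13, 14, 20]  # same data, largest value replaced by 20

print(arithmetic_mean(without_outlier))  # 13.0
print(arithmetic_mean(with_outlier))     # 14.0
```

Changing one value from 15 to 20 moves the mean from 13 to 14, which is what "affected by extreme values" means in practice.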
Measures of Central Tendency:
The Median
◼ In an ordered array, the median is the “middle”
number (50% above, 50% below)
[Figure: two dot plots on an axis from 11 to 20; Median = 13 in both]
◼ Not affected by extreme values
Measures of Central Tendency:
Locating the Median
◼ The location of the median when the values are in numerical order
(smallest to largest):
$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$
◼ If the number of values is odd, the median is the middle number
◼ If the number of values is even, the median is the average of the
two middle numbers
Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data.
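The position rule can be sketched in Python as follows. This is one possible implementation, not the slides' own; the function name `median_by_position` and the four-value data set are assumptions made for illustration.

```python
def median_by_position(values):
    """Locate the median using the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)  # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2  # 1-based position, not the median value itself
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([11, 12, 13, 14, 20]))  # 13   (position 3 of 5)
print(median_by_position([11, 12, 13, 14]))      # 12.5 (position 2.5 of 4)
```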
Measures of Central Tendency:
The Mode
◼ Value that occurs most often
◼ Not affected by extreme values
◼ Used for either numerical or categorical (nominal)
data
◼ There may be no mode
◼ There may be several modes
[Figure: two dot plots. On the axis from 0 to 14, Mode = 9; on the axis from 0 to 6, No Mode]
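The three cases (one mode, several modes, no mode) can be illustrated with a short Python sketch. The function name `modes` and all three data sets are hypothetical, and the convention that "no mode" means no value occurs more than once is an assumption of this sketch.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value; return [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: no value occurs more than once, so no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 3, 5, 9, 9, 9, 12]))                    # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))                     # []
print(modes(["red", "blue", "red", "green", "blue"]))   # ['red', 'blue']
```

The last call shows that the mode also works for categorical (nominal) data, where a mean or median would not be defined.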
Measures of Central Tendency:
Review Example
House Prices:
$2,000,000
$  500,000
$  300,000
$  100,000
$  100,000
Sum $3,000,000
▪ Mean: ($3,000,000/5) = $600,000
▪ Median: middle value of ranked data = $300,000
▪ Mode: most frequent value = $100,000
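The review example can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
import statistics

# The five house prices from the review example, in dollars.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum of values / number of values
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(f"Mean:   ${mean_price:,.0f}")    # Mean:   $600,000
print(f"Median: ${median_price:,.0f}")  # Median: $300,000
print(f"Mode:   ${mode_price:,.0f}")    # Mode:   $100,000
```

The single $2,000,000 house pulls the mean to twice the median, which is why the three measures disagree so strongly here.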
Measures of Central Tendency:
Which Measure to Choose?
▪ The mean is generally used, unless extreme values
(outliers) exist.
▪ The median is often used when outliers are present,
since it is not sensitive to extreme values. For example,
median home prices are often reported for a region because
a few very expensive homes would inflate the mean.
▪ In some situations it makes sense to report both the
mean and the median.
Measures of Central Tendency:
Summary
Central Tendency
▪ Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
▪ Median: middle value in the ordered array
▪ Mode: most frequently observed value
Measures of Variation
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
◼ Measures of variation give information on the spread,
variability, or dispersion of the data values.
[Figure: two distributions with the same center but different variation]
Measures of Variation:
The Range
▪ Simplest measure of variation
▪ Difference between the largest and the smallest values:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example:
[Figure: dot plot on an axis from 0 to 14; smallest value 1, largest value 13]
Range = 13 - 1 = 12
Measures of Variation:
Why The Range Can Be Misleading
▪ Ignores the way in which data are distributed
[Figure: two dot plots on an axis from 7 to 12 with differently distributed values; Range = 12 - 7 = 5 in both]
▪ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
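The sensitivity of the range to a single outlier can be checked with a small Python sketch. The two lists are the data sets shown above; the function name `value_range` is illustrative.

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


base = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120]

print(value_range(base))          # 4
print(value_range(with_outlier))  # 119
```

Only the last value differs between the two lists, yet the range grows from 4 to 119, because the range depends on the two most extreme values alone.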
Measures of Variation:
The Sample Variance
Low variation: more points close to the mean
High variation: more points far from the mean
So, measure the distance to the mean
Measures of Variation:
The Sample Variance
◼ Average (approximately) of squared deviations
of values from the mean
◼ Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
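A worked computation may make the formula more concrete. The sketch below reuses the two five-value data sets from the mean example earlier in the deck; the function name `sample_variance` is illustrative, and the result is cross-checked against `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


low_variation = [11, 12, 13, 14, 15]   # squared deviations: 4, 1, 0, 1, 4
high_variation = [11, 12, 13, 14, 20]  # squared deviations: 9, 4, 1, 0, 36

print(sample_variance(low_variation))       # 2.5  (10 / 4)
print(sample_variance(high_variation))      # 12.5 (50 / 4)
print(statistics.variance(low_variation))   # 2.5
```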
Measures of Variation:
The Sample Standard Deviation
◼ Most commonly used measure of variation
◼ Shows variation about the mean
◼ Is the square root of the variance
◼ Has the same units as the original data
◼ Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
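As a sketch of the computation, the sample standard deviation is the square root of the sample variance. The function name `sample_std` is illustrative; the two data sets are the five-value sets from the mean example earlier in the deck.

```python
import math
import statistics


def sample_std(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two values")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


tight = [11, 12, 13, 14, 15]
spread = [11, 12, 13, 14, 20]

print(round(sample_std(tight), 4))   # 1.5811
print(round(sample_std(spread), 4))  # 3.5355
```

Both results are in the same units as the data, unlike the variances (2.5 and 12.5), which are in squared units.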
Measures of Variation:
Comparing Standard Deviations
[Figure: two distributions, one with a smaller standard deviation and one with a larger standard deviation]
Numerical Descriptive
Measures for a Population
▪ Descriptive statistics discussed previously described a
sample, not the population.
▪ Summary measures describing a population, called
parameters, are denoted with Greek letters.
▪ Important population parameters are the population mean,
variance, and standard deviation.
Numerical Descriptive Measures
for a Population: The mean µ
◼ The population mean is the sum of the values in
the population divided by the population size, N
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Where μ = population mean
N = population size
Xi = ith value of the variable X
Numerical Descriptive Measures
For A Population: The Variance σ²
◼ Average of squared deviations of values from
the mean
◼ Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Where μ = population mean
N = population size
Xi = ith value of the variable X
Numerical Descriptive Measures For A
Population: The Standard Deviation σ
◼ Most commonly used measure of variation
◼ Shows variation about the mean
◼ Is the square root of the population variance
◼ Has the same units as the original data
◼ Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
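The only computational difference from the sample formulas is the divisor: N for a population, n - 1 for a sample. The sketch below treats a five-value data set as if it were an entire population, which is an assumption made purely for illustration; the function names are likewise illustrative. The results are compared with `statistics.pvariance` and `statistics.pstdev` from the Python standard library.

```python
import math
import statistics


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    size = len(values)
    if size == 0:
        raise ValueError("the population variance needs at least one value")
    mu = sum(values) / size
    return sum((x - mu) ** 2 for x in values) / size


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [11, 12, 13, 14, 15]  # treated here as a complete population

print(population_variance(data))       # 2.0 (10 / 5)
print(statistics.variance(data))       # 2.5 (10 / 4, sample formula)
print(round(population_std(data), 4))  # 1.4142
```

For the same values, dividing by N gives a smaller result than dividing by n - 1, and the gap shrinks as the number of values grows.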
Sample statistics versus
population parameters
Measure              Population Parameter   Sample Statistic
Mean                 μ                      X̄
Variance             σ²                     S²
Standard Deviation   σ                      S