Chapter 3
Describing Data Using Numerical Measures
Chapter Goals
After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a set of
data
Compute the range, variance, and standard deviation and know
what these values mean
Construct and interpret a box and whiskers plot
Compute and explain the coefficient of variation and
z scores
Use numerical measures along with graphs, charts, and tables to
describe data
Chapter Topics
Measures of Center and Location
Mean, median, mode, geometric mean, midrange
Other Measures of Location
Weighted mean, percentiles, quartiles
Measures of Variation
Range, interquartile range, variance and standard
deviation, coefficient of variation
Summary Measures
Describing Data Numerically
Center and Location: Mean, Median, Mode, Weighted Mean
Other Measures of Location: Percentiles, Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Center and Location
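Center and Location: Mean, Median, Mode, Weighted Mean
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample weighted mean: $\bar{x}_W = \frac{\sum w_i x_i}{\sum w_i}$
Population weighted mean: $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$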
Mean (Arithmetic Average)
The mean is the arithmetic average of the data values
Sample mean (n = sample size):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Population mean (N = population size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
Example (values plotted on a 0 to 10 axis):
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
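The effect of a single extreme value on the mean can be checked with a few lines of Python. This is a minimal sketch using the two five-value data sets from the example; the variable names are illustrative only.

```python
from statistics import mean

# The two data sets from the example: identical except for the last value.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_without = mean(without_outlier)  # 15 / 5
mean_with = mean(with_outlier)        # 20 / 5

# Changing one value from 5 to 10 moves the mean from 3 to 4.
print(mean_without, mean_with)
```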
Median
In an ordered array, the median is the “middle” number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the two middle
numbers
[Figure: two dot plots on a 0 to 10 axis; Median = 3 for both data sets]
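The odd and even cases can be sketched in Python. The plotted points are not recoverable from the slide, so the data below are an assumption: the two odd-sized sets are borrowed from the mean example (both have median 3, which agrees with the figure labels), and the even-sized set is hypothetical.

```python
from statistics import median

# Assumed data: the two sets from the mean example, plus a hypothetical
# even-sized set to show the "average of the two middle numbers" case.
odd_a = [1, 2, 3, 4, 5]
odd_b = [1, 2, 3, 4, 10]
even_n = [1, 2, 3, 4]

median_a = median(odd_a)      # middle value of five ordered values
median_b = median(odd_b)      # the extreme value 10 does not move the middle
median_even = median(even_n)  # average of the two middle values, 2 and 3

print(median_a, median_b, median_even)
```

Unlike the mean, the median is the same for both odd-sized sets even though one contains an extreme value.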
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: dot plots illustrating a data set with Mode = 5 and a data set with no mode]
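The three situations (one mode, several modes, no mode) can be sketched with a small helper. The function name `modes` and all data are hypothetical, and the convention that a data set has no mode when every value occurs exactly once is an assumption made for this sketch.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode.

    Assumes non-empty data and treats "every value occurs once" as no mode.
    """
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]

# Hypothetical data sets (not taken from the slides).
one_mode = modes([0, 1, 2, 5, 5, 5, 9])
no_mode = modes([1, 2, 3, 4])
two_modes = modes([1, 1, 2, 2, 3])
categorical = modes(["red", "blue", "red", "green"])

print(one_mode, no_mode, two_modes, categorical)
```

The last call shows that the mode also works for categorical data, where a mean or median would not be defined.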
Weighted Mean
Used when values are grouped by frequency or relative importance
Example: Sample of 26 Repair Projects
Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2
Weighted Mean Days to Complete:
$\bar{x}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
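The repair-project calculation can be reproduced in Python as a check on the arithmetic. This is a minimal sketch; the variable names are illustrative, and the frequencies play the role of the weights.

```python
# Days to complete and how many of the 26 projects took that long.
days = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]

weighted_sum = sum(w * x for w, x in zip(frequency, days))  # sum of w_i * x_i
total_weight = sum(frequency)                               # sum of w_i
weighted_mean = weighted_sum / total_weight

print(weighted_sum, total_weight, round(weighted_mean, 2))
```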
Shape of a Distribution
Describes how data is distributed
Symmetric or skewed
Left-Skewed: Mean < Median < Mode (longer tail extends to left)
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean (longer tail extends to right)
Other Location Measures
Other Measures of Location: Percentiles, Quartiles
Percentiles
The pth percentile in a data array (where 0 ≤ p ≤ 100):
p% of the values are less than or equal to this value
(100 – p)% of the values are greater than or equal to this value
Quartiles
1st quartile = 25th percentile
2nd quartile = 50th percentile = median
3rd quartile = 75th percentile
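The link between quartiles and percentiles can be illustrated with Python's `statistics.quantiles`. This is a sketch on hypothetical data; note that software packages use several different interpolation conventions for percentiles, so values from Excel or a textbook method may differ slightly from these.

```python
from statistics import median, quantiles

# Hypothetical data set (not from the slides).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 21]

# Cut points splitting the data into 4 groups and into 100 groups.
q1, q2, q3 = quantiles(data, n=4)
percentiles = quantiles(data, n=100)  # percentiles[p - 1] is the pth percentile

p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

# Each quartile coincides with the matching percentile,
# and the 2nd quartile is the median.
print(q1, q2, q3)
print(p25, p50, p75)
print(median(data))
```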
Quartiles
Quartiles split the ranked data into 4 equal groups
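[Figure: ranked data split into four groups of 25% each]
Box and Whisker Plot
A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
[Figure: box and whisker plot labeled Minimum, 1st Quartile, Median, 3rd Quartile, Maximum]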
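The five values that a box and whisker plot draws can be computed before any plotting. The sketch below uses a hypothetical helper name and hypothetical data; it assumes the "inclusive" quartile method of `statistics.quantiles`, which is only one of several conventions in use.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical data set (not from the slides).
sample = [7, 1, 5, 9, 3, 11, 13]
summary = five_number_summary(sample)
print(summary)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers run out to the minimum and maximum.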
Shape of Box and Whisker Plots
The Box and central line are centered between the endpoints if data is
symmetric around the median
A Box and Whisker plot can be shown in either vertical or horizontal
format
[Figure: Distribution Shape and Box and Whisker Plot; panels Left-Skewed, Symmetric, Right-Skewed, each marking Q1, Q2, Q3]
Measures of Variation
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Variation
Measures of variation give information on the spread or variability of
the data values.
[Figure: distributions with the same center but different variation]
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$
Example (values plotted on a 0 to 14 axis; smallest value 1, largest value 14):
Range = 14 - 1 = 13
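In code the range is a one-line calculation. The individual data points of the example are not recoverable, so the list below is hypothetical and was chosen only so that its smallest and largest values match the example.

```python
# Hypothetical data whose smallest value is 1 and largest value is 14.
data = [1, 3, 4, 4, 6, 9, 11, 14]

data_range = max(data) - min(data)  # largest minus smallest observation
print(data_range)
```

Because only the two extreme observations enter the calculation, a single outlier can change the range substantially.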
Interquartile Range
Can eliminate some outlier problems by using the interquartile range
Eliminate some high- and low-valued observations and calculate the range from the remaining values.
Interquartile range = 3rd quartile – 1st quartile
Interquartile Range
Example:
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70
(each of the four segments holds 25% of the values)
Interquartile range = 57 – 30 = 27
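The same subtraction can be written out in Python. The first part uses only the five summary values from the example. The second part is an illustration under an assumption: it treats those five values as if they were the whole data set, in which case the "inclusive" method of `statistics.quantiles` happens to return the same quartiles.

```python
from statistics import quantiles

# Five-number summary taken from the example.
minimum, q1, median_q2, q3, maximum = 12, 30, 45, 57, 70

iqr = q3 - q1                   # spread of the middle 50% of the values
full_range = maximum - minimum  # spread of all the values

# Assumption for illustration: the five summary values are the raw data.
cuts = quantiles([12, 30, 45, 57, 70], n=4, method="inclusive")

print(iqr, full_range, cuts)
```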
Variance
Average of squared deviations of values from the mean
Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
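The two variance formulas differ only in the divisor, which the following sketch makes explicit. The helper names are hypothetical, and the data set is one of those used in the mean example; the results are compared with the standard library's `variance` and `pvariance`.

```python
from statistics import mean, pvariance, variance

data = [1, 2, 3, 4, 10]  # mean is 4

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

s2 = sample_variance(data)          # 50 / 4
sigma2 = population_variance(data)  # 50 / 5

print(s2, variance(data))
print(sigma2, pvariance(data))
```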
Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
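Each standard deviation is the square root of the corresponding variance, which a short Python sketch can confirm. The data set is borrowed from the mean example, and the variable names are illustrative.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [1, 2, 3, 4, 10]

s = stdev(data)       # sample standard deviation, divisor n - 1
sigma = pstdev(data)  # population standard deviation, divisor N

# Square roots of the variances give the same numbers.
s_from_variance = math.sqrt(variance(data))
sigma_from_variance = math.sqrt(pvariance(data))

print(round(s, 4), round(sigma, 4))
```

For the same data the sample value is always the larger of the two, because it divides by n - 1 rather than N.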
Comparing Standard Deviations
Data set    Mean    s
Data A      15.5    3.338
Data B      15.5    0.9258
Data C      15.5    4.57
[Figure: each data set plotted on an 11 to 21 axis; same mean, different spread]
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Is used to compare two or more sets of data measured in
different units
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
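Because the coefficient of variation is unit-free, it allows a comparison that raw standard deviations do not. The sketch below applies the sample formula to two hypothetical data sets measured in different units; the function name and the data are assumptions made for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV in percent: (s / x-bar) * 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical measurements in different units (not from the slides).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: {cv_heights:.2f}%  weights: {cv_weights:.2f}%")
```

In this made-up example the weights vary more relative to their mean than the heights do, even though centimeters and kilograms cannot be compared directly.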
The Empirical Rule
If the data distribution is bell-shaped, then the interval:
μ ± 1σ contains about 68% of the values in the population or the sample
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains about 99.7% of the values in the population or the sample
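The three percentages can be checked against an exactly normal distribution, which is a stronger assumption than "bell-shaped" and is made here only for illustration. The sketch uses `statistics.NormalDist` from the Python standard library; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"mu ± {k} sigma: {share:.1%}")
```

Real data that are only roughly bell-shaped will match these figures approximately, not exactly.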
Tchebysheff’s Theorem
Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
Examples:
$(1 - 1/1^2) = 0\%$ for k = 1 (μ ± 1σ)
$(1 - 1/2^2) = 75\%$ for k = 2 (μ ± 2σ)
$(1 - 1/3^2) \approx 89\%$ for k = 3 (μ ± 3σ)
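The bound can be computed directly and compared with the observed share in a data set that is far from bell-shaped. The following is a sketch on hypothetical right-skewed data; the function names are illustrative, and the population standard deviation of the data set itself is used.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    return 1 - 1 / k**2

def share_within(values, k):
    """Observed share of values within k standard deviations of the mean."""
    m, sd = mean(values), pstdev(values)
    inside = [x for x in values if abs(x - m) <= k * sd]
    return len(inside) / len(values)

# Hypothetical right-skewed data (not from the slides).
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 20]

for k in (1, 2, 3):
    print(k, round(chebyshev_bound(k), 3), share_within(skewed, k))
```

The observed share is typically well above the bound, since the theorem is a worst-case guarantee that must hold for every distribution.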
Using Microsoft Excel
Descriptive Statistics are easy to obtain from Microsoft Excel
Use menu choice:
tools / data analysis / descriptive statistics
Enter details in dialog box
Check box for summary statistics
Click OK
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
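The Excel output table itself is not reproduced here. As a rough cross-check, comparable summary statistics for the five house prices can be computed in Python; this sketch is not the Excel procedure, and the dictionary keys are illustrative names only.

```python
from statistics import mean, median, mode, stdev

# House price data from the slide, in dollars.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": mean(house_prices),
    "median": median(house_prices),
    "mode": mode(house_prices),
    "sample_std_dev": stdev(house_prices),
    "range": max(house_prices) - min(house_prices),
    "minimum": min(house_prices),
    "maximum": max(house_prices),
    "sum": sum(house_prices),
    "count": len(house_prices),
}

for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

The single $2,000,000 house pulls the mean well above the median, which is the outlier effect described earlier in the chapter.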
Chapter Summary
Described measures of center and location
Mean, median, mode, geometric mean, midrange
Discussed percentiles and quartiles
Described measures of variation
Range, interquartile range, variance,
standard deviation, coefficient of variation
Created Box and Whisker Plots
Illustrated distribution shapes
Symmetric, skewed
Discussed Tchebysheff’s Theorem
Calculated standardized data values