2035 CH2 Notes
Example 2.1
A fast food restaurant has a drive-through
lane. The restaurant (call them fast food
chain #1) is interested in how many cars go
through the drive-through lane during the
lunch hour (11:45 to 1:15 pm).
37, 44, 24, 51, 48, 40, 29, 34, 61, 44, 43,
36, 39, 54, 45, 42, 43, 27, 50, 31, 55, 38,
42, 48, 33, 25, 62, 59, 47, 47, 37, 33, 41,
50, 51, 44, 49, 35, 40
Variable of Interest
x_i = number of cars through the drive-through lane during lunch hour i,
where i = 1, 2, 3, …, 39
n = 39 (sample size)
Stemplot (stem = tens digit, leaf = ones digit):
2|4579
3|1334567789
4|0012233444577889
5|01124589
6|1
Splitting Stems
• useful for larger data sets, or when there
are too few stems (fewer than 5) to show
the distribution clearly
• split each stem into two:
one with leaves 0 to 4
the other with leaves 5 to 9
2|4
2|579
3|1334
3|567789
4|0012233444
4|577889
5|001124
5|58
6|1
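The splitting rule can be sketched in a few lines of Python. This is only an illustration: the function name `stemplot` and its `split` flag are made up for this sketch, it assumes two-digit whole numbers, and it leaves out any stem that has no leaves. The sample uses only the chain #1 observations below 40 from Example 2.1.

```python
from collections import defaultdict

def stemplot(data, split=False):
    """Return stemplot lines for two-digit whole numbers.

    Stem = tens digit, leaf = ones digit. With split=True each stem is
    written twice: leaves 0 to 4 first, then leaves 5 to 9.
    Stems with no leaves are omitted in this sketch.
    """
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(str(leaf))
    return [f"{stem}|{''.join(leaves)}" for (stem, _), leaves in sorted(rows.items())]

# Chain #1 observations below 40, copied from the data in Example 2.1.
under_40 = [37, 24, 29, 34, 36, 39, 27, 31, 38, 33, 25, 37, 33, 35]

for line in stemplot(under_40):              # 2|4579 and 3|1334567789
    print(line)
for line in stemplot(under_40, split=True):  # 2|4, 2|579, 3|1334, 3|567789
    print(line)
```

Sorting the data first is what puts the leaves of each stem in increasing order.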
Observations about a Distribution
After making a graph, you should always
ask “What do I see?”
Example 2.2
Another fast food restaurant across the street
is also interested in the number of cars
passing through the drive-through lane
during the lunch hour (11:45 to 1:15 pm).
Call them fast food chain #2.
28, 35, 22, 47, 40, 45, 27, 31, 26, 33, 38, 43,
57, 51, 34, 24, 16, 28, 25, 15, 13, 30, 35, 48,
53, 40, 68, 39, 30, 64, 29, 27, 28, 29, 32, 24
 Chain #2 |   | Chain #1
        3 | 1 |
       65 | 1 |
      442 | 2 | 4
998887765 | 2 | 579
   432100 | 3 | 1334
     9855 | 3 | 567789
      300 | 4 | 0012233444
      875 | 4 | 577889
       31 | 5 | 001124
        7 | 5 | 58
        4 | 6 | 1
        8 | 6 |
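A back-to-back stemplot like the one above could be built as follows. This is a rough sketch, not a standard routine: the function name `back_to_back` is hypothetical, and it assumes two-digit whole numbers and split stems. To keep it short, only the observations below 30 from each chain are used.

```python
def back_to_back(left, right):
    """Back-to-back split stemplot lines for two-digit whole numbers."""
    def leaves_by_stem(data):
        rows = {}
        for x in sorted(data):
            stem, leaf = divmod(x, 10)
            # Key is (stem, upper half?) so leaves 0-4 sort before leaves 5-9.
            rows.setdefault((stem, leaf >= 5), []).append(str(leaf))
        return rows

    left_rows, right_rows = leaves_by_stem(left), leaves_by_stem(right)
    width = max((len(v) for v in left_rows.values()), default=0)
    lines = []
    for key in sorted(set(left_rows) | set(right_rows)):
        # Left-hand leaves are reversed so they increase outward from the stem.
        left_leaves = "".join(reversed(left_rows.get(key, [])))
        right_leaves = "".join(right_rows.get(key, []))
        lines.append(f"{left_leaves:>{width}}|{key[0]}|{right_leaves}")
    return lines

# Observations below 30 only, copied from Examples 2.2 and 2.1.
chain2_low = [28, 22, 27, 26, 24, 16, 28, 25, 15, 13, 29, 27, 28, 29, 24]
chain1_low = [24, 29, 27, 25]

for line in back_to_back(chain2_low, chain1_low):
    print(line)
```

The four lines printed correspond to the first four rows of the back-to-back stemplot (stems 1, 1, 2, 2).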
Observations
Chain #2 – Distribution
Comparison of Both
2. Histograms
• one advantage of stemplots is that
original data is not lost; it is displayed
in the stemplot
• however, this advantage makes
stemplots awkward for large data sets
or for small data sets with a wide range
of numbers (too many stems, too few
leaves)
1. K = number of classes
• Choose the smallest value of K such that
2^K > n, where n = sample size
• K = 5 → 2^5 = 32 < n = 36
• K = 6 → 2^6 = 64 > 36, so choose 6
classes
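The same search for K can be written as a short loop. This is a minimal sketch; the function name `number_of_classes` is made up for illustration.

```python
def number_of_classes(n):
    """Smallest whole number K such that 2**K > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

print(number_of_classes(36))  # 6, because 2**5 = 32 < 36 and 2**6 = 64 > 36
```

The rule only suggests a starting point; the number of classes can still be adjusted to get convenient class boundaries.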
[Figure: histogram. Horizontal axis: Annual Salary (in $000's), with classes $42 to < $50, $50 to < $58, $58 to < $66, $66 to < $74, $74 to < $82, $82 to < $90; vertical axis: Frequency.]
Observations
Note
The histogram shows the same information about
the distribution as a stemplot, but once the data
are grouped into classes, the original
observations can no longer be recovered.
Other Graphical Displays
1. Frequency Polygon
• same horizontal axis as histogram
• constructed by marking a point at the
midpoint of each class
• height of graph is equal to the
frequency of that class
• these points are then connected with
straight lines
• polygons give a better “feel” for the
distribution of the data (they treat the
data as continuous)
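The horizontal positions of the polygon's points are the class midpoints. As a small sketch, here they are computed from the salary class boundaries used in the histogram ($42 to < $50, ..., $82 to < $90, in $000's). The function name `class_midpoints` is made up for this illustration; the class frequencies are not listed in these notes, so only the midpoints are shown.

```python
def class_midpoints(boundaries):
    """Midpoint of each class, given the consecutive class boundaries."""
    return [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

salary_boundaries = [42, 50, 58, 66, 74, 82, 90]  # in $000's
print(class_midpoints(salary_boundaries))  # [46.0, 54.0, 62.0, 70.0, 78.0, 86.0]
```

Each midpoint is then paired with the frequency of its class to give one point of the polygon.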
[Figure: Frequency Polygon of Marketing Managers' Salaries in Toronto. Horizontal axis: Midpoint of Interval (annual salary), at $46,000, $54,000, $62,000, $70,000, $78,000, $86,000; vertical axis: Frequency.]
2. Ogive (cumulative frequency polygon)
• same horizontal axis as histogram, but
we now mark our points at the upper
class boundary of each class
• vertical scale is cumulative frequency,
(or better yet, the cumulative relative
frequency)
• by joining the points with straight
lines, we obtain what is called an
Ogive (cumulative relative frequency
polygon)
• the graph starts at the lower boundary
of the first class, at a height of 0
[Figure: Ogive for Marketing Manager Salaries in Toronto. Horizontal axis: Boundary Points of Salary Intervals, $42,000 to $90,000 in steps of $8,000; vertical axis: Cumulative Relative Frequency, 0% to 100%.]
61, 11, 25, 24, 12, 52, 38, 69, 20, 49, 27, 72,
14, 37, 24, 107, 16, 35, 56, 27, 59, 26, 22,
14, 53, 43, 56, 29, 51, 25
[Figure: histogram of share prices. Horizontal axis: Share Prices (nearest $), with classes 10 to < 20, 20 to < 30, 30 to < 40, 40 to < 50, 50 to < 60, 60 to < 70, 70 to < 80, 80 to < 90, 90 to < 100, 100 to < 110; vertical axis: Frequency.]
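The class frequencies behind this histogram can be tallied directly from the 30 share prices. A minimal sketch, assuming classes of width 10 starting at 10 as on the horizontal axis; the function name `class_frequencies` is made up for illustration.

```python
share_prices = [
    61, 11, 25, 24, 12, 52, 38, 69, 20, 49, 27, 72,
    14, 37, 24, 107, 16, 35, 56, 27, 59, 26, 22,
    14, 53, 43, 56, 29, 51, 25,
]

def class_frequencies(data, lower, width, k):
    """Count the observations in k classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * k
    for x in data:
        i = (x - lower) // width
        if not 0 <= i < k:
            raise ValueError(f"{x} falls outside the classes")
        counts[i] += 1
    return counts

freqs = class_frequencies(share_prices, lower=10, width=10, k=10)
for i, f in enumerate(freqs):
    lo = 10 + 10 * i
    print(f"{lo} to < {lo + 10}: {f}")
# freqs is [5, 10, 3, 2, 6, 2, 1, 0, 0, 1]
```

The two empty classes (80 to < 90 and 90 to < 100) show how far the value 107 sits from the rest of the data.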
[Figure: Frequency Polygon of Share Prices. Horizontal axis: Share Prices (midpoints of intervals), at 15, 25, 35, 45, 55, 65, 75, 85, 95, 105; vertical axis: Frequency.]
[Figure: ogive of share prices. Horizontal axis: Share Prices (boundary points of intervals), $10 to $110 in steps of $10; vertical axis: cumulative relative frequency, in percent.]
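The points of this ogive can be worked out from the share price data. A minimal sketch, assuming the same classes of width 10 from $10 to $110; the variable names are made up for illustration.

```python
from itertools import accumulate

share_prices = [
    61, 11, 25, 24, 12, 52, 38, 69, 20, 49, 27, 72,
    14, 37, 24, 107, 16, 35, 56, 27, 59, 26, 22,
    14, 53, 43, 56, 29, 51, 25,
]

boundaries = list(range(10, 111, 10))  # 10, 20, ..., 110
# Frequency of each class [lo, lo + 10).
freqs = [sum(lo <= x < lo + 10 for x in share_prices) for lo in boundaries[:-1]]

n = len(share_prices)
# Start at 0 for the lower boundary of the first class, then add up the frequencies.
cumulative = [0] + list(accumulate(freqs))
ogive = [(b, c / n) for b, c in zip(boundaries, cumulative)]

for b, rel in ogive:
    print(f"${b}: {rel:.1%}")
```

Each point sits at the upper boundary of a class, so the height at $30, for example, is the proportion of share prices below $30.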
Graphical Displays for Qualitative Data
(section 2.3)
U G U S H D D R I U S U
S U G C S U D R S U D U
S S D P R S I S U D G S
S U S D G S C U D D S S
S U
[Figure: bar graph. Categories: Broken Box, Bulging Box, Cracked Box, Dirty Box, Hole in Box, Improper Weight, Printing Error, Unreadable Label, Unsealed Box Top; vertical axis: Frequency.]
2. Pie Chart
[Figure: pie chart with the following slices]
Unsealed Box Top   32%
Unreadable Label   24%
Dirty Box          18%
Bulging Box         8%
Broken Box          6%
Cracked Box         4%
Improper Weight     4%
Hole in Box         2%
Printing Error      2%
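The frequencies and percentages behind the bar graph and pie chart can be tallied from the 50 letter codes. A minimal sketch: the meaning attached to each letter in `labels` is an assumption, inferred by matching the counts to the pie chart percentages, since the notes do not list the coding.

```python
from collections import Counter

codes = """
U G U S H D D R I U S U
S U G C S U D R S U D U
S S D P R S I S U D G S
S U S D G S C U D D S S
S U
""".split()

# ASSUMED coding (not given in the notes), inferred from the pie chart percentages.
labels = {
    "S": "Unsealed Box Top", "U": "Unreadable Label", "D": "Dirty Box",
    "G": "Bulging Box", "R": "Broken Box", "C": "Cracked Box",
    "I": "Improper Weight", "H": "Hole in Box", "P": "Printing Error",
}

counts = Counter(codes)
n = len(codes)
# most_common() lists the categories from most to least frequent.
for code, count in counts.most_common():
    print(f"{labels[code]:<17}{count:>3}{count / n:>6.0%}")
```

Relative frequency is just the count divided by n = 50, so 16 unsealed box tops gives 32%.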
Example 2.6
The following table gives data on the blood
types and the various ethnic groups in a study
done in the state of Hawaii:
Ethnic Group   Number Observed   Relative Frequency   |   Blood Type   Number Observed   Relative Frequency
Hawaiian       4,670                                  |   O            62,337
[Figure: bar graph. Horizontal axis: Ethnic Group (Hawaiian, Hawaiian-White, Hawaiian-Chinese, White); vertical axis: Relative Frequency.]
[Figure: bar graph. Horizontal axis: Blood Type (O, A, B, AB); vertical axis: Relative Frequency.]
Pie Charts vs. Bar Graphs
[Figure: bar graph with the categories in decreasing order: Unsealed Box Top, Unreadable Label, Dirty Box, Bulging Box, Broken Box, Cracked Box, Improper Weight, Hole in Box, Printing Error; vertical axis: Percentage.]
Charts and Graphs for Two Variables
(section 2.4)
Example 2.8
A company has done a review of the
restaurants in Toronto. A sample of 300
restaurants in the Toronto area was taken,
and data was collected on the quality rating
and the typical meal price.
[Figure: bar graph of quality rating (Good, Very Good, Excellent); vertical axis in percent.]
[Figure: bar graph of typical meal price, with classes 10-19, 20-29, 30-39, 40-49; vertical axis in percent.]
(B) Scatterplots
A scatterplot is a graph of the paired data
points for two quantitative variables.
• It is used to examine possible
relationships between the two variables
Example 2.9
A study on the driving speed (in miles per
hour) and fuel efficiency (miles per gallon)
for midsize automobiles resulted in the
following data:
Speed (mph)   30   50   40   55   30   25   60   25   50   55
Fuel (mpg)    28   25   25   23   30   32   21   35   26   25
[Figure: scatterplot. Horizontal axis: Driving Speed (miles per hour); vertical axis: Gas Mileage (mpg).]
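The pairing of the two variables can be sketched in Python. This is only a rough illustration: `text_scatter` is a made-up helper that draws a crude character grid (one column per 5 mph, one row per 2 mpg), not a proper plot.

```python
speed = [30, 50, 40, 55, 30, 25, 60, 25, 50, 55]  # miles per hour
fuel = [28, 25, 25, 23, 30, 32, 21, 35, 26, 25]   # miles per gallon

# Each car gives one (speed, fuel) point; sorting by speed makes the pattern easier to read.
points = sorted(zip(speed, fuel))

def text_scatter(points, x_step=5, y_step=2):
    """Crude character-grid scatterplot: one row per y band, one column per x band."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    n_cols = (max(xs) - x0) // x_step + 1
    n_rows = (max(ys) - y0) // y_step + 1
    grid = [[" "] * n_cols for _ in range(n_rows)]
    for x, y in points:
        grid[(y - y0) // y_step][(x - x0) // x_step] = "*"
    # The highest y band goes first so the picture reads like a graph.
    return [f"{y0 + r * y_step:>3} |" + "".join(grid[r])
            for r in range(n_rows - 1, -1, -1)]

print(points)
for line in text_scatter(points):
    print(line)
```

Reading the sorted pairs from left to right, fuel efficiency tends to fall as driving speed rises.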