
2035 CH2 Notes

This chapter discusses techniques for summarizing and depicting quantitative data using charts and graphs. It covers constructing frequency distributions, different types of graphs like histograms and frequency polygons for displaying quantitative data, and how to analyze distributions by looking at features like shape, center, and spread. Examples are provided to demonstrate these concepts, such as analyzing the number of cars passing through a fast food drive-through lane each day and salaries of marketing managers to construct histograms and compare distributions.

Uploaded by

kejacob629
Copyright
© All Rights Reserved

Chapter 2 – Charts and Graphs

In this chapter we will learn about several techniques for summarizing and depicting data.

We will be interested in things like:

• Constructing a frequency distribution from a set of data
• Describing/constructing different types of graphs for measurement (quantitative) data
• Discussing the distribution of your data, including what to look for in a graph and how to describe the “shape” of the distribution
• Describing/constructing different types of graphs for categorical (qualitative) data
• Charts and graphs of two measurement variables

Frequency Distributions/Graphical Displays for Quantitative Data (sections 2.1/2.2)

Example 2.1
A fast food restaurant has a drive-through lane. The restaurant (call them fast food chain #1) is interested in how many cars go through the drive-through lane during the lunch hour (11:45 am to 1:15 pm).

The fast food chain gathers the following data on the number of cars passing through the drive-through lane during the lunch hour over the past 39 days:

37, 44, 24, 51, 48, 40, 29, 34, 61, 44, 43,
36, 39, 54, 45, 42, 43, 27, 50, 31, 55, 38,
42, 48, 33, 25, 62, 59, 47, 47, 37, 33, 41,
50, 51, 44, 49, 35, 40

Analyze this data and draw some conclusions.

We see in example 2.1 that things start with a question:

How many cars go through the drive-through lane during the lunch hour for this fast food restaurant?

Variable of Interest

Xi = number of cars that use the drive-through lane during the lunch hour of day i

where i = 1, 2, 3, … , 39

n = 39 (sample size)

xi = actual observed number of cars

x1 = 37, x2 = 44, … , x38 = 35, x39 = 40

Organizing/Summarizing Data

The type of data you have will determine how you organize and summarize it.

However, one of the goals is to determine the distribution of the variable you are interested in (you want the distribution of the data).

In example 2.1, the numbers vary:

• this is called variation
• the pattern of variation of a variable is called its distribution

The distribution of a variable is best displayed graphically.

Two Main Types of Graphical Displays for Quantitative Data

1. Stemplots (Stem-and-leaf Displays)

• quick/easy way to view the shape of a distribution
• best used for small data sets with observations having at least 2 digits

stem: consists of one or more leading digits
leaf: consists of the final digit

• arrange stems vertically in increasing order from top to bottom
• arrange leaves in increasing order from left to right

Back to Example 2.1

2|4579
3|1334567789
4|0012233444577889
5|0011459
6|12
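
As a rough illustration, the stemplot rules above can be sketched in Python. The helper name stemplot is hypothetical (it is not part of the course notes), and the sketch assumes non-negative whole numbers whose final digit is the leaf. It works from the Example 2.1 data exactly as listed.

```python
from collections import defaultdict

# Cars per lunch hour for fast food chain #1 (data as listed in Example 2.1).
cars = [37, 44, 24, 51, 48, 40, 29, 34, 61, 44, 43,
        36, 39, 54, 45, 42, 43, 27, 50, 31, 55, 38,
        42, 48, 33, 25, 62, 59, 47, 47, 37, 33, 41,
        50, 51, 44, 49, 35, 40]


def stemplot(values):
    """Return one 'stem|leaves' string per stem (hypothetical helper)."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)        # stem = leading digit(s), leaf = final digit
        leaves[stem].append(str(leaf))
    # Walk every stem from smallest to largest so empty stems (gaps) stay visible.
    return [f"{stem}|{''.join(leaves[stem])}"
            for stem in range(min(leaves), max(leaves) + 1)]


for line in stemplot(cars):
    print(line)
```

Keeping empty stems in the output is a design choice: a gap in the display is itself one of the features worth looking for in a distribution.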

Splitting Stems
• useful for larger data sets or when there are not enough stems (fewer than 5) to see the distribution clearly
• split each stem into two:
  o one with leaves 0 to 4
  o the other with leaves 5 to 9

2|4
2|579
3|1334
3|567789
4|0012233444
4|577889
5|00114
5|59
6|12
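
A minimal sketch of stem splitting, again in Python and again using a hypothetical helper name (split_stemplot). It assumes the same whole-number data as listed in Example 2.1 and only prints rows that contain at least one leaf.

```python
from collections import defaultdict

# Cars per lunch hour for fast food chain #1 (data as listed in Example 2.1).
cars = [37, 44, 24, 51, 48, 40, 29, 34, 61, 44, 43,
        36, 39, 54, 45, 42, 43, 27, 50, 31, 55, 38,
        42, 48, 33, 25, 62, 59, 47, 47, 37, 33, 41,
        50, 51, 44, 49, 35, 40]


def split_stemplot(values):
    """Stemplot with each stem split in two: leaves 0-4 first, then leaves 5-9."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1      # 0 = low half of the stem, 1 = high half
        rows[(stem, half)].append(str(leaf))
    # Sorting the (stem, half) keys lists the low half of each stem before the high half.
    return [f"{stem}|{''.join(leaves)}" for (stem, _), leaves in sorted(rows.items())]


for line in split_stemplot(cars):
    print(line)
```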
Observations about a Distribution

After making a graph, you should always ask “What do I see?”

Look for the graph’s important features, such as:

1. What is the overall pattern?
2. Are there any deviations from this pattern?
3. What is the shape of the distribution?
4. Where is the distribution centred?
5. How spread out is the distribution?
6. Are there any gaps? Are there any outliers (points that fall well outside the overall pattern)?

Back to Example 2.1


Back-to-Back Stemplots

Back-to-back stemplots are useful for comparing two related distributions.

Example 2.2
Another fast food restaurant across the street is also interested in the number of cars passing through the drive-through lane during the lunch hour (11:45 am to 1:15 pm). Call them fast food chain #2.

They gather the following data during the past 36 days:

28, 35, 22, 47, 40, 45, 27, 31, 26, 33, 38, 43,
57, 51, 34, 24, 16, 28, 25, 15, 13, 30, 35, 48,
53, 40, 68, 39, 30, 64, 29, 27, 28, 29, 32, 24

Graph this data using a stemplot. Make some observations. Then compare fast food chain #2 with chain #1.
Solution to 2.2

 Chain #2     Chain #1

        3|1|
       65|1|
      442|2|4
998887765|2|579
   432100|3|1334
     9855|3|567789
      300|4|0012233444
      875|4|577889
       31|5|00114
        7|5|59
        4|6|12
        8|6|
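
The back-to-back layout can be sketched as follows. This is only an illustration under stated assumptions: the helper names split_rows and back_to_back are hypothetical, stems are split into low (0-4) and high (5-9) halves as above, and both data sets are used exactly as listed in Examples 2.1 and 2.2.

```python
from collections import defaultdict

# Chain #1 (Example 2.1) and chain #2 (Example 2.2), as listed in the notes.
chain1 = [37, 44, 24, 51, 48, 40, 29, 34, 61, 44, 43,
          36, 39, 54, 45, 42, 43, 27, 50, 31, 55, 38,
          42, 48, 33, 25, 62, 59, 47, 47, 37, 33, 41,
          50, 51, 44, 49, 35, 40]
chain2 = [28, 35, 22, 47, 40, 45, 27, 31, 26, 33, 38, 43,
          57, 51, 34, 24, 16, 28, 25, 15, 13, 30, 35, 48,
          53, 40, 68, 39, 30, 64, 29, 27, 28, 29, 32, 24]


def split_rows(values):
    """Map (stem, is_high_half) to a string of leaves in increasing order."""
    rows = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf >= 5)] += str(leaf)
    return rows


def back_to_back(left, right):
    """Left leaves read outward from the shared stem, so they are reversed."""
    lrows, rrows = split_rows(left), split_rows(right)
    keys = sorted(set(lrows) | set(rrows))           # every stem half used by either side
    width = max(len(s) for s in lrows.values())      # pad so the stems line up
    lines = []
    for key in keys:
        left_leaves = lrows.get(key, "")[::-1]
        right_leaves = rrows.get(key, "")
        lines.append(f"{left_leaves:>{width}}|{key[0]}|{right_leaves}")
    return lines


for line in back_to_back(chain2, chain1):
    print(line)
```

Sharing one column of stems is what makes the comparison easy: the two distributions can be read against the same scale.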

Observations

Chain #2 – Distribution

Comparison of Both

2. Histograms
• one advantage of stemplots is that the original data is not lost; it is displayed in the stemplot
• however, this advantage makes stemplots awkward for large data sets or for small data sets with a wide range of numbers (too many stems, too few leaves)

For these cases, a histogram is a better way to illustrate the shape of a distribution.

• a histogram groups the data into classes (or intervals or cells)
• the number of observations in each class (the count or frequency) is calculated

Guidelines
1. Divide the range of data into classes of equal width
• 5 to 15 classes is about right
• this is a compromise between too much detail and too little
• all observations must fall into one, and only one, class

2. Count the number of observations in each class and summarize the results in a frequency table

Example 2.3
A survey was performed in the Toronto area that recorded the annual salaries of marketing managers. The results for 36 randomly selected managers are (in $000’s):

$51.4 $68.1 $66.5 $75.0 $60.5 $69.3
$67.3 $74.2 $80.0 $78.9 $88.4 $76.5
$42.9 $55.6 $60.4 $78.8 $81.9 $67.2
$64.6 $81.3 $85.0 $72.5 $79.1 $65.5
$48.9 $73.5 $53.8 $80.8 $74.6 $78.3
$59.5 $72.8 $77.4 $73.5 $68.5 $62.8

Graph this data and make some observations.

Solution to 2.3

1. K = number of classes
• Choose the smallest value of K such that 2^K > n, where n = sample size
• K = 5 → 2^5 = 32 < n = 36
• K = 6 → 2^6 = 64 > 36; so choose 6 classes

2. Estimated class width = [largest value – smallest value] / K
• range = 88.4 – 42.9 = 45.5
• estimated width = 45.5 / 6 ≈ 7.58 → try $8,000 as your class width
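
The two steps above can be checked with a short Python sketch. Rounding the estimated width up to the next whole thousand is an assumption made here to mirror the "try $8,000" choice; the notes only say to pick a convenient width.

```python
import math

# Marketing manager salaries in $000's (Example 2.3).
salaries = [51.4, 68.1, 66.5, 75.0, 60.5, 69.3,
            67.3, 74.2, 80.0, 78.9, 88.4, 76.5,
            42.9, 55.6, 60.4, 78.8, 81.9, 67.2,
            64.6, 81.3, 85.0, 72.5, 79.1, 65.5,
            48.9, 73.5, 53.8, 80.8, 74.6, 78.3,
            59.5, 72.8, 77.4, 73.5, 68.5, 62.8]

n = len(salaries)

# Step 1: smallest K with 2**K > n.
K = 1
while 2 ** K <= n:
    K += 1

# Step 2: estimated class width = range / K.
data_range = max(salaries) - min(salaries)
estimated_width = data_range / K
class_width = math.ceil(estimated_width)   # assumption: round up to a convenient width

print(f"n = {n}, K = {K}")
print(f"range = {data_range:.1f}, estimated width = {estimated_width:.2f}, "
      f"class width = {class_width}")
```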
Class boundaries                Midpoint   Frequency f   Relative frequency   Cumulative frequency
$42,000 to less than $50,000    $46,000
$50,000 to less than $58,000    $54,000
$58,000 to less than $66,000    $62,000
$66,000 to less than $74,000    $70,000
$74,000 to less than $82,000    $78,000
$82,000 to less than $90,000    $86,000
Total                                      n = 36        1.000
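
One way to fill in the remaining columns is to count the observations class by class. The sketch below assumes the class boundaries from the table ($42,000 start, $8,000 width, 6 classes); the function name frequency_table is hypothetical.

```python
# Marketing manager salaries in $000's (Example 2.3).
salaries = [51.4, 68.1, 66.5, 75.0, 60.5, 69.3,
            67.3, 74.2, 80.0, 78.9, 88.4, 76.5,
            42.9, 55.6, 60.4, 78.8, 81.9, 67.2,
            64.6, 81.3, 85.0, 72.5, 79.1, 65.5,
            48.9, 73.5, 53.8, 80.8, 74.6, 78.3,
            59.5, 72.8, 77.4, 73.5, 68.5, 62.8]


def frequency_table(values, lower, width, k):
    """Rows of (lower, upper, midpoint, frequency, relative freq, cumulative relative freq)."""
    n = len(values)
    rows, cumulative = [], 0
    for i in range(k):
        lo = lower + i * width
        hi = lo + width
        f = sum(1 for v in values if lo <= v < hi)   # "lo to less than hi"
        cumulative += f
        rows.append((lo, hi, (lo + hi) / 2, f, f / n, cumulative / n))
    return rows


table = frequency_table(salaries, lower=42, width=8, k=6)
for lo, hi, mid, f, rel, cum in table:
    print(f"${lo},000 to < ${hi},000   mid ${mid:.0f},000   f = {f:>2}   "
          f"rel = {rel:.3f}   cum = {cum:.3f}")
```

Using "lo <= v < hi" guarantees that every observation falls into one, and only one, class.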
[Figure: Histogram of Marketing Managers' Salaries in Toronto. Horizontal axis: Annual Salary (in $000's), with classes $42 to < $50, $50 to < $58, $58 to < $66, $66 to < $74, $74 to < $82 and $82 to < $90. Vertical axis: Frequency.]

Observations

Note
The histogram gives the same picture of the distribution as a stemplot, but once the data is grouped into classes, the original observations are gone.

Other Graphical Displays

1. Frequency Polygon
• same horizontal axis as the histogram
• constructed by marking a point at the midpoint of each class
• the height of each point is equal to the frequency of that class
• these points are then connected with straight lines
• polygons give a better “feel” for the distribution of the data (they treat the data as continuous)
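
The points of a frequency polygon are simply (class midpoint, class frequency) pairs. A minimal sketch for the Example 2.3 salaries, assuming the six $8,000-wide classes starting at $42,000:

```python
# Marketing manager salaries in $000's (Example 2.3).
salaries = [51.4, 68.1, 66.5, 75.0, 60.5, 69.3,
            67.3, 74.2, 80.0, 78.9, 88.4, 76.5,
            42.9, 55.6, 60.4, 78.8, 81.9, 67.2,
            64.6, 81.3, 85.0, 72.5, 79.1, 65.5,
            48.9, 73.5, 53.8, 80.8, 74.6, 78.3,
            59.5, 72.8, 77.4, 73.5, 68.5, 62.8]

lower, width, k = 42, 8, 6     # class layout assumed from the frequency table

polygon_points = []
for i in range(k):
    lo = lower + i * width
    hi = lo + width
    freq = sum(1 for s in salaries if lo <= s < hi)
    polygon_points.append(((lo + hi) / 2, freq))   # x = midpoint, y = frequency

# Joining consecutive points with straight lines gives the polygon.
for midpoint, freq in polygon_points:
    print(f"${midpoint:.0f},000 -> {freq}")
```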
[Figure: Frequency Polygon of Marketing Managers' Salaries in Toronto. Horizontal axis: Midpoint of Interval (annual salary), $46,000 to $86,000. Vertical axis: Frequency.]

2. Ogive (cumulative frequency polygon)
• same horizontal axis as the histogram, but we now mark our points at the upper class boundary of each class
• the vertical scale is cumulative frequency (or better yet, cumulative relative frequency)
• by joining the points with straight lines, we obtain what is called an ogive (cumulative relative frequency polygon)
• the graph starts at the lower boundary of the first interval, with a height of 0

[Figure: Ogive for Marketing Managers' Salaries in Toronto. Horizontal axis: Boundary Points of Salary Intervals, $42,000 to $90,000. Vertical axis: Cumulative Relative Frequency, 0% to 100%.]

You can use the ogive to determine the relative position of salaries
• for example, about 30% of all marketing managers’ salaries are $66,000 or less
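
The ogive construction, and reading a value off it, can be sketched as below. The helper names ogive_points and read_ogive are hypothetical, and reading between boundary points by linear interpolation is an assumption that matches joining the points with straight lines.

```python
# Marketing manager salaries in $000's (Example 2.3).
salaries = [51.4, 68.1, 66.5, 75.0, 60.5, 69.3,
            67.3, 74.2, 80.0, 78.9, 88.4, 76.5,
            42.9, 55.6, 60.4, 78.8, 81.9, 67.2,
            64.6, 81.3, 85.0, 72.5, 79.1, 65.5,
            48.9, 73.5, 53.8, 80.8, 74.6, 78.3,
            59.5, 72.8, 77.4, 73.5, 68.5, 62.8]

boundaries = [42, 50, 58, 66, 74, 82, 90]   # class boundaries in $000's


def ogive_points(values, boundaries):
    """(boundary, cumulative relative frequency) pairs; the first point has height 0."""
    n = len(values)
    return [(b, sum(1 for v in values if v < b) / n) for b in boundaries]


def read_ogive(points, x):
    """Height of the ogive at x, interpolating linearly between plotted points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")


points = ogive_points(salaries, boundaries)
print(f"Share of salaries below $66,000: {read_ogive(points, 66):.1%}")
```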

Example 2.4
The following are the price per share (nearest $) of 30 companies listed in the Dow-Jones Industrial Average:

61, 11, 25, 24, 12, 52, 38, 69, 20, 49, 27, 72,
14, 37, 24, 107, 16, 35, 56, 27, 59, 26, 22,
14, 53, 43, 56, 29, 51, 25

Using a cell width of $10, set up a frequency table, graph the data using a histogram, a frequency polygon and an ogive, and make some observations about the distribution of the data.
Solution to 2.4

[Figure: Histogram, Share Prices of Dow Jones Companies. Horizontal axis: Share Prices (nearest $), with classes 10 to < 20 through 100 to < 110. Vertical axis: Frequency.]

[Figure: Frequency Polygon of Share Prices. Horizontal axis: Share Prices (midpoints of intervals), 15 to 105. Vertical axis: Frequency.]

[Figure: Ogive (cumulative frequency graph) of Share Prices. Horizontal axis: Share Prices (boundary points of intervals), $10 to $110. Vertical axis: Cumulative Relative Frequency, 0% to 100%.]
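
The numbers behind all three graphs for Example 2.4 come from one frequency table. The sketch below assumes classes of width $10 running from $10 to less than $110, as on the axes of the graphs; the variable names are illustrative only.

```python
# Share prices (nearest $) from Example 2.4.
prices = [61, 11, 25, 24, 12, 52, 38, 69, 20, 49, 27, 72,
          14, 37, 24, 107, 16, 35, 56, 27, 59, 26, 22,
          14, 53, 43, 56, 29, 51, 25]

lower, width, upper = 10, 10, 110
edges = list(range(lower, upper + 1, width))          # 10, 20, ..., 110

# Histogram heights: count of prices with lo <= price < lo + width.
freqs = [sum(1 for p in prices if lo <= p < lo + width) for lo in edges[:-1]]

# Frequency polygon x-values: class midpoints.
midpoints = [lo + width / 2 for lo in edges[:-1]]

# Ogive: cumulative relative frequency at each upper boundary, starting at 0.
n = len(prices)
running = 0
ogive = [(edges[0], 0.0)]
for hi, f in zip(edges[1:], freqs):
    running += f
    ogive.append((hi, running / n))

for lo, mid, f in zip(edges[:-1], midpoints, freqs):
    print(f"{lo:>3} to < {lo + width:<3}  midpoint {mid:>5.1f}  f = {f}")
```

The two empty classes (80 to < 90 and 90 to < 100) show up as zeros, which is how the gap before the single price above $100 becomes visible.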
Graphical Displays for Qualitative Data (section 2.3)

1. Bar Graph (Bar Chart)
• main form of graphical display for categorical data
• categories on the horizontal axis
• relative frequencies (or frequencies) on the vertical axis
• each bar is drawn separately (the bars do not touch, unlike in a histogram)

Example 2.5
The operations manager at a cereal packaging plant said that, in her experience, there are 9 reasons that result in the production of unacceptable cereal boxes at the end of the packaging process:

R – broken box       G – bulging box
C – cracked box      D – dirty box
H – hole in box      P – printing error
I – improper package weight
U – unreadable label
S – unsealed box top

The raw data below represent a sample of 50 unacceptable cereal boxes taken from the past week’s production:

U G U S H D D R I U S U
S U G C S U D R S U D U
S S D P R S I S U D G S
S U S D G S C U D D S S
S U

Summarize and graph these data.

Solution
Reason   Freq.   Relative Frequency
R          3          6%
G          4          8%
C          2          4%
D          9         18%
H          1          2%
I          2          4%
P          1          2%
U         12         24%
S         16         32%
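
Tallying categorical data like this is a counting exercise. A minimal Python sketch using the standard library's Counter on the raw codes from Example 2.5:

```python
from collections import Counter

# Raw reason codes for the 50 unacceptable boxes (Example 2.5).
raw = """
U G U S H D D R I U S U
S U G C S U D R S U D U
S S D P R S I S U D G S
S U S D G S C U D D S S
S U
""".split()

counts = Counter(raw)
n = len(raw)

# Frequency and relative frequency for each reason code, in the table's order.
for code in "RGCDHIPUS":
    print(f"{code}  {counts[code]:>2}  {counts[code] / n:.0%}")
```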

[Figure: Bar graph, Reasons for Unacceptable Cereal Boxes. Horizontal axis: Broken Box, Bulging Box, Cracked Box, Dirty Box, Hole in Box, Improper Weight, Printing Error, Unreadable Label, Unsealed Box Top. Vertical axis: Frequency.]

2. Pie Chart

The pie chart for example 2.5 is given below:

[Figure: Pie chart of the reasons for unacceptable cereal boxes. Slices: Unsealed Box Top 32%, Unreadable Label 24%, Dirty Box 18%, Bulging Box 8%, Broken Box 6%, Cracked Box 4%, Improper Weight 4%, Hole in Box 2%, Printing Error 2%.]

Example 2.6
The following table gives data on the blood types and the various ethnic groups in a study done in the state of Hawaii:

Ethnic Group        Number Observed   Relative Frequency
Hawaiian                  4,670
Hawaiian-White            9,982
Hawaiian-Chinese          5,385
White                   125,020
Total                   145,057

Blood Type          Number Observed   Relative Frequency
O                        62,337
A                        59,537
B                        17,604
AB                        5,579
Total                   145,057

Calculate the relative frequencies, graph the results and make some observations.

Solution

[Figure: Bar Graph of Ethnic Group for Hawaii. Horizontal axis: Ethnic Group (Hawaiian, Hawaiian-White, Hawaiian-Chinese, White). Vertical axis: Relative Frequency, 0.0% to 100.0%.]

[Figure: Bar Graph of Blood Type for Hawaii. Horizontal axis: Blood Type (O, A, B, AB). Vertical axis: Relative Frequency, 0.0% to 50.0%.]
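
The relative frequencies asked for in Example 2.6 are each count divided by the total. A small sketch, with relative_frequencies as a hypothetical helper name:

```python
# Counts from the Example 2.6 table.
ethnic_group = {"Hawaiian": 4670, "Hawaiian-White": 9982,
                "Hawaiian-Chinese": 5385, "White": 125020}
blood_type = {"O": 62337, "A": 59537, "B": 17604, "AB": 5579}


def relative_frequencies(counts):
    """Each category's count as a share of the total count."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


for name, counts in (("Ethnic group", ethnic_group), ("Blood type", blood_type)):
    print(name)
    for category, share in relative_frequencies(counts).items():
        print(f"  {category:<17} {share:6.1%}")
```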
Pie Charts vs. Bar Graphs

1. Pie charts are kind of nice to look at as part of a presentation, but they are harder to create by hand
2. In order to use a pie chart, each individual observation in your data must fall into one, and only one, category AND the pie chart must include all the categories that make up the whole
3. Pie charts are useful only when you want to emphasize each category’s relation to the whole
4. Bar graphs are easier to make than pie charts and are easier to read – they are generally better to use when graphing categorical data
5. The order in which you display the categories on the horizontal axis does not matter (although some people feel it is a good idea to arrange the bars in order of height)

Example 2.7
A survey of over 200,000 freshman students reported the following data on the sources students use to pay for the expenses of university or college:

Funding Source            Percent of Students
Family Resources          78.2%
Student Resources         62.8%
Aid – not to be repaid    70.0%
Aid – to be repaid        53.4%
Other                      6.5%

If the bars are ordered from tallest to shortest:

[Figure: Bar graph of funding sources, ordered Family Resources, Aid – not to be repaid, Student Resources, Aid – to be repaid, Other. Vertical axis: 0.0% to 90.0%.]

But the bars can be in any order:


Pareto Charts (section 2.3)

When a bar graph is ordered by arranging the bars in order of height (from tallest on the left to shortest on the right), the bar graph can be referred to as a Pareto chart.

• This is named after the Italian economist Vilfredo Pareto, who observed more than 100 years ago that most of Italy’s wealth was controlled by a few families who were the major drivers behind the Italian economy
• This notion was applied to quality control in industry
  o by using a Pareto chart, poor quality in the manufacture of goods can often be addressed by attacking a few major causes that result in most of the problems

Back to Example 2.5
Here is the bar chart of the reasons for unacceptable boxes of cereal, arranged from tallest to shortest (giving us the Pareto chart):

[Figure: Pareto chart, Reasons for Unacceptable Cereal Boxes. Horizontal axis, left to right: Unsealed Box Top, Unreadable Label, Dirty Box, Bulging Box, Broken Box, Cracked Box, Improper Weight, Hole in Box, Printing Error. Vertical axis: Percentage, 0% to 35%.]
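
Ordering the bars for a Pareto chart amounts to sorting the categories by frequency, largest first. The sketch below uses the Example 2.5 frequencies and prints a rough text bar for each category; the text bars are only a stand-in for a real chart.

```python
# Frequencies from the Example 2.5 solution table.
counts = {"Broken Box": 3, "Bulging Box": 4, "Cracked Box": 2,
          "Dirty Box": 9, "Hole in Box": 1, "Improper Weight": 2,
          "Printing Error": 1, "Unreadable Label": 12, "Unsealed Box Top": 16}

total = sum(counts.values())

# Tallest bar first. Python's sort is stable, so tied categories keep their original order.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for reason, count in pareto:
    share = count / total
    print(f"{reason:<17} {share:5.0%}  {'#' * count}")

# Share of all problems explained by the two most common causes.
top_two_share = sum(count for _, count in pareto[:2]) / total
print(f"Top two causes: {top_two_share:.0%} of unacceptable boxes")
```

This is the point of the Pareto ordering: in this sample, two of the nine reasons account for more than half of the unacceptable boxes.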
Charts and Graphs for Two Variables (section 2.4)

(A) Summarizing Two Categorical Variables – Cross Tabulation

Cross tabulation is a process for producing a two-dimensional table that displays the frequency counts for two variables simultaneously.

Example 2.8
A company has done a review of the restaurants in Toronto. A sample of 300 restaurants in the Toronto area was taken, and data was collected on the quality rating and the typical meal price.

The table below shows the data for the first 10 restaurants:

Restaurant   Quality Rating   Meal Price ($)
1            Good             18
2            Very Good        22
3            Good             28
4            Excellent        38
5            Very Good        33
6            Good             28
7            Very Good        19
8            Very Good        11
9            Very Good        23
10           Good             13
:            :                :

Note that “quality rating” is a categorical variable, while “meal price” is a quantitative variable that ranges from $10 to $49.
• We will put the meal prices into ranges (categories) of $10-19, $20-29, $30-39 and $40-49
• A cross tabulation of the data is given in the table below

                      Meal Price ($)
Quality Rating   10-19   20-29   30-39   40-49   Total
Good                42      40       2       0      84
Very Good           34      64      46       6     150
Excellent            2      14      28      22      66
Total               78     118      76      28     300
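
The cross-tabulation process can be sketched on the 10 restaurants that are actually listed (the full sample of 300 is not given in these notes, so the counts below will not match the table above). The helper name price_range is hypothetical.

```python
from collections import Counter

# First 10 restaurants from Example 2.8: (quality rating, meal price in $).
restaurants = [("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
               ("Very Good", 33), ("Good", 28), ("Very Good", 19), ("Very Good", 11),
               ("Very Good", 23), ("Good", 13)]

ratings = ["Good", "Very Good", "Excellent"]
ranges = ["10-19", "20-29", "30-39", "40-49"]


def price_range(price):
    """Put a meal price into its $10-wide category, e.g. 28 -> '20-29'."""
    lo = (price // 10) * 10
    return f"{lo}-{lo + 9}"


# Count each (rating, price range) pair; pairs that never occur count as 0.
table = Counter((rating, price_range(price)) for rating, price in restaurants)

print(f"{'':<10}" + "".join(f"{r:>7}" for r in ranges) + f"{'Total':>7}")
for rating in ratings:
    row = [table[(rating, r)] for r in ranges]
    print(f"{rating:<10}" + "".join(f"{c:>7}" for c in row) + f"{sum(row):>7}")
```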
[Figure: Bar graph, Distribution by Restaurant Rating. Horizontal axis: Good, Very Good, Excellent. Vertical axis: 0% to 60%.]

[Figure: Bar graph, Distribution by Meal Price. Horizontal axis: 10-19, 20-29, 30-39, 40-49. Vertical axis: 0.0% to 45.0%.]
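
The two distributions graphed here come from the row totals and column totals of the cross tabulation. A minimal sketch, assuming the 300-restaurant table from Example 2.8:

```python
ratings = ["Good", "Very Good", "Excellent"]
price_ranges = ["10-19", "20-29", "30-39", "40-49"]

# Cross tabulation from Example 2.8: one row per rating, one column per price range.
crosstab = [[42, 40, 2, 0],
            [34, 64, 46, 6],
            [2, 14, 28, 22]]

total = sum(sum(row) for row in crosstab)

# Row totals give the distribution by rating; column totals give it by meal price.
by_rating = {r: sum(row) / total for r, row in zip(ratings, crosstab)}
by_price = {p: sum(col) / total for p, col in zip(price_ranges, zip(*crosstab))}

for label, share in by_rating.items():
    print(f"{label:<10} {share:.1%}")
for label, share in by_price.items():
    print(f"{label:<10} {share:.1%}")
```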
(B) Scatterplots
A scatterplot is a graph of the pairs of data points for two quantitative variables.
• It is used to examine the possible relationships between two variables

Example 2.9
A study on the driving speed (in miles per hour) and fuel efficiency (in miles per gallon) for midsize automobiles resulted in the following data:

Speed (mph)   30  50  40  55  30  25  60  25  50  55
Fuel (mpg)    28  25  25  23  30  32  21  35  26  25

Graph the data and draw some conclusions.

[Figure: Scatterplot, Driving Speed vs. Gas Mileage (Midsize Automobiles). Horizontal axis: Driving Speed (miles per hour), 0 to 70. Vertical axis: Gas Mileage (mpg), 0 to 40.]
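
For a quick look without plotting software, the (speed, mileage) pairs can be placed on a character grid. This is only a rough sketch: text_scatter is a hypothetical helper, and it assumes whole-number data with speeds in steps of 5 mph, as in Example 2.9.

```python
# Data from Example 2.9.
speed = [30, 50, 40, 55, 30, 25, 60, 25, 50, 55]     # miles per hour
mpg = [28, 25, 25, 23, 30, 32, 21, 35, 26, 25]       # miles per gallon


def text_scatter(xs, ys, x_step=5, y_step=1):
    """Mark each (x, y) pair with '*' on a grid: one column per x_step, one row per y_step."""
    x_min, y_min = min(xs), min(ys)
    cols = (max(xs) - x_min) // x_step + 1
    rows = (max(ys) - y_min) // y_step + 1
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        grid[(y - y_min) // y_step][(x - x_min) // x_step] = "*"
    # Emit the largest y first so the vertical axis increases upward.
    return [f"{y_min + r * y_step:>3} |" + "".join(grid[r])
            for r in range(rows - 1, -1, -1)]


for line in text_scatter(speed, mpg):
    print(line)
```

In this sample the points run from the upper left to the lower right: the cars driven at higher speeds tend to show lower gas mileage.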
