Charts and Graphs
In Chapters 2 and 3 many techniques are presented for reformatting or reducing data so that the data are more
manageable and can be used to assist decision makers more effectively. Two techniques for grouping data are the
frequency distribution and the stem-and-leaf plot presented in this chapter. In addition, Chapter 2 discusses and
displays several graphical tools for summarizing and presenting data, including histogram, frequency polygon,
ogive, dot plot, bar chart, pie chart, and Pareto chart for one-variable data and the scatter plot for two-variable
numerical data. Raw data, or data that have not been summarized in any way, are sometimes referred to as
ungrouped data. Table 2.1 contains 60 years of raw data of the unemployment rates for Canada. Data that have
been organized into a frequency distribution are called grouped data.
Table 2.2 presents a frequency distribution for the data displayed in Table 2.1. The distinction between ungrouped
and grouped data is important because the calculation of statistics differs between the two types of data. This
chapter focuses on organizing ungrouped data into grouped data and displaying them graphically.
FREQUENCY DISTRIBUTIONS
One particularly useful tool for grouping data is the frequency distribution, which is a summary of data presented
in the form of class intervals and frequencies. How is a frequency distribution constructed from raw data? That is,
how are frequency distributions like the one displayed in Table 2.2 constructed from raw data like those presented
in Table 2.1?
Frequency distributions are relatively easy to construct. Although some guidelines and rules of thumb help in their
construction, frequency distributions vary in final shape and design, even when the original raw data are identical.
In a sense, frequency distributions are constructed according to individual business researchers' taste.
When constructing a frequency distribution, the business researcher should first determine the range of the raw
data. The range often is defined as the difference between the largest and smallest numbers. The range for the data
in Table 2.1 is 9.7 (12.0 − 2.3). The second step in constructing a frequency distribution is to determine how many
classes it will contain. One rule of thumb is to select between 5 and 15 classes. If the frequency distribution contains
too few classes, the data summary may be too general to be useful. Too many classes may result in a frequency
distribution that does not aggregate the data enough to be helpful. The final number of classes is arbitrary. The
business researcher arrives at a number by examining the range and determining a number of classes that will span
the range adequately and also be meaningful to the user. The data in Table 2.1 were grouped into six classes for
Table 2.2.
After selecting the number of classes, the business researcher must determine the width of the class interval. An
approximation of the class width can be calculated by dividing the range by the number of classes. For the data in
Table 2.1, this approximation would be 9.7/6 = 1.62. Normally, the number is rounded up to the next whole
number, which in this case is 2. The frequency distribution must start at a value equal to or lower than the lowest
number of the ungrouped data and end at a value equal to or higher than the highest number. The lowest
unemployment rate is 2.3 and the highest is 12.0, so the business researcher starts the frequency distribution at 1
and ends it at 13. Table 2.2 contains the completed frequency distribution for the data in Table 2.1. Class endpoints
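are selected so that no value of the data can fit into more than one class. The class interval expression "under" in
the distribution of Table 2.2 avoids such a problem: a value of exactly 3, for example, belongs to the class
3–under 5 and not to the class 1–under 3.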
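The width calculation and the "under" convention described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used to build Table 2.2: only the extremes 2.3 and 12.0, the choice of six classes, and the starting value of 1 come from the text, and the helper name class_index is a hypothetical one introduced here.

```python
import math

# Values quoted in the text for Table 2.1 (the table itself is not reproduced here).
lowest, highest = 2.3, 12.0
num_classes = 6  # the researcher's choice; 5 to 15 is the rule of thumb

data_range = highest - lowest             # about 9.7
approx_width = data_range / num_classes   # about 1.62
class_width = math.ceil(approx_width)     # round up to the next whole number: 2

# Start at a convenient value at or below the lowest observation.
start = 1
edges = [start + i * class_width for i in range(num_classes + 1)]


def class_index(value, edges):
    """Return the position of the class 'lower-under upper' that contains value.

    Each class includes its lower endpoint and excludes its upper endpoint,
    so no value can fall into two classes. (Hypothetical helper.)
    """
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    raise ValueError(f"{value} lies outside the frequency distribution")


for lower, upper in zip(edges, edges[1:]):
    print(f"{lower}-under {upper}")
```

With these inputs the classes run from 1-under 3 to 11-under 13, and a value of exactly 3 is assigned to the second class only.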
Class Midpoint
The midpoint of each class interval is called the class midpoint and is sometimes referred to as the class mark. It
is the value halfway across the class interval and can be calculated as the average of the two class endpoints. For
example, in the distribution of Table 2.2, the midpoint of the class interval 3–under 5 is 4, or (3 + 5)/2. The class
midpoint is important because it becomes the representative value for each class in most group statistics
calculations. The third column in Table 2.3 contains the class midpoints for all classes of the data from Table 2.2.
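As a brief sketch of the midpoint calculation, the class endpoints below are the six classes of width 2 from 1 to 13 described in the text; the variable names are illustrative.

```python
# Class endpoints as described for Table 2.2: six classes of width 2, from 1 to 13.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]

# The midpoint (class mark) is the average of the two class endpoints.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

for (lower, upper), mid in zip(classes, midpoints):
    print(f"{lower}-under {upper}: midpoint {mid}")
```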
Relative Frequency
Relative frequency is the proportion of the total frequency that is in any given class interval in a frequency
distribution. Relative frequency is the individual class frequency divided by the total frequency. For example, from
Table 2.3, the relative frequency for the class interval 5–under 7 is 13/60 = .2167. Consideration of the relative
frequency is preparatory to the study of probability in Chapter 4. Indeed, if values were selected randomly from
the data in Table 2.1, the probability of drawing a number that is 5–under 7 would be .2167, the relative
frequency for that class interval. The fourth column of Table 2.3 lists the relative frequencies for the frequency
distribution of Table 2.2.
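The relative frequency calculation can be sketched as follows. The class frequencies are an assumption pieced together from figures quoted in this section, since the table itself is not reproduced here: 4, 12, 13, and 19 are stated directly, 7 follows from the stated decrease of 12 after the 7–under 9 class, and 5 is what remains of the total of 60.

```python
# Class frequencies reconstructed from figures quoted in the text (see lead-in).
labels = ["1-under 3", "3-under 5", "5-under 7",
          "7-under 9", "9-under 11", "11-under 13"]
frequencies = [4, 12, 13, 19, 7, 5]

total = sum(frequencies)  # 60 observations

# Relative frequency = individual class frequency / total frequency.
relative = [freq / total for freq in frequencies]

for label, rel in zip(labels, relative):
    print(f"{label:<12} {rel:.4f}")
```

The relative frequencies sum to 1, which is what allows them to be read as probabilities of drawing a value from each class at random.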
Cumulative Frequency
The cumulative frequency is a running total of frequencies through the classes of a frequency distribution. The
cumulative frequency for each class interval is the frequency for that class interval added to the preceding
cumulative total. In Table 2.3 the cumulative frequency for the first class is the same as the class frequency: 4.
The cumulative frequency for the second class interval is the frequency of that interval (12) plus the frequency of
the first interval (4), which yields a new cumulative frequency of 16. This process continues through the last
interval, at which point the cumulative total equals the sum of the frequencies (60). The concept of cumulative
frequency is used in many areas, including sales cumulated over a fiscal year, sports scores during a contest
(cumulated points), years of service, points earned in a course, and costs of doing business over a period of time.
Table 2.3 gives cumulative frequencies for the data in Table 2.2.
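A running total of this kind is a one-line computation with Python's itertools.accumulate. In this sketch the first four class frequencies are those quoted in the text; the last two (7 and 5) are inferred from the stated decrease of 12 and the total of 60, so treat them as an assumption.

```python
from itertools import accumulate

# Class frequencies; the last two values are inferred (see lead-in).
frequencies = [4, 12, 13, 19, 7, 5]

# Each cumulative value is the class frequency added to the preceding cumulative total.
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The first entry equals the first class frequency, and the last entry equals the sum of all the frequencies.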
QUANTITATIVE DATA GRAPHS
One of the most effective mechanisms for presenting data in a form meaningful to decision makers is graphical
depiction. Through graphs and charts, the decision maker can often get an overall picture of the data and reach
some useful conclusions merely by studying the chart or graph. Converting data to graphics can be creative and
artful. Often the most difficult step in this process is to reduce important and sometimes expensive data to a graphic
picture that is both clear and concise and yet consistent with the message of the original data. One of the most
important uses of graphical depiction in statistics is to help the researcher determine the shape of a distribution.
Data graphs can generally be classified as quantitative or qualitative. Quantitative data graphs are plotted along a
numerical scale, and qualitative graphs are plotted using non-numerical categories. In this section, we will examine
five types of quantitative data graphs: (1) histogram, (2) frequency polygon, (3) ogive, (4) dot plot, and
(5) stem-and-leaf plot.
Histograms
One of the more widely used types of graphs for quantitative data is the histogram. A histogram is a series of
contiguous bars or rectangles that represent the frequency of data in given class intervals. If the class intervals
used along the horizontal axis are equal, then the heights of the bars represent the frequency of values in a given
class interval. If the class intervals are unequal, then the areas of the bars (rectangles) can be used for relative
comparisons of class frequencies. Construction of a histogram involves labeling the x-axis (abscissa) with the class
endpoints and the y-axis (ordinate) with the frequencies, drawing a horizontal line segment from class endpoint to
class endpoint at each frequency value, and connecting each line segment vertically from the frequency value to
the x-axis to form a series of rectangles (bars). Figure 2.1 is a histogram of the frequency distribution in Table 2.2
produced by using the software package Minitab.
A histogram is a useful tool for differentiating the frequencies of class intervals. A quick glance at a histogram
reveals which class intervals produce the highest frequency totals. Figure 2.1 clearly shows that the class interval
7–under 9 yields by far the highest frequency count (19). Examination of the histogram reveals where large
increases or decreases occur between classes, such as from the 1–under 3 class to the 3–under 5 class, an increase
of 8, and from the 7–under 9 class to the 9–under 11 class, a decrease of 12. Note that the scales used along the
x- and y-axes for the histogram in Figure 2.1 are almost identical. However, because ranges of meaningful numbers
for the two variables being graphed often differ considerably, the graph may have different scales on the two axes.
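The readings taken from the histogram can be reproduced with a rough text rendering. This sketch is not the Minitab output of Figure 2.1: text_histogram is a hypothetical helper that draws each bar as a row of "#" characters, and the last two class frequencies (7 and 5) are inferred from the stated decrease of 12 and the total of 60.

```python
labels = ["1-under 3", "3-under 5", "5-under 7",
          "7-under 9", "9-under 11", "11-under 13"]
frequencies = [4, 12, 13, 19, 7, 5]  # last two values inferred (see lead-in)


def text_histogram(labels, frequencies, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:<{width}} | {symbol * freq} ({freq})"
            for label, freq in zip(labels, frequencies)]


lines = text_histogram(labels, frequencies)
print("\n".join(lines))

# Change in frequency from each class to the next.
changes = [after - before for before, after in zip(frequencies, frequencies[1:])]
print(changes)
```

Because the class intervals are equal, bar length alone carries the comparison; with unequal intervals the bar areas would have to be compared instead.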
Frequency Polygons
A frequency polygon, like the histogram, is a graphical display of class frequencies. However, instead of using
bars or rectangles like a histogram, in a frequency polygon each class frequency is plotted as a dot at the class
midpoint, and the dots are connected by a series of line segments. Construction of a frequency polygon begins by
scaling class midpoints along the horizontal axis and the frequency scale along the vertical axis. A dot is plotted
for the associated frequency value at each class midpoint. Connecting these midpoint dots completes the graph.
Figure 2.5 shows a frequency polygon of the distribution data from Table 2.2 produced by using the software
package Excel. The information gleaned from frequency polygons and histograms is similar. As with the
histogram, changing the scales of the axes can compress or stretch a frequency polygon, which affects the user's
impression of what the graph represents.
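The points of a frequency polygon can be listed before any plotting is done. The sketch below only computes the dots and the line segments joining them; it does not reproduce the Excel chart of Figure 2.5, and the last two frequencies (7 and 5) are inferred from other figures in this section rather than quoted from the table.

```python
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
frequencies = [4, 12, 13, 19, 7, 5]  # last two values inferred (see lead-in)

# One dot per class: x is the class midpoint, y is the class frequency.
polygon_points = [((lower + upper) / 2, freq)
                  for (lower, upper), freq in zip(classes, frequencies)]

# Consecutive dots are joined by line segments.
segments = list(zip(polygon_points, polygon_points[1:]))

for x, y in polygon_points:
    print(f"midpoint {x}: frequency {y}")
```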
Ogives
An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labeling the x-axis with the class
endpoints and the y-axis with the frequencies. However, the use of cumulative frequency values requires that the
scale along the y-axis be great enough to include the frequency total. A dot of zero frequency is plotted at the
beginning of the first class, and construction proceeds by marking a dot at the end of each class interval for the
cumulative value. Connecting the dots then completes the ogive. Figure 2.6 presents an ogive produced by using
Excel for the data in Table 2.2.
Ogives are most useful when the decision maker wants to see running totals. For example, if a comptroller is
interested in controlling costs, an ogive could depict cumulative costs over a fiscal year. Steep slopes in an ogive
can be used to identify sharp increases in frequencies. In Figure 2.6, a particularly steep slope occurs in the
7–under 9 class, signifying a large jump in class frequency totals.
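The ogive construction can be sketched by pairing each class endpoint with the cumulative frequency reached there, starting from zero at the beginning of the first class. This is an illustrative computation rather than the Excel chart of Figure 2.6, and the last two class frequencies (7 and 5) are inferred from other figures in this section.

```python
from itertools import accumulate

edges = [1, 3, 5, 7, 9, 11, 13]
frequencies = [4, 12, 13, 19, 7, 5]  # last two values inferred (see lead-in)

# Zero at the start of the first class, then the running total at each class end.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(edges, cumulative))

# Slope of each segment: rise in cumulative frequency per unit of class width.
slopes = [(c1 - c0) / (x1 - x0)
          for (x0, c0), (x1, c1) in zip(ogive_points, ogive_points[1:])]

steepest = slopes.index(max(slopes))
steepest_class = (edges[steepest], edges[steepest + 1])

print(ogive_points)
print(f"steepest segment: {steepest_class[0]}-under {steepest_class[1]}")
```

The steepest segment corresponds to the class with the largest frequency, which is why steep slopes in an ogive flag sharp increases.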
Stem-and-Leaf Plots
Another way to organize raw data into groups besides using a frequency distribution is a stem-and-leaf plot. This
technique is simple and provides a unique view of the data. A stem-and-leaf plot is constructed by separating the
digits for each number of the data into two groups, a stem and a leaf. The leftmost digits are the stem and consist
of the higher valued digits. The rightmost digits are the leaves and contain the lower values. If a set of data has
only two digits, the stem is the value on the left and the leaf is the value on the right. For example, if 34 is one of
the numbers, the stem is 3 and the leaf is 4. For numbers with more than two digits, division of stem and leaf is a
matter of researcher preference.
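For two-digit data, the split into stem and leaf is simply integer division by 10 and its remainder. The sketch below uses a short list of hypothetical scores made up for illustration (not the Table 2.4 data), and stem_and_leaf is a hypothetical helper name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem), listing units digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 34 -> stem 3, leaf 4
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical scores for illustration only.
scores = [34, 86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 83, 79, 39, 72]

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")

# The original values can be rebuilt from the plot, so no raw data are lost.
rebuilt = sorted(stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves)
```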
Table 2.4 contains scores from an examination on plant safety policy and rules given to a group of 35 job trainees.
A stem-and-leaf plot of these data is displayed in Table 2.5. One advantage of such a distribution is that the
instructor can readily see whether the scores are in the upper or lower end of each bracket and also determine the
spread of the scores. A second advantage of stem-and-leaf plots is that the values of the original raw data are
retained (whereas most frequency distributions and graphic depictions use the class midpoint to represent the
values in a class).