Week 2

The document discusses various methods for graphing quantitative data, including pie charts, bar charts, line charts, dot-plots, stem and leaf plots, and relative frequency histograms. It explains how to effectively present and interpret data distributions, including identifying features such as symmetry, skewness, and outliers. Additionally, it emphasizes the importance of numerical measures for making inferences about populations based on sample data.

Dr. Modar Shbat
Division of Engineering
[email protected]
Quantitative variables measure an amount or quantity on each experimental unit. If the
variable can take only a finite or countable number of values, it is a discrete variable. A
variable that can assume an infinite number of values corresponding to points on a line
interval is called continuous.
Graph options for quantitative data:
a. Pie and bar charts
b. Line charts
c. Dot-plots
d. Stem and leaf plots
e. Relative frequency histograms
Pie and Bar Charts
Sometimes information is collected for a quantitative variable measured on different
segments of the population, or for different categories of classification. For example, we
might measure the average incomes for people of different age groups.
In such cases, you can use pie charts or bar charts to describe the data, using the
amount measured in each category rather than the frequency of occurrence of each
category.

Prob. & Stat. 3


The pie chart displays how the total quantity is distributed among the categories, and the
bar chart uses the height of the bar to display the amount in a particular category.
Example: The amount of money expended in fiscal year 2005 by the U.S. Department
of Defense in various categories is shown in the table. Construct both a pie
chart and a bar chart to describe the data.



Example (Cont.):
In the case of pie chart, for example, for the research and development category, the
angle of the sector is:
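In general, the sector angle for a category is its share of the total multiplied by 360
degrees: angle = (amount / total) x 360. A minimal Python sketch of that calculation
follows. The category names and amounts are made up for illustration and are not the
figures from the Department of Defense table; the function name sector_angles is also
an assumption.

```python
def sector_angles(amounts):
    """Return the pie-chart sector angle (in degrees) for each category."""
    total = sum(amounts.values())
    # Each category gets the same fraction of 360 degrees as its share of the total.
    return {name: 360.0 * value / total for name, value in amounts.items()}

# Hypothetical amounts (NOT the figures from the lecture's table).
spending = {"A": 50.0, "B": 30.0, "C": 20.0}
angles = sector_angles(spending)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.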

Line Charts
When a quantitative variable is recorded over time at equally spaced intervals (such as
daily, weekly, monthly, quarterly, or yearly), the data set forms a time series. Time series
data are most effectively presented on a line chart with time as the horizontal axis.

The idea is to try to discern a pattern or trend that will likely continue into the future, and
then to use that pattern to make accurate predictions for the immediate future.
Example: The United States gives projections for the portion of the U.S. population that
will be 85 and over in the coming years, as shown below. Construct a line
chart to illustrate the data. What is the effect of stretching and shrinking the
vertical axis on the line chart?

Solution:
The quantitative variable “85 and over” is measured over five time intervals, creating a
time series that you can graph with a line chart. The time intervals are marked on the
horizontal axis and the projections on the vertical axis. The data points are then
connected by line segments to form the line chart.



Example (Cont.):

Shrinking the scale on the vertical axis makes large changes appear small, while
stretching it makes small changes appear large. To avoid misleading conclusions, you
must look carefully at the scales of the vertical and horizontal axes.
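The effect of the axis scale can be put in numbers. The sketch below maps a data value
to its height on a vertical axis 100 units long and compares how far the same change
rises under two different axis ranges. The endpoints 4.0 and 6.0, the axis limits, and
the function names are all hypothetical; they are not the lecture's projections.

```python
def plotted_height(value, axis_min, axis_max, axis_length=100.0):
    """Map a data value to its height on a vertical axis of the given length."""
    return axis_length * (value - axis_min) / (axis_max - axis_min)

def visual_rise(first, last, axis_min, axis_max):
    """Vertical distance on the plot between the first and last data points."""
    return (plotted_height(last, axis_min, axis_max)
            - plotted_height(first, axis_min, axis_max))

# Hypothetical first and last values of a time series.
first, last = 4.0, 6.0

# Tight axis limits stretch the scale; wide limits shrink it.
stretched = visual_rise(first, last, axis_min=4.0, axis_max=6.0)
shrunk = visual_rise(first, last, axis_min=0.0, axis_max=40.0)
print(f"rise with stretched axis: {stretched:.1f} of 100 units")
print(f"rise with shrunk axis:    {shrunk:.1f} of 100 units")
```

The data are identical in both cases; only the axis limits differ, which is why the
scale must be read before the trend.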



Dot-plots
Many sets of quantitative data consist of numbers that cannot easily be separated into
categories or intervals of time. You need a different way to graph this type of data. The
simplest graph for quantitative data is the dot-plot.
For a small set of measurements (for example, the set 2, 6, 9, 3, 7, 6), we can simply
plot the measurements as points on a horizontal axis.

For a large data set, the dot-plot can be uninformative and tedious to interpret.
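As a rough illustration, the small set 2, 6, 9, 3, 7, 6 can be drawn as a text dot-plot
in Python. This is only a sketch: the function name dot_plot is an assumption, and the
column alignment assumes single-digit integer values.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot-plot: one column per integer value, dots stacked upward."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    rows = []
    # Draw from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[x] >= level else " " for x in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(x) for x in axis))
    return "\n".join(rows)

print(dot_plot([2, 6, 9, 3, 7, 6]))
```

The repeated value 6 shows up as a stack of two dots; every other value has one dot.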



Stem and Leaf Plots
Another simple way to display the distribution of a quantitative data set is the stem and
leaf plot. This plot presents a graphical display of the data using the actual numerical
values of each data point.

Example:
The following table lists the prices (in dollars) of 19 different brands of walking shoes.
Construct a stem and leaf plot to display the distribution of the data.



Solution:
To create the stem and leaf plot, we could divide each observation between the ones and
the tens place. The number to the left is the stem; the number to the right is the leaf.
Thus, for the shoes that cost $65, the stem is 6 and the leaf is 5.

If you indicate that the leaf unit is 1, the reader will realize that the stem and leaf 6 and
8, for example, represent the number 68.
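The stem/leaf split described above can be sketched in Python with divmod, which
separates the tens (stem) from the ones digit (leaf). The prices below are hypothetical,
since the lecture's table of 19 prices is not reproduced here, and the function name
stem_and_leaf is an assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem and leaf plot for whole numbers with leaf unit 1."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # tens and above -> stem, ones digit -> leaf
        stems[stem].append(leaf)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical prices in dollars (not the lecture's table).
prices = [65, 68, 70, 74, 75, 90, 68, 40, 95]
print("Leaf unit = 1")
for line in stem_and_leaf(prices):
    print(line)
```

Sorting first keeps the leaves in increasing order within each stem, which makes the
plot easier to read.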



Stem and Leaf Plots
Sometimes the available stem choices result in a plot that contains too few stems and a
large number of leaves within each stem. In this situation, you can stretch the stems by
dividing each one into several lines, depending on the leaf values assigned to them.
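Stems are usually divided in one of two ways:
• Into two lines, with leaves 0-4 in the first line and leaves 5-9 in the second line
• Into five lines, with leaves 0-1, 2-3, 4-5, 6-7, and 8-9 in the five lines, respectively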

Example: The data in the table are the weights at birth of 30 full-term babies, born at a
metropolitan hospital and recorded to the nearest tenth of a pound.
Construct a stem and leaf plot to display the distribution of the data.


Example (Cont.):
Solution:
The data, though recorded to an accuracy of only one decimal place, are measurements
of the continuous variable x = weight, which can take on any positive value. By looking at
the table, we can quickly see that the highest and lowest weights are 9.4 and 5.6,
respectively.

For these data, the leaf unit is 0.1, and the reader can infer that the stem and leaf 8 and
2, for example, represent the measurement x = 8.2.
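A stretched stem and leaf plot with leaf unit 0.1 might be sketched as follows. The
sketch assumes the common convention of splitting each stem into two lines (leaves 0 to
4, then leaves 5 to 9). Apart from 5.6, 8.2, and 9.4, which the text mentions, the
weights are hypothetical, and the function name is an assumption.

```python
def split_stem_and_leaf(values):
    """Stem and leaf plot with leaf unit 0.1 and each stem split into two lines."""
    tenths = sorted(round(v * 10) for v in values)   # work in integer tenths
    rows = {}
    for t in tenths:
        stem, leaf = divmod(t, 10)
        half = 0 if leaf <= 4 else 1   # first line: leaves 0-4, second line: leaves 5-9
        rows.setdefault((stem, half), []).append(leaf)
    last = max(rows)
    stem, half = min(rows)
    lines = []
    # Walk every half-stem from the lowest to the highest, keeping empty ones.
    while (stem, half) <= last:
        leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
        lines.append(f"{stem} | {leaves}".rstrip())
        stem, half = (stem, 1) if half == 0 else (stem + 1, 0)
    return lines

# Mostly hypothetical birth weights in pounds (not the lecture's 30 values).
weights = [5.6, 6.3, 6.8, 7.2, 7.2, 7.7, 8.2, 9.4]
print("Leaf unit = 0.1")
for line in split_stem_and_leaf(weights):
    print(line)
```

Converting to integer tenths before splitting avoids floating-point surprises when
extracting the leaf digit.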



Once we have created a graph for a set of data, we should look for some features to
describe the data:
• First, check the horizontal and vertical scales, so that you are clear about what is
being measured.
• Examine the location of the data distribution. Where on the horizontal axis is the
center of the distribution? If you are comparing two distributions, are they both
centered in the same place?
• Examine the shape of the distribution. Does the distribution have one “peak”? If so,
this is the most frequently occurring measurement or category. Is there more than one
peak? Are there an approximately equal number of measurements to the left and right of
the peak?
• Look for any unusual measurements or outliers, that is, any measurements much bigger
or smaller than all of the others. These outliers may not be representative of the other
values in the set.

Distributions are often described according to their shapes.



Definition: A distribution is symmetric if the left and right sides of the distribution,
when divided at the middle value, form mirror images. A distribution is skewed to the
right if a greater proportion of the measurements lie to the right of the peak value.
Distributions that are skewed right contain a few unusually large measurements.
A distribution is skewed to the left if a greater proportion of the measurements lie to
the left of the peak value. Distributions that are skewed left contain a few unusually
small measurements.

A distribution is unimodal if it has one peak; a bimodal distribution has two peaks.
Bimodal distributions often represent a mixture of two different populations in the data
set.
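The definitions above can be turned into a crude check: find the peak value, then count
how many measurements lie to its left and to its right. The sketch below does only that.
The data sets and the function name describe_shape are hypothetical, ties between peaks
(as in a bimodal distribution) are not handled, and "roughly symmetric" here means only
that the two counts are equal.

```python
from collections import Counter

def describe_shape(values):
    """Compare how many measurements lie left and right of the peak value."""
    counts = Counter(values)
    peak = max(counts, key=counts.get)   # most frequent value (first one seen if tied)
    left = sum(1 for v in values if v < peak)
    right = sum(1 for v in values if v > peak)
    if right > left:
        shape = "skewed right"
    elif left > right:
        shape = "skewed left"
    else:
        shape = "roughly symmetric"
    return peak, left, right, shape

# Hypothetical data sets.
print(describe_shape([1, 2, 2, 2, 3, 3, 4, 5, 7, 9]))
print(describe_shape([2, 3, 3, 4, 4, 4, 5, 5, 6]))
```

A count like this is no substitute for looking at the graph, but it makes the wording of
the definition concrete.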
Example: Examine the three dot-plots and describe these distributions in terms of their
locations and shapes.

Example (Cont.):
The first dot-plot shows a relatively symmetric distribution with a single peak located
at x = 4.

The second dot-plot has a long “right tail,” meaning that there are a few unusually large
observations. This distribution is skewed to the right.

Similarly, the third dot-plot with the long “left tail” is skewed to the left.



Example:
An administrative assistant for the athletics department at a local university is monitoring
the grade point averages for eight members of the volleyball team. He enters the GPAs
into the database but accidentally misplaces the decimal point in the last entry. Use a
dot-plot to describe the data and uncover the assistant’s mistake.

Solution:

We can clearly see the outlier or unusual observation caused by the assistant’s data
entry error.

After correction:
Since this is a very small set, it is difficult to describe the shape of the distribution,
although it seems to have a peak value at 3.0 and it appears to be relatively symmetric.
Note:
When comparing graphs created for two data sets, you should compare their scales of
measurement, locations, and shapes, and look for unusual measurements or outliers.
Remember that outliers are not always caused by errors or incorrect data entry.
Sometimes they provide very valuable information that should not be ignored. We may
need additional information to decide whether an outlier is a valid measurement that is
simply unusually large or small, or whether there has been some sort of mistake in the
data collection.
Relative Frequency Histograms
A relative frequency histogram resembles a bar chart, but it is used to graph quantitative
rather than qualitative data.
Example:
To present the data using a relative frequency histogram, first divide the interval from
the smallest to the largest measurements into subintervals or classes of equal length.
Example (Cont.):

If you stack up the dots in each subinterval and draw a bar over each stack, you will
have created a frequency histogram or a relative frequency histogram, depending
on the scale of the vertical axis.

Definition: A relative frequency histogram for a quantitative data set is a bar
graph in which the height of the bar shows “how often” (measured as a proportion or
relative frequency) measurements fall in a particular class or subinterval. The classes
or subintervals are plotted along the horizontal axis.



The number of classes should range from 5 to 12; the more data available, the more
classes you need. The classes must be chosen so that each measurement falls into one
and only one class.
Example (Cont.):
For the birth weights in our example, we decided to use eight intervals of equal length.
Since the total span of the birth weights is:
    9.4 - 5.6 = 3.8
the minimum class width necessary to cover the range of the data is:
    3.8 / 8 = 0.475
For convenience, we round this approximate width up to 0.5. Beginning the first interval
at the lowest value, 5.6, we form subintervals from 5.6 up to but not including 6.1, 6.1 up
to but not including 6.6, and so on. By using the method of left inclusion, and including
the left class boundary point but not the right boundary point in the class, we eliminate
any confusion about where to place a measurement that happens to fall on a class
boundary point.
To construct the relative frequency histogram, plot the class boundaries along
the horizontal axis. Draw a bar over each class interval, with height equal to the
relative frequency for that class.
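The class assignment with left inclusion can be sketched in Python as follows, using
eight classes of width 0.5 starting at 5.6. Only the values 5.6 and 9.4 come from the
lecture; the other weights are hypothetical, and the function name is an assumption.
The sketch works in integer tenths of a pound so that a value sitting exactly on a
boundary, such as 6.1, lands in the class that starts there.

```python
def relative_frequencies(values, start, width, n_classes):
    """Relative frequency per class, using left inclusion: [lower, upper)."""
    # Integer tenths keep boundary values such as 6.1 from being misclassified.
    start_t, width_t = round(start * 10), round(width * 10)
    counts = [0] * n_classes
    for v in values:
        index = (round(v * 10) - start_t) // width_t
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return [c / len(values) for c in counts]

# Mostly hypothetical birth weights in pounds; only 5.6 and 9.4 come from the lecture.
weights = [5.6, 6.1, 6.4, 7.0, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]
rel = relative_frequencies(weights, start=5.6, width=0.5, n_classes=8)
for i, r in enumerate(rel):
    lower = 5.6 + 0.5 * i
    print(f"{lower:.1f} to < {lower + 0.5:.1f}: {r:.2f}")
```

Each measurement falls into exactly one class, so the relative frequencies sum to 1;
these are the bar heights of the histogram.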


Example:
Twenty-five Starbucks® customers are polled in a marketing survey and asked, “How often
do you visit Starbucks in a typical week?” The table lists the responses for these 25
customers. Construct a relative frequency histogram to describe the data.



Example (Cont.):
The variable being measured is “number of visits to Starbucks,” which is a discrete
variable that takes on only integer values. In this case, it is simplest to choose the
classes or subintervals as the integer values over the range of observed values.
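For a discrete variable, using each integer as its own class reduces the calculation to
counting. A minimal sketch follows; the responses are hypothetical and are not the 25
values from the lecture's table, and the function name is an assumption.

```python
from collections import Counter

def integer_relative_frequencies(responses):
    """Relative frequency of each integer value across the observed range."""
    counts = Counter(responses)
    n = len(responses)
    # Include integers with no observations so gaps show up as zero-height bars.
    return {x: counts[x] / n for x in range(min(responses), max(responses) + 1)}

# Hypothetical survey responses (number of visits in a typical week).
visits = [1, 3, 4, 4, 5, 5, 5, 6, 6, 7]
freq = integer_relative_frequencies(visits)
for value, rel in freq.items():
    print(f"{value}: {rel:.2f}")
```

Keeping the unobserved integers in the result matters, because a gap in the histogram is
itself a feature of the distribution.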

We can see that the distribution is skewed to the left and that there is a gap between
1 and 3.

General Objectives
Graphs are extremely useful for the visual description of a data set. However, they are
not always the best tool when you want to make inferences about a population from the
information contained in a sample. For this purpose, it is better to use numerical
measures to construct a mental picture of the data.
SECTION INDEX
● Box plots
● Measures of center: mean, median, and mode
● Measures of relative standing: z-scores, percentiles, quartiles, and the inter-quartile
range
● Measures of variability: range, variance, and standard deviation
● Empirical Rule

Definition: Numerical descriptive measures associated with a population of
measurements are called parameters; those computed from sample
measurements are called statistics.
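The distinction can be illustrated with the mean, one of the measures listed in the
section index. In the sketch below the population is simulated and entirely
hypothetical; the seed, the population size of 1,000, and the sample size of 30 are
arbitrary choices made for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 measurements.
population = [random.gauss(50, 10) for _ in range(1000)]
sample = random.sample(population, 30)

mu = mean(population)   # parameter: computed from every population measurement
x_bar = mean(sample)    # statistic: computed from the sample only
print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In practice the whole population is rarely available, so the statistic is used to make
inferences about the unknown parameter.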
