3 Summarizing Data
Central Tendency
Kristoffer Ryan T. Gidaya, PhD, RGC
Learning Objectives
• After reading this chapter, you should be able to:
1. Distinguish between a population mean and sample mean.
2. Calculate and interpret the mean, the median, and the mode.
3. Calculate and interpret the weighted mean for two or more samples with unequal sample
sizes.
4. Identify the characteristics of the mean.
5. Identify an appropriate measure of central tendency for different distributions and scales
of measurement.
6. Compute the mean, the median, and the mode using SPSS.
INTRODUCTION TO CENTRAL TENDENCY
• Suppose before registering for a statistics class, a friend told you that
students in Professor Smith's class earned higher grades on average than
those in Professor Jones's class. On the basis of this information, you
decided to register for Professor Smith's class. You did not need to know all
the individual grades for each student in both classes to make your decision.
Instead, your decision was based on knowledge of a single score, or in this
case, a class average.
• The class average in this example is a measure of central tendency.
• Measures of central tendency are statistical measures for locating a single
score that is most representative or descriptive of all scores in a distribution.
• Measures of central tendency are single values that have a "tendency" to be
near the "center" of a distribution.
• Statistical measures of central tendency ensure that the single score
meaningfully represents a set of data.
• MEAN, MEDIAN & MODE
• Measures of central tendency are stated differently for populations and
samples.
• Calculations of central tendency are largely the same for populations and
samples of data, except for the notation used to represent population size
and sample size.
• The population size is the number of individuals who constitute an entire
group or population. The population size is represented by a capital N.
• The sample size is the number of individuals who constitute a subset of
those selected from a larger population. The sample size is represented by a
lowercase n.
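The point that the arithmetic is the same for populations and samples, with only the size notation changing, can be sketched in a few lines of Python. The scores below are hypothetical values invented for illustration; they do not come from the chapter.

```python
# Hypothetical scores, invented for illustration only.
population_scores = [4, 8, 6, 5, 7, 6]  # every member of a small population
sample_scores = [4, 6, 5]               # a subset selected from that population

N = len(population_scores)  # population size (capital N)
n = len(sample_scores)      # sample size (lowercase n)

# Same arithmetic in both cases: sum the scores, then divide by the size.
population_mean = sum(population_scores) / N
sample_mean = sum(sample_scores) / n

print(f"N = {N}, population mean = {population_mean}")
print(f"n = {n}, sample mean = {sample_mean}")
```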
MEASURES OF CENTRAL TENDENCY
Then, we divide the weighted sum by the combined sample size (n), which is computed by adding the
sample sizes in the denominator:
• The weighted mean for these samples is 63.5.
• The weighted mean is larger than the arithmetic mean (63.5 vs. 59.0) because
the larger sample (the sample of obese participants) scored higher on the
fitness measure. Hence, the value of the weighted mean shifted toward the
mean from the larger sample (or the sample with more weight).
• This makes the weighted mean an accurate statistic for computing the mean
for samples with unequal sample sizes.
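The weighted mean calculation can be sketched in Python as follows. The group means and sample sizes here are hypothetical and are not the values behind the 63.5 reported above; they only illustrate how the weighted mean shifts toward the mean of the larger sample.

```python
# Hypothetical group means and sample sizes (NOT the chapter's data).
group_means = [40.0, 50.0, 60.0]
group_sizes = [10, 10, 30]  # the third sample is the largest

# Weighted sum: multiply each mean by its sample size, then add the products.
weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))

# Combined sample size: add the sample sizes.
combined_n = sum(group_sizes)

weighted_mean = weighted_sum / combined_n

# Unweighted arithmetic mean of the three group means, for comparison.
arithmetic_mean = sum(group_means) / len(group_means)

print(f"weighted mean   = {weighted_mean}")
print(f"arithmetic mean = {arithmetic_mean}")
```

With these assumed values, the weighted mean lands above the arithmetic mean because the largest sample also has the highest mean.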
The Median
• The median is the middle value in a distribution of data listed in numeric order.
• Suppose you measure the following set of scores: 2, 3, 4, 5, 6, 6, and
100.
• The mean of these scores is 18 (add up the seven scores and divide by 7).
• Yet, the score of 100 is an outlier in this data set, which causes the mean value to increase so much that the mean fails to
reflect most of the data.
• The mean can be misleading when a data set has an outlier because the mean
will shift toward the value of that outlier. For this reason, there is a need for alternative measures of
central tendency.
• One measure is the median, which is the middle value in a distribution. The median value represents the
midpoint of a distribution of scores where half the scores in a distribution fall above and
half below its value.
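The effect of the outlier on the mean, compared with the median, can be checked with Python's standard `statistics` module. The scores are the seven listed above; dropping the score of 100 is an extra step added here only to show how far the outlier moves each measure.

```python
import statistics

scores = [2, 3, 4, 5, 6, 6, 100]

mean_with_outlier = statistics.mean(scores)      # pulled toward 100
median_with_outlier = statistics.median(scores)  # middle of the ordered scores

# Illustrative extra step: drop the outlier and recompute both measures.
scores_without_outlier = [s for s in scores if s != 100]
mean_without_outlier = statistics.mean(scores_without_outlier)
median_without_outlier = statistics.median(scores_without_outlier)

print(f"with outlier:    mean = {mean_with_outlier}, median = {median_with_outlier}")
print(f"without outlier: mean = {mean_without_outlier:.2f}, "
      f"median = {median_without_outlier}")
```

The mean drops from 18 to about 4.33 once the outlier is removed, while the median barely moves (5 versus 4.5).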
To find the median position, list a set of scores
in numeric order and compute this formula:
$$\text{Median position} = \frac{n + 1}{2}$$
• Locating the median is a little different for odd- and even-numbered sample
sizes (n).
Activity 1
• When the number of scores in a distribution is odd, order the set of scores
from least to most (or vice versa) and find the middle number. Let us find
the median for each of these lists.
a) 3, 6, 5, 3, 8, 6, 7 (n = 7)
b) 99, 66, 44, 13, 8 (n = 5)
c) 51, 55, 105, 155, 205, 255, 305, 355, 359 (n = 9)
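One way to check answers for the odd-n case is a short Python sketch that sorts the scores and applies the median position formula (n + 1) / 2. The function name `median_odd` is a hypothetical helper written for this illustration.

```python
def median_odd(scores):
    """Return the median of a list that has an odd number of scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd number of scores")
    position = (n + 1) // 2       # 1-based median position
    return ordered[position - 1]  # convert to Python's 0-based index


list_a = [3, 6, 5, 3, 8, 6, 7]
list_b = [99, 66, 44, 13, 8]
list_c = [51, 55, 105, 155, 205, 255, 305, 355, 359]

for label, scores in (("a", list_a), ("b", list_b), ("c", list_c)):
    print(f"({label}) n = {len(scores)}, median = {median_odd(scores)}")
```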
Activity 2
• When the number of scores in a distribution is even, list the scores in
numeric order and then average the middle two scores. Let us find the
median for each of these lists.
• (a) 3, 6, 5, 3, 8, 6 (n = 6)
• (b) 99, 66, 44, 13 (n = 4)
• (c) 55, 105, 155, 205, 255, 305 (n = 6)
• Notice that to find the median, we find the middle score.
• Graphically, the median can be estimated by a cumulative percent distribution.
Because the median is located in the middle of a distribution, it is
approximately at the 50th percentile of a cumulative percent distribution.
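For the even-n lists in the activity above, the median position (n + 1) / 2 falls between two scores, so the two middle scores are averaged. A minimal Python sketch of that procedure follows; `median_even` is a hypothetical helper name used only for this illustration.

```python
def median_even(scores):
    """Return the median of a list that has an even number of scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 != 0:
        raise ValueError("this sketch only handles an even number of scores")
    # The median position (n + 1) / 2 ends in .5, so the median lies between
    # the scores at 1-based positions n/2 and n/2 + 1.
    lower_middle = ordered[n // 2 - 1]
    upper_middle = ordered[n // 2]
    return (lower_middle + upper_middle) / 2


list_a = [3, 6, 5, 3, 8, 6]
list_b = [99, 66, 44, 13]
list_c = [55, 105, 155, 205, 255, 305]

for label, scores in (("a", list_a), ("b", list_b), ("c", list_c)):
    print(f"({label}) n = {len(scores)}, median = {median_even(scores)}")
```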
The Mode
• The mode is the value in a data set that occurs most often.
• One advantage of the mode is that it is simply a count; no calculations or formulas are
necessary to compute a mode.
• To find the mode, list a set of scores in numeric order and identify the score that occurs
most often.
• The following is a list of 20 golfers' scores on a difficult par-4 golf hole: 2, 3,
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, and 7. What score did these golfers
card the most (mode) on this hole?
Table 3.4 lists these scores in a frequency distribution table. From this table, it
is clear that a score of 4 (par) was carded more often than any other score: 8 of
the 20 golfers scored par on this difficult hole. Therefore, the mode, or most
common score on this hole, was 4 (par).
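The frequency count behind this result can be sketched with `collections.Counter` from Python's standard library. The printed layout is only an approximation of a frequency distribution table and is not a reproduction of Table 3.4.

```python
from collections import Counter

golf_scores = [2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7]

# Count how many golfers carded each score.
frequencies = Counter(golf_scores)

print("score  frequency")
for score in sorted(frequencies):
    print(f"{score:>5}  {frequencies[score]:>9}")

# The mode is the score with the largest frequency.
mode_score, mode_count = frequencies.most_common(1)[0]
print(f"mode = {mode_score} (carded by {mode_count} of {len(golf_scores)} golfers)")
```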
Activity 2
• A researcher recorded the number of symptoms for major depressive
disorder (MDD) expressed in a small sample of 20 "at-risk" participants: 0, 4,
3, 6, 5, 2, 3, 3, 5, 4, 6, 3, 5, 6, 4, 0, 0, 3, 0, and 1. How many symptoms of
MDD did participants in this sample most commonly express?
• First list these scores in numeric order: 0, 0, 0, 0, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5,
5, 6, 6, and 6. In doing so, we find that 3, which occurred five times, is the
mode in this data set. Participants in this "at-risk" sample most often
reported three symptoms of MDD.
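The same steps (order the scores, then find the most frequent value) can be sketched with Python's `statistics.multimode`, available in Python 3.8 and later. It returns a list, so it would also reveal a tie if a data set had more than one mode.

```python
import statistics

symptom_counts = [0, 4, 3, 6, 5, 2, 3, 3, 5, 4, 6, 3, 5, 6, 4, 0, 0, 3, 0, 1]

# Step 1: list the scores in numeric order.
ordered = sorted(symptom_counts)
print(ordered)

# Step 2: find the value or values that occur most often.
modes = statistics.multimode(symptom_counts)
mode_frequency = symptom_counts.count(modes[0])

print(f"mode(s) = {modes}, each occurring {mode_frequency} times")
```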
CHARACTERISTICS OF THE MEAN
CHOOSING AN APPROPRIATE MEASURE OF CENTRAL TENDENCY
• The choice of which measure to select depends largely on the type of
distribution and the scale of measurement of the data.
Using the Mean to Describe Data
• The mean is typically used to describe data that are normally distributed and
measured on an interval or ratio scale.
Describing Normal Distributions
• The mean is used to describe data that are approximately normally distributed. The normal
distribution is a symmetrical distribution in which scores are similarly distributed above and
below the mean, the median, and the mode at the center of the distribution.
• The general structure and approximate shape of this distribution is shown in Figure 3.2.
• For cases in which the data are normally distributed, the mean is used to summarize the data.
• We could choose to describe a normal distribution with the median or mode, but the mean is
most often used because all scores are included in its calculation (i.e., its value is most reflective
of all the data).
• The normal distribution (also called the symmetrical, Gaussian, or bell-
shaped distribution) is a theoretical distribution in which scores are
symmetrically distributed above and below the mean, the median, and the
mode at the center of the distribution.
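The claim that the mean, the median, and the mode sit together at the center of a symmetrical distribution can be illustrated with a small data set. The nine scores below are hypothetical and only roughly bell-shaped; they are not a true normal distribution, which is theoretical.

```python
import statistics

# Hypothetical symmetric scores: frequencies 1, 2, 3, 2, 1 around a center of 3.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean_value = statistics.mean(scores)
median_value = statistics.median(scores)
mode_value = statistics.mode(scores)

print(f"mean = {mean_value}, median = {median_value}, mode = {mode_value}")
```

All three measures equal 3 for this assumed data set because the scores are distributed identically above and below the center.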