
3 Summarizing Data

The document discusses different measures of central tendency including the mean, median, and mode. It provides formulas to calculate each and examples of when each measure is most appropriate based on the distribution and scale of the data. Key measures discussed are the population mean, sample mean, weighted mean, and using the appropriate central tendency for normal vs. non-normal distributions.

Uploaded by

Joevyvamae Torre
Copyright
© © All Rights Reserved

Summarizing Data: Central Tendency
Kristoffer Ryan T. Gidaya, PhD, RGC
Learning Objectives
• After reading this chapter, you should be able to:
1. Distinguish between a population mean and sample mean.
2. Calculate and interpret the mean, the median, and the mode.
3. Calculate and interpret the weighted mean for two or more samples with unequal sample
sizes.
4. Identify the characteristics of the mean.
5. Identify an appropriate measure of central tendency for different distributions and scales
of measurement.
6. Compute the mean, the median, and the mode using SPSS.
INTRODUCTION TO CENTRAL TENDENCY
• Suppose before registering for a statistics class, a friend told you that
students in Professor Smith's class earned higher grades on average than
those in Professor Jones's class. On the basis of this information, you
decided to register for Professor Smith's class. You did not need to know all
the individual grades for each student in both classes to make your decision.
Instead, your decision was based on knowledge of a single score, or in this
case, a class average.
• The class average in this example is a measure of central tendency.
• Measures of central tendency are statistical measures for locating a single
score that is most representative or descriptive of all scores in a distribution.
• Measures of central tendency are single values that have a "tendency" to be
near the "center" of a distribution.
• Statistical measures of central tendency ensure that the single score
meaningfully represents a set of data.
• MEAN, MEDIAN & MODE
• Measures of central tendency are stated differently for populations and
samples.
• Calculations of central tendency are largely the same for populations and
samples of data, except for the notation used to represent population size
and sample size.
• The population size is the number of individuals who constitute an entire
group or population. The population size is represented by a capital N.
• The sample size is the number of individuals who constitute a subset of
those selected from a larger population. The sample size is represented by a
lowercase n.
MEASURES OF CENTRAL TENDENCY

Although we use different symbols to represent the number of scores (x)
in a sample (n) versus a population (N), the computation of central
tendency is the same for samples and populations.
The Mean
The formulas for the population mean and
the sample mean are as follows.
• The population mean is the sum of N scores divided by N: $\mu = \frac{\sum x}{N}$

• The sample mean is the sum of n scores divided by n: $M = \frac{\sum x}{n}$

The mean is often referred to as the "balance
point" in a distribution. The balance point is not
always at the exact center of a distribution.
• Remember that the computation of the mean does not change for
samples and populations, just the notation used in the formula.
• To calculate the mean of a sample or a population, we do the same thing:
We sum a set of scores and divide by the number
of scores summed.
Example 1
Example 2
Activity
• A scientist records the following sample of scores (n = 6) : 3, 6, 4, 1, 10, and
12. What is the sample mean of these scores?
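One way to check the answer to this activity is a short Python sketch. It applies the rule stated above (sum the scores, then divide by the number of scores summed). The function name `sample_mean` is an illustrative choice, not something from the slides.

```python
from statistics import mean


def sample_mean(scores):
    """Sum the scores and divide by the number of scores summed."""
    return sum(scores) / len(scores)


# The n = 6 scores recorded by the scientist in the activity.
scores = [3, 6, 4, 1, 10, 12]

m = sample_mean(scores)
print(f"sum = {sum(scores)}, n = {len(scores)}, M = {m}")

# Cross-check against the standard library implementation.
assert m == mean(scores)
```

The same function works for a population; only the notation changes (N instead of n, and the population-mean symbol instead of M).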
The Weighted Mean
• The weighted mean (denoted Mw) is the combined mean of two or more groups of scores in
which the number of scores in each group is disproportionate or unequal.
• A common application of this in behavioral science is when scores are measured in two
or more samples with unequal sample sizes.
• The term disproportionate refers to the fact that some samples have more scores than
others (the samples are of disproportionate sizes).
The formula for the weighted mean for samples of
unequal size can be expressed as follows:
$M_w = \frac{\sum (M \times n)}{\sum n}$
• M represents the mean of each sample,
and n represents the size of each
sample.
• In this formula, the sample size (n) is
the weight for each mean. Using this
formula, we will compute the
combined mean for two or more
samples of scores in which the
number of scores in each sample is
disproportionate or unequal.
• Notice that the sample size for each group is not the same; more scores were
used to compute the mean for some samples than others. If we computed
the arithmetic mean, we would get the following result:
To compute the weighted mean, we find the product, M × n, for each sample. This gives us a weight for
the mean of each sample. By adding these products, we arrive at the weighted sum:

Then, we divide the weighted sum by the combined sample size (n), which is computed by adding the
sample sizes in the denominator:
• The weighted mean for these samples is 63.5.
• The weighted mean is larger than the arithmetic mean (63.5 vs. 59.0) because
the larger sample (the sample of obese participants) scored higher on the
fitness measure. Hence, the value of the weighted mean shifted toward the
mean from the larger sample (or the sample with more weight).
• This makes the weighted mean an accurate statistic for computing the mean
for samples with unequal sample sizes.
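The two steps described above (add the M × n products, then divide by the combined sample size) can be sketched in Python as follows. The sample means and sizes in this sketch are hypothetical values chosen for illustration; they are not the data behind the 63.5 and 59.0 figures, which are not reproduced in this text.

```python
def weighted_mean(means, sizes):
    """Combine sample means, weighting each mean by its sample size."""
    # Step 1: weighted sum, the sum of M x n over all samples.
    weighted_sum = sum(m * n for m, n in zip(means, sizes))
    # Step 2: combined sample size, the sum of all n.
    combined_n = sum(sizes)
    return weighted_sum / combined_n


# Hypothetical example: three samples, the largest one scoring highest.
means = [50.0, 60.0, 70.0]
sizes = [10, 10, 30]

arithmetic = sum(means) / len(means)   # ignores sample sizes
weighted = weighted_mean(means, sizes)  # shifts toward the larger sample

print(f"arithmetic mean = {arithmetic}, weighted mean = {weighted}")
```

With these made-up numbers the weighted mean is above the arithmetic mean because the sample with the most scores also has the highest mean, which mirrors the pattern described in the bullets above.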
The Median
• The median is the middle value in a distribution of data listed in numeric order.
• Suppose you measure the following set of scores: 2, 3, 4, 5, 6, 6, and
100.
• The mean of these scores is 18 (add up the seven scores and divide by 7).
• Yet, the score of 100 is an outlier in this data set, which causes the mean value to increase so much that the mean fails to
reflect most of the data.
• The mean can be misleading when a data set has an outlier because the mean
will shift toward the value of that outlier. For this reason, there is a need for alternative measures of
central tendency.
• One measure is the median, which is the middle value in a distribution. The median value represents the
midpoint of a distribution of scores where half the scores in a distribution fall above and
half below its value.
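The effect of the outlier can be seen by computing both measures for the seven scores above. This is a minimal sketch using Python's standard `statistics` module; dropping the outlier in the second half is an added illustration, not a step from the slides.

```python
from statistics import mean, median

# The seven scores from the example, including the outlier 100.
scores = [2, 3, 4, 5, 6, 6, 100]

print("mean with outlier:  ", mean(scores))
print("median with outlier:", median(scores))

# Illustration only: the same data with the outlier removed.
without_outlier = [x for x in scores if x != 100]

print("mean without outlier:  ", mean(without_outlier))
print("median without outlier:", median(without_outlier))
```

The mean moves a long way when the outlier is removed, while the median barely changes, which is why the median is the better summary for this data set.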
To find the median position, list a set of scores
in numeric order and compute this formula:

$\text{Median position} = \frac{n + 1}{2}$

• Locating the median is a little different for odd- and even-numbered sample
sizes (n).
Activity 1
• When the number of scores in a distribution is odd, order the set of scores
from least to most (or vice versa) and find the middle number. Let us find
the median for each of these lists.
a) 3, 6, 5, 3, 8, 6, 7 (n = 7)
b) 99, 66, 44, 13, 8 (n = 5)
c) 51, 55, 105, 155, 205, 255, 305, 355, 359 (n = 9)
Activity 2
• When the number of scores in a distribution is even, list the scores in
numeric order and then average the middle two scores. Let us find the
median for each of these lists.
• (a) 3, 6, 5, 3, 8, 6 (n = 6)
• (b) 99, 66, 44, 13 (n = 4)
• (c) 55, 105, 155, 205, 255, 305 (n = 6)
• Notice that to find the median, we find the middle score.
• Graphically, the median can be estimated by a cumulative percent distribution.
Because the median is located in the middle of a distribution, it is
approximately at the 50th percentile of a cumulative percent distribution.
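The odd and even cases from the two activities can be sketched with one Python function built on the median position (n + 1) / 2. The function name `median_by_position` is an illustrative choice; the lists are the ones given in the activities.

```python
def median_by_position(scores):
    """Find the median using the median position (n + 1) / 2."""
    ordered = sorted(scores)          # list the scores in numeric order
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position of the median
    if n % 2 == 1:
        # Odd n: the position is a whole number; take that score.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two scores,
    # so average the scores just below and just above it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_lists = [
    [3, 6, 5, 3, 8, 6, 7],
    [99, 66, 44, 13, 8],
    [51, 55, 105, 155, 205, 255, 305, 355, 359],
]
even_lists = [
    [3, 6, 5, 3, 8, 6],
    [99, 66, 44, 13],
    [55, 105, 155, 205, 255, 305],
]

for scores in odd_lists + even_lists:
    print(f"n = {len(scores)}: median = {median_by_position(scores)}")
```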
The Mode
• The mode is the value in a data set that occurs most often or most frequently.
• One advantage of the mode is that it is simply a count; no calculations or formulas are
necessary to compute a mode.
• To find the mode, list a set of scores in numeric order and count the score that occurs
most often.
• The following is a list of 20 golfers' scores on a difficult par-4 golf hole: 2, 3,
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, and 7. What score did these golfers
card the most (mode) on this hole?
Table 3.4 lists these scores in a frequency distribution table. From this table, it
is clear that most golfers scored a par 4 on this difficult hole. Therefore, the
mode, or most common score on this hole, was par.
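A frequency distribution for the 20 golf scores can be tallied with a few lines of Python. This sketch only counts the scores listed above; it is not a reproduction of Table 3.4, and the helper name `frequency_table` is an illustrative choice.

```python
from collections import Counter

# The 20 golfers' scores listed above.
golf_scores = [2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7]


def frequency_table(scores):
    """Return (score, frequency) pairs in ascending score order."""
    return sorted(Counter(scores).items())


table = frequency_table(golf_scores)

print("score  frequency")
for score, freq in table:
    print(f"{score:>5}  {freq:>9}")

# The mode is the score with the largest frequency.
mode_score, mode_freq = max(table, key=lambda pair: pair[1])
print(f"mode = {mode_score} (occurs {mode_freq} times)")
```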
Activity 2
• A researcher recorded the number of symptoms for major depressive
disorder (MDD) expressed in a small sample of 20 "at-risk" participants: 0, 4,
3, 6, 5, 2, 3, 3, 5, 4, 6, 3, 5, 6, 4, 0, 0, 3, 0, and 1. How many symptoms of
MDD did participants in this sample most commonly express?
• First list these scores in numeric order: 0, 0, 0, 0, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5,
5, 6, 6, and 6. In doing so, we find that 3, which occurred five times, is the
mode in this data set. Participants in this "at-risk" sample most often
reported three symptoms of MDD.
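The count described in this activity can be checked with a short Python sketch. The function `modes` is an illustrative helper; it returns every value tied for the highest frequency, which is an added design choice so that a data set with more than one mode is not silently reduced to one value.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent value(s) and how often they occur."""
    counts = Counter(scores)
    top = max(counts.values())
    most_frequent = sorted(v for v, c in counts.items() if c == top)
    return most_frequent, top


# Symptom counts for the 20 "at-risk" participants, in the order recorded.
mdd_symptoms = [0, 4, 3, 6, 5, 2, 3, 3, 5, 4, 6, 3, 5, 6, 4, 0, 0, 3, 0, 1]

print("ordered:", sorted(mdd_symptoms))
values, frequency = modes(mdd_symptoms)
print(f"mode(s) = {values}, frequency = {frequency}")
```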
CHARACTERISTICS OF THE MEAN
CHOOSING AN APPROPRIATE MEASURE OF CENTRAL TENDENCY
• The choice of which measure to select depends largely on the type of
distribution and the scale of measurement of the data.
Using the Mean to Describe Data

• The mean is typically used to describe data that are normally distributed and
measured on an interval or ratio scale.
Describing Normal Distributions

• The mean is used to describe data that are approximately normally distributed. The normal
distribution is a symmetrical distribution in which scores are similarly distributed above and
below the mean, the median, and the mode at the center of the distribution.
• The general structure and approximate shape of this distribution is shown in Figure 3.2.
• For cases in which the data are normally distributed, the mean is used to summarize the data.
• We could choose to describe a normal distribution with the median or mode, but the mean is
most often used because all scores are included in its calculation (i.e., its value is most reflective
of all the data).
• The normal distribution (also called the symmetrical, Gaussian, or bell-shaped
distribution) is a theoretical distribution in which scores are
symmetrically distributed above and below the mean, the median, and the
mode at the center of the distribution.
