STATS
Table of Contents
● Introduction to Statistics
● Measurement Scales
● Central Tendency
● Standard Scores and the Normal Curve
● Correlation Analysis
● Short Quiz
Make a copy of this Sheets file for some additional information regarding the topics.
Data – the measurements that are made on the participants in a study
RATIO SCALES
● The highest level of measurement.
● It has all the properties of an interval scale and, in addition, has an absolute zero point.
○ By contrast, temperature in degrees Celsius is only an interval scale: while the
difference between 8 and 9 degrees Celsius is the same as the difference between
99 and 100 degrees, a reading of 20 degrees Celsius is not twice as hot as 10
degrees Celsius.
○ Zero degrees Celsius is an arbitrarily set zero point; it corresponds to about
273 kelvins rather than to a true absence of temperature.
LESSON 2.2: CONTINUOUS vs. DISCRETE VARIABLES
Continuous Variable
● One that theoretically can have an infinite number of values between adjacent
units on the scale.
● Ex. weight, height, and time
LESSON 3: MEASURES OF CENTRAL TENDENCY
● The three most used measures of central tendency are the arithmetic mean, the
median, and the mode.
THE MEAN
● The arithmetic mean is defined as the sum of the scores divided by the number
of scores.
Formula:
★ $\bar{x} = \frac{\sum x}{n}$
Google Sheets / Excel Command:
★ =AVERAGE(value)
Properties of the Mean
1. Changing a Score: Changing the value of any score in a distribution will
change the mean.
2. Introducing a New Score or Removing a Score: If you add a new score or take
away a score, both ∑X and N will change. This does not guarantee, however, that
the mean will change.
3. Adding or Subtracting a Constant: When a constant is added to or subtracted
from every score, the mean changes by that same constant.
○ For example, adding 5 to each of the scores 2, 4, 6 (mean = 4) gives 7, 9, 11
(mean = 9).
THE OVERALL MEAN
● Given that three distributions have the same number of scores (e.g., n₁=20,
n₂=20, and n₃=20), with the following means: Mean₁ = 60; Mean₂ = 50; Mean₃ = 40
● Because the number of scores per distribution is the same, the overall mean is
the simple average of the three means:
(60 + 50 + 40) / 3 = 150 / 3 = 50
● However, with an unequal number of scores per distribution (e.g., n₁=10, n₂=60,
and n₃=30), each mean must be weighted by its number of scores:
Overall Mean = (n₁ × Mean₁ + n₂ × Mean₂ + n₃ × Mean₃) / (n₁ + n₂ + n₃)
THE MEDIAN
● Definition: The median (symbol Mdn) is defined as the scale value below which
50% of the scores fall.
● Among grouped data, it is therefore the same thing as P50.
Formula:
★ $\tilde{x} = X_{(n+1)/2}$, that is, the score in position (n+1)/2 once the
scores are arranged in order.
THE MODE
● The third and last measure of central tendency that we shall discuss is the mode.
● The mode is defined as the most frequent score in the distribution.
● Clearly, this is the easiest of the three measures to determine. The mode is
found by inspection of the scores; no calculation is necessary.
○ For instance, to find the mode of the data in Table 3.2, all we need to do is
search the frequency column. The mode for these data is 76. With grouped scores,
the mode is designated as the midpoint of the interval with the highest frequency.
… bimodal distribution are shown in Figure 4.2.
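The three measures of central tendency and the overall mean can be checked with a few lines of Python. This is only a sketch: the `scores` list is made-up illustration data (not Table 3.2), `overall_mean` is a hypothetical helper name, and the unequal-n case assumes the same three means (60, 50, 40) as the equal-n case.

```python
from statistics import mean, median, multimode

# Hypothetical scores, chosen only for illustration (not from the reviewer's tables).
scores = [70, 72, 76, 76, 76, 80, 85]

print(mean(scores))       # sum of the scores divided by the number of scores
print(median(scores))     # middle score of the sorted list
print(multimode(scores))  # most frequent score(s); a list, because ties are possible


def overall_mean(means, sizes):
    """Weighted overall mean: sum of (group mean x group n) divided by total n."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


# Equal group sizes: same result as the simple average of the three means.
print(overall_mean([60, 50, 40], [20, 20, 20]))  # 50.0

# Unequal group sizes, ASSUMING the same three means as above.
print(overall_mean([60, 50, 40], [10, 60, 30]))  # 48.0
```

With unequal group sizes the result (48) is pulled toward the mean of the largest group, which is why the simple average of the three means (50) no longer applies.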
EXERCISE #1
Calculate the standard deviation of the scores contained in the first column of
the following table:
Note: For our purposes, answers should be rounded to three decimal places.
Introduction to z-scores
● Now we shift attention to the individual scores within a distribution.
● Computing z-scores is a statistical technique that uses the mean and the
standard deviation to transform each score (X value) into a z-score, or a
standard score.
● The purpose of z-scores, or standard scores, is to identify and describe the
exact location of each score in a distribution.
EXAMPLE ON THE USE OF Z-SCORES
● Suppose you received a score of X = 76 on
a statistics exam. How did you do? It should
be clear that you need more information to
predict your grade. Your score of X = 76
could be one of the best scores in the class,
or it might be the lowest score in the
distribution.
● To find the location of your score, you must
have information about the other scores in
the distribution. It would be useful, for
example, to know the mean for the class. If
the mean were 70, you would be in a much
better position than if the mean were 85.
Obviously, your position relative to the rest
of the class depends on the mean.
● However, the mean by itself is not sufficient
to tell you the exact location of your score.
Suppose you know that the mean for the
statistics exam is 70 and your score is X = 76.
● At this point, you know that your score is 6 points above the mean, but you
still do not know exactly where it is located. Six points may be a relatively big
distance and you may have one of the highest scores in the class, or 6 points may
be a relatively small distance and you may be only slightly above the average.
● Figure 5.2 shows two possible distributions of exam scores. Both distributions
have a mean of 70, but for one distribution, the standard deviation is 3, and for
the other, SD = 12. The location of X = 76 is highlighted in each of the two
distributions.
● When the standard deviation is 3, your score of X = 76 is in the extreme
right-hand tail, the highest score in the distribution. However, in the other
distribution, where the SD is 12, your score is only slightly above average.
● Thus, the relative location of your score within the distribution depends on
the standard deviation as well as the mean.
The preceding example demonstrates that a score by itself does not provide much
information about its position in the distribution. To make these raw scores
meaningful, we need to convert them to standard scores. One such standard score
is a z-score.
● A z-score is a transformed score that designates how many standard deviation
units the corresponding raw score is above or below the mean.
Z-SCORES AND LOCATION IN A DISTRIBUTION
● One of the primary purposes of a z-score is to describe the exact location of a
score within a distribution. The z-score accomplishes this goal by transforming
each X value into a signed number (+ or –) so that
○ The sign tells whether the score is located above (+) or below (–) the mean, and
○ The number tells the distance between the score and the mean in terms of the
number of standard deviations.
● Thus, in a distribution of IQ scores with mean = 100 and SD = 15, a score of
X = 130 would be transformed into z = 2.00. The z value indicates that the score
is located above the mean (+) by a distance of 2 standard deviations (30 points).
● Question: What are the corresponding z-scores for the given example above (in
Figure 5.2)?
● By rearranging the same z-score formula, it becomes possible to solve for its
other parts (the raw score, the mean, or the standard deviation).
… 6 points.
Step 3: Identify the X value.
● The value we want is located below the mean by 6 points. The mean is μ = 60, so
the score must be X = 54.
CHARACTERISTICS OF A NORMAL CURVE
1. It is bell-shaped and symmetrical.
2. The mean, median, and mode are equal in a normal distribution.
3. The greatest concentration of scores is situated at the center of the
distribution.
4. It is asymptotic (the tails extend infinitely towards both ends without
touching the abscissa, or x-axis).
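The z-score transformation described in the z-score sections can be sketched in Python. The formula itself is not written out in this reviewer, so the sketch assumes the standard definition z = (X − mean) / SD, which matches the definition of "how many standard deviation units the raw score is above or below the mean". The function names `z_score` and `raw_score` are hypothetical, chosen only for this illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Rearranged formula: recover the raw score X from a z-score."""
    return mean + z * sd


# The exam example: X = 76 with a mean of 70 under two different standard deviations.
print(z_score(76, 70, 3))     # 2.0 -> far above the mean
print(z_score(76, 70, 12))    # 0.5 -> only slightly above the mean

# The IQ example: mean = 100, SD = 15, X = 130.
print(z_score(130, 100, 15))  # 2.0

# Working backwards from a z-score to a raw score (mean = 80, SD = 20, z = -0.20).
print(raw_score(-0.20, 80, 20))
```

The same six-point distance above the mean gives very different z-scores depending on the standard deviation, which is the point of the Figure 5.2 comparison.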
LESSON 5: CORRELATION ANALYSIS
● Regression – describes how an independent variable is numerically related to
the dependent variable.
● Correlation – measures co-relationship or association of two variables.
Why Correlate?
● We would like to see whether two variables are “related.”
○ What happens to a second variable when the first variable “increases” or
“decreases”?
Example:
● Are interview scores of applicants (variable 1) related to their performance
appraisal scores after six months (variable 2)?
○ If there is a significant relationship between interview scores and subsequent
performance appraisal, can we use interview scores to predict employee performance?
● Is there a relationship between USTET scores (variable 1) and students’ GPA
(variable 2)?
○ If there is a significant relationship between USTET scores and students’ GPA,
can we use USTET scores to predict scholastic success of students?
● Is there a relationship between employees’ organizational commitment
(variable 1) and their work performance (variable 2)?
○ If there is a significant relationship …
Issues to remember:
1. Finding a relationship between two variables allows us to make predictions
(i.e., the first variable “predicts” how the second variable will turn out); and
2. Even when two variables are related, we cannot fully say that one variable
“caused” the other to happen.
CORRELATION
● Correlation is a statistical technique that is used to measure and describe the
relationship between two variables.
● Usually, the two variables are simply observed as they exist naturally in the
environment; there is no attempt to control or manipulate the variables.
● Example: a researcher could check high school records (with permission) to
obtain a measure of each student’s academic performance, and then survey each
family to obtain a measure of income.
○ The resulting data could be used to determine whether there is a relationship
between high school grades and family income.
CHARACTERISTICS OF A RELATIONSHIP
1. The Direction of the Relationship – the sign of the correlation, positive or
negative, describes the direction of the relationship.
● In a positive correlation, the two variables tend to change in the same
direction:
○ As the value of the X variable increases, the Y variable also tends to increase;
○ When the X variable decreases, the Y variable also decreases.
● In a negative correlation, the two variables tend to go in opposite directions.
○ As the X variable increases, the Y variable decreases. That is, it is an
inverse relationship.
… Relationship – this is measured by the numerical value of the correlation.
● A perfect correlation is always identified by a correlation of 1.00 (or –1.00)
and indicates a perfectly consistent relationship.
○ For a correlation of 1.00 (or –1.00), each change in X is accompanied by a
perfectly predictable change in Y.
● At the other extreme, a correlation of 0 indicates no consistency at all.
○ For a correlation of 0, the data points are scattered randomly with no clear
trend.
● Intermediate values between 0 and 1 indicate the degree of consistency.
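The direction and size of a correlation can be illustrated with a short Python sketch. This reviewer does not state which correlation coefficient it uses, so the sketch assumes the Pearson correlation (r); the helper name `pearson_r` and all data values are hypothetical and chosen only so that the results are easy to verify by hand.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: SP / sqrt(SS_x * SS_y), computed from deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (how X and Y vary together).
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (how X and Y vary separately).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]  # made-up X values

print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  -> perfect positive relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 -> perfect negative (inverse) relationship
print(pearson_r(x, [1, 3, 2, 3, 1]))   # 0.0  -> no consistent linear trend
```

The sign of the result gives the direction of the relationship, and the distance from 0 toward 1.00 (or –1.00) gives its consistency.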
CONGRATS, BESTIE!
THIS IS THE END OF THE REVIEWER.
GOOD LUCK :)
QUIZ
1. A researcher conducted an experiment involving three groups of subjects. The
mean of the first group is 75, and there were 50 subjects in the group. The mean
of the second group is 80, and there were 40 subjects. The third group has a mean
of 70 and 25 subjects. Calculate the overall mean of the three groups combined.
Mean            n      Mean x n
75              50     ?
80              40     ?
70              25     ?
Total           ?      ?
Overall Mean           ?
2. Calculate the range for the following distribution: 18, 12, 28, 15, 20
3. Identify the level of measurement of the following variable: Eye color (e.g.,
blue, brown, green).
4. Calculate the range for the following distribution: 115, 107, 105, 109, 101
5. Identify the level of measurement of the following variable: Ranking of
students in a class (1st, 2nd, 3rd, etc.)
6. Calculate the range for the following distribution: 1.2, 1.3, 1.5, 1.8, 2.3
7. Identify the level of measurement of the following variable: Years of birth.
8. For a sample with a mean of 40 and a standard deviation of 12, find the z-score
corresponding to X = 43.
9. Identify the level of measurement of the following variable: Number of
siblings.
10. In a positively skewed distribution, the tail is longer on the:
a. Left side
b. Right side
c. Neither side
11. For a sample with a mean of 80 and a standard deviation of 20, find the
X-value corresponding to z = -0.20.
12. If a distribution has a long tail to the left, it is said to be:
a. Positively skewed
b. Negatively skewed
c. Symmetrical
13. For a sample with a mean of 85, a score of X = 80 corresponds to z = -1.00.
What is the standard deviation for the sample?
14. Which of the following shapes best represents a symmetrical distribution?
a. A bell curve
b. A J-curve
c. A U-curve
d. A skewed curve
15. For a sample with a standard deviation of 12, a score of X = 83 corresponds
to z = 0.50. What is the mean for the sample?
16. Correlation measures the:
a. Difference between two variables
b. Relationship between two variables
c. Average of two variables
17. Calculate the standard deviation of the scores: 25, 28, 35, 37, 38, 40, 42,
45, 47, 50.
18. Would ‘IQ and grade point average for high school students’ have a positive
or negative correlation?
19. Would ‘Daily high temperature and daily energy consumption for 30 winter days
in Iceland’ have a positive or negative correlation?
20. The "direction" of a correlation refers to:
a. How strong the relationship is.
b. Whether the relationship is positive or negative.
c. The shape of the data points on a scatterplot.
21. The "strength" of a correlation refers to:
a. Whether the relationship is positive or negative.
b. How closely the data points follow a straight line.
c. The direction of the line on a scatterplot.
22. The "form" of a correlation typically refers to:
a. Whether the relationship is linear or nonlinear.
b. How strong the relationship is.
c. Whether the relationship is positive or negative.
23. A scatterplot showing data points scattered randomly with no clear pattern
indicates:
a. A strong positive correlation.
b. A strong negative correlation.
c. A weak or no correlation.
24. Would ‘Model year and price for a used Honda’ have a positive or negative
correlation?
25. Which of the following indicates a continuous variable?
a. Number of hours per day
b. Height of a toddler
c. Number of female students in a block
26. The data points would be clustered more closely around a straight line for a
correlation of -0.80 than for a correlation of 0.05. True or false?
27. If the data points are clustered close to a line that slopes up from left to
right, then a good estimate of the correlation would be 0.90. True or false?
28. If a scatter plot shows a set of data points that form a circular pattern,
the correlation should be near zero. True or false?
29. In a positive correlation, we can say that as the value of the X variable
increases, the Y variable tends to decrease. True or false?
30. A measurement procedure is considered reliable to the extent that it produces
unstable and inconsistent measurements. True or false?
ANSWER KEY
1.
Mean            n      Mean x n
75              50     3750
80              40     3200
70              25     1750
Total           115    8700
Overall Mean           75.652
2. 28 - 12 + 1 = 17
3. Nominal
4. 115 - 101 + 1 = 15
5. Ordinal
6. 2.3 - 1.2 + 1 = 2.1
7. Interval
8. 0.25
9. Ratio
10. b. Right side
11. 76
12. b. Negatively skewed
13. 5
14. a. A bell curve
15. 77
16. b. Relationship between two variables
17.
18. positive correlation
19. negative correlation
20. b. Whether the relationship is positive or negative
21. b. How closely the data points follow a straight line
22. a. Whether the relationship is linear or nonlinear
23. c. A weak or no correlation
24. positive correlation
25. b. Height of a toddler
26. true
27. true
28. true
29. false
30. false