
STATS

The document provides an introduction to statistics, covering key concepts such as measurement scales, central tendency, and variability. It explains different types of measurement scales (nominal, ordinal, interval, and ratio) and their implications for data analysis. Additionally, it discusses measures of central tendency (mean, median, mode) and variability (range, standard deviation, variance), along with their applications in descriptive and inferential statistics.

Prepared by:

Aelisha Vargas, 2BES2


Althea Verzosa, 2BES2

Table of Contents
● Introduction to Statistics
● Measurement Scales
● Central Tendency
● Standard Scores and the Normal Curve
● Correlation Analysis
● Short Quiz
Make a copy of this SHEETS for some additional information regarding the topics.

LESSON 1: INTRODUCTION TO STATISTICS/DESCRIPTIVE STATISTICS

STATISTICS
● A set of mathematical procedures for organizing, summarizing, and interpreting information.

Key Terms in Statistics
Population – the complete set of individuals, objects, or scores that the investigator is interested in studying
Sample – a subset of the population (main issue: representativeness)
Variable – any property or characteristic of some event, object, or person that changes, or may have different values in certain conditions
Constant – any characteristic or condition that does not change, or has similar values for all individuals
Independent Variable – the variable in a study that is manipulated or systematically varied by the investigator
Dependent Variable – the variable that the investigator measures to determine the effect of the independent variable
Data – the measurements that are made on the participants in a study
Statistic – a number calculated on sample data that quantifies a characteristic of the sample
Parameter – a number calculated on population data that quantifies a characteristic of the population

DESCRIPTIVE STATISTICS
● involves analyzing data for the purpose of describing or characterizing the data
○ Measures of Central Tendency
○ Measures of Variability
○ Frequency Distribution
○ Percentage and Tallies
○ Kurtosis and Skewness

INFERENTIAL STATISTICS
● consists of techniques that allow us to use obtained sample data to make inferences or draw conclusions about populations
○ Correlational techniques
○ Tests of differences (t-test, ANOVA)
○ Chi-square tests

LESSON 2.1: MEASUREMENT SCALES
● Being aware of scales of measurement is important because the type of measuring scale determines the statistical tool that will be used in data analysis.
● Attributes of measurement scales:
○ Magnitude
○ An equal interval between adjacent units
○ An absolute zero point

Levels of Measurement
Nominal: named variables
Ordinal: named; ordered variables
Interval: named; ordered; proportionate interval between variables
Ratio: named; ordered; proportionate interval between variables; can accommodate absolute zero

NOMINAL SCALES
● The lowest level of measurement, and is most often used with variables that are qualitative in nature.
● Examples of qualitative variables:
○ Brands of shoes
○ Kinds of fruit
○ Types of music
○ Days of the month
○ Nationality
○ Religion
○ Eye color
● These categories comprise the “units” of the scale, and objects are “measured” by determining the category to which they belong. Thus, measurement with a nominal scale really amounts to classifying the objects and giving them the name of the category to which they belong.
● There is no magnitude relationship between the units of a nominal scale, because these units are categories.
○ Counting the instances of a category (quantitative comparison of the categories) is different. There is still no magnitude relationship between the units of a nominal scale; however, the frequencies from the comparison allow us to compare the number of individuals under each category.
● Equivalence – a fundamental property of nominal scales meaning that all members of a given class are the same from the standpoint of the classification variable.
● A nominal scale, therefore, does not possess any of the mathematical attributes of magnitude, equal interval, or absolute zero point.

ORDINAL SCALES
● Represents the next higher level of measurement than the nominal scale.
● It possesses a relatively low level of the property of magnitude.
● We can rank-order the objects being measured according to whether they possess more, less, or the same amount of the variable being measured.
● Although this scale allows better than, equal to, or less than comparisons, it says nothing about the magnitude of the differences between adjacent units on the scale. Thus, it also does not have the property of equal intervals between adjacent units.
● Since all we have are relative rankings, the scale doesn’t tell the absolute level of the variable.

INTERVAL SCALES
● Represents the next higher level of measurement than the ordinal scale.
● It possesses the properties of magnitude and equal interval between adjacent units but doesn’t have an absolute zero point.
● Example:
○ Celsius scale of temperature.
○ The difference between 45 and 50 degrees Celsius is the same as the difference between 55 and 60 degrees Celsius. However, zero degrees Celsius does not mean the absence of heat/temperature. In this case, the zero is not a “true zero,” but it is arbitrarily set.

RATIO SCALES
● The highest level of measurement.
● It has all the properties of an interval scale and, in addition, has an absolute zero point.
○ While the difference between 8 and 9 degrees Celsius is the same as the difference between 99 and 100 degrees, a reading of 20 degrees Celsius is not twice as hot as 10 degrees Celsius, because the Celsius scale is an interval scale and not a ratio scale.
○ Zero degrees Celsius is an arbitrarily set zero that corresponds to about 273 kelvins (273.15 K) on the Kelvin scale, which does have an absolute zero.

LESSON 2.2: CONTINUOUS vs. DISCRETE VARIABLES
Continuous Variable
● One that theoretically can have an infinite number of values between adjacent units on the scale.
● Ex. weight, height, and time
Discrete Variable
● One in which there are no possible values between adjacent units on the scale.
● Ex. Number of children in the family

LESSON 3: DESCRIPTIVE STATISTICS: MEASURES OF CENTRAL TENDENCY AND VARIABILITY
● Frequency distributions, by themselves, do not allow quantitative statements that characterize the distribution as a whole to be made, nor do they allow quantitative comparisons to be made between two or more distributions.
● It is also important to quantify the extent to which scores in distributions are dispersed, or spread out.
● These requirements are satisfied using measures of central tendency and measures of variability.

LESSON 3: MEASURES OF CENTRAL TENDENCY
● The three most used measures of central tendency are the arithmetic mean, the median, and the mode.

THE MEAN
● The arithmetic mean is defined as the sum of the scores divided by the number of scores.

Properties of the Mean
1. Changing a Score: Changing the value of any score in a distribution will change the mean.
2. Introducing a New Score or Removing a Score: If you add a new score or take away a score, both ∑X and N will change. It does not guarantee, however, that a change in the mean will take place.
3. Adding or Subtracting a Constant: When a constant is added to or subtracted from every score, the mean changes in exactly the same way.
○ For example, adding 5 to every score raises the mean by 5.
4. Multiplying or Dividing by a Constant: When every score is multiplied or divided by a constant, the mean changes in exactly the same way.
○ For example, doubling every score doubles the mean.

Google Sheets / Excel Command:
★ =AVERAGE(value)
Formula:
★ $\bar{x} = \frac{\sum x}{n}$

THE OVERALL MEAN
● Given that three distributions have the same number of scores (i.e., n₁=20, n₂=20, and n₃=20), with the following means: Mean₁ = 60; Mean₂ = 50; Mean₃ = 40
● Because the number of scores per distribution is the same, then:
(60 + 50 + 40) / 3 = 150 / 3 = 50
● However, with unequal number of scores per distribution (i.e., n₁=10, n₂=60, and n₃=30) and the same three means, each mean must be weighted by its number of scores:
(10 × 60 + 60 × 50 + 30 × 40) / (10 + 60 + 30) = 4800 / 100 = 48

THE MEDIAN
● Definition: The median (symbol Mdn) is defined as the scale value below which 50% of the scores fall.
● For grouped data, it is therefore the same thing as P50 (the 50th percentile).

Google Sheets / Excel Command:
★ =MEDIAN(value)
Formula:
★ $\tilde{x} = X_{\frac{n+1}{2}}$ (the score at position (n+1)/2 once the scores are arranged in order)

THE MODE
● The third and last measure of central tendency that we shall discuss is the mode.
● The mode is defined as the most frequent score in the distribution.
● Clearly, this is the easiest of the three measures to determine. The mode is found by inspection of the scores; no calculation is necessary.
○ For instance, to find the mode of the data in Table 3.2, all we need to do is search the frequency column. The mode for these data is 76. With grouped scores, the mode is designated as the midpoint of the interval with the highest frequency.
○ The mode of the grouped scores in Table 3.4 is 77.
○ When all the scores in the distribution have the same frequency, it is customary to say that the distribution has no mode.

Google Sheets / Excel Command:
★ =MODE(value)

● Usually, distributions are unimodal; that is, they have only one mode. However, it is possible for a distribution to have many modes. When a distribution has two modes, as is the case with the scores 1, 2, 3, 3, 3, 3, 4, 5, 7, 7, 7, 7, 8, 9, the distribution is called bimodal. Histograms of a unimodal and a bimodal distribution are shown in Figure 4.2.
● Although the mode is the easiest measure of central tendency to determine, it is not used very much in the behavioral sciences because it is not very stable from sample to sample and often there is more than one mode for a given set of scores.

LESSON 3.1: MEASURES OF CENTRAL TENDENCY AND SYMMETRY
● If the distribution is unimodal and symmetrical, the mean, median, and mode will all be equal. An example of this is the bell-shaped curve shown in Figure 4.3. When the distribution is skewed, the mean and median will not be equal. Since the mean is most affected by extreme scores, it will have a value closer to the extreme scores than will the median. Thus, with a negatively skewed distribution, the mean will be lower than the median. With a positively skewed curve, the mean will be larger than the median. Figure 4.3 shows these relationships.
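
The measures of central tendency from Lesson 3 can be sketched in Python as a cross-check on the spreadsheet commands. This is only an illustration: the `scores` list is hypothetical data chosen to be positively skewed, `overall_mean` is a helper name introduced here, and the unequal-size case assumes the same three group means (60, 50, 40) as the equal-size case.

```python
from statistics import mean, median, multimode


def overall_mean(group_means, group_sizes):
    """Overall mean of several groups: each group mean is weighted by its n."""
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / sum(group_sizes)


# Hypothetical scores with one extreme high value (positive skew).
scores = [2, 3, 3, 4, 5, 6, 19]

score_mean = mean(scores)        # sum of scores / number of scores -> 6
score_median = median(scores)    # middle score of the ordered list -> 4
score_modes = multimode(scores)  # most frequent score(s) -> [3]

# With positive skew, the extreme score pulls the mean above the median.
print(score_mean, score_median, score_modes)

# The bimodal example from the lesson has two modes.
bimodal_modes = multimode([1, 2, 3, 3, 3, 3, 4, 5, 7, 7, 7, 7, 8, 9])
print(bimodal_modes)  # [3, 7]

# Overall mean: equal group sizes versus unequal group sizes.
equal_n = overall_mean([60, 50, 40], [20, 20, 20])    # 50.0
unequal_n = overall_mean([60, 50, 40], [10, 60, 30])  # 48.0
print(equal_n, unequal_n)
```

With equal group sizes the weighted result matches the simple average of the three means; with unequal sizes the largest group (n = 60, mean 50) pulls the overall mean toward its own value.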
Take Note:
● $\text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
● Symmetrical/Normal (bell-shaped) Curve
○ x̅ = x̃
○ SK = 0
● Not Symmetrical/Not Normal Curve
○ Positive skew - distribution is skewed to the right
■ x̅ > x̃
■ SK > 0
○ Negative skew - distribution is skewed to the left
■ x̅ < x̃
■ SK < 0

TYPES OF KURTOSIS
★ Leptokurtic – k > 0
★ Mesokurtic – k = 0
★ Platykurtic – k < 0

Take Note:
● The three types of kurtosis all describe symmetrical, bell-shaped curves; they differ only in terms of their peaks.

INTERPRETING EXCESS SKEWNESS AND KURTOSIS
● Skewness measures the symmetry of a variable’s distribution. If the distribution is stretched toward the right or left tail, it is considered skewed:
○ Negative skewness indicates a larger number of higher values.
○ Positive skewness suggests a larger number of smaller values.
● As a rule of thumb:
○ A skewness value between -1 and +1 is considered excellent.
○ A value between -2 and +2 is generally acceptable.
○ Values beyond -2 and +2 indicate significant non-normality (Hair et al., 2022).
● Kurtosis measures the peak of the distribution:
○ A positive kurtosis means the distribution is more peaked than normal.
○ A negative kurtosis means the distribution is flatter than normal.
● General guidelines for kurtosis:
○ A kurtosis greater than +2 suggests the distribution is too peaked.
○ A kurtosis less than -2 indicates the distribution is too flat.
● When both skewness and kurtosis are close to zero, the distribution is considered normal (George & Mallery, 2019).
● In rare cases, if both skewness and kurtosis are exactly zero, it indicates a perfect normal distribution. However, this situation is unlikely in real-world data.

Google Sheets / Excel Command:
★ =SKEW(value)
★ =KURT(value)

LESSON 3.2: THE CONCEPT OF VARIABILITY
● Whereas measures of central tendency are a quantification of the average value of the distribution, measures of variability quantify the extent of dispersion.
● Three measures of variability are commonly used in the behavioral sciences: the range, the standard deviation, and the variance.
The Range
● We have already used the range when we were constructing frequency distributions of grouped scores.
● The range is defined as the difference between the highest and lowest scores in the distribution. In equation form:
Range = highest score − lowest score

The Standard Deviation
● Before discussing the standard deviation, it is necessary to introduce the concept of a deviation score.
● A deviation score tells how far away the raw score is from the mean of its distribution.
● In equation form, a deviation score is defined as:
$x - \bar{x}$ for sample data, or $X - \mu$ for population data

Table 4.7 shows scores obtained from a sample.
● The column x-x̄ presents the deviation scores, which sum up to zero (0).
● The sum of all deviation scores will ALWAYS be equal to zero. This is a “general property of the mean.”
● This does not imply that there is no deviation.
● To determine the dispersion of the scores, each deviation score is squared (see column next to x-x̄). For this distribution, the sum of squares of all deviation scores (SS) is 40.

Table 4.6 shows scores obtained from a population.
● Note: As a general rule, sample symbols use letters in the English alphabet, while population symbols use Greek letters.
● In equation form, the sample SD is $s = \sqrt{\frac{SS}{N-1}}$ and the population SD is $\sigma = \sqrt{\frac{SS}{N}}$.
● The denominators for the two formulas are slightly different (i.e., N-1 for sample SDs, and simple N for population SDs).
● Technically, the equation is the same for calculating the standard deviation of sample scores. However, when we calculate the standard deviation of sample data, we want to use our calculation to estimate the population standard deviation. It can be shown algebraically that the equation with N in the denominator gives an estimate that on the average is too small. Dividing by N-1, instead of N, gives a more accurate estimate of σ.

SAMPLE AND POPULATION SDs
● Population data: ALL scores are considered. The computed standard deviation is the EXACT standard deviation.
● Sample data: NOT ALL scores are considered. The computed standard deviation is an ESTIMATE.
● Like any estimate, we want it to be closer to reality.
○ By using N-1 as the denominator instead of just N, we arrive at a larger SD value.
○ This larger value corrects the tendency of the N formula to underestimate. Strictly speaking, dividing by N-1 makes the sample variance an UNBIASED ESTIMATE of the population variance, and the sample SD computed from it is used as the estimate of the population SD.

NEW TERMS
● Sample Mean: Unbiased estimate of the Population Mean
● Sample Standard Deviation: Estimate of the Population Standard Deviation (computed with N-1)
● Population Mean: True Mean
● Population Standard Deviation: True Standard Deviation

The Variance
● The variance of a set of scores is just the square of the standard deviation. For sample scores, the variance equals
$s^2 = \frac{SS}{N-1}$
● For population scores, the variance equals
$\sigma^2 = \frac{SS}{N}$
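
The measures of variability in Lesson 3.2 can be sketched in Python as follows. The scores are hypothetical and were picked so that the sum of squared deviations comes out to SS = 40; they are not claimed to be the scores of Table 4.6 or Table 4.7, and `sum_of_squares` is a helper name introduced only for this illustration.

```python
from math import sqrt


def sum_of_squares(scores):
    """SS: sum of squared deviation scores around the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


# Hypothetical scores; the mean is 6 and SS is 40.
scores = [2, 4, 6, 8, 10]
n = len(scores)
score_mean = sum(scores) / n

deviations = [x - score_mean for x in scores]
deviation_total = sum(deviations)  # deviation scores always sum to zero

score_range = max(scores) - min(scores)  # highest score - lowest score
ss = sum_of_squares(scores)

sample_variance = ss / (n - 1)     # s^2, denominator N-1
population_variance = ss / n       # sigma^2, denominator N
sample_sd = sqrt(sample_variance)
population_sd = sqrt(population_variance)

print(score_range, ss)                                 # 8 40.0
print(sample_variance, round(sample_sd, 3))            # 10.0 3.162
print(population_variance, round(population_sd, 3))    # 8.0 2.828
```

The sample SD (3.162) is larger than the population SD (2.828) for the same scores, which is the effect of dividing by N-1 instead of N.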
Table 4.9 Computational Procedure for Computing SD

EXERCISE #1
Calculate the standard deviation of the scores contained in the first column of the following table:
Note: For our purposes, answers should be rounded off to three decimal places.

LESSON 4: STANDARD SCORES AND THE NORMAL CURVE

Introduction to z-scores
● Now we shift attention to the individual scores within a distribution.
● The z-score transformation is a statistical technique that uses the mean and the standard deviation to transform each score (X value) into a z-score, or a standard score.
● The purpose of z-scores, or standard scores, is to identify and describe the exact location of each score in a distribution.

EXAMPLE ON THE USE OF Z-SCORES
● Suppose you received a score of X = 76 on a statistics exam. How did you do? It should be clear that you need more information to predict your grade. Your score of X = 76 could be one of the best scores in the class, or it might be the lowest score in the distribution.
● To find the location of your score, you must have information about the other scores in the distribution. It would be useful, for example, to know the mean for the class. If the mean were 70, you would be in a much better position than if the mean were 85. Obviously, your position relative to the rest of the class depends on the mean.
● However, the mean by itself is not sufficient to tell you the exact location of your score. Suppose you know that the mean for the statistics exam is 70 and your score is X = 76.
● At this point, you know that your score is 6 points above the mean, but you still do not know exactly where it is located. Six points may be a relatively big distance and you may have one of the highest scores in the class, or 6 points may be a relatively small distance and you may be only slightly above the average.
● Figure 5.2 shows two possible distributions of exam scores. Both distributions have a mean of 70, but for one distribution, the standard deviation is 3, and for the other, SD = 12. The location of X = 76 is highlighted in each of the two distributions.
● When the standard deviation is 3, your score of X = 76 is in the extreme right-hand tail, one of the highest scores in the distribution. However, in the other distribution, where the SD is 12, your score is only slightly above average.
● Thus, the relative location of your score within the distribution depends on the standard deviation as well as the mean.

The preceding example demonstrates that a score by itself does not provide much information about its position in the distribution. To make these raw scores meaningful, we need to convert them to standard scores. One such standard score is a z-score.
● A z-score is a transformed score that designates how many standard deviation units the corresponding raw score is above or below the mean. In equation form:
$z = \frac{X - \mu}{\sigma}$

Z-SCORES AND LOCATION IN A DISTRIBUTION
● One of the primary purposes of a z-score is to describe the exact location of a score within a distribution. The z-score accomplishes this goal by transforming each X value into a signed number (+ or –) so that
○ The sign tells whether the score is located above (+) or below (–) the mean, and
○ The number tells the distance between the score and the mean in terms of the number of standard deviations.
● Thus, in a distribution of IQ scores with mean = 100 and SD = 15, a score of X = 130 would be transformed into z = 2.00. The z value indicates that the score is located above the mean (+) by a distance of 2 standard deviations (30 points).
● Question: What are the corresponding z-scores for the given example above? (in Figure 5.2)
● By rearranging the same z-score formula, it becomes possible to solve for the other parts of the formula (for example, $X = \mu + z\sigma$).

Refer to ‘Standard Normal Distribution’ in NORMAL DISTRIBUTION SHEET for the z-score table.

CONVERTING z-SCORES TO X-VALUES
For a population with μ = 60 and σ = 12, what is the X value corresponding to z = -0.50?

Step 1: Locate X in relation to the mean.
● A z-score of -0.50 indicates a location below the mean by half of a standard deviation.
Step 2: Convert the distance from standard deviation units to points.
● With σ = 12, half of a standard deviation is 6 points.
Step 3: Identify the X value.
● The value we want is located below the mean by 6 points. The mean is μ = 60, so the score must be X = 54.

CHARACTERISTICS OF A NORMAL CURVE
1. It is bell-shaped and symmetrical.
2. The mean, median, and mode are equal in a normal distribution.
3. The greatest concentration of scores is situated at the center of the distribution.
4. It is asymptotic (the tails extend infinitely toward both ends without touching the abscissa, or x-axis).

UNDERSTANDING THE AREAS UNDER THE NORMAL CURVE
In a normal distribution, there’s a specific relationship between the mean (μ) and the standard deviation (σ) that determines the area under the curve. Key points:

● 34.13% of the area lies between the mean (μ) and a score that is one standard deviation above it (μ + 1σ).
● 13.59% of the area lies between μ + 1σ and μ + 2σ.
● 2.15% of the area lies between μ + 2σ and μ + 3σ.
● Beyond μ + 3σ, 0.13% of the area remains.

This accounts for 50% of the area. Because the distribution is symmetrical, the same percentages apply below the mean. These areas represent the percentage of scores within each range, with the frequency of scores shown on the vertical axis of the graph.

For example, in a population of 10,000 IQ scores with a mean of 100 (μ=100) and a standard deviation of 16 (σ=16), the distribution works as follows:

● 34.13% of the scores are between 100 and 116 (μ + 1σ).
● 13.59% are between 116 and 132 (μ + 2σ).
● 2.15% are between 132 and 148.
● 0.13% are above 148.

Similarly, on the lower end:

● 34.13% of the scores are between 84 (μ - 1σ) and 100.
● 13.59% are between 68 and 84.
● 2.15% are between 52 and 68.
● 0.13% are below 52.

These relationships are visually represented in Figure 5.2.

USING THE TABLE OF AREAS UNDER THE NORMAL CURVE

● The value of computing z-scores is best understood when using Table A. Once a raw score is transformed to a z-score, we can also determine the proportion of test takers whose scores fall above or below that particular raw score.
For instance, Sarah obtained a raw score of 30 in a 50-item test. Did she do well? To answer that, the raw score has to be converted to a standard score (or a z-score in this case). For this to be done, the mean and standard deviation must be known. If the mean is equal to 20 and the standard deviation is equal to 5, a z-score of 2.0 is obtained, using the z-score formula.
● Then comes this question: What proportion of test takers obtained a score higher than Sarah’s 30? Using Table A, the answer is .0228. Refer to the column heading “Area beyond z.”
● Then this question: What proportion of test takers obtained a score lower than Sarah’s 30? Using Table A again, the answer is .4772 + .5000 = .9772. Refer to the column heading “Area Between Mean and z” and add this value to the proportion equivalent to half of the area under the normal curve (which is .5000).
● Now, if the number of test takers is given, it becomes possible for us to estimate how many individuals there are. For instance, if there are 3,000 test takers, we can estimate that approximately 68 students (3,000 × .0228) got a score higher than Sarah’s, and approximately 2,932 students (3,000 × .9772) got a score lower than Sarah’s.
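
The z-score conversions and the Table A look-ups in Lesson 4 can be checked with a short Python sketch. It does not read Table A. As an assumption made only for this illustration, the area below z is computed from the standard normal cumulative distribution using `math.erf`, and the function names `z_score`, `x_from_z`, and `area_below` are introduced here.

```python
from math import erf, sqrt


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def x_from_z(z, mu, sigma):
    """Convert a z-score back to a raw score."""
    return mu + z * sigma


def area_below(z):
    """Proportion of a normal distribution that falls below z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# The exam example: X = 76, mean = 70, with SD = 3 versus SD = 12.
z_sd3 = z_score(76, 70, 3)    # 2.0
z_sd12 = z_score(76, 70, 12)  # 0.5

# Converting z = -0.50 to X for mu = 60 and sigma = 12.
x_value = x_from_z(-0.50, 60, 12)  # 54.0

# Sarah's score: X = 30, mean = 20, SD = 5, with 3,000 test takers.
sarah_z = z_score(30, 20, 5)       # 2.0
below = area_below(sarah_z)        # about .9772
beyond = 1 - below                 # about .0228
higher_count = round(3000 * beyond)
lower_count = round(3000 * below)

print(z_sd3, z_sd12, x_value)
print(round(beyond, 4), round(below, 4))  # 0.0228 0.9772
print(higher_count, lower_count)          # 68 2932
```

The same raw score of 76 gives z = 2.0 when the SD is 3 but only z = 0.5 when the SD is 12, which is why the mean alone cannot locate a score in its distribution.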
EXERCISE #1

LESSON 5: CORRELATION ANALYSIS
● Regression – describes how an independent variable is numerically related to the dependent variable
● Correlation – measures co-relationship or association of two variables.
Why Correlate?
● We would like to see whether two variables are “related.”
○ What happens to a second variable when the first variable “increases” or “decreases”?
Example:
● Are interview scores of applicants (variable 1) related to their performance appraisal scores after six months (variable 2)?
○ If there is a significant relationship between interview scores and subsequent performance appraisal, can we use interview scores to predict employee performance?
● Is there a relationship between USTET scores (variable 1) and students’ GPA (variable 2)?
○ If there is a significant relationship between USTET scores and students’ GPA, can we use USTET scores to predict scholastic success of students?
● Is there a relationship between employees’ organizational commitment (variable 1) and their work performance (variable 2)?
○ If there is a significant relationship between organizational commitment and job performance, can we use organizational commitment scores to predict how well employees will perform on the job?

Issues to remember:
1. Finding a relationship between two variables allows us to make predictions (i.e., the first variable “predicts” how the second variable will turn out); and
2. Even when two variables are related, we cannot fully say that one variable “caused” the other to happen.

CORRELATION
● Correlation is a statistical technique that is used to measure and describe the relationship between two variables.
● Usually, the two variables are simply observed as they exist naturally in the environment; there is no attempt to control or manipulate the variables.
● Example: a researcher could check high school records (with permission) to obtain a measure of each student’s academic performance, and then survey each family to obtain a measure of income.
○ The resulting data could be used to determine whether there is a relationship between high school grades and family income.

CHARACTERISTICS OF A RELATIONSHIP
1. The Direction of the Relationship – the sign of the correlation, positive or negative, describes the direction of the relationship.
● In a positive correlation, the two variables tend to change in the same direction:
○ As the value of the X variable increases, the Y variable also tends to increase;
○ When the X variable decreases, the Y variable also decreases.
● In a negative correlation, the two variables tend to go in opposite directions.
○ As the X variable increases, the Y variable decreases. That is, it is an inverse relationship.

2. The Form of the Relationship – in the preceding coffee and beer examples, the relationships tend to have a linear form; that is, the points in the scatter plot tend to cluster around a straight line. We have drawn a line through the middle of the data points in each figure to help show the relationship. The most common use of correlation is to measure straight-line relationships. However, other forms of relationships do exist and there are special correlations used to measure them.

3. The Strength or Consistency of the Relationship – this is measured by the numerical value of the correlation.
● A perfect correlation is always identified by a correlation of 1.00 (or –1.00) and indicates a perfectly consistent relationship.
○ For a correlation of 1.00 (or –1.00), each change in X is accompanied by a perfectly predictable change in Y.
● At the other extreme, a correlation of 0 indicates no consistency at all.
○ For a correlation of 0, the data points are scattered randomly with no clear trend.
● Intermediate values between 0 and 1 indicate the degree of consistency.

THE PEARSON CORRELATION
● By far, the most common correlation is the Pearson correlation (or the Pearson product-moment correlation), which measures the degree of straight-line relationship.
● This measures the degree and direction of the linear relationship between two variables.
● It is identified by the letter r. Conceptually, this correlation is computed by:
$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}}$

What does a “relationship” mean?
● When there is a perfect linear relationship, every change in the X variable is accompanied by a corresponding change in the Y variable.
● In Figure 15.4 (a), for example, every time the value of X increases, there is a perfectly predictable decrease in the value of Y. The result is a perfect linear relationship, with X and Y always varying together.
● In this case, the covariability (X and Y together) is identical to the variability of X and Y separately, and the formula produces a correlation with a magnitude of 1.00 or –1.00.
● At the other extreme, when there is no linear relationship, a change in the X variable does not correspond to any predictable change in the Y variable. In this case, there is no covariability, and the resulting correlation is zero.

Calculating Pearson’s r
● To calculate the Pearson correlation, it is necessary to introduce one new concept: the sum of products of deviations, or SP. This new value is similar to SS (the sum of squared deviations), which is used to measure variability for a single variable. Now, we use SP to measure the amount of covariability between two variables.
● The computational formula for the sum of products of deviations is:
$SP = \sum XY - \frac{\sum X \sum Y}{n}$
● Computing Pearson’s r also requires computing the Sums of Squares (SS) using the formula:
$SS = \sum X^2 - \frac{(\sum X)^2}{n}$
● Now, since we have two distributions whose scores/values we are correlating, we must compute SS twice: once for X ($SS_X$) and again for Y ($SS_Y$).
● The formula for Pearson’s r now becomes:
$r = \frac{SP}{\sqrt{SS_X SS_Y}}$

INTERPRETING THE MAGNITUDE OF PEARSON’S r
The following can be used as a basis for interpreting the strength of relationships (based on Pearson’s r):
Perfect Relationship: +/- 1.00
High/Strong Degree: Between 0.50 and 0.99
Moderate Degree: Between 0.30 and 0.49
Low/Weak Degree: Lower than 0.30
No Correlation: Computed value is 0 (zero)

HYPOTHESIS TESTS WITH THE PEARSON CORRELATION
1. Writing a problem and formulating a hypothesis
● The problem is a general question concerning a population. To answer the question, a sample will be selected, and the sample data will be used to compute the correlation value.
● The basic question for this hypothesis test is whether a correlation exists in the population.
○ The null hypothesis is “No. There is no correlation in the population,” or “The population correlation is zero.”
○ The alternative hypothesis is “Yes. There is a real, nonzero correlation in the population.”
○ Because the population correlation is traditionally represented by ρ (the Greek letter rho), these hypotheses would be stated in symbols as:
$H_0: \rho = 0$
$H_1: \rho \neq 0$
2. Assigning the level of significance
α = .05
3. Degrees of freedom for the correlation test
● The hypothesis test for the Pearson correlation has degrees of freedom defined by df = n – 2.
○ An intuitive explanation for this value is that a sample with only n=2 data points has no degrees of freedom. Specifically, if there are only two points, they will fit perfectly on a straight line, and the sample produces a perfect correlation of r=1.00 or r=–1.00. Because the first two points always produce a perfect correlation, the sample correlation is free to vary only when the data set contains more than two points. Thus, df = n – 2.
4. Testing the hypothesis
● The obtained Pearson’s r is then compared to a table value (or critical value). (Refer to “Critical Values for the Pearson’s r” in 'CORRELATION' SHEETS)

PROBLEM ANALYSIS AND COMPUTATIONAL PROCEDURES
1. State the problem (should be a question):
● Ex. Is there a relationship between organizational commitment and job performance?
2. Write a hypothesis
● Statistical Hypotheses:
$H_0: \rho = 0$
$H_1: \rho \neq 0$
● Research Hypotheses:
○ Ex. H₀: There is no relationship between Organizational Commitment and Job Performance.
H₁: There is a relationship between Organizational Commitment and Job Performance.
3. Level of Significance
α = .05
4. Computational Procedure
● Using the SP procedure, the following solution is in place:
● We begin with the column totals for the variables X and Y.
★ Note: One must be careful when computing the column totals, as well as the sum of products of X and Y. An error made here would have a domino effect on all succeeding procedures.

USING AND INTERPRETING PEARSON’S CORRELATION
Where and why correlations are used:
● When making predictions.
○ If two variables are known to be related in a systematic way, then it is possible to use one of the variables to make accurate predictions about the other.
● When conducting test validation.
○ We use correlations to determine whether tests are “valid,” i.e., to determine whether they measure what they say they are measuring.
○ For example, if the test actually measures intelligence, then the scores on the test should be related to other measures of intelligence, such as standardized IQ tests, performance on learning tasks, problem-solving ability, and so on.
● To determine test reliability.
○ A measurement procedure is considered reliable to the extent that it produces stable, consistent measurements.
○ For example, if your IQ was measured as 113 last week, you would expect to obtain nearly the same score if your IQ was measured again this week.
● For theory verification.
○ Many psychological theories make specific predictions about the relationship between two variables.
○ For example, a theory may predict a relationship between brain size and learning ability; a developmental theory may predict a relationship between the parents’ IQs and the child’s IQ; a social psychologist may have a theory predicting a relationship between personality type and behavior in a social situation.
○ In each case, the prediction of the theory could be tested by determining the correlation between the two variables.

ON INTERPRETING CORRELATIONS
1. Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.
2. The value of a correlation can be affected greatly by the range of scores represented in the data (i.e., the concept of restricted range).
3. One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
4. When judging how “good” a relationship is, it is tempting to focus on the numerical value of the correlation. To describe how accurately one variable predicts the other, you must square the correlation. Thus, a correlation of r=.5 means that one variable partially predicts the other, but the predictable portion is only r²=0.5²=0.25 (or 25%) of the total variability.
●​ The value r² is called the coefficient of
determination because it measures the
proportion of variability in one variable that
can be determined from the relationship with
the other variable.
● For example, a correlation of r=0.80 (or –0.80) means that r²=0.64 (or 64%) of the variability in the Y scores can be predicted from the relationship with X.

CONGRATS, BESTIE!
THIS IS THE END OF THE REVIEWER.
GOOD LUCK :)
QUIZ
1. A researcher conducted an experiment involving three groups of subjects. The mean of the first group is 75, and there were 50 subjects in the group. The mean of the second group is 80, and there were 40 subjects. The third group has a mean of 70 and 25 subjects. Calculate the overall mean of the three groups combined.
Mean          n     Mean × n
75            50    ?
80            40    ?
70            25    ?
Total         ?     ?
Overall Mean  ?

2. Calculate the range for the following distribution: 18, 12, 28, 15, 20
3. Identify the level of measurement of the following variable: Eye color (e.g., blue, brown, green).
4. Calculate the range for the following distribution: 115, 107, 105, 109, 101
5. Identify the level of measurement of the following variable: Ranking of students in a class (1st, 2nd, 3rd, etc.)
6. Calculate the range for the following distribution: 1.2, 1.3, 1.5, 1.8, 2.3
7. Identify the level of measurement of the following variable: Year of birth.
8. For a sample with a mean of 40 and a standard deviation of 12, find the z-score corresponding to X = 43.
9. Identify the level of measurement of the following variable: Number of siblings.
10. In a positively skewed distribution, the tail is longer on the:
a. Left side
b. Right side
c. Neither side
11. For a sample with a mean of 80 and a standard deviation of 20, find the X-value corresponding to z = -0.20.
12. If a distribution has a long tail to the left, it is said to be:
a. Positively skewed
b. Negatively skewed
c. Symmetrical
13. For a sample with a mean of 85, a score of X = 80 corresponds to z = -1.00. What is the standard deviation for the sample?
14. Which of the following shapes best represents a symmetrical distribution?
a. A bell curve
b. A J-curve
c. A U-curve
d. A skewed curve
15. For a sample with a standard deviation of 12, a score of X = 83 corresponds to z = 0.50. What is the mean for the sample?
16. Correlation measures the:
a. Difference between two variables
b. Relationship between two variables
c. Average of two variables
17. Calculate the standard deviation of the scores: 25, 28, 35, 37, 38, 40, 42, 45, 47, 50.
18. Would ‘IQ and grade point average for high school students’ have a positive or negative correlation?
19. Would ‘Daily high temperature and daily energy consumption for 30 winter days in Iceland’ have a positive or negative correlation?
20. The "direction" of a correlation refers to:
a. How strong the relationship is.
b. Whether the relationship is positive or negative.
c. The shape of the data points on a scatterplot.
21.​ The "strength" of a correlation refers to:
a.​ Whether the relationship is positive
or negative.
b.​ How closely the data points follow a
straight line.
c.​ The direction of the line on a
scatterplot.
22.​The "form" of a correlation typically refers
to:
a.​ Whether the relationship is linear or
nonlinear.
b.​ How strong the relationship is.
c.​ Whether the relationship is positive
or negative.
23.​A scatterplot showing data points scattered
randomly with no clear pattern indicates:
a.​ A strong positive correlation.
b.​ A strong negative correlation.
c.​ A weak or no correlation.
24.​Would ‘Model year and price for a used
Honda’ have a positive or negative
correlation?
25.​Which of the following indicates a continuous
variable?
a.​ Number of hours per day
b.​ Height of a toddler
c.​ Number of female students in a block
26.​The data points would be clustered more
closely around a straight line for a
correlation of -0.80 than for a correlation of
0.05. True or false?
27.​If the data points are clustered close to a
line that slopes up from left to right, then a
good estimate of the correlation would be
0.90. True or false?
28.​If a scatter plot shows a set of data points
that form a circular pattern, the correlation
should be near zero. True or false?
29.​In a positive correlation, we can say that as
the value of the X variable increases, the Y
variable tends to decrease. True or false?
30.​A measurement procedure is considered
reliable to the extent that it produces
unstable and inconsistent measurements. True
or false?
ANSWER KEY
1.
Mean          n     Mean × n
75            50    3750
80            40    3200
70            25    1750
Total         115   8700
Overall Mean  75.652

2. 28 - 12 + 1 = 17
3. Nominal
4. 115 - 101 + 1 = 15
5. Ordinal
6. 2.3 - 1.2 = 1.1 (or 2.35 - 1.15 = 1.2 if real limits are used, as in items 2 and 4)
7. Interval
8. 0.25
9. Ratio
10. b. Right side
11. 76
12. b. Negatively skewed
13. 5
14. a. A bell curve
15. 77
16. b. Relationship between two variables
17. s ≈ 7.94 (sample formula, n - 1); σ ≈ 7.54 (population formula, N)
18. positive correlation
19. negative correlation
20. b. Whether the relationship is positive or negative
21. b. How closely the data points follow a straight line
22. a. Whether the relationship is linear or nonlinear
23. c. A weak or no correlation
24. positive correlation
25. b. Height of a toddler
26. true
27. true
28. true
29. false
30. false
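
The computational answers in the key can be checked with a short script. This is only a sketch: the variable names are made up for this example, and because item 17 does not say whether the scores are a sample or a population, both standard deviations are computed with Python's standard statistics module.

```python
from statistics import pstdev, stdev

# Item 1: overall (weighted) mean = sum of (mean x n) divided by total n
means = [75, 80, 70]
ns = [50, 40, 25]
total_n = sum(ns)
total_sum = sum(m * n for m, n in zip(means, ns))
overall_mean = total_sum / total_n

# Items 8, 11, 13, 15: rearrangements of z = (X - mean) / sd
z_item8 = (43 - 40) / 12          # z from X, mean, sd
x_item11 = 80 + (-0.20) * 20      # X = mean + z * sd
sd_item13 = (80 - 85) / -1.00     # sd = (X - mean) / z
mean_item15 = 83 - 0.50 * 12      # mean = X - z * sd

# Item 17: standard deviation under both common definitions
scores = [25, 28, 35, 37, 38, 40, 42, 45, 47, 50]
sample_sd = stdev(scores)         # divides SS by n - 1
population_sd = pstdev(scores)    # divides SS by N

print(f"Item 1: n = {total_n}, sum = {total_sum}, mean = {overall_mean:.3f}")
print(f"Item 8: z = {z_item8:.2f}")
print(f"Item 11: X = {x_item11:.0f}")
print(f"Item 13: sd = {sd_item13:.0f}")
print(f"Item 15: mean = {mean_item15:.0f}")
print(f"Item 17: sample sd = {sample_sd:.2f}, population sd = {population_sd:.2f}")
```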
