MMW Module 10 - Correlation and Linear Regression

Uploaded by

Dahlia Fernandez

Module Overview

Purpose of this Module

The purpose of this module is to discuss the relationships between variables using
linear regression and correlation.

Linear Regression and Correlation

Linear regression is a technique for understanding the association between one
independent (or predictor) variable and one continuous dependent (or outcome)
variable. In correlation analysis, we estimate
a sample correlation coefficient, more specifically the Pearson Product Moment
correlation coefficient. The sample correlation coefficient, denoted r, ranges between
-1 and +1 and quantifies the direction and strength of the linear association between
the two variables. The correlation between two variables can be positive (i.e., higher
levels of one variable are associated with higher levels of the other) or negative (i.e.,
higher levels of one variable are associated with lower levels of the other).

Module guide

This module discusses how to determine whether two variables are related to each
other by using the least-squares regression line and the linear correlation
coefficient.

Module Outcome/s

After this learning module, students will be able to:

1. Use linear regression to predict the value of a variable given certain conditions.
2. Apply correlation to determine the relationship between two variables.
3. Articulate the importance of mathematics in one’s life.
4. Express appreciation for mathematics as a human endeavor.

Module Requirements

By the end of this module, students will submit the activities provided.

Course Pre-Assessment

Matching Type: From the box below, choose the type of linear correlation shown in
each scatter diagram.

1. _________________    4. _________________

2. _________________    5. _________________

3. _________________    6. _________________

Perfect positive correlation    Strong negative correlation

Strong positive correlation     Positive correlation

Negative correlation            Little or no linear correlation

Key Terms

Linear Regression - a linear approach to modelling the relationship between a scalar
response and one or more explanatory variables. The case of one
explanatory variable is called simple linear regression; for more
than one, the process is called multiple linear regression.

Correlation - (or dependence) any statistical relationship, whether causal or not,
between two random variables or bivariate data. In the broadest sense,
correlation is any statistical association, though it commonly refers to
the degree to which a pair of variables are linearly related.

Linear Regression

In many applications, scientists try to determine whether two variables are related.
If they are related, the scientists then try to find an equation that can be used to
model the relationship. For
instance, the zoology professor R. McNeill Alexander wanted to determine whether
the stride length of a dinosaur, as shown by its fossilized footprints, could be used
to estimate the speed of the dinosaur. Stride length for an animal is defined as the
distance x from a particular point on a footprint to that same point on the next
footprint of the same foot. (See the figure at the right.) Because no dinosaurs were
available, Alexander and fellow scientist A. S. Jayes carried out experiments with
many types of animals, including adult men, dogs, camels, ostriches, and elephants.
The results of these experiments tended to support the idea that the speed y of an
animal is related to the animal’s stride length x. To better understand this
relationship, examine the data in Table 1, which are similar to, but less extensive
than, the data collected by Alexander and Jayes.

TABLE 1: Speed for Selected Stride Lengths

A graph of the ordered pairs in Table 1 is shown in Figure 1. In this graph,
which is called a scatter diagram or scatter plot, the x-axis represents the stride
lengths in meters and the y-axis represents the average speeds in meters per
second. The scatter diagram seems to indicate that for each of the three species, a
larger stride length generally produces a faster speed. Also note that for each
species, a straight line can be drawn such that all of the points for that species lie
on or very close to the line. Thus the relationship between speed and stride length
appears to be a linear relationship.

FIGURE 1: Scatter diagram for Table 1


After a relationship between paired data, which are referred to as bivariate
data, has been discovered, a scientist tries to model the relationship with an
equation. One method of determining a linear relationship for bivariate data is called
linear regression. To see how linear regression is carried out, let us concentrate on
the bivariate data for the dogs, which are shown by the green points in Figures 1 and
2. There are many lines that can be drawn such that the data points lie close to the
line; however, scientists are generally interested in the line called the line of best fit
or the least-squares regression line.

FIGURE 2: Vertical deviations

The least-squares regression line is also called the least-squares line. The
approximate equation of the least-squares line for the bivariate data for the dogs is
ŷ = 3.2x - 1.1. Figure 2 shows the graph of these data and the graph ŷ = 3.2x - 1.1. In
Figure 2, the vertical deviations from the ordered pairs to the graph of ŷ = 3.2x - 1.1
are 0, -0.06, 0.5, -0.52, -0.16, -0.6, 0.34 and 0.2.
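
The idea of vertical deviations can be sketched in Python. Because the dog data from Table 1 are not reproduced in the text, this sketch uses the adult men ordered pairs and the rounded line ŷ = 2.7x - 3.3 from Example 1 instead. The function names, the sign convention (observed y minus the line's ŷ), and the two comparison lines are assumptions made only for illustration.

```python
# Adult men ordered pairs (stride length in m, speed in m/s) quoted in Example 1.
points = [(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6),
          (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7)]


def deviations(points, slope, intercept):
    """Vertical deviation of each point from the line y = slope*x + intercept.

    Assumed sign convention: observed y minus the y-value on the line.
    """
    return [y - (slope * x + intercept) for x, y in points]


def sum_of_squares(points, slope, intercept):
    """Sum of the squared vertical deviations, the quantity least squares minimizes."""
    return sum(d * d for d in deviations(points, slope, intercept))


# Rounded least-squares line from Example 1.
fitted = sum_of_squares(points, 2.7, -3.3)

# Two hypothetical alternative lines, chosen only for comparison.
steeper = sum_of_squares(points, 3.0, -4.3)
flatter = sum_of_squares(points, 2.4, -2.2)

print(fitted < steeper and fitted < flatter)  # True
```

Both alternative lines pass near the data, yet each has a larger sum of squared deviations than the line from Example 1, which is what "best fit" means here.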

It is traditional to use the symbol ŷ (pronounced y-hat) in place of y in the
equation of a least-squares line. This also helps us differentiate the line’s y-values
from the y-values of the given ordered pairs.

The next formula can be used to determine the equation of the least-squares
line for a given set of ordered pairs.

The equation of the least-squares line for the n ordered pairs (x1, y1), (x2, y2), ... ,
(xn, yn) is ŷ = ax + b, where

a = (n∑xy - (∑x)(∑y)) / (n∑x² - (∑x)²)   and   b = y̅ - a x̅

In the formula for the least-squares regression line, ∑x represents the sum of all
the x values, ∑y represents the sum of all the y values, ∑x² represents the sum of
the squares of the x values, and ∑xy represents the sum of the n products x1y1,
x2y2, ... , xnyn. The notation x̅ represents the mean of the x values, and y̅ represents
the mean of the y values. The following example illustrates a procedure that can be
used to calculate efficiently the sums needed to find the equation of the
least-squares line for a given set of data.

Example 1: Find the Equation of a Least-Squares Line

Find the equation of the least-squares line for the adult men ordered pairs in
Table 1.

Solution:

The ordered pairs are (2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6), (3.8, 7.0), (4.0,
7.7), (4.2, 8.3), (4.5, 8.7). The number of ordered pairs is n = 8. Organize the data
in four columns, as shown in Table 2. Then find the sum of each column.

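Table 2

    x       y       x²       xy
   2.5     3.4     6.25     8.50
   3.0     4.9     9.00    14.70
   3.3     5.5    10.89    18.15
   3.5     6.6    12.25    23.10
   3.8     7.0    14.44    26.60
   4.0     7.7    16.00    30.80
   4.2     8.3    17.64    34.86
   4.5     8.7    20.25    39.15
  ----    ----   ------   ------
  28.8    52.1   106.72   195.86    (column sums: ∑x, ∑y, ∑x², ∑xy)

Find the slope a.

a = (n∑xy - (∑x)(∑y)) / (n∑x² - (∑x)²)
  = ((8)(195.86) - (28.8)(52.1)) / ((8)(106.72) - (28.8)²)
  ≈ 2.7303

Find x̅ and y̅.

x̅ = ∑x / n = 28.8 / 8 = 3.6          y̅ = ∑y / n = 52.1 / 8 = 6.5125

Find the y-intercept b.

b = y̅ - a x̅ ≈ 6.5125 - (2.7303)(3.6) ≈ -3.3166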

If a and b are each rounded to the nearest tenth, to reflect the accuracy of the
original data, then we have as our equation of the least-squares line:

ŷ = ax + b
ŷ ≈ 2.7x - 3.3

See Figure 3.

FIGURE 3: Least-squares line for speed versus stride length in adult men
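
The computation in Example 1 can be checked with a short Python sketch. The function name `least_squares` and the variable names are illustrative and not part of the module. The slope and y-intercept use the standard least-squares formulas, a = (n∑xy - (∑x)(∑y)) / (n∑x² - (∑x)²) and b = y̅ - a x̅.

```python
# Adult men ordered pairs (stride length in m, speed in m/s) from Example 1.
points = [(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6),
          (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7)]


def least_squares(points):
    """Return (a, b) for the least-squares line y-hat = a*x + b."""
    n = len(points)
    # Column sums, as in the four-column layout described in Example 1.
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)

    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b = sum_y / n - a * (sum_x / n)                               # y-bar - a * x-bar
    return a, b


a, b = least_squares(points)
print(round(a, 1), round(b, 1))  # 2.7 -3.3
```

Rounding to the nearest tenth reproduces ŷ ≈ 2.7x - 3.3. Because the intercept is computed as y̅ - a x̅, the line always passes through the point (x̅, y̅).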

Example 2: Use a Least-Squares Line to Make a Prediction

Use the equation of the least-squares line from Example 1 to predict the average
speed of an adult man for each of the following stride lengths. Round your results to
the nearest tenth of a meter per second.

a. 2.8 m b. 4.8 m

Solution:

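a. In Example 1, we found the equation of the least-squares line to be ŷ = 2.7x -
3.3. Substituting 2.8 for x gives…

ŷ = 2.7(2.8) - 3.3 = 4.26

Rounding 4.26 to the nearest tenth produces 4.3.

b. Substituting 4.8 for x gives…

ŷ = 2.7(4.8) - 3.3 = 9.66

Rounding 9.66 to the nearest tenth produces 9.7.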


The procedure in Example 2a made use of an equation to
determine a point between given data points. This
procedure is referred to as interpolation. In Example 2b,
an equation was used to determine a point to the right of
the given data points. The process of using an equation to
determine a point to the right or left of given data points
is referred to as extrapolation. See Figure 4.

FIGURE 4: Interpolation and extrapolation
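
The distinction between interpolation and extrapolation can be sketched in Python as follows. The helper names `predict` and `kind` are hypothetical. The sketch assumes the rounded line ŷ = 2.7x - 3.3 from Example 1 and treats any x between the smallest and largest observed stride lengths as interpolation.

```python
# Stride lengths (m) of the adult men data used to fit the line in Example 1.
observed_x = [2.5, 3.0, 3.3, 3.5, 3.8, 4.0, 4.2, 4.5]


def predict(x, slope=2.7, intercept=-3.3):
    """Predicted speed (m/s) from the rounded least-squares line of Example 1."""
    return slope * x + intercept


def kind(x, observed_x):
    """Label a prediction by where x falls relative to the observed x-values."""
    if min(observed_x) <= x <= max(observed_x):
        return "interpolation"
    return "extrapolation"


for stride in (2.8, 4.8):
    print(stride, round(predict(stride), 1), kind(stride, observed_x))
# 2.8 4.3 interpolation
# 4.8 9.7 extrapolation
```

A stride length of 4.8 m lies to the right of every observed value, so that prediction relies on the linear pattern continuing beyond the data.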


Linear Correlation Coefficient

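To determine the strength of a linear relationship between two variables,
statisticians use a statistic called the linear correlation coefficient, which is
denoted by the variable r and is defined as follows.

For the n ordered pairs (x1, y1), (x2, y2), ... , (xn, yn), the linear correlation
coefficient is

r = (n∑xy - (∑x)(∑y)) / (√(n∑x² - (∑x)²) · √(n∑y² - (∑y)²))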

If the linear correlation coefficient r is positive, the relationship between the
variables has a positive correlation. In this case, if one variable increases, the
other variable also tends to increase. If r is negative, the linear relationship
between the variables has a negative correlation. In this case, if one variable
increases, the other variable tends to decrease. Figure 5 shows some scatter diagrams
along with the type of linear correlation that exists between the x and y variables.
The closer |r| is to 1, the stronger the linear relationship between the variables.

FIGURE 5: Linear correlation

Example 3: Find a Linear Correlation Coefficient

Find the linear correlation coefficient for stride length versus speed of an adult
man. Use the data in Table 1. Round your result to the nearest hundredth.

Solution:

The ordered pairs are (2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6), (3.8, 7.0), (4.0, 7.7),
(4.2, 8.3), (4.5, 8.7). The number of ordered pairs is n = 8. In Table 2 we found:

∑x = 28.8, ∑y = 52.1, ∑x² = 106.72, ∑xy = 195.86

The only additional value that is needed is…

Substituting the above values into the equation for the linear correlation coefficient
gives us…

To the nearest hundredth, the linear correlation coefficient is 0.99.
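
The result of Example 3 can be reproduced with a short Python sketch. The function name `linear_correlation` is illustrative. The sketch assumes the usual computational formula for the Pearson coefficient, r = (n∑xy - (∑x)(∑y)) / (√(n∑x² - (∑x)²) · √(n∑y² - (∑y)²)), and computes the additional sum ∑y² directly from the ordered pairs.

```python
import math

# Adult men ordered pairs (stride length in m, speed in m/s) from Example 3.
points = [(2.5, 3.4), (3.0, 4.9), (3.3, 5.5), (3.5, 6.6),
          (3.8, 7.0), (4.0, 7.7), (4.2, 8.3), (4.5, 8.7)]


def linear_correlation(points):
    """Linear (Pearson) correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_y2 = sum(y * y for _, y in points)  # the one sum not needed for the line
    sum_xy = sum(x * y for x, y in points)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = linear_correlation(points)
print(round(r, 2))  # 0.99
```

The value is close to +1, which agrees with the strong positive linear pattern of the adult men data.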


The linear correlation coefficient indicates the strength of a linear relationship
between two variables; however, it does not indicate the presence of a cause-and-
effect relationship. For instance, the data in Table 3 show the hours per week that a
student spent playing pool and the student’s weekly algebra test scores for those
same weeks.

TABLE 3: Algebra Test Scores vs. Hours Spent Playing Pool

The linear correlation coefficient for the ordered pairs in the table is r ≈ 0.98. Thus
there is a strong positive linear relationship between the student’s algebra test
scores and the time the student spent playing pool. This does not mean that the
higher algebra test scores were caused by the increased time spent playing pool. The
fact that the student’s test scores increased with the increase in the time spent
playing pool could be due to many other factors or it could just be a coincidence. In
your work with applications that involve the linear correlation coefficient r, it is
important to remember the following properties of r.

Post Assessment:

Solve the problems.

1. Which of the scatter diagrams below suggests the …

a. strongest positive linear correlation between the x and y variables?
b. strongest negative linear correlation between the x and y variables?

2. Which of the scatter diagrams below suggests …

a. a near perfect positive linear correlation between the x and y variables?
b. little or no linear correlation between the x and y variables?

3. Given the bivariate data:

a. Draw a scatter diagram for the data.
b. Find n, ∑x, ∑y, ∑x², (∑x)², and ∑xy.
c. Find a, the slope of the least-squares line, and b, the y-intercept of the
least-squares line.
d. Draw the least-squares line on the scatter diagram from part a.
e. Is the point (x̅, y̅) on the least-squares line?
f. Use the equation of the least-squares line to predict the value of y when x = 3.4.
g. Find, to the nearest hundredth, the linear correlation coefficient.

Name:__________________________________ Year/Section_________ Score________

Write your answers here!
