Linear Correlation and Regression

Correlation and regression are statistical techniques used to study the relationship between two quantitative variables. Correlation measures the degree of relationship without implying causation, while regression predicts one variable from the other. A scatter plot displays the data points and can reveal whether the relationship is positive, negative, or absent. The correlation coefficient measures the strength and direction of the linear relationship and ranges from -1 to +1. Regression finds the "best-fit" straight line by least squares, which minimizes the sum of squared residuals, and uses the resulting regression equation to predict values.
Correlation and Regression
Correlation
- finding the relationship between two quantitative variables without being able to infer causal relationships
- a statistical technique used to determine the degree to which two variables are related
Scatter diagram
• Plotted on rectangular coordinates
• Shows two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
• Points are not joined
Example. The data are weight in kilograms and systolic blood pressure (SBP) for 10 subjects.

Wt (kg)       67   69   85   83   74   81   97   92  114   85
SBP (mmHg)   120  125  140  160  130  180  150  140  200  130

[Figure: Scatter diagram of weight and systolic blood pressure; x-axis Wt (kg), 60 to 120; y-axis SBP (mmHg), 80 to 220]

Scatter plots

The pattern of data is indicative of the type of relationship between your two variables:
• positive relationship
• negative relationship
• no relationship

[Figure: Positive relationship; Height in CM plotted against Age in Weeks]
[Figure: Negative relationship; Reliability plotted against Age of Car]
[Figure: No relation]
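The direction seen in a scatter diagram can also be checked numerically. The sketch below is a minimal illustration that assumes the sign of the sample covariance as the test (its sign always matches the sign of r); the function name `direction` is hypothetical, and it is applied to the weight and SBP data from the example above. Reporting "none" only when the covariance is exactly zero is a simplification: a patternless scatter will usually give a small but nonzero value.

```python
def direction(xs, ys):
    """Hypothetical helper: classify direction by the sign of the sample covariance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    cov = s_xy / (n - 1)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"  # exactly zero covariance only


# Weight (kg) and systolic blood pressure (mmHg) from the example
weight = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]
print(direction(weight, sbp))  # positive
```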
Correlation Coefficient
- Statistic showing the degree of relation between two variables

Simple Correlation Coefficient (r)

• Also called Pearson's correlation or the product moment correlation coefficient.
• Measures the nature and strength of the linear relationship between two variables of the quantitative type.
• The sign of r denotes the nature (direction) of the association,
• while the value of r denotes the strength of the association.

• If the sign is positive (+), the relation is direct: an increase in one variable is associated with an increase in the other variable, and a decrease in one variable is associated with a decrease in the other variable.

• If the sign is negative (-), the relationship is inverse or indirect: an increase in one variable is associated with a decrease in the other.
• The value of r ranges between -1 and +1.
• The value of r denotes the strength of the association, as illustrated by the following diagram.

[Diagram: scale of r from -1 to +1. Indirect side: -1 is perfect indirect correlation, -1 to -0.75 strong, -0.75 to -0.25 intermediate, -0.25 to 0 weak. At 0: no relation. Direct side: 0 to 0.25 weak, 0.25 to 0.75 intermediate, 0.75 to 1 strong, +1 is perfect direct correlation.]
If r = 0, there is no (linear) association or correlation between the two variables.

The strength categories below apply to the absolute value |r|; the sign gives the direction.

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
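The cut-offs above can be written as a small lookup. This is a sketch: the function name `describe_r` is hypothetical, and it simply encodes the slide's thresholds applied to |r|, which are a rule of thumb rather than a universal standard.

```python
def describe_r(r):
    """Label r with the slide's strength categories (applied to |r|) and direction."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    # Positive r means a direct relation, negative r an indirect (inverse) one
    sense = "direct" if r > 0 else "indirect"
    return f"{strength} {sense} correlation"


print(describe_r(0.76))   # strong direct correlation
print(describe_r(-0.94))  # strong indirect correlation
```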
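How to compute the simple correlation coefficient (r)

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$

or, equivalently (multiplying the numerator and denominator by n),

$$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
$$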
Example:
A sample of 6 children was selected, and data about their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.

Serial No.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
These two variables are both quantitative. One variable (age) is called the independent variable and is denoted X; the other (weight) is called the dependent variable and is denoted Y. To find the relation between age and weight, compute the simple correlation coefficient using the following formula:

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$
Serial No.   Age x (yrs)   Weight y (kg)   xy          x²          y²
1            7             12              84          49          144
2            6             8               48          36          64
3            8             12              96          64          144
4            5             10              50          25          100
5            6             11              66          36          121
6            9             13              117         81          169
Total        Σx = 41       Σy = 66         Σxy = 461   Σx² = 291   Σy² = 742

$$
r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{(41)^2}{6}\right)\left(742 - \dfrac{(66)^2}{6}\right)}} = \frac{10}{\sqrt{10.83 \times 16}} \approx 0.76
$$

r ≈ 0.76: strong direct correlation
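The arithmetic of this example can be checked with a short script. The sketch below, with a hypothetical helper `pearson_r_from_sums`, applies the computational formula above to the column totals; it assumes ordinary Python floats are precise enough for data of this size.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Simple (Pearson) correlation coefficient from the column totals."""
    s_xy = sum_xy - sum_x * sum_y / n   # corrected sum of products
    s_xx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares of x
    s_yy = sum_y2 - sum_y ** 2 / n      # corrected sum of squares of y
    return s_xy / math.sqrt(s_xx * s_yy)


age = [7, 6, 8, 5, 6, 9]          # x, years
weight = [12, 8, 12, 10, 11, 13]  # y, kg
r = pearson_r_from_sums(
    len(age),
    sum(age),
    sum(weight),
    sum(x * y for x, y in zip(age, weight)),
    sum(x * x for x in age),
    sum(y * y for y in weight),
)
print(round(r, 3))  # 0.76
```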
EXAMPLE: Relationship between Anxiety and Test Scores

Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
ΣX = 32       ΣY = 32          ΣX² = 230   ΣY² = 204   ΣXY = 129
Calculating Correlation Coefficient

$$
r = \frac{(6)(129) - (32)(32)}{\sqrt{\left[6(230) - 32^2\right]\left[6(204) - 32^2\right]}} = \frac{774 - 1024}{\sqrt{(356)(200)}} \approx -0.94
$$

r = -0.94

Indirect strong correlation
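The same check for the anxiety example, this time using the form of the formula with n multiplied through, as in the calculation above. The helper name `pearson_r` is hypothetical; it works directly from the raw X and Y values.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx**2) * (n*Syy - Sy**2)) from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


anxiety = [10, 8, 2, 1, 5, 6]    # X
test_score = [2, 3, 9, 7, 6, 5]  # Y
r = pearson_r(anxiety, test_score)
print(round(r, 2))  # -0.94
```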


Regression Analyses

Regression: a technique concerned with predicting some variables from knowledge of others; that is, the process of predicting variable Y using variable X.

Regression
• Uses a variable (x) to predict some outcome variable (y)
• Tells you how values in y change as a function of changes in values of x
Correlation and Regression
• Correlation describes the strength of a linear relationship between two variables
• Linear means “straight line”
• Regression tells us how to draw the straight line described by the correlation
Regression
• Calculates the “best-fit” line for a certain set of data
• Minimizes the residuals: the regression line makes the sum of the squares of the residuals smaller than for any other line

[Figure: scatter diagram of the weight and SBP data; x-axis Wt (kg), 60 to 120; y-axis 80 to 220]
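The least-squares property can be illustrated numerically. The sketch below fits the line for the weight and SBP data using the deviation form b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is algebraically equivalent to the sum formula, and confirms that moving the intercept or slope away from the fitted values increases the sum of squared residuals. The function names and the particular perturbations tried are illustrative assumptions.

```python
def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line (deviation form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


weight = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

a, b = least_squares(weight, sbp)
best = sse(a, b, weight, sbp)

# Every other line tried here has a larger sum of squared residuals
for da, db in [(5, 0), (-5, 0), (0, 0.1), (0, -0.1), (2, -0.05)]:
    assert sse(a + da, b + db, weight, sbp) > best

print(f"a = {a:.2f}, b = {b:.3f}, minimum SSE = {best:.1f}")
```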
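By using the least squares method (a procedure that minimizes the sum of the squared vertical deviations of the plotted points from a straight line), we are able to construct a best-fitting straight line through the scatter diagram points and then formulate a regression equation in the form:

$$
\hat{y} = a + bX
$$

where the slope b (also written b_1) and the intercept a are

$$
b = b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}, \qquad a = \bar{y} - b\bar{x},
$$

so the equation can equivalently be written

$$
\hat{y} = \bar{y} + b(x - \bar{x})
$$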
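A minimal sketch of the formulas above, assuming plain Python lists of paired observations; `fit_line` and `predict` are hypothetical names. It computes b from the column sums exactly as written, sets a = ȳ - b x̄, and checks that the two forms of the regression equation give the same prediction. It is applied to the weight and SBP data, for which the slides do not state fitted coefficients, so no expected values are shown.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b computed from the column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n  # a = y-bar - b * x-bar
    return a, b


def predict(a, b, x):
    """Evaluate the regression equation y-hat = a + b*x."""
    return a + b * x


weight = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]
a, b = fit_line(weight, sbp)
x_bar = sum(weight) / len(weight)
y_bar = sum(sbp) / len(sbp)

# Both forms of the equation give the same prediction at a new x
x_new = 100
print(predict(a, b, x_new), y_bar + b * (x_new - x_bar))
```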
Regression Equation
• The regression equation describes the regression line mathematically
• Intercept
• Slope
[Figure: scatter diagram of SBP (mmHg), 80 to 220, against Wt (kg), 60 to 120]
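Linear Equation

$$
\hat{y} = a + bX, \qquad b = \text{slope} = \frac{\text{change in } Y}{\text{change in } X}, \qquad a = Y\text{-intercept}
$$

[Figure: straight line on X and Y axes showing the intercept a and the slope b]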
Hours studying and grades
Regressing grades on hours

[Figure: Linear Regression; final grade in course (70 to 90) against number of hours spent studying (2 to 10); fitted line: Final grade in course = 59.95 + 3.17 * study; R-Square = 0.88]

Predicted final grade in class = 59.95 + 3.17 × (number of hours you study per week)

Predict the final grade of:

• Someone who studies for 12 hours
  • Final grade = 59.95 + (3.17 × 12)
  • Final grade = 97.99

• Someone who studies for 1 hour
  • Final grade = 59.95 + (3.17 × 1)
  • Final grade = 63.12
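The two predictions are just evaluations of the fitted line. A sketch, with `predicted_grade` as a hypothetical name and the coefficients 59.95 and 3.17 taken from the slide:

```python
def predicted_grade(hours):
    """Fitted line from the slide: final grade = 59.95 + 3.17 * hours."""
    return 59.95 + 3.17 * hours


for hours in (12, 1):
    print(hours, round(predicted_grade(hours), 2))
# 12 97.99
# 1 63.12
```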
2. A sample of 6 persons was selected; their age (x variable, in years) and their weight (y, in kg) are shown in the following table. Find the regression equation, and the predicted weight when age is 8.5 years.

Serial No.   Age (x)   Weight (y)
1            7         12
2            6         8
3            8         12
4            5         10
5            6         11
6            9         13
Solution:

Serial No.   Age (x)   Weight (y)   xy    x²    y²
1            7         12           84    49    144
2            6         8            48    36    64
3            8         12           96    64    144
4            5         10           50    25    100
5            6         11           66    36    121
6            9         13           117   81    169
Total        41        66           461   291   742


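$$
\bar{x} = \frac{41}{6} \approx 6.833
$$

$$
b = \frac{461 - \dfrac{41 \times 66}{6}}{291 - \dfrac{(41)^2}{6}} = \frac{10}{10.83} \approx 0.923
$$

$$
\bar{y} = \frac{66}{6} = 11
$$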

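Regression equation

$$
\hat{y}(x) = 11 + 0.923(x - 6.833)
$$

$$
\hat{y}(x) \approx 4.69 + 0.923x
$$

$$
\hat{y}(8.5) \approx 4.69 + 0.923 \times 8.5 \approx 12.54 \text{ kg}
$$

$$
\hat{y}(7.5) \approx 4.69 + 0.923 \times 7.5 \approx 11.61 \text{ kg}
$$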


[Figure: regression line of Weight (in Kg), 11.4 to 12.6, against Age (in years), 7 to 9]

We draw the regression line by plotting two estimated values of y (ŷ) against their x values, joining the two points, and extending the line to the right and left.
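The whole calculation for this example can be reproduced in a few lines. This sketch works from the raw table values rather than rounded intermediate results, so its output is rounded to one decimal place for the predictions; the variable names are illustrative.

```python
age = [7, 6, 8, 5, 6, 9]          # x, years
weight = [12, 8, 12, 10, 11, 13]  # y, kg
n = len(age)

x_bar = sum(age) / n
y_bar = sum(weight) / n
# Corrected sum of products and sum of squares, as in the formula for b
s_xy = sum(x * y for x, y in zip(age, weight)) - sum(age) * sum(weight) / n
s_xx = sum(x * x for x in age) - sum(age) ** 2 / n

b = s_xy / s_xx        # slope
a = y_bar - b * x_bar  # intercept

print(round(a, 2), round(b, 3))  # 4.69 0.923
for x in (8.5, 7.5):
    print(x, round(a + b * x, 1))
# 8.5 12.5
# 7.5 11.6
```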
The following are the age (in years) and systolic blood pressure of 20 apparently healthy adults.

Age (x)   B.P (y)      Age (x)   B.P (y)
20        120          46        128
43        128          53        136
63        141          60        146
26        126          20        124
53        134          63        143
31        128          43        130
58        136          26        124
46        132          19        121
58        140          31        126
70        144          23        123

• Find the correlation between age and blood pressure using simple and Spearman's correlation coefficients, and comment.
• Find the regression equation.
• What is the predicted blood pressure for a man aged 25 years?
Serial   x     y      xy       x²
1        20    120    2400     400
2        43    128    5504     1849
3        63    141    8883     3969
4        26    126    3276     676
5        53    134    7102     2809
6        31    128    3968     961
7        58    136    7888     3364
8        46    132    6072     2116
9        58    140    8120     3364
10       70    144    10080    4900
11       46    128    5888     2116
12       53    136    7208     2809
13       60    146    8760     3600
14       20    124    2480     400
15       63    143    9009     3969
16       43    130    5590     1849
17       26    124    3224     676
18       19    121    2299     361
19       31    126    3906     961
20       23    123    2829     529
Total    852   2630   114486   41678

$$
b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{114486 - \dfrac{852 \times 2630}{20}}{41678 - \dfrac{(852)^2}{20}} = \frac{2448}{5382.8} \approx 0.4548
$$

$$
\bar{x} = \frac{852}{20} = 42.6, \qquad \bar{y} = \frac{2630}{20} = 131.5, \qquad a = \bar{y} - b_1\bar{x} = 131.5 - 0.4548 \times 42.6 \approx 112.13
$$

ŷ = 112.13 + 0.4548x

For age 25:
B.P = 112.13 + 0.4548 × 25 ≈ 123.5 mmHg
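The totals, coefficients and prediction in this example can be verified from the raw data. This sketch also computes the simple correlation coefficient r, which the question asks for but the worked solution does not show; Spearman's coefficient is not computed here. Variable names are illustrative.

```python
import math

age = [20, 43, 63, 26, 53, 31, 58, 46, 58, 70,
       46, 53, 60, 20, 63, 43, 26, 19, 31, 23]           # x, years
bp = [120, 128, 141, 126, 134, 128, 136, 132, 140, 144,
      128, 136, 146, 124, 143, 130, 124, 121, 126, 123]  # y, mmHg

n = len(age)
sum_x, sum_y = sum(age), sum(bp)
sum_xy = sum(x * y for x, y in zip(age, bp))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in bp)

# Corrected sums used by both the slope and r
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

b1 = s_xy / s_xx                # slope
a = sum_y / n - b1 * sum_x / n  # intercept
r = s_xy / math.sqrt(s_xx * s_yy)

print(sum_x, sum_y, sum_xy, sum_x2)  # 852 2630 114486 41678
print(round(b1, 4), round(a, 2))     # 0.4548 112.13
print(round(a + b1 * 25, 1))         # 123.5
print(round(r, 3))
```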
Multiple Regression
Multiple regression analysis is a straightforward
extension of simple regression analysis which allows
more than one independent variable.
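Multiple regression fits ŷ = a + b1x1 + b2x2 + ... + bkxk by the same least-squares principle. As a rough sketch only, the code below solves the normal equations (XᵀX)w = Xᵀy with a small Gaussian-elimination routine. The data are hypothetical, generated exactly from y = 1 + 2x1 + 3x2 so the recovered coefficients can be checked, and the function names are hypothetical too. For real work a numerical or statistics library would be the usual choice, since forming XᵀX explicitly can be numerically unstable.

```python
def solve(A, v):
    """Solve the square system A w = v by Gaussian elimination with partial pivoting."""
    m = len(A)
    # Augmented matrix [A | v], copied so the inputs are not modified
    M = [[float(c) for c in row] + [float(v[i])] for i, row in enumerate(A)]
    for col in range(m):
        # Swap in the row with the largest entry in this column
        pivot = max(range(col, m), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot
        for i in range(col + 1, m):
            factor = M[i][col] / M[col][col]
            for j in range(col, m + 1):
                M[i][j] -= factor * M[col][j]
    # Back substitution on the upper-triangular system
    w = [0.0] * m
    for i in range(m - 1, -1, -1):
        acc = sum(M[i][j] * w[j] for j in range(i + 1, m))
        w[i] = (M[i][m] - acc) / M[i][i]
    return w


def multiple_regression(X, y):
    """Least-squares [a, b1, ..., bk] for y-hat = a + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # leading 1 carries the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coeffs = multiple_regression(X, y)
print([round(w, 6) for w in coeffs])  # [1.0, 2.0, 3.0]
```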
