Correlation & Regression

The document provides an overview of correlation and regression analysis, detailing their purposes, methods, and key differences. It explains correlation as a measure of the linear relationship between two variables, while regression models the relationship between a dependent variable and one or more independent variables. Additionally, it covers calculation methods for Karl Pearson's and Spearman's coefficients, along with examples and the relationship between correlation and regression coefficients.

Uploaded by

KHILY SAXENA

Correlation and Regression Analysis
DR. REKHA PRASAD
IM, BHU
INTRODUCTION
 Correlation and regression analysis are two statistical methods used to
explore relationships between variables, but they serve different purposes.
1. Correlation Analysis
 Correlation measures the strength and direction of the linear
relationship between two variables.
 The result is expressed as the correlation coefficient (r), which
ranges from -1 to +1:
 r = +1: Perfect positive linear relationship.
 r = -1: Perfect negative linear relationship.
 r = 0: No linear relationship.
Example: If you measure hours studied and exam scores, correlation tells you whether more
study hours tend to go with higher scores (it does not by itself show that one causes the other).
CONT…

2. Regression Analysis
 Regression goes a step further by modeling the relationship between a
dependent variable (outcome) and one or more independent variables
(predictors).
 It provides a mathematical equation, like Y = a + bX, where:
 Y is the dependent variable,
 X is the independent variable,
 b is the slope (rate of change),
 a is the intercept.
 Types of Regression:
 Simple Linear Regression: Examines one independent variable.
 Multiple Regression: Examines several independent variables.
KEY DIFFERENCES BETWEEN CORRELATION AND REGRESSION
Feature      Correlation Analysis             Regression Analysis
Objective    Measures relationship strength   Predicts or explains values
Output       Correlation coefficient (r)      Regression equation
Dependency   Symmetrical (no distinction)     Dependent & independent defined
KARL PEARSON’S COEFFICIENT OF CORRELATION
 Karl Pearson's Coefficient of Correlation is a statistical measure that
quantifies the strength and direction of the linear relationship between two
variables. It is represented by the symbol 'r' and ranges from -1 to +1.
Here's how to interpret it:
 +1: Perfect positive correlation (as one variable increases, the other
increases proportionally).
 -1: Perfect negative correlation (as one variable increases, the other
decreases proportionally).
 0: No linear correlation (the variables may still be related in a non-linear way).
CONT…
 The formula for calculating Karl Pearson’s coefficient is:
 r = {n∑xy − ∑x∑y} / √{[n∑x² − (∑x)²][n∑y² − (∑y)²]}
where:
 x and y: The two variables being compared.
 n: The number of paired observations.
 ∑: Denotes summation.
 It is widely used in fields like economics, social sciences, and engineering.
PROBLEM

 EXAMPLE: The heights (in cm) and weights (in kg) of 5 individuals are given below:
Height (X) Weight (Y)
150 50
160 55
165 58
170 60
175 63
 Calculate Karl Pearson's coefficient of correlation r for the given
data.
CONT…
 SOLUTION:
 Step 1: Create a table with additional columns
 We’ll compute:
 X²
 Y²
 XY
CONT…

X     Y    X²      Y²     XY
150   50   22500   2500   7500
160   55   25600   3025   8800
165   58   27225   3364   9570
170   60   28900   3600   10200
175   63   30625   3969   11025
∑X = 820   ∑Y = 286   ∑X² = 134850   ∑Y² = 16458   ∑XY = 47095
 Step 2: Substitute the values in the formula for correlation
 n = 5, ∑X = 820, ∑Y = 286, ∑X² = 134850, ∑Y² = 16458, ∑XY = 47095
 r = {5(47095) − (820)(286)} / √{[5(134850) − (820)²][5(16458) − (286)²]}
 Step 3: Simplify
 r = 955 / √(1850 × 494) = 955 / 955.9812 ≈ 0.998974
 Result:
 There is a very strong positive linear correlation between height and weight (r ≈ 0.999).
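
The calculation above can be cross-checked with a short Python sketch. The function name `pearson_r` is an illustrative choice, not something from the slides; it simply evaluates the raw-sum formula used in Step 2.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums (n, sum x, sum y, sum x^2, sum y^2, sum xy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Height (cm) and weight (kg) data from the worked example.
heights = [150, 160, 165, 170, 175]
weights = [50, 55, 58, 60, 63]

r = pearson_r(heights, weights)
print(round(r, 4))  # 0.999
```

Running it also confirms the column total ∑X² = 134850, which is the value the simplification in Step 3 relies on.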
SPEARMAN’S COEFFICIENT OF CORRELATION
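 Spearman's Rank Correlation Coefficient, often denoted rs, measures the
strength and direction of the monotonic relationship between two ranked
variables. It is particularly useful for ordinal variables and for
relationships that are monotonic but not linear. Its formula is:
 rs = 1 − {6∑di²} / {n(n² − 1)}
 Where: di is the difference between the ranks of xi and yi, the i-th pair of values of the two variables being compared
 n is the number of pairs of data
 This form of the formula is exact when there are no tied ranks.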
CONT…
 Steps for Calculation:
 Rank the data: Assign ranks to the values in both variables, handling
tied ranks by assigning the average rank.
 Calculate the rank differences: Find the difference d between the
ranks for each pair of data.
 Square the rank differences: Compute d² for each pair.
 Substitute into the formula: Use the formula to calculate rs.
 Interpretation:
 rs = +1: Perfect positive monotonic relationship.
 rs= -1: Perfect negative monotonic relationship.
 rs = 0: No monotonic relationship.
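
The calculation steps above can be sketched in Python as follows. The helper names `average_ranks` and `spearman_rs` are hypothetical, and the sketch assumes rank 1 goes to the largest value; tied values share the average of the positions they occupy.

```python
def average_ranks(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based positions in the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_rs(xs, ys):
    """rs = 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical scores with one tie, to show the average-rank rule.
print(average_ranks([10, 20, 20, 30]))  # [4.0, 2.5, 2.5, 1.0]
```

When ties are present, the shortcut formula is only an approximation, so the result should be read with that caveat in mind.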
PROBLEM
 EXAMPLE: Suppose one is given the scores of 5 students in two subjects:
Student   Subject X   Subject Y
A         85          92
B         60          65
C         75          78
D         95          88
E         70          72
 Find the Spearman's Rank Correlation Coefficient.
CONT…
 SOLUTION:
 Step 1: Rank the data
 Assign ranks to the scores in each subject. Higher scores get lower rank (rank
1 for the highest score, and so on).

Student   Subject X   Rank X   Subject Y   Rank Y
A         85          2        92          1
B         60          5        65          5
C         75          3        78          3
D         95          1        88          2
E         70          4        72          4
CONT…
 Step 2: Calculate rank differences
 Compute the differences d = Rank X − Rank Y and square them (d²):
Student   Rank X   Rank Y   d    d²
A         2        1        1    1
B         5        5        0    0
C         3        3        0    0
D         1        2        -1   1
E         4        4        0    0
 Sum of d² = 1 + 0 + 0 + 1 + 0 = 2.
CONT…
 Step 3: Substitute into the formula
 rs = 1 − 6∑d²/{n(n² − 1)} = 1 − (6*2)/{5(25 − 1)} = 1 − 12/120 = 1 − 0.1 = 0.9
 Final Answer:
 The Spearman’s Rank Correlation Coefficient is 0.9, indicating a strong
positive relationship between the two subjects.
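
As a cross-check on this example, the sketch below recomputes the ranks and rs in Python, and also applies Pearson's formula directly to the ranks, which should give the same value when there are no ties. The helper name `ranks_desc` is hypothetical. Ranked from highest to lowest, student C (75 and 78) comes 3rd and student E (70 and 72) comes 4th in both subjects, so their rank differences are 0.

```python
from math import sqrt

def ranks_desc(values):
    # Rank 1 goes to the highest score; this simple rule assumes no tied scores.
    return [1 + sum(other > v for other in values) for v in values]

subject_x = [85, 60, 75, 95, 70]  # students A..E
subject_y = [92, 65, 78, 88, 72]

rank_x = ranks_desc(subject_x)  # [2, 5, 3, 1, 4]
rank_y = ranks_desc(subject_y)  # [1, 5, 3, 2, 4]

n = len(rank_x)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
rs_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Pearson's r computed on the ranks themselves.
sx, sy = sum(rank_x), sum(rank_y)
sxx = sum(a * a for a in rank_x)
syy = sum(b * b for b in rank_y)
sxy = sum(a * b for a, b in zip(rank_x, rank_y))
rs_pearson = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(sum_d2, rs_formula, rs_pearson)
```

Both routes should agree with the hand calculation (∑d² = 2 and rs = 0.9).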
LINES AND EQUATIONS OF REGRESSION
 The lines and equations of regression are used in statistics to model the
relationship between two variables, typically denoted as X (independent
variable) and Y (dependent variable). There are two regression lines:
 1. Regression Line of Y on X
 This line predicts the values of Y based on given values of X. The equation is:
 Y=a+bX
 Where:
 a is the intercept (value of Y when X = 0)
 b is the slope or regression coefficient, which shows the change in Y for a unit
change in X.
 The slope b is calculated as:
 b = {n∑XY − ∑X∑Y}/{n∑X² − (∑X)²}
CONT…
 2. Regression Line of X on Y
 This line predicts the values of X based on given values of Y. The equation is:
 X=c+dY
 Where:
 c is the intercept (value of X when Y=0).
 d is the slope or regression coefficient for X on Y.
 The slope d is calculated as:
 d = {n∑XY − ∑X∑Y}/{n∑Y² − (∑Y)²}
 Key Points to Note:
 The two regression lines intersect at the point (X̄, Ȳ), where X̄ and
Ȳ are the means of X and Y, respectively.
 If the correlation between X and Y is perfect (r = ±1), the two lines
coincide in a single line.
PROBLEM
 EXAMPLE: The following data show the marks obtained by 5 students in
Mathematics (X) and Science (Y):
Student   Mathematics (X)   Science (Y)
A         10                15
B         20                25
C         30                35
D         40                45
E         50                50
 Find the regression line of Y on X and of X on Y.
CONT…
 SOLUTION:
 Step 1: Calculate required values

Student   X     Y     X×Y    X²     Y²
A         10    15    150    100    225
B         20    25    500    400    625
C         30    35    1050   900    1225
D         40    45    1800   1600   2025
E         50    50    2500   2500   2500
TOTAL     150   170   6000   5500   6600
CONT…
 Step 2: Regression line of Y on X
 Equation of the line: Y = a + bX, where b = {n∑XY − ∑X∑Y}/{n∑X² − (∑X)²}
 Substitute the values:
b = (5*6000 − 150*170)/(5*5500 − 150*150) = 4500/5000 = 0.9
 The intercept a is: a = ∑Y/n − b⋅∑X/n = 170/5 − (0.9)*150/5 = 34 − 27 = 7
 So, the regression line of Y on X is:
 Y=7+0.9X
 Step 3: Regression line of X on Y
 Equation of the line: X = c + dY, where d = {n∑XY − ∑X∑Y}/{n∑Y² − (∑Y)²}
 Substitute the values:
d = (5*6000 − 150*170)/(5*6600 − 170*170) = 4500/4100 ≈ 1.1
 The intercept c is: c = ∑X/n − d⋅∑Y/n = 150/5 − (1.1)*170/5 = 30 − 37.4 = −7.4 (with the unrounded slope d ≈ 1.0976, c ≈ −7.32)
 So, the regression line of X on Y is approximately:
 X = −7.4 + 1.1Y
CONT…
 Final Answer:
 Regression line of Y on X: Y=7+0.9X
 Regression line of X on Y: X=−7.4+1.1Y
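
The two regression lines can be reproduced with a small Python sketch. The function name `regression_line` is an illustrative assumption; it applies the slope and intercept formulas from the solution, and the X-on-Y line is obtained by swapping the roles of the two variables.

```python
def regression_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

maths = [10, 20, 30, 40, 50]    # X
science = [15, 25, 35, 45, 50]  # Y

a, b = regression_line(maths, science)  # Y on X
c, d = regression_line(science, maths)  # X on Y: swap the roles of the variables

print(f"Y = {a:.2f} + {b:.2f}X")  # Y = 7.00 + 0.90X
print(f"X = {c:.2f} + {d:.2f}Y")  # X = -7.32 + 1.10Y

# Both lines pass through the point of means (30, 34).
print(round(a + b * 30, 6), round(c + d * 34, 6))
```

The intercept of the X-on-Y line comes out near −7.32 here because the slope is kept unrounded (4500/4100 ≈ 1.0976); rounding the slope to 1.1 first gives the −7.4 shown in the hand calculation.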
RELATION BETWEEN CORRELATION COEFFICIENT AND REGRESSION COEFFICIENT
 The relationship between the correlation coefficient (r) and the regression
coefficient (b) is rooted in how each describes the linear relationship
between two variables.
 Key Connections:
 Correlation Coefficient (r):
 It measures the strength and direction of the linear relationship between two
variables.
 It is a standardized value that ranges from −1 to +1.
 −1 indicates a perfect negative linear correlation, +1 indicates a perfect positive
linear correlation, and 0 indicates no linear correlation.
CONT…
 Regression Coefficient (b):
 It represents the slope of the regression line in the linear regression
equation y=a+bx, where a is the intercept.
 It quantifies how much the dependent variable (y) changes for a one-unit
change in the independent variable (x).
 Mathematical Relationship: The correlation coefficient is related to the
regression coefficients as:
 r = ±√(byx⋅bxy)
 where byx is the regression coefficient of y on x, and bxy is the regression
coefficient of x on y.
 The sign of r is the common sign of byx and bxy: positive when y tends to
increase with x, negative when it tends to decrease.
CONT…
 Standard Deviations and Regression Coefficient: The regression
coefficient is influenced by the standard deviations of the variables:
 byx=r⋅σy/σx
 bxy=r⋅σx/σy
 Here, σx and σy are the standard deviations of x and y, respectively.
 In summary, the correlation coefficient is a unitless measure of linear
association, while regression coefficients express the nature of the
relationship in terms of the units of the variables. Both are interrelated, as
r can be derived from the regression coefficients.
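
The identity byx = r⋅σy/σx can be checked numerically. The sketch below reuses the height and weight data from the Pearson example and assumes population standard deviations (dividing by n); since only the ratio σy/σx enters, sample standard deviations would give the same slope.

```python
from math import sqrt

heights = [150, 160, 165, 170, 175]  # X, from the height/weight example
weights = [50, 55, 58, 60, 63]       # Y
n = len(heights)

mean_x, mean_y = sum(heights) / n, sum(weights) / n
# Population standard deviations and covariance (divide by n).
sd_x = sqrt(sum((x - mean_x) ** 2 for x in heights) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in weights) / n)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights)) / n

r = cov / (sd_x * sd_y)

# Slope of Y on X from the raw-sum formula for the regression coefficient.
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)
b_yx = (n * sum_xy - sum(heights) * sum(weights)) / (n * sum_x2 - sum(heights) ** 2)

print(round(b_yx, 4), round(r * sd_y / sd_x, 4))  # 0.5162 0.5162
```

The slope carries units (kg per cm in this data), while r does not, which is the distinction the summary draws.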
PROBLEM
 EXAMPLE:
 Consider the following dataset of two variables x and y:
x   y
1   2
2   4
3   6
4   8
 SOLUTION:
 Step 1: Calculate r, byx, bxy from the given data
 For this one requires the values of ∑X, ∑Y, ∑X², ∑Y², ∑XY
CONT…
 Calculate these values

X       Y    X²   Y²    XY
1       2    1    4     2
2       4    4    16    8
3       6    9    36    18
4       8    16   64    32
Total:  10 (∑X)   20 (∑Y)   30 (∑X²)   120 (∑Y²)   60 (∑XY)
 With n = 4 pairs of observations:
 r = (4*60 − 10*20)/√{[4*30 − (10)(10)][4*120 − (20)(20)]}
 = (240 − 200)/√{(120 − 100)(480 − 400)}
 = 40/√(20*80) = 40/√1600 = 40/40 = 1
 byx = {n∑XY − ∑X∑Y}/{n∑X² − (∑X)²} = (4*60 − 10*20)/(4*30 − (10)(10))
 = (240 − 200)/(120 − 100) = 40/20 = 2
CONT…
 bxy = {n∑XY − ∑X∑Y}/{n∑Y² − (∑Y)²}
 = 40/(4*120 − 400) = 40/(480 − 400) = 40/80 = ½
 byx⋅bxy = 2*(1/2) = 1, so √(byx⋅bxy) = 1 = r (positive root, since both coefficients are positive)
 Result:
 Verified: r = ±√(byx⋅bxy)
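
The same check can be run in Python, which also makes the choice of sign explicit. The helper names `slope` and `r_from_slopes` are hypothetical, and the reversed dataset is invented for illustration only; it is not part of the slides.

```python
from math import sqrt, copysign

def slope(xs, ys):
    """Regression coefficient of ys on xs from raw sums."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sum(xs) * sum(ys)) / (n * sum(x * x for x in xs) - sum(xs) ** 2)

def r_from_slopes(xs, ys):
    byx, bxy = slope(xs, ys), slope(ys, xs)
    # byx and bxy always share a sign, so r takes that sign too.
    return copysign(sqrt(byx * bxy), byx)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(slope(x, y), slope(y, x), r_from_slopes(x, y))  # 2.0 0.5 1.0

y_reversed = [8, 6, 4, 2]  # hypothetical data, not from the slides
print(r_from_slopes(x, y_reversed))  # -1.0
```

Note that the dataset has n = 4 pairs, which is what `len(xs)` supplies in the sums.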
