UNIT 4
CORRELATION AND REGRESSION
Correlation: Correlation is the study of the relationship between two or more variables, that is, whether and how closely they tend to change together.
Uses of correlation: Correlation is very useful in the physical and social sciences, business and economics. Economists use it to study the relationship between price and demand, and to estimate costs, sales, prices and other related variables.
Variables: Examples include cost, sales, price, income, expenditure, investment, return, loans given to customers, loan amounts received from customers, students who appeared for an examination, students who passed an examination, etc.
Types of Correlation:
Positive correlation, Negative correlation, Simple correlation, Multiple correlation, Partial
correlation, Linear correlation and Non-linear correlation.
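Positive correlation:
When the two variables move in the same direction (both increase or both decrease together), the correlation is positive. For example:
Sales (Rs.)  : 1000  1600  2100  3000
Profit (Rs.) :  500   750   900  1500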
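Negative correlation:
When the two variables move in opposite directions (one increases as the other decreases), the correlation is negative. For example:
Supply of rice (in tonnes) : 100  98  85  120  70
Price                      :  35  36  38   30  40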
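To confirm the direction in each example, one can compute Karl Pearson's r for the listed values. The sketch below is a minimal illustration: the helper name `pearson_r` is hypothetical, and it applies the deviation-from-mean form of Pearson's formula (r = Σxy / √(Σx² Σy²)).

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: Karl Pearson's r using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # x = X - mean of X
    dy = [y - mean_y for y in ys]          # y = Y - mean of Y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)

# Positive correlation example: sales and profit (Rs.)
sales = [1000, 1600, 2100, 3000]
profit = [500, 750, 900, 1500]

# Negative correlation example: rice supply (tonnes) and price
supply = [100, 98, 85, 120, 70]
price = [35, 36, 38, 30, 40]

print(round(pearson_r(sales, profit), 4))   # a positive value
print(round(pearson_r(supply, price), 4))   # a negative value
```

Only the sign matters here: a positive r confirms that the variables move together, a negative r that they move in opposite directions.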
Simple correlation:
When only two variables are studied, it is said to be simple correlation.
For example, the study of age and consumption of milk.
Multiple correlation:
When more than two variables are studied simultaneously, the correlation is said to be multiple.
For example, the study of the price, demand and supply of a product.
Partial Correlation:
The partial correlation coefficient measures the relationship between a dependent variable and one particular independent variable when all the other variables involved are kept constant, that is, when the effect of all the other variables is removed.
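For three variables, the first-order partial correlation between variables 1 and 2 with variable 3 held constant is commonly computed from the three simple correlations as r12.3 = (r12 - r13·r23) / √((1 - r13²)(1 - r23²)). This formula is not stated in the notes; the sketch below assumes it, the function name `partial_r` is hypothetical, and the input values are purely illustrative.

```python
import math

def partial_r(r12, r13, r23):
    """Hypothetical helper (assumed standard formula): first-order partial
    correlation r12.3, i.e. the correlation between variables 1 and 2
    after removing the linear effect of variable 3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Illustrative simple correlations (not from the notes)
print(round(partial_r(0.8, 0.6, 0.5), 4))   # 0.7217

# If variable 3 is uncorrelated with both others, nothing is removed
print(partial_r(0.5, 0.0, 0.0))             # 0.5
```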
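Linear Correlation:
The correlation is said to be linear if the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. For example, if y = 3x + 2, then x = 1, 2, 3 gives y = 5, 8, 11: every unit increase in x raises y by exactly 3.
Non-linear Correlation:
The correlation is said to be non-linear (curvilinear) if the amount of change in one variable does not bear a constant ratio to the amount of change in the other.
Degree of correlation: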
Value of r (correlation coefficient)    Degree of correlation
0                                       No (zero) correlation
+1                                      Perfect positive correlation
-1                                      Perfect negative correlation
+0.7 to +0.99                           High degree of positive correlation
-0.7 to -0.99                           High degree of negative correlation
+0.36 to +0.69                          Moderate degree of positive correlation
-0.36 to -0.69                          Moderate degree of negative correlation
+0.01 to +0.35                          Low degree of positive correlation
-0.01 to -0.35                          Low degree of negative correlation
The value of the correlation coefficient always lies between -1 and +1, that is, -1 ≤ r ≤ +1.
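The table above can be turned into a small lookup. This is a sketch only: the function name `degree_of_correlation` is hypothetical, and because the table leaves small gaps between bands (for example between 0.69 and 0.7), values in a gap are assigned to the lower band here, which is an assumption of this sketch rather than a rule from the notes.

```python
def degree_of_correlation(r):
    """Hypothetical helper: label r using the bands in the table above.
    Values in the table's gaps (e.g. 0.695) fall into the lower band
    (an assumption of this sketch)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No (zero) correlation"
    sign = "positive" if r > 0 else "negative"
    a = abs(r)
    if a == 1:
        return f"Perfect {sign} correlation"
    if a >= 0.7:
        return f"High degree of {sign} correlation"
    if a >= 0.36:
        return f"Moderate degree of {sign} correlation"
    return f"Low degree of {sign} correlation"

# Values of r that appear in the worked problems of this unit
print(degree_of_correlation(-0.9192))  # High degree of negative correlation
print(degree_of_correlation(0.847))    # High degree of positive correlation
print(degree_of_correlation(0.48))     # Moderate degree of positive correlation
```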
Karl Pearson’s Coefficient of correlation
Formula:
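1. $r = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_x \sigma_y}$
2. $r = \dfrac{\sum xy}{N \sigma_x \sigma_y}$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviations from the actual means, $\sigma_x$ = standard deviation of X and $\sigma_y$ = standard deviation of Y.
3. $r = \dfrac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}}$
4. $r = \dfrac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$ (direct method, using the actual values)
5. $r = \dfrac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}}$ (assumed-mean method)
where $d_x = X - A$ and $d_y = Y - B$ are deviations from the assumed means A and B.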
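The deviation, direct and assumed-mean formulas are algebraically equivalent. The sketch below checks this numerically on the sales/profit data from the positive-correlation example; the function names are hypothetical, and the assumed means A = 2000 and B = 1000 are arbitrary choices for illustration.

```python
import math

def r_deviations(X, Y):
    """Formula 3: deviations from the actual means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    x = [v - mx for v in X]
    y = [v - my for v in Y]
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))

def r_direct(X, Y):
    """Formula 4: actual values, no deviations needed."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    syy = sum(v * v for v in Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2))

def r_assumed_mean(X, Y, A, B):
    """Formula 5: the direct formula applied to dx = X - A and dy = Y - B."""
    dx = [v - A for v in X]
    dy = [v - B for v in Y]
    return r_direct(dx, dy)

sales = [1000, 1600, 2100, 3000]
profit = [500, 750, 900, 1500]
print(r_deviations(sales, profit))
print(r_direct(sales, profit))
print(r_assumed_mean(sales, profit, A=2000, B=1000))  # A, B chosen arbitrarily
```

All three calls print the same value (up to floating-point rounding), because shifting every X by A and every Y by B does not change r.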
Spearman’s Rank Correlation coefficient
In 1904, the British psychologist Charles Edward Spearman devised the method of finding the coefficient of correlation of ranks.
Rank correlation is applicable to individual observations.
This measure is especially useful in dealing with qualitative characteristics (such as beauty or intelligence) that cannot be measured directly but can be ranked.
$R = 1 - \dfrac{6\sum D^2}{N^3 - N}$
where R = rank coefficient of correlation, D = difference between the two ranks of an item, $\sum D^2$ = sum of squares of the rank differences and N = number of pairs of observations.
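A minimal sketch of the rank-correlation formula follows. It assumes there are no tied values (the formula above is for untied ranks), the helper names are hypothetical, and the data from Problem 1 below are reused only for illustration.

```python
def ranks_descending(values):
    """Hypothetical helper: rank 1 = largest value.
    Assumes no ties; tied values would all get the same (best) rank here."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """R = 1 - 6*sum(D^2) / (N^3 - N) for untied ranks."""
    n = len(xs)
    rx = ranks_descending(xs)
    ry = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of D^2
    return 1 - 6 * sum_d2 / (n**3 - n)

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
print(ranks_descending(X))            # [3, 5, 1, 4, 2]
print(round(spearman_r(X, Y), 4))     # -0.9
```

Here ΣD² = 38 and N = 5, so R = 1 - 228/120 = -0.9. Ranking in ascending order instead gives the same R, since both sets of ranks are reversed together.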
Problem:1
Find Karl Pearson's coefficient of correlation for the following data:
X: 6 2 10 4 8
Y: 9 11 5 8 7
--------------------------------------------
         X     Y     X²     Y²     XY
--------------------------------------------
         6     9     36     81     54
         2    11      4    121     22
        10     5    100     25     50
         4     8     16     64     32
         8     7     64     49     56
--------------------------------------------
Total   30    40    220    340    214
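N = 5, ΣX = 30, ΣY = 40, ΣX² = 220, ΣY² = 340, ΣXY = 214
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} = \frac{(5 \times 214) - (30 \times 40)}{\sqrt{5 \times 220 - 30^2}\,\sqrt{5 \times 340 - 40^2}} = \frac{-130}{\sqrt{200}\,\sqrt{100}} = -0.9192$$
There is a high degree of negative correlation between X and Y.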
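The working table for Problem 1 can be reproduced in a few lines, which is a convenient way to check the column totals and the final value. This is a sketch using only the data given in the problem.

```python
import math

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
N = len(X)

# Column totals of the working table
sum_x = sum(X)                              # ΣX
sum_y = sum(Y)                              # ΣY
sum_x2 = sum(x * x for x in X)              # ΣX²
sum_y2 = sum(y * y for y in Y)              # ΣY²
sum_xy = sum(x * y for x, y in zip(X, Y))   # ΣXY

# Direct method (formula 4)
r = (N * sum_xy - sum_x * sum_y) / (
    math.sqrt(N * sum_x2 - sum_x**2) * math.sqrt(N * sum_y2 - sum_y**2))

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 30 40 220 340 214
print(round(r, 4))                           # -0.9192
```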
Problem:2
Calculate coefficient of correlation from the following data:
X: 100 101 102 102 100 99 97 98 96 95
Y: 98 99 99 97 95 92 95 94 90 91
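Coefficient of correlation $r = \dfrac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}}$, where x and y are the deviations of X and Y from their respective means.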
  X    x = X - X̄    x²     Y    y = Y - Ȳ    y²    xy
100        1         1     98        3         9     3
101        2         4     99        4        16     8
102        3         9     99        4        16    12
102        3         9     97        2         4     6
100        1         1     95        0         0     0
 99        0         0     92       -3         9     0
 97       -2         4     95        0         0     0
 98       -1         1     94       -1         1     1
 96       -3         9     90       -5        25    15
 95       -4        16     91       -4        16    16
ΣX = 990   Σx = 0   Σx² = 54   ΣY = 950   Σy = 0   Σy² = 96   Σxy = 61
Mean of X: X̄ = ΣX / N = 990 / 10 = 99        Mean of Y: Ȳ = ΣY / N = 950 / 10 = 95
$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} = \frac{61}{\sqrt{54}\,\sqrt{96}} = \frac{61}{72} = 0.847$$
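The same calculation, done by taking deviations from the actual means as in the table, can be sketched as follows. Note that Σx and Σy are both zero, which is a quick check that the means were computed correctly.

```python
import math

X = [100, 101, 102, 102, 100, 99, 97, 98, 96, 95]
Y = [98, 99, 99, 97, 95, 92, 95, 94, 90, 91]
N = len(X)

mean_x = sum(X) / N                 # 99.0
mean_y = sum(Y) / N                 # 95.0
x = [v - mean_x for v in X]         # deviations x = X - mean
y = [v - mean_y for v in Y]         # deviations y = Y - mean

sum_x2 = sum(a * a for a in x)             # Σx² = 54
sum_y2 = sum(b * b for b in y)             # Σy² = 96
sum_xy = sum(a * b for a, b in zip(x, y))  # Σxy = 61

r = sum_xy / (math.sqrt(sum_x2) * math.sqrt(sum_y2))
print(sum(x), sum(y))               # 0.0 0.0
print(round(r, 3))                  # 0.847
```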
Problem : 3
The covariance between X and Y is 10.6, and the variances of X and Y are 16 and 9 respectively. Find the correlation coefficient.
Variance = (Standard deviation)², so Standard deviation = √Variance.
$$r = \frac{\operatorname{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{10.6}{\sqrt{16}\,\sqrt{9}} = \frac{10.6}{4 \times 3} = \frac{10.6}{12} = 0.8833$$
Problem:4
The coefficient of correlation between two variables X and Y is 0.48, their covariance is 36 and the variance of X is 16. Find the standard deviation of the Y series.
$r = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_x \sigma_y}$, with r = 0.48 and Cov(X,Y) = 36.
Variance of X = $\sigma_x^2 = 16$, so $\sigma_x = \sqrt{16} = 4$.
$$0.48 = \frac{36}{4\sigma_y} \;\Rightarrow\; \sigma_y = \frac{36}{4 \times 0.48} = \frac{36}{1.92} = 18.75$$
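Problems 3 and 4 both use the single relation r = Cov(X,Y) / (σx σy), once solved for r and once rearranged for σy. A minimal sketch (the function names are hypothetical):

```python
import math

def correlation_from_cov(cov, var_x, var_y):
    """r = Cov(X, Y) / (σx σy), with each σ = √variance."""
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))

def sd_y_from_r(r, cov, var_x):
    """Same relation rearranged: σy = Cov(X, Y) / (r σx)."""
    return cov / (r * math.sqrt(var_x))

print(round(correlation_from_cov(10.6, 16, 9), 4))  # Problem 3: 0.8833
print(round(sd_y_from_r(0.48, 36, 16), 2))          # Problem 4: 18.75
```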
Regression:
Regression is the measure of the average relationship between two or more variables in terms of the
original units of the data.
Uses:
Regression is used to estimate the relationship between two variables, to predict unknown values, to forecast business conditions and to estimate the error in sampling.
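Regression equations
Regression equation of X on Y:
X = a + bY
Regression equation of Y on X:
Y = a + bX (the same form is used for the straight-line trend of a time series)
Here a is the intercept (a constant) and b is the regression coefficient, i.e. the slope of the line.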
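To determine the values of a and b in Y = a + bX, the following two normal equations are solved simultaneously:
$$\sum Y = Na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
When X is measured as deviations from its mean (as is done for a time-series trend with the middle year as origin), ΣX = 0 and the equations reduce to a = ΣY / N and b = ΣXY / ΣX².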
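The two normal equations form a 2×2 linear system in a and b. The sketch below solves it by Cramer's rule (a choice made here for brevity, not a method prescribed by the notes); `fit_y_on_x` is a hypothetical name. The regression of X on Y is obtained by swapping the roles of the two lists.

```python
def fit_y_on_x(X, Y):
    """Least-squares line Y = a + bX from the normal equations
        ΣY  = N a + b ΣX
        ΣXY = a ΣX + b ΣX²
    solved simultaneously with Cramer's rule."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(x * x for x in X)
    sxy = sum(x * y for x, y in zip(X, Y))
    det = n * sxx - sx * sx          # zero only if all X values are equal
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

print(fit_y_on_x([1, 2, 3], [5, 8, 11]))              # y = 3x + 2 -> (2.0, 3.0)
print(fit_y_on_x([6, 2, 10, 4, 8], [9, 11, 5, 8, 7]))  # Problem 1 data -> (11.9, -0.65)
print(fit_y_on_x([-1, 0, 1], [5, 8, 11]))             # ΣX = 0 -> (8.0, 3.0)
```

The last call illustrates the shortcut: with ΣX = 0, a = ΣY / N = 24 / 3 = 8 and b = ΣXY / ΣX² = 6 / 2 = 3.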
Problem:5
Given X̄ = 16, σx = 4.8, Ȳ = 20, σy = 9.6, and the coefficient of correlation between x and y is 0.6. What is the regression coefficient of x on y?
Regression coefficient of x on y:
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.6 \times \frac{4.8}{9.6} = 0.6 \times 0.5 = 0.3$$
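A short sketch computing both regression coefficients from r and the standard deviations. The full line of x on y uses the standard fact that each regression line passes through the point of means (x̄, ȳ); that step goes beyond what the problem asks and is included only as an illustration. The function name is hypothetical.

```python
def regression_coefficients(r, sd_x, sd_y):
    """Return (b_xy, b_yx): slopes of x on y and of y on x."""
    b_xy = r * sd_x / sd_y
    b_yx = r * sd_y / sd_x
    return b_xy, b_yx

mean_x, sd_x = 16, 4.8
mean_y, sd_y = 20, 9.6
r = 0.6

b_xy, b_yx = regression_coefficients(r, sd_x, sd_y)
print(round(b_xy, 4), round(b_yx, 4))   # 0.3 1.2

# Line of x on y through the means: x - mean_x = b_xy (y - mean_y)
intercept = mean_x - b_xy * mean_y
print(f"x = {round(intercept, 4)} + {round(b_xy, 4)} y")   # x = 10.0 + 0.3 y
```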
Problem:6
The correlation coefficient between x and y is -1/2 and bxy = -1/8. Find byx.
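$$r^2 = b_{xy} \cdot b_{yx}$$
$$\left(-\tfrac{1}{2}\right)^2 = -\tfrac{1}{8} \cdot b_{yx} \;\Rightarrow\; \tfrac{1}{4} = -\tfrac{1}{8}\, b_{yx} \;\Rightarrow\; b_{yx} = \frac{1/4}{-1/8} = -2$$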
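The relation r² = bxy · byx can be rearranged to recover one regression coefficient from the other. The sketch below also rejects inputs where r and bxy differ in sign, since both regression coefficients always share the sign of r. The function name is hypothetical.

```python
def b_yx_from(r, b_xy):
    """Recover b_yx from r and b_xy using r² = b_xy · b_yx."""
    if b_xy == 0:
        raise ValueError("b_xy must be non-zero")
    if r * b_xy < 0:
        # r, b_xy and b_yx always have the same sign
        raise ValueError("r and b_xy must have the same sign")
    return r**2 / b_xy

print(b_yx_from(-0.5, -1/8))   # -2.0
```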
POINTS:
Bivariate data: data collected on two variables simultaneously.
Uncorrelated variables: a change in one variable does not affect the other variable.
Karl Pearson's method is regarded as the best method of calculating the correlation coefficient.
Its main limitation is that it is applicable only to linear relationships.
The quickest method of finding correlation is the method of concurrent deviations.
Spurious (nonsense) correlation is correlation between two variables that have no causal relation.
The method applied for deriving the regression equations is the method of least squares.
Observed value - Estimated value = Error (or residual).
The two lines of regression coincide when r = -1 or r = +1.
The correlation coefficient is independent of the units of measurement.
If the sum of the products of the deviations of the x and y series from their means is zero, then the coefficient of correlation will be zero.
The linear equations y = a + bx and