
Correlation Analysis

Introduction:
So far we have studied problems relating to a single variable, such as measures of central tendency, dispersion, skewness and kurtosis. For example, one can find the mean height, mean weight or mean chest size of 50 persons. In all these examples we deal with one variable at a time. A natural question then arises: is there any relation between these variables, i.e. height, weight and chest size? If yes, how strong is the relation, and of what type is it? The same questions arise for pairs such as the ages of husbands and wives, income and expenditure, the price of a commodity and its demand, rainfall and crop yield, or price and supply. To answer such questions we must deal with two or more variables together.

Bivariate Data:

The data generated by observing two variables (X, Y) together are called bivariate data.

For example, if we observe a group of persons and note their heights and weights, if we observe a group of couples and note the ages of the husbands and wives, if we measure the heights of fathers along with those of their sons, if we record the monthly incomes of a group of persons together with their expenditures, or if we consider the average price and the demand of a commodity at various points of time, we are dealing with data on two variables, i.e. bivariate data. One of the variables is denoted by X and the other by Y.

Scatter diagram:

The graphical representation of bivariate data is called a scatter diagram. The diagram serves as an instrument for condensing bivariate data and indicates the type of relation existing between the concerned variables.
For example, bivariate data on the heights of fathers (x) and their sons (y) is given below:

    x:  64  65  66  67  67  68  68  69  70  72
    y:  62  63  68  68  66  70  69  72  68  71

The scatter diagram of these data shows that the heights of the sons increase as the heights of the fathers increase. Such behaviour of the diagram leads us to believe that some type of relation or association exists between the X-series and the Y-series.

[Figure: scatter diagram of the above data]

Correlation:
The term correlation (or covariation) denotes the relation between two variables in which changes in the values of one variable are accompanied by changes in the values of the other.

Some definitions of correlation:

"Correlation analysis deals with the association between two or more variables." - Simpson & Kafka

"Correlation analysis attempts to determine the degree of relationship between the variables." - Ya Lun Chou

Thus correlation is a statistical device which helps us analyse the covariation of two or more variables.
Significance of correlation or uses of correlation:
1. Most variables show some kind of relationship; for example, between price and supply, or income and expenditure. With the help of correlation analysis we can measure in one figure both the degree and the direction of the relationship between the variables.
2. Once we know that two variables are closely related, we can estimate the value of one variable from a given value of the other by using regression analysis.
3. The study of correlation reduces the range of uncertainty associated with decision making. Predictions based on correlation analysis are likely to be more reliable and nearer to reality.
4. Correlation analysis is very helpful in understanding economic behaviour.
5. In business, correlation analysis enables an executive to estimate costs, sales, prices and other variables on the basis of some other series with which these costs, sales or prices may be functionally related.

Thus correlation studies are widely used for a variety of purposes and are considered basic tools for detailed analysis and interpretation of statistical data relating to two or more variables.
Types of correlation:
Correlation is classified in several different ways. Three of the most important classifications are:
1. Positive and negative correlation.
2. Simple, partial and multiple correlation.
3. Linear and non-linear correlation.

1. Positive and Negative Correlation:

Whether correlation is positive or negative depends on the direction of change of one variable with a change in the other. If both variables move in the same direction, the correlation is positive: an increase or decrease in the values of one variable causes an increase or decrease, respectively, in the values of the other. This is also called direct correlation. Examples: heights and weights, income and expenditure, rainfall and yield of crops, supply and price of a commodity.
On the other hand, if the variables move in opposite directions, the correlation is negative or inverse: an increase or decrease in the values of one variable causes a decrease or increase, respectively, in the values of the other. This is also known as indirect correlation. Examples: price and demand, temperature and sales of woollen garments.
Examples:
Positive correlation:

    X:  10  15  20  30  40        X:  15  10   8   5   3
    Y: 120 127 140 170 180        Y: 100  80  75  50  30

Negative correlation:

    X:  10  15  20  35  50        X:  80  65  50  40  25
    Y: 150 127 125 100  75        Y: 500 600 650 750 875

2. Simple, Multiple and Partial Correlation:

If the relationship between any two variables is studied, it is called simple correlation. Ex: the relationship between heights and weights, income and expenditure, rainfall and production of crops, price and demand, etc.
In multiple correlation we study together the relationship between three or more factors. Ex: the correlation between the yield of rice and both the amount of rainfall and the amount of fertilizer used.
In partial correlation, though more than two factors are involved, correlation is studied only between two factors while the other factors are assumed to be constant. Ex: consider the three factors yield of rice, amount of rainfall and temperature over different time periods; if we limit our study to the correlation between yield of rice and amount of rainfall by assuming a constant daily temperature, it becomes a problem of partial correlation.

3. Linear and Non-linear Correlation:

The distinction between linear and non-linear correlation is based on the constancy of the ratio of change between the variables.
If the ratio of change between the two variables is uniform, we say there is linear correlation between them, i.e. if the data relating to the two variables are plotted on a graph, all the points lie on a straight line.
Ex:

    X:  1   2   3   4   5   6   7
    Y: 10  20  30  40  50  60  70

Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
Ex:

    X: 10  15  20  25  30  35  40
    Y: 30  42  55  68  86  95  98

[Figures: positive linear correlation, negative linear correlation, positive non-linear correlation and negative non-linear correlation]
Note: Since the techniques for measuring non-linear correlation are far more complicated than those for linear correlation, we generally assume that the relationship between the variables is of the linear type.
Methods of studying correlation:
The commonly used methods for studying simple correlation between two variables are
    i) the scatter diagram method,
    ii) Karl Pearson's coefficient of correlation,
    iii) Spearman's rank correlation.
1) Scatter Diagram Method
This is the simplest method for determining the relationship between two variables. Under this method the given bivariate data are plotted on graph paper in the form of dots: for each pair of values of X and Y we put a dot, and thus obtain as many dots as the number of paired observations. By looking at the scatter of the points we can form an idea of whether the variables are related or not. If the plotted points show an upward trend from the bottom left towards the top right, the correlation is said to be positive; if the points run downward from the top left towards the bottom right, the correlation is negative. If all the points lie on a straight line running from the bottom left up to the top right, it is treated as perfect positive correlation; if they lie on a straight line running from the top left down to the bottom right, it is perfect negative correlation. If the points lie in a haphazard manner, there is no correlation between the two variables. This method is also known as the Dot Diagram Method.

Interpretations:

[Figures: scatter diagrams showing perfect positive correlation, perfect negative correlation, low degree positive correlation, low degree negative correlation, high degree positive correlation, high degree negative correlation and no correlation]

Advantages of the scatter diagram method:
1) It is the simplest method of studying correlation between two variables.
2) The shape of the scatter easily indicates the type of correlation, i.e. linear, curvilinear, etc.
3) It helps in obtaining the line of best fit.
4) It is not influenced by the size of extreme items.
Disadvantages of the scatter diagram method:
1) This method gives only a rough idea of how the two variables are related.
2) It cannot determine the exact degree of correlation as determined by other, mathematical methods.
3) It is not applicable for studying partial and multiple correlations.
4) It is not practicable when the number of observations is very large.
5) No algebraic treatment is possible with this method.

2) Karl Pearson's correlation coefficient (or covariance method):

Of the several methods of measuring correlation, this method is the most popular in practice. It is also known as Pearson's correlation coefficient, and it is a mathematical method for measuring a linear relationship between two variables. (The method is due to Karl Pearson, born 27 March 1857 in Islington, London; died 27 April 1936 at Coldharbour, Surrey; English mathematician and statistician, known for the Pearson distribution, Pearson's r and Pearson's chi-square test.)

Pearson's correlation coefficient between two random variables X and Y is usually denoted by rXY or r(X, Y) or r, and is defined as

    r(X, Y) = Cov(X, Y) / (σX σY)

where
    Cov(X, Y) = covariance between X and Y = (1/n) Σ (xi - x̄)(yi - ȳ)
    σX = standard deviation of X = sqrt((1/n) Σ (xi - x̄)²)
    σY = standard deviation of Y = sqrt((1/n) Σ (yi - ȳ)²)

So

    r(X, Y) = [ (1/n) Σ (xi - x̄)(yi - ȳ) ] / [ sqrt((1/n) Σ (xi - x̄)²) · sqrt((1/n) Σ (yi - ȳ)²) ]

or, in the computational form,

    r(X, Y) = [ (1/n) Σ xi yi - x̄ ȳ ] / [ sqrt((1/n) Σ xi² - x̄²) · sqrt((1/n) Σ yi² - ȳ²) ]
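As a quick illustration of the formula, here is a minimal Python sketch (not part of the original notes; the function name is my own) that computes Karl Pearson's r for the father-son height data given in the scatter diagram example:

```python
from math import sqrt

def pearson_r(x, y):
    """r = Cov(X, Y) / (sigma_X * sigma_Y), using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Heights of fathers (x) and sons (y) from the scatter-diagram example
fathers = [64, 65, 66, 67, 67, 68, 68, 69, 70, 72]
sons = [62, 63, 68, 68, 66, 70, 69, 72, 68, 71]
r = pearson_r(fathers, sons)  # about 0.81: a high degree of positive correlation
```

The value near +0.81 agrees with the visual impression from the scatter diagram that the two series move together.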

Interpretation of Karl Pearson's correlation coefficient:

The value of rXY always lies between -1 and +1, i.e. -1 ≤ rXY ≤ +1.
1) If rXY > 0, there is positive correlation between the two variables.
2) If rXY < 0, there is negative correlation between the two variables.
3) If rXY = +1, there is perfect positive correlation between the two variables.
4) If rXY = -1, there is perfect negative correlation between the two variables.
5) If rXY > 0 and near +1, there is a high degree of positive correlation.
6) If rXY < 0 and near -1, there is a high degree of negative correlation.
7) If rXY > 0 and near 0, there is a low degree of positive correlation.
8) If rXY < 0 and near 0, there is a low degree of negative correlation.
9) If rXY = 0, there is no correlation between the two variables.

    r = -1           : perfectly negative
    -1 < r < -0.5    : high degree negative
    -0.5 < r < 0     : low degree negative
    r = 0            : no correlation
    0 < r < +0.5     : low degree positive
    +0.5 < r < +1    : high degree positive
    r = +1           : perfectly positive

Properties of Karl Pearson's correlation coefficient:

1) The correlation coefficient rXY is an absolute quantity or a pure number, i.e. rXY is free from the units of measurement of both X and Y.

2) The value of rXY lies between -1 and +1.
Proof: Consider the quantity

    (1/n) Σ [ (xi - x̄)/σX ± (yi - ȳ)/σY ]² ≥ 0

which is non-negative because it is an average of squares. Expanding,

    (1/σX²)(1/n) Σ (xi - x̄)² + (1/σY²)(1/n) Σ (yi - ȳ)² ± (2/(σX σY))(1/n) Σ (xi - x̄)(yi - ȳ) ≥ 0

    1 + 1 ± 2 Cov(X, Y)/(σX σY) ≥ 0

    2 ± 2 rXY ≥ 0 ... (1)

From (1), taking the minus sign: 2 - 2 rXY ≥ 0, i.e. rXY ≤ +1 ... (2)
From (1), taking the plus sign: 2 + 2 rXY ≥ 0, i.e. rXY ≥ -1 ... (3)
From (2) and (3) we get -1 ≤ rXY ≤ +1.


3) The Karl Pearson correlation coefficient rXY is invariant under a change of origin and scale.
Proof: Let X and Y be the two variables with means x̄ and ȳ. Under a change of origin and scale the new variables are

    U = (X - A)/c and V = (Y - B)/d    (with c > 0, d > 0)

so the individual values change to

    ui = (xi - A)/c and vi = (yi - B)/d, i.e. xi = c ui + A and yi = d vi + B,

and the means satisfy x̄ = c ū + A and ȳ = d v̄ + B.
The Karl Pearson coefficient of correlation is

    rXY = [ (1/n) Σ (xi - x̄)(yi - ȳ) ] / [ sqrt((1/n) Σ (xi - x̄)²) · sqrt((1/n) Σ (yi - ȳ)²) ]

        = [ (1/n) Σ (c ui + A - c ū - A)(d vi + B - d v̄ - B) ] / [ sqrt((1/n) Σ (c ui + A - c ū - A)²) · sqrt((1/n) Σ (d vi + B - d v̄ - B)²) ]

        = [ cd (1/n) Σ (ui - ū)(vi - v̄) ] / [ c sqrt((1/n) Σ (ui - ū)²) · d sqrt((1/n) Σ (vi - v̄)²) ]

        = rUV

4) The Karl Pearson correlation coefficient rXY is the geometric mean of the two regression coefficients, i.e. rxy = √(bxy · byx).

5) rXY works both ways, i.e. rXY = rYX.
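The invariance property can be checked numerically. The sketch below (illustrative, not from the notes; the origins A = 60, B = 50 and scales c = 2, d = 5 are arbitrary choices) compares r computed on the raw height data with r computed on origin-and-scale-shifted values:

```python
from math import sqrt

def pearson_r(x, y):
    """r = Cov(X, Y) / (sigma_X * sigma_Y), using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

x = [64, 65, 66, 67, 67, 68, 68, 69, 70, 72]
y = [62, 63, 68, 68, 66, 70, 69, 72, 68, 71]

# Change of origin and scale: u = (x - A)/c, v = (y - B)/d with c, d > 0
u = [(a - 60) / 2 for a in x]
v = [(b - 50) / 5 for b in y]

assert abs(pearson_r(x, y) - pearson_r(u, v)) < 1e-12  # r is unchanged
```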

Merits of Karl Pearson's correlation coefficient:
1) It is the most popular method of finding correlation and is widely used in practice.
2) It indicates the direction (positive or negative) of correlation along with the exact degree of correlation between the variables.
3) It can be used for further mathematical treatment.
4) It enables us to estimate the value of a dependent variable for a given value of the independent variable through regression analysis.
Demerits of Karl Pearson's correlation coefficient:
1) It is affected by extreme values.
2) Compared with the other methods, it takes more time to calculate.
3) The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct.
4) The correlation coefficient is subject to probable error.
Probable error:
The probable error is used to determine the reliability of the value of the correlation coefficient. The probable error of the correlation coefficient is obtained as

    P.E.(r) = 0.6745 × (1 - r²) / √n

where r is the correlation coefficient and n is the number of pairs of observations.
Interpretations:
1. If the value of r is less than the probable error, there is no evidence of correlation, i.e. the value of r is not at all significant.
2. If the value of r is more than 6 times the probable error, the coefficient of correlation is significant.
3. By subtracting and adding the value of the probable error to the coefficient of correlation, we get respectively the lower and upper limits within which the coefficient of correlation of the population can be expected to lie, i.e. r ± P.E.
Standard error:
The standard error of the correlation coefficient is given by

    S.E.(r) = (1 - r²) / √n
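The two formulas above can be sketched directly; the values r = 0.8 and n = 25 below are illustrative choices of mine, not data from the notes:

```python
from math import sqrt

def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def standard_error(r, n):
    """S.E.(r) = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)

# Illustrative values: r = 0.8 computed from n = 25 pairs of observations
r, n = 0.8, 25
pe = probable_error(r, n)   # 0.6745 * 0.36 / 5 = 0.048564
significant = r > 6 * pe    # True: r exceeds 6 times its probable error
limits = (r - pe, r + pe)   # likely range for the population coefficient
```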
Coefficient of determination:
The coefficient of determination is defined as the square of the correlation coefficient r, i.e. r². It equals the ratio of the explained variance to the total variance:

    Coefficient of determination = r² = Explained variance / Total variance

Note:
If the variables are related by an exact linear equation aX + bY + c = 0, then the correlation coefficient between X and Y is r = +1 when a and b have opposite signs, and r = -1 when a and b have the same sign.
Ex:
1. If X and Y are related by the equation 2X + 3Y = 10 (a and b have the same sign), then the correlation coefficient between X and Y is r = -1.
2. If X and Y are related by the equation 2X - 4Y = 1 (a and b have opposite signs), then the correlation coefficient between X and Y is r = +1.
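Both examples can be verified numerically; this sketch (not from the notes) generates points satisfying each equation and computes Pearson's r:

```python
from math import sqrt

def pearson_r(x, y):
    """r = Cov(X, Y) / (sigma_X * sigma_Y), using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]

# 2X + 3Y = 10: a and b have the same sign, so r should be -1
ys_neg = [(10 - 2 * x) / 3 for x in xs]

# 2X - 4Y = 1: a and b have opposite signs, so r should be +1
ys_pos = [(2 * x - 1) / 4 for x in xs]
```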
Example:1) Calculate Karl Pearsons coefficient of correlation for the following data

Example:2) Calculate Karl Pearsons coefficient of correlation for the following data and find the
probable error.

3) Spearman's Rank Correlation Coefficient:

The product-moment correlation coefficient measures the amount of linear relationship between two variables which are measured quantitatively. The rank correlation coefficient is used to measure the degree of association between two sets of qualitative observations which can be ranked or graded through scores, such as intelligence or beauty.
Let x(i), y(i) (i = 1, 2, ..., n) be the ranks of the values of the variables X and Y respectively. Then the rank correlation coefficient between X and Y is calculated as

    rs = 1 - [ 6 Σ di² ] / [ n(n² - 1) ]

where di = x(i) - y(i), (i = 1, 2, ..., n),
    x(i) = rank of X,
    y(i) = rank of Y.
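The formula can be sketched in Python; the two sets of judges' scores below are illustrative data of my own, not from the notes, and the helper assumes no tied values (the tied case is treated later):

```python
def ranks(values):
    """Assign rank 1 to the smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(x, y):
    """rs = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Illustrative scores awarded by two judges to five candidates
judge1 = [80, 64, 75, 40, 55]
judge2 = [70, 66, 60, 65, 35]
rs = spearman_rs(judge1, judge2)  # sum(d^2) = 10, so rs = 1 - 60/120 = 0.5
```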

Derivation of the rank correlation coefficient:

(The coefficient is due to Charles Edward Spearman, born September 10, 1863; died September 17, 1945.)

Let us consider a set of n individuals ranked according to two characters X and Y, i.e.
    ranks of X: x(1), x(2), x(3), ..., x(n)
    ranks of Y: y(1), y(2), y(3), ..., y(n)
Each set of ranks is simply the numbers 1, 2, ..., n in some order.

The mean of the ranks of X is

    x̄ = (1/n) Σ x(i) = (1/n)(1 + 2 + 3 + ... + n) = (n + 1)/2

Similarly the mean of the ranks of Y is

    ȳ = (1/n) Σ y(i) = (1/n)(1 + 2 + 3 + ... + n) = (n + 1)/2

so x̄ = ȳ.

The variance of the ranks of X is

    σx² = (1/n) Σ x(i)² - x̄² = (n² - 1)/12

and likewise the variance of the ranks of Y is σy² = (n² - 1)/12, so σx = σy.

Now if di stands for the difference in the ranks of the i-th individual, we have

    di = x(i) - y(i) = (x(i) - x̄) - (y(i) - ȳ)    [since x̄ = ȳ]

Therefore

    (1/n) Σ di² = (1/n) Σ [ (x(i) - x̄) - (y(i) - ȳ) ]²
                = (1/n) Σ (x(i) - x̄)² + (1/n) Σ (y(i) - ȳ)² - 2 (1/n) Σ (x(i) - x̄)(y(i) - ȳ)
                = σx² + σy² - 2 r σx σy
                = σx² + σx² - 2 r σx σx    [since σx = σy]
                = 2 σx² - 2 r σx²
                = 2 σx² (1 - r)

Hence

    1 - r = [ (1/n) Σ di² ] / (2 σx²)

    r = 1 - [ (1/n) Σ di² ] / (2 (n² - 1)/12)

    rs = 1 - [ 6 Σ di² ] / [ n(n² - 1) ]
Note: The limits of Spearman's rank correlation coefficient are -1 ≤ rs ≤ 1.
Interpretations of the rank correlation coefficient:
1. If rs > 0 and near +1: strong agreement.
2. If rs < 0 and near -1: strong disagreement.
3. If rs = +1: perfect agreement.
4. If rs = -1: perfect disagreement.
5. If rs = 0: no agreement.
When ranks are equal:
In the process of assigning ranks, if some of the values of the items are the same, then each of those items gets the same rank: the items are given the average of the ranks they would otherwise have received. In this case an adjustment is made to the formula for the rank correlation coefficient: we add (m³ - m)/12 to Σ di² once for each tie occurring in either variable, where m stands for the number of items whose ranks are common. The adjusted formula can be written as

    rs = 1 - 6 [ Σ di² + (m³ - m)/12 + (m³ - m)/12 + ... ] / [ n(n² - 1) ]

Note: The answers obtained by Spearman's method and by Karl Pearson's method (applied to the ranks) will be the same provided no values are repeated, i.e. all items are different.
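The tie adjustment can be sketched as follows; the data are illustrative (a single tie at the value 20), and the helper names are my own:

```python
from collections import Counter

def average_ranks(values):
    """Tied values share the average of the ranks they would occupy."""
    sorted_vals = sorted(values)
    return [
        sum(i + 1 for i, v in enumerate(sorted_vals) if v == x) / sorted_vals.count(x)
        for x in values
    ]

def spearman_tied(x, y):
    """Rank correlation with the (m^3 - m)/12 correction for each tie."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum(
        (m ** 3 - m) / 12
        for series in (x, y)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (d2 + correction) / (n * (n ** 2 - 1))

# Illustrative data: the two 20s in x share the average rank (2 + 3)/2 = 2.5
x = [10, 20, 20, 30]
y = [1, 2, 3, 4]
rs = spearman_tied(x, y)  # sum(d^2) = 0.5, correction = 0.5 -> 1 - 6/60 = 0.9
```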
Merits:
1. This method is simpler to understand and easier to calculate than Pearson's method.
2. Where the data are of a qualitative nature, such as honesty or efficiency, this method can be used with great advantage.
3. Even when actual data are given, the rank method can be applied for ascertaining correlation.
Demerits:
1. This method cannot be used for finding correlation in a grouped frequency distribution.
2. When the number of observations is large, this method becomes tedious.

4. Concurrent Deviation Method

This method of studying correlation is the simplest of all the methods. The only thing required under this method is to find the direction of change of the X variable and of the Y variable. The applicable formula is

    rc = ± sqrt( ± (2c - n) / n )

where rc stands for the coefficient of correlation by the concurrent deviation method, c stands for the number of concurrent deviations, i.e. the number of positive signs obtained after multiplying Dx with Dy, and n is the number of pairs of deviations compared.

Steps:
1. Find the direction of change of the X variable: compared with the first value, if the second value is increasing put a + sign, if it is decreasing put a - sign, and if it is constant put zero. Similarly, compare the third value with the second, and repeat the process for the remaining values. Denote this column by Dx.
2. In the same manner, find the direction of change of the Y variable and denote this column by Dy.
3. Multiply Dx with Dy and determine the value of c, i.e. the number of positive signs.
4. Apply the above formula.

Note: The significance of the ± signs, both inside and outside the square root, is that we cannot take the square root of a negative number. If (2c - n)/n is negative, this negative value multiplied by the minus sign inside becomes positive and we can take the root, but the ultimate result is taken as negative. If (2c - n)/n is positive then, of course, we take the plus signs and get a positive value of the coefficient of correlation.
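The steps above can be sketched in Python; the series are illustrative data of my own, and `copysign` handles the ± convention by attaching the sign of (2c - n)/n to the square root:

```python
from math import sqrt, copysign

def concurrent_deviation(x, y):
    """rc = +/- sqrt(+/- (2c - n)/n), signs chosen so the radicand is positive."""
    sign = lambda a, b: 1 if b > a else -1 if b < a else 0
    dx = [sign(a, b) for a, b in zip(x, x[1:])]      # direction of change of X
    dy = [sign(a, b) for a, b in zip(y, y[1:])]      # direction of change of Y
    n = len(dx)                                      # pairs of deviations compared
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)  # concurrent deviations
    inner = (2 * c - n) / n
    return copysign(sqrt(abs(inner)), inner)

# Illustrative series: X and Y rise and fall together in every interval
x = [60, 62, 65, 63, 70, 72]
y = [110, 115, 120, 118, 125, 130]
rc = concurrent_deviation(x, y)  # all 5 deviations concur -> rc = +1.0
```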

Regression Analysis
Def: Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.
(Or)
Regression analysis is used to study the functional relationship between variables and thereby provides a mechanism for prediction or forecasting.
It is clear from these definitions that regression analysis is a statistical device with whose help we can estimate or predict the unknown values of one variable, called the dependent variable, from the known values of another variable, called the independent variable.
Regression lines:
The line describing the average relationship between two variables is known as the line of regression or the line of best fit.
If we take the case of two variables X and Y, we shall have two regression lines: the regression line of Y on X and the regression line of X on Y. The regression line of Y on X gives the most probable values of Y for given values of X; here X is the independent variable and Y is the dependent variable. The regression line of X on Y gives the most probable values of X for given values of Y; here Y is the independent variable and X is the dependent variable.
Relation between the regression lines and the correlation coefficient:
The regression lines drawn on the scatter diagram give an indication of the extent of correlation between the two variables.
1. If the two regression lines coincide, then the correlation coefficient r is +1 or -1, i.e. perfectly positive or perfectly negative, depending on the direction of the regression lines.
2. If the two regression lines intersect at right angles, then r is equal to 0, i.e. there is no correlation.
3. If the two regression lines make a very small acute angle, r is very high, i.e. the closer the regression lines, the greater the degree of correlation.
4. The farther apart the two regression lines, the lower the degree of correlation.
5. An upward slope of the regression lines indicates positive correlation and a downward slope indicates negative correlation.

[Figures: pairs of regression lines for r = +1, r = -1, high positive correlation, low positive correlation and r = 0]
Regression Equations:
Regression equations are algebraic expressions of the regression lines. Since there are two regression lines, there are two regression equations.
1) The regression equation of Y on X is expressed as Y = a + bX. In this equation Y is the dependent variable and X is the independent variable.
2) The regression equation of X on Y is expressed as X = a + bY. In this equation X is the dependent variable and Y is the independent variable.
Here a and b are numerical constants.
Determination of the regression lines:
In general there are two methods to determine the regression lines:
1. Method of eye inspection.
2. Method of least squares.

Method of eye inspection:

In this method, first take the independent variable on the X axis and the dependent variable on the Y axis and plot the points; these points give the picture of a scatter diagram. An average line is then drawn through the scattered points by eye inspection, such that the number of points above the line is almost equal to the number of points below it. Although this method is simple, it is subject to human bias: the regression line varies from person to person, and therefore the prediction also varies.

[Figure: a regression line drawn by eye through a scatter of points]

Method of least squares


Fitting the regression line of Y on X by the method of least squares:
Let the regression line of Y on X be

    Yc = a + bX ... (1)

where a and b are numerical constants. In this equation Y is the dependent variable and X is the independent variable.
We now have to find the values of a and b, and this is done by the method of least squares: we choose a and b so that the sum of the squares of the deviations of the actual values of Y from the computed values Yc is least, i.e.

    S = Σ (Y - Yc)² = Σ (Y - a - bX)² is least.

Differentiating S partially with respect to a and equating to 0:

    ∂S/∂a = 0
    Σ 2 (Y - a - bX)(-1) = 0
    Σ (Y - a - bX) = 0
    ΣY = na + b ΣX ... (2)

Differentiating S partially with respect to b and equating to 0:

    ∂S/∂b = 0
    Σ 2 (Y - a - bX)(-X) = 0
    Σ (Y - a - bX) X = 0
    ΣXY = a ΣX + b ΣX² ... (3)

Equations (2) and (3) are called the normal equations. Solving these two normal equations we get the values of a and b.
From (2), dividing by n:

    Ȳ = a + bX̄, i.e. a = Ȳ - bX̄ ... (4)

From (3), dividing by n and substituting (4):

    (1/n) ΣXY = a X̄ + b (1/n) ΣX²
    (1/n) ΣXY = (Ȳ - bX̄) X̄ + b (1/n) ΣX²
    (1/n) ΣXY - X̄ Ȳ = b [ (1/n) ΣX² - X̄² ]
    Cov(X, Y) = b σX²
    b = Cov(X, Y) / σX² ... (5)

We have r = Cov(X, Y) / (σX σY), so Cov(X, Y) = r σX σY and

    Cov(X, Y) / σX² = r σX σY / σX² = r σY / σX ... (6)

From (5) and (6) we get the value of b:

    b = r σY / σX ... (7)

Putting (7) in (4) we get the value of a:

    a = Ȳ - r (σY / σX) X̄ ... (8)

Putting (7) and (8) in (1) we get

    Yc = Ȳ - r (σY / σX) X̄ + r (σY / σX) X

    Yc - Ȳ = r (σY / σX) (X - X̄)

This is the required regression line of Y on X. Here b = r σY/σX is the slope of the regression line of Y on X; it is also called the regression coefficient of Y on X and is denoted by bYX.

Similarly we can determine the regression line of X on Y, which is given by

    Xc - X̄ = r (σX / σY) (Y - Ȳ)

Here b = r σX/σY is the slope of the regression line of X on Y; it is also called the regression coefficient of X on Y and is denoted by bXY.
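The fitted line can be sketched in Python. This illustrative snippet (function name my own) computes b = Cov(X, Y)/σX², which by (6) equals r σY/σX, and the intercept a = Ȳ - bX̄ from the normal equations:

```python
def regression_y_on_x(x, y):
    """Least-squares line Y = a + bX, with b = Cov(X, Y) / Var(X) = r * sy / sx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    b = cov / var_x   # regression coefficient b_YX
    a = my - b * mx   # intercept from the normal equations
    return a, b

# Illustrative data: Y is exactly 5 + 2X, so the fit recovers a = 5, b = 2
x = [1, 2, 3, 4, 5]
y = [7, 9, 11, 13, 15]
a, b = regression_y_on_x(x, y)
```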
Regression Coefficients:

The regression coefficient of Y on X is denoted by bYX and is defined as bYX = r σY/σX.

The regression coefficient of X on Y is denoted by bXY and is defined as bXY = r σX/σY.

Properties of regression coefficients:

1) The square root of the product of the two regression coefficients is equal to the correlation coefficient r. In other words, the geometric mean of the two regression coefficients is the correlation coefficient r, i.e. r = √(bYX · bXY).
Proof: We have the two regression coefficients bYX = r σY/σX and bXY = r σX/σY. Now

    bYX · bXY = (r σY/σX)(r σX/σY) = r², so √(bYX · bXY) = r.

2) Both regression coefficients have the same sign, i.e. both are positive or both are negative, and the correlation coefficient has the same sign as the regression coefficients: if both regression coefficients are positive then r is positive, and if both are negative then r is negative.

3) Since the value of the correlation coefficient cannot exceed 1 in absolute value, if one of the regression coefficients is greater than one, the other must be less than one.
Proof: bYX · bXY = r² ≤ 1, so if bYX > 1 then bXY = r²/bYX < 1.

4) Regression coefficients are independent of the change of origin but not of scale.
Proof: The regression coefficient of Y on X is

    bYX = r σY/σX = Cov(X, Y) / σX²

Let X and Y be the two variables with means x̄ and ȳ. Under a change of origin and scale the new variables are

    U = (X - A)/c and V = (Y - B)/d

so that ui = (xi - A)/c and vi = (yi - B)/d, i.e. xi = c ui + A, yi = d vi + B, and x̄ = c ū + A, ȳ = d v̄ + B.
The regression coefficient of Y on X is then

    bYX = [ (1/n) Σ (xi - x̄)(yi - ȳ) ] / [ (1/n) Σ (xi - x̄)² ]
        = [ (1/n) Σ (c ui + A - c ū - A)(d vi + B - d v̄ - B) ] / [ (1/n) Σ (c ui + A - c ū - A)² ]
        = [ cd (1/n) Σ (ui - ū)(vi - v̄) ] / [ c² (1/n) Σ (ui - ū)² ]
        = (d/c) bVU

so bYX depends on the scale factors c and d but not on the origins A and B.

5) The arithmetic mean of the two regression coefficients is greater than or equal to r, i.e.

    (bYX + bXY)/2 ≥ r

Proof: Consider the statement

    (bYX + bXY)/2 ≥ r
    ⟺ bYX + bXY ≥ 2r
    ⟺ r σY/σX + r σX/σY ≥ 2r
    ⟺ σY/σX + σX/σY ≥ 2    [dividing by r, taking r > 0]
    ⟺ σY² + σX² ≥ 2 σX σY
    ⟺ σX² - 2 σX σY + σY² ≥ 0
    ⟺ (σX - σY)² ≥ 0, which is always true.

Hence (bYX + bXY)/2 ≥ r.
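Several of these properties can be checked numerically on the father-son height data used earlier; the sketch below (helper name my own) computes r, bYX and bXY and asserts the geometric-mean, same-sign and arithmetic-mean properties:

```python
from math import sqrt

def correlation_and_regression(x, y):
    """Return (r, b_YX, b_XY) using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    r = cov / (sx * sy)
    byx = r * sy / sx   # regression coefficient of Y on X
    bxy = r * sx / sy   # regression coefficient of X on Y
    return r, byx, bxy

x = [64, 65, 66, 67, 67, 68, 68, 69, 70, 72]
y = [62, 63, 68, 68, 66, 70, 69, 72, 68, 71]
r, byx, bxy = correlation_and_regression(x, y)

assert abs(sqrt(byx * bxy) - abs(r)) < 1e-12   # property 1: geometric mean
assert (byx > 0) == (bxy > 0) == (r > 0)       # property 2: same sign
assert (byx + bxy) / 2 >= r                    # property 5: AM >= r
```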
Differences between correlation and regression:

Correlation:
1. Correlation is the relationship between two or more variables which vary in the same or opposite directions.
2. It is used for testing and verifying the relation between two variables and gives limited information.
3. The coefficient of correlation is a relative measure.
4. There may be nonsensical correlation between two variables.
5. It is not useful for further mathematical treatment.
6. There need be no functional relationship between the two variables.

Regression:
1. Regression is a measure showing the average relationship between two variables.
2. Besides verification, it is used for prediction of one value in relation to another given value.
3. The regression coefficient is an absolute measure.
4. There is no such nonsensical regression.
5. It is widely useful for further mathematical treatment.
6. There is a functional relationship between the two variables.
