Bivariate Data
The document discusses Karl Pearson's coefficient of correlation, detailing its properties, assumptions, and formulas for calculation. It explains the relationship between two variables, emphasizing that the correlation coefficient ranges from -1 to +1, indicating perfect negative to perfect positive correlation, respectively. Additionally, it includes examples and regression analysis concepts, highlighting the importance of linear relationships and the independence of correlation coefficients from changes in scale.
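The co-efficient of correlation between two variables x and y, known as Karl Pearson's co-efficient of correlation (or product moment correlation co-efficient), is denoted by the symbol $r_{xy}$. It is defined by

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

Where

$$\operatorname{cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}), \qquad \sigma_x = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2}$$

Similarly

$$\sigma_y = \sqrt{\frac{1}{n}\sum (y - \bar{y})^2}$$

Thus $r_{xy}$ can also be written in the following forms:

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$

When the actual mean is in decimal, the calculations become very tedious and in such cases we may take the help of the following formula:

$$r_{xy} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}$$

where $u = x - A$, $v = y - B$, and A and B are assumed means.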
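The three forms of the formula above can be checked against each other numerically. The following is a minimal Python sketch, not part of the original text: the function names and the five data pairs are hypothetical, chosen only for illustration.

```python
from math import sqrt

def r_deviation_form(xs, ys):
    """r from deviations taken about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_raw_sums_form(xs, ys):
    """r from the totals: n*Sxy - Sx*Sy over the two square-root factors."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / (sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2))

def r_assumed_means_form(xs, ys, A, B):
    """r from u = x - A and v = y - B, where A and B are assumed means."""
    us = [x - A for x in xs]
    vs = [y - B for y in ys]
    return r_raw_sums_form(us, vs)

# Hypothetical illustrative data (not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_deviation_form(xs, ys))
print(r_raw_sums_form(xs, ys))
print(r_assumed_means_form(xs, ys, A=3, B=4))
```

All three calls should print the same value up to floating-point rounding, since shifting by assumed means does not change r.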
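7.6 PROPERTIES OF KARL PEARSON'S CO-EFFICIENT OF CORRELATION (r)
1. $-1 \le r \le 1$
Proof: We have

$$\sum \left[ \frac{x - \bar{x}}{\sigma_x} \pm \frac{y - \bar{y}}{\sigma_y} \right]^2 \ge 0$$

$$\Rightarrow \frac{\sum (x - \bar{x})^2}{\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{\sigma_y^2} \pm \frac{2\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} \ge 0$$

Dividing both sides by n,

$$\frac{\sigma_x^2}{\sigma_x^2} + \frac{\sigma_y^2}{\sigma_y^2} \pm \frac{2\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \ge 0 \;\Rightarrow\; 2 \pm 2r \ge 0 \;\Rightarrow\; 1 \pm r \ge 0$$

$$\therefore\; -1 \le r \le 1$$

A FIRST COURSE IN STATISTICS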
Interpretation: The least value of r is −1 and the most is +1. If r = +1, there is perfect positive correlation between the two variables. If r = −1, there is perfect negative correlation between the variables. If r = 0, we say that there is no linear relationship between the variables. However, there may be a non-linear relationship between the variables.
If r is positive but close to zero, we have weak positive correlation, and if r is close to 1 we have strong positive correlation.
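2. Correlation co-efficient r is independent of change of origin and scale.
Proof: Let X and Y be the original variables and, after changing origin and scale, let the new variables be

$$U = \frac{X - a}{h}, \qquad V = \frac{Y - b}{k}$$

where a, b, h, k are all constants, h > 0, k > 0.

$$\Rightarrow X = a + hU, \quad Y = b + kV \;\Rightarrow\; \bar{X} = a + h\bar{U}, \quad \bar{Y} = b + k\bar{V}$$

$$X - \bar{X} = h(U - \bar{U}), \qquad Y - \bar{Y} = k(V - \bar{V})$$

Now,

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{\sum h(u - \bar{u})\,k(v - \bar{v})}{\sqrt{\sum h^2 (u - \bar{u})^2}\,\sqrt{\sum k^2 (v - \bar{v})^2}} = \frac{hk \sum (u - \bar{u})(v - \bar{v})}{hk \sqrt{\sum (u - \bar{u})^2 \,\sum (v - \bar{v})^2}}$$

$$\therefore\; r_{xy} = r_{uv}. \text{ Hence proved.}$$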
3. Two independent variables are uncorrelated but the converse is not true.
Proof: If two variables are independent then their covariance is zero, i.e. cov(X, Y) = 0.

$$r_{xy} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} = \frac{0}{\sigma_x \sigma_y} = 0$$

Thus, if two variables are independent their co-efficient of correlation is zero, i.e. independent variables are uncorrelated.
But the converse is not true. If $r_{xy} = 0$, we say that there does not exist any linear correlation between the variables, as Karl Pearson's co-efficient of correlation $r_{xy}$ is a measure of only linear relationship. However, there may be a strong non-linear or curvilinear relationship even though $r_{xy} = 0$.
As an illustration consider the bivariate distribution
x : −3  −2  −1  0  1  2  3
y :  9   4   1  0  1  4  9
Using the formula of $r_{xy}$, we get $r_{xy} = 0$. But X and Y are not independent as they satisfy the non-linear relation $y = x^2$. Hence proved.
4. Correlation co-efficient r is a pure number, independent of the unit of measurement.
5. Correlation co-efficient is symmetric, i.e. $r_{xy} = r_{yx}$.
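The properties listed above can be illustrated numerically. The sketch below is an assumption-laden check rather than a proof: `pearson_r` is a hypothetical helper, the constants a, h, b, k and the second data set are made up for illustration, and only the x, y = x² values come from the illustration in the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Property 3: y = x^2 on a symmetric range gives r = 0 although y depends on x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]
r_parabola = pearson_r(x, y)

# Property 2: change of origin and scale, U = (X - a)/h, V = (Y - b)/k, h, k > 0.
xs = [1, 2, 3, 4, 5]          # hypothetical data
ys = [2, 4, 5, 4, 5]
a, h, b, k = 10, 2, 20, 5     # hypothetical constants
us = [(v - a) / h for v in xs]
vs = [(v - b) / k for v in ys]
r_original = pearson_r(xs, ys)
r_transformed = pearson_r(us, vs)

# Property 5: symmetry, r_xy = r_yx.
r_swapped = pearson_r(ys, xs)

print(r_parabola, r_original, r_transformed, r_swapped)
```

The first printed value should be zero, and the remaining three should agree with each other and lie between −1 and +1.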
7.7 ASSUMPTIONS OF KARL PEARSON'S CO-EFFICIENT OF CORRELATION
There are three assumptions:
(i) the variables x and y are linearly related,
(ii) there is a cause and effect relationship between the factors affecting the values of the variables x and y,
(iii) the random variables x and y are normally distributed.
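EXAMPLE 1.
Calculate the co-efficient of correlation $r_{xy}$ from the following data:
ΣX = 71, ΣY = 70, ΣX² = 555,
ΣY² = 526, ΣXY = 527, n = 10
(AHSEC 1997)
SOLUTION.

$$r_{xy} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$

$$= \frac{10 \times 527 - 71 \times 70}{\sqrt{10 \times 555 - (71)^2}\,\sqrt{10 \times 526 - (70)^2}} = \frac{5270 - 4970}{\sqrt{509}\,\sqrt{360}} = \frac{300}{428.07}$$

$$= .70$$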
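When only the totals are given, as in Example 1, the computation can be sketched in a few lines of Python. The helper name `r_from_totals` is hypothetical; the totals are those stated in the example.

```python
from math import sqrt

def r_from_totals(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Karl Pearson's r computed directly from the summary totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator

# Totals from Example 1: n = 10, SX = 71, SY = 70, SX^2 = 555, SY^2 = 526, SXY = 527.
r = r_from_totals(10, 71, 70, 555, 526, 527)
print(round(r, 2))
```

Rounded to two decimals this agrees with the value .70 obtained in the solution.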
EXAMPLE 2.
Find the correlation co-efficient between X and Y from the following data: (AHSEC 1995)
[The data table and the worked solution of this example are illegible in the source.]
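EXAMPLE 3.
Find the correlation co-efficient between X and Y from the following data and interpret the result. (AHSEC)
X : 16  20  24  28  32
Y : 30  40  25  35  45
SOLUTION.
Here $\bar{X} = 120/5 = 24$ and $\bar{Y} = 175/5 = 35$. Since $\bar{X}$ and $\bar{Y}$ are whole numbers, we can proceed as follows:
X | Y | X − X̄ | Y − Ȳ | (X − X̄)² | (Y − Ȳ)² | (X − X̄)(Y − Ȳ)
16 | 30 | −8 | −5 | 64 | 25 | 40
20 | 40 | −4 | 5 | 16 | 25 | −20
24 | 25 | 0 | −10 | 0 | 100 | 0
28 | 35 | 4 | 0 | 16 | 0 | 0
32 | 45 | 8 | 10 | 64 | 100 | 80
ΣX = 120 | ΣY = 175 | | | Σ(X − X̄)² = 160 | Σ(Y − Ȳ)² = 250 | Σ(X − X̄)(Y − Ȳ) = 100

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \,\sum (Y - \bar{Y})^2}} = \frac{100}{\sqrt{160 \times 250}} = \frac{100}{200} = .5$$

Interpretation: Since r = .5, we find positive correlation between the variables X and Y.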
EXAMPLE 4.
Calculate the correlation co-efficient from the following data:
n = 10, Σx = 140,
Σy = 150, Σ(x − 10)² = 180,
Σ(y − 15)² = 215, Σ(x − 10)(y − 15) = 60.
SOLUTION.
Let us take u = x − 10
v = y − 15
Σu = Σ(x − 10) = Σx − n × 10 = 140 − 100 = 40
Σv = Σ(y − 15) = Σy − n × 15 = 150 − 150 = 0
Σu² = Σ(x − 10)² = 180
Σv² = Σ(y − 15)² = 215
Σuv = Σ(x − 10)(y − 15) = 60

$$r = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}} = \frac{10 \times 60 - 40 \times 0}{\sqrt{10 \times 180 - 40^2}\,\sqrt{10 \times 215 - 0^2}}$$

$$= \frac{600}{\sqrt{200 \times 2150}} = \frac{600}{655.7} = .91$$
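EXAMPLE 5.
Show that the correlation co-efficient between x and a − x is −1.
(AHSEC 1997, 2000)
SOLUTION.
We know that

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

The mean of a − x is $a - \bar{x}$, so

$$r_{x,\,a-x} = \frac{\operatorname{cov}(x, a - x)}{\sigma_x \,\sigma_{a-x}} = \frac{\frac{1}{n}\sum (x - \bar{x})(a - x - a + \bar{x})}{\sqrt{\frac{1}{n}\sum (x - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (a - x - a + \bar{x})^2}} = \frac{-\sum (x - \bar{x})^2}{\sum (x - \bar{x})^2} = -1$$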
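EXAMPLE 6.
If a, b, c, d are constants, then show that the co-efficient of correlation between ax + b and cy + d is numerically equal to that between x and y. (AHSEC 1997, 2000)
SOLUTION.
Let u = ax + b and v = cy + d
⇒ $\bar{u} = a\bar{x} + b$ and $\bar{v} = c\bar{y} + d$
∴ $u - \bar{u} = a(x - \bar{x})$ and $v - \bar{v} = c(y - \bar{y})$
Now

$$r_{uv} = \frac{\sum (u - \bar{u})(v - \bar{v})}{\sqrt{\sum (u - \bar{u})^2}\,\sqrt{\sum (v - \bar{v})^2}} = \frac{\sum a(x - \bar{x})\,c(y - \bar{y})}{\sqrt{\sum a^2 (x - \bar{x})^2}\,\sqrt{\sum c^2 (y - \bar{y})^2}} = \frac{ac \sum (x - \bar{x})(y - \bar{y})}{|a|\,|c| \sqrt{\sum (x - \bar{x})^2 \,\sum (y - \bar{y})^2}} = \pm\, r_{xy}$$

Hence $r_{uv}$ is numerically equal to $r_{xy}$.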
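EXAMPLE 7.
Given that $r_{xy} = .6$, cov(x, y) = 7.2, var(x) = 16, find the standard deviation of y. (AHSEC)
SOLUTION.
We know that

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \;\Rightarrow\; .6 = \frac{7.2}{4\,\sigma_y} \;\Rightarrow\; \sigma_y = \frac{7.2}{2.4} = 3$$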
EXAMPLE 8.
A computer, while calculating the correlation co-efficient between two variables X and Y from 25 pairs of observations, obtained the following results:
n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, ΣXY = 508
It was, however, discovered at the time of checking that two pairs of observations were wrongly copied. They were taken as (6, 14) and (8, 6) while the correct values were (8, 12) and (6, 8). What is the correct value of the correlation co-efficient?
SOLUTION.
Corrected ΣX = 125 − 6 − 8 + 8 + 6 = 125
Corrected ΣY = 100 − 14 − 6 + 12 + 8 = 100
Corrected ΣX² = 650 − 6² − 8² + 8² + 6² = 650
Corrected ΣY² = 460 − 196 − 36 + 144 + 64 = 436
Corrected ΣXY = 508 − 6 × 14 − 8 × 6 + 8 × 12 + 6 × 8 = 520
∴ Corrected correlation co-efficient is

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{25 \times 650 - (125)^2}\,\sqrt{25 \times 436 - (100)^2}}$$

$$= \frac{500}{25 \times 30} = \frac{500}{750} = .67$$
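The correction procedure of this example (remove the contribution of each wrongly copied pair from every total, then add the contribution of the correct pair) can be sketched in Python as follows. The function names and the dictionary keys are hypothetical; the totals and pairs are those of the example.

```python
from math import sqrt

def corrected_totals(totals, wrong_pairs, right_pairs):
    """Subtract the wrongly copied pairs from each total and add the correct ones."""
    fixed = dict(totals)
    for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
        for x, y in pairs:
            fixed["sum_x"] += sign * x
            fixed["sum_y"] += sign * y
            fixed["sum_xx"] += sign * x * x
            fixed["sum_yy"] += sign * y * y
            fixed["sum_xy"] += sign * x * y
    return fixed

def r_from_totals(n, t):
    """Karl Pearson's r from a dictionary of totals."""
    numerator = n * t["sum_xy"] - t["sum_x"] * t["sum_y"]
    denominator = (sqrt(n * t["sum_xx"] - t["sum_x"] ** 2)
                   * sqrt(n * t["sum_yy"] - t["sum_y"] ** 2))
    return numerator / denominator

# Totals as first obtained (n = 25), then the two wrong and two correct pairs.
totals = {"sum_x": 125, "sum_y": 100, "sum_xx": 650, "sum_yy": 460, "sum_xy": 508}
fixed = corrected_totals(totals, wrong_pairs=[(6, 14), (8, 6)],
                         right_pairs=[(8, 12), (6, 8)])
r = r_from_totals(25, fixed)
print(fixed)
print(round(r, 2))
```

Note that 14² = 196 is the amount removed from ΣY² for the pair (6, 14), which is what leads to the corrected total 436.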
7.8 REGRESSION
The concept of regression was first used by Sir Francis Galton in his study of heredity. The general meaning of 'Regression' is the act of returning or going back to the average value. Regression indicates the average relationship between two variables, and from this average relationship the average value of one variable is estimated corresponding to a given value of the other variable. This process is known as simple regression. In regression analysis there are two types of variables: dependent variable and independent variable. A dependent variable is one whose value is to be predicted. It is also known as the explained variable. An independent variable is the variable which influences the value of the other variable. It is also known as the explanatory variable.
In the words of M.M. Blair, "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data."
Lines of Regression
A line of regression is the line which gives the best estimate of one variable for any specific value of the other variable. For a bivariate distribution we have two lines of regression:
(i) The regression line of Y on X: this gives the best estimated value of Y corresponding to a given value of X.
(ii) The regression line of X on Y: this gives the best estimated value of X corresponding to a given value of Y.
There are always two lines of regression since each variable may be treated as the dependent as well as the independent variable. When we consider X as the independent variable and Y as the dependent variable we get the regression line of Y on X. Again, when we consider Y as the independent variable and X as the dependent variable we get the regression line of X on Y.
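7.9 REGRESSION EQUATIONS
The regression equations express the regression lines.
1. Regression equation of Y on X is

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

where $b_{yx}$ = Regression co-efficient of Y on X

$$b_{yx} = \frac{\operatorname{cov}(X, Y)}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

2. Regression equation of X on Y is

$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$

where $b_{xy}$ = Regression co-efficient of X on Y

$$b_{xy} = \frac{\operatorname{cov}(X, Y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$$

Note: When the actual mean is in decimal, we may take the help of the formulae given below:

$$b_{yx} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2}, \qquad b_{xy} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2}$$

where u = x − A, v = y − B, and A and B are assumed means.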
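As a rough illustration of the two regression equations, the sketch below computes both regression co-efficients from raw totals and returns each line as a (slope, intercept) pair. The function name, the dictionary layout and the five data pairs are hypothetical, chosen only for illustration.

```python
def regression_lines(xs, ys):
    """Regression co-efficients b_yx, b_xy and both lines as (slope, intercept)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_xx - sum_x ** 2)
    b_xy = numerator / (n * sum_yy - sum_y ** 2)
    mean_x, mean_y = sum_x / n, sum_y / n
    return {
        "b_yx": b_yx,
        "b_xy": b_xy,
        # Y - mean_y = b_yx (X - mean_x)  rewritten as  Y = slope * X + intercept
        "y_on_x": (b_yx, mean_y - b_yx * mean_x),
        # X - mean_x = b_xy (Y - mean_y)  rewritten as  X = slope * Y + intercept
        "x_on_y": (b_xy, mean_x - b_xy * mean_y),
    }

# Hypothetical illustrative data (not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
lines = regression_lines(xs, ys)
print(lines)
```

For these data the line of Y on X is approximately Y = 0.6X + 2.2 and the line of X on Y is X = Y − 1; the two lines differ, as expected whenever the correlation is not perfect.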
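7.10 PROPERTIES OF REGRESSION CO-EFFICIENTS
1. The geometric mean of the regression co-efficients is the correlation co-efficient.
Proof. We have

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

$$\therefore\; b_{yx} \cdot b_{xy} = r^2 \;\Rightarrow\; r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$

Both the regression co-efficients will have the same sign and that will be the sign of r. If both $b_{yx}$ and $b_{xy}$ are positive then r is also positive; if both $b_{yx}$ and $b_{xy}$ are negative then r is also negative. Since $\sigma_x > 0$, $\sigma_y > 0$, the sign of each of r, $b_{yx}$, $b_{xy}$ depends on the sign of cov(x, y).
2. If one of the regression co-efficients is > 1, then the other must be < 1.
Proof. Let $b_{yx} > 1$, so that $\frac{1}{b_{yx}} < 1$.
Also, $b_{yx} \cdot b_{xy} = r^2 \le 1$ (since $-1 \le r \le 1$, $r^2 \le 1$)

$$\Rightarrow\; b_{xy} \le \frac{1}{b_{yx}} < 1$$

Hence, $b_{xy} < 1$.
3. Regression co-efficients are independent of the change of origin but not of scale.
Proof. Let X and Y be the original variables and, after changing origin and scale, let the new variables be

$$U = \frac{X - a}{h}, \qquad V = \frac{Y - b}{k}$$

where a, b, h, k are all constants, h > 0, k > 0.
Since the correlation co-efficient is independent of change of origin and scale, $r_{xy} = r_{uv}$. Again, S.D. is independent of change of origin but not of scale:

$$\sigma_x = h\,\sigma_u \quad \text{and} \quad \sigma_y = k\,\sigma_v$$

Now

$$b_{yx} = r_{xy}\,\frac{\sigma_y}{\sigma_x} = r_{uv}\,\frac{k\,\sigma_v}{h\,\sigma_u} = \frac{k}{h}\,b_{vu}$$

Also

$$b_{xy} = r_{xy}\,\frac{\sigma_x}{\sigma_y} = r_{uv}\,\frac{h\,\sigma_u}{k\,\sigma_v} = \frac{h}{k}\,b_{uv}$$

Hence proved.
4. The arithmetic mean of the regression co-efficients is greater than or equal to the correlation co-efficient, provided r > 0.
Proof. We want to show that

$$\frac{b_{yx} + b_{xy}}{2} \ge r$$

$$\Rightarrow\; r\,\frac{\sigma_y}{\sigma_x} + r\,\frac{\sigma_x}{\sigma_y} \ge 2r \;\Rightarrow\; \frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} \ge 2 \quad (\text{dividing by } r > 0)$$

$$\Rightarrow\; \sigma_x^2 + \sigma_y^2 - 2\sigma_x \sigma_y \ge 0 \;\Rightarrow\; (\sigma_x - \sigma_y)^2 \ge 0$$

which is always true. Hence proved.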
Some Important Remarks
1. The regression co-efficient of Y on X, denoted by $b_{yx}$, gives the change in the value of Y for a unit change in the value of X. The regression co-efficient of X on Y, denoted by $b_{xy}$, gives the change in the value of X for a unit change in the value of Y.
2. Both the regression lines pass through the point $(\bar{x}, \bar{y})$.
3. When two variables are uncorrelated (r = 0), the lines of regression become perpendicular to each other. In case of perfect positive or perfect negative correlation (r = ±1), the lines of regression coincide since they cannot be parallel.
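The properties of the regression co-efficients stated above can be spot-checked numerically. This is only an illustrative sketch under stated assumptions: the helper `coefficients`, the data and the constants a, h, b, k are hypothetical, and a numerical check on one data set is of course not a proof.

```python
from math import sqrt

def coefficients(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)

# Hypothetical illustrative data with positive correlation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_yx, b_xy, r = coefficients(xs, ys)

geometric_mean = sqrt(b_yx * b_xy)      # property 1: equals |r|
arithmetic_mean = (b_yx + b_xy) / 2     # property 4: at least r when r > 0

# Property 3: U = (X - a)/h, V = (Y - b)/k changes the co-efficients by k/h and h/k.
a, h, b, k = 1, 2, 2, 4                 # hypothetical constants, h > 0, k > 0
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]
b_vu, b_uv, r_uv = coefficients(us, vs)

print(geometric_mean, r, arithmetic_mean)
print(b_yx, (k / h) * b_vu, b_xy, (h / k) * b_uv)
```

Setting h = k = 1 (a pure change of origin) leaves both co-efficients unchanged, which is the "independent of origin" half of property 3.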
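EXAMPLE 9.
x and y are two variables for which 10 pairs of values are available. Further
Σx = 10, Σy = 0, Σx² = 148, Σy² = 164, Σxy = 123
Find the regression co-efficient of y on x. (AHSEC 1992)
SOLUTION.
The regression co-efficient of y on x

$$b_{yx} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{10 \times 123 - 10 \times 0}{10 \times 148 - (10)^2} = \frac{1230}{1380} = .89$$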
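EXAMPLE 10.
Given that $b_{xy} = .25$, var(x) = 4, var(y) = 36, find the correlation between x and y. (AHSEC 1993)
SOLUTION.
Given $b_{xy} = .25$, $\sigma_x = 2$, $\sigma_y = 6$.
We know that

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \;\Rightarrow\; .25 = r \times \frac{2}{6} \;\Rightarrow\; r = .75$$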
EXAMPLE 11.
If $r_{xy} = .6$ and $b_{yx} = .8$, what is the value of $b_{xy}$?
SOLUTION.
We know that

$$r^2 = b_{yx} \cdot b_{xy} \;\Rightarrow\; (.6)^2 = .8 \times b_{xy} \;\Rightarrow\; b_{xy} = \frac{.36}{.8} = .45$$
EXAMPLE 12.
If two regression co-efficients are .8 and 1.2, what would be the value of the co-efficient of correlation? (AHSEC 1958)
SOLUTION.
We know that

$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{.8 \times 1.2} = \sqrt{.96} = .98$$
EXAMPLE 13.
Two regression lines have the equations x + 2y − 5 = 0 and 2x + 3y − 8 = 0. Find $\bar{x}$ and $\bar{y}$.
SOLUTION.
Given the regression lines
x + 2y − 5 = 0
2x + 3y − 8 = 0
Since both the regression lines pass through the point $(\bar{x}, \bar{y})$, we have
$\bar{x} + 2\bar{y} = 5$ (i)
$2\bar{x} + 3\bar{y} = 8$ (ii)
(i) × 2 ⇒ $2\bar{x} + 4\bar{y} = 10$ (iii)
(iii) − (ii) ⇒ $\bar{y} = 2$
∴ (i) ⇒ $\bar{x} + 2 \times 2 = 5$ ⇒ $\bar{x} = 1$
EXAMPLE 14.
Given the regression line of y on x is y = 10 − 6x. Derive the condition under which the regression line of x on y can be written as $x = \frac{1}{6}(10 - y)$.
(AHSEC 1998)
SOLUTION.
Given the regression line of y on x
y = 10 − 6x
∴ $b_{yx} = -6$
If the regression line of x on y be

$$x = \frac{1}{6}(10 - y) = \frac{10}{6} - \frac{1}{6}\,y$$

then $b_{xy} = -\frac{1}{6}$.
Now, since both the regression co-efficients are negative,

$$r = -\sqrt{b_{yx} \cdot b_{xy}} = -\sqrt{(-6)\left(-\frac{1}{6}\right)} = -1$$

Hence the required condition is that there must be perfect negative correlation between x and y.
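EXAMPLE 15.
Find the line of regression of y on x from the following data:
x : 5  10  15  25  30  35  40  45
y : 25  32  44  32  39  49  55  60
What will be the value of y for x = 48? (AHSEC 2003)
SOLUTION.
x | y | u = x − 30 | v = y − 39 | u² | v² | uv
5 | 25 | −25 | −14 | 625 | 196 | 350
10 | 32 | −20 | −7 | 400 | 49 | 140
15 | 44 | −15 | 5 | 225 | 25 | −75
25 | 32 | −5 | −7 | 25 | 49 | 35
30 | 39 | 0 | 0 | 0 | 0 | 0
35 | 49 | 5 | 10 | 25 | 100 | 50
40 | 55 | 10 | 16 | 100 | 256 | 160
45 | 60 | 15 | 21 | 225 | 441 | 315
 | | Σu = −35 | Σv = 24 | Σu² = 1625 | Σv² = 1116 | Σuv = 975
Now,

$$\bar{x} = \bar{u} + 30 = \frac{-35}{8} + 30 = 25.63, \qquad \bar{y} = \bar{v} + 39 = \frac{24}{8} + 39 = 42$$

$$b_{yx} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} = \frac{8 \times 975 - (-35)(24)}{8 \times 1625 - (-35)^2} = \frac{8640}{11775} = .734$$

The regression line of y on x is

$$y - 42 = .734\,(x - 25.63) \;\Rightarrow\; y = .734x + 23.19$$

Now, when x = 48,
y = .734 × 48 + 23.19
= 35.23 + 23.19 = 58.42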
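The assumed-mean shortcut used in this solution can be re-run in Python as a cross-check. The sketch below is an assumption in two respects: the function name is hypothetical, and the eight data pairs are the values as read from the worked table (the scan of the data line is unclear). Evaluating the formula on the table totals gives a slope of about 0.734.

```python
def b_yx_assumed_means(xs, ys, A, B):
    """b_yx and the two means via u = x - A and v = y - B (assumed means A, B)."""
    n = len(xs)
    us = [x - A for x in xs]
    vs = [y - B for y in ys]
    sum_u, sum_v = sum(us), sum(vs)
    sum_uu = sum(u * u for u in us)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    b_yx = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    mean_x = A + sum_u / n
    mean_y = B + sum_v / n
    return b_yx, mean_x, mean_y

# Data as read from the worked table of the example (assumed where the scan is unclear).
xs = [5, 10, 15, 25, 30, 35, 40, 45]
ys = [25, 32, 44, 32, 39, 49, 55, 60]
b_yx, mean_x, mean_y = b_yx_assumed_means(xs, ys, A=30, B=39)

# Estimate of y at x = 48 from  y - mean_y = b_yx (x - mean_x).
y_at_48 = mean_y + b_yx * (48 - mean_x)
print(round(b_yx, 3), mean_x, mean_y, round(y_at_48, 2))
```

Because regression co-efficients do not depend on the origin, any other choice of A and B should return the same slope and the same estimate.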
EXAMPLE 16.
The regression equation of x on y is 3y − 5x + 180 = 0. Given that [the given values are illegible in the source], find r and $\bar{x}$.
(AHSEC 2005)
SOLUTION.
Given the regression equation of x on y is
3y − 5x + 180 = 0
[The remainder of the worked solution is illegible in the source.]
EXAMPLE 17.
Suppose $b_{yx}$ is the regression co-efficient of y on x. What does it indicate? Interpret the meaning of the statement $b_{yx} = -.53$. (AHSEC 2005)
SOLUTION.
$b_{yx}$ indicates the increment in the value of the dependent variable y for a unit change in the value of the independent variable x.
Interpretation: When $b_{yx} = -.53$, we mean that the change in the value of y is −.53 for a unit increase in the value of x, i.e. y decreases by .53 on average.
EXAMPLE 18.
The equations of two lines of regression are given below:
8x − 10y + 66 = 0, 40x − 18y − 214 = 0
Find the co-efficient of correlation between x and y. (AHSEC 199…)
SOLUTION.
Let the equation of regression of y on x be 8x − 10y + 66 = 0 and that of x on y be 40x − 18y − 214 = 0.
8x − 10y + 66 = 0 ⇒ 10y = 8x + 66 ⇒ $y = \frac{8}{10}x + \frac{66}{10}$ ∴ $b_{yx} = \frac{8}{10} = \frac{4}{5}$
40x − 18y − 214 = 0 ⇒ 40x = 18y + 214 ⇒ $x = \frac{18}{40}y + \frac{214}{40}$ ∴ $b_{xy} = \frac{18}{40} = \frac{9}{20}$
Co-efficient of correlation

$$r = \pm\sqrt{b_{yx} \cdot b_{xy}} = \pm\sqrt{\frac{4}{5} \times \frac{9}{20}} = \pm\, .6$$

Since both the regression co-efficients are positive, therefore r = .6.
Note: Here −1 < .6 < 1, so our supposition is correct. But if r goes outside the limits, we have to interchange the lines.
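The trial-and-interchange rule in the note can be sketched in Python. This is a minimal illustration, assuming each line is written as a·x + b·y + c = 0 with a and b non-zero; the function names and the tuple representation are hypothetical. The point of intersection of the two lines gives the means, since both lines pass through $(\bar{x}, \bar{y})$.

```python
from math import sqrt

def slope_as_y_on_x(line):
    """b_yx if a*x + b*y + c = 0 is read as the line of y on x (solve for y)."""
    a, b, c = line
    return -a / b

def slope_as_x_on_y(line):
    """b_xy if a*x + b*y + c = 0 is read as the line of x on y (solve for x)."""
    a, b, c = line
    return -b / a

def correlation_from_lines(line1, line2):
    """Try both assignments; keep the one where b_yx * b_xy lies between 0 and 1."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = slope_as_y_on_x(y_on_x)
        b_xy = slope_as_x_on_y(x_on_y)
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = sqrt(product)
            # r takes the common sign of the two regression co-efficients
            return (-r if b_yx < 0 else r), y_on_x, x_on_y
    raise ValueError("the two lines cannot both be regression lines")

def means_from_lines(line1, line2):
    """Intersection of the two lines (Cramer's rule), i.e. the point of means."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    return (b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det

# The two lines of this example: 8x - 10y + 66 = 0 and 40x - 18y - 214 = 0.
r, y_on_x, x_on_y = correlation_from_lines((8, -10, 66), (40, -18, -214))
mean_x, mean_y = means_from_lines((8, -10, 66), (40, -18, -214))
print(round(r, 2), y_on_x, x_on_y, mean_x, mean_y)
```

Exactly one of the two assignments can give a product not exceeding 1 (unless the product equals 1), because interchanging the lines replaces the product by its reciprocal.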
EXAMPLE 19.
Is the following statement correct? Give reasons.
"The regression co-efficient of x on y is 3.2 and that of y on x is .8"
(AHSEC 1996)
SOLUTION.
Given $b_{xy} = 3.2$, $b_{yx} = .8$
Now,

$$r = \sqrt{b_{xy} \cdot b_{yx}} = \sqrt{3.2 \times .8} = \sqrt{2.56} = 1.6$$

Since r is outside the limits −1 ≤ r ≤ 1, the statement is incorrect.
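EXAMPLE 20.
You are given the following data:
       x    y
A.M.   36   85
S.D.   11   8
Correlation co-efficient between x and y = .66
(i) Find the two regression equations.
(ii) Estimate the value of x when y = 75.
SOLUTION.
Given,
$\bar{x} = 36$, $\bar{y} = 85$, $\sigma_x = 11$, $\sigma_y = 8$, r = .66
(i) Regression equation of y on x

$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \;\Rightarrow\; y - 85 = .66 \times \frac{8}{11}(x - 36) \;\Rightarrow\; y - 85 = .48\,(x - 36) \;\Rightarrow\; y = .48x + 67.72$$

Regression equation of x on y

$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \;\Rightarrow\; x - 36 = .66 \times \frac{11}{8}(y - 85) \;\Rightarrow\; x - 36 = .908\,(y - 85) \;\Rightarrow\; x = .908y - 41.14$$

(ii) When y = 75, x = .908 × 75 − 41.14
= 68.1 − 41.14 = 26.96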
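When only the means, standard deviations and r are available, as in this example, both regression lines follow directly from $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$. A minimal Python sketch follows; the function name is hypothetical, and the inputs (means 36 and 85, S.D.s 11 and 8, r = .66) are the values as read from the example.

```python
def regression_lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Both regression lines as (slope, intercept) pairs from summary statistics."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    y_on_x = (b_yx, mean_y - b_yx * mean_x)   # y = slope * x + intercept
    x_on_y = (b_xy, mean_x - b_xy * mean_y)   # x = slope * y + intercept
    return y_on_x, x_on_y

y_on_x, x_on_y = regression_lines_from_summary(36, 85, 11, 8, 0.66)

# Estimate of x when y = 75, using the line of x on y.
x_at_75 = x_on_y[0] * 75 + x_on_y[1]
print(y_on_x, x_on_y, round(x_at_75, 2))
```

Without intermediate rounding the estimate is about 26.9; the hand computation arrives at 26.96 because the slope .9075 is rounded to .908 before multiplying.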
EXAMPLE 21.
If the two lines of regression are
4x − 5y + 30 = 0
20x − 9y − 107 = 0
which of these is the line of regression of x on y? Find r and $\sigma_y$ when $\sigma_x = 3$.
SOLUTION.
Let the regression of y on x be 20x − 9y − 107 = 0 and that of x on y be 4x − 5y + 30 = 0.
20x − 9y − 107 = 0 ⇒ 9y = 20x − 107 ⇒ $y = \frac{20}{9}x - \frac{107}{9}$ ∴ $b_{yx} = \frac{20}{9}$
4x − 5y + 30 = 0 ⇒ 4x = 5y − 30 ⇒ $x = \frac{5}{4}y - \frac{30}{4}$ ∴ $b_{xy} = \frac{5}{4}$
Now,

$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{\frac{20}{9} \times \frac{5}{4}} = \sqrt{\frac{100}{36}} = 1.67$$

taking the positive root since the regression co-efficients are positive.
Since r goes outside −1 ≤ r ≤ 1, our supposition is wrong. So, the required regression line of x on y is 20x − 9y − 107 = 0.
Calculation of r
Let us interchange the regression lines, so that x on y is 20x − 9y − 107 = 0 and y on x is 4x − 5y + 30 = 0.
20x − 9y − 107 = 0 ⇒ 20x = 9y + 107 ⇒ $x = \frac{9}{20}y + \frac{107}{20}$ ∴ $b_{xy} = \frac{9}{20}$
4x − 5y + 30 = 0 ⇒ 5y = 4x + 30 ⇒ $y = \frac{4}{5}x + 6$ ∴ $b_{yx} = \frac{4}{5}$

$$r = \sqrt{\frac{9}{20} \times \frac{4}{5}} = \sqrt{\frac{36}{100}} = .6$$

Now,

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \;\Rightarrow\; \frac{9}{20} = .6 \times \frac{3}{\sigma_y} \;\Rightarrow\; \sigma_y = \frac{1.8 \times 20}{9} = 4$$

REVIEW QUESTIONS
1. Write a short note on scatter diagram. (AHSEC 1998)
2. Explain what do you mean by positive correlation and negative correlation. Give examples. (AHSEC 1992)
3. Can two uncorrelated variables be independent? (AHSEC 1996) [Ans. Need not]
4. If $r_{xy} = 0$, how are x and y related? (AHSEC 1997) [Ans. uncorrelated, i.e. not linearly related]
5. Define Karl Pearson's co-efficient of correlation. What does it measure? (AHSEC 1999)
6. If r = 0, what is the value of cov(x, y) and how are x and y related? (AHSEC 2000) [Ans. 0, uncorrelated]
7. Show that −1 ≤ r ≤ 1, where r is the correlation co-efficient. (AHSEC 2004, 2007, 2015)
8. Show that correlation co-efficient is independent of change of origin and scale. (AHSEC 2003, 2008)
9. If the correlation co-efficient between two related variables x and y be 0.5, what will be the correlation co-efficient between y and x? (AHSEC 1992) [Ans. .5]
10. x and y are two independent variables; show that they are uncorrelated. (AHSEC 1992, 2006)
11. The correlation co-efficient between x and y be r. If a, b are constants, then show that the correlation co-efficient between ax and by is numerically equal to r. (AHSEC 1992)
12. Suppose the correlation co-efficient between two variables x and y is zero (r = 0). Does it mean that x and y are independent? Explain by means of an example. (AHSEC)
13. Define correlation. Discuss positive, negative and zero correlation with the help of scatter diagram.
14. What do you mean by bi-variate data?
15. Write down the relationship between correlation and regression co-efficients.
27. To study the relationship between the expenditure on lodging Rs X and on fooding Rs Y, an enquiry into 50 families revealed the following results:
ΣX = 8500, ΣY = 9600, $\sigma_x$ = …, $\sigma_y$ = …, r = .6
Estimate the expenditure on fooding when the expenditure on lodging is Rs. 200.
[Ans. 198]
28. Find the regression equation of y on x and that of x on y. Also find the value of y when x = 3 and the value of x when y = 5.
x : 10  20  30  40  50  60
y : 4  12  20  24  32  38
[Ans. y = 1.8 + .52x, x = 1 + 1.7y, y = 3.36, x = 9.5]
29. Find the co-efficient of correlation of the heights of mothers and daughters from the following:
Height of Mothers : 65  66  67  68  69  70  71
Height of Daughters : 67  68  66  69  72  72  69
[Ans. .67 (approx)]
30. (a) Given $b_{yx} = .85$, $b_{xy} = .89$ and $\sigma_y = 6$, find the value of r and $\sigma_x$.
[Ans. r = .87, $\sigma_x$ = 6.14]
(b) You are given
r = .6, $\bar{x} = 10$, $\bar{y} = 20$, $\sigma_x = 1.875$, $\sigma_y = 2.5$,
find the line of regression of x on y. [Ans. x = 1 + .45y]
(c) Two lines of regression are x + 2y − 5 = 0 and 2x + 3y − 8 = 0 and var(x) = 12. Calculate $\bar{x}$, $\bar{y}$, $\sigma_y^2$ and r. [Ans. 1, 2, 4 (approx), −.87]
31. What is the relationship between correlation co-efficient and regression co-efficients? (AHSEC 2008) [Ans. $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$]
32. Define Karl Pearson's correlation co-efficient. State its properties. (AHSEC 2006)
33. What are correlation and regression? (AHSEC 2007)
34. What is the value of the geometric mean between the regression co-efficients? (AHSEC 2015) [Ans. Correlation co-efficient]
35. What is regression co-efficient? Show that regression co-efficient is independent of change of origin but not of scale. (AHSEC 2015)