Regression
Definition: Regression is the measure of the average relationship between two or
more variables in terms of the original units of the data.
Regression Equation: The functional relationship of a dependent variable with
one or more independent variables is called the regression equation.
It is also called a prediction equation or estimating equation.
Note: The independent variable in regression analysis is called the "predictor" or
"regressor", and the dependent variable is called the regressed variable.

Types of Regression:
* If there are only two variables under consideration, the regression is called
simple regression.
* If there are more than two variables under consideration, the regression is
called multiple regression.
* If there are more than two variables under consideration, and only the relation
between two variables is established, after excluding the effect of the remaining
variables, the regression is called partial regression.
* If the relationship between x and y is non-linear, the regression is a
curvilinear regression.

There are certain guidelines for regression lines:
1) Use regression lines to predict values when there is a significant correlation.
2) Do not use them if the correlation is not significant.
3) Stay within the range of the data. For example, if the observed x values run from 10 to 60,
do not predict a value for x = 400.
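Guideline 3 can be enforced mechanically. The following is a minimal Python sketch, assuming a line y = ax + b has already been fitted; the function name `predict_in_range` and the coefficients a = 2, b = 1 are hypothetical and chosen only for illustration.

```python
def predict_in_range(x, a, b, x_min, x_max):
    """Predict y = a*x + b, but only for x inside the observed data range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} lies outside the observed range [{x_min}, {x_max}]; "
            "the fitted line should not be used to extrapolate"
        )
    return a * x + b


# Hypothetical fitted line y = 2x + 1 on data whose x values run from 10 to 60.
print(predict_in_range(30, a=2.0, b=1.0, x_min=10, x_max=60))  # inside the range

try:
    predict_in_range(400, a=2.0, b=1.0, x_min=10, x_max=60)
except ValueError as err:
    print("refused:", err)
```

Raising an error rather than returning a number makes an accidental extrapolation visible to the caller.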
Regression Equations (Linear Fit)
* Linear regression equation of y on x
* Linear regression equation of x on y

Equation of the Regression Line of Y on X
The regression line of Y on X is the best-fitting straight line for the observed pairs of
values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, based on the assumption that x is the
independent variable and y is the dependent variable.
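Let the equation of the regression line of Y on X be assumed as
$y = ax + b$  (1)
By the principle of least squares, the normal equations which give the values
of a and b are
$\sum y_i = a \sum x_i + nb$  (2)
and
$\sum x_i y_i = a \sum x_i^2 + b \sum x_i$  (3)
Dividing equation (2) by n, we get
$\bar{y} = a\bar{x} + b$  (4)
where $\bar{x} = E(X)$ and $\bar{y} = E(Y)$. (1) - (4) gives the required equation as
$y - \bar{y} = a(x - \bar{x})$  (5)
Eliminating b between equations (2) and (3), we get
$a = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$
or
$a = \dfrac{\frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}}{\frac{1}{n}\sum x_i^2 - \bar{x}^2} = \dfrac{p_{xy}}{\sigma_x^2}$  (6)
where $p_{xy}$ is the covariance of X and Y. Using (6) in (5), we get the equation of the regression line of Y on X as
$y - \bar{y} = \dfrac{p_{xy}}{\sigma_x^2}(x - \bar{x})$  (7)
or
$y - \bar{y} = \dfrac{r_{xy}\sigma_y}{\sigma_x}(x - \bar{x})$  (8)
In a similar manner, assuming the equation of the regression line of X on Y as
$x = ay + b$ and using the normal equations
$\sum x_i = a \sum y_i + nb$ and $\sum x_i y_i = a \sum y_i^2 + b \sum y_i$,
we can get the equation of the regression line of X on Y as
$x - \bar{x} = \dfrac{p_{xy}}{\sigma_y^2}(y - \bar{y})$  (9)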
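or
$x - \bar{x} = \dfrac{r_{xy}\sigma_x}{\sigma_y}(y - \bar{y})$  (10)
Note:
1. $\dfrac{p_{xy}}{\sigma_x^2}$ or $\dfrac{r_{xy}\sigma_y}{\sigma_x}$ is called the regression coefficient of Y on X and is
denoted by $b_{yx}$. $\dfrac{p_{xy}}{\sigma_y^2}$ or $\dfrac{r_{xy}\sigma_x}{\sigma_y}$ is called the regression
coefficient of X on Y and is denoted by $b_{xy}$.
2. Clearly $b_{yx} \times b_{xy} = r_{xy}^2$, i.e., $r_{xy}$ is the geometric mean of $b_{yx}$ and $b_{xy}$:
$r_{xy} = \pm\sqrt{b_{yx}\, b_{xy}}$
The sign of $r_{xy}$ is the same as that of $b_{yx}$ or $b_{xy}$, as $b_{yx} = r_{xy}\dfrac{\sigma_y}{\sigma_x}$ and
$b_{xy} = r_{xy}\dfrac{\sigma_x}{\sigma_y}$ have the same sign as $r_{xy}$ (since $\sigma_x$ and $\sigma_y$ are positive).
Also $\dfrac{b_{yx}}{b_{xy}} = \dfrac{\sigma_y^2}{\sigma_x^2}$.
3. When there is perfect linear correlation between X and Y, viz.,
when $r_{xy} = \pm 1$, the two regression lines coincide.
4. The point of intersection of the two regression lines is clearly the
point whose co-ordinates are $(\bar{x}, \bar{y})$.
5. When there is no linear correlation between X and Y, viz., when $r_{xy} = 0$,
the equations of the regression lines become $y = \bar{y}$ and $x = \bar{x}$,
which are at right angles.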
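The normal equations and the properties listed in the notes can be checked numerically. The sketch below, in plain Python, solves the normal equations (2) and (3) in closed form, then verifies that the product of the two regression coefficients equals the squared correlation coefficient and that both lines pass through the point of means. The helper names and the five data pairs are made up for illustration and are not part of the original text.

```python
from math import sqrt, isclose


def least_squares_line(xs, ys):
    """Return (a, b) of y = a*x + b from the normal equations (2) and (3)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # b eliminated from (2), (3)
    b = (sy - a * sx) / n                          # equation (2) divided by n
    return a, b


def correlation(xs, ys):
    """Pearson correlation r = p_xy / (sigma_x * sigma_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    p_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return p_xy / sqrt(var_x * var_y)


# Illustrative data only (not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, c_yx = least_squares_line(xs, ys)   # y on x: y = b_yx * x + c_yx
b_xy, c_xy = least_squares_line(ys, xs)   # x on y: x = b_xy * y + c_xy
r = correlation(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

# Note 2: the product of the regression coefficients equals r squared.
assert isclose(b_yx * b_xy, r ** 2)
# Note 4: both lines pass through the point of means (x_bar, y_bar).
assert isclose(b_yx * x_bar + c_yx, y_bar)
assert isclose(b_xy * y_bar + c_xy, x_bar)
print(b_yx, b_xy, r)
```

Fitting x on y simply swaps the roles of the two lists, which mirrors how the second pair of normal equations is obtained from the first.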
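Problem 1: For the following data, find the regression line of y on x.
x | 1 | 2 | 3  | 4  | 5  | 8  | 10
y | 9 | 8 | 10 | 12 | 14 | 16 | 15
Solution 1: $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{33}{7} = 4.714$ and $\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{84}{7} = 12$
x     | y  | xy  | x^2
1     | 9  | 9   | 1
2     | 8  | 16  | 4
3     | 10 | 30  | 9
4     | 12 | 48  | 16
5     | 14 | 70  | 25
8     | 16 | 128 | 64
10    | 15 | 150 | 100
Total 33 | 84 | 451 | 219
$b_{yx} = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \dfrac{451 - 396}{219 - 155.571} = 0.867$
The regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
$\Rightarrow y - 12 = 0.867(x - 4.714)$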
$\Rightarrow y = 0.867x + 7.9129$

Problem 2: From the following data, fit two regression equations by
finding actual means (of x and y), i.e., by the actual mean method.
x | 1 | 2 | 3 | 4 | 5 | 6 | 7
y | 2 | 4 | 7 | 6 | 5 | 6 | 5
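Solution 2: $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{28}{7} = 4$ and $\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{35}{7} = 5$
x  | y  | X = x - x̄ | Y = y - ȳ | X^2 | Y^2 | XY
1  | 2  | -3 | -3 | 9  | 9  | 9
2  | 4  | -2 | -1 | 4  | 1  | 2
3  | 7  | -1 | 2  | 1  | 4  | -2
4  | 6  | 0  | 1  | 0  | 1  | 0
5  | 5  | 1  | 0  | 1  | 0  | 0
6  | 6  | 2  | 1  | 4  | 1  | 2
7  | 5  | 3  | 0  | 9  | 0  | 0
28 | 35 | 0  | 0  | 28 | 16 | 11
$b_{yx} = \dfrac{\sum XY}{\sum X^2} = \dfrac{11}{28} = 0.3928$
$b_{xy} = \dfrac{\sum XY}{\sum Y^2} = \dfrac{11}{16} = 0.6875$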
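The regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
$\Rightarrow y - 5 = 0.3928(x - 4)$
$\Rightarrow y = 0.393x + 3.428$.
The regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$
$\Rightarrow x - 4 = 0.6875(y - 5)$
$\Rightarrow x = 0.688y + 0.56$.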
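The actual mean method used above is easy to reproduce in code. The sketch below is one possible Python version, with a hypothetical helper name `fit_by_actual_means`; it recomputes both regression coefficients for Problem 2 and, as a cross-check, the coefficient of y on x for Problem 1 (the y values are taken from the working table of Solution 1).

```python
def fit_by_actual_means(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using deviations from the actual means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dev_x = [x - x_bar for x in xs]          # X = x - x_bar
    dev_y = [y - y_bar for y in ys]          # Y = y - y_bar
    s_xy = sum(X * Y for X, Y in zip(dev_x, dev_y))
    s_xx = sum(X * X for X in dev_x)
    s_yy = sum(Y * Y for Y in dev_y)
    return x_bar, y_bar, s_xy / s_xx, s_xy / s_yy


# Data of Problem 2.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 7, 6, 5, 6, 5]
x_bar, y_bar, b_yx, b_xy = fit_by_actual_means(xs, ys)

# y on x: y = b_yx * x + (y_bar - b_yx * x_bar)
print(f"y = {b_yx:.4f}x + {y_bar - b_yx * x_bar:.4f}")
# x on y: x = b_xy * y + (x_bar - b_xy * y_bar)
print(f"x = {b_xy:.4f}y + {x_bar - b_xy * y_bar:.4f}")

# Data of Problem 1 (only the line of y on x is asked for there).
xs1 = [1, 2, 3, 4, 5, 8, 10]
ys1 = [9, 8, 10, 12, 14, 16, 15]
x_bar1, y_bar1, b_yx1, _ = fit_by_actual_means(xs1, ys1)
print(f"y = {b_yx1:.4f}x + {y_bar1 - b_yx1 * x_bar1:.4f}")
```

The printed coefficients should agree with the worked solutions up to rounding; the hand calculations round or truncate intermediate values (for example 11/28 is written as 0.3928), so the last digit can differ.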
Problem 3: From the following results, obtain the two regression equations
and estimate the yield of crops when the rainfall is 29 cm and the
rainfall when the yield is 600 kg.
     | Y (yield in kg) | X (rainfall in cm)
Mean | 508.4           | 26.7
S.D. | 36.8            | 4.6
Coefficient of correlation between yield and rainfall is 0.52.
Solution 3: We have $\bar{x} = 26.7$, $\bar{y} = 508.4$, $\sigma_x = 4.6$, $\sigma_y = 36.8$ and
$\rho = 0.52$. Now,
$b_{yx} = \rho\dfrac{\sigma_y}{\sigma_x} = 4.16$ and $b_{xy} = \rho\dfrac{\sigma_x}{\sigma_y} = 0.065$.
The required regression equations are
$y = 4.16x + 397.328$
and
$x = 0.065y - 6.346$.
When x = 29 cm, we have y = (4.16 × 29) + 397.328 = 517.968 kg.
When y = 600 kg, we have x = (0.065 × 600) - 6.346 = 32.654 cm.
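When only the means, standard deviations and correlation coefficient are known, as in this problem, both lines follow directly from those summary values. A minimal Python sketch of the calculation is given below; the function name `lines_from_summary` is an assumption made for illustration.

```python
def lines_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Return ((slope, intercept) of y on x, (slope, intercept) of x on y)."""
    b_yx = r * sd_y / sd_x          # regression coefficient of y on x
    b_xy = r * sd_x / sd_y          # regression coefficient of x on y
    return (b_yx, y_bar - b_yx * x_bar), (b_xy, x_bar - b_xy * y_bar)


# Summary statistics of Problem 3 (X = rainfall in cm, Y = yield in kg).
(y_slope, y_icpt), (x_slope, x_icpt) = lines_from_summary(
    x_bar=26.7, y_bar=508.4, sd_x=4.6, sd_y=36.8, r=0.52
)

yield_at_29cm = y_slope * 29 + y_icpt        # estimate from the line of y on x
rainfall_at_600kg = x_slope * 600 + x_icpt   # estimate from the line of x on y
print(round(yield_at_29cm, 3), round(rainfall_at_600kg, 3))
```

Note that each estimate uses its own line: the yield is read from the line of y on x, and the rainfall from the line of x on y.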
i.e., when the rainfall is 29 cm, the estimated yield of the crop is 517.968 kg, and
when the yield is 600 kg, the estimated rainfall is 32.654 cm.

Multiple Linear Regression
If the number of independent variables in a regression model is more than one, then
the model is called a multiple regression model. Many real-world applications
require multiple regression models.
Regression model with two independent variables using normal equations:
Suppose the number of independent variables is two; then
$Y = b_0 + b_1 X_1 + b_2 X_2$
The normal equations are
$\sum Y = n b_0 + b_1 \sum X_1 + b_2 \sum X_2$
$\sum Y X_1 = b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$
$\sum Y X_2 = b_0 \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$

Problem 1: The annual sales revenue (in crores of rupees) of a product
as a function of sales force (number of salesmen) and annual
advertising expenditure (in lakhs of rupees) for the past 10 years are
summarized in the following table.
Annual sales revenue Y            | 20 | 23 | 25 | 27 | 21 | 29 | 22 | 24 | 27 | 35
Sales force X1                    | 8  | 13 | 8  | 18 | 23 | 16 | 10 | 12 | 14 | 20
Annual advertising expenditure X2 | 28 | 23 | 38 | 16 | 20 | 28 | 23 | 30 | 26 | 32
Let the regression model be $Y = b_0 + b_1 X_1 + b_2 X_2$.
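Y   | X1  | X2  | X1^2 | X2^2 | X1X2 | YX1  | YX2
20  | 8   | 28  | 64   | 784  | 224  | 160  | 560
23  | 13  | 23  | 169  | 529  | 299  | 299  | 529
25  | 8   | 38  | 64   | 1444 | 304  | 200  | 950
27  | 18  | 16  | 324  | 256  | 288  | 486  | 432
21  | 23  | 20  | 529  | 400  | 460  | 483  | 420
29  | 16  | 28  | 256  | 784  | 448  | 464  | 812
22  | 10  | 23  | 100  | 529  | 230  | 220  | 506
24  | 12  | 30  | 144  | 900  | 360  | 288  | 720
27  | 14  | 26  | 196  | 676  | 364  | 378  | 702
35  | 20  | 32  | 400  | 1024 | 640  | 700  | 1120
253 | 142 | 264 | 2246 | 7326 | 3617 | 3678 | 6751
Substituting the required values in the normal equations, we get the
following simultaneous equations: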
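$253 = 10b_0 + 142b_1 + 264b_2$
$3678 = 142b_0 + 2246b_1 + 3617b_2$
$6751 = 264b_0 + 3617b_1 + 7326b_2$
or, in matrix form,
$\begin{bmatrix} 10 & 142 & 264 \\ 142 & 2246 & 3617 \\ 264 & 3617 & 7326 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} 253 \\ 3678 \\ 6751 \end{bmatrix}$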
The solution to the above set of simultaneous equations is
$b_0 = 5.1483$, $b_1 = 0.6190$ and $b_2 = 0.4304$.
Therefore, the regression model is $Y = 5.1483 + 0.6190X_1 + 0.4304X_2$.
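The whole calculation can be reproduced from the raw data. The following Python sketch builds the three normal equations from the sales table and solves them by Gaussian elimination with partial pivoting; this is just one way to solve the system, the text does not say which method was used, and the helper names `solve_linear_system` and `dot` are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):          # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def dot(u, v):
    """Sum of element-wise products, e.g. dot(x1, x2) = sum of X1*X2."""
    return sum(p * q for p, q in zip(u, v))


# Data of the sales problem.
y = [20, 23, 25, 27, 21, 29, 22, 24, 27, 35]
x1 = [8, 13, 8, 18, 23, 16, 10, 12, 14, 20]
x2 = [28, 23, 38, 16, 20, 28, 23, 30, 26, 32]
n = len(y)

# Coefficient matrix and right-hand side of the three normal equations.
lhs = [
    [n, sum(x1), sum(x2)],
    [sum(x1), dot(x1, x1), dot(x1, x2)],
    [sum(x2), dot(x1, x2), dot(x2, x2)],
]
rhs = [sum(y), dot(y, x1), dot(y, x2)]

b0, b1, b2 = solve_linear_system(lhs, rhs)
print(f"Y = {b0:.4f} + {b1:.4f} X1 + {b2:.4f} X2")
```

Building `lhs` and `rhs` from the raw columns also serves as a check on the column totals of the working table, since the same sums appear as the coefficients of the simultaneous equations.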