Econometrics 1, Chapter 2
Y_i = E[Y | X_i] + ε_i
The stochastic disturbance term ε_i plays a critical role in estimating the PRF.
The PRF is an idealized concept. Hence, we use the stochastic sample regression function (SRF) to estimate the PRF, i.e., we use
Y_i = Ŷ_i + e_i to estimate Y_i = E[Y | X_i] + ε_i,
where Ŷ_i = f(X_i).
2.2 The Simple Linear Regression Model
Linear: we assume that the PRF is linear in
parameters (α and β); it may or may not be linear in
variables (Y or X).
E[Y | X_i] = α + βX_i   and   Y_i = α + βX_i + ε_i
Simple: because we have only one regressor (X).
Accordingly, we use:
Ŷ_i = α̂ + β̂X_i to estimate E[Y | X_i] = α + βX_i.
2.2 The Simple Linear Regression Model
Using the theoretical relationship between X and Y, Y_i is decomposed into a non-stochastic/systematic component α + βX_i and a random component ε_i.
This is a theoretical decomposition because we do
not know the values of α and β, or the values of ɛ.
The operational decomposition of Y_i is with reference to the fitted line: the actual value Y_i is equal to the fitted value Ŷ_i = α̂ + β̂X_i plus the residual e_i.
The residuals ei serve a similar purpose as the
stochastic term ε_i, but the two are not identical.
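As a compact restatement of the distinction (added here in the chapter's own notation):
ε_i = Y_i − E[Y | X_i] = Y_i − (α + βX_i)   (unobservable disturbance, defined with respect to the PRF)
e_i = Y_i − Ŷ_i = Y_i − (α̂ + β̂X_i)   (computable residual, defined with respect to the SRF)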
2.2 The Simple Linear Regression Model
[Figure: scatter of observations O_1–O_4 at X_1, X_2, X_3, X_4 around the population regression line E[Y | X_i] = α + βX_i; the vertical distances ε_1–ε_4 from each observation to the PRF are the stochastic disturbances, and α is the intercept.]
2.2 The Simple Linear Regression Model
[Figure: the same observations O_1–O_4 plotted with both the PRF E[Y | X_i] = α + βX_i and the SRF Ŷ = α̂ + β̂X. For each observation, the disturbance ε_i (vertical distance to the PRF) and the residual e_i (vertical distance to the SRF) are not identical: here ε_1 < e_1, ε_2 = e_2, ε_3 < e_3, ε_4 > e_4.]
2.3 The Method of Least Squares
Our sample is only one of a large number of possible samples.
Implication: the SRF line is just one of the possible SRFs. Each SRF line has unique α̂ and β̂ values.
Then, which of these lines should we choose?
Generally we will look for the SRF which is very close
to the (unknown) PRF.
We need a rule that makes the SRF as close as
possible to the observed data points.
But, how can we devise such a rule? Equivalently,
how can we choose the best technique to estimate
the parameters of interest (α and β)?
2.3 The Method of Least Squares
Generally, there are 3 methods of estimation:
method of least squares,
method of moments, and
maximum likelihood estimation.
The most common method for fitting a regression
line is the method of least squares. We will use least-squares estimation, specifically Ordinary Least Squares (OLS).
What does OLS do?
A line fits a dataset well if observations are close to
it, i.e., if predicted values obtained using the line are
close to the values actually observed.
2.3 The Method of Least Squares
Meaning, the residuals should be small.
Therefore, when assessing the fit of a line, the
vertical distances of the points from the line are the
only distances that matter.
The OLS method calculates the best-fitting line for a dataset by minimizing the sum of the squares of the vertical deviations from each data point to the line (the Residual Sum of Squares, RSS):
Minimize  RSS = Σ_{i=1}^{n} e_i²
We could think of minimizing RSS by successively choosing pairs of values for α̂ and β̂ until RSS is made as small as possible.
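For intuition, here is a minimal Python sketch (added, not from the slides) of this "search over pairs" idea, using the sales–advertising data from the numerical example later in the chapter; the grid minimum coincides with the calculus-based OLS solution derived next (α̂ = 3.6, β̂ = 0.75, RSS = 14.65).

```python
# Naive grid search for the (alpha_hat, beta_hat) pair minimizing RSS.
# Data: sales (Y, thousands of Birr) and advertising expense (X, hundreds of Birr)
# from the chapter's numerical example.
Y = [11, 10, 12, 6, 10, 7, 9, 10, 11, 10]
X = [10, 7, 10, 5, 8, 8, 6, 7, 9, 10]

def rss(a, b):
    """Residual sum of squares for the candidate line Y_hat = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

best = None
for i in range(161):            # alpha_hat candidates: 0.00, 0.05, ..., 8.00
    for j in range(41):         # beta_hat candidates:  0.00, 0.05, ..., 2.00
        a, b = i * 0.05, j * 0.05
        value = rss(a, b)
        if best is None or value < best[0]:
            best = (value, a, b)

print(best)   # ≈ (14.65, 3.6, 0.75): the OLS solution derived by calculus below
```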
2.3 The Method of Least Squares
But, we will use differential calculus (which turns
out to be a lot easier).
Why the sum of the squared residuals? Why not just
minimize the sum of the residuals?
To prevent negative residuals from cancelling
positive ones.
If we used Σe_i (the plain sum of the residuals), all the residuals e_i would receive equal importance no matter how closely or widely scattered the individual observations are around the SRF.
If so, the algebraic sum of the e_i's may be small (even zero) even though the e_i's are widely scattered about the SRF.
Besides, the OLS estimators possess desirable properties of estimators under some assumptions.
2.3 The Method of Least Squares
OLS: minimize over α̂ and β̂:  Σ_{i=1}^{n} e_i² = Σ(Y_i − Ŷ_i)² = Σ(Y_i − α̂ − β̂X_i)²
F.O.C. (1):  ∂(Σe_i²)/∂α̂ = ∂[Σ(Y_i − α̂ − β̂X_i)²]/∂α̂ = 0
2·[Σ(Y_i − α̂ − β̂X_i)]·[−1] = 0  ⇒  Σ(Y_i − α̂ − β̂X_i) = 0
ΣY_i − nα̂ − β̂ΣX_i = 0  ⇒  Ȳ − α̂ − β̂X̄ = 0  ⇒  α̂ = Ȳ − β̂X̄
2.3 The Method of Least Squares
F.O.C. (2):  ∂(Σe_i²)/∂β̂ = ∂[Σ(Y_i − α̂ − β̂X_i)²]/∂β̂ = 0
2·[Σ(Y_i − α̂ − β̂X_i)]·[−X_i] = 0  ⇒  Σ[(Y_i − α̂ − β̂X_i)(X_i)] = 0
ΣY_iX_i − α̂ΣX_i − β̂ΣX_i² = 0  ⇒  ΣY_iX_i = α̂ΣX_i + β̂ΣX_i²
2.3 The Method of Least Squares
Solve α̂ = Ȳ − β̂X̄ and ΣY_iX_i = α̂ΣX_i + β̂ΣX_i² (called the normal equations) simultaneously:
ΣY_iX_i = α̂ΣX_i + β̂ΣX_i²  ⇒  ΣY_iX_i = (Ȳ − β̂X̄)(ΣX_i) + β̂ΣX_i²
ΣY_iX_i = ȲΣX_i − β̂X̄ΣX_i + β̂ΣX_i²
ΣY_iX_i − ȲΣX_i = β̂ΣX_i² − β̂X̄ΣX_i
ΣY_iX_i − ȲΣX_i = β̂(ΣX_i² − X̄ΣX_i)
ΣY_iX_i − nX̄Ȳ = β̂(ΣX_i² − nX̄²)
because X̄ = ΣX_i/n, so that ΣX_i = nX̄.
2.3 The Method of Least Squares
Thus,
1. β̂ = (ΣY_iX_i − nX̄Ȳ) / (ΣX_i² − nX̄²)
Alternative expressions for β̂:
2. β̂ = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² = Σx_iy_i / Σx_i²,   where x_i = X_i − X̄ and y_i = Y_i − Ȳ.
3. β̂ = cov(X, Y) / var(X)
4. β̂ = [nΣY_iX_i − (ΣX_i)(ΣY_i)] / [nΣX_i² − (ΣX_i)²]
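For completeness, a short check (added) that expressions 1 and 2 coincide, using Σx_i = Σ(X_i − X̄) = 0:
Σ(X_i − X̄)(Y_i − Ȳ) = ΣX_iY_i − ȲΣX_i − X̄ΣY_i + nX̄Ȳ = ΣX_iY_i − nX̄Ȳ,
Σ(X_i − X̄)² = ΣX_i² − 2X̄ΣX_i + nX̄² = ΣX_i² − nX̄²,
so β̂ = Σx_iy_i / Σx_i² = (ΣX_iY_i − nX̄Ȳ) / (ΣX_i² − nX̄²).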
2.3 The Method of Least Squares
Previously, we came across two normal equations:
1. Σ(Y_i − α̂ − β̂X_i) = 0, which is equivalent to: Σe_i = 0
2. Σ[(Y_i − α̂ − β̂X_i)(X_i)] = 0, equivalently: Σe_iX_i = 0
It also follows that Ȳ = α̂ + β̂X̄ and that the mean of the fitted values Ŷ_i equals Ȳ.
These facts imply that the sample regression line passes through the sample mean values of X and Y.
[Figure: the SRF Ŷ = α̂ + β̂X drawn in the (X, Y) plane, passing through the mean point (X̄, Ȳ).]
2.3 The Method of Least Squares
Numerical example: explaining sales = f(advertising). Sales are in thousands of Birr and advertising expenses are in hundreds of Birr.

Firm (i)   Sales (Yi)   Advertising Expense (Xi)
1          11           10
2          10            7
3          12           10
4           6            5
5          10            8
6           7            8
7           9            6
8          10            7
9          11            9
10         10           10
2.3 The Method of Least Squares
i     Yi    Xi    yi = Yi − Ȳ    xi = Xi − X̄    xi·yi
1     11    10       1.4             2            2.8
2     10     7       0.4            −1           −0.4
3     12    10       2.4             2            4.8
4      6     5      −3.6            −3           10.8
5     10     8       0.4             0            0
6      7     8      −2.6             0            0
7      9     6      −0.6            −2            1.2
8     10     7       0.4            −1           −0.4
9     11     9       1.4             1            1.4
10    10    10       0.4             2            0.8
Σ     96    80       0               0           21

Ȳ = ΣYi/n = 96/10 = 9.6;   X̄ = ΣXi/n = 80/10 = 8
2.3 The Method of Least Squares
i      yi     xi     yi²     xi²
1      1.4     2     1.96     4
2      0.4    −1     0.16     1
3      2.4     2     5.76     4
4     −3.6    −3    12.96     9
5      0.4     0     0.16     0
6     −2.6     0     6.76     0
7     −0.6    −2     0.36     4
8      0.4    −1     0.16     1
9      1.4     1     1.96     1
10     0.4     2     0.16     4
Σ      0       0    30.4     28

β̂ = Σxiyi / Σxi² = 21/28 = 0.75
α̂ = Ȳ − β̂X̄ = 9.6 − 0.75(8) = 3.6
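A minimal Python sketch (added for illustration) that reproduces these hand calculations from the raw data:

```python
# OLS estimates for the sales-advertising example, computed from deviations.
Y = [11, 10, 12, 6, 10, 7, 9, 10, 11, 10]   # sales (thousands of Birr)
X = [10, 7, 10, 5, 8, 8, 6, 7, 9, 10]       # advertising expense (hundreds of Birr)

n = len(Y)
Y_bar, X_bar = sum(Y) / n, sum(X) / n        # 9.6 and 8.0
x = [xi - X_bar for xi in X]                 # deviations of X from its mean
y = [yi - Y_bar for yi in Y]                 # deviations of Y from its mean

beta_hat = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)   # 21/28 = 0.75
alpha_hat = Y_bar - beta_hat * X_bar                                  # 9.6 - 0.75*8 = 3.6
print(alpha_hat, beta_hat)
```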
2.3 The Method of Least Squares
Ŷi = 3.6 + 0.75Xi,   ei = Yi − Ŷi

i      Ŷi       ei       ei²
1     11.10   −0.10     0.01
2      8.85    1.15     1.3225
3     11.10    0.90     0.81
4      7.35   −1.35     1.8225
5      9.60    0.40     0.16
6      9.60   −2.60     6.76
7      8.10    0.90     0.81
8      8.85    1.15     1.3225
9     10.35    0.65     0.4225
10    11.10   −1.10     1.21
Σ     96       0       14.65

Note: Σŷi² = 15.75, Σyi² = 30.4, Σei = 0, Σei² = 14.65.
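Continuing the sketch above (again an illustration, not part of the slides), the fitted values and residuals can be verified as follows:

```python
# Fitted values, residuals, and RSS for Y_hat = 3.6 + 0.75*X.
Y = [11, 10, 12, 6, 10, 7, 9, 10, 11, 10]
X = [10, 7, 10, 5, 8, 8, 6, 7, 9, 10]
alpha_hat, beta_hat = 3.6, 0.75

Y_hat = [alpha_hat + beta_hat * xi for xi in X]
e = [yi - yh for yi, yh in zip(Y, Y_hat)]

print(sum(e))                                   # ≈ 0: first normal equation, sum(e_i) = 0
print(sum(ei * xi for ei, xi in zip(e, X)))     # ≈ 0: second normal equation, sum(e_i X_i) = 0
print(sum(ei ** 2 for ei in e))                 # 14.65: the residual sum of squares
```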
2.3 The Method of Least Squares
Assumptions Underlying the Method of Least Squares
To obtain α̂ and β̂ in the model Y_i = α̂ + β̂X_i + e_i, the only assumption we need is that X must take at least two distinct values (and the number of observations must be at least the number of parameters).
But the objective in regression analysis is not only to obtain α̂ and β̂ but also to draw inferences about the true parameters α and β.
For example, we would like to know how close α̂ and β̂ are to α and β, or how close Ŷ_i is to E[Y | X_i].
To that end, we must also make certain assumptions about the manner in which the Y_i's are generated.
2.3 The Method of Least Squares
The PRF Y_i = α + βX_i + ε_i shows that Y_i depends on both X_i and ε_i.
Therefore, unless we are specific about how X_i and ε_i are created/generated, there is no way we can make any statistical inference about Y_i, or about α and β.
Assumptions made about the X variable and the error term are extremely critical to the valid interpretation of the regression estimates.
2.3 The Method of Least Squares
THE ASSUMPTIONS:
1. Zero mean value of the error term: E(ε_i | X_i) = 0. Or equivalently, E[Y | X_i] = α + βX_i.
2. Homoskedasticity, or equal variance of ε_i: the variance of ε_i is the same (a finite positive constant σ²) for all observations, i.e.,
var(ε_i | X_i) = E{[ε_i − E(ε_i | X_i)]² | X_i} = E(ε_i² | X_i) = σ².
By implication, var(Y | X_i) = σ²:
var(Y | X_i) = E{[α + βX_i + ε_i − (α + βX_i)]² | X_i} = E(ε_i² | X_i) = σ² for all i.
2.3 The Method of Least Squares
3. No autocorrelation between the disturbance terms.
Each error term ɛi is uncorrelated with every other
error term ɛs (for s ≠ i).
cov(ε_i, ε_s | X_i, X_s) = E{[ε_i − E(ε_i | X_i)][ε_s − E(ε_s | X_s)] | X_i, X_s} = E(ε_iε_s | X_i, X_s) = 0.
Equivalently, cov(Y_i, Y_s | X_i, X_s) = 0 for all s ≠ i.
4. The disturbance term ɛ and the explanatory
variable X are uncorrelated: cov(ɛi,Xi) = 0.
cov(ε_i, X_i) = E{[ε_i − E(ε_i)][X_i − E(X_i)]} = E[ε_i(X_i − E(X_i))] = E(ε_iX_i) − E(X_i)E(ε_i) = E(ε_iX_i) = 0
2.3 The Method of Least Squares
5. The error terms are normally and independently distributed, i.e., ε_i ~ NID(0, σ²).
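To make the assumed data-generating process concrete, here is a small simulation sketch (illustrative only; the "true" parameter values chosen below are assumptions that merely echo the numerical example):

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta, sigma = 3.6, 0.75, 1.35          # hypothetical "true" values (assumptions)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)   # X treated as fixed in repeated sampling

# Assumptions 1, 2, 3 and 5: errors are i.i.d. N(0, sigma^2), so they have zero mean,
# constant variance, and no autocorrelation; with X fixed they are also uncorrelated with X.
eps = rng.normal(loc=0.0, scale=sigma, size=X.size)
Y = alpha + beta * X + eps                    # one realisation of the PRF Y_i = alpha + beta*X_i + eps_i
print(Y)
```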
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
☞Given the assumptions of the classical linear
regression model, the least-squares estimators
possess some ideal or optimum properties.
These statistical properties are extremely important
because they provide criteria for choosing among
alternative estimators.
These properties are contained in the well-known
Gauss–Markov Theorem.
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Gauss-Markov Theorem:
Under the above assumptions of the linear regression model, the estimators α̂ and β̂ have the smallest variance among all linear and unbiased estimators of α and β, i.e., the OLS estimators are the Best Linear Unbiased Estimators (BLUE) of α and β.
Linearity: β̂ is a linear function of the Y_i's.
β̂ = Σx_iy_i / Σx_i² = Σx_i(Y_i − Ȳ) / Σx_i² = (Σx_iY_i − ȲΣx_i) / Σx_i²
  = Σx_iY_i / Σx_i²   (since Σx_i = 0)
  = Σ(x_i / Σx_i²)Y_i = Σk_iY_i,   where k_i = x_i / Σx_i²
  = k_1Y_1 + k_2Y_2 + ... + k_nY_n
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Note that:
(1) k_i is non-stochastic, since x_i (and hence x_i / Σx_i²) is a constant for the given X values;
(2) Σk_i = Σx_i / Σx_i² = 0 (since Σx_i = 0);
(3) Σk_ix_i = Σx_i² / Σx_i² = 1;
(4) Σk_i² = Σ(x_i / Σx_i²)² = Σx_i² / (Σx_i²)² = 1 / Σx_i²;
(5) Σk_iX_i = Σ(x_i / Σx_i²)X_i = Σ(x_i / Σx_i²)(x_i + X̄) = Σx_i²/Σx_i² + X̄Σx_i/Σx_i² = 1.
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Unbiasedness: β̂ = Σk_iY_i
β̂ = Σk_i(α + βX_i + ε_i) = αΣk_i + βΣk_iX_i + Σk_iε_i
β̂ = β + Σk_iε_i   [because Σk_i = 0 and Σk_iX_i = 1]
E(β̂) = β + E(k_1ε_1 + k_2ε_2 + ... + k_nε_n) = β + Σk_iE(ε_i) = β + (Σk_i)(0)
E(β̂) = β
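An illustrative Monte Carlo check of this result (an added sketch; the sample design and parameter values are assumptions): across many repeated samples with X held fixed, the average of β̂ should be close to β, and its sampling variance close to σ²/Σx_i².

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 3.6, 0.75, 1.35          # assumed "true" values for the simulation
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)   # X fixed in repeated samples
x = X - X.mean()

beta_hats = []
for _ in range(20000):
    eps = rng.normal(0.0, sigma, size=X.size)  # zero mean, constant variance, no autocorrelation
    Y = alpha + beta * X + eps
    beta_hats.append(np.sum(x * (Y - Y.mean())) / np.sum(x ** 2))   # beta_hat = sum(x*y)/sum(x^2)

print(np.mean(beta_hats))   # close to beta = 0.75  (unbiasedness)
print(np.var(beta_hats))    # close to sigma^2 / sum(x^2) = 1.35^2 / 28 ≈ 0.065
```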
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Efficiency:
Suppose β̃ is another unbiased linear estimator of β, say β̃ = Σw_iY_i.
Then var(β̂) ≤ var(β̃).
Proof: var(β̂) = var(Σk_iY_i) = k_1²var(Y_1) + k_2²var(Y_2) + ... + k_n²var(Y_n)
var(β̂) = k_1²σ² + k_2²σ² + ... + k_n²σ²
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
var(β̂) = σ²Σk_i²,   or   var(β̂) = σ² / Σx_i²
var(β̃) = w_1²σ² + w_2²σ² + ... + w_n²σ²
var(β̃) = σ²Σw_i²
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Let us now compare var(β̂) and var(β̃)!
Suppose w_i ≠ k_i, and let the relationship between them be given by: d_i = w_i − k_i.
Because both Σw_i and Σk_i equal zero: Σd_i = Σw_i − Σk_i = 0.
Because both Σw_ix_i and Σk_ix_i equal one: Σd_ix_i = Σw_ix_i − Σk_ix_i = 1 − 1 = 0.
Σw_i² = Σ(k_i + d_i)²  ⇒  Σw_i² = Σk_i² + Σd_i² + 2Σk_id_i
Σw_i² = Σk_i² + Σd_i² + 2Σd_i(x_i / Σx_i²)
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Σw_i² = Σk_i² + Σd_i² + 2(1/Σx_i²)(Σd_ix_i)
Σw_i² = Σk_i² + Σd_i² + 2(1/Σx_i²)(0)
Σw_i² = Σk_i² + Σd_i²  ⇒  Σw_i² ≥ Σk_i²  ⇒  σ²Σw_i² ≥ σ²Σk_i²  ⇒  var(β̃) ≥ var(β̂).

Unbiasedness of α̂: α̂ = Ȳ − β̂X̄, with Ȳ = α + βX̄ + ε̄ and β̂ = αΣk_i + βΣk_iX_i + Σk_iε_i.
α̂ = (α + βX̄ + ε̄) − X̄{αΣk_i + βΣk_iX_i + Σk_iε_i}
α̂ = (α + βX̄ + ε̄) − X̄{β + Σk_iε_i}
α̂ = α + ε̄ − X̄Σk_iε_i
E(α̂) = α + E(ε̄) − X̄Σk_iE(ε_i)
E(α̂) = α
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Efficiency:
Suppose α̃ is another unbiased linear estimator of α.
Then var(α̂) ≤ var(α̃).
Proof: α̂ = Ȳ − β̂X̄ = Σ(1/n − X̄k_i)Y_i = Σf_iY_i, where f_i = 1/n − X̄k_i.
var(α̂) = var(Σf_iY_i) = var(f_1Y_1 + f_2Y_2 + ... + f_nY_n)
var(α̂) = var(f_1Y_1) + var(f_2Y_2) + ... + var(f_nY_n)   {since cov(Y_i, Y_s) = 0 for i ≠ s}
var(α̂) = f_1²var(Y_1) + f_2²var(Y_2) + ... + f_n²var(Y_n)
var(α̂) = f_1²σ² + f_2²σ² + ... + f_n²σ² = σ²Σf_i²
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
var(α̂) = σ²Σf_i² = σ²Σ(1/n − X̄k_i)²
var(α̂) = σ²{Σ(1/n² + X̄²k_i² − (2X̄/n)k_i)}
var(α̂) = σ²{n(1/n²) + X̄²Σk_i² − (2X̄/n)Σk_i}
var(α̂) = σ²(1/n + X̄²/Σx_i²),   or   var(α̂) = σ²·ΣX_i² / (nΣx_i²)
Note that: f_i = 1/n − X̄k_i,  Σf_i = 1 − X̄Σk_i = 1,  Σf_i² = 1/n + X̄²/Σx_i².
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Suppose α̃ = Σz_iY_i, where the z_i's are coefficients.
α̃ = Σz_i(α + βX_i + ε_i) = αΣz_i + βΣz_iX_i + Σz_iε_i
E(α̃) = α(Σz_i) + β(Σz_iX_i) + Σz_iE(ε_i)
For α̃ to be unbiased for α we need Σz_i = 1 and Σz_iX_i = 0.
var(α̃) = z_1²σ² + z_2²σ² + ... + z_n²σ²
var(α̃) = σ²Σz_i²
Let us now compare var(α̂) and var(α̃)!
Suppose z_i ≠ f_i, and let the relationship between them be given by: d_i = z_i − f_i.
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Because Σz_iX_i = 0 and Σz_i = 1:
Σz_ix_i = Σz_i(X_i − X̄) = Σz_iX_i − X̄Σz_i = 0 − X̄(1) = −X̄
Σd_i² = Σz_i² + Σf_i² − 2Σz_if_i,   where f_i = 1/n − X̄(x_i/Σx_i²)
Σd_i² = Σz_i² + Σf_i² − 2Σ[z_i(1/n − X̄x_i/Σx_i²)]
Σd_i² = Σz_i² + Σf_i² − 2{(1/n)Σz_i − (X̄/Σx_i²)(Σz_ix_i)}
Σd_i² = Σz_i² + Σf_i² − 2{1/n − (X̄/Σx_i²)(−X̄)}
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Σd_i² = Σz_i² + Σf_i² − 2{1/n + X̄²/Σx_i²}
Σd_i² = Σz_i² + Σf_i² − 2Σf_i²
Σd_i² = Σz_i² − Σf_i²
Σz_i² = Σd_i² + Σf_i²  ⇒  Σz_i² ≥ Σf_i²  ⇒  σ²Σz_i² ≥ σ²Σf_i²
∴ var(α̃) ≥ var(α̂).
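As an added illustration of the BLUE property (not from the slides), one can compare OLS with another linear unbiased estimator of β, for example the slope through just two of the observations, which is linear in the Y_i's and unbiased but has a larger sampling variance:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, sigma = 3.6, 0.75, 1.35          # assumed "true" values for the simulation
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
x = X - X.mean()

ols, two_point = [], []
for _ in range(20000):
    Y = alpha + beta * X + rng.normal(0.0, sigma, size=X.size)
    ols.append(np.sum(x * (Y - Y.mean())) / np.sum(x ** 2))
    # Another linear unbiased estimator of beta: the slope through observations 1 and 4 only
    # (weights z_1 = -1/(X_4 - X_1), z_4 = 1/(X_4 - X_1), all other z_i = 0).
    two_point.append((Y[3] - Y[0]) / (X[3] - X[0]))

print(np.mean(ols), np.var(ols))              # ≈ 0.75, variance ≈ sigma^2/28 ≈ 0.065
print(np.mean(two_point), np.var(two_point))  # ≈ 0.75, but variance ≈ 2*sigma^2/25 ≈ 0.146
```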
2.5 Residuals and Goodness of Fit
Decomposing the variation in Y:
2.5 Residuals and Goodness of Fit
One measure of the variation in Y is the sum of its
squared deviations around its sample mean, often
described as the Total Sum of Squares, TSS.
TSS, the total sum of squares of Y can be
decomposed into two:
ESS, the ‘explained’ sum of squares, and
RSS, residual (‘unexplained’) sum of squares.
Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σe_i²
2.5 Residuals and Goodness of Fit
Y_i = Ŷ_i + e_i  ⇒  Y_i − Ȳ = (Ŷ_i − Ȳ) + e_i
Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ + e_i)²
Σy_i² = Σ(ŷ_i + e_i)²
Σy_i² = Σŷ_i² + Σe_i² + 2Σŷ_ie_i = Σŷ_i² + Σe_i²   (since Σŷ_ie_i = 0 by the normal equations)
1. R² = ESS/TSS = Σŷ_i² / Σy_i²
2. R² = ESS/TSS = Σ(β̂x_i)² / Σy_i² = β̂²Σx_i² / Σy_i²
2.5 Residuals and Goodness of Fit
Coefficient of Determination (R²):
3. R² = ESS/TSS = β̂Σx_iy_i / Σy_i²   (since ESS = Σŷ_i² = β̂²Σx_i² = β̂Σx_iy_i)
4. For the numerical example: R² = ESS/TSS = β̂Σx_iy_i / Σy_i² = 15.75/30.4 = 0.5181
5. R² = (Σx_iy_i)(Σx_iy_i) / (Σx_i²·Σy_i²) = (Σx_iy_i)² / (Σx_i²·Σy_i²)
6. R² = [cov(X, Y)]² / [var(X)·var(Y)]
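A short Python check (added) that these expressions give the same value on the numerical example:

```python
# R^2 for the numerical example, computed in several of the equivalent ways above.
Y = [11, 10, 12, 6, 10, 7, 9, 10, 11, 10]
X = [10, 7, 10, 5, 8, 8, 6, 7, 9, 10]
n = len(Y)
Y_bar, X_bar = sum(Y) / n, sum(X) / n
x = [xi - X_bar for xi in X]
y = [yi - Y_bar for yi in Y]

Sxy = sum(a * b for a, b in zip(x, y))   # 21
Sxx = sum(a * a for a in x)              # 28
Syy = sum(b * b for b in y)              # 30.4
beta_hat = Sxy / Sxx                     # 0.75

print(beta_hat * Sxy / Syy)              # formula 4: 15.75/30.4 ≈ 0.5181
print(beta_hat ** 2 * Sxx / Syy)         # formula 2: same value
print(Sxy ** 2 / (Sxx * Syy))            # formula 5: same value
```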
2.5 Residuals and Goodness of Fit
A natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. The least squares principle also maximizes this.
In fact, R² = (r_{Ŷ,Y})² = (r_{X,Y})².
To sum up …
Σy_i² = Σŷ_i² + Σe_i²,   i.e.,   TSS = ESS + RSS
R² = ESS/TSS = Σŷ_i² / Σy_i²
Σŷ_i² = β̂Σx_iy_i = β̂²Σx_i²
RSS = (1 − R²)Σy_i²
var(β̂) = σ² / Σx_i² = σ²/28 = 0.0357σ²
var(α̂) = σ²(1/n + X̄²/Σx_i²) = σ²(1/10 + 64/28) = 2.3857σ²
But σ² = ?
An unbiased estimator for σ²
E(RSS) = E(Σe_i²) = (n − 2)σ²
Thus, if we define σ̂² = Σe_i² / (n − 2), then:
E(σ̂²) = [1/(n − 2)]·E(Σe_i²) = [1/(n − 2)]·(n − 2)σ² = σ²
⇒ σ̂² = Σe_i² / (n − 2) is an unbiased estimator of σ².
2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
Why is the Error Normality Assumption Important?
The normality assumption permits us to derive the functional form of the sampling distributions of α̂, β̂ and σ̂².
Knowing the form of the sampling distributions enables us to derive feasible test statistics for the OLS coefficient estimators.
These feasible test statistics enable us to conduct statistical inference, i.e.,
1) to construct confidence intervals for α, β and σ², and
2) to test hypotheses about these parameters.
ε_i ~ N(0, σ²)  ⇒  Y_i ~ N(α + βX_i, σ²)
β̂ ~ N(β, σ²/Σx_i²)   and   α̂ ~ N(α, σ²ΣX_i²/(nΣx_i²))
(β̂ − β)·√(Σx_i²) / σ ~ N(0, 1)
(α̂ − α)/sê(α̂) ~ t_{n−2}   and   (β̂ − β)/sê(β̂) ~ t_{n−2},   where σ̂² = Σe_i²/(n − 2)
sê(β̂) = σ̂ / √(Σx_i²),   sê(α̂) = σ̂·√(ΣX_i² / (nΣx_i²))
2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
Let us continue with our earlier example.
We have: n = 10, α̂ = 3.6, β̂ = 0.75, R² = 0.5181,
σ̂² = Σe_i²/(n − 2) = 14.65/8 = 1.83125,   σ̂ ≈ 1.3532.
Thus, vâr(α̂) = 2.3857·(1.83125) ≈ 4.3688,   sê(α̂) = √4.3688 ≈ 2.09,
vâr(β̂) = 0.0357·(1.83125) ≈ 0.0654,   sê(β̂) = √0.0654 ≈ 0.256.
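A small Python sketch (added) reproducing these standard errors and the 95% confidence interval for β quoted below, using t_{0.025,8} ≈ 2.306:

```python
import math

n = 10
alpha_hat, beta_hat = 3.6, 0.75
Sxx, Syy, Sxy, X_bar = 28.0, 30.4, 21.0, 8.0

RSS = Syy - beta_hat * Sxy                       # 30.4 - 15.75 = 14.65
sigma2_hat = RSS / (n - 2)                       # 1.83125
se_beta = math.sqrt(sigma2_hat / Sxx)            # ≈ 0.256
se_alpha = math.sqrt(sigma2_hat * (1 / n + X_bar ** 2 / Sxx))   # ≈ 2.09

t_crit = 2.306                                   # t_{0.025, 8}
print(beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)  # ≈ (0.16, 1.34), the 95% CI for beta
```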
2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
There is a correspondence between the confidence
intervals derived earlier and tests of hypotheses.
For instance, the 95% CI we derived earlier for β is: 0.16 < β < 1.34.
Any hypothesis that says β = c, where c is in this interval, will not be rejected at the 5% level in a two-sided test.
For instance, the hypothesis β = 1 was not rejected, but the hypothesis β = 0 was.
For one-sided tests we consider one-sided
confidence intervals.
2.7 Prediction with the Simple Linear Regression
The estimated regression equation Ŷ_i = α̂ + β̂X_i is used for predicting the value (or the average value) of Y for given values of X.
Let X_0 be the given value of X. Then we predict the corresponding value Y_P of Y by: Ŷ_P = α̂ + β̂X_0.
The true value Y_P is given by: Y_P = α + βX_0 + ε_P.
Hence the prediction error is: Ŷ_P − Y_P = (α̂ − α) + (β̂ − β)X_0 − ε_P
E(Ŷ_P − Y_P) = E(α̂ − α) + E(β̂ − β)X_0 − E(ε_P)  ⇒  E(Ŷ_P − Y_P) = 0
Ŷ_P = α̂ + β̂X_0 is an unbiased predictor of Y_P (indeed, the Best Linear Unbiased Predictor, BLUP!).
2.7 Prediction with the Simple Linear Regression
The variance of the prediction error is:
var(Ŷ_P − Y_P) = var(α̂) + X_0²·var(β̂) + 2X_0·cov(α̂, β̂) + var(ε_P)
var(Ŷ_P − Y_P) = σ²ΣX_i²/(nΣx_i²) + σ²X_0²/Σx_i² − 2σ²X_0X̄/Σx_i² + σ²
var(Ŷ_P − Y_P) = σ²[1 + 1/n + (X_0 − X̄)²/Σx_i²]
(For predicting the mean value E[Y | X_0] rather than an individual Y, the var(ε_P) term drops out and the variance is σ²[1/n + (X_0 − X̄)²/Σx_i²].)
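An added Python illustration of these formulas using the example's estimates; the value X_0 = 9 is a hypothetical choice, not from the slides:

```python
import math

alpha_hat, beta_hat = 3.6, 0.75
sigma2_hat, n, Sxx, X_bar = 1.83125, 10, 28.0, 8.0

X0 = 9.0                                   # hypothetical new value of X (not from the slides)
Y_pred = alpha_hat + beta_hat * X0         # point prediction: 10.35

# Variance when predicting an individual value of Y at X0 ...
var_individual = sigma2_hat * (1 + 1 / n + (X0 - X_bar) ** 2 / Sxx)
# ... and when predicting the conditional mean E[Y | X0].
var_mean = sigma2_hat * (1 / n + (X0 - X_bar) ** 2 / Sxx)

t_crit = 2.306                             # t_{0.025, 8}
half = t_crit * math.sqrt(var_individual)
print(Y_pred, (Y_pred - half, Y_pred + half))   # 95% prediction interval for an individual Y at X0
```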
Interpretation of β under different functional forms:
1. Y = α + βX + ε:   dY = β·dX  ⇒  β = dY/dX = slope.
β is the (average) change in Y resulting from a unit change in X.
2. Y = e^(α + βX + ε), i.e., ln Y = α + βX + ε:
d(ln Y) = β·dX  ⇒  (1/Y)dY = β·dX  ⇒  β = (dY/Y)/dX = relative Δ in Y / absolute Δ in X.
β is the (average) relative change in Y resulting from a unit change in X.
In the log-log model ln Y = α + β·ln X + ε:
β = d(ln Y)/d(ln X) = (dY/Y)/(dX/X) = %age Δ in Y / %age Δ in X.
β is the (average) percentage change in Y resulting from a percentage change in X, i.e., an elasticity.
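A quick numerical illustration (added; the log-model coefficients below are hypothetical): in the sales example Y = α + βX + ε with β̂ = 0.75, an extra unit of X (one hundred Birr of advertising) is associated with 0.75 units (750 Birr) more sales on average. If instead ln Y = α + βX + ε with β = 0.05, a one-unit increase in X raises Y by roughly 100·(e^0.05 − 1) ≈ 5.1%. And if ln Y = α + β·ln X + ε with β = 0.05, a 1% increase in X raises Y by about 0.05%.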