C1 English
Simple Regression
(Course: Econometrics)
Lê Phương
3 Other Issues
Using Simple Linear Regression
Extending the Simple Linear Regression Model
Introduction
yi = β1 + β2 xi + εi
or
E(y|xi ) = β1 + β2 xi ,
where
• y: Dependent variable
• yi : The i-th observed value of the dependent variable
• x: Independent variable
• xi : The i-th observed value of the independent variable
• εi : Random error associated with the i-th observation
Introduction
yi = β̂1 + β̂2 xi + ei
or
ŷi = β̂1 + β̂2 xi ,
where
• β̂1 : The intercept of the sample regression function, which is the
point estimate of β1 ,
• β̂2 : The slope of the sample regression function, which is the
point estimate of β2 ,
• ei : Residual, which is the point estimate of the random error εi ,
• ŷi : The estimated value of yi .
Ordinary Least Squares (OLS) Method
To find β̂1 and β̂2 such that the sum of squared errors is minimized:
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - \hat\beta_1 - \hat\beta_2 x_i \right)^2 \to \min.$$
Questions:
1 Does the sample regression function always pass through the
sample mean point (x̄, ȳ)? Why?
2 If x is multiplied by 10 and y remains unchanged, how will β̂1
and β̂2 change?
3 If x is multiplied by 10 and y is multiplied by 100, how will β̂1
and β̂2 change?
Ordinary Least Squares (OLS) Method
Example 1. Observations on TV advertisements (x) and cars sold (y)
for 5 garages in one week give the following data:
x 1 3 2 1 3
y 14 24 18 17 27
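As a minimal sketch, the data of Example 1 can be fitted with the closed-form OLS solution β̂2 = Sxy/Sxx and β̂1 = ȳ − β̂2 x̄. These formulas are the standard minimizers of the sum of squares and are assumed here, since they are not written out above; the variable names are illustrative.

```python
# Data from Example 1: TV advertisements (x) and cars sold (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products about the sample means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Closed-form OLS estimates (standard result, assumed here).
b2 = s_xy / s_xx          # slope
b1 = y_bar - b2 * x_bar   # intercept

print(f"y_hat = {b1:.1f} + {b2:.1f} x")  # y_hat = 10.0 + 5.0 x
```

Under this fit, each additional TV advertisement is associated with about 5 more cars sold in the week.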
Sum of Squares
• Total Sum of Squares (TSS)
$$\mathrm{TSS} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2.$$
Coefficient of Determination
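$$R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{ESS}}{\mathrm{TSS}}$$
indicates the goodness of fit of the model to the sample data. (In this notation RSS is the regression, or explained, sum of squares and ESS is the error, or residual, sum of squares.)
• 0 ≤ R² ≤ 1,
• R² = 1: The model fits the sample data perfectly,
• R² = 0: The model does not fit the sample data at all.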
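The sums of squares and R² can be computed for the Example 1 data as a rough check. This sketch assumes the convention that RSS denotes the regression (explained) sum of squares and ESS the error (residual) sum of squares, and it uses the standard closed-form OLS estimates.

```python
# Data from Example 1.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

y_hat = [b1 + b2 * xi for xi in x]

# Assumed convention: RSS = explained part, ESS = residual part.
tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum((yh - y_bar) ** 2 for yh in y_hat)
ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r2 = rss / tss
print(f"TSS = {tss:.0f}, RSS = {rss:.0f}, ESS = {ess:.0f}, R2 = {r2:.3f}")
```

For this sample, TSS = RSS + ESS holds exactly, so both expressions for R² give the same value.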
Assumptions of the OLS Method
• Assumption 1: The relationship between y and x is linear. The
values xi are given and not random.
• Assumption 2: The errors εi are random variables with mean 0:
E(εi |xi ) = 0.
• Assumption 3: The errors εi are random variables with constant
variance:
Var(εi |xi ) = σ².
• Assumption 4: The errors εi are uncorrelated with one another:
Cov(εi , εj ) = 0 for i ≠ j.
• Normality assumption: The errors εi are normally distributed:
εi ∼ N(0, σ²).
Under these assumptions, β̂1 and β̂2 are normally distributed with variances
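$$\sigma^2_{\hat\beta_1} = \frac{\sum x_i^2}{n S_{xx}}\,\sigma^2 \approx \frac{\sum x_i^2}{n S_{xx}}\,s^2, \qquad \sigma^2_{\hat\beta_2} = \frac{\sigma^2}{S_{xx}} \approx \frac{s^2}{S_{xx}}.$$
(Recall: $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$.)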
Define
• $\mathrm{se}(\hat\beta_1) = \sqrt{\dfrac{\sum x_i^2}{n S_{xx}}\, s^2}$ : standard error of β̂1 ,
• $\mathrm{se}(\hat\beta_2) = \sqrt{\dfrac{s^2}{S_{xx}}}$ : standard error of β̂2 ,
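then
$$\frac{\hat\beta_j - \beta_j}{\mathrm{se}(\hat\beta_j)} \sim t(n-2), \qquad j = 1, 2.$$
Thus,
• The 1 − α confidence interval for β1 is
$$\left( \hat\beta_1 - t_{\alpha/2}(n-2)\,\mathrm{se}(\hat\beta_1),\ \ \hat\beta_1 + t_{\alpha/2}(n-2)\,\mathrm{se}(\hat\beta_1) \right).$$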
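The standard errors and a 95% confidence interval can be sketched for the Example 1 data. Two assumptions are made here and flagged in the comments: s² is taken as the sum of squared residuals divided by n − 2, and the critical value t_{0.025}(3) ≈ 3.182 is read from a t table rather than computed.

```python
import math

# Data from Example 1.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Assumption: s^2 = (sum of squared residuals) / (n - 2).
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)

# Standard errors as defined in the text.
se_b1 = math.sqrt(sum(xi ** 2 for xi in x) / (n * s_xx) * s2)
se_b2 = math.sqrt(s2 / s_xx)

# Assumption: table value t_{0.025}(n - 2) for n - 2 = 3 degrees of freedom.
t_crit = 3.182

ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
ci_b2 = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)

print(f"se(b1) = {se_b1:.4f}, 95% CI for beta1: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
print(f"se(b2) = {se_b2:.4f}, 95% CI for beta2: ({ci_b2[0]:.3f}, {ci_b2[1]:.3f})")
```

The interval for β2 is built in the same way as the one for β1, with se(β̂2) in place of se(β̂1).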
ŷ =         β̂1          + β̂2 x       R²
se          se(β̂1)      se(β̂2)       df
t           t(β̂1)       t(β̂2)        F
p-value     p(β̂1)       p(β̂2)        p(F)
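When the units of measurement change so that y* = k1 y and x* = k2 x, the estimates become
$$\hat\beta_1^{*} = k_1 \hat\beta_1, \qquad \hat\beta_2^{*} = \frac{k_1}{k_2}\,\hat\beta_2.$$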
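The scaling relations above can be checked numerically. This sketch assumes the starred estimates come from regressing y* = k1·y on x* = k2·x; the helper `ols` and the values k1 = 100, k2 = 10 are illustrative choices, not taken from the slides.

```python
def ols(x, y):
    """Return (intercept, slope) from the closed-form OLS solution."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Data from Example 1.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

k1, k2 = 100, 10  # hypothetical scale factors for y and x
b1, b2 = ols(x, y)
b1_star, b2_star = ols([k2 * xi for xi in x], [k1 * yi for yi in y])

print(b1, b2)            # estimates in the original units
print(b1_star, b2_star)  # estimates after rescaling
```

The intercept scales with k1 only, while the slope scales with the ratio k1/k2.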
Using Simple Regression Model
Prediction Problem
Suppose we have the sample regression function: ŷi = β̂1 + β̂2 xi .
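• Point estimate of y at x = x0 is
$$\hat{y}_0 = \hat\beta_1 + \hat\beta_2 x_0.$$
• 1 − α Confidence interval for the mean value of y when
x = x0 is
$$\left( \hat{y}_0 - t_{\alpha/2}(n-2)\,\mathrm{se}(\hat{y}_0),\ \ \hat{y}_0 + t_{\alpha/2}(n-2)\,\mathrm{se}(\hat{y}_0) \right),$$
where
$$\mathrm{se}(\hat{y}_0) = s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$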
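• 1 − α Confidence interval for the individual value of y when
x = x0 is
$$\left( \hat{y}_0 - t_{\alpha/2}(n-2)\,\mathrm{se}(y_0 - \hat{y}_0),\ \ \hat{y}_0 + t_{\alpha/2}(n-2)\,\mathrm{se}(y_0 - \hat{y}_0) \right),$$
where
$$\mathrm{se}(y_0 - \hat{y}_0) = s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$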
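The two prediction standard errors can be compared on the Example 1 data. The point x0 = 3 is a hypothetical choice, s² is assumed to be the sum of squared residuals divided by n − 2, and t_{0.025}(3) ≈ 3.182 is a table value.

```python
import math

# Data from Example 1.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Assumption: s^2 = (sum of squared residuals) / (n - 2).
s = math.sqrt(sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x0 = 3                 # hypothetical value of x at which to predict
y0_hat = b1 + b2 * x0  # point estimate

# Standard errors for the mean value and for an individual value.
leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
se_mean = s * math.sqrt(leverage)
se_indiv = s * math.sqrt(1 + leverage)

t_crit = 3.182  # assumed table value t_{0.025}(3)
print(f"mean:       {y0_hat:.1f} +/- {t_crit * se_mean:.3f}")
print(f"individual: {y0_hat:.1f} +/- {t_crit * se_indiv:.3f}")
```

The interval for an individual value is always wider, because it adds the variance of the new error term to the uncertainty of the estimated mean.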
Extending the Simple Linear
Regression Model
Regression through the Origin
When the intercept is zero, the model becomes a regression through
the origin, where the regression function is
• PRF: yi = β2 xi + εi ,
• SRF: yi = β̂2 xi + ei ,
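with
$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} \quad \text{and} \quad \sigma^2_{\hat\beta_2} = \frac{\sigma^2}{\sum x_i^2},$$
where σ² is estimated by $s^2 = \dfrac{\mathrm{ESS}}{n-1}$.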
Note:
• R² can be negative for this model, so use
$$R_0^2 = \frac{\left( \sum x_i y_i \right)^2}{\sum x_i^2 \, \sum y_i^2}$$
instead,
• R² cannot be compared with R0².
The regression through the origin model is rarely used in practice.
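For illustration only, the formulas for regression through the origin can be applied to the Example 1 data; nothing in the slides suggests that data should be modelled without an intercept. Here ESS is taken to be the sum of squared residuals, and the variable names are illustrative.

```python
# Data from Example 1, reused purely to exercise the formulas.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi ** 2 for xi in x)
sum_yy = sum(yi ** 2 for yi in y)

# Slope of the regression through the origin.
b2 = sum_xy / sum_xx

# ESS (sum of squared residuals) and s^2 = ESS / (n - 1).
ess = sum((yi - b2 * xi) ** 2 for xi, yi in zip(x, y))
s2 = ess / (n - 1)
var_b2 = s2 / sum_xx  # estimated variance of the slope

# Raw R^2 for the through-origin model.
r0_sq = sum_xy ** 2 / (sum_xx * sum_yy)

print(f"b2 = {b2:.4f}, s2 = {s2:.4f}, var(b2) = {var_b2:.4f}, R0^2 = {r0_sq:.4f}")
```

Note that the divisor is n − 1 rather than n − 2, since only one coefficient is estimated.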
Extending the Simple Linear
Regression Model
• Log-log or double log model:
ln yi = β1 + β2 ln xi + εi .
Interpretation of β2 : When x changes by 1%, y changes by β2 %.
• Log-linear model:
ln yi = β1 + β2 xi + εi .
Interpretation of β2 : When x changes by 1 unit, y changes by
100 · β2 %.
• Linear-log model:
yi = β1 + β2 ln xi + εi .
Interpretation of β2 : When x changes by 1%, y changes by
β2 /100 units.
• Inverse model:
$$y_i = \beta_1 + \beta_2 \frac{1}{x_i} + \varepsilon_i.$$
These models are nonlinear in variables but can be transformed into
linear forms by changing variables.
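The change of variables can be sketched for the log-log model: setting u = ln x and v = ln y turns it into an ordinary simple regression of v on u. The data below are hypothetical, generated without noise from y = 2·x^1.5, so the fitted slope should recover the elasticity 1.5.

```python
import math

# Hypothetical noise-free data generated from y = 2 * x**1.5.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 1.5 for xi in x]

# Change of variables: u = ln x, v = ln y.
u = [math.log(xi) for xi in x]
v = [math.log(yi) for yi in y]
n = len(u)

# Ordinary simple regression of v on u (closed-form OLS).
u_bar, v_bar = sum(u) / n, sum(v) / n
s_uu = sum((ui - u_bar) ** 2 for ui in u)
s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
b2 = s_uv / s_uu         # estimated elasticity of y with respect to x
b1 = v_bar - b2 * u_bar  # estimated intercept on the log scale

print(f"ln y = {b1:.4f} + {b2:.4f} ln x")
```

The same pattern applies to the other forms: transform only y for the log-linear model, only x for the linear-log model, and use 1/x as the regressor for the inverse model.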
Exercise: Given the regression results between Y - sales (million
VND/ton) and X - price (thousand VND/kg) as follows: