FE5209, 2024
Chao ZHOU
3rd Lecture
Linear regression model

y_t = α + β x_t + ϵ_t   (1)

where the subscript t (= 1, 2, 3, ..., T) denotes the observation number and ϵ_t is a random disturbance (or noise, error) term. α and β are called the intercept and the slope of the line, respectively.
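As a concrete illustration (not from the lecture), the following Python sketch simulates data from model (1); the parameter values, sample size and noise scale are arbitrary choices.

```python
import numpy as np

# Simulate data from model (1): y_t = alpha + beta * x_t + eps_t.
# alpha, beta, T and the noise scale are illustrative values, not from the lecture.
rng = np.random.default_rng(seed=42)
T = 100
alpha, beta = 2.0, 0.5
x = rng.uniform(0.0, 10.0, size=T)   # regressor x_t
eps = rng.normal(0.0, 1.0, size=T)   # disturbance eps_t
y = alpha + beta * x + eps           # observed data y_t
```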
[Figures: scatter plot of y1 against the observation index; scatter plot of y3 against y1.]
Notations
Let y_t denote the actual data point for observation t and let ŷ_t denote the fitted value from the regression line.
Let ϵ̂_t denote the residual, which is the difference between the actual value of y and the value fitted by the model for this data point, i.e. ϵ̂_t = y_t − ŷ_t. ϵ̂_t can be understood as an estimate of ϵ_t.
(3) is equivalent to

∑_{t=1}^T y_t − T α̂ − β̂ ∑_{t=1}^T x_t = 0

T ȳ − T α̂ − β̂ T x̄ = 0

α̂ = ȳ − β̂ x̄
Substituting α̂ into the first-order condition for β̂ gives

∑_{t=1}^T x_t y_t − ȳ ∑_{t=1}^T x_t + β̂ x̄ ∑_{t=1}^T x_t − β̂ ∑_{t=1}^T x_t² = 0

∑_{t=1}^T x_t y_t − T ȳ x̄ + β̂ T x̄² − β̂ ∑_{t=1}^T x_t² = 0

Rearranging for β̂,

β̂ ( T x̄² − ∑_{t=1}^T x_t² ) = T ȳ x̄ − ∑_{t=1}^T x_t y_t
Therefore the coefficient estimators for the slope β and the intercept α are the minimisers of L, given by

β̂ = ( ∑ x_t y_t − T x̄ ȳ ) / ( ∑ x_t² − T x̄² ) = ∑(x_t − x̄)(y_t − ȳ) / ∑(x_t − x̄)²   and   α̂ = ȳ − β̂ x̄
We can rewrite

β̂ = [ (1/(T−1)) ∑(x_t − x̄)(y_t − ȳ) ] / [ (1/(T−1)) ∑(x_t − x̄)² ] = s_xy / s_x²,

where s_xy is the sample covariance between x and y and s_x² is the sample variance of x; that is, β̂ is the sample covariance between x and y divided by the sample variance of x.
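A minimal Python sketch of these closed-form estimators (an illustration, not the lecture's own code):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS estimates for the bivariate model:
    beta_hat = sum((x_t - x_bar)(y_t - y_bar)) / sum((x_t - x_bar)^2) = s_xy / s_x^2,
    alpha_hat = y_bar - beta_hat * x_bar."""
    x_bar, y_bar = x.mean(), y.mean()
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat
```

Equivalently, beta_hat can be computed as np.cov(x, y)[0, 1] / np.var(x, ddof=1), matching the covariance/variance form above.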
Estimators
σ̂ = √[ (1/(T−2)) ∑ ϵ̂_t² ] is known as the standard error of the regression.

SE(α̂) = σ̂ √[ ∑ x_t² / ( T ∑(x_t − x̄)² ) ] = σ̂ √[ ∑ x_t² / ( T(∑ x_t² − T x̄²) ) ]   (5)

SE(β̂) = σ̂ √[ 1 / ∑(x_t − x̄)² ] = σ̂ √[ 1 / ( (T−1) s_x² ) ]   (6)
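Equations (5) and (6) can be computed directly; the fragment below is again an illustrative Python sketch, with the T − 2 divisor for σ̂ taken from the slide.

```python
import numpy as np

def ols_standard_errors(x, y, alpha_hat, beta_hat):
    """Standard error of the regression and of the coefficient
    estimates, following equations (5) and (6)."""
    T = len(y)
    resid = y - (alpha_hat + beta_hat * x)             # eps_hat_t
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (T - 2))  # standard error of the regression
    sxx = np.sum((x - x.mean()) ** 2)                  # sum of (x_t - x_bar)^2
    se_alpha = sigma_hat * np.sqrt(np.sum(x ** 2) / (T * sxx))  # equation (5)
    se_beta = sigma_hat * np.sqrt(1.0 / sxx)                    # equation (6)
    return sigma_hat, se_alpha, se_beta
```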
Example
Given the regression results, it is of interest to test the hypothesis that the true value of β is in fact 0.5:

H0: β = 0.5, Ha: β ≠ 0.5

This is known as a two-sided test, since under Ha the outcome can be either β < 0.5 or β > 0.5.
Sometimes prior information may suggest that β > 0.5 is to be expected rather than β < 0.5; then a one-sided test would be:

H0: β = 0.5, Ha: β > 0.5
Test statistics
In very general terms, if the estimated value is a long way away from the hypothesised value, the null hypothesis is likely to be rejected; if the value under the null hypothesis and the estimated value are close to one another, the null hypothesis is less likely to be rejected.
Under Assumption (5), it can be shown that the coefficient estimates are also normally distributed:

α̂ ∼ N(α, var(α̂)) and β̂ ∼ N(β, var(β̂))

When Assumption (5) does not hold, the coefficient estimates still follow a normal distribution (approximately) if all the other assumptions hold and the sample size is sufficiently large.
Thus we have

(α̂ − α)/√var(α̂) ∼ N(0, 1) and (β̂ − β)/√var(β̂) ∼ N(0, 1)
Replacing the unknown var(β̂) by its estimate SE(β̂)² turns the standardised statistic into a t-distributed quantity; not rejecting H0: β = β* at significance level α is then equivalent to

−t_{1−α/2} · SE(β̂) ≤ β̂ − β* ≤ t_{1−α/2} · SE(β̂)
Significance
Review: p-value
t-ratio test
If the test is

H0: β = 0, Ha: β ≠ 0,

the test statistic reduces to β̂/SE(β̂), known as the t-ratio.
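A sketch of the two-sided t-test in Python (illustrative; it assumes T − 2 degrees of freedom, which is appropriate for the bivariate regression, and uses scipy for the t quantiles):

```python
from scipy import stats

def t_test(beta_hat, se_beta, T, beta_star=0.0, level=0.05):
    """Two-sided test of H0: beta = beta_star vs Ha: beta != beta_star.
    Rejects when |t| exceeds the t_{1 - level/2} critical value with T - 2 d.o.f."""
    t_stat = (beta_hat - beta_star) / se_beta
    crit = stats.t.ppf(1.0 - level / 2.0, df=T - 2)
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=T - 2)
    return t_stat, crit, p_value, abs(t_stat) > crit
```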
Matrix notation

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix}
=
\begin{pmatrix} 1 & x_{11} \\ 1 & x_{12} \\ \vdots & \vdots \\ 1 & x_{1T} \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
+
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_T \end{pmatrix}

i.e. y = xβ + ϵ in compact form.
OLS estimations
Minimise the RSS L = ∑_{t=1}^T ϵ̂_t² = ∑_{t=1}^T (y_t − x_t β̂)².
Write L in matrix notation:

L = ϵ̂′ϵ̂ = (y − xβ̂)′(y − xβ̂) = y′y − β̂′x′y − y′xβ̂ + β̂′x′xβ̂   (7)

We can check easily that β̂′x′y = y′xβ̂ (each is a scalar, and one is the transpose of the other), thus (7) can be written

L = y′y − 2β̂′x′y + β̂′x′xβ̂
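Setting the derivative of (7) with respect to β̂ to zero yields the normal equations x′xβ̂ = x′y, i.e. β̂ = (x′x)⁻¹x′y. A minimal Python sketch, solving the normal equations rather than inverting x′x explicitly:

```python
import numpy as np

def ols_matrix(x1, y):
    """OLS in matrix form: build the design matrix with an intercept column
    and solve the normal equations x'x beta_hat = x'y."""
    X = np.column_stack([np.ones(len(x1)), x1])   # columns: constant, x_{1t}
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return beta_hat                               # [beta0_hat, beta1_hat]
```

np.linalg.lstsq(X, y, rcond=None) does the same job with better numerical behaviour when x′x is ill-conditioned.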
F-test statistic
The F-test statistic for testing multiple hypotheses is given by

F-test statistic = [(RRSS − URSS)/URSS] × [(T − k − 1)/m]   (8)
where
URSS = residual sum of squares from unrestricted regression
RRSS = residual sum of squares from restricted regression
m = number of restrictions
T = number of observations
k + 1 = number of regressors in unrestricted regression
It can be shown that URSS ≤ RRSS.
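An illustrative Python sketch of equation (8); the critical value and p-value come from the F_{m, T−k−1} distribution discussed below.

```python
from scipy import stats

def f_test(rrss, urss, m, T, k, level=0.05):
    """F-test statistic of equation (8); under H0 it follows F_{m, T-k-1}."""
    f_stat = (rrss - urss) / urss * (T - k - 1) / m
    crit = stats.f.ppf(1.0 - level, dfn=m, dfd=T - k - 1)
    p_value = stats.f.sf(f_stat, dfn=m, dfd=T - k - 1)
    return f_stat, crit, p_value
```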
Ideas of F-test
The test statistic follows the F-distribution under the null hypothesis; more precisely, F_{m, T−k−1}.
Single hypotheses involving one coefficient can be tested using a t- or an F-test (they always give the same conclusion; see the definitions in Lecture 1), but multiple hypotheses can be tested only using an F-test.
It is not possible to test hypotheses that are non-linear or multiplicative within this framework; for example, H0: β_1 β_2 = 3 or H0: β_1² = 4 cannot be tested.
Goodness of fit statistics: R²
A scaled version of the RSS is usually employed. The most common goodness of fit statistic is known as R².

Total SS (TSS) = Explained SS (ESS) + RSS:

∑_t (y_t − ȳ)² = ∑_t (ŷ_t − ȳ)² + ∑_t ϵ̂_t²

R² = ESS/TSS = 1 − RSS/TSS

R² can be understood as the portion of the total fluctuation of the dependent variable, y, explained by the regression relation.
R² must always lie between zero and one. A higher R² implies, everything else being equal, that the model fits the data better.
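A small Python sketch of the R² computation (illustrative, given fitted values y_hat):

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = ESS/TSS = 1 - RSS/TSS for fitted values y_hat."""
    tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
    rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
    return 1.0 - rss / tss
```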
Problems with R²
(i) R² never falls if more regressors are added to the regression, so R² cannot be used to decide whether a given variable should be present in the model or not.
(ii) R² can take values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models.
To get around problem (i), one can use the adjusted R², denoted by R̄²:

R̄² = 1 − (1 − R²) (T − 1)/(T − k − 1)   (9)
With Cp , AIC, and BIC, smaller values are better, but for adjusted R 2
(R̄ 2 ), larger values are better.
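Equation (9) in Python (an illustrative one-liner; here k is taken to be the number of regressors excluding the constant, matching the T − k − 1 of the slide):

```python
def adjusted_r_squared(r2, T, k):
    """Adjusted R^2 of equation (9): penalises additional regressors,
    so it can fall when an irrelevant variable is added."""
    return 1.0 - (1.0 - r2) * (T - 1) / (T - k - 1)
```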
[Figures: time series plots of bhp1 and vale1; panels titled BHP and VALE; bhp and vale plotted together; the spread BHP − 0.717·VALE (wt) against the index; scatter plot of y3 against y1.]
[Figures: sample ACF up to lag 30; scatter plot of r3 against r1; a second sample ACF up to lag 30.]