
Autocorrelation II

OLS and Autocorrelation


• Estimate the original model: Yt = β1 + β2Xt + ut
• Assume that the disturbance, or error, terms are generated by the following mechanism:
ut = ρut−1 + εt,   −1 < ρ < 1   (AR(1))
where ρ is known as the coefficient of autocovariance and εt is a stochastic disturbance term that satisfies the standard OLS assumptions: zero mean, constant variance, and no serial correlation.
• As in the case of heteroscedasticity, in the
presence of autocorrelation the OLS
estimators are still linear unbiased as well as
consistent and asymptotically normally
distributed.
• They are no longer efficient (i.e., minimum
variance).
• Because the OLS estimators no longer have minimum variance, confidence intervals built on them are wider than necessary, so we are likely to declare a coefficient statistically insignificant (i.e., not different from zero) even though it may be significant.
• Recall that the t-statistic = β̂2 / se(β̂2): an inflated standard error pulls the t-statistic toward zero.
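A minimal simulation sketch (Python with numpy and statsmodels; all parameter values and variable names are illustrative assumptions, not from the slides) showing that OLS still recovers the coefficients on average under AR(1) errors, while its conventional standard errors are computed as if the disturbances were independent:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, beta1, beta2, rho = 200, 1.0, 0.5, 0.8   # illustrative values

# Generate AR(1) disturbances: u_t = rho * u_{t-1} + eps_t
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

x = rng.normal(size=n)
y = beta1 + beta2 * x + u

# Fit by OLS; the estimates are unbiased and consistent, but the reported
# standard errors ignore the serial correlation in u.
res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.params)   # close to (1.0, 0.5) on average
print(res.bse)      # conventional OLS standard errors (unreliable here)
```

The objects `y`, `x`, and `res` defined here are reused by the later sketches.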
Detecting Autocorrelation
1a. Graphical Method
• The assumption of no autocorrelation in the classical model relates to the population disturbances ut, which are not directly observable.
• What we have are their proxies, the OLS residuals ût.
• Visual examination of the û's gives us some clues about the presence of autocorrelation in the u's.
• Various methods exist. One is to plot ût against ût−1, that is, plot the residuals at time t against their value at time (t−1).
• If the residuals are nonrandom, we should obtain pictures similar to those seen earlier.
[Figure: scatter plots of ût against ût−1 illustrating positive and negative autocorrelation patterns]
1b. We can also plot the standardized residuals against time.
• The standardized residuals are simply the residuals (ût) divided by the standard error of the regression (σ̂, i.e., √σ̂²); that is, they are ût/σ̂.
• Notice that ût and σ̂ are measured in the units in which the regressand Y is measured. The standardized residuals are therefore pure numbers (no units of measurement) and can be compared with the standardized residuals of other regressions.
• In large samples, ût/σ̂ is approximately normally distributed with zero mean and unit variance.
• Examining the time-sequence plot in the figure, we observe that both ût and the standardized ût exhibit a pattern suggesting that the ut are not random.
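A small plotting sketch of both diagnostics (matplotlib and numpy; it reuses the fitted OLS result `res` from the simulation above, an assumption rather than anything in the slides):

```python
import numpy as np
import matplotlib.pyplot as plt

u_hat = res.resid                     # OLS residuals from the fitted model
sigma_hat = np.sqrt(res.mse_resid)    # standard error of the regression

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# 1a: residuals at time t against their value at time t-1
ax1.scatter(u_hat[:-1], u_hat[1:])
ax1.set_xlabel("u_hat(t-1)")
ax1.set_ylabel("u_hat(t)")
ax1.set_title("Residuals vs lagged residuals")

# 1b: standardized residuals (pure numbers) against time
ax2.plot(u_hat / sigma_hat, marker="o")
ax2.axhline(0.0, linewidth=0.8)
ax2.set_xlabel("time")
ax2.set_ylabel("u_hat / sigma_hat")
ax2.set_title("Standardized residuals over time")

plt.tight_layout()
plt.show()
```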
II. The Runs Test
• If we carefully examine the figure: Initially, we have
several residuals that are negative, then there is a series
of positive residuals, and then there are several residuals
that are negative.
• If these residuals were purely random, could we observe
such a pattern? Intuitively, it seems unlikely.
• This intuition can be checked by the "runs test", sometimes also known as the Geary test, a nonparametric test. (In nonparametric tests we make no assumptions about the probability distribution from which the observations are drawn.)
• To explain the runs test, let us simply note down the
signs (+or-) of the residuals
• There are 9 negative residuals, followed by 21
positive residuals, followed by 10 negative
residuals, for a total of 40 observations.
• We now define a run as an uninterrupted
sequence of one symbol or attribute, such as +
or -. We further define the length of a run as
the number of elements in it.
• In the sequence shown below, there are 3 runs:
a run of 9 minuses (i.e., of length 9), a run of 21
pluses (i.e., of length 21) and a run of 10
minuses (i.e., of length 10).
(−−−−−−−−−)(+++++++++++++++++++++)(−−−−−−−−−−)
• By examining how runs behave in a strictly random
sequence of observations, one can derive a test of
randomness of runs.
• Question: Are the 3 runs observed in our example
consisting of 40 observations too many or too few
compared with the number of runs expected in a
strictly random sequence of 40 observations?
• If there are too many runs, it would mean that in our
example the residuals change sign frequently, thus
indicating negative serial correlation.
• Similarly, if there are too few runs, they may suggest
positive autocorrelation.
• A priori, the sign pattern shown above would indicate positive correlation in the residuals.
• Now let
N = total number of observations = N1 + N2
N1 = number of + symbols (i.e., + residuals)
N2 = number of − symbols (i.e., − residuals)
R = number of runs
• Then, under the null hypothesis that the successive outcomes (here, residuals) are independent, and assuming that N1 > 10 and N2 > 10, the number of runs R is (asymptotically) normally distributed with
Mean: E(R) = 2N1N2/N + 1
Variance: σ²R = 2N1N2(2N1N2 − N) / [N²(N − 1)]
• A 95% confidence interval for R is E(R) ± 1.96σR. In our example N1 = 21, N2 = 19, N = 40, so E(R) ≈ 20.95, σR ≈ 3.11, and the interval is roughly (14.8, 27.1).
• This interval does not include 3. Hence, we can
reject the hypothesis that the residuals in our
example are random with 95% confidence.
• In other words, the residuals exhibit
autocorrelation.
• As a general rule, if there is positive
autocorrelation, the number of runs will be few,
whereas if there is negative autocorrelation, the
number of runs will be many.
• Swed and Eisenhart have developed special
tables that give critical values of the runs
expected in a random sequence of N
observations if N1 or N2 is smaller than 20.
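A minimal sketch implementing this normal approximation directly from the formulas above (numpy only; the sign pattern is the 9/21/10 example from the slides):

```python
import numpy as np

def runs_test(residuals):
    """Geary (runs) test using the large-sample normal approximation."""
    signs = np.sign(residuals)
    n1 = int(np.sum(signs > 0))     # number of + residuals
    n2 = int(np.sum(signs < 0))     # number of - residuals
    n = n1 + n2
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))   # 1 + number of sign changes

    mean_r = 2.0 * n1 * n2 / n + 1.0
    var_r = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    half_width = 1.96 * np.sqrt(var_r)
    return runs, mean_r, (mean_r - half_width, mean_r + half_width)

# Sign pattern from the example: 9 minuses, 21 pluses, 10 minuses
pattern = np.array([-1.0] * 9 + [1.0] * 21 + [-1.0] * 10)
print(runs_test(pattern))   # 3 runs, E(R) about 20.95, 95% interval about (14.8, 27.1)
```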
III. Durbin–Watson Test
• It is popularly known as the Durbin–Watson d statistic:
d = Σ(t=2 to n) (ût − ût−1)² / Σ(t=1 to n) ût²
that is, the ratio of the sum of squared differences in successive residuals to the RSS.
• Note that in the numerator of the d statistic the number of observations
is n−1 because one observation is lost in taking successive differences.
• A great advantage of the d statistic is that it is based on the estimated
residuals, which are routinely computed in regression analysis.
• It is now a common practice to report the Durbin–Watson d along with
summary measures, such as R2, adjusted R2, t, and F. Although it is now
routinely used, it is important to note the assumptions underlying the d
statistic.
Assumptions of DW statistic
• The assumptions underlying d are listed under "Limitations of DW Statistic" below.
• A key relationship is d ≈ 2(1 − ρ̂). Since −1 ≤ ρ̂ ≤ 1, it follows that 0 ≤ d ≤ 4; these are the bounds within which the estimated d value must lie.
• If ρ̂ = 0, d ≈ 2: no serial correlation (of the first order), so d is expected to be about 2.
Rule of thumb: if d is found to be about 2 for an estimated model, we may assume that there is no first-order autocorrelation, either positive or negative.
• If ρ̂ = +1 (perfect positive correlation in the residuals), d ≈ 0. Therefore, the closer d is to 0, the greater the evidence of positive serial correlation.
• If ρ̂ = −1 (perfect negative correlation among successive residuals), d ≈ 4. Hence, the closer d is to 4, the greater the evidence of negative serial correlation.
Calculate DW or d statistic
1. Run the OLS regression and obtain the residuals.
2. Compute d (most computer programs report it).
3. For the given sample size and number of explanatory variables, find the critical dL and dU values.
4. Now apply the decision rules below:
• 0 < d < dL: reject the null of no positive autocorrelation (evidence of positive autocorrelation)
• dL ≤ d ≤ dU: zone of indecision
• dU < d < 4 − dU: do not reject the null (no first-order autocorrelation, positive or negative)
• 4 − dU ≤ d ≤ 4 − dL: zone of indecision
• 4 − dL < d < 4: reject the null of no negative autocorrelation (evidence of negative autocorrelation)
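A short sketch using statsmodels' built-in helper (it assumes the fitted OLS result `res` from the earlier simulation):

```python
from statsmodels.stats.stattools import durbin_watson

# d = sum of squared successive residual differences divided by the RSS
d = durbin_watson(res.resid)
print(f"Durbin-Watson d = {d:.3f}")
# d near 2 -> little evidence of first-order autocorrelation;
# d near 0 -> positive autocorrelation; d near 4 -> negative autocorrelation.
# For a formal decision, compare d with the tabulated dL and dU values.
```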
Limitations of DW Statistic
• Assumptions:
– (1) the explanatory variables, or regressors, are
non-stochastic;
– (2) the error term follows the normal distribution;
and
– (3) that the regression models do not include the
lagged value(s) of the regressand
• If value of d falls in the indecisive zone, we
cannot conclude whether (first-order)
autocorrelation does or does not exist.
Improvements over d
1. Durbin developed the h test to test for serial correlation in models where lagged values of the Y variable are included on the RHS.
2. Breusch and Godfrey (BG or LM): a general test of autocorrelation. It allows for
– stochastic regressors, such as the lagged values of the regressand;
– higher-order autoregressive schemes, such as AR(1), AR(2), etc.; and
– simple or higher-order moving averages of white-noise error terms, such as εt.
BG Test
• Let us use the two-variable regression model to illustrate the test (more regressors, including lagged values of the regressand, can be added to the model). Let
Yt = β1 + β2Xt + ut
• Assume that the error term ut follows the pth-order autoregressive, AR(p), process:
ut = ρ1ut−1 + ρ2ut−2 + ··· + ρput−p + εt
• The null hypothesis is H0: ρ1 = ρ2 = ··· = ρp = 0
(that is, there is no serial correlation of any order).
Steps
1. Estimate the two-variable model by OLS and obtain the residuals, ût.
2. Regress ût on the original Xt (if there is more than one X variable in the original model, include them as well) and on ût−1, ût−2, ..., ût−p, the lagged values of the estimated residuals from step 1.
• Thus, if p = 4, we introduce four lagged values of the residuals as additional regressors. Note that to run this regression we will have only (n − p) observations (why?). In short, run the following auxiliary regression:
ût = α1 + α2Xt + ρ̂1ût−1 + ρ̂2ût−2 + ··· + ρ̂pût−p + εt
and obtain R² from this (auxiliary) regression.
3. If the sample size is large (technically, infinite), Breusch and Godfrey have shown that
(n − p)R² ~ χ²(p)
that is, the statistic asymptotically follows the chi-square distribution with p degrees of freedom; if it exceeds the critical chi-square value at the chosen level of significance, we reject the null hypothesis of no serial correlation.
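A sketch using statsmodels' implementation of the test (it reuses the fitted OLS result `res`; the lag order p = 4 is an illustrative choice, not from the slides):

```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# LM statistic, its chi-square p-value, plus an F-test variant
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(f"BG LM statistic = {lm_stat:.3f}, p-value = {lm_pval:.4f}")
# A small p-value rejects H0: rho_1 = ... = rho_p = 0 (no serial correlation).
```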
Correcting Autocorrelation

1. Try to find out whether the autocorrelation is pure autocorrelation and not the result of mis-specification of the model. Sometimes we observe patterns in residuals because the model is misspecified, that is, it has excluded some important variables, or because its functional form is incorrect.
2. If it is pure autocorrelation, one can use an appropriate transformation of the original model so that the transformed model does not have the problem of (pure) autocorrelation. As in the case of heteroscedasticity, we will have to use some type of generalized least-squares (GLS) method.
3. In large samples, we can use the Newey–West method to obtain standard errors of OLS estimators that are corrected for autocorrelation (a sketch follows this list). This method is actually an extension of White's heteroscedasticity-consistent standard errors method.
4. In some situations we can continue to use the OLS method.
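For point 3, a minimal Newey–West (HAC) sketch in statsmodels, reusing the illustrative `y` and `x`; the maxlags value is an assumption and should generally grow with the sample size:

```python
import statsmodels.api as sm

# Same OLS point estimates, but standard errors made robust to autocorrelation
# (and heteroscedasticity) via the Newey-West HAC covariance estimator.
res_hac = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res_hac.bse)   # HAC-corrected standard errors
```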
Remedy
• The remedy depends on the knowledge we have about the nature of the interdependence among the disturbances, i.e., the structure of the autocorrelation.
• Two-variable regression model:
Yt = β1 + β2Xt + ut        (1)
Assume that the error term follows the AR(1) scheme, i.e.,
ut = ρut−1 + εt        (2)
where −1 < ρ < 1
• Now we consider two cases:
(1) ρ is known and
(2) ρ is not known and has to be estimated.
When ρ Is Known
• If the coefficient of first-order autocorrelation is known, the problem of autocorrelation can be easily solved.
• If eqn (1) holds true at time t, it also holds true at time (t − 1). Hence,
Yt−1 = β1 + β2Xt−1 + ut−1        (3)
• Multiplying eqn (3) by ρ gives
ρYt−1 = ρβ1 + ρβ2Xt−1 + ρut−1        (4)
• Subtracting eqn (4) from eqn (1) gives
(Yt − ρYt−1) = β1(1 − ρ) + β2(Xt − ρXt−1) + εt        (5)
where εt = (ut − ρut−1)
• We can express eqn (5) as
Y*t = β*1 + β*2X*t + εt        (6)
where β*1 = β1(1 − ρ), Y*t = (Yt − ρYt−1), X*t = (Xt − ρXt−1), and β*2 = β2.
• Since the error term in eqn (6) satisfies the OLS assumptions, we can apply OLS to the transformed variables Y* and X* and obtain BLUE estimators.
• In effect, running eqn (6) amounts to using generalized least squares (GLS). Note: GLS is nothing but OLS applied to a transformed model that satisfies the classical assumptions.
• Regression (5) is known as the generalized, or quasi, difference equation.
• It involves regressing Y on X, not in the original form, but in difference form, obtained by subtracting a proportion (= ρ) of the value of a variable in the previous time period from its value in the current time period.
• In this differencing procedure we lose one observation, because the first observation has no antecedent.
• To avoid this loss of one observation, the first observations on Y and X can be transformed as
Y1√(1 − ρ²) and X1√(1 − ρ²)
• This transformation is known as the Prais–Winsten transformation.
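A minimal numpy/statsmodels sketch of the quasi-difference regression of eqn (6), assuming ρ is known (here set to the illustrative 0.8) and simply dropping the first observation; the Prais–Winsten rescaling could be applied instead to retain it:

```python
import statsmodels.api as sm

rho_known = 0.8   # assumed known for this illustration

# Generalized (quasi) differences, eqns (5)-(6); the first observation is lost
y_star = y[1:] - rho_known * y[:-1]
x_star = x[1:] - rho_known * x[:-1]

# OLS on the transformed model is GLS and yields BLUE estimators
gls_res = sm.OLS(y_star, sm.add_constant(x_star)).fit()
beta2_hat = gls_res.params[1]                       # beta2* = beta2
beta1_hat = gls_res.params[0] / (1.0 - rho_known)   # recover beta1 from beta1* = beta1(1 - rho)
print(beta1_hat, beta2_hat)
```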
When ρ Is Not Known
The First-Difference Method
• ρ lies between −1 and +1. At one extreme one could assume ρ = 0, that is, no (first-order) serial correlation; at the other extreme we could let ρ = ±1, that is, perfect positive or negative correlation.
• Generally, when we run a regression we assume no autocorrelation and then let the Durbin–Watson or another test show whether this assumption is justified.
• If, however, ρ = +1, the generalized difference equation (5) reduces to the first-difference equation:
Yt − Yt−1 = β2(Xt − Xt−1) + (ut − ut−1)
or
ΔYt = β2ΔXt + εt        (7)
where Δ is the first-difference operator; with ρ = 1 the error (ut − ut−1) equals εt, and the intercept drops out because β1(1 − ρ) = 0.
• Since the error term in eqn (7) is free from (first-order) serial correlation, to run regression (7) we take the first differences of both the regressand and the regressor(s) and run the regression on these first differences.
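A brief sketch of the first-difference regression of eqn (7) (no intercept, since ρ = 1), reusing the illustrative `y` and `x`:

```python
import numpy as np
import statsmodels.api as sm

dy = np.diff(y)   # Y_t - Y_{t-1}
dx = np.diff(x)   # X_t - X_{t-1}

# Regression through the origin: eqn (7) has no intercept term
fd_res = sm.OLS(dy, dx).fit()
print(fd_res.params)   # estimate of beta2
```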
Alternative methods for estimating ρ
• From the DW statistic: ρ̂ ≈ 1 − d/2
• From the residuals: regress ût on ût−1; the slope coefficient is an estimate of ρ
• From iterative techniques:
– Cochrane–Orcutt
– Hildreth–Lu
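One possible sketch of the iterative (Cochrane–Orcutt-style) approach via statsmodels' GLSAR, which alternates between estimating ρ from the residuals and re-running the quasi-differenced regression; this is an illustration of the idea, not necessarily the slides' exact procedure:

```python
import statsmodels.api as sm

# AR(1) error structure; iterative_fit re-estimates rho at each pass
glsar_model = sm.GLSAR(y, sm.add_constant(x), rho=1)
glsar_res = glsar_model.iterative_fit(maxiter=10)

print("estimated rho:", glsar_model.rho)
print(glsar_res.params)   # beta1, beta2 after the AR(1) correction
```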
