Time Series Analysis
Andrea Beccarini
Center for Quantitative Economics
Winter 2013/2014
Andrea Beccarini (CQE)
Time Series Analysis
Winter 2013/2014
1 / 143
Introduction
Objectives
Time series are ubiquitous in economics and very important
in macroeconomics and financial economics
GDP, inflation rates, unemployment, interest rates, stock prices
You will learn . . .
the formal mathematical treatment of time series and stochastic
processes
what the most important standard models in economics are
how to fit models to real world time series
Introduction
Prerequisites
Descriptive Statistics
Probability Theory
Statistical Inference
Introduction
Class and material
Class
Class teacher: Sarah Meyer
Time: Tu., 12:00-14:00
Location: CAWM 3
Start: 22 October 2013
Material
Course page on Blackboard
Slides and class material are (or will be) downloadable
Introduction
Literature
Neusser, Klaus (2011), Zeitreihenanalyse in den
Wirtschaftswissenschaften, 3. Aufl., Teubner, Wiesbaden.
available online in the RUB network
Hamilton, James D. (1994), Time Series Analysis,
Princeton University Press, Princeton.
Pfaff, Bernhard (2006), Analysis of Integrated and
Cointegrated Time Series with R, Springer, New York.
Schlittgen, Rainer and Streitberg, Bernd (1997),
Zeitreihenanalyse, 7. Aufl., Oldenbourg, München.
Basics
Definition
Definition: Time series
A sequence of observations ordered by time is called a time series
Time series can be univariate or multivariate
Time can be discrete or continuous
The states can be discrete or continuous
Basics
Definition
Typical notations
x_1, x_2, …, x_T
or x(1), x(2), …, x(T)
or x_t, t = 1, …, T
or (x_t)_{t≥0}
This course is about . . .
univariate time series
in discrete time
with continuous states
Basics
Examples
Quarterly GDP Germany, 1991 I to 2012 II
[Figure: time series plot of GDP (in current billion Euro), roughly 350 to 650, against Time, 1995–2010]
Basics
Examples
DAX index and log(DAX), 31.12.1964 to 6.4.2009
[Figure: two panels against Time, 1970–2010; top panel: DAX index (about 2000 to 6000), bottom panel: logarithm of DAX (about 6.0 to 9.0)]
Basics
Definition
Definition: Stochastic process
A sequence (X_t)_{t∈T} of random variables, all defined on the same
probability space (Ω, A, P), is called a stochastic process with discrete time
parameter (usually T = ℕ or T = ℤ)
Short version: A stochastic process is a sequence of random variables
A stochastic process depends on both chance and time
Basics
Definition
Distinguish four cases: both time t and chance ω can be fixed or variable

             | ω fixed                        | ω variable
t fixed      | X_t(ω) is a real number        | X_t(ω) is a random variable
t variable   | X_t(ω) is a sequence of real   | X_t(ω) is a stochastic process
             | numbers (path, realization,    |
             | trajectory)                    |
process.R
Basics
Examples
Example 1: White noise
ε_t ~ NID(0, σ²)
Example 2: Random walk
X_t = X_{t−1} + ε_t  with X_0 = 0,  ε_t ~ NID(0, σ²)
Example 3: A random constant
X_t = Z,  Z ~ N(0, σ²)
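The three example processes can be simulated in a few lines. The course's scripts (e.g. process.R) are in R; this is a minimal Python sketch of the same idea, drawing the NID(0, σ²) innovations with random.gauss:

```python
import random

def white_noise(T, sigma=1.0, seed=0):
    """Example 1: eps_t ~ NID(0, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0, sigma) for _ in range(T)]

def random_walk(T, sigma=1.0, seed=0):
    """Example 2: X_t = X_{t-1} + eps_t with X_0 = 0."""
    x, path = 0.0, []
    for e in white_noise(T, sigma, seed):
        x += e
        path.append(x)
    return path

def random_constant(T, sigma=1.0, seed=0):
    """Example 3: X_t = Z, with Z ~ N(0, sigma^2) drawn once."""
    z = random.Random(seed).gauss(0, sigma)
    return [z] * T
```

Note that the random walk at time t is just the cumulated white noise up to t, which is why its variance grows with t.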
Basics
Moment functions
Definition: Moment functions
The following functions of time are called moment functions:
μ(t) = E(X_t)            (expectation function)
σ²(t) = Var(X_t)         (variance function)
γ(s, t) = Cov(X_s, X_t)  (covariance function)

Correlation function (autocorrelation function)

ρ(s, t) = γ(s, t) / √(σ²(s) σ²(t))
moments.R
[1]
Basics
Estimation of moment functions
Usually, the moment functions are unknown and have to be estimated
Problem: Only a single path (realization) can be observed
X_1^(1)  X_2^(1)  …  X_T^(1)
X_1^(2)  X_2^(2)  …  X_T^(2)
   ⋮        ⋮            ⋮
X_1^(n)  X_2^(n)  …  X_T^(n)
Can we still estimate the expectation function μ(t) and the
autocovariance function γ(s, t)? Under which conditions?
Basics
Estimation of moment functions
Usually, the expectation function μ(t) should be estimated by
averaging over realizations,

μ̂(t) = (1/n) Σ_{i=1}^{n} X_t^(i)
Basics
Estimation of moment functions
Under certain conditions, μ(t) can be estimated by
averaging over time,

μ̂ = (1/T) Σ_{t=1}^{T} X_t^(1)
Basics
Estimation of moment functions
Usually, the autocovariance γ(t, t + h) should be estimated by
averaging over realizations,

γ̂(t, t + h) = (1/n) Σ_{i=1}^{n} (X_t^(i) − μ̂(t))(X_{t+h}^(i) − μ̂(t + h))
Basics
Estimation of moment functions
Under certain conditions, γ(t, t + h) can be estimated by
averaging over time,

γ̂(t, t + h) = (1/T) Σ_{t=1}^{T−h} (X_t^(1) − μ̂)(X_{t+h}^(1) − μ̂)
Basics
Definition
Moment functions cannot be estimated without additional
assumptions, since only one path is observed
There are restrictions that make estimation of the moment functions possible
Restriction of the time heterogeneity:
The distribution of (X_t(ω))_{t∈T} must not be completely different for
each t ∈ T
Restriction of the memory:
If the values of the process are coupled too closely over time, the
individual observations do not supply any (or only insufficient)
information about the distribution
Basics
Restriction of time heterogeneity: Stationarity
Definition: Strong stationarity
Let (X_t)_{t∈T} be a stochastic process, and let t_1, …, t_n ∈ T be an
arbitrary number n ∈ ℕ of arbitrary time points.
(X_t)_{t∈T} is called strongly stationary if for arbitrary h ∈ ℤ

P(X_{t_1} ≤ x_1, …, X_{t_n} ≤ x_n) = P(X_{t_1+h} ≤ x_1, …, X_{t_n+h} ≤ x_n)
Implication: all univariate marginal distributions are identical
Basics
Restriction of time heterogeneity: Stationarity
Definition: Weak stationarity
(X_t)_{t∈T} is called weakly stationary if
1. the expectation exists and is constant: E(X_t) = μ < ∞ for all t ∈ T
2. the variance exists and is constant: Var(X_t) = σ² < ∞ for all t ∈ T
3. for all t, s, r ∈ ℤ (in the admissible range): γ(t, s) = γ(t + r, s + r)

Simplified notation for covariance and correlation functions

γ(h) = γ(t, t + h)
ρ(h) = ρ(t, t + h)
Basics
Restriction of time heterogeneity: Stationarity
Strong stationarity implies weak stationarity
(but only if the first two moments exist)
A stochastic process is called Gaussian if the joint distribution
of X_{t_1}, …, X_{t_n} is multivariate normal
For Gaussian processes, weak and strong stationarity coincide
Intuition: An observed time series can be regarded as a realization of
a stationary process, if a gliding window of appropriate width
always displays qualitatively the same picture
stationary.R
Examples
[2]
Basics
Restriction of memory: Ergodicity
Definition: Ergodicity (I)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ
and autocovariance γ(h); define

μ̂ = (1/T) Σ_{t=1}^{T} X_t

(X_t)_{t∈T} is called (expectation) ergodic if

lim_{T→∞} E[(μ̂ − μ)²] = 0
Basics
Restriction of memory: Ergodicity
Definition: Ergodicity (II)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ
and autocovariance γ(h); define

γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ)(X_{t+h} − μ)

(X_t)_{t∈T} is called (covariance) ergodic if for all h ∈ ℤ

lim_{T→∞} E[(γ̂(h) − γ(h))²] = 0
Basics
Restriction of memory: Ergodicity
Ergodicity is consistency (in quadratic mean) of the estimators
μ̂ of μ and γ̂(h) of γ(h) for dependent observations
The process (X_t)_{t∈T} is expectation ergodic if (γ(h))_{h∈ℤ} is
absolutely summable, i.e.

Σ_{h=−∞}^{∞} |γ(h)| < ∞

The dependence between far-away observations must be sufficiently small
Basics
Restriction of memory: Ergodicity
Ergodicity condition (for autocovariance): A stationary Gaussian
process (X_t)_{t∈T} with absolutely summable autocovariance function
γ(h) is (autocovariance) ergodic
Under ergodicity, the law of large numbers holds even if the
observations are dependent
If the dependence γ(h) does not diminish fast enough,
the estimators are no longer consistent
Examples
[3]
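The contrast between an ergodic and a non-ergodic stationary process can be seen numerically; a small Python sketch (the course material uses R). White noise is expectation ergodic, so its time average approaches E(X_t) = 0; the random-constant process X_t = Z from Example 3 is stationary but not ergodic, so its time average reproduces the single draw Z, however long the path:

```python
import random

def time_mean(path):
    return sum(path) / len(path)

rng = random.Random(42)
# ergodic: white noise, the time average converges to the expectation 0
wn = [rng.gauss(0, 1) for _ in range(100_000)]
# not ergodic: X_t = Z, the time average equals Z, not E(X_t) = 0
z = rng.gauss(0, 1)
rc = [z] * 100_000
```

Only one path is observed in practice, so this is exactly the situation where ergodicity decides whether time averaging is informative.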
Basics
Estimation of moment functions
Summary of estimators (electricity.R)

μ̂ = X̄_T = (1/T) Σ_{t=1}^{T} X_t

γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ̂)(X_{t+h} − μ̂)

ρ̂(h) = γ̂(h) / γ̂(0)

Sometimes, γ̂(h) is defined with factor 1/(T − h)
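The three estimators translate directly into code; a Python sketch (the course's example script electricity.R does this in R), using the 1/T convention stated above:

```python
def mean_est(x):
    """mu_hat = (1/T) * sum of x_t."""
    return sum(x) / len(x)

def acov_est(x, h):
    """gamma_hat(h), with the 1/T factor."""
    T, m = len(x), mean_est(x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(T - h)) / T

def acorr_est(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return acov_est(x, h) / acov_est(x, 0)
```

For example, for x = [1, 2, 3, 4] this gives γ̂(0) = 1.25, γ̂(1) = 0.3125 and hence ρ̂(1) = 0.25.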
Basics
Estimation of moment functions
A closer look at the expectation estimator
The estimator μ̂ is unbiased, i.e. E(μ̂) = μ
[4]
The variance of μ̂ is
[5]

Var(μ̂) = γ(0)/T + (2/T) Σ_{h=1}^{T−1} (1 − h/T) γ(h)

Under ergodicity, for T → ∞

T · Var(μ̂) → γ(0) + 2 Σ_{h=1}^{∞} γ(h) = Σ_{h=−∞}^{∞} γ(h)
Basics
Estimation of moment functions
For Gaussian processes, μ̂ is normally distributed,

μ̂ ~ N(μ, Var(μ̂))

and asymptotically

√T (μ̂ − μ) → Z ~ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))

For non-Gaussian processes, μ̂ is (often) asymptotically normal,

√T (μ̂ − μ) → Z ~ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))
Basics
Estimation of moment functions
A closer look at the autocovariance estimators γ̂(h)
For Gaussian processes with absolutely summable covariance function,
the random vector

(√T (γ̂(0) − γ(0)), …, √T (γ̂(K) − γ(K)))′

is multivariate normal with expectation vector (0, …, 0)′ and

T · Cov(γ̂(h_1), γ̂(h_2)) → Σ_{r=−∞}^{∞} (γ(r) γ(r + h_1 − h_2) + γ(r − h_2) γ(r + h_1))
Basics
Estimation of moment functions
A closer look at the autocorrelation estimators ρ̂(h)
For Gaussian processes with absolutely summable covariance function,
the random vector

(√T (ρ̂(1) − ρ(1)), …, √T (ρ̂(K) − ρ(K)))′

is multivariate normal with expectation vector (0, …, 0)′ and a
complicated covariance matrix
Be careful: For small to medium sample sizes the autocovariance and
autocorrelation estimators are biased!
autocorr.R
Basics
Estimation of moment functions
An important special case for autocorrelation estimators:
Let (ε_t) be a white-noise process with Var(ε_t) = σ² < ∞; then

E(ρ̂(h)) = −T⁻¹ + O(T⁻²)

Cov(ρ̂(h_1), ρ̂(h_2)) = T⁻¹ + O(T⁻²)   for h_1 = h_2
                     = O(T⁻²)          else

For white-noise processes and long time series, the empirical
autocorrelations are approximately independent normal random
variables with expectation −T⁻¹ and variance T⁻¹
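This approximation can be checked by simulation; a Python sketch (autocorr.R is the course's R illustration). For T = 2000 the standard deviation of ρ̂(h) is about 1/√T ≈ 0.022, so empirical autocorrelations of white noise should all lie close to zero:

```python
import random, math

def acorr(x, h):
    """Empirical autocorrelation with the 1/T convention."""
    T = len(x)
    m = sum(x) / T
    g = lambda k: sum((x[t] - m) * (x[t + k] - m) for t in range(T - k)) / T
    return g(h) / g(0)

rng = random.Random(7)
T = 2000
eps = [rng.gauss(0, 1) for _ in range(T)]
band = 2 / math.sqrt(T)  # approximate 95% band around the expectation -1/T
```

A correlogram of eps would show essentially all spikes inside the ±2/√T band.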
Mathematical digression (I)
Complex numbers
Some quadratic equations do not have real solutions, e.g.

x² + 1 = 0

Still it is possible (and sensible) to define solutions to such equations
The definition in common notation is

i = √−1

where i is the number which, when squared, equals −1
The number i is called imaginary (i.e. not real)
Mathematical digression (I)
Complex numbers
Other imaginary numbers follow from this definition, e.g.

√−16 = √16 · √−1 = 4i
√−5 = √5 · √−1 = √5 · i

Further, it is possible to define numbers that contain
both a real part and an imaginary part, e.g. 5 − 8i or a + bi
Such numbers are called complex and the set of complex numbers
is denoted as ℂ
The pair a + bi and a − bi is called conjugate complex
Mathematical digression (I)
Complex numbers
Geometric interpretation:
[Figure: complex plane with real axis and imaginary axis; the point a + bi has real part a, imaginary part b, and its absolute value is the distance from the origin]
Mathematical digression (I)
Complex numbers
Polar coordinates and Cartesian coordinates

z = a + bi = r(cos φ + i sin φ) = r e^{iφ}

a = r cos φ
b = r sin φ
r = √(a² + b²)
φ = arctan(b/a)
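Python's standard library does these conversions directly; a short sketch with the cmath module:

```python
import cmath

z = 3 + 4j
r, phi = cmath.polar(z)      # r = |z| = sqrt(a^2 + b^2), phi = arg(z)
back = cmath.rect(r, phi)    # r * (cos(phi) + i*sin(phi)), i.e. z again
conj = z.conjugate()         # a - bi, the conjugate complex number
```

Here r = 5 and φ = arctan(4/3), so going back and forth between the two coordinate systems is lossless up to rounding.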
Mathematical digression (I)
Complex numbers
Rules of calculus:
Addition

(a + bi) + (c + di) = (a + c) + (b + d)i

Multiplication (Cartesian coordinates)

(a + bi)(c + di) = (ac − bd) + (ad + bc)i

Multiplication (polar coordinates)

r_1 e^{iφ_1} · r_2 e^{iφ_2} = r_1 r_2 e^{i(φ_1 + φ_2)}
Mathematical digression (I)
Complex numbers
Addition (geometric interpretation):
[Figure: complex plane; the points a + bi and c + di are added component-wise, giving (a+c) + (b+d)i (parallelogram rule)]
Mathematical digression (I)
Complex numbers
Multiplication (geometric interpretation):
[Figure: complex plane; the factors have absolute values r_1, r_2 and angles φ_1, φ_2; the product has absolute value r = r_1 r_2 and angle φ = φ_1 + φ_2]
Mathematical digression (I)
Complex numbers
The quadratic equation

x² + px + q = 0

has the solutions

x = −p/2 ± √(p²/4 − q)

If p²/4 − q < 0 the solutions are complex (and conjugate)
Mathematical digression (I)
Complex numbers
Example: The solutions of

x² − 2x + 5 = 0

are

x = −(−2)/2 + √((−2)²/4 − 5) = 1 + 2i

and

x = −(−2)/2 − √((−2)²/4 − 5) = 1 − 2i
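The formula x = −p/2 ± √(p²/4 − q) can be coded with cmath.sqrt, which handles a negative radicand; a minimal Python sketch:

```python
import cmath

def quad_roots(p, q):
    """Roots of x^2 + p*x + q = 0; complex and conjugate when p^2/4 - q < 0."""
    d = cmath.sqrt(p * p / 4 - q)
    return -p / 2 + d, -p / 2 - d

r1, r2 = quad_roots(-2, 5)   # x^2 - 2x + 5 = 0  ->  1 + 2i and 1 - 2i
```

For x² − 2x + 5 the radicand is 1 − 5 = −4, so the roots come out as the conjugate pair 1 ± 2i from the example above.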
Mathematical digression (II)
Linear difference equations
First order difference equation with initial value x_0:

x_t = c + φ_1 x_{t−1}

p-th order difference equation with initial value x_0:

x_t = c + φ_1 x_{t−1} + … + φ_p x_{t−p}
A sequence (xt )t=0,1,... that satisfies the difference equation
is called a solution of the difference equation
Examples (diffequation.R)
Mathematical digression (II)
Linear difference equations
We only consider the homogeneous case, i.e. c = 0
The general solution of the first-order difference equation

x_t = φ_1 x_{t−1}

is

x_t = A φ_1^t

with arbitrary constant A, since x_t = Aφ_1^t = φ_1 · Aφ_1^{t−1} = φ_1 x_{t−1}
The constant is determined by the initial condition, A = x_0
The sequence x_t = Aφ_1^t is convergent if and only if |φ_1| < 1
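Iterating the recursion confirms the closed-form solution x_t = x_0 φ_1^t; a small Python sketch (diffequation.R plays this role in the course):

```python
def iterate_ar1(phi, x0, steps):
    """Iterate x_t = phi * x_{t-1}; the closed form is x_t = x0 * phi**t."""
    path = [x0]
    for _ in range(steps):
        path.append(phi * path[-1])
    return path

p = iterate_ar1(0.5, 8.0, 3)   # -> [8.0, 4.0, 2.0, 1.0]
```

With |φ_1| = 0.5 < 1 the iterates shrink geometrically towards zero, as the convergence condition predicts.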
Mathematical digression (II)
Linear difference equations
Solution of the p-th order difference equation

x_t = φ_1 x_{t−1} + … + φ_p x_{t−p}

Let x_t = A z^{−t}; then

A z^{−t} = φ_1 A z^{−(t−1)} + … + φ_p A z^{−(t−p)}
z^{−t} = φ_1 z^{−(t−1)} + … + φ_p z^{−(t−p)}

and thus

1 − φ_1 z − … − φ_p z^p = 0

Characteristic polynomial, characteristic equation
Mathematical digression (II)
Linear difference equations
There are p (possibly complex, possibly nondistinct) solutions
of the characteristic equation
Denote the solutions (called roots) by z_1, …, z_p
If all roots are real and distinct, then

x_t = A_1 z_1^{−t} + … + A_p z_p^{−t}

is a solution of the homogeneous difference equation
If there are complex roots, the solution is oscillating
The constants A_1, …, A_p can be determined from p initial conditions
(x_0, x_1, …, x_{p−1})
Mathematical digression (II)
Linear difference equations
Stability condition: The linear difference equation

x_t = φ_1 x_{t−1} + … + φ_p x_{t−p}

is stable (i.e. convergent) if and only if all roots of the
characteristic polynomial

1 − φ_1 z − … − φ_p z^p = 0

are outside the unit circle, i.e. |z_i| > 1 for all i = 1, …, p
In R, the stability condition can be checked easily using the
commands polyroot (base package) or ArmaRoots (fArma package)
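The slides suggest polyroot in R; for the second-order case the roots of 1 − φ_1 z − φ_2 z² can also be found with the quadratic formula. A Python sketch of the stability check:

```python
import cmath

def ar2_stable(phi1, phi2):
    """True if both roots of 1 - phi1*z - phi2*z^2 = 0 lie outside
    the unit circle (|z| > 1)."""
    if phi2 == 0:
        return True if phi1 == 0 else abs(1 / phi1) > 1
    d = cmath.sqrt(phi1 * phi1 + 4 * phi2)
    z1 = (-phi1 + d) / (2 * phi2)   # quadratic formula for phi2*z^2 + phi1*z - 1 = 0
    z2 = (-phi1 - d) / (2 * phi2)
    return abs(z1) > 1 and abs(z2) > 1
```

For example, x_t = 0.5x_{t−1} + 0.3x_{t−2} is stable, while x_t = 0.5x_{t−1} + 0.6x_{t−2} has a root inside the unit circle and is not.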
ARMA models
Definition
Definition: ARMA process
Let (ε_t)_{t∈T} be a white noise process; the stochastic process

X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

with φ_p, θ_q ≠ 0 is called an ARMA(p, q) process
AutoRegressive Moving Average process
ARMA processes are important since every stationary process can be
approximated by an ARMA process
ARMA models
Lag operator and lag polynomial
The lag operator is a convenient notational tool
The lag operator L shifts the time index of a stochastic process

L(X_t)_{t∈T} = (X_{t−1})_{t∈T}
L X_t = X_{t−1}

Rules

L² X_t = L(L X_t) = X_{t−2}
L^n X_t = X_{t−n}
L⁻¹ X_t = X_{t+1}
L⁰ X_t = X_t
ARMA models
Lag operator and lag polynomial
Lag polynomial

A(L) = a_0 + a_1 L + a_2 L² + … + a_p L^p

Example: Let A(L) = 1 − 0.5L and B(L) = 1 + 4L², then

C(L) = A(L)B(L) = (1 − 0.5L)(1 + 4L²) = 1 − 0.5L + 4L² − 2L³

Lag polynomials can be treated in the same way as ordinary
polynomials
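Because lag polynomials behave like ordinary polynomials, their product is just a coefficient convolution; a Python sketch:

```python
def poly_mul(a, b):
    """Multiply lag polynomials given as coefficient lists [a0, a1, ...],
    where a[k] is the coefficient of L^k."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# (1 - 0.5L)(1 + 4L^2) = 1 - 0.5L + 4L^2 - 2L^3
c = poly_mul([1, -0.5], [1, 0, 4])
```

The result [1, −0.5, 4, −2] matches the hand calculation above.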
ARMA models
Lag operator and lag polynomial
Define the lag polynomials

Φ(L) = 1 − φ_1 L − … − φ_p L^p
Θ(L) = 1 + θ_1 L + … + θ_q L^q

The ARMA(p, q) process can be written compactly as

Φ(L) X_t = Θ(L) ε_t

Important special cases

MA(q) process: X_t = ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}
AR(1) process: X_t = φ_1 X_{t−1} + ε_t
AR(p) process: X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t
ARMA models
MA(q) process
The MA(q) process is

X_t = Θ(L) ε_t
X_t = ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

with ε_t ~ NID(0, σ²)
Expectation function

E(X_t) = E(ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q})
       = E(ε_t) + θ_1 E(ε_{t−1}) + … + θ_q E(ε_{t−q})
       = 0
ARMA models
MA(q) process
Autocovariance function

γ(s, t) = E[(ε_s + θ_1 ε_{s−1} + … + θ_q ε_{s−q})(ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q})]
        = E[ε_s ε_t + θ_1 ε_s ε_{t−1} + θ_2 ε_s ε_{t−2} + … + θ_q ε_s ε_{t−q}
          + θ_1 ε_{s−1} ε_t + θ_1² ε_{s−1} ε_{t−1} + θ_1 θ_2 ε_{s−1} ε_{t−2} + … + θ_1 θ_q ε_{s−1} ε_{t−q}
          + …
          + θ_q ε_{s−q} ε_t + θ_1 θ_q ε_{s−q} ε_{t−1} + θ_2 θ_q ε_{s−q} ε_{t−2} + … + θ_q² ε_{s−q} ε_{t−q}]

The expectations of the cross products are

E(ε_s ε_t) = 0   for s ≠ t
           = σ²  for s = t
ARMA models
MA(q) process
Define θ_0 = 1; then

γ(t, t) = σ² Σ_{i=0}^{q} θ_i²
γ(t − 1, t) = σ² Σ_{i=0}^{q−1} θ_i θ_{i+1}
γ(t − 2, t) = σ² Σ_{i=0}^{q−2} θ_i θ_{i+2}
⋮
γ(t − q, t) = σ² θ_0 θ_q = σ² θ_q
γ(s, t) = 0 for s < t − q

Hence, MA(q) processes are always stationary
Simulation of MA(q) processes (maqsim.R)
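The autocovariances γ(t − h, t) = σ² Σ_i θ_i θ_{i+h} are easy to evaluate in code; a Python sketch (maqsim.R simulates these processes in R):

```python
def ma_acov(theta, sigma2, h):
    """gamma(h) = sigma^2 * sum_i theta_i * theta_{i+h} for an MA(q),
    with theta = [theta_1, ..., theta_q] and theta_0 = 1."""
    t = [1.0] + list(theta)
    h = abs(h)
    if h >= len(t):          # gamma(h) = 0 beyond lag q
        return 0.0
    return sigma2 * sum(t[i] * t[i + h] for i in range(len(t) - h))

g0 = ma_acov([0.7], 1.0, 0)   # MA(1): gamma(0) = 1 + 0.7^2 = 1.49
g1 = ma_acov([0.7], 1.0, 1)   # gamma(1) = theta_1 = 0.7
g2 = ma_acov([0.7], 1.0, 2)   # gamma(2) = 0
```

The cutoff of γ(h) after lag q is exactly what makes the empirical correlogram a useful diagnostic for MA order selection.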
ARMA models
AR(1) process
The AR(1) process is

Φ(L) X_t = ε_t
(1 − φ_1 L) X_t = ε_t
X_t = φ_1 X_{t−1} + ε_t

with ε_t ~ NID(0, σ²)
Expectation and variance function
[6]
Stability condition: AR(1) processes are stable if |φ_1| < 1
ARMA models
AR(1) process
Stationarity: Stable AR(1) processes are weakly stationary if
[7]

E(X_0) = 0
Var(X_0) = σ² / (1 − φ_1²)

Nonstationary stable processes converge towards stationarity
[8]
It is common parlance to call stable processes stationary
Covariance function of the stationary AR(1) process
[9]
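The covariance function referenced above ([9]) is derived in class; the standard result for the stationary AR(1) is γ(h) = σ² φ_1^|h| / (1 − φ_1²), so ρ(h) = φ_1^|h|. A Python sketch evaluating it (labelled as the textbook formula, not the slide's own derivation):

```python
def ar1_acov(phi, sigma2, h):
    """Autocovariance of a stationary AR(1):
    gamma(h) = sigma^2 * phi**|h| / (1 - phi**2)  (standard result)."""
    return sigma2 * phi ** abs(h) / (1 - phi * phi)

g0 = ar1_acov(0.5, 1.0, 0)           # 1 / (1 - 0.25) = 4/3
rho1 = ar1_acov(0.5, 1.0, 1) / g0    # rho(1) = phi = 0.5
```

Unlike the MA(q) case, γ(h) never cuts off; it only decays geometrically.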
ARMA models
AR(p) process
The AR(p) process is

Φ(L) X_t = ε_t
X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t

with ε_t ~ NID(0, σ²)
Assumption: ε_t is independent from X_{t−1}, X_{t−2}, … (innovations)
Expectation function
[10]
The covariance function is complicated (ar2autocov.R)
ARMA models
AR(p) process
AR(p) processes are stable if all roots of the characteristic equation

Φ(z) = 0

are larger than 1 in absolute value, |z_i| > 1 for i = 1, …, p
An AR(p) process is weakly stationary if the joint distribution of the
p initial values (X_0, X_{−1}, …, X_{−(p−1)}) is appropriate
Stable AR(p) processes converge towards stationarity;
they are often called stationary
Simulation of AR(p) processes (arpsim.R)
ARMA models
Invertibility
AR and MA processes can be inverted (into each other)
Example: Consider the stable AR(1) process with |φ_1| < 1

X_t = φ_1 X_{t−1} + ε_t
    = φ_1 (φ_1 X_{t−2} + ε_{t−1}) + ε_t
    = φ_1² X_{t−2} + φ_1 ε_{t−1} + ε_t
    ⋮
    = φ_1^n X_{t−n} + φ_1^{n−1} ε_{t−(n−1)} + … + φ_1² ε_{t−2} + φ_1 ε_{t−1} + ε_t
ARMA models
Invertibility
Since |φ_1| < 1,

X_t = Σ_{i=0}^{∞} φ_1^i ε_{t−i}
    = ε_t + ψ_1 ε_{t−1} + ψ_2 ε_{t−2} + …

with ψ_i = φ_1^i
A stable AR(1) process can be written as an MA(∞) process
(the same is true for stable AR(p) processes)
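With a zero starting value the finite-sample AR(1) path equals the truncated MA(∞) sum exactly, which makes the representation easy to verify numerically; a Python sketch:

```python
import random

def ar1_path(phi, eps):
    """X_t = phi*X_{t-1} + eps_t with X_{-1} = 0."""
    x, path = 0.0, []
    for e in eps:
        x = phi * x + e
        path.append(x)
    return path

rng = random.Random(3)
phi = 0.6
eps = [rng.gauss(0, 1) for _ in range(200)]
x = ar1_path(phi, eps)
# MA form: X_t = sum_i phi^i * eps_{t-i}, truncated at the available history
t = 199
ma_sum = sum(phi ** i * eps[t - i] for i in range(t + 1))
```

Both routes produce the same value of X_t up to floating-point accumulation.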
ARMA models
Invertibility
Using lag polynomials this can be written as

(1 − φ_1 L) X_t = ε_t
X_t = (1 − φ_1 L)⁻¹ ε_t
X_t = Σ_{i=0}^{∞} (φ_1 L)^i ε_t

General compact and elegant notation

Φ(L) X_t = ε_t
X_t = (Φ(L))⁻¹ ε_t = Ψ(L) ε_t
ARMA models
Invertibility
MA(q) can be written as AR(∞) if all roots of Θ(z) = 0 are larger
than 1 in absolute value (invertibility condition)
Example: MA(1) with |θ_1| < 1; from

X_t = ε_t + θ_1 ε_{t−1}
θ_1 X_{t−1} = θ_1 ε_{t−1} + θ_1² ε_{t−2}

we find X_t = θ_1 X_{t−1} + ε_t − θ_1² ε_{t−2}
Repeated substitution of the ε_{t−i} terms yields

X_t = Σ_{i=1}^{∞} π_i X_{t−i} + ε_t

with π_i = (−1)^{i+1} θ_1^i
ARMA models
Invertibility
Summary
ARMA(p, q) processes are stable if all roots of

Φ(z) = 0

are larger than 1 in absolute value
ARMA(p, q) processes are invertible if all roots of

Θ(z) = 0

are larger than 1 in absolute value
ARMA models
Invertibility
Sometimes (e.g. for proofs), it is useful to write an ARMA(p, q)
process either as AR(∞) or as MA(∞)
ARMA(p, q) can be written as AR(∞) or MA(∞):

Φ(L) X_t = Θ(L) ε_t
X_t = (Φ(L))⁻¹ Θ(L) ε_t
(Θ(L))⁻¹ Φ(L) X_t = ε_t
ARMA models
Deterministic components
Until now we only considered processes with zero expectation
Many processes have both a zero-expectation stochastic
component (Y_t) and a non-zero deterministic component (D_t)
Examples:
linear trend D_t = a + bt
exponential trend D_t = ab^t
seasonal patterns
Let (X_t)_{t∈ℤ} be a stochastic process with deterministic component D_t
and define Y_t = X_t − D_t
ARMA models
Deterministic components
Then E(Y_t) = 0 and

Cov(Y_t, Y_s) = E[(Y_t − E(Y_t))(Y_s − E(Y_s))]
             = E[(X_t − D_t − E(X_t − D_t))(X_s − D_s − E(X_s − D_s))]
             = E[(X_t − E(X_t))(X_s − E(X_s))]
             = Cov(X_t, X_s)

The covariance function does not depend on the deterministic
component
To derive the covariance function of a stochastic process, simply drop
the deterministic component
ARMA models
Deterministic components
Special case: D_t = μ_t = μ
ARMA(p, q) process with constant (non-zero) expectation μ:

X_t − μ = φ_1 (X_{t−1} − μ) + … + φ_p (X_{t−p} − μ)
          + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

The process can also be written as

X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

where c = μ(1 − φ_1 − … − φ_p)
ARMA models
Deterministic components
Wold's representation theorem: Every stationary stochastic process
(X_t)_{t∈T} can be represented as

X_t = Σ_{h=0}^{∞} Ψ_h ε_{t−h} + D_t

with Ψ_0 = 1, Σ_{h=0}^{∞} Ψ_h² < ∞ and ε_t white noise with variance σ² > 0
Stationary stochastic processes can be written as a sum of a
deterministic process and an MA(∞) process
Often, low order ARMA(p, q) processes can approximate MA(∞)
processes well
ARMA models
Linear processes and filter
Definition: Linear process
Let (ε_t)_{t∈ℤ} be a white noise process; a stochastic process (X_t)_{t∈ℤ} is called
linear if it can be written as

X_t = Σ_{h=−∞}^{∞} Ψ_h ε_{t−h} = Ψ(L) ε_t

where the coefficients are absolutely summable, i.e. Σ_{h=−∞}^{∞} |Ψ_h| < ∞.
The lag polynomial Ψ(L) is called a (linear) filter
ARMA models
Linear processes and filter
Some special filters
Change from previous period (difference filter)

Ψ(L) = 1 − L

Change from last year (for quarterly or monthly data)

Ψ(L) = 1 − L⁴
Ψ(L) = 1 − L¹²

Elimination of seasonal influences (quarterly data)

Ψ(L) = (1 + L + L² + L³)/4
Ψ(L) = 0.125L⁻² + 0.25L⁻¹ + 0.25 + 0.25L + 0.125L²
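Applying a one-sided filter is a rolling weighted sum of current and lagged values; a minimal Python sketch:

```python
def apply_filter(psi, x):
    """Apply a one-sided filter Psi(L) = psi[0] + psi[1]*L + ... to a series;
    output starts once all required lags are available."""
    k = len(psi) - 1
    return [sum(p * x[t - j] for j, p in enumerate(psi)) for t in range(k, len(x))]

x = [10, 12, 15, 19]
dx = apply_filter([1, -1], x)   # difference filter 1 - L  ->  [2, 3, 4]
```

Two-sided filters like 0.125L⁻² + … + 0.125L² would additionally use leads, so they lose observations at both ends of the sample.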
ARMA models
Linear processes and filter
Hodrick-Prescott filter (important tool in empirical macroeconomics)
Decompose a time series (X_t) into a long-term growth component
(G_t) and a short-term cyclical component (C_t)

X_t = G_t + C_t

Trade-off between goodness-of-fit and smoothness of G_t
Minimize the criterion function

Σ_{t=1}^{T} (X_t − G_t)² + λ Σ_{t=2}^{T−1} [(G_{t+1} − G_t) − (G_t − G_{t−1})]²

with respect to G_t for given smoothness parameter λ
ARMA models
Linear processes and filter
The FOCs of the minimization problem are

(G_1, …, G_T)′ = A (X_1, …, X_T)′

where A = (I + λK′K)⁻¹ with

K =
  1 −2  1  0 …  0  0  0
  0  1 −2  1 …  0  0  0
  ⋮              ⋱
  0  0  0  0 …  1 −2  1
ARMA models
Linear processes and filter
The HP filter is a linear filter
Typical values for the smoothing parameter λ:

λ = 10      annual data
λ = 1600    quarterly data
λ = 14400   monthly data

Implementation in R (code by Olaf Posch)
Empirical examples (hpfilter.R)
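The FOC system G = (I + λK′K)⁻¹X can be solved directly for small T; a self-contained Python sketch (the course provides an R implementation by Olaf Posch), building K′K row by row and solving with Gaussian elimination:

```python
def hp_filter(x, lam):
    """Hodrick-Prescott trend G solving (I + lam*K'K) G = X, with K the
    (T-2) x T second-difference matrix."""
    T = len(x)
    # M = I + lam * K'K, accumulated from the rows (..., 1, -2, 1, ...) of K
    M = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for r in range(T - 2):
        row = [0.0] * T
        row[r], row[r + 1], row[r + 2] = 1.0, -2.0, 1.0
        for i in range(r, r + 3):
            for j in range(r, r + 3):
                M[i][j] += lam * row[i] * row[j]
    # Solve M g = x by Gaussian elimination with partial pivoting
    A = [M[i][:] + [float(x[i])] for i in range(T)]
    for c in range(T):
        p = max(range(c, T), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, T):
            f = A[r][c] / A[c][c]
            for k in range(c, T + 1):
                A[r][k] -= f * A[c][k]
    g = [0.0] * T
    for i in range(T - 1, -1, -1):
        g[i] = (A[i][T] - sum(A[i][j] * g[j] for j in range(i + 1, T))) / A[i][i]
    return g
```

A useful sanity check: for an exactly linear series the second-difference penalty is zero, so the extracted trend coincides with the series for any λ.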
Estimation of ARMA models
The estimation problem
Problem: The parameters φ_1, …, φ_p, θ_1, …, θ_q, σ² of an ARMA(p, q)
process are usually unknown
They have to be estimated from an observed time series X_1, …, X_T
Standard estimation methods:
Least squares (OLS)
Maximum likelihood (ML)
Assumption: the lag orders p and q are known
Estimation of ARMA models
Least squares estimation of AR(p) models
The AR(p) model with non-zero constant expectation

X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t

can be written in matrix notation

⎡ X_{p+1} ⎤   ⎡ 1  X_p      X_{p−1}  …  X_1     ⎤ ⎡ c   ⎤   ⎡ ε_{p+1} ⎤
⎢ X_{p+2} ⎥ = ⎢ 1  X_{p+1}  X_p      …  X_2     ⎥ ⎢ φ_1 ⎥ + ⎢ ε_{p+2} ⎥
⎢    ⋮    ⎥   ⎢ ⋮     ⋮        ⋮     ⋱    ⋮     ⎥ ⎢  ⋮  ⎥   ⎢    ⋮    ⎥
⎣ X_T     ⎦   ⎣ 1  X_{T−1}  X_{T−2}  …  X_{T−p} ⎦ ⎣ φ_p ⎦   ⎣ ε_T     ⎦

Compact notation: y = Xβ + u
Estimation of ARMA models
Least squares estimation of AR(p) models
The standard least squares estimator is

β̂ = (X′X)⁻¹ X′y

The matrix of exogenous variables X is stochastic
⇒ usual results for OLS regression do not hold
But: There is no contemporaneous correlation between the error term
and the exogenous variables
Hence, the OLS estimators are consistent and asymptotically efficient
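For p = 1 the OLS estimator reduces to a simple regression of X_t on X_{t−1}; a Python sketch on a simulated AR(1) with true c = 1 and φ_1 = 0.5:

```python
import random

def fit_ar1_ols(x):
    """OLS for X_t = c + phi*X_{t-1} + e_t via simple-regression formulas."""
    y, z = x[1:], x[:-1]
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    phi = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / \
          sum((zi - mz) ** 2 for zi in z)
    return my - phi * mz, phi   # (c_hat, phi_hat)

rng = random.Random(1)
x = [0.0]
for _ in range(5000):
    x.append(1.0 + 0.5 * x[-1] + rng.gauss(0, 1))
c_hat, phi_hat = fit_ar1_ols(x)
```

Consistency shows up numerically: with T = 5000 the estimates land close to the true values despite the stochastic regressor.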
Estimation of ARMA models
Least squares estimation of ARMA models
Solve the ARMA equation

X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

for ε_t,

ε_t = X_t − c − φ_1 X_{t−1} − … − φ_p X_{t−p} − θ_1 ε_{t−1} − … − θ_q ε_{t−q}

Define the residuals as functions of the unknown parameters

ε̂_t(d, f_1, …, f_p, g_1, …, g_q) = X_t − d − f_1 X_{t−1} − … − f_p X_{t−p}
                                   − g_1 ε̂_{t−1} − … − g_q ε̂_{t−q}
Estimation of ARMA models
Least squares estimation of ARMA models
Define the sum of squared residuals

S(d, f_1, …, f_p, g_1, …, g_q) = Σ_{t=1}^{T} (ε̂_t(d, f_1, …, f_p, g_1, …, g_q))²

The least squares estimators are

(ĉ, φ̂_1, …, φ̂_p, θ̂_1, …, θ̂_q) = arg min S(d, f_1, …, f_p, g_1, …, g_q)

Since the residuals are defined recursively, one needs starting values
ε_0, …, ε_{−q+1} and X_0, …, X_{−p+1} to calculate ε̂_1
Easiest way: Set all starting values to zero (→ conditional estimation)
Estimation of ARMA models
Least squares estimation of ARMA models
The first order conditions are a nonlinear equation system
which cannot be solved easily
Minimization by standard numerical methods
(implemented in all usual statistical packages)
Either solve the nonlinear first order conditions equation system or
minimize S directly
Simple special case: ARMA(1, 1)
arma11.R
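The recursive residuals for the ARMA(1, 1) special case can be sketched as follows (zero starting values, i.e. conditional estimation; arma11.R is the course's R version):

```python
def arma11_ssr(x, d, f1, g1):
    """Conditional sum of squared residuals for
    X_t = d + f1*X_{t-1} + e_t + g1*e_{t-1},
    with starting values X_0 = e_0 = 0."""
    e_prev, x_prev, ssr = 0.0, 0.0, 0.0
    for xt in x:
        e = xt - d - f1 * x_prev - g1 * e_prev
        ssr += e * e
        x_prev, e_prev = xt, e
    return ssr
```

A numerical minimizer would search over (d, f1, g1); at parameters near the truth the criterion is visibly smaller than at arbitrary ones.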
Estimation of ARMA models
Maximum likelihood estimation
Additional assumption: The innovations ε_t are normally distributed
Implication: ARMA processes are Gaussian
The joint distribution of X_1, …, X_T is multivariate normal,

X = (X_1, …, X_T)′ ~ N(μ, Σ)
Estimation of ARMA models
Maximum likelihood estimation
Expectation vector

μ = E(X_1, …, X_T)′ = (c/(1 − φ_1 − … − φ_p), …, c/(1 − φ_1 − … − φ_p))′

Covariance matrix

Σ = Cov(X_1, …, X_T)′ =
  ⎡ γ(0)     γ(1)     …  γ(T−1) ⎤
  ⎢ γ(1)     γ(0)     …  γ(T−2) ⎥
  ⎢   ⋮        ⋮      ⋱     ⋮   ⎥
  ⎣ γ(T−1)  γ(T−2)   …  γ(0)   ⎦
Estimation of ARMA models
Maximum likelihood estimation
The expectation vector μ and the covariance matrix Σ contain all
unknown parameters ψ = (φ_1, …, φ_p, θ_1, …, θ_q, c, σ²)
The likelihood function is

L(ψ; X) = (2π)^{−T/2} (det Σ)^{−1/2} exp(−(1/2)(X − μ)′ Σ⁻¹ (X − μ))

and the loglikelihood function is

ln L(ψ; X) = −(T/2) ln(2π) − (1/2) ln(det Σ) − (1/2)(X − μ)′ Σ⁻¹ (X − μ)

The ML estimators are ψ̂ = arg max ln L(ψ; X)
Estimation of ARMA models
Maximum likelihood estimation
The loglikelihood function has to be maximized by numerical methods
Standard properties of ML estimators:
1. consistency
2. asymptotic efficiency
3. asymptotically jointly normally distributed
4. the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest
rate spread (arma33.R)
Estimation of ARMA models
Hypothesis tests
Since the estimation method is maximum likelihood,
the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses

H_0: g(ψ) = 0
H_1: not H_0

where g(ψ) is an m-valued function of the parameters
Example: If H_0: φ_1 = 0 then m = 1 and g(ψ) = φ_1
Estimation of ARMA models
Hypothesis tests
Likelihood ratio test statistic

LR = 2(ln L(ψ̂_ML) − ln L(ψ̂_R))

where ψ̂_ML and ψ̂_R are the unrestricted and restricted estimators
Under the null hypothesis

LR →d χ²_m

and H_0 is rejected at significance level α if LR > χ²_{m;1−α}
Disadvantage: Two models must be estimated
Estimation of ARMA models
Hypothesis tests
For the Wald test we only consider g(ψ) = ψ − ψ_0, i.e.

H_0: ψ = ψ_0
H_1: not H_0

Test statistic

W = (ψ̂ − ψ_0)′ [Côv(ψ̂)]⁻¹ (ψ̂ − ψ_0)

If the null hypothesis is true, then W →d χ²_m
The asymptotic covariance matrix can be estimated consistently as
Côv(ψ̂) = −H⁻¹, where H is the Hessian matrix returned by the
maximization procedure
Estimation of ARMA models
Hypothesis tests
Test example 1:
H0 : 1 = 0
H1 : 1 6= 0
Test example 2
H0 : = 0
H1 : not H0
Illustration (arma33.R)
Estimation of ARMA models
Model selection
Usually, the lag orders p and q of an ARMA model are unknown
Trade-off: Goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation
$AIC = \underbrace{\ln \hat\sigma^2}_{\text{goodness-of-fit}} + \underbrace{2\,(p + q + 1)/T}_{\text{penalty}}$
Choose the model with the smallest AIC
Estimation of ARMA models
Model selection
Bayesian information criterion BIC (Schwarz information criterion)
$BIC = \ln \hat\sigma^2 + (p + q + 1)\ln T / T$
Hannan-Quinn information criterion
$HQ = \ln \hat\sigma^2 + 2\,(p + q + 1)\ln(\ln T)/T$
Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R)
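The three criteria are easy to compute once $\hat\sigma^2$ is available. The course scripts are in R; the following is a minimal Python stand-in whose helper name `info_criteria` is illustrative, mirroring the three formulas above:

```python
import math

def info_criteria(sigma2_hat, p, q, T):
    """AIC, BIC and HQ for an ARMA(p, q) model with non-zero expectation;
    p + q + 1 parameters enter the penalty, as on the slides."""
    k = p + q + 1
    aic = math.log(sigma2_hat) + 2 * k / T
    bic = math.log(sigma2_hat) + k * math.log(T) / T
    hq = math.log(sigma2_hat) + 2 * k * math.log(math.log(T)) / T
    return aic, bic, hq

# for T = 500, ln T > 2, so BIC penalizes extra parameters harder than AIC
aic_small, bic_small, hq_small = info_criteria(1.0, 1, 0, 500)
aic_big, bic_big, hq_big = info_criteria(1.0, 4, 4, 500)
```

Since $2 < 2\ln(\ln T) < \ln T$ for moderate T, the HQ penalty sits between the AIC and BIC penalties.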
Estimation of ARMA models
Model selection
Another illustration: The true model is ARMA(2, 1) with $X_t = 0.5X_{t-1} + 0.3X_{t-2} + \varepsilon_t + 0.7\varepsilon_{t-1}$; 1000 samples of size n = 500 were generated; the table shows the model orders p and q as selected by AIC and BIC
[Table: counts of the orders (p, q) selected by AIC and by BIC across the 1000 replications, with p = 0, ..., 5 in rows and q = 0, ..., 5 in columns; the BIC counts concentrate at the true order (p, q) = (2, 1), while the AIC counts spread over larger models.]
Integrated processes
Difference operator
Define the difference operator
$\Delta = 1 - L,$
then
$\Delta X_t = X_t - X_{t-1}$
Second order differences
$\Delta^2 = \Delta(\Delta) = (1 - L)^2 = 1 - 2L + L^2$
Higher orders $\Delta^n$ are defined in the same way; note that $\Delta^n \neq 1 - L^n$
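`numpy.diff` applies $\Delta$ directly, and its `n=2` option applies $\Delta^2$; a quick illustrative sketch (not from the course material) also shows $\Delta^2 \neq 1 - L^2$:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])  # x_t = t^2

d1 = np.diff(x)        # Delta x_t = x_t - x_{t-1}
d2 = np.diff(x, n=2)   # Delta^2 x_t = x_t - 2 x_{t-1} + x_{t-2}

# Delta^2 is NOT the same as 1 - L^2, i.e. x_t - x_{t-2}:
not_d2 = x[2:] - x[:-2]
```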
Integrated processes
Definition
Definition: Integrated process
A stochastic process is called integrated of order 1 if
$\Delta X_t = \mu + \psi(L)\varepsilon_t$
where $\varepsilon_t$ is white noise, $\psi(1) \neq 0$, and $\sum_{j=0}^{\infty} j\,|\psi_j| < \infty$
Common notation: $X_t \sim I(1)$
I(1) processes are also called difference stationary or unit root processes
Stochastic and deterministic trends
Trend stationary processes are not I(1) (since $\psi(1) = 0$)
Integrated processes
Definition
Stationary processes are sometimes called I(0)
Higher order integrations are possible, e.g.
$X_t \sim I(2) \quad\Rightarrow\quad \Delta\Delta X_t \sim I(0)$
In general, $X_t \sim I(d)$ means that $\Delta^d X_t \sim I(0)$
Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)
Integrated processes
Definition
Example 1: The random walk with drift, $X_t = b + X_{t-1} + \varepsilon_t$, is I(1) because
$\Delta X_t = X_t - X_{t-1} = b + \varepsilon_t = b + \psi(L)\varepsilon_t$
where $\psi_0 = 1$ and $\psi_j = 0$ for $j \neq 0$
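A short simulation (a Python stand-in for the course's R scripts; the drift b = 0.5 and the seed are arbitrary choices) confirms that differencing a random walk with drift leaves exactly the stationary series $b + \varepsilon_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
b = 0.5                       # drift
eps = rng.standard_normal(10_000)
# X_0 = 0, X_t = b + X_{t-1} + eps_t, built via cumulative sums
x = np.concatenate(([0.0], np.cumsum(b + eps)))

dx = np.diff(x)               # Delta X_t = b + eps_t: stationary around b
```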
Integrated processes
Definition
Example 2: The trend stationary process, $X_t = a + bt + \varepsilon_t$, is not I(1) because
$\Delta X_t = b + \varepsilon_t - \varepsilon_{t-1} = b + \psi(L)\varepsilon_t$
with $\psi_0 = 1$, $\psi_1 = -1$ and $\psi_j = 0$ for all other j
Integrated processes
Definition
Example 3: The AR(2) process
$X_t = b + (1 + \phi)X_{t-1} - \phi X_{t-2} + \varepsilon_t$
$(1 - \phi L)(1 - L)X_t = b + \varepsilon_t$
is I(1) if $|\phi| < 1$ because $\Delta X_t = \psi(L)(b + \varepsilon_t)$ with
$\psi(L) = (1 - \phi L)^{-1} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \phi^4 L^4 + \ldots$
and thus $\psi(1) = \sum_{i=0}^{\infty} \phi^i = \frac{1}{1 - \phi} \neq 0$. The roots of the characteristic equation are $z = 1$ and $z = 1/\phi$
Integrated processes
Definition
Example 4: The process
$X_t = 0.5X_{t-1} - 0.4X_{t-2} + \varepsilon_t$
is a stationary (stable) zero expectation AR(2) process; the process
$Y_t = a + bt + X_t$
is trend stationary and I(0) since
$\Delta Y_t = b + \Delta X_t$
with $\Delta X_t = \psi(L)\varepsilon_t = (1 - L)\left(1 - 0.5L + 0.4L^2\right)^{-1}\varepsilon_t$
and therefore $\psi(1) = 0$ (i0andi1.R)
Integrated processes
Definition
Definition: ARIMA process
Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise process; the stochastic process $(X_t)_{t\in\mathbb{Z}}$ is called integrated autoregressive moving-average process of the orders p, d and q, or ARIMA(p, d, q), if $\Delta^d X_t$ is an ARMA(p, q) process
$\phi(L)\Delta^d X_t = \theta(L)\varepsilon_t$
For $d > 0$ the process is nonstationary (I(d)) even if all roots of $\phi(z) = 0$ are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)
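The simulation idea of arimapdqsim.R — generate the stationary ARMA part, then invert $\Delta$ with cumulative sums — can be sketched for the ARIMA(1, d, 1) case as follows (a Python stand-in; the function name `arima_111_sim` and all parameter values are illustrative):

```python
import numpy as np

def arima_111_sim(phi, theta, T, rng, d=1):
    """Simulate an ARIMA(1, d, 1) path: ARMA(1,1) recursion, then
    integrate d times by cumulative summation (inverting Delta)."""
    eps = rng.standard_normal(T + 1)
    w = np.zeros(T + 1)
    for t in range(1, T + 1):
        w[t] = phi * w[t - 1] + eps[t] + theta * eps[t - 1]  # ARMA(1,1)
    x = w[1:]
    for _ in range(d):
        x = np.cumsum(x)  # each cumsum undoes one application of Delta
    return x

rng = np.random.default_rng(1)
x = arima_111_sim(0.5, 0.3, T=300, rng=rng)
```

Applying `np.diff` to the result d times recovers (up to the lost initial values) the stationary ARMA part.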
Integrated processes
Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: The forecasting error variance is bounded
Stochastic trend: The forecasting error variance is unbounded
Illustrations
i0andi1.R
Integrated processes
Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 2: Spurious regression
OLS regressions will show spurious relationships between
time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends,
but it does not help if the series are integrated
Illustrations
spurious1.R
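A sketch in the spirit of spurious1.R (a Python stand-in; seed and sample size are arbitrary): regress one random walk on another, independent one, and inspect the t statistic of the slope, which tends to be far above conventional critical values even though the series are unrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
x = np.cumsum(rng.standard_normal(T))   # two independent random walks
y = np.cumsum(rng.standard_normal(T))

# OLS of y on a constant and x
X = np.column_stack([np.ones(T), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (T - 2)
se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta[1] / se_slope             # "significant" despite independence
```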
Integrated processes
Integrated processes and parameter estimation
OLS estimators (and ML estimators) are consistent and
asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case
$X_t = \phi_1 X_{t-1} + \varepsilon_t$
with $\varepsilon_t \sim NID(0, 1)$ and $X_0 = 0$
The AR(1) process is stationary if $|\phi_1| < 1$ and has a unit root if $|\phi_1| = 1$
Integrated processes
Integrated processes and parameter estimation
The usual OLS estimator of $\phi_1$ is
$\hat\phi_1 = \frac{\sum_{t=1}^{T} X_t X_{t-1}}{\sum_{t=1}^{T} X_{t-1}^2}$
What does the distribution of $\hat\phi_1$ look like?
Influence of $\phi_1$ and T
Consistency?
Asymptotic normality?
Illustration (phihat.R)
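A sketch in the spirit of phihat.R (a Python stand-in; seed and parameters are arbitrary): compute $\hat\phi_1$ for a stationary AR(1) and for a random walk.

```python
import numpy as np

def phi_hat(x):
    """OLS estimator of phi_1 in X_t = phi_1 X_{t-1} + eps_t (no intercept)."""
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

rng = np.random.default_rng(3)
T, phi1 = 5000, 0.5
eps = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi1 * x[t - 1] + eps[t]

est_stationary = phi_hat(x)             # close to 0.5
est_unit_root = phi_hat(np.cumsum(eps)) # very close to 1 (superconsistency)
```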
Integrated processes
Integrated processes and parameter estimation
Consistency and asymptotic normality for I(0) processes ($|\phi_1| < 1$)
$\text{plim}\,\hat\phi_1 = \phi_1$
$\sqrt{T}\left(\hat\phi_1 - \phi_1\right) \xrightarrow{d} Z \sim N\left(0,\, 1 - \phi_1^2\right)$
Consistency and asymptotic normality for I(1) processes ($\phi_1 = 1$)
$\text{plim}\,\hat\phi_1 = \phi_1$
$T\left(\hat\phi_1 - \phi_1\right) \xrightarrow{d} V$
where V is a nondegenerate, nonnormal random variable
Root-T-consistency and superconsistency
Integrated processes
Unit root tests
It is important to distinguish between trend stationarity and difference stationarity
Test of the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: Linear regression
$X_t = \text{deterministics} + \phi X_{t-1} + \varepsilon_t$
$\Delta X_t = \text{deterministics} + \underbrace{(\phi - 1)}_{=:\gamma} X_{t-1} + \varepsilon_t$
Integrated processes
Unit root tests
Null and alternative hypotheses
$H_0: \phi = 1$ (unit root) vs. $H_1: |\phi| < 1$ (no unit root)
or, equivalently,
$H_0: \gamma = 0$ (unit root) vs. $H_1: \gamma < 0$ (no unit root)
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root
Integrated processes
DF test and ADF test
Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions
$X_t = \phi X_{t-1} + \varepsilon_t$ or $\Delta X_t = \gamma X_{t-1} + \varepsilon_t$
$X_t = a + \phi X_{t-1} + \varepsilon_t$ or $\Delta X_t = a + \gamma X_{t-1} + \varepsilon_t$
$X_t = a + bt + \phi X_{t-1} + \varepsilon_t$ or $\Delta X_t = a + bt + \gamma X_{t-1} + \varepsilon_t$
Assumption for the Dickey-Fuller test: no autocorrelation in $\varepsilon_t$
If there is autocorrelation in $\varepsilon_t$, use the augmented DF test
Integrated processes
DF test and ADF test
Dickey-Fuller regression, case 1: no constant, no trend
$\Delta X_t = \gamma X_{t-1} + \varepsilon_t$
Null and alternative hypotheses
$H_0: \gamma = 0$
$H_1: \gamma < 0$
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero
Integrated processes
DF test and ADF test
Dickey-Fuller regression, case 2: constant, no trend
$\Delta X_t = a + \gamma X_{t-1} + \varepsilon_t$
Null and alternative hypotheses
$H_0: \gamma = 0$ or $H_0: \gamma = 0,\ a = 0$
$H_1: \gamma < 0$ or $H_1: \gamma < 0,\ a \neq 0$
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant
Integrated processes
DF test and ADF test
Dickey-Fuller regression, case 3: constant and trend
$\Delta X_t = a + bt + \gamma X_{t-1} + \varepsilon_t$
Null and alternative hypotheses
$H_0: \gamma = 0$ or $H_0: \gamma = 0,\ b = 0$
$H_1: \gamma < 0$ or $H_1: \gamma < 0,\ b \neq 0$
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process
Integrated processes
DF test and ADF test
Dickey-Fuller test statistics for single hypotheses
$\rho$-test: $T\hat\gamma$
$\tau$-test: $\hat\gamma / \hat\sigma_{\hat\gamma}$
The $\tau$-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, Table B.6)
Integrated processes
DF test and ADF test
The Dickey-Fuller test statistics for the joint hypotheses are
computed in the same way as the usual F -test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F -distribution
There are tables with the correct critical values
(e.g. Hamilton, table B.7)
Illustrations (dftest.R)
Integrated processes
DF test and ADF test
If there is autocorrelation in $\varepsilon_t$ the DF test does not work (dftest.R)
Augmented Dickey-Fuller test (ADF test) regressions:
$\Delta X_t = \zeta_1 \Delta X_{t-1} + \ldots + \zeta_p \Delta X_{t-p} + \gamma X_{t-1} + \varepsilon_t$
$\Delta X_t = a + \zeta_1 \Delta X_{t-1} + \ldots + \zeta_p \Delta X_{t-p} + \gamma X_{t-1} + \varepsilon_t$
$\Delta X_t = a + bt + \zeta_1 \Delta X_{t-1} + \ldots + \zeta_p \Delta X_{t-p} + \gamma X_{t-1} + \varepsilon_t$
The added lagged differences capture the autocorrelation
The number of lags p must be large enough to make $\varepsilon_t$ white noise
The critical values remain the same as in the no-correlation case
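The ADF regression is just OLS on a design matrix of lagged differences plus $X_{t-1}$. A sketch of the $\tau$ statistic for the version with a constant (a Python stand-in; the helper name `adf_tau` and the symbol $\zeta$ for the lag coefficients are illustrative, and the resulting statistic must be compared with Dickey-Fuller, not t, critical values):

```python
import numpy as np

def adf_tau(x, p=1):
    """tau statistic from the ADF regression with a constant:
    DX_t = a + zeta_1 DX_{t-1} + ... + zeta_p DX_{t-p} + gamma X_{t-1} + eps_t."""
    dx = np.diff(x)
    y = dx[p:]                                  # Delta X_t
    cols = [np.ones_like(y), x[p:-1]]           # constant, X_{t-1}
    cols += [dx[p - 1 - j:len(dx) - 1 - j] for j in range(p)]  # lagged diffs
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se_gamma = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_gamma                   # gamma-hat / se(gamma-hat)

rng = np.random.default_rng(7)
tau_rw = adf_tau(np.cumsum(rng.standard_normal(400)), p=2)
```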
Integrated processes
DF test and ADF test
Further interesting topics (but we skip these)
Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity
$H_0: X_t \sim I(0)$
$H_1: X_t \sim I(1)$
Integrated processes
Regression with integrated processes
Spurious regression: If $X_t$ and $Y_t$ are independent but both I(1), then the regression
$Y_t = \alpha + \beta X_t + u_t$
will result in an estimated coefficient $\hat\beta$ that is significantly different from 0 with probability 1 as $T \to \infty$
BUT: The regression
$Y_t = \alpha + \beta X_t + u_t$
may be sensible even though $X_t$ and $Y_t$ are I(1) $\Rightarrow$ Cointegration
Integrated processes
Regression with integrated processes
Definition: Cointegration
Two stochastic processes $(X_t)$ and $(Y_t)$ are cointegrated if both processes are I(1) and there is a constant $\beta$ such that the process $(Y_t - \beta X_t)$ is I(0)
If $\beta$ is known, cointegration can be tested using a standard unit root test on the process $(Y_t - \beta X_t)$
If $\beta$ is unknown, it can be estimated from the linear regression
$Y_t = \alpha + \beta X_t + u_t$
and cointegration is tested using a modified unit root test on the residual process $(\hat u_t)_{t=1,\ldots,T}$
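The unknown-$\beta$ case can be sketched as a two-step procedure (a Python stand-in with an illustrative data-generating process; a real application would compare a residual unit-root statistic with the modified critical values mentioned above, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
common = np.cumsum(rng.standard_normal(T))      # shared I(1) trend
x = common + rng.standard_normal(T)
y = 2.0 * common + rng.standard_normal(T)       # cointegrated with x, beta near 2

# step 1: estimate alpha and beta by OLS
Z = np.column_stack([np.ones(T), x])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(Z, y, rcond=None)
u = y - alpha_hat - beta_hat * x                # residual process

# step 2: check the residuals for a unit root; here only the residual
# AR(1) coefficient is computed, which should be well below 1 if u is I(0)
rho = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
```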
GARCH models
Conditional expectation
Let (X , Y ) be a bivariate random variable with a joint density
function, then
$E(X \mid Y = y) = \int_{-\infty}^{\infty} x\, f_{X|Y=y}(x)\,dx$
is the conditional expectation of X given Y = y
E (X |Y ) denotes a random variable with realization E (X |Y = y )
if the random variable Y realizes as y
Both E (X |Y ) and E (X |Y = y ) are called conditional expectation
GARCH models
Conditional variance
Let (X , Y ) be a bivariate random variable with a joint density
function, then
$Var(X \mid Y = y) = \int_{-\infty}^{\infty} \left(x - E(X \mid Y = y)\right)^2 f_{X|Y=y}(x)\,dx$
is the conditional variance of X given Y = y
Var (X |Y ) denotes a random variable with realization Var (X |Y = y )
if the random variable Y realizes as y
Both Var (X |Y = y ) and Var (X |Y ) are called conditional variance
GARCH models
Rules for conditional expectations
Law of iterated expectations: E (E (X |Y )) = E (X )
If X and Y are independent, then E (X |Y ) = E (X )
The condition can be treated like a constant,
E (XY |Y ) = Y E (X |Y )
The conditional expectation is a linear operator. For $a_1, \ldots, a_n \in \mathbb{R}$
$E\left(\sum_{i=1}^{n} a_i X_i \,\Big|\, Y\right) = \sum_{i=1}^{n} a_i\, E(X_i \mid Y)$
GARCH models
Basics
Some economic time series show volatility clusters, e.g. stock returns,
commodity price changes, inflation rates, . . .
Simple autoregressive models cannot capture volatility clusters since
their conditional variance is constant
Example: Stationary AR(1)-process, $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| < 1$; then
$Var(X_t) = \sigma_X^2 = \frac{\sigma_\varepsilon^2}{1 - \phi^2},$
and the conditional variance is
$Var(X_t \mid X_{t-1}) = \sigma_\varepsilon^2$
GARCH models
Basics
In the following, we will focus on stock returns
Empirical fact: squared (or absolute) returns are positively
autocorrelated
Implication: Returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?
GARCH models
ARCH(1)-process
Definition: ARCH(1)-process
The stochastic process $(X_t)_{t\in\mathbb{Z}}$ is called ARCH(1)-process if
$E(X_t \mid X_{t-1}) = 0$
$Var(X_t \mid X_{t-1}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2$
for all $t \in \mathbb{Z}$, with $\alpha_0, \alpha_1 > 0$
Often, an additional assumption is
$X_t \mid (X_{t-1} = x_{t-1}) \sim N\left(0,\ \alpha_0 + \alpha_1 x_{t-1}^2\right)$
GARCH models
ARCH(1)-process
The unconditional distribution of Xt is a non-normal distribution
Leptokurtosis: The tails are heavier than the tails of the normal
distribution
Example of an ARCH(1)-process
$X_t = \sigma_t \varepsilon_t$
where $(\varepsilon_t)_{t\in\mathbb{Z}}$ is white noise with $\sigma_\varepsilon^2 = 1$ and
$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2}$
GARCH models
ARCH(1)-process
One can show that
$E(X_t \mid X_{t-1}) = 0$ and $E(X_t) = 0$
$Var(X_t \mid X_{t-1}) = \alpha_0 + \alpha_1 X_{t-1}^2$ and $Var(X_t) = \alpha_0 / (1 - \alpha_1)$
$Cov(X_t, X_{t-i}) = 0$ for $i > 0$
Stationarity condition: $0 < \alpha_1 < 1$
The unconditional kurtosis is $3(1 - \alpha_1^2)/(1 - 3\alpha_1^2)$ if $\varepsilon_t \sim N(0, 1)$.
If $\alpha_1 > \sqrt{1/3} \approx 0.57735$, the kurtosis does not exist.
GARCH models
ARCH(1)-process
Squared returns follow
$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$
with $v_t = \sigma_t^2(\varepsilon_t^2 - 1)$
Thus, squared returns of an ARCH(1) process follow an AR(1)
The process $(v_t)_{t\in\mathbb{Z}}$ is white noise:
$E(v_t) = 0$
$Var(v_t) = E(v_t^2) = \text{const.}$
$Cov(v_t, v_{t-i}) = 0$ for $i = 1, 2, \ldots$
GARCH models
ARCH(1)-process
Simulation of an ARCH(1)-process for t = 1, . . . , 2500
Parameters: $\alpha_0 = 0.05$, $\alpha_1 = 0.95$, start value $X_0 = 0$
Conditional distribution: $\varepsilon_t \sim N(0, 1)$
archsim.R
Check whether the simulated time series shows the typical stylized
facts of return distributions
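A Python stand-in for archsim.R with the parameters given above; the `acf1` helper is illustrative and computes only the first-order sample autocorrelation, which is roughly zero for the returns but positive for the squared returns.

```python
import numpy as np

rng = np.random.default_rng(5)
T, a0, a1 = 2500, 0.05, 0.95
eps = rng.standard_normal(T + 1)
x = np.zeros(T + 1)                             # X_0 = 0
for t in range(1, T + 1):
    x[t] = np.sqrt(a0 + a1 * x[t - 1] ** 2) * eps[t]
x = x[1:]

def acf1(z):
    """First-order sample autocorrelation (illustrative helper)."""
    z = z - z.mean()
    return np.sum(z[1:] * z[:-1]) / np.sum(z ** 2)

r_acf, r2_acf = acf1(x), acf1(x ** 2)           # returns vs. squared returns
```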
GARCH models
Estimation of an ARCH(1)-process
Of course, we do not know the true values of the model parameters $\alpha_0$ and $\alpha_1$
How can we estimate the unknown parameters $\alpha_0$ and $\alpha_1$?
Observations $X_1, \ldots, X_T$
Because of
$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$
a possible estimation method is OLS
GARCH models
Estimation of an ARCH(1)-process
OLS estimator of $\alpha_1$
$\hat\alpha_1 = \frac{\widehat{Cov}\left(X_t^2, X_{t-1}^2\right)}{\widehat{Var}\left(X_{t-1}^2\right)} = \frac{\sum_{t=2}^{T}\left(X_t^2 - \overline{X^2}\right)\left(X_{t-1}^2 - \overline{X^2}\right)}{\sum_{t=2}^{T}\left(X_{t-1}^2 - \overline{X^2}\right)^2}$
Careful: These estimators are only consistent if the kurtosis exists (i.e. if $\alpha_1 < \sqrt{1/3}$)
Test of ARCH effects
$H_0: \alpha_1 = 0$
$H_1: \alpha_1 > 0$
GARCH models
Estimation of an ARCH(1)-process
For T large, under $H_0$,
$\sqrt{T}\,\hat\alpha_1 \stackrel{appr}{\sim} N(0, 1)$
Reject $H_0$ if $\sqrt{T}\,\hat\alpha_1 > \Phi^{-1}(1 - \alpha)$
Second version of this test: Consider the $R^2$ of the regression
$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t,$
then under $H_0$
$T\hat\alpha_1^2 \approx TR^2 \stackrel{appr}{\sim} \chi_1^2$
Reject $H_0$ if $TR^2 > F_{\chi_1^2}^{-1}(1 - \alpha)$
GARCH models
ARCH(p)-process
Definition: ARCH(p)-process
The stochastic process $(X_t)_{t\in\mathbb{Z}}$ is called ARCH(p)-process if
$E(X_t \mid X_{t-1}, \ldots, X_{t-p}) = 0$
$Var(X_t \mid X_{t-1}, \ldots, X_{t-p}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2$
for $t \in \mathbb{Z}$, where $\alpha_i \geq 0$ for $i = 0, 1, \ldots, p-1$ and $\alpha_p > 0$
Often, an additional assumption is that
$X_t \mid (X_{t-1} = x_{t-1}, \ldots, X_{t-p} = x_{t-p}) \sim N(0, \sigma_t^2)$
GARCH models
ARCH(p)-process
Example of an ARCH(p)-process
$X_t = \sigma_t \varepsilon_t$
where $(\varepsilon_t)_{t\in\mathbb{Z}}$ is white noise with $\sigma_\varepsilon^2 = 1$ and
$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2}$
An ARCH(p) process is weakly stationary if all roots of $1 - \alpha_1 z - \alpha_2 z^2 - \ldots - \alpha_p z^p = 0$ are outside the unit circle
Then, for all $t \in \mathbb{Z}$, $E(X_t) = 0$ and
$Var(X_t) = \frac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i}$
GARCH models
ARCH(p)-process
If $(X_t)_{t\in\mathbb{Z}}$ is a stationary ARCH(p) process, then $(X_t^2)_{t\in\mathbb{Z}}$ is a stationary AR(p) process
$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$
As to the error term,
$E(v_t) = 0$, $Var(v_t) = \text{const.}$, and $Cov(v_t, v_{t-i}) = 0$ for $i = 1, 2, \ldots$
Simulating an ARCH(p) process is easy
GARCH models
Estimation of ARCH(p) models
OLS estimation of
$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$
Test of ARCH effects
$H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_p = 0$ vs. $H_1$: not $H_0$
Let $R^2$ denote the coefficient of determination of the regression
Under $H_0$, the test statistic $TR^2 \stackrel{appr}{\sim} \chi_p^2$; thus reject $H_0$ if $TR^2 > F_{\chi_p^2}^{-1}(1 - \alpha)$
GARCH models
Maximum likelihood estimation
Basic idea of the maximum likelihood estimation method:
Choose parameters such that the joint density of the observations
$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T)$
is maximized
Let $X_1, \ldots, X_T$ denote a random sample from X
The density $f_X(x; \theta)$ depends on R unknown parameters $\theta = (\theta_1, \ldots, \theta_R)$
GARCH models
Maximum likelihood estimation
ML estimation of $\theta$: Maximize the (log-)likelihood function
$L(\theta) = f_{X_1,\ldots,X_T}(x_1, \ldots, x_T; \theta) = \prod_{t=1}^{T} f_X(x_t; \theta)$
$\ln L(\theta) = \sum_{t=1}^{T} \ln f_X(x_t; \theta)$
ML estimate
$\hat\theta = \arg\max_\theta\, [\ln L(\theta)]$
GARCH models
Maximum likelihood estimation
Since observations are independent in random samples,
$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \prod_{t=1}^{T} f_{X_t}(x_t)$
or
$\ln f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \sum_{t=1}^{T} \ln f_{X_t}(x_t) = \sum_{t=1}^{T} \ln f_X(x_t)$
But: ARCH returns are not independent!
GARCH models
Maximum likelihood estimation
Factorization with dependent observations
$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \prod_{t=1}^{T} f_{X_t|X_{t-1},\ldots,X_1}(x_t \mid x_{t-1}, \ldots, x_1)$
or
$\ln f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \sum_{t=1}^{T} \ln f_{X_t|X_{t-1},\ldots,X_1}(x_t \mid x_{t-1}, \ldots, x_1)$
Hence, for an ARCH(1)-process
$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{x_t}{\sigma_t}\right)^2\right)$
GARCH models
Maximum likelihood estimation
The marginal density of X1 is complicated but becomes negligible for
large T and, therefore, will be dropped from now on
Log-likelihood function (without initial marginal density)
$\ln L(\alpha_0, \alpha_1 \mid x_1, \ldots, x_T) = -\frac{T-1}{2}\ln 2\pi - \frac{1}{2}\sum_{t=2}^{T}\ln\sigma_t^2 - \frac{1}{2}\sum_{t=2}^{T}\left(\frac{x_t}{\sigma_t}\right)^2$
where $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2$
ML estimation of $\alpha_0$ and $\alpha_1$ by numerical maximization of $\ln L(\alpha_0, \alpha_1)$ with respect to $\alpha_0$ and $\alpha_1$
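The numerical maximization can be sketched as follows (a Python stand-in for the course software; a crude grid search replaces a proper optimizer, and all parameter values, grids and seeds are illustrative):

```python
import numpy as np

def neg_loglik(a0, a1, x):
    """Negative ARCH(1) log-likelihood with the initial marginal density
    dropped, as on the slide: sigma_t^2 = a0 + a1 * x_{t-1}^2, t = 2..T."""
    s2 = a0 + a1 * x[:-1] ** 2
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + x[1:] ** 2 / s2)

# simulate an ARCH(1) path with known parameters
rng = np.random.default_rng(6)
T, a0_true, a1_true = 3000, 0.1, 0.4
x = np.zeros(T)
for t in range(1, T):
    x[t] = np.sqrt(a0_true + a1_true * x[t - 1] ** 2) * rng.standard_normal()

# crude numerical maximization by grid search
a0_grid = np.linspace(0.02, 0.3, 57)    # step 0.005
a1_grid = np.linspace(0.05, 0.9, 86)    # step 0.01
ll = np.array([[-neg_loglik(a0, a1, x) for a1 in a1_grid] for a0 in a0_grid])
i, j = np.unravel_index(ll.argmax(), ll.shape)
a0_hat, a1_hat = a0_grid[i], a1_grid[j]
```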
GARCH models
GARCH(p,q)-process
Definition: GARCH(p,q)-process
The stochastic process $(X_t)_{t\in\mathbb{Z}}$ is called GARCH(p, q)-process if
$E(X_t \mid X_{t-1}, X_{t-2}, \ldots) = 0$
$Var(X_t \mid X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \ldots + \beta_q \sigma_{t-q}^2$
for $t \in \mathbb{Z}$ with $\alpha_i, \beta_i \geq 0$
Often, an additional assumption is that
$(X_t \mid X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, \ldots) \sim N(0, \sigma_t^2)$
GARCH models
GARCH(p,q)-process
Conditional variance of GARCH(1, 1)
$Var(X_t \mid X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2 = \frac{\alpha_0}{1 - \beta_1} + \alpha_1 \sum_{i=1}^{\infty} \beta_1^{i-1} X_{t-i}^2$
Unconditional variance
$Var(X_t) = \frac{\alpha_0}{1 - \sum_{i=1}^{p}\alpha_i - \sum_{j=1}^{q}\beta_j}$
GARCH models
GARCH(p,q)-process
Necessary condition for weak stationarity
$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$
$(X_t)_{t\in\mathbb{Z}}$ has no autocorrelation
GARCH processes can be written as ARMA(max(p, q), q) processes in the squared returns
Example: GARCH(1, 1)-process with $X_t = \sigma_t\varepsilon_t$ and
$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$
GARCH models
Estimation of GARCH(p,q)-processes
Estimation of the ARMA(max (p, q) , q)-process in the squared returns
Alternative (and better) method: Maximum likelihood
For a GARCH(1, 1)-process
$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{x_t}{\sigma_t}\right)^2\right)$
GARCH models
Estimation of GARCH(p,q)-processes
Again, the density of X1 can be neglected
Log-likelihood function
$\ln L(\alpha_0, \alpha_1, \beta_1 \mid x_1, \ldots, x_T) = -\frac{T-1}{2}\ln 2\pi - \frac{1}{2}\sum_{t=2}^{T}\ln\sigma_t^2 - \frac{1}{2}\sum_{t=2}^{T}\left(\frac{x_t}{\sigma_t}\right)^2$
with $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ and $\sigma_1^2 = 0$
ML estimation of $\alpha_0$, $\alpha_1$ and $\beta_1$ by numerical maximization
GARCH models
Estimation of GARCH(p,q)-processes
Conditional h-step forecast of the volatility $\sigma_{t+h}^2$ in a GARCH(1, 1) model
$E\left(\sigma_{t+h}^2 \mid X_t, X_{t-1}, \ldots\right) = (\alpha_1 + \beta_1)^h \left(\sigma_t^2 - \frac{\alpha_0}{1 - \alpha_1 - \beta_1}\right) + \frac{\alpha_0}{1 - \alpha_1 - \beta_1}$
If the process is stationary
$\lim_{h\to\infty} E\left(\sigma_{t+h}^2 \mid X_t, X_{t-1}, \ldots\right) = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}$
Simulation of GARCH processes is easy; the estimation can be computer intensive
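The closed-form forecast is a one-liner; a sketch (the function name `garch11_forecast` and the parameter values are illustrative):

```python
def garch11_forecast(sigma2_t, h, a0, a1, b1):
    """h-step conditional volatility forecast in a GARCH(1,1) model,
    decaying geometrically toward the unconditional variance."""
    lr = a0 / (1 - a1 - b1)                 # long-run (unconditional) variance
    return (a1 + b1) ** h * (sigma2_t - lr) + lr

a0, a1, b1 = 0.1, 0.1, 0.8                  # long-run variance = 1.0
f1 = garch11_forecast(2.0, 1, a0, a1, b1)   # one step ahead: 1.9
f100 = garch11_forecast(2.0, 100, a0, a1, b1)  # near the long-run variance
```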
GARCH models
Residuals of an estimated GARCH(1,1) model
Careful: Residuals are slightly different from what you know from OLS regressions
Estimates: $\hat\alpha_0$, $\hat\alpha_1$, $\hat\beta_1$, $\hat\mu$
From $\hat\sigma_t^2 = \hat\alpha_0 + \hat\alpha_1 X_{t-1}^2 + \hat\beta_1 \hat\sigma_{t-1}^2$ and $X_t = \mu + \sigma_t\varepsilon_t$ we calculate the standardized residuals
$\hat\varepsilon_t = \frac{X_t - \hat\mu}{\hat\sigma_t} = \frac{X_t - \hat\mu}{\sqrt{\hat\alpha_0 + \hat\alpha_1 X_{t-1}^2 + \hat\beta_1 \hat\sigma_{t-1}^2}}$
Histogram of the standardized residuals
GARCH models
AR(p)-ARCH(q)-models
Definition: $(X_t)_{t\in\mathbb{Z}}$ is called AR(p)-ARCH(q)-process if
$X_t = \mu + \phi_1 X_{t-1} + \varepsilon_t$ (mean equation)
$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2$ (variance equation)
where $\varepsilon_t \sim N(0, \sigma_t^2)$; written here for the case p = 1, q = 1
Maximum likelihood estimation
GARCH models
Extensions of the GARCH model
There are a number of possible extensions to the GARCH model:
Empirical fact: Negative shocks have a larger impact on volatility
than positive shocks (leverage effect)
News impact curve
Nonnormal innovations, e.g. t-distributed $\varepsilon_t$