
A pairs trading strategy based on linear state space models and the Kalman filter

Article in Quantitative Finance · April 2016


DOI: 10.1080/14697688.2016.1164886


3 authors:

Carlos Eduardo de Moura, Instituto Nacional de Matemática Pura e Aplicada
Adrian Heringer Pizzinga, Universidade Federal Fluminense
Jorge P. Zubelli, Khalifa University

All content following this page was uploaded by Jorge P. Zubelli on 19 December 2017.



Quantitative Finance

ISSN: 1469-7688 (Print) 1469-7696 (Online) Journal homepage: http://www.tandfonline.com/loi/rquf20


To cite this article: Carlos Eduardo de Moura, Adrian Pizzinga & Jorge Zubelli (2016): A pairs
trading strategy based on linear state space models and the Kalman filter, Quantitative
Finance, DOI: 10.1080/14697688.2016.1164886

To link to this article: http://dx.doi.org/10.1080/14697688.2016.1164886

Published online: 25 Apr 2016.


Download by: [Adrian Pizzinga] Date: 27 May 2016, At: 00:26


Quantitative Finance, 2016
http://dx.doi.org/10.1080/14697688.2016.1164886

A pairs trading strategy based on linear state space models and the Kalman filter
CARLOS EDUARDO DE MOURA†, ADRIAN PIZZINGA∗ ‡ and JORGE ZUBELLI†
†Associação Instituto Nacional de Matemática Pura e Aplicada (IMPA), Rio de Janeiro, Brazil
‡Institute of Mathematics and Statistics - Fluminense Federal University (UFF), Niterói, Brazil

(Received 28 December 2014; accepted 18 February 2016; published online 25 April 2016)

Among many strategies for financial trading, pairs trading has played an important role in practical and academic frameworks. Loosely speaking, it is a statistical arbitrage tool for identifying and exploiting the inefficiencies of two financial assets that are related in the long term. When a significant deviation from this long-term equilibrium is observed, a profit might result. In this paper, we propose a pairs trading strategy entirely based on linear state space models designed for modelling the spread formed with a pair of assets. Once an adequate state space model for the spread is estimated, we use the Kalman filter to calculate conditional probabilities that the spread will return to its long-term mean. The strategy is activated upon large values of these conditional probabilities: the spread is bought or sold accordingly. Two applications with real data from the US and Brazilian markets are offered, and even though they probably rely on limited evidence, they already indicate that a very basic portfolio consisting of a single spread outperforms some of the main market benchmarks.

Keywords: Kalman filter; Mean-reverting conditional probabilities; Pair; Pairs trading; Spread; State
space models; Statistical arbitrage

∗ Corresponding author. Email: [email protected]
© 2016 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

Pairs trading is a type of statistical arbitrage that was first implemented in the mid-1980s by Nunzio Tartaglia and his group at Morgan Stanley (cf. Vidyamurthy 2004). Currently, pairs trading is widely used by investment banks and hedge funds. In general terms, a pairs trading strategy aims to identify and exploit market inefficiencies observed with two long-term, related assets, mostly by using statistical methods. The two assets are said to form a pair. When a significant deviation between the prices of the two assets is detected, a trading position is taken. The higher priced asset is sold (what market practitioners call a short position) and the lower priced asset is bought (a long position), in the hope that the mispricing will correct to the long-term equilibrium value (cf. Elliott et al. 2005, Vidyamurthy 2004).

In this paper, we consider two linear state space models that are appropriate for modelling spreads (stationary linear combinations of long-term related assets), with the intent of testing a new quantitative strategy involving pairs trading. The first model is the unobserved component model proposed by Elliott et al. (2005). Such a model, which has a Gaussian linear state space form, is a discrete-time version of the linear mean-reverting Ornstein–Uhlenbeck model. The second model is the traditional stationary autoregressive moving average, or ARMA, model (cf. Brockwell and Davis 1991, 2003, Hamilton 1994, Enders 2004). Its particular specifications are also dealt with in this paper under appropriate linear state space forms. In fact, we will prove that this second class of models, even though it lacks theoretical finance support, encompasses the proposal made by Elliott et al. (2005) as a particular case.

Subsequently, we develop a methodology for calculating conditional probabilities (given the past and actual spread data) that the spread will return to its long-term mean within k steps (the frequency can be daily or intra-daily) whenever significant deviations are observed. We propose an alternative augmented state space form for a previously selected model estimated using spread data. With this enlarged state space form, we apply the Kalman filter k-steps-ahead prediction (see, for instance, Harvey 1989, Durbin and Koopman 2001) to obtain conditional mean vectors and covariance matrices of the k future spreads. These first- and second-order moments are all that is needed for calculating the conditional probabilities previously mentioned. The quantitative strategy we pursue here is activated according to the following rule: if the spread is found to be considerably below (above) its long-term mean and the conditional probability that the spread will increase above (decrease below) its long-term mean within k steps is reasonably large, buy (sell) the spread.

The contribution made by this paper to the literature on pairs trading is the paradigm related to the trading rule briefly
described above: one takes positions on the assets forming the pair by checking whether the spread is too positive or too negative and also by examining the probability that the spread will not take too long to cross its long-term value (which is the probability that a profit will result soon).

The paper is organized as follows. Section 2 reviews the literature on pairs trading, without claiming exhaustiveness. Section 3 discusses pairs trading from the statistical arbitrage standpoint, enumerating some of its main practical features. Section 4 presents the two aforementioned linear state space models, discusses their mathematical properties and embeds each of them into the state space modelling/Kalman filter framework. Section 5 formally discusses how the conditional probabilities that the spread will mean-revert are calculated, addresses the corresponding computational issues and describes step by step how the quantitative strategy is implemented. Section 6 offers two applications to real data from the US and Brazilian markets. It compares the performance of the proposed strategy with the main benchmarks and with an earlier pairs trading strategy already used by market practitioners. For each of the examples, the choice of pair is first justified by a fundamental analysis of the expected equilibrium between the two corresponding assets (section 6.1). The advocated equilibrium relation is then assessed using proper econometric cointegration tests (beginning of section 6.2). An analysis of the computational effort related to estimation and goodness of fit is included as well. Section 7 discusses the main results obtained in section 6, provides some economic arguments in favour of our methodology and lists some comments on its use in real scenarios. The appendices review the Kalman filter methods used in the paper, provide the proofs of the technical results and explain some of the financial returns calculated in the applications.

2. Pairs trading: a glimpse at the literature

This section discusses earlier studies on pairs trading strategies, focusing mainly on spread modelling. Most of the models reviewed in section 2.1 share one feature: they treat the spread associated with a pair of stocks (cf. the naive definition of 'pair' already given and used in section 1) as some kind of mean-reverting stochastic process, whose parameters are estimated using financial market data. In section 2.2, we explain how this paper fits within the literature.

2.1. The review

The first reference that we discuss here is Vidyamurthy (2004). This book provides a good background on the pairs trading universe as well as several techniques for choosing pairs, with a focus on cointegration tests. Moreover, the author explains how pairs trading works and surveys some methods for addressing the problem in real settings, for instance, common trends/cointegration models, arbitrage pricing theory (APT), distance measures and state space models/Kalman filter.

Gatev et al. (2006) studied pairs trading strategies in the US equity market with daily data over the period 1962–2002. In their study, stocks from companies that had at least one day out of business were discarded. The partner of each stock was found by minimizing the sum of squared deviations between the two normalized daily price series, with dividends reinvested. The basic strategy consists of opening a position in a pair when prices diverge by more than two historical standard deviations and unwinding the position whenever the prices cross each other. Should prices not cross by the end of the trading interval, gains and losses are calculated at the end of the last trading day. The performance of the strategy used by Gatev et al. (2006) was assessed for the Brazilian stock market by Perlin (2007). The latter investigated the period from 2000 until 2006 and tested different entry thresholds for long and short positions, ranging between 1.5 and 3 standard deviations. For the data set used, the best options were those between 1.5 and 2 standard deviations.

A seminal paper entirely dedicated to state space modelling for spreads was authored by Elliott et al. (2005). There, a Gaussian linear state space model for the mean-reversion behaviour of the spread between paired stocks was developed under a continuous time setting. It is assumed that the 'observed' spread $S_t$ is a noisy observation of some mean-reverting 'unobserved' spread $x_t$. Parameter estimation is based on a version of the expectation–maximization algorithm previously developed in Elliott and Krishnamurthy (1999). The proposed pairs trading strategy is the following: if $S_t$ is larger/smaller than the one-step-ahead estimate $\hat{x}_{t|t-1}$, then the spread is regarded as too large/small, and thus, the trader could take a short/long position in the spread portfolio. Therefore, a profit is expected whenever a price correction occurs.

Another paper on state space models for spread data is that by Triantafyllopoulos and Montana (2009), where the modelling framework proposed in Elliott et al. (2005) is extended in several ways. First, they introduce time-varying autoregressive (or mean-reverting) parameters, which potentially allow the model to adapt itself to sudden changes in the data. Second, they develop and implement a Bayesian approach for estimating the parameters and provide an on-line estimation scheme. Finally, they advocate a procedure known as flexible least squares to estimate the cointegration coefficient recursively, unveiling a possible time-varying cointegration relationship between the two asset prices.

We now discuss two other works published in 2009. The first is a paper by Bertram (2009), where the theory of Itô diffusion processes comes into play for determining optimal trading strategies that also take into account transaction costs. The empirical content of the paper makes use of the Ornstein–Uhlenbeck modelling of the spread of a security traded simultaneously in both the Australian and New Zealand stock markets. The second paper, by Huck (2009), offers a data-driven and multi-criteria decision method for selecting pairs and implements the latter using weekly returns of S&P100 stocks.

In 2010, at least five papers addressing pairs trading techniques were published. Bertram (2010) complements his work published in 2009 by deriving analytical solutions for the expected return, the variance of the return and the expected trade length of his continuous time trading strategy; these parameters are used for constructing optimal strategies. Similarly,
Huck (2010) authored a follow-up to the paper surveyed in the previous paragraph, where the multi-criteria decision method is enhanced by adding neural network forecasting techniques. The paper by Avellaneda and Lee (2010) employed principal component analysis and sector exchange-traded funds (ETFs) for extracting risk factors. Baronyan et al. (2010) investigated 14 market-neutral trading strategies combined with different trading methods and pair selection methods. From empirical evidence of weekly data on stocks that comprise the Dow Jones 30 index, they find that the performance of market-neutral equity trading is superior in the turbulent year of 2008, the first year of the global financial crisis. Finally, Wissner-Gross and Freer (2010) proposed an econophysical perspective to generalize statistical arbitrage trading strategies for space-like separated world trading locations. One of their findings is that optimal intermediate locations exist between trading centres.

Continuing with the literature review, we now mention the paper by Mori and Ziobrowski (2011). In this mostly empirical work, the effectiveness of pairs trading in the US Real Estate Investment Trust market is compared with that in the US general stock market over the period 1987–2008. The authors conclude that the former market was more profitable than the latter between 1993 and 2000, after which pairs trading showed similar performances in both markets.

To conclude, we review three recent works on pairs trading theory and methods. The first two are works by Fasen (2013a, 2013b), who essentially proposes least squares estimators for the parameters of several versions of the Ornstein–Uhlenbeck model and fully investigates their statistical properties, such as consistency and asymptotic distributions. The asymptotic behaviour of the usual t-ratio and Wald tests is also investigated. In the third paper, by Tourin and Yan (2013), a dynamic model for pairs trading based on the theory of optimal stochastic control is proposed and illustrated using minute-by-minute historical data on two stocks traded on the New York Stock Exchange.

2.2. This paper's contribution

Given the articles and books on pairs trading reviewed here, in this paper we intend to complement the findings of Elliott et al. (2005) along three directions:

• A more general class of possible probabilistic descriptions for a given spread time series, the ARMA models, is proposed. As we demonstrate, such a class encapsulates the mean-reverting model by Elliott et al. as a particular case.
• We create a new quantitative pairs trading strategy based upon outputs (specifically, some conditional probabilities) of the stochastic model selected and estimated using spread data.
• The whole procedure (model estimation & trading strategy) is implemented using real financial time series. The results are compared with those from other investment alternatives, including the simple pairs trading strategy proposed by Gatev et al. (2006) and re-considered by Perlin (2007).

Finally, we mention that our statistical framework is quite different from that of Triantafyllopoulos and Montana (2009), who work with a model that, by its very definition, has to be treated within the conditionally Gaussian state space approach (see appendix 1). Moreover, Triantafyllopoulos and Montana make use of the Bayesian perspective for estimating the model parameters, whereas we use the maximum likelihood method.

3. Statistical Arbitrage Strategies

Quoting Kaufman (2005), 'when the two legs of a spread are highly correlated and therefore the opportunity for profit from price divergence is of short duration, the trade is called an arbitrage. True arbitrage has, theoretically, no trading risk, however it is offset by small profits and limited opportunity for volume'.

Statistical arbitrage is a class of strategies widely used by hedge funds and proprietary traders. The distinctive feature of such strategies is that profits can be made by exploiting the statistical mispricing of one or more assets, based on their regular behaviour. Despite the use of the term 'arbitrage', such strategies are not riskless. One simple and very popular strategy that fits the definition of statistical arbitrage is pairs trading (cf. Elliott et al. 2005). Other types of statistical arbitrage are discussed in Vidyamurthy (2004) and Pole (2007).

Following Vidyamurthy (2004), the first use of a pairs trading strategy is attributed to the Wall Street 'quant' Nunzio Tartaglia, who was at Morgan Stanley in the mid-1980s. Pairs trading is based on APT (cf. Ross 1976). Informally speaking, if two stocks have similar characteristics, the prices of both assets must be more or less the same; that is, they maintain some degree of equilibrium. If prices diverge, then it is likely that one of the assets is overpriced and/or the other is underpriced. Basically, pairs trading schemes involve selling the higher priced asset and buying the lower priced asset in the hope that the mispricing will ultimately be corrected by the long-term equilibrium value. The difference between the two observed prices is termed the spread. Therefore, the idea behind a given pairs trading strategy is to trade on the oscillations around the equilibrium value of the spread. The oscillations of the spread occur because the latter is allegedly mean-reverting. One can put on a trade when the spread deviates substantially from its equilibrium value and unwind the trade when the equilibrium is restored (cf. Elliott et al. 2005). For the trade to be profitable, the deviation must be reasonably larger than trading costs.

Pairs trading is a market-neutral trading strategy. Hence, this strategy strives to provide positive returns in both bull and bear markets by selecting a large number of long and short positions with no net exposure to the market (cf. Nicholas 2000, Jacobs and Levy 2005). The main risks involved in pairs trading are the following: (1) the divergence risk, meaning that the long-term equilibrium relation between the assets may change or even vanish; and (2) the horizon risk, meaning that the spread does not converge within a given time horizon. The latter forces traders to close the position before convergence because of worsened mispricing or margin calls (cf. Engelberg et al. 2009).
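The put-on/unwind logic described above can be sketched in a few lines of Python, in the spirit of the simple rule of Gatev et al. (2006) summarized in section 2.1. This is a minimal illustration under stated assumptions. It is not the authors' implementation, nor the exact procedure of Gatev et al. The helper name `threshold_positions`, the formation-window length and the choice to apply the standard-deviation trigger directly to a spread series (rather than to normalized price series) are all hypothetical.

```python
import statistics


def threshold_positions(spread, formation, entry_k=2.0):
    """Hypothetical sketch of a threshold (distance-style) pairs rule.

    The first `formation` observations estimate the equilibrium level and
    dispersion of the spread. Afterwards, a position is opened when the
    spread deviates by more than `entry_k` sample standard deviations and
    is unwound once the spread crosses back through the formation mean.

    Returns one position per post-formation observation:
    +1 = long the spread, -1 = short the spread, 0 = flat.
    """
    history = spread[:formation]
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)  # sample standard deviation

    position = 0
    positions = []
    for s in spread[formation:]:
        if position == 0:
            if s > mean + entry_k * sd:
                position = -1  # spread too large: sell the spread
            elif s < mean - entry_k * sd:
                position = 1   # spread too small: buy the spread
        elif position == -1 and s <= mean:
            position = 0       # spread fell back to equilibrium: unwind
        elif position == 1 and s >= mean:
            position = 0       # spread rose back to equilibrium: unwind
        positions.append(position)
    return positions


# Formation window alternates around 0, so mean = 0 and sd is about 1.095.
spread = [1, -1, 1, -1, 1, -1, 3.0, 1.0, -0.5, -3.0, 0.5]
print(threshold_positions(spread, formation=6))  # [-1, -1, 0, 1, 0]
```

A 0 recorded on the exit observation means the trade is closed at that observation. Two features are deliberately omitted to keep the sketch short: transaction costs, which the deviation must exceed for the trade to be profitable, and the forced liquidation at the end of the trading interval.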
4. Proposed models

4.1. What is a pair?

The idea behind a pair (of stocks, bonds, foreign exchange rates, commodities, etc.) is closely linked to the econometric concept of cointegration. More precisely, two time series $Y_t \sim I(1)$ and $X_t \sim I(1)$ are said to be cointegrated iff $aY_t + bX_t \sim I(0)$ for some $a \neq 0$ and $b \neq 0$. Here, the notation $I(d)$ means 'integrated of order $d$'. This definition is sufficient for the scope of this paper. For richer expositions on the theme and more general definitions, please refer to Harvey (1993), Hamilton (1994) and Enders (2004).

Consider

$$
S_t = \log(P_{t,1}) - \left[\alpha + \beta \log(P_{t,2})\right], \tag{1}
$$

where $P_{t,1}$ and $P_{t,2}$ are the prices of assets $A_1$ and $A_2$ at time $t$, respectively. The time frequency can be daily or some intraday frequency (second, minute, hour, etc.). If $\log(P_{t,1})$ and $\log(P_{t,2})$ are cointegrated, the spread $S_t$ is stationary, that is, $S_t \sim I(0)$. In this case, $\alpha$ is the mean of the cointegration relationship, $\beta$ is the cointegration coefficient, and $A_1$ and $A_2$ form a pair.

Cointegration, once verified, suggests that $S_t$ wanders around an equilibrium value, which is almost indispensable for pairs trading. A suitable choice of $\alpha$ in equation (1) makes this equilibrium value zero, and any pronounced deviation from it can be traded against.

4.2. Unobserved component models: the stochastic spread approach

Following Elliott et al. (2005), in this subsection, we assume that the observed spread $S_t$, associated with a given pair of assets $A_1$ and $A_2$, is a noisy realization of the unobserved or actual mean-reverting spread $x_t$:

$$
\begin{aligned}
S_t &= x_t + D\,\varepsilon_t,\\
x_t - x_{t-1} &= a - b\,x_{t-1} + C\,\eta_t,
\end{aligned} \tag{2}
$$

where $a \in \mathbb{R}$, $0 < b < 2$, $D > 0$, $C > 0$ and $(\varepsilon_t, \eta_t)' \sim \mathrm{NID}(0, I_2)$. To attain an appropriate state space representation for the model in equation (2), its second equation must be rephrased as $x_{t+1} = a + (1 - b)x_t + \eta_t^*$, where $\eta_t^* = C\eta_{t+1}$. Equation (A1) of appendix 1 accommodates equation (2), with the second equation replaced by its equivalent form, if we define $Z_t = 1$, $d_t = 0$, $H_t = D^2$, $T_t = B \equiv 1 - b$, $c_t = a$, $R_t = 1$ and $Q_t = C^2$. The Kalman filter formulae in equation (A2) of appendix 1 become

$$
\begin{aligned}
\upsilon_t &= S_t - a_{t|t-1}, & F_t &= P_{t|t-1} + D^2,\\
K_t &= B\,P_{t|t-1}F_t^{-1}, & L_t &= B - K_t,\\
a_{t+1|t} &= a + B\,a_{t|t-1} + K_t\,\upsilon_t, & P_{t+1|t} &= B\,P_{t|t-1}L_t + C^2,
\end{aligned}
\qquad t = 1, \ldots, n. \tag{3}
$$

Equation (3) can be initialized with $a_{1|0} = a/(1 - B)$ and $P_{1|0} = C^2/(1 - B^2)$. Notice that these are precisely the unconditional first- and second-order moments of the stationary process $x_t$.

The model proposed by Elliott et al. (2005) has three interesting features. The first is that it has support in finance theory, as it can be viewed as a discrete-time version of the Ornstein–Uhlenbeck continuous time stochastic process (see Rampertshammer 2007). The second is that it recognizes a mean-reverting behaviour for the spread. The last good property is a consequence of the next result, the proof of which is provided in appendix 2:

Proposition 1 If $S_t$ follows the unobserved component model by Elliott et al. given in equation (2), then $S_t \sim \mathrm{ARMA}(1,1)$.

This last proposition embeds the proposal by Elliott et al. (2005) in a more general class of mean-reverting statistical models (next subsection). It also suggests a procedure for selecting or discarding equation (2) as a probabilistic description of a given spread time series. If the spread data provide evidence that they cannot be adequately fitted by any ARMA(1,1) model, then the proposal by Elliott et al. is necessarily misspecified for use in a pairs trading scheme.

4.3. ARMA models: generalizing the stochastic spread approach

Because of their mean-reverting behaviour, stationary ARMA dynamics can always be considered valid approaches for modelling the spread $S_t$. For instance, we could assume that $S_t \sim \mathrm{ARMA}(2,2)$; that is,

$$
S_t = \phi_0 + \phi_1 S_{t-1} + \phi_2 S_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}, \tag{4}
$$

where $\varepsilon_t \sim \mathrm{NID}(0, \sigma^2)$ and $(\phi_1, \phi_2)$ are such that the polynomial $p(z) = 1 - \phi_1 z - \phi_2 z^2$, $z \in \mathbb{C}$, has its two roots outside the unit circle. This assumption on $\phi_1$ and $\phi_2$ is a sufficient condition for $S_t$ to be a stationary process (see Brockwell and Davis 1991, 2003, Hamilton 1994). The same restrictions could be imposed on the moving average coefficients $\theta_1$ and $\theta_2$ to guarantee that $S_t$ is invertible. Invertibility means that $\varepsilon_t$ can be written as a function of $S_t, S_{t-1}, \ldots$ by means of an AR($\infty$) representation for $S_t$ (again, see Brockwell and Davis 1991, 2003, Hamilton 1994). Fortunately, such questions regarding invertibility are immaterial under the state space modelling/Kalman filter framework. There, neither likelihood evaluation nor forecasting depends on invertibility, as discussed by Hamilton (1994) in chapters 4, 5 and 13.

We can use equation (A1) of appendix 1 to accommodate the model in equation (4), and any other stationary ARMA($p$, $q$) model, under a state space representation. There is no unique way of performing such a conversion, and the literature offers several state space forms for ARIMA models (to cite a few books: Harvey 1989, Brockwell and Davis 1991, 2003, Hamilton 1994, Durbin and Koopman 2001). In this paper, the following alternative for the ARMA(2,2) model in equation (4) will be used in the sequel:

$$
Z_t = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad d_t = 0, \qquad H_t = 0,
$$

$$
T_t = \begin{bmatrix}
\phi_1 & \phi_2 & 1 & \theta_1 & \theta_2\\
1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0
\end{bmatrix}, \qquad
c_t = \begin{bmatrix} \phi_0\\ 0\\ 0\\ 0\\ 0 \end{bmatrix}, \qquad
R_t = \begin{bmatrix} 0\\ 0\\ 1\\ 0\\ 0 \end{bmatrix}, \qquad
Q_t = \sigma^2.
$$

The Kalman filter formulae in equation (A2) (cf. appendix 1) are applied with the matrices above and can be initialized with

$$
a_1 = \left(\frac{\phi_0}{1 - \phi_1 - \phi_2},\ \frac{\phi_0}{1 - \phi_1 - \phi_2},\ 0,\ 0,\ 0\right)', \qquad
\operatorname{vec}(P_1) = (I - T \otimes T)^{-1} \operatorname{vec}(RQR').
$$

5. A new pairs trading strategy

In this section, we discuss the main elements of a quantitative pairs trading strategy based entirely on the estimation of the state space models proposed in section 4. First, in section 5.1, we define the conditional probabilities that the spread will return to its long-term mean within $k$ steps from a given time instant $t$. In section 5.2, we explore the practical issues for effectively calculating these probabilities in an online fashion. Once an appropriate state space model is estimated by maximum likelihood (see appendix 1), the usual Kalman filter prediction equations in equation (A2) are applied to an augmented version of the model. Finally, in section 5.3, the quantitative strategy is described step by step, and the content derived in sections 5.1 and 5.2 is merged with the trading rule that involves buying or selling the spread accordingly.

5.1. Mean-reverting conditional probabilities $p_{\mathrm{up}}$ and $p_{\mathrm{down}}$: theory

The main ingredient for success is to attain, from a statistical/probabilistic standpoint, a minimum confidence that a future observed value of the spread will soon cross back to some long-term value (for instance, its unconditional mean). This applies once the spread observed at some time $t$ is somewhat distant from that same long-term value. If such a task can be accomplished, one might buy (or sell) the spread at time $t$ whenever chances are that a profit can be made.

Formally, the strategy that we build relies on the ability to calculate a conditional probability: that the spread will revert to its long-term mean, or to any other convenient value $c$, within $k$ steps, given the past and actual spread data. That is,

$$
\begin{aligned}
p_{\mathrm{up}}(t,k,c) &= \mathrm{P}\left[(S_{t+1} > c) \cup (S_{t+2} > c) \cup \cdots \cup (S_{t+k} > c) \mid \mathcal{F}_t\right]\\
&= 1 - \mathrm{P}\left[S_{t+1} \le c,\ S_{t+2} \le c,\ \ldots,\ S_{t+k} \le c \mid \mathcal{F}_t\right]\\
&= 1 - F_{S_{t+1}, S_{t+2}, \ldots, S_{t+k} \mid \mathcal{F}_t}(c, c, \ldots, c),\\
p_{\mathrm{down}}(t,k,c) &= \mathrm{P}\left[(S_{t+1} < c) \cup (S_{t+2} < c) \cup \cdots \cup (S_{t+k} < c) \mid \mathcal{F}_t\right]\\
&= \mathrm{P}\left[(-S_{t+1} > -c) \cup (-S_{t+2} > -c) \cup \cdots \cup (-S_{t+k} > -c) \mid \mathcal{F}_t\right]\\
&= 1 - \mathrm{P}\left[-S_{t+1} \le -c,\ -S_{t+2} \le -c,\ \ldots,\ -S_{t+k} \le -c \mid \mathcal{F}_t\right]\\
&= 1 - F_{-S_{t+1}, -S_{t+2}, \ldots, -S_{t+k} \mid \mathcal{F}_t}(-c, -c, \ldots, -c),
\end{aligned} \tag{5}
$$

where $\mathcal{F}_t$ is the σ-field generated by past and actual spread data; that is, $\mathcal{F}_t \equiv \sigma(S_1, \ldots, S_{t-1}, S_t)$. Suppose a specific Gaussian linear state space model is appropriate for the spread (something that has to be checked in practical implementations). Then the conditional distribution functions in equation (5) are those of

$$
\mathbf{S}_{t,k} \equiv \begin{bmatrix} S_{t+1}\\ S_{t+2}\\ \vdots\\ S_{t+k} \end{bmatrix}, \qquad
\mathbf{S}_{t,k} \mid \mathcal{F}_t \sim \mathrm{N}\left(\mu_{t,k},\ \Sigma_{t,k}\right), \tag{6}
$$

where $\mu_{t,k} \equiv \mathrm{E}(\mathbf{S}_{t,k} \mid \mathcal{F}_t)$ and $\Sigma_{t,k} \equiv \mathrm{Var}(\mathbf{S}_{t,k} \mid \mathcal{F}_t)$. We use the notation established in appendix 1 for the key quantities related to the Kalman filter. We also define $P_{t+i,t+j|t} \equiv \mathrm{Cov}(\alpha_{t+i}, \alpha_{t+j} \mid \mathcal{F}_t)$ for $i, j = 1, 2, \ldots, k$ and $i < j$ (recall that $P_{t+i,t+j|t} = P_{t+j,t+i|t}'$). It follows that each entry of $\mu_{t,k}$ is given by

$$
\begin{aligned}
\mathrm{E}(S_{t+i} \mid \mathcal{F}_t) &= \mathrm{E}(Z_{t+i}\alpha_{t+i} + d_{t+i} + \varepsilon_{t+i} \mid \mathcal{F}_t)\\
&= Z_{t+i}\,\mathrm{E}(\alpha_{t+i} \mid \mathcal{F}_t) + d_{t+i} + \mathrm{E}(\varepsilon_{t+i} \mid \mathcal{F}_t)\\
&= Z_{t+i}\,\mathrm{E}(\alpha_{t+i} \mid \mathcal{F}_t) + d_{t+i} + \mathrm{E}(\varepsilon_{t+i})\\
&= Z_{t+i}\,a_{t+i|t} + d_{t+i}.
\end{aligned} \tag{7}
$$

Regarding $\Sigma_{t,k}$, its diagonal and off-diagonal blocks are, respectively, given by

$$
\begin{aligned}
\mathrm{Var}(S_{t+i} \mid \mathcal{F}_t) &= Z_{t+i}\,\mathrm{Var}(\alpha_{t+i} \mid \mathcal{F}_t)\,Z_{t+i}' + \mathrm{Var}(\varepsilon_{t+i} \mid \mathcal{F}_t)\\
&\quad + Z_{t+i}\,\mathrm{Cov}(\alpha_{t+i}, \varepsilon_{t+i} \mid \mathcal{F}_t) + \mathrm{Cov}(\varepsilon_{t+i}, \alpha_{t+i} \mid \mathcal{F}_t)\,Z_{t+i}'\\
&= Z_{t+i}\,P_{t+i|t}\,Z_{t+i}' + H_{t+i},
\end{aligned} \tag{8}
$$

$$
\begin{aligned}
\mathrm{Cov}(S_{t+i}, S_{t+j} \mid \mathcal{F}_t) &= \mathrm{Cov}(Z_{t+i}\alpha_{t+i} + d_{t+i} + \varepsilon_{t+i},\ Z_{t+j}\alpha_{t+j} + d_{t+j} + \varepsilon_{t+j} \mid \mathcal{F}_t)\\
&= Z_{t+i}\,\mathrm{Cov}(\alpha_{t+i}, \alpha_{t+j} \mid \mathcal{F}_t)\,Z_{t+j}' + Z_{t+i}\,\mathrm{Cov}(\alpha_{t+i}, \varepsilon_{t+j} \mid \mathcal{F}_t)\\
&\quad + \mathrm{Cov}(\varepsilon_{t+i}, \alpha_{t+j} \mid \mathcal{F}_t)\,Z_{t+j}' + \mathrm{Cov}(\varepsilon_{t+i}, \varepsilon_{t+j} \mid \mathcal{F}_t)\\
&= Z_{t+i}\,P_{t+i,t+j|t}\,Z_{t+j}'.
\end{aligned} \tag{9}
$$

5.2. Mean-reverting conditional probabilities $p_{\mathrm{up}}$ and $p_{\mathrm{down}}$: practical evaluation

For each $t$, the first- and second-order conditional moments displayed in equations (7) and (8) are obtained from the Kalman filter in equation (A2). The filter is applied to the data subset $\{S_1, S_2, \ldots, S_t\}$ enlarged with $k$ missing values after the last spread $S_t$: $\{S_1, S_2, \ldots, S_t, \text{.NaN}, \text{.NaN}, \ldots, \text{.NaN}\}$, where '.NaN' ('not a number') marks a missing value. Following Durbin and Koopman (2001, section 4.9), this means that, under the state space modelling approach, forecasting is a particular case of missing value estimation. On the other hand, equation (9) depends on an additional implementation of Kalman recursions other than those revisited in appendix 1. Specifically, these are the ones derived in Durbin and Koopman (2001, section 4.5), with appropriate adaptations for the case of missing values. This computational effort is not always available as a ready-to-use option in commercial software and is not considered in the usual Kalman filter codes suggested in textbooks. To avoid it, we propose an alternative in this paper. Our proposal
6 C. E. de Moura et al.

makes use of already-implemented formulae known to time series analysts.

The building block for routinely evaluating equations (7), (8) and (9) for each time t is an augmented state space form equivalent to a given time series model previously selected and estimated using the spread data. In this paper, the models considered are those previously discussed in sections 4.2 and 4.3. The augmentation consists of adding k new blocks to the state vector in equation (A1) of appendix 1, each with the same dimensions as the original state vector. Formally:

$$
\begin{aligned}
Y_t &= \begin{bmatrix} Z_t & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} \alpha_t \\ \alpha_{t-1} \\ \vdots \\ \alpha_{t-k} \end{bmatrix} + d_t + \varepsilon_t, \\
\begin{bmatrix} \alpha_{t+1} \\ \alpha_t \\ \vdots \\ \alpha_{t-(k-1)} \end{bmatrix}
&=
\begin{bmatrix}
T_t & 0 & \cdots & 0 & 0 \\
I & 0 & \cdots & 0 & 0 \\
0 & I & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I & 0
\end{bmatrix}
\begin{bmatrix} \alpha_t \\ \alpha_{t-1} \\ \vdots \\ \alpha_{t-k} \end{bmatrix}
+ \begin{bmatrix} c_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}
+ \begin{bmatrix} R_t \\ 0 \\ \vdots \\ 0 \end{bmatrix} \eta_t,
\end{aligned}
\tag{10}
$$

where $Z_t$, $d_t$, $T_t$, $R_t$ and $c_t$ are the system matrices of the original model. With this enlarged state space form, we apply the Kalman filter k-steps-ahead prediction at a given time t to obtain the first- and second-order conditional moments of $(\alpha_{t+1}, \ldots, \alpha_{t+k})$. With these quantities, calculating the first- and second-order moments displayed in equations (7), (8) and (9) becomes straightforward.

Denote the vectors of unknown parameters associated with equations (10) and (A1) by $\tilde{\psi}^{\dagger}$ and $\psi^{\dagger}$, respectively, and the corresponding likelihood functions by $\tilde{L}^{\dagger}$ and $L^{\dagger}$. Because the augmented model does not include any new parameters, it trivially follows that $\tilde{\psi}^{\dagger} = \psi^{\dagger}$. It is less obvious that the same holds for the maximum likelihood estimators obtained under $\tilde{L}^{\dagger}$ and $L^{\dagger}$. The next proposition, whose proof is in appendix 3, asserts that it does:

Proposition 2.
$$
\hat{\psi}^{\dagger} \equiv \arg\max L^{\dagger}(\psi^{\dagger}) = \arg\max \tilde{L}^{\dagger}(\tilde{\psi}^{\dagger}) \equiv \hat{\tilde{\psi}}^{\dagger}.
$$

This result and its proof are admittedly inspired by theorem 2 of Atherino et al. (2010). We include them here in detail, with proper adaptations of the original proof, so that this paper is more self-contained.

Proposition 2 means that maximum likelihood estimation is unchanged when the augmented model in equation (10) is used. Hence, the augmented model is not needed for parameter estimation, which would otherwise demand additional and unnecessary computational effort. Instead, the unknown parameters can be estimated with the original model in equation (A1) of appendix 1, and the resulting estimates are then plugged into the augmented model. In practice, this result is what speeds up the calculation of the probabilities in equation (5) in the applications of section 6.

Finally, once $\mu_{t,k}$ and $\Sigma_{t,k}$ in equation (6) are calculated, the conditional probabilities in equation (5) are evaluated using standard numerical multiple integration algorithms adapted to the multivariate normal framework; see for instance Drezner and Wesolowsky (1989), Genz (1992, 2004) and Drezner (1994).

5.3. The strategy

We now propose our trading rule. It assumes the following:

• a particular state space model has already been estimated with available time series data from the spread process $S_t$, which is associated with a pair of assets $A_1$ and $A_2$;
• the numerical devices discussed in section 5.2 have been implemented;
• the capital is invested in some low-risk fixed income market.

The rule can be split into two mutually exclusive situations:

• If the observed value of $S_t$ is lower than a long-term value c by at least some margin δ, and $p_{\mathrm{up}}$ in equation (5) is greater than some ‘large’ value $p^*_{\mathrm{up}}$, use the capital to buy the spread. Here c is the same value used in equation (5), fixed beforehand to a particular value (for instance, c = 0 if one chooses the spread mean).
• If the observed value of $S_t$ is larger than c by at least the same margin δ (without loss of generality), and $p_{\mathrm{down}}$ in equation (5) is greater than some ‘large’ value $p^*_{\mathrm{down}}$, use the capital to sell the spread.

These rules need some qualification:

• ‘Buy the spread’ means that the lower priced asset (in this case $A_1$; see equation (1)) is bought and the other asset is sold. ‘Sell the spread’ is defined analogously.
• A long position on the spread (the first situation) is opened only when the spread lies more than δ below the long-term value. A short position (the second situation) is opened only when it lies more than δ above it. The threshold δ is chosen so that a trade remains profitable after costs.
• Because their values are fixed, $p^*_{\mathrm{up}}$ and $p^*_{\mathrm{down}}$ necessarily reflect risk aversion, and one may well choose different values for each.
• A position (either long or short) is closed when the spread hits the long-term value c, or when it has not hit c within k time instants; recall equation (5). When the position is closed, the capital is immediately shifted back to the fixed income market.
• The two situations are mutually exclusive, but they are not exhaustive. If neither set of conditions is met, the capital remains invested in the same fixed income market until one of the two ‘triggers’ is activated.

The rule is applied every time the observed value of $S_t$ and the relevant mean-reverting conditional probability ($p_{\mathrm{up}}$ or $p_{\mathrm{down}}$) jointly meet the required conditions. The values of δ, $p^*_{\mathrm{up}}$, $p^*_{\mathrm{down}}$, c and k used in this paper are given in the applications of section 6.

The main risk of this strategy comes from fundamental changes in the underlying assets. The prices of $A_1$ and $A_2$ may diverge, in which case the spread is no longer stationary and does not return to its former long-term value c. The parameter k serves to mitigate this divergence risk. In addition, the target return must always exceed the return on the fixed income market, because the latter is the opportunity cost of this strategy. The parameter δ is intended to address this issue as well.

6. Applications

Table 1. Engle–Granger cointegration tests with the pairs (in-sample analysis).

| Pairs | Dickey–Fuller test* |
|---|---|
| XOM-LUV | −3.006** |
| VALE5-BRAP4 | −4.059** |

*Critical values have been taken from MacKinnon (2010).
**The spread of the pair was considered stationary at the 5% level.
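Before turning to the data, the following sketch illustrates the probability calculation in equations (5) and (6) and the decision rule of section 5.3. It is a minimal Python sketch that makes three simplifying assumptions.

• The spread is an AR(1) process, $S_{t+1} = \phi S_t + \eta_{t+1}$, observed without measurement noise. As a result, $\mu_{t,k}$ and $\Sigma_{t,k}$ have closed forms instead of being read off the augmented Kalman filter of equation (10).
• A seeded Monte Carlo estimate stands in for the Genz-type numerical integration referred to in section 5.2, so the example needs no external dependencies.
• δ is expressed in the same units as the spread, so δ = 0.5% becomes 0.005.

The function names (`ar1_forecast_moments`, `cholesky`, `crossing_probs`, `trading_signal`) are hypothetical. The default thresholds mirror the values reported in section 6.2: c = 0, δ = 0.5% and $p^*_{\mathrm{up}} = p^*_{\mathrm{down}} = 80\%$.

```python
# Hypothetical sketch of equations (5)-(6) and the section 5.3 rule for an AR(1) spread.
# Monte Carlo replaces the Genz-type integration used in the paper (an assumption).
import math
import random


def ar1_forecast_moments(s_t, phi, sigma2, k):
    """Mean vector and covariance matrix of (S_{t+1}, ..., S_{t+k}) given S_t.

    Assumes S_{t+1} = phi * S_t + eta_{t+1}, eta ~ N(0, sigma2), and that S_t
    is observed exactly (no filtering uncertainty is propagated).
    """
    mu = [phi ** j * s_t for j in range(1, k + 1)]
    cov = [[0.0] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            # S_{t+i} = phi^i S_t + sum_{m=1}^{i} phi^(i-m) eta_{t+m}, hence
            # Cov(S_{t+i}, S_{t+j}) = sigma2 * sum_{m=1}^{min(i,j)} phi^(i-m) phi^(j-m)
            cov[i - 1][j - 1] = sigma2 * sum(
                phi ** (i - m) * phi ** (j - m) for m in range(1, min(i, j) + 1)
            )
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def crossing_probs(mu, cov, c, n_draws=10000, seed=1):
    """Monte Carlo estimates of p_up(t, k, c) and p_down(t, k, c) in equation (5)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    k = len(mu)
    up = down = 0
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # One joint draw of (S_{t+1}, ..., S_{t+k}) ~ N(mu, cov)
        path = [mu[i] + sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(k)]
        up += max(path) > c    # at least one S_{t+j} exceeds c
        down += min(path) < c  # at least one S_{t+j} falls below c
    return up / n_draws, down / n_draws


def trading_signal(s_t, p_up, p_down, c=0.0, delta=0.005,
                   p_up_star=0.8, p_down_star=0.8):
    """Two mutually exclusive triggers; otherwise the capital stays in fixed income."""
    if s_t < c - delta and p_up > p_up_star:
        return "buy spread"
    if s_t > c + delta and p_down > p_down_star:
        return "sell spread"
    return "fixed income"
```

For example, `ar1_forecast_moments(-0.02, 0.9, 1e-4, 25)` followed by `crossing_probs(mu, cov, 0.0)` yields Monte Carlo estimates of $p_{\mathrm{up}}$ and $p_{\mathrm{down}}$ that can be passed to `trading_signal`. With one of the fitted models of section 4, the moments would instead come from the k-step-ahead predictions of the augmented state in equation (10).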
This section presents the results of applying the models from section 4 and the pairs trading strategy derived in section 5 to real data from the US and Brazilian markets. In section 6.1, we describe the data used in the estimations and justify our choice of the stocks as candidates to form pairs. For each case, we examine the expected equilibrium between the two stock prices in light of the economic relation between the two firms. In section 6.2, we present the cointegration test results (which statistically confirm the economic insights), model estimation and goodness-of-fit, and the performance of the strategy.

6.1. The data & economic justifications

All the financial time series used in the implementations were obtained from the Bloomberg Professional service.

The first exercise uses daily stock prices of two securities: Exxon Mobil Corporation (traded on the NYSE with the symbol XOM) and Southwest Airlines Co (traded on the NYSE with the symbol LUV). ExxonMobil Corporation is the world’s largest publicly traded international oil and gas company, headquartered in Texas in the US. Southwest Airlines Co operates passenger airlines that provide scheduled air transportation services in the United States. For these two stocks, the period considered ranges from 22 September 2011 to 26 March 2013. It has been split into an in-sample part, from 22 September 2011 to 20 September 2012, and an out-of-sample part, from 21 September 2012 to 26 March 2013.

The second exercise uses daily stock prices of Vale (traded on the BM&FBOVESPA stock exchange in São Paulo with the symbol VALE5) and Bradespar (traded on the BM&FBOVESPA with the symbol BRAP4). Vale is the second largest mining company in the world and the largest private company in Brazil. It is the largest producer of iron ore in the world and the second largest producer of nickel. Bradespar is an investment company that seeks to create value for its shareholders through relevant interests in companies that lead their operational areas. Currently, Bradespar holds a stake in Vale and acts directly through senior management, with members on the Board of Directors and Advisory Committees. We used the available data for these two stocks from 29 August 2011 to 3 April 2013. As with the two US stocks, this period was divided into two parts: one from 29 August 2011 to 20 September 2012, and the other containing the remaining data.

Given the definition of a pair discussed in section 4, these stocks were chosen mainly because, in view of the details above, XOM and LUV, like VALE5 and BRAP4, are supposedly related in the long term.

Additionally, the following asset class indexes have been used to evaluate the strategy results:

• Libor (1 year): London Interbank Offered Rate, the rate that banks use to borrow from and lend to one another in the wholesale money markets in London.
• Standard and Poor’s 500 Index (S&P): a capitalization-weighted index of 500 stocks representing all major industries, designed to measure the performance of the broad domestic economy through changes in the aggregate market.
• Inter-bank deposit certificate (CDI): the overnight rate in Brazil, which plays the same role as Libor. Although it is a market rate, the CDI is closely tied to the interest rate that the Brazilian Central Bank sets for monetary policy purposes.
• Bovespa Index (Ibovespa): the main indicator of the Brazilian stock market’s average performance. One reason for its relevance is the integrity of its historical series, which has been calculated regularly, without any methodological change, since its inception in 1968.

6.2. Results

We begin by checking whether XOM-LUV and VALE5-BRAP4 show some degree of mutual equilibrium in the periods considered, by testing cointegration hypotheses (see section 4.1). We used the two-step Engle–Granger cointegration test. The first step regresses one time series on the other by ordinary least squares (OLS). The second step is essentially an augmented Dickey–Fuller unit root test on the OLS residuals, with critical values modified accordingly; cf. Engle and Granger (1987), Enders (2004, chapter 6) and MacKinnon (2010). Once the null hypothesis of no cointegration is rejected, the spread used in the subsequent analyses is simply the OLS residual series; recall equation (1) in section 4.1. The Engle–Granger tests were implemented in EViews 4.0 with the in-sample parts of the data (see section 6.1 for details). Table 1 shows that the data provide enough evidence of cointegration for both XOM-LUV and VALE5-BRAP4, supporting the fundamental/economic conjectures of section 6.1.

We now examine table 2, which still uses the in-sample data sets defined for both pairs in section 6.1. It reports the goodness of fit of three parsimonious ARMA(p, q) models and of the model proposed by Elliott et al., along with some diagnostics
based on the standardized residuals $\upsilon_t^S = \upsilon_t/\sqrt{F_t}$, where $\upsilon_t$ and $F_t$ are obtained from equation (A2). MATLAB 7.6.0 was used for the implementations. The unknown parameters were estimated by maximum likelihood, using the exact log-likelihood function in equation (A3) (see appendix 1).

First, for the spreads from both the US and Brazilian markets, the four models reproduce the data with similar accuracy according to the pseudo-R² and MSE measures. However, the AIC and BIC criteria show that the AR(1) model, the simplest option, offers a slightly better complexity/fit trade-off.

Before turning to the diagnostics, note that if a linear Gaussian state space model is adequate for the data at hand, its standardized residuals should behave like observed values of i.i.d. standard normal random variables. For serial dependence, Ljung–Box tests on both the level and the squared standardized residuals gave good results for all models and for the spreads from both markets. For normality, the Jarque–Bera test and the Kupiec coverage tests both indicate adequacy for the pair XOM-LUV. For the VALE5-BRAP4 spread, the Kupiec tests suggest that the standardized residuals from all four models have tails similar to those of the standard normal distribution, but the Jarque–Bera test reveals discrepancies. Therefore, some care must be exercised in interpreting, and even in using, the conditional probabilities $p_{\mathrm{up}}$ and $p_{\mathrm{down}}$ in equation (5) for trading decisions: they might not be reliable ‘tail’ probabilities.

We now discuss the performance of our pairs trading strategy. Unlike the previous tasks (cointegration testing, parameter estimation and goodness-of-fit analysis), this evaluation also uses the out-of-sample parts of the data sets for both pairs. Thus, in addition to assessing performance over the first year, we investigate whether our strategy makes profits, relative to other investment alternatives, over a further period of about six months without re-estimating any parameter. This tests how robust the methodology as a whole might be in real scenarios, where updating or recalibrating the statistical models for the spread time series may take some time.

The parameter c is set to zero, the long-term mean of the spreads, since these are exactly the OLS residual series from the cointegration regressions. The parameter δ is set to 0.5% to cover operating costs, namely transaction costs and slippage (the difference between the expected price of a trade and the price at which it is actually executed). With these choices of c and δ, a position to buy (sell) the spread is opened only if the spread is less (greater) than −δ (+δ). The thresholds $p^*_{\mathrm{up}}$ and $p^*_{\mathrm{down}}$ for the conditional probabilities $p_{\mathrm{up}}$ and $p_{\mathrm{down}}$ are both set to 80%. Finally, k is fixed at 25: once the spread is bought or sold, the position is closed if the pair does not return to its long-term mean within 25 days at current market prices, an event with a conditional probability of at most 20%.

Table 3 and figure 1 display the results for the pair XOM-LUV for the four linear state space models under investigation. They also show the results of the traditional US benchmarks described in section 6.1, and the performance of what we call the ‘plain strategy’. In the plain strategy, previously studied by Gatev et al. (2006), the ‘spread’ is defined as the ratio between the higher and the lower priced asset. A position in the two assets is opened whenever this spread deviates by more than two historical (sample) standard deviations, and it is unwound when the spread returns to its historical mean. If the prices have not converged by the end of the trading interval, gains and losses are calculated on the last trading day. All returns from the spreads considered in this paper, for both our strategy and the plain strategy, have been calculated according to the directions given in appendix 4.

We use Sharpe ratios as the main criterion for choosing the best strategy, since they measure returns adjusted for market risk (cf. Sharpe 1966, 1994). The Sharpe ratios in table 3 indicate that, over the whole period (including the out-of-sample part), the best trading options are our pairs trading strategy with the AR(2) and ARMA(1,1) models. These two models yield the same cumulative return, historical volatility and maximum drawdown. The Sharpe ratio of the plain strategy is negative and is therefore not shown.

The cumulative and average returns of the AR(2) and ARMA(1,1) models exceed those of the other investment opportunities, except the stock index (S&P). The S&P showed a strong upward trend in the out-of-sample period, visible in each panel of figure 1 as the corresponding return lines after observation 250. Plausible economic explanations for this excellent performance of the US stock market include the expansion of the US economy in the first quarter of 2013 and the agreement reached by the US federal government on the debt ceiling. However, because of its much larger volatility, the S&P had a worse Sharpe ratio and a larger maximum drawdown.

Both our strategy and the plain strategy had low correlations with the S&P. The plain strategy did better on this measure, being virtually uncorrelated with the index. This was expected, since this type of quantitative strategy is supposedly market neutral. On the other hand, in terms of making profits once a position on the spread is opened, our strategy proved considerably superior to the plain strategy: it achieved gains 90% of the time, against 67% for the plain strategy (see the fifth performance measure in table 3). Figure 1 depicts the cumulative returns of the four state space models, together with those of the market indices and the plain strategy, and corroborates the findings in table 3.

Table 4 and figure 2 present the corresponding results for the pair VALE5-BRAP4. Again judging by Sharpe ratios, the best performance comes from the AR(1) model. The Sharpe ratios were negative for the Ibovespa domestic stock index, for the plain strategy and for two of the models used with our strategy. Like all the other models and the plain strategy, the AR(1) model also showed almost no correlation with the Ibovespa. Figure 2 suggests that the cumulative returns of our pairs trading strategy with this best AR(1) model kept an upward trend with relatively low volatility, probably
corroborating the best Sharpe ratio. At specific times, the Ibovespa did deliver the largest returns of all the investment alternatives in the period considered. However, its much riskier behaviour (compare the volatilities and maximum drawdowns in table 4) is noteworthy and certainly contributed to some temporary losses and to a worse cumulative return at the very end of the out-of-sample period. The persistent downward reversals of this index in figure 2 also show this. In terms of the efficiency indicator in table 4, our strategy clearly outperformed the plain strategy. As in the US exercise, the percentages of successful trading positions were in line with the nominal threshold of 80% for the conditional probabilities $p_{\mathrm{up}}$ and $p_{\mathrm{down}}$.

Table 2. Results from in-sample estimations (p-values in parentheses).

| Attribute | XOM-LUV AR(1) | XOM-LUV AR(2) | XOM-LUV ARMA(1,1) | XOM-LUV Elliott | VALE5-BRAP4 AR(1) | VALE5-BRAP4 AR(2) | VALE5-BRAP4 ARMA(1,1) | VALE5-BRAP4 Elliott |
|---|---|---|---|---|---|---|---|---|
| Log-likelihood | −989.044 | −989.044 | −989.044 | −989.081 | −1053.502 | −1060.960 | −1068.300 | −1068.310 |
| Pseudo R² | 0.896 | 0.896 | 0.896 | 0.902 | 0.767 | 0.780 | 0.788 | 0.789 |
| MSE (×10⁻⁴) | 2.299 | 2.298 | 2.299 | 2.161 | 0.905 | 0.857 | 0.8130 | 0.8110 |
| AIC | 7.865 | 7.873 | 7.873 | 7.882 | 8.377 | 8.444 | 8.502 | 8.510 |
| BIC | 7.893 | 7.915 | 7.915 | 7.938 | 8.405 | 8.486 | 8.544 | 8.566 |
| LR Kupiec test (upper tail)* | 0.077 (0.781) | 0.077 (0.781) | 0.077 (0.781) | 0.434 (0.510) | 0.987 (0.320) | 0.987 (0.320) | 0.015 (0.903) | 0.015 (0.903) |
| LR Kupiec test (lower tail)* | 0.434 (0.510) | 0.434 (0.510) | 0.434 (0.510) | 0.434 (0.510) | 2.952 (0.086) | 0.434 (0.510) | 0.434 (0.510) | 0.434 (0.510) |
| Ljung–Box test 1 (20 lags)** | 13.718 (0.845) | 13.727 (0.844) | 13.726 (0.844) | 13.684 (0.846) | 29.706 (0.075) | 23.628 (0.259) | 18.040 (0.585) | 18.035 (0.585) |
| Ljung–Box test 2 (20 lags)*** | 29.557 (0.148) | 26.679 (0.145) | 29.669 (0.145) | 29.582 (0.152) | 13.891 (0.836) | 13.694 (0.832) | 16.381 (0.693) | 16.473 (0.687) |
| Jarque–Bera test | 0.709 (0.685) | 0.706 (0.687) | 0.706 (0.689) | 0.703 (0.688) | 22.602 (0.002) | 24.308 (0.001) | 24.880 (0.001) | 24.914 (0.001) |
| Mean**** | 0.069 | 0.068 | 0.068 | 0.085 | −0.019 | −0.029 | −0.045 | −0.049 |
| Variance**** | 0.999 | 0.999 | 0.999 | 0.997 | 1.004 | 1.003 | 1.002 | 1.002 |

*Likelihood ratio unconditional coverage tests proposed by Kupiec (1995). The upper-tail and lower-tail tests check standardized residual violations of the 95% and 5% standard normal quantiles (that is, 1.65 and −1.65), respectively.
**Performed on the standardized residuals.
***Performed on the squared standardized residuals.
****Sample statistics of the standardized residuals.

Figure 1. Comparison of the cumulative returns: strategy P/L with the pair XOM-LUV, Libor, S&P and plain strategy (whole period analysis).

Table 3. USA market data: performance measures from four different models for the spread and three benchmarks (whole period analysis). The AR(1), AR(2), ARMA(1,1) and Elliott columns refer to our strategy applied to the XOM-LUV spread; Libor, S&P and the plain strategy are the benchmarks.

| Measures | AR(1) | AR(2) | ARMA(1,1) | Elliott | Libor | S&P | Plain strategy |
|---|---|---|---|---|---|---|---|
| Average return | 0.047% | 0.054% | 0.054% | 0.047% | 0.0004% | 0.07% | −0.006% |
| Volatility* | 0.590% | 0.556% | 0.556% | 0.590% | 0.00005% | 0.936% | 0.788% |
| Cumulative return | 14.815% | 17.753% | 17.753% | 14.815% | 0.150% | 25.907% | −3.323% |
| Maximum drawdown** | −5.036% | −3.953% | −3.953% | −5.036% | 0.000% | −9.936% | −19.491% |
| Efficiency*** | 90% | 90% | 90% | 90% | – | – | 67% |
| Sharpe ratio | 1.322 | 1.685 | 1.685 | 1.322 | – | 1.464 | – |
| Correlation**** | 0.183 | 0.176 | 0.176 | 0.183 | 0.053 | 1.000 | 0.017 |

*Standard deviation of the daily returns.
**The maximum drawdown is a market risk measure for a given portfolio: the largest drop, over the period analysed, from a peak in the portfolio value to a subsequent trough (see Karatzas and Shreve (1997) and Magdon-Ismail et al. (2004)).
***Percentage of strategy activations that resulted in profits.
****Correlation between the daily returns of the strategy P/L and those of the equity market (S&P).

Figure 2. Comparison of cumulative returns: strategy P/L with the pair VALE5-BRAP4, CDI, Ibovespa and plain strategy (whole period analysis).

Table 4. Brazilian market data: performance measures of four different models for the spread and three benchmarks (whole period analysis). The AR(1), AR(2), ARMA(1,1) and Elliott columns refer to our strategy applied to the VALE5-BRAP4 spread; CDI, Ibovespa and the plain strategy are the benchmarks.

| Measures | AR(1) | AR(2) | ARMA(1,1) | Elliott | CDI | Ibovespa | Plain strategy |
|---|---|---|---|---|---|---|---|
| Average return | 0.048% | 0.038% | 0.029% | 0.029% | 0.033% | 0.032% | 0.015% |
| Volatility* | 0.923% | 0.815% | 0.730% | 0.730% | 0.006% | 1.362% | 0.531% |
| Cumulative return | 16.746% | 13.155% | 5.245% | 5.245% | 12.304% | 8.428% | 5.260% |
| Maximum drawdown** | −7.737% | −9.718% | −8.867% | −8.867% | −0.045% | −20.853% | −6.931% |
| Efficiency*** | 77% | 82% | 82% | 82% | – | – | 0% |
| Sharpe ratio | 0.256 | 0.056 | – | – | – | – | – |
| Correlation**** | −0.027 | −0.016 | −0.010 | −0.010 | 0.066 | 1.000 | −0.012 |
| %CDI***** | 136.104% | 106.918% | 42.631% | 42.631% | – | 68.505% | 40.822% |

*Standard deviation of the daily returns.
**The maximum drawdown is a market risk measure for a given portfolio: the largest drop, over the period analysed, from a peak in the portfolio value to a subsequent trough (see Karatzas and Shreve (1997) and Magdon-Ismail et al. (2004)).
***Percentage of strategy activations that resulted in profits.
****Correlation between the daily returns of the strategy P/L and those of the equity market (Ibovespa).
*****Cumulative return of the strategy P/L as a percentage of the cumulative return of the CDI.

Lastly, table 5 shows the computational gain, in terms of estimation time, provided by proposition 2. These times refer to a portfolio containing only one pair of assets observed daily. Even so, it is plausible that the augmented model would be excessively time consuming in more demanding settings. Had we applied the proposed modelling and pairs trading strategy to intraday high-frequency data for a portfolio containing several pairs, the estimation times would have increased accordingly. For instance, estimating the augmented Elliott model with k = 25 took about 2.5 min (152 s), whereas the original model took less than three seconds.

Table 5. Computational times (seconds) for maximum likelihood estimation of the models with the pair VALE5-BRAP4 (in-sample analysis).

| Models | Original model | Augmented model (k = 10) | Augmented model (k = 15) | Augmented model (k = 20) | Augmented model (k = 25) |
|---|---|---|---|---|---|
| Elliott | 2.481 | 6.113 | 14.049 | 44.603 | 152.091 |
| AR(1) | 0.579 | 1.125 | 2.250 | 7.513 | 24.086 |
| AR(2) | 0.939 | 1.737 | 4.026 | 12.725 | 41.557 |
| ARMA(1,1) | 0.891 | 2.004 | 5.716 | 18.154 | 58.531 |

7. Discussion

In this paper, we have developed a new pairs trading strategy based on linear state space models and the Kalman filter. Unlike other approaches in the literature, trading decisions are based neither on point forecasts nor on confidence bands. Instead, we examine the conditional probability that the mispriced spread will mean-revert within some pre-established horizon.

The economic motivation behind this strategy is akin to an intuitive argument used for commodities and in interest rate theory. If one asset outperformed the other with very high probability over long periods of time, going long on the former and short on the latter would create an imbalance that the market would notice; cf. Bessembinder et al. (1995), Pindyck (2001) and Nielsen and Schwartz (2004).

To be more specific about the economic intuition of our model, recall that the spread process $S_t$ fluctuates around a zero equilibrium, which it is expected to reach within a certain precision radius δ. This claim rests on the tests to which a trader should submit a ‘potential pair’ of assets to confirm cointegration (see Fasen (2013a, 2013b)). Ex ante, therefore, the spread has a well-established tendency to return to a close-to-zero historical value. Under this long-term equilibrium assumption, when the spread is below (above) the threshold there is a certain probability $p_{\mathrm{up}}$ (respectively, $p_{\mathrm{down}}$) that it will mean-revert. Using a mean-reverting model, such as those considered in this paper, with a fixed divergence risk parameter k amounts to an interim observation period during which the Kalman filter accumulates information.

As is well known, the Kalman filter provides conditional expectations for Gaussian linear state space models (cf. Harvey 1989, Durbin and Koopman 2001). For the models considered in this paper, it is therefore the best (minimum mean squared error) estimator for adapting to the information flow. Taken together, these points show that the augmented model in equation (10) gives a clear indication of whether to go long on the spread, go short on it, or do nothing and remain in the low-risk fixed income market. In other words, the model in equation (10), the building block of the proposed strategy, turns the extra information in the data, as well as that held by informed traders (cf. Barucci 2003, chapter 7, and references therein), into an algorithmic and consistent decision mechanism.

Further practical exercises are still needed, and we recognize that the empirical evidence in this paper is too limited to confirm these economic perspectives. Nevertheless, the two applications in section 6 show that our strategy can be implemented efficiently, and they suggest that this change of direction in the usual pairs trading paradigms might work well.

We close by addressing some points that matter when implementing the strategy in real markets. First, the parameters c, k and δ deserve further investigation. They were held constant in our examples, but nothing prevents them from being estimated or, if one prefers, optimized with the usual back-testing schemes. These parameters could also be used more flexibly. For instance, δ is designed here to cover the transaction costs of both long and short positions at once, but it could be split into a δ₁ for one type of position and a δ₂ for the other. For short positions, a very important cost that anyone adopting a pairs trading strategy (including ours) must consider is the cost of renting (borrowing) the asset. Because transaction fees vary with the type of investment, analysing them helps to determine how suitable our strategy is. More details on rented asset costs can be found at www.bmfbovespa.com.br for the Brazilian market and at nyse.nyx.com for the New York Stock Exchange.

Like the rented asset cost, practically every cost related to our strategy should be classified as a turnover cost. This includes commissions, the bid/ask spread, market impact and opportunity cost. Following Grinold and Kahn (1999, chapter 16), a turnover occurs every time a portfolio is constructed, changed or rebalanced. Possible motivations include new information, risk control and, in our case, the perception (large $p_{\mathrm{up}}$ and $p_{\mathrm{down}}$ probabilities) that a specific change in the portfolio now will lead to a profit in the very near future. Turnover costs are hard to measure and grow with trade size and with the demand for quick execution, so they are difficult to incorporate into cumulative returns and other portfolio performance measures. A further complication, particularly relevant for this paper, is that our two examples come from two entirely different markets with different characteristics and regulations. Since our strategy involves frequent turnover by design, accurate estimates of these costs would affect the realized
12 C. E. de Moura et al.

portfolio value. However, following Grinold and Kahn's point of view, 'trading is itself a portfolio optimization problem, distinct from the portfolio construction problem'; therefore, 'optimal trading can lower transactions costs, though at the expense of additional short-term risk'. With these quotations in mind, we believe that effectively combining such cost-reducing (that is, cost-optimizing) trading schemes with a pairs trading strategy such as the one proposed in this paper deserves much more time and space than we can give it here. We leave this as a possible theme for future papers on pairs trading.

We now turn to the question of market neutrality. The starting point is to recall that virtually any variation in portfolio returns can be explained, at least in part, by market factors. Following the standard finance literature, the natural way of addressing this is to consider some type of factor model for the portfolio, such as the CAPM and APT models (cf. Elton et al. 2014), the Fama and French model (cf. Fama and French 1996) and the asset class factor model (cf. Sharpe 1992, de Roon et al. 2004). We believe that such modelling should incorporate the quite plausible assumption of time-varying coefficients, which would allow us to see precisely the actual exposures to different financial risks and, thus, to identify more reliably the markets to which a portfolio defined by our strategy would be neutral. The time-varying coefficient assumption is justified here because our strategy involves time-varying trading positions on three different assets: the two assets forming the pair and a risk-free asset. Extensions of the factor models cited above that treat the coefficients (the factor exposures) as latent stochastic processes already exist; see, for instance, the dynamic asset class factor models proposed in Swinkels and van der Sluis (2006), Pizzinga et al. (2011) and Marques et al. (2012). Statistically speaking, this calls for the selection and estimation of stochastic coefficient regression models. Coincidentally, the appropriate method for these tasks is linear state space modelling with the Kalman filter. However, although this would rely on the same methodological framework as the statistical analysis in this paper, it would require different implementations (stochastic coefficient regression models are quite different from, for example, the ARMA models considered here). Given space limitations and the great relevance of market neutrality as a mainstream subject in the finance literature, we leave this for future research.

We now take a closer look at a more statistically oriented question: that of distributional assumptions. Strong violations of normality can make the quantities $p_{\text{up}}$ and $p_{\text{down}}$ quite unreliable as proxies for the true conditional probabilities of mean reversion. One alternative for dealing with such situations is to rely on Monte Carlo simulations of future trajectories of the spread $S_t$, $k$ steps ahead. For the ARMA models, this would require modelling the error term with the aid of standardized residuals. A second alternative, which frees one from choosing or modelling error distributions (but is much more demanding computationally), is to adopt a bootstrap procedure to estimate the conditional probabilities of mean reversion. Wall and Stoffer (2002) and Rodriguez and Ruiz (2009) are two papers from a long list of references on bootstrapping state space models, and their methodologies address the aims discussed here.

Finally, we discuss the use of our strategy with high-frequency data. The analysis of such data is complicated by irregular temporal spacing, intra-daily patterns and price discreteness (cf. Ait-Sahalia and Hansen 2010, chapter 7). Another major characteristic of high-frequency data is the strong intra-day seasonal behaviour of volatility, as pointed out by Fouque et al. (2000, chapter 4). A data-generating process with strong seasonal patterns cannot be stationary. Therefore, controlling for these periodic movements before fitting any time series model to the data should be a mandatory first step. In light of these issues, which are typical of high-frequency settings, other state space models would need to be combined with the pairs trading strategy proposed in this paper.

Acknowledgements

Our sincere thanks go to the referees, whose comments, requirements and suggestions were invaluable for improving this paper. We are also truly grateful to Cristiano Fernandes, Adrien Nguyen Huu and Paulo Cezar Carvalho for their very constructive comments. All remaining errors are ours.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Ait-Sahalia, Y. and Hansen, L., Handbook of Financial Econometrics, 2nd ed., 2010 (Springer: New York).
Atherino, R., Pizzinga, A. and Fernandes, C., A row-wise stacking of the runoff triangle: State space alternatives for IBNR reserve prediction. Astin Bull., 2010, 40(2), 917–946.
Avellaneda, M. and Lee, J.H., Statistical arbitrage in the US equities market. Quant. Finance, 2010, 10(7), 761–782.
Baronyan, S., Boduroglu, I. and Sener, E., Investigation of stochastic pairs trading strategies under different volatility regimes. The Manchester School, 2010, 2010 (supplement), 114–134.
Baruci, E., Financial Markets Theory, 2003 (Springer: New York).
Bertram, W.K., Optimal trading strategies for Itô diffusion processes. Physica A, 2009, 388, 2865–2873.
Bertram, W.K., Analytic solutions for optimal statistical arbitrage trading. Physica A, 2010, 389, 2234–2243.
Bessembinder, H., Coughenour, J., Seguin, P. and Smoller, M., Mean reversion in equilibrium asset prices: Evidence from the futures term structure. J. Finance, 1995, 50, 361–375.
Brockwell, P.J. and Davis, R.A., Time Series: Theory and Methods, 2nd ed., 1991 (Springer: New York).
Brockwell, P.J. and Davis, R.A., Introduction to Time Series and Forecasting, 2nd ed., 2003 (Springer: New York).
de Roon, F.A., Nijman, T.E. and Ter Horst, J.R., Evaluating style analysis. J. Empirical Finance, 2004, 11(1), 29–53.
Drezner, Z., Computation of the trivariate normal integral. Math. Comput., 1994, 63, 289–294.
Drezner, Z. and Wesolowsky, G.O., On the computation of the bivariate normal integral. J. Stat. Comput. Simul., 1989, 35, 101–107.
Durbin, J. and Koopman, S.J., Time Series Analysis by State Space Methods, 2001 (Oxford Statistical Science Series: Oxford).
Elliott, R.J. and Krishnamurthy, V., New finite-dimensional filters for parameter estimation of discrete-time linear Gaussian models. IEEE Trans. Autom. Control, 1999, 44, 938–951.
Elliott, R.J., van der Hoek, J. and Malcolm, W.P., Pairs trading. Quant. Finance, 2005, 5(3), 271–276.
Elton, E.J., Gruber, M.J., Brown, S.J. and Goetzmann, W.N., Modern Portfolio Theory and Investment Analysis, 9th ed., 2014 (John Wiley & Sons: Hoboken, NJ).
Enders, W., Applied Econometric Time Series, 2nd ed., 2004 (John Wiley & Sons: Hoboken, NJ).
Engelberg, J., Gao, P. and Jagannathan, R., An anatomy of pairs trading: The role of idiosyncratic news, common information and liquidity. Third Singapore International Conference on Finance, 2009.
Engle, R. and Granger, C., Co-integration and error correction: Representation, estimation, and testing. Econometrica, 1987, 55(2), 251–276.
Fama, E.F. and French, K.R., Multifactor explanations of asset pricing anomalies. J. Finance, 1996, 51(1), 55–84.
Fasen, V., Statistical estimation of multivariate Ornstein–Uhlenbeck processes and applications to co-integration. J. Econometrics, 2013a, 172, 325–337.
Fasen, V., Time series regression on integrated continuous-time processes with heavy and light tails. Econometric Theory, 2013b, 29, 28–67.
Fouque, J.-P., Papanicolaou, G. and Sircar, K.R., Derivatives in Financial Markets with Stochastic Volatility, 2000 (Cambridge University Press: Cambridge).
Gatev, E., Goetzmann, W. and Rouwenhorst, K., Pairs trading: Performance of a relative value arbitrage rule. Rev. Financial Stud., 2006, 19, 797–827.
Genz, A., Numerical computation of multivariate normal probabilities. J. Comput. Graphical Stat., 1992, 1, 141–149.
Genz, A., Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Stat. Comput., 2004, 14(3), 251–260.
Grinold, R.C. and Kahn, R.N., Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk, 2nd ed., 1999 (McGraw-Hill Education: New York).
Hamilton, J.D., Time Series Analysis, 1994 (Princeton University Press: Princeton, NJ).
Harvey, A.C., Forecasting, Structural Time Series Models and the Kalman Filter, 1989 (Cambridge University Press: Cambridge).
Harvey, A.C., Time Series Models, 2nd ed., 1993 (Harvester Wheatsheaf: Hemel Hempstead).
Huck, N., Pairs selection and outranking: An application to the S&P 100 index. Eur. J. Operational Res., 2009, 196, 819–825.
Huck, N., Pairs trading and outranking: The multi-step-ahead forecasting case. Eur. J. Operational Res., 2010, 207, 1702–1716.
Jacobs, B. and Levy, K., Market Neutral Strategies, 2005 (John Wiley & Sons: Hoboken, NJ).
Karatzas, I. and Shreve, S.E., Brownian Motion and Stochastic Calculus, 1997 (Springer: New York).
Kaufman, P.J., New Trading Systems and Methods, 4th ed., 2005 (John Wiley & Sons: Hoboken, NJ).
Kupiec, P., Techniques for verifying the accuracy of risk management models. J. Derivatives, 1995, 3, 73–84.
MacKinnon, J.G., Critical Values for Cointegration Tests, Queen's Economics Department Working Paper No. 1227, Queen's University, 2010.
Magdon-Ismail, M., Atiya, A.F., Pratap, A. and Abu-Mostafa, Y.S., On the maximum drawdown of a Brownian motion. J. Appl. Probab., 2004, 41, 147–161.
Marques, R., Pizzinga, A. and Vereda, L., Restricted Kalman filter applied to dynamic style analysis of actuarial funds. Appl. Stochastic Models Bus. Ind., 2012, 28, 558–570.
Mori, M. and Ziobrowski, A., Performance of pairs trading strategy in the U.S. REIT market. Real Estate Economics, 2011, 39(3), 409–428.
Nielsen, M. and Schwartz, E., Theory of storage and the pricing of commodity claims. Rev. Derivatives Res., 2004, 7, 5–24.
Nicholas, J.G., Market-Neutral Investing: Long/Short Hedge Fund Strategies, 2000 (Bloomberg Press: Princeton, NJ).
Perlin, M.S., Evaluation of pairs trading strategy at the Brazilian financial market. J. Derivatives Hedge Funds, 2007, 15, 122–136.
Pindyck, R., The dynamics of commodity spot and futures markets: A primer. Energy J., 2001, 22(3), 1–29.
Pizzinga, A., Vereda, L. and Fernandes, C., A dynamic style analysis of exchange rate funds: The case of Brazil at the 2002 election. Adv. Appl. Stat. Sci., 2011, 6, 111–135.
Pole, A., Statistical Arbitrage: Algorithmic Trading Insights and Techniques, 2007 (John Wiley & Sons: Hoboken, NJ).
Rampertshammer, S., An Ornstein–Uhlenbeck Framework for Pairs Trading, 2007 (Department of Mathematics and Statistics of the University of Melbourne). Unpublished Note.
Rodriguez, A. and Ruiz, E., Bootstrap prediction intervals in state space models. J. Time Ser. Anal., 2009, 30(2).
Ross, S., The arbitrage theory of capital asset pricing. J. Economic Theory, 1976, 13, 341–360.
Sharpe, W.F., Mutual fund performance. J. Bus., 1966, 39, 119–138.
Sharpe, W.F., Asset allocation: Management style and performance measurement. J. Portfolio Manage., 1992, (winter), 7–19.
Sharpe, W.F., The Sharpe ratio. J. Portfolio Manage., 1994, 21, 49–58.
Shumway, R.H. and Stoffer, D.S., Time Series Analysis and its Applications (With R Examples), 2nd ed., 2006 (Springer: New York).
Swinkels, L. and van der Sluis, P.J., Return-based style analysis with time-varying exposures. Eur. J. Finance, 2006, 12, 529–552.
Tourin, A. and Yan, R., Dynamic pairs trading using the stochastic control approach. J. Economic Dyn. Control, 2013, 37, 1972–1981.
Triantafyllopoulos, K. and Montana, G., Dynamic modeling of mean-reverting spreads for statistical arbitrage. Comput. Manage. Sci., 2009, 8, 23–49.
Vidyamurthy, G., Pairs Trading, Quantitative Methods and Analysis, 2004 (John Wiley & Sons: Hoboken, NJ).
Wall, K.D. and Stoffer, D.S., A state space approach to bootstrapping conditional forecasts in ARMA models. J. Time Ser. Anal., 2002, 23(6).
Wissner-Gross, A.D. and Freer, C.E., Relativistic statistical arbitrage. Phys. Rev. E, 2010, 82, 056104.

Appendix 1. Linear state space models & the Kalman filter

By a Gaussian linear state space model, we mean the following measurement equation, state equation and initial state vector:

$$
\begin{aligned}
Y_t &= Z_t\alpha_t + d_t + \varepsilon_t, && \varepsilon_t \sim \mathrm{NID}(0, H_t),\\
\alpha_{t+1} &= T_t\alpha_t + c_t + R_t\eta_t, && \eta_t \sim \mathrm{NID}(0, Q_t),\\
\alpha_1 &\sim \mathrm{N}(a_1, P_1).
\end{aligned}
\tag{A1}
$$

The former (measurement) equation is an affine function relating the observed $p$-variate time series $Y_t$ to the generally unobserved $m$-variate state vector $\alpha_t$, and the latter (state) equation stipulates the state evolution through a Markovian structure. The random errors $\varepsilon_t$ and $\eta_t$ are independent (in time, of each other and of $\alpha_1$). The system matrices $Z_t$, $d_t$, $H_t$, $T_t$, $c_t$, $R_t$ and $Q_t$ are deterministic or, at most, depend on past values of $Y_t$. In the latter case, Harvey (1989, section 3.7) refers to equation (A1) as a conditionally Gaussian state space model.

For a given time series of size $n$ and any time instants $t, j \in \{1, 2, \dots, n\}$, define $\mathcal{F}_j \equiv \sigma(Y_1, \dots, Y_j)$, $a_{t|j} \equiv E(\alpha_t \mid \mathcal{F}_j)$ and $P_{t|j} \equiv \mathrm{Var}(\alpha_t \mid \mathcal{F}_j)$. Kalman filtering consists of recursive equations for these first- and second-order conditional moments, corresponding to one-step-ahead prediction ($j = t - 1$) and smoothing ($j = n$). The prediction formulae, initialized at $a_{1|0} = a_1$ and $P_{1|0} = P_1$, are given below:

$$
\begin{aligned}
\upsilon_t &= Y_t - Z_t a_{t|t-1} - d_t, & F_t &= Z_t P_{t|t-1} Z_t' + H_t,\\
K_t &= T_t P_{t|t-1} Z_t' F_t^{-1}, & L_t &= T_t - K_t Z_t, \qquad t = 1, \dots, n,\\
a_{t+1|t} &= T_t a_{t|t-1} + c_t + K_t\upsilon_t, & P_{t+1|t} &= T_t P_{t|t-1} L_t' + R_t Q_t R_t'.
\end{aligned}
\tag{A2}
$$

The derivations of equation (A2) can be found in Durbin and Koopman (2001). Several other references on this subject deserve mention, such as the books by Harvey (1989, 1993), Brockwell and Davis (1991, 2003), Hamilton (1994), and Shumway and Stoffer (2006).

In practice, the system matrices include unknown parameters that must be estimated. By grouping all unknown parameters of the model
described in (A1) in a vector $\psi$, and denoting the corresponding parametric space by $\Psi$, one can obtain an exact log-likelihood function using some outputs of equation (A2):

$$
\log L(\psi) = -\frac{np}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{n}\left(\log|F_t| + \upsilon_t' F_t^{-1}\upsilon_t\right), \qquad \forall \psi \in \Psi.
\tag{A3}
$$

The maximum likelihood estimator of $\psi$ is defined by $\hat{\psi} \equiv \arg\max_{\psi \in \Psi} \log L(\psi)$. When the normality assumption for $\varepsilon_t$ and $\eta_t$ is violated, equation (A3) should be viewed as a quasi log-likelihood function and $\hat{\psi}$, in turn, as a quasi maximum likelihood estimator.

Appendix 2. Proof of proposition 1

From the second equation of (2), it follows that

$$
x_t = a + (1 - b)x_{t-1} + C\eta_t \equiv a + Bx_{t-1} + \eta_t^*,
$$

where $\eta_t^* \sim \mathrm{N}(0, C^2)$. Therefore, $(1 - BL)x_t = x_t - Bx_{t-1} = a + \eta_t^*$, leading to

$$
x_t = \frac{1}{1 - BL}\,a + \frac{1}{1 - BL}\,\eta_t^* = \frac{a}{1 - B} + \frac{\eta_t^*}{1 - BL},
\tag{B1}
$$

where $L$ is the usual lag operator (recall that $0 < b < 2$, so that $|B| = |1 - b| < 1$ and $1 - BL$ is invertible). Now, we substitute equation (B1) into the first equation of (2) to get

$$
\begin{aligned}
S_t &= \frac{a}{1 - BL} + \frac{\eta_t^*}{1 - BL} + D\varepsilon_t\\
&= \frac{a}{1 - BL} + \frac{\eta_t^*}{1 - BL} + \varepsilon_t^*,
\end{aligned}
\tag{B2}
$$

where $\varepsilon_t^* \sim \mathrm{N}(0, D^2)$. Applying the operator $(1 - BL)$ to both sides of equation (B2),

$$
S_t^* \equiv (1 - BL)S_t = a + \eta_t^* + \varepsilon_t^* - B\varepsilon_{t-1}^*.
\tag{B3}
$$

From equation (B3), it is straightforward to see that

$$
\gamma(0) = C^2 + (1 + B^2)D^2, \qquad \gamma(1) = -BD^2, \qquad \gamma(k) = 0, \quad k \geq 2,
\tag{B4}
$$

where $\gamma(k) = \mathrm{Cov}(S_t^*, S_{t-k}^*)$, $k = 0, \pm 1, \pm 2, \dots$. From equation (B4) and Brockwell and Davis (1991, p. 89, proposition 3.2.1), it follows that $S_t^* \sim \mathrm{MA}(1)$. ∎

Appendix 3. Proof of proposition 2

We have to prove that the likelihood functions of models $\dagger$ (original) and $\tilde{\dagger}$ (augmented) are equal; in other words, $L^{\dagger} = L^{\tilde{\dagger}}$ over the whole parametric space. It suffices to show that $\upsilon_t^{\dagger} = \upsilon_t^{\tilde{\dagger}}$ for each $t = 1, \dots, n$. Notice that

$$
\upsilon_t^{\dagger} = Y_t - Z_t^{\dagger} a_{t|t-1}^{\dagger} - d_t^{\dagger}
\quad\text{and}\quad
\upsilon_t^{\tilde{\dagger}} = Y_t - Z_t^{\tilde{\dagger}} a_{t|t-1}^{\tilde{\dagger}} - d_t^{\tilde{\dagger}},
\tag{C1}
$$

where $a_{t|t-1}^{\tilde{\dagger}} \equiv E(\alpha_t^{\tilde{\dagger}} \mid \mathcal{F}_{t-1})$ and $d_t^{\dagger} = d_t^{\tilde{\dagger}}$. Under the augmented model in equation (10), and using the convention that $\prod_{j=1}^{k} T_j = T_1 \cdots T_k$ (i.e. the product is taken from left to right in increasing order), the recursive solution of the measurement equation, for an arbitrary $s = 1, \dots, t - 1$, is

$$
\begin{aligned}
Y_s^{\tilde{\dagger}} ={}& \begin{bmatrix} Z_s & 0 & \cdots & 0 \end{bmatrix}
\left[\prod_{j=1}^{s-1}\begin{bmatrix} T_j & 0 & \cdots & 0\\ I & 0 & \cdots & 0\\ \vdots & \ddots & & \vdots\\ 0 & \cdots & I & 0 \end{bmatrix}\right]\tilde{\alpha}_1\\
&+ \begin{bmatrix} Z_s & 0 & \cdots & 0 \end{bmatrix}\sum_{j=1}^{s-2}\left[\prod_{k=j+1}^{s-1}\begin{bmatrix} T_k & 0 & \cdots & 0\\ I & 0 & \cdots & 0\\ \vdots & \ddots & & \vdots\\ 0 & \cdots & I & 0 \end{bmatrix}\right]\left(\begin{bmatrix} R_j\\ 0\\ \vdots\\ 0 \end{bmatrix}\eta_j + \begin{bmatrix} c_j\\ 0\\ \vdots\\ 0 \end{bmatrix}\right)\\
&+ \begin{bmatrix} Z_s & 0 & \cdots & 0 \end{bmatrix}\left\{\begin{bmatrix} c_{s-1}\\ 0\\ \vdots\\ 0 \end{bmatrix} + \begin{bmatrix} R_{s-1}\\ 0\\ \vdots\\ 0 \end{bmatrix}\eta_{s-1}\right\} + d_s + \varepsilon_s,
\end{aligned}
\tag{C2}
$$

where $\tilde{\alpha}_1$ is an initial state vector with appropriate first and second moments.

Now, we observe that

$$
\prod_{j=1}^{s-1}\begin{bmatrix} T_j & 0 & \cdots & 0\\ I & 0 & \cdots & 0\\ \vdots & \ddots & & \vdots\\ 0 & \cdots & I & 0 \end{bmatrix}
=
\begin{bmatrix}
\prod_{j=1}^{s-1} T_j & 0 & 0 & \cdots & 0\\
\prod_{j=1}^{s-2} T_{j+1} & 0 & 0 & \cdots & 0\\
\prod_{j=1}^{s-3} T_{j+2} & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
T_{s-1} & 0 & 0 & \cdots & 0\\
I & 0 & 0 & \cdots & 0
\end{bmatrix},
\tag{C3}
$$

$$
\sum_{j=1}^{s-2}\left[\prod_{k=j+1}^{s-1}\begin{bmatrix} T_k & 0 & \cdots & 0\\ I & 0 & \cdots & 0\\ \vdots & \ddots & & \vdots\\ 0 & \cdots & I & 0 \end{bmatrix}\right]\left(\begin{bmatrix} R_j\\ 0\\ \vdots\\ 0 \end{bmatrix}\eta_j + \begin{bmatrix} c_j\\ 0\\ \vdots\\ 0 \end{bmatrix}\right)
=
\sum_{j=1}^{s-2}\begin{bmatrix}
\prod_{k=j+1}^{s-1} T_k\,(R_j\eta_j + c_j)\\
\prod_{k=j+1}^{s-2} T_{k+1}\,(R_j\eta_j + c_j)\\
\vdots\\
T_{s-1}(R_j\eta_j + c_j)\\
R_j\eta_j + c_j\\
0
\end{bmatrix}.
\tag{C4}
$$

Placing (C3) and (C4) properly in (C2) implies

$$
Y_s^{\tilde{\dagger}} = Z_s\left\{\left[\prod_{j=1}^{s-1} T_j\right]\alpha_1 + \sum_{j=1}^{s-2}\left[\prod_{k=j+1}^{s-1} T_k\right](R_j\eta_j + c_j) + c_{s-1} + R_{s-1}\eta_{s-1}\right\} + d_s + \varepsilon_s,
\tag{C5}
$$

which coincides with the recursive solution of the measurement equation of the original model (A1). To conclude the proof, combine equations (C5) and (C1). ∎

Appendix 4. Spread returns: plain strategy & our pairs trading strategy

The spread for the plain strategy, whenever it is activated, is defined as the ratio of the price of the higher-priced asset of the pair to that of the lower-priced one. For instance, assume that at time $t$ the prices $P_{t,1}$ and $P_{t,2}$ of assets A1 and A2 are such that $P_{t,1} \geq P_{t,2}$. Then, the spread is $S_t = P_{t,1}/P_{t,2}$. Using this definition, the daily return depends on how $S_t$ deviates from its historical mean. If $S_t$ is higher than its historical mean by at least two historical standard deviations, the return is

$$
\frac{P_{t+1,2}}{P_{t,2}} - \frac{P_{t+1,1}}{P_{t,1}};
$$

if $S_t$ is lower than its historical mean by at least two historical standard deviations, the return is, in turn,

$$
\frac{P_{t+1,1}}{P_{t,1}} - \frac{P_{t+1,2}}{P_{t,2}}.
$$
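The plain-strategy rule above can be sketched in Python as follows. This is a minimal sketch, not the authors' code, and it rests on assumptions the text does not pin down: the "historical" mean and standard deviation of the spread are taken over an expanding window that includes day $t$, the sample standard deviation uses `ddof=1`, trading starts only after a hypothetical `min_history` of 20 observations, and days without a signal earn zero (the fixed-income leg and transaction costs are ignored). The function name `plain_strategy_returns` is illustrative.

```python
import numpy as np


def plain_strategy_returns(p1, p2, n_std=2.0, min_history=20):
    """Daily returns of the plain ratio-spread pairs strategy (sketch).

    Assumptions not fixed by the text: the historical mean and standard
    deviation of the spread use all observations up to and including day t
    (expanding window); no trade is made before `min_history` observations;
    days without a signal earn 0 (fixed-income leg and costs are ignored).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Spread = higher price / lower price, re-evaluated every day
    hi_is_1 = p1 >= p2
    spread = np.where(hi_is_1, p1 / p2, p2 / p1)
    returns = np.zeros(len(p1) - 1)
    for t in range(min_history - 1, len(p1) - 1):
        history = spread[: t + 1]
        mu, sigma = history.mean(), history.std(ddof=1)
        if sigma == 0.0:
            continue  # no dispersion yet, hence no signal
        z = (spread[t] - mu) / sigma
        # Gross one-day returns P_{t+1}/P_t of each asset
        g1, g2 = p1[t + 1] / p1[t], p2[t + 1] / p2[t]
        g_hi, g_lo = (g1, g2) if hi_is_1[t] else (g2, g1)
        if z >= n_std:
            # Spread unusually high: short the high-priced asset, long the low one
            returns[t] = g_lo - g_hi
        elif z <= -n_std:
            # Spread unusually low: long the high-priced asset, short the low one
            returns[t] = g_hi - g_lo
    return returns


# Usage: A1 alternates between 110 and 111, then jumps to 130 and falls to 120
p1 = [110.0 if d % 2 == 0 else 111.0 for d in range(30)] + [130.0, 120.0]
p2 = [100.0] * 32
r = plain_strategy_returns(p1, p2)
print(np.nonzero(r)[0], r[30])
```

In this toy series the only signal fires on day 30, when the spread 1.30 sits far more than two standard deviations above its mean; the position is short A1 and long A2, so the sketch records $1 - 120/130 \approx 0.077$ for that day, matching the first return formula above with $P_{t,1} \geq P_{t,2}$.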
On the other hand, Vidyamurthy (2004, pp. 80–82) derived a direct way of obtaining the return for the pairs trading strategy proposed in section 5, which is justified by some elements of the definition of a pair of assets (recall section 4.1). Assume that $\log(P_{t,1})$ and $\log(P_{t,2})$ are cointegrated (that is, A1 and A2 form a pair) with mean $\alpha$ and cointegration coefficient $\beta$. Suppose the investor takes a long position on asset A1 and a short position on asset A2 (that is, the investor buys the spread) and maintains the position until at least time $t + i$ ($i = 1, 2, \dots, k$, where $k$ denotes the divergence risk parameter, set in advance; cf. section 5.3). Then the corresponding return from time $t$ to $t + i$ is given by

$$
\begin{aligned}
\log\frac{P_{t+i,1}}{P_{t,1}} - \beta\log\frac{P_{t+i,2}}{P_{t,2}}
&= \log(P_{t+i,1}) - \log(P_{t,1}) - \beta\left(\log(P_{t+i,2}) - \log(P_{t,2})\right)\\
&= \log(P_{t+i,1}) - \beta\log(P_{t+i,2}) - \left(\log(P_{t,1}) - \beta\log(P_{t,2})\right)\\
&= \log(P_{t+i,1}) - \alpha - \beta\log(P_{t+i,2}) - \left(\log(P_{t,1}) - \alpha - \beta\log(P_{t,2})\right)\\
&= S_{t+i} - S_t;
\end{aligned}
\tag{D1}
$$

that is, once a long position is taken on the spread, its return is simply the difference between the spread values at times $t + i$ and $t$. If, instead, the investor sells asset A1 and buys asset A2 (the spread is now being sold), virtually the same derivation as in equation (D1) shows that the spread return becomes $S_t - S_{t+i}$, the negative of the return in equation (D1).
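Because $\alpha$ cancels in the third line of (D1), the identity holds for any value of $\alpha$, and the sign simply flips for a short spread position. The Python sketch below checks this numerically. The synthetic prices and the values $\alpha = 0.4$ and $\beta = 1.2$ are arbitrary illustrative choices, not estimates from the paper, and the function names `log_spread` and `spread_trade_return` are hypothetical.

```python
import numpy as np


def log_spread(p1, p2, alpha, beta):
    """Spread S_t = log(P_{t,1}) - alpha - beta * log(P_{t,2}), as in (D1)."""
    return np.log(p1) - alpha - beta * np.log(p2)


def spread_trade_return(p1, p2, beta, t, i, long_spread=True):
    """Return from t to t+i of a spread position (left-hand side of (D1)).

    Long spread: long A1, short A2 weighted by beta in log-return terms.
    Short spread: the opposite position, so the sign flips.
    """
    r = np.log(p1[t + i] / p1[t]) - beta * np.log(p2[t + i] / p2[t])
    return r if long_spread else -r


# Illustrative check with synthetic prices; alpha and beta are arbitrary
# values chosen for the example, not estimates from the paper.
rng = np.random.default_rng(0)
p2 = 50.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(100)))
p1 = np.exp(0.4 + 1.2 * np.log(p2) + 0.02 * rng.standard_normal(100))
alpha, beta = 0.4, 1.2
s = log_spread(p1, p2, alpha, beta)
t, i = 10, 5
lhs = spread_trade_return(p1, p2, beta, t, i)
print(np.isclose(lhs, s[t + i] - s[t]))  # True: identity (D1)
```

Recomputing the spread with a different $\alpha$ (for example $\alpha = 0$) leaves $S_{t+i} - S_t$ unchanged, which is why only $\beta$ is needed to size the position.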