Pairs Trading Strategy Using Kalman Filter
A pairs trading strategy based on linear state space models and the Kalman
filter
To cite this article: Carlos Eduardo de Moura, Adrian Pizzinga & Jorge Zubelli (2016): A pairs
trading strategy based on linear state space models and the Kalman filter, Quantitative
Finance, DOI: 10.1080/14697688.2016.1164886
(Received 28 December 2014; accepted 18 February 2016; published online 25 April 2016)
Downloaded by [Adrian Pizzinga] at 00:26 27 May 2016
Among many strategies for financial trading, pairs trading has played an important role in practical
and academic frameworks. Loosely speaking, it involves a statistical arbitrage tool for identifying and
exploiting the inefficiencies of two long-term, related financial assets. When a significant deviation
from this equilibrium is observed, a profit might result. In this paper, we propose a pairs trading
strategy entirely based on linear state space models designed for modelling the spread formed with a
pair of assets. Once an adequate state space model for the spread is estimated, we use the Kalman filter
to calculate conditional probabilities that the spread will return to its long-term mean. The strategy is
activated upon large values of these conditional probabilities: the spread is bought or sold accordingly.
Two applications with real data from the US and Brazilian markets are offered, and even though they
probably rely on limited evidence, they already indicate that a very basic portfolio consisting of a
sole spread outperforms some of the main market benchmarks.
Keywords: Kalman filter; Mean-reverting conditional probabilities; Pair; Pairs trading; Spread; State
space models; Statistical arbitrage
∗ Corresponding author. Email: [email protected]
© 2016 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

Pairs trading is a type of statistical arbitrage that was first implemented in the mid-80s by Nunzio Tartaglia and his group at Morgan Stanley (cf. Vidyamurthy 2004). Currently, pairs trading is widely used by investment banks and hedge funds. In general terms, a pairs trading strategy aims to identify and exploit market inefficiencies observed with two long-term, related assets, mostly by using statistical methods. The two assets are said to form a pair. When a significant deviation of the prices between the two assets is detected, a trading position is taken: the higher priced asset is sold (this is the so-called short position by market practitioners) and the lower priced asset is bought (that is, a long position is taken), with the hope that mispricing will correct to the long-term equilibrium value (cf. Elliott et al. 2005, Vidyamurthy 2004).

In this paper, we consider two linear state space models that are appropriate for modelling spreads (stationary linear combinations of long-term related assets), with the intent of testing a new quantitative strategy involving pairs trading. The first model is the unobserved component model proposed by Elliott et al. (2005). Such a model, which has a Gaussian linear state space form, is a discrete-time version of the linear mean-reverting Ornstein–Uhlenbeck model. The second model is the traditional stationary autoregressive moving average, or ARMA, model (cf. Brockwell and Davis 1991, 2003, Hamilton 1994, Enders 2004), and its particular specifications are also dealt with in this paper under appropriate linear state space forms. In fact, we will prove that this second class of models, even though they lack theoretical finance support, encompasses the proposal made by Elliott et al. (2005) as a particular case.

Subsequently, we develop a methodology for calculating conditional probabilities (given the past and actual spread data) that the spread will return to its long-term mean k-steps ahead (the frequency can be daily or intra-daily) whenever significant deviations are observed. We propose an alternative augmented state space form for a previously selected model estimated using spread data, and with this enlarged state space form, we apply the Kalman filter k-steps-ahead prediction (see, for instance, Harvey 1989, Durbin and Koopman 2001) to obtain conditional mean vectors and covariance matrices of the k future spreads. These first- and second-order moments are all that are needed for calculating the conditional probabilities previously mentioned. The quantitative strategy we pursue here is activated according to the following rule: if the spread is found to be considerably below (above) its long-term mean and the conditional probability that the spread will increase above (decrease below) its long-term mean by k-steps ahead is reasonably large, buy (sell) the spread.

The contribution made by this paper to the literature on pairs trading is the paradigm related to the trading rule briefly
described above: one takes positions on the assets forming the pair by checking whether the spread is too positive or too negative and also by examining the probability that the spread will not take too long to cross its long-term value (which is the probability that a profit will result soon).

The paper is organized as follows. Section 2 reviews the literature on pairs trading, without claiming exhaustiveness. Section 3 discusses pair trading from the statistical arbitrage standpoint, enumerating some of its main practical features. Section 4 presents the two aforementioned linear state space models, discusses their mathematical properties and embeds each of them into the state space modelling/Kalman filter framework. Section 5 formally discusses how the conditional probabilities that the spread will mean-revert are calculated, addresses the corresponding computational issues and describes step by step how the quantitative strategy is implemented. Section 6 offers two applications to real data from the US and Brazilian markets and compares the performance of the proposed strategy with the main benchmarks and with a former pairs trading strategy already used by market practitioners. For each of the examples, the justification for the pair used is initially addressed using a fundamental analysis of the expected equilibrium between the two corresponding assets (section 6.1); in the sequel, the advocated equilibrium relation is assessed using proper econometric cointegration tests (beginning of section 6.2). An analysis of the computational effort related to estimation and goodness of fit is included as well. Section 7 discusses the main results obtained in the former section, provides some economic arguments in favour of our methodology and lists some comments on the use of the latter in real scenarios. The appendices review the Kalman filter methods used in the paper, provide the proofs of the technical results and explain some of the financial returns calculated in the applications.

2. Pairs trading: a glimpse at the literature

This section discusses earlier studies on pairs trading strategies, focusing mainly on spread modelling. A feature common to most of these models reviewed in section 2.1 consists of recognizing the spread associated with a pair of stocks (cf. the naive definition of ‘pair’ already given and used in section 1) as some kind of mean-reverting stochastic process, the parameters of which are estimated using financial market data. In section 2.2, we explain how this paper fits within the literature.

2.1. The review

The first reference that we discuss here is Vidyamurthy (2004). In this book, a good background is provided on the pairs trading universe as well as several techniques for choosing pairs trading, with a focus on cointegration tests. Moreover, the author explains how pairs trading works and surveys some methods for addressing the problem in real settings, for instance, common trends/cointegration models, arbitrage pricing theory (APT), distance measures and state space models/Kalman filter.

Gatev et al. (2006) studied pairs trading strategies in the US equity market with daily data over the period 1962–2002. In their study, stocks from companies that had at least one day out of business were discarded. A pair formation for each stock was found by minimizing the squared deviations between the two normalized daily price series, where the dividends were reinvested. The basic strategy consists of opening a position in a pair when prices diverge by more than two historical standard deviations and unwinding the position whenever the prices cross each other. Should prices not cross after the end of the trading interval, gains and losses are calculated at the end of the last trading day. The performance of this strategy used by Gatev et al. (2006) was addressed for a Brazilian stock market case by Perlin (2007). The latter investigated the period from 2000 until 2006 and tested different conditions of long and short, ranging between 1.5 and 3 standard deviations. For the data set used, the best options were those between 1.5 and 2 standard deviations.

A very seminal paper entirely dedicated to state space modelling for spreads was authored by Elliott et al. (2005), where a Gaussian linear state space model for the mean-reversion behaviour of the spread between paired stocks was developed under a continuous time setting. It is assumed that the ‘observed’ spread S_t is a noisy observation of some mean-reverting ‘unobserved’ spread x_t. The set-up for parameter estimation is based on a version of the expectation–maximization algorithm previously developed in Elliott and Krishnamurthy (1999). The pairs trading strategy proposed is the following: if S_t is larger/smaller than the one-step-ahead estimate x̂_{t|t−1}, then the spread is regarded as too large/small, and thus, the trader could take a short/long position in the spread portfolio. Therefore, a profit is expected whenever a price correction occurs.

Another paper on state space models for spread data is that by Triantafyllopoulos and Montana (2009), where the modelling framework proposed in Elliott et al. (2005) is extended in several ways. First, they introduce time-varying autoregressive (or mean-reverting) parameters, which potentially allows the model to adapt itself to sudden changes in the data. Second, they develop and implement a Bayesian approach for estimating the parameters and provide an on-line estimation scheme. Finally, they advocate a procedure known as flexible least squares to estimate the cointegration coefficient recursively, unveiling a possible time-varying cointegration relationship between the two asset prices.

We now discuss two other works published in 2009. The first is a paper by Bertram (2009), where the theory of Itô diffusion processes comes into play for determining optimal trading strategies that also take into account transaction costs. The empirical content of the paper makes use of the Ornstein–Uhlenbeck modelling of the spread of a security traded simultaneously in both the Australian and New Zealand stock markets. The second paper, by Huck (2009), offers a data-driven and multi-criteria decision method for selecting pairs and implements the latter using weekly returns of S&P100 stocks.

In 2010, at least five papers addressing pairs trading techniques were published. Bertram (2010) complements his work published in 2009 by deriving analytical solutions for the expected return, the variance of the return and the expected trade length of his continuous time trading strategy; these parameters are used for constructing optimal strategies. Similarly,
Huck (2010) authored a continuation paper (of the one just surveyed in the last paragraph), where the multi-criteria decision method is enhanced by adding neural network forecasting techniques. The paper by Avellaneda and Lee (2010) employed principal component analysis with sector exchange-traded funds for extracting risk factors. Baronyan et al. (2010) investigated 14 market-neutral trading strategies combined with different trading methods and pairs selection methods. From empirical evidence of weekly data on stocks that comprise the Dow Jones 30 index, they find that the performance of market-neutral equity trading is superior in the complicated year of 2008, the first one of the global financial crisis. Finally, Wissner-Gross and Freer (2010) proposed an econophysical perspective to generalize statistical arbitrage trading strategies for space-like separated world trading locations: one of their findings is that optimal intermediate locations exist between trading centres.

Continuing with the literature review, we now mention the paper by Mori and Ziobrowski (2011). In this mostly empirical work, the effectiveness of pairs trading in the US Real Estate Investment Trust market is compared with that in the US general stock market over the period 1987–2008. The authors conclude that the former market was more profitable than the latter between 1993 and 2000, after which pairs trading showed similar performances in both markets.

To conclude, we review three recent works on pairs trading theory and methods. The first two are works by Fasen (2013a, 2013b), who essentially proposes least squares estimators for the parameters of several versions of the Ornstein–Uhlenbeck model and fully investigates their statistical properties, such as consistency and asymptotic distributions. The usual t-ratio and Wald tests are also investigated in terms of their asymptotic behaviour. In the third paper by Tourin and Yan (2013), a dynamic model for pairs trading based on the theory of optimal stochastic control is proposed and illustrated using minute-by-minute historical data on two stocks traded on the New York Stock Exchange.

2.2. This paper’s contribution

Given the articles and books reviewed here on pairs trading, in this paper, we intend to complement the findings of Elliott et al. (2005) along three directions:
• A more general class of possible probabilistic descriptions (the ARMA models) for a given spread time series is proposed. As we demonstrate, such a class encapsulates the mean-reverting model by Elliott et al. as a particular case.
• We create a new quantitative pairs trading strategy based upon outputs (specifically: some conditional probabilities) of the stochastic model selected and estimated using spread data.
• The whole procedure (model estimation and trading strategy) is implemented using real financial time series. The results are compared with those from other investment alternatives, including the simple pairs trading strategy proposed by Gatev et al. (2006) and re-considered by Perlin (2007).

Finally, we mention that our statistical framework is quite different from that of Triantafyllopoulos and Montana (2009), who work with a model that, by its very definition, has to be recognized under the conditionally Gaussian state space approach (see appendix 1). Moreover, Triantafyllopoulos and Montana make use of the Bayesian perspective for estimating the model parameters. On the other hand, we accomplish such tasks in our model using the maximum likelihood method.

3. Statistical Arbitrage Strategies

Quoting Kaufman (2005), ‘when the two legs of a spread are highly correlated and therefore the opportunity for profit from price divergence is of short duration, the trade is called an arbitrage. True arbitrage has, theoretically, no trading risk, however it is offset by small profits and limited opportunity for volume’.

Statistical arbitrage is a class of strategies widely used by hedge funds and proprietary traders. The distinctive feature of such strategies is that profits can be made by exploiting the statistical mispricing of one or more assets, based on their regular behaviour. Despite the use of the term ‘arbitrage’, such a class is not riskless. One simple and very popular strategy that fits in with the definition of statistical arbitrage is pairs trading (cf. Elliott et al. 2005). Other types of statistical arbitrage are discussed in Vidyamurthy (2004) and Pole (2007).

Following Vidyamurthy (2004), the first use of a pairs trading strategy is attributed to the Wall Street ‘quant’ Nunzio Tartaglia, who was at Morgan Stanley in the mid-1980s. Pairs trading is based on APT (cf. Ross 1976). Informally speaking, if two stocks have similar characteristics, the prices of both assets must be more or less the same; that is, they maintain some degree of equilibrium. If prices diverge, then it is likely that one of the assets is overpriced and/or the other is underpriced. Basically, pairs trading schemes involve selling the higher priced asset and buying the lower priced asset with the hope that mispricing will be ultimately corrected by the long-term equilibrium value. The difference between the two observed prices is termed spread. Therefore, the idea behind a given pairs trading strategy is to trade on the oscillations around the equilibrium value of the spread. The oscillations of the spread occur because the latter is allegedly mean-reverting. One can put on a trade when the spread deviates substantially from its equilibrium value and unwind the trade when the equilibrium is restored (cf. Elliott et al. 2005). For the trade to be profitable, the deviation must be reasonably larger than trading costs.

Pairs trading is a market-neutral trading strategy. Hence, this strategy strives to provide positive returns in both bull and bear markets by selecting a large number of long and short positions with no net exposure to the market (cf. Nicholas 2000, Jacobs and Levy 2005). The main risks involved in a pairs trading are the following: (1) the divergence risk: the long-term equilibrium relation between the assets may change or even vanish; (2) the horizon risk: the spread does not converge in a given horizon of time, hence forcing the traders to close the position before the convergence, due to worsened mispricing or margin calls (cf. Engelberg et al. 2009).
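The generic mechanics described in this section (open a position when the spread is far from its equilibrium value, unwind when equilibrium is restored, and accept that a trade may have to be closed at a horizon without convergence) can be illustrated with a small sketch. This is not the strategy proposed in this paper nor the one by Gatev et al. (2006); the function name `simulate_spread_trades` and the parameters `entry_gap`, `max_holding` and `cost` are hypothetical and chosen only for illustration.

```python
def simulate_spread_trades(spread, equilibrium, entry_gap, max_holding, cost=0.0):
    """Toy threshold rule on a spread series (illustrative assumption only).

    A trade is opened when |spread - equilibrium| exceeds entry_gap:
    the spread is sold (side = -1) when it is above equilibrium and
    bought (side = +1) when it is below. The trade is unwound at the
    first later observation where the spread has reached or crossed
    the equilibrium value, or after max_holding observations
    (horizon risk), whichever comes first.

    Returns a list of (entry_index, exit_index, side, pnl) tuples.
    """
    trades = []
    n = len(spread)
    t = 0
    while t < n - 1:  # an entry needs at least one later observation
        gap = spread[t] - equilibrium
        if abs(gap) <= entry_gap:
            t += 1
            continue
        side = -1 if gap > 0 else 1
        last = min(t + max_holding, n - 1)
        exit_index = last  # forced exit if no convergence within the horizon
        for u in range(t + 1, last + 1):
            if (spread[u] - equilibrium) * gap <= 0:  # equilibrium restored
                exit_index = u
                break
        pnl = side * (spread[exit_index] - spread[t]) - cost
        trades.append((t, exit_index, side, pnl))
        t = exit_index + 1
    return trades
```

In this sketch a trade that converges earns roughly the size of the initial deviation minus `cost`, which is why the deviation has to be reasonably larger than trading costs, while a trade closed at the horizon can lose money.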
makes use of already-implemented formulae known to time series analysts.

The building block for routinely evaluating equations (7), (8) and (9) for each time t involves using an augmented state space form equivalent to a given time series model formerly selected and estimated using the spread data. In this paper, the models considered are those previously discussed in sections 4.2 and 4.3. This task consists of adding k new blocks to the state vector in equation (A1) of appendix 1, and each one has the same dimensions as those of the original state vector. Formally:

\[
Y_t = \begin{bmatrix} Z_t & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} \alpha_t \\ \alpha_{t-1} \\ \vdots \\ \alpha_{t-k} \end{bmatrix}
+ d_t + \varepsilon_t ,
\]
\[
\begin{bmatrix} \alpha_{t+1} \\ \alpha_t \\ \vdots \\ \alpha_{t-(k-1)} \end{bmatrix}
=
\begin{bmatrix}
T_k & 0 & \cdots & 0 \\
I & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & I & 0
\end{bmatrix}
\begin{bmatrix} \alpha_t \\ \alpha_{t-1} \\ \vdots \\ \alpha_{t-k} \end{bmatrix}
+
\begin{bmatrix} c_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}
+
\begin{bmatrix} R_t \\ 0 \\ \vdots \\ 0 \end{bmatrix} \eta_t ,
\]

framework; see for instance Drezner and Wesolowsky (1989), Genz (1992, 2004) and Drezner (1994).

5.3. The strategy

Assuming that a particular state space model has been already estimated with available time series data from the spread process S_t (the latter is associated with a pair of assets A_1 and A_2), that the numerical devices discussed in section 5.2 have been implemented, and that the capital is invested at some low-risk fixed income market, we now propose our trading rule. It can be split into two mutually exclusive situations:
• If the observed value of S_t is found to be minimally lower than (let us say for δ units) a long-term value c, which is the same as that used in equation (5) and previously fixed to a particular value (for instance: c = 0, should one choose the spread mean), and p_up in equation (5) is found to be greater than some ‘large’ value p*_up, use the capital to buy the spread.
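The buy condition stated above, together with the mirror-image sell condition given in the introduction (spread considerably above its long-term mean and a large conditional probability of decreasing below it), can be written as a small decision function. The sketch below is only an illustration: the names `RuleSettings` and `decide` are hypothetical, the probabilities `p_up` and `p_down` are assumed to have been computed elsewhere from the Kalman filter k-steps-ahead moments, and the default values mirror the choices reported in section 6 (c = 0, δ = 0.5%, thresholds of 80%), with δ read here as 0.005 in the units of the spread, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSettings:
    """Hypothetical container for the rule parameters (illustrative only)."""
    c: float = 0.0            # long-term value of the spread
    delta: float = 0.005      # minimal distance from c before trading
    p_up_star: float = 0.80   # threshold for p_up
    p_down_star: float = 0.80  # threshold for p_down


def decide(spread_now, p_up, p_down, settings=RuleSettings()):
    """Return 'buy', 'sell' or 'hold' for the current observation.

    'hold' stands for leaving the capital in the low-risk fixed income
    market. The two trading situations are mutually exclusive because
    the spread cannot be below c - delta and above c + delta at once.
    """
    if spread_now < settings.c - settings.delta and p_up > settings.p_up_star:
        return "buy"
    if spread_now > settings.c + settings.delta and p_down > settings.p_down_star:
        return "sell"
    return "hold"
```

For example, `decide(-0.02, p_up=0.9, p_down=0.05)` returns `"buy"`, whereas the same spread with `p_up=0.5` returns `"hold"` because the probability threshold is not met.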
market because it is the opportunity cost inherent to this strategy. This issue is allegedly addressed using the parameter δ.

6. Applications

This section presents the results of applying models from section 4 and the pairs trading strategy derived in section 5 with real data from the US and Brazilian markets. In section 6.1, we describe the data used in the estimations and justify our choice of the stocks as candidates to form pairs. For each case, an effort is made to examine the expected equilibrium between the pair of stock prices in light of the existing economic relation between both firms. In section 6.2, we present the results on cointegration tests (which statistically confirm the economic insights), model estimation and goodness-of-fit, and the strategy performances.

Additionally, the following asset class indexes have been used in the evaluation of strategy results:
• Libor (1 year): This indicator stands for London Interbank Offered Rate. It is the rate that banks use to borrow from and lend to one another in the wholesale money markets in London.
• Standard and Poor’s 500 Index (S&P): This is a capitalization-weighted index of 500 stocks representing all major industries and is designed to measure the performance of the broad domestic economy through changes in the aggregate market.

Table 1. Engle–Granger cointegration tests with the pairs (in-sample analysis).
Pair            Test statistic
XOM-LUV         −3.006**
VALE5-BRAP4     −4.059**
*Critical values considered have been taken from MacKinnon (2010).
**Pair was considered stationary at a 5% level.
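Section 6.2 describes the spreads as the OLS residual time series from the cointegration regressions, which is why their long-term mean c is zero. A minimal sketch of that first step of the Engle–Granger procedure follows; the second step, a unit-root test on the residuals whose statistic is what Table 1 reports, is not computed here. The function name `ols_spread` is hypothetical, and whether the regression uses price levels or log prices is not stated in this excerpt, so the inputs are simply called prices.

```python
def ols_spread(prices_a, prices_b):
    """Regress prices_a on a constant and prices_b by ordinary least squares.

    Returns (alpha, beta, spread), where spread[t] is the residual
    prices_a[t] - alpha - beta * prices_b[t]. With an intercept in the
    regression, the residuals sum to zero up to rounding error.
    """
    n = len(prices_a)
    if n != len(prices_b) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_a = sum(prices_a) / n
    mean_b = sum(prices_b) / n
    sxx = sum((b - mean_b) ** 2 for b in prices_b)
    if sxx == 0:
        raise ValueError("prices_b must not be constant")
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(prices_a, prices_b))
    beta = sxy / sxx
    alpha = mean_a - beta * mean_b
    spread = [a - alpha - beta * b for a, b in zip(prices_a, prices_b)]
    return alpha, beta, spread
```

The residual series returned here is the kind of input that the state space models of section 4 would then be fitted to.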
for the pair XOM-LUV. On the other hand, even though the Kupiec tests suggested that the standardized residuals from all the four models estimated using the VALE5-BRAP4 spread seem to come from a probability distribution similar to the standard normal distribution in terms of the tails, the Jarque–Bera test unveiled discrepancies. Therefore, some care must be exercised in interpreting and even using the conditional probabilities p_up and p_down in equation (5) for trading decisions: p_up and p_down might not be ‘tail’ probabilities.

We now discuss our pairs trading strategy performances. This time, as opposed to previous tasks (cointegration testing, parameter estimations and goodness-of-fit analysis), we also consider the out-of-sample parts of the data-sets for both pairs. Therefore, in addition to addressing performances during the period spanning the first year, we investigate the ability of our strategy to make profits as compared with other investment alternatives during a period spanning about six months without re-estimating any parameter. This should be viewed as an assessment of how robust our proposed methodology as a whole might be in real scenarios when it may perhaps take some time to update/calibrate the statistical models for the spread time series.

The parameter c is set to zero, which is the long-term mean of the spreads, as these are precisely the OLS residual time series from the cointegration regressions. The parameter δ is set to 0.5% to overcome operating costs, due to slippage (this is the difference between the trade expected price and the trade actual price) and transaction. In view of these two choices for c and δ, a position to buy (sell) spread is open if and only if the spread is less (greater) than −δ (+δ). Finally, for the conditional probabilities p_up and p_down, their threshold values p*_up and p*_down are both set to 80% and the parameter k is fixed at 25, meaning that the strategy will be closed if, once the spread is bought or sold, the pair does not return to its long-term mean in 25 days at the current market prices, with the latter being an event with a conditional probability of 20% at the most.

Table 3 and figure 1 display the results corresponding to the pair XOM-LUV for the four linear state space models already under investigation. They also show the results of [...] historical volatility and maximum drawdown. The Sharpe ratio for the plain strategy has a negative value and is therefore not shown. The cumulative and average returns corresponding to the AR(2) and ARMA(1,1) models are larger than the other investment opportunities, except for the stock index (S&P), which showed a strong upward trend in the out-of-sample period, as illustrated in each panel of figure 1 by the corresponding return lines during the time instants after observation 250. Economic explanations for this excellent performance of the US stock market in the mentioned period would include the US economy expansion in the first quarter of 2013 and an agreement reached by the US federal government regarding the US debt ceiling. However, owing to its considerably larger volatility, the S&P had a worse Sharpe ratio and a larger maximum drawdown. Additionally, both our strategy and the plain strategy displayed low correlations with the S&P: the plain strategy performed better in this respect, as it and the S&P were virtually uncorrelated. This evidence was expected, as the type of quantitative strategy considered is one that is supposedly market neutral. On the other hand, based on the ability to make profits when a trading position on the spread is opened, our strategy proved to be considerably superior to the plain strategy, as gains were achieved 90% of the time with the former (see the fifth performance measure in table 3). Figure 1 depicts cumulative returns for the four state space models, together with cumulative returns from the market indices and the plain strategy, corroborating and illustrating the findings presented in table 3.

Likewise, both table 4 and figure 2 present the results for the pair VALE5-BRAP4. The best performance, relying once again on Sharpe ratio comparisons (which were negative for both the Ibovespa domestic stock index and the plain strategy and for two models considered with our strategy), is that corresponding to the AR(1) model. Additionally, like all the other models and the plain strategy, the AR(1) model has also shown almost no correlation at all with Ibovespa. In figure 2, it is suggested that cumulative returns of our pairs trading strategy, implemented with this best AR(1) model, maintained an upward trend with relatively low volatility, probably corroborating the best Sharpe ratio. In terms of Ibovespa, we observe that even though this benchmark did present at specific times the largest returns amongst all the investment alternatives in the period considered, its highly risky behaviour (comparing the volatilities and maximum drawdowns in table 4) is noteworthy and has certainly contributed to some temporary losses and a worse cumulative return at the very end of the out-of-sample period. This can also be seen from the downward and persistent reversals of this index in figure 2. Finally, in terms of the efficiency indicator given in table 4, our strategy has clearly outperformed the plain strategy: similar to the first exercise with the US market, the percentages of success in trading positions were in tune with the nominal threshold value of 80% for the conditional probabilities p_up and p_down.

Finally, table 5 shows the computational gain, in terms of estimation time, due to proposition 2 of this paper. Even though the information corresponds to model estimations with a portfolio that has only a pair of assets on a daily basis, it is plausible to assume that the augmented model would also be excessively time consuming. If we had adopted and implemented the modelling and pairs trading strategy proposed in this paper with intraday high-frequency data, the estimation times would have been increased in the case of a portfolio containing several pairs. For instance, the augmented model with k = 25 for Elliott’s model required almost 3 min for estimation; the original model took less than three seconds.

Figure 1. Comparison of the cumulative returns: strategy P/L with the pair XOM-LUV, Libor, S&P and plain strategy (whole period analysis).
Table 3. USA market data: performance measures from four different models for the spread and three benchmarks (whole period analysis). Column groups: XOM-LUV, Benchmarks.
Figure 2. Comparison of cumulative returns: strategy P/L with the pair VALE5-BRAP4, CDI, IBOVESPA and plain strategy (whole period analysis).
Table 4. Brazilian market data: performance measures of four different models for the spread and three benchmarks (whole period analysis). Column groups: VALE5-BRAP4, Benchmarks.
Table 5. Computational times (seconds) for maximum likelihood estimation of the models with the pair VALE5-BRAP4 (in-sample analysis). Columns: Models; Original model; Augmented model (k = 10); Augmented model (k = 15); Augmented model (k = 20); Augmented model (k = 25).

7. Discussion

market. In other words, the model in equation (10), which is the building block of the quantitative strategy proposed in this paper, serves ultimately as a way of incorporating the extra information provided by the data as well as of informed traders (cf. Baruci 2003, chapter 7, and references therein) in an algorithmic and consistent decision mechanism. As more practical exercises are still to be made, we do recognize that the empirical evidence shared in this paper is limited in order to confirm these economic perspectives. However, the two applications detailed in section 6 already prove that our strategy can be efficiently implemented and suggest that this change of direction in the usual pairs trading paradigms might work well.
portfolio value. However, sticking to Grinold and Kahn’s point of view, ‘trading is itself a portfolio optimization problem, distinct from the portfolio construction problem’; therefore, ‘optimal trading can lower transactions costs, though at the expense of additional short-term risk’. Bearing these last quotations in mind, we understand that to effectively combine such trading schemes for reducing (that is, optimizing) costs and a pairs trading strategy such as the one proposed in our paper deserves much more time and space. We leave this as a possible theme for upcoming papers on pairs trading.

We now dedicate some effort towards discussing the question of market neutrality. The starting point is to recall that virtually any portfolio return variation can possibly be explained by some market factors. Following the standard finance literature, the natural way of addressing this is to consider some type of factor model for the portfolio, amongst which we recall the CAPM and APT models (cf. Elton et al. 2014), the model by Fama and French (cf. Fama and French 1996), and the asset class factor model (cf. Sharpe 1992, de Roon et al. 2004). We understand that such modelling should consider the quite plausible assumption of time-varying coefficients, which would allow us to precisely see the real exposures on different financial risks and, thus, be more solid about unveiling the markets that a portfolio defined by our strategy would be neutral to. This time-varying coefficient assumption is justified here because our strategy involves time-varying trading positions on three different assets: the two assets forming the pair and a risk-free asset. Such extensions of the original factor models previously cited already exist and view the coefficients (the factor exposures) as latent stochastic processes: for instance, the reader is referred to the dynamic asset class factor models proposed in Swinkels and Van Der Sluis (2006), Pizzinga et al. (2011) and Marques et al. (2012). Statistically speaking, we are interested in the selection and estimation of stochastic coefficient regression models. Coincidentally, the method for implementing such tasks is proper linear state space modelling with the use of the Kalman filter. However, although it would require the same methodological framework of the statistical analysis already found in our paper, we would require other [...]

Finally, we discuss the use of our strategy in high-frequency data. The analyses of these data are complicated due to irregular temporal spacing, intra-daily patterns and price discreteness (cf. Ait-Sahalia and Hansen 2010, chapter 7). Another major characteristic of high-frequency data is the strong intra-day seasonal behaviour of the volatility, as pointed out by Fouque et al. (2000, chapter 4). A data-generating process with strong seasonal patterns cannot be stationary. Therefore, controlling these periodical movements before fitting any time series model to the data should be a mandatory initial step. In light of these issues typically related to high-frequency situations, other state space models shall be combined with the pairs trading strategy proposed in this paper.

Acknowledgements

Our sincere thanks go to the referees, whose comments, requirements and suggestions were invaluable for improving this paper. We are also truly grateful to Cristiano Fernandes, Adrien Nguyen Huu and Paulo Cezar Carvalho for their very constructive comments. All remaining errors are ours.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Ait-Sahalia, Y. and Hansen, L., Handbook of Financial Econometrics, 2nd ed., 2010 (Springer: New York).
Atherino, R., Pizzinga, A. and Fernandes, C., A row-wise stacking of the runoff triangle: State space alternatives for IBNR reserve prediction. Astin Bull., 2010, 40(2), 917–946.
Avellaneda, M. and Lee, J.H., Statistical arbitrage in the US equities market. Quant. Finance, 2010, 10(7), 761–782.
Baronyan, S., Boduroglu, I. and Sener, E., Investigation of stochastic pairs trading strategies under different volatility regimes. The Manchester School, 2010, 2010 (supplement), 114–134.
Baruci, E., Financial Markets Theory, 2003 (Springer: New York).
implementations (the stochastic coefficient regression models Bertram, W.K., Optimal trading strategies for Itô diffusion processes.
are quite different from, for example, the ARMA models con- Physica A, 2009, 388, 2865–2873.
sidered in our paper). Due to space limitations and the huge Bertram, W.K., Analytic solutions for optimal statistical arbitrage
relevance of market neutrality as a mainstream subject within trading. Physica A, 2010, 389, 2234–2243.
Bessembinder, H., Coughenour, J., Seguin, P. and Smoller, M., Mean
the finance literature, we leave this for future research. reversion in equilibrium asset prices: Evidence from the futures
We now take a closer look at a more statistically oriented term structure. J. Finance, 1995, 50, 361–375.
question: that of distributional assumptions. As strong viola- Brockwell, P.J. and Davis, R.A., Time Series: Theory and Methods,
tions of normality can make the quantities pup and pdown quite 2nd ed., 1991 (Springer: New York).
unreliable as proxies for the true conditional probability of Brockwell, P.J. and Davis, R.A., Introduction to Time Series and
Forecasting, 2nd ed., 2003 (Springer: New York).
mean-reverting, an alternative for dealing with such inconve- de Roon, F.A., Nijman, T.E. and Ter Horst, J.R., Evaluating style
nient situations is to rely on Monte Carlo simulations of future analysis. J. Empirical. Finance, 2004, 11(1), 29–53.
trajectories of the spread St k steps ahead. For the ARMA Drezner, Z., Computation of the trivariate normal integral. Math.
models, this would require modelling the error term with the aid Comput., 1994, 63, 289–294.
of standardized residuals. A second alternative, which releases Drezner, Z. and Wesolowsky, G.O., On the computation of the
bivariate normal integral. J. Stat. Comput. Simul., 1989, 35, 101–
one from choosing/modelling error distributions (but is much 107.
more demanding in computational terms), is to adopt some Durbin, J. and Koopman, S.J., Time Series Analysis by State Space
bootstrap procedure to estimate the mean-reverting conditional Methods, 2001 (Oxford Statistical Science Series: Oxford).
probabilities. Wall and Stoffer (2002) and Rodriguez and Ruiz Elliott, R.J. and Krishnamurthy, V., New finite-dimensional filters
(2009) are two papers from among a large list of references on for parameter estimation of discrete-time linear Gaussian models.
IEEE Trans. Autom. Control, 1999, 44, 938–951.
bootstrapping state space models, and these two papers have Elliott, R.J., van der Hoek, J. and Malcolm, W.P., Pairs trading. Quant.
methodologies that address the aims being discussed here. Finance, 2005, 5(3), 271–276.
Elton, E.J., Gruber, M.J., Brown, S.J. and Goetzmann, W.N., Modern Portfolio Theory and Investment Analysis, 9th ed., 2014 (John Wiley & Sons: Hoboken, NJ).
Enders, W., Applied Econometric Time Series, 2nd ed., 2004 (John Wiley & Sons: Hoboken, NJ).
Engelberg, J., Gao, P. and Jagannathan, R., An anatomy of pairs trading: The role of idiosyncratic news, common information and liquidity, Third Singapore International Conference on Finance, 2009.
Engle, R. and Granger, C., Co-integration and error correction: Representation, estimation, and testing. Econometrica, 1987, 55(2), 251–276.
Fama, E.F. and French, K.R., Multifactor explanations of asset pricing anomalies. J. Finance, 1996, 51(1), 55–84.
Fasen, V., Statistical estimation of multivariate Ornstein-Uhlenbeck processes and applications to co-integration. J. Econometrics, 2013a, 172, 325–337.
Fasen, V., Time series regression on integrated continuous-time processes with heavy and light tails. Econometric Theory, 2013b, 29, 28–67.
Fouque, J.P., Papanicolaou, G. and Sircar, K.R., Derivatives in Financial Markets with Stochastic Volatility, 2000 (Cambridge University Press: Cambridge).
Gatev, E., Goetzmann, W. and Rouwenhorst, K., Pairs trading: Performance of a relative value arbitrage rule. Rev. Financial Stud., 2006, 19, 797–827.
Genz, A., Numerical computation of multivariate normal probabilities. J. Comput. Graphical Stat., 1992, 1, 141–149.
Genz, A., Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Stat. Comput., 2004, 14(3), 251–260.
Grinold, R.C. and Kahn, R.N., Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk, 2nd ed., 1999 (McGraw-Hill Education: New York).
Hamilton, J.D., Time Series Analysis, 1994 (Princeton University Press: Princeton, NJ).
Harvey, A.C., Forecasting, Structural Time Series Models and the Kalman Filter, 1989 (Cambridge University Press: Cambridge).
Harvey, A.C., Time Series Models, 2nd ed., 1993 (Harvester Wheatsheaf: Hemel Hempstead).
Huck, N., Pairs selection and outranking: An application to the S&P 100 index. Eur. J. Operational Res., 2009, 196, 819–825.
Huck, N., Pairs trading and outranking: The multi-step-ahead forecasting case. Eur. J. Operational Res., 2010, 207, 1702–1716.
Jacobs, B. and Levy, K., Market Neutral Strategies, 2005 (John Wiley & Sons: Hoboken, NJ).
Karatzas, I. and Shreve, S.E., Brownian Motion and Stochastic Calculus, 1997 (Springer: New York).
Kaufman, P.J., New Trading Systems and Methods, 4th ed., 2005 (John Wiley & Sons: Hoboken, NJ).
Kupiec, P., Techniques for verifying the accuracy of risk management models. J. Derivatives, 1995, 3, 73–84.
MacKinnon, J.G., Critical Values for Cointegration Tests, Queen’s Economics Department Working Paper No. 1227, Queen’s University, 2010.
Magdon-Ismail, M., Atiya, A.F., Pratap, A. and Abu-Mostafa, Y.S., On the maximum drawdown of a Brownian motion. J. Appl. Probab., 2004, 41, 147–161.
Marques, R., Pizzinga, A. and Vereda, L., Restricted Kalman filter applied to dynamic style analysis of actuarial funds. Appl. Stochastic Models Bus. Ind., 2012, 28, 558–570.
Mori, M. and Ziobrowski, A., Performance of pairs trading strategy in the U.S. REIT market. Real Estate Economics, 2011, 39(3), 409–428.
Nielsen, M. and Schwartz, E., Theory of storage and the pricing of commodity claims. Rev. Derivatives Res., 2004, 7, 5–24.
Nicholas, J.G., Market-Neutral Investing: Long/Short Hedge Fund Strategies, 2000 (Bloomberg Press: Princeton, NJ).
Perlin, M.S., Evaluation of pairs trading strategy at the Brazilian financial market. J. Derivatives Hedge Funds, 2007, 15, 122–136.
Pindyck, R., The dynamics of commodity spot and futures markets: A primer. Energy J., 2001, 22(3), 1–29.
Pizzinga, A., Vereda, L. and Fernandes, C., A dynamic style analysis of exchange rate funds: The case of Brazil at the 2002 election. Adv. Appl. Stat. Sci., 2011, 6, 111–135.
Pole, A., Statistical Arbitrage: Algorithmic Trading Insights and Techniques, 2007 (John Wiley & Sons: Hoboken, NJ).
Rampertshammer, S., An Ornstein–Uhlenbeck Framework for Pairs Trading, 2007 (Department of Mathematics and Statistics of the University of Melbourne). Unpublished Note.
Rodriguez, A. and Ruiz, E., Bootstrap prediction intervals in state space models. J. Time Ser. Anal., 2009, 30(2).
Ross, S., The arbitrage theory of capital asset pricing. J. Economic Theory, 1976, 13, 341–360.
Sharpe, W.F., Mutual fund performance. J. Bus., 1966, 39, 119–138.
Sharpe, W.F., Asset allocation: Management style and performance measurement. J. Portfolio Manage., 1992, (winter), 7–19.
Sharpe, W.F., The Sharpe ratio. J. Portfolio Manage., 1994, 21, 49–58.
Shumway, R.H. and Stoffer, D.S., Time Series Analysis and its Applications (With R Examples), 2nd ed., 2006 (Springer: New York).
Swinkels, L. and van der Sluis, P.J., Return-based style analysis with time-varying exposures. Eur. J. Finance, 2006, 12, 529–552.
Tourin, A. and Yan, R., Dynamic pairs trading using the stochastic control approach. J. Economic Dyn. Control, 2013, 37, 1972–1981.
Triantafyllopoulos, K. and Montana, G., Dynamic modeling of mean-reverting spreads for statistical arbitrage. Comput. Manage. Sci., 2009, 8, 23–49.
Vidyamurthy, G., Pairs Trading, Quantitative Methods and Analysis, 2004 (John Wiley & Sons: Hoboken, NJ).
Wall, K. and Stoffer, S., A state space approach to bootstrapping conditional forecasts in ARMA models. J. Time Ser. Anal., 2002, 23(6).
Wissner-Gross, A.D. and Freer, C.E., Relativistic statistical arbitrage. Phys. Rev. E, 2010, 82, 056104.

Appendix 1. Linear state space models & the Kalman filter

By a Gaussian linear state space model, we mean the following measurement equation, state equation and initial state vector:

$$
\begin{aligned}
Y_t &= Z_t \alpha_t + d_t + \varepsilon_t, & \varepsilon_t &\sim \mathrm{NID}(0, H_t),\\
\alpha_{t+1} &= T_t \alpha_t + c_t + R_t \eta_t, & \eta_t &\sim \mathrm{NID}(0, Q_t),\\
\alpha_1 &\sim N(a_1, P_1).
\end{aligned}
\tag{A1}
$$

The measurement equation is an affine function relating the observed $p$-variate time series $Y_t$ to the generally unobserved $m$-variate state vector $\alpha_t$, and the state equation stipulates the state evolution through a Markovian structure. The random errors $\varepsilon_t$ and $\eta_t$ are independent (in time, between each other and of $\alpha_1$). The system matrices $Z_t$, $d_t$, $H_t$, $T_t$, $c_t$, $R_t$ and $Q_t$ are deterministic or, at most, depend on the past values of $Y_t$. In the latter case, Harvey (1989, section 3.7) refers to equation (A1) as a conditionally Gaussian state space model.

For a given time series of size $n$ and any time instants $t, j \in \{1, 2, \ldots, n\}$, define $\mathcal{F}_j \equiv \sigma(Y_1, \ldots, Y_j)$, $a_{t|j} \equiv E(\alpha_t \mid \mathcal{F}_j)$ and $P_{t|j} \equiv \mathrm{Var}(\alpha_t \mid \mathcal{F}_j)$. Kalman filtering consists of recursive equations for these first- and second-order conditional moments, corresponding to one-step-ahead prediction ($j = t - 1$) and smoothing ($j = n$). The formulae corresponding to the predictions are given below:

$$
\begin{aligned}
\upsilon_t &= Y_t - Z_t a_{t|t-1} - d_t, & F_t &= Z_t P_{t|t-1} Z_t' + H_t,\\
K_t &= T_t P_{t|t-1} Z_t' F_t^{-1}, & L_t &= T_t - K_t Z_t, \qquad t = 1, \ldots, n,\\
a_{t+1|t} &= T_t a_{t|t-1} + c_t + K_t \upsilon_t, & P_{t+1|t} &= T_t P_{t|t-1} L_t' + R_t Q_t R_t'.
\end{aligned}
\tag{A2}
$$

The derivations of equation (A2) are found in Durbin and Koopman (2001). There are several other references on this subject that deserve mention, such as the books by Harvey (1989, 1993), Brockwell and Davis (1991, 2003), Hamilton (1994), and Shumway and Stoffer (2006).

In practice, system matrices include unknown parameters that must be estimated. By grouping all unknown parameters of the model
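
As a minimal sketch of the prediction recursions in equation (A2), the following Python function handles only the univariate, time-invariant case (p = m = 1 with constant system scalars), so that transposes and the inverse of F_t reduce to scalar products and a division. The function name `kalman_predict` and the example parameter values are illustrative assumptions and are not taken from the paper.

```python
def kalman_predict(y, Z, d, H, T, c, R, Q, a1, P1):
    """One-step-ahead Kalman prediction, scalar time-invariant case of (A2).

    Returns the innovations v_t, their variances F_t, and the final
    predicted state mean and variance (a_{n+1|n}, P_{n+1|n}).
    """
    a, P = a1, P1                      # a_{1|0} and P_{1|0}
    innovations, variances = [], []
    for y_t in y:
        v = y_t - Z * a - d            # innovation v_t
        F = Z * P * Z + H              # innovation variance F_t
        K = T * P * Z / F              # Kalman gain K_t
        L = T - K * Z                  # L_t
        innovations.append(v)
        variances.append(F)
        a = T * a + c + K * v          # a_{t+1|t}
        P = T * P * L + R * Q * R      # P_{t+1|t}
    return innovations, variances, (a, P)


# Hypothetical example: noisy observations of a mean-reverting state.
observed = [0.10, 0.25, 0.05, -0.10, 0.00]
v, F, (a_next, P_next) = kalman_predict(
    observed, Z=1.0, d=0.0, H=0.04, T=0.8, c=0.0, R=1.0, Q=0.01,
    a1=0.0, P1=1.0,
)
print(a_next, P_next)
```

A multivariate version would replace the scalar products by matrix products with the transposes shown in (A2) and the division by a linear solve; the scalar form is used here only to keep the sketch dependency-free.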
$$
x_t = \frac{1}{(1 - BL)}\,a + \frac{1}{(1 - BL)}\,\eta^*_t = \frac{a}{(1 - B)} + \frac{1}{(1 - BL)}\,\eta^*_t, \tag{B1}
$$

where $L$ is the usual lag operator (recall: $0 < b < 2$). Now, we substitute equation (B1) in the first equation of equation (2) to get

$$
S_t = \frac{a}{(1 - BL)} + \frac{1}{(1 - BL)}\,\eta^*_t + D\varepsilon_t
    = \frac{a}{(1 - BL)} + \frac{1}{(1 - BL)}\,\eta^*_t + \varepsilon^*_t, \tag{B2}
$$

where $\varepsilon^*_t \sim N(0, D^2)$.

Applying the operator $(1 - BL)$ on both sides of equation (B2),

$$
S^*_t \equiv (1 - BL)S_t = a + \eta^*_t + \varepsilon^*_t - B\varepsilon^*_{t-1}. \tag{B3}
$$

From equation (B3), it is straightforward to see that

$$
\gamma(0) = C^2 + (1 + B^2)D^2, \quad \gamma(1) = -BD^2, \quad \gamma(k) = 0, \; k \geq 2, \tag{B4}
$$

where $\gamma(k) = \mathrm{Cov}(S^*_t, S^*_{t-k})$, $k = 0, \pm 1, \pm 2, \ldots$. From equation (B4) and Brockwell and Davis (1991, p. 89, proposition 3.2.1), it follows that $S^*_t \sim \mathrm{MA}(1)$.

[Equations (C3) and (C4): block-matrix identities; the matrix layout is not recoverable from this copy.]

Placing (C3) and (C4) properly in (C2) implies

$$
\tilde{Y}^{\dagger}_s = Z_s \left\{ \left[ \prod_{j=1}^{s-1} T_j \right] \alpha_1
 + \sum_{j=1}^{s-2} \left[ \prod_{k=j+1}^{s-1} T_k \right] \left( R_j \eta_j + c_j \right)
 + c_{s-1} + R_{s-1}\eta_{s-1} \right\} + d_s + \varepsilon_s, \tag{C5}
$$

which coincides with the recursive solution of the measurement equation from the original model (A1). To conclude the proof, combine equations (C5) and (C1).
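
The recursive solution referred to above can be checked numerically. The sketch below assumes a scalar state (so the order of the products of T_k does not matter, unlike in the matrix case) and compares iterating the state equation of (A1) with the unrolled expression inside the braces of (C5). The function names and the random test values are hypothetical and serve only as an illustration.

```python
import math
import random


def iterate_state(alpha1, T, c, R, eta):
    """Iterate alpha_{t+1} = T_t*alpha_t + c_t + R_t*eta_t; returns alpha_s, s = len(T)+1."""
    alpha = alpha1
    for T_t, c_t, R_t, eta_t in zip(T, c, R, eta):
        alpha = T_t * alpha + c_t + R_t * eta_t
    return alpha


def unrolled_state(alpha1, T, c, R, eta):
    """Closed form of alpha_s as in the braces of (C5); needs len(T) >= 1.

    List index j (0-based) stores the paper's quantity with subscript j+1.
    """
    n = len(T)                                  # n = s - 1
    total = math.prod(T) * alpha1               # [prod_{j=1}^{s-1} T_j] alpha_1
    for j in range(n - 1):                      # paper's j = 1, ..., s-2
        total += math.prod(T[j + 1:]) * (R[j] * eta[j] + c[j])
    total += c[-1] + R[-1] * eta[-1]            # c_{s-1} + R_{s-1} eta_{s-1}
    return total


random.seed(0)
n = 6
T = [random.uniform(0.5, 1.0) for _ in range(n)]
c = [random.uniform(-0.1, 0.1) for _ in range(n)]
R = [1.0] * n
eta = [random.gauss(0.0, 0.2) for _ in range(n)]
print(iterate_state(0.3, T, c, R, eta), unrolled_state(0.3, T, c, R, eta))
```

The two printed values should agree up to floating-point rounding; premultiplying by Z_s and adding the measurement intercept and error then gives the observation at time s.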
On the other hand, Vidyamurthy (2004, pp. 80–82) derived a direct way of obtaining the return for the pairs trading strategy proposed in section 5 that is justified with some elements of the definition of a pair of assets (recall section 4.1). Assume that $\log(P_{t,1})$ and $\log(P_{t,2})$ are cointegrated (that is, $A_1$ and $A_2$ form a pair) with mean $\alpha$ and cointegration coefficient $\beta$. If the investor takes a long position on asset $A_1$ and a short position on asset $A_2$ (that is, the investor buys the spread) and if he or she maintains the position until at least $t + i$ ($i = 1, 2, \ldots, k$, where $k$ denotes divergence risk parameters that are previously set; cf. section 5.3), then the corresponding return from time $t$ to $t + i$ is given by

$$
\begin{aligned}
\log\frac{P_{t+i,1}}{P_{t,1}} - \beta \log\frac{P_{t+i,2}}{P_{t,2}}
 &= \log(P_{t+i,1}) - \log(P_{t,1}) - \beta\left(\log(P_{t+i,2}) - \log(P_{t,2})\right)\\
 &= \log(P_{t+i,1}) - \beta\log P_{t+i,2} - \left(\log(P_{t,1}) - \beta\log P_{t,2}\right)\\
 &= \log(P_{t+i,1}) - \alpha - \beta\log P_{t+i,2} - \left(\log(P_{t,1}) - \alpha - \beta\log P_{t,2}\right)\\
 &= S_{t+i} - S_t;
\end{aligned}
\tag{D1}
$$

that is, once a long position is taken on the spread, the return for the latter is simply the difference between the spread values at times $t + i$ and $t$. If, in turn, the investor sells asset $A_1$ and buys asset $A_2$ (the spread is now being sold), virtually the same derivation as in equation (D1) would demonstrate that the spread return becomes $S_t - S_{t+i}$, which corresponds to the negative of the long-spread return.
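
The identity in equation (D1) can be illustrated with a short numerical check. The sketch below assumes the spread definition used in the derivation, S_t = log(P_{t,1}) − α − β log(P_{t,2}); the function names and the prices, α and β are made-up values for illustration only.

```python
import math


def spread(p1, p2, alpha, beta):
    """Spread S_t = log(P_{t,1}) - alpha - beta*log(P_{t,2})."""
    return math.log(p1) - alpha - beta * math.log(p2)


def long_spread_return(p1_t, p2_t, p1_ti, p2_ti, beta):
    """Log return from t to t+i of being long asset 1 and short beta units of asset 2."""
    return math.log(p1_ti / p1_t) - beta * math.log(p2_ti / p2_t)


def short_spread_return(p1_t, p2_t, p1_ti, p2_ti, beta):
    """Return of the opposite position (spread sold): the negative of the long return."""
    return -long_spread_return(p1_t, p2_t, p1_ti, p2_ti, beta)


# Hypothetical prices at times t and t+i, with assumed alpha and beta.
alpha, beta = 0.3, 0.7
s_t = spread(10.0, 20.0, alpha, beta)
s_ti = spread(11.0, 19.0, alpha, beta)
r_long = long_spread_return(10.0, 20.0, 11.0, 19.0, beta)
print(r_long, s_ti - s_t)
```

Note that α cancels in the difference, so the return depends on the cointegration coefficient β but not on the long-run mean of the spread.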