
THE CASE FOR CAUSAL FACTOR INVESTING

Marcos López de Prado α
Alexander Lipton β
Vincent Zoonekynd γ

ADIA Lab Research Paper Series, No. 9

First version (v0.1): December 31, 2023
Current version (v1.4): April 8, 2024

______________________
α Global Head, Quantitative Research & Development, Abu Dhabi Investment Authority (ADIA); Board Member, ADIA Lab; Professor of Practice, School of Engineering, Cornell University; Research Fellow, Applied Mathematics & Computational Research Department, Lawrence Berkeley National Laboratory. E-mail: [email protected].
β Global Head, Quantitative Research & Development, Abu Dhabi Investment Authority (ADIA); Board Member, ADIA Lab.
γ Quantitative Research & Development Lead, Abu Dhabi Investment Authority (ADIA); Research Affiliate, ADIA Lab.

The views expressed in this paper are the authors’, and do not necessarily represent the opinions of the organizations
they are affiliated with. We would like to thank our ADIA colleagues, especially Pascal Blanqué, Anders Svennesen,
and Jean-Paul Villain for their suggestions. The paper has also benefited from conversations with Frank Fabozzi
(EDHEC), Miguel Hernán (Harvard University), Guido Imbens (Stanford University), Alessia López de Prado Rehder
(ETH Zürich), Riccardo Rebonato (EDHEC), Luis Seco (University of Toronto), and Horst Simon (ADIA Lab).

ABSTRACT

Researchers use factor models to obtain unbiased estimates of the premia harvested by assets
exposed to certain risk characteristics. These estimates are unbiased only if the factor models are
correctly specified. Choosing the correct model specification requires knowledge of the causal
graph that characterizes the underlying data-generating process. However, following the current
econometric canon, factor researchers choose their model specifications using associational (non-
causal) arguments, such as the model’s explanatory power, instead of applying causal inference
procedures, such as do-calculus. As a result, factor investing models are likely misspecified, and
the estimates of risk premia are biased. This paper explains the dire consequences of factor
investing’s specification errors, and calls for the need to rebuild the discipline under the more
scientific foundations of causal factor investing.

Keywords: Causal inference, causal discovery, confounder, collider, factor investing, p-hacking,
underperformance, systematic losses.

JEL Classification: G0, G1, G2, G15, G24, E44.


AMS Classification: 91G10, 91G60, 91G70, 62C, 60E.

Factor investing can be defined as the investment approach that aims to monetize the exposure to
measurable risk characteristics (called “factors”) that presumably explain differences in the
performance of a set of securities. Its origins can be traced back to the seminal work of Fama and
MacBeth [1973] and Schipper and Thompson [1981], among others. Since then, thousands of
academic papers have claimed the discovery of hundreds of investment factors (Harvey et al.
[2016]), propelling the growth of a multi-trillion-dollar industry. In 2019, J.P. Morgan estimated
that over USD 2.5 trillion (more than 20 percent of the US equity market capitalization) was
managed by quant-style funds (Neuberger Berman [2019]). BlackRock estimates that the factor
investing industry managed USD 1.9 trillion in 2017, and it projected that amount would grow to
USD 3.4 trillion by 2022 (BlackRock [2017]).

[Figure 1 – Performance of US multi-factor equity strategies, compared to risk-free rate instruments. Series shown: Bloomberg GSAM US Equity Multi-Factor Index and ICE BofA US 3-Month Treasury Bill Index. Source: Bloomberg.]

Unfortunately for investors, these academically-endorsed investment products have failed to perform as expected. Figure 1 plots the performance of one of the broadest factor investing indices,
the Bloomberg – Goldman Sachs Asset Management US Equity Multi-Factor Index (BBG code:
BGSUSEMF <Index>), and compares it to the performance of risk-free rate instruments, as
reported by the ICE – Bank of America US 3-Month Treasury Bill Index (BBG code: G0O1
<Index>). This Multi-Factor index tracks the long/short performance of the momentum, value,
quality, and low-risk factors in U.S. stocks (Bloomberg [2021]). Its annualized Sharpe ratio from
May 2, 2007 (the inception date) to December 31, 2023 (this paper’s completion date) has been
−0.08 (t-stat=−0.33, p-value=0.63), and the average annualized excess return has been −0.30%.
This performance does not include: (a) transaction costs; (b) market impact of order execution; (c)
cost of borrowing stocks for shorting positions; (d) management and incentive fees. After more than 16 years of out-of-sample performance, factor investing’s Sharpe ratio is statistically insignificant at any reasonable confidence level.

The primary goal of this paper is to elucidate why the performance of factor investing strategies
has disappointed, and what can be done to correct this situation. Factor investing has failed to
perform as expected, not for the lack of stringent peer-review (many of these failed factors have
been published in the top journals of financial economics, including The Journal of Finance), but
because the econometric canon used to make and peer-review factor claims is flawed. Until now,
much of the blame has been placed on multiple testing and p-hacking (for example, see Bailey et
al. [2014], and Harvey and Liu [2020]), and very little attention has been paid to the role of
specification errors. We show how canonical econometric practices favor misspecified factor
strategies that underperform (compared to the correctly specified model) and potentially yield
systematic losses, even if all risk premia remain constant and are estimated with the correct sign.
Unless the econometric canon is rewritten, these errors will not be addressed, and factor investors
have no reason to expect that performance will improve in the future.

Next, we explain how six fundamental methodological flaws have likely contributed to factor investing’s disappointing performance.

PITFALL 1: BRUTE FORCE P-HACKING


The goal of a statistical test of hypothesis is to assess whether the empirical evidence is consistent
with a so-called null hypothesis. The p-value of a test measures the probability of observing data
at least as extreme as the observed data, assuming that the null hypothesis is true. A researcher
rejects a null hypothesis when the p-value is less than a pre-defined threshold 𝛼, which represents
the probability of making a type I error (a false positive).

Suppose that we repeat for a second time a test with false positive probability 𝛼. At each trial, the probability of not making a type I error is (1 − 𝛼). If the two trials are independent, the probability of not making a type I error on both the first and second tests is (1 − 𝛼)². The probability of making at least one type I error is the complement, 1 − (1 − 𝛼)². If we conduct 𝐾 independent trials, the joint probability of not making a single type I error is (1 − 𝛼)^𝐾. Hence, the probability of making at least one type I error is the complement, 𝛼_𝐾 = 1 − (1 − 𝛼)^𝐾, also known as the familywise error rate (FWER). As 𝐾 grows, a researcher who applies the threshold 𝛼 is accepting a false positive rate 𝛼_𝐾 instead of 𝛼. For example, a researcher who publishes the best result out of 10 independent trials with 𝛼 = 0.05 accepts a type I error of 𝛼_𝐾 ≈ 0.40, eight times higher than reported.
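The FWER arithmetic above can be reproduced in a few lines. The following sketch is illustrative; the function name `fwer` is ours, not the paper’s:

```python
# Familywise error rate: probability of at least one type I error
# across K independent trials, each with false-positive probability alpha.
def fwer(alpha: float, K: int) -> float:
    return 1.0 - (1.0 - alpha) ** K

# Best result out of 10 independent trials at alpha = 0.05:
print(round(fwer(0.05, 10), 2))  # 0.4
```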

P-hacking is the malpractice of running multiple statistical tests until one test achieves a p-value
below the target threshold 𝛼, without reporting the real type I error, 𝛼𝐾 . “Brute force” p-hacking
is the type of p-hacking that relies on sheer computational power, sometimes with the assistance
of an optimization algorithm, such as stepwise factor selection (Romano and Wolf [2005]) or the
popular general-to-simple algorithm (Greene [2012, pp. 178–182]). The effect of brute force p-
hacking is that the researcher mistakes random variability (noise) for signal, resulting in a false
association. In the absence of serial dependence, the expected return of a brute force p-hacked strategy is
zero, before transaction costs and fees (Bailey et al. [2014], Harvey et al. [2016]).

Some factor researchers have argued that 𝐾 is a relatively small number in financial economics,
because there are only a few hundred published factors, and out of those, only 80 or so can be
considered uncorrelated. This argument evidences a misunderstanding of the meaning of 𝐾. To
control for overfitting, we need to account for the number of unreported trials, not the number of
published (true and false) positives. Even if a single factor had been published by the entire
economics profession, that factor would be a likely false positive for some number 𝐾 of unreported
trials. How many unreported trials have taken place over the past 60 years? Nobody knows, of
course, because factor researchers have not reported this information, but it is possibly in the
billions. For example, the procedures inspired by Fama and MacBeth [1973] and Fama and French
[1993, 2015] involve a large number of subjective decisions, such as fit window length, fit
frequency, number of quantiles, definition of long-short portfolios, choice of controls, choice of
factors, choice of investment universe, data cleaning and outlier removal decisions, start and end
dates, etc. There are millions of potential combinations to pick from, many of which could be
defended ex-post on logical grounds.

Causal theories are a deterrent against brute force p-hacking, because they constrain a researcher’s
modelling choices to those consistent with the proposed falsifiable mechanism, thus curtailing
efforts to explain random variation (López de Prado [2023, section 6.4.1]). A researcher who
engages in p-hacking may build an ad hoc theory that explains an observed random variation.
However, other researchers will use the theory to design an experiment where the original random
variation is not observed (López de Prado [2023, section 3.3]).

PITFALL 2: FEW-SHOT P-HACKING


As explained earlier, “brute force” p-hacking is a well-documented practice in finance, whereby
selection bias under multiple testing misleads researchers into mistaking noise for signal. Far less
known is “few-shot” p-hacking, whereby the econometric canon induces specification errors that
bias the estimated p-values downwards (López de Prado and Zoonekynd [2024]). To understand
why, first we need to introduce the concept of a collider. Consider a data-generating process with
equations:

𝑌 := 𝑋𝛽 + 𝑢    (1)
𝑍 := 𝑌𝛾 + 𝑋𝛿 + 𝑣

where 𝛾 ≠ 0, 𝛿 ≠ 0, and variables (𝑢, 𝑣, 𝑋) are independent and identically distributed as a standard Normal, (𝑢, 𝑣, 𝑋) ~ 𝑁[0, 𝐼].¹ The causal graph for this process is displayed in Figure 1.² This is the graph that represents through nodes all the variables involved in the process, and through arrows the direction of dependences. In the language of causal inference, 𝑍 is a collider, because it is influenced by both the cause (𝑋) and the effect (𝑌).³ The non-causal path from 𝑋 to 𝑌 through 𝑍 (i.e., 𝑋 → 𝑍 ← 𝑌) is blocked, unless the model controls for collider 𝑍.

¹ Serial independence, linearity, stationarity and Normality are highly idealized assumptions commonly adopted in the factor investing literature. One goal of this paper is to show that, even under this ideal scenario, factor strategies are structurally flawed and will not perform as designed.
² In this paper, we focus on causal graphs that are acyclic. Acyclicality allows us to represent joint distributions as the product of conditional probabilities between ancestors and descendants only. This choice does not limit the analysis to less realistic scenarios, because acyclicality can usually be imposed through: (a) redefinition of variables, e.g. adding lagged variables as causes; and (b) targeted sampling frequency, such that the effect on 𝑌 is measured before feedback loops come into play.

Figure 1 – Variable 𝑍 as a controlled collider

The data-generating process is unknown to observers. With proper knowledge of the causal graph,
observers would estimate the effect of 𝑋 on 𝑌 (green arrow in Figure 1) by fitting the equation
𝑌 = 𝑋𝛽 + 𝜀. With that model specification, 𝐸[𝑌|𝑋] = 𝐸[𝑌̂|𝑋], i.e. the true value of 𝑌 (from the
data-generating process) is on average equal to the value of 𝑌 expected by observers, and the
parameter is estimated without bias, 𝐸[𝛽̂ |𝑋] = 𝛽.

Suppose that observers incorrectly attempt to estimate the causal effect of 𝑋 on 𝑌 by fitting the
equation 𝑌 = 𝑋𝛽 + 𝑍𝜃 + 𝜀 on a sample produced by the process. Controlling for collider 𝑍
(shaded node in Figure 1) unblocks the non-causal path (red arrows in Figure 1). Then, 𝛽 and 𝜃
are estimated with a bias, with expected values,

𝐸[𝛽̂|𝑋, 𝑍] = (𝛽 − 𝛿𝛾) / (1 + 𝛾²)    (2)
𝐸[𝜃̂|𝑋, 𝑍] = 𝛾 / (1 + 𝛾²)

Parameters (𝛿, 𝛾) may bias the estimate of risk premium 𝛽 (note that the bias is zero when (𝛿, 𝛾) =
(0,0)). For appropriate values of (𝛿, 𝛾), the bias can be so large that it flips the sign of the estimated
𝛽 and 𝜃 (a phenomenon that biostatisticians call Berkson’s fallacy, named after the physician who
discovered it, see Berkson [1946]). Gu et al. [2023] present empirical evidence that the factor
model regression in Fama and French [2015] over-controls for at least one collider (market returns
net of the risk-free rate).
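The collider bias in equation (2) can be verified by simulation. The sketch below is ours, with illustrative parameter values (𝛽, 𝛾, 𝛿) = (1, 1, 3) chosen so that the bias flips the sign of the estimate; it generates data from the process in equation (1) and regresses 𝑌 on (𝑋, 𝑍):

```python
import numpy as np

# Collider data-generating process: Y := X*beta + u, Z := Y*gamma + X*delta + v.
rng = np.random.default_rng(0)
n, beta, gamma, delta = 1_000_000, 1.0, 1.0, 3.0
X, u, v = rng.standard_normal((3, n))
Y = X * beta + u
Z = Y * gamma + X * delta + v

# Misspecified regression of Y on (X, Z): controlling for the collider Z.
A = np.column_stack([X, Z])
beta_hat, theta_hat = np.linalg.lstsq(A, Y, rcond=None)[0]

print(round(beta_hat, 2))   # close to (beta - delta*gamma)/(1 + gamma**2) = -1.0
print(round(theta_hat, 2))  # close to gamma/(1 + gamma**2) = 0.5
```

Although the true risk premium is 𝛽 = 1, the over-controlled regression recovers a negative estimate, reproducing the sign flip of Berkson’s fallacy.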

Cochrane [2011] famously described the proliferation of spurious factor claims as “the factor zoo.”
Until now, authors have explained this proliferation as the result of brute-force p-hacking via
multiple testing (e.g., Harvey et al. [2016]). While this remains a very plausible explanation, the
adverse outcomes mechanism indicates that this p-hacking may not be a brute-force exercise.
López de Prado and Zoonekynd [2024] proved that misspecified factor models that over-control
for a collider exhibit a higher R-squared than correctly specified models and, under general
conditions, colliders make it easier to p-hack a factor. The econometric canon enables “few-shot
p-hacking”, by favoring over-controlled underperforming (including money-losing) models with
higher R-squared and lower p-values than the correctly specified money-making model. Factor
researchers complying with the current econometric canon produce p-hacked factor strategies after a few trials, without ill-intent, and despite their best efforts to avoid selection bias or to control for multiple testing, with the perverse consequences for investors described in the following sections.

³ For a general introduction to causal inference, causal discovery and do-calculus, see Neal [2020], Glymour et al. [2019], Pearl [2009], Pearl et al. [2016]. For an introduction to causal inference in finance, see López de Prado [2023].

PITFALL 3: SYSTEMATIC LOSSES


A long-held belief among factor researchers is that model specification errors do not cause
systematic losses, as long as risk premia remain constant and are estimated with the correct sign.
In this section, we debunk that myth. Consider an investor who wishes to harvest the risk premia
offered by securities with returns 𝑌, where the data-generating process is the one described earlier.
With the correctly specified factor model, the factor strategy that sizes positions by 𝐸[𝑌̂|𝑋]
produces expected returns

𝐸[𝑌𝐸[𝑌̂|𝑋]|𝑋] = (𝑋𝛽)² ≥ 0    (3)

See López de Prado and Zoonekynd [2024] for a proof. The above equation shows that the factor
strategy based on the correctly specified factor model extracts the only risk premium, 𝛽, as
designed. The collider parameters (𝛾, 𝛿) are absent from the equation.

However, with the incorrect model specification 𝑌 = 𝑋𝛽 + 𝑍𝜃 + 𝜀, the investor attempts to design
a factor strategy that profits from securities’ exposures 𝑋 and 𝑍, instead of from the only factor
exposure, 𝑋. One problem with this attempt is that collider 𝑍 is a function of 𝑌, and not the other
way around. This model specification is an instance of reverse causation. By the time 𝑍 is known
and it is possible to condition on its value, the value of 𝑌 has already been set. In other words, it is
not possible to estimate 𝐸[𝑌̂|𝑋, 𝑍] before the value of 𝑌 has been set. If the investor knew that 𝑍
is a function of 𝑌, he could at least condition on 𝐸[𝑍̂|𝑋] = 𝑋(𝛽𝛾 + 𝛿), but this is inconsistent with
the investor’s (false) assumption that 𝑍 is a cause of 𝑌. Therefore, attempting to condition on a
value of 𝑍 before the value of 𝑌 is set is tantamount to conditioning on a variable 𝑍̃ that adds no
information beyond what is known when the value of 𝑋 is set. Formally, 𝐸[∙ |𝑋, 𝑍̃] = 𝐸[∙ |𝑋], thus
𝐸[𝑢|𝑋, 𝑍̃] = 𝐸[𝑢|𝑋] = 0. The expected returns of a strategy that sizes positions by 𝐸[𝑌̂|𝑋, 𝑍̃] are

𝐸[𝑌𝐸[𝑌̂|𝑋, 𝑍̃]|𝑋, 𝑍̃] = 𝑋𝛽 (𝑋𝛽 + 𝛾(𝑍̃ − 𝛿𝑋)) / (1 + 𝛾²)    (4)

This factor strategy has failed to achieve the intended expected return, (𝑋𝛽 + 𝑍𝜃)², which would
have been achieved if 𝑍 had been a confounder or an independent cause of 𝑌. Controlling for
collider 𝑍 has unblocked the non-causal path 𝑋 → 𝑍 ← 𝑌, with the effect of creating a non-causal
association between 𝑋 and 𝑌. Accordingly, now the collider parameters (𝛾, 𝛿) appear in the
equation for expected strategy returns, despite not being risk premia. Even if 𝛽 = 0, this over-controlled factor strategy is still exposed to that non-causal association.

For appropriate combinations of real values of (𝑋, 𝑍̃, 𝛽, 𝛾, 𝛿), over-controlling for a collider causes
the factor strategy to yield systematic losses, i.e. 𝐸[𝑌𝐸[𝑌̂|𝑋, 𝑍̃]|𝑋, 𝑍̃] < 0. For example, the
strategy yields systematic losses under (𝑋, 𝑍̃, 𝛽, 𝛾, 𝛿) = (1,1,1,1,3). This proves that factor
strategies that over-control for a collider can yield systematic losses, even if all correlations remain
constant and the risk premium is estimated with the correct sign.
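The loss in the paper’s example is easy to verify from equation (4). The function below is a direct transcription of that equation; the function name is ours:

```python
# Expected return of the over-controlled strategy, equation (4):
# E[Y E[Yhat|X,Zt] | X,Zt] = X*beta*(X*beta + gamma*(Zt - delta*X))/(1 + gamma**2)
def overcontrolled_expected_return(X, Z_tilde, beta, gamma, delta):
    return X * beta * (X * beta + gamma * (Z_tilde - delta * X)) / (1 + gamma ** 2)

# The paper's example (X, Ztilde, beta, gamma, delta) = (1, 1, 1, 1, 3):
print(overcontrolled_expected_return(1, 1, 1, 1, 3))  # -0.5
```

A strictly negative expected return, even though the risk premium 𝛽 = 1 is positive and the collider parameters are held constant.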

PITFALL 4: FRAGILITY
As currently developed, factor investing strategies are not only prone to systematic losses, but they
also are not robust to parameter shift. To see why, we need to introduce the concept of a
confounder. Consider a data-generating process with equations:

𝑋 := 𝑍𝛿 + 𝑣    (5)
𝑌 := 𝑋𝛽 + 𝑍𝛾 + 𝑢

where 𝛾 ≠ 0, 𝛿 ≠ 0, and variables (𝑢, 𝑣, 𝑍) are independent and identically distributed as a standard Normal, (𝑢, 𝑣, 𝑍) ~ 𝑁[0, 𝐼]. The causal graph for this process is displayed in Figure 2. In
the language of causal inference, 𝑍 is called a confounder, because it influences both the cause (𝑋)
and the effect (𝑌), thus obfuscating the true effect of 𝑋 on 𝑌. The arrow in green denotes the causal
path from 𝑋 to 𝑌, and the arrows in red denote a non-causal path from 𝑋 to 𝑌 through 𝑍, i.e. 𝑋 ←
𝑍 → 𝑌. In the presence of a confounder, this is also called a backdoor path, because it has an arrow
pointing into the cause (𝑋).

Figure 2 – Variable 𝑍 as confounder

The data-generating process is unknown to observers. With proper knowledge of the causal graph,
observers would estimate the effect of 𝑋 on 𝑌 by fitting the equation 𝑌 = 𝑋𝛽 + 𝑍𝛾 + 𝜀.
Controlling for confounder 𝑍 blocks the backdoor path, thus achieving that 𝐸[𝑌|𝑋, 𝑍] =
𝐸[𝑌̂|𝑋, 𝑍], i.e. the true value of 𝑌 (from the data-generating process) is on average equal to the
value of 𝑌 expected by observers, and the parameters are estimated without bias, 𝐸[𝛽̂ |𝑋, 𝑍] = 𝛽,
𝐸[𝛾̂|𝑋, 𝑍] = 𝛾.

Suppose that observers incorrectly attempt to estimate the causal effect of 𝑋 on 𝑌 by fitting the
equation 𝑌 = 𝑋𝛽 + 𝜀 on a sample produced by the process. Then, the backdoor path remains
unblocked, and the expected value of the estimated 𝛽 is,

𝐸[𝛽̂|𝑋] = 𝛽 + 𝛾𝛿 / (1 + 𝛿²)    (6)

For appropriate values of (𝛿, 𝛾), the confounder bias 𝛾𝛿/(1 + 𝛿²) can be so large that it flips the sign of
the estimated 𝛽 (a phenomenon known as Simpson’s paradox, see Pearl [2014]). Sadeghi et al.
[2024] apply the PC algorithm for causal discovery to the variables used in Fama and French
[1993], and find that the HML factor may be a collider (see figure 8b in that publication).
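The omitted-variable bias in equation (6) can likewise be verified by simulation. This sketch is ours; the values (𝛽, 𝛾, 𝛿) = (1, 3, 1) are illustrative. It generates data from the process in equation (5) and compares the under-controlled and correctly specified regressions:

```python
import numpy as np

# Confounder data-generating process: X := Z*delta + v, Y := X*beta + Z*gamma + u.
rng = np.random.default_rng(1)
n, beta, gamma, delta = 1_000_000, 1.0, 3.0, 1.0
Z, u, v = rng.standard_normal((3, n))
X = Z * delta + v
Y = X * beta + Z * gamma + u

# Under-controlled regression (Z omitted) vs. correctly specified regression.
b_omitted = np.linalg.lstsq(X[:, None], Y, rcond=None)[0][0]
b_controlled = np.linalg.lstsq(np.column_stack([X, Z]), Y, rcond=None)[0][0]

print(round(b_omitted, 2))     # close to beta + gamma*delta/(1 + delta**2) = 2.5
print(round(b_controlled, 2))  # close to beta = 1.0
```

Omitting the confounder leaves the backdoor path open, inflating the estimated premium from 1.0 to roughly 2.5; controlling for 𝑍 recovers the true 𝛽.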

From an investment standpoint, variables 𝑋 and 𝑍 are factor exposures, because both are a cause
of 𝑌. However, not all system parameters are risk premia. This can be seen in the correctly
specified model, 𝐸[𝑌̂|𝑋, 𝑍] = 𝑋𝐸[𝛽̂|𝑋, 𝑍] + 𝑍𝐸[𝛾̂|𝑋, 𝑍] = 𝑋𝛽 + 𝑍𝛾 = 𝐸[𝑌|𝑋, 𝑍], thus 𝑌 is only influenced by two parameters, (𝛽, 𝛾). Therefore, (𝛽, 𝛾) are risk premia, and 𝛿 is not a risk premium.
Parameter 𝛿 is a confounding effect that may bias the estimate of risk premium 𝛽 (note that the
bias is zero when 𝛿 = 0).

Suppose that, sometime afterwards, the confounding effect shifts from 𝛿₀ to 𝛿₁, while the risk premia (𝛽, 𝛾) remain constant. For a long period of time, there are not enough observations to detect that shift, so the investor continues to use the original parameter estimates. After the shift in parameters, a factor strategy based on the correctly specified model that sizes positions by 𝐸₀[𝑌̂|𝑋, 𝑍] delivers expected returns

𝐸₁[𝑌𝐸₀[𝑌̂|𝑋, 𝑍]|𝑋, 𝑍] = (𝑋𝛽 + 𝑍𝛾)² ≥ 0    (7)

See López de Prado and Zoonekynd [2024] for a proof. This equation does not include 𝛿₁; therefore, the performance of the factor strategy based on the correctly specified model is not affected by the shift from 𝛿₀ to 𝛿₁.

In contrast, for a factor strategy based on a model that under-controls for a confounder and sizes positions by 𝐸₀[𝑌̂|𝑋], the expected returns after the parameter shift are

𝐸₁[𝑌𝐸₀[𝑌̂|𝑋]|𝑋] = 𝑋² (𝛽 + 𝛾𝛿₁/(1 + 𝛿₁²)) (𝛽 + 𝛾𝛿₀/(1 + 𝛿₀²))    (8)

For appropriate combinations of real values of the above variables, the strategy yields systematic losses, i.e. 𝐸₁[𝑌𝐸₀[𝑌̂|𝑋]|𝑋] < 0. For example, the strategy yields systematic losses under (𝛽, 𝛾, 𝛿₀, 𝛿₁) = (1, 3, 1, −1), even though the risk premia remain constant and 𝐸₀[𝛽̂|𝑋] = 2.5 is
estimated with the correct sign (𝛽 = 1). This proves that factor strategies that under-control for a
confounder can yield systematic losses, even if all risk premia remain constant and are estimated
with the correct sign.
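The paper’s numerical example can be checked directly against equation (8). The function below is a transcription of that equation; the function name is ours:

```python
# Expected return after the confounding effect shifts from delta0 to delta1,
# equation (8), for a strategy that under-controls for the confounder:
# E1[Y E0[Yhat|X] | X] = X**2 * (beta + gamma*delta1/(1+delta1**2))
#                             * (beta + gamma*delta0/(1+delta0**2))
def undercontrolled_expected_return(X, beta, gamma, delta0, delta1):
    b0 = beta + gamma * delta0 / (1 + delta0 ** 2)  # pre-shift biased estimate
    b1 = beta + gamma * delta1 / (1 + delta1 ** 2)  # post-shift counterpart
    return X ** 2 * b1 * b0

# The paper's example (beta, gamma, delta0, delta1) = (1, 3, 1, -1), at X = 1:
print(undercontrolled_expected_return(1, 1, 3, 1, -1))  # -1.25
```

The pre-shift estimate b0 = 2.5 matches the paper’s 𝐸₀[𝛽̂|𝑋] = 2.5, yet the strategy’s expected return after the shift is negative.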

PITFALL 5: TIME-VARYING RISK PREMIA


Factor investing strategies exhibit inconsistent performance over time, with pronounced
drawdowns. Academic authors explain this inconsistent performance in terms of time-varying risk
premia (e.g., Evans [1994], Anderson [2011], Cochrane [2011]). While these explanations are plausible, there is an arguably more likely and parsimonious (hence preferable) explanation: the actual (ground-truth) risk premia do not change over time; rather, the estimated risk premia change over time due to specification errors. To see how, consider the case where the market rewards exposure to 𝑋 (𝛽 > 0) in the collider model described earlier. Even if risk premium 𝛽 remains constant, changes over time in 𝛾 or 𝛿 will change 𝛽̂. In particular, for some values of (𝛾, 𝛿), 𝛽̂ < 0. Likewise, in the confounder case, changes over time in 𝛿 will change 𝛽̂, and for some values of 𝛿, 𝛽̂ < 0. Misspecified factor models misattribute performance, thus misleading investors into believing that the market has turned to punish exposure to risk characteristic 𝑋, when in reality their losses have nothing to do with changes in the risk premium 𝛽.

Causality is a necessary condition for investment efficiency, because: (a) non-causal factor models
are likely misspecified; (b) misspecified factor models misattribute risk and performance; and (c) risk and performance misattribution lead to inefficient asset allocation. Causal models allow
investors to attribute risk and performance to the variables responsible for a phenomenon. With
proper attribution, investors can build a portfolio exposed only to rewarded risks, and aim for
investment efficiency (see Rebonato [2010], Rebonato and Denev [2014], Denev [2015]).

PITFALL 6: THE CAUSAL MECHANISM FOR ADVERSE OUTCOMES


Virtually all factor researchers apply econometric procedures that fundamentally date back to the
1930s (when the Econometric Society and the Cowles Commission were launched) and even the
late 18th century (when Gauss developed the least squares method, see Stigler [1981]), before the
development of causal inference. For the reasons explained earlier, the current econometric canon
misleads researchers into selecting a certain class of misspecified factor models, in place of
correctly specified models. Factor researchers (who selected the misspecified factor models) build
investment portfolios that are more likely to underperform and yield systematic losses, to the
benefit of more informed investors (who applied causal discovery to select correctly specified
models). Consequently, factor investors and informed investors tend to take opposite sides of the
same trade.

Figure 3 – The causal mechanism behind the failure of factor investing

Figure 3 outlines the causal mechanism that enables adverse outcomes in factor investing. Due to
Pitfall 3, factor strategies based on factor models that over-control for a collider can yield
systematic losses, even if correlations remain constant and the risk premium was estimated with
the correct sign (link 1 in the causal mechanism). Due to Pitfall 2, misspecified factor models that
over-control for a collider exhibit a higher R-squared than correctly specified models. Also, under
general conditions, colliders make it easier to p-hack a factor (link 2 in the causal mechanism).
Canonical approaches for specification-selection in econometrics favor precisely those traits (high
R-squared and low p-values), hence over-controlled models crowd out correctly specified models
(link 3 in the causal mechanism). The combined effect is an increased association between selected
(either published or deployed) factor investing strategies and underperformance, including
systematic losses (link 4 in the causal mechanism).

There are two reasonable counter-arguments to Figure 3. First, factor investing papers usually
publish long historical backtests with a positive expected return. Could those positive backtests
indicate that those strategies’ models are not over-controlled? The answer is no. As the False
Strategy Theorem has proven (Bailey et al. [2014], López de Prado and Fabozzi [2018], López de
Prado and Bailey [2021]), it is trivial to overfit a backtest to simulate any desired performance.
For example, a researcher can run thousands of backtests by trying alternative investment
universes and time periods, introducing arbitrary profit-taking and stop-out rules, or imposing ad hoc constraints on the portfolio optimization, to cite a few common variations. The expected Sharpe ratio of the best backtest rises with the number of trials, even if the true Sharpe ratio is zero.
Historical backtests do not model the data-generating process, hence they cannot tell us much
about whether the factor model is correctly specified. A better tool for assessing specification
errors is a causal discovery algorithm. A second counter-argument to Figure 3 is that some popular
factor investing strategies have delivered positive out-of-sample performance for multiple years,
before experiencing losses or flattening out. Could this be evidence that these strategies’ models
are not over-controlled? Again, the answer is no. The sheer number of factors, many of which are
noisy (Akey et al. [2023]), has resulted in thousands of alternative implementations. Some of those
implementations will perform well out-of-sample for a while (a statistical fluke). Also, a strategy’s
popularity can push up the prices of securities with targeted characteristics, driving up the
strategy’s live performance for several years (a self-fulfilling prophecy), before losses offset the
initial trend. The factor investing industry continues to grow at sufficient pace to potentially mask
its true expected returns for years.
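The inflation of the best-of-K backtest can be illustrated with a short simulation. All names and parameter choices below are ours: strategies are i.i.d. noise with a true Sharpe ratio of zero, annualized with √252:

```python
import numpy as np

rng = np.random.default_rng(7)

def best_sharpe(K: int, n_days: int = 1_000) -> float:
    """Annualized Sharpe ratio of the best of K pure-noise backtests."""
    returns = rng.standard_normal((K, n_days))       # true Sharpe ratio is zero
    sr_daily = returns.mean(axis=1) / returns.std(axis=1)
    return float(sr_daily.max() * np.sqrt(252))      # annualized best of K

# The best-of-K Sharpe ratio inflates with the number of trials:
for K in (10, 100, 10_000):
    print(K, round(best_sharpe(K), 2))
```

With enough unreported trials, a researcher can report an impressive backtest drawn from pure noise, which is the point of the False Strategy Theorem.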

The causal mechanism for adverse outcomes is enforced by canon-complying factor researchers
(a self-inflicted wound), not by the informed investors taking the other side of the trades. The
solution is to correct the econometric canon such that the causal mechanism is disrupted. Figure 3
suggests two corrections. First, utilize the tools of causal inference. The correct way to justify a
factor model’s specification is to discover the causal graph associated with the data-generating
process, and to apply do-calculus to block all non-causal paths, thus avoiding both under-controlling for a confounder and over-controlling for a collider. Second, prevent over-controlled models from crowding out correctly specified factor models. Academics and practitioners should desist from the practice of choosing the factor model specification with the highest explanatory power, as this practice all but guarantees over-controlling, with potentially dire consequences for
investors. Investors can also take matters into their own hands, by defunding investment strategies
that do not have a solid causal foundation (see López de Prado [2023]).

THE EMPIRICIST-RATIONALIST DIALECTIC


The purpose of this paper is not to argue that factor investing cannot work. On the contrary, there
are good economic reasons for expecting that correctly specified (and hence, causal) factor models
can yield systematic profits, and can be robust to parameter shifts. The issue is that financial economists have relegated themselves to the role of econometric assistants, and have accepted the reductionist view that p-values or backtests rank highest in the hierarchy of empirical evidence. In
reality, a p-value is useless if the regression is misspecified, p-values are easily overfit, and a
backtest cannot tell us why an investment strategy presumably works. Selection bias under
multiple testing has resulted in the factor zoo of irreproducible results. Similarly, black-box
machine learning methods are not robust to parameter shifts, thus they are generally unsuitable for
modelling complex dynamic systems like the financial markets. Robustness is a virtue of causality,
and economists are best positioned to identify and explain the causal mechanisms responsible for
the observed associations.

Quantamental approaches advocate taking discretionary investment decisions that are loosely
informed by quantitative signals. This is the polar opposite of econometrics’ empiricist philosophy,
whereby economists decide what empirical evidence is admissible, and how it should be
interpreted (a form of confirmation bias). Neither the econometric nor the quantamental approach provides a coherent collaboration framework: one side ends up imposing its view on the other. It
is amusing to recognize in this econometric-quantamental dialectic the reverberations of the
empiricist-rationalist debate that predated the scientific revolution.

THE PATH FORWARD


What is the solution? History provides an answer: the scientific method was developed precisely
to reconcile rationalism with empiricism. One caveat is finance’s barriers to experimentation;
however, recent advances in causal inference make it possible for researchers to design controlled,
natural, and simulated experiments. Causal factor investing allows economists to constrain
statisticians’ tendency to mistake noise for signal, and it allows statisticians to constrain
economists’ tendency to produce unfalsifiable ad-hoc and ex-post rationales, because a causal
explanation must be refutable through experimentation. Under this new methodological
framework, economists and statisticians can collaborate on equal terms, drawing on each other’s
strengths, towards the development of robust, theoretically sound quantitative investment
strategies that overcome the pitfalls of associational (including econometric and black-box
machine learning) models.

CONCLUSIONS

A common belief among researchers and investors is that specification errors are not dangerous,
because an investment strategy based on a misspecified factor model will still perform well as long
as the risk premia remain constant and are estimated with the correct sign. In this paper, we have
shown that this belief is false, for two reasons. First, even if all correlations remain constant,
misspecified factor strategies: (a) underperform the correctly specified strategy in the confounder
case; and (b) can yield systematic losses in the collider case. This is true regardless of whether the
risk premia were estimated with the correct sign. Second, the performance of factor strategies that
under-control for a confounder is not robust to parameter shift, and these strategies can yield
systematic losses, even if the risk premia remain constant and are estimated with the correct sign.
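As a hedged illustration of the confounder case (this sketch is ours, not part of the paper’s formal results; the data-generating process and all parameter values below are hypothetical), a short simulation shows how an under-controlled specification distorts the estimated premium even when all relationships are stable:

```python
import numpy as np

# Hypothetical data-generating process: Z -> X, Z -> Y, and X -> Y.
# The true factor premium (causal effect of X on Y) is 0.5.
rng = np.random.default_rng(42)
n = 200_000
Z = rng.normal(size=n)                      # confounder (e.g., an unobserved macro variable)
X = 0.8 * Z + rng.normal(size=n)            # factor exposure, partly driven by Z
Y = 0.5 * X + 1.0 * Z + rng.normal(size=n)  # asset returns

# Under-controlled specification: regress Y on X alone, omitting Z.
beta_naive = np.cov(X, Y)[0, 1] / np.var(X)

# Correct specification: control for the confounder Z.
A = np.column_stack([X, Z, np.ones(n)])
beta_correct = np.linalg.lstsq(A, Y, rcond=None)[0][0]

# The naive estimate absorbs Z's effect and overstates the premium;
# only the correct specification recovers the true value of 0.5.
print(beta_naive, beta_correct)
```

Here the naive slope converges to 0.5 + 0.8/1.64 ≈ 0.99 rather than 0.5: the sign happens to be correct, but the premium is roughly doubled, so a strategy sized on this estimate would be miscalibrated.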

The implication is that factor researchers should stop dismissing concerns about specification
errors as inconsequential.4 Authors should carefully justify their specification choices,
discuss potential confounders, and argue why the chosen controls cannot be colliders. Put bluntly,
without some knowledge of the causal graph, it is possible (even likely) that an investment strategy
will be fatally flawed.

Causal discovery and causal inference methods are being applied successfully to untangle the
causal relationships in complex systems across various scientific fields (for example, see Runge et
al. [2019], Shen et al. [2021]). In contrast, virtually no factor investing studies motivate their
specification choices in terms of causal graphs derived from data; or develop refutable economic
theories consistent with discovered causal graphs; or provide an analysis of potential colliders,
confounders or mediators; or apply do-calculus to derive a sufficient adjustment set. The
consequence is that factor researchers routinely make avoidable specification errors. A
misspecified model that over-controls for a collider exhibits a higher explanatory power than a

4 For instance, in a recent interview, Prof. Kenneth French is quoted as saying that “efforts to
establish causality may be a distraction.” The full interview can be found here:
https://www.risk.net/investing/7959101/the-quants-who-kicked-the-hornets-nest-to-champion-causality

correctly specified model. Given the standard econometric practice of choosing a model’s
specification based on its explanatory power, it is natural for over-controlled models to crowd-out
correctly specified models, making it more likely for underperforming and money-losing models
to be selected.
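To make this concrete, a hypothetical numerical sketch (ours, not from the paper; the data-generating process and all parameter values are invented) shows a collider raising in-sample explanatory power while flipping the sign of the estimated premium:

```python
import numpy as np

# Hypothetical collider structure: X -> Y, and C is caused by both X and Y (X -> C <- Y).
# The true factor premium is +0.5; the collider C should NOT be controlled for.
rng = np.random.default_rng(7)
n = 200_000
X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)
C = X + Y + rng.normal(size=n)  # collider

def ols(y, *regressors):
    """OLS with an intercept; returns (coefficients, in-sample R-squared)."""
    A = np.column_stack(list(regressors) + [np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, 1.0 - resid.var() / y.var()

coef_ok, r2_ok = ols(Y, X)       # correctly specified model
coef_bad, r2_bad = ols(Y, X, C)  # over-controls for the collider

# The misspecified model fits better in-sample (r2_bad > r2_ok), yet its
# estimate of the premium on X is negative: the sign of the effect flips.
print(coef_ok[0], r2_ok, coef_bad[0], r2_bad)
```

In this setup the population values are +0.5 for the correct model (R² = 0.2) versus −0.25 for the over-controlled one (R² = 0.6): a researcher who selects on explanatory power would pick the model that trades the factor in the wrong direction.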

Our findings challenge the scientific soundness and long-term profitability of the current multi-
trillion-dollar (associational, casual, non-causal) factor investing industry. To overcome these
pitfalls, academics and practitioners should rebuild the financial economics literature on the more
scientifically rigorous grounds of causal factor investing.

REFERENCES

Bailey, D., J. Borwein, M. López de Prado, and J. Zhu (2014): “Pseudo-Mathematics and Financial
Charlatanism: The Effects of Backtest Overfitting on Out-Of-Sample Performance.” Notices of the
American Mathematical Society, Vol. 61, No. 5, pp. 458-471

Berkson, J. (1946): “Limitations of the Application of Fourfold Table Analysis to Hospital Data.”
Biometrics Bulletin, Vol. 2, No. 3, pp. 47–53.

BlackRock (2017): “What Is Factor Investing?” Online publication. Available at
www.blackrock.com/us/individual/investment-ideas/what-is-factor-investing

Bloomberg (2021): “Bloomberg GSAM US Equity Multi Factor Index.” Bloomberg Professional
Services – Indices. Available through the Bloomberg Terminal and at
https://assets.bbhub.io/professional/sites/10/Bloomberg-GSAMUS-Equity-Multi-Factor-Index-Fact-Sheet.pdf

Chen, B. and J. Pearl (2013): “Regression and Causation: A Critical Examination of Six
Econometrics Textbooks.” Real-World Economics Review, No. 65, pp. 2–20.
http://www.paecon.net/PAEReview/issue65/ChenPearl65.pdf

Denev, A. (2015): Probabilistic Graphical Models: A New Way of Thinking in Financial
Modelling. Risk Books, 1st ed.

Fama, E. and K. French (1993): “Common risk factors in the returns on stocks and bonds.” Journal
of Financial Economics, Vol. 33, No. 1, pp. 3-56.

Fama, E. and K. French (2015): “A five-factor asset pricing model.” Journal of Financial
Economics, Vol. 116, No. 1, pp. 1-22.

Fama, E. and J. MacBeth (1973): “Risk, Return, and Equilibrium: Empirical Tests.” Journal of
Political Economy, Vol. 81, No. 3, pp. 607–636.

Glymour, C., K. Zhang, and P. Spirtes (2019): “Review of Causal Discovery Methods Based on
Graphical Models.” Frontiers in Genetics, Vol. 10, No. 524, pp. 1–15,
www.frontiersin.org/articles/10.3389/fgene.2019.00524/full.

Greene, W. (2012): Econometric Analysis. Pearson Education, 7th edition.

Gu, L., H. Zhang, A. Heinz, J. Liu, T. Yao, M. AlRemeithi, Z. Luo, D. Ruppert (2023): “Re-
examination of Fama-French factor investing with causal inference methods.” Working paper.
Available at https://ssrn.com/abstract=4677537

Harvey, C., Y. Liu, H. Zhu (2016): “… and the Cross-Section of Expected Returns.” Review of
Financial Studies, Vol. 29, No. 1, pp. 5–68.

Harvey, C. and Y. Liu (2020): “False (and Missed) Discoveries in Financial Economics.” Journal
of Finance, Vol. 75, No. 5, pp. 2503-2553.

López de Prado, M. (2023): Causal Factor Investing: Can Factor Investing Become Scientific?
Cambridge University Press. https://doi.org/10.1017/9781009397315

López de Prado, M. and D. Bailey (2021): “The False Strategy Theorem: A Financial Application
of Experimental Mathematics.” American Mathematical Monthly, Vol. 128, No. 9, pp. 825-831.

López de Prado, M. and F. Fabozzi (2018): “Being Honest in Backtest Reporting: A Template for
Disclosing Multiple Tests.” The Journal of Portfolio Management, Vol. 45, No. 1, pp. 141-147.

López de Prado, M. and V. Zoonekynd (2024): “Why has Factor Investing Failed?: The Role of
Specification Errors.” Working paper. Available at https://ssrn.com/abstract_id=4697929

Neal, B. (2020): Introduction to Causal Inference: From a Machine Learning Perspective. Course
Lecture Notes (December 17, 2020). www.bradyneal.com/causal-inference-course.

Neuberger Berman (2019): “Inside the Quant Investing Trend.” Quarterly Views.
www.nb.com/documents/public/en-us/Messinger_Client_Letter_2Q19.pdf

Pearl, J. (2009): Causality: Models, Reasoning and Inference. Cambridge, 2nd ed.

Pearl, J. (2014): “Understanding Simpson's Paradox.” The American Statistician. Vol. 68, No. 1,
pp. 8–13.

Pearl, J., M. Glymour, and N. Jewell (2016): Causal Inference in Statistics: A Primer. Wiley, 1st
ed.

Rebonato, R. (2010): Coherent Stress Testing. Wiley, 1st ed.

Rebonato, R. and A. Denev (2014): Portfolio Management under Stress: A Bayesian-Net
Approach to Coherent Asset Allocation. Cambridge University Press, 1st ed.

Romano, J. and M. Wolf (2005): “Stepwise Multiple Testing as Formalized Data Snooping.”
Econometrica, Vol. 73, No. 4, pp. 1237–1282.

Runge, J., P. Nowack, M. Kretschmer, S. Flaxman, and D. Sejdinovic (2019): “Detecting and
quantifying causal associations in large nonlinear time series datasets.” Science Advances, Vol. 5,
No. 11. Available at https://www.science.org/doi/10.1126/sciadv.aau4996

Sadeghi, A., A. Gopal, and M. Fesanghary (2024): “Causal discovery in financial markets: A
framework for nonstationary time-series data.” Working paper. Available at
https://arxiv.org/abs/2312.17375

Schipper, K. and R. Thompson (1981): “Common Stocks as Hedges against Shifts in the
Consumption or Investment Opportunity Set.” Journal of Business, Vol. 54, No. 2, pp. 305–328.

Shen, X., S. Ma, P. Vemuri, M. Castro, P. Caraballo, and G. Simon (2021): “A novel method for
causal structure discovery from EHR data and its application to type-2 diabetes mellitus.”
Scientific Reports, Vol. 11, Article 21025. Available at
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8546093/

