A Trading Agent With No Intelligence Routinely Outperforms AI

Abstract— There is a long tradition of research using computational intelligence, i.e. methods from artificial intelligence (AI) and machine learning (ML), to automatically discover, implement, and fine-tune strategies for autonomous adaptive automated trading in financial markets, with a sequence of research papers on this topic published at major AI conferences such as IJCAI and in prestigious journals such as Artificial Intelligence: we show evidence here that this strand of research has taken a number of methodological mis-steps and that actually some of the reportedly best-performing public-domain AI/ML trading strategies can routinely be out-performed by extremely simple trading strategies that involve no AI or ML at all. The results that we highlight here could easily have been revealed at the time that the relevant key papers were published, more than a decade ago, but the accepted methodology at the time of those publications involved a somewhat minimal approach to experimental evaluation of trader-agents, making claims on the basis of a few thousand test-sessions of the trader-agent in a small number of market scenarios. In this paper we present results from exhaustive testing over wide ranges of parameter values, using parallel cloud-computing facilities, where we conduct millions of tests and thereby create much richer data from which firmer conclusions can be drawn. We show that the best public-domain AI/ML traders in the published literature can be routinely outperformed by a "sub-zero-intelligence" trading strategy that at face value appears to be so simple as to be financially ruinous, but which interacts with the market in such a way that in practice it is more profitable than the well-known AI/ML strategies from the research literature. That such a simple strategy can outperform established AI/ML-based strategies is a sign that perhaps the AI/ML trading strategies were good answers to the wrong question.

Keywords—Financial Markets; Automated Trading; Experiment Design; Simulation Methods.

I. INTRODUCTION
Ever since the establishment of the first significant stock-market in Amsterdam in 1611, skilled human traders could make a lot of money by trading in financial markets, and those human traders were widely considered to be intelligent. In the past 15 years, the number of human traders at the point of execution in most of the world's major capital markets has declined sharply, as the intelligent human traders were replaced by automated algorithmic trading systems, colloquially known in the industry as "robot traders". Given that successful human financial-market traders were generally considered to be intelligent (for some reasonable definition of that word), there was an obvious appeal in using tools and techniques from artificial intelligence (AI) and machine learning (ML) to create smarter robot traders. Arguably the initial spark or seed for the growth of algorithmic trading was the publication of a sequence of public-domain research papers in the late 1990s and early 2000s that reported on trading algorithms which used AI/ML methods to autonomously adapt their trading behavior as market conditions altered: some of these trading algorithms were demonstrated to consistently out-perform human traders, a result that generated worldwide press coverage. Because these adaptive robot traders asked for no pay or holidays or sleep, employing profitable adaptive robot traders was clearly a more attractive commercial proposition than continuing to employ human traders, and so the reduction in human head-count began. As the amount of money to be made from automated trading grew, so did the secrecy surrounding new developments in the field: if someone creates an innovative improvement on the profitability of an automated trading system, the economically rational thing to do is to keep it a secret and quietly make money with the new technology, in the hope that none of your competitors make the same discovery.

In this paper we re-visit the sequence of papers from 1993-2008 that gave details of various public-domain trading algorithms. We describe those papers and the corresponding algorithms in more detail in Section II of this paper, but for the purposes of this introduction it suffices to say that the various algorithms are known by the acronyms ZIC, ZIP, GDX, and AA: ZIC, introduced in 1993 by Gode & Sunder [9], was an extraordinarily simple "zero intelligence" trading algorithm that could exhibit surprisingly human-like market dynamics; ZIP was publicised by Hewlett-Packard in 1997 [2] as an improvement on ZIC; GDX was released in 2001 by IBM researchers [15] as an improvement on ZIP (and was claimed at the time of its release to be the best-performing trading algorithm in the public domain); and then in 2008 Vytelingum [17] described his AA trading algorithm, which was argued to out-perform ZIP, GDX, and ZIC. In this way, a dominance-hierarchy or "pecking-order" of trading algorithms was established: if we use A>B to represent the claim that Algorithm A outperforms Algorithm B, then the commonly accepted view among researchers in this field, on the basis of the published research literature, was that AA>GDX>ZIP>ZIC.

In our most recent previous paper [11], we presented preliminary results which demonstrated that this commonly-accepted dominance-hierarchy among these four trading algorithms may in fact be incorrect. Our argument in [11], summarised in more detail below, was that the old results were generated from relatively simplistic experimental evaluations, and that if we apply orders of magnitude more computational resources (via the cost-reductions offered to us by contemporary cloud-computing services) and conduct truly exhaustive comparisons between these four algorithms, using a state-of-the-art market-simulator as our test-bed, we find that the dominance hierarchy in the published record is incomplete, and/or wrong. This present paper uses the same methods as we deployed in [11], but it significantly extends our prior results, and in this paper we complete the analysis started in [11]: the results in that paper, generated from hundreds of thousands of market-simulation tests, showed four pair-wise comparisons, AA-vs-ZIC, AA-vs-ZIP, GDX-vs-ZIC, and GDX-vs-ZIP, each of which was run in a pre-existing market simulator called BSE that is similar in style to the market simulators used in the original research papers that introduced each trading algorithm, but which (like the other simulators) does not accurately model the parallel and asynchronous nature of trading in real financial markets. Then the same set of experiments was repeated in a brand-new market-simulator called TBSE, which does incorporate parallel and asynchronous processing. In [11] we reported four sets of pairwise-comparison results from TBSE and showed that they differed in significant and interesting ways from those generated in BSE: i.e., that the nature of the market-simulator used in the comparison studies has a major effect on the dominance hierarchy.

In [11] we presented tabulated quantitative data from our experiments, but in this paper we introduce graphical qualitative summaries of the numeric data, shown for our previous results in Figure 1.
Fig. 1. Pairwise dominance graphs summarising all the results presented in our most recent paper [11]: trading algorithms AA, GDX, ZIC, and ZIP were tested in pairwise contests (i.e., A-vs-B tests) in the sequential BSE simulator (left: Fig.1a) and in the parallel and asynchronous TBSE simulator (right: Fig.1b). Nodes in the graphs represent specific trading algorithms. Each diagram shows an arrow from Algorithm A to Algorithm B if, over all sessions in the pairwise contests, A scored more "wins" than B (A "wins" a session if it scored a higher average profit per trader than B in that session), i.e. if A dominates B in that sense. The numbers below each strategy name are the indegree/outdegree values for that trader in this graph. In the TBSE diagram, arrows that remain the same as in BSE are shown in pale gray, while arrows that have changed direction are shown in solid black. Note that these graphs are not fully connected, because Rollins & Cliff did not report results for AA-vs-GDX or for ZIC-vs-ZIP. Three of the four dominance relationships invert as the market-simulator test-bed is switched from the serial BSE to the parallel and asynchronous TBSE.

In this paper we present new results from extensive further experimental evaluation and comparisons, extending the four pairwise comparisons published in [11]: we show here results from AA-vs-GDX and from ZIC-vs-ZIP, which are necessary to "complete" the graphs shown in Fig.1, turning them into fully-connected networks; we then introduce two additional trading algorithms, called SHVR and GVWY, which are both minimally simple and involve absolutely no AI or ML technology: SHVR is a parasitic strategy, intended as a tongue-in-cheek model of contemporary high-frequency trading (HFT) algorithms, which involves no intelligence other than a relentless desire to undercut all of its competitors; GVWY is, prima facie, a financially suicidal strategy that is hard-wired to trade at zero profit. We then run pairwise comparison experiments between GVWY and SHVR and each of AA, GDX, ZIC, and ZIP, summarising our results in a style analogous to Fig.1, but now involving fully-connected six-node networks from each of BSE and TBSE. In each of the two simulators we run one set of experiments using static market supply-and-demand curves, i.e., the kind of market situation that the original experimental evaluations of ZIC, ZIP, GDX, and AA used; and then we run another set of experiments in which the market supply and demand change dynamically throughout each market session: a situation much closer to real-world markets.

And here's the rub: counter to what many people would expect, the results we present in Section IV demonstrate that in the more realistic evaluations GVWY (which involves no computational intelligence) consistently outperforms two of the AI/ML traders, algorithms that were first described in prestigious AI publications.

We offer an explanation for why this is so, in Section V, and then in Section VI we discuss how our results highlight the extent to which a series of methodological mis-steps seems to have occurred in the sequence of influential publications that introduced these well-known trading algorithms.

Before that, Section II of this paper gives a summary of the background to this work, and is taken largely verbatim from a position paper previously published by one of us (Cliff) in 2019 [4]: our new results presented here can be read as an empirical illustration of the arguments made in that paper. Readers familiar with [4] can safely skip forward to Section III, where we describe our experiment methods in more detail. All the new experiment results presented here, and full details of the new TBSE simulator, are given in Rollins' 2020 thesis [10], which includes details of the public-domain TBSE source-code repository on GitHub: the TBSE code is freely available for inspection and use by others.

II. BACKGROUND

A. Traders, Markets, and Experimental Economics

The 2002 Nobel Prize in Economics was awarded to Vernon Smith, in recognition of Smith's work in establishing and thereafter growing the field of Experimental Economics (abbreviated hereafter to "EE"). Smith showed that the microeconomic behavior of human traders interacting within the rules of some specified market, known technically as an auction mechanism, could be studied empirically, under controlled and repeatable laboratory conditions, rather than in the noisy messy confusing circumstances of real-world markets. The minimal laboratory studies could act as useful proxies for studying real-world markets of any type, but one particular auction mechanism has received the majority of attention: the Continuous Double Auction (CDA), in which any buyer can announce a bid-price at any time and any seller can announce an offer-price at any time, and in which at any time any trader in the market can accept an offer or bid from a counterparty, and thereby engage in a transaction. The CDA is the basis of major financial markets worldwide, and tens of trillions of dollars flow through CDA markets every year. Understanding how best to trade in a CDA is a challenging problem. Note that here the problem of trading involves trying to get the best price for a transaction within the CDA.
Each trader in one of Smith's experimental CDA markets would be assigned a private valuation, a secret limit price: for a buyer this was the price above which he or she should not pay when purchasing an item; for a seller this was the price below which he or she should not sell an item. These limit-price assignments model the client-orders executed by sales traders in real financial markets; we'll refer to them just as assignments in the rest of this paper. Traders in EE experiments from Smith's onwards are often motivated by payment of some form of real-world reward that is proportional to the amount of "profit" that they accrue from their transactions: the profit is the absolute value of the difference between the limit price specified when a unit is assigned to a trader, and the actual executed transaction price for that unit: a seller profits by transacting at a price above her current assignment's limit-price; a buyer profits by transacting at a price below her current assignment's limit-price.
The limit prices in the assignments defined the market's supply and demand schedules, which are commonly illustrated in economics texts as supply and demand curves on a 2D graph with quantity on the horizontal axis and price on the vertical axis: where the two curves intersect is the market's theoretical competitive equilibrium point, indicating the equilibrium price denoted here by P0. A fundamental observation from micro-economics is that competition among buyers pushes prices up, and competition among sellers pushes prices down, and these two opposing influences on prices balance out at the competitive equilibrium point; a market in which transaction prices rapidly and stably settle to the P0 value is often viewed by economists as efficient (for a specific definition of efficiency) whereas a market in which transactions consistently occur at off-equilibrium prices is usually thought of as inefficient: for instance, if transaction prices are consistently above P0 then it's likely that buyers are being systematically ripped off. By varying the prices in the traders' assignments in Smith's experiments, the nature of the market's supply and demand curves could be altered, and the effects of those variations on the speed and stability of the market's convergence toward an equilibrium point could be measured.

Smith's initial set of experiments was run in the late 1950s, and was described in his first paper on EE [12], published in the prestigious Journal of Political Economy (JPE) in 1962. The experiment methods laid out in that 1962 paper would subsequently come to dominate the methodology of researchers working to build adaptive autonomous automated trading agents by combining tools and techniques from Artificial Intelligence (AI) and Machine Learning (ML). This strand of AI/ML research converged toward a common aim: specifying an artificial agent, an autonomous adaptive trading strategy, that could automatically tune its behavior to different market environments, and that could reliably beat all other known automated trading strategies, thereby taking the crown of being the current best trading strategy known in the public domain, i.e., the dominant strategy. Over the past 20 years the dominant strategy crown has passed from one algorithm to another and until very recently Vytelingum's [17] AA strategy was widely believed to be the dominant strategy, but recent results using contemporary large-scale computational simulation techniques indicate that it does not always perform so well as was previously believed, as discussed in the next section, in which we briefly review key publications leading to the development of AA, and the recent research that called its dominance into question.

B. A Brief History of Trading Agents

If our story starts with Smith's 1962 JPE paper, then the next major step came 30 years later, with a surprising result published in the JPE by Gode & Sunder [9]: this popularized a minimally simple (so-called zero-intelligence) automated trading algorithm now commonly referred to as ZIC. ZIC traders generate their quote prices at random, using a uniform distribution that is bounded by the limit-price on their current assignment: a ZIC buyer generates quote-prices between zero and its current limit price (i.e. the limit-price is the upper-bound on the domain of the uniform distribution); a ZIC seller generates quote-prices between its limit-price (as a lower-bound) and some arbitrarily-chosen maximum allowable price.
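ZIC's quote-generation rule is simple enough to state in a few lines of Python. The following is a minimal sketch of that rule (our own illustration, not the original Gode & Sunder implementation; the system-wide price bounds here are assumptions):

```python
import random

MIN_PRICE = 1    # assumed minimum allowable quote price
MAX_PRICE = 500  # assumed arbitrarily-chosen maximum allowable quote price

def zic_quote(is_buyer, limit_price):
    """Return a ZIC quote-price: drawn uniformly at random,
    bounded by the trader's current assignment limit price."""
    if is_buyer:
        # buyer: uniform between the system minimum and its limit price
        return random.randint(MIN_PRICE, limit_price)
    else:
        # seller: uniform between its limit price and the system maximum
        return random.randint(limit_price, MAX_PRICE)
```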
A few years later two closely related research papers were published independently and at roughly the same time, each written without knowledge of the other: the first was a 1997 Hewlett-Packard Labs technical report by Cliff [2] describing the adaptive AI/ML trading-agent strategy known as the ZIP algorithm, inspired by ZIC but addressing situations in which markets populated by ZIC traders failed to equilibrate; the second was a 1998 paper [8] that summarized an adaptive CDA trading algorithm developed by Gjerstad as part of his economics PhD research: that trading algorithm is now widely known simply as GD. Gjerstad later worked at IBM's TJ Watson Labs where he helped set up an EE laboratory that his IBM colleagues used in a study which generated world-wide media coverage when its results were published by Das et al. at the prestigious International Joint Conference on AI (IJCAI) in 2001 [5]. This paper presented results from studies exploring the behavior of human traders interacting with GD and ZIP robot traders, and demonstrated that both GD and ZIP reliably outperformed human traders. A follow-on 2001 paper [14] by Tesauro & Das (two of the four co-authors of the IBM IJCAI paper) described a more extensively Modified GD (MGD) strategy, and in 2002 Tesauro & Bredin [15] then described the GD eXtended (GDX) strategy. Both MGD and GDX were each claimed by their IBM authors to be the strongest-known public-domain trading strategies at the times of their publication.

Subsequently, Vytelingum's 2006 PhD thesis introduced the Adaptive Aggressive (AA) strategy, which again used ML techniques, and which in an Artificial Intelligence journal paper [17], and in later ICAART [6] and IJCAI [7] conference papers, was shown to be dominant over ZIP, GDX, and human traders. Thus far then, ZIP had been developed to improve on ZIC; ZIP had then been beaten by GDX; and AA had then beaten GDX, and hence AA held the title. In shorthand, we had AA>GDX>ZIP>ZIC.

In all of the studies discussed thus far, typically two or three different types of trading algorithm would be compared against each other on the basis of how much profit (or surplus, to use the economists' technical term) they extract from the market, so Algorithm A was said to dominate or outperform or beat or be stronger than Algorithm B if, over some number of market sessions, traders running A made more profit than traders running B. Methods of comparison varied. Sometimes a particular market set-up (i.e., a specific number of sellers, number of buyers, and their associated limit-price assignments specifying the market's supply and demand schedules) would be homogeneously populated with traders of type A, and then the same market would be re-run with all traders instead being type B, and an A-vs-B comparison of profitability in the absence of any other trading algorithms could then be made. In another design of experiment, baseline results would first be generated from a market populated homogeneously by A-type traders, and then the same market experiment would be run with a single B-type trader replacing one of the A-type traders: these one-in-many (OIM) tests explored the attractiveness or not of traders 'mutating' or 'defecting' from using Algorithm A as their trading strategy and switching to Algorithm B. In another experiment design, for a market with D demand-side traders (buyers) and S supply-side traders (sellers), D/2 of the buyers would use Algorithm A and the remaining D/2 buyers would run Algorithm B, with the seller population being similarly split, and the A/B comparison then showed profitability in the presence of the other trading algorithm. A/B tests involving 50:50 splits, as just described, were commonly used to establish the dominance relationship between A and B. When each A-type buyer has a buy-order assignment that is also assigned to a matching B-type buyer (i.e., one A-type and one B-type buyer, each with the same limit-order assignment), and assignments are similarly "balanced" for the sellers, then the experiment design is known as a balanced-group (BG) test. The OIM and BG experiment designs were introduced by the IBM researchers in their work on MGD and GDX [14, 15], who proposed that BG tests are the fairest way of comparing two trading algorithms. Note that, for a market with N traders in it, BG tests involve an (N/2):(N/2) ratio of Algorithm A to Algorithm B, while OIM tests use (N-1):1.

The significance of the ratio of the different trading algorithms in a test was first highlighted in Vach's 2015 master's thesis [16], which presented results from experiments with the OpEx market simulator [6], in which AA, GDX, and ZIP were set to compete against one another, and in which the dominance of AA was questioned: Vach's results indicate that whether AA dominates or not can in fact be dependent on the ratio of AA:GDX:ZIP in the experiment; for some ratios, Vach found AA to dominate; for other ratios, it was GDX. Vach studied only a very small sample from the space of possible ratios, but his results prompted Cliff [3] to use the public-domain BSE financial exchange simulator [1] to exhaustively run through a wide range of differing ratios of four trading strategies (AA, ZIC, ZIP, and the minimally simple SHVR built into BSE, explained further in Section III), doing a brute-force search for situations in which AA is outperformed by the other strategies. Cliff [3] reported on results from over 3.4 million individual simulations of market sessions, which collectively indicated that Vach's observation was correct: whether AA dominates does indeed depend on how many other AA traders are in the market, and what mix of what other strategies are also present. Depending on the ratio, AA could be outperformed by ZIP and by SHVR. Subsequent research by Snashall and Cliff [13] employed the same exhaustive testing method, using a supercomputer to run more than one million market simulations (all in BSE) to exhaustively test AA against IBM's GDX strategy: this again revealed that AA does not always dominate GDX.

In this paper we will talk about counting the number of "wins" when comparing an A algorithm to a B algorithm: in the experiments reported in Section IV, we create a specific market set-up, and then run some number n of independent and identically distributed market sessions with a specific ratio A:B of the two strategies among the buyers, and the same A:B ratio in the sellers. In any one of those sessions, if the average profit per trader (APPT) of type A traders is higher than the APPT for traders of type B, then we count that session as a "win" for A; and vice versa as a win for B.
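The winner-determination logic is trivial to state. As a minimal sketch (our own illustration, assuming each session yields a list of per-trader profits for each strategy):

```python
def session_winner(profits_a, profits_b):
    """Given per-trader profit lists for strategies A and B in one market
    session, return the session winner by average profit per trader (APPT)."""
    appt_a = sum(profits_a) / len(profits_a)
    appt_b = sum(profits_b) / len(profits_b)
    if appt_a > appt_b:
        return 'A'
    if appt_b > appt_a:
        return 'B'
    return 'tie'
```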
Our preliminary experimental results reported in [11], and the much more extensive results reported here, are motivated by and extend this progression of past research. In particular, we noted that Vach's results, which first revealed that the ratio of different trading algorithms could affect the dominance hierarchy, came from experiments he ran using the OpEx market simulator [6], which is a true parallel asynchronous distributed system: OpEx involves a number of individual trader computers (discrete laptop PCs) communicating over a local-area network with a central exchange-server (a desktop PC). But many of the other results that we have just summarized came from financial-market simulators that only very roughly approximated parallel execution: the C-language source-code for the discrete-event market simulator developed to test and compare ZIC with ZIP was published in [2]; the simulators used in [14, 15] and [17] take essentially the same approach as that; and the public-domain BSE simulator [1] also uses the same very simple time-sliced approach. In all of these simulators, if any one trader is called upon to issue a response to a change in the market, it always does so within exactly one simulated time-slice within the simulation, regardless of how much computation it actually has to execute to generate that response: that is, all other activity in the market is temporarily paused, giving the currently-active trader as long as it requires to compute a response: in this sense, this style of simulator is sequential and synchronous (S&S), because even with fine time-slicing fundamentally only one trader is ever computationally active at any one time, and the computations of all other traders are paused while that one trader receives the attention of the simulator. Such S&S simulations have the net effect of treating each trader-agent's reaction-time as being so small as to be irrelevant. This approach might be defensible in human-vs-robot trading experiments, because even the slowest of these trading algorithms can compute a response substantially faster than a human can reasonably react; but if most of the traders in the market are robots, then their comparative execution times can matter a lot: if Robot A reacts faster than Robot B to some market event, then A's reaction to that market event may itself materially change the market, forcing B to re-compute its response. In this way, being faster can often be more profitable than being smarter, but an S&S simulator ignores this.
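The distinction can be sketched in a few lines of Python (an illustrative toy, not the actual BSE or TBSE code; the trader and exchange objects and their methods here — respond(), lob(), process(), session_over() — are hypothetical): an S&S simulator polls traders one at a time inside the market loop, whereas a P&A simulator gives each trader its own thread, so one trader's slow deliberation no longer pauses the rest of the market.

```python
import threading

def run_sequential_synchronous(traders, exchange):
    # S&S: one trader at a time responds to the market while all others
    # are paused, so each trader's reaction-time is effectively zero.
    while not exchange.session_over():
        for trader in traders:
            order = trader.respond(exchange.lob())  # others wait, however slow
            exchange.process(order)

def run_parallel_asynchronous(traders, exchange):
    # P&A: each trader runs on its own thread, so a slow thinker can be
    # beaten to a transactable price by a faster one.
    def trade_loop(trader):
        while not exchange.session_over():
            order = trader.respond(exchange.lob())
            exchange.process(order)  # exchange must be thread-safe here
    threads = [threading.Thread(target=trade_loop, args=(t,)) for t in traders]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```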
Snashall & Cliff [13] used the BSE S&S simulator to profile the reaction-times of various trading algorithms and found that they varied quite widely, indicating that a proper comparative study would require a truly parallel and asynchronous (P&A) simulation test-bed, rather than an S&S one. In a P&A simulator, each robot trader is running on its own processing thread (i.e., the traders are simulated in parallel), and the computations of any one trader do not cause any other trader's computations to go into a pause-state (i.e., the traders' operations are asynchronous). In a P&A simulator, the slower traders might be expected to do much less well than when they are evaluated or compared in a temporally simplistic S&S simulation. That is what we set out to explore in the work reported here. In order to do that, one of us (Rollins) developed TBSE, a new Threaded (i.e., P&A) version of the BSE financial-market simulator. TBSE is described in full in [10], and is available on the GitHub open-source code repository as free-to-use public-domain software. The results from our experiments, involving more than one million individual market trials, are presented in Section IV. Before that, Section III describes our methods in more detail.
III. METHODS

A. The BSE/TBSE market-simulator test-bed

The first version of the BSE simulator was released on GitHub as free-to-use public-domain software in October 2012, written in Python. Initially developed for teaching issues in automated trading on masters-level courses at the University of Bristol, BSE has since come to be used as a reliable platform for research in automated trading: all traders in a BSE market are robot traders; the current BSE source code repository includes code for ZIC, ZIP, AA and GDX, and two additional trading algorithms called SHVR and GVWY which are described in Section III.B. BSE is written using object-oriented programming, which makes it very easy for a user of BSE to add other trading algorithms, or to edit the existing ones.

As in Vernon Smith's original EE experiments, BSE allows one set of traders to be identified as buyers, and another set of traders to be identified as sellers. Any trader can be defined to run any of the trading algorithms available within BSE. Traders are issued with assignments to buy or sell via a function which can be easily switched between issuing all traders fresh assignments at the same instant (i.e., synchronous updating of assignments), or drip-feeding fresh assignments into the market over the course of a market session with the assignment inter-arrival time being either regular or stochastic. Limit prices on the individual assignments (which determine the overall market's supply and demand curves, and hence its competitive equilibrium price P0) can similarly be set in some regular pattern or can be randomly generated from a specified distribution. For any particular supply and demand schedule, BSE also allows a dynamically varying offset function to be added to all limit-prices at the same time, where the amount added to (or subtracted from) each limit-price is a function of time: in this way the relative relationship between the supply and demand curves can be maintained, while the value of P0 can vary dynamically: for example, in [3], the offset function was based on time-series of real-world asset-prices, so that the BSE P0 value varied over time in the same way as the original real-world asset-price. In the experiments reported here, we study the effect of varying between a null offset function (i.e. the supply and demand schedule is static for the duration of each market session, a commonplace style of experiment in all the papers reviewed in Section II.B), which we refer to as Static P0 (results given in Table I and Fig. 2); and using the sinusoidal dynamic offset function illustrated in Fig.4.8 of the BSE User Guide [1], which we refer to as Dynamic P0 (results in Table II and Fig. 3).
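For illustration, an offset function of this general kind might be sketched as below (our own example, with assumed amplitude and period; the definitive sinusoidal offset is the one defined in the BSE User Guide [1]). Every limit price in the supply and demand schedules has offset(t) added to it, so the whole pair of curves, and hence P0, drifts up and down over the session while their relative shapes stay fixed:

```python
import math

def offset_sinusoid(t, amplitude=40, period=300):
    """Time-varying offset added to every assignment limit-price:
    a sine wave, so P0 oscillates around its baseline value."""
    return int(round(amplitude * math.sin(2 * math.pi * t / period)))

def offset_null(t):
    """Null offset: supply and demand, and hence P0, stay static."""
    return 0
```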
BSE implements a single-asset financial exchange running a CDA, and like most real-world financial exchanges it maintains a Limit Order Book (LOB) as the data-structure at the core of its operations. Traders in BSE issue limit orders, i.e. an indication of a willingness to buy or to sell: a bid limit order indicates an intention to buy a specified quantity of the asset at a price no greater than the price indicated on the order; an ask indicates a desire to sell a specified quantity at a price no lower than the price indicated on the order. Traders issue limit orders to the BSE exchange, which sorts them into the array of currently active bids, ordered best-first by price (i.e., prices sorted from highest to lowest), and the array of currently active asks, again ordered best-first by price (i.e. prices sorted from lowest to highest). The total quantity currently available at each price is also displayed, although the identities of the individual orders that collectively make up that quantity are not: in this way the exchange acts as an aggregator and anonymizer of the set of currently active limit orders. The LOB, i.e. the list of prices and quantities on the bid-side and on the ask-side, is published by the exchange to all traders in the market. At any one time the top of the LOB shows the quantity and price of the best bid, and the quantity and price of the best ask. The difference between the prices of the best bid and the best ask is known as the spread, and the arithmetic mean of the two best prices is the market's current mid-price – a common single-valued summary of current market price on a LOB.
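As a toy illustration of these LOB summary statistics (our own sketch, not the BSE data structures), with each side of the book represented as an aggregated, anonymized mapping from price to total quantity:

```python
def lob_summary(bids, asks):
    """Toy LOB summary: bids/asks are {price: total_quantity} dicts."""
    best_bid = max(bids) if bids else None
    best_ask = min(asks) if asks else None
    spread = mid = None
    if best_bid is not None and best_ask is not None:
        spread = best_ask - best_bid            # best-ask minus best-bid
        mid = (best_bid + best_ask) / 2         # arithmetic mean of the two
    return {'best_bid': best_bid, 'best_ask': best_ask,
            'spread': spread, 'mid': mid}

# e.g. lob_summary({98: 3, 97: 5}, {101: 2, 103: 4})
# -> {'best_bid': 98, 'best_ask': 101, 'spread': 3, 'mid': 99.5}
```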
If a trader wishes to transact at the current best price on the counterparty side of the LOB, it does so by issuing a limit order that crosses the spread, i.e. a bid limit order with a price higher than the current best ask (referred to as lifting the ask), or an ask limit order with a price lower than the current best bid (referred to as hitting the bid). When this happens, the quantity indicated by the spread-crossing order is consumed from the top of the LOB, and the price of the transaction is whatever best price was showing at the top of the LOB at the point that the spread-crossing order was issued. Prices in BSE are integers, representing multiples of the exchange's tick-size: if the tick-size is set to be $0.01 then a price of 100 in BSE represents $1.
TBSE copies BSE in all of the above regards, except that where BSE is a fine-timesliced S&S implementation of a LOB-based financial exchange, TBSE is a multi-threaded P&A implementation.

B. Trader-agents Shaver (SHVR) and Giveaway (GVWY)

When the first version of the BSE simulator was released in 2012, the release included Python code for ZIC and ZIP, and also two additional minimally-simple built-in trading algorithms: Shaver (referred to by the ticker-style abbreviation SHVR); and Giveaway (GVWY). These were introduced initially just for illustrative purposes: SHVR is a light-hearted approximation to a high-frequency trading (HFT) algorithm, and GVWY was intended as nothing more than a stub, a bare-bones trading algorithm that other users could edit and extend. Both SHVR and GVWY are expressible in fewer than 10 core lines of Python code. SHVR's strategy is simple: it looks at the best price on its side of the LOB, compares it to the limit-price on its current assignment, and if it is able to then it shaves one tick off the best price (i.e. increasing the best bid by one tick if it is a buyer, or decreasing the best ask by one tick if it is a seller). If the shaved price would be the wrong side of the SHVR's limit price, then it does nothing. And that's it.

GVWY's strategy is even simpler. It takes the limit-price on its current assignment, and uses that as the price of its quote. It does not look at any LOB data. It only ever quotes its current assignment price. And, because the profit assigned to a trader for executing an assignment is the difference between the assignment's limit-price and the price of the transaction on that assignment, it would appear that GVWY sets out to make zero profit. But, as we will show in Section IV, and explain in Section V, in practice GVWY can be surprisingly profitable, and in our results presented here it out-performs both GDX and ZIP, casting some doubt on the actual value of those strategies.
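Minimal Python sketches of the two strategies, paraphrasing the behavior described above (our own paraphrase; the definitive versions are in the BSE repository [1]):

```python
def shvr_quote(is_buyer, limit_price, best_bid, best_ask, tick=1):
    """SHVR: shave one tick off the best price on our side of the LOB,
    unless the shaved price would be the wrong side of our limit price."""
    if is_buyer and best_bid is not None:
        price = best_bid + tick
        return price if price <= limit_price else None
    if (not is_buyer) and best_ask is not None:
        price = best_ask - tick
        return price if price >= limit_price else None
    return None  # nothing to shave against: do nothing

def gvwy_quote(is_buyer, limit_price):
    """GVWY: quote the assignment's own limit price, ignoring the LOB."""
    return limit_price
```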
C. Experiment design

In this paper, as in [11], we report results from experiments in which the number of buyers NB is the same as the number of sellers NS, and the supply and demand curves are randomly generated but approximately symmetric (i.e., having gradients roughly equal in magnitude but opposite in sign). We use NB = NS = 20, and run experiments at all possible A:B ratios from 1:19 through 10:10 to 19:1 – this gives us 19 different ratios to study for any one A:B pair. At any given ratio, we run 1000 independent and identically distributed market sessions. A single market session is a simulated version of one of Smith's CDA experiments: a clock is set to zero and starts running; traders are issued assignments according to the supply and demand functions programmed for this experiment; as time progresses traders can issue orders that are either entered on the LOB or that cross the spread and create a transaction; the BSE exchange distributes the updated LOB to each trader after any change to the LOB, and all traders react to each change in the LOB according to whatever trading algorithm they are running. Eventually the time on the simulated clock reaches the designated end-time for the session, and the session ends – at which point the average profit per trader (APPT) can be calculated for algorithm A and algorithm B, and then the "winner" of the session is the algorithm with the higher APPT.

Simulating 1000 such sessions for each of 19 ratios gives us 19,000 sessions per test of an A:B pair. Each such pair is tested in BSE and then in TBSE with static P0, and then each such pair is tested again, in BSE and then in TBSE, with dynamic P0, meaning that we conduct a total of 4x19,000=76,000 sessions per A/B pair; and with 15 such pairs to test this gives a total of 15x76,000=1,140,000 market sessions to be simulated.
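The bookkeeping for this experiment grid is easy to verify in a few lines (a sketch of the counting only, with the actual BSE/TBSE market-session runs elided):

```python
from itertools import combinations

STRATEGIES = ['AA', 'GDX', 'GVWY', 'SHVR', 'ZIC', 'ZIP']
N_TRADERS = 20      # buyers per side (and, separately, 20 sellers)
N_SESSIONS = 1000   # i.i.d. market sessions per ratio

def count_sessions():
    total = 0
    for a, b in combinations(STRATEGIES, 2):       # 15 distinct A/B pairs
        for simulator in ('BSE', 'TBSE'):          # 2 simulators
            for p0 in ('static', 'dynamic'):       # 2 P0 treatments
                for n_a in range(1, N_TRADERS):    # 19 ratios: 1:19 .. 19:1
                    total += N_SESSIONS
    return total

print(count_sessions())  # 15 * 2 * 2 * 19 * 1000 = 1,140,000
```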
D. Presentation of results

The experiment design laid out in the previous section requires us to conduct simulations of 1,140,000 individual market sessions. As we showed in [11] there is often interesting structure in the A-vs-B results when we sweep along the range of A:B ratios from one extreme to the other, but in this paper space limitations require us to skip over that structure and instead present only aggregate results, which roll up the total number of wins for A and the total number of wins for B across all the different ratios studied. As we have six trading strategies (AA, GDX, GVWY, SHVR, ZIC, and ZIP) there are 15 distinct A/B pairs to run comparative tests on. Hence for each experiment "treatment" of the equilibrium price P0 (static vs dynamic) we show a 15-row table with each row showing the win-counts for A and for B in BSE and then again in TBSE. After that, we summarise the results in a network such as Fig.1.

IV. RESULTS

Table I shows data summarising the results from running the complete set of A-vs-B tests on the six trading algorithms, in BSE and in TBSE, with a static value of the competitive equilibrium price P0. Table I summarises the results from 570,000 individual market sessions. On each row of the table, the trading algorithms used as A and as B are named in the two left-most columns; then the two central columns show the number of wins for A and for B in BSE, with bold font highlighting the result from the dominant strategy (the larger of the two win-counts); and then the two rightmost columns show the win-counts for A and for B in TBSE – with bold font again highlighting the dominant result, and with underlining and italic font used to highlight those results where the dominance relationship in BSE is inverted after switching to TBSE. Fig. 2 qualitatively summarises the results of Table I, using the graphical format established in Fig. 1.

Table II then shows data summarising the results from running the complete set of A-vs-B tests on the six trading algorithms, in BSE and in TBSE, but with the dynamically varying P0 value: the format for this table is the same as that used in Table I, and again the total number of market sessions reported in Table II is 570,000, taking the total number of sessions reported in this paper to 1,140,000.

The results shown in Table II are qualitatively illustrated in Figure 3: this figure again uses the convention of pale gray being used to color arrows in the TBSE network that are unchanged from the BSE network, and it introduces an additional convention of using dashed lines to highlight dominance relationships that have changed within the relevant network as the P0 treatment switches from static to dynamic.

Fig. 2. Fully-connected pairwise dominance graph for the six trading strategies AA, GDX, GVWY, SHVR, ZIC, and ZIP, in BSE (left) and TBSE (right), with a static value for the competitive equilibrium price P0 (i.e., quantitative data shown in Table I). Format is as for Figure 1. In BSE, it is notable that AA is a source node, in that it dominates all five other nodes (indegree=0); and ZIC is a sink node (outdegree=0), in that it is dominated by all five other nodes. In moving from BSE to TBSE, three of the dominance hierarchies reverse (as indicated by solid-black arrows). See text for further discussion.

TABLE II. RESULTS FROM PAIRWISE CONTESTS WITH DYNAMIC P0

V. DISCUSSION

The results for BSE with static P0 are the closest we come here to replicating prior methods from the literature reviewed in Section II. Our results, illustrated in Fig.2a, confirm the AA>GDX>ZIP>ZIC dominance hierarchy, and the "source" nature of the AA node (zero indegree, maximal outdegree) confirms AA as the dominant algorithm, when evaluated using traditional methods. Furthermore, if the only zero-intelligence trader under consideration is ZIC, then the fact that the ZIC node in the BSE graph of Fig.2 is a "sink" node (i.e., maximal indegree; zero outdegree) is also no surprise. But a strange feature of these BSE results is that neither GVWY nor SHVR fares as badly as ZIC: GVWY dominates not only ZIC but also ZIP; and SHVR dominates all other strategies apart from AA. To some extent this can be explained by the fact that SHVR is essentially parasitic, relying on other traders to set a price and then merely improving on that price by a single tick: in any pairwise A-vs-B comparison involving SHVR as A and some other algorithm as B, unless B can jump ahead of SHVR in the race to a transactable price, SHVR will very often get the better of B. If you only test algorithms in a temporally simplistic (S&S) style of simulator such as BSE (which is very close to those used in the prior papers reviewed in Section II) then you might form the impression that SHVR is worth exploring further.

However, as Fig.2b makes clear, as soon as the market simulator is switched to the more realistic (P&A) style used in TBSE, three of the dominance relationships reverse. Now the AI/ML-based traders ZIP and GDX each dominate only two other algorithms – they both dominate ZIC, and additionally GDX now dominates SHVR, while ZIP now dominates GDX. If we rank-order the algorithms on the basis of number of algorithms dominated (i.e., outdegree of their node on the graph) we have AA top-ranked with 5; SHVR and GVWY ranked second with 3 each; then ZIP and GDX with 2 each; and then ZIC with none. The fact that GDX, claimed at the time of its introduction as the best-performing public-domain trading algorithm, can be dominated in TBSE by both SHVR and GVWY is surely a sign that the original evaluations of GDX, and in particular its original market-simulator test-bed, left quite a lot to be desired.
Fig. 3. Fully-connected pairwise dominance graph for the six trading strategies AA, GDX, GVWY, SHVR, ZIC, and ZIP, in BSE (left) and TBSE (right), with dynamically varying P0 (i.e., quantitative data shown in Table II). Format is as for Figure 2, with the additional convention that a dashed arrow in the BSE graph in this figure is used to highlight a dominance relationship that differs in direction from the BSE graph in Figure 2; and similarly a dashed arrow in the TBSE graph here is used to highlight a difference in direction of dominance when compared to the TBSE graph in Figure 2. As in Figure 2, here the BSE graph shows AA as a source and ZIC as a sink; but in TBSE seven dominance relationships have reversed, and the graph has no pure sources or sinks.

If we now turn our attention to Fig.3, in which the market's competitive equilibrium price P0 is not constant (as was often the case in the original evaluations and comparisons) but is instead dynamically varying, we see in the BSE graph of Fig.3a that the move to dynamic P0 affects only one dominance relationship (highlighted by the dashed-line arrow) when compared to the corresponding graph in Fig.2a: GDX loses its ability to dominate GVWY. However, when we examine Fig.3b we see that the switch to TBSE makes a much bigger difference: seven dominance relationships (highlighted by black-shaded arrows) invert relative to the BSE graph of Fig.3a, and four of those (highlighted by dashed-line arrows) are changed from the TBSE graph of Fig.2b. Of the four sets of results we report here, the dynamic-P0 TBSE results of Fig.3b are the closest to a real-world scenario: the equilibrium price is constantly shifting, and the traders are operating in parallel and asynchronously; and, under those more realistic conditions, there are no "source" or "sink" nodes on the dominance graph and the ranking of trading algorithms by number-of-other-algorithms-dominated divides the six into two classes: the top-ranked class contains AA, GVWY, and ZIP, which each dominate three other algorithms; and the bottom-ranked class, each dominating two other algorithms, are SHVR, ZIC, and GDX, the algorithm described by IBM at the time of its introduction as the best-performing algorithmic trading system in the published literature.

The strong performance of GVWY may appear to be counterintuitive, but it is a simple consequence of how traders in any CDA-based financial exchange interact with the LOB. Because the arrival of a spread-crossing bid(ask) will lift(hit) the best ask(bid) on the counterparty side of the LOB, and the transaction then goes through at that previously-posted best price, GVWY's disregard for quoting profit-making prices means that while it will never make any profit from a limit order it quotes that does actually make it onto the LOB, very many of a GVWY's quotes will cross the spread and execute at a price better than that GVWY's quoted limit-price, thereby generating a profit for the GVWY despite the fact that the GVWY's quoted price would have offered it no profit if the transaction had gone through at that price. GVWY consistently makes money because of the way the LOB works. As it does not even involve the calculation of a randomly-chosen price, it is simpler even than ZIC, but its simplicity allows it to do well because (unlike ZIC) it will never miss the opportunity to cross the spread, if it is able to do so.
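A toy worked example (our own numbers) makes the mechanism concrete: a GVWY buyer holding an assignment with limit price 100 quotes a bid at 100; if the best ask on the LOB is 95, that bid crosses the spread and lifts the ask, so the trade executes at the previously-posted 95 and the buyer banks a profit of 100 - 95 = 5, even though its own quoted price would have yielded zero profit.

```python
def gvwy_buyer_outcome(limit_price, best_ask):
    """Toy example of why GVWY profits: a spread-crossing bid executes
    at the previously-posted best ask, not at GVWY's quoted limit price."""
    if best_ask is not None and limit_price >= best_ask:
        transaction_price = best_ask             # trade at the posted ask
        return limit_price - transaction_price   # profit, e.g. 100 - 95 = 5
    return 0  # quote rests on the LOB at the limit price: zero profit if hit

print(gvwy_buyer_outcome(100, 95))  # -> 5
```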
VI. CONCLUSION

In this paper we have used the kind of comparative experimental study that is commonplace in the published literature on trading algorithms, but we have deployed resources at much greater scale than has been conventional in the past. Our results are surprising, in that they show that however interesting ZIP and GDX were at the time of their introduction, these two algorithms can be consistently outperformed by the much simpler GVWY, when the comparison takes place in a contemporary parallel and asynchronous simulator, across all possible ratios of the two trading strategies within the population of traders, and when the market session involves a dynamically varying equilibrium price. Although prior publications by other authors had presented comparisons for specific pairs (or triples) of trading algorithms, they had very often drawn conclusions from sample-sizes that were no larger than a few tens of thousands of market sessions, using supply and demand curves that gave a rarely-changing equilibrium price, and studying only one or two ratios of the traders under comparison (e.g. via one-in-many and/or balanced-group tests only). To some extent, this is probably because at the time those studies were conducted, the compute-power needed to conduct studies with sample-sizes in excess of one million would have required either a lot of money (to spend on supercomputer resources) or a lot of time (while waiting for the large number of samples, of individual simulated market sessions, to complete). While lack of funds can be a tricky problem to get around, lack of time requires only patience: in principle the experiments described here could have been run 15 or more years ago; they might have locked up a few high-end PCs for some number of weeks or months, and we can only speculate that such a wait is much longer than would have been considered as acceptable at that time, by the norms of the research community. So people ran faster experiments, and reported less reliable results, results that we have now demonstrated here to be, bluntly, just wrong.

But a past lack of patience for waiting while long simulation runs conclude is only part of the problem: another manifest issue is the clear communal commitment to repeatedly re-using the experimental methods of prior researchers, traditions established long ago. If we start with the 2008 Artificial Intelligence paper on AA, we can see how those experiments were influenced by the 1997 paper introducing ZIP, how that ZIP paper took methodological influence from the 1993 paper introducing ZIC, and how that ZIC paper used methods commonplace in experimental economics research work stretching back all the way to Smith's 1962 JPE paper. Each paper used the experimental procedures and methods of one or more previous papers, which is entirely commonplace in science (we all stand on the shoulders of giants), but which led to the curious situation that the AA work published in the first decade of the 21st Century was methodologically almost a carbon-copy of Smith's seminal 1962 paper. Our work here is an attempt at nudging the field more firmly into using present-day computational methods and resources; the Python TBSE source-code used to generate our results in this paper is now freely available on GitHub, allowing others to replicate and extend the work that we have presented here; we look forward to seeing what comes next.
REFERENCES

[1] "BSE: The Bristol Stock Exchange". Open-source GitHub repository at https://github.com/davecliff/BristolStockExchange, 2012.
[2] D. Cliff, "Minimal-intelligence agents for bargaining behaviours in market-based environments". HP Labs Tech. Report HPL-97-91, 1997.
[3] D. Cliff, "Exhaustive testing of trader-agents in realistically dynamic CDA markets". In Proceedings ICAART-2019, 2019.
[4] D. Cliff, "Simulation-based evaluation of automated trading strategies: a manifesto for modern methods". Proceedings EMSS-2019, 2019.
[5] R. Das, J. Hanson, J. Kephart, and G. Tesauro, "Agent-human interactions in the continuous double auction". Proceedings IJCAI-2001, pp.1169-1176, 2001.
[6] M. De Luca and D. Cliff, "Agent-human interactions in the continuous double auction, redux". Proceedings ICAART-2011, 2011.
[7] M. De Luca and D. Cliff, "Human-agent auction interactions: adaptive-aggressive agents dominate". Proceedings IJCAI-2011, 2011.
[8] S. Gjerstad and J. Dickhaut, "Price formation in continuous double auctions". Games & Economic Behavior, 22(1):1-29, 1998.
[9] D. Gode and S. Sunder, "Allocative efficiency of markets with zero-intelligence traders". Journal of Political Economy, 101(1):119-137, 1993.
[10] M. Rollins, Studies of Response-Time Issues in Popular Trading Strategies via a Multithreaded Exchange Simulator. Master's Thesis, University of Bristol, 2020.
[11] M. Rollins and D. Cliff, "Which trading agent is best? Using a threaded parallel simulation of a financial market changes the pecking order". Proceedings of the 32nd EMSS, Athens, Greece, in press 2020.
[12] V. Smith, "An experimental study of competitive market behavior". Journal of Political Economy, 70(2):111-137, 1962.
[13] D. Snashall and D. Cliff, "Adaptive-aggressive traders don't dominate". In van den Herik, J., Rocha, A., Steels, L. (eds), Agents & Artificial Intelligence: Papers from ICAART 2019. Springer, 2019.
[14] G. Tesauro and R. Das, "High-performance bidding agents for the CDA". 3rd ACM Conference on E-Commerce, pp.206-209, 2001.
[15] G. Tesauro and J. Bredin, "Sequential strategic bidding in auctions using dynamic programming". Proceedings AAMAS-2002, 2002.
[16] D. Vach, Comparison of Double Auction Bidding Strategies for Trading Agents. MSc Thesis, Charles University in Prague, 2015.
[17] P. Vytelingum, D. Cliff, and N. Jennings, "Strategic bidding in continuous double auctions". Artificial Intelligence, 172(14):1700-1729, 2008.