
A Stochastic LQR Model for Child Order Placement in Algorithmic Trading *
Jackie Jianhong Shen

Financial Services
New York City, USA

March 25, 2020

Abstract
Modern Algorithmic Trading (“Algo”) allows institutional investors and traders to liquidate
or establish big security positions in a fully automated or low-touch manner. Most existing
academic or industrial Algos focus on how to “slice” a big parent order into smaller child orders
over a given time horizon. Few models rigorously tackle the actual placement of these child
orders. Instead, placement is mostly done with a combination of empirical signals and heuristic
decision processes. A self-contained, realistic, and fully functional Child Order Placement (COP)
model may never exist due to all the inherent complexities, e.g., fragmentation due to multiple
venues, dynamics of limit order books, lit vs. dark liquidity, different trading sessions and
rules. In this paper, we propose a reductionist COP model that focuses exclusively on the
interplay between placing passive limit orders and sniping using aggressive takeout orders. The
dynamic programming model assumes the form of a stochastic linear-quadratic regulator (LQR)
and allows closed-form solutions under the backward Bellman equations. Explored in detail are
model assumptions and general settings, the choice of state and control variables and the cost
functions, and the derivation of the closed-form solutions.

Keywords: child order placement, dynamic programming, LQR, delay cost, spread cost,
impact cost, Poisson hits, passive, aggressive, Bellman equation, optimal policy, positive matrix.

Attention: The current work is designed to be exclusively published in the Social Sciences
Research Network (SSRN) preprint server. Its commercial or open-journal publication is
prohibited without the prior consent of the author. The current free style accommodates
informative cover pages and colored text boxes that help enhance the reading experience.

*
Dr. Jackie Shen has been the Global Project Leader for Algo Trading Model Risk at Goldman Sachs, New York,
USA. Previously, he had also been a quantitative strategist at the equities Algorithmic Trading desks at both J.P.
Morgan and Barclays, New York, USA. Email: [email protected]; URL: alum.mit.edu/www/jhshen

Electronic copy available at: https://ssrn.com/abstract=3574365


Contents
1 Introduction to Child Order Placement (COP)

2 Basic Model Settings and Scopes
2.1 From Macro to Micro
2.2 Reductionism and Value of the Current Work
2.3 The Placement Problem to be Modelled

3 The COP Model Based on Dynamic Programming
3.1 Model and Process Assumptions
3.2 State and Control Variables, and State Transition
3.3 The Stage Cost

4 Bellman Equations and Optimal Solutions
4.1 Value Functions V (x)
4.2 Solution at the Last Period
4.3 General Solution to the Stochastic LQR

5 Conclusion and Disclaimers



The following terms are frequently used throughout the paper.

Term                  Meaning
Algo                  an automated trading process based on algorithms
Profile               an intraday time series derived from history, e.g., for volumes
IS                    implementation shortfall - a popular optimization-based Algo
TWAP                  time-weighted average price - a standard profile-based Algo
VWAP                  volume-weighted average price - a standard profile-based Algo
COP                   child order placement
DP                    dynamic programming
LQR                   linear quadratic regulator with linear transitions & quadratic costs
(N)BBO                (national) best bid and offer of a venue or market system
LOB                   limit order book, with buy/sell orders displayed
SOR                   smart order router/routing
FX                    foreign exchange
ADV                   average daily volume
Passive Touch         best bid for buy and best offer for sell
Aggressive Touch      best offer for buy and best bid for sell
Near Touch            same as Passive Touch
Far Touch             same as Aggressive Touch
Takeout or Sniping    aggressive market order at the far touch
TMV                   true market value
Positive Matrix       a symmetric real matrix with positive eigenvalues

The following symbols are used consistently in the paper.

Symbol                Meaning
$t_n$                 a discrete action time in dynamic programming
$X_n^{\pm}$           the right/left limit of a quantity $X$ at $t_n$, via $t_n \pm \varepsilon$
$HS$                  half spread between the BBO
$Mid$                 mid price between the BBO
$N(\lambda)$          Poisson distribution with rate $\lambda$
$q_n$                 outstanding positions right before $t_n$; a state variable
$\lambda_n$           Poisson hitting rate right before $t_n$; a state variable
$u_n$                 aggressive market orders at $t_n$; the control variable
$\eta$                market impact of aggressive orders on passive fills
$\gamma_n$            delay cost penalty at $t_n$; also expressing risk aversion
$\mathbf{x}_n$        state vector right before $t_n$; $\mathbf{x}_n = (q_n, \lambda_n)$
$V_n(\mathbf{x}_n)$   the value function at $t_n$ in dynamic programming



1 Introduction to Child Order Placement (COP)
Small retail orders can be filled by simple vanilla market or limit orders. There is no need for
automated algorithmic trading (or “Algos”) involving sophisticated strategies. Algos are primarily
designed to execute sizable institutional orders from portfolio managers or traders in various funds
or broker-dealers. In the current work, the term “Algos” is restricted to the automated execution
service provided to either external or internal clients by the buy side, sell side, or specialized
execution agents.
A typical single-name Algo is stratified into at least two distinguishable layers, which we shall
refer to as the “macro” and “micro” layers or (sub-)Algos.
(A) At the macro layer, a big parent order is sliced into smaller child orders over a series of time
buckets, e.g., 5-minute intervals, which usually depend on the liquidity profiles of a security.
(B) The micro layer handles the execution of the resulting child orders. The actual implementation
and architecture can vary significantly among broker-dealers and execution agents, and
constitute the proprietary core of all Algos.
Most well-known Algos in the industry are named after the macro layers, e.g., TWAP, VWAP,
or Implementation Shortfall (IS). These macro Algos are either configured based on historical
benchmarks or optimized under proper utility objectives. The optimization techniques involve
either static or dynamic frameworks, as in these sample works [1, 3, 4, 6, 7, 9, 10, 11]. Overall, the
macro Algos are built upon the macro behaviors of the targeted securities, including for instance,
the historical profiles of volumes, volatilities, spreads, etc., as in our earlier works [9, 10, 11].
They do not act upon the real-time micro structure signals such as the dynamics of limit order
books (LOB).
The complexities of these Algos mainly reside within the micro layers. The actual real-time
placement of orders on various venues is implemented at this layer, and different Algo providers
may take very different approaches. For example, two offerings of the same VWAP Algo could
differ significantly in terms of architecture and logic.
In general, a micro Algo must handle actions like the following:
(a) dynamic decision flows for the placement actions and the monitoring of their status,
(b) allocation among different order types offered by all accessible venues, and
(c) real-time routing to all accessible liquidity venues, including both lit and dark venues.
In particular, the last component often assumes its own identity in most broker-dealer Algo offerings,
and is called the SOR - Smart Order Router. SORs are vital for some liquid asset classes with
highly fragmented markets, e.g., the common stocks in the USA.
In terms of modeling techniques, macro Algos are either configured or scheduled using benchmark
profiles such as TWAP or VWAP, or optimized using proper utility functions (e.g., the
mean-variance framework of Almgren-Chriss [2]; also see Shen et al. [9, 10, 11]). Modeling of micro
Algos is much more challenging due to the aforementioned multiple tasks. The main complexities
inherent to market microstructures include venues, sessions, order types, and their optimal
real-time management. SOR is part of this grand effort and is probably the component best known
and most actively marketed by Algo providers. But the SOR alone is only the last segment of the
placement stream, and is actually irrelevant for single-venue securities such as commodity futures
or some FX products.
In this paper, we attempt to develop a COP model for a single-venue security. Hence SOR is
out of the scope. Instead, the primary focus is on how to dynamically place and manage aggressive
market orders and passive limit orders. The two order types compete in the following manner.



(i) Aggressive market orders get filled fast and hence help accomplish order completion, but at
the cost of paying a half spread (with respect to the market mid price) and information leakage
that could harm trailing orders.
(ii) Passive limit orders gain a half spread if filled, but are subject to fill uncertainty and an extra
“chasing” cost if the market drifts away. In general, they jeopardize order completion.
A COP Algo in this context must optimally manage the interplay between the two order types, in
order to achieve the two primary objectives:
(a) lower total cost for filled orders, and
(b) a higher completion rate.
The current work proceeds as follows. After making some reductionist assumptions, we first
build the COP model based on stochastic dynamic programming (SDP). The solution to the resultant
stochastic linear-quadratic regulator (LQR) problem is then worked out via the associated
Bellman equations. Theoretical assumptions and practical implications are discussed in detail
along the way. The main result is summarized in Theorem 1 in Section 4.
We believe this is the first self-contained and mathematically rigorous COP model that has a
closed-form solution.

2 Basic Model Settings and Scopes


2.1 From Macro to Micro
Throughout the rest of the paper, in order to gain a more tangible sense of all the model settings,
we assume a concrete working parent order with the following attributes.
(i) It is a “buy” order for a common stock, say.
(ii) The total quantity is Q = 100,000 shares.
(iii) The average daily volume (ADV) is 2,000,000 shares, based on a monthly rolling window.
(iv) The client prefers the order to be completed during the horizon from 12:00 pm to 3:00 pm
EST in the US equity market (which is, however, consolidated into a single-venue market to
avoid SOR).
Of course none of these example order details actually puts restrictions on the proposed COP
model. For example, the model is applicable to other liquid asset classes such as futures and rates.
At the macro level, Algos such as TWAP, VWAP and IS “slice” the parent order into smaller
child orders over a series of time buckets. For illustration, assume that such an Algo works with
5-minute time buckets. Then the target 3-hour execution horizon requested by the client is split into
36 time buckets. Further assume that this macro Algo decides to allocate 2,500 shares or 25 lots
to the specific time bucket [1:00pm, 1:05pm]. In practice these time knots can all be randomized
for anti-gaming.
If the time buckets of the macro Algo are relatively big, e.g., 5 minutes, a micro-layer scheduler
can be further designed to schedule the allocated 2,500 shares, say, over finer micro bins. At
this layer, sophisticated optimization may be spared in order to save time. In general, an equal
partition, i.e., allocation based on TWAP, can be applied. Take the above working example for
instance. The 25 lots allocated for the macro time bucket [1:00pm, 1:05pm] can be further split
into 5 lots over each micro bin of 1 minute, i.e., [1:00pm, 1:01pm], [1:01pm, 1:02pm], etc. Again,
randomization of time knots should also be applied.
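As a rough illustration of the bucket and bin arithmetic above, the following Python sketch reproduces the numbers of the running example. The helper names `count_buckets` and `equal_split` are ours and purely hypothetical; the equal split is only the TWAP-style default mentioned in the text, and the anti-gaming randomization of the time knots is left out.

```python
def count_buckets(horizon_minutes, bucket_minutes):
    """Number of whole time buckets that fit into the execution horizon."""
    return horizon_minutes // bucket_minutes


def equal_split(total_lots, n_bins):
    """TWAP-style split: integer lots per bin, differing by at most one lot."""
    base, extra = divmod(total_lots, n_bins)
    # The first `extra` bins absorb the remainder, one lot each.
    return [base + 1 if i < extra else base for i in range(n_bins)]


# Running example: a 3-hour horizon with 5-minute macro buckets.
n_macro_buckets = count_buckets(3 * 60, 5)

# 25 lots allocated to one 5-minute bucket, spread over five 1-minute micro bins.
micro_lots = equal_split(25, 5)

print(n_macro_buckets, micro_lots)
```

When the lots do not divide evenly, this sketch front-loads the remainder; a production scheduler could just as well randomize which bins receive the extra lots.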
The introduction of finer micro time bins is necessary for most scenarios. This is because the
macro time buckets must last long enough so that bucket signals are statistically meaningful. Take
the VWAP or IS Algo for example. These macro Algos all depend on the temporal profiles of
security volumes, volatilities, or spreads (e.g., Shen et al. [9, 10, 11]). When the macro buckets are
too brief, the profile values based on historical averaging or filtering would be too noisy and result
in unreliable Algo scheduling at the macro layer. In general, bucket size should be optimized or
adapted to the liquidity profiles of a given security.
The current work takes a reductionist approach and attempts to develop a self-contained
rigorous COP model at the micro bin level. That is, the model handles the actual execution
of individual child (or grandchild) orders over micro bins, e.g., 5 lots over [1:00pm, 1:01pm] for
the above running example. It is a fully automated model based on the framework of stochastic
linear-quadratic regulators (LQR) and allows closed-form solutions.

2.2 Reductionism and Value of the Current Work


In our previous two works on macro Algos:
• Shen [9] on a generic pre-trade macro Algo based on static quadratic programming in Hilbert
spaces, and
• Shen [10] on a real-time adaptive macro Algo based on dynamic programming that integrates
the VWAP and IS Algos,
the proposed Algo models can actually be implemented in execution houses after proper model
calibration is performed using their proprietary trading data.
Hence it is important to point out at the outset that the current micro Algo is more a theoretical
model in the spirit of reductionism. As explained earlier, it is almost impossible to have a single
self-contained model to comprehensively handle the entire micro-layer execution. For instance, it
is nontrivial to handle unexpected intraday trading halts or participation in various auctions. The
reductionism of the proposed model mainly involves the following aspects.
(a) It is restricted to single-venue executions and does not involve SOR modeling.
(b) It deals with neither lit vs. dark venues nor complex order types (e.g., icebergs or mid-pegging).
(c) It only handles the two most basic order types - limit and market orders.
Notice that in professional trading one almost never sends a “naked” market order. Hence by
market orders we mean more precisely marketable limit orders whose limits cross the far touch.
Despite the reductionism, the value of the current work can be summarized as follows.
• To the academic community, to the best of our knowledge this is the first rigorous COP model at
the micro layer that is self-contained and allows closed-form solutions. It opens the door
to more sophisticated or realistic COP models in the future.
• To Algo practitioners on Wall Street, the model reveals the intriguing dependency
and competition among different key Algo components: aggressive sniping, passive waiting,
execution cost, information leakage, market impact, and the requirement of completion. The
modeling techniques here can be adapted to facilitate existing COP processes.



2.3 The Placement Problem to be Modelled
Recall from earlier in this section that, after macro bucketing and micro binning, one ends up with
a COP problem like the following:
to buy 5 lots over a micro bin [1:00pm, 1:01pm],
or more generally, to buy $q$ lots over a micro bin $[T_0, T_1]$ with a brief duration
$\Delta T = T_1 - T_0$ of 30 or 60 seconds or so.
The COP problem modelled herein is formulated as follows. A given micro bin $[T_0, T_1]$ is
partitioned by $N$ action times $t_0, \ldots, t_{N-1}$ and the terminal time $t_N$:
$$t_0 = T_0 < t_1 < \cdots < t_{N-1} < t_N = T_1.$$
In practice, one could choose periodic knots with a period of, say, $\tau = 5$ or 10 seconds, and
$$t_{n+1} = t_n + \tau, \quad n = 0, 1, \ldots, N-1.$$
Actual implementation could also have them randomized for anti-gaming. Since the micro bin
duration $\Delta T = T_1 - T_0 = t_N - t_0$ is brief, it is also assumed that the BBO, i.e., the best bid
and offer, remain unchanged over the given micro bin. (This is introduced merely for convenience.
In reality, the BBO can be allowed to change in the current model, as long as one assumes that the
benchmark market price is the moving mid price and that the limit order is cancelled
and replaced whenever the BBO move, so that it is effectively pegged to the BBO.)
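The knot construction above can be sketched in Python as follows. This is only an illustration under our own assumptions: the function name `action_times`, the uniform jitter, and the bound `jitter < tau / 2` (which keeps the knots strictly increasing) are not part of the model, and the bin length is assumed to be a multiple of the period.

```python
import random


def action_times(t_start, t_end, tau, jitter=0.0, seed=None):
    """Return knots t_0 < t_1 < ... < t_N covering [t_start, t_end].

    Knots are periodic with period tau (seconds). Interior knots may be
    shifted by a uniform jitter in [-jitter, +jitter] as a simple
    anti-gaming device; the end points T_0 and T_1 are never moved.
    """
    if not 0.0 <= jitter < tau / 2.0:
        raise ValueError("jitter must satisfy 0 <= jitter < tau / 2")
    rng = random.Random(seed)
    n_periods = int(round((t_end - t_start) / tau))
    knots = [t_start + k * tau for k in range(n_periods + 1)]
    knots[-1] = t_end  # pin the terminal knot t_N = T_1
    for k in range(1, n_periods):
        knots[k] += rng.uniform(-jitter, jitter)
    return knots


# A 60-second micro bin with tau = 10 seconds gives N = 6 periods.
regular_knots = action_times(0.0, 60.0, 10.0)
jittered_knots = action_times(0.0, 60.0, 10.0, jitter=2.0, seed=1)
```

Sniping is only possible at the first N knots; the last knot is the terminal time at which no action is taken.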
Following the running example introduced earlier, we shall always work with a buy order - to
buy $q_0$ lots over a micro bin $[T_0, T_1]$. For a buy order, we shall introduce the following concepts:
• the passive or near touch - the best bid of the venue, and
• the aggressive or far touch - the best offer of the venue.
For a sell order the roles are reversed. Furthermore, let $q_t$ denote the remaining lots right
before $t \in [t_0, t_N]$. To simplify notation, we introduce the following convention: for any observable,
variable or parameter $X$,
$$X_n := X_n^- = \lim_{\varepsilon \to 0^+} X_{t_n - \varepsilon}, \quad \text{and} \quad X_n^+ = \lim_{\varepsilon \to 0^+} X_{t_n + \varepsilon}.$$

For a continuous $X$ there is no difference among $X_n^-$, $X_{t_n}$, and $X_n^+$. For the proposed COP
model, a sniping action (or a control in the context of dynamic programming) will take place right
at a given action time $t_n$, and hence they could differ.

Attention - We have defaulted $X_n$ to $X_n^-$ to simplify notation and equation lines.

The proposed COP model adopts the following action plan.


(i) At any time $t$, as long as $q_t > 0$, a single-lot limit order will be placed at the passive touch.
(ii) Whenever such a limit order is filled at $t$ and the remainder $q_t^+ = q_t - 1$ is still positive, a new
single-lot limit order will be immediately placed at the passive touch.
(iii) At each action time $t_n$ from
$$t_0, \ldots, t_n, \ldots, t_{N-1},$$
if $q_n$ (which represents $q_n^-$) still has some positive lots to trade, the COP Algo has an option
to send a single-lot market order at the far touch. We shall nickname this action “single-lot
sniping” or simply “sniping.” The term “Sniper” or “Sniping” has been popularly used in the
Algo world, e.g., the Sniper Algo of Credit Suisse in this 2007 article of Reuters (with an
active URL link in PDF).



The three major characteristics of this target COP problem are order completion, passive
waiting, and aggressive sniping, which are elaborated as follows.

The Constraint on Completion Order completion is usually enforced at the macro layer. It
is either explicitly formulated into optimization as a constraint (e.g., for the IS Algo) or enforced
through hard scheduling (e.g., for TWAP/VWAP Algos). For micro-layer execution, e.g., executing
5 lots within a micro bin of 30 or 60 seconds, completion can be soft or delayed, in order to better
dance with the liquidity waves in the market. The unfilled lots can be handed over to the next micro
bin, and so forth. The macro Algo on the top usually deploys a dedicated schedule “keeper”
to enforce the schedules.

Passive Limit Order at the Near Touch Recall that for convenience, the BBO have been
assumed invariant over a brief micro bin $[T_0, T_1]$. Let
$$HS = \frac{1}{2}\,(\text{BestOfferPrice} - \text{BestBidPrice})$$
denote the half spread, and
$$Mid = \frac{1}{2}\,(\text{BestOfferPrice} + \text{BestBidPrice})$$
the unbiased mid price that represents the true market value (TMV) at the moment. Compared
with other more sophisticated averaging schemes, this is usually called the simple mid. Once filled,
a limit order saves a half spread HS compared with the TMV. However, if left unfilled by the target
end time, limit orders can jeopardize order completion. A reasonable COP model must be able to
reflect this tradeoff.
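The half spread and the simple mid are easy to compute, and the following Python sketch shows the cost of a passive versus an aggressive buy fill measured against the mid. The quotes (100.00 bid, 100.02 offer) and the function names are hypothetical and chosen for illustration only.

```python
def half_spread(best_bid, best_offer):
    """HS = (BestOfferPrice - BestBidPrice) / 2."""
    return 0.5 * (best_offer - best_bid)


def mid_price(best_bid, best_offer):
    """Simple mid = (BestOfferPrice + BestBidPrice) / 2."""
    return 0.5 * (best_offer + best_bid)


def buy_cost_vs_mid(fill_price, best_bid, best_offer):
    """Signed cost of a buy fill relative to the simple mid (negative = saving)."""
    return fill_price - mid_price(best_bid, best_offer)


best_bid, best_offer = 100.00, 100.02  # hypothetical quotes
hs = half_spread(best_bid, best_offer)

# A passive buy fills at the bid and saves about one half spread ...
passive_cost = buy_cost_vs_mid(best_bid, best_bid, best_offer)
# ... while an aggressive buy fills at the offer and pays about one half spread.
aggressive_cost = buy_cost_vs_mid(best_offer, best_bid, best_offer)
```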

Aggressive Sniping at the Far Touch Market or marketable orders take out liquidity from
the top of the opposite LOB, i.e., the far touch. At the micro layer, market orders are always
kept small, e.g., a couple of lots on average for major exchanges in the US. Hence the proposed model
always assumes that such small orders are filled instantaneously. Market orders fill fast and help
achieve completion, but at the cost of a half spread HS. Furthermore, aggressive market orders
could also leak information and result in opposite market participants or market makers biasing
their perceived TMV towards the aggressive touch. As a result, they become less willing to take
out “our” limit orders posted at the passive touch. A reasonable COP model must be able to
demonstrate such tradeoffs as well.

3 The COP Model Based on Dynamic Programming


3.1 Model and Process Assumptions
We now introduce the basic assumptions for the model and process.

Assumptions on Limit Order Placement For the placement of limit orders, the following
basic assumptions are made.
(A) The size of the limit order is always a single lot.
(B) A single-lot limit order is placed initially at t0 .



(C) Afterwards, whenever the limit order is taken out by an opposite market order at time $t$, a new
single-lot limit order is immediately replenished as long as the remaining position $q_t^+ = q_t - 1$
is still positive.
(D) For any time interval of duration $\Delta t$, as long as “we” do not snipe at the aggressive touch
within the interval, the number $W$ of single-lot limit orders being hit follows a Poisson
distribution $N(\lambda \Delta t)$ with some rate $\lambda$. That is,
$$\text{Prob}(W = n) = e^{-\lambda \Delta t}\, \frac{(\lambda \Delta t)^n}{n!}, \quad n = 0, 1, \ldots. \tag{1}$$
It is also assumed that once being hit the entire lot is taken. The Poisson distribution or process is
commonly used in the context of Algo trading and limit order books [8].
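To make assumption (D) concrete, the sketch below evaluates the Poisson probabilities of Eqn. (1) in Python. The rate of 5 lots per minute and the 10-second interval are illustrative values of our own choosing, as is the function name `poisson_pmf`.

```python
import math


def poisson_pmf(n, lam, dt):
    """Prob(W = n) for W ~ Poisson(lam * dt), as in Eqn. (1)."""
    mean = lam * dt
    return math.exp(-mean) * mean ** n / math.factorial(n)


lam = 5.0          # hypothetical hitting rate, in lots per minute
dt = 10.0 / 60.0   # a 10-second interval, expressed in minutes

prob_no_fill = poisson_pmf(0, lam, dt)   # chance the passive order is not hit
prob_at_least_one = 1.0 - prob_no_fill

# The mean number of passive fills equals lam * dt (series truncated at 50 terms).
expected_fills = sum(n * poisson_pmf(n, lam, dt) for n in range(50))
```

The rate and the interval must be expressed in the same time unit, here minutes.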

Assumptions on the Aggressive Market Orders For sniping using aggressive market orders,
e.g., sending a marketable order $u_n$ at time $t_n$ at the far touch, the following assumptions are made.
(a) The ideal goal is to set $u_n$ to either a single lot or zero (i.e., no sniping), so that information
leakage and market impact can be curbed. However, to facilitate a tractable dynamic programming
(DP) formulation with closed-form solutions, the single-lot constraint is not explicitly
imposed. The cost function designed later will naturally encourage $u_n$ to stay small.
(b) It is also assumed that once an aggressive market order $u_n$ is sent, the entire order will be
filled. Since the cost function in general keeps $u_n$ small, this assumption holds naturally for
most liquid securities.
(c) Since during the short duration of a micro bin, the BBO are assumed to be invariant, we adopt
the following model for information leakage caused by an aggressive sniping at time t.
Once un lots are taken out (by “us”) at the far touch, other market participants, including
in particular market makers, will update their belief on the TMV and bias it towards the far
touch. As a result,

either fewer opposite participants are willing to snipe at the passive touch or market makers
will also cancel their more passive inside limit orders and replace with new ones at the
passive touch.

The latter will congest the queue at the passive touch. Hence heuristically both will reduce the
chance of “our” limit orders being hit by the market.
Quantitatively, we assume that the Poisson hitting rate introduced in Eqn. (1) will be negatively
impacted in the following linear form:
$$\lambda_n^+ = \lambda_n - \eta u_n, \tag{2}$$
after $u_n$ lots are sniped and filled at time $t_n$. Here $\eta > 0$ is a model parameter that calibrates
the rate of information leakage or market impact caused by aggressive sniping. In general, it
should depend on the liquidity profile of a given security.
For instance, assume $\lambda_n = 5.0$ lots per minute, and $\eta = 0.5$ per lot per minute. Then a sniping
of $u_n = 2$ lots at some time $t_n$ will reduce the Poisson hitting rate according to:
$$\lambda_n^+ = \lambda_n - \eta u_n = 5.0 - 0.5 \times 2 = 4.0.$$

The impact parameter η can be calibrated or estimated using experimental orders that are
designed specifically for this purpose.



3.2 State and Control Variables, and State Transition
To develop the dynamic programming (DP) COP model, we first define the state variables and
their transition.
There are two state variables, $q_n$ and $\lambda_n$, organized into a state vector
$\mathbf{x}_n = (q_n, \lambda_n)^T$, where the superscript $T$ denotes transposition of vectors or matrices.
• State variable $q_n$ denotes the outstanding lots still to be traded right before $t_n$. The
symbol is equivalent to $q_n^-$ but with the minus superscript omitted, as set up earlier.
• State variable $\lambda_n$ denotes the Poisson hitting rate right before $t_n$. It represents the rate at
which the opposite aggressive orders hit “our” limit orders. It changes whenever “we” snipe a market
order $u_n$ at the far touch, due to information leakage and its digestion by other market
participants.
The control variable is the aggressive takeout order $u_n$ that “we” snipe at $t_n$ at the far touch.
Furthermore, let $W_n$ denote the Poisson random number of “our” limit orders being hit by the
opposite aggressive orders during $[t_n, t_{n+1})$. Then from $t_n$ to $t_{n+1}$ (or more precisely
$t_{n+1}^-$), the following state transition equations hold.

$$q_{n+1} = q_n - u_n - W_n, \tag{3}$$
$$\lambda_{n+1} = \lambda_n - \eta u_n. \tag{4}$$

In the vector form, define
$$\mathbf{a} = (1, \eta)^T, \quad \mathbf{z}_n = (W_n, 0)^T.$$
Then the state vector transits as follows:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{a} u_n - \mathbf{z}_n. \tag{5}$$
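One period of the state transition in Eqns. (3)-(5) can be simulated as in the following Python sketch. It is an illustration under our own assumptions: the Poisson draw uses Knuth's multiplication method (adequate for the small means arising over a few seconds), the names `sample_poisson` and `step` are hypothetical, and, as in the model itself, nothing prevents the remaining position from overshooting below zero.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with the given mean (Knuth's method)."""
    if mean <= 0.0:
        return 0
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def step(q, lam, u, eta, dt, rng):
    """Advance the state (q_n, lambda_n) to (q_{n+1}, lambda_{n+1}).

    The hitting rate is first reduced to lambda^+ = lambda - eta * u by the
    snipe; the passive fills W_n are then drawn from Poisson(lambda^+ * dt).
    """
    lam_plus = lam - eta * u              # Eqn. (4)
    w = sample_poisson(lam_plus * dt, rng)
    q_next = q - u - w                    # Eqn. (3)
    return q_next, lam_plus, w


rng = random.Random(0)
# 5 lots outstanding, 5 hits per minute, snipe 2 lots, 10-second period.
q_next, lam_next, passive_fills = step(5.0, 5.0, 2.0, 0.5, 10.0 / 60.0, rng)
```

With the values of the earlier example (a rate of 5.0, an impact of 0.5, and a snipe of 2 lots), the post-snipe rate returned by `step` is 4.0.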

3.3 The Stage Cost


All variables are assumed to be continuous, as normally done in the Algo literature. (In reality
trades are mostly in whole shares or lots.)
Following all the previous preparation, the COP problem is formulated as a DP problem with
controls taken at each of the following action times:

$$t_n \in \{t_0, t_1, \ldots, t_{N-1}\}.$$

No action is taken at the terminal time knot $t_N$. At each $t_n$, we first define the initial candidate
$j_n^{(1)}$ for the stage cost, which is still subject to revision later on:

$$j_n^{(1)}(u_n, W_n \mid \mathbf{x}_n) = \gamma_n q_n^2 + u_n - W_n. \tag{6}$$

It is explained as follows.
(a) The first term $\gamma_n q_n^2$ favors fast execution so that the $q_n$'s quickly touch down to zero. It appears
in earlier works for both static and dynamic macro Algos, e.g., Almgren-Chriss [2], Hora [6], and
Shen [9, 10], just to name a few. The penalty coefficients $\gamma_n$ are the control parameters that
penalize trading delays. In general, $\gamma_n$ can be set in proportion to the real-time variance $\sigma_n^2$
of the security, i.e., in the form of $\gamma_n = \tilde{\gamma}_n \sigma_n^2$. In addition, $\gamma_n$ should increase
monotonically in $n$ to facilitate order completion. For instance, $\gamma_N = +\infty$ would enforce hard
completion: $q_N = 0$.



(b) $W_n$ is the number of single-lot limit orders being taken out by opposite market orders at “our”
passive touch. Limit orders save a half spread $HS$. We incorporate the scaling of $HS$ into $\gamma_n$
so that $-HS \cdot W_n$ can be simplified to $-W_n$. Following the general Poisson setting in Eqn. (1),
we assume more specifically that $W_n$ follows the Poisson distribution $N(\lambda_n^+ \Delta t_n)$ with

$$\lambda_n^+ = \lambda_n - \eta u_n, \quad \text{and} \quad \Delta t_n = t_{n+1} - t_n.$$

(c) $u_n$ is the number of lots that “we” snipe at the aggressive touch at the action time $t_n$. It pays
the cost of a half spread, i.e., $HS \cdot u_n$. Since $HS$ is scaled into $\gamma_n$, it is simply expressed as $u_n$
in the stage cost.
For a buy order, $u_n$ is preferably nonnegative. In order to facilitate a closed-form solution,
however, we do not explicitly impose the constraint $u_n \ge 0$. When $u_n < 0$, we shall interpret
it as an aggressive sell order of size $-u_n$ at the current passive touch. If this is the case, the
updated position will increase:
$$q_n^+ = q_n - u_n > q_n.$$
The first term $\gamma_n q_n^2$ will discourage such opposite trades as long as the risk aversion weights
$\gamma_n$ are not negligible.
On the other hand, compared with the market mid price, selling at the passive touch also incurs
a cost of a half spread, i.e., $HS \cdot (-u_n)$. Hence when $u_n$ is allowed to be signed, the stage cost
in Eqn. (6) should at least be revised to:

$$j_n^{(2)}(u_n, W_n \mid \mathbf{x}_n) = \gamma_n q_n^2 + |u_n| - W_n.$$

Since we intend to design a DP model with a closed-form solution, the absolute value is further
revised to a squared form:

$$j_n(u_n, W_n \mid \mathbf{x}_n) = \gamma_n q_n^2 + u_n^2 - W_n, \tag{7}$$

which is the final stage cost adopted for the current model.
When $u_n$ stays close to a single lot, $u_n^2 \simeq |u_n|$. For $|u_n| > 1$, this quadratic form penalizes big
sizes even more heavily than the linear form. It favors smaller aggressive order sizes as a result.
Inspired by the Γ-convergence theory and its application in multi-phase variational problems [5],
one could also introduce the double-well cost function:
$$\frac{u_n^2 (1 - u_n)^2}{\varepsilon}, \quad \text{with } 0 < \varepsilon \ll 1.$$
This will softly enforce the binary sniping behavior - either no action with $u_n = 0$ or sniping
with a single lot $u_n = 1$. However, such high-order non-convex costs can completely thwart the
effort of designing a DP model with a unique and closed-form solution.

Summary. The stage cost model in Eqn. (7) and the state transition model in Eqn. (5) define
the proposed stochastic dynamic programming model for child order placement (COP).
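The successive candidates for the stage cost can be compared side by side with the short Python sketch below. The function names and the sample numbers are ours; as in the text, the half spread is assumed to be already scaled into the delay penalty, so all costs are in half-spread units.

```python
def stage_cost_linear(q, u, w, gamma):
    """Initial candidate, Eqn. (6): gamma * q^2 + u - W."""
    return gamma * q * q + u - w


def stage_cost_abs(q, u, w, gamma):
    """Signed-order revision: gamma * q^2 + |u| - W."""
    return gamma * q * q + abs(u) - w


def stage_cost(q, u, w, gamma):
    """Final quadratic form, Eqn. (7): gamma * q^2 + u^2 - W."""
    return gamma * q * q + u * u - w


def double_well(u, eps):
    """Double-well penalty u^2 (1 - u)^2 / eps, vanishing only at u = 0 and u = 1."""
    return u * u * (1.0 - u) ** 2 / eps


# At a single lot the three candidates coincide ...
single_lot = [f(5.0, 1.0, 0, 0.1)
              for f in (stage_cost_linear, stage_cost_abs, stage_cost)]
# ... while for a 3-lot snipe the quadratic form charges more than the linear one.
three_lots = [f(5.0, 3.0, 0, 0.1)
              for f in (stage_cost_linear, stage_cost_abs, stage_cost)]
```

For a negative snipe size, only the absolute-value and quadratic forms charge a positive spread cost, which is the reason the text moves away from Eqn. (6).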

4 Bellman Equations and Optimal Solutions


4.1 Value Functions V (x)
At any “current” action time $t_n$ with state variable $x_n = (q_n, \lambda_n)^T$, for any choice of policy or action
sequence $\mathbf{u}_n = (u_n, \ldots, u_{N-1})$ at $(t_n, \ldots, t_{N-1})$, let $\mathbf{W}_n = (W_n, \ldots, W_{N-1})$ denote the resulting
Poisson hits on “our” passive limit orders during the individual intervals $[t_k, t_{k+1})$. Then the future
cost is given by
\[
\begin{aligned}
J_n(\mathbf{u}_n, \mathbf{W}_n \mid x_n) &= J_n(u_n, \ldots, u_{N-1}; W_n, \ldots, W_{N-1} \mid x_n) \\
&= j_n(u_n, W_n \mid x_n) + \cdots + j_{N-1}(u_{N-1}, W_{N-1} \mid x_{N-1}) + j_N(x_N) \\
&= j_n(u_n, W_n \mid x_n) + J_{n+1}(\mathbf{u}_{n+1}, \mathbf{W}_{n+1} \mid x_{n+1}).
\end{aligned} \tag{8}
\]
Here $j_N(x_N)$ denotes the terminal cost at $t_N$. Define the value function by:
\[ V_n(x_n) = \inf_{\mathbf{u}_n} E_{\mathbf{W}_n}\left[ J_n(\mathbf{u}_n, \mathbf{W}_n \mid x_n) \right], \tag{9} \]
ranging over all state-driven policies in the form of $u_k = \phi_k(x_k)$, $k = n, \ldots, N-1$. Then we have
the Bellman equation at each action time $t_n$:
\[ V_n(x_n) = \inf_{u_n} \left\{ E_{W_n}\left[ j_n(u_n, W_n \mid x_n) \right] + E_{W_n}\left[ V_{n+1}(x_{n+1}) \right] \right\}. \tag{10} \]
The terminal cost at $t_N$ is defined to be
\[ V_N(x_N) = j_N(x_N) = \gamma_N q_N^2. \tag{11} \]
The other two terms of the stage cost are dropped since the end of the given micro bin is reached.
Next, we first work out the solution for the last period $[t_{N-1}, t_N)$ to gain some tangible
knowledge, and then the general solution in the framework of stochastic LQR.

4.2 Solution at the Last Period


At action time $t_{N-1}$ for the last period $[t_{N-1}, t_N)$, the Bellman equation reads:
\[ V_{N-1}(x_{N-1}) = \inf_{u_{N-1}} \left\{ \gamma_{N-1} q_{N-1}^2 + u_{N-1}^2 - \lambda_{N-1}^+ \Delta t_{N-1} + \gamma_N E_{W_{N-1}} (q_{N-1} - u_{N-1} - W_{N-1})^2 \right\}. \]

For clarity in calculation, we drop the subscript $N-1$ so that it becomes

\[ V(x) = \inf_u f(u \mid x) = \inf_u \left\{ \gamma q^2 + u^2 - \lambda^+ \Delta t + \gamma_N E_W (q - u - W)^2 \right\}, \]

with the current state vector $x = (q, \lambda)^T$ which is known, and $\lambda^+ = \lambda - \eta u$. The problem becomes
the minimization of a single-variate function $f(u \mid x)$ given $x$.
Since $E[W] = \lambda^+ \Delta t$, and $E[X^2] = \mathrm{Var}(X) + E[X]^2$ for a generic random variable $X$, one has
\[
\begin{aligned}
E(q - u - W)^2 &= \mathrm{Var}(q - u - W) + (q - u - \lambda^+ \Delta t)^2 \\
&= \mathrm{Var}(W) + (q - u - \lambda^+ \Delta t)^2 \\
&= \lambda^+ \Delta t + (q - u - \lambda^+ \Delta t)^2.
\end{aligned}
\]
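The second-moment identity above can be checked numerically. The sketch below sums the Poisson probability mass function directly. The helper name `poisson_second_moment`, the truncation level `k_max`, and the sample values are assumptions made for illustration, with `c` standing for $q - u$ and `mean` for $\lambda^+ \Delta t$.

```python
import math


def poisson_second_moment(c, mean, k_max=200):
    """E[(c - W)^2] for W ~ Poisson(mean), by summing the pmf up to k_max."""
    p = math.exp(-mean)            # P(W = 0)
    total = 0.0
    for k in range(k_max + 1):
        total += p * (c - k) ** 2
        p *= mean / (k + 1)        # P(W = k + 1) obtained from P(W = k)
    return total


if __name__ == "__main__":
    c, mean = 3.0, 1.65            # assumed sample values
    lhs = poisson_second_moment(c, mean)
    rhs = mean + (c - mean) ** 2   # Var(W) + (c - E[W])^2
    print(lhs, rhs)
```

For a small Poisson mean such as the one used here, the truncated tail beyond `k_max` is negligible, so the two printed numbers should agree to floating-point precision.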
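As a result, the derivative of $f$ is:
\[
\begin{aligned}
\frac{df}{du} &= 2u + (\gamma_N - 1)\Delta t \frac{d\lambda^+}{du} - 2\gamma_N (q - u - \lambda^+ \Delta t)\left(1 + \Delta t \frac{d\lambda^+}{du}\right) \\
&= 2\left(1 + \gamma_N (1 - \eta\Delta t)^2\right) u - \eta\Delta t(\gamma_N - 1) - 2\gamma_N (1 - \eta\Delta t)(q - \lambda\Delta t),
\end{aligned}
\]
where the second line uses $d\lambda^+/du = -\eta$.
At the optimal $u_*$, the derivative vanishes. Hence the optimal aggressive trading is given by:
\[
u_* = \alpha_* + \beta_* (q - \lambda\Delta t), \quad \text{with} \quad
\alpha_* = \frac{(\gamma_N - 1)\eta\Delta t}{2\left(1 + \gamma_N (1 - \eta\Delta t)^2\right)}, \quad \text{and} \quad
\beta_* = \frac{\gamma_N (1 - \eta\Delta t)}{1 + \gamma_N (1 - \eta\Delta t)^2}. \tag{12}
\]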



In particular, the optimal policy $u_* = \phi_*(x) = \phi_*(q, \lambda)$ is a linear function of the state variable.
Consider the asymptotic case when $\gamma_N = +\infty$. Then one has
\[ \alpha_* = \frac{\eta\Delta t}{2(1 - \eta\Delta t)^2}, \quad \text{and} \quad \beta_* = \frac{1}{1 - \eta\Delta t}. \]

In particular, for a highly liquid security so that $\eta \simeq 0$, the optimal policy is simply

\[ u_* = \alpha_* + \beta_* (q - \lambda\Delta t) \simeq q - \lambda\Delta t. \]

That is, at the last action time $t_{N-1}$, the proposed COP Algo will trade the expected remaining
lots that cannot be filled by the passive limit orders $W$ (since $E[W] = \lambda^+ \Delta t \simeq \lambda\Delta t$ when $\eta \simeq 0$).
This certainly makes business sense.
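The closed-form last-period policy of Eqn. (12) is easy to evaluate directly. The following Python sketch computes the two coefficients and the resulting aggressive size, together with the objective $f(u \mid x)$ so that optimality can be checked numerically. All function names and sample inputs are assumptions introduced for illustration only.

```python
def last_period_policy(gamma_N, eta, dt):
    """Coefficients (alpha, beta) of Eqn. (12): u* = alpha + beta * (q - lam * dt)."""
    s = 1.0 - eta * dt
    denom = 1.0 + gamma_N * s * s
    alpha = (gamma_N - 1.0) * eta * dt / (2.0 * denom)
    beta = gamma_N * s / denom
    return alpha, beta


def last_period_objective(u, q, lam, gamma, gamma_N, eta, dt):
    """f(u | x) with the Poisson expectation already taken in closed form."""
    mean_w = (lam - eta * u) * dt          # E[W] = lambda^+ * dt
    return (gamma * q * q + u * u + (gamma_N - 1.0) * mean_w
            + gamma_N * (q - u - mean_w) ** 2)


def optimal_last_trade(q, lam, gamma_N, eta, dt):
    """Optimal aggressive size at the last action time."""
    alpha, beta = last_period_policy(gamma_N, eta, dt)
    return alpha + beta * (q - lam * dt)


if __name__ == "__main__":
    # Assumed sample inputs: 10 lots outstanding, hitting rate 2 per unit time.
    print(optimal_last_trade(q=10.0, lam=2.0, gamma_N=5.0, eta=0.05, dt=1.0))
    # With a very large gamma_N and eta = 0 the result approaches q - lam * dt.
    print(optimal_last_trade(q=10.0, lam=2.0, gamma_N=1e9, eta=0.0, dt=1.0))
```

Because $f$ is quadratic in $u$, a central finite difference of `last_period_objective` around the returned size should vanish up to rounding, which gives a simple sanity check on the coefficients.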
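In terms of the value function at $t_{N-1}$, one then has
\[
\begin{aligned}
V(x) &= f(u_* \mid x) \\
&= \gamma q^2 + u_*^2 + (\gamma_N - 1)(\lambda\Delta t - \eta\Delta t\, u_*) + \gamma_N \left(q - \lambda\Delta t - (1 - \eta\Delta t) u_*\right)^2 \\
&= \gamma q^2 + c_1 (q - \lambda\Delta t)^2 + c_2 (q - \lambda\Delta t) + c_3,
\end{aligned} \tag{13}
\]
where
\[ c_1 = \beta_*^2 + \gamma_N \left(1 - (1 - \eta\Delta t)\beta_*\right)^2 = \frac{\gamma_N}{1 + \gamma_N (1 - \eta\Delta t)^2} > 0. \]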
Hence the value function for the last period can be written in the canonical quadratic form as:

\[ V(x) = x^T P x + b^T x + c, \tag{14} \]

where $P$ must be a positive definite matrix. This is because, by the quadratic portion of $V(x)$,

\[ \gamma q^2 + c_1 (q - \lambda\Delta t)^2 = 0 \;\Rightarrow\; q = 0, \ \lambda = 0. \]
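For concreteness, the quadratic portion of $V(x)$ above can be expanded to read off $P$ explicitly. The display below is a worked check rather than part of the original derivation, and it assumes $\gamma = \gamma_{N-1} > 0$ and $\Delta t > 0$:

```latex
\[
P = \begin{pmatrix} \gamma + c_1 & -c_1 \Delta t \\ -c_1 \Delta t & c_1 \Delta t^2 \end{pmatrix},
\qquad
P_{11} = \gamma + c_1 > 0,
\qquad
\det P = (\gamma + c_1)\, c_1 \Delta t^2 - c_1^2 \Delta t^2 = \gamma\, c_1\, \Delta t^2 > 0 .
\]
```

Both leading principal minors are positive, so $P$ is positive definite by Sylvester's criterion under the stated assumptions.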

Next we show that this is not accidental.

4.3 General Solution to the Stochastic LQR


In general, for $n = 0, 1, \ldots, N-2$, assume that the value function at $n+1$ is in the quadratic form:

\[ V_{n+1}(x_{n+1}) = x_{n+1}^T P_{n+1} x_{n+1} + b_{n+1}^T x_{n+1} + c_{n+1}, \tag{15} \]

where $P_{n+1}$ is positive definite. We now show that this implies that

\[ V_n(x_n) = x_n^T P_n x_n + b_n^T x_n + c_n, \]

where $P_n$ is also positive definite.

Furthermore, we show that the optimal action is given in the linear form:

\[ u_n = \phi_n(x_n) = \alpha_n + \beta_n^T x_n. \]

The objective is to derive $\alpha_n$, $\beta_n$, $P_n$, $b_n$ and $c_n$ from $P_{n+1}$, $b_{n+1}$ and $c_{n+1}$ recursively.
For clarity, we drop the subscript $n$ so that for any variable or parameter $X$, we use instead

\[ X_n \longrightarrow X, \quad \text{and} \quad X_{n+1} \longrightarrow X_1. \]



In particular, the value function at $n+1$ now assumes the cleaner form:

\[ V_1(x_1) = x_1^T P_1 x_1 + b_1^T x_1 + c_1. \tag{16} \]

Also the state transition equation becomes:

\[ x_1 = x - a u - z, \quad \text{with } a = (1, \eta)^T, \ z = (W, 0)^T, \]

where $\eta$ is a model parameter or constant that represents the market impact or information leakage,
and $W$ denotes the random Poisson hits on “our” passive limit orders over $[t_n, t_{n+1})$.
Then the value function at $t_n$ is given by:

\[ V(x) = \min_u f(u \mid x) = \min_u \left\{ \gamma q^2 + u^2 - \lambda^+ \Delta t + E_W V_1(x_1) \right\}. \]

Let $p_{11}$ denote the (1,1)-element of $P_1$, and $z_E = (\lambda^+ \Delta t, 0)^T = E[z]$. Then

\[
\begin{aligned}
E_W V_1(x_1) &= V_1(E_W x_1) + E_W (z - z_E)^T P_1 (z - z_E) \\
&= V_1(x - a u - z_E) + p_{11} \mathrm{Var}(W) \\
&= V_1(x - a u - z_E) + p_{11} \lambda^+ \Delta t.
\end{aligned}
\]

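We now define
\[ L = \begin{pmatrix} 0 & \Delta t \\ 0 & 0 \end{pmatrix}, \quad J = I_2 - L, \quad \text{and} \quad h = \begin{pmatrix} 1 - \eta\Delta t \\ \eta \end{pmatrix}, \tag{17} \]
where $I_2$ denotes the 2 by 2 identity matrix. Then
\[
\begin{aligned}
z_E &= \begin{pmatrix} \lambda\Delta t - \eta\Delta t\, u \\ 0 \end{pmatrix} = L x - \begin{pmatrix} \eta\Delta t \\ 0 \end{pmatrix} u, \\
x - a u - z_E &= J x - h u.
\end{aligned}
\]

Hence we have, with $l_{11} = 1 - p_{11}$,

\[ f(u) = f(u \mid x) = \gamma q^2 + u^2 - l_{11} \lambda^+ \Delta t + V_1(J x - h u). \tag{18} \]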

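Assume that
\[ f(u) = f(0) - B u + A u^2. \quad \text{Then} \quad f'(u) = 2 A u - B. \tag{19} \]
On the other hand, direct differentiation gives

\[
\begin{aligned}
f'(u) &= 2u + l_{11} \eta\Delta t - h^T \nabla V_1(J x - h u) \\
&= 2(1 + h^T P_1 h) u - (h^T b_1 - l_{11} \eta\Delta t) - 2 h^T P_1 J x.
\end{aligned}
\]

Hence we have

\[ A = 1 + h^T P_1 h, \quad \text{and} \quad B = (h^T b_1 - l_{11} \eta\Delta t) + 2 h^T P_1 J x. \tag{20} \]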

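Therefore, the optimal policy at $t_n$ is given by
\[
u_* = \frac{B}{2A} = \alpha_* + \beta_*^T x, \quad \text{with} \quad
\alpha_* = \frac{h^T b_1 - l_{11} \eta\Delta t}{2(1 + h^T P_1 h)}, \quad \text{and} \quad
\beta_*^T = \frac{h^T P_1 J}{1 + h^T P_1 h}. \tag{21}
\]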
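As a consistency check (not part of the original text), substituting the terminal data $P_1 = \mathrm{diag}(\gamma_N, 0)$ and $b_1 = 0$ implied by Eqn. (11) into Eqn. (21) should recover the last-period policy of Eqn. (12):

```latex
\[
\begin{aligned}
l_{11} &= 1 - \gamma_N, \qquad
h^T P_1 h = \gamma_N (1 - \eta\Delta t)^2, \qquad
h^T P_1 J = \gamma_N (1 - \eta\Delta t)\,(1, \; -\Delta t), \\
\alpha_* &= \frac{0 - (1 - \gamma_N)\,\eta\Delta t}{2\left(1 + \gamma_N (1 - \eta\Delta t)^2\right)}
          = \frac{(\gamma_N - 1)\,\eta\Delta t}{2\left(1 + \gamma_N (1 - \eta\Delta t)^2\right)}, \\
\beta_*^T x &= \frac{\gamma_N (1 - \eta\Delta t)}{1 + \gamma_N (1 - \eta\Delta t)^2}\,(q - \lambda\Delta t).
\end{aligned}
\]
```

Both expressions agree with Eqn. (12).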



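Next we derive the associated value function $V(x) = f(u_* \mid x)$. For convenience, for any
positive definite matrix $Q$ and any real vector $x$ of the same dimension, we define the $Q$-stretched
Euclidean norm by:
\[ \|x\|_Q^2 = x^T Q x. \]
Then the coefficient $A$ is simply $1 + \|h\|_{P_1}^2$.
Since $2 A u_* = B$, one has $2 A u_*^2 = B u_*$. Hence by Eqns. (18) and (19),

\[
\begin{aligned}
V(x) &= f(u_* \mid x) \\
&= f(0) - B u_* + A u_*^2 \\
&= f(0) - A u_*^2 \\
&= \gamma q^2 - l_{11} \lambda\Delta t + V_1(Jx) - A(\alpha_* + \beta_*^T x)^2 \\
&= \gamma q^2 - l_{11} \lambda\Delta t + (x^T J^T P_1 J x + b_1^T J x + c_1) - (1 + \|h\|_{P_1}^2)(\alpha_* + \beta_*^T x)^2 \\
&= \gamma q^2 + x^T (J^T P_1 J) x - (1 + \|h\|_{P_1}^2)\, x^T \beta_* \beta_*^T x \\
&\quad - l_{11} \lambda\Delta t + b_1^T J x - 2(1 + \|h\|_{P_1}^2)\, \alpha_* \beta_*^T x \\
&\quad + c_1 - (1 + \|h\|_{P_1}^2)\, \alpha_*^2 \\
&= x^T P x + b^T x + c,
\end{aligned}
\]

where the quadratic parameters are given by

\[
\begin{aligned}
P &= \gamma \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + J^T Q J, \quad \text{with } Q = P_1 - \frac{P_1 h h^T P_1}{1 + \|h\|_{P_1}^2}, \\
b^T &= b_1^T J - l_{11} \Delta t\, (0, 1) - 2(1 + \|h\|_{P_1}^2)\, \alpha_* \beta_*^T, \\
c &= c_1 - (1 + \|h\|_{P_1}^2)\, \alpha_*^2,
\end{aligned} \tag{22}
\]

with $l_{11} = 1 - P_1(1,1)$ and $\alpha_*$, $\beta_*$ given as in Eqn. (21).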


We now show that $P$ is positive definite.
Lemma 1 If $P_1$ is positive definite, so must be $P$.
Proof. Since $J = I_2 - L$ is non-singular as in Eqn. (17), and the first term of $P$ in Eqn. (22) is
positive semi-definite for $\gamma \ge 0$, it suffices to show that $Q$ is positive definite.
For any non-zero vector $v \in \mathbb{R}^2$, previously we have used the notation $\|v\|_{P_1}$ to denote the
$P_1$-stretched Euclidean norm. More generally, we use

\[ \langle v, h \rangle_{P_1} := v^T P_1 h \]

to denote the $P_1$-stretched inner product. By the Cauchy-Schwarz inequality, one has

\[ \langle v, h \rangle_{P_1} \le \|v\|_{P_1} \cdot \|h\|_{P_1}. \]



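Then back to $Q$, one has
\[
\begin{aligned}
v^T Q v &= v^T \left( P_1 - \frac{P_1 h h^T P_1}{1 + \|h\|_{P_1}^2} \right) v \\
&= \|v\|_{P_1}^2 - \frac{\langle v, h \rangle_{P_1}^2}{1 + \|h\|_{P_1}^2} \\
&= \frac{\|v\|_{P_1}^2 + \|v\|_{P_1}^2 \|h\|_{P_1}^2 - \langle v, h \rangle_{P_1}^2}{1 + \|h\|_{P_1}^2} \\
&\ge \frac{\|v\|_{P_1}^2}{1 + \|h\|_{P_1}^2} > 0.
\end{aligned}
\]

Since this holds for any non-zero vector $v$, $Q$ and hence $P$ must be positive definite. $\square$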
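The statement of Lemma 1 can be spot-checked numerically for a given positive definite $P_1$. The sketch below builds $Q$ as in Eqn. (22) and applies Sylvester's criterion. The helper names, the sample matrix, and the parameter values are assumptions for illustration, and a single numerical instance is of course not a substitute for the proof.

```python
def q_matrix(P1, h):
    """Q = P1 - (P1 h)(P1 h)^T / (1 + h^T P1 h) for a symmetric 2x2 matrix P1."""
    Ph = [P1[0][0] * h[0] + P1[0][1] * h[1],
          P1[1][0] * h[0] + P1[1][1] * h[1]]
    A = 1.0 + h[0] * Ph[0] + h[1] * Ph[1]          # A = 1 + ||h||^2 in the P1-norm
    Q = [[P1[i][j] - Ph[i] * Ph[j] / A for j in range(2)] for i in range(2)]
    return Q, A


def is_positive_definite(M):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] > 0.0 and det > 0.0


if __name__ == "__main__":
    eta, dt = 0.05, 1.0                    # assumed sample parameters
    h = [1.0 - eta * dt, eta]
    P1 = [[2.0, 0.5], [0.5, 1.0]]          # an arbitrary positive definite matrix
    Q, A = q_matrix(P1, h)
    print(is_positive_definite(P1), is_positive_definite(Q))
```

As a side remark, the matrix determinant lemma gives $\det Q = \det P_1 / (1 + \|h\|_{P_1}^2)$, which offers a second quick check on the computed $Q$.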
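We have thus established the following theorem, with $J_n$ and $h_n$ defined as in Eqn. (17):
\[ J_n = \begin{pmatrix} 1 & -\Delta t_n \\ 0 & 1 \end{pmatrix}, \quad \text{and} \quad h_n = \begin{pmatrix} 1 - \eta\Delta t_n \\ \eta \end{pmatrix}. \]

They are constant under equal partitioning, i.e., when all $\Delta t_n = t_{n+1} - t_n$ are the same.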

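Theorem 1 Let $V_N(x_N) = \gamma_N q_N^2$ be the terminal cost at the ending time $t_N$. Then at each
action time $t_n$ with $n < N$, there exist a positive definite 2 by 2 matrix $P_n$, a 2 by 1 vector $b_n$,
and a scalar $c_n$, such that the value function $V_n$ is given by:
\[ V_n(x_n) = x_n^T P_n x_n + b_n^T x_n + c_n = (q_n, \lambda_n) P_n \begin{pmatrix} q_n \\ \lambda_n \end{pmatrix} + b_n^T \begin{pmatrix} q_n \\ \lambda_n \end{pmatrix} + c_n. \tag{23} \]

The optimal policy $u_n^*$ is given by the linear form using parameters at $t_{n+1}$:

\[
u_n^* = \phi_n(x_n) = \alpha_n^* + x_n^T \beta_n^*, \quad \text{with} \quad
\alpha_n^* = \frac{b_{n+1}^T h_n - (1 - P_{n+1}(1,1))\, \eta\Delta t_n}{2(1 + h_n^T P_{n+1} h_n)}, \quad \text{and} \quad
\beta_n^* = \frac{J_n^T P_{n+1} h_n}{1 + h_n^T P_{n+1} h_n}, \tag{24}
\]

where $P_{n+1}(1,1)$ denotes the (1,1)-element of $P_{n+1}$. Furthermore, the structure of the value
functions also cascades backwards as follows:

\[
\begin{aligned}
P_n &= \gamma_n \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + J_n^T \left( P_{n+1} - \frac{P_{n+1} h_n h_n^T P_{n+1}}{1 + h_n^T P_{n+1} h_n} \right) J_n, \\
b_n &= J_n^T b_{n+1} - (1 - P_{n+1}(1,1))\, \Delta t_n \begin{pmatrix} 0 \\ 1 \end{pmatrix} - 2(1 + h_n^T P_{n+1} h_n)\, \alpha_n^* \beta_n^*, \\
c_n &= c_{n+1} - (1 + h_n^T P_{n+1} h_n)(\alpha_n^*)^2,
\end{aligned} \tag{25}
\]

with $n = N-1, \ldots, 1, 0$, and terminal values $P_N = \mathrm{diag}(\gamma_N, 0)$, $b_N = 0$ and $c_N = 0$.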
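The backward recursion of Theorem 1 translates directly into a short loop. The following pure-Python sketch implements Eqns. (24) and (25) with explicit 2 by 2 arithmetic. The function name `backward_recursion`, the list-based data layout, and the sample parameters are assumptions made for illustration, not the author's implementation.

```python
def backward_recursion(gammas, eta, dts):
    """Run Eqns. (24)-(25) backwards from the terminal values.

    gammas: risk aversion weights gamma_0..gamma_N (length N + 1)
    dts:    interval lengths dt_0..dt_{N-1} (length N)
    Returns the per-step policies [(alpha_n, beta_n)] and (P_0, b_0, c_0).
    """
    N = len(dts)
    P = [[gammas[N], 0.0], [0.0, 0.0]]     # P_N = diag(gamma_N, 0)
    b = [0.0, 0.0]                         # b_N = 0
    c = 0.0                                # c_N = 0
    policy = [None] * N
    for n in range(N - 1, -1, -1):
        dt = dts[n]
        h = (1.0 - eta * dt, eta)
        Ph = (P[0][0] * h[0] + P[0][1] * h[1],
              P[1][0] * h[0] + P[1][1] * h[1])
        A = 1.0 + h[0] * Ph[0] + h[1] * Ph[1]          # 1 + h^T P_{n+1} h
        l11 = 1.0 - P[0][0]
        alpha = (b[0] * h[0] + b[1] * h[1] - l11 * eta * dt) / (2.0 * A)
        # beta = J^T P h / A, where J^T v = (v0, -dt * v0 + v1)
        beta = (Ph[0] / A, (-dt * Ph[0] + Ph[1]) / A)
        # Q = P - (P h)(P h)^T / A, then J^T Q J with J = [[1, -dt], [0, 1]]
        Q = [[P[i][j] - Ph[i] * Ph[j] / A for j in range(2)] for i in range(2)]
        QJ = [[Q[i][0], -dt * Q[i][0] + Q[i][1]] for i in range(2)]
        JQJ = [QJ[0][:], [-dt * QJ[0][j] + QJ[1][j] for j in range(2)]]
        P_new = [[gammas[n] + JQJ[0][0], JQJ[0][1]],
                 [JQJ[1][0], JQJ[1][1]]]
        b_new = [b[0] - 2.0 * A * alpha * beta[0],
                 -dt * b[0] + b[1] - l11 * dt - 2.0 * A * alpha * beta[1]]
        c = c - A * alpha * alpha
        P, b = P_new, b_new
        policy[n] = (alpha, beta)
    return policy, (P, b, c)


if __name__ == "__main__":
    # Assumed sample setup: three equal intervals, constant interim risk aversion.
    policy, (P0, b0, c0) = backward_recursion([0.1, 0.1, 0.1, 5.0], 0.05, [1.0] * 3)
    for n, (alpha, beta) in enumerate(policy):
        print(n, alpha, beta)
```

At run time the aggressive size at $t_n$ would then be `alpha + beta[0] * q + beta[1] * lam` for the observed state. With a single period ($N = 1$) the returned coefficients should coincide with those of Eqn. (12).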



5 Conclusion and Disclaimers
We conclude the current work with the following comments and disclaimers.

(1) The proposed dynamic programming COP Algo has not been the internal or external product
of any execution house where the author worked previously. Any potential industrial conflict
or suspected proprietary trespass should be promptly directed to the attention of the author,
together with the necessary evidence.
(2) In the spirit of reductionism and the pursuit of a dynamic programming COP model with
closed-form solutions, the current model does not address other important execution or
implementation details, including fragmented venues in the national market system (NMS), different
trading sessions and rules, various order types, lit vs. dark, and so on.
(3) The current work focuses exclusively on the dynamic interplay between aggressive takeout
orders and passive limit orders. The price improvement or cost is represented by a half spread.
Information leakage or the market impact of aggressive orders is reflected in the reduction of
Poisson hitting rates on passive limit orders.
(4) Like some earlier DP macro Algos, risk aversion is implemented by the delay cost in the stage
cost model. It facilitates soft completion of a given child order over its designated micro time
bin. Hard completion or catchup is usually implemented at the macro layer.
(5) If the results here are to be integrated into an existing COP program in an execution house, a
practitioner should apply some heuristic but necessary overlays. For instance, u∗n ≤ 0 can be
interpreted as no sniping, while u∗n > 0 should also be capped by the outstanding position qn .
(6) Overall, the author hopes that the current model can inspire more rigorous works of a
similar kind that improve upon the heuristic decision trees prevailing in the COP processes in the
contemporary Algo industry.
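The overlay suggested in comment (5) can be sketched in a few lines. The function name `apply_overlay` is hypothetical, and flooring a negative outstanding position at zero is an extra assumption that goes beyond the text.

```python
def apply_overlay(u_star, q):
    """Heuristic overlay on the raw LQR output u_star, given outstanding position q.

    u_star <= 0 is read as "no sniping"; a positive u_star is capped by q.
    Treating q < 0 as zero capacity is an assumption of this sketch.
    """
    if u_star <= 0.0:
        return 0.0
    return min(u_star, max(q, 0.0))


if __name__ == "__main__":
    for u_star, q in ((-1.2, 10.0), (3.5, 10.0), (12.0, 10.0)):
        print(u_star, q, apply_overlay(u_star, q))
```

In practice one may also wish to round the result to whole lots, which is a further implementation choice left out of this sketch.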

Acknowledgments
Jackie Shen is very grateful to all the colleagues at the electronic or algorithmic trading desks
of J.P. Morgan, Barclays and Goldman Sachs, esp. to many of our hard-working IT, RISK, and
COMPLIANCE colleagues whose names hardly appear in the headlines, for their daily professional
assistance as well as generous personal support. Reliable and healthy electronic trading would be
impossible without solid IT implementations of databases, data streaming, servers, networks, and
multiple inter-dependent Algo components, or effective risk management and compliance controls.
This work was completed when the Covid-19 pandemic was sweeping through the entire globe
mercilessly. Under tremendous mental pressure living in the epicenter of New York, the author is
extremely grateful to his family, friends, and colleagues, as well as thousands of courageous and
selfless medical professionals, policemen and policewomen, and fire fighters of this great city.
The pandemic has actually brought the people in the city and around the globe much closer
and more united, as my 9-year-old observes from her numerous Zoom online classes and chats,
as well as all the touching stories around the world on fighting against the virus. Beyond A.I. or
automated trading “robots” as the current work has covered, the pandemic has grounded all of us
to the very core meaning of human beings and human societies. At the end of this darkest storm
there will be a brightest rainbow — so colorful, refreshing, and full of new hopes.



References
[1] R. Almgren. Optimal trading with stochastic liquidity and volatility. SIAM J. Financial
Math., 3:163–181, 2012.
[2] R. Almgren and N. Chriss. Optimal execution of portfolio transactions. J. Risk, 3:5–39, 2000.
[3] D. Bertsimas and A. W. Lo. Optimal control of execution costs. J. Financial Markets, 1(1):1–
50, 1998.
[4] B. Bouchard, N.-M. Dang, and C.-A. Lehalle. Optimal control of trading algorithms: a general
impulse control approach. SIAM J. Finan. Math., 2(1):404–438, 2011.
[5] T. F. Chan and J. Shen. Image Processing and Analysis: variational, PDE, wavelet, and
stochastic methods. SIAM Publisher, Philadelphia, 2005.
[6] Merell Hora. Tactical liquidity trading and intraday volume. Preprint, pages 1–28, 2006.
[7] G. Huberman and W. Stanzl. Optimal liquidity trading. Yale School of Management Working
Papers, YSM 165, 2001.
[8] A. Obizhaeva and J. Wang. Optimal trading strategy and supply/demand dynamics. J.
Financial Markets, 16(1):1–32, 2013.
[9] J. Shen. A pre-trade algorithmic trading model under given volume measures and generic price
dynamics. Applied Math. Res. eXpress, Oxford Univ. Press, 2015(1):64–98, 2015.
[10] J. Shen. Hybrid IS-VWAP dynamic algorithmic trading via LQR. Social Sci. Res. Network
(SSRN) Preprint, (2984297):1–23, 2017.
[11] J. Shen and Y. Yu. Styled dynamic algorithmic trading and the MV-MVP style. Social Sci.
Res. Network (SSRN) Preprint, (2507002):1–18, 2014.


Electronic copy available at: https://ssrn.com/abstract=3574365
