A Stochastic LQR Model For Child Order Placement in Algorithmic Trading
Jackie Jianhong Shen
Financial Services
New York City, USA
Abstract
Modern Algorithmic Trading (“Algo”) allows institutional investors and traders to liquidate
or establish large security positions in a fully automated or low-touch manner. Most existing
academic or industrial Algos focus on how to “slice” a big parent order into smaller child orders
over a given time horizon. Few models rigorously tackle the actual placement of these child
orders. Instead, placement is mostly done with a combination of empirical signals and heuristic
decision processes. A self-contained, realistic, and fully functional Child Order Placement (COP)
model may never exist due to the inherent complexities, e.g., fragmentation across multiple
venues, the dynamics of limit order books, lit vs. dark liquidity, and different trading sessions and
rules. In this paper, we propose a reductionist COP model that focuses exclusively on the
interplay between placing passive limit orders and sniping with aggressive takeout orders. The
dynamic programming model takes the form of a stochastic linear-quadratic regulator (LQR)
and admits closed-form solutions under the backward Bellman equations. We explore in detail the
model assumptions and general settings, the choice of state and control variables and cost
functions, and the derivation of the closed-form solutions.
Keywords: child order placement, dynamic programming, LQR, delay cost, spread cost,
impact cost, Poisson hits, passive, aggressive, Bellman equation, optimal policy, positive matrix.
Attention: The current work is designed to be published exclusively on the Social Science
Research Network (SSRN) preprint server. Its commercial or open-journal publication is
prohibited without the prior consent of the author. The current free style accommodates
informative cover pages and colored text boxes that help enhance the reading experience.
*
Dr. Jackie Shen has been the Global Project Leader for Algo Trading Model Risk at Goldman Sachs, New York,
USA. Previously, he had also been a quantitative strategist at the equities Algorithmic Trading desks at both J.P.
Morgan and Barclays, New York, USA. Email: [email protected]; URL: alum.mit.edu/www/jhshen
Term: Meaning
Algo: an automated trading process based on algorithms
Profile: an intraday time series derived from history, e.g., for volumes
IS: implementation shortfall, a popular optimization-based Algo
TWAP: time-weighted average price, a standard profile-based Algo
VWAP: volume-weighted average price, a standard profile-based Algo
COP: child order placement
DP: dynamic programming
LQR: linear-quadratic regulator, with linear transitions and quadratic costs
(N)BBO: (national) best bid and offer of a venue or market system
LOB: limit order book, with buy/sell orders displayed
SOR: smart order router/routing
FX: foreign exchange
ADV: average daily volume
Passive Touch: best bid for a buy order and best offer for a sell order
Aggressive Touch: best offer for a buy order and best bid for a sell order
Near Touch: same as Passive Touch
Far Touch: same as Aggressive Touch
Takeout or Sniping: aggressive market order at the far touch
TMV: true market value
Positive Matrix: a symmetric real matrix with positive eigenvalues (i.e., positive definite)
Symbol: Meaning
$t_n$: a discrete action time in dynamic programming
$X_n^{\pm}$: the right/left limit of a quantity $X$ at $t_n$, via $t_n \pm \varepsilon$
$HS$: half spread between the BBO
$Mid$: mid price between the BBO
$\mathcal{N}(\lambda)$: Poisson distribution with rate $\lambda$
$q_n$: outstanding position right before $t_n$; a state variable
$\lambda_n$: Poisson hitting rate right before $t_n$; a state variable
$u_n$: aggressive market order size at $t_n$; the control variable
$\eta$: market impact of aggressive orders on passive fills
$\gamma_n$: delay cost penalty at $t_n$; also expresses risk aversion
$x_n$: state vector right before $t_n$; $x_n = (q_n, \lambda_n)^T$
$V_n(x_n)$: the value function at $t_n$ in dynamic programming
For a quantity $X$ that is continuous in time, there is no difference among the three values $X_n^-$, $X_n$,
and $X_n^+$. In the proposed COP model, however, a sniping action (i.e., a control in the context of
dynamic programming) takes place exactly at a given action time $t_n$, and hence they can differ.
The Constraint on Completion. Order completion is usually enforced at the macro layer. It
is either explicitly formulated into the optimization as a constraint (e.g., for the IS Algo) or enforced
through hard scheduling (e.g., for TWAP/VWAP Algos). For micro-layer execution, e.g., executing
5 lots within a micro bin of 30 or 60 seconds, completion can be soft or delayed, in order to better
dance with the liquidity waves in the market. Unfilled lots can be handed over to the next micro
bin, and so forth. The macro Algo on top usually deploys a dedicated schedule “keeper”
to enforce the schedules.
Passive Limit Order at the Near Touch. Recall that, for convenience, the BBO have been
assumed invariant over a brief micro bin $[T_0, T_1]$. Let
$$HS = \frac{1}{2}\,(\mathrm{BestOfferPrice} - \mathrm{BestBidPrice})$$
denote the half spread, and
$$Mid = \frac{1}{2}\,(\mathrm{BestOfferPrice} + \mathrm{BestBidPrice})$$
the unbiased mid price that represents the true market value (TMV) at the moment. To distinguish it
from more sophisticated averaging schemes, this is usually called the simple mid. Once filled,
a limit order saves a half spread HS relative to the TMV. However, if left unfilled by the target
end time, limit orders can jeopardize order completion. A reasonable COP model must be able to
reflect this tradeoff.
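As a quick illustration of these definitions, the sketch below computes HS and Mid from a hypothetical BBO and the per-share cost of each order type for a buy order, measured against the simple mid. The prices and the function names `half_spread` and `simple_mid` are illustrative assumptions, not part of the model.

```python
def half_spread(best_bid: float, best_offer: float) -> float:
    """HS = (BestOfferPrice - BestBidPrice) / 2."""
    return 0.5 * (best_offer - best_bid)


def simple_mid(best_bid: float, best_offer: float) -> float:
    """Mid = (BestOfferPrice + BestBidPrice) / 2, used as the TMV proxy."""
    return 0.5 * (best_offer + best_bid)


# Hypothetical BBO for a buy order: the passive (near) touch is the bid,
# the aggressive (far) touch is the offer.
bid, offer = 100.00, 100.04
hs = half_spread(bid, offer)
mid = simple_mid(bid, offer)

# Per-share cost relative to the TMV (positive = pay, negative = save).
passive_fill_cost = bid - mid       # a filled buy limit at the bid saves HS
aggressive_fill_cost = offer - mid  # a buy market order at the offer pays HS

print(f"HS={hs:.4f}, Mid={mid:.4f}")
print(f"passive fill: {passive_fill_cost:+.4f}, aggressive fill: {aggressive_fill_cost:+.4f}")
```

The symmetric pair of costs (minus HS passively, plus HS aggressively) is exactly the tradeoff the COP model has to balance against completion.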
Aggressive Sniping at the Far Touch. Market or marketable orders take out liquidity from
the top of the opposite side of the LOB, i.e., the far touch. At the micro layer, market orders are always
kept small, e.g., a couple of lots on average for major exchanges in the US. Hence the proposed model
always assumes that such small orders are filled instantaneously. Market orders fill fast and help
achieve completion, but at the cost of a half spread HS. Furthermore, aggressive market orders
can also leak information, causing opposite market participants or market makers to bias
their perceived TMV towards the aggressive touch. As a result, they become less willing to take
out “our” limit orders posted at the passive touch. A reasonable COP model must be able to
capture such tradeoffs as well.
Assumptions on Limit Order Placement. For the placement of limit orders, the following
basic assumptions are made.
(A) The size of the limit order is always a single lot.
(B) A single-lot limit order is placed initially at $t_0$.
Assumptions on the Aggressive Market Orders. For sniping with aggressive market orders,
e.g., sending a marketable order $u_n$ at time $t_n$ at the far touch, the following assumptions are made.
(a) The ideal goal is to set $u_n$ to either a single lot or zero (i.e., no sniping), so that information
leakage and market impact can be curbed. In practice, to facilitate a tractable dynamic programming
(DP) formulation with closed-form solutions, the single-lot constraint is not explicitly
imposed. The cost function designed later will naturally encourage $u_n$ to stay small.
(b) It is also assumed that once an aggressive market order $u_n$ is sent, the entire order is
filled. Since the cost function in general keeps $u_n$ small, this assumption is reasonable for
most liquid securities.
(c) Since the BBO are assumed invariant over the short duration of a micro bin, we adopt
the following model for the information leakage caused by aggressive sniping at time $t_n$.
Once $u_n$ lots are taken out (by “us”) at the far touch, other market participants, including
in particular market makers, will update their belief on the TMV and bias it towards the far
touch. As a result,
either fewer opposite participants are willing to snipe at the passive touch, or market makers
will cancel their more passive inside limit orders and replace them with new ones at the
passive touch.
The latter will congest the queue at the passive touch. Hence, heuristically, both will reduce the
chance of “our” limit orders being hit by the market.
Quantitatively, we assume that the Poisson hitting rate introduced in Eqn. (1) is negatively
impacted according to the following linear form:
$$\lambda_n^+ = \lambda_n - \eta u_n, \tag{2}$$
after $u_n$ lots are sniped and filled at time $t_n$. Here $\eta > 0$ is a model parameter that calibrates
the rate of information leakage or market impact caused by aggressive sniping. In general, it
should depend on the liquidity profile of a given security.
For instance, assume $\lambda_n = 5.0$ lots per minute and $\eta = 0.5$ per lot per minute (i.e., each sniped lot
lowers the hitting rate by 0.5 lots per minute). Then sniping $u_n = 2$ lots at some time $t_n$ reduces the
Poisson hitting rate according to:
$$\lambda_n^+ = \lambda_n - \eta u_n = 5.0 - 0.5 \times 2 = 4.0.$$
The impact parameter $\eta$ can be calibrated or estimated using experimental orders that are
designed specifically for this purpose.
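The worked example above can be reproduced in a few lines. The sketch below implements Eqn. (2) and, using $\mathbb{E}[W] = \lambda^+ \Delta t$ from the model, shows how sniping lowers the expected number of passive hits over a bin; the bin length `dt` of half a minute is a hypothetical choice.

```python
def impacted_rate(lam: float, eta: float, u: float) -> float:
    """Eqn. (2): hitting rate right after sniping u lots, lambda^+ = lambda - eta * u."""
    return lam - eta * u


lam_plus = impacted_rate(lam=5.0, eta=0.5, u=2)   # the worked example above
dt = 0.5                                           # hypothetical bin length, in minutes

print(f"lambda^+ = {lam_plus} lots/min")
print(f"expected passive hits over dt: {5.0 * dt} before, {lam_plus * dt} after sniping")
```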
$$q_{n+1} = q_n - u_n - W_n, \tag{3}$$
$$\lambda_{n+1} = \lambda_n - \eta u_n. \tag{4}$$
$$t_n \in \{t_0, t_1, \ldots, t_{N-1}\}.$$
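The transitions (3) and (4) can be simulated directly. The sketch below assumes, as in the model, that $W_n$ is Poisson with mean $\lambda_n^+ \Delta t_n$; the sampler is Knuth's classic multiplication method built on Python's standard `random` module, and the names `sample_poisson` and `step` are hypothetical.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means of a micro bin."""
    if mean <= 0.0:
        return 0
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def step(q: float, lam: float, u: float, eta: float, dt: float, rng: random.Random):
    """One transition of the COP state x_n = (q_n, lambda_n) under the control u_n."""
    lam_plus = lam - eta * u                 # Eqn. (2): impact of sniping on the hitting rate
    w = sample_poisson(lam_plus * dt, rng)   # passive hits over [t_n, t_{n+1})
    q_next = q - u - w                       # Eqn. (3); like the model, not floored at zero
    lam_next = lam_plus                      # Eqn. (4): lambda_{n+1} = lambda_n - eta * u_n
    return q_next, lam_next, w


rng = random.Random(7)
q1, lam1, w = step(q=5.0, lam=5.0, u=1.0, eta=0.5, dt=0.5, rng=rng)
print(f"passive hits W={w}, q_next={q1}, lambda_next={lam1}")
```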
No action is taken at the terminal time knot $t_N$. At each $t_n$, we first define an initial candidate
$j_n^{(1)}$ for the stage cost, which is still subject to revision later on:
It is explained as follows.
(a) The first term $\gamma_n q_n^2$ favors fast execution so that the $q_n$'s quickly reach zero. It appears
in earlier works on both static and dynamic macro Algos, e.g., Almgren-Chriss [2], Hora [6], and
Shen [9, 10], just to name a few. The penalty coefficients $\gamma_n$ are the control parameters that
penalize trading delays. In general, $\gamma_n$ can be set in proportion to the real-time variance $\sigma_n^2$
of the security, i.e., in the form $\gamma_n = \tilde{\gamma}_n \sigma_n^2$. In addition, $\gamma_n$ should increase monotonically in $n$ to
facilitate order completion. For instance, $\gamma_N = +\infty$ would enforce hard completion: $q_N = 0$.
$\lambda_n^+ = \lambda_n - \eta u_n$, and $\Delta t_n = t_{n+1} - t_n$.
(c) $u_n$ is the number of lots that “we” snipe at the aggressive touch at the action time $t_n$. It pays
the cost of a half spread, i.e., $HS \cdot u_n$. Since HS is scaled into $\gamma_n$ (i.e., costs are measured in units of HS),
it is simply expressed as $u_n$ in the stage cost.
For a buy order, $u_n$ is preferably nonnegative. In order to facilitate a closed-form solution,
however, we do not explicitly impose the constraint $u_n \ge 0$. When $u_n < 0$, we interpret
it as an aggressive sell order of size $-u_n$ at the current passive touch. If this is the case, the
updated position will increase:
$$q_n^+ = q_n - u_n > q_n.$$
The first term $\gamma_n q_n^2$ will discourage such opposite trades as long as the risk-aversion weights
$\gamma_n$ are not negligible.
On the other hand, compared with the market mid price, selling at the passive touch also incurs
a cost of a half spread, i.e., $HS \cdot (-u_n)$. Hence when $u_n$ is allowed to be signed, the stage cost
in Eqn. (6) should at least be revised to:
Since we intend to design a DP model with a closed-form solution, the absolute value is further
revised to a squared form:
which is the final stage cost adopted for the current model.
When $u_n$ stays close to a single lot, $u_n^2 \approx |u_n|$. For $|u_n| > 1$, this quadratic form penalizes large
sizes even more heavily than the linear form. As a result, it favors smaller aggressive order sizes.
Inspired by the Γ-convergence theory and its application to multi-phase variational problems [5],
one could also introduce the double-well cost function
$$\frac{u_n^2 (1 - u_n)^2}{\varepsilon}, \quad \text{with } 0 < \varepsilon \ll 1.$$
This would softly enforce the binary sniping behavior: either no action with $u_n = 0$, or sniping
with a single lot $u_n = 1$. However, such higher-order non-convex costs can completely thwart the
effort of designing a DP model with a unique, closed-form solution.
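To make the comparison of the three sniping costs concrete, the sketch below tabulates the linear cost $|u_n|$, the adopted quadratic cost $u_n^2$, and the double-well alternative for a few sizes, then checks the double-well's non-convexity directly. The value $\varepsilon = 0.01$ is an arbitrary illustrative choice.

```python
def linear_cost(u: float) -> float:
    return abs(u)


def quadratic_cost(u: float) -> float:
    return u * u


def double_well_cost(u: float, eps: float = 0.01) -> float:
    """u^2 (1 - u)^2 / eps: zero at u = 0 and u = 1, steep elsewhere."""
    return u * u * (1.0 - u) ** 2 / eps


for u in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"u={u:3.1f}  |u|={linear_cost(u):4.1f}  u^2={quadratic_cost(u):4.1f}  "
          f"double-well={double_well_cost(u):8.2f}")

# Non-convexity: the midpoint between the two wells costs more than the average
# of the endpoint costs, which a convex function can never do.
mid_value = double_well_cost(0.5)
endpoint_avg = 0.5 * (double_well_cost(0.0) + double_well_cost(1.0))
print(mid_value > endpoint_avg)
```

The table shows $u^2 < |u|$ below one lot and $u^2 > |u|$ above it, while the double-well cost vanishes only at the two binary choices; its non-convexity is what rules out the LQR machinery used below.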
Summary. The stage cost model in Eqn. (7) and the state transition model in Eqn. (5) define
the proposed stochastic dynamic programming model for child order placement (COP).
ranging over all state-driven policies of the form $u_k = \phi_k(x_k)$, $k = n, \ldots, N-1$. Then we have
the Bellman equation at each action time $t_n$:
$$V_n(x_n) = \inf_{u_n} \Big\{ \mathbb{E}_{W_n}\big[j_n(u_n, W_n \mid x_n)\big] + \mathbb{E}_{W_n}\big[V_{n+1}(x_{n+1})\big] \Big\}. \tag{10}$$
with the current state vector $x = (q, \lambda)^T$, which is known, and $\lambda^+ = \lambda - \eta u$. The problem reduces to
the minimization of a single-variable function $f(u \mid x)$ given $x$.
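Since $W$ is Poisson with mean $\lambda^+ \Delta t$, one has $\mathbb{E}[W] = \mathrm{Var}(W) = \lambda^+ \Delta t$. Using
$\mathbb{E}[X^2] = \mathrm{Var}(X) + \mathbb{E}[X]^2$ for a generic random variable $X$, one has
$$\begin{aligned}
\mathbb{E}(q - u - W)^2 &= \mathrm{Var}(q - u - W) + (q - u - \lambda^+ \Delta t)^2 \\
&= \mathrm{Var}(W) + (q - u - \lambda^+ \Delta t)^2 \\
&= \lambda^+ \Delta t + (q - u - \lambda^+ \Delta t)^2.
\end{aligned}$$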
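The closed form above can be checked without simulation noise by summing the Poisson probability mass function directly. The sketch below truncates the sum at a generous `k_max`, which is negligible for the small means of a micro bin; the state and parameter values are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    # Evaluated in log space to avoid overflow of mean**k and k!; requires mean > 0.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def expected_sq_residual_exact(q: float, u: float, mean: float, k_max: int = 200) -> float:
    """E[(q - u - W)^2] for W ~ Poisson(mean), by summing the truncated pmf."""
    return sum((q - u - k) ** 2 * poisson_pmf(k, mean) for k in range(k_max + 1))


def expected_sq_residual_closed_form(q: float, u: float, mean: float) -> float:
    """Closed form derived in the text: lambda^+ dt + (q - u - lambda^+ dt)^2."""
    return mean + (q - u - mean) ** 2


q, u, lam, eta, dt = 5.0, 1.0, 5.0, 0.5, 0.5
mean = (lam - eta * u) * dt          # E[W] = lambda^+ * dt
exact = expected_sq_residual_exact(q, u, mean)
closed = expected_sq_residual_closed_form(q, u, mean)
print(f"truncated sum = {exact:.10f}, closed form = {closed:.10f}")
```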
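As a result, using $d\lambda^+/du = -\eta$, the derivative of $f$ is:
$$\begin{aligned}
\frac{df}{du} &= 2u + (\gamma_N - 1)\Delta t \frac{d\lambda^+}{du} - 2\gamma_N (q - u - \lambda^+ \Delta t)\Big(1 + \Delta t \frac{d\lambda^+}{du}\Big) \\
&= 2\big(1 + \gamma_N (1 - \eta \Delta t)^2\big) u - \eta \Delta t (\gamma_N - 1) - 2\gamma_N (1 - \eta \Delta t)(q - \lambda \Delta t).
\end{aligned}$$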
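At the optimal $u_*$, the derivative vanishes. Hence the optimal aggressive trading is given by:
$$u_* = \alpha_* + \beta_* (q - \lambda \Delta t), \quad \text{with} \quad \alpha_* = \frac{(\gamma_N - 1)\eta \Delta t}{2\big(1 + \gamma_N (1 - \eta \Delta t)^2\big)}, \quad \text{and} \quad \beta_* = \frac{\gamma_N (1 - \eta \Delta t)}{1 + \gamma_N (1 - \eta \Delta t)^2}. \tag{12}$$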
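The last-period policy (12) can be cross-checked against a brute-force minimization. The sketch below evaluates $f(u \mid x)$ in the form that appears in Eqn. (13), i.e., $\gamma q^2 + u^2 + (\gamma_N - 1)(\lambda\Delta t - \eta\Delta t\,u) + \gamma_N(q - \lambda\Delta t - (1 - \eta\Delta t)u)^2$, scans a fine grid of $u$, and compares the grid minimizer with the closed form. All parameter values are hypothetical.

```python
def last_period_objective(u, q, lam, gamma, gamma_N, eta, dt):
    """f(u | x) at t_{N-1}, in the form appearing in Eqn. (13)."""
    lam_plus_dt = lam * dt - eta * dt * u          # lambda^+ * dt
    return (gamma * q ** 2 + u ** 2
            + (gamma_N - 1.0) * lam_plus_dt
            + gamma_N * (q - lam * dt - (1.0 - eta * dt) * u) ** 2)


def optimal_last_snipe(q, lam, gamma_N, eta, dt):
    """Closed-form policy of Eqn. (12): u_* = alpha_* + beta_* (q - lambda dt)."""
    a = 1.0 - eta * dt
    denom = 1.0 + gamma_N * a * a
    alpha = (gamma_N - 1.0) * eta * dt / (2.0 * denom)
    beta = gamma_N * a / denom
    return alpha + beta * (q - lam * dt)


params = dict(q=5.0, lam=4.0, gamma=0.2, gamma_N=10.0, eta=0.5, dt=0.5)
u_star = optimal_last_snipe(params["q"], params["lam"], params["gamma_N"],
                            params["eta"], params["dt"])

# Brute-force check: scan u over [-5, 10] in steps of 0.001 and keep the minimizer.
grid = [i / 1000.0 for i in range(-5000, 10001)]
u_grid = min(grid, key=lambda u: last_period_objective(u, **params))
print(f"closed form u* = {u_star:.4f}, grid minimizer = {u_grid:.3f}")
```

Because $f$ is a convex quadratic in $u$, the grid minimizer can differ from $u_*$ only by the grid resolution.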
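In particular, for a highly liquid security with $\eta \approx 0$, one has $\alpha_* \approx 0$, and the optimal policy (12) is simply
$$u_* \approx \frac{\gamma_N}{1 + \gamma_N}\,(q - \lambda \Delta t),$$
which approaches $q - \lambda \Delta t$ as $\gamma_N$ grows large.
That is, at the last action time $t_{N-1}$ and for a large terminal penalty $\gamma_N$, the proposed COP Algo will trade
essentially all the expected remaining lots that cannot be filled by the passive limit orders $W$ (since
$\mathbb{E}[W] = \lambda^+ \Delta t \approx \lambda \Delta t$ when $\eta \approx 0$). This certainly makes business sense.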
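In terms of the value function at $t_{N-1}$, one then has
$$\begin{aligned}
V(x) &= f(u_* \mid x) \\
&= \gamma q^2 + u_*^2 + (\gamma_N - 1)(\lambda \Delta t - \eta \Delta t\, u_*) + \gamma_N \big(q - \lambda \Delta t - (1 - \eta \Delta t) u_*\big)^2 \\
&= \gamma q^2 + c_1 (q - \lambda \Delta t)^2 + c_2 (q - \lambda \Delta t) + (\gamma_N - 1)\lambda \Delta t + c_3,
\end{aligned} \tag{13}$$
where $c_2$ and $c_3$ depend only on the model parameters, and
$$c_1 = \beta_*^2 + \gamma_N \big(1 - (1 - \eta \Delta t)\beta_*\big)^2 = \frac{\gamma_N}{1 + \gamma_N (1 - \eta \Delta t)^2} > 0.$$
Hence the value function for the last period can be written in the canonical quadratic form
$$V(x) = x^T P x + b^T x + c, \tag{14}$$
where $P$ must be a positive definite matrix. This is because the quadratic portion of $V(x)$,
$\gamma q^2 + c_1 (q - \lambda \Delta t)^2$, is nonnegative and (for $\gamma > 0$ and $\Delta t > 0$)
$$\gamma q^2 + c_1 (q - \lambda \Delta t)^2 = 0 \;\Rightarrow\; q = 0,\ \lambda = 0.$$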
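As a numerical sanity check on the two expressions for $c_1$ and on the positive definiteness claim, the sketch below expands the quadratic portion $\gamma q^2 + c_1(q - \lambda\Delta t)^2$ as $x^T P x$ with $P = \begin{pmatrix} \gamma + c_1 & -c_1\Delta t \\ -c_1\Delta t & c_1\Delta t^2 \end{pmatrix}$ (our own expansion) and applies Sylvester's criterion. Its determinant is $\gamma c_1 \Delta t^2 > 0$. Parameter values are arbitrary.

```python
def c1_coefficient(gamma_N, eta, dt):
    """Return c_1 computed both via beta_* and via the simplified closed form."""
    a = 1.0 - eta * dt
    denom = 1.0 + gamma_N * a * a
    beta = gamma_N * a / denom
    via_beta = beta ** 2 + gamma_N * (1.0 - a * beta) ** 2
    closed = gamma_N / denom
    return via_beta, closed


def last_period_P(gamma, gamma_N, eta, dt):
    """Quadratic portion gamma*q^2 + c_1 (q - lambda dt)^2 written as x^T P x."""
    _, c1 = c1_coefficient(gamma_N, eta, dt)
    return [[gamma + c1, -c1 * dt],
            [-c1 * dt, c1 * dt * dt]]


def is_positive_definite_2x2(P):
    """Sylvester's criterion for a symmetric 2 x 2 matrix."""
    return P[0][0] > 0 and P[0][0] * P[1][1] - P[0][1] * P[1][0] > 0


gamma, gamma_N, eta, dt = 0.2, 10.0, 0.5, 0.5
via_beta, closed = c1_coefficient(gamma_N, eta, dt)
P = last_period_P(gamma, gamma_N, eta, dt)
print(f"c_1: {via_beta:.6f} vs {closed:.6f}; P positive definite: {is_positive_definite_2x2(P)}")
```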
where $P_{n+1}$ is positive definite. We now show that this implies that
$$u_n = \phi_n(x_n) = \alpha_n + \beta_n^T x_n.$$
The objective is to derive $\alpha_n$, $\beta_n$, $P_n$, $b_n$, and $c_n$ from $P_{n+1}$, $b_{n+1}$, and $c_{n+1}$ recursively.
For clarity, we drop the subscript $n$, so that for any variable or parameter $X$, we use instead
$$X_n \longrightarrow X, \quad \text{and} \quad X_{n+1} \longrightarrow X_1.$$
where $\eta$ is a model parameter or constant that represents the market impact or information leakage,
and $W$ is the Poisson-distributed number of random hits on “our” passive limit orders over $[t_n, t_{n+1})$.
Then the value function at $t_n$ is given by:
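Let $p_{11}$ denote the (1,1)-element of $P_1$, and $z^E = (\lambda^+ \Delta t, 0)^T = \mathbb{E}[z]$. Then
$$\begin{aligned}
\mathbb{E}_W V_1(x_1) &= V_1(\mathbb{E}_W x_1) + \mathbb{E}_W (z - z^E)^T P_1 (z - z^E) \\
&= V_1(x - au - z^E) + p_{11} \mathrm{Var}(W) \\
&= V_1(x - au - z^E) + p_{11} \lambda^+ \Delta t.
\end{aligned}$$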
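We now define
$$L = \begin{pmatrix} 0 & \Delta t \\ 0 & 0 \end{pmatrix}, \quad J = I_2 - L, \quad \text{and} \quad h = \begin{pmatrix} 1 - \eta \Delta t \\ \eta \end{pmatrix}, \tag{17}$$
where $I_2$ denotes the 2 by 2 identity matrix. Then
$$z^E = \begin{pmatrix} \lambda \Delta t - \eta \Delta t\, u \\ 0 \end{pmatrix} = Lx - \begin{pmatrix} \eta \Delta t \\ 0 \end{pmatrix} u,$$
$$x - au - z^E = Jx - hu.$$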
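The identity $x - au - z^E = Jx - hu$ can be spot-checked numerically. The sketch below assumes, as we read transitions (3) and (4), that $x_1 = x - au - z$ with $a = (1, \eta)^T$ and $z = (W, 0)^T$; these two vectors are not spelled out in this section. It evaluates both sides at random states and controls using plain Python lists for the 2 by 2 algebra.

```python
import random


def mat_vec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def both_sides(q, lam, u, eta, dt):
    """Return (x - a u - z^E, J x - h u) for one state x = (q, lambda) and control u."""
    x = [q, lam]
    a = [1.0, eta]                        # assumed: x_1 = x - a*u - z with z = (W, 0)^T
    L = [[0.0, dt], [0.0, 0.0]]
    J = [[1.0, -dt], [0.0, 1.0]]          # J = I_2 - L
    h = [1.0 - eta * dt, eta]
    Lx = mat_vec(L, x)
    zE = [Lx[0] - eta * dt * u, Lx[1]]    # z^E = L x - (eta dt, 0)^T u = (lambda^+ dt, 0)^T
    lhs = [x[i] - a[i] * u - zE[i] for i in range(2)]
    Jx = mat_vec(J, x)
    rhs = [Jx[i] - h[i] * u for i in range(2)]
    return lhs, rhs


rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    lhs, rhs = both_sides(rng.uniform(0, 10), rng.uniform(0, 10),
                          rng.uniform(-3, 3), rng.uniform(0, 1), rng.uniform(0.1, 1))
    max_gap = max(max_gap, abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1]))
print(f"largest discrepancy: {max_gap:.2e}")
```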
Since $f$ is quadratic in $u$, write
$$f(u) = f(0) - Bu + Au^2. \quad \text{Then} \quad f'(u) = 2Au - B. \tag{19}$$
On the other hand, direct differentiation gives
Hence we have
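$$\begin{aligned}
V(x) &= f(u_* \mid x) \\
&= f(0) - Bu_* + Au_*^2 \\
&= f(0) - Au_*^2 \\
&= \gamma q^2 - l_{11}\lambda \Delta t + V_1(Jx) - A(\alpha_* + \beta_*^T x)^2 \\
&= \gamma q^2 - l_{11}\lambda \Delta t + (x^T J^T P_1 J x + b_1^T J x + c_1) - (1 + \|h\|_{P_1}^2)(\alpha_* + \beta_*^T x)^2 \\
&= \gamma q^2 + x^T (J^T P_1 J) x - (1 + \|h\|_{P_1}^2)\, x^T \beta_* \beta_*^T x \\
&\quad - l_{11}\lambda \Delta t + b_1^T J x - 2(1 + \|h\|_{P_1}^2)\alpha_* \beta_*^T x \\
&\quad + c_1 - (1 + \|h\|_{P_1}^2)\alpha_*^2 \\
&= x^T P x + b^T x + c,
\end{aligned}$$
where
$$\begin{aligned}
P &= \gamma \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + J^T Q J, \quad \text{with} \quad Q = P_1 - \frac{P_1 h h^T P_1}{1 + \|h\|_{P_1}^2}, \\
b^T &= b_1^T J - l_{11}\Delta t\,(0, 1) - 2(1 + \|h\|_{P_1}^2)\alpha_* \beta_*^T, \\
c &= c_1 - (1 + \|h\|_{P_1}^2)\alpha_*^2.
\end{aligned} \tag{22}$$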
$$\langle v, h \rangle_{P_1} := v^T P_1 h$$
to denote the $P_1$-stretched inner product. By the Cauchy-Schwarz Theorem, one has
$$\langle v, h \rangle_{P_1} \le \|v\|_{P_1} \cdot \|h\|_{P_1}.$$
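Consequently, for any non-zero vector $v$,
$$v^T Q v = \|v\|_{P_1}^2 - \frac{\langle v, h \rangle_{P_1}^2}{1 + \|h\|_{P_1}^2} \ge \frac{\|v\|_{P_1}^2}{1 + \|h\|_{P_1}^2} > 0.$$
Since this holds for any non-zero vector $v$, $Q$ is positive definite; and since $J$ is invertible and
$\gamma\,\mathrm{diag}(1, 0)$ is positive semidefinite, $P$ must be positive definite as well.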
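The backward recursion (22) is straightforward to run numerically. The sketch below is our own implementation under one stated assumption: the final stage cost is taken to be $j_n = \gamma_n q_n^2 + u_n^2 - W_n$, the form consistent with Eqn. (13), which gives $l_{11} = 1 - p_{11}$. Under that assumption, minimizing $f(u)$ yields $\beta_* = J^T P_1 h / A$ and $\alpha_* = (b_1^T h - l_{11}\eta\Delta t)/(2A)$ with $A = 1 + \|h\|_{P_1}^2$; these policy formulas are our derivation, not reproduced from the text, but at the last period they reduce exactly to Eqn. (12). All function names and inputs are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def matvec(A, v):
    return [A[i][0] * v[0] + A[i][1] * v[1] for i in range(2)]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def backward_step(P1, b1, c1, gamma, eta, dt):
    """Map (P_{n+1}, b_{n+1}, c_{n+1}) to (P_n, b_n, c_n) and the policy (alpha_n, beta_n).

    Assumes the stage cost j_n = gamma_n q_n^2 + u_n^2 - W_n, so that l_11 = 1 - p_11.
    """
    J = [[1.0, -dt], [0.0, 1.0]]                 # J_n as in the theorem
    h = [1.0 - eta * dt, eta]                    # h_n as in Eqn. (17)
    JT = transpose(J)
    P1h = matvec(P1, h)
    A = 1.0 + dot(h, P1h)                        # 1 + ||h||^2_{P_1}
    l11 = 1.0 - P1[0][0]
    beta = [v / A for v in matvec(JT, P1h)]      # beta_* = J^T P_1 h / A
    alpha = (dot(b1, h) - l11 * eta * dt) / (2.0 * A)
    # Q = P_1 - P_1 h h^T P_1 / (1 + ||h||^2_{P_1});  P = gamma e1 e1^T + J^T Q J
    Q = [[P1[i][j] - P1h[i] * P1h[j] / A for j in range(2)] for i in range(2)]
    P = matmul(JT, matmul(Q, J))
    P[0][0] += gamma
    # b^T = b_1^T J - l_11 dt (0, 1) - 2 A alpha beta^T
    Jb1 = matvec(JT, b1)
    b = [Jb1[0] - 2.0 * A * alpha * beta[0],
         Jb1[1] - l11 * dt - 2.0 * A * alpha * beta[1]]
    c = c1 - A * alpha ** 2                      # c = c_1 - (1 + ||h||^2_{P_1}) alpha^2
    return P, b, c, alpha, beta


def solve(N, gammas, gamma_N, eta, dt):
    """Backward pass from V_N(x_N) = gamma_N q_N^2 over equally spaced action times."""
    P, b, c = [[gamma_N, 0.0], [0.0, 0.0]], [0.0, 0.0], 0.0
    policies, Ps = [None] * N, [None] * N
    for n in reversed(range(N)):
        P, b, c, alpha, beta = backward_step(P, b, c, gammas[n], eta, dt)
        policies[n] = (alpha, beta)              # u_n = alpha_n + beta_n^T x_n
        Ps[n] = P
    return policies, Ps


# Hypothetical inputs: 10 action times with increasing delay penalties gamma_n.
gammas = [0.1 * (n + 1) for n in range(10)]
policies, Ps = solve(10, gammas, gamma_N=5.0, eta=0.2, dt=0.5)
alpha0, beta0 = policies[0]
q0, lam0 = 5.0, 4.0
u0 = alpha0 + beta0[0] * q0 + beta0[1] * lam0
print(f"u_0* = {u0:.3f}")
```

With a single period, `solve` returns $\alpha_*$ and $\beta_* = \beta_*^{(12)}\,(1, -\Delta t)$, i.e., exactly the policy $u_* = \alpha_* + \beta_*(q - \lambda\Delta t)$ of Eqn. (12).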
We have thus established the following theorem, with $J_n$ and $h_n$ defined as in Eqn. (17):
$$J_n = \begin{pmatrix} 1 & -\Delta t_n \\ 0 & 1 \end{pmatrix}, \quad \text{and} \quad h_n = \begin{pmatrix} 1 - \eta \Delta t_n \\ \eta \end{pmatrix}.$$
They are constant under equal partitioning, i.e., when the $\Delta t_n = t_{n+1} - t_n$ are all the same.
Theorem 1. Let $V_N(x_N) = \gamma_N q_N^2$ be the terminal cost at the ending time $t_N$. Then at each
action time $t_n$ with $n < N$, there exist a positive definite 2 by 2 matrix $P_n$, a 2 by 1 vector $b_n$, and
a scalar $c_n$ such that the value function $V_n$ is given by:
$$V_n(x_n) = x_n^T P_n x_n + b_n^T x_n + c_n = (q_n, \lambda_n)\, P_n \begin{pmatrix} q_n \\ \lambda_n \end{pmatrix} + b_n^T \begin{pmatrix} q_n \\ \lambda_n \end{pmatrix} + c_n. \tag{23}$$
The optimal policy $u_n$ is given by the linear form using parameters at $t_{n+1}$:
where $P_{n+1}(1,1)$ denotes the (1,1)-element of $P_{n+1}$. Furthermore, the structure of the value
functions also cascades backwards as follows:
(1) The proposed dynamic programming COP Algo has not been an internal or external product
of any execution house where the author previously worked. Any potential industrial conflict
or suspected proprietary trespass should be promptly brought to the attention of the author,
together with the necessary evidence.
(2) In the spirit of reductionism and the pursuit of a dynamic programming COP model with
closed-form solutions, the current model does not address other important execution or
implementation details, including fragmented venues in the national market system (NMS), different
trading sessions and rules, various order types, lit vs. dark, and so on.
(3) The current work focuses exclusively on the dynamic interplay between aggressive takeout
orders and passive limit orders. The price improvement or cost is represented by a half spread.
Information leakage or the market impact of aggressive orders is reflected in the reduction of
Poisson hitting rates on passive limit orders.
(4) As in some earlier DP macro Algos, risk aversion is implemented through the delay cost in the stage
cost model. It facilitates soft completion of a given child order over its designated micro time
bin. Hard completion or catch-up is usually implemented at the macro layer.
(5) If the results here are to be integrated into an existing COP program in an execution house, a
practitioner should apply some heuristic but necessary overlays. For instance, $u_n^* \le 0$ can be
interpreted as no sniping, while $u_n^* > 0$ should also be capped by the outstanding position $q_n$.
(6) Overall, the author hopes that the current model will inspire more rigorous work of this kind
that can improve the heuristic decision trees prevailing in the COP processes of the
contemporary Algo industry.
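The overlay in remark (5) can be sketched as a small post-processing function applied to the model output $u_n^*$. The function name `apply_overlay` is hypothetical, and treating a non-positive outstanding position as "no sniping" is our own reading of the cap.

```python
def apply_overlay(u_star: float, q: float) -> float:
    """Remark (5) overlay: no sniping when u* <= 0; otherwise cap u* at the outstanding q.

    A non-positive outstanding position q therefore also yields no sniping.
    """
    if u_star <= 0.0:
        return 0.0
    return min(u_star, max(q, 0.0))


for u_star, q in [(-0.4, 3.0), (0.7, 3.0), (3.57, 3.0)]:
    print(f"u*={u_star:+.2f}, q={q} -> snipe {apply_overlay(u_star, q)}")
```

The overlay leaves the closed-form policy untouched inside its valid range and only clips it where the unconstrained LQR solution (which allows signed and oversized $u_n$) departs from what a buy-side router would actually send.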
Acknowledgments
Jackie Shen is very grateful to all the colleagues at the electronic or algorithmic trading desks
of J.P. Morgan, Barclays and Goldman Sachs, especially to many of our hard-working IT, RISK, and
COMPLIANCE colleagues whose names hardly appear in the headlines, for their daily professional
assistance as well as generous personal support. Reliable and healthy electronic trading would be
impossible without solid IT implementations of databases, data streaming, servers, networks, and
multiple inter-dependent Algo components, or without effective risk management and compliance controls.
This work was completed while the Covid-19 pandemic was sweeping mercilessly through the entire
globe. Living under tremendous mental pressure in the epicenter of New York, the author is
extremely grateful to his family, friends, and colleagues, as well as the thousands of courageous and
selfless medical professionals, policemen and policewomen, and firefighters of this great city.
The pandemic has actually brought the people in the city and around the globe much closer
and more united, as my 9-year-old observes from her numerous Zoom online classes and chats,
as well as from all the touching stories around the world of fighting against the virus. Beyond A.I. or
automated trading “robots” such as those covered in the current work, the pandemic has grounded all of us
in the very core meaning of human beings and human societies. At the end of this darkest storm
there will be a brightest rainbow: so colorful, refreshing, and full of new hopes.