Numerical Optimal Control
Moritz Diehl
Simplified Optimal Control Problem in ODE
[Figure: states x(t) and controls u(t) over t ∈ [0, T], with fixed initial value x_0, path constraints h(x, u) ≥ 0, and terminal constraint r(x(T)) ≥ 0]
minimize over x(·), u(·):   ∫_0^T L(x(t), u(t)) dt + E(x(T))
subject to
x(0) − x0 = 0, (fixed initial value)
ẋ(t)−f (x(t), u(t)) = 0, t ∈ [0, T ], (ODE model)
h(x(t), u(t)) ≥ 0, t ∈ [0, T ], (path constraints)
r (x(T )) ≥ 0 (terminal constraints)
More general optimal control problems
Many features left out here for simplicity of presentation:
▶ multiple dynamic stages
▶ differential algebraic equations (DAE) instead of ODE
▶ explicit time dependence
▶ constant design parameters
▶ multipoint constraints r(x(t_0), x(t_1), . . . , x(t_end)) = 0
Optimal Control Family Tree
Three basic families:
▶ Hamilton-Jacobi-Bellman equation / dynamic programming
▶ Indirect Methods / calculus of variations / Pontryagin
▶ Direct Methods (control discretization)
Principle of Optimality
Any subarc of an optimal trajectory is also optimal.
[Figure: optimal states x(t) and controls u(t) over [0, T], with initial value x_0 at t = 0 and intermediate value x̄ at t = t̄]
Subarc on [t̄, T ] is optimal solution for initial value x̄.
Dynamic Programming Cost-to-go
IDEA:
▶ Introduce the optimal cost-to-go function on [t̄, T]:

  J(x̄, t̄) := min_{x, u} ∫_t̄^T L(x, u) dt + E(x(T))   s.t.   x(t̄) = x̄, . . .

▶ Introduce a grid 0 = t_0 < . . . < t_N = T.
▶ Use the principle of optimality on the intervals [t_k, t_{k+1}]:

  J(x_k, t_k) = min_{x, u} ∫_{t_k}^{t_{k+1}} L(x, u) dt + J(x(t_{k+1}), t_{k+1})   s.t.   x(t_k) = x_k, . . .
Dynamic Programming Recursion
Starting from J(x, t_N) = E(x), compute recursively backwards, for k = N − 1, . . . , 0:

  J(x_k, t_k) := min_{x, u} ∫_{t_k}^{t_{k+1}} L(x, u) dt + J(x(t_{k+1}), t_{k+1})   s.t.   x(t_k) = x_k, . . .

by solution of short-horizon problems for all possible x_k and tabulation in state space.
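The recursion above can be sketched as a backward tabulation over a state grid. A minimal Python illustration (the grid sizes, the Euler integrator, and the scalar model ẋ = (1 + x)x + u with cost L = x² + u² are assumptions, chosen to match the test problem used later in these slides; infeasible grid points get a large penalty instead of +∞ to keep the interpolation well behaved):

```python
import numpy as np

h = 0.1                                   # time step of the grid
N = 30                                    # number of intervals
xs = np.linspace(-1.0, 1.0, 101)          # state grid (x = 0 is index 50)
us = np.linspace(-1.0, 1.0, 21)           # control grid

def f(x, u):                              # assumed scalar model
    return (1.0 + x) * x + u

J = np.zeros_like(xs)                     # J(., t_N) = E(x) = 0
for k in range(N - 1, -1, -1):
    Jnew = np.empty_like(xs)
    for i, x in enumerate(xs):
        xnext = x + h * f(x, us)          # one Euler step per candidate control
        cost = h * (x**2 + us**2) + np.interp(xnext, xs, J)
        cost += np.where(np.abs(xnext) > 1.0, 1e6, 0.0)  # penalty, not +inf
        Jnew[i] = cost.min()              # short-horizon problem at (x, t_k)
    J = Jnew                              # tabulated cost-to-go J(., t_k)
```

The two nested loops over grid points and time steps are exactly the "curse of dimensionality": the work grows exponentially with the state dimension.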
[Figure sequence: the cost-to-go functions J(·, t_N), J(·, t_{N−1}), . . . , J(·, t_0) are tabulated backwards over the state space]
Hamilton-Jacobi-Bellman (HJB) Equation
▶ Dynamic Programming with infinitely small timesteps leads to the Hamilton-Jacobi-Bellman (HJB) equation:

  −∂J/∂t (x, t) = min_u L(x, u) + ∂J/∂x (x, t) f(x, u)   s.t.   h(x, u) ≥ 0.

▶ Solve this partial differential equation (PDE) backwards for t ∈ [0, T], starting at the end of the horizon with J(x, T) = E(x).
▶ NOTE: The optimal controls for state x at time t are obtained from

  u*(x, t) = arg min_u L(x, u) + ∂J/∂x (x, t) f(x, u)   s.t.   h(x, u) ≥ 0.
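For the special case of a linear system with quadratic cost (here scalar, with illustrative assumed data: f = a x + b u, L = q x² + r u²), the HJB equation is solved by a quadratic value function J(x, t) = P(t) x², and the PDE reduces to the Riccati ODE −Ṗ = 2aP + q − P²b²/r. A minimal sketch integrating it backwards:

```python
import numpy as np

# scalar LQR data (illustrative assumptions)
a, b, q, r = 1.0, 1.0, 1.0, 1.0
T, dt = 10.0, 1e-3

P = 0.0                                   # terminal condition P(T) = 0, i.e. E = 0
for _ in range(int(T / dt)):
    # explicit Euler step of the Riccati ODE, backwards in time
    P += dt * (2.0 * a * P + q - P**2 * b**2 / r)

# on a long horizon, P(0) approaches the algebraic Riccati solution
P_inf = a * r / b**2 + np.sqrt((a * r / b**2)**2 + q * r / b**2)
```

This is the "analytic solution for linear systems with quadratic cost" mentioned on the next slide: the whole state space is captured by the single scalar (in general: matrix) P(t).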
Dynamic Programming / HJB
▶ “Dynamic Programming” applies to discrete time systems, “HJB” to continuous time systems.
▶ Pros and Cons:
  + Searches the whole state space, finds the global optimum.
  + Optimal feedback controls are precomputed.
  + Analytic solution to some problems possible (linear systems with quadratic cost → Riccati equation).
  + “Viscosity solutions” (Lions et al.) exist for quite general nonlinear problems.
  - But: in general intractable, because it is a partial differential equation (PDE) in a high dimensional state space: the “curse of dimensionality”.
▶ Possible remedy: approximate J, e.g. in the framework of neuro-dynamic programming [Bertsekas 1996].
▶ Used for practical optimal control of small scale systems, e.g. by Bonnans, Zidani, Lee, Back, . . .
Indirect Methods
For simplicity, regard only the problem without inequality constraints:

[Figure: states x(t) and controls u(t) over [0, T], with fixed initial value x_0 and terminal cost E(x(T))]

minimize over x(·), u(·):   ∫_0^T L(x(t), u(t)) dt + E(x(T))
subject to
x(0) − x0 = 0, (fixed initial value)
ẋ(t)−f (x(t), u(t)) = 0, t ∈ [0, T ], (ODE model)
Pontryagin’s Minimum Principle
OBSERVATION: In HJB, the optimal controls

  u*(t) = arg min_u L(x, u) + ∂J/∂x (x, t) f(x, u)

depend only on the derivative ∂J/∂x (x, t), not on J itself!

IDEA: Introduce the adjoint variables

  λ(t) := ∂J/∂x (x(t), t)^T ∈ R^{n_x}

and get the controls from Pontryagin’s Minimum Principle:

  u*(t, x, λ) = arg min_u H(x, u, λ),   with the Hamiltonian H(x, u, λ) := L(x, u) + λ^T f(x, u).

QUESTION: How to obtain λ(t)?
Adjoint Differential Equation
▶ Differentiate the HJB equation

  −∂J/∂t (x, t) = min_u H(x, u, ∂J/∂x (x, t)^T)

with respect to x and obtain:

  −λ̇^T = ∂/∂x ( H(x(t), u*(t, x, λ), λ(t)) ).

▶ Likewise, differentiate J(x, T) = E(x) and obtain the terminal condition

  λ(T)^T = ∂E/∂x (x(T)).
How to obtain explicit expression for controls?
▶ In the simplest case,

  u*(t) = arg min_u H(x(t), u, λ(t))

is defined by

  ∂H/∂u (x(t), u*(t), λ(t)) = 0

(calculus of variations, Euler-Lagrange).
▶ In the presence of path constraints, the expression for u*(t) changes whenever the active constraints change. This leads to state dependent switches.
▶ If the minimum of the Hamiltonian is locally not unique, “singular arcs” occur. Their treatment needs higher order derivatives of H.
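As a small worked example (the model is assumed for illustration, not taken from the slides): for L(x, u) = (x² + u²)/2 and f(x, u) = x + u, the stationarity condition ∂H/∂u = 0 gives the control explicitly:

```latex
% assumed LQ example: L = (x^2 + u^2)/2,  f(x, u) = x + u
H(x, u, \lambda) = \tfrac{1}{2}\left(x^2 + u^2\right) + \lambda\,(x + u),
\qquad
\frac{\partial H}{\partial u} = u + \lambda = 0
\;\Longrightarrow\;
u^*(t) = -\lambda(t).
```

Because H is strictly convex in u here, the stationary point is the unique minimizer and no singular arcs occur.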
Necessary Optimality Conditions
Summarize optimality conditions as boundary value problem:
  x(0) = x_0,   (initial value)
  ẋ(t) = f(x(t), u*(t)),   t ∈ [0, T],   (ODE model)
  −λ̇(t) = ∂H/∂x (x(t), u*(t), λ(t))^T,   t ∈ [0, T],   (adjoint equations)
  u*(t) = arg min_u H(x(t), u, λ(t)),   t ∈ [0, T],   (minimum principle)
  λ(T) = ∂E/∂x (x(T))^T.   (adjoint final value)

Solve with so-called
▶ gradient methods,
▶ shooting methods, or
▶ collocation.
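For the assumed LQ example above (L = (x² + u²)/2, f = x + u, E = 0, so u* = −λ, ẋ = x − λ, −λ̇ = x + λ), the boundary value problem can be solved by a simple shooting method: guess λ(0), integrate the coupled state/adjoint ODE forward, and adjust λ(0) by bisection until λ(T) = 0. A minimal numpy sketch (horizon, step count, and bisection bracket are assumptions):

```python
import numpy as np

T, n = 3.0, 600
dt = T / n

def rhs(z):
    """Coupled state/adjoint ODE: xdot = x - lam, lamdot = -x - lam."""
    x, lam = z
    return np.array([x - lam, -x - lam])

def shoot(lam0, x0=1.0):
    """RK4-integrate forward from (x0, lam0); return lambda(T)."""
    z = np.array([x0, lam0])
    for _ in range(n):
        k1 = rhs(z); k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2); k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z[1]

# bisection on the unknown initial adjoint; for this example
# shoot(0) < 0 < shoot(5), so the bracket [0, 5] contains the root
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if shoot(lo) * shoot(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
lam0 = 0.5 * (lo + hi)
```

The coupled ODE has eigenvalues ±√2, one strongly unstable mode: this is the "strongly nonlinear and unstable" difficulty of indirect shooting in miniature, already visible for a linear example.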
Indirect Methods
▶ “First optimize, then discretize”
▶ Pros and Cons:
  + Boundary value problem with only 2 × n_x ODEs.
  + Can treat large scale systems.
  - Only necessary conditions for local optimality.
  - Needs an explicit expression for u*(t); singular arcs are difficult to treat.
  - The ODE is strongly nonlinear and unstable.
  - Inequalities lead to an ODE with state dependent switches.
    Possible remedy: use an interior point method in function space, e.g. Weiser and Deuflhard, Bonnans and Laurent-Varin.
▶ Used for optimal control e.g. in satellite orbit planning at CNES . . .
Direct Methods
▶ “First discretize, then optimize”
▶ Transcribe the infinite dimensional problem into a finite dimensional Nonlinear Programming Problem (NLP), and solve the NLP.
▶ Pros and Cons:
  + Can use state-of-the-art methods for the NLP solution.
  + Can treat inequality constraints and multipoint constraints much more easily.
  - Obtains only a suboptimal/approximate solution.
▶ Nowadays the most commonly used methods due to their easy applicability and robustness.
Direct Methods Overview
We treat three direct methods:
▶ Direct Single Shooting (sequential simulation and optimization)
▶ Direct Collocation (simultaneous simulation and optimization)
▶ Direct Multiple Shooting (simultaneous resp. hybrid)
Direct Single Shooting [Hicks1971,Sargent1978]
Discretize controls u(t) on fixed grid 0 = t0 < t1 < . . . < tN = T ,
regard states x(t) on [0, T ] as dependent variables.
[Figure: states x(t; q) and piecewise constant discretized controls u(t; q) with values q_0, q_1, . . . , q_{N−1} over the grid]

Use numerical integration to obtain the state as a function x(t; q) of the finitely many control parameters q = (q_0, q_1, . . . , q_{N−1}).
NLP in Direct Single Shooting
After control discretization and numerical ODE solution, obtain
NLP:
minimize over q:   ∫_0^T L(x(t; q), u(t; q)) dt + E(x(T; q))

subject to

  h(x(t_i; q), u(t_i; q)) ≥ 0,   i = 0, . . . , N,   (discretized path constraints)
  r(x(T; q)) ≥ 0.   (terminal constraints)
Solve with a finite dimensional optimization solver, e.g. by Sequential Quadratic Programming (SQP).
Solution by Standard SQP
Summarize the problem as

  min_q F(q)   s.t.   H(q) ≥ 0.

Solve e.g. by Sequential Quadratic Programming (SQP), starting with a guess q^0 for the controls; set k := 0.

1. Evaluate F(q^k), H(q^k) by ODE solution, and their derivatives!
2. Compute the correction Δq^k by solution of the QP:

  min_{Δq} ∇F(q^k)^T Δq + ½ Δq^T A^k Δq   s.t.   H(q^k) + ∇H(q^k)^T Δq ≥ 0.

3. Perform the step q^{k+1} = q^k + α_k Δq^k with the step length α_k determined by line search.
ODE Sensitivities
How to compute the sensitivity ∂x(t; q)/∂q of a numerical ODE solution x(t; q) with respect to the controls q?
Four ways:
1. External Numerical Differentiation (END)
2. Variational Differential Equations
3. Automatic Differentiation
4. Internal Numerical Differentiation (IND)
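As an illustration of way 2 (and, in spirit, of way 4): for a single constant control parameter q, the sensitivity S(t) = ∂x/∂q solves the variational ODE Ṡ = (∂f/∂x) S + ∂f/∂q with S(0) = 0. A minimal sketch for the scalar model ẋ = (1 + x)x + u with u(t) ≡ q (the Euler scheme, step counts, and the value of q in the check are assumptions):

```python
# variational differential equation for a single constant control u(t) = q:
#   Sdot = (df/dx) S + df/dq,  S(0) = 0,  with  f(x, q) = (1 + x) x + q
def f(x, q):  return (1.0 + x) * x + q
def fx(x, q): return 1.0 + 2.0 * x        # df/dx
def fq(x, q): return 1.0                  # df/dq

def integrate(q, x0=0.05, T=1.0, n=1000):
    """Euler-integrate state and sensitivity; returns (x(T), dx(T)/dq).

    Propagating S with the same discrete scheme as x (old x on the
    right-hand side) makes S the exact derivative of the discrete
    solution -- the idea behind Internal Numerical Differentiation."""
    dt = T / n
    x, S = x0, 0.0
    for _ in range(n):
        x, S = x + dt * f(x, q), S + dt * (fx(x, q) * S + fq(x, q))
    return x, S
```

A central finite difference on `integrate` (way 1, END) reproduces S to rounding accuracy here precisely because the integrator is a fixed-step scheme; with an adaptive solver, END would also differentiate the step-size control, which is why IND freezes the adaptive components instead.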
Numerical Test Problem
minimize over x(·), u(·):   ∫_0^3 x(t)^2 + u(t)^2 dt

subject to

  x(0) = x_0,   (initial value)
  ẋ(t) = (1 + x(t)) x(t) + u(t),   t ∈ [0, 3],   (ODE model)
  −1 ≤ x(t) ≤ 1,   −1 ≤ u(t) ≤ 1,   t ∈ [0, 3],   (bounds)
  x(3) = 0.   (zero terminal constraint)

Remark: uncontrollable growth for (1 + x_0) x_0 − 1 ≥ 0 ⇔ x_0 ≥ 0.618.
Single Shooting Optimization for x0 = 0.05
▶ Choose N = 30 equal control intervals.
▶ Initialize with the steady state controls u(t) ≡ 0.
▶ The initial value x_0 = 0.05 is the maximum possible, because the initial trajectory explodes otherwise.
Single Shooting: Initialization
Single Shooting: First Iteration
Single Shooting: 2nd Iteration
Single Shooting: 3rd Iteration
Single Shooting: 4th Iteration
Single Shooting: 5th Iteration
Single Shooting: 6th Iteration
Single Shooting: 7th Iteration and Solution
Direct Single Shooting: Pros and Cons
▶ Sequential simulation and optimization.
  + Can use state-of-the-art ODE/DAE solvers.
  + Few degrees of freedom even for large ODE/DAE systems.
  + Active set changes are easily treated.
  + Needs only an initial guess for the controls q.
  - Cannot use knowledge of x in the initialization (e.g. in tracking problems).
  - The ODE solution x(t; q) can depend very nonlinearly on q.
  - Unstable systems are difficult to treat.
▶ Often used in engineering applications, e.g. in the packages gOPT (PSE), DYOS (Marquardt), . . .
Direct Collocation (Sketch) [Tsang1975]
▶ Discretize controls and states on a fine grid with node values s_i ≈ x(t_i).
▶ Replace the infinite ODE

  0 = ẋ(t) − f(x(t), u(t)),   t ∈ [0, T]

by finitely many equality constraints

  c_i(q_i, s_i, s_{i+1}) = 0,   i = 0, . . . , N − 1,

e.g.

  c_i(q_i, s_i, s_{i+1}) := (s_{i+1} − s_i)/(t_{i+1} − t_i) − f((s_i + s_{i+1})/2, q_i).

▶ Approximate also the integrals, e.g.

  ∫_{t_i}^{t_{i+1}} L(x(t), u(t)) dt ≈ l_i(q_i, s_i, s_{i+1}) := L((s_i + s_{i+1})/2, q_i) (t_{i+1} − t_i).
NLP in Direct Collocation
After discretization, obtain a large scale but sparse NLP:

minimize over s, q:   ∑_{i=0}^{N−1} l_i(q_i, s_i, s_{i+1}) + E(s_N)

subject to

  s_0 − x_0 = 0,   (fixed initial value)
  c_i(q_i, s_i, s_{i+1}) = 0,   i = 0, . . . , N − 1,   (discretized ODE model)
  h(s_i, q_i) ≥ 0,   i = 0, . . . , N,   (discretized path constraints)
  r(s_N) ≥ 0.   (terminal constraints)
Solve e.g. with SQP method for sparse problems.
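A minimal collocation sketch for the test problem above, using the midpoint choices for c_i and l_i from the previous slide (the reduced N and the dense general-purpose SLSQP solver are assumptions; a real implementation would exploit the sparsity):

```python
import numpy as np
from scipy.optimize import minimize

T, N, x0 = 3.0, 15, 0.05
h = T / N
f = lambda x, u: (1.0 + x) * x + u      # test problem ODE

def dyn(w):
    """Collocation equalities c_i(q_i, s_i, s_{i+1}) = 0 (midpoint rule)."""
    s, q = w[:N + 1], w[N + 1:]         # w = (s_0..s_N, q_0..q_{N-1})
    xm = 0.5 * (s[:-1] + s[1:])
    return (s[1:] - s[:-1]) / h - f(xm, q)

def obj(w):
    """Sum of l_i = L(midpoint, q_i) * h; E(s_N) = 0 for this problem."""
    s, q = w[:N + 1], w[N + 1:]
    xm = 0.5 * (s[:-1] + s[1:])
    return h * np.sum(xm**2 + q**2)

cons = [{"type": "eq", "fun": dyn},
        {"type": "eq", "fun": lambda w: np.array([w[0] - x0, w[N]])}]
w0 = np.zeros(2 * N + 1)
w0[0] = x0                              # initialize all nodes/controls at 0
res = minimize(obj, w0, method="SLSQP",
               bounds=[(-1.0, 1.0)] * (2 * N + 1), constraints=cons,
               options={"maxiter": 300, "ftol": 1e-9})
```

Here the state bounds become simple variable bounds on the s_i, and each constraint c_i couples only the neighboring variables (q_i, s_i, s_{i+1}), which is exactly the sparsity pattern discussed above.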
What is a sparse NLP?
The general NLP

  min_w F(w)   s.t.   G(w) = 0,   H(w) ≥ 0

is called sparse if the Jacobians (derivative matrices)

  ∇_w G^T = ∂G/∂w = ( ∂G_i/∂w_j )_{ij}   and   ∇_w H^T

contain many zero elements. In SQP methods, this makes the QPs much cheaper to build and to solve.
Direct Collocation: Pros and Cons
▶ Simultaneous simulation and optimization.
  + Large scale, but very sparse NLP.
  + Can use knowledge of x in the initialization.
  + Can treat unstable systems well.
  + Robust handling of path and terminal constraints.
  - Adaptivity needs a new grid, which changes the NLP dimensions.
▶ Successfully used for practical optimal control e.g. by Biegler and Wächter (IPOPT), Betts, . . .
Direct Multiple Shooting [Bock 1984]
▶ Discretize the controls piecewise constant on a coarse grid:

  u(t) = q_i   for t ∈ [t_i, t_{i+1}].

▶ Solve the ODE on each interval [t_i, t_{i+1}] numerically, starting from an artificial initial value s_i:

  ẋ_i(t; s_i, q_i) = f(x_i(t; s_i, q_i), q_i),   t ∈ [t_i, t_{i+1}],
  x_i(t_i; s_i, q_i) = s_i.

Obtain the trajectory pieces x_i(t; s_i, q_i).
▶ Also numerically compute the integrals

  l_i(s_i, q_i) := ∫_{t_i}^{t_{i+1}} L(x_i(t; s_i, q_i), q_i) dt.
Sketch of Direct Multiple Shooting
[Figure: trajectory pieces x_i(t; s_i, q_i) started at the node values s_0, s_1, . . . , s_N with piecewise constant controls q_0, . . . , q_{N−1} on the grid t_0 < t_1 < . . . < t_N; in general x_i(t_{i+1}; s_i, q_i) ≠ s_{i+1} before convergence]
NLP in Direct Multiple Shooting
[Figure: discontinuous state trajectory over the multiple shooting grid]

minimize over s, q:   ∑_{i=0}^{N−1} l_i(s_i, q_i) + E(s_N)

subject to

  s_0 − x_0 = 0,   (initial value)
  s_{i+1} − x_i(t_{i+1}; s_i, q_i) = 0,   i = 0, . . . , N − 1,   (continuity)
  h(s_i, q_i) ≥ 0,   i = 0, . . . , N,   (discretized path constraints)
  r(s_N) ≥ 0.   (terminal constraints)
Structured NLP
▶ Summarize all variables as w := (s_0, q_0, s_1, q_1, . . . , s_N).
▶ Obtain the structured NLP

  min_w F(w)   s.t.   G(w) = 0,   H(w) ≥ 0.

▶ The Jacobian ∇G(w^k)^T contains the dynamic model equations.
▶ The Jacobians and the Hessian of the NLP are block sparse, which can be exploited in the numerical solution procedure.
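A minimal multiple shooting sketch for the test problem (the reduced N, the fixed-step RK4 integrator, and the dense SLSQP solver are assumptions; a structure-exploiting SQP would use the block sparsity noted above):

```python
import numpy as np
from scipy.optimize import minimize

T, N, M, x0 = 3.0, 15, 5, 0.05      # horizon, intervals, RK4 substeps, x(0)
h = T / N
f = lambda x, u: (1.0 + x) * x + u  # test problem ODE

def interval(s, q):
    """RK4 over one shooting interval; returns x_i(t_{i+1}; s_i, q_i)
    and the integrated stage cost l_i (rectangle rule)."""
    x, cost, dt = s, 0.0, h / M
    for _ in range(M):
        k1 = f(x, q); k2 = f(x + 0.5 * dt * k1, q)
        k3 = f(x + 0.5 * dt * k2, q); k4 = f(x + dt * k3, q)
        cost += dt * (x**2 + q**2)
        x += dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x, cost

def obj(w):                          # w = (s_0..s_N, q_0..q_{N-1})
    s, q = w[:N + 1], w[N + 1:]
    return sum(interval(s[i], q[i])[1] for i in range(N))

def cont(w):
    """Continuity s_{i+1} = x_i(t_{i+1}; s_i, q_i), plus boundary conditions."""
    s, q = w[:N + 1], w[N + 1:]
    gaps = [s[i + 1] - interval(s[i], q[i])[0] for i in range(N)]
    return np.array(gaps + [s[0] - x0, s[N]])

w0 = np.zeros(2 * N + 1)
w0[0] = x0                           # initialize node values/controls at 0
res = minimize(obj, w0, method="SLSQP",
               bounds=[(-1.0, 1.0)] * (2 * N + 1),
               constraints=[{"type": "eq", "fun": cont}],
               options={"maxiter": 300, "ftol": 1e-9})
```

Compared to the single shooting sketch, the node values s_i are extra variables: each integration only spans one short interval, which tames the nonlinearity, and the intervals could be simulated in parallel.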
Test Example: Initialization with u(t) ≡ 0
Multiple Shooting: First Iteration
Multiple Shooting: 2nd Iteration
Multiple Shooting: 3rd Iteration and Solution
Direct Multiple Shooting: Pros and Cons
▶ Simultaneous simulation and optimization.
  + Uses adaptive ODE/DAE solvers,
  + but the NLP has fixed dimensions.
  + Can use knowledge of x in the initialization (here bounds; more important in an online context).
  + Can treat unstable systems well.
  + Robust handling of path and terminal constraints.
  + Easy to parallelize.
  - Not as sparse as collocation.
▶ Used for practical optimal control e.g. by Franke (ABB) (“HQP”), Terwen (Daimler); Bock et al. (“MUSCOD-II”); in the ACADO Toolkit; . . .
Conclusions: Optimal Control Family Tree
Three basic families:

▶ Hamilton-Jacobi-Bellman Equation: tabulation in state space
▶ Indirect Methods, Pontryagin: solve a boundary value problem
▶ Direct Methods: transform into a Nonlinear Program (NLP), with three variants:
  ▶ Single Shooting: only discretized controls in the NLP (sequential)
  ▶ Collocation: discretized controls and states in the NLP (simultaneous)
  ▶ Multiple Shooting: controls and node start values in the NLP (simultaneous/hybrid)
Literature
▶ T. Binder, L. Blank, H. G. Bock, R. Bulirsch, W. Dahmen, M. Diehl, T. Kronseder, W. Marquardt, J. P. Schlöder, and O. v. Stryk: Introduction to Model Based Optimization of Chemical Processes on Moving Horizons. In Grötschel, Krumke, Rambau (eds.): Online Optimization of Large Scale Systems: State of the Art, Springer, 2001, pp. 295–340.
▶ John T. Betts: Practical Methods for Optimal Control Using Nonlinear Programming. SIAM, Philadelphia, 2001. ISBN 0-89871-488-5.
▶ Dimitri P. Bertsekas: Dynamic Programming and Optimal Control. Athena Scientific, Belmont, 2000 (Vol. I, ISBN 1-886529-09-4) & 2001 (Vol. II, ISBN 1-886529-27-2).
▶ A. E. Bryson and Y. C. Ho: Applied Optimal Control. Hemisphere/Wiley, 1975.