Pontryagin’s maximum principle
Emo Todorov
Applied Mathematics and Computer Science & Engineering
University of Washington
Winter 2012
Emo Todorov (UW) AMATH/CSE 579, Winter 2012 Lecture 5 1/9
Pontryagin’s maximum principle

For deterministic dynamics ẋ = f(x, u) we can compute extremal open-loop
trajectories (i.e. local minima) by solving a boundary-value ODE problem
with given x(0) and λ(T) = ∂/∂x q_T(x), where λ(t) is the gradient of the
optimal cost-to-go function (called the costate).

Definition (deterministic Hamiltonian)

    H(x, u, λ) ≜ ℓ(x, u) + f(x, u)^T λ

Theorem (continuous-time maximum principle)

If x(t), u(t), 0 ≤ t ≤ T is the optimal state-control trajectory starting
at x(0), then there exists a costate trajectory λ(t) with λ(T) = ∂/∂x q_T(x)
satisfying

    ẋ = H_λ(x, u, λ) = f(x, u)
    −λ̇ = H_x(x, u, λ) = ℓ_x(x, u) + f_x(x, u)^T λ
    u = argmin_ũ H(x, ũ, λ)
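As a sketch of how such boundary-value problems can be attacked numerically, here is a minimal single-shooting example on a made-up scalar linear-quadratic problem (the constants a, b, q, r, qf are assumptions, not from the slides): guess λ(0), integrate the Hamiltonian ODE forward, and adjust λ(0) until the terminal condition λ(T) = ∂/∂x q_T(x(T)) holds.

```python
import numpy as np

# Single-shooting sketch for the PMP boundary-value problem on a
# hypothetical scalar LQ problem, where the minimized Hamiltonian gives
#   xdot = a x - (b^2/r) lam,   lamdot = -q x - a lam
# with x(0) given and lam(T) = qf * x(T) (gradient of final cost qf x^2/2).
a, b, q, r, qf = -1.0, 1.0, 1.0, 1.0, 1.0
x0, T, n = 1.0, 1.0, 1000
dt = T / n

def shoot(lam0):
    # integrate the Hamiltonian ODE forward from a guessed initial costate
    x, lam = x0, lam0
    for _ in range(n):
        x, lam = (x + dt * (a * x - b**2 / r * lam),
                  lam + dt * (-q * x - a * lam))
    return x, lam

def residual(lam0):
    x, lam = shoot(lam0)
    return lam - qf * x          # terminal condition lam(T) = qf x(T)

# for a linear system the residual is affine in lam0,
# so a single secant step solves the boundary condition exactly
r0, r1 = residual(0.0), residual(1.0)
lam0 = -r0 / (r1 - r0)
assert abs(residual(lam0)) < 1e-9
```

For nonlinear dynamics the residual is no longer affine, and the secant step becomes one iteration of a root-finding loop on λ(0).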
Derivation from the HJB equation (continuous time)

For deterministic dynamics ẋ = f(x, u) the optimal cost-to-go in the
finite-horizon setting satisfies the HJB equation

    −v_t(x, t) = min_u { ℓ(x, u) + f(x, u)^T v_x(x, t) } = min_u H(x, u, v_x(x, t))

If the optimal control law is π(x, t), we can set u = π and drop the ’min’:

    0 = v_t(x, t) + ℓ(x, π(x, t)) + f(x, π(x, t))^T v_x(x, t)

Now differentiate w.r.t. x and suppress the dependences for clarity:

    0 = v_tx + ℓ_x + π_x^T ℓ_u + (f_x + f_u π_x)^T v_x + v_xx f

Using the identity v̇_x = v_tx + v_xx f and regrouping yields

    0 = v̇_x + ℓ_x + f_x^T v_x + π_x^T (ℓ_u + f_u^T v_x) = v̇_x + H_x + π_x^T H_u

Since u is optimal we have H_u = 0, thus −λ̇ = H_x(x, π, λ) where λ = v_x.
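To make the identity λ = v_x concrete, here is a small numerical check on a hypothetical scalar LQ problem (all constants are made up). There the cost-to-go is known to be v(x, t) = s(t) x²/2 with s(t) given by a Riccati equation, so λ(t) = s(t) x(t) along the optimal trajectory should satisfy the costate equation −λ̇ = H_x.

```python
import numpy as np

# Check that lambda = v_x on a hypothetical scalar LQ problem:
# dynamics xdot = a x + b u, cost l = (q x^2 + r u^2)/2, final cost qf x^2/2.
# Here v(x,t) = s(t) x^2/2 with s(t) from the Riccati equation, so the
# costate lambda(t) = s(t) x(t) should satisfy -lamdot = H_x = q x + a lam.
a, b, q, r, qf, T = -0.5, 1.0, 1.0, 0.5, 2.0, 1.0
n = 20000
dt = T / n

# backward Riccati pass: -sdot = q + 2 a s - s^2 b^2 / r, with s(T) = qf
s = np.empty(n + 1)
s[n] = qf
for i in range(n, 0, -1):
    sdot = -(q + 2 * a * s[i] - s[i] ** 2 * b ** 2 / r)
    s[i - 1] = s[i] - dt * sdot

# forward pass under the optimal control u = -(b/r) s x
x = np.empty(n + 1)
x[0] = 1.0
for i in range(n):
    x[i + 1] = x[i] + dt * (a * x[i] - (b ** 2 / r) * s[i] * x[i])

lam = s * x                        # candidate costate lambda = v_x
lamdot = np.gradient(lam, dt)      # numerical time derivative
residual = np.max(np.abs(lamdot + q * x + a * lam)[5:-5])
assert residual < 1e-2             # -lamdot = H_x, up to O(dt) Euler error
```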
Derivation via Lagrange multipliers (discrete time)

Optimize total cost subject to dynamics constraints x_{k+1} = f(x_k, u_k).
Define the Lagrangian L(x, u, λ) as

    L = q_T(x_N) + ∑_{k=0}^{N−1} [ ℓ(x_k, u_k) + (f(x_k, u_k) − x_{k+1})^T λ_{k+1} ]
      = q_T(x_N) − x_N^T λ_N + x_0^T λ_0 + ∑_{k=0}^{N−1} [ H(x_k, u_k, λ_{k+1}) − x_k^T λ_k ]

Setting L_x = L_λ = 0 and explicitly minimizing w.r.t. u yields

Theorem (discrete-time maximum principle)

If x_k, u_k, 0 ≤ k ≤ N is the optimal state-control trajectory starting at x_0,
then there exists a costate trajectory λ_k with λ_N = ∂/∂x q_T(x_N) satisfying

    x_{k+1} = H_λ(x_k, u_k, λ_{k+1}) = f(x_k, u_k)
    λ_k = H_x(x_k, u_k, λ_{k+1}) = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T λ_{k+1}
    u_k = argmin_ũ H(x_k, ũ, λ_{k+1})
Gradient of the total cost

The maximum principle provides an efficient way to evaluate the gradient of
the total cost w.r.t. u, and thereby optimize the controls numerically.

Theorem (gradient)

For a given control trajectory u_k, let x_k, λ_k be such that

    x_{k+1} = f(x_k, u_k)
    λ_k = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T λ_{k+1}

with x_0 given and λ_N = ∂/∂x q_T(x_N). Let J(x, u) be the total cost. Then

    ∂J/∂u_k = H_u(x_k, u_k, λ_{k+1}) = ℓ_u(x_k, u_k) + f_u(x_k, u_k)^T λ_{k+1}

Note that x_k can be found in a forward pass (since it does not depend on λ),
and then λ_k can be found in a backward pass.
Proof by induction

The cost accumulated from time k until the end can be written recursively as

    J_k(x_{k···N}, u_{k···N−1}) = ℓ(x_k, u_k) + J_{k+1}(x_{k+1···N}, u_{k+1···N−1})

Noting that u_k affects future costs only through x_{k+1} = f(x_k, u_k), we have

    ∂J_k/∂u_k = ℓ_u(x_k, u_k) + f_u(x_k, u_k)^T ∂J_{k+1}/∂x_{k+1}

We need to show that λ_k = ∂J_k/∂x_k. For k = N this holds because J_N = q_T.
For k < N we have

    ∂J_k/∂x_k = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T ∂J_{k+1}/∂x_{k+1}

which is identical to λ_k = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T λ_{k+1}.
Enforcing terminal states

• The final state x(T) is usually different from the minimum of the final
  cost q_T, because it reflects a trade-off between final and running cost.

• We can instead enforce a given terminal state x(T) = x* as a boundary
  condition, and remove the boundary condition on λ(T).

• Once the solution is found, we can construct a function q_T such that
  λ(T) = ∂/∂x q_T(x(T)). However, if λ(T) ≠ 0 then x(T) is not the
  minimum of this q_T.

• We can also define the problem as infinite-horizon average cost, in which
  case it is usually suboptimal to have an asymptotic state different from
  the minimum of the state cost function. The maximum principle does not
  apply to infinite-horizon problems, so one has to use the HJB equations.
More tractable problems

When the dynamics and cost are in the restricted form

    ẋ = a(x) + B u
    ℓ(x, u) = q(x) + (1/2) u^T R u

the Hamiltonian can be minimized analytically, which yields the ODE

    ẋ = a(x) − B R⁻¹ B^T λ
    λ̇ = −q_x(x) − a_x(x)^T λ

with boundary conditions x(0) and λ(T) = ∂/∂x q_T(x). If B, R depend on x,
the second equation has additional terms involving the derivatives of B, R.

We have H_u = R(x) u + B(x)^T λ and H_uu = R(x) ≻ 0. Thus the maximum
principle here is both a necessary and a sufficient condition for a local
minimum.
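The analytic minimization can be verified numerically: setting H_u = R u + B^T λ to zero gives u* = −R⁻¹ B^T λ, and since H_uu = R is positive definite, u* should beat every perturbed control. A small sketch with arbitrary made-up matrices:

```python
import numpy as np

# For the restricted form, the u-dependent part of the Hamiltonian is
#   H(u) = u^T R u / 2 + (B u)^T lam,
# minimized at u* = -R^{-1} B^T lam. Quick check with made-up values.
rng = np.random.default_rng(0)
n, m = 3, 2
B = rng.standard_normal((n, m))
M = rng.standard_normal((m, m))
R = M @ M.T + m * np.eye(m)       # symmetric positive definite
lam = rng.standard_normal(n)

H = lambda u: 0.5 * u @ R @ u + (B @ u) @ lam
u_star = -np.linalg.solve(R, B.T @ lam)

# every perturbed control gives a strictly larger Hamiltonian,
# since H(u* + du) - H(u*) = du^T R du / 2 > 0 for R positive definite
for _ in range(100):
    du = rng.standard_normal(m)
    assert H(u_star + du) > H(u_star)
```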
Pendulum example

Passive dynamics:

    a(x) = [ x2 ; −k sin(x1) ]

    a_x(x) = [ 0, 1 ; −k cos(x1), 0 ]

Optimal control:

    u = −(1/r) λ2

ODE (with q = 0):

    ẋ1 = x2
    ẋ2 = −k sin(x1) − (1/r) λ2
    λ̇1 = k cos(x1) λ2
    λ̇2 = −λ1

[Figure: cost-to-go and trajectories]
[Figure: control law (from HJB)]
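A discrete-time sketch of this example: plain gradient descent on the control trajectory using the maximum-principle gradient (forward state pass, backward costate pass). The horizon, cost weights, terminal target, and line search are assumptions added for illustration, not from the slides.

```python
import numpy as np

# Discrete-time pendulum swing-up sketch via the maximum-principle gradient.
# All constants here are made up for illustration.
k_grav, r, dt, N, w = 1.0, 0.1, 0.05, 100, 10.0

def f(x, u):
    # Euler-discretized pendulum, x = (angle, velocity)
    return np.array([x[0] + dt * x[1],
                     x[1] + dt * (-k_grav * np.sin(x[0]) + u)])

def fx(x):
    # Jacobian of f w.r.t. x
    return np.array([[1.0, dt],
                     [-dt * k_grav * np.cos(x[0]), 1.0]])

target = np.array([np.pi, 0.0])   # swing up to the inverted position

def cost(U):
    x, J = np.zeros(2), 0.0
    for u in U:
        J += 0.5 * dt * r * u**2
        x = f(x, u)
    return J + 0.5 * w * np.sum((x - target) ** 2)

def gradient(U):
    X = [np.zeros(2)]
    for u in U:                          # forward pass (independent of lam)
        X.append(f(X[-1], u))
    lam = w * (X[-1] - target)           # lam_N = dqT/dx
    G = np.empty(N)
    for k in reversed(range(N)):         # backward costate pass
        G[k] = dt * r * U[k] + dt * lam[1]   # H_u = l_u + f_u^T lam_{k+1}
        lam = fx(X[k]).T @ lam           # l_x = 0: running cost has no x term
    return G

# gradient descent with a simple backtracking line search
U = np.zeros(N)
c = cost(U)
for _ in range(200):
    G, step = gradient(U), 1.0
    while step > 1e-12 and cost(U - step * G) >= c:
        step *= 0.5
    if step <= 1e-12:
        break
    U = U - step * G
    c = cost(U)
# c is now below the cost of the zero-control trajectory
```

This is exactly the forward/backward scheme from the gradient theorem, applied to the pendulum dynamics above; a practical implementation would use a better optimizer (e.g. quasi-Newton) on the same gradient.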