Pontryagin's Maximum Principle Overview

The document summarizes Pontryagin's maximum principle for determining optimal control trajectories. It describes how the principle uses costates to represent the gradient of the optimal cost-to-go function, shows that the principle can be derived from the Hamilton-Jacobi-Bellman equation, gives conditions for optimal trajectories and controls in both continuous and discrete time, and provides examples of applying the principle to specific dynamics and cost functions.


Pontryagin’s maximum principle

Emo Todorov

Applied Mathematics and Computer Science & Engineering


University of Washington

Winter 2012

Emo Todorov (UW) AMATH/CSE 579, Winter 2012 Lecture 5 1/9



Pontryagin’s maximum principle
For deterministic dynamics ẋ = f(x, u) we can compute extremal open-loop
trajectories (i.e. local minima) by solving a boundary-value ODE problem
with given x(0) and λ(T) = ∂_x q_T(x(T)), where λ(t) is the gradient of the
optimal cost-to-go function (called the costate).

Definition (deterministic Hamiltonian)


H(x, u, λ) ≜ ℓ(x, u) + f(x, u)^T λ

Theorem (continuous-time maximum principle)

If x(t), u(t), 0 ≤ t ≤ T is the optimal state-control trajectory starting at x(0), then
there exists a costate trajectory λ(t) with λ(T) = ∂_x q_T(x(T)) satisfying

ẋ = H_λ(x, u, λ) = f(x, u)
λ̇ = −H_x(x, u, λ) = −ℓ_x(x, u) − f_x(x, u)^T λ
u = arg min_ũ H(x, ũ, λ)
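As a concrete illustration (not from the slides), the theorem can be exercised on a toy problem that is solvable by hand, so the single-shooting answer can be checked analytically. All constants below are assumptions chosen for illustration.

```python
# Single-shooting sketch for a hand-solvable toy problem (constants assumed
# for illustration): dynamics xdot = u, running cost l = 0.5*u^2, terminal
# cost q_T(x) = 0.5*(x - 1)^2. Then H = 0.5*u^2 + u*lam, the minimizing
# control is u = -lam, and lamdot = -H_x = 0, so the extremal ODE is
# xdot = -lam with boundary conditions x(0) = 0 and lam(T) = x(T) - 1.
T, steps = 1.0, 100
h = T / steps

def integrate(lam0, x0=0.0):
    """Integrate the extremal ODE forward from a guessed lam(0)."""
    x, lam = x0, lam0
    for _ in range(steps):
        x += h * (-lam)        # Euler; exact here since lam is constant
    return x, lam

def residual(lam0):
    """Mismatch in the terminal boundary condition lam(T) = dq_T/dx."""
    x_T, lam_T = integrate(lam0)
    return lam_T - (x_T - 1.0)

# secant iteration on the shooting residual
a, b = -2.0, 2.0
for _ in range(50):
    fa, fb = residual(a), residual(b)
    if abs(fb) < 1e-12 or fb == fa:
        break
    a, b = b, b - fb * (b - a) / (fb - fa)

lam0 = b
x_T, _ = integrate(lam0)
# analytic solution: lam = (x(0) - 1)/(1 + T) = -0.5, x(T) = 0.5
```

Because the residual is linear in λ(0) here, the secant iteration converges immediately; for nonlinear dynamics the same structure applies but the root-find is genuinely iterative.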



Derivation from the HJB equation (continuous time)
For deterministic dynamics ẋ = f(x, u) the optimal cost-to-go in the
finite-horizon setting satisfies the HJB equation

−v_t(x, t) = min_u { ℓ(x, u) + f(x, u)^T v_x(x, t) } = min_u H(x, u, v_x(x, t))

If the optimal control law is π(x, t), we can set u = π and drop the 'min':

0 = v_t(x, t) + ℓ(x, π(x, t)) + f(x, π(x, t))^T v_x(x, t)

Now differentiate w.r.t. x and suppress the dependences for clarity:

0 = v_tx + ℓ_x + π_x^T ℓ_u + f_x^T v_x + π_x^T f_u^T v_x + v_xx f

Using the identity v̇_x = v_tx + v_xx f and regrouping yields

0 = v̇_x + ℓ_x + f_x^T v_x + π_x^T (ℓ_u + f_u^T v_x) = v̇_x + H_x + π_x^T H_u

Since u is optimal we have H_u = 0, thus λ̇ = −H_x(x, π, λ) where λ = v_x.


Derivation via Lagrange multipliers (discrete time)
Optimize total cost subject to dynamics constraints x_{k+1} = f(x_k, u_k).
Define the Lagrangian L(x, u, λ) as

L = q_T(x_N) + Σ_{k=0}^{N−1} [ ℓ(x_k, u_k) + (f(x_k, u_k) − x_{k+1})^T λ_{k+1} ]
  = q_T(x_N) − x_N^T λ_N + x_0^T λ_0 + Σ_{k=0}^{N−1} [ H(x_k, u_k, λ_{k+1}) − x_k^T λ_k ]

Setting Lx = Lλ = 0 and explicitly minimizing w.r.t. u yields

Theorem (discrete-time maximum principle)

If x_k, u_k, 0 ≤ k ≤ N is the optimal state-control trajectory starting at x_0, then there
exists a costate trajectory λ_k with λ_N = ∂_x q_T(x_N) satisfying

x_{k+1} = H_λ(x_k, u_k, λ_{k+1}) = f(x_k, u_k)
λ_k = H_x(x_k, u_k, λ_{k+1}) = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T λ_{k+1}
u_k = arg min_ũ H(x_k, ũ, λ_{k+1})



Gradient of the total cost

The maximum principle provides an efficient way to evaluate the gradient of
the total cost w.r.t. u, and thereby optimize the controls numerically.

Theorem (gradient)
For given control trajectory u_k, let x_k, λ_k be such that

x_{k+1} = f(x_k, u_k)
λ_k = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T λ_{k+1}

with x_0 given and λ_N = ∂_x q_T(x_N). Let J(x, u) be the total cost. Then

∂J(x, u)/∂u_k = H_u(x_k, u_k, λ_{k+1}) = ℓ_u(x_k, u_k) + f_u(x_k, u_k)^T λ_{k+1}

Note that x_k can be found in a forward pass (since it does not depend on λ),
and then λ_k can be found in a backward pass.
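The forward/backward recipe translates directly into code. The sketch below uses a hypothetical scalar-linear example (not from the slides) and checks the adjoint gradient against central finite differences:

```python
import numpy as np

# Hypothetical scalar-linear example (assumed for illustration):
# dynamics x_{k+1} = A x_k + B u_k, cost l = 0.5*(x^2 + u^2), q_T = 0.5*x^2
A, B, N = 0.9, 0.2, 20

def total_cost(x0, u):
    x, J = x0, 0.0
    for k in range(N):
        J += 0.5 * (x ** 2 + u[k] ** 2)
        x = A * x + B * u[k]
    return J + 0.5 * x ** 2

def cost_gradient(x0, u):
    # forward pass: roll out the states (independent of lambda)
    xs = [x0]
    for k in range(N):
        xs.append(A * xs[-1] + B * u[k])
    # backward pass: lam_N = dq_T/dx, lam_k = l_x + f_x^T lam_{k+1},
    # and dJ/du_k = H_u = l_u + f_u^T lam_{k+1}
    lam = xs[N]
    g = np.zeros(N)
    for k in reversed(range(N)):
        g[k] = u[k] + B * lam
        lam = xs[k] + A * lam
    return g

rng = np.random.default_rng(0)
u = rng.standard_normal(N)
g = cost_gradient(1.0, u)

# sanity check against central finite differences
eps = 1e-6
fd_err = 0.0
for k in (0, N // 2, N - 1):
    up, um = u.copy(), u.copy()
    up[k] += eps
    um[k] -= eps
    fd = (total_cost(1.0, up) - total_cost(1.0, um)) / (2 * eps)
    fd_err = max(fd_err, abs(fd - g[k]))
```

The adjoint pass costs roughly one extra rollout, versus N rollouts for a full finite-difference gradient, which is why this construction underlies most trajectory-optimization codes.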



Proof by induction

The cost accumulated from time k until the end can be written recursively as

J_k(x_{k..N}, u_{k..N−1}) = ℓ(x_k, u_k) + J_{k+1}(x_{k+1..N}, u_{k+1..N−1})

Noting that u_k affects future costs only through x_{k+1} = f(x_k, u_k), we have

∂J_k/∂u_k = ℓ_u(x_k, u_k) + f_u(x_k, u_k)^T ∂J_{k+1}/∂x_{k+1}

We need to show that λ_k = ∂J_k/∂x_k. For k = N this holds because J_N = q_T.
For k < N we have

∂J_k/∂x_k = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T ∂J_{k+1}/∂x_{k+1}

which is identical to λ_k = ℓ_x(x_k, u_k) + f_x(x_k, u_k)^T λ_{k+1}.



Enforcing terminal states

The final state x(T) is usually different from the minimum of the final
cost q_T, because it reflects a trade-off between final and running cost.
We can enforce a desired terminal state as a boundary condition on x(T)
and remove the boundary condition on λ(T).
Once the solution is found, we can construct a function q_T such that
λ(T) = ∂_x q_T(x(T)). However, if λ(T) ≠ 0 then x(T) is not the
minimum of this q_T.
We can also define the problem as infinite-horizon average cost, in which
case it is usually suboptimal to have an asymptotic state different from
the minimum of the state cost function. The maximum principle does not
apply to infinite-horizon problems, so one has to use the HJB equations.




More tractable problems

When the dynamics and cost are in the restricted form

ẋ = a(x) + B u
ℓ(x, u) = q(x) + (1/2) u^T R u

the Hamiltonian can be minimized analytically (H_u = 0 gives u = −R^{−1} B^T λ),
which yields the ODE

ẋ = a(x) − B R^{−1} B^T λ
λ̇ = −q_x(x) − a_x(x)^T λ

with boundary conditions x(0) and λ(T) = ∂_x q_T(x). If B, R depend on x,
the second equation has additional terms involving the derivatives of B, R.

We have H_u = R(x) u + B(x)^T λ and H_uu = R(x) ≻ 0. Thus the maximum
principle here is both a necessary and a sufficient condition for a local
minimum.
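Since R ≻ 0 makes H strictly convex in u, the closed-form minimizer can be checked numerically. The sketch below uses random placeholder values for a(x), B, R, λ at a fixed x (q(x) is constant in u and is dropped), and verifies that no perturbation of u = −R^{−1}B^T λ lowers the Hamiltonian:

```python
import numpy as np

# Numeric check of the closed-form minimizer u = -R^{-1} B^T lam.
# a, Bm, R, lam are random placeholders evaluated at a fixed x.
rng = np.random.default_rng(1)
n, m = 3, 2
a = rng.standard_normal(n)
Bm = rng.standard_normal((n, m))
M = rng.standard_normal((m, m))
R = M @ M.T + m * np.eye(m)        # symmetric positive definite
lam = rng.standard_normal(n)

def hamiltonian(u):
    # H(u) = 0.5 u^T R u + (a + B u)^T lam   (q(x) omitted: constant in u)
    return 0.5 * u @ R @ u + (a + Bm @ u) @ lam

u_star = -np.linalg.solve(R, Bm.T @ lam)

# smallest gap between H at perturbed points and H at the minimizer
worst_gap = min(hamiltonian(u_star + 0.1 * rng.standard_normal(m))
                - hamiltonian(u_star) for _ in range(100))
```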



Pendulum example
Passive dynamics:

a(x) = [ x2 ; −k sin(x1) ]

a_x(x) = [ 0, 1 ; −k cos(x1), 0 ]

Optimal control:

u = −(1/r) λ2

ODE (with q = 0):

ẋ1 = x2
ẋ2 = −k sin(x1) − (1/r) λ2
λ̇1 = k cos(x1) λ2
λ̇2 = −λ1
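Combining the pendulum dynamics with the gradient result from the discrete-time slides, the controls can be optimized numerically. The sketch below is an Euler-discretized version; the constants k, r, the step size, the horizon, and the upright target are all illustrative assumptions, and plain gradient descent with an adjoint backward pass is used rather than a boundary-value solver:

```python
import numpy as np

# Euler-discretized pendulum via adjoint gradient descent.
# Constants (k, r, h, N) and the upright target are illustrative assumptions.
k, r, h, N = 1.0, 0.1, 0.01, 200
x_target = np.array([np.pi, 0.0])      # upright position, zero velocity

def step(x, u):
    # discrete dynamics x_{j+1} = x_j + h * (a(x_j) + B u_j), B = (0, 1)^T
    return x + h * np.array([x[1], -k * np.sin(x[0]) + u])

def rollout(u):
    xs = [np.array([0.0, 0.0])]        # start hanging at rest
    for j in range(N):
        xs.append(step(xs[-1], u[j]))
    return xs

def cost(u):
    xs = rollout(u)
    return 0.5 * h * r * np.sum(u ** 2) + 0.5 * np.sum((xs[-1] - x_target) ** 2)

def grad(u):
    xs = rollout(u)
    lam = xs[-1] - x_target            # lam_N = dq_T/dx
    g = np.zeros(N)
    for j in reversed(range(N)):
        x = xs[j]
        fx = np.eye(2) + h * np.array([[0.0, 1.0],
                                       [-k * np.cos(x[0]), 0.0]])
        fu = h * np.array([0.0, 1.0])
        g[j] = h * r * u[j] + fu @ lam  # H_u = l_u + f_u^T lam_{j+1}
        lam = fx.T @ lam                # l_x = 0 here since q = 0
    return g

u = np.zeros(N)
cost_before = cost(u)
for _ in range(200):
    u -= 0.5 * grad(u)                 # fixed step size, chosen conservatively
cost_after = cost(u)
```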



[Figures omitted: cost-to-go and trajectories; control law computed from the HJB equation.]
