
2016 IEEE International Conference on Robotics and Automation (ICRA)

Stockholm, Sweden, May 16-21, 2016

Aggressive Driving with Model Predictive Path Integral Control


Grady Williams, Paul Drews, Brian Goldfain, James M. Rehg, and Evangelos A. Theodorou

Abstract— In this paper we present a model predictive control algorithm designed for optimizing non-linear systems subject to complex cost criteria. The algorithm is based on a stochastic optimal control framework using a fundamental relationship between the information theoretic notions of free energy and relative entropy. The optimal controls in this setting take the form of a path integral, which we approximate using an efficient importance sampling scheme. We experimentally verify the algorithm by implementing it on a Graphics Processing Unit (GPU) and apply it to the problem of controlling a fifth-scale Auto-Rally vehicle in an aggressive driving task.

I. INTRODUCTION

Aggressive driving at the limits of handling is a challenging robotics problem which has received significant attention in recent years. Most current algorithms approach the problem hierarchically: first plan a trajectory considering only the position and velocity of the vehicle, and then use a simple feedback controller to track the planned trajectory. This approach works well for vehicles operating in non-aggressive regions of the state space and was used by the winning teams in the DARPA Grand Challenges in 2005 and 2007 [1], [2].

The hierarchical approach has also been applied to the more challenging problem of controlling a vehicle near its limit of friction. In this case it is necessary to ensure that a planned path is feasible, which can be very difficult. If the coefficient of friction between the road and vehicle is known, then optimal velocity profiles can be found [3] for pre-specified paths. And in the case where both the desired path and a feasible velocity profile are given, several methods exist for robustly tracking the given path [4], [5]. A method for high speed maneuvers which does not simply track a pre-specified trajectory is developed and deployed on small-scale vehicles in a laboratory environment in [6]. The algorithm relies on a series of pre-specified waypoints, and the planner finds a feasible interpolation between the waypoints which a model predictive controller then tracks.

These approaches highlight some of the advantages and disadvantages of the hierarchical paradigm of splitting control and planning. Excellent performance (e.g. the DARPA Grand Challenge) can be achieved if the problem specifications are safely within the handling limits of the vehicle, or if the path and velocity profile are determined by the problem structure. However, generating paths which are both aggressive and feasible is a difficult constrained optimization problem, and knowing the constraint requires precise knowledge of the system dynamics in the form of the coefficient of friction.

Another approach, which we consider here, is the optimal control approach, which finds a sequence of controls or a control law that is optimal with respect to a given cost and system dynamics. This approach combines planning and execution into one phase by explicitly taking the dynamics into account during optimization. The problem with the optimal control approach is that, for systems with non-linear dynamics and costs, solving the optimal control problem is computationally demanding and cannot typically be done on the fly. There have been approaches which solve for the optimal control inputs offline. For instance, in [7] cornering is posed as a minimum time problem and analyzed offline. A similar method is [8], which solves for an optimal open loop control sequence offline and then applies an LQR controller to stabilize the vehicle about the open loop trajectory. An approach for performing aggressive sliding maneuvers in order to avoid collisions is developed in [9], where optimal trajectories are generated offline for a variety of initial conditions and then a feedback controller is synthesized using Gaussian Process regression. In all of these methods the bulk of the computation is performed offline, which means that new behaviors and maneuvers cannot be produced on the fly. In contrast, the method developed here is able to generate new trajectories in real-time, which allows it to handle a wide range of scenarios.

The method we develop is a model predictive control algorithm based on the path integral control framework. This framework combines the best of the hierarchical and optimal control paradigms. We do not split the problem into a planning and execution phase, which allows for a simple problem formulation and optimal behavior with respect to the system dynamics. However, unlike in previous optimal control approaches, we perform all of the computation on the fly by leveraging recent advances in GPU computing.

This research has been supported by NSF Grant No. NRI-1426945, ARO through MURI award W911NF-11-1-0046, and DURIP award W911NF-12-1-0377. The authors are with the Institute for Robotics and Intelligent Machines at the Georgia Institute of Technology, Atlanta, GA, USA. Email: [email protected]

II. STOCHASTIC TRAJECTORY OPTIMIZATION

In this section we derive a method for stochastic trajectory optimization based on the path integral control framework. This method serves as the basis for our model predictive control algorithm. Path integral control [10] provides a mathematically sound methodology for developing optimal control algorithms based on stochastic sampling of trajectories. One appealing aspect of algorithms developed in the path integral framework is that they require no derivatives of either the dynamics or the cost. This provides flexibility and robustness when estimating the system dynamics and designing cost functions.



The typical derivation of path integral control relies on an exponential transformation of the value function of the optimal control problem. Additionally, it requires that the noise in the system only affects directly actuated states. The exponential transform, along with the noise assumption, transforms the stochastic Hamilton-Jacobi-Bellman equation into a linear partial differential equation, which can then be transformed into a path integral using the Feynman-Kac lemma. The ensuing path integral takes the form of an expectation over trajectories under the uncontrolled dynamics of the system, which is then approximated using Monte-Carlo sampling to compute the control for the current state. This is the approach used in the path integral control frameworks of [10]–[15].

The derivation we provide here has two benefits over the standard path integral approach: (1) it relaxes the condition between the noise and control authority to allow for noise in both indirectly and directly actuated states, and (2) it explicitly provides a formula for the optimal controls over the entire time horizon instead of only the initial time. The key insight which allows for this is the use of a fundamental relationship [16], [17] between the information theoretic notions of free energy and relative entropy (also known as KL-divergence). This relationship can be used to derive the form of an optimal probability distribution over trajectories. We can then solve the optimal control problem by directly minimizing the relative entropy between the optimal distribution and the distribution induced by the controller.

This approach can be viewed as the cross-entropy method applied to stochastic diffusion processes, which was originally studied in [18]. In that work, the strategy was used to fit the parameters of a feedback-control policy offline. In contrast, we do not restrict the controls to a pre-specified policy and perform all computation online. Two crucial concepts in this approach are the Radon-Nikodym derivative, which relates two probability measures to each other, and Girsanov's theorem, which provides the form of the Radon-Nikodym derivative for systems subject to Brownian disturbances.

A. Problem Formulation

We consider stochastic dynamical systems with the state and controls at time $t$ denoted $x_t \in \mathbb{R}^n$ and $u_t \in \mathbb{R}^m$. These dynamics are disturbed by the Brownian motion $dw \in \mathbb{R}^p$. We also define $u(\cdot): [t_0, T] \to \mathbb{R}^m$ as the function which maps time to control inputs, and define the function $\tau: [t_0, T] \to \mathbb{R}^n$ as a trajectory of the system. In the classical stochastic optimal control setting we seek a control sequence $u(\cdot)$ such that:

$$u^*(\cdot) = \operatorname*{argmin}_{u(\cdot)} \; \mathbb{E}_{\mathbb{Q}}\left[ \phi(x_T, T) + \int_{t_0}^T L(x_t, u_t, t)\,dt \right] \quad (1)$$

where the expectation is taken with respect to the dynamics $dx = F(x_t, u_t, t)\,dt + B(x_t, t)\,dw$. In this paper we consider costs composed of an arbitrary state-dependent term and a quadratic control cost:

$$L(x_t, u_t, t) = q(x_t, t) + \tfrac{1}{2} u_t^T R(x_t, t)\, u_t \quad (2)$$

and dynamics which are affine in controls:

$$F(x_t, u_t, t) = f(x_t, t) + G(x_t, t)\, u_t \quad (3)$$

In [16] an interpretation of path integral control is given in terms of the information theoretic concepts of free energy and relative entropy. The interpretation is based on the following equality:

$$-\lambda \mathcal{F}(S(\tau)) = \inf_{\mathbb{Q}} \Big[ \mathbb{E}_{\mathbb{Q}}[S(\tau)] + \lambda\, \mathbb{D}_{KL}(\mathbb{Q} \,\|\, \mathbb{P}) \Big] \quad (4)$$

In this equation $\lambda \in \mathbb{R}^+$, $S(\tau)$ is defined as the state-dependent cost-to-go term $\phi(x_T, T) + \int_{t_0}^T q(x_t, t)\,dt$, the free energy $\mathcal{F}(S(\tau))$ is defined as $\log \mathbb{E}_{\mathbb{P}}\!\left[\exp\!\left(-\tfrac{1}{\lambda} S(\tau)\right)\right]$, $\mathbb{P}$ is the probability measure over the space of trajectories induced by the uncontrolled stochastic dynamics $f(x_t, t)\,dt + B(x_t, t)\,dw$, and $\mathbb{Q}$ is any probability measure defined over the space of trajectories such that $\mathbb{Q}$ is absolutely continuous with $\mathbb{P}$ (i.e. they agree on which sets have measure zero). The term $\mathbb{D}_{KL}(\mathbb{Q} \,\|\, \mathbb{P})$ denotes the relative entropy between $\mathbb{Q}$ and $\mathbb{P}$, which is defined as $\mathbb{E}_{\mathbb{Q}}\!\left[\log \tfrac{d\mathbb{Q}}{d\mathbb{P}}\right]$.

The controlled dynamics induce another probability measure on the space of trajectories, which we denote $\mathbb{Q}(u)$. The relative entropy between the uncontrolled distribution $\mathbb{P}$ and the controlled distribution $\mathbb{Q}(u)$ can be computed by applying Girsanov's theorem. The result is that:

$$\mathbb{D}_{KL}(\mathbb{Q}(u) \,\|\, \mathbb{P}) = \frac{1}{2} \int_{t_0}^T u_t^T\, G(x_t, t)^T \Sigma(x_t, t)^{-1} G(x_t, t)\, u_t \, dt \quad (5)$$

where $\Sigma(x_t, t) = B(x_t, t) B(x_t, t)^T$. Applying this result, and if we assume that the control cost matrix takes the form:

$$R(x_t, t) = \lambda\, G(x_t, t)^T \Sigma(x_t, t)^{-1} G(x_t, t) \quad (6)$$

we get the following correspondence between the right hand side of (4) and (1):

$$\mathbb{E}_{\mathbb{Q}(u)}[S(\tau)] + \lambda\, \mathbb{D}_{KL}(\mathbb{Q} \,\|\, \mathbb{P}) = \mathbb{E}_{\mathbb{Q}(u)}\left[ S(\tau) + \frac{1}{2}\int_{t_0}^T u_t^T R(x_t, t)\, u_t \, dt \right] \quad (7)$$

Moreover, [16] explicitly derives the form of the optimal probability measure $\mathbb{Q}^*$ in terms of the Radon-Nikodym derivative with respect to the uncontrolled dynamics (the Radon-Nikodym derivative defines the measure $\mathbb{Q}^*$ by means of the equation $\mathbb{Q}^*(A) = \int_A \frac{d\mathbb{Q}^*}{d\mathbb{P}}\, d\mathbb{P}$). This takes the form:

$$\frac{d\mathbb{Q}^*}{d\mathbb{P}} = \frac{\exp\!\left(-\tfrac{1}{\lambda} S(\tau)\right)}{\mathbb{E}_{\mathbb{P}}\!\left[\exp\!\left(-\tfrac{1}{\lambda} S(\tau)\right)\right]} \quad (8)$$

Previous works [16], [17] make these connections between the information theoretic notions of free energy, relative entropy, and classical optimal control theory. However, they do not provide a method for computing a control law independent of the HJB equation, as we do here.
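The equality (4) is easy to check numerically on a finite sample space, where measures reduce to probability vectors and the infimum is attained at the density (8). The sketch below is our illustration, not part of the original paper; it verifies that evaluating the right hand side of (4) at the optimal distribution reproduces the free energy on the left:

    import numpy as np

    np.random.seed(0)
    n, lam = 10, 0.5
    P = np.random.dirichlet(np.ones(n))   # base measure over n "trajectories"
    S = np.random.uniform(0.0, 5.0, n)    # cost-to-go of each trajectory

    # Left hand side of (4): -lambda * log E_P[exp(-S/lambda)]
    lhs = -lam * np.log(np.sum(P * np.exp(-S / lam)))

    # Candidate optimum (8): Q* proportional to P * exp(-S/lambda)
    Q = P * np.exp(-S / lam)
    Q /= Q.sum()

    # Right hand side of (4) evaluated at Q*: E_Q[S] + lambda * KL(Q || P)
    rhs = np.sum(Q * S) + lam * np.sum(Q * np.log(Q / P))

    print(lhs, rhs)   # the two agree to machine precision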

The main idea in our approach is that, since we have the form of the optimal distribution, it is possible to pursue the following optimization scheme: instead of trying to directly solve the optimal control problem by computing the solution to the stochastic HJB equation, we can solve the minimization problem defined by (4) by moving the probability distribution induced by the controller, $\mathbb{Q}(u)$, as close as possible to the optimal probability measure, $\mathbb{Q}^*$, defined by the Radon-Nikodym derivative $\frac{d\mathbb{Q}^*}{d\mathbb{P}}$. Using the relative entropy between $\mathbb{Q}^*$ and $\mathbb{Q}(u)$ as a notion of distance, we obtain the following minimization problem:

$$u^*(\cdot) = \operatorname*{argmin}_{u(\cdot)} \; \mathbb{D}_{KL}(\mathbb{Q}^* \,\|\, \mathbb{Q}(u)) \quad (9)$$

This is an intuitively appealing strategy, since the optimal probability measure $\mathbb{Q}^*$ takes the form of a softmax function with temperature $\lambda$. If a trajectory has low cost, then $\frac{d\mathbb{Q}^*}{d\mathbb{P}}$ has a high value. Therefore, if a trajectory is drawn at random from $\mathbb{Q}^*$, it is likely to have a very low cost. If $\mathbb{Q}(u)$ is able to closely approximate $\mathbb{Q}^*$, then applying the controls $u(\cdot)$ to the system will have a high probability of generating a low cost trajectory. This objective is closely related to the Bayesian approximate inference approach to stochastic optimal control [19]. In that approach, the relative entropy between the controlled distribution and a target distribution is minimized ($\mathbb{D}_{KL}(\mathbb{Q}(u) \,\|\, \mathbb{Q}^*)$ in our notation). However, since relative entropy is not symmetric, this leads to a very different optimization problem than what is derived here.

B. Relative Entropy Minimization

We seek to minimize the relative entropy between the optimal distribution $\mathbb{Q}^*$ and the distribution induced by the controller $\mathbb{Q}(u)$. Applying the definition of relative entropy we have:

$$\mathbb{D}_{KL}(\mathbb{Q}^* \,\|\, \mathbb{Q}(u)) = \mathbb{E}_{\mathbb{Q}^*}\left[ \log\left( \frac{d\mathbb{Q}^*}{d\mathbb{Q}(u)} \right) \right] \quad (10)$$

In order to optimize this function we need to find an expression for the Radon-Nikodym derivative $\frac{d\mathbb{Q}^*}{d\mathbb{Q}(u)}$. To do this, we apply the chain rule property of Radon-Nikodym derivatives:

$$\frac{d\mathbb{Q}^*}{d\mathbb{Q}(u)} = \frac{d\mathbb{Q}^*}{d\mathbb{P}} \frac{d\mathbb{P}}{d\mathbb{Q}(u)} \quad (11)$$

We know what $\frac{d\mathbb{Q}^*}{d\mathbb{P}}$ is from the definition of $\mathbb{Q}^*$, and we can apply Girsanov's theorem again to compute $\frac{d\mathbb{P}}{d\mathbb{Q}(u)}$ as:

$$\frac{d\mathbb{P}}{d\mathbb{Q}(u)} = \exp(D(\tau, u(\cdot))) \quad (12)$$

with $D(\tau, u(\cdot))$ defined as:

$$D(\tau, u(\cdot)) = -\int_{t_0}^T u_t^T\, G(x_t, t)^T \Sigma(x_t, t)^{-1} B(x_t, t)\,dw^{(0)} + \frac{1}{2}\int_{t_0}^T u_t^T\, G(x_t, t)^T \Sigma(x_t, t)^{-1} G(x_t, t)\, u_t \, dt \quad (13)$$

where $dw^{(0)}$ is a Brownian motion with respect to $\mathbb{P}$ (i.e. $\mathbb{E}_{\mathbb{P}}\!\left[\int_0^t dw^{(0)}\right] = 0, \; \forall t$). Combining this with (8), we get the following expression for $\mathbb{D}_{KL}(\mathbb{Q}^* \,\|\, \mathbb{Q}(u))$:

$$\mathbb{E}_{\mathbb{Q}^*}\left[ \log\left( \frac{\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right) \exp(D(\tau, u(\cdot)))}{\mathbb{E}_{\mathbb{P}}\!\left[\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right)\right]} \right) \right] \quad (14)$$

Then, using basic rules of logarithms and exponents, we can re-write this as:

$$\mathbb{E}_{\mathbb{Q}^*}\!\left[ -\tfrac{1}{\lambda}S(\tau) + D(\tau, u(\cdot)) \right] - \log \mathbb{E}_{\mathbb{P}}\!\left[\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right)\right] \quad (15)$$

Since $S(\tau)$ does not depend on the controls $u(\cdot)$, we can remove the first and last terms from the minimization to get:

$$\operatorname*{argmin}_{u(\cdot)} \mathbb{D}_{KL}(\mathbb{Q}^* \,\|\, \mathbb{Q}(u)) = \operatorname*{argmin}_{u(\cdot)} \mathbb{E}_{\mathbb{Q}^*}[D(\tau, u(\cdot))] \quad (16)$$

The goal is to find the function $u^*(\cdot)$ which minimizes (16). However, since we inevitably apply the control in discrete time, it suffices to consider the class of step functions:

$$u_t = u_j \quad \text{if} \quad j\Delta t \le t < (j+1)\Delta t \quad (17)$$

with $j \in \{0, 1, \dots, N\}$. Applying this parameterization to $D(\tau, u(\cdot))$ yields:

$$\sum_{j=0}^{N} \left( -u_j^T \int_{t_j}^{t_{j+1}} \mathcal{G}(x_t, t)\,dw^{(0)} + \frac{1}{2}\, u_j^T \left[ \int_{t_j}^{t_{j+1}} \mathcal{H}(x_t, t)\,dt \right] u_j \right) \quad (18)$$

where we have denoted:
i) $\mathcal{G}(x, t) = G(x, t)^T \Sigma(x, t)^{-1} B(x, t)$
ii) $\mathcal{H}(x, t) = G(x, t)^T \Sigma(x, t)^{-1} G(x, t)$
iii) $N = T / \Delta t$

By noting that each $u_j$ does not depend on the trajectory taken, when we apply the expectation operator we have $\mathbb{E}_{\mathbb{Q}^*}[D(\tau, u(\cdot))]$ equal to the following:

$$\mathbb{E}_{\mathbb{Q}^*}[D(\tau, u(\cdot))] = -\sum_{j=0}^{N} u_j^T\, \mathbb{E}_{\mathbb{Q}^*}\!\left[ \int_{t_j}^{t_{j+1}} \mathcal{G}(x_t, t)\,dw^{(0)} \right] + \sum_{j=0}^{N} \frac{1}{2}\, u_j^T\, \mathbb{E}_{\mathbb{Q}^*}\!\left[ \int_{t_j}^{t_{j+1}} \mathcal{H}(x_t, t)\,dt \right] u_j \quad (19)$$

Now it is easy to see that this is convex with respect to each $u_j$, so to find $u_j^*$ we can take the gradient of (19) with respect to $u_j$, set it equal to zero, and solve for $u_j$. Doing this yields:

$$u_j^* = \mathbb{E}_{\mathbb{Q}^*}\!\left[ \int_{t_j}^{t_{j+1}} \mathcal{H}(x_t, t)\,dt \right]^{-1} \mathbb{E}_{\mathbb{Q}^*}\!\left[ \int_{t_j}^{t_{j+1}} \mathcal{G}(x_t, t)\,dw^{(0)} \right] \quad (20)$$

For small $\Delta t$ we can make the approximations:

$$\int_{t_j}^{t_{j+1}} \mathcal{H}(x_t, t)\,dt \approx \mathcal{H}(x_{t_j}, t_j)\Delta t, \qquad \int_{t_j}^{t_{j+1}} \mathcal{G}(x_t, t)\,dw^{(0)} \approx \mathcal{G}(x_{t_j}, t_j) \int_{t_j}^{t_{j+1}} dw^{(0)}$$

which allows us to obtain:

$$u_j^* = \frac{1}{\Delta t}\, \mathbb{E}_{\mathbb{Q}^*}\!\left[ \mathcal{H}(x_{t_j}, t_j) \right]^{-1} \mathbb{E}_{\mathbb{Q}^*}\!\left[ \mathcal{G}(x_{t_j}, t_j) \int_{t_j}^{t_{j+1}} dw^{(0)} \right] \quad (21)$$

Since we cannot sample from the $\mathbb{Q}^*$ distribution, we need to change the expectation to be an expectation with respect to the uncontrolled dynamics, $\mathbb{P}$. We can then directly sample trajectories from $\mathbb{P}$ to approximate the controls. The change in expectation is achieved by applying the Radon-Nikodym derivative:

$$u_j^* = \frac{1}{\Delta t}\, \mathbb{E}_{\mathbb{P}}\!\left[ \frac{\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right) \mathcal{H}(x_{t_j}, t_j)}{\mathbb{E}_{\mathbb{P}}\!\left[\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right)\right]} \right]^{-1} \mathbb{E}_{\mathbb{P}}\!\left[ \frac{\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right) \mathcal{G}(x_{t_j}, t_j) \int_{t_j}^{t_{j+1}} dw^{(0)}}{\mathbb{E}_{\mathbb{P}}\!\left[\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right)\right]} \right] \quad (22)$$

C. Special Case: State Independent Control Matrix

In this section we consider a special case of (22) which we use in all of our experiments. We suppose that the control matrix and diffusion matrix have the form:

$$G = \begin{pmatrix} 0 \\ G_c \end{pmatrix}, \qquad B(x_t) = \begin{pmatrix} B_a(x_t) & 0 \\ 0 & B_c \end{pmatrix} \quad (23)$$

In other words, there are no correlations between noise in the directly actuated and non-directly actuated states, and the diffusion for the directly actuated states is state-independent. In this case the covariance matrix becomes:

$$\Sigma(x) = \begin{pmatrix} B_a(x) B_a(x)^T & 0 \\ 0 & B_c B_c^T \end{pmatrix} \quad (24)$$

The terms $\mathcal{H}(x_{t_j})$ and $\mathcal{G}(x_t)$ are then no longer state dependent and reduce to:

$$\mathcal{H} = G_c^T (B_c B_c^T)^{-1} G_c, \qquad \mathcal{G} = G_c^T (B_c B_c^T)^{-1} B_c \quad (25)$$

And since all of these matrices are state-independent, we can pull them out of the expectations to get:

$$u_j^* = \mathcal{H}^{-1}\mathcal{G}\; \mathbb{E}_{\mathbb{P}}\!\left[ \frac{\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right)}{\mathbb{E}_{\mathbb{P}}\!\left[\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right)\right]} \left( \frac{1}{\Delta t}\int_{t_j}^{t_{j+1}} dw^{(0)} \right) \right] \quad (26)$$

D. Numerical Approximation

In order to numerically approximate (26) there are two problems that need to be addressed. First, we need to re-write the equation for sampling in discrete time. Second, the expectation is with respect to the uncontrolled dynamics, which in many cases is a very inefficient distribution to sample from (sampling from $\mathbb{P}$ corresponds to turning the machine on and waiting for the natural noise in the system to produce useful trajectories). So we need a way to perform importance sampling with (26).

In discrete time the dynamics of the system transform into:

$$dx_{t_j} = \left( f(x_{t_j}, t_j) + G(x_{t_j}, t_j)\, u_j \right)\Delta t + B(x_{t_j}, t_j)\,\epsilon_j \sqrt{\Delta t} \quad (27)$$

where $\epsilon_j$ is a vector where each entry is a standard normal random variable. Substituting $\epsilon_j \sqrt{\Delta t}$ into (26) yields:

$$u_j^* = \frac{1}{\Delta t}\, \mathcal{H}^{-1}\mathcal{G}\; \mathbb{E}_{p}\!\left[ \frac{\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right) \epsilon_j \sqrt{\Delta t}}{\mathbb{E}_{p}\!\left[\exp\!\left(-\tfrac{1}{\lambda}S(\tau)\right)\right]} \right] \quad (28)$$

where $p$ is the probability distribution corresponding to the discrete time uncontrolled dynamics (i.e. Equation (27) with $u_{t_j}$ set to zero). Instead of sampling from $p$ we can choose to sample from a different probability distribution $q_{u,\nu}$ which corresponds to sampling from the dynamics:

$$dx_{t_j} = f(x_{t_j}, t_j)\Delta t + G(x_{t_j}, t_j)\, u_{t_j} \Delta t + B_E(x_{t_j}, t_j)\,\epsilon_j\sqrt{\Delta t} \quad (29)$$

The new diffusion matrix $B_E$ is defined as:

$$B_E(x_t) = \begin{pmatrix} B_a(x_t) & 0 \\ 0 & \nu B_c \end{pmatrix} \quad (30)$$

with $\nu \ge 1$. When sampling from this distribution the designer gets to choose:
i) the initial controls about which sampling is centered, and
ii) the magnitude of the exploration variance, defined by $\nu$.

In order to sample from $q_{u,\nu}$ instead of $p$ it is necessary to compute the likelihood ratio between the two distributions. We omit this derivation for brevity and refer the reader to [20]. Inserting the likelihood ratio corresponds to changing the running cost from $q(x_t, t)$ to:

$$\tilde{q}(x_t, u_t, \epsilon_t, t) = q(x_t, t) + \frac{1}{2}\, u_t^T R\, u_t + \lambda\, u_t^T \mathcal{G}\, \frac{\epsilon_t}{\sqrt{\Delta t}} + \frac{\lambda(1 - \nu^{-1})}{2}\, \frac{\epsilon_t^T}{\sqrt{\Delta t}}\, B_c^T \left( B_c B_c^T \right)^{-1} B_c\, \frac{\epsilon_t}{\sqrt{\Delta t}} \quad (31)$$

The first two additional terms can be interpreted as penalties for shifting the mean of the exploration away from zero, and the last term is a penalty for sampling from an over-aggressive variance. Note that $B_c \frac{\epsilon_t}{\sqrt{\Delta t}}$ can be interpreted as the effective change in the control input due to noise.

The update rule (26) also changes when sampling from a distribution with non-zero control input. This is due to the random variable under consideration shifting from the zero mean term $B_c \epsilon_j \sqrt{\Delta t}$ to the non-zero mean term $G_c u \Delta t + B_E \epsilon_j \sqrt{\Delta t}$. By substituting this term into (28) and simplifying, and if we denote $\tilde{S}(\tau)$ as:

$$\tilde{S}(\tau) = \phi(x_T, T) + \sum_{j=0}^{N} \tilde{q}(x_t, u_t, \epsilon_t, t)\,\Delta t \quad (32)$$

we get that (26) becomes the iterative update rule:

$$u_j^* = u_j + \mathcal{H}^{-1}\mathcal{G} \left( \frac{\mathbb{E}_{q_{u,\nu}}\!\left[ \exp\!\left(-\tfrac{1}{\lambda}\tilde{S}(\tau)\right) \frac{\epsilon_j}{\sqrt{\Delta t}} \right]}{\mathbb{E}_{q_{u,\nu}}\!\left[ \exp\!\left(-\tfrac{1}{\lambda}\tilde{S}(\tau)\right) \right]} \right) \quad (33)$$

and the term inside the parentheses is approximated as:

$$\frac{\sum_{k=1}^{K} \exp\!\left(-\tfrac{1}{\lambda}\tilde{S}(\tau_k)\right) \frac{\epsilon_{j,k}}{\sqrt{\Delta t}}}{\sum_{k=1}^{K} \exp\!\left(-\tfrac{1}{\lambda}\tilde{S}(\tau_k)\right)} \quad (34)$$

where each trajectory $\tau_k$ is drawn from the sampling dynamics (29). The iterative update rule can be seen as computing the new control based on a reward weighted average over trajectories. Note that if $G_c$ and $B_c$ are multiples of each other, with $\gamma B_c = G_c$, then $\mathcal{H}^{-1}\mathcal{G} = \gamma^{-1}$ (substituting $G_c = \gamma B_c$ into (25) gives $\mathcal{H} = \gamma^2 M$ and $\mathcal{G} = \gamma M$ for $M = B_c^T(B_c B_c^T)^{-1}B_c$), so the update scales the weighted noise by a single scalar.
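Concretely, (33)-(34) say that the whole control sequence is updated by a softmax-weighted average of the sampled noise sequences. The sketch below is ours (the paper's implementation is a GPU kernel); costs[k] plays the role of S̃(τ_k), eps[k, j] of ε_{j,k}, and the scalar gamma stands in for H⁻¹G, which reduces to a scalar when G_c and B_c are proportional. Subtracting the minimum cost before exponentiating leaves the weights unchanged, since it cancels in the ratio, but avoids numerical underflow:

    import numpy as np

    def mppi_update(u, costs, eps, lam, dt, gamma=1.0):
        """One iteration of the update rule (33)-(34).

        u:     (N, m) current control sequence
        costs: (K,)   sampled costs S-tilde(tau_k)
        eps:   (K, N, m) standard normal noise used in the rollouts
        """
        beta = costs.min()                     # baseline; cancels in the ratio
        w = np.exp(-(costs - beta) / lam)      # unnormalized trajectory weights
        w /= w.sum()                           # softmax weights over samples
        # Reward-weighted average of the noise, scaled by 1/sqrt(dt) as in (34).
        du = np.einsum('k,knm->nm', w, eps) / np.sqrt(dt)
        return u + gamma * du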

III. MODEL PREDICTIVE CONTROL ALGORITHM

Equations (33) and (34) provide an iterative update law which we can apply in a model predictive control setting. In this setting, optimization takes place on the fly: the trajectory is optimized, then a single control input is executed before re-optimization occurs. Since the derivation of path integral control provided here gives a formula for optimizing the entire sequence of controls, as opposed to just the current time instance, we can re-use the un-executed portion of the control sequence to warm start the optimization. This is crucial for the performance of the algorithm since, for a complex system operating at a reasonable control frequency, it is typically only possible to perform a small number of iterations every timestep.

The other key aspect of real-time implementation is that the bulk of the computation involved in the path integral control update law can be done in parallel. We take advantage of this by implementing the sampling step on a Graphics Processing Unit (GPU). Parallel trajectory sampling on a GPU is an efficient operation, which takes less than 15 milliseconds in our implementation, even when sampling thousands of trajectories from complex non-linear dynamics. The description of the model predictive path integral control (MPPI) algorithm is given in Algorithm 1. The simplest approach to running Algorithm 1 on a GPU is to simply parallelize the sampling for-loop using one thread per sample. However, since the vehicle dynamics are fairly complex, we obtained additional speed up by using multiple threads per sample and parallelizing the dynamics predictions as well.

Algorithm 1: Model Predictive Path Integral Control

  Given: K (number of samples); N (number of timesteps);
         (u_0, u_1, ..., u_{N−1}) (initial control sequence);
         Δt, x_{t_0}, f, G, B, B_E (system/sampling dynamics);
         φ, q, R, λ (cost parameters);
         u_init (value to initialize new controls to);

  while task not completed do
      for k ← 0 to K − 1 do
          x ← x_{t_0};
          S̃(τ_k) ← 0;
          for i ← 1 to N − 1 do
              x_{i+1} ← x_i + (f + G u_i)Δt + B_E ε_{i,k} √Δt;
              S̃(τ_k) ← S̃(τ_k) + q̃(x_i, u_i, ε_{i,k}, t_i);
      for i ← 0 to N − 1 do
          u_i ← u_i + H⁻¹G [ Σ_{k=1}^{K} exp(−(1/λ) S̃(τ_k)) (ε_{i,k}/√Δt) ] / [ Σ_{k=1}^{K} exp(−(1/λ) S̃(τ_k)) ];
      send to actuators(u_0);
      for i ← 0 to N − 2 do
          u_i ← u_{i+1};
      u_{N−1} ← u_init;
      update the current state after receiving feedback;
      check for task completion;
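The receding-horizon structure of Algorithm 1 (optimize, execute the first control, shift the remaining sequence forward to warm start the next solve) is easy to see in a compact serial sketch. The toy example below is ours, not the paper's GPU implementation: it runs a simplified MPPI variant, with noise injected directly on the controls, on a 1D double integrator, and all constants are illustrative. The vectorized rollout over all K samples is the CPU analogue of the one-thread-per-sample GPU parallelization described above:

    import numpy as np

    K, N, dt, lam, nu = 512, 40, 0.02, 1.0, 5.0  # illustrative values

    def rollout_costs(x0, U, eps):
        """Simulate K noisy rollouts of a double integrator; return costs."""
        pos = np.full(K, x0[0]); vel = np.full(K, x0[1])
        costs = np.zeros(K)
        for i in range(N):
            a = U[i] + nu * eps[:, i]              # perturbed control (exploration)
            pos, vel = pos + vel * dt, vel + a * dt
            costs += (pos - 1.0)**2 + 0.1 * vel**2  # track the point x = 1
        return costs

    x = np.array([0.0, 0.0])
    U = np.zeros(N)                                 # warm-started control sequence
    for step in range(200):
        eps = np.random.randn(K, N)
        costs = rollout_costs(x, U, eps)
        w = np.exp(-(costs - costs.min()) / lam)
        w /= w.sum()
        U += nu * (w @ eps)                         # reward-weighted update
        # Execute the first control on the "real" system, then shift (warm start).
        x = np.array([x[0] + x[1] * dt, x[1] + U[0] * dt])
        U = np.append(U[1:], 0.0)                   # u_init = 0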

Fig. 1. AutoRally vehicle with protective body removed. A: IMU and GPS enclosure. B: Hall effect wheel speed sensors. C: High performance electric motor. D: Rugged computer and battery enclosure.

IV. AUTONOMOUS DRIVING SYSTEM

The ability of MPPI to handle complex non-linear dynamics and costs on the fly makes it an appealing candidate for autonomously controlling a vehicle near its friction limits. In order to test the controller in a realistic, off-road environment, we applied it to the Auto-Rally vehicle developed at the Georgia Institute of Technology in an aggressive driving scenario.

A. Fifth-Scale Auto-Rally Platform

The Auto-Rally vehicle is based on a 1/5 scale remote control vehicle equipped with an electric motor. It has been modified to host onboard an ASUS Mini-ITX motherboard with an Intel quad-core i7 processor, 16GB RAM, 2 SSDs, an Nvidia GTX-750ti graphics card, and a 222Wh battery. Additionally, it has been equipped with 2 forward facing cameras, a Lord Microstrain 3DM-GX4-25 IMU, an RTK corrected GPS receiver, and hall effect wheel speed sensors. All computational components are housed in a rugged aluminum enclosure able to withstand violent vehicle rollovers with no damage to internal components. The GPS and IMU are housed in a separate protective enclosure at the rear of the vehicle (Fig. 1). The total platform is approximately 22 kg and measures 0.9 meters from front to back. This platform allows fully self-contained testing of all algorithms, with no reliance on external position systems or computation beyond a GPS receiver. The onboard GPU allows the MPPI algorithm to control the vehicle in real time. Our implementation can sample over 2500 trajectories, each 2.5 seconds long, in under 1/60 of a second, allowing state feedback from the vehicle to enter the optimization at 60Hz.

B. State Estimation

In order to accurately determine the vehicle state to feed back into the MPPI controller, the IMU and GPS measurements are combined, and pose, velocity, and bias estimates are obtained at each time step.

To perform this optimization and estimation, a factor graph is constructed with successive measurements and iteratively optimized using the software packages GTSAM and iSAM2 [21]. In order to keep computational loads low while maintaining high accuracy, this graph only optimizes for state nodes at 10Hz, corresponding to GPS measurements [22]. To obtain a state estimate at 200Hz, IMU measurements are integrated to interpolate the 10Hz smoothed position to 200Hz.
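Between the 10Hz factor-graph solutions, the high-rate estimate is produced by integrating IMU measurements forward from the most recent smoothed state. The planar dead-reckoning sketch below is our own simplified illustration of that idea (the actual system uses GTSAM/iSAM2 and full 3D IMU preintegration [21], [22]); the function name and values are hypothetical:

    import numpy as np

    def propagate_imu(state, gyro_z, accel_body, dt):
        """Dead-reckon a planar state (px, py, theta, Ux, Uy) one IMU step."""
        px, py, th, Ux, Uy = state
        th += gyro_z * dt                    # integrate yaw rate
        Ux += accel_body[0] * dt             # integrate body-frame accelerations
        Uy += accel_body[1] * dt
        c, s = np.cos(th), np.sin(th)
        px += (c * Ux - s * Uy) * dt         # rotate body velocity to world frame
        py += (s * Ux + c * Uy) * dt
        return np.array([px, py, th, Ux, Uy])

    # Usage: reset to each 10 Hz smoothed estimate, then propagate at 200 Hz.
    state = np.array([0.0, 0.0, 0.0, 5.0, 0.0])   # hypothetical smoothed state
    for _ in range(20):                            # 20 IMU steps per GPS interval
        state = propagate_imu(state, gyro_z=0.1, accel_body=(0.2, 0.0), dt=0.005)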
C. Data Driven System ID

In order to deploy the MPPI algorithm, an approximate model of the system dynamics is required. We opted for an approach which utilizes information from physics-based analysis together with tools from machine learning in order to perform system identification. Analytic vehicle models were used to identify the important non-linearities in the dynamics; these were then used to construct 24 different basis functions, which we denote $\phi(x)$. We then fit a Bayesian linear model using these basis functions. The result is that each element of the passive dynamics takes the form $f_i(x) \sim \mathcal{N}(\mu_i(x), \sigma_i^2(x))$, where:

$$\mu_i(x) = \frac{1}{\sigma_n^2}\,\phi(\tilde{x})^T A_i^{-1} \Phi(\tilde{X})\, Y_i, \qquad \sigma_i^2(\tilde{x}) = \phi(\tilde{x})^T A_i^{-1} \phi(\tilde{x})$$

where the matrix $A_i = \frac{1}{\sigma_n^2}\Phi(\tilde{X})\Phi(\tilde{X})^T + \Sigma_p^{-1}$. In these equations $\tilde{x}$ is $x - \bar{x}$, where $\bar{x}$ is the mean of the training data; $\Phi(\tilde{X})$ is an $n \times d$ matrix representing the basis functions for all elements of the training set, which has size $d$; $\Sigma_p$ defines the prior over the training data; and $\sigma_n$ is the estimated noise due to observation error. The state space $x$ is composed of the global position in the world frame, $p_x, p_y$; the heading and roll angle of the vehicle, $\theta, \psi$; and the longitudinal and lateral velocity of the vehicle in the body frame, $U_x, U_y$. We also include $\dot{\theta}$, the derivative of the heading angle. The stochastic nature of the algorithm can result in jittery control inputs, so we found it helpful to smooth the controls via the equation $\dot{u} = k(u_{des} - u)$. The MPPI algorithm therefore actually considers $u_{des}$ as the control input of the system, and considers $u$ as a state variable. It is only necessary to learn the dynamics for $\psi$, $U_x$, $U_y$, and $\dot{\theta}$, since the others follow kinematically.

We set $k = 10$ and $B_c = \frac{1}{100} I_{2\times 2}$. The sampling noise was set to 25 times $B_c$. We also experimented with including noise from the passive dynamics in the sampling procedure by using the variance estimate of the model. This corresponds to setting $B_a(x_t)$ equal to a diagonal matrix with $B_a(x_t)_{i,i} = \sigma_i(x_t)$. This results in slightly less aggressive control inputs compared with setting $B_a$ to zero; however, it was unclear what the overall effect on system performance was. During our experiments we include this extra noise in the sampling.

This model provides a relatively fast and accurate method for sampling trajectories in order to estimate (34). However, the modeling error is significant, especially after 1.5 seconds of simulation, so the MPPI algorithm has to be robust to this error in order to achieve good performance. Fig. 2 provides the mean absolute error of the prediction after 0.5, 1.0, 1.5, 2.0, and 2.5 seconds of simulation.

Fig. 2. Mean absolute error and standard deviation of the multi-step prediction as a function of time. (Panels: position error for p_x, p_y (m); velocity error for U_x, U_y (m/s); heading error for θ (rad); heading rate error for θ̇ (rad/s).)
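The Bayesian linear fit above is compact enough to state directly in code. The following sketch is ours; the basis functions and data are toy stand-ins for the paper's 24 hand-designed ones, but the two functions compute A_i, μ_i(x), and σ_i²(x) exactly as in the equations of this section:

    import numpy as np

    def fit_bayesian_linear(Phi, Y, sigma_n, Sigma_p):
        """Posterior for one output dimension i.
        Phi: (n, d) basis matrix Phi(X~) over d training points; Y: (d,) targets."""
        A = Phi @ Phi.T / sigma_n**2 + np.linalg.inv(Sigma_p)   # A_i
        A_inv = np.linalg.inv(A)
        w_mean = A_inv @ Phi @ Y / sigma_n**2                   # posterior weight mean
        return w_mean, A_inv

    def predict(phi_x, w_mean, A_inv):
        """Return mu_i(x) and sigma_i^2(x) for a centered feature vector phi(x~)."""
        return phi_x @ w_mean, phi_x @ A_inv @ phi_x

    # Toy usage with n = 3 stand-in basis functions and d = 50 samples.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, 50)
    Phi = np.vstack([np.ones(50), X, X**2])
    Y = 0.5 * X**2 - X + 0.1 * rng.standard_normal(50)
    w, A_inv = fit_bayesian_linear(Phi, Y, sigma_n=0.1, Sigma_p=np.eye(3))
    mu, var = predict(np.array([1.0, 0.3, 0.09]), w, A_inv)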
V. EXPERIMENTS

We gave the MPPI algorithm the task of navigating the AutoRally platform around a roughly elliptical track (see Fig. 3 for the track dimensions and Fig. 5 for an image of the track) with a maximum outer diameter of 30 meters and a uniform 3 meter wide driving surface. We tested the controller first with a moderate target speed of 6 m/s and then at a higher target speed of 8 m/s. We consider the aggressive regime of the state space to be a target velocity which is above the friction limit of the vehicle while cornering on the test track. This was empirically determined to be between 5 and 6 m/s, with the actual limit varying due to changes in the track conditions.

A. Cost Formulation

We formulated the task of driving around the track as a finite-horizon optimal control problem in order to deploy the MPPI algorithm. The cost function consists of three main parts: (1) a cost for staying on the track, (2) a cost for achieving a desired velocity, and (3) a control cost. In addition to the track cost, if a given trajectory leaves the track, then the dynamics are set to zero and the car remains in its current location for the rest of the simulation. A cost on the sideslip angle was also added, which improved the stability of the vehicle. The final cost is formulated as:

$$2.5\,(V_{des} - V)^2 + 50.0\,h(p_x, p_y)^2 + 40.0\,C + 10.0\,\|u\|^2 \quad (35)$$

Here $V_{des}$ is the desired speed of the vehicle and $V = \sqrt{U_x^2 + U_y^2}$. The function $h(p_x, p_y)$ returns $-1$ if the vehicle is on the inside boundary of the track (or off the track to the inside), $1$ if the vehicle is on the outside boundary of the track (or off the track to the outside), and a value in $[-1, 1]$ if the vehicle is on the track, with $0$ representing the middle. $C$ is an indicator variable which turns on if the side-slip angle exceeds 0.25 radians (14.33 degrees). The coefficients which determine the relative weights of the cost components were set manually through experimentation.

The negative exponentiation of the cost-to-go can be numerically problematic if all of the cost-to-go terms are large. To combat this, we normalize the cost according to the current cost-to-go of the nominal trajectory: this corresponds to multiplying by $\frac{\alpha}{\tilde{S}_n}$, where $\tilde{S}_n$ is the current cost-to-go, instead of by $\frac{1}{\lambda}$. We set $\alpha = 10$.
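For illustration, the running cost (35) can be transcribed directly; the sketch below is ours. The paper does not give a closed form for h(p_x, p_y), so track_signed_pos uses a hypothetical circular track geometry, and the side-slip angle uses one common definition (arctan of the lateral-to-longitudinal velocity ratio). The weights helper applies the adaptive inverse temperature α/S̃_n described above in place of 1/λ:

    import numpy as np

    V_DES = 6.0   # desired speed; 6 or 8 m/s in the experiments

    def track_signed_pos(px, py):
        """Stand-in for h(p_x, p_y): -1 at the inner boundary, +1 at the outer
        boundary, 0 on the centerline. Hypothetical geometry: circular
        centerline of radius 12 m with 1.5 m half-width."""
        return np.clip((np.hypot(px, py) - 12.0) / 1.5, -1.0, 1.0)

    def running_cost(state, u):
        px, py, theta, psi, Ux, Uy, theta_dot = state
        V = np.hypot(Ux, Uy)
        slip = abs(np.arctan2(Uy, max(abs(Ux), 1e-6)))  # assumed side-slip def.
        C = 1.0 if slip > 0.25 else 0.0                  # indicator term in (35)
        h = track_signed_pos(px, py)
        return 2.5*(V_DES - V)**2 + 50.0*h**2 + 40.0*C + 10.0*float(u @ u)

    def weights(costs, S_nominal, alpha=10.0):
        """Trajectory weights with the normalized temperature alpha / S_n."""
        w = np.exp(-(alpha / S_nominal) * (costs - costs.min()))
        return w / w.sum()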
B. 6 m/s Target Velocity

For navigating the track at a moderate velocity we set the speed cost in (35) to 6 m/s. Although we chose this as the moderate speed target, it is important to note that this is still a challenging control task, since cornering at 6 m/s is beyond the friction limits of the vehicle. For instance, a constant speed PID controller following a pre-specified path of waypoints will start to slip and lose control around corners at 5 m/s. The MPPI controller enters turns at a little under 5 m/s and then accelerates out of them while sliding a small amount (indicated by the non-zero lateral velocity). Fig. 4 shows the velocity profile of the vehicle during the trial, and Fig. 3 shows the path of the vehicle during a 2 minute long run. There are a couple of occasions where the vehicle's path diverges from the rest of the trajectories. In these scenarios the car applies throttle while the back wheels have broken loose from the ground. This causes the tires to spin instead of providing forward acceleration. The result is that the vehicle over-steers around the corner. This behaviour is caused by a mismatch between the learned and actual dynamics of the vehicle. Despite these disturbances, the controller is quickly able to re-converge on a trajectory and move the car back towards the center of the track.

Fig. 3. Paths taken for 6 m/s and 8 m/s targets. Top: path of the vehicle's movement at the 6 m/s setting. Bottom: paths at 8 m/s.

Fig. 4. Velocity (|U_x|, |U_y|, ‖(U_x, U_y)‖) at the 6 m/s and 8 m/s targets. Top: 6 m/s target trial. Bottom: 8 m/s target trial.

C. 8 m/s Target Velocity

For a high target speed we chose 8 m/s. For this size of track it is only possible for the vehicle to be at this speed for a small amount of time, if at all, before it has to start slowing down to maneuver the corners. There is a significant difference, both qualitative and quantitative, between this setting and the 6 m/s setting. Table I highlights the quantitative difference in performance metrics between the two trials. The vehicle takes more velocity into the turns, at slightly above 5 m/s, and then makes a sharp turn while sliding a significant amount. The amount of sliding is indicated by the body frame lateral velocity (i.e. the velocity perpendicular to the forward facing direction of the car); during the 8 m/s trial run this velocity regularly exceeds 2.5 m/s during cornering and reaches as high as 4 m/s (approx. 9 mph) during some turns.

TABLE I. Performance metrics for 6 m/s and 8 m/s test runs.

  Performance Metric       | 6 m/s target | 8 m/s target
  Average Speed            | 5.67 m/s     | 6.15 m/s
  Top Speed                | 7.75 m/s     | 7.98 m/s
  Average Lap Time         | 11.26 s      | 10.04 s
  Maximum Side-Slip Angle  | 43.99°       | 64.83°

VI. CONCLUSIONS

In this work we have derived a sampling based MPC algorithm and experimentally verified it on a challenging real system. The MPPI algorithm relies on a new derivation of path integral control which is based on relative entropy minimization. The benefits of this approach are that it explicitly provides a formula for the controls over the entire time horizon, as opposed to the first timestep only, and it relaxes the usual condition between control authority and noise required in path integral control. In addition to this theoretical contribution, we have given a demonstration of the algorithm in an aggressive autonomous driving scenario using a fifth-scale Auto-Rally vehicle. The MPPI controller is able to maneuver the vehicle around the track, even when the desired speed is set beyond the friction limits of the vehicle,

and it is even able to perform controlled power slides around the corners (Fig. 5).

These behaviors highlight the advantages of this approach over the hierarchical and traditional optimal control approaches. In the hierarchical case, the path planner has very limited knowledge of the vehicle/track dynamics and has to either be conservative or use expert knowledge imparted by the control designer to select feasible trajectories. And, unlike previous optimal control approaches which perform most of the computation offline, the MPPI algorithm is able to generate completely new behaviors on the fly. This enables the controller to push the vehicle to its operational limits using only an approximate model of the system and a simple cost function.

Finally, we would like to conclude by discussing the advantages of the proposed framework when compared to standard optimal control methodologies using Dynamic Programming. In particular, the derivation of the optimal control is performed without the requirement to make connections with the Hamilton-Jacobi-Bellman PDE, as is typically required in stochastic optimal control. Therefore the minimization of the relative entropy in (9) corresponds to an alternative stochastic optimal control formulation that goes beyond the standard HJB-PDE approach and Dynamic Programming.

Fig. 5. Sliding through turns. Top: velocity profile of the Auto-Rally vehicle cornering. Middle: path taken and heading progression. Bottom: time lapse video of the maneuver.

REFERENCES

[1] C. Urmson et al., "Tartan racing: A multi-modal approach to the DARPA urban challenge," 2007.
[2] S. Thrun et al., "Stanley: The robot that won the DARPA grand challenge," Journal of Field Robotics, vol. 23, no. 9, pp. 661–692, 2006.
[3] E. Velenis and P. Tsiotras, "Minimum-time travel for a vehicle with acceleration limits: Theoretical analysis and receding-horizon implementation," Journal of Optimization Theory and Applications, vol. 138, no. 2, pp. 275–296, 2008.
[4] K. L. Talvala, K. Kritayakirana, and J. C. Gerdes, "Pushing the limits: From lanekeeping to autonomous racing," Annual Reviews in Control, pp. 137–148, 2011.
[5] K. Kritayakirana and J. C. Gerdes, "Using the centre of percussion to design a steering controller for an autonomous race car," Vehicle System Dynamics, vol. 50, no. sup1, pp. 33–51, 2012.
[6] N. Keivan and G. Sibley, "Realtime simulation-in-the-loop control for agile ground vehicles," in Towards Autonomous Robotic Systems: 14th Annual Conference, 2013.
[7] E. Velenis, P. Tsiotras, and J. Lu, "Modeling aggressive maneuvers on loose surfaces: The cases of trail-braking and pendulum-turn," in Control Conference (ECC), 2007 European. IEEE, 2007, pp. 1233–1240.
[8] M. Gerdts, S. Karrenberg, B. Müller-Beßler, and G. Stock, "Generating locally optimal trajectories for an automatically driven car," Optimization and Engineering, vol. 10, no. 4, pp. 439–463, 2009.
[9] P. Tsiotras and R. S. Diaz, "Real-time near-optimal feedback control of aggressive vehicle maneuvers," in Optimization and Optimal Control in Automotive Systems. Springer, 2014, pp. 109–129.
[10] H. J. Kappen, "Linear theory for control of nonlinear stochastic systems," Phys. Rev. Lett., vol. 95, p. 200201, 2005.
[11] E. Theodorou, J. Buchli, and S. Schaal, "A generalized path integral control approach to reinforcement learning," The Journal of Machine Learning Research, vol. 9999, pp. 3137–3181, 2010.
[12] E. Rombokas, M. Malhotra, E. Theodorou, E. Todorov, and Y. Matsuoka, "Reinforcement learning and synergistic control of the ACT hand," IEEE/ASME Transactions on Mechatronics, vol. 18, no. 99, pp. 569–577, 2013.
[13] V. Gómez, H. J. Kappen, J. Peters, and G. Neumann, "Policy search for path integral control," in Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part I, 2014, pp. 482–497.
[14] G. Williams, E. Rombokas, and T. Daniel, "GPU based path integral control with learned dynamics," in Neural Information Processing Systems, Autonomously Learning Robots Workshop, 2014.
[15] V. Gómez, S. Thijssen, H. J. Kappen, S. Hailes, and A. Symington, "Real-time stochastic optimal control for multi-agent quadrotor swarms," RSS Workshop, 2015. arXiv:1502.04548.
[16] E. A. Theodorou and E. Todorov, "Relative entropy and free energy dualities: Connections to path integral and KL control," in Decision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, 2012, pp. 1466–1473.
[17] E. A. Theodorou, "Nonlinear stochastic control and information theoretic dualities: Connections, interdependencies and thermodynamic interpretations," Entropy, vol. 17, no. 5, p. 3352, 2015.
[18] W. Zhang, H. Wang, C. Hartmann, M. Weber, and C. Schütte, "Applications of the cross-entropy method to importance sampling and optimal control of diffusions," SIAM Journal on Scientific Computing, vol. 36, no. 6, pp. A2654–A2672, 2014.
[19] K. Rawlik, M. Toussaint, and S. Vijayakumar, "On stochastic optimal control and reinforcement learning by approximate inference," in Int. Conf. on Robotics Science and Systems (RSS 2012).
[20] G. Williams, A. Aldrich, and E. Theodorou, "Model predictive path integral control using covariance variable importance sampling," preprint, http://arxiv.org/abs/1509.01149v2.
[21] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard, and F. Dellaert, "iSAM2: Incremental smoothing and mapping using the Bayes tree," International Journal of Robotics Research, vol. 31, no. 2, pp. 216–235, Feb. 2012.
[22] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza, "IMU preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation," in Proceedings of Robotics: Science and Systems, Rome, Italy, July 2015.

