0% found this document useful (0 votes)
31 views8 pages

Path Planning with Local Motion Estimations

Uploaded by

feel relax
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views8 pages

Path Planning with Local Motion Estimations

Uploaded by

feel relax
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JANUARY, 2020 1

Path Planning with Local Motion Estimations


Jérôme Guzzi, R. Omar Chavez-Garcia, Mirko Nava, Luca Maria Gambardella, Alessandro Giusti

Abstract—We introduce a novel approach to long-range path


planning that relies on a learned model to predict the outcome
of local motions using possibly partial knowledge. The model is
trained from a dataset of trajectories acquired in a self-supervised
way. Sampling-based path planners use this component to eval-
uate edges to be added to the planning tree. We illustrate the
application of this pipeline with two robots: a complex, simulated,
quadruped robot (ANYmal) moving on rough terrains; and a
simple, real, differential-drive robot (Mighty Thymio), whose
geometry is assumed unknown, moving among obstacles. We (a) ANYmal simulation on chal- (b) Real Mighty Thymio with an
quantitatively evaluate the model performance in predicting the lenging terrain. extension on the right.
outcome of short moves and long-range paths; finally, we show Fig. 1: Platforms used in our experiments.
that planning results in reasonable paths.
Index Terms—Motion and Path Planning, Deep Learning in
Robotics and Automation, Probability and Statistical Methods motion estimator is queried very often, and therefore needs to
be fast); and partial knowledge about the environment (e.g., at
I. I NTRODUCTION planning time, we might know the terrain topography but not
its softness/friction characteristics).
AMPLING-based planning [1] is a powerful and general
S solution to path planning that casts the problem as a search
for a sequence of feasible local motions, each of which links
Our main contribution is a novel approach in which:
• we learn an estimator to predict the outcome (e.g.,
success probability, duration, energy, . . . ) of motions
two nearby states: the first local motion starts at the source
linking two nearby states, given the available (possibly
state and the last ends at the target state. The feasibility of
incomplete) knowledge about the local environment;
local motions is determined by a local motion estimator, which
• we use the learned estimator in a sampling-based global
should account for the robot kinematics, the local planners
planning framework to identify feasible local motions and
and controllers at play, and all available knowledge about the
assign them a cost to be minimized (e.g., finding the path
environment.
with minimal failure probability),
The local motion estimator is typically a simple component
that checks whether a direct move linking two states collides under the assumptions that: (1) a local planner and a controller
with known obstacles and complies with the robot’s kinematic are available; (2) our knowledge of the environment is static
constraints. However, this approach is unsuitable for cases with (in particular, at planning time, the map covers source and
complex, possibly stochastic interactions between the robot target locations and the robot does not acquire new information
and its environment. during trajectory execution); and (3) the robot can not recover
Consider, as an example, the case of a legged robot planning from failures and has thus no need for dynamic re-planning.
a long path on a known, rugged terrain. A possible alternative We will briefly discuss in Section VI how some of these
to evaluate whether a local motion is feasible is to accurately restrictions can be lifted.
simulate the robot moving on the terrain patch between two In this context, training the estimator requires to collect
locations. Such simulation must account, among other factors, instances in the form of input-output pairs: the input is a
for the terrain shape, the robot-terrain interaction, the charac- pair of nearby states and any available knowledge about the
teristics of the robot’s sensing ability and low-level controllers, surrounding environment; the output is the outcome of the
and the probability that the robot could slip on the terrain. This move. Such data can be acquired either in simulation or in
is problematic for two reasons: computational expense (a local the real world, with a robot attempting local motions and
measuring their outcomes. If the robot can sense the outcome
Manuscript received: September, 10, 2019; Revised December, 12, 2019; without external help (e.g., by visual odometry to measure
Accepted January, 8, 2020. advancement), the approach can be implemented in a self-
This paper was recommended for publication by Editor Nancy Amato upon
evaluation of the Associate Editor and Reviewers’ comments. This work has supervised [2] fashion, and the robot can progressively adapt
been conducted as part of ANYmal Research, a community to advance legged the estimator to the environment.
robotics, and was supported by the Swiss National Centre of Competence in The approach naturally handles planning in partially-known,
Research Robotics.
Authors are with the Dalle Molle Institute for Artificial Intelligence stochastic environments, i.e., where robot’s actions do not
(IDSIA), USI-SUPSI, Lugano, Switzerland [email protected] lead to deterministic outcomes. In particular, by adopting
Data and code to reproduce our results, video demonstrations, and ad- probabilistic machine learning [3] tools, one can expect that
ditional information are available online at https://github.com/idsia-robotics/
pplanning-local-estimations. any uncertainty in the estimator inputs is properly mapped to
Digital Object Identifier (DOI): see top of this page. uncertainty in the outputs, which is handled in a principled

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
2 IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JANUARY, 2020

way when planning. b) Search spaces: Some planning strategies explore the
Note that the approach does not require that a model of the space of states and determine feasible paths as sequences of
robot is available (unless one uses simulations to acquire train- feasible states to traverse, for example, focusing on collision-
ing data) and resembles the way humans plan paths in open, free constraints or searching for a stable gait state [13].
rugged terrains: estimating from a distance whether a given Alternatively, planners search in the action space: feasible
passage is traversable relies on a quick, intuitive evaluation paths are sequences of feasible actions to execute. In our
based on partial information about the local environment and previous work [14], we estimated the feasibility of motions on
on previous experience attempting to negotiate similar-looking a regular horizontal grid. In this work, we instead randomly
passages. Crucially, this evaluation accounts for the operation sample from a much larger space.
and performance of lower-level controllers, i.e., in our case, c) Local motion estimation: Estimating the feasibility
the motor skills of the subject. An example of online learning of local motions is a key component of global path plan-
in this context occurs when inexperienced hikers learn how to ning algorithms. For instance, for legged robots, traversability
plan paths that avoid wet rocks after slipping and falling once scores can be assigned to cells of 2D maps to account for the
on such surface. robot’s geometry, and the terrain roughness and slope [15];
After reviewing related work in Section II, we formalize in a more complex approach, Fankhauser et al. [13] first
our abstract model (Section III) and describe implementa- compute stable footholds, and then body and leg trajectories
tions (Section IV) in two contexts: a simulated legged robot to overcome steep sections and high steps. In this context,
navigating rugged terrain and a real wheeled robot with un- when the interaction between the robot and the environment is
known geometry navigating among obstacles. Qualitative and complex, one can train a model to output traversability scores:
quantitative experimental results are reported and discussed in Wellhausen et al. [16] first use a weakly-supervised method
Section V. to segment terrain classes from a front-facing camera image,
then use a CNN to compute the confidence of keeping balance
on potential footholds, which is mapped to a cost function.
II. R ELATED W ORK
Similar approaches, which first classify terrain and then assign
a) Path planning: A common strategy for integrating a cost to a grid map, have been developed, e.g., during the
machine learning in robotic path planning [4] is imitation LAGR project [17].
learning of expert trajectories. Along this line of research, Our proposed approach, given a local planner, trains an esti-
Pfeiffer et al. [5] train a CNN to compute steering commands mator to predict which local motions are feasible, using many
to reach the desired target pose given current readings from random trials on diverse terrains; then, instead of building
a laser scanner; Ollis et al. [6] analyze expert trajectories to a fixed resolution grid map, samples from the whole space
learn features of traversed map areas and then compute cost of local motions to build a dense planning tree. Moreover, it
maps in a Bayesian framework; Bagnell et al. [7] use inverse does not rely on hand-modeled terrain features (like slope or
optimal planning to infer a cost function for which the expert roughness) but learns more general features from examples on
trajectories would be optimal. a variety of terrains. In contemporary work, Chiang et al. [18]
In contrast, our work relies on sampling-based graph search. propose a similar approach for integrating a local planner
Sampling-based motion planning algorithms [1] handle high- trained using reinforcement learning in a global sampling-
dimensional configuration spaces and rely on a local plan- based planner for kinodynamic constrained robots. The gen-
ner (or steering function) that determines the connectivity eral approach of using machine learning to approximate the
between nodes of a graph; in this context, one can search outputs of a computationally expensive planner is also found
in the space of controls to take into account dynamical or in imitation learning, e.g., when the expert trajectories for a
differential constraints. The Rapidly Exploring Random Trees legged robot originate from a sophisticated high-dimensional
(RRT) [8] algorithm iteratively constructs a tree by sampling footholds planner [7] instead of a human pilot.
the configuration space to add edges that lead to feasible
states; once the target state is connected to the tree, the path is
III. M ODEL
returned. Alternatively, the Probabilistic Road Maps (PRM) [9]
algorithm builds a graph of feasible edges in which the best In this section, we describe the global path planning prob-
path is found using, e.g., Dijkstra’s algorithm. In general, lem as a search in the space of sequences of local motions,
sampling-based planners are probabilistically complete, i.e., whose feasibility and costs are estimated through machine
they find a path from source to target with probability 1 learning (see Figure 2).
if such a path exists and if they are given enough time to
grow the graph. Optimal variants such as RRT* and PRM*
asymptotically converge to the path of least cost [10]. Machine A. Local motions
learning has been used to augment components of sampling- The robot moves in an environment of which it has, at plan-
based planners, e.g., for automatically selecting the best sam- ning time, partial knowledge K modeled as a set of relevant
pler [11] or the best expansion strategy depending on the pieces of information. A local motion e = (q1 , q2 ) ∈ C 2 is
neighborhood of a node [12]. In this work, we apply machine defined by two nearby states in the robot’s configuration space
learning on a different component, i.e., the one that determines C. For example, for a differential-drive wheeled robot moving
the connectivity of two nearby states. on a plane among obstacles, we may represent a local motion

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
GUZZI et al.: PATH PLANNING WITH LOCAL MOTION ESTIMATIONS 3

1) Collecting the dataset: The robot can collect the dataset


t (see Figure 2a) in a self-supervised way if the robot di-
q2 Ke q2 Ke q2
? rectly extracts the trajectory descriptors from on-board sensors
d(γ) and its internal state. As relevant descriptors should only
d? q1 depend on the relative position of q2 with respect to q1 ,
q1 q1
s and the local knowledge around q1 , we use a local coor-
dinate frame, centered at q1 , to represent q2 and Ke . The
(a) A sample (b) A trained model (c) A planner resulting dataset is composed of many samples of the form
(Ke , e, d(γ)) is an predicts E [d(e, Ke )] queries the model
(Kq2 , q2 , d1 , d2 , . . . , dn ).
outcome of a local for an unobserved lo- for a possible new
motion from q1 to q2 . cal motion e. edge (q1 , q2 ). 2) Learning: Because some descriptors are categorical
(e.g., S) while others are continuous (e.g., duration and en-
Fig. 2: Basic definitions, as introduced in Section III: a dataset ergy), the model has multiple outputs: some for classification
(a) of local motion outcomes is used to train a model (b), and others for regression. Figure 5 shows the deep neural
which a sampling-based planner (c) queries to get the success network architecture that we use as a local motion estimator
probability and cost of candidate edges for a tree connecting for two specific planning problems.
source s with target t. The orange circle represents local
knowledge Ke that the model uses to predict the outcome.
C. Global path planning
The global path planner searches for the best trajectory
for the robot to travel from s ∈ C to t ∈ C, encoded as
as a pair of poses in the two-dimensional special Euclidean a sequence π = (e1 , e2 , . . . em ) of local motions to per-
group SE(2), and knowledge as an obstacle occupancy grid. form. We discriminate among trajectories using the descriptors
When the robot attempts to travel from q1 to q2 using a computed by the trained model. In particular, P we define an
local planner and a controller, it follows a time-parametrized additive cost for the path as C (π; K π ) = e∈π C (e; Ke ) =
trajectory γ : [0, T ] → C, with γ(0) = q1 . In general, multiple C where the cost of the local motions
P
e∈π (E [d(e, K e )]),
executions of the same local motion may result in different depends on the predicted descriptors.
trajectories; this randomness has diverse sources, such as: For instance, if we assume that the probabilities of failing
partial knowledge of the environment (possibly based on the different local motions are independent, one interesting cost
robot’s sensors); stochastic interaction with the environment to minimize is the path survival risk R, given by the negative
while executing the motion (e.g., when a robot slips); local log-probability that the robot completes all local motions
planners that use random algorithms. Therefore, we assume X X
that γ is randomly drawn from a distribution, which depends R(π; Kπ ) = − log P(S(e, Ke ) = 1) = − log yS (e, Ke ).
e∈π e∈π
on the environment, e, and the local knowledge Ke ⊂ K, (1)
which we define as the subset that contains any information 1) Sampling-based planner: We solve the global planning
about the surroundings of e; for the previous example, local problem using a sampling-based planner that iteratively builds
knowledge would be a portion of the occupancy grid map a graph of states connected by edges that represent feasible
containing the source and target poses. local motions (e.g., local motions with a sufficiently high prob-
Let d(γ) = (d1 (γ), . . . , dn (γ)) ∈ D be an n-dimensional ability of success). In our implementation, we use randomized
descriptor that should contain any information relevant to variants of Rapid Exploring (Dense) Tree (RDT) [19] algo-
evaluate trajectories. For instance, d1 ≡ S ∈ {0, 1} denotes rithms to answer one-shot path planning queries. In general,
whenever the robot has reached the target, i.e., if γ(T ) = q2 . these algorithms iteratively construct a tree T ⊂ C that covers
Other interesting descriptors may be duration T , energy con- the configuration space by sampling random points (with a
sumed, number of collisions occurred, and maximal motor bias towards t) and connecting them to the tree through local
torque. Finally, we use the notation d(e, Ke ) to denote the motions, as illustrated in Figure 2c and Algorithm 1, which
distribution of descriptors for trajectories resulting from a makes explicit the role of the predictions during planning.
local motion e when the robot has local knowledge Ke ; the The main difference with the original RDT algorithms, like
dependency on the environment is always implicit. RRT, is that we cannot evaluate infinitesimally short local
motions (otherwise their descriptors would be meaningless),
and therefore the asymptotic properties are not preserved.
B. Learning to predict local trajectories Optimal variants, like RRT* [10] and SST [20] (Stable
We use supervised machine learning to predict the outcome Sparse RRT), use a cost C (in our case computed from the
of a given local motion (i.e., the expected value of the predicted descriptors) to grow a tree of optimal paths; adding
trajectory descriptors). In particular, we learn the mapping an edge to the tree may involve locally reorganizing the tree
(e, Ke ) 7→ y = E[d(e, Ke )] (see Figure 2b). As a first step, we (RRT* ) or competing with neighbor branches (SST).
collect many instances of local trajectories γ by sampling local
motions in different environments and recording the values of IV. E XPERIMENTAL SETUP
their descriptors d(γ); then, we train a machine learning model In this section, we present two concrete applications of the
using this data. model presented in Section III for two different robots depicted

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
4 IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JANUARY, 2020

P P

q2 50 cm
15 cm
q1 q1

(a) One local motion sample: (b) Evaluation of the trained model: colors indicate (c) In a challenging terrain, ANYmal success-
the target point q2 is uniformly predicted success probability yS (red: low, green: fully executes the planned path of lowest risk
drawn in an annulus (cyan) from high) for motions to a target point in the annulus: (blue dots) between start (left) and target poses
15 cm to 50 cm around q1 . trying to step over the (white) wall is correctly (right), computed using the trained model.
assigned a low chance of success (orange).
Fig. 3: ANYmal pipeline: (a) we collect about 90K random trajectories by picking q1 uniformly on a map and q2 in an annulus
around it. We use these trajectories to train a model (b) to finally compute plans (c) using RRT* . Local knowledge K(q1 ,q2 )
is encoded as a 2 m × 2 m grid patch P around the robot bounded by the orange line; heightmaps are depicted as gray-scale
images. All drawing are in scale and use experimental data.

Algorithm 1 RRT-based planner that builds a random tree T simplicity, we limit our analysis to local motions that maintain
with resolution δ to compute a path from s ∈ C to t ∈ C, orientation; therefore, the configuration space C = R2 is given
using a model to predict the expected descriptors y of local by the horizontal position of ANYmal’s center. Heightmaps
motions trajectories and knowledge K about the environment. have a resolution of 2 cm per pixel; local knowledge is
Only edges with success probability larger than τ are added represented as a 100 px × 100 px (2 m × 2 m) patch P of such
to the tree and carry a cost function C . heightmap (see Figure 3a).
Initialize T with a node in s b) Local motion: The simulated ANYmal uses a sim-
while t 6∈ T do ple1 locomotion planner and a complex closed-loop feedback
q ← random sample from C with a small bias toward t controller to follow a local trajectory while compensating for
q1 ← vertex in T nearest to q terrain irregularities [23]. The local planner takes as its only
e ← local motion from q1 toward q of at most length δ input the relative target pose q2 = (x, y, θ). For any sampled
y ← E [d(e, Ke )] trajectory γ, d(γ) ∈ {0, 1} × R+ is composed of success S
if yS > τ then (“has the robot arrived near enough to q2 ?”) and duration T .
add edge e with cost C (y) to T c) Dataset collection: We collect a dataset of about 90K
end if samples (76% with S = 1), where each sample is a tuple
end while (P, x, y, S, T ), by randomly spawning the robot on stable
poses and randomly sampling a relative target point at a
distance between 15 cm to 50 cm along a random direction;
in Figure 1: a simulated ANYmal robot and a real Mighty on average the robot needs 8 steps (about 10 seconds) to
Thymio robot. complete a local motion (Figure 3a). We gather data on 12
Both robots move on 2D terrain whose knowledge K is different terrains (with total area 1200 m2 ), which contain
encoded as heightmaps (i.e., grey-scale images with an altitude challenging obstacles such as slopes, bumps, holes, rails, and
value associated with each pixel); we ignore any property that steps. As in our previous work [14], terrain heightmaps are
is not geometrical (like the surface material). Knowledge used procedurally generated as superpositions of simplex noise at
by the robot for local motions is given by local heightmaps, different scales.
i.e., square heightmap patches centered at the robot location, d) Training: The success and duration of local motions
and aligned with its orientation. We implement sampling- are modeled by a deep neural network, outlined in Figure 5.
based planners using the Open Motion Planning Library The estimator has two stages: the first is composed of con-
(OMPL) [21]. volutional layers that operate on the heightmap patch (which
has an image-like 2D structure); the second stage processes
A. Simulated ANYmal the resulting features and accepts, as an additional input, the
relative target point (x, y). The output of the model consists
a) Robot and environment: ANYmal [22] is a state-of-
of the success probability yS from a dense layer with softmax
the-art quadruped robot developed at ETHZ for autonomous
activation and the estimation of duration yT from a dense
operation in challenging environments, with the ability to
layer. The loss function is given by the sum of categorical
walk on rough terrain. We simulate ANYmal in Gazebo (see
cross-entropy (success) and mean squared error (duration).
Figure 1a) to learn how well the robot copes with different
obstacles (e.g., steps, holes, slopes, bumps) and later plan safe 1 ANYmal also features a more complex local planner [13], which we do
paths; ANYmal’s footprint is 80 cm × 60 cm × 70 cm. For not use in this work, that selects the best foothold locations on a terrain.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
GUZZI et al.: PATH PLANNING WITH LOCAL MOTION ESTIMATIONS 5

S=0 P

S=1

(a) When the robot collides with an (b) We use the trained model to predict (c) Planning a path through a narrow opening of
obstacle, it inverts the linear speed (here a sequence of five 1 s local motions with 44 cm from various starting poses (robots at the
moving backward) before having reached constant linear and angular speeds: target bottom) to the same target pose (blue circle) shows
the target position (red); from the trajec- poses are colored by success probability that the trained model makes the robot pass on the
tory (blue line), we extract and label many yS (red: low, green high): collisions are left side of the corridor to avoid touching with its
1 s segments (delimited by star markers). correctly identified (red). arm poking out to the right.
Fig. 4: Mighty Thymio pipeline: (a) the robot randomly changes angular speed every 2 s to collect 1 s training samples labeled
according to the occurrence (or not, S ∈ {0, 1}) of a collision; (b) this model predicts the probability of a collision in the next
1 s for given angular and linear speeds, which (c) we use to plan safe trajectories (τ = 0.98) that minimizes duration using
SST. Local knowledge K(q1 ,q2 ) is a grid patch P (orange stroke) of 50 cm around the robot. All drawing are in scale and use
experimental data.

The training dataset contains 70K samples from 7 terrains; B. Mighty Thymio
the validation dataset contains 10K samples from 2 terrains;
a) Robot and environment: The Mighty Thymio [24]
the evaluation dataset contains 8.7K samples from 3 terrains;
is an extension of the small educational robot Thymio [25]
all 12 terrains are distinct.
for research use. It features two differentially driven wheels
and its configuration space C = SE(2) is composed of
input
the robot’s position and orientation. The small mobile robot
dense (64)

target dense (1) moves on the flat floor of a room that has several obsta-
(x, y) output yT cles on the ground (see Figure 1b); to make the planning
flatten / dense (64)

problem more interesting, we have extended the body of


concatenation

dense (64)
conv (16)

conv (16)

conv (32)

conv (32)

input
dropout

map the robot by adding a rigid arm on its right. The Mighty
dense (32)

dense (2)
patch P
softmax Thymio size is 11 cm × 11 cm × 20 cm, and the arm size is
output yS 10 cm × 2 cm × 0.5 cm. As another application of the model
introduced in Section III, the goal is to learn to predict if the
Fig. 5: Neural Network architecture for ANYmal discussed in robot (whose geometry is assumed unknown) would collide
Section IV-A. Given the robot-centered heightmap patch P and with obstacles during a short motion and use this information
the relative target goal (x, y), the model outputs a probability to plan collision-free paths. We represent environments as
of reaching the target pose and an estimation of the required heightmaps with a resolution of 0.625 cm per pixel; local
time. The architecture used for Mighty Thymio is very similar knowledge is represented by robot-centered 80 px × 80 px
(see Section IV-B). (50 cm × 50 cm) patches P.
b) Local motion: Like ANYmal, the Mighty Thymio
could use a relatively complex planner/controller combination
e) Path planning: We use the trained local motion model to perform arbitrary local motions in free space. Instead,
to compute global paths through a planner, which is derived we present a simpler model that highlights the generality
from RRT* in the same way Algorithm 1 has been derived of our approach; namely, we limit local motions to have
from RRT in Section III-C1. Namely, the planner iteratively constant linear speed v ∈ {−8 cm s−1 , 8 cm s−1 }, angular
samples a random point q and attempts to grow the tree with a speed ω ∈ [−0.5 rad s−1 , 0.5 rad s−1 ], and duration T = 1 s;
motion from the nearest neighbor of q, towards q, on a segment in this case, the local planner is trivial and just executes the
of length δ = 45 cm. If q is closer than δ, the candidate is corresponding control. For every local motion in the training
retained as long as the distance is above 15 cm; otherwise, it set, we record S ∈ {0, 1}, which discriminates whenever a
is discarded. Based on the success score yS returned by the collision occurred during the local motion.
estimator, we assign a risk − log yS to each edge while we c) Dataset collection: We acquire data using a controller
consider any edge with a score lower than a threshold τ = 0.5 that randomly selects a new angular speed every 2 s. Each
as not feasible. We run RRT* to find the path with minimal time a collision occurs, the robot records it and inverts its
survival risk, as defined in Eq. (1); as RRT* is an anytime motion direction (i.e., starts moving backward if it was moving
algorithm that converges towards the optimal solution, we let forward, and vice versa). A motion capture system records the
the tree grow for a given computational time budget. position of all obstacles and the robot. The robot runs for a

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
6 IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JANUARY, 2020

total time of 120 min (in 6 different maps), during which about Anymal success Anymal duration Thymio success
1 1
460 collisions were recorded. After the acquisition, we process 15
0.8 0.8

p(S|yS )

p(S|yS )
the data to extract 1 s trajectories with constant controls: we

T [s]
0.6 10 0.6
assign S = 1 to those without collisions, while we assign S = 0 0.4 0.4
to those where a collision occurs (see Figure 4a). In total, we 0.2 5 0.2
0 0
collect a dataset of 104K samples of the form (P, v, ω, S). 0 0.5 1 5 10 0 0.5 1
yS yT [s] yS
d) Training: We adopt the CNN architecture depicted in
Figure 5, with the only difference that the output is only the
Fig. 6: Model evaluation (ground truth vs prediction) on local
success probability (no-collision) of the motion. We generate
motions for ANYmal (8700 samples, left and center) and
the dataset from three disjoint sets of maps: 52K samples were
Mighty Thymio (30K samples, right). In the success plots, we
used for training (3 maps), 20K for validation (1 map), and
group yS in 0.1 width bins. In the duration plot, dots mark
30K for evaluation (2 maps).
the mean value of T for outputs yT grouped in 1 s width bins,
e) Path planning: To take into account the differential
while the grey area is delimited by ± one standard deviation.
constraints of the robot’s motion, we use a variant of SST,
Perfect models are represented by dashed black lines.
which samples control inputs to reach a local target point.
More precisely, SST iteratively samples a random pose q ∈
SE(2) and several control inputs, among which it selects the dataset; the corresponding performance scores are listed in the
one that would lead the robot nearest to q from its nearest evaluation row of grey columns (related to testing single local
neighbor; the planner then adds an edge to the tree only if it motions) in Table I (AUC = 0.889). The model is very well
decreases the local cost-to-come. We discard any edge with calibrated for both outputs.
a success probability lower than τ = 0.98 and assign a fixed
cost of 1 (i.e., proportional to their duration) to feasible edges. TABLE I: ANYmal model evaluation for success prediction:
SST is an anytime algorithm that converges towards a nearly number of samples for the two classes and Area Under the
optimal solution; therefore, we let the tree grow for a fixed ROC (AUC). The columns contain the results for single local
amount of computational time. motions e (gray), paths π between random source and target
locations on selected maps (white), and all subpaths subπ
V. E XPERIMENTAL RESULTS extracted from such paths (blue).

How well are we able to predict the outcome of local dataset # successes # failures AUC
motions? Are the local models useful for path planning? Are e π subπ e π subπ e π subπ
the assumptions we made in Section III justified? In particular, evaluation 7.30K - - 1.40K - - 0.889 - -
are we justified to interpret yS as the success probability for rough map 18.4K 562 297K 185 185 49.7K 0.878 0.904 0.912
local motions and use it to compute the success probability surf map 22.4K 564 431K 237 237 63.5K 0.962 0.962 0.952
for an entire path? In this section, we present experimental
results that partially answer these questions for the two setups 2) Mighty Thymio: Similarly, we report in Table II and
introduced in Section IV. Figure 6:right the results for the Mighty Thymio model pre-
Qualitative tests, like those illustrated in Figures 3b and 4b, dicting success probability (AUC = 0.972). During data col-
show that the learned model’s predictions are coherent with lection, the robot had a relatively small number of collisions,
the map: the ANYmal model correctly estimates that motions which lead to training and evaluation datasets that are very
that step over the short wall are risky; the Mighty Thymio unbalanced (one failed sample every 25 successful samples),
model correctly predicts collisions. Similarly, computing paths which could explain the inferior calibration (compared to the
between hand-picked locations on interesting maps, like in ANYmal model) of a model that tends to overestimate the
Figures 3c and 4c, shows that the planner computes sensible success probability. Nonetheless, the model, with a threshold
paths: ANYmal avoids high steps and slopes, and Mighty τ = 0.98, yields a precision of 99.7% and a recall of 96.1%,
Thymio (with an arm on the right) stays on the left side of a which prove to be good enough for planning on our maps.
corridor to avoid collisions. The video in the supplementary
TABLE II: Mighty Thymio model evaluation: number of
material contains several examples of planned paths and their
samples for the two classes, Area Under the ROC (AUC),
executions.
and success probability for scores above τ = 0.98, which
In the rest of this section, we report quantitative, statistically
is the threshold we use to compute the plans according to
significant results. First, we present metrics on the evaluation
Section IV-B.
datasets for the prediction of local motions. Then, we report
results for the planned paths, focusing on the ANYmal case, successes failures AUC P(S = 1|yS > 0.98)
for which we gathered a large number of executed samples in 28913 1087 0.972 0.997
simulation.

A. Local motion prediction B. Path planning for ANYmal


1) ANYmal: Figure 6:left illustrates how well the ANYmal In this section, we report results that investigate the core of
model outputs correspond to the true values on the evaluation our contribution, i.e., using the learned local motion model to

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
GUZZI et al.: PATH PLANNING WITH LOCAL MOTION ESTIMATIONS 7

plan a path: we compute and then execute many paths between Table I compares the prediction performance of success for
target and source locations for the simulated ANYmal. In single segments, source-target paths, and all their subpaths. For
particular, we randomly sample 1000 pairs of source and both maps, the AUC is similar across the different datasets.3
target states in two different maps (rough and surf ) of size Performance on the rough map is comparable to the evaluation
10 m × 10 m illustrated at the top of Figure 8: we reject any dataset of single segments discussed before, while predictions
pair outside of the green areas (where we expect a successful on the surf map are easier.
spawning of the simulated robot) and any pair with a distance
shorter than 3 m. The two maps have not been used to train or rough map surf map
evaluate the model and have been hand-designed as realistic
(rough) and easy legible (surf, with a smooth, gradual slope)
outdoor terrains.
One evaluation of the model requires about 1 ms on a single,
modern desktop, CPU. We fix the RRT* planning duration to
120 s, which is sufficient to cover the maps with a granularity
of δ = 0.45 m, resulting in trees with about 10K edges and 10 0.8 10 1
that require about 100K model evaluations. 8 0.6 8 0.8
6 6 0.6

[m]

[m]
0.4
4 4 0.4
10 2 0.2 2 0.2
yS = 0.14, S = 0
yT = 157 s 0 0 0 0
0 2 4 6 8 10 0 2 4 6 8 10
[m] [m]
8 success success
1 1
p(S|yS ) 0.8 0.8

p(S|yS )
yS = 0.90, S = 1 0.6 0.6
6 yT = 246 s, T = 265 s
0.4 0.4
[m]

0.2 0.2
0 0
4 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
yS yS
e10 duration duration
e11 e20
2 400 400
T [s]

T [s]
yS = 0.57, S = 1
yT = 271 s, T = 293 s 200 200
0
0 2 4 6 8 10 0 0
[m] 0 200 400 0 200 400
yT [s] yT [s]
Fig. 7: Three samples of paths from source (cyan arrow) to
target (blue arrow) and their execution on the rough map: Fig. 8: ANYmal model evaluation on about 400K subpaths
one failure (partly red) and two successes (all blue). For for two maps (rough left and surf right). The first row shows
any execution, we record ground truth and model prediction the 3D view of the terrain with a correctly sized robot. The
(yellow). Because of stochastic interactions with the terrain, second row depicts the heightmap (in meters) with spawning
multiple runs may have different outcomes; for examples, 30 areas in green. In the plots, dots marks mean values for binned
trials of the bottom path result in 14 failures and 16 successes, prediction outputs, while the area on the bottom plots covers
for which P(S = 1) ≈ 0.53 is remarkably near to the predicted ± one standard deviation.
value yS = 0.57, with failures distributed in 3 out of 34
segments (e10 : 7, e11 : 3, e20 : 4). Figure 8 illustrates the quality of the prediction when
applied to subpaths. In general, the success score is well
calibrated although it tends to overestimate the success chance
As illustrated2 in Figure 7, for any planned path, we spawn
of risky paths (i.e., those with low score); the model is better
the robot at the source location and pass the sequence of
calibrated on the rough map than on surf, which is smoother
intermediate target poses to the controller. We record the
but has steeper slopes. Instead, the duration of subpaths
success and duration of each segment of the path: in about
is systematically underestimated; we believe that this is an
80% of the cases the robot arrives safely at the target, while
artifact: in training, we record the time until a single pose
in the remaining cases we observe a failure to complete one
is reached, which may happen while the robot has not fully
segment. To increase the number of (path) samples (from 1K
stopped yet; instead, during path execution, the controller waits
to around 400K), from any path we extract all subpaths that
until the robot is still to start moving towards the next pose.
start before a failure (if any): each subpath has well-defined
Nonetheless, the error is relatively small at around 5%.
ground truth labels and predictions for success and duration.
3 We note that the vast majority of the segments along a path are successful,
2 The
supplementary video captures ANYmal planning queries on the surf which leads to local motions datasets on rough and surf that are significantly
map too and planning queries for Mighty Thymio. unbalanced.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2972849, IEEE Robotics
and Automation Letters
8 IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JANUARY, 2020

VI. C ONCLUSIONS [4] D. González, J. Pérez, V. Milanés, and F. Nashashibi, “A review of


motion planning techniques for automated vehicles,” IEEE Trans. on
We introduced a novel, data-driven approach to robotic Intelligent Transportation Systems, vol. 17, no. 4, pp. 1135–1145, 2015.
path planning: first, we learn to predict the outcome of short [5] M. Pfeiffer, M. Schaeuble, J. Nieto, R. Siegwart, and C. Cadena, “From
trajectories from local information; then, we use this model to perception to decision: A data-driven approach to end-to-end motion
planning for autonomous ground robots,” in IEEE Int. Conf. on Robotics
discriminate between potential edges while building a dense and Automation (ICRA), 2017, pp. 1527–1533.
random tree to connect source and target states. We applied [6] M. Ollis, W. H. Huang, and M. Happold, “A bayesian approach to
the model in two scenarios: computing geometrical paths on imitation learning for robot navigation,” in IEEE/RSJ Int. Conf. on
Intelligent Robots and Systems (IROS), 2007, pp. 709–714.
rough terrain for a simulated legged ANYmal robot with a [7] J. Bagnell, J. Chestnutt, D. M. Bradley, and N. D. Ratliff, “Boosting
sophisticated controller whose performance would be difficult structured prediction for imitation learning,” in Advances in Neural
to model by hand; and computing control trajectories for a Information Processing Systems (NIPS), 2007, pp. 1153–1160.
[8] S. M. LaValle, “Rapidly-exploring random trees: A new tool for path
real, non-holonomic, Mighty Thymio robot moving between planning,” Iowa State University, Tech. Rep., 1998.
obstacles with a simple controller. Note that, in the second [9] L. E. Kavraki, P. Švestka, J.-C. Latombe, and M. H. Overmars, “Prob-
case, it would be straightforward to analytically model the abilistic roadmaps for path planning in high-dimensional configuration
spaces,” IEEE Trans. on Robotics and Automation, vol. 12, no. 4, 1996.
collision risk of the robot with known obstacles; yet, it is [10] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal
interesting that our data-driven approach yields acceptable motion planning,” The Int. Journal of Robotics Research, vol. 30, no. 7,
results, using a small training dataset and requiring no explicit pp. 846–894, 2011.
[11] A. Upadhyay and C. Ekenna, “Investigating heterogeneous planning
knowledge of the robot geometry. spaces,” in IEEE Int. Conf. on Simulation, Modeling, and Programming
To assign a risk to the path (as a function of the estimated for Auton. Robots (SIMPAR), 2018, pp. 108–115.
risk of individual edges), we assumed that the model output [12] J. Denny, M. Morales, S. Rodriguez, and N. M. Amato, “Adapting rrt
growth for heterogeneous environments,” in IEEE/RSJ Int. Conf. on
was calibrated and that edge failure probabilities were inde- Intelligent Robots and Systems (IROS), 2013, pp. 1772–1778.
pendent (or at least Markovian). We verified that this is the [13] P. Fankhauser, M. Bjelonic, C. Dario Bellicoso, T. Miki, and M. Hutter,
case for ANYmal, for which we acquired a large and balanced “Robust rough-terrain locomotion with a quadrupedal robot,” in IEEE
Int. Conf. on Robotics and Automation (ICRA), 2018, pp. 1–8.
dataset: our approach yields accurate predictions of success [14] R. O. Chavez-Garcia, J. Guzzi, L. M. Gambardella, and A. Giusti,
probability for entire paths. In general, we observe that the “Learning ground traversability from simulations,” IEEE Robotics and
size and class balance of training datasets is a crucial factor; Automation Letters, vol. 3, no. 3, pp. 1695–1702, July 2018.
[15] P. Krüsi, P. Furgale, M. Bosse, and R. Siegwart, “Driving on Point
ongoing work focuses on assessing requirements for path Clouds: Motion Planning, Trajectory Optimization, and Terrain Assess-
planning tasks. Moreover, the machine learning models that ment in Generic Nonplanar Environments,” Journal of Field Robotics,
have been trained do not take into account the sequential nature vol. 34, no. 5, pp. 940–984, 2017.
[16] L. Wellhausen, A. Dosovitskiy, R. Ranftl, K. Walas, C. Cadena Lerma,
of following a path: we can improve this by defining loss and M. Hutter, “Where should i walk? predicting terrain properties from
functions on sequences of segments rather than on individual images via self-supervised learning,” IEEE Robotics and Automation
segments. Letters, vol. 4, no. 2, pp. 1509–1516, 2019.
[17] L. D. Jackel, E. Krotkov, M. Perschbacher, J. Pippine, and C. Sullivan,
We are presently working on a richer model for ANYmal, “The DARPA LAGR program: Goals, challenges, methodology, and
with local motions that change orientation and additional phase I results,” Journal of Field Robotics, vol. 23, pp. 945–973, 2006.
labels, such as consumed energy and multiple failure modes [18] H. L. Chiang, J. Hsu, M. Fiser, L. Tapia, and A. Faust, “Rl-rrt:
Kinodynamic motion planning via learning reachability estimators from
(getting stuck, capsizing, low-level control errors); we are rl policies,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp.
going to compare the plans produced by our pipeline with 4298–4305, Oct 2019.
those obtained by modeling a cost function by hand [26]. [19] S. M. LaValle, Planning Algorithms. Cambridge Univ. Press, 2006.
[20] Y. Li, Z. Littlefield, and K. E. Bekris, “Asymptotically optimal sampling-
To keep the model simple, we assumed that knowledge based kinodynamic planning,” The Int. Journal of Robotics Research,
is static, that the map is known, and that failures are fatal. vol. 35, no. 5, pp. 528–564, 2016.
To drop these assumptions, one can integrate our data-driven [21] I. A. Sucan, M. Moll, and L. E. Kavraki, “The open motion planning
library,” IEEE Robotics and Automation Mag., vol. 19, pp. 72–82, 2012.
approach with online learning, receding horizons, and fast [22] M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V. Tsou-
re-planning strategies for sampling-based planners [27]; this nis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, R. Diethelm,
would address more realistic scenarios where the robot is able S. Bachmann, A. Melzer, and M. Hoepflinger, “Anymal - a highly mobile
and dynamic quadrupedal robot,” in IEEE/RSJ Int. Conf. on Intelligent
to re-plan once it discovers new information or experiences a Robots and Systems (IROS), 2016, pp. 38–44.
non-fatal failure. [23] P. Fankhauser, C. Dario Bellicoso, C. Gehring, R. Dubé, A. Gawel, and
Finally, we illustrated the model in the context of path M. Hutter, “Free gait — an architecture for the versatile control of legged
robots,” in IEEE-RAS Int. Conf. on Humanoid Robots (Humanoids),
planning, but the application scope of the main idea is broad 2016, pp. 1052–1058.
enough that it can be applied to any planning problem using [24] J. Guzzi, A. Giusti, G. A. Di Caro, and L. M. Gambardella, “Mighty
actions whose outcome can be predicted by a machine learning thymio for university-level educational robotics,” in AAAI Conf. on
Artificial Intelligence, 2018.
estimator. [25] F. Mondada, M. Bonani, F. Riedo, M. Briod, L. Pereyre, P. Retornaz, and
S. Magnenat, “Bringing robotics to formal education: The thymio open-
R EFERENCES source hardware robot,” IEEE Robotics and Automation Mag., vol. 24,
no. 1, pp. 77–85, 2017.
[1] M. Elbanhawi and M. Simic, “Sampling-based robot motion planning: [26] M. Wermelinger, P. Fankhauser, R. Diethelm, P. Krüsi, R. Siegwart,
A review,” IEEE Access, vol. 2, pp. 56–77, 2014. and M. Hutter, “Navigation planning for legged robots in challenging
[2] C. Doersch, A. Gupta, and A. A. Efros, “Unsupervised visual repre- terrain,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems
sentation learning by context prediction,” in 2015 IEEE Int. Conf. on (IROS), 2016, pp. 1184–1189.
Computer Vision (ICCV), 2015, pp. 1422–1430. [27] M. Otte and E. Frazzoli, “Rrtx: Asymptotically optimal single-query
[3] Z. Ghahramani, “Probabilistic machine learning and artificial intelli- sampling-based motion planning witutt quick replanning,” The Int.
gence,” Nature, vol. 521, pp. 452–459, 2015. Journal of Robotics Research, vol. 35, no. 7, pp. 797–822, 2016.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

You might also like