Dynamic Vehicle Routing For Robotic Systems: Proceedings of The IEEE September 2011
All content following this page was uploaded by Emilio Frazzoli on 01 June 2014.
Abstract: Recent years have witnessed great advancements in the science and technology of autonomy, robotics and networking. This paper surveys recent concepts and algorithms for dynamic vehicle routing (DVR), that is, for the automatic planning of optimal multi-vehicle routes to perform tasks that are generated over time by an exogenous process. We consider a rich variety of scenarios relevant for robotic applications. We begin by reviewing the basic DVR problem: demands for service arrive at random locations at random times and a vehicle travels to provide on-site service while minimizing the expected wait time of the demands. Next, we treat different multi-vehicle scenarios based on different models for demands (e.g., demands with different priority levels and impatient demands), vehicles (e.g., motion constraints, communication and sensing capabilities), and tasks. The performance criterion used in these scenarios is either the expected wait time of the demands or the fraction of demands serviced successfully. In each specific DVR scenario, we adopt a rigorous technical approach that relies upon methods from queueing theory, combinatorial optimization and stochastic geometry. First, we establish fundamental limits on the achievable performance, including limits on stability and quality of service. Second, we design algorithms, and provide provable guarantees on their performance with respect to the fundamental limits.

I. INTRODUCTION

This survey presents a joint algorithmic and queueing approach to the design of cooperative control and task allocation strategies for networks of uninhabited vehicles and robots. The approach enables groups of robots to complete tasks in uncertain and dynamically changing environments, where new task requests are generated in real-time. Applications include surveillance and monitoring missions, as well as transportation networks and automated material handling.

As a motivating example, consider the following scenario: a sensor network is deployed in order to detect suspicious activity in a region of interest. (Alternatively, the sensor network is replaced by a high-altitude sensory-rich aircraft loitering over the region.) In addition to the sensor network, a team of unmanned aerial vehicles (UAVs) is available and each UAV is equipped with close-range high-resolution on-board sensors. Whenever a sensor detects a potential event, a request for close-range observation by one of the UAVs is generated. In response to this request, a UAV visits the location to gather close-range information and investigates the cause of the alarm. Each request for close-range observation might include priority levels or time windows during which the inspection must occur and it might require an on-site service time. In summary, from a control algorithmic viewpoint, each time a new request arises, the UAVs need to decide which vehicle will inspect that location and along which route. Thus, the problem is to design algorithms that enable real-time task allocation and vehicle routing.

Accordingly, this paper surveys allocation and routing algorithms that typically blend ideas from receding-horizon resource allocation, distributed optimization, combinatorics and control. The key novelty in our approach is the simultaneous introduction of stochastic, combinatorial and queueing aspects in the distributed coordination of robotic networks.

Static vehicle routing: In the recent past, considerable effort has been devoted to the problem of how to cooperatively assign and schedule demands for service that are defined over an extended geographical area [1], [2], [3], [4], [5]. In these papers, the main focus is on developing distributed algorithms that operate with knowledge about the demand locations and with limited communication between robots. However, the underlying mathematical model is static, in that no new demands arrive over time. Thus, the centralized version of the problem fits within the framework of the static vehicle routing problem (see [6] for a thorough introduction to this problem), whereby: (i) a team of m vehicles is required to service a set of n demands in a 2-dimensional space; (ii) each demand requires a certain amount of on-site service; (iii) the goal is to compute a set of routes that optimizes the cost of servicing (according to some quality of service metric) the demands. In general, most of the available literature on routing for robotic networks focuses on static environments and does not properly account for scenarios in which dynamic, stochastic and adversarial events take place.

Dynamic vehicle routing: The problem of planning routes through service demands that arrive during a mission execution is known as the “dynamic vehicle routing problem” (abbreviated as the DVR problem in the operations research literature). See Figure 1 for an illustration of DVR. There are two key differences between static and dynamic vehicle routing problems. First, planning algorithms should actually provide policies (in contrast to pre-planned routes) that prescribe how the routes should evolve as a function of those inputs that evolve in real-time. Second, dynamic demands (i.e., demands that vary over time) add queueing phenomena to the combinatorial nature of vehicle routing. In such a dynamic setting, it is natural to focus on steady-state performance instead of optimizing the performance for a single demand.

This research was partially supported by AFOSR award FA 8650-07-2-3744, ARO MURI award W911NF-05-1-0219, NSF awards ECCS-0705451 and CMMI-0705453, and ONR award N00014-07-1-0721.
F. Bullo is with the Center for Control, Dynamical Systems and Computation and with the Department of Mechanical Engineering, University of California, Santa Barbara, CA 93106 ([email protected]).
E. Frazzoli and M. Pavone are with the Laboratory for Information and Decision Systems, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139 ({pavone,frazzoli}@mit.edu).
K. Savla is with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 (ksavla@mit.edu).
S. L. Smith is with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 ([email protected]).
The authors are listed in alphabetical order.
feature of the online algorithm approach is the method that is used to evaluate the performance of online algorithms, which is called competitive analysis [22]. In competitive analysis,

[...] performance over all possible inputs (i.e., demand arrival sequences). A policy performs within a constant factor κ of the optimal if the ratio between the policy’s expected performance and the optimal expected performance is upper bounded by κ.
following steps:
(i) queueing model of the robotic system and analysis of its structure;
(ii) establishment of fundamental limitations on performance, independent of algorithms; and
(iii) design of algorithms that are either optimal or constant-factor away from optimal, possibly in specific asymptotic regimes.

Finally, the proposed algorithms are evaluated via numerical, statistical and experimental studies, including Monte-Carlo comparisons with alternative approaches.

In order to make the model tractable, customers are usually considered “statistically independent” and their arrival process is assumed stationary (with possibly unknown parameters). Because these assumptions can be unrealistic in some scenarios, this approach has its own limitations. The aim of this paper is to show that algorithmic queueing theory, despite these disadvantages, is a very useful framework for the design of routing algorithms for robotic networks and a valuable complement to the online algorithm approach.

III. ALGORITHMIC QUEUEING THEORY FOR DVR

In this section we describe algorithmic queueing theory. We start with a short review of some fundamental concepts from the locational optimization literature, and then we introduce the general approach.

A. Preliminary Tools

The Euclidean Traveling Salesman Problem (in short, TSP) is formulated as follows: given a set D of n points in $\mathbb{R}^d$, find a minimum-length tour (i.e., a closed path that visits all points exactly once) of D. More properties of the TSP tour can be found in Section A of the Appendix. In this paper, we will present policies that require real-time solutions of TSPs over possibly large point sets; this can indeed be achieved by using efficient approximation algorithms presented in Section B of the Appendix.

Let the environment $Q \subset \mathbb{R}^2$ be a bounded, convex set (the following concepts can be similarly defined in higher dimensions). Let $P = (p_1, \dots, p_m)$ be an array of m distinct points in Q. The Voronoi diagram of Q generated by P is an array of sets, denoted by $\mathcal{V}(P) = (V_1(P), \dots, V_m(P))$, defined by

\[ V_i(P) = \{ x \in Q \mid \|x - p_i\| \le \|x - p_j\|, \ \forall j \in \{1, \dots, m\} \}, \]

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^2$. We refer to P as the set of generators of $\mathcal{V}(P)$, and to $V_i(P)$ as the Voronoi cell or the region of dominance of the ith generator.

The expected distance between a random point q, generated according to a probability density function ϕ, and the closest point in P is given by

\[ H_m(P, Q) := \mathbb{E}\Big[ \min_{k \in \{1, \dots, m\}} \|p_k - q\| \Big]. \]

The function $H_m$ is known in the locational optimization literature as the continuous Weber function or the continuous multi-median function; see [28], [29] and the references therein. The m-median of the set Q with density ϕ is the global minimizer

\[ P_m^*(Q) = \arg\min_{P \in Q^m} H_m(P, Q). \]

We let $H_m^*(Q) = H_m(P_m^*(Q), Q)$ be the global minimum of $H_m$. The set of critical points of $H_m$ contains all arrays $(p_1, \dots, p_m)$ with distinct entries and with the property that each point $p_k$ is simultaneously the generator of the Voronoi cell $V_k(P)$ and the median of $V_k(P)$. We refer to such Voronoi diagrams as median Voronoi diagrams. It is possible to show that a median Voronoi diagram always exists for any bounded convex domain Q and density ϕ. More properties of the multi-median function are discussed in Section C of the Appendix.

B. Queueing Model for DVR

Here we review the model known in the literature as the m-vehicle Dynamic Traveling Repairman Problem (m-DTRP) and introduced in [7], [8].

Consider m vehicles free to move, at a constant speed v, within the environment Q (even though we are assuming $Q \subset \mathbb{R}^2$, the extension to three-dimensional environments is often straightforward). The vehicles are identical, and have unlimited range and demand servicing capacity.

Demands are generated according to a homogeneous (i.e., time-invariant) spatio-temporal Poisson process, with time intensity $\lambda \in \mathbb{R}_{>0}$, and spatial density $\phi : Q \to \mathbb{R}_{>0}$. In other words, demands arrive to Q according to a Poisson process with intensity λ, and their locations $\{X_j ;\ j \ge 1\}$ are i.i.d. (i.e., independent and identically distributed) and distributed according to a density ϕ whose support is Q. A demand’s location becomes known (is realized) at its arrival epoch; thus, at time t we know with certainty the locations of demands that arrived prior to time t, but future demand locations form an i.i.d. sequence. The density ϕ satisfies:

\[ \mathbb{P}[X_j \in S] = \int_S \phi(x)\, dx \quad \forall S \subseteq Q, \quad \text{and} \quad \int_Q \phi(x)\, dx = 1. \]

At each demand location, vehicles spend some time $s \ge 0$ in on-site service that is i.i.d. and generally distributed with finite first and second moments denoted by $\bar{s} > 0$ and $\overline{s^2}$. A realized demand is removed from the system after one of the vehicles has completed its on-site service. We define the load factor $\varrho := \lambda \bar{s} / m$.

The system time of demand j, denoted by $T_j$, is defined as the elapsed time between the arrival of demand j and the time one of the vehicles completes its service. The waiting time of demand j, $W_j$, is defined by $W_j = T_j - s_j$. The steady-state system time is defined by $\overline{T} := \limsup_{j \to \infty} \mathbb{E}[T_j]$. A policy for routing the vehicles is said to be stable if the expected number of demands in the system is uniformly bounded at all times. A necessary condition for the existence of a stable policy is that $\varrho < 1$; we shall assume $\varrho < 1$ throughout the paper. When we refer to light-load conditions, we consider the case $\varrho \to 0^+$, in the sense that $\lambda \to 0^+$; when we refer to heavy-load conditions, we consider the case $\varrho \to 1^-$, in the sense that $\lambda \to (m/\bar{s})^-$.
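The quantities defined in this section can be made concrete with a small numerical sketch. The Python code below is only an illustration under stated assumptions, not part of the surveyed work: it assumes Q is the unit square and ϕ is uniform, and the function names (`load_factor`, `sample_demands`, `estimate_H`) are hypothetical. It computes the load factor, samples a Poisson demand stream, and estimates the multi-median function by Monte Carlo sampling.

```python
import math
import random


def load_factor(lam, s_bar, m):
    """Load factor rho = lam * s_bar / m, as defined in the queueing model."""
    return lam * s_bar / m


def sample_demands(lam, horizon, rng):
    """Return a list of (arrival_time, x, y) demands with arrival_time < horizon.

    Inter-arrival times are exponential with rate lam, which yields a
    Poisson arrival process. Locations are i.i.d. uniform on [0, 1]^2;
    the uniform density is an assumption made for this sketch only.
    """
    demands = []
    t = rng.expovariate(lam)
    while t < horizon:
        demands.append((t, rng.random(), rng.random()))
        t += rng.expovariate(lam)
    return demands


def estimate_H(generators, rng, samples=20000):
    """Monte Carlo estimate of H_m(P, Q) for uniform phi on the unit square.

    Averages the distance from a random point q to its closest generator.
    """
    total = 0.0
    for _ in range(samples):
        qx, qy = rng.random(), rng.random()
        total += min(math.hypot(px - qx, py - qy) for px, py in generators)
    return total / samples


if __name__ == "__main__":
    rng = random.Random(0)
    lam, s_bar, m = 3.0, 0.5, 2
    rho = load_factor(lam, s_bar, m)
    print("load factor:", rho, "| necessary condition rho < 1:", rho < 1)
    demands = sample_demands(lam, horizon=100.0, rng=rng)
    print("empirical arrival rate:", len(demands) / 100.0)
    print("H_1 with one generator at the center:", estimate_H([(0.5, 0.5)], rng))
    print("H_2 with two generators:", estimate_H([(0.25, 0.5), (0.75, 0.5)], rng))
```

Adding a second generator should lower the estimated expected distance, which is the quantity the m-median minimizes. A non-uniform ϕ would require replacing the uniform sampler, for example with rejection sampling.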
Let $\mathcal{P}$ be the set of all causal, stable, and time-invariant routing policies and $\overline{T}_\pi$ be the system time of a particular policy $\pi \in \mathcal{P}$. The m-DTRP is then defined as the problem of finding a policy $\pi^* \in \mathcal{P}$ (if one exists) such that

\[ \overline{T}^* := \overline{T}_{\pi^*} = \inf_{\pi \in \mathcal{P}} \overline{T}_\pi. \]

In general, it is difficult to characterize the optimal achievable performance $\overline{T}^*$ and to compute the optimal policy $\pi^*$ for arbitrary values of the problem parameters λ, m, etc. It is instead possible and useful to consider particular ranges of parameter values and, specifically, asymptotic regimes such as the light-load and the heavy-load regimes. For the purpose of characterizing asymptotic performance, we briefly review some useful notation. For $f, g : \mathbb{N} \to \mathbb{R}$, $f \in O(g)$ (respectively, $f \in \Omega(g)$) if there exist $N_0 \in \mathbb{N}$ and $k \in \mathbb{R}_{>0}$ such that $|f(N)| \le k|g(N)|$ for all $N \ge N_0$ (respectively, $|f(N)| \ge k|g(N)|$ for all $N \ge N_0$). If $f \in O(g)$ and $f \in \Omega(g)$, then the notation $f \in \Theta(g)$ is used.

C. Lower Bounds on the System Time

As in many queueing problems, the analysis of the DTRP problem for all the values of the load factor ϱ in (0, 1) is difficult. In [7], [8], [9], [30], lower bounds for the optimal steady-state system time are derived for the light-load case (i.e., $\varrho \to 0^+$), and for the heavy-load case (i.e., $\varrho \to 1^-$). Subsequently, policies are designed for these two limiting regimes, and their performance is compared to the lower bounds.

For the light-load case, a tight lower bound on the system time is derived in [8]. In the light-load case, the lower bound on the system time is strongly related to the solution of the m-median problem:

\[ \overline{T}^* \ge \frac{1}{v} H_m^*(Q) + \bar{s}, \quad \text{as } \varrho \to 0^+. \tag{1} \]

The bound is tight: there exist policies whose system times, in the limit $\varrho \to 0^+$, attain this bound; we present such asymptotically optimal policies for the light-load case below.

Two lower bounds exist for the heavy-load case [9], [30] depending on whether one is interested in biased policies or unbiased policies.

Definition III.1 (Spatially biased and unbiased policies). Let X be the location of a randomly chosen demand and W be its wait time. A policy π is said to be
(i) spatially unbiased if for every pair of sets $S_1, S_2 \subseteq Q$
\[ \mathbb{E}[W \mid X \in S_1] = \mathbb{E}[W \mid X \in S_2]; \text{ and} \]
(ii) spatially biased if there exist sets $S_1, S_2 \subseteq Q$ such that
\[ \mathbb{E}[W \mid X \in S_1] > \mathbb{E}[W \mid X \in S_2]. \]

Within the class of spatially unbiased policies in $\mathcal{P}$, the optimal system time is lower bounded by

\[ \overline{T}_U^* \ge \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[ \int_Q \phi^{1/2}(x)\, dx \right]^2}{m^2 v^2 (1 - \varrho)^2} \quad \text{as } \varrho \to 1^-, \tag{2} \]

where $\beta_{\mathrm{TSP},2} \simeq 0.7120 \pm 0.0002$ (for more detail on the constant $\beta_{\mathrm{TSP},2}$, we refer the reader to Appendix A).

Within the class of spatially biased policies in $\mathcal{P}$, the optimal system time is lower bounded by

\[ \overline{T}_B^* \ge \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[ \int_Q \phi^{2/3}(x)\, dx \right]^3}{m^2 v^2 (1 - \varrho)^2} \quad \text{as } \varrho \to 1^-. \tag{3} \]

Both bounds (2) and (3) are tight: there exist policies whose system times, in the limit $\varrho \to 1^-$, attain these bounds; therefore the inequalities in (2) and (3) could indeed be replaced by equalities. We present asymptotically optimal policies for the heavy-load case below. It is shown in [9] that the lower bound in equation (3) is always less than or equal to the lower bound in equation (2) for all densities ϕ.

We conclude with some remarks. First, it is possible to show (see [9], Proposition 1) that a uniform spatial density function leads to the worst possible performance and that any deviation from uniformity in the demand distribution will strictly lower the optimal mean system time in both the unbiased and biased case. Additionally, allowing biased service results in a strict reduction of the optimal expected system time for any non-uniform density ϕ. Finally, when the density is uniform there is nothing to be gained by providing biased service.

D. Centralized and Ad-Hoc Policies

In this section we present centralized, ad-hoc policies that are either optimal in light-load or optimal in heavy-load. Here, we say that a policy is ad-hoc if it performs “well” only for a limited range of values of ϱ. In light-load, the SQM policy provides optimal performance (i.e., $\lim_{\varrho \to 0^+} \overline{T}_{\mathrm{SQM}} / \overline{T}^* = 1$):

The m Stochastic Queue Median (SQM) Policy [8]: Locate one vehicle at each of the m median locations for the environment Q. When demands arrive, assign them to the vehicle corresponding to the nearest median location. Have each vehicle service its respective demands in First-Come, First-Served (FCFS) order returning to its median after each service is completed.

This policy, although optimal in light-load, has two characteristics that limit its application to robotic networks: First, it quickly becomes unstable as the load increases, i.e., there exists $\varrho_c < 1$ such that for all $\varrho > \varrho_c$ the system time $\overline{T}_{\mathrm{SQM}}$ is infinite (hence, this policy is ad-hoc). Second, a central entity needs to compute the m-median locations and assign them to the vehicles (hence, from this viewpoint the policy is centralized).

In heavy-load, the UTSP policy provides optimal unbiased performance (i.e., $\lim_{\varrho \to 1^-} \overline{T}_{\mathrm{UTSP}} / \overline{T}_U^* = 1$):

The Unbiased TSP (UTSP) Policy [9]: Let r be a fixed positive, large integer. From a central point in the interior of Q, subdivide the service region into r wedges $Q_1, \dots, Q_r$ such that $\int_{Q_k} \phi(x)\, dx = 1/r$, $k \in \{1, \dots, r\}$. Within each subregion, form sets of demands of size n/r (n is a design parameter). As sets are formed, deposit them in a queue and service them FCFS with the first available vehicle
by forming a TSP on the set and following it in an arbitrary direction. Optimize over n (see [9] for details).

It is possible to show that, as $\varrho \to 1^-$,

\[ \overline{T}_{\mathrm{UTSP}} \le \left(1 + \frac{m}{r}\right) \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[ \int_Q \phi^{1/2}(x)\, dx \right]^2}{m^2 v^2 (1 - \varrho)^2}; \tag{4} \]

thus, letting $r \to \infty$, the lower bound in (2) is achieved.

The same paper [9] presents an optimal biased policy. This policy, called Biased TSP (BTSP) Policy, relies on an even finer partition of the environment and requires ϕ to be piecewise constant.

Although both the UTSP and the BTSP policies are optimal within their respective classes, they have two characteristics that limit their application to robotic networks: First, in the UTSP policy, to ensure stability, n should be chosen so that (see [9], page 961)

\[ n > \frac{\lambda^2 \beta_{\mathrm{TSP},2}^2 \left[ \int_Q \phi^{1/2}(x)\, dx \right]^2}{m^2 v^2 (1 - \varrho)^2}; \]

therefore, to ensure stability over a wide range of values of ϱ, the system designer is forced to select a large value for n. However, if during the execution of the policy the load factor turns out to be only moderate, demands have to wait for an excessively large set to be formed, and the overall system performance deteriorates significantly. Similar considerations hold for the BTSP policy. Hence, these two policies are ad-hoc. Second, both policies require a centralized data structure (the demands’ queue is shared by the vehicles); hence, both policies are centralized.

Remark III.2 (System time bounds in heavy-load with zero service time). If $\bar{s} = 0$, then the heavy-load regime is defined as $\lambda/m \to +\infty$, and all the performance bounds we provide in this and in the next two sections hold by simply substituting $\varrho = 0$. For example, equation (2) reads

\[ \overline{T}_U^* \ge \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[ \int_Q \phi^{1/2}(x)\, dx \right]^2}{m^2 v^2} \quad \text{as } \lambda/m \to +\infty. \]

IV. ROUTING FOR ROBOTIC NETWORKS: DECENTRALIZED AND ADAPTIVE POLICIES

In this section we first discuss routing algorithms that are both adaptive and amenable to decentralized implementation; then, we present a decentralized and adaptive routing algorithm that does not require any explicit communication between the vehicles while still being optimal in the light-load case.

A. Decentralized and Adaptive Policies

Here, we say that a policy is adaptive if it performs “well” for every value of ϱ in the range [0, 1). A candidate decentralized and adaptive control policy is the simple Nearest Neighbor (NN) policy: at each service completion epoch, each vehicle chooses to visit next the closest unserviced demand, if any, otherwise it stops at the current position. Because of the dependencies among the inter-demand travel distances, the analysis of the NN policy is difficult and no rigorous results have been obtained so far [7]; in particular, there are no rigorous results about its stability properties. Simulation experiments show that the NN policy performs like a biased policy and is not optimal in the light-load case or in the heavy-load case [7], [9]. Therefore, the NN policy lacks provable performance guarantees (in particular about stability), and does not seem to achieve optimal performance in light-load or in heavy-load.

In [15], we study decentralized and adaptive routing policies that are optimal in light-load and that are optimal unbiased algorithms in heavy-load. The key idea we pursue is that of partitioning policies:

Definition IV.1 (Partitioning policies). Given a policy π for the 1-DTRP and m vehicles, a π-partitioning policy is a family of multi-vehicle policies such that
(i) the environment Q is partitioned into m openly disjoint subregions $Q_k$, $k \in \{1, \dots, m\}$, whose union is Q,
(ii) one vehicle is assigned to each subregion (thus, there is a one-to-one correspondence between vehicles and subregions),
(iii) each vehicle executes the single-vehicle policy π in order to service demands that fall within its own subregion.

Because Definition IV.1 does not specify how the environment is actually partitioned, it describes a family of policies (one for each partitioning strategy) for the m-DTRP. The SQM policy, which is optimal in light-load, is indeed a partitioning policy whereby Q is partitioned according to a median Voronoi diagram and each vehicle executes inside its own Voronoi region the policy “service FCFS and return to the median after each service completion.” Moreover, specific partitioning policies, which will be characterized in Theorem IV.2, are optimal or within a constant factor of the optimal in heavy-load.

In the following, given two functions $\phi_j : Q \to \mathbb{R}_{>0}$, $j \in \{1, 2\}$, with $\int_Q \phi_j(x)\, dx = c_j$, an m-partition (i.e., a partition into m subregions) is simultaneously equitable with respect to $\phi_1$ and $\phi_2$ if $\int_{Q_i} \phi_j(x)\, dx = c_j / m$ for all $i \in \{1, \dots, m\}$ and $j \in \{1, 2\}$. Theorem 12 in [31] shows that, given two such functions $\phi_j$, $j \in \{1, 2\}$, there always exists an m-partition that is simultaneously equitable with respect to $\phi_1$ and $\phi_2$, and whose subregions $Q_i$ are convex. Then, the following results characterize the optimality of two classes of partitioning policies [15].

Theorem IV.2 (Optimality of partitioning policies). Assume $\pi^*$ is a single-vehicle, unbiased optimal policy in the heavy-load regime (i.e., $\varrho \to 1^-$). For m vehicles,
(i) a $\pi^*$-partitioning policy based on an m-partition which is simultaneously equitable with respect to ϕ and $\phi^{1/2}$ is an optimal unbiased policy in heavy-load.
(ii) a $\pi^*$-partitioning policy based on an m-partition which is equitable with respect to ϕ does not achieve, in general, the optimal unbiased performance, however it is always within a factor m of it in heavy-load.

The above results lead to the following strategy: First, for
the 1-DTRP, one designs an adaptive and unbiased (in heavy-load) control policy with provable performance guarantees. Then, by using decentralized algorithms for environment partitioning, such as those recently developed in [32], one extends such a single-vehicle policy to a decentralized and adaptive multi-vehicle policy.

Consider, first, the single vehicle case.

The single-vehicle Divide & Conquer (DC) Policy: Compute an r-partition $\{Q_k\}_{k=1}^r$ of Q that is simultaneously equitable with respect to ϕ and $\phi^{1/2}$. Let $\tilde{P}_1^*$ be the point minimizing the sum of distances to demands serviced in the past (if no points have been visited in the past, $\tilde{P}_1^*$ is set to be a random point in Q), and let D be the set of outstanding demands waiting for service. If $D = \emptyset$, move to $\tilde{P}_1^*$. If, instead, $D \ne \emptyset$, randomly choose a $k \in \{1, \dots, r\}$ and move to subregion $Q_k$; compute the TSP tour through all demands in subregion $Q_k$ and service all demands in $Q_k$ by following this TSP tour. If $D \ne \emptyset$ repeat the service process in subregion k + 1 (modulo r).

This policy is unbiased in heavy-load. In particular, if $r \to +\infty$, the policy (i) is optimal in light-load and achieves optimal unbiased performance in heavy-load, and (ii) is stable in every load condition. It is possible to show that with r = 10 the DC policy is already guaranteed to be within 10% of the optimal (for unbiased policies) performance in heavy-load. If, instead, r = 1, the policy (i) is optimal in light-load and within a factor 2 of the optimal unbiased performance in heavy-load, (ii) is stable in every load condition, and (iii) its implementation does not require the knowledge of ϕ. This last property implies that, remarkably, when r = 1, the DC policy adapts to all problem data (both ϱ and ϕ). It is worth noting that when r = 1 and ϕ is constant over Q the DC policy is similar to the generation policy presented in [33].

The optimality of the SQM policy and Theorem IV.2(i) suggest the following decentralized and adaptive multi-vehicle version of the DC policy:
(i) compute an m-median of Q that induces a Voronoi partition that is equitable with respect to ϕ and $\phi^{1/2}$,
(ii) assign one vehicle to each Voronoi region,
(iii) each vehicle executes the single-vehicle DC policy in order to service demands that fall within its own subregion, by using the median of the subregion instead of $\tilde{P}_1^*$.

For a given Q and ϕ, if there exists an m-median of Q that induces a Voronoi partition that is equitable with respect to ϕ and $\phi^{1/2}$, then the above policy is optimal both in light-load and arbitrarily close to optimality in heavy-load, and stabilizes the system in every load condition. There are two main issues with the above policy, namely (i) existence of an m-median of Q that induces a Voronoi partition that is equitable with respect to ϕ and $\phi^{1/2}$, and (ii) how to compute it. In [32], we showed that for some choices of Q and ϕ a median Voronoi diagram that is equitable with respect to ϕ and $\phi^{1/2}$ fails to exist. Additionally, in [32], we presented a decentralized partitioning algorithm that, for any possible choice of Q and ϕ, provides a partition that is equitable with respect to ϕ and represents a “good” approximation of a median Voronoi diagram (see [32] for details on the metrics that we use to judge “closeness” to median Voronoi diagrams). Moreover, if an m-median of Q that induces a Voronoi partition that is equitable with respect to ϕ exists, the algorithm will locally converge to it. This partitioning algorithm is related to the classic Lloyd algorithm from vector quantization theory, and exploits the unique features of power diagrams, a generalization of Voronoi diagrams.

Accordingly, we define the multi-vehicle Divide & Conquer policy as follows.

The multi-vehicle Divide & Conquer (m-DC) Policy: The vehicles run the decentralized partitioning algorithm discussed above (see [32] for more details) and assign themselves to the subregions (this part is indeed a by-product of the algorithm in [32]). Simultaneously, each vehicle executes the single-vehicle DC policy inside its own subregion.

The m-DC policy is within a factor m of the optimal unbiased performance in heavy-load (since the algorithm in [32] always provides a partition that is equitable with respect to ϕ), and stabilizes the system in every load condition. In general, the m-DC policy is only suboptimal in light-load; note, however, that the computation of the global minimum of the Weber function $H_m$ (which is non-convex for m > 1) is difficult for m > 1 (it is NP-hard for the discrete version of the problem); therefore, for m > 1, suboptimality has also to be expected from any practical implementation of the SQM policy. If an m-median of Q that induces a Voronoi partition that is equitable with respect to ϕ exists, the m-DC will locally converge to it, thus we say that the m-DC policy is “locally” optimal in light-load.

Note that, when the density is uniform, a partition that is equitable with respect to ϕ is also equitable with respect to $\phi^{1/2}$; therefore, when the density is uniform the m-DC policy is arbitrarily close to optimality in heavy-load (see Theorem IV.2(i)).

The m-DC policy adapts to arrival rate λ, expected on-site service $\bar{s}$, and vehicle’s velocity v; however, it requires the knowledge of ϕ.

Tables I and II provide a synoptic view of the results available so far; in particular, our policies are compared with the best unbiased policy available in the literature, i.e., the UTSP policy with $r \to \infty$. In Table I, an asterisk * signals that the result is heuristic. Note that there are currently no results about decentralized and adaptive routing policies that are optimal in light-load and that are optimal biased algorithms in heavy-load.

B. A Policy with No Explicit Inter-vehicle Communication

A common theme in cooperative control is the investigation of the effects of different communication and information sharing protocols on the system performance. Clearly, the ability to access more information at each single vehicle cannot decrease the performance level; hence, it is commonly believed that providing better communication among vehicles
will improve the system’s performance. In [16], we propose a policy for the DVR that does not rely on dedicated communication links between vehicles, but only on the vehicles’ knowledge of outstanding demands. An example is when outstanding demands broadcast their location, but vehicles are not aware of one another. We show that, under light load conditions, the inability of vehicles to communicate explicitly does not limit the steady-state performance. In other words, the information contained in the outstanding demands (and hence the effects of others on them) is sufficient to provide, in light load conditions, the same convergence properties attained when vehicles are able to communicate explicitly.

TABLE I
POLICIES FOR THE 1-DTRP

TABLE II
POLICIES FOR THE m-DTRP

The No (Explicit) Communication (NC) Policy: Let D be the set of outstanding demands waiting for service. If $D = \emptyset$, move to the point minimizing the average distance to demands serviced in the past by each vehicle. If there is no unique minimizer, then move to the nearest one. If, instead, $D \ne \emptyset$, move towards the nearest outstanding demand location.

In the NC policy, whenever one or more service requests are outstanding, all vehicles will be pursuing a demand; in particular, when only one service request is outstanding, all vehicles will move towards it. When the demand queue is empty, vehicles will either (i) stop at the current location, if they have visited no demands yet, or (ii) move to their reference point, as determined by the set of demands previously visited.

In [16], we prove that the system time provided by the NC policy converges to a critical point (either a saddle point or a local minimum) of $H_m^*(Q)$ with high probability as $\lambda \to 0^+$. Let us underline that, in general, the achieved critical point strictly depends on the initial positions of the vehicles inside the environment Q. We cannot exclude that the algorithm so designed will converge indeed to a saddle point instead of a local minimum. This is due to the fact that the algorithm does not follow the steepest direction of the gradient of the function

[...] different cost function, it can be proved that the critical points reached by this algorithm are no worse than the critical points reached knowing a priori the distribution ϕ.

Interestingly, the NC policy can be regarded as a learning algorithm in the context of the following game [16]. The service requests are considered as resources and the vehicles as selfish entities. The resources offer rewards in a continuous fashion and the vehicles can collect these rewards by traveling to the resource locations. Every resource offers reward at a unit rate when there is at most one vehicle present at its location and the life of the resource ends as soon as more than one vehicle is present at its location. This setup can be understood to be an extreme form of congestion game, where the resource cannot be shared between vehicles and where the resource expires at the first attempt to share it. The total reward for vehicle i from a particular resource is the time difference between its arrival and the arrival of the next vehicle, if i is the first vehicle to reach the location of the resource, and zero otherwise. The utility function of vehicle i is then defined to be the expected value of reward, where the expectation is taken over the location of the next resource. Hence, the goal of every vehicle is to select its reference location to maximize the expected value of the reward from the next resource. In [16], we prove that the median locations, as a choice for reference positions, are an efficient pure Nash equilibrium for this game. Moreover, we prove that by maximizing their own utility function, the vehicles also maximize the common global utility function, which is the negative of the average wait time for service requests.

V. ROUTING FOR ROBOTIC NETWORKS: TIME CONSTRAINTS AND PRIORITIES

In many vehicle routing applications, there are strict service requirements for demands. This can be modeled in two ways. In the first case, demands have (possibly stochastic) deadlines on their waiting times. In the second case, demands have
Hm , but just the gradient with respect to one of the variables. different urgency or “threat” levels, which capture the relative
On the other hand, since the algorithm is based on a sequence importance of each demand. In this section we study these
of demands and at each phase we are trying to minimize a two related problems and provide routing policies for both
9
scenarios. We discuss hard time constraints in Section V-A and priorities in Section V-B.
In this section we focus only on the case of a uniform spatial density ϕ. However, the algorithms we present below extend directly to non-uniform density. One simply replaces the equal area partitions with simultaneously equitable (with respect to ϕ and $ϕ^{1/2}$) partitions, as described for the DC policy in Section IV. The presentation, on the other hand, would become more involved, and thus we restrict our attention to uniform densities.
A. Time Constraints
In [11], [12] we introduced and analyzed DVR with time constraints. Specifically, the setup is the same as that of the m-DTRP, but now each demand j waits for the beginning of its service no longer than a stochastic patience time $G_j$, which is generally distributed according to a distribution function $F_G$. A vehicle can start the on-site service for the jth demand only within the stochastic time window $[A_j, A_j + G_j)$, where $A_j$ is the arrival time of the jth demand. If the on-site service for the jth demand is not started before the time instant $A_j + G_j$, then the jth demand is considered lost; in other words, such demand leaves the system and never returns. If, instead, the on-site service for the jth demand is started before the time instant $A_j + G_j$, then the demand is considered successfully serviced. The waiting time of demand j, denoted again by $W_j$, is the elapsed time between the arrival of demand j and the time either one of the vehicles starts its service or such demand departs from the system due to impatience, whichever happens first. Hence, the jth demand is considered serviced if and only if $W_j < G_j$. Accordingly, we denote by $P_\pi[W_j < G_j]$ the probability that the jth demand is serviced under a routing policy π. The aim is to find the minimum number of vehicles needed to ensure that the steady-state probability that a demand is successfully serviced is larger than a desired value $\phi_d \in (0, 1)$, and to determine the policy the vehicles should execute to ensure that such objective is attained.
Formally, define the success factor of a policy π as $\phi_\pi := \lim_{j \to +\infty} P_\pi[W_j < G_j]$. We identify four types of information on which a control policy can rely: 1) Arrival time and location: we assume that the information on arrivals and locations of demands is immediately available to control policies; 2) On-site service: the on-site service requirement of demands may either (i) be available, or (ii) be available only through prior statistics, or (iii) not be available to control policies; 3) Patience time: the patience time of demands may either (i) be available, or (ii) be available only through prior statistics; 4) Departure notification: the information that a demand leaves the system due to impatience may or may not be available to control policies (if the patience time is available, such information is clearly available). Hence, several information structures are relevant. The least informative case is when on-site service requirements and departure notifications are not available, and patience times are available only through prior statistics; the most informative case is when on-site service requirements and patience times are available.
Given an information structure, we then study the following optimization problem OPT:
\[ \mathrm{OPT}: \quad \min_{\pi} |\pi|, \quad \text{subject to} \quad \lim_{j \to \infty} P_\pi[W_j < G_j] \ge \phi_d, \]
where $|\pi|$ is the number of vehicles used by π (the existence of the limit $\lim_{j \to \infty} P_\pi[W_j < G_j]$ and equivalent formulations in terms of time averages are discussed in [12], [34]). Let $m^*$ denote the optimal cost for the problem OPT (for a given information structure).
In principle, one should study the problem OPT for each of the possible information structures. In [12], instead, we considered the following strategy: first, we derived a lower bound that is valid under the most informative information structure (this implies validity under any information structure), then we presented and analyzed two service policies that are amenable to implementation under the least informative information structure (this implies implementability under any information structure). Such approach gives general insights into the problem OPT.
1) Lower Bound: We next present a lower bound for the optimization problem OPT that holds under any information structure. Let $P = (p_1, \dots, p_m)$ and define
\[ L_m(P, Q) := 1 - \frac{1}{|Q|} \int_Q F_G\left( \min_{k \in \{1,\dots,m\}} \frac{\|x - p_k\|}{v} \right) dx. \]
Theorem V.1 (Lower bound on OPT). Under any information structure, the optimal cost for the minimization problem OPT is lower bounded by the optimal cost for the minimization problem
\[ \min_{m \in \mathbb{N}_{>0}} m \quad \text{subject to} \quad \sup_{P \in Q^m} L_m(P, Q) \ge \phi_d. \tag{5} \]
The proof of this lower bound relies on some nearest-neighbor arguments. Algorithms to find the solution to the minimization problem in equation (5) have been presented in [12].
2) The Nearest-Depot Assignment (NDA) Policy: We next present the Nearest-Depot Assignment (NDA) policy, which requires the least amount of information and is optimal in light-load.
The Nearest-Depot Assignment (NDA) Policy — Let $\tilde{P}_m^*(Q) := \arg\max_{P \in Q^m} L_m(P, Q)$ (if there are multiple maxima, pick one arbitrarily), and let $\tilde{p}_k^*$ be the location of the depot for the kth vehicle, $k \in \{1, \dots, m\}$. Assign a newly arrived demand to the vehicle whose depot is the nearest to that demand’s location, and let $D_k$ be the set of outstanding demands assigned to vehicle k. If the set $D_k$ is empty, move to $\tilde{p}_k^*$; otherwise, visit demands in $D_k$ in first-come, first-served order, by taking the shortest path to each demand location. Repeat.
In [12] we prove that the NDA policy is optimal in light-load under any information structure. Note that the NDA policy is very similar to the SQM policy described in section III-D; the only difference is that the depot locations are now the maximizers of $L_m$, instead of the minimizers of $H_m$.
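As an illustration of the NDA policy and of the quantity $L_m(P, Q)$ on which it is built, the following is a minimal Python sketch, not the implementation from [12]. It assumes that Q is the unit square, estimates the integral in $L_m$ by Monte Carlo sampling, and takes the depot locations as given rather than computing the maximizer. The names `estimate_L`, `nearest_depot`, and `NDAFleet` are hypothetical, and the removal of demands that leave due to impatience is omitted.

```python
import math
import random
from collections import deque


def estimate_L(depots, v, patience_cdf, n_samples=20000, seed=0):
    """Monte Carlo estimate of L_m(P, Q) for Q = [0, 1]^2 (an assumption).

    L_m(P, Q) = 1 - (1/|Q|) * integral over Q of F_G(min_k ||x - p_k|| / v) dx,
    so we average F_G(travel time from the nearest depot) over uniform samples.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = (rng.random(), rng.random())
        travel_time = min(math.dist(x, p) for p in depots) / v
        total += patience_cdf(travel_time)
    return 1.0 - total / n_samples


def nearest_depot(depots, location):
    """Index of the depot closest to a demand location."""
    return min(range(len(depots)), key=lambda k: math.dist(depots[k], location))


class NDAFleet:
    """Bookkeeping for the NDA policy: one FCFS queue per vehicle/depot."""

    def __init__(self, depots):
        self.depots = list(depots)
        self.queues = [deque() for _ in self.depots]

    def on_arrival(self, location):
        """Assign a new demand to the vehicle whose depot is nearest."""
        k = nearest_depot(self.depots, location)
        self.queues[k].append(location)
        return k

    def next_target(self, k):
        """Oldest assigned demand if any, otherwise vehicle k's depot."""
        return self.queues[k][0] if self.queues[k] else self.depots[k]

    def on_service_start(self, k):
        """Remove and return the demand vehicle k has just reached."""
        return self.queues[k].popleft()


if __name__ == "__main__":
    depots = [(0.25, 0.5), (0.75, 0.5)]
    # Exponential patience with mean 1 (an illustrative choice of F_G).
    cdf = lambda t: 1.0 - math.exp(-t)
    print("estimated L_m:", estimate_L(depots, v=1.0, patience_cdf=cdf))
```

Evaluating `estimate_L` for candidate depot sets and increasing m until the estimate reaches $\phi_d$ gives a rough numerical feel for the lower-bound problem (5), although a proper solution would optimize the depot locations as described in [12].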
3) The Batch (B) Policy: Finally, we present the Batch (B) policy, which is well-defined for any information structure, however it is particularly tailored for the least informative case and is most effective in moderate and heavy-loads.
The Batch (B) Policy — Partition Q into m equal area regions $Q_k$, $k \in \{1, \dots, m\}$, and assign one vehicle to each region. Assign a newly arrived demand that falls in $Q_k$ to the vehicle responsible for region k, and let $D_k$ be the set of locations of outstanding demands assigned to vehicle k. For each vehicle-region pair k: if the set $D_k$ is empty, move to the median (the “depot”) of $Q_k$; otherwise, compute a TSP tour through all demands in $D_k$ and vehicle’s current position, and service demands by following the TSP tour, skipping demands that are no longer outstanding. Repeat.
Note that this policy is basically a simplified version of the m-DC policy (with r = 1).
The following theorem characterizes the batch policy, under the assumption of zero on-site service, and assuming the least informative information structure.
Theorem V.2 (Vehicles required by batch policy). Assuming zero on-site service, the batch policy guarantees a success factor at least as large as $\phi_d$ if the number of vehicles is equal to or larger than:
\[ \min\Big\{ m \;\Big|\; \sup_{\theta \in \mathbb{R}_{>0}} (1 - F_G(\theta))(1 - 2g(m)/\theta) \ge \phi_d \Big\}, \]
where
\[ g(m) := \frac{1}{2}\left( \frac{\bar{\beta}^2}{v^2} |Q| \frac{\lambda}{m^2} + \sqrt{ \frac{\bar{\beta}^4}{v^4} |Q|^2 \frac{\lambda^2}{m^4} + 8 \frac{\bar{\beta}^2}{v^2} |Q| \frac{1}{m} } \right), \]
and where $\bar{\beta}$ is a constant that depends on the shape of the service regions.
Furthermore, in [11] we show that when (i) the system is in heavy-load, (ii) $\phi_d$ tends to one, and (iii) the deadlines are deterministic, the batch policy requires a number of vehicles that is within a factor 3.78 of the optimal.
B. Priorities
In this section we look at a DVR problem in which demands for service have different levels of importance. The service vehicles must then prioritize, providing a quality of service which is proportional to each demand’s importance. We introduced this problem in [13]. Formally, we assume the environment $Q \subset \mathbb{R}^2$, with area |Q|, contains m vehicles, each with maximum speed v. Demands of type $\alpha \in \{1, \dots, n\}$, called α-demands, arrive in the environment according to a Poisson process with rate $\lambda_\alpha$. Upon arrival, demands assume an independently and uniformly distributed location in Q. An α-demand requires on-site service with finite mean $\bar{s}_\alpha$.
For this problem the load factor can be written as
\[ \varrho := \frac{1}{m} \sum_{\alpha=1}^{n} \lambda_\alpha \bar{s}_\alpha. \tag{6} \]
The condition $\varrho < 1$ is necessary for the existence of a stable policy. For a stable policy π, the average system time per demand is
\[ T_\pi = \frac{1}{\Lambda} \sum_{\alpha=1}^{n} \lambda_\alpha T_{\pi,\alpha}, \]
where $\Lambda := \sum_{\alpha=1}^{n} \lambda_\alpha$, and $T_{\pi,\alpha}$ is the expected system time of α-demands (under routing policy π). The average system time per demand is the standard cost functional for queueing systems with multiple classes of demands. Notice that we can write $T_\pi = \sum_{\alpha=1}^{n} c_\alpha T_{\pi,\alpha}$ with $c_\alpha = \lambda_\alpha / \Lambda$. Thus, if we aim to assign distinct importance levels, we can model priority among classes by allowing any convex combination of $T_{\pi,1}, \dots, T_{\pi,n}$. If $c_\alpha > \lambda_\alpha / \Lambda$, then the system time of α-demands is being weighted more heavily than in the average case. In other words, the quantity $c_\alpha \Lambda / \lambda_\alpha$ gives the priority of α-demands compared to that given in the average system time case. Without loss of generality we can assume that priority classes are labeled so that
\[ \frac{c_1}{\lambda_1} \ge \frac{c_2}{\lambda_2} \ge \cdots \ge \frac{c_n}{\lambda_n}, \tag{7} \]
implying that if α < β for some $\alpha, \beta \in \{1, \dots, n\}$, then the priority of α-demands is at least as high as that of β-demands.
The problem is as follows. Consider a set of coefficients $c_\alpha > 0$, $\alpha \in \{1, \dots, n\}$, with $\sum_{\alpha=1}^{n} c_\alpha = 1$, and satisfying expression (7). Determine the policy π (if it exists) which minimizes the cost
\[ T_{\pi,c} := \sum_{\alpha=1}^{n} c_\alpha T_{\pi,\alpha}. \]
In the light-load case where $\varrho \to 0^+$ we can use existing policies to solve the problem. This is summarized in the following remark.
Remark V.3 (Light-load regime). In light-load, it can be verified that the Stochastic Queue Median policy (see Section III-D) provides optimal performance. That is, the vehicles can simply ignore the priorities and service the demands in the FCFS order, returning to their median locations between each service.
1) Lower Bound in Heavy-Load: In this section we present a lower bound on the weighted system time $T_{\pi,c}$ for every policy π.
Theorem V.4 (Heavy-load lower bound). The system time of any policy π is lower bounded by
\[ T_{\pi,c} \ge \frac{\beta_{\mathrm{TSP},2}^2 |Q|}{2 m^2 v^2 (1 - \varrho)^2} \sum_{\alpha=1}^{n} \Big( c_\alpha + 2 \sum_{j=\alpha+1}^{n} c_j \Big) \lambda_\alpha, \tag{8} \]
as $\varrho \to 1^-$, where $c_1, \dots, c_n$ satisfy expression (7).
Remark V.5 (Lower bound for all $\varrho \in [0, 1)$). Lower bound (8) holds only in heavy-load. We can also obtain a lower bound that is valid for all values of $\varrho$. However, in the heavy-load limit it is less tight than bound (8). Under the labeling in expression (7), this general bound for any policy
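To make the quantities of this subsection concrete, the sketch below evaluates the load factor (6), checks the labeling (7), and computes the right-hand side of the heavy-load bound (8), read here as $\frac{\beta_{\mathrm{TSP},2}^2 |Q|}{2 m^2 v^2 (1-\varrho)^2} \sum_{\alpha} (c_\alpha + 2 \sum_{j>\alpha} c_j) \lambda_\alpha$. It is an illustration only: the function names are hypothetical, and the constant $\beta_{\mathrm{TSP},2}$ must be supplied by the caller.

```python
def load_factor(lam, sbar, m):
    """Load factor (6): (1/m) * sum over classes of lam[a] * sbar[a]."""
    return sum(l * s for l, s in zip(lam, sbar)) / m


def satisfies_labeling(c, lam):
    """Check (7): c_1/lam_1 >= c_2/lam_2 >= ... >= c_n/lam_n."""
    ratios = [ci / li for ci, li in zip(c, lam)]
    return all(a >= b for a, b in zip(ratios, ratios[1:]))


def heavy_load_lower_bound(c, lam, sbar, m, v, area, beta_tsp):
    """Right-hand side of (8); informative only as the load factor tends to 1."""
    rho = load_factor(lam, sbar, m)
    if not 0.0 <= rho < 1.0:
        raise ValueError("a stable policy requires a load factor below 1")
    if not satisfies_labeling(c, lam):
        raise ValueError("classes must be labeled as in expression (7)")
    n = len(c)
    # Class a contributes (c_a + 2 * sum of c_j over lower-priority classes j > a) * lam_a.
    weighted = sum((c[a] + 2.0 * sum(c[a + 1:])) * lam[a] for a in range(n))
    prefactor = beta_tsp ** 2 * area / (2.0 * m ** 2 * v ** 2 * (1.0 - rho) ** 2)
    return prefactor * weighted


if __name__ == "__main__":
    # Two classes, with the first weighted more heavily than its arrival share.
    print(heavy_load_lower_bound(c=[0.7, 0.3], lam=[1.0, 1.0], sbar=[0.4, 0.4],
                                 m=1, v=1.0, area=1.0, beta_tsp=0.7))
```

The value `beta_tsp=0.7` in the usage example is a placeholder chosen for illustration, not the constant used in the paper.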
Fig. 5. Illustration of the Strip Loitering policy. The trajectory providing closure of the loitering path (along which the vehicles travel from the end of the last strip to the beginning of the first strip) is not shown here for clarity of the drawing.
Fig. 6. Close-up of the Strip Loitering policy with construction of the point of departure and the distances $d_1$ and $d_2$ for a given demand, at the instant of appearance.
Divide Q into strips of width r, where
\[ r = \min\left\{ \left( \frac{4}{3\sqrt{\rho}} \, \frac{RS + 10.38 \rho S}{m} \right)^{2/3}, \; 2\rho \right\}. \]
Orient the strips along the side of length R. Construct a closed Dubins path, henceforth referred to as the loitering path, which runs along the longitudinal bisector of each strip, visiting all strips in top-to-bottom sequence, making U-turns between strips at the edges of Q, and finally returning to the initial configuration. The m vehicles are allotted loitering positions on this path, equally spaced, in terms of path length.
When a demand arrives, it is allocated to the closest vehicle among those that lie within the same strip as the demand and that have the demand in front of them. When a vehicle has no outstanding demands, the vehicle returns to its loitering position as follows. (We restrict the exposition to the case when a vehicle has only one outstanding demand when it leaves its loitering position and no more demands are allotted to it before it returns to its loitering position; other cases can be handled similarly.) After making a left turn of length $d_2$ (as shown in Figure 6) to service the demand, the vehicle makes a right turn of length $2d_2$ followed by another left turn of length $d_2$, and then returns to the loitering path. However, the vehicle has fallen behind in the loitering path by a distance $4(d_2 - \rho \sin(d_2/\rho))$. To rectify this, as it nears the end of the current strip, it takes its U-turn a distance $2(d_2 - \rho \sin(d_2/\rho))$ early.
Note that the loitering path must cover Q, but it need not cover the entire bounding box of Q. The bounding box is merely a construction used to place an upper bound on the total path length.
The MC and SL policies will be proven to be efficient in light-load. We now propose the Bead Tiling policy which will be proven to be efficient in heavy-load. A key component of the algorithm is the construction of a novel geometric set, tuned to the kinetic constraints of the Dubins vehicle, called the bead [17]. The construction of a bead $B_\rho(\ell)$ for a given ρ and an additional parameter $\ell > 0$ is illustrated in Figure 7. We start with the policy for a single vehicle.
Fig. 7. An illustration for the construction of the bead for a given ρ and ℓ.
The Bead Tiling (BT) Policy — Bound the environment Q with a rectangle of minimum height, where height denotes the smaller of the two side lengths of a rectangle. Let R and S be the width and height of this bounding rectangle, respectively. Tile the plane with identical beads $B_\rho(\ell)$ with $\ell = \min\{C_{\mathrm{BTA}} v/\lambda, 4\rho\}$, where
\[ C_{\mathrm{BTA}} = \frac{7 - \sqrt{17}}{4} \left( 1 + \frac{7\pi\rho S}{3|Q|} \right)^{-1}. \]
The beads are oriented to be along the width of the bounding rectangle. The Dubins vehicle visits all beads intersecting Q in a row-by-row fashion in top-to-bottom sequence, servicing at least one demand in every nonempty bead. This process is repeated indefinitely.
The BT policy is extended to the m-vehicle Bead Tiling (mBT) policy in the following way (see Figure 8).
Fig. 8. An illustration of the mBT policy.
The m-vehicle Bead Tiling (mBT) Policy — Divide the environment into regions of dominance with
lines parallel to the bead rows. Let the area and height of the i-th vehicle’s region be denoted with $|Q|_i$ and $S_i$. Place the subregion dividers in such a way that
\[ |Q|_i + \frac{7}{3}\pi\rho S_i = \frac{1}{m}\left( |Q| + \frac{7}{3}\pi\rho S \right) \]
for all $i \in \{1, \dots, m\}$. Allocate one subregion to every vehicle and let each vehicle execute the BT policy in its own region.
Theorem VI.3 together with Theorem VI.1 implies that the mBT policy is within a constant factor of the optimal in heavy-load, and that the optimal system time in this case belongs to $\Theta(\lambda^2/(mv)^3)$.
It is instructive to compare the scaling of the optimal system time with respect to λ, m and v for the m-DTRP and for the m-Dubins DTRP. Such comparison is shown in Table III. One
TABLE III
A COMPARISON BETWEEN THE SCALING OF THE OPTIMAL SYSTEM TIME FOR THE m-DTRP AND FOR THE m-DUBINS DTRP.
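The parameters of the BT and mBT policies can be computed directly from their definitions. The following small sketch assumes that the constant reads $C_{\mathrm{BTA}} = \frac{7-\sqrt{17}}{4}\left(1 + \frac{7\pi\rho S}{3|Q|}\right)^{-1}$, that the bead length is $\ell = \min\{C_{\mathrm{BTA}} v/\lambda, 4\rho\}$, and that each mBT subregion satisfies $|Q|_i + \frac{7}{3}\pi\rho S_i = \frac{1}{m}(|Q| + \frac{7}{3}\pi\rho S)$. The function names are hypothetical.

```python
import math


def c_bta(rho, S, area):
    """Constant C_BTA for turning radius rho, bounding height S, and area |Q|."""
    return (7.0 - math.sqrt(17.0)) / 4.0 / (1.0 + 7.0 * math.pi * rho * S / (3.0 * area))


def bead_length(rho, S, area, v, lam):
    """Bead length: min{C_BTA * v / lam, 4 * rho}."""
    return min(c_bta(rho, S, area) * v / lam, 4.0 * rho)


def mbt_region_target(rho, S, area, m):
    """Value each subregion must attain for |Q|_i + (7/3) * pi * rho * S_i."""
    return (area + 7.0 / 3.0 * math.pi * rho * S) / m


if __name__ == "__main__":
    # Illustrative numbers only: unit turning radius in a 10 x 5 rectangle.
    print(bead_length(rho=1.0, S=5.0, area=50.0, v=1.0, lam=2.0))
    print(mbt_region_target(rho=1.0, S=5.0, area=50.0, m=4))
```

Under these formulas the bead length shrinks like v/λ as the arrival rate grows, and the cap 4ρ takes over when arrivals are rare.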
VII. ROUTING FOR ROBOTIC NETWORKS: TEAM FORMING FOR COOPERATIVE TASKS
Here we study demands (or tasks) that require the simultaneous services of several vehicles [20]. In particular, consider m vehicles, each capable of providing one of k services. We assume that there are $m_j > 0$ vehicles capable of providing service j (called vehicles of service-type j), for each $j \in \{1, \dots, k\}$, and thus $m := \sum_{j=1}^{k} m_j$.
In addition, we assume there are K different types of tasks. Tasks of type $\alpha \in \{1, \dots, K\}$ arrive according to a Poisson process with rate $\lambda_\alpha$, and assume a location i.i.d. uniformly in Q.² The total arrival rate is $\lambda := \sum_{\alpha=1}^{K} \lambda_\alpha$. Each task-type α requires a subset of the k services. We record the required services in a zero-one (column) vector $R_\alpha \in \{0, 1\}^k$. The jth entry of $R_\alpha$ is 1 if service j is required for task-type α, and 0 otherwise. The on-site service time for each task-type α has mean $\bar{s}_\alpha$. To complete a task of type α, a team of vehicles capable of providing the required services must travel to the task location and remain there simultaneously for the on-site service time. We refer to this problem as the dynamic team forming problem [20].
As a motivating example, consider the scenario given in Section I where each demand (or task) corresponds to an event that requires close-range observation. The sensors required to properly assess each event will depend on that event’s properties. In particular, an event may require several different sensing modalities, such as electro-optical, infra-red, synthetic aperture radar, foliage penetrating radar, and moving target indication radar [44]. One solution would be to equip each UAV with all sensing modalities (or services). However, in many cases, most events will require only a few sensing modalities. Thus, we might increase our efficiency by having a larger number of UAVs, each equipped with a single modality, and then forming the appropriate sensing team to observe each event.
We restrict our attention to task-type unbiased policies; policies π for which the system time of each task (denoted by $T_{\pi,\alpha}$) is equal, and thus $T_{\pi,1} = T_{\pi,2} = \cdots = T_{\pi,K} =: T_\pi$. We seek policies π which minimize the expected system time of tasks $T_\pi$. Policies of this type are amenable to analysis because the task-type unbiased constraint collapses the feasible set of system times from a subset of $\mathbb{R}^K$ to a subset of $\mathbb{R}$.
Defining the matrix
\[ R := [R_1 \cdots R_K] \in \{0, 1\}^{k \times K}, \tag{15} \]
a necessary condition for stability is
\[ R[\lambda_1 \bar{s}_1 \cdots \lambda_K \bar{s}_K]^T < [m_1 \cdots m_k]^T \tag{16} \]
component-wise. Note that this condition is akin to the “load factor” in Subsection III-B. However, the space of load factors is much richer, and thus light and heavy-load are no longer simply defined. To simplify the problem we take an alternative approach. We study the performance as the number of vehicles becomes very large, i.e., $m \to +\infty$. In addition, we simply look at the order of the expected delay, without concern for the constant factors. It turns out that this analysis provides substantial insight into the performance of different team forming policies. This type of asymptotic analysis is frequently performed in computational complexity [45] and ad-hoc networking [46].
A. Three Team Forming Policies
We now present three team forming policies.
The Complete Team (CT) Policy — Form m/k teams of k vehicles, where each team contains one vehicle of each service-type. Each team meets and moves as a single entity. As tasks arrive, service them by one of the m/k teams according to the UTSP policy.
For the second policy, recall that the vector $R 1_K$ records in its jth entry the number of task-types that require service j, where $1_K$ is a K × 1 vector of ones. Thus, if
\[ R 1_K \le [m_1, \dots, m_k]^T \tag{17} \]
component-wise, then there are enough vehicles of each service-type to create
\[ m_{\mathrm{TST}} := \left\lfloor \min\left\{ \frac{m_j}{e_j^T R 1_K} \;\middle|\; j \in \{1, \dots, k\} \right\} \right\rfloor \]
dedicated teams for each task-type, where $e_j$ is the jth vector of the standard basis of $\mathbb{R}^k$. Thus, when equation (17) is satisfied, we have the following policy.
The Task-Specific Team (TT) Policy — For each of the K task-types, create $m_{\mathrm{TST}}$ teams of vehicles, where there is one vehicle in the team for each service required by the task-type. Service each task by one of its $m_{\mathrm{TST}}$ corresponding teams, according to the UTSP policy.
The task-specific team policy can be applied only when there is a sufficient number of vehicles of each service-type. The following policy requires only a single vehicle of each service type. The policy partitions the task-types into groups, where each group is chosen such that there is a sufficient number of vehicles to create a dedicated team for each task-type in the group. The task-specific team policy is then run on each group sequentially. The groups are defined via a service schedule which is a partition of the K task-types into L ≤ K time slots, such that each task-type appears in precisely one time slot, and the task-types in each time slot are pairwise disjoint (i.e., in a given time slot, each service appears in at most one task-type).³ We now formally present the third policy.
The Scheduled Task-Specific Team (STT) Policy — Partition Q into $m_{\mathrm{CT}} := \min\{m_1, \dots, m_k\}$ equal area regions and assign one vehicle of each service-type to each region. In each region form a queue for each of the K task-types. For each time slot in the schedule and each task-type in the time slot, create
² As in Section V, the algorithms in this section extend directly to a non-uniform spatial density by utilizing simultaneously equitable partitions.
³ Computing an optimal schedule is equivalent to solving a vertex coloring problem, which is NP-hard. However, an approximate schedule can be computed via known vertex coloring heuristics; see [20] for more details.
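The footnote on service schedules can be illustrated with a short sketch. It assumes a simple greedy first-fit heuristic, chosen here only for illustration (the source does not specify which coloring heuristic [20] uses): task-types are vertices, two task-types conflict when they share a required service, and each time slot is a color class. The function name and the data layout (one set of service indices per task-type) are hypothetical.

```python
def greedy_service_schedule(required_services):
    """Greedy first-fit service schedule.

    required_services[a] is the set of services needed by task-type a.
    Returns a list of time slots, each a sorted list of task-type indices whose
    service sets are pairwise disjoint. Not guaranteed to use the fewest slots.
    """
    # Place task-types with the most required services first (a common heuristic).
    order = sorted(range(len(required_services)),
                   key=lambda a: -len(required_services[a]))
    slots = []  # each entry: (services already used in the slot, member task-types)
    for a in order:
        for used, members in slots:
            if used.isdisjoint(required_services[a]):
                used.update(required_services[a])
                members.append(a)
                break
        else:
            # No existing slot is compatible, so open a new one.
            slots.append((set(required_services[a]), [a]))
    return [sorted(members) for _, members in slots]


if __name__ == "__main__":
    # Four task-types over services 0..3.
    tasks = [{0, 1}, {1, 2}, {2, 3}, {0}]
    print(greedy_service_schedule(tasks))
```

Each task-type lands in exactly one slot and no service is required twice within a slot, which are the two properties the service schedule asks for; the number of slots L produced this way is only an upper bound on the optimum.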
a team containing one vehicle for each required
following an optimal TSP tour. When the end of the
(see [20] for details).
[Figure: System Time T versus Throughput Bm, with axis marks T ord, T min, and Bcrit.]
B. Performance of Policies
To analyze the performance of the policies we make the following simplifying assumptions: (A1) There are n/k vehicles
have presented dynamic vehicle routing algorithms with prov- [4] R. W. Beard, T. W. McLain, M. A. Goodrich, and E. P. Anderson,
able performance guarantees for several important problems. “Coordinated target assignment and intercept for unmanned air vehicles,”
IEEE Transactions on Robotics and Automation, vol. 18, no. 6, pp. 911–
These include adaptive and decentralized implementations, 922, 2002.
demands with time constraints and priority levels, vehicles [5] G. Arslan, J. R. Marden, and J. S. Shamma, “Autonomous vehicle-target
with motion constraints, and team forming. These results assignment: A game theoretic formulation,” ASME Journal on Dynamic
Systems, Measurement, and Control, vol. 129, no. 5, pp. 584–596, 2007.
complement those from the online algorithms literature, in [6] P. Toth and D. Vigo, eds., The Vehicle Routing Problem. Monographs
that they characterize average case performance (rather than on Discrete Mathematics and Applications, SIAM, 2001.
worst-case), and exploit probabilistic knowledge about future [7] D. J. Bertsimas and G. J. van Ryzin, “A stochastic and dynamic vehicle
demands. routing problem in the Euclidean plane,” Operations Research, vol. 39,
pp. 601–615, 1991.
Dynamic vehicle routing is an active area of research and, [8] D. J. Bertsimas and G. J. van Ryzin, “Stochastic and dynamic vehicle
in recent years, several directions have been pursued which routing in the Euclidean plane with multiple capacitated vehicles,”
were not covered in this paper. In [14], [48], we consider Operations Research, vol. 41, no. 1, pp. 60–76, 1993.
[9] D. J. Bertsimas and G. J. van Ryzin, “Stochastic and dynamic ve-
moving demands. The work focuses on demands that arrive hicle routing with general interarrival and service time distributions,”
on a line segment, and move in a perpendicular direction Advances in Applied Probability, vol. 25, pp. 947–978, 1993.
at fixed speed. The problem has applications in perimeter [10] H. N. Psaraftis, “Dynamic programming solution to the single vehicle
many-to-many immediate request dial-a-ride problem,” Transportation
defense as well as robotic pick-and-place operations. In [19], Science, vol. 14, no. 2, pp. 130–154, 1980.
a setup is considered where the information on outstanding [11] M. Pavone, N. Bisnik, E. Frazzoli, and V. Isler, “A stochastic and
demands is provided to the vehicles through limited-range on- dynamic vehicle routing problem with time windows and customer im-
board sensors, thus adding a search component to the DVR patience,” ACM/Springer Journal of Mobile Networks and Applications,
vol. 14, no. 3, pp. 350–364, 2009.
problem with full information. The work in [49] and [50] [12] M. Pavone and E. Frazzoli, “Dynamic vehicle routing with stochastic
considers the dynamic pickup and delivery problem, where time constraints,” in Proc. IEEE Conf. on Robotics and Automation,
each demand consists of a source-destination pair, and the (Anchorage, Alaska), May 2010. To Appear.
[13] S. L. Smith, M. Pavone, F. Bullo, and E. Frazzoli, “Dynamic vehicle
vehicles are responsible for picking up a message at the source, routing with priority classes of stochastic demands,” SIAM Journal on
and delivering it to the destination. In [8], the authors consider Control and Optimization, vol. 48, no. 5, pp. 3224–3245, 2010.
the case in which each vehicle can serve at most a finite [14] S. D. Bopardikar, S. L. Smith, F. Bullo, and J. P. Hespanha, “Dynamic
vehicle routing for translating demands: Stability analysis and receding-
number of demands before returning to a depot for refilling. In horizon policies,” IEEE Transactions on Automatic Control, vol. 55,
[51], a DVR problem is considered involving vehicles whose no. 11, 2010. (Submitted, Mar 2009) to appear.
dynamics can be modeled by state space models that are [15] M. Pavone, E. Frazzoli, and F. Bullo, “Adaptive and distributed algo-
affine in control and have an output in R2 . Finally, in [21] rithms for vehicle routing in a stochastic and dynamic environment,”
IEEE Transactions on Automatic Control, 2009. Provisionally Accepted,
we consider a setup where the servicing of a demand needs http://arxiv.org/abs/0903.3624.
to be done by a vehicle under the supervision of a remotely [16] A. Arsie, K. Savla, and E. Frazzoli, “Efficient routing algorithms for
located human operator. multiple vehicles with no explicit communications,” IEEE Transactions
on Automatic Control, vol. 54, no. 10, pp. 2302–2317, 2009.
The dynamic vehicle routing approach presented in this [17] K. Savla, E. Frazzoli, and F. Bullo, “Traveling Salesperson Problems for
paper provides a new way of studying robotic systems in dy- the Dubins vehicle,” IEEE Transactions on Automatic Control, vol. 53,
namically changing environments. We have presented results no. 6, pp. 1378–1391, 2008.
[18] J. J. Enright, K. Savla, E. Frazzoli, and F. Bullo, “Stochastic and dynamic
for a wide variety of problems. However, this is by no means routing problems for multiple UAVs,” AIAA Journal of Guidance,
a closed book. There is great potential for obtaining more Control, and Dynamics, vol. 34, no. 4, pp. 1152–1166, 2009.
general performance guarantees by developing methods to deal [19] J. J. Enright and E. Frazzoli, “Cooperative UAV routing with limited
with correlation between demand positions. In addition, there are many other key problems in robotic systems that could benefit from being studied from the perspective presented in this paper. Some examples include search and rescue missions, force protection, map maintenance, and pursuit-evasion.

ACKNOWLEDGMENTS
The authors wish to thank Alessandro Arsie, Shaunak D. Bopardikar, and John J. Enright for numerous helpful discussions about topics related to this paper.

REFERENCES
[1] B. J. Moore and K. M. Passino, “Distributed task assignment for mobile agents,” IEEE Transactions on Automatic Control, vol. 52, no. 4, pp. 749–753, 2007.
[2] S. L. Smith and F. Bullo, “Monotonic target assignment for robotic networks,” IEEE Transactions on Automatic Control, vol. 54, no. 9, pp. 2042–2057, 2009.
[3] M. Alighanbari and J. P. How, “A robust approach to the UAV task assignment problem,” International Journal on Robust and Nonlinear Control, vol. 18, no. 2, pp. 118–134, 2008.
sensor range,” in AIAA Conf. on Guidance, Navigation and Control, (Keystone, CO), Aug. 2006.
[20] S. L. Smith and F. Bullo, “The dynamic team forming problem: Throughput and delay for unbiased policies,” Systems & Control Letters, vol. 58, no. 10-11, pp. 709–715, 2009.
[21] K. Savla, T. Temple, and E. Frazzoli, “Human-in-the-loop vehicle routing policies for dynamic environments,” in IEEE Conf. on Decision and Control, (Cancún, México), pp. 1145–1150, Dec. 2008.
[22] D. D. Sleator and R. E. Tarjan, “Amortized efficiency of list update and paging rules,” Communications of the ACM, vol. 28, no. 2, pp. 202–208, 1985.
[23] S. O. Krumke, W. E. de Paepe, D. Poensgen, and L. Stougie, “News from the online traveling repairman,” Theoretical Computer Science, vol. 295, no. 1-3, pp. 279–294, 2003.
[24] S. Irani, X. Lu, and A. Regan, “On-line algorithms for the dynamic traveling repair problem,” Journal of Scheduling, vol. 7, no. 3, pp. 243–258, 2004.
[25] P. Jaillet and M. R. Wagner, “Online routing problems: Value of advanced information and improved competitive ratios,” Transportation Science, vol. 40, no. 2, pp. 200–210, 2006.
[26] B. Golden, S. Raghavan, and E. Wasil, The Vehicle Routing Problem: Latest Advances and New Challenges, vol. 43 of Operations Research/Computer Science Interfaces. Springer, 2008.
[27] P. Van Hentenryck, R. Bent, and E. Upfal, “Online stochastic optimization under time constraints,” Annals of Operations Research, 2009. To appear.
[28] P. K. Agarwal and M. Sharir, “Efficient algorithms for geometric optimization,” ACM Computing Surveys, vol. 30, no. 4, pp. 412–458, 1998.
[29] Z. Drezner, ed., Facility Location: A Survey of Applications and Methods. Series in Operations Research, Springer, 1995.
[30] H. Xu, Optimal Policies for Stochastic and Dynamic Vehicle Routing Problems. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1995.
[31] S. Bespamyatnikh, D. Kirkpatrick, and J. Snoeyink, “Generalizing ham sandwich cuts to equitable subdivisions,” Discrete & Computational Geometry, vol. 24, pp. 605–622, 2000.
[32] M. Pavone, A. Arsie, E. Frazzoli, and F. Bullo, “Equitable partitioning policies for robotic networks,” IEEE Transactions on Automatic Control, 2009. Provisionally Accepted, available at http://arxiv.org/abs/0903.5267.
[33] J. D. Papastavrou, “A stochastic and dynamic routing policy using branching processes with state dependent immigration,” European Journal of Operational Research, vol. 95, pp. 167–177, 1996.
[34] M. Pavone, Dynamic Vehicle Routing for Robotic Networks. Dept. of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA, 2010.
[35] L. E. Dubins, “On curves of minimal length with a constraint on average curvature and with prescribed initial and terminal positions and tangents,” American Journal of Mathematics, vol. 79, pp. 497–516, 1957.
[36] S. M. LaValle, Planning Algorithms. Cambridge University Press, 2006. Available at http://planning.cs.uiuc.edu.
[37] U. Boscain and B. Piccoli, Optimal Syntheses for Control Systems on 2-D Manifolds. Mathématiques et Applications, Springer, 2004.
[38] L. Pallottino and A. Bicchi, “On optimal cooperative conflict resolution for air traffic management systems,” IEEE Transactions on Intelligent Transportation Systems, vol. 1, no. 4, pp. 221–231, 2000.
[39] C. Tomlin, I. Mitchell, and R. Ghosh, “Safety verification of conflict resolution manoeuvres,” IEEE Transactions on Intelligent Transportation Systems, vol. 2, no. 2, pp. 110–120, 2001.
[40] P. Chandler, S. Rasmussen, and M. Pachter, “UAV cooperative path planning,” in AIAA Conf. on Guidance, Navigation and Control, (Denver, CO), Aug. 2000.
[41] C. Schumacher, P. R. Chandler, S. J. Rasmussen, and D. Walker, “Task allocation for wide area search munitions with variable path length,” in American Control Conference, (Denver, CO), pp. 3472–3477, 2003.
[42] E. Zemel, “Probabilistic analysis of geometric location problems,” Annals of Operations Research, vol. 1, no. 3, pp. 215–238, 1984.
[43] K. Savla and E. Frazzoli, “On endogenous reconfiguration for mobile robotic networks,” in Workshop on Algorithmic Foundations of Robotics, (Guanajuato, Mexico), Dec. 2008.
[44] E. K. P. Chong, C. M. Kreucher, and A. O. Hero III, “Monte-Carlo-based partially observable Markov decision process approximations for adaptive sensing,” in Int. Workshop on Discrete Event Systems, (Göteborg, Sweden), pp. 173–180, May 2008.
[45] B. Korte and J. Vygen, Combinatorial Optimization: Theory and Algorithms, vol. 21 of Algorithmics and Combinatorics. Springer, 4 ed., 2007.
[46] P. Gupta and P. R. Kumar, “The capacity of wireless networks,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 388–404, 2000.
[47] V. Sharma, M. Savchenko, E. Frazzoli, and P. Voulgaris, “Transfer time complexity of conflict-free vehicle routing with no communications,” International Journal of Robotics Research, vol. 26, no. 3, pp. 255–272, 2007.
[48] S. L. Smith, S. D. Bopardikar, and F. Bullo, “A dynamic boundary guarding problem with translating demands,” in IEEE Conf. on Decision and Control, (Shanghai, China), pp. 8543–8548, Dec. 2009.
[49] H. A. Waisanen, D. Shah, and M. A. Dahleh, “A dynamic pickup and delivery problem in mobile networks under information constraints,” IEEE Transactions on Automatic Control, vol. 53, no. 6, pp. 1419–1433, 2008.
[50] M. R. Swihart and J. D. Papastavrou, “A stochastic and dynamic model for the single-vehicle pick-up delivery problem,” European Journal of Operational Research, vol. 114, pp. 447–464, 1999.
[51] S. Itani, E. Frazzoli, and M. A. Dahleh, “Dynamic travelling repairperson problem for dynamic systems,” in IEEE Conf. on Decision and Control, (Cancún, México), pp. 465–470, Dec. 2008.
[52] J. Beardwood, J. Halton, and J. Hammersley, “The shortest path through many points,” in Proceedings of the Cambridge Philosophical Society, vol. 55, pp. 299–327, 1959.
[53] J. M. Steele, “Probabilistic and worst case analyses of classical problems of combinatorial optimization in Euclidean space,” Mathematics of Operations Research, vol. 15, no. 4, p. 749, 1990.
[54] A. G. Percus and O. C. Martin, “Finite size and dimensional dependence of the Euclidean traveling salesman problem,” Physical Review Letters, vol. 76, no. 8, pp. 1188–1191, 1996.
[55] R. C. Larson and A. R. Odoni, Urban Operations Research. Prentice Hall, 1981.
[56] D. Applegate, R. Bixby, V. Chvátal, and W. Cook, “On the solution of traveling salesman problems,” in Documenta Mathematica, Journal der Deutschen Mathematiker-Vereinigung, (Berlin, Germany), pp. 645–656, Aug. 1998. Proceedings of the International Congress of Mathematicians, Extra Volume ICM III.
[57] N. Christofides, “Worst-case analysis of a new heuristic for the traveling salesman problem,” Tech. Rep. 388, Carnegie Mellon University, Pittsburgh, PA, Apr. 1976.
[58] S. Arora, “Nearly linear time approximation scheme for Euclidean TSP and other geometric problems,” in IEEE Symposium on Foundations of Computer Science, (Miami Beach, FL), pp. 554–563, Oct. 1997.
[59] S. Lin and B. W. Kernighan, “An effective heuristic algorithm for the traveling-salesman problem,” Operations Research, vol. 21, pp. 498–516, 1973.
[60] N. Megiddo and K. J. Supowit, “On the complexity of some common geometric location problems,” SIAM Journal on Computing, vol. 13, no. 1, pp. 182–196, 1984.
[61] C. H. Papadimitriou, “Worst-case and probabilistic analysis of a geometric location problem,” SIAM Journal on Computing, vol. 10, no. 3, 1981.

APPENDIX

A. Asymptotic Properties of the Traveling Salesman Problem in the Euclidean Plane

Let $D$ be a set of $n$ points in $\mathbb{R}^d$ and let $\mathrm{TSP}(D)$ denote the minimum length of a tour through all the points in $D$; by convention, $\mathrm{TSP}(\emptyset) = 0$. Assume that the locations of the $n$ points are random variables independently and identically distributed in a compact set $Q$; in [52], [53] it is shown that there exists a constant $\beta_{\mathrm{TSP},d}$ such that, almost surely,

$$\lim_{n \to +\infty} \frac{\mathrm{TSP}(D)}{n^{1-1/d}} = \beta_{\mathrm{TSP},d} \int_Q \bar{\varphi}(q)^{1-1/d} \, dq, \tag{20}$$

where $\bar{\varphi}$ is the density of the absolutely continuous part of the point distribution. For the case $d = 2$, the constant $\beta_{\mathrm{TSP},2}$ has been estimated numerically as $\beta_{\mathrm{TSP},2} \simeq 0.7120 \pm 0.0002$ [54]. Notice that the limit (20) holds for all compact sets: the shape of the set only affects the convergence rate to the limit. According to [55], if $Q$ is a “fairly compact and fairly convex” set in the plane, then equation (20) provides a “good” estimate of the optimal TSP tour length for values of $n$ as low as 15.

B. Tools for Solving TSPs

The TSP is known to be NP-hard, which suggests that there is no general algorithm capable of finding the optimal tour in an amount of time polynomial in the size of the input. Even though the exact optimal solution of a large TSP can be very hard to compute, several exact and heuristic algorithms and software tools are available for the numerical solution of TSPs.

The most advanced TSP solver to date is arguably concorde [56]. Polynomial-time algorithms are available for constant-factor approximations of TSP solutions, among which we mention Christofides’ algorithm [57]. On a more theoretical side, Arora proved the existence of polynomial-time approximation schemes for the Euclidean TSP, providing a $(1 + \varepsilon)$-factor approximation for any $\varepsilon > 0$ [58].
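As an illustration of how the asymptotic estimate (20) compares with a heuristic tour, the following minimal Python sketch may be useful. It rests on several assumptions that are not part of the paper: the points are uniformly distributed in the unit square (so that, for d = 2, the integral in (20) reduces to the square root of the area and the estimate becomes β·sqrt(n·area)); the tour is built with a nearest-neighbor construction followed by 2-opt local search, which is neither concorde, Christofides' algorithm, nor Arora's scheme and carries no approximation guarantee here; and all function names are hypothetical.

```python
import math
import random

BETA_TSP_2 = 0.7120  # numerical estimate of beta_{TSP,2} quoted in the text [54]


def bhh_estimate(n, area):
    """Estimate from (20) for d = 2, ASSUMING a uniform density on Q.

    With phi = 1/area on Q, the integral of phi^(1/2) over Q equals
    sqrt(area), so the estimate reduces to beta * sqrt(n * area).
    """
    return BETA_TSP_2 * math.sqrt(n * area)


def tour_length(points, tour):
    """Length of the closed tour that visits points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def nearest_neighbor_tour(points):
    """Greedy construction: always move to the closest unvisited point."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(points, tour):
    """Local search: reverse a segment whenever doing so shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a vertex
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Illustrative run: 100 uniform points in the unit square (area 1).
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(100)]
nn = nearest_neighbor_tour(pts)
opt2 = two_opt(pts, nn)
print("estimate from (20):", bhh_estimate(len(pts), 1.0))
print("nearest neighbor  :", tour_length(pts, nn))
print("after 2-opt       :", tour_length(pts, opt2))
```

The heuristic tour is an upper bound on the optimal length, whereas (20) describes the optimal length only in the limit of large n, so the two numbers should be expected to be of the same order rather than equal.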
$$= \sum_{k=1}^{m} \int_{V_k(P)} \| p_k - q \| \, \varphi(q) \, dq,$$
4 The concorde and linkern solvers are freely available for academic
research use at www.tsp.gatech.edu/concorde/index.html.