The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)
Learning Neuro-Symbolic Abstractions for Robot Planning and Learning
Naman Shah
School of Computing and Augmented Intelligence
Arizona State University, Tempe, AZ, USA, 85281
[email protected]
Abstract
Although state-of-the-art hierarchical robot planning algorithms allow robots to efficiently compute long-horizon motion plans for achieving user-desired tasks, these methods typically rely upon environment-dependent state and action abstractions that need to be hand-designed by experts. On the other hand, non-hierarchical robot planning approaches fail to compute solutions for complex tasks that require reasoning over a long horizon. My research addresses these problems by proposing an approach for learning abstractions and developing hierarchical planners that efficiently use learned abstractions to boost robot planning performance and provide strong guarantees of reliability.

Figure 1: The overall approach of my research. (a) shows the ground input motion planning problem. The next step is to identify critical regions, as shown in (b), and to use them to synthesize abstract states and actions, as shown in (c) using colored cells and arrows respectively.
1 Introduction

Robots need to plan their actions in order to complete complex tasks in various real-world domains. E.g., consider the problem shown in Fig. 1(a). However, robot planning over a long horizon is challenging due to the continuous state and action spaces of the robot. Hierarchical approaches (Garrett, Lozano-Pérez, and Kaelbling 2020; Shah et al. 2020) have shown that state and action abstractions can be used for efficient robot planning. Unfortunately, these approaches require sound abstractions that are consistent with the motion planning of the robot, and designing such abstractions is non-intuitive, non-trivial, and requires a domain expert. Most related approaches require hand-coded abstractions (Garrett, Lozano-Pérez, and Kaelbling 2020; Shah et al. 2020) or require experience in the test domain (Bagaria and Konidaris 2020) to learn abstractions.

My research aims to answer two crucial research questions: (1) Can we automatically learn effective hierarchical state and action abstractions that enable hierarchical planning, and (2) Is it possible to develop efficient approaches that use these automatically generated hierarchical abstractions for robot planning? My research focuses on developing data-driven neuro-symbolic approaches for automatically learning such hierarchical state and action abstractions for complex long-horizon robot planning tasks in unseen environments. I also develop hierarchical planners that use these learned abstractions for efficient robot planning.

Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2 Proposed Approach

My research develops data-driven neuro-symbolic approaches for learning to create hierarchical state and action abstractions for unseen environments. I use the concept of critical regions (Molina, Kumar, and Srivastava 2020) for constructing these hierarchical abstractions. Intuitively, critical regions generalize bottlenecks and hub or access points in an environment into a single concept. I propose to learn a critical region predictor using randomly generated motion plans in a few training environments and to use it to automatically identify critical regions in an unseen environment from an occupancy matrix of the environment.

My research uses these automatically identified critical regions to construct a region-based Voronoi diagram (RBVD), which partitions the configuration space into cells. Each cell defines an abstract state, inducing an abstraction function. High-level abstract actions are defined as transitions between the abstract states induced by the Voronoi cells. Fig. 1(c) shows an illustration of a region-based Voronoi diagram. Thus, we obtain a neuro-symbolic atomic abstract representation for an otherwise continuous configuration space of the robot.

Given an abstract representation with a discrete set of abstract states and abstract actions constructed in a bottom-up fashion as outlined above, I focus on developing hierarchical robot planning approaches that use these abstractions efficiently.
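The RBVD construction described above admits a compact sketch. The following is a minimal, illustrative example rather than the actual implementation: it assumes a 2-D occupancy grid and a set of already-identified critical region centers (standing in for the output of the learned predictor), partitions free cells by their nearest critical region to form the Voronoi cells, and reads the abstract states and actions off the partition. The function names and the toy grid are invented for illustration.

```python
from itertools import product

def rbvd_partition(occupancy, critical_regions):
    """Assign each free cell to its nearest critical region (squared
    Euclidean distance), yielding the cells of a region-based Voronoi
    diagram (RBVD). Each region index doubles as an abstract state."""
    assignment = {}
    for r, c in product(range(len(occupancy)), range(len(occupancy[0]))):
        if occupancy[r][c] == 1:  # 1 = obstacle, 0 = free
            continue
        assignment[(r, c)] = min(
            range(len(critical_regions)),
            key=lambda i: (critical_regions[i][0] - r) ** 2
                        + (critical_regions[i][1] - c) ** 2,
        )
    return assignment

def abstract_actions(assignment):
    """Abstract actions are transitions between adjacent Voronoi cells."""
    actions = set()
    for (r, c), s in assignment.items():
        for nbr in ((r + 1, c), (r, c + 1)):  # check down/right neighbors
            t = assignment.get(nbr)
            if t is not None and t != s:
                actions.add((s, t))
                actions.add((t, s))
    return actions

# Toy 4x4 environment with a wall and one doorway in the third row;
# the two critical region centers are assumed, not learned.
grid = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
cells = rbvd_partition(grid, critical_regions=[(0, 0), (0, 3)])
acts = abstract_actions(cells)
```

On this toy grid the two critical regions induce two abstract states, and the free cells straddling the doorway in the third row yield the single pair of abstract actions between them.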
My research proposes to develop hierarchical probabilistically-complete robot planning algorithms that interleave high-level symbolic reasoning with continuous low-level motion planning using learned state and action abstractions. Here, an interleaved approach implies that the developed algorithm iteratively searches for a high-level abstract plan that has a valid low-level refinement for all its symbolic actions. This yields a suite of hierarchical algorithms that provide strong theoretical guarantees of probabilistic completeness and downward refinability.

3 Preliminary Results

This section outlines multiple algorithms for hierarchical planning developed using the above-mentioned approach for solving robot planning problems. These approaches include a stochastic task and motion planning approach using hand-coded abstractions (Sec. 3.1) and hierarchical planning approaches using learned abstractions (Sec. 3.2 and 3.3).

3.1 Stochastic Task and Motion Planning

Shah et al. (2020) develop an interleaved algorithm for combined task and motion planning. It takes a continuous robot planning problem in the form of a stochastic shortest path (SSP) problem and an entity abstraction as input and uses them to compute a task and motion policy for the input SSP that the robot can execute at the low level. It iteratively computes a high-level policy and its refinements until it finds a policy that has valid motion planning refinements for all its actions.

The approach is evaluated in multiple settings where combined task and motion planning is necessary to compute feasible solutions. Refining each possible outcome in the policy can take a substantial amount of time. However, the ATAM algorithm (Shah et al. 2020) reduces the problem of selecting scenarios for refinement to a knapsack problem and uses a greedy approach to prioritize more likely outcomes for refinement. The empirical evaluation shows that this approach allows the robot to start executing actions much earlier compared to when actions are selected randomly. The detailed algorithm and experiments are available in the paper.

3.2 Robot Planning Using Learned Abstractions

Shah and Srivastava (2022b) develop a hierarchical planner -- the hierarchical abstraction-guided robot planner (HARP) -- that uses an automatically synthesized state abstraction in the form of a region-based Voronoi diagram and the action abstractions induced by it. The approach uses a multi-source multi-directional variant of beam search (Lowerre 1976) for computing a set of high-level plans and a multi-source multi-directional motion planner, LLP (Molina, Kumar, and Srivastava 2020), to simultaneously refine them into a motion plan. Multi-source approaches typically do not work for robot planning. However, critical regions provide crucial information about the states that the robot would potentially visit, allowing a multi-source approach to work for robot planning. In summary, Shah and Srivastava (2022b) develop the first approach for learning to create zero-shot state and action abstractions.

The approach is evaluated in multiple settings and compared against state-of-the-art sampling-based (Kavraki et al. 1996; LaValle et al. 1998) and learning-based (Molina, Kumar, and Srivastava 2020) motion planners. The results show that using hierarchical planning alongside learning significantly (∼10×) improves the efficiency.

3.3 Robot Planning Under Uncertainty

Shah and Srivastava (2022a) develop an approach -- the stochastic hierarchical abstraction-guided robot planner (SHARP) -- for computing motion policies for robots with stochastic dynamics. It uses the abstract states defined using an RBVD and defines options that make transitions between these abstract states. These options are multi-task, meaning the same set of options can be used for multiple problems in the same environment. SHARP uses A∗ search to compute a high-level plan by composing options and then uses an off-the-shelf DRL approach to compute policies for these options.

The approach is evaluated in 14 different settings and compared against a re-planning variant of RRT (LaValle et al. 1998), SAC (Haarnoja et al. 2018), and several HRL approaches. While most baselines failed to compute solutions, our approach significantly outperformed all the baselines owing to a dense auto-generated pseudo-reward and an effectively shorter horizon for learning reactive policies.

References

Bagaria, A.; and Konidaris, G. 2020. Option Discovery Using Deep Skill Chaining. In ICLR.

Garrett, C.; Lozano-Pérez, T.; and Kaelbling, L. 2020. PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning. In ICAPS.

Haarnoja, T.; Zhou, A.; Abbeel, P.; and Levine, S. 2018. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In ICML.

Kavraki, L. E.; Svestka, P.; Latombe, J.-C.; and Overmars, M. 1996. Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces. IEEE TRO, 12(4).

LaValle, S. M.; et al. 1998. Rapidly-Exploring Random Trees: A New Tool for Path Planning. Iowa State University.

Lowerre, B. T. 1976. The Harpy Speech Recognition System. Carnegie Mellon University.

Molina, D.; Kumar, K.; and Srivastava, S. 2020. Identifying Critical Regions for Motion Planning Using Auto-Generated Saliency Labels with Convolutional Neural Networks. In ICRA.

Shah, N.; Kala Vasudevan, D.; Kumar, K.; Kamojjhala, P.; and Srivastava, S. 2020. Anytime Integrated Task and Motion Policies for Stochastic Environments. In ICRA.

Shah, N.; and Srivastava, S. 2022a. Multi-Task Option Learning and Discovery for Stochastic Path Planning. arXiv preprint arXiv:2210.00068.

Shah, N.; and Srivastava, S. 2022b. Using Deep Learning to Bootstrap Abstractions for Hierarchical Robot Planning. In AAMAS.