
Learning Efficiency Meets Symmetry Breaking

Yingbin Bai¹, Sylvie Thiébaux¹,², Felipe Trevizan¹

¹School of Computing, The Australian National University
²LAAS-CNRS, Université de Toulouse
[email protected], [email protected], [email protected]

arXiv:2504.19738v1 [cs.AI] 28 Apr 2025

Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Learning-based planners leveraging Graph Neural Networks can learn search guidance applicable to large search spaces, yet their potential to address symmetries remains largely unexplored. In this paper, we introduce a graph representation of planning problems allying learning efficiency with the ability to detect symmetries, along with two pruning methods, action pruning and state pruning, designed to manage symmetries during search. The integration of these techniques into Fast Downward achieves a first-time success over LAMA on the latest IPC learning track dataset. Code is released at: https://github.com/bybeye/Distincter.

Introduction

Over the past two decades, heuristic search has achieved significant success across a variety of planning problems, and has become the standard approach in the field (Richter and Westphal 2010; Höller et al. 2020; Corrêa et al. 2022; Geißer et al. 2022; Klößner, Seipp, and Steinmetz 2023). Nevertheless, even in classical planning, scalability remains a significant challenge for these methods. This has led a growing number of researchers to turn to learning-based methods, particularly using Graph Neural Networks (GNNs) (Toyer et al. 2020; Shen, Trevizan, and Thiébaux 2020; Ståhlberg, Bonet, and Geffner 2022; Chen, Trevizan, and Thiébaux 2024; Horčík and Šír 2024; Hao et al. 2024; Drexler et al. 2024b). Unlike traditional model-based methods, which are reliant solely on analysing planning domain and problem definitions, GNNs are capable of learning patterns and strategies from existing plans to enhance search efficiency and adaptability.

However, learning efficiency alone is insufficient to address the challenges inherent in large-scale planning, which often involves a substantial number of symmetrical states (Wehrle et al. 2015; Sievers et al. 2019b). Although these states do not affect plan quality, they consume significant computational resources and can considerably slow down the search process. In this paper, we use NNs and GNNs with permutation invariant activation functions to learn a permutation invariant function allowing them to produce consistent outputs for symmetrical inputs. Despite this advantage, the full potential of this feature has not yet been effectively harnessed to detect and break symmetries during the search process.

In this paper, we remedy this by introducing a graph representation designed to achieve two key objectives: learning efficiency and symmetry reduction. Leveraging the strengths of this representation, we propose two pruning methodologies: action pruning and state pruning. Action pruning infers symmetries by analyzing object involvement in action parameters, without generating child states nor computing their heuristic value. Additionally, since GNNs can retain invariant outputs for symmetrical inputs, state pruning exploits this property to efficiently identify symmetries between states.

To evaluate the proposed techniques, we implemented them on top of Fast Downward (Helmert 2006) in a planner called Distincter and carried out experiments on the 2023 International Planning Competition Learning Track. The overall coverage of Distincter surpasses that of the traditional SOTA method, LAMA (Richter and Westphal 2010), for the first time in the recent literature on learning planning heuristics, marking a significant milestone for learning-based methods.

In terms of related work, recent independent work by (Drexler et al. 2024b) removes symmetries in the training set in offline mode, thereby improving training effectiveness. In contrast, our approach focuses on removing symmetries during the search process, so as to enhance search efficiency and scale to large planning problems.

Background and notation

A lifted planning problem is defined as a tuple Π = ⟨O, T, P, A, I, G⟩, where O denotes a set of objects, T is a set of object types, P consists of first-order predicates, A comprises action schemas, I specifies the current (or initial) state, and G delineates the goal.

A predicate p ∈ P has parameters x^p_1, . . . , x^p_n for n ∈ N, where each parameter requires a specific type of object. A predicate can be instantiated by assigning each x_i to an object from O, resulting in a proposition ρ. A state is an assignment of truth values to the propositions.

An action schema a = ⟨X_a, pre(a), add(a), del(a)⟩ is defined as a tuple comprising a list of typed parameters X_a = (x^a_1, . . . , x^a_n), along with sets of preconditions, add effects, and delete effects, all of which are predicates in P with parameters from X_a.
with parameters from Xa . When all parameters of an action I ∪ G}
schema are instantiated with objects of the required types, • c : V → {(status, class) | status ∈ {0, 1, 2, 3}, class ∈
the action is referred to as a ground action. A ground action T ∪P}, maps each vertex to a tuple where:
a is applicable in a state s if pre(a) ⊆ s. When a is applied
to s, the resulting state s′ is given by (s\del(a))∪add(a). In – status indicates the goal status of propositions: 0 for
this context, the state s is referred to as the parent state, and non-goal propositions in I \ G, 1 for unachieved goal
s′ is known as the child state. Since the set of applicable ac- propositions in G \ I, and 2 for achieved goal propo-
tions for a parent state is typically not a singleton, expanding sitions in I ∩ G. status = 3 for object vertices.
a parent state usually generates a set of child states. – class refers to the object type for object vertices, and
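For illustration, the STRIPS successor rule above can be sketched in a few lines of Python; the propositions and the action used here are hypothetical examples, not part of any benchmark domain.

# Minimal sketch of ground-action application, s' = (s \ del(a)) ∪ add(a).
from dataclasses import dataclass

@dataclass(frozen=True)
class GroundAction:
    pre: frozenset       # preconditions
    add: frozenset       # add effects
    delete: frozenset    # delete effects ("delete" since "del" is a Python keyword)

def applicable(state: frozenset, a: GroundAction) -> bool:
    return a.pre <= state                      # pre(a) ⊆ s

def apply(state: frozenset, a: GroundAction) -> frozenset:
    return (state - a.delete) | a.add          # (s \ del(a)) ∪ add(a)

s = frozenset({"at(truck1,locA)", "at(pkg1,locA)"})
load = GroundAction(pre=frozenset({"at(truck1,locA)", "at(pkg1,locA)"}),
                    add=frozenset({"in(pkg1,truck1)"}),
                    delete=frozenset({"at(pkg1,locA)"}))
if applicable(s, load):
    child = apply(s, load)                     # {"at(truck1,locA)", "in(pkg1,truck1)"}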
A sequence of actions a1, . . . , an is applicable in a state s if there exists a sequence of states s0, . . . , sn such that s0 = s, and for each i ∈ {1, . . . , n}, the state si is the result of applying ai in si−1. The aim is to find a plan for a given planning problem Π, which is a sequence of ground actions that is applicable in the initial state I and results in a state sn such that G ⊆ sn.

A colored (or labelled) graph is a tuple G = ⟨V, E, c, l⟩ where V is the set of vertices, E is the set of edges, and c (resp. l) maps vertices (resp. edges) to their color. Two graphs G = ⟨V, E, c, l⟩ and G′ = ⟨V′, E′, c′, l′⟩ are isomorphic, denoted by G ≅ G′, if there exists a bijection τ : V → V′ such that (u, v) ∈ E iff (τ(u), τ(v)) ∈ E′, c′(τ(v)) = c(v) for all v ∈ V, and l′((τ(u), τ(v))) = l((u, v)) for all (u, v) ∈ E.

An automorphism of G is an isomorphism σ between G and itself. The set of all automorphisms of G forms a group under the operation of composition, known as the automorphism group Aut(G) of the graph. The orbit of a vertex v in a graph consists of all vertices that can be transformed into v by some automorphism in Aut(G). This implies that any two vertices within the same orbit are structurally equivalent in the graph, maintaining the same connections and roles relative to other vertices and edges.
Distincter

Typed Instance Learning Graph (TILG)

Our graph representation extends the Instance Learning Graph (ILG) (Chen, Trevizan, and Thiébaux 2024), maintaining similar structures but offering additional information for learning and symmetry detection. The graph's vertices represent objects and propositions in the initial (current) state and the goal, and edges exist between propositions and the objects in their parameter list.¹ Vertex features capture the object types, the predicates instantiated by the propositions, and whether goal propositions have been achieved. Edge features capture the index of objects in proposition parameter lists. Formally:

Definition 1. Let Π = ⟨O, T, P, A, I, G⟩ represent a lifted planning problem. The typed instance learning graph (TILG) for Π is the undirected graph GΠ = ⟨V, E, c, l⟩, such that:

• V = O ∪ I ∪ G
• E = {(o, p(o1, ..., on)) | o ∈ O, ∃i o = oi, p(o1, ..., on) ∈ I ∪ G}
• c : V → {(status, class) | status ∈ {0, 1, 2, 3}, class ∈ T ∪ P} maps each vertex to a tuple where:
  – status indicates the goal status of propositions: 0 for non-goal propositions in I \ G, 1 for unachieved goal propositions in G \ I, and 2 for achieved goal propositions in I ∩ G; status = 3 for object vertices.
  – class refers to the object type for object vertices and, for proposition vertices, it denotes the predicate of which the proposition is an instance.
• l : E → N, where for each edge e ∈ E, l(e) indicates the index of the object in the proposition parameters.

¹ In the following, we will use the word symmetric to refer to states represented by isomorphic TILGs and to objects or propositions that are related via τ (or σ, depending on the context).

In ILG, the object type information is absent, whereas TILG embeds it within each object vertex. This may seem minor, but it adds valuable information to each object vertex, significantly enriching the information available. Moreover, Fast Downward omits static propositions during search, which causes them to be missing in existing ILG implementations as well. While this omission does not affect traditional heuristic methods, it significantly impacts learning methods, which estimate heuristics based on the graph. Without static propositions, crucial information is lost, leading to blind guesses for some actions. For instance, "waiting" propositions in the childsnack domain are static, and without this information, planners are unable to determine which table the tray should be moved to. Therefore, TILG includes static propositions.

In the following, all elements of the problem Π are fixed, except for the current state s. We shall therefore identify Π with s and will refer to the TILG GΠ as Gs.
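To make Definition 1 concrete, the following sketch builds the TILG of a single state as plain Python dictionaries; the domain, predicate, and type names, as well as the raw (status, class) pairs, are illustrative assumptions and not Distincter's actual encoding.

# Sketch: building a TILG (Definition 1) for one state. Names are illustrative only.
def build_tilg(objects, state, goal):
    """objects: {name: type}; state, goal: sets of (predicate, (arg1, ..., argn)) tuples."""
    colors, edges, edge_labels = {}, [], {}
    for o, t in objects.items():                     # object vertices: status 3, class = object type
        colors[o] = (3, t)
    for prop in state | goal:                        # proposition vertices (static ones included)
        pred, args = prop
        if prop in state and prop in goal:
            status = 2                               # achieved goal proposition
        elif prop in goal:
            status = 1                               # unachieved goal proposition
        else:
            status = 0                               # non-goal proposition
        colors[prop] = (status, pred)
        for idx, o in enumerate(args, start=1):      # edge label l(e) = argument index
            edges.append((o, prop))
            edge_labels[(o, prop)] = idx
    return colors, edges, edge_labels                # the vertex set is the keys of `colors`

objects = {"tray1": "tray", "table1": "place", "kitchen": "place"}
state = {("at", ("tray1", "kitchen")), ("waiting", ("table1",))}   # "waiting" is static
goal = {("at", ("tray1", "table1"))}
colors, edges, labels = build_tilg(objects, state, goal)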
Action Pruning

With Greedy Best-First Search (GBFS), planners select the state with the smallest heuristic value to expand, which requires computing the heuristic value of all child states. When child states contain a large number of symmetries, this can result in significant time wasted on redundant calculations. Shallow pruning was designed to address this challenge (Pochter, Zohar, and Rosenschein 2011). However, the problem description graph (PDG) used in shallow pruning requires instantiating all predicates and action schemas within the graph, resulting in significant computational overhead for each state. To improve efficiency, we introduce Action Pruning, which replaces the PDG with TILG. A key innovation of action pruning is its ability to infer symmetrical child states from the parent state, eliminating the need for action preconditions and effects in the graph. By leveraging the much more compact TILG representation and its inference capability, action pruning enables faster automorphism calculations.

Definition 2 (Object Tuples Equivalence). Let ⟨A1, . . . , An⟩ and ⟨B1, . . . , Bn⟩ be two tuples of objects s.t. Ai and Bi are in O, with corresponding vertices ui and vi in the TILG Gs. We say that ⟨A1, . . . , An⟩ is equivalent to ⟨B1, . . . , Bn⟩ in s, denoted as ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩, iff there exists an automorphism of Gs represented by the bijective function σ s.t. σ(ui) = vi for all i ∈ {1, . . . , n}.
Theorem 1. Let Ai and Bi be objects in O for i ∈ {1, . . . , n} with Ai ≠ Aj and Bi ≠ Bj for all i ≠ j. Let α ∈ A be an action schema, and consider two ground actions, a = α(A1, A2, . . . , An) and b = α(B1, B2, . . . , Bn), applicable in a state s, resulting in successor states sa and sb respectively. If ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩ in s then the TILGs Gsa and Gsb are isomorphic, i.e., Gsa ≅ Gsb.

The proof of Theorem 1 is in the supplementary material. Unfortunately, identifying all isomorphic successor states in order to prune actions requires testing an exponential number of tuples against the automorphisms of Gs. Therefore, we resort to a simpler method that over-approximates the set of equivalent tuples and isomorphic successor states, and consequently does not preserve the completeness of the search process. It relaxes the conditions of the theorem by checking the equivalence of all individual pairs Ai and Bi in s, i.e., the condition that ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩ is replaced with ⟨Ai⟩ ≃ ⟨Bi⟩ for all i, or in other terms that Ai and Bi are in the same orbit of Gs. In our experiments, we find that this over-approximation yields good results in practice. Moreover, we did not observe any failure due to incompleteness, and therefore do not currently employ any fall-back mechanism.

This process of action pruning is outlined in Algorithm 1. First, the planning problem with current state s is converted into a TILG Gs. The Nauty library (McKay and Piperno 2014) is then utilized to compute the orbits Os of Gs. Since Nauty lacks support for feature vectors, we aggregate vertex features into a unique vertex color to detect automorphisms. This color-coding strategy is detailed in Equation 1:

    color = Σ_{i=1}^{N} 10^{β_i} × F_i,   with   β_1 = 0   and   β_i = Σ_{n=1}^{i−1} ⌈log₁₀ M_n⌉ for i ≥ 2,    (1)

where N denotes the number of features, F_i represents the value of feature i, and M_n is the maximum possible value of feature n.

Algorithm 1: Action pruning algorithm
Input: Planning problem with current state s
Input: Set As of actions applicable in s
Output: Pruned action set Ap ⊆ As
 1: K ← ∅, Ap ← ∅
 2: Graph Gs ← TILG(s) with encoding in Eq. 1
 3: Orbits Os ← Nauty(Gs)
 4: for a in As do
 5:   Ka ← Replace_params_with_orbits(a, Os)
 6:   if Ka not in K then
 7:     K ← K ∪ {Ka}
 8:     Ap ← Ap ∪ {a}
 9:   end if
10: end for
11: return Ap

After obtaining the orbits Os of Gs, the parameters of each applicable action a are substituted with their respective orbit IDs, generating a unique hash key Ka. This hash key is subsequently used to identify and eliminate symmetric actions, ensuring that only distinct actions are retained in Ap for further processing.
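To illustrate Equation 1 and lines 4–10 of Algorithm 1, here is a small Python sketch; the orbit computation itself is left as a placeholder standing in for the Nauty call, and all names are illustrative rather than Distincter's actual implementation.

# Sketch: pack vertex features into a single integer color (Eq. 1), then prune
# symmetric actions by replacing their parameters with orbit IDs (Alg. 1, lines 4-10).
import math

def pack_color(features, max_values):
    """features[i] = F_i, max_values[i] = M_i; each feature gets its own decimal digit block."""
    color, shift = 0, 0                               # shift plays the role of beta_i
    for F, M in zip(features, max_values):
        color += (10 ** shift) * F
        shift += math.ceil(math.log10(M))
    return color

# Example: a status in {0..3} (M=4) and a class ID in {0..99} (M=100)
# yield color = status + 10 * class_id, so distinct feature tuples cannot collide.
assert pack_color([2, 37], [4, 100]) == 372

def prune_actions(applicable_actions, orbit_of):
    """orbit_of: object name -> orbit ID (e.g., as computed by the Nauty library)."""
    seen, pruned = set(), []
    for schema, params in applicable_actions:         # ground action a = schema(params...)
        key = (schema, tuple(orbit_of[o] for o in params))
        if key not in seen:                           # keep one representative per key
            seen.add(key)
            pruned.append((schema, params))
    return pruned

actions = [("move", ("trayA", "table1")), ("move", ("trayB", "table1"))]
orbits = {"trayA": 0, "trayB": 0, "table1": 1}        # trayA and trayB share an orbit
assert prune_actions(actions, orbits) == [("move", ("trayA", "table1"))]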
State Pruning

Symmetries arise not only between child states but also across states from different parents. Many state pruning approaches have been proposed and proven useful in classical planning (Pochter, Zohar, and Rosenschein 2011; Domshlak, Katz, and Shleyfman 2012). However, the main issue limiting their widespread use in planning problems is their high computational cost. To address this issue, we propose a novel method that performs state pruning with negligible additional overhead. Specifically, building on the permutation invariance property of GNNs, we use the embeddings from the second-to-last layer of the network as hash keys to efficiently detect and eliminate symmetries across states.

The idea of using neural network outputs to check similarity is not new, having been employed in Siamese networks since early work in deep learning (Bromley et al. 1993). These weight-shared networks with identical architectures are specifically designed to assess and compare the similarity between two inputs. This approach has proven effective across various fields, including fingerprint identification (Li et al. 2021) and anomaly detection (Zhou et al. 2021). For GNNs, Chen et al. (2019) highlight the equivalence between graph isomorphism testing and approximating permutation-invariant functions. Moreover, standard GNNs have been shown to possess an expressive power comparable to that of the 1-WL test (Xu et al. 2019). While this implies GNNs may be unable to distinguish some non-isomorphic graphs, compromising the completeness of the search when state pruning is used, our results demonstrate that GNNs based on TILG can be highly effective in both heuristic prediction and state pruning.

The TILG Gsi for the current state si is fed through a graph network ϕθ to encode an embedding zi. Subsequently, zi is processed by a fully connected linear layer φθ to generate a heuristic value ĥi ∈ R. This process is represented by zi = ϕθ(Gsi) and ĥi = φθ(zi). Next, zi is rounded and encoded using MD5 to shorten its length, serving as a key in a hash map for state matching. Since zi is efficiently captured during the network's forward pass, there is no need to generate keys through computationally expensive methods like calculating isomorphisms in PDG (Pochter, Zohar, and Rosenschein 2011), resulting in minimal additional cost for state pruning.
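A minimal sketch of this keying scheme, assuming a PyTorch-style model that exposes the second-to-last-layer embedding and using a hypothetical rounding precision, is shown below.

# Sketch: use the (rounded) second-to-last-layer embedding as a duplicate-detection key.
# The model interface and the rounding precision are assumptions, not Distincter's exact code.
import hashlib
import torch

def state_key(z: torch.Tensor, decimals: int = 3) -> str:
    """z: embedding from the layer before the final linear head."""
    rounded = torch.round(z * 10**decimals) / 10**decimals          # suppress floating-point noise
    return hashlib.md5(rounded.detach().cpu().numpy().tobytes()).hexdigest()

seen = set()                          # hash map / closed list of embedding keys
def prune_or_keep(z: torch.Tensor) -> bool:
    """Returns True if the state is new (kept), False if a symmetric state was already seen."""
    key = state_key(z)
    if key in seen:
        return False                  # symmetric (or identical) state already expanded
    seen.add(key)
    return True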
Experiments

Datasets. We evaluate our framework, Distincter, on the 2023 International Planning Competition Learning Track (Seipp and Segovia-Aguas 2023), which includes ten domains. In ablation experiments, to assess the effectiveness of the two proposed pruning methods, we further employ six domains that contain a lot of symmetrical states.
Domain        hFF   LAMA   GOOSE   OptRank   GPR   Distincter
blocksworld    28     61   61±10     44±11    69         88±4
childsnack     26     34    16±4      32±1    20         64±5
ferry          71     70    70±0      64±4    82         83±1
floortile      10     10     1±0       1±0     2          2±0
miconic        90     90    89±1      88±4    90         90±0
rovers         29     70    28±1      31±2    36         42±2
satellite      64     90    29±2      29±3    39        48±17
sokoban        36     40    34±0      32±1    38         32±2
spanner        30     30   39±16      65±0    74         90±0
transport      41     68    37±4      42±5    28         50±3
Sum           425    563     405       429   478          589

Table 1: Coverage comparison with SOTA methods on the 2023 International Planning Competition Learning Track.

Domain        LAMA   Distincter
blocksworld    390       198±10
childsnack      45         34±3
ferry          257        206±2
floortile       34         32±0
miconic        324        273±8
rovers          72       106±11
satellite       18         27±8
sokoban         46        49±14
spanner         14         16±0
transport       49         45±3
Sum           1249          987

Table 2: Average plan lengths over problems solved by both LAMA and Distincter.

Network structure. Our graph network consists of RGCN layers with a hidden dimension of 64 (Schlichtkrull et al. 2018), followed by global add pooling, and a linear layer producing a one-dimensional output. The network is implemented using the standard PyTorch Geometric package (Fey and Lenssen 2019). For further setting information, please see the supplementary material.
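The described architecture can be sketched in PyTorch Geometric roughly as follows; the number of relations (edge labels) and the layer count are illustrative values, not the exact per-domain configuration.

# Sketch of the heuristic network: RGCN layers (hidden dim 64), global add pooling,
# and a linear head producing a scalar heuristic. Layer count and num_relations are
# illustrative; the number of layers is tuned per domain (see the supplementary material).
import torch
from torch_geometric.nn import RGCNConv, global_add_pool

class HeuristicGNN(torch.nn.Module):
    def __init__(self, in_dim, num_relations, hidden=64, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hidden] * num_layers
        self.convs = torch.nn.ModuleList(
            RGCNConv(dims[i], dims[i + 1], num_relations) for i in range(num_layers))
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, x, edge_index, edge_type, batch):
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index, edge_type))
        z = global_add_pool(x, batch)        # graph embedding, also reusable as the state-pruning key
        return self.head(z).squeeze(-1), z   # heuristic value and embedding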
Training and evaluation. For each domain, we train a GNN using the RMSE loss function for 30 epochs, including 10 warm-up epochs. We use an initial learning rate of 10⁻³ and apply cosine annealing (Loshchilov and Hutter 2017) over a single cycle, with a momentum value of 0.9. To adapt to varying numbers of examples (N) across different domains, we set the number of iterations to 100 per epoch and adjust the batch size accordingly, using N/100.
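A hedged sketch of this schedule, assuming SGD as the underlying optimizer (the text only states the momentum value) and treating the warm-up handling as an illustrative simplification:

# Sketch of the training setup: lr 1e-3, momentum 0.9, cosine annealing over one cycle,
# batch size N/100 so that each epoch runs 100 iterations. The optimizer choice (SGD)
# and the placement of the warm-up are assumptions for illustration.
import torch

def make_training_setup(model, num_examples, epochs=30, warmup_epochs=10):
    batch_size = max(1, num_examples // 100)            # 100 iterations per epoch
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs - warmup_epochs)         # single cosine cycle after warm-up
    return batch_size, optimizer, scheduler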
For evaluation, our planner is based on Fast Downward using eager-GBFS (Helmert 2006) guided by GNN heuristic values. Upon completing the training, the model is saved in JIT format and executed using C++. To ensure result stability and mitigate dataset bias, we employ early stopping on a validation set to select optimal models (Bai et al. 2023a). All experiments are conducted on a single CPU core with an NVIDIA A6000 GPU and 8GB of memory, with a 30-minute timeout per problem. The mean and standard deviation are computed from three trials.

Domain        None    Action    State    Distincter
blocksworld  79±11     79±7     88±3         88±4
childsnack    34±4     63±3     61±1         64±5
ferry         82±0     82±0     83±1         83±1
floortile      2±1      2±0      2±0          2±0
miconic       90±0     90±0     90±0         90±0
rovers        41±2     41±3     41±2         42±2
satellite    45±13    46±13    47±17        48±17
sokoban       32±2     32±2     32±2         32±2
spanner       83±0     90±0     83±0         90±0
transport     42±1     42±1     49±2         50±3
gripper       24±9     75±4     90±0         90±0
grippers      62±5     89±1     85±1         90±1
logistics     19±9     36±4     53±4         52±3
movie         90±0    74±17     90±0        75±17
tsp          76±24     78±0     90±0         90±0
tyreworld      0±0      1±1    64±20        65±21
Sum            803      920     1048         1051

Table 3: Ablation study. "None" refers to GBFS + GNN heuristic without pruning, "Action" denotes the use of action pruning, "State" represents the use of state pruning.

Results

We compare Distincter with SOTA baselines, including both traditional heuristic search methods, namely LAMA (Richter and Westphal 2010) and GBFS with hFF (Hoffmann and Nebel 2001), and GBFS with learnt heuristics using GOOSE (Chen, Thiébaux, and Trevizan 2024), its optimal ranking counterpart (Hao et al. 2024) and Gaussian Process Regression (GPR) from WL features (Chen, Trevizan, and Thiébaux 2024). Note that, as shown in (Hao et al. 2024), other methods such as STRIPS-HGN (Shen, Trevizan, and Thiébaux 2020) and Perfrank (Chrestien et al. 2023) are dominated by our baselines. All baselines are run on the same hardware and with the same computational requirements as Distincter. In terms of coverage, Distincter matches or surpasses all baselines across five domains. Notably, when compared to the strongest learning baseline, GPR (Chen, Trevizan, and Thiébaux 2024), Distincter achieves parity or superiority in eight of the ten domains. Additionally, the total coverage of Distincter surpasses that of model-based methods: it exceeds that of hFF by a substantial margin of 164 and that of LAMA by 26. In Table 2, we report the average plan length over the problems successfully solved by both approaches. The results suggest a correlation between plan length and coverage.

In keeping with the 8GB memory requirement of the IPC learning track, we found this limit insufficient for the Fast Downward translator to ground some large problems, resulting in a performance loss, as shown in Table 1. For instance, in the "ferry" domain, when sufficient memory is available, Distincter can solve all 90 test problems.
Although Distincter exhibits very good performance across many domains, it struggles in others – see e.g. its low performance on "Floortile" and "Sokoban", and the large deviations observed in the "Satellite" domain. The key issue with Floortile and Sokoban is that they require path-finding and geometrical reasoning which cannot be achieved with the limited receptive field of ordinary GNNs. Dead ends in these domains are another issue, as they cannot always be captured when training with optimal plans only. In Satellite, due to the lack of static propositions in its graph, GOOSE learns a simple strategy that works in simple problems only. On the other hand, thanks to statics being included, Distincter is able to learn to solve much larger problems. However, standard GNNs are not expressive enough to learn to distinguish all non-isomorphic Satellite states (Drexler et al. 2024b), and therefore the learning procedure fails every once in a while, leading to a large variance.

Ablation Study

To assess the effectiveness of our proposed pruning techniques, we performed ablation experiments across four configurations. From Table 3, we observe that in domains with a high degree of symmetries, such as "spanner" and "childsnack", action pruning offers substantial benefits. However, in the "movie" domain, which contains a large number of redundant objects, action pruning consumes excessive time computing symmetries. In contrast, state pruning improves performance in seven domains, demonstrating its broader utility. By combining both techniques, we can leverage their strengths to achieve better overall outcomes.

Conclusion

We introduced TILG, a novel graphical representation that captures key problem-solving features and is designed for combining efficiency with symmetry detection. Leveraging the properties of TILG, we proposed two efficient pruning techniques that are suitable for large-scale planning problems. Our framework, Distincter, achieved a historic milestone by outperforming the LAMA framework on the learning track of the 2023 International Planning Competition. In addition, both pruning methods are applicable to traditional model-based approaches. Although state pruning with GNNs can be computationally expensive, small dedicated GNNs can mitigate this issue.

Acknowledgments

The authors thank Sandra Kiefer and Brendan McKay for useful discussions. This work was supported by the Australian Research Council grant DP220103815, by the Artificial and Natural Intelligence Toulouse Institute (ANITI) under the grant agreement ANR-23-IACL-0002, and by the European Union's Horizon Europe Research and Innovation program under the grant agreement TUPLES No. 101070149.

References

Bai, Y.; Han, Z.; Yang, E.; Yu, J.; Han, B.; Wang, D.; and Liu, T. 2023a. Subclass-Dominant Label Noise: A Counterexample for the Success of Early Stopping. In NeurIPS.
Bai, Y.; Han, Z.; Yang, E.; Yu, J.; Han, B.; Wang, D.; and Liu, T. 2023b. Subclass-Dominant Label Noise: A Counterexample for the Success of Early Stopping. In NeurIPS.
Bai, Y.; Yang, E.; Wang, Z.; Du, Y.; Han, B.; Deng, C.; Wang, D.; and Liu, T. 2022. RSA: Reducing Semantic Shift from Aggressive Augmentations for Self-supervised Learning. In NeurIPS.
Bromley, J.; Guyon, I.; LeCun, Y.; Säckinger, E.; and Shah, R. 1993. Signature Verification Using a Siamese Time Delay Neural Network. In NeurIPS, 737–744.
Chen, D. Z.; Thiébaux, S.; and Trevizan, F. W. 2024. Learning Domain-Independent Heuristics for Grounded and Lifted Planning. In AAAI, 20078–20086.
Chen, D. Z.; Trevizan, F. W.; and Thiébaux, S. 2024. Return to Tradition: Learning Reliable Heuristics with Classical Machine Learning. In ICAPS, 68–76.
Chen, Z.; Villar, S.; Chen, L.; and Bruna, J. 2019. On the equivalence between graph isomorphism testing and function approximation with GNNs. In NeurIPS, 15868–15876.
Chrestien, L.; Edelkamp, S.; Komenda, A.; and Pevný, T. 2023. Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal. In NeurIPS.
Corrêa, A. B.; Giacomo, G. D.; Helmert, M.; and Rubin, S. 2024. Planning with Object Creation. In ICAPS, 104–113.
Corrêa, A. B.; Pommerening, F.; Helmert, M.; and Francès, G. 2022. The FF Heuristic for Lifted Classical Planning. In AAAI, 9716–9723.
Domshlak, C.; Katz, M.; and Shleyfman, A. 2012. Enhanced Symmetry Breaking in Cost-Optimal Planning as Forward Search. In ICAPS, 343–347.
Drexler, D.; Ståhlberg, S.; Bonet, B.; and Geffner, H. 2024a. Equivalence-Based Abstractions for Learning General Policies. In Proc. ICAPS-24 Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL).
Drexler, D.; Ståhlberg, S.; Bonet, B.; and Geffner, H. 2024b. Symmetries and Expressive Requirements for Learning General Policies. In KR.
Fey, M.; and Lenssen, J. E. 2019. Fast Graph Representation Learning with PyTorch Geometric. CoRR, abs/1903.02428.
Geißer, F.; Haslum, P.; Thiébaux, S.; and Trevizan, F. W. 2022. Admissible Heuristics for Multi-Objective Planning. In ICAPS, 100–109.
Hao, M.; Trevizan, F. W.; Thiébaux, S.; Ferber, P.; and Hoffmann, J. 2024. Guiding GBFS through Learned Pairwise Rankings. In IJCAI, 6724–6732.
Helmert, M. 2006. The Fast Downward Planning System. J. Artif. Intell. Res., 26: 191–246.
Hoffmann, J.; and Nebel, B. 2001. The FF Planning System: Fast Plan Generation Through Heuristic Search. J. Artif. Intell. Res., 14: 253–302.
Höller, D.; Bercher, P.; Behnke, G.; and Biundo, S. 2020. HTN Planning as Heuristic Progression Search. J. Artif. Intell. Res., 67: 835–880.
Horčík, R.; and Šír, G. 2024. Expressiveness of Graph Neural Networks in Planning Domains. In ICAPS, 281–289.
Klößner, T.; Seipp, J.; and Steinmetz, M. 2023. Cartesian Abstractions and Saturated Cost Partitioning in Probabilistic Planning. In ECAI, 1272–1279.
Li, Q.; Liao, X.; Liu, M.; and Valaee, S. 2021. Indoor Localization Based on CSI Fingerprint by Siamese Convolution Neural Network. IEEE Trans. Veh. Technol., 70(11): 12168–12173.
Loshchilov, I.; and Hutter, F. 2017. SGDR: Stochastic Gradient Descent with Warm Restarts. In ICLR.
McKay, B. D.; and Piperno, A. 2014. Practical graph isomorphism, II. Journal of Symbolic Computation, 60: 94–112.
Pochter, N.; Zohar, A.; and Rosenschein, J. S. 2011. Exploiting Problem Symmetries in State-Based Planners. In AAAI, 1004–1009.
Richter, S.; and Westphal, M. 2010. The LAMA Planner: Guiding Cost-Based Anytime Planning with Landmarks. J. Artif. Intell. Res., 39: 127–177.
Schlichtkrull, M. S.; Kipf, T. N.; Bloem, P.; van den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling Relational Data with Graph Convolutional Networks. In ESWC, volume 10843, 593–607.
Seipp, J.; Keller, T.; and Helmert, M. 2017. Narrowing the Gap Between Saturated and Optimal Cost Partitioning for Classical Planning. In AAAI, 3651–3657.
Seipp, J.; and Segovia-Aguas, J. 2023. Int. Planning Competition 2023 - Learning Track.
Seipp, J.; Torralba, Á.; and Hoffmann, J. 2022. PDDL Generators. https://doi.org/10.5281/zenodo.6382173.
Shen, W.; Trevizan, F. W.; and Thiébaux, S. 2020. Learning Domain-Independent Planning Heuristics with Hypergraph Networks. In ICAPS, 574–584.
Shleyfman, A.; Katz, M.; Helmert, M.; Sievers, S.; and Wehrle, M. 2015. Heuristics and Symmetries in Classical Planning. In AAAI, 3371–3377.
Sievers, S.; Katz, M.; Sohrabi, S.; Samulowitz, H.; and Ferber, P. 2019a. Deep Learning for Cost-Optimal Planning: Task-Dependent Planner Selection. In AAAI, 7715–7723.
Sievers, S.; Röger, G.; Wehrle, M.; and Katz, M. 2019b. Theoretical Foundations for Structural Symmetries of Lifted PDDL Tasks. In ICAPS, 446–454.
Ståhlberg, S.; Bonet, B.; and Geffner, H. 2022. Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits. In ICAPS, 629–637.
Toyer, S.; Thiébaux, S.; Trevizan, F.; and Xie, L. 2020. ASNets: Deep Learning for Generalised Planning. J. Artif. Intell. Res., 68: 1–68.
Wehrle, M.; Helmert, M.; Shleyfman, A.; and Katz, M. 2015. Integrating Partial Order Reduction and Symmetry Elimination for Cost-Optimal Classical Planning. In IJCAI, 1712–1718.
Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2019. How Powerful are Graph Neural Networks? In ICLR.
Zhou, X.; Liang, W.; Shimizu, S.; Ma, J.; and Jin, Q. 2021. Siamese Neural Network Based Few-Shot Learning for Anomaly Detection in Industrial Cyber-Physical Systems. IEEE Trans. Ind. Informatics, 17(8): 5790–5798.
Proof of Theorem 1

Lemma 1. Let σ be any automorphism function of Gs (the TILG of state s) satisfying ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩ where Ai and Bi are objects in O and Ai ≠ Aj and Bi ≠ Bj for all i ≠ j. Then σ(p(Ai1, . . . , Aik)) = p(Bi1, . . . , Bik) for all propositions p(Ai1, . . . , Aik) ∈ s where i1, . . . , ik is a permutation of a subset of size k of {1, . . . , n}.

Proof. By the definition of ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩, we have that σ(Aij) = Bij. Moreover, σ preserves vertex and edge colors, therefore σ(p(Ai1, . . . , Aik)) must be a proposition from predicate p (vertex class color) and, for all ij ∈ {1, · · · , n}, its ij-th argument must be σ(Aij) = Bij because the edges (Aij, p(Ai1, . . . , Aik)) and (σ(Aij) = Bij, σ(p(Ai1, . . . , Aik))) have the same color ij.

Theorem 2. Let Ai and Bi be objects in O for i ∈ {1, . . . , n} with Ai ≠ Aj and Bi ≠ Bj for all i ≠ j. Let α ∈ A be an action schema, and consider two ground actions, a = α(A1, A2, . . . , An) and b = α(B1, B2, . . . , Bn), applicable in a state s, resulting in successor states sa and sb respectively. If ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩ in s then the TILGs Gsa and Gsb are isomorphic, i.e., Gsa ≅ Gsb.

Proof. Let Gs = ⟨Vs, Es, cs, ls⟩ be the TILG for state s and Vs = {o1, . . . , on} ∪ {ps1, . . . , psm} ∪ {g1, . . . , gk} where oi are objects, psi are propositions true in s and gi are goal propositions. Also, let a+ and a− (b+ and b−) represent the ordered lists of add and delete effects of action a (b). Since a and b are groundings of α, we have that |a+| = |b+| and |a−| = |b−| and that the i-th items in a+ and b+ (a− and b−) correspond to the same predicate pi with different object instantiations.

Let Gsa = ⟨Va, Ea, ca, la⟩ and Gsb = ⟨Vb, Eb, cb, lb⟩ be the TILGs associated with sa and sb, respectively. First, let us partition Va = Va^old ∪ Va^new where Va^old are the vertices in Vs and Va^new are the new vertices in Va wrt Vs. Clearly, Va^old = Vs ∩ Va and, applying the TILG definition and the STRIPS rule, we have Va^old = O ∪ G ∪ (s ∩ ((s \ a−) ∪ a+)). With more algebraic manipulation, we arrive at Va^old = O ∪ G ∪ (s \ a−) ∪ (s ∩ a+), i.e., all objects, all goal propositions, all propositions not deleted and all added propositions that were already in s. Doing the same for Va^new = Va \ Vs, we have Va^new = (O ∪ G ∪ sa) \ (O ∪ G ∪ s) = sa \ (O ∪ G ∪ s). Since sa ∩ O = ∅, after applying the STRIPS rule, we have Va^new = ((s \ a−) ∪ a+) \ (G ∪ s) = a+ \ (G ∪ s), i.e., all add effects that are not a goal proposition nor already in s. Notice that all elements in Va^new are propositions of the form p_i^a(Ai1, Ai2, . . . , Aiki) and that the same can be done to create a partition of Vb.

In order to show that Gsa and Gsb are isomorphic, we need to show that there exists a function τ : Va → Vb that is a bijection and also preserves the vertex and edge colors. We show that the following function τ satisfies all these requisites:

• τ(v) = σ(v) for v ∈ Va^old
• τ(p_i^a(Ai1, Ai2, . . . , Aiki)) = p_i^b(Bi1, Bi2, . . . , Biki) for all p_i^a(Ai1, Ai2, . . . , Aiki) ∈ Va^new, where p_i^b(Bi1, Bi2, . . . , Biki) is the respective added proposition in b+ based on the action schema grounding,

where σ is any automorphism function of Gs satisfying ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩.

τ is a bijection. For all p_i^a(Ai1, Ai2, . . . , Aiki) ∈ Va^new, τ(p_i^a(Ai1, Ai2, . . . , Aiki)) = p_i^b(Bi1, Bi2, . . . , Biki) is a bijection from Va^new to Vb^new since it is a 1-to-1 mapping by construction and p_i^b(Bi1, Bi2, . . . , Biki) ∉ Vb^old, otherwise Lemma 1 would imply that p_i^a(Ai1, Ai2, . . . , Aiki) ∈ Va^old, which is impossible since it is in Va^new by assumption.

Since σ is an automorphism, it is a bijection of Vs to Vs; therefore, for all v ∈ Va^old, τ(v) ∈ Vs and the same applies for v ∈ Vb^old. Thus we need to show that σ is also a bijection between Va^old and Vb^old. Since σ represents an automorphism for a TILG, it preserves the vertex and edge labels; therefore, σ is a bijection from X to X where X ⊆ V is the set of vertices with same colors in Gs. Thus σ is a bijection between all v ∈ O and between all g ∈ G since they are in all TILGs.

We still need to show the bijection between all the vertices representing propositions already in s that remained true in sa and in sb. Namely, we need to show the bijection between s \ (a− \ a+) and s \ (b− \ b+). This follows from Lemma 1, therefore τ is a bijection.

Vertex color compatibility. We need to show that, for all v ∈ Va, the color of v in Gsa equals the color of τ(v) in Gsb, i.e., ca(v) = cb(τ(v)). For all v ∈ Va^new, v represents a proposition by definition; let p be the predicate that generates it. The only possible color for v is "(p, non-goal)" since Gsa is a valid TILG, i.e., v cannot be mentioned in the goal, otherwise it would be in Va^old. The same argument applies to Vb^new. As shown in the bijection part of the proof, τ is a bijection between Va^new and Vb^new, therefore we also have ca(v) = cb(τ(v)) for all v ∈ Va^new.

We still need to show color compatibility for all v ∈ Va^old = O ∪ G ∪ (s \ a−) ∪ (s ∩ a+). This is trivially true for v ∈ O since objects can only be labelled "(obj, obj-type)" and their object type is immutable. For all v ∈ (s \ a−) ∪ (s ∩ a+) \ G, i.e., non-goal propositions that are true in both s and sa, we have that ca(v) = cb(τ(v)) because τ(v) = σ(v) and σ is color compatible.

The remaining case is v ∈ G. Let p be the predicate that generates the proposition v. Since Gsa is a valid TILG, ca(v) is either "(p, unachieved-goal)" or "(p, achieved-goal)". If c(v) = ca(v), i.e., the color of v was not affected by a, then we have c(v) = c(σ(v)) = ca(v) = cb(τ(v)) because σ(v) = τ(v) and σ is color compatible.

Otherwise we have that c(v) ≠ ca(v) and the color of v was changed by a. Since a and b are actions from the same action schema, i.e., a = α(A1, A2, . . . , An) and b = α(B1, B2, . . . , Bn) and ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩, then v ∈ a+ iff σ(v) ∈ b+ and v ∈ a− iff σ(v) ∈ b− due to the action schema grounding. Moreover, τ(v) = σ(v) ∈ G and c(v) = c(σ(v)), i.e., both v and τ(v) are goal propositions with the same color in Gs, denoting whether they are both achieved or both non-achieved. Therefore, the change in the label of v will be the same after applying a in s and applying b in s, and ca(v) = cb(τ(v)).
Edge color compatibility. TILGs are bipartite undirected graphs, thus all edges can be represented as an object–proposition pair with the color representing the position of the object in the instantiation of the predicate resulting in the given proposition. Thus we need to show that, for all (o, p) ∈ Ea, la((o, p)) and lb((τ(o), τ(p))) are the same. Notice that o ∈ O is always in Va^old and in Vb^old. If p ∈ Va^old, then lb((τ(o), τ(p))) = lb((σ(o), σ(p))) by the definition of τ and it equals la((o, p)) since σ is an automorphism. Otherwise, p ∈ Va^new and, without loss of generality, let p be p_i^a(Ai1, Ai2, . . . , Aiki) and o be Aij for j ∈ {1, . . . , ki}. Since both Ga and Gb are valid TILGs, we have lb((τ(Aij), τ(p_i^a(Ai1, Ai2, . . . , Aiki)))) = lb((Bij, p_i^b(Bi1, Bi2, . . . , Biki))) = j, which equals la((Aij, p_i^a(Ai1, Ai2, . . . , Aiki))) = j, i.e., Aij and Bij are the j-th argument of p_i^a(Ai1, Ai2, . . . , Aiki) and p_i^b(Bi1, Bi2, . . . , Biki), respectively. Therefore, τ is edge color compatible.

Related Work

Symmetries in planning. Identifying symmetries during search is crucial for pruning search spaces, and numerous studies have investigated this area (Domshlak, Katz, and Shleyfman 2012; Shleyfman et al. 2015; Sievers et al. 2019b). Pochter, Zohar, and Rosenschein (2011) introduce the Problem Description Graph (PDG), a structure that integrates all variables, their possible values, and action preconditions and effects to compute automorphism generators at the initial state. These generators are key to creating canonical forms that aid in identifying symmetries during the search process. The paper also introduces a shallow pruning method, which eliminates branches without producing child states. While both methods share similarities with ours, our methods are notably more efficient. For state pruning, our method eliminates the need for automorphism calculations and canonical transformations, substantially reducing computational overhead. Action pruning focuses exclusively on the current state and omits extraneous values and actions, significantly reducing graph size and, consequently, the automorphism computation time. Furthermore, PDG requires fixing the number of objects (Corrêa et al. 2024) to find all generators, while TILG is based on each state, so it can adjust to such changes.

Sievers et al. (2019a) adopt graphs normally used for symmetry detection, such as the PDG or the Abstract Structure Graph (ASG) (Sievers et al. 2019b), to learn planning portfolios. However, these graphs are transformed into grayscale images for processing by a convolutional network. Moreover, the ASG, which encodes a lifted planning problem, creates computational issues for learning with GNNs, as its encoding of predicate and action schema arguments via a directed path of length linear in the number of arguments requires a large receptive field, and its directed edges restrict the information flow. Recent independent work by (Drexler et al. 2024a) proposes the Object Graph to represent states and detect all state symmetries by using graph isomorphism algorithms. While this approach effectively reduces symmetries in the training data, the time required to compute isomorphisms makes it impractical during the search process. In contrast, TILG provides a more compact structure than the Object Graph. By using edge types to represent parameter indices and consolidating each proposition into a single vertex, TILG is both clearer and contains fewer vertices.

Graph neural networks in planning. The widespread success of graph neural networks in various fields has motivated an increasing number of researchers to explore neural networks for planning problems (Ståhlberg, Bonet, and Geffner 2022; Horčík and Šír 2024; Hao et al. 2024; Drexler et al. 2024b).

Chen et al. (Chen, Thiébaux, and Trevizan 2024; Chen, Trevizan, and Thiébaux 2024) introduce two graph structures: the Lifted Learning Graph (LLG) and the Instance Learning Graph (ILG). LLG includes objects, propositions, goals, and action schemas to compute domain-independent heuristic values, while ILG simplifies LLG by excluding action schemas, which reduces graph size and increases processing speed in the domain-dependent setting. Our work builds upon these foundations by introducing the Typed Instance Learning Graph (TILG), which significantly enhances the graph-level representation with object types and static propositions. As illustrated in Figure 1, a direct comparison between the two graphs reveals that ILG has significantly fewer vertices and connections compared to TILG, resulting in a much smaller graph. The absence of object types and static propositions and their connections with objects in ILG may restrict the ability of neural networks to effectively learn from data.

Additional Experimental Setting

Data Generation

The 2023 International Planning Competition Learning Track (Seipp and Segovia-Aguas 2023) is used in our experiments. It consists of 10 domains, each of which contains 100 training problems and 90 testing problems. The testing problems are categorized into three levels of difficulty: easy, medium, and hard, with 30 problems in each category. To further evaluate the proposed pruning methods, we employ six additional domains that contain a high number of symmetrical states. The problems for these domains are generated using PDDL generators (Seipp, Torralba, and Hoffmann 2022). To replicate the setting of the Learning Track, we generate training problems and testing problems for each additional domain (see Table 4).

For training data, we follow the setting of GOOSE (Chen, Thiébaux, and Trevizan 2024), and solve the training problems using the Scorpion planner (Seipp, Keller, and Helmert 2017), saving the graphs of the states on the optimal paths as data, and their optimal heuristic values as targets. One challenge we encountered with the Scorpion planner is the limited number of training problems it is able to solve. To address this, we developed a simple method creating additional training problems by considering increasingly larger subsets of the goal of an existing problem.
Figure 1: A planning problem in the childsnack domain is illustrated using both (a) ILG and (b) TILG. Note that the colors used are for visualization purposes only, differing from the color encoding employed for computing automorphisms. Observe that ILG lacks types (besides "Object") and static propositions.

Domain      Training parameters                               Num.  Solved   Testing parameters                                Num.
gripper     n ∈ [1, 30]                                        30     16     n ∈ [50, 800]                                      90
grippers    n ∈ [1, 2], r ∈ [1, 3], o ∈ [4, 16]                30     30     n ∈ [5, 50], r ∈ [3, 5], o ∈ [20, 400]             90
logistics   a ∈ [1, 3], c ∈ [2, 3], s ∈ [2, 3], p ∈ [2, 32]    50     40     a ∈ [1, 5], c ∈ [2, 5], s ∈ [2, 3], p ∈ [5, 100]   90
movie       n ∈ [1, 30]                                        30     30     n ∈ [50, 800]                                      90
tsp         n ∈ [1, 30]                                        30     30     n ∈ [50, 1600]                                     90
tyreworld   n ∈ [1, 30]                                        30     17     n ∈ [11, 100]                                      90

Table 4: Parameters used to generate training and testing problems. The "Solved" column indicates the number of training problems solved by the Scorpion planner (Seipp, Keller, and Helmert 2017).

Specifically, our method extracts the sequence in which subgoals were last achieved in optimal solutions of training problems. For an existing problem with n subgoals, we create n − 1 new problems, where the goal of the kth new problem consists of the first k subgoals of the sequence, while other elements remain unchanged.

These newly generated problems are then solved by the Scorpion planner as well. A key advantage of this method is that the new problems produced are simpler than the originals, enabling more problems to be solved. Furthermore, the method can be considered a form of offline data augmentation, as it doesn't rely on external sources of additional training data.
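A minimal sketch of this prefix-goal construction, assuming the subgoal achievement order has already been extracted from an optimal plan (the data layout and helper names are hypothetical):

# Sketch: given the order in which subgoals were last achieved along an optimal plan,
# create n-1 easier problems whose goals are the first k subgoals of that order.
def prefix_goal_problems(problem, subgoal_order):
    """problem: dict with at least a 'goal' field; subgoal_order: list of n goal propositions."""
    new_problems = []
    n = len(subgoal_order)
    for k in range(1, n):                      # k = 1, ..., n-1
        easier = dict(problem)                 # objects and initial state stay unchanged
        easier["goal"] = set(subgoal_order[:k])
        new_problems.append(easier)
    return new_problems

problem = {"init": {"at(p1,l1)", "at(p2,l1)"}, "goal": {"at(p1,l2)", "at(p2,l3)"}}
order = ["at(p2,l3)", "at(p1,l2)"]             # order of last achievement in an optimal plan
assert [p["goal"] for p in prefix_goal_problems(problem, order)] == [{"at(p2,l3)"}]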
Network Structure

Our graph network is implemented using the standard PyTorch Geometric package (Fey and Lenssen 2019). It comprises multiple RGCN layers with a hidden dimension of 64 (Schlichtkrull et al. 2018), followed by global add pooling and a linear layer that produces a one-dimensional output representing heuristic values. For most domains, we use three RGCN layers; however, the spanner, logistics and tyreworld domains require four layers, while the sokoban domain employs seven layers.

This layer adjustment addresses an observed issue: with fewer layers, the GNNs produce identical output vectors, leading to lower performance and substantial pruning errors in certain domains. This raises an important question: should the same number of layers be used across all domains? Due to substantial structural differences in TILG across domains, a limited number of layers often fails to effectively aggregate information for specific graph types. For example, in the logistics domain, information about the goal location is critical but can be distant from the package's current location in TILG, necessitating additional layers to effectively capture and aggregate this information. Therefore, the number of layers in GNNs for TILG should be tailored to each domain.

Validation Details

To ensure result stability and reduce the bias in the training datasets, we apply early stopping based on a validation set to select the optimal models (Bai et al. 2022, 2023b). Specifically, since the training data only comes from optimal paths, a prediction that closely approximates the ground truth does not guarantee that it is the lowest in the heuristic queue. To address this issue, we save all sibling states on the optimal paths to select models. The selection metric is the validation accuracy, where the correct result corresponds to the optimal state with the lowest heuristic value among sibling states. Note: for validation, the generated sub-problems are not included.
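As a sketch, the sibling-ranking validation accuracy described above could be computed as follows; the input layout is an illustrative assumption.

# Sketch: validation accuracy = fraction of optimal states whose predicted heuristic
# is the lowest among their sibling states.
def sibling_ranking_accuracy(groups):
    """groups: list of (predicted_values, optimal_index) pairs, one per expansion on an optimal path."""
    correct = 0
    for predictions, optimal_index in groups:
        if min(range(len(predictions)), key=lambda i: predictions[i]) == optimal_index:
            correct += 1              # optimal sibling ranked first by the learned heuristic
    return correct / len(groups)

groups = [([3.2, 1.1, 2.7], 1), ([0.9, 0.8], 0)]   # in the second group, the optimal state is not ranked lowest
assert abs(sibling_ranking_accuracy(groups) - 0.5) < 1e-9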

Reproducibility
To ensure reproducibility, we have implemented several
measures. First, we report both the mean and standard devi-
ation of all experimental results. Second, to enable reliable
replication, we conduct experiments in PyTorch’s determin-
istic mode throughout model training, ensuring that results
are consistent when using the same hardware and software
versions. Finally, comprehensive details regarding the hard-
ware, software, container image, and datasets used in this
research will be made publicly available in our code reposi-
tory, which will be released upon publication.
