Learning Efficiency Meets Symmetry Breaking

Yingbin Bai, Sylvie Thiébaux, Felipe Trevizan
Abstract

Learning-based planners leveraging Graph Neural Networks can learn search guidance applicable to large search spaces, yet their potential to address symmetries remains largely unexplored. In this paper, we introduce a graph representation of planning problems allying learning efficiency with the ability to detect symmetries, along with two pruning methods, action pruning and state pruning, designed to manage symmetries during search. The integration of these techniques into Fast Downward achieves a first-time success over LAMA on the latest IPC learning track dataset. Code is released at: https://github.com/bybeye/Distincter.
Introduction

Over the past two decades, heuristic search has achieved significant success across a variety of planning problems, and has become the standard approach in the field (Richter and Westphal 2010; Höller et al. 2020; Corrêa et al. 2022; Geißer et al. 2022; Klößner, Seipp, and Steinmetz 2023). Nevertheless, even in classical planning, scalability remains a significant challenge for these methods. This has led a growing number of researchers to turn to learning-based methods, particularly using Graph Neural Networks (GNNs) (Toyer et al. 2020; Shen, Trevizan, and Thiébaux 2020; Ståhlberg, Bonet, and Geffner 2022; Chen, Trevizan, and Thiébaux 2024; Horčík and Šír 2024; Hao et al. 2024; Drexler et al. 2024b). Unlike traditional model-based methods, which rely solely on analysing planning domain and problem definitions, GNNs are capable of learning patterns and strategies from existing plans to enhance search efficiency and adaptability.

However, learning efficiency alone is insufficient to address the challenges inherent in large-scale planning, which often involves a substantial number of symmetrical states (Wehrle et al. 2015; Sievers et al. 2019b). Although these states do not affect plan quality, they consume significant computational resources and can considerably slow down the search process. In this paper, we use GNNs, which employ permutation-invariant activation functions to learn a permutation-invariant function, allowing them to produce consistent outputs for symmetrical inputs. Despite this advantage, the full potential of this feature has not yet been effectively harnessed to detect and break symmetries during the search process.

In this paper, we remedy this by introducing a graph representation designed to achieve two key objectives: learning efficiency and symmetry reduction. Leveraging the strengths of this representation, we propose two pruning methodologies: action pruning and state pruning. Action pruning infers symmetries by analyzing object involvement in action parameters, without generating child states or computing their heuristic values. Additionally, since GNNs can retain invariant outputs for symmetrical inputs, state pruning exploits this property to efficiently identify symmetries between states.

To evaluate the proposed techniques, we implemented them on top of Fast Downward (Helmert 2006) in a planner called Distincter and carried out experiments on the 2023 International Planning Competition Learning Track. The overall coverage of Distincter surpasses that of the traditional SOTA method, LAMA (Richter and Westphal 2010), for the first time in the recent literature on learning planning heuristics, marking a significant milestone for learning-based methods.

In terms of related work, recent independent work by Drexler et al. (2024b) removes symmetries in the training set in offline mode, thereby improving training effectiveness. In contrast, our approach focuses on removing symmetries during the search process, so as to enhance search efficiency and scale to large planning problems.
Background and notation

A lifted planning problem is defined as a tuple Π = ⟨O, T, P, A, I, G⟩, where O denotes a set of objects, T is a set of object types, P consists of first-order predicates, A comprises action schemas, I specifies the current (or initial) state, and G delineates the goal.

A predicate p ∈ P has parameters x1, . . . , xpn for pn ∈ ℕ, where each parameter requires a specific type of object. A predicate can be instantiated by assigning each xi to an object from O, resulting in a proposition ρ. A state is an assignment of truth values to the propositions.

An action schema a = ⟨Xa, pre(a), add(a), del(a)⟩ is defined as a tuple comprising a list of typed parameters Xa = (xa1, . . . , xan), along with sets of preconditions, add effects, and delete effects, all of which are predicates in P with parameters from Xa. When all parameters of an action schema are instantiated with objects of the required types, the action is referred to as a ground action. A ground action a is applicable in a state s if pre(a) ⊆ s. When a is applied to s, the resulting state s′ is given by (s \ del(a)) ∪ add(a). In this context, the state s is referred to as the parent state, and s′ is known as the child state. Since the set of applicable actions for a parent state is typically not a singleton, expanding a parent state usually generates a set of child states.

A sequence of actions a1, . . . , an is applicable in a state s if there exists a sequence of states s0, . . . , sn such that s0 = s, and for each i ∈ {1, . . . , n}, the state si is the result of applying ai in si−1. The aim is to find a plan for a given planning problem Π, which is a sequence of ground actions that is applicable in the initial state I and results in a state sn such that G ⊆ sn.
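To make the ground-action semantics above concrete, the following is a minimal sketch (not part of the paper) of applicability and successor computation over propositions stored as frozensets; the dataclass and the tiny blocksworld-style action are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GroundAction:
    name: str
    pre: frozenset   # propositions that must hold
    add: frozenset   # propositions made true
    dele: frozenset  # propositions made false ("del" is a Python keyword)

def applicable(action: GroundAction, state: frozenset) -> bool:
    # a is applicable in s if pre(a) ⊆ s
    return action.pre <= state

def apply(action: GroundAction, state: frozenset) -> frozenset:
    # s' = (s \ del(a)) ∪ add(a)
    return (state - action.dele) | action.add

# Example: stack block a onto block b
s = frozenset({"clear(a)", "clear(b)", "ontable(a)"})
stack = GroundAction("stack(a,b)",
                     pre=frozenset({"clear(a)", "clear(b)", "ontable(a)"}),
                     add=frozenset({"on(a,b)"}),
                     dele=frozenset({"ontable(a)", "clear(b)"}))
assert applicable(stack, s)
print(apply(stack, s))  # frozenset({'clear(a)', 'on(a,b)'})
```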
A colored (or labelled) graph is a tuple G = ⟨V, E, c, l⟩ where V is the set of vertices, E is the set of edges, and c (resp. l) maps vertices (resp. edges) to their color. Two graphs G = ⟨V, E, c, l⟩ and G′ = ⟨V′, E′, c′, l′⟩ are isomorphic, denoted by G ≅ G′, if there exists a bijection τ : V → V′ such that (u, v) ∈ E iff (τ(u), τ(v)) ∈ E′, c′(τ(v)) = c(v) for all v ∈ V, and l′((τ(u), τ(v))) = l((u, v)) for all (u, v) ∈ E.

An automorphism of G is an isomorphism σ between G and itself. The set of all automorphisms of G forms a group under the operation of composition, known as the automorphism group Aut(G) of the graph. The orbit of a vertex v in a graph consists of all vertices that can be transformed into v by some automorphism in Aut(G). This implies that any two vertices within the same orbit are structurally equivalent in the graph, maintaining the same connections and roles relative to other vertices and edges.

Distincter

Typed Instance Learning Graph (TILG)

Our graph representation extends the Instance Learning Graph (ILG) (Chen, Trevizan, and Thiébaux 2024), maintaining similar structures but offering additional information for learning and symmetry detection. The graph's vertices represent objects and propositions in the initial (current) state and the goal, and edges exist between propositions and the objects in their parameter list.¹ Vertex features capture the object types, the predicates instantiated by the propositions, and whether goal propositions have been achieved. Edge features capture the index of objects in proposition parameter lists. Formally:

Definition 1. Let Π = ⟨O, T, P, A, I, G⟩ represent a lifted planning problem. The typed instance learning graph (TILG) for Π is the undirected graph GΠ = ⟨V, E, c, l⟩, such that:

• V = O ∪ I ∪ G
• E = {(o, p(o1, ..., on)) | o ∈ O, ∃i o = oi, p(o1, ..., on) ∈ I ∪ G}
• c : V → {(status, class) | status ∈ {0, 1, 2, 3}, class ∈ T ∪ P} maps each vertex to a tuple where:
  – status indicates the goal status of propositions: 0 for non-goal propositions in I \ G, 1 for unachieved goal propositions in G \ I, and 2 for achieved goal propositions in I ∩ G; status = 3 for object vertices.
  – class refers to the object type for object vertices, and for proposition vertices, it denotes the predicate of which the proposition is an instance.
• l : E → ℕ, where for each edge e ∈ E, l(e) indicates the index of the object in the proposition parameters.

¹ In the following, we will use the word symmetric to refer to states represented by isomorphic TILGs and to objects or propositions that are related via τ (or σ, depending on the context).

In ILG, the object type information is absent, whereas TILG embeds it within each object vertex. This may seem minor, but it adds valuable information to each object vertex, significantly enriching the information available. Moreover, Fast Downward omits static propositions during search, which causes them to be missing in existing ILG implementations as well. While this omission does not affect traditional heuristic methods, it significantly impacts learning methods, which estimate heuristics based on the graph. Without static propositions, crucial information is lost, leading to blind guesses for some actions. For instance, "waiting" propositions in the childsnack domain are static, and without this information, planners are unable to determine which table the tray should be moved to. Therefore, TILG includes static propositions.

In the following, all elements of the problem Π are fixed, except for the current state s. We shall therefore identify Π with s and will refer to the TILG GΠ as Gs.
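To illustrate Definition 1, the sketch below builds a TILG-like colored graph for a toy state and goal using networkx. The status/class encoding follows the definition, but the helper name, the data layout and the tiny blocksworld-style instance are illustrative assumptions, not the paper's implementation.

```python
import networkx as nx

def build_tilg(objects, init, goal):
    """objects: {name: type}; init/goal: sets of (predicate, (args...)) tuples."""
    g = nx.Graph()
    # Object vertices: status 3, class = object type.
    for obj, typ in objects.items():
        g.add_node(("obj", obj), status=3, cls=typ)
    # Proposition vertices: status encodes goal membership (Definition 1).
    for prop in init | goal:
        pred, args = prop
        if prop in init and prop not in goal:
            status = 0          # non-goal proposition in I \ G
        elif prop in goal and prop not in init:
            status = 1          # unachieved goal proposition in G \ I
        else:
            status = 2          # achieved goal proposition in I ∩ G
        v = ("prop", pred, args)
        g.add_node(v, status=status, cls=pred)
        # Edge labels carry the index of the object in the proposition parameters.
        for i, obj in enumerate(args):
            g.add_edge(("obj", obj), v, label=i)
    return g

objects = {"a": "block", "b": "block"}
init = {("ontable", ("a",)), ("ontable", ("b",)), ("clear", ("a",)), ("clear", ("b",))}
goal = {("on", ("a", "b"))}
tilg = build_tilg(objects, init, goal)
print(tilg.number_of_nodes(), tilg.number_of_edges())  # 7 6
```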
Action Pruning

With Greedy Best-First Search (GBFS), planners select the state with the smallest heuristic value to expand, which requires computing the heuristic value of all child states. When child states contain a large number of symmetries, this can result in significant time wasted on redundant calculations. Shallow pruning was designed to address this challenge (Pochter, Zohar, and Rosenschein 2011). However, the problem description graph (PDG) used in shallow pruning requires instantiating all predicates and action schemas within the graph, resulting in significant computational overhead for each state. To improve efficiency, we introduce Action Pruning, which replaces the PDG with TILG. A key innovation of action pruning is its ability to infer symmetrical child states from the parent state, eliminating the need for action preconditions and effects in the graph. By leveraging the much more compact TILG representation and its inference capability, action pruning enables faster automorphism calculations.

Definition 2 (Object Tuple Equivalence). Let ⟨A1, . . . , An⟩ and ⟨B1, . . . , Bn⟩ be two tuples of objects s.t. Ai and Bi are in O, with corresponding vertices ui and vi in the TILG Gs. We say that ⟨A1, . . . , An⟩ is equivalent to ⟨B1, . . . , Bn⟩ in s, denoted ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩, iff there exists an automorphism of Gs represented by the bijective function σ s.t. σ(ui) = vi for all i ∈ {1, . . . , n}.
Theorem 1. Let Ai and Bi be objects in O for i ∈ {1, . . . , n} with Ai ≠ Aj and Bi ≠ Bj for all i ≠ j. Let α ∈ A be an action schema, and consider two ground actions, a = α(A1, A2, . . . , An) and b = α(B1, B2, . . . , Bn), applicable in a state s, resulting in successor states sa and sb respectively. If ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩ in s, then the TILGs Gsa and Gsb are isomorphic, i.e., Gsa ≅ Gsb.

The proof of Theorem 1 is in the supplementary material. Unfortunately, identifying all isomorphic successor states in order to prune actions requires testing an exponential number of tuples against the automorphisms of Gs. Therefore, we resort to a simpler method that over-approximates the set of equivalent tuples and isomorphic successor states, and consequently does not preserve the completeness of the search process. It relaxes the conditions of the theorem by checking the equivalence of all individual pairs Ai and Bi in s, i.e., the condition that ⟨A1, . . . , An⟩ ≃ ⟨B1, . . . , Bn⟩ is replaced with ⟨Ai⟩ ≃ ⟨Bi⟩ for all i, or in other terms that Ai and Bi are in the same orbit of Gs. In our experiments, we find that this over-approximation yields good results in practice. Moreover, we did not observe any failure due to incompleteness, and therefore do not currently employ any fallback mechanism.
Algorithm 1: Action pruning algorithm
Input: Planning problem with current state s
Input: Set As of actions applicable in s
Output: Pruned action set Ap ⊆ As
 1: K ← ∅, Ap ← ∅
 2: Graph Gs ← TILG(s) with encoding in Eq. 1
 3: Orbits Os ← Nauty(Gs)
 4: for a in As do
 5:   Ka ← Replace_params_with_orbits(a, Os)
 6:   if Ka not in K then
 7:     K ← K ∪ {Ka}
 8:     Ap ← Ap ∪ {a}
 9:   end if
10: end for
11: return Ap

This process of action pruning is outlined in Algorithm 1. First, the planning problem with current state s is converted into a TILG Gs. The Nauty library (McKay and Piperno 2014) is then utilized to compute the orbits Os of Gs. Since Nauty lacks support for feature vectors, we aggregate vertex features into a unique vertex color to detect automorphisms. This color-coding strategy is detailed in Equation 1:

    color = Σ_{i=1}^{N} 10^{β_i} × F_i,   with β_i = Σ_{n=1}^{i−1} ⌈log₁₀ M_n⌉ for i ≥ 2 and β_1 = 0,      (1)

where N denotes the number of features, F_i represents the value of feature i, and M_n is the maximum possible value of feature n.
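A minimal sketch of the digit-packing behind Eq. 1 follows; the feature layout in the example (a status value and a class index) is an illustrative assumption.

```python
import math

def vertex_color(features, max_values):
    """Pack a feature vector into a single integer color (Eq. 1).

    features[i] is the value of feature i and max_values[i] its maximum
    possible value; each feature gets its own block of decimal digits,
    so distinct feature vectors map to distinct colors."""
    color, beta = 0, 0
    for f, m in zip(features, max_values):
        color += 10 ** beta * f
        beta += math.ceil(math.log10(m))   # digits reserved for this feature
    return color

# e.g. status in {0,1,2,3} and a class index below 100:
print(vertex_color([2, 37], [4, 100]))   # 372 -> last digit: status 2, class 37
```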
After obtaining the orbits Os of Gs, the parameters of each applicable action a are substituted with their respective orbit IDs, generating a unique hash key Ka. This hash key is subsequently used to identify and eliminate symmetric actions, ensuring that only distinct actions are retained in Ap for further processing.
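The orbit-key step of Algorithm 1 can be sketched as below, assuming an orbit_of mapping from objects to orbit IDs (e.g., derived from Nauty's output) and ground actions given as (schema, args) pairs; all names are illustrative.

```python
def prune_actions(applicable_actions, orbit_of):
    """Keep one representative per orbit-equivalent parameter tuple.

    applicable_actions: iterable of (schema_name, arg_tuple) ground actions.
    orbit_of: dict mapping each object to the ID of its orbit in Aut(Gs)."""
    seen, pruned = set(), []
    for schema, args in applicable_actions:
        # Replace each parameter by its orbit ID; actions whose parameters
        # are pairwise symmetric collapse onto the same key.
        key = (schema, tuple(orbit_of[o] for o in args))
        if key not in seen:
            seen.add(key)
            pruned.append((schema, args))
    return pruned

# Two symmetric balls in the same room collapse onto one pick-up action:
orbits = {"ball1": 0, "ball2": 0, "left": 1, "roomA": 2}
actions = [("pick", ("ball1", "roomA", "left")), ("pick", ("ball2", "roomA", "left"))]
print(prune_actions(actions, orbits))  # [('pick', ('ball1', 'roomA', 'left'))]
```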
State Pruning

Symmetries arise not only between child states but also across states from different parents. Many state pruning approaches have been proposed and proven useful in classical planning (Pochter, Zohar, and Rosenschein 2011; Domshlak, Katz, and Shleyfman 2012). However, the main issue limiting their widespread use in planning problems is their high computational cost. To address this issue, we propose a novel method that performs state pruning with negligible additional overhead. Specifically, building on the permutation invariance property of GNNs, we use the embeddings from the second-to-last layer of the network as hash keys to efficiently detect and eliminate symmetries across states.

The idea of using neural network outputs to check similarity is not new, having been employed in Siamese networks since early work in deep learning (Bromley et al. 1993). These identical-architecture, weight-shared networks are specifically designed to assess and compare the similarity between two inputs. This approach has proven effective across various fields, including fingerprint identification (Li et al. 2021) and anomaly detection (Zhou et al. 2021). For GNNs, Chen et al. (2019) highlight the equivalence between graph isomorphism testing and approximating permutation-invariant functions. Moreover, standard GNNs have been shown to possess an expressive power comparable to that of the 1-WL test (Xu et al. 2019). While this implies GNNs may be unable to distinguish some non-isomorphic graphs, compromising the completeness of the search when state pruning is used, our results demonstrate that GNNs based on TILG can be highly effective in both heuristic prediction and state pruning.

The TILG Gsi for the current state si is fed through a graph network ϕθ to encode an embedding zi. Subsequently, zi is processed by a fully connected linear layer φθ to generate a heuristic value ĥi ∈ ℝ. This process is represented by zi = ϕθ(Gsi) and ĥi = φθ(zi).

Next, zi is rounded up and encoded using MD5 to shorten its length, serving as a key in a hash map for state matching. Since zi is efficiently captured during the network's forward pass, there is no need to generate keys through computationally expensive methods like calculating isomorphisms in the PDG (Pochter, Zohar, and Rosenschein 2011), resulting in minimal additional cost for state pruning.
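A minimal sketch of this state key, assuming the pooled embedding zi is available as a tensor after the forward pass; the rounding granularity and the function names are illustrative assumptions.

```python
import hashlib
import torch

seen_keys = set()

def state_key(z: torch.Tensor) -> str:
    # Round the embedding up so tiny numerical noise does not split
    # symmetric states, then hash it to a short fixed-length key.
    rounded = torch.ceil(z).to(torch.int64).tolist()
    return hashlib.md5(str(rounded).encode()).hexdigest()

def prune_state(z: torch.Tensor) -> bool:
    """Return True if a state with an identical embedding key was already seen."""
    key = state_key(z)
    if key in seen_keys:
        return True
    seen_keys.add(key)
    return False
```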
Experiments

Datasets. We evaluate our framework, Distincter, on the 2023 International Planning Competition Learning Track (Seipp and Segovia-Aguas 2023), which includes ten domains. In ablation experiments, to assess the effectiveness of the proposed pruning techniques, we additionally use six domains (gripper, grippers, logistics, movie, tsp, and tyreworld), whose training and testing problems are generated with the parameters listed in Table 4. All experiments are conducted on a single CPU core with an NVIDIA A6000 GPU and 8GB of memory, with a 30 minute timeout per problem. The mean and standard deviation are computed from three trials.
Domain        hFF   LAMA   GOOSE   OptRank   GPR   Distincter
blocksworld    28     61   61±10     44±11    69         88±4
childsnack     26     34    16±4      32±1    20         64±5
ferry          71     70    70±0      64±4    82         83±1
floortile      10     10     1±0       1±0     2          2±0
miconic        90     90    89±1      88±4    90         90±0
rovers         29     70    28±1      31±2    36         42±2
satellite      64     90    29±2      29±3    39        48±17
sokoban        36     40    34±0      32±1    38         32±2
spanner        30     30   39±16      65±0    74         90±0
transport      41     68    37±4      42±5    28         50±3
Sum           425    563     405       429   478          589

Table 1: Coverage comparison with SOTA methods on the 2023 International Planning Competition Learning Track.

Domain        LAMA   Distincter
blocksworld    390       198±10
childsnack      45         34±3
ferry          257        206±2
floortile       34         32±0
miconic        324        273±8
rovers          72       106±11
satellite       18         27±8
sokoban         46        49±14
spanner         14         16±0
transport       49         45±3
Sum           1249          987

Domain        None    Action   State    Distincter
blocksworld   79±11     79±7    88±3          88±4
childsnack     34±4     63±3    61±1          64±5
ferry          82±0     82±0    83±1          83±1
floortile       2±1      2±0     2±0           2±0
miconic        90±0     90±0    90±0          90±0
rovers         41±2     41±3    41±2          42±2
satellite     45±13    46±13   47±17         48±17
sokoban        32±2     32±2    32±2          32±2
spanner        83±0     90±0    83±0          90±0
transport      42±1     42±1    49±2          50±3
gripper        24±9     75±4    90±0          90±0
grippers       62±5     89±1    85±1          90±1
logistics      19±9     36±4    53±4          52±3
movie          90±0    74±17    90±0         75±17
tsp           76±24     78±0    90±0          90±0
tyreworld       0±0      1±1   64±20         65±21
Sum             803      920    1048          1051

Table 3: Ablation study. "None" refers to GBFS + GNN heuristic without pruning, "Action" denotes the use of action pruning, and "State" represents the use of state pruning.

Figure 1: A planning problem in the childsnack domain is illustrated using both ILG and TILG. Note that the colors used are for visualization purposes only, differing from the color encoding employed for computing automorphisms. Observe that ILG lacks types (besides "Object") and static propositions.

Domain      Training Parameters                              Num.  Solved   Testing Parameters                               Num.
gripper     n ∈ [1, 30]                                       30     16     n ∈ [50, 800]                                     90
grippers    n ∈ [1, 2], r ∈ [1, 3], o ∈ [4, 16]               30     30     n ∈ [5, 50], r ∈ [3, 5], o ∈ [20, 400]            90
logistics   a ∈ [1, 3], c ∈ [2, 3], s ∈ [2, 3], p ∈ [2, 32]   50     40     a ∈ [1, 5], c ∈ [2, 5], s ∈ [2, 3], p ∈ [5, 100]  90
movie       n ∈ [1, 30]                                       30     30     n ∈ [50, 800]                                     90
tsp         n ∈ [1, 30]                                       30     30     n ∈ [50, 1600]                                    90
tyreworld   n ∈ [1, 30]                                       30     17     n ∈ [11, 100]                                     90

Table 4: Parameters used to generate training and testing problems. The "Solved" column indicates the number of problems solved by the Scorpion planner (Seipp, Keller, and Helmert 2017).
Our data augmentation generates new training problems whose goals are successively larger subsets of the goal of an existing problem. Specifically, our method extracts the sequence in which subgoals were last achieved in optimal solutions of training problems. For existing problems with n subgoals, we create n − 1 new problems, where the goal of the kth new problem consists of the k first subgoals of the sequence, while other elements remain unchanged.

These newly generated problems are then solved by the Scorpion planner as well. A key advantage of this method is that the new problems produced are simpler than the originals, enabling more problems to be solved. Furthermore, the method can be considered a form of offline data augmentation, as it does not rely on external sources of additional training data.
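A sketch of this subgoal-prefix construction, assuming each training problem provides its goal atoms and the order in which they were last achieved along an optimal plan; the task representation and helper name are illustrative.

```python
def subgoal_prefix_problems(problem, achieved_order):
    """For a problem with n subgoals and the order in which they were last
    achieved in an optimal solution, build n-1 easier problems whose goal is
    the first k subgoals of that order (k = 1..n-1); everything else is kept."""
    new_problems = []
    for k in range(1, len(achieved_order)):
        sub = dict(problem)                      # shallow copy of the task
        sub["goal"] = set(achieved_order[:k])    # first k subgoals only
        new_problems.append(sub)
    return new_problems

base = {"objects": {"a", "b", "c"}, "init": {"ontable(a)"}, "goal": {"on(a,b)", "on(b,c)"}}
print(len(subgoal_prefix_problems(base, ["on(b,c)", "on(a,b)"])))  # 1
```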
Network Structure

Our graph network is implemented using the standard PyTorch Geometric package (Fey and Lenssen 2019). It comprises multiple RGCN layers with a hidden dimension of 64 (Schlichtkrull et al. 2018), followed by global add pooling and a linear layer that produces a one-dimensional output representing heuristic values. For most domains, we use three RGCN layers; however, the spanner, logistics and tyreworld domains require four layers, while the sokoban domain employs seven layers.

This layer adjustment addresses an observed issue: with fewer layers, the GNNs produce identical output vectors, leading to lower performance and substantial pruning errors in certain domains. This raises an important question: should the same number of layers be used across all domains? Due to substantial structural differences in TILG across domains, a limited number of layers often fails to effectively aggregate information for specific graph types. For example, in the logistics domain, information about the goal location is critical but can be distant from the package's current location in TILG, necessitating additional layers to effectively capture and aggregate this information. Therefore, the number of layers in GNNs for TILG should be tailored to each domain.
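A hedged sketch of such an architecture in PyTorch Geometric follows; it assumes vertex features have already been embedded to 64 dimensions and that edge labels act as relation types, and the dimensions, relation count and class name are illustrative rather than the released implementation.

```python
import torch
from torch import nn
from torch_geometric.nn import RGCNConv, global_add_pool

class TILGNet(nn.Module):
    def __init__(self, in_dim=64, hidden=64, num_relations=8, num_layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [RGCNConv(in_dim if i == 0 else hidden, hidden, num_relations)
             for i in range(num_layers)])
        self.readout = nn.Linear(hidden, 1)    # one-dimensional heuristic value

    def forward(self, x, edge_index, edge_type, batch):
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index, edge_type))
        z = global_add_pool(x, batch)          # graph embedding (also the hash key)
        h = self.readout(z).squeeze(-1)        # predicted heuristic value
        return h, z
```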
Validation Details

To ensure result stability and reduce the bias in the training datasets, we apply early stopping based on a validation set to select the optimal models (Bai et al. 2022, 2023b). Specifically, since the training data only comes from optimal paths, a prediction that closely approximates the ground truth does not guarantee that it is the lowest in the heuristic queue. To address this issue, we save all sibling states on the optimal paths to select models. The selection metric is the validation accuracy, where a result counts as correct when the optimal state has the lowest heuristic value among its sibling states. Note that the generated sub-problems are not included in the validation set.
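A small sketch of this sibling-ranking accuracy, assuming the validation set stores, for each expansion, the predicted heuristic of the optimal child together with those of its siblings; the data layout is an assumption.

```python
def sibling_accuracy(validation_groups):
    """validation_groups: list of (h_optimal, [h_sibling, ...]) prediction pairs.

    A group counts as correct when the optimal child has the strictly
    lowest predicted heuristic among its siblings."""
    correct = sum(1 for h_opt, h_sibs in validation_groups
                  if all(h_opt < h for h in h_sibs))
    return correct / max(len(validation_groups), 1)

print(sibling_accuracy([(1.2, [3.0, 2.5]), (4.0, [3.9, 6.1])]))  # 0.5
```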
Reproducibility

To ensure reproducibility, we have implemented several measures. First, we report both the mean and standard deviation of all experimental results. Second, to enable reliable replication, we conduct experiments in PyTorch's deterministic mode throughout model training, ensuring that results are consistent when using the same hardware and software versions. Finally, comprehensive details regarding the hardware, software, container image, and datasets used in this research will be made publicly available in our code repository, which will be released upon publication.
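Deterministic training of this kind can be set up with standard PyTorch switches, as in this sketch (the exact flags and seed used in the released code may differ):

```python
import os
import random
import numpy as np
import torch

def set_deterministic(seed: int = 0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Required by some deterministic CUDA kernels (e.g. cuBLAS matmuls).
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
```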