Fast and Accurate Data-Driven Goal Recognition Using Process Mining Techniques
Zihang Sua,∗, Artem Polyvyanyya , Nir Lipovetzkya , Sebastian Sardinab , Nick van Beestc
a The University of Melbourne, Parkville, VIC 3010, Australia
b Royal Melbourne Institute of Technology, 124 La Trobe St., Melbourne, VIC 3000, Australia
c CSIRO, 41 Boggo Road, Dutton Park, QLD 4102, Australia
Abstract
The problem of goal recognition asks us to automatically infer an accurate probability distribution over the possible goals an autonomous agent is attempting to achieve in its environment. State-of-the-art approaches for goal recognition operate under full knowledge of the environment and of the operations the agent can take. This knowledge, however, is often not available in real-world applications. Given historical observations of the agents' behaviors in the environment, we learn skill models that capture how the agents achieved their goals in the past. Then, given fresh observations of an agent, we infer their goals by diagnosing deviations between the observations and all the available skill models. We present a framework that serves as an outline for implementing such data-driven goal recognition systems, as well as an instance of the framework implemented using process mining techniques. The evaluations we conducted using our publicly available implementation confirm that the approach is well-defined (all system parameters impact its performance), achieves high accuracy over a wide range of synthetic and real-world domains (comparable with that of more knowledge-demanding state-of-the-art approaches), and operates fast.
Keywords: Goal recognition, data-driven, process mining, autonomous agent, fast and accurate
1. Introduction
Goal Recognition (GR) techniques aim to infer the intentions of an autonomous agent from the observed actions of that agent [1, 2]. Just as humans can observe and understand another agent's intentions, GR techniques aim to mimic, to some extent, this human ability. Three concepts are central to understanding GR: a plan is composed of actions that were or should be taken to achieve a certain goal; an agent, such as a robot or a human, follows such plans to accomplish goals; and a GR system is software that implements a GR technique capable of inferring the goals of agents based on partial knowledge about the plans. When a GR system analyzes actions executed by some agent, it aims to detect the plan that the agent is following and, hence, the goal that will be achieved after completing that plan.
Nowadays, many tasks traditionally assigned to humans are performed by robots or software. From smart houses to self-driving cars, understanding what other agents are trying
∗ Corresponding author
Email address: [email protected] (Zihang Su)
to accomplish is paramount to implementing intelligent behavior and effective human-machine interaction. For example, an intelligent driving system should infer other vehicles' intentions or next actions to ensure safety. A smart house ought to understand whether the household is trying to cook, watch a movie, or sleep in order to provide relevant help. GR techniques play an important role in these and many other areas, including the support of adversarial reasoning [3, 4], trajectory/maneuver prediction [5, 6, 7], and human-computer collaboration [8].
The existing GR techniques can be broadly classified into the following three categories: (i) Plan-library-based GR approaches, in which observations of an agent's actions are "matched" to a plan (the one judged to be carried out by the agent) in a predefined library encoding the standard operational procedures of the domain [1, 2, 9]; (ii) Planning-based GR approaches, which appeal to the principle of rational behavior: an agent is assumed to be taking the "optimal" plan to the goal, so the more rational a behavior appears toward a goal, the more likely that goal is the agent's goal. Ramírez and Geffner [10, 11] have sparked a plethora of approaches not needing any a priori set of plans; these approaches perform GR by exploiting planning systems to automatically generate plans relative to a domain theory; (iii) Learning-based GR approaches, which learn domain models or prediction models from data: for example, from sequences of observed states, as illustrated by Amado et al. [12]; from image sequences used to train an autoencoder that learns the transition function, as in the approach by Amado et al. [13], which learns Q-value functions to infer probabilities toward the candidate goals; or as interpretable dynamics captured by deterministic finite automata, as in the approach by Shvo et al. [14].
The challenge for plan-library-based approaches is in obtaining or hand-coding a possible set
of plans for achieving the candidate goals. In addition, these approaches do not accommodate
uncertainty (i.e., they often cannot generalize to observations that are not pre-stored in the plan library). For planning-based approaches, even though specifying domain models could be less demanding than hand-coding plans and "new" plans can be found, acquiring
domain models is far from trivial due to the difficulty of defining models using standard
declarative languages [15]. It is especially challenging to acquire domain models of real-
world environments, which are subject to continuous changes [16]. As a result, planning-based
approaches are difficult to apply in real-world scenarios. Finally, one of the main challenges of
the learning-based approaches is determining the scope and obtaining a sufficient volume of data
that can be used to train effective models for goal recognition.
In this paper, we present a framework, and its concrete instantiation, for implementing GR
systems that do not require hand-coded plans or domain models for inference. Instead, the
framework proposes to utilize process mining [17] techniques to automatically learn process
models from event logs of historical observations of agents that encode the skills for achieving
various goals in the environment, and to analyze deviations from the observed agent’s behavior
from the learned skill models for inference. A process, or a skill, model in this work is a Petri net
(see Section 3.2.1), with its executions encoding action sequences (plans) executed by an agent
for achieving a certain goal. The event logs are not plan libraries, and the learned process models
stand for execution instances of some underlying “hidden” standard operational procedures of
the domain. Therefore, our approach overcomes the challenges associated with handcrafting the
plan libraries and domain models. Moreover, compared with the learning-based approaches, our
GR approach learns models from sequences of actions, while the other approaches often require
additional information, such as the environment states before and after the executed actions or
the transition functions for every observed action [12, 13, 14]. We assume the agent is acting in
an unknown environment that can be described, for instance, in a STRIPS [18] or PDDL [15]
model that specifies the states and the dynamics of the environment. Learned Petri nets then
aim to describe a subset of goal-relevant action sequences—plans—in the environment. Note
we do not use Petri nets to represent the underlying dynamic domain, and as such states in
the net do not correspond to states in the domain. We argue that our learning-based proposal sits
between plan-library-based approaches, which base reasoning on exemplary plans, and planning-
based approaches, which base reasoning on cost differences between optimal and observed plans.
The performed evaluations demonstrate that our GR system can make inferences fast, while
its accuracy is comparable with that of the state-of-the-art GR techniques. The fast recognition speed makes our GR system well-suited to time-sensitive scenarios, such as a smart home system [19], where the household expects a smooth user experience.
This paper is an extended version of our conference paper [20]. The conference paper made
these contributions:
• It proposed a GR framework that describes the fundamental mechanisms for performing
GR without predefined plan libraries and domain models;
• It discussed an implementation of a concrete GR approach based on process mining
techniques that follows the proposed GR framework and relies on three parameters
to construct a probability distribution over possible goals and to infer the most likely
goal. The three parameters are a “smoothening” constant (ϕ) that flattens the probability
distribution over possible goals, a consecutive mismatch suffix factor (λ) that detects
whether the agent is deviating from a candidate goal, and a discount factor (δ) that gives the more recently observed actions a greater impact on the goal inferences;
• It presented experimental results for two traditional planning domains, namely “grids”
and “blocks-world” from the International Planning Competition (IPC), and three real-
world datasets, namely “daily activities,” “building permit,” and “environment permit” to
demonstrate that our GR approach can overcome the limitations of the classical planning-
based GR systems.
This paper extends the conference paper with the following contributions:
• It extends the original GR approach by introducing a new parameter, called decision
threshold (θ), which controls the selection of the most likely goals;
• It presents results of a sensitivity analysis over 15 IPC domains and ten real-world domains
that confirm that all four parameters (ϕ, λ, δ, and θ) have a significant impact on the
performance of our GR approach;
• It presents a scenario discovery method for identifying parameters that lead to better
performance of our GR approach;
• It confirms, via an evaluation over synthetic datasets generated by diverse planners, that
our GR approach can accurately recognize goals even when trained on non-optimal plans;
• It summarizes the insights of a comprehensive comparison of the performance of our GR
approach with the state-of-the-art techniques, which show that our approach achieves a
comparable performance and is often faster;
• It demonstrates that our GR approach is applicable in real-world scenarios.
The next section presents motivating examples of goal recognition based on historical
observations of agents’ behaviors. Next, Section 3 presents the background in goal recognition
and process mining required for understanding the subsequent discussions. Section 4 is devoted
to presenting the GR framework and our GR system grounded in process mining. Subsequently,
Section 5 presents the results of an evaluation of our implementation of the GR system, followed
by a discussion of related work in Section 6. Finally, we summarize our contributions and
conclude the paper in Section 7.
2. Motivating Examples
Figure 1a shows an 11x11 grid including the initial state in cell I and six goals represented
by cells A to F; this grid is similar to the grid used in [11]. The grid also shows three observed
walks of an agent, comprising a rational walk toward goal A (green), an irrational walk toward
goal A (red), and a rational walk toward goal F (blue). To achieve a goal, the agent can perform horizontal and vertical steps at the cost of 1 and diagonal steps at the cost of √2.
Figure 1: (a) Three walks of an agent in a grid: two from initial cell I to goal cell A (green rational walk and red irrational walk) and one from initial cell I to goal cell F (blue rational walk); (b) rational walk to goal A; (c) irrational walk to goal A; (d) rational walk to goal F. Panels (b)–(d) show the inferred probability distributions over the six goals from Fig. 1a, computed based on the observed behaviors shown in Fig. 2, as functions of the step number.
The green walk has a cost of 5 + 3√2. As this cost is close to the cost of the optimal walk from I to A (i.e., 1 + 5√2), we say that it is rational. The red walk starts by approaching goal F before diverting toward reaching target goal A, resulting in a cost of 5 + 6√2. Hence, the red walk is irrational. Finally, the blue walk toward goal F has a cost of 3 + 4√2, which is close to the cost of the optimal walk from cell I to cell F and is, therefore, rational.
Although the surrounding environment of an agent is often unknown, the observed action
sequences executed by the agent can be used to learn models that explain the behavior for
achieving different goals. For each of the six goals from Fig. 1a, Fig. 2 shows “footprints” of
six observed walks of the agent from cell I to that goal. The thickness of an arrow in the figure
indicates the frequency with which the corresponding step was taken. Using the observations, we
construct process models that describe skills required for accomplishing the goals. Subsequently,
new observations of an agent in the environment can be compared with the acquired models to
identify the discrepancies between the newly observed behavior and the skill models. Intuitively,
the more discrepancies between the observed behavior and a skill model, the less likely the
agent is attempting to achieve the corresponding goal. Using process mining techniques [17]
(see Section 3.2 for details), we discover the skill models from the historical observations and
identify and compare the discrepancies between the observed behavior and the skill models.
We use the identified discrepancies to compute the probability distributions over the goals as
functions of time (i.e., the number of steps) for each observed behavior.
Figure 2: Observed agent behaviors from initial cell I to each of the goals A–F.
Figures 1b to 1d show the probability distributions over the candidate goals computed by our
GR system for the three walks from Fig. 1a. As expected, along the green rational walk, goal A is
consistently the most likely goal; see Fig. 1b. For the first seven steps of the red irrational walk,
however, goals E and F prevail, while goal A is identified as the most likely goal only toward
the end of the walk; see Fig. 1c. Finally, the blue rational walk toward goal F shows an equal
probability toward goals E and F for the first three steps (the same confusion as for the first steps
of the irrational walk), with the probability for goal F prevailing from step four onwards; see
Fig. 1d. Empirical evidence (refer to Section 5) suggests that our GR system can be used to quickly and accurately infer the intended goals of agents across a wide range of domains.
3. Background
This section provides the technical background required for understanding our contribution.
3.1. Goal Recognition
Goal Recognition (GR) is the problem of recognizing an agent’s intention (i.e., its goal)
according to its observed behavior.¹ The problem was first introduced by Schmidt et al. [21] in their psychology-based BELIEVER system and arguably first formalized by Kautz and
Allen [2] as a logic-based (non-monotonic) deductive reasoning task over an action taxonomy
encoding plan decompositions. A common feature of the early and traditional work in GR is the
reliance on pre-defined plan libraries (e.g., a hierarchical tree or network of goals and actions)
that are meant to encode the known “operational plans” of the domain [22]. Those plan libraries
are then parsed and “matched” against the observed sequences of action observations. For
example, in Kautz and Allen’s approach, the plan library is represented as a hierarchical action
graph (e.g., “Make Pasta Dish” and “Make Meat Dish” are the lower-level actions of “Prepare
Meal”) that allows high-level actions to be deduced from the observations of actions at the lower
levels. Avrahami-Zilberbrand and Kaminka [23], in turn, use the so-called Feature Decision
Tree (FDT) as a way of encoding the library of plans, whereas Pynadath and Wellman [24] use grammar representations augmented with probabilities and state information. In many domains, however, crafting those libraries can be costly or simply infeasible. In addition,
GR approaches requiring the specification of plan libraries may not be able to adequately handle
behavior outside the provided plans.
Possibly the first step toward plan-library-“free” GR was Hong’s proposal [25], in which the
so-called Goal Graph is constructed from the specification of a set of primitive actions and then
analyzed against the observed actions to extract consistent goals and valid plans. It was then the
work of Ramírez and Geffner [10, 11] that provided an elegant GR approach by leveraging the representational and algorithmic techniques of AI model-based automated planning. Intuitively,
the authors drew from the insight that a rational agent is expected to be taking the optimal, or
close to optimal, plan to its (hidden) goal, a point also noted elsewhere [26, 27], so the probability
of a plan can be linked to its cost. The main ingredient is that the relevant costs can be computed
by automatically synthesizing adequate plans relative to the observations seen using planning
technology [28]. By doing this, pre-defined plans are abandoned and replaced by – hopefully less
onerous – declarative models of the world dynamics, which are well-studied in the automated
planning and reasoning about action and change communities. The planning-based probabilistic
goal recognition problem is defined as follows.
Definition 1 (Planning-based probabilistic goal recognition). Given a tuple ⟨F, A, I, G, τ⟩, where F is a set of fluents, A is a set of actions, I ⊆ F is the initial state, G ⊆ 2^F is a set of candidate goals (each given as a set of fluents that ought to be true in the corresponding goal state), and τ ∈ A∗ is an observation given as a sequence of actions performed by an agent, the planning-based probabilistic goal recognition problem consists in obtaining a posterior probability distribution over the candidate goals that describes the true goal of the agent.
In their 2010 work, Ramírez and Geffner derive such a probability distribution over the possible goals from Bayes' rule, based on the assumption that the probability of a plan is inversely proportional to its cost. This assumption is encapsulated in the notion of cost difference between
the (cost of the) optimal plan for a goal matching the observed actions and the optimal plan that
could have been reached otherwise, that is, not embedding the observed actions. To compute
those costs, planning systems are used over specific encodings of the domain that also account
1 We refer to the problems of goal recognition, intention recognition, and plan recognition interchangeably.
for the observations. Ultimately, this yields a Boltzmann-like sigmoidal distribution with the
important property that the lower the cost difference, the higher the probability.
Several works have subsequently elaborated Ramírez and Geffner's set-up (which we will refer to as R&G from now on), or grounded it to specific interesting settings, such as navigation. Here, we shall adopt the most recent elaboration by Masters and Sardiña [29, 30], which refined the original set-up to achieve a simpler and computationally less demanding GR approach, and one able to handle irrational agent behavior parsimoniously without counter-intuitive outcomes. Concretely, taking optc(S, τ, G) to denote the optimal cost of reaching goal G from state S ⊆ F by embedding the sequence of observations τ, we first define the cost difference of reaching a goal G from S via observations τ as follows:²
\[
\mathit{costdiff}(S, \tau, G) = \mathit{optc}(S, \tau, G) - \mathit{optc}(S, \epsilon, G).
\]
When the agent is observed to act optimally for G, the cost difference is zero; as the agent
becomes more suboptimal toward G, the cost difference increases. Here, the difference to
the R&G approach is that the “cost difference” is calculated against the optimal cost (without
knowing any observations) to reach the goals, rather than the optimal cost without executing
some observed actions, which is known to be computationally demanding [29, 30].
Using the cost difference, and assuming for simplicity that all goals are initially equally likely,
the probability of a candidate goal G ∈ G given observations τ from initial state I can be obtained
as follows (note the denominator acts here as the normalization factor) [11, 31]:
\[
\Pr(G \mid \tau) = \frac{e^{-\beta \times \mathit{costdiff}(I,\tau,G)}}{\sum_{G' \in \mathcal{G}} e^{-\beta \times \mathit{costdiff}(I,\tau,G')}}, \tag{1}
\]
where β is a parameter that “allows the goal recognition system developers to soften the implicit
assumption of the agent being rational” [11]. This account yields the principle that the more
suboptimal an agent acts for a potential goal, the higher the cost difference, hence the lower the
probability. Nonetheless, we adopt the approach of Masters and Sardiña [31, 32]—in turn, inspired by the cost ratio used by Vered et al. [33]—that lifts the original requirement that agents ought to
be rational (or close to rational) by dynamically modulating the β parameter using a rationality
measure of the observed agent as follows:
\[
\beta = \left( \max_{G \in \mathcal{G}} \frac{\mathit{optc}(I, \epsilon, G)}{\mathit{optc}(I, \tau, G)} \right)^{\eta}. \tag{2}
\]
Intuitively, this β expresses the most optimistic rationality ratio among all goals, with β = 1
when the agent is seen fully rational toward some goal. By using this dynamic parameter, the
more erratic the agent behavior (irrational to all goals), the more Pr(⋅) approaches a uniform
distribution. By doing this, the resulting GR system is capable of self-modulating its confidence
as observations are gathered (η is a positive constant regulating how quickly confidence should
drop when irrational behavior is seen).
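For illustration, the following Python sketch shows how the quantities in Eqs. (1) and (2) combine to produce a posterior over candidate goals; the function name, the input dictionaries of optimal costs, and the example numbers are illustrative assumptions and not part of the approaches cited above.

```python
import math

def goal_probabilities(optc_no_obs, optc_with_obs, eta=1.0):
    """Sketch of Eqs. (1)-(2): posterior over goals from cost differences.

    optc_no_obs[g]   -- optimal cost to goal g from I ignoring observations, optc(I, eps, g)
    optc_with_obs[g] -- optimal cost to goal g from I embedding the observations, optc(I, tau, g)
    eta              -- positive constant regulating how quickly confidence drops
    """
    goals = list(optc_no_obs)
    # Eq. (2): the most optimistic rationality ratio among all goals.
    beta = max(optc_no_obs[g] / optc_with_obs[g] for g in goals) ** eta
    # Cost difference: how suboptimal the observed behavior is for each goal.
    costdiff = {g: optc_with_obs[g] - optc_no_obs[g] for g in goals}
    # Eq. (1): Boltzmann-like distribution, normalized over all candidate goals.
    weights = {g: math.exp(-beta * costdiff[g]) for g in goals}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Illustrative numbers only (not taken from the paper's example).
print(goal_probabilities({"A": 8.07, "F": 8.66}, {"A": 9.24, "F": 12.5}))
```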
In this paper, we adapt and instantiate the above approach to apply notions and techniques
from process mining [17].
2 Under no observations (i.e., τ = ϵ), optc(S, τ, G) reduces to the optimal cost to reach G from state S.
3.2. Process Mining
Process mining studies methods, techniques, and tools to discover, monitor, and improve
processes carried out by organizations using the knowledge accumulated in event logs recorded
by information systems that support the execution of business processes [17].
An event log, or log, is a collection of traces, where each trace consists of a sequence of
timestamped events observed and recorded during the execution of a single case of a business
process. Each event in such a log refers to an action executed by an agent at a particular time and
for a particular case. Let E be a universe of events. Then an event log is defined as follows.
Definition 2 (Trace, Event log). A trace τ is a finite sequence of n events ⟨e1 , . . . , en ⟩, with
ei ∈ E and i ∈ [1.. n]. An event log, or log, L is a finite collection of traces over E.
For example, one can encode each collection of six sequences of actions toward each of the six
goals from Fig. 2 in an event log of six traces. An event in a trace of such an event log encodes
a move in the grid and can be specified as a pair of two cells: the source cell and the target cell.
For example, by (5,0 → 4,1), we denote the event of the agent moving from cell (5, 0) to cell (4, 1) in
the grid. Let LA = {τ1 , . . . , τ6 } be the log that contains six traces, each capturing the moves from
some walk toward goal A shown in Fig. 2a. The six traces in log LA are specified in Fig. 3.
τ1 = ⟨(5,0→4,1), (4,1→3,2), (3,2→2,3), (2,3→1,4), (1,4→0,5), (0,5→0,6)⟩
τ2 = ⟨(5,0→5,1), (5,1→4,2), (4,2→3,3), (3,3→2,4), (2,4→1,5), (1,5→0,6)⟩
τ3 = ⟨(5,0→4,1), (4,1→4,2), (4,2→4,3), (4,3→3,3), (3,3→2,3), (2,3→1,4), (1,4→1,5), (1,5→0,6)⟩
τ4 = ⟨(5,0→4,1), (4,1→3,2), (3,2→2,3), (2,3→1,4), (1,4→0,4), (0,4→0,5), (0,5→0,6)⟩
τ5 = ⟨(5,0→4,1), (4,1→4,2), (4,2→4,3), (4,3→3,3), (3,3→2,3), (2,3→3,2), (3,2→4,1), (4,1→4,2), (4,2→4,3), (4,3→3,3), (3,3→2,3), (2,3→1,4), (1,4→1,5), (1,5→0,6)⟩
τ6 = ⟨(5,0→5,1), (5,1→4,1), (4,1→4,2), (4,2→4,3), (4,3→3,3), (3,3→2,3), (2,3→1,4), (1,4→1,5), (1,5→0,6)⟩
Figure 3: Event log LA representing the walks to goal A shown in Fig. 2a.
In Fig. 3, each trace is specified as a sequence of events, where each event encodes one move given by its source and target cells. Thus, trace τ1 consists of six events. The first five events encode diagonal north-west moves that take the agent from cell (5, 0) to cell (0, 5), and the last event encodes the move from cell (0, 5) to cell (0, 6), thus going north and reaching the goal.
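For illustration, the following Python sketch shows one possible in-memory encoding of such an event log, with each event represented as a (source cell, target cell) pair; this plain representation is an assumption made for exposition and differs from the XES encoding used for the actual logs.

```python
# An event is a (source_cell, target_cell) pair, a trace is a list of events,
# and an event log is a list of traces (cf. Definition 2).

# Trace tau_1 from Fig. 3: five diagonal north-west moves followed by one move north.
tau_1 = [((5, 0), (4, 1)), ((4, 1), (3, 2)), ((3, 2), (2, 3)),
         ((2, 3), (1, 4)), ((1, 4), (0, 5)), ((0, 5), (0, 6))]

log_A = [tau_1]  # the remaining five traces of L_A would be added analogously
```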
otherwise. An execution σ of a net N is either the empty sequence, if no transitions are enabled
in the initial marking, or a sequence of transitions ⟨t1, t2, . . . , tn⟩, ti ∈ T, i ∈ [1..n], such that $M_0 \xrightarrow{t_1} M_1 \xrightarrow{t_2} \cdots \xrightarrow{t_n} M_n$ and there are no enabled transitions in Mn.
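The following minimal Python sketch illustrates the firing rule and executions of a marked net as described above; the data-structure choices are assumptions made for exposition, and the sketch does not check that an execution is maximal (i.e., that no transition remains enabled in the final marking).

```python
from collections import Counter

class PetriNet:
    """Minimal marked Petri net: a transition consumes one token from each input place
    and produces one token in each output place."""

    def __init__(self, transitions, initial_marking):
        # transitions: {name: (input_places, output_places)}
        self.transitions = transitions
        self.marking = Counter(initial_marking)

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, t):
        inputs, outputs = self.transitions[t]
        assert self.enabled(t), f"transition {t} is not enabled"
        self.marking.subtract(Counter(inputs))
        self.marking.update(Counter(outputs))

    def execute(self, sequence):
        """Fire a sequence of transitions; returns the reached marking."""
        for t in sequence:
            self.fire(t)
        return dict(self.marking)

# Toy net (not net N_A from the paper): place I -> transition t -> place p -> transition u -> place end.
net = PetriNet({"t": (["I"], ["p"]), "u": (["p"], ["end"])}, {"I": 1})
print(net.execute(["t", "u"]))  # {'I': 0, 'p': 0, 'end': 1}
```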
Figure 4 shows the marked net discovered from log LA shown in Fig. 3 using the Split
miner discovery technique [36]. Note that in the discovered net, we label transitions with
events to refer to the actions they represent, i.e., it holds that E ⊂ L. In the figure, the initial
marking is denoted by the black dot in place I, specifying that in the initial marking, place I
is associated with the number one (one black dot in the place), whereas every other place of
the net is associated with the number zero (no black dots). The executions of the net describe
(and generalize) the walks from the initial cell I toward goal A in the grid shown in Fig. 2a. In
particular, the net generalizes the repetitive fragment in trace τ5 , via transitions t5 , t6 , t7 , t8 , t10 ,
and t11 . The transitions in the net encode steps in the grid. For example, transitions t1 and
t5, despite both capturing a step to the north, describe two different steps in the grid, namely, (5,0 → 5,1) and (4,1 → 4,2), respectively; the cell references are not shown in the figure. Hence, execution
⟨t3 , t4 , t5 , t6 , t7 , t8 , t10 , t11 , t5 , t6 , t7 , t8 , t9 , t14 , t19 , t23 ⟩ of the net describes trace τ5 in the log from
Fig. 3. Note that transitions t4 and t9 are silent, i.e., they are assigned silent labels that do not
convey the domain semantics, shown as black rectangles in the figure.
Figure 4: Net NA discovered from the event log in Fig. 3 capturing the walks to goal A shown in Fig. 2a.
A move (x, y) is a legal move if it is either a move on log, a move on model, or a synchronous
move; otherwise move (x, y) is an illegal move. We also refer to moves on log and model as
asynchronous moves. By MLM , we denote the set of all legal moves.
Let δ ∶ MLM → N0 be a function that assigns non-negative costs to moves. The cost of an
alignment γ is denoted by δ(γ) and is equal to the sum of the costs of all its moves. As
synchronous moves specify agreement between the trace and the execution, we use cost functions
that assign zero costs to synchronous moves. In addition, as moves on model for silent transitions,
e.g., transitions t4 and t9 in Fig. 4, do not demonstrate a disagreement with the trace, they are
also assigned zero costs. Indeed, a silent transition does not represent a step of an agent and is
present in a net for technical reasons only, i.e., to support the encoding of the desired executions.
Asynchronous moves capture the disagreement between the trace and the execution. Thus,
we use cost functions that assign positive costs to asynchronous moves. Finally, an optimal
alignment of a trace and a marked net is an alignment of the trace and some execution of
the marked net that yields the lowest, among all possible alignments between the trace and
executions of the marked net, cost. Intuitively, an optimal alignment characterizes minimal
discrepancies between the trace and the net. A trace and a net are said to agree perfectly if
there exists an optimal alignment of zero cost between the trace and some execution of the net.
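As an illustration of alignment costs, the following Python sketch computes the cost of aligning a trace against a single, fixed execution of a model using edit-distance-style dynamic programming; the paper relies on established conformance-checking techniques that search over all executions of the marked net [38], so this sketch is only a simplified stand-in under that assumption.

```python
def alignment_cost(trace, execution, silent=frozenset(), move_cost=lambda a: 1.0):
    """Edit-distance-style cost of aligning a trace against ONE model execution.

    Synchronous moves (matching labels) cost 0, moves on model for silent
    transitions cost 0, and all other asynchronous moves cost move_cost(action).
    """
    n, m = len(trace), len(execution)
    # d[i][j] = cheapest alignment of the first i trace events and first j execution steps
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + move_cost(trace[i - 1])            # move on log
    for j in range(1, m + 1):
        skip = 0.0 if execution[j - 1] in silent else move_cost(execution[j - 1])
        d[0][j] = d[0][j - 1] + skip                               # move on model
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            log_move = d[i - 1][j] + move_cost(trace[i - 1])
            skip = 0.0 if execution[j - 1] in silent else move_cost(execution[j - 1])
            model_move = d[i][j - 1] + skip
            sync = d[i - 1][j - 1] if trace[i - 1] == execution[j - 1] else float("inf")
            d[i][j] = min(log_move, model_move, sync)
    return d[n][m]

# A trace and an execution that agree perfectly yield an optimal alignment of zero cost.
print(alignment_cost(["NW", "N"], ["NW", "tau_skip", "N"], silent={"tau_skip"}))  # 0.0
```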
Let us again consider the three walks of the agent from Fig. 1a. Alignments can be used to
identify the discrepancies between the observed behavior of the agent in each of these walks and
the models that represent the historical behavior toward the goals. Recall that net NA in Fig. 4
models the observed behavior of the agent toward goal A from Fig. 2a. In addition, Fig. 5 depicts
net NF modeling the observed behavior of the agent toward goal F from Fig. 2f.
Figure 5: Net NF discovered from the walks to goal F shown in Fig. 2f.
3 The logs that describe the walks toward the six goals shown in Fig. 2 and the nets discovered from these logs,
captured using the XES standard (https://xes-standard.org/) and PNML standard (http://www.pnml.org/),
respectively, can be accessed here: https://doi.org/10.26188/21749570.
Table 1: Optimal alignment γ1 between the rational walk τ′ toward goal A in Fig. 1a and NA in Fig. 4. The trace τ′ = ⟨(5,0→4,1), (4,1→4,2), (4,2→4,3), (4,3→3,3), (3,3→2,3), (2,3→1,4), (1,4→1,5), (1,5→A)⟩ is aligned to the execution ⟨t3, t4, t5, t6, t7, t8, t9, t14, t19, t23⟩ of NA; every trace event forms a synchronous move, and the silent transitions t4 and t9 form moves on model.
a move on model for transition t17 of the net. The last two steps in γ2 are moves on log. As a result of these three asynchronous moves, two straight and one diagonal, it holds that δ(γ2) = 2 + √2.
Note that we use the costs of straight and diagonal steps discussed in Section 2 as costs of the
corresponding asynchronous alignment moves.
Table 2: Optimal alignment γ2 between the rational walk τ′′ toward goal F in Fig. 1a and NF in Fig. 5. The trace τ′′ = ⟨(5,0→5,1), (5,1→6,2), (6,2→7,3), (7,3→8,4), (8,4→9,5), (9,5→10,5), (10,5→F)⟩ is aligned to the execution ⟨t1, t2, t5, t8, t12, t17⟩ of NF; the first five trace events form synchronous moves, transition t17 forms a move on model, and the last two trace events form moves on log.
Table 3 shows optimal alignment γ3 of the irrational walk toward goal A in Fig. 1a and net NA from Fig. 4. The first five moves in γ3 are moves on model, while the subsequent seven moves are moves on log. The final five moves demonstrate agreement between the walk and model. Considering the cost function δ used to obtain the costs of alignments γ1 and γ2 above, it holds that δ(γ3) = 6 + 5√2; again, the silent moves on model for transitions t4 and t9 are not penalized.
Table 3: Optimal alignment γ3 between the irrational walk τ′′′ toward goal A in Fig. 1a and NA in Fig. 4. The trace τ′′′ = ⟨(5,0→5,1), (5,1→6,2), (6,2→7,3), (7,3→6,4), (6,4→5,4), (5,4→4,4), (4,4→3,3), (3,3→2,3), (2,3→1,4), (1,4→1,5), (1,5→A)⟩ is aligned to the execution ⟨t3, t4, t5, t6, t7, t8, t9, t14, t19, t23⟩ of NA; the first five moves are moves on model, the next seven are moves on log, and the final five moves agree with the model.
Finally, Table 4 shows optimal alignment γ4 of the irrational walk toward goal A in Fig. 1a and net NF from Fig. 5. Using the cost function δ, it holds that δ(γ4) = 4 + 7√2. Indeed, the first three moves are synchronous. However, among the subsequent eleven asynchronous moves, four represent straight steps and seven encode diagonal steps.
Table 4: Optimal alignment γ4 between the irrational walk τ′′′ toward goal A in Fig. 1a and NF in Fig. 5. The trace τ′′′ (as in Table 3) is aligned to the execution ⟨t1, t2, t5, t8, t12, t17⟩ of NF; the first three trace events form synchronous moves, the remaining three execution steps form moves on model, and the remaining eight trace events form moves on log.
4. Data-Driven Goal Recognition
A solution to a data-driven GR problem uses data from historical solutions to the problem.
In this paper, the data is a collection of previously observed sequences of actions that resulted in
achieving the candidate goals. Concretely, we define the data-driven GR problem as follows.
Definition 6 (Data-driven probabilistic goal recognition). Given a tuple ⟨G, A, D, τ⟩, where G is a set of candidate goals, A is a set of actions, D ⊆ A∗ × G is a set of pairs, each relating a historical sequence of actions to the goal achieved by executing this sequence of actions, and τ ∈ A∗ is an observation given as a sequence of actions performed by an agent, the data-driven probabilistic goal recognition problem consists in obtaining a posterior probability distribution over the candidate goals that describes the true goal of the agent.
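For illustration, a data-driven GR problem instance can be represented as a simple record; the Python sketch below is an assumed encoding of the tuple from Definition 6, with illustrative type names that are not part of our implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Action = str
Goal = str

@dataclass
class DataDrivenGRProblem:
    """Sketch of the tuple <G, A, D, tau> from Definition 6 (field names are illustrative)."""
    goals: List[Goal]                         # candidate goals G
    actions: List[Action]                     # action alphabet A
    history: List[Tuple[List[Action], Goal]]  # D: historical action sequences and the goals they achieved
    observation: List[Action]                 # tau: observed (possibly partial) action sequence

# A solution maps each candidate goal to a posterior probability.
Posterior = Dict[Goal, float]
```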
Section 4.1 presents our GR framework inspired by the principles from observational learning [41].
The framework can be seen as a collection of abstract components that, when instantiated, result
in a concrete GR system. Then, in Section 4.2, we discuss an instantiation of the framework
using process mining techniques to obtain a GR system for solving data-driven GR problems.
1. The attention stage is responsible for determining whether an observed stimulus, e.g., a
performed action, is relevant for learning a certain skill and capturing the relevant stimuli. This
stage also regulates the selection of the relevant observed stimuli for a certain learning purpose
and ignores irrelevant or noisy stimuli. In the top-left part of Fig. 6, the framework proposes
to capture relevant actions s and w executed by agents A and B, respectively. Meanwhile,
the framework also recognizes that agent A has achieved goal α. Hence, a GR system
that instantiates the framework must “know” the possible goals agents may achieve in the
environment and the conditions when these goals are fulfilled.
2. The retention stage (the top-right part of Fig. 6) is responsible for incorporating the observed
stimuli into skill models, where a skill model describes how agents achieved one particular
goal in the past; for example, the α-skill model records the historical traces to goal α. The
framework receives information about the observed actions s and w and appends them to the
corresponding currently constructed traces. Once the “Goal completion recognized” signal
triggered by action α is captured, the corresponding trace of agent A is added to the skill
library, a collection of recently observed traces that lead to the same goal. For instance, in the
figure, the “Retain skill trace” activity adds trace α5 = ⟨e, e, e, n, e, w, s⟩ to the α-skill library
(α5 is one of the traces to goal α). We use process discovery techniques [17, 42] from process
mining to update old skill models based on the retained traces. Thus, a skill model aggregates
and generalizes the observed behaviors for achieving the corresponding goal.
3. Based on the observed stimuli captured in the attention stage and stored in the retention
stage, the motivation stage of the framework is responsible for triggering the subsequent goal
recognition episodes. In the bottom-left part of Fig. 6, as a response to the “Action retained”
signal triggered by action w, goal recognition is initiated by triggering the “Goal recognition
initiated” signal. The implementation of the decision logic for triggering the recognition is
outsourced to concrete instantiations of the framework.
4. The recognition stage is responsible for inferring the goals of the currently observed agents
based on the retained skill models. This stage constructs the observed trace fragment, as
shown in the bottom-right part of Fig. 6. For example, fragment ⟨s, e, w⟩ is performed by agent
B and launches the “Check conformance” activity. The latter analyzes the commonalities and
discrepancies between the trace fragment and all the available skill models to compute the
distribution over the possible goals the agent may be striving to achieve. We use conformance
checking techniques [38, 17, 43] from process mining to compute the commonalities and
discrepancies between trace fragments and skill models. Finally, based on the performed
analysis, the framework decides the goal of the agent. For instance, Fig. 6 suggests that β is
the goal that agent B currently aims to achieve since ⟨s, e, w⟩ matches the β-skill model (the
model for achieving goal β) better than the α-skill model (for goal α).
Apart from the four stages mentioned above, a feedback mechanism for learning based on the
recognition mistakes can be introduced. The need for such a feedback mechanism is motivated
by the argument that observational learning could emerge from reinforcement learning [44].
\[
\Pr(G \mid \tau) = \frac{e^{-\beta \times \omega(\tau, \alpha_G)}}{\sum_{G' \in \mathcal{G}} e^{-\beta \times \omega(\tau, \alpha_{G'})}}. \tag{3}
\]
Here, ω(τ, αG ) ≥ 0, while the “temperature” β controls the level of confidence for GR, which can
also be interpreted as the trust over the learned models. We define parameter β as follows:
\[
\beta = \frac{1}{1 + \min_{G \in \mathcal{G}} \omega(\tau, \alpha_G)}. \tag{4}
\]
Equation (4) follows and simplifies Eq. (2) to account for the fact that the best case scenario is
an alignment weight of zero, which implies that Eq. (4) inherits the confidence-based properties
described in [31]. As the minimum (among all goals) alignment weight ω increases, the observed
agent is arguably more “irrational”, β approaches zero and the GR probability distribution more
closely resembles a uniform one. Finally, the alignment weight between an observation trace
τ = ⟨e1 , . . . , en ⟩ and a skill model αG captured as a marked net is defined as follows:
\[
\omega(\tau, \alpha_G) = \phi + \lambda^{m} \times \sum_{i=1}^{n} \left( i^{\delta} \times c(\tau, \alpha_G, i) \right), \text{ where:} \tag{5}
\]
• c(τ, αG, i) is the cost of move for trace event ei in an optimal alignment⁴ between trace τ and model αG;
• i^δ, with δ ≥ 0, is a discount factor that emphasizes that the more recent disagreements between the trace and the model impact the alignment weight more;
• ϕ ≥ 0 is the "smoothening" constant that flattens the likelihoods of the goals in the case of (close-to-)perfect alignments for all (most of) the skill models; and
• λ^m penalizes the suffix of the trace that deviates from the skill model, where λ ≥ 1 is a constant and m is the number of consecutive asynchronous moves on trace at the end of the optimal alignment between trace τ and model αG.
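The following Python sketch illustrates Eqs. (3)–(5) given precomputed per-move costs and the suffix mismatch count m, using the default parameter settings of the examples that follow (ϕ = 50, λ = 1.1, δ = 1); the function names and the input format are illustrative assumptions, not the interface of our implementation.

```python
import math

def alignment_weight(costs, m, phi=50.0, lam=1.1, delta=1.0):
    """Eq. (5): omega = phi + lam**m * sum_i (i**delta * c_i).

    costs -- c(tau, alpha_G, i) for each trace position i = 1..n
    m     -- number of consecutive asynchronous moves on trace at the end of the alignment
    """
    return phi + lam ** m * sum((i ** delta) * c for i, c in enumerate(costs, start=1))

def goal_posterior(weights):
    """Eqs. (3)-(4): posterior over goals from alignment weights {goal: omega}."""
    beta = 1.0 / (1.0 + min(weights.values()))                       # Eq. (4)
    expw = {g: math.exp(-beta * w) for g, w in weights.items()}
    total = sum(expw.values())
    return {g: v / total for g, v in expw.items()}                   # Eq. (3)

# The gamma_3 and gamma_4 alignments discussed below (irrational walk tau''' against alpha_A and alpha_F):
w_A = alignment_weight([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0], m=0)       # 78.0
w_F = alignment_weight([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], m=8)       # ~178.62
print(w_A, round(w_F, 2), goal_posterior({"A": w_A, "F": w_F}))
```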
We use the example optimal alignments from Section 3.2.2 to demonstrate the computation of
alignment weights. Differently from the cost functions used in conformance checking, which
penalize moves on both model and log, for GR purposes, we assign the cost of move ei , i.e.,
c(τ, αG , i), to be equal to one if it is an asynchronous move on log. All other moves are assigned
a cost of zero. This costing scheme avoids penalizing partially observed traces since an optimal
alignment of a partial trace tends to contain asynchronous moves on model. These moves on
model, for example, describe how the trace can unfold in the future, not the discrepancies
between the model and trace. A cost of one for asynchronous moves on trace is used as, in
general, we assume no knowledge about the GR environment and the problem domain. Similarly,
in all the evaluations of the approach we performed, when constructing alignments, we penalized
all asynchronous moves, both moves on model and moves on trace, with a cost of one, and all the
other moves were given no cost. This is different from our example alignments γ1–γ4 that were constructed using the cost of √2 for diagonal asynchronous moves. Note that picking different
optimal alignments can affect the computation result of alignment weight, affecting the goal
inference. Furthermore, in the example calculations, we use these default parameter settings:
ϕ = 50, λ = 1.1, and δ = 1.
In alignment γ1 from Table 1, all events in the trace are matched by model αA , that is, net NA
from Fig. 4. Hence, each move has zero cost, i.e., c(τ′ , αA , i) = 0, for any position i in the trace.
The number of consecutive asynchronous moves on trace in the suffix of the alignment is zero (λ^m = 1.1^0 = 1). Therefore, the alignment weight of γ1 is 50; see the calculation below, in which every trace position i = 1, . . . , 8 contributes a cost of zero.
\[
\omega(\tau', \alpha_A) = \phi + \lambda^{m} \times \sum_{i=1}^{n} \left( i^{\delta} \times c(\tau', \alpha_A, i) \right) = 50 + 1.1^{0} \times 0 = 50
\]
In alignment γ2 from Table 2, the first five events in the trace are matched by model αF ,
that is, net NF from Fig. 5. However, the sixth and the seventh events result in asynchronous
4 As there can exist multiple optimal alignments between a trace and a model, in this work, we rely on a
procedure proposed by the authors of the original alignment technique that chooses one such optimal alignment
deterministically [38, 47].
moves on log. Hence, each of the two corresponding moves in the alignment incurs the cost of one. As the alignment closes with two consecutive asynchronous moves on log, it holds that λ^m = 1.1^2. Consequently, the alignment weight of γ2 is 65.73; see the detailed calculation below. In this and the subsequent examples, only the asynchronous moves on log are penalized when calculating the alignment weight; here, these are the moves at trace positions six and seven, contributing 6^δ × 1 + 7^δ × 1 = 13 with δ = 1.
\[
\omega(\tau'', \alpha_F) = \phi + \lambda^{m} \times \sum_{i=1}^{n} \left( i^{\delta} \times c(\tau'', \alpha_F, i) \right) = 50 + 1.1^{2} \times 13 = 65.73
\]
In γ3 from Table 3, the first seven events in the trace are not matched by model αA . Thus, the
costs of the corresponding seven asynchronous moves in the alignment are equal to one. The last
four moves are synchronous. As there are no asynchronous moves at the end of the alignment, it holds that λ^m = 1.1^0 = 1. Therefore, the alignment weight of γ3 is 78, as shown below; the penalized moves on log occupy trace positions one to seven, contributing 1 + 2 + · · · + 7 = 28.
\[
\omega(\tau''', \alpha_A) = \phi + \lambda^{m} \times \sum_{i=1}^{n} \left( i^{\delta} \times c(\tau''', \alpha_A, i) \right) = 50 + 1.1^{0} \times 28 = 78
\]
Finally, in alignment γ4 from Table 4, the first three events in the trace are matched by the model, while the remaining eight events are not; thus, it holds that λ^m = 1.1^8. The weight of γ4, consequently, amounts to 178.62, see below; the penalized moves on log occupy trace positions four to eleven, contributing 4 + 5 + · · · + 11 = 60.
\[
\omega(\tau''', \alpha_F) = \phi + \lambda^{m} \times \sum_{i=1}^{n} \left( i^{\delta} \times c(\tau''', \alpha_F, i) \right) = 50 + 1.1^{8} \times 60 \approx 178.62
\]
Our GR system uses Eq. (3) and weights of the alignments between the observed trace and all
the skill models to construct a probability distribution over all the available goals. Subsequently,
the GR system infers the possible goal(s) of the agent based on the constructed probability
distribution and a given selection threshold θ. First, the candidate goal with the highest computed
probability (Pr+ ) is included in the resulting set of goals. Next, every goal with a computed
probability greater than θ×Pr+ is added to the resulting set. For instance, if it holds that Pr+ = 0.5
and θ = 0.8, then if any of the candidate goals has the probability of at least 0.8 × 0.5 = 0.4, it is
included in the resulting set. In our implementation of the GR system, we set the default value
of the selection threshold θ to be equal to 0.8.
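As a sketch of this selection rule, the following Python function returns every candidate goal whose probability is at least θ times the highest probability; the function name and the example distribution are illustrative only.

```python
def select_goals(posterior, theta=0.8):
    """Return all goals whose probability is at least theta times the highest probability."""
    p_max = max(posterior.values())
    return {g for g, p in posterior.items() if p >= theta * p_max}

# With Pr+ = 0.5 and theta = 0.8, every goal with probability of at least 0.4 is selected.
print(select_goals({"g1": 0.5, "g2": 0.42, "g3": 0.08}, theta=0.8))  # {'g1', 'g2'}
```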
Putting everything together, Eq. (3) provides a novel middle ground between traditional
plan-library-based goal recognition and the more recent approaches from the planning literature
grounded in cost differences between plans. Indeed, observations are matched to a sort of library
of plans, implicitly represented and aggregated in a collection of skill models that generalize the
original plans they are discovered from by re-interpreting plan cost as the level of misalignment
between the observations and the skill models. Our GR system can be, in principle, used for plan
recognition. Instead of inferring the goals, it could return skill models and remove the branches
that are not optimally aligned with the observed action sequence, from which plan(s) can be
inferred. Learning skill models from spurious (missing and noisy) observations is also possible
but may lead to two issues: (i) An action should be included in the model but is missing. This
phenomenon will lead to an asynchronous move on model, which may increase the alignment
weight and decrease the recognition accuracy. (ii) An action should be excluded from the model
but is included nonetheless. This phenomenon will cause an asynchronous move on log which
we do not penalize, hence no impact on the accuracy.
Our GR system has four parameters that impact the inferences it produces, namely δ, λ, ϕ,
and θ. In the next section, we demonstrate that each of these parameters impacts the performance
of the system and suggest an approach for configuring them to maximize the GR performance.
5. Evaluation
In this section, we present our experimental setup (Section 5.1) and the results of the
conducted experiments that allow us to answer the following five research questions:
RQ1: Do all the parameters of our GR system impact its performance (Section 5.2)?
RQ2: How to configure our GR system for better performance (Section 5.3)?
RQ3: Does the choice of a particular collection of traces for learning skill models impact goal
recognition time and accuracy (Section 5.4)?
RQ4: How does the performance of our GR system compare with the performance of other
state-of-the-art GR systems (Section 5.5)?
RQ5: Is our GR system applicable in real-world scenarios (Section 5.6)?
observation). An observation can be full (when 100% of actions in the sequence were observed)
or partial (for example, when only 10%, 30%, 50%, or 70% of all actions were observed).
Our approach requires historical observations for learning skill models. However, the
mentioned domains only provide the rules of how agents can act in the environment. Therefore,
we used planners to generate plans (traces) that resemble historical observations toward the
candidate goals. Our GR system does not consider the frequencies of observed traces when
it learns the skill models, so every generated trace in the dataset is unique.
To generate the traces, two planners were used: the top-k planner [50] and the diverse
planner [51]. The top-k planner generates a set of k different cost-optimal traces toward a given
goal. Such cost-optimal traces simulate possible rational behaviors of an agent toward the goal.
On the other hand, the diverse planner generates a set of divergent traces such that each trace is
significantly different from others according to a stability diversity metric [52, 53]. The stability
between two plans is the number of actions that co-occur in both plans over the total number of actions; see Eq. (6). The diverse planner generates a set of plans such that the stability between every two plans is at most a specified threshold. Note that A(π) and A(π′)
in Eq. (6) represent the sets of actions in plan π and plan π′ , respectively.
\[
\mathit{stability}(\pi, \pi') = \frac{|A(\pi) \cap A(\pi')|}{|A(\pi) \cup A(\pi')|} \tag{6}
\]
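For illustration, the stability of Eq. (6) can be computed as follows; the representation of a plan as a list of action names is an assumption made for exposition.

```python
def stability(plan_a, plan_b):
    """Eq. (6): ratio of co-occurring actions to all actions across two plans."""
    actions_a, actions_b = set(plan_a), set(plan_b)
    return len(actions_a & actions_b) / len(actions_a | actions_b)

print(stability(["pick", "move", "drop"], ["pick", "fly", "drop"]))  # 2 / 4 = 0.5
```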
A planner does not guarantee that it can generate an arbitrary number of distinct traces toward
a given goal. Therefore, if a planner failed to generate the requested number of traces for a
given problem, we removed that problem from the dataset. We attempted to generate 100 traces
toward each candidate goal in each problem instance in all 15 synthetic domains using the top-
k planner and the diverse planner. If a planner did not generate 100 traces toward a candidate
goal within one hour, we marked the GR problem instance containing that candidate goal as
failed. The diverse planner failed to generate 100 traces for all the problem instances from the
Campus and Kitchen domains; these were excluded from the analysis accordingly. For the same
reason, we excluded several problem instances in the remaining domains, for both planners. For
each planner, the summary of the excluded instances is provided in Appendix A. The excluded
instances were not used in our experiments.
• The BPIC 2011 event log includes records of patients’ medical treatments from a Dutch
Hospital (DOI: 10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54). Four GR prob-
lems were formulated using the LTL classifiers to capture different treatment outcomes.
• The five BPIC 2015 event logs record the application processes to acquire building permits
in five Dutch municipalities (DOI: 10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1),
one municipality per log. The traces in each event log were classified according to whether
the action “send confirmation receipt” was executed before the action “retrieve missing
data”.
• The BPIC 2017 event log records the loan application process of a Dutch financial institute
(DOI: 10.4121/12705737.v2). This event log was partitioned into six clusters that formed
three GR problems of identifying whether an application is accepted, rejected, or canceled.
Each problem consists of two subsets of traces: one with the corresponding outcome
(“accepted”, “rejected”, or “canceled”) and the other one with all the remaining traces
(“not accepted”, “not rejected”, or “not canceled”, respectively).
• The Hospital Billing event log records the execution of billing procedures for medical
services (DOI: 10.4121/uuid:76c46b83-c930-4798-a1c9-4be94dfeb741). Two GR prob-
lems were formulated based on this log: to recognize whether the billing package was
eventually closed and to recognize whether the case was reopened.
• The Production event log records activities for producing items in a manufacturing
scenario (DOI: 10.4121/uuid:68726926-5ac5-4fab-b873-ee76ea412399). The traces were
classified into two sub-logs according to whether the number of rejected orders was zero
or not.
• The Sepsis Cases event log from a Dutch hospital records laboratory tests of patients who
have sepsis conditions (DOI: 10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460).
Three GR problems were formulated using the LTL classifiers: whether a patient will
return to the emergency room within 28 days of discharge, whether a patient is admitted
for intensive care services, and whether a patient is discharged due to a reason other than
Release.
• The Traffic Fines event log records events related to fines from an Italian local police
force (DOI: 10.4121/uuid:270fd440-1057-4fb9-89a9-b699b47990f5). The traces were
classified into two groups according to whether the fine was fully repaid or not.
The resulting GR problems are binary choices between two candidate goals. The traces
for training skill models and the observed traces for testing GR performance are provided by
Teinemaa et al. [54]. The statistics of these binary-choice datasets are shown in Table 5. We also
tested our GR system on multi-class problems that have more than two candidate goals. To this
end, we used three real-world datasets that were also used in [20], namely Activities of Daily
Living, Building Permit Applications, and Environmental Permit Applications. The statistics of
these multi-class datasets are shown in Table 6. These datasets were divided into 80% and 60%
of traces for training the skill models and the remaining 20% and 40% of traces, respectively, for
testing GR performance. These three domains are summarized below:
• The Activities of Daily Living event logs record activities executed by four individ-
uals during weekdays and weekends separately (DOI: 10.4121/uuid:01eaba9f-d3ed-4e04-
9945-b8b302764176). Therefore, eight event logs were used to formulate the GR problem
of identifying who and on which day (weekday or weekend) executed a given action
sequence.
• The Building Permit Applications dataset is the same as the BPIC 2015 dataset mentioned above. It contains event logs that record the processes of building permit applications handled by five
Domain             Sub-set   # Train traces   # Obs traces   # Candidates   Avg Len   Min Len   Max Len   Std Dev
BPIC 2011          1         516              58             2              56.66     1         824       113.50
BPIC 2011          2         1025             115            2              131.34    1         1814      202.60
BPIC 2011          3         1008             113            2              62.93     1         1368      134.44
BPIC 2011          4         1025             115            2              81.64     1         1432      142.54
BPIC 2015          1         626              70             2              41.34     2         101       17.22
BPIC 2015          2         677              76             2              54.71     1         132       19.04
BPIC 2015          3         1194             134            2              43.29     3         124       15.35
BPIC 2015          4         518              59             2              42.00     1         82        14.52
BPIC 2015          5         945              106            2              51.91     5         134       15.11
BPIC 2017          1         28270            3143           2              38.15     10        180       16.70
BPIC 2017          2         28270            3143           2              38.15     10        180       16.70
BPIC 2017          3         28271            3142           2              38.15     10        180       16.70
Hospital Billing   1         69772            7753           2              5.53      2         217       2.32
Hospital Billing   2         69771            7754           2              5.27      2         217       1.97
Production         -         197              23             2              11.31     1         78        10.13
Sepsis Cases       1         702              80             2              16.78     5         185       12.10
Sepsis Cases       2         703              79             2              13.97     4         60        5.05
Sepsis Cases       3         702              80             2              15.94     4         185       12.18
Traffic Fines      -         116652           12963          2              3.55      2         20        1.37
Table 5: Statistics of the real-world datasets for binary choice problems. Train traces: the traces for training skill models; Obs traces: the
observed traces for testing GR performance.
Dutch municipalities. Instead of classifying the traces in each event log into two sub-
groups, we formulated GR problems to identify which municipality handled which action
sequence.
• The five Environmental Permit Applications event logs record the environmental permit
application processes in five Dutch municipalities (DOI: 10.4121/uuid:26aba40d-8b2d-
435b-b5af-6d4bfbd7a270). As for the Building Permit Applications domain, the GR prob-
lems were formulated to recognize which municipality processed which environmental
application.
Domain # Traces # Candidates Avg Len Min Len Max Len Std Dev
Activities of Daily Living 148 8 75.26 20 134 23.49
Building Permit Applications 1000 5 45.61 1 154 19.67
Environmental Permit Applications 1000 5 43.89 2 108 17.39
5.1.3. Implementation
Our GR system was implemented and is available as part of an open-source simulation tool6
capable of automatically solving batches of GR problem instances formulated in a given domain.
When solving a single problem instance, our tool takes a set of parameters and an observed
trace as input. As a result, it returns a list of inferred goals, their likelihoods, and the time
spent solving the instance. All the problem instances were run on a single core of an Intel Xeon
Processor (Skylake, IBRS) @ 2.0GHz. Note that solving a single problem instance requires less
than 4GB of RAM.
6 https://github.com/zihangs/GR_system
5.1.4. Quality Measures
We used precision, recall, and accuracy to evaluate the performance of GR systems. Four
terms are used to compute these measures. True Positive (TP) is the number of correct goals
inferred by a GR system. True Negative (TN) is the number of incorrect goals that were not
inferred. False Positive (FP) is the number of incorrect goals inferred by a GR system. Finally,
False Negative (FN) is the number of correct goals that were not inferred. Given the above terms,
precision is the fraction of the correctly inferred goals among all the inferred goals.
\[
f_{\mathit{precision}} = \frac{TP}{TP + FP} \tag{7}
\]
Recall is defined as the fraction of the correctly inferred goals among all the true goals.
\[
f_{\mathit{recall}} = \frac{TP}{TP + FN} \tag{8}
\]
Finally, accuracy is the ratio of the correct positive and negative inferences to the total positive
and negative inferences.
\[
f_{\mathit{accuracy}} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}
\]
Suppose our GR system observes an agent that executes a sequence of actions. The agent works
toward a true hidden goal (g1) among ten candidate goals (g1 to g10). For example, according
to the observed action sequence, our GR system infers g1 and g2 as two possible goals the agent
is trying to achieve. Thus, g1 and g2 are two positive goals, and the other candidate goals are
negative goals. In this scenario, for the two positive goals, g1 is the true hidden goal correctly
inferred by the GR system. Hence, TP is equal to one. Goal g2 is not the true hidden goal (it is
falsely inferred by the GR system). Thus, FP is also equal to one. For the eight negative goals
(g3 to g10), none of them is the true hidden goal. As such, TN equals eight because our GR
system made the correct decision not to infer these goals. Finally, FN is equal to zero because
the true hidden goal is not missed (our GR system correctly recognized it). Hence, in this
example scenario, precision, recall, and accuracy are equal to 0.5, 1.0, and
0.9, respectively. Note that in our experiments TP, FN ∈ {0, 1}, as there is only one true goal per
instance, while TN, FP ∈ {0, . . . , ∣G∣ − 1}, where G stands for the set of candidate goals.
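For concreteness, the computation in this example can be reproduced with a few lines of code. The following Python sketch simply follows the definitions of TP, TN, FP, and FN given above; the function name and the goal identifiers are ours and not part of the released implementation, and the sketch assumes at least one goal is inferred.

    def gr_metrics(inferred, true_goal, candidates):
        """Precision, recall, and accuracy of a single GR inference, assuming
        exactly one true hidden goal per problem instance."""
        tp = 1 if true_goal in inferred else 0      # correct goals that were inferred
        fp = len(inferred) - tp                     # incorrect goals that were inferred
        fn = 1 - tp                                 # correct goals that were not inferred
        tn = len(candidates) - len(inferred) - fn   # incorrect goals that were not inferred
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        return precision, recall, accuracy

    candidates = [f"g{i}" for i in range(1, 11)]       # ten candidate goals g1..g10
    print(gr_metrics({"g1", "g2"}, "g1", candidates))  # (0.5, 1.0, 0.9), as in the example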
Figure 7: Sobol sensitivity analysis indices for the first-order effects and the total effects for the blocks-world domain. The GR system was trained by (a) the cost-optimal traces and (b) the divergent traces.
The Sobol sensitivity analysis results for all the other synthetic domains are listed in
Appendix B. For the Sokoban domain with the divergent traces, the lower bound of the ST
confidence interval for parameter ϕ is less than 0.05 (non-significant). Except for this special
case, all other lower bounds are greater than 0.05, indicating that all four parameters significantly
impact the performance.
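For readers who wish to reproduce this kind of analysis, the sketch below shows how first-order (S1) and total-order (ST) Sobol indices, including the confidence intervals referred to above, can be estimated with the SALib toolkit. This is only an illustration: SALib is one possible toolkit and not necessarily the one used in our experiments, the parameter bounds are placeholders, and run_gr_accuracy stands in for actually running the GR system over a batch of problem instances.

    import numpy as np
    from SALib.sample import saltelli
    from SALib.analyze import sobol

    # The four GR-system parameters; the bounds below are illustrative placeholders.
    problem = {
        "num_vars": 4,
        "names": ["phi", "lambda", "delta", "theta"],
        "bounds": [[1, 100], [1.0, 2.0], [0.5, 2.0], [0.0, 1.0]],
    }

    def run_gr_accuracy(sample):
        # Stand-in for configuring the GR system with one parameter sample and
        # measuring its accuracy over a batch of GR problem instances.
        phi, lam, delta, theta = sample
        return float(np.clip(0.5 + 0.3 * theta - 0.1 * abs(delta - 1.2), 0.0, 1.0))

    X = saltelli.sample(problem, 1024)                 # Saltelli sampling scheme
    Y = np.array([run_gr_accuracy(x) for x in X])
    Si = sobol.analyze(problem, Y)
    print(Si["S1"], Si["ST"], Si["ST_conf"])           # first-order, total-order, ST confidence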
Figure 8: Peeling trajectory for the driverlog domain (trained by the cost-optimal traces).
Figure 9: Visualization of the parameter ranges recommended by the PRIM algorithm.
Table 7: The recommended ranges for parameters of the GR system (cost-optimal traces). ∗: qp-value ≤ 0.05; ∗∗: qp-value ≤ 0.001.
Table 8: The recommended ranges for parameters of the GR system (divergent traces). ∗: qp-value ≤ 0.05; ∗∗: qp-value ≤ 0.001.
Intuitively, PRIM narrows the parameter ranges to smaller ones that are more likely to
contain the top performance scenarios. Thus, for each domain, we use the middle points of the
parameter ranges discovered by PRIM to configure our GR system to obtain good performance.
For example, in Fig. 9, the middle point of the range for parameter δ is 1.82. Therefore, we
configured the δ of our GR system to be 1.82. Similarly, the other parameters are configured with
the middle points of the corresponding ranges identified by PRIM; we refer to these values as the
PRIM parameters.
We performed GR experiments with the PRIM parameters for all the synthetic domains and
compared the performance with the GR experiments based on the default parameters (ϕ = 50,
λ = 1.1, δ = 1.0, θ = 0.8). The results of precision, recall, accuracy, and execution time for each
domain and level of observation are listed in Appendix C. We take the precision, recall, accuracy,
and execution time from the GR system configured with the PRIM parameters and subtract from
them the corresponding values of the GR system configured with the default parameters. Figures 10
and 11 show the performance differences between the GR system configured with the PRIM
parameters and that with the default parameters. The blue bars, "PRIM win," represent that the
GR system configured with the PRIM parameters performs better than that configured with the
default parameters. The height of the bars represents how much one is better than the other.
Contrarily, the orange bars, "default win," represent that the default parameters perform better.
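As a side note on how the PRIM parameters themselves can be obtained, the peeling idea behind PRIM is summarized in the short, self-contained Python sketch below. It is illustrative only and not the implementation used in our experiments: given parameter samples X and the corresponding accuracy scores y, it repeatedly peels off a small fraction of the points so that the mean score inside the remaining axis-aligned box increases; the PRIM parameters are then the midpoints of the final box.

    import numpy as np

    def prim_peel(X, y, peel_alpha=0.05, min_support=0.2):
        """Simplified PRIM peeling over parameter samples X (n x d) and scores y."""
        box = np.column_stack((X.min(axis=0), X.max(axis=0)))   # per-parameter [low, high]
        inside = np.ones(len(y), dtype=bool)
        while inside.mean() > min_support:
            best = None
            for j in range(X.shape[1]):
                xj = X[inside, j]
                for side, q in (("low", peel_alpha), ("high", 1.0 - peel_alpha)):
                    cut = np.quantile(xj, q)
                    cand = inside & (X[:, j] >= cut if side == "low" else X[:, j] <= cut)
                    if cand.sum() == 0:
                        continue
                    score = y[cand].mean()                       # mean performance in the peeled box
                    if best is None or score > best[0]:
                        best = (score, j, side, cut, cand)
            if best is None or best[0] <= y[inside].mean():
                break                                            # no peel improves the mean score
            _, j, side, cut, cand = best
            box[j, 0 if side == "low" else 1] = cut
            inside = cand
        return box

    # The "PRIM parameters" are the midpoints of the recommended ranges, e.g.:
    #   box = prim_peel(samples, accuracies)
    #   phi, lam, delta, theta = box.mean(axis=1)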
Figure 10: The performance differences between the GR system configured with the PRIM parameters and that configured with the default parameters.
Figure 11: The performance differences between the GR system configured with the PRIM parameters and that configured with the default parameters.
The comparison results show that the PRIM parameters are likely to yield high precision
and accuracy of our GR system, while the default parameters yield high recall. A GR system
with higher precision and accuracy is more convincing to be a good system than one with higher
recall, as one can always obtain high recall by inferring all the candidate goals. Therefore, we
conclude that the PRIM parameters lead to better performance, and we recommend using the
PRIM parameters.
In our experiments, the traces accepted as historical observations of agents are generated by
planners, namely the top-k planner and the diverse planner (see Section 5.1.1), which may impact
learning the skill models and, consequently, the GR performance. We compared the GR systems
trained by different sets of traces (configured with the PRIM parameters). The performance of
the GR systems trained by the cost-optimal and divergent traces is shown in Appendix C.
The results of comparing GR performance on different sets of training traces are shown in
Fig. 12. The blue bars represent cases in which the GR system trained by the divergent traces
wins over that trained by the cost-optimal traces, while the orange bars represent cases in which
the cost-optimal traces win. The height of the bars represents how much one is better than the
other. The divergent traces yield higher precision than the cost-optimal traces (except for the
domains of DWR, Intrusion-detection, and Logistics), as well as higher accuracy (except for the
domains of DWR, Intrusion-detection, and Sokoban). The cost-optimal traces yield higher recall
for all the domains. The recognition times of the GR system trained by the cost-optimal traces
are shorter than those of the system trained by the divergent traces. Intuitively, compared with
the cost-optimal traces, the divergent traces tend to be longer traces that consist of more actions.
As Table 9 shows, the skill models (Petri nets) learned from these traces tend to have more places,
transitions, and arcs, which suggests that the skill models tend to cover a larger state space.
Consequently, the GR system trained by the divergent traces yields higher precision and accuracy.
However, computing the optimal alignment between a trace and a skill model that covers a smaller
state space is relatively easier, such that the recognition time of the GR system trained by the
cost-optimal traces is shorter.
Table 9: The average number of transitions, places, and flow arcs over the skill models learned from the cost-optimal traces and the divergent traces for each domain.
Figure 12: The differences in GR performance between the system trained by the cost-optimal traces and the system trained by the divergent traces.
The GR performance for domain knowledge-based approaches is listed in Appendix D.
Figure 13 plots the precision, recall, accuracy, and time for different GR approaches in each
domain for each level of observation. The blue dots represent the GR performance of our
approach. We calculated the average of the precision, recall, and accuracy over the other
approaches and compared it with the performance of our approach. A blue line represents that
the performance of our approach is better than the mean of the other GR approaches for the
corresponding domain and observation level. For the comparison of the execution time, the blue
lines represent that our approach is the fastest in that domain at that level of observation.
Figure 13: The performances of different GR approaches. For the precision, recall, and accuracy, the blue lines indicate that the performance of our approach is better than the average performance of the other GR approaches. For the time, the blue lines indicate that our approach uses the shortest recognition time (the fastest approach). Panels: (a) Precision, (b) Recall, (c) Accuracy, and recognition time; compared approaches: PM (Ours), Landmark, R&G, DUAL-BFWS, and LP.
In Fig. 13, the plots show that our approach uses the shortest recognition time in 57 out of 75
cases. The precision of our GR approach is higher than the average of the other approaches in
12 out of 75 cases, and in 14 out of 75 the recall of our approach is higher than the average. As
PRIM identified parameters to maximize accuracy, in approximately half of the cases (36 out of
75 cases), our accuracy is higher than the average of the other approaches. As other approaches
(R&G variants) can access the full domain model, they can compute the cost difference between
the optimal plan and any observed plan to infer the goal. However, our PM-based approach learns
skill models based on a few rational (optimal or close to optimal) plans. If an observed plan has
a large distance from plans in the learning dataset, the recognition accuracy of our PM-based
approach tends to decrease. Hence, for some domains and some levels of observation, other GR
approaches can outperform our PM-based approach. For example, in Section 2, the skill model
for achieving goal A is learned from six rational traces (see Fig. 2a). If the first seven steps of the
red irrational walk (in Fig. 1a) are observed, our PM-based GR system is unlikely to infer that
the red walk intends to achieve goal A, because only the first step can match the steps in Fig. 2a.
However, for the planning-based GR approaches, after seven steps of the red walk, the agent is
located in cell (3, 3), which is close to goal A. As a result, it is likely for these systems to infer
goal A. In short, our PM-based GR system can recognize a goal accurately if it has observed
some similar traces before (regardless of the rationality).
Table 10: The average ranks of performance (avg) for each GR approach and the standard deviations (std) of the ranks.
Table 10 shows the average ranks of GR performance (precision, recall, accuracy, and time)
for the five approaches mentioned above. Despite only using skill models learned from historical
observations and without access to full domain knowledge, our approach achieves an accuracy
level that is comparable to other GR approaches. Furthermore, our approach shows a clear
performance advantage over existing GR approaches in terms of recognition speed. We note
that, potentially, the R&G variants may use some form of precomputation to speed up the
recognition by, for example, precomputing the probabilities “heatmaps” for each state or the
so-called Radius of Maximum Probability (RMP) for each possible goal, as proposed by Masters
and Sardiña [31, 30]. However, those techniques have been proposed for the special case of
navigational grid-world settings, which enjoy a uniform and manageable state space. In our
work, on the other hand, we deal with task-planning domains (beyond navigation) with arbitrary
state space, so precomputation is less applicable or practical. In particular, RMP only provides
the tipping point boundary in which a goal becomes the most probable but would not provide
probabilities or ranks outside that boundary. More generally, precomputation is arguably a
different and orthogonal issue to all approaches. As such, all the evaluated techniques have
performed the recognition from scratch to allow meaningful and fair comparisons.
The relatively lower values for precision and recall reflect our PRIM parameter settings
optimizing for accuracy. When no domain models are available but only historical traces, our
GR system is the only approach among those evaluated here that can still be used.
5.5.2. Comparison with LSTM-based GR Approach
The implementation of the LSTM-based GR approach uses the configuration recommended
by Min et al. [63]. The LSTM model, with a dropout rate of 0.75, comprises three layers:
an embedding layer that converts actions to a 20-dimensional vector space, a layer with 100
self-connected memory cells (units), and a softmax layer for probability distribution over goal
candidates, with the highest probability indicating the inferred goal. To handle actions that
appear in testing traces but not in training traces, we set the number of distinct embeddings in the
embedding layer to be the number of unique actions in the training traces, with an additional label
for “unknown” actions. During the training of the LSTM model, we utilize a mini-batch size of 8,
employ the cross entropy loss function, apply the stochastic gradient descent optimizer, and train
for a total of 100 epochs. We evaluated the PM- and LSTM-based systems with two datasets:
a small dataset of 10 traces for achieving each goal and a large dataset with 100 traces per
goal. Note that the training datasets generated by the diverse planner and the top-k planner differ
from the standard testing dataset provided by Pereira et al. [48]. The detailed GR performance
results for the PM- and LSTM-based approaches trained with small and large datasets are listed
in Table E.18 included in Appendix E. For three performance metrics of precision, recall, and
accuracy, Table 11 shows the percentage and the number of cases (out of 75 total cases) where
the PM-based system strictly outperforms the LSTM-based system.
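For clarity, the sketch below shows one possible PyTorch realization of the configuration described above. It is an assumption-laden illustration rather than the exact implementation: the class and variable names are ours, the learning rate is not specified in the text and is chosen arbitrarily here, and the placement of the dropout layer is assumed.

    import torch
    import torch.nn as nn

    class LSTMGoalRecognizer(nn.Module):
        def __init__(self, num_actions, num_goals, emb_dim=20, hidden_units=100, dropout=0.75):
            super().__init__()
            # One extra embedding index is reserved for actions unseen during training ("unknown").
            self.embedding = nn.Embedding(num_actions + 1, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden_units, batch_first=True)
            self.dropout = nn.Dropout(dropout)
            self.out = nn.Linear(hidden_units, num_goals)

        def forward(self, action_ids):              # action_ids: (batch, sequence_length)
            x = self.embedding(action_ids)           # (batch, sequence_length, emb_dim)
            _, (h_n, _) = self.lstm(x)                # final hidden state summarizes the trace
            return self.out(self.dropout(h_n[-1]))   # logits; softmax is applied inside the loss

    model = LSTMGoalRecognizer(num_actions=50, num_goals=10)
    criterion = nn.CrossEntropyLoss()                 # cross-entropy over the goal candidates
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    # Training iterates over mini-batches of 8 observed traces for 100 epochs;
    # the goal with the highest predicted probability is reported as the inferred goal.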
Table 11: The percentage (number) of cases out of 75 cases in which the PM-based system outperformed the LSTM-based system.
Figure 14 plots the accuracies of the PM-based and LSTM-based GR approaches trained
with 10 and 100 traces per goal for all 75 cases. The blue vertical lines denote the cases in which
the PM-based approach is more accurate than the LSTM-based approach. When trained with 10
traces per goal, the PM-based approach is more accurate in 54 out of 75 cases than the LSTM-
based approach. However, the latter is more accurate more often when trained with 100 traces
per goal. For precision and recall, regardless of the size of the training dataset, our approach
outperforms the LSTM-based GR. In Appendix E, Figure E.17 plots all the comparison results
for precision and recall on 10 and 100 training traces. The reason why our approach performs
better on the precision and recall, while the LSTM-based approach performs better (with a large
training dataset) on the accuracy, is that LSTM tends to have high true negative scores (TN, refer
to Section 5.1.4). The problem instances in our dataset contain multiple goal candidates but only
one true goal. If LSTM tends to infer only one goal (or few goals) among many goal candidates,
the TN score is high, even if LSTM always infers a wrong goal. In contrast, our approach
tends to infer more goals to increase the possibility of containing the true goal. Hence, the TN
score of our approach tends to be low, especially in the situations of uncertainty, like when only
a few observations have been made. We conclude that our approach performs better than the
LSTM-based approach if the training dataset size is small and has comparable performance to
the LSTM-based approach when the dataset is relatively large.
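A small numeric illustration of this effect, using the metric definitions from Section 5.1.4 and hypothetical inferences over ten candidate goals, is given below.

    def accuracy(inferred, true_goal, num_candidates):
        tp = 1 if true_goal in inferred else 0
        tn = num_candidates - len(inferred) - (1 - tp)    # incorrect goals that were not inferred
        return (tp + tn) / num_candidates

    # Always committing to a single (here, wrong) goal still yields a high accuracy of 0.8,
    # although precision and recall are both 0:
    print(accuracy({"g2"}, "g1", 10))                     # 0.8 (TN = 8)
    # Hedging over four goals that include the true one yields perfect recall but lower accuracy:
    print(accuracy({"g1", "g2", "g3", "g4"}, "g1", 10))   # 0.7 (TN = 6)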
The explainability is another merit of the PM-based GR system. While the logic behind the
GR inference is a black box for the LSTM-based GR, the GR results of the PM-based approach
can be explained using alignments between observations and the learned Petri nets. The user
can explore the commonalities and discrepancies between the observations and skill models for
each goal and study how they contributed to the resulting probabilities assigned to each candidate
goal; refer to the examples discussed in Section 3.2.2. In addition, the learned Petri nets explain
the behavior of the agents toward candidate goals and, thus, give a general view of what agents
are doing when striving for different goals. In contrast, the LSTM network only provides tuned
network weights, which offer no such insight.
Figure 14: The accuracies of the PM-based GR approaches and the LSTM-based GR approach. The blue lines indicate that the accuracy of the PM-based approach is higher than that of the LSTM-based approach.
To verify whether our GR system is applicable in real-world scenarios, we conducted the GR
experiments using the real-world dataset mentioned in Section 5.1.2, and introduced the random
guess baseline to compare with our GR performance. We considered our GR system applicable
in the real world if it outperforms the baseline. Note that we applied a PRIM analysis in the
real-world domains to identify the best configuration parameters. The random guess baseline
represents the performance of a GR approach that randomly returns any candidate goal or any
combination of candidate goals. For example, if a GR problem instance has two candidate goals
(g1 and g2), the random guess approach can randomly return {g1}, {g2}, or {g1, g2}.
The performance of the binary choice real-world GR problems is shown in Table 12. As
mentioned in Section 5.1.2, we clustered the event log of each domain into subsets of traces
to formulate sub-problems. The notation "BPIC 2011 (1)" in the table represents the first sub-
problem in the BPIC 2011 domain. The Production and Traffic Fines domains only have one
sub-problem. Since binary choice problems only have two candidate goals, precision equals
accuracy. Hence, we only show the values of precision to represent both precision and accuracy.
In Table 12, the majority of precision and recall values are greater than the corresponding
expected precision and recall of the random guess baseline GR approach, except for 15 out
of 190 cases (highlighted in red). The recognition time tends to increase as more actions are
observed. Note that, if a recognition time is less than 0.01 seconds, we consider that GR problem
instance to be recognized immediately, and the time is denoted with ε accordingly.
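The expected precision and recall of the random guess baseline reported in Tables 12 and 13 can be derived by enumerating all non-empty subsets of the candidate goals, as in the short sketch below (the function name is ours).

    from itertools import combinations

    def random_guess_baseline(num_goals):
        """Expected precision and recall of a GR approach that returns a uniformly random,
        non-empty subset of the candidate goals (one true hidden goal per instance)."""
        goals, true_goal = range(num_goals), 0
        precisions, recalls = [], []
        for k in range(1, num_goals + 1):
            for subset in combinations(goals, k):
                tp = 1 if true_goal in subset else 0
                precisions.append(tp / len(subset))   # TP / (TP + FP)
                recalls.append(tp)                    # TP / (TP + FN), with FN = 1 - TP
        return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

    print(random_guess_baseline(2))   # ~(0.50, 0.67): the binary choice baseline in Table 12
    print(random_guess_baseline(5))   # ~(0.20, 0.52): the five-goal baseline in Table 13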
BPIC 2011 (1) BPIC 2011 (2) BPIC 2011 (3) BPIC 2011 (4) BPIC 2015 (1)
%O p r t p r t p r t p r t p r t
10 0.47 0.84 0.32 0.56 0.96 1.98 0.59 0.94 0.48 0.55 0.92 1.05 0.53 0.90 0.45
30 0.65 0.91 0.72 0.60 0.89 5.75 0.66 0.88 1.26 0.62 0.88 2.86 0.65 0.74 1.72
50 0.62 0.86 1.40 0.67 0.92 11.00 0.63 0.83 2.18 0.62 0.88 4.64 0.76 0.80 3.51
70 0.59 0.84 1.94 0.70 0.91 14.37 0.81 0.96 2.95 0.66 0.88 6.85 0.71 0.74 7.31
100 0.69 0.86 2.58 0.66 0.85 15.61 0.79 0.97 3.11 0.84 0.97 8.41 0.82 0.83 10.30
Baseline 0.50 0.67 0.50 0.67 0.50 0.67 0.50 0.67 0.50 0.67
BPIC 2015 (2) BPIC 2015 (3) BPIC 2015 (4) BPIC 2015 (5) BPIC 2017 (1)
%O p r t p r t p r t p r t p r t
10 0.56 0.87 0.70 0.63 0.93 0.28 0.53 0.81 0.32 0.69 0.97 0.98 0.50 1.00 ε
30 0.56 0.71 2.07 0.45 0.59 1.63 0.65 0.97 1.16 0.54 0.79 2.68 0.52 0.98 ε
50 0.55 0.61 5.96 0.76 0.91 3.57 0.79 0.93 2.37 0.72 0.90 4.87 0.57 0.98 ε
70 0.58 0.64 6.63 0.85 0.90 4.66 0.91 0.93 4.05 0.84 0.89 7.30 0.59 0.97 ε
100 0.68 0.75 7.29 0.90 0.94 6.04 0.95 0.97 4.86 0.82 0.87 7.70 0.77 0.97 ε
Baseline 0.50 0.67 0.50 0.67 0.50 0.67 0.50 0.67 0.50 0.67
BPIC 2017 (2) BPIC 2017 (3) Hospital Billing (1) Hospital Billing (2) Production
%O p r t p r t p r t p r t p r t
10 0.50 1.00 ε 0.50 1.00 ε 0.50 1.00 ε 0.50 0.99 ε 0.50 1.00 0.01
30 0.52 0.97 ε 0.50 1.00 ε 0.68 0.99 ε 0.12 0.20 ε 0.52 1.00 0.01
50 0.56 0.96 ε 0.51 1.00 ε 0.97 0.98 ε 0.50 0.89 ε 0.43 0.83 0.01
70 0.59 0.94 ε 0.51 0.99 ε 0.98 0.98 ε 0.51 0.91 ε 0.52 0.87 0.01
100 0.78 0.97 ε 0.99 1.00 ε 0.99 0.99 ε 0.91 0.93 ε 0.50 0.87 0.01
Baseline 0.50 0.67 0.50 0.67 0.50 0.67 0.50 0.67 0.50 0.67
Sepsis Cases (1) Sepsis Cases (2) Sepsis Cases (3) Traffic Fines
%O p r t p r t p r t p r t
10 0.49 0.97 ε 0.49 0.99 ε 0.49 0.97 ε 0.50 1.00 ε
30 0.55 0.97 ε 0.46 0.87 ε 0.47 0.85 ε 0.62 0.80 ε
50 0.59 0.96 ε 0.45 0.85 ε 0.50 0.89 ε 0.64 0.95 ε
70 0.57 0.96 0.01 0.54 0.96 ε 0.47 0.91 0.01 0.68 0.99 ε
100 0.61 0.94 0.01 0.91 0.97 ε 0.55 0.94 0.01 0.74 0.98 ε
Baseline 0.50 0.67 0.50 0.67 0.50 0.67 0.50 0.67
Table 12: GR performance of the binary-choice real-world GR problems; %O: the level of observation, p: precision, r: recall, t: time (in
seconds), ε: time < 0.01. The performance worse than the random guess baseline is highlighted in red.
The performance of the multi-class real-world GR problems is shown in Table 13. The labels
"Activities," "Build Prmt," and "Env Prmt" represent the datasets of Activities of Daily Living,
Building Permit Applications, and Environmental Permit Applications, respectively. The
80%/20% split represents that the skill models were learned from 80% of the traces in that
domain, and the remaining 20% of the traces were used for testing the GR performance. The
60%/40% split means that 60% of the traces were used for learning and 40% for testing. For
the domains of “Activities” and “Build Prmt,” all precision, recall, and accuracy values are greater
for the PM-based GR system than for the random guess baseline. For the domain of “Env Prmt,”
there are two cases where the recall values are slightly lower than the baseline (annotated in red),
while the corresponding precision and accuracy values are significantly higher than the baseline.
These two cases indicate that our GR system tends to infer fewer goals, resulting in higher preci-
sion and accuracy, which, in turn, impacts recall. There are two accuracy values slightly lower
than the baseline (annotated in red), which might indicate that the 10% level of observation does not contain
enough information to identify the true goal.
The precision values are sometimes low (even for 100% observations). The testing obser-
vations are not seen during the learning stage, and the GR precision relies on the ability of the
learned Petri nets to generalize to unforeseen traces. If the learned Petri nets do not generalize
Activities (80%/20%) Build Prmt (80%/20%) Env Prmt (80%/20%)
%O p r a t p r a t p r a t
10 0.27 0.97 0.52 0.14 0.34 0.77 0.59 2.00 0.30 0.72 0.47 1.42
30 0.34 0.77 0.71 0.28 0.53 0.65 0.76 6.21 0.40 0.59 0.69 4.99
50 0.35 0.61 0.81 0.47 0.58 0.69 0.80 11.29 0.41 0.52 0.73 9.62
70 0.42 0.74 0.82 0.62 0.58 0.64 0.82 15.88 0.43 0.54 0.72 13.64
100 0.55 0.81 0.88 0.67 0.71 0.76 0.88 20.80 0.55 0.69 0.78 15.20
Baseline 0.13 0.50 0.50 0.20 0.52 0.49 0.20 0.52 0.49
Activities (60%/40%) Build Prmt (60%/40%) Env Prmt (60%/40%)
%O p r a t p r a t p r a t
10 0.29 1.00 0.58 0.04 0.33 0.72 0.58 1.70 0.32 0.74 0.46 0.76
30 0.43 0.71 0.82 0.08 0.56 0.65 0.79 5.37 0.43 0.60 0.72 2.92
50 0.54 0.73 0.88 0.12 0.60 0.70 0.81 9.74 0.37 0.42 0.72 5.75
70 0.50 0.67 0.87 0.15 0.61 0.65 0.83 13.66 0.39 0.48 0.72 8.17
100 0.50 0.60 0.87 0.18 0.65 0.69 0.86 17.85 0.59 0.69 0.81 8.93
Baseline 0.13 0.50 0.50 0.20 0.52 0.49 0.20 0.52 0.49
Table 13: GR performance of the multi-class real-world GR problems; %O: the level of observation, p: precision, r: recall, a: accuracy,
t: time (in seconds), 80%/20%: 80% of traces used for learning and 20% of traces used for testing, 60%/40%: 60% of traces used for
learning and 40% of traces used for testing. The performance worse than the random guess baseline is highlighted in red.
well, our GR system may not infer the true goal, and the precision decreases. If the learned Petri
nets over-generalize, failing to distinguish between traces leading to different goals, then our GR
system may infer multiple goals, which also decreases the precision.
Our GR system outperforms the random guess baseline for binary choice and multi-class
real-world GR problems. We conclude that our GR system is applicable in real-world scenarios.
6. Related Work
The typical GR approaches rely on predefined models derived either from plan libraries
[2, 66, 23, 67] or domain knowledge [10, 11, 48, 62]. Some domain knowledge-based GR
approaches consider handling irrational behaviors [32, 68]. Our work, on the other hand, aims
to solve the GR problem by learning models from historical observations, relying on sets of
observed plans. Similar work by Sohrabi et al. [69] also relies on sets of plans to solve GR
problems. However, this approach takes a sequence of states rather than actions as input. The
observed state sequence is mapped to each plan in the set according to the domain knowledge,
and the mapping process identifies the noisy and missing observations. Taking the noisy and
missing observations into account, the likelihood of the observed state sequence to match each
plan in the set and the likelihood of the agent following that plan to achieve the goal are
computed. However, this approach still requires domain knowledge for mapping a state sequence
to an action sequence.
Some works in (statistical) learning propose performing GR based on historical behavior
data. For example, one can learn the underlying domain transition model to support planning-
based GR [12, 13, 70]. Alternatively, one can learn the decision-making model of the observed
agent when executing a Hierarchical Task Network style plan library (which is known a
priori) or perform the end-to-end learning from observed behavior to the intended goal [63, 71].
Like our work, their overarching objective is to ease the traditional requirement of hand-crafting
the observed agent model. However, those approaches require large training datasets and yield
black-box type GR systems. In contrast, our approach can perform well even if the training
datasets are small. In addition, our approach produces judgments that can be directly interpreted
by relying on the synthesized structured processes and the identified process misalignments.
A method proposed by Shvo et al. [14] trains interpretable classifiers that can solve GR
problems in the absence of pre-defined models. This method learns deterministic finite automata
(DFA) from training traces. Compared to our previous work [20], this method replaces Petri nets
with DFAs and targets the use case of early prediction of goals. The learned DFAs are graphical encodings
of the training traces, and each DFA is associated with a class label. The probability distribution
over class labels is computed based on the observed trace against the DFAs. This approach relies
on the training traces and a transition function to assign the trace prefixes to the states of a DFA.
In contrast, our approach relies purely on the traces.
In the business process management area, some existing works aim to predict the business
goal of an incomplete business process (i.e., partially observed trace), which is referred to as
outcome-oriented predictive process monitoring [54]. The existing methods [72, 73, 74, 75, 76]
predict the class label (business goal) of a given trace based on trained classifiers. Similar to our
GR approach, these works require an offline phase for learning classifiers and an online phase
for predicting. As summarized by Teinemaa et al. [54], the outcome-oriented predictive process
monitoring methods extract and filter traces from an event log to obtain the prefixes of the traces.
Next, these methods divide the trace prefixes into multiple buckets for training several classifiers.
Several bucketing approaches are used, such as the k-nearest neighbors (KNN) [72], the state-
based approaches [73, 74], and the clustering-based approaches [75, 76]. Subsequently, the trace
prefixes in each bucket are encoded as feature vectors, since training the classifiers requires
fixed-length feature vectors as input. Finally, the classifiers are trained with commonly
used classification algorithms such as decision tree (DT), random forest (RF), or support vector
machine (SVM). In the online predicting phase, the trained classifier assigns a class label to an
observed trace. However, these outcome-oriented predictive process monitoring approaches also
yield non-interpretable artifacts. In contrast, our approach constructs interpretable artifacts, as
both the skill models and the alignments can be inspected.
Process discovery resembles action model learning [77, 78, 79, 80]. Whereas the aim of
action model learning is to learn the dynamics of an underlying environment (e.g., PDDL
models), process discovery aims to learn models that compactly describe sets of action sequences
(goal-relevant plans, in our case) without relying on any information about the states of the
environment. Hence, unlike existing works on learning action models, process discovery has
fewer data requirements in that it does not require information about domain states. As such,
places in our Petri nets do not represent domain states but rather the states of plans in a
generalized manner. Importantly, also, discovery techniques are designed to work with human-
driven processes and, thus, are made to be robust toward missing or noisy data. At the technical
level, Petri nets and planning models (PDDL or STRIPS) have indeed been related in both
directions, but for different purposes and needs than ours. For example, planning models
were translated to Petri nets to perform concurrent planning using known Petri net unfolding
techniques [81, 82], while Petri nets have been compiled into planning models to facilitate
process analysis using planning technology [83, 84]. As stated, our Petri net models do not
aim to represent dynamic systems or be used to perform planning. Instead, we use Petri nets as
convenient compact representations of sets of plans and leverage significant existing work and
tools to perform process discovery and alignment analysis.
The research on cognitive architectures [85, 86, 87, 88] attempts to model the core capabili-
ties of humans, including, but not limited to, perception, attention mechanisms, action selection,
learning, memory, reasoning, and metareasoning [89]. Our GR framework can be seen as a
goal-intention recognition module of a cognitive architecture, such as ACT-R [85] and Soar [86].
Note also that the GR framework proposed in this work is specifically designed as an outline for
implementing process mining-based GR systems.
7. Conclusion
We presented a solution to the goal recognition problem that does not require pre-defined
models of behavior in the domain of concern, such as the plan libraries or domain dynamics
descriptions (e.g., planning domains) required by many existing goal recognition approaches.
Instead, our GR approach leverages recorded past behaviors (captured as
collections of event traces) to automatically learn skill models using process discovery techniques
and, subsequently, infer the goal of the agent by checking conformance, or aligning, observations
against the skill models. This perspective takes advantage of the fact that logs of past behaviors
exist or can be collected with reasonable effort. In contrast, plan libraries or domain descriptions
may not be readily available in many real-world domains (or are costly to produce).
More concretely, we recast the principled planning-based GR approaches based on the
rationality assumption of agents within our set-up of learned skill models and process alignment
to obtain a probabilistic GR based on seen traces of behavior. As our approach contains
four parameters, we conducted a sensitivity analysis to verify that all four parameters have a
significant impact on the GR accuracy and used the Patient Rule Induction Method (PRIM)
to identify the parameters that yield high GR accuracy. We showed, experimentally, that
despite relying on a limited number of past behaviors, our approach achieved an accuracy
level comparable to other state-of-the-art GR approaches with full access to domain knowledge
and that, in addition, the recognition speed of our approach is often faster. We also provided
experimental results on real-world datasets for which no domain knowledge is readily available
but for which logs of traces do exist. Such results provide evidence that the approach is able
to perform GR quickly and accurately and without predefined plan libraries or domain models.
Finally, we argue that our GR approach can be used to instantiate a GR framework inspired by
the principles of observational learning from social cognitive learning theory, which constitutes a
collection of components that can be selectively replaced to tune the performance of the system.
We acknowledge a range of limitations of our work thus far. For instance, we have not tested
with a wider range of process discovery techniques, including learning and using stochastic
process models that encode frequencies of the traces they were discovered from and models
constructed from spurious observations. Also, other conformance checking techniques beyond
alignments used in this work can be explored, and we have only considered alignment moves that
involve trace actions. These limitations give rise to future work. Another interesting direction
to pursue in future work is looking at non-stationary environments, that is, environments that
change over time. Finally, future work will address the detailed design and implementation of
the attention, retention, and motivation phases of the proposed goal recognition framework.
References
[1] S. Carberry, Techniques for plan recognition, User Modeling and User-Adapted Interaction 11 (2001) 31–48.
[2] H. A. Kautz, J. F. Allen, Generalized plan recognition, in: AAAI, 1986, pp. 32–37.
[3] G. Sukthankar, K. P. Sycara, A cost minimization approach to human behavior recognition, in: AAMAS, 2005,
pp. 1067–1074.
[4] A. Kott, W. McEneaney, Adversarial reasoning: Computational approaches to reading the opponent’s mind, CRC
Press, 2006.
[5] S. Lefèvre, D. Vasquez, C. Laugier, A survey on motion prediction and risk assessment for intelligent vehicles,
Robomech Journal 1 (2014) 1–14.
[6] J. Firl, Q. Tran, Probabilistic maneuver prediction in traffic scenarios, in: ECMR, 2011, pp. 89–94.
[7] J. F. P. Kooij, N. Schneider, F. Flohr, D. M. Gavrila, Context-based pedestrian path prediction, in: ECCV (6),
volume 8694 of LNCS, 2014, pp. 618–633.
[8] N. Lesh, C. Rich, C. L. Sidner, Using plan recognition in human-computer collaboration, in: UM, 1999, pp. 23–32.
[9] R. Demolombe, E. Hamon, What does it mean that an agent is performing a typical procedure? A formal definition
in the situation calculus, in: AAMAS, 2002, pp. 905–911.
[10] M. Ramı́rez, H. Geffner, Plan recognition as planning, in: IJCAI, 2009, pp. 1778–1783.
[11] M. Ramı́rez, H. Geffner, Probabilistic plan recognition using off-the-shelf classical planners, in: AAAI, 2010, pp.
1121–1126.
[12] L. Amado, R. F. Pereira, J. P. Aires, M. C. Magnaguagno, R. Granada, F. Meneguzzi, Goal recognition in latent
space, in: IJCNN, 2018, pp. 1–8.
[13] L. Amado, R. Mirsky, F. Meneguzzi, Goal recognition as reinforcement learning, in: AAAI, 2022, pp. 9644–9651.
[14] M. Shvo, A. C. Li, R. T. Icarte, S. A. McIlraith, Interpretable sequence classification via discrete optimization, in:
AAAI, 2021, pp. 9647–9656.
[15] P. Haslum, N. Lipovetzky, D. Magazzeni, C. Muise, An Introduction to the Planning Domain Definition Language,
Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2019.
[16] Z. Su, A. Polyvyanyy, N. Lipovetzky, S. Sardiña, N. van Beest, GRACE: A simulator for continuous goal
recognition over changing environments, in: PMAI@IJCAI, volume 3310 of CEUR Workshop Proceedings, 2022,
pp. 37–48.
[17] W. M. P. van der Aalst, Process Mining — Data Science in Action, Second Edition, Springer, 2016.
[18] R. Fikes, N. J. Nilsson, STRIPS: A new approach to the application of theorem proving to problem solving,
Artificial Intelligence 2 (1971) 189–208.
[19] N. Wilken, H. Stuckenschmidt, Combining symbolic and statistical knowledge for goal recognition in smart home
environments, in: PerCom Workshops, 2021, pp. 26–31.
[20] A. Polyvyanyy, Z. Su, N. Lipovetzky, S. Sardiña, Goal recognition using off-the-shelf process mining techniques,
in: AAMAS, 2020, pp. 1072–1080.
[21] C. F. Schmidt, N. S. Sridharan, J. L. Goodson, The plan recognition problem: An intersection of psychology and
artificial intelligence, Artificial Intelligence 11 (1978) 45–83.
[22] G. Sukthankar, C. Geib, H. H. Bui, D. Pynadath, R. P. Goldman, Plan, activity, and intent recognition: Theory and
practice, Newnes, 2014.
[23] D. Avrahami-Zilberbrand, G. A. Kaminka, Fast and complete symbolic plan recognition, in: IJCAI, 2005, pp.
653–658.
[24] D. V. Pynadath, M. P. Wellman, Probabilistic state-dependent grammars for plan recognition, in: UAI, 2000, pp.
507–514.
[25] J. Hong, Goal recognition through goal graph analysis, Journal of Artificial Intelligence Research 15 (2001) 1–30.
[26] C. L. Baker, R. Saxe, J. B. Tenenbaum, Action understanding as inverse planning, Cognition 113 (2009) 329–349.
[27] D. Pattison, D. Long, Accurately determining intermediate and terminal plan states using Bayesian goal
recognition, in: D. Pattison, D. Long, C. Geib (Eds.), GAPRec 2011. Proceedings of the First Workshop on
Goal, Activity and Plan Recognition, 2011, pp. 32–37.
[28] M. Ghallab, D. S. Nau, P. Traverso, Automated planning — Theory and practice, Elsevier, 2004.
[29] P. Masters, S. Sardiña, Cost-based goal recognition for path-planning, in: AAMAS, ACM, 2017, pp. 750–758.
[30] P. Masters, S. Sardiña, Cost-based goal recognition in navigational domains, Journal of Artificial Intelligence
Research 64 (2019) 197–242.
[31] P. Masters, S. Sardiña, Goal recognition for rational and irrational agents, in: AAMAS, 2019, pp. 440–448.
[32] P. Masters, S. Sardiña, Expecting the unexpected: Goal recognition for rational and irrational agents, Artificial
Intelligence 297 (2021) 103490.
[33] M. Vered, G. A. Kaminka, S. Biham, Online goal recognition through mirroring: Humans and agents, in: Advances
in Cognitive Systems, 2016.
[34] W. M. P. van der Aalst, T. Weijters, L. Maruster, Workflow mining: Discovering process models from event logs,
IEEE Transactions on Knowledge and Data Engineering 16 (2004) 1128–1142.
[35] W. M. P. van der Aalst, The application of Petri nets to workflow management, Journal of Circuits, Systems and
Computers 8 (1998) 21–66.
[36] A. Augusto, R. Conforti, M. Dumas, M. La Rosa, A. Polyvyanyy, Split miner: Automated discovery of accurate
and simple business process models from event logs, Knowledge and Information Systems 59 (2019) 251–284.
[37] M. Weidlich, A. Polyvyanyy, N. Desai, J. Mendling, M. Weske, Process compliance analysis based on behavioural
profiles, Information Systems 36 (2011) 1009–1025.
[38] W. M. P. van der Aalst, A. Adriansyah, B. F. van Dongen, Replaying history on process models for conformance
checking and performance analysis, WIREs Data Mining and Knowledge Discovery 2 (2012) 182–192.
[39] J. Carmona, B. F. van Dongen, A. Solti, M. Weidlich, Conformance Checking — Relating Processes and Models,
Springer, 2018.
[40] A. Polyvyanyy, A. Solti, M. Weidlich, C. Di Ciccio, J. Mendling, Monotone precision and recall measures for
comparing executions and specifications of dynamic systems, ACM Transactions on Software Engineering and
Methodology 29 (2020) 17:1–17:41.
[41] A. Bandura, Observational Learning, American Cancer Society, 2008.
[42] J. M. E. M. van der Werf, A. Polyvyanyy, B. R. van Wensveen, M. J. S. Brinkhuis, H. A. Reijers, All that glitters
is not gold: Four maturity stages of process discovery algorithms, Information Systems 114 (2023) 102155.
[43] S. J. J. Leemans, W. M. P. van der Aalst, T. Brockhoff, A. Polyvyanyy, Stochastic process mining: Earth movers’
stochastic conformance, Information Systems 102 (2021) 101724.
[44] D. Borsa, N. Heess, B. Piot, S. Liu, L. Hasenclever, R. Munos, O. Pietquin, Observational learning by reinforcement
learning, in: AAMAS, 2019, pp. 1117–1124.
[45] S. J. J. Leemans, E. Poppe, M. T. Wynn, Directly follows-based process mining: Exploration & a case study, in:
ICPM, 2019, pp. 25–32.
[46] S. J. J. Leemans, D. Fahland, W. M. P. van der Aalst, Process and deviation exploration with inductive visual miner,
in: BPM Demos, volume 1295 of CEUR Workshop Proceedings, 2014, p. 46.
[47] A. Adriansyah, N. Sidorova, B. F. van Dongen, Cost-based fitness in conformance checking, in: ACSD, 2011, pp.
57–66.
[48] R. F. Pereira, N. Oren, F. Meneguzzi, Landmark-based approaches for goal recognition as planning, Artificial
Intelligence 279 (2020).
[49] R. F. Pereira, N. Oren, F. Meneguzzi, Landmark-based heuristics for goal recognition, in: AAAI, 2017, pp.
3622–3628.
[50] M. Katz, S. Sohrabi, O. Udrea, D. Winterer, A novel iterative approach to top-k planning, in: ICAPS, 2018, pp.
132–140.
[51] M. Katz, S. Sohrabi, Reshaping diverse planning, in: AAAI, 2020, pp. 9892–9899.
[52] M. Fox, A. Gerevini, D. Long, I. Serina, Plan stability: Replanning versus plan repair, in: ICAPS, 2006, pp.
212–221.
[53] A. Coman, H. Muñoz-Avila, Generating diverse plans using quantitative and qualitative plan distance metrics, in:
AAAI, 2011.
[54] I. Teinemaa, M. Dumas, M. La Rosa, F. M. Maggi, Outcome-oriented predictive process monitoring: Review and
benchmark, ACM Transactions on Knowledge Discovery from Data 13 (2019) 17:1–17:57.
[55] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, S. Tarantola, Global sensitivity
analysis: The primer, John Wiley & Sons, 2008.
[56] J. H. Kwakkel, The exploratory modeling workbench: An open source toolkit for exploratory modeling, scenario
discovery, and (multi-objective) robust decision making, Environmental Modelling & Software 96 (2017) 239–250.
[57] I. Sobol, Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates, Mathematics
and Computers in Simulation 55 (2001) 271–280.
[58] X. Y. Zhang, M. N. Trame, L. J. Lesko, S. Schmidt, Sobol sensitivity analysis: A tool to guide the development and
evaluation of systems pharmacology models, CPT: Pharmacometrics & Systems Pharmacology 4 (2015) 69–79.
[59] J. Helton, F. Davis, Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems,
Reliability Engineering & System Safety 81 (2003) 23–69.
[60] B. P. Bryant, R. J. Lempert, Thinking inside the box: A participatory, computer-assisted approach to scenario
discovery, Technological Forecasting and Social Change 77 (2010) 34–49.
[61] J. H. Friedman, N. I. Fisher, Bump hunting in high-dimensional data, Statistics and Computing 9 (1999) 123–143.
[62] L. R. A. Santos, F. Meneguzzi, R. F. Pereira, A. G. Pereira, An lp-based approach for goal recognition as planning,
in: AAAI, 2021, pp. 11939–11946.
[63] W. Min, B. W. Mott, J. P. Rowe, B. Liu, J. C. Lester, Player goal recognition in open-world digital games with long
short-term memory networks, in: IJCAI, 2016, pp. 2590–2596.
[64] S. Richter, M. Helmert, M. Westphal, Landmarks revisited, in: AAAI, 2008, pp. 975–982.
[65] N. Lipovetzky, H. Geffner, Best-first width search: Exploration and exploitation in classical planning, in: AAAI,
2017, pp. 3590–3596.
[66] E. Charniak, R. P. Goldman, A Bayesian model of plan recognition, Artificial Intelligence 64 (1993) 53–79.
[67] C. W. Geib, R. P. Goldman, Partial observability and probabilistic plan/goal recognition, in: Proceedings of the
International workshop on modeling other agents from observations (MOO-05), volume 8, 2005, pp. 1–6.
[68] T. Zhi-Xuan, J. L. Mann, T. Silver, J. Tenenbaum, V. Mansinghka, Online Bayesian goal inference for boundedly
rational planning agents, in: NeurIPS, 2020.
[69] S. Sohrabi, A. V. Riabov, O. Udrea, Plan recognition as planning revisited, in: IJCAI, 2016, pp. 3258–3264.
[70] R. F. Pereira, M. Vered, F. Meneguzzi, M. Ramı́rez, Online probabilistic goal recognition over nominal models, in:
IJCAI, 2019, pp. 5547–5553.
[71] W. Min, E. Ha, J. P. Rowe, B. W. Mott, J. C. Lester, Deep learning-based goal recognition in open-ended digital
games, in: AIIDE, 2014.
[72] F. M. Maggi, C. Di Francescomarino, M. Dumas, C. Ghidini, Predictive monitoring of business processes, in:
CAiSE, volume 8484 of LNCS, 2014, pp. 457–472.
[73] G. T. Lakshmanan, S. Duan, P. T. Keyser, F. Curbera, R. Khalaf, Predictive analytics for semi-structured case
oriented business processes, in: BPM Workshops, volume 66 of LNBIP, 2010, pp. 640–651.
[74] W. M. P. van der Aalst, V. A. Rubin, H. M. W. Verbeek, B. F. van Dongen, E. Kindler, C. W. Günther, Process
mining: A two-step approach to balance between underfitting and overfitting, Software and Systems Modeling 9
(2010) 87–111.
[75] C. D. Francescomarino, M. Dumas, F. M. Maggi, I. Teinemaa, Clustering-based predictive process monitoring,
IEEE Transactions on Services Computing 12 (2019) 896–909.
[76] I. Verenich, M. Dumas, M. La Rosa, F. M. Maggi, C. Di Francescomarino, Complex symbolic sequence clustering
and multiple classifiers for predictive process monitoring, in: BPM Workshops, volume 256 of LNBIP, 2015, pp.
218–229.
[77] R. Garcı́a-Martı́nez, D. Borrajo, An integrated approach of learning, planning, and execution, Journal of Intelligent
& Robotic Systems 29 (2000) 47–78.
[78] D. Shahaf, E. Amir, Learning partially observable action schemas, in: AAAI, 2006, pp. 913–919.
[79] L. Lamanna, A. Saetti, L. Serafini, A. Gerevini, P. Traverso, Online learning of action models for PDDL planning,
in: IJCAI, 2021, pp. 4112–4118.
[80] V. Mehta, B. Paria, J. Schneider, S. Ermon, W. Neiswanger, An experimental design perspective on model-based
reinforcement learning, in: ICLR, 2022.
[81] B. Bonet, P. Haslum, S. L. Hickmott, S. Thiébaux, Directed unfolding of Petri nets, Transactions on Petri Nets and
Other Models of Concurrency 1 (2008) 172–198.
[82] S. L. Hickmott, S. Sardiña, Optimality properties of planning via Petri net unfolding: A formal analysis, in:
ICAPS, 2009.
[83] S. Edelkamp, S. Jabbar, Action planning for directed model checking of Petri nets, in: MoChArt@CONCUR/SPIN,
volume 149 of Electronic Notes in Theoretical Computer Science, 2005, pp. 3–18.
[84] M. de Leoni, G. Lanciano, A. Marrella, Aligning partially-ordered process-execution traces and models using
automated planning, in: ICAPS, 2018, pp. 321–329.
[85] J. R. Anderson, How Can the Human Mind Occur in the Physical Universe?, Oxford University Press, 2007.
[86] J. E. Laird, Extending the soar cognitive architecture, in: AGI, volume 171 of Frontiers in Artificial Intelligence
and Applications, 2008, pp. 224–235.
[87] E. Gat, Integrating planning and reacting in a heterogeneous asynchronous architecture for controlling real-world
mobile robots, in: AAAI, 1992, pp. 809–815.
[88] R. J. Firby, R. E. Kahn, P. N. Prokopowicz, M. J. Swain, An architecture for vision and action, in: IJCAI, 1995,
pp. 72–81.
[89] I. Kotseruba, J. K. Tsotsos, 40 years of cognitive architectures: Core cognitive abilities and practical applications,
Artificial Intelligence Review 53 (2020) 17–94.
40
Appendix A. Skipped GR Problems
Table A.14 shows, for each synthetic domain, the total number of GR problem instances, the number of skipped instances, and the number of remaining instances used in our evaluations. The numbers of skipped and remaining instances for the top-k planner and the diverse planner are recorded in separate columns. In the Blocks-world, Depots, DWR, and Sokoban domains, the top-k or the diverse planner skips a portion of the total instances. For these domains, Table A.15 reports the total number of instances and the number of skipped instances at each level of observations (10%, 30%, 50%, 70%, and 100%).
Domain | Instances | top-k planner: skipped, remaining | diverse planner: skipped, remaining
Blocks-world 1076 12 1064 152 924
Campus 75 0 75 75 0
Depots 364 0 364 203 161
Driverlog 364 0 364 0 364
DWR 364 0 364 260 104
Easy-ipc-grid 673 0 673 0 673
Ferry 364 0 364 0 364
Intrusion-detection 465 0 465 0 465
Kitchen 75 0 75 75 0
Logistics 673 0 673 0 673
Miconic 364 0 364 0 364
Rovers 364 0 364 0 364
Satellite 364 0 364 0 364
Sokoban 364 0 364 104 260
Zeno-travel 364 0 364 0 364
Table A.14: The number of skipped problems in synthetic domains for the top-k planner and the diverse planner.
Table A.15: The number of skipped problems for each level of observations (for the domains with some, and not all, skipped instances).
Appendix B. Sobol Sensitivity Analysis
Fig. B.15 shows the results of the Sobol sensitivity analysis on the synthetic domains (excluding the Blocks-world domain discussed in Section 5.2) for the GR system trained with the cost-optimal traces generated by the top-k planner.
Figure B.15: Sobol sensitivity analysis (the GR system was trained with the cost-optimal traces).
Fig. B.16 shows the results of the Sobol sensitivity analysis on the synthetic domains (excluding the Blocks-world domain discussed in Section 5.2) for the GR system trained with the divergent traces generated by the diverse planner.
Figure B.16: Sobol sensitivity analysis (the GR system was trained with the divergent traces).
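For readers who wish to reproduce a similar analysis, the sketch below outlines how first-order and total-order Sobol indices over the four parameters of the GR system (ϕ, λ, δ, θ) could be computed with the Python SALib library; the parameter bounds and the evaluate_gr_accuracy stub are illustrative placeholders rather than the exact configuration used in our experiments.

```python
# A minimal sketch of a Sobol sensitivity analysis over the four GR system
# parameters; the bounds and the evaluation stub are placeholders, not the
# configuration used in our experiments.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 4,
    "names": ["phi", "lambda", "delta", "theta"],
    "bounds": [[0.0, 100.0], [1.0, 2.0], [0.0, 2.0], [0.0, 1.0]],  # placeholder ranges
}

def evaluate_gr_accuracy(phi, lam, delta, theta):
    """Hypothetical stand-in: run the GR system with these parameter values on a
    benchmark domain and return the resulting goal recognition accuracy."""
    return float(np.random.rand())

# Saltelli sampling generates N * (2 * num_vars + 2) parameter combinations.
param_values = saltelli.sample(problem, 1024)
accuracies = np.array([evaluate_gr_accuracy(*row) for row in param_values])

# First-order (S1) and total-order (ST) Sobol indices per parameter.
result = sobol.analyze(problem, accuracies)
for name, s1, st in zip(problem["names"], result["S1"], result["ST"]):
    print(f"{name}: S1 = {s1:.3f}, ST = {st:.3f}")
```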
Appendix C. Performance of GR Approaches Trained Using Cost-Optimal and Divergent
Traces and Configured Using PRIM and Default Parameters
Domain | %O | PRIM (cost-optimal traces): p, r, a, t | Default (cost-optimal traces): p, r, a, t | PRIM (divergent traces): p, r, a, t | Default (divergent traces): p, r, a, t
10 0.13 0.67 0.53 0.04 0.05 1.00 0.05 0.05 0.12 0.50 0.69 0.07 0.05 1.00 0.05 0.07
30 0.25 0.79 0.70 0.04 0.03 0.99 0.05 0.06 0.31 0.60 0.87 0.07 0.05 0.95 0.10 0.07
blocks-world 50 0.33 0.77 0.74 0.04 0.05 0.99 0.26 0.04 0.39 0.58 0.91 0.08 0.07 0.91 0.30 0.08
70 0.49 0.91 0.78 0.05 0.13 1.00 0.53 0.05 0.58 0.70 0.95 0.10 0.20 0.90 0.65 0.10
100 0.76 1.00 0.93 0.05 0.43 1.00 0.82 0.06 0.76 0.89 0.98 0.12 0.43 0.97 0.82 0.12
10 0.50 1.00 0.50 0.01 0.50 1.00 0.50 0.01 — — — — — — — —
30 0.60 0.93 0.60 5.96E-3 0.50 1.00 0.50 6.81E-3 — — — — — — — —
campus 50 0.63 1.00 0.63 6.57E-3 0.50 1.00 0.50 6.85E-3 — — — — — — — —
70 0.80 1.00 0.80 5.51E-3 0.57 1.00 0.57 6.14E-3 — — — — — — — —
100 0.90 1.00 0.90 6.53E-3 0.70 1.00 0.70 6.95E-3 — — — — — — — —
10 0.16 0.85 0.31 0.01 0.11 1.00 0.11 0.01 0.11 0.57 0.45 0.19 0.12 1.00 0.12 0.15
30 0.21 0.88 0.41 0.01 0.13 0.96 0.23 0.01 0.28 0.42 0.75 0.35 0.22 1.00 0.35 0.26
depots 50 0.26 0.92 0.44 0.01 0.16 0.95 0.32 0.01 0.25 0.36 0.77 0.55 0.20 0.92 0.39 0.41
70 0.29 0.94 0.45 0.02 0.22 0.96 0.40 0.02 0.46 0.64 0.84 0.74 0.30 0.94 0.58 0.54
100 0.41 0.96 0.48 0.01 0.34 0.93 0.46 0.01 0.58 0.67 0.91 0.98 0.66 1.00 0.84 0.73
10 0.34 0.94 0.41 9.75E-3 0.15 1.00 0.15 0.01 0.29 0.68 0.58 0.68 0.15 0.99 0.17 0.69
30 0.26 0.88 0.41 8.43E-3 0.13 0.96 0.24 8.80E-3 0.35 0.51 0.74 2.09 0.22 0.80 0.45 2.13
driverlog 50 0.34 0.94 0.45 8.83E-3 0.24 0.95 0.37 7.67E-3 0.48 0.60 0.82 3.43 0.42 0.83 0.68 3.48
70 0.34 0.94 0.45 9.51E-3 0.27 0.96 0.41 9.65E-3 0.56 0.62 0.84 4.85 0.52 0.80 0.76 4.85
100 0.42 0.93 0.48 7.85E-3 0.41 0.96 0.47 7.32E-3 0.79 0.86 0.93 6.61 0.64 0.93 0.82 6.82
10 0.33 0.52 0.71 0.02 0.22 0.87 0.39 0.02 0.18 0.54 0.50 0.04 0.15 1.00 0.15 0.04
30 0.39 0.58 0.83 0.04 0.28 0.76 0.65 0.04 0.22 0.29 0.74 0.06 0.17 0.83 0.30 0.06
dwr 50 0.55 0.74 0.89 0.05 0.50 0.87 0.81 0.05 0.25 0.33 0.78 0.10 0.29 0.83 0.58 0.09
70 0.54 0.70 0.90 0.07 0.51 0.87 0.85 0.07 0.28 0.38 0.79 0.12 0.31 0.83 0.59 0.11
100 0.80 0.86 0.95 0.07 0.79 0.96 0.92 0.08 0.12 0.12 0.74 0.16 0.35 0.62 0.81 0.16
10 0.34 0.96 0.46 0.01 0.13 1.00 0.13 0.01 0.48 0.90 0.70 0.09 0.16 1.00 0.23 0.09
30 0.54 0.99 0.67 0.02 0.23 1.00 0.39 0.02 0.79 0.95 0.94 0.17 0.45 1.00 0.67 0.17
easy-ipc-grid 50 0.59 1.00 0.68 0.02 0.45 1.00 0.59 0.02 0.89 0.97 0.97 0.24 0.69 0.97 0.87 0.23
70 0.61 1.00 0.69 0.02 0.56 1.00 0.66 0.02 0.89 0.96 0.98 0.29 0.79 0.96 0.92 0.29
100 0.73 1.00 0.75 0.02 0.72 1.00 0.75 0.02 0.92 0.97 0.98 0.27 0.86 0.95 0.97 0.27
10 0.17 0.75 0.37 0.01 0.13 1.00 0.13 0.01 0.16 0.40 0.59 0.23 0.13 1.00 0.14 0.24
30 0.13 0.65 0.45 9.97E-3 0.11 0.92 0.29 0.01 0.18 0.29 0.70 0.52 0.18 0.90 0.40 0.52
ferry 50 0.18 0.64 0.51 0.01 0.20 0.90 0.42 0.01 0.29 0.42 0.78 0.86 0.26 0.76 0.58 0.86
70 0.29 0.77 0.54 0.01 0.33 0.95 0.54 0.01 0.41 0.45 0.83 1.15 0.54 0.85 0.81 1.16
100 0.48 0.93 0.59 0.01 0.51 1.00 0.59 0.01 0.84 0.89 0.95 1.45 0.76 0.96 0.89 1.43
10 0.32 0.52 0.78 0.01 0.07 0.70 0.04 0.01 0.19 0.55 0.58 0.02 0.07 0.70 0.04 0.02
30 0.43 0.58 0.89 0.01 0.14 0.77 0.47 0.02 0.26 0.50 0.77 0.01 0.10 0.64 0.25 0.01
intrusion-detection 50 0.46 0.55 0.91 0.02 0.28 0.65 0.79 0.02 0.42 0.49 0.85 0.01 0.29 0.64 0.71 0.01
70 0.55 0.61 0.93 0.02 0.47 0.66 0.89 0.02 0.47 0.52 0.87 0.01 0.41 0.66 0.80 0.01
100 0.43 0.49 0.91 0.02 0.42 0.49 0.89 0.02 0.44 0.44 0.87 0.02 0.43 0.51 0.81 0.01
10 0.66 1.00 0.71 0.01 0.33 1.00 0.33 0.01 — — — — — — — —
30 0.73 0.87 0.82 6.02E-3 0.48 1.00 0.49 5.84E-3 — — — — — — — —
kitchen 50 0.57 0.80 0.69 5.72E-3 0.51 1.00 0.51 5.97E-3 — — — — — — — —
70 0.90 1.00 0.93 5.09E-3 0.44 1.00 0.47 5.40E-3 — — — — — — — —
100 0.80 0.87 0.87 4.15E-3 0.49 1.00 0.56 4.55E-3 — — — — — — — —
10 0.41 0.86 0.67 0.01 0.10 1.00 0.10 0.01 0.24 0.58 0.61 0.16 0.10 0.98 0.14 0.17
30 0.40 0.78 0.71 0.02 0.20 0.97 0.53 0.01 0.38 0.48 0.83 0.31 0.24 0.78 0.61 0.30
logistics 50 0.48 0.81 0.72 0.02 0.33 0.97 0.65 0.02 0.43 0.52 0.87 0.45 0.38 0.87 0.78 0.45
70 0.56 0.88 0.74 0.02 0.45 0.98 0.71 0.02 0.57 0.63 0.91 0.60 0.48 0.88 0.84 0.60
100 0.62 0.84 0.78 0.02 0.55 0.98 0.76 0.02 0.82 0.84 0.96 0.66 0.70 0.95 0.92 0.67
10 0.22 0.64 0.49 0.01 0.16 0.94 0.19 0.01 0.31 0.60 0.65 0.30 0.17 0.86 0.28 0.31
30 0.19 0.56 0.59 0.02 0.14 0.87 0.34 0.02 0.26 0.40 0.68 0.68 0.22 0.77 0.46 0.68
miconic 50 0.20 0.56 0.60 0.02 0.16 0.80 0.46 0.02 0.34 0.42 0.76 1.05 0.32 0.75 0.58 1.05
70 0.20 0.50 0.61 0.02 0.19 0.69 0.51 0.02 0.48 0.50 0.82 1.41 0.46 0.75 0.73 1.41
100 0.35 0.61 0.63 0.02 0.31 0.68 0.57 0.02 0.68 0.71 0.88 1.92 0.70 0.75 0.86 1.94
10 0.24 0.93 0.36 8.74E-3 0.17 1.00 0.17 9.04E-3 0.17 0.46 0.56 0.27 0.17 0.92 0.23 0.28
30 0.22 0.86 0.40 7.43E-3 0.11 1.00 0.11 7.36E-3 0.32 0.48 0.71 0.72 0.22 0.85 0.38 0.72
rovers 50 0.15 0.74 0.39 6.10E-3 0.13 0.98 0.23 6.26E-3 0.43 0.51 0.79 1.11 0.29 0.81 0.52 1.12
70 0.15 0.70 0.38 7.83E-3 0.14 0.89 0.30 6.87E-3 0.42 0.50 0.79 1.57 0.36 0.81 0.66 1.55
100 0.19 0.71 0.39 6.46E-3 0.16 0.75 0.35 6.19E-3 0.43 0.46 0.79 2.22 0.45 0.93 0.67 2.20
10 0.17 0.89 0.24 0.01 0.16 1.00 0.16 0.01 0.20 0.64 0.44 0.13 0.16 1.00 0.16 0.13
30 0.15 0.88 0.32 0.01 0.11 1.00 0.11 9.41E-3 0.26 0.55 0.58 0.21 0.20 0.83 0.33 0.21
satellite 50 0.19 0.87 0.38 9.44E-3 0.15 0.95 0.25 0.01 0.41 0.60 0.75 0.32 0.32 0.82 0.54 0.32
70 0.27 0.92 0.42 9.64E-3 0.18 0.96 0.31 0.01 0.55 0.65 0.79 0.43 0.41 0.82 0.64 0.43
100 0.39 0.93 0.48 7.29E-3 0.26 0.96 0.40 8.11E-3 0.71 0.75 0.88 0.56 0.57 0.86 0.76 0.56
10 0.33 0.67 0.62 0.04 0.17 0.96 0.20 0.04 0.20 0.70 0.43 0.12 0.16 0.98 0.18 0.12
30 0.36 0.57 0.79 0.07 0.33 0.75 0.67 0.07 0.38 0.53 0.72 0.32 0.35 0.70 0.60 0.33
sokoban 50 0.41 0.61 0.82 0.11 0.37 0.69 0.76 0.11 0.46 0.50 0.83 0.64 0.49 0.68 0.80 0.64
70 0.51 0.76 0.86 0.14 0.53 0.83 0.84 0.14 0.67 0.68 0.89 0.93 0.65 0.70 0.86 0.93
100 0.70 0.82 0.89 0.15 0.68 0.82 0.87 0.16 0.70 0.70 0.89 1.22 0.70 0.70 0.88 1.22
10 0.23 0.93 0.31 9.45E-3 0.15 1.00 0.15 0.01 0.26 0.58 0.61 0.97 0.15 0.99 0.16 0.69
30 0.22 0.95 0.32 8.44E-3 0.10 1.00 0.12 7.95E-3 0.31 0.43 0.77 2.02 0.25 0.83 0.45 1.32
zeno-travel 50 0.20 0.90 0.32 9.37E-3 0.13 0.99 0.22 7.57E-3 0.37 0.44 0.78 3.02 0.33 0.79 0.61 2.04
70 0.22 0.92 0.32 9.77E-3 0.16 0.96 0.28 9.36E-3 0.41 0.45 0.82 4.06 0.42 0.79 0.71 2.71
100 0.32 0.96 0.38 7.36E-3 0.28 0.96 0.37 7.24E-3 0.54 0.54 0.85 5.49 0.58 0.82 0.83 3.61
average — 0.40 0.82 0.61 0.02 0.29 0.92 0.44 0.02 0.43 0.58 0.78 0.93 0.35 0.85 0.55 0.85
Table C.16: Performance of the GR systems with the PRIM parameters and the cost-optimal traces, the Default parameters and the cost-optimal traces, the PRIM parameters and the divergent traces, and the Default parameters and the divergent traces. The PRIM parameters are the middle points of the parameter ranges identified by the PRIM algorithm; the Default parameters are ϕ = 50, λ = 1.1, δ = 1.0, θ = 80%. %O: the level of observation, p: precision, r: recall, a: accuracy, t: time (in seconds).
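To make the reported measures concrete, the sketch below shows one common set-based way of scoring a single GR problem instance; these definitions are an illustrative assumption for exposition and are not necessarily identical to the exact measures used to produce the tables. Per-domain values at a given observation level would then be averages of such per-instance scores.

```python
# Illustrative sketch (an assumption, not necessarily the exact definitions used
# in the tables): per-instance precision, recall, and accuracy when a GR system
# returns a set of most probable goals out of a finite set of candidate goals.
from typing import Hashable, Set, Tuple

def gr_scores(candidates: Set[Hashable],
              predicted: Set[Hashable],
              true_goals: Set[Hashable]) -> Tuple[float, float, float]:
    tp = len(predicted & true_goals)               # true goals that were predicted
    fp = len(predicted - true_goals)               # predicted goals that are not true
    fn = len(true_goals - predicted)               # true goals that were missed
    tn = len(candidates - predicted - true_goals)  # correctly rejected candidates
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if true_goals else 0.0
    accuracy = (tp + tn) / len(candidates) if candidates else 0.0
    return precision, recall, accuracy

# Example: four candidate goals, two predicted, one true goal.
p, r, a = gr_scores({"g1", "g2", "g3", "g4"}, {"g1", "g2"}, {"g1"})
print(p, r, a)  # 0.5 1.0 0.75
```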
Appendix D. Performance Comparison with the Domain Knowledge-Based GR Approaches
Domain | %O | PM-based (ours): p, r, a, t | Landmark-based: p, r, a, t | R&G (DUAL-BFWS): p, r, a, t | R&G (Greedy LAMA): p, r, a, t | LP-based: p, r, a, t
10 0.12 0.50 0.69 0.05 0.19 0.63 0.79 0.40 0.24 0.84 0.71 97.17 0.29 0.94 0.70 773.25 0.27 0.96 0.67 2.51
30 0.29 0.60 0.85 0.06 0.27 0.74 0.83 0.40 0.43 0.64 0.91 23.39 0.38 0.68 0.90 772.76 0.51 0.88 0.90 2.44
blocks-world 50 0.39 0.59 0.91 0.07 0.29 0.81 0.84 0.41 0.51 0.65 0.91 19.03 0.48 0.63 0.94 806.92 0.69 0.91 0.95 2.44
70 0.58 0.70 0.95 0.09 0.42 0.95 0.89 0.41 0.64 0.76 0.93 27.25 0.64 0.73 0.96 819.90 0.86 0.99 0.98 2.48
100 0.76 0.89 0.97 0.11 0.52 1.00 0.93 0.41 0.66 0.76 0.94 52.18 0.63 0.72 0.96 848.38 0.93 1.00 0.99 2.49
10 0.50 1.00 0.50 0.01 0.50 1.00 0.50 0.31 0.57 0.73 0.57 0.36 0.83 1.00 0.83 0.70 0.87 1.00 0.87 0.23
30 0.60 0.93 0.60 6.02E-3 0.60 1.00 0.60 0.31 0.67 1.00 0.67 0.19 0.87 0.93 0.87 0.83 0.97 1.00 0.97 0.24
campus 50 0.63 1.00 0.63 6.19E-3 0.57 1.00 0.57 0.31 0.63 0.93 0.63 0.20 0.93 1.00 0.93 0.82 0.97 1.00 0.97 0.23
70 0.80 1.00 0.80 5.82E-3 0.67 1.00 0.67 0.31 0.60 0.93 0.60 0.19 0.83 0.87 0.83 0.90 0.97 1.00 0.97 0.24
100 0.90 1.00 0.90 6.84E-3 0.67 1.00 0.67 0.30 0.60 0.87 0.60 0.21 0.60 0.80 0.60 0.97 0.97 1.00 0.97 0.22
10 0.11 0.57 0.45 0.17 0.22 0.62 0.60 0.92 0.37 0.51 0.78 18.61 0.50 0.68 0.82 326.27 0.40 0.81 0.75 1.70
30 0.28 0.39 0.75 0.33 0.40 0.92 0.70 0.93 0.57 0.67 0.84 58.49 0.69 0.72 0.92 322.05 0.67 0.81 0.90 1.69
depots 50 0.27 0.47 0.75 0.52 0.48 0.94 0.76 0.94 0.57 0.83 0.73 130.20 0.83 0.92 0.92 349.64 0.77 0.94 0.91 1.69
70 0.43 0.61 0.81 0.71 0.58 0.92 0.84 0.95 0.60 0.89 0.68 159.63 0.78 0.78 0.94 346.60 0.97 0.97 0.99 1.69
100 0.54 0.67 0.90 0.93 0.79 1.00 0.95 0.95 0.65 0.92 0.75 214.83 0.83 0.83 0.95 367.32 1.00 1.00 1.00 1.67
10 0.29 0.68 0.58 0.65 0.24 0.80 0.48 0.83 0.39 0.45 0.78 17.10 0.42 0.48 0.82 16.39 0.38 0.77 0.72 1.07
30 0.35 0.51 0.73 1.98 0.35 0.83 0.59 0.84 0.27 0.75 0.45 80.30 0.41 0.45 0.82 18.03 0.60 0.83 0.86 1.08
driverlog 50 0.49 0.61 0.81 3.28 0.51 0.93 0.73 0.84 0.28 0.86 0.38 117.16 0.62 0.71 0.84 23.06 0.82 0.93 0.95 1.08
70 0.57 0.63 0.84 4.67 0.63 1.00 0.83 0.85 0.30 0.92 0.36 165.37 0.71 0.81 0.87 30.97 0.88 0.95 0.97 1.07
100 0.79 0.86 0.93 6.49 0.79 1.00 0.94 0.85 0.55 0.93 0.61 252.39 0.73 0.82 0.91 31.65 0.98 1.00 1.00 1.12
10 0.18 0.54 0.50 0.04 0.23 0.75 0.48 0.59 0.20 0.50 0.58 1.66 0.30 0.71 0.59 271.30 0.28 0.88 0.52 1.01
30 0.22 0.29 0.74 0.06 0.29 0.92 0.55 0.58 0.27 0.67 0.61 7.65 0.53 0.79 0.77 264.35 0.65 1.00 0.82 1.02
dwr 50 0.23 0.29 0.77 0.09 0.40 0.96 0.68 0.59 0.25 0.62 0.55 19.75 0.68 0.83 0.89 275.76 0.81 1.00 0.94 1.04
70 0.30 0.38 0.79 0.11 0.52 1.00 0.80 0.61 0.47 0.67 0.72 47.24 0.78 0.88 0.90 288.99 0.91 0.96 0.97 1.00
100 0.19 0.25 0.76 0.15 0.58 1.00 0.85 0.60 0.29 0.50 0.65 86.00 0.88 0.88 0.96 312.59 1.00 1.00 1.00 1.04
10 0.48 0.90 0.70 0.08 0.36 0.93 0.46 0.68 0.67 0.90 0.87 7.28 0.64 0.84 0.87 45.03 0.64 0.93 0.87 1.28
30 0.79 0.95 0.94 0.15 0.58 0.90 0.73 0.69 0.82 0.98 0.91 3.30 0.82 0.96 0.93 96.23 0.82 0.95 0.95 1.29
easy-ipc-grid 50 0.89 0.97 0.97 0.22 0.84 0.95 0.92 0.69 0.91 0.99 0.95 8.05 0.89 0.95 0.96 198.89 0.93 0.99 0.99 1.30
70 0.89 0.96 0.98 0.27 0.94 0.98 0.98 0.71 0.94 1.00 0.96 6.12 0.83 0.87 0.95 379.05 0.94 0.99 0.99 1.32
100 0.92 0.97 0.98 0.25 1.00 1.00 1.00 0.68 0.99 1.00 0.99 5.71 0.69 0.74 0.88 614.65 0.98 1.00 1.00 1.20
10 0.17 0.44 0.59 0.21 0.21 0.93 0.37 0.39 0.52 0.68 0.84 1.58 0.54 0.69 0.84 61.95 0.50 1.00 0.71 0.90
30 0.19 0.30 0.71 0.49 0.46 0.93 0.67 0.39 0.48 0.76 0.68 10.62 0.79 0.90 0.92 69.77 0.85 1.00 0.93 0.92
ferry 50 0.29 0.39 0.78 0.80 0.62 0.93 0.81 0.40 0.40 0.89 0.48 25.59 0.87 0.95 0.94 86.17 0.92 1.00 0.97 0.91
70 0.43 0.49 0.84 1.09 0.78 0.93 0.87 0.40 0.29 0.89 0.38 44.89 0.92 0.99 0.95 99.55 0.99 1.00 1.00 0.92
100 0.86 0.89 0.96 1.34 0.89 0.93 0.92 0.40 0.59 0.93 0.66 68.78 0.90 1.00 0.93 127.22 1.00 1.00 1.00 0.91
10 0.19 0.55 0.58 0.02 0.10 1.00 0.21 0.40 0.61 0.98 0.91 10.66 0.59 1.00 0.91 4.91 0.59 1.00 0.91 1.76
30 0.27 0.49 0.78 0.01 0.32 1.00 0.70 0.40 0.84 0.92 0.98 1.80 0.93 1.00 0.99 4.95 0.94 1.00 0.99 1.77
intrusion-detection 50 0.42 0.49 0.85 0.01 0.54 1.00 0.86 0.40 0.85 0.92 0.97 2.40 0.99 1.00 1.00 5.05 0.99 1.00 1.00 1.77
70 0.47 0.52 0.87 0.01 0.72 1.00 0.91 0.41 0.78 0.85 0.97 3.69 1.00 1.00 1.00 5.32 1.00 1.00 1.00 1.79
100 0.44 0.44 0.87 0.01 0.91 1.00 0.94 0.41 0.72 0.80 0.96 6.81 1.00 1.00 1.00 5.65 1.00 1.00 1.00 1.81
10 0.66 1.00 0.71 0.01 0.33 1.00 0.33 0.27 0.59 0.80 0.67 5.27 0.59 0.80 0.67 1.32 0.66 1.00 0.71 0.32
30 0.73 0.87 0.82 6.24E-3 0.47 1.00 0.47 0.27 0.80 0.93 0.87 0.92 0.80 0.93 0.87 1.07 0.83 1.00 0.89 0.33
kitchen 50 0.57 0.80 0.69 6.25E-3 0.47 1.00 0.47 0.28 0.83 0.92 0.89 0.29 0.74 0.83 0.81 1.00 0.79 0.93 0.84 0.32
70 0.90 1.00 0.93 4.74E-3 0.56 1.00 0.56 0.28 0.79 0.93 0.82 0.38 0.79 0.87 0.84 1.09 0.82 0.87 0.87 0.32
100 0.80 0.87 0.87 4.72E-3 0.69 1.00 0.69 0.28 0.77 0.93 0.84 0.71 0.69 0.87 0.78 1.18 0.60 0.60 0.73 0.32
10 0.25 0.58 0.62 0.15 0.24 0.94 0.44 1.18 0.47 0.61 0.85 26.33 0.55 0.79 0.88 12.41 0.61 1.00 0.85 1.51
30 0.38 0.47 0.83 0.29 0.54 0.97 0.79 1.19 0.56 0.78 0.76 35.07 0.70 0.89 0.85 14.28 0.86 0.98 0.97 1.50
logistics 50 0.42 0.50 0.87 0.43 0.70 1.00 0.90 1.19 0.56 0.83 0.72 64.76 0.75 0.96 0.81 16.81 0.93 0.99 0.98 1.50
70 0.56 0.62 0.90 0.57 0.86 1.00 0.96 1.21 0.54 0.82 0.70 121.22 0.75 0.97 0.78 22.28 0.96 1.00 0.99 1.48
100 0.81 0.84 0.96 0.64 0.96 1.00 0.99 1.09 0.52 0.69 0.81 139.02 0.80 0.98 0.82 30.15 1.00 1.00 1.00 1.44
10 0.31 0.60 0.65 0.29 0.23 1.00 0.28 0.99 0.28 0.48 0.67 3.68 0.43 0.61 0.77 13.35 0.63 1.00 0.81 1.03
30 0.26 0.40 0.68 0.64 0.43 1.00 0.60 1.00 0.26 0.65 0.50 4.83 0.45 0.88 0.58 40.92 0.92 1.00 0.97 1.02
miconic 50 0.32 0.38 0.75 1.00 0.54 1.00 0.74 1.00 0.26 0.75 0.44 13.21 0.48 0.87 0.59 53.85 0.96 1.00 0.98 1.02
70 0.45 0.49 0.81 1.37 0.73 1.00 0.86 1.00 0.19 0.82 0.33 22.20 0.52 0.90 0.60 69.41 0.99 1.00 1.00 1.00
100 0.65 0.68 0.86 1.84 0.79 1.00 0.90 1.01 0.18 0.93 0.23 39.31 0.63 0.93 0.68 73.11 1.00 1.00 1.00 1.04
10 0.19 0.46 0.58 0.26 0.27 0.96 0.40 1.01 0.55 0.73 0.82 6.43 0.51 0.80 0.78 4.28 0.53 0.99 0.71 0.99
30 0.34 0.46 0.73 0.69 0.39 0.96 0.57 1.02 0.65 0.83 0.80 13.31 0.70 0.90 0.84 12.11 0.78 0.86 0.92 0.99
rovers 50 0.43 0.48 0.80 1.09 0.52 0.98 0.72 1.05 0.72 0.93 0.81 63.57 0.76 0.95 0.83 16.98 0.92 0.99 0.97 0.98
70 0.36 0.42 0.77 1.53 0.71 1.00 0.86 1.05 0.70 0.96 0.78 96.03 0.74 0.96 0.78 39.90 0.98 0.99 0.99 0.98
100 0.39 0.43 0.77 2.14 0.83 1.00 0.91 1.05 0.79 0.96 0.88 85.87 0.82 0.96 0.84 54.18 1.00 1.00 1.00 1.02
10 0.20 0.64 0.44 0.12 0.25 0.89 0.40 1.24 0.30 0.55 0.65 6.37 0.35 0.65 0.66 5.83 0.46 0.92 0.69 1.03
30 0.26 0.54 0.59 0.20 0.39 0.90 0.59 1.24 0.44 0.69 0.72 12.22 0.50 0.70 0.78 9.76 0.68 0.93 0.85 1.04
satellite 50 0.41 0.60 0.75 0.30 0.59 0.92 0.77 1.25 0.34 0.69 0.55 27.00 0.64 0.80 0.84 10.57 0.82 0.96 0.94 1.05
70 0.55 0.65 0.79 0.40 0.67 0.93 0.83 1.26 0.30 0.83 0.47 45.91 0.62 0.87 0.80 15.48 0.93 0.98 0.97 1.02
100 0.71 0.75 0.88 0.51 0.74 0.93 0.88 1.25 0.37 0.93 0.47 84.69 0.74 0.93 0.84 13.11 0.96 1.00 0.99 1.02
10 0.20 0.70 0.43 0.11 0.28 0.90 0.42 1.30 0.59 0.72 0.81 51.53 0.36 0.42 0.78 215.56 0.58 0.73 0.85 1.93
30 0.38 0.53 0.72 0.30 0.42 0.83 0.63 1.32 0.73 0.87 0.83 179.13 0.52 0.77 0.68 321.13 0.62 0.63 0.87 1.93
sokoban 50 0.46 0.50 0.83 0.59 0.54 0.92 0.73 1.33 0.84 0.93 0.88 309.37 0.71 0.82 0.85 438.12 0.47 0.48 0.81 1.87
70 0.67 0.68 0.89 0.88 0.75 0.95 0.89 1.33 0.81 0.88 0.90 415.83 0.81 0.90 0.90 510.07 0.45 0.47 0.79 1.89
100 0.70 0.70 0.89 1.16 0.89 1.00 0.97 1.35 0.91 0.95 0.94 632.05 0.90 0.90 0.97 637.70 0.35 0.35 0.75 1.88
10 0.26 0.55 0.61 0.78 0.29 0.70 0.62 1.28 0.30 0.45 0.69 6.97 0.45 0.50 0.80 10.39 0.49 0.87 0.71 1.39
30 0.31 0.42 0.77 1.61 0.39 0.90 0.67 1.30 0.30 0.69 0.53 46.92 0.48 0.74 0.69 12.28 0.71 0.90 0.88 1.37
zeno-travel 50 0.37 0.45 0.78 2.42 0.62 0.95 0.82 1.30 0.26 0.82 0.39 111.15 0.63 0.92 0.69 13.78 0.89 0.95 0.97 1.38
70 0.41 0.45 0.81 3.29 0.79 1.00 0.90 1.31 0.37 0.92 0.43 186.22 0.63 0.94 0.68 16.34 1.00 1.00 1.00 1.35
100 0.52 0.54 0.85 4.44 0.95 1.00 0.98 1.31 0.60 1.00 0.60 268.10 0.73 0.96 0.75 21.54 1.00 1.00 1.00 1.36
average — 0.46 0.62 0.77 0.74 0.54 0.94 0.72 0.80 0.54 0.81 0.71 65.73 0.68 0.84 0.83 157.42 0.79 0.93 0.91 1.24
Table D.17: Performance of different GR approaches; %O: the level of observation, p: precision, r: recall, a: accuracy, t: time (in
seconds). The PM-based approach (ours) is configured with the PRIM parameters and trained with the divergent traces or with the cost-
optimal traces if the divergent traces are not available. The landmark-based approach uses the uniqueness heuristic with θ = 20%. The two
R&G approaches use the DUAL-BFWS planner and the Greedy LAMA planner, respectively. The LP-based approach uses a combination
of three heuristics: landmarks, state equation, and post-hoc.
Appendix E. Performance Comparison with the LSTM-Based GR Approach
Table E.18: Performance of the PM-based GR approach (ours) and the LSTM-based approach; (10): trained with 10 traces per goal,
(100): trained with 100 traces per goal, %O: the level of observation, p: precision, r: recall, a: accuracy. Both approaches are trained with
the divergent traces or with the cost-optimal traces if the divergent traces are not available. Our approach is configured with the PRIM
parameters.
[Figure E.17 consists of per-domain panels covering the blocks-world, campus, depots, driverlog, dwr, easy-ipc-grid, ferry, intrusion-detection, and kitchen domains at the 10%, 30%, 50%, 70%, and 100% observation levels, with values between 0.00 and 1.00 plotted for the PM (Ours) and LSTM approaches.]
Figure E.17: Precision and recall of the PM-based (ours) and the LSTM-based GR approaches. The blue lines indicate cases when the
Declaration of interests
We declare that we have no known competing financial interests or personal relationships that
could have appeared to influence the work reported in this paper.