Autonomous Behavior Selection For Self-Driving Car
Introduction
The appropriate selection of driving behaviors to respond to the different
scenarios on the road is essential to achieving full automation in self-driving
cars. By driving behaviors, we understand reactive or short-term actions such
as “obstacle avoidance,” “acceleration,” or “stopping short” that a self-driving
car must choose and execute appropriately to reach its destination safely (Da
Lio et al. 2023).
A variety of strategies for the selection of driving behaviors are available (Liu et al. 2021). However, artificial neural networks (ANNs), first-order logic rule systems, and probabilistic models are the most common approaches for this purpose (Aksjonov and Kyrki 2022; Paden et al. 2016; Sun et al. 2020).
ANNs are outstanding at learning and recognizing noisy, cluttered, or partial data (Cai et al. 2016). They can also take over the control tasks of other specialized components (Zhu et al. 2022).
Despite their advantages, inference in ANNs requires the parallel propagation of input signals through a massive quantity of weighted links among their processing units. This process obscures the influence of the components of the ANNs on their output, and external ad-hoc mechanisms are required to achieve some explainability and interpretability (Padalkar, Wang, and Gupta 2023; Zhang et al. 2021). These features are crucial in self-driving cars, for example, when inspecting the source of a car malfunction or determining legal responsibilities in a collision scenario. In contrast to ANNs, rule-based systems inherently provide rich characterizations of a problem domain by expressing properties and relationships between objects through interpretable logic rules and facts. They also offer traceable inference mechanisms that contribute to explaining the conclusions (Mishra 2022). Moreover, probability theory is the most prevalent approach for handling uncertainty. It extends the binary truth values {1, 0} of first-order logic to a degree of belief in the interval [0, 1] ⊂ ℝ. Consequently, it is not surprising that there is a growing interest in probabilistic logic frameworks that blend logical clarity with flexible combinations of deductive and probabilistic inferences (Riguzzi 2022).
Therefore, in this paper, we propose probabilistic logic factored Markov decision processes (PL-fMDPs) to generate action policies that serve as a behavior selection scheme for self-driving cars. PL-fMDPs are factored Markov decision processes (fMDPs) (Boutilier, Dearden, and Goldszmidt 2000) coded in a probabilistic logic programming language. We use MDP-ProbLog (Bueno et al. 2016), a specialized programming language developed on top of ProbLog (Fierens et al. 2015) to represent and solve discrete, infinite-horizon fMDPs. MDP-ProbLog considers Boolean state fluents to describe the state space of the system, the probable effects of the actions over the state fluents, and utility values for state fluents, actions, and auxiliary atoms. The solution of a PL-fMDP is a deterministic, reward-optimal action policy (from now on called simply a "policy"). Following this approach, simple probabilistic rules and facts allow us to express complex spatial relationships between the self-driving car and other vehicles on the road in a comprehensible and readable manner. For evaluation, we developed a self-driving car using a realistic simulator. In addition to behavior selection, the architecture of the self-driving car includes perception and control components. We utilize simple control laws and perception techniques, as we concentrate our efforts on behavior selection and our approach does not require highly accurate model-based controllers or precise estimation of obstacle position and speed. The behavior selection module is composed of four policies: Left, Right, Selector, and Stop.
Related Work
In (Smith, Petrick, and Belle 2021), a framework for integrated logical and probabilistic intent reasoning using basic switch sensor data from smart homes and assisted living facilities is introduced. Utilizing ProbLog, the framework infers the most likely intention based on agent actions and crucial sensor readings from the environment. In (Al-Nuaimi et al. 2021), a novel approach combining the model checker for multi-agent systems (MCMAS) and the probabilistic model checker (PRISM) for autonomous vehicles (AV) is introduced. The system presented includes a perception system that feeds data into a rational agent (RA) based on a belief-desire-intention (BDI) architecture, and uses a model of the environment connected to the RA to verify its decision-making process. MCMAS is used to verify the agent's logic during design, and PRISM provides probabilities during run-time. Testing on a new AV software platform and a parking lot scenario confirms the feasibility, and a practical implementation on an experimental test is carried out. In (Mu et al. 2022), the authors focus on optimizing the selection of behaviors for a robot through
a repetitive prediction-selection process. In this approach, the framework
predicts how humans will respond to a series of consecutive robot-guiding
behaviors over the long term. The goal is to determine the best sequence of
actions using a technique called Monte Carlo Tree Search (MCTS). By doing
so, the robot aims to enhance human comprehension of its active instructions
and facilitate their decision-making regarding when and where to transition
between different behaviors. This approach seeks to improve the effectiveness
of human-robot interaction during guidance tasks. In (Spanogiannopoulos,
Zweiri, and Seneviratne 2022), the authors propose a novel approach for
generating collision-free paths in non-holonomic self-driving cars using an
incremental sampling-based planner like Rapidly-exploring Random Trees
(RRT). The approach generates incremental collision-free path segments
from the start to end configuration, relying only on local sensor data (point
cloud data) in quasi-static unknown environments. The planner generates
real-time paths that ensure safety and has been extensively tested and validated
in popular benchmark simulation environments. In (Dortmans and Punter 2022), the authors provide an overview of lessons learned from applying Behavior Trees to autonomous robots. For the design of the "action selection policy" (a mapping from states to actions), they propose a method that simplifies the representation of the world's state space using Boolean conditions as predicates. Additionally, the robot's complex actuation algorithms are summarized in a set of actions. These actions are responsible for transitioning the world from one pre-condition state to a post-condition state with a high probability of success. This approach expedites the representation of the world and robot behaviors, making it more suitable for decision-making and control processes.
In (Yuan, Shan, and Mi 2022), the authors use deep reinforcement learning
and game theory for the decision-making of an ego-vehicle at an unsignalized
intersection, where there are other vehicles. The behaviors of these vehicles are
conservative, aggressive, and adaptive. There is no communication between
the vehicles. The tests are carried out in the Gazebo-ROS simulator. In (Duan
et al. 2020), the authors use interconnected neural networks to build a master driving policy and three maneuvering policies for driving an autonomous vehicle. The maneuvers are driving in the lane and changing between the left and right lanes. Each maneuver has sub-policies that allow low-level movements of the vehicle to be carried out in the lateral and longitudinal directions. With this hierarchy, they manage to reduce the amount of data necessary for supervised learning, which requires covering all possible driving scenarios. Autonomous driving has also been addressed using an optimal control approach. This approach consists of minimizing a cost function subject to several constraints: the system dynamics and obstacle shapes, positions, or velocities. The solution of such a minimization results in the control input signals to
drive the vehicle toward a goal point or trajectory while avoiding obstacles.
Examples of this approach are (Lindqvist et al. 2020; Pepe et al. 2019; Zhang,
Liniger, and Borrelli 2021) and (Wang et al. 2023). The drawback of the
optimal control approach for movement planning in self-driving cars is the
assumption that obstacles have positions and velocities that can be described analytically; sometimes, even constant positions are assumed. We take a probabilistic approach to develop our behavior selection system because of its inherent handling of uncertainty. Applications of optimal control to problems other than path planning for self-driving cars can be found in Precup et al. (2008), Unguritu and Nichitelea (2023), and Rigatos et al. (2016).
ProbLog Overview
ProbLog is a probabilistic logic programming language to describe discrete, multivariate probability distributions and answer probabilistic queries. It extends the syntax of Prolog with probability values.
A ProbLog program is a tuple (L, F), where L is a logic program (a finite set of logic rules and facts) and F is a finite set of probabilistic facts with the syntax α::f, such that f is a (positive) ground atomic formula,¹ and α ∈ [0, 1]. Probabilistic facts can be understood as mutually independent Bernoulli random variables, each one with a success probability. ProbLog also accepts normal rules with probability values, e.g., "α::a :- b1, ..., bn, \+bn+1, ..., \+bn+ℓ.", where n, ℓ ≥ 0, a, b1, ..., bn are atoms, \+bn+1, ..., \+bn+ℓ are negated atoms, and α ∈ [0, 1] is the probability value attached to the atom a. The sub-expression to the left of the symbol ":-" (which means "if") is the head of the clause, and the sub-expression to the right is its body. In the body, commas function as logical conjunctions. Semicolons can be used as logical (inclusive) disjunctions. The symbol \+ means negation as failure. Therefore, the normal rule can be read as follows: "atom a will succeed with probability α if the atoms from b1 to bn are true and the atoms from bn+1 to bn+ℓ are not true." In probability theory notation, this rule is expressed as p(a | b1, ..., bn, ¬bn+1, ..., ¬bn+ℓ) = α (ProbLog does not allow the definition of the probability of failure of an atom; instead, this value is calculated internally). Notice that probabilistic rules are actually traditional logic rules extended with an auxiliary probabilistic fact. For example, the rule "0.4::a1 :- b1, \+b2." is treated by ProbLog as the combination of the probabilistic fact "0.4::aux." with the pure logic rule "a1 :- b1, \+b2, aux.".
ProbLog allows annotated disjunctions (Vennekens, Verbaeten, and Bruynooghe 2004). A fact with annotated disjunctions "α1::a1; α2::a2; …; αn::an." emulates a finite, discrete random variable that takes n ≥ 2 mutually exclusive, categorical values ai with a probability of occurrence αi, for all i = 1, …, n, and ∑_{i=1}^{n} αi = 1. A rule with annotated disjunctions has the form "α1::a1; α2::a2; …; αm::am :- body.", where m ≥ 2 and ∑_{i=1}^{m} αi = 1. The body is a cause that produces mutually exclusive effects ai, each one with a probability of success αi, for all i = 1, …, m.
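As an illustration only, the small Python sketch below runs a toy ProbLog program containing a probabilistic fact, an annotated disjunction, and two probabilistic rules through the problog package; the atoms rain, low, mid, high, and wet are invented for the example and have no relation to our system.

# Illustrative only: evaluating a toy ProbLog program from Python.
from problog.program import PrologString
from problog import get_evaluatable

model = PrologString("""
0.3::rain.                          % probabilistic fact
0.2::low; 0.5::mid; 0.3::high.      % annotated disjunction (probabilities sum to 1)
0.8::wet :- rain.                   % probabilistic rule: p(wet | rain) = 0.8
0.4::wet :- mid.                    % a second possible cause of wet
query(wet).
""")

# Inference by weighted model counting over the possible worlds yields p(wet).
result = get_evaluatable().create_from(model).evaluate()
for atom, probability in result.items():
    print(atom, probability)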
In fMDPs, it is frequently taken for granted that all post-action state variables are mutually independent given the current state x and an action a (Equation (3)). Moreover, further conditional independence assumptions are applied to eliminate some of the pre-action state variables that do not affect the probability value of X′i (Equation (4)):
p_{X′1,…,X′n | X1,…,Xn}(x′ | x, a) = ∏_{i=1}^{n} p_{X′i}(x′i | x, a)    (3)

                                   = ∏_{i=1}^{n} p_{X′i}(x′i | obs(Pa(X′i)), a)    (4)

where obs(Pa(X′i)) stands for the observed values of the parent pre-action state variables Pa(X′i) ⊆ X of X′i that, along with the action a, influence the probability of X′i. From now on, random variables as subscripts in the probability theory notation p_{X′i}(x′i | obs(Pa(X′i)), a) will be omitted whenever it is clear that x′i is the value of the post-action state variable X′i.
Solution of fMDPs
Starting from an arbitrary initial value function v0, the value of every state is repeatedly updated as

v_{k+1}(x) = max_{a∈A} { R(x, a) + γ ∑_{x′∈X} p(x′ | x, a) v_k(x′) }    (5)

for all x ∈ X. Equation (5) is repeated until the largest absolute difference |v_{k+1}(x) − v_k(x)| for any state x ∈ X in two consecutive iterations is smaller than some error threshold, namely ε(1 − γ)/(2γ), where ε > 0 establishes a maximum error bound; as a consequence, |v_{k+1}(x) − v*(x)| < ε/2 (Puterman 1994). Upon convergence, the ε-optimal policy πε is recovered by the equation πε(x) = argmax_{a∈A} { R(x, a) + γ ∑_{x′∈X} v_{k+1}(x′) p(x′ | x, a) }, for each x ∈ X. This method is known as the value iteration algorithm.
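As a concrete reference only, the following Python sketch implements this stopping rule and policy extraction on a generic tabular model; the dictionaries P and R and all constants are placeholders, not the MDP-ProbLog machinery used in this work.

# Illustrative tabular value iteration (not the MDP-ProbLog solver).
# P[a][x][x2] = p(x2 | x, a); R[x][a] = reward for taking action a in state x.
def value_iteration(states, actions, P, R, gamma=0.9, eps=0.01):
    v = {x: 0.0 for x in states}
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)      # stopping rule from the text

    def backup(values, x, a):
        return R[x][a] + gamma * sum(P[a][x][x2] * values[x2] for x2 in states)

    while True:
        v_new = {x: max(backup(v, x, a) for a in actions) for x in states}
        if max(abs(v_new[x] - v[x]) for x in states) < threshold:
            break
        v = v_new
    # The greedy policy with respect to v_{k+1} is eps-optimal.
    policy = {x: max(actions, key=lambda a: backup(v_new, x, a)) for x in states}
    return v_new, policy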
obs(Pa(x′)) ⊆ XF are the truth values of the subset of pre-action state fluents Pa(x′) that, together with a ∈ A, influence the truth value of x′. The reward model Rr is the set of probabilistic rules that restrict the utility assignments.
Policy Calculation
MDP-ProbLog uses value iteration to compute the ε-optimal policy πε (Sec. 2.2). The main difference with respect to traditional fMDPs is that MDP-ProbLog specifies the reward function R(x, a) through a weighted linear combination of utilities over state fluents, actions, and complementary atoms:

R(x, a) = ∑_{(x,u)∈U, x∈XF} u · p(x | x, a) + ∑_{(a,u)∈U, a∈A} u · p(a | x, a) + ∑_{(c,u)∈U, c∈C} u · p(c | x, a)    (6)
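To make Equation (6) concrete, the following sketch (ours, not part of MDP-ProbLog) evaluates the special case in which every atom is fully observed, so each probability term reduces to 1 or 0 and R(x, a) is simply the sum of the utilities of the fluents, the action, and the complementary atoms that hold; the utility values mirror those listed later for the program MDP-Left.

# Illustrative only: Equation (6) when every atom is fully observed.
def reward(true_atoms, action, utilities):
    # true_atoms: set of state fluents and complementary atoms holding in x.
    # utilities: dict mapping atoms and actions to their utility values.
    total = sum(u for atom, u in utilities.items() if atom in true_atoms)
    total += utilities.get(action, 0.0)
    return total

U = {"free_NE": 1.0, "free_NW": 1.0, "rearEnd_crash": -4.0,
     "sideSwipe_crash": -2.0, "keep_distance": -1.0}
print(reward({"free_NE", "free_NW"}, "cruise", U))        # 2.0
print(reward({"rearEnd_crash"}, "keep_distance", U))      # -5.0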
Language Syntax
MDP-ProbLog reserves special logic atoms to declare state fluents, actions,
and utilities. A state variable is introduced through the unary atom
Methodology
Software Architecture
Perception Module
The self-driving car has an onboard RGB camera for lane detection, a 3D-
Lidar to detect the presence of other vehicles, and an accelerometer for crash
detection. The RGB camera and 3D-Lidar readings include additive noise with a standard deviation of σ = 0.02.
Visual analysis for lane detection proceeds in the following steps: (a) we crop the image to keep only the part below the horizon, and then we apply a triangular RoI to discard regions where lanes are not likely to appear; (b) a Gaussian filter is applied and borders are detected by Canny edge detection; (c) the Hough transform is used to find lanes in normal form (ρ, θ), considering the bottom-center of the image (car hood) as the origin; (d) lines that are almost vertical or horizontal are discarded and the rest are divided into left and right borders; and finally, (e) lines are averaged by weighting their corresponding lengths, and the results are taken as the borders of the current lane.
Figure 2 shows the intermediate and final results of the previous steps on
lane detection.
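For illustration, the sketch below outlines steps (b)–(e) with OpenCV; the Canny thresholds, the Hough accumulator threshold, the angle limits, and the RoI mask are placeholders rather than the values used in our system, and a plain average replaces the length-weighted average of step (e).

# Illustrative lane-line detection (steps b-e); all parameters are placeholders.
import cv2
import numpy as np

def detect_lane_lines(bgr_image, roi_mask):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)           # (b) Gaussian smoothing ...
    edges = cv2.Canny(blur, 50, 150)                   # ... and Canny edge detection
    edges = cv2.bitwise_and(edges, roi_mask)           # restrict to the region of interest
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 80)  # (c) lines in normal form (rho, theta)
    left, right = [], []
    if lines is not None:
        for rho, theta in lines[:, 0]:
            deg = np.degrees(theta)
            if deg < 10 or abs(deg - 90) < 10 or deg > 170:
                continue                               # (d) drop near-vertical/horizontal lines
            (left if deg < 90 else right).append((rho, theta))
    # (e) average each side; in our pipeline the averages are weighted by line length.
    average = lambda ls: tuple(np.mean(ls, axis=0)) if ls else None
    return average(left), average(right)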
Vehicle detection is performed in predetermined locations around the self-
driving car. Figure 3 shows the locations and their labels in each lane using
cardinal and ordinal directions with respect to the center of the road (hereinafter, we will abbreviate the cardinal and ordinal labels by their initials N, W, E, and NW, SW, NE, SE, respectively).
Figure 2. Lane detection process. Left: region of interest. Center: Canny edge detection. Right: detected lanes by the Hough transform.
Figure 3. Preestablished locations and their labels around the self-driving car (in red) where it is expected to see other vehicles (outlined) on each lane.
A 3D LiDAR sensor mounted on the top
of the self-driving car retrieves a point cloud of xyz readings in a three-
dimensional space centered on the roof of the self-driving car. Readings hitting
the ground or above some height threshold, or falling inside the bounding box
of the self-driving car, are considered spurious data and are removed. The
presence or absence of the surrounding vehicles is decided by counting the
number of points that lie inside a rectangular cuboid valid for each location.
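A minimal version of this occupancy test is sketched below; the cuboid bounds per location and the point-count threshold are illustrative assumptions, not the calibrated values of our setup.

# Illustrative occupancy test: count LiDAR points inside each location's cuboid.
import numpy as np

# Axis-aligned cuboids in the car frame: (xmin, xmax, ymin, ymax, zmin, zmax).
CUBOIDS = {"N":  (4.0, 12.0, -1.5, 1.5, -1.5, 1.0),   # placeholder bounds
           "NW": (4.0, 12.0,  1.5, 5.0, -1.5, 1.0)}

def occupied_locations(points, min_points=20):
    # points: (N, 3) array of filtered readings (ground and ego-car points removed).
    occupied = {}
    for label, (x0, x1, y0, y1, z0, z1) in CUBOIDS.items():
        inside = ((points[:, 0] >= x0) & (points[:, 0] <= x1) &
                  (points[:, 1] >= y0) & (points[:, 1] <= y1) &
                  (points[:, 2] >= z0) & (points[:, 2] <= z1))
        occupied[label] = int(inside.sum()) >= min_points
    return occupied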
Distance estimation is implemented to keep an appropriate distance to the vehicle in front of the self-driving car. A crash alarm is triggered if the following rule holds:

d_a = √((a_{x,t} − a_{x,t−1})² + (a_{y,t} − a_{y,t−1})² + (a_{z,t} − a_{z,t−1})²) ≥ Δ_a,

where d_a represents the Euclidean distance between two consecutive accelerometer readings (a_{x,t−1}, a_{y,t−1}, a_{z,t−1}) and (a_{x,t}, a_{y,t}, a_{z,t}), each component in m/s² units. The threshold Δ_a, which is set at 400 m/s², was obtained by experimentation. The identification of the lane in which the car is located is determined by comparing the current (x, y, z) coordinates of the self-driving car in the global coordinate system of the simulated environment, given that the coordinates of the center of the road are known.
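The crash test itself reduces to a distance computation between consecutive readings, as in the following sketch; the 400 m/s² threshold is the value reported above.

# Crash alarm from two consecutive accelerometer readings (components in m/s^2).
import math

DELTA_A = 400.0   # threshold reported in the text, obtained by experimentation

def crash_detected(prev_reading, curr_reading, threshold=DELTA_A):
    # prev_reading, curr_reading: (ax, ay, az) tuples at times t-1 and t.
    d_a = math.dist(prev_reading, curr_reading)   # Euclidean distance between readings
    return d_a >= threshold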
Control Module
The control module is responsible for the execution of four driving actions:
cruise, keep_distance, change_lane, and stop. Lane tracking is at the core of cruise and keep_distance, so it is presented first. Lane tracking keeps the car on track and is performed by comparing an observed line (ρo, θo) and a desired line (ρd, θd). An error is calculated for the left and right lines, and speed and steering signals are computed depending on the executed behavior, as explained later. Figure 4 shows these quantities. Green lines depict the desired lines that should be observed if the car is well centered in the lane. Cyan lines represent the lines actually observed. Linear speed and steering are calculated depending on the executed action as follows:
Figure 4. Parameters of the desired (green) and observed (cyan) lines described in normal form (ρ, θ).
Cruise
This action consists of keeping the car in the current lane assuming there is no obstacle. We use the following control law to track the lane: δ = Kρ eρ + Kθ eθ, where δ is the steering angle of the self-driving car, Kρ and Kθ are tuning constants, and eρ and eθ are the errors between the observed and desired line parameters. These errors are calculated as eρ = ((ρld − ρl) + (ρrd − ρr))/2 and eθ = ((θld − θl) + (θrd − θr))/2. In this action, the linear speed is constant, v = Kv.
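A direct transcription of this law is sketched below; the gains Kρ, Kθ, and the constant speed Kv are illustrative placeholders, not our tuned values.

# Cruise: steering delta = K_rho * e_rho + K_theta * e_theta, constant speed.
K_RHO, K_THETA, K_V = 0.01, 0.8, 5.0    # placeholder gains and speed

def cruise_control(observed_left, observed_right, desired_left, desired_right):
    # Each argument is a (rho, theta) line in normal form.
    e_rho = ((desired_left[0] - observed_left[0]) +
             (desired_right[0] - observed_right[0])) / 2.0
    e_theta = ((desired_left[1] - observed_left[1]) +
               (desired_right[1] - observed_right[1])) / 2.0
    steering = K_RHO * e_rho + K_THETA * e_theta
    speed = K_V                          # linear speed is constant in this action
    return speed, steering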
Keep Distance
The goal of this action is to keep the car at a desired distance from the
car in front (North position). Lane tracking is still performed to keep
the car in the current lane. In this action, steering is calculated similarly to the cruise action. Linear speed is calculated using a proportional control:

v = vc + Kd (d − dd)  if d < dd;  otherwise v = vc,    (7)

where d is the distance between the self-driving car and the car in front of it, and dd is the desired distance. Kd is a tuning constant.
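The speed update of Equation (7) is sketched below; the gain and the desired distance are placeholders.

# Keep distance: proportional speed control of Equation (7); steering as in cruise.
K_D, DESIRED_DISTANCE = 0.5, 8.0        # placeholder gain (1/s) and distance (m)

def keep_distance_speed(nominal_speed, distance_to_front,
                        desired_distance=DESIRED_DISTANCE, k_d=K_D):
    if distance_to_front < desired_distance:
        # the negative correction slows the car down as the gap closes
        return nominal_speed + k_d * (distance_to_front - desired_distance)
    return nominal_speed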
Change Lane
This action is intended to be used for passing cars. The change can be made to the left or right, depending on the current lane (a two-lane road is assumed). We calculate the steering as δ = arcsin(θ̇L/v), where v is the current linear speed of the car, ω = θ̇ is a desired angular speed, and L is the distance between the rear and front tires. The quotient θ̇L/v must be in [−1, 1] for δ to exist, but this is easily achieved if ω is set according to the car's physical capabilities. For example, if the car is stopped (v = 0), it is not possible to turn left or right with any steering value. In this work, we set ω = 1.2 rad/s and L = 2.9 m. The linear speed v is the current speed and can vary depending on the previously executed action.
Lane changing is achieved with two consecutive turns. For a change to the left, the car first turns to the left until reaching a given position threshold y1, and then it turns right until it is aligned with the left lane. Figure 5 shows the two movements that perform a lane change to the left. Turning to the right uses the same logic.
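The two ingredients of this maneuver, the arcsine steering law and the two-phase turn, are sketched below; the lateral threshold y1 and the sign conventions are illustrative assumptions, while ω and L follow the values given above.

# Change lane: delta = arcsin(omega * L / v) plus a two-phase turn.
import math

OMEGA, WHEELBASE = 1.2, 2.9    # rad/s and m, as reported in the text
Y1 = 1.8                       # lateral position threshold in m (placeholder)

def change_lane_steering(speed, to_left, lateral_offset):
    # lateral_offset: signed displacement (m) from the original lane center;
    # positive steering is assumed to turn the car to the left.
    ratio = OMEGA * WHEELBASE / speed            # must lie in [-1, 1]; requires v > 0
    ratio = max(-1.0, min(1.0, ratio))
    delta = math.asin(ratio)
    if to_left:
        # first turn left until the threshold y1, then turn right to realign
        return delta if lateral_offset < Y1 else -delta
    return -delta if lateral_offset > -Y1 else delta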
Stop
This action sets both speed and steering to zero.
For example, two factored transition rules for the action change_lane in the program MDP-Left are:
% Action: change_lane
0.99::free_NE(1) :- free_NE(0), free_E(0),
                    (free_SE(0); \+free_SE(0)), change_lane.
0.01::free_NE(1) :- (\+free_NE(0); \+free_E(0)), change_lane.
0.99::free_NW(1) :- free_NW(0), change_lane.
0.01::free_NW(1) :- \+free_NW(0), change_lane.
0.99::free_E(1) :- free_E(0), change_lane.
0.01::free_E(1) :- \+free_E(0), change_lane.
0.99::free_SE(1) :- free_SE(0), change_lane.
0.01::free_SE(1) :- \+free_SE(0), change_lane.
The first rule indicates that when traveling in the left lane, free_NE will most
likely be true in the immediate future with a high probability value of 0.99 if
both free_NE (which refers to the space ahead in the right lane) and free_E
(referring to the space to the right in the right lane) are currently true, and the
action change_lane is performed. The rule also explicitly specifies that the
presence or absence of an obstacle vehicle in space SE is not relevant for this
maneuver. This rule is based on the implicit assumption that the self-driving
car consistently navigates at higher speeds than the other vehicles, thus
ensuring there is sufficient room for a safe turn. The second rule indicates
that free_NE will be true with a low probability value of 0.01 after changing to the right lane whenever either free_NE or free_E, or both, are currently false (there is not enough free space for the maneuver, and trying to execute it will almost surely produce a collision, in accordance with the reward model below). The rest of the rules are included to reflect the assumption that the truth values of free_NW, free_E, and free_SE are not directly influenced by change_lane (if each of these state fluents is true, it will almost certainly remain true, and if it is false, it is unlikely to become true).
These rules emulate the description style of Spudd (Hoey et al. 1999), an early influential software package for coding fMDPs using algebraic decision diagrams, in which it is customary to explicitly specify which state variables are not affected by the execution of an action.
The utility assignments and the rules of the reward model in the program
MDP-Left are:
% Utilities
utility(free_NE(0), 1.0).
utility(free_NW(0), 1.0).
utility(rearEnd_crash, -4.0).
utility(sideSwipe_crash, -2.0).
utility(keep_distance, -1.0).
% Reward model
0.99::rearEnd_crash :- \+free_NW(0), cruise.
0.99::rearEnd_crash :- \+free_NE(0), change_lane.
0.95::sideSwipe_crash :- \+free_E(0), change_lane.
In these rules, a positive utility is assigned to the state fluents free_NE and free_NW whenever the self-driving car has its NE or NW space free. Negative utilities are attached to two complementary atoms called rearEnd_crash and sideSwipe_crash, which represent the occurrence of rear-end and sideswipe collisions, respectively. The difference in their penalization reflects an assumed variation in severity among these types of accidents within our setting. The last three rules stipulate how likely each one of these collisions is to happen if the corresponding conditions are met. Finally, the utility is also negative for keep_distance, because this action may lead to an undesirable delay in reaching the destination.
conservative when it comes to changing to the left lane (as stated above, policy Left is for overtaking only, and it thus favors returning to the right lane as soon as possible). Policy Right only allows changing to the left lane in one state, when NE is occupied but W, NW, and SW are free. When in the right lane, vehicles approaching from SW must be considered, as left-lane vehicles usually move faster, and switching lanes could be risky. Policy "Selector" has four state-action associations. If success is true, Selector chooses the policy appropriate for the current lane; otherwise, the policy "Stop" is chosen. This latter policy is the simplest, involving only two state-action associations: one stops the car if success is false, and the other does nothing otherwise.
To implement the behavior selection module, the policies were arranged in a two-level hierarchical structure, as depicted in Figure 1. Only three of the policies are active at every moment: Selector, Stop, and either Left or Right, in accordance with the position of the car. Recall from Table 1 the number of state fluents in each PL-fMDP. This task decomposition results in 2² + 2¹ + 2⁴ + 2⁴ = 38 states, in comparison to the 2⁸ = 256 states that are required for a single PL-fMDP⁶ with the same number of fluent variables. In our experience, this reduction in the number of states sped up the design, testing, and debugging of the PL-fMDPs.
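Conceptually, the hierarchy behaves like the dispatcher sketched below: the policy Selector picks a low-level policy, and that policy maps the current fluent values to an action. The function and table names are ours, and the low-level policies are stand-ins for the actual solutions of the PL-fMDPs.

# Illustrative dispatcher for the two-level policy hierarchy.
def stop_policy(success):
    # The Stop policy has two state-action associations.
    return "stop" if not success else "none"

def selector(success, current_lane, state_fluents, policy_left, policy_right):
    # Top level: on failure defer to Stop; otherwise use the policy of the
    # current lane. policy_left / policy_right map a tuple of four Boolean
    # fluents to one of the driving actions (cruise, keep_distance, ...).
    if not success:
        return stop_policy(success)
    low_level = policy_left if current_lane == "left" else policy_right
    return low_level[state_fluents]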
Figure 6. Convergence graphs for the solution of the PL-fMDPs using value iteration: the maximum error on each iteration (left), and the maximum state value (right).
Figure 7. Heat map of the xy-coordinates (in meters) of the obstacle vehicles at their initial
positions in all the tests.
Figure 8. Overhead view of the initial positions of the obstacle vehicles in Tests 1, 2, and 3 (shown from top to bottom, respectively). The self-driving car, painted in red, is always at the bottom left.
Figure 9. Examples of some of the 16 scenarios derived from policy Right and designed for Test 1.
In Test 1, the scenarios that were tested are derived from the states of the policy Right. In all the scenarios, the self-driving car starts at the beginning of the right lane with 0 to 4 moving obstacle vehicles distributed ahead in both the right and left lanes. Figure 9 shows some of the 16 scenarios in this test, considering the presence or absence of four obstacle vehicles and the self-driving car. The minimum distance between the self-driving car and the obstacle vehicles was 6 m. The maximum linear velocity of the self-driving car was set to 28 km per hour. Obstacle vehicles travel at a speed of 18.8 km per hour. The test ended when either the correct decision was made according to the policy or when there was a collision with another vehicle.
In Test 3, due to the speed difference between the left and right lanes, obstacle vehicles in the left lane were significantly moved backward, as illustrated in the bottom image of Figure 8. Due to this reallocation, the initial distances between obstacle vehicles were reduced to approximately 17 m compared to Test 2.
The three tests were executed on standard laptop computers with Intel Core i7 processors, 8 GB of RAM, and standard integrated graphics cards. We completed 10 repetitions for each driving scenario of Test 1 (160 repetitions in total). Results are presented in Table 3. In all cases, the selection of actions was accurate with respect to the policy Right. For Test 2 and Test 3, we conducted 30 repetitions for each of the 12 possible combinations of self-driving car speeds and the number of obstacle vehicles, resulting in a total of 360 repetitions. Table 4 summarizes the results. The self-driving car successfully overtook the static obstacle vehicles in all repetitions. When the obstacle vehicles moved, the self-driving car overtook them without collisions in 97.7% of the trials. In the remaining 2.3% of repetitions, the self-driving car collided with other vehicles, particularly when changing lanes.
These crashes could be attributed to factors such as the permissiveness of the policy Left for lane changing, failures in detecting neighboring vehicles, or the limited capacity of the laptop computers to perform the required physics calculations, as the simulator continuously warned. Although we believe these are plausible explanations, it is also necessary to add more information to the system, such as an estimate of the speed and distance of the surrounding vehicles. The overall weighted average with respect to the number of repetitions in
each test is 99.2%. We are carefully analyzing the log files to improve the system.
Table 4. Success rates by obstacle type, number of obstacle vehicles, and self-driving car speed.
                              # of Obstacle Vehicles    20 km/h    24 km/h    28 km/h
Static obstacle vehicles                  5              100%       100%       100%
                                         10              100%       100%       100%
Moving obstacle vehicles                  5              100%       100%        90%
                                         10              96.6%      100%       100%
Figure 10. Trajectories of the self-driving car in Test 2 and Test 3: (a) and (b) depict the self-driving car overtaking 5 and 10 stationary vehicles, while (c) and (d) illustrate the overtaking of 5 and 10 dynamic vehicles, respectively.
Notes
1. An atomic formula (or atom, for short) has the form "a(t1, …, tn)," for n ≥ 0, where a is the identifier of the atom, and each argument ti, for i = 1 to n, is a term (that is, a variable, a constant, or a compound term). An atom is grounded when none of its arguments are variables or contain variables.
2. The Iverson bracket function [[·]] evaluates to 1 if the propositional condition enclosed in the brackets holds, and it evaluates to 0 otherwise.
3. In this document, prime notation is employed to differentiate between post-action state variables X′1, …, X′n and pre-action state variables X1, …, Xn.
4. Lowercase letters are used to denote state fluents, rather than uppercase letters for state
variables as in the previous section, in order to adhere to the standard definition of atoms
in Prolog.
5. Wherever used after its introduction in an MDP-ProbLog program, a pre-action
(resp. post-action) state fluent is identified by adding a value 0 (resp. 1) as its first
parameter.
6. Notice that free_NE, free_NW, and success are used twice in the hierarchy, so they are
counted only once for the single PL-fMDP.
Acknowledgements
The authors would like to thank Sergio Yahir Hernandez-Mendoza for his generous support in
conducting part of the tests in this work, and the reviewers for their insightful and interesting
comments and feedback.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Funding
This work was partially supported by UNAM-DGAPA under grant TA101222 and AI
Consortium - CIMAT-CONAHCYT.
References
Aksjonov, A., and V. Kyrki. 2022. A safety-critical decision-making and control framework
combining machine-learning-based and rule-based algorithms. SAE International Journal of
Vehicle Dynamics, Stability, and NVH 7 (3). arXiv preprint arXiv:2201.12819. doi:10.4271/
10-07-03-0018.
Al-Nuaimi, M., S. Wibowo, H. Qu, J. Aitken, and S. Veres. 2021, SEP. Hybrid verification
technique for decision-making of self-driving vehicles. Journal of Sensor & Actuator
Networks 10(3):42. doi:10.3390/jsan10030042.
Avilés-Arriaga, H. H., L. E. Sucar, E. F. Morales, B. A. Vargas, J. Sánchez, and E. Corona. 2009.
Markovito: a flexible and general service robot. Design and Control of Intelligent Robotic
Systems 177:401–23.
Avilés, H., M. Negrete, R. Machucho, K. Rivera, D. Trejo, and H. Vargas. 2022. Probabilistic
logic Markov decision processes for modeling driving behaviors in self-driving cars. Ibero-
American Conference on Artificial Intelligence (IBERAMIA), 366–77, Cartagena de Indias, Colombia: Springer.
Boutilier, C., R. Dearden, and M. Goldszmidt. 2000 August. Stochastic dynamic programming
with factored representations. Artificial Intelligence 121(1–2):49–107. doi:10.1016/S0004-
3702(00)00033-3.
Bueno, T. P., D. D. Mauá, L. N. De Barros, and F. G. Cozman. 2016. Markov decision processes
specified by probabilistic logic programming: representation and solution. 2016 5th
Brazilian Conference on Intelligent Systems (BRACIS), 337–42, Recife, Brazil: IEEE.
Cai, Z., Q. Fan, R. S. Feris, and N. Vasconcelos. 2016. A unified multi-scale deep convolutional
neural network for fast object detection. Computer Vision–ECCV 2016: 14th European
Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14,
354–70, Springer.
Chavira, M., and A. Darwiche. 2008. On probabilistic inference by weighted model counting.
Artificial Intelligence 172 (6–7):772–99. doi:10.1016/j.artint.2007.11.002.
Clark, K. L. 1978. Negation as failure. In Logic and data bases, ed., H. Gallaire and J. Minker.
Boston, MA: Springer. doi: 10.1007/978-1-4684-3384-5_11.
Da Lio, M., A. Cherubini, G. P. R. Papini, and A. Plebe. 2023. Complex self-driving behaviors
emerging from affordance competition in layered control architectures. Cognitive Systems
Research 79:4–14. doi:10.1016/j.cogsys.2022.12.007.
Dortmans, E., and T. Punter. 2022. Behavior trees for smart robots practical guidelines for
robot software development. Journal of Robotics 2022 (7):7419–30. doi:10.1155/2022/
3314084.
Duan, J., S. Eben Li, Y. Guan, Q. Sun, and B. Cheng. 2020. Hierarchical reinforcement learning
for self-driving decision-making without reliance on labelled driving data. IET Intelligent
Transport Systems 14 (5):297–305. doi:10.1049/iet-its.2019.0317.
Fierens, D., G. Van den Broeck, J. Renkens, D. Shterionov, B. Gutmann, I. Thon, G. Janssens,
and L. De Raedt. 2015. Inference and learning in probabilistic logic programs using weighted
boolean formulas. Theory and Practice of Logic Programming 15 (3):358–401. doi:10.1017/
S1471068414000076.
Hoey, J., R. St-Aubin, A. Hu, and C. Boutilier. 1999. SPUDD: Stochastic planning using
decision diagrams. Proceedings of International Conference on Uncertainty in Artificial
Intelligence (UAI ‘99), Stockholm, Sweden.
Lindqvist, B., S. S. Mansouri, A.-A. Agha-Mohammadi, and G. Nikolakopoulos. 2020.
Nonlinear mpc for collision avoidance and control of UAVs with dynamic obstacles. IEEE
Robotics and Automation Letters 5 (4):6001–08. doi:10.1109/LRA.2020.3010730.
Liu, Q., X. Li, S. Yuan, and Z. Li. 2021. Decision-making technology for autonomous vehicles:
Learning-based methods, applications and future outlook. 2021 IEEE International
Intelligent Transportation Systems Conference (ITSC), 30–37, Indianapolis, IN, USA. IEEE.
Mishra, P. 2022. Model explainability for rule-based expert systems. In Practical explainable AI
using python, ed. P. Mishra, 315–26. Berkeley, CA: Apress. doi:10.1007/978-1-4842-7158-2_13.
Mu, Z., W. Fang, S. Zhu, T. Jin, W. Song, X. Xi, Q. Huang, J. Gu, and S. Yuan. 2022. A
multi-modal behavior planning framework for guide robot. 2022 IEEE International
Conference on Robotics and Biomimetics (ROBIO), Jinghong, China.
Padalkar, P., H. Wang, and G. Gupta. 2023. Nesyfold: A system for generating logic-based
explanations from convolutional neural networks. arXiv preprint arXiv:2301.12667.
Paden, B., M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli. 2016. A survey of motion planning
and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent
Vehicles 1 (1):33–55. doi:10.1109/TIV.2016.2578706.
Pepe, G., M. Laurenza, D. Antonelli, and A. Carcaterra. 2019. A new optimal control of obstacle
avoidance for safer autonomous driving. 2019 AEIT International Conference of Electrical
and Electronic Technologies for Automotive (AEIT AUTOMOTIVE), 1–6, Turin, Italy.
Precup, R.-E., S. Preitl, J. Tar, M. Tomescu, M. Takacs, P. Korondi, and P. Baranyi. 2008. Fuzzy control system performance enhancement by iterative learning control. IEEE Transactions on Industrial Electronics 55 (9):3461–75. Accessed November 27, 2023. doi:10.1109/TIE.2008.925322.
Puterman, M. L. 1994. Markov decision processes: Discrete stochastic dynamic programming.
Hoboken, NJ, US: John Wiley & Sons.
Reyes, A., L. E. Sucar, P. H. Ibargüengoytia, and E. F. Morales. 2020, May. Planning under
uncertainty applications in power plants using factored Markov decision processes. Energies
13(9):2302. doi:10.3390/en13092302.
Rigatos, G., P. Siano, D. Selisteanu, and R. E. Precup. 2016. Nonlinear optimal control of oxygen and carbon dioxide levels in blood. Intelligent Industrial Systems 3 (2):61–75. Accessed November 27, 2023. doi:10.1007/s40903-016-0060-y.
Riguzzi, F. 2022. Foundations of probabilistic logic programming: Languages, semantics, inference and learning. Denmark: River Publishers.
Sato, T. 1995. A statistical learning method for logic programs with distribution semantics.
Proceedings of the 12th international conference on logic programming (ICLP’95), 715–29,
Citeseer.
Smith, G., R. Petrick, and V. Belle. 2021. Intent recognition in smart homes with problog. 2021
IEEE International Conference on Pervasive Computing and Communications Workshops
and other Affiliated Events (PerCom Workshops), Kassel, Germany.
Spanogiannopoulos, S., Y. Zweiri, and L. Seneviratne. 2022. Sampling-based non-holonomic
path generation for self-driving cars. Journal of Intelligent & Robotic Systems 104(1). doi:10.
1007/s10846-021-01440-z.
Sun, Z., J. Zou, D. He, Z. Man, and J. Zheng. 2020. Collision-avoidance steering control for
autonomous vehicles using neural network-based adaptive integral terminal sliding mode.
Journal of Intelligent & Fuzzy Systems 39 (3):4689–702. doi:10.3233/JIFS-200625.
Unguritu, M.-G., and T.-C. Nichitelea. 2023. Design and assessment of an anti-lock braking system controller. Romanian Journal of Information Science and Technology 2023 (1):21–32. Online; Accessed November 27, 2023. doi:10.59277/ROMJIST.2023.1.02.
Van Gelder, A., K. A. Ross, and J. S. Schlipf. 1991. The well-founded semantics for general logic
programs. Journal of the ACM (JACM) 38 (3):619–49. doi:10.1145/116825.116838.
Vennekens, J., S. Verbaeten, and M. Bruynooghe. 2004. Logic programs with annotated
disjunctions. In International Conference on Logic Programming, Saint-Malo, France,
431–45. Springer.
Wang, K., C. Mu, Z. Ni, and D. Liu. 2023. Safe reinforcement learning and adaptive optimal
control with applications to obstacle avoidance problem. IEEE Transactions on Automation
Science and Engineering, doi:10.1109/TASE.2023.3299275.
Yuan, M., J. Shan, and K. Mi. 2022. Deep reinforcement learning based game-theoretic
decision-making for autonomous vehicles. IEEE Robotics and Automation Letters
7 (2):818–25. doi:10.1109/LRA.2021.3134249.
Zhang, X., A. Liniger, and F. Borrelli. 2021. Optimization-based collision avoidance. IEEE
Transactions on Control Systems Technology 29 (3):972–83. doi:10.1109/TCST.2019.2949540.
Zhang, Y., P. Tiňo, A. Leonardis, and K. Tang. 2021. A survey on neural network
interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence
5 (5):726–42. doi:10.1109/TETCI.2021.3100641.
Zhu, W., Y. Lu, Y. Zhang, X. Wei, and Z. Wei. 2022. End-to-end driving model based on deep
learning and attention mechanism. Journal of Intelligent & Fuzzy Systems 42 (4):3337–48.
doi:10.3233/JIFS-211206.