
Article

Cooperative Formation Control of a Multi-Agent Khepera IV Mobile Robots System Using Deep Reinforcement Learning

Gonzalo Garcia 1, Azim Eskandarian 1, Ernesto Fabregas 2, Hector Vargas 3 and Gonzalo Farias 3,*

1 College of Engineering, Virginia Commonwealth University, 601 W Main St., Richmond, VA 23220, USA;
[email protected] (G.G.); [email protected] (A.E.)
2 Departamento de Informática y Automática, Universidad Nacional de Educación a Distancia (UNED),
Juan del Rosal 16, 28040 Madrid, Spain; [email protected]
3 Escuela de Ingeniería Eléctrica, Pontificia Universidad Católica de Valparaíso, Av. Brasil 2147,
Valparaíso 2362804, Chile; [email protected]
* Correspondence: [email protected]

Abstract: The increasing complexity of autonomous vehicles has exposed the limitations of
many existing control systems. Reinforcement learning (RL) is emerging as a promising so-
lution to these challenges, enabling agents to learn and enhance their performance through
interaction with the environment. Unlike traditional control algorithms, RL facilitates au-
tonomous learning via a recursive process that can be fully simulated, thereby preventing
potential damage to the actual robot. This paper presents the design and development of
an RL-based algorithm for controlling the collaborative formation of a multi-agent Khep-
era IV mobile robot system as it navigates toward a target while avoiding obstacles in
the environment by using onboard infrared sensors. This study evaluates the proposed
RL approach against traditional control laws within a simulated environment using the
CoppeliaSim simulator. The results show that the RL algorithm yields a sharper control law
than traditional approaches, without requiring manual adjustment of the control parameters.

Keywords: deep reinforcement learning; mobile robots; multi-agent systems; formation control
Academic Editors: Mehmet Aydin and Rakib Abdur

Received: 20 December 2024
Revised: 31 January 2025
Accepted: 7 February 2025
Published: 10 February 2025

Citation: Garcia, G.; Eskandarian, A.; Fabregas, E.; Vargas, H.; Farias, G. Cooperative Formation Control of a Multi-Agent Khepera IV Mobile Robots System Using Deep Reinforcement Learning. Appl. Sci. 2025, 15, 1777. https://doi.org/10.3390/app15041777

Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

In recent years, AI has enabled machines to perform tasks typically requiring human intelligence, including visual perception, speech recognition, decision-making, and language translation, transforming numerous fields [1–3]. Among AI's many applications, machine learning (ML) stands out, enabling algorithms to learn from data and make predictions or decisions without explicit programming [4]. A further specialization within the field of machine learning is deep learning (DL), which utilizes neural networks with multiple layers to model intricate patterns in extensive datasets [5–8].
Deep learning algorithms, particularly those involving reinforcement learning, have shown great promise in enabling agents to develop sophisticated strategies through trial and error [9,10]. By leveraging neural networks, these agents can process vast amounts of sensory data, identify patterns, and make informed decisions. This is particularly beneficial in scenarios where agents must operate collaboratively to achieve a common goal [11].
The integration of DL with multi-agent systems has seen a marked increase in interest recently [12,13], particularly in the context of mobile robot formation control [14]. This innovative approach utilizes the power of DL to enable multiple autonomous agents to
learn and adapt their behaviours by interacting with their environment and each other [15].
The primary objective is to achieve and maintain specific formations, crucial for appli-
cations ranging from unmanned aerial vehicle (UAV) swarms to autonomous vehicle
platoons [16,17]. By addressing the complexities of coordination, communication, and
dynamic environments, DL offers a promising solution to enhance the efficiency and
robustness of multi-robot systems.
During the last few years, our research has focused on the position control of mobile
robots [18], leveraging advanced deep reinforcement learning algorithms. Our previous
work [19,20] details the design, development, and implementation of reinforcement learn-
ing algorithms for controlling the position of a wheeled mobile robot Khepera IV. These
approaches facilitate the learning of optimal control policies through environmental inter-
action, resulting in significantly improved performance compared with traditional control
methods [21–24]. Our experiments, conducted both in simulation and real-world envi-
ronments, demonstrate the effectiveness of our algorithms in achieving precise position
control, even in the presence of obstacles.
This work presents innovative models that enable autonomous agents to learn and
adapt their behaviours using deep reinforcement learning (DRL). These models show the
potential of this approach in applications such as UAV swarms and autonomous vehicle
platoons. This research addresses the complexities of coordination and communication
required to maintain precise formations. Using DRL, the models allow multiple agents
to process large amounts of data in real-time. This enables them to make better decisions
and adjust their positions dynamically. This capability ensures the agents can operate
cohesively, maintaining their formation even in dynamic and unpredictable environments.
The enhanced coordination and communication lead to increased efficiency and robustness
of multi-robot systems, making them more reliable and effective in various scenarios. The
findings of this work could significantly improve autonomous systems in fields ranging
from transportation and surveillance to many others.
The main contributions of this work are as follows:
1. Design and implementation: The design and implementation of a control law for the
formation position control of a group of robots based on DRL. This control law helps
robots maintain precise formations and adapt to their surroundings, improving on
the results presented in [18], where classical controllers were used.
2. Simulation environment: The implementation of the proposed algorithm in a simula-
tion environment, including obstacle avoidance. This enables thorough testing and
validation of the effectiveness of the algorithm in maintaining formation and avoiding
obstacles. The obstacle avoidance logic presented in [19] was expanded to all robots
in the formation.
3. Comparison with traditional position control approaches: Comparison of the results
of the new approach against existing control laws under similar conditions. This
comparison highlights the advantages and improvements offered by the RL-based
method. This work expands on [20] by experimenting with a more explicit reward
function for faster target tracking.
4. Performance evaluation: The evaluation of the performance of the proposed control
law using control surfaces. This provides a quantitative assessment of the algorithm's
efficiency, robustness, and adaptability. Metrics similar to those used in [19] were
selected for a more accurate comparison.
The remainder of this paper is organized as follows: Section 2 presents the environment
where the experiments take place, as well as some theoretical aspects of the position
control problem, the formation control problem, the obstacle avoidance approach, and
multi-agent systems. Section 3 describes the proposed approach: formation control
with deep reinforcement learning. Section 4 shows and discusses some simulation and
experimental results of this research. Finally, Section 5 presents the main conclusions and
future work.

2. Background
2.1. Simulation Environment—CoppeliaSim Simulator
The CoppeliaSim simulator (formerly V-REP) [25] is a useful tool for developing 3D
simulations based on high-fidelity, physics-based models. As an integrated development
environment (IDE), it allows for a distributed control architecture in which each object/model
can be individually controlled using embedded scripts, plug-ins, remote application
programming interface (API) clients, Robot Operating System (ROS) nodes, or custom solutions.
Control algorithms can be written in several programming languages, including Lua, C/C++,
Python, MATLAB, and Java. The simulator offers numerous ready-made robot, sensor, and
actuator models for creating and interacting with a virtual world in real time. Additionally,
it allows new objects with dynamic properties to be added for designing and constructing new
robots. In particular, a model of the Khepera IV robot has already been put together for the
CoppeliaSim simulator: in previous work [26] (see Figure 1), the authors developed a library
to include the Khepera IV robot model in CoppeliaSim. In this work, that model is used to
run the experiments in the CoppeliaSim environment.

Figure 1. CoppeliaSim simulator environment.
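As an illustration of how such an experiment can be driven from an external script, the following minimal Python sketch uses the CoppeliaSim ZMQ remote API client to read the robot pose and command the wheel motors. The scene object paths and the wheel speeds are assumptions for illustration only, not the exact names or settings used in this work.

```python
# Minimal sketch of an external CoppeliaSim control loop (ZMQ remote API).
# The scene paths '/KheperaIV', '/KheperaIV/leftMotor' and '/KheperaIV/rightMotor'
# are hypothetical and must be adapted to the actual Khepera IV model of [26].
import time
from coppeliasim_zmqremoteapi_client import RemoteAPIClient

client = RemoteAPIClient()                 # connects to a running CoppeliaSim instance
sim = client.getObject('sim')

robot = sim.getObject('/KheperaIV')
left_motor = sim.getObject('/KheperaIV/leftMotor')
right_motor = sim.getObject('/KheperaIV/rightMotor')

sim.startSimulation()
try:
    while sim.getSimulationTime() < 25.0:
        x, y, _ = sim.getObjectPosition(robot, -1)   # absolute position, as an IPS would give
        # ...compute v_left, v_right with the chosen control law...
        v_left, v_right = 2.0, 2.0                   # placeholder wheel speeds [rad/s]
        sim.setJointTargetVelocity(left_motor, v_left)
        sim.setJointTargetVelocity(right_motor, v_right)
        time.sleep(0.05)
finally:
    sim.stopSimulation()
```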

2.2. Robot Position Control


2.2.1. Kinematic Model for the Robot
A differential wheeled robot is a mobile vehicle that moves using two independently
driven wheels located on either side of its body. The tangential velocities of these wheels,
denoted v_L and v_R for the left and right wheels, respectively, are perpendicular to the
axis of the wheels. The wheels are assumed to roll without sliding, a restriction known as
non-holonomic [27]. The robot changes direction through the differential rotation of the
wheels, eliminating the need for an additional steering mechanism. The robot's kinematic
model in Cartesian coordinates is described as follows [23,28]:
\dot{x}_c = v \cos(\theta), \quad \dot{y}_c = v \sin(\theta), \quad \dot{\theta} = \omega    (1)
where (x_c, y_c) is the robot's position and θ is the heading angle, perpendicular to
the turning radius R. The linear and angular velocities of the robot are obtained from
V = (v_L + v_R)/2 and ω = (v_L − v_R)/L, respectively, with L being the distance between
the wheels. The angular velocity is defined with respect to the Instantaneous Center of
Curvature (ICC). The robot has a maximum linear velocity V_max, a maximum angular
velocity ω_max, and a minimum turning radius R_min. It can only move forward or backward
in the heading direction, a restriction known as a non-holonomic constraint [24].
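For reference, a minimal Python sketch of this kinematic model, integrated with a simple Euler step, is given below; the wheel separation is an assumed value for the Khepera IV, and the sign conventions follow the relations stated above.

```python
import math

def step_kinematics(x, y, theta, v_l, v_r, L=0.105, dt=0.05):
    """One Euler step of the differential-drive model of Equation (1).

    v_l, v_r : left/right wheel tangential velocities [m/s]
    L        : distance between the wheels [m] (assumed value for the Khepera IV)
    dt       : integration step [s]
    """
    v = (v_l + v_r) / 2.0        # linear velocity V
    w = (v_l - v_r) / L          # angular velocity omega, with the convention used in the text
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta = (theta + w * dt + math.pi) % (2 * math.pi) - math.pi   # wrap heading to [-pi, pi)
    return x, y, theta
```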

2.2.2. Position Control Problem


The experiment involves moving the robot from its current position (C) to a target
point (Tp) by adjusting its angular and linear velocities. Traditionally, these velocities are
generated by the controller and then transformed into speeds for the left and right motors.
The proposed DRL controller directly outputs the wheel velocities. Figure 2 shows the
variables involved in this experiment.

Figure 2. Position control variables for the differential robot.

The kinematic behavior of these robots appears simple, but the non-holonomic constraints
pose a challenge in control law design. Previous works detail this issue [24,29]. In typical
motion, the differential robot follows a circular trajectory with radius R and center ICC.
The position control algorithm aims to minimize the orientation error (e = α − θ), where α is
the angle to the target and θ is the robot orientation. Simultaneously, the robot decreases
the distance to the target point (d → 0).
Figure 3 illustrates the position control algorithm, with the inner square as the con-
troller and the outer square as the robot. The Position Sensor is an IPS (Indoor Positioning
System), providing the absolute position and orientation of the robot [18].

Figure 3. Block diagram of the position control problem.

Equation (2) gives the distance d, and Equation (3) computes the angle to the target
point α, using the coordinates of Tp(x_p, y_p) and C(x_c, y_c). Both equations are part of
the Compute block [24].
d = \sqrt{(x_p - x_c)^2 + (y_p - y_c)^2}    (2)

\alpha = \tan^{-1}\left(\frac{y_p - y_c}{x_p - x_c}\right)    (3)
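A direct implementation of Equations (2) and (3) is straightforward; in the sketch below, atan2 is used instead of a plain arctangent so that α falls in the correct quadrant for any relative position of the target.

```python
import math

def distance_and_bearing(xc, yc, xp, yp):
    """Distance d (Equation (2)) and angle to the target alpha (Equation (3))."""
    d = math.hypot(xp - xc, yp - yc)
    alpha = math.atan2(yp - yc, xp - xc)   # quadrant-aware version of Equation (3)
    return d, alpha
```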
The algorithm calculates the wheel speeds required to reach the destination point
based on distance and angle. This is performed using the block labeled Control Law
(in light red color), which can be designed in different ways. One approach, known as
Villela [21], generates control signals V and ω using the following equations:

V(t) = \begin{cases} V_{max} & \text{if } d > k_r \\ (d/k_r)\, V_{max} & \text{if } d \le k_r \end{cases}    (4)

\omega = \omega_{max} \sin(\alpha - \theta)    (5)

with V_max and ω_max as defined before, and k_r a docking-area radius. This docking area
allows a fast approach to the target when it is situated further away, and then a slowdown
nearby. From the robot's velocities, the wheel velocities (left and right) are obtained
through the relations v_L = (2V + ωL)/2 and v_R = (2V − ωL)/2.
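Putting Equations (4) and (5) together with the wheel-speed relations gives the following sketch of the Villela control law; the numerical values of V_max, ω_max, k_r, and L are illustrative assumptions rather than the exact parameters used in the experiments.

```python
import math

def villela_control(d, alpha, theta, v_max=0.08, w_max=1.0, k_r=0.3, L=0.105):
    """Villela control law, Equations (4)-(5), returning left/right wheel speeds."""
    V = v_max if d > k_r else (d / k_r) * v_max   # slow down inside the docking area of radius k_r
    w = w_max * math.sin(alpha - theta)           # steer towards the target
    v_left = (2.0 * V + w * L) / 2.0              # wheel speeds from V and omega
    v_right = (2.0 * V - w * L) / 2.0
    return v_left, v_right
```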

2.3. Obstacle Avoidance (Braitenberg Algorithm)


The obstacle avoidance algorithm block (light green) is shown in Figure 4. It implements
the Braitenberg algorithm for calculating new velocities (v_L', v_R') when obstacles are
present nearby [30]. If no obstacles are detected, the output velocities remain the same as
the input velocities of the block (v_L' = v_L, v_R' = v_R).

Figure 4. Block diagram showing the position control problem with obstacle avoidance.

The Braitenberg algorithm uses sensor inputs (such as infrared or ultrasonic sensors)
to control the motors of a robot. The basic idea is to connect the sensors to the motors in
a way that the robot can react to its environment. Broadly speaking, when the left sensor
detects an obstacle, it reduces the speed of the right motor, causing the robot to turn away
to the right. Similarly, when the right sensor detects an obstacle, it reduces the speed of
the left motor, causing the robot to turn away to the left. This simple mechanism allows the robot to navigate
around obstacles effectively. The adjustment between the sensors and the motors can be
empirically calibrated by conducting a series of tests in different environments to observe
the robot’s behavior. This helps in fine-tuning the sensor thresholds and motor responses
to ensure smooth and effective obstacle avoidance [31].
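The sketch below illustrates this sensor-to-motor coupling; the weight vectors are placeholders that would have to be calibrated empirically as described above, and the normalization of the infrared readings is an assumption of this sketch.

```python
def braitenberg_avoidance(v_left, v_right, ir_readings, weights_left, weights_right):
    """Braitenberg-style correction of the wheel speeds.

    ir_readings   : normalized proximity readings in [0, 1] (1 = obstacle very close).
    weights_left  : per-sensor gains applied to the left wheel.
    weights_right : per-sensor gains applied to the right wheel; left-side sensors carry
                    negative weights here so that a left-side detection slows the right
                    wheel and the robot turns away to the right.
    """
    if max(ir_readings) == 0.0:      # nothing detected: pass the velocities through
        return v_left, v_right
    dv_l = sum(w * r for w, r in zip(weights_left, ir_readings))
    dv_r = sum(w * r for w, r in zip(weights_right, ir_readings))
    return v_left + dv_l, v_right + dv_r
```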

2.4. Multi-Agent Systems


Multi-agent systems (MASs) consist of multiple interacting agents that work together
to achieve a common goal. These systems are characterized by their ability to operate in a
decentralized manner, where each agent makes decisions based on local information and interactions
with other agents. This approach offers several advantages, including increased robustness,
scalability, and flexibility [32].
In cooperative multi-agent systems, agents interact together to maximize a shared
objective. This requires effective communication and coordination among agents to ensure
that their actions are aligned toward the common goal. Examples of cooperative MASs
include swarm robotics [33], where multiple robots collaborate to perform tasks such as
search and rescue [34], environmental monitoring, and formation control.
To implement cooperation among agents, the interaction can be established in two
different ways: (1) all agents have the same hierarchy/rank, and (2) there is a leader or
master and the rest are followers. In the first approach, known as egalitarian cooperation,
all agents collaborate equally, sharing information and decisions to achieve a common
goal. This model promotes robustness and flexibility, as it does not rely on a single point
of failure.
In the second approach, known as hierarchical cooperation, a leader or master agent
coordinates the actions of the other agents, who act as followers. This model can be more
efficient in terms of decision-making and coordination, especially in complex tasks that
require centralized direction. Both approaches have their advantages and challenges, and
the choice between them depends on the nature of the task and the environment in which
the agents operate. To make a formation with the leader-follower approach, the following
equations can be taken into account [35], where N is the number of followers:

v_m(t) = K_p E_{pm}(t) - K_f E_f(t)    (6)

d_s(t) = \sum_{i=1}^{N} d_{f_i}(t)    (7)

In the case of the followers, they use the position of the leader as a target point,
together with a prescribed distance that defines their position in the formation relative
to the leader. These equations help in designing the control strategy for the followers to
maintain a specific formation relative to the leader. The velocity of the leader (v_m(t))
depends on two terms (Equation (6)): K_p E_pm(t), where E_pm(t) is its position error and
K_p is a control gain that defines the importance of this error in the control; and
−K_f E_f(t), which represents the sum of the position errors of the followers (E_f) in the
formation (Equation (7)), multiplied by a control gain (K_f). By adjusting the control gains
(K_p, K_f) and the desired positions, the formation can be maintained dynamically as the
leader moves.
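As an illustration of Equations (6) and (7), the sketch below computes the leader's velocity command from its own position error and the summed follower errors, and places each follower's reference point at a fixed distance and angle in the leader's frame; the gains and offsets are illustrative assumptions.

```python
import math

def leader_velocity(E_pm, follower_errors, K_p=0.5, K_f=0.2):
    """Leader command of Equation (6); the summed follower errors follow Equation (7)."""
    d_s = sum(follower_errors)          # Equation (7)
    return K_p * E_pm - K_f * d_s       # large follower errors slow the leader down

def follower_reference(x_l, y_l, theta_l, offset_dist, offset_angle):
    """Reference point of a follower: a fixed distance/angle relative to the leader."""
    ang = theta_l + offset_angle
    return x_l + offset_dist * math.cos(ang), y_l + offset_dist * math.sin(ang)
```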
Figure 5 shows a representation of a non-cooperative multi-agent system [36] from
point A to point C. The leader robot is shown in red and the followers in black. The dotted
lines represent the distances between the leader and the follower robots.
The experiment starts at point A, where the robots are not yet in formation; the formation
is only reached at point C. All the robots move from their initial positions to point B
(dashed lines); as can be seen, the leader does not wait for the followers and the formation
is not maintained. The leader then arrives at point C, and the followers take some time to
reach the final formation there. In contrast, an example of a cooperative multi-agent system
is shown from point C to point E, where the robots maintain the formation throughout the
maneuver.
Figure 5. Position and formation control.

Deep Reinforcement Learning and Multi-Agent Systems


Multi-agent systems have a wide range of applications across various domains. In
robotics, a MAS is used for tasks such as formation control, where multiple robots maintain
a specific formation while navigating through an environment. This is particularly useful
in scenarios involving unmanned aerial vehicles (UAVs) and autonomous vehicle platoons,
where maintaining precise formations is crucial for efficiency and safety.
In addition to robotics, MASs are employed in fields such as distributed sensing,
where multiple sensors work together to monitor and collect data from large areas. This
approach enhances the coverage and accuracy of the sensing system, making it suitable for
applications like environmental monitoring and surveillance.
Despite their advantages, multi-agent systems also face several challenges. One of the
primary challenges is ensuring effective communication and coordination among agents,
especially in dynamic and unpredictable environments. Developing robust algorithms that
can handle these complexities is an ongoing area of research. Another challenge is the
scalability of MASs: as the number of agents increases, the system's complexity grows
exponentially. Researchers are exploring techniques such as hierarchical organization and
distributed control to address scalability issues [12].
Looking ahead, the integration of deep reinforcement learning (DRL) with multi-
agent systems holds great promise. DRL enables agents to learn and adapt their behavior
through interactions with their surroundings, making it a powerful tool for enhancing the
capabilities of MASs. Future research will likely focus on developing more efficient and
scalable DRL algorithms for multi-agent systems, paving the way for advanced applications
in areas such as autonomous transportation, smart grids, and collaborative robotics [37].

3. Proposed Approach
Based on the leader/follower configuration, two different DRL controllers are designed.
While the leader's goal is to approach the target while taking into consideration the
locations of the followers, the followers aim to keep a fixed location relative to the
leader for a given geometric formation. These relative reference positions are located
on each side of the leader, at some relative angle and distance. Both DRL controllers
output the linear velocities of the left and right wheels, v_L and v_R, according to
Figure 3 or Figure 4.
The Deep Deterministic Policy Gradient (DDPG) is selected for the DRL controllers.
It is derived as an extension of the Q-learning and Deep Q-Network (DQN) formulations
to continuous states and actions, based on the following recursive Bellman optimality
equation [38]:

Q(S[k], A[k]) = Q(S[k], A[k]) + \alpha \underbrace{\Bigl(\underbrace{R[k+1] + \gamma \max_a Q(S[k+1], a)}_{\text{numerical search target}} - Q(S[k], A[k])\Bigr)}_{\text{error}}    (8)
which, after convergence, gives the maximum reward-to-go Q as an output, with inputs
S[k] and A[k], the current observed state and action. The discount factor γ accounts for
the attenuation of future rewards, and the learning rate α, also a design parameter, is used
to accelerate or slow down convergence. R[k+1] is the reward obtained from applying A[k]
at S[k] and transitioning to S[k+1]. The associated neural network is called the Critic.
Another network, called the Actor, is used to approximate the control policy. It is worth
highlighting that DRL is by design an optimal controller that directly benefits from this
underlying theory, but it differs from other optimal techniques in that it can be trained
forward in time while interacting with the environment, using the plant's real dynamics.
Its performance is shaped by the reward function, as in all other optimal approaches.
Figure 6 shows a block diagram of the Actor and Critic neural networks, corresponding to
the Control Law block in Figures 3 and 4 for the followers and the leader.

Figure 6. Actor and Critic neural networks. Inputs: e(t), d(t) for the followers and e(t), d(t), ds(t) for the leader. The Actor outputs the wheel speeds vl(t) and vr(t) through hidden layers of 40 and 30 units; the Critic estimates the reward-to-go Q through hidden layers of 30 and 40 units.
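A minimal PyTorch sketch of these two networks is given below, using the layer sizes shown in Figure 6 (40 and 30 hidden units for the Actor, 30 and 40 for the Critic); the activation functions, the action scaling, and feeding the state-action pair to the Critic are assumptions of this sketch rather than details reported here.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: state -> wheel speeds (v_l, v_r), bounded by +/- v_max."""
    def __init__(self, state_dim, v_max=0.08):        # state_dim = 2 (followers) or 3 (leader)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 40), nn.ReLU(),
            nn.Linear(40, 30), nn.ReLU(),
            nn.Linear(30, 2), nn.Tanh())               # two actions squashed to [-1, 1]
        self.v_max = v_max

    def forward(self, state):
        return self.v_max * self.net(state)

class Critic(nn.Module):
    """Q network: (state, action) -> estimated reward-to-go Q."""
    def __init__(self, state_dim, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 30), nn.ReLU(),
            nn.Linear(30, 40), nn.ReLU(),
            nn.Linear(40, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# In DDPG, the Critic is regressed towards the target of Equation (8),
#   y = R[k+1] + gamma * Q_target(S[k+1], Actor_target(S[k+1])),
# while the Actor is updated to maximize Q(S[k], Actor(S[k])).
```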

3.1. Reward Function for the Followers


The follower’s DRL is designed to minimize the position error with respect to its
relative location to the leader. The follower’s reward function R f [k + 1] is then designed
by [39]:
R_f[k+1] = -10\,(e_\theta^f)^2[k] - 36\,v_d^f[k]    (9)

where e_\theta^f is the orientation error of the follower, as defined earlier, and v_d^f is
the time rate of change of the distance error d_f, previously defined. This function rewards a smaller
orientation error, which ensures the right orientation towards its target, and at the same
time rewards a larger negative time rate of change of the distance error, by maximizing
the rewards the follower will track its target. The relative weights in the reward functions
for both the followers and the leader were obtained heuristically and empirically after
extensive runs.

3.2. Reward Function for the Leader


The leader's DRL is designed to achieve two concurrent goals: approaching its target
while preventing the formation from losing its structure. The leader's reward function
Rl [k + 1] is designed by [39]:

R_l[k+1] = -10\,(e_\theta^l)^2[k] - 12\,v_d^l[k] - 10\,(d_s)^2[k]    (10)

where e_\theta^l and v_d^l (based on d_l) act in the same way as in the follower case. Based
on Equation (7), the third term incorporates the added distance errors of the followers,
d_s = d_{f_1} + d_{f_2}, setting up a cooperative interaction. This term allows the leader to
moderate its own advance, since a smaller value of d_s is rewarded.
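The two reward functions translate directly into code; in the sketch below, the time rate of change of the distance error is approximated by a finite difference, which is an assumption of this sketch.

```python
def follower_reward(e_theta_f, d_prev, d_curr, dt):
    """Follower reward of Equation (9)."""
    v_d = (d_curr - d_prev) / dt                 # finite-difference rate of the distance error
    return -10.0 * e_theta_f**2 - 36.0 * v_d

def leader_reward(e_theta_l, d_prev, d_curr, dt, d_s):
    """Leader reward of Equation (10); d_s is the summed follower distance errors."""
    v_d = (d_curr - d_prev) / dt
    return -10.0 * e_theta_l**2 - 12.0 * v_d - 10.0 * d_s**2
```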
4. Results
This section shows some results involving different configurations. These include
cooperative and non-cooperative formations, and obstacle avoidance by any or all of
the robots.

4.1. First Scenario: Cooperative vs. Non-Cooperative Formation


Two tests are conducted. First, the leader approaches the target, and the followers
track their formation position without cooperation. Figure 7 shows the initial positions.
The simulation time is around 25 s.

Figure 7. Initial positions: CoppeliaSim environment.

Figure 8 shows the position history of three robots.

Figure 8. Position history without cooperation. Target: blue cross, leader trajectory: orange, follower 1 trajectory: blue, follower 2 trajectory: green. Numbers indicate elapsed time in seconds and arrows represent the initial orientation of each robot.
Figure 9 shows the reward values vs. time.

Figure 9. Reward values for non-cooperative approach with e the orientation error, and d the distance
error. Leader: up, follower 1: centre, follower 2: bottom.

Figure 10 shows the leader wheel speeds. From Figures 8 and 10, it is clear that the
leader did not wait for the followers.

Figure 10. Leader left and right wheel speeds for the non-cooperative approach.

The second test is similar to the first one, but now there is cooperation. This is
bidirectional, with the followers sharing their positions with the leader, and the leader
sharing its position and orientation with the followers. The initial disposition is the same
as that shown in Figure 7. The simulation time is around 45 s.
Figure 11 shows the position history of three robots.
Figure 11. Position history for the cooperative approach. Target: blue cross, leader trajectory: orange,
follower 1 trajectory: blue, follower 2 trajectory: green. Numbers indicate elapsed time in seconds
and arrows represent the initial orientation of each robot.

Figure 12 shows the reward values vs. time.

Figure 12. Reward values for cooperative approach with e the orientation error, d the distance error,
and ds the added followers’ distance errors. Leader: up, follower 1: centre, follower 2: bottom.

Figure 13 shows the leader wheel speeds. From Figures 11 and 13, it is clear that the
leader waited for the followers.
Figure 13. Leader left and right wheel speeds for the cooperative approach.

4.2. Second Scenario: Obstacle Avoidance


While the leader approaches the target, both followers are affected by an obstacle.
Figure 14 shows the initial positions. The simulation time is around 70 s.

Figure 14. Initial positions for the obstacle avoidance scenario.

Figure 15 shows the position history of three robots.



Figure 15. Position history for the cooperative approach to obstacle avoidance. Target: blue cross,
leader and trajectory: orange, follower 1 and trajectory: blue, follower 2 and trajectory: green,
obstacles: light brown. Numbers indicate elapsed time in seconds.

Figure 16 shows the reward values.

Figure 16. Reward values with e the orientation error, d the distance error, and ds the added followers’
distance errors. Leader: up, follower 1: centre, follower 2: bottom.

Figure 17 shows the leader wheel speeds.


From Figures 15 and 17 it is clear that the followers avoided their respective obstacles,
while the leader kept a low speed waiting for them.

Figure 17. Leader left and right wheel speeds.

4.3. Control Laws Performance Comparison


Previous control designs for these robots have used more traditional approaches [23,24].
This section compares the current optimal method with a previous traditional one, the
Villela controller [21]. Two cases are considered for a better comparison: with and
without cooperation.

4.3.1. Non-Cooperative Control Case


In this case, single robots start with similar initial conditions and orientations with
respect to a target point, and execute their algorithms based on their errors. Since both
controllers, DRL and Villela, were designed with two inputs (as are the followers'
controllers), the orientation error e and the distance error d, a graphical representation
is possible. Figures 18 and 19 show their control surfaces and the trajectory taken by the
robot. Figure 20 shows the trajectories in the horizontal plane. These results confirm
previous ones obtained by the authors, where a Q-learning controller applied to the robot
outperformed other classical controllers [19]. These results show a better performance for
the DRL. In general, the advantages of DRL become more apparent for more complex systems,
involving more signals and more involved dynamics.

Figure 18. Follower DRL control surfaces, showing the position history in blue and the starting point in red.

Figure 19. Follower Villela control surfaces, showing the position history in blue and the starting point in red.

These graphic representations allow for a quick assessment of the performance of the
control laws. The sharp shape of the DRL surface follows from its optimal nature, allowing
for more effective control.

Figure 20. Position histories in the horizontal plane moving from right to left, DRL: blue and Villela: red. The black cross represents the target and the black arrow represents the initial orientation of the robot.

4.3.2. Cooperative Formation Control Case


In this case, a test similar to the one performed in Section 4.2 is selected for comparison,
where the followers cooperate with the leader by sharing their positions so that it can wait
for them and keep a tighter formation. The Villela controller is used instead in all three
robots. Figure 21 shows the results with the previous DRL results superimposed (dashed lines).
For a more quantitative comparison of controller performance, the integral of absolute
error (IAE) and the integral of squared error (ISE) are applied to the distance error d from
the robot's position to the target: IAE = \sum |d|\,dt and ISE = \sum d^2\,dt, with dt being
the sample interval.
The results are shown in Table 1.
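These two indices can be accumulated directly from the logged distance error, as in the short sketch below.

```python
def iae_ise(distance_errors, dt):
    """IAE and ISE computed as the discrete sums defined above."""
    iae = sum(abs(d) for d in distance_errors) * dt
    ise = sum(d * d for d in distance_errors) * dt
    return iae, ise
```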

Figure 21. Position history. Target: blue cross, leader and its trajectory: orange, follower 1 and its trajectory: blue, follower 2 and its trajectory: green, obstacles: light brown. Segmented lines are the results for DRL. Numbers indicate elapsed time in seconds and the black arrows represent the initial orientation of each robot.

Table 1. Metrics obtained for both controllers.

Index    Villela    DRL
IAE      32.27      13.93
ISE      28.63       8.91

From Figure 21 and Table 1, it is clear that the DRL outperforms Villela in the
cooperative test.

5. Conclusions
The increasing complexity of autonomous vehicles has exposed the limitations of many
existing control systems. Reinforcement learning (RL) is emerging as a promising solution
to these challenges, enabling agents to learn and enhance their performance through
interaction with the environment. Unlike traditional control algorithms, RL facilitates
autonomous learning via a recursive process that can be fully simulated, thereby preventing
potential damage to the actual robot. This paper presents the design and development of
an RL-based algorithm for controlling the collaborative formation of a multi-agent Khepera
IV mobile robot system as it navigates toward a target while avoiding obstacles in the
environment by using onboard infrared sensors. This study evaluates the proposed RL
approach against traditional control laws within a simulated environment using Coppelia
Robotics. The results show that the performance of the RL algorithm is qualitatively
superior to that of traditional approaches while simplifying parameter adjustment. Non-cooperative
and cooperative results show better performance using DRL compared with a classical
controller. Both the leader and followers demonstrated more efficient target tracking with
smaller errors. This was seen qualitatively in the position history graphs, and reflected also
in the metrics for accumulated errors for the cooperative case.

Author Contributions: Conceptualization, G.G. and G.F.; methodology, G.G.; software, G.G.; vali-
dation, H.V., G.F. and E.F.; formal analysis, G.G.; investigation, G.G.; resources, G.F. and H.V.; data
curation, G.G.; writing—original draft preparation, E.F., G.F. and G.G.; writing—review and editing,
G.F., H.V. and E.F.; supervision, A.E. and H.V.; project administration, A.E., G.F. and H.V.; funding
acquisition, G.F. and E.F. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded, in part, by the Chilean Research and Development Agency
(ANID) under Project FONDECYT 1191188; the Ministry of Science and Innovation of Spain under
Project PID2022-137680OB-C32; and the Agencia Estatal de Investigación of Spain (AEI) under
Project PID2022-139187OB-I00.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Data are contained within the article.

Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Mitchell, T.M. Machine Learning; McGraw-Hill: New York, NY, USA, 1997.
2. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson: London, UK, 2016.
3. McCarthy, J. What Is Artificial Intelligence? 2007. Available online: http://jmc.stanford.edu/articles/whatisai/whatisai.pdf
(accessed on 1 December 2024).
4. Domingos, P. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World; Basic Books: New York,
NY, USA, 2015.
5. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
6. Wang, X.; Zhao, Y.; Pourpanah, F. Recent Advances in Deep Learning. Int. J. Mach. Learn. Cybern. 2020, 11, 385–400. [CrossRef]
7. Talaei Khoei, T.; Ould Slimane, H.; Kaabouch, N. Deep Learning: Systematic Review, Models, Challenges, and Research Directions.
Neural Comput. Appl. 2023, 35, 23103–23124. [CrossRef]
8. Mienye, I.D.; Swart, T.G. A Comprehensive Review of Deep Learning: Architectures, Recent Advances, and Applications.
Information 2024, 15, 755. [CrossRef]
9. Ghasemi, M.; Mousavi, A.H.; Ebrahimi, D. Comprehensive Survey of Reinforcement Learning: From Algorithms to Practical
Challenges. arXiv 2024, arXiv:2411.18892. [CrossRef]
10. Wong, A.; Bäck, T.; Kononova, A.V.; Plaat, A. Deep multiagent reinforcement learning: Challenges and directions. Artif. Intell.
Rev. 2023, 56, 5023–5056. [CrossRef]
11. Liu, W.; Zhang, Y.; Chen, X. Recent advances in deep learning models: A systematic review. Multimed. Tools Appl. 2023,
82, 15295–15320. [CrossRef]
12. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges,
Solutions and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [CrossRef]
13. Gronauer, S.; Diepold, K. Multi-agent deep reinforcement learning: A survey. Artif. Intell. Rev. 2022, 55, 895–943. [CrossRef]
14. Dutta, A.; Orr, J. Kernel-based Multiagent Reinforcement Learning for Near-Optimal Formation Control of Mobile Robots. Appl.
Intell. 2022, 52, 12345–12360. [CrossRef]
15. Dawood, M.; Pan, S.; Dengler, N.; Zhou, S.; Schoellig, A.P.; Bennewitz, M. Safe Multi-Agent Reinforcement Learning for Formation
Control without Individual Reference Targets. arXiv 2023, arXiv:2312.12861. [CrossRef]
16. Mukherjee, S. Formation Control of Multi-Agent Systems. Master’s Thesis, University of North Texas: Denton, TX,
USA, 2017.
17. Nagahara, M.; Azuma, S.I.; Ahn, H.S. Formation Control. In Control of Multi-Agent Systems; Springer: Berlin/Heidelberg,
Germany, 2024; pp. 113–158. [CrossRef]
18. Farias, G.; Fabregas, E.; Peralta, E.; Vargas, H.; Dormido-Canto, S.; Dormido, S. Development of an Easy-to-Use Multi-Agent
Platform for Teaching Mobile Robotics. IEEE Access 2019, 7, 55885–55897. [CrossRef]
19. Farias, G.; Garcia, G.; Montenegro, G.; Fabregas, E.; Dormido-Canto, S.; Dormido, S. Reinforcement Learning for Position Control
Problem of a Mobile Robot. IEEE Access 2020, 8, 152941–152951. [CrossRef]
20. Quiroga, F.; Hermosilla, G.; Farias, G.; Fabregas, E.; Montenegro, G. Position Control of a Mobile Robot through Deep
Reinforcement Learning. Appl. Sci. 2022, 12, 7194. [CrossRef]
21. Gonzalez-Villela, V.; Parkin, R.; Lopez, M.; Dorador, J.; Guadarrama, M. A wheeled mobile robot with obstacle avoidance
capability. Ing. Mecánica. Tecnol. Y Desarro. 2004, 1, 150–159.
22. Baillieul, J. The geometry of sensor information utilization in nonlinear feedback control of vehicle formations. In Proceedings of
the Cooperative Control: A Post-Workshop Volume 2003 Block Island Workshop on Cooperative Control; Springer: Berlin/Heidelberg,
Germany 2005; pp. 1–24. [CrossRef]
23. Siegwart, R.; Nourbakhsh, I.R.; Scaramuzza, D. Introduction to Autonomous Mobile Robots; MIT Press: Cambridge, MA,
USA, 2011.
24. Fabregas, E.; Farias, G.; Aranda-Escolástico, E.; Garcia, G.; Chaos, D.; Dormido-Canto, S.; Bencomo, S.D. Simulation and
Experimental Results of a New Control Strategy For Point Stabilization of Nonholonomic Mobile Robots. IEEE Trans. Ind. Electron.
2020, 67, 6679–6687. [CrossRef]
25. Rohmer, E.; Singh, S.P.N.; Freese, M. V-REP: A Versatile and Scalable Robot Simulation Framework. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 3–7 November 2013. [CrossRef]
26. Farias, G.; Fabregas, E.; Peralta, E.; Torres, E.; Dormido, S. A Khepera IV library for robotic control education using V-REP.
IFAC-PapersOnLine 2017, 50, 9150–9155. [CrossRef]
27. Ma, Y.; Cocquempot, V.; El Najjar, M.E.B.; Jiang, B. Actuator failure compensation for two linked 2WD mobile robots based on
multiple-model control. Int. J. Appl. Math. Comput. Sci. 2017, 27, 763–776. [CrossRef]
28. Morales, G.; Alexandrov, V.; Arias, J. Dynamic model of a mobile robot with two active wheels and the design an optimal
control for stabilization. In Proceedings of the 2012 IEEE Ninth Electronics, Robotics and Automotive Mechanics Conference,
Cuernavaca, Mexico, 19–23 November 2012; pp. 219–224. [CrossRef]
29. Fabregas, E.; Farias, G.; Dormido-Canto, S.; Guinaldo, M.; Sánchez, J.; Dormido Bencomo, S. Platform for teaching mobile robotics.
J. Intell. Robot. Syst. 2016, 81, 131–143. [CrossRef]
30. Shayestegan, M.; Marhaban, M.H. A Braitenberg Approach to Mobile Robot Navigation in Unknown Environments. In
Proceedings of the Trends in Intelligent Robotics, Automation, and Manufacturing, Kuala Lumpur, Malaysia, 28–30 November
2012; pp. 75–93. [CrossRef]
31. Gogoi, B.J.; Mohanty, P.K. Path Planning of E-puck Mobile Robots Using Braitenberg Algorithm. In Proceedings of the International
Conference on Artificial Intelligence and Sustainable Engineering; Springer: Berlin/Heidelberg, Germany, 2022; pp. 139–150. [CrossRef]
32. Dorri, A.; Kanhere, S.S.; Jurdak, R. Multi-agent systems: A survey. IEEE Internet Things J. 2018, 6, 285–298. [CrossRef]
33. Brambilla, M.; Ferrante, E.; Birattari, M.; Dorigo, M. Swarm robotics: A review from the swarm engineering perspective. Swarm
Intell. 2013, 7, 1–41. [CrossRef]
34. Osooli, H.; Robinette, P.; Jerath, K.; Ahmadzadeh, S.R. A Multi-Robot Task Assignment Framework for Search and Rescue with
Heterogeneous Teams. arXiv 2023, arXiv:2309.12589v1. [CrossRef]
35. Lawton, J.; Beard, R.; Young, B. A decentralized approach to formation maneuvers. IEEE Trans. Robot. Autom. 2003, 19, 933–941.
[CrossRef]
36. Oh, K.K.; Park, M.C.; Ahn, H.S. A survey of multi-agent formation control. Automatica 2015, 53, 424–440. [CrossRef]
37. Oroojlooy, J.; Snyder, L.V. A Review of Cooperative Multi-Agent Deep Reinforcement Learning. arXiv 2021, arXiv:2106.15691.
[CrossRef]
38. Morales, M. Grokking Deep Reinforcement Learning; Co., Manning Publications: Shelter Island, NY, USA, 2020.
39. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found.
Trends Mach. Learn. 2018, 11, 219–354. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
