Cooperative Formation Control of A Multi-Agent Khepera IV Mobile Robots System Using Deep Reinforcement Learning
1 College of Engineering, Virginia Commonwealth University, 601 W Main St., Richmond, VA 23220, USA;
[email protected] (G.G.); [email protected] (A.E.)
2 Departamento de Informática y Automática, Universidad Nacional de Educación a Distancia (UNED),
Juan del Rosal 16, 28040 Madrid, Spain; [email protected]
3 Escuela de Ingenieria Electrica, Pontificia Universidad Catolica de Valparaiso, Av. Brasil 2147,
Valparaiso 2362804, Chile; [email protected]
* Correspondence: [email protected]
Abstract: The increasing complexity of autonomous vehicles has exposed the limitations of
many existing control systems. Reinforcement learning (RL) is emerging as a promising so-
lution to these challenges, enabling agents to learn and enhance their performance through
interaction with the environment. Unlike traditional control algorithms, RL facilitates au-
tonomous learning via a recursive process that can be fully simulated, thereby preventing
potential damage to the actual robot. This paper presents the design and development of
an RL-based algorithm for controlling the collaborative formation of a multi-agent Khep-
era IV mobile robot system as it navigates toward a target while avoiding obstacles in
the environment by using onboard infrared sensors. This study evaluates the proposed
RL approach against traditional control laws within a simulated environment using the
CoppeliaSim simulator. The results show that the RL algorithm yields a sharper control
law than traditional approaches, without requiring manual adjustment of the control
parameters.
learn and adapt their behaviours by interacting with their environment and each other [15].
The primary objective is to achieve and maintain specific formations, crucial for appli-
cations ranging from unmanned aerial vehicle (UAV) swarms to autonomous vehicle
platoons [16,17]. By addressing the complexities of coordination, communication, and
dynamic environments, DL offers a promising solution to enhance the efficiency and
robustness of multi-robot systems.
During the last few years, our research has focused on the position control of mobile
robots [18], leveraging advanced deep reinforcement learning algorithms. Our previous
work [19,20] details the design, development, and implementation of reinforcement learn-
ing algorithms for controlling the position of a wheeled mobile robot Khepera IV. These
approaches facilitate the learning of optimal control policies through environmental inter-
action, resulting in significantly improved performance compared with traditional control
methods [21–24]. Our experiments, conducted both in simulation and real-world envi-
ronments, demonstrate the effectiveness of our algorithms in achieving precise position
control, even in the presence of obstacles.
This work presents innovative models that enable autonomous agents to learn and
adapt their behaviours using deep reinforcement learning (DRL). These models show the
potential of this approach in applications such as UAV swarms and autonomous vehicle
platoons. This research addresses the complexities of coordination and communication
required to maintain precise formations. Using DRL, the models allow multiple agents
to process large amounts of data in real-time. This enables them to make better decisions
and adjust their positions dynamically. This capability ensures the agents can operate
cohesively, maintaining their formation even in dynamic and unpredictable environments.
The enhanced coordination and communication lead to increased efficiency and robustness
of multi-robot systems, making them more reliable and effective in various scenarios. This
work’s findings could significantly improve autonomous systems in fields ranging from
transportation and surveillance to many others.
The main contributions of this work are as follows:
1. Design and implementation: The design and implementation of a control law for the
formation position control of a group of robots based on DRL. This control law helps
robots maintain precise formations and adapt to their surroundings, improving on
the results presented in [18], where classical controllers were used.
2. Simulation environment: The implementation of the proposed algorithm in a simula-
tion environment, including obstacle avoidance. This enables thorough testing and
validation of the effectiveness of the algorithm in maintaining formation and avoiding
obstacles. The obstacle avoidance logic presented in [19] was expanded to all robots
in the formation.
3. Comparison with traditional position control approaches: Comparison of the results
of the new approach against existing control laws under similar conditions. This
comparison highlights the advantages and improvements offered by the RL-based
method. This work expands on [20] by experimenting with a more explicit reward
function for faster target tracking.
4. Performance evaluation: The evaluation of the performance of the proposed control
law using its control surfaces. This provides a quantitative assessment of the
algorithm’s efficiency, robustness, and adaptability. Metrics similar to those used in [19]
were selected for a more accurate comparison.
The remainder of this paper is organized as follows: Section 2 presents the environment
where the experiments will take place, as well as some theoretical aspects of the position
control problem, the formation control problem, the obstacle avoidance approach, and
the multi-agent system. Section 3 describes the proposed approach: formation control
with deep reinforcement learning. Section 4 shows and discusses some simulation and
experimental results of this research. Finally, Section 5 presents the main conclusions and
future work.
2. Background
2.1. Simulation Environment—CoppeliaSim Simulator
CoppeliaSim simulator (formerly V-REP) [25] is a useful tool for the development of 3D
simulations, based on high-fidelity physics-based models. As an integrated development
environment (IDE), it allows for a distributed control architecture where each object/model
can be individually controlled using embedded scripts, plug-ins, remote application pro-
gram interface (API) clients, Robot Operating System (ROS) nodes, or custom solutions.
On the other hand, control algorithms can be written in several programming languages,
including Lua, C/C++, Python, MATLAB, Java, and others. The simulator offers numerous
example models of robots, sensors, and actuators that can be used to create and interact with
a virtual world in real time. Additionally, it allows for adding new objects with dynamic properties for
designing and constructing new robots. In particular, a model for the Khepera IV robot
has already been put together and developed for the CoppeliaSim simulator. In previous
works [26] (see Figure 1), the authors developed a library to include the Khepera IV robot
model in CoppeliaSim. In this work, the model of the Khepera IV is used to test the
experiments in the CoppeliaSim environment.
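As an illustration of how such an experiment can be driven from an external script, the following minimal Python sketch connects to a running CoppeliaSim scene through the legacy remote API and commands the wheel motors of a robot model. The port and the motor object names ('K4_leftWheelMotor', 'K4_rightWheelMotor') are assumptions and must be adapted to the actual scene.

```python
# Minimal sketch: commanding a robot in a CoppeliaSim scene via the legacy remote API.
# Assumes a remote API server is running on port 19999 and that the wheel joints are
# named 'K4_leftWheelMotor' and 'K4_rightWheelMotor' (adapt to the scene at hand).
import sim  # legacy CoppeliaSim remote API Python bindings

client_id = sim.simxStart('127.0.0.1', 19999, True, True, 5000, 5)
if client_id == -1:
    raise RuntimeError('Could not connect to CoppeliaSim')

# Handles of the two wheel motors.
_, left_motor = sim.simxGetObjectHandle(client_id, 'K4_leftWheelMotor', sim.simx_opmode_blocking)
_, right_motor = sim.simxGetObjectHandle(client_id, 'K4_rightWheelMotor', sim.simx_opmode_blocking)

# Send wheel angular velocities [rad/s] computed by any control law.
sim.simxSetJointTargetVelocity(client_id, left_motor, 2.0, sim.simx_opmode_oneshot)
sim.simxSetJointTargetVelocity(client_id, right_motor, 2.0, sim.simx_opmode_oneshot)

sim.simxFinish(client_id)
```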
where (xc, yc) is the robot’s position, and θ is the heading direction angle, perpendicular
to the turning radius (R). The linear and angular velocities of the robot are obtained
from V = (vL + vR)/2 and ω = (vL − vR)/L, respectively, with L being the distance
between the wheels. The angular velocity is defined with respect to the Instantaneous Center
of Curvature (ICC). The robot has a maximum linear velocity Vmax, a maximum angular
velocity ωmax, and a minimum turning radius Rmin. It can only move forward or backward
in the heading direction; this is known as a non-holonomic constraint [24].
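These kinematic relations can be captured in two small helper functions; the Python sketch below reproduces the conversions stated above, with the wheelbase L passed as a parameter.

```python
# Differential-drive kinematic relations used in this section.
def body_velocities(v_left: float, v_right: float, wheelbase: float):
    """Return (V, omega) from the left/right wheel linear velocities."""
    v = (v_left + v_right) / 2.0            # V = (vL + vR)/2
    omega = (v_left - v_right) / wheelbase  # omega = (vL - vR)/L
    return v, omega


def wheel_velocities(v: float, omega: float, wheelbase: float):
    """Inverse relation: left/right wheel velocities from (V, omega)."""
    v_left = (2.0 * v + omega * wheelbase) / 2.0   # vL = (2V + omega*L)/2
    v_right = (2.0 * v - omega * wheelbase) / 2.0  # vR = (2V - omega*L)/2
    return v_left, v_right
```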
Figure 2. Position control variables for the differential robot.
The kinematic behavior of these robots appears simple, but non-holonomic constraints
pose a challenge in control law design. Previous works detail this issue [24,29]. In typical
motion, the differential robot follows a circular trajectory with radius R and center ICC.
The position control algorithm aims to minimize orientation error (e = α − θ), where α is
the angle to the target, and θ is robot orientation. Simultaneously, the robot decreases the
distance to the target point (d → 0).
Figure 3 illustrates the position control algorithm, with the inner square as the con-
troller and the outer square as the robot. The Position Sensor is an IPS (Indoor Positioning
System), providing the absolute position and orientation of the robot [18].
Figure 3. Block diagram of the position control problem, with the Compute, Control Law, Motors,
and Position Sensor blocks.
Equation (2) gives the distance d, and Equation (3) computes the angle to the target
point α, using the coordinates of T p( x p , y p ) and C ( xc , yc ). Both equations are part of the
Compute block [24].
d = \sqrt{(y_p - y_c)^2 + (x_p - x_c)^2} \quad (2)

\alpha = \tan^{-1}\!\left(\frac{y_p - y_c}{x_p - x_c}\right) \quad (3)
The algorithm calculates the wheel speeds required to reach the destination point
based on distance and angle. This is performed using the block labeled Control Law
(in light red color), which can be designed in different ways. One approach, known as
Villela [21], generates control signals V and ω using the following equations:
V(t) = \begin{cases} V_{max} & \text{if } d > k_r \\ \dfrac{d}{k_r}\, V_{max} & \text{if } d \le k_r \end{cases} \quad (4)
with Vmax and ωmax as defined before, and kr, a docking area radius. This docking area
allows the robot to approach the target quickly when it is far away and to slow down as it
gets close. From the robot’s velocities, the wheels’ (left and right) velocities are obtained
from the relations vL = (2V + ωL)/2 and vR = (2V − ωL)/2.
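A compact sketch of this position controller is shown below: it computes d and α from Equations (2) and (3), applies the linear-velocity law of Equation (4), and converts (V, ω) into wheel speeds. Since the angular-velocity law is not reproduced in this section, the sketch uses a saturated proportional law on the orientation error as a stand-in, and Vmax, ωmax, and kr are assumed tuning parameters.

```python
import math

def position_control_step(xc, yc, theta, xp, yp, v_max, w_max, k_r, wheelbase):
    """One step of a Villela-style position controller (sketch).

    The linear-velocity law follows Equation (4); the angular-velocity law is a
    stand-in (saturated proportional control on e = alpha - theta), since the
    corresponding equation is not reproduced in this excerpt."""
    d = math.hypot(xp - xc, yp - yc)       # Equation (2)
    alpha = math.atan2(yp - yc, xp - xc)   # Equation (3), quadrant-aware
    e = math.atan2(math.sin(alpha - theta), math.cos(alpha - theta))  # wrap to [-pi, pi]

    v = v_max if d > k_r else (d / k_r) * v_max  # Equation (4): slow down in the docking area
    omega = max(-w_max, min(w_max, 2.0 * e))     # assumed stand-in angular-velocity law

    v_left = (2.0 * v + omega * wheelbase) / 2.0
    v_right = (2.0 * v - omega * wheelbase) / 2.0
    return v_left, v_right
```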
Figure 4. Block diagram showing the position control problem with obstacle avoidance.
The Braitenberg algorithm uses sensor inputs (such as infrared or ultrasonic sensors)
to control the motors of a robot. The basic idea is to connect the sensors to the motors in
such a way that the robot can react to its environment. Broadly speaking, when the left sensor
detects an obstacle, it reduces the speed of the right motor, causing the robot to turn right,
away from the obstacle. Similarly, when the right sensor detects an obstacle, it reduces the
speed of the left motor, causing the robot to turn left. This simple mechanism allows the robot to navigate
around obstacles effectively. The adjustment between the sensors and the motors can be
empirically calibrated by conducting a series of tests in different environments to observe
the robot’s behavior. This helps in fine-tuning the sensor thresholds and motor responses
to ensure smooth and effective obstacle avoidance [31].
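The cross-coupling between sensors and motors described above can be written in a few lines. The sketch below assumes normalized proximity readings (1 meaning an obstacle is very close) and hand-tuned weight vectors, both of which are illustrative placeholders to be calibrated empirically as discussed.

```python
# Braitenberg-style obstacle avoidance (sketch). Each proximity sensor adds a
# weighted correction to the nominal wheel speeds; the weights are illustrative
# placeholders that would be calibrated empirically, as discussed above.
def braitenberg(v_left, v_right, detections, weights_left, weights_right):
    """detections[i] in [0, 1]: 1 means an obstacle is very close to sensor i.
    weights_left/weights_right: per-sensor corrections for the left/right wheel.
    A left-side sensor gets a negative right-wheel weight so the robot turns right."""
    for det, w_l, w_r in zip(detections, weights_left, weights_right):
        v_left += w_l * det
        v_right += w_r * det
    return v_left, v_right


# Example (assumed layout): 4 front sensors ordered leftmost to rightmost,
# nominal speed 0.05 m/s per wheel.
# v_l, v_r = braitenberg(0.05, 0.05, ir_readings,
#                        weights_left=[0.0, 0.0, -0.04, -0.06],
#                        weights_right=[-0.06, -0.04, 0.0, 0.0])
```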
with other agents. This approach offers several advantages, including increased robustness,
scalability, and flexibility [32].
In cooperative multi-agent systems, agents work together to maximize a shared
objective. This requires effective communication and coordination among agents to ensure
that their actions are aligned toward the common goal. Examples of cooperative MASs
include swarm robotics [33], where multiple robots collaborate to perform tasks such as
search and rescue [34], environmental monitoring, and formation control.
To implement cooperation among agents, the interaction can be established in two
different ways: (1) all agents have the same hierarchy/rank, and (2) there is a leader or
master and the rest are followers. In the first approach, known as egalitarian cooperation,
all agents collaborate equally, sharing information and decisions to achieve a common
goal. This model promotes robustness and flexibility, as it does not rely on a single point
of failure.
In the second approach, known as hierarchical cooperation, a leader or master agent
coordinates the actions of the other agents, who act as followers. This model can be more
efficient in terms of decision-making and coordination, especially in complex tasks that
require centralized direction. Both approaches have their advantages and challenges, and
the choice between them depends on the nature of the task and the environment in which
the agents operate. To make a formation with the leader-follower approach, the following
equations can be taken into account [35], where N is the number of followers:
v_m(t) = K_p E_{pm}(t) - K_f E_f(t) \quad (6)

d_s(t) = \sum_{i=1}^{N} d_{f_i}(t) \quad (7)
The followers use the position of the leader as a target point, together with a desired
offset that defines their position in the formation relative to the leader. These
equations help in designing the control strategy for the followers to maintain a specific
formation relative to the leader. The velocity of the leader (vm(t)) depends on two terms
(Equation (6)): the first, Kp Epm(t), combines its position error (Epm(t)) with a control gain (Kp)
that weights the importance of this error in the control. The second term (−Kf Ef(t)) represents
the sum of the position errors of the followers (Ef) in the formation (Equation (7)), multiplied
by a control gain (Kf). By adjusting the control gains (Kp, Kf) and the desired positions, the
formation can be maintained dynamically as the leader moves.
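The cooperative leader law described above can be sketched as follows, with Epm(t) taken as the leader's own distance-to-target error and dfi(t) the distance of follower i to its reference slot (Equation (7)); the gains and the saturation to the velocity limits are assumptions for illustration.

```python
def leader_speed(e_pm, follower_errors, k_p, k_f, v_max):
    """Cooperative leader speed, sketching Equations (6) and (7).

    e_pm: leader position error; follower_errors: list of d_fi values, the distance
    of each follower to its slot in the formation; k_p, k_f: control gains."""
    d_s = sum(follower_errors)        # Equation (7): added follower distance errors
    v_m = k_p * e_pm - k_f * d_s      # Equation (6): the leader slows down when followers lag
    return max(0.0, min(v_max, v_m))  # saturate to the robot's velocity limits (assumed)
```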
Figure 5 shows a representation of non-cooperative multi-agent systems [36] from
points A to C. The leader robot is represented in red and the followers in black. The dotted
lines represent the distances between the leader and the follower robots.
The experiment starts at point A, where the robots are not yet in formation; the formation
is only reached at point C. All the robots move from the initial position to point B (dashed
lines); as can be seen, the leader does not wait for the followers and the formation is not
maintained. Then, the leader arrives at point C and the followers take some time to reach
the final formation at that point. On the other hand, an example of cooperative multi-agent
systems is shown from point C to E, where the robots maintain the formation during the maneuver.
Figure 5. Example of non-cooperative (points A to C) and cooperative (points C to E) formation maneuvers.
3. Proposed Approach
Based on the leader/follower configuration, two different DRL controllers are designed.
While the leader’s goal is to approach the target taking into consideration the location of
the followers, the aim of the followers is to keep a fixed location relative to the leader for
a given geometric formation. These relative reference positions are located on each side
of the leader at some relative angle and distance. Both DRL controllers output the linear
velocities of the left and right wheels, vL and vR, according to Figure 3 or Figure 4.
The Deep Deterministic Policy Gradient (DDPG) algorithm is selected for the DRL controllers.
It is derived from the Q-learning and Deep Q-Network (DQN) formulations, extended to
continuous state and action spaces, and is based on the following recursive Bellman
optimality update [38]:

Q(S[k], A[k]) \leftarrow Q(S[k], A[k]) + \alpha \left( R[k+1] + \gamma \max_{A} Q(S[k+1], A) - Q(S[k], A[k]) \right)
which, after convergence, gives the maximum reward-to-go Q as an output with inputs
S[k] and A[k], the current observed state and action. The discount factor γ accounts for
future attenuation, and learning rate α, also a design parameter, is used for accelerating or
slowing down the convergence. R[k + 1] is the reward obtained from applying A[k ] at S[k ]
and transitioning to S[k + 1]. The associated neural network is called the Critic. Another
network is used to approximate the control policy, and it is called the Actor. It is worth
highlighting that DRL is by design an optimal controller that directly benefits from this
underlying theory, but it differs from other optimal techniques in that it can be trained
forward in time while interacting with the environment, using the plant’s real dynamics.
Its performance is ultimately shaped by the reward function, as in all other optimal approaches.
Figure 6 shows a block diagram of the Actor and Critic neural networks, corresponding to
the Control Law block in Figures 3 and 4 for the followers and the leader.
Figure 6. Actor and Critic neural networks.
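A minimal PyTorch sketch of such Actor and Critic networks is given below; the hidden-layer sizes, activations, and the two-dimensional observation (e, d) and action (vL, vR) are illustrative assumptions rather than the exact architecture used in this work.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the observed state (e.g., orientation error e and distance d)
    to a deterministic action (left/right wheel velocities)."""
    def __init__(self, state_dim=2, action_dim=2, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 40), nn.ReLU(),
            nn.Linear(40, 30), nn.ReLU(),
            nn.Linear(30, action_dim), nn.Tanh(),  # bounded actions, scaled below
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)


class Critic(nn.Module):
    """Estimates Q(s, a), the reward-to-go for a state-action pair."""
    def __init__(self, state_dim=2, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 30), nn.ReLU(),
            nn.Linear(30, 40), nn.ReLU(),
            nn.Linear(40, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```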
where eθl and vld (based on dl) act similarly for the follower case. Based on
Equation (7), the third term incorporates the added distance errors of the followers,
ds = df1 + df2, setting up a cooperative interaction. This addition allows the leader to
modulate its own advance, being rewarded for a smaller value of ds.
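Since the full reward expressions are not reproduced in this excerpt, the sketch below only illustrates their structure as described above: the orientation error, the distance error and, for the leader, the added follower distances ds are penalized, with all weights being illustrative assumptions.

```python
def leader_reward(e, d, d_s, w_e=1.0, w_d=1.0, w_s=1.0):
    """Illustrative cooperative leader reward (not the exact expression used here):
    smaller orientation error e, distance d, and added follower distances d_s all
    increase the reward."""
    return -(w_e * abs(e) + w_d * d + w_s * d_s)


def follower_reward(e_theta_l, d_l, w_e=1.0, w_d=1.0):
    """Illustrative follower reward: penalizes the heading error and the distance
    to the follower's reference slot relative to the leader."""
    return -(w_e * abs(e_theta_l) + w_d * d_l)
```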
4. Results
This section shows some results involving different configurations. These include
cooperative and non-cooperative formations, and obstacle avoidance by any or all of
the robots.
Figure 8. Position history for the non-cooperative approach; numbers indicate elapsed time
in seconds.
Figure 9. Reward values for non-cooperative approach with e the orientation error, and d the distance
error. Leader: up, follower 1: centre, follower 2: bottom.
Figure 10 shows the leader wheel speeds. From Figures 8 and 10, it is clear that the
leader did not wait for the followers.
Figure 10. Leader left and right wheel speeds for the non-cooperative approach.
The second test is similar to the first one, but now there is cooperation. This is bi-
directional, with the followers sharing their positions with the leader, and the leader sharing
its position and orientation with the followers. The initial disposition is the same as
shown in Figure 7. The simulation time is around 45 s.
Figure 11 shows the position history of three robots.
Figure 11. Position history for the cooperative approach. Target: blue cross, leader trajectory: orange,
follower 1 trajectory: blue, follower 2 trajectory: green. Numbers indicate elapsed time in seconds
and arrows represent the initial orientation of each robot.
Figure 12. Reward values for cooperative approach with e the orientation error, d the distance error,
and ds the added followers’ distance errors. Leader: up, follower 1: centre, follower 2: bottom.
Figure 13 shows the leader wheel speeds. From Figures 11 and 13, it is clear that the
leader waited for the followers.
Figure 13. Leader left and right wheel speeds for the cooperative approach.
Figure 15. Position history for the cooperative approach to obstacle avoidance. Target: blue cross,
leader and trajectory: orange, follower 1 and trajectory: blue, follower 2 and trajectory: green,
obstacles: light brown. Numbers indicate elapsed time in seconds.
Figure 16. Reward values with e the orientation error, d the distance error, and ds the added followers’
distance errors. Leader: up, follower 1: centre, follower 2: bottom.
Figure 18. Follower DRL control surfaces, showing the position history in blue and the starting point in red.
Figure 19. Follower Villela control surfaces, showing the position history in blue and the starting
point in red.
These graphic representations allow for a quick assessment of the performance of the
control laws. The sharper surface of the DRL controller follows from its optimal nature,
allowing for more effective control.
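One way to produce such control surfaces is to sweep the controller over a grid of distance and orientation errors and record the commanded wheel speed. The sketch below assumes a trained actor network like the one sketched in Section 3, the (e, d) state convention, and matplotlib for plotting; all of these are assumptions for illustration.

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection)

def plot_control_surface(actor, d_max=2.0, n=50):
    """Evaluate a trained actor over a grid of (orientation error, distance) states
    and plot the commanded left-wheel speed as a control surface (sketch)."""
    d_grid = np.linspace(0.0, d_max, n)
    e_grid = np.linspace(-np.pi, np.pi, n)
    D, E = np.meshgrid(d_grid, e_grid)
    states = torch.tensor(np.stack([E.ravel(), D.ravel()], axis=1), dtype=torch.float32)
    with torch.no_grad():
        actions = actor(states).numpy()   # columns assumed to be (vL, vR)
    v_left = actions[:, 0].reshape(D.shape)

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    ax.plot_surface(D, E, v_left)
    ax.set_xlabel('d [m]')
    ax.set_ylabel('e [rad]')
    ax.set_zlabel('vL [m/s]')
    plt.show()
```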
Figure 20. Position histories in the horizontal plane moving from right to left, DRL: blue and Villela:
red. The black cross represents the target and the black arrow represents the initial orientation of
the robot.
Figure 21. Position histories in the horizontal plane for the cooperative test; numbers indicate
elapsed time in seconds.
From Figure 21 and Table 1, it is clear that DRL outperforms Villela during the
cooperative test.
5. Conclusions
The increasing complexity of autonomous vehicles has exposed the limitations of many
existing control systems. Reinforcement learning (RL) is emerging as a promising solution
to these challenges, enabling agents to learn and enhance their performance through
interaction with the environment. Unlike traditional control algorithms, RL facilitates
autonomous learning via a recursive process that can be fully simulated, thereby preventing
potential damage to the actual robot. This paper presents the design and development of
an RL-based algorithm for controlling the collaborative formation of a multi-agent Khepera
IV mobile robot system as it navigates toward a target while avoiding obstacles in the
environment by using onboard infrared sensors. This study evaluates the proposed RL
approach against traditional control laws within a simulated environment using the
CoppeliaSim simulator. The results show that the performance of the RL algorithm is
qualitatively superior to that of traditional approaches while simplifying parameter
adjustment. Non-cooperative
and cooperative results show better performance using DRL compared with a classical
controller. Both the leader and followers demonstrated more efficient target tracking with
smaller errors. This was seen qualitatively in the position history graphs, and reflected also
in the metrics for accumulated errors, for the cooperative case.
Author Contributions: Conceptualization, G.G. and G.F.; methodology, G.G.; software, G.G.; vali-
dation, H.V., G.F. and E.F.; formal analysis, G.G.; investigation, G.G.; resources, G.F. and H.V.; data
curation, G.G.; writing—original draft preparation, E.F., G.F. and G.G.; writing—review and editing,
G.F., H.V. and E.F.; supervision, A.E. and H.V.; project administration, A.E., G.F. and H.V.; funding
acquisition, G.F. and E.F. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded, in part, by the Chilean Research and Development Agency
(ANID) under Project FONDECYT 1191188, by the Ministry of Science and Innovation of Spain under
Project PID2022-137680OB-C32, and by the Agencia Estatal de Investigación of Spain (AEI) under
Project PID2022-139187OB-I00.
References
1. Mitchell, T.M. Machine Learning; McGraw-Hill: New York, NY, USA, 1997.
2. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson: London, UK, 2016.
3. McCarthy, J. What Is Artificial Intelligence? 2007. Available online: http://jmc.stanford.edu/articles/whatisai/whatisai.pdf
(accessed on 1 December 2024).
4. Domingos, P. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World; Basic Books: New York,
NY, USA, 2015.
5. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
6. Wang, X.; Zhao, Y.; Pourpanah, F. Recent Advances in Deep Learning. Int. J. Mach. Learn. Cybern. 2020, 11, 385–400. [CrossRef]
7. Talaei Khoei, T.; Ould Slimane, H.; Kaabouch, N. Deep Learning: Systematic Review, Models, Challenges, and Research Directions.
Neural Comput. Appl. 2023, 35, 23103–23124. [CrossRef]
8. Mienye, I.D.; Swart, T.G. A Comprehensive Review of Deep Learning: Architectures, Recent Advances, and Applications.
Information 2024, 15, 755. [CrossRef]
9. Ghasemi, M.; Mousavi, A.H.; Ebrahimi, D. Comprehensive Survey of Reinforcement Learning: From Algorithms to Practical
Challenges. arXiv 2024, arXiv:2411.18892. [CrossRef]
10. Wong, A.; Bäck, T.; Kononova, A.V.; Plaat, A. Deep multiagent reinforcement learning: Challenges and directions. Artif. Intell.
Rev. 2023, 56, 5023–5056. [CrossRef]
11. Liu, W.; Zhang, Y.; Chen, X. Recent advances in deep learning models: A systematic review. Multimed. Tools Appl. 2023,
82, 15295–15320. [CrossRef]
12. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges,
Solutions and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [CrossRef]
13. Gronauer, S.; Diepold, K. Multi-agent deep reinforcement learning: A survey. Artif. Intell. Rev. 2022, 55, 895–943. [CrossRef]
14. Dutta, A.; Orr, J. Kernel-based Multiagent Reinforcement Learning for Near-Optimal Formation Control of Mobile Robots. Appl.
Intell. 2022, 52, 12345–12360. [CrossRef]
15. Dawood, M.; Pan, S.; Dengler, N.; Zhou, S.; Schoellig, A.P.; Bennewitz, M. Safe Multi-Agent Reinforcement Learning for Formation
Control without Individual Reference Targets. arXiv 2023, arXiv:2312.12861. [CrossRef]
16. Mukherjee, S. Formation Control of Multi-Agent Systems. Master’s Thesis, University of North Texas, Denton, TX, USA, 2017.
17. Nagahara, M.; Azuma, S.I.; Ahn, H.S. Formation Control. In Control of Multi-Agent Systems; Springer: Berlin/Heidelberg,
Germany, 2024; pp. 113–158. [CrossRef]
18. Farias, G.; Fabregas, E.; Peralta, E.; Vargas, H.; Dormido-Canto, S.; Dormido, S. Development of an Easy-to-Use Multi-Agent
Platform for Teaching Mobile Robotics. IEEE Access 2019, 7, 55885–55897. [CrossRef]
19. Farias, G.; Garcia, G.; Montenegro, G.; Fabregas, E.; Dormido-Canto, S.; Dormido, S. Reinforcement Learning for Position Control
Problem of a Mobile Robot. IEEE Access 2020, 8, 152941–152951. [CrossRef]
20. Quiroga, F.; Hermosilla, G.; Farias, G.; Fabregas, E.; Montenegro, G. Position Control of a Mobile Robot through Deep
Reinforcement Learning. Appl. Sci. 2022, 12, 7194. [CrossRef]
21. Gonzalez-Villela, V.; Parkin, R.; Lopez, M.; Dorador, J.; Guadarrama, M. A wheeled mobile robot with obstacle avoidance
capability. Ing. Mecánica. Tecnol. Y Desarro. 2004, 1, 150–159.
22. Baillieul, J. The geometry of sensor information utilization in nonlinear feedback control of vehicle formations. In Proceedings of
the Cooperative Control: A Post-Workshop Volume 2003 Block Island Workshop on Cooperative Control; Springer: Berlin/Heidelberg,
Germany, 2005; pp. 1–24. [CrossRef]
23. Siegwart, R.; Nourbakhsh, I.R.; Scaramuzza, D. Introduction to Autonomous Mobile Robots; MIT Press: Cambridge, MA,
USA, 2011.
24. Fabregas, E.; Farias, G.; Aranda-Escolástico, E.; Garcia, G.; Chaos, D.; Dormido-Canto, S.; Bencomo, S.D. Simulation and
Experimental Results of a New Control Strategy For Point Stabilization of Nonholonomic Mobile Robots. IEEE Trans. Ind. Electron.
2020, 67, 6679–6687. [CrossRef]
25. Rohmer, E.; Singh, S.P.N.; Freese, M. V-REP: A Versatile and Scalable Robot Simulation Framework. In Proceedings of the
International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 3–7 November 2013. [CrossRef]
26. Farias, G.; Fabregas, E.; Peralta, E.; Torres, E.; Dormido, S. A Khepera IV library for robotic control education using V-REP.
IFAC-PapersOnLine 2017, 50, 9150–9155. [CrossRef]
27. Ma, Y.; Cocquempot, V.; El Najjar, M.E.B.; Jiang, B. Actuator failure compensation for two linked 2WD mobile robots based on
multiple-model control. Int. J. Appl. Math. Comput. Sci. 2017, 27, 763–776. [CrossRef]
28. Morales, G.; Alexandrov, V.; Arias, J. Dynamic model of a mobile robot with two active wheels and the design an optimal
control for stabilization. In Proceedings of the 2012 IEEE Ninth Electronics, Robotics and Automotive Mechanics Conference,
Cuernavaca, Mexico, 19–23 November 2012; pp. 219–224. [CrossRef]
29. Fabregas, E.; Farias, G.; Dormido-Canto, S.; Guinaldo, M.; Sánchez, J.; Dormido Bencomo, S. Platform for teaching mobile robotics.
J. Intell. Robot. Syst. 2016, 81, 131–143. [CrossRef]
30. Shayestegan, M.; Marhaban, M.H. A Braitenberg Approach to Mobile Robot Navigation in Unknown Environments. In
Proceedings of the Trends in Intelligent Robotics, Automation, and Manufacturing, Kuala Lumpur, Malaysia, 28–30 November
2012; pp. 75–93. [CrossRef]
31. Gogoi, B.J.; Mohanty, P.K. Path Planning of E-puck Mobile Robots Using Braitenberg Algorithm. In Proceedings of the International
Conference on Artificial Intelligence and Sustainable Engineering; Springer: Berlin/Heidelberg, Germany, 2022; pp. 139–150. [CrossRef]
32. Dorri, A.; Kanhere, S.S.; Jurdak, R. Multi-agent systems: A survey. IEEE Internet Things J. 2018, 6, 285–298. [CrossRef]
33. Brambilla, M.; Ferrante, E.; Birattari, M.; Dorigo, M. Swarm robotics: A review from the swarm engineering perspective. Swarm
Intell. 2013, 7, 1–41. [CrossRef]
34. Osooli, H.; Robinette, P.; Jerath, K.; Ahmadzadeh, S.R. A Multi-Robot Task Assignment Framework for Search and Rescue with
Heterogeneous Teams. arXiv 2023, arXiv:2309.12589v1. [CrossRef]
35. Lawton, J.; Beard, R.; Young, B. A decentralized approach to formation maneuvers. IEEE Trans. Robot. Autom. 2003, 19, 933–941.
[CrossRef]
36. Oh, K.K.; Park, M.C.; Ahn, H.S. A survey of multi-agent formation control. Automatica 2015, 53, 424–440. [CrossRef]
37. Oroojlooy, J.; Snyder, L.V. A Review of Cooperative Multi-Agent Deep Reinforcement Learning. arXiv 2021, arXiv:2106.15691.
[CrossRef]
38. Morales, M. Grokking Deep Reinforcement Learning; Manning Publications Co.: Shelter Island, NY, USA, 2020.
39. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found.
Trends Mach. Learn. 2018, 11, 219–354. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.