Article
Deep Reinforcement Learning with Godot Game Engine
Mahesh Ranaweera * and Qusay H. Mahmoud
Department of Electrical, Computer and Software Engineering, Ontario Tech University, Oshawa, ON L1G 0C5,
Canada; [email protected]
* Correspondence: [email protected]
Abstract: This paper introduces a Python framework for developing Deep Reinforcement Learning (DRL) in the open-source Godot game engine to tackle sim-to-real research. The framework was designed to communicate and interface with the Godot game engine to perform DRL. With the Godot game engine, users can set up their environment while defining the constraints, motion, interactive objects, and actions to be performed. The framework interfaces with the Godot game engine to execute the defined actions. It can be further extended to perform domain randomization and enhance overall learning by increasing the complexity of the environment. Unlike proprietary physics or game engines, Godot provides extensive developmental freedom under an open-source license. By combining Godot's powerful built-in node-based scene system, flexible user interface, and the proposed Python framework, developers can extend its features to develop deep learning applications. Research performed on Sim2Real using this framework has provided great insight into the factors that affect the reality gap. It also demonstrated the effectiveness of this framework in Sim2Real applications and research.
1. Introduction
In recent years, the field of Deep Reinforcement Learning (DRL) has undergone
significant development in artificial intelligence and robotics. Hinton et al. [1] in 2006 proposed “Deep Learning” (DL), which is a branch of Machine Learning and Artificial Intelligence (AI) that can learn from the given data. DRL combines DL and reinforcement learning (RL), where a computational agent attempts to make decisions through trial and error while adjusting its learning policies. DRL uses deep neural networks to train agents that are capable of making complex decisions to adapt to dynamic environments. The agent can learn from the environment and optimize its behavior based on rewards and punishments to improve its overall learning. However, the success of DRL methods is often hindered by a lack of quality or sufficient training data (sample inefficiency), particularly when the tasks involve real-world scenarios and applications. Developing advanced robotic systems in the real world is often challenging because the required training data can be dangerous or difficult to gather [2]. Therefore, creating virtual environments for Sim2Real transfer is important for offering a safe, scalable, and cost-effective platform for training, testing, and evaluating intelligent systems while minimizing the risks and costs associated with real-world applications.
To address these challenges, an emerging method for generating quality training datasets is to utilize virtual environments. As an increasing number of DRL agents are being trained and used for real-world applications, such as robotics, manufacturing, and warehouses, it is important to have quality training datasets or access to high-fidelity virtual environments to perform simulations. A lack of quality training data can cause a “reality gap”, a term used to identify the difference between the source and target domains. According to [3,4], techniques such as Sim2Real are currently used for transfer learning from a simulated environment to a physical environment. Virtual environments have opened
new possibilities for the development of advanced robotic systems that do not require real-
world training datasets, physical platforms, or planned simulation scenarios. Advanced
game engines and physics engines, such as Unreal Engine, Unity, and MuJoCo, are used
in such applications. Virtual environments provide an inexpensive method for gathering
necessary data, performing advanced simulations, and creating advanced DRL models.
Currently, many Software Development Kits (SDKs), plugins, game engines, and physics engines have been developed to aid in simulated training or to add capabilities to existing game engines. Most of these implementations rely on high-end physics or game engines, which poses several challenges:
1. Complexity and overhead: Game engines are complex and designed to handle a
wide range of tasks related to rendering, simulations, physical interactions, and user
interactions. This complexity can introduce unnecessary overheads during DRL tasks,
potentially slowing the learning process. In addition, integrating DRL algorithms or
workflows with high-end game engines can be challenging.
2. Limited Customization and Closed Source: As most game engines are designed for specific purposes, using them to perform simulations requires extending their features through APIs. However, most high-end game engines are proprietary and closed-source, which limits the ability to modify the core codebase, extend features via APIs, and access low-level functionalities.
3. Resource intensiveness: High-end game engines can be resource-intensive, and they
require powerful GPUs, CPUs, and significant computational resources to handle
complex scenes. Resource intensiveness can cause scalability challenges, particularly
when access to high-end computational clusters is limited or expensive. In addition, performing DRL alongside the simulation could affect the overall learning owing to the delayed execution of actions and state observations.
4. Tradeoff between Realism and Simplicity: Most modern game engines are designed to simulate realistic, immersive environments. However, DRL research requires simplicity and control over the simulation rather than realism. Simplifying the environment
allows more control and customization options to be tailored to the specific require-
ments of reinforcement learning algorithms. Most existing virtual environments and
physics engines require researchers to follow a specific structure to begin performing
DRL. This restricts the ability to tailor the research.
This paper presents a methodology for utilizing the open-source Godot game engine to train DRL agents. Godot provides extensive flexibility to overcome the above-mentioned limitations, along with a user-friendly interface and a custom programming language called GDScript, which is closely modelled after Python. The developed Python framework interfaces with the Godot game engine instance and the DRL algorithm. For the demonstration, a three-degrees-of-freedom (3-DoF) Stewart platform was built to perform DRL. Additionally, the framework was utilized to perform Sim2Real transfer learning on a real-world 3-DoF Stewart platform. To overcome the reality gap and other domain-related inconsistencies, the domain randomization (DR) technique was introduced.
The paper is organized as follows: In Section 2, related works, such as other physics
engines, game engines, and techniques that were utilized to improve the overall simulated
learning, are presented. This provides an overview of the current frameworks and environ-
ments used by the research community. Section 3 provides an overview of the Godot game
engine, its unique capabilities, and the features that can be used for DRL. Section 4 outlines
the design aspects of the Python framework, communication protocols, and the general
structure of the implementation. Section 5 presents how the framework was used for the design and development of a 3-DoF Stewart platform in the Godot game engine to perform DRL.
Finally, in Section 6, we conclude this paper and provide steps for future development.
2. Related Work
There has been significant research and development in utilizing virtual environments to generate training data for developing advanced DRL models.
DRL models require large amounts of high-quality training data. Owing to the challenges of accessing real-world data in scenarios where collection is dangerous or such data are inaccessible, virtual environments provide a great alternative to resolve this limitation.
However, there are many challenges in simulating the real world, such as simulation
fidelity, real-world physical conditions, the dynamic nature, hardware limitations, sensor
and physics simulations, computational complexity, and the domain gap (reality gap).
Many researchers have used existing game engines to perform DRL, whereas others have
developed state-of-the-art physics engines. Additionally, they utilized techniques such as
Sim2Real, DR, and domain adaptation (DA) to augment simulated environments. These
techniques increased the overall learning and fidelity of the trained DRL model.
There are many DRL frameworks developed to achieve state-of-the-art RL. Some of these frameworks include Sample Factory [5], OpenAI Gym [6], Google's Dopamine [7],
Keras-RL [8], and RLlib [9]. These frameworks utilize their built-in environments and
API structures for learning. In addition, there are a few implementations that utilize the
Godot game engine to perform DRL. One notable implementation is ’Godot Reinforcement
Learning Agents’ [10], which was developed as a Godot plugin and Python package called
godot-rl. The plugin allows users to create custom reinforcement learning environments.
Currently, this library provides wrappers for multiple RL frameworks such as Stable-Baselines3 [11], CleanRL [12], Sample Factory, and Ray RLlib. This allows interfacing with
the Godot game engine using the godot-rl Python library. This library offers features for
creating virtual sensors that can be used within the training environment. It utilizes the
PyTorch open-source machine-learning framework. The implementation employs a TCP
client–server mechanism for communication with multiple virtual instances, both locally
and in distributed settings.
Similarly, another library, GodotAIGym [13], enables the conversion of existing Godot virtual environments into compatible OpenAI Gym environments to train RL models using PyTorch. This library communicates with the Godot game engine through shared memory to perform training. Since the framework is designed to be similar to the OpenAI Gym
architecture, similar environments can be easily developed within the Godot engine. CST-
Godot [14] is a framework designed to facilitate the integration of cognitive architectures
within the Godot game engine. This framework provides a seamless interface between
game engines and cognitive agents. This implementation provides a general framework that can be used to design Web-based RL agents. Compared with CST-Godot, the proposed framework is more tailored towards Sim2Real learning applications, incorporating Sim2Real techniques, such as DR, to enhance the overall learning.
The GodotRL agent library was developed concurrently with the initial research phase of the current study, when it was at an early development stage. Similar
to GodotRL, our research employed a TCP client-server connection to facilitate the com-
munication between each virtual instance for distributed learning. This setup enabled the
execution of additional training steps, such as DR, to enhance the overall learning process
required for Sim2Real applications. In contrast, the current study was designed specifically
for DRL in Sim2Real applications. This research incorporates techniques such as DR to
address the reality gap. Additionally, this research uses TensorFlow and Keras machine
learning libraries to develop and train the DRL models.
To develop high-fidelity virtual environments and bridge the reality gap to overcome
limitations during DRL, many studies have proposed the use of game engines. Since most
modern AAA game engines have been developed to simulate the real world, they consist
of advanced physics engines and techniques to accurately reproduce real-world lighting
and perform high-fidelity simulations. Game engines such as Unreal Engine [15,16] and
Unity [17] are widely used in DRL development. Additionally, GPU-based simulators such as NVIDIA Flex [18], MuJoCo [19,20], and RAWSim-O [21] have been developed specifically for DRL applications. These simulators are designed to provide extensive support for accurate real-world physical simulations. The key feature of these simulators is their support for rigid/deformable bodies, phase transitions, particles, fluids, cloth, ropes, and object positions and orientations [22]. These augmentations in the virtual environment create complexity and diversity, allowing the agent to adjust its behavior progressively over time [29].
6. Node-based scene system: The Godot engine allows for the construction of environ-
ments or scenes using node-based systems that create a hierarchy between nodes.
The above features provide an excellent starting point for creating a virtual environment for DRL. The ability to select different physics engines, networking features, and multi-platform support provides accessibility for controlling the simulation.
With the Godot game engine, some of the limitations posed by high-end game engines,
such as complexity, overhead (due to high resource usage), and limited customization due
to closed-source code, can be overcome. First, Godot’s open-source nature allows great
flexibility and customization to address the limitations imposed by closed-source code in
other game engines. This gives developers full access to the engine's source code, enabling them to modify and tailor it to suit their requirements. Additionally, Godot is designed to be
lightweight and efficient, minimizing the overhead and resource usage compared to other
high-end engines. Based on the use case, Godot supports both 2D and 3D simulations
while allowing the user to select CPU- or GPU-based physics simulations to further fine-
tune the resource usage. This makes it particularly suitable for developers working on
projects with limited hardware and resources. Overall, Godot’s combination of openness,
efficiency, and user-friendly design makes it an excellent candidate for reinforcement
learning applications.
4. Methodology
The following section outlines how the Godot game engine was modified and utilized
to perform the DRL. The built-in code features of Godot game engines have been extensively
used for interfacing with simulated environments. A custom GDscript is used to create a
TCP WebSocket connection to communicate with the game instance. Additionally, a Python
framework was developed to interface with the game engine and train the DRL model. The
rest of this section demonstrates the proposed framework, setting up the environment, and
the experimental setup of the 3-DoF Stewart platform used for Sim2Real DRL.
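To make the interface concrete, the following is a minimal sketch of the Python side of such a connection. It is illustrative only: the message fields (`cmd`, `action`) and the newline-delimited JSON framing are assumptions for this example rather than the framework's actual wire protocol, and the WebSocket handshake performed by tcpserver.gd is elided in favour of a plain TCP socket.

```python
# Minimal sketch (assumed protocol): exchange newline-delimited JSON messages
# with a Godot game instance over a TCP socket.
import json
import socket


class GodotClient:
    """Connects to a Godot instance listening on a TCP port (e.g., via tcpserver.gd)."""

    def __init__(self, host: str = "127.0.0.1", port: int = 9090):
        self.sock = socket.create_connection((host, port))
        self.buffer = b""

    def send(self, message: dict) -> None:
        self.sock.sendall(json.dumps(message).encode() + b"\n")

    def recv(self) -> dict:
        # Read until one full newline-terminated JSON message has arrived.
        while b"\n" not in self.buffer:
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("Godot instance closed the connection")
            self.buffer += chunk
        line, self.buffer = self.buffer.split(b"\n", 1)
        return json.loads(line)

    def step(self, action: list[float]) -> dict:
        # Hypothetical step command; the reply is assumed to carry the
        # observation, reward, and done flag for the DRL loop.
        self.send({"cmd": "step", "action": action})
        return self.recv()
```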
Environment: The scenario or virtual environment in which the DRL model is trained. It consists of interactive elements, objects, collision meshes, and real-world scenarios that have been virtually developed. The environment is based on the research scenario and is developed using Godot's level design features. Interactions, and the manner in which objects behave in the virtual system, can be defined by applying kinematic, static, or rigid properties.
RLAgent: The RL agent can perform actions in the environment, make state observations regarding environmental conditions, and update its policies based on the reward received. Observations are made through the WebSocket connection established between the Python framework and the virtual instance. The agent is defined with a specific objective; for example, in the Stewart platform, the goal is to balance the marble at the center of the moving platform by precisely controlling the three axes. To achieve this, the agent should be able to observe the current marble position and direction of motion and predict where the marble will move.
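As an illustration of the kind of state the agent consumes, the sketch below derives the marble's direction of motion from two successive position observations; the field names and units are hypothetical, not the framework's actual data model.

```python
from dataclasses import dataclass


@dataclass
class MarblePosition:
    x: float  # offset from the platform centre along one axis (units assumed)
    y: float  # offset along the perpendicular axis


def estimate_velocity(prev: MarblePosition, curr: MarblePosition,
                      dt: float) -> tuple[float, float]:
    # Finite-difference estimate of the marble's moving direction between
    # two successive observations received over the WebSocket connection.
    return (curr.x - prev.x) / dt, (curr.y - prev.y) / dt
```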
As shown in Figure 3, the environment was first initialized. This initializes the commu-
nication ports between the framework and game instance. The Godot virtual environment
was exported as a standalone application. A custom GDScript (tcpserver.gd) was created
to accept arguments from the Python framework during the initial launch to establish the
environment and WebSocket for communication. The Python framework sends requests to
virtual instances to step through the simulation, perform domain randomization, and reset the environment when the RLAgent fails to reach the goal. This process is repeated until the RLAgent reaches a specific reward level or goal.
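A schematic rendering of this loop, reusing the GodotClient sketch above, might look as follows; the agent methods (`act`, `learn`), the `reset` and `randomize` commands, and the reply fields are placeholders rather than the framework's actual API.

```python
# Schematic version of the initialize/step/randomize/reset loop (Figure 3).
def train(agent, client: GodotClient, target_reward: float,
          max_episodes: int = 1000) -> None:
    for _ in range(max_episodes):
        client.send({"cmd": "reset"})        # reset the virtual environment
        obs = client.recv()["observation"]
        client.send({"cmd": "randomize"})    # apply domain randomization for this episode
        client.recv()                        # acknowledgement from the instance
        episode_reward, done = 0.0, False
        while not done:
            action = agent.act(obs)          # hypothetical agent API
            result = client.step(action)
            agent.learn(obs, action, result["reward"],
                        result["observation"], result["done"])
            obs, done = result["observation"], result["done"]
            episode_reward += result["reward"]
        if episode_reward >= target_reward:  # stop at the target reward level
            break
```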
A launch argument defines the TCP port on which each new virtual instance communicates. Each new virtual instance is opened in a new subprocess to maintain control and handle errors caused by the child process. This allows DRL to be performed across multiple instances of the virtual environment in a distributed manner, making efficient use of available resources.
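A minimal sketch of this launch procedure is shown below; the binary name and the `--port` flag are assumptions for illustration (the exact arguments parsed by tcpserver.gd are not reproduced here).

```python
# Launch several exported Godot instances, each in its own subprocess,
# passing the TCP port as a launch argument.
import subprocess


def launch_instances(binary: str = "./stewart_env.x86_64",
                     base_port: int = 9090, n: int = 4):
    procs = []
    for i in range(n):
        port = base_port + i
        # The exported Godot application is assumed to read the port
        # argument during its initial launch (see tcpserver.gd).
        proc = subprocess.Popen([binary, f"--port={port}"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        procs.append((port, proc))
    return procs
```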
5. Experimentation
5.1. Deep Reinforcement Learning on a 3-DoF Stewart Platform
The Python framework and Godot game engine were used to perform Sim2Real on a Stewart platform to overcome the reality gap. In that work [29], a Godot environment was used to train the DRL algorithm and perform DR to increase the fidelity of the model. The trained DRL model was then used to manipulate a real-world 3-DoF Stewart platform.
First, a Stewart platform was developed using 3D CAD software to ensure that the
two platforms were identical. Each designed part was then exported and 3D printed to
create the physical environment or exported to build the virtual environment in the Godot
game engine.
To create the virtual environment, components designed in Autodesk Fusion 360 were imported into the Godot game engine in OBJ format. The imported meshes (MeshInstance nodes) can then be assigned physical attributes. Figure 4 shows the developed 3-DoF Stewart platform hosted in the Godot game engine. Godot's collision shapes and body type attributes were used to define how each part interacts with the environment, with inputs from the DRL agent, and with external perturbations during the simulation. The body type allows the definition of rigid, static, and kinematic bodies of the Stewart platform. Without defining the body type, the marble will not interact with the platform. Setting the kinematic body type on the servo arms allows the addition of linear motion to the base platform, resulting in a pitch and roll motion. The servo arms were directly controlled by the inputs from the DRL agent. Additionally, using Godot's environment properties, the object mass, weight, and gravity scale can be manipulated in the environmental physics.
Virtual cameras and event-based triggers were used to make state observations. Additionally, the position and relative velocity of the marble were determined by accessing its position using custom GDScripts. To perform DR, custom scripts were attached to the environment nodes to manipulate different aspects of those nodes. The goal of DR is to increase the randomization of the virtual environment, which increases the variability of the environment while exposing the agent to a diverse set of environmental conditions. To this end, different randomizations were introduced. Table 1 lists these randomizations and their value ranges. Each randomization was designed to capture all aspects of the virtual environment, allowing DRL agents to learn only the essential features of the virtual environment. Figures 5 and 6 show previews of these randomizations in the virtual environment captured through the virtual camera.
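The ranges in Table 1 are not reproduced here, but the sketch below shows how per-episode randomization values might be sampled and sent to an instance; the parameter names and ranges are hypothetical stand-ins inspired by Figures 5 and 6, not the paper's values.

```python
# Illustrative only: sample per-episode domain randomization parameters.
import random

# Hypothetical randomization ranges in the spirit of Table 1.
RANDOMIZATION_RANGES = {
    "background_hue": (0.0, 1.0),      # background colour variation (Figure 5)
    "light_intensity": (0.5, 2.0),     # lighting intensity (Figure 6)
    "camera_offset_deg": (-5.0, 5.0),  # camera position perturbation (Figure 6)
}


def sample_randomization() -> dict:
    """Draw one value per parameter, uniformly within its range."""
    return {k: random.uniform(lo, hi)
            for k, (lo, hi) in RANDOMIZATION_RANGES.items()}


# The sampled values could then be sent to the instance, e.g.:
# client.send({"cmd": "randomize", "params": sample_randomization()})
```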
Figure 5. Domain randomization applied through changing the background colour to increase the
variation [29].
Figure 6. DR applied to change the lighting position, intensity and camera position [29].
5.1.2. Results
This framework was utilized to facilitate the research conducted by the authors.
The framework was utilized to evaluate techniques for Sim2Real RL [30] and to show
the effectiveness of DR and induced noise to bridge the reality gap in [2]. Two DRL
algorithms were used for the evaluation: Actor-Critic and Deep Q-Learning. The evaluation compared the performance of the two DRL algorithms in both virtual and physical environments under different training conditions. Multiple training sessions were
conducted with different environmental parameters while optimizing the hyperparameters
for each algorithm. The following is a summary of the observations made in the previous research conducted using this framework. Table 2 shows the optimized hyperparameters used for training and making observations.
For the evaluation, each algorithm was measured over 20 runs in each of the virtual and physical environments. Each iteration was performed under different environmental conditions to determine the change in success rate. Table 3 shows the different environmental conditions under which each DRL algorithm was tested. The tests were performed with no randomization, randomization of the marble position and camera position, with and without DR, and with induced noise. In each episode, the complexity of the environment was increased to induce variability in the environment using the framework. Observations were made through the virtual camera set up in the environment and the physical camera in the physical environment, respectively.
Table 3. The performance results obtained using the framework, demonstrating the application of DRL using Actor-Critic (A) and Q-Learning (B) under different environmental conditions. Adapted from [29].
The results shown in the table demonstrate the effectiveness of using DR and induced noise during the training process to enhance performance in the physical environment. The maximum reward was achieved by the agent through balancing the marble on the Stewart platform. The success rate observed in the training was determined by the number of steps the agent took to position the marble within the center of the platform. The positional and observational data received by the framework are used to determine the position of the marble relative to the Stewart platform. For all experiments within the virtual and physical environments, the marble was placed randomly. Based on the observations, Q-Learning without randomizations performed poorly due to the inconsistencies between the virtual and physical environments, demonstrating the importance of environmental fidelity during training. Q-Learning utilized the positional and camera data to make decisions. On the other hand, the Actor-Critic model, which leveraged only pre-processed image data to track the marble position and mitigate external environmental noise, excelled in both virtual and physical environments. The maximum reward observed through Actor-Critic and Q-Learning was 9942 and 7944, respectively.
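As a hedged illustration of this bookkeeping, the snippet below computes a success rate from per-run step counts, assuming a run succeeds when the marble is centred within a step budget; the budget value is an assumption, not a threshold from the paper.

```python
def success_rate(steps_per_run: list[int | None], max_steps: int = 500) -> float:
    """Percentage of runs in which the marble was centred within max_steps.

    A value of None marks a run where the marble was never centred.
    max_steps is a hypothetical budget, not the paper's threshold.
    """
    successes = sum(1 for s in steps_per_run if s is not None and s <= max_steps)
    return 100.0 * successes / len(steps_per_run)


# Example: 18 of 20 evaluation runs succeed -> 90.0
print(success_rate([120] * 18 + [None, 700]))
```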
Additionally, the results obtained from the framework were compared with a similar implementation of a robot arm developed by Vacaro et al. [31]. This comparison demonstrated that the models trained on the framework outperformed IMPALA RL when DR and induced noise were added during the DRL process. Vacaro et al. used the Unity game engine to perform their IMPALA RL training. IMPALA RL with target, camera, and color randomization achieved a success rate of 91.33% in the virtual environment and 87.67% in the physical environment. Meanwhile, the Actor-Critic model using the Godot environment achieved a success rate of 99.49% in the virtual environment and 81.88% in the physical environment. Q-Learning with DR and induced noise achieved a respectable 89.55% in the virtual environment and 78.56% in the physical environment.
The reduced performance in the physical environment is attributed to factors such as gear backlash, servo noise, friction in the 3D-printed arms and joints, and other environmental variables that had not been accounted for during the modelling of the environment for training. However, the integration of the framework with the Godot game engine provides valuable insights into techniques for overcoming the reality gap.
Author Contributions: Writing—review, editing and original draft preparation, M.R.; supervi-
sion, review and editing, Q.H.M. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Computer code and software for this research are available at https://github.com/ma-he-sh/GodotSim2RealResearch.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Hinton, G.E.; Osindero, S.; Teh, Y.W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554.
[CrossRef] [PubMed]
2. Ranaweera, M.; Mahmoud, Q. Bridging Reality Gap Between Virtual and Physical Robot through Domain Randomization and
Induced Noise. In Proceedings of the Canadian Conference on Artificial Intelligence, Virtual Online, 27 May 2022. Available
online: https://caiac.pubpub.org/pub/kzx3gl4e (accessed on 12 January 2024).
3. Wang, J.; Chen, Y.; Feng, W.; Yu, H.; Huang, M.; Yang, Q. Transfer Learning with Dynamic Distribution Adaptation. ACM Trans.
Intell. Syst. Technol. 2020, 11, 1–25. [CrossRef]
4. Ranaweera, M.; Mahmoud, Q.H. Virtual to Real-World Transfer Learning: A Systematic Review. Electronics 2021, 10, 1491.
[CrossRef]
5. Petrenko, A.; Huang, Z.; Kumar, T.; Sukhatme, G.; Koltun, V. Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS
with Asynchronous Reinforcement Learning. arXiv 2020, arXiv:2006.11751.
6. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016,
arXiv:1606.01540.
7. Castro, P.S.; Moitra, S.; Gelada, C.; Kumar, S.; Bellemare, M.G. Dopamine: A Research Framework for Deep Reinforcement
Learning. arXiv 2018, arXiv:1812.06110.
8. Plappert, M. Keras-rl. 2016. Available online: https://github.com/keras-rl/keras-rl (accessed on 12 January 2024).
9. Liang, E.; Liaw, R.; Moritz, P.; Nishihara, R.; Fox, R.; Goldberg, K.; Gonzalez, J.E.; Jordan, M.I.; Stoica, I. RLlib: Abstractions for
Distributed Reinforcement Learning. arXiv 2018, arXiv:1712.09381.
10. Beeching, E.; Debangoye, J.; Simonin, O.; Wolf, C. Godot Reinforcement Learning Agents. arXiv 2021, arXiv:2112.03636.
11. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning
Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
12. Huang, S.; Dossa, R.F.J.; Ye, C.; Braga, J. CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning
Algorithms. arXiv 2021, arXiv:2111.08819.
13. Derevyanko, G. Lupoglaz/GodotAIGym. 2019. Available online: https://lupoglaz.github.io/GodotAIGym/ (accessed on 12
January 2024).
14. Morais, G.; Loron, I.; Coletta, L.F.; da Silva, A.A.; Simões, A.; Gudwin, R.; Costa, P.D.P.; Colombini, E. CST-Godot: Bridging the
Gap Between Game Engines and Cognitive Agents. In Proceedings of the 2022 21st Brazilian Symposium on Computer Games
and Digital Entertainment (SBGames), Natal, Brazil, 24–27 November 2022; pp. 1–6. [CrossRef]
15. Bakhmadov, M.; Fridheim, M. Combining Reinforcement Learning and Unreal Engine’s AI-Tools to Create Intelligent Bots.
Bachelor Thesis, NTNU, May 2020. Available online: https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/2672159 (accessed
on 12 January 2024).
16. Boyd, R.A.; Barbosa, S.E. Reinforcement Learning for All: An Implementation Using Unreal Engine Blueprint. In Proceedings of
the 2017 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 14–16
December 2017; pp. 787–792. [CrossRef]
17. Ward, T.; Bolt, A.; Hemmings, N.; Carter, S.; Sanchez, M.; Barreira, R.; Noury, S.; Anderson, K.; Lemmon, J.; Coe, J.; et al. Using
Unity to Help Solve Intelligence. arXiv 2020, arXiv:2011.09294.
18. Kar, A.; Prakash, A.; Liu, M.Y.; Cameracci, E.; Yuan, J.; Rusiniak, M.; Acuna, D.; Torralba, A.; Fidler, S. Meta-Sim: Learning to
Generate Synthetic Datasets. arXiv 2019, arXiv:1904.11621.
19. Todorov, E.; Erez, T.; Tassa, Y. MuJoCo: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ
International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 5026–5033.
[CrossRef]
20. Gu, S.; Holly, E.; Lillicrap, T.; Levine, S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy
updates. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3
June 2017; pp. 3389–3396. [CrossRef]
21. Merschformann, M.; Xie, L.; Li, H. RAWSim-O: A Simulation Framework for Robotic Mobile Fulfillment Systems, 8th ed.; Bundesvere-
inigung Logistik (BVL) e.V: Bremen, Germany, 2018. [CrossRef]
22. Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain Randomization for Transferring Deep Neural Networks
from Simulation to the Real World. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS), Vancouver, BC, Canada, 24 September 2017; pp. 23–30.
23. Park, S.; Kim, J.; Kim, H.J. Zero-Shot Transfer Learning of a Throwing Task via Domain Randomization. In Proceedings of the
2020 20th International Conference on Control, Automation and Systems (ICCAS), Busan, Republic of Korea, 13–16 October 2020;
pp. 1026–1030. [CrossRef]
24. OpenAI; Akkaya, I.; Andrychowicz, M.; Chociej, M.; Litwin, M.; McGrew, B.; Petron, A.; Paino, A.; Plappert, M.; Powell, G.; et al.
Solving Rubik’s Cube with a Robot Hand. arXiv 2019, arXiv:1910.07113.
25. Zakharov, S.; Ambrus, R.; Guizilini, V.; Kehl, W.; Gaidon, A. Photo-realistic Neural Domain Randomization. arXiv 2022,
arXiv:2210.12682.
26. Zhao, W.; Queralta, J.P.; Westerlund, T. Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey. In
Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, Australia, 1–4 December 2020;
pp. 737–744. [CrossRef]
27. Jiang, H.; Wang, H.; Yau, W.Y.; Wan, K.W. A Brief Survey: Deep Reinforcement Learning in Mobile Robot Navigation. In
Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 9–13
November 2020; pp. 592–597. [CrossRef]
28. Berner, C.; Brockman, G.; Chan, B.; Cheung, V.; Dębiak, P.; Dennison, C.; Farhi, D.; Fischer, Q.; Hashme, S.; Hesse, C.; et al. Dota 2
with Large Scale Deep Reinforcement Learning. arXiv 2019, arXiv:1912.06680.
29. Ranaweera, M.; Mahmoud, Q.H. Bridging the Reality Gap Between Virtual and Physical Environments Through Reinforcement
Learning. IEEE Access 2023, 11, 19914–19927. [CrossRef]
30. Ranaweera, M.; Mahmoud, Q. Evaluation of Techniques for Sim2Real Reinforcement Learning. In The International FLAIRS
Conference Proceedings; Open Journal System: Gainesville, FL, USA, 2023; Volume 36. [CrossRef]
31. Vacaro, J.; Marques, G.; Oliveira, B.; Paz, G.; Paula, T.; Staehler, W.; Murphy, D. Sim-to-Real in Reinforcement Learning for
Everyone. In Proceedings of the 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR)
and 2019 Workshop on Robotics in Education (WRE), Rio Grande do Sul, Brazil, 23–25 October 2019; pp. 305–310. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.