An Advanced Reinforcement Learning Control Method
ORIGINAL ARTICLE
Received: 9 April 2024 / Accepted: 21 November 2024 / Published online: 3 December 2024
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024, corrected publication 2025
Abstract
Quadruped robots, with their exceptional flexibility and stable structure, are highly suitable for traversing the complex unstructured terrains of urban environments. However, the flexibility and stability of current reinforcement learning-based quadruped robots on these terrains are still not ideal. To address this limitation, an end-to-end teacher-student learning network framework based on large-scale parallel training is proposed, in which a Gated Recurrent Unit provides a latent estimation of the heights surrounding the robot. Meanwhile, by introducing an omnidirectional terrain learning curriculum, the robot can move in any commanded direction, achieving smooth output and tracking of motor joint angles. Using a state machine, the model trained in simulation is deployed on the Unitree Go1 robot via zero-shot learning. Simulation and real-world experiments demonstrate that this approach significantly enhances the robot's adaptability and mobility across various urban terrains such as gravel, grass, slopes, and steps.
* Hongbo Gao
  ghb48@[Link]

  Chi Yan
  yanchi@[Link]

  Ning Wang
  lemon-807@[Link]

  Xinmiao Wang
  wxm95@[Link]

  Chao Tang
  tangchao0108@[Link]

  Lin Zhou
  zl0508@[Link]

  Yuehua Li
  liyh@[Link]

  Yue Wang
  ywang24@[Link]

1 School of Information Science and Technology, University of Science and Technology of China, No. 96 Jinzhai Road, Hefei 230026, Anhui, China
2 School of Information and Security, Chongqing College of Mobile Communication, No. 36 Dengying Avenue, Qijiang District, Chongqing 401520, Sichuan, China
3 Institute of Advanced Technology, University of Science and Technology of China, No. 5089 Wangjiang West Road, Hefei 230088, Anhui, China
4 School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798 Nanyang, Singapore
5 Zhejiang Lab, Kechuang Avenue, Zhongtai Sub-District, Hangzhou 311121, Zhejiang, China
6 College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
leading to whole-body instability [2]. Traditional control methods often require an ensemble of state estimation, trajectory generation, gait optimization, and actuator control [3–6]. However, when confronted with varied environments, these controllers must undergo precise, environment-specific adaptations. Such intricate designs typically demand meticulous manual modeling and detailed parameter adjustments. Additionally, robots are prone to losing control in environments that have not been previously modeled.

In recent years, the application of reinforcement learning to quadruped robots has surged in popularity, significantly enhancing their mobility and robustness [7–10]. Many advanced approaches employ a plethora of sensors [11–14], such as cameras and LIDAR systems. While these external sensors can augment a robot's perceptual capabilities, they also tend to reduce its overall robustness. Cameras, for instance, often underperform in low-light conditions, such as at night or in fog. Similarly, LIDAR systems may not function optimally on soft terrains such as snow or thick bushes. In contrast, akin to how quadruped animals can adeptly navigate their surroundings without sight, quadruped robots equipped with proprioceptive sensors, such as Inertial Measurement Units (IMUs) and joint encoders, can efficiently traverse various terrains [15]. These proprioceptive sensors offer a more cost-effective and robust solution, making them especially advantageous for locomotion in complex urban environments.

The current proprioceptive motion methods for legged robots primarily encompass the following approaches. The methods proposed in [16, 17] learn world models by training directly in the real world, thereby avoiding the simulation errors inherent in simulators. However, this approach poses significant training challenges and carries a high risk of damaging experimental equipment. The approaches described in [2, 18] employ a VAE encoder to learn the ground truth of states in simulation, which inevitably introduces domain randomization noise. Furthermore, when the robot becomes unstable, the velocity estimates become highly unreliable. Given that proprioceptive motion methods cannot acquire information such as elevation maps, friction coefficients, or contact forces, another class of methods uses imitation learning to implicitly predict this information. Imitation learning methods mainly include two frameworks: adaptation and teacher-student. Methods using the adaptation framework include [19, 20], and methods using the teacher-student framework include [10, 21]. However, adaptive methods often suffer from errors in estimating latent information, and directly reusing the actor network may lead to incorrect estimations of biased information.

This work is primarily based on a teacher-student network architecture, where the student mimics the teacher's performance through knowledge distillation. However, since the training of the two components usually occurs separately, the student often fails to learn from the teacher's experience of failure states in the early stages of training [18]. Moreover, such training processes typically require a considerable amount of time. With the introduction of Isaac Gym's large-scale parallelized training methods [22], the efficiency of training quadruped robots with reinforcement learning has been greatly enhanced. However, the terrain curriculum learning mechanism proposed subsequently may cause the robot to circle within the current terrain and is not conducive to learning motor skills under arbitrary commands.

To address the existing challenges of methods based on the teacher-student framework, a new blind locomotion system based on large-scale parallel training is proposed for quadruped robots. This system can safely and robustly navigate most unstructured terrains in urban environments under commands in various directions, demonstrating strong locomotive performance and robustness. Additionally, it is capable of implicitly inferring the height of the surrounding terrain. The algorithm has been implemented for the Unitree Go1 robot on the Isaac Gym platform, and the model has been successfully deployed on the robot operating in the real world. The main contributions are listed as follows:

• An end-to-end teacher-student learning training framework for quadruped robots is proposed, which conducts parallel training across various unstructured terrains, thoroughly learning the privileged information of the simulation.
• By incorporating a Gated Recurrent Unit (GRU) network and an omnidirectional terrain curriculum, a latent estimation of the surrounding terrain's height is achieved, enabling the robot to navigate the current terrain in any commanded direction.
• Zero-shot transfer from simulated training environments to challenging real terrains has been realized, demonstrating that this reinforcement learning framework can overcome complex unstructured terrains in urban settings.

2 Method

This work aims to realize end-to-end foot gait planning for quadruped robots through reinforcement learning and to accelerate the RL learning process through parallel training. Figure 1 shows the proposed reinforcement learning architecture, which can be divided into two parts: a teacher policy training network and a student policy training network. The training of the two networks occurs in parallel, enabling the student network to continuously assimilate the experiences of the teacher network. The following sections describe the whole learning framework in detail.

Fig. 1  Overview of the training methods. The teacher policy generates supervisory information h_t, z_t, a_t by utilizing proprioception, terrain scandots and privileged parameters. To optimize the teacher network, a critic network akin to the MLP policy network π is employed. Meanwhile, the student network, which is limited to accessing proprioceptive information, fits the network with the label samples generated by the teacher network and thus outputs actions â_t.

… a greater impact on the robot's learning of locomotion skills are selected in o_t.
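The observations gathered into o_t are consumed by the student policy as a short history o_{t−1:t−k} with k = 6 (Sect. 2.3). The ring buffer below is a minimal sketch of how such a window can be maintained; the class and variable names are illustrative and not taken from the paper's code.

```python
from collections import deque

K = 6  # history length k, set to 6 in the paper's experiments


class ObservationHistory:
    """Keeps the last K proprioceptive observations o_{t-1:t-K} for the student."""

    def __init__(self, obs_dim, k=K):
        # Pre-fill with zero observations so the encoder always sees a full window.
        self.buf = deque(([0.0] * obs_dim for _ in range(k)), maxlen=k)

    def push(self, o_t):
        """Append the newest observation; the oldest one is dropped automatically."""
        self.buf.append(list(o_t))

    def window(self):
        """Return [o_{t-1}, ..., o_{t-K}], newest first."""
        return list(reversed(self.buf))


hist = ObservationHistory(obs_dim=3)
hist.push([1.0, 1.0, 1.0])
hist.push([2.0, 2.0, 2.0])
w = hist.window()  # w[0] is the most recent observation
```

In a training loop, `push` would be called once per control step before querying the GRU-MLP encoders with the stacked window.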
2.1 Problem statement

Action space: Since the model is an end-to-end framework, the action a_t is a 12-D vector representing the target angles for each of the twelve joints.
… needed information and facilitating model convergence. The MLP policy π uses o_t, d_t and e_t as input to generate the action a_t. More details on each layer are shown in Table 1.

Reward: Part of the reward function follows the design in [25]. All components of the reward are enumerated in Table 2. Among these, the tracking shaping scale, denoted by σ, is fixed at 0.25. The desired base height h_target is another significant element, set at 0.28, which ensures the robot maintains a specific stance height during operation. The collision flag f_collision indicates whether a collision has occurred, playing a vital role in penalizing undesirable contacts. Lastly, the desired foot lift height p_z^target is obtained from the height difference of the surrounding terrain. This reward is pivotal for adapting the robot's gait to different ground conditions, thereby optimizing locomotion efficiency and stability.

Table 1  Network architecture for teacher policy and student policy

  Network         Inputs            Hidden layers               Outputs
  μ1 (MLP)        d_t               [256, 128, 64]              h_t
  μ2 (MLP)        e_t               [64, 32]                    z_t
  φ1 (GRU-MLP)    o_{t−1:t−k}       [256, 256]–[256, 128, 64]   ĥ_t
  φ2 (GRU-MLP)    o_{t−1:t−k}       [256, 256]–[64, 32]         ẑ_t
  π (MLP)         o_t, h_t, z_t     [512, 256, 128]             a_t
  π̂ (MLP)         o_t, ĥ_t, ẑ_t     [512, 256, 128]             â_t

  All networks use the ELU activation for hidden layers.

Table 2  Reward terms for the task

  Reward                      Equation                                Weight
  Linear velocity tracking    exp(−‖v_xy^cmd − v_xy‖² / σ)            1.0
  Angular velocity tracking   exp(−‖v_yaw^cmd − v_yaw‖² / σ)          0.5
  Linear velocity (z)         v_z²                                    −2.0
  Angular velocity (xy)       ‖ω_xy‖²                                 −0.10
  Orientation                 ‖g_xy‖²                                 −0.01
  Joint power                 |τ||θ̇|ᵀ                                 −1.5e−4
  Joint accelerations         ‖θ̈‖²                                    −2.5e−7
  Body height                 (h_target − h)²                         −1.5
  Collision                   f_collision                             −0.1
  Feet clearance              Σ_{i=0}^{3} ‖p_z^target − p_z^i‖²       −0.01
  Action rate                 ‖a_t − a_{t−1}‖²                        −0.02
  Smoothness                  ‖a_t − 2a_{t−1} + a_{t−2}‖²             −0.02

2.3 Student policy architecture

The student policy network is tasked with replicating the actions of the teacher policy through supervised learning. Given that it has access only to proprioceptive data, the dynamics of the student policy can be considered a POMDP. Consequently, the student policy needs to estimate unobservable states from the history of observations. A fundamental premise of this approach is the hypothesis that the latent vectors h_t and z_t can be approximately recovered from a time series of proprioceptive observations o_{t−1:t−k} = {o_{t−1}, …, o_{t−k}}, where k is the history length and is set to 6 in the experiments.

The GRU network is employed for its effectiveness in capturing the features of time-series data, offering the advantage of fewer parameters and a simpler architecture compared with LSTM, thereby facilitating model convergence. Each GRU network is followed by an MLP network, forming the modules φ1 and φ2. Through supervised learning, the student network emulates the policy π̂ and reconstructs the latent feature vectors ĥ_t and ẑ_t generated by φ1 and φ2. The summed loss function is defined as

  L = (ĥ_t − h_t)² + (ẑ_t − z_t)² + (â_t − a_t)²    (5)

Different from [19], the training of the student network and the teacher network is carried out simultaneously. This concurrent training process allows the student network to assimilate the early failure experiences of the teacher, thereby enhancing the robustness of the algorithm. During the training phase, the encoders are initially randomized. The student network trains the GRU-MLP encoders using historical observations and latent feature vectors, combined with their ground truth, to ultimately generate actions. More details on each layer are shown in Table 1.

  ĥ_t = φ1(o_{t−1:t−k})    (6)

  ẑ_t = φ2(o_{t−1:t−k})    (7)

  â_t = π̂(o_t, ĥ_t, ẑ_t)    (8)

2.4 Terrain curriculum learning

Owing to the instability of RL in the early stage, directly training robots on complex terrains poses significant challenges. To address this, a curriculum has been developed that creates four types of terrains using a method similar to [22]. These terrain categories include rough flats, slopes, stairs and discrete obstacles. The environment used for training is a height-field map with 160 terrains arranged in a 20 × 8 grid, each of which is 8 × 8 m² in area. Each row in the map represents the same terrain with increasing difficulty, and each column represents a different type of terrain. To simulate roughness on flat terrains, noise ranging from ±0.2 cm to ±2 cm is introduced. The slopes' inclinations vary from 0 to 30 degrees, accompanied by ±0.5 cm of noise. For stair terrains, the steps are consistently 30 cm in width, with heights escalating from 2 cm to 18 cm. Twenty rectangular obstacles are set up in the discrete obstacle terrain, with heights increasing from 2 cm to 18 cm and areas increasing from 1 m² to 2 m².

At the beginning of the training process, all robots are uniformly placed on the simplest types of terrain. Given the variability in their initial velocity directions, these robots may pass through the current terrain heading in diverse directions. To address this, when passing through a terrain, a robot is transferred to the leftmost side of the subsequent terrain. Importantly, its base quaternion, velocity vector, and the orientation of gravity are all adjusted using a rotation matrix to align with the robot's original local coordinate system. This adjustment ensures that the robot can seamlessly continue its movement onto the next terrain without any disruption, which helps the robot learn to pass through terrain under commands in various directions. When the robot successfully traverses a quarter of the distance across the next terrain, its terrain coordinate origin is updated to reflect this new position, which ensures that upon the next initialization, the robot will commence from this new terrain.

In general, a robot progresses to more challenging terrains only after it has successfully adapted to the difficulties presented by its current terrain. This adaptation is considered successful when the robot is capable of passing through the current terrain at a speed that is at least 85 percent of the average linear speed necessary to achieve the set rewards. Conversely, if a robot fails to traverse at least half of the distance required by its commanded linear speed by the end of an episode, it is demoted to a simpler terrain. To prevent skill forgetting, once a robot has mastered the most challenging terrain, it is randomly assigned to a difficulty level within the same terrain category.

3 Experiments and results analysis

3.1 Experimental setups

Simulation setup: The method uses Isaac Gym [22] with 4,096 environments training in parallel. The model takes about 1 h and 400 rollouts on an NVIDIA RTX 8000 to learn basic movement skills, and its performance continues improving until about 4,000 rollouts through terrain curriculum learning, eventually obtaining the ability to run in typical urban terrain at a speed of 0.6 m/s. The policies are trained with Proximal Policy Optimization [24], the details of which are listed in Table 3.

Table 3  PPO algorithm hyperparameters

  Hyperparameter          Value
  Clip param              0.2
  Entropy coefficient     0.01
  Discount factor         0.99
  GAE discount factor     0.95
  Desired KL-divergence   0.01
  Learning rate           1e-3
  Adam epsilon            1e-3

The scandots adaptation loss, the privileged adaptation loss and the actor adaptation loss during training are illustrated in Fig. 2. Converted into joint angles, the error produced by the actor network amounts to approximately 0.05°. This indicates that the student network has effectively learned the experience of the teacher network.

Hardware details: The method is deployed on the Unitree Go1, which is 28 cm in height and 12 kg in weight. All control and estimation processes run on a laptop. The algorithm runs at 50 Hz, and the PD controller, which is used to track the target joint angles, runs at 500 Hz with parameters k_p = 20 and k_d = 0.5. The PD controller converts the desired joint angles output by the model into control torques, which are then sent to the joint motors to achieve tracking of the desired joint angles. While the robot is running, only joint position encoders and an IMU sensor are used to collect proprioceptive information.

Domain randomization: Due to the difference between simulation and reality, deploying the algorithm directly on real robots is a great challenge. The robust policy is trained through domain randomization over a range of parameters describing robot attributes and terrain geometry. Details are shown in Table 4.

3.2 Results analysis

3.2.1 Evaluation of robust locomotion

The typical urban environment is mainly composed of flat land, slopes, steps, gravel and obstacles, which pose varying degrees of challenge to the robot's robustness. Similar to the terrain curriculum used in the training phase, a testing terrain environment is generated. It is composed of rough flats, slopes, stairs and discrete obstacles at different difficulty levels to evaluate the robust locomotion performance of the student network, which has access only to the proprioception buffer.

The method has been implemented on the Unitree Go1, utilizing only joint position encoders and an IMU sensor to collect proprioceptive information. Figure 3 shows snapshots of the robot's performance in typical urban terrains.
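The deployment pipeline of Sect. 3.1 runs the policy at 50 Hz and tracks its target joint angles with a 500 Hz PD controller using k_p = 20 and k_d = 0.5. The torque law below is a minimal sketch of that conversion; the default pose and the action-to-angle scale are illustrative assumptions, since the paper does not state how the 12-D action is offset and scaled.

```python
import numpy as np

KP, KD = 20.0, 0.5       # PD gains reported in the paper's hardware details
ACTION_SCALE = 0.25      # assumed action-to-angle scaling (not given in the paper)
# Illustrative nominal standing pose for 12 joints (hip, thigh, calf for 4 legs).
DEFAULT_POSE = np.tile([0.0, 0.8, -1.5], 4)


def pd_torque(action, q, qd):
    """Map the latest 12-D policy action and measured joint state to torques.

    q, qd: joint angles [rad] and velocities [rad/s] from the joint encoders.
    """
    q_target = DEFAULT_POSE + ACTION_SCALE * np.asarray(action)
    return KP * (q_target - q) - KD * qd


# At rest in the default pose with a zero action, no torque is commanded.
tau = pd_torque(np.zeros(12), DEFAULT_POSE.copy(), np.zeros(12))
```

In deployment this function would be evaluated ten times per policy step (500 Hz versus 50 Hz), holding the latest target angles between network updates.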
Table 4  The domain parameters applied in the simulation

  Parameter        Randomization range                       Unit
  CoM              [−0.1, 0.1] × [−0.1, 0.1] × [−0.1, 0.1]   m
  Payload mass     [−0.5, 2.0]                               kg
  Friction         [0.05, 3.0]                               –
  Restitution      [0.0, 1.0]                                –
  Motor strength   [0.8, 1.2] × motor torque                 Nm
  Joint Kp         [0.8, 1.2] × 20                           –
  Joint Kd         [0.8, 1.2] × 0.5                          –

In all experiments, the robot successfully traverses rough flats without a single failure; these simulate terrains like grass and gravel through different terrain levels and attributes. This type of terrain obstructs the robot's feet, potentially causing dynamic angular slips during movement. The robot must stabilize entangled feet and actively navigate through some of these obstacles to traverse such terrain effectively. When confronted with slopes, the robot can move on terrain with an inclination angle of up to 30°. If the slope increases further, which has not been encountered during training, the robot will yaw while moving forward. By manually giving a reverse yaw command, the robot can climb a 40° slope. This experiment also evaluates the robot's mobility on stairs. When going down stairs, the robot can successfully navigate steps up to 18 cm high with a 100% success rate, and when going up, it can navigate 15 cm steps with a 90% probability, effectively handling typical urban stair terrains. The robot can cross discrete obstacles up to 17 cm high. However, when a single foot becomes stuck in this case, it is difficult for the robot to escape.

The results indicate that the method demonstrates excellent generalization capabilities, adapting to various typical urban terrains with only proprioceptive sensors. An interesting phenomenon is that when encountering stairs, the robot often initially attempts to touch the vertical surface of the step, then adjusts its posture and crosses the terrain with a continuous gait. This confirms that the adaptive module introduced in this method can estimate stairs effectively. Moreover, since the terrain is inferred from historical observations, the robot occasionally missteps. However, it exhibits an impressive ability to recover, swiftly adjusting its posture to continue its progress.

In addition, this experiment assesses the alignment between the model's network output and the actual joint angles, as shown in Fig. 4. The robot walks on flat ground at a speed of 0.5 m/s. It can be seen that the joint movement of the robot is relatively smooth, with the thigh and calf joints tracking the target joint angles well. An error of 0.05° is observed in the hip joint motor, which has no impact on the actual movement.

Fig. 4  The joint angle tracking curve. The robot traverses flat terrain at a speed of 0.5 m/s. Target joint angles are derived from the model's output through a conversion process, while the actual motor angles are obtained by measuring the joint motor angles.

3.2.2 Comparison with baselines

To further assess the motion performance of the method, it is compared with baseline methods in simulation. The algorithms are evaluated through the robot's performance in several typical urban terrains.

Baselines: The method is compared with the following baselines:

1. RMA [19]: A similar teacher-student learning network, in which privileged information is extracted through a 1-D CNN adaptation module, and the training is conducted in two stages.
2. MoB [20]: This controller algorithm can be generalized to various behaviors. By manually adjusting the gait parameters, it adapts well to unknown terrains.
3. MPC [4]: The algorithm embedded in the Unitree Go1, which implements model predictive control through a finite state machine.

These methods are deployed on the Unitree Go1 and evaluated on several typical urban terrains. The experiment constructs step terrains 15 cm and 18 cm in height, a slope of approximately 30°, and a discrete obstacle terrain 17 cm in height. The results presented in Table 5 indicate that the method is generally superior to the baseline methods.

In each terrain test, the robot's forward speed is set at 0.5 m/s. A test is considered successful if the robot traverses the terrain without falling. In the experiment, each terrain and algorithm is tested 10 times, and the success rates are calculated. As shown in Table 5, the method outperforms all others in ascending and descending stairs and can navigate obstacles as high as 17 cm. However, its performance declines sharply with steps higher than 17 cm. This may be because the latent variables extracted by the teacher-student learning network do not accurately estimate the robot's own state. As the difficulty of the terrain increases, the learning difficulty on the student side also increases, resulting in less stable predicted output actions.

Table 5  Results compared with baselines (95% confidence level)

  Terrain           Success rate (%)
                    Ours          RMA           MoB           MPC
  Slope 30°         86.1 ± 13.9   50.0 ± 26.3   28.3 ± 22.7   13.9 ± 13.9
  Step up 15 cm     78.9 ± 19.3   21.1 ± 19.3   13.9 ± 13.9   13.9 ± 13.9
  Step down 18 cm   86.1 ± 13.9   78.9 ± 19.3   42.8 ± 26.0   21.1 ± 19.3
  Obstacle 17 cm    71.7 ± 22.7   13.9 ± 13.9   13.9 ± 13.9   13.9 ± 13.9

  The success rates are presented with their 95% confidence intervals, calculated using the Wilson score interval method. Each value represents the success rate from 10 trials for each terrain.

The experiment also assesses the robot's load-bearing capacity by observing its locomotion performance on flat ground with added weight. Owing to the domain randomization techniques, the robot continues to operate normally even when subjected to a load of up to 2 kg. Moreover, the robot maintains a stable posture even under moderate external force impacts.

Besides, owing to the omnidirectional terrain curriculum, the robot can pass through these complex terrains under commands in any direction. The RMA method, which is trained on a single robot, only achieves travel through complex terrain under a command in the positive x-axis direction due to the limits of training time.

3.2.3 Performance in the real world

To deploy the simulation-trained model on a real robot, this work implements a motion control system for quadruped robots based on a state machine and User Datagram Protocol (UDP) communication, which provides a flexible framework to freely switch between reinforcement learning-based control and traditional control. The relevant code is posted on GitHub: [Link] ee_rl. Leveraging zero-shot learning, the simulation-trained model is seamlessly integrated into the real-world robot.

Figure 5 displays the Unitree Go1 robot, equipped with the algorithm presented in this work, passing through an array of challenging terrains. Owing to the domain randomization technology, the robot consistently achieves high-performance locomotion on flat surfaces with varying friction and restitution coefficients. When facing slope terrain, the robot demonstrates exceptional stability by adjusting its gait and the orientation of its base in a targeted manner. Notably, the slope angle of the terrain in Fig. 5(f) is about 30°, which exceeds the slope angle trained in the simulation, underscoring the robot's robust movement ability on unfamiliar terrain. Moreover, the robot exhibits remarkable stability and adaptability when traversing surfaces that typically induce slipping, such as soft grassy areas and cobblestone paths. Through rigorous testing, it has been confirmed that the robot can ascend stairs up to 15 cm in height while maintaining a stable posture, which is similar to the simulation results. The real robot's demonstration videos can be viewed at https://stx123.github.io/RL-control/, which reflects the effectiveness of the algorithm proposed by this work on real robots.

In Table 6, several terrains that are challenging for robots are listed. The robot traverses these terrains using commands in different directions, repeating the process ten times to obtain the corresponding success rates. To ensure statistical significance, the success rates are further processed using the Wilson score interval method to obtain a 95% confidence interval. The robot can successfully traverse a 16 cm high obstacle using commands in any direction without any failures. On a 30° grassy slope, the robot may tip over sideways due to its own stability issues during lateral traversal. For 15 cm high steps, the backward bending structure of the legs may cause the robot to get stuck when moving backwards over consecutive steps, leading to a lower success rate. Thanks to the design of the omnidirectional terrain curriculum, the robot can traverse unstructured terrains in any direction in the real world.

Fig. 6  Height curve of the front left foot relative to the base under typical terrains

This experiment additionally recorded the foot height relative to the base while the robot walked over various typical urban terrains. The results, illustrated in Fig. 6, show the recordings for the front left foot. These outcomes suggest that the robot can potentially infer the surrounding terrain characteristics from historical observations and accordingly adjust its gait for various terrains. To conserve energy, the robot typically opts for a lower gait height on flat surfaces. When encountering occasional slipping or instability on uneven terrains, the robot can adjust its gait within 1 s to maintain stability. Notably, due to the lack of visual sensors, the robot cannot anticipate the need to elevate its feet before encountering steps. However, upon making contact with a step, it proactively raises its feet to climb the stairs, which greatly demonstrates the …
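The confidence intervals reported in Tables 5 and 6 follow the Wilson score interval for a binomial proportion. The helper below is a standard implementation of that formula (not code from the paper); evaluating it for 10 successes out of 10 trials reproduces the 86.1 ± 13.9 entries seen in Table 5.

```python
from math import sqrt


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half


# With only 10 trials, even a perfect 10/10 run leaves a wide interval:
lo, hi = wilson_interval(10, 10)  # center ~0.861, half-width ~0.139
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and remains informative at 0/10 and 10/10, which is why it suits the small trial counts used here.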
20. Margolis GB, Agrawal P (2023) Walk these ways: tuning robot control for generalization with multiplicity of behavior. In: Conference on Robot Learning, PMLR, pp 22–31
21. Wu J, Xin G, Qi C, Xue Y (2023) Learning robust and agile legged locomotion using adversarial motion priors. IEEE Robot Autom Lett 8:4975
22. Rudin N, Hoeller D, Reist P, Hutter M (2022) Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning, PMLR, pp 91–100
23. Yu W, Yang C, McGreavy C, Triantafyllidis E, Bellegarda G, Shafiee M, Ijspeert AJ, Li Z (2023) Identifying important sensory feedback for learning locomotion skills. Nat Mach Intell 5(8):919–932
24. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
25. Long J, Wang Z, Li Q, Gao J, Cao L, Pang J (2023) Hybrid internal model: learning agile legged locomotion with simulated robot response. arXiv