ADHDP-based robust self-learning 3D trajectory tracking control for underactuated UUVs
- Academic Editor
- Valentina Emilia Balas
- Subject Areas
- Adaptive and Self-Organizing Systems, Autonomous Systems, Robotics
- Keywords
- Unmanned underactuated vehicles (UUVs), Robust adaptive control, Trajectory tracking, Action-dependent heuristic dynamic programming (ADHDP)
- Copyright
- © 2024 Zhao et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
- Cite this article
- 2024. ADHDP-based robust self-learning 3D trajectory tracking control for underactuated UUVs. PeerJ Computer Science 10:e2605 https://doi.org/10.7717/peerj-cs.2605
Abstract
In this work, we propose a robust self-learning control scheme based on action-dependent heuristic dynamic programming (ADHDP) to tackle the 3D trajectory tracking control problem of underactuated uncrewed underwater vehicles (UUVs) with uncertain dynamics and time-varying ocean disturbances. Initially, a radial basis function neural network is introduced to convert the compound uncertain element, comprising uncertain dynamics and time-varying ocean disturbances, into a linear parametric form with just one unknown parameter. Then, to improve the tracking performance of the UUV trajectory tracking closed-loop control system, an actor-critic neural network structure based on ADHDP technology is introduced to adaptively adjust the weights of the actor-critic network, optimizing the performance index function. Finally, an ADHDP-based robust self-learning control scheme is constructed, which endows the UUV closed-loop system with good robustness and control performance. The theoretical analysis demonstrates that all signals in the UUV trajectory tracking closed-loop control system are bounded. Simulation results for the UUV validate the effectiveness of the proposed control scheme.
Introduction
Recently, uncrewed underwater vehicles (UUVs) have garnered substantial interest from research teams due to potential engineering applications such as underwater archaeology, torpedo deployment, and cable installation (Peng, Wang & Han, 2019; Qiao & Zhang, 2020; Wang et al., 2021a; Wang, Gao & Zhang, 2021; Yuan et al., 2023). Most oceanic missions require the tracking of a predefined desired trajectory. In practice, UUVs are often underactuated, displaying significant nonlinearity, model parameter uncertainties, and vulnerability to unknown time-varying disturbances in the ocean environment (Li et al., 2023; Zhou et al., 2023). These features greatly complicate and challenge the development of trajectory tracking controllers.
Since the majority of UUVs are outfitted with only three propellers, controlling the surge, pitch, and yaw motions, they are classified as underactuated systems, in which the number of independent actuators is less than the number of degrees of freedom. This underactuated characteristic makes the design of UUV trajectory tracking control laws more difficult. Further, the incomplete development of the hydraulic system complicates the acquisition of UUV model parameters. Neural networks (NNs) have gained substantial research attention because of their universal approximation property. In Shi et al. (2024), Van (2019), Liu & Du (2021) and Wang et al. (2019), NNs were applied to approximate unknown nonlinear dynamics. Nevertheless, achieving precise estimation with NNs necessitates weight identification, which adds considerable computational complexity. To overcome this issue, Shen et al. (2020) proposed combining minimum-learning-parameter technology with RBFNNs, and a fixed-time sliding mode control scheme with an RBFNN disturbance observer and single-parameter learning was proposed in Zhu et al. (2023). A robust trajectory tracking control scheme was proposed in Heshmati-Alamdari, Nikou & Dimarogonas (2021) for underactuated underwater vehicles operating among obstacles in a constrained workspace. In Chen et al. (2017), a robust control method based on sliding mode control and fuzzy control was developed for underactuated underwater vehicles with uncertain model parameters and unknown disturbances. Accurate trajectory tracking with good robustness was achieved based on a finite-time extended state observer in Wang et al. (2021b).
Currently, most proposed control schemes focus on achieving stable tracking without considering the optimal performance of UUVs under uncertain dynamics and time-varying disturbances. Nonetheless, the intricate marine environment inevitably impacts the control performance of systems with fixed design parameters. In recent times, there has been a wealth of research on the application of adaptive dynamic programming (ADP) in numerous systems to achieve optimal control (Tong, Sun & Sui, 2018; Li, Sun & Tong, 2019). The fault-tolerant tracking control problem for UUVs subjected to time-varying ocean current disturbance is converted into an optimal control problem through the ADP approach in Che & Yu (2020). Through the integration of the backstepping design method, a fault-tolerant control scheme is formulated using a single-critic network-based ADP approach in Che (2022). In Wen et al. (2019), virtual and actual control are designed as the optimization solution of corresponding subsystems, and an optimized backstepping technique based on an actor network is utilized to enhance the control performance of the surface vessel system. A fault-tolerant fuzzy-optimized control scheme for tracking control of underactuated autonomous underwater vehicles is proposed in Gong, Er & Liu (2024) for complicated oceanic environments, particularly with the existence of unknown actuator failures and uncertain dynamics. Based on the above analyses, it is imperative to explore an intelligent control scheme with self-learning and self-adaptive capability to improve the control performance of UUVs under uncertain dynamics and time-varying ocean disturbances.
Motivated by the above discussions, this work develops a robust self-learning control scheme based on action-dependent heuristic dynamic programming (ADHDP) for underactuated UUVs subject to uncertain dynamics and unknown time-varying external disturbances. The main contributions of this work are as follows.
In existing robust adaptive control methods such as Li et al. (2021) and Yang et al. (2021), a large number of control design parameters (e.g., adaptive gains, leakage-term gains) must be manually set to fixed values, which makes it difficult to ensure optimized control performance. The ADHDP-based control scheme proposed in this article automatically updates the control parameters in response to uncertain dynamics and unknown time-varying disturbances; our work not only reduces the need for manual parameter design, but also greatly improves the control performance and adaptability of the system.
Unlike traditional optimal tracking control schemes, such as those in Zhang, Li & Liu (2018) and Zheng et al. (2020), which usually require partial model information (e.g., a known or partially known control gain) for parameter tuning and strategy optimization, the ADHDP-based control scheme proposed in this article no longer relies on model parameter information, but only on the input and output data of the UUV.
The remainder of this work is organized as follows. The problem formulation and preliminaries are introduced in “Problem Formulation and Preliminaries”. The design process and stability proof of the robust self-learning control scheme are given in “Control Law Design”. “Simulation Results” presents the simulation results. The conclusions of this work are summarized in “Conclusion”.
Notations: In this work, denotes the diagonal matrix; denotes the minimum eigenvalue of the matrix; denotes the 2-norm value of the matrix or vector; and denote the sine and cosine functions, respectively.
Problem formulation and preliminaries
Motion mathematical model of UUVs
The mathematical model of underactuated UUVs (Yan et al., 2024) is given as:
(1)
(2) where is the position vector, consisting of the surge position , the sway position , the heave position , the pitch angle and the yaw angle of the UUV in the earth-fixed frame, and is the velocity vector, consisting of the surge velocity , the sway velocity , the heave velocity , the pitch velocity and the yaw velocity of the UUV in the body-fixed frame. In this article, we define the earth-fixed and body-fixed reference frames of the UUV as indicated in Fig. 1. represents the hydrodynamic damping matrix. is the positive-definite inertia matrix. is the control input vector, consisting of the surge force , the pitch moment and the yaw moment . represents the environmental disturbance vector. with , , and being the water density, gravitational acceleration, displaced volume and longitudinal metacentric height, respectively. represents the rotation matrix. represents the Coriolis and centripetal force matrix.
Figure 1: The earth-fixed and the body-fixed reference frames of the UUV.
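For readers implementing the model, the kinematics and dynamics described above (the standard form η̇ = J(η)ν and Mν̇ + C(ν)ν + D(ν)ν + g(η) = τ + τ_d) can be simulated with a simple forward-Euler step. The sketch below assumes the common roll-neglected 5-DOF kinematic matrix; the placeholder matrices are illustrative, not the paper's identified parameters:

```python
import numpy as np

def kinematic_transform(theta, psi):
    """Standard 5-DOF kinematic matrix J(eta) mapping body-fixed velocities
    [u, v, w, q, r] to earth-fixed rates [x_dot, y_dot, z_dot, theta_dot, psi_dot],
    assuming roll is neglected (illustrative; the paper's exact matrix is not shown)."""
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([
        [cp * ct, -sp, cp * st, 0.0, 0.0],
        [sp * ct,  cp, sp * st, 0.0, 0.0],
        [-st,     0.0, ct,      0.0, 0.0],
        [0.0,     0.0, 0.0,     1.0, 0.0],
        [0.0,     0.0, 0.0,     0.0, 1.0 / ct],
    ])

def euler_step(eta, nu, tau, tau_d, M, C, D, g, dt=0.01):
    """One forward-Euler step of eta_dot = J(eta) nu and
    M nu_dot + C nu + D nu + g = tau + tau_d (C, D held constant over the step)."""
    J = kinematic_transform(eta[3], eta[4])
    eta_next = eta + dt * (J @ nu)
    nu_next = nu + dt * np.linalg.solve(M, tau + tau_d - C @ nu - D @ nu - g)
    return eta_next, nu_next
```

In practice the damping and Coriolis matrices would be recomputed from ν at every step; a fixed-step integrator is shown only to make the model structure concrete.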
Assumption 1 The model parameter matrices , and are unavailable.
Assumption 2 The disturbance vector is bounded but unknown and time-varying, with , where is a positive constant.
Assumption 3 The desired trajectory of the UUV in this article is denoted as . The components of and their first two time derivatives are bounded.
Assumption 4 The sway velocity and the heave velocity are bounded.
Transformation of UUV’s position
To initiate the discussion, the subsequent coordinate transformation is employed to tackle the underactuation issue of UUVs:
(3) where is a small positive constant.
According to Eqs. (1)–(3), we can obtain the following equation as:
(4)
(5) where , , , , , , , and .
The goal of this work is to propose a robust self-learning control law based on ADHDP to ensure that the UUV trajectory tracking error converges to a small compact set in the presence of uncertain dynamics and unknown time-varying disturbances, while all signals in the UUV trajectory tracking closed-loop control system remain bounded.
Lemma 1: (Luo et al., 2020; Na et al., 2020) Let be a continuous function defined on a compact set . For an arbitrarily small constant , there exists a radial basis function neural network (RBFNN) satisfying
(6) where denotes the optimal weight vector, denotes the reconstruction error of the NN, which satisfies , and denotes the basis function vector. Herein, is chosen as the Gaussian function.
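The RBFNN approximator of Lemma 1, W^T φ(x) with Gaussian basis functions, can be sketched as follows; the centers and width are illustrative design choices, not values from the paper:

```python
import numpy as np

def gaussian_basis(x, centers, width):
    """Gaussian basis vector phi(x), phi_i(x) = exp(-||x - c_i||^2 / width^2),
    the standard choice referenced in Lemma 1."""
    x = np.asarray(x, dtype=float)
    sq_dist = np.sum((np.asarray(centers, dtype=float) - x) ** 2, axis=1)
    return np.exp(-sq_dist / width ** 2)

def rbfnn_output(x, W, centers, width):
    """RBFNN approximation W^T phi(x) of an unknown continuous function."""
    return np.asarray(W, dtype=float).T @ gaussian_basis(x, centers, width)
```

Each basis function peaks at 1 on its own center and decays smoothly with distance, which is what gives the network its universal approximation property on a compact set.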
Control law design
Robust adaptive NN control law design
Notate the tracking error as:
(7)
According to Eqs. (5) and (7), we have
(8)
Design the virtual control law for as follows
(9) where is a positive-definite design matrix.
To proceed, we can define the velocity error vector as follows:
(10)
In the light of Eqs. (6) and (10), we can obtain the time derivative of
(11)
Using the RBFNN to approximate the uncertain term , we have
(12) where is the ideal weight matrix, denotes the input vector, represents the basis function vector. is the NN reconstruction error vector with .
With the aid of virtual parameter learning technology, the following inequality can be obtained
(13) where is a virtual parameter without physical meaning, and is a scalar function.
Design the control law for control input as:
(14) where denotes a positive-definite design matrix, and denotes the estimate of H with estimation error .
The adaptive law is given as
(15) where and are positive design constants.
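Since the body of Eq. (15) is not reproduced in the text, the sketch below shows one common single-parameter adaptive law with σ-modification leakage that matches the structure described (an adaptive gain and a leakage gain, both positive constants). It is an illustrative stand-in, not the paper's exact law:

```python
import numpy as np

def adaptive_update(H_hat, z2, phi, gamma, sigma, dt):
    """One Euler step of an illustrative sigma-modification adaptive law,
    H_hat_dot = gamma * (||z2|| * phi - sigma * H_hat), where z2 is the
    velocity error vector, phi a scalar function of the state, gamma the
    adaptive gain, and sigma the leakage gain. Names are assumptions; the
    paper's Eq. (15) is not reproduced in the text."""
    H_dot = gamma * (np.linalg.norm(z2) * phi - sigma * H_hat)
    return H_hat + dt * H_dot
```

The leakage term pulls the estimate back toward zero when the error signal vanishes, which is what yields the uniform ultimate boundedness argument used in Theorem 1.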
Theorem 1 Consider the UUV trajectory tracking closed-loop control system Eqs. (1), (2) under Assumptions 1–4, the virtual control law Eq. (9), the control law Eq. (14), with the adaptive law Eq. (15). The actual trajectory can track the desired trajectory , while all signals in the UUV trajectory tracking closed-loop control system are bounded.
Proof: Select the following Lyapunov function candidate
(16)
The time derivative of Eq. (16) is
(17)
In view of Eqs. (8)–(10), we have
(18)
Substituting Eqs. (13) and (14) into Eq. (11) and rearranging Eq. (11) yields
(19)
In view of Eq. (15), one has:
(20)
Substituting Eqs. (18)–(20) into (17) and rearranging Eq. (17) yields:
(21) where and .
Solving Eq. (21) yields
(22)
This shows that V is uniformly ultimately bounded. From Eq. (17), , and are also uniformly ultimately bounded. Thus, Theorem 1 is proved.
Optimal control law design
To optimize the tracking performance online, an optimal control law based on ADHDP is developed in this section. A schematic diagram of the proposed closed-loop control system of the UUV is presented in Fig. 2. An actor-critic network structure is developed: the actor network is constructed to learn the optimal control law , and the critic network is constructed to approximate the cost function. Notate with , where denotes the uniform time interval.
Figure 2: Schematic diagram of the proposed closed-loop control system of the UUV.
Herein, the cost function is formulated as follows:
(23) where is a discount factor. represents the utility function. Herein, and are positive-definite design matrices.
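The discounted cost of Eq. (23) with a quadratic utility can be made concrete as below; the truncation horizon and the matrices Q, R are illustrative:

```python
import numpy as np

def utility(z, tau, Q, R):
    """Quadratic utility U(k) = z^T Q z + tau^T R tau with positive-definite
    Q and R, matching the structure described for the utility function."""
    z = np.asarray(z, dtype=float)
    tau = np.asarray(tau, dtype=float)
    return float(z @ Q @ z + tau @ R @ tau)

def discounted_cost(utilities, gamma):
    """Finite-horizon approximation of J(k) = sum_i gamma^i * U(k+i),
    with discount factor 0 < gamma < 1."""
    return float(sum(gamma ** i * u for i, u in enumerate(utilities)))
```

With γ < 1 the infinite sum converges whenever the utility is bounded, which is exactly the assumption invoked in Theorem 2.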
The Bellman equation is as follows (Liang, Xu & Zhang, 2023):
(24)
The goal of this section is to make the system output track the desired trajectory in an optimal manner while minimizing the cost function.
1) Critic network:
A three-layer multilayer perceptron containing input, hidden, and output layers is introduced to approximate the cost function . The input vector is . is the number of hidden nodes. and denote the weights of the critic network, where and . The hyperbolic tangent function is adopted as the hidden-layer activation function. Here, and are the intermediate variables. Further, one can derive the approximation as
(25)
(26)
(27)
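The critic forward pass of Eqs. (25)–(27), a tanh hidden layer followed by a linear scalar output, can be sketched as follows; the weight shapes and the name `critic_forward` are illustrative:

```python
import numpy as np

def critic_forward(x, Wc1, Wc2):
    """Three-layer critic network of the form in Eqs. (25)-(27): the input x
    stacks the tracking errors and the control input, the hidden layer uses
    the hyperbolic tangent activation, and the output is a linear scalar
    cost-to-go estimate J_hat = Wc2^T tanh(Wc1^T x)."""
    h = np.tanh(Wc1.T @ x)   # hidden-layer output
    return float(Wc2 @ h)    # scalar cost-to-go estimate J_hat
```

The same pattern, with a vector output instead of a scalar, gives the action network of Eqs. (28)–(30).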
2) Action network:
A three-layer multilayer perceptron containing input, hidden, and output layers is introduced to approximate the optimal control law . The input vector of the action network is , and the output vector of the action network is . is the number of hidden nodes. and denote the weights of the action network, where , and . Here, and are the intermediate variables. Further, we have
(28)
(29)
(30)
3) Adaptation of the critic network:
The critic network’s error function can be described as:
(31)
Hence, the critic network’s objective function for weights update can be defined as:
(32)
A gradient descent algorithm is applied to update the weights. Here, , . According to Eqs. (25)–(27), the weight updating laws and of the critic network can be calculated using the chain rule
(33)
(34) where is the learning rate.
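The gradient-descent step on the critic's output-layer weights can be sketched as follows, using the common ADHDP temporal-difference error e_c(k) = γĴ(k) − [Ĵ(k−1) − U(k)] consistent with Eq. (31); the variable names and the restriction to the output layer (per Remark 1) are assumptions:

```python
import numpy as np

def critic_weight_step(Wc2, h, J_now, J_prev, U_now, gamma, lc):
    """One gradient-descent update of the critic output weights Wc2.
    With J_hat = Wc2^T h, the error e_c = gamma*J_hat(k) - (J_hat(k-1) - U(k))
    and objective E_c = 0.5*e_c^2 give dE_c/dWc2 = e_c * gamma * h(k),
    so Wc2 <- Wc2 - lc * e_c * gamma * h with learning rate lc."""
    e_c = gamma * J_now - (J_prev - U_now)
    return Wc2 - lc * e_c * gamma * h
```

When the Bellman residual e_c is zero, the weights stay fixed, which is the stationarity condition the critic is trained toward.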
4) Adaptation of the action network:
The action network can be adjusted by backpropagating the error between the desired value and the approximate value of the critic network.
Herein, we have
(35)
(36)
Similarly, a gradient descent algorithm is applied to update the weights of the action network. Here, , . According to Eqs. (28)–(30) and (35)–(36), the weight updating laws and of the action network can be calculated using the chain rule:
(37)
(38) where is the learning rate.
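The actor's output-layer update, backpropagating the critic's output error as described above, can be sketched as follows. The names, the zero desired cost, and the exact chain-rule factorization are illustrative assumptions rather than the paper's Eqs. (37)–(38):

```python
import numpy as np

def action_weight_step(Wa2, g_hidden, J_hat, dJ_du, la, J_desired=0.0):
    """One gradient-descent update of the action output weights Wa2.
    The actor error e_a = J_hat - J_desired (J_desired is usually 0) with
    objective E_a = 0.5*e_a^2 is backpropagated through the critic: since
    u = Wa2^T g_hidden (g_hidden the actor's tanh hidden output), the chain
    rule gives dE_a/dWa2 = e_a * outer(g_hidden, dJ_hat/du), where dJ_hat/du
    is the critic's gradient with respect to the control input."""
    e_a = J_hat - J_desired
    return Wa2 - la * e_a * np.outer(g_hidden, dJ_du)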
Remark 1 For convenience, only the output-layer weights of the actor-critic network are tuned during the learning process, whereas the weights in Eqs. (25) and (28) are randomly initialized. As described by Liang, Xu & Zhang (2023), as the number of hidden nodes increases, the NN approximation error converges to an adequately small value.
Stability analysis
The optimal control law based on ADHDP is developed to optimize the tracking performance online. The actual control input signal is composed of the robust adaptive NN control law and the optimal control law. The stability of the robust adaptive control has been established in Theorem 1. The ADHDP technique seeks an optimal control action while approximating the Bellman equation with the critic network.
The ideal weights and of the critic and action networks are bounded:
(39) where and are positive constants.
In the light of Eqs. (27) and (30), we have:
(40)
(41) where and are the estimation error of and , respectively.
Theorem 2 Consider the updating law Eq. (38) for the action network weights and the updating law Eq. (34) for the critic network weights, and assume the utility function is a bounded positive semidefinite function. Then, the estimation errors and are bounded.
Proof: Select the following Lyapunov function candidate as:
(42) where , with being a design constant.
The first difference of can be written as:
(43)
Here, and can be calculated as
(44)
(45)
From Eq. (34), can be obtained:
(46)
According to Eqs. (44) and (46), we have:
(47) where .
Define . The first term in Eq. (47) can then be written as:
(48)
Substituting Eq. (48) into Eq. (47), we have:
(49)
From Eq. (38), we have:
(50)
According to Eqs. (45) and (50), we have:
(51) where .
The last term in Eq. (51) can be further calculated as:
(52)
Substituting Eq. (52) into Eq. (51) yields
(53)
By using Cauchy–Schwarz inequality, we have:
(54)
Substituting Eq. (54) into Eq. (53), we have:
(55)
According to Eqs. (49) and (55), we have:
(56) where .
By using Cauchy–Schwarz inequality, we have:
(57) where , , , , and are the upper bounds of , , , , and , respectively, and .
Choosing , , and , then .
According to the Lyapunov stability theorem, the estimation errors and are bounded.
Remark 2 The actual input signal of the UUV consists of the robust adaptive control law and the optimal control law. The velocity error and trajectory tracking error under the robust adaptive control law are the driving signals of the ADHDP-based optimal control law. In other words, the ADHDP-based optimal control law is inactive and its output is 0 if the velocity and trajectory already track the desired trajectory under the robust adaptive control law alone.
Remark 3 For the trajectory tracking control laws developed in Li et al. (2021) and Yang et al. (2021), manual adjustment of the control law design parameters does not easily guarantee control performance. Once the UUV suffers from larger uncertain dynamics and time-varying disturbances, fixed control law design parameters cannot attain the desired tracking accuracy. With the incorporation of ADHDP into the control law design, the design parameters are updated automatically, effectively improving the control performance of the system.
Simulation results
In this section, we perform simulations on an underactuated UUV to show the effectiveness and superiority of the proposed control scheme. For comparison, the robust adaptive NN control scheme without the ADHDP-based optimal control law is labeled “RC”, and an advanced Kalman filter-based control scheme following Vafamand, Arefi & Anvari-Moghaddam (2023) is labeled “AKF”, with the design parameters of the augmented Kalman filter selected to be the same as in Vafamand, Arefi & Anvari-Moghaddam (2023). The proposed ADHDP-based self-learning control scheme is labeled “ADHDP”. The model parameters of the UUV (Liang et al., 2020) are given as follows: , , , , , , , , , and .
The unknown disturbance vector is given as . The desired trajectory is given as , and .
The initial conditions are chosen as , . The design parameters of the robust adaptive NN control law are selected as , , , , .
The design parameters of the optimal control law are chosen as , , , respectively. The learning rates are selected as and , and gradually decrease to the final values and over time. The number of hidden nodes for the actor-critic network is given as , and the initial weights of the actor-critic network are randomly generated within the range . The maximal iteration numbers are set as for the action network and for the critic network; internal training terminates once the maximal iteration number is reached. Herein, the performance metrics are defined as , .
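Since the exact performance-metric expressions are not reproduced in the text, the sketch below shows one common scalar metric of this kind, the time integral of the absolute tracking error, which can be computed from the logged error signal; it is an illustrative stand-in for the paper's definition:

```python
import numpy as np

def integral_abs_error(errors, dt):
    """Integral of the absolute tracking error over the simulation horizon,
    a common scalar metric for comparing schemes such as 'RC', 'AKF' and
    'ADHDP' (the paper's exact metric definition is not reproduced here).
    `errors` is the sampled tracking-error signal, `dt` the sample period."""
    return float(np.sum(np.abs(np.asarray(errors, dtype=float))) * dt)
```

A smaller value indicates tighter tracking over the whole run, which is how the schemes are ranked in Fig. 7.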
Figures 3–8 show the simulation results of the proposed control scheme and the comparison control schemes. From Fig. 3, it can be seen that the underactuated UUV accurately tracks the 3D spiral dive trajectory under the proposed control scheme. Specifically, Fig. 4 shows that the proposed control scheme has a smaller tracking error than the two comparison schemes. It is also observed in Fig. 4 that the tracking error under “AKF” is smaller than that under “RC”. The reason is that the main role of Kalman filtering in control is to provide accurate estimates of the system state, especially in the presence of noise and uncertainty; these estimates help to improve the control strategy and hence the stability and performance of the system. As can be seen in Figs. 5 and 6, the optimal control part acts to minimize the tracking error resulting from the initial state deviation. As the weights are adaptively adjusted, the tracking error and utility function gradually decrease. As the tracking error approaches 0, the utility function decreases to 0 and the weights remain constant. Moreover, the control performance can be assessed through the performance metrics shown in Fig. 7. Clearly, the performance metric values of the proposed control scheme are smaller than those of the compared control schemes, corroborating that the proposed control scheme is able to optimize the tracking performance. The performance metric values under “AKF” are smaller than those under “RC”; the effect of Kalman filtering is further demonstrated in Fig. 7. As shown in Fig. 8, the control input of the proposed control scheme remains within a reasonably bounded range.
Figure 3: Desired and actual trajectories of the UUV.
Figure 4: The tracking error of the UUV.
Figure 5: The weights of action network.
Figure 6: Utility function.
Figure 7: The performance metrics.
Figure 8: The actual control input.
Conclusion
In this work, we have proposed a robust self-learning control scheme based on ADHDP to deal with the 3D trajectory tracking control problem of a UUV with uncertain dynamics and time-varying ocean disturbances. By combining the ADHDP-based optimization scheme with the robust adaptive control scheme, an adaptive self-learning optimal control scheme with an online learning capability is obtained. The proposed control method requires less model information and is thus better suited to the actual operating conditions of the system. In addition, the control parameters are automatically updated according to changes in the external environment and unknown dynamics. Theoretical analysis as well as comparative simulations show that the proposed control scheme is effective and superior.