ADHDP-based robust self-learning 3D trajectory tracking control for underactuated UUVs

PeerJ Computer Science

Introduction

Recently, uncrewed underwater vehicles (UUVs) have garnered substantial interest from research teams due to potential engineering applications such as underwater archaeology, torpedo deployment, and cable installation (Peng, Wang & Han, 2019; Qiao & Zhang, 2020; Wang et al., 2021a; Wang, Gao & Zhang, 2021; Yuan et al., 2023). Most oceanic missions require the tracking of a predefined desired trajectory. In practice, UUVs are often underactuated, displaying significant nonlinearity, model parameter uncertainties, and vulnerability to unknown time-varying disturbances in the ocean environment (Li et al., 2023; Zhou et al., 2023). These features greatly complicate and challenge the development of trajectory tracking controllers.

Since the majority of UUVs are outfitted with three propellers to control the surge, pitch, and yaw motions, they are classified as underactuated systems, in which the number of independent actuators is less than the number of degrees of freedom. This underactuated characteristic makes the design of UUV trajectory tracking control laws more difficult. Further, the incomplete development of hydrodynamic theory complicates the acquisition of UUV model parameters. Neural networks (NNs) have gained substantial research attention because of their universal approximation property. In Shi et al. (2024), Van (2019), Liu & Du (2021) and Wang et al. (2019), NNs were applied to approximate unknown nonlinear dynamics. Nevertheless, achieving precise estimation with NNs necessitates weight identification, which adds considerable computational complexity. To overcome this issue, Shen et al. (2020) proposed combining minimum learning parameter technology with RBFNNs, and a fixed-time sliding mode control scheme with an RBFNN disturbance observer and single-parameter learning was proposed in Zhu et al. (2023). A robust trajectory tracking control scheme was proposed in Heshmati-Alamdari, Nikou & Dimarogonas (2021) for underactuated underwater vehicles operating among obstacles in a constrained workspace. In Chen et al. (2017), a robust control method based on sliding mode control and fuzzy control was developed for underactuated underwater vehicles with uncertain model parameters and unknown disturbances. Accurate trajectory tracking with good robustness was achieved based on a finite-time extended state observer in Wang et al. (2021b).

Currently, most proposed control schemes focus on achieving stable tracking without considering the optimal performance of UUVs under uncertain dynamics and time-varying disturbances. Nonetheless, the intricate marine environment inevitably impacts the control performance of systems with fixed design parameters. In recent times, there has been a wealth of research on the application of adaptive dynamic programming (ADP) in numerous systems to achieve optimal control (Tong, Sun & Sui, 2018; Li, Sun & Tong, 2019). The fault-tolerant tracking control problem for UUVs subjected to time-varying ocean current disturbance is converted into an optimal control problem through the ADP approach in Che & Yu (2020). Through the integration of the backstepping design method, a fault-tolerant control scheme is formulated using a single-critic network-based ADP approach in Che (2022). In Wen et al. (2019), virtual and actual control are designed as the optimization solution of corresponding subsystems, and an optimized backstepping technique based on an actor network is utilized to enhance the control performance of the surface vessel system. A fault-tolerant fuzzy-optimized control scheme for tracking control of underactuated autonomous underwater vehicles is proposed in Gong, Er & Liu (2024) for complicated oceanic environments, particularly with the existence of unknown actuator failures and uncertain dynamics. Based on the above analyses, it is imperative to explore an intelligent control scheme with self-learning and self-adaptive capability to improve the control performance of UUVs under uncertain dynamics and time-varying ocean disturbances.

Motivated by the above discussions, this work develops a robust self-learning control scheme based on action-dependent heuristic dynamic programming (ADHDP) for underactuated UUVs subject to uncertain dynamics and unknown time-varying external disturbances. The main contributions of this work are as follows.

  • In existing robust adaptive control methods, such as Li et al. (2021) and Yang et al. (2021), a large number of control design parameters (e.g., the adaptive gain and the leakage term gain) must be manually set to fixed values, which makes it difficult to guarantee optimal control performance. The ADHDP-based control scheme proposed in this article automatically updates the control parameters in response to dynamic uncertainties and unknown time-varying perturbations; it not only reduces the need to manually design the control parameters, but also greatly improves the control performance and adaptability of the system.

  • Unlike traditional optimal tracking control schemes, such as those in Zhang, Li & Liu (2018) and Zheng et al. (2020), which usually require partial model information (e.g., a known or partially known control gain) for parameter tuning and strategy optimization, the ADHDP-based control scheme proposed in this article does not rely on model parameter information; it requires only the input and output data of the UUV.

The remainder of this work is organized as follows. The problem formulation and preliminaries are introduced in “Problem Formulation and Preliminaries”. The design process and stability proof of the robust self-learning control law are given in “Control Law Design”. “Simulation Results” presents the simulation results. The conclusions of this work are summarized in “Conclusion”.

Notations: In this work, $\operatorname{diag}(\cdot)$ denotes a diagonal matrix; $\lambda_{\min}(\cdot)$ denotes the minimum eigenvalue of a matrix; $\|\cdot\|$ denotes the 2-norm of a matrix or vector; $s(\cdot)$ and $c(\cdot)$ denote the sine and cosine functions, respectively.

Problem formulation and preliminaries

Motion mathematical model of UUVs

The mathematical model of underactuated UUVs (Yan et al., 2024) is given as:

$\dot{\eta} = J(\eta)v$

$M\dot{v} + C(v)v + Dv + g(\eta) = \tau + d$

where $\eta=[x,y,z,\theta,\varphi]^T$ is the position vector, consisting of the surge position $x$, sway position $y$, heave position $z$, pitch angle $\theta$ and yaw angle $\varphi$ of the UUV in the earth-fixed frame, and $v=[u,\nu,w,q,r]^T$ is the velocity vector, consisting of the surge velocity $u$, sway velocity $\nu$, heave velocity $w$, pitch velocity $q$ and yaw velocity $r$ of the UUV in the body-fixed frame. The earth-fixed and body-fixed reference frames of the UUV are defined as indicated in Fig. 1. $D=\operatorname{diag}(d_{11},d_{22},d_{33},d_{44},d_{55})$ represents the hydrodynamic damping matrix, and $M=\operatorname{diag}(m_{11},m_{22},m_{33},m_{44},m_{55})$ is the positive-definite inertia matrix. $\tau=[\tau_u,0,0,\tau_q,\tau_r]^T$ is the control input vector, consisting of the surge force $\tau_u$, pitch moment $\tau_q$ and yaw moment $\tau_r$. $d=[d_1,d_2,d_3,d_4,d_5]^T$ represents the environmental disturbance vector. $g(\eta)=[0,0,0,\rho g\nabla GM_L s(\theta),0]^T$, where $\rho$, $g$, $\nabla$ and $GM_L$ denote the water density, gravity acceleration, displaced volume and longitudinal metacentric height, respectively. The rotation matrix and the Coriolis and centripetal force matrix are

$J(\eta)=\begin{bmatrix} c(\theta)c(\varphi) & -s(\varphi) & s(\theta)c(\varphi) & 0 & 0 \\ c(\theta)s(\varphi) & c(\varphi) & s(\theta)s(\varphi) & 0 & 0 \\ -s(\theta) & 0 & c(\theta) & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1/c(\theta) \end{bmatrix}, \qquad C(v)=\begin{bmatrix} 0 & 0 & 0 & m_{33}w & -m_{22}\nu \\ 0 & 0 & 0 & 0 & m_{11}u \\ 0 & 0 & 0 & -m_{11}u & 0 \\ -m_{33}w & 0 & m_{11}u & 0 & 0 \\ m_{22}\nu & -m_{11}u & 0 & 0 & 0 \end{bmatrix}.$

Figure 1: The earth-fixed and the body-fixed reference frames of the UUV.
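For concreteness, the following minimal Python sketch integrates the model in Eqs. (1) and (2) with a forward-Euler step. The matrix entries mirror the definitions above and the numerical parameter values are taken from the simulation section; the function names and the explicit integration scheme are our own illustrative choices rather than part of the original scheme.

```python
import numpy as np

M = np.diag([47.5, 94.0, 94.0, 13.5, 13.4])   # inertia matrix M
D = np.diag([13.5, 45.0, 45.0, 23.8, 27.2])   # damping matrix D
RHO_G_V_GML = 8.8                             # rho*g*nabla*GM_L

def J_eta(eta):
    """Rotation matrix J(eta) for eta = [x, y, z, theta, phi]."""
    th, ph = eta[3], eta[4]
    c, s = np.cos, np.sin
    return np.array([[c(th)*c(ph), -s(ph), s(th)*c(ph), 0.0, 0.0],
                     [c(th)*s(ph),  c(ph), s(th)*s(ph), 0.0, 0.0],
                     [-s(th),       0.0,   c(th),       0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 0.0, 1.0/c(th)]])

def C_v(v):
    """Coriolis/centripetal matrix C(v) for v = [u, nu, w, q, r]."""
    u, nu, w = v[0], v[1], v[2]
    m11, m22, m33 = 47.5, 94.0, 94.0
    return np.array([[0.0, 0.0, 0.0,  m33*w, -m22*nu],
                     [0.0, 0.0, 0.0,  0.0,    m11*u],
                     [0.0, 0.0, 0.0, -m11*u,  0.0],
                     [-m33*w, 0.0,  m11*u, 0.0, 0.0],
                     [ m22*nu, -m11*u, 0.0, 0.0, 0.0]])

def model_step(eta, v, tau, d, dt):
    """One forward-Euler step of Eqs. (1)-(2)."""
    g_eta = np.array([0.0, 0.0, 0.0, RHO_G_V_GML*np.sin(eta[3]), 0.0])
    eta_dot = J_eta(eta) @ v
    v_dot = np.linalg.solve(M, tau + d - C_v(v) @ v - D @ v - g_eta)
    return eta + dt*eta_dot, v + dt*v_dot
```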

Assumption 1 The model parameter matrices $M$, $C$ and $D$ are unavailable.

Assumption 2 The disturbance vector $d$ is bounded yet unknown and time-varying, satisfying $\|d\| \le d_m$, where $d_m$ is a positive constant.

Assumption 3 The desired trajectory of the UUV in this article is denoted as $\eta_d=[x_d,y_d,z_d]^T$. The components of $\eta_d$ and their first two time derivatives are bounded.

Assumption 4 The sway velocity ν and the heave velocity w are bounded.

Transformation of UUV’s position

To initiate the discussion, the subsequent coordinate transformation is employed to tackle the underactuation issue of UUVs:

$\eta_1 = \left[x + Fc(\theta)c(\varphi),\ y + Fc(\theta)s(\varphi),\ z - Fs(\theta)\right]^T$

where $F$ is a small positive constant.

According to Eqs. (1)–(3), we can obtain the following equations:

$\dot{\eta}_1 = J_1(\eta)v_1 + R_1(\eta,v)$

$M_1\dot{v}_1 + C_1(v)v_1 + D_1v_1 + g_1(\eta) = \tau_1 + d_1$

where $J_1(\eta)=\begin{bmatrix} c(\theta)c(\varphi) & -Fs(\theta)c(\varphi) & -Fs(\varphi) \\ c(\theta)s(\varphi) & -Fs(\theta)s(\varphi) & Fc(\varphi) \\ -s(\theta) & -Fc(\theta) & 0 \end{bmatrix}$, $M_1=\operatorname{diag}(m_{11},m_{44},m_{55})$, $v_1=[u,q,r]^T$, $C_1(v)=\begin{bmatrix} 0 & m_{33}w & -m_{22}\nu \\ (m_{11}-m_{33})w & 0 & 0 \\ (m_{22}-m_{11})\nu & 0 & 0 \end{bmatrix}$, $R_1(\eta,v)=\begin{bmatrix} -\nu s(\varphi)+w s(\theta)c(\varphi) \\ \nu c(\varphi)+w s(\theta)s(\varphi) \\ w c(\theta) \end{bmatrix}$, $\tau_1=[\tau_u,\tau_q,\tau_r]^T$, $d_1=[d_1,d_4,d_5]^T$, $D_1=\operatorname{diag}(d_{11},d_{44},d_{55})$ and $g_1(\eta)=[0,\rho g\nabla GM_L s(\theta),0]^T$.

The goal of this work is to propose a robust self-learning control law based on ADHDP that ensures the UUV trajectory tracking error converges to a small compact set in the presence of uncertain dynamics and unknown time-varying disturbances, while all signals in the UUV trajectory tracking closed-loop control system remain bounded.

Lemma 1 (Luo et al., 2020; Na et al., 2020): Let $f(s)$ be a continuous function defined on a compact set $\Omega_s\subset\mathbb{R}^n$. Then there exists a radial basis function neural network (RBFNN) such that, for an arbitrarily small constant $\bar{\Theta}$,

$f(s) = \omega^T G(s) + \Theta$

where $\omega$ denotes the optimal weight vector and $\Theta$ denotes the reconstruction error of the NN, which satisfies $\|\Theta\| \le \bar{\Theta}$. $G(s)=[G_1(s),G_2(s),\dots,G_n(s)]^T$ denotes the basis function vector, where each $G_i(s)$ is chosen as a Gaussian function.

Control law design

Robust adaptive NN control law design

Notate the tracking error s1=[s11,s12,s13]T as:

$s_1 = \eta_1 - \eta_d.$

According to Eqs. (5) and (7), we have

$\dot{s}_1 = J_1(\eta)v_1 + R_1(\eta,v) - \dot{\eta}_d.$

Design the virtual control law ςR3 for v1 as follows

$\varsigma = J_1^{-1}(\eta)\left(-k_1 s_1 - R_1(\eta,v) + \dot{\eta}_d\right)$

where $k_1\in\mathbb{R}^{3\times3}$ is a positive-definite design matrix.

To proceed, we can define the velocity error vector s2=[s21,s22,s23]T as follows:

$s_2 = v_1 - \varsigma.$

In the light of Eqs. (6) and (10), we can obtain the time derivative of s2

$M_1\dot{s}_2 = -C_1(v)v_1 - D_1v_1 - g_1(\eta) + \tau_1 + d_1 - M_1\dot{\varsigma}.$

An RBFNN is used to approximate the uncertain term $-C_1(v)v_1 - D_1v_1 - g_1(\eta) - M_1\dot{\varsigma}$, i.e.,

$-C_1(v)v_1 - D_1v_1 - g_1(\eta) - M_1\dot{\varsigma} = \omega^T G(J) + \Theta$

where $\omega$ is the ideal weight matrix, $J=[v^T,\varsigma^T,\theta]^T$ denotes the input vector, $G(J)$ represents the basis function vector, and $\Theta$ is the NN reconstruction error vector with $\|\Theta\| \le \bar{\Theta}$.

With the aid of virtual parameter learning technology, the following inequality can be obtained

$\left\|\omega^T G(J) + \Theta + d_1\right\| \le \omega_m\|G(J)\| + \bar{\Theta} + d_m \le H\phi$

where $H=\max\{\omega_m, \bar{\Theta}+d_m\}$ is a virtual parameter without physical meaning, $\omega_m$ is the upper bound of $\|\omega\|$, and $\phi = \|G(J)\| + 1$ is a scalar function.

Design the control law for control input as:

$\tau_1 = -k_2 s_2 - J_1^T(\eta)s_1 - \hat{H}\phi^2 s_2$

where $k_2\in\mathbb{R}^{3\times3}$ is a positive-definite design matrix and $\hat{H}$ denotes the estimate of $H$, with estimation error $\tilde{H} = H - \hat{H}$.

The adaptive law is given as

$\dot{\hat{H}} = \Gamma\left(\phi^2\|s_2\|^2 - \vartheta\hat{H}\right)$

where $\Gamma$ and $\vartheta$ are positive design constants.
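A minimal discrete-time sketch of Eqs. (9), (10), (14) and (15) is given below, assuming the gain values of the simulation section; the Gaussian basis vector `G_J` passed in by the caller and the forward-Euler update of $\hat{H}$ are our own illustrative assumptions.

```python
import numpy as np

k1 = np.diag([0.2, 0.1, 0.2])        # k1 from the simulation section
k2 = np.diag([10.0, 10.0, 10.0])     # k2 from the simulation section
GAMMA, VARTHETA = 0.8, 0.002         # Gamma and vartheta

def robust_adaptive_control(s1, v1, J1, R1, eta_d_dot, G_J, H_hat, dt):
    """s1: tracking error, v1 = [u, q, r], J1/R1 as in Eqs. (4)-(5),
    G_J: RBF basis vector G(J), H_hat: current estimate of H."""
    varsigma = np.linalg.solve(J1, -k1 @ s1 - R1 + eta_d_dot)  # Eq. (9)
    s2 = v1 - varsigma                                         # Eq. (10)
    phi = np.linalg.norm(G_J) + 1.0                            # scalar phi
    tau1 = -k2 @ s2 - J1.T @ s1 - H_hat * phi**2 * s2          # Eq. (14)
    # adaptive law, Eq. (15), integrated by forward Euler
    H_hat_next = H_hat + dt * GAMMA * (phi**2 * s2 @ s2 - VARTHETA * H_hat)
    return tau1, H_hat_next
```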

Theorem 1 Consider the UUV trajectory tracking closed-loop control system Eqs. (1) and (2) under Assumptions 1–4, with the virtual control law Eq. (9), the control law Eq. (14) and the adaptive law Eq. (15). The actual trajectory $\eta_1$ can track the desired trajectory $\eta_d$, and all signals in the UUV trajectory tracking closed-loop control system are bounded.

Proof: Select the following Lyapunov function candidate

$V = \frac{1}{2}\left(s_1^T s_1 + s_2^T M_1 s_2 + \Gamma^{-1}\tilde{H}^2\right)$

The time derivative of Eq. (16) is

$\dot{V} = s_1^T\dot{s}_1 + s_2^T M_1\dot{s}_2 + \Gamma^{-1}\tilde{H}\left(-\dot{\hat{H}}\right)$

In view of Eqs. (8)–(10), we have

$s_1^T\dot{s}_1 = s_1^T J_1(\eta)s_2 - s_1^T k_1 s_1$

Substituting Eqs. (13) and (14) into Eq. (11) and rearranging yields

$s_2^T M_1\dot{s}_2 \le \|s_2\| H\phi + s_2^T\left(-k_2 s_2 - J_1^T(\eta)s_1 - \hat{H}\phi^2 s_2\right) \le \frac{H}{4} + s_2^T\left(-k_2 s_2 - J_1^T(\eta)s_1 + \tilde{H}\phi^2 s_2\right)$

In view of Eq. (15), one has:

$\Gamma^{-1}\tilde{H}\left(-\dot{\hat{H}}\right) = -\tilde{H}\left(\phi^2\|s_2\|^2 - \vartheta\hat{H}\right) \le -\tilde{H}\phi^2\|s_2\|^2 - \frac{1}{2}\vartheta\tilde{H}^2 + \frac{1}{2}\vartheta H^2.$

Substituting Eqs. (18)–(20) into Eq. (17) and rearranging yields:

$\dot{V} = s_1^T\dot{s}_1 + s_2^T M_1\dot{s}_2 + \Gamma^{-1}\tilde{H}\left(-\dot{\hat{H}}\right) \le -s_1^T k_1 s_1 - s_2^T k_2 s_2 - \frac{1}{2}\vartheta\tilde{H}^2 + \frac{1}{2}\vartheta H^2 + \frac{H}{4} \le -2\sigma V + \psi$

where $\sigma = \min\left\{\lambda_{\min}(k_1),\ \lambda_{\min}\left(k_2 M_1^{-1}\right),\ \frac{1}{2}\vartheta\Gamma\right\}$ and $\psi = \frac{1}{2}\vartheta H^2 + \frac{H}{4}$.

Solving Eq. (21) yields

$0 \le V \le \frac{\psi}{2\sigma} + \left[V(0) - \frac{\psi}{2\sigma}\right]\exp(-2\sigma t).$

This shows that $V$ is uniformly ultimately bounded. From Eq. (16), $s_1$, $s_2$ and $\tilde{H}$ are uniformly ultimately bounded. Thus, Theorem 1 is proved.

Optimal control law design

To optimize the tracking performance online, an optimal control law based on ADHDP is developed in this section. A schematic diagram of the proposed closed-loop control system of the UUV is presented in Fig. 2. An action–critic network structure is developed: the action network is constructed to learn the optimal control law $\tau_a=[\tau_{a1},\tau_{a2},\tau_{a3}]^T$, and the critic network is constructed to approximate the cost function. Notate $\bar{s}=[s^T(t-\Delta t),\ s^T(t)]^T$ with $s=[s_{11},s_{12},s_{13},s_{21},s_{22},s_{23}]^T$, where $\Delta t$ denotes the uniform sampling interval.

Figure 2: Schematic diagram of the proposed closed-loop control system of the UUV.

Herein, the cost function is formulated as follows:

$P(\bar{s}(t),\tau_a(t)) = \sum_{b=1}^{\infty}\beta^{\,b-1}\,Y\!\left(\bar{s}(t+b\Delta t),\tau_a(t+b\Delta t)\right)$

where $0<\beta\le 1$ is a discount factor and $Y(\bar{s}(t),\tau_a(t))=\bar{s}^T F\bar{s}+\tau_a^T L\tau_a$ represents the utility function. Herein, $F\in\mathbb{R}^{12\times 12}$ and $L\in\mathbb{R}^{3\times 3}$ are positive-definite design matrices.

The Bellman equation is as follows (Liang, Xu & Zhang, 2023):

$P(t-\Delta t) = \min_{\tau_a(t-\Delta t)}\left[Y(t) + \beta P(t)\right]$

The goal of this section is to make the system output $\eta_1$ track the desired trajectory $\eta_d$ in an optimal manner while minimizing the cost function.

1) Critic network:

A multilayer perceptron with three layers (input, hidden, and output) is introduced to approximate the cost function $P(t)$. The input vector is $I_c=[s^T(t-\Delta t),\ s^T(t),\ \tau_a^T]^T$, and $I_{ch}$ is the number of hidden nodes. $\omega_{cij}^{(1)}(t)$ and $\omega_{cj}^{(2)}(t)$ denote the weights of the critic network, where $i=1,\dots,15$ and $j=1,\dots,I_{ch}$. The hyperbolic tangent-type activation function $\Xi(p)=(1-e^{-p})/(1+e^{-p})$ is adopted in this work, and $p_{cj}(t)$ and $q_{cj}(t)$ are intermediate variables. The approximation $\hat{P}(t)$ is then derived as

$p_{cj}(t) = \sum_{i=1}^{12}\omega_{cij}^{(1)}(t)\,\bar{s}_i(t) + \sum_{i=1}^{3}\omega_{c(i+12)j}^{(1)}(t)\,\tau_{ai}(t)$

$q_{cj}(t) = \Xi_{cj}\left(p_{cj}(t)\right) = \frac{1-e^{-p_{cj}(t)}}{1+e^{-p_{cj}(t)}}$

$\hat{P}(t) = \omega_c^{(2)}(t)^T\Xi_c(t) = \sum_{j=1}^{I_{ch}}\omega_{cj}^{(2)}(t)\,q_{cj}(t).$
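The critic forward pass of Eqs. (25)–(27) can be written compactly as below; the weight shapes and the function signature are our assumptions, consistent with the 15 critic inputs, and the activation is the paper's $\Xi(\cdot)$ (numerically equal to $\tanh(p/2)$).

```python
import numpy as np

def critic_forward(s_bar, tau_a, w_c1, w_c2):
    """s_bar: 12-vector [s(t-dt); s(t)], tau_a: 3-vector,
    w_c1: (15, I_ch) input-layer weights, w_c2: (I_ch,) output weights."""
    I_c = np.concatenate([s_bar, tau_a])               # critic input, 15 entries
    p_c = I_c @ w_c1                                   # Eq. (25)
    q_c = (1.0 - np.exp(-p_c)) / (1.0 + np.exp(-p_c))  # Eq. (26)
    P_hat = w_c2 @ q_c                                 # Eq. (27)
    return P_hat, q_c
```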

2) Action network:

A multilayer perceptron with three layers (input, hidden, and output) is introduced to approximate the optimal control law $\tau_a$. The input vector of the action network is $\bar{s}$, and its output vector is $\tau_a$. $I_{ah}$ is the number of hidden nodes. $\omega_{a\kappa n}^{(1)}(t)$ and $\omega_{ank}^{(2)}(t)$ denote the weights of the action network, where $\kappa=1,\dots,12$, $n=1,\dots,I_{ah}$ and $k=1,2,3$. Here, $p_{an}(t)$ and $q_{an}(t)$ are intermediate variables. Further, we have

$p_{an}(t) = \sum_{\kappa=1}^{12}\omega_{a\kappa n}^{(1)}(t)\,\bar{s}_\kappa(t)$

$q_{an}(t) = \Xi_{an}\left(p_{an}(t)\right) = \frac{1-e^{-p_{an}(t)}}{1+e^{-p_{an}(t)}}$

$\tau_{ak}(t) = \sum_{n=1}^{I_{ah}}\omega_{ank}^{(2)}(t)\,q_{an}(t).$
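A companion sketch of the action network forward pass, Eqs. (28)–(30), under the same assumed conventions as `critic_forward` above:

```python
import numpy as np

def actor_forward(s_bar, w_a1, w_a2):
    """s_bar: 12-vector, w_a1: (12, I_ah), w_a2: (I_ah, 3)."""
    p_a = s_bar @ w_a1                                 # Eq. (28)
    q_a = (1.0 - np.exp(-p_a)) / (1.0 + np.exp(-p_a))  # Eq. (29)
    tau_a = q_a @ w_a2                                 # Eq. (30)
    return tau_a, q_a
```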

3) Adaptation of the critic network:

The critic network’s error function can be described as:

$O_c(t) = \beta\hat{P}(t) - \left[\hat{P}(t-\Delta t) - Y(t)\right]$

Hence, the critic network’s objective function for weights update can be defined as:

$E_c(t) = \frac{1}{2}O_c^2(t).$

A gradient descent algorithm is applied to update the weights. Here, $\omega_{cij}^{(1)}(t+\Delta t)=\omega_{cij}^{(1)}(t)+\Delta\omega_{cij}^{(1)}(t)$ and $\omega_{cj}^{(2)}(t+\Delta t)=\omega_{cj}^{(2)}(t)+\Delta\omega_{cj}^{(2)}(t)$. According to Eqs. (28)–(30), the weight updating laws $\Delta\omega_{cij}^{(1)}$ and $\Delta\omega_{cj}^{(2)}$ of the critic network can be calculated using the chain rule:

$\Delta\omega_{cij}^{(1)} = -\mu_c\,\frac{\partial E_c(t)}{\partial \hat{P}(t)}\,\frac{\partial \hat{P}(t)}{\partial q_{cj}(t)}\,\frac{\partial q_{cj}(t)}{\partial p_{cj}(t)}\,\frac{\partial p_{cj}(t)}{\partial \omega_{cij}^{(1)}(t)} = -\mu_c\beta O_c(t)\,\omega_{cj}^{(2)}(t)\left[\tfrac{1}{2}\left(1-\Xi_{cj}^2(t)\right)\right]I_{ci}(t)$

$\Delta\omega_{cj}^{(2)} = -\mu_c\,\frac{\partial E_c(t)}{\partial \hat{P}(t)}\,\frac{\partial \hat{P}(t)}{\partial \omega_{cj}^{(2)}(t)} = -\mu_c\beta O_c(t)\,\Xi_{cj}(t)$

where $\mu_c$ is the learning rate.
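Since only the output-layer weights are tuned in practice (see Remark 1 below), the critic update of Eqs. (31), (32) and (35) reduces to the following sketch; the default rate and discount follow the simulation section, and the function interface is hypothetical.

```python
def critic_update(w_c2, q_c, P_hat, P_hat_prev, Y, mu_c=0.1, beta=0.95):
    """One gradient-descent step on E_c = 0.5*O_c**2 with respect to w_c2."""
    O_c = beta * P_hat - (P_hat_prev - Y)   # prediction error, Eq. (31)
    return w_c2 - mu_c * beta * O_c * q_c   # Eq. (35): dE_c/dw_c2 = beta*O_c*Xi_c
```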

4) Adaptation of the action network:

The action network can be adjusted by backpropagating the error between the desired value and the approximate value of the critic network.

Herein, we have

$O_a(t) = \hat{P}(t) - P^*(t)$

$E_a(t) = \frac{1}{2}O_a^2(t)$

where $P^*(t)$ denotes the desired value of the cost function.

Similarly, a gradient descent algorithm is applied to update the weights of the action network. Here, $\omega_{a\kappa n}^{(1)}(t+\Delta t)=\omega_{a\kappa n}^{(1)}(t)+\Delta\omega_{a\kappa n}^{(1)}(t)$ and $\omega_{ank}^{(2)}(t+\Delta t)=\omega_{ank}^{(2)}(t)+\Delta\omega_{ank}^{(2)}(t)$. According to Eqs. (31)–(33), the weight updating laws $\Delta\omega_{a\kappa n}^{(1)}$ and $\Delta\omega_{ank}^{(2)}$ of the action network can be calculated using the chain rule:

$\Delta\omega_{a\kappa n}^{(1)} = -\mu_a\,\frac{\partial E_a(t)}{\partial \hat{P}(t)}\left[\frac{\partial \hat{P}(t)}{\partial \tau_{ak}(t)}\right]^T\frac{\partial \tau_{ak}(t)}{\partial q_{an}(t)}\,\frac{\partial q_{an}(t)}{\partial p_{an}(t)}\,\frac{\partial p_{an}(t)}{\partial \omega_{a\kappa n}^{(1)}(t)} = -\mu_a O_a(t)\left(\sum_{j=1}^{I_{ch}}\omega_{cj}^{(2)}(t)\left[\tfrac{1}{2}\left(1-\Xi_{cj}^2(t)\right)\right]\sum_{i=1}^{3}\omega_{c(i+12)j}^{(1)}(t)\right)\omega_{ank}^{(2)}(t)\left[\tfrac{1}{2}\left(1-\Xi_{an}^2(t)\right)\right]\bar{s}_{\kappa}(t)$

$\Delta\omega_{ank}^{(2)} = -\mu_a\,\frac{\partial E_a(t)}{\partial \hat{P}(t)}\,\frac{\partial \hat{P}(t)}{\partial \tau_{ak}(t)}\,\frac{\partial \tau_{ak}(t)}{\partial \omega_{ank}^{(2)}} = -\mu_a O_a(t)\,\Xi_{an}(t)\left(\sum_{j=1}^{I_{ch}}\omega_{cj}^{(2)}(t)\left[\tfrac{1}{2}\left(1-\Xi_{cj}^2(t)\right)\right]\sum_{i=1}^{3}\omega_{c(i+12)j}^{(1)}(t)\right)$

where $\mu_a$ is the learning rate.
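Likewise, the output-layer action update of Eqs. (36)–(39) backpropagates $\hat{P}$ through the critic to the three action inputs. A sketch, assuming the weight layouts of `critic_forward` and a desired cost value $P^*$ of zero:

```python
import numpy as np

def actor_update(w_a2, q_a, q_c, P_hat, w_c1, w_c2, mu_a=0.1, P_star=0.0):
    """One gradient-descent step on E_a = 0.5*O_a**2 with respect to w_a2."""
    O_a = P_hat - P_star                       # Eq. (36), P_star assumed zero
    dP_dqc = w_c2 * 0.5 * (1.0 - q_c**2)       # dP_hat/dq_c, shape (I_ch,)
    dP_dtau = w_c1[12:, :] @ dP_dqc            # rows 13-15 of w_c1 feed tau_a
    return w_a2 - mu_a * O_a * np.outer(q_a, dP_dtau)   # Eq. (39)
```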

Remark 1 As a matter of convenience, only the output-layer weights of the critic and action networks are tuned during the learning process, whereas the input-layer weights in Eqs. (25) and (28) are randomly initialized. As described by Liang, Xu & Zhang (2023), as the number of hidden nodes of an NN increases, the NN's approximation error converges to an adequately small value.

Stability analysis

The optimal control law based on ADHDP is developed to optimize the tracking performance online. The actual control input signal is composed of the robust adaptive NN control law and the optimal control law. The stability of the robust adaptive control has been established in Theorem 1. The ADHDP technique searches for an optimal control action while approximating the Bellman equation with the critic network.

The ideal weights $\omega_c$ and $\omega_a$ of the critic and action networks are assumed to be bounded, i.e.,

$\|\omega_c\| \le \omega_{cm}, \qquad \|\omega_a\| \le \omega_{am}$

where $\omega_{cm}$ and $\omega_{am}$ are positive constants.

In the light of Eqs. (27) and (30), we have:

$\hat{P}(t) = \omega_c^{(2)}(t)^T\Xi_c(t)$

$\tau_a = \omega_a^{(2)}(t)^T\Xi_a(t)$

where $\tilde{\omega}_c^{(2)} = \omega_c^{(2)} - \omega_c$ and $\tilde{\omega}_a^{(2)} = \omega_a^{(2)} - \omega_a$ are the estimation errors of $\omega_c^{(2)}$ and $\omega_a^{(2)}$, respectively.

Theorem 2 Consider the updating law Eq. (38) for the action network weights and the updating law Eq. (34) for the critic network weights, and assume the utility function $Y(t)$ is a bounded positive semidefinite function. Then the estimation errors $\tilde{\omega}_c^{(2)}$ and $\tilde{\omega}_a^{(2)}$ are bounded.

Proof: Select the following Lyapunov function candidate as:

$L_V(t) = L_{V1}(t) + L_{V2}(t)$

where $L_{V1}(t) = \frac{1}{\mu_c}\operatorname{tr}\left(\tilde{\omega}_c^{(2)}(t)^T\tilde{\omega}_c^{(2)}(t)\right)$ and $L_{V2}(t) = \frac{1}{\gamma\mu_a}\operatorname{tr}\left(\tilde{\omega}_a^{(2)}(t)^T\tilde{\omega}_a^{(2)}(t)\right)$, with $\gamma$ being a positive design constant.

The first difference of LV(t) can be written as:

$\Delta L_V(t) = \Delta L_{V1}(t) + \Delta L_{V2}(t).$

Here, ΔLV1(t) and ΔLV2(t) can be calculated as

$\Delta L_{V1}(t) = \frac{1}{\mu_c}\operatorname{tr}\left(\tilde{\omega}_c^{(2)}(t+\Delta t)^T\tilde{\omega}_c^{(2)}(t+\Delta t) - \tilde{\omega}_c^{(2)}(t)^T\tilde{\omega}_c^{(2)}(t)\right)$

$\Delta L_{V2}(t) = \frac{1}{\gamma\mu_a}\operatorname{tr}\left(\tilde{\omega}_a^{(2)}(t+\Delta t)^T\tilde{\omega}_a^{(2)}(t+\Delta t) - \tilde{\omega}_a^{(2)}(t)^T\tilde{\omega}_a^{(2)}(t)\right).$

From Eq. (34), $\tilde{\omega}_c^{(2)}(t+\Delta t)$ can be obtained as:

$\tilde{\omega}_c^{(2)}(t+\Delta t) = \tilde{\omega}_c^{(2)}(t) + \Delta\hat{\omega}_c^{(2)}(t) = \tilde{\omega}_c^{(2)}(t) - \mu_c\beta\,\Xi_c(t)\left[\beta\omega_c^{(2)}(t)^T\Xi_c(t) + Y(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t)\right]^T.$

According to Eqs. (44) and (46), we have:

$\Delta L_{V1} = -2\beta\delta_c(t)\left[Y(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t) + \beta\omega_c^{(2)}(t)^T\Xi_c(t)\right]^T + \mu_c\beta^2\left\|\Xi_c(t)\right\|^2\left\|\beta\omega_c^{(2)}(t)^T\Xi_c(t) + Y(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t)\right\|^2$

where $\delta_c(t)=\tilde{\omega}_c^{(2)}(t)^T\Xi_c(t)$.

Define $A = -2\beta\delta_c(t)\left[Y(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t) + \beta\omega_c^{(2)}(t)^T\Xi_c(t)\right]^T$. The first term in Eq. (47) can then be rewritten as:

$A = \left\|\beta\omega_c^{(2)}(t)^T\Xi_c(t) + Y(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t) - \beta\delta_c(t)\right\|^2 - \left\|\beta\omega_c^{(2)}(t)^T\Xi_c(t) + Y(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t)\right\|^2 - \left\|\beta\delta_c(t)\right\|^2.$

Substituting Eq. (48) into Eq. (47), we have:

$\Delta L_{V1}(t) = -\beta^2\left(1-\mu_c\beta^2\|\Xi_c(t)\|^2\right)\left\|\beta^{-1}Y(t) - \beta^{-1}\omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t) + \omega_c^{(2)}(t)^T\Xi_c(t)\right\|^2 + \left\|\beta\omega_c^T\Xi_c(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t) + Y(t)\right\|^2 - \beta^2\|\delta_c(t)\|^2.$

From Eq. (38), we have:

$\tilde{\omega}_a^{(2)}(t+\Delta t) = \tilde{\omega}_a^{(2)}(t) + \Delta\hat{\omega}_a^{(2)}(t) = \tilde{\omega}_a^{(2)}(t) - \mu_a\,\Xi_a(t)\,\omega_c^{(2)}(t)^T N(t)\left[\omega_c^{(2)}(t)^T\Xi_c(t)\right]^T$

According to Eqs. (45) and (50), we have:

$\Delta L_{V2}(t) = \frac{1}{\gamma}\operatorname{tr}\left(-2\delta_a(t)\,\omega_c^{(2)}(t)^T N(t)\left[\omega_c^{(2)}(t)^T\Xi_c(t)\right]^T + \mu_a\|\Xi_a(t)\|^2\left\|\omega_c^{(2)}(t)^T N(t)\right\|^2\left\|\omega_c^{(2)}(t)^T\Xi_c(t)\right\|^2\right)$

where $\delta_a(t)=\tilde{\omega}_a^{(2)}(t)^T\Xi_a(t)$.

The first term in Eq. (51) can be further calculated as:

$-2\delta_a(t)\,\omega_c^{(2)}(t)^T N(t)\left[\omega_c^{(2)}(t)^T\Xi_c(t)\right]^T = \left\|\omega_c^{(2)}(t)^T\Xi_c(t) - \omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2 - \left\|\omega_c^{(2)}(t)^T\Xi_c(t)\right\|^2 - \left\|\omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2.$

Substituting Eq. (52) into Eq. (51) yields

$\Delta L_{V2}(t) = \frac{1}{\gamma}\operatorname{tr}\left(\left\|\omega_c^{(2)}(t)^T\Xi_c(t) - \omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2 + \mu_a\|\Xi_a(t)\|^2\left\|\omega_c^{(2)}(t)^T N(t)\right\|^2\left\|\omega_c^{(2)}(t)^T\Xi_c(t)\right\|^2 - \left\|\omega_c^{(2)}(t)^T\Xi_c(t)\right\|^2 - \left\|\omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2\right)$

By using Cauchy–Schwarz inequality, we have:

$\left\|\omega_c^{(2)}(t)^T\Xi_c(t) - \omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2 \le 4\|\delta_c(t)\|^2 + 4\left\|\omega_c^T\Xi_c(t)\right\|^2 + 2\left\|\omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2.$

Substituting Eq. (54) into Eq. (53), we have:

$\Delta L_{V2}(t) \le \frac{1}{\gamma}\left[-\left(1-\mu_a\|\Xi_a(t)\|^2\left\|\omega_c^{(2)}(t)^T N(t)\right\|^2\right)\left\|\omega_c^{(2)}(t)^T\Xi_c(t)\right\|^2 + 4\|\delta_c(t)\|^2 + 4\left\|\omega_c^T\Xi_c(t)\right\|^2 + \left\|\omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2\right].$

According to Eqs. (49) and (55), we have:

$\Delta L_V(t) = \Delta L_{V1}(t) + \Delta L_{V2}(t) \le -\beta^2\left(1-\mu_c\beta^2\|\Xi_c(t)\|^2\right)\left\|\omega_c^{(2)}(t)^T\Xi_c(t) + \beta^{-1}Y(t) - \beta^{-1}\omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t)\right\|^2 - \left(\beta^2 - \frac{4}{\gamma}\right)\|\delta_c(t)\|^2 - \frac{1}{\gamma}\left(1-\mu_a\|\Xi_a(t)\|^2\left\|\omega_c^{(2)}(t)^T N(t)\right\|^2\right)\left\|\omega_c^{(2)}(t)^T\Xi_c(t)\right\|^2 + \Omega$

where $\Omega = \frac{4}{\gamma}\left\|\omega_c^T\Xi_c(t)\right\|^2 + \frac{1}{\gamma}\left\|\omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2 + \left\|\beta\omega_c^T\Xi_c(t) + Y(t) - \omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t)\right\|^2.$

By using Cauchy–Schwarz inequality, we have:

$\Omega(t) \le \left(4\beta^2 + \frac{4}{\gamma}\right)\left\|\omega_c^T\Xi_c(t)\right\|^2 + \frac{1}{\gamma}\left\|\omega_c^{(2)}(t)^T N(t)\delta_a(t)\right\|^2 + 4\|Y(t)\|^2 + 2\left\|\omega_c^{(2)}(t-\Delta t)^T\Xi_c(t-\Delta t)\right\|^2 \le \left(4\beta^2 + \frac{4}{\gamma} + 2\right)\bar{\omega}_c^2\,\Xi_{cm}^2 + 4Y_m^2 + \frac{1}{\gamma}\omega_{cM}^2 N_m^2\,\bar{\omega}_{am}^2\,\Xi_{am}^2 = D_m$

where $\bar{\omega}_{am}$, $\omega_{cM}$, $\Xi_{cm}$, $\Xi_{am}$, $Y_m$ and $N_m$ are the upper bounds of $\|\tilde{\omega}_a(t)\|$, $\|\tilde{\omega}_c(t)\|$, $\|\Xi_c(t)\|$, $\|\Xi_a(t)\|$, $\|Y(t)\|$ and $\|N(t)\|$, respectively, and $\bar{\omega}_c=\max\{\omega_{cm},\omega_{cM}\}$.

Choosing $\mu_a < 1/\left(\|\Xi_a(t)\|^2\left\|\omega_c^{(2)}(t)^T N(t)\right\|^2\right)$, $\mu_c < 1/\left(\beta^2\|\Xi_c(t)\|^2\right)$, $\|\delta_c(t)\|^2 > D_m/\left(\beta^2 - 4/\gamma\right)$ and $\beta^2 - 4/\gamma > 0$, we obtain $\Delta L_V(t) \le 0$.

According to the Lyapunov stability theorem, the estimation errors $\tilde{\omega}_c^{(2)}$ and $\tilde{\omega}_a^{(2)}$ are bounded.

Remark 2 The actual input signal of the UUV consists of the robust adaptive control law and the optimal control law. The velocity error and trajectory tracking error under the robust adaptive control law are the driving signals of the optimal control law based on ADHDP. In other words, the optimal control law based on ADHDP is inactive and the equivalent output is 0 if the velocity and trajectory are able to track the desired trajectory under the robust adaptive control law.

Remark 3 Compared with the trajectory tracking control laws developed in Li et al. (2021) and Yang et al. (2021), manual adjustment of the control law design parameters does not easily guarantee control performance. Once the UUV is subject to larger uncertain dynamics and time-varying disturbances, fixed control law design parameters cannot attain the desired tracking accuracy. With the incorporation of ADHDP into the control law design, the control law design parameters are updated automatically, thus effectively improving the control performance of the system.

Simulation results

In this section, we perform simulations on an underactuated UUV to show the effectiveness and superiority of the proposed control scheme. As a comparison, the robust adaptive NN control scheme without the ADHDP-based optimal control law is labeled "RC", and an augmented Kalman filter-based control scheme following Vafamand, Arefi & Anvari-Moghaddam (2023) is labeled "AKF", with the design parameters of the augmented Kalman filter selected as in Vafamand, Arefi & Anvari-Moghaddam (2023). The proposed ADHDP-based self-learning control scheme is labeled "ADHDP". The model parameters of the UUV (Liang et al., 2020) are given as follows: $m_{11}=47.5\,\mathrm{kg}$, $m_{22}=94\,\mathrm{kg}$, $m_{33}=94\,\mathrm{kg}$, $m_{44}=13.5\,\mathrm{kg\,m^2}$, $m_{55}=13.4\,\mathrm{kg\,m^2}$, $d_{11}=13.5\,\mathrm{kg\,s^{-1}}$, $d_{22}=45\,\mathrm{kg\,s^{-1}}$, $d_{33}=45\,\mathrm{kg\,s^{-1}}$, $d_{44}=23.8\,\mathrm{kg\,m^2\,s^{-1}}$, $d_{55}=27.2\,\mathrm{kg\,m^2\,s^{-1}}$ and $\rho g\nabla GM_L=8.8$.

The unknown disturbance vector is given as $d(t)=[5\sin(0.15t)+2\cos(0.15t),\ 0.5\sin(0.15t)+0.5\cos(0.15t),\ 0.5\sin(0.15t)+0.5\cos(0.15t),\ \sin(0.15t)+\cos(0.15t),\ \sin(0.15t)+\cos(0.15t)]^T$. The desired trajectory is given as $x_d=10\cos(0.1t)$, $y_d=10\sin(0.1t)$ and $z_d=10-0.1t$.

The initial conditions are chosen as $\eta(0)=[8.5\,\mathrm{m}, 1\,\mathrm{m}, 9\,\mathrm{m}, 0.1\pi\,\mathrm{rad}, 0\,\mathrm{rad}]^T$ and $v(0)=[0\,\mathrm{m/s}, 0\,\mathrm{m/s}, 0\,\mathrm{m/s}, 0\,\mathrm{rad/s}, 0\,\mathrm{rad/s}]^T$. The design parameters of the robust adaptive NN control law are selected as $F=0.3$, $\Gamma=0.8$, $\vartheta=0.002$, $k_1=\operatorname{diag}(0.2,0.1,0.2)$ and $k_2=\operatorname{diag}(10,10,10)$.

The design parameters of the optimal control law are chosen as $\beta=0.95$, $F=0.001 I_{12\times12}$ and $L=0.001 I_{3\times3}$, respectively. The learning rates are initialized as $\mu_a(0)=0.1$ and $\mu_c(0)=0.1$ and gradually decrease over time to the final values $\mu_a(\infty)=0.005$ and $\mu_c(\infty)=0.005$. The numbers of hidden nodes of the critic and action networks are $I_{ch}=6$ and $I_{ah}=5$, and the initial weights of the action–critic network are randomly generated within the range $[-0.2, 0.2]$. The maximal iteration numbers are set as $n_a=200$ for the action network and $n_c=200$ for the critic network, and internal training at each step is completed when the maximal iteration number is reached. Herein, the performance metrics are defined as $J_{s1}=\int_0^T(|s_{11}|+|s_{12}|+|s_{13}|)\,dt$ and $J_{s2}=\int_0^T(|s_{21}|+|s_{22}|+|s_{23}|)\,dt$.
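For reference, the metrics $J_{s1}$ and $J_{s2}$ can be accumulated from logged error trajectories as in the sketch below; the sampling step and the logging arrays are our assumptions.

```python
import numpy as np

def performance_metrics(s1_log, s2_log, dt):
    """s1_log, s2_log: arrays of shape (N, 3) sampled every dt seconds."""
    J_s1 = np.sum(np.abs(s1_log)) * dt   # integral of |s11|+|s12|+|s13|
    J_s2 = np.sum(np.abs(s2_log)) * dt   # integral of |s21|+|s22|+|s23|
    return J_s1, J_s2
```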

Figures 3–8 show the simulation results of the proposed control scheme and the comparison control schemes. From Fig. 3, it can be seen that the underactuated UUV can accurately track the 3D spiral dive trajectory under the proposed control scheme. Specifically, it can be seen from Fig. 4 that the proposed control scheme achieves a smaller tracking error than the other two control schemes. It is also observed in Fig. 4 that the tracking error under "AKF" is smaller than that under "RC". The reason is that the main role of Kalman filtering in control is to provide accurate estimates of the system state, especially in the presence of noise and uncertainty, and these estimates help to improve the control strategy and hence the stability and performance of the system. As can be seen in Figs. 5 and 6, the optimal control part acts to minimize the tracking error resulting from the initial state deviation. As the weights are adaptively adjusted, the tracking error and utility function gradually decrease; as the tracking error approaches 0, the utility function decreases to 0 and the weights remain constant. Moreover, the control performance can be assessed through the performance metrics shown in Fig. 7. Clearly, the performance metric values of the proposed control scheme are smaller than those of the compared control schemes, corroborating that the proposed control scheme is able to optimize the tracking performance. The performance metric values under "AKF" are smaller than those under "RC", which further demonstrates the effect of Kalman filtering in Fig. 7. As shown in Fig. 8, the control input of the proposed control scheme remains within a reasonably bounded range.

Figure 3: Desired and actual trajectories of the UUV.

Figure 4: The tracking error of the UUV.

Figure 5: The weights of action network.

Figure 6: Utility function.

Figure 7: The performance metrics.

Figure 8: The actual control input.

Conclusion

In this work, we have proposed a robust self-learning control scheme based on ADHDP to deal with the 3D trajectory tracking control problem of a UUV with uncertain dynamics and time-varying ocean disturbances. By combining the ADHDP-based optimization scheme with the robust adaptive control scheme, an adaptive self-learning optimal scheme with an online learning function is obtained. The proposed control method requires less model information and is thus better suited to the actual situation of the system. In addition, the control parameters can be automatically updated according to changes in the external environment and unknown dynamics. Theoretical analyses and comparative simulations show that the proposed control scheme is effective and superior.

Supplemental Information

The model parameters of the UUV and all the design parameters of the control law.

DOI: 10.7717/peerj-cs.2605/supp-1