Structured Deep Neural Network-Based Backstepping Trajectory Tracking Control for Lagrangian Systems

Jiajun Qian, Liang Xu, Xiaoqiang Ren, Xiaofan Wang

The work was supported in part by the National Natural Science Foundation of China under Grants 62373239, 62273223, 62336005 and 62333011, and the Project of Science and Technology Commission of Shanghai Municipality under Grant 22JC1401401. Jiajun Qian and Xiaofan Wang are with the School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, China. Emails: {qianjiajun, xfwang}@shu.edu.cn. Xiaoqiang Ren is with the School of Mechatronic Engineering and Automation, Shanghai University, and the Key Laboratory of Marine Intelligent Unmanned Swarm Technology and System, Ministry of Education, Shanghai, China. Email: [email protected]. Liang Xu is with the School of Future Technology, Shanghai University, Shanghai, China. Email: [email protected] (corresponding author).

Abstract

Deep neural networks (DNNs) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this paper, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backstepping techniques. By properly designing the neural network structures, the proposed controller ensures closed-loop stability for any compatible neural network parameters. In addition, control performance can be further improved by optimizing the neural network parameters. We also provide explicit upper bounds on the tracking errors in terms of the controller parameters, which allows the desired tracking performance to be achieved by properly selecting these parameters. Furthermore, when the system model is unknown, we propose an improved Lagrangian neural network (LNN) structure to learn the system dynamics and design the controller. We show that in the presence of model approximation errors and external disturbances, closed-loop stability and tracking performance can still be guaranteed. The effectiveness of the proposed approach is demonstrated through simulations.

Index Terms:

deep neural networks, backstepping control, trajectory tracking, stability guarantees, Lagrangian systems.

I Introduction

Learning-based control methods have gained significant attention in the control community owing to advances in machine learning. In particular, neural network-based control has become prevalent due to the excellent function approximation capability of neural networks. Traditional neural network-based control uses shallow neural networks [1]. In contrast, deep neural networks (DNNs) are superior to shallow NNs in representing function compositions [2, 3] and can avoid the curse of dimensionality [4, 5], which motivates the application of DNNs in control systems [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17].

DNNs in control systems are mainly used to learn either dynamics or controllers. In the first category, researchers use DNNs to learn complex models [6, 7, 8, 9, 10] or combine them with first-principles models to capture the uncertainty and residual terms of the system [11, 12]. In the second category, neural networks are used to learn controllers or control certificates. However, since DNNs are black-box models, providing formal stability guarantees for neural network-based controllers is difficult, and many approaches have been proposed to address this challenge.

Neural Lyapunov control methods [13, 14, 15, 16, 17] use neural networks to learn both a Lyapunov function and a control law. This framework was first proposed in [14] and comprises a learner and a falsifier. The learner generates a Lyapunov candidate, while the falsifier searches for points where the candidate violates the Lyapunov condition. The identified points are then added to the training dataset, and the neural Lyapunov candidate and the neural controller are retrained. This process is repeated until the falsifier cannot find any violation points. The method has been applied in various fields, such as robot control [18] and learning safe control strategies [19]. However, identifying points where the Lyapunov candidate violates the Lyapunov condition is computationally expensive. Moreover, several rounds of training and falsification may be needed before a Lyapunov function and a neural network controller are found. As a result, neural Lyapunov control is computationally demanding.

Reducing the computational cost while still providing formal stability guarantees for DNN-based controllers therefore remains challenging. As an alternative, structured DNN controllers have been proposed for Hamiltonian or Lagrangian systems [20, 21, 22, 23, 24]. Closed-loop stability under structured DNN controllers is guaranteed as long as the neural networks have certain structures. These methods require only the solution of a simple training problem, which significantly reduces the computational cost. However, few structured DNN-based controllers exist for the trajectory tracking control problem.

Several neural network-based controllers have been proposed for trajectory tracking control problems [25, 26, 27, 28]. The works [25, 26] use the function approximation capability of Radial Basis Function Neural Networks (RBFNNs) to learn controllers. These methods ensure stability through tailored update laws for the RBFNN parameters but cannot provide performance guarantees. The works [27, 28] use the outputs of neural networks to determine the control gains of traditional controllers, such as kinematic controllers and PID controllers; by training the network parameters, optimal values of the controller parameters can be found quickly and accurately. Since the neural network outputs only determine the control gains, the limitations of traditional controllers, such as the need to linearize the system model, the inability to cope with disturbances, and the failure to adapt to model uncertainty or unknown system models, persist.

In this paper, we present a structured DNN-based controller for the trajectory tracking control of Lagrangian systems. The proposed DNN-based controller is constructed using the backstepping technique [29, 30]. We demonstrate that closed-loop stability is ensured for any compatible values of the neural network parameters. In addition, we provide an explicit upper bound on the tracking errors in terms of the controller parameters. When model information is difficult to obtain, we propose a modified Lagrangian Neural Network (LNN) [8] to learn the system model, and we show that a bounded tracking error can still be guaranteed in the presence of model learning uncertainties and disturbances. Finally, we demonstrate the effectiveness of the proposed tracking controller through a series of simulations.

This paper is organized as follows. In Section II, some preliminaries are provided. Section III gives the tracking controller design and performance analysis. Section IV shows how to use improved LNNs to learn dynamics and control designs that can guarantee closed-loop stability. Several simulation results are given in Section V. Some conclusions are provided in Section VI.

Notations: $\mathbb{R}$ ($\mathbb{R}^{n}$) denotes the set of real numbers ($n$-dimensional real vectors). $\lambda_{\min}(\cdot)$ ($\lambda_{\max}(\cdot)$) denotes the minimum (maximum) eigenvalue of a symmetric matrix. For symmetric matrices $\bm{A},\bm{B}$, $\bm{A}\geq\bm{B}$ means that $\bm{A}-\bm{B}$ is positive semidefinite. $I$ denotes the identity matrix. $W_{1:k}$ denotes the sequence $\{W_{1},\ldots,W_{k}\}$. $\|\cdot\|$ denotes the standard Euclidean norm.

II Preliminaries

In this section, we briefly introduce the Euler-Lagrange equation, Fully Connected Neural Network (FCNN), Lagrangian Neural Networks (LNNs) [8] and the Input Convex Neural Network (ICNN) [31], which will be used in subsequent sections.

II-A Euler-Lagrange Equation

The Euler-Lagrange equation describes the motion of a mechanical system, which is

\displaystyle\frac{d}{dt}\left(\frac{\partial L}{\partial\bm{\dot{q}}}\right)-\frac{\partial L}{\partial\bm{q}}=\bm{u}+\bm{\tau^{d}},\quad L(\bm{q},\bm{\dot{q}})=T(\bm{q},\bm{\dot{q}})-V(\bm{q}),

(1)

where $L$ is the Lagrangian, $T$ is the kinetic energy, $V$ is the potential energy, $\bm{q}$ denotes the generalized coordinates, $\bm{u}$ is the generalized non-conservative force, and $\bm{\tau^{d}}$ is the disturbance acting on the system. Equation (1) can be equivalently written as

\displaystyle\bm{M}(\bm{q})\bm{\ddot{q}}+\bm{C}(\bm{q},\bm{\dot{q}})\bm{\dot{q}}+\bm{G}(\bm{q})=\bm{u}+\bm{\tau^{d}},

(2)

where $\bm{M}(\bm{q})=\frac{\partial^{2}T(\bm{q},\bm{\dot{q}})}{\partial\bm{\dot{q}}\partial\bm{\dot{q}}}$ is the inertia matrix, $\bm{C}(\bm{q},\bm{\dot{q}})=\bm{\dot{M}}(\bm{q})-\frac{1}{2}\bm{\dot{M}}(\bm{q})^{\top}$ is the Coriolis matrix, and $\bm{G}(\bm{q})=\frac{\partial V(\bm{q})}{\partial\bm{q}}$ is the gravitational force vector (with $L=T-V$, the term $-\frac{\partial L}{\partial\bm{q}}$ in (1) contributes $+\frac{\partial V}{\partial\bm{q}}$). The Euler-Lagrange equation has the following properties [32].

Property 1.

$\bm{M}$ is positive definite and is bounded by

\displaystyle a_{1}\|\bm{x}\|^{2}\leq\bm{x}^{\top}\bm{M}\bm{x}\leq a_{2}\|\bm{x}\|^{2},\quad\forall\bm{x}\in\mathbb{R}^{n},

where $a_{1},a_{2}\in\mathbb{R}^{+}$ are positive constants.

Property 2.

$\bm{\dot{M}}-2\bm{C}$ is skew-symmetric, so that $\bm{x}^{\top}(\bm{\dot{M}}-2\bm{C})\bm{x}=0,\ \forall\bm{x}\in\mathbb{R}^{n}$.

II-B Fully Connected Neural Network

A Fully Connected Neural Network (FCNN), also known as a Multilayer Perceptron (MLP), is a DNN architecture consisting of an input layer, one or more hidden layers, and an output layer, in which each neuron is connected to every neuron in the adjacent layers. An FCNN has the following expression:

\displaystyle\bm{y_{i+1}}=\sigma_{i}\left(\bm{W_{i}}\bm{y_{i}}+\bm{b_{i}}\right),\quad i=0,\ldots,k-1,\qquad f(\bm{x};\bm{\gamma})=\bm{y_{k}},

(3)

where $\bm{y_{i}}$ is the output of the $i$-th layer, with $\bm{y_{0}}=\bm{x}$ being the network input, $\sigma_{i}$ is the activation function of layer $i$, and $\bm{\gamma}=\{\bm{W_{0:k-1}},\bm{b_{0:k-1}}\}$ are the trainable parameters of the FCNN.
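A minimal NumPy sketch of the forward pass (3). The tanh hidden activations, the linear output layer, the layer sizes and the randomly drawn weights are illustrative assumptions, not choices taken from the paper:

```python
import numpy as np

def fcnn_forward(x, weights, biases, activations):
    """Evaluate y_{i+1} = sigma_i(W_i y_i + b_i) for i = 0..k-1 and return y_k."""
    y = np.asarray(x, dtype=float)  # y_0 = x
    for W, b, sigma in zip(weights, biases, activations):
        y = sigma(W @ y + b)
    return y

# Illustrative 2-16-16-1 network: tanh hidden layers, linear (identity) output layer.
rng = np.random.default_rng(0)
sizes = [2, 16, 16, 1]
weights = [rng.normal(0.0, 0.5, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]
activations = [np.tanh, np.tanh, lambda s: s]

out = fcnn_forward(np.array([0.3, -1.2]), weights, biases, activations)
print(out)
```

With zero biases and tanh activations the zero input is mapped to the zero output, which is a convenient sanity check of the wiring.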

II-C Lagrangian Neural Network

The Lagrangian Neural Network (LNN) [8] learns the Lagrangian function of a mechanical system from data. Unlike conventional supervised learning, the LNN is constructed to respect the Euler-Lagrange equation, so the Lagrangian itself is learned without direct supervision. By embedding the laws of physics in its structure, the LNN has a better inductive bias.

In LNN, the Lagrangian function is approximated by an FCNN $\bm{\mathcal{L}}(\bm{q},\bm{\dot{q}};\bm{\gamma})$ with the parameters $\bm{\gamma}$ . During training, $N+1$ samples $\{\bm{q}(i),\bm{\dot{q}}(i),\bm{\ddot{q}}(i),\bm{u}(i)\}_{i=0}^{N}$ are collected. Then the LNN is trained by solving the following optimization problem

\displaystyle\min_{\bm{\gamma}}\quad\sum_{i=0}^{N}\left(\bm{\hat{\ddot{q}}}(i)-\bm{\ddot{q}}(i)\right)^{\top}\bm{Q}\left(\bm{\hat{\ddot{q}}}(i)-\bm{\ddot{q}}(i)\right)
\displaystyle\mathrm{s.t.}\quad\bm{\hat{\ddot{q}}}(i)=\left(\frac{\partial^{2}\mathcal{L}(\bm{q}(i),\bm{\dot{q}}(i);\bm{\gamma})}{\partial\bm{\dot{q}}\partial\bm{\dot{q}}}\right)^{-1}\Big[\bm{u}(i)+\frac{\partial\mathcal{L}(\bm{q}(i),\bm{\dot{q}}(i);\bm{\gamma})}{\partial\bm{q}}-\frac{\partial^{2}\mathcal{L}(\bm{q}(i),\bm{\dot{q}}(i);\bm{\gamma})}{\partial\bm{q}\partial\bm{\dot{q}}}\bm{\dot{q}}(i)\Big],

(4)

where $\bm{Q}=\bm{Q}^{\top}>0$ is a weight matrix, and the Jacobian and Hessian matrices of $\mathcal{L}$ are computed by automatic differentiation [33]. Solving (4) yields the optimal parameters $\bm{\gamma^{*}}$. Automatic differentiation then gives the approximations $\bm{\hat{M}}=\frac{\partial^{2}\mathcal{L}(\bm{q},\bm{\dot{q}};\bm{\gamma^{*}})}{\partial\bm{\dot{q}}\partial\bm{\dot{q}}}$ of $\bm{M}$, $\bm{\hat{C}}=\bm{\dot{\hat{M}}}(\bm{q})-\frac{1}{2}\bm{\dot{\hat{M}}}(\bm{q})^{\top}$ of $\bm{C}$, and $\bm{\hat{G}}=-\frac{\partial\mathcal{L}(\bm{q},\bm{\dot{q}};\bm{\gamma^{*}})}{\partial\bm{q}}+\frac{1}{2}\bm{\dot{\hat{M}}}(\bm{q})^{\top}\bm{\dot{q}}$ of $\bm{G}$.
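The sketch below evaluates the constraint in (4) and the estimate $\bm{\hat{M}}$ for a single sample. To stay self-contained it makes two loud assumptions: a closed-form pendulum Lagrangian stands in for the trained network $\mathcal{L}(\bm{q},\bm{\dot{q}};\bm{\gamma^{*}})$, and central finite differences stand in for automatic differentiation [33]. The helper names `grad`, `jac` and `lnn_accel` are hypothetical.

```python
import numpy as np

def grad(f, x, eps=1e-4):
    """Central-difference gradient of a scalar function f at x (stand-in for autodiff)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        out[j] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return out

def jac(F, x, eps=1e-4):
    """Central-difference Jacobian of a vector function F at x; entry [i, j] = dF_i/dx_j."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((F(x + e) - F(x - e)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def lnn_accel(L, q, qd, u):
    """Predicted acceleration from the constraint in (4), plus the inertia estimate M_hat."""
    dL_dqd = lambda p, v: grad(lambda w: L(p, w), v)   # dL/d(qdot) as a function of (q, qdot)
    M_hat = jac(lambda v: dL_dqd(q, v), qd)           # d2L / d(qdot) d(qdot)
    mixed = jac(lambda p: dL_dqd(p, qd), q)           # [i, j] = d/dq_j (dL/dqdot_i)
    dL_dq = grad(lambda p: L(p, qd), q)
    qdd_hat = np.linalg.solve(M_hat, u + dL_dq - mixed @ qd)
    return qdd_hat, M_hat

# Stand-in for a trained network: pendulum Lagrangian T - V (hypothetical parameter values).
m, l, g = 1.0, 0.5, 9.81

def L_pendulum(q, qd):
    return 0.5 * m * l**2 * qd[0] ** 2 + m * g * l * np.cos(q[0])

qdd_hat, M_hat = lnn_accel(L_pendulum, np.array([0.5]), np.array([0.2]), np.array([1.0]))
print(qdd_hat, M_hat)
```

For this Lagrangian the predicted acceleration should match $(u-mgl\sin q)/(ml^{2})$ and $\bm{\hat{M}}$ should equal $ml^{2}$, which gives a quick consistency check of the sign conventions in (4).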

To solve the optimization problem (4) effectively, a special initialization strategy is necessary [31]; it depends on the depth and width of the LNN:

\displaystyle\nu=\frac{1}{\sqrt{n}}\begin{cases}2.2, & \text{first layer},\\ 0.58\,i, & \text{hidden layer }i\in\{1,2,\ldots\},\\ n, & \text{output layer},\end{cases}

(5)

where $i$ is the index of the hidden layer and $n$ is the number of neurons in the layer. The parameters of each layer are then initialized from $\mathcal{N}(0,\nu^{2})$, where $\mathcal{N}$ denotes a Gaussian distribution.
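A NumPy sketch of the initialization (5). It interprets $n$ as the number of output neurons of the layer being initialized and applies the same $\mathcal{N}(0,\nu^{2})$ draw to weights and biases; both are our reading of the text rather than confirmed details, and the function names are hypothetical.

```python
import numpy as np

def lnn_init_std(kind, n, i=None):
    """Standard deviation nu in (5) for a layer with n neurons.

    kind: "first", "hidden" (with hidden-layer index i >= 1), or "output".
    """
    if kind == "first":
        factor = 2.2
    elif kind == "hidden":
        if i is None or i < 1:
            raise ValueError("hidden layers need an index i >= 1")
        factor = 0.58 * i
    elif kind == "output":
        factor = n
    else:
        raise ValueError(f"unknown layer kind: {kind}")
    return factor / np.sqrt(n)

def init_layer(rng, n_out, n_in, nu):
    """Draw a weight matrix and a bias vector from N(0, nu^2)."""
    return rng.normal(0.0, nu, (n_out, n_in)), rng.normal(0.0, nu, n_out)

# Example: first layer of an LNN with a 4-dimensional input (q, qdot) and 128 neurons.
rng = np.random.default_rng(0)
W, b = init_layer(rng, n_out=128, n_in=4, nu=lnn_init_std("first", 128))
print(W.std(), lnn_init_std("first", 128))
```

The deeper a hidden layer is, the larger its initial standard deviation becomes, while the output-layer scale $n/\sqrt{n}=\sqrt{n}$ grows with the output width.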

II-D Input Convex Neural Network

Input Convex Neural Network (ICNN) is a feedforward neural network architecture with constraints on the neural network parameters to ensure that the output of the neural network is convex with respect to all inputs (Fully Input Convex Neural Networks or FICNN) or with respect to a subset of inputs (Partially Input Convex Neural Network or PICNN).

FICNN uses the following architecture for $i=0,\ldots,k-1$

\displaystyle\bm{y_{i+1}}=\sigma_{i}\left(\bm{W_{i}^{(y)}}\bm{y_{i}}+\bm{W_{i}^{(x)}}\bm{x}+\bm{b_{i}}\right),\qquad f(\bm{x};\bm{\theta})=\bm{y_{k}},

(6)

where $\sigma_{i}$ represents layer activation functions, $\bm{x}$ is the FICNN input, $\bm{y_{0}}=0,\bm{W_{0}^{(y)}}=0$ , $\bm{\theta}=\{\bm{W_{1:k-1}^{(y)}},\bm{W_{0:k-1}^{(x)}},\bm{b_{0:k-1}}\}$ are network parameters. One can show that $f(\bm{x};\bm{\theta})$ is convex in $\bm{x}$ if $\bm{W_{1:k-1}^{(y)}}$ are nonnegative and the activation functions $\sigma_{i}$ are convex and non-decreasing [31].
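The following NumPy sketch implements (6) with softplus activations (convex and non-decreasing) and enforces $\bm{W_{1:k-1}^{(y)}}\geq 0$ by passing free parameters through a ReLU. The layer widths and random parameters are illustrative assumptions; the last lines spot-check convexity numerically via $f(t\bm{x}_1+(1-t)\bm{x}_2)\leq tf(\bm{x}_1)+(1-t)f(\bm{x}_2)$.

```python
import numpy as np

def softplus(s):
    """log(1 + exp(s)), computed stably; convex and non-decreasing."""
    return np.logaddexp(0.0, s)

def ficnn_forward(x, Wx, Wy_free, b, sigma=softplus):
    """FICNN (6) with y_0 = 0 and W_0^(y) = 0; W_i^(y) = ReLU(Wy_free[i-1]) >= 0 for i >= 1."""
    y = sigma(Wx[0] @ x + b[0])                  # layer 0: the y-path term vanishes
    for i in range(1, len(Wx)):
        Wy = np.maximum(Wy_free[i - 1], 0.0)     # nonnegative W_i^(y)
        y = sigma(Wy @ y + Wx[i] @ x + b[i])
    return y

rng = np.random.default_rng(1)
sizes = [8, 8, 1]   # widths of y_1, y_2, y_3 (illustrative)
n_in = 2
Wx = [rng.normal(size=(n, n_in)) for n in sizes]
b = [rng.normal(size=n) for n in sizes]
Wy_free = [rng.normal(size=(n_out, n_prev)) for n_prev, n_out in zip(sizes[:-1], sizes[1:])]

def convexity_gap(x1, x2, t):
    """f(t x1 + (1-t) x2) - [t f(x1) + (1-t) f(x2)]; should be <= 0 for a convex f."""
    f = lambda x: ficnn_forward(x, Wx, Wy_free, b)[0]
    return f(t * x1 + (1 - t) * x2) - (t * f(x1) + (1 - t) * f(x2))

gaps = [convexity_gap(rng.normal(size=2), rng.normal(size=2), rng.uniform()) for _ in range(200)]
print(max(gaps))  # expected to be <= 0 up to round-off
```

Replacing the ReLU on `Wy_free` by the identity typically breaks the check, which illustrates why the nonnegativity constraint matters.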

PICNN uses the following architecture for $i=0,\ldots,k-1$

\displaystyle\bm{v_{i+1}}=\widetilde{\sigma}_{i}\left(\bm{\widetilde{W}_{i}}\bm{v_{i}}+\bm{\widetilde{b}_{i}}\right),
\displaystyle\bm{y_{i+1}}=\sigma_{i}\Big(\bm{W_{i}^{(y)}}\left(\bm{y_{i}}\circ(\bm{W^{(yv)}_{i}}\bm{v_{i}}+\bm{b^{(y)}_{i}})\right)+\bm{W_{i}^{(x)}}\left(\bm{x}\circ(\bm{W^{(xv)}_{i}}\bm{v_{i}}+\bm{b^{(x)}_{i}})\right)+\bm{W_{i}^{(v)}}\bm{v_{i}}+\bm{b_{i}}\Big),
\displaystyle f(\bm{\widetilde{x}},\bm{x};\bm{\theta})=\bm{y_{k}},

(7)

where $\sigma_{i}$ and $\widetilde{\sigma}_{i}$ are layer activation functions, $\bm{y_{0}}=0$, $\bm{v_{0}}=\bm{\widetilde{x}}$, $\bm{W_{0}^{(y)}}=0$, $\bm{\theta}=\{\bm{W_{1:k-1}^{(y)}},\bm{W_{1:k-1}^{(yv)}},\bm{W_{0:k-1}^{(x)}},\bm{W_{0:k-1}^{(xv)}},\bm{W_{0:k-1}^{(v)}},\bm{\widetilde{W}_{0:k-1}},\bm{b^{(y)}_{1:k-1}},\bm{b_{0:k-1}},\bm{b^{(x)}_{0:k-1}},\bm{\widetilde{b}_{0:k-1}}\}$ are the network parameters, and $\circ$ denotes the Hadamard product. One can show that $f(\bm{\widetilde{x}},\bm{x};\bm{\theta})$ is convex in $\bm{x}$ if $\bm{W_{1:k-1}^{(y)}}$ are nonnegative and the activation functions $\sigma_{i}$ are convex and non-decreasing [31]. Note that ICNNs guarantee only convexity, not strong convexity.
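A NumPy sketch of the PICNN forward pass (7), assuming softplus for $\sigma_i$ and tanh for $\widetilde{\sigma}_i$. A sufficient condition for convexity in $\bm{x}$ is that the effective weights $\bm{W_{i}^{(y)}}\operatorname{diag}(\bm{W^{(yv)}_{i}}\bm{v_{i}}+\bm{b^{(y)}_{i}})$ are nonnegative, so besides clipping $\bm{W_{i}^{(y)}}$ this sketch also clips the multiplier $\bm{W^{(yv)}_{i}}\bm{v_{i}}+\bm{b^{(y)}_{i}}$ with a ReLU; that extra clipping is our assumption and is not stated in the text above. Parameter names and widths are hypothetical.

```python
import numpy as np

def softplus(s):
    """log(1 + exp(s)); convex and non-decreasing."""
    return np.logaddexp(0.0, s)

def make_params(rng, n_v, n_x, hv=8, hy=8, k=3):
    """Random PICNN parameters; dv / dy hold the widths of v_0..v_k and y_0..y_k."""
    dv = [n_v] + [hv] * k
    dy = [0] + [hy] * (k - 1) + [1]
    r = lambda *shape: rng.normal(0.0, 0.5, shape)
    p = {key: [] for key in ("Wx", "Wxv", "bx", "Wv", "b", "Wt", "bt", "Wyv", "by", "Wy_free")}
    for i in range(k):
        p["Wx"].append(r(dy[i + 1], n_x)); p["Wxv"].append(r(n_x, dv[i])); p["bx"].append(r(n_x))
        p["Wv"].append(r(dy[i + 1], dv[i])); p["b"].append(r(dy[i + 1]))
        p["Wt"].append(r(dv[i + 1], dv[i])); p["bt"].append(r(dv[i + 1]))
        if i > 0:  # y_0 = 0 and W_0^(y) = 0, so the y-path parameters start at layer 1
            p["Wyv"].append(r(dy[i], dv[i])); p["by"].append(r(dy[i]))
            p["Wy_free"].append(r(dy[i + 1], dy[i]))
    return p

def picnn_forward(x_tilde, x, p, sigma=softplus, sigma_v=np.tanh):
    """PICNN (7); the output is convex in x for every fixed x_tilde."""
    v, y = np.asarray(x_tilde, dtype=float), None
    for i in range(len(p["Wx"])):
        pre = p["Wx"][i] @ (x * (p["Wxv"][i] @ v + p["bx"][i])) + p["Wv"][i] @ v + p["b"][i]
        if i > 0:
            mult = np.maximum(p["Wyv"][i - 1] @ v + p["by"][i - 1], 0.0)  # assumption: clipped to >= 0
            Wy = np.maximum(p["Wy_free"][i - 1], 0.0)                     # W_i^(y) >= 0
            pre = pre + Wy @ (y * mult)
        y = sigma(pre)                                                   # y_{i+1}
        v = sigma_v(p["Wt"][i] @ v + p["bt"][i])                         # v_{i+1}
    return y

rng = np.random.default_rng(4)
p = make_params(rng, n_v=2, n_x=2)
x_tilde = rng.normal(size=2)                     # e.g. q, held fixed
f = lambda x: picnn_forward(x_tilde, x, p)[0]   # e.g. convex in qdot
print(f(np.zeros(2)))
```

Holding $\bm{\widetilde{x}}$ fixed (for instance $\bm{q}$) and sampling pairs of $\bm{x}$ (for instance $\bm{\dot{q}}$) lets one check the convexity inequality numerically.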

III Main Results

In this section, we propose a neural backstepping controller (NBS controller) for the trajectory tracking control of (2) under the assumption that model information $\bm{M},\bm{C},\bm{G}$ is available. The design of the tracking controller is given in Section III-A. Its performance is analyzed in Section III-B and the final learning optimization problem is formulated in Section III-C.

III-A Neural Backstepping Tracking Control Design

Suppose that the reference trajectory $\bm{q^{d}}(t)$ is twice continuously differentiable, since the controller below uses $\bm{\ddot{q}^{d}}$ through $\bm{\dot{\phi}}$. We aim to design a controller such that the state $\bm{q}(t)$ tracks $\bm{q^{d}}(t)$. We define the following errors

\displaystyle\bm{z_{1}}=\bm{q}-\bm{q^{d}},\bm{z_{2}}=\bm{\dot{q}}-\bm{\phi},

(8)

where $\bm{\phi}$ is a virtual signal to be designed. The controller $\bm{u}$ and the virtual signal $\bm{\phi}$ are defined as

\displaystyle\bm{u}=\bm{G}(\bm{q})+\bm{M}(\bm{q})\bm{\dot{\phi}}+\bm{C}(\bm{q},\bm{\dot{q}})\bm{\phi}-\frac{\partial\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})}{\partial\bm{z_{1}}}-\bm{D}(\bm{z_{2}};\bm{\theta_{2}},m)\bm{z_{2}},
\displaystyle\bm{\phi}=\bm{\dot{q}^{d}}-\frac{\partial\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})}{\partial\bm{z_{1}}}.

(9)

In the above control design, $\bm{\Phi}$ is constructed as

\displaystyle\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})=\bm{\psi}(\bm{z_{1}};\bm{\theta_{1}})+\bm{z_{1}}^{\top}\bm{S}\bm{z_{1}},

(10)

where $\bm{S}$ is a positive definite matrix and $\bm{\psi}(\bm{z_{1}};\bm{\theta_{1}})$ is an FICNN (6) with input $\bm{z_{1}}$ and parameters $\bm{\theta_{1}}$. The weights $\bm{W_{1:k-1}^{(y)}}$ of $\bm{\psi}$ are obtained by passing free (unconstrained) parameters through a ReLU function, so they are nonnegative. Moreover, all biases $\bm{b_{i}}$ of $\bm{\psi}$ are set to $0$, which ensures $\bm{\psi}(0)=0$. Furthermore, selecting sReLU [34] as the activation function of $\bm{\psi}$ ensures that $\bm{\psi}(\bm{z_{1}})\geq 0$. Consequently, $\bm{\Phi}$, being the sum of the convex function $\bm{\psi}$ and the strongly convex quadratic $\bm{z_{1}}^{\top}\bm{S}\bm{z_{1}}$, is strongly convex in $\bm{z_{1}}$; since $\bm{\Phi}(0)=0$ and $\bm{\Phi}(\bm{z_{1}})\geq\bm{z_{1}}^{\top}\bm{S}\bm{z_{1}}>0$ for $\bm{z_{1}}\neq 0$, its unique minimum is at $\bm{z_{1}}=0$.
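A NumPy sketch of the construction (10). The exact sReLU of [34] is not reproduced here; we assume a smoothed ReLU that is 0 for $s\leq 0$, quadratic on $(0,\delta)$ and linear beyond, which is convex, non-decreasing, nonnegative and zero at the origin. The widths, $\bm{S}$ and the random parameters are illustrative, and `make_phi` is a hypothetical name.

```python
import numpy as np

def srelu(s, delta=0.1):
    """Assumed smoothed ReLU: 0 for s <= 0, s^2/(2 delta) on (0, delta), s - delta/2 for s >= delta."""
    s = np.asarray(s, dtype=float)
    return np.where(s <= 0.0, 0.0, np.where(s < delta, s**2 / (2.0 * delta), s - delta / 2.0))

def make_phi(Wx, Wy_free, S):
    """Phi(z1) = psi(z1) + z1^T S z1, with psi a bias-free FICNN using sReLU."""
    Wy = [np.maximum(W, 0.0) for W in Wy_free]       # W_i^(y) = ReLU(free parameters) >= 0

    def psi(z):
        y = srelu(Wx[0] @ z)                          # b_0 = 0
        for i in range(1, len(Wx)):
            y = srelu(Wy[i - 1] @ y + Wx[i] @ z)      # b_i = 0
        return float(y[0])

    return lambda z: psi(z) + float(z @ S @ z)

rng = np.random.default_rng(2)
n, widths = 2, [8, 8, 1]
Wx = [rng.normal(size=(w, n)) for w in widths]
Wy_free = [rng.normal(size=(w_out, w_in)) for w_in, w_out in zip(widths[:-1], widths[1:])]
S = np.diag([1.0, 2.0])
Phi = make_phi(Wx, Wy_free, S)
print(Phi(np.zeros(2)), Phi(np.array([0.5, -0.3])))
```

The checks $\bm{\Phi}(0)=0$ and $\bm{\Phi}(\bm{z_{1}})\geq\bm{z_{1}}^{\top}\bm{S}\bm{z_{1}}>0$ for $\bm{z_{1}}\neq 0$ mirror the argument above; in practice $\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}$ would come from automatic differentiation.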

In (9), $\bm{D}(\bm{z_{2}};\bm{\theta_{2}},m)$ is a positive definite matrix constructed from deep neural networks with parameters $\bm{\theta_{2}}$, hyperparameter $m$, and input $\bm{z_{2}}$. We first use two independent FCNNs, each with $\tanh(\cdot)$ activation functions, to generate the diagonal and the off-diagonal elements of a lower triangular matrix $\bm{T}$. The outputs for the diagonal elements are passed through a ReLU function so that they are nonnegative, and a positive number $m$ is then added to each diagonal element. Finally, $\bm{D}(\bm{z_{2}};\bm{\theta_{2}},m)=\bm{T}^{\top}\bm{T}$. Since $\bm{T}$ is lower triangular with diagonal elements no smaller than $m>0$, it is nonsingular, and hence $\bm{D}$ is guaranteed to be positive definite.
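A NumPy sketch of the construction of $\bm{D}$: two small tanh FCNNs (random and untrained, with assumed sizes) produce the diagonal and strictly lower-triangular entries of $\bm{T}$, the diagonal is passed through a ReLU and shifted by $m$, and $\bm{D}=\bm{T}^{\top}\bm{T}$. The helper names are hypothetical.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (weight, bias) pairs for an FCNN with the given layer sizes."""
    return [(rng.normal(0.0, 0.5, (o, i)), rng.normal(0.0, 0.5, o)) for i, o in zip(sizes[:-1], sizes[1:])]

def mlp_tanh(params, x):
    """FCNN with tanh hidden activations and a linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(W @ h + b)
    W, b = params[-1]
    return W @ h + b

def make_D(z2, diag_params, offdiag_params, m):
    """D = T^T T, with T lower triangular and diag(T) = ReLU(net(z2)) + m."""
    n = z2.size
    T = np.zeros((n, n))
    rows, cols = np.tril_indices(n, k=-1)
    T[rows, cols] = mlp_tanh(offdiag_params, z2)                            # strictly lower part
    T[np.diag_indices(n)] = np.maximum(mlp_tanh(diag_params, z2), 0.0) + m  # diagonal >= m > 0
    return T.T @ T

rng = np.random.default_rng(3)
n, m = 3, 0.1
diag_params = init_mlp(rng, [n, 16, n])
offdiag_params = init_mlp(rng, [n, 16, n * (n - 1) // 2])
eigs = [np.linalg.eigvalsh(make_D(rng.normal(size=n), diag_params, offdiag_params, m)).min()
        for _ in range(100)]
print(min(eigs))
```

Because every diagonal entry of $\bm{T}$ is at least $m$, the smallest eigenvalue of $\bm{D}$ stays positive for any network parameters, which the sampled check reflects.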

The structure of the NBS tracking controller is illustrated in Fig. 1. The gradient $\frac{\partial\bm{\Phi}(\bm{z_{1}})}{\partial\bm{z_{1}}}$ and the Hessian of $\bm{\Phi}$ needed to compute $\bm{\dot{\phi}}$ are obtained through automatic differentiation.
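Differentiating $\bm{\phi}$ in (9) with the chain rule and $\bm{\dot{z}_{1}}=\bm{\dot{q}}-\bm{\dot{q}^{d}}$ from (8) makes explicit what automatic differentiation has to supply (a worked step added for clarity, not an additional design choice):

\displaystyle\bm{\dot{\phi}}=\bm{\ddot{q}^{d}}-\frac{\partial^{2}\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})}{\partial\bm{z_{1}}^{2}}\bm{\dot{z}_{1}}=\bm{\ddot{q}^{d}}-\frac{\partial^{2}\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})}{\partial\bm{z_{1}}^{2}}\left(\bm{\dot{q}}-\bm{\dot{q}^{d}}\right).

Thus $\bm{\dot{\phi}}$ requires the Hessian of $\bm{\Phi}$ and the reference acceleration $\bm{\ddot{q}^{d}}$, but not the measured acceleration $\bm{\ddot{q}}$, so (9) can be evaluated from $\bm{q}$ and $\bm{\dot{q}}$ alone.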

Figure 1: The structure of the proposed NBS controller.

III-B Stability of the NBS Controller

In the following theorems, we analyze the performance of the NBS tracking controller. First, we consider the case without disturbances.

Theorem 1.

Consider the system (2) without the disturbance $\bm{\tau^{d}}$. If the controller is designed as (9), where $\bm{\Phi}(\bm{z_{1}})$ is strongly convex in $\bm{z_{1}}$ with a unique minimum at $\bm{z_{1}}=\bm{0}$ satisfying $\bm{\Phi}(0)=0$, and $\bm{D}(\bm{z_{2}})$ is positive definite, then the closed-loop system is globally asymptotically stable at $\bm{z_{1}}=\bm{0},\bm{z_{2}}=\bm{0}$.

Proof.

To simplify notation, we omit the DNN parameters $\bm{\theta_{1}},\bm{\theta_{2}}$ and the hyperparameters $\bm{S},m$ in (9). According to (8) and (9), the dynamics (2) can be rewritten as

\displaystyle\bm{M}\bm{\dot{z}_{2}}+\bm{C}\bm{z_{2}}+\bm{G}=\bm{M}\bm{\ddot{q}}-\bm{M}\bm{\dot{\phi}}+\bm{C}\bm{\dot{q}}-\bm{C}\bm{\phi}+\bm{G}=\bm{u}-\bm{M}\bm{\dot{\phi}}-\bm{C}\bm{\phi}=\bm{G}-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\bm{D}(\bm{z_{2}})\bm{z_{2}}.

(11)

Consider the following candidate Lyapunov function

\displaystyle\bm{V}(\bm{z_{1}},\bm{z_{2}})=\bm{\Phi}(\bm{z_{1}})+\frac{1}{2}\bm{z_{2}}^{\top}\bm{M}\bm{z_{2}}.

Since $\bm{\Phi}(\bm{z_{1}})$ is strongly convex in $\bm{z_{1}}$ with its unique minimum $\bm{\Phi}(0)=0$ at $\bm{z_{1}}=0$, Property 1 implies that $\bm{V}(0,0)=0$, $\bm{V}(\bm{z_{1}},\bm{z_{2}})>0$ for all $(\bm{z_{1}},\bm{z_{2}})\neq(0,0)$, and $\bm{V}(\bm{z_{1}},\bm{z_{2}})\to\infty$ as $\|\bm{z_{1}}\|+\|\bm{z_{2}}\|\to\infty$. The time derivative of $\bm{V}$ is

\displaystyle\bm{\dot{V}}=\bm{z_{2}}^{\top}\bm{M}\bm{\dot{z}_{2}}+\frac{1}{2}\bm{z_{2}}^{\top}\bm{\dot{M}}\bm{z_{2}}+\bm{\dot{z}_{1}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}
\displaystyle\quad=\bm{z_{2}}^{\top}\left(\bm{u}-\bm{M}\bm{\dot{\phi}}-\bm{C}\bm{\phi}-\bm{C}\bm{z_{2}}-\bm{G}+\frac{1}{2}\bm{\dot{M}}\bm{z_{2}}\right)+\bm{\dot{z}_{1}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}
\displaystyle\quad=\bm{z_{2}}^{\top}\left(-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\bm{D}\bm{z_{2}}\right)+\bm{\dot{z}_{1}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\frac{1}{2}\bm{z_{2}}^{\top}(\bm{\dot{M}}-2\bm{C})\bm{z_{2}}.

From the definitions in (8) and (9), we have $\bm{z_{2}}=\bm{\dot{q}}-\bm{\dot{q}^{d}}+\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}=\bm{\dot{z}_{1}}+\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}$. Substituting this expression, and using Property 2 and the positive definiteness of $\bm{D}$, we obtain

\displaystyle\bm{\dot{V}}=-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\bm{z_{2}}^{\top}\bm{D}\bm{z_{2}}<0,\quad\forall(\bm{z_{1}},\bm{z_{2}})\neq(\bm{0},\bm{0}).

Therefore, $\bm{z_{1}},\bm{z_{2}}$ converge to the set $\{(\bm{z_{1}},\bm{z_{2}})\,|\,\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\bm{z_{2}}^{\top}\bm{D}\bm{z_{2}}=0\}$. Since $\bm{\Phi}(\bm{z_{1}})$ is strongly convex with its unique minimum at $\bm{z_{1}}=\bm{0}$ (so that $\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}=0$ only at $\bm{z_{1}}=0$) and $\bm{D}$ is positive definite, $\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\bm{z_{2}}^{\top}\bm{D}\bm{z_{2}}=0$ if and only if $\bm{z_{1}}=0$ and $\bm{z_{2}}=0$. Therefore, $\lim_{t\rightarrow\infty}\bm{z_{1}}=\bm{0}$ and $\lim_{t\rightarrow\infty}\bm{z_{2}}=\bm{0}$. ∎

Theorem 1 establishes the stability of the closed-loop system under the NBS tracking controller in the absence of disturbances. Stability holds for all compatible neural network parameters: as long as the neural networks $\bm{\Phi}$ and $\bm{D}$ satisfy the conditions of Theorem 1, the closed loop is stable irrespective of the specific parameter values. In the following theorem, we show that the NBS tracking controller achieves a bounded tracking error in the presence of bounded disturbances.

Theorem 2.

Consider the system (2) with a bounded disturbance $\|\bm{\tau^{d}}\|^{2}\leq d$. If the controller is designed as (9) with $\bm{D}(\bm{z_{2}})\geq\frac{1}{2}\bm{I}$, then the tracking error converges to the set $\{\bm{z_{1}}\,|\,\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\leq\frac{1}{2}d\}$.

Proof.

Consider the candidate Lyapunov function

\displaystyle V(\bm{z_{1}},\bm{z_{2}})=\bm{\Phi}(\bm{z_{1}})+\frac{1}{2}\bm{z_{2}}^{\top}\bm{M}\bm{z_{2}}.

The time derivative of $V$ is

\displaystyle\bm{\dot{V}}=-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\bm{z_{2}}^{\top}\bm{D}(\bm{z_{2}})\bm{z_{2}}+\bm{z_{2}}^{\top}\bm{\tau^{d}}
\displaystyle\quad\leq-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\bm{z_{2}}^{\top}\bm{D}(\bm{z_{2}})\bm{z_{2}}+\frac{1}{2}\bm{z_{2}}^{\top}\bm{z_{2}}+\frac{1}{2}\|\bm{\tau^{d}}\|^{2}.

The inequality follows from Young's inequality $\bm{z_{2}}^{\top}\bm{\tau^{d}}\leq\frac{1}{2}\|\bm{z_{2}}\|^{2}+\frac{1}{2}\|\bm{\tau^{d}}\|^{2}$. Since $\bm{D}(\bm{z_{2}})\geq\frac{1}{2}\bm{I}$ and $\|\bm{\tau^{d}}\|^{2}\leq d$, we have $\bm{\dot{V}}\leq-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\frac{1}{2}d$. Then $\bm{z_{1}}$ converges to the set $\{\bm{z_{1}}\,|\,\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\leq\frac{1}{2}d\}$. ∎

The condition $\bm{D}(\bm{z_{2}})\geq\frac{1}{2}\bm{I}$ can be met by adjusting the hyperparameter $m$. In Theorem 2, the tracking error bound is expressed through the gradient of $\bm{\Phi}$. Next, we provide an explicit tracking error bound under an additional condition on the Hessian of $\bm{\Phi}$ at $\bm{z_{1}}=\bm{0}$.

Theorem 3.

Consider the system (2) with a bounded disturbance $\|\bm{\tau^{d}}\|^{2}\leq d$. If $\bm{D}(\bm{z_{2}})\geq\frac{1}{2}\bm{I}$ and $\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=0}\geq\alpha\bm{I}$, then the tracking error under the controller (9) converges to the set $\{\bm{z_{1}}\,|\,\|\bm{z_{1}}\|^{2}\leq\frac{1}{2}\frac{1}{\alpha^{2}}d\}$.

Proof.

The Taylor expansion of $\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}$ at $\bm{z_{1}}=\bm{0}$ is

\displaystyle\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}=\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\bm{z_{1}}+\delta,

(12)

where $\delta$ contains the higher-order terms. Neglecting $\delta$, we approximate $\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}$ by $\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\bm{z_{1}}$. Then, from $\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\leq\frac{1}{2}d$ (Theorem 2), we have $\bm{z_{1}}^{\top}\left(\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\right)^{2}\bm{z_{1}}\leq\frac{1}{2}d$. Since $\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\geq\alpha\bm{I}$, it follows that $\alpha^{2}\|\bm{z_{1}}\|^{2}\leq\frac{1}{2}d$, that is, $\|\bm{z_{1}}\|^{2}\leq\frac{1}{2}\frac{1}{\alpha^{2}}d$. ∎

These analyses imply that constraining the Hessian matrix of $\bm{\Phi}$ at $\bm{z_{1}}=\bm{0}$ improves the tracking performance under disturbances. This can be achieved by adding the regularizer $\text{ReLU}\left(\lambda_{\max}\left(\alpha\bm{I}-\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\right)\right)$ during training, or by choosing the hyperparameter $\bm{S}$ such that $\bm{S}\geq\alpha\bm{I}$ in $\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})$.
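The regularizer and the bound of Theorem 3 are cheap to evaluate once $\bm{H}_0=\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}$ is available (from automatic differentiation in practice). A NumPy sketch with hypothetical function names:

```python
import numpy as np

def hessian_regularizer(H0, alpha):
    """ReLU(lambda_max(alpha I - H0)); it is zero exactly when H0 >= alpha I."""
    H0 = 0.5 * (H0 + H0.T)                                        # symmetrize against round-off
    lam_max = np.linalg.eigvalsh(alpha * np.eye(H0.shape[0]) - H0)[-1]  # eigvalsh sorts ascending
    return max(0.0, float(lam_max))

def theorem3_radius(d, alpha):
    """Radius of the set {z1 : ||z1||^2 <= d / (2 alpha^2)} from Theorem 3."""
    return np.sqrt(d / (2.0 * alpha**2))

# Example: Phi = z1^T S z1 with S = diag(2, 3), so H0 = 2 S = diag(4, 6).
H0 = 2.0 * np.diag([2.0, 3.0])
print(hessian_regularizer(H0, alpha=5.0), hessian_regularizer(H0, alpha=4.0))
print(theorem3_radius(d=0.5, alpha=4.0))
```

Here the penalty should be 1 for $\alpha=5$ (the eigenvalue 4 falls short by 1) and vanish for $\alpha=4$; with $d=0.5$ and $\alpha=4$, Theorem 3 gives $\|\bm{z_{1}}\|\leq 0.125$.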

III-C Learning Optimization Formulation

Since the controller (9) is stable for all compatible DNN parameters, its performance can be further improved by optimizing these parameters. We solve the following optimization problem:

\displaystyle\min_{\bm{\theta_{1}},\bm{\theta_{2}}}\quad\int_{0}^{T}l_{t}(\bm{z_{1}},\bm{u})\,dt+\text{ReLU}\left(\lambda_{\max}\left(\alpha\bm{I}-\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\right)\right)
\displaystyle\mathrm{s.t.}\quad\text{dynamics }\bm{M}(\bm{q})\bm{\ddot{q}}+\bm{C}(\bm{q},\bm{\dot{q}})\bm{\dot{q}}+\bm{G}(\bm{q})=\bm{u},
\displaystyle\qquad\ \text{errors (8)},\quad\text{controller (9)},
\displaystyle\qquad\ \text{initial state }\bm{q}(0),\bm{\dot{q}}(0),

(13)

where $l_{t}$ denotes the stage cost at time $t$ and $T$ is the time horizon. This problem can be solved by first discretizing the dynamics and the cost function and then numerically solving the resulting discrete-time problem.
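To make the discretization concrete, the sketch below evaluates the discretized tracking cost $\sum_k l_{t_k}\Delta t$ for one rollout of a hypothetical 1-DOF pendulum with a known model ($\bm{M}=ml^2$, $\bm{C}=0$, $\bm{G}=mgl\sin q$). To keep it short we take $\bm{\psi}\equiv 0$ (so $\bm{\Phi}=s z_1^{2}$) and a constant $\bm{D}=d_0$, which satisfy the conditions of Theorem 1; the reference $q^d=\sin t$, the stage cost $l_t=z_1^2+r u^2$, the semi-implicit Euler integrator and all numerical values are our assumptions, and the regularizer term of (13) is omitted. Training would additionally differentiate this cost with respect to $\bm{\theta_{1}},\bm{\theta_{2}}$ (for instance with an automatic differentiation framework), which is not shown.

```python
import numpy as np

# Hypothetical 1-DOF pendulum: M = m l^2 (constant, hence C = 0), G = m g l sin(q).
m, l, g = 1.0, 1.0, 9.81
M = m * l**2
s, d0 = 1.0, 2.0   # Phi(z1) = s z1^2 (psi = 0) and D = d0: assumed, untrained values
r = 1e-3           # control-effort weight in the stage cost (assumption)

def reference(t):
    """q^d, dq^d/dt and d^2q^d/dt^2 for q^d(t) = sin(t)."""
    return np.sin(t), np.cos(t), -np.sin(t)

def nbs_control(q, qd, t):
    """Controller (9) for the pendulum, with dPhi/dz1 = 2 s z1; returns (u, z1)."""
    qr, qr_d, qr_dd = reference(t)
    z1 = q - qr
    dphi = 2.0 * s * z1
    phi = qr_d - dphi
    z2 = qd - phi
    phi_dot = qr_dd - 2.0 * s * (qd - qr_d)   # chain rule with dz1/dt = qdot - qdot^d
    u = m * g * l * np.sin(q) + M * phi_dot - dphi - d0 * z2
    return u, z1

def rollout_cost(q0, qd0, T=10.0, dt=1e-3):
    """Semi-implicit Euler rollout; accumulates the discretized cost sum_k l_t * dt."""
    q, qd, J, errs = q0, qd0, 0.0, []
    for k in range(int(round(T / dt))):
        u, z1 = nbs_control(q, qd, k * dt)
        J += (z1**2 + r * u**2) * dt            # stage cost l_t(z1, u)
        qdd = (u - m * g * l * np.sin(q)) / M   # pendulum dynamics
        qd = qd + dt * qdd
        q = q + dt * qd
        errs.append(abs(z1))
    return J, np.array(errs)

J, errs = rollout_cost(q0=0.5, qd0=0.0)
print(f"cost J = {J:.4f}, final |z1| = {errs[-1]:.2e}")
```

Because the model is known here, the controller cancels gravity exactly and the error decays at the rate set by $s$ and $d_0$; changing these two values changes $J$, which is the quantity an optimizer would tune through $\bm{\theta_{1}},\bm{\theta_{2}}$.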

IV LNNs based Model Approximation and Controller Design

In Section III, the inertia matrix $\bm{M}$, the Coriolis matrix $\bm{C}$, and the gravitational force vector $\bm{G}$ of the model are used to design the NBS tracking controller. However, these terms are difficult to obtain for complex systems. In this section, we use an LNN $\bm{\mathcal{L}}(\bm{q},\bm{\dot{q}};\bm{\gamma})$ to learn the Lagrangian of the system from data and then construct the approximated inertia matrix, Coriolis matrix, and gravitational force vector to design the controller. To obtain a better approximation of the Lagrangian function, we propose a modified LNN structure that guarantees that Properties 1 and 2 hold for the approximated inertia matrix and Coriolis matrix.

IV-A Modified LNN for Learning Dynamics

In LNNs, FCNNs are used to represent the Lagrangian function. As a result, the approximated inertia matrix $\bm{\hat{M}}=\frac{\partial^{2}\mathcal{L}(\bm{q},\bm{\dot{q}};\bm{\gamma^{*}})}{\partial\bm{\dot{q}}\partial\bm{\dot{q}}}$ and Coriolis matrix $\bm{\hat{C}}=\bm{\dot{\hat{M}}}(\bm{q})-\frac{1}{2}\bm{\dot{\hat{M}}}(\bm{q})^{\top}$ may not satisfy Properties 1 and 2. To solve this problem, we approximate the kinetic energy $T(\bm{q},\bm{\dot{q}})$ in $\mathcal{L}$ with an ICNN $\mathcal{L}_{T}(\bm{q},\bm{\dot{q}};\bm{\gamma_{1}})$ with parameters $\bm{\gamma_{1}}$ that is convex w.r.t. $\bm{\dot{q}}$ (i.e., a PICNN (7) with $\bm{\widetilde{x}}=\bm{q}$ and $\bm{x}=\bm{\dot{q}}$). Moreover, we use an FCNN $\mathcal{L}_{V}(\bm{q};\bm{\gamma_{2}})$ with parameters $\bm{\gamma_{2}}$ and input $\bm{q}$ to learn the potential energy $V(\bm{q})$ in $\mathcal{L}$. The Lagrangian neural network is then constructed as follows.

\displaystyle\mathcal{L}(\bm{q},\bm{\dot{q}};\bm{\gamma})=\mathcal{L}_{T}(\bm{q},\bm{\dot{q}};\bm{\gamma_{1}})-\mathcal{L}_{V}(\bm{q};\bm{\gamma_{2}}),

where $\bm{\gamma}=\{\bm{\gamma_{1}},\bm{\gamma_{2}}\}$ and softplus is chosen as the activation function. Since $\mathcal{L}_{V}$ does not depend on $\bm{\dot{q}}$, this design yields $\bm{\hat{M}}=\frac{\partial^{2}\mathcal{L}(\bm{q},\bm{\dot{q}};\bm{\gamma^{*}})}{\partial\bm{\dot{q}}\partial\bm{\dot{q}}}=\frac{\partial^{2}\mathcal{L}_{T}(\bm{q},\bm{\dot{q}};\bm{\gamma_{1}^{*}})}{\partial\bm{\dot{q}}\partial\bm{\dot{q}}}$, which is ensured to be positive definite, and $\bm{\dot{\hat{M}}}-2\bm{\hat{C}}$ is skew-symmetric.

Using LNNs to approximate the dynamic system model may introduce approximation errors. With $\bm{\hat{M}},\bm{\hat{C}},\bm{\hat{G}}$ , the system dynamics can be modeled as

\displaystyle\bm{\hat{M}}\bm{\ddot{q}}+\bm{\hat{C}}\bm{\dot{q}}+\bm{\hat{G}}+\delta(\bm{q},\bm{\dot{q}})=\bm{u}+\bm{\tau^{d}},

(14)

where $\delta(\bm{q},\bm{\dot{q}})=-\bm{\hat{M}}\bm{M}^{-1}\{\bm{u}+\bm{\tau^{d}}-\bm{C}\bm{\dot{q}}-\bm{G}\}+\{\bm{u}+\bm{\tau^{d}}-\bm{\hat{C}}\bm{\dot{q}}-\bm{\hat{G}}\}$ is the model mismatch.

Since $\delta$ is a function of $\bm{q}$ and $\bm{\dot{q}}$, we assume, motivated by a first-order Taylor expansion, that the uncertain term satisfies the inequality

\displaystyle\|\delta(\bm{q},\bm{\dot{q}})\|\leq a\|\bm{q}\|+b\|\bm{\dot{q}}\|+c,

(15)

where $a,b,c$ are positive constants.

IV-B Controller Design and Performance Analysis

With the approximate model parameters, the NBS tracking controller (9) is modified to

\displaystyle\bm{u}=\bm{\hat{G}}+\bm{\hat{M}}\bm{\dot{\phi}}+\bm{\hat{C}}\bm{\phi}-\frac{\partial\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})}{\partial\bm{z_{1}}}-\bm{D}(\bm{z_{2}};\bm{\theta_{2}},m)\bm{z_{2}},
\displaystyle\bm{\phi}=\bm{\dot{q}^{d}}-\frac{\partial\bm{\Phi}(\bm{z_{1}};\bm{\theta_{1}},\bm{S})}{\partial\bm{z_{1}}}.

(16)

In the following theorem, we demonstrate that the modified NBS tracking controller can achieve bounded tracking error in the presence of modeling uncertainties.

Theorem 4.

Consider the system (14) with a bounded disturbance $\|\bm{\tau^{d}}\|^{2}\leq d$, and suppose that the model uncertainty satisfies (15). If the controller is designed as (16) with $\bm{D}\geq(b+\frac{b^{2}}{2}+\frac{a}{2}+1)\bm{I}$, $\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=0}\geq\alpha\bm{I}$, and $\alpha>\sqrt{a}$, then the tracking error $\bm{z_{1}}$ converges to the set $\{\bm{z_{1}}\,|\,\|\bm{z_{1}}\|^{2}\leq\frac{k^{2}+d}{\alpha^{2}-a}\}$, where $k=\sup_{t\geq 0}\left(c+a\|\bm{q^{d}}(t)\|+b\|\bm{\dot{q}^{d}}(t)\|\right)$.
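A small helper (hypothetical name) that evaluates the quantities in Theorem 4. Since the suprema of $\|\bm{q^{d}}\|$ and $\|\bm{\dot{q}^{d}}\|$ are taken separately here, the computed $k$ can only overestimate the one in the theorem, which keeps the resulting bound conservative. The numerical values in the example are illustrative.

```python
import numpy as np

def theorem4_conditions(a, b, c, d, alpha, qd_sup, qd_dot_sup):
    """Return (k, lower bound required on lambda_min(D), bound on ||z1||^2) from Theorem 4."""
    if alpha**2 <= a:
        raise ValueError("Theorem 4 requires alpha > sqrt(a)")
    k = c + a * qd_sup + b * qd_dot_sup   # upper bound on sup_t (c + a||q^d|| + b||qdot^d||)
    d_min = b + b**2 / 2 + a / 2 + 1      # D >= d_min * I
    z1_sq_bound = (k**2 + d) / (alpha**2 - a)
    return k, d_min, z1_sq_bound

k, d_min, bound = theorem4_conditions(a=0.1, b=0.2, c=0.05, d=0.01, alpha=2.0,
                                      qd_sup=1.0, qd_dot_sup=1.0)
print(k, d_min, bound)
```

With these values the helper gives $k=0.35$, the requirement $\lambda_{\min}(\bm{D})\geq 1.27$, and $\|\bm{z_{1}}\|^{2}\leq 0.1325/3.9\approx 0.034$; increasing $\alpha$ shrinks the bound, while larger mismatch constants $a,b$ raise the required damping.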

Proof.

From (8) we have

\displaystyle\|\bm{q}\|\leq\|\bm{z_{1}}\|+\|\bm{q^{d}}\|,\quad\|\bm{\dot{q}}\|\leq\|\bm{z_{2}}\|+\|\bm{\dot{q}^{d}}\|+\left\|\frac{\partial\bm{\Phi}(\bm{z_{1}})}{\partial\bm{z_{1}}}\right\|.

According to (15), one can obtain $\|\delta(\bm{q},\bm{\dot{q}})\|\leq a\|\bm{z_{1}}\|+b\|\bm{z_{2}}\|+b\left\|\frac{\partial\bm{\Phi}(\bm{z_{1}})}{\partial\bm{z_{1}}}\right\|+c+a\|\bm{q^{d}}\|+b\|\bm{\dot{q}^{d}}\|$. Since the reference states $\bm{q^{d}},\bm{\dot{q}^{d}}$ are bounded, we can define $k=\sup_{t\geq 0}\left(c+a\|\bm{q^{d}}(t)\|+b\|\bm{\dot{q}^{d}}(t)\|\right)$. Then we have

$$\|\delta(\bm{q},\bm{\dot{q}})\|\leq a\|\bm{z_{1}}\|+b\|\bm{z_{2}}\|+b\left\|\frac{\partial\bm{\Phi}(\bm{z_{1}})}{\partial\bm{z_{1}}}\right\|+k.$$

Consider the following Lyapunov candidate

$$V(\bm{z_{1}},\bm{z_{2}})=\bm{\Phi}(\bm{z_{1}})+\frac{1}{2}\bm{z_{2}}^{\top}\bm{\hat{M}}\bm{z_{2}}.$$

The time derivative of $V$ is

$$\begin{aligned}
\dot{V}&=\bm{z_{2}}^{\top}\left(-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\bm{D}\bm{z_{2}}+\bm{\tau^{d}}-\delta\right)+\bm{\dot{z}_{1}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\\
&=-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\bm{z_{2}}^{\top}\bm{D}\bm{z_{2}}+\bm{z_{2}}^{\top}(\bm{\tau^{d}}-\delta)\\
&\leq-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}-\lambda_{\min}(\bm{D})\|\bm{z_{2}}\|^{2}+\|\bm{\tau^{d}}\|\|\bm{z_{2}}\|+b\|\bm{z_{2}}\|^{2}\\
&\quad+a\|\bm{z_{1}}\|\|\bm{z_{2}}\|+k\|\bm{z_{2}}\|+b\|\bm{z_{2}}\|\left\|\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\right\|\\
&\leq-\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\frac{1}{2}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\frac{a}{2}\|\bm{z_{1}}\|^{2}+\frac{k^{2}}{2}+\frac{1}{2}\|\bm{\tau^{d}}\|^{2}\\
&\quad-\left(\lambda_{\min}(\bm{D})-b-\frac{b^{2}}{2}-\frac{a}{2}-1\right)\|\bm{z_{2}}\|^{2}\\
&\leq-\frac{1}{2}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\frac{a}{2}\|\bm{z_{1}}\|^{2}+\frac{k^{2}+d}{2}\\
&\quad-\left(\lambda_{\min}(\bm{D})-b-\frac{b^{2}}{2}-\frac{a}{2}-1\right)\|\bm{z_{2}}\|^{2}.
\end{aligned}$$
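In the chain above, the first inequality combines the bound on $\|\delta\|$ with $\bm{z_{2}}^{\top}\bm{D}\bm{z_{2}}\geq\lambda_{\min}(\bm{D})\|\bm{z_{2}}\|^{2}$, and the second applies Young's inequality $xy\leq\frac{x^{2}}{2}+\frac{y^{2}}{2}$ to each cross term:

$$\begin{aligned}
b\|\bm{z_{2}}\|\left\|\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\right\|&\leq\frac{1}{2}\left\|\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\right\|^{2}+\frac{b^{2}}{2}\|\bm{z_{2}}\|^{2},\\
a\|\bm{z_{1}}\|\|\bm{z_{2}}\|&\leq\frac{a}{2}\|\bm{z_{1}}\|^{2}+\frac{a}{2}\|\bm{z_{2}}\|^{2},\\
k\|\bm{z_{2}}\|&\leq\frac{k^{2}}{2}+\frac{1}{2}\|\bm{z_{2}}\|^{2},\\
\|\bm{\tau^{d}}\|\|\bm{z_{2}}\|&\leq\frac{1}{2}\|\bm{\tau^{d}}\|^{2}+\frac{1}{2}\|\bm{z_{2}}\|^{2}.
\end{aligned}$$

Collecting the $\|\bm{z_{2}}\|^{2}$ terms with $b\|\bm{z_{2}}\|^{2}$ yields the coefficient $b+\frac{b^{2}}{2}+\frac{a}{2}+1$, and the last step uses $\|\bm{\tau^{d}}\|^{2}\leq d$.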

Since $\bm{D}\geq(b+\frac{b^{2}}{2}+\frac{a}{2}+1)\bm{I}$ , we have $h=\lambda_{\min}(\bm{D})-b-\frac{b^{2}}{2}-\frac{a}{2}-1\geq 0.$ Therefore,

$$\dot{V}\leq-\frac{1}{2}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}^{\top}\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}+\frac{a}{2}\|\bm{z_{1}}\|^{2}+\frac{k^{2}+d}{2}.$$

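As in the proof of Theorem 3, if $\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\geq\alpha\bm{I}$, then $\left\|\frac{\partial\bm{\Phi}}{\partial\bm{z_{1}}}\right\|^{2}\geq\alpha^{2}\|\bm{z_{1}}\|^{2}$, and we further have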

$$\begin{aligned}
\dot{V}&\leq-\frac{\alpha^{2}}{2}\|\bm{z_{1}}\|^{2}+\frac{a}{2}\|\bm{z_{1}}\|^{2}+\frac{k^{2}+d}{2}\\
&=-\frac{\alpha^{2}-a}{2}\|\bm{z_{1}}\|^{2}+\frac{k^{2}+d}{2}.
\end{aligned}$$

Hence $\dot{V}<0$ whenever $\|\bm{z_{1}}\|^{2}>\frac{k^{2}+d}{\alpha^{2}-a}$, and we conclude that the tracking error converges to the set $\{\bm{z_{1}}\mid\|\bm{z_{1}}\|^{2}\leq\frac{k^{2}+d}{\alpha^{2}-a}\}$. ∎
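All quantities in Theorem 4 are explicit, so the design conditions and the resulting bound can be evaluated directly. The sketch below does this for hypothetical constants $a,b,c,d,\alpha$ (chosen only for illustration) and the sine/cosine reference of Section V-A with its analytic derivative. It approximates the maximum defining $k$ on a finite time grid, which is an assumption of ours.

```python
import numpy as np


def theorem4_quantities(a, b, c, d, alpha, qd, qd_dot, t_grid):
    """Return (damping threshold, k, bound on ||z1||^2) from Theorem 4.

    qd(t), qd_dot(t): reference and its derivative; k is approximated by the
    maximum over the sampled times in t_grid.
    """
    if alpha ** 2 <= a:
        raise ValueError("Theorem 4 requires alpha > sqrt(a)")
    d_min = b + b ** 2 / 2 + a / 2 + 1  # D must satisfy D >= d_min * I
    k = max(c + a * np.linalg.norm(qd(t)) + b * np.linalg.norm(qd_dot(t))
            for t in t_grid)
    bound = (k ** 2 + d) / (alpha ** 2 - a)
    return d_min, k, bound


# Hypothetical constants with the sine/cosine reference of Section V-A.
qd = lambda t: np.array([np.sin(0.1 * t), np.cos(0.1 * t)])
qd_dot = lambda t: np.array([0.1 * np.cos(0.1 * t), -0.1 * np.sin(0.1 * t)])
d_min, k, bound = theorem4_quantities(a=0.5, b=0.2, c=0.1, d=1.0, alpha=1.0,
                                      qd=qd, qd_dot=qd_dot,
                                      t_grid=np.linspace(0.0, 100.0, 1001))
```

With these illustrative values, the damping must satisfy $\lambda_{\min}(\bm{D})\geq 1.47$, $k=0.62$, and the bound is $\|\bm{z_{1}}\|^{2}\leq 2.7688$. Raising $\alpha$ to $2$ would shrink the bound to $1.3844/3.5\approx 0.396$.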

The preceding proof shows that the NBS tracking controller achieves bounded tracking errors, and that explicit error bounds follow from a careful design of $\bm{\Phi}$. In view of Theorem 4, setting the hyperparameter $m\geq\sqrt{b+\frac{b^{2}}{2}+\frac{a}{2}+1}$ in $\bm{D}$ guarantees $\bm{D}\geq(b+\frac{b^{2}}{2}+\frac{a}{2}+1)\bm{I}$. Moreover, setting the hyperparameter $\bm{S}\geq\alpha\bm{I}$ in $\bm{\Phi}$, or adding the regularizer $\mathrm{ReLU}\left(\lambda_{\max}\left(\alpha\bm{I}-\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\right)\right)$ during training, ensures $\left.\frac{\partial^{2}\bm{\Phi}}{\partial\bm{z_{1}}^{2}}\right|_{\bm{z_{1}}=\bm{0}}\geq\alpha\bm{I}$. As in Section III-C, the neural network parameters can then be further optimized to improve control performance.
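The Hessian regularizer can be sketched as follows. In PyTorch the Hessian at zero would typically come from automatic differentiation (for example `torch.autograd.functional.hessian`). To keep the example dependency-free, this version instead uses central finite differences in NumPy, which is a substitution of ours. The potential `phi` is a hypothetical stand-in for the structured network $\bm{\Phi}$: a quadratic term with an assumed $\bm{S}=\mathrm{diag}(2,0.5)$ plus a log-cosh term with zero gradient at the origin.

```python
import numpy as np


def hessian_at_zero(phi, n, eps=1e-4):
    """Central finite-difference Hessian of the scalar function phi at z1 = 0."""
    E = np.eye(n) * eps
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (phi(E[i] + E[j]) - phi(E[i] - E[j])
                       - phi(-E[i] + E[j]) + phi(-E[i] - E[j])) / (4 * eps ** 2)
    return 0.5 * (H + H.T)  # remove round-off asymmetry


def hessian_regularizer(phi, n, alpha):
    """ReLU(lambda_max(alpha*I - Hessian at 0)); zero exactly when Hessian >= alpha*I."""
    lam_max = np.linalg.eigvalsh(alpha * np.eye(n) - hessian_at_zero(phi, n)).max()
    return max(float(lam_max), 0.0)


# Hypothetical potential: quadratic part with S = diag(2, 0.5) plus a
# log-cosh term whose Hessian at zero is 0.25 * ones((2, 2)).
S = np.diag([2.0, 0.5])
phi = lambda z: 0.5 * z @ S @ z + 0.25 * np.log(np.cosh(z.sum()))
```

Here the Hessian at zero has eigenvalues $1.5\pm\sqrt{0.625}$, so the penalty is positive for $\alpha=1$ (about $0.29$) and vanishes for $\alpha=0.5$. During training it pushes the smallest eigenvalue up until the condition of Theorem 4 holds.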

V Simulations

In this section, we validate our method by applying the NBS controller to the tracking control of manipulators. We first consider sine and cosine reference tracking for a two-link planar robot arm with known model information, both with and without disturbances. We then use MuJoCo [35] to extend the experiments to a three-link system, where an LNN learns the dynamic model of the system and the learned model is used for trajectory tracking control. All experiments use PyTorch with the Adam optimizer [36]. The code is available from: https://github.com/jiajqian/Neural-Backstepping-tracking-controller.

V-A Two-Link Planar Robot Arm with Known Model Information

We consider the two-link planar robot arm shown in Fig. 2, whose link masses are concentrated at the ends of the links [32]. The arm has $2$ control inputs $[u_{1},u_{2}]$, the torques applied to the links, and $4$ state variables $[\beta_{1},\beta_{2},\bm{\dot{\beta}_{1}},\bm{\dot{\beta}_{2}}]$, the link angles and angular velocities. Each link has mass $m_{i}=1\,\mathrm{kg}$ and length $l_{i}=1\,\mathrm{m}$, $i=1,2$. The dynamics of the arm are described by (1) with $\bm{q}=[\beta_{1},\beta_{2}]$, $\bm{\dot{q}}=[\bm{\dot{\beta}_{1}},\bm{\dot{\beta}_{2}}]$,

$$\begin{aligned}
T(\bm{q},\bm{\dot{q}})&=\frac{1}{2}(3+2\cos\beta_{2})\bm{\dot{\beta}_{1}}^{2}+(1+\cos\beta_{2})\bm{\dot{\beta}_{1}}\bm{\dot{\beta}_{2}}+\frac{1}{2}\bm{\dot{\beta}_{2}}^{2},\\
V(\bm{q})&=2g\sin\beta_{1}+g\sin(\beta_{1}+\beta_{2}),
\end{aligned}$$

where $g=9.8\,\mathrm{N/kg}$. The control objective is for $\beta_{1}$ to track $\sin(0.1t)$ and for $\beta_{2}$ to track $\cos(0.1t)$, so the desired trajectory is $\bm{q^{d}}=[\beta_{1}^{d},\beta_{2}^{d}]=[\sin(0.1t),\cos(0.1t)]$ with $\bm{\dot{q}^{d}}=[\bm{\dot{\beta}_{1}^{d}},\bm{\dot{\beta}_{2}^{d}}]=[0.1\cos(0.1t),-0.1\sin(0.1t)]$.
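For reference, writing the kinetic energy as $T=\frac{1}{2}\bm{\dot{q}}^{\top}\bm{M}(\bm{q})\bm{\dot{q}}$ and differentiating $V$ identifies the model terms that the controller uses. The derivation below is our own standard computation, not reproduced from [32]. It assumes (1) takes the form $\bm{M}(\bm{q})\bm{\ddot{q}}+\bm{C}(\bm{q},\bm{\dot{q}})\bm{\dot{q}}+\bm{G}(\bm{q})=\bm{u}$ and builds $\bm{C}$ from the Christoffel symbols of $\bm{M}$, which is one of several valid factorizations of the Coriolis term:

$$\bm{M}(\bm{q})=\begin{bmatrix}3+2\cos\beta_{2}&1+\cos\beta_{2}\\1+\cos\beta_{2}&1\end{bmatrix},\quad
\bm{C}(\bm{q},\bm{\dot{q}})=\sin\beta_{2}\begin{bmatrix}-\dot{\beta}_{2}&-(\dot{\beta}_{1}+\dot{\beta}_{2})\\\dot{\beta}_{1}&0\end{bmatrix},\quad
\bm{G}(\bm{q})=\frac{\partial V}{\partial\bm{q}}=g\begin{bmatrix}2\cos\beta_{1}+\cos(\beta_{1}+\beta_{2})\\\cos(\beta_{1}+\beta_{2})\end{bmatrix}.$$

With this choice, $\dot{\bm{M}}-2\bm{C}=\sin\beta_{2}\,(2\dot{\beta}_{1}+\dot{\beta}_{2})\begin{bmatrix}0&1\\-1&0\end{bmatrix}$ is skew-symmetric.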

Figure 2: The model of a 2-link planar robot arm

With the errors defined as in (8), the control objective becomes stabilizing the errors, that is, ensuring $\lim_{t\rightarrow\infty}\bm{z_{1}}=\bm{0}$.

The NBS tracking controller is designed as (9). In the controller, $\bm{\psi}$ has $3$ hidden layers of $32$ neurons each and $\bm{S}=\bm{I}$; each FCNN in $\bm{D}$ has $2$ hidden layers of $32$ neurons each, and $m=0.001$.

To demonstrate the unconditional stability of the NBS tracking controller, we run tracking experiments without any training of the neural network parameters. The simulations use a time step of $0.01$ s. In addition to the initial state $[0,0,0,0]$, we randomly select three other initial states. Fig. 3 shows the joint angles $\bm{q}$ and the tracking errors $\bm{z_{1}}$, i.e., the deviation of $\bm{q}$ from the desired trajectory $\bm{q^{d}}$, starting from these initial states. Even without training, the NBS tracking controller successfully tracks the sine and cosine references.
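As a rough reproduction aid (not the authors' code, which is linked at the start of this section), the sketch below simulates the two-link arm under the simplest member of the controller family. It replaces the neural potential with the hypothetical quadratic $\bm{\Phi}(\bm{z_{1}})=\frac{\alpha}{2}\|\bm{z_{1}}\|^{2}$ and the damping network with a constant $k_{d}\bm{I}$. It also assumes $\bm{z_{1}}=\bm{q}-\bm{q^{d}}$ and $\bm{z_{2}}=\bm{\dot{q}}-\bm{\phi}$, uses the exact model with a Christoffel-symbol Coriolis matrix, the analytic derivatives of $\bm{q^{d}}=[\sin(0.1t),\cos(0.1t)]$, and the $0.01$ s step from the text.

```python
import numpy as np

GRAV = 9.8  # N/kg, as in the text


def model(q, qdot):
    """M, C, G of the two-link arm with unit point masses and unit lengths."""
    c2, s2 = np.cos(q[1]), np.sin(q[1])
    M = np.array([[3 + 2 * c2, 1 + c2], [1 + c2, 1.0]])
    C = s2 * np.array([[-qdot[1], -(qdot[0] + qdot[1])], [qdot[0], 0.0]])
    G = GRAV * np.array([2 * np.cos(q[0]) + np.cos(q[0] + q[1]),
                         np.cos(q[0] + q[1])])
    return M, C, G


def reference(t):
    """q^d = [sin(0.1t), cos(0.1t)] and its first two time derivatives."""
    s, c = np.sin(0.1 * t), np.cos(0.1 * t)
    return (np.array([s, c]), np.array([0.1 * c, -0.1 * s]),
            np.array([-0.01 * s, -0.01 * c]))


def simulate(q0, qdot0, alpha=1.0, kd=1.0, dt=0.01, t_end=40.0):
    """Closed loop with Phi(z1) = alpha/2 * ||z1||^2 and D = kd * I; returns ||z1||^2."""
    q, qdot = np.array(q0, dtype=float), np.array(qdot0, dtype=float)
    err_sq = []
    for step in range(int(round(t_end / dt))):
        qd, qd_dot, qd_ddot = reference(step * dt)
        z1 = q - qd
        phi = qd_dot - alpha * z1                    # virtual velocity reference
        z2 = qdot - phi
        phi_dot = qd_ddot - alpha * (qdot - qd_dot)  # Hessian of Phi is alpha * I
        M, C, G = model(q, qdot)
        u = G + M @ phi_dot + C @ phi - alpha * z1 - kd * z2
        qddot = np.linalg.solve(M, u - C @ qdot - G)  # plant with the exact model
        qdot = qdot + dt * qddot                      # semi-implicit Euler step
        q = q + dt * qdot
        err_sq.append(float(z1 @ z1))
    return np.array(err_sq)


err_sq = simulate([0.0, 0.0], [0.0, 0.0])
```

The semi-implicit Euler update is chosen only for brevity; any ODE integrator could replace it, and the learned $\bm{\Phi}$ and $\bm{D}$ would slot into the same loop.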

The performance of the NBS tracking controller can be improved by solving the optimization problem (13). We discretize (13) with a step size of $0.01$ s and a time horizon $T=1$ s, start from the initial state $[0,0,0,0]$, and choose the stage cost $l_{t}=\bm{z_{1}}^{\top}\bm{z_{1}}$. We train for $200$ epochs with a decaying learning rate starting at $10^{-3}$. Fig. 4 shows the angle trajectories and tracking errors of the trained controller, with a traditional PID controller as a baseline. The trained NBS tracking controller quickly drives the angles onto the desired trajectory, a clear improvement over the untrained controller, and it outperforms the PID controller.

Traditional PID controllers provide no guarantee of a bounded tracking error under disturbances. To demonstrate that the NBS tracking controller guarantees a bounded tracking error under disturbances, we consider the constant disturbance $\bm{\tau^{d}}=[1.0,1.0]$. We set the hyperparameter $m$ in $\bm{D}$ to $1.0$ to ensure $\bm{D}(\bm{z_{2}})\geq\frac{1}{2}\bm{I}$, choose the stage cost $l_{t}=\bm{z_{1}}^{\top}\bm{z_{1}}$, and discretize the optimization problem (13) with a step size of $0.01$ s and a time horizon $T=1$ s. Solving (13) yields $\theta_{1}^{*},\theta_{2}^{*}$. To investigate how the regularization parameter $\alpha$ affects the steady-state tracking error bound, we uniformly sample $40$ values of $\alpha$ from $[0,2]$ and, for each value, train for $200$ epochs with a learning rate of $10^{-3}$. According to Theorem 3, the tracking error should converge to $\|\bm{z_{1}}\|^{2}\leq\frac{1}{2\alpha^{2}}\times 2=\frac{1}{\alpha^{2}}$. Fig. 5 shows that the NBS tracking controller can be trained to achieve a small steady-state tracking error and that the error remains bounded under disturbances. Increasing $\alpha$ reduces the maximum steady-state error; however, $\alpha$ should not be set too large, since this makes learning the optimal controller difficult.

Table I summarizes the performance in the different simulations. We simulate $100$ s, take the maximum of $\|\bm{z_{1}}\|^{2}$ over the last $30$ s as the steady-state tracking error, and take the first time after which $\|\bm{z_{1}}\|^{2}$ remains below $0.01$ as the convergence time. Table I shows that our controller achieves a smaller steady-state tracking error and faster convergence than the PID controller. Optimizing the DNN parameters further improves performance, most notably reducing the convergence time from $5.72$ s to $0.13$ s. Under the constant disturbance, the controller remains stable and keeps the steady-state tracking error small ($1.25\times10^{-3}$).
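The two metrics in Table I are defined procedurally, so a small helper that applies those definitions to a sampled error signal may be useful. The function name `table_metrics` and the toy exponential signal are ours; the $30$ s window and the $0.01$ threshold come from the text.

```python
import numpy as np


def table_metrics(err_sq, dt, window=30.0, threshold=0.01):
    """Steady-state error and convergence time following the Table I definitions.

    err_sq: samples of ||z1||^2 on a uniform time grid with spacing dt.
    Returns the maximum of err_sq over the last `window` seconds and the first
    time after which err_sq stays below `threshold` (inf if it never settles).
    """
    err_sq = np.asarray(err_sq, dtype=float)
    steady = float(err_sq[-int(round(window / dt)):].max())
    above = np.nonzero(err_sq >= threshold)[0]  # indices violating the threshold
    if above.size == 0:
        return steady, 0.0
    if above[-1] == err_sq.size - 1:
        return steady, float("inf")
    return steady, (above[-1] + 1) * dt


# Toy signal: ||z1||^2 = exp(-t) sampled every 0.01 s for 100 s.
t = np.arange(10001) * 0.01
steady, t_conv = table_metrics(np.exp(-t), dt=0.01)
```

For this toy signal the convergence time is $4.61$ s, the first sample after $\ln 100\approx 4.605$ s.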

TABLE I: Performance of different controllers when the system starts from the initial state $[0,0,0,0]$

Controller	Steady-state tracking error ($\|\bm{z_{1}}\|^{2}$)	Convergence time (s)
PID controller	$2.06\times10^{-3}$	15.54
NBS tracking controller (without training)	$1.30\times10^{-6}$	5.72
NBS tracking controller (after training)	$1.22\times10^{-6}$	0.13
NBS tracking controller ($\alpha=1$, $\bm{\tau^{d}}=[1.0,1.0]$)	$1.25\times10^{-3}$	0.13

V-B Three-Link Planar Robot Arm with Unknown Model Information

Since an accurate mathematical model of the three-link planar robot arm is difficult to obtain, we use the MuJoCo physics simulator [35] to sample data and then learn the Lagrangian of the arm, which yields an approximate model. The MuJoCo model has $3$ control inputs $[u_{1},u_{2},u_{3}]$, the torques applied to the links, and $6$ state variables $[\beta_{1},\beta_{2},\beta_{3},\bm{\dot{\beta}_{1}},\bm{\dot{\beta}_{2}},\bm{\dot{\beta}_{3}}]$, the link angles and angular velocities. The desired trajectory is $[\beta_{1}^{d},\beta_{2}^{d},\beta_{3}^{d}]=[\sin(0.1t),\cos(0.1t),\sin(0.1t)]$.

In the LNN $\mathcal{L}$, $\mathcal{L}_{T}$ is a PICNN of the form (7). Both $\mathcal{L}_{T}$ and $\mathcal{L}_{V}$ have $3$ hidden layers of $32$ neurons each with softplus activations. To collect the training data for $\mathcal{L}$, we run the MuJoCo simulator without any control input and record the state of the arm: the simulation step size is $0.001$ s, the initial state is $[0,0,0,0,0,0]$, the control input is $\bm{u}=\bm{0}$, and $10{,}000$ points are sampled. We then solve (4) to obtain $\bm{\gamma^{*}}$, training $\mathcal{L}$ for $200$ epochs with a batch size of $10$ and a decaying learning rate starting at $10^{-3}$.

After training $\mathcal{L}$, we use the LNN-based NBS tracking controller (16). We discretize the controller optimization problem (13) with a step size of $0.01$ s and a time horizon $T=1$ s, choose the stage cost $l_{t}=\bm{z_{1}}^{\top}\bm{z_{1}}$, and again use a decaying learning rate starting at $10^{-3}$ for $200$ epochs. Fig. 6 shows the angle trajectories and tracking errors. The NBS tracking controller achieves high-precision tracking, with a steady-state tracking error of $\|\bm{z_{1}}\|^{2}\leq 1.5\times10^{-3}$. This experiment further demonstrates the efficacy of the LNN-based NBS tracking controller when the system model is unknown or difficult to obtain.

VI Conclusions

In this paper, we propose a structured DNN controller based on backstepping. With properly designed DNN structures, the controller has unconditional stability guarantees, and its parameters can be optimized to achieve better performance. We further prove that the tracking error remains bounded in the presence of disturbances. When the model information is unknown, we use an ICNN to improve the LNN structure, use the improved LNN to learn the system dynamics, and design the controller on the basis of the learned dynamics. We also prove that the tracking error remains bounded in the presence of model uncertainties and external disturbances. In future work, we plan to generalize the method to general nonlinear systems.

References

[1] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[2] D. Rolnick and M. Tegmark, “The power of deeper networks for expressing natural functions,” in International Conference on Learning Representations, 2018.
[3] H. W. Lin, M. Tegmark, and D. Rolnick, “Why does deep and cheap learning work so well?” Journal of Statistical Physics, vol. 168, pp. 1223–1247, 2017.
[4] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao, “Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review,” International Journal of Automation and Computing, vol. 14, no. 5, pp. 503–519, 2017.
[5] B. Bauer and M. Kohler, “On deep learning as a remedy for the curse of dimensionality in nonparametric regression,” The Annals of Statistics, vol. 47, no. 4, pp. 2261–2285, 2019.
[6] A. Punjani and P. Abbeel, “Deep learning helicopter dynamics models,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 3223–3230.
[7] M. Lutter, C. Ritter, and J. Peters, “Deep Lagrangian networks: Using physics as model prior for deep learning,” in International Conference on Learning Representations, 2018.
[8] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho, “Lagrangian neural networks,” in ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020.
[9] N. Takeishi and Y. Kawahara, “Learning dynamics models with stable invariant sets,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 11, 2021, pp. 9782–9790.
[10] S. Sanyal and K. Roy, “RAMP-Net: A robust adaptive MPC for quadrotors via physics-informed neural network,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 1019–1025.
[11] K. Y. Chee, T. Z. Jiahao, and M. A. Hsieh, “KNODE-MPC: A knowledge-based data-driven predictive control framework for aerial robots,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2819–2826, 2022.
[12] L. Bauersfeld, E. Kaufmann, P. Foehn, S. Sun, and D. Scaramuzza, “NeuroBEM: Hybrid aerodynamic quadrotor model,” Proceedings of Robotics: Science and Systems XVII, p. 42, 2021.
[13] C. Dawson, S. Gao, and C. Fan, “Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,” IEEE Transactions on Robotics, 2023.
[14] Y.-C. Chang, N. Roohi, and S. Gao, “Neural Lyapunov control,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[15] N. Gaby, F. Zhang, and X. Ye, “Lyapunov-Net: A deep neural network architecture for Lyapunov function approximation,” in 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 2091–2096.
[16] S. M. Richards, F. Berkenkamp, and A. Krause, “The Lyapunov neural network: Adaptive stability certification for safe learning of dynamical systems,” in Proceedings of The 2nd Conference on Robot Learning, ser. Proceedings of Machine Learning Research, A. Billard, A. Dragan, J. Peters, and J. Morimoto, Eds., vol. 87. PMLR, 29–31 Oct 2018, pp. 466–476.
[17] R. Zhou, T. Quartz, H. De Sterck, and J. Liu, “Neural Lyapunov control of unknown nonlinear systems with stability guarantees,” Advances in Neural Information Processing Systems, vol. 35, pp. 29113–29125, 2022.
[18] H. Dai, B. Landry, L. Yang, M. Pavone, and R. Tedrake, “Lyapunov-stable neural-network control,” in Proceedings of Robotics: Science and Systems, Virtual, July 2021.
[19] W. Jin, Z. Wang, Z. Yang, and S. Mou, “Neural certificates for safe control policies,” arXiv preprint arXiv:2006.08465, 2020.
[20] L. Xu, M. Zakwan, and G. Ferrari-Trecate, “Neural energy Casimir control for port-Hamiltonian systems,” in 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 4053–4058.
[21] L. Furieri, C. L. Galimberti, M. Zakwan, and G. Ferrari-Trecate, “Distributed neural network control with dependability guarantees: a compositional port-Hamiltonian approach,” in Learning for Dynamics and Control Conference. PMLR, 2022, pp. 571–583.
[22] S. A. Khader, H. Yin, P. Falco, and D. Kragic, “Learning deep energy shaping policies for stability-guaranteed manipulation,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 8583–8590, 2021.
[23] S. Massaroli, M. Poli, F. Califano, J. Park, A. Yamashita, and H. Asama, “Optimal energy shaping via neural approximators,” SIAM Journal on Applied Dynamical Systems, vol. 21, no. 3, pp. 2126–2147, Sep. 2022.
[24] S. Sánchez-Escalonilla, R. Reyes-Báez, and B. Jayawardhana, “Stabilization of underactuated systems of degree one via neural interconnection and damping assignment – passivity based control,” in 2022 IEEE 61st Conference on Decision and Control (CDC), Dec. 2022, pp. 2463–2468.
[25] C. Yang, D. Huang, W. He, and L. Cheng, “Neural control of robot manipulators with trajectory tracking constraints and input saturation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 9, pp. 4231–4242, 2020.
[26] Q. Meng, X. Lai, Z. Yan, C.-Y. Su, and M. Wu, “Motion planning and adaptive neural tracking control of an uncertain two-link rigid–flexible manipulator with vibration amplitude constraint,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 8, pp. 3814–3828, 2022.
[27] N. Hassan and A. Saleem, “Neural network-based adaptive controller for trajectory tracking of wheeled mobile robots,” IEEE Access, vol. 10, pp. 13582–13597, 2022.
[28] K. Xu and Z. Wang, “The design of a neural network-based adaptive control method for robotic arm trajectory tracking,” Neural Computing and Applications, vol. 35, no. 12, pp. 8785–8795, 2023.
[29] H. K. Khalil, Nonlinear Systems, 3rd ed. Prentice Hall, 2002.
[30] Q. Hu, L. Xu, and A. Zhang, “Adaptive backstepping trajectory tracking control of robot manipulator,” Journal of the Franklin Institute, vol. 349, no. 3, pp. 1087–1105, 2012.
[31] B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 146–155.
[32] F. L. Lewis, C. T. Abdallah, and D. M. Dawson, Robot Manipulator Control: Theory and Practice, 2nd ed., ser. Control Engineering Series. New York: Marcel Dekker, 2004.
[33] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, “Automatic differentiation in machine learning: a survey,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 5595–5637, 2017.
[34] J. Z. Kolter and G. Manek, “Learning stable deep dynamics models,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[35] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.
[36] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015.