Hybrid Physics-Informed Neural Networks

This document proposes a hybrid physics-informed neural network (hybrid PINN) method for solving partial differential equations (PDEs). Unlike traditional PINNs, which use automatic differentiation to calculate PDE residuals, the hybrid PINN uses a local fitting method to approximate differential operators. This avoids cases in which the neural network gives a bad prediction and comes with a proven convergence rate. Numerical experiments show that the hybrid PINN is computationally efficient and accurate in solving PDEs, including inverse problems and surface PDEs.

5514 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 33, NO. 10, OCTOBER 2022

A High-Efficient Hybrid Physics-Informed Neural Networks Based on Convolutional Neural Network

Zhiwei Fang

Manuscript received 4 July 2020; revised 21 September 2020, 10 December 2020, and 23 March 2021; accepted 31 March 2021. Date of publication 13 April 2021; date of current version 6 October 2022.
The author is with NVIDIA Corporation, Santa Clara, CA 95051 USA (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TNNLS.2021.3070878.
Digital Object Identifier 10.1109/TNNLS.2021.3070878
2162-237X © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Universitas Brawijaya. Downloaded on January 01,2024 at 02:14:07 UTC from IEEE Xplore. Restrictions apply.

Abstract— In this article, we develop a hybrid physics-informed neural network (hybrid PINN) for partial differential equations (PDEs). We borrow ideas from the convolutional neural network (CNN) and finite volume methods. Unlike the physics-informed neural network (PINN) and its variations, the method proposed in this article uses an approximation of the differential operator to solve the PDEs instead of automatic differentiation (AD). The approximation is given by a local fitting method, which is the main contribution of this article. As a result, our method has been proved to have a convergence rate. This also avoids the issue of the neural network giving a bad prediction, which sometimes happens in PINN. To the author's best knowledge, this is the first work in which a machine learning PDE solver has a convergence rate, as numerical methods do. The numerical experiments verify the correctness and efficiency of our algorithm. We also show that our method can be applied to inverse problems and surface PDEs, although without proof.

Index Terms— Convolutional neural network (CNN), finite volume method, finite-difference method, hybrid physics-informed neural network (hybrid PINN), local fitting method.

I. INTRODUCTION

Deep neural networks (DNNs), as an essential branch of machine learning, have been widely used in academia and industry along with the fast growth of machine learning techniques [1]. An important application of DNNs in scientific computing is the DNN solver for partial differential equations (PDEs) [2]–[5]. DNNs offer a flexible nonlinear approximant through the deep composition of their layers and the nonlinearity of the activation functions. The universal approximation theorem [6] guarantees that a DNN has enough capacity to approximate the solution of a PDE. Compared with traditional numerical PDE solvers, DNNs are easy to implement and more flexible in handling other PDE problems, such as inverse problems.

In addition, numerical methods for PDEs usually need to solve a large linear system, which in most cases is done on CPUs. DNN solvers, however, can utilize GPUs, which is potentially more efficient.

The physics-informed neural network (PINN) is one of the DNN solvers and has attracted much attention recently. According to [7], hundreds of articles report developments and applications of PINN [2], [8]–[11]. The idea of the PINN is to use the PDE and its boundary and initial conditions to build a loss function by automatic differentiation (AD) [12], so that minimizing the loss drives the residues of the PDE and of the boundary and initial conditions toward zero at preselected sample points. Sun and Wang [13] and Yang et al. [14] combined the Bayesian approach with the PINN to solve inverse problems. Yang and Perdikaris [15] combined PINN and GAN architectures to solve the uncertainty quantification problem efficiently. PINNs have also been widely used in real problems and in industry. Deng et al. [16] and Sampani et al. [17] introduced applications of PINNs in the biological and medical areas. Shukla et al. [18] showed how to use PINN to detect surface-breaking cracks in engineering. PINNs have also been used in computational fluid dynamics [8], [19]. Some developments of PINN combine it with the idea of finite-element methods. For example, the variational physics-informed neural networks (VPINNs) [20] build the loss function from integrals of the PDE residue multiplied by a set of basis functions. The hp-variational physics-informed neural networks (hp-VPINNs) [21] use the same idea as VPINNs, but their basis functions have only local supports. Both VPINNs and hp-VPINNs try to make the loss function more reasonable to enhance PINN's performance. However, as we will see in this article, none of these methods is perfect. Most PINN improvements focus on the loss function and the analysis of the gradient descent algorithm.

In fact, there are many DNN solvers for PDEs other than fully connected structures. For example, there is a solver based on a convolutional neural network (CNN) shown in
https://github.com/tensorflow/examples/blob/master/community/en/pdes.ipynb
Essentially, this is a finite-difference scheme with the stencil
$$
\begin{bmatrix} 0.5 & 1 & 0.5 \\ 1 & -6 & 1 \\ 0.5 & 1 & 0.5 \end{bmatrix}. \tag{1}
$$
In this example, the Laplacian operator $\Delta$ has been approximated by the convolution with kernel (1). This inspires us to improve the PINN by replacing AD with another approximation. Indeed, AD gives a more accurate derivative of the DNN with respect to its inputs. However, it loses the information from the neighborhood in the domain, which is crucial in approximating the solution of a PDE.

In this article, we develop a novel approach to approximate the differential operator on arbitrary geometry; that is, we equate a local linear combination of the function values with the value of the differential operator, and then compute the coefficients of the linear combination by the least-squares method. We call it the local fitting method. This method is compatible with the neural network framework, and we will show that it has a guaranteed convergence rate, which is one of the main contributions of this work. Then, similar to the convolution operation in the CNN solver above, we use the resulting approximation matrix rather than AD to calculate the residue of the PDE. Finally, by using a DNN to predict the solution of the PDE, we establish a hybrid PINN as the surrogate model. The numerical experiments show that our algorithm is computationally efficient and has a convergence rate. To the author's best knowledge, this is the first work in which a machine learning PDE solver shows a convergence rate, as numerical methods do.

In summary, the original contributions of this article are as follows.
1) We develop the local fitting method, which gives a local approximation of the differential operator on arbitrary geometry.
2) We compute the residue of the PDE by the resulting approximation matrix instead of AD. This avoids complicating the computational graph of the DNN.
3) A convergence rate of the hybrid method has been proved, which gives a theoretical guarantee for the algorithm. Also, this is the first machine learning algorithm for solving PDEs that comes with a convergence rate.

The rest of this article is organized as follows. In Section II, we present case studies showing that the PINN and its variations are not perfect. Based on these studies, we arrive at our idea for the hybrid PINN. In Section III, we present the hybrid PINN algorithm and prove its convergence rate. Numerical experiments are presented in Section IV to verify our theory and algorithm. We conclude this article in Section V.

II. CASE STUDIES AND MOTIVATION

In this section, we recap the basic idea of PINN and its variations. Then, we illustrate through case studies that they are not perfect, followed by the motivation of the hybrid PINN.

A. Recap of PINN and Its Variations

Let us consider the following abstract PDE problem:
$$ \mathcal{N}[u](\mathbf{x}) = 0 \quad \text{in } \Omega, \tag{2} $$
$$ \mathcal{B}[u](\mathbf{x}) = 0 \quad \text{on } \partial\Omega, \tag{3} $$
where $\mathcal{N}[\cdot]$ denotes the linear or nonlinear differential operator, $\mathcal{B}[\cdot]$ denotes the boundary value operator, the domain $\Omega \subset \mathbb{R}^d$ is a bounded open set, and $\mathbb{R}^d$ is the $d$-dimensional Euclidean space.

To solve (2) and (3), we establish a DNN with input $\mathbf{x}$ and output $u$. We want this DNN to predict the solution of (2) and (3). To achieve this goal, we use AD to calculate the residue of the PDE $\mathcal{N}[u]$ and use it as part of the loss function. We call this DNN a PINN. The architecture of the PINN is shown in Fig. 1.

Fig. 1. PINNs' architecture diagram.

Let us assume that $\{\mathbf{x}_u^i\}_{i=1}^{N_u} \subset \mathbb{R}^d$ is a preselected set of points. Boldface font denotes vectors throughout this article. Then, we have the following PDE residue in mean-squared-error form:
$$ \mathrm{MSE}_u = \frac{1}{N_u}\sum_{i=1}^{N_u}\left|\mathcal{N}[u](\mathbf{x}_u^i)\right|^2. \tag{4} $$
This tells the PINN that the prediction has to satisfy the PDE (2). In addition, we have the following boundary value residue to force the prediction to also satisfy the boundary condition (3):
$$ \mathrm{MSE}_b = \frac{1}{N_b}\sum_{i=1}^{N_b}\left|\mathcal{B}[u](\mathbf{x}_b^i)\right|^2, \tag{5} $$
where $\{\mathbf{x}_b^i\}_{i=1}^{N_b}$ is the preselected point set for the boundary condition. In sum, the loss function reads
$$ \mathrm{MSE} = \mathrm{MSE}_u + \mathrm{MSE}_b. \tag{6} $$
By minimizing MSE, the PINN can predict the solution of (2) and (3) at any point in $\Omega$. This is the structure of PINN.

Recently, Kharazmi et al. [20], [21] reported some variations of PINN, such as VPINN and hp-VPINN. Essentially, instead of imposing pointwise constraints for the PDE, as in (4), they adopt the following integral-form constraints:
$$ \mathrm{MSE}_u = \frac{1}{N_e}\sum_{i=1}^{N_e}\left|\int_{\Omega}\mathcal{N}[u](\mathbf{x})\,v_i(\mathbf{x})\,d\mathbf{x}\right|^2, $$
where $\{v_i(\mathbf{x})\}_{i=1}^{N_e}$ is a set of preselected basis functions. If $v_i(\mathbf{x}) = \delta(\mathbf{x} - \mathbf{x}_i)$, where $\delta(\mathbf{x})$ denotes the Dirac delta function, we get the PINN introduced above. If the supports of $v_i(\mathbf{x})$, $i = 1, \dots, N_e$, are all $\Omega$, we obtain the VPINN scheme. If the supports of $v_i(\mathbf{x})$, $i = 1, \dots, N_e$, are proper subsets of $\Omega$, we obtain the hp-VPINN scheme.
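The pointwise loss (4)–(6) and the integral-form residue used by VPINN can be compared in a short NumPy sketch. It is an illustration under our own assumptions, not the article's implementation: the residual $\mathcal{N}[u]$ and the boundary residual $\mathcal{B}[u]$ are passed in as plain callables standing in for the AD-computed quantities, the domain is fixed to $\Omega = (-1, 1)$, the test functions are Legendre polynomials, and the integrals use Gauss–Legendre quadrature. The helper names `pinn_loss` and `vpinn_residual_loss` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def pinn_loss(residual, boundary, x_u, x_b):
    """Loss (6): pointwise MSE_u from (4) plus MSE_b from (5).

    `residual` and `boundary` are hypothetical callables returning N[u] and
    B[u] at the given points; in a real PINN these values come from AD.
    """
    mse_u = np.mean(residual(x_u) ** 2)
    mse_b = np.mean(boundary(x_b) ** 2)
    return float(mse_u + mse_b)


def vpinn_residual_loss(residual, n_test, n_quad=20):
    """Integral-form MSE_u on (-1, 1) with Legendre test functions v_0..v_{n_test-1}.

    Each integral of N[u] * v_j is approximated by n_quad-point Gauss-Legendre
    quadrature, which is exact for polynomial integrands of degree < 2 * n_quad.
    """
    nodes, weights = legendre.leggauss(n_quad)
    r = residual(nodes)
    projections = [
        np.sum(weights * r * legendre.legval(nodes, np.eye(n_test)[j]))
        for j in range(n_test)
    ]
    return float(np.mean(np.square(projections)))


if __name__ == "__main__":
    # A residual equal to the Legendre polynomial P_3 is nonzero pointwise ...
    p3 = lambda x: 0.5 * (5 * x**3 - 3 * x)
    zero_bc = lambda x: np.zeros_like(x)
    x_u = np.linspace(-0.9, 0.9, 7)
    x_b = np.array([-1.0, 1.0])
    print(pinn_loss(p3, zero_bc, x_u, x_b))   # strictly positive
    # ... but invisible to test functions of degree below 3.
    print(vpinn_residual_loss(p3, n_test=3))  # zero up to rounding
    print(vpinn_residual_loss(p3, n_test=4))  # (2/7)^2 / 4 = 1/49
```

The design choice of passing residuals as callables keeps the sketch independent of any particular AD framework; swapping in an AD-based residual changes only how `residual` is built.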

B. Case Studies

Although the PINN and its variations are widely used in scientific computing and engineering nowadays, they are not theoretically perfect. In this section, we use some elementary examples to illustrate this point.

Let us first look at PINN. The PINN can be considered a collocation method whose interpolator is a DNN. By the universal approximation theorem [6], artificial neural networks can approximate a wide range of functions. This implies that the DNN is potentially a suitable interpolator. However, this flexibility may also lead to serious errors in PINN.

Example 1: Consider the following ordinary differential equation (ODE) problem:
$$ u''(x) = 0 \quad \text{in } (-1, 1), \tag{7} $$
$$ u(-1) = u(1) = 0. \tag{8} $$
Obviously, the exact solution of (7) and (8) is $u(x) = 0$, and this solution is unique. Suppose that we solve (7) and (8) by PINN with the point set $\{0, \pm 1\}$. Then, by (4), (5), and (6), the loss function reads
$$ \mathrm{MSE} = |u''(0)|^2 + \frac{1}{2}\left(|u(-1)|^2 + |u(1)|^2\right). \tag{9} $$
By minimizing (9), we may expect $\mathrm{MSE} \to 0$. Let us even assume the ideal case, that is, $\mathrm{MSE} = 0$ exactly. In this ideal case, we have $u''(0) = u(-1) = u(1) = 0$. However, the prediction of PINN may look like the one in Fig. 2. The constraint $u''(0) = 0$ is satisfied, for instance, by any function that is locally a horizontal line at $x = 0$. Thus, any function that equals zero at $\pm 1$ and equals a nonzero constant in a neighborhood of 0 is a potential prediction of this PINN. Unfortunately, none of these functions solves (7) and (8).

Fig. 2. Possible PINN prediction of (7) and (8).

In practice, the prediction of this model depends on the initialization of the neural network. In complex and large geometries, this phenomenon is more likely to happen.

There are two reasons for this phenomenon. For one thing, a DNN usually possesses many degrees of freedom (DoFs), namely, the weights and biases, together with complex composition relationships among its layers. This allows a DNN to represent a wide set of functions. For another, the loss function in PINN is essentially a pointwise constraint in the domain, and the points are mutually isolated; that is, we know that the PDE is satisfied at the training points, but we do not know what happens away from them. Hence, we may not get the desired solution because many other functions satisfy the pointwise constraints posed in the loss. In addition, the number of DoFs of a DNN is hard to control; that is, it is hard to tell which DNN structure is best for a specific problem.

In contrast, the traditional collocation methods in numerical analysis are unlikely to generate these solutions, because they usually have only a few DoFs compared with DNNs. They eventually reduce to a linear algebra problem, which usually has a unique solution [22]. This restricts the solutions of traditional collocation methods and hence yields a reasonable solution. Take the radial basis function collocation method as an example. We have only three DoFs, at $-1$, 0, and 1. Then, we get a $3 \times 3$ linear system that has only the zero solution for the commonly used radial basis functions. Therefore, the solution given by this numerical method is $u(x) = 0$.

Readers may ask:

In Example 1, you have only one point for the residue of the ODE. This is far from enough. What if we have lots of points in $(-1, 1)$ to force the residue to tend to zero almost everywhere?

The answer is: it probably works, but it cannot theoretically guarantee to fix the issue. We explain this in Example 2.

Example 2: Consider the following ODE problem:
$$ u''(x) = 0 \quad \text{in } (-1, 1), \tag{10} $$
$$ u(-1) = 0, \quad u(1) = 1. \tag{11} $$
Obviously, the exact solution is $u(x) = (x + 1)/2$, and this solution is unique. Suppose that we solve this problem by PINN again. This time, however, let us assume without loss of generality that we have a large number of points in $[-1, 1]$ except 0. Therefore, the loss function reads
$$ \mathrm{MSE} = \frac{1}{N_u}\sum_{i=1}^{N_u}\left|u''(x_u^i)\right|^2 + \frac{1}{2}\left(|u(-1)|^2 + |u(1) - 1|^2\right), $$
where $\{x_u^i\}_{i=1}^{N_u}$ is the point set in $(-1, 0) \cup (0, 1)$, and we can imagine that this point set is as dense as we want in $(-1, 0) \cup (0, 1)$, that is, $N_u \to \infty$. Again, we ignore the training error and consider the ideal case $\mathrm{MSE} = 0$. This implies $u''(x) = 0$ in $(-1, 0) \cup (0, 1)$, $u(-1) = 0$, and $u(1) = 1$. However, even in this case, we cannot theoretically rule out the solution in Fig. 3.

Fig. 3. Possible PINN prediction of (10) and (11).

Notice that $u''(x) = 0$ implies a straight line, so the solution in Fig. 3 is possible, given that the curve is piecewise linear on $(-1, 0)$ and $(0, 1)$. As suggested in [21], PINN can fit functions that lack smoothness. Therefore, the phenomenon indicated in this example is possible and depends on initialization. The traditional numerical methods do not have this issue because their basis functions are smooth enough.
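Both spurious predictions can be checked directly. The sketch below is our own illustration, not part of the original study: the candidates $u(x) = 1 - x^4$ (for Example 1) and $u(x) = \max(x, 0)$ (for Example 2) are hypothetical choices, and their second derivatives are written out by hand instead of being obtained by AD. Each drives the corresponding loss, (9) or the Example 2 loss, to exactly zero while differing from the exact solution.

```python
def example1_loss(u, d2u):
    """Loss (9): residual at x = 0 plus the averaged boundary residuals."""
    return d2u(0.0) ** 2 + 0.5 * (u(-1.0) ** 2 + u(1.0) ** 2)


def example2_loss(u, d2u, interior_points):
    """Example 2 loss: mean interior residual (points avoid 0) plus boundary terms."""
    mse_u = sum(d2u(x) ** 2 for x in interior_points) / len(interior_points)
    return mse_u + 0.5 * (u(-1.0) ** 2 + (u(1.0) - 1.0) ** 2)


# Example 1: u(x) = 1 - x^4 vanishes at +-1 and has u''(0) = 0, yet u is not 0.
u1 = lambda x: 1.0 - x**4
d2u1 = lambda x: -12.0 * x**2

# Example 2: the ReLU u(x) = max(x, 0) is linear on (-1, 0) and on (0, 1),
# so u'' = 0 at every training point, yet it differs from (x + 1) / 2.
u2 = lambda x: max(x, 0.0)
d2u2 = lambda x: 0.0  # valid only for x != 0, which the points below avoid
points = [i / 1000 for i in range(-999, 1000) if i != 0]

print(example1_loss(u1, d2u1), u1(0.0))                           # 0.0 1.0
print(example2_loss(u2, d2u2, points), u2(-0.5), (-0.5 + 1) / 2)  # 0.0 0.0 0.25
```

A zero loss therefore certifies only the sampled constraints; the value between or away from the training points is left to the function class, which is exactly the gap the examples exploit.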
If the activation function is ReLU, this phenomenon is more likely to happen because the function shown in Fig. 3 is precisely a ReLU function. However, one may ask: if a smooth activation function such as sigmoid is used, is this phenomenon still possible? The answer is yes, but the initialization of the DNN parameters would have to be very contrived. However, we may change our example a little bit. If we assume that our training point set misses a neighborhood of 0, then the phenomenon is still likely to happen, because the prediction might be piecewise linear away from 0 and a spline-like interpolation in the neighborhood of 0. In fact, if we miss $\{0\}$, we must miss a neighborhood of it, provided that our training point set is always finite: a finite set is closed, so its complement is open.

In Examples 1 and 2, we construct cases in which PINN may theoretically predict incorrect solutions, even when the loss function equals zero exactly. In most practical cases, the loss function will not equal zero exactly due to the optimization algorithm and machine precision. The infinitely many or extremely numerous training points assumed in Example 2 are also impossible in practice. These factors incur other errors in PINN.

For VPINN, an incorrect prediction is also possible, as illustrated in Example 3.

Example 3: Consider the simple function fitting problem $u(x) = 0$ in $[-1, 1]$. Our goal is to fit $u(x)$ by VPINN given a set of points $\{x_u^i\}_{i=1}^{N_u}$. As suggested in [20], we set the test functions as Legendre polynomials up to degree $K$. Then, the loss function reads
$$ \mathrm{MSE} = \sum_{j=0}^{K}\left|\int_{-1}^{1} u(x)\,v_j(x)\,dx\right|^2 \approx \sum_{j=0}^{K}\left|\sum_{i=1}^{N_u} w_i\, u(x_u^i)\, v_j(x_u^i)\right|^2, \tag{12} $$
where $v_j(x)$, $j = 0, \dots, K$, is the $j$th-order Legendre polynomial and $\{w_i\}_{i=1}^{N_u}$ are the corresponding quadrature weights at the nodes $\{x_u^i\}_{i=1}^{N_u}$. As before, let us assume that $\mathrm{MSE} = 0$ exactly and that the quadrature rule in (12) is exact. Then, any Legendre polynomial whose degree is higher than $K$ is a possible prediction of VPINN, due to the orthogonality of Legendre polynomials [23].

The prototypes of VPINN and hp-VPINN in numerical analysis are the finite-element method and the spectral method. This phenomenon does not happen in the finite-element method or the spectral method, because in those numerical methods the solution is approximated by a function in a finite-element space. In conforming finite-element methods and spectral methods, this space is the same as the test function space. Then, the numerical solution converges to the exact solution by the continuity and coercivity of the bilinear form. In Example 3, this is equivalent to requiring our prediction $u(x)$ to be a linear combination of $v_j(x)$ for $j = 0, \dots, K$. In nonconforming finite-element methods, the trial function space and the test function space must satisfy the Ladyzhenskaya–Babuška–Brezzi (LBB) condition [24] so that the solvability and uniqueness of the numerical algorithm can be guaranteed.

However, in VPINN, the latent variable $u(x)$ is given by a neural network. It neither lies in the linear space spanned by $\{v_j(x)\}_{j=0}^{K}$ nor satisfies the LBB condition. In finite-element theory, this is the so-called variational crime [24] because the trial function space and the test function space are mismatched. Therefore, the incorrect prediction suggested in this example is possible in practice. hp-VPINN has the same issue as VPINN.

C. Motivation of Hybrid PINN

Based on Examples 1–3, we see that the PINN and its variations are not theoretically perfect because the function space of PINN is too large. Thus, there is a nonnegligible chance of reaching undesired critical points of the loss function during training. In this section, we look for clues to fix this issue.

A feature of PINN and its variations is that they all rely on the AD technique. Actually, PINN is not the only way to solve PDEs. There is an example of the heat equation on TensorFlow's tutorial website:
https://github.com/tensorflow/examples/blob/master/community/en/pdes.ipynb
In this example, the Laplacian operator has been discretized by a convolution operation with kernel (1). This kernel is precisely a finite-difference stencil on the Cartesian grid [25]. This gives us an important clue to improve the PINN: bring in the traditional numerical method and the CNN structure, and approximate the differential operator by the numerical method instead of AD. However, the method suggested in TensorFlow's tutorial works only for interval, square, and cubic domains. Thus, we have to develop a more general algorithm for general geometries.

To this end, let us consider the local mesh grid in $\mathbb{R}^2$ shown in Fig. 4. In a finite volume method or generalized difference method [26], the Laplacian operator $\Delta$ at $P_0$ can be discretized by
$$ \Delta u(P_0) = \left.\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)\right|_{P_0} \approx \sum_{i=0}^{5} w_i\, u(P_i). \tag{13} $$

Fig. 4. General local mesh grid.
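The statement that kernel (1) is a finite-difference stencil can be checked numerically. The NumPy sketch below works under our own assumptions (a unit-spaced $3 \times 3$ neighbourhood and hypothetical helpers `moment` and `apply_kernel`): it evaluates $\sum_i w_i f(P_i)$ for low-degree monomials and applies the kernel to a sampled quadratic. The moments vanish for $1, x, y, xy$ and equal 4 for $x^2$ and $y^2$, so on quadratics the kernel reproduces $2h^2\Delta$ rather than $\Delta$ itself; dividing by $2h^2$ recovers the Laplacian.

```python
import numpy as np

# Kernel (1) from the text.
KERNEL = np.array([[0.5, 1.0, 0.5],
                   [1.0, -6.0, 1.0],
                   [0.5, 1.0, 0.5]])

# x- and y-offsets of the 3x3 neighbours relative to the centre (unit spacing).
OFF_X, OFF_Y = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])


def moment(f):
    """Return sum_i w_i f(P_i) for the kernel weights at the unit offsets."""
    return float(np.sum(KERNEL * f(OFF_X, OFF_Y)))


def apply_kernel(u):
    """Correlate a 2-D grid with the 3x3 kernel on interior nodes (no padding)."""
    rows, cols = u.shape[0] - 2, u.shape[1] - 2
    out = np.zeros((rows, cols))
    for i in range(3):
        for j in range(3):
            out += KERNEL[i, j] * u[i:i + rows, j:j + cols]
    return out


monomials = {
    "1": lambda x, y: np.ones_like(x),
    "x": lambda x, y: x,
    "y": lambda x, y: y,
    "xy": lambda x, y: x * y,
    "x^2": lambda x, y: x**2,
    "y^2": lambda x, y: y**2,
}
for name, f in monomials.items():
    print(name, moment(f))  # zero for 1, x, y, xy; 4.0 for x^2 and y^2

# On a grid with spacing h, kernel / (2 h^2) recovers Laplacian(x^2 + y^2) = 4.
h = 0.1
X, Y = np.meshgrid(np.linspace(0.0, 1.0, 11), np.linspace(0.0, 1.0, 11))
lap = apply_kernel(X**2 + Y**2) / (2 * h**2)
print(np.allclose(lap, 4.0))  # True
```

Because the kernel is symmetric, correlation and convolution coincide, so the loop in `apply_kernel` matches what a CNN convolution layer with these fixed weights would compute.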

Namely, the linear differential operator $\Delta$ at $P_0$ can be approximated by a linear combination of the function values at $P_0$ and its neighboring points $P_1$–$P_5$. Although (13) is based on Fig. 4, the same idea can be applied in any dimension. In terms of CNNs, (13) is actually a convolution operation. If we can somehow find a set of reasonable coefficients $\{w_i\}_{i=0}^{5}$ in (13), then we can use it as an approximation of the operator $\Delta$ at $P_0$.

In the finite-element method whose DoFs are located on the nodes, (13) is given by the weak formulation of the PDE. In the generalized difference method, (13) is given by integration on the dual mesh. Of course, we could simply apply those numerical schemes to approximate the differential operator. However, for one thing, those numerical schemes depend heavily on the mesh, and establishing them requires lots of involved calculations. For another, if we already have those numerical schemes, we can get the numerical solution of the PDE by solving a linear system, and then we do not need the machine learning solver anymore.

Therefore, we should put forward a method that is suitable for the machine learning framework and inherits the properties of numerical methods. The clue to this method can be found in the CNN kernel (1). If we consider the numbers in (1) as the weights of a finite-difference scheme at the corresponding points, then we can verify that this scheme is exact for $1, x, y, xy$. This agrees with finite-difference theory [27]. In numerical integration, we have monomial rules; that is, once a quadrature or cubature rule is exact for some set of polynomials, the rule is feasible with a specific order of accuracy [28]. We can mimic this rule for our problem: since a numerical scheme is characterized by the polynomials for which it is exact, why not make the scheme (13) exact for a set of polynomials and use these conditions to determine the coefficients $\{w_i\}_{i=0}^{5}$?

The roadmap of our algorithm is as follows. We first choose a set of basis functions, such as polynomials, that satisfy the conditions in Theorem 1. Next, we set up a least-squares problem to find the coefficients of a linear combination such as (13). This gives us an approximation of the differential operator with a guaranteed convergence rate. Finally, we use a DNN as a surrogate model of the PDE's solution and then minimize the residue of the PDE given by the approximated operator.

In Section III, we will see that this method solves the issue we care about, and we can prove that the truncation error converges to 0 as the points become dense.

III. HYBRID PINN METHOD

In this section, we describe the local fitting method that determines the approximation of the differential operator based on the findings in Section II. Then, we summarize our algorithm at the end of this section.

A. Local Fitting Method

We state our algorithm using Fig. 4. Suppose that we want an approximation of $\Delta$ at $P_0$. Notice that we have six unknowns in Fig. 4, so theoretically, we should find six independent equations. Based on the analysis in Section II, we can require that (13) be exact for $\{1, x, y, x^2, xy, y^2\}$, that is,
$$ \sum_{i=0}^{5} w_i = 0, \quad \sum_{i=0}^{5} w_i x_i = 0, \quad \sum_{i=0}^{5} w_i y_i = 0, $$
$$ \sum_{i=0}^{5} w_i x_i^2 = 2, \quad \sum_{i=0}^{5} w_i x_i y_i = 0, \quad \sum_{i=0}^{5} w_i y_i^2 = 2, \tag{14} $$
where $(x_i, y_i)$ is the coordinate of $P_i$. If the coefficient matrix of (14) has full rank, then this system is uniquely solvable and we get the approximation of $\Delta$ at $P_0$.

Nevertheless, practical cases are more complicated than this. For one thing, we cannot guarantee that (14) is uniquely solvable. Actually, solvability depends on the locations of the points and on which polynomials we use. For example, if we have the Cartesian grid points $P_0 = (0, 0)$ (center), $P_1 = (0, 1)$ (up), $P_2 = (-1, 0)$ (left), $P_3 = (0, -1)$ (down), and $P_4 = (1, 0)$ (right), and then fit the coefficients using $x^2$ and $x^4$, we have
$$ \left.\Delta(x^2)\right|_{(0,0)} \approx w_2 + w_4 = 2 \quad \text{and} \quad \left.\Delta(x^4)\right|_{(0,0)} \approx w_2 + w_4 = 0, $$
which is a contradiction.

For another, the number of unknowns does not always match the number of polynomials. Let $P_d^k$ denote the set of $d$-dimensional monomials of degree no more than $k$, and let $\bar{P}_d^k$ denote the set of $d$-dimensional homogeneous monomials of degree $k$. Also, let $\rho(d, k)$ and $\bar{\rho}(d, k)$ denote the numbers of elements in $P_d^k$ and $\bar{P}_d^k$, respectively. In [24], we have $\rho(d, k) = \binom{d+k}{k}$ and $\bar{\rho}(d, k) = \binom{d+k-1}{k}$. The number of unknowns at each point is given by the mesh or point cloud (see Remark 1). For example, in the contradiction example above, we have only five unknowns. However, there is no $k \in \mathbb{N}$ such that $\rho(2, k) = 5$, because $\rho(d, k)$ is increasing in both $d$ and $k$, and $\rho(2, 1) = 3 < 5 < 6 = \rho(2, 2)$.

To overcome all these issues, we use the least-squares method with hard and soft constraints. Let us consider a general case rather than (13). Let $P_0 \in \mathbb{R}^d$, and let $\{P_i\}_{i=1}^{m}$ be the $m$ neighboring points directly connected to $P_0$ in a mesh. Suppose that we are looking for an approximation of $\Delta$ at $P_0$ of the form
$$ \Delta u(P_0) \approx \sum_{i=0}^{m} w_i\, u(P_i). \tag{15} $$
Once a mesh is given, we can compute its adjacency matrix. From the adjacency matrix, we know the number of neighboring points of each mesh point, and hence the number of unknowns for each point. Let us assume that the minimal number of unknowns over the mesh points is $n_{\min}$, while the maximal number is $n_{\max}$. Then, choose the maximal number $k_{\min}$ such that $\rho(d, k_{\min}) \le n_{\min}$ and the minimal number $k_{\max}$ such that $\rho(d, k_{\max}) \ge n_{\max}$.
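The counting rule for $\rho(d, k)$, the choice of $k_{\min}$ and $k_{\max}$, and a least-squares solve of exactness conditions such as (14) can be sketched in a few lines of NumPy. This is a hedged illustration, not the article's implementation: all helper names are hypothetical, only the 2-D Laplacian is handled, and the hard/soft split of the weighted problem (16) is mimicked by scaling each row with $\sqrt{W_1}$ (monomials of degree at most $k_{\min}$) or $\sqrt{W_2}$ (the remaining monomials up to $k_{\max}$), where the values $W_1 = 100$ and $W_2 = 1$ are arbitrary assumptions. On the five-point Cartesian stencil from the contradiction example, it recovers the familiar weights $(-4, 1, 1, 1, 1)$.

```python
from math import comb

import numpy as np


def rho(d, k):
    """Number of d-variate monomials of degree at most k, i.e. C(d + k, k)."""
    return comb(d + k, k)


def degree_bounds(d, n_min, n_max):
    """Largest k_min with rho(d, k_min) <= n_min and smallest k_max with
    rho(d, k_max) >= n_max (assumes n_min >= 1)."""
    k_min = 0
    while rho(d, k_min + 1) <= n_min:
        k_min += 1
    k_max = 0
    while rho(d, k_max) < n_max:
        k_max += 1
    return k_min, k_max


def monomials_2d(k):
    """Exponent pairs (a, b) of the monomials x^a y^b with a + b <= k."""
    return [(a, s - a) for s in range(k + 1) for a in range(s + 1)]


def laplacian_at_origin(a, b):
    """Laplacian of x^a y^b evaluated at (0, 0): 2 for x^2 and y^2, else 0."""
    return 2.0 if (a, b) in ((2, 0), (0, 2)) else 0.0


def fit_laplacian_weights(points, k_min, k_max, w_hard=100.0, w_soft=1.0):
    """Weighted least squares for w_i with sum_i w_i f(P_i) ~ Delta f(P_0).

    `points` are local coordinates with the centre P_0 = (0, 0) listed first.
    Rows for monomials of degree <= k_min are scaled by sqrt(w_hard), the
    remaining rows up to degree k_max by sqrt(w_soft).
    """
    pts = np.asarray(points, dtype=float)
    rows, rhs = [], []
    for a, b in monomials_2d(k_max):
        scale = np.sqrt(w_hard if a + b <= k_min else w_soft)
        rows.append(scale * pts[:, 0] ** a * pts[:, 1] ** b)
        rhs.append(scale * laplacian_at_origin(a, b))
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return w


# Five-point stencil P_0..P_4 of the contradiction example: centre, up, left, down, right.
stencil = [(0, 0), (0, 1), (-1, 0), (0, -1), (1, 0)]
k_min, k_max = degree_bounds(2, n_min=5, n_max=5)
print(k_min, k_max)  # 1 2
weights = fit_laplacian_weights(stencil, k_min, k_max)
print(weights)  # approximately [-4, 1, 1, 1, 1]
```

Here the five unknowns give $k_{\min} = 1$ and $k_{\max} = 2$, so $\{1, x, y\}$ are hard constraints and $\{x^2, xy, y^2\}$ soft ones; because this particular system happens to be consistent, the weighted solution satisfies all six conditions exactly.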

We have the following least-squares problem:
$$ \min_{\substack{w_i \in \mathbb{R} \\ i = 0, \dots, m}} \; W_1 \sum_{f \in P_d^{k_{\min}}} \left|\sum_{i=0}^{m} w_i f(P_i) - \Delta f(P_0)\right|^2 + W_2 \sum_{f \in P_d^{k_{\max}} \setminus P_d^{k_{\min}}} \left|\sum_{i=0}^{m} w_i f(P_i) - \Delta f(P_0)\right|^2, \tag{16} $$
where $W_1$ and $W_2$ are the penalty constants for the hard constraint [first term in (16)] and the soft constraint [second term in (16)], respectively. Usually, we have $W_1 > W_2$. The goal of the least-squares problem (16) is to satisfy the hard constraint exactly; namely, the scheme (15) is exact for the polynomials in $P_d^{k_{\min}}$. If there are remaining DoFs, we require the scheme to be exact for polynomials of higher degree as far as possible, which controls those remaining DoFs. This requirement is not mandatory at every point, so we call it the soft constraint. By solving (16), we get the approximation of $\Delta$ at $P_0$. Note that we have also transformed the linear system (14) into a least-squares problem, which is more suitable for the machine learning framework. We use the word "local" because, for each point in the mesh, we consider the local coordinate system centered at that point. Therefore, all the polynomials except constants equal 0 at the center point (for example, $P_0$ in Fig. 4). This procedure is necessary when the coordinates are far away from the origin (see test 4 in Section IV).

In (16), the remaining DoFs are controlled by higher order polynomials. We point out that this is not the only way to deal with the remaining DoFs. Actually, we can impose other reasonable conditions, such as stability conditions or energy conservation conditions, to control those DoFs. In this way, our solution may possess more desirable properties. However, the concrete forms of these conditions are left as future work.

We also point out that although we use the Laplacian operator as an example to illustrate our algorithm, Theorem 1 and Corollary 1 show that our algorithm is correct for any $n$th-order linear differential operator under the corresponding conditions.

Before we state the theorem, let us define some notation. Let $\alpha = (\alpha_1, \dots, \alpha_d)$ be a $d$-dimensional multi-index with norm $|\alpha| = \sum_{i=1}^{d} \alpha_i$. Its factorial is given by $\alpha! = \alpha_1!\,\alpha_2! \cdots \alpha_d!$. If $\mathbf{x} = (x_1, \dots, x_d) \in \mathbb{R}^d$, then $\mathbf{x}^{\alpha} = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_d^{\alpha_d}$. The high-order partial derivative is defined by $\partial^{\alpha} = \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_d^{\alpha_d}$, where
$$ \partial_i^{\alpha_i} = \frac{\partial^{\alpha_i}}{\partial x_i^{\alpha_i}}. $$
By $\alpha > \alpha'$, we mean $\alpha_i > \alpha'_i$ for every $1 \le i \le d$. Now, we state and prove the main theorem of this article.

Theorem 1: Let $u: \mathbb{R}^d \to \mathbb{R}$ be a function that has continuous partial derivatives of order at least $n + 1$ ($n \ge 1$), and let $L$ be an $n$th-order linear differential operator in $\mathbb{R}^d$ that consists of $\partial^{\alpha}$ with $|\alpha| = n$. Suppose that $L[u](P_0)$ has been approximated by
$$ L[u](P_0) \approx \sum_{i=0}^{m} w_i\, u(P_i), \tag{17} $$
where $\{P_i\}_{i=1}^{m}$ are the neighboring points of $P_0$ that are directly connected to $P_0$ in a mesh. If (17) is exact for all polynomials in $P_d^n$, that is,
$$ L[f](P_0) = \sum_{i=0}^{m} w_i f(P_i) \quad \forall f \in P_d^n, \tag{18} $$
then we have
$$ \left|L[u](P_0) - \sum_{i=0}^{m} w_i\, u(P_i)\right| \le C h^{n+1} \sum_{i=1}^{m} |w_i|, \tag{19} $$
where $h = \max_{1 \le i \le m} |P_0 P_i|$ and $C$ is a constant independent of $h$.

Proof: Without loss of generality, let $P_0 = \mathbf{x}_0 = 0$. Assume that $L$ has the form
$$ L = \sum_{j=1}^{l} c_j \partial^{\alpha_j} = \sum_{\alpha_j \in A} c_j \partial^{\alpha_j}, \tag{20} $$
where $c_j \in \mathbb{R}$, $1 \le j \le l$, $|\alpha_j| = n$, and $\alpha_j \in A$.

At any $P_i = \mathbf{x}_i = (x_{1i}, \dots, x_{di})$, $1 \le i \le m$, by the multidimensional Taylor expansion [29], we have
$$ u(P_i) = \sum_{|\alpha| \le n} \frac{1}{\alpha!}\,\partial^{\alpha} u(P_0)\,\mathbf{x}_i^{\alpha} + \sum_{|\alpha| = n+1} \frac{1}{\alpha!}\,\partial^{\alpha} u(Q_i)\,\mathbf{x}_i^{\alpha}, $$
where $Q_i$ is a point on the line segment between $P_0$ and $P_i$. Hence,
$$
\begin{aligned}
\sum_{i=0}^{m} w_i\, u(P_i) &= w_0\, u(P_0) + \sum_{i=1}^{m} w_i \sum_{|\alpha| \le n} \frac{1}{\alpha!}\,\partial^{\alpha} u(P_0)\,\mathbf{x}_i^{\alpha} + \sum_{i=1}^{m} w_i \sum_{|\alpha| = n+1} \frac{1}{\alpha!}\,\partial^{\alpha} u(Q_i)\,\mathbf{x}_i^{\alpha} \\
&= u(P_0) \sum_{i=0}^{m} w_i + \sum_{1 \le |\alpha| \le n} \frac{1}{\alpha!}\,\partial^{\alpha} u(P_0) \left(\sum_{i=1}^{m} w_i\, \mathbf{x}_i^{\alpha}\right) + \sum_{|\alpha| = n+1} \frac{1}{\alpha!} \left(\sum_{i=1}^{m} w_i\, \partial^{\alpha} u(Q_i)\, \mathbf{x}_i^{\alpha}\right).
\end{aligned} \tag{21}
$$
By (18), we have
$$ L[1] = \sum_{i=0}^{m} w_i = 0. $$
Therefore, the first term in (21) vanishes. Note that for a given $\alpha_0$, we have
$$ \left(\partial^{\alpha_0} \mathbf{x}^{\alpha}\right)(\mathbf{x}_0) = \begin{cases} \alpha!, & \alpha_0 = \alpha, \\ 0, & \alpha_0 \ne \alpha. \end{cases} $$
Thus, by (20),
$$ L[\mathbf{x}^{\alpha_k}](\mathbf{x}_0) = c_k\, \alpha_k! \quad \text{or} \quad c_k = \frac{L[\mathbf{x}^{\alpha_k}](\mathbf{x}_0)}{\alpha_k!} $$
for $\alpha_k \in A$, and $L[\mathbf{x}^{\alpha}](\mathbf{x}_0) = 0$ for $\alpha \notin A$.


Therefore, by this fact, (18), (20), and (21) can be reduced to
$$\begin{aligned}
\sum_{i=0}^{m} w_i u(P_i) &= \sum_{1 \le |\alpha| \le n} \frac{1}{\alpha!}\,L[\mathbf{x}^\alpha](\mathbf{x}_0)\,\partial^\alpha u(P_0) + \sum_{|\alpha| = n+1} \frac{1}{\alpha!} \Big( \sum_{i=1}^{m} w_i\,\partial^\alpha u(Q_i)\,\mathbf{x}_i^\alpha \Big) \\
&= \sum_{\alpha_j \in A} c_j\,\partial^{\alpha_j} u(P_0) + \sum_{|\alpha| = n+1} \frac{1}{\alpha!} \Big( \sum_{i=1}^{m} w_i\,\partial^\alpha u(Q_i)\,\mathbf{x}_i^\alpha \Big) \\
&= L[u](P_0) + \sum_{|\alpha| = n+1} \frac{1}{\alpha!} \Big( \sum_{i=1}^{m} w_i\,\partial^\alpha u(Q_i)\,\mathbf{x}_i^\alpha \Big).
\end{aligned}$$
Hence,
$$\begin{aligned}
\Big| \sum_{i=0}^{m} w_i u(P_i) - L[u](P_0) \Big| &= \Big| \sum_{|\alpha| = n+1} \frac{1}{\alpha!} \Big( \sum_{i=1}^{m} w_i\,\partial^\alpha u(Q_i)\,\mathbf{x}_i^\alpha \Big) \Big| \\
&\le C_0 d M \sum_{|\alpha| = n+1} \sum_{i=1}^{m} \big| w_i \mathbf{x}_i^\alpha \big| \\
&\le C_0 d M \rho(d, n+1)\,h^{n+1} \sum_{i=1}^{m} |w_i| \\
&:= C h^{n+1} \sum_{i=1}^{m} |w_i|
\end{aligned}$$
which is (19). Here, $C_0$ denotes the maximum of $1/\alpha!$, and $M$ is the maximum of $|\partial^\alpha u(Q_i)|$ over the polyhedron enclosed by $P_i$ for $i = 1, \ldots, m$.

This theorem implies that the truncation error of our algorithm is $C h^{n+1} \big( \sum_{i=1}^{m} |w_i| \big)$. In most cases in numerical analysis and in our numerical experiments, we have $\sum_{i=1}^{m} |w_i| \sim O(h^{-n})$, so that $C h^{n+1} \big( \sum_{i=1}^{m} |w_i| \big) = O(h)$. This is because, for $\alpha_j \in A$, we have
$$\sum_{i=1}^{m} w_i \mathbf{x}_i^{\alpha_j} = c_j\,\alpha_j!$$
while $|\mathbf{x}_i^{\alpha_j}| \le h^n$. Thus, on average, we have $\sum_{i=1}^{m} |w_i| \sim O(h^{-n})$. This holds for almost all numerical schemes, for example, [27]
$$f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \frac{h^2}{12} f^{(4)}(\xi). \qquad (22)$$
In this example, we have $\sum_{i=1}^{m} |w_i| \sim O(h^{-2})$. A rigorous proof of $\sum_{i=1}^{m} |w_i| \sim O(h^{-n})$ involves the geometric structure of the local mesh, which is complicated. We leave this proof as future work.

We also point out that (19) is conservative. In (22), for instance, the convergence rate is $O(h^2)$ instead of $O(h)$. The numerical experiments below verify that the solution usually converges faster than $O(h)$.

Theorem 1 shows the approximation for the $n$th order differential operator. Using a similar technique, we can show that (19) holds for a more general differential operator. The result is stated in the following corollary without proof.

Corollary 1: Under the conditions in Theorem 1, let $L$ be a linear differential operator in $\mathbb{R}^d$ that consists of $\partial^\alpha$ with $|\alpha| \le n$. Then, we have the same estimate as in (19).

Once (16) has been solved, we place the resulting coefficients row by row. This forms a sparse matrix $\mathbf{A}$, which is the matrix form approximation of the linear operator $L$. We call this technique the local fitting method. $\mathbf{A}$ has only $O(N)$ nonzero elements, where $N$ is the number of mesh points. Then, $L[u](\mathbf{x})$ can be approximated by $\mathbf{A}\mathbf{u}$, where $\mathbf{u}$ is the column vector of the values of $u$ at the mesh points $\mathbf{x}$.

Using the local fitting method, we obtain an approximation of the operator whose accuracy is guaranteed by Theorem 1 and Corollary 1. This means that the truncation error of the approximation is explicit. In addition, the residual of the PDE is calculated from linear combinations of adjacent points, which links the training points in the domain together. Thus, no point is trained in isolation, as happens in PINN. Therefore, we can avoid the phenomenon mentioned in Section II (Examples 1–3). This is one of the main contributions of this work.

Before we finish this section, we make the following two remarks.

Remark 1: We establish the approximation and its theory based on a mesh of the domain. However, the only information the algorithm needs is the location of the points and their neighboring points. Once we have this information, we can use the function values at the center and its neighboring points to approximate the derivatives of the function at the center point. Hence, this algorithm also works for point clouds. The only additional work for a point cloud is to determine the neighboring points of each point by some criterion. For example, we can collect all the points in a ball of some radius centered at the point. This approach extends the local fitting method to high-dimensional problems.

Remark 2: For high-order differential operators, the number of nodes is sometimes not enough to establish a reasonable scheme. For example, for the fourth-order biharmonic operator $\Delta^2$ in $\mathbb{R}^2$, we have to fit the operator using at least fourth-order polynomials. This requires $\rho(2, 4) = 15$ equations and thus about 15 nodes. However, we usually do not have so many neighboring points in a mesh of $\mathbb{R}^2$. In this case, we can mimic the idea of the finite element method and add extra DoFs on the edges or elements. For example, we can add extra nodes in the middle of the edges or at the center of each element to compensate for the lack of DoFs. Alternatively, we can add the neighbors of the neighboring points to increase the DoFs. For example, in Fig. 4, we can approximate $\Delta^2$ at $P_0$ by the function values at $P_1$–$P_5$ together with the neighboring points of $P_1$–$P_5$. This is similar to the compact finite-difference scheme.

B. Algorithm of Hybrid PINN
In this section, we summarize the algorithm of hybrid PINN and give some analysis of it. Once the differential operator has been approximated, we may combine it with the neural network to solve the PDEs.
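The local fitting step behind Theorem 1 and (22) can be illustrated with a small sketch. It assumes a constant-coefficient operator given as a dictionary that maps each multi-index $\beta$ to the coefficient of $\partial^\beta$ (a hypothetical format), enforces exactness only on monomials of total degree at most $n$ (the hard constraints; the weighted soft constraints of (16) are omitted), and solves the resulting system with `numpy.linalg.lstsq`. The helper names `multi_indices` and `local_fit_weights` are illustrative. The sketch reproduces the 5-point Laplacian, the stencil of (22) with $\sum_{i=0}^{m}|w_i| = 4/h^2$, and the observed $O(h^2)$ rate of (22).

```python
import itertools
import math

import numpy as np


def multi_indices(d, k):
    """All d-dimensional multi-indices alpha with |alpha| <= k; there are rho(d, k) = C(d + k, k)."""
    return [a for a in itertools.product(range(k + 1), repeat=d) if sum(a) <= k]


def local_fit_weights(center, neighbors, op, degree):
    """Weights w_0..w_m with sum_i w_i p(P_i) = L[p](P_0) for every monomial p of degree <= `degree`.

    `op` maps a multi-index beta to the constant coefficient of d^beta in L (hypothetical format).
    """
    center = np.asarray(center, dtype=float)
    # Local coordinates x_i = P_i - P_0, so that P_0 sits at the origin.
    pts = np.vstack([center, np.asarray(neighbors, dtype=float)]) - center
    alphas = multi_indices(center.size, degree)
    # One row per monomial x^alpha, one column per stencil point.
    M = np.array([[np.prod(p ** np.array(a)) for p in pts] for a in alphas])
    # At the origin, L[x^alpha] = c_alpha * alpha! (zero if d^alpha does not appear in L).
    b = np.array([op.get(a, 0.0) * math.prod(math.factorial(ai) for ai in a) for a in alphas])
    w, *_ = np.linalg.lstsq(M, b, rcond=None)
    return w


# 5-point Laplacian on a cross-shaped stencil with spacing h.
h = 0.1
cross = h * np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
w5 = local_fit_weights([0.0, 0.0], cross, {(2, 0): 1.0, (0, 2): 1.0}, degree=2)
print(np.round(w5 * h**2, 6))  # [-4.  1.  1.  1.  1.]


def second_derivative_error(h, x0=0.3):
    """Error of the fitted 3-point stencil of (22) applied to f = sin at x0."""
    w = local_fit_weights([x0], [[x0 + h], [x0 - h]], {(2,): 1.0}, degree=2)
    approx = w @ np.sin([x0, x0 + h, x0 - h])
    return abs(approx + np.sin(x0))  # exact f''(x0) = -sin(x0)


rate = np.log2(second_derivative_error(0.1) / second_derivative_error(0.05))
print(f"observed rate of (22): {rate:.2f}")  # about 2, better than the O(h) bound of (19)
```

Solving by least squares rather than by a square solve keeps the same code usable when a stencil has more points than constraints; in that case `lstsq` returns the minimum-norm weights.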

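Remark 1 and the row-by-row assembly of $\mathbf{A}$ can be sketched for scattered points as follows. This is a minimal illustration under stated assumptions, not the paper's code: neighbors are collected in a ball of a hypothetical radius with `scipy.spatial.cKDTree.query_ball_point`, only the hard constraints (exactness on quadratics, enough for the Laplacian) are imposed, and the remaining DoFs are fixed by the minimum-norm solution of `numpy.linalg.lstsq` instead of the soft constraints in (16). Each row stores one stencil, so the number of nonzeros grows linearly with the number of points.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree

# Exponents (a, b) of the monomials x^a y^b with a + b <= 2, and the Laplacian of each at the origin.
EXPONENTS = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
LAPLACIAN_AT_ORIGIN = np.array([0.0, 0.0, 0.0, 2.0, 0.0, 2.0])


def assemble_laplacian(points, radius):
    """Sparse A with (A u)_i approximating the Laplacian of u at points[i], one stencil per row."""
    tree = cKDTree(points)
    rows, cols, vals = [], [], []
    for i, p in enumerate(points):
        nbrs = tree.query_ball_point(p, r=radius)  # indices inside the ball, including i itself
        local = points[nbrs] - p  # shift so that P_0 = points[i] is the origin
        M = np.array([local[:, 0] ** a * local[:, 1] ** b for a, b in EXPONENTS])
        w, *_ = np.linalg.lstsq(M, LAPLACIAN_AT_ORIGIN, rcond=None)
        rows.extend([i] * len(nbrs))
        cols.extend(nbrs)
        vals.extend(w)
    n = len(points)
    return csr_matrix((vals, (rows, cols)), shape=(n, n))


# Hypothetical point cloud: a jittered 20 x 20 grid on [0, 1]^2.
rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 20)
spacing = g[1] - g[0]
points = np.array([(x, y) for x in g for y in g]) + rng.uniform(-0.2, 0.2, (400, 2)) * spacing
A = assemble_laplacian(points, radius=3.0 * spacing)

u = points[:, 0] ** 2 + points[:, 1] ** 2  # its Laplacian is exactly 4
print(np.abs(A @ u - 4.0).max())  # round-off level, since quadratics are fitted exactly
print(A.nnz / len(points))  # average stencil size, independent of the number of points
```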

Fig. 5. Hybrid PINNs’ architecture diagram.

Suppose that we are solving the following PDE problem:
$$L[u](\mathbf{x}) = f(\mathbf{x}) \text{ in } \Omega, \qquad (23)$$
$$B[u](\mathbf{x}) = 0 \text{ on } \partial\Omega. \qquad (24)$$
The algorithm of hybrid PINN is summarized in Algorithm 1.

Algorithm 1 Framework of Hybrid PINN
Input:
The set of mesh points $D_u = \{\mathbf{x}_i\}_{i=1}^{N_u}$;
The set of boundary points $D_b = \{\mathbf{x}_i\}_{i=1}^{N_b}$;
The set of polynomials $D_p = \{p_i(\mathbf{x})\}_{i=1}^{k}$ that satisfies Theorem 1;
Establish a neural network $u_h$ to approximate the solution. The input of this neural network is $\mathbf{x}$, and the output is the solution value at $\mathbf{x}$.
Output:
1: Get the approximation matrix $\mathbf{A}$ by solving (16) with the polynomial set $D_p$.
2: Train the neural network $u_h$ with the loss function
$$\mathrm{MSE} = \frac{1}{N_u} \sum_{\mathbf{x}_i \in D_u} \big| (\mathbf{A}\mathbf{u}_h)_i - f(\mathbf{x}_i) \big|^2 + \frac{1}{N_b} \sum_{\mathbf{x}_i \in D_b} \big| B[u_h](\mathbf{x}_i) \big|^2,$$
where $\mathbf{u}_h$ is the column vector of the values of $u_h$ at $\mathbf{x}_i$, $i = 1, \ldots, N_u$.
3: return $u_h$, which can predict the solution of (23)–(24) at any point in $\Omega$.

The architecture of the hybrid PINN is shown in Fig. 5.

Remark 3: According to Theorem 1, $D_p$ in Algorithm 1 can be chosen as $P_d^n$. This is a good choice for low-order PDEs, such as Poisson's equation. For high-order equations in high-dimensional cases, the DoFs can be increased as mentioned in Remark 2. Alternatively, other types of basis functions can be used to reduce the workload, which is future work.

Remark 4: As mentioned in Section I, we use the approximated matrix instead of AD to calculate the residual of the PDE. We by no means discard AD entirely. As we will see in Section IV, we can combine AD and the approximated matrix in one problem so that we enjoy both flexibility and accuracy.

Note that the boundary value operator may include derivatives on the boundary, such as outer normal derivatives. In this case, we can use AD to compute the derivatives on the boundary. The numerical experiments below show that this is a good way to deal with the boundary value operator.

Once a new algorithm for numerical PDEs is developed, a natural question from mathematicians is: what is the advantage of the new algorithm? Compared with numerical methods and PINN, the hybrid PINN inherits the high accuracy of numerical methods and the flexibility of PINN. This means that we do not need to write down the weak formulation or form the stiffness and mass matrices, and the accuracy of the result is guaranteed by Theorem 1. Similar to PINN, the hybrid PINN can also solve inverse problems easily, which require a lot of work in numerical methods. The simulation time of hybrid PINN is much shorter than that of PINN. Another advantage of hybrid PINN is that as long as the geometry and the PDE of the problem are fixed, we can always use the same approximated operator $\mathbf{A}$ to solve the problem. This saves a lot of time because the numerical experiments below show that training the solution takes little time once we have $\mathbf{A}$. This property suggests that the hybrid PINN is suitable for transfer learning.

Another potential question from readers is: if we already have $\mathbf{A}$, we can solve the PDE by solving a linear system or a least-squares problem, so why do we need the neural network? For one thing, although we show that $\mathbf{A}$ is a reasonable approximation of $L$ and the numerical experiments show that $\mathbf{A}$ is invertible in most cases, there is no guarantee that $\mathbf{A}$ is invertible in all cases. For another, even if $\mathbf{A}$ is invertible or we find a generalized inverse of $\mathbf{A}$, we only obtain the result at the mesh points. For other points in the domain, we have to interpolate. However, the neural network serves naturally as a good interpolator. Once we train the neural network, we can predict the solution directly at any point in the domain without additional effort. Some problems, for example, inverse problems, need additional values in the interior of the domain. In this case, hybrid PINN can impose the restriction directly, but numerical methods need special handling when the additional points are not on the grid.

At the end of this section, we point out that although PINN and its variations are not perfect, as mentioned in Section II, they have been proven effective in many articles, such as [8]–[10]. In most cases, Xavier initialization [30] and other techniques, such as [31] and [32], may avoid strange solutions such as those shown in Section II. However, there is no theoretical guarantee. On the other hand, the hybrid PINN will not generate those strange solutions because it inherits the properties of numerical methods by Theorem 1. In fact, in PINN, AD gives an accurate derivative of the neural network with respect to its inputs, while the local fitting method gives an approximation of the differential operator itself. For example, in test 1, AD can calculate $-\Delta u_h$, where $u_h$ denotes the neural network. However, our local fitting method gives an approximation of $-\Delta$. This is the essential difference between hybrid PINN and PINN.

Fig. 6. Sample of regular mesh.

In Section II, we mentioned two issues with PINN, VPINN, and hp-VPINN. One is that the PDE residual based on AD only constrains isolated points, so the solution in between may get out of control. The other is that in VPINN and hp-VPINN the function space of the DNN is out of control, so the integral formulation may not be sufficient to obtain a correct solution. However, hybrid PINN is immune to both of these issues because the PDE is approximated by local fitting, and its truncation error comes with a convergence rate.
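Steps 1 to 3 of Algorithm 1 can be made concrete on a hypothetical 1-D problem, $-u'' = \pi^2\sin(\pi x)$ on $(0, 1)$ with $u(0) = 0$ and $u(1) = 1$ (exact solution $u = \sin(\pi x) + x$). To keep the sketch self-contained and deterministic, the neural network $u_h$ is replaced by a Chebyshev expansion that is linear in its parameters, so minimizing the MSE becomes a linear least-squares problem solved with `numpy.linalg.lstsq` instead of L-BFGS. $\mathbf{A}$ is the 3-point stencil of (22), and $B[u] = u - g$ is a Dirichlet condition. All of these are assumptions for illustration only. The trained surrogate is then evaluated off the mesh, mirroring the interpolation argument above.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


# Hypothetical 1-D model problem: -u'' = f on (0, 1), u(0) = 0, u(1) = 1.
def f(s):
    return np.pi**2 * np.sin(np.pi * s)


def exact(s):
    return np.sin(np.pi * s) + s


N = 40
x = np.linspace(0.0, 1.0, N + 1)  # mesh points
h = x[1] - x[0]
boundary_idx = np.array([0, N])  # D_b
g = np.array([0.0, 1.0])  # boundary data, B[u] = u - g

# Step 1: A approximates -d^2/dx^2 at the interior nodes (3 nonzeros per row, as in (22)).
A = np.zeros((N - 1, N + 1))
for row, i in enumerate(range(1, N)):
    A[row, i - 1 : i + 2] = np.array([-1.0, 2.0, -1.0]) / h**2

# Stand-in for the neural network: u_h(s) = sum_k c_k T_k(2s - 1), linear in c.
DEGREE = 14


def features(s):
    return cheb.chebvander(2.0 * s - 1.0, DEGREE)


V = features(x)  # u_h at the mesh points equals V @ c

# Step 2: MSE = mean |A u_h - f|^2 + mean |u_h - g|^2 over D_b.
# Dividing each block by sqrt(count) turns this MSE into one least-squares problem.
n_res, n_bc = N - 1, boundary_idx.size
lhs = np.vstack([A @ V / np.sqrt(n_res), V[boundary_idx] / np.sqrt(n_bc)])
rhs = np.concatenate([f(x[1:N]) / np.sqrt(n_res), g / np.sqrt(n_bc)])
c, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)

# Step 3: the trained u_h can be evaluated anywhere, including points off the mesh.
x_test = np.linspace(0.0, 1.0, 1001)
err = np.abs(features(x_test) @ c - exact(x_test)).max()
print(f"max error off the mesh: {err:.2e}")  # dominated by the O(h^2) error of the stencil
```

With a real network, the same loss would be minimized by a gradient-based optimizer; the matrix $\mathbf{A}$ is computed once and reused in every loss evaluation.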

IV. NUMERICAL EXPERIMENTS

In this section, we present numerical experiments to verify our algorithm. The numerical experiments show that the hybrid PINN performs better than we expected and that Theorem 1 is conservative. Although we do not prove it, the hybrid PINN can be applied in more general cases, such as quasi-linear operators and surface PDEs. All the tests in this section are run on an Amazon Web Services (AWS) p3.2xlarge instance. Sample code can be found at https://github.com/pentilm/Hybrid-PINN.

Fig. 7. Sample of irregular mesh.

TABLE I
RESULTS OF REGULAR MESH
Since our algorithm is inspired by CNN, we summarize the similarities here so that readers can compare our algorithm with the CNN structure. From the CNN perspective, the convolution kernel is $\mathbf{A}$ in Algorithm 1. However, the kernel is no longer $3 \times 3$, because we are computing a "convolution" on a mesh or graph instead of a Cartesian grid. The padding data are the boundary conditions, instead of the zero padding or same padding of traditional CNNs. We do not have pooling in our algorithm since we do not need to compress our data. The stride is 1 since we need to establish the equation at every point in the mesh. Also, the number of channels is 1 because the solution is usually a scalar.

A. Test for Convergence Rate
In this test, we verify that the hybrid PINN has at least a first-order convergence rate, as shown in Theorem 1. We consider the following problem:
$$-\Delta u(\mathbf{x}) = f(\mathbf{x}) \text{ in } \Omega = (0, 1)^2, \qquad (25)$$
$$u(\mathbf{x}) = u_0(\mathbf{x}) \text{ on } \partial\Omega \qquad (26)$$
where $\mathbf{x} = (x, y)$ and $u_0(\mathbf{x}) = e^{x^2}\sin(y)$. We choose a suitable $f(\mathbf{x})$ so that the exact solution of (25) and (26) is $u_0(\mathbf{x})$. We test the convergence rate on both the regular mesh (as shown in Fig. 6) and the irregular mesh (as shown in Fig. 7). The red points in Figs. 6 and 7 are boundary points. We choose $W_1 = 100$ and $W_2 = 1$ in (16) and use the second-order polynomials in $\mathbb{R}^2$ as the hard constraints and the third-order polynomials as the soft constraints. Once we obtain the matrix $\mathbf{A}$, we use a neural network with 4 hidden layers and 100 neurons per layer to predict the solution. The activation function is $\sin(s)$, and the network is initialized by Xavier initialization. Equation (16) is solved by the trust-region reflective algorithm via scipy.optimize.least_squares with default settings, and the neural network is trained by L-BFGS via TensorFlow 1.13.

We evaluate the result by the following discretized $L^2(\Omega)$ norm:
$$\|u(\mathbf{x})\|_{l^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} |u(\mathbf{x}_i)|^2}$$
where $\{\mathbf{x}_i\}_{i=1}^{N}$ are random points in the domain. We choose $N = 10^5$, and the points are generated by Latin hypercube sampling (LHS). This cross validation is a Monte Carlo estimate of the $L^2(\Omega)$ norm of $u(\mathbf{x})$. In this way, we test not only the solution at the grid points but also the interpolation error arising from the neural network. Note that previous works on PINN only tested the relative error. The discretized $L^2(\Omega)$ norm is stronger than the relative error.

The experiment results are shown in Tables I and II. The discretized $L^2$ error is given by $\|u(\mathbf{x}) - u_h(\mathbf{x})\|_{l^2}$. As we can see, for mesh sizes less than or equal to 0.1, the error decreases much faster than the first order expected from Theorem 1. This means that when the mesh size is halved, the error decreases by more than a factor of two. The comparison of the time to solve for $\mathbf{A}$ and to train the DNN shows that, in hybrid PINN, more time is spent on finding $\mathbf{A}$ for the irregular mesh. This is because we use the CPU to compute $\mathbf{A}$.
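The discretized $L^2(\Omega)$ norm can be computed as sketched below. The Latin hypercube sampler is written out in NumPy (one random point per stratum along each coordinate, with the strata shuffled independently per coordinate) so that the example does not depend on a particular SciPy version; the function names are illustrative. Applied to $u - u_h$, the same function gives the discretized $L^2$ error reported in Tables I and II.

```python
import numpy as np


def latin_hypercube(n, d, rng):
    """n points in [0, 1)^d with exactly one point in each stratum [k/n, (k+1)/n) along every axis."""
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])  # shape (n, d)
    return (strata + rng.uniform(size=(n, d))) / n


def discrete_l2_norm(values):
    """sqrt((1/N) * sum |u(x_i)|^2): a Monte Carlo estimate of the L2 norm on a unit-area domain."""
    return np.sqrt(np.mean(np.abs(values) ** 2))


rng = np.random.default_rng(0)
pts = latin_hypercube(10**5, 2, rng)  # N = 10^5 cross-validation points in (0, 1)^2

u0 = np.exp(pts[:, 0] ** 2) * np.sin(pts[:, 1])  # exact solution of test 1
print(discrete_l2_norm(u0))
# For a trained model u_h, the discretized L2 error is discrete_l2_norm(u0 - u_h(pts)).
```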

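The CNN analogy described above can be checked directly on a regular grid. In this sketch (an illustration, not the paper's implementation), $\mathbf{A}$ is taken to be the 5-point Laplacian, which local fitting produces on a uniform cross-shaped stencil. It is assembled row by row as a sparse matrix over all grid values, with one row per interior node. Applying it coincides with a single-channel, stride-1, "valid" convolution of the grid values with a fixed $3 \times 3$ kernel, where the boundary values act as the padding. On an unstructured mesh, the rows of $\mathbf{A}$ no longer share one kernel, which is the difference noted in the text.

```python
import numpy as np
from scipy.signal import convolve2d
from scipy.sparse import lil_matrix

n = 12  # grid points per side, boundary included
h = 1.0 / (n - 1)
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
U = np.exp(X**2) * np.sin(Y)  # values on the whole grid; boundary values act as padding

# Row-by-row assembly: one row per interior node, columns index all n*n nodes in C order.
A = lil_matrix(((n - 2) ** 2, n * n))
row = 0
for i in range(1, n - 1):
    for j in range(1, n - 1):
        A[row, i * n + j] = -4.0 / h**2
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            A[row, (i + di) * n + (j + dj)] = 1.0 / h**2
        row += 1
AU = (A.tocsr() @ U.ravel()).reshape(n - 2, n - 2)

# The same operation as a CNN layer: one channel, stride 1, fixed 3 x 3 kernel, "valid" mode.
kernel = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]) / h**2
conv = convolve2d(U, kernel, mode="valid")
print(np.abs(AU - conv).max())  # agreement up to round-off
```

Because this kernel is symmetric under a 180-degree rotation, convolution and cross-correlation (what CNN layers actually compute) give the same result here.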

TABLE II
RESULTS OF IRREGULAR MESH

Fig. 8. Solution for test 1.

TABLE III
ERRORS WITH DIFFERENT DNN'S STRUCTURES ON REGULAR MESH

If a GPU-based least-squares solver for sparse matrices were available, the computational time for $\mathbf{A}$ would decrease dramatically. Even on the finest mesh in our experiments, we have only 1600 grid points but obtain a discretized $L^2$ error of around $10^{-4}$–$10^{-5}$. This is much more efficient than the PINN result shown in [7], where a set of about 20 000 points gives a relative $L^2$ error of around $10^{-3}$.

From these results, we can see that the regular mesh is more efficient than the irregular mesh. We, therefore, suggest using a regular mesh if one is available.

The scatter plot of the solution at the cross-validation points, based on the regular mesh with mesh size 0.025, is shown in Fig. 8. We also study the relationship between the accuracy and the structure of the DNN. The number of layers and the number of neurons per layer have been varied, and the errors are shown in Table III. As shown in the table, although there are a few exceptions, the accuracy generally increases as the number of layers and neurons increases. However, a larger DNN architecture requires more computational power. To balance the workload and accuracy, we adopt the DNN structure mentioned above for the following examples.

B. Test for Variable Coefficients PDEs
In Theorem 1, we assume that the operator $L$ is linear. In this test, we show that the hybrid PINN also works for quasi-linear operators. Consider the following problem:
$$-\big( c_1(\mathbf{x})\,\partial_{xx} u(\mathbf{x}) + c_2(\mathbf{x})\,\partial_{yy} u(\mathbf{x}) \big) = f(\mathbf{x}) \text{ in } \Omega = (0, 1)^2 \qquad (27)$$
$$u(\mathbf{x}) = u_0(\mathbf{x}) \text{ on } \partial\Omega \qquad (28)$$
where $c_1(\mathbf{x}) = 2 + \cos(x + y)$, $c_2(\mathbf{x}) = 2 + \sin(x + y)$, and $u_0(\mathbf{x}) = e^{x^2}\sin(y)$. A suitable $f(\mathbf{x})$ is put on the right-hand side of (27) so that the exact solution of (27) and (28) is $u_0(\mathbf{x})$. We solve this problem on a mesh with size 0.05. All the other parameters are the same as in test 1. The discretized $L^2$ error is 1.313956e-04.

The mathematical proof of Theorem 1 for the variable coefficients case requires Taylor expansions of the coefficients, which lead to many extra terms. Although the proof is potentially cumbersome, the reason why it works is simple: as with the linear operator, the quasi-linear operator can also be approximated by a matrix. This is also the case in traditional numerical methods, such as finite difference, finite volume, and finite-element methods.

C. Test for 3-D Case
In this test, we apply the hybrid PINN to the 3-D Helmholtz equation
$$\Delta u(\mathbf{x}) + k^2 u(\mathbf{x}) = f(\mathbf{x}) \text{ in } \Omega = (0, 1)^3, \qquad (29)$$
$$u(\mathbf{x}) = u_0(\mathbf{x}) \text{ on } \partial\Omega \qquad (30)$$
where $\mathbf{x} = (x, y, z)$, $k = 5$ is the wavenumber, and
$$u_0(\mathbf{x}) = \big( 0.1\sin(2\pi x) + \tanh(10x) \big) \sin(2\pi y) \sin(2\pi z).$$
We choose a suitable $f(\mathbf{x})$ so that $u_0(\mathbf{x})$ is the exact solution of (29) and (30). This $u_0$ is inspired by [21]. The solution changes rapidly between different periods. We solve this problem on a mesh with size 0.06 and keep all the other parameters the same. The discretized $L^2$ error is 2.041547e-01. Slices of the 3-D solution at $x = 0.8$, $y = 0.8$, and $z = 0.2$ are shown in Fig. 9. This experiment also shows that the hybrid PINN can handle solutions with poor smoothness.

Fig. 9. Solution for test 3.

D. Test for Complex Geometry
In this test, we re-solve (25) and (26) with
$$u_0(\mathbf{x}) = \exp\Big( \Big( \frac{y}{1000} \Big)^2 \Big) \sin\Big( \frac{x}{1000} \Big)$$
to test the hybrid PINN on a complex geometry. $\Omega$ comes from Wolfram Mathematica's website:
https://www.wolfram.com/language/11/differential-eigensystems/analyze-the-acoustic-eigenmodes-of-a-car.html?product=mathematica
The data set describes the dimensions of a real mini car.
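For the 3-D Helmholtz operator in (29) of test 3, a sparse approximation of $\Delta + k^2$ on a uniform grid can be sketched with Kronecker products. This assumes the 7-point stencil (the 3-D analogue of the 5-point stencil that local fitting reproduces on a regular grid) and zero boundary values, whereas test 3 uses local fitting on its own mesh of size 0.06; the function name `helmholtz_matrix` is hypothetical. The check uses the fact that $\sin(\pi x)\sin(\pi y)\sin(\pi z)$ is an exact eigenvector of this discrete operator, with an eigenvalue that differs from the continuous one by $O(h^2)$.

```python
import numpy as np
import scipy.sparse as sp


def helmholtz_matrix(n, k):
    """Sparse approximation of Delta + k^2 on the n^3 interior nodes of a uniform grid on (0, 1)^3.

    Boundary values are taken as zero, so they drop out of the stencil.
    """
    h = 1.0 / (n + 1)
    d2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2  # 1-D second difference
    eye = sp.identity(n)
    lap = (
        sp.kron(sp.kron(d2, eye), eye)
        + sp.kron(sp.kron(eye, d2), eye)
        + sp.kron(sp.kron(eye, eye), d2)
    )
    return (lap + k**2 * sp.identity(n**3)).tocsr(), h


n, k = 20, 5.0
A, h = helmholtz_matrix(n, k)
s = np.linspace(h, 1.0 - h, n)  # interior nodes
X, Y, Z = np.meshgrid(s, s, s, indexing="ij")
u = (np.sin(np.pi * X) * np.sin(np.pi * Y) * np.sin(np.pi * Z)).ravel()

# Continuous eigenvalue k^2 - 3 pi^2 and its discrete counterpart.
lam = k**2 - 3 * np.pi**2
lam_h = k**2 - 3 * (4.0 / h**2) * np.sin(np.pi * h / 2) ** 2
print(A.nnz / n**3)  # at most 7 nonzeros per row
print(np.abs(A @ u - lam_h * u).max(), abs(lam_h - lam))  # exact eigenvector; O(h^2) eigenvalue gap
```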


Fig. 10. Mesh for the car shape domain.

Fig. 11. Solution for test 4.

Also, the mesh for our simulation is shown in Fig. 10. This mesh has 1929 mesh points. As we can see, the geometry is locally concave, and the area of $\Omega$ is about $2.33117 \times 10^6$. Thus, it is a complex geometry with a large area.

The solution at the mesh points is shown in Fig. 11. As it is hard to generate uniformly distributed random points on $\Omega$ in this case, we calculate the discretized $L^2$ error using only the values at the mesh points, which gives 3.204727. The error is relatively large due to the large area. Note that the mesh size ranges from 27.9 to 104.1, as shown in Fig. 10. Also, the $L^2(\Omega)$ norm of the exact solution is around 19179, which means that the relative error is about 1.671e-4. Thus, this error is acceptable.

E. Test for Surface PDE
In this test, we apply the hybrid PINN to a surface PDE. The only difference is that we replace the polynomial sets $P_d^{k_{\min}}$ and $P_d^{k_{\max}}$ with the sets of polynomials on the surface. We take the unit sphere $S^2 \subset \mathbb{R}^3$ as an example. We consider the following spherical coordinate transform $(r, \theta, \phi) \to (x, y, z) = \mathbf{x}$:
$$x = r\sin(\theta)\cos(\phi), \quad y = r\sin(\theta)\sin(\phi), \quad z = r\cos(\theta)$$
where $r \in [0, \infty)$, $\theta \in [0, \pi]$, and $\phi \in [0, 2\pi)$. Since $S^2$ can be parameterized by $(\theta, \phi)$, we can define $\bar{x} = x/r$, $\bar{y} = y/r$, and $\bar{z} = z/r$ so that the polynomials on $S^2$ are defined in terms of $\bar{\mathbf{x}} = (\bar{x}, \bar{y}, \bar{z})$. We repeat the experiments that we did in [9] and re-solve the following problem:
$$-\Delta_{S^2} u(\mathbf{x}) = f(\mathbf{x}) \text{ on } S^2 \qquad (31)$$
where $\Delta_{S^2}$ is the Laplace–Beltrami operator on $S^2$. We also note two typos in [9]. The first is that $f = 12x_1x_2x_3$ when $u = x_1x_2x_3$. The second is that $\nabla \cdot \mathbf{v} = D_1v_1 + D_2v_2 + D_3v_3$, where $\mathbf{v} = (v_1, v_2, v_3)$ and $\nabla = (D_1, D_2, D_3)$.

We choose the zeroth- and first-order polynomials on $S^2$ as the hard constraints and second-order polynomials as the soft constraints. We use the mesh generated by Meshzoo (https://github.com/nschloe/meshzoo) based on the icosahedron. The sample mesh used in this experiment is shown in Fig. 12.

Fig. 12. Mesh for the surface PDEs.

We choose $u_0(\mathbf{x}) = x\sin(y) + z$ as the exact solution and put the corresponding $f(\mathbf{x})$ on the right-hand side to balance the equation. The results are summarized in Table IV, and the solution is shown in Fig. 13. Since it is hard to calculate the actual mesh size (geodesic distance) on the surface, we instead show the number of mesh points. Although we did not prove the theorem on the surface, the experiments show that the hybrid PINN still has an almost first-order convergence rate on $S^2$ with respect to the number of mesh points. This is equivalent to a second-order convergence rate with respect to the mesh size on $S^2$. The mathematical proof of the convergence rate on the surface is left as future work.

Fig. 13. Solution for test 5.

For comparison, we redo the experiment using PINN as introduced by the author of [9]. We keep the sample points the same as the mesh points shown in Table IV and calculate the same $L^2(\Omega)$ norm. The result is summarized in Table V. By comparing Tables IV and V, we can see that the hybrid PINN has tremendous advantages in both computational time and accuracy.

TABLE IV
RESULTS FOR (31) BY HYBRID PINN

TABLE V
COMPARISON TO PINN

It is not surprising that the training time for $\mathbf{A}$ increases as the number of points increases. However, in Table IV, when the number of mesh points increases from 4002 to 16002, the time for $\mathbf{A}$ changes suddenly. This is probably because the stopping criteria of scipy.optimize.least_squares consist of the change of the cost function, the change of the independent variables, the norm of the gradient, and so on. Thus, in a real problem, it is hard to say which criterion will be satisfied first. As a result, the relationship between the training time for $\mathbf{A}$ and the number of points may not be polynomial, which explains the big change in the table. Nevertheless, the errors show that all the solutions are reasonable.

The PINN result in Table V does not show this phenomenon because PINN uses AD instead of the discretized operator proposed in this article.

We point out that, even though the mesh elements and edges lie off the surface, they do not introduce extra error in hybrid PINN. This is because we only use the connectivity among the points on the surface and never use any geometric information of the mesh, which is similar to the point made in Remark 1.

F. Test for Inverse Problem
In this test, we show that the hybrid PINN can solve the inverse problem, which is the model identification problem in [7]. Consider the following problem [33]:
$$-a\Delta u(\mathbf{x}) + u(\mathbf{x}) = f(\mathbf{x}) \text{ in } \Omega = (0, 1)^2, \qquad (32)$$
$$u(\mathbf{x}) = u_0(\mathbf{x}) \text{ on } \partial\Omega, \qquad (33)$$
$$\frac{\partial u}{\partial n}(\mathbf{x}) = \frac{\partial u_0}{\partial n}(\mathbf{x}) \text{ on } \partial\Omega \qquad (34)$$
where $a$ is a constant to be determined. We choose $u_0(\mathbf{x}) = e^{x^2}\sin(y)$ and $f(\mathbf{x}) = -e^{x^2}(1 + 8x^2)\sin(y)$. Thus, the exact value of $a$ is 2. In this case, we use the local fitting technique to find the approximation matrix $\mathbf{A}$ for $-\Delta$, and then, in the neural network, the left-hand side of (32) is given by $(a\mathbf{A} + \mathbf{I})\mathbf{u}$, where $\mathbf{I}$ is the identity matrix. As mentioned above, the derivative on the boundary is computed by AD. We keep all the other parameters the same as in test 1 and run the simulation on the regular mesh with size 0.05. The time to train the neural network is 19.4567 s, the prediction for $a$ is 1.985, and the discretized $L^2$ error of the solution is 5.607821e-06.

This experiment shows that hybrid PINN can solve inverse problems as PINN does, which is a significant advantage compared with traditional numerical methods.

V. CONCLUSION AND FUTURE WORK

A. Conclusion
In this article, we analyze PINN and some of its variations. Based on the case studies, we put forward the hybrid PINN. This algorithm is inspired by CNN and the finite volume method. A sparse matrix is used to approximate the linear and quasi-linear operators. We develop a local fitting method to find a reasonable sparse matrix. Theorem 1 shows that the local fitting method gives a good approximation of the differential operator with a convergence rate. The numerical experiments verify our algorithm. We summarize the features of hybrid PINN and some future work below, although some of them have already been mentioned in this article.

B. Discussion and Future Work
The hybrid PINN combines the advantages of traditional numerical methods and PINN. It is flexible and can be used for many problems, such as surface PDEs and inverse problems. Also, it has higher accuracy than PINN and comes with a convergence rate. As mentioned in this article, as long as the PDE and the geometry are fixed, we can always use the same sparse matrix to approximate the differential operator, even if the boundary condition and the right-hand side term change. This makes hybrid PINN suitable for transfer learning and parallel learning. Because the hybrid PINN inherits the properties of numerical methods, we can expect that numerical techniques such as superconvergence and extrapolation [24] can be applied here.

On the other hand, although hybrid PINN can solve problems whose geometry is given by a point cloud, it needs an additional step to collect the neighboring points' information. Basically, this takes a loop over all the points in the point cloud. In this respect, PINN is more convenient. In addition, Theorem 1 guarantees the accuracy of the local fitting method, but there is no guarantee of stability. For parabolic and hyperbolic equations, which are stability-sensitive, this problem is non-negligible. What is more, the proof of Theorem 1 is for the ideal case in which (15) is exact for all polynomials in $P_d^{k_{\min}}$. However, in our algorithm, (15) is solved through (16). This leads to an error, although the numerical experiments show that this error does not cause issues in practice. Finally, this method cannot be applied to nonlinear operators because it is unreasonable to use a finite-dimensional linear operator (matrix) to approximate a nonlinear operator.
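The inverse problem of test F can be illustrated by a much simpler linear sketch. Assuming, unlike the paper, that $u$ is already known at the mesh nodes (the paper instead trains $u_h$ and $a$ jointly, with AD for the boundary derivative), the unknown $a$ in $(a\mathbf{A} + \mathbf{I})\mathbf{u} = \mathbf{f}$ is the one-dimensional least-squares solution $a = \langle \mathbf{A}\mathbf{u}, \mathbf{f} - \mathbf{u} \rangle / \langle \mathbf{A}\mathbf{u}, \mathbf{A}\mathbf{u} \rangle$. Here $\mathbf{A}$ is taken to be the 5-point approximation of $-\Delta$ on the interior nodes of the regular mesh with size 0.05; the function name is hypothetical.

```python
import numpy as np


def neg_laplacian_interior(U, h):
    """5-point approximation of -Laplacian(u) at the interior nodes of a grid of values U."""
    return -(U[2:, 1:-1] + U[:-2, 1:-1] + U[1:-1, 2:] + U[1:-1, :-2] - 4.0 * U[1:-1, 1:-1]) / h**2


h = 0.05
s = np.linspace(0.0, 1.0, int(round(1.0 / h)) + 1)
X, Y = np.meshgrid(s, s, indexing="ij")
U = np.exp(X**2) * np.sin(Y)  # u_0 from (33); the exact a is 2
F = -np.exp(X**2) * (1.0 + 8.0 * X**2) * np.sin(Y)

Au = neg_laplacian_interior(U, h).ravel()  # plays the role of A u
rhs = (F - U)[1:-1, 1:-1].ravel()  # a * (A u) = f - u at the interior nodes
a_hat = Au @ rhs / (Au @ Au)
print(f"estimated a = {a_hat:.4f}")  # close to 2; the gap comes from the O(h^2) stencil error
```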

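The first typo correction for [9] in test E can be checked numerically. The sketch below evaluates the Laplace–Beltrami operator in standard spherical coordinates ($\theta$ the polar angle in $(0, \pi)$, $\phi$ the azimuth), $\Delta_{S^2}u = \frac{1}{\sin\theta}\partial_\theta(\sin\theta\,\partial_\theta u) + \frac{1}{\sin^2\theta}\partial_\phi^2 u$, by central differences at one point. For $u = x_1x_2x_3$, a spherical harmonic of degree 3, the result is $-\Delta_{S^2}u = 3 \cdot 4 \cdot u = 12x_1x_2x_3$. The step size and the test point are arbitrary choices.

```python
import numpy as np


def u_on_sphere(theta, phi):
    """u = x1 * x2 * x3 restricted to the unit sphere."""
    x1 = np.sin(theta) * np.cos(phi)
    x2 = np.sin(theta) * np.sin(phi)
    x3 = np.cos(theta)
    return x1 * x2 * x3


def laplace_beltrami(u, theta, phi, d=1e-4):
    """Central-difference Laplace-Beltrami operator on S^2 at (theta, phi)."""
    u0 = u(theta, phi)
    u_t = (u(theta + d, phi) - u(theta - d, phi)) / (2 * d)
    u_tt = (u(theta + d, phi) - 2 * u0 + u(theta - d, phi)) / d**2
    u_pp = (u(theta, phi + d) - 2 * u0 + u(theta, phi - d)) / d**2
    # (1/sin t) d/dt (sin t du/dt) = u_tt + cot(t) u_t
    return u_tt + np.cos(theta) / np.sin(theta) * u_t + u_pp / np.sin(theta) ** 2


theta, phi = 1.0, 0.7
ratio = -laplace_beltrami(u_on_sphere, theta, phi) / u_on_sphere(theta, phi)
print(f"-Lap_S2(u) / u = {ratio:.6f}")  # approximately 12, so f = 12 x1 x2 x3
```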

We also state some potential future work based on this article. First, as mentioned in this article, there may be some remaining DoFs after imposing the hard constraints. We may utilize those remaining DoFs to control stability and energy or flux conservation, especially for parabolic and hyperbolic equations. The concrete form of those restrictions needs more study. Second, a GPU-based solver for sparse least-squares problems should be developed. Once this has been done, the hybrid PINN will be even faster than shown in Section IV. Third, the proof of Theorem 1 for the variable coefficients and surface cases is future work. The concrete order of $\sum_{i=1}^{m} |w_i|$ in (19) also deserves more effort. Fourth, other than polynomials, we may use other basis functions in the local fitting method, such as Fourier series. Hairer et al. [34] reported a symplectic scheme for the Helmholtz equation that is exact for $\sin(kx)$ for any $k \in \mathbb{Z}$. This implies that we can find the approximated matrix using other bases, and the resulting matrix may possess more desirable properties. Finally, a reasonable way to approximate nonlinear operators is also potential future work.

ACKNOWLEDGMENT
The author would like to thank Dr. Didong Li at Duke University, Durham, NC, USA, for his valuable advice on this article.

REFERENCES
[1] Y. Chen, L. Lu, G. E. Karniadakis, and L. Dal Negro, "Physics-informed neural networks for inverse problems in nano-optics and metamaterials," Opt. Exp., vol. 28, no. 8, pp. 11618–11633, 2020.
[2] J. Berg and K. Nyström, "A unified deep artificial neural network approach to partial differential equations in complex geometries," Neurocomputing, vol. 317, pp. 28–41, Nov. 2018.
[3] E. Weinan and B. Yu, "The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems," Commun. Math. Statist., vol. 6, no. 1, pp. 1–12, 2018.
[4] Y. Khoo, J. Lu, and L. Ying, "Solving for high-dimensional committor functions using artificial neural networks," Res. Math. Sci., vol. 6, no. 1, p. 1, 2019.
[5] E. Samaniego et al., "An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications," Comput. Methods Appl. Mech. Eng., vol. 362, Oct. 2020, Art. no. 112790.
[6] M. H. Hassoun, Fundamentals of Artificial Neural Networks. New York, NY, USA: MIT Press, 1996.
[7] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys., vol. 378, pp. 686–707, Mar. 2019.
[14] L. Yang, X. Meng, and G. Em Karniadakis, "B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data," 2020, arXiv:2003.06097. [Online]. Available: http://arxiv.org/abs/2003.06097
[15] Y. Yang and P. Perdikaris, "Adversarial uncertainty quantification in physics-informed neural networks," J. Comput. Phys., vol. 394, pp. 136–152, Oct. 2019.
[16] Y. Deng et al., "Quantifying fibrinogen-dependent aggregation of red blood cells in type 2 diabetes mellitus," Biophys. J., vol. 119, pp. 900–912, Sep. 2020.
[17] K. Sampani, "Computational fluid dynamics (CFD) estimation of thrombus formation in diabetic retinal microaneurysms (MAs)," Investigative Ophthalmol. Vis. Sci., vol. 61, no. 7, p. 5023, 2020.
[18] K. Shukla, P. Clark Di Leoni, J. Blackshire, D. Sparkman, and G. Em Karniadakis, "Physics-informed neural network for ultrasound nondestructive quantification of surface breaking cracks," 2020, arXiv:2005.03596. [Online]. Available: http://arxiv.org/abs/2005.03596
[19] X. Jin, S. Cai, H. Li, and G. Em Karniadakis, "NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations," 2020, arXiv:2003.06496. [Online]. Available: http://arxiv.org/abs/2003.06496
[20] E. Kharazmi, Z. Zhang, and G. E. Karniadakis, "Variational physics-informed neural networks for solving partial differential equations," 2019, arXiv:1912.00873. [Online]. Available: http://arxiv.org/abs/1912.00873
[21] E. Kharazmi, Z. Zhang, and G. Em Karniadakis, "Hp-VPINNs: Variational physics-informed neural networks with domain decomposition," 2020, arXiv:2003.05385. [Online]. Available: http://arxiv.org/abs/2003.05385
[22] A. Iserles, A First Course in the Numerical Analysis of Differential Equations, vol. 44. Cambridge, U.K.: Cambridge Univ. Press, 2009.
[23] R. Courant and D. Hilbert, Methods of Mathematical Physics: Partial Differential Equations. Hoboken, NJ, USA: Wiley, 2008.
[24] S. Brenner and R. Scott, The Mathematical Theory of Finite Element Methods, vol. 15. New York, NY, USA: Springer-Verlag, 2007.
[25] D. Braess, Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics. Cambridge, U.K.: Cambridge Univ. Press, 2017.
[26] R. Li, Z. Chen, and W. Wu, Generalized Difference Methods for Differential Equations: Numerical Analysis of Finite Volume Methods. Boca Raton, FL, USA: CRC Press, 2000.
[27] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, vol. 12. New York, NY, USA: Springer-Verlag, 2013.
[28] S. Sobolev and V. Vaskevich, The Theory of Cubature Formulas, vol. 415. Dordrecht, The Netherlands: Springer, 2013.
[29] L. C. Evans, Partial Differential Equations, vol. 19. Providence, RI, USA: American Mathematical Society, 2010.
[30] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th Int. Conf. Artif. Intell. Statist., Mar. 2010, pp. 249–256.
[31] S. Wang, Y. Teng, and P. Perdikaris, "Understanding and mitigating gradient pathologies in physics-informed neural networks," 2020, arXiv:2001.04536. [Online]. Available: http://arxiv.org/abs/2001.04536
[32] S. Wang, X. Yu, and P. Perdikaris, "When and why PINNs fail to train: A neural tangent kernel perspective," 2020, arXiv:2007.14527. [Online]. Available: http://arxiv.org/abs/2007.14527
[33] V. Isakov, Inverse Problems for Partial Differential Equations, vol. 127. New York, NY, USA: Springer, 2017.
[34] E. Hairer, C. Lubich, and G. Wanner, Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations,
[8] Z. Mao, A. D. Jagtap, and G. E. Karniadakis, “Physics-informed neural vol. 31. Berlin, Germany: Springer-Verlag, 2016.
networks for high-speed flows,” Comput. Methods Appl. Mech. Eng.,
vol. 360, Feb. 2020, Art. no. 112789.
[9] Z. Fang and J. Zhan, “A physics-informed neural network framework
for PDEs on 3D surfaces: Time independent problems,” IEEE Access,
vol. 8, pp. 26328–26335, 2019.
[10] Z. Fang and J. Zhan, “Deep physical informed neural networks for
metamaterial design,” IEEE Access, vol. 8, pp. 24506–24513, 2019. Zhiwei Fang received the Ph.D. degree in applied
[11] A. D. Jagtap, E. Kharazmi, and G. E. Karniadakis, “Conservative mathematics from the Department of Mathematical
physics-informed neural networks on discrete domains for conservation Sciences, University of Nevada, Las Vegas, Las
laws: Applications to forward and inverse problems,” Comput. Methods Vegas, NV, USA, in 2020.
Appl. Mech. Eng., vol. 365, Oct. 2020, Art. no. 113028. He is currently a Senior Software Engineer with
[12] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, AI & HPC, NVIDIA Corporation, Santa Clara,
“Automatic differentiation in machine learning: A survey,” J. Mach. CA, USA. His research interests include machine
Learn. Res., vol. 18, no. 1, pp. 5595–5637, 2017. learning, high-performance computing, numerical
[13] L. Sun and J.-X. Wang, “Physics-constrained Bayesian neural net- analysis, computational electromagnetic fields, and
work for fluid flow reconstruction with sparse and noisy data,” 2020, uncertainty quantification.
arXiv:2001.05542. [Online]. Available: http://arxiv.org/abs/2001.05542
