Solving a time-periodic p-Laplacian equation using
deep learning
July 31, 2022
Abstract
We take interest in a parabolic time-periodic p-Laplacian equation, parameterized either by a constant exponent p ∈ [1, ∞[ or a variable exponent p(x). We provide a method for solving it using emerging deep learning technologies.
Introduction
Our focus in this work is to provide a numerical solution to a nonlinear parabolic
equation with p-growth conditions and $L^1$ data, modeled as
$$
\begin{cases}
\dfrac{\partial u}{\partial t} - \Delta_p u = f & \text{in } Q_T := \,]0,T[\,\times \Omega,\\[4pt]
u(0,\cdot) = u(T,\cdot) & \text{in } \Omega,\\
u = 0 & \text{on } \Sigma_T := \,]0,T[\,\times \partial\Omega.
\end{cases}
$$
Here Ω is a bounded open subset of $\mathbb{R}^N$ with smooth boundary ∂Ω, T > 0 is the
period, f is a measurable function, periodic in time with period T and belonging
to $L^2(Q_T)$, and $\Delta_p u = \operatorname{div}(|\nabla u|^{p-2}\nabla u)$ is the p-Laplacian operator, where the
exponent p(·) is assumed to be either a constant p ∈ [1, ∞[ or a continuous function
p(x) on $\bar\Omega$ such that $\inf_{x\in\bar\Omega} p(x) > 1$.
The well-posedness of such problems has already been studied in [4], but to our
knowledge no numerical solutions exist yet. This may be due to the strong
non-linearity of the p-Laplacian operator, especially for a variable exponent, and
to the difficulty of establishing a numerical scheme that also takes into account
the time stepping of the time derivative and the periodic boundary condition.
Although these constraints have been studied individually with the classical
numerical methods (mainly finite elements (FEM) and finite differences (FDM)) in
many works such as [7], we chose to exploit the advances being made in
Physics-Informed Neural Networks (PINNs), the leading technology for
solving differential equations using deep learning [9], because their straightforward
implementation allows us to define the problem directly, without resorting
to any discretization or time-stepping. Ultimately, this paper thus serves
as an introduction to these new deep-learning-based differential equation solvers, with the
time-periodic p-Laplacian as an example of their strong points.
1 Deep learning and PINNs
This section briefly introduces the aspects of deep learning and PINNs
that the reader might need to understand the implementation of the numerical
solution to our problem. We recommend [3, 5, 6] for more comprehensive reading on
deep learning.
1.1 Key definitions
• Loss function
For the process of learning, we first choose a suitable activation function
and then define a loss function (also known as an error function, or simply
loss), whose value is optimized (maximized or minimized) by changing the
ANN parameters. It is desirable to have a loss function that is convex with a
continuous first derivative; this simplifies the optimization, which can then
be performed using gradient descent or related algorithms. The most
commonly used one is the squared-sum error function, defined as
$$E = \frac{1}{2}\sum_{n=1}^{N} \| f(x_n; w, b) - t_n \|^2,$$
where N is the number of samples, $x_n$ is the n-th input sample, $t_n$ its target output,
and $f(\cdot\,; w, b)$ and $\|\cdot\|$ respectively denote the ANN output and the $L^2$ norm.
The optimal values of the parameters, $\tilde{w}$ and $\tilde{b}$, are obtained by solving
an optimization problem.
A smooth and convex loss function can be optimized using the gradient opti-
mization method provided that the derivative of the loss function is known.
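As a concrete illustration, the squared-sum error above can be computed in a few lines. This is a minimal sketch with a toy linear model; the arrays and the helper name `squared_sum_error` are illustrative, not part of the paper's setup:

```python
import numpy as np

# squared-sum error E = 1/2 * sum_n ||f(x_n; w, b) - t_n||^2
def squared_sum_error(pred, target):
    return 0.5 * np.sum((pred - target) ** 2)

# toy example: a linear "network" f(x; w, b) = w * x + b
w, b = 2.0, 1.0
x = np.array([0.0, 1.0, 2.0])
t = np.array([1.0, 2.0, 6.0])          # targets
pred = w * x + b                       # [1.0, 3.0, 5.0]
E = squared_sum_error(pred, t)
print(E)                               # 0.5 * (0 + 1 + 1) = 1.0
```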
• Backpropagation
Backpropagation is the application of the chain rule¹ for derivatives to an ANN.
In backpropagation, information first flows from the input to the output, and
then in the reverse direction. The difference between the predicted output
and the actual output (the error) is determined; this error is then differentiated
with respect to all the weights and biases of the network, and the parameters
are adjusted accordingly.
In fact, for each training set we compute the prediction and its associated loss,
and sum up the losses to compute the final error. Then we use the backpropagation
algorithm to propagate the error in order to compute the partial derivatives
$\partial E/\partial w$ and $\partial E/\partial b$ of the loss function E for all weights w and biases b.
¹ The chain rule is a formula for computing the derivative of a composite function. That is, if f and
g are differentiable functions, then the chain rule expresses the derivative of their composite $f \circ g$
in terms of the derivatives of f and g as $(f \circ g)' = (f' \circ g) \cdot g'$.
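The forward/backward passes above can be made concrete for a single linear neuron. This is a sketch under simplifying assumptions (one parameterized neuron, one sample); the helper `grads` is an illustrative name:

```python
import numpy as np

# backpropagation for one linear neuron f(x; w, b) = w*x + b with
# loss E = 1/2 * (f(x) - t)^2; the chain rule gives
#   dE/dw = (f(x) - t) * x   and   dE/db = (f(x) - t)
def grads(w, b, x, t):
    err = (w * x + b) - t       # forward pass, then error
    return err * x, err         # backward pass: dE/dw, dE/db

w, b, x, t = 0.5, -0.2, 3.0, 1.0
gw, gb = grads(w, b, x, t)

# sanity check against a central finite difference
E = lambda w, b: 0.5 * ((w * x + b) - t) ** 2
eps = 1e-6
gw_fd = (E(w + eps, b) - E(w - eps, b)) / (2 * eps)
print(abs(gw - gw_fd) < 1e-6)   # True
```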
• Optimization
Once all the derivatives are computed, we update our parameters using a
chosen optimization technique such as stochastic gradient descent (SGD). We then
iterate the prediction (forward pass), the backpropagation of errors (backward
pass), and the optimization until convergence, eventually finding a local minimum
low enough to ensure good predictions. Even when the chosen surrogate loss
function of a neural network is non-convex, gradient descent works well in
practice. Gradient descent is one of the most popular optimization algorithms
and by far the most common way to optimize neural networks. Every deep learning
library contains implementations of various algorithms that build on gradient
descent (e.g. Lasagne, Caffe, and Keras). These algorithms, however, are often
used as black-box optimizers, as practical explanations of their strengths and
weaknesses are hard to come by. Other optimization algorithms include Adam
and limited-memory BFGS (L-BFGS).
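The update loop can be sketched in one-parameter form. A minimal example on a convex quadratic, with an illustrative learning rate and iteration count:

```python
# plain gradient descent on the convex loss E(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3)
w = 0.0            # initial parameter
lr = 0.1           # learning rate
for _ in range(200):
    w -= lr * 2 * (w - 3)      # update: w <- w - lr * dE/dw
print(round(w, 6))             # converges to the minimizer w = 3.0
```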
• Overfitting
A common problem encountered in the process of learning is overfitting. It
generally occurs when the learning is performed for too long, and especially
when the training set is too small to evenly represent all types of patterns from
the domain of possible network inputs. In such a case the learning may adjust
the network to random features present in the training data. Overfitting is
observed during the learning process when the network's predictive performance
keeps improving on the training set while worsening on previously unseen
test data. To combat this issue, the labeled data is split into a training and
a validation set. The main reason to use a validation set is that it
shows the error rates on data independent from the data we are training
on. A study by Guyon suggests that the optimal ratio between the sizes of the
training and validation sets depends on the number of recognized classes
and the complexity of the class features; an estimation of feature complexity is,
however, quite cumbersome. While learning, the performance of the ANN is
regularly examined on the validation set. When the errors on the validation
data reach a stopping point, the learning process is stopped and the network is
considered trained.
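The train/validation split described above can be done in a few lines. A minimal sketch; the 80/20 ratio and sample count are illustrative, not prescribed by the text:

```python
import numpy as np

# split labeled data into disjoint training and validation index sets
rng = np.random.default_rng(0)
n_samples = 100
idx = rng.permutation(n_samples)
train_idx, val_idx = idx[:80], idx[80:]
print(len(train_idx), len(val_idx))   # 80 20
```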
• Hyperparameters
Hyperparameters are variables that we need to set before applying a learning
algorithm to a dataset.
– Number of hidden layers: adding more hidden layers of neurons generally
improves accuracy, up to a certain limit which can differ depending
on the problem.
– Dropout: what percentage of neurons should be randomly “killed” during
each epoch to prevent overfitting.
– Neural network activation function: which function should be used
to process the inputs flowing into each neuron. The activation function
can impact the network's ability to converge and learn for different ranges
of input values, and also its training speed.
– Weight initializer: it is necessary to set initial weights for the first forward
pass. Two basic options are to set the weights to zero or to randomize
them. However, this can result in a vanishing or exploding gradient,
which will make it difficult to train the model. To mitigate this problem,
a heuristic² can be used to determine the weights. A common heuristic
for the tanh activation is Xavier (Glorot uniform) initialization.
– Learning rate: how fast the backpropagation algorithm performs gradient
descent. A higher learning rate makes the network train faster but
might overshoot the minimum of the loss function.
– Number of epochs and batch size: these parameters determine how
samples are fed to the model for training. An epoch is one pass of all
samples through the model (forward pass) followed by backpropagation
(backward pass) to update the weights. If the epoch cannot be run all at
once, due to the size of the sample or the complexity of the network, it is
split into batches, and the epoch is run in two or more iterations. The
number of epochs and the batch size can significantly affect model fit.
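The epoch/batch split described in the last item can be sketched directly. The sample count and batch size here are hypothetical:

```python
import numpy as np

# one epoch passes every sample through the model once; when that cannot
# be done in a single pass, the epoch is split into batches
n_samples, batch_size = 10, 4
indices = np.arange(n_samples)
batches = [indices[i:i + batch_size]
           for i in range(0, n_samples, batch_size)]
print([len(b) for b in batches])   # [4, 4, 2]: three iterations per epoch
```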
1.2 Physics Informed Neural Networks (PINNs)
In this subsection, we briefly summarize the implementation of PINNs. For further
reading, we recommend [10, 11, 2]. This subsection is inspired by [1], which also
contains the implementation in Python.
One-Dimensional Advection-Diffusion Equation
The advection-diffusion equation is a combination of the diffusion and convection
(advection) equations, and describes physical phenomena where particles, energy,
or other physical quantities are transferred inside a physical system due to two
processes: diffusion and convection. Depending on context, the same equation can
be called the convection-diffusion equation, drift-diffusion equation, or (generic)
scalar transport equation. In one dimension, the simplified transient form of the
equation can be written as
$$\frac{\partial u}{\partial t} = D\,\frac{\partial^2 u}{\partial x^2} - V\,\frac{\partial u}{\partial x} + S,$$
where
- u = u(t, x) is the quantity of interest (e.g., pollutant concentration for mass
transfer, temperature for heat transfer).
- D is the diffusivity (also called diffusion coefficient), such as mass diffusivity
for particle motion or thermal diffusivity for heat transport.
² A formula tied to the number of neuron layers.
- V is the velocity magnitude that the quantity is moving with. In general, it is
a function of time and space, but for simplicity we assume it is a constant value.
- S = S(t, x) describes sources or sinks of the quantity u. For example, for
chemical species, S > 0 means that a chemical reaction is creating more of the
species.
Quick Theory on Equation-informed Deep Learning
To address the above problem, we consider constructing a deep neural network that
returns predictions constrained by the parametrized partial differential equation in
the following form
$$u_t - D u_{xx} + V u_x - S = 0,$$
where u(t, x) is represented by a deep neural network parametrized by a set of
parameters θ, i.e., $u(t, x) = f_\theta(t, x)$, where x is a vector of space coordinates and t
is the time coordinate. As neural networks are differentiable representations, this
construction defines a so-called equation-informed neural network that corresponds
to the PDE residual, i.e.,
$$r_\theta(t, x) := \frac{\partial}{\partial t} f_\theta(t, x) - D\,\frac{\partial^2}{\partial x^2} f_\theta(t, x) + V\,\frac{\partial}{\partial x} f_\theta(t, x) - S(t, x).$$
The resulting training procedure allows us to recover the shared network
parameters θ using a few scattered observations of u(t, x), namely
$$\{(t_i, x_i),\, u_i\}, \quad i = 1, \dots, N_u,$$
along with a larger number of residual points,
$$\{(t_i, x_i),\, r_i\}, \quad i = 1, \dots, N_r,$$
that aim to penalize the PDE residual at a finite set of $N_r$ residual points. The
data for the residuals are typically zero (i.e., $r_i = 0$), or they may correspond to
external forcing terms evaluated at the corresponding locations, e.g. $S(t_i, x_i),\ i = 1, \dots, N_r$.
The resulting optimization problem can be effectively solved using standard
stochastic gradient descent, without necessitating any elaborate constrained
optimization techniques, simply by minimizing the composite loss function
$$L(\theta) = \frac{1}{N_u}\sum_{i=1}^{N_u} \|f_\theta(t_i, x_i) - u_i\|^2 + \frac{1}{N_r}\sum_{i=1}^{N_r} \|r_\theta(t_i, x_i) - r_i\|^2,$$
where the required gradients ∂L/∂θ can be readily obtained using automatic
differentiation.
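Evaluating the composite loss above is straightforward once the network outputs at the sampled points are available. A minimal sketch where the arrays are illustrative stand-ins for the network predictions:

```python
import numpy as np

# composite PINN loss: data misfit plus PDE-residual penalty
f_pred = np.array([0.9, 2.1])            # f_theta(t_i, x_i), N_u = 2
u_obs = np.array([1.0, 2.0])             # observed u_i
r_pred = np.array([0.05, -0.05, 0.0])    # r_theta(t_i, x_i), N_r = 3
r_obs = np.zeros(3)                      # residual targets, typically zero

loss = np.mean((f_pred - u_obs) ** 2) + np.mean((r_pred - r_obs) ** 2)
print(loss)
```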
Figure 1: The architecture of PINNs
Application as a PDE solver
Numerical solution of a PDE requires the imposition of proper boundary and
initial conditions. This can easily be achieved by including the boundary and initial
conditions as observations given at the $N_u$ points; thus, they contribute to the
first term in the loss function. In addition, the equation residual is minimized on
an arbitrary set of $N_r$ residual points. No data are required at these points.
2 Application to time-periodic p-Laplacian
To find the numerical solution to our problem, we implement PINNs via the deep
learning library DeepXDE [8].
We study three different cases: the ordinary Laplacian (p = 2), a p-Laplacian with
p = 4, and the p(x)-Laplacian. All these problems are well-posed as evidenced in ....
We use normalized domains (T = 1, Ω = [0, 1] × [0, 1]) and we verify the accuracy
of our results by comparing them against the exact solution
$$u(x, y, t) = x\,y\,(1 - x)(1 - y)\,\sin(2\pi t),$$
with the source term f(x, y, t) changed to fit each problem, i.e. the method of
manufactured solutions (MMS).
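The manufactured solution can be checked directly for the two properties the problem requires, time-periodicity and the zero Dirichlet boundary value. A small sketch (the sample point (0.3, 0.7) is arbitrary):

```python
import numpy as np

# manufactured solution u = x*y*(1-x)*(1-y)*sin(2*pi*t)
def u_exact(x, y, t):
    return x * y * (1 - x) * (1 - y) * np.sin(2 * np.pi * t)

x, y = 0.3, 0.7
print(np.isclose(u_exact(x, y, 0.0), u_exact(x, y, 1.0)))  # True: periodic in t
print(u_exact(0.0, y, 0.25), u_exact(1.0, y, 0.25))        # zero on the boundary
```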
2.1 The procedure
The PDE residual
We first start by defining the PDE residual i.e Y (X) = ut − ∆p(X) u − f (X) with
X = (x, y, t)
import deepxde as dde


def p(x, y):
    return  # value of p or p(x, y)


def func(x, y, t):
    return  # source term


def pde(x, y):
    # derivative of the output wrt the first input (x_coor)
    u_x = dde.grad.jacobian(y, x, i=0, j=0)
    # derivative of the output wrt the second input (y_coor)
    u_y = dde.grad.jacobian(y, x, i=0, j=1)
    # derivative of the output wrt the third input (time)
    u_t = dde.grad.jacobian(y, x, i=0, j=2)

    # |grad u|^(p-2)
    norm_gradu = (u_x**2 + u_y**2) ** ((p(x[:, 0:1], x[:, 1:2]) - 2) / 2)

    u_x_norm_grad = u_x * norm_gradu
    u_y_norm_grad = u_y * norm_gradu

    u_x_norm_grad_x = dde.grad.jacobian(u_x_norm_grad, x, i=0, j=0)
    u_y_norm_grad_y = dde.grad.jacobian(u_y_norm_grad, x, i=0, j=1)

    return (u_t - (u_x_norm_grad_x + u_y_norm_grad_y)
            - func(x[:, 0:1], x[:, 1:2], x[:, 2:3]))
Defining the domain and boundaries
We next define the domain and the boundaries:
import numpy as np

# func_bdry is a function that returns the Dirichlet BC
# value, which is zero in our problem
def func_bdry(x):
    return np.zeros([len(x), 1])

# we implement the domain, which only serves as a placeholder,
# as a unit cube, seeing how \Omega = [0,1]^2 and t \in [0,1]
domain = dde.geometry.Cuboid(xmin=[0, 0, 0], xmax=[1, 1, 1])

# the bottom boundary is y = X[1] = 0
def boundary_b(x, on_boundary):
    return on_boundary and np.isclose(x[1], 0)

# the top boundary is y = X[1] = 1
def boundary_t(x, on_boundary):
    return on_boundary and np.isclose(x[1], 1)

# the right boundary is x = X[0] = 1
def boundary_r(x, on_boundary):
    return on_boundary and np.isclose(x[0], 1)

# the left boundary is x = X[0] = 0
def boundary_l(x, on_boundary):
    return on_boundary and np.isclose(x[0], 0)

# the time boundary is t = X[2] = 0 and t = X[2] = 1,
# but since the solution is time-periodic we only need one boundary
def time_boundary(x, on_boundary):
    return on_boundary and (np.isclose(x[2], 0) or np.isclose(x[2], 1))
The neural network
We also define the neural network and some of its hyperparameters: 4 hidden
layers of 50 neurons, with the snake activation function, for its good
performance on periodic functions, and the Glorot uniform weight initializer.
# snake activation for periodicity; if the installed DeepXDE backend does
# not provide it, "sin" or "tanh" are close alternatives
net = dde.nn.FNN([3] + [50] * 4 + [1], "snake", "Glorot uniform")
Now, after defining the problem settings, we need to impose the boundary conditions.
The boundary conditions
There are two methods to do so: either imposing soft BCs, which embed the
conditions in the training loss function (see Section 1), or imposing hard BCs ref [],
a technique that modifies the architecture of the network to exactly fit the
BCs. Choosing between them depends on the problem at hand, but we define both
here in this guide.
Soft BCs
# we impose zero on the 4 boundaries;
# component=0 refers to the output, i.e. the solution u(x,y,t),
# of which we only have one
bc1 = dde.icbc.DirichletBC(domain, func_bdry, boundary_b, component=0)
bc2 = dde.icbc.DirichletBC(domain, func_bdry, boundary_t, component=0)
bc3 = dde.icbc.DirichletBC(domain, func_bdry, boundary_r, component=0)
bc4 = dde.icbc.DirichletBC(domain, func_bdry, boundary_l, component=0)

# the second argument is 2 since the periodicity is on t = X[2]
pbc1 = dde.icbc.PeriodicBC(domain, 2, time_boundary, component=0)
We then embed these BCs into the training, as we will see a bit further.
Hard BCs
For a more detailed explanation we refer the reader to ref []. The basic principle
is, for lack of a better word, ”disguising” the inputs and/or outputs of the network
itself so that they fit the boundary conditions. For imposing periodic conditions, we
replace the periodic variable (here t) by a concatenation of Fourier features (sines
and cosines of the variable). For Dirichlet conditions, we change the output of the
network to a function that takes into account the variables and the boundaries and
imposes a zero there.
We define these transformations in (resp.) feature_transform
and output_transform, and then apply them to the network using the methods
apply_feature_transform and apply_output_transform.
from deepxde.backend import tf

def feature_transform(inputs):
    # periodic features in t (period P = 1)
    P = 1
    w = 2 * np.pi / P
    x, y, t = inputs[:, :1], inputs[:, 1:2], w * inputs[:, 2:]
    return tf.concat(
        (
            x,
            y,
            tf.cos(t),
            tf.sin(t),
            tf.cos(2 * t),
            tf.sin(2 * t),
            tf.cos(3 * t),
            tf.sin(3 * t),
            tf.cos(4 * t),
            tf.sin(4 * t),
        ),
        axis=1,
    )


def output_transform(inputs, outputs):
    # vanishes on the spatial boundary, enforcing the Dirichlet BC exactly
    x, y, t = inputs[:, :1], inputs[:, 1:2], inputs[:, 2:]
    return x * y * (1 - x) * (1 - y) * outputs


net.apply_feature_transform(feature_transform)
net.apply_output_transform(output_transform)
The training data
We use the module dde.data.PDE to define the training data, which comprises the
PDE residual and the BCs as defined above (in the case of soft BCs), and the
numbers of training and validation points.
data = dde.data.PDE(
    domain, pde, [bc1, bc2, bc3, bc4, pbc1],
    num_domain=1000, num_boundary=1200, num_test=10000,
)
The model
Our model is now set:

model = dde.Model(data, net)
Note that in the case of hard boundary conditions, the BCs in data are disregarded,
seeing how net is modified as shown above.
Training the model
The settings taken into account during training, such as the learning rate,
loss weights, batch sizes, etc., usually depend on the problem and the experience
of the user, and can and should be modified to get better results.
# loss_weights holds user-chosen weights for the individual loss terms
model.compile("adam", lr=0.001, loss="MSE",
              loss_weights=loss_weights)

# checkpoint path is an illustrative placeholder
checker = dde.callbacks.ModelCheckpoint(
    "model/model.ckpt", save_better_only=True, period=1000
)

losshistory, train_state = model.train(epochs=40000,
    batch_size=batch_size_, callbacks=[checker])

# L-BFGS options must be set before compiling with that optimizer
dde.optimizers.set_LBFGS_options(
    maxcor=50,
)
model.compile("L-BFGS-B")
losshistory, train_state = model.train(batch_size=batch_size_)
2.2 In the case of p=2
Our problem is as follows:
$$
\begin{cases}
\dfrac{\partial u}{\partial t} - \Delta u = f & \text{in } Q_T := \,]0,T[\,\times \Omega,\\[4pt]
u(0,\cdot) = u(T,\cdot) & \text{in } \Omega,\\
u = 0 & \text{on } \Sigma_T := \,]0,T[\,\times \partial\Omega.
\end{cases}
$$
• We train the model in batches, initially using the adam optimizer for 3000
iterations and then L-BFGS-B, on 400 boundary points and 400 domain points,
and we use 1000 random points as a validation (test) set.
• We also impose hard boundary conditions, a technique that implements the
boundary conditions by altering the architecture of the network instead of
computing a loss for each boundary condition (see Section 1). This technique saves
time, computational resources and memory, while also giving a more accurate result.
• The results of the training are shown in Figure 2.
Figure 2: Training output. (a) Output of adam training; (b) output of L-BFGS-B training.
Interpretation: train loss refers to the PDE loss on the training points, while
test loss refers to the performance of the model on the validation set (test points).
This means that the solution has satisfied the PDE on the training points
and successfully predicted the solution on points other than the ones it was trained
on. test metric refers to the $L^2$ relative error between the predicted and
exact solutions; it appears in the output of the adam training, and although it is
still computed during the L-BFGS-B training, it is simply not printed there.
This is the strength of PINN solvers: we can always check that the solution
satisfies the problem. We can also note the performance of the L-BFGS-B
optimizer.
Visualizing the solution
Figure 3: Slices of the time-periodic Laplacian solution by deep learning. (a) Solution for t = 0; (b) t = 0.25; (c) t = 0.75; (d) t = 1.
We can see that the initial and final solutions (t = 0 and t = 1) are equal (i.e.
periodicity) but still non-zero, which is due to the approximation errors linked to
the random initial weights and the relatively small training set.
This still does not affect the accuracy of the solution elsewhere, which is of order
$10^{-5}$, as evidenced by these random points of the solution:
Figure 4: Different slices of the predicted solution vs the exact solution
and the error of a random slice (t=0.65):
Figure 5: Different slices of the $L^2$ relative error between the predicted and
exact solutions
2.3 In the case of p=4
Our problem is as follows:
$$
\begin{cases}
\dfrac{\partial u}{\partial t} - \operatorname{div}\!\left(|\nabla u|^{2}\,\nabla u\right) = f & \text{in } Q_T := \,]0,T[\,\times \Omega,\\[4pt]
u(0,\cdot) = u(T,\cdot) & \text{in } \Omega,\\
u = 0 & \text{on } \Sigma_T := \,]0,T[\,\times \partial\Omega.
\end{cases}
$$
Using the same model and network architecture, but increasing the number of
training points and adam training iterations, we get the following results:
Figure 6: Training output. (a) Output of adam training; (b) output of L-BFGS-B training.
Interpretation: we obtain the same very good performance despite the stronger
non-linearity of the 4-Laplacian.
Studying the solution
Figure 7: Different slices of the predicted solution vs the exact solution of the
4-Laplacian
We see that we improved the accuracy of the result by two orders of magnitude
via more training points and iterations.
2.4 The case of p = p(x, y)
Bibliography
[1] [Link]
[2] [Link]
[3] Ovidiu Calin. Deep learning architectures. Springer, 2020.
[4] Abderrahim Charkaoui and Nour Eddine Alaa. Existence and uniqueness of
renormalized periodic solution to a nonlinear parabolic problem with variable
exponent and $L^1$ data. Journal of Mathematical Analysis and Applications,
506(2):125674, 2022.
[5] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT
press, 2016.
[6] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature,
521(7553):436–444, 2015.
[7] Sébastien Loisel. Efficient algorithms for solving the p-Laplacian in polynomial
time. Numerische Mathematik, 146(2):369–400, 2020.
[8] Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. Deep-
XDE: A deep learning library for solving differential equations. SIAM Review,
63(1):208–228, 2021.
[9] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed
neural networks: A deep learning framework for solving forward and inverse
problems involving nonlinear partial differential equations. Journal of Compu-
tational Physics, 378:686–707, 2019.
[10] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed
deep learning (part i): Data-driven solutions of nonlinear partial differential
equations. arXiv preprint arXiv:1711.10561, 2017.
[11] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed
deep learning (part ii): Data-driven discovery of nonlinear partial differential
equations. arXiv preprint arXiv:1711.10566, 2017.