Physics-Informed Deep Learning
Abstract
We introduce physics informed neural networks – neural networks that are
trained to solve supervised learning tasks while respecting any given law of
physics described by general nonlinear partial differential equations. In this
second part of our two-part treatise, we focus on the problem of data-driven
discovery of partial differential equations. Depending on whether the avail-
able data is scattered in space-time or arranged in fixed temporal snapshots,
we introduce two main classes of algorithms, namely continuous time and
discrete time models. The effectiveness of our approach is demonstrated us-
ing a wide range of benchmark problems in mathematical physics, including
conservation laws, incompressible fluid flow, and the propagation of nonlinear
shallow-water waves.
Keywords:
Data-driven scientific computing, Machine learning, Predictive modeling,
Runge-Kutta methods, Nonlinear dynamics
1. Introduction
Deep learning has gained unprecedented attention over the last few years,
and deservedly so, as it has introduced transformative solutions across diverse
scientific disciplines [1, 2, 3, 4]. Despite this ongoing success, there exist many scientific applications that have yet to benefit from this emerging technology, primarily due to the high cost of data acquisition. It is well known that state-of-the-art machine learning tools lack robustness and offer no guarantees of convergence when operating in the small data regime, i.e., when only a few training examples are available.
In the first part of this study, we introduced physics informed neural net-
works as a viable solution for training deep neural networks with few training
examples, for cases where the available data is known to respect a given phys-
ical law described by a system of partial differential equations. Such cases are
abundant in the study of physical, biological, and engineering systems, where
longstanding developments of mathematical physics have shed tremendous
insight on how such systems are structured, interact, and dynamically evolve
in time. We saw how the knowledge of an underlying physical law can in-
troduce structure that effectively regularizes the training of neural networks,
and enables them to generalize well even when only a few training examples
are available. Through the lens of different benchmark problems, we high-
lighted the key features of physics informed neural networks in the context
of data-driven solutions of partial differential equations [5, 6].
In this second part of our study, we shift our attention to the problem of
data-driven discovery of partial differential equations [7, 8, 9]. To this end,
let us consider parametrized and nonlinear partial differential equations of
the general form
$$u_t + \mathcal{N}[u; \lambda] = 0, \quad x \in \Omega, \; t \in [0, T], \qquad (1)$$
where u(t, x) denotes the latent (hidden) solution, $\mathcal{N}[\cdot; \lambda]$ is a nonlinear operator parametrized by λ, and Ω is a subset of $\mathbb{R}^D$. This setup encapsulates a
wide range of problems in mathematical physics including conservation laws,
diffusion processes, advection-diffusion-reaction systems, and kinetic equa-
tions. As a motivating example, the one dimensional Burgers’ equation [10]
corresponds to the case where $\mathcal{N}[u; \lambda] = \lambda_1 u u_x - \lambda_2 u_{xx}$ and $\lambda = (\lambda_1, \lambda_2)$.
Here, the subscripts denote partial differentiation in either time or space.
Now, the problem of data-driven discovery of partial differential equations
poses the following question: given a small set of scattered and potentially
noisy observations of the hidden state u(t, x) of a system, what are the pa-
rameters λ that best describe the observed data?
In what follows, we will provide an overview of our two main approaches
to tackle this problem, namely continuous time and discrete time models, as
well as a series of results and systematic studies for a diverse collection of
benchmarks. In the first approach, we will assume the availability of scattered and potentially noisy measurements across the entire spatio-temporal domain.
In the latter, we will try to infer the unknown parameters λ from only two
data snapshots taken at distinct time instants. All data and codes used in
this manuscript are publicly available on GitHub at https://github.com/maziarraissi/PINNs.
As a first example, let us revisit the Burgers' equation and define the residual $f(t, x) := u_t + \lambda_1 u u_x - \lambda_2 u_{xx}$. We proceed by approximating u(t, x) by a deep neural network, which results in the physics informed neural network f(t, x). The shared parameters of the neural networks u(t, x) and f(t, x), along with the parameters λ = (λ1, λ2) of the differential operator, can be learned by minimizing the mean squared error loss
$$MSE = MSE_u + MSE_f, \qquad (5)$$
where
$$MSE_u = \frac{1}{N} \sum_{i=1}^{N} \left| u(t_u^i, x_u^i) - u^i \right|^2,$$
and
$$MSE_f = \frac{1}{N} \sum_{i=1}^{N} \left| f(t_u^i, x_u^i) \right|^2.$$
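To make this concrete, here is a minimal sketch of the continuous time setup for the Burgers' equation, written in PyTorch with illustrative names (BurgersPINN, mse_loss); it is not the authors' released implementation. A small fully connected network approximates u(t, x), the unknown coefficients λ1 and λ2 are trainable scalars, and the residual f and the loss (5) are built with automatic differentiation.

```python
import torch

class BurgersPINN(torch.nn.Module):
    """Network u(t, x) with the unknown PDE parameters lambda1, lambda2 as trainable scalars."""
    def __init__(self, width=20, depth=8):
        super().__init__()
        layers = [torch.nn.Linear(2, width), torch.nn.Tanh()]
        for _ in range(depth - 1):
            layers += [torch.nn.Linear(width, width), torch.nn.Tanh()]
        layers += [torch.nn.Linear(width, 1)]
        self.net = torch.nn.Sequential(*layers)
        self.lambda1 = torch.nn.Parameter(torch.tensor(0.0))  # initial guesses for the unknown parameters
        self.lambda2 = torch.nn.Parameter(torch.tensor(0.0))

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=1))

def grad(y, z):
    # Derivative of y with respect to z; for networks applied row-wise this is the pointwise derivative.
    return torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]

def mse_loss(model, t, x, u_obs):
    """MSE_u + MSE_f of equation (5) on scattered observations (t^i, x^i, u^i)."""
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    u = model(t, x)
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    f = u_t + model.lambda1 * u * u_x - model.lambda2 * u_xx  # Burgers' residual f(t, x)
    return torch.mean((u - u_obs) ** 2) + torch.mean(f ** 2)
```

Minimizing this loss with a full-batch optimizer such as L-BFGS [12] (or Adam) drives both the data misfit and the residual towards zero, and in doing so recovers λ1 and λ2 together with the network weights.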
Figure 1: Burgers' equation: Top: Predicted solution u(t, x) along with the training data.
Middle: Comparison of the predicted and exact solutions corresponding to the three tem-
poral snapshots depicted by the dashed vertical lines in the top panel. Bottom: Correct
partial differential equation along with the identified one obtained by learning λ1 and λ2 .
              % error in λ1                          % error in λ2
Nu \ noise    0%       1%       5%       10%        0%       1%       5%       10%
500          0.131    0.518    0.118    1.319     13.885    0.483    1.708    4.058
1000         0.186    0.533    0.157    1.869      3.719    8.262    3.481   14.544
1500         0.432    0.033    0.706    0.725      3.093    1.423    0.502    3.156
2000         0.096    0.039    0.190    0.101      0.469    0.008    6.216    6.391

Table 1: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different numbers of training data Nu corrupted by different noise levels. Here, the neural network architecture is kept fixed to 9 layers and 20 neurons per layer.
                    % error in λ1                % error in λ2
Layers \ Neurons    10       20       40          10        20        40
2                 11.696    2.837    1.679     103.919    67.055    49.186
4                  0.332    0.109    0.428       4.721     1.234     6.170
6                  0.668    0.629    0.118       3.144     3.123     1.158
8                  0.414    0.141    0.266       8.459     1.902     1.552

Table 2: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different numbers of hidden layers and neurons per layer. Here, the training data is considered to be noise-free and fixed to N = 2,000.
Our next example involves the Navier-Stokes equations, which describe the physics of many phenomena of scientific and engineering interest. They may be used to model the weather, ocean currents, water flow in a pipe, and air flow around a wing. The Navier-Stokes equations in their full and simplified forms help with the design of aircraft and cars, the study of blood flow, the design of power stations, the analysis of the dispersion of pollutants, and many other applications. Let us consider the Navier-Stokes equations in two dimensions (2D)1 given explicitly by
$$u_t + \lambda_1 (u u_x + v u_y) = -p_x + \lambda_2 (u_{xx} + u_{yy}),$$
$$v_t + \lambda_1 (u v_x + v v_y) = -p_y + \lambda_2 (v_{xx} + v_{yy}), \qquad (6)$$
where u(t, x, y) denotes the x-component of the velocity field, v(t, x, y) the y-component, and p(t, x, y) the pressure. Here, λ = (λ1, λ2) are the unknown

1 It is straightforward to generalize the proposed framework to the Navier-Stokes equations in three dimensions (3D).
parameters. Solutions to the Navier-Stokes equations are searched in the set
of divergence-free functions; i.e.,
$$u_x + v_y = 0. \qquad (7)$$
This extra equation is the continuity equation for incompressible fluids that
describes the conservation of mass of the fluid. We make the assumption
that
$$u = \psi_y, \quad v = -\psi_x, \qquad (8)$$
for some latent function ψ(t, x, y).2 Under this assumption, the continuity equation (7) will be automatically satisfied. Given noisy measurements $\{t^i, x^i, y^i, u^i, v^i\}_{i=1}^{N}$ of the velocity field, we are interested in learning the unknown parameters λ1 and λ2, as well as reconstructing the pressure field p(t, x, y).
2 This construction can be generalized to three dimensional problems by employing the notion of vector potentials.
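A practical appeal of the stream function construction is that the divergence-free constraint (7) never has to be imposed as a penalty: it holds by construction. The sketch below (PyTorch again; psi_p_net and the function names are illustrative assumptions, not the authors' released code) shows how a single network with outputs (ψ, p) yields u, v, and the two momentum residuals, denoted here f and g, of equations (6) via automatic differentiation.

```python
import torch

def grad(y, z):
    # Derivative of y with respect to z, keeping the graph so higher derivatives can follow.
    return torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]

def velocity_and_pressure(psi_p_net, t, x, y):
    """psi_p_net maps (t, x, y) to (psi, p); u and v are derivatives of psi."""
    out = psi_p_net(torch.cat([t, x, y], dim=1))
    psi, p = out[:, 0:1], out[:, 1:2]
    u = grad(psi, y)    # u = psi_y
    v = -grad(psi, x)   # v = -psi_x, so u_x + v_y = psi_yx - psi_xy = 0 automatically
    return u, v, p

def ns_residuals(psi_p_net, lambda1, lambda2, t, x, y):
    """Momentum residuals f, g of the 2D Navier-Stokes equations (6)."""
    t, x, y = (z.clone().requires_grad_(True) for z in (t, x, y))
    u, v, p = velocity_and_pressure(psi_p_net, t, x, y)
    u_t, u_x, u_y = grad(u, t), grad(u, x), grad(u, y)
    v_t, v_x, v_y = grad(v, t), grad(v, x), grad(v, y)
    p_x, p_y = grad(p, x), grad(p, y)
    u_xx, u_yy = grad(u_x, x), grad(u_y, y)
    v_xx, v_yy = grad(v_x, x), grad(v_y, y)
    f = u_t + lambda1 * (u * u_x + v * u_y) + p_x - lambda2 * (u_xx + u_yy)
    g = v_t + lambda1 * (u * v_x + v * v_y) + p_y - lambda2 * (v_xx + v_yy)
    return f, g
```

The training loss then combines the data misfit on the measured velocities u^i, v^i with the mean squared residuals of f and g, while λ1 and λ2 are again treated as trainable parameters.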
Our specific setup concerns the prototypical problem of incompressible flow past a circular cylinder, which is known to exhibit rich dynamic behavior and transitions for different regimes of the Reynolds number Re = u∞ D/ν. Assuming a non-dimensional free stream velocity u∞ = 1, cylinder diameter D = 1, and kinematic viscosity ν = 0.01, the system exhibits a periodic steady state behavior characterized by an asymmetrical vortex shedding pattern in the cylinder wake, known as the Kármán vortex street [14].
Figure 2: Navier-Stokes equation: Top: Incompressible flow and dynamic vortex shedding past a circular cylinder at Re = 100. The spatio-temporal training data correspond to the depicted rectangular region in the cylinder wake. Bottom: Locations of training data-points for the stream-wise and transverse velocity components, u(t, x, y) and v(t, x, y), respectively.
Our results indicate that the proposed method is able to identify the unknown parameters λ1 and λ2 with very high accuracy, even when the training data are corrupted with noise. Specifically, for the case of noise-free training data, the errors in estimating λ1 and λ2 are 0.078% and 4.67%, respectively. The predictions remain robust even when the training data are corrupted with 1% uncorrelated Gaussian noise, returning errors of 0.17% and 5.70% for λ1 and λ2, respectively.
[Figure 3: side-by-side comparison of the predicted and exact pressure fields p(t, x, y) in the cylinder wake.]
In the discrete time approach, we employ implicit Runge-Kutta time-stepping schemes in which the number of stages q can be taken to be very large. Applying the general form of Runge-Kutta methods with q stages to equation (1) yields
$$u^{n+c_i} = u^n - \Delta t \sum_{j=1}^{q} a_{ij}\, \mathcal{N}[u^{n+c_j}; \lambda], \quad i = 1, \ldots, q,$$
$$u^{n+1} = u^n - \Delta t \sum_{j=1}^{q} b_j\, \mathcal{N}[u^{n+c_j}; \lambda]. \qquad (11)$$
Here, $u^{n+c_j}(x) = u(t^n + c_j \Delta t, x)$ for j = 1, . . . , q. This general form encapsulates both implicit and explicit time-stepping schemes, depending on the choice of the parameters $\{a_{ij}, b_j, c_j\}$. Equations (11) can be equivalently expressed as
$$u^n = u^n_i, \quad i = 1, \ldots, q,$$
$$u^{n+1} = u^{n+1}_i, \quad i = 1, \ldots, q, \qquad (12)$$
where
$$u^n_i := u^{n+c_i} + \Delta t \sum_{j=1}^{q} a_{ij}\, \mathcal{N}[u^{n+c_j}; \lambda], \quad i = 1, \ldots, q,$$
$$u^{n+1}_i := u^{n+c_i} + \Delta t \sum_{j=1}^{q} (a_{ij} - b_j)\, \mathcal{N}[u^{n+c_j}; \lambda], \quad i = 1, \ldots, q. \qquad (13)$$
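To see that nothing is lost in this reformulation, one can substitute the definitions (13) back into (12): the first set of conditions rearranges to the stage equations in (11), while the second recovers the update equation, since
$$u^{n+1} = u^{n+1}_i = u^{n+c_i} + \Delta t \sum_{j=1}^{q} (a_{ij} - b_j)\, \mathcal{N}[u^{n+c_j}; \lambda] = u^n - \Delta t \sum_{j=1}^{q} b_j\, \mathcal{N}[u^{n+c_j}; \lambda],$$
where the last equality uses $u^{n+c_i} = u^n - \Delta t \sum_{j=1}^{q} a_{ij}\, \mathcal{N}[u^{n+c_j}; \lambda]$ from the first set of conditions.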
We proceed by placing a multi-output neural network prior on
$$[u^{n+c_1}(x), \ldots, u^{n+c_q}(x)]. \qquad (14)$$
This prior assumption, along with equations (13), results in two physics informed neural networks,
$$[u^n_1(x), \ldots, u^n_q(x), u^n_{q+1}(x)], \qquad (15)$$
and
$$[u^{n+1}_1(x), \ldots, u^{n+1}_q(x), u^{n+1}_{q+1}(x)]. \qquad (16)$$
Given noisy measurements at two distinct temporal snapshots, $\{x^n, u^n\}$ and $\{x^{n+1}, u^{n+1}\}$, of the system at times $t^n$ and $t^{n+1}$, respectively, the shared parameters of the neural networks (14), (15), and (16), along with the parameters λ of the differential operator, can be trained by minimizing the sum of squared errors
$$SSE = SSE_n + SSE_{n+1}, \qquad (17)$$
where
$$SSE_n := \sum_{j=1}^{q} \sum_{i=1}^{N_n} \left| u^n_j(x^{n,i}) - u^{n,i} \right|^2,$$
and
$$SSE_{n+1} := \sum_{j=1}^{q} \sum_{i=1}^{N_{n+1}} \left| u^{n+1}_j(x^{n+1,i}) - u^{n+1,i} \right|^2.$$
Here, $x^n = \{x^{n,i}\}_{i=1}^{N_n}$, $u^n = \{u^{n,i}\}_{i=1}^{N_n}$, $x^{n+1} = \{x^{n+1,i}\}_{i=1}^{N_{n+1}}$, and $u^{n+1} = \{u^{n+1,i}\}_{i=1}^{N_{n+1}}$.
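As a concrete illustration of how this loss can be assembled for the Burgers' equation, the sketch below (PyTorch; stage_net, the Butcher tableau arrays A and b, and the array shapes are illustrative assumptions rather than the authors' released implementation) takes a network whose q outputs approximate the stage values $u^{n+c_j}(x)$, forms $u^n_i$ and $u^{n+1}_i$ through equations (13), and evaluates the loss (17).

```python
import torch

def burgers_operator(U, x, lambda1, lambda2):
    """N[u; lambda] = lambda1*u*u_x - lambda2*u_xx, applied column-wise to the q stage values."""
    d = lambda y: torch.autograd.grad(y, x, torch.ones_like(y), create_graph=True)[0]
    cols = []
    for j in range(U.shape[1]):
        u = U[:, j:j + 1]
        u_x = d(u)
        u_xx = d(u_x)
        cols.append(lambda1 * u * u_x - lambda2 * u_xx)
    return torch.cat(cols, dim=1)                      # shape (N, q)

def discrete_sse(stage_net, A, b, dt, lambda1, lambda2, x_n, u_n, x_np1, u_np1):
    """SSE_n + SSE_{n+1} of equation (17); A is a (q, q) tensor, b a length-q tensor."""
    def stage_predictions(x):
        x = x.clone().requires_grad_(True)
        U = stage_net(x)                               # (N, q) stage values u^{n+c_j}(x)
        NU = burgers_operator(U, x, lambda1, lambda2)  # (N, q)
        U_n = U + dt * NU @ A.T                        # u^n_i of equations (13)
        U_np1 = U + dt * NU @ (A - b).T                # u^{n+1}_i of equations (13)
        return U_n, U_np1

    U_n_pred, _ = stage_predictions(x_n)
    _, U_np1_pred = stage_predictions(x_np1)
    # u_n and u_np1 have shape (N, 1): broadcasting compares every stage prediction against
    # the same measured value, which realizes the double sums in SSE_n and SSE_{n+1}.
    return torch.sum((U_n_pred - u_n) ** 2) + torch.sum((U_np1_pred - u_np1) ** 2)
```

Here A and b would hold the coefficients of an implicit Runge-Kutta scheme (e.g., Gauss-Legendre), and λ1 and λ2 would be wrapped as trainable parameters and optimized jointly with the weights of stage_net.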
Given merely two training data snapshots, the shared parameters of the neural networks (14), (15), and (16), along with the parameters λ = (λ1, λ2) of the Burgers' equation, can be learned by minimizing the sum of squared errors in equation (17). Here, we have created a training data-set comprising Nn = 199 and Nn+1 = 201 spatial points by randomly sampling the exact solution at time instants tn = 0.1 and tn+1 = 0.9, respectively. The training data are shown in the top and middle panels of figure 4. The neural network architecture used here consists of 4 hidden layers with 50 neurons each, while the number of Runge-Kutta stages q is empirically chosen to yield a temporal error accumulation of the order of machine precision ε by setting3
$$q = \frac{1}{2} \frac{\log \epsilon}{\log \Delta t}, \qquad (19)$$
3 This is motivated by the theoretical error estimates for implicit Runge-Kutta schemes suggesting a truncation error of $O(\Delta t^{2q})$ [16].
              % error in λ1                          % error in λ2
∆t \ noise    0%       1%       5%       10%        0%       1%       5%       10%
0.2          0.002    0.435    6.073    3.273      0.151    4.982   59.314   83.969
0.4          0.001    0.119    1.679    2.985      0.088    2.816    8.396    8.377
0.6          0.002    0.064    2.096    1.383      0.090    0.068    3.493   24.321
0.8          0.010    0.221    0.097    1.233      1.918    3.215   13.479    1.621

Table 3: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different gap sizes ∆t between the two data snapshots and for different noise levels.
where the time-step for this example is ∆t = 0.8. The bottom panel of figure 4 summarizes the identified parameters λ = (λ1, λ2) for the cases of noise-free data, as well as data corrupted with 1% uncorrelated Gaussian noise. In both cases, the proposed algorithm is able to learn the correct parameter values λ1 = 1.0 and λ2 = 0.01/π with remarkable accuracy, despite the fact that the two data snapshots used for training are very far apart in time, and potentially describe different regimes of the underlying dynamics.
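For a rough sense of scale, assuming ε in equation (19) denotes double-precision machine epsilon, ε ≈ 2.22 × 10^{-16} (the text does not quote the resulting value of q), the gap ∆t = 0.8 gives
$$q = \frac{1}{2}\,\frac{\log(2.22 \times 10^{-16})}{\log 0.8} \approx \frac{1}{2} \cdot 161 \approx 81,$$
i.e., on the order of eighty Runge-Kutta stages. Since all stages are outputs of a single multi-output network, such large values of q remain computationally benign.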
Figure 4: Burgers' equation: Top: Solution u(t, x) along with the temporal locations of the
two training snapshots. Middle: Training data and exact solution corresponding to the
two temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom:
Correct partial differential equation along with the identified one obtained by learning
λ1 , λ2 .
Our final example involves a mathematical model of waves on shallow water surfaces: the Korteweg-de Vries (KdV) equation. This equation can also be viewed as Burgers' equation with an added dispersive term, and its purpose here is to highlight the ability of the proposed framework to handle problems involving higher order derivatives. The KdV equation has several connections to physical problems. It describes the evolution of long one-dimensional waves in many physical settings, including shallow-water waves.
                    % error in λ1                % error in λ2
Layers \ Neurons    10       25       50          10        25        50
1                  1.868    4.868    1.960     180.373   237.463   123.539
2                  0.443    0.037    0.015      29.474     2.676     1.561
3                  0.123    0.012    0.004       7.991     1.906     0.586
4                  0.012    0.020    0.011       1.125     4.448     2.014

Table 4: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different numbers of hidden layers and neurons in each layer.
The KdV equation reads
$$u_t + \lambda_1 u u_x + \lambda_2 u_{xxx} = 0, \qquad (20)$$
with λ = (λ1, λ2) being the unknown parameters. For the KdV equation, the nonlinear operator in equations (13) is given by
$$\mathcal{N}[u^{n+c_j}] = \lambda_1 u^{n+c_j} u^{n+c_j}_x + \lambda_2 u^{n+c_j}_{xxx},$$
and the shared parameters of the neural networks (14), (15), and (16) along
with the parameters λ = (λ1 , λ2 ) of the KdV equation can be learned by
minimizing the sum of squared errors (17).
To obtain a set of training and test data we simulated (20) using con-
ventional spectral methods. Specifically, starting from an initial condition
u(0, x) = cos(πx) and assuming periodic boundary conditions, we have inte-
grated equation (20) up to a final time t = 1.0 using the Chebfun package
[18] with a spectral Fourier discretization with 512 modes and a fourth-order
explicit Runge-Kutta temporal integrator with time-step $\Delta t = 10^{-6}$. Using this data-set, we then extract two solution snapshots at times tn = 0.2 and tn+1 = 0.8, and randomly sub-sample them using Nn = 199 and Nn+1 = 201 points
to generate a training data-set. We then use these data to train a discrete
time physics informed neural network by minimizing the sum of squared error
loss of equation (17) using L-BFGS [12]. The network architecture used here comprises 4 hidden layers, 50 neurons per layer, and an output layer predicting the solution at the q Runge-Kutta stages, i.e., $u^{n+c_j}(x)$, j = 1, . . . , q, where q is computed using equation (19) by setting ∆t = 0.6.
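The distinguishing feature of the KdV example is the third-order spatial derivative in the operator above. In a physics informed neural network this poses no special difficulty: the derivative is obtained by nesting automatic differentiation, as in the short sketch below (PyTorch, with illustrative names; not the authors' released code).

```python
import torch

def kdv_operator(u, x, lambda1, lambda2):
    """N[u; lambda] = lambda1*u*u_x + lambda2*u_xxx for one Runge-Kutta stage u(x)."""
    d = lambda y: torch.autograd.grad(y, x, torch.ones_like(y), create_graph=True)[0]
    u_x = d(u)           # first spatial derivative
    u_xxx = d(d(u_x))    # third spatial derivative via two further nested autodiff calls
    return lambda1 * u * u_x + lambda2 * u_xxx
```

Here u would be one column of the stage network's output evaluated at inputs x carrying requires_grad=True, exactly as in the discrete time Burgers' sketch above; the rest of the loss (17) is assembled unchanged.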
Although a series of promising results was presented, the reader may per-
haps agree that this two-part treatise creates more questions than it answers.
In a broader context, and along the way of seeking further understanding of
such tools, we believe that this work advocates a fruitful synergy between
machine learning and classical computational physics that has the potential
to enrich both fields and lead to high-impact developments.
Figure 5: KdV equation: Top: Solution u(t, x) along with the temporal locations of the
two training snapshots. Middle: Training data and exact solution corresponding to the
two temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom:
Correct partial differential equation along with the identified one obtained by learning
λ1 , λ2 .
Acknowledgements
This work received support from the DARPA EQUiPS grant N66001-15-2-4055, the MURI/ARO grant W911NF-15-1-0562, and the AFOSR grant FA9550-17-1-0013. All data and codes used in this manuscript are publicly available on GitHub at https://github.com/maziarraissi/PINNs.
References
[1] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[11] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, arXiv preprint arXiv:1502.05767 (2015).
[12] D. C. Liu, J. Nocedal, On the limited memory BFGS method for large
scale optimization, Mathematical programming 45 (1989) 503–528.
[17] T. Dauxois, Fermi, Pasta, Ulam and a mysterious lady, arXiv preprint
arXiv:0801.1590 (2008).