New Bridges Between Deep Learning and Partial Differential Equations
Understanding the world through data and computation has always formed the core of scientific
discovery. Amid many different approaches, two common paradigms have emerged. On the one
hand, primarily data-driven approaches—such as deep neural networks (DNNs)—have proven
extremely successful in recent years. Their success is based mainly on their ability to approximate
complicated functions with generic models when trained using vast amounts of data and
enormous computational resources. But despite their many triumphs, DNNs are difficult to analyze
and thus remain mysterious. Most importantly, they lack the robustness, explainability,
interpretability, and fairness required for high-stakes decision-making. On the other hand,
increasingly realistic model-based approaches—typically derived from first principles and
formulated as partial differential equations (PDEs)—are now available for various tasks. One can
often calibrate these models—which enable detailed theoretical studies, analysis, and
interpretation—with relatively few measurements, thus facilitating their accurate predictions of
phenomena. However, computational methods for PDEs remain a vibrant research area whose
open challenges include the efficient solution of highly nonlinear coupled systems and PDEs in
high dimensions.
In recent years, exciting work at the interface of data-driven and model-based approaches has
blended both paradigms. For instance, PDE techniques and models have yielded better insight
into deep learning algorithms, more robust networks, and more efficient training algorithms. As
another example, consider the solution of high-dimensional PDEs, wherein DNNs have provided
new avenues for tackling the curse of dimensionality. Importantly, the exchange
between deep learning and PDEs is bidirectional and benefits both communities. I hope to offer a
glimpse into a few of these activities and make a case for the creation of new bridges between
applied mathematics and data science.
While deeper architectures are generally more expressive, training difficulties that grow with the number of layers have limited the network’s depth in practice. However, the arrival of so-called residual neural networks (ResNets) in 2016—which outperformed traditional networks across a variety of tasks—dramatically changed this situation.
For a simple example of a ResNet in action, consider the training of a neural network that
classifies points in ℝ² into two classes based on training data {(𝐲⁽¹⁾, 𝑐⁽¹⁾), (𝐲⁽²⁾, 𝑐⁽²⁾), …} ⊂ ℝ² × {0, 1}. We have plotted an instance of this scenario in Figure 1a. The deep
learning approach to this problem consists of two stages. We first transform the feature space
(possibly increasing its dimension) via a neural network. Next, we employ a simple classification
model, such as linear multinomial regression. By utilizing a ResNet with 𝑁 layers for the first step,
we transform the data point 𝐲 into 𝐮𝑁 as follows:
𝐮0 = 𝐊in 𝐲
𝐮1 = 𝐮0 + ℎ 𝜎(𝐊0 𝐮0 + 𝐛0 )
⋮
𝐮𝑁 = 𝐮𝑁 −1 + ℎ 𝜎(𝐊𝑁 −1 𝐮𝑁 −1 + 𝐛𝑁 −1 ).
Here, 𝜎(𝑥) = tanh(𝑥) serves as an activation function that is applied element-wise, ℎ > 0 is a fixed step size, and 𝐊in ∈ ℝ³ˣ², 𝐊0, …, 𝐊𝑁−1 ∈ ℝ³ˣ³, and 𝐛0, …, 𝐛𝑁−1 ∈ ℝ³ are the trainable weights. Figure 1b depicts the ResNet’s action for the learned weights, which we determined through optimization. Based on the projections of the transformed points 𝐮𝑁 onto
their first two dimensions, it is apparent that solving the classification problem with a linear model
has become trivial. Figure 1c displays the trained classifier in the original data space.
Figure 1. Binary classification via a deep residual neural network. 1a. A synthetic dataset consisting of concentric ellipsoids
in two dimensions that are labeled into two classes, which are visualized as blue and red points. 1b. The propagated input
features for the trained neural network. When trained successfully, the propagated features can be classified with a linear
model. We visualize the decision boundary with a black line and the model’s prediction with a background that is colored
according to the predicted class. 1c. The classifier’s prediction in the original data space. Figure courtesy of Lars Ruthotto.
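For readers who want to experiment, the following minimal NumPy sketch implements the forward propagation defined by the update equations above; the depth 𝑁, step size ℎ, and weights are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, h = 8, 0.25  # hypothetical depth and step size

# Trainable weights (random placeholders): opening layer K_in and per-layer K_j, b_j.
K_in = rng.standard_normal((3, 2))
Ks = rng.standard_normal((N, 3, 3))
bs = rng.standard_normal((N, 3))

def resnet_forward(Y):
    """Propagate points Y (num_points x 2) through the N-layer ResNet."""
    U = Y @ K_in.T  # u_0 = K_in y lifts the features from R^2 to R^3
    for j in range(N):
        U = U + h * np.tanh(U @ Ks[j].T + bs[j])  # u_{j+1} = u_j + h sigma(K_j u_j + b_j)
    return U

Y = rng.standard_normal((5, 2))    # five synthetic input points
print(resnet_forward(Y).shape)     # (5, 3): the transformed features u_N
```

In an actual training run, one would optimize these weights, together with those of the linear classifier, to fit the labeled data.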
One can also interpret the transformed feature 𝐮𝑁 as the forward Euler approximation (with step size ℎ = 𝑇/𝑁) of 𝐮(𝑇), where 𝐮 satisfies the initial value problem

∂𝑡𝐮(𝑡) = 𝜎(𝐊(𝑡)𝐮(𝑡) + 𝐛(𝑡)),   𝐮(0) = 𝐊in 𝐲.

Here, 𝑇 > 0 is an artificial final time that is loosely related to the network’s depth [4, 8].
This continuous viewpoint has been popularized in the machine learning community under the
term “neural ordinary differential equations” (ODEs) [3]; however, similar ideas had been published earlier [6]. Scientists have recently been applying ODE techniques to create faster, better-
understood algorithms for neural networks. For instance, we have proposed new architectures
that lead to more stable ODE dynamics [8]. Furthermore, since one might view training as a
(stochastic) optimal control problem, efficient solvers for the learning problem (as well as insight
into this problem) have resulted from adapted computational science and engineering methods
[5, 7].
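To make the correspondence concrete, here is a small sketch that assumes piecewise-constant weights 𝐊(𝑡) and 𝐛(𝑡) on a uniform time grid: one forward Euler step of size ℎ = 𝑇/𝑁 applied to the neural ODE is exactly one ResNet layer.

```python
import numpy as np

def ode_rhs(u, K, b):
    """Right-hand side of the neural ODE: du/dt = sigma(K(t) u + b(t))."""
    return np.tanh(K @ u + b)

def forward_euler(u0, Ks, bs, T):
    """Integrate the neural ODE with len(Ks) Euler steps of size h = T / len(Ks).
    Each step, u <- u + h * sigma(K_j u + b_j), is exactly one ResNet layer."""
    h = T / len(Ks)
    u = u0
    for K, b in zip(Ks, bs):
        u = u + h * ode_rhs(u, K, b)
    return u

rng = np.random.default_rng(1)
N, T = 10, 1.0  # hypothetical depth and final time
Ks, bs = rng.standard_normal((N, 3, 3)), rng.standard_normal((N, 3))
print(forward_euler(rng.standard_normal(3), Ks, bs, T))  # approximates u(T)
```

Nothing forces the Euler method here; substituting a higher-order or adaptive integrator yields a different network architecture from the same underlying ODE.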
Analysis of high-dimensional datasets like speech, image, and video data has been a significant
focal point for the deep learning community. In fact, deep learning’s breakthroughs in speech and
image recognition roughly a decade ago are partly responsible for renewed interest in the
subject. Nevertheless, some challenges remain difficult or beyond reach. These ongoing
problems include controlling a self-driving car based on predictions made from high-resolution
images of street scenes, and reliably computing the volume fraction of COVID-19-affected lung
tissue in three-dimensional computed tomography images [10].
While such theoretical and computational challenges may seem insurmountable, we can turn to
the field of PDE-based imaging for inspiration. In the last several decades, researchers have
created many celebrated algorithms by interpreting image data as discretized functions that can
be processed via PDE or integral operators. One can also apply this viewpoint to deep learning
with convolutional neural networks whose operators are linear combinations of PDE operators [6].
We have used this observation to extend the neural ODE framework to PDEs and create new
types of networks. Specifically, we adapted residual neural networks to form unique models that
inherit the stability of parabolic PDEs or—upon suitable discretization—lead to reversible
hyperbolic networks [11]. The latter can help overcome memory limitations of current computing
hardware. For instance, we trained a hyperbolic network with more than 1,200 layers to classify
images on a single graphics processing unit [2].
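As a rough illustration of reversibility, the sketch below implements a leapfrog-style two-step update in the spirit of hyperbolic networks; the precise parameterization and signs vary across the architectures in [2, 11], so treat this as a schematic. Because the dynamics can be run backward exactly, backpropagation can recompute intermediate states instead of storing all of them.

```python
import numpy as np

def leapfrog_forward(u_prev, u_cur, Ks, bs, h):
    """Hyperbolic (leapfrog-type) pass: u_{j+1} = 2 u_j - u_{j-1} + h^2 sigma(K_j u_j + b_j).
    Only the two most recent states are kept in memory."""
    for K, b in zip(Ks, bs):
        u_prev, u_cur = u_cur, 2 * u_cur - u_prev + h**2 * np.tanh(K @ u_cur + b)
    return u_prev, u_cur  # the final two states

def leapfrog_backward(u_prev, u_cur, Ks, bs, h):
    """Invert the dynamics exactly: u_{j-1} = 2 u_j - u_{j+1} + h^2 sigma(K_j u_j + b_j),
    so earlier states can be recomputed on the fly during backpropagation."""
    for K, b in zip(reversed(Ks), reversed(bs)):
        u_prev, u_cur = 2 * u_prev - u_cur + h**2 * np.tanh(K @ u_prev + b), u_prev
    return u_prev, u_cur  # the initial two states

rng = np.random.default_rng(2)
M, h = 6, 0.1
Ks, bs = rng.standard_normal((M, 3, 3)), rng.standard_normal((M, 3))
u0 = rng.standard_normal(3)
uA, uB = leapfrog_forward(u0, u0.copy(), Ks, bs, h)   # common choice: u_1 = u_0
v0, v1 = leapfrog_backward(uA, uB, Ks, bs, h)
print(np.allclose(v0, u0), np.allclose(v1, u0))        # True True: reversible
```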
Deep learning also offers new tools for classical problems in computational mathematics, most notably the numerical solution of high-dimensional PDEs. The exponential growth of computational costs with the number of dimensions prohibits the application of the finite difference method—and other methods
that rely on grids—to high-dimensional problems that arise in areas like statistics, finance, and
economics. To avoid this, one can utilize a neural network to parameterize the PDE solution and
rely on the network’s universal approximation properties. While the concept itself is not especially
novel, deep learning advances—particularly new architectures, improved theoretical results,
optimization algorithms, and easy-to-use software packages—have enabled several impressive
outcomes.
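As one minimal instance of this idea (a generic collocation sketch under simplifying assumptions, not the specific method of any of the cited works), the following PyTorch snippet parameterizes the solution of a toy Poisson problem on the unit cube with a small network and minimizes the PDE residual at randomly sampled, mesh-free points; the architecture, dimension, and crude boundary sampling are all illustrative choices.

```python
import torch

d = 10  # illustrative dimension; no grid is ever built, so d can grow
net = torch.nn.Sequential(
    torch.nn.Linear(d, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def laplacian(u, x):
    """Sum of second derivatives of the scalar output u with respect to x, via autograd."""
    g = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = torch.zeros(x.shape[0])
    for i in range(x.shape[1]):
        lap = lap + torch.autograd.grad(g[:, i].sum(), x, create_graph=True)[0][:, i]
    return lap

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(256, d, requires_grad=True)         # mesh-free collocation points in (0,1)^d
    residual = laplacian(net(x).squeeze(-1), x) - 1.0  # toy PDE: Laplacian of u equals 1
    xb = torch.rand(256, d)                            # crude boundary samples: pin one
    xb[:, 0] = torch.randint(0, 2, (256,)).float()     # coordinate to a face of the cube
    loss = (residual**2).mean() + (net(xb)**2).mean()  # PDE residual + Dirichlet penalty (u = 0)
    opt.zero_grad(); loss.backward(); opt.step()
```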
One such example is the application of neural networks to high-dimensional mean field games
[12]. Mean field games arise in multiple applications [1]. Their solution is characterized by the value
function that satisfies a PDE system, which couples the continuity equation and the Hamilton-
Jacobi-Bellman (HJB) equation. Computing the value function is extremely challenging due to the
forward-backward structure, the HJB equation’s nonlinearity, and the high dimensionality. Our
approach employs a neural network that is specifically designed to allow a mesh-free solution of
the continuity equation via a Lagrangian method. Although more analysis is needed to fully
understand the stochastic non-convex optimization problem that arises when training the network, our initial
results indicate that neural networks can compete with well-understood, mesh-based methods in
two dimensions while also being scalable to 100 dimensions.
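To convey the flavor of such a Lagrangian, mesh-free method, here is a heavily simplified sketch (not the formulation of [12]; it drops the interaction and HJB penalty terms and uses a toy terminal cost): a network parameterizes a potential Φ(𝑥, 𝑡), particles sampled from the initial density move along the induced characteristics, and the running and terminal costs are minimized.

```python
import torch

d = 2  # illustrative; the Lagrangian formulation avoids grids and thus scales in d
phi = torch.nn.Sequential(        # surrogate for the potential/value function Phi(x, t)
    torch.nn.Linear(d + 1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def velocity(z, t_val):
    """For the Hamiltonian H(p) = |p|^2 / 2, the optimal control is v = -grad_x Phi(x, t)."""
    zt = torch.cat([z, torch.full((z.shape[0], 1), t_val)], dim=1)
    g = torch.autograd.grad(phi(zt).sum(), zt, create_graph=True)[0]
    return -g[:, :d]

opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
T, n_steps = 1.0, 20
h = T / n_steps
target = torch.tensor([[3.0] + [0.0] * (d - 1)])  # hypothetical point the crowd should reach
for it in range(500):
    z = torch.randn(512, d, requires_grad=True)   # particles drawn from rho_0 = N(0, I)
    cost = torch.zeros(())
    for k in range(n_steps):                      # transport along characteristics: dz/dt = v
        v = velocity(z, k * h)
        cost = cost + h * 0.5 * (v**2).sum(dim=1).mean()  # running cost L(v) = |v|^2 / 2
        z = z + h * v                             # forward Euler step; no mesh is needed
    cost = cost + ((z - target)**2).sum(dim=1).mean()     # terminal cost
    opt.zero_grad(); cost.backward(); opt.step()
```

Because the particles carry the density along their trajectories, the continuity equation is handled by construction, which is what makes the approach mesh-free.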
Outlook
Here I have tried to provide a glimpse into the exciting activities and opportunities at the interface of deep learning and applied mathematics. To demonstrate that this is not a one-way street, I discussed the promise of deep learning for problems in applied math that are difficult or currently beyond reach, particularly the numerical solution of high-dimensional PDEs.
The coming years will almost certainly see SIAM and its members drive advances in these areas.
Given the widespread use of deep learning in real-world applications, one can perhaps expect
the biggest impact to stem from mathematical theory—including numerical analysis—that aims to
obtain reliable, interpretable, fair, and efficient machine learning models. These models would
also enable deep learning in scientific applications where current results suggest significant potential but where issues such as convergence guarantees and uncertainty quantification remain open. Finally,
fusing data-driven and model-based approaches is a promising means of compensating for the lack
of first-principle-based models with data in the form of measurements, observations, and
simulations.
This article is based on Lars Ruthotto’s invited talk at the 2020 SIAM Annual Meeting, which took
place virtually last July. Ruthotto’s presentation is available on SIAM’s YouTube Channel.
References
[1] Caines, P.E. (2020, April 1). Mean field game theory: A tractable methodology for large population problems. SIAM News,
53(3), p. 5.
[2] Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., & Holtham, E. (2018). Reversible architectures for arbitrarily deep
residual neural networks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (pp. 2811-2818).
New Orleans, LA.
[3] Chen, T.Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural ordinary differential equations. In Advances in
Neural Information Processing Systems 31 (NeurIPS 2018). Montreal, Canada.
[4] E, W. (2017). A proposal on machine learning via dynamical systems. Comm. Math. Stat., 5(1), 1-11.
[5] Gholami, A., Keutzer, K., & Biros, G. (2019). ANODE: Unconditionally accurate memory-efficient gradients for neural
ODEs. Preprint, arXiv:1902.10298.
[6] González-García, R., Rico-Martínez, R., & Kevrekidis, I.G. (1998). Identification of distributed parameter systems: A neural
net based approach. Comp. Chem. Eng., 22, S965-S968.
[7] Günther, S., Ruthotto, L., Schroder, J.B., Cyr, E.C., & Gauger, N.R. (2020). Layer-parallel training of deep residual neural
networks. SIAM J. Math. Data Sci., 2(1), 1-23.
[8] Haber, E., & Ruthotto, L. (2017). Stable architectures for deep neural networks. Inverse Prob., 34(1), 1-22.
[9] Higham, C.F., & Higham, D. (2019). Deep learning: An introduction for applied mathematicians. SIAM Rev., 61(4), 860-891.
[10] Lensink, K., Parker, W., & Haber, E. (2020, July 13). Deep learning for COVID-19 diagnosis. SIAM News, 53(6), p. 1.
[11] Ruthotto, L., & Haber, E. (2020). Deep neural networks motivated by partial differential equations. J. Math. Imag. Vision,
62(3), 352-364.
[12] Ruthotto, L., Osher, S.J., Li, W., Nurbekyan, L., & Wu Fung, S. (2020). A machine learning framework for solving high-
dimensional mean field game and mean field control problems. Proc. Natl. Acad. Sci., 117(17), 9183-9193.
[13] Strang, G. (2018, December 3). The functions of deep learning. SIAM News, 51(10), p. 1.