
Research | January 25, 2021

New Bridges Between Deep Learning and Partial Differential Equations

By Lars Ruthotto

Understanding the world through data and computation has always formed the core of scientific
discovery. Amid many different approaches, two common paradigms have emerged. On the one
hand, primarily data-driven approaches—such as deep neural networks (DNNs)—have proven
extremely successful in recent years. Their success is based mainly on their ability to approximate
complicated functions with generic models when trained using vast amounts of data and
enormous computational resources. But despite their many triumphs, DNNs are difficult to analyze
and thus remain mysterious. Most importantly, they lack the robustness, explainability,
interpretability, and fairness required for high-stakes decision-making. On the other hand,
increasingly realistic model-based approaches—typically derived from first principles and
formulated as partial differential equations (PDEs)—are now available for various tasks. One can
often calibrate these models—which enable detailed theoretical studies, analysis, and
interpretation—with relatively few measurements, thus facilitating their accurate predictions of
phenomena. However, computational methods for PDEs remain a vibrant research area whose
open challenges include the efficient solution of highly nonlinear coupled systems and PDEs in
high dimensions.

In recent years, exciting work at the interface of data-driven and model-based approaches has
blended both paradigms. For instance, PDE techniques and models have yielded better insight
into deep learning algorithms, more robust networks, and more efficient training algorithms. As
another example, consider the solution of high-dimensional PDEs, wherein DNNs have provided
new avenues for tackling the curse of dimensionality. One must understand that the exchange
between deep learning and PDEs is bidirectional and benefits both communities. I hope to offer a
glimpse into a few of these activities and make a case for the creation of new bridges between
applied mathematics and data science.

Continuous Neural Networks Motivated by Ordinary and Partial Differential Equations
Researchers have traditionally constructed DNNs by concatenating a small, finite number of
functions, each consisting of a trainable affine mapping and a pointwise nonlinearity [9, 13].
Because the difficulty of initializing and training the network weights increases with the number of
layers, the network’s depth has been limited in practice. However, the arrival of the so-called
residual neural networks (ResNet) in 2016—which outperformed traditional networks across a
variety of tasks—dramatically changed this situation.

For a simple example of a ResNet in action, consider the training of a neural network that classifies points in $\mathbb{R}^2$ into two classes based on training data $\{(\mathbf{y}^{(1)}, c^{(1)}), (\mathbf{y}^{(2)}, c^{(2)}), \ldots\} \subset \mathbb{R}^2 \times \{0, 1\}$. We have plotted an instance of this scenario in Figure 1a. The deep learning approach to this problem consists of two stages. We first transform the feature space (possibly increasing its dimension) via a neural network. Next, we employ a simple classification model, such as linear multinomial regression. By utilizing a ResNet with $N$ layers for the first step, we transform the data point $\mathbf{y}$ into $\mathbf{u}_N$ as follows:

$$\begin{aligned}
\mathbf{u}_0 &= \mathbf{K}_{\mathrm{in}} \mathbf{y}, \\
\mathbf{u}_1 &= \mathbf{u}_0 + h\,\sigma(\mathbf{K}_0 \mathbf{u}_0 + \mathbf{b}_0), \\
&\;\;\vdots \\
\mathbf{u}_N &= \mathbf{u}_{N-1} + h\,\sigma(\mathbf{K}_{N-1} \mathbf{u}_{N-1} + \mathbf{b}_{N-1}).
\end{aligned}$$

Here, $\sigma(x) = \tanh(x)$ serves as an activation function that is applied element-wise, $h > 0$ is a fixed step size, and $\mathbf{K}_{\mathrm{in}} \in \mathbb{R}^{3 \times 2}$, $\mathbf{K}_0, \ldots, \mathbf{K}_{N-1} \in \mathbb{R}^{3 \times 3}$, and $\mathbf{b}_0, \ldots, \mathbf{b}_{N-1} \in \mathbb{R}^3$ are the trainable weights. Figure 1b depicts the ResNet’s action for the learned weights, which we
determined through optimization. Based on the projections of the transformed points $\mathbf{u}_N$ onto

their first two dimensions, it is apparent that solving the classification problem with a linear model
has become trivial. Figure 1c displays the trained classifier in the original data space.
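
To make this recursion concrete, here is a minimal NumPy sketch of the forward propagation (the function name and the random placeholder weights are illustrative assumptions; in practice, the weights come from training):

```python
import numpy as np

def resnet_forward(y, K_in, Ks, bs, h):
    """Propagate a feature vector y through an N-layer ResNet.

    K_in lifts the 2D input to 3D; Ks and bs hold the N layer
    weights K_0, ..., K_{N-1} and biases b_0, ..., b_{N-1}.
    """
    u = K_in @ y                          # u_0 = K_in y
    for K, b in zip(Ks, bs):              # u_{j+1} = u_j + h * sigma(K_j u_j + b_j)
        u = u + h * np.tanh(K @ u + b)
    return u                              # u_N, fed into a linear classifier

# Illustrative call with random (untrained) weights and N = 8 layers.
rng = np.random.default_rng(0)
N = 8
K_in = rng.standard_normal((3, 2))
Ks = rng.standard_normal((N, 3, 3))
bs = rng.standard_normal((N, 3))
u_N = resnet_forward(np.array([0.5, -1.0]), K_in, Ks, bs, h=0.1)
```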

Figure 1. Binary classification via a deep residual neural network. 1a. A synthetic dataset consisting of concentric ellipsoids
in two dimensions that are labeled into two classes, which are visualized as blue and red points. 1b. The propagated input
features for the trained neural network. When trained successfully, the propagated features can be classified with a linear
model. We visualize the decision boundary with a black line and the model’s prediction with a background that is colored
according to the predicted class. 1c. The classifier’s prediction in the original data space. Figure courtesy of Lars Ruthotto.


One can also interpret the transformed feature $\mathbf{u}_N$ as the forward Euler approximation of $\mathbf{u}(T)$ that satisfies the initial value problem

$$\partial_t \mathbf{u}(t) = \sigma(\mathbf{K}(t)\mathbf{u}(t) + \mathbf{b}(t)), \quad t \in (0, T], \qquad \mathbf{u}(0) = \mathbf{K}_{\mathrm{in}} \mathbf{y}.$$

Here, $T > 0$ is an artificial final time that is loosely related to the network’s depth [4, 8].
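
One can check this correspondence numerically by comparing the ResNet recursion with step size $h = T/N$ against a high-accuracy solution of the same ODE. The sketch below assumes, purely for simplicity, time-independent weights $\mathbf{K}$ and $\mathbf{b}$:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
K = rng.standard_normal((3, 3))       # time-independent weights (simplifying assumption)
b = rng.standard_normal(3)
u0 = rng.standard_normal(3)           # plays the role of u(0) = K_in y
T, N = 1.0, 100
h = T / N                             # step size tied to the network depth

# The ResNet recursion is forward Euler applied to the initial value problem.
u = u0.copy()
for _ in range(N):
    u = u + h * np.tanh(K @ u + b)

# High-accuracy reference solution of the same ODE.
sol = solve_ivp(lambda t, v: np.tanh(K @ v + b), (0.0, T), u0, rtol=1e-10, atol=1e-12)
print(np.linalg.norm(u - sol.y[:, -1]))   # small; shrinks like O(h) as N grows
```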

This continuous viewpoint has been popularized in the machine learning community under the
term “neural ordinary differential equations” (ODEs) [3]; however, similar ideas had already been
published [6]. Scientists have recently been applying ODE techniques to create faster, better-
understood algorithms for neural networks. For instance, we have proposed new architectures
that lead to more stable ODE dynamics [8]. Furthermore, since one might view training as a
(stochastic) optimal control problem, efficient solvers for the learning problem (as well as insight
into this problem) have resulted from adapted computational science and engineering methods
[5, 7].

Analysis of high-dimensional datasets like speech, image, and video data has been a significant
focal point for the deep learning community. In fact, deep learning’s breakthroughs in speech and
image recognition roughly a decade ago are partly responsible for renewed interest in the
subject. Nevertheless, some challenges remain difficult or beyond reach. These ongoing
problems include controlling a self-driving car based on predictions made from high-resolution
images of street scenes, and reliably computing the volume fraction of COVID-19-affected lung
tissue in three-dimensional computed tomography images [10].

While such theoretical and computational challenges may seem insurmountable, we can turn to
the field of PDE-based imaging for inspiration. In the last several decades, researchers have
created many celebrated algorithms by interpreting image data as discretized functions that can
be processed via PDE or integral operators. One can also apply this viewpoint to deep learning
with convolutional neural networks whose operators are linear combinations of PDE operators [6].

We have used this observation to extend the neural ODE framework to PDEs and create new
types of networks. Specifically, we adapted residual neural networks to form unique models that
inherit the stability of parabolic PDEs or—upon suitable discretization—lead to reversible
hyperbolic networks [11]. The latter can help overcome memory limitations of current computing
hardware. For instance, we trained a hyperbolic network with more than 1,200 layers to classify
images on a single graphics processing unit [2].
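
The memory savings stem from algebraic invertibility: a leapfrog-style two-step block can be run backward exactly, so intermediate states need not be stored during backpropagation. The following sketch illustrates the idea in a generic form; the actual architectures in [2, 11] differ in their details:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, h = 12, 3, 0.1
Ks = rng.standard_normal((N, n, n))
bs = rng.standard_normal((N, n))

# Forward pass of a leapfrog-style (hyperbolic) residual block:
# u_{j+1} = 2 u_j - u_{j-1} - h^2 * sigma(K_j u_j + b_j).
u0 = rng.standard_normal(n)
u_prev, u_curr = u0.copy(), u0.copy()     # initialize both states with the input
for K, b in zip(Ks, bs):
    u_prev, u_curr = u_curr, 2 * u_curr - u_prev - h**2 * np.tanh(K @ u_curr + b)

# Exact reversal from the final two states alone; no stored activations needed:
# u_{j-1} = 2 u_j - u_{j+1} - h^2 * sigma(K_j u_j + b_j).
for K, b in zip(Ks[::-1], bs[::-1]):
    u_prev, u_curr = 2 * u_prev - u_curr - h**2 * np.tanh(K @ u_prev + b), u_prev
print(np.allclose(u_curr, u0))  # True: the input is recovered exactly
```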

Deep Learning for the Solution of High-Dimensional PDEs


With few exceptions, the numerical solution of high-dimensional PDEs is challenging due to the
curse of dimensionality. As a simple example, consider a finite difference method for the solution
of Poisson’s problem in $d$ dimensions on a rectangular grid with $n$ cells in each dimension. This approach quickly becomes prohibitive as $d$ grows, since the mesh consists of $n^d$ cells. The exponential growth of computational costs prohibits the application of the finite difference method—and other methods that rely on grids—to high-dimensional problems that arise in areas like statistics, finance, and
economics. To avoid this, one can utilize a neural network to parameterize the PDE solution and
rely on the network’s universal approximation properties. While the concept itself is not especially
novel, deep learning advances—particularly new architectures, improved theoretical results,
optimization algorithms, and easy-to-use software packages—have enabled several impressive
outcomes.
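
To sketch the idea, one can parameterize a candidate solution of Poisson’s problem $-\Delta u = f$ with a small network and evaluate the PDE residual at randomly sampled, mesh-free collocation points; training then minimizes the mean squared residual over the network weights. Everything below (the single hidden layer, the width, $f \equiv 1$, the unit-cube domain) is an illustrative assumption rather than the specific construction of any of the cited works:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 10, 64                       # dimension and hidden width (assumptions)
W = rng.standard_normal((m, d))     # theta = (W, c, a): a single hidden layer,
c = rng.standard_normal(m)          #   u_theta(x) = a . tanh(W x + c)
a = rng.standard_normal(m)

def u_theta(X):
    """Network value at a batch of points X with shape (batch, d)."""
    return np.tanh(X @ W.T + c) @ a

def laplacian_u(X):
    """Exact Laplacian of u_theta: sum_k a_k * (-2 t_k (1 - t_k^2)) * ||w_k||^2."""
    t = np.tanh(X @ W.T + c)                      # (batch, m)
    return (-2.0 * t * (1.0 - t**2)) @ (a * (W**2).sum(axis=1))

X = rng.random((4096, d))           # mesh-free samples in [0, 1]^d; no n^d grid
res = -laplacian_u(X) - 1.0         # residual of -Laplace(u) = 1 at the samples
print(u_theta(X[:3]))               # candidate solution values at three points
print(np.mean(res**2))              # training would minimize this over (W, c, a)
```

A complete method would also penalize the boundary conditions and optimize the weights with a stochastic gradient method; the key point is that random sampling replaces the exponentially large grid.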

One such example is the application of neural networks to high-dimensional mean field games
[12]. Mean field games arise in multiple applications [1]. Their solution is characterized by the value
function that satisfies a PDE system, which couples the continuity equation and the Hamilton-
Jacobi-Bellman (HJB) equation. Computing the value function is extremely challenging due to the
forward-backward structure, the HJB equation’s nonlinearity, and the high dimensionality. Our
approach employs a neural network that is specifically designed to allow a mesh-free solution of
the continuity equation via a Lagrangian method. Although more analysis is needed to fully
understand the stochastic non-convex optimization problem that trains the network, our initial
results indicate that neural networks can compete with well-understood, mesh-based methods in
two dimensions while also being scalable to 100 dimensions.

Outlook
This article offers a glimpse into the exciting activities and opportunities at the interface of
deep learning and applied mathematics. To demonstrate that the exchange is not a one-way street, I
have also discussed the promise of deep learning for problems in applied math that have been
difficult or out of reach, particularly the numerical solution of high-dimensional PDEs.

The coming years will almost certainly see SIAM and its members drive advances in these areas.
Given the widespread use of deep learning in real-world applications, one can perhaps expect
the biggest impact to stem from mathematical theory—including numerical analysis—that aims to
obtain reliable, interpretable, fair, and efficient machine learning models. These models would
also enable deep learning in scientific applications where current results suggest significant
potential yet open issues, such as convergence guarantees and uncertainty quantification. Finally,
fusing data-driven and model-based approaches is a promising means of compensating for the lack
of first-principle-based models with data in the form of measurements, observations, and
simulations.

This article is based on Lars Ruthotto’s invited talk at the 2020 SIAM Annual Meeting, which took
place virtually last July. Ruthotto’s presentation is available on SIAM’s YouTube Channel. 

References
[1] Caines, P.E. (2020, April 1). Mean field game theory: A tractable methodology for large population problems. SIAM News,
53(3), p. 5.
[2] Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., & Holtham, E. (2018). Reversible architectures for arbitrarily deep residual neural networks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (pp. 2811-2818). New Orleans, LA.
[3] Chen, T.Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural ordinary differential equations. In Advances in
Neural Information Processing Systems 31 (NeurIPS 2018). Montreal, Canada.
[4] E, W. (2017). A proposal on machine learning via dynamical systems. Comm. Math. Stat., 5(1), 1-11.
[5] Gholami, A., Keutzer, K., & Biros, G. (2019). ANODE: Unconditionally accurate memory-efficient gradients for neural
ODEs. Preprint, arXiv:1902.10298.
[6] González-García, R., Rico-Martínez, R., & Kevrekidis, I.G. (1998). Identification of distributed parameter systems: A neural
net based approach. Comp. Chem. Eng., 22, S965-S968.
[7] Günther, S., Ruthotto, L., Schroder, J.B., Cyr, E.C., & Gauger, N.R. (2020). Layer-parallel training of deep residual neural
networks. SIAM J. Math. Data Sci., 2(1), 1-23.
[8] Haber, E., & Ruthotto, L. (2017). Stable architectures for deep neural networks. Inverse Prob., 34(1), 1-22.
[9] Higham, C.F., & Higham, D. (2019). Deep learning: An introduction for applied mathematicians. SIAM Rev., 61(4), 860-891.
[10] Lensink, K., Parker, W., & Haber, E. (2020, July 13). Deep learning for COVID-19 diagnosis. SIAM News, 53(6), p. 1.
[11] Ruthotto, L., & Haber, E. (2020). Deep neural networks motivated by partial differential equations. J. Math. Imag. Vision,
62(3), 352-364.
[12] Ruthotto, L., Osher, S.J., Li, W., Nurbekyan, L., & Wu Fung, S. (2020). A machine learning framework for solving high-
dimensional mean field game and mean field control problems. Proc. Natl. Acad. Sci., 117(17), 9183-9193.
[13] Strang, G. (2018, December 3). The functions of deep learning. SIAM News, 51(10), p. 1.

Lars Ruthotto is an applied mathematician who develops computational methods for machine learning and inverse problems. He is an associate professor in the Department of Mathematics and the Department of Computer Science at Emory University.
