Machine Learning for Fluid Mechanics
Figure 1
Organizational overview of various machine learning methods (panels: supervised, semi-supervised, and unsupervised learning).
1. INTRODUCTION
Fluid mechanics has traditionally dealt with massive amounts of data from experiments,
field measurements, and large-scale numerical simulations. Indeed, in the past few decades
big data has been a reality in fluid mechanics research (Pollard et al. 2016), due to high-
performance computing architectures and advances in experimental measurement capa-
bilities. Over the past 50 years many techniques were developed to handle such data,
ranging from advanced algorithms for data processing and compression, to fluid mechanics
databases (Perlman et al. 2007; Wu & Moin 2008). However, the analysis of fluid mechanics
data has largely relied on domain expertise, statistical analysis, and heuristic algorithms.
The growth of data today is widespread across scientific disciplines, and gaining insight
and actionable information from data has become a new mode of scientific inquiry as well
as a commercial opportunity. Our generation is experiencing an unprecedented confluence
of 1) vast and increasing volumes of data, 2) advances in computational hardware and
reduced costs for computation, data storage and transfer, 3) sophisticated algorithms, 4)
an abundance of open source software and benchmark problems, and 5) significant and
ongoing investment by industry. These advances have, in turn, fueled renewed interest and
progress in the field of machine learning (ML) to extract information from these data. Machine learning is now rapidly making inroads in fluid mechanics. These learning algorithms may be categorized into supervised, semi-supervised, and unsupervised learning (see Fig. 1), depending on the information available about the data to the learning machine.

Machine learning: Algorithms that process and extract information from data. They facilitate automation of tasks and augment human domain knowledge. They are linked to learning processes and are categorized as supervised, semi-supervised, or unsupervised.

Machine learning provides a modular and agile modeling framework that can be tailored to address many challenges in fluid mechanics, such as reduced-order modeling, experimental data processing, shape optimization, turbulence closure modeling, and control. As scientific inquiry shifts from first principles to data-driven approaches, we may draw a parallel with the development of numerical methods in the 1940s and 1950s to solve the equations of fluid dynamics. With the increasing prevalence of data-driven methods, fluid mechanics will both benefit from learning algorithms and present challenges that may further advance these algorithms to complement human understanding and engineering intuition.
In addition to outlining successes, we must note the importance of understanding how
learning algorithms work and when these methods succeed or fail. It is important to balance
excitement about the capabilities of machine learning with the reality that its application
to fluid mechanics is an open and challenging field. In this context, we also emphasize the
benefit of incorporating domain knowledge about fluid mechanics into learning algorithms.
We envision that the fluid mechanics community can contribute to advances in machine
learning reminiscent of advances in numerical methods in the last century.
Figure 3
The learning problem: a learning machine φ(x, y, w) uses inputs x from a sample generator with distribution p(x) and observations y from a system with conditional distribution p(y | x) to generate an approximation ŷ of the system output (credit: Cherkassky & Mulier 2007).
Alternative loss functions may reflect different constraints on the learning machine such
as sparsity (Hastie et al. 2009; Brunton & Kutz 2019). The choice of the approximation
function reflects prior knowledge about the data and the choice between linear and nonlinear
methods directly bears on the computational cost associated with the learning methods.
Figure 4
Recurrent neural networks (RNNs) for time-series predictions and the long short-term memory (LSTM) regularization, which augments the hidden states h_t with cell states c_t and tanh gating mechanisms (Hochreiter & Schmidhuber 1997).
2.1.1. Neural networks. Neural networks are arguably the best-known methods in supervised learning. They are fundamental nonlinear function approximators, and in recent years a number of efforts have been dedicated to understanding their effectiveness. The universal approximation theorem (Hornik et al. 1989) states that any function may be approximated by a sufficiently large and deep network. Recent work has shown that sparsely connected, deep neural networks are information-theoretically optimal nonlinear approximators for a wide range of functions and systems (Bölcskei et al. 2019).

Neural network: A computational architecture, based loosely on biological networks of neurons, for nonlinear regression. A simple neural network with input x, output ŷ, activation function σ, and weights w, w0 that are determined from data y by minimizing E = ||y − ŷ||².

The power and flexibility of neural networks emanates from their modular structure based on the neuron as a central building element, a caricature of the neurons in the human brain. Each neuron receives an input, processes it through an activation function, and produces an output. Multiple neurons can be combined into different structures that reflect knowledge about the problem and the type of data. Feed-forward networks are among the most common structures; they are composed of layers of neurons, where the weighted output of one layer is the input to the next. NN architectures have an input layer that receives the data and an output layer that produces a prediction. Nonlinear optimization methods, such as back-propagation (Rumelhart et al. 1986), are used to identify the network weights that minimize the error between the prediction and the labeled training data. Deep neural networks involve multiple layers and various types of nonlinear activation functions. When the activation functions are expressed in terms of convolutional kernels, a powerful class of networks emerges, namely convolutional neural networks (CNNs), with great success in image and pattern recognition (Krizhevsky et al. 2012; Goodfellow et al. 2016).
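As a minimal illustration of these ideas, the sketch below trains a one-hidden-layer feed-forward network by gradient descent on the squared error E = ||y − ŷ||², with back-propagation written out explicitly. The toy data, layer width, and learning rate are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# A minimal sketch: one-hidden-layer feed-forward network trained by
# gradient descent on the squared error E = ||y - y_hat||^2.
rng = np.random.default_rng(0)

def sigma(z):            # tanh activation
    return np.tanh(z)

def dsigma(z):           # derivative of tanh
    return 1.0 - np.tanh(z) ** 2

# Toy regression data: y = sin(x) sampled on [-pi, pi]
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

n_hidden, lr = 16, 0.05
W1 = rng.normal(0, 0.5, (1, n_hidden));  b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, (n_hidden, 1));  b2 = np.zeros(1)

for epoch in range(5000):
    # Forward pass through the two layers
    h_in = x @ W1 + b1
    h = sigma(h_in)
    y_hat = h @ W2 + b2
    err = y_hat - y
    # Back-propagation (Rumelhart et al. 1986): chain rule through the layers
    dW2 = h.T @ err / len(x)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * dsigma(h_in)
    dW1 = x.T @ dh / len(x)
    db1 = dh.mean(axis=0)
    # Gradient-descent update of the weights
    W2 -= lr * dW2;  b2 -= lr * db2
    W1 -= lr * dW1;  b1 -= lr * db1

print("final MSE:", float(np.mean(err ** 2)))
```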
Recurrent neural networks (RNNs), depicted in Fig. 4, are of particular interest to
fluid mechanics. They operate on sequences of data (e.g., images from a video, time-
series, etc.) and their weights are obtained by back-propagation through time (BPTT).
RNNs have been quite successful for natural language processing and speech recognition.
However, their effectiveness has been hindered by diminishing or exploding gradients that
emerge during their training. The renewed interest in RNNs is largely attributed to the
development of the long short-term memory (LSTM) (Hochreiter & Schmidhuber 1997)
algorithms that deploy cell states and gating mechanisms to store and forget information
about past inputs, thus alleviating the problems with gradients and the transmission of
long-term information that standard RNNs suffer from. An extended architecture, called
the multi-dimensional LSTM network (MD-LSTM) (Graves et al. 2007), was proposed to
efficiently handle high-dimensional spatiotemporal data. A number of potent alternatives
to RNNs have appeared over the years; the echo state network has been used for prediction
in dynamical systems (Pathak et al. 2018).
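The following sketch illustrates the echo-state-network idea on a toy signal: a fixed random reservoir is driven by the input, and only a linear readout is trained by ridge regression. It is a minimal illustration under arbitrary hyperparameters, not the implementation of Pathak et al. (2018).

```python
import numpy as np

# Minimal echo state network: reservoir weights are fixed at random and
# only the linear readout W_out is fitted.
rng = np.random.default_rng(1)

N_res, rho, ridge = 300, 0.9, 1e-6
W_in = rng.uniform(-0.5, 0.5, (N_res, 1))          # input weights (fixed)
W = rng.normal(0, 1, (N_res, N_res))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))    # scale spectral radius < 1

# Training task: one-step-ahead prediction of a sine wave
t = np.linspace(0, 60, 3000)
u = np.sin(t)

# Collect reservoir states r_{k+1} = tanh(W r_k + W_in u_k)
r = np.zeros(N_res)
states = []
for uk in u[:-1]:
    r = np.tanh(W @ r + W_in[:, 0] * uk)
    states.append(r.copy())
R = np.array(states)              # (T-1, N_res) state matrix
Y = u[1:]                         # targets: next value of the signal

# Ridge-regression readout: solve (R^T R + ridge I) W_out = R^T Y
W_out = np.linalg.solve(R.T @ R + ridge * np.eye(N_res), R.T @ Y)

print("training error:", float(np.mean((R @ W_out - Y) ** 2)))
```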
Figure 5
PCA/POD (left) versus a shallow autoencoder (sAE, middle) versus a deep autoencoder (dAE, right). PCA/POD computes the data mean x̄ = (1/N) ∑ₙ xₙ and the covariance matrix S = (1/N) ∑ₙ (xₙ − x̄)(xₙ − x̄)ᵀ, solves the eigenvalue problem Suᵢ = λᵢuᵢ, and retains M < D eigenvectors. In the sAE, the encoder gives z = W·x and the decoder reconstructs x̃ = Wᵀ·z. If the node activation functions in the sAE are linear, then U and V are matrices that minimize the loss function ‖x − VUx‖. The node activation functions may be nonlinear, minimizing the loss function ‖x − ψ(ϕ(x))‖. The input x ∈ ℝᴰ is reduced to z ∈ ℝᴹ, with M < D. Note that PCA/POD requires the solution of a problem-specific eigenvalue problem, while the neuron modules can be extended to nonlinear activation functions and to multiple nodes and layers (adapted from C. Bishop).

2.1.2. Classification: Support vector machines and random forests. Classification is a supervised learning task that can determine the label or category of a set of measurements from a-priori labeled training data. It is perhaps the oldest method for learning, starting with the perceptron (Rosenblatt 1958), which could classify between two types of linearly
separable data. Two fundamental classification algorithms are support vector machines
(SVM) (Schölkopf & Smola 2002) and random forests (Breiman 2001), which were widely adopted in industry until the recent progress in deep neural networks. The problem can
be specified by the following loss functional, which is most simply expressed for two classes:
L(y, φ(x, y, w)) = { 0 if y = φ(x, y, w); 1 if y ≠ φ(x, y, w) }.   (3)
Deep learning: Neural networks with multiple layers, used to create powerful hierarchical representations at varying levels of abstraction.

Here the output of the learning machine is an indicator of the class to which the data belong. The risk functional quantifies the probability of misclassification, and the task is to minimize the risk based on the training data by a suitable choice of φ(x, y, w). Random forests are based on an ensemble of decision trees that hierarchically split the data using simple conditional statements; these decisions are interpretable and fast to evaluate at scale. In the context of classification, an SVM maps the data into a high-dimensional feature space in which linear classification is possible.
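A minimal sketch of both classifiers using scikit-learn on synthetic data; the features and labels below are hypothetical placeholders chosen only to make the two-class problem nonlinearly separable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Classify two synthetic "regimes" from two scalar features.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # nonlinearly separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# SVM: the RBF kernel implicitly maps the data to a high-dimensional
# feature space in which a linear separation is possible.
svm = SVC(kernel="rbf").fit(X_tr, y_tr)

# Random forest: an ensemble of decision trees with simple conditional splits.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("SVM accuracy:   ", svm.score(X_te, y_te))
print("Forest accuracy:", forest.score(X_te, y_te))
```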
2.2.1. Dimensionality reduction I: POD, PCA and auto-encoders. The extraction of flow features from experimental data and large-scale simulations is a cornerstone of flow modeling. Moreover, identifying lower-dimensional representations of high-dimensional data can serve as pre-processing for all tasks in supervised learning algorithms. Dimensionality reduction can also be viewed as an "information filtering bottleneck", where the data are processed through a lower-dimensional representation before being mapped back to the original high-dimensional space.
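This bottleneck can be illustrated with a short POD/PCA sketch based on the SVD of mean-subtracted snapshots; the random snapshot matrix below is a placeholder for real flow data.

```python
import numpy as np

# POD/PCA via SVD: columns of X hold D-dimensional flow snapshots; the
# first M left singular vectors are the POD modes, z the modal coefficients.
rng = np.random.default_rng(3)
D, N, M = 1000, 200, 10                 # state dimension, snapshots, modes
X = rng.normal(size=(D, N))             # placeholder for real snapshot data

x_bar = X.mean(axis=1, keepdims=True)   # mean flow
U, s, Vt = np.linalg.svd(X - x_bar, full_matrices=False)

Phi = U[:, :M]                          # POD modes (D x M)
z = Phi.T @ (X - x_bar)                 # low-dimensional representation (M x N)
X_hat = x_bar + Phi @ z                 # reconstruction, mapped back to R^D

energy = (s[:M] ** 2).sum() / (s ** 2).sum()
print(f"retained energy fraction with M={M}: {energy:.3f}")
```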
2.2.2. Dimensionality reduction II: Discrete principal curves and self-organizing maps. The mapping between high-dimensional data and a low-dimensional representation can be structured through an explicit shaping of the lower-dimensional space, possibly reflecting a-priori knowledge about this subspace. These techniques can be seen as extensions of the linear auto-encoders, where the encoder and decoder can be nonlinear functions. This nonlinearity may come, however, at the expense of losing the inverse relationship between the encoder and decoder functions that is one of the strengths of linear PCA. An alternative is to define the decoder as an approximation of the inverse of the encoder, leading to the method of principal curves. Principal curves are structures on which the data are projected during the encoding step of the learning algorithm. In turn, the decoding step amounts to an approximation of the inverse of this mapping, for example by adding some smoothing onto the principal curves. An important version of this process is the self-organizing map (SOM) introduced by Kohonen (1995). In SOMs the projection subspace is discretized into a finite set of values with specified connectivity architecture and distance metrics. The encoder step amounts to identifying, for each data point, the closest node point on the SOM, and the decoder step is a weighted regression estimate, using for example kernel functions, that takes advantage of the specified distance metric between the map nodes. This modifies the node centers, and the process can be iterated until the empirical risk of the auto-encoder has been minimized. The SOM capabilities can be exemplified by comparing it to linear PCA for a two-dimensional set of points: linear PCA provides as an approximation the least-squares straight line through the points, whereas the SOM maps the points onto a curved line that better approximates the data. We note that SOMs can be extended to areas beyond floating-point data, and they offer an interesting way of creating databases based on features of flow fields.
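A minimal SOM sketch illustrating the encoder/decoder steps described above, for a two-dimensional set of points on a curve; the node count, learning-rate, and neighborhood schedules are arbitrary illustrative choices.

```python
import numpy as np

# A 1-D chain of SOM nodes fitted to noisy points on an arc, illustrating
# how a SOM follows curvature that a linear PCA fit would miss.
rng = np.random.default_rng(4)

theta = rng.uniform(0, np.pi, 500)
data = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(500, 2))

n_nodes = 20
nodes = rng.uniform(-1, 1, (n_nodes, 2))        # initial node centers
idx = np.arange(n_nodes)

for it in range(2000):
    lr = 0.5 * (1 - it / 2000)                  # decaying learning rate
    width = max(1.0, n_nodes / 4 * (1 - it / 2000))  # neighborhood radius
    x = data[rng.integers(len(data))]
    # Encoder step: closest node on the map (best matching unit)
    bmu = np.argmin(np.sum((nodes - x) ** 2, axis=1))
    # Update step: move the BMU and its map neighbors toward x, weighted by
    # a kernel over the distance between node indices on the map.
    h = np.exp(-((idx - bmu) ** 2) / (2 * width ** 2))
    nodes += lr * h[:, None] * (x - nodes)

print("first fitted node centers:\n", nodes[:5])
```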
We note that vector quantization is a data reduction method that is not necessarily employed for dimensionality reduction. In dimensionality reduction the learning problem seeks to identify low-dimensional features in high-dimensional data, whereas quantization amounts to finding representative clusters of the data. Vector quantization must also be distinguished from clustering, as in the former the number of desired centers is determined a-priori, whereas clustering aims to identify meaningful groupings in the data. When these groupings are represented by some prototypes, clustering and quantization have strong similarities.
2.3.1. Generative adversarial networks (GAN). GANs are learning algorithms that result in
a generative model, i.e. a model that produces data according to a probability distribution,
which mimics that of the data used for its training. The learning machine is composed
of two networks that compete with each other in a zero sum game (Goodfellow et al.
2014). The generative network produces candidate data examples that are evaluated by the
discriminative, or critic, network to optimize a certain task. The generative (G) network’s
training objective is to synthesize novel examples of data to fool the discriminative network
into misclassifying them as belonging to the true data distribution. The weights of these
networks (N) are obtained through a process, inspired by game theory, called adversarial (A)
learning. The final objective of the GAN training process is to identify the generative model
that produces an output that reflects the underlying system. Labeled data are provided by
the discriminator network and the function to be minimized is the KL divergence between
the two distributions. In the ensuing “game”, the discriminator aims to maximize the
probability of it discriminating between true data and data produced by the generator,
while the generator aims to minimize the same probability. Because the generative and
discriminative networks essentially train themselves, after initialization with labeled training
data, this procedure is often referred to as self-supervised. This self-training process adds to
the appeal of GANs, but at the same time one must be cautious about whether an equilibrium will ever be reached in the above-mentioned game. As with other training algorithms, large
amounts of data help the process but, at the moment, there is no guarantee of convergence.
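A toy GAN sketch in PyTorch illustrating the adversarial game on a one-dimensional target distribution; the architectures, optimizer settings, and data are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

# Generator G maps noise to samples; discriminator D scores them. Both are
# trained in the zero-sum game described above (Goodfellow et al. 2014).
torch.manual_seed(0)

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0         # "true" data: N(2, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator: maximize the probability of telling real from fake
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into labeling fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print("generated mean:", float(G(torch.randn(1000, 8)).mean()))
```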
3.1.2. Clustering and classification. Clustering and classification are cornerstones of ma-
chine learning. There are dozens of mature algorithms to choose from, depending on the
size of the data and the desired number of categories. The k-means algorithm has been
successfully employed by Kaiser et al. (2014) to develop a data-driven discretization of a
high-dimensional phase space for the fluid mixing layer. This low-dimensional representa-
tion, in terms of a small number of clusters, enabled tractable Markov transition models
for how the flow evolves in time from one state to another. Because the cluster centroids
exist in the data space, it is possible to associate each cluster centroid with a physical flow
field, lending additional interpretability. In Amsallem et al. (2012) k-means clustering was
used to partition phase space into separate regions, in which local reduced-order bases were
constructed, resulting in improved stability and robustness to parameter variations.
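A sketch in the spirit of the cluster-based approach of Kaiser et al. (2014), assuming time-ordered snapshot data: k-means provides the centroids, and consecutive cluster labels yield a Markov transition matrix. The random snapshot matrix is a hypothetical placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans

# Discretize phase space with k-means, then count transitions between
# consecutive cluster labels to build a row-stochastic Markov matrix.
rng = np.random.default_rng(5)
snapshots = rng.normal(size=(2000, 50))        # rows are time-ordered states

k = 10
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(snapshots)
labels = km.labels_

P = np.zeros((k, k))
for a, b in zip(labels[:-1], labels[1:]):
    P[a, b] += 1

# Normalize rows that have outgoing transitions
row_sums = P.sum(axis=1, keepdims=True)
P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)

# Each centroid lives in the data space, so it can be visualized as a flow field
print("transition matrix:", P.shape, "centroids:", km.cluster_centers_.shape)
```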
Figure 6
Unsupervised learning example: merging of two vortices (top), POD modes (middle), and respective modes from a linear auto-encoder (bottom). Note that unlike POD modes, the auto-encoder modes are not orthogonal and are not ordered.
Classification is also widely used in fluid dynamics to distinguish between various canon-
ical behaviors and dynamic regimes. Classification is a supervised learning approach where
labeled data is used to develop a model to sort new data into one of several categories.
Recently, Colvert et al. (2018) investigated the classification of wake topology (e.g., 2S,
2P+2S, 2P+4S) behind a pitching airfoil from local vorticity measurements using neural
networks; extensions have compared performance for various types of sensors (Alsalman
et al. 2018). In Wang & Hemati (2017) the k nearest neighbors (KNN) algorithm was used
to detect exotic wakes. Similarly, neural networks have been combined with dynamical sys-
tems models to detect flow disturbances and estimate their parameters (Hou et al. 2019).
Related graph and network approaches in fluids by Nair & Taira (2015) have been used
for community detection in wake flows (Meena et al. 2018). Finally, one of the earliest
examples of machine learning classification in fluid dynamics by Bright et al. (2013) was
based on sparse representation (Wright et al. 2009).
3.1.3. Sparse and randomized methods. In parallel to machine learning, there have been
great strides in sparse optimization and randomized linear algebra. Machine learning and
sparse algorithms are synergistic, in that underlying low-dimensional representations facili-
tate sparse measurements (Manohar et al. 2018) and fast randomized computations (Halko
et al. 2011). Decreasing the amount of data to train and execute a model is important when
a fast decision is required, as in control. Compressed sensing has already been leveraged
for compact representations of wall-bounded turbulence (Bourguignon et al. 2014) and for
POD based flow reconstruction (Bai et al. 2014).
Low-dimensional structure in data also facilitates dramatically accelerated computations
via randomized linear algebra (Mahoney 2011; Halko et al. 2011). If a matrix has low-rank
structure, then there are extremely efficient matrix decomposition algorithms based on
random sampling; this is closely related to the idea of sparsity and the high-dimensional
geometry of sparse vectors. The basic idea is that if a large matrix has low-dimensional
structure, then with high probability this structure will be preserved after projecting the
columns or rows onto a random low-dimensional subspace, facilitating efficient downstream
computations. These so-called randomized numerical methods have the potential to transform computational linear algebra, providing accurate matrix decompositions at a fraction of the cost of deterministic methods.
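A basic randomized SVD sketch following the random-projection idea of Halko et al. (2011); the rank and oversampling parameters are illustrative.

```python
import numpy as np

# Randomized SVD: project the columns of A onto a random low-dimensional
# subspace, orthonormalize, and solve the small decomposition. Accurate
# with high probability when A has (numerically) low rank.
rng = np.random.default_rng(6)

def randomized_svd(A, rank, n_oversample=10):
    m, n = A.shape
    Omega = rng.normal(size=(n, rank + n_oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis for range(A)
    B = Q.T @ A                          # small (rank + p) x n matrix
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_b)[:, :rank], s[:rank], Vt[:rank]

# Low-rank test matrix
A = rng.normal(size=(2000, 20)) @ rng.normal(size=(20, 1000))
U, s, Vt = randomized_svd(A, rank=20)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print("relative reconstruction error:", err)
```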
3.1.4. Super resolution and flow cleansing. Much of machine learning is focused on imaging
science, providing robust approaches to improve resolution and remove noise and corruption
based on statistical inference. These super resolution and de-noising algorithms have the
potential to improve the quality of both simulations and experiments in fluids.
Super resolution involves the inference of a high-resolution image from low-resolution
measurements, leveraging the statistical structure of high-resolution training data. Several
approaches have been developed for super resolution, for example based on a library of
examples (Freeman et al. 2002), sparse representation in a library (Yang et al. 2010), and
most recently based on convolutional neural networks (Dong et al. 2014). Experimental flow
field measurements from particle image velocimetry (PIV) (Willert & Gharib 1991; Adrian
1991) provide a compelling application where there is a tension between local flow resolution
and the size of the imaging domain. Super resolution could leverage expensive and high-
resolution data on smaller domains to improve the resolution on a larger imaging domain.
Large eddy simulations (LES) (Germano et al. 1991; Meneveau & Katz 2000) may also
benefit from super resolution to infer the high-resolution structure inside a low-resolution
cell that is required to compute boundary conditions. Recently Fukami et al. (2018) have
developed a CNN-based super-resolution algorithm and demonstrated its effectiveness on
turbulent flow reconstruction, showing that the energy spectrum is accurately preserved.
One drawback of super-resolution is that it is often extremely costly computationally, mak-
ing it useful for applications where high-resolution imaging may be prohibitively expensive;
however, improved neural-network based approaches may drive the cost down significantly.
We note also that Xie et al. (2018) recently employed GANs for super-resolution.
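As an illustration, the sketch below assembles an SRCNN-style network in PyTorch in the spirit of Dong et al. (2014); it is not the architecture of Fukami et al. (2018), and the field sizes and layer widths are hypothetical.

```python
import torch
import torch.nn as nn

# SRCNN-style super resolution: bicubically upsample a coarse field, then
# let a small CNN learn the residual fine-scale structure.
class SuperResCNN(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bicubic",
                              align_corners=False)
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, 5, padding=2),
        )

    def forward(self, coarse):              # coarse: (batch, 1, H, W)
        x = self.up(coarse)
        return x + self.net(x)              # predict a correction to the upsample

model = SuperResCNN()
coarse = torch.randn(8, 1, 16, 16)          # placeholder low-resolution fields
fine_hat = model(coarse)                    # (8, 1, 64, 64)
print(fine_hat.shape)
# Training would minimize e.g. an L2 loss against high-resolution DNS/PIV data.
```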
The processing of experimental PIV and particle tracking has also been one of the first
applications of machine learning. Neural networks have been used for fast PIV (Knaak et al.
1997) and particle tracking velocimetry (Labonté 1999), with impressive demonstrations for
three-dimensional Lagrangian particle tracking (Ouellette et al. 2006). More recently, deep
convolutional neural networks have been used to construct velocity fields from PIV image
pairs (Lee et al. 2017). Related approaches have also been used to detect spurious vectors
in PIV data (Liang et al. 2003) to remove outliers and fill in corrupt pixels.
3.2.1. Linear models through nonlinear embeddings: DMD and Koopman analysis. Many classical techniques in system identification may be considered machine learning, as they are data-driven models that generalize beyond the training data. The dynamic mode decomposition (DMD) (Schmid 2010; Kutz et al. 2016) is a modern approach to extract spatiotemporal coherent structures from time-series data of fluid flows, resulting in a low-dimensional linear model for the evolution of these dominant coherent structures. DMD is based on data-driven regression and is equally valid for time-resolved experimental and numerical data. DMD is closely related to the Koopman operator (Rowley et al. 2009; Mezić 2013), an infinite-dimensional linear operator that describes the evolution of measurement functions of the state of a dynamical system.
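A minimal exact-DMD sketch following Schmid (2010) and Kutz et al. (2016); the snapshot matrix is a placeholder and the truncation rank is illustrative.

```python
import numpy as np

# Exact DMD: fit the best-fit linear operator advancing snapshots
# X_k to X_{k+1}, truncated to rank r.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 101))            # columns are time-ordered snapshots
X1, X2 = X[:, :-1], X[:, 1:]

r = 10
U, s, Vt = np.linalg.svd(X1, full_matrices=False)
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vt[:r].T

A_tilde = Ur.T @ X2 @ Vr @ np.linalg.inv(Sr)   # reduced linear operator
eigvals, W = np.linalg.eig(A_tilde)            # DMD eigenvalues
Phi = X2 @ Vr @ np.linalg.inv(Sr) @ W          # DMD modes (exact DMD)

print("leading DMD eigenvalues:", eigvals[:3])
```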
3.2.2. Neural network modeling. Over the last three decades neural networks have been
used to model dynamical systems and fluid mechanics problems. Early examples include
the use of NNs to learn the solutions of ordinary and partial differential equations (Dis-
sanayake & Phan-Thien 1994; Gonzalez-Garcia et al. 1998; Lagaris et al. 1998). We note
that the potential of this work has not been fully explored, and in recent years there have been further advances (Chen et al. 2018; Raissi & Karniadakis 2018), including networks that are discrete or continuous in time. We note also the possibility of using these methods to uncover latent variables and reduce the number of parametric studies often associated with partial differential equations (Raissi et al. 2019). Neural networks are also frequently employed in nonlinear
system identification techniques, such as NARMAX, which are often used to model fluid
systems (Semeraro et al. 2016; Glaz et al. 2010). In fluid mechanics, neural networks were
widely used to model heat transfer (Jambunathan et al. 1996), turbomachinery (Pierret &
Van den Braembussche 1998), turbulent flows (Milano & Koumoutsakos 2002), and other
problems in aeronautics (Faller & Schreck 1996).
Recurrent neural networks with LSTMs (Hochreiter & Schmidhuber 1997) have been revolutionary for speech recognition, and they are considered one of the landmark successes of artificial intelligence. They are currently being used to model dynamical systems and for data-driven predictions of extreme events (Wan et al. 2018; Vlachas et al. 2018). An interesting finding of these studies is that combining data-driven and reduced-order models is a potent method that outperforms each of its components in a number of studies. Gener-
ative adversarial networks (GANs) (Goodfellow et al. 2014) are also being used to capture
physics (Wu et al. 2018). GANs have potential to aid in the modeling and simulation of
turbulence (Kim et al. 2018), although this field is nascent.
Despite the promise and widespread use of neural networks in dynamical systems, a number of challenges remain. Neural networks are fundamentally interpolative, and so the function is only well approximated in the span (or under the probability distribution) of the sampled data used to train them. Thus, caution should be exercised when using neural network models for extrapolation tasks. In many computer vision and speech recognition applications, the training data are so vast that most new tasks amount to interpolation on the training data, a scale of training that has not yet been achieved in fluid mechanics.
3.2.4. Closure models with machine learning. The use of machine learning to develop turbu-
lence closures is an active area of research (Duraisamy et al. 2019). The extreme separation
of spatiotemporal scales in turbulent flows makes it exceedingly costly to resolve all scales in
simulation, and even with Moore’s law, we are decades away from resolving all scales in rel-
evant configurations (e.g., aircraft, submarines, etc.). It is common to truncate small scales
and model their effect on the large scales with a closure model. Common approaches include
Reynolds averaged Navier Stokes (RANS) and large eddy simulation (LES). However, these
models may require careful tuning to match fully resolved simulations or experiments.
Machine learning has been used to identify and model discrepancies in the Reynolds
stress tensor between a RANS model and high-fidelity simulations (Ling & Templeton
2015; Parish & Duraisamy 2016; Ling et al. 2016b; Xiao et al. 2016; Singh et al. 2017; Wang
et al. 2017). Ling & Templeton (2015) compare support vector machines, Adaboost decision
trees, and random forests to classify and predict regions of high uncertainty in the Reynolds
stress tensor. Wang et al. (2017) use random forests to built a supervised model for the
discrepancy in the Reynolds stress tensor. Xiao et al. (2016) leveraged sparse online velocity
measurements in a Bayesian framework to infer these discrepancies. In related work, Parish
& Duraisamy (2016) develop the field inversion and machine learning modeling framework,
that builds corrective models based on inverse modeling. This framework was later used by
Singh et al. (2017) to develop a neural network enhanced correction to the Spalart-Allmaras
RANS model, with excellent performance. A key result by Ling et al. (2016b) employed the
first deep network architecture with many hidden layers to model the anisotropic Reynolds
stress tensor, as shown in Fig. 7. Their novel architecture incorporates a multiplicative layer
to embed Galilean invariance into the tensor predictions. This provides an innovative and
simple approach to embed known physical symmetries and invariances into the learning
architecture (Ling et al. 2016a), which we believe will be essential in future efforts that
combine learning and physics. For large eddy simulation closures, Maulik et al. (2019) have
employed artificial neural networks to predict the turbulence source term from coarsely
resolved quantities.
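As a hedged illustration of the random-forest discrepancy models described above (in the spirit of Wang et al. 2017, not their implementation), the sketch below regresses one component of the Reynolds-stress discrepancy on mean-flow features; all arrays are hypothetical placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Regress the discrepancy between RANS and high-fidelity Reynolds stress
# from local mean-flow feature vectors q (placeholders for real data).
rng = np.random.default_rng(8)

n_cells, n_features = 5000, 12
q = rng.normal(size=(n_cells, n_features))        # mean-flow features per cell
delta_tau = rng.normal(size=n_cells)              # stress discrepancy component

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(q, delta_tau)

# At prediction time, the learned discrepancy would correct the RANS
# Reynolds stress in new configurations with similar flow physics.
tau_correction = model.predict(q[:5])
print(tau_correction)
```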
3.2.5. Challenges of machine learning for dynamical systems. Applying machine learning to
model physical dynamical systems poses a number of unique challenges and opportunities.
Model interpretability and generalizability are essential cornerstones in physics. A well
crafted model will yield hypotheses for new phenomena that have not been observed before.
This principle is clearly exhibited in the parsimonious formulation of classical mechanics in
Newton’s second law.
High-dimensional systems, such as those encountered in unsteady fluid dynamics, have
the challenges of multi-scale dynamics, sensitivity to noise and disturbances, latent variables
and transients, all of which require careful attention when applying machine learning tech-
niques. In machine learning for dynamics, we distinguish two tasks: discovering unknown
physics and improving models by incorporating known physics. Many learning architectures cannot readily incorporate physical constraints in the form of symmetries, boundary
conditions, and global conservation laws. This is a critical area for continued development
and a number of recent works have presented generalizable physics models (Battaglia et al.
2018).
4.2.1. Neural networks for control. Neural networks have received significant attention for
system identification (see Sec. 3) and control, including applications in aerodynamics (Phan
et al. 1995). The application of NNs to turbulent flow control was pioneered by Lee et al. (1997), who reduced the skin-friction drag of a turbulent boundary layer using local wall-normal blowing and suction based on a few skin-friction sensors. A sensor-based
control law was learned from a known optimal full-information controller, with little loss
in overall performance. Furthermore, a single-layer network was optimized for skin-friction
drag reduction without incorporating any prior knowledge of the actuation commands.
Both strategies led to a conceptually simple local opposition control. Several other studies
employ neural networks, e.g. for phasor control (Rabault et al. 2019) or even frequency cross
talk. The price for the theoretical advantage of approximating arbitrary nonlinear control
laws is the need for many parameters to be optimized. Neural network control may require
exorbitant computational or experimental resources for configurations with complex high-
dimensional nonlinearities and many sensors and actuators. At the same time, the training
time of neural networks has been improved by several orders of magnitude since these early
applications, which warrants further investigation into their potential for flow control.
4.2.2. Genetic algorithms for control. Genetic algorithms have been deployed to solve a
number of flow control problems. They require that the structure of the control law is
pre-specified and contains only a few adjustable parameters. An example of GA for con-
trol design in fluids was used for experimental mixing optimization of the backward-facing
step (Benard et al. 2016). As with neural network control, the learning time increases with
the number of parameters, making it challenging or even prohibitive for controllers with
nonlinearities, e.g. a constant-linear-quadratic law, with signal history, e.g. a Kalman filter,
or with multiple sensors and actuators.
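A generic GA sketch with a pre-specified control-law structure and a few adjustable parameters, as described above; the cost function stands in for a wind-tunnel or simulation evaluation and is purely hypothetical.

```python
import numpy as np

# Evolve a small set of control-law parameters w to maximize a cost J(w).
rng = np.random.default_rng(9)

def J(w):                                    # placeholder cost: peak at w_star
    w_star = np.array([0.5, -1.0, 2.0])
    return -np.sum((w - w_star) ** 2)

pop_size, n_params, n_gen, sigma = 40, 3, 100, 0.3
pop = rng.normal(size=(pop_size, n_params))

for gen in range(n_gen):
    fitness = np.array([J(w) for w in pop])
    elite = pop[np.argsort(fitness)[-pop_size // 4:]]         # selection
    parents = elite[rng.integers(len(elite), size=(pop_size, 2))]
    children = parents.mean(axis=1)                           # crossover (blend)
    pop = children + sigma * rng.normal(size=children.shape)  # mutation

best = pop[np.argmax([J(w) for w in pop])]
print("best parameters:", best)
```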
Genetic programming has been used extensively in active control for engineering ap-
plications (Dracopoulos 1997; Fleming & Purshouse 2002) and in recent years in several
flow control plants. This includes the learning of multi-frequency open-loop actuation,
multi-input sensor feedback, and distributed control. We refer to Duriez et al. (2016) for an in-depth description of the method and to Noack (2018) for an overview of the plants. We remark that most control laws have been obtained within 1000 test evaluations, each requiring only a few seconds in a wind tunnel.

4.3. Flow Control via Reinforcement Learning

Reinforcement learning: An agent learns a policy to maximize its long-term rewards by interacting with its environment.
In recent years RL has advanced beyond the realm of games and has become a fundamental mode of problem solving in a growing number of domains, including reproducing the dynamics of hydrological systems (Loucks et al. 2005), actively controlling the oscillatory laminar flow around bluff bodies (Guéniat et al. 2016), studying the individual (Gazzola et al. 2014) or the collective motion of fish (Gazzola et al. 2016; Novati et al. 2017; Verma et al. 2018), maximizing the range of simulated (Reddy et al. 2016) and robotic (Reddy et al. 2018) gliders, optimizing the kinematic motion of UAVs (Kim et al. 2004; Tedrake et al. 2009), and optimizing the motion of microswimmers (Colabrese et al. 2017, 2018). Figure 8 provides a schematic of reinforcement learning with compelling examples related to fluid mechanics.
Fluid mechanics knowledge is essential for applications of RL, as success or failure hinges
on properly selecting states, actions, and rewards that reflect the governing mechanisms of
the flow problem. Natural organisms and their sensors, such as the visual system in a bird
or the lateral line in a fish, can guide the choice of states. As sensor technologies progress at
a rapid pace, the algorithmic challenge may be that of optimal sensor placement (Papadim-
itriou & Papadimitriou 2015; Manohar et al. 2018). The actions reflect the flow actuation
device and may involve body deformation or wing flapping. Rewards may include energetic
factors, such as the cost of transport, or proximity to the center of a fish school to avoid pre-
dation. The computational cost of RL remains a challenge to its widespread adoption, but
we believe this deficiency can be mitigated by the parallelism inherent to RL. There is grow-
ing interest in methods designed to be transferable from low-accuracy (e.g. 2-dimensional)
to high-accuracy (e.g. 3-dimensional) simulations (Verma et al. 2018), or from simulations
to related real-world applications (Richter et al. 2016; Bousmalis et al. 2017).
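For concreteness, a generic tabular Q-learning sketch showing the state-action-reward loop; fluid-mechanics applications typically replace this toy environment and table with flow simulations and deep function approximators.

```python
import numpy as np

# Tabular Q-learning on a 1-D chain: the agent learns a policy maximizing
# long-term reward by interacting with the environment (goal: rightmost state).
rng = np.random.default_rng(10)

n_states, n_actions = 10, 2                  # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(500):
    s = 0
    for step in range(50):
        # Epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else np.argmax(Q[s])
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward at the goal
        # Q-learning update: bootstrap from the best next-state value
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if r > 0:
            break

print("greedy policy:", np.argmax(Q, axis=1))   # should move right everywhere
```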
SUMMARY POINTS
1. Machine learning entails powerful information processing algorithms that are rel-
evant for modeling, optimization, and control of fluids. Effective problem solvers
will have expertise in machine learning and in-depth knowledge of fluid mechanics.
2. Fluid mechanics is a traditional discipline of big data. For decades it has used
machine learning to understand, predict, optimize, and control flows. Currently,
machine learning capabilities are advancing at an incredible rate, and fluid me-
chanics is beginning to tap into the full potential of these powerful methods.
3. Many tasks in fluid mechanics, such as reduced-order modeling, shape optimization,
and feedback control, may be posed as optimization and regression tasks. Machine
learning can dramatically improve optimization performance and reduce conver-
gence time. Machine learning is also used for dimensionality reduction, identifying
low-dimensional manifolds and discrete flow regimes, which benefit understanding.
4. Flow control strategies have been traditionally based on the precise sequence: from
understanding to modeling and then control. The machine-learning paradigm sug-
gests more flexibility and iterates between data driven and first principle approaches.
FUTURE ISSUES
1. Machine learning algorithms often come without guarantees for performance, ro-
bustness, or convergence, even for well-defined tasks. How can interpretability,
generalizability, and explainability of the results be achieved?
2. Incorporating and enforcing known flow physics is a challenge and opportunity
for machine learning algorithms. Can we hybridize data driven and first principle
approaches in fluid mechanics?
3. There are many possibilities to discover new physical mechanisms, symmetries,
constraints, and invariances from fluids data.
4. Data driven modeling may be a potent alternative in revisiting existing empirical
laws in fluid mechanics.
5. Machine learning encourages open sharing of data and software. Can this assist the
development of frameworks for reproducible and open science in fluid mechanics?
6. Fluids researchers will benefit from interfacing with the machine learning commu-
nity, where the latest advances are reported in peer reviewed conferences.
ACKNOWLEDGMENTS
SLB acknowledges funding from the Army Research Office (ARO W911NF-17-1-0306,
W911NF-17-1-0422) and the Air Force Office of Scientific Research (AFOSR FA9550-18-
1-0200). BRN acknowledges funding by LIMSI-CNRS, Université Paris Sud (SMEMaG),
the French National Research Agency (ANR-11-IDEX-0003-02, ANR-17-ASTR-0022) and
the German Research Foundation (CRC880, SE 2504/2-1, SE 2504/3-1). PK acknowledges
funding from the ERC Advanced Investigator Award (FMCoBe, No. 34117), the Swiss
National Science Foundation and the Swiss Supercomputing center (CSCS). We are grate-
ful for discussions with Nathan Kutz (University of Washington), Jean-Christophe Loiseau
(ENSAM ParisTech, Paris), François Lusseyran (LIMSI-CNRS, Paris), Guido Novati (ETH
Zurich), Luc Pastur (ENSTA ParisTech, Paris), and Pantelis Vlachas (ETH Zurich).
LITERATURE CITED
Adrian RJ. 1991. Particle-imaging techniques for experimental fluid mechanics. Annu. Rev. Fluid
Mech. 23:261–304
Alsalman M, Colvert B, Kanso E. 2018. Training bioinspired sensors to classify flows. Bioinspiration
Biomim. 14:016009
Amsallem D, Zahr MJ, Farhat C. 2012. Nonlinear model order reduction based on local reduced-
order bases. Int. J. Numer. Meth. Engin. 92:891–916
Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, et al. 2017. Hindsight experience replay. In Adv. Neural Inf. Process. Syst.
Bai Z, Wimalajeewa T, Berger Z, Wang G, Glauser M, Varshney PK. 2014. Low-dimensional ap-
proach for reconstruction of airfoil data via compressive sensing. AIAA J. 53:920–933
Baldi P, Hornik K. 1989. Neural networks and principal component analysis: Learning from examples without local minima. Neural Netw. 2:53–58 [How linear PCA (or POD) connects to linear neural networks.]
Barber D. 2012. Bayesian reasoning and machine learning. Cambridge University Press
Barber RF, Candès EJ. 2015. Controlling the false discovery rate via knockoffs. Ann. Stat. 43:2055–2085 [Reproducible science: a framework.]
Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, et al. 2018. Relational
inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261
Bellman R. 1952. On the theory of dynamic programming. Proc. Natl. Acad. Sci. USA 38:716–719
Benard N, Pons-Prats J, Periaux J, Bugeda G, Braud P, et al. 2016. Turbulent separated shear flow
control by surface plasma actuator: experimental optimization by genetic algorithm approach.
Exp. Fluids 57:22:1–17
Bewley TR, Moin P, Temam R. 2001. DNS-based predictive control of turbulence: an optimal
benchmark for feedback algorithms. J. Fluid Mech. 447:179–225
Bishop CM, James GD. 1993. Analysis of multiphase flows using dual-energy gamma densitometry
and neural networks. Nucl. Instrum. Methods Phys. Res. 327:580–593
Bölcskei H, Grohs P, Kutyniok G, Petersen P. 2019. Optimal approximation with sparsely connected deep neural networks. SIAM J. Math. Data Sci. 1:8–45 [Theoretical analysis of the approximation properties of deep neural networks.]
Bourguignon JL, Tropp JA, Sharma AS, McKeon BJ. 2014. Compact representation of wall-bounded turbulence using compressive sampling. Phys. Fluids 26:015109
Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, et al. 2017. Using simulation and domain
adaptation to improve efficiency of deep robotic grasping. arXiv preprint arXiv:1709.07857
Breiman L. 2001. Random forests. Mach. Learn. 45:5–32