Integrating Scientific Knowledge with Machine Learning for Engineering and Environmental Systems
arXiv:2003.04919v6 [[Link]-ph] 14 Mar 2022
JARED WILLARD*, University of Minnesota
XIAOWEI JIA*, University of Pittsburgh
SHAOMING XU, University of Minnesota
MICHAEL STEINBACH, University of Minnesota
VIPIN KUMAR, University of Minnesota
There is a growing consensus that solutions to complex science and engineering problems require novel
methodologies that are able to integrate traditional physics-based modeling approaches with state-of-the-art
machine learning (ML) techniques. This paper provides a structured overview of such techniques. Application-
centric objective areas for which these approaches have been applied are summarized, and then classes of
methodologies used to construct physics-guided ML models and hybrid physics-ML frameworks are described.
We then provide a taxonomy of these existing techniques, which uncovers knowledge gaps and potential
crossovers of methods between disciplines that can serve as ideas for future research.
CCS Concepts: • General and reference → Surveys and overviews; • Computing methodologies → Machine learning.
Additional Key Words and Phrases: physics-guided, neural networks, deep learning, physics-informed, theory-
guided, hybrid, knowledge integration
ACM Reference Format:
Jared Willard, Xiaowei Jia, Shaoming Xu, Michael Steinbach, and Vipin Kumar. 2020. Integrating Scientific
Knowledge with Machine Learning for Engineering and Environmental Systems. 1, 1 (March 2020), 35 pages.
[Link]
1 INTRODUCTION
Machine learning (ML) models, which have already found tremendous success in commercial
applications, are beginning to play an important role in advancing scientific discovery in environmental
and engineering domains traditionally dominated by mechanistic (e.g., first-principles) models
[30, 124, 128, 141, 142, 157, 232, 283]. The use of ML models is particularly promising in scientific
problems involving processes that are not completely understood, or where it is computationally
infeasible to run mechanistic models at desired resolutions in space and time. However, the application
of even state-of-the-art black-box ML models has often met with limited success in
This work was supported by NSF grant #1934721 and by DARPA award W911NF-18-1-0027.
* Both authors contributed equally to this research.
Authors’ addresses: Jared Willard, willa099@[Link], University of Minnesota, Minneapolis, Minnesota, 55455; Xiaowei Jia,
xiaowei@[Link], University of Pittsburgh, Pittsburgh, Pennsylvania, 15260; Shaoming Xu, xu000114@[Link], University
of Minnesota, Minneapolis, Minnesota, 55455; Michael Steinbach, stei0062@[Link], University of Minnesota, Minneapolis,
Minnesota, 55455; Vipin Kumar, kumar001@[Link], University of Minnesota, Minneapolis, Minnesota, 55455.
© 2020 Association for Computing Machinery.
, Vol. 1, No. 1, Article . Publication date: March 2020.