2023 White Paper Physics Guided Machine Learning

Physics-Guided Machine Learning:
Putting AI to work in industry

Table of contents

Summary
Introduction
What is physics-guided machine learning?
    Constructing physics-guided machine learning
    Mathematical modeling vs machine learning
    Adding physics to machine learning models: a deep dive
    Feature engineering
    Proxy models
    Customizing the loss function using physics knowledge
Physics-guided machine learning solutions
    Oil-water separation
    Virtual flow meters
Conclusion

About Cognite

Cognite is a global industrial SaaS company that supports the full-scale digital transformation of asset-heavy industries around the world. Our core Industrial DataOps platform, Cognite Data Fusion®, enables data and domain users to collaborate to quickly and safely develop, operationalize, and scale industrial AI solutions and applications.

©COGNITE 2023 — COGNITE.COM


Cognite Data Fusion® codifies industrial domain
knowledge into software that fits into your existing
ecosystem and enables scale from proofs of
concepts to truly data-driven operations to deliver
both profitability and sustainability.

WHY COGNITE DATA FUSION® → BENEFITS FOR YOUR TEAMS → CONTACT SALES →
Summary

Hydrocarbon production systems generate huge data sets, often with time series stretching back decades. However, much of the data may be obsolete due to changing reservoir conditions and modifications to assets, and there may be scant data close to optimal operating conditions due to the inadequacy of existing optimization tools.

Data science, artificial intelligence (AI), and machine learning can contribute significantly to the optimization of production operations, and there is a trend toward hybrid AI, which combines data science with traditional physics-based simulators to deliver added value.

This paper explains how to use physical principles in feature engineering to improve machine learning outcomes. Equipped with energy, mass, and force balances; pressure, volume, and temperature (PVT) data for production fluids; and dimensional and order-of-magnitude analyses, oil and gas companies can squeeze additional value from a pure data-based approach while avoiding expensive, time-consuming, and often inaccurate simulations.

Physics-guided machine learning can add tremendous value to digitalization initiatives across a wide range of production optimization use cases and speed up decision processes that mitigate production losses in complex industrial phenomena.
Introduction

Artificial intelligence (AI) has been immensely successful in areas such as image recognition, natural language processing, advertising, and games. Let’s call them classic applications of AI. However, for industries such as oil and gas and manufacturing, the success stories are fewer, despite the high value potential. This is because of the fundamental differences between the classic applications of AI and those used for industrial problems.

There are four main reasons why:

⇢ One: For many of the classic applications of AI there are few or zero competing methods. One example: There are few mathematical models describing consumer behavior. In the oil and gas industry, in comparison, the majority of the problems are governed by the laws of physics and can be described using mathematical and phenomenological models that form the basis of advanced simulators. The industry has been using these simulators for decades to support critical decisions, and while the simulators have varying degrees of accuracy and uncertainty, users have an understanding of them and take uncertainty into account when making decisions based on their results.

⇢ Two: For many of the classic applications of AI the consequence of an erroneous prediction is not severe, and the size of the error is often not important. As an example, if an irrelevant advertisement is displayed on a website, it’s not the end of the world. Either it results in a click or it doesn’t. For the oil and gas industry, the size of the error is usually critical. A small error in the predicted surge volume is usually not a problem, but a large error could lead to a trip or, worse, a flooding incident.

⇢ Three: For classic applications of AI the amount of training data may be enormous. For example, the ImageNet data set contains more than 14 million pictures. The amount of text available for natural language processing is almost unfathomable. For some applications it is even possible to automatically generate training data. It is a common misconception that the oil and gas industry has large amounts of data. Although a typical installation will be instrumented with thousands of sensors that may have been collecting data for decades, the actual amount of relevant data is small.

⇢ Four: Finally, an important difference between some of the industries where AI has been successful and the oil and gas industry is the quality of the data. A typical data set used by classic applications of AI will have no or negligible noise levels. Much of the data used in oil and gas engineering comes from physical sensors located in harsh environments, which means they are subject to varying degrees of noise and bias and different raw data compression levels.

Machine learning has a clear value potential in the oil and gas industry, but it must go hand-in-hand with physics.
What is physics-guided machine learning?

Early in the Fourth Industrial Revolution it was widely believed that through digitalization all problems would be solved using AI, and that machine learning would replace mathematical models. That understanding is changing. AI and machine learning are increasingly seen as complementary tools to be used with existing industry-specific tools (for example physics simulators). One example of this is the use of mathematical modeling and feature engineering to reinforce a machine learning model.

Constructing physics-guided machine learning

Assume we want to predict a set of parameters Y from a set of observed variables X. Typically X represents our sensor data. We denote this relationship as

Y=f(X),

where f is our predictive model. One example of an application is where Y is a property that is not continuously measured, such as the quality of a product. Instead of only relying on infrequent spot samples, it is possible to create a model f that approximates the product quality Y based on the state X of the system. This is often called a virtual or soft sensor, and it is particularly useful when the results of spot samples are not available in time to carry out mitigating actions if the quality is not satisfactory.

There are several ways to create the model f. Historically this was done using physics insight and mathematical modeling. There are various techniques for deriving such models, where the most rigorous approach is based on first principles like conservation and balance principles. Relevant examples are conservation of mass, momentum, volume, and energy, which are the foundation for many of the successful physics simulators used in the oil and gas industry.

Not all problems are easily described using first principles, or the resulting mathematical model may be too complex to be solved within a reasonable time frame. An effective approach is to average effects in time or time and space, reducing the complexity of the phenomena that are modeled and sometimes also the number of spatial dimensions. Turbulence modeling is an example of averaging small-scale effects while hydraulic modeling is an example of averaging effects in entire spatial dimensions.

On the other side of the spectrum we find empirical or phenomenological modeling, where only measurements are used to derive the model. Pure machine learning belongs to this category. In between, there is a continuous range from rigorous first-principle modeling to pure empirical modeling, and this is the area we want to explore to understand how we can construct physics-guided machine learning.
Mathematical modeling vs machine learning

Let’s compare the strengths and weaknesses of the more rigorous physics simulators and machine learning methods:

| Physics simulators | Machine learning |
| --- | --- |
| Can predict without access to historical data (from first oil) | Requires a large set of training data for relevant conditions |
| Tested, tried, and proven across industries, even for critical applications | Unproven; considered hard to interpret (“black box”) |
| Require a mathematical model derived from physics principles (not always possible) | Possible to set up without any knowledge of the underlying physics |
| Require a complete set of data such as boundary conditions, geometry, and fluid and material properties | Can work even on a small set of sensors (but may not be very accurate) |
| Can predict outside the range of data used to create and validate the model (with varying uncertainty) | High uncertainty outside the range of the training data |
| Can predict future events; transient models | Fewer success stories for predicting time-dependent problems |
| Provide all values from the equations at all positions in the numerical grid | Provides only the output variables it was trained on |

Simulators and machine learning models clearly complement each other. Combining the two methods keeps the strengths and reduces the weaknesses.

Adding physics to machine learning models: a deep dive

One of the most fundamental problems in the oil and gas industry is pressure drop in a pipeline. This determines the maximum throughput for a given pipe length and diameter; alternatively, it shows the need for a pressure boost to meet a required flow rate.

For the purposes of this section, we will consider single-phase pressure drop measurements from two different laboratories¹ for six different fluids (He, O₂, N₂, Air, CO₂, and SF₆), two different pipes (both diameter D and roughness ϵ), different temperatures T, different pressures P, and different velocities U, giving us six different input variables, where one is a categorical variable. Assuming we need 10 data points per continuous variable to resolve the behavior, we need 10⁵ experiments in total per fluid. In addition, reservoir fluids consist of thousands of different components, leading to a literally infinite number of possible compositions. This exponentially increasing data requirement as a function of the number of input parameters is known as the curse of dimensionality.

¹ Swanson, C. J., Julian, B., Ihas, G. G., and Donnelly, R. J. 2002. Pipe flow measurements over a wide range of Reynolds numbers using liquid helium and various gases. J. Fluid Mech. 461, 51–60.
Zagarola, M. V., and Smits, A. J. 1998. Mean-flow scaling of turbulent pipe flow. J. Fluid Mech. 373, 33–79.
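The data requirement quoted for the pipe flow experiments follows from a simple full-factorial count. The sketch below reproduces that arithmetic; the function name and the idea of covering the inputs with a regular grid are illustrative assumptions, not something the paper prescribes.

```python
def full_factorial_size(points_per_variable, n_continuous, n_categories=1):
    """Number of experiments needed to cover a regular grid of inputs.

    Assumption: every continuous variable is sampled at the same number of
    points, and the grid is repeated for each level of the categorical input.
    """
    return n_categories * points_per_variable ** n_continuous


# Continuous inputs from the pipe flow example: D, roughness, T, P, U.
continuous_inputs = ["D", "roughness", "T", "P", "U"]

per_fluid = full_factorial_size(10, len(continuous_inputs))
all_six_fluids = full_factorial_size(10, len(continuous_inputs), n_categories=6)

print(per_fluid, all_six_fluids)
```

Each additional continuous input multiplies the count by the number of points per variable, which is why reducing the number of inputs is so valuable.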
Obviously it is not realistic to generate such a vast amount of data, so the goal in this example is to transform the input parameters into new features, which simplifies the problem the machine learning model needs to approximate. From fluid mechanics we know that the pressure gradient depends on the pipe diameter D and the wall shear stress τ. The wall shear stress is not a measured quantity, but we know that the wall shear stress depends on the fluid density ρ and the velocity U. The fluid density is a property of the chosen fluid and can be computed using the laws of thermodynamics and the pressure and temperature for the individual experiments. The pressure gradient is expressed by a force balance (momentum conservation), written here in the standard Darcy-Weisbach form:

−dP/dx = 4τ/D = λρU²/(2D)

There is a remaining unknown in the equation, namely λ, which is known as the friction factor. An important step in any data science work is to investigate the data by visualization. We will plot the friction factor instead of the pressure gradient, but we still have the challenge of selecting the parameters that the friction factors should be plotted against. Again, from fluid mechanics we know the importance of the Reynolds number Re, and we select that as our x-value. Applying this transformation on all the experiments results in Figure 1. Note that we also did a log transformation of both axes.

Fig. 1: Single-phase pipe flow experiments from Oregon (blue) and Princeton (green).

There are three important observations:

1. The transformation collapses all the input features into one (Reynolds number). Hence, our model Y=f(X) will be λ=f(Re).

2. There seems to be a change in the trend of the friction factor around Reynolds number 2300. This is well-known from fluid mechanics and is the transition from laminar to turbulent flow.

3. With the log transformation, the friction factor looks very close to linear for Reynolds numbers less than 2300 and resembles a slowly decaying exponential for Reynolds numbers larger than 4000. It seems like a good idea to change our model to log(λ) = f(log(Re)).

Figure 2 shows the result of a linear regression for Re<2300 and a Gaussian process regression for Re>4000.

Fig. 2: Linear regression model for Re<2300 and Gaussian process regression for Re>4000.

The transformation reduced our five-parameter input space to one parameter (the Reynolds number), greatly reducing the data needed. It also transformed the problem into a mostly smooth problem with a linear part and a slowly decaying part. Importantly, it also allowed us to isolate and model the discontinuous behavior around Re=2300.

The log transformation from a strongly nonlinear behavior to a more linear behavior reduced the need for data. In addition, it also makes it easier to
impose regularization in the model, making it less sensitive to noise in the data.

Note that it is possible to fit a model to the entire set of data. As an example, a three-layer feedforward neural network gave good predictions. However, the extrapolation property for lower Reynolds numbers was poor. The linear model has excellent extrapolation properties, since it captures the correct physics in the transformed space.

One final but very important lesson from this example: We know from fluid mechanics that there exist well-established models for the friction factor, like the Colebrook model. From this we know that the friction factor is also a function of the relative pipe roughness ϵ_rel = ϵ/D, something that is supported by other experiments. The pipes from the Oregon and Princeton experiments have different relative pipe roughnesses; however, the Oregon data does not contain data for Reynolds numbers in the region where the wall roughness becomes important (the hydraulically rough region). If we had a more extensive data set, our model should have been log(λ) = f(log(Re), ϵ_rel). This is a reminder that for nonlinear problems the importance of a feature may be strongly dependent on the operational conditions, and it may not be revealed by the available data.

Feature engineering

The single-phase pipe flow example above shows the power of feature engineering, and it illustrates two important techniques.

The first is transformation of features using dimensional analysis. A commonly used starting point for dimensional analysis is Buckingham’s Pi theorem, which states that the number of dimensionless parameters is equal to the number of relevant variables minus the number of independent dimensions. Imagine an example with five variables (U, ρ, μ, D, ϵ) and three independent dimensions (time, length, and mass). According to Buckingham’s Pi theorem, that gives us 5-3=2 dimensionless parameters, which is the same as the number of input parameters to the Colebrook friction factor model. A challenge is that there are endless possibilities for creating dimensionless parameters. It takes experience and sometimes a lot of trial and error to find the best set of dimensionless parameters.

A very attractive method in machine learning is transfer learning, where knowledge from solving one problem can be transferred to a different but related problem. One example is to train a facial recognition model first on images from ImageNet to learn how to recognize a face and then on images of a specific person to be able to recognize that individual. This is highly attractive for problems with scarce data but numerous similar problems. An example would be to train a model on data from a set of wells instead of each well individually. For this to be possible a given combination of input features from two different wells has to have the same output value. For the single-phase pressure drop example, the data from the two different labs was comparable when looking at the Reynolds number and the friction factor instead of the pressure gradient and the original input variables.

The second technique that was applied in the example in the previous section was the inclusion of physics models. In the experiments the fluid composition was known and the pressure and temperature were measured; however, the model needed the fluid properties density and viscosity. In some situations the fluid properties can be measured separately, but they can also be computed using equations of state, the composition of the fluid, and the pressure and temperature. By converting the fluid composition, the pressure, and the temperature to fluid properties, we relieved the machine learning method of the burden of learning this complex behavior from a scarce data set. Another way of interpreting this is that we used our physics knowledge and some sensor values to create virtual sensors of the fluid properties that are used as input features instead of the originally measured pressure and temperature.

Most sensors are already feature-engineered. A temperature sensor can be of the thermocouple type, where the electrical voltage between two dissimilar metals is measured. The temperature is calculated based on the voltage measurement
and knowledge of the proportionality constant. Another slightly more sophisticated example is a Venturi single-phase flow meter, which measures the pressure drop across a throat and computes the volumetric flow rate based on Bernoulli’s equation, the continuity equation, and the fluid properties. The fluid properties can be computed based on the fluid composition, the measured pressure, and the temperature. Consequently, the majority of sensors already incorporate important physics knowledge. Feature engineering is just a continuation of this approach.

Feature-engineered variables do not have to be perfect in order to be useful. They only need to capture the main features of the behavior. The discrepancy will be compensated for by the machine learning model. Most mathematical modeling techniques have less flexibility in this sense. When a functional form is chosen, the unknowns in the model are determined by matching data. If the functional form is correct, it results in a robust model that can extrapolate with low uncertainty. However, if the functional form does not capture the true physics, it has no way of compensating for it. A simple example would be to fit a linear model to a quadratic function. The mathematical model will never be correct, while a machine learning model that takes the linear model as an input feature will be able to correct for the missing physics, at least in the range of the data.

Proxy models

When optimizing a process we do not only need to know the current state but also how changes in operational conditions will influence the outcome we are trying to optimize. This usually requires numerous calls to our model f. If f is a computationally expensive model, for example a physics simulator, it may be impossible to compute the optimal operational conditions fast enough for the operator to act on the advice. An added challenge is that a simulator may produce no results for certain input conditions (crash), or it may have a nonsmooth behavior as a function of some of the control parameters. This creates additional challenges for the optimizer.

A well-known technique from optimization is the use of proxy models. Instead of optimizing on the full model f, we optimize on an approximation model f̂. One technique is to create f̂ by fitting a machine learning method to presimulated results from a physics simulator. For a sufficiently large data set the proxy model will inherit the accuracy and predictive capabilities of the simulator while having the evaluation speed and robustness of a machine learning model, provided the model is not used outside the range of the training data. A remedy for evaluations outside the available data set of f̂ is to automatically run the simulator for those evaluations, extending the training set and retraining the machine learning model f̂.

Customizing the loss function using physics knowledge

Most out-of-the-box machine learning models use unweighted least squares as the default loss function. However, in reality, the consequence of errors is dependent on the operational conditions. When predicting surge volumes arriving at the receiving facility, large relative errors for small surges have little or no consequence, but medium errors for larger surges may lead to trips, emergency flaring, or in the worst case accidents. This is particularly important if the data set is biased to the less problematic area, which is very common due to operational practicalities; most historical operations have occurred safely, so there is little if any data for irregular conditions.

There are numerous ways to include this knowledge into the loss function, and it is an important technique to ensure that during training we prioritize the accuracy of the model in the region where accuracy is important. A simple approach is to increase the weight in the loss function for data in the region where errors are critical.

This needs to be combined with rebalancing the data set. For a classification problem the groups are explicitly given, making it easy to detect imbalance. For our regression problem the groups are not explicitly given, but our understanding of physics will help us determine how to classify the data into groups so that we can evaluate if we have an unbalanced data set and hence compensate for it.
The same weighted loss function technique can be used for weighting data based on other importance factors such as the age of the data, assuming that the field is changing and older data is less relevant than newer.
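The weighting ideas described here can be sketched as a per-sample weight that multiplies the squared error. This is a minimal illustration under stated assumptions: the critical surge volume, the weight factor, and the half-life used to discount old data are hypothetical numbers chosen for the example, not values from any real facility.

```python
def sample_weight(surge_volume, age_years,
                  critical_volume=50.0, critical_factor=5.0,
                  half_life_years=3.0):
    """Weight for one training sample (all thresholds are hypothetical).

    Samples in the region where errors are critical get a larger weight,
    and the weight halves for every `half_life_years` of data age.
    """
    region = critical_factor if surge_volume >= critical_volume else 1.0
    recency = 0.5 ** (age_years / half_life_years)
    return region * recency


def weighted_mse(y_true, y_pred, weights):
    """Weighted least-squares loss, normalized by the sum of the weights."""
    total = sum(weights)
    return sum(w * (t - p) ** 2
               for w, t, p in zip(weights, y_true, y_pred)) / total


# Two recent samples: a small surge and a large, critical surge.
y_true = [10.0, 80.0]
y_pred = [12.0, 70.0]
weights = [sample_weight(v, age_years=0.0) for v in y_true]

plain = weighted_mse(y_true, y_pred, [1.0, 1.0])
weighted = weighted_mse(y_true, y_pred, weights)
print(plain, weighted)
```

With these numbers the error on the large surge dominates the weighted loss, so a model trained against it is pushed to be accurate where a miss would be costly.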

It is equally important to report the error from the test data not only as a single number but as a function of the parameters that characterize the criticality of an error. This information is crucial to be able to understand the uncertainty and correctly determine safety margins for the model.
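One simple way to report the test error as a function of criticality is to bin the test samples by the quantity that drives the consequence of an error and compute the error per bin. The sketch below assumes that the true value itself (for example the surge volume) is that quantity; the bin edges and data are made up for illustration.

```python
from collections import defaultdict


def mean_abs_error_by_bin(y_true, y_pred, bin_edges):
    """Mean absolute error per bin of the true value.

    Bin 0 holds values below the first edge, bin 1 values from the first
    edge up to the second, and so on. `bin_edges` must be sorted ascending.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        idx = sum(t >= edge for edge in bin_edges)
        sums[idx] += abs(t - p)
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sorted(counts)}


# Hypothetical test data and bin edges.
y_true = [5.0, 15.0, 60.0, 90.0]
y_pred = [6.0, 12.0, 50.0, 100.0]

per_bin = mean_abs_error_by_bin(y_true, y_pred, bin_edges=[10.0, 50.0])
overall = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall, per_bin)
```

In this toy data the overall mean absolute error hides the fact that the error in the highest bin is several times larger than in the lowest one, which is exactly the information needed when setting safety margins.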

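Returning to the single-phase pipe flow example, the feature transformation and the linear fit in log space can be sketched in a few lines. This is an illustration only: it assumes the Darcy friction factor convention and the usual definition Re = ρUD/μ, and it uses synthetic laminar data generated from the textbook relation λ = 64/Re rather than the Oregon or Princeton measurements.

```python
import math


def reynolds_number(rho, velocity, diameter, viscosity):
    """Re = rho * U * D / mu (dynamic viscosity mu)."""
    return rho * velocity * diameter / viscosity


def friction_factor(dp_dx, rho, velocity, diameter):
    """Darcy friction factor from the measured pressure gradient."""
    return 2.0 * diameter * abs(dp_dx) / (rho * velocity ** 2)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic laminar data (assumption: lambda = 64 / Re).
re_values = [100.0, 300.0, 1000.0, 2000.0]
lambdas = [64.0 / re for re in re_values]

# Fit log(lambda) = f(log(Re)) with a straight line.
slope, intercept = fit_line([math.log10(re) for re in re_values],
                            [math.log10(lam) for lam in lambdas])

# Extrapolate below the training range, to Re = 10.
lambda_at_10 = 10 ** (intercept + slope * math.log10(10.0))
print(slope, lambda_at_10)
```

Because the straight line in log space matches the physics of laminar flow, the fit extrapolates sensibly below the range of the training data, which is the behavior the paper attributes to the linear model.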
Physics-guided machine learning solutions

Cognite takes a hybrid approach to artificial intelligence, combining the best of data-driven machine learning and physics-based modeling.

Cognite differentiates itself from pure AI companies with a hybrid data science model unique to industrial reality.

Hybrid AI: physics-driven modeling and virtual simulations combined with data-driven machine learning.

Oil-water separation

Solution: A smart monitoring system that visualizes all data relevant for troubleshooting water contamination and a recommender system with an underlying machine learning model to identify worst actors related to high oil-in-water concentrations.

Impact: In one example, the solution saved an oil and gas operator an estimated $6 million a year.

Cognite’s approach

Produced water disposal is one of many challenges at oil and gas facilities with high water-cut wells. Keeping the oil contamination level in the produced water below environmental limits requires an efficient separation process, which is governed by a series of complex physical interactions.

Significant production losses are associated with situations with high oil-in-water levels, because safely discharging water to the sea requires slowing down production while troubleshooting for worst actors on the facility.

To identify what could be causing high oil-in-water concentration, operators often take spot sample measurements at different parts of the production facility and then perform mitigating actions once the bad actor is located. Operators rarely have much information to determine where to start the search, however, which can make finding the bad actor a time-consuming process. Each spot sampling campaign can take up to two hours and occur multiple times a week.

Separation of oil-water dispersions is a complex, multistage process that involves gravity settling (separators), centrifuging (hydrocyclones), flotation (degasser), and the use of chemicals. In situations where the disposed water is highly contaminated, it is extremely difficult to determine which part of the plant is responsible for the problem. Possible causes range from excessive emulsification due to wellhead choke setting to inefficient flotation in the degasser due to unfavorable pressure.

To make matters even more complicated, oil and gas plants undergo continuous adjustments imposed by control room engineers in order to maximize production and minimize the risk of hazards. Furthermore, occasional modifications to the plant composition, such as the startup of a new well or replacing equipment or injection chemicals, may have a significant impact on the produced water treatment.

It is practically impossible to accurately model oil-in-water concentrations based on live operational conditions with a traditional (deterministic) approach. Even the most sophisticated process simulators would require tremendous computational resources and skilled engineers and still yield undeterminable accuracy.

The only realistic approach to modeling oil in water is by means of regression, using computing power to find hidden patterns and relationships between
operational conditions (X) and the oil-in-water concentration (Y). Furthermore, since the problem is both multivariate and nonlinear in nature, we have to solve a nonlinear multivariate regression problem.

The technical toolkit required to solve this sort of problem exists within the machine learning domain. It includes ensemble algorithms such as gradient boosted trees (GBT) and recurrent neural networks (RNN). One important difference between these algorithms is the way temporal coherency is embedded in their respective architectures. GBTs consider each row of the data set as a system snapshot, while RNNs take into account the sequential nature of time series data.

Addressing obstructive events in historical data, for example the replacement of equipment or a chemical compound, requires a dynamic machine learning approach. One such approach is to automatically retrain and reconfigure models until validation criteria are met.

The output from machine learning models needs to undergo comprehensive processing in order to render it interpretable. Machine learning interpretability libraries such as SHAP and LIME let users extract local importance of features with respect to any given target prediction. This is an essential aspect of the process, as the importance measure will in turn be associated with a potential root cause of local oil-in-water observation.

Physics and domain knowledge are included in the model through an extensive data engineering pipeline. First-principle physics modeling of key physical processes, such as choke dispersion and separator efficiency, provides a way to compress the variable space and reduce the number of dependent variables in the data set. Multiphase flow and process simulators Digital Oil Field and Unisim, respectively, enrich the data set with key data such as fluid properties and well-specific flow rates. The flow rates can be used to calculate the time delay between wellhead and point of discharge. This time shift must be taken into account, as some wells are located more than 30 km from the processing facility.

The choke dispersion model considers an energy balance between hydrodynamic kinetic energy (E_h) and potential surface energy (E_S). The former is a function of the pressure drop ΔP_choke, which in turn depends on the flow rate Q_m across the choke, the kinematic mixture viscosity ν_m, and k-factor. The latter depends on the droplet diameter d_drop and surface tension σ, where the droplet diameter is modeled using Hinze’s model and the surface tension is interpolated from a lookup table generated using offline thermodynamics simulations. The ratio E_h / E_S provides an indication of the dispersion level that arises from shear forces induced by the choke settings. The expression for the ratio also involves d_pipe, the pipe’s inner diameter.

The ratio also enables us to compress the input variable space significantly and reduce the number of dependent features to train on. Although the absolute value of the dispersion levels can be inaccurate as a result of the leading-order approximation and lack of tuning, when used as an input parameter to machine learning, it showed significant improvement of the model predictions as well as the importance allocation.

The separator efficiency model was formulated by means of a time-scale balance approach. Here the buoyancy time scale that arises from the Stokes drag and buoyancy force is compared to the separator residence time scale of the water body. This model results in a dimensionless parameter that comprises multiple independent variables, including flow rates, temperature, separator geometry, water level, and fluid properties. Instead of introducing each parameter as an input to the machine learning model, this single nondimensional parameter represents the entire separator. When importance is allocated to this parameter, the user of the tool will understand that this particular equipment is behaving abnormally.
Fig. 3: An illustration of the choke dispersion model.
Fig. 4: Separator efficiency model.
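As a rough illustration of the time-scale balance behind the separator efficiency model, the sketch below compares a Stokes rise time for an oil droplet with the residence time of the water body. The function names, the choice of the water level as the rise distance, and the example values are assumptions made for illustration; the source does not give the exact formulation.

```python
def stokes_rise_velocity(d_drop, rho_water, rho_oil, mu_water, g=9.81):
    """Terminal rise velocity [m/s] of an oil droplet in water.

    Balances the Stokes drag against the net buoyancy force.
    d_drop [m], densities [kg/m3], mu_water is dynamic viscosity [Pa s].
    """
    return (rho_water - rho_oil) * g * d_drop ** 2 / (18.0 * mu_water)


def separator_time_scale_ratio(q_water, water_volume, water_level,
                               d_drop, rho_water, rho_oil, mu_water):
    """Dimensionless ratio of buoyancy time scale to residence time scale.

    Assumption: the droplet must rise through the full water level [m]
    while the water spends water_volume / q_water seconds in the vessel.
    """
    t_rise = water_level / stokes_rise_velocity(d_drop, rho_water, rho_oil, mu_water)
    t_residence = water_volume / q_water
    return t_rise / t_residence


if __name__ == "__main__":
    # Hypothetical operating point: 100 micron droplets, 1 m water level,
    # 20 m3 of water in the separator, 0.05 m3/s water flow rate.
    ratio = separator_time_scale_ratio(
        q_water=0.05, water_volume=20.0, water_level=1.0,
        d_drop=100e-6, rho_water=1000.0, rho_oil=850.0, mu_water=1e-3,
    )
    print(f"time-scale ratio: {ratio:.2f}")
```

Under these assumptions, a ratio well above one would suggest that droplets of the chosen size leave the vessel before they can rise out of the water phase. A single feature of this kind stands in for flow rate, geometry, water level, and fluid properties at once, which is the point the text makes about feature importance.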

Figure 5 ► shows the information available to the operator in the control room. The upper part of the dashboard shows the feature importance for the different components. The schematics to the left show the influence of the different well templates and the main separation components, while the table to the right shows the number for each component in the separation train. The lower graph shows the predicted oil-in-water concentration in blue and the measured concentration in green. The predictions match the measured values well except for a few short time-scale incidents.

Fig. 5: Schematics of the major components in the separation trains. The first plot shows the feature importance of the different well templates and separation stages. The second plot shows the predicted and measured oil-in-water concentration.

Virtual flow meters

Solution: A combination of physical modeling of fluid flow with data analytics on sensor data, creating a virtual window into the production system that continuously supplies gas, oil, and water flow rates.

Impact: In one example, the solution saved an oil and gas operator an estimated $5-10 million a year by giving petroleum engineers and field operators 24-hour access to granular insights for better, faster decision-making.

Cognite’s approach

Flow rates of gas, hydrocarbon liquid, and water are key inputs to most optimization solutions. Upstream of separation, the flow is a mixture of gas, oil, and water, making measurement a difficult task. Multiphase flow meters (MPFM) can measure two- or three-phase flow using different techniques, but these meters are expensive and need frequent calibration in order to produce reliable measurements.

To understand a multiphase virtual flow meter (VFM), it is useful to understand how a typical MPFM works. Designs may differ, but the operational principle of most meters is as follows: The fluids are mixed and pass through a throat where the pressure drop is measured, similar to most single-phase flow meters. The average density is measured using a gamma densitometer or an x-ray sensor, and the water fraction is measured using a capacitance or conductance sensor. The fluid properties are computed based on the fluid composition and the measured pressure and temperature, and the rates of the different phases are then computed based on a mathematical model for the pressure drop across the throat.

A virtual flow meter is a virtual sensor that uses the existing sensors (as shown in Figure 6 ►) combined with a mathematical model of the multiphase flow. Many commercial vendors offer VFMs. What these VFMs have in common is that they are based on rigorous models for conservation of mass, momentum, energy, and volume. These are sophisticated solutions that require little to no data, can predict outside the available data, and can be used for look-ahead and planning applications. A virtual flow meter based on physics-guided machine learning, in comparison, uses simpler and more approximative physics models.

Fig. 6: An illustration of a well and the commonly available sensors used in a virtual flow meter.
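The MPFM operating principle described above can be sketched with a textbook Venturi relation and a no-slip (homogeneous) mixture assumption. The discharge coefficient, the reading of the water fraction as a water-in-liquid ratio, and all function names are illustrative assumptions; this is not a description of any particular vendor's meter.

```python
import math


def venturi_mixture_rate(dp, rho_mix, d_throat, d_pipe, cd=0.98):
    """Volumetric mixture rate [m3/s] from the throat pressure drop dp [Pa].

    Standard incompressible Venturi equation; cd is an assumed
    discharge coefficient.
    """
    beta = d_throat / d_pipe
    area = math.pi * d_throat ** 2 / 4.0
    return cd * area * math.sqrt(2.0 * dp / (rho_mix * (1.0 - beta ** 4)))


def split_phases(q_mix, rho_mix, water_in_liquid, rho_gas, rho_oil, rho_water):
    """Split the mixture rate into gas, oil, and water rates.

    Assumes no slip between phases, so the measured mixture density is a
    volume-weighted average of the gas and liquid densities.
    """
    rho_liq = water_in_liquid * rho_water + (1.0 - water_in_liquid) * rho_oil
    gas_fraction = (rho_liq - rho_mix) / (rho_liq - rho_gas)
    gas_fraction = max(0.0, min(1.0, gas_fraction))  # keep within physical bounds
    q_gas = gas_fraction * q_mix
    q_liq = q_mix - q_gas
    q_water = water_in_liquid * q_liq
    q_oil = q_liq - q_water
    return q_gas, q_oil, q_water


if __name__ == "__main__":
    # Hypothetical sensor readings and fluid densities at meter conditions.
    q_mix = venturi_mixture_rate(dp=20e3, rho_mix=500.0, d_throat=0.05, d_pipe=0.10)
    q_gas, q_oil, q_water = split_phases(q_mix, 500.0, 0.3, 50.0, 800.0, 1000.0)
    print(q_mix, q_gas, q_oil, q_water)
```

A physics-guided virtual flow meter replaces the dedicated density and water-fraction sensors in this sketch with features built from sensors that already exist in the well.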

We have derived a list of different engineered features where some of the features require additional information, such as a CV curve for the choke, fluid properties, and information about the wellbore. If any part of this information does not exist, it can either be approximated or a different engineered feature can be selected. Remember that the engineered feature does not have to be exact, but it should approximate the correct trend. The purpose of the machine learning model is to compensate for the imperfections in the physics.

An initial observation is that the well tests are commonly reported at standard conditions, but the physics is governed by the in-situ conditions. The conversion from standard conditions to in-situ conditions requires information about the fluid densities as a function of pressure and temperature, as well as flashing between liquid and gas. If the correct information is not available, it can be approximated using the ideal gas law for gas and assuming incompressibility for the liquids.

The production choke can be viewed as a single-phase flow meter with an adjustable throttling. From Bernoulli’s equation we can derive a simple valve equation which relates the measured pressure drop across the valve dPchoke = PWH – PDC to the mixture’s volumetric flow rate, the flow area in the choke, and fluid properties. The valve opening is usually reported as the fraction of the stem travel, and we convert it to the flow area using the choke CV curve.

The pressure drop in the well is the sum of the hydrostatic pressure drop in the well and the frictional pressure drop

where ρM is the mixture density, g is the acceleration due to gravity, H is the difference in height between the bottomhole and wellhead pressure sensors, L is the length of the wellbore between the bottomhole and wellhead pressure sensors, D is the wellbore inner diameter, ReM is the mixture’s Reynolds number, ϵrel is the relative wellbore roughness, and U is the average velocity of the fluid mixture.

From our single-phase flow experiment example, we already know how to estimate the frictional pressure drop, assuming homogeneous mixing of the phases. The densities for the different phases and the height difference between the pressure sensors give the hydrostatic pressure drop. Note that both the frictional and hydrostatic pressure drop estimations require an estimate of the cross-sectional fraction of each phase.

Assuming low velocity, the pressure drop will be dominated by the hydrostatic pressure drop, and hence we have a good estimate of the gas-liquid ratio by making an assumption of the water cut. Since the density difference between oil and water is low, it is a relatively robust estimation. This is an indication that the well pressure drop dPwell has a strong influence on determining the gas-liquid fraction. From this we understand how error and drift in these sensors affect the model.

The heat balance in the well is another important engineered feature. Again, this is related to the mass flow F, the specific heat capacity of the phases Cp, the heat transfer coefficient Ω, and an estimation of the surrounding temperature Tr.

F Cp (TBH – TWH) = Ω π D L (Tf – Tr)

The difference in heat capacity between oil and water is usually significantly larger than the difference in density, indicating that the heat balance engineered feature strongly influences the estimation of the water cut. From our experience, the wellhead temperature sensor is often the most unreliable sensor. If it is poorly insulated, it is strongly affected by weather conditions, causing an increased uncertainty in the water-cut predictions. However, if weather information exists, a machine learning-based model will to a certain extent be able to compensate for this. Unknown or uncertain parameters, such as the heat transfer coefficient Ω and the rock temperature Tr, are effectively estimated by the machine learning algorithm.

Note that several of the parameters may change along the wellbore. In this simple approach, this is handled by using a single representative value. More sophisticated approaches would integrate along the wellbore.
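The well pressure drop feature described above, the sum of a hydrostatic and a frictional term, might be computed as in the sketch below. It assumes a Darcy-Weisbach friction term, the Haaland approximation for the friction factor, and a mixture dynamic viscosity `mu_m` for the Reynolds number. The source names the symbols ρM, g, H, L, D, ReM, ϵrel, and U but does not state which friction correlation is used, so the correlation and all function names here are assumptions.

```python
import math


def darcy_friction_factor(re_m, rel_roughness):
    """Darcy friction factor as a function of Re_M and relative roughness.

    Assumption: laminar flow below Re = 2300, otherwise the explicit
    Haaland approximation of the Colebrook equation.
    """
    if re_m < 2300.0:
        return 64.0 / re_m
    inv_sqrt_f = -1.8 * math.log10((rel_roughness / 3.7) ** 1.11 + 6.9 / re_m)
    return 1.0 / inv_sqrt_f ** 2


def well_pressure_drop(rho_m, u, height, length, diameter, mu_m,
                       rel_roughness, g=9.81):
    """Return (hydrostatic, frictional) pressure drop in Pa.

    Uses single representative values for the mixture density rho_m and
    velocity u, as in the simple approach described in the text.
    """
    dp_hydro = rho_m * g * height
    if u == 0.0:
        return dp_hydro, 0.0  # no flow, no friction
    re_m = rho_m * abs(u) * diameter / mu_m
    f = darcy_friction_factor(re_m, rel_roughness)
    dp_fric = f * (length / diameter) * 0.5 * rho_m * u ** 2
    return dp_hydro, dp_fric


if __name__ == "__main__":
    # Hypothetical well: 2000 m height difference, 2500 m measured length.
    dp_hydro, dp_fric = well_pressure_drop(
        rho_m=800.0, u=0.5, height=2000.0, length=2500.0,
        diameter=0.1, mu_m=2e-3, rel_roughness=1e-4,
    )
    print(f"hydrostatic: {dp_hydro:.0f} Pa, frictional: {dp_fric:.0f} Pa")
```

At the low velocity chosen for this hypothetical well, the hydrostatic term is far larger than the frictional term, which matches the text's argument that the measured well pressure drop mainly constrains the mixture density and hence the gas-liquid fraction.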

Figure 7 ► shows comparisons of the volumetric oil (Qo) and water (Qw) rate between well test data (black line) and virtual flow meter predictions (red line), using the approach detailed above. Note that the well test data in the comparison was not part of the training data set for the virtual flow meter model.

Fig. 7: Comparison of well test data and virtual flow meter results (the x-axis corresponds to the data points' time stamps). The first plot shows the volumetric oil rate Qo, while the second plot shows the volumetric water flow rate Qw. Both rates are plotted in Sm3/day; the legend distinguishes Data and Prediction.

Conclusion

AI and machine learning should not be considered a goal in digitalization, but rather seen as another tool in the toolbox available to heavy-asset industries. Using machine learning to replace an existing solution is not disruption; however, when machine learning is used to solve previously unsolvable problems, or when it significantly outperforms existing solutions, then it becomes a disruptive tool. Combining our understanding of physics with data is the key to unlocking the potential of machine learning in industrial settings.

Some consider the addition of physics a sign of defeat, since the appeal of machine learning is that models are supposed to find relations themselves based on nothing but data. This couldn’t be further from the truth. The combination of physics and data science represents an opportunity to gain a competitive advantage. Machine learning finds patterns from information, and by adding physics, we provide more information, and more importantly, more accurate information.

The oil and gas industry already has the subject-matter experts needed to take advantage of physics-guided machine learning. Now the challenge is to set up cross-disciplinary teams with both subject-matter experts and data scientists, and to create a common working language that both camps can speak as they collaborate.

The building blocks are there: the data, the tools, and the domain knowledge. It is up to the industry to put them all together to unleash the value potential that lies ahead.
