
Received November 27, 2019, accepted December 27, 2019, date of publication January 13, 2020, date of current version January 23, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.2966228

Bayesian Optimization for Adaptive Experimental Design: A Review

STEWART GREENHILL, SANTU RANA, SUNIL GUPTA, PRATIBHA VELLANKI, AND SVETHA VENKATESH
Applied Artificial Intelligence Institute, Deakin University, Waurn Ponds Campus, Geelong, VIC 3216, Australia
Corresponding author: Stewart Greenhill ([email protected])
This work was supported by the Australian Government through the Australian Research Council (ARC). The work of Svetha Venkatesh
was supported by the ARC Australian Laureate Fellowship under Grant FL170100006.

ABSTRACT Bayesian optimisation is a statistical method that efficiently models and optimises expensive
‘‘black-box’’ functions. This review considers the application of Bayesian optimisation to experimental
design, in comparison to existing Design of Experiments (DOE) methods. Solutions are surveyed for a range
of core issues in experimental design including: the incorporation of prior knowledge, high dimensional
optimisation, constraints, batch evaluation, multiple objectives, multi-fidelity data, and mixed variable types.

INDEX TERMS Bayesian methods, design for experiments, design optimization, machine learning algorithms.

The associate editor coordinating the review of this manuscript and approving it for publication was Bijoy Chand Chatterjee.

I. INTRODUCTION
Experiments are fundamental to scientific and engineering practice. A well-designed experiment yields an empirical model of a process, which facilitates understanding and prediction of its behaviour. Experiments are often costly, so formal Design of Experiments methods (or DOE) [1]–[3] optimise measurement of the design space to give the best model from the fewest observations.

Models are important decision tools for design engineers. Understanding of design problems is enhanced when the design space can be explored cheaply and rapidly, allowing adjustment of the number and range of design variables, identification of ineffective constraints, balancing of multiple design objectives, and optimisation [4]. Industrial processes must be robust to environmental conditions, component variation, and variability around a target [3]. Robust Parameter Design (RPD) [5]–[7] systematically characterises the influence of uncontrollable variables and noise. The number of observations required to build a model increases rapidly with the number of variables, making it challenging to investigate systems with many variables. Screening experiments can identify subsets of important variables to be later investigated in more detail [8], [9]. Optimisation is important in most industrial applications, and there are often multiple objectives which must be balanced, including yield, robustness, and cost. In classical experimental design, modelling and optimisation are separate processes, but newer model-based approaches can potentially sample more efficiently by adapting to the response surface, and can incorporate optimisation into the modelling process.

Machine learning has made great strides in the recent past, and we present here a machine learning approach to experimental design. Bayesian Optimisation (BO) [19], [20] is a powerful method for efficient global optimisation of expensive black-box functions. The experimental method introduces specific challenges: how to handle constraints, high dimensionality, mixed variable types, multiple objectives, parallel (batch) evaluation, and the transfer of prior knowledge. Several reviews have presented BO for a technical audience [20]–[22]. Our review surveys recent methods for systematically handling these challenges within a BO framework, with an emphasis on applications in science and engineering, and in the context of modern experimental design.

Bayesian optimisation is a sample-efficient optimisation algorithm and thus suits optimisation of expensive, black-box systems. By ''black-box'' we mean that the objective function does not have a closed-form representation, does not provide function derivatives, and only allows point-wise evaluation.


FIGURE 1. Sampling methods used in experimental design. In classical factorial designs, samples are placed on a geometric grid. Space-filling designs are used with a variety of non-linear models. Sample requirements are determined heuristically, but these designs are empirically much more efficient than grids.

Several optimisation algorithms can handle optimisation of black-box functions, such as multi-start derivative-free local optimisers, e.g. COBYLA [36], evolutionary algorithms, e.g. ISRES [37], or Lipschitzian methods such as DIRECT [34]. However, none of these are designed to be sample efficient, and all need to evaluate a function many times to perform optimisation. In contrast, Bayesian optimisation uses a model-based approach with an adaptive sampling strategy to minimise the number of function evaluations.

Past approaches to experimental design have closely coupled sampling and modelling. Factorial designs assume a linear model and sample at orthogonal corners of the design space (see Figure 1). For more complex non-linear models, general-purpose space-filling designs such as Latin hypercubes offer a more uniform coverage of the design space. For N sample points in k dimensions, there are (N!)^(k−1) possible Latin hypercube designs, and finding a suitable design involves balancing space-filling (e.g. via entropy, or potential energy) with other desirable properties such as orthogonality. Much literature exists on the design of Latin hypercubes, and many research issues remain open [10], [11], such as: mixing of discrete and continuous variables, incorporation of global sensitivity information, and sequential sampling.
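To make the stratified construction concrete, the following is a minimal sketch of an unoptimised Latin hypercube design: each dimension is divided into n strata and each stratum is sampled exactly once. The function name and defaults are illustrative, and no space-filling or orthogonality criterion is applied.

```python
import numpy as np

def latin_hypercube(n, k, rng=None):
    """n samples in [0, 1]^k, with each dimension stratified into n cells
    and each cell sampled exactly once (one random permutation per axis)."""
    rng = rng or np.random.default_rng(0)
    jitter = rng.uniform(size=(n, k))                      # position within each cell
    cells = np.stack([rng.permutation(n) for _ in range(k)], axis=1)
    return (cells + jitter) / n

X = latin_hypercube(n=8, k=3)   # an 8-point design in 3 dimensions
```

A practical design would typically generate many such candidates and keep the one that best satisfies a space-filling criterion, as discussed above.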
Response Surface Methodology (RSM) [3], [12] is a sequential approach which has become the primary method for industrial experimentation. In its original form, response surfaces are second-order polynomials which are determined using central composite factorial experiments, and a path of steepest ascent is used to seek an optimal point. For robust design, replication is used to estimate noise factors, and optimisation must consider dual responses for process mean and variance. Approaches for handling multiple objectives include ''split-plot'' techniques, ''desirability functions'' and Pareto fronts [13]. Non-parametric RSM can be more general than second-order polynomials, and uses techniques such as Gaussian processes, thin-plate splines, and neural networks. Alternative optimisation approaches include simulated annealing, branch-and-bound, and genetic algorithms [14].


In many areas, experiments are performed with detailed computer simulations of physical systems. Aerospace designers frequently work with expensive CFD (computational fluid dynamics) and FEA (finite element analysis) simulations. Multi-agent simulations are used to model how actor behaviour determines the outcome of group interactions in areas such as defence, networking, transportation, and logistics. Design and Analysis of Computer Experiments (or DACE, after [15]) differs from DOE in several ways. Simulations are generally deterministic, without random effects and uncontrolled variables, so less emphasis is placed on dealing with measurement noise. Simulations often include many variables, so there is more need to handle high dimensionality and mixed variable types. Where the response is complex, non-parametric models are used, including Gaussian Processes, Multivariate Adaptive Regression Splines, and Support Vector Regression [4], [16], [17].

A problem with classical DOE and space-filling designs is that the sampling pattern is determined before measurements are made, and cannot adapt to features that appear during the experiment. In contrast, adaptive sampling [16], [18] is a sequential process that decides the location of the next sample by balancing two criteria. Firstly, it samples in areas that have not been previously explored (e.g. based on distance from previous samples). Secondly, it samples more densely in areas where interesting behaviour is observed, such as rapid change or non-linearity. This can be detected using local gradients, prediction variance (e.g. where uncertainty is modelled), by checking agreement between the model and data (cross-validation), or agreement between an ensemble of models. BO is a form of model-based global optimisation (MBGO [16]), which uses adaptive sampling to guide the experiment towards a global optimum. Unlike pure adaptive sampling, MBGO considers the optimum of the modelled objective when deciding where to sample.

Recently, there has been a surge in applying Bayesian optimisation to design problems involving physical products and processes. In [23], Bayesian optimisation is applied in combination with a density functional theory (DFT) based computational tool to design low thermal hysteresis NiTi-based shape memory alloys. Similarly, in [24] Bayesian optimisation is used to optimise both the alloy composition and the associated heat treatment schedule to improve the performance of Al-7xxx series alloys. In [25], Bayesian optimisation is applied for high-quality nano-fibre design, meeting a required specification of fibre length and diameter within a few tens of iterations and greatly accelerating the production process. It has also been applied in other diverse fields, including optimisation of nano-structures for optimal phonon transport [26], optimisation for maximum power point tracking in photovoltaic power plants [27], optimisation for efficient determination of metal oxide grain boundary structures [28], and optimisation of computer game design to maximise engagement [29]. It has also been used in a recent neuroscience study [30] for designing cognitive tasks that maximally segregate ventral and dorsal FPN activity.

The recent advances in both the theory and practice of Bayesian optimisation have led to a plethora of techniques. For the most part, each advance is applicable to a subset of experimental conditions. What is lacking is both an overview of these methods and a methodology to adapt these techniques to a particular experimental design context. We fill this gap and provide a comprehensive study of state-of-the-art Bayesian optimisation algorithms in terms of their applicability to experimental optimisation. Further, we provide a template of how disparate algorithms can be connected to create a fit-for-purpose solution. This provides an overview of the capability and increases the reach of these powerful methods. We conclude by discussing where further research is needed.

II. BAYESIAN OPTIMISATION
Bayesian optimisation incorporates two main ideas:
• A Gaussian process (GP) is used to maintain a belief over the design space. This simultaneously models the predicted mean µt(x) and the epistemic uncertainty σt(x) at any point x in the input space, given a set of observations D_{1:t} = {(x1, y1), (x2, y2), ..., (xt, yt)}, where xt is the process input and yt is the corresponding output at time t.
• An acquisition function expresses the most promising setting for the next experiment, based on the predicted mean µt(x) and the uncertainty σt(x).

A GP is completely specified by its mean function m(x) and covariance function k(x, x′):

f(x) ∼ GP(m(x), k(x, x′))   (1)

The covariance function k(x, x′) is also called the ''kernel'', and expresses the ''smoothness'' of the process. We expect that if two points x and x′ are ''close'', then the corresponding process outputs y and y′ will also be ''close'', and that the closeness depends on the distance between the points, and not the absolute location or direction of separation. A popular choice for the covariance function is the squared exponential (SE) function, also known as the radial basis function (RBF):

k(x, x') = \exp\left( -\frac{1}{2\theta^2} \left\| x - x' \right\|^2 \right)   (2)

Equation 2 says that the correlation decreases with the square of the distance between points, and includes a parameter θ to define the length scale over which this happens. Specialised kernel functions are sometimes used to express pre-existing knowledge about the function (e.g. if something is known about the shape of f).

In an experimental setting, observations include a term for normally distributed noise ε ∼ N(0, σ²_noise), and the observation model is:

y = f(x) + ε

Gaussian process regression (or ''kriging'') can predict the value of the objective function f(·) at time t + 1 for any location x.


FIGURE 2. Bayesian optimisation is an iterative process in which the unknown system response is modelled using a Gaussian process. An acquisition
function expresses the most promising setting for the next experiment, and can be efficiently optimised. The model quality improves progressively over
time as successive measurements are incorporated.

The result is a normal distribution with mean µt(x) and uncertainty σt(x):

P(f_{t+1} \mid D_{1:t}, x) = \mathcal{N}(\mu_t(x), \sigma_t^2(x))   (3)

where

\mu_t(x) = \mathbf{k}^T [K + \sigma_{noise}^2 I]^{-1} \mathbf{y}_{1:t}   (4)

\sigma_t^2(x) = k(x, x) - \mathbf{k}^T [K + \sigma_{noise}^2 I]^{-1} \mathbf{k}

\mathbf{k} = [k(x, x_1), k(x, x_2), \ldots, k(x, x_t)]

K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_t) \\ \vdots & \ddots & \vdots \\ k(x_t, x_1) & \cdots & k(x_t, x_t) \end{bmatrix}   (5)

Using the Gaussian process model, an acquisition function is constructed to represent the most promising setting for the next experiment. Acquisition functions are mainly derived from the µ(x) and σ(x) of the GP model, and are hence cheap to compute. The acquisition function allows a balance between exploitation (sampling where the objective mean µ(·) is high) and exploration (sampling where the uncertainty σ(·) is high), and its global maximiser is used as the next experimental setting.

Acquisition functions are designed to be large near potentially high values of the objective function. Figure 3 shows commonly used acquisition functions: PI, EI, and GP-UCB. PI prefers areas where improvement over the current maximum f(x+) is most likely. EI considers not only the probability of improvement, but also the expected magnitude of improvement. GP-UCB maximises f(·) while minimising regret, the difference between the average utility and the ideal utility. Regret bounds are important for theoretically proving convergence. Unlike the original function, the acquisition function can be cheaply sampled, and may be optimised using a derivative-free global optimisation method like DIRECT [34], or using a multi-start method with a derivative-based local optimiser such as L-BFGS [35]. Details can be found in [19], [21].

III. EXPERIMENTAL DESIGN WITH BAYESIAN OPTIMISATION
BO has been influential in computer science for hyperparameter tuning [38]–[42], combinatorial optimisation [43], [44], and reinforcement learning [21]. Recent years have seen new applications in areas such as robotics [45], [46], neuroscience [47], [48], and materials discovery [49]–[55].

Bayesian optimisation is an iterative process, outlined in Figure 2, which can be applied to experiments where inputs are unconstrained and the objective is a scalarised function of measured outputs. Examples of this kind include material design using physical models [56], or laboratory experiments [25].
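To make one pass around the loop of Figure 2 concrete, the following is a minimal sketch (not the implementation of any package discussed later) of a single BO iteration using a zero-mean GP with the SE kernel of Equation 2 and the EI acquisition. The toy objective, parameter values, and all names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def sq_exp_kernel(A, B, theta=0.5):
    """Squared exponential kernel (Equation 2) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * theta ** 2))

def gp_posterior(X, y, Xq, theta=0.5, noise=1e-3):
    """Posterior mean and variance (Equations 4 and 5) at query points Xq.
    The SE kernel has unit prior variance, so k(x, x) = 1 below."""
    K = sq_exp_kernel(X, X, theta) + noise * np.eye(len(X))
    k = sq_exp_kernel(Xq, X, theta)                 # cross-covariances
    mu = k @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', k, np.linalg.solve(K, k.T))
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, y_best, xi=0.01):
    """EI: expected gain over the best observation so far (maximisation)."""
    sigma = np.sqrt(var)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# One iteration: model past observations, score candidates with EI, and
# return the maximiser as the next experimental setting.
rng = np.random.default_rng(0)
f = lambda X: np.sin(3 * X[:, 0]) - X[:, 0] ** 2 + 0.7 * X[:, 0]  # toy objective
X = rng.uniform(-1, 2, size=(5, 1))          # settings already evaluated
y = f(X)
candidates = rng.uniform(-1, 2, size=(2000, 1))
mu, var = gp_posterior(X, y, candidates)
x_next = candidates[np.argmax(expected_improvement(mu, var, y.max()))]
```

In practice the acquisition would be maximised with DIRECT or a multi-start gradient method as noted above, rather than over a random candidate set.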


FIGURE 3. Acquisition functions expressed in terms of the mean µ(x), variance σ(x), and current maximum f(x+). Φ(·) and φ(·) are the cumulative distribution function and the probability density function of the standard normal distribution. Some functions include factors to balance between exploration and exploitation: ξ in PI is constant, whereas κt in GP-UCB usually increases with iteration, causing the search to maintain exploration even with many samples.

However, experiments often involve complicating factors such as constraints, batches, and multiple objectives. For example, in the alloy design process the composition of each sample follows a set of mixture constraints (see Figure 4). Batches of samples then undergo heat treatment for up to 70 hours, exposed to the same temperatures but with possible variation in duration between samples [24]. The optimiser must produce a batch of experimental settings, obeying inequality constraints, with some factors varying and others fixed within each batch. This impacts the design of the optimiser, through the formulation of the model, acquisition functions, and the search strategy. These are active areas of research, and recent developments are surveyed in the following discussion.

FIGURE 4. Case Study: Alloy design process. Alloy samples are cast with varying compositions according to a set of constraints. The samples are then tempered to improve hardness, which is subsequently measured. The physics of tempering of an alloy is based on nucleation and growth. During nucleation, new ''phases'' or precipitates are formed when clusters of atoms self-organise. These precipitates then diffuse together to achieve the requisite alloy characteristics in the growth step.

A. INCORPORATING PRIOR KNOWLEDGE
Where successive experiments are sufficiently similar to previous ones, it may be desirable to transfer knowledge from previous outcomes. Prior knowledge about the function or data can be used to reduce the search complexity and accelerate optimisation. Table 1 outlines some approaches. (1) Knowledge may be transferred from past (source) experiments to new (target) experiments where there are known or learnable similarities between the domains. For example, the source and target may be loosely similar, or have similar trends. (2) Where something is known about the influence of particular variables on the objective function, this can be imposed on the GP model. This could include monotonicity, function shape, or the probable location of the optimum or other features. (3) Where dependency structures exist in the design space, these can be exploited to constrain the GP, or to handle high dimensionality via embedding.
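As a simple illustration of idea (1), the sketch below biases a target-task GP with a source-task model by regressing only on the residuals. This is a generic construction for illustration, not one of the specific methods catalogued in Table 1; the kernel argument can be any covariance function, such as the sq_exp_kernel from the Section II sketch.

```python
import numpy as np

def target_posterior_mean(mu_source, X, y, Xq, kernel, noise=1e-3):
    """GP regression on residuals y - mu_source(X), with the source model
    added back; early in the target task, predictions follow the source."""
    r = y - mu_source(X)                          # residuals w.r.t. source model
    K = kernel(X, X) + noise * np.eye(len(X))
    k = kernel(Xq, X)
    return mu_source(Xq) + k @ np.linalg.solve(K, r)
```

Where only a few target observations exist, this kind of construction lets the optimiser start from source-like behaviour instead of a flat prior.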
B. HIGH DIMENSIONAL OPTIMISATION
The acquisition function must be optimised to find the next best suggestion for evaluating the objective. In continuous domains the acquisition functions can be extremely sharp in high dimensions, having only a few peaks marooned in a large terrain of almost flat surface. Global optimisation algorithms such as DIRECT [34] are infeasible above about 10 dimensions, and gradient-dependent methods cannot move if initialised in the flat terrain.

General strategies for tackling high dimensionality include [103]: reducing the design space, screening important variables, decomposing the design into simpler sub-problems, mapping into a lower-dimensional space, and visualisation. Table 1(4) outlines approaches that have been reported for high dimensional BO, including: using coarse-to-fine approximations, projection into a lower-dimensional space, and approximation through low-rank matrices or additive structures. The choice of a method depends on whether the objective function has an intrinsic low dimensional structure (4B) or not (4A).

Standard BO is known to perform well in low dimensions, but performance degrades above about 15-20 dimensions. High dimensional BO has been demonstrated for 25-34 intrinsic dimensions on ''real world'' data, and up to 50 dimensions for synthetic functions [73], [77]. Projection methods have been shown to work independently of the number of extrinsic dimensions [43], [79], [81], whereas specialised kernels have been shown to work in hundreds of dimensions [75]; the projection idea is sketched below.
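The following is a sketch of the random-embedding idea used by projection methods such as [43]: the acquisition is optimised over a low-dimensional variable z, and candidates are lifted into the full design space through a fixed random matrix. The dimensions, bounds, and names are illustrative.

```python
import numpy as np

D, d = 100, 5                        # extrinsic and assumed intrinsic dimension
rng = np.random.default_rng(1)
A = rng.normal(size=(D, d))          # fixed random projection matrix

def lift(z, bounds=(-1.0, 1.0)):
    """Map a low-dimensional candidate z into the D-dimensional design
    space, clipping to keep the suggestion inside the feasible box."""
    return np.clip(A @ z, *bounds)

# BO then runs entirely in d dimensions: the GP is fitted to pairs
# (z_i, f(lift(z_i))) and the acquisition function is optimised over z.
z = rng.uniform(-np.sqrt(d), np.sqrt(d), size=d)
x_candidate = lift(z)                # full-dimensional design to evaluate
```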
C. MULTI-OBJECTIVE OPTIMISATION
Design problems often include multiple objectives, which can be challenging to optimise. For example, [104] demonstrates multiple objectives for the discovery of new materials. Scalarisation by a weighted sum of objectives can be done, but may not work when objectives have strong conflicts. In that setting a Pareto set of optimal points can be found [105]. For a point in a Pareto set, no objective can be improved without penalising another objective.


TABLE 1. Methods for transferring prior knowledge from past experiments (source) to new experiments (target) (1–3). Methods marked (*) have only
been demonstrated for Gaussian processes, but are also applicable to Bayesian optimisation. Methods for handling high dimensionality (4),
constraints (5), and parallel optimisation (6).

Many methods have been proposed for using Bayesian optimisation for multi-objective optimisation [106]–[109], but these suffer from computational limitations because the acquisition function generally requires computation for all objective functions, and as the number of objective functions grows, the computational cost grows exponentially. Moving away from EI, the method of [109] allows the optimisation of multiple objectives without rank modelling for conflicting objectives, while also remaining scale-invariant toward different objectives. The method performs better than [107], but suffers in high dimensions and can be computationally expensive. Predictive entropy search is used by [110], allowing the different objectives to be decoupled, computing the acquisition for subsets of objectives when required. The computational cost increases linearly with the number of objectives.


The method of [111] can be used for single- or multiple-objective optimisation, including with multiple inequality constraints, and has been shown to be robust in highly constrained settings where the feasible design space is small.
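For illustration, a small sketch of the Pareto filtering step that multi-objective methods build on: extracting the non-dominated subset of observed objective vectors. It assumes all objectives are to be maximised; the data and names are illustrative.

```python
import numpy as np

def pareto_mask(scores):
    """Boolean mask of non-dominated rows: a point is dominated if some
    other point is at least as good on every objective and better on one."""
    n = len(scores)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(scores, i, axis=0)
        keep[i] = not ((others >= scores[i]).all(axis=1) &
                       (others > scores[i]).any(axis=1)).any()
    return keep

scores = np.array([[1.0, 0.2], [0.8, 0.9], [0.5, 0.5], [0.9, 0.95]])
pareto_set = scores[pareto_mask(scores)]   # [[1.0, 0.2], [0.9, 0.95]]
```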
D. CONSTRAINTS
Table 1(5) outlines some approaches to handling constraints. If constraints are known, they can be handled during optimisation of the acquisition function by limiting the search. More difficult are ''black box'' constraints that can be evaluated but have unknown form. If the constraint is cheap to evaluate, this is not a problem. Methods for expensive constraint functions include a weighted EI function [83], [84] (sketched at the end of this section), and weighted predictive entropy search [86]. A lookahead strategy for unknown constraints is described by [88]. A different formulation for the unknown constraint is proposed by [85], handling expensive constraints using the ADMM solver of [112].

The above methods deal with inequality constraints. In [89] both inequality and equality constraints are handled, using slack variables to convert inequality constraints to equality constraints, and an Augmented Lagrangian (AL) to convert these constraints into a sequence of simpler sub-problems.

The concept of weighted predictive entropy search has been extended to multi-objective problems [87] with inequality constraints which are both unknown and expensive to evaluate. A different type of constraint, specifically for multiple objectives, is investigated by [90], where there exists a rank order preference over which objective is important. The algorithm developed therein can preferentially sample the Pareto set such that Pareto samples are more varied for the more important objectives.
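A sketch of the weighted-EI idea for expensive black-box inequality constraints, in the spirit of [83], [84]: EI on the objective is multiplied by the probability, under a second GP, that the constraint is satisfied. It reuses gp_posterior and expected_improvement from the Section II sketch; the threshold and names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def constrained_ei(X, y, Xc, c, candidates, c_limit=0.0):
    """EI on the objective, weighted by P(constraint <= c_limit) under a
    second GP fitted to constraint observations (Xc, c)."""
    mu, var = gp_posterior(X, y, candidates)        # objective model
    ei = expected_improvement(mu, var, y.max())
    mu_c, var_c = gp_posterior(Xc, c, candidates)   # constraint model
    p_feasible = norm.cdf((c_limit - mu_c) / np.sqrt(var_c))
    return ei * p_feasible                          # weighted acquisition
```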
E. PARALLEL (BATCH) OPTIMISATION
In some experiments it can be efficient to evaluate several settings in parallel. For example, during alloy design, batches of different mixtures undergo similar heat treatment phases, so the optimiser must recommend multiple settings before receiving any new results. Sequential algorithms can be used to find the point that maximises the acquisition function, and then move on to find the next point in the batch after suppressing this point. Suppression can be achieved by temporarily updating the GP with a hypothetical value for the point (e.g. based on a recent posterior mean), or by applying a penalty in the acquisition function; this strategy is sketched after this section. Table 1(6) outlines some approaches that have been reported. Most methods are for unconstrained batches, though recent work has handled constraints on selected variables within a batch [102].
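The suppression strategy above can be sketched as follows, hallucinating the posterior mean at each chosen point (a ''constant liar'' style update) before selecting the next batch member. It reuses gp_posterior and expected_improvement from the Section II sketch; the batch size and names are illustrative.

```python
import numpy as np

def propose_batch(X, y, candidates, batch_size=3):
    """Greedy batch selection: pick the EI maximiser, hallucinate its
    outcome with the current posterior mean, refit, and repeat."""
    Xb, yb = X.copy(), y.copy()
    batch = []
    for _ in range(batch_size):
        mu, var = gp_posterior(Xb, yb, candidates)
        i = int(np.argmax(expected_improvement(mu, var, yb.max())))
        x_new = candidates[i]
        batch.append(x_new)
        mu_new, _ = gp_posterior(Xb, yb, x_new[None, :])
        Xb = np.vstack([Xb, x_new[None, :]])        # temporary GP update
        yb = np.append(yb, mu_new)                  # hallucinated observation
    return np.array(batch)
```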
F. MULTI-FIDELITY OPTIMISATION
When function evaluations are prohibitively expensive, cheap approximations may be useful. In such situations high-fidelity data obtained through experimentation might be augmented by low-fidelity data obtained by running a simulation. For example, during alloy design, simulation software can predict the alloy strength, but results may be less accurate than measurements obtained from casting experiments. Multi-fidelity Bayesian optimisation has been demonstrated in [113], [114]. Recently, [115] proposed BO for an optimisation problem with multi-fidelity data. Although the multi-fidelity approach has been applied in problem-specific contexts or non-optimisation related tasks [41], [116]–[120], the method of [115] generalises well for BO problems.
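One simple way to set up a multi-fidelity surrogate, shown below, is to append a fidelity indicator to each input so that cheap simulations and expensive experiments share a single GP. This is an illustrative construction only, not specifically the method of [115], and the data here are random placeholders.

```python
import numpy as np

def with_fidelity(X, s):
    """Append a fidelity column: 0 = simulation, 1 = physical experiment."""
    return np.hstack([X, np.full((len(X), 1), float(s))])

rng = np.random.default_rng(0)
X_sim, y_sim = rng.random((20, 2)), rng.random(20)   # cheap, plentiful
X_lab, y_lab = rng.random((4, 2)), rng.random(4)     # expensive, scarce
X_all = np.vstack([with_fidelity(X_sim, 0), with_fidelity(X_lab, 1)])
y_all = np.concatenate([y_sim, y_lab])
# A single GP fitted to (X_all, y_all) shares information across fidelities;
# the kernel length-scale on the fidelity column controls how strongly the
# low-fidelity simulations inform high-fidelity predictions.
```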
G. MIXED-TYPE INPUT
Experimental parameters are often combinations of different types: continuous, discrete, categorical, and binary. Incorporation of mixed-type input is challenging across the domains, including for simpler methods such as Latin hypercube sampling [11]. Non-continuous variables are problematic in BO because the objective function approximation with a GP assumes a continuous input space, with covariance functions defining the relationship between these continuous variables. One common way to deal with discrete variables is to round the value to a close integer [40], but this approach leads to sub-optimal optimisation [121].

Two options for handling mixed-type inputs are: (1) designing kernels that are suitable for different variables, and (2) subsampling of data for maximising the objective function, which is especially useful in higher dimensional space. For integer variables the problem can be solved through a kernel transformation, by assuming the objective function to be flat over the region where two continuous variables would be rounded to the same integer [121] (see the sketch below). In [67] categorical variables are included by one-hot-encoding alongside numerical variables. A specialised kernel for categorical variables is proposed in [122].

Random forest regression is a good alternative to the GP for regression, as in sequential model-based algorithm configuration (SMAC, [44]). Random forests are good at exploitation but do not perform well for exploration, as they may not predict well at points that are distant from observations. Additionally, a non-differentiable response surface renders them unsuitable for gradient-based optimisation.
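The kernel transformation for integer variables mentioned above [121] can be sketched as follows: integer dimensions are rounded inside the kernel, making the surrogate exactly flat wherever two continuous values round to the same integer. The dimension indices and length scale are illustrative.

```python
import numpy as np

def round_integer_dims(X, int_dims):
    """Snap the listed dimensions to the nearest integer."""
    Xr = X.copy()
    Xr[:, int_dims] = np.rint(Xr[:, int_dims])
    return Xr

def mixed_kernel(A, B, int_dims, theta=0.5):
    """SE kernel (Equation 2) on transformed inputs: flat wherever two
    continuous values round to the same integer."""
    A = round_integer_dims(A, int_dims)
    B = round_integer_dims(B, int_dims)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * theta ** 2))
```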
IV. DISCUSSION
Machine-learning methods through Bayesian optimisation offer a powerful way to deal with many problems of experimental optimisation that have not been previously addressed. While techniques exist for different issues (high dimensionality, multi-objective, etc.), few works solve multiple issues in a general way. Methods are likely to be composable where no incompatible changes are required to the BO process. Figure 5 outlines composability based on the current repertoire of Bayesian optimisation algorithms. When a design problem is single objective, has single-fidelity measurement, and all the variables are continuous, then it offers the greatest flexibility in terms of adding specific capability such as transfer learning or high dimensional optimisation. Other cases require careful selection of algorithms to add desired capabilities.


FIGURE 5. Current capability graph on the composability of various aspects of experimental design problems in Bayesian optimisation. It is possible to
compose algorithms which lie on a path in the graph. It is possible to finish at any block and even skip multiple blocks on a path. Regular text denotes
the capability achievable with standard Bayesian optimisation, whereas highlighted text denotes the existence of specialised algorithms.

For example, the method of [111] handles multiple objectives with constraints, and the method of [43] handles parallel evaluation in high dimensions with mixed-type inputs. Some combinations may not even be possible; for example, a Random Forest based algorithm such as [44] would not admit many capabilities. Note that this graph does not portray any theoretical limitations, but merely presents a gist of the current capability through the lens of composability.

Several open-source libraries are available for incorporating BO into computer programs. Depending on the application, computation speed may be an issue. A common operation in most algorithms is the Cholesky decomposition, which is used to invert the kernel matrix and is generally O(n³) for n data points, but with care this can be calculated incrementally as new points arrive, reducing the complexity to O(n²) [123] (see the sketch below). Several algorithms gain speed-up by implementing part of the algorithm on a GPU, which can be up to 100 times faster than the equivalent single-threaded code [124].
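The incremental update mentioned above can be sketched as a one-row extension of an existing Cholesky factor. This is a standard linear-algebra identity rather than the internals of any particular library; the function names are illustrative.

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_append(L, k_new, k_self):
    """Extend lower-triangular L (where K = L @ L.T) after adding one point.
    k_new: covariances between the new point and existing points, shape (n,).
    k_self: prior variance of the new point (plus noise)."""
    c = solve_triangular(L, k_new, lower=True)      # O(n^2) triangular solve
    d = np.sqrt(k_self - c @ c)                     # new diagonal entry
    n = L.shape[0]
    L_ext = np.zeros((n + 1, n + 1))
    L_ext[:n, :n] = L
    L_ext[n, :n] = c
    L_ext[n, n] = d
    return L_ext
```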
• GPyOpt (https://github.com/SheffieldML/GPyOpt) is a Bayesian optimisation framework, written in Python and supporting parallel optimisation, mixed factor types (continuous, discrete, and categorical), and inequality constraints.
• GPflowOpt (https://github.com/GPflow/GPflowOpt) is written in Python and uses TensorFlow (https://www.tensorflow.org) to accelerate computation on GPU hardware. It supports multi-objective acquisition functions, and black-box constraints [125].
• DiceOptim (https://cran.r-project.org/web/packages/DiceOptim/index.html) is a BO package written in R. Mixed equality and inequality constraints are implemented using the method of [89], and parallel optimisation is via multipoint EI [91]; however, parallel optimisation and constraints cannot be mixed in a single optimisation.
• MOE (https://github.com/Yelp/MOE) supports parallel optimisation via multi-point stochastic gradient ascent [124]. Interfaces are provided for Python and C++, and optimisation can be accelerated on GPU hardware.
• SigOpt (http://sigopt.com) offers Bayesian optimisation as a web service. The implementation is based on MOE, but includes some enhancements such as mixed factor types (continuous, discrete, categorical), and automatic hyperparameter tuning.
• BayesOpt (https://github.com/rmcantin/bayesopt) is written in C++, and includes common interfaces for C, C++, Python, Matlab, and Octave [123].

V. CONCLUSION
This review has presented an overview of Bayesian optimisation (BO) with application to experimental design. BO was introduced in relation to existing Design of Experiments (DOE) methods such as factorial designs, response surface methodology, and adaptive sampling. A brief discussion of the theory highlighted the roles of the Gaussian process, kernel, and acquisition function. A set of seven core issues was identified as being important in practical experimental designs, and some detailed solutions were reviewed. These core issues are: (1) the incorporation of prior knowledge, (2) high dimensional optimisation, (3) constraints, (4) batch evaluation, (5) multiple objectives, (6) multi-fidelity data, and (7) mixed variable types.

Recent works have shown the potential of Bayesian optimisation in fields such as robotics, neuroscience, and materials discovery. As the range of potential applications expands, it is increasingly unlikely that ''vanilla'' optimisation approaches for small numbers of unconstrained, continuous variables will be appropriate. This is particularly true in DACE simulation applications where high dimensional mixed-type inputs are typical.

Bayesian optimisation offers a powerful and rigorous framework for exploring and optimising expensive ''black box'' functions. While solutions exist for the core issues in experimental design, each approach has strengths and weaknesses that could potentially be improved, and the combination of the individual solutions is not necessarily straightforward. Thus there is a need for ongoing work in this area to: (1) improve the efficiency, generality, and scalability of approaches to the core issues, (2) develop designs that allow easy combination of multiple approaches, and (3) develop theoretical guarantees on the performance of solutions.


REFERENCES
[1] R. A. Fisher, The Design of Experiments. Edinburgh, U.K.: Oliver & Boyd, 1935.
[2] D. C. Montgomery, Design and Analysis of Experiments. Hoboken, NJ, USA: Wiley, 2017.
[3] R. H. Myers, D. C. Montgomery, and C. M. Anderson-Cook, ''Response surface methodology: Process and product optimization using designed experiments,'' in Applied Probability & Statistics (Wiley Series in Probability and Statistics). 2009.
[4] G. G. Wang and S. Shan, ''Review of metamodeling techniques in support of engineering design optimization,'' J. Mech. Des., vol. 129, no. 4, pp. 370–380, Apr. 2007.
[5] W. C. Parr, ''Introduction to quality engineering: Designing quality into products and processes,'' Technometrics, vol. 31, no. 2, pp. 255–256, May 1989.
[6] T. Hasenkamp, M. Arvidsson, and I. Gremyr, ''A review of practices for robust design methodology,'' J. Eng. Design, vol. 20, no. 6, pp. 645–657, Dec. 2009.
[7] S. M. Göhler and T. J. Howard, ''A framework for the application of robust design methods and tools,'' in Proc. 1st Int. Symp. Robust Design, T. J. Howard and T. Eifler, Eds. Lyngby, Denmark: Technical Univ. of Denmark, 2014, pp. 123–133.
[8] D. C. Woods and S. M. Lewis, ''Design of experiments for screening,'' in Handbook Uncertainty Quantification. 2017, pp. 1143–1185.
[9] A. Dean and S. Lewis, Eds., Screening: Methods for Experimentation in Industry, Drug Discovery, and Genetics. New York, NY, USA: Springer-Verlag, 2006.
[10] F. A. Viana, ''Things you wanted to know about the Latin hypercube design and were afraid to ask,'' in Proc. 10th World Congr. Struct. Multidisciplinary Optim., 2013, pp. 1–9.
[11] H. Vieira, S. M. Sanchez, K. H. Kienitz, and M. C. N. Belderrain, ''Efficient, nearly orthogonal-and-balanced, mixed designs: An effective way to conduct trade-off analyses via simulation,'' J. Simul., vol. 7, no. 4, pp. 264–275, Nov. 2013.
[12] G. E. P. Box and K. B. Wilson, ''On the experimental attainment of optimum conditions,'' J. Roy. Stat. Soc. B, Methodol., vol. 13, no. 1, pp. 1–45, 1951.
[13] J. L. Chapman, L. Lu, and C. M. Anderson-Cook, ''Process optimization for multiple responses utilizing the Pareto front approach,'' Qual. Eng., vol. 26, no. 3, pp. 253–268, Jul. 2014.
[14] R. H. Myers, D. C. Montgomery, G. G. Vining, C. M. Borror, and S. M. Kowalski, ''Response surface methodology: A retrospective and literature survey,'' J. Qual. Technol., vol. 36, no. 1, pp. 53–77, Jan. 2004.
[15] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, ''Design and analysis of computer experiments,'' Stat. Sci., vol. 4, no. 4, pp. 409–423, 1989.
[16] H. Liu, Y.-S. Ong, and J. Cai, ''A survey of adaptive sampling for global metamodeling in support of simulation-based complex engineering design,'' Struct. Multidisciplinary Optim., vol. 57, no. 1, pp. 393–416, Jan. 2018.
[17] F. A. Viana, T. W. Simpson, V. Balabanov, and V. Toropov, ''Special section on multidisciplinary design optimization: Metamodeling in multidisciplinary design optimization: How far have we really come?'' AIAA J., vol. 52, no. 4, pp. 670–690, 2014.
[18] S. S. Garud, I. A. Karimi, and M. Kraft, ''Design of computer experiments: A review,'' Comput. Chem. Eng., vol. 106, pp. 71–95, Nov. 2017.
[19] D. R. Jones, M. Schonlau, and W. J. Welch, ''Efficient global optimization of expensive black-box functions,'' J. Global Optim., vol. 13, no. 4, pp. 455–492, 1998.
[20] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, ''Taking the human out of the loop: A review of Bayesian optimization,'' Proc. IEEE, vol. 104, no. 1, pp. 148–175, Jan. 2016.
[21] E. Brochu, V. M. Cora, and N. de Freitas, ''A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,'' 2010, arXiv:1012.2599. [Online]. Available: https://arxiv.org/abs/1012.2599
[22] P. I. Frazier, ''A tutorial on Bayesian optimization,'' 2018, arXiv:1807.02811. [Online]. Available: https://arxiv.org/abs/1807.02811
[23] D. Xue, P. V. Balachandran, J. Hogden, J. Theiler, D. Xue, and T. Lookman, ''Accelerated search for materials with targeted properties by adaptive design,'' Nature Commun., vol. 7, p. 11241, Apr. 2016.
[24] A. Vahid, S. Rana, S. Gupta, P. Vellanki, S. Venkatesh, and T. Dorin, ''New Bayesian-optimization-based design of high-strength 7xxx-series alloys from recycled aluminum,'' J. Minerals, Metals Mater. Soc., vol. 70, no. 11, pp. 2704–2709, Nov. 2018.
[25] C. Li, D. R. de Celis Leal, S. Rana, S. Gupta, A. Sutti, S. Greenhill, T. Slezak, M. Height, and S. Venkatesh, ''Rapid Bayesian optimisation for synthesis of short polymer fiber materials,'' Sci. Rep., vol. 7, no. 1, p. 5683, 2017.
[26] S. Ju, T. Shiga, L. Feng, Z. Hou, K. Tsuda, and J. Shiomi, ''Designing nanostructures for phonon transport via Bayesian optimization,'' Phys. Rev. X, vol. 7, no. 2, 2017, Art. no. 021024.
[27] H. Abdelrahman, F. Berkenkamp, J. Poland, and A. Krause, ''Bayesian optimization for maximum power point tracking in photovoltaic power plants,'' in Proc. Eur. Control Conf. (ECC), Jun. 2016, pp. 2078–2083.
[28] S. Kikuchi, H. Oda, S. Kiyohara, and T. Mizoguchi, ''Bayesian optimization for efficient determination of metal oxide grain boundary structures,'' Phys. B, Condens. Matter, vol. 532, pp. 24–28, Mar. 2018.
[29] M. M. Khajah, B. D. Roads, R. V. Lindsey, Y.-E. Liu, and M. C. Mozer, ''Designing engaging games using Bayesian optimization,'' in Proc. CHI Conf. Hum. Factors Comput. Syst. (CHI), 2016, pp. 5571–5582.
[30] R. Lorenz, I. R. Violante, R. P. Monti, G. Montana, A. Hampshire, and R. Leech, ''Dissociating frontoparietal brain networks with neuroadaptive Bayesian optimization,'' Nature Commun., vol. 9, no. 1, p. 1227, 2018.
[31] H. J. Kushner, ''A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise,'' J. Basic Eng., vol. 86, no. 1, pp. 97–106, Mar. 1964.
[32] J. Mockus, V. Tiesis, and A. Zilinskas, ''The application of Bayesian methods for seeking the extremum,'' in Toward Global Optimization 2, L. C. W. Dixon and G. P. Szego, Eds. Amsterdam, The Netherlands: North Holland, 1978.
[33] N. Srinivas, A. Krause, S. Kakade, and M. W. Seeger, ''Gaussian process optimization in the bandit setting: No regret and experimental design,'' in Proc. Int. Conf. Mach. Learn., 2010, pp. 1015–1022.
[34] D. R. Jones, C. D. Perttunen, and B. E. Stuckman, ''Lipschitzian optimization without the Lipschitz constant,'' J. Optim. Theory Appl., vol. 79, no. 1, pp. 157–181, 1993.
[35] D. C. Liu and J. Nocedal, ''On the limited memory BFGS method for large scale optimization,'' Math. Program., vol. 45, nos. 1–3, pp. 503–528, 1989.
[36] M. J. Powell, ''A view of algorithms for optimization without derivatives,'' Math. Today-Bull. Inst. Math. Appl., vol. 43, no. 5, pp. 170–174, 2007.
[37] T. P. Runarsson and X. Yao, ''Stochastic ranking for constrained evolutionary optimization,'' IEEE Trans. Evol. Comput., vol. 4, no. 3, pp. 284–294, Sep. 2000.
[38] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, ''Algorithms for hyper-parameter optimization,'' in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2546–2554.
[39] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, ''Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms,'' in Proc. Int. Conf. Knowl. Discovery Data Mining, 2013, pp. 847–855.
[40] J. Snoek, H. Larochelle, and R. P. Adams, ''Practical Bayesian optimization of machine learning algorithms,'' in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 2960–2968.
[41] K. Swersky, J. Snoek, and R. P. Adams, ''Multi-task Bayesian optimization,'' in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 2004–2012.
[42] P. Baldi, P. Sadowski, and D. Whiteson, ''Enhanced Higgs boson to τ+τ− search with deep learning,'' Phys. Rev. Lett., vol. 114, no. 11, 2015, Art. no. 111801.
[43] Z. Wang, M. Zoghi, F. Hutter, D. Matheson, and N. De Freitas, ''Bayesian optimization in high dimensions via random embeddings,'' in Proc. Int. Joint Conf. Artif. Intell., Jul. 2013, pp. 1778–1784.
[44] F. Hutter, H. H. Hoos, and K. Leyton-Brown, ''Sequential model-based optimization for general algorithm configuration,'' in Proc. Int. Conf. Learn. Intell. Optim. Berlin, Germany: Springer, 2011, pp. 507–523.
[45] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, ''Robots that can adapt like animals,'' Nature, vol. 521, no. 7553, pp. 503–507, May 2015.
[46] R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, ''Bayesian optimization for learning gaits under uncertainty,'' Ann. Math. Artif. Intell., vol. 76, nos. 1–2, pp. 5–23, Feb. 2016.
[47] J. Lancaster, R. Lorenz, R. Leech, and J. H. Cole, ''Bayesian optimisation for neuroimaging pre-processing in brain age classification and prediction,'' Frontiers Aging Neurosci., vol. 10, p. 28, Feb. 2018.
[48] R. Lorenz, R. P. Monti, I. R. Violante, C. Anagnostopoulos, A. A. Faisal, G. Montana, and R. Leech, ''The automatic neuroscientist: A framework for optimizing experimental design with closed-loop real-time fMRI,'' NeuroImage, vol. 129, pp. 320–334, Apr. 2016.


[49] T. Ueno, T. D. Rhone, Z. Hou, T. Mizoguchi, and K. Tsuda, ‘‘COMBO: [72] T. Dai Nguyen, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, K. J. Deane,
An efficient Bayesian optimization library for materials science,’’ Mater. and P. G. Sanders, Cascade Bayesian Optimization. 2016, pp. 268–280.
Discovery, vol. 4, pp. 18–21, Jun. 2016. [73] S. Rana, C. Li, S. Gupta, V. Nguyen, and S. Venkatesh, ‘‘High dimen-
[50] T. Lookman, P. V. Balachandran, D. Xue, J. Hogden, and J. Theiler, ‘‘Sta- sional Bayesian optimization with elastic Gaussian process,’’ in Proc. Int.
tistical inference and adaptive design for materials discovery,’’ Current Conf. Mach. Learn., 2017, pp. 2883–2891.
Opinion Solid State Mater. Sci., vol. 21, no. 3, pp. 121–128, Jun. 2017. [74] C. Li, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, and A. Shilton, ‘‘High
[51] R. Gómez-Bombarelli et al., ‘‘Design of efficient molecular organic light- dimensional Bayesian optimization using dropout,’’ in Proc. 26th Int.
emitting diodes by a high-throughput virtual screening and experimental Joint Conf. Artif. Intell., 2017, pp. 2096–2102.
approach,’’ Nature Mater., vol. 15, no. 10, pp. 1120–1127, Oct. 2016. [75] C. Oh, E. Gavves, and M. Welling, ‘‘BOCK: Bayesian optimization
[52] P. I. Frazier and J. Wang, ‘‘Bayesian optimization for materials design,’’ with cylindrical kernels,’’ in Proc. 35th Int. Conf. Mach. Learn., J.
in Proc. Inf. Sci. Mater. Discovery Design. Cham, Switzerland: Springer, Dy and A. Krause, Eds. Stockholm, Sweden: Stockholmsmässan, 2018,
2016, pp. 45–75. pp. 3868–3877.
[53] A. Seko, T. Maekawa, K. Tsuda, and I. Tanaka, ‘‘Machine learning [76] C.-L. Li, K. Kandasamy, B. Póczos, and J. Schneider, ‘‘High dimensional
with systematic density-functional theory calculations: Application to Bayesian optimization via restricted projection pursuit models,’’ in Proc.
melting temperatures of single-and binary-component solids,’’ Phys. Rev. Artif. Intell. Statist., 2016, pp. 884–892.
B, Condens. Matter, vol. 89, no. 5, 2014, Art. no. 054303. [77] Z. Wang, C. Li, S. Jegelka, and P. Kohli, ‘‘Batched high-dimensional
[54] A. Seko, A. Togo, H. Hayashi, K. Tsuda, L. Chaput, and I. Tanaka, Bayesian optimization via structural kernel learning,’’ in Proc. Int. Conf.
‘‘Discovery of low thermal conductivity compounds with first- Mach. Learn., 2017.
principles anharmonic lattice dynamics calculations and Bayesian [78] J. Gardner, C. Guo, K. Weinberger, R. Garnett, and R. Grosse, ‘‘Discover-
optimization,’’ 2015, arXiv:1506.06439. [Online]. Available: ing and exploiting additive structure for Bayesian optimization,’’ in Proc.
https://arxiv.org/abs/1506.06439 Artif. Intell. Statist., 2017, pp. 1311–1319.
[55] A. Seko, A. Togo, H. Hayashi, K. Tsuda, L. Chaput, and I. Tanaka, [79] A. Nayebi, A. Munteanu, and M. Poloczek, ‘‘A framework for Bayesian
‘‘Prediction of low-thermal-conductivity compounds with first-principles optimization in embedded subspaces,’’ in Proc. Int. Conf. Mach. Learn.,
anharmonic lattice-dynamics calculations and Bayesian optimization,’’ 2019, pp. 4752–4761.
Phys. Rev. Lett., vol. 115, no. 20, 2015, Art. no. 205901. [80] M. Mutny and A. Krause, ‘‘Efficient high dimensional Bayesian opti-
[56] D. Packwood, Bayesian Optimization for Materials Science. Singapore: mization with additivity and quadrature Fourier features,’’ in Proc. Adv.
Springer, 2017. Neural Inf. Process. Syst., 2018, pp. 9005–9016.
[57] D. Yogatama and G. Mann, ‘‘Efficient transfer learning method for auto- [81] J. Kirschner, M. Mutny, N. Hiller, R. Ischebeck, and A. Krause, ‘‘Adaptive
matic hyperparameter tuning,’’ in Proc. 17th Int. Conf. Artif. Intell. Statist. and safe Bayesian optimization in high dimensions via one-dimensional
(AISTATS), Reykjavik, Iceland, Apr. 2014, pp. 1077–1085. subspaces,’’ in Proc. Int. Conf. Mach. Learn., 2019, pp. 3429–3438.
[58] T. T. Joy, S. Rana, S. K. Gupta, and S. Venkatesh, ‘‘Flexible transfer [82] J. Djolonga, A. Krause, and V. Cevher, ‘‘High-dimensional Gaussian
learning framework for Bayesian optimisation,’’ in Proc. Pacific–Asia process bandits,’’ in Proc. Adv. Neural Inf. Process. Syst. 27th Annu. Conf.
Conf. Knowl. Discovery Data Mining. Cham, Switzerland: Springer, Neural Inf. Process. Syst., Lake Tahoe, NV, USA, 2013, pp. 1025–1033.
2016, pp. 102–114. [83] M. A. Gelbart, J. Snoek, and R. P. Adams, ‘‘Bayesian optimization
[59] A. Shilton, S. Gupta, S. Rana, and S. Venkatesh, ‘‘Regret bounds for with unknown constraints,’’ in Proc. Uncertainty Artif. Intell., 2014,
transfer learning in Bayesian optimisation,’’ in Proc. Artif. Intell. Statist., pp. 250–259.
2017, pp. 307–315. [84] J. R. Gardner, M. J. Kusner, Z. E. Xu, K. Q. Weinberger, and
[60] R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, ‘‘Collaborative hyper- J. P. Cunningham, ‘‘Bayesian optimization with inequality constraints,’’
parameter tuning,’’ in Proc. 30th Int. Conf. Mach. Learn. (ICML), Atlanta, in Proc. Int. Conf. Mach. Learn., 2014, pp. 937–945.
GA, USA, Jun. 2013, pp. 199–207. [85] S. Ariafar, J. Coll-Font, D. Brooks, and J. Dy, ‘‘ADMMBO: Bayesian
[61] J. Riihimäki and A. Vehtari, ‘‘Gaussian processes with monotonic- optimization with unknown constraints using ADMM,’’ J. Mach. Learn.
ity information,’’ in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, Res., vol. 20, no. 123, pp. 1–26, 2019.
pp. 645–652. [86] J. M. Hernández-Lobato, M. Gelbart, M. Hoffman, R. Adams, and
[62] M. Jauch and V. Peña, ‘‘Bayesian optimization with shape con- Z. Ghahramani, ‘‘Predictive entropy search for Bayesian optimization
straints,’’ 2016, arXiv:1612.08915. [Online]. Available: https://arxiv.org with unknown constraints,’’ in Proc. Int. Conf. Mach. Learn., 2015,
/abs/1612.08915 pp. 1699–1707.
[63] C. Li, S. Rana, S. Gupta, V. Nguyen, and S. Venkatesh, ‘‘Bayesian [87] E. C. Garrido-Merchán and D. Hernández-Lobato, ‘‘Predictive entropy
optimization with monotonicity information,’’ in Proc. 31st Conf. Neural search for multi-objective Bayesian optimization with constraints,’’ Neu-
Inf. Process. Syst. (NIPS), 2017. rocomputing, vol. 361, pp. 50–68, Oct. 2019.
[64] P. J. Lenk and T. Choi, ‘‘Bayesian analysis of shape-restricted functions [88] R. Lam and K. Willcox, ‘‘Lookahead Bayesian optimization with
using Gaussian process priors,’’ Statistica Sinica, vol. 27, pp. 43–69, inequality constraints,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017,
Jan. 2017. pp. 1888–1898.
[65] M. R. Andersen, E. Siivola, and A. Vehtari, ‘‘Bayesian optimization of [89] V. Picheny, R. B. Gramacy, S. Wild, and S. Le Digabel, ‘‘Bayesian
unimodal functions,’’ in Proc. Adv. Neural Inf. Process. Syst. (NIPS), optimization under mixed constraints with a slack-variable aug-
2017. mented Lagrangian,’’ in Proc. Adv. Neural Inf. Process. Syst., 2016,
[66] A. Ramachandran, S. K. Gupta, R. Santu, and S. Venkatesh, pp. 1435–1443.
‘‘Information-theoretic transfer learning framework for Bayesian [90] M. Abdolshah, A. Shilton, S. Rana, S. Gupta, and S. Venkatesh, ‘‘Multi-
optimisation,’’ in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery objective Bayesian optimisation with preferences over objectives,’’ in
Databases. Cham, Switzerland: Springer, 2018. Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2019.
[67] R. Jenatton, C. Archambeau, J. González, and M. Seeger, ‘‘Bayesian opti- [91] D. Ginsbourger, R. Le Riche, and L. Carraro, ‘‘A multi-points criterion for
mization with tree-structured dependencies,’’ in Proc. Int. Conf. Mach. deterministic parallel global optimization based on Gaussian processes,’’
Learn., 2017, pp. 1655–1664. Département Méthodes et Modèles Mathématiques pour l’Industrie, 3MI-
[68] K. Swersky, D. Duvenaud, J. Snoek, F. Hutter, and M. A. Osborne, ENSMSE, Saint-Étienne, France, Tech. Rep. hal-00260579, 2008.
‘‘Raiders of the lost architecture: Kernels for Bayesian optimization in [92] J. Azimi, A. Fern, and X. Z. Fern, ‘‘Batch Bayesian optimization via
conditional parameter spaces,’’ 2014, arXiv:1409.4011. [Online]. Avail- simulation matching,’’ in Proc. Adv. Neural Inf. Process. Syst., 2010,
able: https://arxiv.org/abs/1409.4011 pp. 109–117.
[69] D. K. Duvenaud, H. Nickisch, and C. E. Rasmussen, ‘‘Additive [93] T. Desautels, A. Krause, and J. W. Burdick, ‘‘Parallelizing exploration-
Gaussian processes,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, exploitation tradeoffs in Gaussian process bandit optimization,’’ J. Mach.
pp. 226–234. Learn. Res., vol. 15, no. 1, pp. 3873–3923, 2014.
[70] K. Kandasamy, J. G. Schneider, and B. Póczos, ‘‘High dimensional [94] J. González, Z. Dai, P. Hennig, and N. D. Lawrence, ‘‘Batch Bayesian
Bayesian optimisation and bandits via additive models,’’ in Proc. 32nd optimization via local penalization,’’ in Proc. Artif. Intell. Statist., 2015,
Int. Conf. Mach. Learn. (ICML), Lille, France, Jul. 2015, pp. 295–304. pp. 648–657.
[71] F. Hutter and M. A. Osborne, ‘‘A kernel for hierarchical [95] V. Nguyen, S. Rana, S. K. Gupta, C. Li, and S. Venkatesh, ‘‘Budgeted
parameter spaces,’’ 2013, arXiv:1310.5738. [Online]. Available: batch Bayesian optimization,’’ in Proc. IEEE 16th Int. Conf. Data Mining
https://arxiv.org/abs/1310.5738 (ICDM), Dec. 2016, pp. 1107–1112.


[96] C. Gong, J. Peng, and Q. Liu, ‘‘Quantile stein variational gradient descent [119] A. Sabharwal, H. Samulowitz, and G. Tesauro, ‘‘Selecting near-
for batch Bayesian optimization,’’ in Proc. Int. Conf. Mach. Learn., 2019, optimal learners via incremental data allocation,’’ in Proc. AAAI, 2016,
pp. 2347–2356. pp. 2007–2015.
[97] E. Contal, D. Buffoni, A. Robicquet, and N. Vayatis, ‘‘Parallel Gaussian [120] C. Zhang and K. Chaudhuri, ‘‘Active learning from weak and strong
process optimization with upper confidence bound and pure exploration,’’ labelers,’’ in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 703–711.
in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases. [121] E. C. Garrido-Merchán and D. Hernández-Lobato, ‘‘Dealing with Cat-
Berlin, Germany: Springer, 2013, pp. 225–240. egorical and Integer-valued Variables in Bayesian optimization with
[98] S. Gupta, A. Shilton, S. Rana, and S. Venkatesh, ‘‘Exploiting strategy- Gaussian processes,’’ 2017, arXiv:1706.03673. [Online]. Available:
space diversity for batch Bayesian optimization,’’ in Proc. Int. Conf. Artif. https://arxiv.org/abs/1805.03463
Intell. Statist., 2018, pp. 538–547. [122] M. A. Villegas García, ‘‘An investigation into new kernels for categorical
[99] T. T. Joy, S. Rana, S. Gupta, and S. Venkatesh, ‘‘Batch Bayesian optimiza- variables,’’ M.S. thesis, Departament de Llenguatges i Sistemes Infor-
tion using multi-scale search,’’ Knowl.-Based Syst., vol. 187, Jan. 2020, màtics, Universitat Politècnica de Catalunya, Barcelona, Spain, 2013.
Art. no. 104818. [123] R. Martinez-Cantin, ‘‘BayesOpt: A Bayesian optimization library for
[100] A. Shah and Z. Ghahramani, ‘‘Parallel predictive entropy search for nonlinear optimization, experimental design and bandits,’’ J. Mach.
batch global optimization of expensive objective functions,’’ in Proc. Adv. Learn. Res., vol. 15, no. 1, pp. 3735–3739, 2014.
Neural Inf. Process. Syst., 2015, pp. 3330–3338. [124] J. Wang, S. C. Clark, E. Liu, and P. I. Frazier, ‘‘Parallel Bayesian global
[101] J. Wu and P. Frazier, ‘‘The parallel knowledge gradient method for batch optimization of expensive functions,’’ 2016, arXiv:1602.05149. [Online].
Bayesian optimization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2016, Available: https://arxiv.org/abs/1602.05149
pp. 3126–3134. [125] N. Knudde, J. van der Herten, T. Dhaene, and I. Couckuyt, ‘‘GPflow:
[102] P. Vellanki, S. Rana, S. Gupta, D. Rubin, A. Sutti, T. Dorin, M. Height, A Gaussian process library using TensorFlow,’’ 2017, arXiv:1711.03845.
P. Sanders, and S. Venkatesh, ‘‘Process-constrained batch Bayesian opti- [Online]. Available: https://arxiv.org/abs/1610.08733
misation,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 3417–3426.
[103] S. Shan and G. G. Wang, ‘‘Survey of modeling and optimization strate- STEWART GREENHILL received the B.Sc. degree
gies to solve high-dimensional design problems with computationally-
in computer science from the University of West-
expensive black-box functions,’’ Struct. Multidisciplinary Optim., vol. 41,
ern Australia, in 1987, and the Ph.D. degree
no. 2, pp. 219–241, Mar. 2010.
[104] A. M. Gopakumar, P. V. Balachandran, D. Xue, J. E. Gubernatis, and in environmental science from Murdoch Univer-
T. Lookman, ‘‘Multi-objective optimization for materials discovery via sity, in 1992. He is currently a Research Fellow
adaptive design,’’ Sci. Rep., vol. 8, no. 1, p. 3738, 2018. with the Applied Artificial Intelligence Institute,
[105] Y. Collette and P. Siarry, Multiobjective Optimization: Principles and Deakin University, Australia. His research inter-
Case Studies. Berlin, Germany: Springer-Verlag, 2013. ests include machine learning, signal processing,
[106] J. Knowles, ‘‘ParEGO: A hybrid algorithm with on-line landscape embedded systems, software engineering, visual-
approximation for expensive multiobjective optimization problems,’’ ization, and interaction design.
IEEE Trans. Evol. Comput., vol. 10, no. 1, pp. 50–66, Feb. 2006.
[107] W. Ponweiser, T. Wagner, D. Biermann, and M. Vincze, ‘‘Multiobjective SANTU RANA is currently a Researcher in the
optimization on a limited budget of evaluations using model-assisted field of machine learning and computer vision
S -metric selection,’’ in Proc. Int. Conf. Parallel Problem Solving Nature. with the Applied Artificial Intelligence Insti-
Berlin, Germany: Springer, 2008, pp. 784–794.
tute, Deakin University, Australia. His research in
[108] M. Emmerich and J.-W. Klinkenberg, ‘‘The computation of the expected
improvement in dominated hypervolume of Pareto front approxima-
high-dimensional Bayesian optimization has been
tions,’’ Rapport Technique, Leiden Univ., Leiden, The Netherlands, applied to efficiently design alloys with large num-
Tech. Rep. LIACS-TR 9-2008, 2008. ber of elements. He has been actively conduct-
[109] V. Picheny, ‘‘Multiobjective optimization using Gaussian process emula- ing research in Bayesian experimental design with
tors via stepwise uncertainty reduction,’’ Statist. Comput., vol. 25, no. 6, applications in advanced manufacturing. In the last
pp. 1265–1280, Nov. 2015. four years, he has published more than 40 research
[110] D. Hernández-Lobato, J. Hernandez-Lobato, A. Shah, and R. Adams, articles improving various aspects of Bayesian optimization algorithm. Alto-
‘‘Predictive entropy search for multi-objective Bayesian optimization,’’ gether, he has published over 79 research articles, including 14 refereed
in Proc. Int. Conf. Mach. Learn., 2016, pp. 1492–1501. journal articles, 58 fully refereed conference proceedings, and seven work-
[111] P. Feliot, J. Bect, and E. Vazquez, ‘‘A Bayesian approach to constrained shop articles, with over 515 citations and an H-index of 12. He is also a
single-and multi-objective optimization,’’ J. Global Optim., vol. 67, Co-Inventor of two patents. His broad research interests lie in devising practi-
nos. 1–2, pp. 97–133, 2017.
cal machine learning algorithms for various tasks, such as object recognition,
[112] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, ‘‘Distributed
mathematical optimization, and healthcare data modeling.
optimization and statistical learning via the alternating direction method
of multipliers,’’ Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122,
2011. SUNIL GUPTA is currently a Researcher in
[113] P. Perdikaris and G. E. Karniadakis, ‘‘Model inversion via multi-fidelity the field of machine learning and data mining
Bayesian optimization: A new paradigm for parameter estimation in with the Applied Artificial Intelligence Institute,
haemodynamics, and beyond,’’ J. Roy. Soc. Interface, vol. 13, no. 118, Deakin University, Australia. He has published
p. 20151107, 2016. over 100 research articles, including two book
[114] A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause, chapters, 25 refereed journal articles, 70 fully ref-
S. Schaal, and S. Trimpe, ‘‘Virtual vs. real: Trading off simulations ereed conference proceedings, and nine workshop
and physical experiments in reinforcement learning with Bayesian opti- articles with over 1000 citations and an H-index
mization,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2017, of 17. His research interest lies in developing
pp. 1557–1563. data-driven models for real world processes and
[115] K. Kandasamy, G. Dasarathy, J. Schneider, and B. Poczos, ‘‘Multi-fidelity
phenomena covering both big-data and small-data problems. His recent
Bayesian optimisation with continuous approximations,’’ in Proc. Int.
research in optimization using small data (Bayesian optimization) has found
Conf. Mach. Learn., 2017.
[116] A. Klein, S. Bartels, S. Falkner, P. Hennig, and F. Hutter, ‘‘Towards applications in efficient experimental design of products and processes in
efficient Bayesian optimization for big data,’’ in Proc. NIPS Workshop advanced manufacturing, such as alloy design with certain target proper-
Bayesian Optim. (BayesOpt), vol. 134, 2015, p. 98. ties, design of short nanofibers with appropriate length and thickness, and
[117] M. Poloczek, J. Wang, and P. Frazier, ‘‘Multi-information source opti- optimal setting of parameters in 3d-printers. His research has won several
mization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4291–4301. best paper awards in the field of data mining and machine learning. He is
[118] M. Cutler, T. J. Walsh, and J. P. How, ‘‘Reinforcement learning with multi- also a Co-Inventor of a patent related to experimental design. He regularly
fidelity simulators,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), serves at technical program committees of the prestigious machine learning
May 2014, pp. 3888–3895. conferences.


PRATIBHA VELLANKI graduated from Pune University in electronics and telecommunication, in 2009, where she continued to pursue the master's degree in signal processing, in 2011. She received the Ph.D. degree from Deakin University, Australia. She worked as an Associate Research Fellow in the field of applied machine learning with Deakin University, from 2012 to 2018. She is interested in taking a machine learning perspective on research problems that affect people and health. She currently works as a Data Scientist for the Office for National Statistics, U.K. This work was done during her time with Deakin University.

SVETHA VENKATESH is currently an ARC Australian Laureate Fellow, the Alfred Deakin Professor, and the Co-Director of the Applied Artificial Intelligence Institute (A2I2), Deakin University, Australia. She and her team have tackled a wide range of problems of societal significance, including the critical areas of autism, security, and aged care. The outcomes have impacted the community and evolved into publications, patents, tools, and spin-off companies. This includes more than 600 publications, three full patents, start-up companies (iCetana), and a significant product (TOBY Playpad). Prof. Venkatesh was elected as a Fellow of the International Association of Pattern Recognition, in 2004, for contributions to formulation and extraction of semantics in multimedia data, and the Australian Academy of Technological Sciences and Engineering, in 2006. In 2017, she was appointed as an Australian Laureate Fellow, the highest individual award the Australian Research Council can bestow.
