Bayesian Optimization for Adaptive Experimental Design: A Review
ABSTRACT Bayesian optimisation is a statistical method that efficiently models and optimises expensive
‘‘black-box’’ functions. This review considers the application of Bayesian optimisation to experimental
design, in comparison to existing Design of Experiments (DOE) methods. Solutions are surveyed for a range
of core issues in experimental design including: the incorporation of prior knowledge, high dimensional
optimisation, constraints, batch evaluation, multiple objectives, multi-fidelity data, and mixed variable types.
INDEX TERMS Bayesian methods, design for experiments, design optimization, machine learning algorithms.
FIGURE 1. Sampling methods used in experimental design. In classical Factorial designs samples are placed on a geometric grid. Space filling
designs are used with a variety of non-linear models. Sample requirements are determined heuristically, but these designs are empirically much
more efficient than grids.
multi-start derivative-free local optimiser e.g. COBYLA [36], or evolutionary algorithms e.g. ISRES [37], or Lipschitzian methods such as DIRECT [34]. However, none of these are designed to be sample efficient, and all need to evaluate a function many times to perform optimisation. In contrast, Bayesian optimisation uses a model-based approach with an adaptive sampling strategy to minimise the number of function evaluations.

Past approaches to experimental design have closely coupled sampling and modelling. Factorial designs assume a linear model and sample at orthogonal corners of the design space (see Figure 1). For more complex non-linear models, general purpose space-filling designs such as Latin hypercubes offer a more uniform coverage of the design space. For N sample points in k dimensions, there are (N!)^(k−1) possible Latin hypercube designs, and finding a suitable design involves balancing space-filling (e.g. via entropy, or potential energy) with other desirable properties such as orthogonality. Much literature exists on the design of Latin hypercubes, and many research issues remain open [10], [11] such as: mixing of discrete and continuous variables, incorporation of global sensitivity information, and sequential sampling.
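As a concrete illustration of the space-filling idea, the sketch below builds a basic Latin hypercube with NumPy. The helper name latin_hypercube, the design size, and the variable ranges are illustrative assumptions; practical generators add criteria such as maximin distance or orthogonality on top of this.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=None):
    """Basic Latin hypercube: one sample in each of n_samples strata per dimension."""
    rng = np.random.default_rng(seed)
    # Jittered stratum centres in [0, 1), one row per sample.
    points = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the stratum order independently in each dimension.
    for d in range(n_dims):
        points[:, d] = rng.permutation(points[:, d])
    return points

# Example: an 8-point design over three process variables, scaled to their ranges.
lower = np.array([0.0, 10.0, 1.0])
upper = np.array([1.0, 50.0, 5.0])
design = lower + latin_hypercube(8, 3, seed=0) * (upper - lower)
```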
Response Surface Methodology (RSM) [3], [12] is a sequential approach which has become the primary method for industrial experimentation. In its original form, response surfaces are second-order polynomials which are determined using central composite factorial experiments, and a path of steepest ascent is used to seek an optimal point. For robust design, replication is used to estimate noise factors, and optimisation must consider dual responses for process mean and variance. Approaches for handling multiple objectives include ‘‘split-plot’’ techniques, ‘‘desirability functions’’ and Pareto fronts [13]. Non-parametric RSM can be more general than second-order polynomials, and uses techniques such as Gaussian processes, thin-plate splines, and neural networks. Alternative optimisation approaches include simulated annealing, branch-and-bound and genetic algorithms [14].
In many areas, experiments are performed with detailed computer simulations of physical systems. Aerospace designers frequently work with expensive CFD (computational fluid dynamics) and FEA (finite element analysis) simulations. Multi-agent simulations are used to model how actor behaviour determines the outcome of group interactions in areas such as defence, networking, transportation, and logistics. Design and Analysis of Computer Experiments (or DACE, after [15]) differs from DOE in several ways. Simulations are generally deterministic, without random effects and uncontrolled variables, so less emphasis is placed on dealing with measurement noise. Simulations often include many variables, so there is more need to handle high dimensionality and mixed variable types. Where the response is complex, non-parametric models are used, including Gaussian Processes, Multivariate Adaptive Regression Splines, and Support Vector Regression [4], [16], [17].

A problem with classical DOE and space-filling designs is that the sampling pattern is determined before measurements are made, and cannot adapt to features that appear during the experiment. In contrast, adaptive sampling [16], [18] is a sequential process that decides the location of the next sample by balancing two criteria. Firstly, it samples in areas that have not been previously explored (e.g. based on distance from previous samples). Secondly, it samples more densely in areas where interesting behaviour is observed, such as rapid change or non-linearity. This can be detected using local gradients, prediction variance (e.g. where uncertainty is modelled), by checking agreement between the model and data (cross-validation), or agreement between an ensemble of models. BO is a form of model-based global optimisation (MBGO [16]), which uses adaptive sampling to guide the experiment towards a global optimum. Unlike pure adaptive sampling, MBGO considers the optimum of the modelled objective when deciding where to sample.

Recently, there has been a surge in applying Bayesian optimisation to design problems involving physical products and processes. In [23], Bayesian optimisation is applied in combination with a density functional theory (DFT) based computational tool to design low thermal hysteresis NiTi-based shape memory alloys. Similarly, in [24] Bayesian optimisation is used to optimise both the alloy composition and the associated heat treatment schedule to improve the performance of Al-7xxx series alloys. In [25], Bayesian optimisation is applied for high-quality nano-fibre design meeting a required specification of fibre length and diameter within a few tens of iterations, greatly accelerating the production process. It has also been applied in other diverse fields including optimisation of nano-structures for optimal phonon transport [26], optimisation for maximum power point tracking in photovoltaic power plants [27], optimisation for efficient determination of metal oxide grain boundary structures [28], and optimisation of computer game design to maximise engagement [29]. It has also been used in a recent neuroscience study [30] to design cognitive tasks that maximally segregate ventral and dorsal FPN activity.

The recent advances in both the theory and practice of Bayesian optimisation have led to a plethora of techniques. For the most part, each advance is applicable to a sub-set of experimental conditions. What is lacking is both an overview of these methods and a methodology to adapt these techniques to a particular experimental design context. We fill this gap and provide a comprehensive study of the state-of-the-art Bayesian optimisation algorithms in terms of their applicability in experimental optimisation. Further, we provide a template of how disparate algorithms can be connected to create a fit-for-purpose solution. This provides an overview of the capability and increases the reach of these powerful methods. We conclude by discussing where further research is needed.

II. BAYESIAN OPTIMISATION
Bayesian optimisation incorporates two main ideas:
• A Gaussian process (GP) is used to maintain a belief over the design space. This simultaneously models the predicted mean µt(x) and the epistemic uncertainty σt(x) at any point x in the input space, given a set of observations D1:t = {(x1, y1), (x2, y2), ..., (xt, yt)}, where xt is the process input, and yt is the corresponding output at time t.
• An acquisition function expresses the most promising setting for the next experiment, based on the predicted mean µt(x) and the uncertainty σt(x).

A GP is completely specified by its mean function m(x) and covariance function k(x, x'):

f(x) ~ GP(m(x), k(x, x'))    (1)

The covariance function k(x, x') is also called the ‘‘kernel’’, and expresses the ‘‘smoothness’’ of the process. We expect that if two points x and x' are ‘‘close’’, then the corresponding process outputs y and y' will also be ‘‘close’’, and that the closeness depends on the distance between the points, and not the absolute location or direction of separation. A popular choice for the covariance function is the squared exponential (SE) function, also known as the radial basis function (RBF):

k(x, x') = exp( −‖x − x'‖² / (2θ²) )    (2)

Equation 2 says that the correlation decreases with the square of the distance between points, and includes a parameter θ to define the length scale over which this happens. Specialised kernel functions are sometimes used to express pre-existing knowledge about the function (e.g. if something is known about the shape of f).

In an experimental setting, observations include a term for normally distributed noise ε ~ N(0, σnoise²), and the observation model is:

y = f(x) + ε
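As a minimal sketch of the covariance and observation model above (NumPy only; the function names, the length scale θ = 0.5, and the noise level are illustrative assumptions):

```python
import numpy as np

def se_kernel(X1, X2, theta=0.5):
    """Squared exponential kernel of Equation 2: k(x, x') = exp(-||x - x'||^2 / (2 theta^2))."""
    sq_dist = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * theta ** 2))

def observe(f, X, noise_std=0.1, seed=None):
    """Noisy measurements y = f(x) + eps with eps ~ N(0, noise_std^2)."""
    rng = np.random.default_rng(seed)
    return f(X) + noise_std * rng.standard_normal(len(X))
```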
FIGURE 2. Bayesian optimisation is an iterative process in which the unknown system response is modelled using a Gaussian process. An acquisition
function expresses the most promising setting for the next experiment, and can be efficiently optimised. The model quality improves progressively over
time as successive measurements are incorporated.
Gaussian process regression (or ‘‘kriging’’) can predict the value of the objective function f(·) at time t + 1 for any location x. The result is a normal distribution with mean µt(x) and uncertainty σt(x):

P(ft+1 | D1:t, x) = N(µt(x), σt²(x))    (3)

where

µt(x) = kᵀ [K + σnoise² I]⁻¹ y1:t
σt²(x) = k(x, x) − kᵀ [K + σnoise² I]⁻¹ k    (4)
k = [k(x, x1), k(x, x2), . . . , k(x, xt)]

K = [ k(x1, x1)  · · ·  k(x1, xt)
          ⋮         ⋱        ⋮
      k(xt, x1)  · · ·  k(xt, xt) ]    (5)
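Equations (3)–(5) amount to a few lines of linear algebra. A sketch (assuming the se_kernel helper from the earlier sketch; a Cholesky factorisation replaces the explicit matrix inverse for numerical stability):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_posterior(X_obs, y_obs, X_query, kernel, noise_std=0.1):
    """Posterior mean and variance of f at X_query given noisy observations (Eqs. 3-5)."""
    K = kernel(X_obs, X_obs) + (noise_std ** 2) * np.eye(len(X_obs))   # K + sigma_noise^2 I
    k_star = kernel(X_obs, X_query)                                    # k vectors, one column per query point
    chol = cho_factor(K, lower=True)
    alpha = cho_solve(chol, y_obs)                                     # [K + sigma^2 I]^-1 y_1:t
    mu = k_star.T @ alpha                                              # Eq. (4): posterior mean
    v = cho_solve(chol, k_star)                                        # [K + sigma^2 I]^-1 k
    var = np.diag(kernel(X_query, X_query)) - np.sum(k_star * v, axis=0)
    return mu, np.maximum(var, 1e-12)                                  # clip tiny negative values
```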
Using the Gaussian process model, an acquisition function is constructed to represent the most promising setting for the next experiment. Acquisition functions are mainly derived from the µ(x) and σ(x) of the GP model, and are hence cheap to compute. The acquisition function allows a balance between exploitation (sampling where the objective mean µ(·) is high) and exploration (sampling where the uncertainty σ(·) is high), and its global maximiser is used as the next experimental setting.

Acquisition functions are designed to be large near potentially high values of the objective function. Figure 3 shows commonly used acquisition functions: PI, EI, and GP-UCB.

FIGURE 3. Acquisition functions expressed in terms of the mean µ(x), variance σ(x), and current maximum f(x+). Φ(·) and φ(·) are the cumulative distribution function and the probability density function of the standard normal distribution. Some functions include factors to balance between exploration and exploitation: ξ in PI is constant, whereas κt in GP-UCB usually increases with iteration, causing the search to maintain exploration even with many samples.

PI prefers areas where improvement over the current maximum f(x+) is most likely. EI considers not only the probability of improvement, but also the expected magnitude of improvement. GP-UCB maximises f(·) while minimising regret, the difference between the average utility and the ideal utility. Regret bounds are important for theoretically proving convergence. Unlike the original function, the acquisition function can be cheaply sampled, and may be optimised using a derivative-free global optimisation method like DIRECT [34] or a multi-start method with a derivative-based local optimiser such as L-BFGS [35]. Details can be found in [19], [21].

III. EXPERIMENTAL DESIGN WITH BAYESIAN OPTIMISATION
BO has been influential in computer science for hyperparameter tuning [38]–[42], combinatorial optimisation [43], [44], and reinforcement learning [21]. Recent years have seen new applications in areas such as robotics [45], [46], neuroscience [47], [48], and materials discovery [49]–[55].

Bayesian optimisation is an iterative process outlined in Figure 2, which can be applied to experiments where inputs are unconstrained and the objective is a scalarised function of measured outputs. Examples of this kind include material design using physical models [56], or laboratory experiments [25]. However, experiments often involve
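The acquisition functions summarised in Figure 3 translate directly into code. A sketch (mu and sigma are the GP posterior mean and standard deviation at candidate settings, f_best the incumbent maximum; the ξ and κ values and helper names are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    return norm.cdf((mu - f_best - xi) / sigma)

def expected_improvement(mu, sigma, f_best):
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def gp_ucb(mu, sigma, kappa=2.0):
    return mu + kappa * sigma

# One pass of the loop in Figure 2: score candidate settings and run the
# next experiment at the maximiser of the chosen acquisition function.
# mu, var = gp_posterior(X_obs, y_obs, candidates, se_kernel)
# x_next = candidates[np.argmax(expected_improvement(mu, np.sqrt(var), y_obs.max()))]
```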
TABLE 1. Methods for transferring prior knowledge from past experiments (source) to new experiments (target) (1–3). Methods marked (*) have only
been demonstrated for Gaussian processes, but are also applicable to Bayesian optimisation. Methods for handling high dimensionality (4),
constraints (5), and parallel optimisation (6).
Many methods have been proposed for using Bayesian optimisation for multi-objective optimisation [106]–[109], but these suffer from computational limitations because the acquisition function generally requires computation for all objective functions, and as the number of objective functions grows the computational cost grows exponentially.

Moving away from EI, the method of [109] allows the optimisation of multiple objectives without rank modelling for conflicting objectives, while also remaining scale-invariant toward different objectives. The method performs better than [107], but suffers in high dimensions and can be computationally expensive. Predictive entropy search is used by [110], allowing the different objectives to be decoupled, computing acquisition for subsets of objectives when required. The computational cost increases linearly with the number of objectives. The method of [111] can be used for single- or multiple-objective optimisation, including multiple inequality constraints, and has been shown to be robust in highly constrained settings where the feasible design space is small.
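Besides the specialised acquisitions above, a lightweight baseline is random scalarisation in the spirit of ParEGO [106]: at each iteration the observed objective vectors are collapsed into a single value with a randomly weighted augmented Chebyshev function, and ordinary single-objective BO is run on that value. A sketch (objectives assumed rescaled to [0, 1] and treated as costs to minimise; rho is the usual small augmentation constant):

```python
import numpy as np

def random_weights(n_objectives, seed=None):
    rng = np.random.default_rng(seed)
    w = rng.random(n_objectives)
    return w / w.sum()

def chebyshev_scalarise(Y, weights, rho=0.05):
    """Augmented Chebyshev scalarisation of an (n_points, n_objectives) cost matrix."""
    weighted = Y * weights
    return np.max(weighted, axis=1) + rho * np.sum(weighted, axis=1)

# Per iteration: draw new weights, scalarise the observed costs, then fit the GP
# to the negated values and maximise EI as in the single-objective case.
# w = random_weights(Y_obs.shape[1])
# y_single = -chebyshev_scalarise(Y_obs, w)
```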
D. CONSTRAINTS
Table 1(5) outlines some approaches to handling constraints. If constraints are known, they can be handled during optimisation of the acquisition function by limiting the search. More difficult are ‘‘black box’’ constraints that can be evaluated but have unknown form. If the constraint is cheap to evaluate, this is not a problem. Methods for expensive constraint functions include a weighted EI function [83], [84], and weighted predictive entropy search [86]. A lookahead strategy for unknown constraints is described by [88]. A different formulation for unknown constraints is proposed by [85], handling expensive constraints using the ADMM solver of [112].

The above methods deal with inequality constraints. In [89] both inequality and equality constraints are handled, using slack variables to convert inequality constraints to equality constraints, and an Augmented Lagrangian (AL) to convert these constraints into a sequence of simpler sub-problems.

The concept of weighted predictive entropy search has been extended to multi-objective problems [87] with inequality constraints that are both unknown and expensive to evaluate. A different type of constraint, specific to multiple objectives, is investigated by [90], where there exists a rank-order preference over which objectives are more important. The algorithm developed therein can preferentially sample the Pareto set such that Pareto samples are more varied for the more important objectives.
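The weighted-EI idea of [83], [84] is simple to sketch: a second GP models the black-box constraint, and EI on the objective is multiplied by the predicted probability that the constraint is satisfied. A sketch (reusing the gp_posterior and expected_improvement helpers from earlier sketches; feasibility is taken to mean c(x) ≤ 0):

```python
import numpy as np
from scipy.stats import norm

def constraint_weighted_ei(candidates, X_obs, y_obs, c_obs, kernel):
    """EI on the objective weighted by the probability of feasibility."""
    mu_f, var_f = gp_posterior(X_obs, y_obs, candidates, kernel)   # GP over the objective
    mu_c, var_c = gp_posterior(X_obs, c_obs, candidates, kernel)   # GP over the constraint
    feasible = c_obs <= 0
    f_best = y_obs[feasible].max() if feasible.any() else y_obs.min()
    ei = expected_improvement(mu_f, np.sqrt(var_f), f_best)
    prob_feasible = norm.cdf(-mu_c / np.sqrt(var_c))               # P(c(x) <= 0)
    return ei * prob_feasible

# x_next = candidates[np.argmax(constraint_weighted_ei(candidates, X, y, c, se_kernel))]
```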
E. PARALLEL (BATCH) OPTIMISATION
In some experiments it can be efficient to evaluate several settings in parallel. For example, during alloy design, batches of different mixtures undergo similar heat treatment phases, so the optimiser must recommend multiple settings before receiving any new results. Sequential algorithms can be used to find the point that maximises the acquisition function, and then move on to find the next point in the batch after suppressing this point. Suppression can be achieved by temporarily updating the GP with a hypothetical value for the point (e.g. based on a recent posterior mean), or by applying a penalty in the acquisition function. Table 1(6) outlines some approaches that have been reported. Most methods are for unconstrained batches, though recent work has handled constraints on selected variables within a batch [102].
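The suppression strategy described above is easy to sketch in its ‘‘hallucinated observation’’ form, where the GP is temporarily updated with the posterior mean at each selected point so that subsequent selections move elsewhere (gp_posterior and expected_improvement are the helpers from earlier sketches; the batch size is an illustrative assumption):

```python
import numpy as np

def greedy_batch(candidates, X_obs, y_obs, kernel, batch_size=4):
    """Select a batch by repeatedly maximising EI with hallucinated observations."""
    X_aug, y_aug = X_obs.copy(), y_obs.copy()
    batch = []
    for _ in range(batch_size):
        mu, var = gp_posterior(X_aug, y_aug, candidates, kernel)
        acq = expected_improvement(mu, np.sqrt(var), y_aug.max())
        best = int(np.argmax(acq))
        batch.append(candidates[best])
        # Suppress this point: pretend its outcome equals the current posterior mean.
        X_aug = np.vstack([X_aug, candidates[best:best + 1]])
        y_aug = np.append(y_aug, mu[best])
    return np.array(batch)
```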
F. MULTI-FIDELITY OPTIMISATION
When function evaluations are prohibitively expensive, cheap approximations may be useful. In such situations high fidelity data obtained through experimentation might be augmented by low fidelity data obtained through running a simulation. For example, during alloy design, simulation software can predict the alloy strength, but the results may be less accurate than measurements obtained from casting experiments. Multi-fidelity Bayesian optimisation has been demonstrated in [113], [114]. Recently, [115] proposed BO for an optimisation problem with multi-fidelity data. Although the multi-fidelity approach has been applied in problem-specific contexts or non-optimisation related tasks [41], [116]–[120], the method of [115] generalises well for BO problems.

G. MIXED-TYPE INPUT
Experimental parameters are often combinations of different types: continuous, discrete, categorical, and binary. Incorporation of mixed-type input is challenging across the domains, including for simpler methods such as Latin hypercube sampling [11]. Non-continuous variables are problematic in BO because the objective function approximation with a GP assumes a continuous input space, with covariance functions defining the relationship between these continuous variables. One common way to deal with discrete variables is to round the value to a close integer [40], but this approach leads to sub-optimal optimisation [121].

Two options for handling mixed-type inputs are: (1) designing kernels that are suitable for different variable types, and (2) subsampling of data for maximising the objective function, which is especially useful in higher dimensional space. For integer variables the problem can be solved through a kernel transformation, by assuming the objective function to be flat over the region where two continuous values would be rounded to the same integer [121]. In [67] categorical variables are included by one-hot encoding alongside numerical variables. A specialised kernel for categorical variables is proposed in [122].

Random forest regression is a good alternative to the GP for regression in sequential model-based algorithm configuration (SMAC, [44]). Random forests are good at exploitation but do not perform well for exploration, as they may not predict well at points that are distant from observations. Additionally, a non-differentiable response surface renders it unsuitable for gradient-based optimisation.
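The kernel transformation of [121] can be sketched by snapping the integer dimensions to whole numbers before the covariance is computed, so the surrogate is exactly flat between settings that round to the same integer; categorical levels can be one-hot encoded alongside, as in [67]. A sketch (wrapping the se_kernel helper from the earlier sketch; int_dims and the example variables are illustrative assumptions):

```python
import numpy as np

def mixed_se_kernel(X1, X2, int_dims=(), theta=0.5):
    """SE kernel applied after rounding the integer-valued dimensions."""
    X1r, X2r = X1.copy(), X2.copy()
    for d in int_dims:
        X1r[:, d] = np.round(X1r[:, d])
        X2r[:, d] = np.round(X2r[:, d])
    return se_kernel(X1r, X2r, theta=theta)

def one_hot(levels, value):
    """Encode a categorical value against a fixed list of levels."""
    return np.array([1.0 if value == level else 0.0 for level in levels])

# e.g. a setting = [temperature (continuous), furnace slot (integer), supplier (categorical)]
# x = np.concatenate([[452.5, 3.0], one_hot(["A", "B", "C"], "B")])
```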
FIGURE 5. Current capability graph on the composability of various aspects of experimental design problems in Bayesian optimisation. It is possible to
compose algorithms which lie on a path in the graph. It is possible to finish at any block and even skip multiple blocks on a path. Regular text denotes
the capability achievable with standard Bayesian optimisation, whereas highlighted text denotes the existence of specialised algorithms.
IV. DISCUSSION
Machine-learning methods through Bayesian optimisation offer a powerful way to deal with many problems of experimental optimisation that have not been previously addressed. While techniques exist for different issues (high dimensionality, multi-objective, etc.), few works solve multiple issues in a general way. Methods are likely to be composable where no incompatible changes are required to the BO process. Figure 5 outlines composability based on the current repertoire of Bayesian optimisation algorithms. When a design problem is single objective, has single-fidelity measurement, and all the variables are continuous, then it offers the greatest flexibility in terms of adding specific capability such as transfer learning or high dimensional optimisation. Other cases require careful selection of algorithms to add desired capabilities. For example, the method of [111] handles multiple objectives with constraints, and the method of [43] handles parallel evaluation in high dimensions with mixed-type inputs. Some combinations may not even be possible; for example, a Random Forest based algorithm such as [44] would not admit many capabilities. Note that this graph does not portray any theoretical limitations, but merely presents a gist of the current capability through the lens of composability.

Several open-source libraries are available for incorporating BO into computer programs. Depending on the application, computation speed may be an issue. A common operation in most algorithms is the Cholesky decomposition, which is used to invert the kernel matrix and is generally O(n³) for n data points, but with care this can be calculated incrementally as new points arrive, reducing the complexity to O(n²) [123]. Several algorithms gain speed-up by implementing part of the algorithm on a GPU, which can be up to 100 times faster than the equivalent single-threaded code [124].
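The incremental update mentioned above can be written down directly: when a new observation arrives, the existing lower-triangular Cholesky factor of the kernel matrix (plus noise) is extended by one row in O(n²) rather than refactorised from scratch in O(n³). A sketch (NumPy/SciPy; the function name and arguments are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_triangular

def extend_cholesky(L, k_new, k_self, noise_var):
    """Grow chol(K + sigma^2 I) by one data point in O(n^2).

    L       : (n, n) lower Cholesky factor of the current matrix
    k_new   : (n,) covariances between the new point and the existing points
    k_self  : covariance of the new point with itself
    """
    l_row = solve_triangular(L, k_new, lower=True)        # forward substitution, O(n^2)
    l_diag = np.sqrt(k_self + noise_var - l_row @ l_row)  # new diagonal entry
    n = L.shape[0]
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l_row
    L_new[n, n] = l_diag
    return L_new
```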
• GPyOpt (https://github.com/SheffieldML/GPyOpt) is a Bayesian optimisation framework, written in Python and supporting parallel optimisation, mixed factor types (continuous, discrete, and categorical), and inequality constraints.
• GPflowOpt (https://github.com/GPflow/GPflowOpt) is written in Python and uses TensorFlow (https://www.tensorflow.org) to accelerate computation on GPU hardware. It supports multi-objective acquisition functions, and black-box constraints [125].
• DiceOptim (https://cran.r-project.org/web/packages/DiceOptim/index.html) is a BO package written in R. Mixed equality and inequality constraints are implemented using the method of [89], and parallel optimisation is via multipoint EI [91]; however, parallelism and constraints cannot be mixed in a single optimisation.
• MOE (https://github.com/Yelp/MOE) supports parallel optimisation via multi-point stochastic gradient ascent [124]. Interfaces are provided for Python and C++, and optimisation can be accelerated on GPU hardware.
• SigOpt (http://sigopt.com) offers Bayesian optimisation as a web service. The implementation is based on MOE, but includes some enhancements such as mixed factor types (continuous, discrete, categorical), and automatic hyperparameter tuning.
• BayesOpt (https://github.com/rmcantin/bayesopt) is written in C++, and includes common interfaces for C, C++, Python, Matlab, and Octave [123].

V. CONCLUSION
This review has presented an overview of Bayesian optimisation (BO) with application to experimental design. BO was introduced in relation to existing Design of Experiments (DOE) methods such as factorial designs, response surface methodology, and adaptive sampling. A brief discussion of the theory highlighted the roles of the Gaussian process, kernel, and acquisition function. A set of seven core issues was identified as being important in practical experimental designs, and some detailed solutions were reviewed. These core issues are: (1) the incorporation of prior knowledge, (2) high dimensional optimisation, (3) constraints, (4) batch evaluation, (5) multiple objectives, (6) multi-fidelity data, and (7) mixed variable types.

Recent works have shown the potential of Bayesian optimisation in fields such as robotics, neuroscience, and materials discovery. As the range of potential applications expands, it is increasingly unlikely that ‘‘vanilla’’ optimisation approaches for small numbers of unconstrained, continuous variables will be appropriate. This is particularly true in DACE simulation applications where high dimensional mixed-type inputs are typical.

Bayesian optimisation offers a powerful and rigorous framework for exploring and optimising expensive ‘‘black box’’ functions. While solutions exist for the core issues in experimental design, each approach has strengths and weaknesses that could potentially be improved, and the combination of the individual solutions is not necessarily straightforward. Thus there is a need for ongoing work in this area to: (1) improve the efficiency, generality, and scalability of approaches to the core issues, (2) develop designs that allow easy combination of multiple approaches, and (3) develop theoretical guarantees on the performance of solutions.
[49] T. Ueno, T. D. Rhone, Z. Hou, T. Mizoguchi, and K. Tsuda, ‘‘COMBO: [72] T. Dai Nguyen, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, K. J. Deane,
An efficient Bayesian optimization library for materials science,’’ Mater. and P. G. Sanders, Cascade Bayesian Optimization. 2016, pp. 268–280.
Discovery, vol. 4, pp. 18–21, Jun. 2016. [73] S. Rana, C. Li, S. Gupta, V. Nguyen, and S. Venkatesh, ‘‘High dimen-
[50] T. Lookman, P. V. Balachandran, D. Xue, J. Hogden, and J. Theiler, ‘‘Sta- sional Bayesian optimization with elastic Gaussian process,’’ in Proc. Int.
tistical inference and adaptive design for materials discovery,’’ Current Conf. Mach. Learn., 2017, pp. 2883–2891.
Opinion Solid State Mater. Sci., vol. 21, no. 3, pp. 121–128, Jun. 2017. [74] C. Li, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, and A. Shilton, ‘‘High
[51] R. Gómez-Bombarelli et al., ‘‘Design of efficient molecular organic light- dimensional Bayesian optimization using dropout,’’ in Proc. 26th Int.
emitting diodes by a high-throughput virtual screening and experimental Joint Conf. Artif. Intell., 2017, pp. 2096–2102.
approach,’’ Nature Mater., vol. 15, no. 10, pp. 1120–1127, Oct. 2016. [75] C. Oh, E. Gavves, and M. Welling, ‘‘BOCK: Bayesian optimization
[52] P. I. Frazier and J. Wang, ‘‘Bayesian optimization for materials design,’’ with cylindrical kernels,’’ in Proc. 35th Int. Conf. Mach. Learn., J.
in Proc. Inf. Sci. Mater. Discovery Design. Cham, Switzerland: Springer, Dy and A. Krause, Eds. Stockholm, Sweden: Stockholmsmässan, 2018,
2016, pp. 45–75. pp. 3868–3877.
[53] A. Seko, T. Maekawa, K. Tsuda, and I. Tanaka, ‘‘Machine learning [76] C.-L. Li, K. Kandasamy, B. Póczos, and J. Schneider, ‘‘High dimensional
with systematic density-functional theory calculations: Application to Bayesian optimization via restricted projection pursuit models,’’ in Proc.
melting temperatures of single-and binary-component solids,’’ Phys. Rev. Artif. Intell. Statist., 2016, pp. 884–892.
B, Condens. Matter, vol. 89, no. 5, 2014, Art. no. 054303. [77] Z. Wang, C. Li, S. Jegelka, and P. Kohli, ‘‘Batched high-dimensional
[54] A. Seko, A. Togo, H. Hayashi, K. Tsuda, L. Chaput, and I. Tanaka, Bayesian optimization via structural kernel learning,’’ in Proc. Int. Conf.
‘‘Discovery of low thermal conductivity compounds with first- Mach. Learn., 2017.
principles anharmonic lattice dynamics calculations and Bayesian [78] J. Gardner, C. Guo, K. Weinberger, R. Garnett, and R. Grosse, ‘‘Discover-
optimization,’’ 2015, arXiv:1506.06439. [Online]. Available: ing and exploiting additive structure for Bayesian optimization,’’ in Proc.
https://arxiv.org/abs/1506.06439 Artif. Intell. Statist., 2017, pp. 1311–1319.
[55] A. Seko, A. Togo, H. Hayashi, K. Tsuda, L. Chaput, and I. Tanaka, [79] A. Nayebi, A. Munteanu, and M. Poloczek, ‘‘A framework for Bayesian
‘‘Prediction of low-thermal-conductivity compounds with first-principles optimization in embedded subspaces,’’ in Proc. Int. Conf. Mach. Learn.,
anharmonic lattice-dynamics calculations and Bayesian optimization,’’ 2019, pp. 4752–4761.
Phys. Rev. Lett., vol. 115, no. 20, 2015, Art. no. 205901. [80] M. Mutny and A. Krause, ‘‘Efficient high dimensional Bayesian opti-
[56] D. Packwood, Bayesian Optimization for Materials Science. Singapore: mization with additivity and quadrature Fourier features,’’ in Proc. Adv.
Springer, 2017. Neural Inf. Process. Syst., 2018, pp. 9005–9016.
[57] D. Yogatama and G. Mann, ‘‘Efficient transfer learning method for auto- [81] J. Kirschner, M. Mutny, N. Hiller, R. Ischebeck, and A. Krause, ‘‘Adaptive
matic hyperparameter tuning,’’ in Proc. 17th Int. Conf. Artif. Intell. Statist. and safe Bayesian optimization in high dimensions via one-dimensional
(AISTATS), Reykjavik, Iceland, Apr. 2014, pp. 1077–1085. subspaces,’’ in Proc. Int. Conf. Mach. Learn., 2019, pp. 3429–3438.
[58] T. T. Joy, S. Rana, S. K. Gupta, and S. Venkatesh, ‘‘Flexible transfer [82] J. Djolonga, A. Krause, and V. Cevher, ‘‘High-dimensional Gaussian
learning framework for Bayesian optimisation,’’ in Proc. Pacific–Asia process bandits,’’ in Proc. Adv. Neural Inf. Process. Syst. 27th Annu. Conf.
Conf. Knowl. Discovery Data Mining. Cham, Switzerland: Springer, Neural Inf. Process. Syst., Lake Tahoe, NV, USA, 2013, pp. 1025–1033.
2016, pp. 102–114. [83] M. A. Gelbart, J. Snoek, and R. P. Adams, ‘‘Bayesian optimization
[59] A. Shilton, S. Gupta, S. Rana, and S. Venkatesh, ‘‘Regret bounds for with unknown constraints,’’ in Proc. Uncertainty Artif. Intell., 2014,
transfer learning in Bayesian optimisation,’’ in Proc. Artif. Intell. Statist., pp. 250–259.
2017, pp. 307–315. [84] J. R. Gardner, M. J. Kusner, Z. E. Xu, K. Q. Weinberger, and
[60] R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, ‘‘Collaborative hyper- J. P. Cunningham, ‘‘Bayesian optimization with inequality constraints,’’
parameter tuning,’’ in Proc. 30th Int. Conf. Mach. Learn. (ICML), Atlanta, in Proc. Int. Conf. Mach. Learn., 2014, pp. 937–945.
GA, USA, Jun. 2013, pp. 199–207. [85] S. Ariafar, J. Coll-Font, D. Brooks, and J. Dy, ‘‘ADMMBO: Bayesian
[61] J. Riihimäki and A. Vehtari, ‘‘Gaussian processes with monotonic- optimization with unknown constraints using ADMM,’’ J. Mach. Learn.
ity information,’’ in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, Res., vol. 20, no. 123, pp. 1–26, 2019.
pp. 645–652. [86] J. M. Hernández-Lobato, M. Gelbart, M. Hoffman, R. Adams, and
[62] M. Jauch and V. Peña, ‘‘Bayesian optimization with shape con- Z. Ghahramani, ‘‘Predictive entropy search for Bayesian optimization
straints,’’ 2016, arXiv:1612.08915. [Online]. Available: https://arxiv.org with unknown constraints,’’ in Proc. Int. Conf. Mach. Learn., 2015,
/abs/1612.08915 pp. 1699–1707.
[63] C. Li, S. Rana, S. Gupta, V. Nguyen, and S. Venkatesh, ‘‘Bayesian [87] E. C. Garrido-Merchán and D. Hernández-Lobato, ‘‘Predictive entropy
optimization with monotonicity information,’’ in Proc. 31st Conf. Neural search for multi-objective Bayesian optimization with constraints,’’ Neu-
Inf. Process. Syst. (NIPS), 2017. rocomputing, vol. 361, pp. 50–68, Oct. 2019.
[64] P. J. Lenk and T. Choi, ‘‘Bayesian analysis of shape-restricted functions [88] R. Lam and K. Willcox, ‘‘Lookahead Bayesian optimization with
using Gaussian process priors,’’ Statistica Sinica, vol. 27, pp. 43–69, inequality constraints,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017,
Jan. 2017. pp. 1888–1898.
[65] M. R. Andersen, E. Siivola, and A. Vehtari, ‘‘Bayesian optimization of [89] V. Picheny, R. B. Gramacy, S. Wild, and S. Le Digabel, ‘‘Bayesian
unimodal functions,’’ in Proc. Adv. Neural Inf. Process. Syst. (NIPS), optimization under mixed constraints with a slack-variable aug-
2017. mented Lagrangian,’’ in Proc. Adv. Neural Inf. Process. Syst., 2016,
[66] A. Ramachandran, S. K. Gupta, R. Santu, and S. Venkatesh, pp. 1435–1443.
‘‘Information-theoretic transfer learning framework for Bayesian [90] M. Abdolshah, A. Shilton, S. Rana, S. Gupta, and S. Venkatesh, ‘‘Multi-
optimisation,’’ in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery objective Bayesian optimisation with preferences over objectives,’’ in
Databases. Cham, Switzerland: Springer, 2018. Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2019.
[67] R. Jenatton, C. Archambeau, J. González, and M. Seeger, ‘‘Bayesian opti- [91] D. Ginsbourger, R. Le Riche, and L. Carraro, ‘‘A multi-points criterion for
mization with tree-structured dependencies,’’ in Proc. Int. Conf. Mach. deterministic parallel global optimization based on Gaussian processes,’’
Learn., 2017, pp. 1655–1664. Département Méthodes et Modèles Mathématiques pour l’Industrie, 3MI-
[68] K. Swersky, D. Duvenaud, J. Snoek, F. Hutter, and M. A. Osborne, ENSMSE, Saint-Étienne, France, Tech. Rep. hal-00260579, 2008.
‘‘Raiders of the lost architecture: Kernels for Bayesian optimization in [92] J. Azimi, A. Fern, and X. Z. Fern, ‘‘Batch Bayesian optimization via
conditional parameter spaces,’’ 2014, arXiv:1409.4011. [Online]. Avail- simulation matching,’’ in Proc. Adv. Neural Inf. Process. Syst., 2010,
able: https://arxiv.org/abs/1409.4011 pp. 109–117.
[69] D. K. Duvenaud, H. Nickisch, and C. E. Rasmussen, ‘‘Additive [93] T. Desautels, A. Krause, and J. W. Burdick, ‘‘Parallelizing exploration-
Gaussian processes,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, exploitation tradeoffs in Gaussian process bandit optimization,’’ J. Mach.
pp. 226–234. Learn. Res., vol. 15, no. 1, pp. 3873–3923, 2014.
[70] K. Kandasamy, J. G. Schneider, and B. Póczos, ‘‘High dimensional [94] J. González, Z. Dai, P. Hennig, and N. D. Lawrence, ‘‘Batch Bayesian
Bayesian optimisation and bandits via additive models,’’ in Proc. 32nd optimization via local penalization,’’ in Proc. Artif. Intell. Statist., 2015,
Int. Conf. Mach. Learn. (ICML), Lille, France, Jul. 2015, pp. 295–304. pp. 648–657.
[71] F. Hutter and M. A. Osborne, ‘‘A kernel for hierarchical [95] V. Nguyen, S. Rana, S. K. Gupta, C. Li, and S. Venkatesh, ‘‘Budgeted
parameter spaces,’’ 2013, arXiv:1310.5738. [Online]. Available: batch Bayesian optimization,’’ in Proc. IEEE 16th Int. Conf. Data Mining
https://arxiv.org/abs/1310.5738 (ICDM), Dec. 2016, pp. 1107–1112.
[96] C. Gong, J. Peng, and Q. Liu, ‘‘Quantile stein variational gradient descent [119] A. Sabharwal, H. Samulowitz, and G. Tesauro, ‘‘Selecting near-
for batch Bayesian optimization,’’ in Proc. Int. Conf. Mach. Learn., 2019, optimal learners via incremental data allocation,’’ in Proc. AAAI, 2016,
pp. 2347–2356. pp. 2007–2015.
[97] E. Contal, D. Buffoni, A. Robicquet, and N. Vayatis, ‘‘Parallel Gaussian [120] C. Zhang and K. Chaudhuri, ‘‘Active learning from weak and strong
process optimization with upper confidence bound and pure exploration,’’ labelers,’’ in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 703–711.
in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases. [121] E. C. Garrido-Merchán and D. Hernández-Lobato, ‘‘Dealing with Cat-
Berlin, Germany: Springer, 2013, pp. 225–240. egorical and Integer-valued Variables in Bayesian optimization with
[98] S. Gupta, A. Shilton, S. Rana, and S. Venkatesh, ‘‘Exploiting strategy- Gaussian processes,’’ 2017, arXiv:1706.03673. [Online]. Available:
space diversity for batch Bayesian optimization,’’ in Proc. Int. Conf. Artif. https://arxiv.org/abs/1805.03463
Intell. Statist., 2018, pp. 538–547. [122] M. A. Villegas García, ‘‘An investigation into new kernels for categorical
[99] T. T. Joy, S. Rana, S. Gupta, and S. Venkatesh, ‘‘Batch Bayesian optimiza- variables,’’ M.S. thesis, Departament de Llenguatges i Sistemes Infor-
tion using multi-scale search,’’ Knowl.-Based Syst., vol. 187, Jan. 2020, màtics, Universitat Politècnica de Catalunya, Barcelona, Spain, 2013.
Art. no. 104818. [123] R. Martinez-Cantin, ‘‘BayesOpt: A Bayesian optimization library for
[100] A. Shah and Z. Ghahramani, ‘‘Parallel predictive entropy search for nonlinear optimization, experimental design and bandits,’’ J. Mach.
batch global optimization of expensive objective functions,’’ in Proc. Adv. Learn. Res., vol. 15, no. 1, pp. 3735–3739, 2014.
Neural Inf. Process. Syst., 2015, pp. 3330–3338. [124] J. Wang, S. C. Clark, E. Liu, and P. I. Frazier, ‘‘Parallel Bayesian global
[101] J. Wu and P. Frazier, ‘‘The parallel knowledge gradient method for batch optimization of expensive functions,’’ 2016, arXiv:1602.05149. [Online].
Bayesian optimization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2016, Available: https://arxiv.org/abs/1602.05149
pp. 3126–3134. [125] N. Knudde, J. van der Herten, T. Dhaene, and I. Couckuyt, ‘‘GPflow:
[102] P. Vellanki, S. Rana, S. Gupta, D. Rubin, A. Sutti, T. Dorin, M. Height, A Gaussian process library using TensorFlow,’’ 2017, arXiv:1711.03845.
P. Sanders, and S. Venkatesh, ‘‘Process-constrained batch Bayesian opti- [Online]. Available: https://arxiv.org/abs/1610.08733
misation,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 3417–3426.
[103] S. Shan and G. G. Wang, ‘‘Survey of modeling and optimization strate- STEWART GREENHILL received the B.Sc. degree
gies to solve high-dimensional design problems with computationally-
in computer science from the University of West-
expensive black-box functions,’’ Struct. Multidisciplinary Optim., vol. 41,
ern Australia, in 1987, and the Ph.D. degree
no. 2, pp. 219–241, Mar. 2010.
[104] A. M. Gopakumar, P. V. Balachandran, D. Xue, J. E. Gubernatis, and in environmental science from Murdoch Univer-
T. Lookman, ‘‘Multi-objective optimization for materials discovery via sity, in 1992. He is currently a Research Fellow
adaptive design,’’ Sci. Rep., vol. 8, no. 1, p. 3738, 2018. with the Applied Artificial Intelligence Institute,
[105] Y. Collette and P. Siarry, Multiobjective Optimization: Principles and Deakin University, Australia. His research inter-
Case Studies. Berlin, Germany: Springer-Verlag, 2013. ests include machine learning, signal processing,
[106] J. Knowles, ‘‘ParEGO: A hybrid algorithm with on-line landscape embedded systems, software engineering, visual-
approximation for expensive multiobjective optimization problems,’’ ization, and interaction design.
IEEE Trans. Evol. Comput., vol. 10, no. 1, pp. 50–66, Feb. 2006.
[107] W. Ponweiser, T. Wagner, D. Biermann, and M. Vincze, ‘‘Multiobjective SANTU RANA is currently a Researcher in the
optimization on a limited budget of evaluations using model-assisted field of machine learning and computer vision
S -metric selection,’’ in Proc. Int. Conf. Parallel Problem Solving Nature. with the Applied Artificial Intelligence Insti-
Berlin, Germany: Springer, 2008, pp. 784–794.
tute, Deakin University, Australia. His research in
[108] M. Emmerich and J.-W. Klinkenberg, ‘‘The computation of the expected
improvement in dominated hypervolume of Pareto front approxima-
high-dimensional Bayesian optimization has been
tions,’’ Rapport Technique, Leiden Univ., Leiden, The Netherlands, applied to efficiently design alloys with large num-
Tech. Rep. LIACS-TR 9-2008, 2008. ber of elements. He has been actively conduct-
[109] V. Picheny, ‘‘Multiobjective optimization using Gaussian process emula- ing research in Bayesian experimental design with
tors via stepwise uncertainty reduction,’’ Statist. Comput., vol. 25, no. 6, applications in advanced manufacturing. In the last
pp. 1265–1280, Nov. 2015. four years, he has published more than 40 research
[110] D. Hernández-Lobato, J. Hernandez-Lobato, A. Shah, and R. Adams, articles improving various aspects of Bayesian optimization algorithm. Alto-
‘‘Predictive entropy search for multi-objective Bayesian optimization,’’ gether, he has published over 79 research articles, including 14 refereed
in Proc. Int. Conf. Mach. Learn., 2016, pp. 1492–1501. journal articles, 58 fully refereed conference proceedings, and seven work-
[111] P. Feliot, J. Bect, and E. Vazquez, ‘‘A Bayesian approach to constrained shop articles, with over 515 citations and an H-index of 12. He is also a
single-and multi-objective optimization,’’ J. Global Optim., vol. 67, Co-Inventor of two patents. His broad research interests lie in devising practi-
nos. 1–2, pp. 97–133, 2017.
cal machine learning algorithms for various tasks, such as object recognition,
[112] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, ‘‘Distributed
mathematical optimization, and healthcare data modeling.
optimization and statistical learning via the alternating direction method
of multipliers,’’ Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122,
2011. SUNIL GUPTA is currently a Researcher in
[113] P. Perdikaris and G. E. Karniadakis, ‘‘Model inversion via multi-fidelity the field of machine learning and data mining
Bayesian optimization: A new paradigm for parameter estimation in with the Applied Artificial Intelligence Institute,
haemodynamics, and beyond,’’ J. Roy. Soc. Interface, vol. 13, no. 118, Deakin University, Australia. He has published
p. 20151107, 2016. over 100 research articles, including two book
[114] A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause, chapters, 25 refereed journal articles, 70 fully ref-
S. Schaal, and S. Trimpe, ‘‘Virtual vs. real: Trading off simulations ereed conference proceedings, and nine workshop
and physical experiments in reinforcement learning with Bayesian opti- articles with over 1000 citations and an H-index
mization,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2017, of 17. His research interest lies in developing
pp. 1557–1563. data-driven models for real world processes and
[115] K. Kandasamy, G. Dasarathy, J. Schneider, and B. Poczos, ‘‘Multi-fidelity
phenomena covering both big-data and small-data problems. His recent
Bayesian optimisation with continuous approximations,’’ in Proc. Int.
research in optimization using small data (Bayesian optimization) has found
Conf. Mach. Learn., 2017.
[116] A. Klein, S. Bartels, S. Falkner, P. Hennig, and F. Hutter, ‘‘Towards applications in efficient experimental design of products and processes in
efficient Bayesian optimization for big data,’’ in Proc. NIPS Workshop advanced manufacturing, such as alloy design with certain target proper-
Bayesian Optim. (BayesOpt), vol. 134, 2015, p. 98. ties, design of short nanofibers with appropriate length and thickness, and
[117] M. Poloczek, J. Wang, and P. Frazier, ‘‘Multi-information source opti- optimal setting of parameters in 3d-printers. His research has won several
mization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4291–4301. best paper awards in the field of data mining and machine learning. He is
[118] M. Cutler, T. J. Walsh, and J. P. How, ‘‘Reinforcement learning with multi- also a Co-Inventor of a patent related to experimental design. He regularly
fidelity simulators,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), serves at technical program committees of the prestigious machine learning
May 2014, pp. 3888–3895. conferences.