Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics
© 2023 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License https://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
1. Introduction
Bayesian data analysis is now an established part of the lexicon in contemporary applied statistics.
The first direction focuses on intelligent data collection: instead of collecting and analysing all
possible data, or alternatively relying on traditional static experimental or survey designs, can we
devise efficient, cost-effective approaches to collecting those data that will be most informative for
the inferential purpose? In §2, authors Buchhorn and McGree focus on the opportunity to address
this issue through Bayesian optimal experimental design. While there is an emerging literature
on this approach in the context of clinical trials, they extend this attention to sampling designs
for complex ecosystems. Furthermore, they address the challenge of exact implementation of the
derived design in practice by introducing sampling windows in the optimal design. The new
methodology and computational solution are illustrated in a case study of monitoring coral reefs.
Following from consideration of data collection, the second direction considered in this paper
focuses on opportunities and challenges afforded through the emergence of new data sources.
In §3, authors Price, Santos-Fernández and Vercelloni focus on two such sources: quantitative
information elicited from subjects in virtual reality (VR) settings, and data provided by citizen
scientists. Bayesian approaches to modelling and analysing these data can help to increase trust
in these data and facilitate their inclusion in mainstream analyses. Some methods for achieving
this are set in the context of two case studies based in the Antarctic and the Australian Great
Barrier Reef.
The challenges of data collection are considered from a different direction in §4. Here, authors
Hassan and Salomone reflect on the exponential rise in interest in federated analysis and learning.
A canonical application of these approaches is the analysis of sensitive data from multiple data
sources held by different data custodians, while leaving the data in situ and maintaining data
privacy. The case study in this section focuses on federated learning with spatially dependent
latent variables.
In §§5 and 6, we turn our attention away from the data to the models themselves. First, authors
Drovandi, Jenner, Salomone and Wang consider the challenge of modelling increasingly complex
systems via implicit models, i.e. models with intractable likelihoods that can nevertheless be
simulated, and the opportunity afforded by likelihood-free algorithms such as sequential Monte
Carlo-based approximate Bayesian computation (SMC-ABC). These approaches are applied to a
substantive case study of calibrating a complex agent-based model (ABM) of tumour growth. In
§6, another direction for modelling is discussed by authors Bon, Bretherton and Drovandi. This
focuses on the challenge of transferring models developed in one context (dataset, location etc.) to
another context. Fully Bayesian approaches to this challenge are still emerging and promise great
opportunities in both research and practice.
The final direction we explore is in the translation of Bayesian practice to software products.
We acknowledge the plethora of Bayesian packages embedded in software such as R, Matlab
and Python, as well as stand-alone Bayesian products such as BUGS, INLA and Stan. These
have revolutionized the practice of Bayesian data analysis and have placed this capability in the
hands of applied researchers and practitioners. In §7, we focus on substantive software products
created to support purposeful decision-making that are underpinned by Bayesian models. Author
Mayfield describes a COVID-19 vaccine risk-benefit calculator (CoRiCAL) driven by a Bayesian
network model; Vercelloni describes a platform for global monitoring of coral reefs (ReefCloud)
based on a Bayesian hierarchical model; and Cramb describes an interactive visualization of small
area cancer incidence and survival across Australia (the Australian Cancer Atlas) based on a
Bayesian spatial model.
information from complex systems. However, the sheer size of these complex systems (e.g. natural
ecosystems like the Great Barrier Reef and river networks) and the expense of data collection
means that data cannot be collected throughout the whole system. Further, practical constraints
like connectivity, accessibility and data storage issues reduce our ability to sample frequently
through time. This has led to innovation in statistical methods for data collection, promoting
an emerging era of ‘intelligent data collection’ where data are collected for a particular purpose
such as understanding mechanisms for change, monitoring biodiversity and identifying threats
or vulnerabilities to long-term sustainability. Bayesian optimal experimental design is one such
area of recent innovation.
Bayesian design offers a framework for optimizing the collection of data specified by a design
d for a particular experimental goal, which may be to increase precision of parameter estimates,
maximize prediction accuracy and/or distinguish between competing models. More specifically,
Bayesian design is concerned with maximizing an expected utility, U(d) = E[u(d, θ, y)], through the
choice of design d within a design space D, while accounting for uncertainty about, for example,
the parameter θ ∈ Θ and all conceivable datasets we might observe y ∈ Y. A Bayesian optimal
design d∗ can therefore be expressed as

d∗ ≈ arg max_{d∈D} (1/M) Σ_{m=1}^{M} u(d, θ(m), y(m)),   (2.1)
where θ (m) ∼ p(θ; d) and y(m) ∼ p(y|θ (m) ; d), for some large value of M. Thus, computations
involving M different individual posterior distributions are required just to approximate the
expected utility of a design.
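As a concrete illustration of (2.1), the following minimal sketch approximates the expected utility of a single design; the prior sampler, model simulator and utility function are hypothetical placeholders rather than components of a specific design problem.

```python
import numpy as np

def expected_utility(d, utility, sample_prior, simulate, M=1000, rng=None):
    """Monte Carlo approximation of U(d) = E[u(d, theta, y)], as in (2.1).

    `sample_prior(d, rng)` draws theta ~ p(theta; d), `simulate(theta, d, rng)`
    draws y ~ p(y | theta; d), and `utility(d, theta, y)` evaluates u(d, theta, y).
    """
    rng = rng if rng is not None else np.random.default_rng()
    total = 0.0
    for _ in range(M):
        theta = sample_prior(d, rng)
        y = simulate(theta, d, rng)
        total += utility(d, theta, y)
    return total / M

# The optimal design maximizes this (noisy) estimate over the design space D,
# e.g. by exhaustive search over a discretized candidate set:
# d_star = max(candidates, key=lambda d: expected_utility(d, u, prior, sim))
```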
Secondly, d may be high-dimensional, meaning that a potentially large optimization problem
needs to be solved for a computationally expensive and noisy objective function. Accordingly,
the majority of research in Bayesian design has focused on developing new methods to address
one of these two challenges. Below we provide a brief summary of some of the relevant literature
from the last 25 years.
Since the conception of statistical decision theory [1,2] upon which the decision-theoretic
framework of Bayesian design is based [3], there have been numerous strategies presented
in the literature to address the above challenges. Curve-fitting methods were proposed by [4,5], in which samples are drawn from an augmented distribution

hJ(d, θ1:J, y1:J) ∝ ∏_{j=1}^{J} u(d, θj, yj) p(yj, θj; d),

where it can be shown that the marginal distribution of d is proportional to U(d)^J. Markov
chain Monte Carlo (MCMC) methods were then used to sample from this distribution, and
subsequently to approximate the marginal mode of d. Extensions of this approach were given
in [6,7] which include adopting a sequential Monte Carlo (SMC) algorithm to more efficiently
sample from the augmented distribution as J increases. However, such approaches are limited to
low-dimensional design problems (i.e. 3–4 design points) and simple models due to difficulties in
sampling efficiently in high dimensions.
Recently, there has been a shift from sampling-based methods to rapid, approximate posterior
inference methods. Combined with a Monte Carlo approximation as given in equation (2.1),
this has enabled expected utility functions to be efficiently approximated for realistic design
problems. This includes those based on complex models (such as nonlinear models) and models
for data that exhibit complex dependence structures (such as those with different sources of
variability including spatially and between groups). Such approximate inference methods include
the Laplace approximation [8] and variational Bayes [9], which have been combined with new
optimization algorithms (e.g. the approximate coordinate exchange algorithm; ACE [10]) to solve
the most complex and high-dimensional design problems to date.
The most prominent application of Bayesian design methods appears in the clinical trial
literature [11]. Recently, this interest has intensified with the outbreak of COVID-19, where it
has been desirable to conduct clinical trial assessments as quickly as possible, with Bayesian
(adaptive) designs shown to yield more resource efficient and ethical clinical trials [12,13]. More
recently, Bayesian design methods have been proposed as a basis to efficiently monitor large
environmental systems like the Great Barrier Reef [14,15]. In the following case study, we show
how such methods can be used to form sampling designs to monitor a coral reef system, and
extend these methods to provide flexible designs that address major practical constraints when
sampling real-world ecosystems.
where θ ∗ = arg maxθ∈Θ log p(y, θ ; d) and H(θ ∗ ) is the Hessian matrix evaluated at θ ∗ . Here θ =
(β, γ ), and marginalization of Z is performed approximately using Monte Carlo integration. To
obtain an optimal design for monitoring the shoal, we propose a two-step approach:
(i) Firstly, a global search for the Bayesian optimal design d∗ = (d∗1 , . . . , d∗q ), where q = 3 (the
total number of transects) is conducted. We consider a discretized design space, and find
designs via a discrete version of ACE; and
(ii) Secondly, we form design efficiency windows (illustrating robustness to imprecise sampling) across rk for each transect k = 1, . . . , q. To do so, we specify a zero-mean Gaussian process (GP) prior for the approximate expected utility across r ∈ Rq, denoted U(r; d∗), i.e. U(r; d∗) ∼ GP(0, K(·) + ζ0 I), for some kernel matrix K(·) and ζ0 > 0. The windows are then obtained as follows:
(a) Centre the radius on d∗ , i.e. the Bayesian design from (i), and specify a maximum
value for rk for k = 1, . . . , q;
(b) Randomly sample δi,1, δi,2 ∼ iid Unif(−rk, rk), where k is the transect from which image i is obtained, and evaluate the approximate expected utility of the design at locations si + δi;
(c) Fit a GP defined on r ∈ Rq to the approximate expected utilities;
(d) Emulate the expected utility surface across values of r using the posterior predictive
mean of the GP, denoted Ū(r);
(e) Normalize the predicted expected utility values by that of the original Bayesian
design as follows:
Ū(r; d∗ )
eff(r) = , (2.2)
Ū(0; d∗ )
(a) design efficiency (b) optimal transect region design
8 614 000 6
...............................................................
depth
1.0 d *3 d *2 50
northing
0.8
0.6 8 613 000 40
100 30
r3 0.4
d *1
0.2 20
0
8 612 500
0 200 transect
100 100 8 612 000
r3 200 0 r2
611 000 612 000 613 000 614 000
easting
Downloaded from https://royalsocietypublishing.org/ on 12 June 2024
Figure 1. The Bayesian design across the Barracouta East coral shoal, d∗ = (d∗1 , d∗2 , d∗3 ), are illustrated as black transect lines
(b). Sampling windows are formed around these transects allowing for flexibility in sampling locations while retaining 0.99 of
the optimal utility. Design efficiency contours across r ∈ Rq are shown (a). (Online version in colour.)
and use the above to obtain design efficiency contours (plotted in figure 1a). For
some design efficiency contour value c > 0, the corresponding sampling window is
the region in space defined by radii r(c) that satisfy eff(r(c)) = c.
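To make the emulation steps (c)–(e) concrete, the following is a minimal sketch using scikit-learn's Gaussian process regressor; the training utilities are synthetic placeholders, and the kernel and its length-scale are assumptions rather than the settings used in the reef case study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
q = 3                                        # number of transects

# Placeholder training data: sampled radii and noisy approximate expected
# utilities evaluated at designs perturbed within those radii.
r_train = rng.uniform(0, 200, size=(60, q))
u_train = 1.0 - 0.002 * r_train.sum(axis=1) + rng.normal(0, 0.02, size=60)

# GP with kernel K(.) plus a nugget term (zeta_0 * I).
gp = GaussianProcessRegressor(RBF(length_scale=50.0) + WhiteKernel(1e-3))
gp.fit(r_train, u_train)

def efficiency(r):
    """Design efficiency (2.2): emulated utility at r over that at r = 0."""
    u_bar = gp.predict(np.atleast_2d(r))[0]
    u_zero = gp.predict(np.zeros((1, q)))[0]
    return u_bar / u_zero

print(efficiency([50.0, 0.0, 44.0]))         # a candidate sampling window
```

With a fitted emulator, the sampling windows at a given efficiency level (e.g. 0.99 in the case study) are found by searching for radii whose efficiency sits on that contour.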
Based on this approach, the Bayesian design, d∗, shown in figure 1, situates the transects in
shallower areas of the reef, but at different depths in the shallow areas, presumably to provide
information about the depth effects, β. Avoiding the deeper regions of the shoal makes sense
physiologically, as the corals monitored here are photosynthetic organisms, and therefore rely
on light to survive. This design thus avoids the collection of data in areas where there is little
chance of observing coral. The design efficiency contours are also shown in figure 1. If we
consider a design efficiency of 0.99 in equation (2.2), then possible radius values are (50, 0, 44)
for the three transects, with the flexibility (sampling windows) this provides shown around each
transect. As can be seen, transect d∗2 is more sensitive than the other two, suggesting more effort
should be placed in sampling this transect precisely. In practical terms, sampling from shallow
areas of the reef, d∗1 and d∗3, can be undertaken when conditions are more unpredictable (e.g. strong currents), and samples from d∗2 can be obtained when field conditions are more favourable.
In conclusion, Bayesian optimal design addresses a fundamental problem in science: the
intelligent collection of data resulting in greater information efficiency, reduced sampling cost
and improved estimation. Such benefits have been observed in clinical trials [19–22] and
environmental monitoring [15,23], and we have shown how they can be used to offer flexible
yet efficient sampling in a real-world context. One limitation of the approach is the potential
reliance of designs on a number of assumptions, e.g. an assumed model for the data, so we would
encourage future research in areas that reduce this reliance and thus provide more robust designs
for data collection.
effective and efficient analysis methodology. Recently, Bayesian models have been used
to evaluate subject elicitation in the areas of coral reef conservation [27], jaguar and koala habitat
suitability assessments [28,29], and the aesthetic value of sites in the Antarctic Peninsula.
(i ) Case study: quantifying aesthetics of tourist landing sites in the Antarctic Peninsula
In the Antarctic Peninsula, the effects of climate change and the associated increase of ice-free
areas are threatening the fragile terrestrial biodiversity [30]. As well as having high ecological importance, these ecosystems also have a unique aesthetic value, which has been formally recognized in
Article 3 of the Protocol on Environmental Protection to the Antarctic Treaty [31]. There is
value in protecting beautiful landscapes, as tourism in Antarctica is based largely on the natural
beauty of the environment. This case study quantifies aesthetic values in the Antarctic Peninsula
by recording elicitation from subjects immersed in a VR environment using R2VR, a state-of-the-art web-based framework [32].
Subject elicitation in this case study is drawn from 16 photos, obtained via 360◦ photography at
tourist landing sites in the Antarctic Peninsula. Consultation produced landscape characteristics
of interest, e.g. the presence of certain animals and the weather. These characteristics and images
were then used to construct an interview, to be held while the subject was immersed in the VR
environment, with responses recorded on the Likert scale, from strongly disagree to strongly
agree. From this elicitation process, responses to each question are recorded for each scene
presented to the participant, as well as their opinion of the aesthetic value of the scene itself.
Additionally, general participant characteristics such as gender identity and age are also recorded.
A Bayesian hierarchical model is used to model whether or not a subject i rates scene j (j = 1, . . . , o) as aesthetically pleasing (yij) as a function of responses to
statements such as ‘there are animals in this image’ and ‘this image is monotonous’ (xik , k =
1, . . . , m), subject characteristics such as age and gender (xih , h = m, . . . , m + n), and subject-
reported confidence in their response to each interview statement (sij , j = 1, . . . , m), where zero
represents low confidence and one represents high confidence. The model is

yij | αj, β0sij, β1 ∼ ind Bernoulli( logit⁻¹( αj + [β0sij, β1]⊤ xi ) ),
αj ∼ N(0, 1/τα),
β0lk ∼ N(0, 1/τlk),
β1 ∼ N(0, 10² In),
τlk, τα ∼ iid Gamma(10⁻², 10⁻²),   k = 1, . . . , m,   l = 0, 1.
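A sketch of the corresponding Bernoulli log-likelihood, with statement coefficients that switch on the reported confidence, is given below; the array names and shapes are illustrative assumptions rather than the exact parametrization used in the case study.

```python
import numpy as np

def log_likelihood(alpha, beta0, beta1, x_stmt, x_char, s, y):
    """Bernoulli log-likelihood for the aesthetics model (sketch).

    Assumed shapes: alpha (o,) scene intercepts; beta0 (2, m) statement
    coefficients, one row per confidence level; beta1 (n,) coefficients for
    subject characteristics; x_stmt (I, o, m) statement responses; x_char
    (I, n) characteristics; s (I, o, m) confidences in {0, 1}; y (I, o).
    """
    I, o, m = x_stmt.shape
    # The coefficient of statement k switches with the reported confidence.
    b0 = beta0[s, np.arange(m)]                            # (I, o, m)
    eta = alpha[None, :] + (b0 * x_stmt).sum(-1) + (x_char @ beta1)[:, None]
    p = 1.0 / (1.0 + np.exp(-eta))                         # logit^{-1}
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
```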
Citizen science (CS) applications can be found across almost all disciplines of science, especially in ecology
and conservation where scientists are harnessing its power to help solve critical challenges
such as climate change and the decline in species abundance. Examples of citizen scientists’
contributions include reporting sightings of species, measuring environmental variables and
identifying species on images. Hundreds of CS projects can be found in popular online platforms
including Zooniverse [33], eButterfly [34], eBird [35] and iNaturalist [36]. A fundamental issue
often discussed surrounding CS is the quality of the data produced, which is generally error-
prone and biased. For example, bias can arise in CS datasets due to (i) the unstructured nature
of the data, (ii) collecting data opportunistically, with more observations from frequently visited
locations [37] or at irregular frequencies across time [38], and (iii) as a result of differing abilities of
the participants to perform tasks such as detecting or identifying species [39,40]. However, recent
advances in statistics, machine learning and data science are helping to realize the full potential of CS data and increase their trustworthiness [40–42].
Frequently, CS data are elicited via image classification; for example, participants may be asked whether images contain a target class or species. In this section, we illustrate two modelling approaches for these types of data.
In the first approach, we consider a binary response variable yij representing whether the
category has been correctly identified by the participant (i = 1, . . . , m) in the image (j = 1, . . . , n).
The probability of obtaining a correct answer can be modelled using an item response model such
as the three-parameter logistic model (3PL) [40,43],

yij | Zi, Bj, ηj, αj ∼ Bernoulli( ηj + (1 − ηj) logit⁻¹( αj(Zi − Bj) ) ),   (3.1)

where Zi is the latent ability of participant i, Bj and αj are the difficulty and discrimination parameters of image j, and ηj is a guessing parameter.
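For concreteness, the 3PL success probability in (3.1) can be written as a one-line function (a sketch; argument names follow the equation above):

```python
import numpy as np

def p_correct(Z_i, B_j, eta_j, alpha_j):
    """3PL success probability, equation (3.1): a guessing floor eta_j plus a
    logistic term in ability Z_i, difficulty B_j and discrimination alpha_j."""
    return eta_j + (1.0 - eta_j) / (1.0 + np.exp(-alpha_j * (Z_i - B_j)))
```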
In the second approach, the apparent proportion Yj reported for image j is modelled with a beta distribution with shape parameters αj and βj. This model can be parametrized via a specified prior mean μj for each Yj and a common precision parameter φ, via αj = μjφ and βj = −μjφ + φ, which in turn implies that Var[Yj] = μj(1 − μj)/(1 + φ).

Figure 2. (a) Elicited points with benthic categories in an underwater image from the Great Barrier Reef, Australia. (b) True latent proportion (red) and apparent proportion of hard corals (green); the predicted proportion is shown in blue. (Online version in colour.)

Covariates can also be incorporated by defining a beta regression
with logit(μj) = ξ⊤xj + Uj + εj, where εj are error terms and Uj are spatially dependent random
effects. Both approaches account for spatial variation (captured in Bj or Uj for the first and second
approach, respectively) using different spatial structures (e.g. conditional autoregressive (CAR)
priors, covariance matrices, or Gaussian random fields). See more details in [40,42,45].
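The mean–precision parametrization is easy to verify numerically; a minimal sketch (the values of μj and φ are arbitrary):

```python
def beta_shapes(mu_j, phi):
    """Map prior mean mu_j and precision phi to the beta shape parameters:
    alpha_j = mu_j * phi and beta_j = -mu_j * phi + phi = (1 - mu_j) * phi."""
    return mu_j * phi, (1.0 - mu_j) * phi

a, b = beta_shapes(0.3, 20.0)
assert abs(a / (a + b) - 0.3) < 1e-12        # mean recovers mu_j
print(a * b / ((a + b) ** 2 * (a + b + 1)))  # equals mu_j(1 - mu_j)/(1 + phi)
```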
The following case study illustrates the estimation of the latent proportion of hard corals
across the Great Barrier Reef in Australia, obtained from underwater images classified by citizen
scientists. Figure 2 shows 15 spatially balanced random points in one of the images used in the
study. The apparent proportion of hard coral in the image was obtained using the number of
points containing this category selected by participants, out of 15. Using equation (3.2), the (biased) estimates obtained from the citizen scientists can be corrected, producing a density similar to that of the latent unobserved proportions.
The integration of CS data with current monitoring efforts from Australian federal agencies and non-governmental organizations is a breakthrough for increasing the amount of information about changes along the Great Barrier Reef, learning about climate change impacts and adapting
management actions accordingly. The model introduced here is the foundation of a digital platform that estimates the health of the Great Barrier Reef using all available information. This study contributes to increasing trust in CS and producing reliable data for environmental conservation, while engaging participants and raising awareness about coral reefs.
Figure 3. Federated approaches lie on a continuum between post hoc posterior amalgamation approaches as used in certain
distributed MCMC approaches (a) and collaborative multi-round approaches (b). (Online version in colour.)
same entities. An example of a horizontal setting is where different countries possess the data for the individuals primarily residing within them. By contrast, an example of vertical federated learning is where two companies possess their respective sales data for the same collection of customers.
The term ‘federated learning’ originated in the deep learning literature with the introduction of
the FedAvg algorithm [46]. FedAvg involves updating parameter values of a global model to be
the weighted average of parameter values obtained by updating the same model locally (possibly
many times) at each iteration. This work led to many related optimization algorithms, e.g. FedProx
[47], and FedNova [48] which account for heterogeneous (non i.i.d.) data sources, and the Bayesian
nonparametric approach for learning neural networks of [49], where local model parameters are
matched to a global model via the posterior distribution of a Beta-Bernoulli process [50].
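A minimal sketch of one FedAvg round, assuming model parameters are stored as a NumPy array and with local_update a placeholder for one or more local training passes:

```python
import numpy as np

def fedavg_round(global_params, local_update, datasets):
    """One FedAvg round (sketch): each data source updates the global model
    locally, and the server averages the updated parameter vectors weighted
    by local sample size."""
    sizes = np.array([len(data) for data in datasets], dtype=float)
    weights = sizes / sizes.sum()
    local_params = [local_update(global_params.copy(), data) for data in datasets]
    return sum(w * p for w, p in zip(weights, local_params))
```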
To date, practical federated analyses appear restricted to the frequentist setting. Examples include the
prediction of breast cancer using distributed logistic regression [51] and modelling of the survival
of oral cavity cancer through a distributed proportional Cox hazards model [52]. Both these
approaches conduct parameter estimation via a Newton–Raphson algorithm [53] and result in
equivalent maximum-likelihood estimates to those obtained in a standard, non-federated setting.
Algorithms for the maximum-likelihood estimation of log-linear and logistic regression models
in vertical federated learning settings [54–58] use ideas such as secure multiparty computation
[59], and formulating the parameter estimation task as a dual problem [60]. Several overarching
software infrastructures such as VANTAGE6 [61] ensure the correct and secure use of the data of
each custodian within the specified algorithm, given acceptable (model- and application-specific)
rules for information exchange.
Despite the potentially enabling capabilities of federated methods, to our knowledge, Bayesian
federated learning methods have yet to impact real-world applications. In the Bayesian inference
setting, the ‘learning’ task becomes one of performing posterior inference, e.g. via MCMC
or variational inference techniques. Note that Bayesian federated learning approaches may
involve multiple communication rounds, though this is only sometimes the case. For example,
many distributed MCMC approaches (e.g. [62–64]) combine individually fitted model posteriors,
requiring only a single communication step from each local node. A recent intermediate
approach [65] is to construct a surrogate likelihood of the complete dataset via an arbitrarily
specified number of communication steps. After constructing the surrogate likelihood, an MCMC
algorithm is run on a single device. As the number of communication steps increases, the
approximation error introduced by the surrogate likelihood decreases. Figure 3 illustrates the
difference between post hoc posterior amalgamation strategies and collaborative multi-round
approaches.
In certain cases, carrying out federated Bayesian inference is (at least in principle) relatively
straightforward. For example, a naive MCMC algorithm would be trivial to construct for a simple
model class, such as any generalized linear model (which assumes the data are independent),
provided that one is not concerned with the number of communication steps. To see this, note that the (log-)posterior density function decomposes as

log p(θ | y(1), . . . , y(C)) = log π(θ) + Σ_{c=1}^{C} log p(y(c) | θ) + const,   (4.1)

where y(c) denotes the data held by custodian c = 1, . . . , C.
Hence, for the horizontal setting, all that is required is the nodes sharing the sum of
their respective log-likelihood terms with the server. However, this approach would require
a minimum of two communication steps per iteration of the Markov chain. Recent MCMC methods similar in style to the FedAvg algorithm, which use Langevin dynamics to update the Markov chain, require only a single communication step per iteration [66,67]. Such approaches
exploit gradient information which decomposes as a sum similarly to (4.1), though eschew the
usual Metropolis–Hastings correction and are hence asymptotically inexact. In some instances,
a formally justified notion of privacy may be required, as opposed to simply an intuitive one
given by aggregation of terms. Differential privacy (DP) (e.g. [68]) provides such guarantees, and
there are variants of MCMC that ensure this, such as DP-MCMC [69], which accomplishes privacy
guarantees at the cost of a slight perturbation of the stationary distribution of the chain. It is worth
noting that all of the examples mentioned above are specific to the horizontal setting, with the vertical setting proving especially challenging as one does not have a beneficial decomposition like that of (4.1).
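To illustrate the naive construction for the horizontal setting, the sketch below performs one random-walk Metropolis–Hastings step in which each node returns only the sum of its log-likelihood terms; the function names are illustrative, and in practice the current-state evaluation would be cached so that each iteration communicates only the proposal.

```python
import numpy as np

def federated_mh_step(theta, log_prior, node_logliks, step_size, rng):
    """One Metropolis-Hastings step; `node_logliks` holds one callable per
    custodian, each returning the aggregated log-likelihood of its local
    data, so individual records never leave the node."""
    theta_prop = theta + step_size * rng.standard_normal(theta.shape)
    lp_curr = log_prior(theta) + sum(f(theta) for f in node_logliks)
    lp_prop = log_prior(theta_prop) + sum(f(theta_prop) for f in node_logliks)
    if np.log(rng.uniform()) < lp_prop - lp_curr:
        return theta_prop
    return theta
```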
As the above alludes to, the development and use of Bayesian federated learning algorithms
are complex for several reasons. A method is only suitable for a prescribed application if it satisfies
a combination of requirements, such as being able to work with the desired model, computational
and communication costs, privacy and accuracy. For each application, the choice of model and
federated method will depend on where the priorities lie, e.g. accuracy, efficiency or privacy. In
some cases, there may be no feasible algorithm (an example is given in the upcoming case study).
Thus, inference approaches that improve upon some (or even all) of these aspects are important
and warrant future research.
The ultimate goal of federated Bayesian analysis is to circumvent the need for data merging
[70,71] in scenarios where merging is considered infeasible. However, for Bayesian federated
learning to reach this point, these approaches must offer custodians and interested parties an
accurate inference for complex models while maintaining a level of privacy acceptable to those
data custodians. Thus, the methodological development that enables federated inference for more
advanced Bayesian models efficiently and/or with additional privacy guarantees is likely to
emerge as a critical area of interest in the coming years.
(b) Case study: federated learning with spatially dependent latent variables
The greatest hindrance to employing federated learning in real-world applications is the limited range of model types that current algorithms address. Commonly, applied statistical modelling involves incorporating hierarchical structures and latent variables [72]. To our knowledge, there
are no federated Bayesian analysis algorithms at all for such models. To briefly illustrate the
unique challenges and the need for developments that account for the nuances of different
models, the case study considers spatially dependent latent variables based on neighbourhood
structures. For simplicity, the focus is on the Intrinsic Conditional AutoRegressive (ICAR) prior
[73], although variations such as the Besag–York–Mollié [74] and Leroux [75] models are similar in
what follows (the latter is used for example, in the Australian Cancer Atlas described in §7). The
ICAR prior posits a vector of spatially dependent latent variables, denoted here as Z. Each element
of Z corresponds to a latent area-level effect of a ‘site’, which is influenced by neighbouring
sites. Writing i ∼ j to denote that sites i and j are considered neighbours, and assuming the
graph arising from the neighbourhood structure is fully connected, the ICAR prior with precision
hyperparameter τ has log-density
log p(z; τ) = (n/2) log τ − (τ/2) Σ_{i∼j} (zi − zj)² + const.
The above may be problematic if the data custodians insist that the latent variables corresponding
to their areas must be kept private to themselves. To see why, consider the case of two data custodians, with the sets C1 and C2 containing the indices of data possessed by the first and second custodian, respectively. Then,

Σ_{i∼j} (zi − zj)² = Σ_{i,j∈C1 : i∼j} (zi − zj)² + Σ_{i,j∈C2 : i∼j} (zi − zj)² + Σ_{i∈C1, j∈C2 : i∼j} (zi − zj)²,   (4.2)
where terms in blue are those relevant to the sites under the first custodian, and those in red to
the second. When computing the log-posterior density (as required, for example, in Markov chain
Monte Carlo algorithms), the first two terms on the right-hand side above can be aggregated and
sent to the central server. However, the final term cannot be, as each individual summand requires latent variables from both custodians to be processed. This is because the latter term considers interactions across custodian boundaries.
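The decomposition (4.2) can be expressed programmatically; the sketch below, with a hypothetical edge list and custodian assignment, separates the locally computable within-custodian sums from the cross-boundary term that creates the privacy obstruction.

```python
def icar_quadratic_terms(z, edges, custodian):
    """Split the ICAR quadratic form as in (4.2) (sketch).

    `edges` lists neighbour pairs (i, j); `custodian[i]` in {0, 1} records
    which custodian holds site i. The within-custodian sums can be computed
    and aggregated locally; the cross term needs latent values from both
    sides of the boundary.
    """
    within = [0.0, 0.0]
    cross = 0.0
    for i, j in edges:
        term = (z[i] - z[j]) ** 2
        if custodian[i] == custodian[j]:
            within[custodian[i]] += term
        else:
            cross += term
    return within, cross
```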
Consequently, solutions such as (i) employing judicious reparameterization of the latent
variables (possibly compatible with the one that is often also required to enforce identifiability),
(ii) changing the model to add additional auxiliary variables or (iii) otherwise approximating the
troublesome term, are required. An additional challenge is that even if inferences on individual
latent variables are only available to their respective custodians, they may nevertheless ‘leak’
information across custodian boundaries to neighbouring sites due to the underlying dependency
structure.
While the above certainly highlights particular challenges, the first two terms of (4.2) split nicely across custodians and hint that latent variables need not always be problematic. For such straightforward cases, specialized accurate and efficient inference approaches
that allow individual custodians to avoid ever sharing their latent variables (either directly or
indirectly) or data are the subject of forthcoming work by the authors of this section, who have
a longer-term goal of tackling more challenging cases such as ICAR and its relatives in different
settings.
10–15 years advancing ABC and related methods that lie within the more general class of
so-called likelihood-free inference methods. A substantial portion of methodologically focused
ABC research considers aspects including the effective choice of || · || (e.g. [81,82]), efficient
sampling algorithms to explore the approximate posterior in (5.1) (e.g. [83]) and ABC’s theoretical
properties (e.g. [84]). Many of the developments of ABC and some related methods (e.g. Bayesian synthetic likelihood (BSL) [85,86]) prior to 2018 are discussed in [76], the first-ever monograph on ABC.
The following case study considers a popular class of sampling algorithms for ABC based on SMC. SMC-based ABC algorithms improve efficiency compared to sampling naively from the prior by gradually reducing the ABC tolerance ε, where the output produced at iteration t is used to improve the proposal distribution of θ at iteration t + 1. The output of the algorithm is N samples, or ‘particles’, from the ABC posterior in (5.1), with a final ε that is either pre-specified or determined adaptively by the algorithm. Each particle has attached to it a ‘distance’, which is the value of ||x − y|| for x simulated from the model based on the particle’s parameter value. Here, we use the adaptive SMC-ABC algorithm in [87], itself a minor modification of the replenishment algorithm of [88]. The algorithm is summarized below, with a code sketch following the steps.
(i) Draw N samples from the prior, and for each sample, simulate the model and compute the corresponding distance. Initialize ε as the largest distance among the set of particles.
(ii) Set the next ε as the α-quantile of the set of distances. Retain the Nα particles with distance less than or equal to ε.
(iii) Resample the retained particle set N − Nα times so that there are N particles.
(iv) Run MCMC on each of the resampled N − Nα particles with stationary distribution (5.1) under the current ε. This step helps to remove duplicate particles created by the previous resampling step. The number of MCMC iterations can be set adaptively based on the MCMC acceptance rate.
(v) Repeat steps (ii)–(iv) until a desired ε is reached or the MCMC acceptance rate in step (iv) is too small (i.e. the number of MCMC steps becomes too large for the computational budget).
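A minimal sketch of steps (i)–(v) for a scalar parameter is given below; the adaptive choice of the number of MCMC moves is simplified to a single random-walk move per particle (applied to all particles rather than only the resampled ones), and the distance function is assumed to compare a simulated dataset with the observed data internally.

```python
import numpy as np

def smc_abc(sample_prior, log_prior, simulate, distance,
            N=1000, alpha=0.5, eps_target=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    theta = np.array([sample_prior(rng) for _ in range(N)])
    dist = np.array([distance(simulate(t, rng)) for t in theta])
    eps = dist.max()                                       # step (i)
    while eps > eps_target:
        eps = np.quantile(dist, alpha)                     # step (ii)
        keep = dist <= eps
        theta, dist = theta[keep], dist[keep]
        idx = rng.integers(0, theta.size, N - theta.size)  # step (iii): resample
        theta = np.concatenate([theta, theta[idx]])
        dist = np.concatenate([dist, dist[idx]])
        sigma = theta.std() + 1e-12
        for p in range(N):                                 # step (iv), simplified
            prop = theta[p] + sigma * rng.standard_normal()
            d_prop = distance(simulate(prop, rng))
            # ABC-MCMC accept: prior ratio, and simulated distance within eps.
            if d_prop <= eps and np.log(rng.uniform()) < log_prior(prop) - log_prior(theta[p]):
                theta[p], dist[p] = prop, d_prop
    return theta, dist                                     # step (v) via the loop
```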
A key computational inefficiency of ABC and closely related methods such as BSL is that many
of the model simulations yield MCMC proposals that are rejected. To obtain a suitable quality of approximation, it is not uncommon to have to continue the algorithm to the point where ε is small enough that average acceptance probabilities fall to 10⁻² or less. To overcome this issue,
there has been significant attention devoted to machine learning based approaches to likelihood-
free inference, especially in the past 5 years. These methods use model simulations (from different
parameter values) as training data for building a conditional density estimator of the likelihood
(e.g. [89]), likelihood ratio (e.g. [90]) or posterior density (e.g. [91]). Following this estimation,
standard methods from the Bayesian inference toolkit can be used. Many machine learning
approaches to likelihood-free inference can be implemented sequentially, so that samples from
the approximate posterior in the previous (or all previous) iterations can comprise an increasingly
informed training set that yields a more accurate conditional density estimator in regions of non-
negligible posterior probability. For the case study below, we compare the SMC-ABC approach
with the sequential neural likelihood (SNL) method of [89], which is outlined below.
(i) Draw N parameter values from the prior.
(ii) Simulate the model at each new parameter value, add the resulting parameter–data pairs to the training set, and train a conditional neural density estimator of the likelihood on the accumulated set.
(iii) Sample N new parameter values, via MCMC, from the approximate posterior formed from the prior and the estimated likelihood.
(iv) Repeat steps (ii) and (iii) for a desired number of rounds.
Figure 4. The posterior predictive distributions of (a) SMC-ABC and (b) SNL for the synthetic and ovarian cancer datasets. The
black solid line is the tumour growth data. (Online version in colour.)
simulations for each dataset. To compare the performance of SMC-ABC and SNL, we compute
the posterior predictive distribution for each dataset. For SNL, we choose the round that
visually produces the most accurate posterior predictive distribution. We find that for SNL the
performance can degrade with increasing rounds in three ovarian cancer datasets.
The results are shown in figure 4. It can be seen that SMC-ABC produces posterior predictive
distributions that tightly enclose the time series of tumour volumes for three real-world ovarian
cancer datasets. It is evident that SNL produces an accurate posterior predictive distribution for
the synthetic dataset, with substantially fewer model simulations than that used for SMC-ABC.
This result is aligned with other synthetic examples in the literature (e.g. [106]). However, the
SNL results for the real data are mixed, and for the three real datasets SMC-ABC produces more
accurate posterior predictive distributions. Further, we do not necessarily see an improvement in SNL's accuracy as the number of rounds increases.
6. Model transfer
Updating prior beliefs based on data is a core tenet of Bayesian inference. In the Bayesian
context, model transfer extends Bayesian updating by incorporating information from a well-
known source domain into a target domain. Consider the scenario where a target domain has
insufficient data yT to enable useful inference. Model transfer allows us to borrow information
from a source domain with sufficient data yS to improve inference. The transferability problem
then is a question of when to transfer information, which information to transfer, and how to
transfer this information. This problem appears across several domains, with some solutions
exploiting the underlying properties of the source model, while others create informative priors
with the source information. Below, we will discuss several different approaches to the model
transfer problem. This broad topic is also known as transfer learning in the machine learning
literature [109].
Naive updating, which uses all available source information, is a natural starting point to
approach model transfer, though it can be detrimental. If the source and target distributions are dissimilar, negative transfer [110] may occur, reducing the inferential or predictive power of our posterior. Power priors [111] correct for the difference between source and target distributions by
flattening the likelihood of the source distribution. This flattening is done by choosing a value φ ∈ [0, 1] and raising the source likelihood to the power φ, which gives

π(θ | φ, yT, yS) ∝ fT(yT | θ) fS(yS | θ)^φ π(θ),
where fS (yS |θ ) and fT (yT |θ ) are the source and target likelihood functions, respectively. Naive
updating would simply use the value φ = 1. Finding an appropriate value for φ is challenging; intuitively, we would like to treat it as a latent variable and assign it an appropriate prior. Unfortunately,
even when both datasets are from the same distribution, the resulting posterior marginal of φ may
exhibit only slightly less variance than the chosen prior. This phenomenon is analysed in [112]
with illustrative examples. Other approaches attempt to determine an appropriate value of φ by
optimization. Different information criteria, from the standard deviance information criterion to more complex penalized likelihood-type criteria, have been used [113], including the marginal
likelihood [114] and the pseudo-marginal likelihood [115] which are evaluated using only the
target data.
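To illustrate, consider a normal mean with known variance, for which the power prior posterior is available in closed form; this is a sketch, with the prior scale τ and observation scale σ as arbitrary choices.

```python
import numpy as np

def power_prior_posterior(y_T, y_S, phi, sigma=1.0, tau=10.0):
    """Posterior mean and sd for a normal mean theta under a power prior.

    Model: y ~ N(theta, sigma^2) with prior theta ~ N(0, tau^2); the source
    likelihood is raised to phi in [0, 1], so the source sample acts like
    phi * n_S effective observations.
    """
    n_T, n_S = len(y_T), len(y_S)
    precision = 1.0 / tau**2 + (n_T + phi * n_S) / sigma**2
    mean = (np.sum(y_T) + phi * np.sum(y_S)) / sigma**2 / precision
    return mean, np.sqrt(1.0 / precision)

# phi = 1 recovers naive updating; phi = 0 discards the source data.
```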
The transfer learning literature has a large number of methods for model transfer, as evidenced by the recent review paper [109]. Many of these methods are specific to neural networks, but
some can still be applied to broader classes of statistical models. An example of such a method
is described in [116] which uses an ensemble of convolutional neural networks with a majority
voting selection step that is easily generalized for use beyond neural networks. Another method,
TrAdaBoost.R2 [117,118] adapts boosting [119] to the model transfer problem. This method
iteratively reweights each data point in the source and target domain to improve the predictive
performance of the target model. There are also several methods specific to generalized linear
models. These use a variety of approaches to achieve model transfer for generalized linear models
Above, λ ∈ [0, 1], where λ = 0 indicates no information transfer and λ = 1 complete information
Downloaded from https://royalsocietypublishing.org/ on 12 June 2024
transfer. For the interested reader, exemplar code is available via [128].
Current state-of-the-art Bayesian model transfer generalizes naive Bayesian updating but relies
on fixed levels of transfer rather than incorporating uncertainty. It is still not clear how one should
learn an optimal φ value in this paradigm but we expect future research will address this and
use uncertainty more effectively. Moreover, given the interest in model-specific transfer learning,
we believe that a Bayesian approach will be useful to develop general methods that are model
agnostic.
7. Purposeful products
A key advantage of Bayesian methods is their ability to assist in decision making, and here three
different case studies showcase innovative tools using Bayesian approaches.
Conditional probability table for vaccine effectiveness against symptomatic infection (from figure 5):

doses:          none            one             two
variant:        alpha   delta   alpha   delta   alpha   delta
effective       0       0       0.6     0.3     0.8     0.6
not effective   1       1       0.4     0.7     0.2     0.4
Figure 5. An example Bayesian network with a single, dependent child node (vaccine effectiveness) and two parent nodes
(vaccine doses and variant). Conditional probability table for vaccine effectiveness is shown on the right. (Online version in
colour.)
that the model can respond to user-defined scenarios such as ‘how likely is it that I will get sick’
rather than just ‘will I get sick’. Finally, Bayesian networks are highly interpretable models [140],
as they allow exploration of the effect of different observed values (evidence) on the probability
of certain outcomes.
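The conditional probability table in figure 5 can be queried directly. The sketch below hard-codes those probabilities and uses a hypothetical prior over variants when the variant is unobserved, illustrating how evidence on the parent nodes changes the probability of the child node.

```python
# Conditional probability table from figure 5 (vaccine effectiveness against
# symptomatic infection), indexed by (doses, variant).
cpt_effective = {
    ("none", "alpha"): 0.0, ("none", "delta"): 0.0,
    ("one",  "alpha"): 0.6, ("one",  "delta"): 0.3,
    ("two",  "alpha"): 0.8, ("two",  "delta"): 0.6,
}

def p_effective(doses, variant=None, p_variant=None):
    """P(effective | evidence): exact lookup when both parents are observed,
    otherwise marginalized over a (hypothetical) prior over variants."""
    if variant is not None:
        return cpt_effective[(doses, variant)]
    return sum(p * cpt_effective[(doses, v)] for v, p in p_variant.items())

print(p_effective("two", "delta"))                         # both parents observed
print(p_effective("two", p_variant={"alpha": 0.2, "delta": 0.8}))
```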
The COVID-19 risk calculator (CoRiCAL—https://corical.immunisationcoalition.org.au) was
developed to help the general public, as well as the doctors advising them, weigh up the risks and benefits of receiving a COVID-19 vaccination. A Bayesian network model was constructed and parameterized based on the best available evidence from a range of sources; it can be used to determine a person's risk of developing symptomatic COVID-19, of dying from or suffering other adverse effects of COVID-19, or of suffering adverse effects (including death) from the vaccine itself [141]. The model relied on Australian data to represent the context as accurately as possible; however, in cases where local data were lacking, international data were used [142,143]. Full model information, along with model code, is available via [144]. A web-based interface (figure 6)
was developed to create a user-friendly tool that considers a person's age and sex, the brand of the vaccine, how many doses they have already had, and the current level of transmission within the community, and displays their chances of an adverse event alongside common relatable risks.
As the pandemic landscape changes, it remains crucial that the evidence for making informed
choices on COVID-19 vaccination is made accessible. The model is updated in light of new
variants, and as new vaccines become available and recommended (e.g. booster shots).
Figure 6. An example output from the CoRiCAL COVID-19 risk calculator tool. (Online version in colour.)
uses outputs from artificial intelligence algorithms trained to classify points on images [148].
For each Marine Ecoregion of the World (MEOW, [149]), a set of images j = 1, 2, . . . , J, each
composed of k = 1, 2, . . . , 50 elicitation points, is used across years of monitoring. Counts yit, for observation i sampled at location si and time t, are modelled using a binomial distribution (with p the probability that a point is classified as the target category and ni the total number of elicited points) and controlled by additional components, including the fixed effects of environmental disturbances (cyclones and mass coral bleaching events), a nested sampling design (depth, transect, site, reef and monitoring program) modelled as independent and identically distributed Gaussian random effects, and spatio-temporal random effects.
The novelty in this model is the incorporation of spatio-temporal random effects, composed of a first-order autoregressive process in time and a Gaussian field that is approximated using a Gaussian Markov random field (GMRF), where the covariance is determined by a Matérn kernel.
We represented the GMRF via a stochastic partial differential equation, using the
method of [150], implemented in the R package INLA [151]. The spatial domain is based on the
observed data locations and a buffer with adjacent MEOWs to allow information sharing between
units. Spatial predictions are made on a grid at 5 × 5 km resolution, and posterior distributions are used to reconstruct coral cover values at coarser spatial scales, including MEOW units and country level. Finally, estimates of coral cover are weighted by the proportion of coral
reefs within a MEOW unit following the methodology developed as part of the global coral reef
monitoring network [152]. We use the default INLA priors for different types of model parameters
as discussed in [153]. The model is as follows:
yit | β, Z, Vi ∼ binomial( ni, logit⁻¹( β⊤xi + r(si, t) + Vi ) ),
The priors for the autoregressive parameter φ and independent Gaussian random effects Vi used
are the INLA defaults. Research efforts focus on developing new technologies to assess the status of coral reefs in rapid and cost-effective ways through automatic image detection [148], and to learn about the impacts of multiple disturbances and management strategies [154,155]. ReefCloud is an
open-access digital tool that supports coral reef monitoring and decision-making by integrating data analyses and reporting (https://reefcloud.ai/). The online collection of worldwide data
provides a unique opportunity to model these data together to (i) increase understanding of the impacts of environmental disturbances and (ii) reduce uncertainty when estimating coral
Figure 7. An example output from ReefCloud showing the temporal trend in coral cover estimated from a Bayesian model for the central and southern parts of the Great Barrier Reef. (Online version in colour.)
trajectories at large spatial scales. The pilot version of the product was developed using the most extensive monitoring program in the world, surveying the Great Barrier Reef, Australia. Machine learning outputs from one million reef images are used to predict coral cover across 3000 coral reefs from 2004 onward. The ReefCloud online dashboard makes knowledge about reef
changes accessible to everyone (figure 7). The project also educates the reef research community
and managers on how Bayesian statistical modelling can help to increase our understanding of the impacts of climate change on coral reefs and support decision-making from local to global scales.
Figure 8. An example screenshot of the Australian Cancer Atlas showing melanoma incidence patterns with summary graphs.
Red represents high incidence while blue is low in comparison to the national average (pale yellow). (Online version in colour.)
patterns, and further include cancer risk factors, some types of cancer treatment and selected
cancer clinical/stage patterns. Underpinned by Bayesian methods, the Atlas will continue to
provide the methods and visualizations necessary for accurate estimation, interpretation and decision-making.
8. Conclusion
This paper has focused on a small number of current opportunities and challenges in the
application of the Bayesian paradigm. Of course, these are not the only issues, but collectively
they point to the maturity of current Bayesian practice and the promise of a fully mature
Bayesian future. As a final thought, we note that many advances in applied Bayesian statistics
in recent years are deeply indebted to computational and methodological advances surrounding
complex hierarchically structured models. Modern applied Bayesian statistics thus finds itself at
the interface with not only its traditional neighbour mathematics, but also increasingly with the
field of computer science. This partnership is one of considerable further promise in the years to
come.
Data accessibility. This article has no additional data.
Authors’ contributions. J.J.B.: formal analysis, investigation, methodology, software, writing—original draft,
writing—review and editing; A.B.: formal analysis, investigation, methodology, software, writing—original
draft, writing—review and editing; K.B.: formal analysis, investigation, methodology, software, writing—
original draft, writing—review and editing; S.C.: formal analysis, investigation, methodology, software,
writing—original draft, writing—review and editing; C.C.D.: formal analysis, investigation, methodology,
software, writing—original draft, writing—review and editing; C.H.: formal analysis, investigation,
methodology, software, writing—original draft, writing—review and editing; A.J.: formal analysis,
investigation, methodology, software, writing—original draft, writing—review and editing; H.M.: formal
analysis, investigation, methodology, software, writing—original draft, writing—review and editing; J.M.M.:
formal analysis, investigation, methodology, software, writing—original draft, writing—review and editing;
K.M.: conceptualization, formal analysis, investigation, methodology, project administration, resources,
supervision, writing—original draft, writing—review and editing; A.P.: formal analysis, investigation,
methodology, supervision, writing—original draft, writing—review and editing; R.S.: formal analysis,
investigation, methodology, software, writing—original draft, writing—review and editing; E.S.-F.: formal
analysis, investigation, methodology, writing—original draft, writing—review and editing; J.V.: formal
analysis, investigation, methodology, software, writing—original draft, writing—review and editing; X.W.:
Acknowledgements. We thank Dr Jasmine Lee for collecting and providing the 360◦ images in Antarctica.
References
1. Neyman J, Pearson ES. 1928 On the use and interpretation of certain test criteria for purposes
of statistical inference: Part I. Biometrika 20A, 175–240.
2. Wald A. 1949 Statistical decision functions. Ann. Math. Stat. 20, 165–205. (doi:10.1214/aoms/
1177730030)
3. Lindley DV. 1972 Bayesian statistics: a review. Montpelier, VT: SIAM.
4. Müller P, Parmigiani G. 1995 Optimal design via curve fitting of Monte Carlo experiments.
J. Am. Stat. Assoc. 90, 1322–1330.
5. Müller P. 1999 Simulation-based optimal design. Handb. Stat. 6, 459–474.
6. Müller P, Sansó B, De Iorio M. 2004 Optimal Bayesian design by inhomogeneous Markov
chain simulation. J. Am. Stat. Assoc. 99, 788–798.
7. Amzal B, Bois FY, Parent E, Robert CP. 2006 Bayesian-optimal design via interacting particle
systems. J. Am. Stat. Assoc. 101, 773–785. (doi:10.1198/016214505000001159)
8. Overstall AM, McGree JM, Drovandi CC. 2018 An approach for finding fully Bayesian
optimal designs using normal-based approximations to loss functions. Stat. Comput. 28,
343–358. (doi:10.1007/s11222-017-9734-x)
9. Foster A, Jankowiak M, Bingham E, Horsfall P, Teh YW, Rainforth T, Goodman N. 2019
Variational Bayesian optimal experimental design. Part of advances in neural information
processing systems 32 (NeurIPS 2019).
10. Overstall AM, Woods DC. 2017 Bayesian design of experiments using approximate
coordinate exchange. Technometrics 59, 458–470. (doi:10.1080/00401706.2016.1251495)
11. Berry DA. 2006 Bayesian clinical trials. Nat. Rev. Drug Discov. 5, 27–36. (doi:10.1038/nrd1927)
12. Connor JT, Elm JJ, Broglio KR, for the ESETT and ADAPT-IT Investigators. 2013 Bayesian adaptive trials for comparative effectiveness research: an example in status epilepticus. J. Clin. Epidemiol. 66, S130–S137. (doi:10.1016/j.jclinepi.2013.02.015)
13. Thorlund K, Haggstrom J, Park JJH, Mills EJ. 2018 Key design considerations for adaptive
clinical trials: a primer for clinicians. BMJ 360, k698. (doi:10.1136/bmj.k698)
14. Kang SY, McGree JM, Drovandi C, Mengersen K, Caley J. 2016 Bayesian adaptive
design: improving the effectiveness of reef monitoring programs. Ecol. Appl. 26, 2637–2648.
(doi:10.1002/eap.1409)
15. Thilan P, Fisher R, Thompson H, Menendez P, Gilmour J, McGree JM. 2022 Adaptive
monitoring of coral health at Scott Reef where data exhibit nonlinear and disturbed trends
over time. Ecol. Evol. 12, e9233.
16. Wagner D, Friedlander AM, Pyle RL, Brooks CM, Gjerde KM, Wilhelm TA. 2020 Coral reefs
of the high seas: hidden biodiversity hotspots in need of protection. Front. Mar. Sci. 7, 1–13.
17. AIMS. 2021 Annual summary report of coral reef condition 2020/21. See www.aims.gov.au/reef-monitoring/gbr-condition-summary-2020-2021.
18. Buchhorn K, Mengersen K, Santos-Fernandez E, Peterson EE, McGree JM. 2022 Bayesian
design with sampling windows for complex spatial processes. Preprint (https://arxiv.org/abs/2206.05369).
19. Bassi A, Berkhof J, de Jong D, van de Ven PM. 2021 Bayesian adaptive decision-theoretic
designs for multi-arm multi-stage clinical trials. Stat. Methods Med. Res. 30, 717–730.
(doi:10.1177/0962280220973697)
24. Mazumdar S, Ceccaroni L, Piera J, Hölker F, Berre A, Arlinghaus R, Bowser A. 2018 Citizen
science technologies and new opportunities for participation. In Citizen science technologies
and new opportunities for participation. London, UK: UCL Press.
25. Queiroz ACM, Nascimento AM, Tori R, Silva Leme MID. 2019 Immersive virtual
environments and learning assessments. In Int. Conf. on Immersive Learning, pp. 172–181.
Berlin, Germany: Springer.
26. Fauville G, Queiroz ACM, Bailenson JN. 2020 Virtual reality as a promising tool to promote
climate change awareness. Technol. Health, 91–108.
27. Vercelloni J et al. 2018 Using virtual reality to estimate aesthetic values of coral reefs. R. Soc.
Open Sci. 5, 172226. (doi:10.1098/rsos.172226)
28. Mengersen K et al. 2017 Modelling imperfect presence data obtained by citizen science.
Environmetrics 28, e2446. (doi:10.1002/env.2446)
29. Leigh C et al. 2019 Using virtual reality and thermal imagery to improve statistical modelling
of vulnerable and protected species. PLoS ONE 14, e0217809.
30. Lee JR, Raymond B, Bracegirdle TJ, Chadès I, Fuller RA, Shaw JD, Terauds A.
2017 Climate change drives expansion of Antarctic ice-free habitat. Nature 547, 49–54.
(doi:10.1038/nature22996)
31. Antarctic Treaty Consultative Parties. 1991 Protocol on environmental protection to the Antarctic Treaty. Madrid, Spain.
32. Vercelloni J, Peppinck J, Santos-Fernandez E, McBain M, Heron G, Dodgen T, Peterson
EE, Mengersen K. 2021 Connecting virtual reality and ecology: a new tool to run
seamless immersive experiments in R. PeerJ Comput. Sci. 7, e544. (doi:10.7717/peerj-
cs.544)
33. Zooniverse. 2022 Zooniverse. See www.zooniverse.org (accessed 23 September 2022).
34. Prudic KL, McFarland KP, Oliver JC, Hutchinson RA, Long EC, Kerr JT, Larrivée M. 2017
eButterfly: leveraging massive online citizen science for butterfly conservation. Insects 8, 53.
(doi:10.3390/insects8020053)
35. Sullivan BL, Wood CL, Iliff MJ, Bonney RE, Fink D, Kelling S. 2009 eBird: a citizen-
based bird observation network in the biological sciences. Biol. Conserv. 142, 2282–2292.
(doi:10.1016/j.biocon.2009.05.006)
36. Nugent J. 2018 Inaturalist. Sci. Scope 41, 12–13.
37. van Strien AJ et al. 2013 Occupancy modelling as a new approach to assess supranational
trends using opportunistic data: a pilot study for the damselfly Calopteryx splendens. Biodivers.
Conserv. 22, 673–686. (doi:10.1007/s10531-013-0436-1)
38. Dwyer RG, Carpenter-Bundhoo L, Franklin CE, Campbell HA. 2016 Using citizen-collected
wildlife sightings to predict traffic strike hot spots for threatened species: a case study on the
southern cassowary. J. Appl. Ecol. 53, 973–982. (doi:10.1111/1365-2664.12635)
39. Strebel N, Kéry M, Schaub M, Schmid H. 2014 Studying phenology by flexible modelling
of seasonal detectability peaks. Methods Ecol. Evol. 5, 483–490. (doi:10.1111/2041-210X.
12175)
40. Santos-Fernández E, Mengersen K. 2021 Understanding the reliability of citizen science
observational data using item response models. Methods Ecol. Evol. 12, 1533–1548.
41. Freitag A, Meyer R, Whiteman L. 2016 Strategies employed by citizen science programs to
increase the credibility of their data. Citiz. Sci.: Theory Pract. 1, 2.
42. Santos-Fernandez E, Peterson EE, Vercelloni J, Rushworth E, Mengersen K. 2021 Correcting
misclassification errors in crowdsourced ecological data: a Bayesian perspective. J. R. Stat.
Soc. C 70, 147–173.
47. Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V. 2020 Federated optimization in
heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450.
48. Wang J, Liu Q, Liang H, Joshi G, Poor HV. 2020 Tackling the objective inconsistency
problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst. 33, 7611–
7623.
49. Yurochkin M, Agarwal M, Ghosh S, Greenewald K, Hoang N, Khazaeni Y. 2019 Bayesian
nonparametric federated learning of neural networks. In Int. Conf. on Machine Learning,
pp. 7252–7261. Long Beach, CA: PMLR.
50. Thibaux R, Jordan MI. 2007 Hierarchical beta processes and the Indian buffet process.
In Conf. on Artificial Intelligence and Statistics, pp. 564–571. San Juan, Puerto Rico:
PMLR.
51. Deist TM et al. 2020 Distributed learning on 20 000+ lung cancer patients—the personal
health train. Radiother. Oncol. 144, 189–200. (doi:10.1016/j.radonc.2019.11.019)
52. Geleijnse G, Chiang RCJ, Sieswerda M, Schuurman M, Lee K, van Soest J, Dekker A, Lee
WC, Verbeek XA. 2020 Prognostic factors for survival in patients with oral cavity cancer:
a comparison of the Netherlands and Taiwan using privacy-preserving federated analyses.
Sci. Rep. 10, 20526.
53. Cellamare M, van Gestel AJ, Alradhi H, Martin F, Moncada-Torres A. 2022 A
federated generalized linear model for privacy-preserving analysis. Algorithms 15, 243.
(doi:10.3390/a15070243)
54. Fienberg SE, Fulp WJ, Slavkovic AB, Wrobel TA. 2006 ‘Secure’ log-linear and logistic
regression analysis of distributed databases. In Int. Conf. on Privacy in Statistical Databases,
pp. 277–290. Berlin, Germany: Springer.
55. Slavkovic AB, Nardi Y, Tibbits MM. 2007 ‘Secure’ logistic regression of horizontally and
vertically partitioned distributed databases. In 7th IEEE Int. Conf. on Data Mining Workshops
(ICDMW 2007), pp. 723–728. Omaha, NE: IEEE.
56. Shi H, Jiang C, Dai W, Jiang X, Tang Y, Ohno-Machado L, Wang S. 2016 Secure multi-party
computation grid logistic regression (SMAC-GLORE). BMC Med. Inform. Decis. Mak. 16, 175–187. (doi:10.1186/s12911-016-0316-1)
57. Li Y, Jiang X, Wang S, Xiong H, Ohno-Machado L. 2016 Vertical grid logistic regression
(VERTIGO). J. Am. Med. Inform. Assoc. 23, 570–579. (doi:10.1093/jamia/ocv146)
58. Kamphorst B, Rooijakkers T, Veugen T, Cellamare M, Knoors D. 2022 Accurate training of
the Cox proportional hazards model on vertically-partitioned data while preserving privacy.
BMC Med. Inform. Decis. Mak. 22, 1–18. (doi:10.1186/s12911-022-01771-3)
59. Cramer R, Damgård IB. 2015 Secure multiparty computation. Cambridge, UK: Cambridge
University Press.
60. Minka TP. 2003 A comparison of numerical optimizers for logistic regression. Unpublished
Draft, pp. 1–18.
61. Moncada-Torres A, Martin F, Sieswerda M, Van Soest J, Geleijnse G. 2020 VANTAGE6:
an open source privacy preserving federated learning infrastructure for secure insight
exchange. In AMIA Annual Symp. Proc., vol. 2020, p. 870. Chicago, IL: American Medical
Informatics Association.
62. Wang X, Dunson DB. 2014 Parallelizing MCMC via Weierstrass sampler. Preprint (https://arxiv.org/abs/1312.4605).
63. Neiswanger W, Wang C, Xing EP. 2014 Asymptotically exact, embarrassingly parallel
MCMC. In Proc. of the 30th Conf. on Uncertainty in Artificial Intelligence, UAI’14, pp. 623–632.
Arlington, Virginia, USA: AUAI Press.
69. Heikkilä M, Jälkö J, Dikmen O, Honkela A. 2019 Differentially private Markov chain Monte
Carlo. Adv. Neural Inf. Process. Syst. 32.
70. Bohensky MA, Jolley D, Sundararajan V, Evans S, Pilcher DV, Scott I, Brand CA. 2010 Data
linkage: a powerful research tool with potential problems. BMC Health Serv. Res. 10, 1–7.
(doi:10.1186/1472-6963-10-346)
71. Harron K, Dibben C, Boyd J, Hjern A, Azimaee M, Barreto ML, Goldstein H.
2017 Challenges in administrative data linkage for research. Big Data Soc. 4, 1–12.
(doi:10.1177/2053951717745678)
72. Gelman A, Hill J. 2006 Data analysis using regression and multilevel/hierarchical models.
Cambridge, UK: Cambridge University Press.
73. Besag J. 1974 Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. B
(Methodol.) 36, 192–225.
74. Besag J, York J, Mollié A. 1991 Bayesian image restoration, with two applications in spatial
statistics. Ann. Inst. Stat. Math. 43, 1–20. (doi:10.1007/BF00116466)
75. Leroux BG, Lei X, Breslow N. 2000 Estimation of disease rates in small areas: a new mixed
model for spatial dependence. In Statistical Models in Epidemiology, the Environment, and
Clinical Trials, pp. 179–191. New York, NY: Springer.
76. Sisson SA, Fan Y, Beaumont M. 2018 Handbook of approximate Bayesian computation. London,
UK: Chapman and Hall/CRC.
77. Beaumont MA, Zhang W, Balding DJ. 2002 Approximate Bayesian computation in
population genetics. Genetics 162, 2025–2035. (doi:10.1093/genetics/162.4.2025)
78. Beven K, Binley A. 1992 The future of distributed models: model calibration and uncertainty
prediction. Hydrol. Processes 6, 279–298. (doi:10.1002/hyp.3360060305)
79. Beven K, Binley A. 2014 Glue: 20 years on. Hydrol. Processes 28, 5897–5918.
(doi:10.1002/hyp.10082)
80. Nott DJ, Marshall L, Brown J. 2012 Generalized likelihood uncertainty estimation (GLUE)
and approximate Bayesian computation: what’s the connection? Water Resour. Res. 48.
(doi:10.1029/2011WR011128)
81. Prangle D. 2018 Summary statistics. In Handbook of approximate Bayesian computation, pp. 125–152. London, UK: Chapman and Hall/CRC.
82. Drovandi C, Frazier DT. 2022 A comparison of likelihood-free methods with and without
summary statistics. Stat. Comput. 32, 1–23. (doi:10.1007/s11222-022-10092-4)
83. Sisson SA, Fan Y. 2018 ABC samplers. In Handbook of approximate Bayesian computation, pp. 87–123. London, UK: Chapman and Hall/CRC.
84. Frazier DT, Martin GM, Robert CP, Rousseau J. 2018 Asymptotic properties of approximate
Bayesian computation. Biometrika 105, 593–607. (doi:10.1093/biomet/asy027)
85. Price LF, Drovandi CC, Lee A, Nott DJ. 2018 Bayesian synthetic likelihood. J. Comput. Graph.
Stat. 27, 1–11. (doi:10.1080/10618600.2017.1302882)
86. Frazier D, Nott DJ, Drovandi C, Kohn R. 2022 Bayesian inference using synthetic likelihood:
asymptotics and adjustments. J. Am. Stat. Assoc. 1–12.
87. Carr MJ, Simpson MJ, Drovandi C. 2021 Estimating parameters of a stochastic cell invasion
model with fluorescent cell cycle labelling using approximate Bayesian computation. J. R.
Soc. Interface 18, 20210362. (doi:10.1098/rsif.2021.0362)
88. Drovandi CC, Pettitt AN. 2011 Estimation of parameters for macroparasite
population evolution using approximate Bayesian computation. Biometrics 67, 225–233.
(doi:10.1111/j.1541-0420.2010.01410.x)
139. Xue J, Gui D, Lei J, Zeng F, Mao D, Zhang Z. 2017 Model development of a
participatory Bayesian network for coupling ecosystem services into integrated water
resources management. J. Hydrol. 554, 50–65. (doi:10.1016/j.jhydrol.2017.08.045)
140. Uusitalo L. 2007 Advantages and challenges of Bayesian networks in environmental
modelling. Ecol. Modell. 203, 312–318. (doi:10.1016/j.ecolmodel.2006.11.033)
141. Mayfield HJ et al. 2022 Designing an evidence-based Bayesian network for estimating
the risk versus benefits of AstraZeneca COVID-19 vaccine. Vaccine 40, 3072–3084.
(doi:10.1016/j.vaccine.2022.04.004)
142. Lau CL et al. 2021 Risk-benefit analysis of the AstraZeneca COVID-19 vaccine in
Australia using a Bayesian network modelling framework. Vaccine 39, 7429–7440.
(doi:10.1016/j.vaccine.2021.10.079)
143. Sinclair JE, Mayfield HJ, Short KR, Brown SJ, Puranik R, Mengersen K, Litt JC, Lau CL. 2022 A
Bayesian network analysis quantifying risks versus benefits of the Pfizer COVID-19 vaccine
in Australia. npj Vaccines 7, 1–11. (doi:10.1038/s41541-022-00517-6)
144. BayesFusion interactive model repository: CoRiCal AstraZeneca model. See https://repo.bayesfusion.com/network/permalink?net=Small+BNs%2FCoRiCalAZ.xdsl.
145. Dixon AM, Forster PM, Heron SF, Stoner AM, Beger M. 2022 Future loss of
local-scale thermal refugia in coral reef ecosystems. PLoS Clim. 1, e0000004.
(doi:10.1371/journal.pclm.0000004)
146. Hughes TP et al. 2018 Spatial and temporal patterns of mass bleaching of corals in the
Anthropocene. Science 359, 80–83. (doi:10.1126/science.aan8048)
147. Vercelloni J, Mengersen K, Ruggeri F, Caley MJ. 2017 Improved coral population estimation
reveals trends at multiple scales on Australia’s Great Barrier Reef. Ecosystems 20, 1337–1350.
(doi:10.1007/s10021-017-0115-2)
148. Gonzalez-Rivero M et al. 2020 Monitoring of coral reefs using artificial intelligence: a feasible
and cost-effective approach. Remote Sens. 12, 489.
149. Spalding MD et al. 2007 Marine ecoregions of the world: a bioregionalization of coastal and
shelf areas. BioScience 57, 573–583. (doi:10.1641/B570707)
150. Lindgren F, Rue H. 2015 Bayesian spatial modelling with R-INLA. J. Stat. Softw. 63, 1–25.
(doi:10.18637/jss.v063.i19)
151. Rue H, Martino S, Chopin N. 2009 Approximate Bayesian inference for latent Gaussian
models by using integrated nested Laplace approximations. J. R. Stat. Soc. B 71, 319–392.
(doi:10.1111/j.1467-9868.2008.00700.x)
152. Souter D, Planes S, Wicquart J, Logan M, Obura D, Staub F. 2020 Status of coral reefs of
the world: 2020. In Global Coral Reef Monitoring Network; Int. Coral Reef Initiative. Townsville,
Australia: Australian Institute of Marine Science.
153. Moraga P. 2019 Geospatial health data: modeling and visualization with R-INLA and Shiny. New
York, NY: CRC Press.
154. Vercelloni J, Liquet B, Kennedy EV, González-Rivero M, Caley MJ, Peterson EE, Puotinen M,
Hoegh-Guldberg O, Mengersen K. 2020 Forecasting intensifying disturbance effects on coral
reefs. Glob. Change Biol. 26, 2785–2797. (doi:10.1111/gcb.15059)
155. Kennedy EV et al. 2020 Coral reef community changes in Karimunjawa National Park,
Indonesia: assessing the efficacy of management in the face of local and global stressors.
J. Mar. Sci. Eng. 8, 760. (doi:10.3390/jmse8100760)
156. Australian Institute of Health and Welfare. 2021 Cancer in Australia 2021. Report, AIHW.
157. Leroux BG, Lei X, Breslow N. 2000 Estimation of disease rates in small areas: a new mixed model
for spatial dependence, pp. 135–178. New York, NY: Springer.