Markov Chain Monte Carlo Sampling Using A Reservoir Method

This document summarizes a research article that proposes a novel application of reservoir sampling to improve standard Markov chain Monte Carlo (MCMC) methods. Specifically, it introduces a stochastic thinning algorithm using reservoir sampling that can be embedded in most MCMC methods to reduce autocorrelation among the generated sample. The algorithm converges in total variation to the target distribution in probability under mild conditions. Simulation results show that the proposed sampling algorithm has smaller Monte Carlo variance than the corresponding MCMC methods without thinning, while maintaining similar estimation bias.


Computational Statistics and Data Analysis

Markov chain Monte Carlo sampling using a reservoir method


Zhonglei Wang
Wang Yanan Institute for Studies in Economics and School of Economics, Xiamen University, Xiamen, Fujian 361005, People’s Republic of China

highlights

• A reservoir sampling algorithm embedded in MCMC methods is proposed.


• The sample generated by the proposed sampling algorithm converges in total variation to the target distribution in probability under mild conditions.
• The commonly used thinning procedure is a special case of the proposed sampling algorithm.
• Simulation results show that the proposed sampling algorithm is more efficient than the corresponding MCMC methods without thinning.

Article info

Article history:
Received 15 October 2017
Received in revised form 3 March 2019
Accepted 2 May 2019
Available online xxxx

Keywords:
Convergence
Metropolis–Hastings algorithm
Removal procedure
Stochastic thinning

Abstract

Markov chain Monte Carlo methods are widely used to draw a sample from a target distribution which is hard to characterize analytically, and reservoir sampling is developed to obtain a sample from a data stream sequentially in a single pass. A stochastic thinning algorithm using reservoir sampling is proposed, and it can be embedded in most Markov chain Monte Carlo methods to reduce the autocorrelation among the generated sample. The distribution of the sample generated by the proposed sampling algorithm converges in total variation to the target distribution in probability under mild conditions. A practical method is introduced to detect the convergence of the proposed sampling algorithm. Two simulation studies are conducted to compare the proposed sampling algorithm and the corresponding Markov chain Monte Carlo methods without thinning, and the results show that the estimation bias of the proposed sampling algorithm is approximately the same as that of the corresponding Markov chain Monte Carlo method, but the proposed sampling algorithm has a smaller Monte Carlo variance. The proposed sampling algorithm saves computer memory in the sense that only the storage of a small portion of the Markov chain is required in each iteration.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Drawing a sample from a target distribution is a fundamental computational task in many fields. Markov chain Monte Carlo (MCMC) methods are widely used when the target distribution is complex and infeasible to characterize analytically, and the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970; Chib and Greenberg, 1995) and the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990) are the most popular MCMC approaches.

MCMC is an active research area, and many methods have been proposed during the past thirty years. In order to handle the local-trap problem, where a target density has many local minima separated by regions with high densities
E-mail address: [email protected].

https://doi.org/10.1016/j.csda.2019.05.001
0167-9473/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: Z. Wang, Markov chain Monte Carlo sampling using a reservoir method. Computational Statistics and Data Analysis (2019),
https://doi.org/10.1016/j.csda.2019.05.001.

(Liang et al., 2011), different solutions have been provided. Swendsen and Wang (1987) proposed the Swendsen–Wang algorithm for Potts models (Wu, 1982), and it is efficient since a cluster of spins is updated in each iteration. Edwards and Sokal (1988) generalized the Swendsen–Wang algorithm to other models. Neal (2003) discussed a slice sampling method, which draws a sample from a region with large density values. Gilks et al. (1994) considered an adaptive direction sampling method to speed up the convergence of the MCMC. Based on the framework of the adaptive direction sampling method, Liu et al. (2000) discussed a conjugate-gradient Monte Carlo method and demonstrated that it performs significantly better than the traditional Metropolis–Hastings algorithm. Beyond the local-trap problem, Gilks et al. (1995) and Gilks et al. (1997) proposed the adaptive rejection Metropolis sampling method for the case where the target probability density function is not log-concave. Girolami and Calderhead (2011) discussed MCMC algorithms for target distributions that are high-dimensional and samples that exhibit strong correlation. Leskovec and Faloutsos (2006) proposed using random walks to obtain a sample from a large graph. Gjoka et al. (2010) compared four sampling methods for obtaining a representative sample of an undirected graph using Facebook data. Al Hasan and Dave (2018) discussed the use of MCMC algorithms to approximate local topological structures in large networks.

Reservoir sampling (Vitter, 1985; Efraimidis and Spirakis, 2006) is a powerful sequential random sampling algorithm (Vitter, 1984) for drawing a sample from a data stream in a single pass, and it has been widely used in areas with data streams, such as financial applications, network monitoring and security (Babcock et al., 2002). The basic idea of reservoir sampling is to update one element in the reservoir randomly with a given probability in each iteration. It is implemented without reading the complete population and only requires constant computer memory to store the reservoir.

In this paper, we propose a novel application of reservoir sampling to improve standard MCMC methods in the sense that the generated sample is less correlated. The proposed sampling algorithm is a stochastic thinning procedure, and it can be embedded in most MCMC methods. The distribution of the sample generated by the proposed sampling algorithm converges in total variation to the target distribution in probability under mild conditions. It saves computer memory by storing only a small portion of a Markov chain.

The rest of the paper is organized as follows. In Section 2, the concepts of MCMC methods and the Metropolis–Hastings algorithm are briefly reviewed. The sampling algorithm embedded in the Metropolis–Hastings algorithm is proposed in Section 3, and other MCMC methods can be used as well. The convergence property of the proposed sampling algorithm is studied in Section 4. A practical way to detect the convergence of the proposed sampling algorithm is introduced in Section 5. Two simulation studies are conducted to test the performance of the proposed sampling algorithm in Section 6. Discussion is given in Section 7.

2. Basic setup

Denote (S, S, π) to be a probability space, where S ⊂ R^p with p ≥ 1, S is a σ-algebra for S, and π(·) is a probability measure on the measurable space (S, S). We are interested in obtaining a sample of size n from the target distribution π(·).

An MCMC method is often used when it is difficult to draw a sample from π(·) directly. Denote X_0, X_1, . . . to be a time-homogeneous Markov chain with π(·) as its invariant distribution (Athreya and Lahiri, 2006). Traditional MCMC methods treat {X_{N+i} : i = 1, . . . , n} as a sample of size n drawn approximately from the target distribution π(·), where N is the ‘‘burn-in’’ iteration or a large number determined by a convergence diagnostic tool, such as the scale reduction factor discussed by Gelman and Rubin (1992).

As one of the most widely used MCMC methods, the Metropolis–Hastings algorithm is briefly reviewed here and is used to demonstrate the proposed sampling algorithm in the next section. Suppose that the target distribution π(·) is dominated by a σ-finite measure µ(·) and has a probability density function f(·). Denote q(y | x) to be a probability density function of y with respect to µ(·), that is, ∫_S q(y | x) µ(dy) = 1 for x ∈ S, and assume that it is easy to generate a sample from q(y | x). A Markov chain X_0, X_1, . . . is obtained by repeating the following two steps from a random starting value X_0.

Step 1. Given X_k = x, generate y from the probability density function q(y | x), where k ≥ 0.

Step 2. Set

    X_{k+1} = y with probability p(x, y), and X_{k+1} = x with probability 1 − p(x, y),

where

    p(x, y) = min{ f(y) q(x | y) / [f(x) q(y | x)], 1 }.

It is well known that a sample generated by the Metropolis–Hastings algorithm is not independent, since information of X_k is used to generate X_{k+1}, and the autocorrelation of a consecutive segment of the MCMC is large. In the next section, we propose a sampling algorithm such that the dependence among the sample is weakened automatically, and only a small portion of a Markov chain is recorded in each iteration.
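As a concrete illustration, the two steps can be sketched in Python. This is a minimal sketch, not the paper's code: it assumes a symmetric Gaussian random-walk proposal q(y | x) = N(x, step_size²), so q(x | y)/q(y | x) = 1 and the acceptance probability reduces to min{f(y)/f(x), 1}; the target is supplied through its log-density, and the names metropolis_hastings, log_f and step_size are ours.

```python
import math
import random

def metropolis_hastings(log_f, x0, n_steps, step_size=1.0):
    """Run Steps 1-2 repeatedly with a symmetric Gaussian proposal."""
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        y = x + random.gauss(0.0, step_size)   # Step 1: y ~ q(. | x)
        # Step 2: accept with probability min{f(y)/f(x), 1}, on the log scale.
        if math.log(random.random()) < min(log_f(y) - log_f(x), 0.0):
            x = y                              # X_{k+1} = y
        chain.append(x)                        # otherwise X_{k+1} = x
    return chain

# Standard normal target: log f(x) = -x^2/2 up to an additive constant.
random.seed(1)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=5000)
```

Consecutive elements of such a chain are strongly autocorrelated, which is exactly the issue the reservoir-based thinning of the next section addresses.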


3. Proposed sampling algorithm

A reservoir sampling algorithm embedded in the Metropolis–Hastings algorithm is proposed to generate a sample of size n from the target distribution π(·), and it updates a reservoir of size n in each iteration.

For m = 0, . . . , M, denote R^(m) = {x_1^(m), . . . , x_n^(m)} to be the reservoir at the mth iteration, and N_x^(m) = (k_{x,1}^(m), . . . , k_{x,n}^(m)) to be the counting vector for the elements in R^(m), where M is the number of iterations, determined by the convergence diagnostic tool discussed in Section 5. For i = 1, . . . , n, k_{x,i}^(m) counts the number of times that x_i^(m) has stayed in the reservoir consecutively at the mth iteration, and k_{x,i}^(m) ≤ m + 1. For example, if k_{x,1}^(m) = 3 and m ≥ 2, then x_1^(m) was selected into the reservoir at the (m − 2)th iteration, and it has stayed in the reservoir consecutively up to the mth iteration. The proposed sampling algorithm embedded in the Metropolis–Hastings algorithm consists of the following four steps.

Step 1. [Initialization]: Choose x_1^(0) from the support of the target distribution π(·) randomly, and use the Metropolis–Hastings algorithm to generate the remaining elements of R^(0). Set k_{x,i}^(0) = 1 for i = 1, . . . , n, and set m = 1.

Step 2. [Addition]: For the mth iteration, apply the Metropolis–Hastings algorithm to generate L elements, say y_1, . . . , y_L, using x_n^(m−1) as the starting value, where L < n is a fixed value. Assign the counting value k_{y,j} = 1 for j = 1, . . . , L. Combine R^(m−1) and N_x^(m−1) with their counterparts associated with y_1, . . . , y_{L−1} to obtain the extended reservoir R_E^(m) and the counting vector N_E^(m), where R_E^(m) = {x_1^(m−1), . . . , x_n^(m−1), y_1, . . . , y_{L−1}} and N_E^(m) = (k_{x,1}^(m−1), . . . , k_{x,n}^(m−1), k_{y,1}, . . . , k_{y,L−1}). Notice that there are n + L − 1 elements in R_E^(m) and N_E^(m).

Step 3. [Removal]: Remove L elements from R_E^(m) sequentially. For the lth removal step, let

    P_{i,l}^(m) = k_i^(m) / ( ∑_{j ∈ A_l^(m)} k_j^(m) ) · I(i ∈ A_l^(m)),

and remove one element from A_l^(m) with removal probabilities {P_{i,l}^(m) : i ∈ A_l^(m)}, where l = 1, . . . , L, i = 1, . . . , n + L − 1 is the index of the elements in R_E^(m), k_i^(m) is the ith element of N_E^(m), I(x ∈ A) = 1 if x ∈ A and 0 otherwise, A_l^(m) ⊂ {1, . . . , n + L − 1} is the index set after removing l − 1 elements, and A_1^(m) = {1, . . . , n + L − 1}. After removing L elements from R_E^(m), denote the remaining ones as x_1^(m), . . . , x_{n−1}^(m). Update the reservoir by R^(m) = {x_1^(m), . . . , x_{n−1}^(m), x_n^(m)}, where x_n^(m) = y_L, and update N_x^(m) accordingly.

Step 4. [Repetition]: Set m = m + 1, and repeat Steps 2–3 until m > M.
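The four steps above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the embedded kernel is a Gaussian random-walk Metropolis–Hastings step, the names reservoir_mcmc and mh_step are ours, and the exact moment at which survivors' stay counts are incremented is our reading of the counting-vector update.

```python
import math
import random

def mh_step(log_f, x, step_size=1.0):
    """One Metropolis-Hastings update with a symmetric Gaussian proposal."""
    y = x + random.gauss(0.0, step_size)
    if math.log(random.random()) < min(log_f(y) - log_f(x), 0.0):
        return y
    return x

def reservoir_mcmc(log_f, x0, n, L, M, step_size=1.0):
    """Sketch of Steps 1-4: keep a reservoir of size n, add L fresh draws
    per iteration, and remove L elements with probability proportional to
    their consecutive-stay counts, so "older" elements tend to leave first."""
    assert 1 <= L < n
    # Step 1 [Initialization]: fill R^(0) from one chain started at x0.
    reservoir = [x0]
    for _ in range(n - 1):
        reservoir.append(mh_step(log_f, reservoir[-1], step_size))
    counts = [1] * n
    for _ in range(M):
        # Step 2 [Addition]: extend the same chain by L elements y_1, ..., y_L.
        ys = [mh_step(log_f, reservoir[-1], step_size)]
        for _ in range(L - 1):
            ys.append(mh_step(log_f, ys[-1], step_size))
        ext = reservoir + ys[:-1]            # extended reservoir: n + L - 1 elements
        ext_counts = counts + [1] * (L - 1)
        # Step 3 [Removal]: remove L elements with weights proportional to counts.
        for _ in range(L):
            i = random.choices(range(len(ext)), weights=ext_counts)[0]
            del ext[i]
            del ext_counts[i]
        reservoir = ext + [ys[-1]]           # y_L enters the reservoir directly
        # Step 4 [Repetition]: survivors' consecutive-stay counts grow by one.
        counts = [k + 1 for k in ext_counts] + [1]
    return reservoir

random.seed(2)
sample = reservoir_mcmc(lambda x: -0.5 * x * x, x0=0.0, n=20, L=5, M=200)
```

Only the current reservoir, the counting vector, and the last chain value are kept in memory, matching the storage claim of at most (n + L)(p + 1) scalars.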

The removal procedure in Step 3 is the ‘‘Weighted Random Sampling without Replacement, with defined Weights’’ discussed by Efraimidis (2015), and it tends to remove the ‘‘older’’ elements in the reservoir. The elements of the updated reservoirs come from one Markov chain generated by the Metropolis–Hastings algorithm using x_1^(0) as the starting value. Thus, the convergence property of the Metropolis–Hastings algorithm holds for the proposed sampling algorithm, and more details are provided in the next section. By implementing the removal procedure in Step 3, the autocorrelation among the elements in the reservoir is weakened automatically, since a consecutive segment of the Markov chain is less likely to appear in the updated reservoir. By putting y_L into R^(m) directly in Step 3, we can speed up the convergence of the proposed sampling algorithm.

Instead of the Metropolis–Hastings algorithm, the proposed sampling algorithm can be embedded in other MCMC methods as well. The proposed sampling algorithm saves computer memory by storing at most (n + L)(p + 1) scalars, which is much better than traditional MCMC algorithms, which store an otherwise unnecessary part of a Markov chain in order to check convergence when the scale reduction factor (Gelman and Rubin, 1992) is used.

4. Main results

Based on the removal procedure of the proposed sampling algorithm, the counting vectors N_x^(1), N_x^(2), . . . form another Markov chain. There exists a consistent probability space (Ω, F, P*) and random variables {Z_m : m = 1, 2, . . .} such that (Z_1, . . . , Z_m) is identically distributed as (N_x^(1), . . . , N_x^(m)) for m ≥ 1; see Theorem 6.3.4 discussed by Athreya and Lahiri (2006) for details. Without loss of generality, we denote Z_m as N_x^(m) for m ≥ 1. Let S_x^(m) = ∑_{i=1}^n K_{x,i}^(m) be the total count of times that elements stay in the reservoir at the mth step, where K_{x,i}^(m) is the ith element of N_x^(m). By the removal procedure, it can be shown that n ≤ S_x^(m) ≤ (m + 1)(n − L) + L for m ∈ N.

In order to guarantee the convergence of the proposed sampling algorithm, we need to show that every element in the reservoir can be updated by a more recent one in the long run. To be rigorous, we have the following result.

Theorem 4.1. Given a sample size n > 1 and L satisfying L < n, for m ≥ 1 and i = 1, . . . , n, we have

    P*(K_{x,i}^(m+j) = j + 1 | S_x^(m) = s) < (1 − 1/(s + 1))^j,    (1)


and recall that P*(·) is a consistent probability measure for the counting vectors. Furthermore, for m ∈ N and ϵ > 0, there exists J = J(m, ϵ) ∈ N such that

    P*( ⋃_{j=J}^∞ {K_{x,i}^(m+j) = j + 1} | S_x^(m) = s ) < ϵ/n.    (2)

The proof of Theorem 4.1 is given in Appendix A. Some comments are made on Theorem 4.1. First, the event {K_{x,i}^(m+j) = j + 1} implies that element x_i^(m+j) is selected into the reservoir at the mth iteration, and it stays in R^(m+i) for i = 1, . . . , j. By the proof of (1), P*(K_{x,i}^(m+j) = j + 1 | S_x^(m) = s) converges to 0 faster as L increases; see (A.2) and (A.3) for details. However, the elements in the reservoir may then be more correlated, since more newly generated elements remain in the reservoir. Thus, L balances the convergence rate against the autocorrelation among the sample generated by the proposed sampling algorithm. By (2) of Theorem 4.1, the probability of any element staying in the reservoir consecutively for a long run diminishes to 0. Specifically, suppose that an element x_0 is selected into the reservoir at the m_0th iteration. Then, both n and m_0 are regarded as fixed for x_0 in the proposed sampling algorithm. For any ϵ > 0, there exists J = J(m_0, ϵ) such that the probability of x_0 remaining in the reservoir consecutively for j steps is smaller than ϵ/n, where j ≥ J. Letting ϵ → 0, we conclude that the probability of x_0 staying in the reservoir consecutively for a long run diminishes to 0. Since inequalities are used to prove Theorem 4.1, the bound in (1) is not sharp.
Denote P_i^(m)(x, ·) to be the conditional probability measure for the random variable associated with the ith element in R^(m) given X_1^(0) = x. Since the removal procedure is independent of the values in the reservoir, and y_L is added to R^(m) directly, we have

    P_i^(m)(x, ·) = P^{t_i}(x, ·),

where P^t(x, ·) is the conditional probability measure for the tth element in the Markov chain given the same starting value, t_i is an integer between L(m − k_{x,i}^(m)) + n and L(m − k_{x,i}^(m) + 1) + n when m ≥ 1 and k_{x,i}^(m) ≤ m, and between 1 and n when k_{x,i}^(m) = m + 1. By Theorem 1.5.1 discussed by Liang et al. (2011) and Theorem 4.1, we have the following corollary on the convergence property of the proposed sampling algorithm.

Corollary 4.1. Suppose that the transition kernel P(x, ·) is π-irreducible, has π(·) as its invariant distribution, and is aperiodic and Harris recurrent. For a fixed sample size n and fixed L,

    max_{i ∈ {1, . . . , n}} sup_{A ∈ S} | P_i^(m)(x, A) − π(A) | → 0    (3)

in probability as m → ∞, where x ∈ S.

See Liang et al. (2011) and Athreya and Lahiri (2006) for concepts about Markov chains, such as irreducibility and recurrence. The proof of Corollary 4.1 is given in Appendix B, and we do not use a conditional probability in (3). Corollary 4.1 shows that the proposed sampling algorithm works when the embedded MCMC method satisfies certain conditions.
Recall that the convergence of the proposed sampling algorithm is determined by the removal procedure in Step 3. In order to speed up the convergence in (3) for a fixed L, other removal probability functions can be used. For example,

    P_{E,i} ∝ w(k_i^(m))    (4)

for i = 1, . . . , n + L − 1, where w(n) = O(n^α) is an increasing function of n with α ∈ [1, ∞). We can also use exponential or hyper-power functions, for example, w(n) = exp(an) with a ∈ (0, ∞) and w(n) = n^n. A heuristic validation of the removal probability function in (4) is briefly discussed. If k_i^(m) > k_j^(m) for some i, j ∈ {1, . . . , n + L − 1}, then k_i^(m)/k_j^(m) < w(k_i^(m))/w(k_j^(m)). Thus, there is a larger probability that an element with a larger counting number is removed. A removal probability function of the form (4) is a good way to speed up the convergence of the proposed sampling algorithm, but one disadvantage is that the correlation among the elements in the reservoir increases, since elements with smaller counting numbers are more likely to be kept in the reservoir. Thus, it is not advisable in practice to use a large L together with a removal probability function w(n) that increases dramatically.

Before closing this section, a special removal probability function w(·) is briefly discussed. The removal probabilities of the elements in {x_1^(m−1), y_1, . . . , y_{L−1}} are set to 1, and those of the remaining elements in R_E^(m) are set to 0. It can be seen that this special removal probability function satisfies P*(K_{x,i}^(m+n+1) = n + 2) = 0 in Theorem 4.1, and it corresponds to the commonly used thinning procedure, which generates a sample of size n by discarding all but every Lth element of an MCMC after some ‘‘burn-in’’ iterations or after convergence is detected. Thus, the commonly used thinning approach can be regarded as a special case of the proposed sampling algorithm.
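To make the heuristic concrete, a small hypothetical helper removal_probs computes the removal probabilities induced by a weight function w; the quadratic choice w(k) = k² (α = 2) is just one instance of (4) and is not prescribed by the paper.

```python
def removal_probs(counts, w=lambda k: k ** 2):
    """Removal probabilities proportional to w(k_i), as in (4).
    The default w(k) = k^2 is an illustrative choice with alpha = 2."""
    weights = [w(k) for k in counts]
    total = sum(weights)
    return [wt / total for wt in weights]

# An element that has stayed 3 iterations faces removal probability
# 9/11 under w(k) = k^2, versus only 3/5 under the proportional rule w(k) = k,
# so a faster-growing w removes "older" elements more aggressively.
p_quadratic = removal_probs([1, 1, 3])
p_linear = removal_probs([1, 1, 3], w=lambda k: k)
```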

5. Convergence detection

A diagnostic tool discussed by Gelman and Rubin (1992) is modified to detect the convergence of the proposed sampling algorithm for practical use, and we assume p = 1 for simplicity.


Let x_{1,1}^(0), . . . , x_{1,J}^(0) be J starting values, selected in such a way that they are over-dispersed in the support of the target distribution π(·). Denote R_j^(m) = {x_{1,j}^(m), . . . , x_{n,j}^(m)} to be the reservoir in the mth iteration using x_{1,j}^(0) as its starting value. Let x̄_j^(m) = n^{−1} ∑_{i=1}^n x_{i,j}^(m) and x̄^(m) = J^{−1} ∑_{j=1}^J x̄_j^(m). The between- and within-variances of the J reservoirs are computed as

    B^(m) = n/(J − 1) ∑_{j=1}^J ( x̄_j^(m) − x̄^(m) )²,

    W^(m) = 1/{J(n − 1)} ∑_{j=1}^J ∑_{i=1}^n ( x_{i,j}^(m) − x̄_j^(m) )².

Under mild conditions, B^(m) overestimates the true variance of the target distribution, W^(m) underestimates it, and both of them are asymptotically unbiased as the sample size n → ∞; see Gelman and Rubin (1992) for details. Thus, we propose to use the following scale reduction factor to detect the convergence of the proposed sampling algorithm:

    r̂^(m) = { (n − 1)/n · W^(m) + (1/n) B^(m) } / W^(m).

Since the reservoir size is fixed at n, we do not expect r̂^(m) to converge to 1 as m → ∞. If r̂^(m) oscillates around a fixed value which is close to 1, then the updated reservoir is said to converge in practice.
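The between/within computation above can be sketched directly; scale_reduction_factor is our name for the hypothetical helper, and its argument is the list of the J reservoirs (each of common size n) at iteration m.

```python
def scale_reduction_factor(reservoirs):
    """Compute r_hat^(m) from J reservoirs of common size n, following
    the between-variance B^(m) and within-variance W^(m) of Section 5."""
    J = len(reservoirs)
    n = len(reservoirs[0])
    means = [sum(r) / n for r in reservoirs]
    grand = sum(means) / J
    B = n / (J - 1) * sum((mj - grand) ** 2 for mj in means)
    W = sum((x - mj) ** 2
            for r, mj in zip(reservoirs, means) for x in r) / (J * (n - 1))
    return ((n - 1) / n * W + B / n) / W

# Identical reservoirs give B = 0, so r_hat equals (n - 1)/n = 0.75 for n = 4;
# over-dispersed reservoirs push r_hat above 1.
r = scale_reduction_factor([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
```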

6. Simulation studies

6.1. A continuous distribution case

A simulation study is conducted to test the performance of the proposed sampling algorithm for a continuous distribution. Let the target distribution associated with a random variable X be π(x) = 0.5Φ_1(x) + 0.5Φ_2(x), a mixture of two normal distributions, where Φ_1(x) is the cumulative distribution function of a normal distribution with mean −1 and variance 1, and Φ_2(x) is that of a normal distribution with mean 3 and variance 0.25. Notice that the density has a local minimum separating two high-density regions. We are interested in estimating the population mean µ = E(X), the standard deviation σ = var(X)^{1/2}, and three proportions p_i = ∫_{(−∞, b_i)} dπ(x) for i = 1, . . . , 3, where var(X) is the variance of X, b_1 = −2, b_2 = 0, and b_3 = 3. The theoretical values of the five parameters are (µ, σ) = (1, √(37/8)) and (p_1, p_2, p_3) ≈ (0.08, 0.42, 0.75).
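The theoretical values can be checked directly from the mixture; the sketch below is a verification aid, not part of the simulation itself, and uses only the standard normal CDF written via the error function.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Mixture 0.5 * N(-1, 1) + 0.5 * N(3, 0.25): moments by direct calculation.
mu = 0.5 * (-1.0) + 0.5 * 3.0                              # = 1
var = 0.5 * (1.0 + 1.0) + 0.5 * (0.25 + 9.0) - mu ** 2     # = 37/8
sigma = math.sqrt(var)                                     # ~ 2.15

def mixture_cdf(b):
    """P(X < b) for the mixture target."""
    return 0.5 * Phi(b + 1.0) + 0.5 * Phi((b - 3.0) / 0.5)

p1, p2, p3 = mixture_cdf(-2.0), mixture_cdf(0.0), mixture_cdf(3.0)
# (p1, p2, p3) rounds to (0.08, 0.42, 0.75), matching the stated values.
```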

We consider two sampling methods to obtain a sample of size n: the proposed sampling algorithm embedded in slice sampling (Neal, 2003), and the traditional slice sampling method using the last n elements of the same Markov chain Monte Carlo run as the proposed sampling algorithm. We compare them in terms of relative bias and relative efficiency; a brief review of the slice sampling method is given in Appendix C. Denote {x_{b,1}, . . . , x_{b,n}} to be a sample obtained by a sampling method in the bth Monte Carlo simulation. The relative bias of the mean estimator is

    R.Bias = (x̄ − µ)/µ,

and the relative efficiency of the proposed sampling algorithm compared with the traditional slice sampling without thinning is

    R.Eff = V̂_{SS,µ} / V̂_{Prop,µ},

where x̄ is the sample mean of {x̄_b : b = 1, . . . , B}, x̄_b is the sample mean of {x_{b,1}, . . . , x_{b,n}}, B is the number of Monte Carlo simulations, V̂_{Prop,µ} and V̂_{SS,µ} are the Monte Carlo variances of the proposed sampling algorithm and the traditional slice sampling method, and the Monte Carlo variance is calculated as (B − 1)^{−1} ∑_{b=1}^B (x̄_b − x̄)². We consider n ∈ {50, 100}, L ∈ {1, 5, 10, 20}, M = 2000, and conduct B = 2000 Monte Carlo simulations.

The scale reduction factor discussed in Section 5 is used to detect the convergence of the proposed sampling algorithm, and the traditional scale reduction factor (Gelman and Rubin, 1992) is used for the traditional slice sampling method. Notice that the traditional scale reduction factor uses the second half of the MCMC to calculate its value. The scale reduction factors are computed using 9 starting values, and the jth starting value is the (10j)%th quantile of a Cauchy distribution with location 0 and scale 3 for j = 1, . . . , 9.

Fig. 1 shows the convergence diagnostic results for the proposed sampling algorithm with L ∈ {1, 5, 10, 20} and the traditional slice sampling method when the sample size n = 50. From the results, we can conclude that the scale reduction factors of the proposed sampling algorithm oscillate around fixed values by the time those of the traditional slice sampling method are close to 1. Thus, the scale reduction factor discussed in Section 5 is a reasonable way to detect convergence of the proposed sampling algorithm in practice.


Fig. 1. Scale reduction factors of the proposed sampling algorithm (Proposed) embedded in slice sampling with L = 1 (a), L = 5 (b), L = 10 (c) and
L = 20 (d) and the traditional slice sampling method (Traditional) when the sample size n = 50.

Fig. 2. Monte Carlo mean of the autocorrelation for the sample generated by the traditional slice sampling method (Trd.) and the proposed sampling
algorithm with L = 1, L = 5, L = 10 and L = 20 based on 2000 Monte Carlo simulations when the sample size n = 50. The horizontal line
corresponds to no autocorrelation. The results for the traditional slice sampling method and the proposed sampling algorithm with L = 1 overlay
with each other.

Based on 2000 Monte Carlo simulations, Fig. 2 shows the Monte Carlo mean of the autocorrelation for samples of size n = 50 generated by the proposed sampling algorithm with L ∈ {1, 5, 10, 20} and the traditional slice sampling method. The results for the proposed sampling algorithm with L = 1 and the traditional slice sampling method are approximately the same. As the value of L increases, the autocorrelation of the sample generated by the proposed sampling algorithm decays to 0 faster than that of the traditional slice sampling method.

Table 1 shows the relative bias and relative efficiency of the two methods when estimating the five parameters of interest. The proposed sampling algorithm with L = 1 performs similarly to the traditional slice sampling method in


Table 1
Relative bias (R.Bias) and relative efficiency (R.Eff.) of the estimators from the proposed sampling algorithm embedded in the slice sampling, denoted as ‘‘Proposed sampling algorithm’’, and the traditional slice sampling method, denoted as ‘‘Traditional slice sampling method’’, based on 2000 Monte Carlo simulations. The population mean is denoted as µ, the standard deviation as σ, and the three proportions as p1, p2 and p3.

n    L            Proposed sampling algorithm         Traditional slice sampling method
                  µ      σ      p1     p2     p3      µ      σ      p1     p2     p3
50   1    R.Bias  0.04   −0.15  −0.01  −0.02  −0.01   0.04   −0.15  −0.01  −0.02  −0.01
          R.Eff.  1.00   1.01   1.01   1.00   1.00    1.00   1.00   1.00   1.00   1.00
     5    R.Bias  −0.04  −0.14  0.01   0.02   0.01    −0.04  −0.15  0.01   0.02   0.01
          R.Eff.  1.04   1.09   1.04   1.04   1.03    1.00   1.00   1.00   1.00   1.00
     10   R.Bias  0.04   −0.12  0.00   −0.02  −0.01   0.04   −0.14  −0.01  −0.02  −0.01
          R.Eff.  1.10   1.27   1.04   1.09   1.07    1.00   1.00   1.00   1.00   1.00
     20   R.Bias  0.00   −0.10  0.01   0.00   0.00    0.01   −0.14  0.01   −0.01  0.00
          R.Eff.  1.29   1.71   1.12   1.28   1.23    1.00   1.00   1.00   1.00   1.00
100  1    R.Bias  0.04   −0.08  −0.03  −0.02  0.00    0.04   −0.08  −0.03  −0.02  0.00
          R.Eff.  1.00   1.01   1.00   1.00   1.00    1.00   1.00   1.00   1.00   1.00
     5    R.Bias  0.00   −0.07  0.00   0.00   0.00    0.00   −0.07  0.00   0.00   0.00
          R.Eff.  1.01   1.06   1.00   1.01   1.01    1.00   1.00   1.00   1.00   1.00
     10   R.Bias  0.01   −0.07  −0.01  −0.01  0.00    0.01   −0.07  −0.01  −0.01  0.00
          R.Eff.  1.05   1.12   1.04   1.05   1.05    1.00   1.00   1.00   1.00   1.00
     20   R.Bias  −0.01  −0.06  0.00   0.01   0.00    −0.01  −0.07  0.00   0.00   0.00
          R.Eff.  1.13   1.33   1.08   1.12   1.11    1.00   1.00   1.00   1.00   1.00

the sense that their relative bias is the same for both sample sizes, and the relative efficiency of the proposed sampling algorithm is close to 1. For a fixed sample size, as the value of L increases, the relative bias of the proposed sampling algorithm remains approximately the same as that of the traditional slice sampling method, but the relative efficiency of the proposed sampling algorithm increases. For a fixed L, as the sample size increases, the relative efficiency of the proposed sampling algorithm decreases. To sum up, the proposed sampling method is more efficient than the traditional slice sampling method in estimating the parameters of interest when L > 1, but the efficiency gain of the proposed sampling algorithm is limited when L is small compared with the sample size n, since it is then hard to update the sample in the reservoir efficiently. It is recommended that a large L be used when the sample size n is large.

6.2. A discrete distribution case 9

As technology develops, online social networks, such as Facebook and Twitter, become more and more popular. 10

However, the complete networks are confidential to protect the users’ information. It is desirable to obtain a representative 11

sample of a complete network, and many methods are developed to achieve this goal using MCMC (Gjoka et al., 2010; 12

Al Hasan and Dave, 2018; Leskovec and Faloutsos, 2006). 13

In this section, we use the online friendship network derived from the music streaming service Deezer (November 2017) in Hungary (Rozemberczki et al., 2018) to test the performance of the proposed sampling algorithm. This network is undirected and contains N = 47,538 nodes and E = 222,887 edges. It is treated as a complete network, and two methods are used to obtain a representative sample of size n from it. One is the Metropolis–Hastings Random Walk (MHRW) recommended by Gjoka et al. (2010), and the other is the proposed sampling algorithm embedded in the MHRW with L ∈ {1, 5, 10, 20} and M = 2000; see Appendix D for a brief review of MHRW. Denote V = {v_i : i = 1, ..., N} to be the node set of this network and k_v to be the degree of node v for v ∈ V. We are interested in making inference about the node degrees of this network, including their median (M) and standard deviation (σ) and the three proportions p_j = N^{−1} ∑_{v∈V} I(k_v < q_j) for j = 1, 2, 3, where I(·) is the indicator function, q_1 = 7, q_2 = 15, and q_3 = 25. The ground truth for the five parameters is (M, σ, p_1, p_2, p_3) ≈ (8, 7.39, 0.43, 0.81, 0.96), and the relative bias and relative efficiency are used to compare the two sampling methods.
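Since the MH correction makes the MHRW's stationary distribution uniform over nodes, the five parameters can be estimated by plain statistics of the sampled degrees. The following is a minimal sketch; the function name and the use of Python's statistics module are our own illustration, not the paper's code, and the cutoffs default to the q_j above:

```python
import statistics

def degree_estimates(degrees, cutoffs=(7, 15, 25)):
    """Estimate the median M, the standard deviation sigma, and the
    proportions p_j of degrees below each cutoff q_j from a list of
    sampled node degrees."""
    n = len(degrees)
    med = statistics.median(degrees)
    sd = statistics.pstdev(degrees)  # divides by n; stdev() would divide by n - 1
    props = [sum(k < q for k in degrees) / n for q in cutoffs]
    return med, sd, props
```

With a perfectly representative sample, the returned triple approaches the ground truth (8, 7.39, (0.43, 0.81, 0.96)) reported above.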

We conduct 2000 Monte Carlo simulations, and Table 2 summarizes the results. When L = 1, the proposed sampling algorithm performs similarly to MHRW. For a fixed sample size, the proposed sampling algorithm becomes more efficient than MHRW as L increases. On the other hand, for a fixed L, the relative efficiency of the proposed sampling algorithm decreases as the sample size increases. These findings match those of the first simulation: the proposed sampling algorithm is more efficient than MHRW when L > 1, but its efficiency gain is limited if L is small compared with the sample size n.

7. Discussion

MCMC methods are popular for drawing samples from a target distribution in practice, especially under Bayesian settings (Gelman et al., 2014). By applying the idea of reservoir sampling to the Metropolis–Hastings algorithm, an efficient

Please cite this article as: Z. Wang, Markov chain Monte Carlo sampling using a reservoir method. Computational Statistics and Data Analysis (2019),
https://doi.org/10.1016/j.csda.2019.05.001.

Table 2
Relative bias (R.Bias) and relative efficiency (R.Eff.) of the estimators from the proposed sampling algorithm embedded in the Metropolis–Hastings Random Walk ("Proposed sampling algorithm") and from the Metropolis–Hastings Random Walk alone ("Metropolis–Hastings Random Walk"), based on 2000 Monte Carlo simulations. The population median is denoted as M, the standard deviation as σ, and the three proportions as p1, p2 and p3.

n    L          Proposed sampling algorithm           Metropolis–Hastings Random Walk
                M      σ      p1     p2     p3        M      σ      p1     p2     p3
50   1   R.Bias  0.00  −0.09  −0.04  −0.01  0.00      0.00  −0.09  −0.04  −0.01  0.00
         R.Eff.  1.01   1.01   1.00   1.00  1.00      1.00   1.00   1.00   1.00  1.00
     5   R.Bias −0.01  −0.09  −0.03  −0.01  0.00     −0.01  −0.10  −0.03  −0.01  0.00
         R.Eff.  1.05   1.02   1.05   1.04  1.03      1.00   1.00   1.00   1.00  1.00
    10   R.Bias −0.01  −0.10  −0.03  −0.01  0.00     −0.01  −0.10  −0.03  −0.01  0.00
         R.Eff.  1.08   1.07   1.09   1.09  1.05      1.00   1.00   1.00   1.00  1.00
    20   R.Bias −0.02  −0.07  −0.02   0.00  0.00     −0.02  −0.09  −0.02   0.00  0.00
         R.Eff.  1.22   1.15   1.29   1.21  1.17      1.00   1.00   1.00   1.00  1.00
100  1   R.Bias −0.02  −0.05  −0.03  −0.01  0.00     −0.02  −0.05  −0.03  −0.01  0.00
         R.Eff.  1.00   1.00   1.00   1.00  1.00      1.00   1.00   1.00   1.00  1.00
     5   R.Bias −0.02  −0.06  −0.03  −0.01  0.00     −0.02  −0.06  −0.03  −0.01  0.00
         R.Eff.  1.02   1.03   1.02   1.01  1.03      1.00   1.00   1.00   1.00  1.00
    10   R.Bias −0.01  −0.04  −0.04  −0.01  0.00     −0.01  −0.05  −0.04  −0.01  0.00
         R.Eff.  1.05   1.02   1.05   1.05  1.02      1.00   1.00   1.00   1.00  1.00
    20   R.Bias −0.01  −0.04  −0.04  −0.01  0.00     −0.01  −0.05  −0.04  −0.01  0.00
         R.Eff.  1.12   1.07   1.12   1.11  1.07      1.00   1.00   1.00   1.00  1.00

sampling algorithm is proposed to obtain a sample from a target distribution. The proposed sampling algorithm can be embedded in other MCMC methods as well, including the slice sampling method (Neal, 2003) and algorithms for discrete populations (Gjoka et al., 2010).

Each iteration of the proposed sampling algorithm requires only limited computer memory to store the sample, and convergence detection is conducted on this stored sample using the modified scale reduction factor. Simulation studies show the better performance of the proposed sampling algorithm compared with the traditional MCMC methods without thinning. However, there is no theoretical result for choosing the optimal L and the optimal removal probability function in (4). Both L and the removal probability function balance the trade-off between the speed of convergence and the autocorrelation among the sample selected in each iteration. Theoretical results validate that the conditional distribution of the sample converges in total variation to the target distribution in probability, regardless of the choice of L and the removal probability function. It is advised not to choose a large L together with a sharply increasing removal probability function, and a large value of L should be chosen if the sample size n is large. To obtain a sample of size n using the proposed sampling algorithm, an MCMC chain of length ML + n is needed, which may result in high computational cost (Geyer, 1992), including the computation of the removal probabilities and the removal procedure of the proposed sampling algorithm. This computational cost becomes significant when the sample size and the value of L are large.

The thinning procedure is a practical way to decrease the autocorrelation among the sample, and a commonly used thinning procedure is a special case of the proposed sampling algorithm. Some researchers (Geyer, 1992; MacEachern and Berliner, 1994; Link and Eaton, 2012) discouraged thinning of an MCMC chain, but their argument is based on comparing the whole chain with a thinned chain of reduced length. In our study, we compare un-thinned and thinned chains of the same length, and the efficiency of the proposed sampling algorithm is demonstrated by two simulation studies.
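As context for the discussion of thinning, the plain (unweighted) reservoir-sampling update that the proposed method builds on can be sketched as follows. This is Vitter's (1985) Algorithm R with a uniform replacement rule, not the weighted removal procedure of Eq. (4):

```python
import random

def reservoir_sample(stream, n, rng=random):
    """Algorithm R (Vitter, 1985): draw a uniform random sample of
    size n from a stream in one pass, keeping only n items in memory."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < n:
            reservoir.append(item)
        else:
            # Item t is kept with probability n/(t+1); if kept, it
            # replaces a uniformly chosen element of the reservoir.
            j = rng.randrange(t + 1)
            if j < n:
                reservoir[j] = item
    return reservoir
```

Every stream element ends up in the reservoir with equal probability n/T for a stream of length T, which is the property the weighted variant relaxes in order to control autocorrelation.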

Acknowledgments

We would like to thank Dr. Jae Kwang Kim for helpful discussions and constructive comments on this paper. We would also like to thank the associate editor and two anonymous reviewers for their detailed and constructive comments.

Appendix A. Proof of Theorem 4.1

Given S_x^{(m)} = s, we can show that S_x^{(m+1)} ≤ s + (n − L) + L = s + n with probability 1. More generally, we can show S_x^{(m+j)} ≤ s + jn for j ≥ 1. The probability that an element is not removed achieves its maximum at the removal of the first element. Thus, we have

    P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) ≤ ∏_{k=1}^{j} {1 − k/(S_x^{(m+k−1)} + L)}^L ≤ ∏_{k=1}^{j} {1 − k/(s + (k − 1)n + L)}^L.   (A.1)


Consider the function h_1(x) = 1 − x{s + (x − 1)n + L}^{−1} for x ∈ ℝ. Its first-order derivative with respect to x is h_1′(x) = −(s − n + L){s + (x − 1)n + L}^{−2} < 0 since s ≥ n. By (A.1), we have

    P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) ≤ {1 − 1/(s + L)}^{jL}.   (A.2)

Consider the function h_2(x) = {1 − (s + x)^{−1}}^x for x > 0. Its first-order derivative with respect to x is

    h_2′(x) = {1 − 1/(s + x)}^x [ln{1 − 1/(s + x)} + x/{(1 − (s + x)^{−1})(s + x)^2}]
            < {1 − 1/(s + x)}^x (1 − s)/{(s + x)(s + x − 1)},   (A.3)

where the inequality holds since ln(1 − x) < −x for x ∈ (0, 1). Since s ≥ n > 1, by (A.3), h_2(x) is a decreasing function. Thus, by (A.2), we have

    P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) < {1 − 1/(s + 1)}^j.   (A.4)

Thus, we have proved (1).

By the removal procedure, we can show that {K_{x,i}^{(m+l)} = l + 1} ⊂ {K_{x,i}^{(m+j)} = j + 1} for l ≥ j. Thus, we have

    ⋃_{l=j}^{∞} {K_{x,i}^{(m+l)} = l + 1} ⊂ {K_{x,i}^{(m+j)} = j + 1}.   (A.5)

By (A.4), for m ≥ 1 and ϵ > 0, there exists J = J(m, ϵ) ∈ ℕ such that

    {1 − 1/(s + 1)}^j < ϵ/n   (A.6)

holds for j ≥ J. By (A.4)–(A.6), we have proved (2).

Appendix B. Proof of Corollary 4.1

For any ϵ > 0 and x ∈ S, by Theorem 1.5.1 of Liang et al. (2011), there exists N = N(ϵ, x) ∈ ℕ such that

    sup_{A∈𝒮} |P^t(x, A) − π(A)| < ϵ   (B.1)

holds for any t ≥ N; see Tierney (1994) for more details about the convergence of Markov chain Monte Carlo. Besides, there exists M_1 = M_1(ϵ, x) that guarantees

    L(M_1 − 1) + n > N.

That is, the random variable associated with any element of the L newly generated values {y_1, ..., y_L} after the M_1-th step satisfies (B.1), since its corresponding index satisfies t ≥ (M_1 − 1)L + n > N.

To show (3), we only need to consider the elements that are selected into the reservoir prior to the M_1-th step and remain in the reservoir at the M_1-th step. Denote A = {i : k_{x,i}^{(M_1)} > 1} to be the index set of such elements; the cardinality of A is no greater than n, which is a fixed integer.

By (A.4), the bound on the conditional probability P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) is increasing with respect to s. Besides, S_x^{(m)} < (M_1 + 1)(n − L) + L for m < M_1. Thus, we have

    P*(K_{x,i}^{(m+j)} = j + 1) < {1 − 1/((M_1 + 1)(n − L) + L + 1)}^j.   (B.2)

By (B.2), for i ∈ A, there exists J_i such that

    P*(⋃_{j=J_i}^{∞} {K_{x,i}^{(m_i+j)} = j + 1}) < ϵ/n,   (B.3)


where m_i < M_1. Denote J = max{J_i : i ∈ A}. Then

    P*(⋃_{i∈A} ⋃_{j=J}^{∞} {K_{x,i}^{(m_i+j)} = j + 1}) ≤ ∑_{i∈A} P*(⋃_{j=J}^{∞} {K_{x,i}^{(m_i+j)} = j + 1}) < ∑_{i=1}^{n} ϵ/n = ϵ.   (B.4)

Take M = M_1 + J; we can show that, for m > M, with probability 1 − ϵ, we have

    max_{i∈{1,...,n}} sup_{A∈𝒮} |P_i^{(m)}(x, A) − π(A)| < ϵ.   (B.5)

Since ϵ is arbitrary, we have proved (3).

Appendix C. Brief review of slice sampling

For simplicity, we assume that the dimension of S is p = 1. Denote x_0 to be the current point in the MCMC, y = g(x_0) − e to be the vertical level defining the slice {x : g(x) > y}, w to be an estimate of the typical size of a slice, m to be an integer limiting the size of a slice to mw, and I_l and I_u to be a lower bound and an upper bound of the support of f(x), where g(x) = log{f(x)}, e is exponentially distributed with mean 1, and the support of f(x) is {x ∈ S : f(x) > 0}. Slice sampling to get a new sample point can be summarized in the following steps (Neal, 2003).

Step 1. ‘‘Doubling’’ procedure for finding an interval around the current point.
    1.1. Initialize L = x_0 − wu and R = L + w, where u ∼ U(0, 1), and U(a, b) is the uniform distribution on the interval (a, b).
    1.2. Initialize J = ⌊mv⌋ and K = (m − 1) − J, where v ∼ U(0, 1), and ⌊x⌋ is the largest integer smaller than or equal to x.
    1.3. While J > 0, L > I_l, and y < g(L), repeat the following step.
        i. Update L = L − w and J = J − 1.
    1.4. While K > 0, R < I_u, and y < g(R), repeat the following step.
        i. Update R = R + w and K = K − 1.
Step 2. Modify (L, R) by L = max{L, I_l} and R = min{R, I_u}, where max{a, b} and min{a, b} are the maximum and the minimum of a and b, respectively.
Step 3. Repeat the following steps until a break.
    3.1. Generate x_1 ∼ U(L, R).
    3.2. If g(x_1) ≥ y, break the repetition and accept x_1 as the new sample point. Otherwise, update R = x_1 if x_1 > x_0, and L = x_1 if x_1 < x_0.
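The steps above can be sketched in Python as a single update. This is an illustrative transcription (the function name, argument defaults, and use of Python's random module are ours), not the paper's implementation:

```python
import math
import random

def slice_sample_step(g, x0, w, m, Il=-math.inf, Iu=math.inf, rng=random):
    """One slice-sampling update (Neal, 2003) following Appendix C:
    g is the log-density, x0 the current point, w the typical slice
    width, and m caps the interval size at m*w."""
    y = g(x0) - rng.expovariate(1.0)   # vertical level y = g(x0) - e
    # Step 1: randomly place an interval of width w around x0, then expand.
    L = x0 - w * rng.random()
    R = L + w
    J = math.floor(m * rng.random())
    K = (m - 1) - J
    while J > 0 and L > Il and y < g(L):
        L -= w
        J -= 1
    while K > 0 and R < Iu and y < g(R):
        R += w
        K -= 1
    # Step 2: clip the interval to the support bounds.
    L, R = max(L, Il), min(R, Iu)
    # Step 3: draw uniformly from (L, R), shrinking toward x0 on rejection.
    while True:
        x1 = rng.uniform(L, R)
        if g(x1) >= y:
            return x1
        if x1 > x0:
            R = x1
        else:
            L = x1
```

Iterating this update with, e.g., the standard normal log-density g(x) = −x²/2 produces a chain whose marginal distribution converges to N(0, 1).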

Appendix D. Brief review of Metropolis–Hastings Random Walk

Denote v to be the current node in the MCMC. The MHRW algorithm (Gjoka et al., 2010) to get a new sample point can be summarized in the following steps.

Step 1. Select a node w that is connected with v uniformly at random.
Step 2. Generate p ∼ U(0, 1).
Step 3. If p ≤ k_v/k_w, the new sample point is w; otherwise, the new sample point is v.
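The three steps above can be sketched as follows (a minimal illustration; storing the network as an adjacency list `adj` and the function name are our own choices):

```python
import random

def mhrw_step(adj, v, rng=random):
    """One Metropolis-Hastings Random Walk step (Gjoka et al., 2010).
    adj maps each node to the list of its neighbours; v is the current
    node. The acceptance ratio k_v / k_w makes the stationary
    distribution uniform over nodes rather than degree-biased."""
    w = rng.choice(adj[v])   # Step 1: propose a uniform neighbour
    p = rng.random()         # Step 2
    # Step 3: accept w with probability min(1, k_v / k_w)
    return w if p <= len(adj[v]) / len(adj[w]) else v
```

Iterating this step and recording the visited nodes yields an asymptotically uniform node sample, which is what makes plain sample statistics of the sampled degrees, as used in Section 6.2, approximately unbiased.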

References

Al Hasan, M., Dave, V.S., 2018. Triangle counting in large networks: a review. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 8 (2), e1226. http://dx.doi.org/10.1002/widm.1226.
Athreya, K.B., Lahiri, S.N., 2006. Measure Theory and Probability Theory. Springer Science & Business Media, New York.
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J., 2002. Models and issues in data stream systems. In: Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. PODS '02, ACM, New York, pp. 1–16. http://dx.doi.org/10.1145/543613.543615.
Chib, S., Greenberg, E., 1995. Understanding the Metropolis–Hastings algorithm. Amer. Statist. 49 (4), 327–335. http://dx.doi.org/10.1080/00031305.1995.10476177.
Edwards, R.G., Sokal, A.D., 1988. Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 (6), 2009. http://dx.doi.org/10.1103/PhysRevD.38.2009.
Efraimidis, P.S., 2015. Weighted random sampling over data streams. In: Algorithms, Probability, Networks, and Games. Springer, pp. 183–195.
Efraimidis, P.S., Spirakis, P.G., 2006. Weighted random sampling with a reservoir. Inform. Process. Lett. 97 (5), 181–185. http://dx.doi.org/10.1016/j.ipl.2005.11.003.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 (410), 398–409. http://dx.doi.org/10.1080/01621459.1990.10476213.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2014. Bayesian Data Analysis, third ed. Taylor & Francis, Florida.
Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences. Statist. Sci. 7 (4), 457–472. http://dx.doi.org/10.1214/ss/1177011136.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (6), 721–741. http://dx.doi.org/10.1109/TPAMI.1984.4767596.
Geyer, C.J., 1992. Practical Markov chain Monte Carlo. Statist. Sci. 7, 473–483. http://dx.doi.org/10.1214/ss/1177011137.
Gilks, W.R., Best, N.G., Tan, K.K.C., 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. J. R. Stat. Soc. Ser. C 44 (4), 455–472. http://dx.doi.org/10.2307/2986138.
Gilks, W.R., Neal, R.M., Best, N.G., Tan, K.K.C., 1997. Corrigendum: adaptive rejection Metropolis sampling. J. R. Stat. Soc. Ser. C 46 (4), 541–542. http://dx.doi.org/10.1111/1467-9876.00091.
Gilks, W.R., Roberts, G.O., George, E.I., 1994. Adaptive direction sampling. J. R. Stat. Soc. Ser. D Stat. 43 (1), 179–189. http://dx.doi.org/10.2307/2348942.
Girolami, M., Calderhead, B., 2011. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B 73 (2), 123–214. http://dx.doi.org/10.1111/j.1467-9868.2010.00765.x.
Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A., 2010. Walking in Facebook: a case study of unbiased sampling of OSNs. In: INFOCOM 2010 Proceedings. IEEE, pp. 1–9. http://dx.doi.org/10.1109/INFCOM.2010.5462078.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), 97–109. http://dx.doi.org/10.1093/biomet/57.1.97.
Leskovec, J., Faloutsos, C., 2006. Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 631–636. http://dx.doi.org/10.1145/1150402.1150479.
Liang, F., Liu, C., Carroll, R., 2011. Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples, Vol. 714. John Wiley & Sons, Chichester.
Link, W.A., Eaton, M.J., 2012. On thinning of chains in MCMC. Methods Ecol. Evol. 3 (1), 112–115. http://dx.doi.org/10.1111/j.2041-210X.2011.00131.x.
Liu, J.S., Liang, F., Wong, W.H., 2000. The multiple-try method and local optimization in Metropolis sampling. J. Amer. Statist. Assoc. 95 (449), 121–134. http://dx.doi.org/10.1080/01621459.2000.10473908.
MacEachern, S.N., Berliner, L.M., 1994. Subsampling the Gibbs sampler. Amer. Statist. 48 (3), 188–190. http://dx.doi.org/10.1080/00031305.1994.10476054.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (6), 1087–1092. http://dx.doi.org/10.1063/1.1699114.
Neal, R.M., 2003. Slice sampling. Ann. Statist. 31, 705–741. http://dx.doi.org/10.1214/aos/1056562461.
Rozemberczki, B., Davies, R., Sarkar, R., Sutton, C., 2018. GEMSEC: graph embedding with self clustering. arXiv:1802.03997.
Swendsen, R.H., Wang, J.-S., 1987. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett. 58 (2), 86. http://dx.doi.org/10.1103/PhysRevLett.58.86.
Tierney, L., 1994. Markov chains for exploring posterior distributions. Ann. Statist. 22 (2), 1701–1728. http://dx.doi.org/10.1214/aos/1176325750.
Vitter, J.S., 1984. Faster methods for random sampling. Commun. ACM 27 (7), 703–718. http://dx.doi.org/10.1145/358105.893.
Vitter, J.S., 1985. Random sampling with a reservoir. ACM Trans. Math. Software 11 (1), 37–57. http://dx.doi.org/10.1145/3147.3165.
Wu, F.-Y., 1982. The Potts model. Rev. Modern Phys. 54 (1), 235. http://dx.doi.org/10.1103/RevModPhys.54.235.
