Markov Chain Monte Carlo Sampling Using A Reservoir Method

This document summarizes a research article that proposes a novel application of reservoir sampling to improve standard Markov chain Monte Carlo (MCMC) methods. Specifically, it introduces a stochastic thinning algorithm using reservoir sampling that can be embedded in most MCMC methods to reduce autocorrelation among the generated sample. The algorithm converges in total variation to the target distribution in probability under mild conditions. Simulation results show that the proposed sampling algorithm has smaller Monte Carlo variance than the corresponding MCMC methods without thinning, while maintaining similar estimation bias.


Computational Statistics and Data Analysis

Markov chain Monte Carlo sampling using a reservoir method


Zhonglei Wang
Wang Yanan Institute for Studies in Economics and School of Economics, Xiamen University, Xiamen, Fujian 361005, People’s Republic of China

highlights

• A reservoir sampling algorithm embedded in MCMC methods is proposed.


• The sample generated by the proposed sampling algorithm converges in total variation to the target distribution in probability under mild conditions.
• The commonly used thinning procedure is a special case of the proposed sampling algorithm.
• Simulation results show that the proposed sampling algorithm is more efficient than the corresponding MCMC methods without thinning.

Article info

Article history:
Received 15 October 2017
Received in revised form 3 March 2019
Accepted 2 May 2019
Available online xxxx

Keywords:
Convergence
Metropolis–Hastings algorithm
Removal procedure
Stochastic thinning

Abstract

Markov chain Monte Carlo methods are widely used to draw a sample from a target distribution which is hard to characterize analytically, and reservoir sampling is developed to obtain a sample from a data stream sequentially in a single pass. A stochastic thinning algorithm using reservoir sampling is proposed, and it can be embedded in most Markov chain Monte Carlo methods to reduce the autocorrelation among the generated sample. The distribution of the sample generated by the proposed sampling algorithm converges in total variation to the target distribution in probability under mild conditions. A practical method is introduced to detect the convergence of the proposed sampling algorithm. Two simulation studies are conducted to compare the proposed sampling algorithm and the corresponding Markov chain Monte Carlo methods without thinning, and the results show that the estimation bias of the proposed sampling algorithm is approximately the same as that of the corresponding Markov chain Monte Carlo method, but the proposed sampling algorithm has a smaller Monte Carlo variance. The proposed sampling algorithm saves computer memory in the sense that only the storage of a small portion of the Markov chain is required in each iteration.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Drawing a sample from a target distribution is a fundamental computational task in many fields. Markov chain Monte Carlo (MCMC) methods are widely used when the target distribution is complex and infeasible to characterize analytically, and the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970; Chib and Greenberg, 1995) and the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990) are the most popular MCMC approaches.

MCMC is an active research area, and many methods have been proposed during the past thirty years. In order to handle the local-trap problem, where a target density has many local minima separated by regions with high densities
E-mail address: [email protected].

https://doi.org/10.1016/j.csda.2019.05.001
0167-9473/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: Z. Wang, Markov chain Monte Carlo sampling using a reservoir method. Computational Statistics and Data Analysis (2019),
https://doi.org/10.1016/j.csda.2019.05.001.

(Liang et al., 2011), different solutions have been provided. Swendsen and Wang (1987) proposed the Swendsen–Wang algorithm for Potts models (Wu, 1982), and it is efficient since a cluster of spins is updated in each iteration. Edwards and Sokal (1988) generalized the Swendsen–Wang algorithm to other models. Neal (2003) discussed a slice sampling method, which draws a sample from a region with large density values. Gilks et al. (1994) considered an adaptive direction sampling method to speed up the convergence of the MCMC. Based on the framework of the adaptive direction sampling method, Liu et al. (2000) discussed a conjugate-gradient Monte Carlo method and demonstrated that it performs significantly better than the traditional Metropolis–Hastings algorithm. Beyond the local-trap problem, Gilks et al. (1995) and Gilks et al. (1997) proposed the adaptive rejection Metropolis sampling method for the case where the target probability density function is not log-concave. Girolami and Calderhead (2011) discussed MCMC algorithms for target distributions that are high-dimensional and samples that exhibit strong correlation. Leskovec and Faloutsos (2006) proposed using random walks to obtain a sample from a large graph. Gjoka et al. (2010) compared four sampling methods for obtaining a representative sample of an undirected graph using Facebook data. Al Hasan and Dave (2018) discussed the use of MCMC algorithms to approximate local topological structures in large networks.

Reservoir sampling (Vitter, 1985; Efraimidis and Spirakis, 2006) is a powerful sequential random sampling algorithm (Vitter, 1984) for drawing a sample from a data stream in a single pass, and it has been widely used in areas with data streams, such as financial applications, network monitoring and security (Babcock et al., 2002). The basic idea of reservoir sampling is to update one element in the reservoir randomly with a given probability in each iteration. It is implemented without reading the complete population and only requires constant computer memory to store the reservoir.

In this paper, we propose a novel application of reservoir sampling to improve standard MCMC methods in the sense that the generated sample is less correlated. The proposed sampling algorithm is a stochastic thinning procedure, and it can be embedded in most MCMC methods. The distribution of the sample generated by the proposed sampling algorithm converges in total variation to the target distribution in probability under mild conditions. It saves computer memory by storing only a small portion of a Markov chain.

The rest of the paper is organized as follows. In Section 2, the concepts of MCMC methods and the Metropolis–Hastings algorithm are briefly reviewed. The sampling algorithm embedded in the Metropolis–Hastings algorithm is proposed in Section 3, and other MCMC methods can be used as well. The convergence property of the proposed sampling algorithm is studied in Section 4. A practical way to detect the convergence of the proposed sampling algorithm is introduced in Section 5. Two simulation studies are conducted to test the performance of the proposed sampling algorithm in Section 6. Discussion is given in Section 7.

2. Basic setup

Denote (S, S, π) to be a probability space, where S ⊂ R^p with p ≥ 1, S is a σ-algebra for S, and π(·) is a probability measure on the measurable space (S, S). We are interested in obtaining a sample of size n from the target distribution π(·).

An MCMC method is often used when it is difficult to draw a sample from π(·) directly. Denote X_0, X_1, . . . to be a time-homogeneous Markov chain with π(·) as its invariant distribution (Athreya and Lahiri, 2006). Traditional MCMC methods treat {X_{N+i} : i = 1, . . . , n} as a sample of size n drawn approximately from the target distribution π(·), where N is the ‘‘burn-in’’ iteration or a large number determined by a convergence diagnostic tool, such as the scale reduction factor discussed by Gelman and Rubin (1992).

As one of the most widely used MCMC methods, the Metropolis–Hastings algorithm is briefly reviewed here and is used to demonstrate the proposed sampling algorithm in the next section. Suppose that the target distribution π(·) is dominated by a σ-finite measure µ(·) and has a probability density function f(·). Denote q(y | x) to be a probability density function of y with respect to µ(·), that is, ∫_S q(y | x) µ(dy) = 1 for x ∈ S, and assume that it is easy to generate a sample from q(y | x). A Markov chain X_0, X_1, . . . is obtained by repeating the following two steps from a random starting value X_0.

Step 1. Given X_k = x, generate y from the probability density function q(y | x), where k ≥ 0.

Step 2. Set

    X_{k+1} = y with probability p(x, y), and X_{k+1} = x with probability 1 − p(x, y),

where

    p(x, y) = min{ f(y) q(x | y) / [f(x) q(y | x)], 1 }.

It is well known that a sample generated by the Metropolis–Hastings algorithm is not independent, since information of X_k is used to generate X_{k+1}, and the autocorrelation of a consecutive segment of the MCMC is large. In the next section, we propose a sampling algorithm such that the dependence among the sample is weakened automatically, and only a small portion of a Markov chain is recorded in each iteration.
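As a concrete illustration, the two steps can be sketched in Python. This is a minimal sketch, not the paper's code: it assumes a symmetric Gaussian random-walk proposal q(y | x) = N(x, step_size²), so q(x | y)/q(y | x) = 1 and the acceptance probability reduces to min{f(y)/f(x), 1}; the target is supplied through its log-density, and the names metropolis_hastings, log_f and step_size are ours.

```python
import math
import random

def metropolis_hastings(log_f, x0, n_steps, step_size=1.0):
    """Run Steps 1-2 repeatedly with a symmetric Gaussian proposal."""
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        y = x + random.gauss(0.0, step_size)   # Step 1: y ~ q(. | x)
        # Step 2: accept with probability min{f(y)/f(x), 1}, on the log scale.
        if math.log(random.random()) < min(log_f(y) - log_f(x), 0.0):
            x = y                              # X_{k+1} = y
        chain.append(x)                        # otherwise X_{k+1} = x
    return chain

# Standard normal target: log f(x) = -x^2/2 up to an additive constant.
random.seed(1)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=5000)
```

Consecutive elements of such a chain are strongly autocorrelated, which is exactly the issue the reservoir-based thinning of the next section addresses.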


3. Proposed sampling algorithm

A reservoir sampling algorithm embedded in the Metropolis–Hastings algorithm is proposed to generate a sample of size n from the target distribution π(·), and it updates a reservoir of size n in each iteration.

For m = 0, . . . , M, denote R^(m) = {x_1^(m), . . . , x_n^(m)} to be the reservoir at the mth iteration, and N_x^(m) = (k_{x,1}^(m), . . . , k_{x,n}^(m)) to be the counting vector for the elements in R^(m), where M is the number of iterations, determined by the convergence diagnostic tool discussed in Section 5. For i = 1, . . . , n, k_{x,i}^(m) counts the number of times that x_i^(m) has stayed in the reservoir consecutively at the mth iteration, and k_{x,i}^(m) ≤ m + 1. For example, if k_{x,1}^(m) = 3 and m ≥ 2, then x_1^(m) was selected into the reservoir at the (m − 2)th iteration, and it has stayed in the reservoir consecutively up to the mth iteration. The proposed sampling algorithm embedded in the Metropolis–Hastings algorithm consists of the following four steps.

Step 1. [Initialization]: Choose x_1^(0) from the support of the target distribution π(·) randomly, and use the Metropolis–Hastings algorithm to generate the remaining elements of R^(0). Set k_{x,i}^(0) = 1 for i = 1, . . . , n, and set m = 1.

Step 2. [Addition]: For the mth iteration, apply the Metropolis–Hastings algorithm to generate L elements, say y_1, . . . , y_L, using x_n^(m−1) as the starting value, where L < n is a fixed value. Assign the counting value k_{y,j} = 1 for j = 1, . . . , L. Combine R^(m−1) and N_x^(m−1) with their counterparts associated with y_1, . . . , y_{L−1} to obtain the extended reservoir R_E^(m) and the counting vector N_E^(m), where R_E^(m) = {x_1^(m−1), . . . , x_n^(m−1), y_1, . . . , y_{L−1}} and N_E^(m) = (k_{x,1}^(m−1), . . . , k_{x,n}^(m−1), k_{y,1}, . . . , k_{y,L−1}). Notice that there are n + L − 1 elements in R_E^(m) and N_E^(m).

Step 3. [Removal]: Remove L elements from R_E^(m) sequentially. For the lth removal step, let

    P_{i,l}^(m) = k_i^(m) / ( ∑_{j ∈ A_l^(m)} k_j^(m) ) · I(i ∈ A_l^(m)),

and remove one element from A_l^(m) with removal probabilities {P_{i,l}^(m) : i ∈ A_l^(m)}, where l = 1, . . . , L, i = 1, . . . , n + L − 1 is the index of the elements in R_E^(m), k_i^(m) is the ith element of N_E^(m), I(x ∈ A) = 1 if x ∈ A and 0 otherwise, A_l^(m) ⊂ {1, . . . , n + L − 1} is the index set after removing l − 1 elements, and A_1^(m) = {1, . . . , n + L − 1}. After removing L elements from R_E^(m), denote the remaining ones as x_1^(m), . . . , x_{n−1}^(m). Update the reservoir by R^(m) = {x_1^(m), . . . , x_{n−1}^(m), x_n^(m)}, where x_n^(m) = y_L, and update N_x^(m) accordingly.

Step 4. [Repetition]: Set m = m + 1, and repeat Steps 2–3 until m > M.
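The four steps above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the embedded kernel is a Gaussian random-walk Metropolis–Hastings step, the names reservoir_mcmc and mh_step are ours, and the exact moment at which survivors' stay counts are incremented is our reading of the counting-vector update.

```python
import math
import random

def mh_step(log_f, x, step_size=1.0):
    """One Metropolis-Hastings update with a symmetric Gaussian proposal."""
    y = x + random.gauss(0.0, step_size)
    if math.log(random.random()) < min(log_f(y) - log_f(x), 0.0):
        return y
    return x

def reservoir_mcmc(log_f, x0, n, L, M, step_size=1.0):
    """Sketch of Steps 1-4: keep a reservoir of size n, add L fresh draws
    per iteration, and remove L elements with probability proportional to
    their consecutive-stay counts, so "older" elements tend to leave first."""
    assert 1 <= L < n
    # Step 1 [Initialization]: fill R^(0) from one chain started at x0.
    reservoir = [x0]
    for _ in range(n - 1):
        reservoir.append(mh_step(log_f, reservoir[-1], step_size))
    counts = [1] * n
    for _ in range(M):
        # Step 2 [Addition]: extend the same chain by L elements y_1, ..., y_L.
        ys = [mh_step(log_f, reservoir[-1], step_size)]
        for _ in range(L - 1):
            ys.append(mh_step(log_f, ys[-1], step_size))
        ext = reservoir + ys[:-1]            # extended reservoir: n + L - 1 elements
        ext_counts = counts + [1] * (L - 1)
        # Step 3 [Removal]: remove L elements with weights proportional to counts.
        for _ in range(L):
            i = random.choices(range(len(ext)), weights=ext_counts)[0]
            del ext[i]
            del ext_counts[i]
        reservoir = ext + [ys[-1]]           # y_L enters the reservoir directly
        # Step 4 [Repetition]: survivors' consecutive-stay counts grow by one.
        counts = [k + 1 for k in ext_counts] + [1]
    return reservoir

random.seed(2)
sample = reservoir_mcmc(lambda x: -0.5 * x * x, x0=0.0, n=20, L=5, M=200)
```

Only the current reservoir, the counting vector, and the last chain value are kept in memory, matching the storage claim of at most (n + L)(p + 1) scalars.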

The removal procedure in Step 3 is the ‘‘Weighted Random Sampling without Replacement, with defined Weights’’ discussed by Efraimidis (2015), and it tends to remove the ‘‘older’’ elements in the reservoir. The elements of the updated reservoirs come from one Markov chain generated by the Metropolis–Hastings algorithm using x_1^(0) as the starting value. Thus, the convergence property of the Metropolis–Hastings algorithm holds for the proposed sampling algorithm, and more details are provided in the next section. By implementing the removal procedure in Step 3, the autocorrelation among the elements in the reservoir is weakened automatically, since a consecutive segment of the Markov chain is less likely to appear in the updated reservoir. By putting y_L into R^(m) directly in Step 3, we can speed up the convergence of the proposed sampling algorithm.

Instead of the Metropolis–Hastings algorithm, the proposed sampling algorithm can be embedded in other MCMC methods as well. The proposed sampling algorithm saves computer memory by storing at most (n + L)(p + 1) scalars, which is much better than traditional MCMC algorithms, which store an otherwise unnecessary part of a Markov chain in order to check convergence when the scale reduction factor (Gelman and Rubin, 1992) is used.

4. Main results

Based on the removal procedure of the proposed sampling algorithm, the counting vectors N_x^(1), N_x^(2), . . . form another Markov chain. There exists a consistent probability space (Ω, F, P*) and random variables {Z_m : m = 1, 2, . . .} such that (Z_1, . . . , Z_m) is identically distributed as (N_x^(1), . . . , N_x^(m)) for m ≥ 1; see Theorem 6.3.4 discussed by Athreya and Lahiri (2006) for details. Without loss of generality, we denote Z_m as N_x^(m) for m ≥ 1. Let S_x^(m) = ∑_{i=1}^n K_{x,i}^(m) be the total count of times that elements stay in the reservoir at the mth step, where K_{x,i}^(m) is the ith element of N_x^(m). By the removal procedure, it can be shown that n ≤ S_x^(m) ≤ (m + 1)(n − L) + L for m ∈ N.

In order to guarantee the convergence of the proposed sampling algorithm, we need to show that every element in the reservoir can be updated by a more recent one in the long run. To be rigorous, we have the following result.

Theorem 4.1. Given a sample size n > 1 and L satisfying L < n, for m ≥ 1 and i = 1, . . . , n, we have

    P*(K_{x,i}^(m+j) = j + 1 | S_x^(m) = s) < (1 − 1/(s + 1))^j,    (1)


and recall that P*(·) is a consistent probability measure for the counting vectors. Furthermore, for m ∈ N and ϵ > 0, there exists J = J(m, ϵ) ∈ N such that

    P*( ⋃_{j=J}^∞ {K_{x,i}^(m+j) = j + 1} | S_x^(m) = s ) < ϵ/n.    (2)

The proof of Theorem 4.1 is given in Appendix A. Some comments are made on Theorem 4.1. First, the event {K_{x,i}^(m+j) = j + 1} implies that element x_i^(m+j) is selected into the reservoir at the mth iteration, and it stays in R^(m+i) for i = 1, . . . , j. By the proof of (1), P*(K_{x,i}^(m+j) = j + 1 | S_x^(m) = s) converges to 0 faster as L increases; see (A.2) and (A.3) for details. However, the elements in the reservoir may then be more correlated, since more newly generated elements remain in the reservoir. Thus, L balances the convergence rate against the autocorrelation among the sample generated by the proposed sampling algorithm. By (2) of Theorem 4.1, the probability of any element staying in the reservoir consecutively for a long run diminishes to 0. Specifically, suppose that an element x_0 is selected into the reservoir at the m_0th iteration. Then, both n and m_0 are regarded as fixed for x_0 in the proposed sampling algorithm. For any ϵ > 0, there exists J = J(m_0, ϵ) such that the probability of x_0 remaining in the reservoir consecutively for j steps is smaller than ϵ/n, where j ≥ J. Letting ϵ → 0, we conclude that the probability of x_0 staying in the reservoir consecutively for a long run diminishes to 0. Since inequalities are used to prove Theorem 4.1, the bound in (1) is not sharp.
Denote P_i^(m)(x, ·) to be the conditional probability measure for the random variable associated with the ith element in R^(m) given X_1^(0) = x. Since the removal procedure is independent of the values in the reservoir, and y_L is added to R^(m) directly, we have

    P_i^(m)(x, ·) = P^{t_i}(x, ·),

where P^t(x, ·) is the conditional probability measure for the tth element in the Markov chain given the same starting value, t_i is an integer between L(m − k_{x,i}^(m)) + n and L(m − k_{x,i}^(m) + 1) + n when m ≥ 1 and k_{x,i}^(m) ≤ m, and between 1 and n when k_{x,i}^(m) = m + 1. By Theorem 1.5.1 discussed by Liang et al. (2011) and Theorem 4.1, we have the following corollary on the convergence property of the proposed sampling algorithm.

Corollary 4.1. Suppose that the transition kernel P(x, ·) is π-irreducible, has π(·) as its invariant distribution, and is aperiodic and Harris recurrent. For a fixed sample size n and fixed L,

    max_{i ∈ {1, . . . , n}} sup_{A ∈ S} | P_i^(m)(x, A) − π(A) | → 0    (3)

in probability as m → ∞, where x ∈ S.

See Liang et al. (2011) and Athreya and Lahiri (2006) for concepts about Markov chains, such as irreducibility and recurrence. The proof of Corollary 4.1 is given in Appendix B, and we do not use a conditional probability in (3). Corollary 4.1 shows that the proposed sampling algorithm works when the embedded MCMC method satisfies certain conditions.
Recall that the convergence of the proposed sampling algorithm is determined by the removal procedure in Step 3. In order to speed up the convergence in (3) for a fixed L, other removal probability functions can be used. For example,

    P_{E,i} ∝ w(k_i^(m))    (4)

for i = 1, . . . , n + L − 1, where w(n) = O(n^α) is an increasing function of n with α ∈ [1, ∞). We can also use exponential or hyper-power functions, for example, w(n) = exp(an) with a ∈ (0, ∞) and w(n) = n^n. A heuristic validation of the removal probability function in (4) is briefly discussed. If k_i^(m) > k_j^(m) for some i, j ∈ {1, . . . , n + L − 1}, then k_i^(m)/k_j^(m) < w(k_i^(m))/w(k_j^(m)). Thus, there is a larger probability that an element with a larger counting number is removed. A removal probability function of the form (4) is a good way to speed up the convergence of the proposed sampling algorithm, but one disadvantage is that the correlation among the elements in the reservoir increases, since elements with smaller counting numbers are more likely to be kept in the reservoir. Thus, it is not advisable in practice to use a large L together with a removal probability function w(n) that increases dramatically.

Before closing this section, a special removal probability function w(·) is briefly discussed. The removal probabilities of the elements in {x_1^(m−1), y_1, . . . , y_{L−1}} are set to 1, and those of the remaining elements in R_E^(m) are set to 0. It can be seen that this special removal probability function satisfies P*(K_{x,i}^(m+n+1) = n + 2) = 0 in Theorem 4.1, and it corresponds to the commonly used thinning procedure, which generates a sample of size n by discarding all but every Lth element of an MCMC after some ‘‘burn-in’’ iterations or after convergence is detected. Thus, the commonly used thinning approach can be regarded as a special case of the proposed sampling algorithm.
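To make the heuristic concrete, a small hypothetical helper removal_probs computes the removal probabilities induced by a weight function w; the quadratic choice w(k) = k² (α = 2) is just one instance of (4) and is not prescribed by the paper.

```python
def removal_probs(counts, w=lambda k: k ** 2):
    """Removal probabilities proportional to w(k_i), as in (4).
    The default w(k) = k^2 is an illustrative choice with alpha = 2."""
    weights = [w(k) for k in counts]
    total = sum(weights)
    return [wt / total for wt in weights]

# An element that has stayed 3 iterations faces removal probability
# 9/11 under w(k) = k^2, versus only 3/5 under the proportional rule w(k) = k,
# so a faster-growing w removes "older" elements more aggressively.
p_quadratic = removal_probs([1, 1, 3])
p_linear = removal_probs([1, 1, 3], w=lambda k: k)
```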

5. Convergence detection

A diagnostic tool discussed by Gelman and Rubin (1992) is modified to detect the convergence of the proposed sampling algorithm for practical use, and we assume p = 1 for simplicity.


Let x_{1,1}^(0), . . . , x_{1,J}^(0) be J starting values, selected in such a way that they are over-dispersed in the support of the target distribution π(·). Denote R_j^(m) = {x_{1,j}^(m), . . . , x_{n,j}^(m)} to be the reservoir in the mth iteration using x_{1,j}^(0) as its starting value. Let x̄_j^(m) = n^{−1} ∑_{i=1}^n x_{i,j}^(m) and x̄^(m) = J^{−1} ∑_{j=1}^J x̄_j^(m). The between- and within-variances of the J reservoirs are computed as

    B^(m) = n/(J − 1) ∑_{j=1}^J ( x̄_j^(m) − x̄^(m) )²,

    W^(m) = 1/{J(n − 1)} ∑_{j=1}^J ∑_{i=1}^n ( x_{i,j}^(m) − x̄_j^(m) )².

Under mild conditions, B^(m) overestimates the true variance of the target distribution, W^(m) underestimates it, and both of them are asymptotically unbiased as the sample size n → ∞; see Gelman and Rubin (1992) for details. Thus, we propose to use the following scale reduction factor to detect the convergence of the proposed sampling algorithm:

    r̂^(m) = { (n − 1)/n · W^(m) + (1/n) B^(m) } / W^(m).

Since the reservoir size is fixed at n, we do not expect r̂^(m) to converge to 1 as m → ∞. If r̂^(m) oscillates around a fixed value which is close to 1, then the updated reservoir is said to converge in practice.
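The between/within computation above can be sketched directly; scale_reduction_factor is our name for the hypothetical helper, and its argument is the list of the J reservoirs (each of common size n) at iteration m.

```python
def scale_reduction_factor(reservoirs):
    """Compute r_hat^(m) from J reservoirs of common size n, following
    the between-variance B^(m) and within-variance W^(m) of Section 5."""
    J = len(reservoirs)
    n = len(reservoirs[0])
    means = [sum(r) / n for r in reservoirs]
    grand = sum(means) / J
    B = n / (J - 1) * sum((mj - grand) ** 2 for mj in means)
    W = sum((x - mj) ** 2
            for r, mj in zip(reservoirs, means) for x in r) / (J * (n - 1))
    return ((n - 1) / n * W + B / n) / W

# Identical reservoirs give B = 0, so r_hat equals (n - 1)/n = 0.75 for n = 4;
# over-dispersed reservoirs push r_hat above 1.
r = scale_reduction_factor([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
```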

6. Simulation studies

6.1. A continuous distribution case

A simulation study is conducted to test the performance of the proposed sampling algorithm for a continuous distribution. Let the target distribution associated with a random variable X be π(x) = 0.5Φ_1(x) + 0.5Φ_2(x), a mixture of two normal distributions, where Φ_1(x) is the cumulative distribution function of a normal distribution with mean −1 and variance 1, and Φ_2(x) is that of a normal distribution with mean 3 and variance 0.25. Notice that the density has a local minimum separating two high-density regions. We are interested in estimating the population mean µ = E(X), the standard deviation σ = var(X)^{1/2}, and three proportions p_i = ∫_{(−∞, b_i)} dπ(x) for i = 1, . . . , 3, where var(X) is the variance of X, b_1 = −2, b_2 = 0, and b_3 = 3. The theoretical values of the five parameters are (µ, σ) = (1, √(37/8)) and (p_1, p_2, p_3) ≈ (0.08, 0.42, 0.75).
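The theoretical values can be checked directly from the mixture; the sketch below is a verification aid, not part of the simulation itself, and uses only the standard normal CDF written via the error function.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Mixture 0.5 * N(-1, 1) + 0.5 * N(3, 0.25): moments by direct calculation.
mu = 0.5 * (-1.0) + 0.5 * 3.0                              # = 1
var = 0.5 * (1.0 + 1.0) + 0.5 * (0.25 + 9.0) - mu ** 2     # = 37/8
sigma = math.sqrt(var)                                     # ~ 2.15

def mixture_cdf(b):
    """P(X < b) for the mixture target."""
    return 0.5 * Phi(b + 1.0) + 0.5 * Phi((b - 3.0) / 0.5)

p1, p2, p3 = mixture_cdf(-2.0), mixture_cdf(0.0), mixture_cdf(3.0)
# (p1, p2, p3) rounds to (0.08, 0.42, 0.75), matching the stated values.
```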

We consider two sampling methods to obtain a sample of size n: the proposed sampling algorithm embedded in slice sampling (Neal, 2003), and the traditional slice sampling method using the last n elements of the same Markov chain Monte Carlo run as the proposed sampling algorithm. We compare them in terms of relative bias and relative efficiency; a brief review of the slice sampling method is given in Appendix C. Denote {x_{b,1}, . . . , x_{b,n}} to be a sample obtained by a sampling method in the bth Monte Carlo simulation. The relative bias of the mean estimator is

    R.Bias = (x̄ − µ)/µ,

and the relative efficiency of the proposed sampling algorithm compared with the traditional slice sampling without thinning is

    R.Eff = V̂_{SS,µ} / V̂_{Prop,µ},

where x̄ is the sample mean of {x̄_b : b = 1, . . . , B}, x̄_b is the sample mean of {x_{b,1}, . . . , x_{b,n}}, B is the number of Monte Carlo simulations, V̂_{Prop,µ} and V̂_{SS,µ} are the Monte Carlo variances of the proposed sampling algorithm and the traditional slice sampling method, and the Monte Carlo variance is calculated as (B − 1)^{−1} ∑_{b=1}^B (x̄_b − x̄)². We consider n ∈ {50, 100}, L ∈ {1, 5, 10, 20}, M = 2000, and conduct B = 2000 Monte Carlo simulations.

The scale reduction factor discussed in Section 5 is used to detect the convergence of the proposed sampling algorithm, and the traditional scale reduction factor (Gelman and Rubin, 1992) is used for the traditional slice sampling method. Notice that the traditional scale reduction factor uses the second half of the MCMC to calculate its value. The scale reduction factors are computed using 9 starting values, and the jth starting value is the (10j)%th quantile of a Cauchy distribution with location 0 and scale 3 for j = 1, . . . , 9.

Fig. 1 shows the convergence diagnostic results for the proposed sampling algorithm with L ∈ {1, 5, 10, 20} and the traditional slice sampling method when the sample size n = 50. From the results, we can conclude that the scale reduction factors of the proposed sampling algorithm oscillate around fixed values by the time those of the traditional slice sampling method are close to 1. Thus, the scale reduction factor discussed in Section 5 is a reasonable way to detect convergence of the proposed sampling algorithm in practice.


Fig. 1. Scale reduction factors of the proposed sampling algorithm (Proposed) embedded in slice sampling with L = 1 (a), L = 5 (b), L = 10 (c) and
L = 20 (d) and the traditional slice sampling method (Traditional) when the sample size n = 50.

Fig. 2. Monte Carlo mean of the autocorrelation for the sample generated by the traditional slice sampling method (Trd.) and the proposed sampling
algorithm with L = 1, L = 5, L = 10 and L = 20 based on 2000 Monte Carlo simulations when the sample size n = 50. The horizontal line
corresponds to no autocorrelation. The results for the traditional slice sampling method and the proposed sampling algorithm with L = 1 overlay
with each other.

Based on 2000 Monte Carlo simulations, Fig. 2 shows the Monte Carlo mean of the autocorrelation for samples of size n = 50 generated by the proposed sampling algorithm with L ∈ {1, 5, 10, 20} and the traditional slice sampling method. The results for the proposed sampling algorithm with L = 1 and the traditional slice sampling method are approximately the same. As the value of L increases, the autocorrelation of the sample generated by the proposed sampling algorithm decays to 0 faster than that of the traditional slice sampling method.

Table 1 shows the relative bias and relative efficiency of the two methods when estimating the five parameters of interest. The proposed sampling algorithm with L = 1 performs similarly to the traditional slice sampling method in


Table 1
Relative bias (R.Bias) and relative efficiency (R.Eff.) of the estimators from the proposed sampling algorithm embedded in the slice sampling, denoted as ‘‘Proposed sampling algorithm’’, and the traditional slice sampling method, denoted as ‘‘Traditional slice sampling method’’, based on 2000 Monte Carlo simulations. The population mean is denoted as µ, the standard deviation as σ, and the three proportions as p1, p2 and p3.

n    L            Proposed sampling algorithm         Traditional slice sampling method
                  µ      σ      p1     p2     p3      µ      σ      p1     p2     p3
50   1    R.Bias  0.04   −0.15  −0.01  −0.02  −0.01   0.04   −0.15  −0.01  −0.02  −0.01
          R.Eff.  1.00   1.01   1.01   1.00   1.00    1.00   1.00   1.00   1.00   1.00
     5    R.Bias  −0.04  −0.14  0.01   0.02   0.01    −0.04  −0.15  0.01   0.02   0.01
          R.Eff.  1.04   1.09   1.04   1.04   1.03    1.00   1.00   1.00   1.00   1.00
     10   R.Bias  0.04   −0.12  0.00   −0.02  −0.01   0.04   −0.14  −0.01  −0.02  −0.01
          R.Eff.  1.10   1.27   1.04   1.09   1.07    1.00   1.00   1.00   1.00   1.00
     20   R.Bias  0.00   −0.10  0.01   0.00   0.00    0.01   −0.14  0.01   −0.01  0.00
          R.Eff.  1.29   1.71   1.12   1.28   1.23    1.00   1.00   1.00   1.00   1.00
100  1    R.Bias  0.04   −0.08  −0.03  −0.02  0.00    0.04   −0.08  −0.03  −0.02  0.00
          R.Eff.  1.00   1.01   1.00   1.00   1.00    1.00   1.00   1.00   1.00   1.00
     5    R.Bias  0.00   −0.07  0.00   0.00   0.00    0.00   −0.07  0.00   0.00   0.00
          R.Eff.  1.01   1.06   1.00   1.01   1.01    1.00   1.00   1.00   1.00   1.00
     10   R.Bias  0.01   −0.07  −0.01  −0.01  0.00    0.01   −0.07  −0.01  −0.01  0.00
          R.Eff.  1.05   1.12   1.04   1.05   1.05    1.00   1.00   1.00   1.00   1.00
     20   R.Bias  −0.01  −0.06  0.00   0.01   0.00    −0.01  −0.07  0.00   0.00   0.00
          R.Eff.  1.13   1.33   1.08   1.12   1.11    1.00   1.00   1.00   1.00   1.00

the sense that their relative bias is the same for both sample sizes, and the relative efficiency of the proposed sampling algorithm is close to 1. For a fixed sample size, as the value of L increases, the relative bias of the proposed sampling algorithm remains approximately the same as that of the traditional slice sampling method, but the relative efficiency of the proposed sampling algorithm increases. For a fixed L, as the sample size increases, the relative efficiency of the proposed sampling algorithm decreases. To sum up, the proposed sampling method is more efficient than the traditional slice sampling method in estimating the parameters of interest when L > 1, but the efficiency gain of the proposed sampling algorithm is limited when L is small compared with the sample size n, since it is then hard to update the sample in the reservoir efficiently. It is recommended that a large L be used when the sample size n is large.

6.2. A discrete distribution case 9

As technology develops, online social networks, such as Facebook and Twitter, become more and more popular. 10

However, the complete networks are confidential to protect the users’ information. It is desirable to obtain a representative 11

sample of a complete network, and many methods are developed to achieve this goal using MCMC (Gjoka et al., 2010; 12

Al Hasan and Dave, 2018; Leskovec and Faloutsos, 2006). 13

In this section, we use the online friendship network derived from the music streaming service Deezer (November 2017) in Hungary (Rozemberczki et al., 2018) to test the performance of the proposed sampling algorithm. This network is undirected and contains N = 47,538 nodes and E = 222,887 edges. It is treated as a complete network, and two methods are used to obtain a representative sample of size n from it. One is the Metropolis–Hastings Random Walk (MHRW) recommended by Gjoka et al. (2010), and the other is the proposed sampling algorithm embedded in the MHRW with L ∈ {1, 5, 10, 20} and M = 2000; see Appendix D for a brief review of MHRW. Denote V = {v_i : i = 1, ..., N} to be the node set of this network and k_v to be the degree of node v for v ∈ V. We are interested in making inference about the node degrees of this network, including their median (M) and standard deviation (σ) and the three proportions p_j = N^{−1} ∑_{v∈V} I(k_v < q_j) for j = 1, 2, 3, where I(·) is the indicator function, q_1 = 7, q_2 = 15, and q_3 = 25. The ground truth for the five parameters is (M, σ, p_1, p_2, p_3) ≈ (8, 7.39, 0.43, 0.81, 0.96), and the relative bias and relative efficiency are used to compare the two sampling methods.
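Since the MH correction makes the MHRW's stationary distribution uniform over nodes, the five parameters can be estimated by plain statistics of the sampled degrees. The following is a minimal sketch; the function name and the use of Python's statistics module are our own illustration, not the paper's code, and the cutoffs default to the q_j above:

```python
import statistics

def degree_estimates(degrees, cutoffs=(7, 15, 25)):
    """Estimate the median M, the standard deviation sigma, and the
    proportions p_j of degrees below each cutoff q_j from a list of
    sampled node degrees."""
    n = len(degrees)
    med = statistics.median(degrees)
    sd = statistics.pstdev(degrees)  # divides by n; stdev() would divide by n - 1
    props = [sum(k < q for k in degrees) / n for q in cutoffs]
    return med, sd, props
```

With a perfectly representative sample, the returned triple approaches the ground truth (8, 7.39, (0.43, 0.81, 0.96)) reported above.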

We conduct 2000 Monte Carlo simulations, and Table 2 summarizes the results. When L = 1, the proposed sampling algorithm performs similarly to MHRW. For a fixed sample size, the proposed sampling algorithm becomes more efficient than MHRW as L increases. On the other hand, for a fixed L, the relative efficiency of the proposed sampling algorithm decreases as the sample size increases. These findings match those of the first simulation: the proposed sampling algorithm is more efficient than MHRW when L > 1, but its efficiency gain is limited if L is small compared with the sample size n.

7. Discussion

MCMC methods are popular for drawing samples from a target distribution in practice, especially under Bayesian settings (Gelman et al., 2014). By applying the idea of reservoir sampling to the Metropolis–Hastings algorithm, an efficient

Please cite this article as: Z. Wang, Markov chain Monte Carlo sampling using a reservoir method. Computational Statistics and Data Analysis (2019),
https://doi.org/10.1016/j.csda.2019.05.001.

Table 2
Relative bias (R.Bias) and relative efficiency (R.Eff.) of the estimators from the proposed sampling algorithm embedded in the Metropolis–Hastings Random Walk ("Proposed sampling algorithm") and from the Metropolis–Hastings Random Walk alone ("Metropolis–Hastings Random Walk"), based on 2000 Monte Carlo simulations. The population median is denoted as M, the standard deviation as σ, and the three proportions as p1, p2 and p3.

n    L          Proposed sampling algorithm           Metropolis–Hastings Random Walk
                M      σ      p1     p2     p3        M      σ      p1     p2     p3
50   1   R.Bias  0.00  −0.09  −0.04  −0.01  0.00      0.00  −0.09  −0.04  −0.01  0.00
         R.Eff.  1.01   1.01   1.00   1.00  1.00      1.00   1.00   1.00   1.00  1.00
     5   R.Bias −0.01  −0.09  −0.03  −0.01  0.00     −0.01  −0.10  −0.03  −0.01  0.00
         R.Eff.  1.05   1.02   1.05   1.04  1.03      1.00   1.00   1.00   1.00  1.00
    10   R.Bias −0.01  −0.10  −0.03  −0.01  0.00     −0.01  −0.10  −0.03  −0.01  0.00
         R.Eff.  1.08   1.07   1.09   1.09  1.05      1.00   1.00   1.00   1.00  1.00
    20   R.Bias −0.02  −0.07  −0.02   0.00  0.00     −0.02  −0.09  −0.02   0.00  0.00
         R.Eff.  1.22   1.15   1.29   1.21  1.17      1.00   1.00   1.00   1.00  1.00
100  1   R.Bias −0.02  −0.05  −0.03  −0.01  0.00     −0.02  −0.05  −0.03  −0.01  0.00
         R.Eff.  1.00   1.00   1.00   1.00  1.00      1.00   1.00   1.00   1.00  1.00
     5   R.Bias −0.02  −0.06  −0.03  −0.01  0.00     −0.02  −0.06  −0.03  −0.01  0.00
         R.Eff.  1.02   1.03   1.02   1.01  1.03      1.00   1.00   1.00   1.00  1.00
    10   R.Bias −0.01  −0.04  −0.04  −0.01  0.00     −0.01  −0.05  −0.04  −0.01  0.00
         R.Eff.  1.05   1.02   1.05   1.05  1.02      1.00   1.00   1.00   1.00  1.00
    20   R.Bias −0.01  −0.04  −0.04  −0.01  0.00     −0.01  −0.05  −0.04  −0.01  0.00
         R.Eff.  1.12   1.07   1.12   1.11  1.07      1.00   1.00   1.00   1.00  1.00

sampling algorithm is proposed to obtain a sample from a target distribution. The proposed sampling algorithm can be embedded in other MCMC methods as well, including the slice sampling method (Neal, 2003) and algorithms for discrete populations (Gjoka et al., 2010).

Each iteration of the proposed sampling algorithm requires only limited computer memory to store the sample, and convergence detection is conducted on this stored sample using the modified scale reduction factor. Simulation studies show the better performance of the proposed sampling algorithm compared with the traditional MCMC methods without thinning. However, there is no theoretical result for choosing the optimal L and the optimal removal probability function in (4). Both L and the removal probability function balance the trade-off between the speed of convergence and the autocorrelation among the sample selected in each iteration. Theoretical results validate that the conditional distribution of the sample converges in total variation to the target distribution in probability, regardless of the choice of L and the removal probability function. It is advised not to choose a large L together with a sharply increasing removal probability function, and a large value of L should be chosen if the sample size n is large. To obtain a sample of size n using the proposed sampling algorithm, an MCMC chain of length ML + n is needed, which may result in high computational cost (Geyer, 1992), including the computation of the removal probabilities and the removal procedure of the proposed sampling algorithm. This computational cost becomes significant when the sample size and the value of L are large.

The thinning procedure is a practical way to decrease the autocorrelation among the sample, and a commonly used thinning procedure is a special case of the proposed sampling algorithm. Some researchers (Geyer, 1992; MacEachern and Berliner, 1994; Link and Eaton, 2012) discouraged thinning of an MCMC chain, but their argument is based on comparing the whole chain with a thinned chain of reduced length. In our study, we compare un-thinned and thinned chains of the same length, and the efficiency of the proposed sampling algorithm is demonstrated by two simulation studies.
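As context for the discussion of thinning, the plain (unweighted) reservoir-sampling update that the proposed method builds on can be sketched as follows. This is Vitter's (1985) Algorithm R with a uniform replacement rule, not the weighted removal procedure of Eq. (4):

```python
import random

def reservoir_sample(stream, n, rng=random):
    """Algorithm R (Vitter, 1985): draw a uniform random sample of
    size n from a stream in one pass, keeping only n items in memory."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < n:
            reservoir.append(item)
        else:
            # Item t is kept with probability n/(t+1); if kept, it
            # replaces a uniformly chosen element of the reservoir.
            j = rng.randrange(t + 1)
            if j < n:
                reservoir[j] = item
    return reservoir
```

Every stream element ends up in the reservoir with equal probability n/T for a stream of length T, which is the property the weighted variant relaxes in order to control autocorrelation.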

Acknowledgments

We would like to thank Dr. Jae Kwang Kim for helpful discussions and constructive comments on this paper. We would also like to thank the associate editor and two anonymous reviewers for their detailed and constructive comments.

Appendix A. Proof of Theorem 4.1

Given S_x^{(m)} = s, we can show that S_x^{(m+1)} ≤ s + (n − L) + L = s + n with probability 1. More generally, we can show S_x^{(m+j)} ≤ s + jn for j ≥ 1. The probability that an element is not removed achieves its maximum at the removal of the first element. Thus, we have

    P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) ≤ ∏_{k=1}^{j} {1 − k/(S_x^{(m+k−1)} + L)}^L ≤ ∏_{k=1}^{j} {1 − k/(s + (k − 1)n + L)}^L.   (A.1)


Consider the function h_1(x) = 1 − x{s + (x − 1)n + L}^{−1} for x ∈ ℝ. Its first-order derivative with respect to x is h_1′(x) = −(s − n + L){s + (x − 1)n + L}^{−2} < 0 since s ≥ n. By (A.1), we have

    P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) ≤ {1 − 1/(s + L)}^{jL}.   (A.2)

Consider the function h_2(x) = {1 − (s + x)^{−1}}^x for x > 0. Its first-order derivative with respect to x is

    h_2′(x) = {1 − 1/(s + x)}^x [ln{1 − 1/(s + x)} + x/{(1 − (s + x)^{−1})(s + x)^2}]
            < {1 − 1/(s + x)}^x (1 − s)/{(s + x)(s + x − 1)},   (A.3)

where the inequality holds since ln(1 − x) < −x for x ∈ (0, 1). Since s ≥ n > 1, by (A.3), h_2(x) is a decreasing function. Thus, by (A.2), we have

    P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) < {1 − 1/(s + 1)}^j.   (A.4)

Thus, we have proved (1).

By the removal procedure, we can show that {K_{x,i}^{(m+l)} = l + 1} ⊂ {K_{x,i}^{(m+j)} = j + 1} for l ≥ j. Thus, we have

    ⋃_{l=j}^{∞} {K_{x,i}^{(m+l)} = l + 1} ⊂ {K_{x,i}^{(m+j)} = j + 1}.   (A.5)

By (A.4), for m ≥ 1 and ϵ > 0, there exists J = J(m, ϵ) ∈ ℕ such that

    {1 − 1/(s + 1)}^j < ϵ/n   (A.6)

holds for j ≥ J. By (A.4)–(A.6), we have proved (2).

Appendix B. Proof of Corollary 4.1

For any ϵ > 0 and x ∈ S, by Theorem 1.5.1 of Liang et al. (2011), there exists N = N(ϵ, x) ∈ ℕ such that

    sup_{A∈𝒮} |P^t(x, A) − π(A)| < ϵ   (B.1)

holds for any t ≥ N; see Tierney (1994) for more details about the convergence of Markov chain Monte Carlo. Besides, there exists M_1 = M_1(ϵ, x) that guarantees

    L(M_1 − 1) + n > N.

That is, the random variable associated with any element of the L newly generated values {y_1, ..., y_L} after the M_1-th step satisfies (B.1), since its corresponding index satisfies t ≥ (M_1 − 1)L + n > N.

To show (3), we only need to consider the elements that are selected into the reservoir prior to the M_1-th step and remain in the reservoir at the M_1-th step. Denote A = {i : k_{x,i}^{(M_1)} > 1} to be the index set of such elements; the cardinality of A is no greater than n, which is a fixed integer.

By (A.4), the bound on the conditional probability P*(K_{x,i}^{(m+j)} = j + 1 | S_x^{(m)} = s) is increasing with respect to s. Besides, S_x^{(m)} < (M_1 + 1)(n − L) + L for m < M_1. Thus, we have

    P*(K_{x,i}^{(m+j)} = j + 1) < {1 − 1/((M_1 + 1)(n − L) + L + 1)}^j.   (B.2)

By (B.2), for i ∈ A, there exists J_i such that

    P*(⋃_{j=J_i}^{∞} {K_{x,i}^{(m_i+j)} = j + 1}) < ϵ/n,   (B.3)


where m_i < M_1. Denote J = max{J_i : i ∈ A}. Then

    P*(⋃_{i∈A} ⋃_{j=J}^{∞} {K_{x,i}^{(m_i+j)} = j + 1}) ≤ ∑_{i∈A} P*(⋃_{j=J}^{∞} {K_{x,i}^{(m_i+j)} = j + 1}) < ∑_{i=1}^{n} ϵ/n = ϵ.   (B.4)

Take M = M_1 + J; we can show that, for m > M, with probability 1 − ϵ, we have

    max_{i∈{1,...,n}} sup_{A∈𝒮} |P_i^{(m)}(x, A) − π(A)| < ϵ.   (B.5)

Since ϵ is arbitrary, we have proved (3).

Appendix C. Brief review of slice sampling

For simplicity, we assume that the dimension of S is p = 1. Denote x_0 to be the current point in the MCMC, y = g(x_0) − e to be the vertical level defining the slice {x : g(x) > y}, w to be an estimate of the typical size of a slice, m to be an integer limiting the size of a slice to mw, and I_l and I_u to be a lower bound and an upper bound of the support of f(x), where g(x) = log{f(x)}, e is exponentially distributed with mean 1, and the support of f(x) is {x ∈ S : f(x) > 0}. Slice sampling to get a new sample point can be summarized in the following steps (Neal, 2003).

Step 1. ‘‘Doubling’’ procedure for finding an interval around the current point.
    1.1. Initialize L = x_0 − wu and R = L + w, where u ∼ U(0, 1), and U(a, b) is the uniform distribution on the interval (a, b).
    1.2. Initialize J = ⌊mv⌋ and K = (m − 1) − J, where v ∼ U(0, 1), and ⌊x⌋ is the largest integer smaller than or equal to x.
    1.3. While J > 0, L > I_l, and y < g(L), repeat the following step.
        i. Update L = L − w and J = J − 1.
    1.4. While K > 0, R < I_u, and y < g(R), repeat the following step.
        i. Update R = R + w and K = K − 1.
Step 2. Modify (L, R) by L = max{L, I_l} and R = min{R, I_u}, where max{a, b} and min{a, b} are the maximum and the minimum of a and b, respectively.
Step 3. Repeat the following steps until a break.
    3.1. Generate x_1 ∼ U(L, R).
    3.2. If g(x_1) ≥ y, break the repetition and accept x_1 as the new sample point. Otherwise, update R = x_1 if x_1 > x_0, and L = x_1 if x_1 < x_0.
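The steps above can be sketched in Python as a single update. This is an illustrative transcription (the function name, argument defaults, and use of Python's random module are ours), not the paper's implementation:

```python
import math
import random

def slice_sample_step(g, x0, w, m, Il=-math.inf, Iu=math.inf, rng=random):
    """One slice-sampling update (Neal, 2003) following Appendix C:
    g is the log-density, x0 the current point, w the typical slice
    width, and m caps the interval size at m*w."""
    y = g(x0) - rng.expovariate(1.0)   # vertical level y = g(x0) - e
    # Step 1: randomly place an interval of width w around x0, then expand.
    L = x0 - w * rng.random()
    R = L + w
    J = math.floor(m * rng.random())
    K = (m - 1) - J
    while J > 0 and L > Il and y < g(L):
        L -= w
        J -= 1
    while K > 0 and R < Iu and y < g(R):
        R += w
        K -= 1
    # Step 2: clip the interval to the support bounds.
    L, R = max(L, Il), min(R, Iu)
    # Step 3: draw uniformly from (L, R), shrinking toward x0 on rejection.
    while True:
        x1 = rng.uniform(L, R)
        if g(x1) >= y:
            return x1
        if x1 > x0:
            R = x1
        else:
            L = x1
```

Iterating this update with, e.g., the standard normal log-density g(x) = −x²/2 produces a chain whose marginal distribution converges to N(0, 1).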

Appendix D. Brief review of Metropolis–Hastings Random Walk

Denote v to be the current node in the MCMC. The MHRW algorithm (Gjoka et al., 2010) to get a new sample point can be summarized in the following steps.

Step 1. Select a node w that is connected with v uniformly at random.
Step 2. Generate p ∼ U(0, 1).
Step 3. If p ≤ k_v/k_w, the new sample point is w; otherwise, the new sample point is v.
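The three steps above can be sketched as follows (a minimal illustration; storing the network as an adjacency list `adj` and the function name are our own choices):

```python
import random

def mhrw_step(adj, v, rng=random):
    """One Metropolis-Hastings Random Walk step (Gjoka et al., 2010).
    adj maps each node to the list of its neighbours; v is the current
    node. The acceptance ratio k_v / k_w makes the stationary
    distribution uniform over nodes rather than degree-biased."""
    w = rng.choice(adj[v])   # Step 1: propose a uniform neighbour
    p = rng.random()         # Step 2
    # Step 3: accept w with probability min(1, k_v / k_w)
    return w if p <= len(adj[v]) / len(adj[w]) else v
```

Iterating this step and recording the visited nodes yields an asymptotically uniform node sample, which is what makes plain sample statistics of the sampled degrees, as used in Section 6.2, approximately unbiased.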

References

Al Hasan, M., Dave, V.S., 2018. Triangle counting in large networks: a review. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 8 (2), e1226. http://dx.doi.org/10.1002/widm.1226.
Athreya, K.B., Lahiri, S.N., 2006. Measure Theory and Probability Theory. Springer Science & Business Media, New York.
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J., 2002. Models and issues in data stream systems. In: Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. PODS '02, ACM, New York, pp. 1–16. http://dx.doi.org/10.1145/543613.543615.
Chib, S., Greenberg, E., 1995. Understanding the Metropolis–Hastings algorithm. Amer. Statist. 49 (4), 327–335. http://dx.doi.org/10.1080/00031305.1995.10476177.
Edwards, R.G., Sokal, A.D., 1988. Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 (6), 2009. http://dx.doi.org/10.1103/PhysRevD.38.2009.
Efraimidis, P.S., 2015. Weighted random sampling over data streams. In: Algorithms, Probability, Networks, and Games. Springer, pp. 183–195.
Efraimidis, P.S., Spirakis, P.G., 2006. Weighted random sampling with a reservoir. Inform. Process. Lett. 97 (5), 181–185. http://dx.doi.org/10.1016/j.ipl.2005.11.003.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 (410), 398–409. http://dx.doi.org/10.1080/01621459.1990.10476213.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2014. Bayesian Data Analysis, third ed. Taylor & Francis, Florida.
Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences. Statist. Sci. 7 (4), 457–472. http://dx.doi.org/10.1214/ss/1177011136.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (6), 721–741. http://dx.doi.org/10.1109/TPAMI.1984.4767596.
Geyer, C.J., 1992. Practical Markov chain Monte Carlo. Statist. Sci. 7, 473–483. http://dx.doi.org/10.1214/ss/1177011137.
Gilks, W.R., Best, N.G., Tan, K.K.C., 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. J. R. Stat. Soc. Ser. C 44 (4), 455–472. http://dx.doi.org/10.2307/2986138.
Gilks, W.R., Neal, R.M., Best, N.G., Tan, K.K.C., 1997. Corrigendum: adaptive rejection Metropolis sampling. J. R. Stat. Soc. Ser. C 46 (4), 541–542. http://dx.doi.org/10.1111/1467-9876.00091.
Gilks, W.R., Roberts, G.O., George, E.I., 1994. Adaptive direction sampling. J. R. Stat. Soc. Ser. D Stat. 43 (1), 179–189. http://dx.doi.org/10.2307/2348942.
Girolami, M., Calderhead, B., 2011. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B 73 (2), 123–214. http://dx.doi.org/10.1111/j.1467-9868.2010.00765.x.
Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A., 2010. Walking in Facebook: a case study of unbiased sampling of OSNs. In: INFOCOM 2010 Proceedings. IEEE, pp. 1–9. http://dx.doi.org/10.1109/INFCOM.2010.5462078.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), 97–109. http://dx.doi.org/10.1093/biomet/57.1.97.
Leskovec, J., Faloutsos, C., 2006. Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 631–636. http://dx.doi.org/10.1145/1150402.1150479.
Liang, F., Liu, C., Carroll, R., 2011. Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples, Vol. 714. John Wiley & Sons, Chichester.
Link, W.A., Eaton, M.J., 2012. On thinning of chains in MCMC. Methods Ecol. Evol. 3 (1), 112–115. http://dx.doi.org/10.1111/j.2041-210X.2011.00131.x.
Liu, J.S., Liang, F., Wong, W.H., 2000. The multiple-try method and local optimization in Metropolis sampling. J. Amer. Statist. Assoc. 95 (449), 121–134. http://dx.doi.org/10.1080/01621459.2000.10473908.
MacEachern, S.N., Berliner, L.M., 1994. Subsampling the Gibbs sampler. Amer. Statist. 48 (3), 188–190. http://dx.doi.org/10.1080/00031305.1994.10476054.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (6), 1087–1092. http://dx.doi.org/10.1063/1.1699114.
Neal, R.M., 2003. Slice sampling. Ann. Statist. 31, 705–741. http://dx.doi.org/10.1214/aos/1056562461.
Rozemberczki, B., Davies, R., Sarkar, R., Sutton, C., 2018. GEMSEC: graph embedding with self clustering. arXiv:1802.03997.
Swendsen, R.H., Wang, J.-S., 1987. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett. 58 (2), 86. http://dx.doi.org/10.1103/PhysRevLett.58.86.
Tierney, L., 1994. Markov chains for exploring posterior distributions. Ann. Statist. 22 (2), 1701–1728. http://dx.doi.org/10.1214/aos/1176325750.
Vitter, J.S., 1984. Faster methods for random sampling. Commun. ACM 27 (7), 703–718. http://dx.doi.org/10.1145/358105.893.
Vitter, J.S., 1985. Random sampling with a reservoir. ACM Trans. Math. Software 11 (1), 37–57. http://dx.doi.org/10.1145/3147.3165.
Wu, F.-Y., 1982. The Potts model. Rev. Modern Phys. 54 (1), 235. http://dx.doi.org/10.1103/RevModPhys.54.235.
