Archive for block sampling

block-wise pseudo-marginals

Posted in Books, pictures, Statistics, University life with tags block sampling, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, pseudo-marginal MCMC on April 4, 2016 by xi'an

One justification for pseudo-marginal Metropolis-Hastings algorithms is the completion or demarginalisation of the initial target with the random variates used to compute the unbiased estimator of the target or likelihood. In a recent arXival, M.-N. Tran, Robert Kohn, M. Quiroz and M. Villani explore the idea of only updating part of those auxiliary random variates at each iteration, hence the “block” in the title. The idea is to “reduce the variability in the ratio of the likelihood estimates”, but I think it also reduces the moves of the sampler by creating a strong correlation between successive likelihood estimates. A different appeal of the approach is, of course, when facing a product of densities large enough to prevent approximating the whole likelihood at once, thus requiring blockwise approximations, as in, e.g., consensus Monte Carlo and other “big data” (re)solutions. The convergence results provided in the paper are highly stylised (they assume, for instance, that the log of the unbiased likelihood estimator is normally distributed and that the simulation is run from the prior), but they lead to a characterisation of the inefficiency of the pseudo-marginal algorithm, defined as the ratio of the variances when using the true likelihood and when using the limiting unbiased estimator. There is however no corresponding result for selecting the number of blocks, G, which is simply set to G=100 in the paper.
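To fix ideas, here is a minimal R sketch of such a blockwise pseudo-marginal move on a toy latent-variable model: the auxiliary variates u behind an unbiased importance-sampling estimator of the likelihood are split into G blocks and only one randomly chosen block is refreshed at each iteration. The toy model, the estimator loglik_hat, and the tuning values (m, G, the random-walk scale) are my own illustrative assumptions, not taken from the paper.

```r
set.seed(1)

## toy model: y_i = z_i + eps_i with z_i ~ N(theta, 1), eps_i ~ N(0, 1),
## so p(y_i | theta) = N(y_i; theta, 2); we pretend it is intractable and
## estimate it unbiasedly by importance sampling over the latent z_i
y <- rnorm(50, mean = 1, sd = sqrt(2))
n <- length(y); m <- 20            # m auxiliary draws per observation
G <- 10                            # number of blocks of auxiliary variates
blk <- rep(1:G, length.out = n)    # observation i belongs to block blk[i]

loglik_hat <- function(theta, u) {
  # u is an n x m matrix of N(0,1) auxiliary variates
  z <- theta + u                   # latent draws
  w <- dnorm(y - z, 0, 1)          # importance weights, n x m
  sum(log(rowMeans(w)))            # unbiased estimate of each p(y_i | theta)
}

theta <- 0
u <- matrix(rnorm(n * m), n, m)
ll <- loglik_hat(theta, u)

for (t in 1:5000) {
  theta_p <- theta + rnorm(1, 0, 0.2)          # random-walk move on theta
  g <- sample.int(G, 1)                        # pick one block to refresh
  u_p <- u
  u_p[blk == g, ] <- rnorm(sum(blk == g) * m)  # refresh only that block
  ll_p <- loglik_hat(theta_p, u_p)
  # flat prior on theta; the N(0,1) prior on the refreshed u-block cancels
  # with its proposal, so only the likelihood estimates enter the test
  if (log(runif(1)) < ll_p - ll) {
    theta <- theta_p; u <- u_p; ll <- ll_p
  }
}
```

Because the refreshed block is drawn from its N(0,1) prior, its proposal density cancels against the prior in the acceptance ratio; holding the other G−1 blocks fixed is precisely what correlates successive likelihood estimates.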
independent Metropolis-Hastings

Posted in Books, Statistics with tags 0.234, block sampling, Gibbs sampler, independent Metropolis-Hastings algorithm, Metropolis-within-Gibbs algorithm, optimal acceptance rate, random walk on November 24, 2015 by xi'an

“In this paper we have demonstrated the potential benefits, both theoretical and practical, of the independence sampler over the random walk Metropolis algorithm.”
Peter Neal and Tsun Man Clement Lee arXived a paper on optimising the independent Metropolis-Hastings algorithm. I was a bit surprised at this “return” of the independent sampler, which I hardly mention in my lectures, so I had a look at the paper. The goal is to produce an equivalent to what Gelman, Roberts and Gilks (1996) obtained for random walk samplers. In the formal setting where the target is a product of n identical densities f, the optimal number k of components to update in one Metropolis-Hastings (within Gibbs) round is approximately 2.835/I, where I is the symmetrised Kullback-Leibler divergence between the (univariate) target f and the independent proposal q, provided I is finite. The most surprising part is that the optimal acceptance rate is again 0.234, as in the random walk case. This is surprising in that I usually associate the independent Metropolis-Hastings algorithm with high acceptance rates, but that intuition applies to calibrating the proposal q, not the block size k of the Gibbs part. Hence, while this calibration of the independent Metropolis-within-Gibbs sampler is worth the study and almost automatically applicable, it only applies to the category of problems where blocking can take place, as in the disease models illustrating the paper, and it requires an adequate choice of proposal distribution for, otherwise, the above quote becomes inappropriate.
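As a toy check of the recipe, here is a minimal R sketch of a block independent Metropolis-within-Gibbs sampler on a product of standard normal components with an N(0, 1.5²) independent proposal. The closed-form symmetrised Kullback-Leibler divergence I, the resulting block size k ≈ 2.835/I, and all names are my own illustrative choices; the sketch only lets one compare the realised acceptance rate with the 0.234 benchmark, and is not the authors' implementation.

```r
set.seed(2)

## toy product target: n i.i.d. N(0,1) components; independent proposal q = N(0, s^2)
n <- 100; s <- 1.5

## symmetrised KL divergence between N(0,1) and N(0,s^2):
## KL(f||q) + KL(q||f) = 1/(2 s^2) + s^2/2 - 1
I <- 1 / (2 * s^2) + s^2 / 2 - 1
k <- max(1, round(2.835 / I))      # block size suggested by the 2.835/I rule

x <- rnorm(n)                      # current state of the n components
acc <- 0; niter <- 1e4
for (t in 1:niter) {
  idx  <- sample.int(n, k)         # block of k components to update
  prop <- rnorm(k, 0, s)           # independent proposal for that block
  # log MH ratio: f(prop) q(old) / { f(old) q(prop) }, component-wise
  logr <- sum(dnorm(prop, log = TRUE) - dnorm(prop, 0, s, log = TRUE)) -
          sum(dnorm(x[idx], log = TRUE) - dnorm(x[idx], 0, s, log = TRUE))
  if (log(runif(1)) < logr) { x[idx] <- prop; acc <- acc + 1 }
}
c(block_size = k, acceptance_rate = acc / niter)
```

With these (assumed) values, I ≈ 0.35 and hence k = 8, and the realised acceptance rate can be set against the 0.234 benchmark.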
Questions on the parallel Rao-Blackwellisation
Posted in R, Statistics, University life with tags acceleration of MCMC algorithms, block sampling, MCMC algorithms, Monte Carlo Statistical Methods, parallel processing, simulation on December 22, 2010 by xi'an

Pierre Jacob and I got this email from a student about our parallel Rao-Blackwellisation paper. Here are some parts of the questions and our answer:
Although I understand how the strategy proposed in the paper helps with variance reduction, I do not understand why you set b=1 (mentioned in Section 3.2) and why it plays no role in the comparison. If b=1 and p=4, for example, every chain (out of the 4 chains of the block) has a length of 4 samples, and there are no additional blocks. How is this length enough for the sampler to converge and sample the distribution adequately? You mention that you do 10000 independent runs (which means you do the above procedure 10000 times independently), but aren’t all these runs equally short and inadequate? Even for p=100, aren’t the chains too short?
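For readers unfamiliar with the underlying idea, and leaving aside the parallel block structure (b, p) the question refers to, here is a much-simplified R illustration of Rao-Blackwellising an independent Metropolis-Hastings estimator by averaging over the accept/reject step instead of using the realised chain values. The target, proposal, test function h and run length are arbitrary choices of mine; this is not the scheme of the paper, only a pointer to why such conditioning reduces variance.

```r
set.seed(3)

## simplified Rao-Blackwellisation of an independent MH estimator of E[h(X)]:
## condition on the current proposal and average over the accept/reject uniform
f <- function(x) dnorm(x, 0, 1)        # target density
q <- function(x) dnorm(x, 0, 2)        # independent proposal density
h <- function(x) x^2                   # function whose expectation is estimated

N <- 1e4
x <- 0
est_plain <- est_rb <- numeric(N)
for (t in 1:N) {
  y <- rnorm(1, 0, 2)                            # proposal from q
  alpha <- min(1, f(y) * q(x) / (f(x) * q(y)))   # IMH acceptance probability
  est_rb[t] <- alpha * h(y) + (1 - alpha) * h(x) # average over accept/reject
  if (runif(1) < alpha) x <- y                   # standard accept/reject move
  est_plain[t] <- h(x)
}
c(plain = mean(est_plain), rao_blackwellised = mean(est_rb))  # both near 1
```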