Archive for Markov chain Monte Carlo algorithm

normal variates in Metropolis step

Posted in Books, Kids, R, Statistics, University life on November 14, 2017 by xi'an

A definitely puzzled participant on X validated, confusing the Normal variate (or variable) used in the random walk Metropolis-Hastings step with its Normal density… It took some cumulative effort to point out the distinction, especially as the originator of the question had rather strong a priori convictions about his or her own background:

“I take issue with your assumption that advice on the Metropolis Algorithm is useless to me because of my ignorance of variates. I am currently taking an experimental course on Bayesian data inference and I’m enjoying it very much, i believe i have a relatively good understanding of the algorithm, but i was unclear about this specific.”

This despite pondering the meaning of the call to rnorm(1)… I will keep this question in store to use in class when I teach Metropolis-Hastings in a couple of weeks.
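Since the distinction is easily made concrete in code, here is a minimal sketch (in Python rather than R, with a toy standard Normal target of my choosing): the Normal variate is the random draw produced by rnorm(1), whereas the Normal density would only enter through dnorm, and even then cancels out for a symmetric random walk proposal.

```python
import math
import random

def log_target(x):
    # toy target: standard Normal log-density, up to a constant
    return -0.5 * x * x

def rw_metropolis(n_iter=10000, scale=1.0, seed=42):
    random.seed(seed)
    x = 0.0
    chain = [x]
    for _ in range(n_iter):
        # the Normal *variate*: a random draw, the analogue of R's rnorm(1)
        prop = x + random.gauss(0.0, scale)
        # the Normal *density* of the proposal never appears below:
        # the random walk is symmetric, so it cancels in the ratio
        if math.log(random.random()) < log_target(prop) - log_target(x):
            x = prop
        chain.append(x)
    return chain

chain = rw_metropolis()
```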

Bouncing bouncy particle papers

Posted in Books, pictures, Statistics, University life on July 27, 2017 by xi'an

Yesterday, two papers on bouncy particle samplers appeared simultaneously on arXiv, arXiv:1707.05200 by Chris Sherlock and Alex Thiery, and arXiv:1707.05296 by Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet, a coordinated move by the two groups of authors, who had met in the weeks before at the Isaac Newton Institute in Cambridge.

The paper by Sherlock and Thiery, entitled a discrete bouncy particle sampler, considers a delayed rejection approach that only requires point-wise evaluations of the target density. The delay consists in making a speed-flip move only after a proposal involving a flip in the speed and a drift in the variable of interest has been rejected. To achieve guaranteed ergodicity, they add a random perturbation as in our recent paper, plus another perturbation based on a Brownian argument. Given that this is a discretised version of the continuous-time bouncy particle sampler, the discretisation step δ needs to be calibrated. The authors follow a rather convoluted argument to argue in favour of seeking a maximum number of reflections (for which I have obviously no intuition). Overall, I find it hard to assess how much of an advance this is, even when simulations support the notion of geometric convergence.
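To fix ideas on the drift-plus-flip mechanism (not the authors' algorithm, merely a caricature in the spirit of Gustafson's guided walk, with a toy Normal target and perturbation scheme of my choosing): drift along the current speed, flip the speed upon rejection, and occasionally refresh it for ergodicity.

```python
import math
import random

def log_target(x):
    # toy Gaussian target (illustration only)
    return -0.5 * x * x

def discrete_bps(n_iter=5000, delta=0.5, seed=1):
    # hypothetical sketch of a discrete bouncy-particle-style move
    random.seed(seed)
    x, v = 0.0, 1.0
    chain = [x]
    for _ in range(n_iter):
        prop = x + delta * v
        if math.log(random.random()) < log_target(prop) - log_target(x):
            x = prop          # accept the drift, keep the speed
        else:
            v = -v            # delayed-rejection stage: flip the speed
        if random.random() < 0.1:
            v = random.gauss(0.0, 1.0)  # occasional speed refreshment
        chain.append(x)
    return chain

chain = discrete_bps()
```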

“Our results provide a cautionary example that in certain high-dimensional scenarios, it is still preferable to perform refreshment even when randomized bounces are used.” Vanetti et al.

The paper by Paul Vanetti and co-authors has a much more ambitious scale in that it unifies most of the work done so far in this area and relates piecewise deterministic processes, Hamiltonian Monte Carlo, and discrete versions, containing on top fine convergence results. The main idea is to improve upon the existing deterministic methods by taking the target density (more) into account. Hence the use of a bouncy particle sampler associated with the Hamiltonian (as in HMC). This borrows from an earlier slice sampler idea of Iain Murray, Ryan Adams, and David MacKay (AISTATS 2010), exploiting an exact Hamiltonian dynamics for an approximation to the true target in order to explore its support. Except that bouncing somewhat avoids the slice step. The [eight] discrete bouncy particle samplers derived from this framework are both correct against the targeted distribution and do not require the simulation of event times. The paper distinguishes between global and local versions, the latter exploiting conditional independence properties in the (augmented) target. Which sounds like a version of multiple slice sampling.

adaptive exchange

Posted in Books, Statistics, University life on October 27, 2016 by xi'an

In the March 2016 issue of JASA that currently sits on my desk, there is a paper by Liang, Jin, Song and Liu on the adaptive exchange algorithm, which aims at handling posteriors for sampling distributions with intractable normalising constants. The concept behind the algorithm is the exchange principle initiated by Jesper Møller and co-authors in 2006, where an auxiliary pseudo-observation is simulated for the missing constants to vanish in a Metropolis-Hastings ratio. (The name exchange algorithm was introduced in a subsequent paper by Iain Murray, Zoubin Ghahramani and David MacKay, also in 2006.)

The crux of the method is to run an iteration as follows [where y denotes the observation]:

  1. propose a new value θ’ of the parameter from a proposal q(θ’|θ);
  2. generate a pseudo-observation z ~ ƒ(z|θ’);
  3. accept with probability

\dfrac{\pi(\theta')f(y|\theta')}{\pi(\theta)f(y|\theta)}\dfrac{q(\theta|\theta')f(z|\theta)}{q(\theta'|\theta)f(z|\theta')}

which has the appeal of cancelling all normalising constants. And the drawback of requiring an exact simulation from the very distribution with the missing constant, ƒ(.|θ). Which means that in practice a finite number of MCMC steps will be used instead and will bias the outcome. The algorithm is unusual in that it replaces the exact proposal q(θ’|θ) with an unbiased random version q(θ’|θ)ƒ(z|θ’), z being just an augmentation of the proposal. (The current JASA paper by Liang et al. seems to confuse augment and argument, see p.378.)
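For intuition, the basic exchange step can be sketched on a toy model where exact simulation from ƒ(.|θ’) is possible: an unnormalised exponential kernel with an Exp(1) prior and a log-normal random walk proposal (all choices mine, not from the paper). The normalising constant θ is deliberately dropped everywhere, yet the chain targets the correct posterior.

```python
import math
import random

def log_unnorm_f(y, theta):
    # unnormalised kernel f(y|θ) ∝ exp(-θ y) on y > 0;
    # the normalising constant θ is deliberately ignored
    return -theta * y

def exchange_step(theta, y, scale=0.5):
    # one exchange move for a single observation y (toy sketch)
    prop = theta * math.exp(random.gauss(0.0, scale))  # log-normal walk
    z = random.expovariate(prop)   # exact pseudo-observation z ~ f(·|θ')
    log_ratio = (
        -prop + theta                                    # Exp(1) prior ratio
        + log_unnorm_f(y, prop) - log_unnorm_f(y, theta) # data term
        + log_unnorm_f(z, theta) - log_unnorm_f(z, prop) # exchange term
        + math.log(prop) - math.log(theta)               # proposal Jacobian
    )
    if math.log(random.random()) < log_ratio:
        return prop
    return theta

# demo run: with y = 2, the posterior is Gamma(2, 3), mean 2/3
random.seed(0)
theta, samples = 1.0, []
for _ in range(20000):
    theta = exchange_step(theta, y=2.0)
    samples.append(theta)
```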

To avoid the difficulty in simulating from ƒ(.|θ), the authors draw pseudo-observations from sampling distributions with a finite number m of parameter values, under the [unrealistic] assumption (A⁰) that this collection of values provides an almost complete cover of the posterior support. One of the tricks lies in an auxiliary [time-heterogeneous] chain of pseudo-observations generated by single Metropolis steps from one of these m fixed targets. These pseudo-observations are then used in the main (or target) chain to define the above exchange probability. The auxiliary chain is Markov but time-heterogeneous since the probabilities of accepting a move evolve with time according to a simulated annealing schedule. Which produces a convergent estimate of the m normalising constants. The main chain is not Markov in that it depends on the whole history of the auxiliary chain [see Step 5, p.380]. Even jointly, the collection of both chains is not Markov. The paper prefers to consider the process as an adaptive Markov chain. I did not check the rather intricate details, so cannot judge the validity of the overall algorithm; I simply note that one condition (A², p.383) is incredibly strong in that it assumes the Markov transition kernel to be uniformly Doeblin on any compact set of the calibration parameters. However, the major difficulty with this approach seems to be its delicate calibration. From providing a reference set of m parameter values scanning the posterior support, to picking transition kernels on both the parameter and the sample spaces, to properly cooling the annealing schedule [always a fun part!], there seems to be [from my armchair expert's perspective, of course!] a wide range of opportunities for missing the target or running into zero acceptance problems.
Both examples analysed in the paper, the auto-logistic and the auto-normal models, are actually of limited complexity in that they depend on a few parameters, 2 and 4 respectively, and enjoy sufficient statistics of dimension 2 and 4 as well. Hence simulating (pseudo-)realisations of those sufficient statistics should be less challenging than the original approach of replicating an entire vector of thousands of dimensions.

stability of noisy Metropolis-Hastings

Posted in Statistics on September 28, 2016 by xi'an

Felipe Medina-Aguayo, Anthony Lee and Gareth Roberts (all at Warwick University) have recently published—even though the paper was accepted a year ago—a paper in Statistics and Computing about a variant to the pseudo-marginal Metropolis-Hastings algorithm. The modification is to simulate a new estimate of the likelihood or posterior at the current value of the Markov chain at every iteration, rather than recycling the current estimate. The reason for this refreshment of the weight estimate is to prevent stickiness in the chain, which occurs when a random weight leads to a very high value of the posterior. Unfortunately, this change leads to a Markov chain with the wrong stationary distribution. When this stationary distribution exists! The paper actually produces examples of transient noisy chains, even in simple cases such as a geometric target distribution. And even when taking the average of a large number of weights. But the paper also contains sufficient conditions, like negative weight moments or uniform ergodicity of the proposal, for the noisy chain to be geometrically ergodic. Even though the applicability of those conditions to complex targets is not always obvious.
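A toy Python sketch of the noisy variant may help (my own choices throughout: a Normal toy target and a hypothetical mean-one lognormal weight): the only point is that both the numerator and denominator estimates are refreshed at each iteration, unlike pseudo-marginal MH, which would store and reuse the current estimate.

```python
import math
import random

def log_like(x):
    # toy log-likelihood (illustration only)
    return -0.5 * x * x

def log_noisy_weight(sigma=0.5):
    # log of a multiplicative lognormal weight with E[W] = 1
    # (mean -sigma^2/2 on the log scale)
    return random.gauss(-0.5 * sigma ** 2, sigma)

def noisy_mh(n_iter=5000, scale=1.0, seed=3):
    random.seed(seed)
    x = 0.0
    chain = [x]
    for _ in range(n_iter):
        prop = x + random.gauss(0.0, scale)
        # noisy variant: *both* estimates are refreshed every iteration;
        # pseudo-marginal MH would instead keep the current one fixed
        log_num = log_like(prop) + log_noisy_weight()
        log_den = log_like(x) + log_noisy_weight()
        if math.log(random.random()) < log_num - log_den:
            x = prop
        chain.append(x)
    return chain

chain = noisy_mh()
```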

hypothesis testing for MCMC

Posted in Books, Statistics, University life on October 6, 2014 by xi'an

A recent arXival by Benjamin Gyori and Daniel Paulin considers sequential testing based on MCMC simulation. The test is about an expectation under the target and stationary distribution of the Markov chain (i.e., the posterior in a Bayesian setting). Hence testing whether or not the posterior expectation is below a certain bound is not directly relevant from a Bayesian perspective. One would test instead whether or not the parameter itself is below the bound… The paper is then more a study of sequential tests when the data is a Markov chain than one with any clear connection to MCMC topics. Despite the paper including an example of a Metropolis-Hastings scheme for approximating the posterior on the parameters of an ODE. I am a bit puzzled by the purpose of the test, as I was rather expecting tests connected with the convergence of the Markov chain or of the empirical mean. (But, given the current hour, I may also have missed a crucial point!)