Archive for Metropolis-Hastings algorithm

minibatch acceptance for Metropolis-Hastings

Posted in Books, Statistics with tags , , , , , on January 12, 2018 by xi'an

An arXival that appeared last July by Seita, Pan, Chen, and Canny, and that relates to my current interest in speeding up MCMC. And to 2014 papers by  Korattikara et al., and Bardenet et al. Published in Uncertainty in AI by now. The authors claim that their method requires less data per iteration than earlier ones…

“Our test is applicable when the variance (over data samples) of the log probability ratio between the proposal and the current state is less than one.”

By test, the authors mean a mini-batch formulation of the Metropolis-Hastings acceptance ratio in the (special) setting of iid data. First they use Barker’s version of the acceptance probability instead of Metropolis’. Second, they use a Gaussian approximation to the distribution of the logarithm of the Metropolis ratio for the minibatch, while the Barker acceptance step corresponds to comparing a logistic perturbation of the logarithm of the Metropolis ratio against zero. Which amounts to compare the logarithm of the Metropolis ratio for the minibatch, perturbed by a logistic minus Normal variate. (The cancellation of the Normal in eqn (13) is a form of fiducial fallacy, where the Normal variate has two different meanings. In other words, the difference of two Normal variates is not equal to zero.) However, the next step escapes me as the authors seek to optimise the distribution of this logistic minus Normal variate. Which I thought was uniquely defined as such a difference. Another constraint is that the estimated variance of the log-likelihood ratio gets below one. (Why one?) The argument is that the average of the individual log-likelihoods is approximately Normal by virtue of the Central Limit Theorem. Even when randomised. While the illustrations on a Gaussian mixture and on a logistic regression demonstrate huge gains in computational time, it is unclear to me to which amount one can trust the approximation for a given model and sample size…

multinomial resampling by Metropolis

Posted in Books, Statistics with tags , , , , , on December 28, 2017 by xi'an

A few years ago Lawrence Murray wrote a note on accelerating the resampling stage in particle filters by using a Metropolis step. And GPUs. The notion that Metropolis can be applied in this setting is at first puzzling since exact multinomial sampling is available. And Metropolis requires convergence guarantees. Which Lawrence covers by a Raftery and Lewis assessment, which has severe limitations in general but may well be adequate for this very case, although possibly too conservative in the number of recommended Metropolis iterations. The gain brought by Metropolis is that it does not require summing up all the particle weights, and as a result the gain is real in that Metropolis beats all other approaches (time-wise) when the number of particles is not too large and the heterogeneity of the weighs not too  high. (I did not know of this note until Richard Everitt brought it to my attention.)

random wake

Posted in Books, Kids, R, Statistics with tags , , , , , , on December 27, 2017 by xi'an

Just too often on X validated, one sees questions displaying a complete ignorance of the basics that makes one purposelessly wonder what is the point of trying to implement advanced methods when missing the necessary background. And just as often, I reacted to the question by wondering out loud about this… In the current case, the question was about debugging an R code for a mixture of two exponential distributions and the random walk Metropolis algorithm that comes with it. Except that the Normal noise was replaced with a Uniform U(0,1) noise, leading to a most obviously transient Markov chain.I eventually corrected the R code, which returned a rather nicely label-switching output. And which did not necessarily help with the comprehension of the fundamentals.

normal variates in Metropolis step

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on November 14, 2017 by xi'an

A definitely puzzled participant on X validated, confusing the Normal variate or variable used in the random walk Metropolis-Hastings step with its Normal density… It took some cumulated efforts to point out the distinction. Especially as the originator of the question had a rather strong a priori about his or her background:

“I take issue with your assumption that advice on the Metropolis Algorithm is useless to me because of my ignorance of variates. I am currently taking an experimental course on Bayesian data inference and I’m enjoying it very much, i believe i have a relatively good understanding of the algorithm, but i was unclear about this specific.”

despite pondering the meaning of the call to rnorm(1)… I will keep this question in store to use in class when I teach Metropolis-Hastings in a couple of weeks.

Barker at the Bernoulli factory

Posted in Books, Statistics with tags , , , , , , , on October 5, 2017 by xi'an

Yesterday, Flavio Gonçalves, Krzysztof Latuszýnski, and Gareth Roberts (Warwick) arXived a paper on Barker’s algorithm for Bayesian inference with intractable likelihoods.

“…roughly speaking Barker’s method is at worst half as good as Metropolis-Hastings.”

Barker’s acceptance probability (1965) is a smooth if less efficient version of Metropolis-Hastings. (Barker wrote his thesis in Adelaide, in the Mathematical Physics department. Most likely, he never interacted with Ronald Fisher, who died there in 1962) This smoothness is exploited by devising a Bernoulli factory consisting in a 2-coin algorithm that manages to simulate the Bernoulli variable associated with the Barker probability, from a coin that can simulate Bernoulli’s with probabilities proportional to [bounded] π(θ). For instance, using a bounded unbiased estimator of the target. And another coin that simulates another Bernoulli on a remainder term. Assuming the bound on the estimate of π(θ) is known [or part of the remainder term]. This is a neat result in that it expands the range of pseudo-marginal methods (and resuscitates Barker’s formula from oblivion!). The paper includes an illustration in the case of the far-from-toyish Wright-Fisher diffusion. [Making Fisher and Barker meeting, in the end!]

non-reversible Langevin samplers

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , on February 6, 2017 by xi'an

In the train to Oxford yesterday night, I read through the recently arXived Duncan et al.’s Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation. Standing up the whole trip in the great tradition of British trains.

The paper is fairly theoretical and full of Foster-Lyapunov assumptions but aims at defending an approach based on a non-reversible diffusion. One idea is that the diffusion based on the drift {∇ log π(x) + γ(x)} is associated with the target π provided

∇ . {π(x)γ(x)} = 0

which holds for the Langevin diffusion when γ(x)=0, but produces a non-reversible process in the alternative. The Langevin choice γ(x)=0 happens to be the worst possible when considering the asymptotic variance. In practice however the diffusion need be discretised, which induces an approximation that may be catastrophic for convergence if not corrected, and a relapse into reversibility if corrected by Metropolis. The proposal in the paper is to use a Lie-Trotter splitting I had never heard of before to split between reversible [∇ log π(x)] and non-reversible [γ(x)] parts of the process. The deterministic part is chosen as γ(x)=∇ log π(x) [but then what is the point since this is Langevin?] or as the gradient of a power of π(x). Although I was mostly lost by that stage, the paper then considers the error induced by a numerical integrator related with this deterministic part, towards deriving asymptotic mean and variance for the splitting scheme. On the unit hypercube. Although the paper includes a numerical example for the warped normal target, I find it hard to visualise the implementation of this scheme. Having obviously not heeded Nicolas’ and James’ advice, the authors also analyse the Pima Indian dataset by a logistic regression!)

weakly informative reparameterisations for location-scale mixtures

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , on January 19, 2017 by xi'an

fitted_density_galaxy_data_500itersWe have been working towards a revision of our reparameterisation paper for quite a while now and too advantage of Kate Lee visiting Paris this fortnight to make a final round: we have now arXived (and submitted) the new version. The major change against the earlier version is the extension of the approach to a large class of models that include infinitely divisible distributions, compound Gaussian, Poisson, and exponential distributions, and completely monotonic densities. The concept remains identical: change the parameterisation of a mixture from a component-wise decomposition to a construct made of the first moment(s) of the distribution and of component-wise objects constrained by the moment equation(s). There is of course a bijection between both parameterisations, but the constraints appearing in the latter produce compact parameter spaces for which (different) uniform priors can be proposed. While the resulting posteriors are no longer conjugate, even conditional on the latent variables, standard Metropolis algorithms can be implemented to produce Monte Carlo approximations of these posteriors.