Archive for pseudo-marginal MCMC

IMS workshop [day 5]

Posted in Books, pictures, Statistics, Travel on September 3, 2018 by xi'an

The last day of the starting workshop [and my last day in Singapore] was a day of importance [sampling] with talks by Matti Vihola opposing importance sampling and delayed acceptance and particle MCMC, related to several papers of his that I missed. To be continued in the coming weeks at the IMS, which is another reason to regret having to leave that early [as my Parisian semester starts this Monday with an undergrad class at 8:30!]

And then a talk by Joaquín Miguez on stabilizing importance sampling by truncation which reminded me very much of the later work by Andrew Gelman and Aki Vehtari on Pareto smoothed importance sampling, with further operators adapted to sequential settings and the similar drawback that when the importance sampler is poor, i.e., when the simulated points are all very far from the centre of mass, no amount of fudging with the weights will bring the points closer. AMIS made an appearance as a reference method, to be improved by this truncation of the weights, a wee bit surprising as it should bring the large weights of the earlier stages down.
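
To make the truncation idea concrete, here is a minimal sketch of weight clipping, and not necessarily the scheme presented in the talk. It assumes a N(0, 1) target, a deliberately too narrow N(0, 0.5²) proposal, and a clipping level of √n times the average weight; that rule and the names `truncated_weights` and `ess` are illustrative choices only.

```python
import math
import random

def truncated_weights(log_w):
    """Clip importance weights at sqrt(n) times their mean (assumed rule)."""
    n = len(log_w)
    shift = max(log_w)                           # guard against overflow
    w = [math.exp(lw - shift) for lw in log_w]   # raw weights, up to a constant
    cap = math.sqrt(n) * sum(w) / n              # hypothetical truncation level
    return w, [min(v, cap) for v in w]

def ess(w):
    """Effective sample size (sum w)^2 / sum w^2."""
    s = sum(w)
    return s * s / sum(v * v for v in w)

rng = random.Random(1)
scale = 0.5  # proposal N(0, 0.5^2), too narrow for the N(0, 1) target
xs = [rng.gauss(0.0, scale) for _ in range(5000)]
# log target minus log proposal, both up to additive constants
log_w = [-0.5 * x * x + 0.5 * (x / scale) ** 2 for x in xs]
raw, clipped = truncated_weights(log_w)
print(ess(raw), ess(clipped))
```

Clipping can only raise the effective sample size, but the simulated points `xs` stay exactly where they were, which is the drawback noted above.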

Followed by an almost silent talk by Nick Whiteley, who, having lost his voice to the air conditioning, whispered his talk into the microphone. Having once faced a lost voice during an introductory lecture to a large undergraduate audience, I could not but fully commiserate over the hardship of the task. Although this made the audience most silent and attentive. His topic was the Viterbi process and its parallelisation, by using a truncated horizon (presenting a connection with overdamped Langevin, e.g., Durmus and Moulines, and Dalalyan).

And due to a pressing appointment with my son and his girlfriend [who were traveling through Singapore on that day] for a chili crab dinner on my way to the airport, I missed the final talk by Arnaud Doucet, where he was to reconsider PDMP algorithms without the continuous time layer, a perspective I find most appealing!

Overall, this was a quite diverse and rich [starting] seminar, backed by the superb organisation of the IMS and the smooth living conditions on the NUS campus [once I had mastered the bus routes], which would have made much more sense for me as part of a longer stay, which is actually what happened the previous time I visited the IMS (in 2005), again clashing with my course schedule at home… And as always, I am impressed with the city-state of Singapore, for the highly diverse food scene in particular, but also this [maybe illusory] impression of coexistence between communities. And even though the ecological footprint could certainly be decreased, there are measures to curb car ownership (with a 150% purchase tax) and use (with congestion charges).

IMS workshop [day 4]

Posted in pictures, Statistics, Travel, University life on August 31, 2018 by xi'an

While I did not repeat the mistake of yesterday morning, just as well because the sun was unbearably strong!, I managed this time to board a bus headed in the wrong direction and as a result went through several remote NUS campi! Missing the first talk of the day as a result. By Youssef Marzouk, with a connection between sequential Monte Carlo and optimal transport. Transport for sampling, that is. The following talk by Tiangang Cui was however related, with Marzouk a co-author, as it aimed at finding linear transforms towards creating Normal approximations to the target to be used as proposals in Metropolis algorithms. Which may sound like something already tried a zillion times in the MCMC literature, except that the setting was rather specific to some inverse problems, imposing a generalised Normal structure on the transform, then optimised by transport arguments. It is unclear to me [from just attending the talk] how complex this derivation is and how dimension steps in, but the produced illustrations were quite robust to an increase in dimension.

The remaining talks for the day were mostly particular, from Anthony Lee introducing a new and almost costless way of producing variance estimates in particle filters, exploiting only the ancestry of particles, to Mike Pitt discussing the correlated pseudo-marginal algorithm developed with George Deligiannidis and Arnaud Doucet. Which somewhat paradoxically managed to fight the degeneracy [i.e., the need for a number of terms increasing like the time index T] found in independent pseudo-marginal resolutions, moving down to almost log(T)… With an interesting connection to the quasi SMC approach of Mathieu and Nicolas. And Sebastian Reich also stressed the links with optimal transport in a talk about data assimilation that was way beyond my reach. The day concluded with fireworks, through a magistral lecture by Professeur Del Moral on a continuous time version of PMCMC using the Feynman-Kac terminology. Pierre did a superb job during his lecture towards leading the whole room to the conclusion.
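
As a reminder of what "correlated" means here, a minimal sketch of the device as it is usually described: the standard Normal variates behind the likelihood estimate are refreshed by an autoregressive move instead of being redrawn from scratch, so that successive estimates are positively correlated. The function names and the value ρ = 0.9 are illustrative assumptions, and the sketch covers the refresh step only, not the whole sampler.

```python
import math
import random

def refresh_auxiliary(u, rho, rng):
    """Propose u' = rho*u + sqrt(1-rho^2)*eps, which leaves N(0, I) invariant."""
    s = math.sqrt(1.0 - rho * rho)
    return [rho * ui + s * rng.gauss(0.0, 1.0) for ui in u]

def correlation(a, b):
    """Empirical correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(2)
u = [rng.gauss(0.0, 1.0) for _ in range(20000)]
u_new = refresh_auxiliary(u, rho=0.9, rng=rng)
print(correlation(u, u_new))  # close to rho
```

Setting ρ = 0 recovers the independent pseudo-marginal refresh, while ρ close to one keeps the noise in numerator and denominator of the acceptance ratio nearly identical.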

JSM 2018 [#3]

Posted in Mountains, Statistics, Travel, University life on August 1, 2018 by xi'an

As I skipped day #2 for climbing, here I am on day #3, attending JSM 2018, with a [fully Canadian!] session on (conditional) copula (where Bruno Rémillard talked of copulas for mixed data, with unknown atoms, which sounded like an impossible target!), and another on four highlights from Bayesian Analysis (the journal), with Maria Terres defending the (often ill-considered!) spectral approach within Bayesian analysis, modelling spectral densities (Fourier transforms of correlation functions, not probability densities), an advantage compared with MCAR modelling being the automated derivation of dependence graphs. While the spectral ghost did not completely dissipate for me, the use of DIC that she mentioned at the very end seems to call for investigation as I do not know of well-studied cases of complex dependent data with clearly specified DICs. Then Chris Drovandi was speaking of ABC being used for prior choice, an idea I vaguely remember seeing quite a while ago as a referee (or in another paper!), from a paper in BA that I missed (and obviously did not referee). Using the same reference table works (for simple ABC) with different datasets but also different priors. I did not get at first the notion that the reference table also produces an evaluation of the marginal distribution but indeed the entire simulation from prior × generative model gives a Monte Carlo representation of the marginal, hence the evidence at the observed data. Borrowing from Evans’ fringe Bayesian approach to model choice by prior predictive check for prior-model conflict. I remain sceptical or at least agnostic on the notion of using data to compare priors. And here on using ABC in tractable settings.
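
The reference-table remark can be checked on a toy case. This is a rough sketch under assumptions that are not in the talk: a Normal prior on θ, a single observation y given θ distributed as N(θ, 1), a uniform ABC kernel of half-width `eps`, and a second prior handled by reweighting the rows with the prior ratio. All names are hypothetical.

```python
import math
import random

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def reference_table(n, prior_sd, rng):
    """Rows (theta, y) with theta ~ N(0, prior_sd^2) and y | theta ~ N(theta, 1)."""
    table = []
    for _ in range(n):
        theta = rng.gauss(0.0, prior_sd)
        table.append((theta, rng.gauss(theta, 1.0)))
    return table

def abc_evidence(table, y_obs, eps, weight=lambda theta: 1.0):
    """Weighted share of rows within eps of y_obs, rescaled by the window width."""
    hits = sum(weight(theta) for theta, y in table if abs(y - y_obs) < eps)
    return hits / (len(table) * 2.0 * eps)

rng = random.Random(3)
table = reference_table(100000, prior_sd=1.0, rng=rng)
m1 = abc_evidence(table, y_obs=1.0, eps=0.1)
# Same table, second prior N(0, 0.5^2), through the prior ratio as weight.
ratio = lambda t: normal_pdf(t, 0.5) / normal_pdf(t, 1.0)
m2 = abc_evidence(table, y_obs=1.0, eps=0.1, weight=ratio)
print(m1, m2)
```

In this conjugate case the exact marginals are N(0, 2) and N(0, 1.25), so both approximations can be compared with the truth, which is precisely the kind of tractable setting where ABC is not needed.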

The afternoon session was [a mostly Australian] Advanced Bayesian computational methods, with Robert Kohn on variational Bayes, with an interesting comparison of (exact) MCMC and (approximate) variational Bayes results for some species intensity and the remark that forecasting may be much more tolerant to the approximation than estimation. Making me wonder at a possibility of assessing VB on the marginals manageable by MCMC. Unless I miss a complexity such that the decomposition is impossible. And Antonietta Mira on time-evolving networks estimated by ABC (which Anto first showed me in Orly airport, waiting for her plane!). With a possibility of a zero distance. Next talk by Nadja Klein on implicit copulas, linked with shrinkage properties I was unaware of, including the case of spike & slab copulas. Michael Smith also spoke of copulas with discrete margins, mentioning a version with continuous latent variables (as I thought could be done during the first session of the day), then moving to variational Bayes which sounds quite popular at JSM 2018. And David Gunawan made a presentation of a paper mixing pseudo-marginal Metropolis with particle Gibbs sampling, written with Chris Carter and Robert Kohn, making me wonder at their feature of using the white noise as an auxiliary variable in the estimation of the likelihood, which is quite clever but seems to go against the validation of the pseudo-marginal principle. (Warning: I have been known to be wrong!)

Langevin on a wrong bend

Posted in Books, Statistics on October 19, 2017 by xi'an

Arnak Dalalyan and Avetik Karagulyan (CREST) arXived a paper the other week on a focussed study of the Langevin algorithm [not MALA] when the gradient of the target is incorrect. With the following improvements [quoting non-verbatim from the paper]:

  1. a varying-step Langevin that reduces the number of iterations for a given Wasserstein precision, compared with recent results by e.g. Alain Durmus and Éric Moulines;
  2. an extension of convergence results for error-prone evaluations of the gradient of the target (i.e., the gradient is replaced with a noisy version, under some moment assumptions that do not include unbiasedness);
  3. a new second-order sampling algorithm termed LMCO’, with improved convergence properties.

What is particularly interesting to me in this setting is the use in all these papers of a discretised Langevin diffusion (a.k.a., random walk with a drift induced by the gradient of the log-target) without the original Metropolis correction. The results rely on an assumption of [strong?] log-concavity of the target, with “user-friendly” bounds on the Wasserstein distance depending on the constants appearing in this log-concavity constraint. And so does the adaptive step. (In the case of the noisy version, the bias and variance of the noise also matter. As pointed out by the authors, there is still applicability to scaling MCMC for large samples. Beyond pseudo-marginal situations.)
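
For concreteness, a minimal sketch of the uncorrected discretisation, assuming a standard Normal target (gradient of the log-target equal to −x), a constant step, and, as a simplification the paper does not require, zero-mean Gaussian noise on the gradient. The function name `ula` and its arguments are illustrative, not taken from the paper.

```python
import math
import random

def ula(grad_log_target, x0, step, n_iter, rng, grad_noise_sd=0.0):
    """Unadjusted Langevin: x <- x + step*grad + sqrt(2*step)*N(0,1), no accept/reject."""
    x, chain = x0, []
    for _ in range(n_iter):
        # Optional inaccurate gradient: exact value plus zero-mean Gaussian noise.
        g = grad_log_target(x) + grad_noise_sd * rng.gauss(0.0, 1.0)
        x = x + step * g + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

rng = random.Random(4)
step = 0.1
chain = ula(lambda x: -x, x0=0.0, step=step, n_iter=50000, rng=rng)
mean = sum(chain) / len(chain)
var = sum((x - mean) ** 2 for x in chain) / len(chain)
print(mean, var, 1.0 / (1.0 - step / 2.0))
```

For this target the chain is an AR(1) process with stationary variance 1/(1 − h/2) rather than 1, a small illustration of the bias that the Wasserstein bounds have to account for when the Metropolis correction is dropped.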

“…this, at first sight very disappointing behavior of the LMC algorithm is, in fact, continuously connected to the exponential convergence of the gradient descent.”

The paper concludes with an interesting mise en parallèle of Langevin algorithms and of gradient descent algorithms, since the convergence rates are the same.

Barker at the Bernoulli factory

Posted in Books, Statistics on October 5, 2017 by xi'an

Yesterday, Flavio Gonçalves, Krzysztof Łatuszyński, and Gareth Roberts (Warwick) arXived a paper on Barker’s algorithm for Bayesian inference with intractable likelihoods.

“…roughly speaking Barker’s method is at worst half as good as Metropolis-Hastings.”

Barker’s acceptance probability (1965) is a smooth if less efficient version of Metropolis-Hastings. (Barker wrote his thesis in Adelaide, in the Mathematical Physics department. Most likely, he never interacted with Ronald Fisher, who died there in 1962.) This smoothness is exploited by devising a Bernoulli factory consisting in a 2-coin algorithm that manages to simulate the Bernoulli variable associated with the Barker probability, from a coin that can simulate Bernoullis with probabilities proportional to [bounded] π(θ). For instance, using a bounded unbiased estimator of the target. And another coin that simulates another Bernoulli on a remainder term. Assuming the bound on the estimate of π(θ) is known [or part of the remainder term]. This is a neat result in that it expands the range of pseudo-marginal methods (and resuscitates Barker’s formula from oblivion!). The paper includes an illustration in the case of the far-from-toyish Wright-Fisher diffusion. [Making Fisher and Barker meet, in the end!]
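
A minimal sketch of a two-coin scheme of this kind, written from my understanding of the construction and not copied from the paper. It assumes the two quantities factor as c₁p₁ and c₂p₂ with known bounds c₁, c₂ and unknown p₁, p₂ in [0, 1] that can only be reached through coins. The helper `bounded_coin` is a hypothetical stand-in for a bounded unbiased estimator.

```python
import random

def bounded_coin(p, rng):
    """Coin with success probability p, built from a bounded unbiased estimate of p."""
    half_width = min(p, 1.0 - p)
    estimate = p + (2.0 * rng.random() - 1.0) * half_width  # in [0, 1], mean p
    return rng.random() < estimate

def two_coin(c1, coin1, c2, coin2, rng):
    """Return 1 with probability c1*p1 / (c1*p1 + c2*p2), else 0."""
    while True:
        if rng.random() < c1 / (c1 + c2):
            if coin1(rng):
                return 1          # first coin succeeded: accept
        elif coin2(rng):
            return 0              # second coin succeeded: reject
        # otherwise the selected coin failed: start over

rng = random.Random(5)
coin1 = lambda r: bounded_coin(0.2, r)
coin2 = lambda r: bounded_coin(0.4, r)
draws = [two_coin(3.0, coin1, 1.0, coin2, rng) for _ in range(20000)]
print(sum(draws) / len(draws))   # Barker probability here is 0.6
```

The loop never evaluates p₁ or p₂ themselves, only flips the coins, which is what makes the Barker ratio reachable when the target is only available through unbiased estimates.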

impressions from EcoSta2017 [guest post]

Posted in pictures, Statistics, Travel, University life on July 6, 2017 by xi'an

[This is a guest post on the recent EcoSta2017 (Econometrics and Statistics) conference in Hong Kong, contributed by Chris Drovandi from QUT, Brisbane.]

There were (at least) two sessions on Bayesian Computation at the recent EcoSta (Econometrics and Statistics) 2017 conference in Hong Kong. Below is my review of them. My overall impression of the conference is that there were lots of interesting talks, albeit a lot in financial time series, not my area. Even so I managed to pick up a few ideas/concepts that could be useful in my research. One criticism I had was that there were too many sessions in parallel, which made choosing quite difficult and some sessions very poorly attended. Another criticism of many participants I spoke to was that the location of the conference was relatively far from the city area.

In the first session (chaired by Robert Kohn), Minh-Ngoc Tran spoke about this paper on Bayesian estimation of high-dimensional Copula models with mixed discrete/continuous margins. Copula models with all continuous margins are relatively easy to deal with, but when the margins are discrete or mixed there are issues with computing the likelihood. The main idea of the paper is to re-write the intractable likelihood as an integral over a hypercube of ≤J dimensions (where J is the number of variables), which can then be estimated unbiasedly (with variance reduction by using randomised quasi-MC numbers). The paper develops advanced (correlated) pseudo-marginal and variational Bayes methods for inference.

In the following talk, Chris Carter spoke about different types of pseudo-marginal methods, particle marginal Metropolis-Hastings and particle Gibbs for state space models. Chris suggests that a combination of these methods into a single algorithm can further improve mixing.

Russian roulette still rolling

Posted in Statistics on March 22, 2017 by xi'an

Colin Wei and Iain Murray arXived a new version of their paper on doubly-intractable distributions, which is to be presented at AISTATS. It builds upon the Russian roulette estimator of Lyne et al. (2015), which itself exploits the debiasing technique of McLeish et al. (2011) [found earlier in the physics literature as in Carter and Cashwell, 1975, according to the current paper]. Such an unbiased estimator of the inverse of the normalising constant can be used for pseudo-marginal MCMC, except that the estimator is sometimes negative and has to be so, as proved by Pierre Jacob and co-authors. As I discussed in my post on the Russian roulette estimator, replacing the negative estimate with its absolute value does not seem right because a negative value indicates that the quantity is close to zero, hence replacing it with zero would sound more appropriate. Wei and Murray start from the property that, while the expectation of the importance weight is equal to the normalising constant, the expectation of the inverse of the importance weight converges to the inverse of the weight for an MCMC chain. This however sounds like a harmonic mean estimate because the property would also stand for any substitute to the importance density, as it only requires the density to integrate to one… As noted in the paper, the variance of the resulting Roulette estimator “will be high” or even infinite. Following Glynn et al. (2014), the authors build a coupled version of that solution, whose key feature is to cut the higher order terms in the debiasing estimator. This does not guarantee finite variance or positivity of the estimate, though. In order to decrease the variance (assuming it is finite), backward coupling is introduced, with a Rao-Blackwellisation step using our 1996 Biometrika derivation. Which happens to be of lower cost than the standard Rao-Blackwellisation in that special case, O(N) versus O(N²), N being the stopping rule used in the debiasing estimator.
Under the assumption that the inverse importance weight has finite expectation [wrt the importance density], the resulting backward-coupling Russian roulette estimator can be proven to be unbiased, as it enjoys a finite expectation. (As in the generalised harmonic mean case, the constraint imposes thinner tails on the importance function, which then hampers the convergence of the MCMC chain.) No mention is made of achieving finite variance for those estimators, which again is a serious concern due to the similarity with harmonic means…
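
For readers unfamiliar with the generic roulette truncation (not the coupled version of Wei and Murray), here is a minimal sketch. It assumes independent unbiased draws Ẑ of the constant Z, a known constant c such that |1 − Ẑ/c| < 1, and a fixed survival probability q, and it relies on the series 1/Z = (1/c) Σₖ (1 − Z/c)ᵏ. The function name, the Uniform(1, 3) draws and the values of c and q are all illustrative.

```python
import random

def roulette_inverse(z_hat, c, q, rng):
    """Randomly truncated series estimate of 1/Z from unbiased draws z_hat(rng) of Z."""
    total, term = 1.0, 1.0              # k = 0 term of the series
    while rng.random() < q:             # survive to the next term with probability q
        # New factor from a fresh draw, reweighted by 1/q to keep unbiasedness.
        term *= (1.0 - z_hat(rng) / c) / q
        total += term
    return total / c

rng = random.Random(6)
z_hat = lambda r: r.uniform(1.0, 3.0)   # unbiased draws of Z = 2
draws = [roulette_inverse(z_hat, c=2.5, q=0.7, rng=rng) for _ in range(20000)]
print(sum(draws) / len(draws), min(draws))  # average should be near 1/Z = 0.5
```

Whenever a draw of Ẑ exceeds c the corresponding factor is negative, so individual estimates can turn negative even though their average is correct, which is the sign issue discussed above.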