Archive for pseudo-marginal MCMC

easy-to-use empirical likelihood ABC

Posted in Statistics, University life on October 23, 2018 by xi'an

A newly arXived paper from a group of researchers at NUS that I wish we had discussed when I was there last month. As we wrote this empirical likelihood ABC paper in PNAS with Kerrie Mengersen and Pierre Pudlo in 2012. Plus the SAME paper with Arnaud Doucet and Simon Godsill ten years earlier, which the authors prefer to call data cloning in continuation of the more recent Lele et al. (2007). They could actually have used my original denomination of prior feedback (1992? I remember presenting the idea at Camp Casella in Cornell that summer) as well! Actually, I am not certain invoking prior feedback is quite necessary since this is a form of simulated method of moments as well.

Now, did we really assume that some moments of the distribution were analytically available, although the likelihood was not?! Even before going through the paper, it dawned on me that these theoretical moments could have been simulated instead, since the model is a generative one: for a given parameter value, a direct Monte Carlo approximation to the exact moment can be produced and can serve as a constraint for the empirical likelihood definition. I am surprised and aggrieved that we did not think of this empirical likelihood version of a method of moments. Which is central to the current paper. In the sense that, were the parameter exact, the differences between the moments based on the actual data x⁰ and the moments based on m replicas of the simulated data x¹,x²,… have mean zero, meaning the moment constraint is immediately available. Meaning an empirical likelihood is easily constructed, replacing the actual likelihood in an MCMC scheme, albeit at a rather high computing cost. Congratulations to the authors for uncovering this possibility that we missed!
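To make the construction concrete, here is a minimal sketch of an empirical likelihood built on simulated moment differences, for a single scalar summary. It is only an illustration under my own assumptions and not the authors' implementation: the names log_el_ratio, log_el_abc, simulate_normal and mean are hypothetical, and the toy model is a Normal mean with the sample average as summary.

```python
import math
import random


def log_el_ratio(h):
    """Log empirical likelihood ratio under the constraint sum_i w_i h_i = 0.

    Maximises sum_i log(m w_i) over weights w_i > 0 with sum_i w_i = 1.
    Returns -inf when 0 is not strictly inside the range of the h_i's.
    """
    h_min, h_max = min(h), max(h)
    if not (h_min < 0.0 < h_max):
        return -math.inf
    # Optimal weights are w_i = 1 / (m (1 + lam h_i)), where lam solves
    # g(lam) = sum_i h_i / (1 + lam h_i) = 0, with g decreasing in lam
    # on the interval where all the terms 1 + lam h_i remain positive.
    lo = (-1.0 / h_max) * (1.0 - 1e-12)
    hi = (-1.0 / h_min) * (1.0 - 1e-12)
    for _ in range(200):  # plain bisection on the Lagrange multiplier
        lam = 0.5 * (lo + hi)
        g = sum(d / (1.0 + lam * d) for d in h)
        if g > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * d) for d in h)


def mean(x):
    return sum(x) / len(x)


def simulate_normal(theta, n, rng):
    # Hypothetical generative model: n iid N(theta, 1) observations.
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def log_el_abc(theta, x_obs, simulate, summary, m, rng):
    """Empirical likelihood at theta from m simulated replicas of the data."""
    s_obs = summary(x_obs)
    # Differences between simulated and observed summaries: mean zero
    # when theta is the parameter behind the observed data.
    h = [summary(simulate(theta, len(x_obs), rng)) - s_obs for _ in range(m)]
    return log_el_ratio(h)
```

The value returned by log_el_abc(theta, x_obs, simulate_normal, mean, m, rng) would then stand in for the log-likelihood inside a Metropolis-Hastings ratio, at the cost of m fresh simulations per parameter value.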

“The summary statistics in this example were judiciously chosen.”

One point in the paper on which I disagree with the authors is the argument that MCMC sampling based on an empirical likelihood can be seen as an implementation of the pseudo-marginal Metropolis-Hastings method. The major difference in my opinion is that there is no unbiasedness here (and no generic result that indicates convergence to the exact posterior as the number of simulations grows to infinity). The other point unclear to me is about the selection of summaries [or moments] for implementing the method, which seems to be based on their performances in the subsequent estimation, performances that are hard to assess properly in intractable likelihood cases. In the last example of stereological extremes (not covered in our paper), for instance, the output is compared with the parallel synthetic likelihood result.
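For contrast, here is a toy sketch of what a genuine pseudo-marginal Metropolis-Hastings step requires, namely a non-negative and unbiased estimate of the target, recycled for the current value of the chain. Everything in it is an assumption of mine for illustration purposes: the function name pseudo_marginal_mh, the log-normal multiplicative noise with unit mean, and the random-walk proposal.

```python
import math
import random


def pseudo_marginal_mh(log_target, n_iter, noise_sd, step, rng, x0=0.0):
    """Random-walk pseudo-marginal Metropolis-Hastings on the real line.

    The target is only seen through pi_hat(x) = pi(x) * W, where
    W = exp(noise_sd * Z - noise_sd**2 / 2), Z ~ N(0, 1), so that E[W] = 1.
    """
    def log_estimate(x):
        z = rng.gauss(0.0, 1.0)
        return log_target(x) + noise_sd * z - 0.5 * noise_sd ** 2

    x, log_hat = x0, log_estimate(x0)
    chain = []
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)
        log_hat_y = log_estimate(y)  # fresh estimate at the proposal only
        if math.log(1.0 - rng.random()) < log_hat_y - log_hat:
            x, log_hat = y, log_hat_y  # the accepted estimate is recycled
        chain.append(x)
    return chain
```

Because E[W] = 1, the chain targets the exact distribution whatever the noise level, which is precisely the property an empirical likelihood does not provide.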

IMS workshop [day 5]

Posted in Books, pictures, Statistics, Travel on September 3, 2018 by xi'an

The last day of the starting workshop [and my last day in Singapore] was a day of importance [sampling] with talks by Matti Vihola opposing importance sampling and delayed acceptance and particle MCMC, related to several papers of his that I missed. To be continued in the coming weeks at the IMS, which is another reason to regret having to leave that early [as my Parisian semester starts this Monday with an undergrad class at 8:30!]

And then a talk by Joaquín Miguez on stabilizing importance sampling by truncation which reminded me very much of the later work by Andrew Gelman and Aki Vehtari on Pareto smoothed importance sampling, with further operators adapted to sequential settings and the similar drawback that when the importance sampler is poor, i.e., when the simulated points are all very far from the centre of mass, no amount of fudging with the weights will bring the points closer. AMIS made an appearance as a reference method, to be improved by this truncation of the weights, a wee bit surprising as it should bring the large weights of the earlier stages down.
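As a generic illustration of weight truncation (and not necessarily the operator presented in the talk), the sketch below caps unnormalised importance weights at √n times their average, which is the threshold I associate with Ionides' truncated importance sampling. The function names are hypothetical.

```python
import math


def truncate_weights(weights):
    """Cap unnormalised importance weights at sqrt(n) times their mean."""
    n = len(weights)
    cap = math.sqrt(n) * sum(weights) / n
    return [min(w, cap) for w in weights]


def effective_sample_size(weights):
    """Usual (sum w)^2 / sum w^2 diagnostic, invariant to rescaling."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)
```

With weights [1, 1, 1, 100], the cap is 2 × 25.75 = 51.5 and the effective sample size only moves from about 1.06 to about 1.12: truncation tames the variance of the weights but cannot relocate badly placed points.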

Followed by an almost silent talk by Nick Whiteley, who, having lost his voice to the air conditioning, whispered his talk into the microphone. Having once faced a lost voice during an introductory lecture to a large undergraduate audience, I could not but completely commiserate with him on the hardship of the task. Although this made the audience most silent and attentive. His topic was the Viterbi process and its parallelisation, by using a truncated horizon (presenting connection with overdamped Langevin, e.g., Durmus and Moulines and Dalalyan).

And due to a pressing appointment with my son and his girlfriend [who were traveling through Singapore on that day] for a chili crab dinner on my way to the airport, I missed the final talk by Arnaud Doucet, where he was to reconsider PDMP algorithms without the continuous time layer, a perspective I find most appealing!

Overall, this was a quite diverse and rich [starting] seminar, backed by the superb organisation of the IMS and the smooth living conditions on the NUS campus [once I had mastered the bus routes], which would have made much more sense for me as part of a longer stay, which is actually what happened the previous time I visited the IMS (in 2005), again clashing with my course schedule at home… And as always, I am impressed with the city-state of Singapore, for the highly diverse food scene in particular, but also this [maybe illusory] impression of coexistence between communities. And even though the ecological footprint could certainly be decreased, there are measures to curb car ownership (with a 150% purchase tax) and use (with congestion charges).

IMS workshop [day 4]

Posted in pictures, Statistics, Travel, University life on August 31, 2018 by xi'an

While I did not repeat the mistake of yesterday morning, just as well because the sun was unbearably strong!, I managed this time to board a bus headed in the wrong direction and as a result went through several remote NUS campi! Missing the first talk of the day as a result. By Youssef Marzouk, with a connection between sequential Monte Carlo and optimal transport. Transport for sampling, that is. The following talk by Tiangang Cui was however related, with Marzouk a co-author, as it aimed at finding linear transforms towards creating Normal approximations to the target to be used as proposals in Metropolis algorithms. Which may sound like something already tried a zillion times in the MCMC literature, except that the setting was rather specific to some inverse problems, imposing a generalised Normal structure on the transform, then optimised by transport arguments. It is unclear to me [from just attending the talk] how complex this derivation is and how dimension steps in, but the produced illustrations were quite robust to an increase in dimension.

The remaining talks for the day were mostly particular, from Anthony Lee introducing a new and almost costless way of producing variance estimates in particle filters, exploiting only the ancestry of particles, to Mike Pitt discussing the correlated pseudo-marginal algorithm developed with George Deligiannidis and Arnaud Doucet. Which somewhat paradoxically managed to fight the degeneracy [i.e., the need for a number of terms increasing like the time index T] found in independent pseudo-marginal resolutions, moving down to almost log(T)… With an interesting connection to the quasi SMC approach of Mathieu and Nicolas. And Sebastian Reich also stressed the links with optimal transport in a talk about data assimilation that was way beyond my reach. The day concluded with fireworks, through a magistral lecture by Professeur Del Moral on a continuous time version of PMCMC using the Feynman-Kac terminology. Pierre did a superb job during his lecture towards leading the whole room to the conclusion.

JSM 2018 [#3]

Posted in Mountains, Statistics, Travel, University life on August 1, 2018 by xi'an

As I skipped day #2 for climbing, here I am on day #3, attending JSM 2018, with a [fully Canadian!] session on (conditional) copula (where Bruno Rémillard talked of copulas for mixed data, with unknown atoms, which sounded like an impossible target!), and another on four highlights from Bayesian Analysis (the journal), with Maria Terres defending the (often ill-considered!) spectral approach within Bayesian analysis, modelling spectral densities (Fourier transforms of correlation functions, not probability densities), an advantage compared with MCAR modelling being the automated derivation of dependence graphs. While the spectral ghost did not completely dissipate for me, the use of DIC that she mentioned at the very end seems to call for investigation as I do not know of well-studied cases of complex dependent data with clearly specified DICs. Then Chris Drovandi was speaking of ABC being used for prior choice, an idea I vaguely remember seeing quite a while ago as a referee (of another paper!), in a BA paper that I missed (and obviously did not referee). Using the same reference table works (for simple ABC) with different datasets but also different priors. I did not get at first the notion that the reference table also produces an evaluation of the marginal distribution but indeed the entire simulation from prior × generative model gives a Monte Carlo representation of the marginal, hence the evidence at the observed data. Borrowing from Evans’ fringe Bayesian approach to model choice by prior predictive check for prior-model conflict. I remain sceptical or at least agnostic on the notion of using data to compare priors. And here on using ABC in tractable settings.

The afternoon session was [a mostly Australian] Advanced Bayesian computational methods, with Robert Kohn on variational Bayes, with an interesting comparison of (exact) MCMC and (approximative) variational Bayes results for some species intensity and the remark that forecasting may be much more tolerant to the approximation than estimation. Making me wonder at a possibility of assessing VB on the marginals manageable by MCMC. Unless I miss a complexity such that the decomposition is impossible. And Antonietta Mira on time-evolving networks estimated by ABC (which Anto first showed me in Orly airport, waiting for her plane!). With a possibility of a zero distance. Next talk by Nadja Klein on implicit copulas, linked with shrinkage properties I was unaware of, including the case of spike & slab copulas. Michael Smith also spoke of copulas with discrete margins, mentioning a version with continuous latent variables (as I thought could be done during the first session of the day), then moving to variational Bayes which sounds quite popular at JSM 2018. And David Gunawan made a presentation of a paper mixing pseudo-marginal Metropolis with particle Gibbs sampling, written with Chris Carter and Robert Kohn, making me wonder at their feature of using the white noise as an auxiliary variable in the estimation of the likelihood, which is quite clever but seems to go against the validation of the pseudo-marginal principle. (Warning: I have been known to be wrong!)

Langevin on a wrong bend

Posted in Books, Statistics on October 19, 2017 by xi'an

Arnak Dalalyan and Avetik Karagulyan (CREST) arXived a paper the other week on a focussed study of the Langevin algorithm [not MALA] when the gradient of the target is incorrect. With the following improvements [quoting non-verbatim from the paper]:

  1. a varying-step Langevin that reduces the number of iterations for a given Wasserstein precision, compared with recent results by e.g. Alain Durmus and Éric Moulines;
  2. an extension of convergence results for error-prone evaluations of the gradient of the target (i.e., the gradient is replaced with a noisy version, under some moment assumptions that do not include unbiasedness);
  3. a new second-order sampling algorithm termed LMCO’, with improved convergence properties.

What is particularly interesting to me in this setting is the use in all these papers of a discretised Langevin diffusion (a.k.a., random walk with a drift induced by the gradient of the log-target) without the original Metropolis correction. The results rely on an assumption of [strong?] log-concavity of the target, with “user-friendly” bounds on the Wasserstein distance depending on the constants appearing in this log-concavity constraint. And so does the adaptive step. (In the case of the noisy version, the bias and variance of the noise also matter. As pointed out by the authors, there is still applicability to scaling MCMC for large samples. Beyond pseudo-marginal situations.)
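As a reminder of what the uncorrected recursion looks like, here is a toy one-dimensional sketch with a constant step and an optional zero-mean perturbation of the gradient. It is not the authors' algorithm (the paper allows for biased gradients and a varying step), and the names langevin_unadjusted, ula_gaussian_variance and grad_noise_sd are mine. For a standard Normal target the recursion is an AR(1) process, whose stationary variance is available in closed form and differs from one, which exhibits the discretisation bias left uncorrected.

```python
import math
import random


def langevin_unadjusted(grad_log_target, step, n_iter, rng,
                        x0=0.0, grad_noise_sd=0.0):
    """Discretised Langevin diffusion without Metropolis correction.

    x' = x + step * (grad log pi(x) + noise) + sqrt(2 * step) * N(0, 1),
    where noise is an optional N(0, grad_noise_sd^2) gradient error.
    """
    x = x0
    chain = []
    for _ in range(n_iter):
        grad = grad_log_target(x) + grad_noise_sd * rng.gauss(0.0, 1.0)
        x = x + step * grad + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def ula_gaussian_variance(step, grad_noise_sd=0.0):
    """Stationary variance of the recursion for a N(0, 1) target.

    Solves v = (1 - step)^2 v + step^2 sd^2 + 2 step, valid for 0 < step < 2.
    """
    return (2.0 + step * grad_noise_sd ** 2) / (2.0 - step)
```

With step 0.5 and an exact gradient, the chain settles around a variance of 4/3 rather than 1, and a noisy gradient with unit standard deviation pushes it to 5/3.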

“…this, at first sight very disappointing behavior of the LMC algorithm is, in fact, continuously connected to the exponential convergence of the gradient descent.”

The paper concludes with an interesting mise en parallèle of Langevin algorithms and of gradient descent algorithms, since the convergence rates are the same.

Barker at the Bernoulli factory

Posted in Books, Statistics on October 5, 2017 by xi'an

Yesterday, Flavio Gonçalves, Krzysztof Łatuszyński, and Gareth Roberts (Warwick) arXived a paper on Barker’s algorithm for Bayesian inference with intractable likelihoods.

“…roughly speaking Barker’s method is at worst half as good as Metropolis-Hastings.”

Barker’s acceptance probability (1965) is a smooth if less efficient version of Metropolis-Hastings. (Barker wrote his thesis in Adelaide, in the Mathematical Physics department. Most likely, he never interacted with Ronald Fisher, who died there in 1962.) This smoothness is exploited by devising a Bernoulli factory consisting of a 2-coin algorithm that manages to simulate the Bernoulli variable associated with the Barker probability, from a coin that can simulate Bernoullis with probabilities proportional to [bounded] π(θ). For instance, using a bounded unbiased estimator of the target. And another coin that simulates another Bernoulli on a remainder term. Assuming the bound on the estimate of π(θ) is known [or part of the remainder term]. This is a neat result in that it expands the range of pseudo-marginal methods (and resuscitates Barker’s formula from oblivion!). The paper includes an illustration in the case of the far-from-toyish Wright-Fisher diffusion. [Making Fisher and Barker meet, in the end!]
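A small sketch may help fix ideas, written from my understanding of the two-coin construction rather than copied from the paper. With a symmetric proposal and acceptance ratio r = π(θ')/π(θ), Metropolis-Hastings accepts with probability min(1, r) and Barker with r/(1+r), hence the "at worst half as good" comparison. The two_coin function below returns a success with probability c₁p₁/(c₁p₁+c₂p₂) using only flips of coins with unknown probabilities p₁ and p₂ and the known constants c₁ and c₂. All names are hypothetical.

```python
import random


def mh_prob(r):
    """Metropolis-Hastings acceptance probability for a ratio r."""
    return min(1.0, r)


def barker_prob(r):
    """Barker acceptance probability for the same ratio r."""
    return r / (1.0 + r)


def two_coin(c1, coin1, c2, coin2, rng):
    """Return True with probability c1*p1 / (c1*p1 + c2*p2).

    coin1() and coin2() return True with unknown probabilities p1 and p2;
    only the constants c1 and c2 need to be known.
    """
    while True:
        if rng.random() < c1 / (c1 + c2):
            if coin1():
                return True   # accept the proposed value
        elif coin2():
            return False      # stay at the current value
        # otherwise neither coin came up heads: start over
```

Taking π(θ') = c₁p₁ and π(θ) = c₂p₂ turns the output of two_coin into a Barker accept/reject decision, without ever evaluating the target itself.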

impressions from EcoSta2017 [guest post]

Posted in pictures, Statistics, Travel, University life on July 6, 2017 by xi'an

[This is a guest post on the recent EcoSta2017 (Econometrics and Statistics) conference in Hong Kong, contributed by Chris Drovandi from QUT, Brisbane.]

There were (at least) two sessions on Bayesian Computation at the recent EcoSta (Econometrics and Statistics) 2017 conference in Hong Kong. Below is my review of them. My overall impression of the conference is that there were lots of interesting talks, albeit a lot in financial time series, not my area. Even so I managed to pick up a few ideas/concepts that could be useful in my research. One criticism I had was that there were too many sessions in parallel, which made choosing quite difficult and left some sessions very poorly attended. Another criticism, from many participants I spoke to, was that the location of the conference was relatively far from the city area.

In the first session (chaired by Robert Kohn), Minh-Ngoc Tran spoke about this paper on Bayesian estimation of high-dimensional Copula models with mixed discrete/continuous margins. Copula models with all continuous margins are relatively easy to deal with, but when the margins are discrete or mixed there are issues with computing the likelihood. The main idea of the paper is to re-write the intractable likelihood as an integral over a hypercube of ≤J dimensions (where J is the number of variables), which can then be estimated unbiasedly (with variance reduction by using randomised quasi-MC numbers). The paper develops advanced (correlated) pseudo-marginal and variational Bayes methods for inference.

In the following talk, Chris Carter spoke about different types of pseudo-marginal methods, particle marginal Metropolis-Hastings and particle Gibbs for state space models. Chris suggests that a combination of these methods into a single algorithm can further improve mixing.