Archive for MCMC

general perspective on the Metropolis–Hastings kernel

Posted in Books, Statistics on January 14, 2021 by xi'an

[My Bristol friends and co-authors] Christophe Andrieu and Anthony Lee, along with Sam Livingstone, arXived a massive paper on 1 January on the Metropolis-Hastings kernel.

“Our aim is to develop a framework making establishing correctness of complex Markov chain Monte Carlo kernels a purely mechanical or algebraic exercise, while making communication of ideas simpler and unambiguous by allowing a stronger focus on essential features (…) This framework can also be used to validate kernels that do not satisfy detailed balance, i.e. which are not reversible, but a modified version thereof.”

A central notion in this highly general framework is, extending Tierney (1998), to see an MCMC kernel as a triplet involving a probability measure μ (on an extended space), an involution transform φ generalising the proposal step (i.e. φ²=id), and an associated acceptance probability ð. Then μ-reversibility occurs for

\eth(\xi)\mu(\text{d}\xi)= \eth(\phi(\xi))\mu^{\phi}(\text{d}\xi)

with the rhs involving the push-forward measure induced by μ and φ. And furthermore there is always a choice of an acceptance probability ð ensuring this equality holds. Interestingly, the new framework allows for mostly seamless handling of more complex versions of MCMC such as reversible jump and parallel tempering. But also non-reversible kernels, incl. for instance delayed rejection. And HMC, incl. NUTS. And pseudo-marginal, multiple-try, PDMPs, &c., &c. It is remarkable to see such a general theory emerging at this (late?) stage of the evolution of the field (and I will need more time and attention to understand its consequences).
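To make the triplet view concrete, here is a minimal sketch (my own toy rendering, not code from the paper) of an involutive Metropolis-Hastings step: the target π is extended by an auxiliary variable v, the "proposal" is the deterministic involution φ(x,v)=(x+v,−v), and the acceptance probability is the μ-density ratio along φ, which in this special case collapses to plain random-walk Metropolis.

```python
import numpy as np

def involutive_mh_step(x, log_pi, rng):
    """One MH step seen as (mu, phi, acceptance): mu(dx,dv) = pi(x) N(v;0,1),
    phi(x,v) = (x+v,-v), accept with probability min(1, mu(phi(xi))/mu(xi))."""
    v = rng.standard_normal()   # refresh the auxiliary variable, v ~ N(0,1)
    xp, vp = x + v, -v          # apply the involution; phi(phi(x,v)) = (x,v)
    # the N(0,1) factors cancel (v and -v share the same density) and the
    # Jacobian of phi has absolute value 1, so only the pi-ratio remains;
    # vp is discarded here since v is refreshed at every step
    log_a = log_pi(xp) - log_pi(x)
    return xp if np.log(rng.uniform()) < log_a else x

# usage: 10,000 random-walk Metropolis draws from a standard normal target
rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * x**2
x, draws = 0.0, []
for _ in range(10_000):
    x = involutive_mh_step(x, log_pi, rng)
    draws.append(x)
```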

Rao-Blackwellisation in the MCMC era

Posted in Books, Statistics, University life on January 6, 2021 by xi'an

A few months ago, as indicated on this blog, I was contacted by ISR editors to write a piece on Rao-Blackwellisation, towards a special issue celebrating Calyampudi Radhakrishna Rao's 100th birthday. Gareth Roberts and I came up with this survey, now on arXiv, discussing different aspects of Monte Carlo and Markov chain Monte Carlo that pertain to Rao-Blackwellisation, one way or another. As I discussed the topic with several friends over the Fall, it appeared that the difficulty was more in setting the boundaries than in finding connections: in a way, anything conditioning or demarginalising or resorting to auxiliary variates is a form of Rao-Blackwellisation. When re-reading the 1990 JASA paper by Gelfand and Smith, where I first saw the link between the Rao-Blackwell theorem and simulation, I realised my memory of it had drifted from the original, since the authors proposed there an approximation of the marginal based on replicas rather than on the original Markov chain, making it much closer to Tanner and Wong (1987) than I remembered. It is only later that the true notion took shape. [Since the current version is still a draft, any comment or suggestion would be most welcomed!]
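As a toy illustration (mine, not an example from the survey) of the most basic form: in a two-stage Gibbs sampler for a bivariate normal with correlation ρ, the conditional expectation E[X|Y=y]=ρy is available in closed form, and averaging it over the simulated y's, in the spirit of Gelfand and Smith (1990), typically beats the naive ergodic average of the simulated x's.

```python
import numpy as np

rho, n = 0.9, 10_000
sd = np.sqrt(1 - rho**2)         # conditional standard deviation
rng = np.random.default_rng(1)
x, xs, ys = 0.0, [], []
for _ in range(n):               # two-stage Gibbs sampler
    y = rng.normal(rho * x, sd)  # Y | X = x ~ N(rho x, 1 - rho^2)
    x = rng.normal(rho * y, sd)  # X | Y = y ~ N(rho y, 1 - rho^2)
    xs.append(x); ys.append(y)

naive = np.mean(xs)              # standard ergodic average of the x's
rao_black = rho * np.mean(ys)    # average of E[X | Y = y_t] = rho y_t
# both estimate E[X] = 0, the conditioned version usually with smaller variance
print(naive, rao_black)
```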

your GAN is secretly an energy-based model

Posted in Books, Statistics, University life on January 5, 2021 by xi'an

As I was reading this NeurIPS 2020 paper by Che et al. and trying to make sense of it, I came across a citation to our paper Casella, Robert and Wells (2004), on a generalized accept-reject sampling scheme where the proposal changes at each simulation, which was as surprising as it was appreciated! But, after checking, this paper also appears as the first reference on the Wikipedia page for rejection sampling, which makes me wonder if many actually read it. (On the side, we mostly wrote this paper on a drive from Baltimore to Ithaca, after JSM 1999.)

“We provide more evidence that it is beneficial to sample from the energy-based model defined both by the generator and the discriminator instead of from the generator only.”

The paper seems to propose a post-processing of the output of a GAN generator, generating from the mixture of both generator and discriminator, via an (unscented) Langevin algorithm. The core idea is that, if p(.) is the true data-generating process, g(.) the estimated generator, and d(.) the discriminator, then

p(x) ≈ p⁰(x) ∝ g(x) exp(d(x))

(The approximation would be exact were the discriminator optimal.) The authors work with the latent z's, in the GAN sense that generating pseudo-data x from g means taking a deterministic transform of z, x=G(z). When considering the above p⁰, a generation from p⁰ can be seen as accept-reject with acceptance probability proportional to exp[d{G(z)}]. (On the side, Lemma 1 is the standard validation for accept-reject sampling schemes.)
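For intuition, here is a minimal sketch (with hypothetical G, d, and envelope constant M, none of which come from the paper) of generating from p⁰(x) ∝ g(x) exp(d(x)) by accept-reject on the latent space, rather than via the Langevin scheme the authors actually use:

```python
import numpy as np

def sample_p0(G, d, latent_dim, M, rng, max_tries=100_000):
    """Accept-reject for p0(x) prop. to g(x) exp(d(x)): propose x = G(z) with
    z from the latent prior, accept w.p. exp(d(x))/M, assuming exp(d(.)) <= M."""
    for _ in range(max_tries):
        z = rng.standard_normal(latent_dim)   # latent draw, z ~ N(0, I)
        x = G(z)                              # deterministic push through the generator
        if rng.uniform() < np.exp(d(x)) / M:  # acceptance prop. to exp[d{G(z)}]
            return x
    raise RuntimeError("no acceptance within max_tries")
```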

Reading this paper made me realise how much the field had evolved since my previous GAN-related read, with directions like Metropolis-Hastings GANs and Wasserstein GANs. (And I noticed a "broader impact" section past the conclusion section about possible misuses with societal consequences, which is a new requirement for NeurIPS publications.)

Arianna Rosenbluth (1927-2020)

Posted in Statistics on December 30, 2020 by xi'an

Bayesian phylogeographic inference of SARS-CoV-2

Posted in Books, Statistics, Travel, University life on December 14, 2020 by xi'an

Nature Communications of 10 October has a paper by Philippe Lemey et al. (incl. Marc Suchard) on including travel history and removing sampling bias in the study of the virus spread. (Which I was asked to review for a CNRS COVID watch platform, Bibliovid.)

The data is made of curated genomes available in GISAID on March 10, that is, before lockdown even started in France, with (trustworthy?) travel history data for over 20% of the sampled patients. (And an unwelcome reminder that Hong Kong is part of China, at a time of repression and "mainlandisation" by the CCP.)

“we model a discrete diffusion process between 44 locations within China, including 13 provinces, one municipality (Beijing), and one special administrative area (Hong Kong). We fit a generalized linear model (GLM) parameterization of the discrete diffusion process…”

The diffusion is actually a continuous-time Markov process, with a phylogeny that incorporates nodes associated with locations. The Bayesian analysis of the model is conducted by MCMC since, contrary to ABC, the likelihood can be computed by Felsenstein's pruning algorithm. The covariates are used to calibrate the Markov process transitions between locations. The paper also includes a posterior predictive accuracy assessment.
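As a reminder of why the likelihood is available here, a toy sketch (mine, not the paper's code) of Felsenstein's pruning algorithm for a two-state continuous-time Markov process on a tiny binary tree, propagating conditional likelihoods from the leaves to the root:

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])   # toy 2-state rate matrix

def prune(node):
    """Return the vector L(s) = P(data below node | state s at node)."""
    if "state" in node:        # leaf: indicator of the observed state
        L = np.zeros(2)
        L[node["state"]] = 1.0
        return L
    L = np.ones(2)             # internal node: product over children
    for child, t in node["children"]:
        P = expm(Q * t)        # transition probabilities over branch length t
        L *= P @ prune(child)
    return L

# tree ((A:0.1, B:0.2):0.3, C:0.4) with observed states 0, 0, 1
tree = {"children": [
    ({"children": [({"state": 0}, 0.1), ({"state": 0}, 0.2)]}, 0.3),
    ({"state": 1}, 0.4),
]}
print(prune(tree) @ np.array([0.5, 0.5]))  # likelihood under a uniform root distribution
```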

“…we generate Markov jump estimates of the transition histories that are averaged over the entire posterior in our Bayesian inference.”

In particular, the paper describes "travel-aware reconstruction" analyses that track the spatial path followed by a virus until collection, summarised in the paper by the posterior probability distribution of this path. Given the lack of representativity, the authors also develop an additional "approach that adds unsampled taxa to assess the sensitivity of inferences to sampling bias", although it mostly reflects the assumptions made in producing the artificial data. (With a possible connection to ABC?) If I understood correctly, they added 458 taxa for 14 locations.

An interesting opening is made in the conclusion about the scalability of the approach:

“With the large number of SARS-CoV-2 genomes now available, the question arises how scalable the incorporation of un-sampled taxa will be. For computationally expensive Bayesian inferences, the approach may need to go hand in hand with down-sampling procedures or more detailed examination of specific sub-lineages.”

In the end, I find it hard, as with other COVID-related papers I read, to check how much the limitations, errors, truncations, &tc., attached to the data at hand impact the validation of this phylogeographic reconstruction, and how the model can help further than reconstructing histories of contamination at the (relatively) early stage.