Archive for unbiased MCMC

unbiased Hamiltonian Monte Carlo with couplings

Posted in Books, Kids, Statistics, University life on October 25, 2019 by xi'an

In the June issue of Biometrika, which had been sitting for a few weeks on my desk under my teapot!, Jeremy Heng and Pierre Jacob published a paper on unbiased estimators for Hamiltonian Monte Carlo using couplings. (Disclaimer: I was not involved with the review or editing of this paper.) Which extends to HMC environments the earlier paper of Pierre Jacob, John O’Leary and Yves Atchadé, to be discussed soon at the Royal Statistical Society. The fundamentals are the same, namely that an unbiased estimator can be produced from a converging sequence of estimators and that it can be de facto computed if two Markov chains with the same marginal can be coupled. The issue with Hamiltonians is to figure out how to couple their dynamics. In the Gaussian case, it is relatively easy to see that two chains with the same initial momentum meet periodically. In general, there is contraction within a compact set (Lemma 1). The coupling extends to a time discretisation of the Hamiltonian flow by a leap-frog integrator, still using the same momentum. Which roughly amounts to using the same random numbers in both chains. When defining a relaxed meeting (!) where both chains are within δ of one another, the authors rely on a drift condition (8) that reminds me of the early days of MCMC convergence and seems to imply the existence of a small set “where the target distribution [density] is strongly log-concave”. And which makes me wonder if this small set could be used instead to create renewal events that would in turn ensure both stationarity and unbiasedness without recourse to a second coupled chain. When compared on a Gaussian example with couplings on Metropolis-Hastings and MALA (Fig. 1), the coupled HMC sees hardly any impact of the dimension of the target (in the average coupling time), with a much lower value. However, I wonder at the relevance of the meeting time as an assessment of efficiency.
In the sense that the coupling time is not a convergence time but also reflects the initial conditions. I acknowledge that this allows for an averaging over parallel implementations but I remain puzzled by the statement that this leads to “estimators that are consistent in the limit of the number of replicates, rather than in the usual limit of the number of Markov chain iterations”, since a particularly poor initial distribution could in principle lead to a mode of the target never being explored, or to the coupling time being, ever so rarely, too large for the computing abilities at hand.
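
To see the common-momentum coupling at work, here is a minimal sketch in Python, assuming a toy setting of my own rather than the paper's exact experiment: a standard Gaussian target N(0, I) in dimension 10, a leap-frog integrator with step size 0.1 and 10 steps, and the same momentum draw and the same uniform in the accept/reject step for both chains. All function names are hypothetical. The distance between the two chains shrinks geometrically but never hits exactly zero, which is why some relaxed meeting within δ (or an extra device for exact meetings, not shown here) is needed.

```python
import math
import random


def log_target(q):
    """Log-density of a standard Gaussian N(0, I), up to an additive constant."""
    return -0.5 * sum(x * x for x in q)


def grad_log_target(q):
    """Gradient of log_target."""
    return [-x for x in q]


def hamiltonian(q, p):
    """H(q, p) = -log_target(q) + |p|^2 / 2."""
    return -log_target(q) + 0.5 * sum(x * x for x in p)


def leapfrog(q, p, eps, n_steps):
    """Leap-frog discretisation of the Hamiltonian flow."""
    q, p = list(q), list(p)
    g = grad_log_target(q)
    for _ in range(n_steps):
        p = [pi + 0.5 * eps * gi for pi, gi in zip(p, g)]  # half step on momentum
        q = [qi + eps * pi for qi, pi in zip(q, p)]        # full step on position
        g = grad_log_target(q)
        p = [pi + 0.5 * eps * gi for pi, gi in zip(p, g)]  # half step on momentum
    return q, p


def hmc_move(q, p, log_u, eps, n_steps):
    """One HMC transition from q, given the momentum p and a log-uniform log_u."""
    q_new, p_new = leapfrog(q, p, eps, n_steps)
    log_ratio = hamiltonian(q, p) - hamiltonian(q_new, p_new)
    return q_new if log_u < log_ratio else q


def coupled_hmc_step(q1, q2, rng, eps=0.1, n_steps=10):
    """Advance both chains with the SAME momentum and the SAME uniform."""
    p = [rng.gauss(0.0, 1.0) for _ in q1]
    log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so the log is finite
    return hmc_move(q1, p, log_u, eps, n_steps), hmc_move(q2, p, log_u, eps, n_steps)


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(2019)
dim = 10
q1 = [rng.gauss(0.0, 3.0) for _ in range(dim)]  # two over-dispersed starting points
q2 = [rng.gauss(0.0, 3.0) for _ in range(dim)]
dists = [distance(q1, q2)]
for _ in range(100):
    q1, q2 = coupled_hmc_step(q1, q2, rng)
    dists.append(distance(q1, q2))

print(dists[0], dists[10], dists[-1])
```

With this Gaussian target the leap-frog map is linear, so whenever both chains accept, their difference gets multiplied by roughly cos(ε × n_steps) = cos(1) ≈ 0.54; a trajectory length close to π/2 would make the two chains nearly meet in a single step, echoing the periodic meetings of the exact Gaussian flow.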

No review this summer

Posted in Books, Statistics, University life on September 19, 2019 by xi'an

A recent editorial in Nature was a declaration by a biologist from UCL on her refusal to accept refereeing requests during the summer (or was it the summer break), which was motivated by a need to reconnect with her son. Which is a good enough reason (!), but reflects sadly on the increasing pressure on one’s schedule to juggle teaching, research, administration, grant hunting, society service, along with a balanced enough family life. (Although I have been rather privileged in this regard!) Given that refereeing or journal editing is neither visible nor rewarded, it comes as the first task to be postponed or abandoned, even though most of us realise it is essential to keep science working as a whole and to get our own papers published. I have actually noticed an increasing difficulty in the past decade in getting (good) referees to accept new reviews, as they often ask for deadlines that hurt the authors, like six months. Making them practically unavailable. As I mentioned earlier on this blog, it could be that publishing referees’ reports as discussions would help, since they would become recognised as (unreviewed!) publications, but it is unclear this is the solution, judging from the similar difficulty in getting discussions for discussed papers. (As an aside, there are two exciting papers coming up for discussion, in Series B, ‘Unbiased Markov chain Monte Carlo methods with couplings’ by Pierre E. Jacob, John O’Leary and Yves F. Atchadé, and in Bayesian Analysis, ‘Latent nested nonparametric priors’ by Federico Camerlenghi, David Dunson, Antonio Lijoi, Igor Prünster, and Abel Rodríguez.) Which is surprising when considering the willingness of a part of the community to engage in forii discussions, sometimes of a considerable length, as illustrated on Andrew’s blog.

Another entry in Nature mentioned the case of two tenured geology professors at the University of København who were fired for either using a private email address (?!) or being away on field work during an exam and at a conference without permission from the administration. Which does not even remotely sound like faulty behaviour to me, or else I would have been fired eons ago..!

assessing MCMC convergence

Posted in Books, Statistics, University life on June 6, 2019 by xi'an

When MCMC became mainstream in the 1990’s, there was a flurry of proposals to check, assess, and even guarantee convergence to the stationary distribution, as discussed in our MCMC book. Along with Chantal Guihenneuc and Kerrie Mengersen, we also maintained for a while a review webpage categorising these. Niloy Biswas and Pierre Jacob have recently posted a paper where they propose the use of couplings (and unbiased MCMC) towards deriving bounds on different metrics between the target and the current distribution of the Markov chain. Two chains are created from a given kernel and coupled with a lag of L, meaning that after a while, the two chains become one with a time difference of L. (The supplementary material contains many details on how to induce coupling.) The distance to the target can then be bounded by a sum of distances between the two chains until they merge. The above picture from the paper is a comparison of a Polya-Urn sampler with several HMC samplers for a logistic target (not involving the Pima Indian dataset!). The larger the lag L, the more accurate the bound. But the larger the lag, the more expensive the assessment of how many steps are needed for convergence. Especially when considering that the evaluation requires restarting the chains from scratch and rerunning until they couple again, rather than continuing one run, which can only bring the chain closer to stationarity and to being distributed from the target. I thus wonder at the possibility of some Rao-Blackwellisation of the simulations used in this assessment (while realising once more that assessing convergence almost inevitably requires another order of magnitude than convergence itself!). Without a clear idea of how to do it… For instance, keeping the values of the chain(s) at the time of coupling is not directly helpful to create a sample from the target since they are not distributed from that target.
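
For a concrete feel of the lag-L construction, here is a minimal sketch in Python, assuming a toy setting of my own: a random-walk Metropolis chain (step size 1) targeting N(0,1), both chains started from N(10,1), with maximally coupled proposals and a common uniform for the two accept/reject steps so that the chains can meet exactly. The bound coded below is the total-variation version, d_TV(π_t, π) ≤ E[max(0, ⌈(τ − L − t)/L⌉)], where τ is the first time t > L at which X_t = Y_{t−L}, as I understand it from the paper; the indexing conventions are an assumption to be checked against it, and all function names are hypothetical. Note that each replicate restarts both chains from scratch, which is the cost issue raised in the post.

```python
import math
import random


def log_target(x):
    """Standard normal target, log-density up to a constant."""
    return -0.5 * x * x


def log_kernel(z, mean, sigma):
    """Log-density of N(mean, sigma^2), up to a constant shared by all means."""
    return -0.5 * ((z - mean) / sigma) ** 2


def maximal_coupling(mu1, mu2, sigma, rng):
    """Draw (X, Y), X ~ N(mu1, sigma^2), Y ~ N(mu2, sigma^2), with P(X = Y) maximal."""
    x = rng.gauss(mu1, sigma)
    if math.log(1.0 - rng.random()) + log_kernel(x, mu1, sigma) <= log_kernel(x, mu2, sigma):
        return x, x
    while True:  # rejection step for the residual part of N(mu2, sigma^2)
        y = rng.gauss(mu2, sigma)
        if math.log(1.0 - rng.random()) + log_kernel(y, mu2, sigma) > log_kernel(y, mu1, sigma):
            return x, y


def mh_step(x, rng, sigma):
    """Single random-walk Metropolis step."""
    prop = rng.gauss(x, sigma)
    return prop if math.log(1.0 - rng.random()) < log_target(prop) - log_target(x) else x


def coupled_mh_step(x, y, rng, sigma):
    """Coupled step: maximally coupled proposals, common uniform for acceptance."""
    xp, yp = maximal_coupling(x, y, sigma, rng)
    log_u = math.log(1.0 - rng.random())
    x_new = xp if log_u < log_target(xp) - log_target(x) else x
    y_new = yp if log_u < log_target(yp) - log_target(y) else y
    return x_new, y_new


def lagged_meeting_time(lag, rng, sigma=1.0, init_mean=10.0, max_iter=100_000):
    """First t > lag with X_t == Y_{t - lag}; both chains start from N(init_mean, 1)."""
    x = rng.gauss(init_mean, 1.0)
    y = rng.gauss(init_mean, 1.0)
    for _ in range(lag):  # X runs alone for `lag` steps
        x = mh_step(x, rng, sigma)
    t = lag
    while t < max_iter:
        t += 1
        x, y = coupled_mh_step(x, y, rng, sigma)  # the pair is now (X_t, Y_{t - lag})
        if x == y:
            return t
    raise RuntimeError("chains did not meet within max_iter")


def tv_upper_bound(taus, lag, t):
    """Monte Carlo estimate of E[max(0, ceil((tau - lag - t) / lag))]."""
    return sum(max(0, math.ceil((tau - lag - t) / lag)) for tau in taus) / len(taus)


rng = random.Random(1)
lag = 20
taus = [lagged_meeting_time(lag, rng) for _ in range(200)]
for t in (0, 25, 50, 100):
    print(t, tv_upper_bound(taus, lag, t))
```

Values above 1 carry no information, since a total-variation distance never exceeds 1. Once the two chains meet, the maximal coupling returns identical proposals and the common uniform keeps them together forever.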

[Pierre also wrote a blog post about the paper on Statisfaction that is definitely much clearer and more pedagogical than the above.]

convergences of MCMC and unbiasedness

Posted in pictures, Statistics, University life on January 16, 2018 by xi'an

During his talk on unbiased MCMC in Dauphine today, Pierre Jacob provided a nice illustration of the convergence modes of MCMC algorithms. With the stationary target achieved after 100 Metropolis iterations, while the mean of the target took many more iterations to be approximated by the empirical average. Plus a nice connection between coupling time and convergence.

[picture: convergence to the target]

During Pierre’s talk, some simple questions came to mind, from developing an “impatient user version”, as in perfect sampling, in order to stop chains that run “forever”, to optimising parallelisation in order to avoid problems of asynchronicity. While the complexity of coupling increases with dimension and the coupling probability goes down, the average coupling time varies, but an unexpected figure is that the expected cost per iteration is 2 simulations, irrespective of the chosen kernels. Pierre also made a connection with optimal transport coupling and stressed that the maximal coupling was for the proposal and not for the target.
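
The “2 simulations” figure can be checked on the rejection-based maximal coupling of two proposal distributions. Here is a minimal sketch in Python, assuming two Gaussian proposals N(0,1) and N(1,1) (an arbitrary choice) and counting only the Gaussian draws, not the uniforms; all names are hypothetical. The first draw comes from the first distribution; with probability equal to the total-variation distance TV between the two, a rejection loop follows whose success probability is that same TV, so the expected number of draws is 1 + TV × (1/TV) = 2 whenever the two distributions differ.

```python
import math
import random


def log_kernel(z, mean, sigma):
    """Log-density of N(mean, sigma^2), up to a constant shared by both means."""
    return -0.5 * ((z - mean) / sigma) ** 2


def maximal_coupling_with_cost(mu1, mu2, sigma, rng):
    """Return (X, Y, n_draws), X ~ N(mu1, s^2), Y ~ N(mu2, s^2), P(X = Y) maximal."""
    x = rng.gauss(mu1, sigma)
    draws = 1
    # Accept Y = X with probability min(1, q(X) / p(X)).
    if math.log(1.0 - rng.random()) + log_kernel(x, mu1, sigma) <= log_kernel(x, mu2, sigma):
        return x, x, draws
    # Otherwise sample Y from the residual (q - p)^+ by rejection.
    while True:
        y = rng.gauss(mu2, sigma)
        draws += 1
        if math.log(1.0 - rng.random()) + log_kernel(y, mu2, sigma) > log_kernel(y, mu1, sigma):
            return x, y, draws


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


rng = random.Random(42)
n = 20_000
samples = [maximal_coupling_with_cost(0.0, 1.0, 1.0, rng) for _ in range(n)]
mean_draws = sum(c for _, _, c in samples) / n
meet_freq = sum(x == y for x, y, _ in samples) / n
mean_y = sum(y for _, y, _ in samples) / n
overlap = 2.0 * normal_cdf(-0.5)  # 1 - TV for N(0,1) versus N(1,1)

print(f"average draws: {mean_draws:.3f}")
print(f"P(X = Y): {meet_freq:.3f} (theory {overlap:.3f})")
```

This is only the cost of one coupled proposal step; it says nothing about how many iterations the two chains need before they meet.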

Better together in Kolkata [slides]

Posted in Books, pictures, Statistics, Travel, University life on January 4, 2018 by xi'an

Here are the slides of the talk on modularisation I am giving today at the PC Mahalanobis 125 Conference in Kolkata, mostly borrowed from Pierre’s talk at O’Bayes 2017 last month:

[which made me realise Slideshare has discontinued the option to update one’s presentation, forcing users to create a new presentation for each update!] Incidentally, the amphitheatre at ISI is located right on top of a geological exhibit room with a reconstituted Barapasaurus tagorei so I will figuratively ride a dinosaur during my talk!