**M**y colleague from the Université d’Orléans, Didier Chauveau, has just published on CRAN a new R package called EntropyMCMC, which contains convergence assessment tools for MCMC algorithms, based on non-parametric estimates of the Kullback-Leibler divergence between the current distribution and the target. (A while ago, quite a while ago!, we actually collaborated with a few others on the Springer-Verlag Lecture Note #135 Discretization and MCMC convergence assessments.) This follows from a series of papers by Didier Chauveau and Pierre Vandekerkhove that started with a nearest neighbour entropy estimate. The evaluation of this entropy is based on N iid (parallel) chains, which involves a parallel implementation. While the missing normalising constant is overwhelmingly unknown, the authors argue this is not a major issue “since we are mostly interested in the stabilization” of the entropy distance. Or in the comparison of two MCMC algorithms. *[Disclaimer: I have not experimented with the package so far, hence cannot vouch for its performances over large dimensions or problematic targets, but would as usual welcome comments and feedback on readers’ experiences.]*
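To make the idea concrete, here is a minimal Python sketch of a nearest neighbour (Kozachenko-Leonenko) entropy estimate turned into a Kullback-Leibler criterion known up to the missing constant. It is not the EntropyMCMC implementation: the one-dimensional Normal target, the random-walk Metropolis kernel, the starting distribution and all function names (`nn_entropy_1d`, `kl_up_to_constant`, `kl_trace`) are assumptions chosen for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def nn_entropy_1d(sample):
    """Kozachenko-Leonenko nearest-neighbour entropy estimate in dimension 1."""
    s = sorted(sample)
    n = len(s)
    total = 0.0
    for i in range(n):
        left = s[i] - s[i - 1] if i > 0 else math.inf
        right = s[i + 1] - s[i] if i < n - 1 else math.inf
        total += math.log(min(left, right))  # assumes no exact ties
    # In dimension 1 the unit ball [-1, 1] has volume 2.
    return total / n + math.log(2.0) + math.log(n - 1) + EULER_GAMMA


def kl_up_to_constant(sample, log_target):
    """Estimate of KL(p_t || target) minus the unknown log normalising constant."""
    mean_log_f = sum(log_target(x) for x in sample) / len(sample)
    return -nn_entropy_1d(sample) - mean_log_f


def log_target(x):
    # Unnormalised N(0, 1) log-density (illustrative target).
    return -0.5 * x * x


def rw_step(x, scale, rng):
    """One random-walk Metropolis step."""
    y = x + rng.gauss(0.0, scale)
    log_u = math.log(1.0 - rng.random())  # uniform on (0, 1]
    return y if log_u < log_target(y) - log_target(x) else x


def kl_trace(n_chains=1000, n_iter=100, every=10, scale=2.0, seed=1):
    """Run N iid chains in lockstep and record the criterion along iterations."""
    rng = random.Random(seed)
    states = [rng.gauss(5.0, 0.5) for _ in range(n_chains)]  # poor start
    trace = [kl_up_to_constant(states, log_target)]
    for t in range(1, n_iter + 1):
        states = [rw_step(x, scale, rng) for x in states]
        if t % every == 0:
            trace.append(kl_up_to_constant(states, log_target))
    return trace
```

The values returned by `kl_trace()` should decrease and then flatten. Since the constant is missing, only this stabilisation is interpretable, not the level at which it occurs.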

## Archive for MCMC convergence

## EntropyMCMC [R package]

Posted in Statistics with tags convergence assessment, CRAN, discretization, entropy, EntropyMCMC, Lecture Notes in Statistics, MCMC, MCMC convergence, Monte Carlo Statistical Methods, R package, Springer-Verlag, Université d'Orléans, untractable normalizing constant on March 26, 2019 by xi'an

## revisiting the Gelman-Rubin diagnostic

Posted in Books, pictures, Statistics, Travel, University life with tags ABCruise, asymptotic variance, convergence diagnostics, effective sample size, Gelman-Rubin statistic, Gulf of Bothnia, independence, MCMC, MCMC convergence, Monte Carlo Statistical Methods, stopping rule, subsampling, sunset, Titanic on January 23, 2019 by xi'an

**J**ust before Xmas, Dootika Vats (Warwick) and Christina Knudson arXived a paper on a re-evaluation of the ultra-popular 1992 Gelman and Rubin MCMC convergence diagnostic. Which compares within-variance and between-variance on parallel chains started from hopefully dispersed initial values. Or equivalently an under-estimating and an over-estimating estimate of the MCMC average. In this paper, the authors take advantage of the variance estimators developed by Galin Jones, James Flegal, Dootika Vats and co-authors, which are batch mean estimators consistently estimating the asymptotic variance. They also discuss the choice of a cut-off on the ratio R of variance estimates, i.e., how close to one need it be? By relating R to the effective sample size (for which we also have reservations), they obtain another way of calibrating the cut-off. The main conclusion of the study is that the recommended 1.1 bound is too large for a reasonable proximity to the true value of the Bayes estimator. *(Disclaimer: The above ABCruise header is unrelated to the paper, apart from its use of the Titanic dataset!)*
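As a reminder of what the ratio R measures, here is a minimal Python sketch of the original 1992 statistic, without the degrees-of-freedom correction and without the batch mean estimators used in the paper. The function names and the toy chains (iid Normal draws around chosen offsets, standing in for MCMC output) are assumptions made for illustration only.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic Gelman-Rubin ratio for m chains of equal length n (scalar output)."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    # Pooled estimate, which over-estimates the variance when starts are dispersed.
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


def iid_chains(offsets, n, seed):
    """Toy chains: iid N(offset, 1) draws, one chain per offset."""
    rng = random.Random(seed)
    return [[off + rng.gauss(0.0, 1.0) for _ in range(n)] for off in offsets]


r_mixed = gelman_rubin(iid_chains([0, 0, 0, 0], 1000, seed=1))
r_stuck = gelman_rubin(iid_chains([0, 3, 6, 9], 1000, seed=2))
```

Chains sharing the same distribution give a ratio close to one, while chains stuck around different values give a ratio well above any of the usual cut-offs.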

In fact, I have other difficulties than setting the cut-off point with the original scheme as a way to assess MCMC convergence or lack thereof, among which

1. its dependence on the parameterisation of the chain and on the estimation of a specific target function
2. its dependence on the starting distribution which makes the time to convergence not absolutely meaningful
3. the confusion between getting to stationarity and exploring the whole target
4. its missing the option to resort to subsampling schemes to attain pseudo-independence or scale time to convergence (albeit see 3. above)
5. a potential bias brought by the stopping rule.

## Markov Chains [not a book review]

Posted in Books, pictures, Statistics, University life with tags book review, concentration inequalities, coupling, Eric Moulines, irreducibility, Markov chain and stochastic stability, Markov chain Monte Carlo, Markov chains, MCMC convergence, probability theory, Randal Douc, Richard Tweedie, Sean Meyn, Wasserstein distance on January 14, 2019 by xi'an

**A**s Randal Douc and Éric Moulines are both very close friends and two authors of this book on Markov chains, I cannot engage in a regular book review! Judging from the table of contents, the coverage is not too dissimilar to the now classic Markov Chains and Stochastic Stability book by Sean Meyn and the late Richard Tweedie (1993), called the Bible of Markov chains by Peter Glynn, with more emphasis on convergence matters and a more mathematical perspective. The 757-page book also includes a massive appendix on maths and probability background. As indicated in the preface, “the reason [the authors] thought it would be useful to write a new book is to survey some of the developments made during the 25 years that have elapsed since the publication of Meyn and Tweedie (1993b).” Connecting with the theoretical developments brought by MCMC methods. Like subgeometric rates of convergence to stationarity, sample paths, limit theorems, and concentration inequalities. The book also reflects on the numerous contributions of the authors to the field. Hence a perfect candidate for teaching Markov chains to mathematically well-prepared graduate audiences. Congrats to the authors!

## “more Bayesian” GANs

Posted in Books, Statistics with tags Bayesian GANs, compatible conditional distributions, GANs, MCMC convergence, pseudo-likelihood on December 21, 2018 by xi'an

**O**n X validated, I got pointed to this recent paper by He, Wang, Lee and Tiang, that proposes a new form of Bayesian GAN. Although I do not see it as really Bayesian, as explained below.

“[The] existing Bayesian method (Saatchi & Wilson, 2017) may lead to incompatible conditionals, which suggest that the underlying joint distribution actually does not exist.”

## Bayesian GANs [#2]

Posted in Books, pictures, R, Statistics with tags ABC in Edinburgh, Bayesian GANs, compatible conditional distributions, Edinburgh, GANs, generative adversarial networks, ISBA 2018, joint posterior, MCMC convergence, Metropolis-within-Gibbs algorithm, Monte Carlo Statistical Methods, normal model, University of Edinburgh on June 27, 2018 by xi'an

**A**s an illustration of the lack of convergence of the Gibbs sampler applied to the two “conditionals” defined in the Bayesian GANs paper discussed yesterday, I took the simplest possible example of a Normal mean generative model (one parameter) with a logistic discriminator (one parameter) and implemented the scheme (during an ISBA 2018 session). With flat priors on both parameters. And a Normal random walk as Metropolis-Hastings proposal. As expected, since there is no stationary distribution associated with the Markov chain, simulated chains do not exhibit a stationary pattern,

And they eventually reach an overflow error or a trapping state as the log-likelihood gets approximately to zero (red curve).

Too bad I missed the talk by Shakir Mohammed yesterday, being stuck on the Edinburgh by-pass at rush hour!, as I would have loved to hear his views about this rather essential issue…

## convergences of MCMC and unbiasedness

Posted in pictures, Statistics, University life with tags asynchronous algorithms, Hastings-Metropolis sampler, impatient user, maximal coupling, MCMC convergence, optimal transport, parallelisation, Paris Dauphine, perfect sampling, unbiased MCMC on January 16, 2018 by xi'an

**D**uring his talk on unbiased MCMC in Dauphine today, Pierre Jacob provided a nice illustration of the convergence modes of MCMC algorithms. With the stationary target achieved after 100 Metropolis iterations, while the mean of the target takes many more iterations to be approximated by the empirical average. Plus a nice connection between coupling time and convergence. Convergence to the target.

During Pierre’s talk, some simple questions came to mind, from developing an “impatient user version”, as in perfect sampling, in order to stop chains that run “forever”, to optimising parallelisation in order to avoid problems of asynchronicity. While the complexity of coupling increases with dimension and the coupling probability goes down, the average coupling time varies but an unexpected figure is that the expected cost per iteration is 2 simulations, irrespective of the chosen kernels. Pierre also made a connection with optimal transport coupling and stressed that the maximal coupling was for the proposal and not for the target.
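For readers unfamiliar with maximal couplings, here is a minimal Python sketch of the standard rejection construction that draws a pair (X, Y) with prescribed marginals and the largest possible probability that X = Y. It is written for two Normal proposals with a common scale, which is an assumption made for illustration, as are the function names; it is not claimed to be the code behind the talk.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def maximal_coupling(mu_p, mu_q, sigma, rng):
    """Return (x, y, n_draws) with x ~ N(mu_p, sigma^2) and y ~ N(mu_q, sigma^2),
    coupled so that P(x == y) is maximal; n_draws counts the Normal simulations."""
    x = rng.gauss(mu_p, sigma)
    n_draws = 1
    # Accept x as the common value if a uniform point under p also lies under q.
    if rng.random() * normal_pdf(x, mu_p, sigma) <= normal_pdf(x, mu_q, sigma):
        return x, x, n_draws
    # Otherwise draw y from the part of q not covered by p.
    while True:
        y = rng.gauss(mu_q, sigma)
        n_draws += 1
        if rng.random() * normal_pdf(y, mu_q, sigma) > normal_pdf(y, mu_p, sigma):
            return x, y, n_draws


rng = random.Random(3)
pairs = [maximal_coupling(0.0, 1.0, 1.0, rng) for _ in range(20000)]
mean_draws = sum(n for _, _, n in pairs) / len(pairs)
meet_freq = sum(1 for x, y, _ in pairs if x == y) / len(pairs)
```

In this construction the average number of simulations per coupled pair is 2 whatever the two proposals are, which may well be the figure quoted above, and the frequency of meeting approximates one minus the total variation distance between the proposals.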

## ABC forecasts

Posted in Books, pictures, Statistics with tags ABC, ABC consistency, Australia, forecasting, MCMC convergence, Monash University, prediction, state space model, time series on January 9, 2018 by xi'an

**M**y friends and co-authors David Frazier, Gael Martin, Brendan McCabe, and Worapree Maneesoonthorn arXived a paper on ABC forecasting at the turn of the year. ABC prediction is a natural extension of ABC inference in that, provided the full conditional of a future observation given past data and parameters is available but the posterior is not, ABC simulations of the parameters induce an approximation of the predictive. The paper thus considers the impact of this extension on the precision of the predictions. And argues that it is possible that this approximation is preferable to running MCMC in some settings. A first interesting result is that using ABC and hence conditioning on an insufficient summary statistic has no asymptotic impact on the resulting prediction, provided Bayesian concentration of the corresponding posterior takes place as in our convergence paper under revision.

“…conditioning inference about θ on η(y) rather than y makes no difference to the probabilistic statements made about [future observations]”

The above result holds both in terms of convergence in total variation and for proper scoring rules. Even though there is always a loss in accuracy in using ABC. Now, one may think this is a direct consequence of our (and others’) earlier convergence results, but numerical experiments on standard time series show the distinct feature that, while the [MCMC] posterior and ABC posterior distributions on the parameters clearly differ, the predictives are more or less identical! With a potential speed gain in using ABC, although comparing parallel ABC versus non-parallel MCMC is rather delicate. For instance, a preliminary parallel ABC could be run as a burn-in step for parallel MCMC, since all chains would then be roughly in the stationary regime. Another interesting outcome of these experiments is a case when the summary statistic produces a non-consistent ABC posterior, but still leads to a very similar predictive, as shown on this graph.

This unexpected accuracy in prediction may further be exploited in state space models, towards producing particle algorithms that are greatly accelerated. Of course, an easy objection to this acceleration is that the impact of the approximation is unknown and un-assessed. However, such an acceleration leaves room for multiple implementations, possibly with different sets of summaries, to check for consistency over replicates.
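The mechanics of an ABC forecast can be sketched in a few lines of Python. This is a toy illustration and not the paper's experiments: the AR(1) model with unit innovation variance, the uniform prior, the lag-one autocorrelation as single summary statistic, the tolerance set as a quantile of the simulated distances, and all function names are assumptions.

```python
import math
import random
import statistics


def simulate_ar1(theta, n, rng):
    """AR(1) series y_t = theta * y_{t-1} + N(0, 1), started at stationarity."""
    y = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - theta * theta))]
    for _ in range(n - 1):
        y.append(theta * y[-1] + rng.gauss(0.0, 1.0))
    return y


def lag1_autocorr(y):
    """Summary statistic: empirical lag-one autocorrelation."""
    m = statistics.fmean(y)
    num = sum((a - m) * (b - m) for a, b in zip(y[:-1], y[1:]))
    den = sum((a - m) ** 2 for a in y)
    return num / den


def abc_forecast(y_obs, n_sims, keep_frac, rng):
    """ABC rejection on theta, then one predictive draw of y_{T+1} per kept theta."""
    s_obs = lag1_autocorr(y_obs)
    sims = []
    for _ in range(n_sims):
        theta = rng.uniform(-0.95, 0.95)  # illustrative prior
        s_sim = lag1_autocorr(simulate_ar1(theta, len(y_obs), rng))
        sims.append((abs(s_sim - s_obs), theta))
    sims.sort()
    n_keep = max(1, round(keep_frac * n_sims))
    kept = [theta for _, theta in sims[:n_keep]]
    # The conditional of y_{T+1} given (y_T, theta) is available in closed form.
    predictive = [theta * y_obs[-1] + rng.gauss(0.0, 1.0) for theta in kept]
    return kept, predictive


y_obs = simulate_ar1(0.6, 200, random.Random(1))
kept, predictive = abc_forecast(y_obs, 2000, 0.05, random.Random(2))
```

The list `predictive` is then a sample from the ABC approximation of the predictive of the next observation, to be compared with the exact predictive when the latter can be computed.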