## an independent sampler that maximizes the acceptance rate of the MH algorithm

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , on September 3, 2019 by xi'an

An ICLR 2019 paper by Neklyudov, Egorov and Vetrov on an optimal choice of the proposal in an independent Metropolis algorithm I discovered via an X validated question. Namely whether or not the expected Metropolis-Hastings acceptance ratio is always one (which it is not when the support of the proposal is restricted). The paper mentions the domination of the Accept-Reject algorithm by the associated independent Metropolis-Hastings algorithm, which has actually been stated in our Monte Carlo Statistical Methods (1999, Lemma 6.3.2) and may prove even older. The authors also note that the expected acceptance probability is equal to one minus the total variation distance between the joint defined as target x Metropolis-Hastings proposal distribution and its time-reversed version. Which seems to suffer from the same difficulty as the one mentioned in the X validated question. Namely that it only holds when the support of the Metropolis-Hastings proposal is at least the support of the target (or else when the support of the joint defined as target x Metropolis-Hastings proposal distribution is somewhat symmetric. Replacing total variation with Kullback-Leibler then leads to a manageable optimisation target if the proposal is a parameterised independent distribution. With a GAN version when the proposal is not explicitly available. I find it rather strange that one still seeks independent proposals for running Metropolis-Hastings algorithms as the result will depend on the family of proposals considered and as performances will deteriorate with dimension (the authors mention a 10% acceptance rate, which sounds quite low). [As an aside, ICLR 2020 will take part in Addis Abeba next April.]

## off to SimStat2019, Salzburg

Posted in Mountains, Running, Statistics, University life with tags , , , , , , , , , , , , , on September 2, 2019 by xi'an

Today, I am off to Salzburg for the SimStat 2019 workshop, or more formally the 10th International Workshop on Simulation and Statistics, where I give a talk on ABC. The program of the workshop is quite diverse and rich and so I do not think I will have time to take advantage of the Hohe Tauern or the Berchtesgaden Alps to go climbing. Especially since I am also discussing papers in an ABC session.

## efficient MCMC sampling

Posted in Statistics with tags , , , on June 24, 2019 by xi'an

Maxime Vono, Daniel Paulin and Arnaud Doucet recently arXived a paper about a regularisation technique that allows for efficient sampling from a complex posterior which potential function factorises as a large sum of transforms of linear projections of the parameter θ

$U(\theta)=\sum_i U_i(A_i\theta)$

The central idea in the paper [which was new to me] is to introduce auxiliary variates for the different terms in the sum, replacing the projections in the transforms, with an additional regularisation forcing these auxiliary variates to be as close as possible from the corresponding projection

$U(\theta,\mathbf z)=\sum_i U_i(z_i)+\varrho^{-1}||z_i-A_i\theta||^2$

This is only an approximation to the true target but it enjoys the possibility to run a massive Gibbs sampler in quite a reduced dimension. As the variance ρ of the regularisation term goes to zero the marginal posterior on the parameter θ converges to the true posterior. The authors manage to achieve precise convergence rates both in total variation and in Wasserstein distance.

From a practical point of view, only judging from the logistic example, it is hard to fathom how much this approach improves upon other approaches (provided they still apply) as the impact of the value of ρ should be assessed on top of the convergence of the high-dimensional Gibbs sampler. Or is there an annealing version in the pipe-line? While parallelisation is a major argument, it also seems that the Gibbs sampler need a central monitoring for each new simulation of θ. Unless some asynchronous version can be implemented.

## delayed but published!

Posted in Statistics with tags , , , , , , on June 20, 2019 by xi'an

## assessing MCMC convergence

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on June 6, 2019 by xi'an

When MCMC became mainstream in the 1990’s, there was a flurry of proposals to check, assess, and even guarantee convergence to the stationary distribution, as discussed in our MCMC book. Along with Chantal Guihenneuc and Kerrie Mengersen, we also maintained for a while a reviewww webpage categorising theses. Niloy Biswas and Pierre Jacob have recently posted a paper where they propose the use of couplings (and unbiased MCMC) towards deriving bounds on different metrics between the target and the current distribution of the Markov chain. Two chains are created from a given kernel and coupled with a lag of L, meaning that after a while, the two chains become one with a time difference of L. (The supplementary material contains many details on how to induce coupling.) The distance to the target can then be bounded by a sum of distances between the two chains until they merge. The above picture from the paper is a comparison a Polya-Urn sampler with several HMC samplers for a logistic target (not involving the Pima Indian dataset!). The larger the lag L the more accurate the bound. But the larger the lag the more expensive the assessment of how many steps are needed to convergence. Especially when considering that the evaluation requires restarting the chains from scratch and rerunning until they couple again, rather than continuing one run which can only brings the chain closer to stationarity and to being distributed from the target. I thus wonder at the possibility of some Rao-Blackwellisation of the simulations used in this assessment (while realising once more than assessing convergence almost inevitably requires another order of magnitude than convergence itself!). Without a clear idea of how to do it… For instance, keeping the values of the chain(s) at the time of coupling is not directly helpful to create a sample from the target since they are not distributed from that target.

[Pierre also wrote a blog post about the paper on Statisfaction that is definitely much clearer and pedagogical than the above.]

## likelihood free nested sampling

Posted in Books, Statistics with tags , , , , , , , , , , , on April 26, 2019 by xi'an

A recent paper by Mikelson and Khammash found on bioRxiv considers the (paradoxical?) mixture of nested sampling and intractable likelihood. They however cover only the case when a particle filter or another unbiased estimator of the likelihood function can be found. Unless I am missing something in the paper, this seems a very costly and convoluted approach when pseudo-marginal MCMC is available. Or the rather substantial literature on computational approaches to state-space models. Furthermore simulating under the lower likelihood constraint gets even more intricate than for standard nested sampling as the parameter space is augmented with the likelihood estimator as an extra variable. And this makes a constrained simulation the harder, to the point that the paper need resort to a Dirichlet process Gaussian mixture approximation of the constrained density. It thus sounds quite an intricate approach to the problem. (For one of the realistic examples, the authors mention a 12 hour computation on a 48 core cluster. Producing an approximation of the evidence that is not unarguably stabilised, contrary to the above.) Once again, not being completely up-to-date in sequential Monte Carlo, I may miss a difficulty in analysing such models with other methods, but the proposal seems to be highly demanding with respect to the target.

## EntropyMCMC [R package]

Posted in Statistics with tags , , , , , , , , , , , , on March 26, 2019 by xi'an

My colleague from the Université d’Orléans, Didier Chauveau, has just published on CRAN a new R package called EntropyMCMC, which contains convergence assessment tools for MCMC algorithms, based on non-parametric estimates of the Kullback-Leibler divergence between current distribution and target. (A while ago, quite a while ago!, we actually collaborated with a few others on the Springer-Verlag Lecture Note #135 Discretization and MCMC convergence assessments.) This follows from a series of papers by Didier Chauveau and Pierre Vandekerkhove that started with a nearest neighbour entropy estimate. The evaluation of this entropy is based on N iid (parallel) chains, which involves a parallel implementation. While the missing normalising constant is overwhelmingly unknown, the authors this is not a major issue “since we are mostly interested in the stabilization” of the entropy distance. Or in the comparison of two MCMC algorithms. [Disclaimer: I have not experimented with the package so far, hence cannot vouch for its performances over large dimensions or problematic targets, but would as usual welcome comments and feedback on readers’ experiences.]