**O**n March 1, I have started handling papers for Biometrika as deputy editor, along with Omiros Papaspiliopoulos. With on average one paper a day to handle this means a change in my schedule and presumably less blog posts about recent papers and arXivals if I want to keep my daily morning runs!

## Archive for Biometrika

## and here we go!

Posted in Books, Running, Statistics, University life with tags academic journals, Biometrika, blogging, editor, peer review, scientific editing on March 16, 2018 by xi'an## delayed acceptance ABC-SMC

Posted in pictures, Statistics, Travel with tags ABC-MCMC, ABC-SMC, Biometrika, delayed acceptance, lazy ABC, sequential Monte Carlo, SMC-ABC, stratified sampling on December 11, 2017 by xi'an**L**ast summer, during my vacation on Skye, Richard Everitt and Paulina Rowińska arXived a paper on delayed acceptance associated with ABC. ArXival that I missed, then! In order to decrease the number of simulations from the likelihood. As in our own delayed acceptance paper (without ABC), a cheap alternative generator is used to first reject the least likely parameters values, before possibly continuing to use a full generator. Also as lazy ABC. The first step of this ABC algorithm requires a cheap generator plus a primary tolerance ε¹ to compare the generation with the data or part of it. This may be followed by a second generation with a second tolerance level ε². The paper applies more specifically ABC-SMC as introduced in Sisson, Fan and Tanaka (2007) and reassessed in our subsequent 2009 Biometrika paper with Mark Beaumont, Jean-Marie Cornuet and Jean-Michel Marin. As well as in the ABC-SMC paper by Pierre Del Moral and Arnaud Doucet.

When looking at the version of the algorithm [Algorithm 2] based on two basic acceptance ABC steps, there are two features I find intriguing: (i) the primary step uses a cheap generator to reject early poor values of the parameter, followed by the second step involving a more expensive and exact generator, but I see no impact of the choice of this cheap generator in the acceptance probability; (ii) this is an SMC algorithm with imposed resampling at each iteration but there is no visible step for creating new weights after the resampling step. In the current presentation, it sounds like the weights do not change from the initial step, except for those turning to zero and the renormalisation transforms. Which makes the (unspecified) stratification of little interest if any. I must therefore miss a point in the implementation!

One puzzling sentence in the appendix is that the resampling algorithm used in the SMC step “ensures that every particle that is alive before resampling is represented in the resampled particles”, which reminds me of an argument [possibly a different one] made already in Sisson, Fan and Tanaka (2007) and that we could not validate in our subsequent paper. For resampling to be correct, a form of multinomial sampling must be implemented, even via variance reduction schemes like stratified or systematic sampling.

## Biometrika

Posted in Books, Statistics, University life with tags academic journals, Anthony Davison, Biometrika, Paul Fearnhead, PCI Comput Stat, Peer Community, scientific editing, Series B on November 29, 2017 by xi'an**A**fter ten years of outstanding dedication to Biometrika, Anthony Davison is retiring as Editor of *Biometrika* on 31 December. Ten years! Running a top journal like Biometrika is a massive service to the statistics community, especially when considering the painstaking stage of literally editing each paper towards the stylistic requirements of the journal. For which we definitely should all be quite grateful to Anthony. And to the new Editor, Paul Fearnhead, for taking over. I will actually join the editorial board as assistant editor, along with Omiros Papaspiliopoulos, meaning we will share together the task of screening and allocating submissions. A bit daunting given the volume of submissions is roughly similar to the one I was handling for Series B ten years ago. And given the PCI Comput Stat experiment starting soon!

## Russian roulette still rolling

Posted in Statistics with tags AISTATS 2017, Biometrika, coupling, debiasing, doubly intractable problems, harmonic mean estimator, MCMC, MCMC algorithm, normalising constant, Peter Glynn, pseudo-marginal MCMC, Rao-Blackwellisation, Russian roulette on March 22, 2017 by xi'an**C**olin Wei and Iain Murray arXived a new version of their paper on doubly-intractable distributions, which is to be presented at AISTATS. It builds upon the Russian roulette estimator of Lyne et al. (2015), which itself exploits the debiasing technique of McLeish et al. (2011) [found earlier in the physics literature as in Carter and Cashwell, 1975, according to the current paper]. Such an unbiased estimator of the inverse of the normalising constant can be used for pseudo-marginal MCMC, except that the estimator is sometimes negative and has to be so as proved by Pierre Jacob and co-authors. As I discussed in my post on the Russian roulette estimator, replacing the negative estimate with its absolute value does not seem right because a negative value indicates that the quantity is close to zero, hence replacing it with zero would sound more appropriate. Wei and Murray start from the property that, while the expectation of the importance weight is equal to the normalising constant, the expectation of the inverse of the importance weight converges to the inverse of the weight for an MCMC chain. This however sounds like an harmonic mean estimate because the property would also stand for any substitute to the importance density, as it only requires the density to integrate to one… As noted in the paper, the variance of the resulting Roulette estimator “will be high” or even infinite. Following Glynn et al. (2014), the authors build a coupled version of that solution, which key feature is to cut the higher order terms in the debiasing estimator. This does not guarantee finite variance or positivity of the estimate, though. In order to decrease the variance (assuming it is finite), backward coupling is introduced, with a Rao-Blackwellisation step using our 1996 Biometrika derivation. Which happens to be of lower cost than the standard Rao-Blackwellisation in that special case, O(N) versus O(N²), N being the stopping rule used in the debiasing estimator. Under the assumption that the *inverse* importance weight has finite expectation [wrt the importance density], the resulting backward-coupling Russian roulette estimator can be proven to be unbiased, as it enjoys a finite expectation. (As in the generalised harmonic mean case, the constraint imposes thinner tails on the importance function, which then hampers the convergence of the MCMC chain.) No mention is made of achieving finite variance for those estimators, which again is a serious concern due to the similarity with harmonic means…

## Wilfred Keith Hastings [1930-2016]

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags Bell Labs, Biometrika, Canada, Julian Besag, Metropolis-Hastings algorithm, obituary, Peskun ordering, University of Canterbury, University of Victoria, Victoria, Wilfred Keith Hastings on December 9, 2016 by xi'an**A** few days ago I found on the page Jeff Rosenthal has dedicated to Hastings that he has passed away peacefully on May 13, 2016 in Victoria, British Columbia, where he lived for 45 years as a professor at the University of Victoria. After holding positions at University of Toronto, University of Canterbury (New Zealand), and Bell Labs (New Jersey). As pointed out by Jeff, Hastings’ main paper is his 1970 Biometrika description of Markov chain Monte Carlo methods, Monte Carlo sampling methods using Markov chains and their applications. Which would take close to twenty years to become known to the statistics world at large, although you can trace a path through Peskun (his only PhD student) , Besag and others. I am sorry it took so long to come to my knowledge and also sorry it apparently went unnoticed by most of the computational statistics community.

## Savage-Dickey supermodels

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags astrostatistics, Bayes factor, Biometrika, Brad Carlin, bridge sampling, cosmology, encompassing model, MCMC, mixtures of distributions, nested sampling, Péru, Sid Chib on September 13, 2016 by xi'an**A**. Mootoovaloo, B. Bassett, and M. Kunz just arXived a paper on the computation of Bayes factors by the Savage-Dickey representation through a supermodel (or encompassing model). (I wonder why Savage-Dickey is so popular in astronomy and cosmology statistical papers and not so much elsewhere.) Recall that the trick is to write the Bayes factor in favour of the encompasssing model as the ratio of the posterior and of the prior for the tested parameter (thus eliminating nuisance or common parameters) at its null value,

B^{10}=π(φ⁰|x)/π(φ⁰).

Modulo some continuity constraints on the prior density, and the assumption that the conditional prior on nuisance parameter is the same under the null model and the encompassing model [given the null value φ⁰]. If this sounds confusing or even shocking from a mathematical perspective, check the numerous previous entries on this topic on the ‘Og!

The supermodel created by the authors is a mixture of the original models, as in our paper, and… *hold the presses!*, it is a mixture of the likelihood functions, as in Phil O’Neill’s and Theodore Kypraios’ paper. Which is not mentioned in the current paper and should obviously be. In the current representation, the posterior distribution on the mixture weight α is a linear function of α involving both evidences, α(m¹-m²)+m², times the artificial prior on α. The resulting estimator of the Bayes factor thus shares features with bridge sampling, reversible jump, and the importance sampling version of nested sampling we developed in our Biometrika paper. In addition to O’Neill and Kypraios’s solution.

The following quote is inaccurate since the MCMC algorithm needs simulating the parameters of the compared models in realistic settings, hence representing the multidimensional integrals by Monte Carlo versions.

“Though we have a clever way of avoiding multidimensional integrals to calculate the Bayesian Evidence, this new method requires very efficient sampling and for a small number of dimensions is not faster than individual nested sampling runs.”

I actually wonder at the sheer rationale of running an intensive MCMC sampler in such a setting, when the weight α is completely artificial. It is only used to jump from one model to the next, which sound quite inefficient when compared with simulating from both models separately and independently. This approach can also be seen as a special case of Carlin’s and Chib’s (1995) alternative to reversible jump. Using instead the Savage-Dickey representation is of course infeasible. Which makes the overall reference to this method rather inappropriate in my opinion. Further, the examples processed in the paper all involve (natural) embedded models where the original Savage-Dickey approach applies. Creating an additional model to apply a pseudo-Savage-Dickey representation does not sound very compelling…

Incidentally, the paper also includes a discussion of a weird notion, the likelihood of the Bayes factor, B¹², which is plotted as a distribution in B¹², most strangely. The only other place I met this notion is in Murray Aitkin’s book. Something’s unclear there or in my head!

“One of the fundamental choices when using the supermodel approach is how to deal with common parameters to the two models.”

This is an interesting question, although maybe not so relevant for the Bayes factor issue where it should not matter. However, as in our paper, multiplying the number of parameters in the encompassing model may hinder convergence of the MCMC chain or reduce the precision of the approximation of the Bayes factor. Again, from a Bayes factor perspective, this does not matter [while it does in our perspective].