**M**y friends Luke Bornn, Natesh Pillai and Dawn Woodard just arXived along with Aaron Smith a short note on the convergence properties of ABC. When compared with acceptance-rejection or regular MCMC. Unsurprisingly, ABC does worse in both cases. What is central to this note is that ABC can be (re)interpreted as a pseudo-marginal method where the data comparison step acts like an unbiased estimator of the true ABC target (not of the original ABC target, mind!). From there, it is mostly an application of Christophe Andrieu’s and Matti Vihola’s results in this setup. The authors also argue that using a single pseudo-data simulation per parameter value is the optimal strategy (as compared with using several), when considering asymptotic variance. This makes sense in terms of simulating in a larger dimensional space but what of the cost of producing those pseudo-datasets against the cost of producing a new parameter? There are a few (rare) cases where the datasets are much cheaper to produce.

## Archive for MCMSki IV

## a pseudo-marginal perspective on the ABC algorithm

Posted in Mountains, pictures, Statistics, University life with tags ABC, ABC-MCMC, acceptance rate, Alps, asymptotics, Chamonix, MCMSki IV, pseudo-data, ranking on May 5, 2014 by xi'an## MCqMC 2014 [closup]

Posted in pictures, Running, Statistics, Travel, University life, Wines with tags Berlin, Brussels, ISBA 2016, Leuven, MCMSki IV, MCQMC2014, train, WSC 2012 on April 16, 2014 by xi'an**A**s mentioned earlier, this was my very first MCqMC conference and I really enjoyed it, even though (or because) there were many topics that did not fall within my areas of interest. (By comparison, WSC is a serie of conferences too remote from those areas for my taste, as I realised in Berlin where we hardly attended any talk and hardly anyone attended my session!) Here I appreciated the exposure to different mathematical visions on Monte Carlo, without being swamped by applications as at WSC… Obviously, our own Bayesian computational community was much less represented than at, say, MCMSki! Nonetheless, I learned a lot during this conference for instance from Peter Glynn‘s fantastic talk, and I came back home with new problems and useful references [as well as a two-hour delay in the train ride from Brussels]. I also obviously enjoyed the college-town atmosphere of Leuven, the many historical landmarks and the easily-found running routes out of the town. I am thus quite eager to attend the next MCqMC 2016 meeting (in Stanford, an added bonus!) and even vaguely toying with the idea of organising MCqMC 2018 in Monaco (depending on the return for ISBA 2016 and ISBA 2018). In any case, thanks to the scientific committee for the invitation to give a plenary lecture in Leuven and to the local committee for a perfect organisation of the meeting.

## Pre-processing for approximate Bayesian computation in image analysis

Posted in R, Statistics, University life with tags ABC, Chamonix, image processing, MCMC, MCMSki IV, Monte Carlo Statistical Methods, path sampling, Potts model, QUT, simulation, SMC-ABC, Statistics and Computing, sufficient statistics, summary statistics on March 21, 2014 by xi'an**W**ith Matt Moores and Kerrie Mengersen, from QUT, we wrote this short paper just in time for the MCMSki IV Special Issue of *Statistics & Computing*. And arXived it, as well. The global idea is to cut down on the cost of running an ABC experiment by removing the simulation of a humongous state-space vector, as in Potts and hidden Potts model, and replacing it by an approximate simulation of the 1-d sufficient (summary) statistics. In that case, we used a division of the 1-d parameter interval to simulate the distribution of the sufficient statistic for each of those parameter values and to compute the expectation and variance of the sufficient statistic. Then the conditional distribution of the sufficient statistic is approximated by a Gaussian with these two parameters. And those Gaussian approximations substitute for the true distributions within an ABC-SMC algorithm à la Del Moral, Doucet and Jasra (2012).

**A**cross 20 125 × 125 pixels simulated images, Matt’s algorithm took an average of 21 minutes per image for between 39 and 70 SMC iterations, while resorting to pseudo-data and deriving the genuine sufficient statistic took an average of 46.5 hours for 44 to 85 SMC iterations. On a realistic Landsat image, with a total of 978,380 pixels, the precomputation of the mapping function took 50 minutes, while the total CPU time on 16 parallel threads was 10 hours 38 minutes. By comparison, it took 97 hours for 10,000 MCMC iterations on this image, with a poor effective sample size of 390 values. Regular SMC-ABC algorithms cannot handle this scale: It takes 89 hours to perform *a single* SMC iteration! (Note that path sampling also operates in this framework, thanks to the same precomputation: in that case it took 2.5 hours for 10⁵ iterations, with an effective sample size of 10⁴…)

**S**ince my student’s paper on Seaman et al (2012) got promptly rejected by *TAS* for quoting too extensively from my post, we decided to include me as an extra author and submitted the paper to this special issue as well.

## Statistics and Computing special MCMSk’issue [call for papers]

Posted in Books, Mountains, R, Statistics, University life with tags Chamonix-Mont-Blanc, deadline, guest editors, JRSSB, MCMSki IV, Read paper, Series B, ski resorts, special issue, Statistics and Computing, The Economist, University of Warwick on February 7, 2014 by xi'an**F**ollowing the exciting and innovative talks, posters and discussions at MCMski IV, the editor of *Statistics and Computing*, Mark Girolami (who also happens to be the new president-elect of the BayesComp section of ISBA, which is taking over the management of future MCMski meetings), kindly proposed to publish a special issue of the journal open to all participants to the meeting. Not only to speakers, mind, but to all participants.

**S**o if you are interested in submitting a paper to this special issue of a computational statistics journal that is very close to our MCMski themes, I encourage you to do so. (Especially if you missed the COLT 2014 deadline!) The deadline for submissions is set on * March 15* (a wee bit tight but we would dearly like to publish the issue in 2014, namely the same year as the meeting.) Submissions are to be made through the Statistics and Computing portal, with a mention that they are intended for the special issue.

**A**n editorial committee chaired by Antonietta Mira and composed of Christophe Andrieu, Brad Carlin, Nicolas Chopin, Jukka Corander, Colin Fox, Nial Friel, Chris Holmes, Gareth Jones, Peter Müller, Antonietta Mira, Geoff Nicholls, Gareth Roberts, Håvård Rue, Robin Ryder, and myself, will examine the submissions and get back within a few weeks to the authors. In a spirit similar to the JRSS Read Paper procedure, submissions will first be examined collectively, before being sent to referees. We plan to publish the reviews as well, in order to include a global set of comments on the accepted papers. We intend to do it in The Economist style, i.e. as a set of edited anonymous comments. Usual instructions for Statistics and Computing apply, with the additional requirements that the paper should be around 10 pages and include at least one author who took part in MCMski IV.

## cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life with tags BUGS, Chamonix, CREST, cut models, decompression, flu, graphical models, JAGS, Martyn Plummer, MCMC, MCMSki IV, Monte Carlo Statistical Methods, OpenBUGS, The BUGS book on January 29, 2014 by xi'an**A**t MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea *[although this happened to be one of those few sessions where the flu had not yet completely taken over!]* and as I wanted to check about a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed out to the relevant section of the BUGS book, reproduced above. *(Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and not connected in any way to Martyn’s talk or work!)*

**I** cannot say I get the idea any clearer from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a *cut* is the removal of an edge in a probabilistic graph, however there must/may be some arbitrariness in building the wrong conditional distribution. In the Poisson-binomial case treated in Martyn’s case, I interpret the cut as simulating from

instead of

hence loosing some of the information about φ… Now, this cut version is a function of φ and θ that can be fed to a Metropolis-Hastings algorithm. Assuming we can handle the posterior on φ and the conditional on θ given φ. If we build a Gibbs sampler instead, we face a difficulty with the normalising constant m(y|φ). Said Gibbs sampler thus does not work in generating from the “cut” target. Maybe an alternative borrowing from the rather large if disparate missing constant toolbox. (In any case, we *do not* simulate from the original joint distribution.) The natural solution would then be to make a independent proposal on φ with target the posterior given z and then any scheme that preserves the conditional of θ given φ and y; “any” is rather wistful thinking at this stage since the only practical solution that I see is to run a Metropolis-Hasting sampler long enough to “reach” stationarity… I also remain with a lingering although not life-threatening question of whether or not the BUGS code using cut distributions provide the “right” answer or not. Here are my five slides used during the seminar (with a random walk implementation that did not diverge from the true target…):