## future of computational statistics

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , on September 29, 2014 by xi'an

I am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing, The breadth and quality of talks at MCMski IV is testimony to this. A second trend that is not unrelated to the first one is the development of new and the rehabilitation of older techniques to handle complex models by approximations, witness ABC, Expectation-Propagation, variational Bayes, &tc. With a corollary being an healthy questioning of the models themselves. As illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I still remain confident about the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision…  A third characteristic is the emergence of new languages and meta-languages intended to handle complexity both of problems and of solutions towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…

If you have any suggestion of novel directions in computational statistics or instead of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics

## likelihood-free inference via classification

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags , , , , , on August 5, 2014 by xi'an

Last week, Michael Gutmann, Ritabrata Dutta, Samuel Kaski, and Jukka Corander posted on arXiv the last version of the paper they had presented at MCMSki 4. As indicated by its (above) title, it suggests implementing ABC based on classification tools. Thus making it somewhat connected to our recent random forest paper.

The starting idea in the paper is that datasets generated from distributions with different parameters should be easier to classify than datasets generated from distributions with the same parameters. And that classification accuracy naturally induces a distance between datasets and between the parameters behind those datasets. We had followed some of the same track when starting using random forests, before realising that for our model choice setting, proceeding the entire ABC way once the random forest procedure had been constructed was counter-productive. Random forests are just too deadly as efficient model choice machines to try to compete with them through an ABC postprocessing. Performances are just… Not. As. Good!

A side question: I have obviously never thought about that before but why is the naïve Bayes classification rule so called?! It never sounded very Bayesian to me to (a) use the true value of the parameter and (b) average the classification performances. Interestingly, the authors (i) show identical performances of other classification methods (Fig. 2) and (ii) an exception for MA time series: when we first experimented random forests, raw data from an MA(2) model was tested to select between MA(1) and  MA(2) models, and the performances of the resulting random forest were quite poor.

Now, an opposition between our two approaches is that Michael and his coauthors also include point estimation within the range of classification-based ABC inference. As we stressed in our paper, we restrict the range to classification and model choice because we do not think those machine learning tools are stable and powerful enough to perform regression and posterior probability approximation. I also see a practical weakness in the estimation scheme proposed in this new paper. Namely that the Monte Carlo gets into the way of the consistency theorem. And possibly of the simulation method itself. Another remark is that, while the authors compare the fit produced by different classification methods, there should be a way to aggregate them towards higher efficiency. Returning once more to our random forest paper, we saw improved performances each time we included a reference method, from LDA to SVMs. It would be interesting to see a (summary) variable selection version of the proposed method. A final remark is that computing time and effort do not seem to get mentioned in the paper (unless Indian jetlag confuses me more than usual). I wonder how fast the computing effort grows with the sample size to reach parametric and quadratic convergence rates.

## a pseudo-marginal perspective on the ABC algorithm

Posted in Mountains, pictures, Statistics, University life with tags , , , , , , , , on May 5, 2014 by xi'an

My friends Luke Bornn, Natesh Pillai and Dawn Woodard just arXived along with Aaron Smith a short note on the convergence properties of ABC. When compared with acceptance-rejection or regular MCMC. Unsurprisingly, ABC does worse in both cases. What is central to this note is that ABC can be (re)interpreted as a pseudo-marginal method where the data comparison step acts like an unbiased estimator of the true ABC target (not of the original ABC target, mind!). From there, it is mostly an application of Christophe Andrieu’s and Matti Vihola’s results in this setup. The authors also argue that using a single pseudo-data simulation per parameter value is the optimal strategy (as compared with using several), when considering asymptotic variance. This makes sense in terms of simulating in a larger dimensional space but what of the cost of producing those pseudo-datasets against the cost of producing a new parameter? There are a few (rare) cases where the datasets are much cheaper to produce.

## MCqMC 2014 [closup]

Posted in pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , on April 16, 2014 by xi'an

As mentioned earlier, this was my very first MCqMC conference and I really enjoyed it, even though (or because) there were many topics that did not fall within my areas of interest. (By comparison, WSC is a serie of conferences too remote from those areas for my taste, as I realised in Berlin where we hardly attended any talk and hardly anyone attended my session!) Here I appreciated the exposure to different mathematical visions on Monte Carlo, without being swamped by applications as at WSC… Obviously, our own Bayesian computational community was much less represented than at, say, MCMSki! Nonetheless, I learned a lot during this conference for instance from Peter Glynn‘s fantastic talk, and I came back home with new problems and useful references [as well as a two-hour delay in the train ride from Brussels]. I also obviously enjoyed the college-town atmosphere of Leuven, the many historical landmarks  and the easily-found running routes out of the town. I am thus quite eager to attend the next MCqMC 2016 meeting (in Stanford, an added bonus!) and even vaguely toying with the idea of organising MCqMC 2018 in Monaco (depending on the return for ISBA 2016 and ISBA 2018). In any case, thanks to the scientific committee for the invitation to give a plenary lecture in Leuven and to the local committee for a perfect organisation of the meeting.

## Pre-processing for approximate Bayesian computation in image analysis

Posted in R, Statistics, University life with tags , , , , , , , , , , , , , on March 21, 2014 by xi'an

With Matt Moores and Kerrie Mengersen, from QUT, we wrote this short paper just in time for the MCMSki IV Special Issue of Statistics & Computing. And arXived it, as well. The global idea is to cut down on the cost of running an ABC experiment by removing the simulation of a humongous state-space vector, as in Potts and hidden Potts model, and replacing it by an approximate simulation of the 1-d sufficient (summary) statistics. In that case, we used a division of the 1-d parameter interval to simulate the distribution of the sufficient statistic for each of those parameter values and to compute the expectation and variance of the sufficient statistic. Then the conditional distribution of the sufficient statistic is approximated by a Gaussian with these two parameters. And those Gaussian approximations substitute for the true distributions within an ABC-SMC algorithm à la Del Moral, Doucet and Jasra (2012).

Across 20 125 × 125 pixels simulated images, Matt’s algorithm took an average of 21 minutes per image for between 39 and 70 SMC iterations, while resorting to pseudo-data and deriving the genuine sufficient statistic took an average of 46.5 hours for 44 to 85 SMC iterations. On a realistic Landsat image, with a total of 978,380 pixels, the precomputation of the mapping function took 50 minutes, while the total CPU time on 16 parallel threads was 10 hours 38 minutes. By comparison, it took 97 hours for 10,000 MCMC iterations on this image, with a poor effective sample size of 390 values. Regular SMC-ABC algorithms cannot handle this scale: It takes 89 hours to perform a single SMC iteration! (Note that path sampling also operates in this framework, thanks to the same precomputation: in that case it took 2.5 hours for 10⁵ iterations, with an effective sample size of 10⁴…)

Since my student’s paper on Seaman et al (2012) got promptly rejected by TAS for quoting too extensively from my post, we decided to include me as an extra author and submitted the paper to this special issue as well.

## Statistics and Computing special MCMSk’issue [call for papers]

Posted in Books, Mountains, R, Statistics, University life with tags , , , , , , , , , , , on February 7, 2014 by xi'an

Following the exciting and innovative talks, posters and discussions at MCMski IV, the editor of Statistics and Computing, Mark Girolami (who also happens to be the new president-elect of the BayesComp section of ISBA, which is taking over the management of future MCMski meetings), kindly proposed to publish a special issue of the journal open to all participants to the meeting. Not only to speakers, mind, but to all participants.

So if you are interested in submitting a paper to this special issue of a computational statistics journal that is very close to our MCMski themes, I encourage you to do so. (Especially if you missed the COLT 2014 deadline!) The deadline for submissions is set on March 15 (a wee bit tight but we would dearly like to publish the issue in 2014, namely the same year as the meeting.) Submissions are to be made through the Statistics and Computing portal, with a mention that they are intended for the special issue.

An editorial committee chaired by Antonietta Mira and composed of Christophe Andrieu, Brad Carlin, Nicolas Chopin, Jukka Corander, Colin Fox, Nial Friel, Chris Holmes, Gareth Jones, Peter Müller, Antonietta Mira, Geoff Nicholls, Gareth Roberts, Håvård Rue, Robin Ryder, and myself, will examine the submissions and get back within a few weeks to the authors. In a spirit similar to the JRSS Read Paper procedure, submissions will first be examined collectively, before being sent to referees. We plan to publish the reviews as well, in order to include a global set of comments on the accepted papers. We intend to do it in The Economist style, i.e. as a set of edited anonymous comments. Usual instructions for Statistics and Computing apply, with the additional requirements that the paper should be around 10 pages and include at least one author who took part in MCMski IV.

## cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life with tags , , , , , , , , , , , , , on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check about a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed out to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and not connected in any way to Martyn’s talk or work!)

I cannot say I get the idea any clearer from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph, however there must/may be some arbitrariness in building the wrong conditional distribution. In the Poisson-binomial case treated in Martyn’s case, I interpret the cut as simulating from

$\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}$

$\pi(\phi|z,\mathbf{y})\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)$