## likelihood-free inference via classification

Posted in Books, Mountains, pictures, Statistics, Travel, University life on August 5, 2014 by xi'an

Last week, Michael Gutmann, Ritabrata Dutta, Samuel Kaski, and Jukka Corander posted on arXiv the latest version of the paper they had presented at MCMSki 4. As indicated by its (above) title, it suggests implementing ABC based on classification tools, thus making it somewhat connected to our recent random forest paper.

The starting idea in the paper is that datasets generated from distributions with different parameters should be easier to classify than datasets generated from distributions with the same parameters, and that classification accuracy naturally induces a distance between datasets and between the parameters behind those datasets. We had followed some of the same track when we started using random forests, before realising that, for our model choice setting, proceeding the entire ABC way once the random forest had been constructed was counter-productive. Random forests are just too deadly efficient as model choice machines to try to compete with them through an ABC postprocessing. Performances are just… Not. As. Good!
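To make the classification-as-distance idea concrete, here is a minimal Python sketch of my own (a toy illustration, not the authors' implementation): two samples are pooled and labelled, a deliberately simple nearest-mean rule stands in for their actual classifiers (LDA, SVM, &tc.), and the held-out accuracy in excess of the chance level ½ serves as an ABC discrepancy.

```python
import random

def classification_discrepancy(x, y, seed=0):
    """ABC discrepancy between two 1-d samples: pool and label them, fit a
    deliberately simple nearest-mean classifier on half the data, and return
    the held-out accuracy in excess of the chance level 1/2.  A value near 0
    means the classifier cannot tell the samples apart."""
    rng = random.Random(seed)
    data = [(v, 0) for v in x] + [(v, 1) for v in y]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    c0 = [v for v, lab in train if lab == 0]
    c1 = [v for v, lab in train if lab == 1]
    if not c0 or not c1:                 # degenerate split: call them identical
        return 0.0
    mu0, mu1 = sum(c0) / len(c0), sum(c1) / len(c1)
    correct = sum(1 for v, lab in test
                  if (abs(v - mu0) <= abs(v - mu1)) == (lab == 0))
    return abs(correct / len(test) - 0.5)
```

Samples drawn from the same distribution give a value near zero, while well-separated distributions push it towards ½, which is exactly the monotone behaviour one wants from an ABC distance.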

A side question: I had obviously never thought about it before, but why is the naïve Bayes classification rule so called?! It never sounded very Bayesian to me to (a) use the true value of the parameter and (b) average the classification performances. Interestingly, the authors (i) show near-identical performances for other classification methods (Fig. 2) and (ii) note an exception for MA time series: when we first experimented with random forests, raw data from an MA(2) model was used to select between MA(1) and MA(2) models, and the performances of the resulting random forest were quite poor.

Now, one opposition between our two approaches is that Michael and his coauthors also include point estimation within the range of classification-based ABC inference. As we stressed in our paper, we restricted the range to classification and model choice because we do not think those machine-learning tools are stable and powerful enough to perform regression and posterior probability approximation. I also see a practical weakness in the estimation scheme proposed in this new paper, namely that the Monte Carlo error gets in the way of the consistency theorem, and possibly of the simulation method itself. Another remark is that, while the authors compare the fits produced by different classification methods, there should be a way to aggregate them towards higher efficiency. Returning once more to our random forest paper, we saw improved performances each time we included a reference method, from LDA to SVMs. It would be interesting to see a (summary) variable selection version of the proposed method. A final remark is that computing time and effort do not seem to be mentioned in the paper (unless Indian jetlag confuses me more than usual). I wonder how fast the computing effort grows with the sample size in order to reach the parametric and quadratic convergence rates.

## a pseudo-marginal perspective on the ABC algorithm

Posted in Mountains, pictures, Statistics, University life on May 5, 2014 by xi'an

My friends Luke Bornn, Natesh Pillai and Dawn Woodard, along with Aaron Smith, just arXived a short note on the convergence properties of ABC when compared with acceptance-rejection or regular MCMC. Unsurprisingly, ABC does worse in both cases. What is central to this note is that ABC can be (re)interpreted as a pseudo-marginal method where the data comparison step acts like an unbiased estimator of the ABC target (not of the original target, mind!). From there, it is mostly an application of Christophe Andrieu’s and Matti Vihola’s results in this setup. The authors also argue that using a single pseudo-data simulation per parameter value is the optimal strategy (as compared with using several), when considering asymptotic variance. This makes sense in terms of simulating in a larger dimensional space, but what of the cost of producing those pseudo-datasets against the cost of producing a new parameter? There are a few (rare) cases where the datasets are much cheaper to produce.
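As a toy illustration of this reading (my own sketch, not taken from the note): in ABC-MCMC the 0/1 indicator that a single pseudo-dataset's summary falls within ε of the observed one is, up to a constant, an unbiased estimator of the ABC likelihood, and storing that estimate alongside the current parameter value is exactly the pseudo-marginal mechanism. The normal-mean model below is an assumed example.

```python
import math
import random

def abc_pseudo_marginal(y_obs, n_iter=4000, eps=0.3, step=0.5, seed=0):
    """ABC-MCMC for a toy model y_i ~ N(theta, 1) with a N(0, 10^2) prior,
    written in pseudo-marginal form: the 0/1 indicator computed from one
    pseudo-dataset estimates the ABC likelihood and is stored with theta."""
    rng = random.Random(seed)
    n = len(y_obs)
    s_obs = sum(y_obs) / n                      # summary = sample mean

    def log_prior(t):
        return -0.5 * (t / 10.0) ** 2

    def lhat(t):                                # unbiased ABC likelihood estimate
        z_bar = sum(rng.gauss(t, 1.0) for _ in range(n)) / n
        return 1.0 if abs(z_bar - s_obs) <= eps else 0.0

    theta = s_obs                               # start near the data
    like = lhat(theta)
    while like == 0.0:                          # ensure a positive initial estimate
        like = lhat(theta)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        prop_like = lhat(prop)                  # fresh estimate at the proposal only
        # pseudo-marginal MH ratio: the stored estimate `like` is recycled, never
        # refreshed at the current value, which is what makes the chain exact
        alpha = math.exp(log_prior(prop) - log_prior(theta)) * prop_like / like
        if rng.random() < alpha:
            theta, like = prop, prop_like
        chain.append(theta)
    return chain
```

Replacing `lhat` with an average over several pseudo-datasets gives the multiple-simulation variant the note compares against.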

## Pre-processing for approximate Bayesian computation in image analysis

Posted in R, Statistics, University life on March 21, 2014 by xi'an

With Matt Moores and Kerrie Mengersen, from QUT, we wrote this short paper just in time for the MCMSki IV Special Issue of Statistics & Computing, and arXived it as well. The global idea is to cut down on the cost of running an ABC experiment by removing the simulation of a humongous state-space vector, as in Potts and hidden Potts models, and replacing it with an approximate simulation of the 1-d sufficient (summary) statistic. In that case, we used a partition of the 1-d parameter interval, simulated the distribution of the sufficient statistic for each of those parameter values, and computed the expectation and variance of the sufficient statistic. The conditional distribution of the sufficient statistic is then approximated by a Gaussian with these two moments, and those Gaussian approximations substitute for the true distributions within an ABC-SMC algorithm à la Del Moral, Doucet and Jasra (2012).
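A stripped-down version of the precomputation step, under my own toy assumptions rather than the actual Potts setting: tabulate the mean and variance of the summary statistic on a parameter grid once, then replace the expensive simulator with draws from a moment-matched Gaussian whose moments are linearly interpolated between grid points. The names `simulate_stat` and `build_surrogate`, and the choice of linear interpolation, are all hypothetical.

```python
import bisect
import math
import random

def build_surrogate(simulate_stat, grid, n_rep=100, seed=0):
    """Precompute mean and variance of a 1-d summary statistic on a grid of
    parameter values, then return a cheap sampler that draws the statistic
    from a Gaussian whose moments are interpolated between grid points."""
    rng = random.Random(seed)
    table = []
    for b in grid:                      # the expensive simulations happen once
        draws = [simulate_stat(b, rng) for _ in range(n_rep)]
        mu = sum(draws) / n_rep
        var = sum((d - mu) ** 2 for d in draws) / (n_rep - 1)
        table.append((mu, var))

    def sample_stat(b, rng):
        # locate the bracketing grid points and interpolate the two moments
        i = min(max(bisect.bisect_left(grid, b), 1), len(grid) - 1)
        w = (b - grid[i - 1]) / (grid[i] - grid[i - 1])
        w = min(max(w, 0.0), 1.0)       # clamp rather than extrapolate
        mu = (1 - w) * table[i - 1][0] + w * table[i][0]
        var = (1 - w) * table[i - 1][1] + w * table[i][1]
        return rng.gauss(mu, math.sqrt(var))

    return sample_stat
```

Within an ABC-SMC run, `sample_stat` then replaces every call to the full simulator, which is where the orders-of-magnitude savings reported below come from.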

Across 20 simulated images of 125 × 125 pixels, Matt’s algorithm took an average of 21 minutes per image for between 39 and 70 SMC iterations, while resorting to pseudo-data and deriving the genuine sufficient statistic took an average of 46.5 hours for 44 to 85 SMC iterations. On a realistic Landsat image, with a total of 978,380 pixels, the precomputation of the mapping function took 50 minutes, while the total CPU time on 16 parallel threads was 10 hours 38 minutes. By comparison, it took 97 hours for 10,000 MCMC iterations on this image, with a poor effective sample size of 390 values. Regular SMC-ABC algorithms cannot handle this scale: it takes 89 hours to perform a single SMC iteration! (Note that path sampling also operates in this framework, thanks to the same precomputation: in that case it took 2.5 hours for 10⁵ iterations, with an effective sample size of 10⁴…)

Since my student’s paper on Seaman et al (2012) got promptly rejected by TAS for quoting too extensively from my post, we decided to include me as an extra author and submitted the paper to this special issue as well.

## cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed me to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and not connected in any way to Martyn’s talk or work!)

I cannot say I get the idea any more clearly from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph; however, there must/may be some arbitrariness in building the resulting conditional distribution. In the Poisson-binomial case treated in Martyn’s talk, I interpret the cut as simulating from

$\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}$

instead of from the genuine joint posterior

$\pi(\phi|z,y)\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)$

hence losing some of the information about φ… Now, this cut version is a function of φ and θ that can be fed to a Metropolis-Hastings algorithm, assuming we can handle the posterior on φ and the conditional on θ given φ. If we build a Gibbs sampler instead, we face a difficulty with the normalising constant m(y|φ): said Gibbs sampler does not generate from the “cut” target. Maybe an alternative could borrow from the rather large if disparate missing-constant toolbox. (In any case, we do not simulate from the original joint distribution.) The natural solution would then be to make an independent proposal on φ with target the posterior given z, and then use any scheme that preserves the conditional of θ given φ and y; “any” is rather wishful thinking at this stage, since the only practical solution that I see is to run a Metropolis-Hastings sampler long enough to “reach” stationarity… I also remain with a lingering, although not life-threatening, question of whether or not the BUGS code using cut distributions provides the “right” answer. Here are my five slides used during the seminar (with a random walk implementation that did not diverge from the true target…):
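For what it is worth, here is a toy Python sketch of that natural solution (an assumed conjugate model of my own, not the Poisson-binomial example from Martyn's talk): φ is moved by a random-walk Metropolis step targeting π(φ|z) alone, so the feedback from y is cut, and θ is then refreshed from its exact conditional given φ and y.

```python
import math
import random

def cut_sampler(z, y, n, n_iter=5000, step=0.5, seed=0):
    """Toy cut-model sampler: phi moves by random-walk Metropolis targeting
    pi(phi | z) only (the feedback from y is cut), then theta is drawn from
    its exact conditional given phi and y.
    Assumed toy model: z_i ~ Poisson(phi) with an Exp(1) prior on phi,
    theta | phi ~ Beta(phi, 1) and y ~ Binomial(n, theta)."""
    rng = random.Random(seed)
    sz, nz = sum(z), len(z)

    def log_post_phi(p):                 # log pi(phi | z) up to a constant
        return -(nz + 1) * p + sz * math.log(p) if p > 0 else float("-inf")

    def rbeta(a, b):                     # Beta(a, b) draw via two gamma variates
        x = rng.gammavariate(a, 1.0)
        return x / (x + rng.gammavariate(b, 1.0))

    phi = max(sz / nz, 0.1)
    samples = []
    for _ in range(n_iter):
        prop = phi + rng.gauss(0.0, step)
        logr = log_post_phi(prop) - log_post_phi(phi)
        if rng.random() < math.exp(min(0.0, logr)):
            phi = prop                   # y never enters this update: the cut
        theta = rbeta(phi + y, 1.0 + n - y)
        samples.append((phi, theta))
    return samples
```

Note that the φ marginal of the output is the posterior given z alone, not the joint posterior, which is the whole point (and the whole worry) of the cut.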

## MCMSki 4, 5… [rejuvenating suggestion]

Posted in Kids, Mountains, pictures, Statistics, Travel, University life on January 16, 2014 by xi'an

Another thing I should have included in the program, or in the organising committee: a link with the young Bayesians (j-ISBA) section… As pointed out to me by Kerrie Mengersen, ISBA meetings are obvious opportunities for young researchers to interact and network, as well as to seek a job. Thus, there should be time slots dedicated to them in every ISBA-sponsored meeting, from a mixer on the very first day to a job-market coffee break the next day (and any other social activity bound to increase interactivity, like a ski race). So I would suggest every ISBA-sponsored event (and not only the Bayesian Young Statisticians Meetings!) should include a j-ISBA representative in its committee(s) to enforce this policy… (Kerrie also suggested random permutations during the banquet, which is a neat idea provided the restaurant structure allows for it. It would have been total chaos in La Calèche last week!)

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #18)

Posted in Mountains, R, Statistics, University life on January 6, 2014 by xi'an

MCMSki IV is about to start! While further participants may still register (registration is still open!), we currently have 223 registered participants, not counting accompanying people. I do hope most of them managed to reach the town of Chamonix-Mont-Blanc despite the foul weather on the East Coast. Unfortunately, four speakers (so far) cannot make it: Yuguo Chen (Urbana-Champaign), David Hunter (Penn State), Georgios Karagiannis (Toronto), and Liam Paninski (New York). Nial Friel will replace David Hunter and give a talk on noisy MCMC.

First, the posters for tonight’s session (authors A to K) should be put up today (before dinner) on the boards at the end of the main lecture theatre, and removed tonight as well. Check my wordpress blog for the abstracts. (When I mentioned there was no deadline for sending abstracts, I did not expect to receive one last Friday!)

Second, I remind potential skiers that the most manageable option is to ski on the Brévent domain, uphill from the conference centre. There is even a small rental place facing the cable-car station (make sure to phone +33450535264 to check they still have skis available) that also rents storage closets…

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #17)

Posted in Mountains, R, Statistics, University life on January 3, 2014 by xi'an

We are a few days from the start; here are the latest items of information for the participants:

The shuttle transfer on January 5th from Geneva Airport to Chamonix lasts 1 hour 30 minutes. On arrival at the airport, follow the “Swiss Exit”. After customs, the bus driver (holding a sign “MCMC’Ski Chamonix”) will be waiting for you at the Meeting Point in the Arrival Hall. The driver will arrive 10 minutes before the meeting time and will check each participant against his or her list. There may be delays in case of poor weather. The bus will drop you in front of or close to your hotel. If you miss the bus you initially booked, you can take the next one. If you miss the last transfer, a taxi will be the only solution (warning: about 250 Euros!!!)

Registration will start on Monday January 6th at 8am; the conference will start at 8.45am. The conference will take place at the Majestic Congress Center, located at 241 Allée du Majestic, in downtown Chamonix. There are signs all over town directing to the Majestic Congrès. (No skiing equipment, i.e., skis, boots, or boards, is allowed inside the building.) Speakers are advised to check with their session chair in advance about downloading their talk.

The Richard Tweedie ski race should take place on Wednesday at 1pm, weather and snow permitting. There will be a registration line at the registration desk. (The cost is 10€ per person and does not include lift passes or equipment.) Thanks to Antonietta Mira, there will be two pairs of skis to be won!