Just mentioning that a second version of our paper has been arXived and submitted to JMLR, the main addition being the inclusion of a reference to the abcrf package. And just repeating our best selling arguments that (i) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when including a large number of useless white noise variables; (ii) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural if rudimentary distance that corresponds to being or not being in the same leaf as the observed vector of summary statistics η(y); (iii) the size of the reference table simulated from the prior (predictive) distribution does not need to be as large as in usual ABC settings and hence this approach leads to significant gains in computing time, since the production of the reference table usually is the costly part! To the point that deriving a different forest for each univariate transform of interest is truly a minor drag in the overall computing cost of the approach.
This series of posts is most probably becoming by now an imposition on the ‘Og readership, which either attended ISBA 2016 and does (do?) not need my impressions or did not attend and hence does (do?) not need vague impressions about talks they (it?) did not see, but indulge me in reminiscing about this last ISBA meeting (or more reasonably ignore this post altogether), now that I am back home (with most of my Sard wine bottles intact, and a good array of Sard cheeses!).
This meeting seems to have been the largest ISBA meeting ever, with hundreds of young statisticians taking part in it (despite my early misgivings about the deterrent represented by the overall cost of attending the meeting). I presume holding the meeting in Europe made it easier and cheaper for most Europeans to attend (and hopefully the same will happen in Edinburgh in 2018!), as did the (somewhat unexpected) wide availability of rental alternatives in the close vicinity of the conference resort. I also presume the same travel opportunities would not have been available in Banff, although local costs would have been lower. It was fantastic to see so many new researchers interested in Bayesian statistics and to meet some of them. And to have more sessions run by the j-Bayes section of ISBA (although I found it counterproductive that such sessions do not focus on a coherent theme). As a result, the meeting was more intense than ever and I found it truly exhausting, despite skipping most poster sessions. Maybe also because I did not skip a single session, thanks to the availability of an interesting theme for each block in the schedule. (And because I attended more [great] Sard dinners than I originally intended.) Having five sessions in parallel indeed means there is a fabulous offer of themes for every taste. It also means there are inevitably conflicts when picking one’s session.
Back to poster sessions, I feel I missed an essential part of the meeting, the one that makes ISBA meetings so unique, but it also seems to me the organisation of those sessions should be reconsidered in view of the rise in attendance. (And of my growing inability to stay up late!) One solution suggested by my recent AISTATS experience is to select posters so as to lower the number of posters in the four poster sessions. (The success rate for the Cadiz meeting was 35%.) The obvious downsides are the selection process (but this was done quite efficiently for AISTATS) and the potential reduction in the number of participants. A middle ground could see a smaller fraction of posters selected by this process (and published one way or another as in machine-learning conferences) and presented during the evening poster sessions, with other posters being given during the coffee breaks [which certainly does not help in reducing the intensity of the schedule]. Another, altogether different, solution is to extend the parallelism of oral sessions to poster sessions, by regrouping them into five or six themes or keywords chosen by the presenters and having those presented in different rooms to split the attendance down to human level and tolerable decibels. Nothing would prevent participants from visiting several rooms in a given evening. Or posters could be kept for several nights in a row if the number of rooms allows.
It may also be that this 2016 edition of ISBA sees the end of the resort-style meeting in the spirit of the early Valencia meetings. Edinburgh 2018 will certainly be an open-space conference in that meals and lodgings will be “on” the participants, who may choose where and how much. I have heard many times the argument that conferences held in single hotels or resorts facilitate contacts between young and senior researchers, but I fear this is not sustainable against the growth of the audience. Holding the meeting in a reasonably close and compact location, such as a University building, should allow for a sufficient degree of interaction, as was the case at ISBA 2016. (Kerrie Mengersen also suggested that a few restaurants nearby could be designated as “favourites” for participants to interact at dinner time.) Another suggestion to reinforce networking and interacting would be to hold more satellite workshops before the main conference. It seems there could be a young Bayesian workshop in England the prior week as well as a summer short course on simulation methods.
Organising meetings is getting increasingly complex and provides few rewards at the academic level, so I am grateful to the organisers of ISBA 2016 for having agreed to carry the burden this year. And to the scientific committee for setting the quality bar that high. (A special thought too for my friend Walter Racugno, who had the ultimate bad luck of having an accident the very week of the meeting he had helped organise!)
[Even though I predict this is my last post on ISBA 2016, I would be delighted to have guest posts on others’ impressions of the meeting. Feel free to send me entries!]
Before leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the great performance of a random forest regression approach to ABC parameter inference. Thus validating in this experimental sense the use of this new approach to conducting ABC for Bayesian inference by random forests. (And not ABC model choice as in the Bioinformatics paper with Pierre Pudlo and others.)
I think the major incentives in exploiting the (still mysterious) tool of random forests [against more traditional ABC approaches like Fearnhead and Prangle (2012) on summary selection] are that (i) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when including a large number of useless white noise variables; (ii) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural if rudimentary distance that corresponds to being or not being in the same leaf as the observed vector of summary statistics η(y); (iii) the size of the reference table simulated from the prior (predictive) distribution does not need to be as large as in usual ABC settings and hence this approach leads to significant gains in computing time, since the production of the reference table usually is the costly part! To the point that deriving a different forest for each univariate transform of interest is truly a minor drag in the overall computing cost of the approach.
An intriguing point we uncovered through Louis’ experiments is that an unusual version of the variance estimator is preferable to the standard estimator: we indeed observed better estimation performance when using a weighted version of the out-of-bag residuals (which are computed as the differences between the simulated value of the parameter transforms and their expectation obtained by removing the random trees involving this simulated value). Another intriguing feature [to me] is that the regression weights as proposed by Meinshausen (2006) are obtained as an average of the inverse of the number of terms in the leaf of interest. When estimating the posterior expectation of a transform h(θ) given the observed η(y), this summary statistic η(y) ends up in a given leaf for each tree in the forest and all that matters for computing the weight is the number of points from the reference table ending up in this very leaf. I do find this difficult to explain when comparing the case when many simulated points are in the leaf with the case when a single simulated point makes the leaf. This single point ends up being much more influential than all the points in the other situation… While being an outlier of sorts against the prior simulation. But now that I think more about it (after an expensive Lapin Kulta beer in the Helsinki airport while waiting for a change of tire on our airplane!), it somewhat makes sense that rare simulations that agree with the data should be weighted much more than values that stem from the prior simulations and hence do not convey much of the information brought by the observation. (If this sounds murky, blame the beer.) What I found great about this new approach is that it produces a non-parametric evaluation of the cdf of the quantity of interest h(θ) at no calibration cost or hardly any. (An R package is in the making, to be added to the existing R functions of abcrf we developed for the ABC model choice paper.)
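To make the weight discussion above a bit less murky, here is a minimal Python sketch of how I read the Meinshausen (2006) weights: for each tree, every reference point sharing the leaf of η(y) receives the inverse of the leaf size, and these contributions are averaged over trees. This is only an illustration and not the abcrf implementation: the function names (`forest_weights`, `weighted_mean`, `weighted_cdf`), the input layout (one list of leaf labels per tree), and the toy leaf assignments are all assumptions made up for the example, and no actual forest is grown.

```python
from collections import Counter


def forest_weights(ref_leaves, obs_leaves):
    """Average over trees of 1/(leaf size) for points sharing the observed leaf.

    ref_leaves[b][i]: leaf label of reference point i in tree b (hypothetical layout).
    obs_leaves[b]: leaf label reached by the observed summaries in tree b.
    """
    n_trees = len(ref_leaves)
    n_ref = len(ref_leaves[0])
    weights = [0.0] * n_ref
    for leaves, obs_leaf in zip(ref_leaves, obs_leaves):
        leaf_size = Counter(leaves)[obs_leaf]
        if leaf_size == 0:
            # No reference point in this leaf: the tree contributes nothing.
            continue
        for i, leaf in enumerate(leaves):
            if leaf == obs_leaf:
                weights[i] += 1.0 / (leaf_size * n_trees)
    return weights


def weighted_mean(weights, h_values):
    """Weighted estimate of the posterior expectation of h(theta)."""
    return sum(w * h for w, h in zip(weights, h_values))


def weighted_cdf(weights, h_values, t):
    """Weighted estimate of the posterior cdf of h(theta) at t."""
    return sum(w for w, h in zip(weights, h_values) if h <= t)


# Toy illustration (made-up leaf labels): 4 reference points, 2 trees.
# In tree 0 the observation shares its leaf with points 0, 1, 2;
# in tree 1 it shares its leaf with point 0 only.
ref_leaves = [[0, 0, 0, 1],
              [0, 1, 1, 1]]
obs_leaves = [0, 0]
h_values = [1.0, 2.0, 3.0, 4.0]

weights = forest_weights(ref_leaves, obs_leaves)
posterior_mean = weighted_mean(weights, h_values)
```

In this toy case point 0 ends up with weight 1/6 + 1/2 = 2/3 against 1/6 for points 1 and 2, which is exactly the phenomenon discussed above: being alone in the leaf of η(y) for one tree gives a single simulation far more influence than sharing a crowded leaf.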