## ABC random forests for Bayesian parameter inference

**B**efore leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise. This paper summarises the experiments Louis conducted over the past months, which show the strong performance of a random forest regression approach to ABC parameter inference, thus validating, in this experimental sense, the use of random forests for conducting ABC Bayesian inference. (And not ABC model choice, as in the Bioinformatics paper with Pierre Pudlo and others.)

I think the major incentives for exploiting the (still mysterious) tool of random forests [rather than more traditional ABC approaches like Fearnhead and Prangle (2012) on summary selection] are that (a) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when including a large number of useless white noise variables; (b) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural, if rudimentary, distance that corresponds to being or not being in the same leaf as the observed vector of summary statistics η(y); (c) the reference table simulated from the prior (predictive) distribution does not need to be as large as in usual ABC settings, and hence this approach leads to significant gains in computing time, since producing the reference table is usually the costly part! So much so that deriving a different forest for each univariate transform of interest adds only a minor cost to the overall computation.
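Points (a) and (b) can be illustrated with a small pure-Python sketch. Everything in it is an assumption made for illustration and not the abcrf code: a toy model where the data are 20 draws y_i ~ N(θ, 1) under a Uniform(-5, 5) prior, summaries made of the sample mean and median plus three white-noise variables, and a hand-rolled forest of CART regression trees (bootstrap resampling, `mtry` candidate summaries per node, minimum leaf size 5). The helper names are hypothetical. No summary selection and no tolerance are involved: each tree returns the mean θ over the (bootstrap) reference points sharing η(y)'s leaf, and the forest averages these.

```python
import random
import statistics
from typing import List, Optional, Tuple

Row = List[float]
Tree = tuple  # ("leaf", value) or ("node", feature, threshold, left, right)


def best_split(X: List[Row], y: List[float], idx: List[int],
               features: List[int], min_leaf: int) -> Optional[Tuple[int, float]]:
    """Exhaustive CART search over the candidate features (least squares)."""
    best, best_sse = None, float("inf")
    for j in features:
        order = sorted(idx, key=lambda i: X[i][j])
        n = len(order)
        total = sum(y[i] for i in order)
        total_sq = sum(y[i] ** 2 for i in order)
        left_sum = left_sq = 0.0
        for k in range(n - 1):
            left_sum += y[order[k]]
            left_sq += y[order[k]] ** 2
            nl, nr = k + 1, n - k - 1
            if nl < min_leaf or nr < min_leaf:
                continue
            if X[order[k]][j] == X[order[k + 1]][j]:
                continue  # no threshold separates tied values
            right_sum = total - left_sum
            sse = (left_sq - left_sum ** 2 / nl) + (total_sq - left_sq - right_sum ** 2 / nr)
            if sse < best_sse:
                best, best_sse = (j, X[order[k]][j]), sse
    return best


def grow(X: List[Row], y: List[float], idx: List[int], mtry: int,
         min_leaf: int, rng: random.Random) -> Tree:
    """Grow one regression tree, drawing `mtry` candidate summaries per node."""
    split = None
    if len(idx) >= 2 * min_leaf:
        candidates = rng.sample(range(len(X[0])), mtry)
        split = best_split(X, y, idx, candidates, min_leaf)
    if split is None:
        return ("leaf", statistics.fmean(y[i] for i in idx))
    j, thr = split
    left = [i for i in idx if X[i][j] <= thr]
    right = [i for i in idx if X[i][j] > thr]
    return ("node", j, thr, grow(X, y, left, mtry, min_leaf, rng),
            grow(X, y, right, mtry, min_leaf, rng))


def predict(tree: Tree, x: Row) -> float:
    """Drop x down the tree and return the mean of its leaf."""
    while tree[0] == "node":
        _, j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree[1]


def fit_forest(X: List[Row], y: List[float], n_trees: int = 50, mtry: int = 2,
               min_leaf: int = 5, seed: int = 0) -> List[Tree]:
    """Each tree is grown on a bootstrap sample of the reference table."""
    rng = random.Random(seed)
    n = len(X)
    return [grow(X, y, [rng.randrange(n) for _ in range(n)], mtry, min_leaf, rng)
            for _ in range(n_trees)]


def summaries(data: List[float], rng: random.Random, n_noise: int = 3) -> Row:
    """Two informative summaries followed by pure white-noise 'summaries'."""
    informative = [statistics.fmean(data), statistics.median(data)]
    return informative + [rng.gauss(0.0, 1.0) for _ in range(n_noise)]


def reference_table(n_sims: int, n_obs: int, rng: random.Random):
    """Simulate (theta, eta(y)) pairs from the prior predictive."""
    X, theta = [], []
    for _ in range(n_sims):
        t = rng.uniform(-5.0, 5.0)
        X.append(summaries([rng.gauss(t, 1.0) for _ in range(n_obs)], rng))
        theta.append(t)
    return X, theta


rng = random.Random(1)
X_ref, theta_ref = reference_table(500, 20, rng)
y_obs = [rng.gauss(1.5, 1.0) for _ in range(20)]
eta_obs = summaries(y_obs, rng)
forest = fit_forest(X_ref, theta_ref)
post_mean = statistics.fmean(predict(tree, eta_obs) for tree in forest)
print(f"forest estimate of E[theta | eta(y)]: {post_mean:.3f}")
```

Even with the noise summaries in the input, the forest estimate should land close to the sample mean of `y_obs`, which is essentially the exact posterior mean in this toy model since the Uniform(-5, 5) prior barely truncates the N(ȳ, 1/20) likelihood. The exhaustive split search keeps the sketch short; it is far slower than a real forest implementation.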

An intriguing point we uncovered through Louis’ experiments is that an unusual version of the variance estimator is preferable to the standard one: we indeed observed better estimation performance when using a weighted version of the out-of-bag residuals (which are computed as the differences between the simulated values of the parameter transforms and their expectations obtained by removing the random trees trained on this simulated value).

Another intriguing feature [to me] is that the regression weights proposed by Meinshausen (2006) are obtained as an average, over the trees, of the inverse of the number of points in the leaf of interest. When estimating the posterior expectation of a transform h(θ) given the observed η(y), this summary statistic η(y) ends up in a given leaf for each tree in the forest, and all that matters for computing the weights is the number of points from the reference table ending up in this very leaf. I do find this difficult to explain when contrasting the case where many simulated points share the leaf with the case where a single simulated point makes up the leaf. This single point ends up being much more influential than all the points in the other situation… while being an outlier of sorts with respect to the prior simulation. But now that I think more about it (after an expensive Lapin Kulta beer in the Helsinki airport while waiting for a change of tire on our airplane!), it somewhat makes sense that rare simulations that agree with the data should be weighted much more than values that stem from the prior simulations and hence carry little of the information brought by the observation. (If this sounds murky, blame the beer.)

What I found great about this new approach is that it produces a non-parametric evaluation of the cdf of the quantity of interest h(θ) at no calibration cost, or hardly any. (An R package is in the making, to be added to the existing R functions of abcrf we developed for the ABC model choice paper.)
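The leaf-based weighting can be sketched in a few lines of Python. The leaf memberships below are hypothetical hand-written inputs (in practice they would come from passing the reference table and η(y) through a fitted forest), and leaf sizes count distinct reference points while ignoring bootstrap multiplicities, which is a simplifying assumption. The same weights give both the posterior expectation of h(θ) and a non-parametric cdf estimate.

```python
from typing import List, Sequence


def leaf_weights(ref_leaves: Sequence[Sequence[int]],
                 obs_leaves: Sequence[int]) -> List[float]:
    """Meinshausen-style weights: for each tree, every reference point sharing
    eta(y)'s leaf gets 1 / (leaf size); weights are then averaged over trees.

    ref_leaves[t][i] is the leaf of tree t holding reference point i (assumed input);
    obs_leaves[t] is the leaf of tree t holding the observed eta(y).
    """
    n_trees = len(ref_leaves)
    weights = [0.0] * len(ref_leaves[0])
    for leaves, obs_leaf in zip(ref_leaves, obs_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == obs_leaf]
        if not members:
            raise ValueError("observed leaf contains no reference point")
        for i in members:
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def posterior_mean(weights: Sequence[float], h_values: Sequence[float]) -> float:
    """Weighted estimate of E[h(theta) | eta(y)]."""
    return sum(w * h for w, h in zip(weights, h_values))


def posterior_cdf(weights: Sequence[float], h_values: Sequence[float], t: float) -> float:
    """Non-parametric estimate of P(h(theta) <= t | eta(y))."""
    return sum(w for w, h in zip(weights, h_values) if h <= t)


# Hypothetical forest of two trees over four reference points.
ref_leaves = [
    [0, 0, 0, 1],  # tree 1: eta(y) shares leaf 0 with points 0, 1 and 2
    [2, 3, 3, 3],  # tree 2: eta(y) shares leaf 2 with point 0 alone
]
obs_leaves = [0, 2]
h_values = [1.0, 2.0, 3.0, 10.0]  # h(theta_i) for each reference point

w = leaf_weights(ref_leaves, obs_leaves)  # approximately [2/3, 1/6, 1/6, 0]
print(w, posterior_mean(w, h_values), posterior_cdf(w, h_values, 2.0))
```

Point 0 receives weight 2/3, half of which comes from the single tree where it sits alone with η(y), while point 3, never sharing η(y)'s leaf, gets no weight at all: a small-scale version of the single-point influence discussed above.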

May 24, 2016 at 11:00 pm

Hi Xi’an,

Do you by any chance know if there is a tutorial for abc-rf? I’ve only been able to find the R manual.

Thanks!

May 25, 2016 at 4:29 am

Thanks, Brittany! No, I am afraid there is only the R manual for the time being… But abc-rf should not be that hard to operate, given that it is essentially a single function. Feel free to call upon us if you have trouble operating that function.

May 27, 2016 at 11:38 pm

Hi Xi’an – thank you for your response! Would you by any chance be willing to share an example script with me? I’m just having trouble getting started with the program. I have generated models in DIYABC and am not sure how to import the reference table into abcrf. I’m afraid that I’m not an expert in either R or ABC methods, so it’s all a bit challenging for me. Thanks so much!

May 28, 2016 at 10:06 am

As a starter, have you tried the examples provided in the abcrf reference manual? There is a dataset included in the package, snp, that you can use for testing the few functions in the package.

June 1, 2016 at 10:39 pm

Hi again! Yes, I’ve seen and worked through the example snp dataset. I just don’t see a dataset similar to this example in my DIYABC reftable files. I’m not sure if it’s because I’m using an older version (2.0.4) or because there is some data processing step that I need to do prior to importing the data into R. I think that Pierre Pudlo may help with my questions, so sorry for bugging you! Thank you :-)

June 1, 2016 at 10:51 pm

We are working towards updating abcrf in a few days (once the French Statistical meeting is over) and hopefully we will include a translation function like diyabc2abrf()…

May 21, 2016 at 6:55 am

[…] article was first published on R – Xi’an’s Og, and kindly contributed […]