## Archive for ABC model choice

## new version of abcrf

Posted in R, Statistics, University life with tags ABC model choice, bioinformatics, CRAN, parallelisation, R, R package, random forests on February 12, 2016 by xi'an

**V**ersion 1.1 of our R library abcrf is now available on CRAN. Improvements over the earlier version are numerous and substantial. In particular, calculations of the random forests have been parallelised and, for machines with multiple cores, the computing gain can be enormous. (The package goes along with the random forest model choice paper published in Bioinformatics.)

## ABC for wargames

Posted in Books, Kids, pictures, Statistics with tags ABC, ABC model choice, Bayes factor, differential equation, elves, PLoS ONE, warhammer on February 10, 2016 by xi'an

**I** recently came across an ABC paper in PLoS ONE by Xavier Rubio-Campillo applying this simulation technique to the validation of some differential equation models linking force sizes and values for both sides. The dataset is made of battle casualties separated into four periods, from *pike and musket* to the *American Civil War*. The outcome is used to compute an ABC Bayes factor but it seems this computation is highly dependent on the tolerance threshold. With highly variable numerical values. The most favoured model includes some fatigue effect about the decreasing efficiency of armies over time. While the paper somehow reminded me of a most peculiar book, I have no idea about the depth of this analysis, namely about how relevant it is to model a battle through a two-dimensional system of differential equations, given the numerous factors involved in the matter…

## Goodness-of-fit statistics for ABC

Posted in Books, Statistics, University life with tags ABC, ABC model choice, Bayesian p-values, goodness of fit, posterior predictive, summary statistics on February 1, 2016 by xi'an

“Posterior predictive checks are well-suited to Approximate Bayesian Computation”

Louisiane Lemaire and her coauthors from Grenoble have just arXived a new paper on designing a goodness-of-fit statistic from ABC outputs. The statistic is constructed from a comparison between the observed (summary) statistics and replicated summary statistics generated from the posterior predictive distribution. This is a major difference from the standard ABC distance, where the replicated summary statistics are generated from the prior predictive distribution. The core of the paper is about calibrating a posterior predictive p-value derived from this distance, since it is not properly calibrated in the frequentist sense that it is not uniformly distributed “under the null”. A point I discussed in an ‘Og entry about Andrews’ book a few years ago.

The paper opposes the average distance between ABC acceptable summary statistics and the observed realisation to the average distance between ABC posterior predictive simulations of summary statistics and the observed realisation. In the simplest case (e.g., without a post-processing of the summary statistics), the main difference between both average distances is that the summary statistics are used twice in the first version: first to select the acceptable values of the parameters and a second time for the average distance. Which makes it biased downwards. The second version is more computationally demanding, especially when deriving the associated p-value. It however produces a higher power under the alternative. Obviously depending on how the alternative is defined, since goodness-of-fit is only related to the null, i.e., to a specific model.
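To make the contrast between the two average distances concrete, here is a minimal sketch in base R. Everything in it is an assumption made for illustration and not taken from the paper: a toy Normal model with unknown mean, the sample mean as single summary statistic, the absolute difference as distance, a 1% acceptance rate, and the names `simulate_summary`, `d_accept` and `d_ppc`. It does not attempt the calibration of the p-value.

```r
set.seed(1)

# Toy setting (assumption): y_1, ..., y_n ~ N(theta, 1), prior theta ~ N(0, 10^2),
# summary statistic = sample mean, distance = absolute difference.
n     <- 20
y_obs <- rnorm(n, mean = 1, sd = 1)
s_obs <- mean(y_obs)

# Hypothetical helper: simulate one dataset given theta and return its summary
simulate_summary <- function(theta) mean(rnorm(n, mean = theta, sd = 1))

# Reference table: parameters from the prior, summaries from the prior predictive
N     <- 20000
theta <- rnorm(N, mean = 0, sd = 10)
s_sim <- vapply(theta, simulate_summary, numeric(1))
d_sim <- abs(s_sim - s_obs)

# ABC rejection step: keep the 1% of simulations closest to the observation
keep <- order(d_sim)[seq_len(N / 100)]

# First version: average distance of the accepted summaries themselves,
# i.e. the summaries are used for selection and again for the average
d_accept <- mean(d_sim[keep])

# Second version: fresh summaries simulated from the accepted parameters
# (ABC posterior predictive), then averaged distance to the observation
s_rep <- vapply(theta[keep], simulate_summary, numeric(1))
d_ppc <- mean(abs(s_rep - s_obs))

c(accepted = d_accept, posterior_predictive = d_ppc)
```

In this toy setting the first average is bounded by the acceptance tolerance by construction, which is one way to see the downward bias, while the second requires an extra round of simulations from the model.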

From a general perspective, I do not completely agree with the conclusions of the paper in that (a) this is a frequentist assessment and partakes in the shortcomings of p-values and (b) the choice of summary statistics has a huge impact on the decision about the fit since hardly varying statistics are more likely to lead to a good fit than appropriately varying ones.

## ABC model choice via random forests accepted!

Posted in Books, pictures, Statistics, University life with tags ABC, ABC model choice, bioinformatics, Montpellier, PNAS, random forests on October 21, 2015 by xi'an

“This revision represents a very nice response to the earlier round of reviews, including a significant extension in which the posterior probability of the selected model is now estimated (whereas previously this was not included). The extension is a very nice one, and I am happy to see it included.” Anonymous

**G**reat news [at least for us], our paper on ABC model choice has been accepted by Bioinformatics! With the pleasant comment above from one anonymous referee. This occurs after quite a prolonged gestation, which actually contributed to a shift in our understanding and our implementation of the method. I am still a wee bit unhappy at the rejection by PNAS, but it paradoxically led to a more elaborate article. So all is well that ends well! Except the story is not finished and we are still exploring the multiple usages of random forests in ABC.

## ABC model choice via random forests [and no fire]

Posted in Books, pictures, R, Statistics, University life with tags ABC model choice, abcrf, Bayesian model choice, DIYABC, France, model posterior probabilities, PNAS, R, random forests, UFOs on September 4, 2015 by xi'an

**W**hile my arXiv newspage today had a puzzling entry about modelling UFO sightings in France, it also broadcast our revision of Reliable ABC model choice via random forests, a version that we resubmitted today to Bioinformatics after a quite thorough upgrade, the most dramatic one being the realisation we could also approximate the posterior probability of the selected model via another random forest. (With no connection with the recent post on forest fires!) As discussed a little while ago on the ‘Og. And also in conjunction with our creating the abcrf R package for running ABC model choice out of a reference table. While it has been an excruciatingly slow process (the initial version of the arXived document dates from June 2014, the PNAS submission was rejected for not being Bayesian enough, and the latest revision took the whole summer), the slow maturation of our thoughts on the model choice issues led us to modify the role of random forests in the ABC approach to model choice, in that we reversed our earlier assessment that they could only be trusted for selecting the most likely model, by realising this summer that the corresponding posterior could be expressed as a posterior loss and estimated by a secondary forest. As first considered in Stoehr et al. (2014). (In retrospect, this brings an answer to one of the earlier referee’s comments.) Next goal is to incorporate those changes in DIYABC (and wait for the next version of the software to appear). Another best-selling innovation due to Arnaud: we added a practical implementation section in the format of a FAQ for issues related to the calibration of the algorithms.

## abcrf 0.9-3

Posted in R, Statistics, University life with tags ABC, ABC model choice, abcrf, bioinformatics, CRAN, R, random forests, reference table, SNPs on August 27, 2015 by xi'an

**I**n conjunction with our reliable ABC model choice via random forest paper, about to be resubmitted to *Bioinformatics*, we have contributed an R package called abcrf that produces a most likely model and its posterior probability out of an ABC reference table. In conjunction with the realisation that we could devise an approximation to the (ABC) posterior probability using a secondary random forest. “We” meaning Jean-Michel Marin and Pierre Pudlo, as I only acted as a beta tester!

The package abcrf consists of three functions:

- *abcrf*, which constructs a random forest from a reference table and returns an object of class `abc-rf`;
- *plot.abcrf*, which gives both the variable importance plot of a model choice abc-rf object and the projection of the reference table on the LDA axes;
- *predict.abcrf*, which predicts the model for new data and evaluates the posterior probability of the MAP.

An illustration from the manual:

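    library(abcrf)
    # load the example reference table and the observed data
    data(snp)
    data(snp.obs)
    # build the random forest from the first 1,000 rows of the reference table
    mc.rf <- abcrf(snp[1:1e3, 1], snp[1:1e3, -1])
    # predict the model for the observed data
    predict(mc.rf, snp[1:1e3, -1], snp.obs)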