Archive for ABC model choice

ABC for repulsive point processes

Posted in Books, pictures, Statistics, University life on May 5, 2016 by xi'an

Shinichiro Shirota and Alan Gelfand arXived a paper on the use of ABC for analysing some repulsive point processes, more exactly the Gibbs point processes, for which ABC requires a perfect sampler to operate, unless one is okay with stopping an MCMC chain before it converges, and the determinantal point processes studied by Lavancier et al. (2015) [a paper I wanted to review and could not find time to!]. Determinantal point processes have an intensity function that is the determinant of a covariance kernel, hence repulsive. Simulation of a determinantal process itself is not straightforward and involves approximations. Moreover, the likelihood itself is unavailable and Lavancier et al. (2015) use approximate versions based on fast Fourier transforms, which means MCMC is challenging even with those approximate steps.

“The main computational cost of our algorithm is simulation of x for each iteration of the ABC-MCMC.”

The authors propose here to use ABC instead. With an extra approximative step for simulating the determinantal process itself. Interestingly, the Gibbs point process allows for a sufficient statistic, the number of R-close points, although I fail to see how the radius R is determined by the model, while the determinantal process does not. The summary statistics end up being a collection of frequencies within various spheres of different radii. However, these statistics are then processed by Fearnhead’s and Prangle’s proposal, namely to come up with an approximation of E[θ|y] as the natural summary. Obtained by regression over the original summaries. Another layer of complexity stems from using an ABC-MCMC approach. And from including a Lasso step in the regression towards excluding less relevant radii. The paper also considers Bayesian model validation for such point processes, implementing prior predictive tests with a ranked probability score, rather than a Bayes factor.
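To make the radius-based summaries more concrete, here is a minimal Python sketch (not the authors' code): it counts the pairs of points lying within distance r of each other for several radii, which is one reading of the close-point statistic mentioned above, and turns those counts into a naive estimate of Ripley's K. The function names, the toy pattern, the unit-square window and the absence of any edge correction are all assumptions made for illustration.

```python
import math
from itertools import combinations


def close_pair_counts(points, radii):
    """Count unordered pairs of points at distance <= r, for each r in radii."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return [sum(d <= r for d in dists) for r in radii]


def naive_k_function(points, radii, area=1.0):
    """Naive estimate of Ripley's K (assumption: no edge correction).

    K(r) is estimated by area * (number of ordered close pairs) / (n * (n - 1)).
    """
    n = len(points)
    counts = close_pair_counts(points, radii)
    return [area * 2 * c / (n * (n - 1)) for c in counts]


# Hypothetical toy pattern in the unit square.
points = [(0.10, 0.10), (0.13, 0.10), (0.50, 0.50), (0.90, 0.20)]
radii = [0.05, 0.52, 0.60]

counts = close_pair_counts(points, radii)
k_hat = naive_k_function(points, radii)
print(counts, k_hat)
```

A repulsive pattern would show fewer close pairs at small radii than a Poisson pattern with the same number of points, which is why such counts are natural candidates as summaries.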

As point processes have always been somewhat mysterious to me, I do not have any intuition about the strength of the distributional assumptions there and the relevance of picking a determinantal process against, say, a Strauss process. The model comparisons conducted in the paper do not strongly support one repulsive model over the others, with the authors concluding that many points are needed to discriminate between models. I also wonder about the possibility of including other summaries than Ripley’s K-functions, which somewhat imply a discretisation of the space, by concentric rings. Maybe using other point processes for deriving summary statistics as MLEs or Bayes estimators for those models would help. (Or maybe not.)

new version of abcrf

Posted in R, Statistics, University life on February 12, 2016 by xi'an

Version 1.1 of our R library abcrf is now available on CRAN. Improvements against the earlier version are numerous and substantial. In particular, calculations of the random forests have been parallelised and, for machines with multiple cores, the computing gain can be enormous. (The package goes along with the random forest model choice paper published in Bioinformatics.)

ABC for wargames

Posted in Books, Kids, pictures, Statistics on February 10, 2016 by xi'an

I recently came across an ABC paper in PLoS ONE by Xavier Rubio-Campillo applying this simulation technique to the validation of some differential equation models linking force sizes and values for both sides. The dataset is made of battle casualties separated into four periods, from pike and musket to the American Civil War. The outcome is used to compute an ABC Bayes factor but it seems this computation is highly dependent on the tolerance threshold. With highly variable numerical values. The most favoured model includes a fatigue effect accounting for the decreasing efficiency of armies over time. While the paper somehow reminded me of a most peculiar book, I have no idea about the depth of this analysis, namely about how relevant it is to model a battle through a two-dimensional system of differential equations, given the numerous factors involved in the matter…
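The dependence of an ABC Bayes factor on the tolerance can be illustrated by a small Python sketch. This is a toy example under stated assumptions, not the differential equation models of the paper: two hypothetical Gaussian models that differ only in the prior standard deviation of the mean, a single summary (the sample mean), and a uniform prior over models. All names and numerical values are made up for the illustration.

```python
import random


def simulate_summary(model, rng, n_obs=20):
    """Toy simulator (assumption): sample mean of n_obs N(mu, 1) draws.

    The two models differ only in the prior on mu:
    model 0 uses mu ~ N(0, 1), model 1 uses mu ~ N(0, 10^2).
    """
    prior_sd = 1.0 if model == 0 else 10.0
    mu = rng.gauss(0.0, prior_sd)
    # The sample mean of n_obs N(mu, 1) draws is distributed as N(mu, 1 / n_obs).
    return rng.gauss(mu, (1.0 / n_obs) ** 0.5)


def abc_bayes_factors(observed, tolerances, n_sims=20000, seed=1):
    """Return {tolerance: (accepted counts per model, Bayes factor 0 vs 1)}."""
    rng = random.Random(seed)
    # Reference table, simulated once and reused for every tolerance.
    table = []
    for _ in range(n_sims):
        model = rng.randrange(2)  # uniform prior over the two models
        table.append((model, abs(simulate_summary(model, rng) - observed)))
    out = {}
    for eps in tolerances:
        counts = [0, 0]
        for model, dist in table:
            if dist <= eps:
                counts[model] += 1
        # With equal prior weights, the ratio of acceptance counts
        # estimates the Bayes factor of model 0 against model 1.
        bf = counts[0] / counts[1] if counts[1] else float("inf")
        out[eps] = (counts, bf)
    return out


results = abc_bayes_factors(observed=0.5, tolerances=[0.1, 1.0, 1000.0])
for eps, (counts, bf) in results.items():
    print(f"tolerance={eps:g}  accepted={counts}  Bayes factor={bf:.2f}")
```

In this toy setting the estimate drifts towards one as the tolerance grows, since every simulation ends up being accepted and the ratio only reflects the prior over models.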

Goodness-of-fit statistics for ABC

Posted in Books, Statistics, University life on February 1, 2016 by xi'an

“Posterior predictive checks are well-suited to Approximate Bayesian Computation”

Louisiane Lemaire and her coauthors from Grenoble have just arXived a new paper on designing a goodness-of-fit statistic from ABC outputs. The statistic is constructed from a comparison between the observed (summary) statistics and replicated summary statistics generated from the posterior predictive distribution. This is a major difference with the standard ABC distance, where the replicated summary statistics are generated from the prior predictive distribution. The core of the paper is about calibrating a posterior predictive p-value derived from this distance, since it is not properly calibrated in the frequentist sense that it is not uniformly distributed “under the null”. A point I discussed in an ‘Og entry about Andrews’ book a few years ago.

The paper opposes the average distance between ABC acceptable summary statistics and the observed realisation to the average distance between ABC posterior predictive simulations of summary statistics and the observed realisation. In the simplest case (e.g., without a post-processing of the summary statistics), the main difference between both average distances is that the summary statistics are used twice in the first version: first to select the acceptable values of the parameters and a second time for the average distance. Which makes it biased downwards. The second version is more computationally demanding, especially when deriving the associated p-value. It however produces a higher power under the alternative. Obviously depending on how the alternative is defined, since goodness-of-fit is only related to the null, i.e., to a specific model.
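The contrast between the two average distances can be sketched in Python on a toy problem. The Gaussian model, the N(0, 2²) prior, the tolerance and the function names below are all assumptions for illustration; the paper's actual statistic and the calibration of its p-value are not reproduced here.

```python
import random


def simulate_summary(theta, rng, n_obs=20):
    """Toy model (assumption): summary = sample mean of n_obs N(theta, 1) draws."""
    return rng.gauss(theta, (1.0 / n_obs) ** 0.5)


def two_average_distances(s_obs, eps=0.1, n_sims=50000, seed=2):
    """Return (reused-summary distance, posterior predictive distance, #accepted)."""
    rng = random.Random(seed)
    accepted_theta, accepted_dist = [], []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 2.0)  # prior draw (assumption: N(0, 2^2))
        dist = abs(simulate_summary(theta, rng) - s_obs)
        if dist <= eps:  # ABC rejection step
            accepted_theta.append(theta)
            accepted_dist.append(dist)
    # Version 1: reuse the summaries that were already used for acceptance.
    d_reused = sum(accepted_dist) / len(accepted_dist)
    # Version 2: fresh replicates simulated from the accepted parameters,
    # i.e. draws from the ABC posterior predictive.
    fresh = [abs(simulate_summary(t, rng) - s_obs) for t in accepted_theta]
    d_predictive = sum(fresh) / len(fresh)
    return d_reused, d_predictive, len(accepted_theta)


d_reused, d_predictive, n_accepted = two_average_distances(s_obs=1.0)
print(n_accepted, d_reused, d_predictive)
```

By construction the first average cannot exceed the tolerance, which is one way to see the downward bias, while the second requires an extra simulation per accepted parameter.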

From a general perspective, I do not completely agree with the conclusions of the paper in that (a) this is a frequentist assessment and partakes in the shortcomings of p-values and (b) the choice of summary statistics has a huge impact on the decision about the fit since hardly varying statistics are more likely to lead to a good fit than appropriately varying ones.

ABC model choice via random forests accepted!

Posted in Books, pictures, Statistics, University life on October 21, 2015 by xi'an

“This revision represents a very nice response to the earlier round of reviews, including a significant extension in which the posterior probability of the selected model is now estimated (whereas previously this was not included). The extension is a very nice one, and I am happy to see it included.” Anonymous

Great news [at least for us], our paper on ABC model choice has been accepted by Bioinformatics! With the pleasant comment above from one anonymous referee. This occurs after quite a prolonged gestation, which actually contributed to a shift in our understanding and our implementation of the method. I am still a wee bit unhappy at the rejection by PNAS, but it paradoxically led to a more elaborate article. So all is well that ends well! Except the story is not finished and we are still exploring the multiple usages of random forests in ABC.

seminar in München, at the Max-Planck-Institut für Astrophysik

Posted in Statistics, Travel, University life on October 15, 2015 by xi'an

On Friday, I give a talk in München on ABC model choice. At the Max-Planck Institute for Astrophysics. As coincidences go, I happen to talk the week after John Skilling gave a seminar there. On Bayesian tomography, not on nested sampling. And the conference organisers put the cover of the book Think Bayes: Bayesian Statistics Made Simple, written by Allen Downey, a book I reviewed last night for CHANCE (soon to appear on the ‘Og!) [not that I understand the connection with the Max-Planck Institute or with my talk!, why not?!] The slides are the same as in Oxford for SPA 2015.

ABC model choice via random forests [and no fire]

Posted in Books, pictures, R, Statistics, University life on September 4, 2015 by xi'an

While my arXiv newspage today had a puzzling entry about modelling UFO sightings in France, it also broadcast our revision of Reliable ABC model choice via random forests, a version that we resubmitted today to Bioinformatics after a quite thorough upgrade, the most dramatic one being the realisation we could also approximate the posterior probability of the selected model via another random forest. (With no connection with the recent post on forest fires!) As discussed a little while ago on the ‘Og. And also in conjunction with our creating the abcrf R package for running ABC model choice out of a reference table. While it has been an excruciatingly slow process (the initial version of the arXived document dates from June 2014, the PNAS submission was rejected for not being Bayesian enough, and the latest revision took the whole summer), the slow maturation of our thoughts on the model choice issues led us to modify the role of random forests in the ABC approach to model choice, in that we reversed our earlier assessment that they could only be trusted for selecting the most likely model, by realising this summer the corresponding posterior probability could be expressed as a posterior loss and estimated by a secondary forest. As first considered in Stoehr et al. (2014). (In retrospect, this brings an answer to one of the earlier referee’s comments.) Next goal is to incorporate those changes in DIYABC (and wait for the next version of the software to appear). Another best-selling innovation due to Arnaud: we added a practical implementation section in the format of FAQ for issues related to the calibration of the algorithms.
