Archive for DIYABC

the new DIYABC-RF

Posted in Books, pictures, R, Statistics, Wines on April 15, 2021 by xi'an

My friends and co-authors from Montpellier released last month the third version of the DIYABC software, DIYABC-RF, which includes and promotes the use of random forests for parameter inference and model selection, in connection with Louis Raynal’s thesis. Like the earlier versions of DIYABC, it is intended for population genetic applications. Welcome!!!

The software DIYABC Random Forest (hereafter DIYABC-RF) v1.0 is composed of three parts: the dataset simulator, the Random Forest inference engine, and the graphical user interface. The whole is packaged as a standalone and user-friendly graphical application named DIYABC-RF GUI, available at https://diyabc.github.io. The developer and user manuals for each component of the software are available on the same website. DIYABC-RF is multithreaded software running on three operating systems: GNU/Linux, Microsoft Windows, and MacOS. The program can be used through a modern and user-friendly graphical interface designed as an R shiny application (Chang et al. 2019). For a fluid and simplified user experience, this interface is available through a standalone application, which does not require installing R or any dependencies and hence can be used independently. The application is also implemented in an R package providing a standard shiny web application (with the same graphical interface) that can be run locally as any shiny application, or hosted as a web service to provide a DIYABC-RF server for multiple users.
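For R users, launching the interface locally should look something like the minimal sketch below; the package name diyabcGUI and the launcher function diyabc() are assumptions drawn from my reading of the project site, so check the manuals at https://diyabc.github.io for the exact calls.

    ## hedged sketch: running the DIYABC-RF shiny interface from R; the
    ## repository path and the function name diyabc() are assumptions --
    ## see https://diyabc.github.io for the authoritative instructions
    # install.packages("remotes")
    # remotes::install_github("diyabc/diyabcGUI")
    library(diyabcGUI)
    diyabc()   # opens the shiny application in the default web browser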

ABC in Svalbard [#1]

Posted in Books, Mountains, pictures, R, Running, Statistics, Travel, University life on April 13, 2021 by xi'an

It started a bit awkwardly for me, as I ran late after accidentally switching to UK time the previous evening (despite a record-breaking biking time to the University!). Then the welcome desk could not find the key to the webinar room and I ended up following the first session from my office, by myself (and my teapot)… until we managed to reunite in the said room (with an air quality detector!).

Software sessions are rather difficult to follow and I wonder what the ideal on-line version should be. We could borrow from our teaching experience newly gained over the past year, when we had to engage students without the ability to roam the computer lab and look at their screens to nudge them into coding. It is however unrealistic to run a computer lab, unless a few “guinea pigs” could be selected in advance and show their progress, or lack thereof, during the session. In any case, thanks to the speakers who made the presentations of

  1. BSL (R)
  2. ELFI (Python)
  3. ABCpy (Python)

this morning/evening. (Just taking the opportunity to point out the publication of the latest version of DIYABC!).

Florence Forbes’ talk on using mixtures of experts was quite alluring (and generated online discussions during the break, recovering some of the fun of real conferences), esp. given my longtime interest in normalising flows in mixtures of regression (and more to come as part of our biweekly reading group!). Louis talked about gaining efficiency by not resampling the entire data in large network models. Edwin Fong brought martingales and infinite-dimensional distributions to the rescue, generalising Pólya urns! And Justin Alsing discussed the advantages of estimating the likelihood rather than estimating the posterior, which sounds counterintuitive. With a return to mixtures as approximations, using instead normalising flows. With the worth-repeating message that ABC marginalises over nuisance parameters so easily! And a nice perspective on ABayesian decision, which does not occur that often in the ABC literature. Cecilia Viscardi made a link between likelihood estimation and large deviations à la Sanov, the rare event being associated with the larger distances, albeit dependent on a primary choice of the tolerance. Michael Gutmann presented an intriguing optimisation Monte Carlo approach from his AISTATS 2020 paper of last year, the simulated parameter being defined by a fiducial inversion, reweighted by the prior times a Jacobian term, which struck me as a wee bit odd, i.e., using two distributions on θ. And Rito concluded the day by seeking approximate sufficient statistics, constructing exponential families whose components are themselves parameterised as neural networks with neural parameter ω. Leading to an unnormalised model because of the energy function, hence to the use of inference techniques on ω that do not require the normalising constant, like Gutmann & Hyvärinen (2012). And using the (pseudo-)sufficient statistic as ABC summary statistic. Which still requires an exchange MCMC step within ABC.

the new version of abcrf

Posted in pictures, R, Statistics, University life on June 7, 2016 by xi'an

A new version of the R package abcrf was posted on Friday by Jean-Michel Marin, in conjunction with the recent arXival of our paper on point estimation via ABC and random forests. The new R functions supplement the existing ones to implement ABC point estimation (see the sketch after this list):

  1. covRegAbcrf, which predicts the posterior covariance between two response variables, given a new dataset of summaries;
  2. plot.regAbcrf, which provides a variable importance plot;
  3. predict.regAbcrf, which predicts the posterior expectation, median, variance, and quantiles for a given parameter and a new dataset;
  4. regAbcrf, which produces a regression random forest from a reference table, aimed at predicting the posterior expectation, variance, and quantiles of a parameter;
  5. snp, a simulated example in population genetics used as reference table in our Bioinformatics paper.
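As a quick illustration, here is a minimal sketch of the regression workflow with these functions, built on the snp example shipped with the package; the structure of the snp object (its $param and $sumsta components) and the predict() arguments follow the package documentation as I recall it, so treat them as assumptions to check against ?regAbcrf.

    ## minimal sketch of ABC point estimation with abcrf; the snp components
    ## ($param, $sumsta) and argument names are assumptions -- see ?regAbcrf
    library(abcrf)
    data(snp)
    ## pair one parameter with the summary statistics of the reference table
    ## (in practice one would condition on a given model, as in the examples)
    theta <- snp$param[, 1]
    df <- data.frame(theta = theta, snp$sumsta)
    ## grow a regression random forest from the reference table
    rf <- regAbcrf(theta ~ ., data = df, ntree = 500)
    plot(rf)                               # variable importance plot
    obs <- snp$sumsta[1, , drop = FALSE]   # stand-in for observed summaries
    ## posterior expectation, median, variance, quantiles for theta
    predict(rf, obs, training = df)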

Unfortunately, we could not directly produce a diyabc2abcrf function for translating a regular DIYABC output into a proper abcrf format, since the translation has to occur within DIYABC itself. And even this is not a straightforward move (to be corrected in the next version of DIYABC).

ABC model choice via random forests [and no fire]

Posted in Books, pictures, R, Statistics, University life on September 4, 2015 by xi'an

While my arXiv newspage today had a puzzling entry about modelling UFO sightings in France, it also broadcast our revision of Reliable ABC model choice via random forests, a version that we resubmitted today to Bioinformatics after a quite thorough upgrade, the most dramatic one being the realisation that we could also approximate the posterior probability of the selected model via another random forest. (With no connection to the recent post on forest fires!) As discussed a little while ago on the ‘Og. And also in conjunction with our creating the abcrf R package for running ABC model choice out of a reference table. While it has been an excruciatingly slow process (the initial version of the arXived document dates from June 2014, the PNAS submission was rejected for not being Bayesian enough, and the latest revision took the whole summer), the slow maturation of our thoughts on the model choice issues led us to modify the role of random forests in the ABC approach to model choice, in that we reverted our earlier assessment that they could only be trusted for selecting the most likely model, by realising this summer that the corresponding posterior probability could be expressed as a posterior loss and estimated by a secondary forest. As first considered in Stoehr et al. (2014). (In retrospect, this brings an answer to one of the earlier referee’s comments.) Next goal is to incorporate those changes in DIYABC (and wait for the next version of the software to appear). Another best-selling innovation due to Arnaud: we added a practical implementation section in the format of an FAQ for issues related to the calibration of the algorithms.
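For readers curious about what the two-forest idea looks like in practice, below is a minimal sketch with the abcrf package: a classification forest selects the model, and predict() returns an estimate of the posterior probability of the selected model via a secondary regression forest. The snp components and the post.prob output name are assumptions based on my recollection of the package documentation, to be checked against ?abcrf.

    ## sketch: RF model choice + posterior probability of the selected model;
    ## the snp components and post.prob field are assumptions -- see ?abcrf
    library(abcrf)
    data(snp)
    df <- data.frame(modindex = snp$modindex, snp$sumsta)
    cls <- abcrf(modindex ~ ., data = df, ntree = 500)  # model choice forest
    obs <- snp$sumsta[1, , drop = FALSE]   # stand-in for an observed dataset
    pred <- predict(cls, obs, training = df)  # secondary forest run inside
    pred$post.prob                         # estimated P(selected model | data)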

ABC model choice via random forests [expanded]

Posted in Statistics, University life on October 1, 2014 by xi'an

Today, we arXived a second version of our paper on ABC model choice with random forests. Or maybe [A]BC model choice with random forests, since the random forest is built on a simulation from the prior predictive and no further approximation is used in the process. Except for the computation of the posterior [predictive] error rate. The update wrt the earlier version is that we ran massive simulations throughout the summer, on existing and new datasets. In particular, we have included a Human dataset extracted from the 1000 Genomes Project, made of 51,250 SNP loci. While this dataset is not used to test new evolution scenarios, we compared six out-of-Africa scenarios, with a possible admixture for Americans of African ancestry. The scenario selected by the random forest procedure posits a single out-of-Africa colonization event with a secondary split into European and East Asian population lineages, and a recent genetic admixture between African and European lineages for Americans of African origin. The procedure reported a high level of confidence, since the estimated posterior error rate is equal to zero. The SNP loci were carefully selected using the following criteria: (i) all individuals have a genotype characterized by a quality score (GQ)>10, (ii) polymorphism is present in at least one of the individuals, in order to fit the SNP simulation algorithm of Hudson (2002) used in DIYABC V2 (Cornuet et al., 2014), (iii) the minimum distance between two consecutive SNPs is 1 kb, in order to minimize linkage disequilibrium between SNPs, and (iv) SNP loci showing significant deviation from Hardy-Weinberg equilibrium at a 1% threshold in at least one of the four populations have been removed.

In terms of random forests, we optimised the size of the bootstrap subsamples for all of our datasets. While this optimisation requires extra computing time, it is negligible when compared with the enormous time taken by a logistic regression, which is [yet] the standard ABC model choice approach. Now that the data has been gathered, it is only a matter of days before we can send the paper to a journal.
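By way of illustration, here is a minimal sketch of such a subsample-size optimisation, comparing out-of-bag error across candidate sizes rather than reproducing our exact protocol; the sampsize argument and the prior.err field are assumptions based on my recollection of the abcrf interface, so check ?abcrf before relying on them.

    ## illustrative subsample-size tuning by out-of-bag error (not our exact
    ## protocol); sampsize and prior.err are assumptions -- see ?abcrf
    library(abcrf)
    data(snp)
    df <- data.frame(modindex = snp$modindex, snp$sumsta)
    for (frac in c(0.1, 0.25, 0.5, 1)) {
      rf <- abcrf(modindex ~ ., data = df, ntree = 500,
                  sampsize = floor(frac * nrow(df)))
      ## print(rf) reports the out-of-bag prior error rate in any case
      cat("fraction:", frac, " OOB prior error:", rf$prior.err, "\n")
    }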