Archive for CRAN

new version of abcrf

Posted in R, Statistics, University life with tags , , , , , , on February 12, 2016 by xi'an
fig-tree near Brisbane, Australia, Aug. 18, 2012Version 1.1 of our R library abcrf version 1.1  is now available on CRAN.  Improvements against the earlier version are numerous and substantial. In particular,  calculations of the random forests have been parallelised and, for machines with multiple cores, the computing gain can be enormous. (The package does along with the random forest model choice paper published in Bioinformatics.)

abcfr 0.9-3

Posted in R, Statistics, University life with tags , , , , , , , , on August 27, 2015 by xi'an

garden tree, Jan. 12, 2012In conjunction with our reliable ABC model choice via random forest paper, about to be resubmitted to Bioinformatics, we have contributed an R package called abcrf that produces a most likely model and its posterior probability out of an ABC reference table. In conjunction with the realisation that we could devise an approximation to the (ABC) posterior probability using a secondary random forest. “We” meaning Jean-Michel Marin and Pierre Pudlo, as I only acted as a beta tester!

abcrfThe package abcrf consists of three functions:

  • abcrf, which constructs a random forest from a reference table and returns an object of class `abc-rf’;
  • plot.abcrf, which gives both variable importance plot of a model choice abc-rf object and the projection of the reference table on the LDA axes;
  • predict.abcrf, which predict the model for new data and evaluate the posterior probability of the MAP.

An illustration from the manual:

data(snp)
data(snp.obs)
mc.rf <- abcrf(snp[1:1e3, 1], snp[1:1e3, -1])
predict(mc.rf, snp[1:1e3, -1], snp.obs)

making a random walk geometrically ergodic

Posted in R, Statistics with tags , , , , , , , , , on March 2, 2013 by xi'an

While a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, AoS, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent Annals of Statistics paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic random walk Metropolis-Hastings algorithm by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc,  started thanks to the ‘Og and still under completion.

packed off!!!

Posted in Books, pictures, R, Statistics with tags , , , , , , , , , , on February 9, 2013 by xi'an

La Défense, Paris, Feb. 04, 2013Deliverance!!! We have at last completed our book! Bayesian Essentials with R is off my desk! In a final nitty-gritty day of compiling and recompiling the R package bayess and the LaTeX file, we have reached versions that were in par with our expectations. The package has been submitted to CRAN (it has gone back and forth a few times, with requests to lower the computing time in the examples: each example should take less than 10s, then 5s…), then accepted by CRAN, incl. a Windows version, and the book has be sent to Springer-Verlag. This truly is a deliverance for me as this book project has been on my work horizon almost constantly for more than the past two years, led to exciting times in Luminy, Carnon and Berlin, has taken an heavy toll on my collaborations and research activities, and was slowly turning into a unsavoury chore! I am thus delighted Jean-Michel and I managed to close the door before any disastrous consequence on either the book or our friendship could develop. Bayesian Essentials with R is certainly an improvement compared with Bayesian Core, primarily by providing a direct access to the R code. We dearly hope it will attract a wider readership by reducing the mathematical requirements (even though some parts are still too involved for most undergraduates) and we will keep testing it with our own students in Montpellier and Paris over the coming months. In the meanwhile, I just enjoy this feeling of renewed freedom!!!

structure and uncertainty, Bristol, Sept. 26

Posted in Books, pictures, R, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , on September 27, 2012 by xi'an

Another day full of interesting and challenging—in the sense they generated new questions for me—talks at the SuSTain workshop. After another (dry and fast) run around the Downs; Leo Held started the talks with one of my favourite topics, namely the theory of g-priors in generalized linear models. He did bring a new perspective on the subject, introducing the notion of a testing Bayes factor based on the residual statistic produced by a classical (maximum likelihood) analysis, connected with earlier works of Vale Johnson. While I did not truly get the motivation for switching from the original data to this less informative quantity, I find this perspective opening new questions for dealing with settings where the true data is replaced with one or several classical statistics. With possible strong connections to ABC, of course. Incidentally, Leo managed to produce a napkin with Peter Green’s intro to MCMC dating back from their first meeting in 1994: a feat I certainly could not reproduce (as I also met both Peter and Leo for the first time in 1994, at CIRM)… Then Richard Everit presented his recent JCGS paper on Bayesian inference on latent Markov random fields, centred on the issue that simulating the latent MRF involves an MCMC step that is not exact (as in our earlier ABC paper for Ising models with Aude Grelaud). I already discussed this paper in an earlier blog and the only additional question that comes to my mind is whether or not a comparison with the auxiliary variable approach of Møller et al. (2006) would make sense.

In the intermission, I had a great conversation with Oliver Ratman on his talk of yesterday on the surprising feature that some models produce as “data” some sample from a pseudo-posterior.. Opening once again new vistas! The following talks were more on the mathematical side, with James Cussens focussing on the use of integer programming for Bayesian variable selections, then Éric Moulines presenting a recent work with a PhD student of his on PAC-Bayesian bounds and the superiority of combining experts. Including a CRAN package. Éric concluded his talk with the funny occurence of Peter’s photograph on Éric’s Microsoft Research Profile own page, due to Éric posting our joint photograph at the top of Pic du Midi d’Ossau in 2005… (He concluded with a picture of the mountain that was the exact symmetry of mine yesterday!)

The afternoon was equally superb with Gareth Roberts covering fifteen years of scaling MCMC algorithms, from the mythical 0.234 figure to the optimal temperature decrease in simulated annealing, John Kent playing the outlier with an EM algorithm—however including a formal prior distribution and raising the challenge as to why Bayesians never had to constrain the posterior expectation, which prompted me to infer that (a) the prior distribution should include all constraints and (b) the posterior expectation was not the “right” tool in non-convex parameters spaces—. Natalia Bochkina presented a recent work, joint with Peter Green, on connecting image analysis with Bayesian asymptotics, reminding me of my early attempts at reading Ibragimov and Has’minskii in the 1990’s. Then a second work with Vladimir Spoikoini on Bayesian asymptotics with misspecified models, introducing a new notion of effective dimension. The last talk of the day was by Nils Hjort about his coming book on “Credibility, confidence and likelihood“—not yet advertised by CUP—which sounds like an attempt at resuscitating Fisher by deriving distributions in the parameter space from frequentist confidence intervals. I already discussed this notion in an earlier blog, so I am fairly skeptical about it, but the talk was representative of Nils’ highly entertaining and though-provoking style! Esp. as he sprinkled the talk with examples where MLE (and some default Bayes estimators) did not work. And reanalysed one of Chris Sims‘ example presented during his Nobel Prize talk…