**M**y colleague from the Université d’Orléans, Didier Chauveau, has just published on CRAN a new R package called EntropyMCMC, which contains convergence assessment tools for MCMC algorithms, based on non-parametric estimates of the Kullback-Leibler divergence between current distribution and target. (A while ago, quite a while ago!, we actually collaborated with a few others on the Springer-Verlag Lecture Note #135 Discretization and MCMC convergence assessments.) This follows from a series of papers by Didier Chauveau and Pierre Vandekerkhove that started with a nearest neighbour entropy estimate. The evaluation of this entropy is based on N iid (parallel) chains, which involves a parallel implementation. While the missing normalising constant is overwhelmingly unknown, the authors this is not a major issue “since we are mostly interested in the stabilization” of the entropy distance. Or in the comparison of two MCMC algorithms. *[Disclaimer: I have not experimented with the package so far, hence cannot vouch for its performances over large dimensions or problematic targets, but would as usual welcome comments and feedback on readers’ experiences.]*

## Archive for CRAN

## EntropyMCMC [R package]

Posted in Statistics with tags convergence assessment, CRAN, discretization, entropy, EntropyMCMC, Lecture Notes in Statistics, MCMC, MCMC convergence, Monte Carlo Statistical Methods, R package, Springer-Verlag, Université d'Orléans, untractable normalizing constant on March 26, 2019 by xi'an## R package truncnorm

Posted in Statistics with tags accept-reject algorithm, CRAN, John Geweke, R, truncated normal, truncnorm on November 8, 2017 by xi'an**T**his week in Warwick, thanks to a (rather incomprehensible) X validated question, I came across the CRAN R package truncnorm, which provides the “density, distribution function, quantile function, random generation and expected value function for the truncated normal distribution”. The short description of the sampler states that the method follows the accept-reject solution of John Geweke (1991), which I reproduced [independently!] a few years later. I may have missed the right code, but checking on the Github depository associated with this package, I did not find in the C code a trace of our optimal solution via a translated exponential proposal, since the exponential proosal, when used, relies on a scale equal to the left truncation point, *a* in the above picture. Obviously, this does not make a major difference in the execution time (and the algorithm is still correct!).

## Extending R

Posted in Books, Kids, R, Statistics with tags Bell Labs, book review, C, CRAN, extending R, Fortran, John Chambers, laptop, Linux, Luke Tierney, object-oriented programming, packages, Pascal, R, Ross Ihaka, S, S-plus, unix on July 13, 2016 by xi'an**A**s I was previously unaware of this book coming up, my surprise and excitement were both extreme when I received it from CRC Press a few weeks ago! John Chambers, one of the fathers of S, precursor of R, had just published a book about extending R. It covers some reflections of the author on programming and the story of R (Parts 2 and 1), and then focus on object-oriented programming (Part 3) and the interfaces from R to other languages (Part 4). While this is “only” a programming book, and thus not strictly appealing to statisticians, reading one of the original actors’ thoughts on the past, present, and future of R is simply fantastic!!! And John Chambers is definitely not calling to simply start over and build something better, as Ross Ihaka did in this [most read] post a few years ago. (It is also great to see the names of friends appearing at times, like Julie, Luke, and Duncan!)

“I wrote most of the original software for S3 methods, which were useful for their application, in the early 1990s.”

In the (hi)story part, Chambers delves into the details of the evolution of S at Bells Labs, as described in his [first] “blue book” (which I kept on my shelf until very recently, next to the “white book“!) and of the occurrence of R in the mid-1990s. I find those sections fascinating maybe the more because I am somewhat of a contemporary, having first learned Fortran (and Pascal) in the mid-1980’s, before moving in the early 1990s to C (that I mostly coded as translated Pascal!), S-plus and eventually R, in conjunction with a (forced) migration from Unix to Linux, as my local computer managers abandoned Unix and mainframe in favour of some virtual Windows machines. And as I started running R on laptops with the help of friends more skilled than I (again keeping some of the early R manuals on my shelf until recently). Maybe one of the most surprising things about those reminiscences is that the very first version of R was dated Feb 29, 2000! Not because of Feb 29, 2000 (which, as Chambers points out, is the first use of the third-order correction to the Gregorian calendar, although I would have thought 1600 was the first one), but because I would have thought it appeared earlier, in conjunction with my first Linux laptop, but this memory is alas getting too vague!

As indicated above, the book is mostly about programming, which means in my case that some sections are definitely beyond my reach! For instance, reading “*the onus is on the person writing the calling function to avoid using a reference object as the argument to an existing function that expects a named list*” is not immediately clear… Nonetheless, most sections are readable [at my level] and enlightening about the mottoes “*everything that exists is an object*” and “*everything that happens is a function*” repeated throughout. (And about my psycho-rigid ways of translating Pascal into every other language!) I obviously learned about new commands and notions, like the difference between

```
````x <- 3`

and

```
````x <<- 3`

(but I was disappointed to learn that the number of <‘s was not related with the depth or height of the allocation!) In particular, I found the part about replacement fascinating, explaining how a command like

```
````diag(x)[i] = 3`

could modify x directly. (While definitely worth reading, the chapter on R packages could have benefited from more details. But as Chambers points out there are whole books about this.) Overall, I am afraid the book will not improve *my* (limited) way of programming in R but I definitely recommend it to anyone even moderately skilled in the language.

## the new version of abcrf

Posted in pictures, R, Statistics, University life with tags ABC, CRAN, DIYABC, population genetics, R, random forests, regression random forest on June 7, 2016 by xi'an**A** new version of the R package abcrf has been posted on Friday by Jean-Michel Marin, in conjunction with the recent arXival of our paper on point estimation via ABC and random forests. The new R functions come to supplement the existing ones towards implementing ABC point estimation:

*covRegAbcrf*, which predicts the posterior covariance between those two response variables, given a new dataset of summaries.*plot.regAbcrf*, which provides a variable importance plot;*predict.regabcrf*, which predicts the posterior expectation, median, variance, quantiles for a given parameter and a new dataset;*regAbcrf*, which produces a regression random forest from a reference table aimed out predicting posterior expectation, variance and quantiles for a parameter;*snp*, a simulated example in population genetics used as reference table in our Bioinformatics paper.

Unfortunately, we could not produce directly a *diyabc2abcrf* function for translating a regular DIYABC output into a proper abcrf format, since the translation has to occur in DIYABC instead. And even this is not a straightforward move (to be corrected in the next version of DIYABC).

## new version of abcrf

Posted in R, Statistics, University life with tags ABC model choice, bioinformatics, CRAN, parallelisation, R, R package, random forests on February 12, 2016 by xi'an**V**ersion 1.1 of our R library abcrf version 1.1 is now available on CRAN. Improvements against the earlier version are numerous and substantial. In particular, calculations of the random forests have been parallelised and, for machines with multiple cores, the computing gain can be enormous. (The package does along with the random forest model choice paper published in Bioinformatics.)

## abcfr 0.9-3

Posted in R, Statistics, University life with tags ABC, ABC model choice, abcrf, bioinformatics, CRAN, R, random forests, reference table, SNPs on August 27, 2015 by xi'an**I**n conjunction with our reliable ABC model choice via random forest paper, about to be resubmitted to *Bioinformatics*, we have contributed an R package called abcrf that produces a most likely model and its posterior probability out of an ABC reference table. In conjunction with the realisation that we could devise an approximation to the (ABC) posterior probability using a secondary random forest. “We” meaning Jean-Michel Marin and Pierre Pudlo, as I only acted as a beta tester!

The package abcrf consists of three functions:

*abcrf*, which constructs a random forest from a reference table and returns an object of class `abc-rf’;*plot.abcrf*, which gives both variable importance plot of a model choice abc-rf object and the projection of the reference table on the LDA axes;*predict.abcrf*, which predict the model for new data and evaluate the posterior probability of the MAP.

An illustration from the manual:

data(snp) data(snp.obs) mc.rf <- abcrf(snp[1:1e3, 1], snp[1:1e3, -1]) predict(mc.rf, snp[1:1e3, -1], snp.obs)

## making a random walk geometrically ergodic

Posted in R, Statistics with tags Annals of Statistics, CRAN, geometric ergodicity, métro, MCMC, Metropolis-Hastings, R, R package, random walk, uniform ergodicity on March 2, 2013 by xi'an**W**hile a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, *AoS*, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent *Annals of Statistics* paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic random walk Metropolis-Hastings algorithm by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc, started thanks to the ‘Og and still under completion.