Archive for R package

testing R code [book review]

Posted in R, Statistics, Travel with tags , , , , , , on March 1, 2017 by xi'an

When I saw this title among the CRC Press novelties, I immediately ordered it as I though it fairly exciting. Now that I have gone through the book, the excitement has died. Maybe faster than need be as I read it while being stuck in a soulless Schipol airport and missing the only ice-climbing opportunity of the year!

Testing R Code was written by Richard Cotton and is quite short: once you take out the appendices and the answers to the exercises, it is about 130 pages long, with a significant proportion of code and output. And it is about some functions developed by Hadley Wickham from RStudio, for testing the coherence of R code in terms of inputs more than outputs. The functions are assertive and testthat. Intended for run-time versus development-time testing. Meaning that the output versus the input are what the author of the code intends them to be. The other chapters contain advices and heuristics about writing maintainable testable code, and incorporating a testing feature in an R package.

While I am definitely a poorly qualified reader for this type of R books, my disappointment stems from my expectation of a book about debugging R code, which is possibly due to a misunderstanding of the term testing. This is an unrealistic expectation, for sure, as testing for a code to produce what it is supposed to do requires some advanced knowledge of what the output should be, at least in some representative situations. Which means using interface like RStudio is capital in spotting unsavoury behaviours of some variables, if not foolproof in any case.

Le Monde puzzle [#967]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , on September 30, 2016 by xi'an

A Sudoku-like Le Monde mathematical puzzle for a come-back (now that it competes with The Riddler!):

Does there exist a 3×3 grid with different and positive integer entries such that the sum of rows, columns, and both diagonals is a prime number? If there exist such grids, find the grid with the minimal sum?

I first downloaded the R package primes. Then I checked if by any chance a small bound on the entries was sufficient:


Running the blind experiment

for (t in 1:1e6){
  if (cale(matrix(sample(n,9),3))) print(n)}

I got 10 as the minimal value of n. Trying with n=9 did not give any positive case. Running another blind experiment checking for the minimal sum led to the result

> A
 [,1] [,2] [,3]
[1,] 8 3 6
[2,] 1 5 7
[3,] 2 11 4

with sum 47.

new version of abcrf

Posted in R, Statistics, University life with tags , , , , , , on February 12, 2016 by xi'an
fig-tree near Brisbane, Australia, Aug. 18, 2012Version 1.1 of our R library abcrf version 1.1  is now available on CRAN.  Improvements against the earlier version are numerous and substantial. In particular,  calculations of the random forests have been parallelised and, for machines with multiple cores, the computing gain can be enormous. (The package does along with the random forest model choice paper published in Bioinformatics.)

Foundations of Statistical Algorithms [book review]

Posted in Books, Linux, R, Statistics, University life with tags , , , , , , , , , , , , , on February 28, 2014 by xi'an

There is computational statistics and there is statistical computing. And then there is statistical algorithmic. Not the same thing, by far. This 2014 book by Weihs, Mersman and Ligges, from TU Dortmund, the later being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by modern statistical analysis. In short, it provides the necessary skills to construct statistical algorithms and hence to contribute to statistical computing. And I wish I had the luxury to teach from Foundations of Statistical Algorithms to my graduate students, if only we could afford an extra yearly course…

“Our aim is to enable the reader (…) to quickly understand the main ideas of modern numerical algorithms [rather] than having to memorize the current, and soon to be outdated, set of popular algorithms from computational statistics.”(p.1)

The book is built around the above aim, first presenting the reasons why computers can produce answers different from what we want, using least squares as a mean to check for (in)stability, then second establishing the ground forFishman Monte Carlo methods by discussing (pseudo-)random generation, including MCMC algorithms, before moving in third to bootstrap and resampling techniques, and  concluding with parallelisation and scalability. The text is highly structured, with frequent summaries, a division of chapters all the way down to sub-sub-sub-sections, an R implementation section in each chapter, and a few exercises. Continue reading

making a random walk geometrically ergodic

Posted in R, Statistics with tags , , , , , , , , , on March 2, 2013 by xi'an

While a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, AoS, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent Annals of Statistics paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic random walk Metropolis-Hastings algorithm by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc,  started thanks to the ‘Og and still under completion.

Arrogance sampling

Posted in R, Statistics with tags , , , , , , , , on January 8, 2011 by xi'an

A new posting on arXiv by Benedict Escoto on a simulation method for approximating normalising constants (i.e. evidence) with an eye-catching name! Here is the abstract

This paper describes a method for estimating the marginal likelihood or Bayes factors of Bayesian models using non-parametric importance sampling (“arrogance sampling”). This method can also be used to compute the normalizing constant of probability distributions. Because the required inputs are samples from the distribution to be normalized and the scaled density at those samples, this method may be a convenient replacement for the harmonic mean estimator. The method has been implemented in the open source R package margLikArrogance. Continue reading

Difficulty with mcsm?

Posted in Books, R, Statistics with tags , , , on May 5, 2010 by xi'an

An email from Keith I got this morning:

Professor Robert,
I have loaded the mcsm package to windows.
The following messages appear in the R console:

trying URL ''
Content type 'application/zip' length 193590 bytes (189 Kb)
opened URL
downloaded 189 Kb
package 'mcsm' successfully unpacked and MD5 sums checked

But when I use the demo command as on page 37 from Introducing Monte Carlo Methods with R, I get:

demo (Chapter.1)
Error in demo (Chapter.1): No demo found for topic 'Chapter.1'

How do I fix this?

I think the fix is in loading the package by library(mcsm) [each time one needs it] after installing it [only once]…