Archive for R

one-way random walks

Posted in Kids, R, Statistics on May 2, 2021 by xi'an

A rather puzzling riddle from The Riddler on a 3×3 directed grid, each of its 12 edges being oriented at random, and the probability of getting from the North-West to the South-East node by following the arrows. Puzzling because, while the solution could reasonably be computed with R code like

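tot=0
for(i in 1:2^12){ # each of the 2^12 arrow configurations
  sol=0;for(j in 1:12)sol=max(sol,valid(i,paz[[j]])) # valid(): hypothetical 0/1 check that path paz[[j]] follows the arrows of configuration i
  tot=tot+sol} # tot/2^12 is the probability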

where paz is the list of the 12 possible paths from North-West to South-East (excluding loops!), leading to a probability of 1135/2¹², I could not find a logical argument leading to this number. The paths of length 4, 6, and 8 are valid in 2⁸, 2⁶, and 2⁴ of the 2¹² cases, respectively (and logically!), but this does not help since these events are dependent.
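For readers wanting to reproduce the count, here is a minimal R sketch, assuming each of the 12 edges is oriented independently with probability 1/2 (i.e., all 2¹² configurations are counted equally, as in the post). It first builds a list paz of self-avoiding paths by depth-first search, then counts the configurations in which at least one of these paths follows the arrows. Restricting to paths without loops loses nothing, since any directed walk from North-West to South-East contains a loop-free directed path. The helper names (id, nbrs, paths_from, req) are my own hypothetical choices, not the code behind the post.

```r
# 3x3 grid: nodes numbered 1..9 row by row, node 1 = North-West, node 9 = South-East
n <- 3
id <- function(rr, cc) (rr - 1) * n + cc

# the 12 undirected edges, each stored as (smaller index, larger index)
edges <- NULL
for (rr in 1:n) for (cc in 1:n) {
  if (cc < n) edges <- rbind(edges, c(id(rr, cc), id(rr, cc + 1)))  # horizontal edge
  if (rr < n) edges <- rbind(edges, c(id(rr, cc), id(rr + 1, cc)))  # vertical edge
}
nbrs <- function(v) c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])

# depth-first enumeration of the self-avoiding paths from node 1 to node n^2
paths_from <- function(path) {
  v <- path[length(path)]
  if (v == n^2) return(list(path))
  out <- list()
  for (w in setdiff(nbrs(v), path)) out <- c(out, paths_from(c(path, w)))
  out
}
paz <- paths_from(1)

# for each path, the edges it uses and the direction each edge must point
# (bit 0: from smaller to larger index, bit 1: from larger to smaller index)
req <- lapply(paz, function(p) {
  u <- head(p, -1); w <- p[-1]
  e <- sapply(seq_along(u), function(k)
    which(edges[, 1] == min(u[k], w[k]) & edges[, 2] == max(u[k], w[k])))
  list(edge = e, bit = as.integer(u > w))
})

# count the orientations in which at least one path follows the arrows
m <- nrow(edges)
count <- 0
for (i in 0:(2^m - 1)) {
  bits <- (i %/% 2^(0:(m - 1))) %% 2    # bit b of i gives the direction of edge b
  for (q in req) if (all(bits[q$edge] == q$bit)) { count <- count + 1; break }
}

length(paz)                     # number of self-avoiding paths
table(sapply(paz, length) - 1)  # their lengths, in edges
count / 2^m                     # probability of reaching the South-East corner
```

The length tally can be compared with the 2⁸, 2⁶, 2⁴ counts above. Brute force over all 2¹² orientations is cheap here but grows as 2 to the number of edges, so it would not scale to much larger grids.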

how many T-Rex can you fit in your backyard?

Posted in Statistics on April 30, 2021 by xi'an

A fascinating question examined in this issue of Science [as pointed out by Nature!] in a paper by Marshall et al. on how many T. rex roamed the Earth at a given time (in the Cretaceous). The figure is derived from Damuth's Law (population density scaling as body mass to the power −3/4) and relies on estimates of their body mass (8 tons?), the range of their habitat, the longevity of the species (1.2 million years?), and its generation time (18 years?), somewhat surprisingly taking the maximum age (28 years) to be the age of the oldest observed fossil.

“We assessed the impact of uncertainties in the data used with Monte Carlo simulations, but these simulations do not accommodate uncertainties that might stem from the choices made in the design of our approach.”

The resulting global evaluation is an abundance of about 20,000 individuals at any given time, albeit with a 95% confidence interval between 1,300 and 328,000 animals, over around 127,000 generations, for a total number of T. rex that ever lived amounting to 2.5 billion animals. A fun exercise, but I am rather reserved about the validity of the evaluation, given the uncertainty and poor data about most terms in the equation.
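As a quick sanity check on the last figure, and assuming (as the numbers suggest) that the total is simply the standing population multiplied by the number of generations, the two quoted values reproduce the 2.5 billion. The variable names are mine.

```r
# figures quoted above for Marshall et al.'s estimates
standing <- 20000        # T. rex alive at any given time
generations <- 127000    # number of generations the species lasted
total <- standing * generations
total                    # 2.54e9, i.e. about 2.5 billion individuals
```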

a common confusion between sample and population moments

Posted in Books, Kids, R, Statistics on April 29, 2021 by xi'an

ten computer codes that transformed science

Posted in Books, Linux, R, Statistics, University life on April 23, 2021 by xi'an

In a “Feature” article of 21 January 2021, Nature goes over a poll on “software tools that have had a big impact on the world of science”. Among those,

the Fortran compiler (1957), developed by IBM, Fortran being one of the first symbolic languages. This is the first computer language I learned (in 1982) and one of the two (with SAS) I ever coded on punch cards for the massive computers of INSEE. I quickly and enthusiastically switched to Pascal (and the Apple IIe) the year after, and despite an attempt at moving to C, I alas kept the Pascal programming style in my subsequent C codes (until I gave up in the early 2000s!), moving to R full time, even though I had been using Splus since a Unix version was produced. Interestingly, a later survey of Nature readers put R at the top of the list of what should have been included!, incidentally adding Monte Carlo algorithms to the list (and I did not vote in that poll!),

the fast Fourier transform (1965), co-introduced by John Tukey, but which I have never used (or at least not knowingly!),

arXiv (1991), which was started as an emailed preprint list by Paul Ginsparg at Los Alamos and got its current name by 1998, and where I only started publishing (or arXiving) in 2007, perhaps because it then sounded difficult to submit a preprint there, perhaps because a worldwide preprint server sounded more like a bother (esp. since we then had to publish our preprints on the local servers) than a revolution, perhaps because of a vague worry about being overtaken by others… Anyway, I now see arXiv as the primary outlet for publishing papers, with the possible added features of arXiv-backed journals and Peer Community validations,

the IPython Notebook (2011), by Fernando Pérez, which started as 259 lines of Python code and turned into Jupyter in 2014. I know nothing about it, but I can relate to the relevance of the project when thinking of Rmarkdown, which I increasingly find a great way to work on collaborative projects, to teach, and to produce reproducible research. (I do remember once writing a paper in Sweave, but not which one…!)

the new DIYABC-RF

Posted in Books, pictures, R, Statistics, Wines on April 15, 2021 by xi'an

My friends and co-authors from Montpellier released last month the third version of the DIYABC software, DIYABC-RF, which includes and promotes the use of random forests for parameter inference and model selection, in connection with Louis Raynal's thesis. Like the earlier versions of DIYABC, it is intended for population genetics applications. Welcome!!!

The software DIYABC Random Forest (hereafter DIYABC-RF) v1.0 is composed of three parts: the dataset simulator, the Random Forest inference engine and the graphical user interface. The whole is packaged as a standalone and user-friendly graphical application named DIYABC-RF GUI, available on the project website. The different developer and user manuals for each component of the software are available on the same website. DIYABC-RF is multithreaded software running on three operating systems: GNU/Linux, Microsoft Windows and MacOS. The program can be used through a modern and user-friendly graphical interface designed as an R shiny application (Chang et al. 2019). For a fluid and simplified user experience, this interface is available through a standalone application, which does not require installing R or any dependencies and hence can be used independently. The application is also implemented in an R package providing a standard shiny web application (with the same graphical interface) that can be run locally as any shiny application, or hosted as a web service to provide a DIYABC-RF server for multiple users.
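The simulate-then-compare logic behind DIYABC-RF can be illustrated on a toy problem. In the software, a dataset simulator builds a reference table of (parameter, summary statistics) pairs and a random forest is trained on that table. The R sketch below builds such a reference table for a hypothetical normal-mean model with a uniform prior. It then swaps in plain rejection ABC for the random forest step, so it is not what DIYABC-RF does internally. A forest-based version would be trained on the same kind of table, for instance with the abcrf R package.

```r
# Hypothetical toy model, not a population genetics one:
# x_1, ..., x_n iid N(theta, 1) with a U(-5, 5) prior on theta
set.seed(1)
n <- 50
x_obs <- rnorm(n, mean = 1)   # pseudo-observed data, simulated with theta = 1
s_obs <- mean(x_obs)          # summary statistic of the observed data

# reference table: parameters drawn from the prior and summaries of data simulated under them
N <- 1e4
theta <- runif(N, -5, 5)
sims <- matrix(rnorm(N * n, mean = theta), nrow = N)   # row k is a dataset simulated under theta[k]
ref <- data.frame(theta = theta, s = rowMeans(sims))

# rejection step (standing in for the random forest): keep the 1% closest summaries
d <- abs(ref$s - s_obs)
keep <- ref$theta[d <= quantile(d, 0.01)]
mean(keep)                    # ABC approximation of the posterior mean of theta
```

Rejection keeps only a small fraction of the reference table and depends on a tolerance and a distance. The random forest approach instead uses the whole table and is far less sensitive to the choice of summary statistics.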