Archive for Use R

wrong algebra for slice sampler

Posted in Books, Kids, R, Statistics on January 27, 2021 by xi'an

Once more, and thrice alas!, I became aware of a typo in our “Use R!” book through a question on X validated from a reader unable to reproduce the slice of a basic 2D slice sampler for a logistic regression with coefficients (a,b). Indeed, our slice reads as the incorrect set (missing the i=1,…,n)

\left\{ (a,b): y_i(a+bx_i) > \log \frac{u_i}{1-u_i} \right\}

when it should have been

\bigcap_{i=1}^n \left\{ (a,b)\,:\ (-1)^{y_i}(a+bx_i) > \log\frac{u_i}{1-u_i} \right\}

which is the version I found in my LaTeX file. So I do not know what happened (unless I corrected the LaTeX file at a later date and cannot remember it, but the latest change on the file reads October 2011…). Fortunately, the resulting slices in a and b and the following R code remain correct. Unfortunately, both French and Japanese translations reproduce the mistake…
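As a minimal sketch of what the intersection means in practice, the membership test below transcribes the second displayed set at face value, including its sign convention. The function name `in_slice` and the toy data are assumptions made for illustration; this is not the code from the book.

```r
# (a, b) belongs to the slice only if the inequality holds for EVERY
# observation i = 1, ..., n, hence the all() over the n comparisons.
in_slice <- function(a, b, x, y, u) {
  all((-1)^y * (a + b * x) > log(u / (1 - u)))
}

# Toy data, purely illustrative (not taken from the book)
x <- c(-1, 0.5, 2)
y <- c(1, 0, 0)
u <- c(0.2, 0.3, 0.1)

in_slice(0, 0, x, y, u)   # every constraint is satisfied at (0, 0)
in_slice(5, 0, x, y, u)   # the constraint for i = 1 fails at (5, 0)
```

Dropping the `all()` and checking a single observation would correspond to the set without the intersection, which is the typo discussed above.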

Bayesian Core and loose logs

Posted in Books, R, Statistics, University life on July 26, 2011 by xi'an

Jean-Michel (aka Jean-Claude!) Marin came for a few days so that we could make late progress on the revision of our book Bayesian Core towards a Use R! version. In one of the R programs in the mixture chapter, we were getting improbable answers, until we found an R mistake in the shape of

 > sum(c(1,2,3,log=TRUE))
 [1] 7
 > sum(c(1,2,3),log=TRUE)
 [1] 7
 

which the R interpreter did not flag… There are surely plenty of good reasons for this to happen and it did not take long to fix the bug, still… annoying!
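The reason both calls return 7 is that `sum` has the signature `sum(..., na.rm = FALSE)`: it has no `log` argument, so `log=TRUE` is absorbed into `...` and added to the total as the value 1. A short sketch of the mechanism follows. The `dnorm` example is only an assumption about the kind of misplaced parenthesis involved, not the actual code from the mixture chapter.

```r
x <- c(1, 2, 3)

# TRUE is coerced to 1 and summed along with the elements of x
sum(x, log = TRUE)    # 7, i.e. 1 + 2 + 3 + 1

# Inside c(), log = TRUE simply becomes a named element equal to 1
c(x, log = TRUE)

# Hypothetical log-likelihood with a misplaced parenthesis
set.seed(1)
obs <- rnorm(10)
loglik_ok  <- sum(dnorm(obs, log = TRUE))   # sum of log-densities
loglik_bad <- sum(dnorm(obs), log = TRUE)   # sum of densities, plus 1
```

Since the stray argument is a legal value to sum, R raises neither an error nor a warning, and only the improbable output gives the mistake away.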

Méthodes de Monte-Carlo avec R

Posted in Books, Kids, R, Statistics, University life on December 3, 2010 by xi'an

The translation of the book Introducing Monte Carlo Methods with R is close to being completed. The copy-editing and page-setting are done, I have received the cover proposal and am happy with it, so it should now go to production and be ready by early January (earlier than the tentative end of February indicated on amazon), maybe in time for my R class students to get it before the exam. Thanks to the efforts of Pierre-André Cornillon and Eric Matzner (from the Université de Haute-Bretagne in Rennes), the move from the Use R! series format to the Pratique R series format was done seamlessly and effortlessly for me. (Again, thanks to the translators who did produce their translations in sometimes less than a month!) I am curious to see how much of a market there is for the French translation… The Japanese translation is scheduled for August 2011 at the very earliest, but I am obviously not involved at all in this translation!

Julien on R shortcomings

Posted in Books, R, Statistics, University life on September 8, 2010 by xi'an

Julien Cornebise posted a rather detailed set of comments (from Jasper!) that I thought was interesting and thought-provoking enough (!) to promote to a guest post. Here it is, then, to keep the debate rolling (with my only censoring being the removal of smileys!). (Please keep in mind that I do not endorse everything stated in this guest post! Especially the point on “Use R!”)

On C vs R
As a reply to Duncan: indeed C (at least for the bottlenecks) will probably always be faster for the final, mainstream use of an algorithm [e.g. as a distributed R library, or a standalone program]. Machine-level, smart compilers, etc etc. The same goes for Matlab, and even for Python: e.g. Pierre Jacob (Xian’s great PhD student) uses Weave to inline C in his Python code for the bottlenecks — simple, and fast. Some hedge funds even hire coders to recode the Matlab code of their consulting academic statisticians.

Point taken. But, as Radford Neal points out, that doesn’t justify R being much slower than it could be:

  • When statisticians (cf Xian) want to develop/prototype new algorithms and methods while focussing on the math/stat/algo more than on the language-dependent implementation, it is still a shame to waste 50% (or even 25%) of the computing time. Same goes for the memory management, or even for some language features [1].
  • Even less computer-savvy users of R for real-case data, willing to use existing algorithms (not developing new algos) but on big/intricate datasets can be put off by slow speed — or even by memory failures.
  • And the library is BRILLIANT.
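To make the speed argument of this section a little more concrete, here is a rough sketch comparing an explicit interpreted loop with vectorised builtins, whose inner loops run in compiled C. The function names are hypothetical, and the timings depend on the machine and the R version, so none are quoted.

```r
# Hypothetical helper: sum of squares with an explicit interpreted loop
sumsq_loop <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi^2
  total
}

# Same quantity through vectorised builtins implemented in C
sumsq_vec <- function(x) sum(x^2)

x <- runif(1e6)
t_loop <- system.time(r_loop <- sumsq_loop(x))
t_vec  <- system.time(r_vec  <- sumsq_vec(x))

all.equal(r_loop, r_vec)   # the two versions agree up to rounding
c(loop = t_loop[["elapsed"]], vectorised = t_vec[["elapsed"]])
```

The gap between the two elapsed times is one crude measure of the interpreter overhead that a faster core would reduce.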

On Future Language vs R
Thanks David and Martyn for the link to Ihaka’s great paper on an R-like, Lisp-based language. It says things better than I could, and with an expertise on R that I lack. I also didn’t know about Robert Gentleman and his success at Harvard (but he *invented* the thing, not merely tuned it up).

Developing a whole new language and concept, as advocated in Ihaka’s paper and as suggested by gappy3000, would be a great leap forward, and a needed breakthrough to change the way we use computational stats. I would *love* to see that, as I personally think (as Ihaka advocates in the paper you link to) that R, as a language, is a hell of a pain [2], and I am saddened to see a lot of “Use R” books that will entrench its inadequate use for needs where the language hardly fits the bill, although the library does.

But R is here and in everyday use, and the matter is more one of making it worth using, to its full potential. I have no special attachment to R, but any breakthrough language that would not be entirely compatible with the massive library contributed over the years would be doomed to fail to pick up the everyday statistician, and we’re talking here about far-fetched long-term moves. A salutary breakthrough, but harder to make happen when such an anchor is here.
I would say that R has turned into the Fortran of statistics: here to stay, anchored by the inertia that stems from its intrinsic (and widely acknowledged) merits. (I’ve been nice, I didn’t say Cobol.)

So until the great leap forward comes (or until we make it happen as a community), I second Radford Neal’s call for optimization of the existing core of R.

Rejoinder
As a rejoinder to the comments here, I think we need to consider separately

  1. R’s brilliant library
  2. R’s not-so-brilliant language and/or interpreter.

It seems to me from this topic that the community needs to/should push for the following, in chronological order:

  1. First, a speed-up of R’s existing interpreter as called for by Radford Neal. An “easy” and short-term task, for willing amateur coders or, better, for solid CS people.
  2. Team up with CS experts interested in developing computational stat-related tools.
  3. With them, get out of the now dead-ended R language and embark on a new stat framework based on an *existing*, proven language. It *must* be able to reuse the brilliant R library/codes brought up by the community. Failing to do so would fail to pick up the userbase = die in limbo. That’s more or less what is called for by Ihaka (except for his doubts on the backward compatibility, see Section 7 of his paper). Much harder and longer term, but worth it.

From then on
Who knows the R community enough to relay this call, and make it happen? I’m out of my league.

Uninteresting footnotes:
[1] I have twitched several times when trying R, feeling the coding was somewhat unnatural from a CS point of view. [Mind, I twitch all the same, although on other points, with Matlab]
[2] again, I speak only out of the few tries I gave it, as I gave up using it for my everyday work, I am biased — and ignorant

Research in pair next summer

Posted in Books, Mountains, R, Statistics, Travel, University life on April 30, 2010 by xi'an

Today I received the very good news that our proposal with Jean-Michel Marin to undertake “research in pair” in CIRM, Luminy, for a fortnight next summer was accepted! This research centre in Mathematics is a southern and French version of the renowned German centre of Oberwolfach and, while I would have preferred the cool Black Forest to the burning rocks of the nearby calanques, I am very grateful for this support from the sponsors of the CIRM centre. We aim at revising the book Bayesian Core towards a Use R! version during this fortnight (if the heat does not kill our legendary productivity!). The CIRM centre is located in a nicely renovated bastide within a small park, and the famous climbing cliffs of the calanques are within walking distance. (I just need to find a climbing partner!) I have organised several meetings there over the years and the atmosphere there is always propitious for research. (There is also a well-provided library, if not comparable to Oberwolfach.)