Archive for Use R

Bayesian Core and loose logs

Posted in Books, R, Statistics, University life on July 26, 2011 by xi'an

Jean-Michel (aka Jean-Claude!) Marin came for a few days so that we could make late progress on the revision of our book Bayesian Core towards a Use R! version. In one of the R programs in the mixture chapter, we were getting improbable answers, until we found an R mistake in the shape of

 > sum(c(1,2,3,log=TRUE))
 [1] 7
 > sum(c(1,2,3),log=TRUE)
 [1] 7

which was not flagged by the interpreter… There are surely plenty of good reasons for this to happen and it did not take long to fix the bug, still… annoying!
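For the record, a minimal sketch of why R accepts both calls: sum() is declared as sum(..., na.rm = FALSE), so any other named argument falls into the dots and is summed like the rest, with TRUE coerced to 1. The dnorm() part is only an assumption about the kind of call that was intended (a log-density whose log=TRUE flag slipped outside the closing parenthesis), and the vector y is made up for illustration, not taken from the book.

```r
x <- c(1, 2, 3)

# Inside c(), log=TRUE becomes a fourth element named "log" with value 1
v <- c(1, 2, 3, log = TRUE)
s1 <- sum(v)

# Outside c(), log=TRUE is absorbed by the ... of sum() and added as 1
s2 <- sum(x, log = TRUE)

# Hypothetical intended call (an assumption): log=TRUE belongs to the density
y <- c(0.1, -0.4, 1.2)                 # illustrative data only
wrong <- sum(dnorm(y), log = TRUE)     # sum of densities, plus 1
right <- sum(dnorm(y, log = TRUE))     # log-likelihood of y under N(0,1)
```

Here s1 and s2 both equal 7, as in the session above, while wrong and right differ, which is the sort of silent discrepancy that produces improbable answers further down the program.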

Méthodes de Monte-Carlo avec R

Posted in Books, Kids, R, Statistics, University life on December 3, 2010 by xi'an

The translation of the book Introducing Monte Carlo Methods with R is close to being completed. The copy-editing and page-setting are done, I have received the cover proposal and am happy with it, so it should now go to production and be ready by early January (earlier than the tentative end of February indicated on amazon), maybe in time for my R class students to get it before the exam. Thanks to the efforts of Pierre-André Cornillon and Eric Matzner (from the Université de Haute-Bretagne in Rennes), the move from the Use R! series format to the Pratique R series format was done seamlessly and effortlessly for me. (Again, thanks to the translators who did produce their translations in sometimes less than a month!) I am curious to see how much of a market there is for the French translation… The Japanese translation is scheduled for August 2011 at the earliest, but I am obviously not involved at all in this translation!

Julien on R shortcomings

Posted in Books, R, Statistics, University life on September 8, 2010 by xi'an

Julien Cornebise posted a rather detailed set of comments (from Jasper!) that I thought was interesting and thought-provoking enough (!) to promote to a guest post. Here it is, then, to keep the debate rolling (with my only censoring being the removal of smileys!). (Please keep in mind that I do not endorse everything stated in this guest post! Especially the point on “Use R!”)

On C vs R
As a reply to Duncan: indeed C (at least for the bottlenecks) will probably always be faster for the final, mainstream use of an algorithm [e.g. as a distributed R library, or a standalone program]. Machine-level, smart compilers, etc etc. The same goes for Matlab, and even for Python: e.g. Pierre Jacob (Xian’s great PhD student) uses Weave to inline C in his Python code for the bottlenecks — simple, and fast. Some hedge funds even hire coders to recode the Matlab code of their consulting academic statisticians.

Point taken. But, as Radford Neal points out, that doesn’t justify R being much slower than it could be:

  • When statisticians (cf Xian) want to develop/prototype new algorithms and methods while focussing on the math/stat/algo more than on the language-dependent implementation, it is still a shame to waste 50% (or even 25%). Same goes for the memory management, or even for some language features[1]
  • Even less computer-savvy users of R for real-case data, willing to use existing algorithms (not developing new algos) but on big/intricate datasets can be put off by slow speed — or even by memory failures.
  • And the library is BRILLIANT.

On Future Language vs R
Thanks David and Martyn for the link to Ihaka’s great paper on an R-like, Lisp-based language. It says things better than I could, and with an expertise on R that I lack. I also didn’t know about Robert Gentleman and his success at Harvard (but he *invented* the thing, not merely tuned it up).

Developing a whole new language and concept, as advocated in Ihaka’s paper and as suggested by gappy3000, would be a great leap forward, and a needed breakthrough to change the way we use computational stats. I would *love* to see that, as I personally think (as Ihaka advocates in the paper you link to) that R, as a language, is a hell of a pain [2] and I am saddened to see a lot of “Use R” books that will entrench its inadequate use for needs where the language hardly fits the bill, although the library does.

But R is here and in everyday use, and the matter is more one of making it worth using, to its full potential. I have no special attachment to R, but any breakthrough language that would not be entirely compatible with the massive library contributed over the years would be doomed to fail to pick up the everyday statistician, and we’re talking here about far-fetched long-term moves. A salutary breakthrough, but harder to make happen when such an anchor is here.
I would say that R has turned into the Fortran of statistics: here to stay, anchored by the inertia that stems from its intrinsic (and widely acknowledged) merits (I’ve been nice, I didn’t say Cobol.).

So until the great leap forward comes (or until we make it happen as a community), I second Radford Neal’s call for optimization of the existing core of R.

As a rejoinder to the comments here, I think we need to consider separately

  1. R’s brilliant library
  2. R’s not-so-brilliant language and/or interpreter.

It seems to me from this topic that the community needs to/should push for, in chronological order:

  1. First, a speed-up of R’s existing interpreter as called for by Radford Neal.  “Easy” and short-term task, by good-willing amateur coders, or, better, by solid CS people.
  2. Team-up with CS experts interested in developing computational stat-related tools.
  3. With them, get out of the now dead-ended R language and embark on a new stat framework based on an *existing*, proven, language. It *must* be able to reuse the brilliant R library/codes brought up by the community. Failing to do so would fail to pick up the userbase = die in limbo. That’s more or less what is called for by Ihaka (except for his doubts on the backward compatibility, see Section 7 of his paper). Much harder and longer term, but worth it.

From then on
Who knows the R community enough to relay this call, and make it happen? I’m out of my league.

Uninteresting footnotes:
[1] I have twitched several times when trying R, feeling the coding was somewhat unnatural from a CS point of view. [Mind, I twitch all the same, although on other points, with Matlab]
[2] Again, I speak only out of the few tries I gave it. As I gave up using it for my everyday work, I am biased, and ignorant.


Research in pair next summer

Posted in Books, Mountains, R, Statistics, Travel, University life on April 30, 2010 by xi'an

Today I received the very good news that our proposal with Jean-Michel Marin to undertake “research in pair” in CIRM, Luminy, for a fortnight next summer was accepted! This research centre in Mathematics is a southern and French version of the renowned German centre of Oberwolfach and, while I would have preferred the cool Black Forest to the burning rocks of the nearby calanques, I am very grateful for this support from the sponsors of the CIRM centre. We aim at revising the book Bayesian Core towards a Use R! version during this fortnight (if the heat does not kill our legendary productivity!). The CIRM centre is located in a nicely renovated bastide within a small park, and the famous climbing cliffs of the calanques are within walking distance. (I just need to find a climbing partner!) I have organised several meetings there over the years and the atmosphere there is always propitious for research. (There is also a well-provided library, if not comparable to Oberwolfach.)

JSM 2009 impressions [day 3]

Posted in Books, Running, Statistics, University life on August 5, 2009 by xi'an

The day started very early with the Gertrude Cox Scholarship 5k race, since my wife and I had to leave the hotel at 5:15am to catch the first metro to the RFK stadium. We met other runners in the metro and we all managed to get to the parking lot of the stadium. There were actually fewer runners than at the previous Gertrude Cox races I ran (like the first one in 1989 in D.C.), maybe around 40 of us, and the track for the race was one loop around the huge parking lot, not inside the stadium quite obviously. We started at about 6:20am in warm, humid weather and I managed to keep up with the two leaders for about one kilometer (3:38) before settling into my own pace. I stuck to third place for the rest of the race, ending up in 18:28, about 30 seconds behind David Dunson and more than a minute behind the winner, in what felt like more than 5k.

The first session I attended was the Medallion lecture by Alistair Sinclair, who talked about exact convergence speeds for MCMC algorithms in combinatorics. While the talk was beautifully organised and quite broad in reaching to the audience, I must admit I ended up being disappointed at the lack of connection with the MCMC developments found in Statistics, especially the huge corpus of work by Gareth Roberts and Jeff Rosenthal. This is another illustration of the gap between computer scientists working in combinatorics and applied probabilists, even though they are using the same tools. In the afternoon, I went to the Savage Award Finalists session, where the four finalists were presenting their PhD thesis work. Interestingly, they all have some Bayesian features in their work, albeit from different perspectives, and David Dunson managed to give a great discussion on those four theses at the same pace he ran the morning 5k! Later that day, at the SBSS (Section on Bayesian Statistical Science) mixer, the Savage Award was given to Lorenzo Trippa from Milano, now at the M.D. Anderson Cancer Center, Texas A & M, for his extensions of Polya tree models.

I was mentioning the new books in the Use R! series in the previous post. I spotted yesterday a book by Phil Spector on Data Manipulation with R that I immediately bought because Phil’s material on R available on the web has been quite helpful in writing Introducing Monte Carlo Methods with R. (Hence the free cap!) Note that he should not be confused with the music producer Phil Spector, who worked with the Ramones and is now in jail! I incidentally spotted two copies of the paperback version of The Bayesian Choice printed in hard-cover by mistake but sold at the paperback price. (This is due to the new print-on-demand strategy of publishers that eliminates inventory.)

JSM 2009 impressions [day 2]

Posted in Books, Statistics, University life on August 4, 2009 by xi'an

Julien Cornebise wrote his impressions on yesterday [day 2] as comments to day 1 and he is welcome as a guest editor! I completely agree with his views on George Casella’s Medallion Lecture on design, which emphasized the need to reconsider this somewhat neglected part of the Statistics curriculum. George’s lecture was both passionate and broad, which made it accessible to the large audience there. It was based on his Statistical Design book, on sale at the Springer Verlag booth in the Exhibit hall when you go to check the Enigma machine at the NSA booth, along with a whole table of new books in the Use R! series, soon to be augmented by our book Introducing Monte Carlo Methods with R with George Casella, which is available in a draft version at the booth. (We actually signed the contract for Introducing Monte Carlo Methods with R with Springer yesterday afternoon.) The Springer editor, John Kimmel, is one of the ASA Fellows this year, in recognition of his support of the dissemination of new ideas in Statistics (my wording) and this is a great initiative from the ASA committee on Fellows as he unreservedly deserves it, if only for launching the Use R! series!


As mentioned by Julien, the session on the future of Statistics was reserved for the happy “fews” who managed to get a seat and others had to stay in the “present” thanks to this safety regulation that seems to be implemented on some talks/rooms and not others. I passed the first people being stopped by a fierce guard on my way to the “past”, i.e. to the cosmology and astrophysics session. There, I very much enjoyed Larry Wasserman’s talk on Nonparametric estimation of filaments, both for uncovering a challenging problem and for his elegant resolution of it, as well as the presentation by Laura Cayon on Detection of weak lensing, where I discovered that my old Purdue friend Anirban das Gupta was also involved in cosmology. I also went to the Monte Carlo and Sequential Analyses: Methods and Applications session, organised by Mike West, but the talks were too short to make much of an impact on me, even though I appreciated the talk by Minghui Shi on Particle stochastic search for high-dimensional variable selection, which linked with Nicolas Chopin’s early work on exploring a large dataset, and I was also intrigued by the talk of Ioanna Manolopoulou on Targeted sequential resampling from large Data sets in mixture modeling for using proxies to the real mixture model. The day ended with a Board meeting for ISBA, which unfortunately took place outside in hot, humid weather… I now have to get ready for the Gertrude Cox Scholarship 5k race, since it starts at 6:15am (yes, am!).

