Archive for speed

60k versus 65k

Posted in pictures, Running, Travel on August 25, 2012 by xi'an

An ad in Melbourne took a while to click (for me!): it showed a woman thrown onto the hood of a car with the legend 65k, and the same woman lying prostrate in front of the same car with the legend 60k… I was seeing this ad every day on the drive to Monash and could not see the point, as I was interpreting 60k as 60kg! So it sounded like a weird campaign for a new diet… After a while, I eventually got the point that it was a campaign for speed reduction, aimed at drivers who think that 60km/h does not differ much from 65km/h. (I could not find a reproduction of the campaign posters on the official site.) Besides this misinterpretation, I find the message rather unclear and unconvincing: while driving more slowly obviously gives a driver more time to react, the 60k/65k opposition could be replaced with a 55k/60k opposition and would make neither more nor less sense. Furthermore, the variability in drivers' reactions and car behaviour is likely to influence the consequences of an impact as much as a reduction of 5km/h…

speed of R, C, &tc.

Posted in R, Running, Statistics, University life on February 3, 2012 by xi'an

My Paris colleague (and fellow runner) Aurélien Garivier has produced an interesting comparison of four computer languages (or six, if you count scilab and octave as distinct from matlab) in terms of speed for computing the MLE in a hidden Markov model via the EM (Baum-Welch) algorithm. His conclusions are that

  • matlab is a lot faster than R and python, especially when vectorization is important: this is why the difference is spectacular on filtering/smoothing (see the sketch below), but not so much on the creation of the sample;
  • octave is a good matlab emulator, if no special attention is paid to execution speed…;
  • scilab appears as a credible, efficient alternative to matlab;
  • still, C is a lot faster; the inefficiency of matlab in loops is well-known, and clearly shown in the creation of the sample.

(In this implementation, R is “only” three times slower than matlab, so this is not so damning…) All the codes are available and you are free to make suggestions to improve the speed of your favourite language!
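To make the vectorization point concrete, here is a minimal sketch (my own illustration, not part of Aurélien's code) of one prediction-plus-filtering step of a discrete HMM, where the double loop over states collapses into a single matrix-vector product:

set.seed(1)
K <- 50                                        # number of hidden states
A <- matrix(runif(K * K), K, K); A <- A / rowSums(A)   # transition matrix
alpha <- rep(1 / K, K)                         # current filtering distribution
lik <- runif(K)                                # observation likelihoods at time t+1
# looped prediction and filtering step
pred_loop <- numeric(K)
for (j in 1:K) for (i in 1:K) pred_loop[j] <- pred_loop[j] + alpha[i] * A[i, j]
filt_loop <- pred_loop * lik / sum(pred_loop * lik)
# vectorised version: one matrix-vector product
pred_vec <- as.vector(alpha %*% A)
filt_vec <- pred_vec * lik / sum(pred_vec * lik)
max(abs(filt_loop - filt_vec))                 # agreement up to rounding error

Repeating the looped version at every time step of a long series is exactly where interpreted loops pay the heaviest price.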

News about speeding R up

Posted in R, Running, Statistics on May 24, 2011 by xi'an

The most visited post ever on the ‘Og was In{s}a(ne), my report on Radford Neal’s experiments with speeding up R by using different brackets (the second most popular was Ross Ihaka’s comment that one should “simply start over and build something better”). I just spotted two new entries by Radford on his blog that are bound to rekindle the debate about the speed of R. The latest one shows that matrix multiplication can be made close to ten times faster by changing the way the test for the presence of NaNs in a matrix is carried out. This gain is not as shocking as the 25% improvement produced by replacing x=1/(1+x) with x=1/{1+x}, but a factor of ten is a major gain…

Julien on R shortcomings

Posted in Books, R, Statistics, University life on September 8, 2010 by xi'an

Julien Cornebise posted a rather detailed set of comments (from Jasper!) that I thought was interesting and thought-provoking enough (!) to promote to a guest post. Here it is, then, to keep the debate rolling (with my only censoring being the removal of smileys!). (Please keep in mind that I do not endorse everything stated in this guest post! Especially the point on “Use R!“)

On C vs R
As a reply to Duncan: indeed C (at least for the bottlenecks) will probably always be faster for the final, mainstream use of an algorithm [e.g. as a distributed R library, or a standalone program]. Machine-level, smart compilers, etc etc. The same goes for Matlab, and even for Python: e.g. Pierre Jacob (Xian’s great PhD student) uses Weave to inline C in his Python code for the bottlenecks — simple, and fast. Some hedge funds even hire coders to recode the Matlab code of their consulting academic statisticians.
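As an aside, here is a minimal R-side analogue of the same inline-the-bottleneck idea, using Rcpp rather than Weave (an illustration of mine, not Julien's or Pierre's code): the tight loop is written in C++ and called from R.

library(Rcpp)
cppFunction('
double inv_loop(int n) {
  // the x = 1/(1+x) toy loop from the bracket-timing experiments, in compiled code
  double x = 1.0;
  for (int i = 0; i < n; ++i) x = 1.0 / (1.0 + x);
  return x;
}')
system.time(inv_loop(10^6))   # negligible compared with the interpreted loop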

Point taken. But, as Radford Neal points out, that doesn’t justify R being much slower than it could be:

  • When statisticians (cf Xian) want to develop/prototype new algorithms and methods while focussing on the math/stat/algo more than on the language-dependent implementation, it is still a shame to waste 50% (or even 25%) of the computing time. The same goes for memory management, or even for some language features [1].
  • Even the less computer-savvy users of R, working on real-case data and willing to apply existing algorithms (rather than develop new ones) to big or intricate datasets, can be put off by slow speed, or even by memory failures.
  • And the library is BRILLIANT.

On Future Language vs R
Thanks David and Martyn for the link to Ihaka’s great paper on a lisp-based, R-like language. It says things better than I could, and with an expertise on R that I don’t have. I also didn’t know about Robert Gentleman and his success at Harvard (but he *invented* the thing, not merely tuned it up).

Developing a whole new language and concept, as advocated in Ihaka’s paper and as suggested by gappy3000, would be a great leap forward, and a needed breakthrough to change the way we use computational stats. I would *love* to see that, as I personally think (as Ihaka advocates in the paper you link to) that R, as a language, is a hell of a pain [2], and I am saddened to see a lot of “Use R” books that will root its inadequate use for needs where the language hardly fits the bill, although the library does.

But R is here and in everyday use, and the matter is more one of making it worth using, to its full potential. I have no special attachment to R, but any breakthrough language that was not compatible with the massive library contributed over the years would be doomed to fail to pick up the everyday statistician, and we’re talking here about far-fetched, long-term moves. A salutary breakthrough, but harder to make happen when such an anchor is in place.
I would say that R has turned into the Fortran of statistics: here to stay, anchored by the inertia that stems from its intrinsic (and widely acknowledged) merits (I’ve been nice, I didn’t say Cobol).

So until the great leap forward comes (or until we make it happen as a community), I second Radford Neal‘s call for optimization of the existing core of R.

Rejoinder
As a rejoinder to the comments here, I think we need to consider separately

  1. R’s brilliant library
  2. R’s not-so-brilliant language and/or interpreter.

It seems to me from this topic that the community should push for, in chronological order:

  1. First, a speed-up of R’s existing interpreter, as called for by Radford Neal. An “easy” and short-term task, for well-meaning amateur coders or, better, for solid CS people.
  2. Team up with CS experts interested in developing computational-stat-related tools.
  3. With them, get out of the now dead-ended R language and embark on a new stat framework based on an *existing*, proven language. It *must* be able to reuse the brilliant R library/codes contributed by the community; failing that would fail to pick up the user base and die in limbo. That is more or less what Ihaka calls for (except for his doubts on backward compatibility, see Section 7 of his paper). Much harder and longer term, but worth it.

From then on
Who knows the R community well enough to relay this call and make it happen? I’m out of my league.

Uninteresting footnotes:
[1] I have twitched several times when trying R, feeling the coding was somewhat unnatural from a CS point of view. [Mind, I twitch all the same, although on other points, with Matlab]
[2] again, I speak only from the few tries I gave it; since I gave up using it for my everyday work, I am biased — and ignorant


In{s}a(ne)!!

Posted in R, Statistics on September 6, 2010 by xi'an

Having missed the earliest entry by Radford last month, due to a disconnection in Yosemite, I was stunned to read his three entries of the past month about R performance being significantly modified when replacing (round) brackets with curly brackets! I (obviously!) checked on my own machine and indeed found the changes in system.time uncovered by Radford… The worst is that I have a tendency to use redundant brackets to separate entities in long expressions and thus make them more readable (I know, this is debatable!), so each extra parenthesis adds to the wasted time… Note that it is the same with curly brackets: any extra curly bracket takes some additional computing time…

> f=function(n) for (i in 1:n) x=1/(1+x)
> g=function(n) for (i in 1:n) x=(1/(1+x))
> h=function(n) for (i in 1:n) x=(1+x)^(-1)
> j=function(n) for (i in 1:n) x={1/{1+x}}
> k=function(n) for (i in 1:n) x=1/{1+x}
> x=1
> system.time(f(10^6))
user  system elapsed
1.684   0.020   1.705
> system.time(g(10^6))
user  system elapsed
1.964   0.012   1.976
> system.time(h(10^6))
user  system elapsed
2.452   0.016   2.470
> system.time(j(10^6))
user  system elapsed
1.716   0.008   1.725
> system.time(k(10^6))
user  system elapsed
1.532   0.016   1.548
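As a small hedged addition of my own (not part of Radford's experiment), averaging a few replications of each call smooths out the noise in system.time and makes the small gaps between bracket styles easier to compare:

elapsed <- function(fun, n = 10^6) system.time(fun(n))["elapsed"]
timings <- sapply(list(f = f, g = g, h = h, j = j, k = k),
                  function(fun) mean(replicate(5, elapsed(fun))))
round(timings, 3)   # mean elapsed seconds per bracketing variant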

In the latest of his posts, Radford lists a series of 14 patches that speed R up by as much as 25%… That R suffers from such absurdities is pretty annoying! Given that I am currently struggling to test an ABC algorithm with inner MCMC runs, which takes forever to run, this is even more depressing!!! As Radford stresses, “slow speed is a significant impediment to greater use of R, much more so than lack of some wish list of new features.”
