take those hats off [from R]!
This is presumably obvious to most if not all R programmers, but I became aware today of a hugely (?) delaying tactic in my R codes. I was working with Jean-Michel and Natesh [who are visiting at the moment] and, when coding an MCMC run, I was telling them that I usually preferred to code Nsim=1000 as Nsim=10^3 for readability reasons. Suddenly, I became worried that this representation involved a computation, as opposed to Nsim=1e3, and ran a little experiment:
> system.time(for (t in 1:10^8) x=10^3)
   user  system elapsed
 30.704   0.032  30.717
> system.time(for (t in 1:1e8) x=10^3)
   user  system elapsed
 30.338   0.040  30.359
> system.time(for (t in 1:10^8) x=1000)
   user  system elapsed
  6.548   0.084   6.631
> system.time(for (t in 1:1e8) x=1000)
   user  system elapsed
  6.088   0.032   6.115
> system.time(for (t in 1:10^8) x=1e3)
   user  system elapsed
  6.134   0.029   6.157
> system.time(for (t in 1:1e8) x=1e3)
   user  system elapsed
  6.627   0.032   6.654
> system.time(for (t in 1:10^8) x=exp(3*log(10)))
   user  system elapsed
 60.571   0.000  57.103
So using the usual scientific notation with powers takes its toll, while the calculator notation with e is cost-free… Weird!
I understand that the R notation 10^6 is an abbreviation for a power function that can equally be applied to pi^pi, say, but I still feel aggrieved that a nice scientific notation like 10⁶ ends up as a computing trap! I thus asked the question on the Stack Overflow forum, getting the (predictable) answer that the R code 10^6 means calling the R power function, while 1e6 is a constant. Since 10⁶ does not differ from π^π, there is no reason 10⁶ should be recognised by R as a million. Except that it would make my coding more coherent.
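The difference is visible directly in the parse tree; a minimal check of my own (not part of the Stack Overflow answer) shows that 10^6 is stored as a call while 1e6 is already a numeric constant:

```r
# 10^6 parses as a call to the `^` function;
# 1e6 is stored as a plain numeric constant in the parse tree.
e1 <- quote(10^6)
e2 <- quote(1e6)
is.call(e1)     # TRUE
is.numeric(e2)  # TRUE
```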
> system.time( for (t in 1:10^8) x=pi^pi)
   user  system elapsed
 44.518   0.000  43.179
> system.time( for (t in 1:10^8) x=10^6)
   user  system elapsed
 38.336   0.000  37.860
Another thing I discovered from this answer to my question is that negative integers also require a call to a function:
> system.time( for (t in 1:10^8) x=1)
   user  system elapsed
 10.561   0.801  11.062
> system.time( for (t in 1:10^8) x=-1)
   user  system elapsed
 22.711   0.860  23.098
This sounds even weirder.
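The same parse-tree check as above (a sketch of my own, not from the linked answer) confirms that -1 is indeed stored as a call to the unary `-` function:

```r
# -1 parses as a call to the unary `-` function applied to the constant 1,
# whereas 1 on its own is a plain constant.
e <- quote(-1)
is.call(e)                       # TRUE
identical(e[[1]], as.name("-"))  # TRUE
is.numeric(quote(1))             # TRUE
```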
May 6, 2015 at 10:45 am
I see no conclusion being made as to whether your findings should be a concern or not to the everyday user. You start your article saying you have found what could be “a hugely (?) delaying tactic in [your] R codes” but do not really answer the embedded question.
While the differences in computation times in your experiments are relatively large (5x and 10x), all approaches remain ridiculously fast in absolute terms. After all, you had to do close to a billion such operations to start noticing a difference! So when comparing a single `n <- 10^3` versus `n <- 1e3`, you are only talking about saving a few hundred nanoseconds. I bet that if you were to profile your overall code with or without the use of `n <- 10^3`, you would not be able to notice a statistically significant difference. Similarly, the fact that `x <- -1` makes a call to the `-` function should be of absolutely no concern to the average user. It is so fast there is nothing to notice in a sea of much slower operations.
So the conclusion should be, in my opinion, that you can keep using whatever you like. You mention "readability reasons" for preferring `Nsim=10^3`. By all means, considerations of this kind (readability, personal taste) should take priority over a few saved nanoseconds. Keep the hats if you want!
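As an aside, a back-of-the-envelope calculation of my own, using the timings reported in the post, puts the per-assignment saving in the hundreds of nanoseconds:

```r
# ~30.7 s for 1e8 assignments of 10^3 versus ~6.1 s for 1e8 assignments
# of 1000/1e3: the difference, spread over 1e8 iterations, in nanoseconds.
(30.7 - 6.1) / 1e8 * 1e9  # about 246 ns saved per assignment
```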
May 6, 2015 at 12:48 am
You can use JIT compilation too:

require(compiler)
f <- function() {
for (t in 1:10^8) x=10^3
}
cf <- cmpfun(f)   # byte-compiled version of f
system.time(f())
system.time(cf())
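As a side note (my addition, not part of the comment above): the same compiler package also offers a just-in-time mode, so closures get byte-compiled automatically rather than through explicit cmpfun calls; from R 3.4.0 onwards this is enabled by default.

```r
library(compiler)
enableJIT(3)  # byte-compile all closures and top-level loops on the fly
g <- function() { s <- 0; for (t in 1:10) s <- s + t; s }
g()  # g is byte-compiled automatically around its first call; returns 55
```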
May 6, 2015 at 7:55 am
Thanks, Alex: very good point, I wonder why I/we do not use the compiler functions all the time…!
May 5, 2015 at 10:04 am
“I usually preferred to code Nsim=10000 as Nsim=10^3 for readability reasons” – and this is then a good indicator of why mixing both notations is a Bad Idea ;-)
May 5, 2015 at 8:59 am
The negative (and, for that matter, positive) operators are indeed valid unary functions in R, and as such require function calls.
> system.time( for (t in 1:10^8) x=1)
user system elapsed
11.74 0.01 11.81
> system.time( for (t in 1:10^8) x=+1)
user system elapsed
17.89 0.09 18.07
> system.time( for (t in 1:10^8) x=-1)
user system elapsed
20.50 0.03 20.64
This is seen if you request the function code
> `-`
function (e1, e2) .Primitive("-")
May 5, 2015 at 8:52 am
R uses the C system function pow for the ^ operator, as per the help file. This means there are potentially optimisations for small integer exponents taking place in the calculation of pow(10,6) [pow(int, int)] which won’t be the case for pow(π,π) [pow(double, double)] (I’m less familiar with C, but I know Fortran does this). There is potentially type casting going on too, depending on which overloads are available.
> system.time(for (t in 1:1e8) x=1.3^3L)
user system elapsed
34.57 0.07 34.66
> system.time(for (t in 1:1e8) x=1.3^3.0)
user system elapsed
35.46 0.09 35.63
> system.time(for (t in 1:1e8) x=1L^3L)
user system elapsed
29.81 0.06 29.94
My guess as to why the specification of the loop limit has no impact is that the upper limit is only evaluated once.
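That guess is easy to confirm with a small counter (a sketch of my own): the expression giving the upper limit is evaluated a single time before the iterations begin.

```r
# Count how often the loop-bound expression is evaluated.
count <- 0L
bound <- function() { count <<- count + 1L; 3L }
for (t in 1:bound()) invisible(t)
count  # 1: the bound was computed once, even though the loop ran 3 times
```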
Repeating your test with even a simple multiplication results in the same issue as raising to a power, so it is the overhead of the function evaluations. You could possibly track this down with profiling if it is happening on the R side of the call, though it is probably on the C side.
> system.time(for (t in 1:10^8) x=10^3)
user system elapsed
37.80 0.08 38.17
> system.time(for (t in 1:10^8) x=10*3)
user system elapsed
27.44 0.09 27.58
> system.time(for (t in 1:10^8) x=1e3)
user system elapsed
12.78 0.05 12.85
> system.time(for (t in 1:10^8) x=1000)
user system elapsed
12.56 0.05 12.65
Lastly, I found a wide variance across repeated runs of each test, so I wouldn’t take these numbers as a quantitative measure. Qualitative, certainly.
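One way to get a handle on that run-to-run spread (a sketch of mine, using a much smaller loop so it finishes quickly) is to repeat each timing and look at the median elapsed time rather than a single measurement:

```r
# Repeat the timing several times; the median is more robust than one run.
times <- replicate(5, system.time(for (t in 1:1e6) x <- 10^3)["elapsed"])
median(times)
```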
May 5, 2015 at 5:31 am
Just curious–do you know of a workaround for the negative numbers?
May 5, 2015 at 8:11 am
Eric: not that I know of but this does not say much. Josh O’Brien in his answer states that there is no way around…
May 5, 2015 at 4:11 pm
The problem is that unary minus is not as simple as it looks. It needs to handle both integer and floating point constants, as well as special numeric constants like NA and TRUE (e.g. -NA is NA, and -TRUE is -1). So it is much easier to parse unary minus as a function call and let the arithmetic functions handle the complexity (see src/main/arithmetic.c) rather than trying to optimize this in the parser.
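The special cases mentioned are easy to check at the prompt (my illustration of the point, not part of the original comment):

```r
# Unary minus must handle logicals and missing values at run time:
-TRUE   # -1L: the logical is coerced to integer before negation
-NA     # NA: negating a missing value stays missing
```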
May 19, 2015 at 11:35 pm
Try "-1." (with a point). That should tell the compiler to make it a float.
May 20, 2015 at 8:02 am
Thanks for the suggestion. It does not seem to make a difference.
May 5, 2015 at 5:14 am
In your original Nsim example (I do it exactly the way you do it) I don’t know if it makes a big difference (because it is evaluated only once). Check it out:
M <- 10^7
system.time( for (i in 1:M) x <- M )
N <- 1e7
system.time( for (i in 1:N) x <- N )
May 5, 2015 at 8:09 am
Paulo: once M=10^7 is defined, it indeed makes no difference!