## take those hats off [from R]! This is presumably obvious to most if not all R programmers, but I became aware today of a hugely (?) delaying tactic in my R codes. I was working with Jean-Michel and Natesh [who are visiting at the moment] and when coding an MCMC run I was telling them that I usually preferred to code Nsim=10000 as Nsim=10^3 for readability reasons. Suddenly, I became worried that this representation involved a computation, as opposed to Nsim=1e3 and ran a little experiment:

```> system.time(for (t in 1:10^8) x=10^3)
utilisateur     système      écoulé
30.704       0.032      30.717
> system.time(for (t in 1:1e8) x=10^3)
utilisateur     système      écoulé
30.338       0.040      30.359
> system.time(for (t in 1:10^8) x=1000)
utilisateur     système      écoulé
6.548       0.084       6.631
> system.time(for (t in 1:1e8) x=1000)
utilisateur     système      écoulé
6.088       0.032       6.115
> system.time(for (t in 1:10^8) x=1e3)
utilisateur     système      écoulé
6.134       0.029       6.157
> system.time(for (t in 1:1e8) x=1e3)
utilisateur     système      écoulé
6.627       0.032       6.654
> system.time(for (t in 1:10^8) x=exp(3*log(10)))
utilisateur     système      écoulé
60.571        0.000     57.103
```

So using the usual scientific notation with powers is taking its toll! While the calculator notation with e is cost free… Weird!

I understand that the R notation 10^6 is an abbreviation for a power function that can be equally applied to pi^pi, say, but still feel aggrieved that a nice scientific notation like 10⁶ ends up as a computing trap! I thus asked the question to the Stack Overflow forum, getting the (predictable) answer that the R code 10^6 meant calling the R power function, while 1e6 was a constant. Since 10⁶ does not differ from ππ, there is no reason 10⁶ should be recognised by R as a million. Except that it makes my coding more coherent.

```> system.time( for (t in 1:10^8) x=pi^pi)
utilisateur     système      écoulé
44.518       0.000      43.179
> system.time( for (t in 1:10^8) x=10^6)
utilisateur     système      écoulé
38.336       0.000      37.860
```

Another thing I discovered from this answer to my question is that negative integers are also requesting call to a function:

```> system.time( for (t in 1:10^8) x=1)
utilisateur     système      écoulé
10.561       0.801      11.062
> system.time( for (t in 1:10^8) x=-1)
utilisateur     système      écoulé
22.711       0.860      23.098
```

This sounds even weirder.

### 14 Responses to “take those hats off [from R]!”

1. flodel81 Says:

I see no conclusion being made as to whether your findings should be a concern or not to the everyday user. You start your article saying you have found what could be “a hugely (?) delaying tactic in [your] R codes” but do not really answer the embedded question.

While the differences in computation times in your experiments are relatively high (5x and 10x), all approaches remain ridiculously fast at an absolute level. After all, you had to do close to a billion such operations to start noticing a difference! So when comparing a single `n <- 10^3` versus `n<-1e3`, you are only talking about saving a few nanoseconds. I bet that if you were to profile your overall code with or without the use of `n <- 10^3`, you would not be able to notice a statistically significant difference. Similarly, the fact `x <- -1` is making a call to the `-` function should be absolutely no concern to the lambda user. It is so fast there is nothing to notice in a sea of much slower operations.

So the conclusion should be, in my opinion, that you can keep using whatever you like. You mention "readability reasons" for preferring using `Nsim=10^3`. By all means, this kind of considerations (readability, personal taste) should take priority over a few saved nanoseconds. Keep the hats if you want!

2. Alex Says:

You can use JIT compilation too

require(compiler)
f <- function() {
for (t in 1:10^8) x=10^3
}
cf <- cmpfun(f)
system.time(f())
system.time(cf())

• xi'an Says:

Thanks, Alex: very good point, I wonder why I/we do not use the compile function all the time..!

3. […] R: 10^3 vs. 1e3 This is presumably obvious to most if not all R programmers, but I became aware today of a hugely (?) delaying tactic in my R codes. I was working with Jean-Michel and Natesh [who are visiting at the moment] and when coding an MCMC run I was telling them that I usually preferred to code Nsim=10000 as Nsim=10^3 for readability reasons. Suddenly, I became worried that this representation involved a computation, as opposed to Nsim=1e3 and ran a little experiment: […]

4. fjbl Says:

“I usually preferred to code Nsim=10000 as Nsim=10^3 for readability reasons” – and this then is a good indicator why mixing both notations is a Bad Idea ;-)

5. jonocarroll Says:

The negative, and positive for that matter, operators are indeed valid unary functions in R, and as such require function calls.

> system.time( for (t in 1:10^8) x=1)
user system elapsed
11.74 0.01 11.81
> system.time( for (t in 1:10^8) x=+1)
user system elapsed
17.89 0.09 18.07
> system.time( for (t in 1:10^8) x=-1)
user system elapsed
20.50 0.03 20.64

This is seen if you request the function code

> `-`
function (e1, e2) .Primitive(“-“)

6. jonocarroll Says:

R uses the C system function pow for the ^ operator, as per the help file, which means that there are potentially optimisations for small, integer exponents taking place in the calculation of pow(10,6) [pow(int, int)] which won’t be the case in pow(π,π) [pow(double, double)] (I’m less familiar with C, I know Fortran does this). There is potentially type casting going on too depending on which overloads are available.

> system.time(for (t in 1:1e8) x=1.3^3L)
user system elapsed
34.57 0.07 34.66
> system.time(for (t in 1:1e8) x=1.3^3.0)
user system elapsed
35.46 0.09 35.63
> system.time(for (t in 1:1e8) x=1L^3L)
user system elapsed
29.81 0.06 29.94

My guess as to why the specification of the loop limit has no impact is that the upper limit is only evaluated once.

Repeating your test with even simple multiplication results in the same issue as raising to a power, so it’s the overhead of function evaluations, which you could possibly track down with profiling if it’s happening on the R side of the call, though it’s probably on the C side.

> system.time(for (t in 1:10^8) x=10^3)
user system elapsed
37.80 0.08 38.17
> system.time(for (t in 1:10^8) x=10*3)
user system elapsed
27.44 0.09 27.58
> system.time(for (t in 1:10^8) x=1e3)
user system elapsed
12.78 0.05 12.85
> system.time(for (t in 1:10^8) x=1000)
user system elapsed
12.56 0.05 12.65

Lastly, I found a wide variance for the repeated results of each test, so I wouldn’t go taking these numbers as a quantitative measure. Qualitative, certainly.

7. Eric Kernfeld Says:

Just curious–do you know of a workaround for the negative numbers?

• xi'an Says:

Eric: not that I know of but this does not say much. Josh O’Brien in his answer states that there is no way around…

• Martyn Says:

The problem is that unary minus is not as simple as it looks. It needs to handle both integer and floating point constants, as well as special numeric constants like NA and TRUE (e.g. -NA is NA, and -TRUE is -1). So it is much easier to parse unary minus as a function call and let the arithmetic functions handle the complexity (see src/main/arithmetic.c) rather than trying to optimize this in the parser.

• no real name Says:

Try “-1.” (with a point). That should tell the compiler to make it a float.

• xi'an Says:

Thanks for the suggestion. It does not seem to make a difference:

```> microbenchmark(for (t in 1:1e6) x=-1)
Unit: milliseconds
min       lq     mean   median       uq     max
139.3023 173.7036 177.2113 174.6745 176.8892 220.0643
> microbenchmark(for (t in 1:1e6) x=-1.)
Unit: milliseconds
min       lq     mean   median       uq     max
140.5867 175.4953 177.9176 176.1609 178.4025 214.9357
```
8. Paulo Says:

In your original NSim example (I do it exaclty the way you do it) I don’t know if it makes a big difference (because it’s evaluated only once). Check it out:

M <- 10^7
system.time( for (i in 1:M) x <- M )
N <- 1e7
system.time( for (i in 1:N) x <- N )

• xi'an Says:

Paulo: Once M=10^7 is defined it does not make a difference indeed!

This site uses Akismet to reduce spam. Learn how your comment data is processed.