## optimising accept-reject

Posted in R, Statistics, University life with tags , , , , , , on November 21, 2012 by xi'an

I spotted on R-bloggers a post discussing optimising the efficiency of programming accept-reject algorithms. While it is about SAS programming, and apparently supported by the SAS company, there are two interesting features with this discussion. The first one is about avoiding the dreaded loop in accept-reject algorithms. For instance, taking the case of the truncated-at-one Poisson distribution, the code

rtpois=function(n,lambda){
sampl=c()
while (length(sampl)<n){
x=rpois(1,lambda)
if (x!=0) sampl=c(sampl,x)}
return(sampl)
}


is favoured by my R course students but highly inefficient:

> system.time(rtpois(10^5,.5))
user  system elapsed
61.600  27.781  98.727


both for the stepwise increase in the size of the vector and for the loop. For instance, defining the vector sampl first requires a tenth of the above time (note the switch from 10⁵ to 10⁶):

> system.time(rtpois(10^6,.5))
user  system elapsed
54.155   0.200  62.635


As discussed by the author of the post, a more efficient programming should aim at avoiding the loop by predicting the number of proposals necessary to accept a given number of values. Since the bound M used in accept-reject algorithms is also the expected number of attempts for one acceptance, one should start with something around Mn proposed values. (Assuming of course all densities are properly normalised.) For instance, in the case of the truncated-at-one Poisson distribution based on proposals from the regular Poisson, the bound is 1/1-e. A first implementation of this principle is to build the sample via a few loops:

rtpois=function(n,lambda){
propal=rpois(ceiling(n/(1-exp(-lambda))),lambda)
propal=propal[propal>0]
n0=length(propal)
if (n0>=n)
return(propal[1:n])
else return(c(propal,rtpois(n-n0,lambda)))
}


with a higher efficiency:

> system.time(rtpois(10^6,.5))
user  system elapsed
0.816   0.092   0.928


Replacing the expectation with an upper bound using the variance of the negative binomial distribution does not make a significant dent in the computing time

rtpois=function(n,lambda){
M=1/(1-exp(-lambda))
propal=rpois(ceiling(M*(n+2*sqrt(n/M)/(M-1))),lambda)
propal=propal[propal>0]
n0=length(propal)
if (n0>=n)
return(propal[1:n])
else return(c(propal,rtpois(n-n0,lambda)))}


since we get

> system.time(rtpois(10^6,.5))
user  system elapsed
0.824   0.048   0.877


The second point about this Poisson example is that simulating a distribution with a restricted support using another distribution with a larger support is quite inefficient. Especially when λ goes to zero By comparison, using a Poisson proposal with parameter μ and translating it by 1 may bring a considerable improvement: without getting into the gory details, it can be shown that the optimal value of μ (in terms of maximal acceptance probability) is λ and that the corresponding probability of acceptance is

$\dfrac{1-\exp\{-\lambda\}}{\lambda}$

which is larger than the probability of the original approach when λ is less than one. As shown by the graph below, this allows for a lower bound in the probability of acceptance that remains tolerable.

## In{s}a(ne)!!

Posted in R, Statistics with tags , , , , on September 6, 2010 by xi'an

Having missed the earliest entry by Radford last month, due to disconnection in Yosemite, I was stunned to read his three entries of the past month about R performances being significantly modify when changing brackets with curly brackets! I (obviously!) checked on my own machine and found indeed the changes in system.time uncovered by Radford… The worst is that I have a tendency to use redundant brackets to separate entities in long expressions and thus make them more readable (I know, this is debatable!), so each new parenthesis add to the wasted time… Note that it is the same with curly bracket: any extra curly bracket takes some additional computing time…

> f=function(n) for (i in 1:n) x=1/(1+x)
> g=function(n) for (i in 1:n) x=(1/(1+x))
> h=function(n) for (i in 1:n) x=(1+x)^(-1)
> j=function(n) for (i in 1:n) x={1/{1+x}}
> k=function(n) for (i in 1:n) x=1/{1+x}
> x=1
> system.time(f(10^6))
user  system elapsed
1.684   0.020   1.705
> system.time(g(10^6))
user  system elapsed
1.964   0.012   1.976
> system.time(h(10^6))
user  system elapsed
2.452   0.016   2.470
> system.time(j(10^6))
user  system elapsed
1.716   0.008   1.725
> system.time(k(10^6))
user  system elapsed
1.532   0.016   1.548

In the latest of his posts, Radford lists a series of 14 patches that speed up R up to 25%… That R can face such absurdities is pretty annoying! Given that I am currently fighting testing an ABC algorithm with MCMC inner runs, which takes forever to run, this is even depressing!!! As Radford stresses, “slow speed is a significant impediment to greater use of R, much more so than lack of some wish list of new features.”