## fine-sliced Poisson [a.k.a. sashimi]

Posted in Books, Kids, pictures, R, Running, Statistics, University life with tags , , , , , , , , , on March 20, 2014 by xi'an

As my student Kévin Guimard had not mailed me his own Poisson slice sampler of a Poisson distribution, I could not tell why the code was not working! My earlier post prompted him to do so and a somewhat optimised version is given below:

nsim = 10^4
lambda = 6

max.factorial = function(x,u){
k = x
parf=1
while (parf*u<1){
k = k + 1
parf = parf * k
}
k = k - (parf*u>1)
return (k)
}

x = rep(floor(lambda), nsim)
for (t in 2:nsim){
v1 = ceiling((log(runif(1))/log(lambda))+x[t-1])
ranj=max(0,v1):max.factorial(x[t-1],runif(1))
x[t]=sample(ranj,size=1)
}
barplot(as.vector(rbind(
table(x)/length(x),dpois(min(x):max(x),
lambda))),col=c("sienna","gold"))


As you can easily check by running the code, it does not work. My student actually majored my MCMC class and he spent quite a while pondering why the code was not working. I did ponder as well for a part of a morning in Warwick, removing causes for exponential or factorial overflows (hence the shape of the code), but not eliciting the issue… (This now sounds like lethal fugu sashimi! ) Before reading any further, can you spot the problem?!

The corrected R code is as follows:

x = rep(lambda, nsim)
for (t in 2:nsim){
v1=ceiling((log(runif(1))/log(lambda))+x[t-1])
ranj=max(0,v1):max.factorial(x[t-1],runif(1))
if (length(ranj)>1){
x[t] = sample(ranj, size = 1)
}else{
x[t]=ranj}
}


The culprit is thus the R function sample which simply does not understand Dirac masses and the basics of probability! When running

> sample(150:150,1)
[1] 23


you can clearly see where the problem stands…! Well-documented issue with sample that already caused me woes… Another interesting thing about this slice sampler is that it is awfully slow in exploring the tails. And to converge to the centre from the tails. This is not very pronounced in the above graph with a mean of 6. Moving to 50 makes it more apparent:

This is due to the poor mixing of the chain, as shown by the raw sequence below, which strives to achieve a single cycle out of 10⁵ iterations! In any case, thanks to Kévin for an interesting morning!

## sliced Poisson

Posted in Books, Kids, pictures, R, Running, Statistics, University life with tags , , , , on March 18, 2014 by xi'an

One of my students complained that his slice sampler of a Poisson distribution was not working when following the instructions in Monte Carlo Statistical Methods (Exercise 8.5). This puzzled me during my early morning run and I checked on my way back, even before attacking the fresh baguette I had brought from the bakery… The following R code is the check. And it does work! As the comparison above shows…

slice=function(el,u){
#generate uniform over finite integer set
mode=floor(lambda)
sli=mode
x=mode+1
while (dpois(x,el)>u){
sli=c(sli,x);x=x+1}
x=mode-1
while (dpois(x,el)>u){
sli=c(sli,x);x=x-1}
return(sample(sli,1))}

#example
T=10^4
lambda=2.414

x=rep(floor(lambda),T)
for (t in 2:T)
x[t]=slice(lambda,runif(1)*dpois(x[t-1],lambda))

barplot(as.vector(rbind(
table(x)/length(x),dpois(0:max(x),
lambda))),col=c("sienna","gold"))


## optimising accept-reject

Posted in R, Statistics, University life with tags , , , , , , on November 21, 2012 by xi'an

I spotted on R-bloggers a post discussing optimising the efficiency of programming accept-reject algorithms. While it is about SAS programming, and apparently supported by the SAS company, there are two interesting features with this discussion. The first one is about avoiding the dreaded loop in accept-reject algorithms. For instance, taking the case of the truncated-at-one Poisson distribution, the code

rtpois=function(n,lambda){
sampl=c()
while (length(sampl)<n){
x=rpois(1,lambda)
if (x!=0) sampl=c(sampl,x)}
return(sampl)
}


is favoured by my R course students but highly inefficient:

> system.time(rtpois(10^5,.5))
user  system elapsed
61.600  27.781  98.727


both for the stepwise increase in the size of the vector and for the loop. For instance, defining the vector sampl first requires a tenth of the above time (note the switch from 10⁵ to 10⁶):

> system.time(rtpois(10^6,.5))
user  system elapsed
54.155   0.200  62.635


As discussed by the author of the post, a more efficient programming should aim at avoiding the loop by predicting the number of proposals necessary to accept a given number of values. Since the bound M used in accept-reject algorithms is also the expected number of attempts for one acceptance, one should start with something around Mn proposed values. (Assuming of course all densities are properly normalised.) For instance, in the case of the truncated-at-one Poisson distribution based on proposals from the regular Poisson, the bound is 1/1-e. A first implementation of this principle is to build the sample via a few loops:

rtpois=function(n,lambda){
propal=rpois(ceiling(n/(1-exp(-lambda))),lambda)
propal=propal[propal>0]
n0=length(propal)
if (n0>=n)
return(propal[1:n])
else return(c(propal,rtpois(n-n0,lambda)))
}


with a higher efficiency:

> system.time(rtpois(10^6,.5))
user  system elapsed
0.816   0.092   0.928


Replacing the expectation with an upper bound using the variance of the negative binomial distribution does not make a significant dent in the computing time

rtpois=function(n,lambda){
M=1/(1-exp(-lambda))
propal=rpois(ceiling(M*(n+2*sqrt(n/M)/(M-1))),lambda)
propal=propal[propal>0]
n0=length(propal)
if (n0>=n)
return(propal[1:n])
else return(c(propal,rtpois(n-n0,lambda)))}


since we get

> system.time(rtpois(10^6,.5))
user  system elapsed
0.824   0.048   0.877


The second point about this Poisson example is that simulating a distribution with a restricted support using another distribution with a larger support is quite inefficient. Especially when λ goes to zero By comparison, using a Poisson proposal with parameter μ and translating it by 1 may bring a considerable improvement: without getting into the gory details, it can be shown that the optimal value of μ (in terms of maximal acceptance probability) is λ and that the corresponding probability of acceptance is

$\dfrac{1-\exp\{-\lambda\}}{\lambda}$

which is larger than the probability of the original approach when λ is less than one. As shown by the graph below, this allows for a lower bound in the probability of acceptance that remains tolerable.

## Correlated Poissons

Posted in Statistics with tags , , on March 2, 2011 by xi'an

A graduate student came to see me the other day with a bivariate Poisson distribution and a question about using EM in this framework. The problem boils down to adding one correlation parameter and an extra term in the likelihood

$(1-\rho)^{n_1}(1+\lambda\rho)^{n_2}(1+\mu\rho)^{n_3}(1-\lambda\mu\rho)^{n_4}\quad 0\le\rho\le\min(1,\frac{1}{\lambda\mu})$

Both terms involving sums are easy to deal with, using latent variables as in mixture models. The subtractions are trickier, as the negative parts cannot appear in a conditional distribution. Even though the problem can be handled by a direct numerical maximisation or by an almost standard Metropolis-within-Gibbs sampler, my suggestion regarding EM per se was to proceed by conditional EM, one parameter at a time. For instance, when considering $\rho$ conditional on both Poisson parameters, depending on whether $\lambda\mu>1$ or not, one can consider either

$(1-\theta/\lambda\mu)^{n_1}(1+\theta/\mu)^{n_2}(1+\theta/\lambda)^{n_3}(1-\theta)^{n_4}\quad0<\theta<1$

and turn

$(1-\theta/\lambda\mu) \text{ into } (1-\theta+\theta\{1-\frac{1}{\lambda\mu}\})$

thus producing a Beta-like target function in $\theta$ after completion, or turn

$(1-\lambda\mu\rho) \text{ into } (1-\rho+\{1-\lambda\mu\}\rho)$

to produce a Beta-like target function in $\rho$ after completion. In the end, this is a rather pedestrian exercise and I am still frustrated at missing the trick to handle the subtractions directly, however it was nonetheless a nice question!