Archive for sampling from an atomic population

an elegant sampler

Posted in Books, Kids, R, University life with tags , , , , , , , on January 15, 2020 by xi'an

Following an X validated question on how to simulate a multinomial with fixed average, W. Huber produced a highly elegant and efficient resolution with the compact R code

tabulate(sample.int((k-1)*n, s-n) %% n + 1, n) + 1

where k is the number of classes, n the number of draws, and s equal to n times the fixed average. The R function sample.int is an alternative to sample that seems faster. Breaking the outcome of

sample.int((k-1)*n, s-n)

as nonzero positions in an n x (k-1) matrix and adding a adding a row of n 1’s leads to a simulation of integers between 1 and k by counting the 1’s in each of the n columns, which is the meaning of the above picture. Where the colour code is added after counting the number of 1’s. Since there are s 1’s in this matrix, the sum is automatically equal to s. Since the s-n positions are chosen uniformly over the n x (k-1) locations, the outcome is uniform. The rest of the R code is a brutally efficient way to translate the idea into a function. (By comparison, I brute-forced the question by suggesting a basic Metropolis algorithm.)

fine-sliced Poisson [a.k.a. sashimi]

Posted in Books, Kids, pictures, R, Running, Statistics, University life with tags , , , , , , , , , on March 20, 2014 by xi'an

As my student Kévin Guimard had not mailed me his own Poisson slice sampler of a Poisson distribution, I could not tell why the code was not working! My earlier post prompted him to do so and a somewhat optimised version is given below:

nsim = 10^4
lambda = 6

max.factorial = function(x,u){
        k = x
        parf=1
        while (parf*u<1){
          k = k + 1
          parf = parf * k
          }
        k = k - (parf*u>1)
        return (k)
        }

x = rep(floor(lambda), nsim)
for (t in 2:nsim){
        v1 = ceiling((log(runif(1))/log(lambda))+x[t-1])
        ranj=max(0,v1):max.factorial(x[t-1],runif(1))
        x[t]=sample(ranj,size=1)
        }
barplot(as.vector(rbind(
   table(x)/length(x),dpois(min(x):max(x),
   lambda))),col=c("sienna","gold"))

As you can easily check by running the code, it does not work. My student actually majored my MCMC class and he spent quite a while pondering why the code was not working. I did ponder as well for a part of a morning in Warwick, removing causes for exponential or factorial overflows (hence the shape of the code), but not eliciting the issue… (This now sounds like lethal fugu sashimi! ) Before reading any further, can you spot the problem?!

The corrected R code is as follows:

x = rep(lambda, nsim)
for (t in 2:nsim){
        v1=ceiling((log(runif(1))/log(lambda))+x[t-1])
        ranj=max(0,v1):max.factorial(x[t-1],runif(1))
        if (length(ranj)>1){
          x[t] = sample(ranj, size = 1)
          }else{
                x[t]=ranj}
 }

The culprit is thus the R function sample which simply does not understand Dirac masses and the basics of probability! When running

> sample(150:150,1)
[1] 23

you can clearly see where the problem stands…! Well-documented issue with sample that already caused me woes… Another interesting thing about this slice sampler is that it is awfully slow in exploring the tails. And to converge to the centre from the tails. This is not very pronounced in the above graph with a mean of 6. Moving to 50 makes it more apparent:

slisson5This is due to the poor mixing of the chain, as shown by the raw sequence below, which strives to achieve a single cycle out of 10⁵ iterations! In any case, thanks to Kévin for an interesting morning!

slisson4

%d bloggers like this: