**T**his paper was arXived today by Martino and Louzada. It starts from the adaptive rejection sampling algorithm of Gilks and Wild (1993), which provided an almost automatic random generator for univariate log-concave target probability densities. By creating an envelope of the true target based on convexity. This algorithm was at the core of BUGS in its early days. And makes a good example of accept reject algorithm that I often use in class. It is however not so popular nowadays because of the unidimensional constraint. And because it is not particularly adequate for simulating a *single* realisation from a given target. As is the case when used in a Gibbs sampler. The authors only consider here the issue of the growing cost of running the adaptive rejection sampling algorithm, which is due to the update of the envelope at each rejection. They however fail to quantify this increase, which involves (a) finding the interval where the current rejection lies, among n existing intervals, which is of order O(n), and (b) creating both modified envelopes on both new intervals, which is of order O(1). The proposal of the authors, called cheap adaptive rejection sampling, settles for a fixed number τ of support points (and intervals), solely swapping a rejected point with the closest support point when this improves the acceptance rate. The test for swapping involves first computing the alternative target (on a pair of intervals) and the corresponding normalising constant, while swapping the rejected point with the closest support point involves finding the closest point, which is of order O(log τ). I am rather surprised that the authors do not even mention the execution time orders, resorting instead to a simulation where the gain brought by their proposal is not overwhelming. There is also an additional cost for figuring out the fixed number τ of support points, another issue not mentioned in the paper.

## Archive for accept-reject algorithm

## adaptive rejection sampling with fixed number of nodes

Posted in Books, Statistics, University life with tags accept-reject algorithm, ARS, BUGS, Gibbs sampler, Monte Carlo Statistical Methods, Wally Gilks on October 8, 2015 by xi'an## an email exchange about integral representations

Posted in Books, R, Statistics, University life with tags accept-reject algorithm, George Casella, Introducing Monte Carlo Methods with R, Lebesgue integration, Riemann integration on April 8, 2015 by xi'an**I** had an interesting email exchange [or rather exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R in the past days, as he had difficulties with the validation of the accept-reject algorithm via the integral

in that it took me several iterations [as shown in the above] to realise the issue was with the notation

which seemed to be missing a density term or, in other words, be different from

What is surprising for me is that the integral

has a clear meaning as a Riemann integral, hence should be more intuitive….

## recycling accept-reject rejections (#2)

Posted in R, Statistics, University life with tags accept-reject algorithm, compiler, Data augmentation, Gibbs sampling, MCMC, Monte Carlo Statistical Methods, Student's t distribution on July 2, 2014 by xi'an**F**ollowing yesterday’s post on Rao’s, Liu’s, and Dunson’s paper on a new approach to intractable normalising constants, and taking advantage of being in Warwick, I tested the method on a toy model, namely the posterior associated with n Student’s t observations with unknown location parameter μ and a flat prior,

which is “naturally” bounded by a Cauchy density with scale √ν. The constant M is then easily derived and running the new algorithm follows from a normal random walk proposal targeting the augmented likelihood (R code below).

**A**s shown by the above graph, the completion-by-rejection scheme produces a similar outcome (tomato) as the one based on the sole observations (steelblue). With a similar acceptance rate. However, the computing time is much much degraded:

> system.time(g8()) user system elapsed 53.751 0.056 54.103 > system.time(g9()) user system elapsed 1.156 0.000 1.161

when compared with the no-completion version. Here is the entire R code that produced both MCMC samples: Continue reading

## recycling accept-reject rejections

Posted in Statistics, University life with tags accept-reject algorithm, arXiv, auxiliary variable, Data augmentation, George Casella, intractable likelihood, Monte Carlo Statistical Methods, Rao-Blackwellisation, recycling, untractable normalizing constant on July 1, 2014 by xi'an**V**inayak Rao, Lizhen Lin and David Dunson just arXived a paper which proposes anew technique to handle intractable normalising constants. And which exact title is Data augmentation for models based on rejection sampling. (Paper that I read in the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (*aka* likelihood) satisfies

where all terms but p are known in closed form, then completion by the rejected values of an hypothetical accept-reject algorithm−hypothetical in the sense that the data does not have to be produced by an accept-reject scheme but simply the above domination condition to hold−allows for a data augmentation scheme. Without requiring the missing normalising constant. Since the completed likelihood is

A closed-form, if not necessarily congenial, function.

**N**ow this is quite a different use of the “rejected values” from the accept reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which, still, could have been mentioned there… Or Section 4.2 of Monte Carlo Statistical Methods. Rather than re-deriving the joint density of the augmented sample, “accepted+rejected”.)

**I**t is a neat idea in that it completely bypasses the approximation of the normalising constant. And avoids the somewhat delicate tuning of the auxiliary solution of Moller et al. (2006) The difficulty with this algorithm is however in finding an upper bound M on the unnormalised density f that is

- in closed form;
- with a manageable and tight enough “constant” M;
- compatible with running a posterior simulation conditional on the added rejections.

The paper seems to assume further that the bound M is independent from the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not in the least necessary for the validation of the formal algorithm. Such a constraint would pull M higher, hence reducing the efficiency of the method. Actually the matrix Langevin distribution considered in the first example involves a bound that depends on the parameter κ.

**T**he paper includes a result (Theorem 2) on the uniform ergodicity that relies on heavy assumptions on the proposal distribution. And a rather surprising one, namely that the probability of *rejection* is bounded from below, i.e. calling for a *less* efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of *acceptance* is bounded from below since, then, the event when no rejection occurs constitutes an atom from the augmented Markov chain viewpoint. There therefore occurs a renewal each time the rejected variable set ϒ is empty, and ergodicity ensues (Robert, 1995, *Statistical Science*).

**N**ote also that, despite the opposition raised by the authors, the method *per se* does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.

*Maybe some further experimental evidence tomorrow…*

## R finals

Posted in R, Statistics, University life with tags accept-reject algorithm, bootstrap, India, ISBA, ks.test(), normalising constant, R exam, Université Paris-Sud, Varanasi on January 31, 2013 by xi'an**O**n the morning I returned from Varanasi and the ISBA meeting there, I had to give my R final exam (along with three of my colleagues in Paris-Dauphine). This year, the R course was completely in English, exam included, which means I can post it here as it may attract more interest than the French examens of past years…

**I** just completed grading my 32 copies, all from exam A, which takes a while as I have to check (and sometimes recover) the R code, and often to correct the obvious mistakes to see if the deeper understanding of the concepts is there. This year student cohort is surprisingly homogeneous: I did not spot any of the horrors I may have mentioned in previous posts.

**I** must alas acknowledge a grievous typo in the version of Exam B that was used the day of the final: cutting-and-pasting from A to B, I forgot to change the parameters in Exercise 2, asking them to simulate a Gamma(0,1). It is only after half an hour that a bright student pointed out the impossibility… We had tested the exams prior to printing them but this somehow escaped the four of us!

**N**ow, as I was entering my grades into the global spreadsheet, I noticed a perfect… lack of correlation between those and the grades at the midterm exam. I wonder what that means: I could be grading at random, the levels in November and in January could be uncorrelated, some students could have cheated in November and others in January, student’s names or file names got mixed up, …? A rather surprising outcome!

## optimising accept-reject

Posted in R, Statistics, University life with tags accept-reject algorithm, loops, Poisson distribution, R, R course, SAS, system.time on November 21, 2012 by xi'an**I** spotted on R-bloggers a post discussing optimising the efficiency of programming accept-reject algorithms. While it is about SAS programming, and apparently supported by the SAS company, there are two interesting features with this discussion. The first one is about avoiding the dreaded loop in accept-reject algorithms. For instance, taking the case of the truncated-at-one Poisson distribution, the code

rtpois=function(n,lambda){ sampl=c() while (length(sampl)<n){ x=rpois(1,lambda) if (x!=0) sampl=c(sampl,x)} return(sampl) }

is favoured by my R course students but highly inefficient:

> system.time(rtpois(10^5,.5)) user system elapsed 61.600 27.781 98.727

both for the stepwise increase in the size of the vector and for the loop. For instance, defining the vector sampl first requires a tenth of the above time (note the switch from 10⁵ to 10⁶):

> system.time(rtpois(10^6,.5)) user system elapsed 54.155 0.200 62.635

**A**s discussed by the author of the post, a more efficient programming should aim at avoiding the loop by predicting the number of proposals necessary to accept a given number of values. Since the bound M used in accept-reject algorithms is also the expected number of attempts for one acceptance, one should start with something around Mn proposed values. (Assuming of course all densities are properly normalised.) For instance, in the case of the truncated-at-one Poisson distribution based on proposals from the regular Poisson, the bound is 1/1-e^{-λ}. A first implementation of this principle is to build the sample via a few loops:

rtpois=function(n,lambda){ propal=rpois(ceiling(n/(1-exp(-lambda))),lambda) propal=propal[propal>0] n0=length(propal) if (n0>=n) return(propal[1:n]) else return(c(propal,rtpois(n-n0,lambda))) }

with a higher efficiency:

> system.time(rtpois(10^6,.5)) user system elapsed 0.816 0.092 0.928

Replacing the expectation with an upper bound using the variance of the negative binomial distribution does not make a significant dent in the computing time

rtpois=function(n,lambda){ M=1/(1-exp(-lambda)) propal=rpois(ceiling(M*(n+2*sqrt(n/M)/(M-1))),lambda) propal=propal[propal>0] n0=length(propal) if (n0>=n) return(propal[1:n]) else return(c(propal,rtpois(n-n0,lambda)))}

since we get

> system.time(rtpois(10^6,.5)) user system elapsed 0.824 0.048 0.877

**T**he second point about this Poisson example is that simulating a distribution with a restricted support using another distribution with a larger support is quite inefficient. Especially when λ goes to zero By comparison, using a Poisson proposal with parameter μ and translating it by 1 may bring a considerable improvement: without getting into the gory details, it can be shown that the optimal value of μ (in terms of maximal acceptance probability) is λ and that the corresponding probability of acceptance is

which is larger than the probability of the original approach when λ is less than one. As shown by the graph below, this allows for a lower bound in the probability of acceptance that remains tolerable.