## Archive for accept-reject algorithm

Posted in Books, Statistics, University life with tags , , , , , , , , , on June 2, 2020 by xi'an

“There are three common approaches for selecting the tolerance sequence (…) [they] can lead to inefficient sampling”

Umberto Simola, Jessi Cisewski-Kehe, Michael Gutmann and Jukka Corander recently arXived a paper entitled Adaptive Approximate Bayesian Computation Tolerance Selection. I appreciate that they start from our ABC-PMC paper, i.e., Beaumont et al. (2009) [although the representation that the ABC tolerances are fixed in advance is somewhat incorrect in that we used in our codes quantiles of the distances to set our tolerances.] This is also the approach advocated for the initialisation step by the current paper.  Although remaining a wee bit vague. Subsequent steps are based on the proximity between the resulting approximations to the ABC posteriors, more exactly with a quantile derived from the maximum of the ratio between two estimated successive ABC posteriors. Mimicking the Accept-Reject step if always one step too late.  The iteration stops when the ratio is almost one, possibly missing the target due to Monte Carlo variability. (Recall that the “optimal” tolerance is not zero for a finite sample size.)

“…the decrease in the acceptance rate is mitigated by the improvement in the proposed particles.”

A problem is that it depends on the form of the approximation and requires non-parametric hence imprecise steps. Maybe variational encoders could help. Interesting approach by Sugiyama et al. (2012), of which I knew nothing, the core idea being that the ratio of two densities is also the solution to minimising a distance between the numerator density and a variable function times the bottom density. However since only the maximum of the ratio is needed, a more focused approach could be devised. Rather than first approximating the ratio and second maximising the estimated ratio. Maybe the solution of Goffinet et al. (1992) on estimating an accept-reject constant could work.

A further comment is that the estimated density is not properly normalised, which lessens the Accept-Reject analogy since the optimum may well stand above one. And thus stop “too soon”. (Incidentally, the paper contains the mixture example of Sisson et al. (2007), for which our own graphs were strongly criticised during our Biometrika submission!)

## ensemble rejection sampling

Posted in Statistics with tags , , , on March 25, 2020 by xi'an

George Deligiannidis, Arnaud Doucet and Sylvain Rubenthaler have constructed a form of Rao-Blackwellised estimate out of a regular rejection sampler. Doubly surprisingly as turning importance sampling into regular sampling plus  gaining over the standard accept-reject estimate. They call their approach ensemble rejection sampling. This is done by seeing the N-sample created from the proposal as an importance sampler, exploiting the importance weights towards estimating the (intractable) normalising constant of the target density, and creating an upper bound on this estimate Ẑ. That depends on the current value X from the N-sample under consideration for acceptance as

Z⁺=Ẑ+{max(w)-w(X)}/N

with a probability Ẑ/Z⁺ to accept X. The amazing result is that the X thus marginaly produced is distributed from the target! Meaning that this is a case for a self-normalised importance sampling distribution producing an exact simulation from the target. While this cannot produce an iid sample, it can be exploited to produce unbiased estimators of expectations under the target. Without even resampling and at a linear cost in the sample size N.

The method can be extended to the dynamic (state-space) case. At a cost of O(N²T) as first observed by Radford Neal. However, the importance sample seems to be distributed from a product of proposals that do not account for the previous particles. But maybe accounting for the observations. While the result involves upper bounds on the dynamic importance weights, the capacity to deliver exact simulations remains a major achievement, in my opinion.

## Why do we draw parameters to draw from a marginal distribution that does not contain the parameters?

Posted in Statistics with tags , , , , , , , on November 3, 2019 by xi'an

A revealing question on X validated of a simulation concept students (and others) have trouble gripping with. Namely using auxiliary variates to simulate from a marginal distribution, since these auxiliary variables are later dismissed and hence appear to them (students) of no use at all. Even after being exposed to the accept-reject algorithm. Or to multiple importance sampling. In the sense that a realisation of a random variable can be associated with a whole series of densities in an importance weight, all of them being valid (but some more equal than others!).

## an independent sampler that maximizes the acceptance rate of the MH algorithm

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , on September 3, 2019 by xi'an

An ICLR 2019 paper by Neklyudov, Egorov and Vetrov on an optimal choice of the proposal in an independent Metropolis algorithm I discovered via an X validated question. Namely whether or not the expected Metropolis-Hastings acceptance ratio is always one (which it is not when the support of the proposal is restricted). The paper mentions the domination of the Accept-Reject algorithm by the associated independent Metropolis-Hastings algorithm, which has actually been stated in our Monte Carlo Statistical Methods (1999, Lemma 6.3.2) and may prove even older. The authors also note that the expected acceptance probability is equal to one minus the total variation distance between the joint defined as target x Metropolis-Hastings proposal distribution and its time-reversed version. Which seems to suffer from the same difficulty as the one mentioned in the X validated question. Namely that it only holds when the support of the Metropolis-Hastings proposal is at least the support of the target (or else when the support of the joint defined as target x Metropolis-Hastings proposal distribution is somewhat symmetric. Replacing total variation with Kullback-Leibler then leads to a manageable optimisation target if the proposal is a parameterised independent distribution. With a GAN version when the proposal is not explicitly available. I find it rather strange that one still seeks independent proposals for running Metropolis-Hastings algorithms as the result will depend on the family of proposals considered and as performances will deteriorate with dimension (the authors mention a 10% acceptance rate, which sounds quite low). [As an aside, ICLR 2020 will take part in Addis Abeba next April.]

## simulation fodder for future exams

Posted in Books, Kids, R, Statistics with tags , , , , on February 20, 2019 by xi'an

Here are two nice exercises for a future simulation exam, seen and solved on X validated.The first one is about simulating a Gibbs sampler associated with the joint target

exp{-|x|-|y|-a|y-x|}

defined over IR² for a≥0 (or possibly a>-1). The conditionals are identical and non-standard, but a simple bound on the conditional density is the corresponding standard double exponential density, which makes for a straightforward accept-reject implementation. However it is also feasible to break the full conditional into three parts, depending on the respective positions of x, y, and 0, and to obtain easily invertible cdfs on the three intervals.The second exercise is about simulating from the cdf

$F(x)=1-\exp\{-ax-bx^{p+1}/(p+1)\}$

which can be numerically inverted. It is however more fun to call for an accept-reject algorithm by bounding the density with a ½ ½ mixture of an Exponential Exp(a) and of the 1/(p+1)-th power of an Exponential Exp(b/(p+1)). Since no extra constant appears in the solution,  I suspect the (p+1) in b/(p+1) was introduced on purpose. As seen in the above fit for 10⁶ simulations (and a=1,b=2,p=3), there is no deviation from the target! There is nonetheless an even simpler and rather elegant resolution to the exercise: since the tail function (1-F(x)) appears as the product of two tail functions, exp(-ax) and the other one, the cdf is the distribution of the minimum of two random variates, one with the Exp(a) distribution and the other one being the 1/(p+1)-th power of an Exponential Exp(b/(p+1)) distribution. Which of course returns a very similar histogram fit: