## ARS: when to update?

Posted in Books, Kids, Statistics, University life with tags , , , , , on May 25, 2017 by xi'an

An email I got today from Heng Zhou wondered about the validity of the above form of the ARS algorithm. As printed in our book Monte Carlo Statistical Methods. The worry is that in the original version of the algorithm the envelope of the log-concave target f(.) is only updated for rejected values. My reply to the question is that there is no difference in the versions towards returning a value simulated from f, since changing the envelope between simulations does not modify the accept-reject nature of the algorithm. There is no issue of dependence between the simulations of this adaptive accept-reject method, all simulations remain independent. The question is rather one about efficiency, namely does it pay to update the envelope(s) when accepting a new value and I think it does because the costly part is the computation of f(x), rather than the call to the piecewise-exponential envelope. Correct me if I am wrong!

## complexity of the von Neumann algorithm

Posted in Statistics with tags , , , , , , , , , on April 3, 2017 by xi'an

“Without the possibility of computing infimum and supremum of the density f over compact subintervals of the domain of f, sampling absolutely continuous distribution using the rejection method seems to be impossible in total generality.”

The von Neumann algorithm is another name for the rejection method introduced by von Neumann circa 1951. It was thus most exciting to spot a paper by Luc Devroye and Claude Gravel appearing in the latest Statistics and Computing. Assessing the method in terms of random bits and precision. Specifically, assuming that the only available random generator is one of random bits, which necessarily leads to an approximation when the target is a continuous density. The authors first propose a bisection algorithm for distributions defined on a compact interval, which compares random bits with recursive bisections of the unit interval and stops when the interval is small enough.

In higher dimension, for densities f over the unit hypercube, they recall that the original algorithm consisted in simulating uniforms x and u over the hypercube and [0,1], using the uniform as the proposal distribution and comparing the density at x, f(x), with the rescaled uniform. When using only random bits, the proposed method is based on a quadtree that subdivides the unit hypercube into smaller and smaller hypercubes until the selected hypercube is entirely above or below the density. And is small enough for the desired precision. This obviously requires for the computation of the upper and lower bound of the density over the hypercubes to be feasible, with Devroye and Gravel considering that this is a necessary property as shown by the above quote. Densities with non-compact support can be re-expressed as densities on the unit hypercube thanks to the cdf transform. (Actually, this is equivalent to the general accept-reject algorithm, based on the associated proposal.)

“With the oracles introduced in our modification of von Neumann’s method, we believe that it is impossible to design a rejection algorithm for densities that are not Riemann-integrable, so the question of the design of a universally valid rejection algorithm under the random bit model remains open.”

In conclusion, I enjoyed very much reading this paper, especially the reflection it proposes on the connection between Riemann integrability and rejection algorithms. (Actually, I cannot think straight away of a simulation algorithm that would handle non-Riemann-integrable densities, apart from nested sampling. Or of significant non-Riemann-integrable densities.)

## Sobol’s Monte Carlo

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on December 10, 2016 by xi'an

The name of Ilya Sobol is familiar to researchers in quasi-Monte Carlo methods for his Sobol’s sequences. I was thus surprised to find in my office a small book entitled The Monte Carlo Method by this author, which is a translation of his 1968 book in Russian. I have no idea how it reached my office and I went to check with the library of Paris-Dauphine around the corner [of my corridor] whether it had been lost: apparently, the library got rid of it among a collection of old books… Now, having read through this 67 pages book (or booklet as Sobol puts it) makes me somewhat agree with the librarians, in that there is nothing of major relevance in this short introduction. It is quite interesting to go through the book and see the basics of simulation principles and Monte Carlo techniques unfolding, from the inverse cdf principle [established by a rather convoluted proof] to importance sampling, but the amount of information is about equivalent to the Wikipedia entry on the topic. From an historical perspective, it is also captivating to see the efforts to connect physical random generators (such as those based on vacuum tube noise) to shift-register pseudo-random generators created by Sobol in 1958. On a Soviet Strela computer.

While Googling the title of that book could not provide any connection, I found out that a 1994 version had been published under the title of A Primer for the Monte Carlo Method, which is mostly the same as my version, except for a few additional sections on pseudo-random generation, from the congruential method (with a FORTRAN code) to the accept-reject method being then called von Neumann’s instead of Neyman’s, to the notion of constructive dimension of a simulation technique, which amounts to demarginalisation, to quasi-Monte Carlo [for three pages]. A funny side note is that the author notes in the preface that the first translation [now in my office] was published without his permission!

## Poisson process model for Monte Carlo methods

Posted in Books with tags , , , , , , , on February 25, 2016 by xi'an

“Taken together this view of Monte Carlo simulation as a maximization problem is a promising direction, because it connects Monte Carlo research with the literature on optimization.”

Chris Maddison arXived today a paper on the use of Poisson processes in Monte Carlo simulation. based on the so-called Gumbel-max trick, which amounts to add to the log-probabilities log p(i) of the discrete target, iid Gumbel variables, and to take the argmax as the result of the simulation. A neat trick as it does not require the probability distribution to be normalised. And as indicated in the above quote to relate simulation and optimisation. The generalisation considered here replaces the iid Gumbel variates by a Gumbel process, which is constructed as an “exponential race”, i.e., a Poisson process with an exponential auxiliary variable. The underlying variates can be generated from a substitute density, à la accept-reject, which means this alternative bounds the true target.  As illustrated in the plot above.

The paper discusses two implementations of the principle found in an earlier NIPS 2014 paper [paper that contains most of the novelty about this method], one that refines the partition and the associated choice of proposals, and another one that exploits a branch-and-bound tree structure to optimise the Gumbel process. With apparently higher performances. Overall, I wonder at the applicability of the approach because of the accept-reject structure: it seems unlikely to apply to high dimensional problems.

While this is quite exciting, I find it surprising that this paper completely omits references to Brian Ripley’s considerable input on simulation and point processes. As well as the relevant Geyer and Møller (1994). (I am obviously extremely pleased to see that our 2004 paper with George Casella and Marty Wells is quoted there. We had written this paper in Cornell, a few years earlier, right after the 1999 JSM in Baltimore, but it has hardly been mentioned since then!)

## difference between Metropolis, Gibbs, importance, and rejection sampling

Posted in Books, Kids, Statistics with tags , , , , , on December 14, 2015 by xi'an

Last week, while I was preparing my talk for the NIPS workshop, I spotted this fairly generic question on X validated. And decided to procrastinate by answering through generic comments on the pros and cons of each method. This is a challenging if probably empty question as it lacks a measure of evaluation for those different approaches.  And this is another reason why I replied, in that it relates to my pondering the a-statistical nature of simulation-based approximation methods. Also called probabilistic numerics, not statistical numerics, eh! It is indeed close to impossible to compare such approaches and others on a general basis. For instance, the comparative analysis greatly differs when dealing with a once-in-a-lifetime problem and with an everyday issue, e.g. when building a package for a sufficiently standard model. In the former case, a quick-and-dirty off-the-shelf solution is recommended, while in the latter, designing an efficient and fine-tuned approach makes sense. (The pros and cons I discussed in my X validated answer thus do not apply in most settings!) If anything, using several approaches, whenever possible, is the best advice to give. If not on the targeted problem, at least on a toy or simulated version, to check for performances of those different tools. But this brings back the issue of cost and time… An endless garden of forking paths, one would say [in another setting].

## adaptive rejection sampling with fixed number of nodes

Posted in Books, Statistics, University life with tags , , , , , on October 8, 2015 by xi'an

This paper was arXived today by Martino and Louzada. It starts from the adaptive rejection sampling algorithm of Gilks and Wild (1993), which provided an almost automatic random generator for univariate log-concave target probability densities. By creating an envelope of the true target based on convexity. This algorithm was at the core of BUGS in its early days. And makes a good example of accept reject algorithm that I often use in class. It is however not so popular nowadays because of the unidimensional constraint. And because it is not particularly adequate for simulating a single realisation from a given target. As is the case when used in a Gibbs sampler. The authors only consider here the issue of the growing cost of running the adaptive rejection sampling algorithm, which is due to the update of the envelope at each rejection. They however fail to quantify this increase, which involves (a) finding the interval where the current rejection lies, among n existing intervals, which is of order O(n), and (b) creating both modified envelopes on both new intervals, which is of order O(1). The proposal of the authors, called cheap adaptive rejection sampling, settles for a fixed number τ of support points (and intervals), solely swapping a rejected point with the closest support point when this improves the acceptance rate. The test for swapping involves first computing the alternative target (on a pair of intervals) and the corresponding normalising constant, while swapping the rejected point with the closest support point involves finding the closest point, which is of order O(log τ). I am rather surprised that the authors do not even mention the execution time orders, resorting instead to a simulation where the gain brought by their proposal is not overwhelming. There is also an additional cost for figuring out the fixed number τ of support points, another issue not mentioned in the paper.

## an email exchange about integral representations

Posted in Books, R, Statistics, University life with tags , , , , on April 8, 2015 by xi'an

I had an interesting email exchange [or rather exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R in the past days, as he had difficulties with the validation of the accept-reject algorithm via the integral

$\mathbb{P}(Y\in \mathcal{A},U\le f(Y)/Mg(Y)) = \int_\mathcal{A} \int_0^{f(y)/Mg(y)}\,\text{d}u\,g(y)\,\text{d}y\,,$

in that it took me several iterations [as shown in the above] to realise the issue was with the notation

$\int_0^a \,\text{d}u\,,$

which seemed to be missing a density term or, in other words, be different from

$\int_0^1 \,\mathbb{I}_{(0,a)}(u)\,\text{d}u\,,$

What is surprising for me is that the integral

$\int_0^a \,\text{d}u$

has a clear meaning as a Riemann integral, hence should be more intuitive….