Archive for Luc Devroye

infinite mixtures are likely to take a while to simulate

Posted in Books, Statistics with tags , , , , , , , , on February 22, 2018 by xi'an

Another question on X validated got me highly interested for a while, as I had considered myself the problem in the past, until I realised while discussing with Murray Pollock in Warwick that there was no general answer: when a density f is represented as an infinite series decomposition into weighted densities, some weights being negative, is there an efficient way to generate from such a density? One natural approach to the question is to look at the mixture with positive weights, f⁺, since it gives an upper bound on the target density. Simulating from this upper bound f⁺ and accepting the outcome x with probability equal to the negative part over the sum of the positive and negative parts f⁻(x)/f(x) is a valid solution. Except that it is not implementable if

  1.  the positive and negative parts both involve infinite sums with no exploitable feature that can turn them into finite sums or closed form functions,
  2.  the sum of the positive weights is infinite, which is the case when the series of the weights is not absolutely converging.

Even when the method is implementable it may be arbitrarily inefficient in the sense that the probability of acceptance is equal to to the inverse of the sum of the positive weights and that simulating from the bounding mixture in the regular way uses the original weights which may be unrelated in size with the actual importance of the corresponding components in the actual target. Hence, when expressed in this general form, the problem cannot allow for a generic solution.

Obviously, if more is known about the components of the mixture, as for instance the sequence of weights being alternated, there exist specialised methods, as detailed in the section of series representations in Devroye’s (1985) simulation bible. For instance, in the case when positive and negative weight densities can be paired, in the sense that their weighted difference is positive, a latent index variable can be included. But I cannot think of a generic method where the initial positive and negative components are used for simulation, as it may on the opposite be the case that no finite sum difference is everywhere positive.

A of A

Posted in Books, Kids, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on November 30, 2017 by xi'an

Next June, at the same time as the ISBA meeting in Edinburgh, which is slowly taking shape, there will be an Analysis of Algorithms (AofA) meeting in Uppsala (Sweden) with Luc Devroye as the plenary Flajolet Award speaker. The full name of the conference is the 29th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms. While it is unfortunate the two conferences take place at the same time (and not in the same location), this also provides a continuity of conferences with the following week MCqMC in Rennes and the subsequent week summer school in simulation in Warwick (with Art Owen as the LMS Lecturer).

About our summer school, I want to point out that, thanks to several sponsors, we will be able to provide a consequent number of bursaries for junior researchers. This should be an additional incentive for attendees of the previous week Young Bayesian meeting (BAYSM) to remain the extra days nearby Warwick and attend this fantastic opportunity. Other instructors are Nicolas Chopin, Mark Huber and Jeff Rosenthal!

an elegant result on exponential spacings

Posted in Statistics with tags , , , , , , , , , , , , , on April 19, 2017 by xi'an

A question on X validated I spotted in the train back from Lyon got me desperately seeking a reference in Devroye’s Generation Bible despite the abyssal wireless and a group of screeching urchins a few seats away from me… The question is about why

\sum_{i=1}^{n}(Y_i - Y_{(1)}) \sim \text{Gamma}(n-1, 1)

when the Y’s are standard exponentials. Since this reminded me immediately of exponential spacings, thanks to our Devroye fan-club reading group in Warwick,  I tried to download Devroye’s Chapter V and managed after a few aborts (and a significant increase in decibels from the family corner). The result by Sukhatme (1937) is in plain sight as Theorem 2.3 and is quite elegant as it relies on the fact that

\sum_{i=1}^n y_i=\sum_{j=1}^n (n-j+1)(y_{(j)}-y_{(j-1)})=\sum_{j=2}^n (y_{(j)}-y_{(1)})

hence sums up as a mere linear change of variables! (Pandurang Vasudeo Sukhatme (1911–1997) was an Indian statistician who worked on human nutrition and got the Guy Medal of the RSS in 1963.)

complexity of the von Neumann algorithm

Posted in Statistics with tags , , , , , , , , , on April 3, 2017 by xi'an

“Without the possibility of computing infimum and supremum of the density f over compact subintervals of the domain of f, sampling absolutely continuous distribution using the rejection method seems to be impossible in total generality.”

The von Neumann algorithm is another name for the rejection method introduced by von Neumann circa 1951. It was thus most exciting to spot a paper by Luc Devroye and Claude Gravel appearing in the latest Statistics and Computing. Assessing the method in terms of random bits and precision. Specifically, assuming that the only available random generator is one of random bits, which necessarily leads to an approximation when the target is a continuous density. The authors first propose a bisection algorithm for distributions defined on a compact interval, which compares random bits with recursive bisections of the unit interval and stops when the interval is small enough.

In higher dimension, for densities f over the unit hypercube, they recall that the original algorithm consisted in simulating uniforms x and u over the hypercube and [0,1], using the uniform as the proposal distribution and comparing the density at x, f(x), with the rescaled uniform. When using only random bits, the proposed method is based on a quadtree that subdivides the unit hypercube into smaller and smaller hypercubes until the selected hypercube is entirely above or below the density. And is small enough for the desired precision. This obviously requires for the computation of the upper and lower bound of the density over the hypercubes to be feasible, with Devroye and Gravel considering that this is a necessary property as shown by the above quote. Densities with non-compact support can be re-expressed as densities on the unit hypercube thanks to the cdf transform. (Actually, this is equivalent to the general accept-reject algorithm, based on the associated proposal.)

“With the oracles introduced in our modification of von Neumann’s method, we believe that it is impossible to design a rejection algorithm for densities that are not Riemann-integrable, so the question of the design of a universally valid rejection algorithm under the random bit model remains open.”

In conclusion, I enjoyed very much reading this paper, especially the reflection it proposes on the connection between Riemann integrability and rejection algorithms. (Actually, I cannot think straight away of a simulation algorithm that would handle non-Riemann-integrable densities, apart from nested sampling. Or of significant non-Riemann-integrable densities.)

pitfalls of nested Monte Carlo

Posted in Books, pictures, Statistics, University life with tags , , , , , on December 19, 2016 by xi'an

Cockatoo Island, Sydney Harbour, July 15, 2012A few days ago, Tom Rainforth, Robert Cornish, Hongseok Yang, and Frank Wood from Oxford have arXived a paper on the limitations of nested Monte Carlo. By nested Monte Carlo [not nested sampling], they mean Monte Carlo techniques used to evaluate the expectation of a non-linear transform of an expectation, which often call for plug-in resolution. The main result is that this expectation cannot be evaluated by an unbiased estimator. Which is only mildly surprising. I do wonder if there still exist series solutions à la Glynn and Rhee, as in the Russian roulette version. Which is mentioned in a footnote. Or specially tuned versions, as suggested by some techniques found in Devroye’s book where the expectation of the exponential of another expectation is considered… (The paper is quite short, which may be correlated with the format imposed by some machine-learning conference proceedings like AISTATS.)

ratio-of-uniforms [-1]

Posted in Books, pictures, R, Statistics, University life with tags , , , on December 12, 2016 by xi'an

Luca Martino pointed out to me my own and forgotten review of a 2012 paper of his, “On the Generalized Ratio of Uniforms as a Combination of Transformed Rejection and Extended Inverse of Density Sampling” that obviously discusses a generalised version of Kinderman and Monahan’s (1977) ratio-of-uniform method. And further points out the earlier 1991 paper by Jon Wakefield, Alan Gelfand and Adrian Smith that contains the general form I rediscovered a few posts ago… Called the GRoU in Martino et al.. While the generalisation in the massive arXiv document is in finding Φ such that the above region is bounded and can be explored by uniform sampling over a box.

Neither reference mentions using the cdf transform, though, which does guarantee a bounded ratio-of-uniform set in u. (An apparent contradiction with Martino et al.  statement (34), unless I am confused. Maybe due to using Φ⁻¹ instead of Φ?) But I still wonder at the usefulness of my derivations those past weeks!

ratio-of-uniforms [#4]

Posted in Books, pictures, R, Statistics, University life with tags , , , , on December 2, 2016 by xi'an

Possibly the last post on random number generation by Kinderman and Monahan’s (1977) ratio-of-uniform method. After fiddling with the Gamma(a,1) distribution when a<1 for a while, I indeed figured out a way to produce a bounded set with this method: considering an arbitrary cdf Φ with corresponding pdf φ, the uniform distribution on the set Λ of (u,v)’s in R⁺xX such that


induces the distribution with density proportional to ƒ on φοΦ⁻¹(U)V. This set Λ has a boundary that is parameterised as

u=Φοƒ(x),  v=1/φοƒ(x), x∈Χ

which remains bounded in u since Φ is a cdf and in v if φ has fat enough tails. At both 0 and ∞. When ƒ is the Gamma(a,1) density this can be achieved if φ behaves like log(x)² near zero and like a inverse power at infinity. Without getting into all the gory details, closed form density φ and cdf Φ can be constructed for all a’s, as shown for a=½ by the boundaries in u and v (yellow) below

bundawhich leads to a bounded associated set Λ

ratgamp5At this stage, I remain uncertain of the relevance of such derivations, if only because the set A thus derived is ill-suited for uniform draws proposed on the enclosing square box. And also because a Gamma(a,1) simulation can rather simply be derived from a Gamma(a+1,1) simulation. But, who knows?!, there may be alternative usages of this representation, such as innovative slice samplers. Which means the ratio-of-uniform method may reappear on the ‘Og one of those days…