## Buffon machines

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , on December 22, 2020 by xi'an

By chance I came across a 2010 paper entitled On Buffon Machines and Numbers by Philippe Flajolet, Maryse Pelletier and Michèle Soria. Which relates to Bernoulli factories, a related riddle, and the recent paper by Luis Mendo I reviewed here. What the authors call a Buffon machine is a device that produces a perfect simulation of a discrete random variable out of a uniform bit generator. Just like (George Louis Leclerc, comte de) Buffon’s needle produces a Bernoulli outcome with success probability π/4. out of a real Uniform over (0,1). Turned into a sequence of Uniform random bits.

“Machines that always halt can only produce Bernoulli distributions whose parameter is a dyadic rational.”

When I first read this sentence it seemed to clash with the earlier riddle, until I realised the later started from a B(p) coin to produce a fair coin, while this paper starts with a fair coin.

The paper hence aims at a version of the Bernoulli factory problem (see Definition 2), although the term is only mentioned at the very end, with the added requirement of simplicity and conciseness translated as a finite expected number of draws and possibly an exponential tail.

It first recalls the (Forsythe–)von Neumann method of generating exponential (and other) variates out of a Uniform generator (see Section IV.2 in Devroye’s generation bible). Expanded into a general algorithm for generating discrete random variables whose pmf’s are related to permutation cardinals,

$\mathbb P(N=n)\propto P_n\lambda^n/n!$

if the Bernoulli generator has probability λ. These include the Poisson and the logarithmic distributions and as a side product Bernoulli distributions with some logarithmic, exponential and trigonometric transforms of λ.

As a side remark, a Bernoulli generator with probability 1/π is derived from Ramanujan identity

$\frac{1}{\pi} = \sum_{n=0}^\infty {2n \choose n}^3 \frac{6n+1}{2^{8n+2}}$

as “a discrete analogue of Buffon’s original. In a neat connection with Mendo’s paper, the authors of this 2010 paper note that Euler’s constant does not appear to be achievable by a Buffon machine.

## simulating hazard

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , , , , , , on May 26, 2020 by xi'an

A rather straightforward X validated question that however leads to an interesting simulation question: when given the hazard function h(·), rather than the probability density f(·), how does one simulate this distribution? Mathematically h(·) identifies the probability distribution as much as f(·),

$1-F(x)=\exp\left\{ \int_{-\infty}^x h(t)\,\text{d}t \right\}=\exp\{H(x)\}$

which means cdf inversion could be implemented in principle. But in practice, assuming the integral is intractable, what would an exact solution look like? Including MCMC versions exploiting one fixed point representation or the other.. Since

$f(x)=h(x)\,\exp\left\{ \int_{-\infty}^x h(t)\,\text{d}t \right\}$

using an unbiased estimator of the exponential term in a pseudo-marginal algorithm would work. And getting an unbiased estimator of the exponential term can be done by Glynn & Rhee debiasing. But this is rather costly… Having Devroye’s book under my nose [at my home desk] should however have driven me earlier to the obvious solution to… simply open it!!! A whole section (VI.2) is indeed dedicated to simulations when the distribution is given by the hazard rate. (Which made me realise this problem is related with PDMPs in that thinning and composition tricks are common to both.) Besides the inversion method, ie X=H⁻¹(U), Devroye suggests thinning a Poisson process when h(·) is bounded by a manageable g(·). Or a generic dynamic thinning approach that converges when h(·) is non-increasing.

## simulating a sum of Uniforms

Posted in Statistics with tags , , , , , , , , , , on May 1, 2020 by xi'an

When considering the distribution of the sum (or average) of N Uniform variates, called either Irwin-Hall for the sum or Bates for the average, simulating the N uniforms then adding them shows a linear cost in N. The density of the resulting variate is well-known,

$f_X(x;N)=\dfrac{1}{2(N-1)!}\sum_{k=0}^N (-1)^k{N \choose k} (x-k)^{N-1}\text{sign}(x-k)$

but similarly is of order N. Furthermore, controlling the terms in the alternating sum may prove delicate, as shown by the R function unifed::dirwin.hall() whose code

for (k in 0:floor(x)) ret1 <- ret1 + (-1)^k * choose(n, k) *
(x - k)^(n - 1)


quickly becomes unreliable (although I managed an easy fix by using logs and a reference value of the magnitude of the terms in the summation). There is however a quick solution provided by [of course!] Devroye (NURVG, Section XIV.3, p.708), using the fact that the characteristic function of the Irwin-Hall distribution [for Uniforms over (-1,1)] is quite straightforward

$\Phi_N(t) = [\sin(t)/t]^N$

which means the density can be bounded from above and results in an algorithm (NURVG, Section XIV.3, p.714) with complexity at most N to the power 5/8, if not clearly spelled out in the book. Obviously, it can be objected that for N large enough, like N=20, the difference between the true distribution and the CLT approximation is quite negligible (reminding me of my early simulating days where generating a Normal was done by averaging a dozen uniforms and properly rescaling!). But this is not an exact approach and the correction proves too costly. As shown by Section XIV.4 on the simulation of sums in NURVG. So… the game is afoot!

## ziggurat algorithm

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on October 30, 2018 by xi'an

A ziggurat (Akkadian: ziqqurat, D-stem of zaqāru “to build on a raised area”) is a type of massive stone structure built in ancient Mesopotamia. It has the form of a terraced compound of successively receding stories or levels. Wikipedia

In a recent arXival, Jalalvand and Charsooghi revisit the ziggurat algorithm that simulates from a univariate distribution by finding horizontal strips that pile up on top of the target as in a ziggurat or a pyramid, hence the name. Which George Marsaglia introduced in 1963. When finely tuned the method is quite efficient. Maybe because it designs an accept-reject move for each strip of the ziggurat rather than globally. For instance, versions constructed for a Normal target are more efficient [3½ times faster] than the Box-Muller algorithm. The generalisation found in the paper divides the target into strips of equal area, rather than dominating rectangular strips of equal area, which requires some work when the target density is non-standard. For targets with unbounded support or unbounded values, a function g transforming the tail into (0,1) has to be selected. A further constraint is that the inverse cdf of the transformed g(X) has to be known. And a large part of the paper examines several scenarii towards simulating from the tail region. For unbounded densities, a similarly minute analysis is undertaken, again with requests about the target like its algebraic order.

“…the result of division of a random integer by its range is a fixed-point number which unlike a floating-point number does not enjoy increased precision near 0. When such random numbers are used in the tail algorithm they cause premature termination of the tail and large gaps between produced random numbers near the termination point.”

The paper further discusses the correction of an error common to earlier ziggurat algorithms, due to the  conversion from fixed-point to floating-point numbers, as indicated in the above quote. Although this had already been addressed by George Marsaglia in the early 1990’s.

“Ziggurat algorithm has a high setup time, so it’s not suitable for applications that require variates with frequently changing shape parameters.”

When testing the algorithm against different methods (in STL and Boost), and different distributions, the gains are between two and seven times faster, except for the Exponential target where the original ziggurat algorithm performs better. Interestingly, the gains (and the computing time) increase with the degrees of freedom for the Gamma target, in relation with Devroye’s (1986) remark on the absence of uniformly bounded execution times for this distribution. Same thing for the Weibull variates, obviously. Reflecting upon the usually costly computation of cdfs and inverse cdfs on machines and software, the inverse cdf method is systematically left behind! In conclusion, a good Sunday morning read if not of direct consequences for MCMC implementation, as warned by the authors.

## infinite mixtures are likely to take a while to simulate

Posted in Books, Statistics with tags , , , , , , , , on February 22, 2018 by xi'an

Another question on X validated got me highly interested for a while, as I had considered myself the problem in the past, until I realised while discussing with Murray Pollock in Warwick that there was no general answer: when a density f is represented as an infinite series decomposition into weighted densities, some weights being negative, is there an efficient way to generate from such a density? One natural approach to the question is to look at the mixture with positive weights, f⁺, since it gives an upper bound on the target density. Simulating from this upper bound f⁺ and accepting the outcome x with probability equal to the negative part over the sum of the positive and negative parts f⁻(x)/f(x) is a valid solution. Except that it is not implementable if

1.  the positive and negative parts both involve infinite sums with no exploitable feature that can turn them into finite sums or closed form functions,
2.  the sum of the positive weights is infinite, which is the case when the series of the weights is not absolutely converging.

Even when the method is implementable it may be arbitrarily inefficient in the sense that the probability of acceptance is equal to to the inverse of the sum of the positive weights and that simulating from the bounding mixture in the regular way uses the original weights which may be unrelated in size with the actual importance of the corresponding components in the actual target. Hence, when expressed in this general form, the problem cannot allow for a generic solution.

Obviously, if more is known about the components of the mixture, as for instance the sequence of weights being alternated, there exist specialised methods, as detailed in the section of series representations in Devroye’s (1985) simulation bible. For instance, in the case when positive and negative weight densities can be paired, in the sense that their weighted difference is positive, a latent index variable can be included. But I cannot think of a generic method where the initial positive and negative components are used for simulation, as it may on the opposite be the case that no finite sum difference is everywhere positive.