## A discrete Bernoulli factory

Posted in Books, Kids, Statistics with tags , , , , , , on October 18, 2021 by xi'an

A rather confusing (and now closed) question on X validated contained an interesting challenge of simulating an arbitrary discrete distribution using a single (standard) dice. It indeed made me think of the (more challenging) Bernoulli factory problem of simulating B(f(p)) using a B(p) simulator (with p unknown). I still do not see what the optimal solution is but the core challenge is to avoid simulating U(0,1) variate by exploiting the discrete nature of the target. Which may be an issue if the probabilities of the target are irrational and one is considering the cdf inversion approach. An alternative is to use an accept-reject approach, which also works for discrete distributions, by first deriving an instrumental distribution on the discrete support of the target from dice rolls, second finding the maximum of the ratio instrument to target, and third devising a discrete approach to selecting a generation with a probability taking a finite number of values. Which may prove quite costly. Finally, the least debatable approach is to turn the dice into a Uniform generator by using each draw as a digit in the base 5 representation of this Uniform variate, up to the precision desired for the resolution, and then apply the most efficient algorithm for the target distribution.

## ISBA 20.2.21

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , on June 30, 2021 by xi'an

A second day which started earlier and more smoothly with a 100% local j-ISBA session. (Not counting the invigorating swim in Morgiou!) With talks by junior researchers from master to postdoc level, as this ISBA mirror meeting was primarily designed for them, so that they could all present their work, towards gaining more visibility for their research and facilitating more interactions with the participants. CIRM also insisted on this aspect of the workshop, which was well-attended.

I alas had to skip the poster session [and the joys of gather.town] despite skipping lunch [BND], due to organisational constraints. Then attended the Approximate Bayesian computation section, including one talk by Geoff Nicholls on confidence estimation for ABC, following upon the talk given by Kate last evening. And one by Florian Maire on learning the bound in accept-reject algorithms on the go, as in Caffo et al. (2002), which I found quite exciting and opening new possibilities, esp. if the Markov chain thus produced can be recycled towards unbiasedness without getting the constant right! For instance, by Rao-Blackwellisation, multiple mixtures, black box importance sampling, whatever. (This also reminded me of the earlier Goffinet et al. 1996.)

Followed by another Bayesian (modeling and) computation session. With my old friend Peter Müller talking about mixture inference with dependent priors (and a saturated colour scheme!), Matteo Ruggieri [who could not make it to CIRM!] on computable Bayesian inference for HMMs. Providing an impressive improvement upon particle filters for approximating the evidence. Also bringing the most realistic Chinese restaurant with conveyor belt! And Ming Yuan Zhou using optimal transport to define distance between distributions. With two different conditional distributions depending on which marginal is first fixed. And a connection with GANs (of course!).

And it was great to watch and listen to my friend Alicia Carriquiry talking on forensic statistics and the case for (or not?!) Bayes factors. And remembering Dennis Lindley. And my friend Jim Berger on frequentism versus Bayes! Consistency seems innocuous as most Bayes procedures are. Empirical coverage is another kind of consistency, isn’t it?

A remark I made when re-typing the program for CIRM is that there are surprisingly few talks about COVID-19 overall, maybe due to the program being mostly set for ISBA 2020 in Kunming. Maybe because we are more cautious than the competition…?!

And, at last, despite a higher density of boars around the CIRM facilities, no one got hurt yesterday! Unless one counts the impact of the French defeat at the Euro 2021 on the football fans here…

## piling up ziggurats

Posted in Books, pictures, Statistics, Travel with tags , , , , , , on June 7, 2021 by xi'an

This semester. a group of Dauphine graduate students worked under my direction on simulation problems and resorted to using the Ziggurat method developed by George Marsaglia and Wai Wan Tsang, at about the time Devroye was completing his simulation bible. The algorithm covers the half-Normal density by 2², 2⁴, 2⁸, &tc., stripes, all but one rectangles and all with the same surface v. Generating uniformly from the tail strip means generating either uniformly from the rectangle part, x<r, or exactly from the Normal tail x>r, using a drifted exponential accept-reject. The choice between both does not require the surface of the rectangle but a single simulation y=vU/f(r). Furthermore, for the other rectangles, checking first that the first coordinate of the simulated point is less than the left boundary of the rectangle above avoids computing the density. This method is incredibly powerful, once the boundaries have been determined. With 2³² stripes, its efficiency is 99.3% acceptance rate. Compared with a fast algorithm by Ahrens & Dieter (1989), it is three times faster…

## simulating Maxwell distribution

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , on April 22, 2021 by xi'an

A question that came out on X validated a few days ago is how to efficiently simulate from a distribution with density

x²φ(x).

(Obviously this density is already properly normalised since the second moment of the standard Normal  distribution is one.) The first solution that came out (by Jarle Tufto) exploits the fact that this density corresponds to a signed root of a χ²(3) variate. This is a very efficient proposal that requires a Gamma sampler and a random sign sampler. Since the cdf is available in closed form,

Φ(x)-xφ(x),

I ran a comparison with a numerical inversion, but this is much slower. I also tried an accept-reject version based on a Normal proposal with a larger variance, but even when optimising this variance, the running time was about twice as large. While checking Devroye (1986) for any possible if unlikely trick, I came upon this distribution twice (p.119 in an unsolved exercise, p.176 presented as the Maxwell distribution). With the remark that, if

X~x²φ(x),  then  Y=UX~φ(x).

Inverting this result leads to X being distributed as

sign(Y)√(Y²-2log(U)),

which recovers the original χ²(3) solution, if slightly (and mysteriously) increasing the simulation speed.

## your GAN is secretly an energy-based model

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on January 5, 2021 by xi'an

As I was reading this NeurIPS 2020 paper by Che et al., and trying to make sense of it, I came across a citation to our paper Casella, Robert and Wells (2004) on a generalized accept-reject sampling scheme where the proposal changes at each simulation that sounds surprising if appreciated! But after checking this paper also appears as the first reference on the Wikipedia page for rejection sampling, which makes me wonder if many actually read it. (On the side, we mostly wrote this paper on a drive from Baltimore to Ithaca, after JSM 1999.)

“We provide more evidence that it is beneficial to sample from the energy-based model defined both by the generator and the discriminator instead of from the generator only.”

The paper seems to propose a post-processing of the generator output by a GAN, generating from the mixture of both generator and discriminator, via a (unscented) Langevin algorithm. The core idea is that, if p(.) is the true data generating process, g(.) the estimated generator and d(.) the discriminator, then

p(x) ≈ p⁰(x)∝g(x) exp(d(x))

(The approximation would be exact the discriminator optimal.) The authors work with the latent z’s, in the GAN meaning that generating pseudo-data x from g means taking a deterministic transform of z, x=G(z). When considering the above p⁰, a generation from p⁰ can be seen as accept-reject with acceptance probability proportional to exp[d{G(z)}]. (On the side, Lemma 1 is the standard validation for accept-reject sampling schemes.)

Reading this paper made me realise how much the field had evolved since my previous GAN related read. With directions like Metropolis-Hastings GANs and Wasserstein GANs. (And I noticed a “broader impact” section past the conclusion section about possible misuses with societal consequences, which is a new requirement for NeurIPS publications.)