**O**n Monday, James Johndrow, Aaron Smith, Natesh Pillai, and David Dunson arXived a paper on the diminishing benefits of using data augmentation for large and highly imbalanced categorical data. They reconsider the data augmentation scheme of Tanner and Wong (1987), surprisingly not mentioned, used in the first occurrences of the Gibbs sampler like Albert and Chib’s (1993) or our mixture estimation paper with Jean Diebolt (1990). The central difficulty with data augmentation is that the distribution to be simulated operates on a space that is of order O(n), even when the original distribution covers a single parameter. As illustrated by the coalescent in population genetics (and the subsequent intrusion of the ABC methodology), there are well-known cases when the completion is near to impossible and clearly inefficient (as again illustrated by the failure of importance sampling strategies on the coalescent). The paper provides spectral gaps for the logistic and probit regression completions, which are of order a power of log(n) divided by √n, when all observations are equal to one. In a somewhat related paper with Jim Hobert and Vivek Roy, we studied the spectral gap for mixtures with a small number of observations: I wonder at the existence of a similar result in this setting, when all observations stem from one component of the mixture, when all observations are one. The result in this paper is theoretically appealing, the more because the posteriors associated with such models are highly regular and very close to Gaussian (and hence not that challenging as argued by Chopin and Ridgway). And because the data augmentation algorithm is uniformly ergodic in this setting (as we established with Jean Diebolt and later explored with Richard Tweedie). As demonstrated in the experiment produced in the paper, when comparing with HMC and Metropolis-Hastings (same computing times?), which produce much higher effective sample sizes.

## Archive for simulation

## inefficiency of data augmentation for large samples

Posted in Books, pictures, Running, Statistics, Travel, University life with tags convergence of Gibbs samplers, Data augmentation, Gibbs sampling, Hamiltonian Monte Carlo, importance sampling, logit model, MCMC, Monte Carlo Statistical Methods, probit model, simulation, spectral gap on May 31, 2016 by xi'an## a Bernoulli factory of sorts?

Posted in Books, Kids, Statistics with tags Bernoulli distribution, Bernoulli factory, cross validated, Monte Carlo, simulation, Stack Echange on May 10, 2016 by xi'an**A** nice question was posted on X validated as to figure out a way to simulate a Bernoulli B(q) variate when using only a Bernoulli B(p) generator. With the additional question of handling the special case q=a/b, a rational probability. This is not exactly a Bernoulli factory problem in that q does not write as f(p), but still a neat challenge. My solution would have been similar to the one posted by William Huber, namely to simulate a sequence of B(p) or B(1-p) towards zooming on q until the simulation of the underlying uniforms U allows us to conclude at the position of U wrt q. For instance, if p>q and X~B(p) is equal to zero, the underlying uniform is more than p, hence more than q, leading to returning zero for the B(q) generation. Else, a second B(p) or B(1-p) generation means breaking the interval (0,p) into two parts, one of which allows for stopping the generation, and so on. The solution posted by William Huber contains an R code that could be easily improved by choosing for each interval between p and (1-p) towards the maximal probability of stopping. I still wonder at the ultimate optimal solution that would minimise the (average or median) number of calls to the Bernoulli(p) generator.

## more e’s [and R’s]

Posted in Kids, pictures, R, Statistics with tags cross validated, debiasing, George Forsythe, Monte Carlo Statistical Methods, R, Russian roulette, simulation, unbiased estimation on February 22, 2016 by xi'an**A**lex Thiéry suggested debiasing the biased estimate of e by Rhee and Glynn truncated series method, so I tried the method to see how much of an improvement (if any!) this would bring. I first attempted to naïvely implement the raw formula of Rhee and Glynn

with a (large) Poisson distribution on the stopping rule N, but this took ages. I then realised that the index n did not have to be absolute, i.e. to start at n=1 and proceed snailwise one integer at a time: the formula remains equally valid after a change of time, i.e. n=can start at an arbitrary value and proceeds by steps of arbitrary size, which obviously speeds things up! Continue reading

## The answer is e, what was the question?!

Posted in Books, R, Statistics with tags Buffon's needle, cross validated, Gnedenko, Monte Carlo integration, Poisson process, simulation on February 12, 2016 by xi'an**A** rather exotic question on X validated: *since π can be approximated by random sampling over a unit square, is there an equivalent for approximating e?* This is an interesting question, as, indeed, why not focus on *e* rather than* π *after all?! But very quickly the very artificiality of the problem comes back to hit one in one’s face… With no restriction, it is straightforward to think of a Monte Carlo average that converges to *e* as the number of simulations grows to infinity. However, such methods like Poisson and normal simulations require some complex functions like sine, cosine, or exponential… But then someone came up with a connection to the great Russian probabilist Gnedenko, who gave as an exercise that the average number of uniforms one needs to add to exceed 1 is exactly *e*, because it writes as

(The result was later detailed in the American Statistician as an introductory simulation exercise akin to Buffon’s needle.) This is a brilliant solution as it does not involve anything but a standard uniform generator. I do not think it relates in any close way to the generation from a Poisson process with parameter λ=1 where the probability to exceed one in one step is *e*⁻¹, hence deriving a Geometric variable from this process leads to an unbiased estimator of *e* as well. As an aside, W. Huber proposed the following elegantly concise line of R code to implement an approximation of *e*:

`1/mean(n*diff(sort(runif(n+1))) > 1)`

Hard to beat, isn’t it?! (Although it is more exactly a Monte Carlo approximation of

which adds a further level of approximation to the solution….)

## approximating evidence with missing data

Posted in Books, pictures, Statistics, University life with tags Bayes factor, Bayesian Choice, Bayesian model comparison, bridge sampling, Chib's approximation, defensive mixture, harmonic mean, importance sampling, MCMC algorithms, mixture, Monte Carlo Statistical Methods, nested sampling, Pima Indians, reversible jump MCMC, simulation, University of Warwick on December 23, 2015 by xi'an**P**anayiota Touloupou (Warwick), Naif Alzahrani, Peter Neal, Simon Spencer (Warwick) and Trevelyan McKinley arXived a paper yesterday on Model comparison with missing data using MCMC and importance sampling, where they proposed an importance sampling strategy based on an early MCMC run to approximate the marginal likelihood a.k.a. the evidence. Another instance of estimating a constant. It is thus similar to our Frontier paper with Jean-Michel, as well as to the recent Pima Indian survey of James and Nicolas. The authors give the difficulty to calibrate reversible jump MCMC as the starting point to their research. The importance sampler they use is the natural choice of a Gaussian or *t* distribution centred at some estimate of θ and with covariance matrix associated with Fisher’s information. Or derived from the warmup MCMC run. The comparison between the different approximations to the evidence are done first over longitudinal epidemiological models. Involving 11 parameters in the example processed therein. The competitors to the 9 versions of importance samplers investigated in the paper are the raw harmonic mean [rather than our HPD truncated version], Chib’s, path sampling and RJMCMC [which does not make much sense when comparing two models]. But neither bridge sampling, nor nested sampling. Without any surprise (!) harmonic means do not converge to the right value, but more surprisingly Chib’s method happens to be less accurate than most importance solutions studied therein. It may be due to the fact that Chib’s approximation requires three MCMC runs and hence is quite costly. The fact that the mixture (or defensive) importance sampling [with 5% weight on the prior] did best begs for a comparison with bridge sampling, no? The difficulty with such study is obviously that the results only apply in the setting of the simulation, hence that e.g. another mixture importance sampler or Chib’s solution would behave differently in another model. In particular, it is hard to judge of the impact of the dimensions of the parameter and of the missing data.

## difference between Metropolis, Gibbs, importance, and rejection sampling

Posted in Books, Kids, Statistics with tags accept-reject algorithm, cross validated, importance sampling, MCMC algorithms, Monte Carlo Statistical Methods, simulation on December 14, 2015 by xi'an**L**ast week, while I was preparing my talk for the NIPS workshop, I spotted this fairly generic question on X validated. And decided to procrastinate by answering through generic comments on the pros and cons of each method. This is a challenging if probably empty question as it lacks a measure of evaluation for those different approaches. And this is another reason why I replied, in that it relates to my pondering the a-statistical nature of simulation-based approximation methods. Also called probabilistic numerics, not statistical numerics, eh! It is indeed close to impossible to compare such approaches and others on a general basis. For instance, the comparative analysis greatly differs when dealing with a once-in-a-lifetime problem and with an everyday issue, e.g. when building a package for a sufficiently standard model. In the former case, a quick-and-dirty off-the-shelf solution is recommended, while in the latter, designing an efficient and fine-tuned approach makes sense. (The pros and cons I discussed in my X validated answer thus do not apply in most settings!) If anything, using several approaches, whenever possible, is the best advice to give. If not on the targeted problem, at least on a toy or simulated version, to check for performances of those different tools. But this brings back the issue of cost and time… An endless garden of forking paths, one would say [in another setting].