**A** few days ago I came across a short paper in the Central European Journal of Economic Modelling and Econometrics by Pajor and Osiewalski that proposes a correction to the infamous harmonic mean estimator that is essentially the one Darren and I made in 2009, namely to restrict the evaluations of the likelihood function to a subset **A** of the simulations from the posterior. Paper that relates to an earlier 2009 paper by Peter Lenk, which investigates the same object with this same proposal and that we had missed for all that time. The difference is that, while we examine an arbitrary HPD region at level 50% or 80% as the subset **A**, Lenk proposes to derive a minimum likelihood value from the MCMC run and to use the associated HPD region, which means using all simulations, hence producing the same object as the original harmonic mean estimator, except that it is corrected by a multiplicative factor P(**A**). Or rather an approximation. This correction thus maintains the infinite variance of the original, a point apparently missed in the paper.

## Archive for Gibbs sampler

## another version of the corrected harmonic mean estimator

Posted in Books, pictures, Statistics, University life with tags Gibbs sampler, harmonic mean estimator, HPD region, importance sampling, MCMC algorithm, Monte Carlo Statistical Methods on June 11, 2018 by xi'an## amazing appendix

Posted in Books, Statistics, Travel, University life with tags auxiliary variable, Colorado, Fort Collins, Gibbs sampler, Julian Besag, MCMC, Metropolis-within-Gibbs algorithm, Monte Carlo Statistical Methods, Oxford, random simulation, simulation, Statistical Science on February 13, 2018 by xi'an**I**n the first appendix of the 1995 Statistical Science paper of Besag, Green, Higdon and Mengersen, on MCMC, “Bayesian Computation and Stochastic Systems”, stands a fairly neat result I was not aware of (and which Arnaud Doucet, with his unrivalled knowledge of the literature!, pointed out to me in Oxford, avoiding me the tedium to try to prove it afresco!). I remember well reading a version of the paper in Fort Collins, Colorado, in 1993 (I think!) but nothing about this result.

It goes as follows: when running a Metropolis-within-Gibbs sampler for component x¹ of a collection of variates x¹,x²,…, thus aiming at simulating from the full conditional of x¹ given x⁻¹ by making a proposal q(x|x¹,x⁻¹), it is perfectly acceptable to use a proposal that depends on a parameter α (no surprise so far!) *and* to generate this parameter α anew at each iteration (still unsurprising as α can be taken as an auxiliary variable) *and* to have the distribution of this parameter α depending on the other variates x²,…, i.e., x⁻¹. This is the surprising part, as adding α as an auxiliary variable was messing up the update of x⁻¹. But the proof as found in the 1995 paper [page 35] does not require to consider α as such as it establishes global balance directly. (Or maybe still detailed balance when writing the whole Gibbs sampler as a cycle of Metropolis steps.) Terrific! And a whiff mysterious..!

## slice sampling for Dirichlet mixture process

Posted in Books, Statistics, University life with tags Communications in Statistics, Dirichlet process, Gibbs sampler, intractability, MCMC, Monte Carlo Statistical Methods, normalising constant, slice sampling on June 21, 2017 by xi'an**W**hen working with my PhD student Changye in Dauphine this morning I realised that slice sampling also applies to discrete support distributions and could even be of use in such settings. That it works is (now) straightforward in that the missing variable representation behind the slice sampler also applies to densities defined with respect to a discrete measure. That this is useful transpires from the short paper of Stephen Walker (2007) where we saw this, as Stephen relies on the slice sampler to sample from the Dirichlet mixture model by eliminating the tail problem associated with this distribution. (This paper appeared in Communications in Statistics and it is through Pati & Dunson (2014) taking advantage of this trick that Changye found about its very existence. I may have known about it in an earlier life, but I had clearly forgotten everything!)

While the prior distribution (of the weights) of the Dirichlet mixture process is easy to generate via the stick breaking representation, the posterior distribution is trickier as the weights are multiplied by the values of the sampling distribution (likelihood) at the corresponding parameter values and they cannot be normalised. Introducing a uniform to replace all weights in the mixture with an indicator that the uniform is less than those weights corresponds to a (latent variable) completion [or a demarginalisation as we called this trick in Monte Carlo Statistical Methods]. As elaborated in the paper, the Gibbs steps corresponding to this completion are easy to implement, involving only a finite number of components. Meaning the allocation to a component of the mixture can be operated rather efficiently. Or not when considering that the weights in the Dirichlet mixture are not monotone, hence that a large number of them may need to be computed before picking the next index in the mixture when the uniform draw happens to be quite small.

## simulation under zero measure constraints

Posted in Books, Kids, R, Statistics, University life with tags cross validated, Gibbs sampler, maximum likelihood estimation, Monte Carlo algorithm, Monte Carlo Statistical Methods, simulation, Stack Exchange, Statistics Forum on November 17, 2016 by xi'an**A** theme that comes up fairly regularly on X validated is the production of a sample with given moments, either for calibration motives or from a misunderstanding of the difference between a distribution mean and a sample average. Here are some entries on that topic:

- How to sample from a distribution so that mean of samples equals expected value?
- Sample random variables conditional on their sum
- Simulation involving conditioning on sum of random variables
- conditional expectation of squared standard normal

In most of those questions, the constraint in on the sum or mean of the sample, which allows for an easy resolution by a change of variables. It however gets somewhat harder when the constraint involves more moments or, worse, an implicit solution to an equation. A good example of the later is the quest for a sample with a given maximum likelihood estimate in the case this MLE cannot be derived analytically. As for instance with a location-scale t sample…

Actually, even when the constraint is solely on the sum, a relevant question is the production of an efficient simulation mechanism. Using a Gibbs sampler that changes one component of the sample at each iteration does not qualify, even though it eventually produces the proper sample. Except for small samples. As in this example

```
n=3;T=1e4
s0=.5 #fixed average
sampl=matrix(s0,T,n)
for (t in 2:T){
sampl[t,]=sampl[t-1,]
for (i in 1:(n-1)){
sampl[t,i]=runif(1,
min=max(0,n*s0-sum(sampl[t,c(-i,-n)])-1),
max=min(1,n*s0-sum(sampl[t,c(-i,-n)])))
sampl[t,n]=n*s0-sum(sampl[t,-n])}}
```

For very large samples, I figure that proposing from the unconstrained density can achieve a sufficient efficiency, but the in-between setting remains an interesting problem.

## independent Metropolis-Hastings

Posted in Books, Statistics with tags 0.234, block sampling, Gibbs sampler, independent Metropolis-Hastings algorithm, Metropolis-within-Gibbs algorithm, optimal acceptance rate, random walk on November 24, 2015 by xi'an

“In this paper we have demonstrated the potential benefits, both theoretical and practical, of the independence sampler over the random walk Metropolis algorithm.”

**P**eter Neal and Tsun Man Clement Lee arXived a paper on optimising the independent Metropolis-Hastings algorithm. I was a bit surprised at this “return” of the independent sampler, which I hardly mention in my lectures, so I had a look at the paper. The goal is to produce an equivalent to what Gelman, Gilks and Wild (1996) obtained for random walk samplers. In the formal setting when the target is a product of n identical densities f, the optimal number k of components to update in one Metropolis-Hastings (within Gibbs) round is approximately 2.835/I, where I is the symmetrised Kullback-Leibler distance between the (univariate) target f and the independent proposal q. When I is finite. The most surprising part is that the optimal acceptance rate is again 0.234, as in the random walk case. This is surprising in that I usually associate the independent Metropolis-Hastings algorithm with high acceptance rates. But this is of course when calibrating the proposal q, not the block size k of the Gibbs part. Hence, while this calibration of the independent Metropolis-within-Gibbs sampler is worth the study and almost automatically applicable, it remains that it only applies to a certain category of problems where blocking can take place. As in the disease models illustrating the paper. And requires an adequate choice of proposal distribution for, otherwise, the above quote becomes inappropriate.

## bouncy particle sampler

Posted in pictures, Statistics, Travel, University life with tags curse of dimensionality, Gibbs sampler, Hamiltonian Monte Carlo, interacting particle systems, Monte Carlo Statistical Methods, particle, pinball sampler, Teneriffe on October 30, 2015 by xi'an** A**lexandre Bouchard-Coté, Sebastian Vollmer and Arnaud Doucet just arXived a paper with the above title, which reminded me of a proposal Kerrie Mengersen and I made at Valencia 7, in Tenerife, the [short-lived!] pinball sampler. This sampler was a particle (MCMC) sampler where we used the location of the other particles to avoid their neighbourhood, by bouncing away from them according to a delayed rejection principle, with an overall Gibbs justification since the resulting target was the product of copies of the target distribution. The difficulty in implementing the (neat!) idea was in figuring out the amount of bouncing or, in more physical terms, the energy allocated to the move.

In the current paper, inspired from an earlier paper in physics, the Markov chain (or single particle) evolves by linear moves, changing directions according to a Poisson process, with intensity and direction depending on the target distribution. A local version takes advantage of a decomposition of the target into a product of terms involving only some components of the whole parameter to be simulated. And hence allowing for moves in subspaces. An extension proposed by the authors is to bounce along the Hamiltonian isoclines. The method is demonstrably ergodic and irreducible. In practice, I wonder at the level of calibration or preliminary testing required to facilitate the exploration of the parameter space, particularly in the local version that seems to multiply items to be calibrated.

## asynchronous distributed Gibbs sampling

Posted in Books, Statistics, Travel, University life with tags big data, bottleneck, Calgary, Dan Simpson, David Draper, forking paths, Gibbs sampler, parallel MCMC on October 13, 2015 by xi'an**A**lexander Terenin, Dan Simpson, and David Draper just arXived a paper on an alternative brand of Gibbs sampler, which they think can revolutionise the sampler and overcome its well-known bottlenecks. David had sent me the paper in advance and thus I had time to read it in the plane to Calgary. (This is also the very first paper I see acknowledging a pair of trousers..! With no connection whatsoever with bottlenecks!)

“Note that not all updates that are sent will be received by all other workers: because of network traffic congestion and other types of failures, a significant portion of the updates will be lost along the way.”

The approach is inherently parallel in that several “workers” (processors or graphical units) run Gibbs samplers in parallel, using their current knowledge of the system. This means they update a component of the model parameter, based on the information they have last received, and then send back this new value to the system. For physical reasons, there is not instantaneity in this transmission and thus all workers do not condition on the same “current” value, necessarily. Perceiving this algorithm as a “garden of forking paths” where each full conditional uses values picked at random from a collection of subchains (one for each worker), I can see why the algorithm should remain valid.

“Thus, the quality of this [ABC] method rises and falls with the ingenuity of the user in identifying nearly-sufficient statistics.”

It is also clear that this approach allows for any degree of parallelisation. However, it is less clear to me why this should constitute an improvement. With respect to the bottlenecks mentioned at the beginning of the paper, I do not truly see how the large data problem is bypassed. Except in cases where conditionals only depend on small parts of the data. Or why large dimensions can be more easily managed when compared with a plain Gibbs sampler or, better, parallel plain Gibbs samplers that would run on the same number of processors. (I do not think the paper runs the comparison in that manner, using instead a one-processor Gibbs sampler as its benchmark. Or less processors in the third example.) Since the forking paths repeatedly merge back at aperiodic stages, there is no multiplication or clear increase of the exploratory abilities of the sampler. Except for having competing proposed values [or even proposals] selected randomly. So maybe reaching a wee bit farther from time to time.