Barker (from the lovely city of Dunedin) and Link published a paper in the American Statistician last September that I missed, as I missed their earlier email about the paper since it arrived The Day After… The paper is about a new specification of RJMCMC, almost twenty years after Peter Green’s (1995) introduction of the method. The authors use the notion of a palette, “from which all model specific parameters can be calculated” (in a deterministic way). One can see the palette ψ as an intermediary step in the move between two models. This reduces the number of bijections, if not the construction of the dreaded Jacobians!, but forces the construction of pseudo-priors on the unessential parts of ψ for every model. Because the dimension of ψ is fixed, a Gibbs sampling interleaving model index and palette value is then implementable. The conditional of the model index given the palette is available provided there are not too many models under competitions, with the probabilities recyclable towards a Rao-Blackwell approximation of the model probability. I wonder at whether or not another Rao-Blackwellisation is possible, namely to draw from all the simulated palettes a sample for the parameter of an arbitrarily chosen model.
Archive for Gibbs sampling
Following my earlier post about the young astronomer who feared he was running his MCMC for too long, here is an update from his visit to my office this morning. This visit proved quite an instructive visit for both of us. (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is as earlier completely unrelated with the young astronomer!)
First, the reason why he thought MCMC was running too long was that the acceptance rate was plummeting down to zero, whatever the random walk scale. The reason for this behaviour is that he was actually running a standard simulated annealing algorithm, hence observing the stabilisation of the Markov chain in one of the (global) modes of the target function. In that sense, he was right that the MCMC was run for “too long”, as there was nothing to expect once the mode had been reached and the temperature turned down to zero. So the algorithm was working correctly.
Second, the astronomy problem he considers had a rather complex likelihood, for which he substituted a distance between the (discretised) observed data and (discretised) simulated data, simulated conditional on the current parameter value. Now…does this ring a bell? If not, here is a three letter clue: ABC… Indeed, the trick he had found to get around this likelihood calculation issue was to re-invent a version of ABC-MCMC! Except that the distance was re-introduced into a regular MCMC scheme as a substitute to the log-likelihood. And compared with the distance at the previous MCMC iteration. This is quite clever, even though this substitution suffers from a normalisation issue (that I already mentioned in the post about Holmes’ and Walker’s idea to turn loss functions into pseudo likelihoods. Regular ABC does not encounter this difficult, obviously. I am still bemused by this reinvention of ABC from scratch!
So we are now at a stage where my young friend will experiment with (hopefully) correct ABC steps, trying to derive the tolerance value from warmup simulations and use some of the accelerating tricks suggested by Umberto Picchini and Julie Forman to avoid simulating the characteristics of millions of stars for nothing. And we agreed to meet soon for an update. Indeed, a fairly profitable morning for both of us!
Here is an excerpt from an email I just received from a young astronomer with whom I have had many email exchanges about the nature and implementation of MCMC algorithms, not making my point apparently:
The acceptance ratio turn to be good if I used (imposed by me) smaller numbers of iterations. What I think I am doing wrong is the convergence criteria. I am not stopping when I should stop.
To which I replied he should come (or Skype) and talk with me as I cannot get into enough details to point out his analysis is wrong… It may be the case that the MCMC algorithm finds a first mode, explores its neighbourhood (hence a good acceptance rate and green signals for convergence), then wanders away, attracted by other modes. It may also be the case the code has mistakes. Anyway, you cannot stop a (basic) MCMC algorithm too late or let it run for too long! (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is unrelated to the young astronomer!)
My former student Roberto Casarin and his colleagues wrote (and arXived) a paper entitled Adaptive sticky generalized Metropolis algorithm. The basic idea is to use some of the rejected and past values of the chain to build an adaptive proposal, the criterion for choosing those values being related with the distance at the rejected point between the target and the proposal. In a sense, it gives a reward to surprising points, i.e. points where the proposal does poorly in approximating the target. On top of this, they include a multiple-try strategy where several values are generated from the current proposal and one of them is selected, to be accepted or rejected in a Metropolis step. The learning set may include several of the proposed (and rejected) values. This paper generalises Holden, Hauge and Holden (AoAP, 2009) and extends their proof of stationarity. The authors explore at length (the paper is 63 pages long!) the construction of the adaptive proposal distribution. This construction appears to be quite similar to Gilks’ and Wild’s (1993) ARMS algorithm. Hence, unless I missed a generalisation, it seems to me that the solutions are restricted to unidimensional settings. For instance, the authors propose to implement their algorithm for each complex conditional in a Gibbs sampler, meaning starting from scratch and running a large enough number of iterations to “reach” convergence. I also wonder at the correspondence between this construction and the original assumption of a minorisation condition wrt the target density in the event of an unbounded support. While this paper represents an interesting extension of the automated simulation algorithms of the ARMS type, and while the method is investigated thoroughly by several simulation experiments (in the second half of the paper), I remain somehow circumspect at the possibly of using ASMTM in complex high-dimensional problems as the learning cost soar with the dimension.
In a sort of echo from an earlier post, I received this emailed question from Gabriel:
I am contacting you in connection with my internship and your book «Le choix bayésien» where I cannot find an answer to my question. Given a constrained parameter space and an unconstrained Markov chain, is it correct to subsample the chain in order to keep only those points that satisfy the constraint?
To which I replied that this would induce
a bias in the outcome, even though this is a marginally valid argument (if the Markov chain is in its stationary regime, picking one value at random from those satisfying the constraint is akin to accept-reject). The unbiased approach is to resort to Metropolis-Hastings steps in Gabriel’s Gibbs sampler to check whether or not each proposed move stays within the constrained space. If not, one need to replicate the current value of the chain…
Update: Following comments by Ajay and David, I withdraw the term “bias”. The method works as a low key accept-reject but can be very inefficient, to the extent of having no visit to the constrained set. (However, in the event of a practically/numerically disconnected support with a huge gap between the connected components, it may also be more efficient than a low energy Metropolis-Hastings algorithm. Mileage may vary!)