## running MCMC for too long, and even longer…

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , , on October 23, 2013 by xi'an

Following my earlier post about the young astronomer who feared he was running his MCMC for too long, here is an update from his visit to my office this morning.  This visit proved quite an instructive visit for both of us. (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is as earlier completely unrelated with the young astronomer!)

First, the reason why he thought MCMC was running too long was that the acceptance rate was plummeting down to zero, whatever the random walk scale. The reason for this behaviour is that he was actually running a standard simulated annealing algorithm, hence observing the stabilisation of the Markov chain in one of the (global) modes of the target function. In that sense, he was right that the MCMC was run for “too long”, as there was nothing to expect once the mode had been reached and the temperature turned down to zero. So the algorithm was working correctly.

Second, the astronomy problem he considers had a rather complex likelihood, for which he substituted a distance between the (discretised) observed data and (discretised) simulated data, simulated conditional on the current parameter value. Now…does this ring a bell? If not, here is a three letter clue: ABC… Indeed, the trick he had found to get around this likelihood calculation issue was to re-invent a version of ABC-MCMC! Except that the distance was re-introduced into a regular MCMC scheme as a substitute to the log-likelihood. And compared with the distance at the previous MCMC iteration. This is quite clever, even though this substitution suffers from a normalisation issue (that I already mentioned in the post about Holmes’ and Walker’s idea to turn loss functions into pseudo likelihoods. Regular ABC does not encounter this difficult, obviously. I am still bemused by this reinvention of ABC from scratch!

So we are now at a stage where my young friend will experiment with (hopefully) correct ABC steps, trying to derive the tolerance value from warmup simulations and use some of the accelerating tricks suggested by Umberto Picchini and Julie Forman to avoid simulating the characteristics of millions of stars for nothing. And we agreed to meet soon for an update. Indeed, a fairly profitable morning for both of us!

## running MCMC for too long…

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , on October 20, 2013 by xi'an

Here is an excerpt from an email I just received from a young astronomer with whom I have had many email exchanges about the nature and implementation of MCMC algorithms, not making my point apparently:

The acceptance ratio turn to be good if I used (imposed by me) smaller numbers of iterations. What I think I am doing wrong is the convergence criteria. I am not stopping when I should stop.

To which I replied he should come (or Skype) and talk with me as I cannot get into enough details to point out his analysis is wrong… It may be the case that the MCMC algorithm finds a first mode, explores its neighbourhood (hence a good acceptance rate and green signals for convergence), then wanders away, attracted by other modes. It may also be the case the code has mistakes. Anyway, you cannot stop a (basic) MCMC algorithm too late or let it run for too long! (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is unrelated to the young astronomer!)

## Robust adaptive Metropolis algorithm [arXiv:10114381]

Posted in R, Statistics with tags , , , , , , on November 23, 2010 by xi'an

Matti Vihola has posted a new paper on arXiv about adaptive (random walk) Metropolis-Hastings algorithms. The update in the (lower diagonal) scale matrix is

$S_nS_n^\text{T}=S_{n-1}\left(\mathbf{I}_d-\eta_n[\alpha_n-\alpha^\star]\dfrac{U_nU_n^\text{T}}{||U_n||^2}\right)S^\text{T}_{n-1}$

where

• $\alpha_n$ is the current acceptance probability and $\alpha^\star$ the target acceptance rate;
• $U_n$ is the current random noise for the proposal, $Y_n=X_{n-1}+S_{n-1}U_n$;
• $\eta_n$ is a step size sequence decaying to zero.

The spirit of the adaptation is therefore a Robbins-Monro type adaptation of the covariance matrix in the random walk, with a target acceptance rate. It follows the lines Christophe Andrieu and I had drafted in our [most famous!] unpublished paper, Controlled MCMC for optimal sampling. The current paper shows that the fixed point for $S_n$ is proportional to the scale of the target if the latter is elliptically symmetric (but does not establish a sufficient condition for convergence). It concludes with a Law of Large Numbers for the empirical average of the $f(X_n)$ under rather strong assumptions (on f, the target, and the matrices $S_n$). The simulations run on formalised examples show a clear improvement over the existing adaptive algorithms (see above) and the method is implemented within Matti Vihola’s Grapham software. I presume Matti will present this latest work during his invited talk at Adap’skiii.

Ps-Took me at least 15 minutes to spot the error in the above LaTeX formula, ending up with S^\text{T}_{n−1}: Copy-pasting from the pdf file had produced an unconventional minus sign in n−1 that was impossible to spot!