multiple try/point Metropolis algorithm

Among the arXiv documents I printed at the turn of the year in order to get a better look at them (in the métro if nowhere else!), there were two papers by Luca Martino and co-authors from Universidad Carlos III, Madrid, A multi-point Metropolis scheme with generic weight functions and Different acceptance functions for multiple try Metropolis schemes. The multiple-try algorithm sounds like another version of the delayed sampling algorithm of Tierney and Mira (1999) and Green and Mira (2001). I somehow missed it, even though it was introduced in Liu et al. (2000) and Quin and Liu (2001). Multiple-try Metropolis builds upon the idea that, instead of making one proposal at a time, it is feasible to build a sequence of proposals and to pick one among those, presumably rather likely and hence more open to being accepted. The sequence of proposals may depend upon the past propositions as well as on the current value, lending some degree of adaptability to the scheme. In the current implementation, the algorithm remains rather clumsy [in my opinion] in that (a) a fixed horizon N need be fixed and (b) an additional series of backward simulations need be produced simply to keep the balance equation happy… Hence a total number of simulations O(N) for one possible acceptance. The first note slightly extends Quin and Liu (2001) by using a fairly general weighting scheme. The second paper studies some particular choices for the weights in a much less adaptive scheme (where parallelisation would be an appropriate alternative, since each proposal in the multiple try only depends on the current value of the chain). But it does not demonstrate a more efficient behaviour than when using a cycle or a mixture of Metropolis-Hastings algorithms. The method seems to regain popularity, though, as Roberto Casarin, Radu Craiu and Fabrizio Leisen (also from Carlos III)  arXived a paper on a multiple-try algorithm, connected with population Monte Carlo, and more recently published it in Statistics and Computing.

8 Responses to “multiple try/point Metropolis algorithm”

  1. […] the discretisation means that all intermediate points can be used, as suggested by Paul via a multiple try scheme. Mark then presented an application of Hamiltonian ideas and schemes to biochemical […]

  2. […] of those arXiv papers that had sat on my to-read pile for way too long: Likelihood-free parallel tempering by Meïli […]

  3. […] &tc.the Art of R Programming [guest post]Improving convergence of Data Augmentation [published]multiple try/point Metropolis algorithmThe Alloy of […]

  4. […] Leo Stein on the geometry of Hamiltonian Monte Carlo was one of the remaining ones on my pile of unread arXiv postings. I read it in the Paris métro yesterday (on my way to Jon Wellner’s point de vue), which is […]

  5. […] the beer social in the faculty upstairs right after helped!). And I had time to work quietly on multiple-try MCMC, following thoughts raised both by the comments on the related post and the GPU meeting in Warwick. […]

  6. Radford makes a good point. In our experience it is difficult for a naive implementation of multiple-try Metropolis to achieve an advantage over ordinary Metropolis. As he suggests above, a possible way to overcome this is to avoid computation of the target density at each of the K multiple tries. This is the approach we took here: http://jmlr.csail.mit.edu/proceedings/papers/v9/pandolfi10a/pandolfi10a.pdf (the details are in Section 3.2).

  7. Radford Neal Says:

    Indeed, a problem with the original multiple-try Metropolis paper is that in none of the examples given do they demonstrate an advantage over ordinary Metropolis with the same proposal distribution. (They don’t even present such a comparison, preferring to compare against ordinary Metropolis with a different proposal distribution.)

    My “Ensemble MCMC” method (see http://arxiv.org/abs/1101.0387, and my blog entry at http://radfordneal.wordpress.com/2011/01/01/ensemble-mcmc/) can be seen as better way of doing something similar. One point I make in that paper is that one can expect an advantage only when there is some computational shortcut for evaluating the density at K points with less than K times the cost of evaluating the density at one point.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.