multiple try Metropolis

Luca Martino and Francisco Louzada recently wrote a paper in Computational Statistics about some difficulties with the multiple try Metropolis algorithm. This version of Metropolis by Liu et al. (2000) makes several proposals in parallel and picks one among them by multinomial sampling, with probabilities proportional to the corresponding importance weights. This is followed by a Metropolis acceptance step that requires simulating the same number of proposed moves from the selected value. While this is necessary to achieve detailed balance, this mixture of MCMC and importance sampling is inefficient in that it simulates a large number of particles and ends up using only one of them. By comparison, a particle filter for the same setting would propagate all N particles across iterations and only resample occasionally, when the ESS gets too small. (I also wonder if the method could be seen as a special kind of pseudo-marginal approach, given that the acceptance ratio involves an empirical average whose expectation is the missing normalising constant [as I later realised the authors had pointed out!]. In which case efficiency comparisons by Christophe Andrieu and Matti Vihola could prove useful.)
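To make the mechanics concrete, here is a minimal sketch of one multiple try Metropolis step in the independent-proposal case, where the selection weights reduce to plain importance weights π(y)/q(y). The toy target (an unnormalised standard normal), the N(0, scale²) proposal, and all function names are my own assumptions for illustration, not taken from the paper.

```python
import numpy as np

def log_target(x):
    """Unnormalised log-density of the toy target (standard normal, assumed)."""
    return -0.5 * x**2

def log_weight(x, scale):
    """Log importance weight log pi(x) - log q(x), with q = N(0, scale^2)."""
    log_q = -0.5 * (x / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))
    return log_target(x) - log_q

def log_sum(log_w):
    """Stable log of the sum of the weights."""
    return np.logaddexp.reduce(log_w)

def mtm_step(x, n_tries, scale, rng):
    """One multiple try Metropolis step with an independent proposal."""
    # 1. Simulate n_tries proposals in parallel and weight them.
    y = rng.normal(0.0, scale, size=n_tries)
    lw_y = log_weight(y, scale)
    # 2. Pick one proposal by multinomial sampling on the weights.
    p = np.exp(lw_y - lw_y.max())
    p /= p.sum()
    j = rng.choice(n_tries, p=p)
    # 3. Simulate n_tries - 1 reference points from the selected value
    #    (here simply from q, since the proposal is independent) and
    #    complete the set with the current state x.
    ref = rng.normal(0.0, scale, size=n_tries - 1)
    lw_ref = np.append(log_weight(ref, scale), log_weight(x, scale))
    # 4. Accept with probability min(1, sum of forward weights / sum of
    #    reference weights); only y[j] is ever used.
    if np.log(rng.uniform()) < log_sum(lw_y) - log_sum(lw_ref):
        return y[j]
    return x

def run_chain(n_iter=5000, n_tries=5, scale=2.0, seed=0):
    """Run the sampler from x = 0 and return the whole chain."""
    rng = np.random.default_rng(seed)
    x = 0.0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        x = mtm_step(x, n_tries, scale, rng)
        chain[t] = x
    return chain
```

Each call to `mtm_step` costs 2 × `n_tries` − 1 simulations and keeps at most one of them, which is the inefficiency noted above. Both sums in the acceptance ratio are, up to the factor `n_tries`, importance sampling estimates of the normalising constant.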

The issue raised by Martino and Louzada is that the estimator of the normalising constant can be poor at times, especially when the chain is in low-probability regions of the target, and can hence get the chain stuck. The above graph illustrates this setting in the paper. However, the reason for the failure is mostly that the proposal distribution is inappropriate for the purpose of approximating the normalising constant, i.e., that importance sampling does not converge in this situation, since otherwise the average of the importance weights should a.s. converge to the normalising constant. And the method should not worsen when increasing the number of proposals at a given stage. (The solution proposed by the authors, namely to have a random number of proposals, seems unlikely to solve the issue in a generic situation. Changing the proposals towards different tail behaviours, as in population Monte Carlo, is more akin to defensive sampling and thus more likely to avoid trapping states. Interestingly, the authors eventually resort to a mixture denominator in the importance sampler, following AMIS.)
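As a rough illustration of the point about the proposal, the sketch below averages importance weights for an unnormalised standard normal target, whose normalising constant is √(2π), under a Gaussian proposal of a given scale. The target, the scales, and the function name are assumptions of mine, not the setting of the paper. With a scale of 2 the weights are bounded and the average settles near √(2π); with a scale of 0.5 the weights have infinite variance and the effective sample size collapses.

```python
import numpy as np

def estimate_constant(scale, n_draws, rng):
    """Average of importance weights for the target exp(-x^2 / 2).

    The proposal is N(0, scale^2). Returns the estimate of the
    normalising constant and the effective sample size (ESS).
    """
    x = rng.normal(0.0, scale, size=n_draws)
    log_q = -0.5 * (x / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))
    w = np.exp(-0.5 * x**2 - log_q)
    ess = w.sum() ** 2 / (w**2).sum()
    return w.mean(), ess

rng = np.random.default_rng(1)
z_heavy, ess_heavy = estimate_constant(2.0, 100_000, rng)  # tails heavier than the target
z_light, ess_light = estimate_constant(0.5, 100_000, rng)  # tails lighter than the target
print(z_heavy, ess_heavy)
print(z_light, ess_light)
```

The same diagnostic applies to the sums appearing in the multiple try acceptance ratio: when the proposal has lighter tails than the target, adding more tries does not rescue the estimate.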
