## adaptive Metropolis-Hastings sampling using reversible dependent mixture proposals

**I**n the plane to Birmingham, I was reading this recent arXived paper by Minh-Ngoc Tran, Michael K. Pitt, and Robert Kohn. The adaptive structure of their ACMH algorithm is based upon two parallel Markov chains, the former (called the *trial* chain) feeding the proposal densities of the later (called the *main* chain), bypassing the more traditional diminishing adaptation conditions. (Even though convergence actually follows from a minorisation condition.) These proposals are mixtures of *t* distributions fitted by variational Bayes approximations. Furthermore, the proposals are (a) reversible and (b) mixing local [dependent] and global [independent] components. One nice aspect of the reversibility is that the proposals do not have to be evaluated at each step.

**T**he convergence results in the paper indeed assume a uniform minorisation condition on *all* proposal densities: although this sounded restrictive at first (but allows for straightforward proofs), I realised this could be implemented by adding a specific component to the mixture as in Corollary 3. (I checked the proof to realise that the minorisation on the proposal extends to the minorisation on the Metropolis-Hastings transition kernel.) A *reversible* kernel is defined as satisfying the detailed balance condition, which means that a *single* Gibbs step is reversible even though the Gibbs sampler as a whole is not. If a reversible Markov kernel with stationary distribution ζ is used, the acceptance probability in the Metropolis-Hastings transition is

α(x,z) = min{1,π(z)ζ(x)/π(x)ζ(z)}

(a result I thought was already known). The sweet deal is that the transition kernel involves Dirac masses, but the acceptance probability bypasses the difficulty. The way mixtures of *t* distributions can be reversible follows from Pitt & Walker (2006) construction, with ζ a specific mixture of *t* distributions. This target is estimated by variational Bayes. The paper further bypasses my classical objection to the use of normal, t or mixtures thereof, distributions: this modelling assumes a sort of common Euclidean space for all components, which is (a) highly restrictive and (b) very inefficient in terms of acceptance rate. Instead, Tran & al. resort to Metropolis-within-Gibbs by constructing a partition of the components into subgroups.

May 23, 2013 at 3:47 am

It sounds cool, even though I don’t fully understand it (e.g. the reference to Dirac Mass, among others)