**H**yungsuk Tak, Xiao-Li Meng and David van Dyk just arXived a paper on a multiple choice proposal in Metropolis-Hastings algorithms towards dealing with multimodal targets. Called “A repulsive-attractive Metropolis algorithm for multimodality” *[although I wonder why XXL did not jump at the opportunity to use the “love-hate” denomination!]*. The proposal distribution includes a [forced] downward Metropolis-Hastings move that uses the inverse of the target density π as its own target, namely 1/{π(x)+ε}. Followed by a [forced] Metropolis-Hastings upward move which target is {π(x)+ε}. The +ε is just there to avoid handling ratios of zeroes (although I wonder why using the convention 0/0=1 would not work). And chosen as 10⁻³²³ by default in connection with R smallest positive number. Whether or not the “downward” move is truly downwards and the “upward” move is truly upwards obviously depends on the generating distribution: I find it rather surprising that the authors consider the *same* random walk density in both cases as I would have imagined relying on a more dispersed distribution for the downward move in order to reach more easily other modes. For instance, the downward move could have been based on an *anti*-Langevin proposal, relying on the gradient to proceed further down…

This special choice of a single proposal however simplifies the acceptance ratio (and keeps the overall proposal symmetric). The final acceptance ratio still requires a ratio of intractable normalising constants that the authors bypass by Møller et al. (2006) auxiliary variable trick. While the authors mention the alternative pseudo-marginal approach of Andrieu and Roberts (2009), they do not try to implement it, although this would be straightforward here: since the normalising constants are the probabilities of accepting a downward and an upward move, respectively. Those can easily be evaluated at a cost similar to the use of the auxiliary variables. That is,

– generate a few moves from the current value and record the proportion *p* of accepted downward moves;

– generate a few moves from the final proposed value and record the proportion *q* of accepted downward moves;

and replace the ratio of intractable normalising constants with *p/q*. It is not even clear that one needs those extra moves since the algorithm requires an acceptance in the downward and upward moves, hence generate Geometric variates associated with those probabilities p and q, variates that can be used for estimating them. From a theoretical perspective, I also wonder if forcing the downward and upward moves truly leads to an improved convergence speed. Considering the case when the random walk is poorly calibrated for either the downward or upward move, the number of failed attempts before an acceptance may get beyond the reasonable.

As XXL and David pointed out to me, the unusual aspect of the approach is that here the proposal density is intractable, rather than the target density itself. This makes using Andrieu and Roberts (2009) seemingly less straightforward. However, as I was reminded this afternoon at the statistics and probability seminar in Bristol, the argument for the pseudo-marginal based on an unbiased estimator is that w Q(w|x) has a marginal in x equal to π(x) when the expectation of w is π(x). In thecurrent problem, the proposal in x can extended into a proposal in (x,w), w P(w|x) whose marginal is the proposal on x.

If we complement the target π(x) with the conditional P(w|x), the acceptance probability would then involve

{π(x’) P(w’|x’) / π(x) P(w|x)} / {w’ P(w’|x’) / w P(w|x)} = {π(x’) / π(x)} {w/w’}

so it seems the pseudo-marginal (or auxiliary variable) argument also extends to the proposal. Here is a short experiment that shows no discrepancy between target and histogram:

nozero=1e-300
#love-hate move
move<-function(x){
bacwa=1;prop1=prop2=rnorm(1,x,2)
while (runif(1)>{pi(x)+nozero}/{pi(prop1)+nozero}){
prop1=rnorm(1,x,2);bacwa=bacwa+1}
while (runif(1)>{pi(prop2)+nozero}/{pi(prop1)+nozero})
prop2=rnorm(1,prop1,2)
y=x
if (runif(1)<pi(prop2)*bacwa/pi(x)/fowa){
y=prop2;assign("fowa",bacwa)}
return(y)}
#arbitrary bimodal target
pi<-function(x){.25*dnorm(x)+.75*dnorm(x,mean=5)}
#running the chain
T=1e5
x=5*rnorm(1);luv8=rep(x,T)
fowa=1;prop1=rnorm(1,x,2) #initial estimate
while (runif(1)>{pi(x)+nozero}/{pi(prop1)+nozero}){
fowa=fowa+1;prop1=rnorm(1,x,2)}
for (t in 2:T)
luv8[t]=move(luv8[t-1])