Random dive MH

A new Metropolis-Hastings algorithm that I would call "universal" was posted by Somak Dutta yesterday on arXiv. The paper, Multiplicative random walk Metropolis-Hastings on the real line, contains a rather different Metropolis-Hastings algorithm, called the random dive MH. The proposed new value x' given the current value x is defined by

x' = \begin{cases} x\epsilon &\text{with probability } 1/2\\ x/\epsilon &\text{with probability } 1/2\end{cases}

where \epsilon is a random variable on (-1,1). Thus, at each iteration, the current value is either shrunk or expanded by a random amount. When I read the paper in the Paris metro, while appreciating that this algorithm could be geometrically ergodic, as opposed to the classical random walk algorithm, I was not convinced of the practical impact of the method: the special role of zero in the proposal is not always reflected in the target, especially since the proposal is parameter-free. However, after running the following R program on a bimodal target centred away from zero, I found the proposal quite efficient.

f=function(x){.5*dnorm(x,mean=14)+.5*dnorm(x,mean=35)}  # bimodal target
Nsim=10^5
x=rep(5,Nsim)
for (t in 2:Nsim){
  # multiplier is either epsilon or 1/epsilon, with epsilon ~ U(-1,1)
  coef=runif(1,min=-1)^sample(c(-1,1),1)
  prop=x[t-1]*coef
  # Metropolis-Hastings ratio, with |coef| as the Jacobian correction
  prob=abs(coef)*f(prop)/f(x[t-1])
  x[t]=x[t-1]
  if (runif(1)<prob) x[t]=prop
}
hist(x,prob=TRUE,nclass=113,col="wheat2")
curve(f,add=TRUE,n=1001,col="sienna",lwd=2)
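
As a rough check of this efficiency, one can monitor the acceptance rate and the balance between the two modes (the cutpoint 24.5 below being merely the midpoint between the two component means):

mean(x[-1]!=x[-Nsim])   # empirical acceptance rate
mean(x>24.5)            # time spent around the mode at 35; near 1/2 if mixing well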

Obviously, it is difficult to believe that this extension will keep working as well when the dimension increases, but this is an interesting way of creating a heavy-tailed proposal.

9 Responses to “Random dive MH”

  1. […] was reading Christian Robert’s Blog today and quite liked the new Metropolis-Hastings algorithm he was discussing. It seemed simple and […]

  2. […] my suspicious and pessimistic nature always makes me wary of universality claims! I completely acknowledge the difficulty in picking the number of leapfrog steps L in the […]

  3. Somak: my naïve approach would be to consider first the "absolute" chain of the |x_t|'s, with target distribution f(y)+f(-y) on the positive real line. This chain is a regular log-random walk Metropolis-Hastings algorithm. Once the chain (|x_t|) is simulated, getting the sign of x_t given |x_t| is a simple (exact) Bernoulli simulation with probabilities proportional to f(|x_t|) and f(-|x_t|), so there is no convergence issue involved.
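
    For concreteness, a minimal R sketch of this two-step construction, reusing f and Nsim from the post (the log-walk step size sd=.5 being an arbitrary, untuned choice):

    g=function(y){f(y)+f(-y)}   # folded target on the positive real line
    y=rep(5,Nsim)
    for (t in 2:Nsim){
      prop=y[t-1]*exp(rnorm(1,sd=.5))       # log-random walk on (0,Inf)
      prob=prop*g(prop)/(y[t-1]*g(y[t-1]))  # Jacobian factor y'/y for the log walk
      y[t]=y[t-1]
      if (runif(1)<prob) y[t]=prop
    }
    sgn=2*rbinom(Nsim,1,f(y)/g(y))-1        # exact sign: +1 with prob. f(y)/(f(y)+f(-y))
    z=sgn*y                                 # sample from f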

  4. @xian: Thank you very much for the valuable discussion. I am yet to explore the convergence properties in higher dimensions. Properties similar to those in one dimension are expected to hold and, in any case, numerical experiments demonstrate that the proposal is quite efficient in higher dimensions, particularly for thick-tailed and multimodal distributions. I would also like to mention that this one-dimensional case should not be looked down upon, since even RWMH began its journey in one dimension only.

    I'm also unaware of such proposals being used in the statistics literature, especially in such a general set-up. Maybe I'll have to search harder…

    I also tried to switch to the "absolute part" and consider the random signs, etc., but I could not proceed that way. Theoretical results might be proved easily in higher dimensions if such an argument via the "log-absolute" chain goes through in the one-dimensional set-up.

  5. @David: The multiplicative random walk is well-known and is NOT what I have considered in my article; the differences between the multiplicative random walk and RDMH are explained throughout, including in the abstract and particularly in Section 3. All the theory and applications I found on the "multiplicative random walk" are for problems with $(0,\infty)$ as the state space, and that is just a log-random walk; I am not aware of any work that addresses a multiplicative random walk on the entire real line, as I have.

    In your example too, the random multiplier is positive, and hence:
    1. if the state space is $(0,\infty)$, the move is just a log-random walk $\log x' = \log x + \epsilon$, where $\epsilon$ has density proportional to $\exp(\epsilon)$ on $(-\log\beta, \log\beta)$ (a quick numerical check is sketched just below);
    2. if the state space is the entire real line, the chain becomes reducible (an easy exercise, see Exercise 6.14 in MCSM2) and is useless.
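
    A quick numerical check of the density claim in point 1 (beta=2 being an arbitrary choice):

    beta=2
    z=runif(10^6,min=1/beta,max=beta)          # the positive multiplier
    hist(log(z),prob=TRUE,nclass=100)          # epsilon = log(z)
    curve(exp(x)/(beta-1/beta),add=TRUE,lwd=2) # density exp(eps)/(beta-1/beta) on (-log beta, log beta)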

    The fact that the random multiplier can take both positive and negative values makes it different from other algorithms. The term "new" might be a bit strong here, though. I guess I should update the draft and include the example you gave (I was unaware of this one).
    Thanks.

  6. Sorry, it was in a comment I posted earlier but it didn't appear for some reason. I was saying that this type of move is very common in phylogenetic tree estimation, where the proposal is $x \to zx$ for some $z$ chosen uniformly at random from the interval $[\beta^{-1},\beta]$, for $\beta > 1$. In this case the Jacobian is $1/z$. I'm surprised to see this being touted as a new algorithm when it is so similar to this rather commonly used proposal.
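
    For reference, a minimal R sketch of this multiplier move on the target f of the post (beta=3 being arbitrary; since z>0 the chain stays on the positive half-line, as noted in comment 5):

    beta=3
    w=rep(5,Nsim)
    for (t in 2:Nsim){
      z=runif(1,min=1/beta,max=beta)
      prop=w[t-1]*z
      prob=f(prop)/(z*f(w[t-1]))   # Hastings correction 1/z for the uniform multiplier
      w[t]=w[t-1]
      if (runif(1)<prob) w[t]=prop
    }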

    • What surprised me is how efficient the Pareto tail (i.e., the $1/\epsilon$ part) is at recovering faraway modes… I indeed presume similar proposals have been used in the MCMC literature. Also, if you consider that the change of sign from x to x' is a Bernoulli(1/2) event, switching to the "absolute" chain means that it is a regular random walk on the log scale, so some results in the paper could perhaps also be derived from earlier papers on random walk Metropolis-Hastings algorithms. (Granted, the absolute value of the Markov chain is not itself a Markov chain!)

