t-walk on the wild side

When I read in the abstract of the recent A General Purpose Sampling Algorithm for Continuous Distributions, published by Christen and Fox in Bayesian Analysis that

We develop a new general purpose MCMC sampler for arbitrary continuous distributions that requires no tuning.

I am slightly bemused. The proposal of the authors is certainly interesting and widely applicable but to cover arbitrary distributions in arbitrary dimensions with no tuning and great performances sounds too much like marketing on steroids! The 101 Theorem in MCMC methods is that, no matter how good your sampler is, there exists an exotic distribution out there whose only purpose is to make it crash!

The algorithm in A General Purpose Sampling Algorithm for Continuous Distributions is based on two dual and coupled chains which are used towards a double target $\pi(x)\pi(x^\prime)$. Given that only one of the two chains moves at each iteration, according to a random walk, there is a calibration parameter that definitely influences the performances of the method, if not the acceptance probability. This multiple chain approach is reminding me  both of coupled schemes developed by Gareth Roberts in the late 1990’s, along with Laird Breyer, in the wake of the perfect sampling “revolution” and of delayed rejection sampling, as proposed by Antonietta Mira in those years as well. However, there is no particular result in the paper showing an improvement in convergence time over more traditional samplers. (In fact, the random walk nature of the algorithm strongly suggests a lack of uniform ergodicity.) The paper only offers a comparison with an older optimal scaled random walk proposal of Roberts and Rosenthal (Statistical Science, 2001). Rather than with the more recent and effective adaptive Metropolis-Hastings algorithm developed by the same authors.

Since the authors developed a complete set of computer packages, including one in R, I figure people will start to test the method to check for possible improvement over the existing solutions. If the t-walk is indeed superior sui generis, we should hear more about it in the near future…

3 Responses to “t-walk on the wild side”

1. […] dive MH A new Metropolis-Hastings algorithm that I would call “universal” was posted by Somak Dutta yesterday on arXiv. contains a different Metropolis-Hastings algorithm […]

2. Colin: The comparison with our down-the-shelf PMC sampler is shown on Darren Wraith’s experiment. I agree with you that the more basic proposals we have the better, since mixing proposals always makes things better.

3. Of course one is correct to be skeptical about a random-walk sampler that claims to be super-fast for arbitrary distributions and have no tuning. Fortunately we do not claim that for the t-walk — we are way too seasoned in running MCMC on big problems to be so mistaken. What we do claim is that the t-walk will do an adequate job for many routine problems faced by practicing statisticians, and significantly reduce the problem-specific work required, so reducing the human time required to complete an analysis. In the present age of exotic super-multi-hyper MCMC algorithms, plain old utility to the practitioner seems to be not enough for some people. We still believe it is a laudable goal. To believe that a method must demonstrate “a … result … showing an improvement in convergence time over more traditional samplers” is missing exactly this point.

It is true that there are distributions for which the t-walk will perform poorly, and we mention some that we found in the paper. Nevertheless, the zero time that it takes to fire up a t-walk sampler makes it a great thing to try first. If it does not work for your problem, move on — it has cost you very little.

The adequate performance of the t-walk is demonstrated by the comparisons presented in the paper. You don’t need to take our word for it — you can download the t-walk code and run the tests yourself.

On a technical note, I think the comparison to Roberts’ coupled schemes (that were big advances when presented) is naive in terms of important algorithmic properties. It is not an accident that the t-walk scales very differently with problem size. One is missing an(other) important property of a sampler to not see scaling as a primary issue.

This site uses Akismet to reduce spam. Learn how your comment data is processed.