## MCMSki [day 2]

I was still feeling poorly this morning, my brain in a kind of flu-induced haze, so I could not concentrate for a whole talk, which is a shame as I missed most of the contents of the astrostatistics session put together by David van Dyk… especially the talk by Roberto Trotta I was definitely looking forward to, and the defence of nested sampling strategies for marginal likelihood approximations. Even though I spotted posterior distributions for WMAP and Planck data on the ΛCDM model that reminded me of our own work in this area… Apologies thus to all speakers for dozing in and out; it was certainly not due to a lack of interest!

Sebastian Seehars mentioned emcee (for ensemble Monte Carlo), a software nicknamed “the MCMC hammer”, and their own CosmoHammer software. I read the paper by Goodman and Weare (2010) this afternoon during the ski break (if not on a ski lift!). Actually, I do not understand why an MCMC sampler should be affine invariant: a good adaptive MCMC sampler should pick up the right scale of the target distribution anyway. Other than that, the ensemble sampler reminds me very much of the pinball sampler we developed with Kerrie Mengersen (1995 Valencia meeting), where the target is the product of L copies of the target,

$\pi(x_1)\cdots\pi(x_L)$

and a Gibbs-like sampler can be constructed, moving one component (with index k, say) of the L-sample at a time, just as in the pinball sampler. Rather than moving away from all other components (as in the pinball sampler), Goodman and Weare draw a single other component at random (with index j, say) and make a proposal away from it:

$\eta=x_j(t) + \zeta \{x_k(t)-x_j(t)\}$

where ζ is a scale random variable with (log-)symmetry around 1. The authors claim improvement over a single-chain Metropolis algorithm, but this of course depends on the type of Metropolis algorithm that is chosen… Overall, I think the criticism of the pinball sampler also applies here: using a product of targets can only slow down convergence. Further, the affine structure of the target support is not a given: highly constrained settings would not cope well with linear transforms, and non-linear reparameterisations would be more efficient…
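For concreteness, the stretch move above can be sketched in a few lines of Python. This is a minimal, untuned illustration, not a polished implementation: the density g(z) ∝ 1/√z on [1/a, a] (with the usual default a = 2) and the Jacobian factor z^(d−1) in the acceptance ratio follow Goodman and Weare’s paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stretch_move(walkers, log_pi, a=2.0):
    """One sequential sweep of the stretch move over the ensemble (a sketch).

    walkers: (L, d) array of ensemble members; log_pi: log target density.
    """
    L, d = walkers.shape
    for k in range(L):
        # draw a companion index j != k uniformly
        j = rng.integers(L - 1)
        if j >= k:
            j += 1
        # draw the stretch factor z ~ g(z) proportional to 1/sqrt(z) on [1/a, a]
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        # propose along the line through x_j and x_k
        prop = walkers[j] + z * (walkers[k] - walkers[j])
        # accept with probability min(1, z^(d-1) * pi(prop) / pi(x_k))
        if np.log(rng.random()) < (d - 1) * np.log(z) + log_pi(prop) - log_pi(walkers[k]):
            walkers[k] = prop
    return walkers

# toy run on a standard bivariate normal target
log_pi = lambda x: -0.5 * np.sum(x ** 2)
walkers = rng.normal(size=(20, 2))
samples = []
for t in range(1000):
    walkers = stretch_move(walkers, log_pi)
    if t >= 500:  # discard the first half as burn-in
        samples.append(walkers.copy())
samples = np.concatenate(samples)
```

Note that no step size or proposal covariance is tuned anywhere: the companion walker sets the scale of the move, which is precisely the selling point being questioned above.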

### 3 Responses to “MCMSki [day 2]”

1. Goodman/Weare walkers and ter Braak’s differential evolution are similar in that they both use ensembles. The difference is that Goodman/Weare’s basic walker steps interpolate/extrapolate between two ensemble points to update one of them, whereas differential evolution uses the line between two other points to update a third ensemble member. Not being a math stats whiz, I didn’t initially understand the Metropolis acceptance for the Goodman/Weare approach, but Goodman patiently explained to me why you sometimes need to reason about neighborhoods, not just points.

emcee is a nice piece of software in that it lets you plug in an arbitrarily-defined density function in Python. So people are using it with large, externally-defined models that often have person-years invested in them, such as big physics models for astronomy or energy usage models simulating buildings — something you can’t do with BUGS/JAGS/Stan (though you could do with a simple Metropolis algorithm).
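The “plug in an arbitrary Python density” pattern described above does not even require emcee: any sampler that only queries a black-box log-density callable will do, including the simple Metropolis algorithm the comment alludes to. A minimal sketch (the function and its signature are illustrative, not emcee’s actual API):

```python
import numpy as np

def metropolis(log_prob, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis over an arbitrary black-box log-density.

    log_prob can wrap any externally-defined model callable from Python,
    e.g. a big physics or building-energy simulator.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_prob(x)
    chain = np.empty((n_steps, x.size))
    for t in range(n_steps):
        prop = x + step * rng.normal(size=x.size)
        lp_prop = log_prob(prop)
        # standard Metropolis accept/reject on the log scale
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[t] = x
    return chain

# any Python callable works; here a stand-in for an external model
chain = metropolis(lambda x: -0.5 * np.sum(x ** 2), np.zeros(2), 5000)
```

The contrast with BUGS/JAGS/Stan is that those systems need the model expressed in their own language (to exploit its structure or gradients), whereas this interface only ever evaluates the density pointwise.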

The argument for affine invariance is that you don’t have to do any adaptation or reparameterization gymnastics. My understanding is that Gibbs is scale invariant, but not rotation invariant, whereas HMC with fixed step size and unit mass matrix is rotation invariant, but not scale invariant. If you estimate a dense mass matrix for HMC, (I think) you can get global scale invariance and preserve rotation invariance. Of course, Riemannian HMC (RHMC) gives you the best of both worlds in that it’s not only scale and rotation invariant, it adapts to local curvature of the kind you get in hierarchical models (like Neal’s “textbook” example, the funnel [or “whirlpool” as Andrew would have it]).
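The affine-invariance property in question is easy to verify on the stretch proposal itself: for any affine map T(x) = Ax + b, proposing and then transforming yields the same point as transforming the walkers first, so the sampler’s trajectory on a linearly distorted target mirrors its trajectory on the original one. A quick numerical spot-check (A, b, and z are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))   # arbitrary linear part (almost surely invertible)
b = rng.normal(size=3)
T = lambda x: A @ x + b       # an affine map

x_j, x_k = rng.normal(size=3), rng.normal(size=3)
z = 1.7                       # any fixed stretch factor

eta = x_j + z * (x_k - x_j)               # propose, then transform...
eta_T = T(x_j) + z * (T(x_k) - T(x_j))    # ...or transform, then propose
assert np.allclose(T(eta), eta_T)         # same point either way
```

Since z and the companion choice do not depend on the coordinate system, the whole Markov chain commutes with T, which is why no scale or rotation tuning is needed.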

It’d be nice to see some evaluations of these ensemble samplers compared to other methods. Especially for hierarchical models. We plan to implement both the ter Braak and Goodman/Weare samplers in Stan, so we should be able to do this for a large range of relatively simple models. But we probably won’t get to it soon (unless someone volunteers…).

• Bob, any news on this comparison you were mentioning a while ago…?!

2. That proposal looks a lot like the proposal from C.F. ter Braak’s differential evolution Monte Carlo sampler…