density normalization for MCMC algorithms

Another paper addressing the estimation of the normalising constant and the wealth of available solutions just came out on arXiv, with the full title of “Target density normalization for Markov chain Monte Carlo algorithms”, written by Allen Caldwell and Chang Liu. (I became aware of it by courtesy of Ewan Cameron, as it appeared in the physics section of arXiv. It is actually a wee bit annoying that papers in the subcategory “Data Analysis, Statistics and Probability” of physics do not get an automated reposting on the statistics lists…)

In this paper, the authors compare three approaches to the problem of finding

\mathfrak{I} = \int_\Omega f(\lambda)\,\text{d}\lambda

when the density f is unnormalised, i.e., in more formal terms, when f is proportional to a probability density (and available):

  1. an “arithmetic mean”, which is an importance sampler based on reducing the integration volume to a neighbourhood ω of the global mode. This neighbourhood is chosen as a hypercube and the importance function turns out to be the uniform distribution over this hypercube. The corresponding estimator is then a rescaled version of the average of f over uniform simulations in ω.
  2. a “harmonic mean”, of all choices!, with again an integration over the neighbourhood ω of the global mode in order to avoid the almost sure infinite variance of harmonic mean estimators.
  3. a Laplace approximation, using the target at the mode and the Hessian at the mode as well.
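
To make the three estimators concrete, here is a minimal Python sketch on a toy problem of my own choosing (a two-dimensional Gaussian kernel, not an example from the paper). Everything specific in it is an assumption: the target `f`, the scales `sigma`, the side `delta` of the hypercube, and the use of i.i.d. draws as a stand-in for genuine MCMC output. The arithmetic mean estimates the integral of f over ω by uniform sampling and divides by the fraction of chain values falling in ω; the harmonic mean uses the identity E[1(λ∈ω)/f(λ)] = vol(ω)/I under the normalised target.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.array([1.0, 0.5])  # hypothetical scales of the toy target
d = sigma.size

def f(lam):
    """Unnormalised toy density: a Gaussian kernel without its constant."""
    return np.exp(-0.5 * np.sum((lam / sigma) ** 2, axis=-1))

# Known answer for this toy case, used only to check the estimators.
true_I = (2 * np.pi) ** (d / 2) * np.prod(sigma)

# Stand-in for MCMC output: i.i.d. draws from the normalised target.
N = 200_000
chain = rng.normal(size=(N, d)) * sigma

# Hypercube omega of side delta centred at the mode (the origin here).
mode = np.zeros(d)
delta = 1.0
volume = delta ** d
inside = np.all(np.abs(chain - mode) <= delta / 2, axis=1)

# 1. Arithmetic mean: uniform importance sampling over omega gives the
#    integral of f over omega, rescaled by the chain frequency in omega.
u = mode + rng.uniform(-delta / 2, delta / 2, size=(N, d))
I_omega = volume * f(u).mean()
r_hat = inside.mean()
I_am = I_omega / r_hat

# 2. Harmonic mean restricted to omega: average 1/f over the chain values
#    in omega (counting zeros for the values outside omega).
I_hm = volume / (np.sum(1.0 / f(chain[inside])) / N)

# 3. Laplace approximation: target value and Hessian of -log f at the mode.
hessian = np.diag(1.0 / sigma ** 2)
I_laplace = f(mode) * (2 * np.pi) ** (d / 2) / np.sqrt(np.linalg.det(hessian))

print(f"truth {true_I:.4f}  AM {I_am:.4f}  HM {I_hm:.4f}  Laplace {I_laplace:.4f}")
```

Since the toy target is exactly Gaussian, the Laplace approximation is exact here, which is of course the most favourable case for it and says nothing about non-Gaussian targets.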

The paper then goes on to compare those three solutions on a few examples, demonstrating how the diameter of the hypercube can be calibrated towards a minimum (estimated) uncertainty. The rather anticlimactic conclusion is that the arithmetic mean is the most reliable solution, as harmonic means may fail in larger dimensions and, more importantly, fail to signal their failure, while Laplace approximations only approximate quasi-Gaussian densities well…

What I find most interesting in this paper is the idea of using only one part of the integration space to compute the integral, even though it is not exactly new. Focussing on a specific region ω has pros and cons. The pros are that the reduction to a modal region reduces the need for absolute MCMC convergence, helps in selecting alternative proposals, and also protects against the worst consequences of using a dreaded harmonic mean. The cons are that the region needs to be well-identified, which means requirements on the MCMC kernel, and that the estimate is a product of two estimates, namely the estimated integral of f over ω and the frequency of visits of the chain to ω, the latter being driven by Binomial noise. I also like very much the idea of calibrating the diameter Δ of the hypercube ex-post by estimating the uncertainty.
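
As an illustration of this ex-post calibration, the following Python sketch scans a grid of values of Δ and keeps the one with the smallest estimated relative error. It is only a sketch under my own assumptions, not the procedure of the paper: the toy target `f`, the grid `deltas`, and the delta-method error formula (relative variance of the uniform average plus the Binomial term (1 - r)/(N r)) are all hypothetical choices, and the "chain" is made of i.i.d. draws, so that N would have to be replaced by an effective sample size with real MCMC output.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = np.array([1.0, 0.5])  # hypothetical toy target, not from the paper
d = sigma.size

def f(lam):
    """Unnormalised toy density: a Gaussian kernel without its constant."""
    return np.exp(-0.5 * np.sum((lam / sigma) ** 2, axis=-1))

true_I = (2 * np.pi) ** (d / 2) * np.prod(sigma)  # known only for this toy case

N = 100_000
chain = rng.normal(size=(N, d)) * sigma  # i.i.d. stand-in for MCMC output
mode = np.zeros(d)

def estimate(delta):
    """Arithmetic mean estimate of I and its estimated relative standard error."""
    u = mode + rng.uniform(-delta / 2, delta / 2, size=(N, d))
    fu = f(u)
    # Frequency of chain values falling in the hypercube of side delta.
    r_hat = np.mean(np.all(np.abs(chain - mode) <= delta / 2, axis=1))
    I_hat = delta ** d * fu.mean() / r_hat
    # Delta method for a ratio of two independent estimates (an assumption):
    # relative variance of the uniform average plus the Binomial term.
    rel_var = fu.var(ddof=1) / (N * fu.mean() ** 2) + (1 - r_hat) / (N * r_hat)
    return I_hat, np.sqrt(rel_var)

deltas = np.linspace(0.5, 5.0, 10)  # hypothetical grid of hypercube sides
results = [estimate(dl) for dl in deltas]
best = int(np.argmin([err for _, err in results]))
best_delta = deltas[best]
best_I, best_err = results[best]

print(f"best delta {best_delta:.1f}: I = {best_I:.4f} "
      f"(rel. error {best_err:.4f}), truth {true_I:.4f}")
```

The trade-off is visible in the two terms: a small hypercube is rarely visited by the chain, which inflates the Binomial term, while a large one makes f vary a lot over ω, which inflates the variance of the uniform average.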

As an aside, the paper mentions most of the alternative solutions I just presented in my Monte Carlo graduate course two days ago (like nested or bridge or Rao-Blackwellised sampling, including our proposal with Darren Wraith), but dismisses them as not “directly applicable in an MCMC setting”, i.e., without modifying this setting. I unsurprisingly dispute this labelling, both because something like the Laplace approximation requires extra work on the MCMC output (and once done this work can lead to advanced Laplace methods like INLA) and because other methods could be considered as well (for instance, bridge sampling over several hypercubes). As shown in the recent paper by Mathieu Gerber and Nicolas Chopin (soon to be discussed at the RSS!), MCqMC has also become a feasible alternative that would compete well with the methods studied in this paper.

Overall, this is a paper that comes in a long list of papers on constant approximations. I do not find the Markov chain aspect of MCMC particularly compelling or specific, once the effective sample size is accounted for. It would be nice to find generic ways of optimising the visit to the hypercube ω and of efficiently estimating the weight of ω. The comparison is solely run over examples, but they all rely on a proper characterisation of the hypercube and the ability to simulate f efficiently over that hypercube.

4 Responses to “density normalization for MCMC algorithms”

  1. Interesting comments; I haven’t had time to read the paper as thoroughly yet myself but my first impressions were that (a) the proposal of truncation to stabilise the HMA recalled Martin Weinberg’s BA paper; and (b) the focus is indeed on hyper-cube-like parameter spaces, which reflects how we (presently) tend to think about Monte Carlo sampling problems in astronomy.

  2. My very limited (but I’m working on it! – the paper is hard) understanding of SQMC is that it’s limited to rather low sample space dimensions ~10. (I genuinely can’t tell if this is an effect of the “implementation” or the whole QMC concept…) So at least the final example in this paper doesn’t fall in that regime.

    The other methods you mention, however, should work fine.

    That being said, the higher the dimension (because I’m really interested in infinite dimensions), the less useful I think marginal likelihoods are (as it’s all about the prior in that regime, which is a knotty thing that is either given by god or by pragmatism [or, I guess, both?]), so maybe the regime in which SQMC works is the one for which this problem is relevant.

    • I think we should start seriously discussing this marginal likelihood “stuff” next time I am in Warwick then..!

      • Probably… It’s one of those things that has always disturbed me. I know in these contexts that my prior (or my model) is highly informative, and that that prior is not chosen in a subjective way, so I’m scared of marginal likelihoods except possibly in a “model averaging” context.
