Initializing adaptive importance sampling with Markov chains

Another paper recently arXived by Beaujean and Caldwell elaborated on our population Monte Carlo papers (Cappé et al., 2005, Douc et al., 2007, Wraith et al., 2010) to design a more thorough starting distribution. Interestingly, the authors mention the fact that PMC is an EM-type algorithm to emphasize the importance of the starting distribution, as with “poor proposal, PMC fails as proposal updates lead to a consecutively poorer approximation of the target” (p.2). I had not thought of this possible feature of PMC, which indeed proceeds along integrated EM steps, and thus could converge to a local optimum (if not poorer than the start as the Kullback-Leibler divergence decreases).

The solution proposed in this paper is similar to the one we developed in our AMIS paper. An important part of the simulation is dedicated to the construction of the starting distribution, which is a mixture deduced from multiple Metropolis-Hastings runs. I find the method spends an unnecessary long time on refining this mixture by culling the number of components: down-the-shelf clustering techniques should be sufficient, esp. if one considers that the value of the target is available at every simulated point. This has been my pet (if idle) theory for a long while: we do not take (enough) advantage of this informative feature in our simulation methods… I also find the Student’s t versus Gaussian kernel debate (p.6) somehow superfluous: as we shown in Douc et al., 2007, we can process Student’s t distributions so we can as well work with those. And rather worry about the homogeneity assumption this choice implies: working with any elliptically symmetric kernel assumes a local Euclidean structure on the parameter space, for all components, and does not model properly highly curved spaces. Another pet theory of mine’s. As for picking the necessary number of simulations at each PMC iteration, I would add to the ESS and the survival rate of the components a measure of the Kullback-Leibler divergence, as it should decrease at each iteration (with an infinite number of particles).

Another interesting feature is in the comparison with Multinest, the current version of nested sampling, developed by Farhan Feroz. This is the second time I read a paper involving nested sampling in the past two days. While this PMC implementation does better than nested sampling on the examples processed in the paper, the Multinest outcome remains relevant, particularly because it handles multi-modality fairly well. The authors seem to think parallelisation is an issue with nested sampling, while I do see why: at the most naïve stage, several nested samplers can be run in parallel and the outcomes pulled together.

One Response to “Initializing adaptive importance sampling with Markov chains”

  1. Re: MultiNest … in fact we (myself & Farhan) have recently been having some promising results with a neat change to the MutliNest evidence summation with connection to AMIS. The point is to treat the draws from the proposal ellipsoids as if from a single mixture proposal (a la AMIS or Vardi’s biased sampling). Unlike AMIS which is intended to iterate ever closer towards an ‘optimal’ proposal the nested sampling version specifically follows an ellipsoidal approximation to the (inverse map of) nested sampling’s sequence of log delta Z increments.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.