Hill climbing
Yesterday, I received the following email from Rob Taylor:
Dr. Robert, I’ve made an observation about a variation on the Gibbs sampler that hopefully would interest you enough to want to answer my question. I’ve noticed that if I want to simply estimate the mean of a unimodal posterior density (such as a multivariate Gaussian), I can modify the Gibbs sampler to just sample the MEAN of the full conditionals at each update and get convergence to the true posterior mean in many cases. In other words, I’m only sampling the posterior mean instead of sampling the target posterior distribution (or something of that flavor). So my question is: does modifying the Gibbs sampler to sample only the mean of the full conditionals (instead of sampling from the distribution) have any supporting theory or prior art? Empirically it seems to work very well, but I don’t know if there’s an argument for why it works.
To which I replied: What you are implementing is closer to the EM algorithm than to Gibbs sampling. By using the (conditional) mean (or, better, the mode) in unimodal conditional posteriors, you are using a local maximum in the direction corresponding to the conditioned parameter, and by repeating this across all parameters the algorithm increases the corresponding value of the posterior in well-behaved models. So this is a special case of a hill-climbing algorithm. The theory behind it is however gradient-like rather than Gibbs-like, because by taking the mean at each step you remove the randomness of a Gibbs sampler step and hence its Markovian validation. Simulated annealing would be a stochastic version of this algorithm, using Markov simulation but progressively concentrating the conditional distributions around their modes.
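As a rough illustration (my own toy example, not Rob’s code), here is what this coordinate-wise “conditional mean” iteration looks like in R on a bivariate Gaussian target, next to a genuine Gibbs sampler. In this toy case the deterministic iteration converges to the posterior mean, which also happens to be the mode:

# toy target: bivariate Gaussian with mean mu and correlation rho
mu  <- c(1, 2)
sig <- c(1, 2)        # marginal standard deviations
rho <- 0.8

# conditional mean of component i given the other component j
cond_mean <- function(i, j, x) mu[i] + rho * sig[i] / sig[j] * (x[j] - mu[j])

# deterministic "conditional mean" iteration (hill-climbing / EM-like)
x <- c(10, -10)                       # arbitrary starting point
for (t in 1:50) {
  x[1] <- cond_mean(1, 2, x)
  x[2] <- cond_mean(2, 1, x)
}
x        # converges to mu = (1, 2), the posterior mean (= mode here)

# genuine Gibbs sampler: same sweeps, but simulating the full conditionals
T <- 1e4
gibbs <- matrix(0, T, 2)
x <- c(10, -10)
csd <- sig * sqrt(1 - rho^2)          # conditional standard deviations
for (t in 1:T) {
  x[1] <- rnorm(1, cond_mean(1, 2, x), csd[1])
  x[2] <- rnorm(1, cond_mean(2, 1, x), csd[2])
  gibbs[t, ] <- x
}
colMeans(gibbs)  # also close to mu, but via an ergodic average of random draws

The deterministic version is a fixed-point iteration that contracts towards mu whenever rho² < 1, which is why it “works” here; it says nothing about the rest of the posterior, while the Gibbs output does.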
October 7, 2010 at 8:04 am
Your mention of simulated annealing / conditionals in this context got me thinking of my comment with Aline Tabet on Andrieu et al’s recent read paper. To steal directly from the comment:
Through PMCMC [particle Markov chain Monte Carlo] sampling, we can separate the variables of interest into those which may be easily sampled by using traditional MCMC techniques and those which require a more specialized SMC approach. Consider for instance the use of simulated annealing in an SMC framework (Neal, 2001; Del Moral et al., 2006). Rather than finding the maximum a posteriori estimate of all parameters, PMCMC sampling now allows practitioners to combine annealing with traditional MCMC methods to maximize over some dimensions while simultaneously exploring the full posterior in others.
It’d be interesting to study the properties of such an approach; as you say, it is perhaps closer to EM than MCMC.
October 7, 2010 at 8:13 am
If you replace the mean with the mode, you get Besag’s iterated conditional modes (ICM) algorithm, which he developed in the context of Markov random fields.
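For readers who have not met ICM, here is a minimal (hypothetical) R sketch on a toy Ising-type denoising problem of my own, where each pixel is set in turn to the mode of its full conditional given its four neighbours and the noisy observation:

set.seed(1)
n <- 32
truth <- matrix(-1, n, n); truth[10:22, 10:22] <- 1   # a square on a background
y <- truth + matrix(rnorm(n * n, sd = 1), n, n)       # noisy observation

beta  <- 1.5   # smoothing strength (assumed value)
sigma <- 1     # noise standard deviation

x <- sign(y)   # initialise at the pixelwise maximum likelihood estimate
for (sweep in 1:10) {
  for (i in 1:n) for (j in 1:n) {
    # sum of the four nearest neighbours (free boundary)
    nb <- 0
    if (i > 1) nb <- nb + x[i - 1, j]
    if (i < n) nb <- nb + x[i + 1, j]
    if (j > 1) nb <- nb + x[i, j - 1]
    if (j < n) nb <- nb + x[i, j + 1]
    # conditional log-posterior of x[i,j] = +1 versus -1; keep the mode
    score <- function(v) beta * v * nb - (y[i, j] - v)^2 / (2 * sigma^2)
    x[i, j] <- if (score(1) >= score(-1)) 1 else -1
  }
}
mean(x == truth)  # fraction of correctly recovered pixels

Each sweep can only increase the joint posterior, so the algorithm climbs to a local mode rather than exploring the posterior, exactly the deterministic behaviour discussed above.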
October 7, 2010 at 8:37 am
This sounds like profile likelihood. But a more interesting interpretation would be to separate easily simulated parameters from harder-to-simulate parameters and to replace the former by their MAP, in order to facilitate the exploration of the posterior of the latter… Interesting, indeed!
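For comparison, a standard textbook illustration of profile likelihood (unrelated to the specific models above): for a N(mu, sigma²) sample, the likelihood is maximised over sigma in closed form for each fixed mu, leaving a one-dimensional profile in mu.

set.seed(2)
y <- rnorm(100, mean = 3, sd = 2)
profile_loglik <- function(mu) {
  sig2_hat <- mean((y - mu)^2)   # sigma^2 maximising the likelihood at this mu
  sum(dnorm(y, mu, sqrt(sig2_hat), log = TRUE))
}
mus <- seq(2, 4, length.out = 200)
plot(mus, sapply(mus, profile_loglik), type = "l",
     xlab = expression(mu), ylab = "profile log-likelihood")

The analogy with the discussion above is that the nuisance dimension (sigma) is maximised out rather than integrated or simulated, while the dimension of interest (mu) is still explored in full.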