Yesterday, I received the following email from Rob Taylor:
Dr. Robert, I’ve made an observation about a variation on the Gibbs sampler that hopefully would interest you enough to want to answer my question. I’ve noticed that if I want to simply estimate the mean of a unimodal posterior density (such as a multivariate Gaussian), I can modify the Gibbs sampler to just sample the MEAN of the full conditionals at each update and get convergence to the true posterior mean in many cases. In other words I’m only sampling the posterior mean instead of sampling the target posterior distribution (or something of that flavor). So my question is: Does modifying the Gibbs sampler to sample only the mean of the full conditionals (instead of sampling the distribution) have any supporting theory or prior art? Empirically it seems to work very well, but I don’t know if there’s an argument for why it works.
To which I replied: What you are implementing is closer to the EM algorithm than to Gibbs sampling. By using the (conditional) mean (or, better, mode) in unimodal conditional posteriors, you are using a local maximum in one direction corresponding to the conditioned parameter, and by repeating this across all parameters the algorithm increases the corresponding value of the posterior in well-behaved models. So this is a special case of a hill-climbing algorithm. The theory behind it is, however, gradient-like rather than Gibbs-like, because by taking the mean at each step you remove the randomness of a Gibbs sampler step and hence its Markovian validation. Simulated annealing would be a stochastic version of this algorithm, using Markov simulation but progressively concentrating the conditional distributions around their modes.
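To make the contrast concrete, here is a minimal Python sketch on a toy target that is my own assumption, not part of the exchange above: a bivariate Gaussian with unit variances, correlation rho and mean mu, for which the full conditional of one coordinate given the other is Gaussian with mean mu_i + rho (x_j - mu_j) and variance 1 - rho^2. The function names and settings are hypothetical. Since mean and mode coincide for a Gaussian conditional, the deterministic version is a coordinate-wise ascent on the posterior density, while the second function is a standard Gibbs sampler whose running average estimates the same mean.

```python
import random


def conditional_mean_iteration(mu, rho, start, n_iter=200):
    """Deterministic variant: replace each Gibbs draw by the conditional mean.

    Assumed toy target: bivariate Gaussian, unit variances, correlation rho.
    """
    x1, x2 = start
    for _ in range(n_iter):
        # E[x1 | x2] and E[x2 | x1] for the assumed Gaussian target
        x1 = mu[0] + rho * (x2 - mu[1])
        x2 = mu[1] + rho * (x1 - mu[0])
    return x1, x2


def gibbs_posterior_mean(mu, rho, start, n_iter=20000, seed=0):
    """Standard Gibbs sampler; returns the average of the simulated chain."""
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5  # conditional standard deviation
    x1, x2 = start
    sum1 = sum2 = 0.0
    for _ in range(n_iter):
        # Draw from each full conditional instead of plugging in its mean
        x1 = rng.gauss(mu[0] + rho * (x2 - mu[1]), sd)
        x2 = rng.gauss(mu[1] + rho * (x1 - mu[0]), sd)
        sum1 += x1
        sum2 += x2
    return sum1 / n_iter, sum2 / n_iter


if __name__ == "__main__":
    mu, rho, start = (1.0, -2.0), 0.8, (10.0, -10.0)
    print("conditional-mean iteration:", conditional_mean_iteration(mu, rho, start))
    print("Gibbs average:             ", gibbs_posterior_mean(mu, rho, start))
```

In this toy case the deterministic recursion contracts towards mu at a geometric rate (a factor rho^2 per sweep), which is why it appears to work so well, but it only returns a single point: it carries no information about the spread of the posterior, and in a multimodal or skewed target it would at best reach a local mode rather than the posterior mean.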