## merging MCMC subposteriors

**C**hristopher Nemeth and Chris Sherlock arXived a paper yesterday about an approach to distributed MCMC sampling via Gaussian processes. As in several other papers commented on the ‘Og, the issue is to merge MCMC samples from sub-posteriors into a sample, or some other approximation, of the complete (product) posterior. I am quite sympathetic to the approach adopted in this paper, namely to use a log-Gaussian process representation of each sub-posterior and then to replace each sub-posterior with its log-Gaussian process posterior expectation in an MCMC or importance scheme, assessing its variability through the posterior variance of the sum of log-Gaussian processes. As pointed out by the authors, the closed-form representation of the posterior mean of the log-posterior is invaluable, as it allows for an HMC implementation, and for importance solutions as well. The probabilistic numerics behind this perspective are also highly relevant.
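For concreteness, here is a minimal numerical sketch of the merging idea as I read it — not the authors' code, with every setting (squared-exponential kernel, lengthscale, the two Gaussian subposteriors) made up for illustration: fit a GP to log-density evaluations of each sub-posterior, sum the GP posterior means, and exponentiate.

```python
import numpy as np

def gp_posterior_mean(X, y, Xstar, ell=1.0, tau=2.0, jitter=1e-6):
    """Posterior mean of a zero-mean GP with a squared-exponential kernel."""
    def k(a, b):
        return tau**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(X, X) + jitter * np.eye(len(X))
    return k(Xstar, X) @ np.linalg.solve(K, y)

# Toy subposteriors N(-1, 1) and N(+1, 1); their product is N(0, 1/2).
grid = np.linspace(-3.0, 3.0, 61)
train = grid[::4]                      # stand-in for MCMC sample locations
log_sub1 = -0.5 * (train + 1.0)**2     # log subposterior values, up to constants
log_sub2 = -0.5 * (train - 1.0)**2

# Merge: sum the GP posterior means of the log subposteriors, exponentiate,
# and renormalise numerically on the grid.
merged_log = (gp_posterior_mean(train, log_sub1, grid)
              + gp_posterior_mean(train, log_sub2, grid))
merged = np.exp(merged_log - merged_log.max())
merged /= merged.sum() * (grid[1] - grid[0])
```

The merged density should then peak near zero, as the true product posterior does; the paper's actual implementation of course works with genuine MCMC output rather than grid evaluations.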

A few arguable (?) points:

- The method often relies on importance sampling, and hence on the choice of an importance function that is most likely influential but delicate to calibrate in complex settings, as I presume the Gaussian estimates are not useful in this regard;
- Using Monte Carlo to approximate the value of the approximate density at a given parameter value (by simulating from the posterior distribution) is natural but is it that efficient?
- It could be that, by treating all sub-posterior samples as noisy versions of the same (true) posterior, a more accurate approximation of this posterior could be constructed;
- The method relies on the exponentiation of a posterior expectation or simulation. As of yesterday, I am somehow wary of log-normal expectations!
- If the purpose of the exercise is to approximate univariate integrals, it would seem more profitable to use the Gaussian processes at the univariate level;
- The way the missing normalising constants and the duplicate simulations are processed (or not) could deserve further exploration;
- Computing costs are, in fine, unclear when compared with the other methods in the toolbox.
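To make the calibration concern in the first point concrete, a quick self-normalised importance sampling sketch — everything here is a toy stand-in, the quadratic surrogate is not the paper's GP mean: a proposal narrower than the target collapses the effective sample size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the merged log-posterior surrogate (sum of GP means):
# here simply a standard-normal log-density, up to a constant.
def merged_log_post(theta):
    return -0.5 * theta**2

def snis(h, proposal_sd, n=20000):
    """Self-normalised importance sampling with a N(0, proposal_sd^2) proposal."""
    theta = rng.normal(0.0, proposal_sd, size=n)
    log_q = -0.5 * (theta / proposal_sd)**2 - np.log(proposal_sd)
    log_w = merged_log_post(theta) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)          # effective sample size diagnostic
    return float(np.sum(w * h(theta))), float(ess)

mean_wide, ess_wide = snis(lambda t: t, proposal_sd=1.5)      # adequate proposal
mean_narrow, ess_narrow = snis(lambda t: t, proposal_sd=0.3)  # too narrow
```

With the narrow proposal the weights have infinite variance and the effective sample size collapses, which is precisely why calibrating the importance function in complex settings is delicate.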

June 9, 2016 at 4:28 pm

Thank you for taking the time to look at our paper. We’d like to briefly respond to a few of the points.

1. Our main DIS uses the HMC sample from the mean of the exponentiated GP as the proposal distribution and so adapts to the shape. There could, potentially, have been an issue with a mismatch in the tails but we saw no evidence of this. Student-t versions of the Gaussian approximation seemed to work reasonably well for the GP-IS, and we note that the HMC sample could have been used instead (subject to the extra computational expense).
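To illustrate the ingredient under discussion — an HMC sampler driven by a closed-form log-density gradient, as the gradient of a GP posterior mean is — here is a minimal, generic HMC transition on a toy quadratic target standing in for the log of the exponentiated GP mean; all tuning values are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative smooth target: a standard-normal log-density standing in for
# the log of the exponentiated GP mean, with its closed-form gradient.
def log_p(theta):
    return -0.5 * theta**2

def grad_log_p(theta):
    return -theta

def hmc_step(theta, eps=0.2, n_leapfrog=10):
    """One HMC transition: leapfrog integration plus Metropolis correction."""
    p0 = rng.normal()
    th, p = theta, p0 + 0.5 * eps * grad_log_p(theta)   # initial half step
    for i in range(n_leapfrog):
        th = th + eps * p                               # full position step
        if i < n_leapfrog - 1:
            p = p + eps * grad_log_p(th)                # full momentum step
    p = p + 0.5 * eps * grad_log_p(th)                  # final half step
    log_accept = (log_p(th) - 0.5 * p**2) - (log_p(theta) - 0.5 * p0**2)
    return th if np.log(rng.uniform()) < log_accept else theta

theta, draws = 0.0, []
for _ in range(5000):
    theta = hmc_step(theta)
    draws.append(theta)
draws = np.array(draws)
```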

3. Interesting point! This had not occurred to us. If the data were partitioned at random then there might well be something that could be done here (though I've not thought through what). If, on the other hand, the data were only independent across partitions (e.g. multiple observations per subject) then the structure might mean there are real differences in the subposteriors.

4. Agreed. We should probably provide typical values of the estimate of the GP parameter \hat{\sigma} to show that it is not too large. Note that, except in the tails, \sigma(\theta) \ll \sigma.
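The contrast between the local predictive deviation \sigma(\theta) and the global \sigma is easy to check numerically with a generic zero-mean GP sketch (a made-up design, not the paper's setup): the predictive standard deviation collapses where log-density evaluations are available and reverts to the prior scale far out in the tails.

```python
import numpy as np

def gp_predictive_sd(X, Xstar, ell=1.0, tau=1.0, jitter=1e-8):
    """Predictive standard deviation of a zero-mean GP with an SE kernel."""
    def k(a, b):
        return tau**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(X, X) + jitter * np.eye(len(X))
    Ks = k(Xstar, X)
    var = tau**2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return np.sqrt(np.maximum(var, 0.0))

X = np.linspace(-2.0, 2.0, 15)                      # design points in the bulk
sd_bulk = gp_predictive_sd(X, np.array([0.0]))[0]   # sigma(theta) in the bulk
sd_tail = gp_predictive_sd(X, np.array([6.0]))[0]   # sigma(theta) in the tail
```

Here sd_bulk is near zero while sd_tail is close to the prior scale tau, matching the remark that the inequality only fails in the tails.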

5. Unfortunately, this won't work for our method as we only have values of the *joint* subposterior, \pi(\theta_1,\dots,\theta_d). We do not have the *marginal* subposterior \pi(\theta_1), say, so we can't fit a GP to the log of this.