Matt Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a., thermal integration), and ABC. A massive simulation experiment with individual simulation times up to 400 hours leads to select path sampling (what else?!) as the (XL) method of choice. Thanks to a pre-computation of the expectation of the sufficient statistic E[S(Z)|β]. I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday in Columbia of potential if huge improvements in processing Ising and Potts models by approximating first the distribution of S(X) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desiderata for all ABC methods that simulating directly if approximately the summary statistics would being huge gains in computing time, thus possible in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(X) reduces to simulating this distribution with a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e. without reverting to simulating the entire grid X…
Archive for JCGS
Richard Everitt tweetted yesterday about a recent publication in JCGS by Rajib Paul, Steve MacEachern and Mark Berliner on convergence assessment via stratification. (The paper is free-access.) Since this is another clear interest of mine’s, I had a look at the paper in the train to Besançon. (And wrote this post as a result.)
The idea therein is to compare the common empirical average with a weighted average relying on a partition of the parameter space: restricted means are computed for each element of the partition and then weighted by the probability of the element. Of course, those probabilities are generally unknown and need to be estimated simultaneously. If applied as is, this idea reproduces the original empirical average! So the authors use instead batches of simulations and corresponding estimates, weighted by the overall estimates of the probabilities, in which case the estimator differs from the original one. The convergence assessment is then to check both estimates are comparable. Using for instance Galin Jone’s batch method since they have the same limiting variance. (I thought we mentioned this damning feature in Monte Carlo Statistical Methods, but cannot find a trace of it except in my lecture slides…)
The difference between both estimates is the addition of weights p_in/q_ijn, made of the ratio of the estimates of the probability of the ith element of the partition. This addition thus introduces an extra element of randomness in the estimate and this is the crux of the convergence assessment. I was slightly worried though by the fact that the weight is in essence an harmonic mean, i.e. 1/q_ijn/Σ q_imn… Could it be that this estimate has no finite variance for a finite sample size? (The proofs in the paper all consider the asymptotic variance using the delta method.) However, having the weights adding up to K alleviates my concerns. Of course, as with other convergence assessments, the method is not fool-proof in that tiny, isolated, and unsuspected spikes not (yet) visited by the Markov chain cannot be detected via this comparison of averages.
We have now completed our revision of the parallel computation paper and hope to send it to JCGS within a few days. As seen on the arXiv version, and given the very positive reviews we received, the changes are minor, mostly focusing on the explanation of the principle and on the argument that it comes essentially free. Pierre also redrew the graphs in a more compact and nicer way, thanks to the ggplot2 package abilities. In addition, Pierre put the R and python programs used in the paper on a public depository.
We have now received reports back from JCGS for our parallel MCMC paper and they all are very nice and supportive! The reviewers essentially all like the Rao-Blackwellisation concept we developed in the paper and ask for additions towards a more concrete feeling for the practical consequences of the method. We should thus be able to manage a revision in the coming week or so. Especially because we have had further ideas about the extension of the method to regular Metropolis-Hastings algorithms like the random walk avatar. (The above picture is completely unrelated with the paper, but conveys some feeling of parallelism. I made it [in R] for the poster of the O’Bayes 03 meeting I organised in Aussois in 2003.)