Richard Everitt tweetted yesterday about a recent publication in JCGS by Rajib Paul, Steve MacEachern and Mark Berliner on convergence assessment via stratification. (The paper is free-access.) Since this is another clear interest of mine’s, I had a look at the paper in the train to Besançon. (And wrote this post as a result.)
The idea therein is to compare the common empirical average with a weighted average relying on a partition of the parameter space: restricted means are computed for each element of the partition and then weighted by the probability of the element. Of course, those probabilities are generally unknown and need to be estimated simultaneously. If applied as is, this idea reproduces the original empirical average! So the authors use instead batches of simulations and corresponding estimates, weighted by the overall estimates of the probabilities, in which case the estimator differs from the original one. The convergence assessment is then to check both estimates are comparable. Using for instance Galin Jone’s batch method since they have the same limiting variance. (I thought we mentioned this damning feature in Monte Carlo Statistical Methods, but cannot find a trace of it except in my lecture slides…)
The difference between both estimates is the addition of weights p_in/q_ijn, made of the ratio of the estimates of the probability of the ith element of the partition. This addition thus introduces an extra element of randomness in the estimate and this is the crux of the convergence assessment. I was slightly worried though by the fact that the weight is in essence an harmonic mean, i.e. 1/q_ijn/Σ q_imn… Could it be that this estimate has no finite variance for a finite sample size? (The proofs in the paper all consider the asymptotic variance using the delta method.) However, having the weights adding up to K alleviates my concerns. Of course, as with other convergence assessments, the method is not fool-proof in that tiny, isolated, and unsuspected spikes not (yet) visited by the Markov chain cannot be detected via this comparison of averages.