On optimality of kernels for ABC-SMC
This freshly arXived paper by Sarah Filippi, Chris Barnes, Julien Cornebise, and Michael Stumpf, is in the lineage of our 2009 Biometrika ABC-PMC (population Monte Carlo) paper with Marc Beaumont, Jean-Marie Cornuet and Jean-Michel Marin. (I actually missed the first posting while in Berlin last summer. Flying to Utah gave me the opportunity to read it at length!) The paper focusses on the impact of the transition kernel in our PMC scheme: while we used component-wise adaptive proposals, the paper studies multivariate adaptivity with a covariance matrix adapted from the whole population, or locally or from an approximation to the information matrix. The simulation study run in the paper shows that, even when accounting for the additional cost due to the derivation of the matrix, the multivariate adaptation can improve the acceptance rate by a fair amount. So this is an interesting and positive sequel to our paper (that I may well end up refereeing one of those days, like an earlier paper from some of the authors!)
The main criticism I may have about the paper is that the selection of the tolerance sequence is not done in an adaptive way, while it could, given the recent developments of Del Moral et al. and of Drovandri and Pettitt (as well as our even more recent still-un-arXived submission to Stat & Computing!). While the target is the same for all transition kernels, thus the comparison still makes sense as is, the final product is to build a complete adaptive scheme that comes as close as possible to the genuine posterior.
This paper also raised a new question: there is a slight distinction between the Kullback-Leibler divergence we used and the Kullback-Leibler divergence the authors use here. (In fact, we do not account for the change in the tolerance.) Now, since what only matters is the distribution of the current particles, and while the distribution on the past particles is needed to compute the double integral leading to the divergence, there is a complete freedom in the choice of this past distribution. As in Del Moral et al., the distribution L(θ:t-1|θt) could therefore be chosen towards an optimal acceptance rate or something akin. I wonder if anyone ever looked at this…