Along with David Frazier and Gael Martin from Monash University, Melbourne, we have just completed (and arXived) a paper on the (Bayesian) consistency of ABC methods, producing sufficient conditions on the summary statistics to ensure consistency of the ABC posterior. Consistency is meant in the sense of the posterior concentrating at the true value of the parameter when the sample size and the inverse tolerance (intolerance?!) go to infinity. The conditions are essentially that the summary statistic concentrates around its mean and that this mean identifies the parameter. They are thus weaker conditions than those found in earlier consistency results, where the authors considered convergence to the genuine posterior distribution (given the summary), as for instance in Biau et al. (2014) or Li and Fearnhead (2015). We do not require here a specific rate of decrease to zero for the tolerance ε. Still, the conditions do not hold all the time, as shown for the MA(2) example and its first two autocorrelation summaries, an example we started using in the Marin et al. (2011) survey. We further propose a consistency assessment based on the main consistency theorem, namely that the ABC-based estimates of the marginal posterior densities for the parameters, estimated from simulated data, should vary little when adding extra components to the summary statistic, and that the mean of the resulting summary statistic is indeed one-to-one. This may sound somewhat similar to the stepwise search algorithm of Joyce and Marjoram (2008), but those authors aim at obtaining a vector of summary statistics that is as informative as possible. We also examine the consistency conditions when using an auxiliary model as in indirect inference, for instance when using an AR(2) auxiliary model for estimating an MA(2) model, and for ODEs.
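To make the MA(2) setting above concrete, here is a minimal sketch of ABC rejection with the first two empirical autocorrelations as summaries. It is only an illustration under assumptions of mine: the function names, the uniform prior over the usual MA(2) identifiability triangle, the Euclidean distance, and the tolerance value are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def simulate_ma2(theta, n, rng):
    """Simulate y_t = e_t + theta1*e_{t-1} + theta2*e_{t-2} with N(0,1) noise."""
    e = rng.standard_normal(n + 2)
    return e[2:] + theta[0] * e[1:-1] + theta[1] * e[:-2]

def autocorr_summaries(y):
    """First two empirical autocorrelations (lags 1 and 2)."""
    y = y - y.mean()
    return np.array([y[1:] @ y[:-1], y[2:] @ y[:-2]]) / (y @ y)

def sample_prior(rng):
    """Uniform draw over the triangle theta1+theta2 > -1, theta1-theta2 < 1, |theta2| < 1."""
    while True:
        t1, t2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
        if t1 + t2 > -1 and t1 - t2 < 1:
            return np.array([t1, t2])

def abc_rejection(y_obs, n_sims, eps, rng):
    """Keep the prior draws whose simulated summaries fall within eps of the observed ones."""
    s_obs = autocorr_summaries(y_obs)
    accepted = []
    for _ in range(n_sims):
        theta = sample_prior(rng)
        s_sim = autocorr_summaries(simulate_ma2(theta, len(y_obs), rng))
        if np.linalg.norm(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return np.array(accepted).reshape(-1, 2)
```

Calling `abc_rejection(y_obs, n_sims=2000, eps=0.15, rng=np.random.default_rng(0))` returns an array with one accepted (θ1, θ2) pair per row. Repeating the call with a longer series and a smaller `eps` gives an informal look at whether the accepted values concentrate or not.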
Last week, on arXiv, Espen Bernton, Shihao Yang, Yang Chen, Neil Shephard, and Jun Liu (all from Harvard) proposed a weighting scheme to associate with MCMC simulations, in connection with the parallel MCMC of Ben Calderhead discussed earlier on the ‘Og. The weight attached to each proposal is either the acceptance probability itself (with the rejection probability being attached to the current value of the MCMC chain) or a renormalised version of the joint target × proposal, either forward or backward. Both solutions are unbiased in that they have the same expectation as the original MCMC average, being some sort of conditional expectation. The proof of domination in the paper builds upon Calderhead’s formalism.
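The first weighting, where the proposal receives the acceptance probability and the current value receives the rejection probability, can be sketched for a plain random-walk Metropolis-Hastings sampler. This is a toy version written under my own assumptions (a single proposal per step, a symmetric Gaussian proposal, hypothetical function and argument names) and not the parallel scheme of the paper.

```python
import numpy as np

def mh_with_recycling(log_target, h, x0, n_iter, step, rng):
    """Random-walk MH returning the usual average of h and its weighted counterpart."""
    x = x0
    plain, weighted = 0.0, 0.0
    for _ in range(n_iter):
        y = x + step * rng.standard_normal()          # symmetric proposal
        # Acceptance probability, computed on the log scale to avoid overflow
        alpha = np.exp(min(0.0, log_target(y) - log_target(x)))
        # Conditional expectation of h(next state) given (x, y):
        # the proposal gets weight alpha, the current value 1 - alpha
        weighted += alpha * h(y) + (1.0 - alpha) * h(x)
        if rng.uniform() < alpha:                     # usual accept/reject step
            x = y
        plain += h(x)
    return plain / n_iter, weighted / n_iter
```

For a standard normal target, `mh_with_recycling(lambda x: -0.5 * x * x, lambda x: x * x, 0.0, 20000, 2.0, np.random.default_rng(1))` returns two estimates of E[X²] = 1. The weighted one integrates out the uniform draw at each step, so rejected proposals still contribute.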
This work reminded me of several reweighting proposals we made over the years, from the global Rao-Blackwellisation strategy with George Casella, to the vanilla Rao-Blackwellisation solution we wrote with Randal Douc a few years ago, both of which also demonstrably improve upon the standard MCMC average, by similarly recycling proposed but rejected values or by diminishing the variability due to the uniform draw. The slightly parallel nature of the approach also connects with our parallel MCMC version with Pierre Jacob (now at Harvard as well!) and Murray Smith (who now lives in Melbourne, hence the otherwise unrelated picture).
While it took quite a while (!), with several visits by three of us to our respective antipodes, incl. my exciting trip to Melbourne and Monash University two years ago, our paper on ABC for state space models was arXived yesterday! Thanks to my coauthors, Gael Martin, Brendan McCabe, and Worapree Maneesoonthorn, I am very glad of this outcome and of the new perspective on ABC it produces. For one thing, it concentrates on the selection of summary statistics from a more econometric point of view than usual, defining asymptotic sufficiency in this context and demonstrating that both asymptotic sufficiency and Bayes consistency can be achieved when using maximum likelihood estimators of the parameters of an auxiliary model as summary statistics. In addition, the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score vector. Using the score instead of the MLE as a summary statistic allows for huge gains in terms of speed. The method is then applied to a continuous time state space model, using as auxiliary model an augmented unscented Kalman filter. We also found in the various state space models tested therein that the ABC approach based on the marginal [likelihood] score was performing quite well, including with respect to Fearnhead’s and Prangle’s (2012) approach… I like the idea of using such a generic object as the unscented Kalman filter for state space models, even when it is not a particularly accurate representation of the true model. Another appealing feature of the paper is in the connections made with indirect inference.
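The idea of using the auxiliary score as a summary can be illustrated on a toy case. The sketch below is an assumption-laden stand-in, not the method of the paper: the true model is a simple MA(1) rather than a state space model, the auxiliary model is a Gaussian AR(1) rather than an unscented Kalman filter, the distance is an unweighted Euclidean norm, and all names are hypothetical. The auxiliary MLE is computed once on the observed data, and each simulated series only requires evaluating the score at that value, with no further optimisation.

```python
import numpy as np

def simulate_ma1(theta, n, rng):
    """Toy 'true' model: y_t = e_t + theta*e_{t-1} with N(0,1) noise."""
    e = rng.standard_normal(n + 1)
    return e[1:] + theta * e[:-1]

def ar1_mle(y):
    """Conditional Gaussian MLE (rho, sigma2) of the auxiliary AR(1) model."""
    rho = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
    resid = y[1:] - rho * y[:-1]
    return rho, (resid @ resid) / len(resid)

def ar1_score(y, rho, sigma2):
    """Average score of the conditional AR(1) log-likelihood at (rho, sigma2)."""
    resid = y[1:] - rho * y[:-1]
    d_rho = np.mean(resid * y[:-1]) / sigma2
    d_sigma2 = np.mean(resid**2 - sigma2) / (2.0 * sigma2**2)
    return np.array([d_rho, d_sigma2])

def abc_score(y_obs, n_sims, eps, rng):
    """ABC rejection using the auxiliary score, evaluated at the observed-data MLE."""
    rho_hat, sigma2_hat = ar1_mle(y_obs)   # one fit, on the observed data only
    kept = []
    for _ in range(n_sims):
        theta = rng.uniform(-1, 1)         # assumed uniform prior on (-1, 1)
        z = simulate_ma1(theta, len(y_obs), rng)
        # The observed score is zero at the MLE, so the distance is the score norm
        if np.linalg.norm(ar1_score(z, rho_hat, sigma2_hat)) <= eps:
            kept.append(theta)
    return np.array(kept)
```

In this toy case the auxiliary MLE is available in closed form, so the speed gain is invisible; it matters when each auxiliary fit requires a numerical optimisation.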
In 1993, we wrote a paper [with George Casella and Gene/Juinn Hwang] on the paradoxical consequences of using the loss function
(published in Statistica Sinica, 3, 141-155) since it led to the following property: for the standard normal mean estimation problem, the regular confidence interval is dominated by the modified confidence interval equal to the empty set when s² is too large… This was first pointed out by Jim Berger and the most natural culprit is the artificial loss function where the first part is unbounded while the second part is bounded by k. Recently, Paul Kabaila—whom I met in both Adelaide, where he quite appropriately commented about the abnormal talk at the conference!, and Melbourne, where we met with his students after my seminar at the University of Melbourne—published a paper (first on arXiv then in Statistics and Probability Letters) where he demonstrates that the mere modification of the above loss into
solves the paradox! For Jeffreys’ non-informative prior, the Bayes (optimal) estimate is the regular confidence interval. Besides doing the trick, this nice resolution explains the earlier paradox as being linked to a lack of invariance in the (earlier) loss function. This is somewhat satisfactory since Jeffreys’ prior also is the invariant prior in this case.
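As a rough sketch of the two losses discussed above, and assuming the usual length-plus-coverage form with the constant k weighting the coverage indicator (the notation here is an assumption and may differ from both papers), the original loss and the version with the length rescaled by the standard deviation σ read:

```latex
% Assumed notation: C is the reported interval, k > 0 a fixed constant,
% \sigma the (unknown) standard deviation of the observations.
L(\theta, C) = \operatorname{length}(C) - k\,\mathbb{I}_C(\theta)
\qquad\text{versus}\qquad
L_\sigma(\theta, C) = \frac{\operatorname{length}(C)}{\sigma} - k\,\mathbb{I}_C(\theta)
```

Under the first form, an interval whose length grows with s can cost more than the at most k gained by covering θ, which is how the empty set can end up dominating. Dividing the length by σ makes the loss invariant under a change of scale.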
An ad in Melbourne took a while to click (for me!): it represented a woman projected on the hood of a car with the legend 65k and the same woman prostrated in front of the same car with the legend 60k… I was seeing this ad every day when driven to Monash and could not see the point as I was interpreting 60k as 60kg! So it sounded like a weird campaign for a new diet… After a while, I eventually got the point that it was a campaign towards speed reduction and against drivers thinking that 60km/h does not differ much from 65km/h. (I could not find a reproduction of the campaign posters on the official site.) Besides this misinterpretation, I find the message rather unclear and unconvincing: while driving more slowly obviously gives a driver more time to react, the 60k/65k opposition could be replaced with a 55k/60k opposition and would not make less or more sense. Furthermore, the variability in drivers’ reactions and car behaviours is likely to influence the consequences of an impact as significantly as a reduction of 5km/h…
Following my courses at Monash, we celebrated (!) by having a nice dinner at St Kilda, one of Melbourne’s beaches. The restaurant had an incredible collection of French wines, including a whole range of wines from the upper Rhône valley, like Côte-Rôtie, Saint Joseph (my favourite!), and Cornas. Those were however priced at three and even four digits (in dollars or Euros!), and we sampled instead the local version of those, which were truly pleasant (although too young and missing a few hours of oxygenation!).